Abstract
Large-scale, unbiased single-cell genomics studies of complex developmental compartments, such as hematopoiesis, have inferred novel cell states and trajectories; however, further characterization has been hampered by difficulty isolating cells corresponding to discrete genomic states. To address this, we present a framework that integrates multimodal single-cell analyses (RNA, surface protein and chromatin) with high-dimensional flow cytometry and enables semiautomated enrichment and functional characterization of diverse cell states. Our approach combines transcription factor expression with chromatin activity to uncover hierarchical gene regulatory networks driving these states. We delineated and isolated rare bone marrow Lin−Sca−CD117+CD27+ multilineage cell states (‘MultiLin’), validated predicted lineage trajectories and mapped differentiation potentials. Additionally, we used transcription factor activity on chromatin to trace and isolate multilineage progenitors undergoing multipotent to oligopotent lineage restriction. In the proposed model of steady-state hematopoiesis, discrete states governed developmental trajectories. This framework provides a scalable solution for isolating and characterizing novel cell states across different biological systems.
A fundamental challenge in developmental biology is understanding the hierarchical states within stem cell and progenitor compartments, along with the gene regulatory networks (GRNs) that drive cell fate decisions.1 GRNs involve transcription factors that act on regulatory elements like enhancers and silencers, influencing cell-type-specific gene expression patterns.2–4 Single-cell genomics technologies like scRNA-seq and ATAC-seq5,6 provide high-resolution data on transcriptome and chromatin states. However, due to technical limitations, these techniques often rely on inferential rather than experimental evidence to understand developmental relationships and cellular potentials. Newer methods, such as CITE-seq7, combine transcriptome profiles and surface protein data, but do not permit the isolation of discrete populations because these methods are destructive. Integrating CITE-seq with high-dimensional flow cytometry (e.g., InfinityFlow8,9) provides a powerful approach to isolate and study cell populations manifesting discrete transcriptional and chromatin signatures, thus offering insights into the developmental potential of progenitors and their underlying GRNs.
In hematology, flow cytometry has long been used to identify and isolate progenitor populations based on surface markers. This method has helped define key progenitor types like multipotent progenitors (MPP),10 megakaryocyte erythroid progenitors (MEP), common myeloid progenitors (CMP), granulocyte monocyte progenitors (GMP)11 and common lymphoid progenitors (CLP)12, forming the basis of the classical model of hematopoiesis. However, flow cytometry is limited by the number of detectable markers, and recent findings show that these progenitor populations are more heterogeneous than previously thought.3,4,13 New techniques, including scRNA-seq, have revealed a more complex view of hematopoiesis, suggesting that lineage commitment is a continuum rather than a series of stable, defined states14,15. More recently, the punctuated continuum model re-introduced the classical concept of stability across developmental trajectories, with pools of cells that are variably lineage specified.14,16,17
Novel bioinformatics approaches and clonogenic assays have helped identify intermediate progenitor states, such as MultiLin cells, which exhibit mixed-lineage gene expression and reside within traditional CMP and GMP gates.2–4 These findings challenge classical progenitor definitions, highlighting the importance of transitional states in lineage specification. To better understand hematopoietic progenitors, we combined multiomic single-cell methods7,18 with high-dimensional flow-cytometric profiling8,9 and developed a unified computational framework (termed “ChromLinker”) that integrates data from different sources to derive GRNs that reflect developmental trajectories. By analyzing transcription factor activities on accessible chromatin regions, we identified key surface proteins, such as CD55 and CD371, as markers of critical lineage transitions in MultiLin progenitors. This framework provides a comprehensive view of progenitor heterogeneity and reveals how transcriptional and chromatin profiles influence lineage commitment and orchestrate steady-state blood production, forming the basis of a model of steady-state hematopoiesis in which discrete states govern developmental trajectories. Interactive analysis tools and datasets for the MarrowAtlas are available at https://altanalyze.org/MarrowAtlas/.
Results
A multimodal single-cell atlas resolves multilineage states
To create a comprehensive CITE-seq7 atlas of murine bone marrow, we first identified optimal gating strategies to enrich for rare hematopoietic stem progenitor cell (HSPC) states (Fig. 1a). These included broad CD117+ progenitors (Miltenyi autoMACS), Lin−Sca1+CD117+ stem cells and multipotent progenitors (hereafter HSC-MPP, excluding CD150−CD48+ MPP3-MPP4-gate cells) and Lin−CD117+CD127+ lymphoid progenitors (hereafter CD127+ progenitors, Extended Data Fig. 1a,b). Informed by retrospective and new scRNA-Seq captures, we developed a gating procedure for previously defined markers of MultiLin cells that depleted neutrophil- and monocyte-specified populations (Lin−CD117+CD34+CD115−Ly6C− MultiLin gate)(Extended Data Fig. 1g,h)2,3.
Fig. 1. A high-resolution CITE-seq atlas of stem/progenitor cells.

a, Diagram showing the enrichment of C57/Bl6 bone marrow cells by magnetic antibody enrichment (CD117+) followed by flow cytometry sorting: HSC-MPP gate (Lin−Sca1+CD117+ gating out CD150−CD48+ MPP3-MPP4-gate cells), CD127+ gate (Lin−CD117+CD127+), and initial MultiLin enrichment gate (Lin−CD117+CD34+CD115−Ly6C−). This MultiLin gate was anticipated to include bipotential cell states (common lymphoid progenitor; CLP, monocytic-dendritic progenitor; MDP, neutrophil monocyte progenitor expressing intermediate levels of Irf8 and Gfi1; IG2, basophil-mast cell progenitor; BMCP, megakaryocyte-erythroid progenitor; MEP) but not their expected progeny. b, Density plots illustrating the change in distribution of normalized ADT read counts for different concentrations of antibody (4x, 2x, 1x, 0.5x, and 0.25x) across the above gated input populations (HSC-MPP, MultiLin, CD117+, CD127+). c, Heatmap of TotalVI denoised ADT expression values across Biolegend universal mix v1.0, the 5-fold molecular titration of antibodies and the final product with curated antibody concentrations. d, scTriangulate workflow incorporating published transcriptome-defined labels (1. Reference), RNA-based ICGS2-derived clusters (2. RNA), ADT-based ICGS2-derived clusters (3. ADT), and well-based scRNA-seq captured populations (4; asterisks indicate Ly6C+ eosinophils), to inform the final 87 clusters (5). e, Three vertically stacked heatmaps across the final 87 clustered states illustrate the marker gene expression values (top), cluster distribution of cells from each capture population (middle, annotated to the left), and the ADT values for surface protein markers (bottom). Marker genes and antigens shown (right). 25 representative cells per cluster shown.
Previously, we molecularly titrated CITE-seq antibodies with antibody-derived-oligonucleotide tags (ADT) on human bone marrow by diluting them 5-fold and analyzing ADT sequence abundance in scRNA-seq19. Initially, we manually titrated 65 TotalSeq-A antibodies on CD117+ enriched mouse bone marrow cells2,20 and then captured the progenitor gates above using the TotalSeq-A Universal Cocktail (Universal Mix v1.0, n=195) (Supplementary Table 1, 2). As we previously showed in human19, manual titration improved detection of many ADTs over a universal mix (Extended Data Fig. 2a,b). As before, we performed individual 5-fold titrations of 195 antibodies (Extended Data Fig. 2d), prioritized for an optimized CITE-seq panel based on known discriminative markers of progenitor populations as well as dose-dependent signals of the antibodies used for detection (Fig. 1c). The revised titrated CITE-seq panel of 103 ADTs demonstrated improved cell-state re-classification accuracy when used to profile 63,000 new cells in HSC-MPP, CD127+, MultiLin, CD117+ gates (online Methods, Extended Data Fig. 2a–c, 2e and Supplementary Table 1, 2). To annotate the captured populations, we compiled a database of reference cell states for HSC-MPP21,22, dendritic cell23, basophil-mast cell commitment24, early lymphoid25 and myeloid intermediate cells2–4. Then we integrated RNA and ADTs across the database of published cell populations, with those from unsupervised clustering of each modality (ICGS226) using scTriangulate27 (Extended Data Fig. 2f). scTriangulate identified 87 stable discrete cell states, as evidenced by modality specific contribution scores and cluster confidence (Extended Data Fig. 2g–j). This multimodal integration defined clusters from a combination of prior-defined cell states (n=31) and unsupervised transcriptome (n=40) or ADT (n=16) clustering (Extended Data Fig. 2g–k).
To supplement this atlas, we used a well-based scRNA-seq capture method (HIVE; Methods) to analyze CD117+ progenitors, and Lin−CD117+CD34+CD115−Ly6C− MultiLin populations along with gates enriched for CD125+ eosinophil and FcER1a+ basophil-mast-cell progenitors (Extended Data Fig. 4a; Ly6C-1, and Ly6C+ Eo-Trajectory gates)(~15,000 cells)(Fig. 1d4). We annotated the cells through gene-set enrichment against markers from our database and from the literature 28–34. We noted several clusters unique to the MultiLin (ML) gate, which we denote as ML-1a, ML-1b, ML-2, and ML-3, along with those with transcriptomes enriched for cell cycle (ML-CC) or MDP gene expression (ML-MDP)(Fig. 1e). We also noted rare populations that share qHSC and macrophage marker genes, selectively appeared in the HSC-MPP gate and occurred at the same frequency as macrophages (Extended Data Fig. 2l). These cells possessed unique transcriptional and cell surface markers relative to qHSC and macrophages (Extended Data Fig. 2l). Flow cytometry revealed CD193 and CD115 macrophage markers on sparse cells within the HSC-MPP gate (Extended Data Fig. 2m), consistent with reported naturally occurring HSC-macrophage interactions35,36. To explore gene expression and ADT in the 87 clusters or annotate submitted datasets, we developed a series of online analysis tools in MarrowAtlas (see Data Availability; https://altanalyze.org/MarrowAtlas/).
High-dimensional flow cytometry reconciles gating strategies
We generated a high-throughput flow cytometry atlas using the InfinityFlow protocol9,37 to match the CITE-seq atlas. We profiled 140 cell surface markers in bone marrow CD117+ cells from Irf8GFP transgenic mice38. The backbone contained Irf8GFP and 22 surface markers selected based on prior studies3,4,11–13,20,34,39–48 (Supplementary Table 3). Infinity markers include 95 antibodies from the titrated CITE-Seq panel (Supplementary Table 1), and 10 transcription factor transgenic reporters (e.g., Gata1, Gata2, Pu.1, Gfi1, Myc). The ‘Initial InfinityFlow Object’ marker panel was sequentially optimized to better separate known lineages (i.e., basophil, mast, eosinophil) to produce a ‘Curated InfinityFlow Object’ (Fig. 2a, Supplementary Table 3). Initial analysis with the software pyInfinityFlow9 yielded a UMAP embedding that resembled the structure found in the CITE-seq data (Fig. 2a).
Fig. 2. High dimensional InfinityFlow atlas of stem/progenitor cells.

a, A diagram illustrating the InfinityFlow workflow. Final InfinityFlow atlas containing 1,935,037 cells, based on a 26 antibody backbone (right) incorporating 95 Infinity-marker antibodies and 14 transgenic transcription-factor reporters. b, InfinityFlow UMAPs in which published gating schemes (see reference above each UMAP) are in silico projected and color coded to denote flow-cytometrically-defined subsets of HSPC. c, InfinityFlow UMAP with surface protein expression abundance projected for Sca1 and CD27.
To relate the cells in our InfinityFlow atlas to prior studies, we performed in silico gating using FlowJo to emulate defined hematopoietic progenitor gating schemes 3,4,11–13,20,34,39–48 (Fig. 2b). While gating long/short-term HSCs, MPP2, MPP3, and MPP446 was largely consistent across studies, subdivisions within GMP and CMP (i.e., preGM, IG2) varied significantly across studies. For example, MPP148 (which neighbors early megakaryocytic and erythroid specified cells)13, and the MPP4 compartment based on CD48 and CD62L expression20 were clearly delineated in the Infinity object. In contrast, in silico CMP and GMP gates11 evidenced substantial heterogeneity. Within CMP-GMP gating, CD150 identified megakaryocyte-erythroid biased cells13, transgenic fluorescent reporters for IRF8 and Gfi1 proteins identified the IG24 population with monocytic and neutrophil potentials, and Ly6C expression marked neutrophil or monocyte specified progenitors.44,45 We could also clearly detect CD125+ committed eosinophil progenitors (EoP)42, FCER1A+ basophil-mast cell progenitors (BMCP)41, and CD127+ common lymphoid progenitors (CLP)12. Thus, we integrated published progenitors within the InfinityFlow atlas, with the exception of CMP and GMP (which appear to be gates instead of populations).
Analyzing the InfinityFlow atlas, we noted that while CD34 persists on some myeloid progenitor states (e.g. eosinophil42 and neutrophil3 progenitors), Lin−Sca−CD117+ MultiLin cells were CD27hi (Fig. 2c). Moreover, most bipotential lineage populations (except the ILC-NK progenitor branch) downregulated CD27 (Fig. 2c). Therefore, we used a Lin−Sca1−CD117+CD27+ gate to isolate MultiLin progenitors for the rest of the study. Thus, in silico gating of the InfinityFlow atlas enabled consolidation of a diverse set of historically delineated gating schemes for hematopoietic progenitors and established a foundation for integration with their CITE-seq and TEA-seq defined transcriptional and chromatin states. The Flow Cytometry Standard file for the final InfinityObject can be re-analyzed using standard commercially-available flow cytometry data analysis software (see Data Availability).
Integrated atlases permit isolation of cluster-defined cells
CITE-seq ADT values and InfinityFlow signals have distinct distributions and sensitivity. Assuming surface protein expression rankings remain consistent49,50, we tested whether the percentiles of surface protein expression remained consistent across platforms using a two-step, percentile-based normalization strategy (Fig. 3a). First, we mapped the CITE-seq antibody signal intensities for the gated HSC-MPP, MultiLin or CD127+ populations to the signal intensities of corresponding percentiles within the total CD117+ population. Next, we generated complementary spline functions to map InfinityFlow signals with their percentile ranks within the CD117+ compartment. This can be visualized and calculated as the area under the curve for the kernel density estimation (KDE) functions for the signal of each antibody. This two-step process enabled the inference of expected InfinityFlow fluorescence intensity for any given CITE-seq ADT value (Fig. 3a).
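The two-step percentile mapping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it substitutes an empirical cumulative distribution and linear interpolation for the KDE and spline functions described in the text, and the function names (`percentile_rank`, `quantile`, `adt_to_flow`) are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(value, reference):
    """Step 1: fraction of reference values <= value (empirical CDF).

    `reference` stands in for the ADT signal of one antibody across
    the total CD117+ CITE-seq population.
    """
    ref = sorted(reference)
    return bisect_right(ref, value) / len(ref)


def quantile(reference, q):
    """Step 2: value found at quantile q (0..1), linearly interpolated.

    `reference` stands in for the InfinityFlow fluorescence of the same
    antibody across the CD117+ compartment.
    """
    ref = sorted(reference)
    if len(ref) == 1:
        return ref[0]
    pos = min(max(q, 0.0), 1.0) * (len(ref) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ref) - 1)
    return ref[lo] + (ref[hi] - ref[lo]) * (pos - lo)


def adt_to_flow(adt_value, cite_cd117, flow_cd117):
    """Expected fluorescence for an ADT value, matched by percentile."""
    return quantile(flow_cd117, percentile_rank(adt_value, cite_cd117))


# Toy, made-up signal values for a single antibody.
cite_cd117 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
flow_cd117 = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
expected_intensity = adt_to_flow(4, cite_cd117, flow_cd117)
```

The mapping depends only on ranks, so it is unaffected by the different dynamic ranges of the two platforms, which is the assumption stated at the start of this section.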
Fig. 3. Integration of CITE-seq atlas and InfinityFlow atlas enables validation of cluster-defined cell populations.

a, Diagram depicting normalization strategy applied to CITE-seq counts to approximate the signal distributions observed in InfinityFlow. b, UMAP embedding of curated InfinityFlow data trained using equal numbers of cells (n~1,150) for each transferred CITE-seq label (87 clusters sampled), colored by cluster. c, InfinityFlow UMAP embedding (top) with the indicated gated populations, and mapped CITE-seq cells from those gates (bottom). d, Transcription factor expression levels from either the InfinityFlow captures using transgenic fluorescent reporters imputed as part of the InfinityFlow object (upper), or the mRNA-count for the corresponding gene encoding the transcription factor (lower). e, “FS-scRNA-seq validation”: CITE-seq cell populations were selectively sorted based on defining InfinityFlow gates and profiled using scRNA-seq. The genomic purity of each sorted cell population is assessed across all CITE-seq defined clusters, following label transfer and displayed as a heatmap. X-axis = inferred sorted population, Y-axis = scTriangulate reference cell states (% mapped cells indicated by color bar); initial gates targeting specified states (top), gates from Pronk et al.13 (middle), and new gates targeting MultiLin populations (bottom). f-g, Statistical evaluation of CITE-seq to InfinityFlow label transfer using Adjusted Rand Index (ARI), considering equivalent flow cytometry defined cell populations (CITE-seq, InfinityFlow); evaluation workflow (f), ARI of label transfer approaches tested (g). KDE: kernel density estimate, cH: cellHarmony
Using the software cellHarmony51 re-implemented in Python3, we transferred the transcriptome-defined CITE-seq cluster labels into the InfinityFlow atlas (protein features), resulting in well-separated populations (Fig. 3b). In addition to cell populations, capture gates from CITE-Seq generally correspond to the matching gates from InfinityFlow (Fig. 3c). Projection of the CITE-Seq directly within the InfinityFlow UMAP embedding enables direct correspondence between RNA expression and transcription factor reporter abundance, illustrating the fidelity of our alignment strategy (Fig. 3d, Supplementary Table 4). Comparison of the CITE-Seq projected labels with historical flow cytometry gating schemes by InfinityFlow (Fig. 2b) found a strong concordance for specified cell-states (i.e., erythroid, megakaryocytic, monocytic, dendritic, neutrophil), while MultiLin clusters overlapped with earlier delineations of CMP, preGM and GMP, suggesting our approach has the potential to resolve new cell states and gating solutions (Extended Data Fig. 3).
To rigorously test label transfer into the InfinityFlow object, we developed an in silico isolation strategy called flow-sort-scRNA-seq validation (“FS-scRNA-seq”) in which gates are predicted to enrich for target cell populations then utilized to capture populations for scRNA-Seq and mapped back to the atlas to evaluate enrichment. Given clear transcriptomic signatures3,24,44, we first isolated granulopoiesis states, and optimized surface marker combinations for sorting (Extended Data Fig. 4a). InfinityFlow-guided gates resulted in higher enrichment for the neutrophil subsets (Fig. 3e) than in silico gating (Extended Data Fig. 3). Within the Ly6C−Irf8− GMP-gate fraction CD9 marked eosinophil and basophil-mast-cell lineage specification (Extended Data Fig. 4a). Eosinophil commitment exhibited a clear developmental progression as CD125+Ly6C+CD106− EoP1 became CD125+Ly6C+CD106+ EoP2. While both EoP1 and EoP2 were absent in 10X Genomics captures, they were marked by a lineage-specific CRE reporter (EpxCreROSAtdTomato+)(Extended Data Fig. 4b–e) and were found in well-based captures albeit with low UMI counts (Extended Data Fig. 4f–i).
Next, we replicated gating schemes that sub-fractionated the CMP gate into progenitors specifying to granulocyte-monocyte or megakaryocytic-erythroid lineages13. FS-scRNA-seq validation revealed that preMegE13 and erythroid/megakaryocyte gates13 were enriched for corresponding CITE-seq clusters (Fig. 3e). The preGM13 gate was highly heterogeneous and contained MultiLin clusters (Fig. 3e). To further enrich MultiLin clusters, we developed the algorithm Ab-MarkerFinder that iteratively optimizes surface marker combinations (based on unique expression patterns) for each target population (Methods). We applied this algorithm to all cell populations to predict optimal markers for isolation (Supplementary Table 5). These refined gates improved the isolation of several MultiLin populations (Extended Data Fig. 5a–5f). We achieved over 50% enrichment for the ML-1b cluster (Fig. 3e), while refined gates targeting specific MPP, MEP, BMCP and IG2 populations delivered high purity in FS-scRNA-seq validation (Fig. 3e). Thus, our InfinityFlow-derived strategy surpassed the resolution of historical gates to isolate unique populations spanning multipotent and oligopotent states.
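The idea of iteratively optimizing a marker combination for a target population can be illustrated with a greedy search. The sketch below is not Ab-MarkerFinder itself, whose procedure is described in Methods; it assumes binarized (positive/negative) marker calls per cell, adds one marker at a time when it raises the purity of the gated cells, and enforces a minimum recall of the target. The function name `greedy_gate`, its parameters and the toy cells are all hypothetical.

```python
def greedy_gate(cells, labels, target, markers, max_markers=3, min_recall=0.5):
    """Greedily pick (marker, sign) pairs that raise target purity.

    cells  : list of dicts mapping marker name -> bool (positive call)
    labels : cluster label of each cell, same order as `cells`
    Returns the chosen gate and the purity of the gated cells.
    """
    n_target = sum(1 for lab in labels if lab == target)
    if n_target == 0:
        return [], 0.0

    def purity(idx):
        return sum(labels[i] == target for i in idx) / len(idx) if idx else 0.0

    keep = list(range(len(cells)))
    gate, best_purity = [], purity(keep)
    for _ in range(max_markers):
        best = None
        for marker in markers:
            if any(marker == chosen for chosen, _ in gate):
                continue  # each marker is used at most once
            for sign in (True, False):
                idx = [i for i in keep if cells[i][marker] == sign]
                recall = sum(labels[i] == target for i in idx) / n_target
                if recall < min_recall:
                    continue  # gate would lose too many target cells
                p = purity(idx)
                if p > best_purity and (best is None or p > best[0]):
                    best = (p, marker, sign, idx)
        if best is None:
            break  # no remaining marker improves purity
        best_purity, marker, sign, keep = best
        gate.append((marker, sign))
    return gate, best_purity


# Toy, made-up cells; marker calls are illustrative only.
cells = [
    {"CD27": True, "CD55": True, "CD371": False},
    {"CD27": True, "CD55": True, "CD371": False},
    {"CD27": True, "CD55": False, "CD371": True},
    {"CD27": True, "CD55": False, "CD371": True},
    {"CD27": False, "CD55": True, "CD371": False},
    {"CD27": False, "CD55": False, "CD371": True},
]
labels = ["ML-3", "ML-3", "ML-2", "ML-2", "MEP", "Mono"]
gate, gate_purity = greedy_gate(cells, labels, "ML-3", ["CD27", "CD55", "CD371"])
```

A greedy search is easy to follow but can miss marker pairs that are only informative together; continuous intensities and gate thresholds would also need to be handled in a real application.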
To benchmark these correspondences, we took advantage of the fact that scRNA-seq captures from FS-scRNA-seq validated gates (Fig. 3e) could be used to provide labeled data in cases in which the transcriptome profile of specific flow cytometry gates is already known (Fig. 3f). By selecting populations with mutually exclusive transcriptome states (Extended Data Fig. 5g,h), we could quantify the success rate of different strategies to project labels from CITE-seq to InfinityFlow (Extended Data Fig. 5i). KDE mapping normalization plus cellHarmony label transfer outperformed the other tested techniques49,52–54 for transferring labels between CITE-seq and InfinityFlow (Fig. 3g). Thus, our framework for label transfer, which accounted for technological differences, outperformed alternative algorithms and signal normalization approaches.
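For readers unfamiliar with the Adjusted Rand Index used in this benchmark, the sketch below computes it from two label vectors using the standard pair-counting formula. It is a generic illustration rather than the evaluation code used in the study; the function name and the toy labels are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same cells.

    1.0 means identical partitions (up to renaming of labels);
    values near 0 are expected for random label assignment.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0
    # Pairs of cells that share a cluster in both labelings.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs of cells that share a cluster in each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (sum_ij - expected) / (max_index - expected)


# Toy example: gate-derived labels versus transferred labels.
gate_labels = ["MEP", "MEP", "IG2", "IG2"]
transferred = ["a", "a", "b", "b"]
ari = adjusted_rand_index(gate_labels, transferred)
```

Because the index is corrected for chance, it allows label transfer strategies that produce different numbers of clusters to be compared on the same scale.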
Deep learning maps transcription factor activity
To delineate GRNs we profiled open chromatin regions within HSC-MPP- and MultiLin-gated cells using TEA-seq18, a trimodal assay that measures nascent transcriptomes, surface epitopes and chromatin accessibility (Fig. 4a and Extended Data Fig. 6). To define TEA-seq cluster analogs of the CITE-seq atlas, we applied the harmonypy55 workflow, and identified TEA-seq clusters corresponding to 57 of the 87 CITE-seq clusters with highly concordant marker genes (Extended Data Fig. 6c, Extended Data 7a–c). We validated the success of label transfer with Spearman correlations of marker gene rankings (Extended Data Fig. 7d, 7e). For the ATAC-seq cell states with sufficient read depth (32 of the 57) we inferred transcription factor activity using the ChromBPNet neural-network-modeling framework56. This enabled transformation of TEA-seq-cluster chromatin-accessibility profiles into base-pair-resolution ‘contribution scores’ that predict the importance of each DNA base pair to chromatin accessibility (measured as peak counts) within the given region (Fig. 5a, Extended Data Fig. 6). Short DNA sequences with similar contribution score profiles (seqlets) were clustered into contribution weight matrices (CWMs) using TF-MoDISco57 and assessed for their similarity to known transcription factor DNA-binding motifs using the CIS-BP2 database58. The CWMs were then scanned across all open chromatin regions to presumptively identify transcription factors that control accessibility and gene expression within each cell state. This analysis involved pairwise correlations of seqlet contribution score values for a given open-chromatin region and expression of a target gene located within the same topologically-associating domain (TAD)59. TF-gene interactions were prioritized based on the strength of Pearson correlation between seqlet contribution score and CITE-seq gene expression values across the 32 mapped TEA-seq clusters (Methods). 
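The TAD-restricted correlation step can be sketched as below. This is a simplified illustration under stated assumptions: each seqlet and gene carries a single TAD identifier, scores and expression are given as per-cluster vectors in the same cluster order, and links are kept above a Pearson threshold of 0.4 (the cutoff quoted in the Fig. 4b legend). The function names, identifiers and numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(vx * vy)


def link_seqlets_to_genes(seqlet_scores, gene_expr, seqlet_tad, gene_tad, min_r=0.4):
    """Return (seqlet, gene, r) for same-TAD pairs with r > min_r.

    seqlet_scores : seqlet id -> contribution score per cluster
    gene_expr     : gene name -> expression per cluster
    seqlet_tad, gene_tad : id -> TAD identifier
    """
    links = []
    for seqlet, scores in seqlet_scores.items():
        for gene, expr in gene_expr.items():
            if seqlet_tad[seqlet] != gene_tad[gene]:
                continue  # only test pairs within the same TAD
            r = pearson(scores, expr)
            if r > min_r:
                links.append((seqlet, gene, r))
    return sorted(links, key=lambda link: -link[2])


# Toy, made-up values across four clusters.
seqlet_scores = {"GATA_seqlet": [0.1, 0.2, 0.8, 0.9], "CEBP_seqlet": [0.9, 0.8, 0.1, 0.2]}
gene_expr = {"Cd55": [1, 2, 8, 9], "Clec12a": [9, 8, 1, 2]}
seqlet_tad = {"GATA_seqlet": "tad1", "CEBP_seqlet": "tad2"}
gene_tad = {"Cd55": "tad1", "Clec12a": "tad2"}
links = link_seqlets_to_genes(seqlet_scores, gene_expr, seqlet_tad, gene_tad)
```

Restricting the comparison to a shared TAD keeps the number of tested pairs manageable and limits links to elements that can plausibly contact the gene.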
Seqlets were found to cluster together with patterns specific to groups of CITE-seq clusters (e.g. Ets, RFX and C2H2-ZF factors in Sca+ HSC-MPP clusters, bZIP and Ets factors in MultiLin, neutrophil and monocytic clusters, and GATA and bHLH factors in erythroid and megakaryocytic clusters; Fig. 4b,c).
Fig. 4. Base resolution gene regulatory network predictions reveal divergent cell fate decisions.

a, ChromLinker workflow to identify DNA motifs underlying differential gene expression. Harmony transfer of CITE-seq cluster labels to TEA-seq (RNA), then clustered TEA-seq ATAC data was analyzed by ChromBPNet. b, Heatmap of the 80,441 seqlet (base resolution importance of chromatin accessibility) contribution score values significantly correlated (Pearson >0.4) to the expression of the 4,601 top-ranked variably-expressed genes. Correlations were restricted by the TAD-delimited genomic regions encompassing associated genes. c, Tick marks denote seqlet patterns for corresponding annotated transcription factor families defined by CIS-BP2 (left), with callouts to recurring motif patterns (below). d, Cytoscape visualization of the predicted GRN with major transcription factors (Hlf, Gata2, Gata1, Irf8, and Spi1) and their connections highlighted. transcription factor=triangle, target gene=circle. e, Cluster-specific GRN (ML-2, ML-3) with gene nodes colored and scaled according to their relative expression levels. f, Heatmap of “TF activity” composite score: Z-score integrating target gene expression, transcription factor expression, and regulatory contribution of the transcription factor to its putative target genes in each of the 32 clusters. g-j, Dot plot comparisons of the InfinityFlow fluorescent reporter level (vertical axis) and activity (horizontal axis) across each of the 32 mapped clusters for SPI1 (g), GATA2 (h), GATA1 (i), and IRF8 (j).
Fig. 5. Simulated perturbation of transcription factors nominates Cd55 and Clec12a (encoding CD371) as MultiLin progenitor reporters.

a, Heatmap of the activity scores aggregated on exemplar surface-protein-encoding genes. b, CellOracle cell identity shifts (black arrows) following simulated knock-out of Gata2, Gata1, Spi1, Irf8, Cebpa, and Cebpe projected over the integrated scRNA-seq UMAP coordinates. The ML-1b (purple), ML-2 (turquoise) and ML-3 (orange) clusters are highlighted to illustrate the hypothesized lineage (ML-1b/ML-2 versus ML-3) bifurcation event. c, Heatmap illustrating cell-state specific differential expression of genes encoding surface proteins (same as a) followed by perturbation of the selected transcription factors. d-f, ChromBPNet base resolution contribution score genome browser tracks for selected MultiLin clusters paired with dot plots of gene expression. Green arcs indicate peaks with accessibility correlated to target gene expression. Gray triangles are used to show magnification of the peak regions to show the base resolution contributions. Yellow background highlights are used to indicate the seqlet position that correlates to target gene expression: CEBP seqlet correlated to Clec12a expression (d), EICE (IRF+ETS composite element) seqlet correlated to Clec12a expression (e), and GATA seqlet correlated to Cd55 expression (f). The EICE element is within a peak distal to the Clec12a gene.
To define cell-state-specific regulatory interactions within the stem/progenitor compartment, we restricted ChromBPNet predicted transcription factors to targets overlapping top cluster marker genes. For this analysis, we used the CITE-seq RNA expression data paired with the seqlet contribution scores of linked open chromatin regions and known transcription factor DNA-binding motif annotations to prioritize transcription factor-target gene links (Methods, Extended Data Fig. 6; Supplementary Table 5). Cytoscape network visualization of the predicted TF-gene interactions in various cell states was consistent with known developmental functions of the transcription factor Hlf in stem cells, Gata2, Gata1, Myc and Nrf1 in megakaryocytic-erythroid progenitors and Spi1 and Irf8 in monocytic-dendritic cell progenitors (Fig. 4d). Cell-state-specific models highlighted the known activity of Hlf in qHSCs, which lacked lineage-specific transcription factor activities (Fig. 4e, Extended Data Fig. 8a). MultiLin states ML-1b and ML-2 showed step wise priming of Pu.1 and Irf8 activity, which was more pronounced in IG2-proNeu1 and ML-MDP (Fig. 4e, Extended Data Fig. 8a). In contrast, ML-3 showed nascent Gata1 and Gata2 activity, which was retained in BMCP and became more pronounced in MEP (Fig. 4e, Extended Data Fig. 8a). Thus, cluster-specific GRNs were constructed from base-pair resolution contribution scores, the expression of transcription factors and their target genes.
To generate a comprehensive map of transcription factor activities across the 32 identified TEA-seq clusters, we utilized a composite activity Z-score by integrating transcription factor expression, target gene expression, and the regulatory contribution of the transcription factor to its putative target genes in each of the 32 clusters (Methods). This analysis revealed a distinctive set of transcription factors that were more active in HSC-MPP and MultiLin cells (Fig. 4f). Importantly the regulatory network inferred activities of SPI1, GATA2, GATA1, IRF8, and MYC corresponded with their InfinityFlow transcription factor-reporter expression values across the 32 cell states (Fig. 4g–j and Extended Data Fig. 8b). Thus, activity scores were well supported by their InfinityFlow reporter expression values after integration by cluster label propagation (Fig. 3d,e).
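One plausible way to combine the three components into a single score is sketched below. The exact formula is given in Methods and is not reproduced here; this illustration simply assumes that each component is standardized across clusters, averaged, and standardized again. The function names and input values are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to zero mean and unit (population) variance."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def composite_activity(tf_expr, target_expr, contribution):
    """Composite per-cluster activity score for one transcription factor.

    Each argument holds one value per cluster, in the same cluster order:
    tf_expr      : expression of the transcription factor
    target_expr  : mean expression of its putative target genes
    contribution : mean seqlet contribution score at those targets
    Assumption: equal weighting of the three standardized components.
    """
    parts = [zscores(tf_expr), zscores(target_expr), zscores(contribution)]
    combined = [mean(column) for column in zip(*parts)]
    return zscores(combined)


# Toy, made-up values for three clusters.
activity = composite_activity([0, 1, 2], [0, 1, 2], [0, 1, 2])
```

Standardizing each component first prevents the one with the largest numeric range from dominating the composite.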
Surface proteins predict transcription factor activities
We sought to identify surface proteins that reported the activity of transcription factors underlying hierarchical lineage specification. When we aggregated activity scores for individual genes encoding dynamically expressed surface proteins in the MultiLin clusters, we noted that CD55 was expressed on early basophil, mast cell, erythroid and megakaryocytic progenitors and that Cd55 expression was linked to the activity of Gata1 and/or Gata2 transcription factors, while CD371 was expressed on early eosinophil, neutrophil, monocyte and dendritic cell progenitors and Clec12a expression was linked to the activity of Spi1, Irf8, Cebpa and Cebpe (Fig. 5a, Extended Data Fig. 9b). Using our CITE-Seq expression data, CellOracle60 in silico knock-out (KO) of Gata1 or Gata2 downregulated Cd55 expression and shifted trajectory away from CD55+ ML-3 (Fig. 5b–c, Extended Data Fig. 9b, Supplementary Table 6). Conversely, KO of Spi1, Irf8, Cebpa, or Cebpe downregulated Clec12a expression and shifted trajectory away from CD371+ ML-2 (Fig. 5b–c, Extended Data Fig. 9b, Supplementary Table 6). Thus, while inducing the expression of CD55 or CD371, these transcription factors act dynamically within discrete MultiLin cells to regulate the specification of erythroid or myeloid cell fates.
To determine whether the genomic regions implicated in the assembled GRN (Fig. 4b–d) were unique or known, we tested their enrichment in published epigenomic profiling experiments.61 The majority (~68%) of seqlet positions in the GRN overlapped with previously identified candidate cis-regulatory elements (cCRE)(ENCODEv461), across all 32 TEA-seq clusters (Extended Data Fig. 8c–d). To validate the identity of the transcription factors inferred to act on these cCRE (Extended Data Fig. 8e), we used GIGGLE indexing and search62 to query a collection of over 3,500 ChIP-seq datasets covering 347 transcription factors (CistromeDB63). We found a strong (~90%) enrichment between the transcription factor families corresponding to seqlets (Gata1, Gata2, Irf8, Spi1, Cebpa, and Cebpe) and their ChIP-seq binding sites (Extended Data Fig. 8f and 8g). Restricted by TADs59, analysis of cCREs identified candidate binding positions for CEBP factors (Cebpa and Cebpe), as well as ETS-IRF composite elements64 (Irf8 and Spi1) correlated to expression of Clec12a (Fig. 5d, 5e), while binding sites for GATA factors (Gata1 and Gata2) correlated to expression of Cd55 (Fig. 5f). Thus, the transcription factor inferences were validated by published ChIP-seq data63, and their chromatin activity linked to expression of target genes.
Transcription factors induce hierarchical lineage restriction
Lineage priming was suggested to be dictated by the onset of multilineage transcription factor activity that is selectively reinforced and resolved in distinct patterns that determine particular lineages4. To determine the extent of mixed-lineage priming in each cell population, we computed the likelihood that a given cell state coincidently expresses defining markers for all committed lineage cell states (termed sc-Hrödinger). HSC had the lowest sc-Hrödinger scores, while MultiLin clusters had among the highest scores across CITE-Seq cell states (Extended Data Fig. 9a). Marker genes for neutrophils (e.g., Elane, Prtn3, Ms4a3), eosinophils (e.g., Hdc, Cebpe, Ldhc), monocytes (e.g., F13a1, Ly6c2, Slpi) and dendritic cells (e.g., Ccr2, Rab7b, Ms4a4c) were primed in ML-2 cells, whereas marker genes for MkP (e.g., Pf4, Slc14a1, Gp5), erythroblasts (e.g., Klf1, Aqp1, Tspo2), basophils (e.g., Ms4a2, Alox5, Cd200r3) and mast cells (e.g., Scin, Gpr183, Kcnc1) were primed in ML-3 cells (Extended Data Fig. 9b). An alternative approach to identify fate transitions, Capybara65, identified lineage priming predictions concordant with sc-Hrödinger (Extended Data Fig. 9c).
Next, to infer lineage trajectories in our CITE-Seq atlas between predicted cell states, we applied the partition-based graph abstraction software (PAGA66). PAGA predicted 95 cell state interactions and prior-validated precursor-progeny relationships among bi-potential intermediate cell-states (i.e., IG2 to neutrophil and monocyte) and HSC-MPP to megakaryocytes (Extended Data Fig. 9d), suggesting that many inferred relationships are real67. To experimentally test the PAGA predictions, we used CellTag68, a lentiviral library with expressed barcodes and GFP, captured in single-cell transcriptomes to facilitate lineage-tracing and enable clonal tracking (Extended Data Fig. 9e). Both Sca+ (Lin−Kit+Sca+CD27+) and Sca1− (Lin−Kit+Sca−CD27+; ‘MultiLin’) populations were sorted from C57/Bl6 bone marrow, transduced with CellTag vectors and cultured as technical replicates with BMEC-Akt69 immortalized endothelial cells for 48h. GFP+ cells were sorted and captured for scRNA-seq (Extended Data Fig. 9e) and the transcriptomes were annotated based on the CITE-seq atlas clusters using cellHarmony. Using a previously described clonal coupling score70, considering statistically significant interactions with evidence from both technical replicates (Methods), we identified 128 predicted cell state interactions, representing potential lineage trajectories (Fig. 6a, Online Methods). Although ML-1b had connections to itself and to ML-2, few ML-1b cells were identified (Supplementary Table 7 and Source Data). Similarly, few mast cells, EoP and basophil cluster cells were identified (Fig. 6b), indicating a bias in the developmental output or capture efficiency of specific populations in this in vitro system. CellTag interactions confirmed 63 of the 95 relationships predicted by PAGA (Extended Data Fig. 9d). When projected into the CITE-seq atlas UMAP (Fig. 6b), the CellTag interactions confirmed known lineage relationships and nominated new ones, as noted below (Supplementary Table 7).
Newly-detected relationships that involved MultiLin states had a lower clonal coupling score (6) compared to relationships between known populations (15.6) (Supplementary Table 7), consistent with the transitory nature of the MultiLin states. Strong links (clonal coupling score >6) were observed between MkP, ERP, and ML-3 populations (Fig. 6b), and between MDPs, monocytes and neutrophil populations (Extended Data Fig. 9b), suggesting a divergence between CD55+ ML-3 and CD371+ ML-2 MultiLin states. Distinct clonal relationships were observed for ML-2 (BMCP, IG2-MP, MDP-2) and ML-3 (MkP-HSC, ERP-HSC)(Fig. 6b), suggesting they were distinct cell states, rather than arbitrarily delineated subsets. These clonal relationships were also independently observed in a prior generated inducible in vivo lineage-recording mouse dataset of hematopoiesis (DARLIN71)(Fig. 6b).
Fig. 6. Divergent regulatory logic mapped to surface protein expression.

a, CellTag-multi lentiviral barcoding workflow, performed on two transduced progenitor subsets (Sca+, Sca−)(n=2 technical replicates). b, A heatmap of the shared CellTag barcodes between cell states. c, Conserved in vitro (CellTag) and in vivo (DARLIN) clonal coupling relationships projected over the integrated scRNA-seq UMAP and colored by the transduced input population. Width of edges denotes average clonal coupling scores. d, InfinityFlow UMAP with cells marked by lineage-defined-CRE activation of ROSA-LSL-tdTomato reporters: (red) Epor-CRE: erythroid, (orange) Epx-CRE: eosinophil, (yellow) Mrp8-CRE: neutrophil, (green) CD11c-CRE: dendritic cell. Dashed lines denote MultiLin populations to be tested for developmental potential (future CRE activity): (magenta) CD55+CD371−, (light blue) CD55−CD371−, (purple) CD55−CD371+. e, Cells in the above gates were index-sorted and single cells were cultured for 5 days. The output of single-cells is classified as either all tdTomato/CRE− (gray), all tdTomato+ (red), or a mix of tdTomato+ and tdTomato− cells (purple). f, Violin plots reveal clonogenicity (cell count) and lineage fate assessment (color) of MultiLin single-cell cultures.
To address the functional potential of MultiLin populations, we used lineage-specific EpoRCre ROSA−LSL-tdTomato (erythroid), EpxCre ROSA−LSL-tdTomato (eosinophil), Mrp8Cre ROSA-LSL-tdTomato (neutrophil) and CD11cCre ROSA−LSL-tdTomato (dendritic cell) reporter mice. Bone marrow from these mice was introduced into the InfinityFlow object (Fig. 6c). Notably, Cre activity was not detected in MultiLin cells (Fig. 6c). Individual cells sorted from CD55+/−CD371+/− MultiLin gates (Extended Data Fig. 10a) were cultured for 5 days in StemSpan SFEM with cytokines (Methods) and assessed in terms of clonogenicity and expression of tdTomato (Extended Data Fig. 9f). Cells sorted from the CD55−CD371− MultiLin gate gave rise to tdTomato+ cells marked by all four CRE (Fig. 6d), indicating they represented an early progenitor. Cells sorted from the CD55+CD371− MultiLin gate, but not the CD55−CD371+ MultiLin gate, gave rise to EpoRCRERosatdTomato+ cells, whereas cells sorted from the CD55−CD371+ MultiLin gate gave rise to tdTomato+ cells marked by Epx-CRE, Mrp8−CRE and CD11c−CRE (Fig. 6d). These data indicated that nascent and distinctive transcription programming within the diverse Lin−Sca−CD117+CD27+ MultiLin compartment selectively restricted their developmental potentials.
MultiLin states respond to perturbation
We developed a 27-color flow cytometry panel and visualized populations in the embedding space based on InfinityFlow in-silico gating (Fig. 7a). We found general concordance between the expected cluster content of in-silico-sorted InfinityFlow populations and their flow-sort-scRNA-seq analysis (Extended Data Fig. 10b). The lineage markers CD117, Sca1, CD27 and Ly6C enriched the MultiLin fraction (Fig. 7a); CD371 and Irf8 identified the initial split to eosinophil, neutrophil, monocyte and dendritic cell outputs (Fig. 7a); while CD55 revealed basophil, mast, erythroid and megakaryocyte cell outputs (Fig. 7a). Within the CD371+ populations, Irf8lo cells could be divided based on Ly6C (neutrophils) or CD125 (eosinophils) expression (Fig. 7a), while CD115 or CD135 expression marked monocytes or MDP, respectively, in the Irf8hi fraction (Fig. 7a). The CD55+ cells could be split by differential expression of CD150, ITGB7 and CD41 for Meg/eryth lineage populations, while differential expression of CD131, FCER1A and CD117 could distinguish basophil/mast lineage populations (Fig. 7a). The remaining CD55+CD150−CD131− gate contained an ITGB7+ population, which could be an erythroid-mast cell progenitor (EMaP), because they localized between erythroid and mast cell outputs in the InfinityFlow object (Fig. 7a), and an ITGB7− population (Fig. 7a). Overall, this combination of markers provided a discrete separation of MultiLin fractions.
Fig. 7. Full spectrum flow cytometry profiling of MultiLin gene regulatory logic and response to infection.

a, Sequential flow cytometry gating to enrich MultiLin cell states. UMAP with in-silico gating to indicate the relative distribution of the gated populations (red highlighted cells; bottom right of each flow plot). Lin−CD117+CD27+Ly6clow to CD371 vs. Irf8GFP delineates broad MultiLin and bipotential populations. Lin−CD117+CD27+Ly6clowCD371−/lowIrf8−/low gate to CD55 vs. Irf8 identifies CD55−Irf8− ML1 gate (light green) and CD55−Irf8low ML2 gate (purple). CD55+ populations to CD150 vs. CD131 identifies CD150−CD131+ BMCP gate (blue) while CD150−CD131− to Lin vs. ITGB7 identifies Lin−ITGB7− ML3 gate (fuchsia) and Lin−ITGB7+ EMaP gate (light-violet). CD150+CD131− to ITGB7 vs. CD41 identifies ITGB7+CD41− ERP gate (red), ITGB7−CD41+ MEG gate (light-pink), and ITGB7−CD41− MEP gate (tan). Lin−CD117+CD27+Ly6clowCD371+Irf8− to CD131 vs. CD125 identifies CD131−CD125+ EOP gate (dark orange) and contaminating CD131+CD125− basophils (dark blue, also CD55+). CD131−CD125− to CD48 vs. Ly6C identifies CD48+Ly6C− EoNP gate (light orange), and CD48+Ly6Clow Neu gate (yellow). Lin−CD117+CD27+Ly6clowCD371+Irf8high to CD115 vs. CD135 reveals CD115+CD135− MP gate (green) and CD115−CD135+ MDP gate (light-blue). b-d, UMAPs representing InfinityFlow contents of gated populations; Lin−CD117+CD27+Ly6clow (b), Lin−CD117+CD27+Ly6clowCD371+ (c), and Lin−CD117+CD27+Ly6clowCD55+ (d). e, Transcription factor levels reported from above gated populations by InfinityFlow in-silico gating visualized as 2-dimensional contour plots (x-axis: imputed transcription factor signal, y-axis: forward scatter signal). f, Single cells were index-sorted from gates in A-D and cultured for 5 days, with output classified as either CRE-tdTomato positive or negative. The frequency of sorted cells with the indicated lineage-specific CRE (either Epor, Epx, Mrp8, or Cd11c) positive output is plotted as a bar plot (color of column matches color of gates in a). CD55+ and CD371+ populations indicated (below). 
g-i, Temporally resolved progenitor response to infection. Full-spectrum flow cytometry post-infection time course of cell state frequencies determined by indicated mature populations (g) or MultiLin gates (h). j, Heatmap of differentially expressed genes in MultiLin progenitors and associated bipotential cell-states at each timepoint post infection (cellHarmony, biological triplicates).
To examine the developmental regulators underlying the separable MultiLin states, we analyzed their transcription factor expression profiles using in-silico gating of the InfinityFlow object (Fig. 7b–d). The CD371−CD55− ML1 gate cells expressed a mixture of antagonistic transcription factors, including intermediate levels of PU.1, IRF8, GFI1, GATA2 and GATA1 (Fig. 7e). This gating scheme detected previously-identified bistable switches72,73, including divergent PU.1 and GATA1 expression between “ML1” to “ML2” and “ML3” gate cells, an IRF8+GFI1+ population neighboring IRF8lo (eosinophil and neutrophil) and IRF8hi (monocyte and DC) populations, and a GATA1+GATA2+ population neighboring GATA2hi (basophil and mast cell) and GATA1hi (megakaryocyte and erythroid) populations (Fig. 7e). To trace the development of the gated populations, we used marrow from the lineage-specific-CRE ROSA−LSL-tdTomato reporter mice (as in Fig. 6), and index sorted single cells across the gates to assess CRE marking. The “ML1” gate cells gave rise to progeny marked by all four CRE reporters (Fig. 7f). EpoRCre ROSAtdTomato+ (erythroid) fate was enriched in the “MEP” and “ML3” gates (Fig. 7f); Epx (eosinophil) fate was enriched in “ML2” and “EoNP” gates (Fig. 7f); Mrp8 (neutrophil) fate was enriched in “ML2”, “EoNP”, “MP” and “MDP” gates (Fig. 7f); CD11c (DC) fate was enriched in “ML2”, “MP” and “MDP” gates (Fig. 7f), indicating the gating scheme resolved the divergent lineage potentials of the MultiLin cells by integrating cell surface markers, chromatin and gene expression states as well as expression of developmentally important transcription factors.
To determine whether MultiLin states respond to infection or stress, we analyzed their alterations in N. brasiliensis infection, which is known to induce a Th2-driven response, with the accumulation of basophils, mast cells and eosinophils in lungs and gut74. In Balb/c mice infected with Nippostrongylus brasiliensis (subcutaneous injection of 750 L3 larvae as published75,76), we performed full-spectrum flow cytometry and scRNA-seq capture of bone marrow cells prior to infection (D0) and at days 5, 7 and 10 after infection (Extended Data Fig. 10c). In comparison to D0 we detected a specific increase in the bone marrow CD55+CD371− ML3 gate at day 5 and day 7 and a simultaneous decrease in the CD371+CD55− ML2 gate (Fig. 7g,h), which coincided with an increase in bone marrow CD131+ basophil progenitors and FcER1a+ mast cell progenitors at day 7 (Fig. 7g,h). Differential gene expression analysis revealed a specific upregulation of genes in ML-3, BMCP, mast cell and basophil cluster cells starting at day 5 versus day 0 (Fig. 7i). In contrast, EoP cluster cells exhibited an upregulation of similar genes starting at day 3 (Fig. 7i), while the ML2 cluster cells (the putative EoP precursor) were reduced at day 3, and did not show the same transcriptional signal (Fig. 7i). The IL-5 receptor (CD125) was detected as a specific marker of EoP (Source Data), consistent with the fact that infection-driven IL-5 production77 amplifies CD125+ EoP78, indicating that, while infection-induced IL-5 directly signaled to nascent EoP, the expansion of the ML3 gate cells that preceded the increase in BMCP gate cells confirmed its role as a precursor to the basophil/mast cell fate. Thus, the specific responses of MultiLin populations to N. brasiliensis infection supported their developmental and physiological importance in the hematopoietic system.
To provide additional evidence for the role of ML-1, ML-2 and ML-3 cells across diverse in vivo perturbations (including genetic, cancer, infection and aging), we performed a retrospective analysis of 13 published scRNA-seq datasets79–91 spanning 22 distinct perturbations (see Methods). This analysis identified reproducible concordant shifts in ML-1 that corresponded to increased lymphoid production, in ML-2 that corresponded to monocytic, dendritic and neutrophil production, and in ML-3 that corresponded to BMCP and ErP/MkP enriched outputs (Extended Data Fig. 10d–e). Thus, MultiLin subsets behaved as key determinants of biased lineage outputs of the hematopoietic system in the context of diverse perturbations spanning infection, cancer and disease.
Discussion
In this study, we developed a unified framework that integrates multimodal single-cell analyses (CITE-seq, TEA-seq, and InfinityFlow) to isolate and characterize distinct hematopoietic progenitor populations. By combining transcriptional, chromatin, and surface protein features, we identified novel multilineage populations and mapped their functional potentials, offering a broadly applicable strategy for complex tissue systems.
This framework bridges classical surface-marker-based flow cytometry with genomic approaches, enabling the isolation and validation of progenitor states based on dynamically regulated transcription factors and stable molecular features such as open chromatin and surface proteins. We identified multipotent progenitors beyond traditional gates and characterized MultiLin cells based on their distinct gene regulatory networks (GRNs). Using this framework, we were able to isolate cells with the expected gene expression patterns, track the developmental potentials of different progenitor states and explore the dynamic GRNs underlying lineage specification. A public resource for exploration of these GRNs is provided (https://altanalyze.org/MarrowAtlas/).
One of the key observations from this study was the dynamic instability of MultiLin cells, which express co-regulated alternative-lineage determinants.4 This instability suggests a stochastic process of lineage restriction, as supported by prior findings in HSPCs4,92 and enteroendocrine progenitors.93 The MultiLin GRNs reflect dynamic activities of CEBP, IRF, and GATA family transcription factors, which cooperate or antagonize each other to regulate open chromatin regions.94–96 The resulting bifurcation induces the emergence of oligopotent progenitors, such as ML-2 and ML-3, which can be distinguished based on surface markers like CD371 and CD55. These results challenge previous models suggesting lineage bias within HSC-MPP and instead highlight the role of nascent transcription factor activity in MultiLin to promote lineage restriction.
We argue that hematopoietic progenitors exist in discrete, hierarchically organized states (multipotent, oligopotent, bipotent, and lineage-restricted) defined by their integrated transcriptional, chromatin, and proteomic features. The hierarchically nested GRN are the fundamental basis of discreteness and stability of developmental states within the hematopoietic compartment and lineage trajectories. The GRN enable dynamic control of hematopoietic outputs during homeostasis and in response to various stresses. The distinct GRN clearly illustrate the complex developmentally-diverse progenitor content of classical hematopoietic flow gates10–12, and argue against lineage commitment as a continuum.14,15 We do not fully exclude the possibility of a continuum of lineage progression with some stable states, as proposed by the punctuated hematopoiesis model.14,16,17 However, as opposed to pools of variably lineage specified progenitors14,16,17, we argue that the intermediate states identified here (e.g. MultiLin cells) represent more stable hierarchical developmental stages with distinct developmental potentials determined by their underlying GRN. These stable states were validated using cell barcoding and lineage tracing methods, providing strong evidence for their role in controlling developmental outcomes. Our data also suggest that clonal memory and gene regulatory mechanisms likely influence the production of different lineages, which could provide insight into clonal heterogeneity and lineage restriction in hematopoiesis suggested by the “parallel hematopoiesis” model.97 Indeed, single HSC transplant results in heterogeneous production, some of which is unilineage.98
Online Methods
Mice
All procedures were performed according to a Cincinnati Children’s Medical Center Institutional Animal Care and Use Committee approved protocol (IACUC2023–0009).
Mice were housed on a 14-hour light/10-hour dark cycle at 72°F, in ventilated microisolator cages that are autoclaved sterile with irradiated isopad (medical grade cotton) bedding. Filtered reverse osmosis water pouches were made fresh daily. The diet was irradiated Teklad 2919. The facility is AAALAC accredited, and managed by specialized and ACLAM board-certified veterinarians.
Mice of both sexes between 4 and 8 weeks of age were used for these studies. Some mice were obtained from the Jackson Labs (Bar Harbor ME), including C57/Bl6 (Strain #:000664), Balb/c (Strain #:000651), and transgenic C57/Bl6 mice including Irf8-GFP transgenic mice38 (Strain #:027084), Plzf-GFPcre99 (Strain #:024529), Csf1r-EGFP100 (Strain #:018549), GFP-c-Myc KI101 (Strain #:021935), Ncr1gfp102 (Strain #:022739), Tcf7GFP flox103 (Strain #:030909), Ki67-RFP104 (Strain #:029802), ROSA-LSL-tdTomato105 (Strain #:007914), Epor-CRE106 (Strain #:035702), Mrp8-CRE107 (Strain #:021614), CD11c-CRE108 (Strain #:008068). Other transgenic mice were gifts of the investigators: Pu.1-YFP109, Gata1-mCherry110, Gata2-Venus72, Gfi1-tdTomato111, Tox-GFP112, Epx-CRE113, Ms4a3-CRE45, FUCCI2114.
Bone Marrow Cell Isolation and CD117 Enrichment
Mouse bone marrow cells were isolated from iliac crest, femur, and tibia bones of 6- to 8-week-old C57/BL6 male mice or Balb/cJ for N. brasiliensis infection modeling. Bones were crushed to obtain single cell suspension and CD117 enriched (130–097-146, Miltenyi Biotec) on Miltenyi AutoMACS (program Possel) according to manufacturer’s protocol.
Single-cell CITE-seq generation
CITE-seq was performed with 3 different ADT panel configurations of BioLegend TotalSeq-A antibodies: (1) a customized 65-plex cocktail (manually titrated using flow cytometry), (2) a prototype 200-plex cocktail for antibody titration (Universal 1.0 BioLegend Catalogue Number: 199901), and (3) a final titrated and lyophilized 110-plex cocktail (Supplementary Table 1: available as BioLegend custom Cat: 900003823) with fresh CD135 and CD127 added as spike-in ADT.
The 65-plex cocktail was comprised of known hematopoietic stem and progenitor markers as an initial test of the technology. ADT conjugated antibodies were incubated with AF647-fluorochrome-conjugated oligo-dT oligonucleotide (IDT). Using CD117-enriched mouse bone marrow as input, concentrations were estimated from flow cytometric titration of the AF647 signal. After review of these data, final concentrations were recommended by BioLegend. All antibodies used and associated adjustment from vendor concentrations are indicated in Supplementary Table 1.
The 200-plex cocktail was provided as a lyophilized mix, uniform for each TotalSeq antibody, which was rehydrated at 8x concentration and subsequently diluted to five concentrations: 4x, 2x, 1x, 0.5x, and 0.25x. The 1x concentration was determined based on the performance of ADTs in the 65-plex panel or according to the manufacturer’s recommendation. Five sorted populations were profiled with the 200-plex panel to comprehensively represent rare cell populations in the marrow: total bone marrow cells, CD117-enriched bone marrow cells, FS HSC-MPP cells (Lin−Sca1+CD117+ and exclusive of CD48+CD150−), FS MultiLin cells (Lin−CD117+CD34+CD16/32+/lowCD115−CD11b−Ly6C−CD150−), and FS CD127+ (Lymphoid) cells. A unique HTO antibody was added to denote each titration concentration. Aliquots of the same cell population stained with different titration concentrations were washed separately and then pooled.
For the titrated 110-plex panel, a customized lyophilized cocktail was created by BioLegend using selected concentrations from the 200-plex titration experiment. The titrated 110-plex cocktail (with fresh CD135 and CD127 added as spike-in ADT) was applied to the same five populations except for total bone marrow cells.
In addition, each sorted cell population from the various gates was stained with a unique HTO, washed and pooled. In each experiment, cells were washed on Laminar Wash MINI or HT2000 System (Curiox Biosystems Inc) with the following settings: 25 cycles, flow rate of 10uL/s, initial volume of 55uL. Cells were captured using the Chromium X with Chromium Next GEM Single Cell 3ʹ kit v3.1 chemistry (PN-1000268, 10X Genomics).
Fluidigm scRNA-seq
Flow cytometry sorted cells purified from murine bone marrow were captured using the C1 Single-Cell Auto Prep System (Fluidigm) and underwent scRNA-seq analysis as previously described.3
TEA-seq
TEA-seq was performed on HSC-MPP gate and MultiLin gate cells following the published protocol18 with minor modifications: (1) using a digitonin concentration of 0.025% based on optimization results, (2) the addition of Protector RNAse Inhibitor (Roche 3335402001) in perm, wash and tagmentation buffers, (3) two ports were loaded to create duplicate technical replicates, and (4) 80ul instead of 160ul of supernatant from cDNA amplifications was used to purify pre-amplified ADT. Briefly, 100,000 to 250,000 leftover cells from the CITE-seq experiment were permeabilized in digitonin perm buffer, washed and then processed according to the manufacturer’s protocol for Chromium Next GEM Single Cell Multiome ATAC + Gene Expression (PN-1000285, 10X Genomics).
Single cell library prep and sequencing
Library prep was performed according to protocols with minor modifications such as reduced PCR cycles for ADT and HTO library amplifications. We found pre-PCR ADT/HTO in our libraries were abundant enough that we generally reduced 2 cycles (ADT: 10–12 cycles, HTO: 9–10 cycles). Final transcriptome, ADT and HTO libraries were quantified and analyzed by Qubit dsDNA HS (1000268, Invitrogen), High Sensitivity DNA kit (5067–4626, Agilent Technologies) on 2100 Bioanalyzer (G2939BA, Agilent Technologies) and KAPA HiFi library quantification kit (KK4824, Roche). Dual-indexed transcriptome libraries were pooled and sequenced across multiple Illumina Novaseq 6000 S4 or X plus flow cells (Illumina) with PE150+10+10 or PE100+10+10 settings. Single indexed ADT and HTO libraries were pooled and sequenced on Illumina S2 flow cells with PE50+8 setting or with transcriptome libraries. BCL files were demultiplexed into fastq files for Cell Ranger input. “AT” was added to the end of RPI-x ADT i7 index (6bp) to match D70X_long HTO index (8bp). HTO and ADT FASTQ files were supplied as 3P feature barcode together with transcriptome FASTQ files into Cell Ranger V6.1.2 count pipeline. Transcriptome was mapped to mm10-v2 reference genomes for downstream analysis and visualization.
HTO calling and quality control
Cells were multiplexed using HTOs to distinguish CITE-seq ADT concentration or cell population. HTO barcode count matrices were obtained through the multimodal analysis workflow in Cell Ranger, prior to normalization (counts per ten thousand (CPTT)). Cell barcodes with > 30% of normalized reads assigned to multiple HTOs were annotated as doublets, and confident singlets were assigned when >40% of normalized reads mapped to a single HTO (HTO processing module of AltAnalyze v.2.1.4). Cells were further filtered based on the counts of the 7 mouse/rat isotype control antibodies (Source code), and quality control filtering was performed in Seurat V4 by nFeature_RNA>500 & nCount_RNA >1000 & percent.mt<25. This QC step filtered 393,748 cells to 315,792 high-quality single cells in the initial titration dataset and 90,889 to 72,198 cells in the final titrated CITE-seq dataset. Individual library metrics are provided for all single-cell genomic experiments in Supplementary Table 2.
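The HTO thresholds and QC cutoffs above can be sketched in Python as follows. This is only an illustrative reading of the text, not the AltAnalyze or Seurat implementation: it assumes that normalization yields per-cell HTO fractions and that the doublet rule means a second HTO exceeding 30% of a cell's normalized reads. The function names `call_hto` and `passes_qc` are hypothetical.

```python
def call_hto(hto_counts, doublet_frac=0.30, singlet_frac=0.40):
    """Classify one cell barcode from its per-HTO counts (dict: HTO name -> count)."""
    total = sum(hto_counts.values())
    if total == 0:
        return "unassigned", None
    # Per-cell fractions; scaling to counts per ten thousand would not change these ratios.
    fracs = sorted(((c / total, name) for name, c in hto_counts.items()), reverse=True)
    top_frac, top_name = fracs[0]
    second_frac = fracs[1][0] if len(fracs) > 1 else 0.0
    # Assumption: "multiple HTOs" is read as a second HTO exceeding the doublet cutoff.
    if second_frac > doublet_frac:
        return "doublet", None
    if top_frac > singlet_frac:
        return "singlet", top_name
    return "unassigned", None


def passes_qc(n_feature_rna, n_count_rna, percent_mt):
    """Cutoffs quoted in the text: nFeature_RNA>500, nCount_RNA>1000, percent.mt<25."""
    return n_feature_rna > 500 and n_count_rna > 1000 and percent_mt < 25
```

Under this reading, a barcode that fails both rules is left unassigned rather than forced into a class; whether the original module does the same is not stated in the text.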
Multimodal analysis
All Cell Ranger produced count matrices underwent ambient RNA exclusion using the software SoupX115 with a contamination fraction of 15% and quality control filtering by HTO. Ambient corrected transcriptome counts and associated ADT counts were supplied as input to the software TotalVI116 to obtain normalized and denoised ADT counts. To derive clusters from the initial titration CITE-seq datasets, the software cellHarmony was used to transfer labels from CPTT normalized expression centroids computed in author-provided labels from three previously published reference bone marrow atlases. Marker heatmaps were obtained using MarkerFinder in AltAnalyze using either single cells or combined donor pseudobulks for each scTriangulate cell population. For differential ADT analyses, we applied an empirical Bayes moderated t-test (FDR corrected). To quantitatively assess performance of the final titrated ADT mix relative to the original TotalSeq-A Universal 1.0 mix, we generated ADT mean expression values for all 87 scTriangulate clusters for all ADTs (TotalVI normalized) in both datasets and re-assigned cell identity to all cell barcodes using these centroids with cellHarmony (AltAnalyze 2.1.4). Performance was evaluated using the Adjusted Rand Score function in the Python scikit-learn library.
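For readers unfamiliar with the statistic, the adjusted Rand index used for this comparison can be written out without dependencies. The analysis itself used the scikit-learn function; the sketch below is a stand-alone illustration of the same pair-counting formula, and the function name `adjusted_rand` is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand index between two cluster assignments of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of cells co-clustered in both assignments, and in each one separately.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = in_a * in_b / total_pairs   # chance-level agreement
    maximum = (in_a + in_b) / 2            # agreement of identical partitions
    if maximum == expected:
        return 1.0
    return (both - expected) / (maximum - expected)
```

A value of 1 means the two panels recover identical cluster assignments (label names need not match), and values near 0 indicate chance-level agreement.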
HIVE scRNA-seq
The Honeycomb HIVE CLX capture system (Honeycomb Biotechnologies, Inc) was applied using the manufacturer’s instructions to the following populations: CD117-enriched bone marrow, sorted cells from basophil/mast enriched gates, and those from eosinophil enriched gates. Namely, CD117+ progenitors, and Lin-CD117+CD34+CD115−Ly6C− MultiLin populations along with eosinophil- and basophil-mast-cell-progenitor enriched gates (Extended Data Fig. 4a; Ly6C-1, Ly6C+ Eo Trajectory gates). Briefly, 30,000 cells of each population were loaded per HIVE by centrifugation and processed immediately without freezing. Libraries were quality checked and then pooled for sequencing on a Novaseq 6000 SP flow cell for PE50 using customized sequencing and indexing primers. Cells were called based on the UMI-per-cell knee plot, and the gene count matrix was generated by aligning to mm10-v2 for downstream analysis.
Decision-level cluster integration
The software scTriangulate (version 0.13.0) was applied to the optimized titrated CITE-seq compendium to define high-confidence multimodal single-cell clusters. As input for scTriangulate, clusters were derived from three separate sources as outlined below: 1) literature-centric supervised analysis, 2) independent transcriptome unsupervised clustering and 3) ADT unsupervised clustering. scTriangulate applies coalitional iteration with diverse single-modality or multi-modal input clustering solutions, to assess aggregate stability (Shapley value) of each overlapping annotation at a single-cell level. scTriangulate was run with 5 stability metric options: 1) Re-assign score, 2) Single Cell Clustering Assessment Framework (SCCAF) reclassification score and 3) Term Frequency-Inverse Document Frequency (TF-IDF) ranking the top first, fifth and tenth markers as separate scores (GitHub Repo). Among the 90 resulting scTriangulate clusters, three cluster labels and their cells were excluded due to high mitochondrial gene expression. The final clusters were refined by computing centroids for these 87 clusters from 100 random cells per cluster and reclassifying all parental cells using the software cellHarmony in AltAnalyze v.2.1.4 (correlation cutoff > 0.3). Cell population names were initially derived from Fisher exact test enrichment compared to the literature-defined cell cluster labels and manual curation based on identified marker genes. Prior to scTriangulate, literature-centric cell population annotations were derived from the aggregate of prior murine progenitor single-cell cluster annotations, using reference-based label projection. Reference centroids or defining gene-sets were selected from studies in which reliable evidence exists to demonstrate progenitor lineage potential (CFU, index sorting, secondary transplant).
Specifically, HSC-MPP subsets, dendritic cell, basophil/mast cell commitment, early lymphoid, and myeloid intermediate centroids were derived from each study2–4,11,21–25 and used to identify the best matching cells in our pre-titration scRNA-seq compendium using the software cellHarmony (top 50 scoring cells per centroid). The top-selected unique cells from all references were combined to produce an aggregate reference, using defined marker genes (MarkerFinder, AltAnalyze). The combined multi-study reference centroids were applied back to the optimized titrated CITE-seq dataset, using cellHarmony centroid classification (default options). To derive unsupervised RNA and ADT clusters in this dataset, we applied ICGS2 clustering (default options) in two rounds, to define subclusters for the principal identified lineages in AltAnalyze (Ensembl version 72 BioMart database). For each modality, sub-clustering was applied to initial ICGS2 clusters binned according to flow-gate inferred lineages (HSPC, MultiLin, CD117, CD127 enriched). From 28 initial transcriptome defined clusters, 58 ICGS2 sub-clusters were identified. From 13 initial ADT defined clusters, 93 ICGS subclusters were identified. Full source annotations are provided in Synapse (syn60529836).
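The centroid-based reclassification used throughout this section (cluster centroids, then assignment of each cell to its best-correlated centroid above a cutoff) can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and omits the random sampling of 100 cells per cluster; it is not the cellHarmony code, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def centroids(expr, labels):
    """Mean expression vector per cluster; expr is a list of per-cell vectors."""
    groups = {}
    for vec, lab in zip(expr, labels):
        groups.setdefault(lab, []).append(vec)
    return {lab: [sum(col) / len(vecs) for col in zip(*vecs)]
            for lab, vecs in groups.items()}


def classify(cell, cents, cutoff=0.3):
    """Label of the best-correlated centroid, or None if no correlation exceeds the cutoff."""
    best_label, best_r = None, cutoff
    for lab, cen in cents.items():
        r = pearson(cell, cen)
        if r > best_r:
            best_label, best_r = lab, r
    return best_label
```

Cells whose best correlation does not exceed the cutoff are returned as `None` here; how unassigned cells are handled downstream is a design choice this sketch leaves open.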
Cell fate prediction analysis
sc-Hrödinger is a Python module in the software AltAnalyze (https://github.com/nsalomonis/altanalyze/blob/master/stats_scripts/multiLineageScore.py) developed to compute the probability that a given cell state is consistent with mixed-lineage priming (metastability) based on the coincident expression of lineage-defining markers from presumed uni-potential committed progenitor cell states. sc-Hrödinger quantifies the degree to which a cell simultaneously expresses markers for multiple lineages (in >1 defined reference cell states, in >25% of cells of that cell state). Multipotent cell states are frequently defined by priming to one or more lineages. Given the markers for each reference cell state, the sc-Hrödinger score of a cell is computed as the mean of the gene-level expression calls across all lineage marker genes (1 if CPTT > 0 for a gene, otherwise 0). Unique non-overlapping marker genes are defined using the AltAnalyze MarkerFinder algorithm. In this analysis, MarkerFinder was run on 11 selected uni-potential committed progenitor cell states representing the major cell lineages identified in our CITE-seq analysis (aHSC, MkP, mast cell progenitor, basophil progenitor, eosinophil progenitor, monocyte progenitor, early neutrophil progenitor, early lymphoid, early erythroid, early conventional dendritic cell, early plasmacytoid dendritic). The top 100 marker genes per representative cell state were provided (Pearson Rho ranked) along with an expression file filtered to these genes (CPTT CITE-seq normalized counts) and associated cell barcode to cluster annotations.
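The per-cell calculation described above (binary expression calls averaged over all lineage marker genes) can be sketched as follows. This is a simplified illustration of the stated definition, not the multiLineageScore.py module: it omits the >25%-of-cells and >1-reference-state conditions, and the function names and toy marker sets are hypothetical.

```python
def cell_score(cell_expr, lineage_markers):
    """Mean of binary expression calls (1 if CPTT > 0) over all lineage marker genes.

    cell_expr: dict gene -> CPTT value for one cell
    lineage_markers: dict reference cell state -> list of marker genes
    """
    calls = [1 if cell_expr.get(gene, 0) > 0 else 0
             for genes in lineage_markers.values()
             for gene in genes]
    return sum(calls) / len(calls) if calls else 0.0


def state_score(cells, lineage_markers):
    """Average of the per-cell scores across the cells of one cell state."""
    scores = [cell_score(c, lineage_markers) for c in cells]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, with two toy marker sets `{"neutrophil": ["Elane", "Ms4a3"], "erythroid": ["Klf1", "Aqp1"]}`, a cell expressing only Elane and Klf1 scores 0.5.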
Capybara Cell Identity Analysis
Single-cell RNA sequencing data from the titrated CITE-seq experiment was processed and log-transformed from CPTT values after SoupX ambient RNA correction. Cell identity was quantified using Capybara (v1.0). First, a reference dataset was constructed using specific end-state clusters. Second, cell identity was measured across all of the captured cell states. Quadratic programming (QP) scores representing the continuous cell identity assignments were extracted from the query dataset to generate a heatmap visualization of z-score normalized QP values. Cells exhibiting characteristics of multiple end states were curated using the Capybara function `multi.id.curate.qp`, and the frequency of “Multi-ID” states for each input CITE-seq cluster identity was plotted as a percentage.
MultiLin Cell State Correlations with Perturbation
To determine the functional outputs associated with perturbations in distinct MultiLin cell states, we performed a retrospective analysis of 13 published scRNA-seq datasets79–91 spanning 22 distinct perturbations (pre-leukemic models, infection, inhibitors, knockouts, aging and disease) in single-cell bone marrow RNA-seq (GSE227026, GSE248396, GSE228562, GSE191147, GSE252833, GSE223632, GSE197407, GSE209742, GSE147729, GSE189217, GSE264087, GSE236407, GSE235798). Label transfer to scTriangulate CITE-Seq centroids (syn66721893) was performed using an optimized version of cellHarmony for direct use with 10x Genomics Chromium h5 and mtx format files (https://github.com/SalomonisLab/altanalyze3). Significant associations between the cell frequency of distinct MultiLin subsets and downstream differentiation outcomes were quantified using standardized chi-square residuals [(observed – expected)/sqrt(expected)] on the cell frequency counts table (R chisq.test function). The rank correlation of the standardized residuals was calculated to determine the extent to which there was an association between over(under)-representation of pairs of cell types. Multiple testing and dependencies among the p-values were accounted for using the Benjamini-Yekutieli (BY, R p.adjust function) FDR procedure, where a BYfdr ≤ 0.05 was considered statistically significant. This analysis yielded MultiLin associations with 64 downstream populations that were visualized as a heatmap.
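As a minimal sketch of the two calculations named above, the following standard-library Python computes the residuals with the bracketed formula, (observed – expected)/sqrt(expected), and applies the Benjamini-Yekutieli adjustment. The analysis itself used R (chisq.test, p.adjust); the function names and toy inputs here are hypothetical, and the sketch assumes no expected count is zero.

```python
import math


def chi_square_residuals(table):
    """(observed - expected) / sqrt(expected) for each cell of a counts table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            out.append((observed - expected) / math.sqrt(expected))
        residuals.append(out)
    return residuals


def by_adjust(pvals):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    m = len(pvals)
    c_m = sum(1.0 / k for k in range(1, m + 1))  # harmonic correction term
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, idx in enumerate(order):
        rank = m - offset  # rank of this p-value in ascending order
        running_min = min(running_min, pvals[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy 2x2 table of cell counts and toy p-values (illustrative only).
res = chi_square_residuals([[10, 20], [30, 40]])
adj = by_adjust([0.01, 0.02, 0.03])
```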
RShiny app development
Two RShiny apps were developed from the final Titrated ADT CITE-seq libraries: (1) an Azimuth reference atlas for label transfer of uploaded scRNA-seq datasets and (2) a ShinyCell interactive browser for exploration of single-cell populations, RNA and ADT expression and distinct dataset covariates. These apps are available at https://altanalyze.org/MarrowAtlas/. The multimodal Azimuth bone marrow reference RShiny interface was built following the Azimuth v0.4.6 instructions (https://github.com/satijalab/azimuth) using the neighbors from the titrated RNA data restricted to the top MarkerFinder marker genes (syn66721893). CITE-seq RNA counts were scaled and normalized as CPTT with clusters defined for four different annotation levels based on the multimodal scTriangulate clusters. A ShinyCell viewer for the healthy bone marrow compendium was generated using a formatted h5ad counts matrix with corresponding sample/cell-level metadata for the titrated dataset (syn66721894). Associated RShiny and h5ad creation scripts are linked to the datasets in Synapse.
Infinity Flow
Infinity Flow captures were performed using a 5-laser spectral cytometer (Aurora System, Cytek Biosciences). An initial backbone of 22 selected pan-lineage cell-surface protein markers was used for all InfinityFlow captures to enable regression to impute 95 additional fluorochrome-conjugated antibody signals and 18 transgenic fluorescent reporter signals. All flow cytometry fluorescence intensities were normalized using the logicle transformation as implemented in the pyInfinityFlow package using default parameters. Regression was carried out using pyInfinityFlow to apply the XGBoost algorithm54, enabling the imputation of 113 flow cytometry features, overlapping with the optimized CITE-seq panel, including the original backbone. As ground truth for XGBoost, we leveraged flow cytometry of ~100,000 cells stained with one of the Infinity Markers and all of the backbone antibodies. We trained marker-specific XGBoost models for the measured Infinity Marker on 80% of cells and tested the accuracy of the regression using mean squared error between the predicted Infinity Marker expression and the ground truth Infinity Marker expression on the remaining 20% of cells “held out” from training, using the default settings of pyInfinityFlow. For the prediction of surface proteins, CD117-enriched bone marrow cells were sampled from mice with the Irf8-GFP transgenic reporter to improve prediction accuracy. The target sample on which the final regression models were applied included staining for CD131-PE, so that this signal was measured directly rather than imputed and carried no prediction error. This yielded an initial InfinityFlow object based on the 22-color backbone. To identify poorly performing imputed markers and thereby optimize the original 22-marker backbone, we assessed imputation variability (mean squared error) or selected markers that were highly expressed in cell populations with poor separation in the obtained UMAP embedding.
This analysis nominated 27 additional markers able to distinguish multiple rare cell populations, including early specifying basophil, mast, and eosinophil cells. This updated panel was applied to a new target sample from Irf8-GFP CD117-enriched bone marrow samples. Overlapping markers between the initial InfinityFlow object and the new target population were used to generate regression models to predict all remaining signals from the initial InfinityFlow object onto the curated panel capture. The resulting curated InfinityFlow object was used for all downstream processing. In-silico gating was done by importing the InfinityFlow object into FlowJo as an FCS file. FCS files are available in Synapse (syn60529836).
Supervised KDE Mapping
To normalize the CITE-seq ADT signals to best approximate the signals of analogous surface proteins observed in flow cytometry, reference spline functions were built to map the percentile of expression level for each feature to that feature’s intensity in the InfinityFlow space. This creates a map between rank and signal intensity that is assumed to be the same for any sample drawn from the same population (CD117-enriched murine bone marrow cells). The signal intensity is assumed to be conserved between the different populations captured by CITE-seq after TotalVI batch correction (CD117-enriched, HSC-MPP gated cells, MultiLin gated cells, and CD127+ gated cells), so another spline function is created to map the signal intensity of the CD117-enriched population ADT intensity for each feature to its percentile rank among cells. These CITE-seq specific spline functions can then be applied to each cell and for each feature in all sorted populations to map their signal intensity to the predicted percentile across the distribution observed for CD117-enriched bone marrow. That percentile then serves as input to the analogous InfinityFlow reference spline function to map to the InfinityFlow signal intensity. This two-step, percentile-based normalization strategy is conceptually similar to landmark registration, which aligns datasets based on corresponding features.117
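The two-step, percentile-based mapping can be sketched as follows. This is a simplified illustration and not the pyInfinityFlow implementation: piecewise-linear interpolation is swapped in for the spline functions, and the function names and toy reference values are hypothetical.

```python
from bisect import bisect_left


def interp(x, xs, ys):
    """Piecewise-linear interpolation with clamping; xs must be sorted ascending."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # guarantees xs[i - 1] < x <= xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def percentile_curve(values):
    """Sorted values paired with their percentile ranks in [0, 1] (needs >= 2 values)."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered, [i / (n - 1) for i in range(n)]


def map_adt_to_flow(adt_value, adt_reference, flow_reference):
    """Step 1: ADT intensity -> percentile in the CD117-enriched ADT reference.
    Step 2: percentile -> intensity in the InfinityFlow reference."""
    adt_sorted, adt_pcts = percentile_curve(adt_reference)
    flow_sorted, flow_pcts = percentile_curve(flow_reference)
    pct = interp(adt_value, adt_sorted, adt_pcts)
    return interp(pct, flow_pcts, flow_sorted)


# Toy reference distributions for one feature (illustrative values only).
adt_ref = [0, 1, 2, 3, 4]
flow_ref = [100, 200, 300, 400, 500]
mapped_mid = map_adt_to_flow(2, adt_ref, flow_ref)
mapped_between = map_adt_to_flow(1.5, adt_ref, flow_ref)
```

Values outside the reference range are clamped to the reference extremes in this sketch; how the published spline functions extrapolate is not stated in the text.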
FS-scRNAseq Capture
Surface marker expression as well as CITE-seq atlas cluster label transfer to the InfinityFlow object were used to nominate new gating schemes. Initial gates targeted specified progenitor states for a first round of FS-scRNA-seq capture. These gates were applied to a Sony MA900 sorter. In a second round of FS-scRNA-seq capture, gates from Pronk et al. were applied using a Cytek Aurora CS. Since the Sony MA900 and Aurora CS operate on different bit systems, optimization measures were taken when setting up the CS to ensure accuracy (a comparison sort was run in parallel on Aurora CS and BD FACS Aria). A third round of FS-scRNA-seq capture used a new brute force algorithm called Ab-MarkerFinder, which iteratively applied the MarkerFinder algorithm (pyInfinityFlow) and in silico gating to nominate an optimal gating strategy (details below)(Supplementary Table 5). These were applied in silico with visualization over the 2-dimensional UMAP space to curate markers for the final gating strategy. All final gating strategies were integrated into a single panel and applied on the Cytek Aurora CS. We subsequently performed FS enrichment of all cell populations using conventional flow cytometry. The majority of these populations were captured using the Chromium X with Chromium Next GEM Single Cell 3ʹ kit v3.1 chemistry (PN-1000268, 10X Genomics), along with HSC-MPP and CD127+ produced CITE-seq captures.
Ab-MarkerFinder
To nominate novel gating schemes for populations of interest from the curated InfinityFlow object, we developed a new brute-force algorithm that could be trained with millions of cells. Briefly, MarkerFinder is applied for the prospective population of interest to identify the top 5 correlated and top 5 anti-correlated features. For each of these features, the signal range is split into n (default n=100) evenly spaced thresholds after logicle normalization of fluorescence intensity. For each of these thresholds, a candidate gate in both the positive and negative direction is tested for improvements to both purity and yield of the prospective population. The best gate (determined by improvement of purity), picked among the 10 tested markers, is then used to filter the data. The process is then repeated with the filtered data for the prospective population. The iteration continues until either the desired purity (default 90%) or a yield limit (default 50%) is reached.
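The iterative gate search can be sketched in plain Python as below. This is an illustration under stated assumptions, not the Ab-MarkerFinder code: the candidate features are passed in directly rather than selected by MarkerFinder, gates that would drop the yield below the limit are skipped (one possible reading of the stopping rule), and all names and toy values are hypothetical.

```python
def purity_and_yield(mask, is_target, n_target_total):
    """Purity of the gated cells and yield relative to all target cells."""
    kept = sum(mask)
    hits = sum(1 for m, t in zip(mask, is_target) if m and t)
    purity = hits / kept if kept else 0.0
    return purity, hits / n_target_total


def greedy_gates(cells, is_target, features, n_steps=100,
                 min_purity=0.9, min_yield=0.5):
    """Greedily add one-dimensional gates that most improve purity."""
    n_target_total = sum(is_target)
    keep = [True] * len(cells)
    gates = []
    purity, yld = purity_and_yield(keep, is_target, n_target_total)
    while purity < min_purity:
        best = None
        for feature in features:
            values = [c[feature] for c, k in zip(cells, keep) if k]
            lo, hi = min(values), max(values)
            if lo == hi:
                continue
            for step in range(1, n_steps):
                threshold = lo + (hi - lo) * step / n_steps
                for direction in ("+", "-"):
                    mask = [k and ((c[feature] >= threshold) if direction == "+"
                                   else (c[feature] < threshold))
                            for c, k in zip(cells, keep)]
                    p, y = purity_and_yield(mask, is_target, n_target_total)
                    # Assumption: reject gates that breach the yield limit.
                    if y < min_yield or p <= purity:
                        continue
                    if best is None or p > best[0]:
                        best = (p, y, feature, direction, threshold, mask)
        if best is None:
            break  # no remaining gate improves purity within the yield limit
        purity, yld, feature, direction, threshold, keep = best
        gates.append((feature, direction, threshold))
    return gates, purity, yld


# Toy data: marker "A" separates the 4 target cells; "B" is uninformative.
a_vals = [8, 9, 10, 9, 1, 2, 3, 2, 1, 3]
b_vals = [1, 5, 2, 4, 3, 1, 5, 2, 4, 3]
cells = [{"A": a, "B": b} for a, b in zip(a_vals, b_vals)]
is_target = [True] * 4 + [False] * 6
gates, purity, yld = greedy_gates(cells, is_target, ["A", "B"], n_steps=10)
```

The exhaustive scan is quadratic in the number of thresholds and cells per iteration, so a production version operating on millions of cells would need vectorized array operations.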
Benchmarking of ADT to InfinityFlow label transfer approaches
To assess the accuracy of KDE mapping relative to previously established multimodal label transfer approaches, previously described and novel hematopoietic gating strategies with corresponding single-cell captures were used as a benchmark dataset. This benchmarking dataset was restricted to gates that were observed to be mutually exclusive when visually assessed using the combined CD117-enriched bone marrow InfinityFlow object. Following the same QC filtering protocol and SoupX correction approach as the CITE-seq data, the top 60 unique marker genes for each of the 23 FS-isolated cell populations were identified using the software MarkerFinder (250 random cells per cluster). To obtain ADT profiles for each cell population, corresponding cells in the final titrated CITE-seq data were identified using cellHarmony transcriptome mapping with two separate parameters to ensure rigor (centroid- and community-based alignment).
Using these matched InfinityFlow and CITE-seq ground-state predictions, we evaluated mapping of the Flow-defined CITE-seq populations to the corresponding InfinityFlow in silico flow gate defined populations. These analyses consider the 113 matching CITE-seq ADTs (or corresponding gene mRNA) and InfinityFlow cell-surface markers. To assess the precision of KDE mapping with cellHarmony alignment relative to potential orthogonal strategies, we tested 4 distinct multimodal label transfer approaches (Seurat Bridge-integration, MARIO-CCA, Harmony-KNN, XGBoost)52–55 and CyCombine49 feature normalization. Bridge integration was implemented with the source FS-scRNA-seq sorted populations and associated labels (input), the titrated CITE-seq (bridge) and the filtered InfinityFlow object (target). Harmony-KNN was applied using the harmonypy Python library to the combined CITE-seq and InfinityFlow cell surface marker intensity matrix, without KDE mapping, followed by a k-nearest neighbors classification model built using scikit-learn. A Python XGBoost classification model was fitted to the 23 labels in the CITE-seq matrix and queried against the InfinityFlow matrix. As ground truth for XGBoost, we leveraged flow cytometry of ~100,000 cells stained with one of the Infinity Markers and all of the backbone antibodies. We trained the XGBoost model for the measured Infinity Marker on 80% of cells and tested the accuracy of the regression using mean squared error between the predicted Infinity Marker expression and the ground truth Infinity Marker expression on the remaining 20% of cells “held out” from training, using the default settings of pyInfinityFlow. For MARIO-CCA, the MARIO algorithm was similarly applied with canonical correlation analysis (CCA) vectors. cellHarmony, using community alignment, was applied with the pyInfinityFlow associated library using either KDE mapping, no mapping or CyCombine normalization.
The ARI score was used to assess accuracy of the obtained labels from each separate approach to the ground-state in silico flow InfinityFlow population annotations.
Integration of CITE-seq and InfinityFlow and UMAP Projection
After normalizing the ADT intensity values with the supervised KDE mapping approach, cellHarmony (pyInfinityFlow package) was applied to transfer the scTriangulate defined labels to the InfinityFlow dataset using community-alignment. Upon transferring these labels, the InfinityFlow dataset was subsampled to a size of 1e5 cells, distributing the sampled cells as evenly as possible among the cluster labels. This subsampled InfinityFlow set was used to fit a 2-dimensional UMAP projection using the 27 base fluorescence signals of the captured InfinityFlow object. This UMAP was then applied to the entire InfinityFlow dataset using the transform function to find embedding coordinates for all cells. The resulting coordinates and cluster labels were saved to the InfinityFlow object FCS file. To co-embed the CITE-seq data, a k-nearest-neighbors (KNN) model was used to identify the top 4 nearest neighbors of the CITE-seq cells among the 1e5 sampled InfinityFlow cells, and the UMAP position was taken to be the centroid position of those 4 nearest neighbors in the embedded space (Consensus KNN). To assess the correlation between CITE-seq mRNA count and InfinityFlow transcription factor reporter fluorescence intensity, the Pearson correlation was calculated between the CITE-seq mRNA count for those transcription factors and the centroid fluorescence intensity of the InfinityFlow reporter expression value.
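The Consensus KNN placement can be sketched as below. This is a minimal illustration, not the authors' script: it assumes the query cell and the reference cells are described in a shared marker space (for example, KDE-mapped ADT values against InfinityFlow intensities), uses Euclidean distance, and all names and toy values are hypothetical.

```python
import math


def consensus_knn_position(query, ref_features, ref_umap, k=4):
    """Place a query cell at the centroid of the UMAP coordinates of its
    k nearest reference cells (Euclidean distance in the shared feature space)."""
    ranked = sorted((math.dist(query, feat), i) for i, feat in enumerate(ref_features))
    nearest = [i for _, i in ranked[:k]]
    x = sum(ref_umap[i][0] for i in nearest) / len(nearest)
    y = sum(ref_umap[i][1] for i in nearest) / len(nearest)
    return (x, y)


# Toy reference: five cells with 2 features each and precomputed UMAP positions.
ref_features = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
ref_umap = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0), (50.0, 50.0)]
position = consensus_knn_position((0.5, 0.5), ref_features, ref_umap)
```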
In-vitro HSPC Culturing Conditions
Bone marrow HSPCs were cultured using StemSpan SFEM (StemCell Technologies), enriched with 50ng/mL SCF, 10ng/mL IL-3, 10ng/mL IL-6, 2U/mL EPO, 50ng/mL IL-11, 10ng/mL IL-5, 50ng/mL TPO, 15ng/mL G-CSF, 15ng/mL GM-CSF, 10ng/mL IL-7 (PeproTech, Rocky Hill, NJ USA).
CellTag Lineage Mapping
pSMAL-CellTag-multi-V1 plasmid DNA library was validated by Sanger sequencing.118 The lentiviral vector used in these studies was manufactured by the Vector Production Facility at Cincinnati Children’s Hospital Medical Center, as described briefly below. The target vector, Delta 8.9, VSVG, and pRSV-Rev plasmids were packaged in DMEM using PEI transfection reagent. Transfection reagent was removed after 6 hours. Viral supernatant was harvested 44- and 68-hours post transfection, clarified with a 0.45 um filter, and processed with Gamma Gold Clarification, XT5 Scale Ion Exchange Chromatography, and Tangential Flow Filtration to remove contaminants from cell culture.
Virus was titrated on Lin−Sca−CD117+CD27+ MultiLin cells, and the titer was chosen based on a transduction efficiency of 80%. CellTagging was performed separately on two cultures of Sca+ (Lin−Sca+CD117+CD27+ gate) and two cultures of Sca1− (Lin−Sca−CD117+CD27+ gate ‘MultiLin’) marrow progenitors. These technical replicates were processed independently, including single cell captures and library generation. 50,000 (Sca+ or Sca−) cells per well were plated at 50 µL in a 96-well plate, then spinfected (10,000 RPM, 45 min at room temperature), and then incubated overnight in a 37 °C incubator with 5% CO2. Cells were recovered the next day and transferred to a 24-well plate with bone marrow epithelial cells (BMEC-Akt69) to potentially preserve self-renewal capacity. BMEC-Akt were plated at 80% confluency 48 hours prior to co-culture. The 0 h time point starts when transduced HSC/progenitor cells enter co-culture with BMEC-Akt. After 48 hours of co-culture, transduced cells were sorted for GFP+CD45.2+Ly6C−/low both to gate away BMEC-Akt cells and to avoid over-representation of neutrophil output cells. Technical replicates were processed separately to create four 10X 3’ scRNA-seq libraries (Sca+1, Sca+2, Sca−1, Sca−2).
CellTag Clonal relationship inference
The CellTag118 associated scRNA-seq was aligned to our CITE-seq compendium cell atlas annotations using the software cellHarmony as described above. UMAP coordinates were projected from the CellTag into the CITE-Seq embeddings using a custom python script (Synapse: syn53237568). To reconstruct lineage relationships from the CellTag, we assessed the cell identities shared in each clone and calculated a comprehensive clonal coupling score between two cell types, as described below.
Clonal coupling score.
We calculate a clonal coupling score119 between two cell types. We first calculate the number of observed unique clonal barcodes, or clones, shared between two cell types $i$ and $j$, denoted $O_{ij}$. We then calculate an expected number of shared clones between the two cell types, $E_{ij}$, using the values in the matrix $O$, where $i, j \in C$ and $C$ is the set of cell types, in the following way:
$$E_{ij} = \frac{\left(\sum_{k \in C} O_{ik}\right)\left(\sum_{k \in C} O_{kj}\right)}{\sum_{k \in C}\sum_{l \in C} O_{kl}}$$
The clonal coupling score between two cell types is calculated as the ratio of the observed matrix $O$ over the expected matrix $E$.119 The clonal coupling score is thus a symmetric adjacency matrix that can be visualized as a network. First, the clonal coupling scores were centralized. Lineage relationships for which the clonal coupling score was equal to or greater than 30 were set to a maximum value of 30. Similarly, lineage relationships for which the clonal coupling score was equal to or less than 1/30 were set to a minimum value of 1/30. Lastly, the clonal coupling scores were log2-normalized. Note that in contrast to the approach by Weinreb et al.119, clonal coupling scores were not calculated as the median observed/expected matrix ratio from randomized trials of sampled cell-type specific cells. We specifically avoided the latter to account for smaller clones from rarer cell populations.
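A minimal sketch of the observed/expected calculation follows. It rests on assumptions: the expected count for a pair is taken as the product of the two cell types' marginal totals in the observed matrix divided by the grand total, the diagonal (clones containing a single cell type) is included in those marginals, and the function name and toy clones are hypothetical. The capping at 30 and 1/30 and the log2 step follow the text.

```python
import math
from itertools import combinations_with_replacement


def clonal_coupling(clone_to_types, cap=30.0):
    """log2 observed/expected shared-clone ratio for every pair of cell types."""
    types = sorted({t for members in clone_to_types.values() for t in members})
    observed = {(a, b): 0 for a in types for b in types}
    for members in clone_to_types.values():
        for a, b in combinations_with_replacement(sorted(members), 2):
            observed[(a, b)] += 1
            if a != b:
                observed[(b, a)] += 1  # keep the matrix symmetric
    marginal = {a: sum(observed[(a, b)] for b in types) for a in types}
    total = sum(marginal.values())
    scores = {}
    for a in types:
        for b in types:
            expected = marginal[a] * marginal[b] / total  # assumed null model
            ratio = observed[(a, b)] / expected
            ratio = min(max(ratio, 1.0 / cap), cap)  # bound to [1/30, 30]
            scores[(a, b)] = math.log2(ratio)
    return scores


# Toy clones: each clone barcode maps to the set of cell types it was seen in.
clones = {"c1": {"A", "B"}, "c2": {"A", "B"}, "c3": {"C"}, "c4": {"C"}}
coupling = clonal_coupling(clones)
```

In this toy example, A and B share both of their clones and receive a positive score, whereas A and C share none and are set to the lower bound.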
Clonal relationship inference.
To leverage the confidence acquired through technical replicates, we consider all the lineage relationships between two cell types that satisfy at least one of two criteria: 1) log-normalized clonal coupling score > 0 in both replicates, or 2) number of shared clones > 4 in both replicates. These criteria assume that a lineage relationship is unlikely to occur by random chance if it is observed in both replicates, thereby increasing the specificity of a lineage relationship. Additionally, the second criterion accounts for relationships that are penalized by the clonal coupling score for having a number of observed barcodes too close to the expected number of barcodes.
Lineage Trajectory Inference
Log-scaled normalized gene expression (CPTT) of only the MarkerFinder genes (n = 2,893) from the cells from the CITE-seq was used to create an anndata object using the scanpy package. Two different approaches, diffusion pseudotime (DPT) and PAGA, from the scanpy package were used to infer pseudotemporal ordering of the hematopoietic stem cells. Highly variable genes were determined with the default parameters. UMAP coordinates of titrated CITE-seq data were evaluated in PAGA. For PAGA, cell populations were restricted to those quantified using CellTag to enable direct comparison.
Diffusion Pseudotime (DPT).
The neighborhood graph of cells was computed on 30 principal components (‘n_pcs’ parameter) with the number of neighbors (‘n_neighbors’ parameter) set to 30, as recommended by the scanpy authors.120 The diffusion map was computed with the default settings (scanpy.tl.diffmap). Cell populations were restricted to those detected by CellTag to draw direct inferences. To calculate the diffusion pseudotime, a cell annotated as “MPP1-G1” was set as the root cell for pseudotime (the most primitive inferred HSC-MPP cluster captured by CellTag). In our analysis, the first MPP1-G1 barcode in the list (“AGGGAGTAGCTGCCTG-1.AS_CITE_HSC”) was set as the root cell. Diffusion pseudotime was calculated using the function scanpy.tl.dpt and the pseudotime values for each cell were projected on the UMAP.
PAGA.
PAGA66 graph was computed using the connectivities from the neighborhood graph (described above) and annotated single-cell clusters as nodes. “MPP1-G1” was set as the root node (‘root_key’ parameter). DPT values are used to guide the PAGA graph (‘use_time_prior’ parameter). Finally, only the edges with weight greater than 0.6 in the PAGA are visualized on the UMAP.
Network similarity metrics
Jaccard Index.
The Jaccard index is scaled from 0 to 1, where 0 means the least similarity in the edges of two networks. It is the ratio of intersection of edges between two networks divided by the union of edges between the two networks.
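As a minimal sketch, the edge-level Jaccard index for two undirected networks can be computed as below. The function name and toy edges are hypothetical, and returning 0.0 when both networks have no edges is a convention chosen here for illustration.

```python
def edge_jaccard(edges_a, edges_b):
    """Jaccard index of the edge sets of two undirected networks."""
    a = {frozenset(edge) for edge in edges_a}  # frozenset ignores edge direction
    b = {frozenset(edge) for edge in edges_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy networks of cell-type relationships (illustrative labels only).
network_1 = [("HSC", "MPP"), ("MPP", "MkP")]
network_2 = [("MPP", "HSC"), ("MPP", "GMP")]
similarity = edge_jaccard(network_1, network_2)
```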
Hamming–Ipsen–Mikhailov (HIM) metric.
The HIM metric is a previously published network similarity score121 used to assess the similarity between the topologies of two networks, taking into account differences in edge weights and degree distributions (the degree being the number of edges leaving a node or cell type). As indicated by the equation below, the HIM distance combines the normalized Hamming distance, which accounts for differences in edge weights, and the normalized Ipsen–Mikhailov distance, which accounts for the similarity in degree distributions. The Ipsen–Mikhailov distance has a parameter γ, which was set to 0.05. Let $H(N_1, N_2)$ indicate the Hamming distance and $IM(N_1, N_2)$ indicate the Ipsen–Mikhailov distance between two undirected networks $N_1$ and $N_2$; then the HIM metric of similarity is described by the following equation, where $\xi \ge 0$ weights the Ipsen–Mikhailov term:
$$\mathrm{HIM}_{\xi}(N_1, N_2) = \frac{1}{\sqrt{1+\xi}}\sqrt{H^2(N_1, N_2) + \xi\, IM^2(N_1, N_2)}$$
DARLIN data analysis and validation
The processed h5ad file of the DARLIN dataset was downloaded from Zenodo (tissue_adata_refined_20221106_joint.h5ad). The counts matrix in the anndata object was log-normalized using scanpy. cellHarmony (community alignment as described above) was used to project our CITE-seq compendium cell atlas annotations onto the cells from the DARLIN dataset. The log-normalized gene expression matrix was provided as the input to cellHarmony.
The clone-by-cell count matrix was extracted from the above-mentioned h5ad (adata.obsm[‘X_clone’]). Using this matrix, log2-normalized clonal coupling scores were calculated using the approach described above for the CellTag lineage barcoding data. Lineage relationships that had a log-clonal coupling score greater than 0 and were evidenced by at least 3 clones were considered for visualization purposes (Fig. 7c, Extended Data 9d). Further, lineage relationships with a log-clonal coupling score greater than 0, greater than 0.3, and greater than or equal to 0.3 were assigned as level 1, level 2, and level 2 (relaxed), respectively, in Supplemental Table 7.
CITE-seq and TEA-seq Integration
To transfer labels from the titrated CITE-seq to the matching TEA-seq captures, the two modalities were integrated using the software harmonypy, separately for each capture (HSC-MPP and MultiLin). Principal components (nPCs=30) were calculated using the sklearn.decomposition.PCA function after concatenating the CPTT-normalized gene expression matrices from CITE-seq and TEA-seq captures. Principal component 1 was removed as it correlated strongly with CITE-seq vs. TEA-seq batches, and the remaining PCs (PC2-PC30) were used as input to harmonypy to generate an embedding with minimal platform batch effects. A nearest neighbor classification was then carried out from CITE-seq defined clusters to TEA-seq cells (KNN=10) using the sklearn.neighbors.KNeighborsClassifier class, identifying 57 corresponding clusters. UMAP embeddings were derived using a custom Python script (Synapse: syn53237568).
Cluster Specific TEA-seq ATAC Processing
Using the cluster assignments from the CITE-seq to TEA-seq classification, we split the 10X Cell Ranger chromatin accessibility position-sorted BAM file into separate files for each cluster. Peaks were called using the MACS2 peak calling algorithm with the following options: “--nomodel --shift 37 --ext 73 -p 0.05 -B --SPMR --call-summits”. To merge peaks called between cluster splits, summits were ranked by p-value from MACS2 and overlapping summits with higher p-values were removed within a window of +/−500 bp, yielding approximately 800,000 peaks. To focus on dynamic peaks of interest, the log2-CPM normalized read count value for each peak was compared to the corresponding log2-CPTT CITE-seq gene expression values using a Pearson correlation test across the 57 clusters. These correlations were restricted to peaks within a given gene’s previously defined TAD identified in murine HSPCs59 (GSE119347, “BMHSC_TADs.bed.gz” translated from mm9 to mm10 using the UCSC lift-over tool). Only the top 5,000 genes ranked by variance were considered. Any peak with a p-value < 0.001 from this test was nominated as a dynamic peak used for downstream analyses, yielding approximately 100,000 dynamic peaks.
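The summit-merging step can be sketched as a greedy pass over summits ordered by significance. This is an illustration rather than the authors' code: it assumes summits are given as (chromosome, position, p-value) tuples with raw p-values (lower is more significant), treats a distance of exactly 500 bp as overlapping, and uses hypothetical names and toy values.

```python
from bisect import bisect_left, insort


def merge_summits(summits, window=500):
    """Keep the most significant summit in each neighborhood: a summit is
    dropped if an already-kept summit lies within `window` bp on the same
    chromosome."""
    kept = []
    kept_positions = {}  # chromosome -> sorted list of kept summit positions
    for chrom, pos, pval in sorted(summits, key=lambda s: s[2]):
        positions = kept_positions.setdefault(chrom, [])
        i = bisect_left(positions, pos)
        near_left = i > 0 and pos - positions[i - 1] <= window
        near_right = i < len(positions) and positions[i] - pos <= window
        if near_left or near_right:
            continue
        insort(positions, pos)
        kept.append((chrom, pos, pval))
    return kept


# Toy summits pooled from several cluster-specific peak calls.
pooled = [
    ("chr1", 1000, 1e-10),
    ("chr1", 1300, 1e-5),   # within 500 bp of the stronger summit at 1000
    ("chr1", 1600, 1e-8),
    ("chr2", 1000, 1e-3),
]
merged = merge_summits(pooled)
```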
Base Resolution Contribution Scores for Tn5 Insertion and CWM Pattern Scoring
To estimate the sequence-specific importance for Tn5 insertion from chromatin accessibility captured by TEA-seq, bias-corrected ChromBPNet models were generated on each pseudobulk ATAC BAM split file for clusters with enough reads to complete model training (32 of the 57 mapped clusters). Bias models were created using the total possorted BAM file from one replicate of the MultiLin TEA-seq capture. The Tn5 counts prediction head model was then used to create contribution scores at single base resolution across the dynamic peak set. These contribution scores were scanned for seqlets of frequently used base pair patterns using the TF-MoDISco-lite program, yielding approximately 30–50 CWM patterns for each cluster (approximately 1,000 patterns total). To identify all seqlets across all dynamic peaks, as opposed to the sampled set used by TF-MoDISco for clustering, the base frequency of each CWM pattern was used as input to scan for loci with similar base pair patterns using the gimmemotifs Python package. Matching loci were subsequently scored for their ability to match the CWM by taking the dot product of the base resolution contribution scores at the matching nucleotides of each locus with the corresponding positions in the CWM. Any locus that matched the CWM and had a dot product score greater than the 5th percentile of the scores of those seqlets of the CWM previously called by TF-MoDISco-lite was included in the final set. This created a set of approximately 20 million seqlets, each matching a CWM pattern identified by TF-MoDISco-lite from the 32 cluster-specific ChromBPNet models. Further details and code are provided in our GitHub repository.
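The dot-product scoring and percentile cutoff can be sketched as follows. This is a simplified illustration and not the repository code: contribution scores and the CWM are represented as equal-length lists of four values per position (A, C, G, T), with the contribution scores non-zero only at the observed base, the percentile uses linear interpolation, and all names and toy values are hypothetical.

```python
def cwm_match_score(contrib, cwm):
    """Dot product between per-base contribution scores and a CWM (both L x 4)."""
    return sum(c * w
               for c_row, w_row in zip(contrib, cwm)
               for c, w in zip(c_row, w_row))


def percentile(values, q):
    """Linearly interpolated percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def passing_loci(candidates, cwm, modisco_seqlet_scores, q=5):
    """Names of candidate loci scoring above the q-th percentile of the
    scores of seqlets originally assigned to this CWM."""
    cutoff = percentile(modisco_seqlet_scores, q)
    return [name for name, contrib in candidates
            if cwm_match_score(contrib, cwm) > cutoff]


# Toy 3-bp CWM and two candidate loci (columns are A, C, G, T).
cwm = [[0.0, 0.9, 0.0, 0.0], [0.0, 0.0, 0.8, 0.0], [0.7, 0.0, 0.0, 0.0]]
strong = [[0.0, 0.5, 0.0, 0.0], [0.0, 0.0, 0.4, 0.0], [0.3, 0.0, 0.0, 0.0]]
weak = [[0.0, 0.01, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
reference_scores = [0.2, 0.4, 0.6, 0.8, 1.0]  # scores of previously called seqlets
selected = passing_loci([("locus_strong", strong), ("locus_weak", weak)],
                        cwm, reference_scores)
```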
Constructing Putative Gene Regulatory Networks
To infer CWM patterns that likely contribute to gene expression, pairwise Pearson correlation tests were performed between all seqlets within a given gene’s TAD (as previously defined in dynamic peak selection under “Cluster Specific TEA-seq ATAC Processing”) and that gene across the 32 pseudobulk clusters for which ChromBPNet models could be trained. Only the top 5,000 genes ranked by variance (32 pseudobulks) were considered. Any seqlet that correlated to a gene within a TAD above a threshold of 0.4 was maintained as a candidate connection. For visualization, we further restricted to the top 20 correlated seqlets for each gene, yielding approximately 80,000 seqlets for 5,000 genes. Pearson correlation values were recalculated across both these sets to generate a pair-wise correlation matrix between genes and seqlets. Each seqlet was annotated using its matching CWM and the CIS-BP2 motif database to define candidate transcription factors and their families. The genes were ordered using hierarchical clustering of seqlet correlation values using the SciPy Python package. Seqlets were then grouped using MarkerFinder with gene clusters as groups. Base resolution contribution score values were then visualized using the UCSC genome browser. For heatmap visualization, we plotted the z-score log2-CPTT normalized gene expression values with the same hierarchically clustered order of genes from the seqlet to gene expression correlation heatmap to show their expression across the 32 clusters. We replicated this procedure for the seqlet contribution score values for the MarkerFinder ordered 80,000 seqlets. Next, the contribution score values for the seqlets across the clusters were aggregated using the CIS-BP2-defined transcription factor families to visualize the average contribution scores for seqlets of a given transcription factor family. This procedure was repeated at the level of CWM patterns defined by TF-MoDISco-lite.
To derive the shown GRN model, we restricted visualization to the most informative lineage markers for all 32 clusters, by intersecting the 5,000 top variably expressed genes with the top 25 MarkerFinder markers from all 88 CITE-seq clusters, yielding ~500 putative target genes. The seqlet correlation matrix was rederived with this 500 gene by 80,000 seqlet set and replotted as a pair-wise correlation heatmap, and a Cytoscape network was produced from all CWM-associated transcription factors and top marker genes (object deposited in Synapse syn60529836).
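The candidate-connection step can be sketched as below. It is an illustration under assumptions: each seqlet is represented by its contribution score per pseudobulk cluster, each gene by its expression per cluster, the 0.4 threshold is applied to the Pearson coefficient itself, and the function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def candidate_links(gene_expr, seqlet_scores, tad_seqlets, min_r=0.4, top_n=20):
    """For each gene, keep seqlets in its TAD with r > min_r, best top_n first."""
    links = {}
    for gene, expr in gene_expr.items():
        scored = []
        for seqlet in tad_seqlets.get(gene, []):
            r = pearson(seqlet_scores[seqlet], expr)
            if r > min_r:
                scored.append((r, seqlet))
        scored.sort(reverse=True)
        links[gene] = [(seqlet, r) for r, seqlet in scored[:top_n]]
    return links


# Toy data across 4 pseudobulk clusters (the analysis used 32).
gene_expr = {"Gata1": [1, 2, 3, 4]}
seqlet_scores = {"s1": [2, 4, 6, 8], "s2": [4, 3, 2, 1], "s3": [1, 3, 2, 4]}
tad_seqlets = {"Gata1": ["s1", "s2", "s3"]}
links = candidate_links(gene_expr, seqlet_scores, tad_seqlets)
```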
In Silico Perturbation of Select Transcription Factors
To test the cluster specific GRNs built using ChromBPNet defined seqlets as well as transcription factor and target gene expression, we input these connections as the base GRN provided to CellOracle. The Oracle object was constructed using all expressed genes and an auto-selected k-value of 80 with all other parameters set to defaults. 100 cells were randomly sampled from each of the 32 clusters in the HSC-MPP and MultiLin gate populations for pruning and in silico perturbation steps. Cell transition shifts were calculated by simulating KO of select transcription factors (Gata1, Gata2, Irf8, Spi1, Cebpa, and Cebpe) by setting the normalized expression value to 0 in all cells and the n_propagation parameter set to 5. Following in silico perturbation, the shifted transcriptomes were tested against the input transcriptomes using an empirical Bayes adjusted linear model (limma122) to assess significance of change in target gene expression.
Validation of GRN Seqlet Loci
To quantify the overlap between GRN seqlet positions and previously defined enhancer regions (cCREs), we used bedtools intersect to count the overlap between seqlet positions and cCRE regions. We used GIGGLE index and search functions with default parameters to estimate the significance of overlap between the collection of seqlet positions associated with each transcription factor and the peaks that were previously defined in all murine ChIP-seq samples targeting transcription factors. The significance of self-to-self enrichment between a given transcription factor’s seqlet positions and the corresponding ChIP-seq sample regions was tested using a one-sided Mann-Whitney U test of the self-to-self GIGGLE score against the self-to-others scores.
Index Sorting and Assessment of CRE Reporter Fate
Index sorting was carried out on a Sony MA900 or BD FACSymphony S6 sorter on mice with lineage-specific-cre systems that activate tdTomato upon lineage commitment. Single cells were sorted into 96-well plates under “in-vitro culturing conditions of HSPCs” (above) and cells were analyzed either under an Olympus fluorescent microscope after 5 days to count the number of cells produced and how many became tdTomato+, or alternatively using the Agilent BioTek Cytation C10.
N. brasiliensis Infection Model Time Course
750 infectious larvae of N. brasiliensis were inoculated by subcutaneous injection into 8-week-old Balb/cJ male mice.75,76 Bone marrow was collected from mice at time points of 3, 5, 7, and 10 days post infection or from uninfected mice as control samples (day 0). Bone marrow was CD117-enriched as described above and processed by full spectrum flow cytometry with a Cytek Aurora cytometer and captured using the Chromium X with Chromium Next GEM Single Cell 3ʹ kit v3.1 chemistry (PN-1000268, 10X Genomics). Supervised assignment of cell labels to the multimodal scTriangulate annotations was performed using cellHarmony as described above. cellHarmony differential expression analyses were performed for each assigned cell state and time point relative to day 0 (eBayes t-test p<0.05, FDR corrected and fold>1.2), with associated heatmap visualization in AltAnalyze.
Extended Data
Extended Data Fig. 1. Experimental and bioinformatics rubric to define, isolate and resolve genomics linkages between murine progenitors.

a, Comprehensive and concordant atlases of murine hematopoietic progenitors were created using multiomic single-cell (CITE-seq) and flow-cytometric profiling (InfinityFlow), then integrated via a unified computational workflow for label transfer (KDE+cellHarmony). b, Trimodal multiomic single cell data (TEA-seq) was analyzed to derive GRNs (ChromLinker). c, Based on their transcriptional and chromatin profiles, genomic-defined MultiLin populations were isolated by flow cytometry and validated. Single cell lineage tracing established the developmental potential of these cells. d, Discrete states govern hematopoietic developmental trajectories. Markers responsive to nascent transcription factor activity (CD55, CD371) highlighted. e-f, Gates used to selectively enrich HSC-MPP cells (e) and CD127+ lymphoid cells (f). g, Fluidigm captures for the populations represented as a gene expression heatmap of marker genes using the named marker strategies (indicated as a tick-map at the bottom). h, Initial MultiLin enrichment strategy (Lin−CD117+CD34+CD115−Ly6C−) based upon Fluidigm captures.
Extended Data Fig. 2. CITE-seq encompassing early hematopoietic states.

a-c, Cluster comparison across TotalVI-processed surface protein ADT signals from the hand-titrated 60-ADT antibody panel (a), the universal mix (v1.0) (b) and the final product after molecular sequence-based titration (c). d, Cartoon illustrating the experimental design of the titration experiment. HASH antibodies were used to multiplex titration samples from the same population. e, A bar plot illustrating the ARI score for re-classification of transcriptome-defined CITE-seq labels using ADT feature values for the BioLegend Universal mix versus the final titrated mix. f, Outline illustrating the populations captured to generate the CITE-seq atlas, the features collected for the CITE-seq atlas, and the resolution of clusters using scTriangulate. g-k, The integrated transcriptome UMAP embedding, illustrating the sort gate used to enrich for the targeted population (g), scTriangulate confidence scores (h), the RNA (i) and ADT (j) contribution values for the final cluster definitions, and source annotations by final stable clusters (k). l, Marker heatmap of cells from qHSC, HSC-Mac-1 and Mac-Nr1h3 (CITE-seq titrated), for RNAs (top) and ADTs (bottom). m, Flow plots of HSC-MPP gated bone marrow cells (Lin−Sca1+CD117+ gating out CD150−CD48+ for MPP3-MPP4-gate cells) reveal rare CD193+ and CD115+ populations, in agreement with predicted HSC-Macrophage populations observed with CITE-seq.
Extended Data Fig. 3. Predicted cluster-defined populations within published flow-defined populations.

Heatmap shows percentage overlap between InfinityFlow atlas populations and in-silico gated populations from the indicated publications (below). CITE-seq atlas population labels (vertical axis) and published flow-cytometry-defined populations (horizontal axis).
Extended Data Fig. 4. Identification of new gating schemes for CITE-seq defined populations.

a, Gating scheme for initial “FS-scRNA-seq” validation focused on myeloid end-state populations. b-e, Expression of Epx-CRE ROSA-LSL-tdTomato reporter (b,d) and eosinophil marker CD125 (IL5RA)(c,e) for gates within the (b,c) Lin−CD16–32+Irf8lowLy6C− gated cells and (d,e) Lin−CD16–32+Irf8lowLy6C+ gated cells. f-g, UMAP of gene expression data from HIVE captures of CD117+, Eosinophil trajectory, and BMCP trajectories clustered (f) and illustrating Ly6c2 expression (g). h, UMIs for genes from selected FS-scRNA-seq populations (Unknown MultiLin, EoP, Ly6C+ EoP1, Ly6C+ EoP2). i, UMIs for cell HASH oligos from selected FS-scRNA-seq populations (Unknown MultiLin, EoP, Ly6C+ EoP1, Ly6C+ EoP2).
Extended Data Fig. 5. FS-scRNA-seq validation steps and creation of benchmark dataset.

a-f, Gating schemes derived from Ab-MarkerFinder with targeted enrichment for the MPP3-IER (a), ML-1b and ML-2 (b), MEP (c), BMCP (d), IG2-MP (e), and IG2-proNeu1 (f) clusters. g, InfinityFlow in silico gated populations replicating mutually exclusive (non-overlapping) FS-scRNA-seq definitions projected over the UMAP embedding. h, Gene expression profiles of CITE-seq cells (cluster colors are the same as those pointed to in g) that map to FS-scRNA-seq sorted populations by their marker genes. i, Example heatmap of pairwise overlap between the true gate label and the predicted gate label (confusion matrix) in InfinityFlow data used to calculate ARI scores.
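For readers unfamiliar with how an adjusted Rand index (ARI) follows from a confusion matrix such as the one in panel i, the standard formula can be sketched as below. The counts are invented, the function name is hypothetical, and this is not claimed to be the implementation used for the figure.

```python
from math import comb


def adjusted_rand_index(contingency):
    """Compute the ARI from a true-label by predicted-label count matrix."""
    n = sum(sum(row) for row in contingency)
    if n < 2:
        raise ValueError("need at least two cells")
    # Pairs of cells that share both a true and a predicted label.
    sum_cells = sum(comb(v, 2) for row in contingency for v in row)
    # Pairs that share a true label, and pairs that share a predicted label.
    sum_rows = sum(comb(sum(row), 2) for row in contingency)
    sum_cols = sum(comb(sum(col), 2) for col in zip(*contingency))
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2.0
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical counts: rows are true gates, columns are predicted gates.
perfect = [[5, 0], [0, 5]]
mixed = [[3, 2], [2, 3]]
ari_perfect = adjusted_rand_index(perfect)
ari_mixed = adjusted_rand_index(mixed)
print(ari_perfect, ari_mixed)
```

A matrix with all counts on the diagonal gives an ARI of 1, whereas a matrix close to chance agreement gives a value near 0.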
Extended Data Fig. 6. “ChromLinker” analysis scheme integrates CITE-seq atlas populations with TEA-seq and then infers GRN.

a, Input data (CITE-seq and TEA-seq) and their underlying components. b, CITE-seq clusters defined by scTriangulate. c, harmonypy integration of CITE-seq labels to TEA-seq (RNA). d, UMAP representation of clusters captured by TEA-seq gates (HSC-MPP and MultiLin gates). e, TEA-seq BAM files (ATAC) were split according to pseudobulk cluster definitions. f, Peaks were called on individual pseudobulk cluster BAM files. g-h, Peaks were tested for association with genes within pre-defined TADs (g) using Pearson correlation of the pseudobulk TEA-seq ATAC accessibility profile with pseudobulk CITE-seq gene expression across the 57 cluster profiles (h). i, A set of ~100,000 peaks significantly correlated with gene expression values (p-value < 0.001). j, ChromBPNet bias models were generated using the total merged peak set and the combined 10X Cell Ranger output BAM files from the MultiLin sort gate. k, ChromBPNet bias-factorized models were successfully generated for 32 of the 57 pseudobulk profiles. l, Contribution scores were calculated for each of the 32 models. m, TF-MoDISco was used to cluster the contribution score seqlets and identify CWM patterns, which were annotated with known transcription factor DNA-binding motifs using the CIS-BP2 database. n, The dynamic peak set was scanned and scored for the CWM profiles identified by TF-MoDISco. o-p, A merged database of seqlets was generated (o) and, within TADs, the dot product of the contribution scores was correlated with gene expression (p). q-r, Significantly correlated seqlets (r > 0.4) were identified for each gene (q) and annotated by their underlying transcription factors to generate a pairwise correlation matrix (r). s, These connections were filtered to significant connections to build an initial gene regulatory network.
t, The connections were scored for each cluster and aggregated to generate activity scores: Z-score integrating target gene expression, transcription factor expression, and regulatory contribution of the transcription factor to its putative target genes in each of the 32 clusters.
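The correlation step in panels p-q can be illustrated with a small sketch. It assumes each seqlet has one contribution value per cluster and each gene has one expression value per cluster, and it keeps seqlets whose Pearson r exceeds 0.4. The profiles, identifiers and function names are invented for illustration, and the restriction to seqlets within the same TAD is taken as already applied.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def link_seqlets_to_gene(gene_expression, seqlet_profiles, min_r=0.4):
    """Return {seqlet_id: r} for seqlets positively correlated with a gene."""
    links = {}
    for seqlet_id, profile in seqlet_profiles.items():
        r = pearson_r(profile, gene_expression)
        if r > min_r:
            links[seqlet_id] = r
    return links


# Hypothetical profiles across five clusters (the study used 32).
gene_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
seqlet_profiles = {
    "seqlet_A": [0.1, 0.2, 0.3, 0.4, 0.5],  # rises with expression
    "seqlet_B": [0.5, 0.4, 0.3, 0.2, 0.1],  # falls with expression
    "seqlet_C": [0.3, 0.1, 0.3, 0.1, 0.3],  # unrelated to expression
}
links = link_seqlets_to_gene(gene_expression, seqlet_profiles)
print(links)
```

Only the seqlet whose contribution rises with expression passes the cutoff in this toy input.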
Extended Data Fig. 7. Label Transfer from CITE-seq to TEA-seq.

a, UMAP projections of merged CITE-seq (blue) and TEA-seq (orange) transcriptome profiles prior to harmonypy integration (integration was done separately for corresponding HSC-MPP and MultiLin gates between CITE-seq and TEA-seq). b, Distributions of principal components of CITE-seq (blue) and TEA-seq (orange) (an X denotes the removal of principal component 1 prior to harmonypy integration - `run_harmony`). c, UMAP projections of merged CITE-seq (blue) and TEA-seq (orange) profiles after modified harmonypy algorithm implementation with removal of principal component #1. d-e, Validation of label propagation by pairwise comparison of ranking (Spearman correlation as blue-white-red color scale) of marker genes within each cluster for both MultiLin TEA-seq replicates (d) and HSC/MPP TEA-seq replicates (e).
Extended Data Fig. 8. Prior evidence of epigenetic activity and transcription factor ChIP-seq binding validates gene regulatory network.

a, Cluster-specific GRN with gene nodes colored and scaled according to their relative expression levels: qHSC, ML-1b, IG2-proNeu1, ML-MDP, BMCP, MEP. b, Dot plot comparisons of the InfinityFlow fluorescent reporter level (vertical axis) and activity (horizontal axis) across each of the 32 mapped clusters for MYC. c, Stacked bar plots showing candidate cis-regulatory elements (cCREs) in the proposed GRN and ENCODE v4, with the green area illustrating the overlap. Gray bars indicate non-overlapping regions. d, Stacked bar plots showing candidate cis-regulatory elements (cCREs) in the proposed GRN for the 32 clusters. Overlap with ENCODE v4 cCREs is colored according to cluster, with unique cCREs colored gray. e, ChIP-seq experiments targeting transcription factors (CistromeDB) were tested for corresponding transcription factor/seqlet enrichment using GIGGLE to index and Fisher’s exact test to score. f, A heatmap of all pairwise comparisons of seqlet instances (rows) and CistromeDB ChIP-seq peak sets (columns). Transcription factors are grouped with color bars to annotate families. Red outlines are used to highlight direct family to self-comparisons. The color indicates the rank across all ChIP-seq datasets in CistromeDB for the enrichment of the given transcription factor/seqlet instances by GIGGLE score. g, Dot plot showing the Log2 fold enrichment of the GIGGLE score between the seqlet instances and their corresponding ChIP-seq peak set (self-to-self) over all other ChIP-seq peak sets (self-to-others). Significance is given by a one-sided (positive enrichment) Mann-Whitney U test (significant: p<0.05).
Extended Data Fig. 9. MultiLin populations display distinct lineage priming.

a, sc-Hrödinger scores using the top 100 marker genes per cluster for mixed-lineage priming of the indicated cluster gene expression programs (top) across the indicated CITE-seq atlas populations (left) using CITE-seq gene expression data. b, Bar plots show the expression of specified progenitor marker genes within MultiLin clusters. c, Capybara predictions, using scaled quadratic programming (QP) and multiple identity (Multi-ID) percentages for each cell state relative to the same restricted cell states as in (a). d, The integrated scRNA-seq gene expression UMAP illustrating PAGA-defined linkages (edges) between states colored as high confidence (blue), medium confidence (green), and not recapitulated by CellTag (dotted). e, CellTag workflow: two progenitor subsets (Lin−Kit+Sca+, Lin−Kit+Sca−) were independently transduced with the CellTag-multi lentiviral barcoding vector and cultured, and GFP+ cells were then sorted and captured for scRNA-seq analysis (n=2 technical replicates). f, Cells in the MultiLin Lin−Kit+Sca−CD27+ gate were index-sorted for CD55−CD371−, CD55+CD371− and CD55−CD371+ populations, and single cells were cultured for 5 days. The output of each single cell is classified as all tdTomato− (gray), all tdTomato+ (red), or a mix of tdTomato+ and tdTomato− cells (purple).
Extended Data Fig. 10. Flow cytometric and genomic dissection of MultiLin heterogeneity and lineage restriction.

a, 11-color flow cytometry panel used with the Sony MA900 sorter to monitor activity of lineage-defined-CRE activation of ROSA-LSL-tdTomato reporters. b, Heatmap comparison of the predicted cluster content of in-silico-sorted InfinityFlow cell populations (left) and FS-scRNA-seq analysis (right) for the sort gates (A-G). c, Nippostrongylus brasiliensis infection model schematic showing full spectrum flow analysis and scRNA-seq capture time points after infection. d, Cell frequency curve plot for 4 out of 22 in vivo perturbation scRNA-seq datasets. MultiLin cell populations are re-scaled among themselves to 1 (left side of the plot). Unadjusted cell frequency is shown for selected non-MultiLin clusters. e, Chi-square residuals for MultiLin scaled cell-population frequency versus all non-HSPC-MultiLin clusters with significant associations (*).
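As background for panel e, chi-square residuals for a contingency table of cell counts can be computed as sketched below. The legend does not specify which residual was used, so the Pearson residual, (observed − expected) / sqrt(expected), is an assumption here, and the counts and function name are invented.

```python
import math


def pearson_residuals(table):
    """Return Pearson chi-square residuals for a table of counts.

    Expected counts come from the row and column totals under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0:
        raise ValueError("table has no counts")
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                residual_row.append(0.0)  # empty row or column
            else:
                residual_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(residual_row)
    return residuals


# Hypothetical counts: rows are two conditions, columns are two clusters.
counts = [[30, 10], [10, 30]]
residuals = pearson_residuals(counts)
print(residuals)
```

Positive residuals mark cluster and condition pairs that are more frequent than expected under independence, and negative residuals mark pairs that are depleted.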
Acknowledgements
This work was partially supported by RC2DK122376 and R01HL122661 to H.L.G.
NIH training grant T32CA117846 partially supported K.F. Flow cytometric data were acquired using equipment maintained by the CCHMC Research Flow Cytometry Core supported by NIH S10OD025045. Sequencing was performed by the CCHMC DNA Sequencing and Genotyping Core, or by Novogene US. We thank M. Daud Khan for assistance with genomics analyses. The CellTag lentivirus was packaged and purified by the CHMC Translational Core Laboratory Vector Production Facility. We thank B. Song for contributing to ADT titration and cocktail formulation, CITE-seq atlases, initial InfinityFlow, and CellTag culture and library work. We thank K. Jindal for advice and troubleshooting CellTag protocols. We thank J. Butler (University of Florida) for the gift of BMEC-Akt1. We thank M. DeLay (Cytek Biosciences) and K. Weller (OSUCCC) for help gaining access to Cytek Aurora and Cytek Aurora CS for InfinityFlow and FS-scRNA-seq captures. Cytek Biosciences financially supported the use of the CS at OSUCCC. We thank BioLegend and former BioLegend employees B. Z Yeung (BioTuring) and K. Nazor (Proteintech Genomics) for providing the prototype cocktails used for antibody titration. Acknowledgement of individuals and companies does not imply their endorsement of the study’s data and conclusions.
Footnotes
Ethics declarations
Competing interests
J.C. declares that they are an employee and stakeholder of BioLegend, Inc. (a part of the Revvity group of companies). S.M. is a co-founder of CapyBio Inc. The other authors declare no competing interests.
Data availability
All genomics and flow cytometry data (raw and processed) have been deposited in open-access repositories:
GEO Submission: GSE266609
Synapse: syn60529836
Murine Hematopoietic CITE-seq interactive browsers: https://altanalyze.org/MarrowAtlas/
Murine MultiLin GRN visualization web application: https://multilin-grn-viewer-6c053f707717.herokuapp.com/
Code availability
Scripts and associated documentation necessary to reproduce the genomics and InfinityFlow bioinformatics analyses have been deposited in GitHub: https://github.com/KyleFerchen/MultiLin_Project_Code_Repository
References
- 1. Tunnacliffe E & Chubb JR What Is a Transcriptional Burst? Trends Genet 36, 288–297 (2020). 10.1016/j.tig.2020.01.003
- 2. Basu J et al. ThPOK is a critical multifaceted regulator of myeloid lineage development. Nat Immunol 24, 1295–1307 (2023). 10.1038/s41590-023-01549-3
- 3. Muench DE et al. Mouse models of neutropenia reveal progenitor-stage-specific defects. Nature 582, 109–114 (2020). 10.1038/s41586-020-2227-7
- 4. Olsson A et al. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537, 698–702 (2016). 10.1038/nature19348
- 5. Buenrostro JD et al. Integrated Single-Cell Analysis Maps the Continuous Regulatory Landscape of Human Hematopoietic Differentiation. Cell 173, 1535–1548 e1516 (2018). 10.1016/j.cell.2018.03.074
- 6. Buenrostro JD et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015). 10.1038/nature14590
- 7. Stoeckius M et al. Simultaneous epitope and transcriptome measurement in single cells. Nat Methods 14, 865–868 (2017). 10.1038/nmeth.4380
- 8. Becht E et al. High-throughput single-cell quantification of hundreds of proteins using conventional flow cytometry and machine learning. Sci Adv 7, eabg0505 (2021). 10.1126/sciadv.abg0505
- 9. Ferchen K, Salomonis N & Grimes HL pyInfinityFlow: optimized imputation and analysis of high-dimensional flow cytometry data for millions of cells. Bioinformatics 39 (2023). 10.1093/bioinformatics/btad287
- 10. Morrison SJ, Wandycz AM, Hemmati HD, Wright DE & Weissman IL Identification of a lineage of multipotent hematopoietic progenitors. Development 124, 1929–1939 (1997). 10.1242/dev.124.10.1929
- 11. Akashi K, Traver D, Miyamoto T & Weissman IL A clonogenic common myeloid progenitor that gives rise to all myeloid lineages. Nature 404, 193–197 (2000). 10.1038/35004599
- 12. Kondo M, Weissman IL & Akashi K Identification of clonogenic common lymphoid progenitors in mouse bone marrow. Cell 91, 661–672 (1997). 10.1016/s0092-8674(00)80453-5
- 13. Pronk CJ et al. Elucidation of the phenotypic, functional, and molecular topography of a myeloerythroid progenitor cell hierarchy. Cell Stem Cell 1, 428–442 (2007). 10.1016/j.stem.2007.07.005
- 14. Laurenti E & Gottgens B From haematopoietic stem cells to complex differentiation landscapes. Nature 553, 418–426 (2018). 10.1038/nature25022
- 15. Velten L et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat Cell Biol 19, 271–281 (2017). 10.1038/ncb3493
- 16. Liggett LA & Sankaran VG Unraveling Hematopoiesis through the Lens of Genomics. Cell 182, 1384–1400 (2020). 10.1016/j.cell.2020.08.030
- 17. Giladi A et al. Single-cell characterization of haematopoietic progenitors and their trajectories in homeostasis and perturbed haematopoiesis. Nat Cell Biol 20, 836–846 (2018). 10.1038/s41556-018-0121-4
- 18. Swanson E et al. Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq. Elife 10 (2021). 10.7554/eLife.63632
- 19. Zhang X et al. An immunophenotype-coupled transcriptomic atlas of human hematopoietic progenitors. Nat Immunol 25, 703–715 (2024). 10.1038/s41590-024-01782-4
- 20. Solomon M et al. Slow cycling and durable Flt3+ progenitors contribute to hematopoiesis under native conditions. J Exp Med 221 (2024). 10.1084/jem.20231035
- 21. Li Y et al. Single-Cell Analysis of Neonatal HSC Ontogeny Reveals Gradual and Uncoordinated Transcriptional Reprogramming that Begins before Birth. Cell Stem Cell 27, 732–747 e737 (2020). 10.1016/j.stem.2020.08.001
- 22. Rodriguez-Fraticelli AE et al. Single-cell lineage tracing unveils a role for TCF15 in haematopoiesis. Nature 583, 585–589 (2020). 10.1038/s41586-020-2503-6
- 23. Schlitzer A et al. Identification of cDC1− and cDC2-committed DC progenitors reveals early lineage priming at the common DC progenitor stage in the bone marrow. Nat Immunol 16, 718–728 (2015). 10.1038/ni.3200
- 24. Dahlin JS et al. A single-cell hematopoietic landscape resolves 8 lineage trajectories and defects in Kit mutant mice. Blood 131, e1–e11 (2018). 10.1182/blood-2017-12-821413
- 25. Zhou W et al. Single-Cell Analysis Reveals Regulatory Gene Expression Dynamics Leading to Lineage Commitment in Early T Cell Development. Cell Syst 9, 321–337 e329 (2019). 10.1016/j.cels.2019.09.008
- 26. Venkatasubramanian M, Chetal K, Schnell DJ, Atluri G & Salomonis N Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF. Bioinformatics 36, 3773–3780 (2020). 10.1093/bioinformatics/btaa201
- 27. Li G et al. Decision level integration of unimodal and multimodal single cell data with scTriangulate. Nat Commun 14, 406 (2023). 10.1038/s41467-023-36016-y
- 28. Fast EM et al. External signals regulate continuous transcriptional states in hematopoietic stem cells. Elife 10 (2021). 10.7554/eLife.66512
- 29. Harly C et al. The transcription factor TCF-1 enforces commitment to the innate lymphoid cell lineage. Nat Immunol 20, 1150–1160 (2019). 10.1038/s41590-019-0445-7
- 30. Ishizuka IE, Constantinides MG, Gudjonson H & Bendelac A The Innate Lymphoid Cell Precursor. Annu Rev Immunol 34, 299–316 (2016). 10.1146/annurev-immunol-041015-055549
- 31. Lee RD et al. Single-cell analysis identifies dynamic gene expression networks that govern B cell development and transformation. Nat Commun 12, 6843 (2021). 10.1038/s41467-021-27232-5
- 32. Liu Z et al. Dendritic cell type 3 arises from Ly6C(+) monocyte-dendritic cell progenitors. Immunity 56, 1761–1777 e1766 (2023). 10.1016/j.immuni.2023.07.001
- 33. Rodriguez-Rodriguez N et al. Identification of aceNKPs, a committed common progenitor population of the ILC1 and NK cell continuum. Proc Natl Acad Sci U S A 119, e2203454119 (2022). 10.1073/pnas.2203454119
- 34. Sommerkamp P et al. Mouse multipotent progenitor 5 cells are located at the interphase between hematopoietic stem and progenitor cells. Blood 137, 3218–3224 (2021). 10.1182/blood.2020007876
- 35. Winkler IG et al. Bone marrow macrophages maintain hematopoietic stem cell (HSC) niches and their depletion mobilizes HSCs. Blood 116, 4815–4828 (2010). 10.1182/blood-2009-11-253534
- 36. Wattrus SJ et al. Quality assurance of hematopoietic stem cells by macrophages determines stem cell clonality. Science 377, 1413–1419 (2022). 10.1126/science.abo4837
- 37. Dutertre CA et al. Single-Cell Analysis of Human Mononuclear Phagocytes Reveals Subset-Defining Markers and Identifies Circulating Inflammatory Dendritic Cells. Immunity 51, 573–589 e578 (2019). 10.1016/j.immuni.2019.08.008
- 38. Wang H et al. A reporter mouse reveals lineage-specific and heterogeneous expression of IRF8 during lymphoid and myeloid cell differentiation. J Immunol 193, 1766–1777 (2014). 10.4049/jimmunol.1301939
- 39. Balazs AB, Fabian AJ, Esmon CT & Mulligan RC Endothelial protein C receptor (CD201) explicitly identifies hematopoietic stem cells in murine bone marrow. Blood 107, 2317–2321 (2006). 10.1182/blood-2005-06-2249
- 40. Benveniste P et al. Intermediate-term hematopoietic stem cells with extended but time-limited reconstitution potential. Cell Stem Cell 6, 48–58 (2010). 10.1016/j.stem.2009.11.014
- 41. Hamey FK et al. Single-cell molecular profiling provides a high-resolution map of basophil and mast cell development. Allergy 76, 1731–1742 (2021). 10.1111/all.14633
- 42. Iwasaki H et al. Identification of eosinophil lineage-committed progenitors in the murine bone marrow. J Exp Med 201, 1891–1897 (2005). 10.1084/jem.20050548
- 43. Kiel MJ et al. SLAM family receptors distinguish hematopoietic stem and progenitor cells and reveal endothelial niches for stem cells. Cell 121, 1109–1121 (2005). 10.1016/j.cell.2005.05.026
- 44. Kwok I et al. Combinatorial Single-Cell Analyses of Granulocyte-Monocyte Progenitor Heterogeneity Reveals an Early Uni-potent Neutrophil Progenitor. Immunity 53, 303–318 e305 (2020). 10.1016/j.immuni.2020.06.005
- 45. Liu Z et al. Fate Mapping via Ms4a3-Expression History Traces Monocyte-Derived Cells. Cell 178, 1509–1525 e1519 (2019). 10.1016/j.cell.2019.08.009
- 46. Pietras EM et al. Functionally Distinct Subsets of Lineage-Biased Multipotent Progenitors Control Blood Production in Normal and Regenerative Conditions. Cell Stem Cell 17, 35–46 (2015). 10.1016/j.stem.2015.05.003
- 47. Solomon M, DeLay M & Reynaud D Phenotypic Analysis of the Mouse Hematopoietic Hierarchy Using Spectral Cytometry: From Stem Cell Subsets to Early Progenitor Compartments. Cytometry A 97, 1057–1065 (2020). 10.1002/cyto.a.24041
- 48. Wilson A et al. Hematopoietic stem cells reversibly switch from dormancy to self-renewal during homeostasis and repair. Cell 135, 1118–1129 (2008). 10.1016/j.cell.2008.10.048
- 49. Pedersen CB et al. cyCombine allows for robust integration of single-cell cytometry datasets within and across technologies. Nat Commun 13, 1698 (2022). 10.1038/s41467-022-29383-5
- 50. Triana S et al. Single-cell proteo-genomic reference maps of the hematopoietic system enable the purification and massive profiling of precisely defined cell states. Nat Immunol 22, 1577–1589 (2021). 10.1038/s41590-021-01059-0
- 51. DePasquale EAK et al. cellHarmony: cell-level matching and holistic comparison of single-cell transcriptomes. Nucleic Acids Res 47, e138 (2019). 10.1093/nar/gkz789
- 52. Hao Y et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol 42, 293–304 (2024). 10.1038/s41587-023-01767-y
- 53. Zhu B et al. Robust single-cell matching and multimodal analysis using shared and distinct features. Nat Methods 20, 304–315 (2023). 10.1038/s41592-022-01709-7
- 54. Chen T & Guestrin C XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 10 (2016).
- 55. Korsunsky I et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16, 1289–1296 (2019). 10.1038/s41592-019-0619-0
- 56. Trevino AE et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069 e5023 (2021). 10.1016/j.cell.2021.07.039
- 57. Shrikumar A et al. Technical Note on Transcription Factor Motif Discovery from Importance Scores (TF-MoDISco) version 0.5.6.5. arXiv (2018).
- 58. Weirauch MT et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443 (2014). 10.1016/j.cell.2014.08.009
- 59. Chen C et al. Spatial Genome Re-organization between Fetal and Adult Hematopoietic Stem Cells. Cell Rep 29, 4200–4211 e4207 (2019). 10.1016/j.celrep.2019.11.065
- 60. Kamimoto K et al. Dissecting cell identity via network inference and in silico gene perturbation. Nature 614, 742–751 (2023). 10.1038/s41586-022-05688-9
- 61. ENCODE Project Consortium An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012). 10.1038/nature11247
- 62. Layer RM et al. GIGGLE: a search engine for large-scale integrated genome analysis. Nat Methods 15, 123–126 (2018). 10.1038/nmeth.4556
- 63. Taing L et al. Cistrome Data Browser: integrated search, analysis and visualization of chromatin data. Nucleic Acids Res 52, D61–D66 (2024). 10.1093/nar/gkad1069
- 64. Brass AL, Kehrli E, Eisenbeis CF, Storb U & Singh H Pip, a lymphoid-restricted IRF, contains a regulatory domain that is important for autoinhibition and ternary complex formation with the Ets factor PU.1. Genes Dev 10, 2335–2347 (1996). 10.1101/gad.10.18.2335
- 65. Kong W et al. Capybara: A computational tool to measure cell identity and fate transitions. Cell Stem Cell 29, 635–649 e611 (2022). 10.1016/j.stem.2022.03.001
- 66. Wolf FA et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol 20, 59 (2019). 10.1186/s13059-019-1663-x
- 67. Sanjuan-Pla A et al. Platelet-biased stem cells reside at the apex of the haematopoietic stem-cell hierarchy. Nature 502, 232–236 (2013). 10.1038/nature12495
- 68. Jindal K et al. Single-cell lineage capture across genomic modalities with CellTag-multi reveals fate-specific gene regulatory changes. Nat Biotechnol (2023). 10.1038/s41587-023-01931-4
- 69. Poulos MG et al. Vascular Platform to Define Hematopoietic Stem Cell Factors and Enhance Regenerative Hematopoiesis. Stem Cell Reports 5, 881–894 (2015). 10.1016/j.stemcr.2015.08.018
- 70. Weinreb C & Klein AM Lineage reconstruction from clonal correlations. Proc Natl Acad Sci U S A 117, 17041–17048 (2020). 10.1073/pnas.2000238117
- 71. Li L et al. A mouse model with high clonal barcode diversity for joint lineage, transcriptomic, and epigenomic profiling in single cells. Cell 186, 5183–5199 e5122 (2023). 10.1016/j.cell.2023.09.019
- 72. Ahmed N et al. A Novel GATA2 Protein Reporter Mouse Reveals Hematopoietic Progenitor Cell Types. Stem Cell Reports 15, 326–339 (2020). 10.1016/j.stemcr.2020.06.008
- 73. Laslo P et al. Multilineage transcriptional priming and determination of alternate hematopoietic cell fates. Cell 126, 755–766 (2006). 10.1016/j.cell.2006.06.052
- 74. Obata-Ninomiya K, Domeier PP & Ziegler SF Basophils and Eosinophils in Nematode Infections. Front Immunol 11, 583824 (2020). 10.3389/fimmu.2020.583824
- 75. Herbert DR et al. Intestinal epithelial cell secretion of RELM-beta protects against gastrointestinal worm infection. J Exp Med 206, 2947–2957 (2009). 10.1084/jem.20091268
- 76. Martin RK et al. B1 Cell IgE Impedes Mast Cell-Mediated Enhancement of Parasite Expulsion through B2 IgE Blockade. Cell Rep 22, 1824–1834 (2018). 10.1016/j.celrep.2018.01.048
- 77. Coffman RL, Seymour BW, Hudak S, Jackson J & Rennick D Antibody to interleukin-5 inhibits helminth-induced eosinophilia in mice. Science 245, 308–310 (1989). 10.1126/science.2787531
- 78. Fulkerson PC & Rothenberg ME Targeting eosinophils in allergy, inflammation and beyond. Nat Rev Drug Discov 12, 117–129 (2013). 10.1038/nrd3838
- 79. Isobe T et al. Preleukemic single-cell landscapes reveal mutation-specific mechanisms and gene programs predictive of AML patient outcomes. Cell Genom 3, 100426 (2023). 10.1016/j.xgen.2023.100426
- 80. Stepanchick E et al. DDX41 haploinsufficiency causes inefficient hematopoiesis under stress and cooperates with p53 mutations to cause hematologic malignancy. Leukemia 38, 1787–1798 (2024). 10.1038/s41375-024-02304-9
- 81. Sturgess K et al. Pharmacological inhibition of METTL3 impacts specific haematopoietic lineages. Leukemia 37, 2133–2137 (2023). 10.1038/s41375-023-01965-2
- 82. Abdelhamed S et al. Mutant Samd9l expression impairs hematopoiesis and induces bone marrow failure in mice. J Clin Invest 132 (2022). 10.1172/JCI158869
- 83. Vukadin L et al. A mouse model of Zhu-Tokita-Takenouchi-Kim syndrome reveals indispensable SON functions in organ development and hematopoiesis. JCI Insight 9 (2024). 10.1172/jci.insight.175053
- 84. Williams MJ et al. Maintenance of hematopoietic stem cells by tyrosine-unphosphorylated STAT5 and JAK inhibition. Blood Adv 9, 291–309 (2025). 10.1182/bloodadvances.2024014046
- 85. Kain BN et al. Hematopoietic stem and progenitor cells confer cross-protective trained immunity in mouse models. iScience 26, 107596 (2023). 10.1016/j.isci.2023.107596
- 86. Wang M et al. Genotoxic aldehyde stress prematurely ages hematopoietic stem cells in a p53-driven manner. Mol Cell 83, 2417–2433 e2417 (2023). 10.1016/j.molcel.2023.05.035
- 87. Herault L et al. Single-cell RNA-seq reveals a concomitant delay in differentiation and cell cycle of aged hematopoietic stem cells. BMC Biol 19, 19 (2021). 10.1186/s12915-021-00955-z
- 88. Mitchell CA et al. Stromal niche inflammation mediated by IL-1 signalling is a targetable driver of haematopoietic ageing. Nat Cell Biol 25, 30–41 (2023). 10.1038/s41556-022-01053-0
- 89. Garner H et al. Understanding and reversing mammary tumor-driven reprogramming of myelopoiesis to reduce metastatic spread. Cancer Cell 43, 1279–1295 e1279 (2025). 10.1016/j.ccell.2025.04.007
- 90. Auer F et al. Trajectories from single-cells to PAX5-driven leukemia reveal PAX5-MYC interplay in vivo. Leukemia 39, 1607–1626 (2025). 10.1038/s41375-025-02626-2
- 91.Zheng Z et al. The ATF4-RPS19BP1 axis modulates ribosome biogenesis to promote erythropoiesis. Blood 144, 742–756 (2024). 10.1182/blood.2023021901 [DOI] [PubMed] [Google Scholar]
- 92.Wheat JC et al. Single-molecule imaging of transcription dynamics in somatic stem cells. Nature 583, 431–436 (2020). 10.1038/s41586-020-2432-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Singh PNP et al. Transcription factor dynamics, oscillation, and functions in human enteroendocrine cell differentiation. Cell Stem Cell 31, 1038–1057 e1011 (2024). 10.1016/j.stem.2024.04.015 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Iwasaki H et al. The order of expression of transcription factors directs hierarchical specification of hematopoietic lineages. Genes Dev 20, 3010–3021 (2006). 10.1101/gad.1493506 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95.Cantor AB et al. Antagonism of FOG-1 and GATA factors in fate choice for the mast cell lineage. J Exp Med 205, 611–624 (2008). 10.1084/jem.20070544 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96.Johnson KD et al. Constructing and deconstructing GATA2-regulated cell fate programs to establish developmental trajectories. J Exp Med 217 (2020). 10.1084/jem.20191526 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 97.Lin DS et al. A multi-track landscape of haematopoiesis informed by cellular barcoding and agent-based modelling. bioRxiv, 2024.2003.2028.587126 (2024). 10.1101/2024.03.28.587126 [DOI] [Google Scholar]
- 98.Yamamoto R et al. Clonal analysis unveils self-renewing lineage-restricted progenitors generated directly from hematopoietic stem cells. Cell 154, 1112–1126 (2013). 10.1016/j.cell.2013.08.007 [DOI] [PubMed] [Google Scholar]
- 99.Constantinides MG, McDonald BD, Verhoef PA & Bendelac A A committed precursor to innate lymphoid cells. Nature 508, 397–401 (2014). 10.1038/nature13047 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 100.Sasmono RT et al. A macrophage colony-stimulating factor receptor-green fluorescent protein transgene is expressed throughout the mononuclear phagocyte system of the mouse. Blood 101, 1155–1163 (2003). 10.1182/blood-2002-02-0569 [DOI] [PubMed] [Google Scholar]
- 101.Huang CY, Bredemeyer AL, Walker LM, Bassing CH & Sleckman BP Dynamic regulation of c-Myc proto-oncogene expression during lymphocyte development revealed by a GFP-c-Myc knock-in mouse. Eur J Immunol 38, 342–349 (2008). 10.1002/eji.200737972 [DOI] [PubMed] [Google Scholar]
- 102.Gazit R et al. Lethal influenza infection in the absence of the natural killer cell receptor gene Ncr1. Nat Immunol 7, 517–523 (2006). 10.1038/ni1322 [DOI] [PubMed] [Google Scholar]
- 103.Yang Q et al. TCF-1 upregulation identifies early innate lymphoid progenitors in the bone marrow. Nat Immunol 16, 1044–1050 (2015). 10.1038/ni.3248 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 104.Basak O et al. Mapping early fate determination in Lgr5+ crypt stem cells using a novel Ki67-RFP allele. EMBO J 33, 2057–2068 (2014). 10.15252/embj.201488017 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105.Madisen L et al. A robust and high-throughput Cre reporting and characterization system for the whole mouse brain. Nat Neurosci 13, 133–140 (2010). 10.1038/nn.2467 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 106.Heinrich AC, Pelanda R & Klingmuller U A mouse model for visualization and conditional mutations in the erythroid lineage. Blood 104, 659–666 (2004). 10.1182/blood-2003-05-1442 [DOI] [PubMed] [Google Scholar]
- 107.Passegue E, Wagner EF & Weissman IL JunB deficiency leads to a myeloproliferative disorder arising from hematopoietic stem cells. Cell 119, 431–443 (2004). 10.1016/j.cell.2004.10.010 [DOI] [PubMed] [Google Scholar]
- 108.Caton ML, Smith-Raska MR & Reizis B Notch-RBP-J signaling controls the homeostasis of CD8− dendritic cells in the spleen. J Exp Med 204, 1653–1664 (2007). 10.1084/jem.20062648 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 109.Kirstetter P, Anderson K, Porse BT, Jacobsen SE & Nerlov C Activation of the canonical Wnt pathway leads to loss of hematopoietic stem cell repopulation and multilineage differentiation block. Nat Immunol 7, 1048–1056 (2006). 10.1038/ni1381 [DOI] [PubMed] [Google Scholar]
- 110.Hoppe PS et al. Early myeloid lineage choice is not initiated by random PU.1 to GATA1 protein ratios. Nature 535, 299–302 (2016). 10.1038/nature18320 [DOI] [PubMed] [Google Scholar]
- 111.Thambyrajah R et al. GFI1 proteins orchestrate the emergence of haematopoietic stem cells through recruitment of LSD1. Nat Cell Biol 18, 21–32 (2016). 10.1038/ncb3276 [DOI] [PubMed] [Google Scholar]
- 112.Seehus CR et al. The development of innate lymphoid cells requires TOX-dependent generation of a common innate lymphoid cell progenitor. Nat Immunol 16, 599–608 (2015). 10.1038/ni.3168 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 113.Doyle AD et al. Homologous recombination into the eosinophil peroxidase locus generates a strain of mice expressing Cre recombinase exclusively in eosinophils. J Leukoc Biol 94, 17–24 (2013). 10.1189/jlb.0213089 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 114.Abe T et al. Visualization of cell cycle in mouse embryos with Fucci2 reporter directed by Rosa26 promoter. Development 140, 237–246 (2013). 10.1242/dev.084111 [DOI] [PubMed] [Google Scholar]
- 115.Young MD & Behjati S SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. Gigascience 9 (2020). 10.1093/gigascience/giaa151 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 116.Gayoso A et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat Methods 18, 272–282 (2021). 10.1038/s41592-020-01050-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 117.Caron DP et al. Multimodal hierarchical classification of CITE-seq data delineates immune cell states across lineages and tissues. Cell Rep Methods 5, 100938 (2025). 10.1016/j.crmeth.2024.100938 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 118.Jindal K et al. Single-cell lineage capture across genomic modalities with CellTag-multi reveals fate-specific gene regulatory changes. Nat Biotechnol 42, 946–959 (2024). 10.1038/s41587-023-01931-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 119.Weinreb C, Rodriguez-Fraticelli A, Camargo FD & Klein AM Lineage tracing on transcriptional landscapes links state to fate during differentiation. Science 367 (2020). 10.1126/science.aaw3381 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 120.Wolf FA, Angerer P & Theis FJ SCANPY: large-scale single-cell gene expression data analysis. Genome Biol 19, 15 (2018). 10.1186/s13059-017-1382-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 121.Saelens W, Cannoodt R, Todorov H & Saeys Y A comparison of single-cell trajectory inference methods. Nat Biotechnol 37, 547–554 (2019). 10.1038/s41587-019-0071-9 [DOI] [PubMed] [Google Scholar]
- 122.Ritchie ME et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43, e47 (2015). 10.1093/nar/gkv007 [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
All genomics and flow cytometry data (raw and processed) have been deposited in open-access repositories:
GEO Submission: GSE266609
Synapse: syn60529836
Murine Hematopoietic CITE-seq interactive browsers: https://altanalyze.org/MarrowAtlas/
Murine MultiLin GRN visualization web application: https://multilin-grn-viewer-6c053f707717.herokuapp.com/
