This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2025 Sep 4:2025.09.01.673389. [Version 1] doi: 10.1101/2025.09.01.673389

An Inflammatory and Quiescent HSC Subpopulation Expands with Age in Humans

Ksenia R Safina 1,2,3,4, Dylan A Kotliar 3,5,6,7, Michelle Curtis 3,5,6, Jonathan D Good 1,2,3,4, Chen Weng 8,9,10,11, Shawn David 12,13, Soumya Raychaudhuri 3,5,6,7,14, Antonia Kreso 3,15,16, Jennifer Trowbridge 12, Vijay G Sankaran 3,8,10,11,17, Peter van Galen 1,2,3,4,17
PMCID: PMC12424952  PMID: 40949975

Abstract

Aging of the blood system impacts systemic health and can be traced to hematopoietic stem cells (HSCs). Despite multiple reports on human HSC aging, a unified map detailing their molecular age-related changes is lacking. We developed a consensus map of gene expression in HSCs by integrating seven single-cell datasets. This map revealed previously unappreciated heterogeneity within the HSC population. It also links inflammatory pathway activation (TNF/NFκB, AP-1) and quiescence within a single gene expression program. This program dominates an inflammatory HSC subpopulation that increases with age, highlighting a potential target for further experimental studies and anti-aging interventions.

Background

The aging blood system has a decreasing capacity to mount effective immune responses, transport oxygen, and produce lymphocytes [1–3]. Many of these changes can be traced to HSCs, which exhibit age-associated clonal expansion and myeloid bias [4–7]. Human studies across age groups have revealed multiple alterations including epigenetic reprogramming, decreased polarity, and skewed differentiation [8–11]. However, there is little consensus on the molecular programs consistently altered with age, in part due to technical variation across datasets [12]. Moreover, while the human HSC compartment is often analyzed as a whole, heterogeneity can be observed using single-cell sequencing [13,14]. The molecular basis of this heterogeneity, and how it evolves with age, remains poorly defined. To establish a robust reference for HSC aging and define the extent to which HSC heterogeneity changes with age, we undertook an effort to integrate human HSC aging studies. Our analysis reveals a consensus program of inflammation and quiescence in a subpopulation of HSCs that becomes increasingly dominant with age.

Results and Discussion

To robustly characterize age-associated changes in human HSCs, we combined single-cell/single-nucleus data across six studies and one unpublished dataset from 98 individuals and annotated 28,989 HSCs using the BoneMarrowMap atlas (Fig. 1a, Suppl. Table 1). An orthogonal approach of identifying and annotating clusters on the integrated object yielded 44,187 HSCs and included 87% of HSCs identified by BoneMarrowMap (Suppl. Fig. 1), suggesting that the BoneMarrowMap algorithm selects an accurate and stringent population of HSCs. All downstream analysis was performed on BoneMarrowMap-annotated HSCs.

Figure 1. Data from multiple single-cell datasets show that transcriptional heterogeneity of human HSCs changes with age.

a. Schematic shows dataset collection and HSC annotation. b. Volcano plot shows the differentially expressed genes in Young vs Aged samples across five datasets. AP-1 members are shown in red, genes with absolute logFC>1 and p.adjusted<0.05 are shown in blue; top-10 most significant hits per cohort and significant AP-1 members are labeled. c. Bar plot shows top-15 most significant gene sets in GSEA analysis; NES, normalized enrichment score. d. Uniform Manifold Approximation and Projections (UMAPs) show all integrated HSCs, colored by score for aging signature, Hallmark TNF-α/NF-κB signaling gene set, and Reactome DNA replication gene set. e. UMAP shows HSC clusters identified in Seurat. f. Box plot shows comparison of cluster abundances between the Young (n=27) and Aged (n=17) cohorts. Symbols indicate cluster fractions per sample; p-values of two-sided Wilcoxon test with Bonferroni correction are shown. g. UMAP is colored by neighborhood coefficient of correlation to Aged vs. Young cohorts; neighborhoods that didn’t pass FDR<10% are colored in gray. h. Heatmap shows activity of the top-5 regulons associated with the Aging cohort, identified by SCENIC; averaged Z-scores of AUC values are shown.

First, we inferred differential gene expression (DE) between young and aged samples with at least 20 HSCs (young: 19–37 y.o., n=25; aged: 60–87 y.o., n=15). In aged samples, 68 genes were upregulated and 29 were downregulated (Fig. 1b, Suppl. Table 2). Notably, three members of the AP-1 complex, JUN, JUNB, and FOSB, implicated in stress-activated MAPK signaling and quiescence [15–17], were consistently upregulated across studies (Suppl. Fig. 2).

Gene set enrichment analysis (GSEA) revealed two major biological phenomena distinguishing expression profiles of young and aged samples (Fig. 1c, Suppl. Table 2). First, several inflammatory-response pathways, including TNF, IFN-γ, and AP-1 signaling, were upregulated in aged samples. Second, aged samples showed higher quiescence, in contrast to young samples showing more active proliferation. These findings are in agreement with previous reports showing increased circulating TNF and delayed cell cycle entry of aged HSCs [9,18,19].

The activity of these pathways was not uniform across the population of HSCs (Fig. 1d, Suppl. Fig. 3). The aging signature (defined as 68 genes up-regulated in aged samples), together with TNF signaling, was more active in clusters 0 and 3 (Fig 1e, Suppl. Fig. 3b). These clusters were more abundant in Aged samples (Fig. 1f). DNA replication, E2F, and cell cycle signatures were most active in cluster 4, one of two clusters that were more abundant in Young samples (Suppl. Fig. 3b). Differential abundance of HSC populations between the Young and Aged cohorts was supported by cluster-independent co-varying neighborhood analysis (CNA, Fig. 1g). To explore this heterogeneity further, we next inferred the activity of transcription factors (TFs) using SCENIC. Despite substantial variability between datasets and individuals, combined analysis highlighted several AP-1 members as the top differentially active TFs in the aged cohort (Fig. 1h, Suppl. Fig. 4), in agreement with DE results. Altogether, these observations suggest that at least part of the observed transcriptional heterogeneity in HSCs can be attributed to age, prompting us to investigate its potential sources.

To capture heterogeneity across cells that may be missed by conventional DE analysis and clustering, we used consensus non-negative matrix factorization (cNMF). This cluster-free and unbiased approach decomposes each cell’s gene expression profile into a set of underlying gene expression programs (GEPs) active across cells in the experiment (Fig. 2a) [20]. We ran cNMF on 22 samples with at least 100 HSCs (n=14 young, n=8 aged) to identify GEPs reproducible across datasets. We then clustered GEPs from individual samples together, which identified four representative clusters carrying at least four programs coming from at least two datasets, yielding four meta-programs (metaGEPs) (Fig. 2b, Suppl. Fig. 5, Suppl. Table 3). To functionally annotate metaGEPs 1–4, we assessed the enrichment of relevant gene sets in each of the meta-programs (Fig. 2c). metaGEP1 was enriched for the aging signature, TNF signaling, and quiescence, indicating that the activity of these processes is linked within a single program; we termed metaGEP1 the inflammatory aging program. The three remaining programs were mainly associated with lineage commitment and were annotated accordingly. metaGEP2 was enriched for a granulocyte-macrophage progenitor (GMP) signature and for cell-cycle-related pathways. metaGEPs 3–4 showed enrichment for megakaryocyte-erythrocyte progenitor (MEP) and common lymphoid progenitor (CLP) signatures, respectively (Fig. 2c).

Figure 2. cNMF analysis recovers four gene expression meta-programs (metaGEPs) in human HSCs, including age-dependent programs.

a. Schematic shows cNMF analysis steps. b. Heatmap shows Pearson correlation between Z-score vectors of individual programs comprising the four identified metaGEPs. c. Heatmap shows relevant pathways enriched in metaGEPs as inferred by GSEA; BH-corrected p-values: *, p<0.05, **, p<0.01, ***, p<0.001. d. UMAP shows all integrated HSCs, colored by the predominantly active metaGEP. e. Box plot shows comparison of predominant program abundances between the Young (n=27) and Aged (n=17) cohorts. Symbols indicate cluster fractions per sample; p-values of two-sided Wilcoxon test with Bonferroni correction are shown. f. Heatmap shows activity of metaGEP-specific regulons, identified by SCENIC; averaged Z-scores of AUC values are shown. GEP: gene expression program, GMP: granulocyte-macrophage progenitor, MEP: megakaryocyte-erythrocyte progenitor, CLP: common lymphoid progenitor.

Similar to our prior DE-based observations (Fig. 1d, Suppl. Fig. 3b), the activity of metaGEPs was not uniform across the population of HSCs and suggested that HSCs are composed of subpopulations of cells governed by different programs (Suppl. Fig. 6). To define these HSC subpopulations, we identified the predominant program in each cell of the original integrated object (Fig. 2d). To investigate whether the activity of metaGEPs 1–4 changes with age, we compared the fractions of cells predominated by each of the four programs in every sample between the young and aged cohorts with at least 20 HSCs (n=27 young, n=17 aged, Fig. 2e). Inflammatory aging metaGEP1 was more abundant in aged samples, while CLP-associated metaGEP4 showed higher activity in the young cohort. The latter agrees with lower lymphoid output with age [1,9]. Overlaying metaGEP activities with TF activities inferred by SCENIC revealed nine TFs whose activity was highly and consistently correlated with metaGEP1 usages across the datasets, including seven AP-1 complex members and two TFs implicated in maintaining quiescence, KLF4 and EGR1 [21,22] (Fig. 2f, Suppl. Fig. 7). These findings indicate metaGEP1 underlies age-associated differences in human HSCs and reveal molecular features of an inflammatory and quiescent HSC population that expands with age.

In this study, we aggregated data from seven single-cell and single-nucleus studies and consistently annotated 28,989 HSCs across individuals, including 27 young and 17 aged samples with at least 20 HSCs, and identified four meta-programs that drive transcriptional heterogeneity through variable activity across HSCs. The largest meta-program we recovered, metaGEP1, was enriched for TNF signaling and quiescence, and marked a subpopulation of HSCs that expanded with age. These findings indicate that HSC aging is not uniform but is instead shaped by shifting program activities.

Inflammation has been linked to HSC activation and exhaustion [23–25]. In contrast, our data link inflammatory signaling and quiescence within a single program associated with aging, more reminiscent of chronic inflammation which can induce quiescence [26,27]. With the emerging concept that acute inflammatory stimuli induce trained immunity in HSCs [13,28,29], an outstanding question is to what extent metaGEP1-dominated HSCs generate pro-inflammatory immune cells that contribute to aging.

Members of the AP-1 complex were among the top-enriched transcription factors in both DE and cNMF analysis. While the activity of immediate early response (IER) genes, including members of AP-1, can be prone to technical variation due to sample processing [30], our finding that AP-1 genes are consistently upregulated across several datasets implies biological relevance. Along with other genes highlighted in our analysis, including anti-inflammatory ZFP36 and NFKBIA [31,32], perturbation of these factors may modulate HSC aging.

The cluster-independent framework we used to discover the biology of aging HSCs can be easily adapted to larger datasets. In the present study, only 22 samples had >100 HSCs from which to discover GEPs, and larger samples tended to resolve more programs successfully (Suppl. Fig. 8). Analyzing more samples with larger cell counts will improve the numerical stability of NMF solutions and may allow for more granular resolution of HSC programs. Finally, since inflammatory pathways in HSCs are affected by clonal hematopoiesis [14], future work should assess how genetic changes interact with the programs identified here.

Conclusions

Our study provides a consensus, single-cell resolution map of human HSCs by integrating seven prior datasets. Among the most robust changes in aging HSCs are upregulation of pathways associated with inflammatory signaling (TNF, IFN-γ), stress-activated MAPK signaling (AP-1), and quiescence. We use a cluster-free approach to uncover activity programs that vary in intensity across individual HSCs. The strongest program shaping HSC heterogeneity is associated with inflammation and quiescence, linking these properties together in individual cells. An inflammatory HSC subset dominated by this program expands with age. These findings open new directions for testing whether modulating inflammatory HSCs can rejuvenate blood production and help maintain healthy hematopoiesis with age.

Methods

Single-cell sequencing

We generated one of the seven single-cell RNA-seq datasets ourselves (others were publicly available). Bone marrow cells from nine donors were collected from the iliac crest of patients undergoing cardiac surgery under an excess sample banking and sequencing protocol that covers all study procedures and was approved by the Institutional Review Board (IRB) of Mass General Brigham. Donors were confirmed negative for common CHIP mutations using targeted sequencing. Mononuclear cells were isolated using Ficoll or lymphoprep and cryopreserved in liquid nitrogen storage. Cells were thawed using standard procedures, and viable (DAPI negative) cells were sorted on a Sony SH800 flow cytometer. Next, 10,000–15,000 cells were loaded onto a 10x Genomics chip. Further processing was done using the recommended procedures for the 10x Genomics 3’ v3.0, v3.1, or v4 chemistry. Libraries were sequenced on the NovaSeq SP 100 cycle with the following parameters (Read 1: 28 + Read 2: 75 + Index 1 (i7): 10 + Index 2 (i5): 10). Count matrices were generated using CellRanger v.7.1.0 with default settings and GRCh38 as the reference genome.

Dataset preparation and annotation

We compiled six publicly available human single-cell datasets and added nine samples from our lab. The complete dataset contained bone marrow cells from 98 individuals (9 prenatal, 2 cord blood, 4 infant, 7 child, 31 young, 15 middle-aged, and 30 aged) [4,11,33–36]. Gene expression matrices were available for each of the datasets, except for the Adelman dataset and some samples in the Weng dataset. For the Adelman dataset, we mapped sequencing data onto hg38 using STAR [37] with default settings and quantified gene counts using featureCounts from the Rsubread package [38] to produce a gene expression matrix. For the Weng dataset, we prepared the input object as described in the GitHub repository petervangalen/ReDeeM_2024. For each of the seven datasets, we loaded the count matrix into Seurat [39], keeping genes captured in at least five cells and cells with at least 100 genes captured. We then filtered the initial matrix based on nCount, nGene, mitochondrial gene content, and doublet scores (as determined by scrubletR [40]) for each sample in the dataset, removing differentiated cell populations where applicable. The accession codes and dataset-specific filtering parameters are available in Supplementary Table 1. We then subsetted each of the Seurat objects for genes present in all of the datasets (n=11,612). To obtain concordant cell type annotations among the datasets, we mapped each of the datasets onto the BoneMarrowMap atlas and transferred cell type labels [41]. For subsequent analyses, we only retained cells annotated as HSCs. 30,070 cells were annotated as HSCs; of those, 1,081 cells were mapped outside the reference HSC cluster and were excluded from the analysis, leaving us with 28,989 HSCs.

To cross-validate BoneMarrowMap-based HSC assignment, we integrated the seven HSPC datasets with scVI [42] (v1.3.0, n_layers=4, n_latent=30, max_epochs=60), specifying sample name as a batch key and dataset name and single-cell vs. single-nucleus data type as categorical covariates (batch_key=‘Sample’, categorical_covariate_keys=[‘Dataset’, ‘data_type’]), uploaded the integrated object to Seurat, computed a neighborhood graph on scVI latent variables, identified Louvain clusters (resolution=1.5), and computed a UMAP. We scored each cell by three published HSC signatures [43–45], computed average signature scores per cluster (aggregating across cells and three public signatures), and selected four clusters with the highest average HSC score as HSCs. This yielded 44,187 cells and included 87% of HSCs identified by BoneMarrowMap, implying the two annotation approaches are concordant, with BoneMarrowMap being more specific.
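
The cluster-selection step described above can be illustrated with a minimal sketch. It assumes that per-cell signature scores and cluster labels have already been computed elsewhere; the function name `top_hsc_clusters` and its inputs are hypothetical and are not taken from the study's code.

```python
from collections import defaultdict
from statistics import mean


def top_hsc_clusters(cell_clusters, cell_scores, n_top=4):
    """Rank clusters by their average HSC signature score.

    cell_clusters: one cluster label per cell.
    cell_scores:   one tuple per cell, holding one score per signature.
    Returns the n_top highest-scoring clusters and all cluster means.
    """
    per_cluster = defaultdict(list)
    for cluster, scores in zip(cell_clusters, cell_scores):
        # Average across signatures first (here: three public signatures).
        per_cluster[cluster].append(mean(scores))
    # Then average across the cells of each cluster.
    cluster_means = {c: mean(v) for c, v in per_cluster.items()}
    ranked = sorted(cluster_means, key=cluster_means.get, reverse=True)
    return ranked[:n_top], cluster_means
```

With `n_top=4`, the returned labels correspond to the four clusters that would be called HSCs under this reading of the procedure.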

Differential expression analysis

To identify gene expression changes associated with age, we pseudobulked HSCs per sample and ran DESeq2 [46] using dataset name and sample sex as covariates. Race and ethnicity were not reported in publicly available datasets and therefore not included as covariates. To ensure the robustness of the analysis, we only used samples with at least 20 cells and datasets with at least two such samples in both Young and Aged cohorts (n=25 young and n=15 aged samples across five datasets). We obtained results for the ‘Cohort_Aged_vs_Young’ coefficient and shrunken log fold changes using apeglm [47]. To define the aging signature, we used genes with logFC>1 and p.adj<0.05, yielding 68 genes.
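
As an illustration of the two bookkeeping steps around the DESeq2 call, the sketch below aggregates counts per sample and applies the signature thresholds. It assumes that pseudobulking means summing raw counts over the cells of a sample (a common convention, not stated explicitly above), and the function names and input layouts are hypothetical.

```python
from collections import defaultdict


def pseudobulk(cell_samples, cell_counts, min_cells=20):
    """Sum per-cell count vectors within each sample.

    Assumption: pseudobulk = sum of raw counts. Samples with fewer than
    min_cells cells are dropped, mirroring the 20-cell cutoff in the text.
    """
    sums, n_cells = {}, defaultdict(int)
    for sample, counts in zip(cell_samples, cell_counts):
        n_cells[sample] += 1
        if sample not in sums:
            sums[sample] = list(counts)
        else:
            sums[sample] = [a + b for a, b in zip(sums[sample], counts)]
    return {s: v for s, v in sums.items() if n_cells[s] >= min_cells}


def aging_signature(de_results, lfc_min=1.0, padj_max=0.05):
    """Select genes with logFC > 1 and adjusted p < 0.05.

    de_results: dict mapping gene -> (log fold change, adjusted p-value),
    e.g. exported from a DESeq2 results table.
    """
    return sorted(gene for gene, (lfc, padj) in de_results.items()
                  if lfc > lfc_min and padj < padj_max)
```

The differential expression model itself (covariates, shrinkage with apeglm) is not reproduced here; only the inputs and the final thresholding are sketched.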

Gene set enrichment analysis

We used the fgsea package [48] to conduct gene set enrichment analysis (GSEA) and characterize ranked gene lists produced in this study. We used two collections of signatures from MSigDB [49–51], HALLMARK and C2:CP. We obtained a quiescence signature from García-Prat et al. by selecting genes with logFC < −2 and FDR<0.00001 from the DE results of Table S1 therein [52]. To define MEP, GMP, and CLP lineage signatures, we identified markers of MEP, Early GMP, and CLP populations of the BoneMarrowMap-annotated object, respectively (using Seurat’s FindMarkers function, providing HSC, MPP-MkEry and MPP-MyLy as ident.2). For each lineage, we selected top-50 genes with p.adj<1e-10 and logFC>1, and excluded genes overlapping between the three 50-gene sets. To annotate cNMF metaGEPs, we also included the aging signature (68 genes) identified in the DE analysis of this study. To characterize the recovery of the aging signature compared to a random signature, we generated 20 random signatures with an expression pattern similar to the aging signature and included them in GSEA (as detailed below).
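
The overlap-removal step for the lineage signatures can be sketched as follows. The sketch assumes that "overlapping" means a gene appearing in more than one of the three lists; the helper name `exclusive_signatures` is hypothetical.

```python
from collections import Counter


def exclusive_signatures(signatures):
    """Drop genes that occur in more than one signature.

    signatures: dict mapping lineage name -> list of marker genes
    (for example the top-50 MEP, GMP, and CLP markers).
    Gene order within each list is preserved.
    """
    counts = Counter(g for genes in signatures.values() for g in set(genes))
    return {name: [g for g in genes if counts[g] == 1]
            for name, genes in signatures.items()}
```

After this step each lineage signature contains fewer than 50 genes whenever any marker was shared between lineages.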

Variable genes identification

To identify variable genes shared across the seven datasets, we first excluded ribosomal genes (n=94 genes starting with RPS or RPL), then identified top 3,000 variable features within each dataset using FindVariableFeatures() in Seurat, and finally, selected the genes identified as variable in at least 4 datasets out of 7 (n=1,554). We then subsetted the expression count matrix for these genes and used it as an input for scVI and cNMF.
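
A minimal sketch of the consensus step, assuming the per-dataset lists of top variable features have already been produced (for example by FindVariableFeatures() in Seurat). The function names are hypothetical; the ribosomal filter is repeated here only as a safeguard.

```python
from collections import Counter


def is_ribosomal(gene):
    """Ribosomal genes are identified by the RPS/RPL prefix, as in the text."""
    return gene.startswith(("RPS", "RPL"))


def shared_variable_genes(per_dataset_hvgs, min_datasets=4):
    """Keep genes called variable in at least min_datasets datasets.

    per_dataset_hvgs: list with one list of variable genes per dataset.
    """
    counts = Counter(
        gene
        for hvgs in per_dataset_hvgs
        for gene in set(hvgs)          # count each gene once per dataset
        if not is_ribosomal(gene)
    )
    return sorted(g for g, n in counts.items() if n >= min_datasets)
```

With seven input lists and `min_datasets=4`, the output corresponds to the shared gene set used as input for scVI and cNMF.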

Dataset integration

We integrated datasets using scVI (v1.3.0, n_layers=3, n_latent=30), specifying sample name as a batch key and dataset name as a categorical covariate (batch_key=‘Sample’, categorical_covariate_keys=[‘Dataset’]), uploaded the integrated object to Seurat, computed a neighborhood graph on scVI latent variables, identified Louvain clusters (resolution=0.5), and computed a UMAP. Louvain clusters and UMAP coordinates were transferred to the original object with the entire gene set (11,612 genes), to score cells for gene sets of interest using AddModuleScore() (Fig. 1d, Suppl. Fig. 3a).

Consensus non-negative matrix factorization for inference of gene expression programs

We used consensus non-negative matrix factorization (cNMF) [20] to identify gene expression programs (GEPs) across datasets. To mitigate the effect of sample-driven batch effects, we identified GEPs at the level of individual samples and clustered the discovered programs to define meta-programs (metaGEPs) shared across multiple samples and datasets, similar to the approach taken by Gavish et al. [53] We used young or aged samples with at least 100 cells, resulting in 8 Ainciburu, 6 Weng, 7 Li, and 1 Zhang sample (total n=22). As cNMF is sensitive to cells with low gene counts and low number of genes, we additionally preprocessed each of the samples as follows. We preserved cells with at least 50 captured genes and the total number of transcripts within the sample-specific limits (see below), and genes expressed in at least 10 cells. Sample-specific total count limits were defined as median ± 2.5 mean absolute deviation for each sample except sample BM3 in the Zhang dataset. Data from Zhang et al. has an order of magnitude higher expression counts than other datasets with several cells having much higher counts than the median; these outlier cells drive individual programs in cNMF results, dominating the observed variation in expression data. To address this, we used a fixed interval for total counts of [100–10,000] for Zhang sample BM3.

After preprocessing, we ran cNMF on each sample, with 500 iterations of factorization and a k (the number of inferred GEPs) ranging from 3 to 10. For each sample, a single final value of k was selected based on the stability vs. error plot and visual inspection of 500 clustered factorization results. We assessed distance thresholds of 0.05, 0.1 and 0.15; distance threshold of 0.1 produced the most visually stable clusters and was used for all the samples. Among the smaller samples, individual cells sometimes drive some of the identified programs, similar to sample BM3 in Zhang dataset. To address this, for each program, we computed the ratio between 100% and 75% quantiles of usage values, and excluded programs with a ratio of more than 10 (which reflects that the maximum usage value is much larger than most usage values). After this filtering, 97 GEPs remained.

Identification and analysis of metaGEPs

We clustered Z-score spectra of 97 GEPs using the iterative clustering algorithm implemented in starCAT [54] with a modification allowing for more than one program per sample in a cluster (corr_thresh=0.1, pct_thresh=0.1). We defined meta-programs (metaGEPs) as clusters carrying at least four programs coming from at least two datasets. Z-score spectra within each metaGEP were averaged and annotated with GSEA. The variance-normalized TPM spectra within each metaGEP were averaged to produce the meta-program expression matrix (4 metaGEPs × 11,612 genes) which we used to infer metaGEP usages in each of the datasets with starCAT [54]. To compare usages between the young and aged cohorts, we defined a single predominant metaGEP per cell, computed the fractions of each predominant metaGEP in every sample with at least 20 cells (n=27 young and n=17 aged samples), and compared the fractions between the two age cohorts.
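
The per-sample comparison can be illustrated with a short sketch. It assumes that the predominant metaGEP of a cell is the one with the highest usage (ties resolved in favor of the first program); the function name and input layout are hypothetical.

```python
from collections import Counter, defaultdict


def predominant_fractions(cell_samples, cell_usages, min_cells=20):
    """Fraction of cells dominated by each metaGEP, per sample.

    cell_samples: one sample label per cell.
    cell_usages:  one list of metaGEP usages per cell (same length each).
    Samples with fewer than min_cells cells are skipped.
    """
    counts = defaultdict(Counter)
    for sample, usages in zip(cell_samples, cell_usages):
        # Index of the program with the highest usage in this cell.
        top = max(range(len(usages)), key=usages.__getitem__)
        counts[sample][top] += 1
    n_programs = len(cell_usages[0]) if cell_usages else 0
    fractions = {}
    for sample, counter in counts.items():
        total = sum(counter.values())
        if total >= min_cells:
            fractions[sample] = [counter[k] / total for k in range(n_programs)]
    return fractions
```

The resulting per-sample fractions are the quantities that would then be compared between the young and aged cohorts, for example with a two-sided Wilcoxon test as in Fig. 2e.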

The aging signature was the most consistently recovered gene set, as it showed the most frequent significant enrichment by GSEA on GEPs. To test whether this high recovery rate is specific to the aging signature rather than its gene expression pattern, we constructed 20 random gene sets and assessed their recovery by GSEA. To construct a random gene set, we computed average normalized expression for 11,612 genes, split the genes into 50 equidistant expression bins, and randomly sampled 68 genes from the bins containing aging signature genes, to mimic the expression pattern of the aging signature. To assess the recoverability of gene sets by GSEA, we revisited our cNMF results, selected all visually stable cNMF runs (k values with distance thresholds producing the most stable clusters), and annotated them with GSEA using the 20 random gene sets, the aging, TNF signaling, and quiescence signatures, and the three lineage signatures, applying a Bonferroni correction (which is more stringent than the default Benjamini-Hochberg procedure). For each cNMF run within each sample, we assessed whether it contains a program significantly positively enriched for any of the tested gene sets, and computed the fraction of cNMF runs within each sample that recovered each of the gene sets (Suppl. Fig. 8a). The aging signature was indeed the most recoverable; lineage signatures were less recoverable, but invariably still more recoverable than the random gene sets. We also assessed the recovery rate of combinations of gene sets (Aging + TNF + Quiescence, and Aging + TNF, Suppl. Fig. 8b), which also showed higher recovery than random gene sets. Samples with more HSCs had higher recovery rates than samples with fewer HSCs. This demonstrates that the gene sets we used to annotate metaGEPs are robust compared to random gene sets, supporting biological significance, and that using additional large (CD34+ enriched) samples may help resolve and annotate gene expression programs better.
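
One possible reading of the random gene set construction is sketched below. It assumes equal-width bins of mean expression and pools all genes from the bins that contain signature genes before sampling; whether the original analysis matched the number of genes per bin, or excluded the signature genes from the pool, is not stated, so both choices here are assumptions. Function names are hypothetical.

```python
import random


def expression_bins(mean_expr, n_bins=50):
    """Assign each gene to one of n_bins equal-width bins of mean expression."""
    lo, hi = min(mean_expr.values()), max(mean_expr.values())
    width = (hi - lo) / n_bins or 1.0   # guard against identical values
    return {g: min(int((x - lo) / width), n_bins - 1)
            for g, x in mean_expr.items()}


def random_matched_set(mean_expr, signature, n_bins=50, rng=None):
    """Sample a random gene set from bins that contain signature genes.

    mean_expr: dict gene -> average normalized expression.
    signature: genes of the reference signature (all present in mean_expr).
    The random set has the same size as the signature (68 in the study).
    """
    rng = rng or random.Random()
    bins = expression_bins(mean_expr, n_bins)
    signature_bins = {bins[g] for g in signature}
    pool = sorted(g for g, b in bins.items() if b in signature_bins)
    return rng.sample(pool, len(signature))
```

Calling `random_matched_set` 20 times with different seeds would produce a collection of control gene sets comparable to the one described above.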

SCENIC analysis

We used pySCENIC to infer the activity of transcription factors (TFs) [55]. We followed the standard pipeline with default parameters, except using ‘--auc_threshold 0.01’ for the ‘ctx’ command. We inferred TF activities (represented as area under the curve values, AUC) within each dataset separately and then retained 108 TFs identified in all seven datasets. To identify age-specific TFs, we first subsetted young and aged samples, converted age cohort to a binary variable (with 1 being Aged), and computed point-biserial correlation between the binary age variable and TF activities within each dataset. We then assigned ranks to TFs within each dataset, averaged both ranks and correlation values, and selected TFs with mean rank <=10 and mean correlation >=0.15 as TFs associated with age, yielding five TFs (Fig. 1h). We repeated the same procedure to infer TFs associated with each of the metaGEPs, retaining samples from all age cohorts and computing Pearson correlation between TF activities and metaGEP usages, instead of point-biserial correlation (Fig. 2f, Suppl. Fig. 7). For visualization purposes, AUC values were converted into Z-scores. We limited the Z-score values to the 0.1 and 99.9 percentiles when visualizing TF activities in individual cells (Suppl. Fig. 4) to avoid distorting the color scale with outliers.
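
The ranking procedure can be sketched as follows. The point-biserial correlation is numerically identical to the Pearson correlation when one variable is coded 0/1, so a single helper serves both the age and the metaGEP analyses. The sketch assumes that correlations are computed across cells and that rank 1 denotes the highest correlation within a dataset; both are assumptions, and the function names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; equals point-biserial when x is binary (0/1).

    Inputs must not be constant, otherwise the denominator is zero.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def age_associated_tfs(datasets, max_rank=10, min_corr=0.15):
    """Select TFs by mean rank and mean correlation across datasets.

    datasets: list of (is_aged, tf_auc) pairs, where is_aged is a 0/1 list
    per cell and tf_auc maps TF name -> list of AUC values per cell.
    """
    ranks, corrs = {}, {}
    for is_aged, tf_auc in datasets:
        r = {tf: pearson(is_aged, auc) for tf, auc in tf_auc.items()}
        ordered = sorted(r, key=r.get, reverse=True)   # rank 1 = highest
        for rank, tf in enumerate(ordered, start=1):
            ranks.setdefault(tf, []).append(rank)
            corrs.setdefault(tf, []).append(r[tf])
    return sorted(tf for tf in ranks
                  if mean(ranks[tf]) <= max_rank and mean(corrs[tf]) >= min_corr)
```

For the metaGEP analysis, the binary age vector would be replaced by the usage values of one metaGEP, leaving the rest of the procedure unchanged.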

Co-varying Neighborhood Analysis

To test whether the age cohort is associated with certain neighborhoods of cells, we conducted association testing using co-varying neighborhood analysis (CNA) [56]. We first subsetted the integrated object for young and aged samples, encoded age as a binary variable (with 1 being Aged), computed a neighborhood graph using scVI latent variables, and ran CNA, correcting for the dataset name as a batch variable. To visualize significant associations in the UMAP, we colored neighborhoods with FDRs>0.1 grey.

Supplementary Material

Supplement 1

Table 1. Single-cell datasets used for our integrated analysis of HSC aging. a. Summary of the seven datasets used in the study, including the preprocessing of the original count matrices. b. Detailed information on individual HSC samples. Only 96 out of 98 individuals are included, because two did not have any BoneMarrowMap-annotated HSCs.

Supplement 2

Table 2. The results of Aged vs Young pseudobulk gene expression changes analysis. a. DESeq2 results, positive logFC corresponds to upregulation in Aged samples. b. GSEA results, positive NES corresponds to positive enrichment in Aged samples.

Supplement 3

Table 3. Z-scores of gene contributions to metaGEPs 1–4. Averaged Z-score values of genes (rows) contributing to meta-gene expression programs (metaGEPs) 1–4 (columns).

Supplement 4

Acknowledgments

We thank members of the Van Galen Laboratory for helpful discussions. We thank Nathan Salomonis, Christopher Hourigan, and Hojun Li for instructions on data usage.

Funding

Peter van Galen is supported by the Ludwig Center at Harvard, the National Institutes of Health (NIH) (R33CA278393), the Starr Cancer Consortium, the Edward P. Evans Foundation, the Vera and Joseph Dresner Foundation, the MPN Research Foundation, a Research Scholar Grant from the American Cancer Society (RSG-24-1318769-01-CDP), a Hevolution/American Federation for Aging Research New Investigator Award, and the Brigham Research Institute. V.G.S. is an Investigator of and supported by the Howard Hughes Medical Institute, as well as NIH grants R01DK103794, R01HL146500, R01CA265726, R01CA292941, and the Mathers Foundation. J.J.T. is a Scholar of the Leukemia & Lymphoma Society and supported by NIH grants R01DK118072, R01AG069010, U01AG077925 and The Mark Foundation for Cancer Research. C.W. is supported by NIH grants 1K99HG013991-01.

Footnotes

Declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

Sternum bone marrow samples from the Safina dataset were collected under an excess sample banking and sequencing protocol approved by the Mass General Brigham Institutional Review Board (IRB), which covered all study procedures. Additional datasets were obtained from public repositories.

Consent for publication

Not applicable.

Availability of data and materials

Data are available at https://figshare.com/projects/Aging_HSCs_2025/235781 (under embargo until publication). Annotated code to reproduce all analyses is available at https://github.com/noranekonobokkusu/Aging_HSCs_2025 and will be made available upon publication. Sequencing data generated in this study were deposited to GEO under accession number GSE302126 and will be made available upon publication.

References

  • 1. MacKinney A. A. Jr. Effect of aging on the peripheral blood lymphocyte count. J. Gerontol. 33, 213–216 (1978).
  • 2. Montecino-Rodriguez E., Berent-Maoz B. & Dorshkind K. Causes, consequences, and reversal of immune system aging. J. Clin. Invest. 123, 958–965 (2013).
  • 3. Price E. A. Aging and erythropoiesis: current state of knowledge. Blood Cells Mol. Dis. 41, 158–165 (2008).
  • 4. Weng C. et al. Deciphering cell states and genealogies of human haematopoiesis. Nature 627, 389–398 (2024).
  • 5. Yamamoto R. et al. Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell Stem Cell 22, 600–607.e4 (2018).
  • 6. de Haan G. & Lazare S. S. Aging of hematopoietic stem cells. Blood 131, (2018).
  • 7. Leins H. et al. Aged murine hematopoietic stem cells drive aging-associated immune remodeling. Blood 132, 565–576 (2018).
  • 8. Pang W. W. et al. Human bone marrow hematopoietic stem cells are increased in frequency and myeloid-biased with age. Proc. Natl. Acad. Sci. U. S. A. 108, 20012–20017 (2011).
  • 9. Amoah A. et al. Aging of human hematopoietic stem cells is linked to changes in Cdc42 activity. Haematologica 107, 393–402 (2022).
  • 10. Kuranda K. et al. Age-related changes in human hematopoietic stem/progenitor cells. Aging Cell 10, 542–546 (2011).
  • 11. Adelman E. R. et al. Aging Human Hematopoietic Stem Cells Manifest Profound Epigenetic Reprogramming of Enhancers That May Predispose to Leukemia. Cancer Discov. 9, 1080–1101 (2019).
  • 12. Konturek-Ciesla A. et al. Temporal multimodal single-cell profiling of native hematopoiesis illuminates altered differentiation trajectories with age. Cell Rep. 42, 112304 (2023).
  • 13. Zeng A. G. X. et al. Identification of a human hematopoietic stem cell subset that retains memory of inflammatory stress. bioRxiv 2023.09.11.557271 (2023) doi: 10.1101/2023.09.11.557271.
  • 14. Jakobsen N. A. et al. Selective advantage of mutant stem cells in human clonal hematopoiesis is associated with attenuated response to inflammation and aging. Cell Stem Cell 31, 1127–1144.e17 (2024).
  • 15. Santaguida M. et al. JunB protects against myeloid malignancies by limiting hematopoietic stem cell proliferation and differentiation without affecting self-renewal. Cancer Cell 15, 341–352 (2009).
  • 16. Okada S., Fukuda T., Inada K. & Tokuhisa T. Prolonged expression of c-fos suppresses cell cycle entry of dormant hematopoietic stem cells. Blood 93, 816–825 (1999).
  • 17. Wagner E. F. & Nebreda A. R. Signal integration by JNK and p38 MAPK pathways in cancer development. Nat. Rev. Cancer 9, 537–549 (2009).
  • 18. Alvarez-Rodríguez L., López-Hoyos M., Muñoz-Cacho P. & Martínez-Taboada V. M. Aging is associated with circulating cytokine dysregulation. Cell. Immunol. 273, 124–132 (2012).
  • 19. Noda S., Ichikawa H. & Miyoshi H. Hematopoietic stem cell aging is associated with functional decline and delayed cell cycle progression. Biochem. Biophys. Res. Commun. 383, 210–215 (2009).
  • 20. Kotliar D. et al. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq. eLife 8, (2019).
  • 21. Min I. M. et al. The transcription factor EGR1 controls both the proliferation and localization of hematopoietic stem cells. Cell Stem Cell 2, 380–391 (2008).
  • 22. Park C. S. et al. KLF4 enhances transplantation-induced hematopoiesis by inhibiting TLRs and noncanonical NFκB signaling at a steady state. Exp. Hematol. 144, 104730 (2025).
  • 23. Essers M. A. G. et al. IFNalpha activates dormant haematopoietic stem cells in vivo. Nature 458, 904–908 (2009).
  • 24. Baldridge M. T., King K. Y., Boles N. C., Weksberg D. C. & Goodell M. A. Quiescent haematopoietic stem cells are activated by IFN-gamma in response to chronic infection. Nature 465, 793–797 (2010).
  • 25. Bogeska R. et al. Inflammatory exposure drives long-lived impairment of hematopoietic stem cell self-renewal activity and accelerated aging. Cell Stem Cell 29, 1273–1284.e8 (2022).
  • 26. Chavez J. S. et al. PU.1 enforces quiescence and limits hematopoietic stem cell expansion during inflammatory stress. J. Exp. Med. 218, (2021).
  • 27. Pietras E. M. et al. Re-entry into quiescence protects hematopoietic stem cells from the killing effect of chronic exposure to type I interferons. J. Exp. Med. 211, 245–262 (2014).
  • 28. Kain B. N. et al. Hematopoietic stem and progenitor cells confer cross-protective trained immunity in mouse models. iScience 26, 107596 (2023).
  • 29. Cheong J.-G. et al. Epigenetic memory of coronavirus infection in innate immune cells and their progenitors. Cell 186, 3882–3902.e24 (2023).
  • 30. Konturek-Ciesla A., Olofzon R., Kharazi S. & Bryder D. Implications of stress-induced gene expression for hematopoietic stem cell aging studies. Nat. Aging 4, 177–184 (2024).
  • 31. Jacobs M. D. & Harrison S. C. Structure of an IkappaBalpha/NF-kappaB complex. Cell 95, 749–758 (1998).
  • 32. Tanaka-Yano M. et al. Tristetraprolin overexpression drives hematopoietic changes in young and middle-aged mice generating dominant mitigating effects on induced inflammation in murine models. GeroScience 46, 1271–1284 (2024).
  • 33. Ainciburu M. et al. Uncovering perturbations in human hematopoiesis associated with healthy aging and myeloid malignancies at single-cell resolution. eLife 12, (2023).
  • 34. Li H. et al. The dynamics of hematopoiesis over the human lifespan. Nat. Methods 22, 422–434 (2025).
  • 35. Oetjen K. A. et al. Human bone marrow assessment by single-cell RNA sequencing, mass cytometry, and flow cytometry. JCI Insight 3, (2018).
  • 36. Zhang Y. et al. Temporal molecular program of human hematopoietic stem and progenitor cells after birth. Dev. Cell 57, 2745–2760.e6 (2022).
  • 37. Dobin A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
  • 38. Liao Y., Smyth G. K. & Shi W. The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Res. 47, e47 (2019).
  • 39. Hao Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42, 293–304 (2024).
  • 40. GitHub - Moonerss/scrubletR. GitHub; https://github.com/Moonerss/scrubletR.
  • 41. Zeng A. G. X. et al. Single-cell transcriptional atlas of human hematopoiesis reveals genetic and hierarchy-based determinants of aberrant AML differentiation. Blood Cancer Discov. OF1–OF18 (2025).
  • 42. Gayoso A. et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40, 163–166 (2022).
  • 43. Laurenti E. et al. The transcriptional architecture of early human hematopoiesis identifies multilevel control of lymphoid commitment. Nat. Immunol. 14, 756–763 (2013).
  • 44. Eppert K. et al. Stem cell gene expression programs influence clinical outcome in human leukemia. Nat. Med. 17, 1086–1093 (2011).
  • 45. Jaatinen T. et al. Global gene expression profile of human cord blood-derived CD133+ cells. Stem Cells 24, 631–641 (2006).
  • 46. Love M. I., Huber W. & Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
  • 47. Zhu A., Ibrahim J. G. & Love M. I. Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics 35, 2084–2092 (2019).
  • 48. Korotkevich G. et al. Fast gene set enrichment analysis. bioRxiv 060012 (2021) doi: 10.1101/060012.
  • 49. Subramanian A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 102, 15545–15550 (2005).
  • 50. Liberzon A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
  • 51. Liberzon A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
  • 52. García-Prat L. et al. TFEB-mediated endolysosomal activity controls human hematopoietic stem cell fate. Cell Stem Cell 28, 1838–1850.e10 (2021).
  • 53. Gavish A. et al. Hallmarks of transcriptional intratumour heterogeneity across a thousand tumours. Nature 618, 598–606 (2023).
  • 54. Kotliar D. et al. Reproducible single cell annotation of programs underlying T-cell subsets, activation states, and functions. bioRxiv 2024.05.03.592310 (2024) doi: 10.1101/2024.05.03.592310.
  • 55. Aibar S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
  • 56. Reshef Y. A. et al. Co-varying neighborhood analysis identifies cell populations associated with phenotypes of interest from single-cell transcriptomics. Nat. Biotechnol. 40, 355–363 (2022).

Supplementary Materials

Supplement 1

Table 1. Single-cell datasets used for our integrated analysis of HSC aging. a. Summary of the seven datasets used in the study, including the preprocessing of the original count matrices. b. Detailed information on individual HSC samples. Only 96 out of 98 individuals are included, because two did not have any BoneMarrowMap-annotated HSCs.

media-1.xlsx (27.4KB, xlsx)
Supplement 2

Table 2. Results of the Aged vs Young pseudobulk differential gene expression analysis. a. DESeq2 results; positive logFC corresponds to upregulation in Aged samples. b. GSEA results; positive NES corresponds to enrichment in Aged samples.

media-2.xlsx (1.1MB, xlsx)
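
As a reading aid for the sign convention in Table 2a, the following minimal Python sketch splits significant genes by direction of change. The rows, gene labels, column names (`log2FoldChange`, `padj`), and the 0.05 cutoff are illustrative assumptions, not values or thresholds taken from the study or from the layout of media-2.xlsx.

```python
# Hypothetical rows standing in for the DESeq2 sheet; the column names
# and all values are assumptions made for illustration only.
rows = [
    {"gene": "GENE_A", "log2FoldChange": 1.2, "padj": 0.001},
    {"gene": "GENE_B", "log2FoldChange": -0.8, "padj": 0.01},
    {"gene": "GENE_C", "log2FoldChange": 0.3, "padj": 0.40},
]


def split_by_direction(rows, alpha=0.05):
    """Split genes with padj < alpha into (up_in_aged, up_in_young).

    Positive log fold change is read as higher expression in Aged
    samples, following the convention stated in the Table 2 caption.
    The alpha cutoff is an assumed example value.
    """
    significant = [r for r in rows if r["padj"] < alpha]
    up_in_aged = [r["gene"] for r in significant if r["log2FoldChange"] > 0]
    up_in_young = [r["gene"] for r in significant if r["log2FoldChange"] < 0]
    return up_in_aged, up_in_young


up_aged, up_young = split_by_direction(rows)
print("Up in Aged:", up_aged)
print("Up in Young:", up_young)
```

The same reading applies to Table 2b, with NES in place of the log fold change.
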
Supplement 3

Table 3. Z-scores of gene contributions to metaGEPs 1–4. Averaged Z-score values of genes (rows) contributing to meta-gene expression programs (metaGEPs) 1–4 (columns).

media-3.xlsx (120.8KB, xlsx)
Supplement 4


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
