Unlocking hematopoietic stem cell potential: integrative computational approaches for genomic and transcriptomic analysis

Pawan Kumar Raghav; Basudha Banerjee; Rajni Chadha

2025 Sep 3;13:1589823. doi: 10.3389/fcell.2025.1589823

PMCID: PMC12440966 PMID: 40970097

Abstract

Hematopoietic stem cells (HSCs) sustain lifelong hematopoiesis through their capacity for self-renewal and multilineage differentiation. However, the isolation and functional characterization of HSCs remain challenging due to their cellular heterogeneity and dynamically regulated transcriptional and epigenetic landscapes. Advances in experimental and computational biology, including single-cell RNA sequencing (scRNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), network inference algorithms, and machine learning, have improved our ability to resolve transcriptional states, trace lineage trajectories, and reconstruct gene regulatory networks (GRN) at single-cell resolution. These approaches enable the discovery of novel HSC subtypes and regulatory factors, and facilitate the integration of multi-omics data to uncover epigenetic and transcriptional mechanisms that drive stem cell fate decisions. Additionally, machine learning models trained on high-throughput datasets provide predictive power for identifying novel enhancers, transcription factors, and therapeutic targets. This review underscores the synergistic role of computational tools in deciphering HSC biology and highlights their potential to improve stem cell therapies and precision treatments for hematologic disorders.

Keywords: Hematopoietic stem cells (HSCs), single-cell RNA sequencing (scRNA-Seq), computational biology, regenerative medicine, stem cell therapy, HSC transplantation, self-renewal, differentiation

1 Introduction

Hematopoiesis is the process by which hematopoietic stem cells (HSCs) proliferate and differentiate into all blood cell lineages, ensuring the continuous production of blood cells throughout an organism’s life (Ng and Alexander, 2017). HSCs can be sourced from bone marrow, peripheral blood, and umbilical cord blood (Lee and Hong, 2020). Understanding the regulation of HSC self-renewal and lineage differentiation is crucial for both basic research and clinical applications (Barriga et al., 2012). HSC transplantation remains a cornerstone in treating hematologic malignancies, autoimmune disorders, and immunodeficiencies, where the self-renewal capacity of HSCs is critical for long-term engraftment and therapeutic success (Weissman and Shizuru, 2008). Despite this widespread clinical use, obtaining a highly purified HSC population for transplantation remains difficult. Standard therapeutic protocols often rely on mobilized peripheral blood or whole bone marrow, which contain a heterogeneous mixture of progenitor and mature cells. Consequently, the proportion of true, self-renewing HSCs is relatively low (Skulimowska et al., 2022). For successful transplantation, an optimal dose of approximately 2 × 10⁶ CD34⁺ cells per kilogram of the recipient’s body weight is recommended (Tricot et al., 2010). However, CD34 expression alone does not guarantee stem cell purity or functional potential. Pharmacological agents such as NSC87877, an inhibitor of the SHP-1 and SHP-2 phosphatases that act downstream of c-Kit, have shown promise for enhancing HSC proliferation post-isolation when combined with stem cell factor (SCF) (Raghav et al., 2018). Increasing the accessibility of highly purified, self-renewing HSCs can enhance therapeutic outcomes and pave the way for novel treatment approaches (Negrin et al., 2000; Logan et al., 2012; Czechowicz and Weissman, 2010).

Computational approaches have emerged as powerful tools to overcome the limitations of HSC identification and characterization by tracing the complex gene regulatory networks (GRNs) and epigenetic landscapes that govern HSC fate. Techniques such as single-cell RNA sequencing (scRNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), network inference algorithms, and machine learning enable the mapping of transcriptional profiles, regulatory networks, and functional heterogeneity at the single-cell level (Kamimoto et al., 2023; Moignard et al., 2015; Wilson et al., 2015; Wang et al., 2024).

Among these technologies, scRNA-seq has proven particularly valuable in revealing transcriptional heterogeneity within HSC populations. It provides high-resolution insights into the lineage commitment and developmental trajectories of HSCs (Wilson et al., 2015; Hérault et al., 2022; Velten et al., 2017). Analytical tools such as FastQC (Andrews, 2010), STAR (Dobin et al., 2013; Du et al., 2020), Seurat (Butler et al., 2018), SCANPY (Wolf et al., 2018), DESeq2 (Love et al., 2014), CellAssign (Zhang et al., 2019), edgeR (Robinson et al., 2010), and Monocle (Trapnell et al., 2014; Qiu et al., 2017a; Qiu et al., 2017b) are commonly used to process and interpret scRNA-seq data. ChIP-seq complements transcriptomic approaches by identifying genome-wide transcription factor (TF) binding sites and epigenetic modifications that regulate HSC self-renewal and differentiation (Cui et al., 2009; Joshi et al., 2013). Tools such as Bowtie2 (Langmead and Salzberg, 2012), MACS2 (Zhang et al., 2008), SICER (Xu et al., 2014), and GREAT (McLean et al., 2010) enable precise mapping of protein-DNA interactions and chromatin dynamics during HSC development. Network inference algorithms are another critical layer in decoding the regulatory circuitry of HSCs. By integrating large-scale expression data, these methods uncover interactions among TFs and their target genes, thereby identifying pivotal regulators such as PU.1, GATA2, LMO2, and MYB (Velten et al., 2017; Wilson et al., 2016; Rodriguez-Fraticelli et al., 2020; Moignard et al., 2013). Tools such as ARACNE (mutual information-based) (Margolin et al., 2006), WGCNA (correlation-based module detection) (Langfelder and Horvath, 2008), Cytoscape (Shannon et al., 2003), and GeneNet (Bayesian network inference) (Ananko et al., 2002) are widely used for inferring and visualizing these networks. 
Machine learning techniques further enhance our ability to model gene expression, predict regulatory elements, and analyze chromatin accessibility in HSCs (Xiang et al., 2020; Fortelny and Bock, 2020; Lal et al., 2021). Tools such as Scikit-Learn, DeepCpG, and ChromNet provide robust data integration, feature selection, model training, and predictive analysis capabilities (Angermueller et al., 2017; Lundberg et al., 2016; Scikit-Learn, 2016; Shannon et al., 2003).

Computational approaches revolutionize the understanding of HSC biology by unraveling cellular heterogeneity, elucidating transcriptional and epigenetic control mechanisms, and identifying biomarkers and therapeutic targets. Figure 1 presents a comprehensive framework for unraveling the complexity of HSC regulation by integrating multi-omics data with advanced computational pipelines. This framework ultimately facilitates the isolation and functional validation of pure HSC populations for therapeutic applications.

FIGURE 1.

Computational approaches for HSCs’ genomic and transcriptomic data analysis. The figure illustrates the integrative workflow for analyzing bone marrow-derived HSCs using NGS and high-performance computing. scRNA-seq is used to identify progenitor HSCs, resolve transcriptional heterogeneity, and explore cell state transitions using tools such as Seurat, SCANPY, Monocle, DESeq2, DAVID, and Cell Ranger. ChIP-seq identifies transcription factor binding sites and assesses gene regulatory dynamics, with key tools including FastQC, Bowtie, MACS2, SICER, Enrichr, and GREAT. Network inference approaches, such as GeneNet, GENIE3, WGCNA, ARACNE, and Cytoscape, enable the reconstruction of gene regulatory networks governing HSC fate decisions. Machine learning frameworks, including PyTorch, DeepCpG, ChromNet, Scikit-Learn, and TensorFlow, are applied to identify biomarkers and drug targets, predict regulatory elements, and model gene expression patterns. NGS: next-generation sequencing; scRNA-seq: single-cell RNA sequencing; ChIP-seq: chromatin immunoprecipitation sequencing.

2 Approaches for analyzing HSC genomics and transcriptomic data

Following high-throughput data generation and expression quantification, various computational approaches are employed to analyze genomic and transcriptomic data in HSCs (Figure 1). These methods enable in-depth exploration of transcriptional heterogeneity, regulatory mechanisms, and lineage trajectories. scRNA-seq is a powerful tool for dissecting HSC heterogeneity, allowing the identification of novel cell types, functional states, and regulatory networks (Wilson et al., 2015; Hérault et al., 2022). ChIP-seq reveals genome-wide TF binding sites and epigenetic regulation, including cis-regulatory landscapes of HSCs (Qi et al., 2021). Network inference algorithms use high-throughput expression data to infer regulatory interactions among genes or proteins (Cahan et al., 2021). These approaches have been used to reconstruct transcriptional networks involved in hematopoietic development. Application of network inference on single-cell gene expression data decodes early blood development regulatory programs (Moignard et al., 2015). Machine learning algorithms, including support vector machines (SVMs), random forests, and deep learning, are employed to predict regulatory interactions and identify novel gene networks between endothelial cells and HSCs (Wang et al., 2024).

3 scRNA-seq in HSC analysis

scRNA-seq enables high-resolution characterization of cellular heterogeneity by profiling gene expression in individual cells (Hérault et al., 2022), and it has been instrumental in uncovering the transcriptional diversity of HSCs and their progeny. The approach has revealed previously unrecognized cell states, differentiation trajectories, and regulatory networks (Hérault et al., 2022; Ivanova et al., 2002). Seminal studies have used scRNA-seq to identify and functionally characterize distinct HSC subpopulations, revealing cells with transcriptional signatures linked to quiescence, immune activation, and a megakaryocyte-erythroid lineage bias (Wilson et al., 2015; Velten et al., 2017; Rothenberg, 2021). The technique has also been used to study the differentiation of T and B lymphocytes and to identify transcriptional regulators that commit cells to these lineages (Velten et al., 2017; Rothenberg, 2021). Beyond steady-state hematopoiesis, scRNA-seq has shown how radiation affects the transcriptional programs of HSCs, shedding light on stress-induced alterations in quiescence and survival pathways (Gao et al., 2021; Fast et al., 2021). Together, these studies offer detailed insights into the molecular mechanisms that govern HSC identity and function (Velten et al., 2017; Rodriguez-Fraticelli et al., 2018). Table 1 outlines the fundamental computational processes involved in analyzing scRNA-seq data and the commonly employed tools for HSC-specific research.

TABLE 1.

Commonly used computational tools for the analysis of HSCs’ scRNA-seq data. Outlines key analytical approaches, including quality control, normalization, dimensionality reduction, clustering, differential gene expression analysis, pseudotime trajectory inference, and network analysis. For each step, representative tools are listed alongside corresponding references. scRNA-seq, single-cell RNA sequencing; HSCs, Hematopoietic stem cells; PCA, Principal component analysis; t-SNE, t-Distributed Stochastic Neighbor Embedding; UMAP, Uniform Manifold Approximation and Projection; DEGs, Differentially expressed genes; GO, Gene ontology.

Approaches	Steps	Tools	References
Quality control and preprocessing	Cell quality control	FastQC; RSeQC	Andrews (2010); Wang et al. (2012)
Quality control and preprocessing	Read alignment	STAR; HISAT	Dobin et al. (2013); Kim et al. (2015)
Quality control and preprocessing	Unique molecular identifier (UMI) counting	Cell Ranger; Scater	Zheng et al. (2017); McCarthy et al. (2017)
Quality control and preprocessing	Gene expression quantification	HTSeq; featureCounts	Anders et al. (2015); Liao et al. (2014)
Quality control and preprocessing	Quality filtering of cells and genes	Seurat; SCANPY	Butler et al. (2018); Wolf et al. (2018)
Normalization	Normalization of sequencing depth	DESeq2; scran	Love et al. (2014); Lun et al. (2016)
Normalization	Normalization of gene expression	scran; ZINB-WaVE	Lun et al. (2016); Risso et al. (2018)
Dimensionality reduction	PCA	Seurat; SCANPY	Butler et al. (2018); Wolf et al. (2018)
Dimensionality reduction	t-SNE	Seurat; SCANPY	Butler et al. (2018); Wolf et al. (2018)
Dimensionality reduction	UMAP	UMAP; SCANPY	Ghojogh et al. (2021); Wolf et al. (2018)
Dimensionality reduction	Diffusion maps	destiny; SCANPY	Angerer et al. (2016); Wolf et al. (2018)
Clustering and cell type identification	Hierarchical clustering	Seurat; SCANPY	Butler et al. (2018); Wolf et al. (2018)
Clustering and cell type identification	k-means clustering	Seurat; SCANPY	Butler et al. (2018); Wolf et al. (2018)
Clustering and cell type identification	Density-based clustering	Seurat; SCANPY	Butler et al. (2018); Wolf et al. (2018)
Clustering and cell type identification	Cluster identification based on marker genes	Seurat; CellAssign	Butler et al. (2018); Zhang et al. (2019)
Differential gene expression analysis	DEGs between cell types	DESeq2; edgeR	Love et al. (2014); Robinson et al. (2010)
Differential gene expression analysis	GO enrichment analysis	clusterProfiler; GSEA	Yu et al. (2012); Subramanian et al. (2005)
Cell trajectory and pseudotime analysis	Ordering of cells along a developmental trajectory	Monocle; Slingshot	Trapnell et al. (2014); Street et al. (2018)
Cell trajectory and pseudotime analysis	Inference of gene expression dynamics along the trajectory	Monocle; scVelo	Trapnell et al. (2014); Bergen et al. (2020)
Network analysis	Construction of gene co-expression networks	WGCNA; SCENIC	Langfelder and Horvath (2008); Aibar et al. (2017)

3.1 Quality control and preprocessing

scRNA-seq generates high-dimensional raw data that requires extensive preprocessing to ensure analytical accuracy and biological validity (Zheng et al., 2017). This crucial step involves removing technical noise and low-quality cells before downstream analyses. The preprocessing pipeline typically encompasses cell quality assessment, read alignment, unique molecular identifier (UMI) counting, gene expression quantification, and quality filtering.

Cell quality control is the foundation of the pipeline: it excludes cells with poor-quality reads or abnormal transcript profiles. Tools such as Cell Ranger, Seurat, and RSeQC are widely used for this purpose (Butler et al., 2018; Zheng et al., 2017; Wang et al., 2012). Cell Ranger evaluates sequencing quality, total read count, and genome-mapping percentages, while Seurat uses metrics such as the number of detected genes, UMI counts, and mitochondrial gene content to identify and exclude low-quality cells (Butler et al., 2018).
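The per-cell metrics described above can be illustrated with a minimal, dependency-free sketch. It is not the Seurat or Cell Ranger implementation: the function names, the toy count dictionaries, and every threshold are assumptions chosen only to make the filtering logic visible, and real thresholds are dataset-specific.

```python
# Hypothetical per-cell QC sketch; thresholds are illustrative, not recommendations.

def qc_metrics(counts, mito_prefix="MT-"):
    """counts: dict mapping gene name -> UMI count for a single cell."""
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for g, c in counts.items() if g.startswith(mito_prefix))
    pct_mito = 100.0 * mito / total if total else 0.0
    return {"n_genes": n_genes, "n_umis": total, "pct_mito": pct_mito}

def passes_qc(m, min_genes=3, min_umis=10, max_pct_mito=20.0):
    """Keep cells with enough detected genes and UMIs and a low mitochondrial share."""
    return (m["n_genes"] >= min_genes
            and m["n_umis"] >= min_umis
            and m["pct_mito"] <= max_pct_mito)

cells = {
    "cell_a": {"CD34": 5, "GATA2": 3, "KIT": 4, "MT-CO1": 1},  # passes all filters
    "cell_b": {"CD34": 1, "MT-CO1": 9, "MT-ND1": 8},           # mostly mitochondrial
    "cell_c": {"CD34": 1, "GATA2": 0, "KIT": 0, "MT-CO1": 0},  # nearly empty
}
kept = [name for name, c in cells.items() if passes_qc(qc_metrics(c))]
print(kept)
```

A high mitochondrial fraction is commonly read as a sign of a damaged or dying cell, which is why it is filtered alongside the gene and UMI counts.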

Read alignment maps sequencing reads to a reference genome. STAR and HISAT are the most commonly used aligners (Dobin et al., 2013; Kim et al., 2015). STAR offers ultrafast, high-accuracy spliced alignment and supports an optional two-pass mode that improves the detection of splice junctions, while HISAT employs hierarchical indexing to efficiently map spliced reads.

UMIs are short random sequences that tag individual mRNA molecules before amplification. Counting distinct UMIs rather than reads enables accurate quantification of gene expression by distinguishing original transcripts from PCR duplicates (Smith et al., 2017). Cell Ranger, Drop-seq, and Scater facilitate UMI counting (Macosko et al., 2015; Baran-Gale et al., 2018; McCarthy et al., 2017).
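The idea behind UMI counting can be sketched in a few lines. This is a simplified illustration, not the algorithm of any tool named above: the `count_umis` helper and the toy barcodes are hypothetical, and the sketch ignores sequencing errors within UMIs, which production tools typically correct.

```python
from collections import defaultdict

def count_umis(reads):
    """reads: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.

    Returns the number of distinct UMIs per (cell, gene) pair.
    """
    unique = defaultdict(set)
    for barcode, umi, gene in reads:
        unique[(barcode, gene)].add(umi)  # repeated UMIs collapse inside the set
    return {key: len(umis) for key, umis in unique.items()}

reads = [
    ("AAAC", "TTGCA", "GATA2"),
    ("AAAC", "TTGCA", "GATA2"),  # PCR duplicate: same cell, gene, and UMI
    ("AAAC", "CGGAT", "GATA2"),  # second molecule of the same gene
    ("AAAC", "TTGCA", "KIT"),    # same UMI but a different gene
    ("GGTT", "ACCTG", "GATA2"),  # different cell
]
umi_counts = count_umis(reads)
print(umi_counts)
```

Five reads collapse to four molecules here, because the duplicated read contributes only once to the GATA2 count of cell `AAAC`.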

Gene expression quantification typically involves counting UMIs associated with each gene. Commonly used tools include Cell Ranger, HTSeq, featureCounts, and Kallisto (Anders et al., 2015; Du et al., 2020; Zheng et al., 2017; Liao et al., 2014). Cell Ranger quantifies gene expression using the feature-barcode matrix generated from UMI counting (Zheng et al., 2017). Kallisto employs pseudo-alignment for faster transcript quantification without the need for complete read mapping (Bray et al., 2016; Brüning et al., 2022).

Quality filtering of cells and genes ensures that only informative data are retained for downstream analysis. Seurat and Cell Ranger apply user-defined thresholds based on gene detection, UMI counts, and mitochondrial gene expression (Butler et al., 2018; Zheng et al., 2017). Genes detected in too few cells or at extremely low levels are filtered out to minimize noise and enhance statistical power (Wolf et al., 2018).

These preprocessing steps are crucial for ensuring the reliability of scRNA-seq analysis. Their extensive validation in HSC studies forms the foundation for robust interpretation of single-cell transcriptomic data.

3.2 Normalization

Normalization is a critical step in scRNA-seq data analysis, addressing variability introduced by differences in sequencing depth, capture efficiency, and RNA content across cells (Cuevas-Diaz Duran et al., 2024). Appropriate normalization ensures that observed gene expression differences reflect biological variation rather than technical noise. Several tools have been developed to normalize scRNA-seq data, each employing a distinct strategy to correct these biases.

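SCnorm corrects for cell-specific technical variability by using quantile regression to estimate how each gene’s expression depends on sequencing depth, grouping genes with a similar dependence and scaling each group separately. This allows accurate comparison of RNA expression levels across cells (Bacher et al., 2017).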

Seurat, a widely used R package for scRNA-seq analysis, offers multiple normalization methods. These include global-scaling approaches and Cell Cycle Regression (CCR), which corrects cell cycle-related transcriptional effects that can confound downstream clustering and trajectory analysis (Butler et al., 2018).
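A global-scaling normalization of the kind mentioned above can be sketched as follows: each count is divided by the cell's total, multiplied by a scale factor, and log-transformed. The function name and toy cells are hypothetical, and the scale factor of 10,000 is a common default rather than a requirement.

```python
import math

def log_normalize(cell_counts, scale_factor=10_000):
    """Global-scaling normalization for one cell: log(1 + count / total * scale_factor)."""
    total = sum(cell_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in cell_counts}
    return {gene: math.log1p(count / total * scale_factor)
            for gene, count in cell_counts.items()}

# Two cells with identical composition but a tenfold difference in depth.
shallow = {"CD34": 2, "KIT": 1, "GATA2": 1}
deep = {"CD34": 20, "KIT": 10, "GATA2": 10}

norm_shallow = log_normalize(shallow)
norm_deep = log_normalize(deep)
print(norm_shallow["CD34"], norm_deep["CD34"])
```

After scaling, the two cells have the same normalized values, which is the intended effect: differences in depth alone no longer look like differences in expression.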

DESeq2 is an R package for differential expression analysis whose normalization step can also be applied to scRNA-seq data. It estimates a size factor for each sample, computed as the median of that sample’s gene-wise ratios to a geometric-mean reference, to account for differences in sequencing depth (Love et al., 2014).
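Size-factor estimation by the median-of-ratios method can be written compactly. The sketch below is a simplified re-implementation for illustration, not DESeq2 code: the `size_factors` function and the toy matrix are assumptions, and only genes with nonzero counts in every sample are used.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: list of samples, each a list of raw counts in the same gene order."""
    n_genes = len(counts[0])
    # The log is undefined at zero, so use genes observed in every sample.
    usable = [g for g in range(n_genes) if all(sample[g] > 0 for sample in counts)]
    # Log geometric mean of each usable gene across samples (the reference).
    log_ref = {g: sum(math.log(sample[g]) for sample in counts) / len(counts)
               for g in usable}
    # Size factor = median over genes of the ratio sample / reference.
    return [math.exp(median(math.log(sample[g]) - log_ref[g] for g in usable))
            for sample in counts]

counts = [
    [10, 20, 30, 0],  # sample 1
    [20, 40, 60, 5],  # sample 2: sequenced about twice as deeply
]
factors = size_factors(counts)
print(factors)
```

Because sparse single-cell matrices contain few genes with nonzero counts in every cell, this estimator can become unstable on raw scRNA-seq data, which is one motivation for pooling-based alternatives such as scran.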

Other tools, such as scran and ZINB-WaVE, offer alternative frameworks for normalization, particularly for sparse and zero-inflated single-cell datasets (Lun et al., 2016; Risso et al., 2018). Choosing an appropriate normalization strategy is essential for accurate differential expression, clustering, and trajectory inference, and the choice often depends on the specific characteristics of the dataset and the downstream analytical goals.

3.3 Dimensionality reduction

Dimensionality reduction transforms high-dimensional gene expression data into a lower-dimensional space while preserving essential biological variation (Townes et al., 2019). This facilitates data visualization, clustering, and trajectory inference by mitigating noise and computational complexity. Several widely used techniques are applied for HSC scRNA-seq studies.

Principal component analysis (PCA) is a linear method that identifies orthogonal axes (principal components) capturing the variance in gene expression. It is typically the first step in most scRNA-seq workflows and is implemented in tools such as Seurat and SCANPY (Butler et al., 2018; Wolf et al., 2018; Stuart et al., 2019).
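To make the idea concrete, the first principal component can be found by power iteration on the gene-gene covariance matrix. This is a teaching sketch under stated assumptions: the function name and the toy matrix are hypothetical, and Seurat and SCANPY rely on optimized linear-algebra routines rather than this loop.

```python
def first_pc(matrix, n_iter=200):
    """matrix: list of cells, each a list of expression values (cells x genes).

    Returns the leading principal axis (unit vector) and each cell's score on it.
    """
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    vec = [float(j + 1) for j in range(p)]  # arbitrary non-zero starting vector
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centered]
    return vec, scores

# Toy data: cells 0-1 express genes 0-1; cells 2-3 express gene 2.
expression = [
    [5.0, 4.0, 0.0],
    [6.0, 5.0, 1.0],
    [0.0, 1.0, 6.0],
    [1.0, 0.0, 5.0],
]
axis, scores = first_pc(expression)
print(scores)
```

The sign of a principal component is arbitrary, so only the separation of the two groups along the axis is meaningful, not which group receives positive scores.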

t-Distributed stochastic neighbor embedding (t-SNE) is a nonlinear technique that emphasizes local data structure, making it helpful in visualizing distinct cell populations based on expression similarity. It is commonly employed in Seurat and SCANPY for cluster visualization (Butler et al., 2018; Wolf et al., 2018; Stuart et al., 2019).

Uniform manifold approximation and projection (UMAP) is a more recent nonlinear method that preserves local structure while retaining more of the global structure than t-SNE. UMAP has gained popularity due to its scalability and speed relative to t-SNE, particularly for large datasets. It is supported by SCANPY and is often computed on embeddings integrated with Harmony (Wolf et al., 2018; Korsunsky et al., 2019; Ghojogh et al., 2021).

Diffusion maps model gene expression similarity using diffusion distances, which are robust to noise and particularly useful for capturing continuous trajectories and identifying rare cell states. The destiny R package and the diffusion map implementation in SCANPY are commonly used for this purpose (Angerer et al., 2016; Haghverdi et al., 2015; Wolf et al., 2018).

3.4 Clustering

Clustering identifies transcriptionally distinct HSC populations within complex tissues such as the bone marrow (Butler et al., 2018). Multiple clustering strategies have been developed, each with strengths suited to different data structures and biological contexts.

Hierarchical clustering is a widely used method based on the recursive merging of similar cells or genes. The algorithm constructs a dendrogram to represent the nested relationships between clusters, which can then be cut at a desired resolution to define distinct groups. This method is implemented in tools such as Seurat, SCANPY, and Monocle, and has been applied extensively in HSC studies to resolve lineage-specific transcriptional states (Wolf et al., 2018; Qiu et al., 2017a; Satija et al., 2015).

k-means clustering partitions cells into k user-defined clusters by iteratively assigning cells to the nearest centroid and updating centroid positions until convergence. Despite its simplicity, k-means remains effective for well-separated clusters and is supported in frameworks such as Scikit-Learn and Cell Ranger (Scikit-Learn, 2016; Zheng et al., 2017).
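The assign-and-update loop described above can be sketched as follows. The starting centroids are passed in explicitly to keep the example deterministic, which is an assumption of this sketch; library implementations normally choose them with a randomized seeding scheme.

```python
def kmeans(points, centroids, n_iter=50):
    """Lloyd's algorithm: assign each point to its nearest centroid, then update."""
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        updated = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[k])  # leave an empty cluster's centroid in place
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return labels, centroids

# Toy 2-D embedding (for example, two principal components) with two groups.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centers = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

The result depends on k and on the starting centroids, so in practice the algorithm is usually run from several initializations.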

Density-based clustering, including Density-Based Spatial Clustering of Applications with Noise (DBSCAN), identifies clusters as regions of high cell density. This method captures clusters of varying shapes and sizes and labels cells in sparse regions as noise, which can include genuine rare cell types as well as outliers. DBSCAN is available in Seurat and SCANPY and has been used to delineate heterogeneous populations within HSC datasets (Butler et al., 2018; Wolf et al., 2018; Satija et al., 2015).

Marker gene-based annotation represents a supervised approach that leverages prior knowledge of gene expression signatures specific to known cell types. Tools such as SingleR and CellAssign compare transcriptomes against reference datasets or predefined marker panels to assign cell identities. This approach is particularly valuable for validating cluster annotations or transferring labels across datasets (Zhang et al., 2019; Aran et al., 2019).
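A very simple form of marker-based annotation scores each cluster against predefined marker panels and assigns the best-scoring label. The sketch below is neither the SingleR nor the CellAssign algorithm, both of which use more elaborate statistical comparisons; the marker panels, the cluster profile, and the function name are illustrative assumptions.

```python
def annotate(cluster_mean_expr, marker_panels):
    """Assign the label whose marker genes have the highest mean expression."""
    scores = {
        label: sum(cluster_mean_expr.get(gene, 0.0) for gene in genes) / len(genes)
        for label, genes in marker_panels.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Illustrative panels only; not a validated reference.
marker_panels = {
    "HSC-like": ["CD34", "KIT", "GATA2"],
    "erythroid-like": ["GATA1", "KLF1", "HBB"],
}
# Hypothetical mean normalized expression for one cluster.
cluster_profile = {"CD34": 2.1, "KIT": 1.8, "GATA2": 1.5, "GATA1": 0.2, "HBB": 0.1}

label, scores = annotate(cluster_profile, marker_panels)
print(label, scores)
```

Genes missing from the cluster profile are treated as unexpressed, so a panel containing unmeasured genes is penalized rather than ignored.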

3.5 Differential expression and enrichment analysis

Differential gene expression (DGE) analysis is a key component of scRNA-seq workflows, enabling the identification of genes that vary significantly across cell types, states, or conditions. DGE analysis has been instrumental in uncovering transcriptional regulators of HSCs associated with differentiation, aging, and lineage commitment (Wang et al., 2019). Several widely adopted tools support DGE analysis in scRNA-seq data.

DESeq2 is an R/Bioconductor package that employs a negative binomial distribution to model count data and estimate dispersion and fold changes between groups (Love et al., 2014). It has been used to identify transcriptional changes in aging HSCs (Adelman et al., 2019).

edgeR is another R/Bioconductor package that similarly models gene expression using a negative binomial distribution and generalized linear models, offering robust statistical frameworks for identifying differentially expressed genes (DEGs) across groups (Robinson et al., 2010). It has been applied in studies investigating dynamic gene expression during HSC differentiation (Lun et al., 2016).

Limma-voom combines linear modeling with precision weights derived from mean-variance relationships in log-transformed count data. This method is effective for scRNA-seq and has been used in both HSC-specific and broader single-cell studies (Lun et al., 2016; Ritchie et al., 2015).

MAST (Model-based Analysis of Single-cell Transcriptomics) uses a two-part generalized linear (hurdle) model that treats the detection rate of a gene and its expression level in expressing cells separately, capturing the bimodal distribution of single-cell data. It is particularly suited for zero-inflated datasets and has been widely applied to identify DEGs in HSCs and their progeny (Vanuytsel et al., 2022; Finak et al., 2015).

Initially developed for trajectory analysis, Monocle supports DGE testing along pseudotemporal trajectories, capturing dynamic changes during HSC lineage specification (Trapnell et al., 2014; Gao et al., 2021).

SCDE (Single-Cell Differential Expression) models dropouts and overdispersion using a Bayesian approach and has been applied in studies exploring gene expression dynamics during HSC differentiation (Tusi et al., 2018; Kharchenko et al., 2014).

In parallel with DGE, gene ontology (GO) enrichment analysis is used to interpret the biological functions associated with DEG sets, revealing signaling pathways, cellular processes, and transcriptional programs relevant to hematopoiesis. Several widely used tools support this analysis.

DAVID (Database for Annotation, Visualization, and Integrated Discovery), which enables functional annotation of gene lists, has been applied in studies of HSC differentiation (Adelman et al., 2019; Huang et al., 2009).

Enrichr, a web-based tool that offers access to multiple gene set libraries and enrichment algorithms, has been used to identify transcriptional regulators underlying dynamic HSC states (Kuleshov et al., 2016).

GSEA (Gene Set Enrichment Analysis) assesses whether predefined gene sets show statistically significant differences between biological conditions. It has been widely adopted in single-cell studies of HSCs and immune lineages (Subramanian et al., 2005).

clusterProfiler provides a programmatic interface for GO and pathway enrichment directly within R and supports visualization and statistical comparison of multiple gene sets (Yu et al., 2012; Xu et al., 2024).
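Over-representation tools of this kind are generally built on a hypergeometric or Fisher's exact test, whereas GSEA uses a rank-based statistic instead. The test asks how likely the observed overlap between a DEG list and a gene set would be if the DEGs were drawn at random. The sketch below implements that upper-tail probability; the function name and all numbers are hypothetical.

```python
from math import comb

def enrichment_pvalue(n_universe, n_in_set, n_degs, n_overlap):
    """P(overlap >= n_overlap) when n_degs genes are drawn without replacement
    from a universe of n_universe genes, n_in_set of which belong to the gene set."""
    upper = min(n_in_set, n_degs)
    total = comb(n_universe, n_degs)
    tail = sum(comb(n_in_set, k) * comb(n_universe - n_in_set, n_degs - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# Hypothetical case: 20,000 background genes, a 100-gene GO term,
# 200 DEGs, 10 of which fall in the term (about 1 expected by chance).
p = enrichment_pvalue(20_000, 100, 200, 10)
print(p)
```

In a real analysis this p-value would be computed for many gene sets and then corrected for multiple testing.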

These tools have proven essential for decoding the molecular underpinnings of HSC identity and fate decisions. By linking gene expression patterns to functional pathways, DGE and enrichment analyses continue to deepen understanding of the regulatory networks governing hematopoiesis.

3.6 Pseudotime analysis

Pseudotime analysis is a computational strategy used to infer the temporal progression of cellular states from static single-cell transcriptomic data. By ordering cells along a putative developmental trajectory based on their gene expression profiles, pseudotime analysis enables the identification of key regulators and pathways involved in differentiation, lineage commitment, and cellular transitions (Street et al., 2018; Bergen et al., 2020; Campbell and Yau, 2019). Several tools have been developed to model pseudotemporal dynamics in HSCs, each using distinct algorithms to reconstruct lineage hierarchies and predict gene expression changes.

Monocle is one of the most widely used tools for pseudotime inference. It employs a reverse graph embedding algorithm to map gene expression dynamics along developmental trajectories. In HSC studies, Monocle has been used to reconstruct differentiation pathways and identify transcriptional regulators of lineage fate decisions (Trapnell et al., 2014; Olsson et al., 2016).

SCORPIUS reduces the data with multidimensional scaling and fits a principal curve through the cells to model their progression along a smooth trajectory, then ranks genes by their importance for the inferred ordering to identify key regulatory genes. It has been applied to delineate hematopoietic lineage bifurcation, including the transition from HSCs to lymphoid and myeloid progenitors (Liang et al., 2020; Cannoodt et al., 2016).

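Wanderlust reconstructs developmental progressions using an ensemble of k-nearest-neighbor graphs, positioning each cell according to its shortest-path distances from a user-defined starting cell and a set of waypoint cells. This enables detailed mapping of sequential gene expression changes along a non-branching trajectory. This method has uncovered lineage-specific gene regulatory programs during HSC differentiation (Velten et al., 2017; Bendall et al., 2014).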

Waterfall groups cells by unsupervised clustering in principal-component space and links the cluster centers with a minimum spanning tree, onto which individual cells are projected to obtain their pseudotemporal order. In HSCs, Waterfall has been used to trace developmental hierarchies and pinpoint regulatory genes involved in early hematopoietic commitment (Shin et al., 2015).
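Many graph-based pseudotime methods share one core idea: connect each cell to its nearest neighbors in expression space and order cells by their shortest-path distance from a chosen root cell. The sketch below illustrates only that idea and does not reproduce any tool described above. The function names, the toy coordinates, and the choice of root are assumptions.

```python
import heapq

def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph weighted by Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph symmetric
    return graph

def pseudotime(graph, root):
    """Dijkstra shortest-path distance from the root cell to every reachable cell."""
    best = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best

# Toy cells in a 2-D embedding, listed in shuffled order along one trajectory.
cells = [(2.0, 0.1), (0.0, 0.0), (4.0, 0.4), (1.0, 0.2), (3.0, 0.5)]
times = pseudotime(knn_graph(cells, k=2), root=1)
ordering = sorted(times, key=times.get)
print(ordering)
```

The ordering recovers the cells' positions along the x-axis from the root outward. Cells that cannot be reached from the root receive no pseudotime, so a disconnected graph usually signals that k is too small.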

3.7 Network analysis

Network analysis provides a systems-level view of gene and protein interactions, enabling the identification of regulatory modules, signaling pathways, and transcriptional hierarchies that govern cellular identity and function (Cahan et al., 2021). In HSCs, network analysis has been pivotal for reconstructing GRN, identifying lineage-specific transcriptional regulators, and uncovering dynamic programs that govern differentiation and stem cell fate decisions. Several computational frameworks have been widely applied to single-cell transcriptomic data for network inference and analysis in HSCs.

Weighted Gene Co-expression Network Analysis (WGCNA) is an R-based package that constructs gene co-expression networks by identifying modules of highly correlated genes. These modules are often associated with biological traits or cell states. WGCNA has been used to identify hub genes and co-expression modules relevant to HSC maintenance and differentiation (Desterke et al., 2020).
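The first step of a weighted co-expression analysis can be sketched as follows: pairwise correlations are raised to a soft-thresholding power so that strong correlations are emphasized and weak ones are pushed toward zero. This is a minimal illustration of an unsigned adjacency and not the WGCNA package itself. The expression vectors are invented, and the power of 6 is only a commonly used starting value that is normally tuned per dataset.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def soft_threshold_adjacency(expr, beta=6):
    """expr: dict gene -> expression vector across samples.

    Returns the unsigned adjacency |cor|^beta for every gene pair.
    """
    genes = list(expr)
    return {(g, h): abs(pearson(expr[g], expr[h])) ** beta
            for i, g in enumerate(genes) for h in genes[i + 1:]}

# Hypothetical expression across five samples.
expr = {
    "GATA2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "KIT":   [2.0, 4.0, 6.0, 8.0, 10.5],  # tracks GATA2 closely
    "HBB":   [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related
}
adjacency = soft_threshold_adjacency(expr)
print(adjacency)
```

Modules are then typically found by clustering genes on a dissimilarity derived from this adjacency, and hub genes are those with the highest summed connection weights.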

SCENIC (Single-Cell Regulatory Network Inference and Clustering) integrates co-expression analysis with motif enrichment to infer TF–target relationships at single-cell resolution (Aibar et al., 2017). In HSCs, SCENIC has enabled the reconstruction of GRN and the identification of lineage-defining TFs and their regulatory targets (Moignard et al., 2015).

Monocle, in addition to trajectory inference, supports dynamic network analysis by modeling gene expression changes over pseudotime. This allows for identifying temporally regulated genes and pathways during hematopoietic differentiation (Trapnell et al., 2014; Olsson et al., 2016).

CellNet is a supervised machine learning tool designed to assess and reconstruct cell type–specific GRN using gene expression data. It has been employed to evaluate the fidelity of engineered or reprogrammed HSCs and to identify regulatory signatures distinguishing distinct hematopoietic states (Cahan et al., 2014; Lu et al., 2016).

Ingenuity Pathway Analysis (IPA) is a commercial platform that maps gene expression data onto curated biological pathways and networks. IPA has been used to identify upstream regulators, canonical pathways, and molecular interactions relevant to HSC signaling and functional specification (Marx-Blümel et al., 2021).

4 HSC ChIP-Seq data analysis

ChIP-seq maps genome-wide binding sites of TFs and other regulatory proteins, providing critical insights into the epigenetic regulation of gene expression (Lundberg et al., 2016). ChIP-seq has been instrumental in delineating cis-regulatory landscapes that control self-renewal and lineage commitment of HSCs. The method involves crosslinking DNA and proteins in situ, isolating protein–DNA complexes, immunoprecipitating them using target-specific antibodies, and sequencing the recovered DNA fragments. This enables the identification of genomic loci bound by TFs and chromatin-modifying proteins (Gade and Kalvakolanu, 2012). ChIP-seq profiling of undifferentiated and activated HSCs has identified dynamic TF binding events and cis-regulatory regions associated with self-renewal and differentiation (Qi et al., 2021). These findings have deepened the understanding of HSC regulation and may inform future therapeutic strategies for hematological diseases.

4.1 Computational pipeline for ChIP-Seq data analysis

Computational analysis of ChIP-seq data involves several key steps, each facilitated by specialized bioinformatics tools.

4.1.1 Quality control and preprocessing

Raw sequencing reads must be assessed for quality and trimmed to remove adapters and low-quality bases. FastQC (quality assessment) and Trimmomatic (trimming) are routinely used at this stage (Andrews, 2010; Bolger et al., 2014).

4.1.2 Alignment

Cleaned reads are aligned to a reference genome using aligners such as Bowtie2 or BWA. The resulting alignments are typically stored as binary alignment map (BAM) files that record read locations and mapping quality (Langmead and Salzberg, 2012; Li and Durbin, 2009).

4.1.3 Peak calling

Aligned reads are used to identify enriched regions, referred to as “peaks,” that signify protein–DNA interactions. Standard tools include MACS2, which evaluates the statistical significance of read enrichment, and SICER, which is suited for broad enrichment signals (Zhang et al., 2008; Xu et al., 2014).
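
To make the notion of an enriched region concrete, the following toy sketch tests read counts in fixed windows against a Poisson background estimated from a control sample. It is a deliberately simplified illustration of the general principle and does not reproduce the MACS2 or SICER algorithms; the window counts, the significance threshold, and the function names are assumptions.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def call_enriched_windows(chip_counts, control_counts, alpha=1e-3):
    """Return (window index, ChIP count, p-value) for enriched windows."""
    chip_total = sum(chip_counts)
    scale = chip_total / sum(control_counts)      # depth normalization
    genome_rate = chip_total / len(chip_counts)   # average reads per window
    peaks = []
    for idx, (chip, ctrl) in enumerate(zip(chip_counts, control_counts)):
        # Use the larger of the local and genome-wide background estimates.
        lam = max(ctrl * scale, genome_rate)
        p_value = poisson_sf(chip, lam)
        if p_value < alpha:
            peaks.append((idx, chip, p_value))
    return peaks

# Hypothetical read counts in eight consecutive windows.
chip = [3, 4, 2, 40, 3, 5, 2, 3]
control = [3, 3, 2, 4, 3, 4, 2, 3]
peaks = call_enriched_windows(chip, control)
```

In practice a multiple-testing correction would be applied across windows, and the background model would be estimated more carefully than in this sketch.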

4.1.4 Peak annotation and functional analysis

Identified peaks are annotated with genomic features (e.g., promoters, enhancers) using ChIPseeker (Yu et al., 2015). Enrichment analysis tools such as GREAT and Enrichr are then used to interpret the functional roles of bound regions (McLean et al., 2010; Kuleshov et al., 2016).
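
Peak annotation can be sketched as assigning each peak to its nearest transcription start site (TSS) and labeling it by distance. This is a minimal illustration, not ChIPseeker's implementation; the 1 kb promoter window, the gene names, and the coordinates are hypothetical.

```python
def annotate_peak(peak, tss_list, promoter_window=1000):
    """Assign a peak (chrom, start, end) to the nearest TSS on its chromosome."""
    chrom, start, end = peak
    center = (start + end) // 2
    candidates = [(abs(center - pos), gene)
                  for tss_chrom, pos, gene in tss_list if tss_chrom == chrom]
    if not candidates:
        return {"peak": peak, "gene": None, "distance": None,
                "feature": "unassigned"}
    distance, gene = min(candidates)
    feature = "promoter" if distance <= promoter_window else "distal"
    return {"peak": peak, "gene": gene, "distance": distance,
            "feature": feature}

# Hypothetical TSS positions and peaks.
tss = [("chr1", 10_000, "GeneA"), ("chr1", 80_000, "GeneB"),
       ("chr2", 5_000, "GeneC")]
peaks = [("chr1", 9_800, 10_400), ("chr1", 40_000, 40_600),
         ("chr3", 100, 500)]
annotations = [annotate_peak(p, tss) for p in peaks]
```

Distal peaks assigned this way are only candidate enhancers; nearest-gene assignment is a heuristic and does not establish a regulatory link.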

This pipeline enables the discovery of genome-wide TF binding sites, enhancer-promoter interactions, and regulatory motifs central to HSC function.
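
The steps above can be chained for a single-end sample roughly as follows. This sketch only assembles the command lines and does not run them. It assumes that the tools are installed and on the PATH, that Trimmomatic is available through a `trimmomatic` wrapper script, and that samtools (not discussed above) is used to sort and index alignments; file names, trimming thresholds, and the mouse genome-size shortcut are illustrative choices.

```python
def chipseq_commands(sample, fastq, control_bam, bowtie2_index,
                     outdir="results"):
    """Return, in order, the command lines for one single-end ChIP-seq sample."""
    trimmed = f"{outdir}/{sample}.trimmed.fastq.gz"
    sam = f"{outdir}/{sample}.sam"
    bam = f"{outdir}/{sample}.sorted.bam"
    return [
        # 1. Quality control report on the raw reads.
        ["fastqc", fastq, "--outdir", outdir],
        # 2. Adapter and quality trimming (thresholds are assumptions).
        ["trimmomatic", "SE", "-phred33", fastq, trimmed,
         "ILLUMINACLIP:TruSeq3-SE.fa:2:30:10", "SLIDINGWINDOW:4:15",
         "MINLEN:36"],
        # 3. Alignment to the reference genome.
        ["bowtie2", "-x", bowtie2_index, "-U", trimmed, "-S", sam],
        # 4. Coordinate sorting and indexing (samtools is an added assumption).
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # 5. Peak calling against an input control ("mm" = mouse genome size).
        ["macs2", "callpeak", "-t", bam, "-c", control_bam, "-f", "BAM",
         "-g", "mm", "-n", sample, "--outdir", outdir],
    ]

commands = chipseq_commands("hsc_tf", "hsc_tf.fastq.gz",
                            "input.sorted.bam", "mm10_index")
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the output directory exists; annotation and enrichment analysis would follow on the resulting peak files.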

4.2 HSC ChIP-Seq studies

Applying ChIP-seq to HSCs has enabled high-resolution mapping of TF binding sites and chromatin modifications, offering critical insights into the regulatory architecture underlying hematopoiesis (Lundberg et al., 2016; Gade and Kalvakolanu, 2012). Through computational analysis of ChIP-seq data, numerous studies have characterized gene regulatory elements that govern HSC self-renewal, quiescence, and lineage specification (Wilson et al., 2016; Cui et al., 2009). One study employed ChIP-seq to map genome-wide TF occupancy in HSC subpopulations (Subramanian et al., 2023). Using MACS2 for peak calling (Zhang et al., 2008) and HOMER for motif discovery, the authors identified dynamic changes in cis-regulatory landscapes during differentiation. The analysis revealed stage-specific binding of key TFs, underscoring the dynamic regulatory programs that orchestrate HSC fate decisions. In another study, the distribution of the histone modifications H3K4me3 and H3K27me3 in HSCs and their progeny was investigated using Bowtie for read alignment, MACS2 for peak calling, and IGV for visualization (Zhang et al., 2021). The study demonstrated that histone mark distribution is altered during differentiation, suggesting that epigenetic reprogramming is pivotal in regulating gene expression and lineage commitment. The function of Polycomb Repressive Complex 2 (PRC2) in HSC regulation has also been examined (Xie et al., 2014). ChIP-seq profiling of PRC2 components revealed enrichment at genes involved in differentiation. Functional studies showed that loss of PRC2 activity impaired HSC self-renewal and promoted premature differentiation, highlighting its essential role in maintaining stem cell identity.

The enhancer landscape during HSC differentiation was characterized by profiling H3K4me1, a histone modification associated with active and primed enhancers (Lara-Astiaso et al., 2014). The analysis revealed that lineage-specific enhancers are established early and maintained throughout differentiation, serving as epigenetic bookmarks for future transcriptional activation. The study also mapped binding sites of key TFs implicated in lineage choice and functional specification. ChIP-seq has also been used to delineate the binding profile of GATA1, a master regulator of erythropoiesis, in erythroid progenitors derived from HSCs (Wilson et al., 2016). Analysis with Bowtie and HOMER demonstrated that GATA1 targets both promoters and enhancers of erythroid-specific genes, reinforcing its central role in erythroid lineage programming. Similarly, ChIP-seq analysis revealed that GATA2, another critical TF in early hematopoiesis, binds to regulatory elements associated with genes essential for HSC maintenance and differentiation (Joshi et al., 2013). Loss of GATA2 disrupted these programs, confirming its indispensable role in sustaining HSC identity.

These studies underscore the power of ChIP-seq to uncover the transcriptional and epigenetic networks that define HSC behavior. High-resolution binding data with advanced computational pipelines facilitate the identification of promoters, enhancers, and TF occupancy patterns that govern key aspects of HSC function from quiescence and self-renewal to lineage commitment (Joshi et al., 2013; Gade and Kalvakolanu, 2012; Hannah et al., 2011). These findings enhance understanding of hematopoietic development and provide a framework for identifying novel targets for therapeutic manipulation in hematological disorders.

5 Network inference algorithms

Network inference algorithms offer a robust computational framework for reconstructing GRN from high-throughput gene expression data. These approaches enable the identification of transcriptional regulators, target genes, and functional modules that control cellular processes such as development, differentiation, and lineage commitment (Kamimoto et al., 2023; Mercatelli et al., 2020). Their application has been particularly transformative in the study of HSCs, where understanding the regulatory circuitry is essential for elucidating the mechanisms governing self-renewal, multipotency, and differentiation. Several network inference algorithms have been developed, each with unique strengths and assumptions based on data types and modeling goals (Saint-Antoine and Singh, 2020). These include Bayesian approaches, mutual information-based algorithms, and correlation-based methods, which have been applied to transcriptomic data, particularly from scRNA-seq, to predict regulatory interactions with increasing granularity and biological relevance. In one study, GENIE3, a tree-based ensemble learning method, was employed to infer GRN from single-cell expression profiles of developing mouse embryos (Kamimoto et al., 2023). The analysis identified well-established regulators of hematopoiesis, including GATA2, Runx1, and Scl/Tal1, as well as novel candidates such as LMO2 and MYB. Functional validation through genetic perturbation experiments confirmed the predicted regulatory interactions and demonstrated the network’s ability to forecast downstream effects of TF deletion. Similarly, another study used network inference to analyze bulk RNA-seq data from murine HSCs and their progenitors (Cabezas-Wallscheid et al., 2014). The analysis revealed a GATA2-centered module regulating self-renewal and identified several additional factors involved in HSC lineage priming.

In another study, network inference was applied to human scRNA-seq datasets to reconstruct differentiation trajectories in early hematopoiesis (Velten et al., 2017). The analysis highlighted PU.1 as a key regulator of myeloid lineage commitment, consistent with prior functional evidence. The GRN underlying the differentiation of HSCs into all major blood lineages has also been reconstructed (Serina Secanechia et al., 2022). Using scRNA-seq data across developmental timepoints, the study identified both canonical regulators (e.g., GATA2, Runx1, Scl/Tal1) and novel contributors such as CEBPα and Spi1. CRISPR-Cas9-mediated perturbations were used to validate predictions, demonstrating the predictive strength of the inferred network. These studies illustrate how integrating expression data with network inference enables mechanistic insights into HSC biology. By revealing both established and previously uncharacterized regulators, these approaches provide a blueprint for understanding hematopoietic fate decisions at a systems level (Armingol et al., 2021).

5.1 Computational workflow for network inference in HSCs

The computational reconstruction of GRN in HSCs typically involves four key steps.

5.1.1 Preprocessing

Raw transcriptomic data (e.g., RNA-seq or scRNA-seq) undergo quality control, normalization, and batch correction to minimize technical variability and retain biological signals (Lun et al., 2016).
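
As one concrete example of the normalization step, the sketch below applies library-size scaling followed by a log transform, a common choice for scRNA-seq counts. It is an illustrative assumption rather than the specific procedure of the cited workflow, it does not cover quality control or batch correction, and the counts and the scale factor of 10,000 are invented for the example.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Scale each cell to a common library size, then apply log(1 + x)."""
    normalized = {}
    for cell, gene_counts in counts.items():
        total = sum(gene_counts.values())
        normalized[cell] = {
            gene: math.log1p(count / total * scale_factor) if total else 0.0
            for gene, count in gene_counts.items()
        }
    return normalized

# Two hypothetical cells with the same composition but different depth.
counts = {
    "cell_shallow": {"Gata2": 5, "Runx1": 5},
    "cell_deep": {"Gata2": 50, "Runx1": 50},
}
normalized = log_normalize(counts)
```

After normalization the two cells have identical values, which shows how the step removes differences that arise only from sequencing depth.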

5.1.2 Network inference

Preprocessed data are input into network inference algorithms such as GENIE3, ARACNE, WGCNA, and GeneNet. These tools infer edges between TFs and potential targets, constructing initial GRN (Margolin et al., 2006; Langfelder and Horvath, 2008; Ananko et al., 2002).
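
A correlation-based approach in the spirit of WGCNA can be sketched as follows: pairwise Pearson correlations are raised to a soft-thresholding power to give an unsigned adjacency, and weak edges are dropped. This is a minimal sketch rather than the WGCNA package itself; the power of 6, the edge cutoff, and the expression values are assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def soft_threshold_network(expression, beta=6, min_weight=0.5):
    """Unsigned adjacency |cor|^beta for each gene pair, kept if >= min_weight."""
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        weight = abs(pearson(expression[g1], expression[g2])) ** beta
        if weight >= min_weight:
            edges[(g1, g2)] = weight
    return edges

# Toy expression values across five samples (not real measurements).
expression = {
    "Gata2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Tal1": [1.1, 2.0, 3.2, 3.9, 5.1],
    "Spi1": [3.0, 1.0, 4.0, 1.0, 5.0],
}
edges = soft_threshold_network(expression)
```

Edges obtained this way are undirected co-expression links; methods such as GENIE3 or ARACNE use different scoring schemes, and none of them alone establishes direct regulation.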

5.1.3 Network validation

Inferred interactions are validated against known regulatory databases or experimentally using loss-of-function or gain-of-function assays. This step assesses biological plausibility and predictive robustness (Kamimoto et al., 2023).
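
Comparison against a reference set of known interactions is often summarized with precision and recall. The sketch below assumes that both the inferred network and the reference are available as sets of (regulator, target) pairs; the edge lists are hypothetical placeholders.

```python
def evaluate_edges(predicted, reference):
    """Compare inferred edges with a reference set of known interactions."""
    predicted, reference = set(predicted), set(reference)
    true_positives = predicted & reference
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(reference) if reference else 0.0
    return {"true_positives": sorted(true_positives),
            "precision": precision, "recall": recall}

# Hypothetical inferred and curated (regulator, target) pairs.
predicted = {("TF_A", "gene1"), ("TF_A", "gene2"),
             ("TF_B", "gene3"), ("TF_C", "gene4")}
reference = {("TF_A", "gene1"), ("TF_B", "gene3"), ("TF_B", "gene5")}
report = evaluate_edges(predicted, reference)
```

Because curated databases are incomplete, predicted edges missing from the reference are not necessarily false, which is one reason experimental perturbation remains important.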

5.1.4 Network analysis

The final network is analyzed using centrality, modularity, and connectivity metrics to identify master regulators and key subnetworks (Cahan et al., 2021). Tools such as Cytoscape are commonly used for visualization and annotation (Shannon et al., 2003).
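
One of the simplest centrality measures is out-degree: regulators are ranked by the fraction of other nodes they target. The sketch below is a minimal illustration with hypothetical edges; other centrality and modularity metrics would require a dedicated graph library.

```python
from collections import Counter

def rank_regulators(edges):
    """Rank regulators by out-degree centrality in a directed TF -> target network."""
    out_degree = Counter(tf for tf, _ in edges)
    nodes = {node for edge in edges for node in edge}
    denom = max(len(nodes) - 1, 1)   # maximum possible number of targets
    ranking = [(tf, degree / denom) for tf, degree in out_degree.items()]
    return sorted(ranking, key=lambda item: (-item[1], item[0]))

# Hypothetical directed edges (regulator, target).
edges = [("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_A", "gene3"),
         ("TF_B", "gene1")]
ranking = rank_regulators(edges)
```

A high out-degree marks a candidate master regulator, but the ranking depends on how completely the network was inferred.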

6 Machine learning approaches for HSC data analysis

Machine learning approaches have become indispensable in HSC computational biology, particularly for modeling complex regulatory networks and predicting gene interactions from high-dimensional data. These techniques facilitate the discovery of novel transcriptional programs and molecular mechanisms underlying HSC differentiation, lineage commitment, and self-renewal (Bian and Cahan, 2016). A notable study utilized a deep learning-based framework to predict tissue-specific regulatory interactions between endothelial cells and HSCs using scRNA-seq data from mouse bone marrow (Wang et al., 2024). This approach accurately captured previously unrecognized cross-cell-type interactions, highlighting the capacity of machine learning to elucidate complex intercellular communication.

ChIP-seq data have also been integrated with machine learning; for example, a random forest algorithm was applied to predict TF binding sites and identify key regulators of HSC function and differentiation (Kamimoto et al., 2023). This highlights the utility of machine learning for enhancer and TF motif prediction. Support vector machines (SVMs) have also been applied to classify distinct stages of HSC differentiation based on gene expression profiles. One study used machine learning approaches to phenotype hematopoietic progenitor cells, offering insights into their transcriptional differences from fetal liver HSCs (Fidanza et al., 2020). A neural network model has been developed to identify functional enhancers regulating self-renewal and lineage-specific regulators (Xia et al., 2020), and machine learning has also been used to estimate the regulatory potential of DNA sequences, identifying transcription factors and enhancer elements relevant to HSC identity and fate (Xiang et al., 2020).

Other studies have demonstrated the predictive power of random forest models in modeling gene expression changes during HSC differentiation and mapping chromatin accessibility across regulatory regions (Fortelny and Bock, 2020; Lal et al., 2021). Collectively, these applications underscore the transformative role of machine learning in decoding regulatory complexity in HSC biology (Fidanza et al., 2020).

6.1 Machine learning tools for HSC data analysis

Several computational tools and platforms have been developed to implement machine learning techniques for HSC datasets.

6.1.1 Scikit-learn

A widely used Python library offering an extensive suite of machine learning algorithms, including SVM, decision trees, and clustering. It has been applied in studies predicting intercellular regulatory interactions (Wang et al., 2024; Scikit-Learn, 2016).

6.1.2 TensorFlow

A robust open-source framework developed by Google, suitable for large-scale deep learning applications. TensorFlow has been used to construct a neural network model that predicts gene expression in single HSCs (Athanasiadis et al., 2017).

6.1.3 PyTorch

An alternative deep learning platform known for its flexibility and dynamic computation graph, used to model lineage trajectories of individual HSCs (Wang et al., 2024).

6.1.4 DeepCpG

A deep learning model for predicting DNA methylation from sequencing data. It has been used to model methylation dynamics at single CpG resolution in HSCs and progenitors (Angermueller et al., 2017).

6.1.5 ChromNet

A tool that infers chromatin interactions from ChIP-seq data using deep learning, applied to predict enhancer-promoter connectivity in HSCs (Lundberg et al., 2016).

6.2 Machine learning based workflow for HSC data analysis

Machine learning-driven analysis of HSC data typically follows a structured workflow.

6.2.1 Data preprocessing

Raw expression or epigenomic data are filtered, normalized, and batch corrected. Genes with low expression or limited variance are excluded (Gonzalez Zelaya, 2019).
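
The gene-filtering part of this step can be sketched as a simple threshold on mean expression and variance. The cutoffs and expression values below are arbitrary assumptions; suitable thresholds depend on the dataset and normalization used.

```python
from statistics import mean, pvariance

def filter_genes(expression, min_mean=1.0, min_variance=0.1):
    """Keep genes whose mean expression and variance both pass the cutoffs."""
    return {gene: values for gene, values in expression.items()
            if mean(values) >= min_mean and pvariance(values) >= min_variance}

# Hypothetical normalized expression values across four cells.
expression = {
    "gene_low": [0.0, 0.0, 1.0, 0.0],        # mean below cutoff
    "gene_constant": [5.0, 5.0, 5.0, 5.0],   # no variance
    "gene_variable": [1.0, 4.0, 8.0, 3.0],   # passes both filters
}
kept = filter_genes(expression)
```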

6.2.2 Feature selection

Informative features are extracted to reduce dimensionality and improve model generalizability. Approaches such as minimum redundancy maximum relevance (mRMR) are commonly employed (Dhal and Azad, 2022).
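
A greedy form of mRMR can be written compactly for discretized features: at each step, the feature with the highest mutual information with the labels, minus its average mutual information with already selected features, is added. This is a minimal sketch of the difference criterion under the assumption that features have been binarized beforehand; the marker names and values are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mrmr(features, labels, k):
    """Greedily select k features by relevance minus mean redundancy."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected, remaining = [], set(features)

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(features[f], features[s])
                         for s in selected) / len(selected)
        return relevance[f] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted for stable ties
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical binarized markers for eight cells and their class labels.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "marker_A": [0, 0, 0, 1, 1, 1, 1, 1],
    "marker_A_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # fully redundant with marker_A
    "marker_C": [0, 1, 0, 0, 1, 1, 0, 1],
}
chosen = mrmr(features, labels, k=2)
```

The duplicated marker is as relevant as the original but is penalized for redundancy, so the less relevant yet complementary marker is selected second.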

6.2.3 Model training

Selected features are used to train machine learning models such as SVM, random forests, or neural networks (Bian and Cahan, 2016).

6.2.4 Model evaluation

Cross-validation or independent test sets assess model performance, ensuring robustness and avoiding overfitting (Xiong et al., 2020).
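
The training and evaluation steps can be illustrated together with k-fold cross-validation. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the SVM, random forest, or neural network models mentioned above; the two-feature data and class names are invented, and in practice a library such as scikit-learn would usually be used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; sample i is assigned to fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test

def fit_centroids(X, y):
    """Compute the mean feature vector of each class."""
    grouped = {}
    for row, label in zip(X, y):
        grouped.setdefault(label, []).append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()}

def predict(centroids, row):
    """Return the class whose centroid is closest in squared distance."""
    return min(centroids, key=lambda label: sum(
        (a - b) ** 2 for a, b in zip(row, centroids[label])))

def cross_validate(X, y, k=3):
    """Return the held-out accuracy of each fold."""
    accuracies = []
    for train, test in k_fold_indices(len(X), k):
        centroids = fit_centroids([X[i] for i in train],
                                  [y[i] for i in train])
        correct = sum(predict(centroids, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return accuracies

# Hypothetical two-feature profiles, interleaved so each fold has both classes.
X = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9], [1.1, 0.0], [0.0, 1.1]]
y = ["HSC", "progenitor", "HSC", "progenitor", "HSC", "progenitor"]
accuracies = cross_validate(X, y, k=3)
```

Feature selection should be repeated inside each training fold; selecting features on the full dataset before splitting leaks information and inflates the estimated accuracy.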

6.2.5 Network analysis

Predicted regulatory interactions are visualized and interpreted using platforms like Cytoscape, aiding in identifying key regulators and pathways (Shannon et al., 2003).

6.3 Case study: regulatory prediction between endothelial cells and HSCs

A complete machine learning pipeline was demonstrated in a study aimed at decoding HSCs based on their morphological features, using microscopy images, enabling rapid identification of HSCs and progenitor cells (Wang et al., 2024). The SVM model was trained and validated using cross-validation techniques after applying a mutual information-based minimum redundancy maximum relevance algorithm for feature selection (Dhal and Azad, 2022). The resulting network, visualized using Cytoscape, revealed novel intercellular signaling pathways that were experimentally supported, showcasing the strength of machine learning for hypothesis generation and network reconstruction.

7 Conclusion

HSC biology has entered a transformative era, driven by advances in high-throughput sequencing technologies and the parallel development of sophisticated computational frameworks. From scRNA-seq and ChIP-seq to network inference algorithms and machine learning, these techniques and tools have collectively revolutionized our ability to dissect the heterogeneity of HSCs, trace lineage trajectories, and decipher regulatory circuits at unprecedented resolution. Crucially, computational strategies enhance the identification and functional characterization of true, self-renewing HSCs. They also facilitate the discovery of biomarkers, transcriptional regulators, and epigenetic modifiers that underpin hematopoietic differentiation. Integrating multi-omics datasets with predictive modeling and functional validation is poised to unlock deeper mechanistic insights into normal and pathological hematopoiesis. The convergence of machine learning, systems biology, and experimental hematology will be essential for achieving the long-standing goal of prospectively isolating and therapeutically deploying pure HSC populations. Furthermore, linking these computational insights to clinical outcomes can accelerate the development of precision therapies for hematologic malignancies, bone marrow failure syndromes, and immune disorders. In essence, computational approaches are no longer ancillary tools in HSC research; they are central to the next generation of discoveries and therapeutic innovations in stem cell biology and regenerative medicine.

Acknowledgments

The authors are most grateful to NIH/NLM (U.S. National Institutes of Health’s National Library of Medicine) for accessing free full-text scientific publications on PubMed Central (www.ncbi.nlm.nih.gov/pmc/), which was integral for the successful completion of this work. The figure is original and created by the author using Biorender (https://www.biorender.com/).

Funding Statement

The author(s) declare that no financial support was received for the research and/or publication of this article.

Author contributions

PR: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Supervision, Validation, Visualization, Writing – original draft, Writing – review and editing. BB: Data curation, Formal Analysis, Validation, Visualization, Writing – original draft, Writing – review and editing. RC: Data curation, Formal Analysis, Validation, Visualization, Writing – original draft, Writing – review and editing.

Conflict of interest

Authors BB, RC were employed by BioExIn.

The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Abbreviations

BAM, Binary Alignment Map; CCR, Cell Cycle Regression; ChIP-Seq, Chromatin Immunoprecipitation Sequencing; DAVID, Database for Annotation, Visualization, and Integrated Discovery; DBSCAN, Density-Based Spatial Clustering of Applications with Noise; DEGs, Differentially Expressed Genes; DGE, Differential Gene Expression; GO, Gene Ontology; GRN, Gene Regulatory Network; GSEA, Gene Set Enrichment Analysis; HSCs, Hematopoietic Stem Cells; IPA, Ingenuity Pathway Analysis; MAST, Model-based Analysis of Single-cell Transcriptomics; NGS, Next-Generation Sequencing; PCA, Principal Component Analysis; PRC2, Polycomb Repressive Complex 2; SCDE, Single-Cell Differential Expression; SCF, Stem Cell Factor; scRNA-Seq, Single-cell RNA-Sequencing; SVM, Support Vector Machines; TFs, Transcription Factors; t-SNE, t-Distributed Stochastic Neighbor Embedding; UMAP, Uniform Manifold Approximation and Projection; UMIs, Unique Molecular Identifiers; VSN, Variance-Stabilizing Normalization; WGCNA, Weighted Gene Co-expression Network Analysis.

References

Adelman E. R., Huang H.-T., Roisman A., Olsson A., Colaprico A., Qin T., et al. (2019). Aging human hematopoietic stem cells manifest profound epigenetic reprogramming of enhancers that may predispose to leukemia. Cancer Discov. 9 (8), 1080–1101. 10.1158/2159-8290.CD-18-1474
Aibar S., González-Blas C. B., Moerman T., Huynh-Thu V. A., Imrichova H., Hulselmans G., et al. (2017). SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14 (11), 1083–1086. 10.1038/nmeth.4463
Ananko E. A., Podkolodny N. L., Stepanenko I. L., Ignatieva E. V., Podkolodnaya O. A., Kolchanov N. A. (2002). GeneNet: a database on structure and functional organisation of gene networks. Nucleic Acids Res. 30 (1), 398–401. 10.1093/nar/30.1.398
Anders S., Pyl P. T., Huber W. (2015). HTSeq — a Python framework to work with high-throughput sequencing data. Bioinformatics 31 (2), 166–169. 10.1093/bioinformatics/btu638
Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
Angerer P., Haghverdi L., Büttner M., Theis F. J., Marr C., Buettner F. (2016). destiny: diffusion maps for large-scale single-cell data in R. Bioinformatics 32 (8), 1241–1243. 10.1093/bioinformatics/btv715
Angermueller C., Lee H. J., Reik W., Stegle O. (2017). DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 18 (1), 67. 10.1186/s13059-017-1189-z
Aran D., Looney A. P., Liu L., Wu E., Fong V., Hsu A., et al. (2019). Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20 (2), 163–172. 10.1038/s41590-018-0276-y
Armingol E., Officer A., Harismendy O., Lewis N. E. (2021). Deciphering cell-cell interactions and communication from gene expression. Nat. Rev. Genet. 22 (2), 71–88. 10.1038/s41576-020-00292-x
Athanasiadis E. I., Botthof J. G., Andres H., Ferreira L., Lio P., Cvejic A. (2017). Single-cell RNA-sequencing uncovers transcriptional states and fate decisions in haematopoiesis. Nat. Commun. 8 (1), 2045. 10.1038/s41467-017-02305-6
Bacher R., Chu L.-F., Leng N., Gasch A. P., Thomson J. A., Stewart R. M., et al. (2017). SCnorm: robust normalization of single-cell RNA-seq data. Nat. Methods 14 (6), 584–586. 10.1038/nmeth.4263
Baran-Gale J., Chandra T., Kirschner K. (2018). Experimental design for single-cell RNA sequencing. Brief. Funct. Genomics 17 (4), 233–239. 10.1093/bfgp/elx035
Barriga F., Ramírez P., Wietstruck A., Rojas N. (2012). Hematopoietic stem cell transplantation: clinical use and perspectives. Biol. Res. 45 (3), 307–316. 10.4067/S0716-97602012000300012
Bendall S. C., Davis K. L., Amir E.-A. D., Tadmor M. D., Simonds E. F., Chen T. J., et al. (2014). Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development. Cell 157 (3), 714–725. 10.1016/j.cell.2014.04.005
Bergen V., Lange M., Peidli S., Wolf F. A., Theis F. J. (2020). Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38 (12), 1408–1414. 10.1038/s41587-020-0591-3
Bian Q., Cahan P. (2016). Computational tools for stem cell biology. Trends Biotechnol. 34 (12), 993–1009. 10.1016/j.tibtech.2016.05.010
Bolger A. M., Lohse M., Usadel B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30 (15), 2114–2120. 10.1093/bioinformatics/btu170
Bray N. L., Pimentel H., Melsted P., Pachter L. (2016). Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34 (5), 525–527. 10.1038/nbt.3519
Brüning R. S., Tombor L., Schulz M. H., Dimmeler S., John D. (2022). Comparative analysis of common alignment tools for single-cell RNA sequencing. Gigascience 11, giac001. 10.1093/gigascience/giac001
Butler A., Hoffman P., Smibert P., Papalexi E., Satija R. (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36 (5), 411–420. 10.1038/nbt.4096
Cabezas-Wallscheid N., Klimmeck D., Hansson J., Lipka D. B., Reyes A., Wang Q., et al. (2014). Identification of regulatory networks in HSCs and their immediate progeny via integrated proteome, transcriptome, and DNA methylome analysis. Cell Stem Cell 15 (4), 507–522. 10.1016/j.stem.2014.07.005
Cahan P., Cacchiarelli D., Dunn S.-J., Hemberg M., de Sousa Lopes S. M. C., Morris S. A., et al. (2021). Computational stem cell biology: open questions and guiding principles. Cell Stem Cell 28 (1), 20–32. 10.1016/j.stem.2020.12.012
Cahan P., Li H., Morris S. A., Lummertz da Rocha E., Daley G. Q., Collins J. J. (2014). CellNet: network biology applied to stem cell engineering. Cell 158 (4), 903–915. 10.1016/j.cell.2014.07.020
Campbell K. R., Yau C. (2019). A descriptive marker gene approach to single-cell pseudotime inference. Bioinformatics 35 (1), 28–35. 10.1093/bioinformatics/bty498
Cannoodt R., Saelens W., Saeys Y. (2016). Computational methods for trajectory inference from single-cell transcriptomics. Eur. J. Immunol. 46 (11), 2496–2506. 10.1002/eji.201646347
Cuevas-Diaz Duran R., Wei H., Wu J. (2024). Data normalization for addressing the challenges in the analysis of single-cell transcriptomic datasets. BMC Genomics 25 (1), 444. 10.1186/s12864-024-10364-5
Cui K., Zang C., Roh T.-Y., Schones D. E., Childs R. W., Peng W., et al. (2009). Chromatin signatures in multipotent human hematopoietic stem cells indicate the fate of bivalent genes during differentiation. Cell Stem Cell 4 (1), 80–93. 10.1016/j.stem.2008.11.011
Czechowicz A., Weissman I. L. (2010). Purified hematopoietic stem cell transplantation: the next generation of blood and immune replacement. Immunol. Allergy Clin. North Am. 30 (2), 159–171. 10.1016/j.iac.2010.03.003
Desterke C., Petit L., Sella N., Chevallier N., Cabeli V., Coquelin L., et al. (2020). Inferring gene networks in bone marrow hematopoietic stem cell-supporting stromal niche populations. iScience 23 (6), 101222. 10.1016/j.isci.2020.101222
Dhal P., Azad C. (2022). A comprehensive survey on feature selection in the various fields of machine learning. Appl. Intell. 52 (4), 4543–4581. 10.1007/s10489-021-02550-9
Dobin A., Davis C. A., Schlesinger F., Drenkow J., Zaleski C., Jha S., et al. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29 (1), 15–21. 10.1093/bioinformatics/bts635
Du Y., Huang Q., Arisdakessian C., Garmire L. X. (2020). Evaluation of STAR and Kallisto on single cell RNA-seq data alignment. G3 (Bethesda) 10 (5), 1775–1783. 10.1534/g3.120.401160
Fast E. M., Sporrij A., Manning M., Rocha E. L., Yang S., Zhou Y., et al. (2021). External signals regulate continuous transcriptional states in hematopoietic stem cells. eLife 10, 10. 10.7554/elife.66512
Fidanza A., Stumpf P. S., Ramachandran P., Tamagno S., Babtie A., Lopez-Yrigoyen M., et al. (2020). Single-cell analyses and machine learning define hematopoietic progenitor and HSC-like cells derived from human PSCs. Blood 136 (25), 2893–2904. 10.1182/blood.2020006229
Finak G., McDavid A., Yajima M., Deng J., Gersuk V., Shalek A. K., et al. (2015). MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278. 10.1186/s13059-015-0844-5
Fortelny N., Bock C. (2020). Knowledge-primed neural networks enable biologically interpretable deep learning on single-cell sequencing data. Genome Biol. 21 (1), 190. 10.1186/s13059-020-02100-5
Gade P., Kalvakolanu D. V. (2012). Chromatin immunoprecipitation assay as a tool for analyzing transcription factor activity. Methods Mol. Biol. 809, 85–104. 10.1007/978-1-61779-376-9_6
Gao S., Wu Z., Kannan J., Mathews L., Feng X., Kajigaya S., et al. (2021). Comparative transcriptomic analysis of the hematopoietic system between human and mouse by single cell RNA sequencing. Cells 10 (5), 973. 10.3390/cells10050973
Ghojogh B., Ghodsi A., Karray F., Crowley M. (2021). Uniform manifold approximation and projection (UMAP) and its variants: tutorial and survey. arXiv.
Gonzalez Zelaya C. V. (2019). “Towards explaining the effects of data preprocessing on machine learning,” in 2019 IEEE 35th international conference on data engineering (ICDE). IEEE, 2086–2090.
Haghverdi L., Buettner F., Theis F. J. (2015). Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics 31 (18), 2989–2998. 10.1093/bioinformatics/btv325
Hannah R., Joshi A., Wilson N. K., Kinston S., Göttgens B. (2011). A compendium of genome-wide hematopoietic transcription factor maps supports the identification of gene regulatory control mechanisms. Exp. Hematol. 39 (5), 531–541. 10.1016/j.exphem.2011.02.009
Hérault L., Poplineau M., Remy E., Duprez E. (2022). Single cell transcriptomics to understand HSC heterogeneity and its evolution upon aging. Cells 11 (19), 3125. 10.3390/cells11193125 [DOI] [PMC free article] [PubMed] [Google Scholar]
Huang D. W., Sherman B. T., Lempicki R. A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4 (1), 44–57. 10.1038/nprot.2008.211 [DOI] [PubMed] [Google Scholar]
Ivanova N. B., Dimos J. T., Schaniel C., Hackney J. A., Moore K. A., Lemischka I. R. (2002). A stem cell molecular signature. Science 298 (5593), 601–604. 10.1126/science.1073823 [DOI] [PubMed] [Google Scholar]
Joshi A., Hannah R., Diamanti E., Göttgens B. (2013). Gene set control analysis predicts hematopoietic control mechanisms from genome-wide transcription factor binding data. Exp. Hematol. 41 (4), 354–66.e14. 10.1016/j.exphem.2012.11.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
Kamimoto K., Stringa B., Hoffmann C. M., Jindal K., Solnica-Krezel L., Morris S. A. (2023). Dissecting cell identity via network inference and in silico gene perturbation. Nature 614 (7949), 742–751. 10.1038/s41586-022-05688-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
Kharchenko P. V., Silberstein L., Scadden D. T. (2014). Bayesian approach to single-cell differential expression analysis. Nat. Methods 11 (7), 740–742. 10.1038/nmeth.2967 [DOI] [PMC free article] [PubMed] [Google Scholar]
Kim D., Langmead B., Salzberg S. L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12 (4), 357–360. 10.1038/nmeth.3317 [DOI] [PMC free article] [PubMed] [Google Scholar]
Korsunsky I., Millard N., Fan J., Slowikowski K., Zhang F., Wei K., et al. (2019). Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16 (12), 1289–1296. 10.1038/s41592-019-0619-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
Kuleshov M. V., Jones M. R., Rouillard A. D., Fernandez N. F., Duan Q., Wang Z., et al. (2016). Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44 (W1), W90–W97. 10.1093/nar/gkw377 [DOI] [PMC free article] [PubMed] [Google Scholar]
Lal A., Chiang Z. D., Yakovenko N., Duarte F. M., Israeli J., Buenrostro J. D. (2021). Deep learning-based enhancement of epigenomics data with AtacWorks. Nat. Commun. 12 (1), 1507. 10.1038/s41467-021-21765-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
Langfelder P., Horvath S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinforma. 9, 559. 10.1186/1471-2105-9-559 [DOI] [PMC free article] [PubMed] [Google Scholar]
Langmead B., Salzberg S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9 (4), 357–359. 10.1038/nmeth.1923 [DOI] [PMC free article] [PubMed] [Google Scholar]
Lara-Astiaso D., Weiner A., Lorenzo-Vivas E., Zaretsky I., Jaitin D. A., David E., et al. (2014). Chromatin state dynamics during blood formation. Science 345 (6199), 943–949. 10.1126/science.1256271
Lee J. Y., Hong S.-H. (2020). Hematopoietic stem cells and their roles in tissue regeneration. Int. J. Stem Cells 13 (1), 1–12. 10.15283/ijsc19127
Li H., Durbin R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25 (14), 1754–1760. 10.1093/bioinformatics/btp324
Liang R., Arif T., Kalmykova S., Kasianov A., Lin M., Menon V., et al. (2020). Restraining lysosomal activity preserves hematopoietic stem cell quiescence and potency. Cell Stem Cell 26 (3), 359–376.e7. 10.1016/j.stem.2020.01.013
Liao Y., Smyth G. K., Shi W. (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30 (7), 923–930. 10.1093/bioinformatics/btt656
Logan A. C., Weissman I. L., Shizuru J. A. (2012). The road to purified hematopoietic stem cell transplants is paved with antibodies. Curr. Opin. Immunol. 24 (5), 640–648. 10.1016/j.coi.2012.08.002
Love M. I., Huber W., Anders S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15 (12), 550. 10.1186/s13059-014-0550-8
Lu Y.-F., Cahan P., Ross S., Sahalie J., Sousa P. M., Hadland B. K., et al. (2016). Engineered murine HSCs reconstitute multi-lineage hematopoiesis and adaptive immunity. Cell Rep. 17 (12), 3178–3192. 10.1016/j.celrep.2016.11.077
Lun A. T. L., McCarthy D. J., Marioni J. C. (2016). A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. [version 2; peer review: 3 approved, 2 approved with reservations]. F1000Res 5, 2122. 10.12688/f1000research.9501.2
Lundberg S. M., Tu W. B., Raught B., Penn L. Z., Hoffman M. M., Lee S.-I. (2016). ChromNet: learning the human chromatin network from all ENCODE ChIP-seq data. Genome Biol. 17, 82. 10.1186/s13059-016-0925-0
Marx-Blümel L., Marx C., Sonnemann J., Weise F., Hampl J., Frey J., et al. (2021). Molecular characterization of hematopoietic stem cells after in vitro amplification on biomimetic 3D PDMS cell culture scaffolds. Sci. Rep. 11 (1), 21163. 10.1038/s41598-021-00619-6
Macosko E. Z., Basu A., Satija R., Nemesh J., Shekhar K., Goldman M., et al. (2015). Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161 (5), 1202–1214. 10.1016/j.cell.2015.05.002
Margolin A. A., Nemenman I., Basso K., Wiggins C., Stolovitzky G., Dalla Favera R., et al. (2006). ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinforma. 7 (Suppl. 1), S7. 10.1186/1471-2105-7-S1-S7
McCarthy D. J., Campbell K. R., Lun A. T. L., Wills Q. F. (2017). Scater: preprocessing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33 (8), 1179–1186. 10.1093/bioinformatics/btw777
McLean C. Y., Bristor D., Hiller M., Clarke S. L., Schaar B. T., Lowe C. B., et al. (2010). GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28 (5), 495–501. 10.1038/nbt.1630
Mercatelli D., Scalambra L., Triboli L., Ray F., Giorgi F. M. (2020). Gene regulatory network inference resources: a practical overview. Biochim. Biophys. Acta Gene Regul. Mech. 1863 (6), 194430. 10.1016/j.bbagrm.2019.194430
Moignard V., Macaulay I. C., Swiers G., Buettner F., Schütte J., Calero-Nieto F. J., et al. (2013). Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nat. Cell Biol. 15 (4), 363–372. 10.1038/ncb2709
Moignard V., Woodhouse S., Haghverdi L., Lilly A. J., Tanaka Y., Wilkinson A. C., et al. (2015). Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nat. Biotechnol. 33 (3), 269–276. 10.1038/nbt.3154
Negrin R. S., Atkinson K., Leemhuis T., Hanania E., Juttner C., Tierney K., et al. (2000). Transplantation of highly purified CD34+Thy-1+ hematopoietic stem cells in patients with metastatic breast cancer. Biol. Blood Marrow Transpl. 6 (3), 262–271. 10.1016/s1083-8791(00)70008-5
Ng A. P., Alexander W. S. (2017). Haematopoietic stem cells: past, present and future. Cell Death Discov. 3, 17002. 10.1038/cddiscovery.2017.2
Olsson A., Venkatasubramanian M., Chaudhri V. K., Aronow B. J., Salomonis N., Singh H., et al. (2016). Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537 (7622), 698–702. 10.1038/nature19348
Qi Q., Cheng L., Tang X., He Y., Li Y., Yee T., et al. (2021). Dynamic CTCF binding directly mediates interactions among cis-regulatory elements essential for hematopoiesis. Blood 137 (10), 1327–1339. 10.1182/blood.2020005780
Qiu X., Hill A., Packer J., Lin D., Ma Y.-A., Trapnell C. (2017b). Single-cell mRNA quantification and differential analysis with Census. Nat. Methods 14 (3), 309–315. 10.1038/nmeth.4150
Qiu X., Mao Q., Tang Y., Wang L., Chawla R., Pliner H. A., et al. (2017a). Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14 (10), 979–982. 10.1038/nmeth.4402
Raghav P. K., Singh A. K., Gangenahalli G. (2018). Stem cell factor and NSC87877 combine to enhance c-Kit mediated proliferation of human megakaryoblastic cells. PLoS ONE 13 (11), e0206364. 10.1371/journal.pone.0206364
Risso D., Perraudeau F., Gribkova S., Dudoit S., Vert J.-P. (2018). A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9 (1), 284. 10.1038/s41467-017-02554-5
Ritchie M. E., Phipson B., Wu D., Hu Y., Law C. W., Shi W., et al. (2015). Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43 (7), e47. 10.1093/nar/gkv007
Robinson M. D., McCarthy D. J., Smyth G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26 (1), 139–140. 10.1093/bioinformatics/btp616
Rodriguez-Fraticelli A. E., Weinreb C., Wang S.-W., Migueles R. P., Jankovic M., Usart M., et al. (2020). Single-cell lineage tracing unveils a role for TCF15 in haematopoiesis. Nature 583 (7817), 585–589. 10.1038/s41586-020-2503-6
Rodriguez-Fraticelli A. E., Wolock S. L., Weinreb C. S., Panero R., Patel S. H., Jankovic M., et al. (2018). Clonal analysis of lineage fate in native haematopoiesis. Nature 553 (7687), 212–216. 10.1038/nature25168
Rothenberg E. V. (2021). Single-cell insights into the hematopoietic generation of T-lymphocyte precursors in mouse and human. Exp. Hematol. 95, 1–12. 10.1016/j.exphem.2020.12.005
Saint-Antoine M. M., Singh A. (2020). Network inference in systems biology: recent developments, challenges, and applications. Curr. Opin. Biotechnol. 63, 89–98. 10.1016/j.copbio.2019.12.002
Satija R., Farrell J. A., Gennert D., Schier A. F., Regev A. (2015). Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33 (5), 495–502. 10.1038/nbt.3192
Kramer O. (2016). “Scikit-Learn,” in Machine learning for evolution strategies (Cham: Springer International Publishing), 45–53.
Serina Secanechia Y. N., Bergiers I., Rogon M., Arnold C., Descostes N., Le S., et al. (2022). Identifying a novel role for the master regulator Tal1 in the endothelial to hematopoietic transition. Sci. Rep. 12 (1), 16974. 10.1038/s41598-022-20906-0
Shannon P., Markiel A., Ozier O., Baliga N. S., Wang J. T., Ramage D., et al. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13 (11), 2498–2504. 10.1101/gr.1239303
Shin J., Berg D. A., Zhu Y., Shin J. Y., Song J., Bonaguidi M. A., et al. (2015). Single-cell RNA-seq with Waterfall reveals molecular cascades underlying adult neurogenesis. Cell Stem Cell 17 (3), 360–372. 10.1016/j.stem.2015.07.013
Skulimowska I., Sosniak J., Gonka M., Szade A., Jozkowicz A., Szade K. (2022). The biology of hematopoietic stem cells and its clinical implications. FEBS J. 289 (24), 7740–7759. 10.1111/febs.16192
Smith T., Heger A., Sudbery I. (2017). UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 27 (3), 491–499. 10.1101/gr.209601.116
Street K., Risso D., Fletcher R. B., Das D., Ngai J., Yosef N., et al. (2018). Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19 (1), 477. 10.1186/s12864-018-4772-0
Stuart T., Butler A., Hoffman P., Hafemeister C., Papalexi E., Mauck W. M., et al. (2019). Comprehensive integration of single-cell data. Cell 177 (7), 1888–1902.e21. 10.1016/j.cell.2019.05.031
Subramanian A., Tamayo P., Mootha V. K., Mukherjee S., Ebert B. L., Gillette M. A., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 102 (43), 15545–15550. 10.1073/pnas.0506580102
Subramanian S., Thoms J. A. I., Huang Y., Cornejo-Páramo P., Koch F. C., Jacquelin S., et al. (2023). Genome-wide transcription factor-binding maps reveal cell-specific changes in the regulatory architecture of human HSPCs. Blood 142 (17), 1448–1462. 10.1182/blood.2023021120
Townes F. W., Hicks S. C., Aryee M. J., Irizarry R. A. (2019). Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol. 20 (1), 295. 10.1186/s13059-019-1861-6
Trapnell C., Cacchiarelli D., Grimsby J., Pokharel P., Li S., Morse M., et al. (2014). The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32 (4), 381–386. 10.1038/nbt.2859
Tricot G., Cottler-Fox M. H., Calandra G. (2010). Safety and efficacy assessment of plerixafor in patients with multiple myeloma proven or predicted to be poor mobilizers, including assessment of tumor cell mobilization. Bone Marrow Transplant. 45 (1), 63–68. 10.1038/bmt.2009.130
Tusi B. K., Wolock S. L., Weinreb C., Hwang Y., Hidalgo D., Zilionis R., et al. (2018). Population snapshots predict early haematopoietic and erythroid hierarchies. Nature 555 (7694), 54–60. 10.1038/nature25741
Vanuytsel K., Villacorta-Martin C., Lindstrom-Vautrin J., Wang Z., Garcia-Beltran W. F., Vrbanac V., et al. (2022). Multi-modal profiling of human fetal liver hematopoietic stem cells reveals the molecular signature of engraftment. Nat. Commun. 13 (1), 1103. 10.1038/s41467-022-28616-x
Velten L., Haas S. F., Raffel S., Blaszkiewicz S., Islam S., Hennig B. P., et al. (2017). Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19 (4), 271–281. 10.1038/ncb3493
Wang L., Wang S., Li W. (2012). RSeQC: quality control of RNA-seq experiments. Bioinformatics 28 (16), 2184–2185. 10.1093/bioinformatics/bts356
Wang S., Han J., Huang J., Islam K., Shi Y., Zhou Y., et al. (2024). Deep learning-based predictive classification of functional subpopulations of hematopoietic stem cells and multipotent progenitors. Stem Cell Res. Ther. 15 (1), 74. 10.1186/s13287-024-03682-8
Wang T., Li B., Nelson C. E., Nabavi S. (2019). Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. BMC Bioinforma. 20 (1), 40. 10.1186/s12859-019-2599-6
Weissman I. L., Shizuru J. A. (2008). The origins of the identification and isolation of hematopoietic stem cells, and their capability to induce donor-specific transplantation tolerance and treat autoimmune diseases. Blood 112 (9), 3543–3553. 10.1182/blood-2008-08-078220
Wilson N. K., Kent D. G., Buettner F., Shehata M., Macaulay I. C., Calero-Nieto F. J., et al. (2015). Combined single-cell functional and gene expression analysis resolves heterogeneity within stem cell populations. Cell Stem Cell 16 (6), 712–724. 10.1016/j.stem.2015.04.004
Wilson N. K., Schoenfelder S., Hannah R., Sánchez Castillo M., Schütte J., Ladopoulos V., et al. (2016). Integrated genome-scale analysis of the transcriptional regulatory landscape in a blood stem/progenitor cell model. Blood 127 (13), e12–e23. 10.1182/blood-2015-10-677393
Wolf F. A., Angerer P., Theis F. J. (2018). SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19 (1), 15. 10.1186/s13059-017-1382-0
Xia B., Zhao D., Wang G., Zhang M., Lv J., Tomoiaga A. S., et al. (2020). Machine learning uncovers cell identity regulator by histone code. Nat. Commun. 11 (1), 2696. 10.1038/s41467-020-16539-4
Xiang G., Keller C. A., Heuston E., Giardine B. M., An L., Wixom A. Q., et al. (2020). An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis. Genome Res. 30 (3), 472–484. 10.1101/gr.255760.119
Xie H., Xu J., Hsu J. H., Nguyen M., Fujiwara Y., Peng C., et al. (2014). Polycomb repressive complex 2 regulates normal hematopoietic stem cell function in a developmental-stage-specific manner. Cell Stem Cell 14 (1), 68–80. 10.1016/j.stem.2013.10.001
Xiong Z., Cui Y., Liu Z., Zhao Y., Hu M., Hu J. (2020). Evaluating explorative prediction power of machine learning algorithms for materials discovery using k-fold forward cross-validation. Comput. Mater. Sci. 171, 109203. 10.1016/j.commatsci.2019.109203
Xu S., Grullon S., Ge K., Peng W. (2014). Spatial clustering for identification of ChIP-enriched regions (SICER) to map regions of histone methylation patterns in embryonic stem cells. Methods Mol. Biol. 1150, 97–111. 10.1007/978-1-4939-0512-6_5
Xu S., Hu E., Cai Y., Xie Z., Luo X., Zhan L., et al. (2024). Using clusterProfiler to characterize multiomics data. Nat. Protoc. 19 (11), 3292–3320. 10.1038/s41596-024-01020-z
Yu G., Wang L.-G., Han Y., He Q.-Y. (2012). clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16 (5), 284–287. 10.1089/omi.2011.0118
Yu G., Wang L.-G., He Q.-Y. (2015). ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31 (14), 2382–2383. 10.1093/bioinformatics/btv145
Zhang A., Wei Y., Shi Y., Deng X., Gao J., Feng Y., et al. (2021). Profiling of H3K4me3 and H3K27me3 and their roles in gene subfunctionalization in allotetraploid cotton. Front. Plant Sci. 12, 761059. 10.3389/fpls.2021.761059
Zhang A. W., O’Flanagan C., Chavez E. A., Lim J. L. P., Ceglia N., McPherson A., et al. (2019). Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling. Nat. Methods 16 (10), 1007–1015. 10.1038/s41592-019-0529-1
Zhang Y., Liu T., Meyer C. A., Eeckhoute J., Johnson D. S., Bernstein B. E., et al. (2008). Model-based analysis of ChIP-seq (MACS). Genome Biol. 9 (9), R137. 10.1186/gb-2008-9-9-r137
Zheng G. X. Y., Terry J. M., Belgrader P., Ryvkin P., Bent Z. W., Wilson R., et al. (2017). Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049. 10.1038/ncomms14049


[B65] Lundberg S. M., Tu W. B., Raught B., Penn L. Z., Hoffman M. M., Lee S.-I. (2016). ChromNet: learning the human chromatin network from all ENCODE ChIP-seq data. Genome Biol. 17, 82. 10.1186/s13059-016-0925-0 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B66] Marx-Blümel L., Marx C., Sonnemann J., Weise F., Hampl J., Frey J., et al. (2021). Molecular characterization of hematopoietic stem cells after in vitro amplification on biomimetic 3D PDMS cell culture scaffolds. Sci. Rep. 11 (1), 21163. 10.1038/s41598-021-00619-6 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B67] Macosko E. Z., Basu A., Satija R., Nemesh J., Shekhar K., Goldman M., et al. (2015). Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161 (5), 1202–1214. 10.1016/j.cell.2015.05.002 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B68] Margolin A. A., Nemenman I., Basso K., Wiggins C., Stolovitzky G., Dalla F. R., et al. (2006). ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinforma. 7 (Suppl. 1), S7. 10.1186/1471-2105-7-S1-S7 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B69] McCarthy D. J., Campbell K. R., Lun A. T. L., Wills Q. F. (2017). Scater: preprocessing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33 (8), 1179–1186. 10.1093/bioinformatics/btw777 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B70] McLean C. Y., Bristor D., Hiller M., Clarke S. L., Schaar B. T., Lowe C. B., et al. (2010). GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28 (5), 495–501. 10.1038/nbt.1630 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B71] Mercatelli D., Scalambra L., Triboli L., Ray F., Giorgi F. M. (2020). Gene regulatory network inference resources: a practical overview. Biochim. Biophys. Acta Gene Regul. Mech. 1863 (6), 194430. 10.1016/j.bbagrm.2019.194430 [DOI] [PubMed] [Google Scholar]

[B72] Moignard V., Macaulay I. C., Swiers G., Buettner F., Schütte J., Calero-Nieto F. J., et al. (2013). Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nat. Cell Biol. 15 (4), 363–372. 10.1038/ncb2709 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B73] Moignard V., Woodhouse S., Haghverdi L., Lilly A. J., Tanaka Y., Wilkinson A. C., et al. (2015). Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nat. Biotechnol. 33 (3), 269–276. 10.1038/nbt.3154 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B74] Negrin R. S., Atkinson K., Leemhuis T., Hanania E., Juttner C., Tierney K., et al. (2000). Transplantation of highly purified CD34+Thy-1+ hematopoietic stem cells in patients with metastatic breast cancer. Biol. Blood Marrow Transpl. 6 (3), 262–271. 10.1016/s1083-8791(00)70008-5 [DOI] [PubMed] [Google Scholar]

[B75] Ng A. P., Alexander W. S. (2017). Haematopoietic stem cells: past, present and future. Cell Death Discov. 3, 17002. 10.1038/cddiscovery.2017.2 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B76] Olsson A., Venkatasubramanian M., Chaudhri V. K., Aronow B. J., Salomonis N., Singh H., et al. (2016). Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537 (7622), 698–702. 10.1038/nature19348 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B77] Qi Q., Cheng L., Tang X., He Y., Li Y., Yee T., et al. (2021). Dynamic CTCF binding directly mediates interactions among cis-regulatory elements essential for hematopoiesis. Blood 137 (10), 1327–1339. 10.1182/blood.2020005780 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B78] Qiu X., Hill A., Packer J., Lin D., Ma Y.-A., Trapnell C. (2017b). Single-cell mRNA quantification and differential analysis with Census. Nat. Methods 14 (3), 309–315. 10.1038/nmeth.4150 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B79] Qiu X., Mao Q., Tang Y., Wang L., Chawla R., Pliner H. A., et al. (2017a). Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14 (10), 979–982. 10.1038/nmeth.4402 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B80] Raghav P. K., Singh A. K., Gangenahalli G. (2018). Stem cell factor and NSC87877 combine to enhance c-Kit mediated proliferation of human megakaryoblastic cells. PLoS ONE 13 (11), e0206364. 10.1371/journal.pone.0206364 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B81] Risso D., Perraudeau F., Gribkova S., Dudoit S., Vert J.-P. (2018). A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9 (1), 284. 10.1038/s41467-017-02554-5 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B82] Ritchie M. E., Phipson B., Wu D., Hu Y., Law C. W., Shi W., et al. (2015). Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43 (7), e47. 10.1093/nar/gkv007 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B83] Robinson M. D., McCarthy D. J., Smyth G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26 (1), 139–140. 10.1093/bioinformatics/btp616 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B84] Rodriguez-Fraticelli A. E., Weinreb C., Wang S.-W., Migueles R. P., Jankovic M., Usart M., et al. (2020). Single-cell lineage tracing unveils a role for TCF15 in haematopoiesis. Nature 583 (7817), 585–589. 10.1038/s41586-020-2503-6 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B85] Rodriguez-Fraticelli A. E., Wolock S. L., Weinreb C. S., Panero R., Patel S. H., Jankovic M., et al. (2018). Clonal analysis of lineage fate in native haematopoiesis. Nature 553 (7687), 212–216. 10.1038/nature25168 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B86] Rothenberg E. V. (2021). Single-cell insights into the hematopoietic generation of T-lymphocyte precursors in mouse and human. Exp. Hematol. 95, 1–12. 10.1016/j.exphem.2020.12.005 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B87] Saint-Antoine M. M., Singh A. (2020). Network inference in systems biology: recent developments, challenges, and applications. Curr. Opin. Biotechnol. 63, 89–98. 10.1016/j.copbio.2019.12.002 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B88] Satija R., Farrell J. A., Gennert D., Schier A. F., Regev A. (2015). Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33 (5), 495–502. 10.1038/nbt.3192 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B89] Scikit-Learn K. O. (2016). Machine learning for evolution strategies. Cham: Springer International Publishing, 45–53. [Google Scholar]

[B90] Serina Secanechia Y. N., Bergiers I., Rogon M., Arnold C., Descostes N., Le S., et al. (2022). Identifying a novel role for the master regulator Tal1 in the endothelial to hematopoietic transition. Sci. Rep. 12 (1), 16974. 10.1038/s41598-022-20906-0 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B91] Shannon P., Markiel A., Ozier O., Baliga N. S., Wang J. T., Ramage D., et al. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13 (11), 2498–2504. 10.1101/gr.1239303 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B92] Shin J., Berg D. A., Zhu Y., Shin J. Y., Song J., Bonaguidi M. A., et al. (2015). Single-cell RNA-seq with Waterfall reveals molecular cascades underlying adult neurogenesis. Cell Stem Cell 17 (3), 360–372. 10.1016/j.stem.2015.07.013 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B93] Skulimowska I., Sosniak J., Gonka M., Szade A., Jozkowicz A., Szade K. (2022). The biology of hematopoietic stem cells and its clinical implications. FEBS J. 289 (24), 7740–7759. 10.1111/febs.16192 [DOI] [PubMed] [Google Scholar]

[B94] Smith T., Heger A., Sudbery I. (2017). UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 27 (3), 491–499. 10.1101/gr.209601.116 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B95] Street K., Risso D., Fletcher R. B., Das D., Ngai J., Yosef N., et al. (2018). Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19 (1), 477. 10.1186/s12864-018-4772-0 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B96] Stuart T., Butler A., Hoffman P., Hafemeister C., Papalexi E., Mauck W. M., et al. (2019). Comprehensive integration of single-cell data. Cell 177 (7), 1888–1902.e21. 10.1016/j.cell.2019.05.031 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B97] Subramanian A., Tamayo P., Mootha V. K., Mukherjee S., Ebert B. L., Gillette M. A., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 102 (43), 15545–15550. 10.1073/pnas.0506580102 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B98] Subramanian S., Thoms J. A. I., Huang Y., Cornejo-Páramo P., Koch F. C., Jacquelin S., et al. (2023). Genome-wide transcription factor-binding maps reveal cell-specific changes in the regulatory architecture of human HSPCs. Blood 142 (17), 1448–1462. 10.1182/blood.2023021120 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B99] Townes F. W., Hicks S. C., Aryee M. J., Irizarry R. A. (2019). Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol. 20 (1), 295. 10.1186/s13059-019-1861-6 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B100] Trapnell C., Cacchiarelli D., Grimsby J., Pokharel P., Li S., Morse M., et al. (2014). The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32 (4), 381–386. 10.1038/nbt.2859 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B123] Tricot G., Cottler-Fox M. H., Calandra G. (2010). Safety and efficacy assessment of plerixafor in patients with multiple myeloma proven or predicted to be poor mobilizers, including assessment of tumor cell mobilization. Bone Marrow Transplant. 45 (1), 63–68. 10.1038/bmt.2009.130 [DOI] [PubMed] [Google Scholar]

[B101] Tusi B. K., Wolock S. L., Weinreb C., Hwang Y., Hidalgo D., Zilionis R., et al. (2018). Population snapshots predict early haematopoietic and erythroid hierarchies. Nature 555 (7694), 54–60. 10.1038/nature25741 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B102] Vanuytsel K., Villacorta-Martin C., Lindstrom-Vautrin J., Wang Z., Garcia-Beltran W. F., Vrbanac V., et al. (2022). Multi-modal profiling of human fetal liver hematopoietic stem cells reveals the molecular signature of engraftment. Nat. Commun. 13 (1), 1103. 10.1038/s41467-022-28616-x [DOI] [PMC free article] [PubMed] [Google Scholar]

[B103] Velten L., Haas S. F., Raffel S., Blaszkiewicz S., Islam S., Hennig B. P., et al. (2017). Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19 (4), 271–281. 10.1038/ncb3493 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B104] Wang L., Wang S., Li W. (2012). RSeQC: quality control of RNA-seq experiments. Bioinformatics 28 (16), 2184–2185. 10.1093/bioinformatics/bts356 [DOI] [PubMed] [Google Scholar]

[B105] Wang S., Han J., Huang J., Islam K., Shi Y., Zhou Y., et al. (2024). Deep learning-based predictive classification of functional subpopulations of hematopoietic stem cells and multipotent progenitors. Stem Cell Res. Ther. 15 (1), 74. 10.1186/s13287-024-03682-8 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B106] Wang T., Li B., Nelson C. E., Nabavi S. (2019). Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. BMC Bioinforma. 20 (1), 40. 10.1186/s12859-019-2599-6 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B107] Weissman I. L., Shizuru J. A. (2008). The origins of the identification and isolation of hematopoietic stem cells, and their capability to induce donor-specific transplantation tolerance and treat autoimmune diseases. Blood 112 (9), 3543–3553. 10.1182/blood-2008-08-078220 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B108] Wilson N. K., Kent D. G., Buettner F., Shehata M., Macaulay I. C., Calero-Nieto F. J., et al. (2015). Combined single-cell functional and gene expression analysis resolves heterogeneity within stem cell populations. Cell Stem Cell 16 (6), 712–724. 10.1016/j.stem.2015.04.004 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B109] Wilson N. K., Schoenfelder S., Hannah R., Sánchez Castillo M., Schütte J., Ladopoulos V., et al. (2016). Integrated genome-scale analysis of the transcriptional regulatory landscape in a blood stem/progenitor cell model. Blood 127 (13), e12–e23. 10.1182/blood-2015-10-677393 [DOI] [PubMed] [Google Scholar]

[B110] Wolf F. A., Angerer P., Theis F. J. (2018). SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19 (1), 15. 10.1186/s13059-017-1382-0 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B111] Xia B., Zhao D., Wang G., Zhang M., Lv J., Tomoiaga A. S., et al. (2020). Machine learning uncovers cell identity regulator by histone code. Nat. Commun. 11 (1), 2696. 10.1038/s41467-020-16539-4 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B112] Xiang G., Keller C. A., Heuston E., Giardine B. M., An L., Wixom A. Q., et al. (2020). An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis. Genome Res. 30 (3), 472–484. 10.1101/gr.255760.119 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B113] Xie H., Xu J., Hsu J. H., Nguyen M., Fujiwara Y., Peng C., et al. (2014). Polycomb repressive complex 2 regulates normal hematopoietic stem cell function in a developmental-stage-specific manner. Cell Stem Cell 14 (1), 68–80. 10.1016/j.stem.2013.10.001 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B114] Xiong Z., Cui Y., Liu Z., Zhao Y., Hu M., Hu J. (2020). Evaluating explorative prediction power of machine learning algorithms for materials discovery using-fold forward cross-validation. Comp. Mater Sci. 171, 109203. 10.1016/j.commatsci.2019.109203 [DOI] [Google Scholar]

[B115] Xu S., Grullon S., Ge K., Peng W. (2014). Spatial clustering for identification of ChIP-enriched regions (SICER) to map regions of histone methylation patterns in embryonic stem cells. Methods Mol. Biol. 1150, 97–111. 10.1007/978-1-4939-0512-6_5 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B116] Xu S., Hu E., Cai Y., Xie Z., Luo X., Zhan L., et al. (2024). Using clusterProfiler to characterize multiomics data. Nat. Protoc. 19 (11), 3292–3320. 10.1038/s41596-024-01020-z [DOI] [PubMed] [Google Scholar]

[B117] Yu G., Wang L.-G., Han Y., He Q.-Y. (2012). clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16 (5), 284–287. 10.1089/omi.2011.0118 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B118] Yu G., Wang L.-G., He Q.-Y. (2015). ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31 (14), 2382–2383. 10.1093/bioinformatics/btv145 [DOI] [PubMed] [Google Scholar]

[B119] Zhang A., Wei Y., Shi Y., Deng X., Gao J., Feng Y., et al. (2021). Profiling of h3k4me3 and h3k27me3 and their roles in gene subfunctionalization in allotetraploid cotton. Front. Plant Sci. 12, 761059. 10.3389/fpls.2021.761059 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B120] Zhang A. W., O’Flanagan C., Chavez E. A., Lim J. L. P., Ceglia N., McPherson A., et al. (2019). Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling. Nat. Methods 16 (10), 1007–1015. 10.1038/s41592-019-0529-1 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B121] Zhang Y., Liu T., Meyer C. A., Eeckhoute J., Johnson D. S., Bernstein B. E., et al. (2008). Model-based analysis of ChIP-seq (MACS). Genome Biol. 9 (9), R137. 10.1186/gb-2008-9-9-r137 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B122] Zheng G. X. Y., Terry J. M., Belgrader P., Ryvkin P., Bent Z. W., Wilson R., et al. (2017). Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049. 10.1038/ncomms14049 [DOI] [PMC free article] [PubMed] [Google Scholar]
