Abstract
Motivation
Understanding cell fate determination is crucial in developmental biology and regenerative medicine. Although theoretical frameworks such as the epigenetic landscape and gene regulatory networks have been proposed for decades, traditional studies have often been limited by population-averaging and low-throughput techniques, which obscure the heterogeneity of individual cells and fail to provide a systematic view of cell fate control. Recent advances in single-cell technologies have provided unprecedented resolution, revealing the complexity of cell fate decisions and driving the need for more sophisticated computational methods.
Results
In this review, we first emphasize experimental advances, such as single-cell multi-omics, lineage tracing, and perturbation techniques, which produce novel data modalities and enable dynamic tracking of cell fate transitions. We then discuss the modeling paradigms for cell fate studies, assess the role of emerging AI tools in perturbation modeling, and consider the potential of single-cell and spatial foundation models. Additionally, we highlight several case studies on predicting and manipulating cell fates, and outline key challenges and future directions of the field.
Availability and implementation
This work generates no new software.
1 Introduction
Cell fate determination refers to the process by which individual cells commit to specific functional roles within a multicellular organism, often proceeding through a series of transient or intermediate states (Moris et al. 2016, Lee et al. 2024). Although this process exhibits intrinsic stochasticity in cellular activities such as gene expression (Losick and Desplan 2008), it nonetheless maintains a high degree of precision. This precision is maintained through the orchestration of tightly regulated signaling pathways and gene regulatory networks, ensuring consistent and evolutionarily conserved outcomes across diverse biological contexts (MacNeil and Walhout 2011).
Significant historical observations have highlighted both the specificity and plasticity of cell fate determination. Spemann and Mangold’s identification of the “organizer” in 1924 marked the first demonstration of inductive signaling in fate determination (Spemann et al. 2024). John Gurdon’s demonstration of cell identity plasticity (Gurdon 1962) and Harold Weintraub’s conversion of fibroblasts into myoblasts (Davis et al. 1987) further established the feasibility of cell reprogramming. This field reached a watershed moment when Yamanaka and colleagues reprogrammed somatic cells to induced pluripotent stem cells (iPSCs) in 2006 (Takahashi and Yamanaka 2006). Despite these foundational discoveries and subsequent advances in various model systems, the systems-level principles that govern cell fate determination remain incompletely understood (De Belly et al. 2022).
To elucidate these mechanisms and predict cell fate determination, substantial theoretical frameworks have been developed. Classical concepts such as Waddington’s epigenetic landscape provide an intuitive metaphor for depicting developmental trajectories, while gene regulatory networks provide a foundation for reaction dynamics-based modeling. Within this framework, numerous models have been proposed, including the mutual antagonistic transcription factor network for myeloid progenitor differentiation (Doré and Crispino 2011) and the seesaw model for stem cell lineage specification (Shu et al. 2013). Recent progress has been driven by data-centric techniques enabling the acquisition and analysis of large amounts of single-cell omics data (Tanay and Regev 2017). Advanced sequencing and manipulation techniques now allow detailed profiling of cellular states, capturing both temporal dynamics (Chen et al. 2022) and perturbation responses (Peidli et al. 2024). Complementing these experimental advances, sophisticated computational algorithms for cell state classification (Pasquini et al. 2021), trajectory reconstruction (Kester and Van Oudenaarden 2018), and perturbation-response predictions (Ji et al. 2021) have emerged, enabling comprehensive analysis of high-dimensional data and even the generation of new data through generative modeling (Rivero-Garcia et al. 2024) for tackling a wide range of questions related to cell fate control.
In this review, we first revisit classical frameworks such as the epigenetic landscape and gene regulatory networks. We then highlight recent progress in single-cell omics and perturbation technologies, followed by computational methods—including AI-driven approaches—for modeling and predicting cell fate. Finally, we discuss emerging applications and future directions bridging data and biology in this rapidly evolving field. By integrating computational and experimental perspectives, this review aims to illuminate the evolving paradigm of cell fate studies and highlight its potential for advancing both basic research and medical applications.
2 Theoretical depiction of cell fates
2.1 Epigenetic landscape
Over half a century ago, Conrad Waddington introduced the concept of the “epigenetic landscape” (Fig. 1Aa) (Waddington 1942, 2014). Waddington suggested that the landscape’s formation is governed by underlying gene interactions, where shifts in the regulatory network can alter its contours. Since its inception, this theoretical framework has been continuously refined through concepts from dynamic systems (Kauffman 1969, Ferrell 2012), including attractor and bifurcation theory, providing a mathematical foundation that has evolved into a multidimensional phase space characterized by cell state features (Huang 2012) (Fig. 1Ab).
Figure 1.
Theoretical depiction of cell fates. (A) Epigenetic landscape. Aa: Waddington’s metaphor. Cells are like stones rolling down a branching hillside, with “valleys” corresponding to distinct cell states. Ab: Mathematical representation of epigenetic landscape. Cell states can be characterized by a single-cell matrix, which turns the epigenetic landscape into a multidimensional phase space. In a three-dimensional projection, each cell fate corresponds to an attractor with a potential denoting its transition probability. Fate transitions can then be conceptualized as cells moving between attractors. (B) Gene regulatory network (GRN). Ba: A schematic of GRN. A GRN depicts genes and proteins as nodes, with edges indicating their regulatory relationships. Bb: Network motifs. Transcription networks are organized through various network motifs of transcription factors and genes, such as autoregulation, cascade, feedforward loops, feedback loops, and single-input modules (Shoval and Alon 2010). Bc: Mathematical representation of GRN. At time $t$, a cell’s state can be described as a vector $S(t)$. The GRN, represented by $F$, determines how that state evolves over time: $S(t+1) = F(S(t))$. This network dynamics can be modeled using ODEs or Boolean approaches. ODE, ordinary differential equation.
In the modern view of the epigenetic landscape, stable cell fates represent attractors within the cell state space. Fate transitions can be conceptualized either as cells moving between attractors driven by perturbations, or as reshaping of the landscape itself, where perturbations “lift” or “lower” specific attractors, or even create new ones (Enver et al. 2009). The latter scenario is particularly relevant in cancer cells, where new attractors and thus increased attractor variability correspond to heightened plasticity and heterogeneity (Feinberg and Levchenko 2023). Perturbations driving transitions can arise from cellular signals or stochastic fluctuations, and the transitional potential of each fate can be quantitatively approximated using steady-state probabilities (Wang et al. 2011) or by decomposing the quasi-potential landscape through vector field analysis (Wang et al. 2010, Huang 2012).
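The steady-state approach mentioned above can be illustrated with a minimal sketch. It computes a quasi-potential $U_i = -\ln \pi_i$ from the stationary distribution $\pi$ of a small discrete-state Markov chain. The three states (two stable fates A and B, one transient intermediate T) and all transition probabilities are hypothetical values chosen for illustration, not taken from the cited studies, and a discrete chain is a deliberate simplification of a continuous landscape.

```python
import math

def stationary_distribution(P, n_iter=10_000, tol=1e-12):
    """Power iteration for the stationary distribution of a row-stochastic matrix P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if converged:
            break
    return pi

def quasi_potential(pi):
    """Quasi-potential U_i = -ln(pi_i): probable states sit in deep valleys."""
    return [-math.log(p) for p in pi]

# Hypothetical per-step transition probabilities (rows: from A, T, B).
P = [
    [0.98, 0.02, 0.00],  # A mostly stays in A, occasionally enters T
    [0.30, 0.40, 0.30],  # T is short-lived and resolves to A or B
    [0.00, 0.01, 0.99],  # B is the most stable fate
]

pi = stationary_distribution(P)
U = quasi_potential(pi)
for name, p, u in zip("ATB", pi, U):
    print(f"state {name}: steady-state probability {p:.4f}, quasi-potential {u:.3f}")
```

In this toy system the transient state has the lowest steady-state probability and therefore the highest quasi-potential, corresponding to a ridge between the two valleys.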
2.2 Gene regulatory network
To mechanistically decode cell fate decisions, gene regulatory networks (GRNs) offer a powerful framework for modeling how genes and molecular regulators interact to orchestrate cell fate. These networks establish and sustain functional tissues by driving sequential and largely irreversible gene expression patterns leading to specific lineage differentiation (Levine and Davidson 2005).
GRNs are composed of interacting genes, proteins, RNAs and metabolites that participate in the regulatory processes of interest. Among the different types of GRNs, transcriptional networks are especially pivotal, as transcription factors (TFs) serve as key regulators of gene expression during cell fate determination (Spitz and Furlong 2012). Transcription networks are hierarchically organized and highly modular (MacNeil and Walhout 2011), with TFs and their target genes interacting through specific network motifs (Basso et al. 2005, Harman et al. 2021) (Fig. 1B). These motifs enable precise control over gene expression, allowing cells to integrate diverse signals and adaptively prime for fate decisions. A representative example is the GATA6–NANOG network governing fate decisions between epiblast and primitive endoderm during early mouse development (Fujikura et al. 2002, Bessonnard et al. 2014). Beyond transcriptional control, post-transcriptional regulation forms another crucial layer, encompassing (i) RNA-binding proteins that modulate splicing, polyadenylation, and mRNA localization (Hentze et al. 2018), (ii) microRNAs and long non-coding RNAs that fine-tune transcript abundance and translation (Bartel 2004, Statello et al. 2021), and (iii) epitranscriptomic modifications (Roundtree et al. 2017). For example, N6-methyladenosine (m6A) modification has been shown to control the balance between self-renewal and differentiation in embryonic stem cells (Geula et al. 2015) and embryonic neural stem cells (Wang et al. 2018b).
Integrating GRNs into the epigenetic landscape framework allows for mathematical representation of network function. A cell’s state at time $t$ can be generally described as a state vector $S(t) = (x_1(t), x_2(t), \ldots, x_n(t))$, where $x_i(t)$ represents the expression level of gene $i$. The state at the next time step is given by $S(t+1) = F(S(t))$, where the function $F$ is determined by the GRN (Lee et al. 2024). Over time, cellular states traverse the state space and evolve toward stable attractors in the landscape (Fig. 1Bc). For relatively small-scale GRNs, Boolean models or nonlinear ordinary differential equations (ODEs) are commonly used to model the network dynamics in either a discrete or continuous manner (Karlebach and Shamir 2008). When molecular copy numbers are low, master equations under the Markovian assumption can be used to describe the time evolution of the probabilities of $S(t)$, thereby capturing stochastic fluctuations at the single-molecule level (Feinberg and Levchenko 2023).
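To make the link between a small GRN and its attractors concrete, the following is a minimal sketch of a two-gene mutual-antagonism circuit modeled with nonlinear ODEs and integrated by forward Euler. The Hill-type repression term, the parameter values (maximal production rate `a = 4`, Hill coefficient `n = 2`, unit degradation), and the function names are illustrative assumptions, not parameters from any cited model.

```python
def hill_repression(regulator, a=4.0, n=2):
    """Production rate of a gene repressed by `regulator` (assumed Hill form)."""
    return a / (1.0 + regulator ** n)

def simulate_toggle(x0, y0, dt=0.01, steps=5000):
    """Forward-Euler integration of dx/dt = f(y) - x, dy/dt = f(x) - y."""
    x, y = x0, y0
    for _ in range(steps):
        dx = hill_repression(y) - x
        dy = hill_repression(x) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y

# Two nearby initial states on opposite sides of the symmetric (unstable) state
# relax to different attractors, i.e. different "fates".
fate_x = simulate_toggle(1.5, 1.0)  # slight initial excess of gene X
fate_y = simulate_toggle(1.0, 1.5)  # slight initial excess of gene Y
print("start (1.5, 1.0) ->", tuple(round(v, 2) for v in fate_x))
print("start (1.0, 1.5) ->", tuple(round(v, 2) for v in fate_y))
```

Under these assumed parameters the circuit is bistable: a small initial asymmetry is amplified until one gene is highly expressed and the other is repressed, which is the attractor picture described above in its simplest form.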
3 Single-cell techniques for investigating cell fates
To accurately capture cell fate attractors and transitions within the epigenetic landscape, experiments should gather extensive phenotypic information as well as dynamic changes at single-cell resolution (Shapiro et al. 2013). The emergence of single-cell technologies provides a more quantitative experimental foundation for understanding cell fate decisions, capturing cellular heterogeneity at unprecedented resolution (Fig. 2A) (Han et al. 2020).
Figure 2.
Single-cell techniques for investigating cell fates. (A) Overview of single-cell techniques for cell fate study. Recent advancements of multi-omics, lineage tracing and perturbation at the single-cell level offer a robust experimental framework for investigating cell fates. (B) Single-cell multi-omics. From genomics and epigenomics to metabolomics, these single-cell omics techniques capture cell states at different levels of the central dogma, providing a comprehensive characterization of static cell fates. (C) Various types of barcodes for single-cell lineage tracing. Ca: Static barcodes, including inserted random DNA sequences (Weinreb et al. 2020) and polylox barcodes based on the Cre-Lox system (Pei et al. 2017). sgRNA, single-guide RNA. Cb: Mutable barcodes, accumulating mutations in target array of the CRISPR-Cas system (Pacesa et al. 2024). Cc: Dual-channel barcodes, a combination of static and mutable barcode systems (He et al. 2022, Liao et al. 2024) or two orthogonal CRISPR-based barcoding systems (Cas9 and Cas12) (Chen et al. 2025a). (D) Single-cell perturbation techniques. Da: Single-cell perturbation workflow. For genetic or epigenetic perturbation, the CRISPR/Cas enzyme and guide RNA (gRNA) pools with guide barcode are delivered into a cell line, and perturbed cells are sequenced to capture gRNAs and assess gene disruption outcomes (Bock et al. 2022). For chemical perturbation, small molecules are introduced to single cells, providing experimentally tractable means to shift cell states and probe fate decisions (Gavriilidis et al. 2024). Db: Recent advances in single-cell perturbation, including multiple perturbations within single cells (Adamson et al. 2016, Dixit et al. 2016), integration of transcriptional or epigenetic alterations (Xie et al. 2017), multi-omics readouts [e.g. proteome (Mimitou et al. 2019, Frangieh et al. 2021, Papalexi et al. 2021, Wessels et al. 2023), chromatin accessibility (Rubin et al. 2019, Liscovitch-Brauer et al. 2021, Pierce, Granja and Greenleaf 2021)], spatial perturbation, and cell hashing methods (Mimitou et al. 2019) to enhance throughput and accuracy.
3.1 Multi-modal single-cell omics for comprehensive characterization of static cell fates
The complexity of gene regulatory networks governing cell fate requires genome-level quantification. Next-generation sequencing (NGS) (Margulies et al. 2005) has enabled cost-effective, high-throughput analysis of genetic mutations [WGS (Ng and Kirkness 2010)], protein-DNA interactions [ChIP-seq (Park 2009)], transcriptomic patterns [RNA-seq (Wang et al. 2009)], and chromatin states [ATAC-seq (Buenrostro et al. 2015)] that affect cell fate. Expanding these techniques to the single-cell level has significantly improved the resolution and accuracy of cell fate research. Single-cell RNA sequencing (scRNA-seq) has become a cornerstone of this field, effectively capturing cellular heterogeneity at the transcriptomic level (Kolodziejczyk et al. 2015, Tanay and Regev 2017). In addition, single-cell adaptations of classical chromatin-profiling and 3D-genome assays, such as sc-ChIP-seq (Rotem et al. 2015), sc-CUT&Tag (Kaya-Okur et al. 2019), sc-CUT&RUN (Womersley et al. 2025), and sc-Hi-C (Nagano et al. 2013), have enabled direct measurement of protein-DNA interactions, histone marks, as well as higher-order chromatin structure at cellular resolution. Recent technological advancements have further enabled comprehensive single-cell profiling of other biological layers, including chromatin accessibility (Lareau et al. 2019), DNA methylation (Luo et al. 2017, Chatterton et al. 2023), proteome (Specht et al. 2021, Ye et al. 2025), and metabolome (Yuan et al. 2021, Hu et al. 2023) (Fig. 2B).
Traditional sequencing-based omics lose spatial context, limiting insights into cell–cell interactions critical for fate decisions (Bressan et al. 2023). Emerging spatial omics technologies overcome this by preserving tissue coordinates at single-cell (or near single-cell) resolution (Codeluppi et al. 2018, Moffitt et al. 2018, Wang et al. 2018a), while measuring RNA, proteins, chromatin or multi-modal signals (Takei 2025, Takei et al. 2025). By mapping molecular phenotypes back onto three-dimensional tissue architecture, spatial data reveal how physical adjacency (Pelka et al. 2021, Kim et al. 2023), gradients (Liu et al. 2024, Sanchís-Calleja et al. 2024), and tissue architecture (Mo et al. 2024, Ding et al. 2025) bias cell-state transitions. Moreover, integrating spatial omics with lineage tracing (He et al. 2022, Xie et al. 2023) and perturbation (Dhainaut et al. 2022, Baysoy et al. 2024, Binan et al. 2025) methods links where a cell is located to the signals it receives and to how those signals change its fate, which is critical for understanding biological processes in which cell–cell signaling and position play important roles (Dhainaut et al. 2022, Liu et al. 2024, Pan et al. 2024).
Although integrated single-cell omics provide comprehensive characterization of cell states (Stuart et al. 2019, Welch et al. 2019), they typically require cell destruction for data extraction, offering only static snapshots (Weinreb et al. 2018).
3.2 Lineage tracing directly captures cell fate transitions at single-cell resolution
Reconstructing lineage relationships illuminates authentic cell fate decisions and state transitions, including progenitor-progeny relationships and cellular birth-death dynamics (Quinn et al. 2021, Yang et al. 2022). Combining lineage tracing with single-cell omics offers a powerful approach to overcome the limitations of static snapshots. These relationships can be traced using pre-genomic methods (e.g. direct observation, tracer dyes, transplantation) or, more recently, genetic barcode-based techniques (VanHorn and Morris 2021).
Two main experimental paradigms exist for establishing lineage relationships in the single-cell era. Retrospective tracing (“labeling-free” methods) leverages naturally accumulated genomic variations, such as single-nucleotide variations (SNVs) (The TRACERx Consortium et al. 2017) and copy number variations (CNVs) (Cai et al. 2014), in descendant cells to infer lineage relationships based on shared markers. This approach is readily applicable in systems where genetic manipulation is not feasible [e.g. single-cell RNA-seq of patient tumor samples (Wang et al. 2014)]. However, the low natural mutation rate limits the number of identifiable genetic features available for lineage tree reconstruction, thus constraining the resolution and accuracy of cell fate studies (Kester and Van Oudenaarden 2018).
In contrast, prospective tracing methods introduce unique genetic barcodes into cells and track their progeny, enabling direct readout of cell lineages. An example is the insertion of random DNA sequences into the genome, which serve as heritable markers detectable alongside scRNA-seq (Fig. 2Ca) (Weinreb et al. 2020). The most commonly used systems include Cre-Lox and CRISPR-based barcoding. Cre-Lox lineage tracing is well-suited for in vivo studies, where Cre recombinase acts on polylox cassettes to excise or invert DNA fragments between loxP sites, generating a large repertoire of recombinant barcodes for high-diversity clonal tracing (Pei et al. 2017). CRISPR-based barcoding, on the other hand, uses Cas9 or other gene editing tools to modify synthetic sequences containing CRISPR target sites (Fig. 2Cb) (Pacesa et al. 2024). As lineages expand, these mutable barcodes accumulate unique mutations, providing detailed insights into the hierarchical structure of cellular lineages. However, this approach can introduce artifacts due to barcode homoplasy and failures in barcode detection, leading to inaccurate lineage tree inferences (Wagner and Klein 2020). Conversely, static barcodes generated by other techniques provide relatively limited hierarchical information, necessitating additional experimental designs, such as sampling and sequencing cells at multiple time points (Weinreb et al. 2020) or implementing multiple rounds of genetic labeling (Biddy et al. 2018), often at the expense of data quality. To address these limitations, recent innovations have introduced dual-channel barcoding systems that either integrate static barcoding with mutable CRISPR-based barcoding systems (He et al. 2022, Liao et al. 2024) or combine two orthogonal CRISPR-based barcoding systems (Chen et al. 2025a), offering a more robust experimental foundation for lineage tree reconstructions (Fig. 2Cc).
Single-cell lineage tracing has been applied across a wide range of areas, revealing dynamic gene expression properties not captured by static snapshots, such as non-genetic heritable gene expression programs (Schiffman et al. 2024), diverse cell state transition trajectories (Harmange et al. 2023), and early developmental fate biases (Rodriguez-Fraticelli et al. 2018, Wang et al. 2022a). Furthermore, lineage tracing is increasingly integrated with single-cell multi-omics in vitro and in vivo (Li et al. 2023a, Jindal et al. 2024, Liu et al. 2025), and with spatial omics to capture cell location and identity (He et al. 2022, Koblan et al. 2025). However, sequencing-based single-cell lineage tracing is inherently destructive (Tang 2022), precluding repeated measurements from the same cell at different time points or simultaneous profiling of a cell and its progeny. The design of barcodes that can be jointly read out by imaging and sequencing, or the development of live-cell sequencing approaches (Chen et al. 2022, Mazelis et al. 2025), holds promise for overcoming these limitations.
3.3 Perturbation techniques enable cause-and-effect analysis at single-cell resolution
A fundamental challenge in single-cell omics is distinguishing true regulatory causality from correlation in high-dimensional datasets (Crow and Gillis 2018). Directly coupling perturbations with single-cell omics readouts offers a powerful solution (Bock et al. 2022), enabling high-dimensional phenotyping and revealing heterogeneous cell responses. Perturbations can be genetic, epigenetic, and chemical (Fig. 2Da).
For genetic and epigenetic perturbations, linking perturbation identity to single-cell profiles remains critical, typically achieved through paired barcoding (Adamson et al. 2016, Dixit et al. 2016, Xie et al. 2017), gRNA readout constructs (Datlinger et al. 2017), or gRNA-targeted capture (Mimitou et al. 2019, Replogle et al. 2020). Modern perturbation methods extend beyond simple genotype-phenotype mapping (Fig. 2Db). First, Perturb-seq and related methods enable multiple simultaneous perturbations within single cells, increasing screening efficiency and facilitating genetic studies (Adamson et al. 2016, Dixit et al. 2016). Second, CRISPR systems can be engineered to induce transcriptional or epigenetic alterations rather than just gene knockouts (Xie et al. 2017, Nuñez et al. 2021). Third, combining perturbations with single-cell multi-omics, such as CITE-seq for proteome measurements (Mimitou et al. 2019, Frangieh et al. 2021) and ATAC-seq for chromatin accessibility (Rubin et al. 2019, Liscovitch-Brauer et al. 2021), further extends the phenotype space. Moreover, combining perturbations with spatial omics causally maps cellular fate within intact tissue architecture (Dhainaut et al. 2022, Binan et al. 2025). Lastly, cell hashing can further enhance the throughput and dimensionality of these approaches, enabling simultaneous measurement of even more modalities (Mimitou et al. 2019).
Beyond genetic or epigenetic perturbations, chemical perturbations, including small molecules, cytokines or growth factors, and inhibitors or activators, provide experimentally tractable means to shift cell states and probe fate decisions (Gavriilidis et al. 2024). Chemical reagents are attractive because they (i) can be delivered transiently or repeatedly without genomic modification, (ii) are often dose-tunable with well-defined concentration–response behavior, and (iii) can be combined or sequenced to produce complex, time-dependent signaling histories that mimic development or therapy. Recent large-scale population compendia [e.g. LINCS/L1000 (Subramanian et al. 2017)] and single-cell multiplexing approaches [e.g. sci-Plex (Srivatsan et al. 2020), transient barcoding (Shin et al. 2019)] offer systematic and high-throughput surveys for multiple compounds, doses, and contexts while controlling for batch effects and reading out heterogeneous responses at single-cell resolution (Cui et al. 2024a, McFaline-Figueroa et al. 2024). However, it is crucial to recognize that the outcome of a chemical perturbation depends not only on which compound is added but also on when, at what dose, and for how long it is applied (Heemskerk et al. 2019, Teague et al. 2024).
Despite these advances, perturbation workflows face inherent limitations. Determining the precise pre-perturbation state of a cell remains a challenge. Reliable computational inference algorithms (Song et al. 2025) or integration with lineage tracing [e.g. in Perturb-seq workflows (Harmange et al. 2023)] are needed to fully track perturbation responses from initial to final state. Furthermore, comprehensive genome-wide screens can be limited by sequencing throughput, and selecting appropriate targets requires significant biological insight and prior knowledge (Bock et al. 2022).
4 Modeling paradigms for cell fate determination
In Section 2, we depicted cell fate within the cell state space in the framework of the epigenetic landscape. In the data-rich era, understanding cell fate involves three key steps: defining individual cell states, characterizing transitions between them, and identifying driving factors. We next outline leading approaches for each step across data types.
4.1 Constructing a cell state map
Before assigning a fate for each cell, it is crucial to define the cell state space based on the chosen data modality and the specific biological question. Here, cell states refer to the molecular configurations that cells occupy at a given point in time, while cell fates represent terminal outcomes, the functional roles that cells ultimately adopt. Cell fates are then mapped from cell states based on the premise that cells with similar gene expression profiles share the same state (Rafelski and Theriot 2024). In practice, unsupervised clustering methods, such as the Louvain and Leiden algorithms (Traag et al. 2019), are used to group heterogeneous single cells around attractors in gene expression space, with each cluster corresponding to a distinct cell state (Trapnell 2015). These clusters are subsequently annotated either manually, based on the expression of well-established marker genes, or automatically using annotation tools (Pasquini et al. 2021) (Fig. 3A). In the absence of definitive ground truth for cell types, clusters are typically interpreted as distinct cell identities, capturing unique molecular signatures or metabolic states (Wagner et al. 2016). It should be noted that some clusters may represent transitioning states, reflecting intermediate or bipotent identity, which should be interpreted with caution.
Figure 3.
Modeling paradigms for cell fate determination. (A) Constructing a cell fate map. scRNA-seq data are transformed into a cell fate map through feature selection (e.g. highly variable genes), dimensionality reduction and clustering. Each cluster corresponds to an attractor in the gene expression space and is annotated as a specific cell type or cell identity based on marker gene expression. Additional omics data, such as metabolomics, epigenomics, and spatial transcriptomics, can also be integrated for deeper functional insights. (B) Trajectory inference from single-cell transcriptomes. Most algorithms infer fate trajectories by ordering cells in pseudotime, a quantitative measure of biological progress (Trapnell et al. 2014). Other approaches, including population balance analysis (Tusi et al. 2018, Weinreb et al. 2018), optimal transport (Schiebinger et al. 2019, Sha et al. 2024), and RNA velocity (La Manno et al. 2018), provide complementary frameworks. (C) Leveraging single-cell lineage tracing. Ca: Lineage reconstruction. Mutable barcode–based lineage tracing requires reconstructing lineage trees from barcode data and/or single-cell transcriptomes. Cb: Clonal fate distribution analysis. Fate bias describes how cells within a particular lineage commit to a dominant fate, with statistical significance assessed against the global distribution (Li et al. 2023a, Weng et al. 2024). Visualizing clonal cells in a fate map projection (e.g. by UMAP) helps delineate fate boundaries and potentials. Hierarchical clustering based on clonal fate coupling metrics can then unveil fate hierarchies (Rodriguez-Fraticelli et al. 2018, Chan et al. 2019, Bowling et al. 2020, Weinreb and Klein 2020). *, significance marker. (D) Discovering mechanisms through GRNs. Da: Small-scale GRN interpretation. Mathematical models of small regulatory circuits comprising key TFs involved in cell fate determination are derived and experimentally validated. Db: Large-scale GRN inference. Advanced methods integrate TF binding data, scRNA-seq, and chromatin accessibility data (e.g. scATAC-seq) to infer cell-type/state specific GRNs. These inferred networks facilitate downstream analyses, including TF analysis (to identify master regulators via topological metrics) and in silico perturbations (to explore fate transitions).
Despite the widespread use of single-cell transcriptomic data to define cell fates, it is important to recognize that RNA expression alone captures only part of a cell’s phenotypic diversity. Alternative or integrative approaches to mapping cell fates—incorporating metabolomics (Rombouts et al. 2021), epigenomics (Liao et al. 2024, Zuo et al. 2024), or spatial information (Srivatsan et al. 2021, Li et al. 2024b)—may offer more comprehensive insights. For example, sci-Space applied to developing mouse embryos enabled the reconstruction of spatially resolved cell state trajectories and migratory patterns of differentiating neurons (Srivatsan et al. 2021). Moreover, to alleviate the “curse of dimensionality,” it is often necessary to perform feature selection (e.g. highly variable genes) and apply dimensionality reduction methods prior to clustering. However, each of these steps introduces additional parameter choices that bring in variability, potentially causing misclassification of cell states and the inadvertent omission of rare populations (Patterson-Cross et al. 2021).
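The three preprocessing steps discussed in this section (feature selection, dimensionality reduction, clustering) can be sketched on synthetic data as follows. This is a hedged illustration only: the data are simulated Gaussian values standing in for normalized expression, plain variance ranking stands in for dispersion-based selection of highly variable genes, and k-means with farthest-first initialization is substituted for the graph-based Louvain or Leiden clustering used in practice. Each parameter (`n_top`, `n_comp`, `k`) is exactly the kind of choice that the text above warns can change the result.

```python
import numpy as np

def select_hvg(X, n_top):
    """Indices of the n_top genes with highest variance (crude HVG stand-in)."""
    return np.argsort(X.var(axis=0))[::-1][:n_top]

def pca(X, n_comp):
    """Project centered data onto its first n_comp principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_comp] * S[:n_comp]

def kmeans(Z, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic data: two cell states that differ in the first 10 of 200 genes.
rng = np.random.default_rng(0)
n_cells, n_genes = 100, 200
truth = np.repeat([0, 1], n_cells)
X = rng.normal(0.0, 1.0, size=(2 * n_cells, n_genes))
X[truth == 1, :10] += 5.0

hvg = select_hvg(X, n_top=20)   # feature selection
Z = pca(X[:, hvg], n_comp=5)    # dimensionality reduction
labels = kmeans(Z, k=2)         # clustering

# Cluster labels are arbitrary, so score agreement up to a label swap.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with simulated states: {agreement:.2f}")
```

Rerunning the sketch with a much smaller `n_top` or a different `k` is a simple way to see how sensitive the resulting state map is to these settings.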
4.2 Trajectory inference from single-cell transcriptomes
Once cell fates have been defined, the subsequent step is to characterize the fate trajectory. A major class of methods for this purpose, known as trajectory inference (TI), posits that accurate prediction of fate trajectory is possible if an adequate number of cells in transitional states are captured (Kester and Van Oudenaarden 2018). The first category of methods assumes that cells with similar expression profiles are closely positioned along the continuum of fate transitions, an idea rooted in the concept of pseudotime, which was originally proposed by Monocle (Trapnell et al. 2014). Building on this premise, numerous algorithms arrange cells along a pseudotime axis from snapshot data according to various criteria (Kester and Van Oudenaarden 2018) (Fig. 3B). A second category of methods conceptualizes fate transitions as a stochastic process and leverages time-series scRNA-seq to reconstruct trajectories (Weinreb et al. 2018, Schiebinger et al. 2019, Yeo et al. 2021, Sha et al. 2024). Notably, Waddington-OT uses optimal transport to infer the temporal evolution of cell state distribution during cellular reprogramming and reveals a broader spectrum of developmental programs (Schiebinger et al. 2019). Beyond these approaches, RNA velocity (La Manno et al. 2018) offers another way to infer the direction and likelihood of cell state transitions by estimating the time derivative of gene expression—captured through the ratio of spliced to unspliced mRNA. This concept has been extended to account for transient cell states (Bergen et al. 2020) and continuous vector fields by incorporating metabolic labeling scRNA-seq (Qiu et al. 2022).
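The idea behind RNA velocity can be illustrated for a single gene under the steady-state model, in which spliced abundance $s$ changes as $ds/dt = u - \gamma s$ (with the splicing rate set to 1) and $\gamma$ is estimated from cells assumed to be at steady state. The sketch below is a simplified illustration: the toy counts are hypothetical, and the choice of the two lowest and two highest cells by spliced abundance as steady-state cells is an assumption made for brevity rather than the procedure of any specific package.

```python
def fit_gamma(unspliced, spliced, n_extreme=2):
    """Slope through the origin (u ~ gamma * s) fitted on the cells with the
    lowest and highest spliced abundance, assumed to be near steady state."""
    order = sorted(range(len(spliced)), key=lambda i: spliced[i])
    idx = order[:n_extreme] + order[-n_extreme:]
    num = sum(unspliced[i] * spliced[i] for i in idx)
    den = sum(spliced[i] ** 2 for i in idx)
    return num / den

def velocity(unspliced, spliced, gamma):
    """Per-cell velocity ds/dt = u - gamma * s for one gene."""
    return [u - gamma * s for u, s in zip(unspliced, spliced)]

# Hypothetical abundances of one gene in six cells.
spliced = [0.0, 1.0, 4.0, 6.0, 9.0, 10.0]
unspliced = [0.0, 0.5, 4.0, 1.0, 4.5, 5.0]

gamma = fit_gamma(unspliced, spliced)
v = velocity(unspliced, spliced, gamma)
for s, u, vi in zip(spliced, unspliced, v):
    trend = "induction" if vi > 0 else "repression" if vi < 0 else "steady state"
    print(f"s={s:4.1f} u={u:3.1f} velocity={vi:+.2f} ({trend})")
```

Cells with more unspliced mRNA than expected at steady state receive a positive velocity (the gene is being induced), and cells with less receive a negative velocity; combining such estimates across genes gives the direction of movement in expression space.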
Despite the utility of these methods in discerning kinship and transitions among cell types, multiple challenges persist. First, the assumption that cells with similar transcriptomes are temporally proximal may be confounded by factors such as cell cycle dynamics, spatial constraints, and cellular stress (Tritschler et al. 2019). Second, many trajectory methods assume fixed starting, branching, and ending points, which may not accurately reflect complex differentiation patterns (Saelens et al. 2019). Moreover, branching inferred from transcriptomic data does not necessarily coincide with actual cell division events and need not form a strictly tree-like structure, potentially leading to misinterpretations such as spurious branch points or overlooked dynamic processes (Wagner and Klein 2020).
4.3 Leveraging single-cell lineage tracing
Single-cell lineage tracing can address many of these limitations by coupling single-cell omics with genetic labeling, thereby capturing both phenotypic characteristics and lineage information within the same sample. This introduces an additional step of lineage data processing. For static barcodes, clones are called based on unique barcodes (or barcode combinations); for mutable barcodes, specialized bioinformatic tools have been developed to infer discrete lineage tree structures (Fig. 3Ca). Some algorithms reconstruct lineage trees solely from the barcode data (Jones et al. 2020, Sashittal et al. 2023), while others (Zafar et al. 2020, Pan et al. 2023) integrate transcriptomic information into the reconstruction process. The performance of these methods has been benchmarked (Gong et al. 2021).
Because fate determination is inherently dynamic, the characterization of fate trajectory typically focuses on changes in clonal fate distributions as the lineage expands (Fig. 3Cb). Many lineages exhibit fate bias, reflecting their tendency, whether through intrinsic mechanisms or stochastic processes, to produce specific cell fates (Weinreb et al. 2020, Weng et al. 2024). At the clonal level, fate bias can be quantified by the proportion of cells that adopt the dominant fate, and its statistical significance can be evaluated against the global fate distribution (Li et al. 2023a, Weng et al. 2024). When visualized on a cell state map, these clonal trajectories form distinct paths, further illustrating fate boundaries and potentials. Moreover, relationships among different fates can be inferred by hierarchical clustering based on clonal fate coupling metrics (Rodriguez-Fraticelli et al. 2018, Chan et al. 2019). On the other hand, when an explicit lineage tree is available, permutation tests (shuffling fate labels across the same tree structure) can be used to assess fate bias (Regalado et al. 2025), evaluate the heritability and correlation of cell states (Schiffman et al. 2024), and identify recurrent lineage motifs (Tran et al. 2024). Additionally, several studies have established mathematical models to describe the dynamic process of clonal fate transitions using transition maps (Wang et al. 2022a) and optimal transport (Forrow and Schiebinger 2021, Lange et al. 2024).
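The clone-level fate bias described above can be sketched as follows. The example computes the fraction of a clone's cells in its dominant fate and a one-sided binomial tail probability against the global frequency of that fate. The choice of a binomial test, the function names, and the toy fate labels are assumptions made for illustration; the cited studies may use different statistics, and choosing the dominant fate after looking at the clone makes this p-value optimistic unless it is corrected.

```python
from collections import Counter
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def clone_fate_bias(clone_fates, global_fates):
    """Return (dominant fate, fraction of clone in that fate, binomial p-value)."""
    dominant, k = Counter(clone_fates).most_common(1)[0]
    n = len(clone_fates)
    # Background frequency of the dominant fate across all profiled cells.
    p_global = Counter(global_fates)[dominant] / len(global_fates)
    return dominant, k / n, binomial_upper_tail(k, n, p_global)


# Hypothetical data: a balanced population and one strongly biased clone.
global_fates = ["Mono"] * 50 + ["Neu"] * 50
clone_fates = ["Mono"] * 9 + ["Neu"] * 1
dominant, bias, p_value = clone_fate_bias(clone_fates, global_fates)
```

With these toy labels, nine of ten cells share the dominant fate, and the tail probability under the balanced background is 11/1024.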
It is important to note, however, that cells within the same lineage do not necessarily exhibit similar expression profiles, and cell division and differentiation are not always perfectly coupled (Udomlumleart et al. 2021, Kukreja et al. 2024). Therefore, caution is warranted when using gene expression data to assist in lineage tree reconstruction or when inferring trajectories based solely on lineage information.
4.4 Discovering mechanisms through GRNs
After mapping cell fate trajectories, studies often explore underlying mechanisms via GRNs, using either small-scale modeling of known TFs or large-scale inference. Leveraging previously identified core TFs relevant to specific biological questions, many studies have used mathematical modeling followed by experimental validation to probe small-scale GRNs (Fig. 3Da). Such work has yielded crucial insights into a range of differentiation processes, including the binary fate decision between erythroid and myelomonocytic lineages governed by GATA1 and PU.1 (Huang et al. 2007), cellular reprogramming (Chickarmane et al. 2006), and cell polarization (Chau et al. 2012).
With the accumulation of biological knowledge and advances in high-throughput techniques, reconstructing large-scale GRNs has become a major focus in systems biology. Since the essence of transcriptional network inference lies in defining the relationships between TFs and their target genes, the most straightforward approach is to evaluate correlations from transcriptome data and identify co-expressed gene modules (Langfelder and Horvath 2008). To distinguish true regulatory interactions from spurious correlations and establish regulatory directions, TF binding profiles [e.g. measured by ChIP-seq (Park 2009) and CUT&RUN (Skene and Henikoff 2017)] or chromatin accessibility data are typically used to assign TFs to cis-regulatory elements based on motif analysis. Then, the regulatory elements are linked to target genes within a certain genomic distance (Badia-i-Mompel et al. 2023). In the single-cell field, state-of-the-art methods use scRNA-seq and scATAC-seq to reconstruct GRNs for distinct cell states (Fig. 3Db), and they have been comprehensively reviewed and benchmarked elsewhere (Badia-i-Mompel et al. 2023, Nourisa et al. 2025). Besides these multimodal approaches, some methods infer GRNs from perturbational scRNA-seq data (Ishikawa et al. 2023, Jiang et al. 2023, Littman et al. 2023, Hu et al. 2025) to recover causal regulatory relationships. Once inferred, GRNs can significantly enhance our understanding of cell fate in at least two ways. First, network topology analyses (e.g. TF centrality) can reveal hubs and modules in the network and pinpoint key regulators (Kuppe et al. 2022, Lee et al. 2024), while enrichment-based strategies infer TF activity from transcriptomics data to uncover the roles of master regulators in fate decisions (Aibar et al. 2017, Walsh et al. 2017, Garcia-Alonso et al. 2018). Second, GRNs can be used to predict fate transitions, exemplified by CellOracle (Kamimoto et al. 2023) and SCENIC+ (Bravo González-Blas et al. 2023), which perform in silico TF perturbations to estimate cell identity transition probabilities (Qiu et al. 2022).
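The motif-to-element and element-to-gene linking steps described above can be illustrated with a small sketch. It assumes motif hits have already been called per accessible peak and links each peak to genes whose transcription start site lies within a fixed window; the data structures, the 100 kb window, and all identifiers are hypothetical, and real pipelines add expression-based filtering that is omitted here.

```python
from collections import defaultdict


def link_tfs_to_genes(motif_hits, peaks, tss, max_distance=100_000):
    """Build candidate TF -> gene edges.

    motif_hits: {tf: [peak_id, ...]}       peaks carrying the TF's motif
    peaks:      {peak_id: (chrom, midpoint)}
    tss:        {gene: (chrom, position)}
    """
    edges = set()
    for tf, peak_ids in motif_hits.items():
        for peak_id in peak_ids:
            chrom, midpoint = peaks[peak_id]
            for gene, (gene_chrom, position) in tss.items():
                if gene_chrom == chrom and abs(position - midpoint) <= max_distance:
                    edges.add((tf, gene))
    return edges


def out_degree(edges):
    """Number of candidate targets per TF, a simple centrality measure."""
    degree = defaultdict(int)
    for tf, _gene in edges:
        degree[tf] += 1
    return dict(degree)


# Hypothetical toy inputs.
peaks = {"p1": ("chr1", 1_000), "p2": ("chr1", 500_000), "p3": ("chr2", 2_000)}
tss = {"geneA": ("chr1", 5_000), "geneB": ("chr1", 480_000), "geneC": ("chr2", 300_000)}
motif_hits = {"TF1": ["p1", "p2"], "TF2": ["p3"]}

edges = link_tfs_to_genes(motif_hits, peaks, tss)
hubs = out_degree(edges)
```

Here TF1 gains two candidate targets, while TF2 gains none because its only peak lies more than 100 kb from any gene on the same chromosome.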
However, several challenges remain in GRN inference. First, TF motif databases are often incomplete, especially for members of the same family and for co-binding events (Inukai et al. 2017). Second, chromatin accessibility does not necessarily imply TF binding; complementary information from scChIP-seq (Rotem et al. 2015) and scCUT&Tag (Kaya-Okur et al. 2019, Bartosovic et al. 2021) can help address this limitation, and chromosome conformation assays such as single-cell Hi-C (Nagano et al. 2013) provide additional insights into long-range regulatory connections. Third, regulatory effects can shift due to epigenetic changes such as promoter DNA methylation, which current frameworks have largely ignored (Fan et al. 2024). In addition, most mainstream GRN inference methods focus primarily on transcriptional networks; integrating post-transcriptional elements such as miRNAs (Kittelmann and McGregor 2019, Li et al. 2025) and alternative isoforms (Lambourne et al. 2025) could further enhance the resolution of inferred regulatory programs.
5 AI-driven advances in understanding cell fates
In recent years, rapid advances in artificial intelligence (AI) have profoundly reshaped scientific research (Embedding AI in Biology 2024). In the realm of cell fate study, AI-driven models have gone beyond conventional applications such as cell type annotation (Pasquini et al. 2021) and GRN inference and response predictions (Mochida et al. 2018), and are beginning to establish a new paradigm: the construction of AI virtual cells (AIVCs) (Bunne et al. 2024, Carr et al. 2024, Noutahi et al. 2025). By integrating extensive datasets into large-scale machine learning frameworks, AIVCs are envisioned as computational surrogates capable of simulating cellular behavior across multiple scales and diverse states (Bunne et al. 2024). Within the context of cell fate, the ability of AIVCs to represent and discover novel cellular states, and to conduct in silico experiments that manipulate these states, is particularly valuable, as it can narrow down the hypothesis space and accelerate biological discoveries. In this section, we highlight recent advances toward this vision with a focus on perturbation modeling and single-cell foundation models.
5.1 Perturbation modeling
Perturbation modeling aims to predict cellular responses and uncover mechanisms based on genetic or chemical inputs and phenotypic readouts (Gavriilidis et al. 2024). A wide array of classical machine learning models has been developed to analyze perturbation effects, with targets ranging from quantifying perturbation magnitude (Dixit et al. 2016, Yang et al. 2020, Dong et al. 2023), ranking the perturbation effects (Duan et al. 2019, Yang et al. 2020), and finding optimal perturbation strategy (Zhang et al. 2023) to identifying distinct cell groups or states (Jin et al. 2022, Hawkins et al. 2023) and prioritizing responsive cell types (Skinnider et al. 2021). Compared with purely statistical approaches, these machine learning models are generally more adept at extracting meaningful features from large-scale, high-dimensional datasets, and they still maintain high interpretability due to the simplicity of structure and transparency. Nonetheless, their performance hinges on effective feature engineering, which can be constrained by the need for prior knowledge exceeding our current biological understanding (Li et al. 2023b), and they often lack the ability to perform in silico prediction at scale.
Through hierarchically structured networks, deep neural networks (DNNs) surpass the constraints of shallow learning approaches by uncovering previously unknown patterns and constructing rich latent space representations in a data-driven manner (LeCun et al. 2015). Perturbation modeling with DNNs is typically framed as learning latent representations in which baseline cell states are transformed by perturbation embeddings to generate predicted phenotypic outcomes. Genetic perturbations are often encoded as gene embeddings, whereas chemical perturbations rely on molecular descriptors with dosage information (Hetzel et al. 2022). Despite these differences, most models converge on autoencoder-based architectures (Roḳaḥ et al. 2023). By perturbing the latent layer directly (Sadria et al. 2024) or comparing the latent space representations of perturbed and unperturbed cells (Keinan et al. 2023), these models can evaluate the perturbation effect.
Standard autoencoders struggle to generalize to unseen perturbations or cell types, whereas variational autoencoders (e.g. scGen) enable such predictions by learning probabilistic latent spaces that capture transferable perturbation effects (Fig. 4A) (Lotfollahi et al. 2019). Subsequent VAE-based methods enhance model capability or interpretability through various strategies, such as refining VAE architectures (Lotfollahi et al. 2019, Rampášek et al. 2019), evaluating perturbation effects in a cell-type-specific manner (Kana et al. 2023, Weinberger et al. 2023), and integrating biological insights from gene annotations (Seninge et al. 2021), ontology information (Doncevic and Herrmann 2023), or functional enrichment analysis (Geiger-Schuller et al. 2023). Meanwhile, the compositional perturbation autoencoder (CPA) (Lotfollahi et al. 2023) uses an adversarial autoencoder framework to decompose the data into a collection of latent embeddings for cell type, perturbations and other covariates, enabling prediction of the effects of new perturbation combinations. Advanced CPA versions have introduced specialized networks for encoding small molecules (Hetzel et al. 2022) or incorporating multi-modal data (Inecik et al. 2022). Beyond autoencoders, other types of neural networks have also demonstrated impressive capabilities in perturbation-focused tasks (Bunne et al. 2023, Zheng et al. 2023, Piran et al. 2024, Roohani et al. 2024, Zinati et al. 2024). A notable example is GEARS (Roohani et al. 2024), which predicts perturbation responses by learning gene perturbation embeddings derived from a gene ontology knowledge graph combined with an inferred gene co-expression network. More recent work further builds on these strategies through subtask decomposition (Gao et al. 2024) or attention mechanisms (Alsulami et al. 2024, Bai et al. 2024).
Figure 4.
AI-driven advances in understanding cell fates. (A) A schematic representation of scGen model (Lotfollahi et al. 2019). scGen utilizes a variational autoencoder trained on scRNA-seq data from both perturbed and unperturbed cells, mapping these cells within a latent space. The shift (δ) between perturbed and unperturbed representations reflects the differences in cellular states due to perturbations, which can be applied to novel conditions (such as new cell types). (B) Overview of a single-cell foundation model. scRNA-seq datasets are embedded as inputs for self-supervised pretraining of a transformer model. After pretraining, the model can perform cell-fate related downstream tasks such as perturbation modeling, GRN inference and trajectory inference, either through fine-tuning or in a zero-shot manner.
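The latent shift δ described for scGen can be illustrated with plain vector arithmetic. The sketch below assumes that cells have already been encoded into a latent space by some trained encoder, which is not shown; it computes δ as the difference between the mean perturbed and mean unperturbed latent vectors and adds it to the latent vectors of a new condition. Function names and the two-dimensional toy vectors are hypothetical, and decoding back to gene expression is omitted.

```python
def mean_vector(vectors):
    """Coordinate-wise mean of a list of equal-length latent vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def perturbation_shift(control_latents, perturbed_latents):
    """delta = mean(perturbed) - mean(control), computed in latent space."""
    control_mean = mean_vector(control_latents)
    perturbed_mean = mean_vector(perturbed_latents)
    return [p - c for p, c in zip(perturbed_mean, control_mean)]


def apply_shift(latents, delta):
    """Predict perturbed latent states for cells from an unseen condition."""
    return [[z + d for z, d in zip(vector, delta)] for vector in latents]


# Hypothetical 2-D latent codes from a trained encoder (not shown).
control = [[0.0, 0.0], [2.0, 0.0]]
perturbed = [[1.0, 2.0], [3.0, 2.0]]
delta = perturbation_shift(control, perturbed)

# Unperturbed cells of a new cell type, shifted by the learned delta.
predicted = apply_shift([[5.0, 5.0]], delta)
```

A decoder would then map the shifted latent vectors back to expression space; how well the shift transfers depends on how similar the new condition is to the training conditions.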
Despite their promise, deep learning models for perturbation prediction face notable limitations. Benchmarking studies show that they often fail to outperform linear baselines (Wu et al. 2024, Ahlmann-Eltze et al. 2025), and may approach random guessing in zero-shot predictions (Li et al. 2024a). A plausible explanation is that many perturbations impact only a small subset of genes and have limited downstream effects on the broader gene network, making simple additive linear models sufficiently effective for capturing global expression profiles (Rood et al. 2024). The recent Virtual Cell Challenge represents an important step toward community-wide benchmarking for better model development (Roohani et al. 2025). As data resources and evaluation frameworks continue to expand, collective efforts are expected to steadily improve model robustness and accelerate progress toward realistic digital virtual cells.
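One common form of such an additive baseline, for a double perturbation, simply sums the individual effects of the two single perturbations relative to control. The sketch below is an illustration of that idea under the assumption that expression values are on a log-like scale where effects add; the function name and the numbers are hypothetical and are not taken from the cited benchmarks.

```python
def additive_baseline(control, single_a, single_b):
    """Predict expression under A+B as control + effect(A) + effect(B).

    All inputs are per-gene mean expression vectors of equal length,
    assumed to be on a scale where perturbation effects are additive.
    """
    return [c + (a - c) + (b - c) for c, a, b in zip(control, single_a, single_b)]


# Hypothetical mean expression for three genes.
control = [1.0, 2.0, 3.0]
single_a = [1.5, 2.0, 3.0]   # perturbation A raises gene 1
single_b = [1.0, 1.0, 3.0]   # perturbation B lowers gene 2
predicted_double = additive_baseline(control, single_a, single_b)
```

Because this baseline has no trainable parameters, any gain a deep model shows over it can be attributed to captured non-additive interactions.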
5.2 Foundation models
Due to the inherent similarities between natural language and single-cell omics data, particularly in their sequential structure and context dependency, large language models (LLMs) have recently been applied to single-cell data analysis. These models, also referred to as single-cell foundation models (scFMs), are typically built upon the self-attention transformer architecture and pre-trained on extensive single-cell RNA-seq datasets (Szałata et al. 2024) (Fig. 4B). Although these models were not originally designed for cell fate research, many of their downstream applications, such as perturbation modeling and GRN inference, lend themselves naturally to investigating cell fate. More recently, similar foundation model frameworks have been extended to spatial transcriptomics, which will also be discussed. For further details on the architectures, pre-training strategies, or applications in other areas, we refer readers to recent reviews (Consens et al. 2023, Szałata et al. 2024).
Compared with the deep neural networks introduced earlier, scFMs bring two key advantages to perturbation modeling: comprehensive cell-state embeddings learned from extensive datasets, and the ability to capture complex gene-gene interactions through attention mechanisms. Typified by scFoundation (Hao et al. 2024a), some models (Gong et al. 2023, Liu et al. 2023a, Yang et al. 2024, Hao et al. 2024a) adopt a straightforward approach by directly inputting embeddings into other advanced perturbation prediction models like GEARS. In this setup, GEARS treats these embeddings as nodes in a gene-relational graph and integrates them with perturbation information to predict post-perturbation gene expression. Other models (Amara-Belgadi et al. 2023, Wen et al. 2023, Cui et al. 2024b, Hsieh et al. 2024) opt to fine-tune with Perturb-seq data, adapting their representations to enhance predictive capabilities. An alternative is the zero-shot approach showcased by Geneformer (Theodoris et al. 2023). Here, genes are ranked by normalized expression, and perturbation is simulated by moving a gene to the top of the ranking (activation) or removing it (inhibition). By measuring the cosine similarity between original and perturbed embeddings for both cells and genes, Geneformer can quantify the potential effect of perturbing each gene in its cellular context.
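The rank-based in silico perturbation described for Geneformer can be sketched on plain Python lists. The example below builds a rank encoding from normalized expression, simulates activation by moving a gene to the front and inhibition by deleting it, and provides a cosine similarity helper for comparing embeddings. The embedding model itself is not shown, and all function names and expression values are hypothetical rather than the Geneformer API.

```python
from math import sqrt


def rank_encode(expression):
    """Order expressed genes from highest to lowest normalized expression."""
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    return [gene for gene, _ in sorted(expressed, key=lambda gv: gv[1], reverse=True)]


def simulate_activation(ranked_genes, gene):
    """Move the gene to the top of the ranking."""
    return [gene] + [g for g in ranked_genes if g != gene]


def simulate_inhibition(ranked_genes, gene):
    """Remove the gene from the ranking."""
    return [g for g in ranked_genes if g != gene]


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


# Hypothetical normalized expression for one cell.
cell = {"G1": 0.2, "G2": 3.0, "G3": 0.0, "G4": 1.5}
ranked = rank_encode(cell)
activated = simulate_activation(ranked, "G1")
inhibited = simulate_inhibition(ranked, "G2")
```

In a full workflow, the original and perturbed rankings would each be passed through the pretrained model, and `cosine_similarity` would be applied to the resulting cell or gene embeddings.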
Beyond perturbation modeling, scFMs are also increasingly applied to GRN inference. Here, the self-attention mechanism provides a natural advantage: each attention head produces weights that quantify how strongly one gene’s representation attends to others, effectively capturing putative regulatory dependencies. By aggregating these weights across layers or heads, one can construct a gene–gene similarity network that reflects context-specific transcriptional regulation. Typical workflows involve (i) extracting the attention matrix of genes, (ii) constructing a similarity network based on the similarity of attention patterns, and (iii) clustering genes to identify commonalities in function or expression (Shen et al. 2022, Amara-Belgadi et al. 2023, Cui et al. 2024b, Kalfon et al. 2025). Additionally, scGPT (Cui et al. 2024b) offers a complementary approach by comparing a gene’s attention scores before and after perturbation using Perturb-seq data, enabling the construction of subnetworks centered on perturbed genes and highlighting causal regulatory effects.
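The three-step workflow above can be sketched with toy attention matrices. The example averages attention across heads, connects genes whose attention rows have high cosine similarity, and reports connected components as gene modules. Averaging over heads, the 0.9 threshold, connected components as the clustering step, and all names and values are assumptions chosen for brevity; published methods differ in each of these choices.

```python
from math import sqrt


def average_heads(attention_heads):
    """Element-wise mean of several gene x gene attention matrices."""
    n_heads, n = len(attention_heads), len(attention_heads[0])
    return [[sum(h[i][j] for h in attention_heads) / n_heads for j in range(n)]
            for i in range(n)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def similarity_edges(attention, genes, threshold=0.9):
    """Connect genes whose attention patterns (rows) are similar."""
    return [(genes[i], genes[j])
            for i in range(len(genes)) for j in range(i + 1, len(genes))
            if cosine(attention[i], attention[j]) >= threshold]


def connected_components(genes, edges):
    """Group genes into modules via depth-first search over the edge list."""
    neighbors = {g: set() for g in genes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, modules = set(), []
    for gene in genes:
        if gene in seen:
            continue
        stack, module = [gene], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            module.append(node)
            stack.extend(neighbors[node] - seen)
        modules.append(sorted(module))
    return modules


# Hypothetical attention from two heads over three genes.
genes = ["g1", "g2", "g3"]
head1 = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
head2 = [[0.7, 0.15, 0.15], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
attention = average_heads([head1, head2])
modules = connected_components(genes, similarity_edges(attention, genes))
```

In this toy, g1 and g2 attend to the same gene and fall into one module, while g3 forms its own.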
Building on these advances, foundation models have also been extended to spatial transcriptomics with the rapid development of the technology. Compared with scFMs, spatial foundation models (spFMs) aim not only to represent gene expression states but also to capture their spatial organization within tissues. spFMs typically adopt graph-based frameworks [e.g. NicheCompass (Birk et al. 2024) and Novae (Blampey et al. 2024)] or transformer-based architectures [e.g. Nicheformer (Schaar et al. 2024), stFormer (Cao et al. 2024), scGPT-spatial (Wang et al. 2025) and SToFM (Zhao et al. 2025)] and are pretrained on large spatial transcriptomics datasets. Through this process, they learn joint embeddings that integrate transcriptomic similarity with spatial proximity, thereby enabling a range of downstream tasks such as cell type prediction and deconvolution (Schaar et al. 2024, Wang et al. 2025, Zhao et al. 2025), gene expression imputation (Wang et al. 2025, Zhao et al. 2025), and signaling pathway analysis (Birk et al. 2024, Blampey et al. 2024). In addition, some models introduce multimodal extensions by combining spatial transcriptomics with histology images (Chen et al. 2025b, Liu et al., Zhang et al.) or proteomics (Madhu et al. 2025), thereby enriching the learned embeddings with complementary structural and molecular information. Collectively, these efforts highlight the potential of spFMs to serve as general-purpose models for tissue-scale biology, bridging molecular states with spatial context.
Although foundation models do not fundamentally change the underlying principles of cell fate research, they offer new methodologies that leverage larger datasets and advanced model architectures to create more expressive data representations. However, empirical evidence suggests that task-specific models often match or exceed the performance of transformer-based approaches (Kedzierska et al. 2025). For example, scFMs still lag behind simpler methods like GEARS in genetic perturbation tasks, and do not consistently excel in GRN inference (Liu et al. 2023b). They also face challenges such as data scarcity, the nonsequential nature of single-cell omics data, sensitivity to hyperparameters, limited interpretability and high computational cost (Consens et al. 2023). Nevertheless, with the emergence of spatial foundation models, we anticipate a new wave of models that integrate multimodal, multi-omics, and spatiotemporal data, moving toward unified representations of cells that could serve as the basis for an AIVC.
6 Case studies of applications
As previously discussed, single-cell technologies and their associated analytical methods have been widely adopted in laboratories, greatly advancing our understanding of the mechanisms underlying cell fate determination. In this section, we will explore several case studies that illustrate how these methods are being used to predict and manipulate cell fates, as well as their extension to various organisms (Fig. 5A).
Figure 5.
Case studies of applications. (A) Overview of application domains in cell fate research. (B) Deconvolution of bulk RNA-seq data using scRNA-seq profiles from healthy and infected samples allows the inference of disease-induced, cell-type-specific immune responses of tuberculosis (TB) (Bossel Ben-Moshe et al. 2019). (C) Cell state manipulation for cancer therapy. In cancer models, perturbation and lineage tracing experiments, combined with single-cell RNA-seq, reveal the side effects of epigenetic manipulations (Schaff et al. 2024, Metzner et al. 2025). Such studies help identify potential therapeutic targets and can also inform regenerative medicine research. (D) Expansion of single-cell techniques to multiple model organisms. From investigating plant regenerative capacity (e.g. in cotton) to other species, scRNA-seq and pseudotime analysis generate cell fate maps for comparing developmental processes (Zhu et al. 2023). Follow-up gene editing experiments, guided by differential expression and ontology analyses, serve to validate predicted regulatory factors and highlight evolutionarily conserved mechanisms.
6.1 Predicting key factors in disease-associated processes
One of the key reasons human diseases are often complex is the intricate interactions between immune cells (Schäfer et al. 2024), which cannot be fully captured by bulk methods. The advent of single-cell omics enables systematic analysis of immune cells, including the identification of distinct immune cell subsets (Wang et al. 2022b), the mapping of their developmental trajectories (Park et al. 2020), and the assessment of their responses to pathogens (Bossel Ben-Moshe et al. 2019), all of which hold promise for improving clinical predictions. For example, scRNA-seq has been used in bacterial infection models, such as Salmonella and tuberculosis (TB), to analyze immune responses (Bossel Ben-Moshe et al. 2019) (Fig. 5B). By deconvolving bulk measurements from mixtures of diseased cell types using scRNA-seq data from both healthy and infected samples, researchers have inferred disease-induced, cell-type-specific responses, revealing immune mechanisms linked to disease progression.
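The general idea behind reference-based deconvolution can be sketched as a non-negative least-squares problem: a bulk profile is modeled as a weighted mix of cell-type (or cell-state) signature profiles derived from scRNA-seq, and the weights are estimated. The example below uses projected gradient descent as a simple stand-in solver; it is not the method of the cited study, and the signature matrix, bulk vector, and function name are hypothetical.

```python
def deconvolve(signatures, bulk, n_iter=500):
    """Estimate non-negative mixing proportions p with signatures @ p ~= bulk.

    signatures: list of per-gene rows, one column per cell type/state
    bulk:       list of per-gene bulk expression values
    """
    n_genes, n_types = len(bulk), len(signatures[0])
    # Step size 1 / ||S||_F^2 is a conservative choice that keeps the iteration stable.
    step = 1.0 / sum(value * value for row in signatures for value in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(s * w for s, w in zip(row, p)) - b
                    for row, b in zip(signatures, bulk)]
        gradient = [sum(signatures[g][k] * residual[g] for g in range(n_genes))
                    for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, w - step * gk) for w, gk in zip(p, gradient)]
    total = sum(p)
    return [w / total for w in p] if total > 0 else p


# Hypothetical signatures for two cell states across three genes.
signatures = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
bulk = [7.0, 3.0, 5.0]   # generated as a 70% / 30% mixture
proportions = deconvolve(signatures, bulk)
```

Including separate signature columns for healthy and infected states of the same cell type is one way such a model could expose disease-induced responses, at the cost of making the columns more correlated and the estimates less stable.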
6.2 Cell state manipulation for cancer therapy and regenerative medicine
As scientists dissect the roles of plastic cell states/fates during development and tumor formation, the idea of manipulating cell states as a therapeutic strategy naturally emerges (Schade et al. 2024). Single-cell omics, especially in combination with genetic perturbation (Metzner et al. 2025) or lineage tracing (Schaff et al. 2024), can identify key factors as potential therapeutic targets (Fig. 5C). For instance, several epigenetic inhibitors are currently under development as part of cancer therapies (Comet et al. 2016). However, analysis of the single-cell RNA-seq map of breast cancer cells with specific epigenetic knockouts (KO) revealed that perturbation of H3K27me3 might trigger a partial epithelial-to-mesenchymal transition (EMT), making cancer cells more migratory and aggressive (Zhang et al. 2022). Meanwhile, lineage-tracing-based research on melanoma has demonstrated that modulating signaling pathways prior to targeted therapy can reduce the number of drug-resistant cells (Harmange et al. 2023). Such a framework could also be applied to the study of somatic cell pluripotency, potentially improving the efficiency of the reprogramming process (Jain et al. 2024).
6.3 Expansion of single-cell techniques to diverse organisms
Most of the single-cell omics techniques discussed earlier were initially demonstrated in mammalian cell lines and later applied to translational medicine research. However, the applications of single-cell techniques are not confined to specific organisms. After making necessary protocol adjustments, researchers can extend these experimental strategies and computational algorithms to a variety of cellular systems (Alfieri et al. 2022). One translational research example in a non-model organism is in Gossypium hirsutum (Zhu et al. 2023), where lineage inference, gene co-expression network, and differential expression analyses were used to construct models of cell differentiation and gene regulation. The study identified LAX2, LAX1, and LOX3 as key genes involved in callus formation, which could serve as potential targets for enhancing the regenerative capacity of cotton cells (Fig. 5D). Beyond technical adaptations, expanding single-cell approaches to non-model organisms raises important biological questions about the conservation and divergence of cellular identity and fate programs across evolutionarily distant taxa (Sebé-Pedrós et al. 2018). Addressing these questions introduces associated computational challenges, such as establishing accurate gene annotation and functional equivalence across species without well-annotated reference datasets, and developing methods for cross-species data integration (Zhong et al. 2025). For instance, researchers recently constructed a unified single-cell atlas spanning six vascular plant species using an automated cell-type annotation tool named XSpeciesSpanner, revealing both evolutionarily conserved cell types (e.g. epidermal and vascular cells) and lineage-specific innovations (Xue et al. 2025).
7 Challenges and future directions
7.1 Modeling biological and technical variability
Biological systems inherently exhibit diverse types of noise. In the context of single-cell gene expression, this noise can be broadly categorized as intrinsic or extrinsic (Elowitz et al. 2002, Swain et al. 2002, Sun and Zhang 2020). Intrinsic noise arises from the stochastic nature of molecular processes within individual cells (e.g. random fluctuations in transcription), and extrinsic noise stems from cellular heterogeneity that affects multiple genes simultaneously (e.g. variations in the abundance of shared transcription factors). However, distinguishing these two noise types remains challenging. Moreover, the timescales of the noise, or fluctuations, may vary among genes, especially in mammalian cells (Huang 2009). Some genes display heritable variability that persists for multiple generations, potentially reflecting metastable cell states within the epigenetic landscape (Chang et al. 2008, Shaffer et al. 2020). This highlights the need for a comprehensive framework capable of systematically modeling the sources and timescales of biological variability including transient noise, heritable fluctuations, and stable heterogeneity (i.e. distinct cell types).
In addition to biological noise, technical errors and biases in omics measurements can further confound analyses. For example, in scRNA-seq, only a subset of RNA molecules is recovered and quantified, leading to additional Poisson-type technical noise or “dropout” events (Kharchenko et al. 2014). While several imputation methods have been proposed, accurately estimating the magnitude of true technical noise in the presence of biological variability remains difficult (Akhtyamov et al. 2023, Cheng et al. 2023). Furthermore, in lineage tracing or perturbation experiments, the need to maintain cell viability and functionality during genetic manipulation can affect the reliability of the results (Ecker et al. 2018). Consequently, ongoing improvements in single-cell techniques that reduce or quantify technical variability are essential.
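To make the link between incomplete capture and dropout concrete, the sketch below computes the probability of observing zero molecules of a gene when each of its transcripts is captured independently with a fixed efficiency. Both the exact binomial form and its Poisson approximation are shown. The independent-capture model, the function names, and the example values are assumptions for illustration; real protocols show additional gene- and cell-specific effects.

```python
from math import exp


def binomial_dropout(true_molecules, capture_efficiency):
    """P(zero observed) when each molecule is captured independently."""
    return (1.0 - capture_efficiency) ** true_molecules


def poisson_dropout(true_molecules, capture_efficiency):
    """Poisson approximation with rate = molecules * efficiency."""
    return exp(-true_molecules * capture_efficiency)


# Hypothetical gene with 10 transcripts in the cell and 10% capture efficiency.
exact = binomial_dropout(10, 0.1)
approximate = poisson_dropout(10, 0.1)
```

Under this model a lowly expressed gene is missed in roughly a third of cells purely for technical reasons, which is why zeros alone cannot be read as absence of expression.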
7.2 Data integration and interpretation
The advent of single-cell techniques has produced data for comprehensive characterization of cell states, but it also brings several challenges for data integration and interpretation. First, large-scale single-cell and multi-omics projects generate datasets of unprecedented size and heterogeneity, creating practical bottlenecks in storage, memory, I/O, and compute that influence algorithm design (Nouri et al. 2023, Zhao et al. 2024). Methods must be both computationally scalable (e.g. streaming and sparse representations) and transparent about which sources of variation they remove or retain, since these choices can affect dimensionality reduction and clustering (Stuart and Satija 2019).
The second challenge is cross-modality integration. In practice, most multi-omics dataset resources are “unpaired” (different assays measured in disjoint cell sets), and the information of cell identities will be inherently lost (Stuart et al. 2019). Scientists have attempted to project cells into co-embedded space to address this problem (Ghazanfar et al. 2024, Hao et al. 2024b), which requires reliable embedding algorithms and ground truth paired datasets for training. Conversely, “paired” assays can suffer reduced per-modality quality and coverage relative to single-omics experiments (Stuart et al. 2019, Zhang and Nie 2021). Each data type can also bring modality-specific technical and biological variabilities that complicate joint analysis (Rautenstrauch et al. 2022).
Lastly, even when integrating the same modality, biological interpretation is complicated by batch effects (e.g. from different samples, labs, or platforms), and benchmark studies have shown a trade-off between removing batch effects and preserving true biological variation (Korsunsky et al. 2019, Tran et al. 2020, Luecken et al. 2022).
Large atlasing efforts demonstrate there is no one-size-fits-all integration strategy and that choices made during atlas construction strongly influence downstream interpretation (Hrovatin et al. 2025). To address these issues, efforts should combine careful experimental design and metadata standards with integration pipelines that (i) explicitly model sample/technology/lab effects, (ii) include metrics that quantify retention of known biological signals as well as batch removal, and (iii) expose parameters so users can tune the balance between harmonization and signal preservation. Building such context-aware, benchmarked frameworks will be essential for integrative analyses that remove technical artifacts while preserving biologically meaningful variation (Flores et al. 2023).
7.3 Toward an integrated experimental-computational paradigm
The convergence of experimental and computational approaches marks a paradigm shift in cell fate research. This integration creates iterative cycles where computational predictions guide experiments, and experimental findings drive new analytical methods, accelerating discovery in unprecedented ways.
The traditional linear workflow from hypothesis to experiment to analysis is being replaced by a dynamic cycle where computation and experimentation are deeply intertwined. Large-scale single-cell data now generate testable hypotheses, revealing unexpected cell states, novel regulatory circuits, and hidden transition pathways. These insights guide targeted perturbations, producing new data that drive innovative analyses. This bidirectional loop accelerates discovery: computational predictions highlight critical decision points for experimental validation, while unexpected experimental observations inspire new computational models, enabling rapid hypothesis testing and refinement.
Artificial intelligence, particularly foundation models, is revolutionizing our capacity to predict and control cell fate. AIVCs—comprehensive computational representations of cellular behavior—exemplify this integration of experimental and computational approaches. By enabling in silico perturbation experiments, AIVCs allow researchers to systematically explore thousands of genetic or chemical interventions, accelerating the identification of optimal reprogramming protocols and guiding cells toward desired fates. These models will also offer digital twin capabilities for drug screening, predicting off-target effects, and optimizing dosing strategies, supporting personalized medicine by simulating patient-specific cellular responses.
As experimental and computational boundaries blur, the most significant breakthroughs will emerge from fully integrated research programs. This unified approach promises to unlock therapeutic opportunities in developmental biology, regenerative medicine, cancer treatment, and beyond, fundamentally changing how we manipulate cellular identity for human health.
Acknowledgements
We thank members of the Lin Lab, especially Yihan Deng, for helpful discussions and suggestions on the manuscript.
Contributor Information
Yutong Zhou, Integrated Science Program, Yuanpei College, Peking University, Beijing, 100871, China.
Shuyang Hou, Integrated Science Program, Yuanpei College, Peking University, Beijing, 100871, China.
Xinhao Miao, Integrated Science Program, Yuanpei College, Peking University, Beijing, 100871, China.
Guangxin Zhang, Integrated Science Program, Yuanpei College, Peking University, Beijing, 100871, China.
Zining Li, Integrated Science Program, Yuanpei College, Peking University, Beijing, 100871, China.
Di Zhang, Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan, 610213, China.
Yongjie Lin, Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, 100871, China.
Yihan Lin, Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan, 610213, China; Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China; The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, 100871, China.
Author contributions
Yutong Zhou (Data curation [lead], Project administration [lead], Writing—original draft [lead], Writing—review & editing [equal]), Shuyang Hou (Writing—original draft [supporting], Writing—review & editing [supporting]), Xinhao Miao (Writing—original draft [supporting], Writing—review & editing [supporting]), Guangxin Zhang (Writing—original draft [supporting], Writing—review & editing [supporting]), Zining Li (Data curation [supporting], Writing—original draft [supporting]), Di Zhang (Funding acquisition [equal], Writing—review & editing [equal]), Yongjie Lin (Data curation [supporting], Writing—review & editing [equal]), and Yihan Lin (Conceptualization [lead], Funding acquisition [equal], Supervision [lead], Writing—review & editing [equal])
Conflict of interest: None declared.
Funding
This work was supported by the Beijing Natural Science Foundation [QY23055]; Sichuan Science and Technology Program [2025ZNSFSC0993]; and the National Natural Science Foundation of China [T2325002, T2321001, and 32088101].
References
- Adamson B, Norman TM, Jost M et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 2016;167:1867–82.e21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ahlmann-Eltze C, Huber W, Anders S. Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines. Nat Methods 2025;22:1657–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Aibar S, González-Blas CB, Moerman T et al. SCENIC: single-cell regulatory network inference and clustering. Nat Methods 2017;14:1083–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Akhtyamov P, Shaheen L, Raevskiy M et al. scATAC-seq preprocessing and imputation evaluation system for visualization, clustering and digital footprinting. Brief Bioinf 2023;25:bbad447. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Alfieri JM, Wang G, Jonika MM et al. A primer for single-cell sequencing in non-model organisms. Genes (Basel) 2022;13:380. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Alsulami R, Lehmann R, Khan SA et al. PrePR-CT: predicting perturbation responses in unseen cell types using cell-type-specific graphs. bioRxiv 2024;2024.07.24.604816.
- Amara-Belgadi S, Li O, Zhang DY et al. Bioformers: a scalable framework for exploring biostates using transformers. bioRxiv 2023;2023.11.29.569320.
- Badia-I-Mompel P, Wessels L, Müller-Dott S et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023;24:739–54. [DOI] [PubMed] [Google Scholar]
- Bai D, Ellington CN, Mo S et al. AttentionPert: accurately modeling multiplexed genetic perturbations with multi-scale effects. Bioinformatics 2024; 40: i453–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 2004;116:281–97. [DOI] [PubMed] [Google Scholar]
- Bartosovic M, Kabbe M, Castelo-Branco G. Single-cell CUT&Tag profiles histone modifications and transcription factors in complex tissues. Nat Biotechnol 2021;39:825–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Basso K, Margolin AA, Stolovitzky G et al. Reverse engineering of regulatory networks in human B cells. Nat Genet 2005;37:382–90. [DOI] [PubMed] [Google Scholar]
- Baysoy A, Tian X, Zhang F et al. Spatially Resolved in vivo CRISPR Screen Sequencing via Perturb-DBiT. bioRxiv 2024;2024.11.18.624106.
- Bergen V, Lange M, Peidli S et al. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat Biotechnol 2020;38:1408–14. [DOI] [PubMed] [Google Scholar]
- Bessonnard S, De Mot L, Gonze D et al. Gata6, Nanog and Erk signaling control cell fate in the inner cell mass through a tristable regulatory network. Development 2014;141:3637–48. [DOI] [PubMed] [Google Scholar]
- Biddy BA, Kong W, Kamimoto K et al. Single-cell mapping of lineage and identity in direct reprogramming. Nature 2018;564:219–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Binan L, Jiang A, Danquah SA et al. Simultaneous CRISPR screening and spatial transcriptomics reveal intracellular, intercellular, and functional transcriptional circuits. Cell 2025;188:2141–58.e18. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Birk S, Bonafonte-Pardàs I, Feriz AM et al. Quantitative characterization of cell niches in spatial atlases. Published online March 2024;21:2024–02. [Google Scholar]
- Blampey Q, Benkirane H, Bercovici N et al. Novae: a graph-based foundation model for spatial transcriptomics data. bioRxiv 2024;2024.09.09.612009.
- Bock C, Datlinger P, Chardon F et al. High-content CRISPR screening. Nat Rev Methods Primers 2022;2:8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bossel Ben-Moshe N, Hen-Avivi S, Levitin N et al. Predicting bacterial infection outcomes using single cell RNA-sequencing analysis of human immune cells. Nat Commun 2019;10:3266. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bowling S, Sritharan D, Osorio FG et al. An engineered CRISPR-Cas9 mouse line for simultaneous readout of lineage histories and gene expression profiles in single cells. Cell 2020;181:1410–22.e27. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bravo González-Blas C, De Winter S, Hulselmans G et al. SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks. Nat Methods 2023;20:1355–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bressan D, Battistoni G, Hannon GJ. The dawn of spatial omics. Science 2023;381:eabq4964. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Buenrostro JD, Wu B, Chang HY et al. ATAC-seq: a method for assaying chromatin accessibility genome-wide. CP Mol Biol 2015;109:21–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bunne C, Roohani Y, Rosen Y et al. How to build the virtual cell with artificial intelligence: priorities and opportunities. Cell 2024;187:7045–63. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bunne C, Stark SG, Gut G et al. Learning single-cell perturbation responses using neural optimal transport. Nat Methods 2023;20:1759–68. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cai X, Evrony GD, Lehmann HS et al. Single-cell, genome-wide sequencing identifies clonal somatic Copy-Number variation in the human brain. Cell Rep 2014;8:1280–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cao S, Yang K, Cheng J et al. stFormer: a foundation model for spatial transcriptomics. bioRxiv 2024;2024.09.27.615337.
- Carr A, Cool J, Karaletsos T et al. AI: a transformative opportunity in cell biology. Mol Biol Cell 2024;35:pe4. 10.1091/mbc.E24-09-0415 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chan MM, Smith ZD, Grosswendt S et al. Molecular recording of mammalian embryogenesis. Nature 2019;570:77–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chang HH, Hemberg M, Barahona M et al. Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature 2008;453:544–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chatterton Z, Lamichhane P, Ahmadi Rastegar D et al. Single-cell DNA methylation sequencing by combinatorial indexing and enzymatic DNA methylation conversion. Cell Biosci 2023;13:2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chau AH, Walter JM, Gerardin J et al. Designing synthetic regulatory networks capable of self-organizing cell polarization. Cell 2012;151:320–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen C, Liao Y, Peng G. Connecting past and present: single-cell lineage tracing. Protein Cell 2022;13:790–807. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen C, Liao Y, Zhu M et al. Dual-nuclease single-cell lineage tracing by Cas9 and Cas12a. Cell Rep 2025. a;44:115105. [DOI] [PubMed] [Google Scholar]
- Chen W, Guillaume-Gentil O, Rainer PY et al. Live-seq enables temporal transcriptomic recording of single cells. Nature 2022;608:733–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen W, Zhang P, Tran TN et al. A visual–omics foundation model to bridge histopathology with spatial transcriptomics. Nat Methods 2025. b;22:1568–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cheng Y, Ma X, Yuan L et al. Evaluating imputation methods for single-cell RNA-seq data. BMC Bioinformatics 2023;24:302. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chickarmane V, Troein C, Nuber UA et al. Transcriptional dynamics of the embryonic stem cell switch. Gage FH (ed.). PLoS Comput Biol 2006;2:e123. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Codeluppi S, Borm LE, Zeisel A et al. Spatial organization of the somatosensory cortex revealed by osmFISH. Nat Methods 2018;15:932–5. [DOI] [PubMed] [Google Scholar]
- Comet I, Riising EM, Leblanc B et al. Maintaining cell identity: PRC2-mediated regulation of transcription and cancer. Nat Rev Cancer 2016;16:803–10. [DOI] [PubMed] [Google Scholar]
- Consens ME, Dufault C, Wainberg M et al. To transformers and beyond: large language models for the genome. arXiv 2023;2311.07621.
- Crow M, Gillis J. Co-expression in single-cell analysis: saving grace or original sin? Trends Genet 2018;34:823–31. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cui A, Huang T, Li S et al. Dictionary of immune responses to cytokines at single-cell resolution. Nature 2024. a;625:377–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cui H, Wang C, Maan H et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat Methods 2024. b;21:1470–80. 10.1038/s41592-024-02201-0 [DOI] [PubMed] [Google Scholar]
- Datlinger P, Rendeiro AF, Schmidl C et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat Methods 2017;14:297–301. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Davis RL, Weintraub H, Lassar AB. Expression of a single transfected cDNA converts fibroblasts to myoblasts. Cell 1987;51:987–1000. [DOI] [PubMed] [Google Scholar]
- De Belly H, Paluch EK, Chalut KJ. Interplay between mechanics and signalling in regulating cell fate. Nat Rev Mol Cell Biol 2022;23:465–80. [DOI] [PubMed] [Google Scholar]
- Dhainaut M, Rose SA, Akturk G et al. Spatial CRISPR genomics identifies regulators of the tumor microenvironment. Cell 2022;185:1223–39.e20. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ding DY, Tang Z, Zhu B et al. Quantitative characterization of tissue states using multiomics and ecological spatial analysis. Nat Genet 2025;57:910–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dixit A, Parnas O, Li B et al. Perturb-Seq: dissecting molecular circuits with scalable Single-Cell RNA profiling of pooled genetic screens. Cell 2016;167:1853–66.e17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Doncevic D, Herrmann C. Biologically informed variational autoencoders allow predictive modeling of genetic and drug-induced perturbations. Martelli PL (ed.). Bioinformatics 2023;39:btad387. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dong M, Wang B, Wei J et al. Causal identification of single-cell experimental perturbation effects with CINEMA-OT. Nat Methods 2023;20:1769–79. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Doré LC, Crispino JD. Transcription factor networks in erythroid cell and megakaryocyte development. Blood 2011;118:231–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Duan B, Zhou C, Zhu C et al. Model-based understanding of single-cell CRISPR screening. Nat Commun 2019;10:2233. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ecker S, Pancaldi V, Valencia A et al. Epigenetic and transcriptional variability shape phenotypic plasticity. Bioessays 2018;40:1700148. [DOI] [PubMed] [Google Scholar]
- Elowitz MB, Levine AJ, Siggia ED et al. Stochastic gene expression in a single cell. Science 2002;297:1183–6. ED [DOI] [PubMed] [Google Scholar]
- Embedding AI in Biology. Nat Methods 2024;21:1365–6. [DOI] [PubMed] [Google Scholar]
- Enver T, Pera M, Peterson C et al. Stem cell states, fates, and the rules of attraction. Cell Stem Cell 2009;4:387–97. [DOI] [PubMed] [Google Scholar]
- Fan S, Ma L, Song C et al. Promoter DNA methylation and transcription factor condensation are linked to transcriptional memory in mammalian cells. Cell Syst 2024;15:808–23.e6. [DOI] [PubMed] [Google Scholar]
- Feinberg AP, Levchenko A. Epigenetics as a mediator of plasticity in cancer. Science 2023;379:eaaw3835. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ferrell JE. Bistability, bifurcations, and Waddington’s epigenetic landscape. Curr Biol 2012;22:R458–66. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Flores JE, Claborne DM, Weller ZD et al. Missing data in multi-omics integration: recent advances through artificial intelligence. Front Artif Intell 2023;6:1098308. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Forrow A, Schiebinger G. LineageOT is a unified framework for lineage tracing and trajectory inference. Nat Commun 2021;12:4940. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Frangieh CJ, Melms JC, Thakore PI et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat Genet 2021;53:332–41. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fujikura J, Yamato E, Yonemura S et al. Differentiation of embryonic stem cells is induced by GATA factors. Genes Dev 2002;16:784–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gao Y, Wei Z, Dong K et al. Toward subtask-decomposition-based learning and benchmarking for predicting genetic perturbation outcomes and beyond. Nat Comput Sci 2024;4:773–85. [DOI] [PubMed] [Google Scholar]
- Garcia-Alonso L, Iorio F, Matchan A et al. Transcription factor activities enhance markers of drug sensitivity in cancer. Cancer Res 2018;78:769–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gavriilidis GI, Vasileiou V, Orfanou A et al. A mini-review on perturbation modelling across single-cell omic modalities. Comput Struct Biotechnol J 2024;23:1886–96. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Geiger-Schuller K, Eraslan B, Kuksenko O et al. Systematically characterizing the roles of E3-ligase family members in inflammatory responses with massively parallel Perturb-seq. bioRxiv 2023;2023.01.23.525198. 10.1101/2023.01.23.525198 [DOI]
- Geula S, Moshitch-Moshkovitz S, Dominissini D et al. m6 a mRNA methylation facilitates resolution of naïve pluripotency toward differentiation. Science 2015;347:1002–6. [DOI] [PubMed] [Google Scholar]
- Ghazanfar S, Guibentif C, Marioni JC. Stabilized mosaic single-cell data integration using unshared features. Nature Biotechnol 2024;42:284–92. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gong J, Hao M, Cheng X et al. xTrimoGene: an efficient and scalable representation learner for single-cell RNA-seq data. Adv Neural Inf Process Syst 2023;36:69391–403. [Google Scholar]
- Gong W, Granados AA, Hu J et al. Benchmarked approaches for reconstruction of in vitro cell lineages and in silico models of C. elegans and M. musculus developmental trees. Cell Syst 2021;12:810–26.e4. [DOI] [PubMed] [Google Scholar]
- Gurdon JB. The developmental capacity of nuclei taken from intestinal epithelium cells of feeding tadpoles. Development 1962;10:622–40. [PubMed] [Google Scholar]
- Han X, Zhou Z, Fei L et al. Construction of a human cell landscape at single-cell level. Nature 2020;581:303–9. [DOI] [PubMed] [Google Scholar]
- Hao M, Gong J, Zeng X et al. Large-scale foundation model on single-cell transcriptomics. Nat Methods 2024. a;21:1481–91. [DOI] [PubMed] [Google Scholar]
- Hao Y, Stuart T, Kowalski MH et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol 2024. b;42:293–304. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Harman JR, Thorne R, Jamilly M et al. A KMT2A-AFF1 gene regulatory network highlights the role of core transcription factors and reveals the regulatory logic of key downstream target genes. Genome Res 2021;31:1159–73. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Harmange G, Hueros RAR, Schaff DL et al. Disrupting cellular memory to overcome drug resistance. Nat Commun 2023;14:7130. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hawkins DY, Zuch DT, Huth J et al. ICAT: a novel algorithm to robustly identify cell states following perturbations in single-cell transcriptomes. Martelli PL (ed.). Bioinformatics 2023;39:btad278. [DOI] [PMC free article] [PubMed] [Google Scholar]
- He Z, Maynard A, Jain A et al. Lineage recording in human cerebral organoids. Nat Methods 2022;19:90–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Heemskerk I, Burt K, Miller M et al. Rapid changes in morphogen concentration control self-organized patterning in human embryonic stem cells. eLife 2019;8:e40526. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hentze MW, Castello A, Schwarzl T et al. A brave new world of RNA-binding proteins. Nat Rev Mol Cell Biol 2018;19:327–41. [DOI] [PubMed] [Google Scholar]
- Hetzel L, Böhm S, Kilbertus N et al. Predicting cellular responses to novel drug perturbations at a single-cell resolution. Adv Neural Inf Process Syst 2022;35:26711–22. [Google Scholar]
- Hrovatin K, Sikkema L, Shitov VA et al. Considerations for building and using integrated single-cell atlases. Nat Methods 2025;22:41–57. [DOI] [PubMed] [Google Scholar]
- Hsieh K-L, Chu Y, Li X et al. scEMB: learning context representation of genes based on large-scale single-cell transcriptomics. bioRxiv 2024;2024.09.24.614685. 10.1101/2024.09.24.614685 [DOI]
- Hu Q, Lu X, Xue Z et al. Gene regulatory network inference during cell fate decisions by perturbation strategies. NPJ Syst Biol Appl 2025;11:23. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hu T, Allam M, Cai S et al. Single-cell spatial metabolomics with cell-type specific protein profiling for tissue systems biology. Nat Commun 2023;14:8260. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huang S. Non-genetic heterogeneity of cells in development: more than just noise. Development 2009;136:3853–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huang S. The molecular and mathematical basis of Waddington’s epigenetic landscape: a framework for post-Darwinian biology? Bioessays 2012;34:149–57. [DOI] [PubMed] [Google Scholar]
- Huang S, Guo Y-P, May G et al. Bifurcation dynamics in lineage-commitment in bipotent progenitor cells. Dev Biol 2007;305:695–713. [DOI] [PubMed] [Google Scholar]
- Inecik K, Uhlmann A, Lotfollahi M et al. MultiCPA: multimodal compositional perturbation autoencoder. bioRxiv 2022;2022.07.08.499049. 10.1101/2022.07.08.499049 [DOI]
- Inukai S, Kock KH, Bulyk ML. Transcription factor–DNA binding: beyond binding site motifs. Curr Opin Genet Dev 2017;43:110–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ishikawa M, Sugino S, Masuda Y et al. RENGE infers gene regulatory networks using time-series single-cell RNA-seq data with CRISPR perturbations. Commun Biol 2023;6:1290. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jain N, Goyal Y, Dunagin MC et al. Retrospective identification of cell-intrinsic factors that mark pluripotency potential in rare somatic cells. Cell Syst 2024;15:109–33.e10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ji Y, Lotfollahi M, Wolf FA et al. Machine learning for perturbational single-cell omics. Cell Syst 2021;12:522–37. [DOI] [PubMed] [Google Scholar]
- Jiang J, Chen S, Tsou T et al. D-SPIN constructs gene regulatory network models from multiplexed scRNA-seq data revealing organizing principles of cellular perturbation response. bioRxiv 2023;2023.04.19.537364. 10.1101/2023.04.19.537364 [DOI]
- Jin K, Schnell D, Li G et al. CellDrift: inferring perturbation responses in temporally sampled single-cell data. Brief Bioinf 2022;23:bbac324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jindal K, Adil MT, Yamaguchi N et al. Single-cell lineage capture across genomic modalities with CellTag-multi reveals fate-specific gene regulatory changes. Nat Biotechnol 2024;42:946–59. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jones MG, Khodaverdian A, Quinn JJ et al. Inference of single-cell phylogenies from lineage tracing data using Cassiopeia. Genome Biol 2020;21:92. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kalfon J, Samaran J, Peyré G et al. scPRINT: pre-training on 50 million cells allows robust gene network predictions. Nat Commun 2025;16:3607. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kamimoto K, Stringa B, Hoffmann CM et al. Dissecting cell identity via network inference and in silico gene perturbation. Nature 2023;614:742–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kana O, Nault R, Filipovic D et al. Generative modeling of single-cell gene expression for dose-dependent chemical perturbations. Patterns (N Y) 2023;4:100817. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Karlebach G, Shamir R. Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol 2008;9:770–80. [DOI] [PubMed] [Google Scholar]
- Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol 1969;22:437–67. [DOI] [PubMed] [Google Scholar]
- Kaya-Okur HS, Wu SJ, Codomo CA et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun 2019;10:1930. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kedzierska KZ, Crawford L, Amini AP et al. Zero-shot evaluation reveals limitations of single-cell foundation models. Genome Biol 2025;26:101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Keinan G, Sayal K, Gonen A et al. Learning perturbation-specific cell representations for prediction of transcriptional response across cellular contexts. bioRxiv 2023;2023.03.20.533433.
- Kester L, Van Oudenaarden A. Single-cell transcriptomics meets lineage tracing. Cell Stem Cell 2018;23:166–79. [DOI] [PubMed] [Google Scholar]
- Kharchenko PV, Silberstein L, Scadden DT. Bayesian approach to single-cell differential expression analysis. Nat Methods 2014;11:740–2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kim H, Kumar A, Lövkvist C et al. CellNeighborEX : deciphering neighbor-dependent gene expression from spatial transcriptomics data. Mol Syst Biol 2023;19:e11670. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kittelmann S, McGregor AP. Modulation and evolution of animal development through microRNA regulation of gene expression. Genes (Basel) 2019;10:321. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Koblan LW, Yost KE, Zheng P et al. High-resolution spatial mapping of cell state and lineage dynamics in vivo with PEtracer. Science 2025;390:eadx3800. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kolodziejczyk AA, Kim JK, Svensson V et al. The technology and biology of single-cell RNA sequencing. Mol Cell 2015;58:610–20. [DOI] [PubMed] [Google Scholar]
- Korsunsky I, Millard N, Fan J et al. Fast, sensitive and accurate integration of single-cell data with harmony. Nat Methods 2019;16:1289–96. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kukreja K, Jia BZ, McGeary SE et al. Cell state transitions are decoupled from cell division during early embryo development. Nat Cell Biol 2024;26:2035–45. [DOI] [PubMed] [Google Scholar]
- Kuppe C, Ramirez Flores RO, Li Z et al. Spatial multi-omic map of human myocardial infarction. Nature 2022;608:766–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
- La Manno G, Soldatov R, Zeisel A et al. RNA velocity of single cells. Nature 2018;560:494–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lambourne L, Mattioli K, Santoso C et al. Widespread variation in molecular interactions and regulatory properties among transcription factor isoforms. Mol Cell 2025;85:1445–66.e13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lange M, Piran Z, Klein M et al. Mapping lineage-traced cells across time points with moslin. Genome Biol 2024;25:277. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008;9:559. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lareau CA, Duarte FM, Chew JG et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat Biotechnol 2019;37:916–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–44. [DOI] [PubMed] [Google Scholar]
- Lee J, Kim N, Cho K-H. Decoding the principle of cell-fate determination for its reverse control. NPJ Syst Biol Appl 2024;10:47. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Levine M, Davidson EH. Gene regulatory networks for development. Proc Natl Acad Sci USA 2005;102:4936–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li B, Wang C, Wang Y RegNetwork 2025: an integrative data repository for gene regulatory networks in human and mouse. Nucleic Acids Research 2025;gkaf779. [DOI] [PMC free article] [PubMed]
- Li C, Gao H, She Y et al. Benchmarking AI models for in silico gene perturbation of cells. bioRxiv 2024. a;2024.12.20.629581. 10.1101/2024.12.20.629581 [DOI]
- Li Y, Li H, Peng C et al. Unraveling the spatial organization and development of human thymocytes through integration of spatial transcriptomics and single-cell multi-omics profiling. Nat Commun 2024. b;15:7784. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li L, Bowling S, McGeary SE et al. A mouse model with high clonal barcode diversity for joint lineage, transcriptomic, and epigenomic profiling in single cells. Cell 2023. a;186:5183–99.e22. [DOI] [PubMed] [Google Scholar]
- Li Z, Gao E, Zhou J et al. Applications of deep learning in understanding gene regulation. Cell Rep Methods 2023. b;3:100384. 10.1016/j.crmeth.2022.100384 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liao H, Choi J, Shendure J. Molecular recording using DNA typewriter. Nat Protoc 2024;19:2833–62. [DOI] [PubMed] [Google Scholar]
- Liao M, Zhu X, Lu Y et al. Multi-omics profiling of retinal pigment epithelium reveals enhancer-driven activation of RANK-NFATc1 signaling in traumatic proliferative vitreoretinopathy. Nat Commun 2024;15:7324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liscovitch-Brauer N, Montalbano A, Deng J et al. Profiling the genetic determinants of chromatin accessibility with scalable single-cell CRISPR screens. Nat Biotechnol 2021;39:1270–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Littman R, Cheng M, Wang N et al. SCING: inference of robust, interpretable gene regulatory networks from single cell and spatial transcriptomics. iScience 2023;26:107124. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu L, Chen A, Li Y et al. Spatiotemporal omics for biology and medicine. Cell 2024;187:4488–519. [DOI] [PubMed] [Google Scholar]
- Liu M, Yue Y, Chen X et al. Genome-coverage single-cell histone modifications for embryo lineage tracing. Nature 2025;640:828–39. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu T, Chen T, Zheng W et al. scELMo: embeddings from language models are good learners for single-cell data analysis. bioRxiv 2023. a;2023.12.07.569910. 10.1101/2023.12.07.569910 [DOI]
- Liu T, Li K, Wang Y et al. Evaluating the Utilities of Large Language Models in Single-cell Data Analysis. bioRxiv 2023. b;2023.09.08.555192.
- Liu T, Huang T, Ding T et al. spEMO: leveraging multi-modal foundation models for analyzing spatial multi-omic and histopathology data. bioRxiv 2025;2025.01.13.632818.
- Losick R, Desplan C. Stochasticity and cell fate. Science 2008;320:65–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lotfollahi M, Klimovskaia Susmelj A, De Donno C et al. Predicting cellular responses to complex perturbations in high-throughput screens. Mol Syst Biol 2023;19:e11517. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lotfollahi M, Naghipourfar M, Theis FJ et al. Conditional out-of-sample generation for unpaired data using trVAE. arXiv 2019;1910.01791. [DOI] [PubMed]
- Lotfollahi M, Wolf FA, Theis FJ. scGen predicts single-cell perturbation responses. Nat Methods 2019;16:715–21. [DOI] [PubMed] [Google Scholar]
- Luecken MD, Büttner M, Chaichoompu K et al. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 2022;19:41–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Luo C, Keown CL, Kurihara L et al. Single-cell methylomes identify neuronal subtypes and regulatory elements in mammalian cortex. Science 2017;357:600–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- MacNeil LT, Walhout AJM. Gene regulatory networks and the role of robustness and stochasticity in the control of gene expression. Genome Res 2011;21:645–57. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Madhu H, Rocha JF, Huang T et al. HEIST: a graph foundation model for spatial transcriptomics and proteomics data. arXiv 2025;2506.11152.
- Margulies M, Egholm M, Altman WE et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005;437:376–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mazelis I, Sun H, Kulkarni A et al. Multi-step genomics on single cells and live cultures in sub-nanoliter capsules. bioRxiv 2025;2025.03.14.642839. 10.1101/2025.03.14.642839 [DOI]
- McFaline-Figueroa JL, Srivatsan S, Hill AJ et al. Multiplex single-cell chemical genomics reveals the kinase dependence of the response to targeted therapy. Cell Genom 2024;4:100487. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Metzner E, Southard KM, Norman TM. Multiome perturb-seq unlocks scalable discovery of integrated perturbation effects on the transcriptome and epigenome. Cell Syst 2025;16:101161. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mimitou EP, Cheng A, Antonino M et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat Methods 2019;16:409–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mo C-K, Liu J, Chen S et al. Tumour evolution and microenvironment interactions in 2D and 3D space. Nature 2024;634:1178–86. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mochida K, Koda S, Inoue K et al. Statistical and machine learning approaches to predict gene regulatory networks from transcriptome datasets. Front Plant Sci 2018;9:1770. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Moffitt JR, Bambah-Mukku D, Eichhorn SW et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 2018;362:eaau5324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Moris N, Pina C, Arias AM. Transition states and cell fate decisions in epigenetic landscapes. Nat Rev Genet 2016;17:693–703. [DOI] [PubMed] [Google Scholar]
- Nagano T, Lubling Y, Stevens TJ et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 2013;502:59–64. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ng PC, Kirkness EF. Whole genome sequencing. In: Barnes MR, Breen G (eds.), Genetic Variation. Vol 628. Totowa, NJ: Humana Press, 2010. 215–226. [DOI] [PubMed] [Google Scholar]
- Nouri N, Kurlovs AH, Gaglia G et al. Scaling up single-cell RNA-seq data analysis with CellBridge workflow. Bioinformatics 2023;39:btad760. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nourisa J, Passemiers A, Stock M et al. geneRNIB: a living benchmark for gene regulatory network inference. bioRxiv 2025;2025.02.25.640181.
- Noutahi E, Hartford J, Tossou P et al. Virtual cells: predict, explain, discover. arXiv 2025;2505.14613.
- Nuñez JK, Chen J, Pommier GC et al. Genome-wide programmable transcriptional memory by CRISPR-based epigenome editing. Cell 2021;184:2503–19.e17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pacesa M, Pelea O, Jinek M. Past, present, and future of CRISPR genome editing technologies. Cell 2024;187:1076–100. [DOI] [PubMed] [Google Scholar]
- Pan X, Danies-Lopez A, Zhang X. Mapping lineage-resolved scRNA-seq data with spatial transcriptomics using TemSOMap. bioRxiv 2024;2024.10.31.621331. 10.1101/2024.10.31.621331 [DOI]
- Pan X, Li H, Putta P et al. LinRace: cell division history reconstruction of single cells using paired lineage barcode and gene expression data. Nat Commun 2023;14:8388. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Papalexi E, Mimitou EP, Butler AW et al. Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens. Nat Genet 2021;53:322–31. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Park J-E, Botting RA, Domínguez Conde C et al. A cell atlas of human thymic development defines T cell repertoire formation. Science 2020;367:eaay3224. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Park PJ. ChIP–seq: advantages and challenges of a maturing technology. Nat Rev Genet 2009;10:669–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pasquini G, Rojo Arias JE, Schäfer P et al. Automated methods for cell type annotation on scRNA-seq data. Comput Struct Biotechnol J 2021;19:961–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Patterson-Cross RB, Levine AJ, Menon V. Selecting single cell clustering parameter values using subsampling-based robustness metrics. BMC Bioinformatics 2021;22:39. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pei W, Feyerabend TB, Rössler J et al. Polylox barcoding reveals haematopoietic stem cell fates realized in vivo. Nature 2017;548:456–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Peidli S, Green TD, Shen C et al. scPerturb: harmonized single-cell perturbation data. Nat Methods 2024;21:531–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pelka K, Hofree M, Chen JH et al. Spatially organized multicellular immune hubs in human colorectal cancer. Cell 2021;184:4734–52.e20. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pierce SE, Granja JM, Greenleaf WJ. High-throughput single-cell chromatin accessibility CRISPR screens enable unbiased identification of regulatory networks in cancer. Nat Commun 2021;12:2969. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Piran Z, Cohen N, Hoshen Y et al. Disentanglement of single-cell data with biolord. Nat Biotechnol 2024;42:1678–83. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Qiu X, Zhang Y, Martin-Rufino JD et al. Mapping transcriptomic vector fields of single cells. Cell 2022;185:690–711.e45. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Quinn JJ, Jones MG, Okimoto RA et al. Single-cell lineages reveal the rates, routes, and drivers of metastasis in cancer xenografts. Science 2021;371:eabc1944. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rafelski SM, Theriot JA. Establishing a conceptual framework for holistic cell states and state transitions. Cell 2024;187:2633–51. [DOI] [PubMed] [Google Scholar]
- Rampášek L, Hidru D, Smirnov P et al. DrVAE: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics 2019;35:3743–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rautenstrauch P, Vlot AHC, Saran S et al. Intricacies of single-cell multi-omics data integration. Trends Genet 2022;38:128–39. [DOI] [PubMed] [Google Scholar]
- Regalado SG, Qiu C, Kottapalli S et al. Lineage recording in monoclonal gastruloids reveals heritable modes of early development. bioRxiv 2025;2025.05.23.655664. 10.1101/2025.05.23.655664 [DOI]
- Replogle JM, Norman TM, Xu A et al. Combinatorial single-cell CRISPR screens by direct guide RNA capture and targeted sequencing. Nat Biotechnol 2020;38:954–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rivero-Garcia I, Torres M, Sánchez-Cabo F. Deep generative models in single-cell omics. Comput Biol Med 2024;176:108561. [DOI] [PubMed] [Google Scholar]
- Rodriguez-Fraticelli AE, Wolock SL, Weinreb CS et al. Clonal analysis of lineage fate in native haematopoiesis. Nature 2018;553:212–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rokach L, Maimon OZ, Shmueli E (eds.). Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook. 3rd edn. Cham: Springer International Publishing AG, 2023. [Google Scholar]
- Rombouts C, De Spiegeleer M, Van Meulebroek L et al. Comprehensive polar metabolomics and lipidomics profiling discriminates the transformed from the non-transformed state in colon tissue and cell lines. Sci Rep 2021;11:17249. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rood JE, Hupalowska A, Regev A. Toward a foundation model of causal cell and tissue biology with a perturbation cell and tissue atlas. Cell 2024;187:4520–45. [DOI] [PubMed] [Google Scholar]
- Roohani Y, Huang K, Leskovec J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat Biotechnol 2024;42:927–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roohani YH, Hua TJ, Tung P-Y et al. Virtual cell challenge: toward a turing test for the virtual cell. Cell 2025;188:3370–4. [DOI] [PubMed] [Google Scholar]
- Rotem A, Ram O, Shoresh N et al. Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state. Nat Biotechnol 2015;33:1165–72. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roundtree IA, Evans ME, Pan T et al. Dynamic RNA modifications in gene expression regulation. Cell 2017;169:1187–200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rubin AJ, Parker KR, Satpathy AT et al. Coupled single-cell CRISPR screening and epigenomic profiling reveals causal gene regulatory networks. Cell 2019;176:361–76.e17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sadria M, Layton A, Goyal S et al. Fatecode enables cell fate regulator prediction using classification-supervised autoencoder perturbation. Cell Rep Methods 2024;4:100819. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Saelens W, Cannoodt R, Todorov H et al. A comparison of single-cell trajectory inference methods. Nat Biotechnol 2019;37:547–54. [DOI] [PubMed] [Google Scholar]
- Sanchís-Calleja F, Jain A, He Z et al. Decoding morphogen patterning of human neural organoids with a multiplexed single-cell transcriptomic screen. bioRxiv 2024;2024.02.08.579413. 10.1101/2024.02.08.579413 [DOI]
- Sashittal P, Schmidt H, Chan M et al. Startle: a star homoplasy approach for CRISPR-Cas9 lineage tracing. Cell Syst 2023;14:1113–21.e9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schaar AC, Tejada-Lapuerta A, Palla G et al. Nicheformer: a foundation model for single-cell and spatial omics. bioRxiv 2024;2024.04.15.589472. [DOI] [PMC free article] [PubMed]
- Schade AE, Perurena N, Yang Y et al. AKT and EZH2 inhibitors kill TNBCs by hijacking mechanisms of involution. Nature 2024;635:755–63. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schäfer PSL, Dimitrov D, Villablanca EJ et al. Integrating single-cell multi-omics and prior biological knowledge for a functional characterization of the immune system. Nat Immunol 2024;25:405–17. [DOI] [PubMed] [Google Scholar]
- Schaff DL, Fasse AJ, White PE et al. Clonal differences underlie variable responses to sequential and prolonged treatment. Cell Syst 2024;15:213–26.e9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schiebinger G, Shu J, Tabaka M et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell 2019;176:928–43.e22. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schiffman JS, D'Avino AR, Prieto T et al. Defining heritability, plasticity, and transition dynamics of cellular phenotypes in somatic evolution. Nat Genet 2024;56:2174–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sebé-Pedrós A, Saudemont B, Chomsky E et al. Cnidarian cell type diversity and regulation revealed by whole-organism single-cell RNA-seq. Cell 2018;173:1520–34.e20. [DOI] [PubMed] [Google Scholar]
- Seninge L, Anastopoulos I, Ding H et al. VEGA is an interpretable generative model for inferring biological network activity in single-cell transcriptomics. Nat Commun 2021;12:5684. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sha Y, Qiu Y, Zhou P et al. Reconstructing growth and dynamic trajectories from single-cell transcriptomics data. Nat Mach Intell 2024;6:25–39. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shaffer SM, Emert BL, Reyes Hueros RA et al. Memory sequencing reveals heritable single-cell gene expression programs associated with distinct cellular behaviors. Cell 2020;182:947–59.e17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shapiro E, Biezuner T, Linnarsson S. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat Rev Genet 2013;14:618–30. [DOI] [PubMed] [Google Scholar]
- Shen H, Shen X, Feng M et al. A universal approach for integrating super large-scale single-cell transcriptomes by exploring gene rankings. Brief Bioinf 2022;23:bbab573. [DOI] [PubMed] [Google Scholar]
- Shin D, Lee W, Lee JH et al. Multiplexed single-cell RNA-seq via transient barcoding for simultaneous expression profiling of various drug perturbations. Sci Adv 2019;5:eaav2249. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shoval O, Alon U. SnapShot: network motifs. Cell 2010;143:326–e1. [DOI] [PubMed] [Google Scholar]
- Shu J, Wu C, Wu Y et al. Induction of pluripotency in mouse somatic cells with lineage specifiers. Cell 2013;153:963–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Skene PJ, Henikoff S. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. eLife 2017;6:e21856. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Skinnider MA, Squair JW, Kathe C et al. Cell type prioritization in single-cell data. Nat Biotechnol 2021;39:30–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Song B, Liu D, Dai W et al. Decoding heterogeneous single-cell perturbation responses. Nat Cell Biol 2025;27:493–504. 10.1038/s41556-025-01626-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Specht H, Emmott E, Petelski AA et al. Single-cell proteomic and transcriptomic analysis of macrophage heterogeneity using SCoPE2. Genome Biol 2021;22:50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Spemann H, Mangold H (née Pröscholdt). Induction of embryonic primordia by implantation of organizers from a different species. Cells Dev 2024;178:203940. [DOI] [PubMed] [Google Scholar]
- Spitz F, Furlong EEM. Transcription factors: from enhancer binding to developmental control. Nat Rev Genet 2012;13:613–26. [DOI] [PubMed] [Google Scholar]
- Srivatsan SR, McFaline-Figueroa JL, Ramani V et al. Massively multiplex chemical transcriptomics at single-cell resolution. Science 2020;367:45–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Srivatsan SR, Regier MC, Barkan E et al. Embryo-scale, single-cell spatial transcriptomics. Science 2021;373:111–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Statello L, Guo C-J, Chen L-L et al. Gene regulation by long non-coding RNAs and its biological functions. Nat Rev Mol Cell Biol 2021;22:96–118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stuart T, Butler A, Hoffman P et al. Comprehensive integration of single-cell data. Cell 2019;177:1888–902.e21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stuart T, Satija R. Integrative single-cell analysis. Nat Rev Genet 2019;20:257–72. [DOI] [PubMed] [Google Scholar]
- Subramanian A, Narayan R, Corsello SM et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 2017;171:1437–52.e17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sun M, Zhang J. Allele-specific single-cell RNA sequencing reveals different architectures of intrinsic and extrinsic gene expression noises. Nucleic Acids Res 2020;48:533–47. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Swain PS, Elowitz MB, Siggia ED. Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc Natl Acad Sci USA 2002;99:12795–800. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Szałata A, Hrovatin K, Becker S et al. Transformers in single-cell omics: a review and new perspectives. Nat Methods 2024;21:1430–43. [DOI] [PubMed] [Google Scholar]
- Takahashi K, Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 2006;126:663–76. [DOI] [PubMed] [Google Scholar]
- Takei Y. Spatial multi-omics of nuclear architecture with two-layer seqFISH+. Nat Rev Genet 2025;26:582–3. [DOI] [PubMed] [Google Scholar]
- Takei Y, Yang Y, White J et al. Spatial multi-omics reveals cell-type-specific nuclear compartments. Nature 2025;641:1037–47. [DOI] [PubMed] [Google Scholar]
- Tanay A, Regev A. Scaling single-cell genomics from phenomenology to mechanism. Nature 2017;541:331–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tang L. Sequencing single cells without killing. Nat Methods 2022;19:1166. [DOI] [PubMed] [Google Scholar]
- Teague S, Primavera G, Chen B et al. Time-integrated BMP signaling determines fate in a stem cell model for early human development. Nat Commun 2024;15:1471. [DOI] [PMC free article] [PubMed] [Google Scholar]
- The TRACERx Consortium, The PEACE Consortium, Abbosh C et al. Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution. Nature 2017;545:446–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Theodoris CV, Xiao L, Chopra A et al. Transfer learning enables predictions in network biology. Nature 2023;618:616–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Traag VA, Waltman L, Van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 2019;9:5233. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tran HTN, Ang KS, Chevrier M et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol 2020;21:12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tran M, Askary A, Elowitz MB. Lineage motifs as developmental modules for control of cell type proportions. Dev Cell 2024;59:812–26.e3. [DOI] [PubMed] [Google Scholar]
- Trapnell C. Defining cell types and states with single-cell genomics. Genome Res 2015;25:1491–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Trapnell C, Cacchiarelli D, Grimsby J et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 2014;32:381–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tritschler S, Büttner M, Fischer DS et al. Concepts and limitations for learning developmental trajectories from single cell genomics. Development 2019;146:dev170506. [DOI] [PubMed] [Google Scholar]
- Tusi BK, Wolock SL, Weinreb C et al. Population snapshots predict early haematopoietic and erythroid hierarchies. Nature 2018;555:54–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Udomlumleart T, Hu S, Garg S. Lineages of embryonic stem cells show non-Markovian state transitions. iScience 2021;24:102879. [DOI] [PMC free article] [PubMed] [Google Scholar]
- VanHorn S, Morris SA. Next-generation lineage tracing and fate mapping to interrogate development. Dev Cell 2021;56:7–21. [DOI] [PubMed] [Google Scholar]
- Waddington CH. Canalization of development and the inheritance of acquired characters. Nature 1942;150:563–5. [DOI] [PubMed] [Google Scholar]
- Waddington CH. The Strategy of the Genes. London, UK: Routledge, 2014. [Google Scholar]
- Wagner A, Regev A, Yosef N. Revealing the vectors of cellular identity with single-cell genomics. Nat Biotechnol 2016;34:1145–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wagner DE, Klein AM. Lineage tracing meets single-cell omics: opportunities and challenges. Nat Rev Genet 2020;21:410–27. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Walsh LA, Alvarez MJ, Sabio EY et al. An integrated systems biology approach identifies TRIM25 as a key determinant of breast cancer metastasis. Cell Rep 2017;20:1623–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang C, Cui H, Zhang A et al. scGPT-spatial: continual pretraining of single-cell foundation model for spatial transcriptomics. bioRxiv 2025;2025.02.05.636714. 10.1101/2025.02.05.636714 [DOI]
- Wang J, Xu L, Wang E et al. The potential landscape of genetic circuits imposes the arrow of time in stem cell differentiation. Biophys J 2010;99:29–39. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang J, Zhang K, Xu L et al. Quantifying the Waddington landscape and biological paths for development and differentiation. Proc Natl Acad Sci USA 2011;108:8257–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang S-W, Herriges MJ, Hurley K et al. CoSpar identifies early cell fate biases from single-cell transcriptomic and lineage information. Nat Biotechnol 2022a;40:1066–74. [DOI] [PubMed] [Google Scholar]
- Wang X, Allen WE, Wright MA et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 2018a;361:eaat5691. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang X, Shen X, Chen S et al. Reinvestigation of classic T cell subsets and identification of novel cell subpopulations by single-cell RNA sequencing. J Immunol 2022b;208:396–406. [DOI] [PubMed] [Google Scholar]
- Wang Y, Li Y, Yue M et al. N6-methyladenosine RNA modification regulates embryonic neural stem cell self-renewal through histone modifications. Nat Neurosci 2018b;21:195–206. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang Y, Waters J, Leung ML et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 2014;512:155–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009;10:57–63. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weinberger E, Lin C, Lee S-I. Isolating salient variations of interest in single-cell data with contrastiveVI. Nat Methods 2023;20:1336–45. [DOI] [PubMed] [Google Scholar]
- Weinreb C, Klein AM. Lineage reconstruction from clonal correlations. Proc Natl Acad Sci USA 2020;117:17041–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weinreb C, Rodriguez-Fraticelli A, Camargo FD et al. Lineage tracing on transcriptional landscapes links state to fate during differentiation. Science 2020;367:eaaw3381. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weinreb C, Wolock S, Tusi BK et al. Fundamental limits on dynamic inference from single-cell snapshots. Proc Natl Acad Sci USA 2018;115:E2467–76. 10.1073/pnas.1714723115 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Welch JD, Kozareva V, Ferreira A et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 2019;177:1873–87.e17. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wen H, Tang W, Dai X et al. CellPLM: pre-training of cell language model beyond single cells. bioRxiv 2023;2023.10.03.560734. 10.1101/2023.10.03.560734 [DOI]
- Weng C, Yu F, Yang D et al. Deciphering cell states and genealogies of human haematopoiesis. Nature 2024;627:389–98. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wessels H-H, Méndez-Mancilla A, Hao Y et al. Efficient combinatorial targeting of RNA transcripts in single cells with Cas13 RNA Perturb-seq. Nat Methods 2023;20:86–94. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Womersley HJ, Muliaditan D, DasGupta R et al. Single-nucleus CUT&RUN elucidates the function of intrinsic and genomics-driven epigenetic heterogeneity in head and neck cancer progression. Genome Res 2025;35:162–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wu Y, Wershof E, Schmon SM et al. PerturBench: benchmarking machine learning models for cellular perturbation analysis. arXiv 2024;2408.10609.
- Xie L, Liu H, You Z et al. Comprehensive spatiotemporal mapping of single-cell lineages in developing mouse brain by CRISPR-based barcoding. Nat Methods 2023;20:1244–55. [DOI] [PubMed] [Google Scholar]
- Xie S, Duan J, Li B et al. Multiplexed engineering and analysis of combinatorial enhancer activity in single cells. Mol Cell 2017;66:285–99.e5. [DOI] [PubMed] [Google Scholar]
- Xue H-C, Xu Z-G, Liu Y-J et al. A unified cell atlas of vascular plants reveals cell-type foundational genes and accelerates gene discovery. Cell 2025;188:6370–90.e29. [DOI] [PubMed] [Google Scholar]
- Yang D, Jones MG, Naranjo S et al. Lineage tracing reveals the phylodynamics, plasticity, and paths of tumor evolution. Cell 2022;185:1905–23.e25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang L, Zhu Y, Yu H et al. scMAGeCK links genotypes with multiple phenotypes in single-cell CRISPR screens. Genome Biol 2020;21:19. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang X, Liu G, Feng G et al. GeneCompass: deciphering universal gene regulatory mechanisms with a knowledge-informed cross-species foundation model. Cell Res 2024;34:830–45. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ye Z, Sabatier P, Van Der Hoeven L et al. Enhanced sensitivity and scalability with a Chip-Tip workflow enables deep single-cell proteomics. Nat Methods 2025;22:499–509. 10.1038/s41592-024-02558-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yeo GHT, Saksena SD, Gifford DK. Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions. Nat Commun 2021;12:3222. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yuan Z, Zhou Q, Cai L et al. SEAM is a spatial single nuclear metabolomics method for dissecting tissue microenvironment. Nat Methods 2021;18:1223–32. [DOI] [PubMed] [Google Scholar]
- Zafar H, Lin C, Bar-Joseph Z. Single-cell lineage tracing by integrating CRISPR-Cas9 mutations with transcriptomic data. Nat Commun 2020;11:3055. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang J, Cammarata L, Squires C et al. Active learning for optimal intervention design in causal models. Nat Mach Intell 2023;5:1066–75. [Google Scholar]
- Zhang L, Nie Q. scMC learns biological variation through the alignment of multiple single-cell genomics datasets. Genome Biol 2021;22:10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang N, Long Y, Xia S et al. SpaFoundation: a visual foundation model for spatial transcriptomics. bioRxiv 2025;2025.08.07.669202.
- Zhang Y, Donaher JL, Das S et al. Genome-wide CRISPR screen identifies PRC2 and KMT2D-COMPASS as regulators of distinct EMT trajectories that contribute differentially to metastasis. Nat Cell Biol 2022;24:554–64. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhao K, So H-C, Lin Z. scParser: sparse representation learning for scalable single-cell RNA sequencing data analysis. Genome Biol 2024;25:223. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhao S, Luo Y, Yang G et al. SToFM: a multi-scale foundation model for spatial transcriptomics. arXiv 2025;2507.11588.
- Zheng Y, Schupp J, Adams T et al. Unagi: deep generative model for deciphering cellular dynamics and in-silico drug discovery in complex diseases. Res Sq 2023;rs.3.rs-3676579. 10.21203/rs.3.rs-3676579/v1 [DOI] [PMC free article] [PubMed]
- Zhong H, Han W, Gomez-Cabrero D et al. Benchmarking cross-species single-cell RNA-seq data integration methods: towards a cell type tree of life. Nucleic Acids Res 2025;53:gkae1316. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhu X, Xu Z, Wang G et al. Single-cell resolution analysis reveals the preparation for reprogramming the fate of stem cell niche in cotton lateral meristem. Genome Biol 2023;24:194. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zinati Y, Takiddeen A, Emad A. GRouNdGAN: GRN-guided simulation of single-cell RNA-seq data using causal generative adversarial networks. Nat Commun 2024;15:4055. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zuo Z, Cheng X, Ferdous S et al. Single cell dual-omic atlas of the human developing retina. Nat Commun 2024;15:6792. [DOI] [PMC free article] [PubMed] [Google Scholar]
