This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 Sep 1:2024.08.30.610498. [Version 1] doi: 10.1101/2024.08.30.610498

Dissecting reversible and irreversible single cell state transitions from gene regulatory networks

Daniel A Ramirez 1,2, Mingyang Lu 1,2,*
PMCID: PMC11384016  PMID: 39257745

Abstract

Understanding cell state transitions and their governing regulatory mechanisms remains one of the fundamental questions in biology. We develop a computational method, state transition inference using cross-cell correlations (STICCC), for predicting reversible and irreversible cell state transitions at single-cell resolution by using gene expression data and a set of gene regulatory interactions. The method is inspired by the fact that the gene expression time delays between regulators and targets can be exploited to infer past and future gene expression states. From applications to both simulated and experimental single-cell gene expression data, we show that STICCC-inferred vector fields capture basins of attraction and irreversible fluxes. By connecting regulatory information with systems’ dynamical behaviors, STICCC reveals how network interactions influence reversible and irreversible state transitions. Compared to existing methods that infer pseudotime and RNA velocity, STICCC provides complementary insights into the gene regulation of cell state transitions.

Keywords: Gene regulatory network (GRN), systems biology, single cell RNA-seq, RNA velocity, cell state transition, basin of attraction, time delay, irreversible transition

Introduction

A key question in biology is how gene regulatory networks (GRNs) control biological processes and how this knowledge can shed light on therapeutic strategies for diseases1,2. A central focus in recent years has been the gene networks that control cell state transitions during healthy developmental processes, such as cell differentiation3 and cell cycle progression4, and during disease development, such as tumorigenesis5 and fibrosis6. In many cases, cell state transitions are irreversible due to feedback control or epigenetic mechanisms7; for example, in the cell cycle, irreversible transitions were found to be achieved by a negative feedback loop creating a one-way toggle switch8. In some other systems, cells can interconvert between distinct stable states stochastically or in response to specific stimuli, such as when certain differentiated cell types dedifferentiate in response to tissue damage9,10. These mechanisms may be combined to permit complex decision-making between multiple cell fates according to environmental cues11,12. Single cell transcriptomics has become a popular technology to identify distinct transcriptional cell states13, paths of transitions between states14, and cell-to-cell variations within each state15. However, it remains challenging to reveal complex patterns of cell state transitions and their underlying regulatory mechanisms. A recent computational approach, named RNA velocity, estimates the instantaneous rate of change of gene expression for cells using single cell RNA sequencing (scRNA-seq) data16. In the RNA velocity method, a simple kinetic model is established for each gene to describe RNA processing from the unspliced form to the spliced form, using measured abundances from scRNA-seq data. Because of the time delays during RNA processing, RNA velocity can exploit the memory present in the read counts to predict changes in gene expression.
RNA velocity has been demonstrated to capture cell state transitions in several applications17–19 and has been generalized to consider the temporal sequences of other regulatory events during gene regulation20,21. This method has also been used to infer pseudotime trajectories during cell differentiation and to infer gene regulatory interactions22–24. The ability of RNA velocity to infer gene expression dynamics from static data has made it a powerful tool in single cell transcriptomics. However, existing methods based on RNA velocity have a few limitations. First, these methods only consider the regulatory events of individual genes and fall short in characterizing the relationship between cell state transitions and GRNs. Second, from a dynamical-systems view, state transitions can be reversible or irreversible; however, this feature of state transitions has rarely been captured in a typical RNA velocity analysis.

To address these limitations, here we develop a new computational tool, named state transition inference using cross-cell correlation (STICCC), to predict the irreversible and reversible directions of state transitions at single-cell resolution using gene expression data and a set of regulator-target interactions (Fig. 1A). This approach assumes that changes in regulators’ activity/expression should precede changes in targets’ expression – and therefore, cells’ future expression state can be inferred from their current regulatory state. The core of the approach is a metric named cross-cell correlation (CCC), which measures the association between regulator activity/expression in one cell and corresponding target gene expression in a different cell with similar gene expression25. A high CCC indicates that the latter cell represents a likely time evolution of the former, according to the provided regulator-target relationships. Using the distribution of CCCs from a cell to its neighboring cells (“neighbor” as defined in the gene expression space), one can obtain a vector representing the likely direction of future transition for one cell. Similarly, another transition vector can be inferred using the distribution of CCCs from the neighboring cells to the center cell, indicating the likely direction towards prior states. Using both the incoming and outgoing vectors allows us to decompose the state transitions into the reversible and irreversible components. Compared to existing trajectory inference approaches, such as pseudotime and RNA velocity, STICCC has the unique advantage of closely integrating regulatory information into the modeling of cell state transitions and dissecting the reversible and irreversible state transitions.

Figure 1. Overview of the method STICCC.

A) Workflow of a new algorithm to infer single cell state transition. The method takes single cell gene expression data and a set of regulator-target relationships (or topology of a transcriptional regulatory network) as the input. For each cell, STICCC predicts a vector pointing towards its future state by using cross cell correlations (CCCs). Finally, local smoothing is applied to generate global state transition vectors. B) CCC between a center cell i and its neighboring (proximal in gene expression) cell j, ρij, is defined as the Pearson correlation of regulator expression in cell i with corresponding target expression in cell j. The regulator expression is inverted for inhibitory edges to preserve the sign of the correlation. Here, a repressilator (REP) gene circuit is illustrated as an example. The panel shows the circuit diagram, circuit topology (R: regulators, T: targets, S: regulation signs), and the definition of CCC. C) Illustration of the inference of outgoing transition vectors by STICCC. The top left panel illustrates a scatter plot of single cell gene expression in low dimensional projection. The top right panel shows the relationship between the calculated value of ρij for each neighboring cell j on the x-axis and the predicted value of ρij based on the linear regression. The bottom panels show a zoomed-in view of the gene expression space surrounding the center cell i (red), with the CCC values from cell i to a neighboring cell j (bottom left) or from each neighboring cell to cell i (bottom right) depicted in a blue gradient. The transition vectors align with the gradient of CCCs, which can be inferred through multiple linear regression. D) Illustration of cell-wise predictions of outgoing transition vectors for a full dataset. E) Smoothed global state transition patterns from D. F) Net flow and reversibility are defined as the average and half the difference of incoming and outgoing vectors, respectively.

In the following, we first summarize the methodology of STICCC, with the details provided in the Methods section. We then apply STICCC to several simulated gene expression datasets derived from synthetic gene regulatory circuits. We demonstrate that the algorithm can recapitulate basins of attraction and irreversible flux, both of which can be associated with the dynamical behaviors of the corresponding gene circuits. Inferred transition vectors are robust to noise from dropouts and agree with simulated expression trajectories. Furthermore, we illustrate the application of STICCC on several experimental single-cell gene expression datasets to elucidate diverse patterns of cell state transitions. By comparing state transition vectors from slightly perturbed network topologies, STICCC can also explore how reversible and irreversible state transitions are influenced by regulators and regulatory interactions. STICCC is available as a free R package to the community on GitHub: https://github.com/lusystemsbio/viccc/.

Results

Vector inference of single cell state transitions

We develop the STICCC method to infer single cell state transition vectors from single-cell gene expression data and a list of regulator-target gene relationships. STICCC utilizes a metric termed cross-cell correlation (CCC, Eq. 4 in Methods), based on delayed correlation25–27, to uncover the transition propensity between two single cells (Fig. 1B). By assuming a linear relationship between the CCCs from a center cell i towards its neighboring cells and the cells’ relative gene expression, we can infer the outgoing transition vector v1 of cell i using multiple linear regression (Fig. 1C and Eq. 5 in Methods). The same process can be applied to each cell in a dataset to describe the collective transition pattern (Fig. 1D). For easier visual interpretation, we apply inverse distance-weighted smoothing of the cell-wise vectors on a uniform grid (Fig. 1E, see Methods), which yields a clear vector-field representation of cell state transitions.
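The two steps described above (computing CCCs, then regressing them on expression differences) can be illustrated with a minimal sketch. This is not the STICCC package code, and it does not reproduce Eqs. 4 and 5 of the Methods; it assumes that "inverting" a regulator for an inhibitory edge means negating its (standardized) expression, and that the transition vector is the slope of an ordinary least-squares fit with an intercept. The function names `cross_cell_correlation`, `fit_gradient`, and `outgoing_vector` are hypothetical.

```python
import numpy as np

def cross_cell_correlation(x_i, x_j, regulators, targets, signs):
    """CCC from cell i to cell j over the network edges.

    Edge k runs regulators[k] -> targets[k]; signs[k] is +1 (activation)
    or -1 (inhibition). Assumption: inversion is implemented as negation.
    """
    reg = np.asarray(signs, dtype=float) * np.asarray(x_i, dtype=float)[regulators]
    tgt = np.asarray(x_j, dtype=float)[targets]
    # Pearson correlation between regulator values (cell i) and target values (cell j)
    return float(np.corrcoef(reg, tgt)[0, 1])

def fit_gradient(dx, rho):
    """Slope of a least-squares fit rho ~ intercept + dx @ v."""
    design = np.column_stack([np.ones(len(dx)), dx])
    coef, *_ = np.linalg.lstsq(design, rho, rcond=None)
    return coef[1:]  # drop the intercept

def outgoing_vector(expr, i, neighbors, regulators, targets, signs):
    """Direction in expression space along which CCC(i -> j) increases."""
    rho = np.array([
        cross_cell_correlation(expr[i], expr[j], regulators, targets, signs)
        for j in neighbors
    ])
    dx = expr[neighbors] - expr[i]  # relative expression of each neighbor
    return fit_gradient(dx, rho)

# Repressilator topology: each gene inhibits the next one in the ring.
rep_regulators, rep_targets, rep_signs = [0, 1, 2], [1, 2, 0], [-1, -1, -1]
```

Under the same assumptions, an incoming vector would be obtained by swapping the roles of the two cells in the CCC (neighbor as regulator cell, center cell as target cell).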

To illustrate the inference of the outgoing transition vectors, we applied it to simulated gene expression data from a repressilator (REP) gene circuit consisting of three genes sequentially inhibiting each other (Fig. 1B)28. The gene expression dynamics of the REP circuit were simulated by a typical ordinary differential equation (ODE) model with Hill kinetics for gene regulation and linear terms for protein degradation (see Methods for detailed modeling procedures). Such a model can generate oscillatory gene expression dynamics, consistent with previous theoretical and experimental studies of the circuit28,29. When simulating the circuit from two different initial conditions, we generated time trajectories that gradually converge to a limit cycle in different ways (Fig. S1). For each case, we applied STICCC to the gene expression snapshots extracted from the simulated time trajectories, and we found the inferred outgoing transition vectors form a clockwise circular flow in the PCA projection, largely consistent with the directions along the simulation trajectory (Fig. S1). Interestingly, the linear-regression-based algorithm successfully recovered the state transition patterns even though the circuit gene expression dynamics were generated by nonlinear differential equations.
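For readers unfamiliar with this type of model, the following is a minimal sketch of a three-gene repressilator with Hill-type repression and linear degradation, integrated with a simple forward Euler scheme. The parameter values (`g`, `k`, `threshold`, `n`) and the integration settings are illustrative assumptions, not the values used in the paper (those are given in its Methods and supplementary tables).

```python
import numpy as np

def repressilator_rhs(x, g=10.0, k=1.0, threshold=1.0, n=4):
    """dx/dt for a ring of three genes, each repressed by the previous gene.

    g: maximal production rate, k: degradation rate,
    threshold and n: Hill threshold and Hill coefficient (all hypothetical).
    """
    repressor = np.roll(x, 1)  # gene 0 is repressed by gene 2, gene 1 by gene 0, gene 2 by gene 1
    hill = 1.0 / (1.0 + (repressor / threshold) ** n)  # inhibitory Hill function
    return g * hill - k * x  # production minus linear degradation

def simulate(x0, dt=0.01, steps=20000):
    """Forward Euler integration; returns an array of shape (steps + 1, 3)."""
    traj = np.empty((steps + 1, 3))
    traj[0] = x0
    for t in range(steps):
        traj[t + 1] = traj[t] + dt * repressilator_rhs(traj[t])
    return traj

trajectory = simulate(np.array([1.0, 1.5, 2.0]))
# Rows of `trajectory` can be subsampled as expression "snapshots".
```

With these illustrative parameters the symmetric fixed point is unstable, so the trajectory settles onto sustained oscillations rather than a steady state.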

Next, we simulated the same circuit but using RACIPE, an ODE-based systems biology modeling framework for generating an ensemble of models with randomly sampled parameter sets from a GRN topology. The simulated gene expression data from RACIPE can be analogized to single-cell gene expression measurements with large cell-to-cell variability (see Methods)25,30,31. Unlike the above-mentioned simulations that exhibit oscillatory gene expression dynamics, most of the simulated expression profiles from RACIPE correspond to the stable steady states of dynamical models. Even such a noisy dataset, when supplied as the input to STICCC, yields a similar circular pattern of outgoing transition vectors (Fig. S2A), consistent with how REP operates as an oscillatory circuit. The observed flow was also robust to variations in dataset size: even in down-sampled sets of as few as 200 simulated models, the overall predicted vector field remains the same (Fig. S2B). In addition, transition vectors for this circuit were found to be dominated by tangential flow only beyond a certain radius from the origin (Fig. S2C). These results together demonstrate the capability of STICCC in inferring dynamical state transitions from snapshots of gene expression profiles.

STICCC captures the net flow and basins of attraction of synthetic circuits

Whereas v1 represents the likely future expression changes of cell i, STICCC also infers its likely prior states by an incoming transition vector v2, which evaluates how CCC values from neighboring cells to cell i change with respect to gene expression differences (Eq. 9 in Methods). Since the vectors represent the outgoing and incoming transitions of the same cell, the average of v1 and v2 reflects the net flow resulting from irreversible state transitions, while (v1 − v2)/2 indicates the reversibility of state transitions (Fig. 1F). This approach can therefore be applied to a range of systems and describe both reversible and irreversible cell state transitions.
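The decomposition can be written in a few lines. The sketch below follows the definitions in Fig. 1F (average and half the difference of the two vectors); taking the difference as v1 minus v2 is an assumption about the sign convention, and the function name `decompose` is hypothetical.

```python
import numpy as np

def decompose(v1, v2):
    """Split outgoing (v1) and incoming (v2) vectors into net flow and reversibility."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    net_flow = (v1 + v2) / 2.0       # component shared by both vectors
    reversibility = (v1 - v2) / 2.0  # component on which the two vectors disagree
    return net_flow, reversibility

# Directed flow: outgoing and incoming vectors agree, so reversibility vanishes.
net_a, rev_a = decompose([1.0, 0.5], [1.0, 0.5])

# Bidirectional transitions: the vectors are opposite, so net flow vanishes.
net_b, rev_b = decompose([1.0, 0.5], [-1.0, -0.5])
```

The two examples correspond to the limiting cases discussed in the text: a purely irreversible flow and a purely reversible transition.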

We applied STICCC to evaluate the inference of reversible and irreversible cell state transitions on several small synthetic gene regulatory circuits whose dynamical behaviors are well studied and intuitive. The first example is the same REP circuit we have discussed (Fig. 2A). We simulated single-cell gene expression data for a population of cells by RACIPE30 (see Methods). Using the ensemble of simulated steady-state gene expression values, STICCC predicts a continuous clockwise oscillation for both the outgoing vectors v1 and the incoming vectors v2 (Fig. 2B). Since the directions and magnitudes for the v1 and v2 vectors are mostly the same, the net flow is very similar to either vector. Meanwhile, the reversibility vectors have almost zero magnitude, supporting the expected directed flow observed in the REP circuit. A noteworthy feature of STICCC is its ability to robustly capture clear patterns of the irreversible state transitions despite noise and variability represented in the simulated models.

Figure 2. Reversible and irreversible state transitions of synthetic circuits.

Application of STICCC to synthetic circuits. Each row shows the outgoing vectors (2nd column), incoming vectors (3rd column), net flow (4th column) and reversibility (5th column) for each synthetic circuit (diagram in 1st column). The four synthetic circuits are the repressilator (REP, panels A–B), coupled toggle switch (CTS, panels C–D), incoherent feed forward loop (iFFL, panels E–F), and a toggle switch/repressilator (TS/REP, panels G–H). For each, 10,000 simulated gene expression profiles are projected onto the first two principal components (PCA loadings are shown in the 2nd column). Colored arrows illustrate the overall transition patterns for the outgoing vectors (green), incoming vectors (blue), and net flow/reversibility (purple).

Whereas STICCC reveals oscillatory patterns and irreversible transitions in the REP circuit, it can also characterize multi-stable circuits with reversible transitions. We next simulated single cell gene expression for a coupled toggle switch (CTS) (Fig. 2C). The circuit permits two major gene expression states, with high expression of one pair of genes, A/C, or another, B/D, in addition to two minor gene expression states characterized by high expression of A/D and B/C. Interestingly, the v2 vectors for the CTS circuit showed the opposite direction of transition as v1, making the magnitude of net flow near zero (Fig. 2D). The reversibility is highest in the regions of state transitions to and from the two major expression states, suggesting the circuit allows bidirectional transitions between the two major gene expression states (high expression of A/C and B/D, respectively). Cells in the two minor states (high expression of A/D and B/C, respectively) were predicted to move toward one of the two major states, and the magnitude of the v1 vectors diminishes as models move from the center of the PCA plot towards the center of each basin.

Next, we applied STICCC to an incoherent feedforward loop (iFFL), a gene circuit known to generate gene expression dynamics allowing excitation and adaptation32 (Fig. 2E). In this iFFL, gene X directly activates but indirectly inhibits downstream target Z through gene Y; the circuit allows a pulse-like response, where Z expression initially increases in response to X then decreases as the indirect regulation from Y takes effect 32. These time-dynamic behaviors are not apparent from simulated steady-state gene expression of a cell population alone, with no observable temporal patterns. Using STICCC, the v1 and v2 vectors capture state transitions in different regions of the phenotypic space, such that the net flow creates an oscillatory transition, while the reversibility vectors show two-way transitions between two basins (Fig. 2F). This combination of irreversible flows and basins of attraction explains the typical dynamics of an iFFL as follows. When the signaling node I increases, the cells would transit from the bottom-left basin to the top-right basin through the counterclockwise oscillatory path (Fig. S3, Table S1). This transition path allows Z expression to return to its original level from a deviation caused by the initial signaling in I. This behavior of an iFFL has been previously characterized using traditional systems biology modeling32, and here STICCC can capture such dynamical behaviors well with single-cell transition vectors.

The last synthetic circuit comprises a combination of a toggle switch motif (genes A and C) and a third gene B allowing a negative feedback loop (Fig. 2G). Thus, this circuit, termed the toggle switch/repressilator (TS/REP), couples a bistable motif and an oscillatory circuit motif. However, the simulated single cell gene expression distribution showed only two slightly separated basins in the gene expression space (Fig. 2H). On the other hand, the STICCC-inferred vectors exhibit an oscillation in the net flow, as well as reversible transitions between two basins. The net flow illustrates that the transitions between two basins follow two distinct paths according to the basin in which they begin (indicated by the purple arrows in the third panel from the left in Fig. 2H).

In conclusion, applied to these classic synthetic gene circuits, STICCC reveals their dynamical behaviors, such as properties of basins of attraction, transition paths between states, and oscillatory dynamics, most of which are not apparent from snapshots of single cell gene expression. Note that STICCC requires the regulator-target relationships of the gene regulatory circuit, but not the detailed rate equations and kinetic parameters.

STICCC is robust against technical noise in scRNA-seq data

The simulated gene expression steady states of a GRN were generated by RACIPE, which captures the effects of cell-to-cell variations by using a large range of kinetic parameters. The simulations did not explicitly model the technical noise introduced by sampling transcripts in scRNA-seq data, however. As such, the RACIPE simulation data for the REP and CTS circuits were post-processed to model dropouts using methods from the single-cell transcriptional regulation simulator BoolODE33. To evaluate the overall change in vector fields caused by simulated noise, we varied the dropout proportion and measured the pairwise change in angle for the predicted vector for each cell (net flow for the REP, reversibility for the CTS) from the original simulation, as well as a null distribution constructed from shuffling the indices of the original simulation. At low dropout proportions (< 30%), the angle changes remain relatively small for both the REP and CTS circuits. However, the CTS showed a larger deviation in transition vectors due to noise than the REP at higher dropout proportions (Fig. 3A–B). In both circuits, the effects of noise grow with the dropout rate, but the qualitative results are well preserved, even at higher noise levels (Fig. S4). Our results suggest that the reliance of STICCC on a local distribution of samples, rather than on a single cell, helps to mitigate the effects of technical noise in single-cell data.
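The dropout post-processing and the angle comparison can be sketched as follows. This is an illustrative reimplementation of the rule stated in the Figure 3 caption (values below a per-gene quantile are set to zero with a given probability), not BoolODE's own code; the function names `apply_dropout` and `angle_change` and their argument names are hypothetical.

```python
import numpy as np

def apply_dropout(expr, drop_prob, drop_quantile=0.2, rng=None):
    """Zero out low values of a cells-by-genes matrix.

    Entries below the per-gene `drop_quantile` threshold are set to zero
    independently with probability `drop_prob`.
    """
    rng = np.random.default_rng() if rng is None else rng
    expr = np.asarray(expr, dtype=float)
    thresholds = np.quantile(expr, drop_quantile, axis=0)  # one threshold per gene
    below = expr < thresholds
    dropped = below & (rng.random(expr.shape) < drop_prob)
    return np.where(dropped, 0.0, expr)

def angle_change(u, v):
    """Unsigned angle (radians) between paired rows of u and v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = np.sum(u * v, axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against rounding error
```

A null distribution in the spirit of the text could be obtained, under the same assumptions, by comparing the no-dropout vectors with a row-shuffled copy of themselves, for example `angle_change(vectors, vectors[rng.permutation(len(vectors))])`.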

Figure 3. State transition inference is robust against dropout and agrees with simulated expression trajectories.

Single cell gene expression data with various levels of dropout effects are simulated and applied to STICCC for the repressilator (REP, panel A) and coupled toggle switch circuits (CTS, panel B). Panels A and B show the distribution of angle changes of the single-cell vector predictions for various dropout rates, indicated on the x-axis and by color, compared to those for the condition with no dropout (labeled as ND). Null denotes the case when changes are computed by comparing the vectors from the ND condition between random pairs of cells. Other cases denote the conditions for different dropout levels α, where dropout simulations were performed using BoolODE, which sets read counts below a specified threshold (dropQuantile) for each gene to zero with a probability of dropProb. The parameter dropProb was specified as α shown in the x-axis and colors, and dropQuantile was fixed at 0.2. Boxes encompass the interquartile range (IQR) and mark the median with a horizontal line, and whiskers extend to 1.5 times the IQR. C) PCA density plot for stochastic trajectory of a REP model with a limit cycle. Contours (in red) indicate density along the whole trajectory, points indicate the deterministic limit cycle and are numbered 1–76 (indicated by light to dark blues). Arrows at several points illustrate net flow predicted by STICCC using the simulated gene expression and the REP network. D) Violin plot showing the distribution of observed angles (in radians) starting from each labeled point in the deterministic limit cycle. X-axis shows start points, violins show distribution of angles in the first two PCs of observed transition vectors, blue dashed line indicates tangential directions of the limit cycle trajectory, and purple dots are the angles of the predicted net flow. E) PCA density plot for stochastic trajectory of a CTS model switching between states. Reference points were selected along a linear path connecting basins, as numbered by 1–60 (by light to dark blues). 
Arrows are shown at selected points along the path to indicate reversibility predictions by STICCC. F) Violin plot of the observed and predicted angles along the path. For each starting point indicated on the x-axis, the violins show the distribution of observed angles from trajectory simulations, the blue dashed lines indicate the forward and backward directions along the path, and the purple dots indicate either end of the predicted reversibility vectors.

STICCC predicts state transitions observed in stochastic dynamical trajectories

STICCC predictions were also validated against time trajectories from the REP and CTS models. For REP, kinetic parameters of the differential equations were selected to produce a limit cycle and the time trajectory from random initial conditions was simulated with a low noise level for a long duration (achieved by modeling the corresponding stochastic differential equations in Eq. (2) in Methods; see model parameters in Table S2). A random subset of 2000 snapshots from this noisy trajectory was used as the input expression data for STICCC. We then sampled 76 points along the deterministic limit cycle and compared the predicted vector from the local neighborhood of each point to the distribution of observed vectors (Fig. 3C, arrows indicate predicted net flow). To calculate observed vectors, we selected points from the noisy time trajectory which were close to a particular point along the limit cycle, then found the corresponding gene expression states after a short time lag (see Methods). An observed vector was defined as the vector from the state at one initial timepoint to the state after a short time delay, i.e., a few steps forward in the trajectory. The STICCC predictions closely agree with the observed vectors and the general direction of the limit cycle (Fig. 3D, angles for net flow in purple, limit cycle in blue, and observed vectors on violin plots). Although there is variability due to noise and the slightly deformed shape of the limit cycle in the PCA projection, there is a consistent peak in observed vectors corresponding to the direction of the limit cycle. STICCC appears to slightly overestimate the angle, predicting an outward spiral instead of oscillation, which may reflect the influence of noise in creating larger oscillatory trajectories than observed in deterministic simulations.
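The construction of observed vectors can be illustrated with a short sketch that works on a two-dimensional (for example, PCA-projected) trajectory. The neighborhood radius and the time lag are hypothetical parameters here; the values and exact selection rule used in the paper are described in its Methods. The function name `observed_angles` is likewise hypothetical.

```python
import numpy as np

def observed_angles(traj, ref_point, radius, lag):
    """Angles (radians) of displacements that start near `ref_point`.

    traj: array of shape (T, 2), ordered in time.
    A start index is kept if its state lies within `radius` of `ref_point`;
    the observed vector runs from that state to the state `lag` steps later.
    """
    traj = np.asarray(traj, dtype=float)
    starts = traj[:-lag]  # every start must have a state `lag` steps ahead
    near = np.flatnonzero(np.linalg.norm(starts - ref_point, axis=1) < radius)
    disp = traj[near + lag] - traj[near]
    return np.arctan2(disp[:, 1], disp[:, 0])

# Toy check on a noiseless counterclockwise circle: near (1, 0) the motion
# is tangential, i.e. it points in the +y direction.
theta = np.linspace(0.0, 4.0 * np.pi, 2001)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
angles = observed_angles(circle, np.array([1.0, 0.0]), radius=0.05, lag=5)
```

For a noisy trajectory, the returned angles form a distribution at each reference point, which is what the violin plots in Fig. 3D and 3F summarize.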

To validate the CTS predictions, we simulated time trajectories for a specific CTS model with three steady states and sufficient noise to drive stochastic state transitions between basins (Table S3 for model parameters). Similarly, a random subset of 5000 snapshots from this noisy trajectory was used as the input expression data for STICCC. We chose a set of reference points along a linear path between the three basins to compare observed and predicted vectors (Fig. 3E, arrows indicate predicted reversibility). Observed vectors generally showed a bimodal distribution, with trajectories continuing into either basin. These observed peaks aligned with either end of the predicted reversibility vector, suggesting STICCC can capture bidirectional transition routes (Fig. 3F, vector angles for linear path in blue, bidirectional reversibility in purple, and distribution of observed vector angles on violin plots). In summary, the reversible and irreversible state transitions predicted by STICCC are largely consistent with the local state transitions observed in stochastic simulations of the circuit models.

Inferring signal-induced state transitions

We next evaluated whether STICCC would predict single-cell state transitions of a system with a varying signal input. We again simulated the CTS circuit (Fig. 4A), but this time beginning from the cells from the leftmost basin (Fig. 4C, t=0), and performed time dynamic simulations for each model with a varying signaling state (Fig. 4B). An increased signal (increasing when t ∈ [0, 20] and constant when t ∈ [20, 40]) would cause up to a 50-fold increase of the production rate of gene D, which can drive the cells to transit to the rightmost basin. In the simulation, we added a small amount of stochastic noise to permit cells to escape from the initial basin (see Methods). A decreasing signal (when t ∈ [40, 60]), on the other hand, should negate the effects, allowing some models to transit back to the original state. Gene expression snapshots were extracted at selected timepoints (t=0,1,3,40,60,80) and supplied as inputs to STICCC with the same circuit topology. In the presence of increased signal, as the cells transit to the target state, the vector fields gradually shift. A small net flow towards the target state appears during signaling (Fig. 4C, t=3,40,60). Meanwhile, reversibility vectors shifted under signaling such that the boundary between apparent basins moved further to the left (Fig. 4C, compare t=40,60). This corresponds to the expected effect of driving cells toward the B/D state, as a shorter duration and smaller amount of noise will be sufficient to cause transitions in the favored direction. Moreover, these changes began to revert as the signal was decreased and eventually removed, such that the portion of the vector field shown at t=0 is comparable to the same region at t=80. As shown in this more challenging test where the signaling information is not included in the input of the algorithm, STICCC can clearly detect the signal-induced state transitions from snapshots of single-cell gene expression data and circuit topology.
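The signal schedule can be written as a piecewise-linear function of time. The sketch below assumes that the signal acts as a multiplicative fold change on the production rate of gene D with a baseline of 1, rising linearly to 50 over the first 20 time units, holding until t = 40, and returning linearly to baseline by t = 60 (as described in the Figure 4 caption). The function name `signal_fold_change` is hypothetical.

```python
import numpy as np

def signal_fold_change(t, max_fold=50.0):
    """Fold change applied to the production rate of gene D at time t."""
    breakpoints = [0.0, 20.0, 40.0, 60.0]
    values = [1.0, max_fold, max_fold, 1.0]
    # np.interp holds the end values outside the breakpoint range,
    # so the fold change stays at baseline for t > 60.
    return np.interp(t, breakpoints, values)

snapshot_times = np.array([0.0, 1.0, 3.0, 40.0, 60.0, 80.0])
folds = signal_fold_change(snapshot_times)
```

Evaluating the function at the snapshot times listed in the text gives the signal level under which each snapshot was generated, which is useful when interpreting how the inferred vector fields shift over time.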

Figure 4. Characterization of signal-induced state transitions from snapshot data.

A) The CTS circuit was modified to include a manually controlled input signal I, which regulates the production rate of gene D. B) Time dependent signal strength. Production rate of gene D increased linearly up to 50-fold during t ∈ [0, 20], then remained at the high level during t ∈ [20, 40] before linearly decreasing back to baseline levels during t ∈ [40, 60]. C) Net flow and reversibility at selected timepoints, indicated by vertical red lines in panel B. Data is projected to the same PCA axes as the original CTS results. A small net flow towards the target state appears during signal induction and the apparent separation between basins suggested by the reversibility vectors shifts to the left; both changes largely revert upon signal removal.

Inferred vectors characterize state transitions from scRNA-seq data

Having tested the method on simulated datasets, we applied STICCC on three single-cell gene expression datasets. First, we applied STICCC to an scRNA-seq dataset of 976 budding yeast cells. For the input regulatory links, we used an established GRN model of yeast cell cycle (Fig. 5A)25. Dimensional reduction of 5624 genes measured in the scRNA-seq data using PCA revealed a circular structure that arranged cells in order of cell cycle phase annotations obtained from Seurat (see Methods) (Fig. 5B). Although the topology includes various types of regulation including transcriptional and signaling relationships, in this case we considered each interaction equally when computing cross-cell correlations. Since some of the genes in the GRN do not correspond to a single transcript, we applied the same mapping as in ref. 25 to select genes to retrieve gene expression data (Table S4). After applying STICCC, we found a circular pattern where the net flow forms a path along the order of cell cycle phases, with little reversibility overall. However, the reversibility predictions vary at different points along the cycle, with the ratio of net flow to reversibility reaching a minimum at the point between G2M and G1, consistent with the observation of Start during budding yeast cell cycle where cell cycle commitment would be achieved to allow irreversible cell cycle progression.

Figure 5. Velocity inference in an experimental scRNA-seq dataset of the budding yeast cell cycle.

A) The gene regulatory network topology for budding yeast cell cycle, with blue pointed arrowheads denoting activation and red circular arrowheads denoting inhibition. B) The inferred outgoing, incoming, net flow and reversibility vectors. Each plot shows the projection of gene expression profiles of network genes onto its first two principal components. Colors of points represent the cell cycle states. C) Summary of edge sensitivity analysis showing median magnitude difference between paired vectors across edge perturbations, where vectors are computed with part of the GRN intentionally left out. X-axis shows the differences in net flow, and y-axis shows the differences in reversibility. Red labels highlight the perturbations with significant changes in either net flow or reversibility. D) Net flow and reversibility with outgoing edges from CLB6 omitted. E) Net flow and reversibility with outgoing edges from HTA1 (DNA Synthesis) removed.

To uncover the potential roles of genes and interactions in different types of cell state transitions, one can also supply a modified topology alongside the same single cell gene expression data and compare the resulting vectors to the original results (see Methods and Fig. S5). In the cell cycle GRN, this edge sensitivity analysis highlighted links relating to CLB6 and DNA synthesis as key components in producing the observed oscillatory transitions (Fig. 5C). CLB6 is known to regulate the initiation of S phase, and removing related edges resulted in large changes to the predicted net flow, particularly near the cells in the S phase (Fig. 5D). When edges linked to DNA synthesis were removed, reversibility overall was greatly impacted, suggesting the importance of these edges in the dynamical behaviors of the cell cycle GRN (Fig. 5E). Prior work has suggested these genes and edges are significant actors in the control of the yeast cell cycle34, supporting the power of the edge sensitivity analysis in uncovering important network components, especially those that are nonredundant in the network. Notably, this approach permits uncovering influential nodes and edges from a topology without any detailed physical information about the nodes, implying that STICCC could effectively identify key regulators in less well-characterized systems. In summary, STICCC effectively captures the irreversibility and oscillatory nature of cell cycle stages and indicates which regulatory information most informs the predictions.

The next example is a single-cell qPCR dataset35 describing hematopoietic stem cells (HSCs) differentiated into either erythroids or myeloids using erythropoietin (EPO), granulocyte macrophage-colony stimulating factor (GM-CSF), and a combination of both drugs simultaneously. Gene expression was measured over six days after signal induction, and the stem cells bifurcated into two differentiated states. A GRN for HSC differentiation has been previously established through literature curation35 and served as the basis for which genes to measure in the qPCR. In our analysis, from the GRN genes, we further removed genes with low expression and variance (see Methods) (Fig. 6A). PCA on the combined dataset comprising cells receiving all three treatments revealed a bifurcating structure, where early timepoints for all treatments clustered together with untreated cells, then the populations diverged into the two differentiated cell lineages based on the treatment at later timepoints (Fig. 6B). The resulting net flow from STICCC had very small magnitudes, but the reversibility vectors indicated a bifurcation occurring around data from Day 3 into two differentiated states around data from Day 6. This is consistent with findings from the original work that the acquisition of erythroid- and myeloid-specific markers occurred mainly between Day 3 and Day 635. Strong reversibility vectors suggest the existence of multiple basins of attraction that are associated with various cell phenotypes during cell differentiation. The reversible cell state transitions may also be related to the rebellious cell states where some cells, upon symmetric destabilization of a progenitor, differentiate into the opposite phenotype of the inducing signal.
Interestingly, we did not observe single-cell vectors associated with the state transitions during the early timepoints (before Day 3)35, suggesting that the network may miss gene regulatory interactions involved in destabilizing the progenitor state. Moreover, similar patterns of vector fields were observed when STICCC was applied to data for each treatment condition separately (Fig. S6). Finally, edge sensitivity analysis of the circuit revealed Gata1 and Pu1 as highly influential (Fig. 6C), which is supported by experimental evidence indicating these genes as critical supporters of the erythroid and myeloid lineages, respectively36.

Figure 6. STICCC distinguishes cell state transitions with dominant net flow or reversibility.

A) Gene regulatory network topology for differentiation of hematopoietic stem cells. B) PCA of HSC gene expression data colored by timepoint, with grid smoothed vectors for net flow (left panel) and reversibility (right panel). Fig. S6 shows similar results for each treatment condition analyzed separately. C) Summary of influential edges and nodes from edge sensitivity analysis. Edge perturbations are labeled in red if they are in the top 15% of perturbations by combined change in median net flow and reversibility. D) NetAct inferred transcription factor regulatory network using the scRNA-seq data from A549 cells treated with TGF-β undergoing epithelial-mesenchymal transition. E) PCA plots of A549 gene expression data during EMT along with smoothed arrows indicating net flow (left panel) and reversibility (right panel) predictions. F) Summary of influential edges and nodes for EMT network, with the top 15% of perturbations labeled in red. X-axis denotes median paired change in vector magnitude for net flow, and y-axis denotes the same quantity for reversibility.

Lastly, we applied STICCC to scRNA-seq data characterizing TGF-β induced epithelial-mesenchymal transition (EMT) in a published dataset of 3133 A549 cancer cells, sequenced at five timepoints during the course of seven days’ exposure to TGF-β. The GRN was obtained from the method NetAct, which integrates information from transcription factor (TF)-target gene databases and TF activity inference to construct a dynamic transcriptional regulatory network (Fig. 6D)37. STICCC predicts a much stronger net flow than reversibility, where the net flow vectors consistently moved from the E population to the M population, as would be expected from a strong EMT-inducing signal (Fig. 6E, cells colored by TGFBI expression). TGF-β induced EMT has been characterized as irreversible12, although recent work suggests reversibility may occur on longer timescales through epigenetic mechanisms not included in the present network38. Edge sensitivity analysis highlighted AP-1 TF family members Jun39 and Fos40, as well as Smad3, a well-known mediator of the TGF-β signaling employed in the dataset41 (Fig. 6F). Overall, STICCC appears to perform well on experimental single-cell data, capturing known state transitions in the context of multi-step differentiation, bifurcation, and oscillatory dynamics. We expect STICCC to help generate new hypotheses regarding cell state transitions and the individual genes or edges driving them.

Discussion

In this study, we have presented a computational method for inferring reversible and irreversible cell state transitions from single-cell gene expression data and a set of regulatory interactions, which provides insight into the gene regulation of phenotypic state transitions. We first developed the method in the context of small synthetic circuits and simulated single cell gene expression data, where the circuits’ dynamics are known. We found that the structure of the governing GRN encodes sufficient information to recover oscillatory trajectories and multi-state distributions. Moreover, the incoming and outgoing cross-cell correlations (CCCs) provide complementary information, helping to distinguish reversible and irreversible interactions for a more complete characterization of cell state transitions. STICCC can accurately recapitulate the expected behaviors of multiple synthetic circuits, produce robust results in noisy conditions, and uncover the roles of different nodes and edges by edge sensitivity analysis. Moreover, STICCC can detect reversible and irreversible state transitions in a system driven by an external signal, even when the signal interaction is not present in the input circuit topology. For simulated gene circuits, STICCC reliably uncovers reversible and irreversible transition patterns and can describe the structure of phenotypes in multi-stable, oscillatory, or hybrid systems.

Following the success of STICCC in characterizing simulated data, we applied it to three experimental single-cell gene expression datasets. For the cell cycle dataset, the method recovered the expected direction of oscillation and identified varying levels of irreversibility among cell cycle phase transitions. Edge sensitivity analysis also uncovered significant drivers of cell cycle progression, which can help generate new hypotheses and suggest targets for genetic perturbation. STICCC also identified a bifurcation occurring during differentiation of hematopoietic stem cells and suggested an irreversible epithelial-mesenchymal transition during TGF-β induction in lung cancer cells. STICCC shows great promise to explore and decode gene networks governing complex cellular processes.

Some existing methods address similar questions, notably RNA velocity and related work, a set of methods aiming to predict short-term changes in gene expression by analyzing the presence of spliced and unspliced mRNA transcripts16,19,20,24,42. Our approach differs from RNA velocity in several key aspects. First, STICCC incorporates prior knowledge of the GRN, rather than a kinetic model of mRNA splicing, to generate transition vectors. Second, STICCC directly relies on a local distribution of samples to compute each vector, mitigating some effect of noise in the input data. Another recent work in this area is scKINETICS, which estimates regulatory velocity using TF-target relationships as the basis for a dynamical model and constrains velocity predictions based on chromatin accessibility and the observed manifold of gene expression states43. Whereas scKINETICS is based on a linear model of regulatory activity, we demonstrate that STICCC can predict state transitions from gene expression data associated with nonlinear regulatory dynamics. Moreover, we are not aware of existing work that explicitly addresses the question of disentangling reversible and irreversible state transitions, while STICCC provides a convenient framework for this.

We anticipate applications of STICCC in studying the dynamics and basic properties of GRN topologies. By systematically applying the algorithm to small circuit motifs, one could characterize circuits according to the prevalence of reversible or irreversible flows, expanding on other recent work addressing essential properties of circuit motifs44–48. Systematic patterns in the relationship between GRN topology and predicted flows could help to predict state changes in experimental data based on inferred and a priori known interactions. Conversely, the same mapping between circuits and dynamical behaviors may help to determine the regulatory circuitry underlying observed transition data. Influential edges and nodes can be prospectively identified and validated with perturbation studies to efficiently identify key regulators and targets for intervention in disease systems49,50. STICCC is a step forward in analyzing cell state transitions in scRNA-seq data with the advantage of integrating measured data with known gene regulatory interactions.

There are several aspects of STICCC worth further investigation in the future. First, the computational challenge of making a stable inference while minimizing the time cost remains. The current algorithm can analyze 10,000 models in a 4-node simulated circuit in ~10 minutes on a 2021 M1 MacBook Pro, but performance varies significantly with dataset size, distribution, and parameter settings. Re-implementation of major calculations in a more performant language such as C and parallelization of parts of the algorithm could yield significant improvements. Second, further work should be done to compare the reversibility predictions with high-resolution time-series scRNA-seq data, which, while a rapidly growing resource51, remains fairly sparse. Third, we do not distinguish different types of regulation, e.g., signaling as opposed to transcriptional regulation, which may operate on different timescales and need to be considered separately in the analysis.

Methods

GRN simulations

To simulate scRNA-seq data, we applied random circuit perturbation (RACIPE30, using sRACIPE31) to generate a set of 10,000 simulated gene expression profiles for each GRN. RACIPE is an algorithm designed to generate, from a GRN topology, an ensemble of ordinary differential equation (ODE)-based models with randomly generated kinetic parameters. Each model represents a distinct set of parameters that could capture cell-to-cell variability, and collectively the simulation results yield a distribution of steady-state gene expression profiles similar to that of single-cell sequencing data31. In brief, the dynamics for a target gene A regulated by genes Bi are modeled in RACIPE as in Eq. 1 below:

$$\frac{dA}{dt} = \frac{G_A}{\prod_i \lambda_{B_i A}} \prod_i \left( \lambda_{B_i A} + \frac{1 - \lambda_{B_i A}}{1 + \left( B_i / B_{iA}^{0} \right)^{n_{B_i A}}} \right) - k_A A,$$ (Eq. 1)

where $G_A$ and $k_A$ signify the maximum production rate and degradation rate of A, respectively, $\lambda_{B_i A}$ denotes the fold-change of A in response to regulation from $B_i$, $B_{iA}^{0}$ is the threshold level of regulator $B_i$, and $n_{B_i A}$ is the Hill coefficient. RACIPE assumes independent regulatory interactions, thus calculating the dynamics of a gene as the product of shifted Hill functions corresponding to each incoming regulatory interaction. Following the methodology of RACIPE, parameters are sampled from a uniform distribution which ensures a high degree of variability in dynamics and steady states.
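
As a minimal illustration of the shifted Hill form in Eq. 1, the following Python sketch evaluates the rate of change of one target gene. It is not the sRACIPE implementation: the function names (`shifted_hill`, `rate_of_change`) and all parameter values are hypothetical, and dividing each shifted Hill term by its fold-change is an assumption about how the production term is normalized.

```python
def shifted_hill(b, threshold, n, fold_change):
    """Shifted Hill function: equals 1 at b = 0 and tends to fold_change as b grows."""
    return fold_change + (1.0 - fold_change) / (1.0 + (b / threshold) ** n)


def rate_of_change(a, regulators, g_a, k_a):
    """dA/dt for target level `a`.

    Each regulator is a tuple (level, threshold, hill_coefficient, fold_change).
    Assumption: every shifted Hill term is divided by its fold-change.
    """
    production = g_a
    for level, threshold, n, fold_change in regulators:
        production *= shifted_hill(level, threshold, n, fold_change) / fold_change
    return production - k_a * a


# Hypothetical activator (fold-change 10) far above its threshold:
# production approaches the maximum rate g_a.
regs = [(1000.0, 10.0, 4, 10.0)]
print(rate_of_change(a=0.0, regulators=regs, g_a=50.0, k_a=0.1))
```

Because the regulatory terms enter as a product, adding a regulator only appends one tuple to `regulators`, which mirrors the independence assumption stated above.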

Noisy time trajectories were obtained via a similar approach, with stochastic differential equations with a Gaussian white noise term scaled to the average expression of each node, as described in ref. 31. The dynamics for a gene under such stochastic simulations are given by Eq. 2 below, where $\xi_A$ is the noise level for gene A and $\eta(t)$ represents Gaussian white noise, with zero mean and unit variance:

$$\frac{dA}{dt} = \frac{G_A}{\prod_i \lambda_{B_i A}} \prod_i \left( \lambda_{B_i A} + \frac{1 - \lambda_{B_i A}}{1 + \left( B_i / B_{iA}^{0} \right)^{n_{B_i A}}} \right) - k_A A + \xi_A \eta(t).$$ (Eq. 2)
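
A stochastic equation of the form of Eq. 2 can be integrated numerically, for example with the Euler-Maruyama scheme sketched below. This is a generic integrator chosen for illustration and is not necessarily the one used by sRACIPE; the function name, the clipping of negative values to zero, and the example parameters are all assumptions.

```python
import math
import random


def euler_maruyama(drift, a0, xi, dt, n_steps, seed=0):
    """Integrate dA/dt = drift(A) + xi * eta(t) with the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    a = a0
    path = [a]
    for _ in range(n_steps):
        # White noise contributes a Gaussian increment with variance dt.
        noise = xi * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Assumption: expression levels are clipped at zero.
        a = max(a + drift(a) * dt + noise, 0.0)
        path.append(a)
    return path


# Hypothetical unregulated gene: production 5, degradation rate 0.1 (steady state 50).
trajectory = euler_maruyama(lambda a: 5.0 - 0.1 * a, a0=0.0, xi=0.5, dt=0.01, n_steps=20000)
print(trajectory[-1])
```

Setting `xi=0` recovers the deterministic ODE, which is a convenient check that the drift term is coded correctly before noise is added.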

Signal-induced state transitions (results in Fig. 4) were modeled by adjusting the parameter for production of the signaling target gene and performing new time-series simulations, using the steady state solutions of the untreated condition as the starting point. In these perturbed simulations, the production rate parameter for the affected gene was multiplied by a factor $s(t)$, which linearly increases from the initial value of 1 to a specified fold-change, then remains at the target signaling level before linearly decreasing back to the initial value (see Fig. 4B for the signaling dynamics). Simulations of signal-driven transitions took the form of Eq. 3:

$$\frac{dA}{dt} = \frac{s(t)\, G_A}{\prod_i \lambda_{B_i A}} \prod_i \left( \lambda_{B_i A} + \frac{1 - \lambda_{B_i A}}{1 + \left( B_i / B_{iA}^{0} \right)^{n_{B_i A}}} \right) - k_A A.$$ (Eq. 3)
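
The ramp-hold-ramp profile of the signaling factor described above can be sketched as a piecewise linear function. The function name and its arguments (`t_on`, `t_ramp`, `t_hold`) are hypothetical, and equal durations for the rising and falling ramps are an assumption made for brevity.

```python
def signal_factor(t, fold_change, t_on, t_ramp, t_hold):
    """Multiplier on the production rate: 1 -> fold_change -> 1 (trapezoidal profile)."""
    if t <= t_on:
        return 1.0
    if t < t_on + t_ramp:
        # Linear increase from 1 to the target fold-change.
        return 1.0 + (fold_change - 1.0) * (t - t_on) / t_ramp
    if t <= t_on + t_ramp + t_hold:
        return float(fold_change)
    t_down = t - (t_on + t_ramp + t_hold)
    if t_down < t_ramp:
        # Linear decrease back to the initial value.
        return fold_change - (fold_change - 1.0) * t_down / t_ramp
    return 1.0


# Hypothetical schedule: signal starts at t=10, ramps for 20, holds for 30 time units.
print([signal_factor(t, 5.0, 10.0, 20.0, 30.0) for t in (0, 20, 40, 70, 100)])
```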

The package sRACIPE was also applied to simulate ODE time trajectories of specific individual simulated models. To do this, the method was applied as usual with the additional parameter printInterval, which recorded gene expression states at regular intervals as the ODE system evolved from random initial conditions. For the repressilator (REP) circuit (results in Fig. 3C, D), the parameters used to generate an oscillatory trajectory are listed in Table S2, and the parameters for the multi-stable CTS model (results in Fig. 3E, F) are in Table S3. All simulation data from RACIPE were normalized using the included function sracipeNormalize, which simply log transforms simulated data with a pseudocount of 1 added.

Dimensional reduction

We performed principal component analysis (PCA) on the log-normalized gene expression data to obtain a clear representation of the phenotypic state space. For datasets with a large number of genes sequenced, namely the EMT and cell cycle datasets used here, the first 15 principal components were used for vector inference instead of the gene expression matrix to save computational cost.

STICCC algorithm

We develop a computational algorithm STICCC to infer the reversible and irreversible vectors of state transitions from single cell gene expression data and a set of gene regulatory interactions. First, the CCC between two cells i and j, $\rho_{ij}$, is defined by a Pearson correlation (Fig. 1B)

$$\rho_{ij} = \mathrm{corr}\left( e_{R,i} \circ s,\; e_{T,j} \right),$$ (Eq. 4)

where $e_{R,i}$ and $e_{T,j}$ are the activity or expression levels of a set of regulators for cell i and expression levels of a set of target genes for cell j, respectively; $s$ represents a sign vector indicating the interaction type (i.e., 1 for activation, and −1 for inhibition); $\circ$ denotes the Hadamard (or component-wise) product. The CCC represents the propensity of cell i to tend towards the gene expression state of cell j. The transition propensity from cell j to cell i, $\rho_{ji}$, can then be defined in a similar way but with the data source for regulators and targets reversed.
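
The definition of the CCC can be made concrete with a short Python sketch, assuming that each regulatory edge contributes one entry to the regulator and target vectors. The data layout (cells as dictionaries, edges as `(regulator, target, sign)` tuples), the function names, the three-gene example, and the convention of returning 0 for a constant vector are all hypothetical and do not describe the STICCC package interface.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def cross_cell_correlation(cell_i, cell_j, edges):
    """rho_ij: signed regulator levels in cell i versus target levels in cell j."""
    regulators = [sign * cell_i[reg] for reg, _, sign in edges]
    targets = [cell_j[tgt] for _, tgt, _ in edges]
    return pearson(regulators, targets)


# Hypothetical three-gene loop: A activates B, B activates C, C inhibits A.
edges = [("A", "B", 1), ("B", "C", 1), ("C", "A", -1)]
cell_i = {"A": 2.0, "B": 1.0, "C": 0.5}
cell_j = {"A": 0.4, "B": 2.5, "C": 1.2}
rho_ij = cross_cell_correlation(cell_i, cell_j, edges)
rho_ji = cross_cell_correlation(cell_j, cell_i, edges)
print(rho_ij, rho_ji)
```

Swapping the two cell arguments gives the reverse propensity, so the asymmetry between `rho_ij` and `rho_ji` comes entirely from which cell supplies the regulators.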

Next, from the single-cell gene expression profiles of a cell population, we infer a state transition vector for each cell by approximating it as the local gradient of CCC with respect to the gene expression changes (Fig. 1C). Given a starting cell i and a distribution of CCC values from cell i towards its neighboring cells (i.e., cells within a set radius in gene expression space, see Section Optimization of sampling radius for the selection of a user-defined sampling radius), the outgoing transition vector $v_1$ of cell i was computed using multiple linear regression

$$y_1 = X v_1 + \varepsilon,$$ (Eq. 5)

where $X$ is an n×m matrix of relative gene expression values for n neighboring cells from cell i. Each row of $X$, $X_{j,:}$, is the gene expression difference between a neighboring cell j and the starting cell i:

$$X_{j,:} = x_j - x_i,$$ (Eq. 6)

where $x_i$ represents either the normalized expression of m network genes or the values of m principal components for cell i. $y_1$ is a column vector of size n representing the relative outgoing CCC values, where its jth component

$$y_{1,j} = \rho_{ij} - \rho_{ii}.$$ (Eq. 7)

$\varepsilon$ is a vector for the noise term. Then, $v_1$ of cell i was computed by Eq. 8 using the least-squares estimator

$$v_1 = \left( X^{T} X \right)^{-1} X^{T} y_1.$$ (Eq. 8)

The outgoing transition vector represents the change in gene expression associated with the steepest increase in CCC, therefore suggesting a likely future expression state according to the GRN.
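
The regression step can be illustrated with a small, dependency-free Python sketch that solves the normal equations directly. The function names and the toy neighborhood are hypothetical; a production implementation would more likely call a numerical linear algebra routine, and the sketch raises `ZeroDivisionError` when the neighborhood does not span all m dimensions.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def transition_vector(center, neighbors, ccc_to_neighbors, ccc_self):
    """Least-squares gradient of CCC around one cell: v = (X^T X)^{-1} X^T y."""
    m = len(center)
    # Rows of X: expression difference between each neighbor and the starting cell.
    x_rows = [[nb[g] - center[g] for g in range(m)] for nb in neighbors]
    # y: CCC toward each neighbor relative to the cell's CCC with itself.
    y = [rho - ccc_self for rho in ccc_to_neighbors]
    xtx = [[sum(row[p] * row[q] for row in x_rows) for q in range(m)] for p in range(m)]
    xty = [sum(row[p] * yk for row, yk in zip(x_rows, y)) for p in range(m)]
    return solve(xtx, xty)


# Toy neighborhood in two dimensions with CCC values that rise along (0.3, -0.2).
center = (0.0, 0.0)
neighbors = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (1.0, 1.0)]
v1 = transition_vector(center, neighbors, [0.4, -0.1, -0.2, 0.3, 0.2], ccc_self=0.1)
print(v1)
```

In this toy case the CCC values are an exact linear function of the expression differences, so the estimator recovers the gradient without residual error; with real data the fit is only approximate.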

While $v_1$ illustrates the future state transitions of cell i, $y_1$ can also be replaced with $y_2$, a column vector of incoming relative CCC values from neighboring cells to cell i, as opposed to outgoing CCC from cell i toward its neighbors ($y_{2,j} = \rho_{ji} - \rho_{ii}$). We define

$$v_2 = -\left( X^{T} X \right)^{-1} X^{T} y_2,$$ (Eq. 9)

which indicates the direction of the transition from neighboring cells towards cell i. The minus sign in Eq. 9 is included so that $v_2$ can be interpreted as the direction of transition from cell i, similar to $v_1$ (Fig. 2A). In other words, whereas $v_1$ estimates the gradient of $\rho_{ij}$ to identify likely future states, $v_2$ estimates the reverse gradient of $\rho_{ji}$ to identify a transition from likely precursors. To integrate the information from outgoing and incoming transitions, we define the net flow as the average of $v_1$ and $v_2$, $(v_1 + v_2)/2$, which captures irreversible transition patterns, and reversibility as $(v_1 - v_2)/2$, which captures bidirectional state transitions.
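
The decomposition into net flow and reversibility amounts to taking the half-sum and half-difference of the two vectors, as in the following sketch (the function name and example vectors are hypothetical).

```python
def net_flow_and_reversibility(v1, v2):
    """Split outgoing (v1) and incoming (v2) vectors into net flow and reversibility."""
    net_flow = [(a + b) / 2.0 for a, b in zip(v1, v2)]
    reversibility = [(a - b) / 2.0 for a, b in zip(v1, v2)]
    return net_flow, reversibility


# v1 and v2 agree: the transition is purely directional (irreversible).
print(net_flow_and_reversibility([1.0, 0.0], [1.0, 0.0]))
# v1 and v2 oppose each other: no net flow, purely reversible.
print(net_flow_and_reversibility([1.0, 0.0], [-1.0, 0.0]))
```

The two components sum back to the outgoing vector and differ to give the incoming one, so no information is lost by reporting them instead of the original pair.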

Optimization of sampling radius

Since STICCC vectors are inferred based on the neighboring cells in gene expression space of each cell, it is important to carefully select the size of the neighborhood, which we represent as a fraction of the maximum pairwise distance in gene expression space between cells. A larger value will tend to include more cells in each neighborhood, helping mitigate the effect of noise in the gene expression data, but is more computationally expensive. Moreover, we found that the accuracy of the multiple regression deteriorated at larger sampling radii (Fig. S7). To select an optimal sampling radius for a dataset, we aim to maximize the number of cells with a minimum neighborhood size of 15, a proportion we term ‘coverage’, while minimizing the error in the multiple regression. In particular, we calculated median absolute percentage error (MAPE) to ensure robustness to both outlying cells and variance in the magnitude of inferred vectors between datasets. In practice, we applied a simple grid search to test 11 evenly spaced sampling radii between 0.05 and 0.3, selecting that which maximizes the ratio of coverage to MAPE. Although a more sophisticated optimization may reduce regression error further, we found the transition patterns were relatively robust to small changes in the selected radius. Based on several simulated and experimental datasets, we observed an optimal search radius between 7–20%; in general, smaller, sparser datasets require a larger search radius to provide sufficient coverage. The default value of the search radius in STICCC is thus set to 0.15, or 15% of the maximum pairwise distance.
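
The grid search over sampling radii can be sketched as follows, assuming Euclidean distances in the chosen expression or PCA space. All names are hypothetical, and `mape_for_radius` is a placeholder callback standing in for the regression error that would be computed at each radius; it is not part of the STICCC package.

```python
import math
from itertools import combinations


def coverage(points, radius_fraction, min_neighbors=15):
    """Fraction of cells with at least `min_neighbors` neighbors inside the radius."""
    max_dist = max(math.dist(p, q) for p, q in combinations(points, 2))
    radius = radius_fraction * max_dist
    covered = 0
    for i, p in enumerate(points):
        count = sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        covered += count >= min_neighbors
    return covered / len(points)


def candidate_radii(low=0.05, high=0.3, n=11):
    """Evenly spaced radii, expressed as fractions of the maximum pairwise distance."""
    step = (high - low) / (n - 1)
    return [round(low + k * step, 6) for k in range(n)]


def select_radius(points, mape_for_radius, radii=None):
    """Pick the radius that maximizes coverage divided by regression error."""
    radii = radii or candidate_radii()
    return max(radii, key=lambda r: coverage(points, r) / mape_for_radius(r))


print(candidate_radii())
```

The brute-force neighbor count is quadratic in the number of cells, which is acceptable for a sketch but would be replaced by a spatial index for large datasets.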

Edge sensitivity analysis

Given a starting GRN, the role of each member gene and interaction can be uncovered by removing them from the input topology and generating a new vector field on the same expression data, thus simulating a case where the ground truth network is partially unknown or incorrect. The difference between the untreated and perturbed vector fields then shows the influence of the gene or interaction omitted. This can be systematically repeated to quantify and compare the roles of many genes, links, or submodules of the network. The overall impact of a change to the topology is summarized as the median pairwise difference in vector magnitude between the untreated and perturbed vector field – in this way, each perturbation can be compared on the axes of median pairwise magnitude change in net flow and reversibility.
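
A minimal sketch of the perturbation summary is given below. The edge representation, function names, and example genes are hypothetical, and taking the absolute difference of paired vector magnitudes (rather than a signed difference) is an assumption about how the change is summarized.

```python
import math
import statistics


def drop_outgoing_edges(edges, regulator):
    """Return the topology with all edges leaving `regulator` removed."""
    return [edge for edge in edges if edge[0] != regulator]


def magnitude(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in vector))


def median_magnitude_change(original_vectors, perturbed_vectors):
    """Median over cells of the absolute change in vector magnitude (assumed definition)."""
    changes = [abs(magnitude(p) - magnitude(o))
               for o, p in zip(original_vectors, perturbed_vectors)]
    return statistics.median(changes)


# Hypothetical topology and per-cell vectors before and after a perturbation.
edges = [("CLB6", "GeneX", 1), ("GeneY", "CLB6", -1)]
print(drop_outgoing_edges(edges, "CLB6"))
original = [(3.0, 4.0), (0.0, 2.0), (1.0, 0.0)]
perturbed = [(0.0, 5.0), (0.0, 1.0), (0.0, 0.0)]
print(median_magnitude_change(original, perturbed))
```

Applying the same summary separately to the net flow and reversibility fields gives the two coordinates on which perturbations are compared.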

Comparison between STICCC predictions and simulated trajectories

To compare STICCC with simulated trajectories, we first generated noisy trajectories from SDE models with fixed parameters (see Tables S2 and S3) and random initial conditions. Each circuit was simulated using sRACIPE for $10^5$ (for REP) or $10^6$ (for CTS) time units with a noise parameter of 0.1 for the REP and 2.0 for the CTS, sufficient to generate a robust gene expression distribution and many instances of the trajectory crossing near the same points. Given the full trajectory, we manually identified points in gene expression space to compare the directions of predicted and observed vectors. To identify observed directions, first all timepoints of the noisy trajectory passing within a set distance (2% of the maximum pairwise distance among the full trajectory data) of some query points were collected to serve as start points. For each start point, the subsequent gene expression state from the trajectory, after a specified lag period, was collected as an end point. The distribution of vectors from each pair of start and end points was denoted the observed vector. The lag period was selected for each circuit to be long enough to overcome the effect of noise, but short enough to emphasize local, near-instantaneous gene expression changes, as well as to allow a similar variance in gene expression across instantaneous gene expression fluctuations versus slow overall state transitions. Specifically, this is achieved by computing the root mean squared distance (RMSD) in PCA space between start points and end points as a function of the lag time. Lag times were selected separately for each set of start and end points from the noisy trajectory such that the RMSD between start and end points was as close as possible to a target value, which in this study was set to approximately 0.65 (REP) and 3.5 (CTS) (Fig. S8).
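
The lag selection can be sketched as a search over integer lags for the one whose start-to-end RMSD is closest to the target. The function names, the representation of the trajectory as a list of points sampled at uniform time steps, and the `max_lag` bound are hypothetical.

```python
import math


def rmsd(trajectory, start_indices, lag):
    """Root mean squared distance between start points and the states `lag` steps later."""
    squared = [math.dist(trajectory[s], trajectory[s + lag]) ** 2
               for s in start_indices if s + lag < len(trajectory)]
    return math.sqrt(sum(squared) / len(squared))


def select_lag(trajectory, start_indices, target_rmsd, max_lag):
    """Lag (in steps) whose RMSD is closest to the target value."""
    lags = range(1, max_lag + 1)
    return min(lags, key=lambda lag: abs(rmsd(trajectory, start_indices, lag) - target_rmsd))


# Hypothetical trajectory drifting 0.1 units per step along the first axis.
trajectory = [(0.1 * k, 0.0) for k in range(101)]
print(select_lag(trajectory, start_indices=[0, 10, 20], target_rmsd=0.72, max_lag=20))
```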

Processing experimental gene expression data

The scRNA-Seq data for the cell cycle in budding yeast was obtained from a previously published study which sequenced transcripts from 38,285 budding yeast cells under multiple treatments 52. In preprocessing, 976 wild type cells grown in YPD (yeast extract, peptone, glucose) were subset from the complete data and log normalized. Non-expressed genes were removed from the initial total of 6828, leaving 5623 log counts values per cell, which were transformed to z-scores. The 15 genes chosen for the GRN were based on extensive review of published literature, as detailed in ref. 25 and Table S4.

Hematopoietic stem cell data was obtained from a previously published study which generated single cell qPCR data from triplicates of blood progenitor EML cells treated with either EPO, GM-CSF/IL3 and ATRA, or a mixture of both35. The former two treatments were designed to cause differentiation into erythroid and myeloid cells, respectively. Beginning from the median Qc values across these replicates for 17 genes of interest identified in the previous study, we subtracted the LOD for each gene to obtain log-gene expression values and combined data across timepoints and treatments. Of these genes, several were excluded from the network and expression data as they had low expression and low variance, leaving a final network of nine genes (network topology in Table S5).

The EMT dataset was obtained from a work published in 2020 which examined EMT across four cancer cell lines and three signaling conditions. ScRNA-seq data were captured at 8 timepoints throughout signal induction and removal. We selected data from A549 cells treated with TGF-β and kept only the first 7 days because few measurements were taken from the later timepoints and it did not appear that a full reversal of EMT took place.53 Gene expression values were log-normalized with a pseudocount of 1. The GRN topology was produced by applying the GRN inference method NetAct to this dataset, resulting in a network of 29 genes 37 (network topology available on GitHub for STICCC analysis). However, edges to and from the gene ESR1 were removed as no transcripts for it were captured in the measurements, and a further five genes were removed which had consistently low variance and expression in the profiled cells (E2F2, IRF9, PPARD, SRF, KLF4). For each dataset where STICCC was applied, the sampling radius and gene expression space used as input is listed in Table S6.

Supplementary Material

Supplement 1
media-1.pdf (4.6MB, pdf)

Acknowledgements

D. Ramirez and M. Lu are supported by startup funds from Northeastern University, by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM128717, and by National Science Foundation under Award Number MCB-2114191. D. Ramirez was also supported by a Northeastern University Bioengineering Department Dean’s Fellowship. D. Ramirez and M. Lu acknowledge their affiliation with the Center for Theoretical Biological Physics at Northeastern University and appreciate the support provided by the center.

Footnotes

Code availability

The STICCC software package is freely available on GitHub: https://github.com/lusystemsbio/viccc/. All analysis scripts and input network topologies used to generate the results in this work are also available on GitHub: https://github.com/lusystemsbio/sticcc_analysis/.

Ethics declarations

The Authors declare no Competing Financial or Non-Financial Interests.

Data availability

The budding yeast scRNA-seq data analyzed here is publicly available at GEO: GSE125162. The hematopoietic stem cell single cell qPCR data is available in the supporting information of the original work 35. The EMT scRNA-seq data are available at GEO: GSE147405.

References

  • 1. Gerstein M. B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012).
  • 2. Schwikowski B., Uetz P. & Fields S. A network of protein–protein interactions in yeast. Nat. Biotechnol. 18, 1257–1261 (2000).
  • 3. Okawa S., Nicklas S., Zickenrott S., Schwamborn J. C. & del Sol A. A Generalized Gene-Regulatory Network Model of Stem Cell Differentiation for Predicting Lineage Specifiers. Stem Cell Rep. 7, 307–315 (2016).
  • 4. Haase S. B. & Wittenberg C. Topology and Control of the Cell-Cycle-Regulated Transcriptional Circuitry. Genetics 196, 65–90 (2014).
  • 5. Chen Y., Xu L., Lin R. Y.-T., Müschen M. & Koeffler H. P. Core transcriptional regulatory circuitries in cancer. Oncogene 39, 6633–6646 (2020).
  • 6. Forte E. et al. Dynamic Interstitial Cell Response during Myocardial Infarction Predicts Resilience to Rupture in Genetically Diverse Mice. Cell Rep. 30, 3149–3163.e6 (2020).
  • 7. Blanco M. A. et al. Chromatin-state barriers enforce an irreversible mammalian cell fate decision. Cell Rep. 37, 109967 (2021).
  • 8. Verdugo A., Vinod P. K., Tyson J. J. & Novak B. Molecular mechanisms creating bistable switches at cell cycle transitions. Open Biol. 3, 120179 (2013).
  • 9. Hormoz S. et al. Inferring Cell-State Transition Dynamics from Lineage Trees and Endpoint Single-Cell Measurements. Cell Syst. 3, 419–433.e8 (2016).
  • 10. Nichols J. M. et al. Cell and molecular transitions during efficient dedifferentiation. eLife 9, e55435 (2020).
  • 11. Doncic A. & Skotheim J. M. Feedforward Regulation Ensures Stability and Rapid Reversibility of a Cellular State. Mol. Cell 50, 856–868 (2013).
  • 12. Tian X.-J., Zhang H. & Xing J. Coupled Reversible and Irreversible Bistable Switches Underlying TGFβ-induced Epithelial to Mesenchymal Transition. Biophys. J. 105, 1079–1089 (2013).
  • 13. Ke M., Elshenawy B., Sheldon H., Arora A. & Buffa F. M. Single cell RNA-sequencing: A powerful yet still challenging technology to study cellular heterogeneity. BioEssays 44, 2200084 (2022).
  • 14. Zhou P., Wang S., Li T. & Nie Q. Dissecting transition cells from single-cell transcriptome data through multiscale stochastic dynamics. Nat. Commun. 12, 5609 (2021).
  • 15. Yeo S. K. et al. Single-cell RNA-sequencing reveals distinct patterns of cell state heterogeneity in mouse models of breast cancer. eLife 9, e58810 (2020).
  • 16. La Manno G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
  • 17. Kimmel J. C., Hwang A. B., Scaramozza A., Marshall W. F. & Brack A. S. Aging induces aberrant state transition kinetics in murine muscle stem cells. Development dev.183855 (2020) doi: 10.1242/dev.183855.
  • 18. Wolfien M. et al. Single-Nucleus Sequencing of an Entire Mammalian Heart: Cell Type Composition and Velocity. Cells 9, 318 (2020).
  • 19. Qiao C. & Huang Y. Representation Learning of RNA Velocity Reveals Robust Cell Transitions. http://biorxiv.org/lookup/doi/10.1101/2021.03.19.436127 (2021) doi: 10.1101/2021.03.19.436127.
  • 20. Bergen V., Lange M., Peidli S., Wolf F. A. & Theis F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).
  • 21. Haensel D. et al. Defining Epidermal Basal Cell States during Skin Homeostasis and Wound Healing Using Single-Cell Transcriptomics. Cell Rep. 30, 3932–3947.e6 (2020).
  • 22. Bocci F., Zhou P. & Nie Q. spliceJAC: transition genes and state-specific gene regulation from single-cell transcriptome data. Mol. Syst. Biol. 18, (2022).
  • 23. Lange M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19, 159–170 (2022).
  • 24. Qiu X. et al. Mapping transcriptomic vector fields of single cells. Cell 185, 690–711.e45 (2022).
  • 25. Katebi A., Kohar V. & Lu M. Random Parametric Perturbations of Gene Regulatory Circuit Uncover State Transitions in Cell Cycle. iScience 23, 101150 (2020).
  • 26. Chen H., Mundra P. A., Zhao L. N., Lin F. & Zheng J. Highly sensitive inference of time-delayed gene regulation by network deconvolution. BMC Syst. Biol. 8, S6 (2014).
  • 27. Li X. et al. Discovery of time-delayed gene regulatory networks based on temporal gene expression profiling. BMC Bioinformatics 7, 26 (2006).
  • 28. Elowitz M. B. & Leibler S. A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338 (2000).
  • 29. Loinger A. & Biham O. Stochastic simulations of the repressilator circuit. Phys. Rev. E 76, 051917 (2007).
  • 30. Huang B. et al. Interrogating the topological robustness of gene regulatory circuits by randomization. PLOS Comput. Biol. 13, e1005456 (2017).
  • 31. Kohar V. & Lu M. Role of noise and parametric variation in the dynamics of gene regulatory circuits. Npj Syst. Biol. Appl. 4, 40 (2018).
  • 32. Mangan S. & Alon U. Structure and function of the feed-forward loop network motif. Proc. Natl. Acad. Sci. 100, 11980–11985 (2003).
  • 33. Pratapa A., Jalihal A. P., Law J. N., Bharadwaj A. & Murali T. M. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods (2020) doi: 10.1038/s41592-019-0690-6.
  • 34. Adler S. O. et al. A yeast cell cycle model integrating stress, signaling, and physiology. FEMS Yeast Res. 22, foac026 (2022).
  • 35. Mojtahedi M. et al. Cell Fate Decision as High-Dimensional Critical State Transition. PLOS Biol. 14, e2000640 (2016).
  • 36.Burda P., Laslo P. & Stopka T. The role of PU.1 and GATA-1 transcription factors during normal and leukemogenic hematopoiesis. Leukemia 24, 1249–1257 (2010). [DOI] [PubMed] [Google Scholar]
  • 37.Su K. et al. NetAct: a computational platform to construct core transcription factor regulatory networks using gene activity. Genome Biol. 23, 270 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Jain P. et al. Epigenetic memory acquired during long-term EMT induction governs the recovery to the epithelial state. J. R. Soc. Interface 20, 20220627 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Piechaczyk M. & Farràs R. Regulation and function of JunB in cell proliferation. Biochem. Soc. Trans. 36, 864–867 (2008). [DOI] [PubMed] [Google Scholar]
  • 40.Kovary K. & Bravo R. The Jun and Fos Protein Families Are Both Required for Cell Cycle Progression in Fibroblasts. Mol. Cell. Biol. 11, 4466–4472 (1991). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Meng F., Li J., Yang X., Yuan X. & Tang X. Role of Smad3 signaling in the epithelial-mesenchymal transition of the lens epithelium following injury. Int. J. Mol. Med. (2018) doi: 10.3892/ijmm.2018.3662. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Cui H. et al. DeepVelo: deep learning extends RNA velocity to multi-lineage systems with cell-specific kinetics. Genome Biol. 25, 27 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Burdziak C. et al. scKINETICS: inference of regulatory velocity with single-cell transcriptomics data. Bioinformatics 39, i394–i403 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Huang L., Clauss B. & Lu M. What Makes a Functional Gene Regulatory Network? A Circuit Motif Analysis. J. Phys. Chem. B 126, 10374–10383 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Ahnert S. E. & Fink T. M. A. Form and function in gene regulatory networks: the structure of network motifs determines fundamental properties of their dynamical state space. J. R. Soc. Interface 13, 20160179 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Ye Y., Kang X., Bailey J., Li C. & Hong T. An enriched network motif family regulates multistep cell fate transitions with restricted reversibility. PLOS Comput. Biol. 15, e1006855 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Shen-Orr S. S., Milo R., Mangan S. & Alon U. Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31, 64–68 (2002). [DOI] [PubMed] [Google Scholar]
  • 48.Jiménez A., Cotterell J., Munteanu A. & Sharpe J. A spectrum of modularity in multifunctional gene circuits. Mol. Syst. Biol. 13, 925 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Yuan B. et al. CellBox: Interpretable Machine Learning for Perturbation Biology with Application to the Design of Cancer Combination Therapy. Cell Syst. 12, 128–140.e4 (2021). [DOI] [PubMed] [Google Scholar]
  • 50.Ben Guebila M. et al. GRAND: a database of gene regulatory network models across human conditions. Nucleic Acids Res. 50, D610–D621 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Chen W. et al. Live-seq enables temporal transcriptomic recording of single cells. Nature 608, 733–740 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Jackson C. A., Castro D. M., Saldi G.-A., Bonneau R. & Gresham D. Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments. eLife 9, e51254 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Cook D. P. & Vanderhyden B. C. Context specificity of the EMT transcriptional response. Nat. Commun. 11, 2142 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplement 1
media-1.pdf (4.6MB, pdf)

Data Availability Statement

The budding yeast scRNA-seq data analyzed here are publicly available at GEO: GSE125162. The hematopoietic stem cell single cell qPCR data are available in the supporting information of the original work 35. The EMT scRNA-seq data are available at GEO: GSE147405.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
