This is a preprint.

It has not yet been peer reviewed by a journal.

bioRxiv [Preprint]. 2025 May 23:2023.04.19.537364. [Version 5] doi: 10.1101/2023.04.19.537364

D-SPIN constructs gene regulatory network models from multiplexed scRNA-seq data revealing organizing principles of cellular perturbation response

Jialong Jiang 1,12,*, Sisi Chen 1,2,11,*, Tiffany Tsou 1,2, Christopher S McGinnis 3, Tahmineh Khazaei 1, Qin Zhu 3, Jong H Park 1,2, Inna-Marie Strazhnik 1, Jost Vielmetter 1, Yingying Gong 1, John Hanna 4, Eric D Chow 5,6, David A Sivak 7, Zev J Gartner 3,8,9,10, Matt Thomson 1,2
PMCID: PMC10153191  PMID: 37131803

Abstract

Gene regulatory networks within cells modulate the expression of the genome in response to signals and changing environmental conditions. Reconstructions of gene regulatory networks can reveal the information processing and control principles used by cells to maintain homeostasis and execute cell-state transitions. Here, we introduce a computational framework, D-SPIN, that generates quantitative models of gene regulatory networks from single-cell mRNA-seq datasets collected across thousands of distinct perturbation conditions. D-SPIN constructs probabilistic models of regulatory interactions between genes or gene-expression programs to fit the cell state distributions under different perturbations. Using large Perturb-seq and drug-response datasets, we demonstrate that D-SPIN models reveal key regulators of cell fate decisions and the coordination of distant cellular pathways in response to gene knockdown perturbations. D-SPIN also dissects gene-level drug response mechanisms in heterogeneous cell populations, elucidating how combinations of immunomodulatory drugs acting on distinct regulators induce novel cell states through additive recruitment of gene expression programs. D-SPIN provides a computational framework for constructing interpretable models of gene regulatory networks to reveal principles of cellular information processing and physiological control.

Introduction

Cells simultaneously orchestrate a myriad of biological pathways to maintain homeostasis in response to diverse environmental cues [1, 2, 3]. The intricate networks of molecular interactions process information and control cell states. Understanding such networks is central to key biological and therapeutic challenges, including developmental control, novel cell state discovery, and rational intervention design [4, 5, 6, 7]. The construction of mathematical models that can predict the behavior of cells and tissues has been a central goal in computational biology [8]. Methods derived from artificial intelligence (AI), including deep neural networks and transformers, can be trained on large datasets to make predictions but lack interpretability because of their large number of parameters and feedforward architectures that do not naturally map onto the structure of biochemical pathways [9, 10]. Building interpretable and predictive models of the information processing network could allow the design of therapeutic strategies for manipulating and programming cell states in the human body to treat diseases.

Gene regulatory network (GRN) models are mathematical models that predict genome-wide gene expression levels through networks of gene interactions mediated by different physical mechanisms, including promoter binding, post-transcriptional modifications, and protein interactions [11, 12, 13, 14]. Global reconstructions of gene regulatory networks in E. coli, yeast, and sea urchin embryos have revealed network modularity, recurring network motifs, and combinatorial logic through the assembly of transcription factor (TF) complexes at gene promoters [15, 16, 17, 18, 19, 20, 21]. However, in metazoans, gene network models have primarily focused on subcircuits involved in specific processes like T cell activation and embryonic stem cell differentiation, rather than global, cell-scale networks [5, 6, 7]. Furthermore, there are very few quantitative models of regulatory networks that can simulate and predict the response of a cell to genetic perturbations, therapeutics, or other signals. How gene regulatory networks globally modulate core cellular processes in parallel in response to environmental conditions remains poorly understood.

Historically, gene regulatory network reconstruction and modeling have been constrained by the number of biochemical or genetic measurements required to reconstruct networks with hundreds to thousands of interacting protein components. Classical approaches, such as pairwise biochemical binding measurements or associating genetic perturbations with impacted genes, enable the identification of gene regulators separately [22, 23]. The regulatory interactions are consolidated into pathways and integrated into diagrams of network models [24, 25, 26, 27]. However, for the global analysis of gene regulatory networks, perturbation approaches require knocking out hundreds to thousands of genes while monitoring transcription across thousands of genes. Traditionally, the experimental realization of genome-wide network reconstruction through genetic perturbation has been limited by experimental scale when genes were knocked out in bulk assays.

Developments in single-cell genomics and perturbation barcoding circumvent some of the conventional limitations of perturbation-driven network reconstruction. Perturbation multiplexing approaches, including Perturb-seq, click-tags, and MULTI-seq, allow the transcriptional state of each cell in a population to be measured across thousands of different genetic, drug-treatment, and signaling conditions [28, 29, 30, 31, 32, 33]. These experimental approaches identify the perturbation delivered to every single cell while simultaneously measuring changes in the entire transcriptome through single-cell mRNA-seq, enabling large-scale interrogation of cellular perturbation response to diverse perturbations. A core challenge is developing computational methods to integrate data from thousands of such experiments into a gene regulatory network model that can classify perturbations, map the flow of information across a regulatory network, simulate the interaction between perturbations with the network, and predict cellular responses to novel perturbations.

Here, we introduce a mathematical modeling and network inference framework, D-SPIN (Dimension-scalable Single-cell Perturbation Integration Network), which constructs regulatory network models directly from single-cell perturbation-response data. D-SPIN constructs a probabilistic graphical model on single genes or gene programs to infer regulatory interactions between genes and applied perturbations. The model can simulate the distribution of cell states from the inferred regulatory network, thus providing a circuit/pathway-level interpretation of the data that is sufficient to produce the observed gene expression. The inferred regulatory network achieves superior accuracy compared to existing methods on both synthetic datasets and large single-cell perturbation datasets with hundreds to thousands of genes. While many popular network inference methods only apply to TF regulations as they use motif or chromatin accessibility information, D-SPIN applies to general types of interactions, and the discovered interactions include post-transcriptional regulation and phosphorylation in signaling pathways. D-SPIN can accommodate a wide range of different perturbation types, including genetic perturbations, small molecules, signaling conditions, and even physiological states of health and disease.

We applied D-SPIN to construct gene regulatory network models from experimental datasets with thousands of perturbations and millions of cells, including two of the largest single-cell mRNA-seq datasets in existence: a genome-wide Perturb-seq experiment on the K562 chronic myelogenous leukemia cell line [29] and a new human immune cell drug-response experiment we performed. The integrated regulatory network models of K562 cells reveal key regulators of erythroid/myeloid fate choices and global organizing principles of homeostatic responses to perturbations. In our profiled responses of human immune cells to immunomodulatory drugs, D-SPIN identifies gene-level signatures that classify drugs with different mechanisms of action. D-SPIN further demonstrates that drug combinations can generate novel macrophage states through additive recruitment of gene expression programs, which is mediated by convergent, independent pathways regulating the anti-inflammatory M2 gene program. Broadly, D-SPIN provides a computational framework for large-scale modeling of gene regulatory networks across cell types and physiological conditions, as well as an interpretable and generative representation of cell states for cell-scale models.

Results

Integrating information from perturbations reveals hidden regulatory interactions

To simulate cell states from regulatory interactions between genes, a core challenge is uncovering hidden conditional dependence between genes. These hidden interactions do not affect the network states under wild-type conditions, resulting in the problem of alternative or indistinguishable models. As a toy model, in a simplified four-pathway network, pathways A and D constitutively inhibit genes B3 and C3 in pathways B and C, respectively (Figure 1A(i), Figures S1A and S1B). In the natural state of the network, since B3 and C3 are persistently suppressed, their expression becomes nearly independent of their upstream regulators B2 and C2. Consequently, these genes have low correlation and mutual information, and the inhibitory interaction between B3 and C3 also remains hidden.

Figure 1: D-SPIN constructs unified gene regulatory network models from single-cell mRNA-seq data across perturbation conditions.

(A)(i) A computational example of regulatory interactions hidden by other pathways. Genes B2 and B3 have only a weak correlation (Corr) and low mutual information (MI). (ii) External perturbation inputs alter the correlation pattern between genes, making it possible to decode hidden regulatory interactions, although computational methods are needed to do so. (iii) Compared with existing top-performing methods, such as GENIE3, D-SPIN achieves more accurate network reconstruction and recovers the hidden regulations from perturbations. (B) D-SPIN accepts as input single-cell mRNA-seq data from a cell population profiled across different perturbed experimental conditions, such as genetic perturbations, drug treatments, signaling conditions, or disease/healthy conditions. (C) To enhance interpretability, apart from single-gene network constructions, D-SPIN contains an optional pipeline of unsupervised gene program discovery. (D) D-SPIN constructs a unified regulatory network J, whose edges represent inferred interactions between genes or gene programs. D-SPIN also infers interactions between the network and applied perturbations. (E) Conceptually, the probabilistic model inferred by D-SPIN defines a probability landscape over all cell states s. The pairwise interaction J between genes/programs defines a basis landscape which can be further modulated by sample-specific perturbation response vectors h(n) through tuning individual gene/program activities. The architecture also enables D-SPIN to scale to large datasets with millions of cells through a parallelized inference procedure. (F) D-SPIN yields a probabilistic model that can be interpreted as a gene regulatory network. The D-SPIN model can be applied to (i) estimate the distribution of transcriptional states defined by the regulatory network model under a specific condition, (ii) reveal gene regulatory network organization, such as modularity and sub-network architecture, and classify perturbation responses in the network context, and (iii) identify gene regulators of the program-level responses.

External perturbations offer an opportunity to reveal such masked interactions by exposing previously hidden dependencies between variables. Under perturbed conditions, these interactions have a measurable effect on the network state. In the four-pathway network, although B3 is inhibited, its expression level can still be modulated by activating or inhibiting pathway B. Further, when a perturbation shuts down pathway A, B3 is primarily regulated by B2, leading to a significantly elevated correlation and mutual information between B2 and B3 (Figure 1A(ii)).
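The masking effect can be reproduced in a few lines. The sketch below is an illustrative toy, not the simulation behind Figure 1A: it assumes three binary (0/1) nodes A, B2, and B3, hand-picked coupling strengths, and exact enumeration of all eight states. The function name `b2_b3_correlation` and every parameter value are hypothetical.

```python
import itertools
import numpy as np

def b2_b3_correlation(h_a, j_a_b3=-6.0, j_b2_b3=2.0):
    """Pearson correlation of B2 and B3 in a 3-node on/off toy model.

    Log-weight of a state (a, b2, b3), each in {0, 1}:
        h_a * a + j_a_b3 * a * b3 + j_b2_b3 * b2 * b3
    A negative j_a_b3 means A inhibits B3; a positive j_b2_b3 means
    B2 activates B3. All values are illustrative assumptions.
    """
    states = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
    a, b2, b3 = states.T
    log_w = h_a * a + j_a_b3 * a * b3 + j_b2_b3 * b2 * b3
    p = np.exp(log_w - log_w.max())   # unnormalized probabilities
    p /= p.sum()                      # exact distribution over 8 states
    mean_b2, mean_b3 = p @ b2, p @ b3
    cov = p @ (b2 * b3) - mean_b2 * mean_b3
    var_b2 = mean_b2 * (1 - mean_b2)  # variance of a 0/1 variable
    var_b3 = mean_b3 * (1 - mean_b3)
    return cov / np.sqrt(var_b2 * var_b3)

corr_wild_type = b2_b3_correlation(h_a=6.0)     # A constitutively on
corr_a_knockdown = b2_b3_correlation(h_a=-6.0)  # perturbation shuts A down
```

With A on, B3 is almost always off and its correlation with B2 is weak; with A shut down, the same B2-B3 coupling produces a markedly higher correlation, even though no network parameter other than the input to A has changed.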

However, integrating information across multiple perturbations has been a computational challenge, as the impact of perturbations can only be interpreted in the context of the network. A global joint model of network and perturbations is required to combine multiple perturbation observations into a coherent network reconstruction. Existing top-performing methods, such as GENIE3, GRNBoost2, and PIDC, focus on predicting single-gene expression state or information between gene pairs [34, 35, 36, 37]. However, none of them construct a global network model for the observed state distribution, and no perturbation information is explicitly included. Therefore, these existing methods are not suitable for integrating information from multiple perturbation conditions and cannot recover these hidden interactions (Figure S1C).

To address the challenge, we developed the computational framework D-SPIN. D-SPIN constructs a probabilistic model that encodes the distribution of transcriptional states within a cell population across different perturbed conditions (Figures 1B–1E, STAR Methods 1.1). Mathematically, D-SPIN builds a condition-dependent spin network model or Markov random field to simulate gene interactions, where perturbations selectively activate certain cell states in analogy to the retrieval of associative memories in Hopfield networks (Figure 1E) [38]. Spin network models were initially introduced in physics to study collective behaviors and have been applied to complex systems in machine learning and biology, such as neural networks, bird flocks, and protein structures [39, 40, 38, 41, 42, 43].
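For orientation, a condition-dependent pairwise spin model of this kind is commonly written as follows. This is the generic form of a pairwise Markov random field, using the symbols J and h(n) from Figure 1; the exact state space, sign conventions, and normalization used by D-SPIN are specified in STAR Methods 1.1 and are not reproduced here.

```latex
P\left(s \mid h^{(n)}\right)
  = \frac{1}{Z^{(n)}}
    \exp\left( \sum_{i<j} J_{ij}\, s_i s_j + \sum_{i} h^{(n)}_i s_i \right),
\qquad
Z^{(n)} = \sum_{s'}
    \exp\left( \sum_{i<j} J_{ij}\, s'_i s'_j + \sum_{i} h^{(n)}_i s'_i \right)
```

Here s is the vector of gene or program states, J is shared across all conditions, and h(n) is specific to perturbation condition n.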

D-SPIN recovered the correct architecture of the four-pathway model by integrating information from perturbations. In contrast, other methods cannot identify the hidden interactions B2-B3, C2-C3, and B3-C3 even when given all the single-gene perturbation information (Figure 1A(iii), Figure S1C). Without perturbations, the accuracy of network construction with D-SPIN is similar to that of existing methods (Figures S1C and S1D); the improvement in inference accuracy comes from the architecture of D-SPIN, which extracts information from perturbations.

D-SPIN infers a unified regulatory network model from single-cell perturbation data

The application of spin network models to single-cell data has been limited by the mathematical and computational challenges of inferring model parameters on large datasets [44, 42, 43]. To enable information integration in large experiments with thousands of conditions, we leveraged a factorization of the spin network equation to split the learning problem into two parts: 1) construction of a unified gene regulatory network model and 2) inference of how each perturbation interacts with the unified network model (STAR Methods 1.2). For large networks, we further simplified the objective function by estimating conditional dependence between genes using pseudo-likelihood [45, 46, 47, 48]. These efficient computational procedures allow D-SPIN to perform network inference on up to thousands of genes and millions of single cells (STAR Methods 1.3).
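Pseudo-likelihood replaces the intractable joint likelihood with a product of per-gene conditional likelihoods, none of which requires the partition function. A minimal sketch for binary ±1 states follows; the function name, the ±1 encoding, and the toy data are assumptions made for illustration, and the objective actually used by D-SPIN is described in STAR Methods 1.3.

```python
import numpy as np

def neg_log_pseudolikelihood(J, h, S):
    """Average negative log pseudo-likelihood of samples S (cells x genes).

    Assumes +/-1 states and a symmetric J with zero diagonal. For gene i,
    P(s_i | all other genes) = sigmoid(2 * s_i * (h_i + sum_j J_ij * s_j)).
    """
    field = S @ J + h  # local field acting on each gene in each cell
    # -log(sigmoid(x)) = log(1 + exp(-x)), evaluated stably with logaddexp
    return np.logaddexp(0.0, -2.0 * S * field).sum(axis=1).mean()

def coupling(j):
    """Two-gene coupling matrix with a single interaction strength j."""
    return np.array([[0.0, j], [j, 0.0]])

# Toy data (made up): two genes that agree in 80% of 100 cells.
S = np.array([[1, 1]] * 40 + [[-1, -1]] * 40 + [[1, -1]] * 10 + [[-1, 1]] * 10,
             dtype=float)
h = np.zeros(2)

nll_zero = neg_log_pseudolikelihood(coupling(0.0), h, S)
nll_best = neg_log_pseudolikelihood(coupling(0.5 * np.log(4.0)), h, S)
```

For this two-gene toy the objective is minimized at J = 0.5 ln 4, the value at which sigmoid(2J) equals the 80% agreement rate. Because each conditional is a logistic function of the other genes, the cost per gene grows with the number of cells and genes rather than with the number of possible network states.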

As gene-level networks with thousands of nodes are difficult to interpret, we also designed D-SPIN to construct reduced-dimensional models by modeling interactions between gene programs, i.e., co-regulated groups of genes [49, 50, 51]. Cells regulate their transcriptional states by modulating transcription factors that impact sets of genes related to differentiation state, cellular pathways, and core physiological processes. Building upon published algorithms, D-SPIN contains an automatic pipeline for identifying gene programs from single-cell data based on unsupervised orthogonal nonnegative matrix factorization (oNMF) and phenotypic gene-program annotation (Figure 1C, STAR Methods 1.9) [52, 53, 54, 55]. D-SPIN can also accommodate pre-defined gene sets from prior biological knowledge (STAR Methods 1.10). The program-level model provides a coarse-grained description of cellular networks, in which D-SPIN identifies the interaction between these programs and their interaction with applied perturbations.

Training the D-SPIN model assigns a probability to each cell state under each perturbation condition so that, within the specific form of the model, the probabilities best match the observed data (Figure 1E). Depending on the scale of the dataset, D-SPIN employs one of three gradient-based algorithms to optimize the maximum likelihood estimator and estimate the regulatory network (J) and each perturbation vector (h) (STAR Methods 1.3). Following the inference of J and h, D-SPIN models can be analyzed as probabilistic graphical models that reveal regulatory connections between cellular processes and identify interactions between applied perturbations and the genes or programs (Figure 1F). Further, to connect the program-level network model with gene-level models, we also developed algorithms to identify key gene regulators for the perturbation responses of gene programs (Figure 1F, STAR Methods 1.4).
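The split into a shared network and per-condition response vectors can be illustrated with exact maximum likelihood on a tiny system, where the gradient for each parameter is the difference between a data moment and the corresponding model moment. The sketch below assumes binary ±1 states, brute-force enumeration of all states (feasible only for a handful of nodes), and plain gradient ascent. It is not one of the three algorithms of STAR Methods 1.3, and all names and the toy counts are hypothetical.

```python
import itertools
import numpy as np

def model_moments(J, h, states):
    """Exact mean and second-moment matrix of the model by enumeration."""
    # log-weight = sum_{i<j} J_ij s_i s_j + sum_i h_i s_i
    # (J symmetric with zero diagonal, hence the factor 0.5)
    log_w = 0.5 * np.einsum("ki,ij,kj->k", states, J, states) + states @ h
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    return p @ states, states.T @ (states * p[:, None])

def fit_shared_network(samples, steps=3000, lr=0.1):
    """samples: list of (cells x genes) arrays, one per perturbation condition."""
    n = samples[0].shape[1]
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    J = np.zeros((n, n))
    hs = [np.zeros(n) for _ in samples]
    data = [(S.mean(axis=0), S.T @ S / len(S)) for S in samples]
    for _ in range(steps):
        grad_J = np.zeros((n, n))
        for c, (mean_d, corr_d) in enumerate(data):
            mean_m, corr_m = model_moments(J, hs[c], states)
            hs[c] += lr * (mean_d - mean_m)          # condition-specific update
            grad_J += (corr_d - corr_m) / len(data)  # pooled over conditions
        np.fill_diagonal(grad_J, 0.0)
        J += lr * grad_J                             # one network for all conditions
    return J, hs

def counts_to_samples(counts):
    """Expand counts of the states (-1,-1), (-1,1), (1,-1), (1,1) into rows."""
    rows = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return np.array([r for r, k in zip(rows, counts) for _ in range(k)], dtype=float)

# Two made-up conditions observed on the same two-gene network.
samples = [counts_to_samples([4, 1, 1, 4]), counts_to_samples([1, 1, 2, 6])]
J, hs = fit_shared_network(samples)
```

Each condition contributes its own h update, while the J update pools pairwise statistics from every condition. This pooling is what allows the network estimate to benefit from all perturbations at once.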

D-SPIN models can simulate the distribution of transcriptional states in a cell population under normal and perturbed conditions (Figure 1F). D-SPIN provides a quantitative framework for generating the observed gene expression with a gene network model, providing a circuit/pathway-level representation of the data. Therefore, D-SPIN decomposes the otherwise uninterpretable cell state shifts seen in low-dimensional embeddings like PCA or UMAP into the action of perturbations on specific nodes of the regulatory network. D-SPIN can be applied to model cell populations perturbed by different inputs, including gene knockdown/activation, altered signaling conditions, and disease states, as well as treatment with small molecules or other therapeutics.

D-SPIN constructs an accurate, interpretable, and generative model of HSC perturbation response

To assess the network model constructed by D-SPIN, we applied D-SPIN to a synthetic dataset of hematopoietic stem cell (HSC) differentiation using the network simulation and benchmarking framework BEELINE [37]. In the HSC network, there are two key regulators of cell fates: the transcription factor (TF) SPI1 (PU.1) controls granulocyte/monocyte differentiation, and GATA1 controls megakaryocyte/erythrocyte fates (Figure 2A). Starting from the biologically identified regulatory network, we used the BEELINE framework to generate synthetic gene-expression profiles for 22 perturbation conditions, encompassing knockdown/activation of each network node individually (Figures 2B and 2C, STAR Methods 2.6). The simulated single-cell expression profiles are classified into 7 different cell state designations with the known TF markers, including monocytes (Mono), granulocytes (Gran), erythrocytes (Ery), and megakaryocytes (Mega) [56].

Figure 2. D-SPIN achieves state-of-the-art performance on network inference benchmarking tasks and reveals network-level mechanisms of cell fate modulation by TF perturbations.

(A) HSC regulatory network model [56] contains 11 TFs that interact through activation (blue) and repression (red) to modulate the differentiation of HSCs into different cell types. (B) Example simulated gene-expression profile heatmaps from the network across a series of single-gene knockdown and activation conditions. (C) Heatmap of average TF expression and UMAP embeddings for 7 clusters of cell states generated across all simulated conditions. (D) The network diagram shows the unified regulatory network model inferred by D-SPIN. Edges with diamond markers show D-SPIN-inferred perturbation vectors that infer how knockdown or activation of TFs (e.g., GATA1 knockdown) impacts the regulatory network through up- or downregulation of TFs. (E) Diagrams of the true network and inferred regulatory networks by D-SPIN and other state-of-the-art methods. The box plots quantify the accuracy of the top 10 and 20 inferred edges across 10 random repeats of simulations. The ground truth network has a total of 20 edges. (F) (left) UMAP embedding comparisons of simulated single-cell data by BEELINE and state distributions generated by D-SPIN models for control and PU.1 activation/knockdown. (right) Bar plots quantify the proportion of cell types from simulated data and D-SPIN models. (Bottom) Cosine similarity between the cell state distributions of data and the D-SPIN model. D-SPIN models generate cell state distributions highly consistent with simulated data by applying perturbations to the underlying network. (G) (top) Network diagram and (bottom) UMAP embedding of three different perturbations that generate increased monocyte population. The D-SPIN model reveals network edges that mediate the response of the network to different perturbations by applying a formal reasoning framework to the network model. 
Both PU.1 activation and knockdown of CEBPA or Gfi-1 generate increased monocyte proportions by directly interacting with monocyte genes or indirectly through the interaction between Gfi-1 and EgrNab. (H) (top) Network diagram of an example 250-node directed modular network with 5 mutually inhibiting modules and (bottom) simulated single-cell data from the network for evaluating the accuracy and scalability of network inference. (I) Diagrams of the inferred directed network of different methods, as in (top) subnetwork of correctly-inferred edges and (bottom) adjacency matrices with true positives, false positives, and false negatives. (J) The plot shows the inference accuracies of each method when given different numbers of perturbations in 10 randomly generated modular networks. The accuracy of D-SPIN continuously increases with the perturbation number, while other methods do not improve. (K) Bar plots quantify the directed network inference accuracy of three prevalent network topologies with different sizes of 125, 250, 500, and 1000 under 800 perturbations in 10 random repeats. D-SPIN achieves significantly increased accuracy compared with other methods across network sizes.

The network topology inferred by D-SPIN achieves higher accuracy than existing network inference algorithms. We applied D-SPIN to construct a gene-level network model from the simulated data and compared the network structure with 3 different network reconstruction methods (PIDC, GRNBoost2, and GENIE3) that were top performers in the BEELINE benchmarking study [34, 35, 36, 37]. For D-SPIN, on average, 0.96 of the top 10 edges found by the model were correct across inference runs, as compared to 0.6 to 0.77 for PIDC, GRNBoost2, and GENIE3 (Figures 2D and 2E). For D-SPIN, 0.75 of the top 20 edges were correct, compared with 0.625 to 0.665 for the other methods (Figure 2E). Furthermore, the inference accuracy of D-SPIN improves when higher-order perturbations are sampled. By including 96 additional double-gene perturbations, D-SPIN achieved an accuracy of 1.00 on the top 10 edges and improved the accuracy of the top 20 edges from 0.75 to 0.81 (Figure S2B(i)).
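The top-k accuracies quoted above are fractions of the k highest-ranked inferred edges that appear in the ground-truth network. A minimal sketch of such a metric is given below, assuming undirected edges ranked by absolute score and symmetric input matrices; the function name and the four-node toy network are hypothetical, and the exact evaluation protocol of the benchmark is not reproduced here.

```python
import numpy as np

def top_k_edge_accuracy(scores, true_adj, k):
    """Fraction of the k highest-|score| undirected edges present in true_adj."""
    n = scores.shape[0]
    iu = np.triu_indices(n, k=1)                    # count each gene pair once
    strength = np.abs(scores[iu])
    top = np.argsort(-strength, kind="stable")[:k]  # indices of the k strongest
    return true_adj[iu].astype(bool)[top].mean()

# Toy 4-node example: true edges 0-1, 1-2, 2-3 (made-up values).
true_adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    true_adj[i, j] = true_adj[j, i] = 1

scores = np.zeros((4, 4))
for (i, j), w in {(0, 1): 0.9, (1, 2): -0.8, (0, 3): 0.7, (2, 3): 0.1}.items():
    scores[i, j] = scores[j, i] = w

acc_top2 = top_k_edge_accuracy(scores, true_adj, k=2)
acc_top3 = top_k_edge_accuracy(scores, true_adj, k=3)
```

In this toy, the two strongest inferred edges are both true, while the third-ranked edge (0-3) is a false positive, so the accuracy drops from 1 to 2/3 when k goes from 2 to 3.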

Inferring directions of interactions from only stationary data has been a difficult and persistent problem [34]. D-SPIN can also infer a directed network model with the pseudolikelihood algorithm. The direction of interaction is inferred from the asymmetry between the predictive power of two factors (STAR Methods 1.3). D-SPIN achieved 0.73 accuracy with the top 26 directed edges on average, while the other three methods had accuracies of 0.41, 0.47, and 0.53. Despite the strength of D-SPIN in directed network inference, we focus our following analysis on undirected networks because, mathematically, there are fundamental difficulties in simulating stationary cell state distributions from directed graphical models with feedback regulations (STAR Methods 1.3) [57, 58, 59].

D-SPIN is a generative model that can simulate the distribution of transcriptional states of cell populations under perturbations. Most existing network inference methods are not generative, as they use either regression or information-theoretic measures to identify candidate gene interactions [37]. For the HSC datasets, the distribution generated by the D-SPIN model was highly concordant with the data distribution across all perturbation conditions, as visualized by the UMAP embedding and cell state distribution comparison (Figure 2F, Figure S2C). Quantitatively, the cell state distributions by D-SPIN were all above 96% cosine similarity with the simulated data. Thus, for the HSC network, D-SPIN constructed a quantitative model of how the cell population is generated by the underlying gene regulatory network.
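The cosine similarity used here compares two vectors of cell state proportions, one from the data and one from the model. A short sketch follows; the function name and the seven proportions per vector are made-up illustrative values, not numbers from the paper.

```python
import numpy as np

def cosine_similarity(p, q):
    """Cosine of the angle between two cell state proportion vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

# Hypothetical proportions over 7 cell states for one condition.
data_props = [0.30, 0.20, 0.15, 0.15, 0.10, 0.05, 0.05]
model_props = [0.28, 0.22, 0.15, 0.14, 0.11, 0.05, 0.05]

similarity = cosine_similarity(data_props, model_props)
```

A value of 1 means the two distributions have identical proportions, and a value of 0 means they share no occupied cell state.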

As an interpretable and generative model, D-SPIN can be analyzed to gain insight into how edges within a D-SPIN network model mediate the response to applied perturbations. We developed a formal framework to compute how the knockdown or activation of specific nodes in the network might impact the distribution of cell states generated by the regulatory network and to define how individual edges contribute to the perturbation response of the network (STAR Methods 2.7). For example, both PU.1 activation and the knockdown of either CEBPA or Gfi-1 led to an increased proportion of monocyte cell states, but with different inferred network mechanisms (Figure 2G). Activating PU.1 increases the activity of other monocyte genes (EgrNab and cJun) via the PU.1-EgrNab and PU.1-cJun edges. In contrast, under Gfi-1 knockdown, although Gfi-1 has both positive and negative interactions with the PU.1-EgrNab-cJun module, edge-sensitivity analysis shows that the negative interaction Gfi-1-EgrNab has a stronger impact on PU.1. The knockdown increases the activity of PU.1, EgrNab, and cJun, therefore boosting the monocyte population (Figure S2D(iv)). Likewise, CEBPA knockdown reduces Gfi-1 activity through the CEBPA-Gfi-1 interaction, producing similar increased monocyte states as in Gfi-1 knockdown (Figure S2D(v)). Overall, PU.1 activation induced increased monocyte states through the positive interactions PU.1-EgrNab and PU.1-cJun, while increased monocyte states in CEBPA or Gfi-1 knockdown were mediated by the negative edge Gfi-1-EgrNab. Therefore, similar cell-state distribution changes of increased monocyte states can be achieved by two different network-level mechanisms in the D-SPIN model.

Thus, D-SPIN, using simulated data, can infer an accurate and generative regulatory network model of a single-cell population by integrating information across perturbation conditions. The network model provides a map of how the distribution of cell states in the underlying HSC cell population is generated through interactions between internal regulators. The D-SPIN model can simulate the cell population and hypothesize network edges and network-level mechanisms that might mediate the observed cell population shifts under perturbations.

D-SPIN achieves superior accuracy in large-scale regulatory network inference

To demonstrate the capability of D-SPIN to build accurate network models with perturbation information, we assessed the inference accuracy of D-SPIN against other state-of-the-art methods on large-scale simulated networks with hundreds to a thousand nodes and various topologies. We explored three distinct network architectures: highly modular networks, Erdős–Rényi (ER) random networks, and scale-free networks (Figure 2H, Figure S2E).

Each network topology captures specific characteristics that occur in real biological networks. In modular networks, activating interactions are enriched in the same module, representing groups of coexpressed genes that often perform coherent biological functions. Modular networks present challenges for network inference procedures because gene coexpression makes it difficult to define the precise regulators that control module activation and inhibition [19, 50, 60, 61]. ER networks connect any pair of genes at random and are considered the most general random network topology [62]. Scale-free networks contain hub nodes with a high number of interactions, and some biological networks, such as metabolic networks, exhibit similar statistics [63, 64]. We generated each topology with 125, 250, 500, and 1000 nodes. As perturbation design for large networks remains an open challenge, we applied random perturbations to each network, where the perturbations are independent, normally distributed random variables on each network node (STAR Methods 2.8).
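As a concrete picture of this setup, the sketch below draws a directed ER network and a matrix of independent normal perturbation inputs. The edge probability, the random ±1 edge signs, the unit perturbation scale, and both function names are assumptions chosen for illustration; the parameters actually used are given in STAR Methods 2.8 and are not reproduced here.

```python
import numpy as np

def random_er_network(n_nodes, edge_prob, rng):
    """Directed Erdos-Renyi adjacency with random +/-1 signs and no self-loops."""
    mask = rng.random((n_nodes, n_nodes)) < edge_prob  # each edge drawn independently
    np.fill_diagonal(mask, False)
    signs = rng.choice([-1.0, 1.0], size=(n_nodes, n_nodes))
    return mask * signs

def random_perturbations(n_perturbations, n_nodes, rng, scale=1.0):
    """One independent normal input per node for each perturbation condition."""
    return rng.normal(0.0, scale, size=(n_perturbations, n_nodes))

rng = np.random.default_rng(0)
adjacency = random_er_network(125, 0.02, rng)        # 125-node network
perturbations = random_perturbations(800, 125, rng)  # 800 conditions x 125 nodes
```

Each row of `perturbations` plays the role of one experimental condition: a vector of inputs applied to every node of the same underlying network.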

D-SPIN achieved significantly higher accuracy in directed network inference compared with other methods, including PIDC, GRNBoost2, and GENIE3 (Figure 2I, Figure S2F). In the example 250-node modular network, the accuracy of the D-SPIN inferred network is 0.695, more than double that of the best of the other three methods, 0.303. The accuracy improvement came from D-SPIN’s ability to integrate information from perturbations. With no perturbation, the accuracy of D-SPIN is similar to that of existing methods. However, with the increasing number of perturbations, the accuracy of D-SPIN continues to rise from an average of 0.37 to 0.62 with 800 normally distributed random perturbations (Figure 2J). In contrast, applying the alternative network inference methods (PIDC, GRNBoost2, GENIE3) to each perturbation condition and averaging the results does not increase the accuracy of inference (Figure 2J, Figure S2G).

The superior accuracy of D-SPIN applies to all tested network topologies and scales up to a thousand genes. In 1000-gene networks, the directed network inferred by D-SPIN has average accuracies of 0.62 in modular networks, 0.68 in ER networks, and 0.70 in scale-free networks, compared to 0.36, 0.30, and 0.08 for the best of the other three methods (Figure 2K). The most striking example is scale-free networks, where the networks contain activating interactions from the hub nodes, and in the unperturbed steady state all genes are turned on. D-SPIN still achieved an accuracy of 0.70 by integrating information from perturbations, while the best of the other methods reached only 0.08, which is uninformative about the true network architectures (Figures S2F and S2G). The performance difference demonstrates the importance of combining all perturbations into the same model.

Additionally, the training algorithms of D-SPIN are highly efficient and faster than the other three methods by orders of magnitude in large-scale single-cell datasets containing hundreds of thousands of cells. We assessed the scalability of different network inference methods with the number of cells (Figure S2H, STAR Methods 2.9). D-SPIN became the fastest method for datasets exceeding 16,000 cells, a number most current single-cell studies surpass. Remarkably, at 256,000 cells with 2 CPU cores, D-SPIN finished in 6 hours while all other methods could not finish within a week.

The directed TF-target network inferred by D-SPIN corresponds well with ChIP-seq databases

D-SPIN builds regulatory network models directly from large-scale single-cell datasets, providing mechanistic insights into the hidden associations between genes and pathways. The model allows us to generate hypotheses about cellular pathways that can motivate further experimental validation, such as identifying interactions between different pathways and nominating key regulators of biological processes. Perturb-seq is a single-cell genomics technique that enables highly parallelized genetic screens by combining the knockdown of thousands of genes using CRISPR/Cas9 with a transcriptome-scale single-cell mRNA-seq readout [28, 69]. We applied D-SPIN to one of the largest genome-wide Perturb-seq datasets, collected in the human chronic myelogenous leukemia (CML) cell line K562, where 9,867 genes were knocked down individually across 2 million single cells [29].

To assess the network reconstruction quality of D-SPIN, we inferred directed gene-level regulatory networks and compared the results with independent biological measurements from ChIP-seq databases. ChIP-seq data measures the binding of transcription factors (TFs) to target gene promoters and has been used in the literature to assess the quality of inferred gene regulatory network models [37]. We created datasets of genes with active TFs and 500, 1000, and 1500 highly variable genes (HVGs), and evaluated the network inference of D-SPIN together with PIDC, GRNBoost2, GENIE3, CellOracle, and z-score on all cells in the datasets, including perturbed cells [37, 70, 71] (STAR Methods 2.10). D-SPIN achieved leading correspondence with ChIP-seq data with accuracy rates (fold improvement over random guess) across all three datasets (Figure 3A). Notably, associating gene responses with the perturbed gene by differential expression (DE) or z-score was shown to be effective in flow cytometry or bulk sequencing data [71, 72], but has limited performance in our evaluation, likely due to the high noise level in the Perturb-seq data.

Figure 3. Regulatory network models of genome-wide Perturb-seq data reveal the architecture of cellular pathways and regulators of differentiation fate control.


(A) Bar plots quantify the (early) accuracy rate of the correspondence between regulatory network inference and ChIP-seq databases for multiple datasets generated from the genome-wide Perturb-seq dataset [29]. Methods with the (*) symbol utilize TF-motif binding information as prior knowledge in the network inference. DE/z-score stands for using differential expression/z-score to link responding genes to perturbed regulators. (B) Schematics of the experiments of the genome-wide Perturb-seq dataset. (i) 9,867 gene knockdown perturbations are individually delivered to ∼2M human K562 cells using guide RNAs. (ii) Scatter plots of the number of differentially expressed genes (DEGs) and the number of cells that passed quality filtering for each perturbation. (iii) The D-SPIN model constructs a unified regulatory network model and infers interactions between each perturbation and the network, classifying gene-knockdown perturbations into groups. (C) The network diagram shows the inferred regulatory network model (J matrix) between 30 gene-expression programs (P1-P30, circles), as well as the interactions between programs and 40 groups of gene-knockdown perturbations (G1-G40, diamonds) encoded in response vectors (h vectors). Interactions are rendered as positive (blue) or negative (red) edges, with thickness scaling with the strength of interactions. (left box) Gene programs are functionally annotated through gene ontology annotation tools and manual lookup [65, 66, 67]. (D) (top) The regulatory network model exhibits a modular structure with tightly connected subnetworks. The 7 network modules can be assigned physiological functions based on the gene programs they encompass. (bottom) The histogram quantifies the distribution of all network edges and edges inside the same module. Edges inside modules are mostly positive interactions and contain the majority of strong positive interactions in the network.
(E) Bubble plot shows cellular pathways enriched in gene ontology and KEGG pathway databases [65] for each guide RNA group. Gene targets in the same guide group identified by D-SPIN are involved in similar pathways or potentially reveal associations between pathways, such as (arrow pointer) protein transport genes AKIRIN2 and IPO9 that were recently found to (inset schematics) mediate the import of the proteasome into the nucleus [68]. (F) Scatter plots show (left) gene knockdowns that have the strongest effect on the erythroid and myeloid differentiation programs quantified by z-score; (right) genes with the strongest interactions with the erythroid and myeloid programs inferred by D-SPIN. 9 of the top 10 regulators identified by D-SPIN are known regulators associated with erythroid/myeloid differentiation. (G) Rendering of the subnetwork of activating regulators of the erythroid and myeloid programs inferred by D-SPIN. Most genes in the subnetwork are known regulators, and several are key regulators of the differentiation.

Regulatory relationships between TFs and targets are also reflected in the binding motifs, and this information is widely leveraged for regulatory network construction in methods like SCENIC+ and CellOracle [73, 70]. As an explicit statistical framework, D-SPIN can also incorporate prior information on motif binding to prioritize such interactions (STAR Methods 1.5). Independent of the single-cell datasets, the motif-binding network has better correspondence with ChIP-seq datasets than all network inference methods. After incorporating the motif-binding prior information, both CellOracle and D-SPIN exhibited substantial accuracy rate improvement, while D-SPIN still achieved the highest correspondence with ChIP-seq data. The early accuracy rate of D-SPIN reached 12 to 16, indicating that the top-identified interactions by D-SPIN are highly reliable (Figure 3A).
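The accuracy rate used above is described as a fold improvement over random guessing. A minimal sketch of such a metric follows, assuming it is the precision of the top-ranked predicted edges divided by the density of reference edges among all candidate edges; the exact definition is given in STAR Methods 2.10 and may differ. The function name, the toy gene labels, and the choice of `top_k` are hypothetical.

```python
def fold_over_random(scores, true_edges, n_candidates, top_k):
    """Precision among the top_k strongest predicted edges, divided by
    the precision expected from random guessing (reference edge density).

    scores: dict mapping (regulator, target) -> inferred interaction weight
    true_edges: set of (regulator, target) pairs supported by the reference
    n_candidates: total number of candidate (regulator, target) pairs
    """
    # Rank edges by absolute interaction strength, strongest first.
    ranked = sorted(scores, key=lambda e: abs(scores[e]), reverse=True)[:top_k]
    precision = sum(e in true_edges for e in ranked) / len(ranked)
    random_precision = len(true_edges) / n_candidates
    return precision / random_precision


# Toy example (hypothetical labels): 3 regulators x 4 targets = 12 candidates.
scores = {
    ("TF1", "g1"): 0.9,
    ("TF2", "g2"): -0.8,
    ("TF3", "g1"): 0.1,
    ("TF1", "g3"): 0.05,
}
true_edges = {("TF1", "g1"), ("TF2", "g2"), ("TF3", "g4")}
fold = fold_over_random(scores, true_edges, n_candidates=12, top_k=2)
print(fold)
```

Under this reading, a value of 12 to 16 would mean that the top-ranked edges are confirmed by the reference 12 to 16 times more often than randomly chosen edges.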

Program-level regulatory network model of genome-wide Perturb-seq data

Interpreting gene-level networks with hundreds to thousands of nodes can be complex. For interpretability, we constructed a regulatory network model with D-SPIN at the gene program level to investigate the global architecture and information processing logic of the regulatory network implied by the perturbation responses. Transcriptome-scale gene expression changes under gene knockdown reflect how cells coordinate the activities of major cellular functions to maintain homeostasis under stress. To reveal the global architecture of perturbation responses, we focused on 3,136 knockdown perturbations that are each associated with more than 10 differentially expressed genes (DEGs) and 20 cells (Figure 3B, STAR Methods 2.1). We coarse-grained the transcriptional profile into 30 gene programs, a number informed by both the Bayesian information criterion (BIC) and the elbow method, to balance the model’s representative power and complexity (Figure S3B, STAR Methods 1.11) [74].

The extracted gene programs reflect both general cell biology and lineage-specific gene programs for K562 cells (Figure 3C, Figure S3A, SI Table 3). We annotated the gene programs with a combination of bioinformatic databases, including DAVID, Enrichr, and manual lookup [65, 66, 67]. The gene programs contain genes involved in core cellular processes such as transcription, translation, RNA processing, and mitosis. There are also lineage-specific programs, including an erythroid-specific program with hemoglobin (HBG1, HBG2, HBZ) and glycophorin (GYPA, GYPB, GYPE) genes, as well as two myeloid-associated programs with phagosome/actin-organization (ACTB, ACTG1, ARPC3) and immune-response (LAPTM5, VASP, RAC2) genes, respectively, which agrees with the multi-lineage potential of K562 cells.

D-SPIN generated a program-level regulatory network model that provides a wiring diagram of K562 internal regulations and responses to gene knockdown perturbations. The inferred network model is robust against sampling noise, and more than 98% of edges remain structurally stable when resampled from near-optimal solutions of the network inference problem (Figure S3C, STAR Methods 1.7). The D-SPIN network model contains a set of sub-networks or modules of programs that are enriched with positive interactions (Figure 3D). Using the Leiden community detection algorithm [75] (STAR Methods 2.12), we automatically decomposed the D-SPIN network into seven core cellular functions, including transcription, translation, protein degradation, and cell-cycle-related functions.

The network contains a set of negative interactions between gene programs that are expressed in distinct cellular states. The strongest negative interaction is between gene programs expressed at different stages of mitosis. For example, the P29 Spindle microtubule has negative interactions with both P25 DNA replication and P27 Histone. The strong interactions inside the cell cycle sub-network of P25-P30 are able to reconstruct the transcription state distribution during cell cycle progression (Figures S3D and S3E). We also observe negative interactions between P4 Erythroid and P6 Phagosome, which is consistent with the presence of two mutually exclusive differentiation paths that lead to erythroid and myeloid cell fates.
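To illustrate how a negative interaction encodes mutually exclusive programs, the sketch below enumerates the state distribution of a two-program toy model. It assumes an Ising-like form, P(s) proportional to exp(Σ_{i<j} J_ij s_i s_j + Σ_i h_i s_i), with each program taking a discretized level in {-1, 0, 1}; this functional form, the toy values of J and h, and all function names are assumptions for illustration rather than the model specification from STAR Methods.

```python
import itertools
import math


def state_distribution(J, h, levels=(-1, 0, 1)):
    """Exact distribution of an assumed pairwise spin model by enumeration.

    P(s) is proportional to exp(sum_{i<j} J[i][j]*s_i*s_j + sum_i h[i]*s_i).
    """
    n = len(h)
    weights = {}
    for s in itertools.product(levels, repeat=n):
        pair = sum(J[i][j] * s[i] * s[j]
                   for i in range(n) for j in range(i + 1, n))
        field = sum(h[i] * s[i] for i in range(n))
        weights[s] = math.exp(pair + field)
    z = sum(weights.values())  # partition function
    return {s: w / z for s, w in weights.items()}


def covariance(p, i, j):
    """Covariance of program levels i and j under distribution p."""
    mean_i = sum(prob * s[i] for s, prob in p.items())
    mean_j = sum(prob * s[j] for s, prob in p.items())
    mean_ij = sum(prob * s[i] * s[j] for s, prob in p.items())
    return mean_ij - mean_i * mean_j


# Two programs coupled by a negative interaction, no external bias.
J = [[0.0, -2.0], [-2.0, 0.0]]
h = [0.0, 0.0]
p = state_distribution(J, h)

# States with one program on and the other off are favored over both on.
print(p[(1, -1)] > p[(1, 1)])
print(covariance(p, 0, 1) < 0)
```

In this toy setting the negative coupling concentrates probability on states where exactly one of the two programs is active, which is the qualitative pattern expected for programs such as P4 Erythroid and P6 Phagosome.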

In addition to the core K562 regulatory network, D-SPIN also inferred interactions between each gene program and each gene knockdown perturbation (Figure 3C). Similar to the grouping of sub-network modules, we classified knockdown perturbations into a set of 40 perturbation groups that we refer to as G1-G40 (Guide RNA group 1 through 40) with unsupervised Leiden clustering of the perturbation response vectors (STAR Methods 2.12, SI Table 3). Gene knockdowns within the same guide cluster have similar perturbation responses, suggesting that these genes are involved in the same pathway or have potential interactions.
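The grouping of perturbations by their response vectors can be sketched as follows. Note that this example swaps in a much simpler technique than the one used in the study: instead of Leiden clustering, it links perturbations whose response vectors have cosine similarity above a threshold and reports the connected components. The threshold, the toy vectors, and the knockdown labels are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def group_by_similarity(vectors, threshold=0.8):
    """Connected components of the graph linking similar response vectors.

    This is a stand-in for Leiden clustering, not an implementation of it.
    """
    names = list(vectors)
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy response vectors over three gene programs (hypothetical values).
responses = {
    "kd_A": [1.0, -0.5, 0.0],
    "kd_B": [0.9, -0.4, 0.1],
    "kd_C": [-0.2, 0.1, 1.0],
}
groups = group_by_similarity(responses)
print(groups)
```

Here `kd_A` and `kd_B` fall into one group because their response vectors point in nearly the same direction, while `kd_C` forms its own group.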

Identified clusters reflect the pathway-level organization in the K562 cell, revealing both well-known cell biology and more novel or cryptic organization of pathways (Figure 3E). Globally, each guide cluster is enriched for specific cell-biological functions, including DNA replication, the MAPK signaling pathway, and RNA degradation. As an example of interactions between pathways, we found that components of the proteasome core particle (20S proteasome) are grouped into cluster G3 with genes involved in protein transport, including AKIRIN2 and IPO9 (Figure 3E). In recent studies, AKIRIN2 was found to bind directly to fully assembled 20S proteasomes to mediate their nuclear import with the nuclear import factor IPO9 [68]. Similarly, the D-SPIN guide clusters group the gene C7orf26 with the integrator subunits INTS10, INTS13, and INTS14, a key result of the original Perturb-seq study [29].

D-SPIN network identifies key regulators of erythroid and myeloid differentiation

The program-level network provides a global view of regulatory architecture, but regulations are ultimately executed by specific genes. Identifying these key regulators can motivate experimental validations and help pinpoint potential therapeutic targets. Therefore, we also designed D-SPIN to infer interactions between gene programs and candidate regulators such as transcription factors, providing a finer-grained perspective of gene regulations (STAR Methods 2.11).

K562 cells have the potential to differentiate into erythroid or myeloid lineages. Traditional approaches to finding cell fate controllers often examine the phenotype measurements of knockdown experiments. In Perturb-seq, knocking down a fate regulator may alter the marker gene expression of the target fate or the opposite fate. However, the knockdown may also not produce an obvious phenotype due to the robustness of fate-control networks. By integrating all perturbation information into a unified network model, D-SPIN provides a more comprehensive way of tackling such problems. For comparison, we extracted an erythroid and a myeloid program from the original Perturb-seq study and analyzed potential regulators with both differential expression (DE) and D-SPIN network models [29].

The D-SPIN network uncovers many more well-known regulators of the two cell fates compared with the standard DE analysis. D-SPIN discovered erythroid fate regulators KLF1, NFE2, GFI1B, and GATA1, while DE only found GATA1; D-SPIN discovered myeloid fate regulators SPI1 (PU.1) and MEF2C [76], while DE found none of them (Figures 3F and 3G, SI Table 3, SI Data). Interestingly, D-SPIN uniquely identified NPM1, which inhibits both cell fates. Recent research shows that in K562 cells, BCR-ABL1 kinase activates NPM1 to relocate KLF1 and SPI1 proteins from the nucleus to the cytoplasm, therefore silencing their cell fate control capabilities [77]. The relocation of KLF1 and SPI1 potentially explains why their knockdowns have little effect on the phenotype of cell fate program expressions.

The striking contrast between the regulators identified by D-SPIN and those indicated by traditional DE analysis underscores the advantage of constructing a unified regulatory network model. D-SPIN discovered the cell-type-specific (BCR-ABL1 dependent) post-transcriptional differentiation inhibition effect of NPM1, which also cannot be found with methods using TF binding motif analysis. By constructing a network model, D-SPIN can uncover hidden regulatory interactions in single-cell perturbation datasets that are missed by traditional analyses.

Coarse-grained D-SPIN models provide insight into global perturbation response strategies

Cells coordinate many internal processes to maintain homeostasis in response to damage and perturbation, while the corresponding control strategies remain poorly understood due to the lack of global models. The program-level D-SPIN model allows us to analyze the distributed regulation of core cellular processes in response to damage induced by gene knockdown.

While the program-level network is compact, the model can be simplified further through an automated coarse-graining strategy based on graph clustering (Figure 4A). Gene programs are grouped into sets that we call modules and gene perturbations are grouped into major categories. We term the strategy network coarse-graining, similar to statistical physics, where correlated degrees of freedom are grouped into a single variable to aid computation and interpretation. The coarse-graining strategy produces a minimal network with 4 distinct patterns of program activation and inhibition under perturbations, which we call stress-response strategies (STAR Methods 2.13).
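The idea of coarse-graining can be illustrated with a small sketch that collapses a program-level interaction matrix into module-level interactions by averaging all entries between each pair of modules. Block averaging is an assumption made here for illustration; the procedure actually used is described in STAR Methods 2.13 and also groups perturbations, which this sketch does not cover. The module labels and toy matrix are hypothetical.

```python
def coarse_grain(J, module_of):
    """Average program-level interactions within each pair of modules.

    J: square list-of-lists interaction matrix between programs
    module_of: list giving the module label of each program
    Returns a dict mapping (module_a, module_b) -> mean interaction,
    skipping self-interactions on the diagonal.
    """
    sums, counts = {}, {}
    n = len(J)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            key = (module_of[i], module_of[j])
            sums[key] = sums.get(key, 0.0) + J[i][j]
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


# Four programs in two modules (hypothetical values): positive interactions
# inside each module, negative interactions between modules.
J = [
    [0.0, 1.0, -0.5, -0.3],
    [1.0, 0.0, -0.4, -0.6],
    [-0.5, -0.4, 0.0, 0.8],
    [-0.3, -0.6, 0.8, 0.0],
]
module_of = ["translation", "translation", "degradation", "degradation"]
coarse = coarse_grain(J, module_of)
for pair, value in sorted(coarse.items()):
    print(pair, round(value, 3))
```

The four-program network reduces to a two-node network in which each module interacts positively with itself and negatively with the other, mirroring how correlated degrees of freedom are merged into a single variable.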

Figure 4: D-SPIN network model identifies global perturbation response strategies in K562 cells for distinct classes of gene knockdowns.


(A) Diagram of network coarse-graining by grouping gene programs into 7 identified gene program modules and grouping guide RNA groups into 4 identified strategies. The resulting coarse-grained model enables the global analysis of cellular regulatory responses. (B) Violin plots of averaged perturbation response vectors on programs in guide groups in each response strategy. K562 cells contain 4 distinct classes of global response strategies, and we name each strategy by the upregulated characteristic biological function. (C) Violin plots show gene program expression under knockdown of example pathway components relative to the control samples treated with non-targeting guide RNAs. The upregulated biological function is typically distinct from the genes being perturbed. For example, when an RNA polymerase subunit is knocked down, the cell upregulates metabolism while downregulating translation and degradation; when a ribosome subunit is knocked down, cells upregulate protein degradation and metabolism while downregulating translation. (D) Violin plots show the single-cell gene expression distribution for ribosome and proteasome subunits under knockdown of example genes in translation upregulation and degradation upregulation strategies relative to control samples. The program-level response strategies are reflected as coherent expression changes at the single-gene level. For example, when respiration-related genes COX17 or COX7C are knocked down, all ribosome subunit genes are upregulated, and proteasome subunit genes are downregulated.

The D-SPIN-identified response strategies suggest non-trivial forms of homeostasis regulation. We found that each strategy features the upregulation of a major biological function and the downregulation of other functions, except for the fourth strategy, which upregulates both metabolism and degradation (Figure 4B). The four global response strategies are named by the characteristic program upregulation as metabolism upregulation, transcription upregulation, translation upregulation, and degradation upregulation. In general, K562 cells may be capable of mounting a wide range of homeostatic response strategies, but D-SPIN identified the four highlighted strategies under the specific context of Perturb-seq experiments.

The regulatory strategies provide insight into the information flow in the cell. The compensatory functions are typically not directly associated with the gene knockdown, suggesting long-range regulation or coupling between seemingly distinct processes. For example, a large number of metabolism upregulation responses were induced by perturbations to transcription processes, including RNA polymerase (G9, G22), transcription factor IID (G10), and mediator (G16). Consistent with the observation, transcription stress has been shown to induce an elevated ATP pool and rewired metabolism states [78]. The knockdown of mTOR signaling components leads to translation upregulation responses, reflecting the importance of mTOR in coordinating protein synthesis and energy utilization [79]. Furthermore, disruptions to different sub-processes of the same cellular process can lead to distinct response strategies. For example, perturbations to translation initiation factors (G28) lead to the upregulation of translation-related genes and the downregulation of protein degradation. Perturbations to ribosome subunits and ribosomal RNA (G28, G39, G40) induce the downregulation of translation-related genes and the upregulation of protein degradation. At the single-gene level, the response strategies are reflected as the coherent up- or downregulation of genes involved in the corresponding gene programs (Figure 4D).

These different response strategies indicate that the cell mitigates the loss of a gene by upregulating a potentially compensating cellular function. These upregulations suggest the existence of active regulatory feedback within the network to connect distinct cellular processes for homeostasis maintenance. The D-SPIN model organizes the thousands of perturbations and two million single cells into classes of regulatory strategies, providing insights into principles of information processing and homeostatic control.

Modeling immunomodulatory drug responses in primary human immune cells

Understanding how unique cell types in an interacting community respond to a given drug or other therapeutic interventions is essential for effective drug development. Specifically, immunomodulatory drugs vary in the breadth and specificity of their biochemical targets, and a key question is to understand how differences in biochemical preferences translate into transcriptional changes, and thus differences in therapeutic responses [80, 81, 82].

We therefore developed an experimental platform for large-scale profiling of human immune cell drug responses and analyzed the single-cell data with D-SPIN. Our experimental system was designed to characterize T cell-driven hyper-activation of the immune system, as is observed in auto-immune and hyperinflammatory states. We cultured a heterogeneous population of primary donor-derived peripheral blood mononuclear cells (PBMCs) that contained T cells, B cells, myeloid cells, and NK cells. We activated the T cells with anti-CD3/CD28 chimera (Figure 5A, Figure S4A, STAR Methods 1.12), which led to the immune activation of the entire cell population.

Figure 5. D-SPIN derives a drug-response network model from human immunomodulatory drug-response single-cell mRNA-seq profiling experiments.


(A) Schematic of the experiment design for profiling drug responses on T cell-mediated immune activation. Peripheral blood mononuclear cells (PBMCs) were harvested from a human donor. The cell population was treated by anti-CD3/CD28 antibodies that specifically activate T cells and immunomodulatory drugs drawn from a library. The cell population was profiled after 24 hours of drug and antibody treatment. (B) (top) PCA projections derived from a 30-hour time-course experiment of T cell-mediated immune activation, where samples were taken every 30 minutes for single-cell mRNA-seq. (bottom) Time courses of the proportion of activated and resting cells in each cell type and example activation gene markers. T cells reach activated states first in 2 hours, and myeloid cell activation lasts 16 hours. (C) (left) Histogram of profiled drug classes shows a variety of biochemical properties and target pathways. (right-top) UMAP embedding of 1.5 million filtered single cells obtained from the drug profiling experiments. In the profiled cell population, we identified 32 cell states in the major cell types of T, B, NK, and myeloid cells, and each cell state is curated by marker genes and gene differential expression. (right-bottom) UMAP embedding of the resting control cell population (without antibody activation) and the activated cell population. The resting and activated cell states compose the major partition on the UMAP. (D) D-SPIN-inferred regulatory network model between 30 gene programs (P1-P30, circles), as well as interactions between programs and 7 drug classes (diamonds) identified through clustering the response vectors. In the network rendering, each program is positioned on the UMAP at the cell state where it is most highly expressed. (right box) Gene programs are functionally annotated through gene ontology annotation tools and manual lookup [65, 66, 67]. 
(E) (left) UMAP embeddings of experimental cell state distribution and the state distribution generated by the D-SPIN model in activated control, tacrolimus treatment, and vorinostat treatment. (right) Line plots quantify the cell state distributions of experimental data and D-SPIN models. The dashed line is the uniform distribution for reference. D-SPIN models closely match the control and tacrolimus-treated samples. The model fits less well in the vorinostat-treated sample but still captures the overall distribution pattern. (F) Scatter plots show cosine similarity between experimental data and cell-state distributions generated by D-SPIN, compared with two reference null models, uniform distribution and cell type abundance by pooling all cells together. D-SPIN models have higher than 0.9 cosine similarities for 92.4% of conditions.

To profile the action of different immunomodulatory drugs on the immune activation process, we selected 502 small molecules. The drug library contains a diverse set of small molecules targeting pathways including mTOR, MAPK, glucocorticoids, JAK/STAT, and histone deacetylases (HDAC) (Figure 5C, SI Data). Therapeutically, the library contains drugs used for treating auto-immune diseases (e.g., tacrolimus, budesonide, tofacitinib) as well as FDA-approved anti-cancer drugs (e.g., bosutinib, crizotinib). Drugs were added together with the activating antibody or alone to profile their effect on the resting cell population. In total, we profiled 1.5 million filtered single cells in resting and activated conditions, with over 1,200 total conditions and 31 different immune cell states, including 4 CD4 T cell states, 10 CD8 T cell states, an NK cell state, 4 B cell states, and 8 myeloid cell states (Figure 5C, Figure S4E, STAR Methods 2.2, SI Table 5).

We constructed a program-level regulatory network with D-SPIN to dissect how small molecules interact with the network to create the altered cell population states. Although the cell population contains various immune cell types and drug-specific cell states, D-SPIN took all single cells to construct a unified regulatory network model that captures all the cell types and cell states by pairwise interaction between gene programs and drug impacts on the programs. We coarse-grained the transcriptional profile into 30 gene programs, a number informed by both the BIC and the elbow method (STAR Methods 1.11).

Among the discovered gene programs, there are global cell-type programs such as P6 T cells, P15 B cells, P14 NK cells, and P20 myeloid cells, as well as specific cell-state programs, including those corresponding to T cell resting and activation states, anti-inflammatory M2 macrophages, and pathogen-responding M1 macrophages. The program function was annotated by a combination of informatics databases and manual lookup [65, 66, 67] and subsequently validated on a single-cell atlas of human immune cells [83].

Similar to the K562 Perturb-seq response network, the drug response network has a distinct modular structure, with a set of core network modules composed of tightly interacting gene programs (Figure 5D, Figure S5C). Each module represents a group of programs that are expressed together in a cell type or subtype in the population. For example, a module containing P11 Granzyme K T cell, P12 T cell cytotoxicity, P13 Effector CD8 T cell, and P14 NK cell corresponds to the population of CD8 T cells and NK cells. Negative interactions primarily occur between programs expressed in different cell types, such as P2 T cell maintenance and P19 Antigen presentation, as antigen presentation happens in B cells and myeloid cells but not T cells (Figure 5D).

As a minimal probabilistic model, D-SPIN can simulate the drug-modulated cell state distribution with high fidelity using only 465 network interaction parameters and 30 parameters for each treatment condition. Qualitatively, we found visual similarity between D-SPIN simulated cell state distributions on the UMAP embedding and actual cell states. Quantitatively, the cosine similarity between the model and data distribution is higher than 0.9 for 92.4% of the samples. The drugs with lower model accuracy tended to be drugs that drove the cell population into a few highly specialized states, such as the proteasome inhibitor bortezomib, where D-SPIN models only have qualitative agreement with the cell state distribution. The quantitative agreement between D-SPIN simulated cell states and data distribution is further validated under various metrics, including error of mean and cross-correlation, optimal transport distance, and probabilities of most frequent states (Figure S5G, STAR Methods 2.14).
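The parameter count and the similarity metric quoted above can be checked with a short sketch. One reading consistent with 465 network parameters for 30 programs is a symmetric interaction matrix counted with its diagonal, 30 × 31 / 2 = 465; whether diagonal terms are included is an assumption here. The cosine similarity function and the toy cell-state distributions are likewise illustrative and not taken from the study's data.

```python
import math


def n_network_parameters(n_programs):
    """Independent entries of a symmetric matrix, diagonal included (assumed)."""
    return n_programs * (n_programs + 1) // 2


def cosine_similarity(p, q):
    """Cosine similarity between two cell-state distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


print(n_network_parameters(30))

# Toy distributions over three cell states (hypothetical values).
data = [0.5, 0.3, 0.2]
model = [0.45, 0.35, 0.2]
uniform = [1 / 3, 1 / 3, 1 / 3]  # reference null model

sim_model = cosine_similarity(data, model)
sim_uniform = cosine_similarity(data, uniform)
print(sim_model > sim_uniform)
```

Comparing against the uniform reference matters because cosine similarity between non-negative distributions is never negative, so even an uninformative model can score well above zero.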

The program-level network organizes drugs into phenotypic classes

While many of the drugs in our experiments have been previously reported as immunosuppressive, less is known about their specific effect at the transcriptome level. Our single-cell profiling and D-SPIN analysis provided a single integrated platform to compare the transcriptomic effects of drugs with different chemical mechanisms of action. In addition to the network model, D-SPIN inferred perturbation response vectors that quantify interactions between each drug and the regulatory network. We identified 70 out of 502 drugs that had statistically significant interactions with the D-SPIN-inferred network compared with control experiments (STAR Methods 2.12). D-SPIN grouped these effective drugs into 7 classes, including strong inhibitor, weak inhibitor I, weak inhibitor II, glucocorticoid, M1 macrophage inducer, epigenetic regulator, and toxicant (Figure 6A).

Figure 6. Drug classification derived from D-SPIN aligns with known drug targets and mechanisms of action.


(A) Diagrams of response vectors inferred by D-SPIN are shown as positive/negative interactions between the drug and gene programs for drug examples. D-SPIN identifies 7 phenotypical classes of drugs based on their response vectors. (B) UMAP embeddings of the cell population after treatment with antibody and example drugs from different classes. The top row shows 5 immune inhibitors with decreasing strength from left to right. The bottom row shows 4 example drugs that each induce novel cell states distinct from both the resting and activated control cell population. Activated control population (bottom-right) is shown for comparison. (C) Bubble plots show cell-subtype distributions induced by selected drugs and the control population. Bubble sizes scale with population proportion, and bubbles are colored by major cell types and resting/activated classification. With decreasing inhibitor strength, the proportion of the activated immune-cell population (deep colors) gradually increases. Some drugs induce cell states that are different from both resting and activated control populations. (D) Bar plots quantify the (early) accuracy rate of the correspondence between the inferred regulatory networks from the drug profiling dataset and protein-protein interactions in the String-db database [67]. Methods with the (*) symbol utilize TF-motif binding information as prior knowledge in the network inference. (E) Rendering of the subnetwork of D-SPIN-identified regulators in kinases and phosphatases. The node size is proportional to the number of identified interactions. The subnetwork is partitioned into two groups of genes that are primarily expressed in myeloid cells or lymphoid cells. (F) Heatmap of D-SPIN-inferred gene-level responses for drugs with immune inhibitory effects. The dendrogram of hierarchical clustering shows that the gene-level signatures classify drugs into groups with similar molecular mechanisms of action.

Drugs within any one of these seven classes induced similar shifts in cell population structures. For example, three of the drug classes include drugs that inhibit T cell, B cell, and myeloid cell activation. Both UMAP visualization and cell state distributions indicate that these inhibitors act with a spectrum of strengths (Figures 6B and 6C). Very strong inhibitors, including the cancer drug dasatinib and immunosuppressive drug tacrolimus, completely block the immune cell activation and result in a cell population similar to unstimulated PBMCs. Weak inhibitors, such as temsirolimus, only slightly increase the proportions of cells in the resting states.

Beyond inhibitors, D-SPIN identified a drug class of glucocorticoids (GCs), which are steroid-derived small molecules that activate the glucocorticoid receptor (Figures 6A and 6B). The GC class includes well-known immunosuppressive drugs, including halcinonide, budesonide, triamcinolone, and dexamethasone. GCs suppress immune activation but also generate cell populations different from the strong inhibitors, in particular, distinct myeloid cell states as visualized on the UMAP. On the program level, D-SPIN showed GCs more weakly suppress the chemokine secretion program (P25) than strong inhibitors but more strongly induce a program P22 associated with M2 macrophages, including expression of CD163 [84, 85].

D-SPIN also identified a group of drugs that induce the activation of inflammatory, pathogen-responsive M1 macrophages (Figures 6A and 6B). The class includes activators of Toll-like receptors (TLRs) in macrophages, which sense the presence of pathogens and induce innate immune responses. The M1 macrophage inducer class contains TLR7 agonists (vesatolimod, resiquimod) and TLR8 agonists (motolimod, resiquimod), and produces macrophage states related to host defense that highly express P27 Pathogen response and P29 Metallothionein programs.

The other two classes of drugs, epigenetic modifiers and toxicants, are associated with stress response programs (Figures 6A and 6B). Toxicants include the proteasome inhibitor bortezomib, the potent non-selective histone deacetylase (HDAC) inhibitor panobinostat, and the DNA topoisomerase inhibitor 10-hydroxycamptothecin. Toxicants strongly activate the P30 Stress response and have mostly inhibitory interactions with other gene programs, especially generic cell-type programs such as P6 T cell and P20 Myeloid cell. The epigenetic-modifiers class consists of HDAC inhibitors, which generate an epigenetically disrupted T cell state (CD8 T epi.) that has elevated expression of histone component genes and DNA topoisomerase TOP2A. Altogether, these data identify classes of drugs by the unique downstream programs that they impact.

Gene-level network reveals signaling hubs and response signatures of different drug targets

Phenotypical classification reveals a spectrum of inhibitors with different strengths, and these inhibitors employ distinct mechanisms. The majority of these inhibitors have known primary targets, such as tyrosine kinases (TKs) for dasatinib and Janus kinases (JAKs) for cerdulatinib, and extensive biochemical assays have characterized their alternative targets and binding strengths [86, 87]. However, an important open question is how differences in biochemical properties, including target specificity and breadth, translate into different transcriptional responses in the context of a heterogeneous interacting cell community, and ultimately lead to distinct clinical outcomes. Understanding these varying transcriptional responses would be valuable for developing more effective therapeutic strategies.

To identify gene-level signatures of drug actions and gain a global view of the regulatory network controlling T cell-mediated immune activation, we constructed gene-level network models with D-SPIN. During immune activation, regulatory interactions are executed both by DNA-binding interactions of TFs and by signal transduction pathways in which kinases and phosphatases mediate (de)phosphorylation. We selected 657 highly expressed regulatory genes for the network construction, including 388 TFs, 187 kinases, and 71 phosphatases [88, 89]. We assessed the network inference quality against the physical protein-protein interaction networks in the STRING database, which integrates multiple data sources, including databases of interaction experiments, curated complexes/pathways, and literature text-mining [67]. We also evaluated CellOracle for comparison, as other inference methods, such as PIDC, GENIE3, and GRNBoost2, do not scale to datasets with millions of cells [70]. D-SPIN achieved significantly higher accuracy than CellOracle, both without and with prior knowledge of TF-motif binding information. Notably, in the accuracy benchmarking on the K562 Perturb-seq dataset, incorporating motif binding information significantly boosted the accuracy of both D-SPIN and CellOracle. However, in the context of immune activation, where signal transduction pathways play a key role, the binding information improved the accuracy only by a small margin. The result demonstrates the advantages of D-SPIN as a data-driven, perturbation-based inference method, compared to motif-based methods like CellOracle or SCENIC+ [70, 73].
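The benchmarking idea above can be illustrated with a minimal sketch. It assumes that accuracy is scored as the fraction of the strongest inferred edges that also appear in a reference interaction set. The function name `top_k_precision`, the placeholder gene names, and all scores are hypothetical; they are not taken from the D-SPIN code or from STRING, and the metric actually used in the study may differ.

```python
def top_k_precision(scores, reference, k):
    """Fraction of the k strongest inferred edges found in the reference set.

    scores: dict mapping (gene_a, gene_b) -> inferred interaction strength.
    reference: iterable of (gene_a, gene_b) pairs, treated as undirected edges.
    """
    # Store each reference pair as a frozenset so that edge direction is ignored.
    ref = {frozenset(pair) for pair in reference}
    # Rank inferred edges by absolute strength, strongest first, and keep k.
    ranked = sorted(scores, key=lambda e: abs(scores[e]), reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for e in ranked if frozenset(e) in ref)
    return hits / len(ranked)


# Toy inputs (hypothetical values, for illustration only).
inferred = {
    ("g1", "g2"): 0.9,
    ("g3", "g4"): -0.7,
    ("g1", "g5"): 0.2,
    ("g2", "g6"): 0.05,
}
known = [("g2", "g1"), ("g4", "g3")]
precision = top_k_precision(inferred, known, k=3)
```

Ranking by absolute strength treats activating and inhibitory edges alike, which suits a reference of physical interactions that carries no sign information.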

Among the hundreds of kinases and phosphatases, D-SPIN identified 41 genes with regulatory roles, highlighting key signaling hubs during immune activation. The top hub genes with the most interactions are primarily Src-family protein tyrosine kinases, including LYN, FYN, HCK, SYK, LCK, BTK, and FGR. Src-family kinases are major components in the immune signaling pathways [90]. For example, FYN and LCK are among the first activated molecules downstream of the T cell receptor. Accordingly, inhibitors of Src-family kinases, such as dasatinib and bosutinib, were among the strongest inhibitors observed in our experiments. The identified regulatory genes also include multiple phosphatases from the dual-specificity phosphatase (DUSP) gene family, which contribute to regulating the intensity of immune response by controlling MAPK signaling [91]. Globally, the inferred core network is partitioned into two groups: genes primarily expressed in myeloid cells and genes expressed in lymphoid cells. Inhibitory interactions between the two groups indicate that these genes do not coexist in the same cell state.
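As a rough illustration of how hub genes could be ranked from an inferred network, the sketch below counts, for each gene, the interactions whose absolute strength exceeds a cutoff. The symmetric matrix `J`, the placeholder gene names, and the `threshold` value are assumptions made for this example; the criterion used in the study to call regulatory genes and hubs is not reproduced here.

```python
from collections import Counter


def hub_ranking(J, genes, threshold):
    """Rank genes by the number of interactions with |J[i][j]| above threshold."""
    degree = Counter({g: 0 for g in genes})
    n = len(genes)
    for i in range(n):
        for j in range(i + 1, n):  # symmetric matrix: visit each pair once
            if abs(J[i][j]) > threshold:
                degree[genes[i]] += 1
                degree[genes[j]] += 1
    # most_common() returns (gene, degree) pairs, highest degree first.
    return degree.most_common()


# Toy symmetric interaction matrix (hypothetical values).
genes = ["g1", "g2", "g3", "g4"]
J = [
    [0.0, 0.8, -0.6, 0.3],
    [0.8, 0.0, 0.05, 0.0],
    [-0.6, 0.05, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.0],
]
ranking = hub_ranking(J, genes, threshold=0.1)
```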

The gene-level responses inferred by D-SPIN distinguish inhibitors with different biochemical mechanisms of action. Hierarchical clustering of the responses organizes the inhibitors into three major categories: strong inhibitors, weak inhibitors, and glucocorticoids (GCs). The strong and weak inhibitors further split into subclasses that match the primary targets of these small-molecule drugs. The targets of strong inhibitors include Src family kinases (bosutinib, cerdulatinib, dasatinib), JAKs (tofacitinib, ruxolitinib), and calcineurin (cyclosporine, tacrolimus). The targets of weak inhibitors include growth factors (crizotinib, sunitinib, nilotinib), mTOR (everolimus, temsirolimus, sirolimus), cytokine (ralimetinib), cAMP (alprostadil), and topoisomerase (topotecan HCl, doxorubicin HCl). Broadly, the analysis suggests that inhibition can be achieved via a range of distinct biochemical pathways and mechanisms. However, the strong inhibitors in our data target molecules immediately downstream of T cell receptor activation, such as the Src family kinase LCK.

Furthermore, the fine-grained D-SPIN model highlights gene signatures that separate drugs with different mechanisms. Compared with strong inhibitors, the anti-inflammatory GC drugs selectively activated KLF9, TSC22D3, MAFB, NEAT1, and DUSP1/2. Each of these genes participates in M2 macrophage polarization or signal transduction of glucocorticoid receptors [92, 93, 94, 95, 96, 97]. In comparison, strong inhibitors more strongly repressed STAT1/3, JAK3, and IRF1/4/7/9. JAK-STAT signaling and the IRF family are both central pathways during inflammation [80, 98], and their inhibition corresponds to the shut-down of immune activation by these strong inhibitors. The distinctions between strong inhibitor types are more subtle. A group of responding genes, including CSF1R, ACP5, JAK3, and DAPK1, exhibits increasing response strength from calcineurin inhibitors to JAK inhibitors to Src inhibitors. The strongest inhibitors, Src inhibitors, additionally suppress inflammation-associated genes, such as NFKB1A, IL1B, and EGR2.

Weak inhibitors with different mechanisms each had unique response signatures on a few genes. mTOR inhibitors such as rapamycin induced the activation of EEF1A1, an upstream regulator of the PI3K/AKT/mTOR pathway [99]. The activation also suggests potential compensatory mechanisms on EEF1A1 under mTOR inhibition. Topoisomerase inhibitors specifically activated a few genes associated with the p53 pathway, including ATF3, MDM2, and PHPT1 [100, 101, 102]. p53 plays a vital role in maintaining genome stability, and p53 deficiency is known to sensitize cells to topoisomerase inhibitors [103]. Both ATF3 and MDM2 have been shown to participate in the DNA damage stress response induced by topoisomerase inhibitors [104, 101].

Drug combinations generate novel cell states with hyper-suppression

Immunomodulatory drugs are often used in combinations. Knowing how specific drug combinations could tune the transcriptional cell-state distribution of the immune cell population could allow more precise drug regimens to meet therapeutic goals. However, the design of drug combinations at the transcriptome scale is challenging due to the large number of potential drug-gene interactions. D-SPIN models provide a framework to compare the action of individual drugs on both the gene level and program level in the context of the regulatory network to identify drug combinations with potentially useful therapeutic applications.

Therefore, we applied D-SPIN to interpret the mechanisms of combinatorial drug action (Figure S7A). We selected 10 drugs from different drug classes identified by D-SPIN and profiled all pairwise combinations experimentally. We found that 83% of the drug interactions in our profiling were additive or sub-additive on the gene program level, meaning that the effect of the drug combination is equal to, or weaker than, the sum of the individual drug effects. Other types of drug interactions include dominant, synergistic, and antagonistic (STAR Methods 2.16). The additive interactions between drugs recruited a combination of transcriptional programs from single drugs, creating novel cell states or population states, especially between drugs that have distinct impacts.
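To make the additive and sub-additive categories concrete, the following sketch compares a combination response on one gene program with the sum of the two single-drug responses. The function `classify_interaction`, the relative tolerance `tol`, and the restriction to same-sign single-drug responses are assumptions made for illustration. The dominant and antagonistic categories are defined in STAR Methods 2.16 and are deliberately not covered here.

```python
def classify_interaction(r_a, r_b, r_ab, tol=0.1):
    """Label a combination response relative to the sum of single-drug responses.

    r_a, r_b: responses of one gene program to each single drug.
    r_ab: response of the same program to the drug combination.
    tol: hypothetical relative tolerance for calling a response additive.
    """
    if r_a * r_b < 0:
        raise ValueError("sketch only covers single-drug responses of the same sign")
    expected = r_a + r_b
    sign = 1.0 if expected >= 0 else -1.0
    # Measure magnitudes along the shared direction of the single-drug responses.
    observed_mag = sign * r_ab
    expected_mag = abs(expected)
    margin = tol * expected_mag
    if observed_mag > expected_mag + margin:
        return "synergistic"
    if observed_mag >= expected_mag - margin:
        return "additive"
    return "sub-additive"


# Toy responses (hypothetical values).
label_additive = classify_interaction(1.0, 0.5, 1.5)
label_sub = classify_interaction(1.0, 0.5, 1.1)
label_syn = classify_interaction(-1.0, -0.5, -2.0)
```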

Among the combinations, glucocorticoids (GCs) and strong inhibitors induced coherent anti-inflammatory effects on gene programs. However, their combination produced a novel macrophage state that was distinct from the state produced by either drug alone (Figure S7A). To investigate how the drug combination generated novel biological outcomes, we performed single-cell mRNA-seq on cell populations treated with the GC halcinonide in combination with the Src inhibitor dasatinib across a range of doses and analyzed the combinatorial response with D-SPIN (Figures 7A and 7B). Both halcinonide and dasatinib are anti-inflammatory drugs that suppress the activation of lymphoid and myeloid cell populations, and in the D-SPIN model, halcinonide and dasatinib activate or inhibit the same set of gene programs (Figure 7C). However, the two drugs acted on specific gene programs with different intensities (Figure 7D). For example, compared with dasatinib, halcinonide induced weaker suppression of genes involved in macrophage activation, such as IDO1, CD40, and SLAMF7, in the P24 inflammatory macrophage program. Halcinonide also promoted the M2 polarization of macrophages, strongly activating M2-associated genes, including CD163, MS4A6A, and VSIG4 in the P22 M2 macrophage program.

Figure 7: D-SPIN reveals network-level mechanisms of combinatorial drug action.

(A) (top) UMAP embeddings of drug dosage titration of dasatinib plus halcinonide. Cell states are colored by major cell types and the resting/activated classification. (bottom) UMAP rendering of example myeloid programs by coloring cells with program expressions. (B) UMAP embedding of activated control, dasatinib, halcinonide, and drug combination-treated cell population, with arrows indicating different myeloid cell state changes. The drug combination induces a novel macrophage state. (C) (left) Schematic diagram of coherent superposition where the single drugs activate/inhibit the same set of downstream gene programs, and the combination drug response is the superposition of single drug effects. (right) Diagram of response vectors for dasatinib and halcinonide alone and their combination, showing coherent effects on gene programs. (D) (left) Scatter plots compare the response vectors on each gene program for dasatinib and halcinonide. The two drugs activate/inhibit the same set of gene programs, but with different strengths. (right) The drug combination responses are plotted against the stronger single-drug response for each gene program. The combination responses are generally higher than the maximum of the single-drug responses. (E) (top) Diagram of response vectors for two single drugs and their combination on impacted myeloid programs under different dosages. (bottom) UMAP embeddings of myeloid states for each drug combination dosage. The novel macrophage state is induced by a combination of gene program recruitment of single drugs. (F) Surface plot shows response vectors on program P22 under different dosage combinations. Dots are experimental data, and the surface is spline interpolation. Single-drug dose responses are sigmoidal on the logarithm concentration scale, and the combination response is approximately the sum of the two sigmoid-shaped single drug responses, showing the signature of superposition/additivity. (G) D-SPIN depicts the phase diagram of myeloid states under different drug dosage combinations of dasatinib and halcinonide. Drug combination dosages move the myeloid cell population between five different myeloid states. (H) Rendering of the subnetwork of Src inhibitor and glucocorticoid responses as well as inferred regulators of the P22 M2 macrophage program. D-SPIN nominates downstream effectors that mediate the effect of P22 activations, which are highlighted by colored circles.

We found that the novel macrophage state was induced by the additive recruitment of gene programs by the two drugs, including augmented activation of the M2 macrophage program and hyper-repression of the macrophage activation program. The strengths of responses were generally larger than those of either drug alone but did not exceed their sum (Figure 7D). The two drugs had coherent effects on gene programs across the entire dosage range, and the strength of action increased with drug dosages (Figure 7E). Given that both drugs were profiled at maximally effective doses, the enhanced response of the drug combination indicates pathway-level cooperation between the two drugs rather than additive dosing (Figure 7F). Furthermore, the generative D-SPIN model enabled us to compute the macrophage state distribution for any drug dosage combination; we could therefore portray the population state as a function of drug dosages, analogous to a phase diagram (Figure 7G) (STAR Methods 2.17). In the diagram, at a low dosage of dasatinib, the myeloid state transitions from the activated macrophage to the M2 macrophage with increased halcinonide dosages. At higher dosages of dasatinib, increasing halcinonide dosage produces resting monocyte, inhibited monocyte, and hyper-inhibited M2 states. Overall, manipulating the drug doses allowed a smooth conversion of the macrophage population between different states. The drug combination recruited a combination of gene programs in a dose-dependent fashion, which was also observed in a previous study performed at the single-protein level [105]. Conceptually, by combining the two drugs, we effectively fine-tuned the transcriptional states of the macrophage population by exploiting the additivity of their effects.
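The additive dose-response picture can be sketched numerically. The example below assumes that each single-drug response on a program follows a logistic curve in log10 concentration and that the combination response is the sum of the two curves. The function names and the parameters (`ec50`, `max_effect`, `slope`) are hypothetical and are not fitted to the data in this study.

```python
import math


def sigmoid_response(conc, ec50, max_effect, slope=1.0):
    """Sigmoidal single-drug response on the log10 concentration scale."""
    if conc <= 0:
        return 0.0  # no drug, no response
    x = slope * (math.log10(conc) - math.log10(ec50))
    return max_effect / (1.0 + 10 ** (-x))


def combined_response(conc_a, conc_b, params_a, params_b):
    """Additive (superposition) model: sum of the two single-drug responses."""
    return (sigmoid_response(conc_a, **params_a)
            + sigmoid_response(conc_b, **params_b))


# Hypothetical parameters for two drugs acting on the same program.
drug_a = {"ec50": 0.01, "max_effect": 1.0}
drug_b = {"ec50": 0.1, "max_effect": 1.5}

# Response surface over a small grid of dose pairs (arbitrary units).
doses = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0]
surface = [[combined_response(a, b, drug_a, drug_b) for b in doses] for a in doses]
```

Under this model, the response at saturating doses of both drugs approaches the sum of the two maximal effects, so the combination can exceed what either drug achieves alone even at its maximally effective dose.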

Furthermore, the gene-level regulatory network model revealed that Src inhibitors and GCs activate the M2 program through distinct gene regulators, which explains why their effects are additive. We used D-SPIN to specifically analyze the regulators of the M2 macrophage program and extracted the core subnetwork together with the responding genes of Src inhibitors and GCs (Figure 7H). The core subnetwork nominated candidate gene regulators that directly mediate the response of the M2 program to each drug type. We found that the effects of Src inhibitors and GCs are mediated by distinct groups of genes. For GCs, the response is mediated by the activation of TSC22D3, DUSP1, CEBPD, and MAFB, all of which are known to be associated with glucocorticoid receptor signaling and M2 polarization [93, 96, 106, 94]. For Src inhibitors, the response is mediated by the inhibition of IRF1 and activation of CSF1R and ACP5. IRF1 is a key controller of M1 polarization, and its inhibition would bias macrophages towards the M2 state [107]. CSF1R and ACP5 have also been reported to participate in M2 polarization, especially in the context of tumor-associated macrophages (TAMs) in the cancer microenvironment [108, 109]. Thus, both Src inhibitors and GCs activate the M2 macrophage program, but their responses are mediated by two distinct sets of regulators and pathways. Their distinct mechanisms provide the opportunity to manipulate a spectrum of macrophage states by fine-tuning the dosage of the two-drug combination.

Discussion

Here, we introduce D-SPIN, a generalizable and interpretable framework that can be applied to study the perturbation response of cells, including genetic perturbations, small molecules, and other signaling conditions. D-SPIN constructs quantitative, generative models of gene regulatory networks by integrating information from perturbation conditions. The mathematical structure of D-SPIN allowed us to develop a computationally efficient, parallel inference procedure that can be run on hundreds of CPU cores to perform network inference on datasets with thousands of perturbations and millions of cells. Single-cell mRNA-seq methods enable large-scale perturbation response studies across cell types and organisms. D-SPIN provides the framework to integrate such data into regulatory network models that can be analyzed and compared to reveal the architecture, logic, and evolution of these networks across species and time.

A major goal in biology is to control the distribution of cell states in a cell population. Being able to generate a specific set of progeny from stem cells or modulate the state of the immune system has tremendous implications for treating cancer and autoimmune diseases. We showed that D-SPIN identified network-level mechanisms of cell-state modulation by TF perturbations in synthetic HSC networks (Figure 2G) and nominated key regulators of erythroid-myeloid fate choices in K562 Perturb-seq datasets (Figures 3F and 3G). Further, D-SPIN revealed the phase diagram of transitions between macrophage states under different dosage combinations of small-molecule drugs (Figure 7G), and the effector genes that mediate the additive drug response on the M2 program (Figure 7H). Together, these results suggest that D-SPIN could be used to design interventions that precisely tune networks and cell states.

For drug combinations, we demonstrated that Src inhibitors and glucocorticoids created a novel hyper-inhibited M2 state by additively recruiting a set of gene programs. The novel cell state encompasses complicated expression-level changes of numerous single genes, but at the regulatory network level, the cell state neatly originates from the superposition of single-drug effects. By focusing on the regulatory networks, D-SPIN enables us to dissect these combinatorial mechanisms and interpret them through the inferred D-SPIN regulatory network and responses, and nominate specific regulators that mediate the observed program-level responses. The results provide a conceptual framework for interpreting and predicting the effect of drug combinations. Although additivity alone is insufficient to accurately predict combinatorial response due to the presence of non-additive interactions, it could still serve as a guide to narrow down potential drug-combination targets for therapeutic objectives. Additivity could arise from the modularity of gene regulatory circuits, such that different pathways impact gene expression levels independently. While this principle has been studied on a small scale for a small number of drugs [105], our results suggest that such principles of superposition might hold at the transcriptome scale. Further work is needed to reveal the specific conditions where additivity holds or breaks down.

Cells are distributed control systems that modulate many internal processes to maintain homeostasis and execute cell-fate transitions. However, the principles of distributed control at the transcriptome scale are poorly understood. With D-SPIN, we showed that in K562 cells, perturbations triggered system-wide response strategies where distinct cellular processes are coordinated with long-range information flow. For example, knocking down translation initiation factors or mTOR pathway genes induces coherent down/up-regulation of proteasome/ribosome subunits, pointing to the presence of sensing and controlling mechanisms in the cell. The long-range coordination between pathways might be general, as we also observed in the drug profiling experiments that mTOR inhibitors like rapamycin activated EEF1A1, a key factor in translation. Identifying these feedback control points with combinatorial knockdown screening may lead to the discovery of new regulators or regulating mechanisms.

Our work on D-SPIN represents a significant advancement in constructing biologically interpretable predictive models of regulatory networks underlying cellular decision-making. The transparent graphical model architecture naturally captures interactions between genes and can be interpreted as pathways, circuits, or networks that delineate the flow of information. At the same time, the model can generate the full distribution of transcriptional states in cell populations across perturbation conditions. While low-dimensional embedding visualizations such as UMAP have become common practice in single-cell analysis for surveying cell type proportions and their shifts, D-SPIN constructs an interpretable circuit/pathway type of model that uncovers the internal regulatory connections controlling these shifts of the cell population in large-scale perturbation profiling. The simplicity of D-SPIN’s formulation originates from the maximum entropy principle [110].

Toward the ultimate goal of building a “digital twin” model of the cell, deep neural network models, especially transformer models, have been gaining popularity [8, 9, 10]. Similar to D-SPIN, these deep models can integrate information from large datasets and make predictions with accuracies that improve with more training data. However, in contrast to D-SPIN, their complex architectures and immense number of parameters provide limited mechanistic insight. D-SPIN directly identifies biological pathways and regulators from perturbation-based single-cell gene expression data. This interpretability has enabled us to uncover candidate regulators of K562 cell fate selection, as well as identify drug response pathways in myeloid cells that generate novel cell states under combinatorial drug conditions. Importantly, D-SPIN achieves this without requiring additional data sources such as ATAC-seq, making it particularly valuable as perturbation-based data becomes increasingly abundant through scaled condition-based barcoding strategies [29, 33].

D-SPIN’s connection with spin network models provides insights into the fundamental nature of cellular regulation. The spin network model is an equilibrium model used to study physical systems at or near thermal equilibrium. Theoretically, D-SPIN depicts the cell population as a collection of points residing in an energy landscape that can be tilted by the perturbation vector to shift the distribution of cell states in a cell population. Energy-landscape models represent a highly simplified class of dynamical systems, as their behavior can be captured within a single energy (potential) function. Equilibrium spin network models have been used to study a much broader range of systems that are far from thermal equilibrium, including neural networks and bird flocks [39, 40, 38]. However, it remains unclear why equilibrium models can yield such significant predictive power for strongly non-equilibrium systems. The ability of equilibrium models to produce low-error reconstructions of cell population gene-expression states suggests that cells, in certain situations, may be effectively modeled as equilibrium systems driven through various configurations by a biasing drive. Such models have been demonstrated for cell-fate regulation and might represent a simplifying principle [5]. The ability to model a cell as an equilibrium system driven through different states presents a powerful simplification, as also explored in other chemical systems, offering a potential path toward more global theories of gene regulation [111, 112].
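The picture of an energy landscape tilted by a perturbation vector can be illustrated on a toy network small enough to enumerate exactly. The sketch assumes an energy of the form E(s) = -Σ_{i<j} J_ij s_i s_j - Σ_i h_i s_i with three discrete expression levels per gene (-1, 0, 1) and P(s) ∝ exp(-E(s)). The symbol `h` for the perturbation vector, the discretization, and all numerical values are assumptions made for this illustration, not parameters inferred in the study.

```python
import itertools
import math


def state_distribution(J, h, levels=(-1, 0, 1)):
    """Exact Boltzmann distribution P(s) ∝ exp(-E(s)) for a small spin network.

    E(s) = -sum_{i<j} J[i][j] * s_i * s_j - sum_i h[i] * s_i
    """
    n = len(h)
    weights = {}
    for s in itertools.product(levels, repeat=n):
        pair = sum(J[i][j] * s[i] * s[j]
                   for i in range(n) for j in range(i + 1, n))
        field = sum(h[i] * s[i] for i in range(n))
        weights[s] = math.exp(pair + field)  # exp(-E(s))
    z = sum(weights.values())  # partition function
    return {s: w / z for s, w in weights.items()}


def mean_expression(dist):
    """Average state of each gene under the distribution."""
    n = len(next(iter(dist)))
    return [sum(p * s[i] for s, p in dist.items()) for i in range(n)]


# Two mutually activating genes (hypothetical coupling strength).
J = [[0.0, 1.0], [1.0, 0.0]]
control = mean_expression(state_distribution(J, h=[0.0, 0.0]))
# A perturbation that pushes gene 0 down, e.g. a knockdown-like bias.
perturbed = mean_expression(state_distribution(J, h=[-2.0, 0.0]))
```

In this toy case the perturbation acts directly only on gene 0, yet the mean state of gene 1 also decreases because of the positive coupling, which is the sense in which tilting the landscape shifts the whole distribution of states.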

Limitations of the Study

The D-SPIN framework has several current limitations that represent targets for extension in future work. First, for simplicity and interpretability, D-SPIN only considers pairwise interactions between genes or gene programs in the network. These interactions correspond to the second-order terms in the energy function. The inclusion of higher-order multi-body interactions or concentration-dependent regulatory responses would enhance accuracy and predictive capabilities [113, 114, 115, 116]. Second, D-SPIN is an equilibrium model and does not account for the dynamics of the system. Spin network models have natural extensions that incorporate dynamics [117, 118, 119]. Incorporating temporal dynamics would allow the simulation of directed networks, as directed network models in general cannot define consistent stationary distributions when the network contains feedback loops [58, 57, 59]. Third, D-SPIN assumes that the interactions within the core regulatory network J are not altered by the perturbations, which is a reasonable approximation in perturbation response scenarios such as gene knockdown or small molecule action. However, in scenarios such as cellular differentiation or disease progression, the regulatory network may undergo changes under different conditions due to shifts in epigenetic regulation. Epigenetic reorganization can be included in future versions of D-SPIN by allowing interactions encoded in J to also be condition-dependent.

Supplementary Material

Supplement 1

Supplementary Tables

Supplementary tables are available at Caltech Research Data Repository https://doi.org/10.22002/1pxgr-eqa61

Acknowledgements

We would like to acknowledge the NIH (TR01 GM150125, R01HD100039), the Heritage Medical Research Institute, Charles Trimble, the Shurl and Kay Curci Foundation, the Merkin Institute for Translational Research, 10x Genomics, Amgen, and Chan Zuckerberg Initiative. The single-cell profiling experiments were performed at the Beckman Institute Single-cell Profiling and Engineering Center (SPEC). We acknowledge the Protein Expression Center at the Beckman Institute of Caltech for the use of the liquid-handling robot. Sequencing was performed at the UCSF CAT, supported by UCSF PBBR, RRP IMIA, and NIH 1S10OD028511–01 grants. We acknowledge Dr. Guy Riddihough, Dr. Ariane Helou, and Dr. Jenna Sternberg for editorial assistance with the manuscript. DAS acknowledges the support of an NSERC Discovery Grant and a Tier-II Canada Research Chair. We acknowledge Rob Phillips and Venkat Chandrasekaran for insightful scientific discussions. JH acknowledges the support of NIH 5R01GM135337 grants.

Declaration of Interests

The work was supported in part by funds from a Caltech Amgen Research Collaboration Award and reagent gifts from 10x Genomics. MWT received funding from Adaptive Biotechnologies for unrelated work. MWT is a member of the advisory board of Cell Systems. MWT is a co-founder of CognitiveAI and YurtsAI. ZJG is a co-founder of Scribe Biosciences. The Regents of the University of California with ZJG and CSM as inventors have filed patent applications related to MULTI-seq.

Data and code availability

Gene counts and metadata of drug profiling experiments are available at Caltech Research Data Repository https://doi.org/10.22002/2cjss-wgh69

Supplementary data of the analyses are available at Caltech Research Data Repository https://doi.org/10.22002/tbmca-wbj97

MATLAB, Python implementations and Jupyter Notebook demonstrations of D-SPIN are available on GitHub https://github.com/JialongJiang/DSPIN

References

  • [1].Bray Dennis. Protein molecules as computational elements in living cells. Nature, 376(6538):307–312, 1995. [DOI] [PubMed] [Google Scholar]
  • [2].Regev Aviv and Shapiro Ehud. The π-calculus as an abstraction for biomolecular systems. In Modelling in Molecular Biology, pages 219–266. Springer, 2004. [Google Scholar]
  • [3].Regev Aviv and Shapiro Ehud. Cellular abstractions: Cells as computation. Nature, 419(6905):343–343, 2002. [DOI] [PubMed] [Google Scholar]
  • [4].Sivak David A and Thomson Matt. Environmental statistics and optimal regulation. PLoS computational biology, 10(9):e1003826, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Sokolik Cameron, Liu Yanxia, Bauer David, McPherson Jade, Broeker Michael, Heimberg Graham, Qi Lei S, Sivak David A, and Thomson Matt. Transcription factor competition allows embryonic stem cells to distinguish authentic signals from noise. Cell systems, 1(2):117–129, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Kueh Hao Yuan, Champhekar Ameya, Nutt Stephen L, Elowitz Michael B, and Rothenberg Ellen V. Positive feedback between pu.1 and the cell cycle controls myeloid differentiation. Science, 341(6146):670–673, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Yosef Nir, Shalek Alex K, Gaublomme Jellert T, Jin Hulin, Lee Youjin, Awasthi Amit, Wu Chuan, Karwacz Katarzyna, Xiao Sheng, Jorgolli Marsela, et al. Dynamic regulatory network controlling th17 cell differentiation. Nature, 496(7446):461–468, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Bunne Charlotte, Roohani Yusuf, Rosen Yanay, Gupta Ankit, Zhang Xikun, Roed Marcel, Alexandrov Theo, AlQuraishi Mohammed, Brennan Patricia, Burkhardt Daniel B, et al. How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell, 187(25):7045–7063, 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Heimberg Graham, Kuo Tony, DePianto Daryle J, Salem Omar, Heigl Tobias, Diamant Nathaniel, Scalia Gabriele, Biancalani Tommaso, Turley Shannon J, Rock Jason R, et al. A cell atlas foundation model for scalable search of similar human cells. Nature, pages 1–3, 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Cui Haotian, Wang Chloe, Maan Hassaan, Pang Kuan, Luo Fengning, Duan Nan, and Wang Bo. scgpt: toward building a foundation model for single-cell multi-omics using generative ai. Nature Methods, 21(8):1470–1480, 2024. [DOI] [PubMed] [Google Scholar]
  • [11].Davidson Eric H. The regulatory genome: gene regulatory networks in development and evolution. Elsevier, 2010. [Google Scholar]
  • [12].Davidson Eric H. Emerging properties of animal gene regulatory networks. Nature, 468(7326):911–920, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].Davidson Eric H and Erwin Douglas H. Gene regulatory networks and the evolution of animal body plans. Science, 311(5762):796–800, 2006. [DOI] [PubMed] [Google Scholar]
  • [14].Bintu Lacramioara, Buchler Nicolas E, Garcia Hernan G, Gerland Ulrich, Hwa Terence, Kondev Jané, and Phillips Rob. Transcriptional regulation by the numbers: models. Current opinion in genetics & development, 15(2):116–124, 2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Monod Jacques. The growth of bacterial cultures. Annual review of microbiology, 3(1):371–394, 1949. [Google Scholar]
  • [16].Pardee Arthur B, Jacob François, and Monod Jacques. The genetic control and cytoplasmic expression of “inducibility” in the synthesis of β-galactosidase by e. coli. Journal of Molecular Biology, 1(2):165–178, 1959. [Google Scholar]
  • [17].Shen-Orr Shai S, Milo Ron, Mangan Shmoolik, and Alon Uri. Network motifs in the transcriptional regulation network of escherichia coli. Nature genetics, 31(1):64–68, 2002. [DOI] [PubMed] [Google Scholar]
  • [18].Milo Ron, Shen-Orr Shai, Itzkovitz Shalev, Kashtan Nadav, Chklovskii Dmitri, and Alon Uri. Network motifs: simple building blocks of complex networks. Science, 298(5594):824–827, 2002. [DOI] [PubMed] [Google Scholar]
  • [19].Wagner Günter P, Pavlicev Mihaela, and Cheverud James M. The road to modularity. Nature Reviews Genetics, 8(12):921–931, 2007. [DOI] [PubMed] [Google Scholar]
  • [20].Lee Tong Ihn, Rinaldi Nicola J, Robert François, Odom Duncan T, Bar-Joseph Ziv, Gerber Georg K, Hannett Nancy M, Harbison Christopher T, Thompson Craig M, Simon Itamar, et al. Transcriptional regulatory networks in saccharomyces cerevisiae. science, 298(5594):799–804, 2002. [DOI] [PubMed] [Google Scholar]
  • [21].Hnisz Denes, Abraham Brian J, Lee Tong Ihn, Lau Ashley, Saint-André Violaine, Sigova Alla A, Hoke Heather A, and Young Richard A. Super-enhancers in the control of cell identity and disease. Cell, 155(4):934–947, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Harbison Christopher T, Gordon D Benjamin, Lee Tong Ihn, Rinaldi Nicola J, Macisaac Kenzie D, Danford Timothy W, Hannett Nancy M, Tagne Jean-Bosco, Reynolds David B, Yoo Jane, et al. Transcriptional regulatory code of a eukaryotic genome. Nature, 431(7004):99–104, 2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [23].Maerkl Sebastian J and Quake Stephen R. A systems approach to measuring the binding energy landscapes of transcription factors. Science, 315(5809):233–237, 2007. [DOI] [PubMed] [Google Scholar]
  • [24].Hu Zhanzhi, Killion Patrick J, and Iyer Vishwanath R. Genetic reconstruction of a functional transcriptional regulatory network. Nature genetics, 39(5):683–687, 2007. [DOI] [PubMed] [Google Scholar]
  • [25].Huang Linda S and Sternberg Paul W. Genetic dissection of developmental pathways. Methods in cell biology, 48:97–122, 1995. [DOI] [PubMed] [Google Scholar]
  • [26].Ferguson Edwin L, Sternberg Paul W, and Horvitz H Robert. A genetic pathway for the specification of the vulval cell lineages of caenorhabditis elegans. Nature, 326(6110):259–267, 1987. [DOI] [PubMed] [Google Scholar]
  • [27].Yuh Chiou-Hwa, Bolouri Hamid, and Davidson Eric H. Genomic cis-regulatory logic: experimental and computational analysis of a sea urchin gene. Science, 279(5358):1896–1902, 1998. [DOI] [PubMed] [Google Scholar]
  • [28].Dixit Atray, Parnas Oren, Li Biyu, Chen Jenny, Fulco Charles P, Jerby-Arnon Livnat, Marjanovic Nemanja D, Dionne Danielle, Burks Tyler, Raychowdhury Raktima, et al. Perturb-seq: dissecting molecular circuits with scalable single-cell rna profiling of pooled genetic screens. cell, 167(7):1853– 1866, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [29].Replogle Joseph M, Saunders Reuben A, Pogson Angela N, Hussmann Jeffrey A, Lenail Alexander, Guna Alina, Mascibroda Lauren, Wagner Eric J, Adelman Karen, Lithwick-Yanai Gila, et al. Mapping information-rich genotype-phenotype landscapes with genome-scale perturb-seq. Cell, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].McGinnis Christopher S, Patterson David M, Winkler Juliane, Conrad Daniel N, Hein Marco Y, Srivastava Vasudha, Hu Jennifer L, Murrow Lyndsay M, Weissman Jonathan S, Werb Zena, et al. Multi-seq: sample multiplexing for single-cell rna sequencing using lipid-tagged indices. Nature methods, 16(7):619–626, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [31].Gehring Jase, Park Jong Hwee, Chen Sisi, Thomson Matthew, and Pachter Lior. Highly multiplexed single-cell rna-seq by dna oligonucleotide tagging of cellular proteins. Nature Biotechnology, 38(1):35–38, 2020. [DOI] [PubMed] [Google Scholar]
  • [32].Srivatsan Sanjay R, McFaline-Figueroa José L, Ramani Vijay, Saunders Lauren, Cao Junyue, Packer Jonathan, Pliner Hannah A, Jackson Dana L, Daza Riza M, Christiansen Lena, et al. Massively multiplex chemical transcriptomics at single-cell resolution. Science, 367(6473):45–51, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [33].Zhang Jesse, Ubas Airol A, de Borja Richard, Svensson Valentine, Thomas Nicole, Thakar Neha, Lai Ian, Winters Aidan, Khan Umair, Jones Matthew G, et al. Tahoe-100m: A giga-scale single-cell perturbation atlas for context-dependent gene function and cellular modeling. bioRxiv, pages 2025–02, 2025. [Google Scholar]
  • [34].Vân Anh Huynh-Thu, Irrthum Alexandre, Wehenkel Louis, and Geurts Pierre. Inferring regulatory networks from expression data using tree-based methods. PloS one, 5(9):e12776, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [35].Moerman Thomas, Santos Sara Aibar, González-Blas Carmen Bravo, Simm Jaak, Moreau Yves, Aerts Jan, and Aerts Stein. Grnboost2 and arboreto: efficient and scalable inference of gene regulatory networks. Bioinformatics, 35(12):2159–2161, 2019. [DOI] [PubMed] [Google Scholar]
  • [36].Chan Thalia E, Stumpf Michael PH, and Babtie Ann C. Gene regulatory network inference from single-cell data using multivariate information measures. Cell systems, 5(3):251–267, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [37].Pratapa Aditya, Jalihal Amogh P, Law Jeffrey N, Bharadwaj Aditya, and Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nature methods, 17(2):147–154, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [38].Hopfield John J. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the national academy of sciences, 79(8):2554–2558, 1982. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [39].Schneidman Elad, Berry Michael J, Segev Ronen, and Bialek William. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087):1007–1012, 2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [40].Bialek William, Cavagna Andrea, Giardina Irene, Mora Thierry, Silvestri Edmondo, Viale Massimiliano, and Aleksandra M Walczak. Statistical mechanics for natural flocks of birds. Proceedings of the National Academy of Sciences, 109(13):4786–4791, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [41].Santhanam Narayana P and Wainwright Martin J. Information-theoretic limits of selecting binary graphical models in high dimensions. IEEE Transactions on Information Theory, 58(7):4117–4134, 2012. [Google Scholar]
  • [42].Lang Alex H, Li Hu, Collins James J, and Mehta Pankaj. Epigenetic landscapes explain partially re-programmed cells and identify key reprogramming genes. PLoS computational biology, 10(8):e1003734, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [43].Teschendorff Andrew E and Feinberg Andrew P. Statistical mechanics meets single-cell biology. Nature Reviews Genetics, 22(7):459–476, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [44].Margolin Adam A, Nemenman Ilya, Basso Katia, Wiggins Chris, Stolovitzky Gustavo, Favera Riccardo Dalla, and Califano Andrea. Aracne: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. In BMC bioinformatics, volume 7, pages 1–15. Springer, 2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [45].Besag Julian. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B (Methodological), 36(2):192–225, 1974. [Google Scholar]
  • [46].Aurell Erik and Ekeberg Magnus. Inverse ising inference using all the data. Physical review letters, 108(9):090201, 2012. [DOI] [PubMed] [Google Scholar]
  • [47].Ravikumar Pradeep, Wainwright Martin J, and Lafferty John D. High-dimensional ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, pages 1287–1319, 2010. [Google Scholar]
  • [48].Nguyen H Chau, Zecchina Riccardo, and Berg Johannes. Inverse statistical problems: from the inverse ising problem to data science. Advances in Physics, 66(3):197–261, 2017. [Google Scholar]
  • [49].Roguev Assen, Bandyopadhyay Sourav, Zofall Martin, Zhang Ke, Fischer Tamas, Collins Sean R, Qu Hongjing, Shales Michael, Park Han-Oh, Hayles Jacqueline, et al. Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast. science, 322(5900):405–410, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [50].Segal Eran, Shapira Michael, Regev Aviv, Pe’er Dana, Botstein David, Koller Daphne, and Friedman Nir. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature genetics, 34(2):166–176, 2003. [DOI] [PubMed] [Google Scholar]
  • [51].Chen Xiaoqiao, Chen Sisi, and Thomson Matt. Minimal gene set discovery in single-cell mrna-seq datasets with activesvm. Nature Computational Science, 2(6):387–398, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [52].Choi Seungjin. Algorithms for orthogonal nonnegative matrix factorization. In 2008 ieee international joint conference on neural networks (ieee world congress on computational intelligence), pages 1828–1832. IEEE, 2008. [Google Scholar]
  • [53].Vavasis Stephen A. On the complexity of nonnegative matrix factorization. SIAM Journal on Optimization, 20(3):1364–1377, 2010. [Google Scholar]
  • [54].Heimberg Graham, Bhatnagar Rajat, El-Samad Hana, and Thomson Matt. Low dimensionality in gene expression data enables the accurate extraction of transcriptional programs from shallow sequencing. Cell systems, 2(4):239–250, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [55].Chen Sisi, Rivaud Paul, Park Jong H, Tsou Tiffany, Charles Emeric, Haliburton John R, Pichiorri Flavia, and Thomson Matt. Dissecting heterogeneous cell populations across drug and disease conditions with popalign. Proceedings of the National Academy of Sciences, 117(46):28784–28794, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [56].Krumsiek Jan, Marr Carsten, Schroeder Timm, and Theis Fabian J. Hierarchical differentiation of myeloid progenitors is encoded in the transcription factor network. PloS one, 6(8):e22649, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [57].Pearl Judea. Reverend bayes on inference engines: A distributed hierarchical approach. In Probabilistic and Causal Inference: The Works of Judea Pearl, pages 129–138. ACM, 2022. [Google Scholar]
  • [58].Pearl Judea. Evidential reasoning using stochastic simulation of causal models. Artificial intelligence, 32(2):245–257, 1987. [Google Scholar]
  • [59].Arnold Barry C and Press S James. Compatible conditional distributions. Journal of the American Statistical Association, 84(405):152–156, 1989. [Google Scholar]
  • [60].Hartwell Leland H, Hopfield John J, Leibler Stanislas, and Murray Andrew W. From molecular to modular cell biology. Nature, 402(Suppl 6761):C47–C52, 1999. [DOI] [PubMed] [Google Scholar]
  • [61].Jiang Jialong, Sivak David A, and Thomson Matt. Active learning of spin network models. arXiv preprint arXiv:1903.10474, 2019. [Google Scholar]
  • [62].Newman Mark. Networks. Oxford university press, 2018. [Google Scholar]
  • [63].Barabási Albert-László. Scale-free networks: a decade and beyond. science, 325(5939):412–413, 2009. [DOI] [PubMed] [Google Scholar]
  • [64].Broido Anna D and Clauset Aaron. Scale-free networks are rare. Nature communications, 10(1):1017, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [65].Sherman Brad T, Hao Ming, Qiu Ju, Jiao Xiaoli, Baseler Michael W, Lane H Clifford, Imamichi Tomozumi, and Chang Weizhong. David: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res, 10, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [66].Kuleshov Maxim V, Jones Matthew R, Rouillard Andrew D, Fernandez Nicolas F, Duan Qiaonan, Wang Zichen, Koplev Simon, Jenkins Sherry L, Jagodnik Kathleen M, Lachmann Alexander, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic acids research, 44(W1):W90–W97, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [67].Szklarczyk Damian, Gable Annika L, Nastou Katerina C, Lyon David, Kirsch Rebecca, Pyysalo Sampo, Doncheva Nadezhda T, Legeay Marc, Fang Tao, Bork Peer, et al. The string database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic acids research, 49(D1):D605–D612, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [68].de Almeida Melanie, Hinterndorfer Matthias, Brunner Hanna, Grishkovskaya Irina, Singh Kashish, Schleiffer Alexander, Jude Julian, Deswal Sumit, Kalis Robert, Vunjak Milica, et al. Akirin2 controls the nuclear import of proteasomes in vertebrates. Nature, 599(7885):491–496, 2021. [DOI] [PubMed] [Google Scholar]
  • [69].Schraivogel Daniel, Gschwind Andreas R, Milbank Jennifer H, Leonce Daniel R, Jakob Petra, Mathur Lukas, Korbel Jan O, Merten Christoph A, Velten Lars, and Steinmetz Lars M . Targeted perturb-seq enables genome-scale genetic screens in single cells. Nature methods, 17(6):629–635, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [70].Kamimoto Kenji, Stringa Blerta, Hoffmann Christy M, Jindal Kunal, Solnica-Krezel Lilianna, and Morris Samantha A. Dissecting cell identity via network inference and in silico gene perturbation. Nature, 614(7949):742–751, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [71].Prill Robert J, Marbach Daniel, Saez-Rodriguez Julio, Sorger Peter K, Alexopoulos Leonidas G, Xue Xiaowei, Clarke Neil D, Altan-Bonnet Gregoire, and Stolovitzky Gustavo. Towards a rigorous assessment of systems biology models: the dream3 challenges. PloS one, 5(2):e9202, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [72].Seçilmiş Deniz, Hillerton Thomas, Tjärnberg Andreas, Nelander Sven, Nordling Torbjörn EM, and Sonnhammer Erik LL. Knowledge of the perturbation design is essential for accurate gene regulatory network inference. Scientific reports, 12(1):16531, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [73].González-Blas Carmen Bravo, De Winter Seppe, Hulselmans Gert, Hecker Nikolai, Matetovici Irina, Christiaens Valerie, Poovathingal Suresh, Wouters Jasper, Aibar Sara, and Aerts Stein. Scenic+: single-cell multiomic inference of enhancers and gene regulatory networks. Nature methods, 20(9):1355–1367, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [74].Hastie Trevor, Tibshirani Robert, Friedman Jerome H, and Friedman Jerome H. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009. [Google Scholar]
  • [75].Traag Vincent A, Waltman Ludo, and Van Eck Nees Jan. From louvain to leiden: guaranteeing well-connected communities. Scientific reports, 9(1):5233, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [76].Schüler Andrea, Schwieger Maike, Engelmann Afra, Weber Kristoffer, Horn Stefan, Müller Ursula, Arnold Michael A, Olson Eric N, and Stocking Carol. The mads transcription factor mef2c is a pivotal modulator of myeloid cell fate. Blood, The Journal of the American Society of Hematology, 111(9):4532–4541, 2008. [DOI] [PubMed] [Google Scholar]
  • [77].Zahran Zeinab Albadry M, Gu Xiaorong, Jha Babal K, and Saunthararajah Yogenthiran. Bcr-abl1 dislocates npm1, pu. 1 and klf1 into cytoplasm to thereby skew granulo-monocytic and impede erythroid differentiation. Blood, 142:1375, 2023. [Google Scholar]
  • [78].Milanese Chiara, Bombardieri Cíntia R, Sepe Sara, Barnhoorn Sander, Payán-Goméz César, Caruso Donatella, Audano Matteo, Pedretti Silvia, Vermeij Wilbert P, Brandt Renata MC, et al. Dna damage and transcription stress cause atp-mediated redesign of metabolism and potentiation of anti-oxidant buffering. Nature communications, 10(1):4887, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [79].Yang Miaomiao, Lu Yanming, Piao Weilan, and Jin Hua. The translational regulation in mtor pathway. Biomolecules, 12(6):802, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [80].Schwartz Daniella M, Kanno Yuka, Villarino Alejandro, Ward Michael, Gadina Massimo, and O’Shea John J. Jak inhibition as a therapeutic strategy for immune and inflammatory diseases. Nature reviews Drug discovery, 16(12):843–862, 2017. [DOI] [PubMed] [Google Scholar]
  • [81].Zhou Wen, Yui Mary A, Williams Brian A, Yun Jina, Wold Barbara J, Cai Long, and Rothenberg Ellen V . Single-cell analysis reveals regulatory gene expression dynamics leading to lineage commitment in early t cell development. Cell systems, 9(4):321–337, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [82].Van de Sande Bram, Lee Joon Sang, Mutasa-Gottgens Euphemia, Naughton Bart, Bacon Wendi, Manning Jonathan, Wang Yong, Pollard Jack, Mendez Melissa, Hill Jon, et al. Applications of single-cell rna sequencing in drug discovery and development. Nature Reviews Drug Discovery, pages 1–25, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [83].Conde C Domínguez, Xu C, Jarvis LB, Rainbow DB, Wells SB, Gomes T, Howlett SK, Suchanek O, Polanski K, King HW, et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science, 376(6594):eabl5197, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [84].Desgeorges Thibaut, Caratti Giorgio, Mounier Rémi, Tuckermann Jan, and Chazaud Bénédicte. Glucocorticoids shape macrophage phenotype for tissue repair. Frontiers in immunology, 10:1591, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [85].Liu Yan-Cun, Zou Xian-Biao, Chai Yan-Fen, and Yao Yong-Ming. Macrophage polarization in inflammatory diseases. International journal of biological sciences, 10(5):520, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [86].Skuta Ctibor, Popr Martin, Muller Tomas, Jindrich Jindrich, Kahle Michal, Sedlak David, Svozil Daniel, and Bartunek Petr. Probes & drugs portal: an interactive, open data resource for chemical biology. Nature methods, 14(8):759–760, 2017. [DOI] [PubMed] [Google Scholar]
  • [87].Wishart David S, Knox Craig, Guo An Chi, Shrivastava Savita, Hassanali Murtaza, Stothard Paul, Chang Zhan, and Woolsey Jennifer. Drugbank: a comprehensive resource for in silico drug discovery and exploration. Nucleic acids research, 34(suppl 1):D668–D672, 2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [88].Manning Gerard, Whyte David B, Martinez Ricardo, Hunter Tony, and Sudarsanam Sucha. The protein kinase complement of the human genome. Science, 298(5600):1912–1934, 2002. [DOI] [PubMed] [Google Scholar]
  • [89].Chen Mark J, Dixon Jack E, and Manning Gerard. Genomics and evolution of protein phosphatases. Science signaling, 10(474):eaag1796, 2017. [DOI] [PubMed] [Google Scholar]
  • [90].Lowell Clifford A. Src-family kinases: rheostats of immune cell signaling. Molecular immunology, 41(6–7):631–643, 2004. [DOI] [PubMed] [Google Scholar]
  • [91].Lang Roland, Hammer Michael, and Mages Jörg. Dusp meet immunology: dual specificity mapk phosphatases in control of the inflammatory response. The Journal of Immunology, 177(11):7497–7504, 2006. [DOI] [PubMed] [Google Scholar]
  • [92].Gans Ian, Hartig Ellen I, Zhu Shusen Tilden Andrea R, Hutchins Lucie N, Maki Nathaniel J, Graber Joel H, and Coffman James A. Klf9 is a key feedforward regulator of the transcriptomic response to glucocorticoid receptor activity. Scientific reports, 10(1):11415, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [93].Yang Heng, Xia Lin, Chen Jian, Zhang Shuqing, Martin Vincent, Li Qingqing, Lin Shangqing, Chen Jinfeng, Calmette Joseph, Lu Min, et al. Stress–glucocorticoid–tsc22d3 axis compromises therapy-induced antitumor immunity. Nature medicine, 25(9):1428–1441, 2019. [DOI] [PubMed] [Google Scholar]
  • [94].Kim Hwijin. The transcription factor mafb promotes anti-inflammatory m2 polarization and cholesterol efflux in macrophages. Scientific reports, 7(1):7591, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [95].Gao Yin, Fang Peng, Li Wen-Jin, Zhang Jian, Wang Guang-Ping, Jiang Duan-Feng, and Chen Fang-Ping. Lncrna neat1 sponges mir-214 to regulate m2 macrophage polarization by regulation of b7-h3 in multiple myeloma. Molecular immunology, 117:20–28, 2020. [DOI] [PubMed] [Google Scholar]
  • [96].Hoppstädter Jessicaand Ammit Alaina J. Role of dual-specificity phosphatase 1 in glucocorticoid-driven anti-inflammatory responses. Frontiers in immunology, 10:1446, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [97].Clark Andrew R, Martins Joana RS, and Tchen Carmen R . Role of dual specificity phosphatases in biological responses to glucocorticoids. Journal of Biological Chemistry, 283(38):25765–25769, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [98].Wang Lian, Zhu Yanghui, Zhang Nan, Xian Yali, Tang Yu, Ye Jing, Reza Fekrazad, He Gu, Wen Xiang, and Jiang Xian. The multiple roles of interferon regulatory factor family in health and disease. Signal Transduction and Targeted Therapy, 9(1):282, 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [99].Romaus-Sanjurjo Daniel, Saikia Junmi M, Kim Hugo J, Tsai Kristen M, Le Geneva Q, and Zheng Binhai. Overexpressing eukaryotic elongation factor 1 alpha (eef1a) proteins to promote corticospinal axon repair after injury. Cell Death Discovery, 8(1):390, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [100].Yan Chunhong, Lu Dan, Hai Tsonwin, and Boyd Douglas D. Activating transcription factor 3, a stress sensor, activates p53 by blocking its ubiquitination. The EMBO journal, 24(13):2425–2435, 2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [101].Shu Jianfeng, Jiang Jinni, Wang Xiaofang, Yang Xuejie, Zhao Guofang, and Cai Ting. Mdm2 provides top2 poison resistance by promoting proteolysis of top2βcc in a p53-independent manner. Cell Death & Disease, 15(1):83, 2024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [102].Zaccara Sara, Tebaldi Toma, Pederiva C, Ciribilli Yari, Bisio Alessandra, and Inga Alberto. p53-directed translational control can shape and expand the universe of p53 target genes. Cell Death & Differentiation, 21(10):1522–1534, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [103].Yeo Constance Qiao Xin, Alexander Irina, Lin Zhaoru, Lim Shuhui, Aning Obed Akwasi, Kumar Ramesh, Sangthongpitag Kanda, Pendharkar Vishal, Ho Vincent HB, and Cheok Chit Fang. p53 maintains genomic stability by preventing interference between transcription and replication. Cell reports, 15(1):132–146, 2016. [DOI] [PubMed] [Google Scholar]
  • [104].Cheng Yung-Chih, Snavely Andrew, Barrett Lee B, Zhang Xuefei, Herman Crystal, Frost Devlin J, Riva Priscilla, Tochitsky Ivan, Kawaguchi Riki, Singh Bhagat, et al. Topoisomerase i inhibition and peripheral nerve injury induce dna breaks and atf3-associated axon regeneration in sensory neurons. Cell reports, 36(10), 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [105].Geva-Zatorsky Naama, Dekel Erez, Cohen Ariel A, Danon Tamar, Cohen Lydia, and Alon Uri. Protein dynamics in drug combinations: a linear superposition of individual-drug responses. Cell, 140(5):643– 651, 2010. [DOI] [PubMed] [Google Scholar]
  • [106].Yang Yanhua, Xia Shujun, Zhang Lu, Wang Wenhan, Chen Lin, and Zhan Weiwei. Mir-324–5p/ptprd/cebpd axis promotes papillary thyroid carcinoma progression via microenvironment alteration. Cancer Biology & Therapy, 21(6):522–532, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [107].Shi Yu, Zhang Bin, Zhu Jian, Huang Wu, Han Bin, Wang Qilong, Qi Chunjian, Wang Minghai, and Liu Fang. mir-106b-5p inhibits irf1/ifn-β signaling to promote m2 macrophage polarization of glioblastoma. OncoTargets and therapy, pages 7479–7492, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [108].Cannarile Michael A, Weisser Martin, Jacob Wolfgang, Jegg Anna-Maria, Ries Carola H, and Rüttinger Dominik. Colony-stimulating factor 1 receptor (csf1r) inhibitors in cancer therapy. Journal for immunotherapy of cancer, 5(1):53, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [109].Ma Ruo-Yu, Black Annabel, and Qian Bin-Zhi. Macrophage diversity in cancer revisited in the era of single-cell omics. Trends in immunology, 43(7):546–563, 2022. [DOI] [PubMed] [Google Scholar]
  • [110].Jaynes Edwin T. Information theory and statistical mechanics. Physical review, 106(4):620, 1957. [Google Scholar]
  • [111].Bintu Lacramioara, Buchler Nicolas E, Garcia Hernan G, Gerland Ulrich, Hwa Terence, Kondev Jane, Kuhlman Thomas, and Phillips Rob. Transcriptional regulation by the numbers: applications. Current opinion in genetics & development, 15(2):125–135, 2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [112].Gunawardena Jeremy. A linear framework for time-scale separation in nonlinear biochemical systems. PloS one, 7(5):e36321, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [113].Lotfollahi Mohammad, Wolf F Alexander, and Theis Fabian J . scgen predicts single-cell perturbation responses. Nature methods, 16(8):715–721, 2019. [DOI] [PubMed] [Google Scholar]
  • [114].Lotfollahi Mohammad, Susmelj Anna Klimovskaia, De Donno Carlo, Hetzel Leon, Ji Yuge, Ibarra Ignacio L, Srivatsan Sanjay R, Naghipourfar Mohsen, Daza Riza M, Martin Beth, et al. Predicting cellular responses to complex perturbations in high-throughput screens. Molecular Systems Biology, page e11517, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [115].Roohani Yusuf, Huang Kexin, and Leskovec Jure. Predicting transcriptional outcomes of novel multigene perturbations with gears. Nature Biotechnology, pages 1–9, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [116].Dong Mingze, Wang Bao, Wei Jessica, de O. Fonseca Antonio H, Perry Curtis J, Frey Alexander, Ouerghi Feriel, Foxman Ellen F, Ishizuka Jeffrey J, Dhodapkar Rahul M, et al. Causal identification of single-cell experimental perturbation effects with cinema-ot. Nature Methods, 20(11):1769–1779, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [117].Sompolinsky Haim and Zippelius Annette. Relaxational dynamics of the edwards-anderson model and the mean-field theory of spin-glasses. Physical Review B, 25(11):6860, 1982. [Google Scholar]
  • [118].Fisher Daniel S and Huse David A. Nonequilibrium dynamics of spin glasses. Physical Review B, 38(1):373, 1988. [DOI] [PubMed] [Google Scholar]
  • [119].Glauber Roy J. Time-dependent statistics of the ising model. Journal of mathematical physics, 4(2):294–307, 1963. [Google Scholar]

Data Availability Statement

Gene counts and metadata of the drug profiling experiments are available at the Caltech Research Data Repository (https://doi.org/10.22002/2cjss-wgh69).

Supplementary data of the analyses are available at the Caltech Research Data Repository (https://doi.org/10.22002/tbmca-wbj97).

MATLAB and Python implementations and Jupyter Notebook demonstrations of D-SPIN are available on GitHub (https://github.com/JialongJiang/DSPIN).


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
