Summary
Modeling cellular responses to genetic perturbations is a significant challenge in computational biology. Measuring all gene perturbations and their combinations across cell types and conditions is experimentally challenging, highlighting the need for predictive models that generalize across data types to support this task. Here we present MORPH, a MOdular framework for predicting Responses to Perturbational cHanges. MORPH combines a discrepancy-based variational autoencoder with an attention mechanism to predict cellular responses to unseen perturbations. It supports both single-cell transcriptomics and imaging outputs and can generalize to unseen perturbations, combinations of perturbations, and perturbations in new cellular contexts. The attention-based framework enables inference of gene interactions and regulatory networks, while the learned gene embeddings can guide the design of informative perturbations. Overall, we envision MORPH as a flexible tool for optimizing perturbation experiments, enabling efficient exploration of the perturbation space to advance understanding of cellular programs for fundamental research and therapeutic applications.
1. Introduction
Single-cell sequencing and imaging technologies have revolutionized our understanding of cellular heterogeneity, enabling detailed characterization of various cell states and functions at scale [1]. Coupled with CRISPR-based perturbation platforms and multiplexed barcoding strategies, these technologies now allow systematic, high-dimensional mapping of the heterogeneous cellular responses to individual genetic perturbations. Several methods combine genetic perturbation with single-cell RNA sequencing (scRNA-seq) as a readout for the gene expression changes [2-7], while optical pooled screens capture cellular phenotypes through imaging data [8-10]. Understanding these perturbation-induced effects not only provides insights into the structure of gene regulatory networks but also has significant implications for biomedical research, such as identifying potential therapeutic targets that could drive cells to a desired state [11-13] and optimizing combination therapies [14-16].
While the availability of single-cell perturbation data is growing, experimentally testing all individual gene perturbations, and their combinations, across all possible cell types, states, and disease contexts remains infeasible. Therefore, there is a need for computational methods that leverage the existing data to predict perturbation effects across different conditions and data types [17, 18]. Such models would be crucial for efficiently exploring the vast perturbational space and prioritizing the most informative experiments to perform.
Current state-of-the-art (SOTA) computational methods for this task [19, 20] perform well in approximating the average state of perturbed cells but struggle to capture the full distribution of cellular responses. These methods often randomly pair control cells (those receiving non-targeting guides) with perturbed cells to address a fundamental challenge of high-throughput sequencing-based perturbation datasets: the data are inherently unpaired [21]. Because these assays are destructive, the same cell cannot be measured both before and after perturbation. Additionally, existing methods often assume that the effects of gene perturbations are additive [22, 23], limiting their ability to capture complex, non-additive interactions. Moreover, current methods primarily focus on gene expression data and are not designed to handle the imaging modality [19, 20, 22, 23].
To address these challenges, we developed MORPH, a MOdular framework for predicting Responses to Perturbational cHanges. Its modular design enables MORPH to adapt to various data modalities, such as transcriptomic or imaging data. Given single-cell data on control cells and prior knowledge of gene perturbations, MORPH leverages a conditional variational autoencoder with an attention mechanism to predict the distribution of cellular responses to unseen perturbations. Using three single-cell Perturb-seq datasets [7] and different sources of prior knowledge, we demonstrated that MORPH can efficiently predict the effect of single-gene perturbations unseen during training. Moreover, it can transfer gene perturbation effects to an unseen cell line when fine-tuned only on control cells from that cell line. We also validated MORPH’s ability to predict the effect of unseen combinatorial perturbations on a dataset including single-gene and double-gene perturbations [5], and found that our model effectively detects the type of genetic interaction between the perturbed genes. Building on our theoretical work providing identifiability guarantees for causal effect estimation with unpaired data [24], we showed that MORPH’s learned attention mechanisms have a causal interpretation and can help infer regulatory mechanisms. In other words, MORPH learns functional relations among genes, and its learned embeddings can be used to identify perturbations that induce similar effects and genes that respond similarly to a perturbation. Finally, we evaluated MORPH’s modularity on an optical pooled screening dataset of HeLa cells infected with the Ebola virus [10], on which MORPH predicted the top perturbations altering the cell infection state. Across all these applications, MORPH outperformed the available baseline models.
Overall, our results highlight MORPH’s versatility and potential to guide the design of future perturbation experiments to enable more efficient and targeted perturbation screens.
2. Results
MORPH enables the prediction of cellular responses to genetic perturbations at single-cell level
MORPH is designed to predict the effect of a genetic perturbation on an individual cell. We represent a cell as a vector $x$ encoding any high-dimensional readout, such as gene expression profiles from RNA sequencing or morphological features extracted from imaging. We encode prior knowledge about a gene of interest (such as pathway associations, gene interactions, etc.) and its perturbation effect as an embedding vector $g$. This embedding allows MORPH to relate unseen perturbations to previously seen ones, based on the intuition that perturbations with similar biological properties tend to have similar effects. Formally, given a control cell $x$ and an embedding $g_i$ for gene $i$, MORPH learns a function $f$ that maps the pair $(x, g_i)$ to the predicted perturbed cell $\hat{x}_i$, i.e., $\hat{x}_i = f(x, g_i)$. For each gene perturbation $i$ in the dataset, MORPH compares the distribution of predicted perturbed cells $\hat{x}_i$ to the distribution of observed perturbed cells $\tilde{x}_i$, enabling it to learn the mapping from unpaired data.
To realize the mapping $f$, MORPH adopts a variational autoencoder (VAE)-based architecture. A control cell $x$ and a gene perturbation embedding vector $g_i$ are first embedded into latent spaces using two separate encoders. To enhance interpretability and reliability, these latent representations are passed through an attention mechanism that guides the model to focus on the most relevant aspects of the input data, before being decoded into the predicted perturbed cell $\hat{x}_i$. All encoders and decoders are implemented as multi-layer perceptrons (MLPs) (Figure 1a, Methods). The model is trained using a variational lower bound on the control cell $x$ and a maximum mean discrepancy between the distribution of predicted perturbed cells $\hat{x}_i$ and the distribution of observed perturbed cells $\tilde{x}_i$ (Methods).
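The training objective described above can be illustrated with a minimal NumPy sketch. It assumes a diagonal-Gaussian encoder with a standard-normal prior, a squared-error reconstruction term, an RBF kernel with a single fixed bandwidth for the MMD, and hypothetical weights `beta` and `lam`; none of these choices is specified in this section, the function names are illustrative, and the actual objective is defined in Methods.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), averaged over cells."""
    return float(np.mean(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)))

def rbf_mmd2(a, b, bandwidth=1.0):
    """Biased estimate of squared MMD between samples a (n x d) and b (m x d).
    The single fixed RBF bandwidth is an assumption of this sketch."""
    def kernel(u, v):
        sq = np.sum(u**2, 1)[:, None] + np.sum(v**2, 1)[None, :] - 2.0 * u @ v.T
        return np.exp(-sq / (2.0 * bandwidth**2))
    return float(kernel(a, a).mean() + kernel(b, b).mean() - 2.0 * kernel(a, b).mean())

def morph_style_loss(x_ctrl, x_ctrl_recon, mu, logvar, x_pred, x_pert, lam=1.0, beta=1.0):
    """Hypothetical objective: negative ELBO on control cells
    plus lam * MMD between predicted and observed perturbed cells."""
    recon = float(np.mean(np.sum((x_ctrl - x_ctrl_recon) ** 2, axis=1)))
    neg_elbo = recon + beta * gaussian_kl(mu, logvar)
    return neg_elbo + lam * rbf_mmd2(x_pred, x_pert)
```

The MMD term compares a batch of predicted cells for perturbation $i$ with the observed cells for the same perturbation as two sets, so it does not require knowing which control cell corresponds to which perturbed cell.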
Fig. 1. Overview of MORPH and its applications.
a, Model architecture. For each pair of control cell and genetic perturbation, the model maps them into latent representations using separate encoders. In the latent space, attention modules dynamically identify the most relevant features for the given perturbation. The resulting attention output is then passed to a decoder to generate the predicted perturbed cell. We demonstrate MORPH on both transcriptomic and imaging modalities. b, Downstream applications. MORPH can be applied to a variety of tasks, including predicting unseen genetic effects, transferring perturbation effects across cell lines, predicting genetic interactions, designing perturbation experiments, and inferring gene regulatory networks.
The design of our attention mechanism was motivated by a previous study [2] that represented the effects of genetic perturbations as a bipartite regulatory network in which perturbations influence the expression of regulated genes. Clustering perturbations based on shared regulated genes reveals perturbation modules: groups of perturbations that produce similar cellular effects. Similarly, clustering genes based on their co-regulation by perturbations yields gene programs: sets of genes that respond in a coordinated way. These perturbation modules and gene programs reflect the underlying structure of gene regulation. MORPH integrates a cross-attention mechanism to capture the regulation from perturbations to gene programs (Methods). It constructs queries by concatenating the latent vectors obtained by encoding the control cell and the perturbation embedding. These queries attend to a learned gene-program matrix that serves as keys and values, producing attention scores that modulate the influence of each gene program on the final prediction (Figure 1a). This gene-program matrix is shared across all cells of the same type and remains fixed at inference time, under the assumption that gene programs are relatively stable within a given cell type, while their activation depends on the specific perturbation and the initial state of a cell.
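To make the query/key/value roles concrete, here is a minimal NumPy sketch of one cross-attention step over a gene-program matrix. It assumes single-head scaled dot-product attention with plain linear projections; the projection matrices `W_q`, `W_k`, `W_v`, the dimensions, and the function name are hypothetical and do not reflect MORPH's actual parameterization (see Methods).

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def program_cross_attention(z_cell, z_pert, programs, W_q, W_k, W_v):
    """
    z_cell:   (n, d_c) latent codes of control cells
    z_pert:   (n, d_p) latent codes of the perturbation embedding
    programs: (k, d_g) learned gene-program matrix shared by one cell type
    Returns the attention output (n, d_v) and attention weights (n, k).
    """
    query = np.concatenate([z_cell, z_pert], axis=1) @ W_q  # queries from cell + perturbation
    keys = programs @ W_k                                     # keys from gene programs
    values = programs @ W_v                                   # values from gene programs
    scores = query @ keys.T / np.sqrt(keys.shape[1])          # scaled dot products (n, k)
    weights = softmax(scores, axis=1)  # how strongly each program is engaged per cell
    return weights @ values, weights
```

Because `programs` does not depend on the input cell, the weights are the only place where the perturbation and the initial cell state change which programs are used, which mirrors the assumption stated above.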
These design choices enable MORPH to address fundamental questions about genetic perturbations. We demonstrate that MORPH can predict cellular responses to unseen perturbations in both transcriptomic and imaging data, highlighting its versatility in handling various data modalities. Moreover, it can transfer perturbation effects across cell types, extrapolate genetic interaction types, optimize the design of genetic perturbations, and facilitate the investigation of genetic regulatory networks (Figure 1b).
MORPH generalizes to predict the effect of unseen single-gene Perturb-seq experiments with high accuracy
We evaluated the ability of MORPH to predict cellular responses to unseen single-gene perturbations using single-cell Perturb-seq data from the three screens in [7]: experiments targeting essential genes in K562 cells (2,033 perturbations and over 310,000 cells) and RPE1 cells (2,264 perturbations and over 240,000 cells), as well as a genome-wide experiment in K562 cells (9,823 perturbations and over 1,900,000 cells). The model was trained separately on each dataset, and predictions were compared against held-out perturbations not seen during training (Figure 2a).
Fig. 2. Predicting responses to unseen single-gene perturbations.
a, Workflow of single-gene perturbation predictions. Different sources of prior knowledge were explored, such as gene embedding vectors derived from single-cell foundation models and perturbation-specific databases like DepMap. b-c, Sources of prior knowledge (b) and models (c) were evaluated using a distributional distance (MMD), the average RMSE between the mean predicted and observed gene expression profiles, and the Pearson correlation between the mean predicted and observed gene expression changes relative to control. Metrics were calculated using the top 50 differentially expressed (DE) genes for each perturbation. Evaluations were performed across 5-fold cross-validation splits and outlier distribution splits. d, UMAPs of observed perturbed cells and cells predicted by MORPH (right) and by GEARS, the current state-of-the-art model that predicts single-cell-level responses (left). e-f, Spider plots summarizing the prediction accuracy obtained with each prior knowledge base across the top 8 enriched gene sets. These gene sets were identified through gene set enrichment analysis on the union of genes for which one prior outperformed the other. Prediction accuracy was derived from $L$, the mean prediction loss (MMD) for each gene set under a given prior, normalized by $L_{\max}$, the highest mean prediction loss across all gene sets, so that lower losses correspond to higher accuracy.
We tested the model on two types of data splits (Figure 2b-c and Supplemental Figure 1). The first uses a standard five-fold cross-validation, where the dataset was divided into five equal sets of perturbations. In each iteration, one set of perturbations served as the test set, and the remaining four were used for training. This process was repeated for all five folds, and the performance metrics were averaged to evaluate the model. The second, termed the “outlier distribution split”, specifically included perturbations producing phenotypes that were most distinct from the control distribution in the test split. This was achieved by performing Leiden clustering on the pseudo-bulk profiles of each perturbation and identifying the clusters whose centers were farthest from the cluster that contained control, measured by Euclidean distance between averaged bulk expressions. This split was designed to assess the ability of the model to generalize to unseen perturbations that induce significantly distinct profiles, which are more reflective of real-world scenarios (Supplemental Figure 1).
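The outlier distribution split can be sketched as follows. This is an illustration under assumptions: cluster labels are taken as given (in the text they come from Leiden clustering, which is not reimplemented here), whole clusters are moved to the test set greedily from farthest to nearest, and the 20% test fraction and function name are hypothetical.

```python
import numpy as np

def outlier_distribution_split(pseudobulk, cluster_labels, control_cluster, test_fraction=0.2):
    """
    pseudobulk:      (n_perturbations, n_genes) mean expression per perturbation
    cluster_labels:  (n_perturbations,) cluster id per perturbation (assumed precomputed)
    control_cluster: id of the cluster containing the control profile
    Returns a boolean mask marking test perturbations.
    """
    labels = np.asarray(cluster_labels)
    centers = {c: pseudobulk[labels == c].mean(axis=0) for c in np.unique(labels)}
    ref = centers[control_cluster]
    others = [c for c in centers if c != control_cluster]
    # Rank clusters by Euclidean distance of their centre to the control cluster centre.
    ranked = sorted(others, key=lambda c: np.linalg.norm(centers[c] - ref), reverse=True)
    target = int(np.ceil(test_fraction * len(labels)))
    test = np.zeros(len(labels), dtype=bool)
    for c in ranked:  # add the most distant clusters until the test set is large enough
        if test.sum() >= target:
            break
        test |= labels == c
    return test
```

Moving entire clusters, rather than individual perturbations, keeps every test perturbation far from anything resembling it in training, which is what makes this split harder than random cross-validation.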
To assess performance, we used three primary metrics (Figure 2b-c, Methods). Root mean squared error (RMSE) was used to quantify the mean difference between predictions and observations. Pearson correlation was employed to measure the agreement between the predicted and observed mean changes relative to control cells. In addition to these commonly reported metrics, we calculated the maximum mean discrepancy (MMD) to evaluate distributional differences between predicted and observed gene expression in individual perturbed cells. Lower MMD values indicate closer alignment between the two cell distributions. This metric offers more fine-grained information than the first two metrics, which compare only mean gene expression. For completeness, we also reported the fraction of genes for which the predicted direction of mean changes relative to the control matches the ground truth, as shown in Supplemental Figures 2-3. We calculated these metrics using the top 50 differentially expressed (DE) genes for each perturbation for a more focused evaluation (Figure 2 and Supplemental Figure 2) as well as using all genes (Supplemental Figure 3).
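The mean-level metrics can be sketched in a few lines. As a simplifying assumption, the "top DE genes" here are the genes with the largest absolute change in observed mean expression relative to control, a stand-in for a proper differential expression test; the function names are hypothetical and the distributional MMD metric is omitted.

```python
import numpy as np

def top_de_genes(ctrl_mean, pert_mean, k=50):
    """Indices of the k genes with the largest absolute mean change (stand-in for a DE test)."""
    return np.argsort(-np.abs(pert_mean - ctrl_mean))[:k]

def mean_level_metrics(ctrl, pred, obs, k=50):
    """
    ctrl, pred, obs: (cells, genes) matrices of control, predicted and observed perturbed cells.
    Returns RMSE of the means, Pearson r of mean changes vs control, and direction agreement.
    """
    c, p, o = ctrl.mean(0), pred.mean(0), obs.mean(0)
    idx = top_de_genes(c, o, k)                      # evaluate on top-k genes only
    rmse = float(np.sqrt(np.mean((p[idx] - o[idx]) ** 2)))
    d_pred, d_obs = p[idx] - c[idx], o[idx] - c[idx]  # changes relative to control
    pearson = float(np.corrcoef(d_pred, d_obs)[0, 1])
    direction = float(np.mean(np.sign(d_pred) == np.sign(d_obs)))
    return {"rmse": rmse, "pearson_delta": pearson, "direction_agreement": direction}
```

All three quantities depend only on per-gene means, so two predictions with very different single-cell spread can score identically here; this is the gap that the MMD metric is meant to close.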
We experimented with four types of prior knowledge on genes for the perturbation embedding: control gene expression from unperturbed cells, gene embeddings derived from the Geneformer single-cell foundation model [25], language-based embeddings obtained by GenePT [26], and embeddings derived from the DepMap database [12] containing cell viability scores across cell lines (Figure 2a). Among these, DepMap embeddings consistently yielded the best performance, particularly in the outlier distribution split, a trend observed across different screens (Figure 2b and Supplemental Figures 2-3). This result is notable because DepMap is the only prior based directly on experimental perturbation data, highlighting the value of using functionally grounded information to improve generalization.
We compared our method using the DepMap prior to a range of existing methods, including a simple baseline that assumes no perturbation effects (Control distribution), as well as two state-of-the-art models: GEARS [19], a graph-based deep learning model, and a linear model [23], which we adapted to incorporate the DepMap embeddings as prior knowledge.1 Across all metrics, MORPH consistently outperformed the baselines (Figure 2c). Importantly, it also excelled in the outlier distribution split, demonstrating its ability to generalize to perturbations inducing highly distinct effects when compared to the control cells (Figure 2c and Supplemental Figure 2). UMAP visualizations [27] of perturbations with maximal perturbation effects in the test set further support this conclusion, showing that the predictions of MORPH aligned closely with the observed perturbed cell states (Figure 2d and Supplemental Figure 4).
As mentioned above, the choice of prior knowledge influenced prediction accuracy. To better understand this effect, we analyzed gene-specific performance differences (Figure 2e-f and Supplemental Figure 5). For each dataset, we identified target genes for which a specific prior outperformed others and performed gene set enrichment analysis on these subsets of perturbations (Methods). We then evaluated the performance of different priors among the significantly enriched gene sets. In the RPE1 cells, DepMap embeddings achieved the best performance across all significantly enriched gene sets (Figure 2e). GenePT language-based embeddings, particularly those incorporating information from both NCBI and UniProt, outperformed versions based solely on NCBI or STRING data (Figure 2e). In the K562 cells, DepMap embeddings also exhibited consistently strong performance, with two exceptions, “ribosomal small subunit biogenesis” and “maturation of SSU-rRNA”, for which GenePT embeddings (NCBI+UniProt) led to better performance (Figure 2f). This variation may reflect context-dependent differences in how prior knowledge informs predictions. To test whether combining multiple strong priors could improve performance, we implemented a mixture-of-experts (MoE) model that integrates the language-based and DepMap priors. While the MoE model did not show substantial improvement over individual priors on this dataset (Supplemental Table 1), it may offer more balanced performance across a broader range of genes, which could be advantageous in settings where consistency across various targets is important.
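This section does not describe the MoE architecture, so the following is only a generic soft-gating sketch of how two priors could be combined into one perturbation embedding. The linear experts, the linear gating network over the concatenated priors, and all names are assumptions for illustration, not the model used here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_perturbation_embedding(priors, projections, gate_weights):
    """
    priors:       list of (n, d_i) prior embeddings for the same n genes (e.g. DepMap, GenePT)
    projections:  list of (d_i, d) matrices mapping each prior into a shared space (hypothetical)
    gate_weights: (sum of d_i, n_experts) weights of a hypothetical linear gating network
    Returns combined (n, d) embeddings and (n, n_experts) gate values.
    """
    experts = np.stack([p @ W for p, W in zip(priors, projections)], axis=1)  # (n, E, d)
    gates = softmax(np.concatenate(priors, axis=1) @ gate_weights)             # (n, E)
    return np.einsum("ne,ned->nd", gates, experts), gates
```

With per-gene gates, the model can in principle lean on the language-based prior for gene sets such as ribosomal biogenesis and on DepMap elsewhere, which is the kind of balance the paragraph above anticipates.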
In summary, we demonstrated that MORPH outperforms other methods in predicting cellular responses to unseen single-gene perturbations, achieving superior results across key metrics such as MMD, RMSE, and Pearson correlation. DepMap embeddings consistently provided the most accurate predictions, particularly in challenging scenarios that require generalizing to outlier distributions.
MORPH enables transferring perturbation effects across cell lines
With limited perturbation screens and the large number of potential cellular contexts of interest, it is crucial to determine whether perturbation effects can be transferred across distinct contexts. Specifically, we aimed to assess whether a model trained on a particular perturbation in one context—such as a particular cell line or disease state—could accurately predict the effects of the same perturbation in a different cellular context.
To evaluate MORPH’s ability to transfer perturbation effects across contexts, we trained it on RPE1 perturbation data and assessed its transferability to K562 (both datasets were obtained from Replogle et al. [7]). The model was first trained on all RPE1 perturbations, then fine-tuned using only K562 control cells by minimizing reconstruction loss (Methods). Fine-tuning on the control cells helps the model adapt to the basal state of the new cell line. For comparison, we included two baseline models: (1) the control distribution, which predicts the control cells in K562 without perturbation effects, and (2) the mean shift model, which applies the same perturbation-induced pseudobulk changes from RPE1 controls to K562 controls to predict perturbed cells in K562. Our results demonstrate that MORPH trained on RPE1 and fine-tuned on the control cells from K562 substantially outperforms both baseline models (Figure 3a). In fact, this model achieves performance comparable to MORPH trained directly on a random subset of 100 K562 perturbations. Furthermore, MORPH pretrained on RPE1 and then fine-tuned on 100 perturbations from K562 reaches a level of performance similar to MORPH trained on 1,500 perturbations from K562, suggesting that this transfer learning scheme enables MORPH to reach similar performance using only about 7% (100/1,500) of the data.
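The mean shift baseline is simple enough to write out directly. This minimal sketch assumes both cell lines are measured on the same genes in the same column order; the function name is hypothetical.

```python
import numpy as np

def mean_shift_baseline(src_ctrl, src_pert, tgt_ctrl):
    """
    Predict perturbed target-line cells by adding the source-line pseudobulk shift
    (mean perturbed minus mean control) to every target control cell.
    Assumes all matrices share the same gene columns.
    """
    shift = src_pert.mean(axis=0) - src_ctrl.mean(axis=0)
    return tgt_ctrl + shift
```

Because it shifts every target control cell by the same vector, this baseline preserves the K562 control spread exactly and cannot model cell-line-specific changes in the response, which is where a learned transfer is expected to help.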
Fig. 3. MORPH effectively transfers perturbation effects across cell lines and predicts double-gene perturbations.
a, Evaluation of model performance in transferring perturbation effects across cell lines using MMD and RMSE metrics. Baseline models include the control distribution and mean shift, which applies the same mean change vector from control in the training cell line to the test cell line. The figure compares the performance of different MORPH training strategies: a model trained on the training cell line and fine-tuned using only control cells from the test cell line (“Fine-tuned on control”), a model fine-tuned on 100 perturbations from the test cell line (“Fine-tuned on 100 perturbations”), a model trained from scratch on 100 perturbations in the test cell line (“Trained on 100 perturbations”), and a reference model trained on 1,500 (75%) perturbations in the test cell line. b, Comparison between the performance of the model fine-tuned on control cells (K562), measured using MMD, and the cell line disagreement, calculated as the Euclidean distance between the shift vectors of each perturbation in the training (RPE1) and test (K562) cell lines (Pearson correlation = 0.52, p-value < 0.05). Genes are colored by gene sets representing varying levels of similarity across cell lines. c, Box plots showing model performance in predicting cellular responses to double-gene perturbations, averaged across scenarios where 0, 1, or 2 genes in the pair are unseen during training. d, UMAP visualization of top perturbations with maximal effects in the test set. e, Analysis of double-gene perturbations where both single genes were observed during training, stratified by gene interaction type. Box plots compare model performance across different interaction types. f, Box plots evaluating model performance in predicting genetic interaction types. Evaluation metrics include Pearson correlation and the area under the receiver operating characteristic curve (AUC-ROC).
These metrics are computed based on gene interaction scores derived from model predictions and corresponding scores from observed data. AUC-ROC values indicate classification performance, where a random classifier would achieve an AUC-ROC of 0.5.
We hypothesized that perturbations transferring well across cell lines are those involving pathways or gene sets that are more similar between the cell lines. To test this hypothesis, we defined a cell line disagreement metric by calculating the Euclidean distance between the mean shift vectors for each perturbation in K562 and RPE1. We then compared the prediction performance of the model fine-tuned only on control cells with this cell line disagreement metric. We observed a positive correlation between cell line disagreement and prediction loss (Pearson correlation = 0.52, p-value < 0.05), indicating that perturbations affecting pathways with dissimilar effects between the two cell lines are harder to predict (Figure 3b).
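The cell line disagreement metric and its correlation with prediction loss can be computed as follows. This sketch assumes a shared gene space for both cell lines; function names are hypothetical, and only the correlation coefficient is computed (a p-value could be obtained with, e.g., `scipy.stats.pearsonr`).

```python
import numpy as np

def cell_line_disagreement(ctrl_a, pert_a, ctrl_b, pert_b):
    """Euclidean distance between the mean-shift vectors of one perturbation in two cell lines."""
    shift_a = pert_a.mean(axis=0) - ctrl_a.mean(axis=0)
    shift_b = pert_b.mean(axis=0) - ctrl_b.mean(axis=0)
    return float(np.linalg.norm(shift_a - shift_b))

def disagreement_vs_loss(disagreements, losses):
    """Pearson correlation between per-perturbation disagreement and prediction loss."""
    return float(np.corrcoef(disagreements, losses)[0, 1])
```

A disagreement of zero means the perturbation shifts both cell lines' average profiles identically, so the transferred prediction is expected to be easiest in that case.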
By analyzing the gene sets with higher cell line disagreement, we identified pathways such as exosome and mRNA turnover, erythroid differentiation, FACT complex, 40S ribosomal subunit, cytoplasmic processes, and the mediator complex (Figure 3b). Among these, erythroid differentiation appears to be particularly cell-line-specific. K562 cells are derived from chronic myeloid leukemia [28] and retain erythroid differentiation potential [29, 30]. In contrast, RPE1 cells are non-cancerous immortalized retinal pigment epithelial cells that lack these erythroid features, likely contributing to the higher divergence in pathways related to erythroid differentiation between the two cell lines.
Pathways with higher similarity between K562 and RPE1 include ubiquitous and essential cellular processes, such as post-translational modifications and microRNA biogenesis, which regulate gene expression and protein function. Likewise, gene sets related to the 28S/39S ribosomal subunits and to mitochondria, which are critical for metabolism and protein synthesis, are more conserved across these cell lines (Figure 3b).
Overall, these analyses show that perturbation effects learned on one cell line can be transferred to a different cell line by fine-tuning the model using only control data from the new cell line. This approach outperforms baseline methods and reduces data requirements while maintaining high prediction accuracy. Transferability depends on pathway similarity between the two cell lines: conserved processes like mitochondrial functions generalize well, while cell-type-specific pathways, like erythroid differentiation, are harder to predict.
MORPH generalizes to predict the effect of unseen combinatorial gene perturbations
With ongoing technological advances, it is plausible that experimental data for single-gene perturbations across a broad range of genes and cell lines will become increasingly accessible over time. However, generating experimental data for a comprehensive set of combinatorial perturbations will remain a significant challenge; for example, even considering perturbations involving only up to 5 genes out of around 20,000 in the human genome in one cellular context yields over $2.97 \times 10^{19}$ combinations. Computational methods that can efficiently generate potential responses to combinatorial perturbations could offer a scalable alternative and reduce the burden of exhaustive experimentation.
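The order of magnitude of this count can be checked with a short calculation. The exact value depends on the precise number of genes assumed, so the snippet below is only a sanity check of the scale, not a reproduction of the figure quoted above.

```python
from math import comb

def n_combinations(n_genes, max_order):
    """Number of distinct perturbations targeting between 1 and max_order of n_genes genes."""
    return sum(comb(n_genes, k) for k in range(1, max_order + 1))
```

Calling `n_combinations(20_000, 5)` returns a number on the order of $10^{19}$; the five-gene term dominates, since each additional gene in the combination multiplies the count by roughly $n/k$.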
To evaluate MORPH’s ability to predict combinatorial perturbations, we investigated multi-gene perturbations from Norman et al. [5], which includes 234 single-gene and double-gene perturbations across 110,000 cells in the K562 cell line. We assessed performance under varying levels of difficulty: In the simplest case, both genes in a double perturbation had been individually perturbed and observed during training (referred to as 0/2 unseen). More challenging settings included cases where only one (1/2 unseen) or neither (2/2 unseen) of the single-gene perturbations had been seen during training. Comparing against a linear model, GEARS, and the control baseline, we found that MORPH consistently outperforms the current state-of-the-art approaches across all key metrics, improving performance by 33% in MMD and 22% in RMSE (Figure 3c and Supplemental Figure 6a). Visualizing the UMAP projections of the perturbations with maximal effects in the test set, we also observed that MORPH’s predictions align more closely with the observed perturbed cells (Figure 3d and Supplemental Figure 6b).
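The three difficulty levels can be assigned mechanically from the set of single-gene perturbations seen during training. A minimal sketch, with hypothetical function names and gene pairs represented as tuples of gene symbols:

```python
from collections import defaultdict

def unseen_category(pair, seen_single_genes):
    """Label a double perturbation (gene_a, gene_b) as '0/2', '1/2' or '2/2' unseen."""
    n_unseen = sum(g not in seen_single_genes for g in pair)
    return f"{n_unseen}/2"

def split_by_difficulty(test_pairs, seen_single_genes):
    """Group test double perturbations by how many of their single genes were unseen."""
    groups = defaultdict(list)
    for pair in test_pairs:
        groups[unseen_category(pair, seen_single_genes)].append(pair)
    return dict(groups)
```

Only the 0/2 group allows baselines that need both single-gene effects to be observed; the 1/2 and 2/2 groups require the model to generalize through the prior-knowledge embeddings.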
As highlighted by Norman et al. [5], gene interaction (GI) types vary, and the effects of double-gene perturbations are not always simple additive combinations of single-gene effects. To assess how GI types influence prediction performance, we conducted a 5-fold cross-validation experiment. For each fold, MORPH was trained on all single-gene perturbations and 4/5 of the double-gene perturbations, with the remaining 1/5 of the double-gene perturbations reserved for testing. After gathering predictions across all folds, we reported the performance stratified by GI type. For this analysis, we included a new baseline proposed in prior work, SALT, which predicts double-gene perturbation effects by adding the mean shift vectors of the two single-gene perturbations, assuming most effects are additive [22]. SALT was not included in earlier evaluations because it requires that both single-gene perturbations be observed during training. We found that two GI types, potentiation and redundancy, posed the greatest challenges for prediction across all methods (Figure 3e). Notably, current SOTA methods showed a substantial drop in performance for these GI types, while MORPH remained more robust. For instance, for redundant interactions, MORPH improved prediction performance by approximately 40% compared to the best SOTA method.
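The additive assumption behind SALT can be illustrated with a short sketch: control cells are shifted by the sum of the two single-gene mean shifts. This follows the description above; the exact formulation in [22] may differ, and the function name is hypothetical.

```python
import numpy as np

def additive_double_prediction(ctrl, pert_a, pert_b):
    """
    Additive baseline in the spirit of SALT: predicted double-perturbation cells are
    control cells shifted by the sum of the two single-gene mean shifts.
    """
    mu_ctrl = ctrl.mean(axis=0)
    shift_ab = (pert_a.mean(axis=0) - mu_ctrl) + (pert_b.mean(axis=0) - mu_ctrl)
    return ctrl + shift_ab
```

By construction this baseline cannot represent interactions where the combined effect departs from the sum, for example when one perturbation makes the other redundant, which is consistent with the difficulty observed for redundancy and potentiation.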
Finally, we tested whether MORPH’s predictions could accurately classify gene pairs into their correct GI types. By comparing gene interaction scores derived from model predictions to those based on observed data (Methods and Supplemental Note 2), we found that MORPH achieved the highest Pearson correlation between predicted and true scores (Figure 3f). Additionally, when using the predictions to classify gene pairs into GI types with a threshold derived from the given labels in [5], MORPH achieved the highest area under the receiver operating characteristic curve (AUC-ROC) (Figure 3f). These results highlight the utility of our approach in not only predicting perturbation effects but also understanding and classifying the nature of complex genetic interactions.
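The exact GI scores are defined in Methods and Supplemental Note 2. As a hedged illustration of the general idea, the sketch below expresses a double-perturbation mean shift as a linear combination of the two single-gene shifts using ordinary least squares, and computes AUC-ROC from scores and binary labels with a rank-based formula. Both the choice of least squares and the function names are assumptions for illustration.

```python
import numpy as np

def gi_coefficients(shift_a, shift_b, shift_ab):
    """Least-squares fit of shift_ab ~ c_a * shift_a + c_b * shift_b across genes."""
    X = np.column_stack([shift_a, shift_b])
    coef, *_ = np.linalg.lstsq(X, shift_ab, rcond=None)
    return coef  # (c_a, c_b); values near 1 are roughly consistent with additivity

def auc_roc(scores, labels):
    """Rank-based AUC: probability that a random positive scores above a random negative."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))  # ties count as half
```

Computing such scores once from predicted shifts and once from observed shifts gives the paired values used for the Pearson correlation, while thresholding the observed scores yields the binary labels used for AUC-ROC.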
MORPH provides an informative gene embedding for optimal design of perturbations
Perturbation screens are costly and time-intensive. This highlights the need for strategies to optimize experimental design. We assessed MORPH’s performance on the problem of selecting perturbations to accelerate learning across the entire perturbation space. Specifically, we aimed to achieve accurate predictions across all perturbations from screening only a limited subset. Towards this, we adopted an iterative lab-in-the-loop framework [31, 32]; namely, we selected perturbations based on the score of an acquisition function computed using MORPH trained on existing data, these perturbations were then added to the dataset, and MORPH was updated using both new and existing data for the next round (Figure 4a).
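The iterative lab-in-the-loop procedure can be summarized as a generic loop. In this sketch `screen`, `train`, and `acquire` are hypothetical callables standing in for the experiment, MORPH training, and the acquisition function, respectively; the loop structure follows the description above, but the interface is an assumption.

```python
def design_loop(pool, initial, n_rounds, batch_size, screen, train, acquire):
    """
    pool:    all candidate perturbations
    initial: perturbations screened before the first round
    screen(perts) -> dict of new measurements (stand-in for the wet-lab experiment)
    train(data) -> model (stand-in for training MORPH on all data so far)
    acquire(model, candidates, k) -> k perturbations to screen next
    """
    screened = list(initial)
    data = dict(screen(screened))
    model = train(data)
    for _ in range(n_rounds):
        done = set(screened)
        candidates = [p for p in pool if p not in done]
        if not candidates:
            break
        chosen = list(acquire(model, candidates, batch_size))
        data.update(screen(chosen))  # add newly acquired measurements
        screened.extend(chosen)
        model = train(data)          # retrain on existing + new data
    return model, screened
```

Keeping the acquisition function as a pluggable argument makes it straightforward to compare random selection, uncertainty-based heuristics, and learned strategies under the same budget.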
Fig. 4. MORPH can be used for optimal design of perturbations.
a, Iterative perturbation design framework. Perturbations are selected via an acquisition function, experimentally screened, and used to update the model with newly acquired data for the next round. b-e, Line plots comparing acquisition strategies and models for efficiently covering the perturbation space as measured by prediction over a held-out test set. The green line with ‘X’ markers represents MORPH with randomly selected perturbations. The green line with circle markers corresponds to MORPH using an adaptive strategy that selects perturbations with the highest predicted loss based on its learned latent representations. The green line with triangle markers shows a variant of MORPH that selects perturbations using a fixed, prior-based latent space. For comparison, the blue lines with ‘X’ and circle markers show GEARS with randomly selected and adaptively selected perturbations based on its learned latent representations, respectively. Lines indicate mean performance and shaded regions show ±0.2 standard deviations across 5 runs with different random seeds. Evaluation metrics include MMD (b) and RMSE (c) calculated on the top 50 DE genes on a set of perturbations withheld for testing. Panels d-e show the precision of the predictions in identifying the top 25 and 35 perturbations that shifted cells farthest from the control state. f-g, Line plots showing performance improvements of MORPH with adaptively selected perturbations using learned representations across different allocations of initial perturbations and rounds, under a fixed experimental budget. Shaded regions show ±0.2 standard deviations across 5 runs using different random seeds.
The key principle of this approach is to identify the most informative perturbations, those that help the model generalize more efficiently. The acquisition function facilitates this selection by evaluating and ranking the unscreened perturbations based on their potential to improve model performance. We first considered two acquisition strategies commonly used in the active learning literature: prioritizing perturbations with high uncertainty [33-36] and prioritizing perturbations with dissimilar effects [37, 38]. However, these approaches performed similarly to random selection (Supplemental Figure 7, Methods). Because such heuristics may not capture the information encoded in deep generative models, we adopted a learning-based approach originally developed in the context of computer vision [39] (Methods). Specifically, to identify the perturbations that are most challenging for the model, we trained a lightweight auxiliary model to predict the prediction loss for each unscreened perturbation. We evaluated two ways to train the auxiliary model for loss prediction: (1) using representations learned by MORPH (referred to as “adaptively selected perturbations using learned representations”), and (2) using fixed, prior-knowledge representations (referred to as “adaptively selected perturbations using prior-knowledge representations”).
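A minimal sketch of loss-prediction acquisition is shown below. As a plainly stated substitution, closed-form ridge regression stands in for the auxiliary loss-prediction model of [39]; the representations can be either MORPH latent representations or fixed prior embeddings, and how the training losses are measured on screened perturbations is an assumption left to the caller. Function names are hypothetical.

```python
import numpy as np

def fit_loss_predictor(reps, losses, alpha=1.0):
    """Ridge regression (closed form, with unpenalized intercept) from representations to losses."""
    X = np.hstack([reps, np.ones((reps.shape[0], 1))])
    reg = alpha * np.eye(X.shape[1])
    reg[-1, -1] = 0.0  # do not penalise the intercept
    return np.linalg.solve(X.T @ X + reg, X.T @ losses)

def select_by_predicted_loss(w, candidate_reps, k):
    """Indices of the k unscreened candidates with the highest predicted loss."""
    X = np.hstack([candidate_reps, np.ones((candidate_reps.shape[0], 1))])
    return np.argsort(-(X @ w))[:k]
```

Swapping `reps` between learned and prior-based representations, with everything else fixed, corresponds to the two variants compared in the text.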
Using the K562 dataset from Replogle et al. [7], which includes 2,033 perturbations, we withheld 204 perturbations for testing reported the prediction loss on the held-out test set as a measure of how well the model generalizes. The strategy of “adaptively selected perturbations using learned representations” consistently outperformed both the random baseline and “adaptively selected perturbations using prior-knowledge representations” in test-set prediction performance, as measured by MMD (Figure 4b) and RMSE (Figure 4c). To further assess the contribution of the learned representations and model architecture, we also evaluated GEARS under both random and adaptive selection. Across both prediction metrics - MMD and RMSE (Figure. 4b-c) - MORPH consistently outperformed GEARS, suggesting that MORPH more effectively captured the overall perturbation space.
We further evaluated the selection strategies on a second downstream task: identifying perturbations with the most significant effects on cells. Since most single-gene perturbations have minimal impact on cells, identifying those with strong effects could highlight valuable experimental targets. To assess MORPH’s performance on this task, we ranked perturbations in the test set based on their predicted impact, measured by the distributional distance (MMD) between predicted perturbed and control distributions. We then calculated the precision of detecting the top 25 and 35 most impactful perturbations for each selection strategy. Also in this task, Morph consistently outperformed GEARS, and the adaptive method using learned representations consistently outperformed the baseline and prior-only method (Figure 4d-e). Moreover, the larger performance gain when switching from random to adaptively selected perturbations in MORPH underscored the informativeness and suitability of its latent space for guiding experimental acquisition.
Finally, we investigated the impact of different allocations of a fixed experimental budget (here we considered 500 perturbations) on MORPH’s performance across the previous two tasks: predicting cellular responses and identifying perturbations with significant effects. Using the strategy of adaptively selecting perturbations using learned representations, we observed that allocating more perturbations to later active learning rounds generally improved predictive performance compared to front-loading them in the initial round. For instance, randomly selecting 100 perturbations in the first round (red lines) outperformed randomly selecting 300 perturbations in the first round (purple line) in terms of predictive performance measured by MMD on the test set (Figure 4f). This is likely because more strategically chosen perturbations lead to better overall performance. While going beyond two active learning rounds, performance differences among allocation strategies became negligible (different types of red lines), performing only one round without active learning (blue dot) yielded significantly worse results (Figure 4f).We observed similar trends when evaluating the model’s ability to identify high-impact perturbations: more strategically selected perturbations led to better identification performance. Having three rounds (red dotted line) outperformed two rounds (red dashed line). And five rounds (red solid line) achieved the best performance, though the difference between three and five rounds was marginal (Figure 4g). Overall, active learning with few rounds of experiments consistently enhanced the predictive performance within the fixed budget.
MORPH’s attention-based framework allows the inference of gene regulatory programs
To enhance the interpretability and reliability of MORPH, we aimed to extract the mechanisms that it learned through training. Specifically, we sought to determine whether analyzing the weights of the different components learned by MORPH from the Perturb-seq data (Figure 1a) could reveal the underlying gene regulatory network. As in prior work [2], we modeled gene regulation via a bipartite network, where perturbations act as parent nodes and regulated genes as child nodes. Clustering the parent nodes gives rise to “perturbation modules” such that perturbations within the same module induce similar effects. Clustering the child nodes gives rise to “gene programs,” where genes within the same program exhibit similar responses to shared perturbations. The directed edges between modules (parent nodes) and programs (child nodes) represent regulatory effects, providing insights into how perturbations influence gene expression patterns.
Notably, MORPH is based on a formal causal framework: the proposed regulatory bipartite network model can be formalized as a causal structural equation model, and we proved identifiability of the causal regulatory effects in this model (Supplemental Note 4). In addition, we demonstrated on simulated data, where the bipartite network structure is known, how MORPH’s attention-based framework can recover it; more precisely, we simulated control and perturbed gene expression data based on a given bipartite network, trained MORPH on the simulated data, and analyzed the learned weights (Figure 5a; Methods). To identify perturbation modules, we clustered perturbations in the latent space generated by the perturbation encoder (Figure 1a, bottom panel of the encoder). This analysis showed that perturbations from different modules formed distinct clusters in the learned embedding space (Figure 5b). Next, we mapped gene programs to expressed genes by manipulating the attention maps to determine which genes had the highest scores within each learned program (Figure 1a, attention block and decoder; Methods). This analysis showed that expressed genes generated from different gene programs were grouped into different clusters (Figure 5c). Finally, we reconstructed the bipartite graph by connecting perturbation modules to gene programs using the learned attention maps (Figure 1a, attention block; Methods). This analysis showed that the reconstructed bipartite network closely resembled the data-generating network and accurately captured both positive and negative regulation (Figure 5d). Together, these results demonstrate that, on simulated data, MORPH can recover the true underlying regulatory network through analysis of the weights of each of its components.
Fig. 5. MORPH’s attention-based framework enables gene regulatory network inference.
a, The gene regulatory network used to generate the simulated data, where perturbation modules (M) are clusters of genes that induce similar effects when perturbed, and gene programs (P) are groups of genes that exhibit similar responses to perturbations. b, UMAP visualization of the learned perturbation latent space on the simulated data, colored by perturbation module. c, Genes hierarchically clustered into programs using the learned mapping from latent representations to genes on the simulated data, colored by true gene program. d, Heatmap of the gene regulatory structure inferred from the learned attention maps on the simulated data. e-g, Same analysis for MORPH trained on [7]: e, UMAP of the learned perturbation latent space colored by the perturbation modules reported in [7]; f, genes clustered into programs, colored by the gene programs from [7]; g, box plots of rank-normalized attention score changes after perturbation for the up- and down-regulated programs reported in [7].
We then applied the same analysis to Perturb-seq data from Replogle et al. [7] and validated our findings against the structures reported in the original study. Clustering perturbations in the latent space consistently revealed perturbation modules that aligned with those identified in [7] (Figure 5e). Similarly, clustering the program-to-gene mappings uncovered gene clusters that corresponded to the labels provided in [7] (Figure 5f). Finally, we compared the model’s learned regulatory edges between modules and programs, represented as attention scores, to the bipartite structure reported in [7]. On average, the model assigned greater attention to programs that were up-regulated after perturbation when predicting perturbation outcomes, whereas down-regulated programs received less attention compared to control reconstruction (Figure 5g). These results suggest that MORPH effectively captured biologically meaningful regulatory patterns.
MORPH is generally applicable to single-cell data including imaging-based read-outs
To evaluate the modularity of MORPH, we applied it to imaging data from an optical pooled screen [10]. To minimize batch effects, we used data from a single plate; this included 5,230,322 HeLa-TetR-Cas9 cells transduced with guide RNAs targeting 20,336 genes. These cells were infected with the Ebola virus and underwent genome-wide perturbations. Measurements were taken across six different imaging channels post-perturbation. The authors classified the cells into four infection states—faint, punctate, cytoplasmic, and peripheral—representing progression from the earliest to the latest stages of infection. The primary objective was to identify perturbations that significantly alter the distribution of infection states.
To extract features from the images, we fine-tuned a pretrained Vision Transformer [40] (originally trained on ImageNet-21k, containing 14 million images [41]) using an auxiliary task of classifying the segmented single-cell images from the optical pooled screen into the four infection stages. The extracted image features were then used as input to MORPH (Figure 6a). By visualizing the extracted image features, we observed that most cells were in the cytoplasmic and peripheral states, indicative of later infection stages. Only a small subset of perturbations significantly shifted cells toward earlier infection states, such as faint or punctate (Figure 6b).
Fig. 6. Application to imaging modality for perturbation outcome prediction using optical pooled screens.
a, Workflow of predicting perturbation outcomes using an optical pooled screen of cells infected with the Ebola virus and subjected to genome-wide perturbations. In step 1, a pretrained Vision Transformer was fine-tuned on an auxiliary task to classify segmented cell images into four infection stages. In step 2, image-based features were extracted and used as inputs to MORPH, which was trained to predict imaging features after perturbations. Finally, a logistic regression model classified the predicted perturbed cell features into four infection stages. b, UMAP visualizations of extracted image features colored by infection state (top), or to highlight the image features extracted from NPC1 knockout cells, a perturbation that significantly alters cell infection states, versus those extracted from cells receiving a non-targeting guide RNA (bottom). c, Model performance evaluation using the normalized loss between predicted and true infection state distribution vectors. Perturbations on the x-axis are ranked by their impact on infection states based on chi-squared test, with genes having maximal impact on the left and minimal impact on the right. The y-axis shows the cumulative mean loss for each method. The dotted line indicates perturbations with a Bonferroni-corrected p-value < 0.01. d, Barplot comparing the predicted infection states of NPC1 knockout cells across different methods. The observed infection states (red) represent the ground truth measurements.
To evaluate model performance, we classified the predicted perturbed cell features into the four infection states using logistic regression and computed the normalized L1 loss between the predicted and observed infection state distribution vectors. For comparison, we benchmarked MORPH against baselines including the control distribution and a linear model. Since GEARS cannot be applied to imaging data, we excluded it from this analysis. Both the linear model and MORPH used the same prior knowledge (DepMap), which we identified as the strongest prior knowledge source in the analyses of the previous sections.
Since most perturbations did not significantly affect infection states, the control distribution baseline performed well when all genome-wide perturbations were considered. However, the primary interest lies in identifying perturbations that significantly alter the infection state distribution. To identify such perturbations, we performed a chi-squared test comparing the perturbed state distribution to the control state distribution for each perturbation, applied Bonferroni correction to account for multiple testing, and ranked the perturbations by their impact on infection states. Notably, MORPH outperformed the baselines by 33% on the top-ranked genes, i.e., the perturbations that most significantly affect the infection state. Furthermore, MORPH remained competitive when all genes were considered, demonstrating its robustness across different scenarios (Figure 6c-d and Supplemental Figure 9).
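The per-perturbation test described above can be sketched in plain Python. This is a minimal sketch, assuming a Pearson chi-squared test of independence on a 2 × 4 table of cell counts (perturbed vs. control cells across the four infection states), which has 3 degrees of freedom and therefore a closed-form tail probability; the function names and toy counts are hypothetical and not taken from the MORPH code.

```python
import math


def chi2_sf_df3(stat):
    """Upper-tail probability P(X > stat) for a chi-squared variable with 3 degrees of freedom.

    Valid only for 3 degrees of freedom (a 2 x 4 table), where the tail has the closed form
    erfc(sqrt(stat / 2)) + sqrt(2 * stat / pi) * exp(-stat / 2).
    """
    if stat <= 0:
        return 1.0
    return math.erfc(math.sqrt(stat / 2)) + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2)


def chi2_statistic(perturbed_counts, control_counts):
    """Pearson chi-squared statistic for a 2 x K table (perturbed vs. control cell counts per state)."""
    table = [perturbed_counts, control_counts]
    total = sum(sum(row) for row in table)
    col_totals = [p + c for p, c in zip(perturbed_counts, control_counts)]
    stat = 0.0
    for row in table:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def rank_perturbations(counts_by_gene, control_counts, alpha=0.01):
    """Return (gene, Bonferroni-corrected p-value, significant?) tuples, most significant first."""
    n_tests = len(counts_by_gene)
    results = []
    for gene, counts in counts_by_gene.items():
        p_value = chi2_sf_df3(chi2_statistic(counts, control_counts))
        p_corrected = min(1.0, p_value * n_tests)  # Bonferroni correction
        results.append((gene, p_corrected, p_corrected < alpha))
    return sorted(results, key=lambda item: item[1])


# Hypothetical counts over (faint, punctate, cytoplasmic, peripheral).
control = [20, 20, 400, 560]
screen = {"GENE_A": [10, 10, 200, 280], "GENE_B": [150, 150, 100, 100]}
for gene, p_corr, significant in rank_perturbations(screen, control):
    print(gene, f"{p_corr:.2e}", significant)
```

Here GENE_A has exactly the control proportions and is not flagged, while GENE_B shifts cells toward earlier states and is ranked first.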
3. Discussion
By combining prior biological knowledge with transcriptomic or phenotypic information from control cells, MORPH effectively extrapolates perturbation effects beyond observed data in both sequencing and imaging modalities. Instead of using one-hot vectors that only indicate perturbation identity, MORPH encodes perturbations as feature vectors derived from biological databases, capturing functional relationships between genes. This approach enables the model to generalize to unseen perturbations. Additionally, its modular design allows for easy integration of different prior knowledge sources, making the framework highly adaptable and extensible.
Through evaluating different sources of prior knowledge, we identified DepMap [12] as the most informative database for the perturbation screens considered in this paper. We hypothesize that DepMap embeddings outperform the other priors because DepMap is the only one of these databases built directly from perturbation experiments. While language-based embeddings from NCBI [42] and UniProt [43] provide general gene function annotations and literature-derived information [26], DepMap specifically measures how gene disruptions affect cell viability across different cancer contexts. This supports the notion that the relevance of the data used to build priors matters more than their sheer quantity. Future applications of MORPH, particularly in non-diseased contexts, may benefit from using context-relevant prior databases as a best practice.
MORPH can transfer perturbation effects across cellular contexts when fine-tuned on control cells of the target context. The success of this transfer depends on pathway similarity between the source and target contexts. MORPH performed best on perturbed genes with high cross-context agreement, but its performance declined for genes exhibiting context-specific behavior. For example, in our experiments conserved processes such as mitochondrial functions generalized well, while cell-line-specific pathways, like erythroid differentiation, were more challenging to predict.
Beyond single-gene perturbations, MORPH effectively models multi-gene perturbations by combining learned gene embeddings in a latent space. Our results show that MORPH can capture complex genetic interaction types, including redundant and potentiation interactions, which are challenging for other methods. Potentiation involves one gene amplifying the effect of the other, while redundancy reflects overlapping functions. Such interactions likely require more nuanced representations, which MORPH appears better equipped to learn.
Given the high cost and time demands of perturbation screens, there is a need for strategies that optimize experimental design. An iterative design of experiments enables the adaptive selection of informative perturbations based on previously collected results, thereby maximizing data utility. MORPH learns an informative gene embedding that evolves as new data is incorporated, facilitating the efficient iterative design of perturbations. MORPH’s learned embeddings consistently outperform those of existing methods, owing to its architectural design—which effectively leverages existing data and priors—and training dynamics, which are better suited to the unpaired nature of perturbation screens.
Finally, MORPH’s predictions are interpretable: the attention modules offer insights into the underlying gene regulatory networks—a feature not available in previous methods for predicting perturbation effects. This enhances both the reliability and transparency of the framework. We also provide theoretical guarantees, proving the identifiability of the inferred network structure under suitable assumptions, and we validated these results through both simulation studies and empirical data.
Looking ahead, MORPH is expected to improve as more perturbation screening data become available. Its iterative framework could be used to efficiently generalize to new cellular contexts by leveraging the perturbation embeddings learned in one context to iteratively identify and prioritize the most informative perturbations in another context. Furthermore, MORPH could directly be extended beyond genetic perturbations to model the effects of chemical perturbations, enabling applications in drug discovery [21]. This could be achieved by integrating meaningful molecular representations of drugs, for example by leveraging existing drug representation learning techniques, and using these feature vectors as inputs to the perturbation encoder. Overall, MORPH provides a versatile framework for modeling perturbation effects across different modalities, with broad applications to interrogating biological mechanisms and aiding in the development of more precise and effective treatments.
4. Resource availability
Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Caroline Uhler (cuhler@mit.edu).
Data and code availability
Data:
The datasets analyzed in this paper are publicly available and include Perturb-seq datasets from Norman et al. [5], Replogle et al. [7], and Dixit et al. [2], as well as the optical pooled screen dataset from Carlson et al. [10]. In addition, we used gene embeddings derived from Geneformer [25], GenePT [26], and DepMap [12]. The corresponding accession numbers of the single-cell datasets are listed in the attached Key Resources Table. A processed version of the Norman [5] and DepMap [12] datasets can be retrieved from: https://drive.google.com/drive/folders/1TQJE281q4xH7HcNHMg1v0urD99EDj5bO?usp=drive_link.
Code:
All original code is publicly available and has been deposited at https://github.com/uhlerlab/MORPH. The repository also contains the open-source software packages used, a detailed description of how to reproduce all our results, as well as a demo application and instructions on how to apply our pipeline to user-provided datasets.
9. STAR Methods
Learning framework of MORPH
Single-cell RNA sequencing and other sequencing-based assays are inherently destructive: each cell is usually measured either before or after a perturbation, but not both. Consequently, we only have access to distribution-level data rather than paired measurements of individual cells before and after perturbation. MORPH therefore considers a setting where unpaired samples are drawn from control and perturbed distributions: $x \sim P_0$ and $\tilde{x}_j \sim P_j$, where $P_0$ represents the control distribution and $P_j$ represents the distribution under perturbation of gene $j$. Each sample from these distributions represents the features of a cell, such as its expression levels across $p$ genes or its image features of dimension $p$.
Given a control sample $x$ and a perturbation embedding $g_j$ for gene $j$ (see the following section for details on how to obtain the perturbation embedding), the goal is to learn a function
$$\hat{x}_j = f(x, g_j) \tag{1}$$
that predicts the perturbed outcome $\hat{x}_j$ and can be applied both to “seen” gene perturbations (i.e., perturbations that have been tested and are in the dataset) and to “unseen” gene perturbations. Building on [24], we learn the function $f$ via a discrepancy-based variational autoencoder (VAE), which minimizes a distributional loss between the predicted perturbed samples $\hat{x}_j$ and the real perturbed samples $\tilde{x}_j$.
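More precisely, in this framework, the control distribution is modeled using a VAE. Let $z$ denote the latent variables that generate the cell features $x$, and let $p(z)$ be their prior distribution, realized as a standard Gaussian $\mathcal{N}(0, I)$. The VAE consists of two components: an encoder $q_\phi(z \mid x)$, parameterized by $\phi$, which approximates the posterior distribution of the latent variables $z$ given $x$; and a decoder $p_\theta(x \mid z)$, parameterized by $\theta$, which approximates the conditional distribution of $x$ given $z$ under no perturbation. The Evidence Lower Bound (ELBO) for the likelihood of observing a sample $x$ is:
$$\log p_\theta(x) \ge -\mathcal{L}_{\text{rec}}(x) - \mathcal{L}_{\text{KL}}(x), \tag{2}$$
where
$$\mathcal{L}_{\text{rec}}(x) = -\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] \tag{3}$$
and
$$\mathcal{L}_{\text{KL}}(x) = D_{\text{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right). \tag{4}$$
Here, the encoder $q_\phi(z \mid x)$ is realized as a Gaussian distribution whose mean and standard deviation are parameterized by $\phi$. Sampling $z$ from $q_\phi(z \mid x)$ is implemented via the reparameterization trick to allow back-propagation [44]. The term $\mathcal{L}_{\text{KL}}$ represents the Kullback–Leibler (KL) divergence, which regularizes the latent space by encouraging the approximate posterior to be close to the prior $p(z)$. We use a deterministic decoder: denoting by $\hat{x}$ the output obtained after decoding $z$ sampled from $q_\phi(z \mid x)$, the reconstruction term reduces (up to constants) to an $\ell_2$ loss for the control cells:
$$\mathcal{L}_{\text{rec}}(x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\lVert x - \hat{x} \rVert_2^2\right]. \tag{5}$$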
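The reparameterized sampling and the two per-sample terms of equation (2) can be sketched with NumPy. This is a minimal sketch, assuming a diagonal-Gaussian encoder that outputs a mean and a log-variance, and toy linear maps standing in for the encoder and the deterministic decoder; the weight matrices and function names are hypothetical, not MORPH's implementation.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(sigma^2)) as mu + sigma * eps, keeping mu and sigma differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)) for each sample (row)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def l2_reconstruction(x, x_hat):
    """Squared l2 reconstruction error for each sample (row)."""
    return np.sum((x - x_hat) ** 2, axis=-1)


# Toy example: 8 control cells with 5 features and a 2-dimensional latent space.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5))
W_mu = rng.standard_normal((5, 2))  # hypothetical encoder weights (mean)
W_logvar = 0.1 * rng.standard_normal((5, 2))  # hypothetical encoder weights (log-variance)
W_dec = rng.standard_normal((2, 5))  # hypothetical deterministic decoder

mu, log_var = x @ W_mu, x @ W_logvar
z = reparameterize(mu, log_var, rng)
x_hat = z @ W_dec
# One-sample Monte Carlo estimate of the negative ELBO per cell (l2 reconstruction + KL).
per_cell_loss = l2_reconstruction(x, x_hat) + kl_to_standard_normal(mu, log_var)
print(per_cell_loss.mean())
```

Writing the sample as a deterministic function of the mean, the log-variance, and independent noise is what lets gradients reach the encoder parameters.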
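To model perturbation effects, MORPH uses a discrepancy-based component. Specifically, MORPH encodes $x$ into $z$ and $g_j$ into a latent representation $h_j$, and combines them into a joint latent representation $u_j = [z; h_j]$ via concatenation. The decoder then generates a virtual counterfactual $\hat{x}_j$ from $u_j$, simulating the effect of perturbing gene $j$ in the control cell $x$. We measure the discrepancy between the distribution of generated perturbed samples and that of real perturbed samples using the maximum mean discrepancy (MMD) [45], which is defined as
$$\mathrm{MMD}^2(P, Q) = \mathbb{E}_{x, x' \sim P}\left[k(x, x')\right] + \mathbb{E}_{y, y' \sim Q}\left[k(y, y')\right] - 2\,\mathbb{E}_{x \sim P,\, y \sim Q}\left[k(x, y)\right] \tag{6}$$
for two arbitrary distributions $P$ and $Q$, where $k(\cdot, \cdot)$ denotes a kernel function. Denoting by $\mathcal{J}$ the set of all genes that were perturbed in a dataset, and by $\hat{P}_j$ and $\hat{P}_0$ the distributions of the generated perturbed and control samples, MORPH uses the following discrepancy loss:
$$\mathcal{L}_{\text{MMD}} = \sum_{j \in \mathcal{J}} \mathrm{MMD}^2(\hat{P}_j, P_j) + \gamma\, \mathrm{MMD}^2(\hat{P}_0, P_0), \tag{7}$$
where $\gamma$ is a tunable hyperparameter. This loss contains an MMD term not only for each gene perturbation in the dataset but also for the control distribution, since we observed that this improves the alignment between generated and observed control distributions.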
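Equations (6) and (7) can be estimated from finite samples. This is a minimal NumPy sketch, assuming a Gaussian (RBF) kernel with one fixed bandwidth, the biased (V-statistic) estimate of the squared MMD, and a plain sum over perturbed genes; MORPH's actual kernel and bandwidth choices are not specified here, and all names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, bandwidth).mean()
        + rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * rbf_kernel(x, y, bandwidth).mean()
    )


def discrepancy_loss(pred_by_gene, obs_by_gene, pred_control, obs_control, gamma=1.0, bandwidth=1.0):
    """Sum of per-gene MMD terms plus a gamma-weighted MMD term for the control distribution."""
    loss = sum(mmd2(pred_by_gene[g], obs_by_gene[g], bandwidth) for g in obs_by_gene)
    return loss + gamma * mmd2(pred_control, obs_control, bandwidth)


rng = np.random.default_rng(0)
obs = {"GENE_A": rng.standard_normal((200, 2)) + 3.0}  # observed perturbed cells
good_pred = {"GENE_A": rng.standard_normal((200, 2)) + 3.0}  # predictions matching the shift
bad_pred = {"GENE_A": rng.standard_normal((200, 2))}  # predictions that miss the shift
ctrl_obs, ctrl_pred = rng.standard_normal((200, 2)), rng.standard_normal((200, 2))
print(discrepancy_loss(good_pred, obs, ctrl_pred, ctrl_obs))
print(discrepancy_loss(bad_pred, obs, ctrl_pred, ctrl_obs))
```

Because MMD compares whole sample sets, this loss needs no pairing between predicted and observed cells, which matches the unpaired setting described above.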
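The full objective function minimized by MORPH during training is:
$$\mathcal{L} = \mathbb{E}_{x \sim P_0}\left[\alpha\,\mathcal{L}_{\text{rec}}(x) + \beta\,\mathcal{L}_{\text{KL}}(x)\right] + \lambda\,\mathcal{L}_{\text{MMD}}, \tag{8}$$
where $\alpha$, $\beta$, and $\lambda$ are hyperparameters controlling the balance among accurate reconstruction of control states, regularization of the latent space, and alignment between predicted and observed perturbation distributions. The selection of hyperparameters is discussed in Supplemental Note 2.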
Incorporating biological knowledge to represent perturbations
To generalize to unseen perturbations, MORPH represents each gene perturbation $j$ by a feature vector $g_j$, which encodes prior knowledge about gene $j$ and its perturbation effect. The feature vector allows MORPH to relate unseen perturbations to previously seen ones, based on the intuition that perturbations with similar biological properties tend to have similar effects. We explored multiple sources of prior knowledge to construct these vectors.
Control gene expression:
The simplest form of prior knowledge leverages gene expression profiles from control cells. Let $X \in \mathbb{R}^{n \times p}$ denote the gene expression matrix of control cells, where $n$ is the number of control cells and $p$ is the number of genes measured in each cell, as defined in the previous section. To obtain a feature vector for perturbing gene $j$, the $j$-th column of $X$ is extracted, i.e., $g_j = X_{:,j}$. This approach is based on the hypothesis that genes with similar expression patterns in control cells may exhibit similar effects when perturbed. In our experiments, we randomly sampled $n = 1{,}500$ control cells to obtain a 1,500-dimensional feature vector for each gene.
Geneformer:
The emergence of single-cell foundation models trained on large-scale single-cell datasets provides new ways to obtain gene feature vectors. Specifically, we used the gene embeddings from Geneformer [25], which offer the advantage of context-dependent representations. In particular, we fed control cell data into the pretrained foundation model and extracted the output of the second-to-last layer for each gene in each control cell. We then averaged these vectors across all control cells to obtain a 256-dimensional feature vector for each gene.
GenePT and GPT:
As proposed in [26], we defined another prior by leveraging OpenAI’s ChatGPT text embeddings [46]. We obtained the gene embeddings shared by the authors, which were derived from gene summaries in NCBI [42] and UniProt [43]. Beyond these sources, we also generated embeddings with the same approach using information from STRING [47]. This resulted in three text-based priors: a 1,536-dimensional gene feature for NCBI, a 3,072-dimensional gene feature for NCBI and UniProt, and a 1,536-dimensional gene feature for STRING.
DepMap:
Finally, we explored gene embeddings derived from the DepMap database. DepMap provides a matrix $D \in \mathbb{R}^{m \times G}$, where $m = 1{,}100$ is the number of cell lines in the dataset and $G$ is the number of genes. Each entry $D_{ij}$ represents the viability of cell line $i$ under perturbation of gene $j$. To represent a perturbation of gene $j$, we extracted the $j$-th column of $D$, i.e., $g_j = D_{:,j}$, resulting in a 1,100-dimensional feature vector.
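Both the control-expression prior and the DepMap prior reduce to extracting one column per perturbed gene from a cells-by-genes (or cell-lines-by-genes) matrix. This is a minimal NumPy sketch on toy matrices, assuming genes are looked up by name; the helper names are hypothetical. For the control-expression prior, the same randomly sampled cells should be used for every gene so that the coordinates of different feature vectors refer to the same cells, which the fixed seed below ensures.

```python
import numpy as np


def column_feature(matrix, gene_names, gene):
    """Return the column of `matrix` that belongs to `gene` as its prior feature vector."""
    return matrix[:, gene_names.index(gene)].copy()


def control_expression_prior(X_control, gene_names, gene, n_cells=1500, seed=0):
    """Column of the control expression matrix restricted to a fixed random subset of cells."""
    rng = np.random.default_rng(seed)  # fixed seed: identical cell subset for every gene
    rows = rng.choice(X_control.shape[0], size=n_cells, replace=False)
    return column_feature(X_control[rows], gene_names, gene)


genes = ["GENE_A", "GENE_B", "GENE_C"]
X_control = np.arange(3000 * 3, dtype=float).reshape(3000, 3)  # toy cells x genes matrix
depmap = np.random.default_rng(1).standard_normal((1100, 3))  # toy cell lines x genes matrix

g_expr = control_expression_prior(X_control, genes, "GENE_B")  # 1,500-dimensional
g_depmap = column_feature(depmap, genes, "GENE_B")  # 1,100-dimensional
print(g_expr.shape, g_depmap.shape)
```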
MORPH’s attention mechanism
To refine the joint latent embedding of a control cell and a gene feature vector, MORPH employs a cross-attention mechanism [48] in the latent space. To this end, the $d$-dimensional joint embedding $u_j$ serves as the query, and to construct keys and values, MORPH learns a latent gene program matrix $F \in \mathbb{R}^{r \times d_f}$, where we chose $r = 4$ gene factors, each of dimension $d_f$. The attention mechanism dynamically integrates information from $F$ into $u_j$, allowing the model to capture complex dependencies between perturbations and gene programs. The model has 2 attention layers, with the output of the $l$-th layer, $l \in \{1, 2\}$, computed as:
$$a^{(l)} = A^{(l)} V^{(l)}, \qquad A^{(l)} = \mathrm{softmax}\!\left(\frac{Q^{(l)} {K^{(l)}}^{\top}}{\sqrt{d_k}}\right), \tag{9}$$
where $Q^{(l)} = u^{(l-1)} W_Q^{(l)}$, $K^{(l)} = F W_K^{(l)}$, and $V^{(l)} = F W_V^{(l)}$ represent the query, key, and value matrices, with $u^{(0)} = u_j$; $W_Q^{(l)}$, $W_K^{(l)}$, and $W_V^{(l)}$ are learnable weight matrices, and $d_k$ is the key dimension.
The attended output is added to $u^{(l-1)}$ through a residual connection:
$$\tilde{u}^{(l)} = u^{(l-1)} + a^{(l)}, \tag{10}$$
and $\tilde{u}^{(l)}$ is then passed through an MLP to obtain $u^{(l)}$, similar to a standard transformer block; see e.g. [48]. Finally, the refined latent representation $u^{(2)}$ after the 2 attention layers is decoded using an MLP to predict the post-perturbation cellular response $\hat{x}_j$.
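One refinement step of equations (9) and (10) can be sketched as follows. This is a minimal NumPy sketch with randomly initialized weights, assuming scaled dot-product attention and a single ReLU layer as a stand-in for the MLP; the dimensions, the class name, and the MLP are hypothetical choices for illustration, not MORPH's trained architecture.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


class CrossAttentionLayer:
    """Queries come from the joint embeddings; keys and values come from the gene program matrix."""

    def __init__(self, d, d_f, d_k, rng):
        self.W_q = rng.standard_normal((d, d_k)) / np.sqrt(d)
        self.W_k = rng.standard_normal((d_f, d_k)) / np.sqrt(d_f)
        self.W_v = rng.standard_normal((d_f, d)) / np.sqrt(d_f)
        self.W_mlp = rng.standard_normal((d, d)) / np.sqrt(d)  # hypothetical one-layer MLP
        self.d_k = d_k

    def __call__(self, u, F):
        Q = u @ self.W_q  # (batch, d_k): one query per cell
        K = F @ self.W_k  # (r, d_k): one key per gene factor
        V = F @ self.W_v  # (r, d): one value per gene factor
        A = softmax(Q @ K.T / np.sqrt(self.d_k))  # (batch, r) attention map over gene factors
        u_tilde = u + A @ V  # residual connection
        return np.maximum(u_tilde @ self.W_mlp, 0.0), A


rng = np.random.default_rng(0)
d, d_f, d_k, r = 16, 8, 8, 4  # r = 4 gene factors
F = rng.standard_normal((r, d_f))  # stands in for the learned gene program matrix
u = rng.standard_normal((5, d))  # joint embeddings of 5 (control cell, perturbation) pairs
attention_maps = []
for layer in [CrossAttentionLayer(d, d_f, d_k, rng) for _ in range(2)]:  # 2 attention layers
    u, A = layer(u, F)
    attention_maps.append(A)
print(u.shape, attention_maps[0].shape)
```

Each row of an attention map is a distribution over the four gene factors, which is what the interpretation analyses in the last part of the Methods read out.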
Transfer of perturbation effects across cell lines
To transfer perturbation effects from cell line 1 to cell line 2, we first trained MORPH on all perturbation data from cell line 1. This yielded a mapping $f_1$, which is trained on control and perturbed cells from cell line 1 and models the perturbation response in cell line 1.
Next, we fine-tuned $f_1$ by minimizing the reconstruction term of the objective in equation (8) using only the control cells from cell line 2. This step aligns the model’s latent space with the distribution of the new cell line. Finally, we used the fine-tuned model, denoted by $f_{1 \to 2}$, to predict the effect of perturbing gene $j$ in cell line 2, i.e., $\hat{x}_j^{(2)} = f_{1 \to 2}(x^{(2)}, g_j)$, where $x^{(2)}$ denotes a control cell from cell line 2 and $g_j$ is the feature vector for gene $j$.
In the evaluations, we also considered fine-tuning with a subset of perturbations from cell line 2. In this variant, $f_{1 \to 2}$ is obtained by additionally using this subset of perturbations and minimizing the full objective in equation (8).
Genetic interaction prediction
For model evaluation, we performed five-fold cross-validation by splitting the double-gene perturbations into five equal folds. In each iteration, the model was trained on all single-gene perturbations and four of the five double-gene perturbation folds, while the remaining fold was used for testing. This process was repeated across all five folds. The same procedure was also applied to evaluate the baseline models.
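The cross-validation scheme above can be sketched in plain Python. This is a minimal sketch, assuming double perturbations are represented as gene-name tuples and shuffled once with a fixed seed before being dealt into five folds; the function name and toy genes are hypothetical.

```python
import random


def double_perturbation_folds(singles, doubles, n_folds=5, seed=0):
    """Yield (train, test) splits: train = all singles + 4 folds of doubles, test = remaining fold."""
    doubles = list(doubles)
    random.Random(seed).shuffle(doubles)
    folds = [doubles[i::n_folds] for i in range(n_folds)]  # near-equal folds
    for k, test in enumerate(folds):
        train_doubles = [pair for i, fold in enumerate(folds) if i != k for pair in fold]
        yield list(singles) + train_doubles, test


singles = ["A", "B", "C", "D", "E"]
doubles = [(a, b) for i, a in enumerate(singles) for b in singles[i + 1:]]  # 10 gene pairs
for train, test in double_perturbation_folds(singles, doubles):
    print(len(train), len(test))  # 13 2 for each of the five folds
```

Every double perturbation appears in exactly one test fold, while all single-gene perturbations are always available for training.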
After training, we computed gene interaction scores for the test set perturbations following the approach described in [5] and then classified each gene pair into a specific GI subtype (Supplemental Note 2). This was performed on the predicted and true observed gene expression data. Model performance was assessed using two metrics: (1) Pearson correlation between predicted and observed GI scores, and (2) the area under the receiver operating characteristic curve (AUC-ROC) for classifying gene pairs into GI subtypes.
Adaptive design of perturbation experiments
To investigate whether MORPH can help design an efficient strategy for adaptively selecting perturbations, we considered the problem of minimizing the model’s prediction error on unseen perturbations given a fixed budget of perturbations that can be performed. To this end, we adapted three prominent experimental design strategies, described below, and compared them to a random baseline that chooses random perturbations in every experimental round.
Selection strategy based on prediction error maximization [39]
The active learning process iteratively updates the learned perturbation prediction model over $T$ rounds. In round 1, a random subset of $n_1$ genes is selected, and perturbation experiments are performed for them. These experiments, together with control cells, are used as a warm-up to obtain $f^{(1)}$. In round $t$, denote the set of perturbations performed so far by $\mathcal{S}_t$ and the current model by $f^{(t)}$. The $n_{t+1}$ perturbations for the next batch of experiments, $\mathcal{B}_{t+1}$, are chosen to maximize the prediction error, which is estimated as follows, adapted from [39].
Let $\ell_j$ be the ground-truth prediction error of the current model $f^{(t)}$ on gene $j \in \mathcal{S}_t$, evaluated using the MMD between the observed and predicted perturbed cells. To extrapolate the prediction error to unseen genes, a lightweight prediction model $h_\eta$ with input representation $r_j$ is trained to minimize the mean squared error (MSE) loss:
$$\min_{\eta} \; \frac{1}{|\mathcal{S}_t|} \sum_{j \in \mathcal{S}_t} \left(h_\eta(r_j) - \ell_j\right)^2, \tag{11}$$
where the choice of $r_j$ is described in the following paragraph. All unscreened perturbations $j \notin \mathcal{S}_t$ are ranked using $h_\eta$, and the $n_{t+1}$ perturbations with the highest predicted error are selected for the next batch of experiments $\mathcal{B}_{t+1}$. The model is then updated from $f^{(t)}$ to $f^{(t+1)}$ using the additional perturbation data.
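The loss-prediction acquisition step can be sketched as follows. This is a minimal NumPy sketch, assuming closed-form ridge regression as the lightweight loss predictor (a swapped-in stand-in; the predictor's architecture is not specified in this section) and toy two-dimensional representations; all names are hypothetical.

```python
import numpy as np


def fit_ridge(R, y, alpha=1e-6):
    """Closed-form ridge regression with an unpenalized intercept (minimizes MSE + alpha * ||w||^2)."""
    R1 = np.hstack([R, np.ones((R.shape[0], 1))])
    reg = alpha * np.eye(R1.shape[1])
    reg[-1, -1] = 0.0  # do not penalize the intercept
    return np.linalg.solve(R1.T @ R1 + reg, R1.T @ y)


def predict_ridge(w, R):
    """Apply the fitted linear loss predictor (with intercept) to representations R."""
    return np.hstack([R, np.ones((R.shape[0], 1))]) @ w


def select_by_predicted_loss(reps, screened, observed_loss, batch_size, alpha=1e-6):
    """Fit a loss predictor on screened genes and return the unscreened genes with the highest predicted loss."""
    screened_set = set(screened)
    R_train = np.stack([reps[g] for g in screened])
    y_train = np.array([observed_loss[g] for g in screened])
    w = fit_ridge(R_train, y_train, alpha)
    candidates = [g for g in reps if g not in screened_set]
    scores = predict_ridge(w, np.stack([reps[g] for g in candidates]))
    return [candidates[i] for i in np.argsort(-scores)[:batch_size]]


# Toy data: the true loss grows with the first representation coordinate.
reps = {f"g{i}": np.array([float(i), float(i % 3)]) for i in range(20)}
screened = ["g0", "g2", "g4", "g6", "g8"]
observed_loss = {g: 2.0 * reps[g][0] for g in screened}
print(select_by_predicted_loss(reps, screened, observed_loss, batch_size=3))
```

Swapping the `reps` dictionary between learned encoder outputs and fixed prior vectors is all that distinguishes the two variants compared in this section.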
We considered two approaches for obtaining the input representation $r_j$ of the loss predictor $h_\eta$: (1) using the learned perturbation representation, obtained by passing $g_j$ through the perturbation encoder of $f^{(t)}$, and (2) using the prior-knowledge perturbation representation, i.e., $r_j = g_j$. This comparison allowed us to evaluate whether a dynamically learned latent space improves the selection of perturbations over a static prior-based representation.
Selection strategy based on coverage maximization [37, 38]
We also explored a coverage-based strategy [37, 38] (Supplemental Figure 7). To estimate coverage, we applied $k$-means clustering at each round $t$ to the learned perturbation representations of all genes $j$, obtained by passing $g_j$ through the perturbation encoder of $f^{(t)}$. The number of clusters was set equal to the number of perturbations that can be selected in the next round, and the perturbations closest to the cluster centers were selected for the next batch of experiments. We also benchmarked this strategy by applying the same $k$-means clustering directly to the prior-knowledge perturbation representations $g_j$, which remain fixed during training.
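The coverage-based selection can be sketched with a small k-means implementation. This is a minimal NumPy sketch, assuming clustering is run on the representations of the unscreened candidate genes, deterministic farthest-point initialization, and Euclidean distances; these choices and all names are hypothetical rather than taken from the MORPH code.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialization; returns the cluster centers."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])  # next center: point farthest from existing centers
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:  # keep the previous center if a cluster becomes empty
                centers[c] = members.mean(axis=0)
    return centers


def select_by_coverage(reps, candidates, batch_size):
    """Pick, for each cluster center, the closest not-yet-chosen candidate gene."""
    X = np.stack([reps[g] for g in candidates]).astype(float)
    chosen = []
    for center in kmeans(X, batch_size):
        for idx in np.argsort(((X - center) ** 2).sum(axis=1)):
            if candidates[idx] not in chosen:
                chosen.append(candidates[idx])
                break
    return chosen


# Toy representations forming three well-separated groups (A, B, C).
offsets = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
reps = {
    f"{name}{i}": np.array(o) + 0.1 * np.array([i % 2, i // 2])
    for name, o in offsets.items()
    for i in range(3)
}
print(select_by_coverage(reps, list(reps), batch_size=3))
```

With one cluster per available slot, the selected batch spreads across the representation space instead of concentrating in one region.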
Selection strategy based on uncertainty maximization
The last strategy we explored selects the most uncertain perturbations, estimating uncertainty within the learned perturbation representation obtained by passing $g_j$ through the perturbation encoder of $f^{(t)}$ (Supplemental Figure 7). To assess uncertainty, we measured the sensitivity of the latent representation to perturbation inputs. Specifically, for each gene $j$, we quantified uncertainty as the magnitude of the rate of change in the latent space, evaluated as the difference between the encoded perturbation representations at two consecutive training epochs. To obtain a scalar measure, we computed the norm of this difference and averaged it across epochs. This approach is based on the intuition that if the model has effectively learned the effect of a perturbation, its latent representation should remain relatively stable.
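The uncertainty score can be computed from stored snapshots of the encoded perturbation representations. This is a minimal NumPy sketch, assuming one snapshot per training epoch stacked into an array and the Euclidean norm as the magnitude (the specific norm is an assumption here); the function names are hypothetical.

```python
import numpy as np


def latent_instability(history):
    """Average epoch-to-epoch change of each gene's encoded representation.

    history: (n_epochs, n_genes, d) encoded perturbation representations, one snapshot per epoch.
    Returns one score per gene: the Euclidean norm of the change between consecutive
    epochs, averaged over epochs.
    """
    step_norms = np.linalg.norm(np.diff(history, axis=0), axis=-1)  # (n_epochs - 1, n_genes)
    return step_norms.mean(axis=0)


def select_most_uncertain(history, genes, candidates, batch_size):
    """Return the candidate genes whose representations moved the most during training."""
    scores = dict(zip(genes, latent_instability(history)))
    return sorted(candidates, key=lambda g: scores[g], reverse=True)[:batch_size]


genes = ["stable", "moving"]
history = np.array([
    [[1.0, 1.0], [0.0, 0.0]],  # epoch 1
    [[1.0, 1.0], [3.0, 4.0]],  # epoch 2
    [[1.0, 1.0], [3.0, 4.0]],  # epoch 3
])
print(latent_instability(history))
print(select_most_uncertain(history, genes, genes, batch_size=1))  # ['moving']
```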
Evaluation
To assess the performance of each approach, we considered two types of evaluation. First, we evaluated the model predictions on a held-out set of perturbations, , across all rounds using MMD and RMSE between the predicted and observed perturbed cell distributions; see the Evaluation section below for details.
Second, we considered the downstream task of identifying perturbations with the most significant effect. Let be the set of perturbations predicted to have the top largest MMD distance from the control distribution and be the corresponding set based on the observed data. We compared these sets using
| (12) |
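Equation (12) is not recoverable from the text, so the sketch below assumes one common way of comparing two top-k sets: the fraction of the k observed top perturbations that also appear among the k predicted ones. The names `top_k` and `top_k_overlap` are hypothetical, and both inputs map perturbation names to their MMD from the control distribution.

```python
def top_k(scores, k):
    """Names of the k perturbations with the largest scores (ties broken by insertion order)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def top_k_overlap(pred_mmd, obs_mmd, k):
    """Assumed comparison: |predicted top-k ∩ observed top-k| / k."""
    return len(top_k(pred_mmd, k) & top_k(obs_mmd, k)) / k
```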
Interpreting model predictions through gene programs
To identify the gene programs, we manipulated the attention scores and observed the resulting changes in the output space. Let denote a mapping matrix, where each entry quantifies the attribution of gene to program . is obtained as follows and is used to estimate the gene programs from the learned gene program matrix .
We manipulated the first attention map defined in Equation (9) by setting the attention weight corresponding to factor in to 1 and all other attention weights to 0. This operation simulates a scenario where the model focuses entirely on factor when generating the predicted gene expression profile. We set column in to the model’s predicted output under this manipulation. To obtain the gene programs, we performed hierarchical clustering on the rows of .
To identify the bipartite regulatory network connecting perturbation modules and gene programs, we leveraged the learned attention weights. As before, we used the first attention layer . For each control cell and perturbation , the trained model produces an attention map that guides the prediction of a perturbed cell, and a control attention map that guides the prediction of a control cell. To quantify the effect of the perturbation, we computed the difference: , and averaged across samples of for each to obtain . Here, denotes the number of samples with perturbation on gene in the dataset. This map reveals how the model shifts its focus across gene programs when predicting the perturbed state. We interpreted high values to indicate up-regulation and low values to indicate down-regulation.
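The averaging of attention differences can be sketched in NumPy. The sketch assumes that, for every sample, the perturbed and control attention maps from the first attention layer have already been extracted as arrays of the same shape (here (modules, programs), an assumed layout), together with the name of the perturbed gene; `perturbation_attention_shift` is a hypothetical name.

```python
import numpy as np

def perturbation_attention_shift(pert_maps, ctrl_maps, pert_genes):
    """pert_maps, ctrl_maps: (n_samples, n_modules, n_programs) attention maps used to
    predict the perturbed and the control cell from the same control input.
    pert_genes: perturbed gene of each sample.
    Returns {gene: mean over that gene's samples of (perturbed map - control map)}."""
    diff = np.asarray(pert_maps, dtype=float) - np.asarray(ctrl_maps, dtype=float)
    genes = np.asarray(pert_genes)
    return {g: diff[genes == g].mean(axis=0) for g in np.unique(genes)}
```

Positive entries of a gene's averaged map would then be read as up-regulation and negative entries as down-regulation.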
Simulation study for gene regulatory network inference
To generate simulated data based on a bipartite structure, we first constructed a colored directed bipartite graph , as shown in Figure 5a. This graph consists of two disjoint sets of nodes: parent nodes, partitioned into clusters , , , and child nodes, partitioned into clusters , , . Regulatory relationships between parent and child nodes are represented by colored directed edges, where blue edges denote activation effects and red edges denote inhibitory effects. If a cluster is connected to a cluster , then every node in regulates every node in .
We obtained the regulation matrix , where each entry represents the regulatory effect of parent node on child node , as follows:
Let denote the vector of parent nodes. The value of each parent node was sampled independently from a normal distribution:
where and . The values of the child nodes were computed as a weighted sum of the parent nodes:
To simulate single-gene perturbations, we applied soft interventions on individual parent nodes as follows. For each intervention applied to a parent node , the value was modified as:
The child node values were then recalculated based on the perturbed parent nodes:
We sampled 6,500 cells, trained MORPH on this simulated data, and evaluated the inferred bipartite regulatory network by comparing it to the ground-truth network .
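The simulation design can be sketched as below. Several specifics are not recoverable from the text and are assumptions here: the cluster sizes, a single edge weight of magnitude `weight` for every regulated pair, the parameters of the normal distribution, and the form of the soft intervention, which this sketch models as an additive shift of the targeted parent node. `build_regulation_matrix` and `simulate` are hypothetical names.

```python
import numpy as np

def build_regulation_matrix(parent_sizes, child_sizes, edges, weight=1.0):
    """edges: list of (parent_cluster, child_cluster, sign), sign = +1 (activation)
    or -1 (inhibition). Every node in a connected parent cluster regulates every
    node in the child cluster."""
    p_off = np.concatenate([[0], np.cumsum(parent_sizes)])
    c_off = np.concatenate([[0], np.cumsum(child_sizes)])
    W = np.zeros((p_off[-1], c_off[-1]))
    for a, b, sign in edges:
        W[p_off[a]:p_off[a + 1], c_off[b]:c_off[b + 1]] = sign * weight
    return W

def simulate(W, n_cells, target=None, shift=0.0, mu=0.0, sigma=1.0, seed=0):
    """Sample parents i.i.d. from N(mu, sigma^2), optionally apply a soft intervention
    on parent `target` (assumed additive shift), then compute children as weighted sums."""
    rng = np.random.default_rng(seed)
    X = rng.normal(mu, sigma, size=(n_cells, W.shape[0]))
    if target is not None:
        X[:, target] += shift
    Y = X @ W  # child j = sum_i W[i, j] * parent i
    return X, Y
```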
Application to imaging data
To demonstrate the applicability of MORPH on imaging data, we applied it to predict the infection state distribution of Ebola-virus-infected cells under different perturbations based on an optical pooled screening dataset [10].
To extract image features as input to MORPH, we selected the three most relevant imaging channels: DAPI, VP35 RNA, and VP35 protein. These channels were passed to a Vision Transformer (ViT) [40] pretrained on ImageNet-21k (about 14 million images). We fine-tuned the ViT using an auxiliary classification task in which images of control cells were classified into four infection states (Faint, Punctate, Cytoplasmic, and Peripheral), as labeled in [10]. This auxiliary task was trained by minimizing the following supervised contrastive loss [49]:
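$$\mathcal{L} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp\left(z_i \cdot z_p / \tau\right)}{\sum_{a \in A(i)} \exp\left(z_i \cdot z_a / \tau\right)} \tag{13}$$
where $I$ is the collection of indices of all images in the batch, $P(i)$ is the set of indices of images that belong to the same infection state as the $i$-th image, $A(i)$ is the set of indices of all images in the batch except the $i$-th image, $\tau$ is a temperature parameter, and $z_i$ is the feature representation of image $i$, which serves as input to MORPH.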
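The supervised contrastive loss in Equation (13) can be written as a short NumPy sketch. Following [49], features are L2-normalized before the dot products; anchors without any positive in the batch are skipped (an assumption), and the per-anchor terms are summed as in the equation rather than averaged. `supcon_loss` and the default `tau` are hypothetical.

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """z: (n, dim) image features; labels: (n,) infection-state labels."""
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # project onto the unit sphere
    labels = np.asarray(labels)
    n = len(z)
    sim = z @ z.T / tau
    self_mask = np.eye(n, dtype=bool)
    # Log of the denominator: log-sum-exp over A(i), i.e. all samples except i.
    sim_masked = np.where(self_mask, -np.inf, sim)
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim_masked - row_max).sum(axis=1))
    log_prob = sim - log_denom[:, None]
    # P(i): other samples sharing the anchor's infection state.
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(per_anchor.sum())
```

Pulling same-state images together while pushing other states apart is what shapes the embedding before it is passed to MORPH.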
To evaluate the capability of MORPH to predict infection states under unseen perturbations, we fit a logistic regression on the extracted image features of the control cells to classify them into the four infection states, and we applied this logistic regression model to the image features predicted by MORPH for unseen perturbations. Since most perturbations resulted in a cell state distribution nearly identical to the control distribution, we applied a chi-squared test, as described in the following paragraph, on the predicted state distribution to de-noise the predictions: if the test statistic was below a threshold, the prediction was set to the control state distribution vector. These de-noised predicted state distribution vectors were then compared with the ground-truth state distribution vector.
For each perturbation, the chi-squared test compared its (predicted) state distribution with that of the control cells. Given four possible infection states (Faint, Punctate, Cytoplasmic, and Peripheral), the chi-squared statistic was computed with three degrees of freedom as
$$\chi^2 = \sum_{i \in \{c, p\}} \sum_{s=1}^{4} \frac{\left(O_{is} - E_{is}\right)^2}{E_{is}},$$
where $O_{is}$ represents the count of cells in state $s$ for control ($i = c$) and perturbation ($i = p$). The expected counts, $E_{is}$, were calculated as $E_{is} = R_i C_s / N$, where $R_i = \sum_{s} O_{is}$, $C_s = \sum_{i} O_{is}$, and $N = \sum_{i} \sum_{s} O_{is}$. To account for multiple hypothesis testing, we applied a Bonferroni correction to the resulting p-values.
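The test and the de-noising rule described above can be sketched with NumPy and the standard library. The chi-squared survival function for three degrees of freedom has the closed form $\operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$, which avoids a SciPy dependency. The threshold value and the function names are hypothetical, and states with zero counts in both rows would make an expected count zero, so they would need to be dropped first.

```python
import math
import numpy as np

def chi2_stat(ctrl_counts, pert_counts):
    """Chi-squared statistic of the 2 x 4 contingency table (control vs perturbation)."""
    O = np.array([ctrl_counts, pert_counts], dtype=float)
    R = O.sum(axis=1, keepdims=True)   # row totals R_i
    C = O.sum(axis=0, keepdims=True)   # column totals C_s
    E = R * C / O.sum()                # expected counts E_is = R_i C_s / N
    return float(((O - E) ** 2 / E).sum())

def chi2_sf_df3(x):
    """P(chi-squared with 3 degrees of freedom > x), closed form."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def bonferroni(pvals):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def denoise(pred_props, ctrl_props, stat, threshold):
    """Replace a predicted state distribution by the control one when the statistic is small."""
    return ctrl_props if stat < threshold else pred_props
```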
Datasets and preprocessing
Single-cell RNA sequencing data:
For the Replogle [7] and Dixit [2] datasets, the Scanpy toolbox [50] was used to perform cell and gene filtering, library size normalization, and log1p transformation. For all datasets, the 5,000 most highly variable genes, identified using the highly_variable_genes function [50], were selected to reduce the complexity of the prediction problem, following preprocessing approaches similar to [19, 51]. Preprocessing for the Norman [5] dataset was inherited from [51]. To ensure sufficient data for learning a distribution, perturbations with fewer than 16 cells were filtered out from the datasets.
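The preprocessing steps can be sketched as follows. The Scanpy calls (`filter_cells`, `filter_genes`, `normalize_total`, `log1p`, `highly_variable_genes`) exist in Scanpy, but the filtering thresholds shown are hypothetical placeholders rather than the values used for these datasets. The minimum-cell filter for perturbations is written in plain Python so it can run on its own; `preprocess` and `keep_well_sampled` are hypothetical names.

```python
from collections import Counter

def preprocess(adata, min_genes=200, min_cells=3, n_top_genes=5000):
    """Cell/gene filtering, library-size normalization, log1p, and HVG selection.
    Thresholds are illustrative assumptions."""
    import scanpy as sc  # imported lazily so the filter below runs without Scanpy
    sc.pp.filter_cells(adata, min_genes=min_genes)
    sc.pp.filter_genes(adata, min_cells=min_cells)
    sc.pp.normalize_total(adata)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes, subset=True)
    return adata

def keep_well_sampled(perturbation_labels, min_cells=16):
    """Indices of cells whose perturbation has at least `min_cells` cells."""
    counts = Counter(perturbation_labels)
    keep = {p for p, n in counts.items() if n >= min_cells}
    return [i for i, p in enumerate(perturbation_labels) if p in keep]
```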
Optical pooled screening data:
The processed imaging data, single-cell masks, and metadata were obtained from the Google Cloud Storage provided by [10]. For each train-test split, the ViT model was fine-tuned on the training set using an auxiliary task of classifying images into different infection states, and the fine-tuned ViT model was then applied to the entire dataset to extract image features; see the previous section for more details. Since the dataset is highly imbalanced (most perturbations exhibit phenotypes similar to the control distribution), the training-set features were balanced by up-sampling perturbations with significant effects. Perturbations with significant effects were identified using a chi-squared test comparing infection states with and without perturbation, based solely on the training set.
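The balancing step can be sketched as resampling with replacement. The text does not state how many copies were drawn, so the sketch assumes each significant perturbation is up-sampled to a hypothetical `target` number of cells while all other perturbations are left unchanged; `upsample_significant` is a hypothetical name.

```python
import numpy as np

def upsample_significant(features, perts, significant, target, seed=0):
    """features: (n_cells, dim); perts: per-cell perturbation labels;
    significant: set of perturbations flagged by the chi-squared test."""
    rng = np.random.default_rng(seed)
    perts = np.asarray(perts)
    idx_all = []
    for p in np.unique(perts):
        idx = np.flatnonzero(perts == p)
        if p in significant and len(idx) < target:
            # Draw extra cells of this perturbation with replacement.
            extra = rng.choice(idx, size=target - len(idx), replace=True)
            idx = np.concatenate([idx, extra])
        idx_all.append(idx)
    idx_all = np.concatenate(idx_all)
    return features[idx_all], perts[idx_all]
```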
Evaluation
Single-cell RNA-seq data:
We evaluated MORPH by comparing the distributions of real and predicted gene expression in perturbed cells. We used three metrics: the root mean squared error (RMSE) of feature means, the Pearson correlation between the predicted and true mean post-perturbation gene expression changes relative to the control, and the maximum mean discrepancy (MMD) between the real and predicted gene expression distributions of the perturbed cells.
RMSE:
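It measures the root mean squared error between the means of the observed and predicted cell distributions as
$$\mathrm{RMSE}_g = \sqrt{\frac{1}{d} \sum_{j=1}^{d} \left( \hat{\mu}_j^{(g)} - \mu_j^{(g)} \right)^2} \tag{14}$$
where $\hat{\mu}_j^{(g)}$ and $\mu_j^{(g)}$ denote the predicted and observed mean of feature $j$ in the perturbed cells (where gene $g$ was perturbed), and $d$ is the total number of features in a cell.
Pearson correlation:
It quantifies the relationship between the observed and predicted mean changes from the control as
$$\rho_g = \frac{\sum_{j=1}^{d} \left( \hat{\delta}_j^{(g)} - \bar{\hat{\delta}}^{(g)} \right) \left( \delta_j^{(g)} - \bar{\delta}^{(g)} \right)}{\sqrt{\sum_{j=1}^{d} \left( \hat{\delta}_j^{(g)} - \bar{\hat{\delta}}^{(g)} \right)^2} \sqrt{\sum_{j=1}^{d} \left( \delta_j^{(g)} - \bar{\delta}^{(g)} \right)^2}} \tag{15}$$
where $\hat{\delta}_j^{(g)} = \hat{\mu}_j^{(g)} - \mu_j^{\mathrm{ctrl}}$ and $\delta_j^{(g)} = \mu_j^{(g)} - \mu_j^{\mathrm{ctrl}}$ represent the predicted and observed mean changes in feature $j$ following perturbation $g$, relative to its mean value in control cells, $\mu_j^{\mathrm{ctrl}}$. Additionally, $\bar{\hat{\delta}}^{(g)}$ and $\bar{\delta}^{(g)}$ denote the average predicted and observed changes across all genes, respectively. The fraction of genes with correctly predicted directions of perturbation-induced changes relative to the control is computed as
$$\mathrm{Acc}_g = \frac{1}{d} \sum_{j=1}^{d} \mathbb{1}\left[ \operatorname{sign}\left( \hat{\delta}_j^{(g)} \right) = \operatorname{sign}\left( \delta_j^{(g)} \right) \right] \tag{16}$$
where $\mathbb{1}[\cdot]$ is the indicator function that equals 1 if the predicted and observed changes have the same sign and 0 otherwise.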
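The three mean-based metrics in Equations (14) to (16) can be computed together in a short NumPy sketch. It assumes each input is a cells-by-features expression matrix and that an optional index list (for example, the top 50 marker genes) restricts the evaluation; `mean_metrics` is a hypothetical name.

```python
import numpy as np

def mean_metrics(pred_pert, obs_pert, ctrl, genes=None):
    """pred_pert, obs_pert, ctrl: (n_cells, d) matrices.
    Returns (RMSE of means, Pearson r of mean changes, fraction of correct directions)."""
    mu_hat = np.asarray(pred_pert, dtype=float).mean(axis=0)
    mu = np.asarray(obs_pert, dtype=float).mean(axis=0)
    mu_ctrl = np.asarray(ctrl, dtype=float).mean(axis=0)
    if genes is not None:  # e.g. indices of marker genes
        mu_hat, mu, mu_ctrl = mu_hat[genes], mu[genes], mu_ctrl[genes]
    rmse = float(np.sqrt(np.mean((mu_hat - mu) ** 2)))
    # Mean changes relative to the control cells.
    d_hat, d_obs = mu_hat - mu_ctrl, mu - mu_ctrl
    pearson = float(np.corrcoef(d_hat, d_obs)[0, 1])
    direction = float(np.mean(np.sign(d_hat) == np.sign(d_obs)))
    return rmse, pearson, direction
```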
MMD:
Metrics based solely on feature means can be insensitive to distributional heterogeneity, potentially favoring predictions that capture only the average rather than the full complexity of the underlying distribution. To address this limitation, we incorporated a distributional distance measure—maximum mean discrepancy (MMD) [45]—which captures differences beyond the mean by considering higher-order moments. We used an unbiased estimate of MMD by averaging kernel similarities over the cells in each set (Equation (6)). The Gaussian kernel was used, and the MMD is reported as an average over multiple bandwidths, similar to [21].
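An unbiased MMD estimate with a Gaussian kernel, averaged over several bandwidths, can be sketched as below. The bandwidth values, the kernel parameterization $\exp(-\lVert x - y \rVert^2 / 2\sigma^2)$, and the choice to average squared-MMD estimates are assumptions, not the exact settings of Equation (6) or [21]; the function names are hypothetical.

```python
import numpy as np

def _sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)

def mmd2_unbiased(X, Y, sigma):
    """Unbiased estimate of squared MMD between samples X (m, d) and Y (n, d)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    m, n = len(X), len(Y)
    Kxx = np.exp(-_sq_dists(X, X) / (2 * sigma ** 2))
    Kyy = np.exp(-_sq_dists(Y, Y) / (2 * sigma ** 2))
    Kxy = np.exp(-_sq_dists(X, Y) / (2 * sigma ** 2))
    # Exclude the diagonal terms k(x_i, x_i) from the within-sample averages.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2 * Kxy.mean())

def mmd_multi_bandwidth(X, Y, sigmas=(0.5, 1.0, 2.0, 5.0)):
    """Average of the unbiased estimates over several (assumed) bandwidths."""
    return float(np.mean([mmd2_unbiased(X, Y, s) for s in sigmas]))
```

Because the estimator is unbiased, it can be slightly negative when the two samples come from the same distribution.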
We reported all metrics both on the top 50 marker genes and on all genes. Marker genes were computed for each perturbation with the Scanpy [50] function rank_genes_groups, using the untreated control cells as the reference.
Imaging data:
We evaluated the model on its ability to distinguish between biologically relevant cell states. Specifically, we computed the L1 distance between the predicted and observed cell state vectors for each perturbation, divided by 2 so that it ranges from 0 to 1:
$$\frac{1}{2} \sum_{s} \left| \hat{p}_s^{(g)} - p_s^{(g)} \right| \tag{17}$$
where $\hat{p}^{(g)}$ and $p^{(g)}$ represent the predicted and observed cell state distribution vectors for cells subjected to perturbation $g$, with $p_s^{(g)}$ denoting the proportion of perturbed cells in state $s$.
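A minimal sketch of this metric, assuming each cell has already been assigned an integer infection-state label (for example, by the logistic regression classifier) so that state proportions can be counted; `state_proportions` and `state_l1` are hypothetical names.

```python
import numpy as np

def state_proportions(state_labels, n_states=4):
    """Fraction of cells in each of the `n_states` infection states."""
    counts = np.bincount(np.asarray(state_labels), minlength=n_states)
    return counts / counts.sum()

def state_l1(pred_props, obs_props):
    """Half the L1 distance between two state distributions; lies in [0, 1]."""
    p_hat = np.asarray(pred_props, dtype=float)
    p = np.asarray(obs_props, dtype=float)
    return float(0.5 * np.abs(p_hat - p).sum())
```

Halving the L1 distance makes the score equal to the total variation distance, so two distributions with disjoint support score exactly 1.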
Implementation details
We used PyTorch [52] to implement the MORPH neural network model. Hyperparameter search is discussed in Supplemental Note 1. Pathway enrichment analysis was implemented using the GSEApy package [53].
Supplementary Material
Acknowledgments
We would like to thank E. Forte for valuable discussions and feedback on the manuscript. C.H., J.Z. and M.D. were supported by the Eric and Wendy Schmidt Center at the Broad Institute. J.Z. was partially supported by an Apple AI/ML PhD Fellowship. C.U. was partially supported by NCCIH/NIH (1DP2AT012345), NIDDK/NIH (5RC2DK135492-02), ONR (N00014-24-1-2687), AstraZeneca, the United States Department of Energy (DE-SC0023187), and MIT J-Clinic for Machine Learning and Health.
Footnotes
Declaration of interests
The authors declare no competing interests.
Ahlmann-Eltze et al. [23] proposed using PCA-derived embeddings of target genes from the training data to predict the effects of unseen perturbations. However, we observed that DepMap embeddings capture more relevant information about gene-gene similarity with respect to perturbation effects. To compare our model against the strongest version of this baseline, we adapted the linear model to use DepMap embeddings instead.
We found that residual connections in the attention layers, while beneficial for prediction performance, diluted the signal when interpreting the gene regulatory network (GRN). Specifically, removing the residual connections in the first attention layer led to a clearer readout of program-to-gene mappings, improving interpretability at the cost of slightly reduced prediction accuracy (Methods, Supplemental Table 4 and Supplemental Note 3). Given that our primary interest here is GRN inference, we focus on this variant of MORPH for the remainder of the interpretability analysis. Results using the full model can be found in Supplemental Figure 8.
We used the second-to-last layer since the final layer tends to encode task-specific features, whereas the second-to-last layer captures more generalizable representations [25].
We set for the genome-wide Perturb-seq dataset. See Supplemental Note 1 for details.
We did not apply this to GEARS, since the input gene embeddings for GEARS are initialized randomly.
References
- [1]. A focus on single-cell omics. Nature Reviews Genetics 24, 485–485 (2023)
- [2]. Dixit A., Parnas O., Li B., Chen J., Fulco C.P., Jerby-Arnon L., Marjanovic N.D., Dionne D., Burks T., Raychowdhury R., et al.: Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167(7), 1853–1866 (2016)
- [3]. Adamson B., Norman T.M., Jost M., Cho M.Y., Nuñez J.K., Chen Y., Villalta J.E., Gilbert L.A., Horlbeck M.A., Hein M.Y., et al.: A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167(7), 1867–1882 (2016)
- [4]. Horlbeck M.A., Xu A., Wang M., Bennett N.K., Park C.Y., Bogdanoff D., Adamson B., Chow E.D., Kampmann M., Peterson T.R., et al.: Mapping the genetic landscape of human cells. Cell 174(4), 953–967 (2018)
- [5]. Norman T.M., Horlbeck M.A., Replogle J.M., Ge A.Y., Xu A., Jost M., Gilbert L.A., Weissman J.S.: Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365(6455), 786–793 (2019)
- [6]. Frangieh C.J., Melms J.C., Thakore P.I., Geiger-Schuller K.R., Ho P., Luoma A.M., Cleary B., Jerby-Arnon L., Malu S., Cuoco M.S., et al.: Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nature Genetics 53(3), 332–341 (2021)
- [7]. Replogle J.M., Saunders R.A., Pogson A.N., Hussmann J.A., Lenail A., Guna A., Mascibroda L., Wagner E.J., Adelman K., Lithwick-Yanai G., et al.: Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq. Cell 185(14), 2559–2575 (2022)
- [8]. Feldman D., Funk L., Le A., Carlson R.J., Leiken M.D., Tsai F., Soong B., Singh A., Blainey P.C.: Pooled genetic perturbation screens with image-based phenotypes. Nature Protocols 17(2), 476–512 (2022)
- [9]. Carlson R.J., Leiken M.D., Guna A., Hacohen N., Blainey P.C.: A genome-wide optical pooled screen reveals regulators of cellular antiviral responses. Proceedings of the National Academy of Sciences 120(16), 2210623120 (2023)
- [10]. Carlson R.J., Patten J., Stefanakis G., Soong B.Y., Radhakrishnan A., Singh A., Thakur N., Amarasinghe G.K., Hacohen N., Basler C.F., et al.: Single-cell image-based genetic screens systematically identify regulators of Ebola virus subcellular infection dynamics. bioRxiv (2024)
- [11]. Hart T., Chandrashekhar M., Aregger M., Steinhart Z., Brown K.R., MacLeod G., Mis M., Zimmermann M., Fradet-Turcotte A., Sun S., et al.: High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell 163(6), 1515–1526 (2015)
- [12]. Tsherniak A., Vazquez F., Montgomery P.G., Weir B.A., Kryukov G., Cowley G.S., Gill S., Harrington W.F., Pantel S., Krill-Burger J.M., et al.: Defining a cancer dependency map. Cell 170(3), 564–576 (2017)
- [13]. Wang T., Yu H., Hughes N.W., Liu B., Kendirli A., Klein K., Chen W.W., Lander E.S., Sabatini D.M.: Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic Ras. Cell 168(5), 890–903 (2017)
- [14]. Lee J.S., Nair N.U., Dinstag G., Chapman L., Chung Y., Wang K., Sinha S., Cha H., Kim D., Schperberg A.V., et al.: Synthetic lethality-mediated precision oncology via the tumor transcriptome. Cell 184(9), 2487–2502 (2021)
- [15]. Katti A., Diaz B.J., Caragine C.M., Sanjana N.E., Dow L.E.: CRISPR in cancer biology and therapy. Nature Reviews Cancer 22(5), 259–279 (2022)
- [16]. O’Neil N.J., Bailey M.L., Hieter P.: Synthetic lethality and cancer. Nature Reviews Genetics 18(10), 613–623 (2017)
- [17]. Uhler C.: Building a two-way street between cell biology and machine learning. Nature Cell Biology 26(1), 13–14 (2024)
- [18]. Bunne C., Roohani Y., Rosen Y., Gupta A., Zhang X., Roed M., Alexandrov T., AlQuraishi M., Brennan P., Burkhardt D.B., et al.: How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell 187(25), 7045–7063 (2024)
- [19]. Roohani Y., Huang K., Leskovec J.: Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nature Biotechnology 42(6), 927–935 (2024)
- [20]. Cui H., Wang C., Maan H., Pang K., Luo F., Duan N., Wang B.: scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nature Methods, 1–11 (2024)
- [21]. Bunne C., Stark S.G., Gut G., Del Castillo J.S., Levesque M., Lehmann K.-V., Pelkmans L., Krause A., Rätsch G.: Learning single-cell perturbation responses using neural optimal transport. Nature Methods 20(11), 1759–1768 (2023)
- [22]. Gaudelet T., Del Vecchio A., Carrami E.M., Cudini J., Kapourani C.-A., Uhler C., Edwards L.: Season combinatorial intervention predictions with Salt & Peper. arXiv preprint arXiv:2404.16907 (2024)
- [23]. Ahlmann-Eltze C., Huber W., Anders S.: Deep learning-based predictions of gene perturbation effects do not yet outperform simple linear methods. bioRxiv, 2024–09 (2024)
- [24]. Zhang J., Greenewald K., Squires C., Srivastava A., Shanmugam K., Uhler C.: Identifiability guarantees for causal disentanglement from soft interventions. Advances in Neural Information Processing Systems 36 (2024)
- [25]. Theodoris C.V., Xiao L., Chopra A., Chaffin M.D., Al Sayed Z.R., Hill M.C., Mantineo H., Brydon E.M., Zeng Z., Liu X.S., et al.: Transfer learning enables predictions in network biology. Nature 618(7965), 616–624 (2023)
- [26]. Chen Y., Zou J.: Simple and effective embedding model for single-cell biology built from ChatGPT. Nature Biomedical Engineering, 1–11 (2024)
- [27]. McInnes L., Healy J., Melville J.: UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018)
- [28]. Gahmberg C.G., Andersson L.: K562–a human leukemia cell line with erythroid features. In: Seminars in Hematology, vol. 18, pp. 72–77 (1981)
- [29]. Andersson L.C., Jokinen M., Gahmberg C.G.: Induction of erythroid differentiation in the human leukaemia cell line K562. Nature 278(5702), 364–365 (1979)
- [30]. Rutherford T., Clegg J., Higgs D., Jones R., Thompson J., Weatherall D.: Embryonic erythroid differentiation in the human leukemic cell line K562. Proceedings of the National Academy of Sciences 78(1), 348–352 (1981)
- [31]. Zhang J., Cammarata L., Squires C., Sapsis T.P., Uhler C.: Active learning for optimal intervention design in causal models. Nature Machine Intelligence 5(10), 1066–1075 (2023)
- [32]. Tosh C., Tec M., White J.B., Quinn J.F., Ibanez Sanchez G., Calder P., Kung A.L., Dela Cruz F.S., Tansey W.: A Bayesian active learning platform for scalable combination drug screens. Nature Communications 16(1), 156 (2025)
- [33]. Lewis D.D.: A sequential algorithm for training text classifiers: Corrigendum and additional data. In: ACM SIGIR Forum, vol. 29, pp. 13–19 (1995). ACM, New York, NY, USA
- [34]. Joshi A.J., Porikli F., Papanikolopoulos N.: Multi-class active learning for image classification. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2372–2379 (2009). IEEE
- [35]. Wang K., Zhang D., Li Y., Zhang R., Lin L.: Cost-effective active learning for deep image classification. IEEE Transactions on Circuits and Systems for Video Technology 27(12), 2591–2600 (2016)
- [36]. Seung H.S., Opper M., Sompolinsky H.: Query by committee. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 287–294 (1992)
- [37]. Sener O., Savarese S.: Active learning for convolutional neural networks: A coreset approach. arXiv preprint arXiv:1708.00489 (2017)
- [38]. Nguyen H.T., Smeulders A.: Active learning using pre-clustering. In: Proceedings of the Twenty-first International Conference on Machine Learning, p. 79 (2004)
- [39]. Yoo D., Kweon I.S.: Learning loss for active learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 93–102 (2019)
- [40]. Wu B., Xu C., Dai X., Wan A., Zhang P., Yan Z., Tomizuka M., Gonzalez J., Keutzer K., Vajda P.: Visual Transformers: Token-based Image Representation and Processing for Computer Vision (2020)
- [41]. Deng J., Dong W., Socher R., Li L.-J., Li K., Fei-Fei L.: ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009). IEEE
- [42]. Sayers E.W., Beck J., Bolton E.E., Brister J.R., Chan J., Comeau D.C., Connor R., DiCuccio M., Farrell C.M., Feldgarden M., et al.: Database resources of the National Center for Biotechnology Information. Nucleic Acids Research 52(D1), 33 (2024)
- [43]. Bateman A., Martin M.-J., Orchard S., Magrane M., Ahmad S., Alpi E., Bowler-Barnett E.H., Britto R., Cukura A., Denny P., et al.: UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Research 51(D1), 523–531 (2023)
- [44]. Kingma D.P., Welling M., et al.: Auto-encoding variational Bayes. Banff, Canada (2013)
- [45]. Gretton A., Borgwardt K.M., Rasch M.J., Schölkopf B., Smola A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012)
- [46]. OpenAI: New and Improved Embedding Model. Accessed: 2024-05-04 (2024). https://openai.com/blog/new-and-improved-embedding-model
- [47]. Szklarczyk D., Kirsch R., Koutrouli M., Nastou K., Mehryary F., Hachilif R., Gable A.L., Fang T., Doncheva N.T., Pyysalo S., et al.: The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Research 51(D1), 638–646 (2023)
- [48]. Vaswani A.: Attention is all you need. Advances in Neural Information Processing Systems (2017)
- [49]. Khosla P., Teterwak P., Wang C., Sarna A., Tian Y., Isola P., Maschinot A., Liu C., Krishnan D.: Supervised contrastive learning. Advances in Neural Information Processing Systems 33, 18661–18673 (2020)
- [50]. Wolf F.A., Angerer P., Theis F.J.: Scanpy: large-scale single-cell gene expression data analysis. Genome Biology 19, 1–5 (2018)
- [51]. Lotfollahi M., Klimovskaia Susmelj A., De Donno C., Hetzel L., Ji Y., Ibarra I.L., Srivatsan S.R., Naghipourfar M., Daza R.M., Martin B., et al.: Predicting cellular responses to complex perturbations in high-throughput screens. Molecular Systems Biology 19(6), 11517 (2023)
- [52]. Paszke A., Gross S., Massa F., Lerer A., Bradbury J., Chanan G., Killeen T., Lin Z., Gimelshein N., Antiga L., et al.: PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32 (2019)
- [53]. Fang Z., Liu X., Peltz G.: GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39(1), 757 (2023)
Data Availability Statement
Data:
The datasets analyzed in this paper are publicly available and include Perturb-seq datasets from Norman et al. [5], Replogle et al. [7], and Dixit et al. [2], as well as the optical pooled screen dataset from Carlson et al. [10]. In addition, we used gene embeddings derived from Geneformer [25], GenePT [26], and DepMap [12]. The corresponding accession numbers of the single-cell datasets are listed in the attached Key Resources Table. A processed version of the Norman [5] and DepMap [12] datasets can be retrieved from: https://drive.google.com/drive/folders/1TQJE281q4xH7HcNHMg1v0urD99EDj5bO?usp=drive_link.
Code:
All original code is publicly available and has been deposited at https://github.com/uhlerlab/MORPH. The repository also contains the open-source software packages used, a detailed description of how to reproduce all our results, as well as a demo application and instructions on how to apply our pipeline to user-provided datasets.






