scPerb: Predict single-cell perturbation via style transfer-based variational autoencoder

Zijia Tang; Minghao Zhou; Kai Zhang; Qianqian Song

doi:10.1016/j.jare.2024.10.035

2024 Oct 31;75:189–198. doi: 10.1016/j.jare.2024.10.035

PMCID: PMC12536660 PMID: 39486785

Keywords: Single-cell RNA sequencing, Perturbation, Style transfer, Variational auto-encoder

Highlights

• We introduced scPerb, a framework to predict cellular responses to perturbations, addressing limitations of costly, labor-intensive methods.
• scPerb uses a style transfer strategy, isolating perturbation-related variance from unperturbed to perturbed cells.
• scPerb outperforms current methods in post-perturbation gene expression prediction, achieving R² values of 0.98, 0.98, and 0.96 on benchmarks.
• Its robust performance across datasets shows scPerb's potential in precision medicine for cost-effective predictions.

Abstract

Introduction

Traditional methods for obtaining cellular responses after perturbation are usually labor-intensive and costly, especially when working with multiple different experimental conditions. Therefore, accurate prediction of cellular responses to perturbations is of great importance in computational biology. Existing methodologies, such as graph-based approaches, vector arithmetic, and neural networks, either mix perturbation-related variances with cell-type-specific patterns or implicitly distinguish them within black-box models.

Objectives

This study aims to introduce and demonstrate a novel framework, scPerb, which explicitly extracts perturbation-related variances and transfers them from unperturbed to perturbed cells to accurately predict the effect of perturbation at the single-cell level.

Methods

scPerb utilizes a style transfer strategy by incorporating a style encoder into the architecture of a variational autoencoder. The style encoder captures the differences in latent representations between unperturbed and perturbed cells, enabling accurate prediction of post-perturbation gene expression data.

Results

Comprehensive comparisons with existing methods demonstrate that scPerb delivers improved performance and higher accuracy in predicting cellular responses to perturbations. Notably, scPerb outperforms other methods across multiple datasets, achieving superior R² values of 0.98, 0.98, and 0.96 on three benchmarking datasets.

Conclusion

scPerb offers a significant advancement in predicting cellular responses by effectively separating and transferring perturbation-related variances. This framework not only enhances prediction accuracy but also provides a robust tool for computational biology, addressing the limitations of current methodologies.

Introduction

Single-cell RNA sequencing (scRNA-seq) is a revolutionary technology to profile gene expression of cells in heterogeneous tissue samples [1], [2], [3]. It can measure transcripts in thousands of single cells from multiple biological samples under different conditions [4], [5], [6], [7], [8]. Such breakthrough technology has inspired the development of tailored computational tools such as cell type annotations [9], [10], [11], [12], identification of pseudo-time trajectories [13], [14], and rare cell type detection [15], [16], facilitating the biological insights into single-cell data [17], [18]. Although scRNA-seq technologies have led to a remarkable growth of single-cell data, it is still challenging to collect the matched pairs of control and perturbed samples for a particular perturbation. As current databases comprise a wide variety of single-cell data collected from samples at normal conditions, there is a critical need to leverage the existing data at normal conditions to generate and predict the single-cell data after a certain perturbation. To achieve this, an accurate and robust method is needed, with generalized capabilities in revealing gene expression patterns across different tissues, different platforms, and limited data size.

Recent efforts to address the challenges of perturbation prediction use generative models such as Generative Adversarial Networks (GANs) [19] and Variational Autoencoders (VAEs) [20]. GAN-based models use a generator to simulate perturbed data and an adversarial discriminator to assess how closely the predicted data matches the ground truth. While this adversarial setup is designed to produce robust predictions, GAN models often suffer from training instability. The generator may collapse, particularly when balancing the adversarial process becomes difficult, which is often the case with noisy or sparse single-cell data. This instability can result in poor generalization to new datasets or perturbations, limiting the model's reliability for broader biological applications. To address the challenge of predicting single-cell perturbations, sc-WGAN [21] applies the more stable Wasserstein GAN (WGAN), while stGAN [22] introduces style transfer learning by incorporating multiple styles into the generator. However, both models still suffer from significant limitations. Because GANs are inherently difficult to train, owing to the challenge of balancing the generator and discriminator, training often produces unstable gradients. This instability makes sc-WGAN and stGAN prone to model collapse, where they fail to decrease loss effectively during training. As a result, these models struggle to generalize to new datasets and have significantly lower accuracy in predicting perturbations, especially in different biological scenarios. On the other hand, the VAE-based model scGen [23] samples gene expression profiles from a multivariate Gaussian distribution through variational inference. It relies on the assumption of a fixed linear relationship between control (unperturbed) and perturbed cells. This oversimplified assumption is not sufficient for capturing the changes in complex biological data after perturbation.

Therefore, we introduce scPerb, a novel tool designed to predict single-cell gene expression under perturbations such as drug dosage, treatment, or gene modification. Unlike previous models, scPerb decouples perturbation-specific features using learnable parameters, overcoming the limitations of fixed vectors—such as those in scGen—that struggle to capture non-linear perturbation features. Additionally, it avoids the instability commonly observed in GAN-based models, which often results in poor accuracy due to challenges in learning features under varying conditions. This adaptive approach enables scPerb to effectively model complex perturbation patterns, leading to significantly more accurate and robust predictions across diverse datasets. scPerb adopts a novel strategy to decouple gene expression data into perturbation-independent contents and perturbation-specific styles. Here the ‘content’ represents the perturbation-irrelevant information, while ‘style’ refers to the perturbation-specific information. To learn the contents and styles, scPerb takes gene expression data input from both control (unperturbed) and perturbed cells, projecting each cell's gene expression into a latent space, with a tailor-designed loss. By transferring this perturbation style from control to perturbed cells, scPerb predicts the gene expressions of perturbed cells. In comprehensive benchmarks, scPerb outperforms other modeling approaches such as scGen, CVAE, stGAN, and sc-WGAN. These benchmarking results provide a valuable resource for the community, highlighting both the potential and limitations of these models when applied to scRNA-seq data. scPerb is implemented as an integrated workflow in Python and is available at https://github.com/QSong-github/scPerb.

Results

In this section, we demonstrate that scPerb accurately predicts perturbed single-cell gene expression data, outperforming several benchmark models including scGen, CVAE, stGAN, and sc-WGAN across multiple datasets. Additionally, scPerb consistently achieves superior performance when extending to smaller datasets, such as the Hpoly dataset. This underscores the versatility and reliability of scPerb in predicting gene expression changes under different perturbations.

Overview of scPerb framework

In this work, we presented a novel tool, scPerb, to predict single-cell gene expressions under specific conditions such as a dose [24], a treatment [25], [26], or a modification of genes [27], [28], [29] (Fig. 1). We hypothesized that the observations $X^{ctrl}$ and $X^{perb}$ from the control and perturbed datasets had two independent latent features: a cell type-related latent feature, denoted as “content” $c$, and a dataset-specific feature, denoted as “style” $s$. scPerb learned the contents $Z_{c}^{ctrl}$ and $Z_{c}^{perb}$ of the cell types from both the control and perturbed datasets, where $c$ represented the content features of the cell types, and transferred the style $Z_{s}^{ctrl}$ from the control dataset to the perturbed dataset $Z_{s}^{perb}$, where $s$ represented the dataset styles. scPerb solved the perturbation task by learning the latent features of cell types and the condition-specific style vector. Specifically, scPerb estimated the multivariate normal distribution of the cell type feature $c$. scPerb also used a neural network to learn the style transformation from the datasets. Unlike previous methods that adopt a constant vector to transfer the latent features from cells of the control condition to those of the perturbed condition, scPerb introduces learnable parameters and allows the neural network to learn both cell type and condition differences between the control and perturbed datasets. In a comprehensive evaluation, scPerb produced more accurate predictions than the other approaches. Details are given in Materials and methods.

Fig. 1 — **scPerb predicts gene expressions of perturbed cells.** scPerb was designed to predict gene expressions in perturbed cells and combines the principles of both style transfer and VAE. With the perturbed and control dataset as inputs, the content encoder projected the data into latent space. Differences between the latent representations of the perturbed dataset and the control dataset were captured by a style vector (s), which enabled transferring from the perturbed style to the control style. Such style vector was initiated with a random vector and updated via a style encoder, which learned the style of the perturbed dataset and transferred it to the control dataset by adding it to the latent representation of the control dataset. By minimizing the differences between both latent representations and gene expressions between predicted perturbed data and real perturbed data, scPerb transferred the control style to the perturbed style and predicted the gene expression of perturbed cells.

scPerb outperforms other benchmarking methods

To demonstrate the performance of scPerb, we compared it with existing methods, including scGen [23], CVAE [30], stGAN [22], and sc-WGAN [21]. Three datasets were used for benchmarking: two published human peripheral blood mononuclear cell (PBMC) datasets, i.e., the PBMC-Kang [24] and PBMC-Zheng [31] datasets, which were perturbed with interferon-β (IFN-β), and a dataset of intestinal epithelial cells infected by the parasitic helminth H.poly [25], i.e., the H.poly dataset.

Based on those three datasets, each method’s performance was evaluated using the $R^{2}$ between predictions and real perturbed data. Specifically, we randomly selected a cell type whose gene expression after perturbation was to be predicted, and used the rest of the cell types for model training. We repeated this process across all cell types and presented the average $R^{2}$ in Fig. 2a. In the PBMC-Zheng dataset [31], scPerb achieved an average $R^{2}$ score of $0.98$, which was better than the performance of the competitors, including scGen (average $R^{2}$ = $0.94$), CVAE (average $R^{2}$ = $0.93$), stGAN (average $R^{2}$ = $0.39$), and sc-WGAN (average $R^{2}$ = $0.10$). Surprisingly, the GAN-based methods performed much worse, as neither reached an $R^{2}$ value exceeding $0.5$. Meanwhile, in the PBMC-Kang dataset, scPerb achieved the highest average $R^{2}$ score of $0.98$, while the second-best and third-best approaches, scGen and CVAE, reached $0.96$ and $0.91$. Similarly, stGAN and sc-WGAN only had average $R^{2}$ scores of $0.42$ and $0.12$, respectively, in this dataset. Finally, we applied scPerb to the H.poly dataset and still obtained an average $R^{2}$ score of $0.96$, followed by scGen, CVAE, stGAN, and sc-WGAN with average $R^{2}$ scores of $0.95$, $0.93$, $0.58$, and $0.14$, respectively. When comparing results within a specific cell type, scPerb consistently outperformed the other benchmarking methods (Fig. 2b). For example, in the CD4-T cell type, one of the most numerous cell types in the PBMC-Zheng dataset, scPerb achieved an $R^{2}$ score of $0.99$, which was much better than scGen, CVAE, stGAN, and sc-WGAN ($R^{2}$ scores of $0.96$, $0.95$, $0.16$, and $0.09$, respectively).
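The hold-out protocol described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: `leave_one_cell_type_out`, `average_score`, and the `score_fn` callback are hypothetical names, and `score_fn` stands in for training a model on the remaining cell types and computing $R^{2}$ on the held-out one.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence


def leave_one_cell_type_out(
    cell_types: Sequence[str],
    score_fn: Callable[[List[str], str], float],
) -> Dict[str, float]:
    """Hold out each cell type in turn and score the prediction on it.

    score_fn(train_types, held_out) is a hypothetical callback that would
    train on `train_types` and return the R^2 obtained on `held_out`.
    """
    scores: Dict[str, float] = {}
    for held_out in cell_types:
        # Train on every cell type except the one being predicted.
        train_types = [ct for ct in cell_types if ct != held_out]
        scores[held_out] = score_fn(train_types, held_out)
    return scores


def average_score(scores: Dict[str, float]) -> float:
    """Average the per-cell-type scores into one value per dataset."""
    return mean(scores.values())
```

Keeping the scoring step behind a callback means the same loop can be reused for every benchmarked method.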

In addition, we evaluated the performance of scPerb and the other benchmarking methods across genes. In Fig. 2c, we illustrated the predictions of scPerb and the performance of three benchmarking methods in CD4-T cells from the PBMC-Zheng dataset. The scatter plot demonstrated that scPerb obtained an average $R^{2}$ score of $0.9905$ when we used all the genes in this cell type. The score rose to $0.9935$ when we only considered the top 100 DEGs. Under the same setting, scGen achieved an average $R^{2}$ score of $0.9605$ over all genes and $0.9963$ on the top 100 DEGs. scPerb outperformed CVAE (average $R^{2}$ score of all genes = $0.9472$, average $R^{2}$ score of top 100 DEGs = $0.9578$) and sc-WGAN (average $R^{2}$ score of all genes = $0.0924$, average $R^{2}$ score of top 100 DEGs = $0.7195$) on both evaluation criteria. Specifically, DEGs including IFIT1, IFIT3, IFI6, ISG20, and ISG15 showed the best performance.

In Fig. 2d, the distribution of IFIT2 in the control dataset differed greatly from its distribution in the perturbed dataset. Notably, the mean of scPerb’s prediction was close to the mean of the perturbed dataset. In contrast, the distributions of scGen’s and sc-WGAN’s predictions were comparable in shape to the ground truth but had means much lower than that of the ground truth. The predictions of CVAE fell somewhere between the control data and the perturbed data, meaning that it could not clearly learn the style difference between control data and perturbed data. Though the prediction of stGAN seemed to resemble the mean of the ground truth, the Wilcoxon test [32] resulted in a P value less than $0.05$, showing a significant difference between stGAN’s prediction distribution and the ground truth. For the other gene, FTL, as shown in Fig. 2e, the distribution in the control dataset resembled the distribution in the real perturbed dataset. In this scenario, most of the predictions from scPerb were close to the mean of the perturbed data, whereas the predictions from scGen and CVAE exhibited a much lower mean than the ground truth. Both GAN-based methods, stGAN and sc-WGAN, produced many outliers that deviated from the perturbed data. To further illustrate that our result was better than that of the benchmarks, we applied the Wilcoxon test to these results. Only scPerb resulted in an adjusted P value larger than $0.05$ for both genes ($0.176$ and $0.074$ for the FTL gene and the IFIT2 gene, respectively), which showed that the prediction of scPerb did not differ significantly from the ground truth. In contrast, all benchmarking methods resulted in P values less than $0.05$, showing a significant difference from the ground truth. To be more specific, scGen scored $6.3 \times 10^{-15}$ and $0.0033$ for the FTL gene and the IFIT2 gene, while CVAE scored $0.0307$ and $1.63 \times 10^{-9}$, stGAN scored $4.81 \times 10^{-109}$ and $3.14 \times 10^{-103}$, and sc-WGAN scored $2.01 \times 10^{-31}$ and $2.41 \times 10^{-10}$. Therefore, scPerb demonstrated performance superior to that of the other benchmarking methods.

scPerb predicts single-cell perturbation response accurately

In this section, we aimed to show that scPerb could accurately predict the single-cell perturbation responses for other cell types. Fig. 3a summarized the performance of scPerb over different cell types. In CD4-T, CD14 Mono, and FCGR3A Mono cells, scPerb achieved an average $R^{2}$ score of $0.99$ for both the top 100 DEGs and all genes. In Dendritic cells, the average $R^{2}$ score was $0.98$ for both gene sets. In B cells and NK cells, the performance on the top 100 DEGs was slightly better than the performance on all genes ($0.99$ vs. $0.98$ and $0.98$ vs. $0.97$, respectively). We also observed that in CD8-T cells, the performance on the top 100 DEGs was $0.94$, slightly lower than the performance on all genes (average $R^{2}$ score = $0.96$). In Fig. 3b, the dot plot demonstrated the correlation of representative genes among different cell types. For half of the selected genes, the dot plot showed a strong difference between the control gene expression and the real perturbed gene expression. For the other half, the gene patterns were similar in the control dataset and the perturbed dataset. In the green dashed rectangle, we highlighted the mean expression in the control, predicted, and real perturbed datasets. Fig. 3b implied that the mean gene expression of B cells, CD8-T cells, and Dendritic cells in the scPerb prediction was associated with the mean gene expression in the real perturbed dataset. The UMAP in Fig. 3c showed that the gene expression predicted by scPerb in CD4-T cells was correlated with the real perturbed gene expression in the latent space. The same consistency was observed for a specific gene, IFI6 (Fig. 3d).

scPerb accurately predicts the perturbation of cells in multiple PBMC datasets

scPerb had robust predictions of perturbed gene expressions in multiple datasets. In the PBMC-Kang dataset [24], scPerb outperformed all other methods (Fig. 4a), achieving a mean $R^{2}$ of $0.98$ across all cell types, followed by scGen ($R^{2}$ = $0.96$), CVAE ($R^{2}$ = $0.91$), stGAN ($R^{2}$ = $0.42$), and sc-WGAN ($R^{2}$ = $0.12$). Specifically, scPerb predicted the perturbed gene expressions in FCGR3A Mono cells with exceptional accuracy, achieving $R^{2}$ scores of $0.995$ for all genes and $0.998$ for the top 100 DEGs. In contrast, scGen produced $R^{2}$ values of $0.962$ and $0.954$, while sc-WGAN and stGAN yielded significantly lower $R^{2}$ scores (Fig. 4b).

**Result of scPerb in PBMC-Kang dataset. a**: This bar plot compared the $R^{2}$ values of all the methods within the PBMC-Kang dataset, while central values represented the mean $R^{2}$ values across all 7 cell types in the dataset; **b–c**: Comparing the distribution of all the methods in the *MT2A* gene in CD4-T cells in the PBMC-Kang dataset. Center values in Fig. 4c were the adjusted P values comparing the prediction of each method to the ground truth by using the Wilcoxon test; **d**: A dot plot comparing the mean gene expression of all 7 cell types and all 3 conditions in the PBMC-Kang dataset; **e**: The correlation of the mean expression of all 6998 genes in FCGR3A Mono cells. It compared predictions from three of the best benchmark methods and scPerb against the ground truth, with shaded lines representing the 95% confidence interval of the regression estimate.

For the MT2A gene, one of the top DEGs in FCGR3A Mono cells, scPerb provided predictions closely aligned with the ground truth, outperforming all other methods. The Wilcoxon test [34] further validated scPerb’s accuracy, with a P-value of $0.878$ , indicating no statistically significant difference between scPerb’s predictions and the real perturbed data. In contrast, scGen, CVAE, and both GAN-based methods resulted in P-values far below $0.0001$ , highlighting significant discrepancies in their predictions (Fig. 4c).

Moreover, scPerb provided robust predictions across various gene expression scenarios, whether the control gene expression was lower (e.g., IFIT1), comparable (e.g., RPL13A), or higher (e.g., FTH1) than the real perturbed gene expression (Fig. 4d). Notably, scPerb’s predictions correlated closely with the real data for the top 5 DEGs, as shown by the red dots in Fig. 4e. Overall, scPerb achieved higher $R^{2}$ values ( $0.995$ for all genes and $0.996$ for the top 100 DEGs) compared to all other benchmark methods, including scGen, CVAE, and sc-WGAN.

scPerb has robust results across different datasets

In the H.poly dataset [25], scPerb demonstrated superior performance with robust predictive accuracy. Across all cell types, scPerb achieved an average $R^{2}$ of $0.96$ , outperforming scGen ( $R^{2}$ = $0.95$ ) and CVAE ( $R^{2}$ = $0.93$ ), as well as the GAN-based methods stGAN ( $R^{2}$ = $0.38$ ) and sc-WGAN ( $R^{2}$ = $0.14$ ). The line plot in Fig. 5a highlights scPerb’s notable performance, especially in Tuft cells, where it attained an $R^{2}$ of $0.94$ . In contrast, other VAE-based methods performed worse, with scGen at $0.91$ and CVAE at $0.84$ . As shown in Fig. 5a, all VAE-based methods (scPerb, scGen, CVAE) consistently outperformed GAN-based models (stGAN, sc-WGAN) across most cell types.

scPerb also excelled in predicting gene expression in Enterocyte Progenitor cells. As illustrated in Fig. 5b, scPerb’s predictions (green dot) closely matched the real perturbed data (orange dot) compared to the unperturbed dataset (blue dot). In contrast, the predictions from other methods (Fig. 5c–f) were indistinguishable from either the unperturbed or real perturbed data, further emphasizing scPerb’s superior predictive capacity.

To further demonstrate scPerb's ability to predict perturbations, we added two large new datasets (GSE161195 and GSE161801) and a cross-study evaluation to assess the reproducibility of scPerb's effectiveness. scPerb outperformed scGen on both datasets (Supplement Fig. 1, Supplement Fig. 2) and in the cross-study evaluation (Supplement Fig. 3).

Materials and methods

Here we presented scPerb, a generative model to predict gene expression data after perturbation. We hypothesized the observations $X^{ctrl}$ and $X^{perb}$ from the control and perturbed datasets had two independent latent features: a cell type-related latent feature, denoted as “content” $c$ ; and a dataset-specific feature, denoted as “style” $s$ . scPerb learned the contents $Z_{c}^{ctrl}$ and $Z_{c}^{perb}$ of the cell types from both the control and perturbed datasets, where $c$ represented the content features of the cell types and transferred the style $Z_{s}^{ctrl}$ from the control dataset to the perturbed dataset $Z_{s}^{perb}$ , and $s$ represented the dataset styles (Fig. 1).

scPerb first translated the input data into a probability distribution in the latent space using an encoder. Specifically, it mapped the input data to a mean ($μ$) and a variance ($σ$) for each latent variable. We then projected the style vector $s$ into the latent space and learned the transformation from the control dataset $X^{ctrl}$ to the perturbed dataset $X^{perb}$; the learned difference between $X^{ctrl}$ and $X^{perb}$ was denoted as $σ_{s}$. Furthermore, we denoted $E_{θ}^{c}(\cdot)$ as the content encoder acquiring the cell-type-aware features, $E_{ϕ}^{s}(\cdot)$ as the style encoder projecting the random style vectors into the latent space, $E_{μ}^{c}(\cdot)$ and $E_{σ}^{c}(\cdot)$ as the $μ$ and $σ$ estimators for the probability distribution generated by the encoders, and $D_{ϕ}(\cdot)$ as the decoder generating the perturbed data from the latent variables $c$ and $s$. In the inference stage, given a specific cell type from the control dataset $X^{ctrl}$, scPerb would extract the cell type-related features $Z_{c}^{ctrl}$, generate the “fake” perturbed cell type ${\hat{X}}^{perb}$ based on $Z_{c}^{ctrl}$ and $σ_{s}$, and minimize the differences between $Z_{s}^{ctrl}$ and $Z_{s}^{perb}$.

Encoders

To extract common cell type content features, we projected both inputs $(X^{ctrl}, X^{perb})$ into the latent space. Following the VAE setting, we assumed the content features followed multivariate normal distributions $N(μ, σ)$, where $μ$ and $σ$ represented the mean and variance of the multivariate normal distribution. The latent representation $Z_{c}^{ctrl}$ of input data $X^{ctrl}$ was obtained from the learned distribution $N(μ^{ctrl}, σ^{ctrl})$:

$$Z_{c}^{ctrl} \sim N(μ^{ctrl}, σ^{ctrl})$$

where $μ^{ctrl} = E_{μ}^{c} (E_{θ}^{c} (X^{ctrl}))$ and $σ^{ctrl} = E_{σ}^{c} (E_{θ}^{c} (X^{ctrl}))$ .

Since the projection weights were shared between the two input datasets $X^{ctrl}$ and $X^{perb}$, the latent representation $Z_{c}^{perb}$ of input data $X^{perb}$ was obtained from $Z_{c}^{perb} \sim N(μ^{perb}, σ^{perb})$, where $μ^{perb} = E_{μ}^{c}(E_{θ}^{c}(X^{perb}))$ and $σ^{perb} = E_{σ}^{c}(E_{θ}^{c}(X^{perb}))$. Following the VAE setting, we used a KL loss to estimate $μ^{ctrl}$, $σ^{ctrl}$, $μ^{perb}$, and $σ^{perb}$:

$$\begin{aligned} KLLoss^{ctrl} &= KL(N(μ^{ctrl}, σ^{ctrl}), N(0, I)) \\ KLLoss^{perb} &= KL(N(μ^{perb}, σ^{perb}), N(0, I)) \end{aligned}$$

where KL divergence was calculated by:

$$KL(P, Q) = \sum_{x \in X} P(x) \log\left(\frac{P(x)}{Q(x)}\right)$$
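For a Gaussian posterior measured against $N(0, I)$, the KL term has a standard closed form, which is how VAE implementations usually compute it. The sketch below assumes a diagonal covariance and treats $σ$ as a per-dimension variance, as the text states; the function name `kl_to_standard_normal` is hypothetical and the source does not say which form scPerb uses.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float], var: Sequence[float]) -> float:
    """KL( N(mu, diag(var)) || N(0, I) ) for a diagonal Gaussian.

    Closed form: 0.5 * sum(var + mu^2 - 1 - ln(var)).
    Assumes every variance is strictly positive.
    """
    if len(mu) != len(var):
        raise ValueError("mu and var must have the same length")
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mu, var))
```

The value is zero exactly when the posterior already equals the standard normal, and grows as the mean moves away from zero or the variance away from one.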

In this work, our task was to generate the “fake” perturbed cell types from the same cell types in the control dataset. Therefore, instead of learning the dataset styles explicitly, we applied a lightweight network to learn the transformation $σ_{s}$ in the latent space. Our idea was inspired by style transfer learning [22], in which a style vector ($s$) is randomly sampled and projected into the latent space as the style. In scPerb, we applied a style encoder $E_{ϕ}^{s}(\cdot)$, which projects $s$ into the latent space as the transformation variable that converts $Z_{c}^{ctrl}$ to $Z_{c}^{perb}$:

$$\begin{aligned} σ_{s} &= E_{ϕ}^{s}(s) \\ {\hat{Z}}_{c}^{perb} &= Z_{c}^{ctrl} + σ_{s} \end{aligned}$$
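The two equations above amount to projecting the style vector and adding the result to the control latent code. A minimal sketch, assuming for illustration that the style encoder is a single linear layer (the source only calls it a lightweight network); `style_encoder`, `shift_latent`, and the weight and bias arguments are hypothetical names.

```python
from typing import List, Sequence


def style_encoder(
    s: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Hypothetical one-layer linear style encoder: sigma_s = W s + b."""
    if len(weights) != len(bias):
        raise ValueError("weights and bias must have the same number of rows")
    return [
        sum(w * x for w, x in zip(row, s)) + b
        for row, b in zip(weights, bias)
    ]


def shift_latent(z_ctrl: Sequence[float], sigma_s: Sequence[float]) -> List[float]:
    """Predicted perturbed latent code: z_hat_perb = z_ctrl + sigma_s."""
    if len(z_ctrl) != len(sigma_s):
        raise ValueError("z_ctrl and sigma_s must have the same length")
    return [z + d for z, d in zip(z_ctrl, sigma_s)]
```

Because the weights are learned rather than fixed, the shift can differ from a constant difference vector, which is the contrast the text draws with scGen.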

Therefore, we had the following $StyleLoss$ :

$$StyleLoss = SmoothL1Loss(Z_{c}^{perb}, {\hat{Z}}_{c}^{perb})$$

The $SmoothL1Loss$ was defined as:

$$SmoothL1Loss(x, y) = \begin{cases} \frac{(x - y)^{2}}{2β} & \text{if } |x - y| < β \\ |x - y| - 0.5β & \text{otherwise} \end{cases}$$
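The piecewise definition above translates directly into code. The sketch below is an illustration rather than the authors' implementation: the default `beta=1.0` and the averaging over elements are assumptions, since the source does not state the value of $β$ or the reduction used.

```python
from typing import Sequence


def smooth_l1_loss(x: Sequence[float], y: Sequence[float], beta: float = 1.0) -> float:
    """Mean Smooth L1 loss over paired elements of x and y.

    Quadratic for |x - y| < beta, linear otherwise, matching the
    piecewise definition in the text. beta must be positive.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    if beta <= 0:
        raise ValueError("beta must be positive")
    total = 0.0
    for a, b in zip(x, y):
        diff = abs(a - b)
        if diff < beta:
            total += diff * diff / (2.0 * beta)
        else:
            total += diff - 0.5 * beta
    return total / len(x)
```

The quadratic region keeps gradients small near zero error, while the linear region limits the influence of large residuals such as outlier genes.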

Decoder

In the decoder part, scPerb reparametrized the latent variable from the estimated posterior distribution $Z_{c}^{ctrl} \sim N (μ^{ctrl}, σ^{ctrl})$ and $Z_{c}^{perb} \sim N (μ^{perb}, σ^{perb})$ . Unlike the standard VAE, which directly reconstructed the output ${\hat{X}}^{perb}$ from the latent variable $Z_{c}^{ctrl}$ and $Z_{c}^{perb}$ , scPerb converted the representation of the control data $Z_{c}^{ctrl}$ to the latent representation ${\hat{Z}}_{c}^{perb}$ , and generated the predicted perturbed data from decoder $D_{ϕ}$ :

$${\hat{X}}^{perb} = D_{ϕ}({\hat{Z}}_{c}^{perb})$$
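The reparameterization mentioned above is conventionally written as $z = μ + \sqrt{σ} \cdot ε$ with $ε \sim N(0, I)$ when $σ$ denotes a variance. The sketch below illustrates that standard trick under this assumption; the function name `reparameterize` and the optional `rng` argument are hypothetical, and the source does not describe its sampling code.

```python
import math
import random
from typing import List, Optional, Sequence


def reparameterize(
    mu: Sequence[float],
    var: Sequence[float],
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Sample z = mu + sqrt(var) * eps with eps ~ N(0, 1) per dimension."""
    if len(mu) != len(var):
        raise ValueError("mu and var must have the same length")
    if rng is None:
        rng = random.Random()
    # The randomness lives entirely in eps, so z is a deterministic
    # function of (mu, var) once eps is drawn.
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]
```

Passing a seeded `random.Random` makes the draw reproducible, which is convenient when comparing runs.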

Because our task was to predict the perturbation of the cell types using the control dataset, rather than to generate samples from $Z_{c}^{perb}$ and $Z_{c}^{ctrl}$ as in the original VAE, we only used ${\hat{Z}}_{c}^{perb}$ to generate ${\hat{X}}^{perb}$. Therefore, our $GeneratedLoss$ was:

$$GeneratedLoss = SmoothL1Loss(X^{perb}, {\hat{X}}^{perb})$$

Loss function

The final objective function consisted of the $GeneratedLoss$, the $StyleLoss$, and the $KL$ regularization terms:

$$Loss = w_{1} StyleLoss + w_{2} KLLoss^{ctrl} + w_{3} KLLoss^{perb} + w_{4} GeneratedLoss$$
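As a small worked illustration of the weighted objective, the function below combines the four terms. The default weights of 1.0 are placeholders chosen for the example only; the source does not report the values of $w_{1}$ to $w_{4}$, and `total_loss` is a hypothetical name.

```python
from typing import Sequence


def total_loss(
    style_loss: float,
    kl_ctrl: float,
    kl_perb: float,
    generated_loss: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0),
) -> float:
    """Weighted sum w1*StyleLoss + w2*KL_ctrl + w3*KL_perb + w4*GeneratedLoss."""
    if len(weights) != 4:
        raise ValueError("weights must contain exactly four values")
    w1, w2, w3, w4 = weights
    return w1 * style_loss + w2 * kl_ctrl + w3 * kl_perb + w4 * generated_loss
```

In practice the KL weights are often kept small relative to the reconstruction term so that the regularizer does not dominate training; whether scPerb does so is not stated in the source.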

Datasets and preprocess

The PBMC-Zheng dataset was obtained from a study by Zheng et al. [31] that involved massively parallel digital transcriptional profiling of single cells using single-cell RNA sequencing (scRNA-seq). This dataset includes 18,868 peripheral blood mononuclear cells (PBMCs), consisting of 9925 cells perturbed with IFN-β and 8943 control cells. To ensure data quality, we first removed megakaryocyte cells, which had uncertain or ambiguous label assignments due to their small sample size and difficulty in classification. We then performed a log transformation on gene expression levels to stabilize the variance and make the training process smoother. For our analysis, we focused on the average gene expression profiles of the top 20 gene clusters, which contain 7000 genes. This dataset is split into training and testing sets. The training set can be obtained from https://www.dropbox.com/s/wk5zewf2g1oat69/train_pbmc.h5ad?dl=1 and the testing set can be obtained from https://www.dropbox.com/S/Nqi971n0tk4nbfj/valid_pbmc.h5ad?dl=1.

Kang et al. published a dataset of PBMCs [24], including both control and perturbed cells (also perturbed with IFN-β). We applied the same data preprocessing as for the PBMC-Zheng dataset: removing megakaryocyte cells, performing log transformation, and filtering the top 20 gene clusters (6998 genes in total). Both preprocessed PBMC datasets contain seven cell types: B cells, CD4-T cells, CD8-T cells, CD14 Mono cells, Dendritic cells, FCGR3A Mono cells, and NK cells. This dataset can be obtained under the accession number GSE96583.

Haber et al. presented a dataset capturing the responses of epithelial cells to infection by Salmonella and H.poly [25]. In this dataset, there were 3240 control cells, 2711 H.poly-infected cells, and the remaining 1770 Salmonella-infected cells. As with the PBMC datasets, we normalized and log-transformed the data and selected the top 7000 highly variable genes to enable a side-by-side comparison. This dataset can be obtained under the accession number GSE92332.

In our model, we performed further data preprocessing to ensure consistency between control and perturbed cells within each cell type. Specifically, we randomly selected an equal number of control cells and perturbed cells for each cell type in order to balance the dataset. This data preprocessing step helped us create a more robust and unbiased dataset, enabling accurate comparisons in each cell type. By doing such data processing, we guaranteed that each pair of $X^{ctrl}$ and $X^{perb}$ have the same cell type, so the following style transfer process would be valid.
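The balancing step described above can be sketched as follows. This is a minimal illustration under assumptions the source does not confirm: the larger group is downsampled to the size of the smaller one, cell types missing from either condition are skipped, and the names `balance_by_cell_type` and `seed` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Cell = Tuple[Hashable, str]  # (cell identifier, cell type label)


def balance_by_cell_type(
    ctrl: Sequence[Cell],
    perb: Sequence[Cell],
    seed: int = 42,
) -> Dict[str, Tuple[List[Hashable], List[Hashable]]]:
    """Return, per cell type, equally sized random samples of control
    and perturbed cell identifiers (assumed downsampling strategy)."""
    rng = random.Random(seed)

    def group(cells: Sequence[Cell]) -> Dict[str, List[Hashable]]:
        groups: Dict[str, List[Hashable]] = defaultdict(list)
        for cell_id, cell_type in cells:
            groups[cell_type].append(cell_id)
        return groups

    ctrl_groups, perb_groups = group(ctrl), group(perb)
    balanced = {}
    # Only cell types present in both conditions can be paired.
    for cell_type in sorted(set(ctrl_groups) & set(perb_groups)):
        n = min(len(ctrl_groups[cell_type]), len(perb_groups[cell_type]))
        balanced[cell_type] = (
            rng.sample(ctrl_groups[cell_type], n),
            rng.sample(perb_groups[cell_type], n),
        )
    return balanced
```

Sampling within each cell type, rather than across the whole dataset, is what guarantees that every control cell is paired with a perturbed cell of the same type.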

Statistics and reproducibility

In scPerb, we evaluated the performance of our model under a fixed seed of 42 by using the square of the R value ($R^{2}$), calculated through the scipy.stats.linregress function [35]. This metric evaluated the degree to which the predicted perturbed data and the real perturbed data were correlated. We computed the $R^{2}$ values for all genes’ mean and variance and for the top 100 differentially expressed genes (DEGs). To understand the model’s results visually, we created scatter plots comparing the predicted perturbed data to the corresponding ground truth data. These plots allowed us to observe how well the model’s predictions aligned with the actual values.
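For reference, the metric described above is the squared Pearson correlation between two vectors, for example the per-gene mean expression of predicted and real perturbed cells. The dependency-free sketch below computes the same quantity as squaring the `rvalue` returned by scipy.stats.linregress; the function name `r_squared` is hypothetical.

```python
from typing import Sequence


def r_squared(pred: Sequence[float], real: Sequence[float]) -> float:
    """Square of the Pearson correlation coefficient between pred and real."""
    n = len(pred)
    if n != len(real) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(pred) / n
    mean_r = sum(real) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, real))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in real)
    if var_p == 0 or var_r == 0:
        raise ValueError("R^2 is undefined when either input is constant")
    return cov * cov / (var_p * var_r)
```

Note that this quantity measures linear correlation only: predictions that are systematically scaled or shifted relative to the ground truth can still score close to 1, which is one reason the distribution-level comparisons are reported alongside it.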

Additionally, we used violin plots to examine the discrepancies between the predicted perturbed data and the real perturbed data for the top DEGs. DEGs are genes that exhibit statistically significant differences in expression levels between two or more conditions. In our case, the top DEGs are the genes with the most significant differences between control and perturbed conditions, identified using the Wilcoxon rank-sum test [34] as implemented in the scanpy.tl.rank_genes_groups function. Through these analyses, we aimed to assess the accuracy and performance of our scPerb model based on the input gene expression data. The $R^{2}$ values, together with the scatter and violin plots, provided insight into the model’s capabilities and highlighted discrepancies between the predicted and real perturbed data for further investigation.

Discussion

scPerb is a novel generative model that predicts gene expression after perturbation. The encoder of scPerb projects the gene expression of both control and perturbed data into the high-dimensional latent space. scPerb then aggregates these latent representations with the dataset-specific styles to generate a high-quality representation of the perturbed dataset. From this representation, the decoder of scPerb reconstructs the gene expression of the perturbed data. The experiments demonstrate that scPerb can capture the latent content features and generate dataset-specific styles across different cell types and conditions. Moreover, the quantitative evaluation showed that scPerb outperforms four existing methods across the cell types of three different datasets.

Compared with previous work [21], [22], [23], [30], scPerb is a data-driven algorithm that fully explores the gene expression in the raw dataset and does not rely on strong domain priors. In contrast, previous work extracts principal components and builds graph-based models on a low-dimensional manifold. Such methods rely heavily on expert domain knowledge and lack generalization capability. Compared with other data-driven algorithms, scPerb combines the stability of the VAE setting with the ability of GANs to generate high-quality samples.

However, minor problems still exist. For Endocrine cells, one of the rarest cell types in the H.poly dataset (163 of 5059 cells), scPerb predicts slightly worse than scGen [23]. Using $R^{2}$ as the criterion, scGen achieves $0.89$ while scPerb achieves only $0.87$. Because scGen computes only a fixed linear vector whereas scPerb learns a style transfer, scPerb can overfit when so few cells are available. Such cases are rare, however, and scPerb can still outperform other methods such as scGen when the data are small. For Tuft cells, another of the rarest cell types in the H.poly dataset (248 of 5059 cells), scPerb achieves an $R^{2}$ value of $0.94$ while scGen reaches only $0.91$.
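The "fixed linear vector" attributed to scGen above can be illustrated as plain latent-space arithmetic: one shift vector is estimated from the training cells and added to every control cell. This is a simplified sketch under stated assumptions, not code from scGen or scPerb; the encoder and decoder are omitted, the latent codes are passed in directly as lists, and the function names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length latent vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fixed_shift(z_ctrl_train, z_perb_train):
    """Single shift vector: mean perturbed latent minus mean control latent."""
    mean_ctrl = mean_vector(z_ctrl_train)
    mean_perb = mean_vector(z_perb_train)
    return [p - c for p, c in zip(mean_perb, mean_ctrl)]


def apply_shift(z_ctrl, delta):
    """Add the same shift to every control latent vector."""
    return [[value + d for value, d in zip(z, delta)] for z in z_ctrl]


# Toy latent codes (2 dimensions), invented for illustration only.
delta = fixed_shift([[0.0, 0.0], [2.0, 2.0]], [[3.0, 1.0], [5.0, 3.0]])
predicted = apply_shift([[1.0, 1.0]], delta)
```

Because the shift has only as many free values as the latent space has dimensions, it has little capacity to overfit a cell type with very few cells, which is consistent with the behavior on Endocrine cells discussed above.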

Recent advances in droplet microfluidics and microfluidic impedance cytometry [36], [37] provide data resources for perturbation studies. As more data are produced on these platforms, scPerb and other models can be evaluated for robustness and accuracy across diverse perturbation scenarios. This will not only enhance the reliability of scPerb's predictions but also expand its applicability to a wider range of biological contexts.

Code availability

scPerb is provided as a Python package available at https://github.com/QSong-github/scPerb, with detailed functions for implementation.

Compliance with Ethics Requirements

This article does not contain any studies with human or animal subjects.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Q.S. is supported by the National Institute of General Medical Sciences of the National Institutes of Health (R35GM151089).

Footnotes

Appendix A. Supplementary data to this article can be found online at https://doi.org/10.1016/j.jare.2024.10.035.

Appendix A. Supplementary data

The following are the Supplementary data to this article:

Supplementary Data 1

mmc1.docx (2.3 MB, docx)

References

1. Baron M., et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter-and intra-cell population structure. Cell Syst. 2016;3:346–360. doi: 10.1016/j.cels.2016.08.011.
2. Puram S.V., et al. Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell. 2017;171:1611–1624. doi: 10.1016/j.cell.2017.10.044.
3. Athanasiadis E.I., et al. Single-cell RNA-sequencing uncovers transcriptional states and fate decisions in haematopoiesis. Nat Commun. 2017;8:2045. doi: 10.1038/s41467-017-02305-6.
4. Azizi E., et al. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell. 2018;174:1293–1308. doi: 10.1016/j.cell.2018.05.060.
5. Cusanovich D.A., et al. A single-cell atlas of in vivo mammalian chromatin accessibility. Cell. 2018;174:1309–1324. doi: 10.1016/j.cell.2018.06.052.
6. Muraro M.J., et al. A single-cell transcriptome atlas of the human pancreas. Cell Syst. 2016;3:385–394. doi: 10.1016/j.cels.2016.09.002.
7. Iram T., Consortium T.M. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature. 2018;562:367–372. doi: 10.1038/s41586-018-0590-4.
8. Buenrostro J.D., et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell. 2018;173:1535–1548. doi: 10.1016/j.cell.2018.03.074.
9. Jagadeesh K.A., et al. Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat Genet. 2022;54:1479–1492. doi: 10.1038/s41588-022-01187-9.
10. Shao X., et al. scCATCH: automatic annotation on cell types of clusters from single-cell RNA sequencing data. Iscience. 2020;23. doi: 10.1016/j.isci.2020.100882.
11. Crow M., Paul A., Ballouz S., Huang Z.J., Gillis J. Characterizing the replicability of cell types defined by single cell RNA-sequencing data using MetaNeighbor. Nat Commun. 2018;9:884. doi: 10.1038/s41467-018-03282-0.
12. Wei J.-R., et al. Identification of visual cortex cell types and species differences using single-cell RNA sequencing. Nat Commun. 2022;13:6902. doi: 10.1038/s41467-022-34590-1.
13. Tasaki S., et al. Inferring protein expression changes from mRNA in Alzheimer’s dementia using deep neural networks. Nat Commun. 2022;13:655. doi: 10.1038/s41467-022-28280-1.
14. Denyer T., et al. Spatiotemporal developmental trajectories in the Arabidopsis root revealed using high-throughput single-cell RNA sequencing. Dev Cell. 2019;48:840–852. doi: 10.1016/j.devcel.2019.02.022.
15. Torre E., et al. Rare cell detection by single-cell RNA sequencing as guided by single-molecule RNA FISH. Cell Syst. 2018;6:171–179. doi: 10.1016/j.cels.2018.01.014.
16. Wu H., Kirita Y., Donnelly E.L., Humphreys B.D. Advantages of single-nucleus over single-cell RNA sequencing of adult kidney: rare cell types and novel cell states revealed in fibrosis. J Am Soc Nephrol. 2019;30:23. doi: 10.1681/ASN.2018090912.
17. Andrews T.S., Kiselev V.Y., McCarthy D., Hemberg M. Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data. Nat Protoc. 2021;16:1–9. doi: 10.1038/s41596-020-00409-w.
18. Chen G., Ning B., Shi T. Single-cell RNA-seq technologies and related computational data analysis. Front Genet. 2019;10:317. doi: 10.3389/fgene.2019.00317.
19. Goodfellow I., et al. Generative adversarial nets. Adv Neural Inf Proces Syst. 2014;27.
20. Kingma D.P., Welling M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114; 2013.
21. Ghahramani A., Watt F.M., Luscombe N.M. Generative adversarial networks uncover epidermal regulators and predict single cell perturbations. bioRxiv. 2018.
22. Karras T., Laine S., Aila T. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2019:4401–10.
23. Lotfollahi M., Wolf F.A., Theis F.J. scGen predicts single-cell perturbation responses. Nat Methods. 2019;16:715–721. doi: 10.1038/s41592-019-0494-8.
24. Kang H.M., et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat Biotechnol. 2018;36:89–94. doi: 10.1038/nbt.4042.
25. Haber A.L., et al. A single-cell survey of the small intestinal epithelium. Nature. 2017;551:333–339. doi: 10.1038/nature24489.
26. Hagai T., et al. Gene expression variability across cells and species shapes innate immunity. Nature. 2018;563:197–202. doi: 10.1038/s41586-018-0657-2.
27. Dixit A., et al. Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell. 2016;167:1853–1866. doi: 10.1016/j.cell.2016.11.038.
28. Adamson B., et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell. 2016;167:1867–1882. doi: 10.1016/j.cell.2016.11.048.
29. Datlinger P., et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat Methods. 2017;14:297–301. doi: 10.1038/nmeth.4177.
30. Cortes C., Lawrence N., Lee D., Sugiyama M., Garnett R. In: Proceedings of the 29th annual conference on neural information processing systems; 2015.
31. Zheng G.X., et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8:14049. doi: 10.1038/ncomms14049.
32. Cuzick J. A Wilcoxon-type test for trend. Stat Med. 1985;4:87–90. doi: 10.1002/sim.4780040112.
33. McInnes L., Healy J., Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426; 2018.
34. Wolf F.A., Angerer P., Theis F.J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19:1–5. doi: 10.1186/s13059-017-1382-0.
35. Virtanen P., et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17:261–272. doi: 10.1038/s41592-019-0686-2.
36. Jiang Z., Shi H., Tang X., Qin J. Recent advances in droplet microfluidics for single-cell analysis. TrAC Trends Anal Chem. 2023;159.
37. Zhu J., et al. Microfluidic impedance cytometry enabled one-step sample preparation for efficient single-cell mass spectrometry. Small. 2024;20. doi: 10.1002/smll.202310700.
