Summary
Single-cell sequencing reveals the heterogeneity of cellular response to chemical perturbations. However, testing all relevant combinations of cell types, chemicals, and doses is a daunting task. A deep generative learning formalism called variational autoencoders (VAEs) has been effective in predicting single-cell gene expression perturbations for single doses. Here, we introduce single-cell variational inference of dose-response (scVIDR), a VAE-based model that predicts both single-dose and multiple-dose cellular responses better than existing models. We show that scVIDR can predict dose-dependent gene expression across mouse hepatocytes, human blood cells, and cancer cell lines. We biologically interpret the latent space of scVIDR using a regression model and use scVIDR to order individual cells based on their sensitivity to chemical perturbation by assigning each cell a “pseudo-dose” value. We envision that scVIDR can help reduce the need for repeated animal testing across tissues, chemicals, and doses.
Keywords: deep learning, variational autoencoders, chemical perturbation, dose response, risk assessment, computational modeling, gene expression, single-cell RNA-seq, pharmacology, toxicology
Highlights
• Predicts chemical perturbations in gene expression across cell types
• Predicts response to multiple doses of a chemical
• Enables biological interpretation of model predictions
• “Pseudo-dose” metric evaluates cell-specific chemical sensitivity
The bigger picture
Cellular response to chemical perturbation is highly heterogeneous and dose dependent. It would be impossible to experimentally characterize the risks of chemical or drug exposure across all relevant combinations of cell types, chemicals, and doses. We introduce scVIDR, a computational method that utilizes recent advances in generative deep learning to address this challenge. Across a range of chemical exposure scenarios, we show that after training on available single-cell gene expression data, scVIDR can predict perturbations across untested cell types and doses. We envision that scVIDR will help reduce the need for repeated animal testing across tissues, chemicals, and doses.
Variational autoencoders can predict chemical perturbations across cell types using vector arithmetic. However, vector arithmetic alone cannot predict perturbations in single-cell gene expression accurately in animal studies across multiple doses. We utilize a regression-based method to improve on in vivo predictions by accounting for cell-type-specific differences in gene expression response. We then extend this model to predict the response to multiple doses of a chemical and derive a metric to characterize chemical sensitivity in individual cells.
Introduction
In 2010, Sydney Brenner suggested that it is possible to deduce the physiology of biological systems by understanding the interactions and behaviors of their constituent units.1 The appropriate unit, in his opinion, was the cell. Single-cell sequencing (scSeq) has revolutionized the study of cell biology. With the ability to capture the transcriptomic state of thousands of cells at once, a fine-grained picture of the organization of cell physiology has begun to emerge.2 Much of the effort in scSeq has been made in the realm of cell-type/-state discovery,3,4 cellular development,5,6,7,8 and disease progression.9,10 These represent natural applications of scSeq, especially regarding the spatial and temporal dynamics of cellular systems and their interactions. However, relatively little attention has been given to how cells respond to environmental signals like chemical exposures, which in addition to being spatial and temporal are also chemical and dose dependent.
Broadly, cells exhibit the ability to recognize and respond to external stimuli. This process is mediated by a coordinated set of extracellular and intracellular interactions that transduce resulting signals into cellular responses.11 These responses, as a function of dose, define dose-response curves.12 The dose-response curve is heavily dependent on the type of cell and its internal state.13,14 Thus, even cells of the same type can respond to the same exposure in a heterogeneous manner.15 scSeq provides a comprehensive measure of the transcriptome of a cell and captures the inherent variation among cells of the same type. This makes scSeq a useful tool in the study of chemical perturbations of biological systems.
However, a comprehensive cell atlas of chemical perturbations is impossible to assemble given the vast number of combinations of dose, exposure duration, and cell types.16 Recently developed resources like scPerturb17 and the multiplexed interrogation of gene expression through single-cell RNA sequencing (MIX-seq) protocol18 cover a meaningful but relatively small portion of this space. Algorithms that generalize chemical perturbations across cell state and dose can provide better estimates of the cartography of the chemical perturbation space. In this work, we use deep generative modeling to computationally predict cellular response across dose and cell types. We use a class of deep neural networks for dimensionality reduction called autoencoders. Specifically, we use a variational autoencoder19 (VAE), which relies on Bayesian priors to encode single-cell data into a latent distribution. VAEs have been used to model several technical aspects unique to single-cell data, including statistical confounders such as library size and batch effects20 and zero inflation.21
In perturbational single-cell biology, autoencoder models such as scGen22 have been able to predict the response of interferon-β (IFN-β)-treated peripheral blood mononuclear cells (PBMCs). However, for more complicated in vivo perturbations, existing models do not consider cell-type-specific effects in predicting the mean expression of differentially expressed genes (DEGs). Other autoencoder frameworks such as the compositional perturbational autoencoder (CPA)16 aim to deal with these issues by inferring the basal state from the data: covariates are modeled with different autoencoders, which are then iteratively composed when performing predictions for a particular set of conditions. While promising, CPA can only work with very large data samples (relative to other perturbational autoencoders), as the model needs to learn a latent space for each covariate. Thus, for confident prediction, CPA will need datasets that already have a great deal of the perturbational space mapped. Additionally, most perturbational autoencoder frameworks are uninterpretable in terms of the quantitative relationship between latent space and expression prediction, so it is difficult to ascertain which specific genes the model uses to predict differential gene expression after treatment. There is therefore a need for simpler models that better account for the complexity of in vivo experiments, that predict high doses from less data, and that provide more informative interpretations at the level of individual genes.
Here, we propose single-cell variational inference of dose-response (scVIDR), which builds on latent space vector arithmetic when using VAEs to study single-cell perturbations (Figure 1). scVIDR predicts cell-type-specific DEG expression and approximates high-dose experiments better than other state-of-the-art algorithms. We also use scVIDR to interpret the latent space using linear models to assess the pathways involved in the single-cell dose-response. We accomplish this across several datasets including the dose-response of liver cells to 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) in vivo,22,23 PBMCs treated with IFN-β,24 and a multiplexed dataset of 188 different drug combinations applied to three prominent cancer cell lines (sci-Plex25).
We use data from a single-nucleus dose-response experiment in livers from mice gavaged with TCDD as a case study for in vivo dose-response prediction.22,23 Hepatic responses to TCDD represent an interesting case study, as its canonical receptor, the aryl hydrocarbon receptor (AhR), is unevenly expressed along the hepatic lobule, the functional unit of the liver. AhR is more highly expressed in the centrilobular region compared with the portal region (Figure S1).26 Thus, not only does response to TCDD vary across different cell types in the liver, but it also varies within cell types (such as hepatocytes) along the portal to the central axis of the liver lobule.22,27 To model response variation between cell types, the latent space of the VAE is used to order hepatocytes with respect to their transcriptomic response to TCDD and thus align all hepatocytes along a “pseudo-dose” axis.
Results
scVIDR predicts single-dose, single-cell perturbation expression better than other state-of-the-art algorithms
According to the manifold hypothesis, high-dimensional data often lie on a lower-dimensional, latent manifold.28 For single-cell data, this is a reasonable assumption given that the expression of one gene is often highly dependent on the expression of other genes encoding transcription factors and is functionally constrained by the process of evolution.29 Further evidence of this can be seen in the extensive use and success of dimensionality reduction algorithms in the analysis of scSeq data.30 Lower-dimensional representations of single-cell data are at the heart of many single-cell gene expression analysis methods such as trajectory inference.31 One method of interest is modeling of the latent manifold using neural networks. These latent manifolds have been shown to simplify complex relationships in single-cell gene expression data.32,33,34 Specifically, simple vector arithmetic on such spaces can predict in vitro chemical perturbations with high accuracy.16,35 However, the accuracy of such models when predicting in vivo dose-responses is inconsistent.
We begin by considering a single-cell gene expression dataset $X = \{x_i\}_{i=1}^{N}$ consisting of $N$ cells, where $x_i$ represents the expression profile of cell $i$. We assume that gene expression is generated by some continuous random process involving a lower-dimensional random variable $z$. The generative process that describes the mapping from $z$ to $x$ is given by the probability distribution, $p(x \mid z)$. Thus, given that we know $x$ and not $z$, we would like to approximate the probability distribution that maps $x$ to $z$, $p(z \mid x)$. Since calculating $p(z \mid x)$ is usually intractable, we use a neural network, the encoder, to approximate it using a different Gaussian distribution, $q(z \mid x)$. To map values back from $z$ to $x$, we use a second neural network, the decoder, to approximate $p(x \mid z)$. In practice, both the encoder and decoder are trained together to minimize the reconstruction error of the decoder and the difference between the prior distribution and the encoder distribution.
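The training objective described in this paragraph corresponds to the standard evidence lower bound (ELBO) of a VAE. As a sketch in assumed notation (encoder parameters $\phi$, decoder parameters $\theta$, and a standard normal prior $p(z)$, none of which are fixed by the text above), the quantity maximized for a cell with expression profile $x$ can be written as:

$$
\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)
$$

The first term rewards accurate reconstruction by the decoder, and the second term penalizes divergence of the encoder distribution from the prior. Implementations commonly scale the second term by a Kullback-Leibler weight.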
We initially developed models for a single-dose chemical perturbation where we characterize whether a cell has been treated with a set concentration of the chemical of interest with the indicator variable $p$ (Figure 1A). We set $p = 1$ for cells that have been treated with the chemical (treatment) and $p = 0$ for cells that have not been treated (control). Our dataset contains cell types within both the $p = 0$ and $p = 1$ groups. Each time a model is evaluated, one treated cell type is withheld from training and used in evaluation. In standard VAE vector arithmetic (scGen), the latent space representation of the perturbation of some cell type $c$ is approximated by $\hat{z}_{c,p=1} = z_{c,p=0} + \delta$. Here, $z_{c,p=0}$ is the latent gene expression representation of cell type $c$,35 and $\delta$ is the difference between the centroids of the treated and control training groups in the latent space. When we compare the difference of centroids between the treated and control groups, $\delta_c$, of individual cell types with $\delta$, we see that cell-type-specific differences vary greatly in a principal-component analysis (PCA) projection (Figure S2A). Examination of the magnitudes (Figure S2B) and the directions of each cell type’s $\delta_c$ (Figure S2C) in high-dimensional space shows that $\delta_c$ diverges greatly from $\delta$. Hence, we calculate $\hat{\delta}_c$, a function of the mean latent representation of the control group of cell type $c$. We approximate this function by training a linear regression model with the other cell types on the latent space (experimental procedures) and show that $\hat{\delta}_c$ better matches the ground truth (Figure S2). It should be noted that when there is only one cell type available for training, for all practical purposes, scVIDR is equivalent to scGen (Figure S7).
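The contrast between a single global perturbation vector and a regression-based, cell-type-specific one can be illustrated with a minimal NumPy sketch. The function names (`global_delta`, `fit_delta_regression`, `predict_delta`) and the toy data are hypothetical, and ordinary least squares with an intercept is an assumed stand-in for the regression described in the experimental procedures.

```python
import numpy as np

def global_delta(z_ctrl, z_treat):
    """scGen-style vector: treated centroid minus control centroid."""
    return z_treat.mean(axis=0) - z_ctrl.mean(axis=0)

def fit_delta_regression(ctrl_means, deltas):
    """Least-squares map from control centroids to cell-type-specific vectors.

    ctrl_means: (n_cell_types, n_latent) control centroids of training cell types
    deltas:     (n_cell_types, n_latent) treated-minus-control centroids
    Returns a (n_latent + 1, n_latent) coefficient matrix; last row is the intercept.
    """
    design = np.hstack([ctrl_means, np.ones((ctrl_means.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, deltas, rcond=None)
    return coef

def predict_delta(coef, ctrl_mean):
    """Estimate the perturbation vector of a held-out cell type."""
    return np.append(ctrl_mean, 1.0) @ coef

# Toy data (assumption): the perturbation vector depends linearly on the control centroid.
rng = np.random.default_rng(0)
n_types, n_latent = 8, 3
true_map = rng.normal(size=(n_latent, n_latent))
ctrl_means = rng.normal(size=(n_types, n_latent))
deltas = ctrl_means @ true_map + 0.5

# Hold out the last cell type, fit on the rest, then predict its vector.
coef = fit_delta_regression(ctrl_means[:-1], deltas[:-1])
delta_hat = predict_delta(coef, ctrl_means[-1])
```

In this toy setting the held-out vector is recovered exactly because the data were generated from a linear map; with real latent representations the fit is only an approximation.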
We applied this model to the case of a single dose of TCDD administered to mice. Gene expression was measured with single-nucleus RNA-seq (snRNA-seq) originating from the mouse liver. We set $p = 0$ for unperturbed gene expression and $p = 1$ for gene expression perturbed by 30 μg/kg TCDD. The dataset covered 6 different liver cell types: cholangiocytes, endothelial cells, stellate cells, central hepatocytes, portal hepatocytes, and portal fibroblasts (Figure 2). Our training set (Figure 2A) consisted of all control and TCDD-treated cell types except for TCDD-treated portal hepatocytes, which were used for model evaluation. We compared the performance of scGen, scPreGAN,36 CellOT,37 and scVIDR (our method) on the top 5,000 highly variable genes (HVGs) and the top 100 DEGs. When predicting the gene expression of portal hepatocytes, each method generated a set of virtual portal hepatocytes (Figure 2B). We then computed the average expression of each gene across all cells and compared the average gene expression in predicted cells versus cells derived from snRNA-seq experiments. Across HVGs, the scVIDR model yielded an average $R^2$ of 0.92 (Figure 2C). Across DEGs, scVIDR produced an average $R^2$ of 0.81 (Figure 2C). Continuing the evaluation across all cell types (Figure 2D), leaving out one cell-type perturbation at a time as described above for portal hepatocytes, our model outperformed all other models (with p < 0.001, one-sided Mann-Whitney U test) when evaluated on both HVGs and DEGs.
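The comparison of average gene expression between predicted and measured cells can be sketched as follows. The function name `mean_expression_r2` is hypothetical, and treating the score as the squared Pearson correlation of per-gene means is an assumption rather than a confirmed detail of the evaluation.

```python
import numpy as np

def mean_expression_r2(pred_cells, real_cells):
    """Squared Pearson correlation between per-gene mean expression vectors.

    pred_cells, real_cells: (n_cells, n_genes) arrays; cell counts may differ.
    """
    pred_mean = pred_cells.mean(axis=0)
    real_mean = real_cells.mean(axis=0)
    r = np.corrcoef(pred_mean, real_mean)[0, 1]
    return r ** 2

# Toy check: a prediction that is an affine transform of the truth is perfectly correlated.
rng = np.random.default_rng(0)
real = rng.poisson(5.0, size=(50, 20)).astype(float)
pred = 2.0 * real + 1.0
r2_perfect = mean_expression_r2(pred, real)
```

Restricting the columns to a DEG subset before calling the function gives the DEG-level score.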
We had similar results for IFN-β-treated PBMCs (Figure S3).24 Here, $p = 1$ for PBMCs treated with IFN-β, and $p = 0$ for untreated PBMCs (Figure S3A). Across HVGs, the models yielded $R^2$ values of 0.97, 0.92, 0.77, and 0.66, and across DEGs, they yielded $R^2$ values of 0.96, 0.86, 0.80, and 0.84 for scVIDR, scGen, scPreGAN, and CellOT, respectively (Figure S3C). When accuracy was assessed for all cell types, scVIDR significantly outperformed all other models (Figure S3D).
To test if scVIDR can perform out-of-distribution predictions robust to experimental batch effects and diverse genetic backgrounds, we tested scVIDR on two additional experiments. In the first experiment, we recapitulate results from Lotfollahi et al.,35 in which we predict perturbations across studies (in this case, we look at IFN-β perturbation of PBMCs from Kang et al.24 and try to predict it in PBMCs from Zheng et al.38). We show that scVIDR can predict biologically plausible perturbations across studies (Figure S8). In the second experiment, we show that scVIDR can better predict LPS6 perturbation in rats ( for HVGs) using perturbations from other species (pig, rabbit, and mouse)39 than scGen ( for HVGs), scPreGAN ( for HVGs), and CellOT ( for HVGs) (Figure S9). In both experiments, we show that scVIDR can be used to predict perturbations not only across cell types but also across multiple perturbation studies and models.
scVIDR accurately predicts the transcriptomic response for multiple doses across cell types
Next, we predicted the response for multiple doses of TCDD (Figure 1B). Here, $p$ is equal to the magnitude of the perturbation, which in our case is equivalent to the dose. Thus, $p = 0$ represents expression at dose 0, and $p = 30$ represents expression at dose 30, where the dose is in units of μg/kg in Figure 3 and of nM in Figure S4. As with the single-dose case, we train the model on the dose-response data for all cell types except one, for which only the $p = 0$ condition is kept. We calculate the $\hat{\delta}_c$ (experimental procedures; Figure 3A), which is the estimated difference of means between the highest dose and the untreated groups. For scVIDR, intermediate doses are then calculated on the latent space by interpolating log linearly on the $\hat{\delta}_c$. For scGen, we log linearly interpolate on $\delta$ (experimental procedures). Finally, those latent space representations are decoded back into gene expression space using the decoder portion of each of the models.
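The interpolation step can be illustrated with a short sketch. The exact log-linear scheme is given in the experimental procedures; the version below assumes a weight of log(1 + dose) / log(1 + highest dose), chosen only because it maps dose 0 to the control and the highest dose to the full perturbation vector. The function names and toy vectors are hypothetical.

```python
import numpy as np

def dose_weights(doses):
    """Assumed log-linear weights: 0 at dose 0 and 1 at the highest dose."""
    doses = np.asarray(doses, dtype=float)
    return np.log1p(doses) / np.log1p(doses.max())

def interpolate_latent(z_ctrl_mean, delta_hat, doses):
    """Latent centroid for each dose along the perturbation vector.

    Returns an array of shape (n_doses, n_latent).
    """
    w = dose_weights(doses)
    return z_ctrl_mean[None, :] + w[:, None] * delta_hat[None, :]

# Doses from the TCDD experiment, including the control.
doses = [0, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10, 30]
z0 = np.zeros(4)                              # toy control centroid
delta_hat = np.array([1.0, -2.0, 0.5, 0.0])   # toy perturbation vector
z_doses = interpolate_latent(z0, delta_hat, doses)
```

Each row of `z_doses` would then be passed through the decoder to obtain predicted expression at that dose.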
We analyzed a mouse liver snRNA-seq dataset that included 8 doses (p = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10, 30]) of TCDD and a control (p = 0) in μg/kg (Figure 3). scVIDR outperforms scGen in approximating expression across the dose-response of TCDD in mouse liver. We used the mean $R^2$ score across all evaluated genes as our performance metric (Figure 3B). scVIDR significantly outperformed scGen at predicting HVGs and DEGs for doses >0.3 μg/kg (one-sided Mann-Whitney U test, p < 0.001). scVIDR predicts the important TCDD receptor repressor gene, Ahrr, at doses 1, 3, and 10 μg/kg in portal hepatocytes better than scGen (Figure 3C). When predicting all other cell types (cholangiocytes, endothelial cells, stellate cells, central hepatocytes, portal hepatocytes, and portal fibroblasts), scVIDR significantly outperformed scGen only at the highest doses of 10 and 30 μg/kg on prediction of all HVGs (Figure 3D). When predicting on just the DEGs, scVIDR significantly outperformed scGen for doses >0.3 μg/kg (Figure 3E).
We used scVIDR to predict the effects of a test set of 37 drugs out of 188 treatments in the sci-Plex dose-response data25 at 24 h for A549 cells (Figure S4A). scVIDR was trained on all data (all drugs and doses) in K562 and MCF7 cells. The model was also trained on the remaining 151 drugs in A549 cells not used in validation, as well as the vehicle data for the 37 drugs in the test set (Figure S4A). The dose-response for the 37 drugs was predicted as above by first calculating the $\hat{\delta}_c$ between the control and the highest dose for a particular drug and log linearly interpolating along the $\hat{\delta}_c$ in order to predict the intermediate doses. We evaluated predictions made by scVIDR at the gene, drug, and drug pathway levels. For the drug belinostat, a histone deacetylase inhibitor, scVIDR improves on predictions of DEGs such as MALAT1 relative to scGen (Figure S4B). When predicting gene expression of the DEGs in belinostat-treated A549 cells, scVIDR also significantly outperformed scGen on all doses (Figure S4C). On predicting the DEGs of all drugs with the same mode of action as belinostat (epigenetics), scVIDR similarly outperformed scGen on all doses (Figure S4D). Finally, when looking across all 37 drugs in the test dataset, we were able to predict the expression of DEGs significantly better than scGen on average for the 3 highest doses of 100, 1,000, and 10,000 nM (Figure S4E).
Regression on the latent space infers the relationship between predicted gene expression and $\hat{\delta}_c$
Insight into model decisions can provide information regarding proper model usage and pitfalls. It would be useful to identify which genes and pathways are associated with scVIDR’s prediction; however, standard VAEs do not have a linear map from the latent space to the gene expression and thus are hard to interpret. To interpret the predictions of scVIDR, we approximate the function of the decoder with linear regression (experimental procedures). We take inspiration from the use of PCA in scSeq40 and the development of linearly decoded VAEs (LDVAEs).41 PCA is a linear transformation that projects the data onto a lower-dimensional (latent) space while retaining as much variance as possible. This transformation is represented by a linear weight matrix, $W$, with dimensions $m \times n$ where $m$ is the number of latent variables and $n$ is the number of genes. We can understand each principal component as a linear combination of genes. This allows us to assess the relationship between genes and a direction in latent space.
In a VAE, the mapping from the latent space to the gene space is done by the decoder that, unlike the inverse of PCA, is non-linear. In LDVAEs, however, the decoder portion of the VAE is a linear regression layer, and thus the weight matrix of this layer, $W$, describes a linear relationship between direction in the latent space and gene prediction.41
However, interpretability comes at the expense of model accuracy. LDVAEs have higher reconstruction error than standard VAEs on single-cell data.41 Similarly, using PCA and vector arithmetic to predict scSeq perturbations performed poorly compared to scGen.35 As a result, one would like to try to interpret the latent space of a standard VAE. We present an approach to interpret the VAE’s latent space using sparse regression.
We take an alternative approach to LDVAEs in which we instead approximate the non-linear function of the decoder in a standard VAE using sparse linear regression (Figure 4A). Sparse regression methods like local interpretable model-agnostic explanations (LIME) have been used to interpret complex models.42 We specifically use sparse linear ridge regression, given that each gene has a non-zero contribution to each latent variable and that gene weights are distributed parsimoniously. This gives us a linear transformation matrix, $\tilde{W}$, that approximates the function of the decoder.
We use this weight matrix to interrogate the relationship between predicted gene expression and $\hat{\delta}_c$. The span of $\hat{\delta}_c$ is simply a direction in scVIDR’s latent space. The importance of $\hat{\delta}_c$ to each gene’s predicted expression is the sum of the latent dimensional components of $\hat{\delta}_c$ multiplied by the gene’s corresponding latent dimensional weight from $\tilde{W}$. In matrix form,
$$s = \tilde{W}^{\top} \hat{\delta}_c, \qquad s_g = \sum_{k} \tilde{W}_{kg} \, (\hat{\delta}_c)_k$$
where $s_g$ is the score of gene $g$ and $k$ indexes the latent dimensions.
In practice, we found that normalizing the weight matrix by its L2 norm gives better insights when interpreting the model (experimental procedures). Gene scores represent how strongly changes in latent space dimensions will impact the decoded transcriptomic response when we interpolate on the span of $\hat{\delta}_c$ on the latent space. Thus, genes with higher scores will be predicted to have bigger changes when we increase the dose of our prediction by scVIDR.
We utilize a trained scVIDR model where portal hepatocytes were left out of training and the $\hat{\delta}_c$ was approximated (Figures 4B–4D). Gene scores for $\hat{\delta}_c$ were calculated as described above. The genes with the top 20 highest-magnitude gene scores included well-established markers of TCDD-induced hepatotoxicity such as genes from the cytochrome P450 family (Figure 4B).26 To see whether this relationship extended to pathways involved in TCDD-induced hepatotoxicity, we performed Enrichr analysis38 using the 2019 WikiPathways database43 on genes with the top 100 gene scores (Figure 4C). Among the top enriched terms, we found the hallmarks of hepatic response to TCDD in mice, such as oxidation by cytochrome P450,44 fatty acid omega oxidation,45 and tryptophan metabolism.46 To derive the relationship between the actual doses and the gene pathways, the genes with the top 100 gene scores that were in “fatty acid oxidation” from WikiPathways were used in calculating enrichment scores for each cell using Scanpy.47 A sigmoid function was fit to the median enrichment score in each dose (experimental procedures). We observed a small mean absolute error in our model and thus concluded that there was a sigmoidal dose-response relationship for the gene set generated by Enrichr (Figures 4D and 4E).
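The gene-scoring idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the decoder is approximated by closed-form ridge regression without an intercept, the weight matrix is stored as latent dimensions by genes, and normalization uses the Frobenius (matrix L2) norm. The function names and the toy linear "decoder" are hypothetical.

```python
import numpy as np

def fit_ridge_decoder(z, x, alpha=1.0):
    """Closed-form ridge fit of expression x (cells x genes) on latent z (cells x latent).

    Returns a weight matrix of shape (n_latent, n_genes).
    """
    n_latent = z.shape[1]
    return np.linalg.solve(z.T @ z + alpha * np.eye(n_latent), z.T @ x)

def gene_scores(weights, delta):
    """Score each gene by the weighted sum of the perturbation vector's components."""
    w_norm = weights / np.linalg.norm(weights)  # Frobenius norm (assumption)
    return w_norm.T @ delta

# Toy data: a truly linear decoder, so the ridge fit should recover its weights.
rng = np.random.default_rng(1)
z = rng.normal(size=(200, 4))
w_true = rng.normal(size=(4, 6))
x = z @ w_true

w_fit = fit_ridge_decoder(z, x, alpha=1e-8)
delta = np.array([1.0, 0.0, 0.0, 0.0])  # perturbation along the first latent axis
scores = gene_scores(w_fit, delta)
top_gene = int(np.argmax(np.abs(scores)))
```

With a perturbation along a single latent axis, the highest-scoring gene is simply the one with the largest weight on that axis, which makes the meaning of the score easy to check by hand.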
Pseudo-dose captures zonation in TCDD hepatocyte response
In single-cell analysis of developmental trajectories, it is useful to order cells with respect to a latent time course, termed “pseudo-time.” This is because cells develop at different rates due to natural variations among themselves and their environment. This ordering is usually done using algorithms such as Slingshot48 and Monocle.49 In pharmacology and toxicology, we experience a similar problem, as cells of the same type have variable sensitivities to the same toxicant. Hence, we propose to order cells in terms of a latent dose. We call this ordering of cells a “pseudo-dose.”
Working off the assumption that $\delta_c$ (experimental procedures) is the axis of perturbation in latent space, we orthogonally project the latent representation of each cell onto the span of $\delta_c$ to obtain a scalar coefficient for each cell along $\delta_c$ (Figures 5A and 5B). We use this scalar coefficient as the pseudo-dose value for each cell.
To test whether these pseudo-dose values capture the latent response across cell types, we distinguished between the portal and central regions of the liver lobule. Zonation of the lobule not only defines differences in hepatocyte gene expression along the portal to the central axis but also defines their metabolic characteristics.50 Thus, we expect that the two zones will exhibit different sensitivities to TCDD. The pseudo-dose correlated well with the actual dose administered to the hepatocytes with an (Figure 5C). We also found that the pseudo-dose displayed a sigmoidal relationship (experimental procedures) with the expression of DEGs such as Fmo3 (Figure 5D). Finally, we found the pseudo-dose to be significantly higher on average in the central hepatocytes versus the portal hepatocytes (Figures 5E and 5F). This is consistent with liver biology, given that central hepatocytes respond more strongly to treatment due to TCDD sequestration51 and higher AhR expression levels in the centrilobular zone.26
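The orthogonal projection that yields a pseudo-dose can be written in one line. In this sketch, the function name `pseudo_dose` is hypothetical, and projecting relative to the latent origin (rather than, say, the control centroid) is an assumption.

```python
import numpy as np

def pseudo_dose(z, axis_vec):
    """Scalar coefficient of each cell's orthogonal projection onto span(axis_vec).

    z: (n_cells, n_latent) latent representations
    axis_vec: (n_latent,) perturbation axis
    """
    return (z @ axis_vec) / (axis_vec @ axis_vec)

# Toy example in two latent dimensions.
axis_vec = np.array([3.0, 4.0])
orthogonal = np.array([-4.0, 3.0])          # perpendicular to the axis
cells = np.stack([
    np.zeros(2),                            # sits at the origin
    axis_vec + 2.0 * orthogonal,            # one step along the axis, plus off-axis noise
    2.0 * axis_vec,                         # two steps along the axis
])
pd_values = pseudo_dose(cells, axis_vec)
```

Off-axis variation does not change the coefficient, so cells are ordered only by how far they have moved along the perturbation direction.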
Discussion
Mapping the combinatorial space of single-cell perturbation is important to toxicology and pharmacology to facilitate the generalization of drug or toxicant effects across several domains. Computational modeling allows researchers to use current large-scale databases to predict new perturbations to scSeq data. We have demonstrated an improvement to such modeling using VAEs with regression. These improvements include highly correlated prediction of cell-type-specific effects in mouse liver, PBMCs, and A549 cells. We also modeled a latent response for mouse hepatocytes using pseudo-dose and interrogated the VAE to predict dose-dependent perturbations in portal hepatocyte pathways. We show that deep generative modeling can be used to model complex perturbations in single-cell gene expression data from several different datasets.
Model limitations
When evaluating the model in the mouse liver, scVIDR performed better on the cell types most sensitive to TCDD, e.g., hepatocytes and endothelial cells (Figures S5A, S5C, and S5D). For cell types less sensitive to TCDD, the model often underestimated the expression of DEGs (Figure S5E). This is likely a result of a combination of factors including the similarity of the treatment to the control data (Figure S5A), the smaller control cell populations (Figure S5B), and the overall low expression of HVGs (Figure S5E). Thus, we believe that the VAE has less information to predict differential gene expression for these cell types. Our model improves on this problem with respect to scGen for most cell types in the liver (except for stellate cells and cholangiocytes at higher doses). Results from sci-Plex imply that incorporating scSeq data from livers treated with other compounds could improve these predictions, as the model would have more information on different liver responses.
In the sci-Plex dataset, drugs with epigenetic modes of action produced the poorest prediction scores (Figure S6). This is because scSeq data provide no information regarding epigenetic modifications (e.g., chromatin accessibility, histone marks, and DNA-binding proteins). Integration with epigenetic data such as single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq) could help to predict such responses with higher accuracy.
While scVIDR and its pseudo-dose metric work on standard dose-response scenarios, they remain untested for use with more complex cellular trajectories such as those found in development and circadian rhythms.52 Such trajectories include branching and cycling, which involve non-linear dynamics, and may require more sophisticated models to properly capture their topology. Algorithms such as CellOT37 can represent complex distributional shifts along latent dimensions; however, they are still only developed for single-perturbation measurements and extrapolate poorly to larger perturbations.
Future directions
When looking to the future of generative modeling in chemical-induced perturbation of gene expression, a problem domain of interest is time-dependent drug effects. Chemical exposures are not only a function of concentration but also of time.53 Dose-time-response analysis is central to risk assessment in clinical settings.54 Predicting the response not only as a function of amount of drug but also as a function of the time the drug is within a patient’s system and the time of day at which the drug was administered would allow for more effective and safer dosing regimens.54,55
Developmental state can also be impacted by chemical perturbation. An example of this is the inhibition of B cell lymphopoiesis by TCDD.56 The latent space could be useful for analyzing a simplified model of the dynamics of developmental systems and how they change with chemical perturbation. PCA for dimensionality reduction has been used in this area for successful cellular fate prediction during hematopoiesis.57
Conclusions
Taken together, our tool facilitates dose-response predictions for a particular drug in a specific cell type using the response of other cell types. Dose-response modeling is important in the realm of drug development and toxicity testing, as the physiological response to chemical perturbation is dose dependent. We envision the use of scVIDR in optimizing dose-response studies during drug discovery and development. scVIDR enables prediction of chemical response in a wide array of cell types and doses using only the control and the highest doses of previous experiments. As more data become available on single-cell chemical perturbations, generative modeling can yield insights into the underlying manifold of gene expression and how different classes of chemicals act on that manifold. Discovery of the properties of the manifold will allow for generalizations to be made about the physiology of tissues and understudied chemical perturbations.
Experimental procedures
Resource availability
Lead contact
The lead contact for this work is Sudin Bhattacharya (sbhattac@msu.edu).
Materials availability
The study did not generate new unique materials or reagents.
Single-cell expression datasets and preprocessing
Nault et al.23 performed all TCDD liver dose-response experiments, which were deposited in the Gene Expression Omnibus (GEO)59 under the accession number GSE184506. Kang et al.24 performed all IFN-β PBMC experiments, which were deposited in GEO under the accession number GSE96583. Zheng et al.38 performed all experiments relating to study B, which were deposited in the Sequence Read Archive60 under accession number SRP073767. Hagai et al.39 performed all LPS6 species experiments, which were deposited in BioStudies under accession number E-MTAB-5919.61
The sci-Plex dataset25 and the TCDD dose-response dataset23 were collected and processed uniformly from raw count expression matrices. Each cell’s expression vector is normalized so that its total count equals the median total count across cells. The counts are then log transformed with a pseudo-count of 1. Finally, we select the top 5,000 HVGs on which to do our analysis. The preprocessing was carried out with the scanpy.pp module using the normalize_total, log1p, and highly_variable_genes functions.47
The TCDD dose-response dataset comprises snRNA-seq of flash-frozen livers from C57BL/6 mice. Mice in this dataset were subchronically administered a specified dose of TCDD via oral gavage every 4 days for 28 days. In our analysis, all immune cell types were left out, as immune cells are known to migrate from the lymph to the liver during TCDD administration.22 Thus, the immune cell populations are small in the low-dose datasets relative to the higher doses. PBMC data from Kang et al.,24 study B data from Zheng et al.,38 and species data from Hagai et al.39 were accessed as a processed dataset from Lotfollahi et al.35
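The three preprocessing steps can be illustrated on a dense toy matrix. This NumPy sketch is not the Scanpy pipeline itself: it assumes every cell has at least one count, and it ranks genes by plain variance of the log-transformed values, which is a simplification swapped in for Scanpy's HVG selection. The function name `preprocess_counts` is hypothetical.

```python
import numpy as np

def preprocess_counts(counts, n_top_genes):
    """Median-total normalization, log1p, and variance-based gene selection.

    counts: (n_cells, n_genes) raw counts; every cell must have a non-zero total.
    Returns the processed matrix restricted to the kept genes, and their indices.
    """
    totals = counts.sum(axis=1, keepdims=True)
    target = np.median(totals)                  # per-cell totals are scaled to this value
    logged = np.log1p(counts / totals * target) # log transform with pseudo-count 1
    ranked = np.argsort(logged.var(axis=0))[::-1]
    kept = np.sort(ranked[:n_top_genes])        # keep original gene order
    return logged[:, kept], kept

# Toy matrix: 3 cells x 3 genes.
counts = np.array([
    [1.0, 0.0, 3.0],
    [2.0, 0.0, 6.0],
    [4.0, 4.0, 0.0],
])
logged_hvg, kept = preprocess_counts(counts, n_top_genes=2)
```

In the toy matrix, the first gene varies least after normalization and is dropped, leaving the second and third genes.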
When training scGen and scVIDR, batch effects are accounted for with the scvi.data package using the setup_anndata function. Differential abundances of cells in different groups are accounted for by random sampling with replacement of the same number of cells for each dose and random sampling without replacement of the same number of cells for each cell type.
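The two resampling schemes can be sketched with a single helper. This is a minimal illustration on made-up labels, not the scVIDR implementation: the function name `sample_per_group` and the group sizes are hypothetical, and how the two sampling passes are combined in the actual pipeline is not specified here, so they are shown separately.

```python
import numpy as np

def sample_per_group(labels, n_per_group, replace, seed=0):
    """Draw the same number of row indices from every group in `labels`."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = [
        rng.choice(np.flatnonzero(labels == g), size=n_per_group, replace=replace)
        for g in np.unique(labels)
    ]
    return np.concatenate(picked)

# Toy annotations for 10 cells.
doses = np.array([0] * 6 + [10] * 3 + [30])
cell_types = np.array(["hep"] * 5 + ["chol"] * 3 + ["stel"] * 2)

# Equal cells per dose, with replacement (rare doses are upsampled).
dose_idx = sample_per_group(doses, n_per_group=4, replace=True)
# Equal cells per cell type, without replacement (abundant types are downsampled).
type_idx = sample_per_group(cell_types, n_per_group=2, replace=False)
```

Sampling with replacement allows a dose with a single cell to contribute as many rows as any other dose, whereas sampling without replacement is capped by the size of the rarest cell type.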
Implementation and training of models
All code in this manuscript is implemented in the Python programming language. The scVIDR model is built on the Python package scGen v.2.0.0,35 which in turn is built on the Python package scVI v.0.13.0.20 Here, we modify the model to accommodate predictions of the dose-response, linear regression on the latent space, pseudo-dose calculations, and approximations of the gene importance in chemical perturbations.
Hyperparameters for the model and training are the default values selected by scGen v.2.0.0. Table 1 outlines the model hyperparameters used in deploying scVIDR and scGen. Table 2 outlines the training hyperparameters when deploying scVIDR and scGen.
Table 1.
Hyperparameter | Value |
---|---|
Latent dimension | 100 |
Number of layers | 2 |
Layer width | 800 |
Dropout rate | 0.2 |
Kullback-Leibler weight | |
Table 2.
Hyperparameter | Value |
---|---|
Training epochs | 100 |
Learning rate | 0.001 |
Learning rate decay | |
Optimizer | Adam |
Optimizer epsilon | 0.01 |
Early stopping | true |
Early stopping patience | 25 |
Our implementation of CellOT37 and scPreGAN36 uses default parameters from both of their respective publications.
Calculation of the $\delta$ for single- and multiple-dose predictions
The $\delta$, as defined by Lotfollahi et al.,35 is the difference between the mean latent representations of the treated (t = 1) and untreated (t = 0) conditions:
$$\delta = \bar{z}_{t=1} - \bar{z}_{t=0}$$
where $\bar{z}_{t=i}$ is the mean latent representation for treatment $i$ in the dataset.
We can calculate a cell-type-specific $\delta_c$ for some cell type, $c$, by taking the difference between the mean latent representations of the treated and control groups, or
$$\delta_c = \bar{z}_{c,t=1} - \bar{z}_{c,t=0}$$
If we want to estimate a $\delta_{c'}$ for some cell type $c'$ based on $\bar{z}_{c',t=0}$, where $\bar{z}_{c',t=1}$ is unknown, we can approximate a function based on $\bar{z}_{c',t=0}$, or
$$\hat{\delta}_{c'} = f(\bar{z}_{c',t=0})$$
where we approximate the above function using all other existing cell types in the dataset as input to ordinary least-squares regression as implemented by the LinearRegression class in the sklearn.linear_model package.62
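The regression step can be sketched as follows, using synthetic latent means in place of a trained model. The helper name `predict_delta` and the toy dimensions are assumptions made for illustration; only the use of `LinearRegression` comes from the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def predict_delta(ctrl_means, deltas, ctrl_mean_new):
    """Fit delta_c = f(mean control latent) by OLS and apply it to a new cell type.

    ctrl_means: (n_cell_types x latent_dim) mean control latents, training cell types.
    deltas: (n_cell_types x latent_dim) observed cell-type-specific deltas.
    ctrl_mean_new: (latent_dim,) mean control latent of the held-out cell type.
    """
    model = LinearRegression()
    model.fit(ctrl_means, deltas)
    return model.predict(np.asarray(ctrl_mean_new).reshape(1, -1))[0]

# Synthetic example in which delta is an exactly linear function of the control mean.
rng = np.random.default_rng(0)
latent_dim, n_types = 3, 8
true_map = rng.normal(size=(latent_dim, latent_dim))
ctrl_means = rng.normal(size=(n_types, latent_dim))
deltas = ctrl_means @ true_map + 0.5
new_ctrl = rng.normal(size=latent_dim)
delta_hat = predict_delta(ctrl_means, deltas, new_ctrl)
```

Each training cell type contributes one row, so the number of cell types with both control and treated cells bounds how well the map can be determined.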
Predictions of dose-response in the latent space in scVIDR and scGen
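To predict the latent representation for a response at some dose, $d$, we interpolate log-linearly on the estimated $\hat{\delta}_{c'}$ such that for each latent cell in our prediction, $\hat{z}_{i,d}$:
$$\hat{z}_{i,d} = z_{i,0} + \hat{\delta}_{c'}\,\frac{\log(d + 1)}{\log(d_{\max} + 1)}$$
where $z_{i,0}$ is the latent representation of control cell $i$ and $d_{\max}$ is the highest dose in the dataset. To calculate the dose-response values for scGen, we simply replace $\hat{\delta}_{c'}$ with the $\delta$ calculated by scGen.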
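A minimal sketch of the interpolation, assuming the dose weight takes the form log(d + 1) / log(d_max + 1) so that the control dose of 0 receives weight 0 and the highest dose receives weight 1. The function name `interpolate_dose` and the toy values are hypothetical.

```python
import numpy as np

def interpolate_dose(z_control, delta, dose, max_dose):
    """Shift control latents along delta by a log-scaled fraction of the dose."""
    # Assumption: pseudo-count of 1 inside the logarithm.
    weight = np.log1p(dose) / np.log1p(max_dose)
    return np.asarray(z_control) + weight * np.asarray(delta)

z_ctrl = np.zeros((4, 2))            # four control cells in a 2-D latent space
delta = np.array([2.0, -1.0])        # perturbation vector at the highest dose
doses = [0.0, 0.1, 1.0, 30.0]
predicted = {d: interpolate_dose(z_ctrl, delta, d, max_dose=30.0) for d in doses}
```

The predicted latent cells for each dose would then be passed through the decoder to obtain expression values.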
Evaluating model performance
Model performance on the prediction task was evaluated in the same way as in Lotfollahi et al.35 We quantified performance using the $R^2$ value for mean gene expression for each gene across all cells. The $R^2$ was calculated using the linregress function from the scipy.stats package.63 We also compared the top 100 DEGs, selected using the rank_genes_groups function from the Scanpy package. Models were compared on the same prediction task, in which we resampled 80% of the cells in the cell type being predicted 100 times. Resampling was done using the choice function from the numpy.random package.64
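The evaluation loop can be sketched on synthetic data as below. The helper names are hypothetical, the sketch uses NumPy's `Generator.choice` rather than the legacy `numpy.random.choice`, and sampling without replacement is an assumption, since the text does not state which was used.

```python
import numpy as np
from scipy.stats import linregress

def mean_expression_r2(real, predicted):
    """R^2 between the per-gene mean expression of real and predicted cells."""
    fit = linregress(real.mean(axis=0), predicted.mean(axis=0))
    return fit.rvalue ** 2

def bootstrap_r2(real, predicted, n_iter=100, frac=0.8, seed=0):
    """Recompute R^2 on repeated subsamples of the real cells."""
    rng = np.random.default_rng(seed)
    n_keep = int(frac * real.shape[0])
    scores = []
    for _ in range(n_iter):
        # Assumption: subsample without replacement.
        idx = rng.choice(real.shape[0], size=n_keep, replace=False)
        scores.append(mean_expression_r2(real[idx], predicted))
    return np.array(scores)

# Synthetic "real" and "predicted" cells that share the same gene means.
rng = np.random.default_rng(1)
gene_means = rng.uniform(0, 5, size=30)
real = gene_means + rng.normal(scale=0.1, size=(200, 30))
pred = gene_means + rng.normal(scale=0.1, size=(200, 30))
r2_scores = bootstrap_r2(real, pred)
```

Restricting the columns of `real` and `pred` to the top DEGs before calling `bootstrap_r2` would give the DEG-specific variant of the score.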
Statistical significance was determined by the one-sided Mann-Whitney U test as it is implemented by the mannwhitneyu function from the scipy.stats package. We considered p values less than 0.001 as statistically significant.
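For illustration, the test can be applied to two sets of resampled scores as follows. The scores are simulated and the choice of `alternative="greater"` as the direction of the one-sided test is an assumption about how the comparison is set up.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
scores_a = rng.normal(loc=0.90, scale=0.02, size=100)  # simulated scores, model A
scores_b = rng.normal(loc=0.80, scale=0.02, size=100)  # simulated scores, model B

# One-sided test: are the scores of model A stochastically greater than those of B?
stat, p_value = mannwhitneyu(scores_a, scores_b, alternative="greater")
is_significant = p_value < 0.001
```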
Distances were used to establish relationships between distributions and vectors. Cosine distance was calculated using the cosine function in the scipy.spatial.distance package. The Sinkhorn distance was calculated using the SamplesLoss class in the geomloss package.65
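A small example of the cosine distance on made-up vectors is shown below. The Sinkhorn distance is omitted from the sketch because it requires geomloss and PyTorch.

```python
import numpy as np
from scipy.spatial.distance import cosine

delta_a = np.array([1.0, 0.0, 1.0])
delta_b = np.array([2.0, 0.0, 2.0])   # same direction as delta_a, different length
delta_c = np.array([0.0, 3.0, 0.0])   # orthogonal to delta_a

dist_same = cosine(delta_a, delta_b)  # cosine distance = 1 - cosine similarity
dist_orth = cosine(delta_a, delta_c)
```

Because the cosine distance ignores vector length, it compares only the direction of two perturbation vectors.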
Inferring feature-level contributions to perturbation prediction
In PCA, we perform an orthogonal linear transformation on the data such that our projected data preserve as much variance as possible. It is known that the solution to this maximization problem is to project the data onto the eigenvectors of the covariance matrix, or
$$Z = XW$$
where $X$ is the mean-centered scRNA-seq expression matrix, $W$ is the matrix of eigenvectors corresponding to the $k$ highest eigenvalues of the covariance matrix of $X$, and $Z$ represents the $k$-dimensional projection of the data onto its principal components. We can see from this formula that $Z$ is calculated as a linear combination of weights and gene expression, and thus there is a linear relationship between the genes and the principal components. We can exploit this fact and calculate a loading for each gene with each corresponding eigenvector by taking the product of the eigenvector and the square root of the corresponding eigenvalue, or
$$l_{ji} = w_{ji}\sqrt{\lambda_i}$$
where $w_{ji}$ is the $j$th value (corresponding to gene j) of the $i$th eigenvector and $\lambda_i$ is the eigenvalue for the $i$th eigenvector. These loadings represent a normalized score of the relationship between a gene’s expression and a particular principal component. These loadings are also directly proportional to the actual correlation between the gene’s expression and the principal component of interest.
It can be shown that PCA and autoencoders with a single hidden layer (with fewer units than the dimension of the observations) and a strictly linear map are nearly equivalent.66 We can project principal components back into expression space using the following function:
$$\hat{X} = ZW^\top = XWW^\top$$
Additionally, we note that PCA is a solution to the minimization of the reconstruction error:
$$\min_{W} \lVert X - XWW^\top \rVert_F^2$$
We find similarly that the loss function that we try to optimize in the autoencoder we described above is
$$\min_{W_1, W_2} \lVert X - XW_1W_2 \rVert_F^2$$
where $W_1$ are the weights of the hidden layer and $W_2$ are the weights of the final layer of the autoencoder. In effect, we can see that the autoencoder described above can approximate the loadings of a PCA using $W_2$.
The reconstruction error for a standard VAE with the assumption that the observations are a multivariate Gaussian is
$$\frac{1}{N}\sum_{i=1}^{N} \lVert x_i - f(z_i) \rVert_2^2$$
where $N$ is the number of samples, $f$ is the function of the decoder neural network, and $z_i$ is the transformation by the encoder of the observation $x_i$ onto the latent space. In an LDVAE, $f$ is replaced with a single layer with linear transfer operators such that the reconstruction error is the following:
$$\frac{1}{N}\sum_{i=1}^{N} \lVert x_i - W_d z_i \rVert_2^2$$
in which $W_d$ are the linear weights of the decoder. These weights give us an approximation of the contributions of individual genes to the dimensions of the latent space. We can interpret $W_d$ as a loadings matrix by which we can interpret the latent dimensions of the LDVAE.
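The PCA loading calculation described above can be checked numerically with a short NumPy sketch on synthetic data. The function name `pca_loadings` is hypothetical. The final lines verify the stated proportionality: the correlation between a gene and a component equals the loading divided by the gene's standard deviation.

```python
import numpy as np

def pca_loadings(X, k):
    """Project X onto its top-k principal components and compute gene loadings."""
    Xc = X - X.mean(axis=0)                      # mean-center each gene
    cov = np.cov(Xc, rowvar=False)               # gene x gene covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    eigvals, W = eigvals[order], eigvecs[:, order]
    Z = Xc @ W                                   # k-dimensional projection
    loadings = W * np.sqrt(eigvals)              # l_ji = w_ji * sqrt(lambda_i)
    return Z, W, loadings

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
Z_toy, W_toy, L_toy = pca_loadings(X_toy, k=2)

# Correlation of gene 0 with PC 1 versus its loading scaled by the gene's std.
corr_gene0_pc1 = np.corrcoef(X_toy[:, 0], Z_toy[:, 0])[0, 1]
scaled_loading = L_toy[0, 0] / X_toy[:, 0].std(ddof=1)
```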
To approximate feature contributions to predicting the perturbation in scVIDR, we train a ridge regression model on the output of the decoder. We take the decoder portion of our model, sample 100,000 points from the latent space, and generate their corresponding expression vectors. These pairs form the training dataset for the ridge regression, which we fit using the Ridge class from the sklearn.linear_model package. We can describe the loss of our ridge regression as
$$\mathcal{L}(W) = \lVert \hat{X} - ZW^\top \rVert_2^2 + \alpha \lVert W \rVert_2^2$$
where $Z$ are the sampled points from the latent space, $\hat{X}$ contains the gene expression vectors predicted by the decoder for those points, $W$ is an m × n matrix where m is the number of genes and n is the number of latent dimensions, and $\alpha$ is the ridge penalty weight. We normalize $W$ for the effect of overexpressed genes. We then calculate the gene scores by taking the dot product of the normalized $W$ and $\delta$, or
$$\text{gene scores} = \tilde{W}\delta$$
where $\tilde{W}$ is the normalized $W$.
We use these gene scores to order genes for Enrichr67 pathway analysis with the gseapy package.68 Scores for each pathway were calculated using the score_genes function from the scanpy.tl package with the gene sets derived from the Enrichr results.
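The ridge-based gene scoring can be sketched with a stand-in decoder. Several choices here are assumptions made for illustration and are labeled in the code: latent points are drawn from a standard normal distribution, each gene's weight vector is normalized by its L2 norm, and the decoder is a hypothetical linear map rather than a trained scVIDR decoder.

```python
import numpy as np
from sklearn.linear_model import Ridge

def gene_scores_from_decoder(decode, delta, latent_dim,
                             n_samples=100_000, alpha=1.0, seed=0):
    """Approximate a decoder with ridge regression and score genes along delta."""
    rng = np.random.default_rng(seed)
    # Assumption: sample latent points from a standard normal prior.
    Z = rng.standard_normal((n_samples, latent_dim))
    X_hat = decode(Z)                             # decoder-generated expression
    ridge = Ridge(alpha=alpha).fit(Z, X_hat)
    W = ridge.coef_                               # shape: genes x latent dims
    # Assumption: normalize each gene's weights by their L2 norm.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    W_norm = W / np.where(norms > 0, norms, 1.0)
    return W_norm @ np.asarray(delta)

# Hypothetical linear decoder: gene 0 rises with latent dim 0, gene 2 falls with it.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
toy_decode = lambda Z: Z @ A.T
toy_delta = np.array([1.0, 0.0])
toy_scores = gene_scores_from_decoder(toy_decode, toy_delta, latent_dim=2, n_samples=2000)
```

Genes whose weight vectors point along the perturbation direction receive positive scores, and genes pointing against it receive negative scores, which gives the ranking passed on to enrichment analysis.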
Calculating the pseudo-dose values
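We can order each cell, $i$, with respect to the variable response of $i$ to the chemical by taking the latent representation, $z_i$, and orthogonally projecting it onto $\delta_c$:
$$\mathrm{proj}_{\delta_c}(z_i) = \frac{z_i \cdot \delta_c}{\lVert \delta_c \rVert^2}\,\delta_c$$
The scalar multiple of $\delta_c$, $\frac{z_i \cdot \delta_c}{\lVert \delta_c \rVert^2}$, is the pseudo-dose value for $i$.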
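The projection reduces to a single dot product per cell, as the sketch below shows on made-up latent vectors. The function name `pseudo_dose` is hypothetical.

```python
import numpy as np

def pseudo_dose(latent, delta):
    """Scalar coefficient of the orthogonal projection of each latent cell onto delta."""
    delta = np.asarray(delta, dtype=float)
    return np.asarray(latent, dtype=float) @ delta / (delta @ delta)

toy_delta = np.array([2.0, 0.0])
toy_cells = np.array([[0.0, 1.0], [1.0, 5.0], [2.0, -3.0], [4.0, 0.0]])
toy_pseudo = pseudo_dose(toy_cells, toy_delta)   # one value per cell
toy_order = np.argsort(toy_pseudo)               # cells from least to most shifted
```

Components of a cell's latent vector that are orthogonal to the perturbation direction do not affect its pseudo-dose.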
Regression of sigmoid function for evaluating dose-response relationships
To establish whether a standard dose-response relationship existed between the top pathways inferred by Enrichr and the pseudo-dose and gene expression, a logistic function of the form
$$f(d) = \frac{L}{1 + e^{-k(d - d_0)}} + b$$
was used, where d is the dose or pseudo-dose and $L$, $k$, $d_0$, and $b$ are the fitted parameters. The parameters of the function above were fit to the output variables (median enrichment score and Fmo3 normalized expression) using the Levenberg-Marquardt algorithm implementation in the curve_fit function in the scipy.optimize package. The regression was evaluated using the mean absolute error metric implementation in the mean_absolute_error function in the sklearn.metrics package.
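A minimal sketch of the fit on simulated data follows. The four-parameter logistic form, the parameter names, and the initial guess `p0` are assumptions made for the example; `method="lm"` selects the Levenberg-Marquardt solver in `curve_fit`.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.metrics import mean_absolute_error

def logistic(d, L, k, d0, b):
    """Logistic curve with height L, slope k, midpoint d0, and offset b."""
    return L / (1.0 + np.exp(-k * (d - d0))) + b

# Simulated dose-response data with a small amount of noise.
rng = np.random.default_rng(0)
dose = np.linspace(0, 10, 40)
true_params = (2.0, 1.5, 5.0, 0.5)
response = logistic(dose, *true_params) + rng.normal(scale=0.02, size=dose.size)

# Data-driven starting values help the solver converge.
p0 = [response.max() - response.min(), 1.0, np.median(dose), response.min()]
params, _ = curve_fit(logistic, dose, response, p0=p0, method="lm")
mae = mean_absolute_error(response, logistic(dose, *params))
```

A low mean absolute error indicates that the response follows a sigmoidal dose-response curve along the dose or pseudo-dose axis.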
Acknowledgments
This work was supported by the National Human Genome Research Institute R21 HG010789 to T.Z. and S.B. O.K. is supported by the National Institute of Environmental Health Sciences of the National Institutes of Health under award number T32 ES007255. T.Z. and S.B. are partially supported by the USDA National Institute of Food and Agriculture, Michigan AgBioResearch. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This work was supported in part through computational resources and services provided by the Institute for Cyber-Enabled Research at Michigan State University.
Author contributions
Conceptualization, S.B. and O.K.; methodology, O.K. and D.F.; software, validation, and writing – original draft, O.K.; formal analysis, O.K., D.F., R.N., and D.M.; data curation, O.K. and R.N.; supervision and funding acquisition, S.B. and T.Z.; writing - review and editing, all authors.
Declaration of interests
The authors declare no competing interests.
Inclusion and diversity
One or more of the authors of this paper self-identifies as an underrepresented ethnic minority in their field of research or within their geographical location.
Published: August 11, 2023
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.patter.2023.100817.
Data and code availability
All data used in the manuscript are publicly available and are referenced in the manuscript. The code for the software and for reproducing the figures is available at https://github.com/BhattacharyaLab/scVIDR. Long-term archive of code repository is made available via Zenodo at http://doi.org/10.5281/zenodo.8025235.58
References
- 1. Brenner S. Sequences and consequences. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2010;365:207–212. doi: 10.1098/rstb.2009.0221.
- 2. Regev A., Teichmann S.A., Lander E.S., Amit I., Benoist C., Birney E., Bodenmiller B., Campbell P., Carninci P., Clatworthy M., et al. The human cell atlas. Elife. 2017;6. doi: 10.7554/eLife.27041.
- 3. Wilkerson B.A., Zebroski H.L., Finkbeiner C.R., Chitsazan A.D., Beach K.E., Sen N., Zhang R.C., Bermingham-Mcdonogh O. Novel cell types and developmental lineages revealed by single-cell rna-seq analysis of the mouse crista ampullaris. Elife. 2021;10. doi: 10.7554/eLife.60108.
- 4. Keren-Shaul H., Spinrad A., Weiner A., Matcovitch-Natan O., Dvir-Szternfeld R., Ulland T.K., David E., Baruch K., Lara-Astaiso D., Toth B., et al. A Unique Microglia Type Associated with Restricting Development of Alzheimer’s Disease. Cell. 2017;169:1276–1290.e17. doi: 10.1016/j.cell.2017.05.018.
- 5. Pellin D., Loperfido M., Baricordi C., Wolock S.L., Montepeloso A., Weinberg O.K., Biffi A., Klein A.M., Biasco L. A comprehensive single cell transcriptional landscape of human hematopoietic progenitors. Nat. Commun. 2019;10:2395. doi: 10.1038/s41467-019-10291-0.
- 6. Rodriguez-Fraticelli A.E., Weinreb C., Wang S.W., Migueles R.P., Jankovic M., Usart M., Klein A.M., Lowell S., Camargo F.D. Single-cell lineage tracing unveils a role for TCF15 in haematopoiesis. Nature. 2020;583:585–589. doi: 10.1038/s41586-020-2503-6.
- 7. Taylor D.M., Aronow B.J., Tan K., Bernt K., Salomonis N., Greene C.S., Frolova A., Henrickson S.E., Wells A., Pei L., et al. The Pediatric Cell Atlas: Defining the Growth Phase of Human Development at Single-Cell Resolution. Dev. Cell. 2019;49:10–29. doi: 10.1016/j.devcel.2019.03.001.
- 8. Semrau S., Goldmann J.E., Soumillon M., Mikkelsen T.S., Jaenisch R., Van Oudenaarden A. Dynamics of lineage commitment revealed by single-cell transcriptomics of differentiating embryonic stem cells. Nat. Commun. 2017;8:1096. doi: 10.1038/s41467-017-01076-4.
- 9. van Galen P., Hovestadt V., Wadsworth II M.H., Hughes T.K., Griffin G.K., Battaglia S., Verga J.A., Stephansky J., Pastika T.J., Lombardi Story J., et al. Single-Cell RNA-Seq Reveals AML Hierarchies Relevant to Disease Progression and Immunity. Cell. 2019;176:1265–1281.e24. doi: 10.1016/j.cell.2019.01.031.
- 10. Peng J., Sun B.F., Chen C.Y., Zhou J.Y., Chen Y.S., Chen H., Liu L., Huang D., Jiang J., Cui G.S., et al. Single-cell RNA-seq highlights intra-tumoral heterogeneity and malignant progression in pancreatic ductal adenocarcinoma. Cell Res. 2019;29:725–738. doi: 10.1038/s41422-019-0195-y.
- 11. Brivanlou A.H., Darnell J.E. Signal Transduction and the Control of Gene Expression. Science. 2002;295:813–818. doi: 10.1126/science.1066355.
- 12. Blumenthal D.K. In: Goodman & Gilman’s: The Pharmacological Basis of Therapeutics, 13e. Brunton L.L., Hilal-Dandan R., Knollmann B.C., editors. McGraw-Hill Education; 2017. Pharmacodynamics: Molecular Mechanisms of Drug Action.
- 13. Yao J., Pilko A., Wollman R. Distinct cellular states determine calcium signaling response. Mol. Syst. Biol. 2016;12:894. doi: 10.15252/MSB.20167137.
- 14. Kramer B.A., Pelkmans L. Cellular state determines the multimodal signaling response of single cells. Preprint at bioRxiv. 2019. doi: 10.1101/2019.12.18.880930.
- 15. Zhang Q., Caudle W.M., Pi J., Bhattacharya S., Andersen M.E., Kaminski N.E., Conolly R.B. Embracing systems toxicology at single-cell resolution. Curr. Opin. Toxicol. 2019;16:49–57. doi: 10.1016/j.cotox.2019.04.003.
- 16. Lotfollahi M., Klimovskaia Susmelj A., De Donno C., Ji Y., Ibarra I.-C.L., Wolf F.A., Yakubova N., Theis F.J., Lopez-Paz D. Learning interpretable cellular responses to complex perturbations in high-throughput screens. Preprint at bioRxiv. 2021. doi: 10.1101/2021.04.14.439903.
- 17. Peidli S., Green T.D., Shen C., Gross T., Min J., Garda S., Yuan B., Schumacher L.J., Taylor-King J., Marks D., et al. scPerturb: Harmonized Single-Cell Perturbation Data. Preprint at bioRxiv. 2023. doi: 10.1101/2022.08.20.504663.
- 18. McFarland J.M., Paolella B.R., Warren A., Geiger-Schuller K., Shibue T., Rothberg M., Kuksenko O., Colgan W.N., Jones A., Chambers E., et al. Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action. Nat. Commun. 2020;11:4296. doi: 10.1038/s41467-020-17440-w.
- 19. Kingma D.P., Welling M. Auto-encoding variational bayes. In: 2nd International Conference on Learning Representations, ICLR 2014 - Conference Track Proceedings. ICLR; 2014.
- 20. Lopez R., Regier J., Cole M.B., Jordan M.I., Yosef N. Deep generative modeling for single-cell transcriptomics. Nat. Methods. 2018;15:1053–1058. doi: 10.1038/s41592-018-0229-2.
- 21. Qiu Y.L., Zheng H., Gevaert O. Genomic data imputation with variational auto-encoders. GigaScience. 2020;9:giaa082. doi: 10.1093/gigascience/giaa082.
- 22. Nault R., Fader K.A., Bhattacharya S., Zacharewski T.R. Single-Nuclei RNA Sequencing Assessment of the Hepatic Effects of 2,3,7,8-Tetrachlorodibenzo-p-dioxin. CMGH. 2021;11:147–159. doi: 10.1016/j.jcmgh.2020.07.012.
- 23. Nault R., Saha S., Bhattacharya S., Dodson J., Sinha S., Maiti T., Zacharewski T. Benchmarking of a Bayesian single cell RNAseq differential gene expression test for dose–response study designs. Nucleic Acids Res. 2022;50:e48. doi: 10.1093/nar/gkac019.
- 24. Kang H.M., Subramaniam M., Targ S., Nguyen M., Maliskova L., McCarthy E., Wan E., Wong S., Byrnes L., Lanata C.M., et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 2018;36:89–94. doi: 10.1038/nbt.4042.
- 25. Srivatsan S.R., McFaline-Figueroa J.L., Ramani V., Saunders L., Cao J., Packer J., Pliner H.A., Jackson D.L., Daza R.M., Christiansen L., et al. Massively multiplex chemical transcriptomics at single-cell resolution. Science. 2020;367:45–51. doi: 10.1126/science.aax6234.
- 26. Lindros K.O., Oinonen T., Johansson I., Ingelman-Sundberg M. Selective Centrilobular Expression of the Aryl Hydrocarbon Receptor in Rat Liver. J. Pharmacol. Exp. Therapeut. 1997;280:506–511.
- 27. Yang Y., Filipovic D., Bhattacharya S. A Negative Feedback Loop and Transcription Factor Cooperation Regulate Zonal Gene Induction by 2, 3, 7, 8-Tetrachlorodibenzo-p-Dioxin in the Mouse Liver. Hepatol. Commun. 2022;6:750–764. doi: 10.1002/hep4.1848.
- 28. Fefferman C., Mitter S., Narayanan H. Testing the manifold hypothesis. J. Am. Math. Soc. 2016;29:983–1049. doi: 10.1090/jams/852.
- 29. Davidson E.H. The Regulatory Genome. Elsevier; 2006. The “Regulatory Genome” for Animal Development; pp. 1–29.
- 30. Sun S., Zhu J., Ma Y., Zhou X. Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis. Genome Biol. 2019;20:269. doi: 10.1186/s13059-019-1898-6.
- 31. Van den Berge K., Roux de Bézieux H., Street K., Saelens W., Cannoodt R., Saeys Y., Dudoit S., Clement L. Trajectory-based differential expression analysis for single-cell sequencing data. Nat. Commun. 2020;11:1201. doi: 10.1038/s41467-020-14766-3.
- 32. Ding J., Condon A., Shah S.P. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models. Nat. Commun. 2018;9:2002. doi: 10.1038/s41467-018-04368-5.
- 33. Eraslan G., Simon L.M., Mircea M., Mueller N.S., Theis F.J. Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 2019;10:390. doi: 10.1038/s41467-018-07931-2.
- 34. Grønbech C.H., Vording M.F., Timshel P.N., Sønderby C.K., Pers T.H., Winther O. scVAE: variational auto-encoders for single-cell gene expression data. Bioinformatics. 2020;36:4415–4422. doi: 10.1093/bioinformatics/btaa293.
- 35. Lotfollahi M., Wolf F.A., Theis F.J. scGen predicts single-cell perturbation responses. Nat. Methods. 2019;16:715–721. doi: 10.1038/s41592-019-0494-8.
- 36. Wei X., Dong J., Wang F. scPreGAN, a deep generative model for predicting the response of single-cell expression to perturbation. Bioinformatics. 2022;38:3377–3384. doi: 10.1093/bioinformatics/btac357.
- 37. Bunne C., Stark S.G., Gut G., del Castillo J.S., Lehmann K.-V., Pelkmans L., Krause A., Rätsch G. Learning Single-Cell Perturbation Responses using Neural Optimal Transport. Preprint at bioRxiv. 2021. doi: 10.1101/2021.12.15.472775.
- 38. Zheng G.X.Y., Terry J.M., Belgrader P., Ryvkin P., Bent Z.W., Wilson R., Ziraldo S.B., Wheeler T.D., McDermott G.P., Zhu J., et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 2017;8:14049–14112. doi: 10.1038/ncomms14049.
- 39. Hagai T., Chen X., Miragaia R.J., Rostom R., Gomes T., Kunowska N., Henriksson J., Park J.-E., Proserpio V., Donati G., et al. Gene expression variability across cells and species shapes innate immunity. Nature. 2018;563:197–202. doi: 10.1038/s41586-018-0657-2.
- 40. Rostom R., Svensson V., Teichmann S.A., Kar G. Computational approaches for interpreting scRNA-seq data. FEBS Lett. 2017;591:2213–2225. doi: 10.1002/1873-3468.12684.
- 41. Svensson V., Gayoso A., Yosef N., Pachter L. Interpretable factor models of single-cell RNA-seq via variational autoencoders. Bioinformatics. 2020;36:3418–3421. doi: 10.1093/bioinformatics/btaa169.
- 42. Ribeiro M.T., Singh S., Guestrin C. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier.
- 43. Martens M., Ammar A., Riutta A., Waagmeester A., Slenter D.N., Hanspers K., Miller R.A., Digles D., Lopes E.N., Ehrhart F., et al. WikiPathways: Connecting communities. Nucleic Acids Res. 2021;49:D613–D621. doi: 10.1093/nar/gkaa1024.
- 44. Henry E.C., Welle S.L., Gasiewicz T.A. TCDD and a Putative Endogenous AhR Ligand, ITE, Elicit the Same Immediate Changes in Gene Expression in Mouse Lung Fibroblasts. Toxicol. Sci. 2010;114:90–100. doi: 10.1093/toxsci/kfp285.
- 45. Cholico G.N., Fling R.R., Zacharewski N.A., Fader K.A., Nault R., Zacharewski T.R. Thioesterase induction by 2,3,7,8-tetrachlorodibenzo-p-dioxin results in a futile cycle that inhibits hepatic β-oxidation. Sci. Rep. 2021;11. doi: 10.1038/s41598-021-95214-0.
- 46. Friedrich M., Sankowski R., Bunse L., Kilian M., Green E., Ramallo Guevara C., Pusch S., Poschet G., Sanghvi K., Hahn M., et al. Tryptophan metabolism drives dynamic immunosuppressive myeloid states in IDH-mutant gliomas. Nat. Cancer. 2021;2:723–740. doi: 10.1038/s43018-021-00201-z.
- 47. Wolf F.A., Angerer P., Theis F.J. SCANPY: Large-scale single-cell gene expression data analysis. Genome Biol. 2018;19:15. doi: 10.1186/s13059-017-1382-0.
- 48. Street K., Risso D., Fletcher R.B., Das D., Ngai J., Yosef N., Purdom E., Dudoit S. Slingshot: Cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genom. 2018;19:477. doi: 10.1186/s12864-018-4772-0.
- 49. Qiu X., Mao Q., Tang Y., Wang L., Chawla R., Pliner H.A., Trapnell C. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods. 2017;14:979–982. doi: 10.1038/nmeth.4402.
- 50. Cunningham R.P., Porat-Shliom N. Liver Zonation – Revisiting Old Questions With New Technologies. Front. Physiol. 2021;12. doi: 10.3389/FPHYS.2021.732929.
- 51. Santostefano M.J., Richardson V.M., Walker N.J., Blanton J., Lindros K.O., Lucier G.W., Alcasey S.K., Birnbaum L.S. Dose-dependent localization of TCDD in isolated centrilobular and periportal hepatocytes. Toxicol. Sci. 1999;52:9–19. doi: 10.1093/toxsci/52.1.9.
- 52. Saelens W., Cannoodt R., Todorov H., Saeys Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 2019;37:547–554. doi: 10.1038/s41587-019-0071-9.
- 53. Lioy P.J. Assessing total human exposure to contaminants: A multidisciplinary approach. Environ. Sci. Technol. 1990;24:938–945. doi: 10.1021/es00077a001.
- 54. Gabrielsson J., Andersson R., Jirstrand M., Hjorth S. Dose-Response-Time Data Analysis: An Underexploited Trinity. Pharmacol. Rev. 2019;71:89–122. doi: 10.1124/pr.118.015750.
- 55. Dobrek L. Chronopharmacology in Therapeutic Drug Monitoring—Dependencies between the Rhythmics of Pharmacokinetic Processes and Drug Concentration in Blood. Pharmaceutics. 2021;13. doi: 10.3390/pharmaceutics13111915.
- 56. Li J., Bhattacharya S., Zhou J., Phadnis-Moghe A.S., Crawford R.B., Kaminski N.E. Aryl Hydrocarbon Receptor Activation Suppresses EBF1 and PAX5 and Impairs Human B Lymphopoiesis. J. Immunol. 2017;199:3504–3515. doi: 10.4049/jimmunol.1700289.
- 57. Yeo G.H.T., Saksena S.D., Gifford D.K. Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions. Nat. Commun. 2021;12:3222. doi: 10.1038/s41467-021-23518-w.
- 58. Kana O.Z., BhattacharyaLab. 2023. BhattacharyaLab/scVIDR: Gamma.
- 59. Barrett T., Wilhite S.E., Ledoux P., Evangelista C., Kim I.F., Tomashevsky M., Marshall K.A., Phillippy K.H., Sherman P.M., Holko M., et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 2013;41:D991–D995. doi: 10.1093/nar/gks1193.
- 60. Katz K., Shutov O., Lapoint R., Kimelman M., Brister J.R., O’Sullivan C. The Sequence Read Archive: a decade more of explosive growth. Nucleic Acids Res. 2022;50:D387–D390. doi: 10.1093/nar/gkab1053.
- 61. Hagai T. 2018. RNA-seq of of dermal fibroblasts.
- 62. Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830.
- 63. Virtanen P., Gommers R., Oliphant T.E., Haberland M., Reddy T., Cournapeau D., Burovski E., Peterson P., Weckesser W., Bright J., et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods. 2020;17:261–272. doi: 10.1038/s41592-019-0686-2.
- 64. Harris C.R., Millman K.J., van der Walt S.J., Gommers R., Virtanen P., Cournapeau D., Wieser E., Taylor J., Berg S., Smith N.J., et al. Array programming with NumPy. Nature. 2020;585:357–362. doi: 10.1038/s41586-020-2649-2.
- 65. Feydy J., Séjourné T., Vialard F.-X., Amari S., Trouvé A., Peyré G. Interpolating between Optimal Transport and MMD using Sinkhorn Divergences. Preprint at arXiv. 2018. doi: 10.48550/arxiv.1810.08278.
- 66. Plaut E. From principal subspaces to principal components with linear autoencoders. Preprint at arXiv. 2018. doi: 10.48550/arXiv.1804.10253.
- 67. Chen E.Y., Tan C.M., Kou Y., Duan Q., Wang Z., Meirelles G.V., Clark N.R., Ma’ayan A. Enrichr: Interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinf. 2013;14:128. doi: 10.1186/1471-2105-14-128.
- 68. Fang Z., Liu X., Peltz G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics. 2023;39:btac757. doi: 10.1093/bioinformatics/btac757.