Automatic identification of relevant genes from low-dimensional embeddings of single-cell RNA-seq data

Philipp Angerer; David S Fischer; Fabian J Theis; Antonio Scialdone; Carsten Marr

doi:10.1093/bioinformatics/btaa198

. 2020 Mar 24;36(15):4291–4295. doi: 10.1093/bioinformatics/btaa198

Automatic identification of relevant genes from low-dimensional embeddings of single-cell RNA-seq data

Philipp Angerer ^b1,^b2, David S Fischer ^b1,^b2, Fabian J Theis ^b1, Antonio Scialdone ^b1,^b3,^b4,^✉, Carsten Marr ^b1,^✉

Editor: Inanc Birol

PMCID: PMC7520047 PMID: 32207520

Abstract

Motivation

Dimensionality reduction is a key step in the analysis of single-cell RNA-sequencing data. It produces a low-dimensional embedding for visualization and as a calculation base for downstream analysis. Nonlinear techniques are most suitable to handle the intrinsic complexity of large, heterogeneous single-cell data. However, with no linear relation between gene and embedding coordinate, there is no way to extract the identity of genes driving any cell’s position in the low-dimensional embedding, making it difficult to characterize the underlying biological processes.

Results

In this article, we introduce the concepts of local and global gene relevance to compute an equivalent of principal component analysis loadings for non-linear low-dimensional embeddings. Global gene relevance identifies drivers of the overall embedding, while local gene relevance identifies those of a defined sub-region. We apply our method to single-cell RNA-seq datasets from different experimental protocols and to different low-dimensional embedding techniques. This shows our method’s versatility to identify key genes for a variety of biological processes.

Availability and implementation

To ensure reproducibility and ease of use, our method is released as part of destiny 3.0, a popular R package for building diffusion maps from single-cell transcriptomic data. It is readily available through Bioconductor.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Single-cell RNA-sequencing (scRNA-seq) has massively improved the resolution developmental trajectories Baron et al. (2016) and allowed unprecedented insights into the heterogeneity of complex tissues Tritschler et al. (2017) and Vento-Tormo et al. (2018). On the flip side, new challenges have arisen due to the amount of data that needs to be processed Angerer et al. (2017), higher levels of technical and biological noise Yuan et al. (2017), and identification and interpretation of known and novel cell types Pliner et al. (2019). To exploit the new opportunities and deal with the new challenges, a large number of algorithms and tools have been developed Zappia et al. (2018).

Dimension reduction methods create a low-dimensional embedding of the high-dimensional gene expression space. Those embeddings are widely used for two applications: They (i) serve as a visual overview of the data on which gene expression profiles and per-cell or per-cluster statistics can be compared. They (ii) serve as inputs for further downstream computational analysis. For example, principal component analysis (PCA) is a popular technique to identify orthogonal linear combinations of genes that explain variance in the data. PCA loadings quantify the contribution of genes to each principal component and help understand the genetic drivers of the underlying molecular processes. However, linear methods are often not able to capture the complexity of high-dimensional datasets (Haghverdi et al., 2015), which is why nonlinear dimension reduction methods have become the standard for scRNA-seq data analysis [see e.g. t-SNE (t-Stochastic Neighborhood Embedding) Husnain et al. (2019), diffusion maps; Coifman et al. (2005), Haghverdi et al. (2015) and Husnain et al. (2019); UMAP (Uniform Manifold Approximation and Projection) Becht et al. (2018) and McInnes et al. (2018); and graph-based methods Islam et al. (2011)]. However, no intrinsic measure of individual genes’ contribution to each embedding dimension exists for non-linear embeddings. Without such a measure, the identification of genes that drive the variability in the data requires tedious manual inspection and prior knowledge about possible target genes.

Here, we introduce gene relevance, a measure for a gene’s contribution to variance in low-dimensional embeddings, and present a method to infer a local as well as a global gene relevance score from any kind of low-dimensional embedding. To demonstrate the utility of the method, we apply gene relevance to several datasets prepared with different droplet- and plate-based protocols (see Supplementary Table S1). Gene relevance is available as part of the R package destinyAngerer et al. (2016).

2 Materials and methods

We define gene relevance as a measure of how much a gene contributes to the cell-to-cell variability in a low-dimensional embedding of a scRNA-seq dataset (see Fig. 1). It can be interpreted as a generalization of PCA loadings to non-linear dimensionality reduction techniques. Note that PCA loadings are constant with respect to the PC space while feature importance in a non-linear embedding is naturally a non-constant function of the embedding coordinates. A ranking of genes based on their relevance is built for every cell of the embedding. These rankings can be combined to obtain a measurement of the local or global relevance of each gene (see Fig. 1e–f), which highlight genes relevant in defined sub-regions of the embedding and all cells, respectively. To explore and visualize the results further, the method also provides a gene relevance map, where the locally most relevant genes are displayed along with their corresponding neighborhoods in the embedding (see Fig. 1g). Below, we describe in details every step in the estimation of global and local gene relevance.

Fig. 1. — The gene relevance concept. (a) A gene expression matrix from a scRNA-seq experiment is (b) reduced to a low-dimensional embedding *s_pc*, with each dot representing a cell, and the color representing the expression *x_gc* of gene $g \in {A, B, \dots, Z}$ in cell c. (c) Expression changes are calculated from estimates of partial derivatives with respect to the embedding, which results in one value per cell×gene×dimension combination. (d) We score the relevance of each gene in each cell according to the partial derivatives’ euclidean norm. This score indicates how relevant each gene is within its neighborhood. (e) For local gene relevance scores, we subdivide the embedding into bins and determine the fraction of cells per bin for which a given gene is among the (e.g. 10) most relevant genes (indicated by ‘%top rank’ in the figure legend). We color bins of the embedded cells according to their local gene relevance score, and fade them according to the number of cells they contain (indicated by ‘#cells’ in the legend). (f) For the global gene relevance score, we determine local relevance for all cells instead of a bin. In our illustrative example, Gene B has been ranked among the top 10 genes in 5.4% of all cells. (g) A gene relevance map indicates all cells where a given gene has the largest norm of partial derivatives (with or without a smoothing step—see Section 2). Such cells mark the areas where that gene has high local relevance

2.1 Neighborhoods

If a k nearest neighbor (kNN) search has been performed as part of the embedding, it can be efficiently used for estimating the gene relevance. To perform the kNN search, destiny offers the choice between euclidean distance, cosine distance, and spearman rank correlation distance. The latter was used in all analyses performed for this article.

2.2 Gene expression changes

The differential d_gc of gene g in cell c describes the change in gene expression x_gc along a change in embedding coordinates s_pc, where $p \in {1, \dots, P}$ is the embedding dimension and d_gc corresponds to the partial derivatives of the gene expression with respect to each embedding coordinate:

d_{g c} = (\frac{\partial x_{g c}}{\partial s_{1 c}}, \dots, \frac{\partial x_{g c}}{\partial s_{P c}})

(1)

We estimated d_gc from the cells’ neighborhood ${NN}_{k} (c)$ in gene expression space using finite differences. To address the high dropout rate present in scRNA-seq data, we do not define d_gc for x_gc = 0.

{\hat{(d_{g c})}}_{p} = {\begin{matrix} N A, & if x_{g c} = 0 \\ \underset{n \in {NN}_{k} (c) \land n \neq c}{median} \frac{x_{g c} - x_{g n}}{s_{p n} - s_{p c}}, & otherwise \end{matrix}

(2)

2.3 Local gene relevance

The basis of local and global gene relevance is the score of gene $g \in {1, \dots, G}$ in cell $c \in {1, \dots, C}$ , defined as the euclidean norm $| | d_{g c} | |_{2}$ of the differential d_gc:

| | d_{g c} | |_{2} = \sqrt{\sum_{p = 1}^{P} {(d_{g c})}_{p}^{2}}

(3)

In each cell c, genes can be ranked according to their score $| | d_{g c} | |_{2}$ , from most to least relevant. Given the ranks ${rg}_{| | d_{g c} | |_{2}}$ of gene g and a rank cutoff ${rg}_{\max}$ , we define the local gene relevance $L R_{{rg}_{\max}} (g, Ψ)$ of a gene for a set of cells $Ψ \subseteq {1, \dots, C}$ as:

{LR}_{{rg}_{\max}} (g, Ψ) = \frac{\sum_{c \in Ψ} [{rg}_{| | d_{g c} | |_{2}} < {rg}_{\max}]}{| Ψ |}

(4)

with the Iverson bracket notation

[P] = \{\begin{cases} \begin{matrix} 1, & if P is true \end{matrix} \\ \begin{matrix} 0, & otherwise \end{matrix} \end{cases}, for any statement P

(5)

In our analyses, we used the default ${rg}_{\max} = 10$ . The method is robust against the cutoff used, with the most relevant genes stable even for higher cutoffs >100 (see Supplementary Fig. S7).

2.4 Global gene relevance

The global relevance ${GR}_{{rg}_{\max}} (g)$ can simply be defined as the local gene relevance for the set of all cells ${1, \dots, C}$ .

{GR}_{{rg}_{\max}} (g) = {LR}_{{rg}_{\max}} (g, {1, \dots, C})

(6)

2.5 Local gene relevance plots

The local gene relevance of genes can be visualized by evenly dividing the embedding space into bins $b \in {1, \dots, B}$ and calculating $L R_{{rg}_{\max}} (g, Ψ_{b})$ for the set of cells falling into each bin $Ψ_{b}$ . This visualization is shown in Figures 1e and 2d.

Fig. 2. — Gene relevance automatically detects drivers of embryonic blood development. (a) Diffusion map of 271 single hematopoietic progenitor cells from mostly Day 7.5 and 7.75 mouse embryos, profiled in Scialdone *et al.* (2016). (b) Global gene relevance identifies Hbb-bh1 and Hba-x as genes that change most dramatically during hematopoietic development. (c) A gene relevance map identifies the contribution of relevant genes in specific regions of the process and the corresponding code to create it. The genes corresponding to each color are shown in panel (b). (d) A local gene relevance plot details the areas where the contribution of genes is highest. Alox5ap shows a high local relevance in the top region of the diffusion map and has been implicated with early blood development Ibarra-Soria *et al.* (2018)

2.6 Gene relevance maps

For a set of genes of interest $Ω \subseteq {1, \dots, G}$ (which can be chosen, e.g. among those with highest global relevance) and each cell c, we define the locally most relevant gene $l_{c}^{m}$ after a number of smoothing steps m:

l_{c}^{m} = \underset{g \in Ω}{arg max} {\begin{matrix} | | d_{g c} | |_{2}, & if m = 0 \\ \frac{1}{k} \sum_{n \in N N_{k} (c)} [l_{n}^{m - 1} = g], & otherwise . \end{matrix}

(7)

During a smoothing iteration, we replace the local gene relevance score of cell c and gene g with the fraction of neighbors that have g as the most relevant gene.

The amount of smoothing is controlled by the smoothing parameter m. Decreasing m will result in more locality and more genes with a high relevance in a small region will appear on the map, while genes with a medium relevance in more or larger regions will vanish. When determining the locally most relevant gene using the globally most relevant genes as $Ω$ , the aforementioned parameter ${rg}_{\max}$ can be used as a sensitivity parameter, with higher values resulting in more genes being selected as relevant. Finding the globally most relevant genes are robust to both parameters (see Supplementary Fig. S7), while locally relevant genes are affected to a larger extent. Importantly, due to the short computation time of gene relevance maps, one can explore several combinations of parameters. For cell sizes in the range of few hundreds, the running time is less than one to a few seconds with 40 000 cell×gene combinations per second processed. For cell sizes in the thousands 20 000 cell×gene combinations per second are processed, resulting in run times of a few minutes (run times have been measured on a single 3.6 GHz CPU core).

2.7 scRNA-seq data

We demonstrate our method on four datasets (see Section 3 and Supplementary Table S1 for details).

In the mouse gastrulation data from Scialdone et al. (2016), we used count data from 271 cells mostly of the neural plate (embryonic Day 7.5) and head fold (embryonic Day 7.75) development stages of mouse embryos. There, the libraries were constructed using the Smart-seq2 protocol, read counts were obtained via HTseq-count. The 271 cells we used correspond to the clusters annotated as ‘blood progenitor’ and ‘primitive erythroid’ in the original publication. We selected highly variable genes using the method of Brennecke et al. (2013) because of its stable performance Yip et al. (2018), and embedded the log-transformed data using the diffusion map implementation destiny Angerer et al. (2016).

For the two human cell datasets, we applied the same analysis steps, starting from the highly variable gene selection (see Supplementary Fig. S3). The human endocrine cell data from Veres et al. (2019) was sequenced using the inDrops platform, while the human brain organoid data from Gray Camp et al. (2015) used the SMARTer Ultra Low RNA Kit in combination with an Illumina sequencer. The data from Kolodziejczyk et al. (2015) used Nextera kits together with an Illumina sequencer. For further details about the pre-processing of these data, please refer to the individual publications.

3 Results

We demonstrate our method on a scRNA-seq dataset of blood progenitors and blood cells from mouse embryos Scialdone et al. (2016; see Fig. 2a). In the original publication, these data were used to reconstruct a trajectory representing primitive erythropoiesis, along which blood marker expression increases and other markers (such as endothelial cells) decrease. There, an ad hoc method was devised to find important genes in the 2D diffusion map embedding of the data. Here, we show how our method can be used ‘out of the box’ to rank genes based on their local and global relevance.

First, we ranked all highly variable genes according to their global gene relevance (see Fig. 2b). As expected, the high-ranking genes are mostly associated with blood development, including the hemoglobin genes Hba-a1, Hba-x, Hbb-bh1 and the erythrocyte membrane genes Gypa and Cited4 Yahata et al. (2002). The genes Cyr61 and Hapln1 are involved in extracellular matrix and important for development of the cardiovascular system Latinkić et al. (2001). The top of the list has a good overlap with the ad hoc method in Scialdone et al. (2016): 4 genes are shared between the top 10 of both lists, and we find a Rank-Biased Overlap of RBO_p = 0.48, where we used p = 0.9, which assigns ∼86% of the weight to the first 10 genes Webber et al. (2010).

Second, we created a gene relevance map (Fig. 2c). Five out of the six locally most relevant genes (see Fig. 2c) are among the ten most globally relevant ones. Interestingly, Alox5ap is included only in the gene relevance map, because its contribution is confined to a small region of the diffusion space (bottom right panel in Fig. 2d) and hard to detect at the level of gene expression (see Supplementary Fig. S1). This gene was not discovered by the ad hoc method of Scialdone et al. (2016), but it has been recently found to be important in early blood development Ibarra-Soria et al. (2018). Locally and globally relevant genes can also be inferred in other embeddings such as t-SNE van der Maaten and Hinton (2008) and UMAP Becht et al. (2018), with a stable recovery of the most relevant genes between comparable embeddings (see Supplementary Fig. S2).

Applied to other scRNA-seq datasets, we showcase versatility and ease of application of our method. In droplet-sequenced data of human endocrine cells Veres et al. (2019), gene relevance maps detect genes driving the separation of sub-regions of the embedding (see Supplementary Fig. S3a), in accordance to the markers identified in the original paper. In human brain organoid cells Gray Camp et al. (2015), we detect relevant genes different from the markers specified in the article because of a low-density region between mesenchymal cells and neurons/neural progenitors (see Supplemetnary Fig. S3b). The genes therefore seem to be selected for driving the difference between progenitors and neurons: TXNRD1 plays a vital role for neuron progenitor cells Soerensen et al. (2008), the selenoprotein SELT protects neurons against oxidative stress in mouse models Boukhzar et al. (2016), and CRABP1 modulates the neuronal cell cycle in mice Lin et al. (2017).

Finally, we applied gene relevance to mouse embryonic stem cells grown in three different pluripotency retaining media Kolodziejczyk et al. (2015). As expected for cells in a relatively homogenous pluripotent steady state, the relevant genes for diffusion map embedding of all three media were enriched for housekeeping, metabolic and proliferation pathways (see Supplementary Fig. S5).

4 Discussion

We presented a method that is able to reliably detect relevant genes from low-dimensional embeddings of scRNA-seq data. More specifically, our method computes both a local and a global gene relevance score: local gene relevance identifies the main drivers of the cell-to-cell variability in defined sub-regions of the embedding, while global gene relevance identifies those of the whole embedding. In addition to a gene ranking based on global relevance, the method also provides graphic tools to visualize the local gene relevance (see Fig. 1e) and the changes in gene expression levels within the embedding (see Fig. 1c and Supplementary Fig. S1). It can be used for any single-cell dataset and any dimensionality reduction technique.

We applied our method to three datasets, including one from mouse embryonic blood progenitors, where we show that it performs comparably to a technique custom-made for the dataset. Interestingly, our method identifies Alox5ap (Fig. 2), a gene that was recently shown to be important for blood development in a later publication Ibarra-Soria et al. (2018). In two other examples, we used human cells, endocrine Veres et al. (2019) and from brain organoids Gray Camp et al. (2015), showing that the method works robustly in varied conditions.

When compared with the classical method of looking at PCA loadings, gene relevance provides a generalization not tied to this dimension reduction method, as it is also suited for non-linear dimension reduction methods. It can therefore be used to explore the differences in the embeddings obtained by different dimension reduction methods (see Supplementary Fig. S2). Applied to a PCA embedding, gene relevance recovers a similar list of genes to those with the highest PC loadings (see Supplementary Fig. S8). Other methods to identify important genes specifically from scRNA-seq data exist, but most of them aim to find marker genes that can best distinguish different cell types Delaney et al. (2019). Conversely, the method we presented is unsupervised and does not rely on cell type annotation.

Recently, two computational methods have been developed to identify variable genes in spatial RNA-seq datasets, trendsceek and SpatialDE Edsgärd et al. (2018) and Svensson et al. (2018). Although these methods were designed to find patterns in spatial transcriptomic datasets, they can also be used to identify relevant genes in low-dimensional embeddings of scRNA-seq datasets [see Supplementary Fig. S6 in Edsgärd et al. (2018)]. We compared our approach to trendsceek and found similar genes (see Supplementary Fig. S6) in a considerably shorter running time: our method took 6.5 s, whereas trendsceek needed 1080 s (run times measured on a single 2.0 GHz CPU core). SpatialDE returned a perfect score for too many genes, making gene ranking impossible. This is probably related to both methods being optimized toward identifying spatial patterns. Moreover, neither method allows estimation of local gene relevance.

To summarize, our gene relevance method is a fast and versatile exploratory tool that can help identify the biological processes and reveal the presence and driving genes of potentially rare cell sub-populations. It is available online, easily applicable and faster than model fitting approaches. Although we focused our discussion on scRNA-seq datasets, our method can be applied to virtually any kind of dataset where low-dimensional embeddings are obtained, including, for instance, single-cell epigenomic Shema et al. (2019) and mass cytometry data Spitzer and Nolan (2016).

Gene relevance has been developed as part of the Bioconductor package destiny: bioconductor.org/packages/destiny. API documentation for analysis and plotting and are available at is at https://theislab.github.io/destiny/

The datasets analyzed within this publication are available from their original publications as described in Supplementary Table S1

Supplementary Material

btaa198_Supplementary_Data

Click here for additional data file.^{(3MB, zip)}

Acknowledgements

We thank Moritz Thomas and Ali Boushehri for helpful feedback on the article.

Author contributions

P.A. designed the analysis, implemented the method. A.S. interpreted the results and wrote the article with P.A. and C.M. D.F. contributed to the mathematical description of the gene relevance concept. F.T. contributed the initial idea of gene relevance. C.M. supervised the study.

Financial Support: D.S.F. acknowledges financial support by a German research foundation (DFG) fellowship through the Graduate School of Quantitative Biosciences Munich (QBM) [GSC 1006] and by the Joachim Herz Stiftung. F.T. and C.M. acknowledge support from the DFG funded collaborative research center [SFB 1243].

Conflict of Interest: none declared.

References

Angerer P. et al. (2016) Destiny: diffusion maps for large-scale single-cell data in R. Bioinformatics, 32, 1241–1243. [DOI] [PubMed] [Google Scholar]
Angerer P. et al. (2017). Single cells make big data: New challenges and opportunities in transcriptomics.
Baron M. et al. (2016) A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst., 3, 346–360.e4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Becht E. et al. (2019) Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol., 37, 38–44. [DOI] [PubMed] [Google Scholar]
Boukhzar L. et al. (2016). Selenoprotein T exerts an essential oxidoreductase activity that protects dopaminergic neurons in mouse models of Parkinson’s disease. Antioxid. Redox. Signal, 24, 557–574. [DOI] [PMC free article] [PubMed] [Google Scholar]
Brennecke P. et al. (2013) Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods, 10, 1093–1095. [DOI] [PubMed] [Google Scholar]
Coifman R.R. et al. (2005) Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc. Natl. Acad. Sci. USA, 102, 7426–7431. [DOI] [PMC free article] [PubMed] [Google Scholar]
Delaney C. et al. (2019) Combinatorial prediction of gene-marker panels from single-cell transcriptomic data. Mol. Syst. Biol., 15, e9005. [DOI] [PMC free article] [PubMed] [Google Scholar]
Edsgärd D. et al. (2018) Identification of spatial expression trends in single-cell gene expression data. Nat. Methods, 15, 339–342. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gray Camp J. et al. (2015) Human cerebral organoids recapitulate gene expression programs of fetal neocortex development. Proc. Natl. Acad. Sci. USA, 112, 15672–15677. [DOI] [PMC free article] [PubMed] [Google Scholar]
Haghverdi L. et al. (2015) Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics, 31, 2989–2998. [DOI] [PubMed] [Google Scholar]
Husnain M. et al. (2019) Visualization of High-Dimensional Data by Pairwise Fusion Matrices Using t-SNE. Symmetry, 11, 107. 10.3390/sym11010107 [DOI] [Google Scholar]
Ibarra-Soria X. et al. (2018) Defining murine organogenesis at single-cell resolution reveals a role for the leukotriene pathway in regulating blood progenitor formation. Nat. Cell Biol., 20, 127–134. [DOI] [PMC free article] [PubMed] [Google Scholar]
Islam S. et al. (2011) Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res., 21, 1160–1167. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kolodziejczyk A.A. et al. (2015) Single cell RNA-Sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell, 17, 471–485. [DOI] [PMC free article] [PubMed] [Google Scholar]
Latinkić B.V. et al. (2001) Promoter function of the angiogenic inducer Cyr61Gene in transgenic mice: tissue specificity, inducibility during wound healing, and role of the serum response element. Endocrinology, 142, 2549–2557. [DOI] [PubMed] [Google Scholar]
Lin Y.-L. et al. (2017) Cellular retinoic Acid-Binding protein 1 modulates stem cell proliferation to affect learning and memory in male mice. Endocrinology, 158, 3004–3014. [DOI] [PMC free article] [PubMed] [Google Scholar]
McInnes L. et al. (2018) UMAP: Uniform manifold approximation and projection.
Pliner H.A. et al. (2019) Supervised classification enables rapid annotation of cell atlases. Nat. Methods, 16, 983–986. [DOI] [PMC free article] [PubMed] [Google Scholar]
Scialdone A. et al. (2016) Resolving early mesoderm diversification through single-cell expression profiling. Nature, 535, 289–293. [DOI] [PMC free article] [PubMed] [Google Scholar]
Shema E. et al. (2019) Single-cell and single-molecule epigenomics to uncover genome regulation at unprecedented resolution. Nat. Genet., 51, 19–25. [DOI] [PubMed] [Google Scholar]
Soerensen J. et al. (2008) The role of thioredoxin reductases in brain development. PLoS One, 3, e1813. [DOI] [PMC free article] [PubMed] [Google Scholar]
Spitzer M.H., Nolan G.P. (2016) Mass cytometry: single cells, many features. Cell, 165, 780–791. [DOI] [PMC free article] [PubMed] [Google Scholar]
Svensson V. et al. (2018) SpatialDE: identification of spatially variable genes. Nat. Methods, 15, 343–346. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tritschler S. et al. (2017) Systematic single-cell analysis provides new insights into heterogeneity and plasticity of the pancreas. Mol. Metab., 6, 974–990. [DOI] [PMC free article] [PubMed] [Google Scholar]
van der Maaten L., Hinton G. (2008) Visualizing data using t-SNE. J. Mach. Learn. Res., 9, 2579–2605. [Google Scholar]
Vento-Tormo R. et al. (2018) Single-cell reconstruction of the early maternal-fetal interface in humans. Nature, 563, 347–353. [DOI] [PMC free article] [PubMed] [Google Scholar]
Veres A. et al. (2019) Charting cellular identity during human in vitro β-cell differentiation. Nature, 569, 368–373. [DOI] [PMC free article] [PubMed] [Google Scholar]
Webber W. et al. (2010) A similarity measure for indefinite rankings. ACM Trans. Inf. Syst., 28, 1–38. [Google Scholar]
Yahata T. et al. (2002) Cloning of mouse cited4, a member of the CITED family p300/CBP-binding transcriptional coactivators: induced expression in mammary epithelial cells. Genomics, 80, 601–613. [DOI] [PubMed] [Google Scholar]
Yip S. H. et al. (2019) Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data. Briefings in Bioinformatics, 20, 1583–1589. 10.1093/bib/bby011 [DOI] [PMC free article] [PubMed] [Google Scholar]
Yuan G.-C. et al. (2017) Challenges and emerging directions in single-cell analysis. Genome Biol., 18, 84. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zappia L. et al. (2018) Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database. PLoS Comput. Biol., 14, e1006245. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

btaa198_Supplementary_Data

Click here for additional data file.^{(3MB, zip)}

[btaa198-B1] Angerer P. et al. (2016) Destiny: diffusion maps for large-scale single-cell data in R. Bioinformatics, 32, 1241–1243. [DOI] [PubMed] [Google Scholar]

[btaa198-B2] Angerer P. et al. (2017). Single cells make big data: New challenges and opportunities in transcriptomics.

[btaa198-B3] Baron M. et al. (2016) A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst., 3, 346–360.e4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B4] Becht E. et al. (2019) Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol., 37, 38–44. [DOI] [PubMed] [Google Scholar]

[btaa198-B5] Boukhzar L. et al. (2016). Selenoprotein T exerts an essential oxidoreductase activity that protects dopaminergic neurons in mouse models of Parkinson’s disease. Antioxid. Redox. Signal, 24, 557–574. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B6] Brennecke P. et al. (2013) Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods, 10, 1093–1095. [DOI] [PubMed] [Google Scholar]

[btaa198-B7] Coifman R.R. et al. (2005) Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc. Natl. Acad. Sci. USA, 102, 7426–7431. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B8] Delaney C. et al. (2019) Combinatorial prediction of gene-marker panels from single-cell transcriptomic data. Mol. Syst. Biol., 15, e9005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B9] Edsgärd D. et al. (2018) Identification of spatial expression trends in single-cell gene expression data. Nat. Methods, 15, 339–342. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B10] Gray Camp J. et al. (2015) Human cerebral organoids recapitulate gene expression programs of fetal neocortex development. Proc. Natl. Acad. Sci. USA, 112, 15672–15677. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B11] Haghverdi L. et al. (2015) Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics, 31, 2989–2998. [DOI] [PubMed] [Google Scholar]

[btaa198-B12] Husnain M. et al. (2019) Visualization of High-Dimensional Data by Pairwise Fusion Matrices Using t-SNE. Symmetry, 11, 107. 10.3390/sym11010107 [DOI] [Google Scholar]

[btaa198-B13] Ibarra-Soria X. et al. (2018) Defining murine organogenesis at single-cell resolution reveals a role for the leukotriene pathway in regulating blood progenitor formation. Nat. Cell Biol., 20, 127–134. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B14] Islam S. et al. (2011) Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res., 21, 1160–1167. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B15] Kolodziejczyk A.A. et al. (2015) Single cell RNA-Sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell, 17, 471–485. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B16] Latinkić B.V. et al. (2001) Promoter function of the angiogenic inducer Cyr61Gene in transgenic mice: tissue specificity, inducibility during wound healing, and role of the serum response element. Endocrinology, 142, 2549–2557. [DOI] [PubMed] [Google Scholar]

[btaa198-B17] Lin Y.-L. et al. (2017) Cellular retinoic Acid-Binding protein 1 modulates stem cell proliferation to affect learning and memory in male mice. Endocrinology, 158, 3004–3014. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B18] McInnes L. et al. (2018) UMAP: Uniform manifold approximation and projection.

[btaa198-B19] Pliner H.A. et al. (2019) Supervised classification enables rapid annotation of cell atlases. Nat. Methods, 16, 983–986. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B20] Scialdone A. et al. (2016) Resolving early mesoderm diversification through single-cell expression profiling. Nature, 535, 289–293. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B21] Shema E. et al. (2019) Single-cell and single-molecule epigenomics to uncover genome regulation at unprecedented resolution. Nat. Genet., 51, 19–25. [DOI] [PubMed] [Google Scholar]

[btaa198-B22] Soerensen J. et al. (2008) The role of thioredoxin reductases in brain development. PLoS One, 3, e1813. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B23] Spitzer M.H., Nolan G.P. (2016) Mass cytometry: single cells, many features. Cell, 165, 780–791. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B24] Svensson V. et al. (2018) SpatialDE: identification of spatially variable genes. Nat. Methods, 15, 343–346. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B25] Tritschler S. et al. (2017) Systematic single-cell analysis provides new insights into heterogeneity and plasticity of the pancreas. Mol. Metab., 6, 974–990. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B26] van der Maaten L., Hinton G. (2008) Visualizing data using t-SNE. J. Mach. Learn. Res., 9, 2579–2605. [Google Scholar]

[btaa198-B27] Vento-Tormo R. et al. (2018) Single-cell reconstruction of the early maternal-fetal interface in humans. Nature, 563, 347–353. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B28] Veres A. et al. (2019) Charting cellular identity during human in vitro β-cell differentiation. Nature, 569, 368–373. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B29] Webber W. et al. (2010) A similarity measure for indefinite rankings. ACM Trans. Inf. Syst., 28, 1–38. [Google Scholar]

[btaa198-B30] Yahata T. et al. (2002) Cloning of mouse cited4, a member of the CITED family p300/CBP-binding transcriptional coactivators: induced expression in mammary epithelial cells. Genomics, 80, 601–613. [DOI] [PubMed] [Google Scholar]

[btaa198-B31] Yip S. H. et al. (2019) Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data. Briefings in Bioinformatics, 20, 1583–1589. 10.1093/bib/bby011 [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B32] Yuan G.-C. et al. (2017) Challenges and emerging directions in single-cell analysis. Genome Biol., 18, 84. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa198-B33] Zappia L. et al. (2018) Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database. PLoS Comput. Biol., 14, e1006245. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Automatic identification of relevant genes from low-dimensional embeddings of single-cell RNA-seq data

Philipp Angerer

David S Fischer

Fabian J Theis

Antonio Scialdone

Carsten Marr

Roles

Abstract

Motivation

Results

Availability and implementation

Supplementary information

1 Introduction

2 Materials and methods

Fig. 1.

2.1 Neighborhoods

2.2 Gene expression changes

2.3 Local gene relevance

2.4 Global gene relevance

2.5 Local gene relevance plots

Fig. 2.

2.6 Gene relevance maps

2.7 scRNA-seq data

3 Results

4 Discussion

Supplementary Material

Acknowledgements

Author contributions

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Automatic identification of relevant genes from low-dimensional embeddings of single-cell RNA-seq data

Philipp Angerer

David S Fischer

Fabian J Theis

Antonio Scialdone

Carsten Marr

Roles

Abstract

Motivation

Results

Availability and implementation

Supplementary information

1 Introduction

2 Materials and methods

Fig. 1.

2.1 Neighborhoods

2.2 Gene expression changes

2.3 Local gene relevance

2.4 Global gene relevance

2.5 Local gene relevance plots

Fig. 2.

2.6 Gene relevance maps

2.7 scRNA-seq data

3 Results

4 Discussion

Supplementary Material

Acknowledgements

Author contributions

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases