TooManyCellsInteractive: A visualization tool for dynamic exploration of single-cell data

Conor Klamann; Christie J Lau; Javier Ruiz-Ramírez; Gregory W Schwartz

doi:10.1093/gigascience/giae056

GigaScience. 2024 Aug 22;13:giae056. doi: 10.1093/gigascience/giae056

PMCID: PMC11340645 PMID: 39172544

Abstract

Background

As single-cell sequencing technologies continue to advance, the growing volume and complexity of the ensuing data present new analytical challenges. Large cellular populations from single-cell atlases are more difficult to visualize and require extensive processing to identify biologically relevant subpopulations. Managing these workflows is also laborious for technical users and unintuitive for nontechnical users.

Results

We present TooManyCellsInteractive (TMCI), a browser-based JavaScript application for interactive exploration of cell populations. TMCI provides an intuitive interface to visualize and manipulate a radial tree representation of hierarchical cell subpopulations and allows users to easily overlay, filter, and compare biological features at multiple resolutions. Here we describe the software architecture and demonstrate how we used TMCI in a pan-cancer analysis to identify unique survival pathways among drug-tolerant persister cells.

Conclusions

TMCI will facilitate exploration and visualization of large-scale sequencing data in a user-friendly way. TMCI is freely available at https://github.com/schwartzlab-methods/too-many-cells-interactive. An example tree from data within this article is available at https://tmci.schwartzlab.ca/.

Keywords: single-cell sequencing, data visualization, hierarchical clustering, big data, browser-based, interactive graphical user interface, drug-tolerant persister cells, cell line

Introduction

Single-cell sequencing quantifies transcriptomic and epigenomic activity at the resolution of individual cells, which enables unprecedented insight into the cellular landscape of biological processes and diseases. However, current approaches for single-cell visualization were not developed to scale with increasingly complex data produced by high-throughput sequencing technologies—both in terms of the number of measured cells and the number of features measured per cell.

A key component of single-cell analysis is to identify distinct cell states and types present within the experimental sample [1–4]. Most standard visualization workflows begin by collapsing the high-dimensional cell features (e.g., genes or chromosome regions) into 2 dimensions using techniques such as principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), or uniform manifold approximation and projection (UMAP) [5–7]. Current methods apply dimensionality reduction to make the data more amenable to analysis and visualization, a technique that often distorts distances between cells [8–12]. As a result, cells placed closer together on a scatterplot may not necessarily represent cells with higher biological similarity. Dimensionality reduction is commonly followed by unsupervised clustering algorithms such as k-means, Louvain, or Leiden, which are all limited to generating a single-resolution grouping that cannot simultaneously identify subpopulations and is heavily influenced by user-defined parameters [13]. By default, most analysis toolkits also apply clustering on low-dimensional embeddings to reduce computation time, thereby removing potential signals in the data for downstream interpretations. To overcome such limitations, we previously introduced TooManyCells, a suite of tools for cell-clade quantification [1, 2]. The TooManyCells dendrogram depicts all cells starting at the root node, which are recursively bipartitioned at each subsequent child node based on similarity. While TooManyCells preserves distances between cells and presents multiple resolutions of cellular populations, the method produces static representations, and its trees become complex with larger datasets.
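The idea of a recursively bipartitioned dendrogram that exposes several cluster resolutions at once can be illustrated with a toy sketch. This is not the TooManyCells algorithm: the split rule used here (cutting sorted 1-dimensional values at their largest gap) and the names `Node`, `bipartition`, and `clusters_at_depth` are assumptions chosen only to keep the example small.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One node of a binary dendrogram; leaves have no children."""
    items: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bipartition(values: List[float], min_size: int = 2) -> Node:
    """Recursively split values in two at the largest gap (toy criterion)."""
    values = sorted(values)
    node = Node(items=values)
    if len(values) < 2 * min_size:
        return node  # too small to split into two valid children
    # Candidate cut positions leave at least min_size items on each side.
    gaps = [(values[i + 1] - values[i], i)
            for i in range(min_size - 1, len(values) - min_size)]
    gap, cut = max(gaps)
    if gap == 0:
        return node  # all candidate neighbors identical: nothing to separate
    node.left = bipartition(values[:cut + 1], min_size)
    node.right = bipartition(values[cut + 1:], min_size)
    return node


def clusters_at_depth(node: Node, depth: int) -> List[List[float]]:
    """Read a flat clustering off the tree at a chosen resolution."""
    if depth == 0 or node.left is None or node.right is None:
        return [node.items]
    return (clusters_at_depth(node.left, depth - 1)
            + clusters_at_depth(node.right, depth - 1))


tree = bipartition([1, 2, 3, 50, 51, 90, 92])
coarse = clusters_at_depth(tree, 1)
fine = clusters_at_depth(tree, 2)
```

The same tree yields both the coarse and the fine grouping, which is the property a single-resolution clustering lacks.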

In addition to methodological limitations of existing visualizations, many bioinformatic tools are out of reach because they require both computational expertise and biological insight for data exploration. To help bridge this gap, interactive tools such as CELLxGENE [14], Cirrocumulus [15], and others [16, 17] facilitate visualization and inquiry of high-throughput single-cell data. More recent tools were designed to assist with specific challenges within analytical workflows such as read alignment [18], parameter selection [19], compute-intensive processes [20], cell-type annotation [21], and lack of familiarity with programming [22, 23]. However, their approaches toward single-cell data analysis remain fundamentally unchanged from that of conventional workflows and do not address scalability issues. Altogether, the limitations of current interactive visualization approaches inevitably affect our ability to interpret the underlying biology and represent a critical issue in high-throughput data analysis.

To address these limitations, we introduce TooManyCellsInteractive (TMCI), a browser-based JavaScript application for interactive exploration of cell populations. TMCI is an easy-to-use tool that displays single-cell data as a radial tree of nested cell clusters and their relationships, and it can be applied to a variety of different data types, including gene expression from single-cell RNA sequencing (scRNA-seq) [1] and chromatin accessibility from single-cell assay for transposase-accessible chromatin (scATAC-seq) [2]. TMCI works seamlessly with TooManyCells dendrograms, so users can interactively explore the tree structure through a responsive dashboard to quickly and easily retrieve population statistics, manually or statistically alter cluster resolution of the tree, quickly overlay feature information, and batch export the display across thousands of trees (Fig. 1). Here, we demonstrate TMCI’s advantages over commonly used visualization tools by benchmarking across several datasets and highlighting an example use case of TMCI to study drug-tolerance mechanisms across multiple cancer types. With an intuitive interface and flexible export system, TMCI is a robust solution to visualize large single-cell datasets. TMCI is open source and packaged with all dependencies at https://github.com/schwartzlab-methods/too-many-cells-interactive. An example tree from data within this article is available at https://tmci.schwartzlab.ca/.

Figure 1: Overview of the TMCI output interface. (A, B) Direct interactions with the main interface. The user may manually edit the tree in the main interface by stretching or shrinking branches (A) or selecting a new tree root (B). (C) Color picker for cell labels through hex value or slider when selecting the label of choice in the legend. (D, E) Visualization features of the tree branches and nodes, including the disabling of branch scaling (D) and adjusting the width of branches (E), among other visualization features. (F) Live-updating tooltips containing statistics for each node. (G) Breadcrumb toolbar containing previous structural changes as the user interactively prunes the tree based on the distribution of nodes. (H) Fuzzy-search bar to see the overlay of a feature on each node in the tree, such as gene expression for each cellular population. The user may select 1 or several features through the fuzzy-search bar and select thresholds for “high” and “low” cutoffs for simultaneous feature overlays (e.g., both *CD4* and *CD8a*).

Results

Implementation

TMCI consists of a browser-based graphical user interface (Fig. 1), a web server, a relational database, a containerized runtime environment, and a collection of initialization and data-processing scripts (Fig. 2). The TMCI browser application is written in TypeScript, a statically typed superset of JavaScript, and implements a variety of frameworks and libraries to provide a highly interactive graphical user interface. Principal UI elements include an interactive radial tree for data visualization and a dashboard-style panel of input controls enabling users to make real-time adjustments to their plots. Such adjustments include node filtering (“pruning”), scale modification, feature overlay, and manual position adjustment (Supplementary Note S1). For saving the tree, TMCI supports image exporting to both PNG and SVG formats.

Figure 2: The architecture for TMCI. (A) The front-end architecture of TMCI. The user interacts with the D3 tree visualization and React user interface, which sends state-change requests to Redux. This state-tracking feature enables batch processing: the user may upload a configuration state, which the Express server will read without loading the graphical user interface and automatically export the corresponding SVG. (B) The back-end Express server container takes as input the tree structure and cell label files in the Node application. Similarly, a PostgreSQL container reads the matrix files containing the count matrices with features such as gene expression or chromatin accessibility. The Express container manages feature overlays on the tree through PostgreSQL queries in response to front-end requests. Flowcharts are Unified Modeling Language structured diagrams where dashed closed arrows indicate dependencies, dashed open arrows indicate artifacts, solid closed arrows indicate relationships, and solid diamonds are compositions.

The browser application’s base architecture is provided by custom React.js components, while state management is handled by Redux and the interactive plots are created with D3.js, a widely used low-level collection of data visualization modules for scaling, event binding, Document Object Model (DOM) traversal, and high-performance animations (Fig. 2).

The back-end Node application transpiles the TypeScript to JavaScript using a Webpack bundler and serves it to the user’s browser via an Express application (Fig. 2). If users wish to include custom feature overlays in their plots, such as gene expression data, they may upload the data to the PostgreSQL database that has been configured to connect to the Node server (Supplementary Note S2).
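How a relational feature store can serve per-node overlays is sketched below. This is an illustration only. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the table name `expression`, its columns, and the helper functions are hypothetical rather than TMCI's actual schema or API. Cells with no stored value for a feature are assumed to have a value of 0.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple


def build_feature_db(rows: Iterable[Tuple[str, str, float]]) -> sqlite3.Connection:
    """Import (cell_id, feature, value) rows once into an indexed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE expression (cell_id TEXT, feature TEXT, value REAL)")
    # Indexing by feature lets each overlay request avoid a full-table scan.
    conn.execute("CREATE INDEX idx_expression_feature ON expression (feature)")
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def mean_overlay(conn: sqlite3.Connection, feature: str,
                 node_cells: Dict[str, List[str]]) -> Dict[str, float]:
    """Average one feature over the cells belonging to each tree node."""
    cursor = conn.execute(
        "SELECT cell_id, value FROM expression WHERE feature = ?", (feature,))
    values = dict(cursor.fetchall())
    return {node: sum(values.get(cell, 0.0) for cell in cells) / len(cells)
            for node, cells in node_cells.items() if cells}


conn = build_feature_db([("c1", "ID2", 4.0), ("c2", "ID2", 2.0), ("c3", "CD4", 1.0)])
nodes = {"root": ["c1", "c2", "c3"], "left": ["c1", "c2"], "right": ["c3"]}
overlay = mean_overlay(conn, "ID2", nodes)
```

Once the import has run, any number of features can be overlaid by repeating only the query step.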

Both the PostgreSQL database and the Node server run in Docker containers, for which TMCI provides a declarative configuration via Docker Compose. TMCI’s containerized architecture allows it to be run on any computer with Docker installed, and the TMCI codebase includes Bash scripts intended as convenience wrappers around commonly used Docker commands that can be easily extended for custom use.

Because D3.js has no strict browser dependencies, TMCI’s radial tree plots can be rendered without a browser interface. TMCI provides both a Node script and a shell script to enable easy programmatic rendering. The scripts require an additional configuration JSON string that can be exported directly from the browser interface. Thus, users may refine their visualizations in the graphical environment and then reuse their configurations as templates for scripted batch processing on the server.
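The template-and-batch workflow described above could look roughly like the following sketch, which derives one configuration per feature from a single exported JSON string. The keys `features` and `outputName` are hypothetical placeholders. The real configuration schema is whatever the TMCI browser interface exports and should be taken from an actual exported file.

```python
import copy
import json
from typing import List


def batch_configs(template_json: str, features: List[str]) -> List[str]:
    """Return one JSON configuration string per feature, derived from a template."""
    template = json.loads(template_json)
    configs = []
    for feature in features:
        config = copy.deepcopy(template)   # never mutate the shared template
        config["features"] = [feature]     # hypothetical key: feature to overlay
        config["outputName"] = f"tree_{feature}.svg"  # hypothetical key: output file
        configs.append(json.dumps(config, sort_keys=True))
    return configs


template = '{"width": 800, "features": []}'
configs = batch_configs(template, ["ID2", "KRAS"])
```

Each resulting string could then be passed to a rendering script in a loop, so that all display settings tuned in the graphical interface carry over unchanged.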

TMCI reduces time to display trees

To compare the computational time and memory of TMCI with both our original, static implementation and the commonly used single-cell data exploration tools CELLxGENE [14] and Cirrocumulus [15], we developed 5 benchmarks for common single-cell analyses: loading in all single-cell data and generating a visualization (display with features), overlaying colors on the visualization corresponding to a single feature annotation (overlay feature), batch processing 5 sequential feature overlays (overlay multiple features), adjusting the cluster resolution through tree pruning (i.e., reducing the size of the tree by collapsing child nodes into the parent node; prune tree), and rendering the visualization itself without loading the full read matrix (tree display). We ran these benchmarks using 54,220 cells from a scRNA-seq dataset of 11 samples across 5 cancer cell lines (Fig. 3A, B), 18,859 cells (“subset”; Fig. 3C, D) and 41,668 cells (Fig. 3E, F) from the Tabula Muris dataset containing 10 mouse organs [24], and 483,152 cells from the Tabula Sapiens dataset of 24 tissues and organs from the human body [25].

Figure 3: Comparative analysis of performance. (A–H) Comparisons on the x-axis including, from left to right, loading a count matrix and displaying a visualization (all programs), overlaying a single feature on a tree (tree programs only), batch processing 5 features on a tree (tree programs only), pruning a larger tree (tree programs only), and rendering a tree without the count matrix loaded (tree programs only). Comparisons were split by 11 samples from 5 cancer cell lines in response to drug treatment (A, B), a set of 10 samples from mouse tissues [24] (C, D), a set of 24 samples from mouse tissues (E, F), and 24 human tissues and organs [25] (G, H), measuring time (A, C, E, G) or memory usage (B, D, F, H). TMC: TooManyCells; X: Incomplete due to insufficient memory.

To assess the baseline performance of each program, we compared the time and memory needed to display trees without feature overlays on our cancer cell line dataset, meaning that no matrix processing was required. Inputs for both programs were the tree and label files generated by TooManyCells, and we ran each benchmark 5 times to account for potential variability. TMCI was approximately 4-fold faster than TooManyCells in the cancer cell line dataset (mean 1.07 vs. 4.62 seconds, t-test; Fig. 3A), demonstrating a substantial speed improvement with our new implementation. Importantly, this upgrade did not come at the cost of memory, as TMCI used 120 MB less memory than TooManyCells (mean 188 vs. 308 MB, t-test; Fig. 3B). As this benchmark did not alter the structure of the tree, we next compared tools by pruning the tree so that every node contained no fewer than 1,000 cells. With this additional processing, TMCI used approximately the same amount of resources as for the unpruned tree, while TooManyCells improved its performance to a mean of 1.64 seconds (t-test) and 300 MB of RAM (t-test; Fig. 3A, B). While the performance increase of TMCI over TooManyCells was consistent across datasets, some gains reached 20-fold, as with the larger Tabula Muris dataset, and TooManyCells had insufficient memory to complete the visualization on the Tabula Sapiens dataset (Fig. 3C–H, Supplementary Tables S1–S4).
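The size-based pruning used in this benchmark (collapsing child nodes into their parent so that no remaining node holds fewer than a minimum number of cells) can be sketched as follows. The `TreeNode` class and the function names are illustrative assumptions, not TMCI's internal data structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    name: str
    cells: int                                   # number of cells under this node
    children: List["TreeNode"] = field(default_factory=list)


def prune_by_size(node: TreeNode, min_cells: int) -> TreeNode:
    """Return a copy in which any split producing a too-small child is collapsed."""
    if node.children and any(child.cells < min_cells for child in node.children):
        # Collapse the whole split: the parent becomes a leaf holding all its cells.
        return TreeNode(node.name, node.cells)
    pruned_children = [prune_by_size(child, min_cells) for child in node.children]
    return TreeNode(node.name, node.cells, pruned_children)


def count_nodes(node: TreeNode) -> int:
    return 1 + sum(count_nodes(child) for child in node.children)


tree = TreeNode("root", 5000, [
    TreeNode("A", 3500, [TreeNode("A1", 2000), TreeNode("A2", 1500)]),
    TreeNode("B", 1500, [TreeNode("B1", 900), TreeNode("B2", 600)]),
])
pruned = prune_by_size(tree, min_cells=1000)
```

In this example, node B's children fall below the threshold and are absorbed into B, while the split under A survives.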

Although TMCI outperformed TooManyCells in tree display and pruning, these benchmarks did not account for matrix processing. As such, we next compared the performance of each program when rendering feature overlays, which introduces the resource-intensive task of retrieving expression data from the matrix. For a single feature on the cancer cell line dataset, TMCI outperformed TooManyCells in task duration (mean 554 vs. 1,021 seconds, t-test; Fig. 3A). This advantage extended to memory usage, which was greatly reduced for TMCI (mean 19.9 vs. 89.8 GB, t-test; Fig. 3B). While this test displayed significant memory gains for TMCI over TooManyCells, a more applicable benchmark is to batch process the creation of several graphics from a single tree with varying gene expression overlays. In this benchmark of 5 features, TMCI outperformed TooManyCells in both time (mean 577 vs. 5,102 seconds) and memory (mean 19.9 vs. 89.8 GB; Fig. 3A, B) usage due to its persistent feature database, which enables TMCI to generate any number of images after only a single data import operation. TooManyCells, on the other hand, must process the matrix for each new graphic, leading to linear O(n) performance, where n is the number of feature overlays requested. As a result, TMCI was able to generate 5 trees in just 23 seconds longer than a single tree, while TooManyCells took 5 times longer than for a single tree. These observations were consistent through all datasets (Fig. 3C–F, Supplementary Tables S1–S4).
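The difference between a one-time import followed by cheap overlays and reprocessing the matrix for every graphic can be made concrete with a small cost model. The sketch below fits a fixed-plus-marginal cost to the two TMCI means reported above (554 seconds for 1 overlay, 577 seconds for 5) and compares it with a purely linear cost at the reported single-overlay TooManyCells mean (1,021 seconds). The two-point fit and any extrapolation beyond 5 overlays are assumptions for illustration, not measurements.

```python
from typing import Tuple


def fit_amortized(t_one: float, t_many: float, n_many: int) -> Tuple[float, float]:
    """Split total time into a one-time fixed cost and a per-overlay cost."""
    per_overlay = (t_many - t_one) / (n_many - 1)
    fixed = t_one - per_overlay
    return fixed, per_overlay


def amortized_cost(fixed: float, per_overlay: float, n: int) -> float:
    """Import once, then pay a small cost for each overlay."""
    return fixed + per_overlay * n


def linear_cost(per_graphic: float, n: int) -> float:
    """Reprocess the matrix for every graphic: O(n) in the number of overlays."""
    return per_graphic * n


fixed, per_overlay = fit_amortized(554.0, 577.0, 5)
tmci_5 = amortized_cost(fixed, per_overlay, 5)
tmc_5 = linear_cost(1021.0, 5)
```

Under this model the marginal cost of an additional TMCI overlay is a few seconds, whereas each additional TooManyCells graphic costs roughly as much as the first, which is consistent with the reported 5-overlay mean of 5,102 seconds.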

To compare with non-tree-based methods, we measured the performance of loading an entire single-cell dataset and producing a visualization. For our cancer cell line dataset, Cirrocumulus was the fastest (mean 331 seconds), with CELLxGENE (mean 528 seconds) and TMCI (mean 550 seconds) close behind, and TooManyCells being the slowest (mean 1,009 seconds; Fig. 3A). Likewise, Cirrocumulus had the lowest memory usage (mean 9.34 GB), followed by CELLxGENE (mean 11.3 GB), TMCI (mean 19.9 GB), and TooManyCells (mean 89.8 GB; Fig. 3B). Importantly, TMCI displays all cluster resolutions, while CELLxGENE and Cirrocumulus only show “flat” clusterings; even so, TMCI’s performance is closer to that of these 2 tools than to that of TooManyCells, which is significantly more resource heavy (Supplementary Tables S1 and S2). These observations are consistent in the Tabula Muris datasets but not in the larger Tabula Sapiens dataset, where TMCI has the second lowest time and memory usage, outperforming CELLxGENE (Fig. 3E–H, Supplementary Tables S1–S4). Together, these benchmarks indicate not only that TMCI generates static and interactive trees of single-cell data across multiple clustering resolutions with performance comparable to other tools but also that it can quickly and efficiently batch process many trees at once.

Case study: TMCI effectively delineates subpopulations of cancer drug-tolerant persister cells

To demonstrate the utility of TMCI for quantification and visualization of relationships between diverse single-cell datasets, we explored the transcriptional differences induced by short-term (1–3 days) and long-term (6–7 weeks) treatment of cancer cells in vitro. While treatment eliminates the majority of cancer cells, rare populations of drug-tolerant persister cells survive and may potentially act as a reservoir for drug-resistant growth [26]. Persister cells are characterized by a nongenetic, slow-cycling state that is reversible; upon drug holiday, persister cells are resensitized to treatment [27]. We sought to better understand the differences between short- and long-term treatment exposure in these persister cells using TMCI. To this end, we aggregated publicly available scRNA-seq data from 5 independent cancer persister cell experiments across various disease areas and treatment modalities (Fig. 4A, Table 1) [1, 28–31]. The TMCI visualization identified distinct separation between cancer cell lines, followed by division of control and treatment arms (Fig. 4B). This hierarchy suggests that cells of a given cancer type, regardless of drug treatment, are more transcriptionally similar to one another than persister cells across cancer types for most populations.

Figure 4: TMCI identifies distinct transcriptional programs across short-term and long-term treated drug-tolerant persister cells across cancer types. (A) Counts of T-cell acute lymphoblastic leukemia (DND-41), prostate cancer (LNCaP), melanoma (SK-MEL-28), lung cancer (PC9), and breast cancer (MDA-MB-231) cells from public persister cell scRNA-seq experiments. Each cell line (control) received a short-term (1–3 days) or long-term (6–7 weeks) anticancer treatment. (B, C) TMCI tree of cells from (A) colored by cell line and treatment condition (B) or by average expression of *ID2* per node (C). (D) Box-and-whisker plot of *ID2* expression for cells from each treatment condition. (E) The top 10 enriched pathways determined by Metascape [40] of the top 100 upregulated genes among short-term (top) and long-term (bottom) treated cells. Pathways relevant to the regulation of cellular proliferation are highlighted in red. (F, G) Normalized enrichment scores (NES) and respective q-values from gene set enrichment analysis (GSEA) [41] of short-term (F) and long-term (G) treated cells compared to corresponding control cells. “FISCHER_G2_M_CELL_CYCLE” is highlighted in red. (H) GSEA curve of the “FISCHER_G2_M_CELL_CYCLE” gene set for short-term treated cells against untreated cells. (I) NES scores of the “FISCHER_G2_M_CELL_CYCLE” gene set for each node against all other nodes in the TMCI tree from (B). (J) TMCI tree from (B) colored by diapause gene signature. (K) Box-and-whisker plot of diapause signature scores for cells from each treatment condition. For all box-and-whisker plots, center line: median, box bounds: interquartile range, whiskers: minimum and maximum scores. Statistical annotations represent results of a 1-sided Mann–Whitney U test with Benjamini–Hochberg correction. ***, ****, ns: not significant.

Table 1: Human cancer cell lines from single-cell RNA-sequencing persister cell experiments used in this case study. Corresponding anticancer drugs, treatment duration, and GEO accession numbers are listed.

Disease area	Cell line	Treatment	Duration	GEO accession
Prostate cancer	LNCaP	DMSO	48 h	GSM5155455
		Enzalutamide	48 h	GSM5155456
Melanoma	SK-MEL-28	Untreated		GSM4932163
		Dabrafenib	72 h	GSM4932166
Non–small cell lung cancer	PC9	Untreated		GSM3972651
		Erlotinib	72 h	GSM3972652
Breast cancer	MDA-MB-231	Untreated		GSM4684556
		Doxorubicin	7 wk	GSM4684557
T-cell acute lymphoblastic leukemia	DND-41	DMSO	24 h	GSM4121361
		Compound E	24 h	GSM4121362
		Compound E	6 wk	GSM4121364

TMCI identified differentially expressed ID2 across persister cell populations

To understand how survival programs could be affected by the duration of treatment, we sought to characterize the unique expression profiles among persister cell populations. We identified differentially expressed genes between control and persister cells of each cell line separately and aggregated the complete list of genes using rank product analysis [32]. The batch functionality of TMCI allowed us to efficiently visualize the distribution of the top-ranking differentially expressed genes across the entire dataset collection. From these visualizations, we identified ID2 as one of the most highly upregulated genes across long-term treated cells in comparison to controls (rank product: 4, permutation test) but not among the short-term treated cells (rank product: 380, permutation test) (Fig. 4C, Supplementary Tables S5 and S6). Comparison of ID2 expression between each control and corresponding treatment arm showed a significant increase of fold change values for all cell lines (Mann–Whitney U test), regardless of treatment duration, with the exception of short-term treated DND-41 cells (Fig. 4D, Supplementary Table S7). ID2 is known to play a role in tumorigenesis as a key regulator of cell cycle progression, and overexpression of ID2 in cell line experiments modulates proliferative capacity and cell invasiveness [33, 34]. Differential ID2 expression in our analysis suggests varying proliferative activity between treatment durations.
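As a rough illustration of how per-cell-line rankings can be aggregated, the sketch below computes a rank product as the geometric mean of a gene's ranks across datasets, with rank 1 assigned to the most upregulated gene. This convention, the tie handling (arbitrary), and the toy fold-change values are assumptions. The permutation test used to assess significance is omitted, and the exact procedure of reference [32] may differ.

```python
import math
from typing import Dict, List


def rank_genes(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank genes within one dataset; rank 1 is the highest score."""
    ordered = sorted(scores, key=lambda gene: scores[gene], reverse=True)
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def rank_product(datasets: List[Dict[str, float]]) -> Dict[str, float]:
    """Geometric mean of per-dataset ranks for genes present in every dataset."""
    ranks = [rank_genes(scores) for scores in datasets]
    shared = set(ranks[0]).intersection(*ranks[1:])
    return {gene: math.prod(r[gene] for r in ranks) ** (1.0 / len(ranks))
            for gene in shared}


# Toy fold changes for two hypothetical cell lines (not data from this study).
line_a = {"ID2": 3.0, "KRAS": 1.0, "TBK1": 2.0}
line_b = {"ID2": 2.5, "KRAS": 2.0, "TBK1": 0.5}
products = rank_product([line_a, line_b])
top_gene = min(products, key=products.get)
```

A gene that ranks near the top in every cell line receives a small rank product, whereas a gene that is strongly upregulated in only one line is penalized by its poor ranks elsewhere.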

From the tree structure, we noticed a subset of treated MDA-MB-231 breast cancer cells with particularly high ID2 expression that did not group together with the predominant cell line cluster (Supplementary Fig. S1a). Rather, this subset in node 4 grouped more closely with PC9 lung cancer cells than with other cells of the same disease type and treatment condition in node 126. To explore the differences underlying this distinct cell state, we performed differential expression analysis comparing treated MDA-MB-231 cells of node 4 against node 126 (Supplementary Table S8). Metascape analysis of the top 100 most downregulated genes identified “negative regulation of cell differentiation” (hypergeometric test) and “PTEN regulation” (hypergeometric test) as significantly enriched pathways (Supplementary Fig. S1b). We corroborated these results through gene set enrichment analysis, which identified signals of dysregulated ID2, KRAS, PTEN, and YAP1 expression among the most differentially represented oncogenic signatures (Supplementary Table S9). Interestingly, many of these signatures were derived from RNA interference screens of KRAS^G13D-mutant cell lines for synthetic lethal targets [35]. MDA-MB-231 cells also harbor this oncogenic mutation, which drives constitutive signaling of the KRAS^G13D protein [36, 37]. However, we found significant downregulation of KRAS expression and underrepresentation of its target genes, accompanied by upregulation of KRAS-mutant synthetic lethal vulnerabilities TBK1, YAP1, and STK33 within the subset of interest [38, 39] (Mann–Whitney U tests: KRAS FC 1.80; TBK1 FC 0.682; YAP1 FC 0.932; STK33 FC 1.312; Supplementary Table S8, Supplementary Fig. S1c). Altogether, these findings suggest multiple treated populations, one of which undergoes activation of KRAS-mutant compensatory signaling within a subset of treated MDA-MB-231 cells, and demonstrate some of the advantages of a tree-based approach for single-cell analysis.

TMCI identifies distinct proliferation mechanisms within persister cell populations

To interrogate the ongoing biological mechanisms within short- and long-term treated persister cell populations, we performed pathway analysis using the top 100 upregulated differentially expressed genes in the treated cells. Metascape [40] analysis of the differentially expressed genes from short-term treated cells identified “negative regulation of cell population proliferation” as a key biological process (hypergeometric test; Fig. 4E). Conversely, the same analysis performed on differentially expressed genes from long-term treated cells identified “cell population proliferation” enrichment, suggesting an increase of cellular proliferation across pathways (hypergeometric test; Fig. 4E). Subsequent exploration of the full list of differentially expressed genes using gene set enrichment analysis [41] returned markedly distinct biological programs between the short- and long-term treated populations. Among short-term treated populations, the most significantly decreased hits were found to be associated with various proliferation and cell cycle regulation programs. In line with our previous findings, these programs were not significantly downregulated among long-term treated cells (Supplementary Table S10). Among these gene sets, we found the expression of “FISCHER_G2_M_CELL_CYCLE” significantly decreased among short-term treated cells (Kolmogorov–Smirnov test) but not among long-term treated cells (Kolmogorov–Smirnov test; Fig. 4F–I, Supplementary Fig. S2a). Consistent with this observation, additional G2M checkpoint and E2F target gene sets showed similar patterns (Supplementary Table S11). These findings suggest that persister cells utilize distinct pathways associated with modulation of proliferation and cell cycling throughout the duration of treatment.
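The hypergeometric test behind such pathway enrichment asks how likely it is to see at least the observed overlap between a gene list and a pathway by chance. A minimal sketch of that upper-tail probability follows. The function name and the small example numbers are illustrative, and tools such as Metascape apply additional steps (for example, background selection and multiple-testing correction) that are not shown.

```python
from math import comb


def hypergeom_upper_tail(population: int, pathway_size: int,
                         drawn: int, overlap: int) -> float:
    """P(X >= overlap) when `drawn` genes are sampled without replacement
    from `population` genes, of which `pathway_size` belong to the pathway."""
    total = comb(population, drawn)
    largest = min(pathway_size, drawn)
    favorable = sum(
        comb(pathway_size, i) * comb(population - pathway_size, drawn - i)
        for i in range(overlap, largest + 1)
    )
    return favorable / total


# Toy numbers: 10 genes in total, 4 in the pathway, 3 selected, 2 overlapping.
p_value = hypergeom_upper_tail(population=10, pathway_size=4, drawn=3, overlap=2)
```

For a real analysis, `drawn` would correspond to the 100 top-ranked genes and `population` to the background gene set, both of which are choices made by the enrichment tool.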

TMCI identifies subpopulations with highly expressed diapause programs

As we identified proliferation and cell cycle factors associated with treatment duration, we were interested in understanding the temporal expression of diapause programs within the various persister cell populations. Diapause is a reversible state of suspended embryonic development triggered by adverse environmental conditions [42]. Similarly, persister cells that survive throughout exposure to treatment undergo transcriptional adaptations resembling a diapause-like state [43, 44]. Overlaying diapause gene signature scores on the tree structure showed enrichment in all treated subpopulations compared to controls (Fig. 4J, Supplementary Fig. S2b).

Comparison between each control and treatment arm showed significantly increased diapause signature scores in all treated cell lines, again regardless of treatment duration (Mann–Whitney U test; Fig. 4K, Supplementary Fig. S2c–f, Supplementary Table S12). For DND-41, which includes measurements of both short- and long-term treatment durations, the median diapause signature score increased from control to short term to long term, suggesting a direct correlation between diapause gene signature scores and treatment duration. Having confirmed that the visually apparent differences in diapause signature scores within each cell line were significant, we compared the TMCI visualization against a traditional scatterplot generated with CELLxGENE (Supplementary Fig. S2g–h, Supplementary Fig. S3, Supplementary Note S4). Although TMCI and CELLxGENE had the same diapause signature scores, the significantly different subpopulations were more easily seen in TMCI’s tree. Together, our analysis points to persister cells with different proliferation activity depending on treatment duration.
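Because one test is run per cell line and treatment arm, the Figure 4 legend notes that P values were corrected with the Benjamini–Hochberg procedure. The sketch below shows only that correction step on made-up P values. The Mann–Whitney U tests themselves are not reimplemented here, and the function name is an assumption.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up P values for four hypothetical comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

Each adjusted value is the smallest false discovery rate at which the corresponding comparison would still be called significant.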

Discussion

As high-throughput single-cell technologies continue to measure increasing numbers of cells, we need new visualization tools to better identify and interpret cell states. Here we present TMCI as a powerful, interactive solution that simplifies data exploration of large datasets. These visualizations are intuitive, supporting easy tree manipulation through statistical or manual pruning, color mapping, feature overlays, and more. With these features, identification of rare cellular populations is straightforward compared to previous iterations of single-cell data figures. Importantly, these benefits are not at the cost of performance, with TMCI either outperforming or on par with alternative interactive visualizations. As we implemented TMCI as a web server, users can easily and quickly access large datasets with little computational impact on their local host. As a result of TMCI’s speed, its batch-processing capability allows for quick plotting of thousands of trees derived from a single, manually customized tree.

Using the numerous features afforded by TMCI, we delineated cellular populations from drug-treated cancer cell lines and identified distinct transcriptional programs between short- and long-term treated cell lines. These programs included cell proliferation pathways that were downregulated in short-term persister cell states, a downregulation that was subsequently lost in the long-term cellular populations across all cancer types measured. This finding extended to the diapause signature, which was increased in persister cells, in concordance with previous studies, but here across cancer types. Together, TMCI identified transcriptional programs that are dependent on treatment duration, warranting further investigation of the timing of treatment for persister cells.

Although TMCI is a feature-rich application for tree structure exploration, several future directions could enhance its capabilities. While projection-based visualizations such as t-SNE and UMAP have limited capabilities in identifying cell relationships, they are still widely used in the single-cell sequencing analysis community. To link these visualizations together, a new user interface could be created for simultaneous investigation, similar to Sleepwalk [9]. These combinations of multiple embeddings may also include other dashboard features such as gene expression heatmaps and enriched pathways. Furthermore, TMCI currently displays relationships generated from a single data modality. As new multiomic technologies sequence both RNA and chromatin accessibility or protein from the same cell, there exist new opportunities for TMCI to integrate multiple data modalities in a tree structure. In the meantime, through our application to drug-treated cancer cell lines, we show that big data visualization tools will be necessary as available data grow, and we provide TMCI as a solution for visualizing tree-based relationships in such data.

Materials and Methods

Benchmarks

We performed benchmarks using an AWS EC2 instance running Ubuntu 20.04 and Docker 20.10.17 with 64x Intel Xeon Platinum 8375C CPU @ 2.90 GHz and 534 GB RAM. We compared CELLxGENE [14], Cirrocumulus [15], TooManyCells [1, 2], and TMCI using 54,220 cells from 5 cancer cell lines, 41,668 cells from the Tabula Muris dataset [24], a smaller subset of 18,859 cells from the Tabula Muris dataset, and 483,152 cells from the Tabula Sapiens dataset [25]. We devised 5 benchmarks for compute time and memory usage, some of which were specific to tree-based approaches. For all methods, we loaded all single-cell data and ran the default options to generate visualizations (display with features). Building on this benchmark, we also overlaid a single color on the visualization corresponding to a single feature annotation (overlay feature) and used batch processing for 5 sequential feature overlays (overlay multiple features). Specific to tree-based approaches, we measured tree pruning by collapsing child nodes into the parent node (prune tree). We also benchmarked performance when only displaying the full tree visualization without loading the entire single-cell matrix (tree display). We ran each benchmark 5 times to account for variability in processing time and memory.
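The repeated-measurement scheme described above can be illustrated with a minimal Python sketch. It is not the harness used for the published benchmarks, which is not described here; the function names and the toy task are hypothetical, and `tracemalloc` only tracks memory allocated by the Python interpreter, so it is a stand-in for whatever process-level memory measurement a real benchmark of a browser or server application would need.

```python
import statistics
import time
import tracemalloc
from typing import Callable, Dict, List


def benchmark(task: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Run `task` `repeats` times; report mean wall-clock time and mean peak memory."""
    seconds: List[float] = []
    peak_bytes: List[int] = []
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        task()
        seconds.append(time.perf_counter() - start)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        peak_bytes.append(peak)
    return {
        "runs": float(repeats),
        "mean_seconds": statistics.mean(seconds),
        "sd_seconds": statistics.stdev(seconds) if repeats > 1 else 0.0,
        "mean_peak_mib": statistics.mean(peak_bytes) / 2**20,
    }


def toy_display_task() -> int:
    # Hypothetical stand-in for "load the data and render the visualization".
    return sum(len(str(i)) for i in range(50_000))


result = benchmark(toy_display_task, repeats=5)
print(result)
```

Repeating each task and reporting the mean alongside a spread makes run-to-run variability in time and memory visible rather than hiding it in a single measurement.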

Preprocessing of drug-treated cancer scRNA-seq data

To demonstrate the utility of TMCI, we investigated drug-tolerant persister cell populations, which are capable of surviving anticancer drug treatment through nongenetic programming of reversible mechanisms [27]. We aggregated publicly available scRNA-seq data from 5 in vitro persister experiments, including prostate cancer, melanoma, non–small cell lung cancer, breast cancer, and T-cell acute lymphoblastic leukemia cell lines (Table 1). The duration of anticancer drug treatment for each cell line varied from short term (2–3 days) to long term (6–7 weeks), enabling the identification of persister cells across cancer types and time. All datasets were previously generated using similar library preparation methods (10x Genomics 3′ Single Cell Gene Expression), sequencing platforms (Illumina), and alignment pipelines (Cell Ranger). After manual checks to verify that the files contained raw read count data, we aggregated the matrices using AnnData and Scanpy [4] tools in Python. We applied all normalization and filtering using the TooManyCells command-line tool, based on its original default parameters of term frequency-inverse document frequency (TF-IDF) normalization and filtering for cells expressing at least 250 transcripts and genes detected in at least 1 cell [1]. For other batch-effect correction techniques such as Harmony [45], we recommend using our TooManyCells (à la Python) Python implementation, which better handles noncount, transformed embeddings and is fully compatible with Scanpy [4] (Supplementary Note S3).
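As an illustration of the filtering and normalization steps above, the following pure-Python sketch applies the two thresholds and a textbook TF-IDF weighting to a toy count matrix. Several points are assumptions rather than a description of the TooManyCells implementation: "at least 250 transcripts" is interpreted as a total count per cell, the TF-IDF variant shown (count divided by cell total, multiplied by the log of the inverse fraction of cells expressing the gene) is one common formulation and may differ from the one the tool uses, and all function names and values are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are cells, columns are genes


def filter_counts(counts: Matrix, min_transcripts: int = 250,
                  min_cells: int = 1) -> Tuple[Matrix, List[int], List[int]]:
    """Keep cells with enough total counts, then genes detected in enough kept cells."""
    kept_cells = [i for i, row in enumerate(counts) if sum(row) >= min_transcripts]
    n_genes = len(counts[0]) if counts else 0
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]
    filtered = [[counts[i][j] for j in kept_genes] for i in kept_cells]
    return filtered, kept_cells, kept_genes


def tf_idf(counts: Matrix) -> Matrix:
    """Textbook TF-IDF: (count / cell total) * log(n_cells / cells expressing the gene)."""
    n_cells = len(counts)
    n_genes = len(counts[0]) if counts else 0
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(n_genes)]
    weighted = []
    for row in counts:
        total = sum(row)
        weighted.append([
            (row[j] / total) * math.log(n_cells / doc_freq[j]) if row[j] > 0 else 0.0
            for j in range(n_genes)
        ])
    return weighted


# Toy matrix with made-up counts: 3 cells by 4 genes.
toy = [
    [200, 100, 0, 0],   # 300 transcripts: kept
    [10, 5, 0, 0],      # 15 transcripts: dropped
    [0, 150, 0, 150],   # 300 transcripts: kept
]
filtered, cells, genes = filter_counts(toy)
weighted = tf_idf(filtered)
print(cells, genes)
print(weighted)
```

In this toy case the second cell and the third gene are removed, and the gene expressed in every remaining cell receives a weight of zero, which shows how TF-IDF downweights ubiquitously detected genes.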

Generating drug-treated cancer cell trees

After data normalization and filtering, we used TooManyCells to generate a tree and identify transcriptionally distinct subpopulations within our dataset [1]. In brief, TooManyCells implements a matrix-free hierarchical spectral clustering approach [46] to recursively partition scRNA-seq cell data into similar groups, and it uses Newman–Girvan modularity [47] as an indicator for reaching a leaf in the tree. The resulting tree structure depicts all cells at the central root node, with subdividing branches for each group partition until any additional split would be considered random. This information is encoded in the cluster_tree.json output file and can be viewed interactively through TMCI. We used the resulting tree structure groupings as input for TMCI, through which we applied minimum distance search pruning at a cutoff of 0.019 to improve the visibility of small subpopulations.
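The role of Newman–Girvan modularity as a stopping criterion can be made concrete with a small sketch. The code below computes modularity for a candidate partition of an explicit adjacency matrix and accepts the split only if the value is positive. This is an illustration under assumptions: TooManyCells is described above as matrix-free, so it does not build an adjacency matrix like this one, and the zero threshold, the function names, and the toy graph are hypothetical choices made for the example.

```python
from typing import Sequence


def modularity(adjacency: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Newman-Girvan modularity Q of a labeled partition of an undirected weighted graph."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)  # twice the total edge weight for a symmetric matrix
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed weight minus the weight expected under random wiring.
                q += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


def accept_split(adjacency: Sequence[Sequence[float]], labels: Sequence[int],
                 threshold: float = 0.0) -> bool:
    """Accept a bipartition only if it is better than expected at random (assumed cutoff)."""
    return modularity(adjacency, labels) > threshold


# Toy graph: two triangles joined by a single edge between nodes 2 and 3.
two_cliques = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
good = modularity(two_cliques, [0, 0, 0, 1, 1, 1])  # split along the bridge
bad = modularity(two_cliques, [0, 1, 0, 1, 0, 1])   # split that cuts both triangles
print(good, bad)
```

Splitting along the bridge yields a positive modularity, whereas the partition that cuts through both triangles yields a negative one and would be rejected, which mirrors the idea of stopping when a further split is no better than random.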

Measuring differential expression across cellular populations

Using the tree structure, we conducted differential gene expression analysis between control and persister cell states of each cell line individually, using the TooManyCells “differential” functionality with upper quartile normalized read counts. From the resulting fold change values, we aggregated a list of differentially expressed genes across cell lines using rank product analysis [32]. These results identified genes that are more broadly associated with the persister state across drug treatments and cancer disease types. We used the batch functionality of TMCI to iterate through the list of top-ranking gene targets and visually identified ID2 as a potential target of interest on account of its high expression among long-term treated cell lines, which we did not observe across short-term treated persister cells. These findings were corroborated by statistical comparisons of expression between treatment conditions within each given cell line, highlighting ID2 as a target of interest.
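The rank product aggregation described above can be sketched as the geometric mean of each gene's rank across cell lines. The snippet below is a minimal illustration and not the analysis code for this article: it omits the permutation-based significance estimates of the rank product method [32], breaks ties by gene name for simplicity, and uses hypothetical cell line labels, gene names, and fold change values.

```python
import math
from typing import Dict, List, Tuple


def rank_product(fold_changes: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Geometric mean of per-experiment ranks (rank 1 = largest fold change)."""
    experiments = list(fold_changes.values())
    shared = set(experiments[0])
    for exp in experiments[1:]:
        shared &= set(exp)  # only genes measured in every experiment
    log_rank_sum = {gene: 0.0 for gene in shared}
    for exp in experiments:
        ordered = sorted(shared, key=lambda g: (-exp[g], g))
        for rank, gene in enumerate(ordered, start=1):
            log_rank_sum[gene] += math.log(rank)
    k = len(experiments)
    scores = [(gene, math.exp(total / k)) for gene, total in log_rank_sum.items()]
    # Smaller rank product = more consistently upregulated across experiments.
    return sorted(scores, key=lambda item: (item[1], item[0]))


# Made-up fold changes for three hypothetical cell lines.
toy = {
    "line_1": {"gene_a": 2.0, "gene_b": 0.5, "gene_c": -1.0},
    "line_2": {"gene_a": 1.5, "gene_b": 1.8, "gene_c": -0.2},
    "line_3": {"gene_a": 3.0, "gene_b": 0.1, "gene_c": 0.4},
}
ranked = rank_product(toy)
print(ranked)
```

A gene that ranks near the top in every cell line obtains a small rank product, so sorting by this value surfaces genes whose change is consistent across experiments rather than large in only one of them.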

To explore the biological mechanisms associated with each cell state, we conducted gene set enrichment analysis [41] across the tree structure. For this analysis, we calculated the fold change values of each node against all other cells using the Scanpy “rank_genes_groups” method. With each ordered gene list, we ran the GSEApy “preranked” module with MSigDB Hallmark, C2 (curated), and C6 (oncogenic) gene sets [48]. We used 2-sided statistical tests for all analyses.
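To show what a preranked enrichment analysis computes from an ordered gene list, the sketch below implements the weighted running-sum enrichment score of gene set enrichment analysis [41] in plain Python. It is an illustration only and not the GSEApy implementation: it computes the raw enrichment score without permutation testing or normalization, and the function name, gene names, and scores are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def enrichment_score(ranked: Sequence[Tuple[str, float]], gene_set: Set[str],
                     weight: float = 1.0) -> Tuple[float, List[float]]:
    """Return (ES, running sum) for genes pre-sorted from highest to lowest score."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    hit_norm = sum(abs(score) ** weight for (_, score), hit in zip(ranked, hits) if hit)
    if n_hits == 0 or n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running: List[float] = []
    total = 0.0
    for (_, score), hit in zip(ranked, hits):
        if hit:
            total += abs(score) ** weight / hit_norm  # step up, weighted by score
        else:
            total -= 1.0 / n_miss                     # uniform step down
        running.append(total)
    # The enrichment score is the running-sum value farthest from zero.
    return max(running, key=abs), running


# Made-up ranked list, ordered from most upregulated to most downregulated.
ranked_genes = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0),
                ("g4", -1.0), ("g5", -2.0), ("g6", -3.0)]
es_top, curve_top = enrichment_score(ranked_genes, {"g1", "g2"})
es_bottom, curve_bottom = enrichment_score(ranked_genes, {"g5", "g6"})
print(es_top, es_bottom)
```

A gene set concentrated at the top of the list produces a positive score and one concentrated at the bottom a negative score; the running sum is the quantity plotted in a GSEA curve.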

Availability of Supporting Source Code and Requirements

1. Project name: TooManyCellsInteractive

Project homepage: https://github.com/schwartzlab-methods/too-many-cells-interactive

Operating system: Platform independent

Programming language: TypeScript, JavaScript

Other requirements: Docker, Docker-compose

License: GNU General Public License v3.0

RRID:SCR_025315

Note: Archival versions of the code are available via Software Heritage [49] and figshare [50], with a tutorial at https://schwartzlab-methods.github.io/too-many-cells-interactive/.

2. Project name: TooManyCells (à la Python)

Project homepage: https://github.com/schwartzlab-methods/too-many-cells-python

Operating system: Platform independent

Programming language: Python

Other requirements: Graphviz (https://graphviz.org/)

License: GNU Affero General Public License v3.0

PyPI: toomanycells (https://pypi.org/project/toomanycells/)

RRID:SCR_025327

Supplementary Material

giae056_GIGA-D-23-00386_Original_Submission

giae056_giga-d-23-00386_original_submission.pdf (5.7MB, pdf)

giae056_GIGA-D-23-00386_Revision_1

giae056_giga-d-23-00386_revision_1.pdf (5.8MB, pdf)

giae056_GIGA-D-23-00386_Revision_2

giae056_giga-d-23-00386_revision_2.pdf (7.3MB, pdf)

giae056_Response_to_Reviewer_Comments_Original_Submission

giae056_response_to_reviewer_comments_original_submission.pdf (29.9KB, pdf)

giae056_Response_to_Reviewer_Comments_Revision_1

giae056_response_to_reviewer_comments_revision_1.pdf (37.6KB, pdf)

giae056_Reviewer_1_Report_Original_Submission

Qingnan Liang, Ph.D. -- 1/26/2024 Reviewed

giae056_reviewer_1_report_original_submission.pdf (119KB, pdf)

giae056_Reviewer_2_Report_Original_Submission

Mehmet Tekman -- 2/12/2024 Reviewed

giae056_reviewer_2_report_original_submission.pdf (132.2KB, pdf)

giae056_Reviewer_3_Report_Original_Submission

Georgios Fotakis -- 2/19/2024 Reviewed

giae056_reviewer_3_report_original_submission.pdf (139.7KB, pdf)

giae056_Reviewer_3_Report_Revision_1

Georgios Fotakis -- 6/10/2024 Reviewed

giae056_reviewer_3_report_revision_1.pdf (117.6KB, pdf)

giae056_Supplemental_Files

giae056_supplemental_files.zip (4.2MB, zip)

Contributor Information

Conor Klamann, Data Sciences Institute, University of Toronto, Toronto, ON M5G 1Z5, Canada.

Christie J Lau, Princess Margaret Cancer Centre, University Health Network, Toronto, ON M5G 1L7, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada.

Javier Ruiz-Ramírez, Princess Margaret Cancer Centre, University Health Network, Toronto, ON M5G 1L7, Canada.

Gregory W Schwartz, Princess Margaret Cancer Centre, University Health Network, Toronto, ON M5G 1L7, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada; Vector Institute, Toronto, ON M5G 1M1, Canada.

Additional Files

Supplementary Fig. S1. Treated MDA-MB-231 breast cancer cells cluster into distinct cell states. (a) The TMCI tree from Fig. 4B with node identifiers (left) with a collapsed focus on MDA-MB-231 (top right). A subset of treated MDA-MB-231 breast cancer cells clustered together into node 4, separately from the predominant cluster in node 126 and containing a subgroup with high ID2 expression in node 48 (bottom right). (b) The top 4 enriched pathways determined from Metascape analysis of the 100 most downregulated genes. Pathways corroborating our findings from gene set enrichment analysis are highlighted in red. (c) Box-and-whisker plot of expression of KRAS and known KRAS-mutant synthetic lethal partners. For all box-and-whisker plots, center line: median, box bounds: interquartile range, whiskers: minimum and maximum scores. Statistical annotations represent results of a 1-sided Mann–Whitney U test with Benjamini–Hochberg correction. ns: not significant; asterisks (* to ****) denote increasing levels of statistical significance.

Supplementary Fig. S2. Short-term treatment across cell lines has lower proliferative enrichment. (a) Gene set enrichment analysis (GSEA) curve for the “FISCHER_G2_M_CELL_CYCLE” gene set in long-term treated cells. (b) TMCI tree from Fig. 4B colored by diapause signature score per node cluster (red: greater than 1 median absolute deviation away from the median diapause signature score; gray: otherwise). (c) GSEA curves of the upregulated diapause signature score gene set from short-term (left) or long-term (right) treated cells. (d) TMCI tree from Fig. 4B colored by normalized enrichment scores of the upregulated diapause gene set comparing each node to every other node. (e) GSEA curves of the downregulated diapause signature score gene set as in (c). (f) Normalized enrichment scores for the downregulated diapause signature score gene set as in (d). (g, h) CELLxGENE UMAP scatterplots of the same data as in (b) colored by cell line (g) or diapause signature scores (h) per cell.

Supplementary Fig. S3. Excerpt from Fig. 4B (a) and Supplementary Fig. S2g (b) for cell line annotations and Fig. 4J (c) and Supplementary Fig. S2h (d) for diapause score overlays for cell line data visualized using TMCI (a, c) and CELLxGENE (b, d) for a side-by-side comparison. (e) CELLxGENE visualization colored by cluster annotations to more directly compare with tree clusters from TMCI.

Supplementary Fig. S4. (a, b) Visualizations of read count data after Harmony [45] batch-effect correction for cell line, within each treatment condition. We used the resulting PCA embeddings to generate UMAP embeddings, displayed in CELLxGENE (a), and TooManyCells tree structure, displayed in TMCI (b). We set TooManyCells parameters for corrected data according to recommended defaults (Supplementary Note S3). We pruned the tree with “plain distance search” of 0.05 to improve visual clarity of the clusters.

Supplementary Table S1. Mean execution time for each performance benchmark across cells from 10 mouse tissues [24] (18,859 cells), 24 mouse tissues (41,668 cells), 24 human tissues and organs (483,152 cells) [25], or 11 samples from 5 cancer cell lines in response to drug treatment (54,220 cells).

Supplementary Table S2. Pairwise t-test results for each execution time performance benchmark across datasets from Supplementary Table S1. Benjamini–Hochberg method was used for testing and adjustment of p-values.

Supplementary Table S3. Mean memory usage for each performance benchmark across datasets from Supplementary Table S1.

Supplementary Table S4. Pairwise t-test results for each memory usage performance benchmark across datasets from Supplementary Table S1. Benjamini–Hochberg method was used for testing and adjustment of p-values.

Supplementary Table S5. Rank product of differentially expressed genes between long-term treated cells and their corresponding controls. Mann–Whitney U test was used to calculate p-values.

Supplementary Table S6. Rank product of differentially expressed genes between short-term treated cells and their corresponding controls. Mann–Whitney U test was used to calculate p-values.

Supplementary Table S7. Fold changes in upper quartile-normalized gene expression between treated cells and their corresponding controls within a given cell line. Mann–Whitney U test was used to calculate p-values, and Benjamini–Hochberg method was used for testing and adjustment of p-values.

Supplementary Table S8. Fold changes in log-normalized gene expression between a subset of treated MDA-MB-231 cells at node 4 and the predominant cluster at node 126. Mann–Whitney U test was used to calculate p-values, and Benjamini–Hochberg method was used for testing and adjustment of p-values.

Supplementary Table S9. Gene set enrichment analysis results comparing treated MDA-MB-231 cells at node 4 with those at node 126. Mann–Whitney U test was used to calculate p-values, and Benjamini–Hochberg method was used for testing and adjustment of p-values.

Supplementary Table S10. Gene set enrichment analysis results for long-term treated cells in comparison to their corresponding controls, using the Hallmark, C2, and C6 gene sets.

Supplementary Table S11. Gene set enrichment analysis results for short-term treated cells in comparison to their corresponding controls, using the Hallmark, C2, and C6 gene sets.

Supplementary Table S12. Comparison of diapause scores between treated cells and their corresponding control conditions within a given cell line. One-sided Mann–Whitney U test was used to evaluate higher diapause scores among treated conditions compared to control. Benjamini–Hochberg method was used for testing and adjustment of p-values.

Abbreviations

GSEA: gene set enrichment analysis; NES: normalized enrichment score; PCA: principal component analysis; RAM: random-access memory; scATAC-seq: single-cell assay for transposase-accessible chromatin; scRNA-seq: single-cell RNA sequencing; TMC: TooManyCells; TMCI: TooManyCellsInteractive; t-SNE: t-distributed stochastic neighbor embedding; UMAP: uniform manifold approximation and projection; DOM: Document Object Model.

Author Contributions

G.W.S. conceived and supervised the project. C.K. developed the tool and benchmarks, as well as ran and analyzed benchmarks. C.J.L. collected, ran, and analyzed cancer cell line data. C.K., C.J.L., and G.W.S. wrote the manuscript.

Funding

This work was supported by the University of Toronto Data Sciences Institute Research Software Development Support Program (G.W.S.), the Canadian Cancer Society Challenge Grant (grant 707484; G.W.S.), the Natural Sciences and Engineering Research Council of Canada (grants RGPIN-2023-04713 and DGECR-2023-00395; G.W.S.), the Social Sciences and Humanities Research Council (grant NFRFE-2022-00681; G.W.S.), the Canada Research Chairs Program (G.W.S.), the Princess Margaret Cancer Foundation (G.W.S.), and the University of Toronto Data Sciences Institute Doctoral Student Fellowship (C.J.L.).

Data Availability

The GEO accession numbers for each dataset reported in this article are GSM5155455 and GSM5155456 (prostate cancer), GSM4932163 and GSM4932166 (melanoma), GSM3972651 and GSM3972652 (non–small cell lung cancer), GSM4684556 and GSM4684557 (breast cancer), and GSM4121361, GSM4121362, and GSM4121364 (T-cell acute lymphoblastic leukemia). Archival versions of the code are available via Software Heritage [49] and figshare [50]. Code for analyses within this article is available at https://github.com/schwartzlab-methods/too-many-cells-interactive-paper-analyses.

Competing Interests

The authors declare no competing interests.

References

1. Schwartz GW, Zhou Y, Petrovic J, et al. TooManyCells identifies and visualizes relationships of single-cell clades. Nat Methods. 2020;17:405–13. 10.1038/s41592-020-0748-5
2. Schwartz GW, Zhou Y, Petrovic J, et al. TooManyPeaks identifies drug-resistant-specific regulatory elements from single-cell leukemic epigenomes. Cell Rep. 2021;36:1–17. 10.1016/j.celrep.2021.109575
3. Satija R, Farrell JA, Gennert D, et al. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33:495–502. 10.1038/nbt.3192
4. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19:15. 10.1186/s13059-017-1382-0
5. McInnes L, Healy J, Melville J. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv. 2018. https://arxiv.org/abs/1802.03426 (Accessed: 01-Feb-2023)
6. Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9:2579–605. https://jmlr.org/papers/v9/vandermaaten08a.html
7. Xiang R, Wang W, Yang L, et al. A comparison for dimensionality reduction methods of single-cell RNA-seq data. Front Genet. 2021;12:646936. 10.3389/fgene.2021.646936
8. Wattenberg M, Viégas F, Johnson I. How to use t-SNE effectively. Distill. 2016;1:e2. 10.23915/distill.00002
9. Ovchinnikova S, Anders S. Exploring dimension-reduced embeddings with Sleepwalk. Genome Res. 2020;30:749–56. 10.1101/gr.251447.119
10. Cooley SM, Hamilton T, Aragones SD, et al. A novel metric reveals previously unrecognized distortion in dimensionality reduction of scRNA-seq data. bioRxiv. 2022. https://www.biorxiv.org/content/10.1101/689851v6 (Accessed: 01-Feb-2023)
11. Chari T, Banerjee J, Pachter L. The specious art of single-cell genomics. PLOS Computational Biology. 2021;19:e1011288. 10.1371/journal.pcbi.1011288
12. Kobak D, Berens P. The art of using t-SNE for single-cell transcriptomics. Nat Commun. 2019;10:1–14. 10.1038/s41467-019-13056-x
13. Traag VA, Waltman L, van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep. 2019;9:5233. 10.1038/s41598-019-41695-z
14. Megill C, Martin B, Weaver C, et al. Cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices. bioRxiv. 2021. https://www.biorxiv.org/content/10.1101/2021.04.05.438318v1 (Accessed: 01-Feb-2023)
15. Li B, Gould J, Yang Y, et al. Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq. Nat Methods. 2020;17:793–98. 10.1038/s41592-020-0905-x
16. Zheng GXY, Terry JM, Belgrader P, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8:14049. 10.1038/ncomms14049
17. Speir ML, Bhaduri A, Markov NS, et al. UCSC Cell Browser: visualize your single-cell data. Bioinformatics. 2021;37:4578–80. 10.1093/bioinformatics/btab503
18. Prieto C, Barrios D, Villaverde A. SingleCAnalyzer: interactive analysis of single cell RNA-seq data on the cloud. Front Bioinform. 2022;2:793309. 10.3389/fbinf.2022.793309
19. Innes BT, Bader GD. scClustViz—single-cell RNAseq cluster assessment and visualization. F1000Research. 2019;7:1522. 10.12688/f1000research.16198.2
20. Tabaka M, Gould J, Regev A. scSVA: an interactive tool for big data visualization and exploration in single-cell omics. bioRxiv. 2019. 10.1101/512582 (Accessed: 01-Feb-2023)
21. Hasanaj E, Wang J, Sarathi A, et al. Interactive single-cell data analysis using Cellar. Nat Commun. 2022;13:1998. 10.1038/s41467-022-29744-0
22. Hillje R, Pelicci PG, Luzi L. Cerebro: interactive visualization of scRNA-seq data. Bioinformatics. 2020;36:2311–13. 10.1093/bioinformatics/btz877
23. Kotliar D, Colubri A. Sciviewer enables interactive visual interrogation of single-cell RNA-seq data from the Python programming environment. Bioinformatics. 2021;37:3961–63. 10.1093/bioinformatics/btab689
24. The Tabula Muris Consortium. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature. 2018;562:367–72. 10.1038/s41586-018-0590-4
25. The Tabula Sapiens Consortium. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science. 2022;376:eabl4896. 10.1126/science.abl4896
26. Glickman MS, Sawyers CL. Converting cancer therapies into cures: lessons from infectious diseases. Cell. 2012;148:1089–98. 10.1016/j.cell.2012.02.015
27. Sharma SV, Lee DY, Li B, et al. A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. Cell. 2010;141:69–80. 10.1016/j.cell.2010.02.027
28. Taavitsainen S, Engedal N, Cao S, et al. Single-cell ATAC and RNA sequencing reveal pre-existing and persistent cells associated with prostate cancer relapse. Nat Commun. 2021;12:5307. 10.1038/s41467-021-25624-1
29. Celeste FV, Powers S. Induction of multiple alternative mitogenic signaling pathways accompanies the emergence of drug-tolerant cancer cells. Cancers. 2024;16:1001. 10.3390/cancers16051001
30. Aissa AF, Islam AB, Ariss MM, et al. Single-cell transcriptional changes associated with drug tolerance and response to combination therapies in cancer. Nat Commun. 2021;12:1628. 10.1038/s41467-021-21884-z
31. Johnson KE, Howard GR, Morgan D, et al. Integrating transcriptomics and bulk time course data into a mathematical framework to describe and predict therapeutic resistance in cancer. Phys Biol. 2020;18:016001. 10.1088/1478-3975/abb09c
32. Breitling R, Armengaud P, Amtmann A, et al. Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett. 2004;573:83–92. 10.1016/j.febslet.2004.07.055
33. Itahana Y, Singh J, Sumida T, et al. Role of Id-2 in the maintenance of a differentiated and noninvasive phenotype in breast cancer cells. Cancer Res. 2003;63:7098–105. https://pubmed.ncbi.nlm.nih.gov/14612502/
34. Stighall M, Manetopoulos C, Axelson H, et al. High ID2 protein expression correlates with a favourable prognosis in patients with primary breast cancer and reduces cellular invasiveness of breast cancer cells. Int J Cancer. 2005;115:403–11. 10.1002/ijc.20875
35. Barbie DA, Tamayo P, Boehm JS, et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature. 2009;462:108–12. 10.1038/nature08460
36. Kozma SC, Bogaard ME, Buser K, et al. The Human C-Kirsten Ras gene is activated by a novel mutation in Codon 13 in the breast carcinoma cell line MDA-MB231. Nucleic Acids Res. 1987;15:5963–71. 10.1093/nar/15.15.5963
37. Karapetis CS, Khambata-Ford S, Jonker DJ, et al. K-Ras mutations and benefit from cetuximab in advanced colorectal cancer. N Engl J Med. 2008;359:1757–65. 10.1056/NEJMoa0804385
38. Aguirre AJ, Hahn WC. Synthetic lethal vulnerabilities in KRAS-mutant cancers. CSH Perspect Med. 2018;8:a031518. 10.1101/cshperspect.a031518
39. Scholl C, Fröhling S, Dunn IF, et al. Synthetic lethal interaction between oncogenic KRAS dependency and STK33 suppression in human cancer cells. Cell. 2009;137:821–34. 10.1016/j.cell.2009.03.017
40. Zhou Y, Zhou B, Pache L, et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10:1523. 10.1038/s41467-019-09234-6
41. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci. 2005;102:15545–550. 10.1073/pnas.0506580102
42. Fenelon JC, Renfree MB. The history of the discovery of embryonic diapause in mammals. Biol Reprod. 2018;99:242–51. 10.1093/biolre/ioy112
43. Rehman SK, Haynes J, Collignon E, et al. Colorectal cancer cells enter a diapause-like DTP state to survive chemotherapy. Cell. 2021;184:226–42. 10.1016/j.cell.2020.11.018
44. Dhimolea E, de Matos Simoes R, Kansara D, et al. An embryonic diapause-like adaptation with suppressed Myc activity enables tumor treatment persistence. Cancer Cell. 2021;39:240–56. 10.1016/j.ccell.2020.12.002
45. Korsunsky I, Millard N, Fan J, et al. Fast, sensitive and accurate integration of single-cell data with harmony. Nat Methods. 2019;16:1289–96. 10.1038/s41592-019-0619-0
46. Shu L, Chen A, Xiong M, et al. Efficient SPectrAl neighborhood blocking for entity resolution. 2011. 2011 IEEE 27th International Conference on Data Engineering: Hannover, Germany. 1067–78. 10.1109/ICDE.2011.5767835
47. Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004;69:026113. 10.1103/PhysRevE.69.026113
48. Liberzon A, Subramanian A, Pinchback R, et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27:1739–40. 10.1093/bioinformatics/btr260
49. Klamann C, Lau CJ, Ruiz-Ramírez J, et al. TooManyCellsInteractive: a visualization tool for dynamic exploration of single-cell data. Software Heritage. 2024. https://archive.softwareheritage.org/browse/embed/swh:1:dir:4269544ffe9b965db0133e901d941f8d6e237529/
50. Klamann C, Lau CJ, Ruiz-Ramírez J, et al. TooManyCellsInteractive: a visualization tool for dynamic exploration of single-cell data. Figshare. 2023. 10.6084/m9.figshare.24247426.v1

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

Klamann C, Lau CJ, Ruiz-Ramírez J et al., TooManyCellsInteractive: a visualization tool for dynamic exploration of single-cell data. Figshare. 2023. 10.6084/m9.figshare.24247426.v1. [DOI] [PMC free article] [PubMed]

Supplementary Materials

giae056_GIGA-D-23-00386_Original_Submission

giae056_giga-d-23-00386_original_submission.pdf^{(5.7MB, pdf)}

giae056_GIGA-D-23-00386_Revision_1

giae056_giga-d-23-00386_revision_1.pdf^{(5.8MB, pdf)}

giae056_GIGA-D-23-00386_Revision_2

giae056_giga-d-23-00386_revision_2.pdf^{(7.3MB, pdf)}

giae056_Response_to_Reviewer_Comments_Original_Submission

giae056_response_to_reviewer_comments_original_submission.pdf^{(29.9KB, pdf)}

giae056_Response_to_Reviewer_Comments_Revision_1

giae056_response_to_reviewer_comments_revision_1.pdf^{(37.6KB, pdf)}

giae056_Reviewer_1_Report_Original_Submission

Qingnan Liang, Ph.D. -- 1/26/2024 Reviewed

giae056_reviewer_1_report_original_submission.pdf^{(119KB, pdf)}

giae056_Reviewer_2_Report_Original_Submission

Mehmet Tekman -- 2/12/2024 Reviewed

giae056_reviewer_2_report_original_submission.pdf^{(132.2KB, pdf)}

giae056_Reviewer_3_Report_Original_Submission

Georgios Fotakis -- 2/19/2024 Reviewed

giae056_reviewer_3_report_original_submission.pdf^{(139.7KB, pdf)}

giae056_Reviewer_3_Report_Revision_1

Georgios Fotakis -- 6/10/2024 Reviewed

giae056_reviewer_3_report_revision_1.pdf^{(117.6KB, pdf)}

giae056_Supplemental_Files

giae056_supplemental_files.zip^{(4.2MB, zip)}

Data Availability Statement

[bib1] 1. Schwartz GW, Zhou Y, Petrovic J., et al. TooManyCells identifies and visualizes relationships of single-cell clades. Nat Methods. 2020;17:405–13. 10.1038/s41592-020-0748-5 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] 2. Schwartz GW, Zhou Y, Petrovic J, et al. TooManyPeaks identifies drug-resistant-specific regulatory elements from single-cell leukemic epigenomes. Cell Rep. 2021;36:1–17. 10.1016/j.celrep.2021.109575 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] 3. Satija R, Farrell JA, Gennert D, et al. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33:495–502. 10.1038/nbt.3192 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] 4. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19:15. 10.1186/s13059-017-1382-0 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] 5. McInnes L, Healy J, Melville J. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv. 2018. https://arxiv.org/abs/1802.03426 (Accessed: 01-Feb-2023)

[bib6] 6. Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9:2579–605. https://jmlr.org/papers/v9/vandermaaten08a.html [Google Scholar]

[bib7] 7. Xiang R, Wang W, Yang L, et al. A comparison for dimensionality reduction methods of single-cell RNA-seq data. Front Genet. 2021;12:646936. 10.3389/fgene.2021.646936 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] 8. Wattenberg M, Viégas F, Johnson I. How to use t-SNE effectively. Distill. 2016;1:e2. 10.23915/distill.00002 [DOI] [Google Scholar]

[bib9] 9. Ovchinnikova S, Anders S. Exploring dimension-reduced embeddings with Sleepwalk. Genome Res. 2020;30:749–56. 10.1101/gr.251447.119 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] 10. Cooley SM, Hamilton T, Aragones SD, et al. A novel metric reveals previously unrecognized distortion in dimensionality reduction of scRNA-seq data. bioRxiv. 2022. https://www.biorxiv.org/content/10.1101/689851v6 (Accessed: 01-Feb-2023)

[bib11] 11. Chari T, Banerjee J, Pachter L. The specious art of single-cell genomics. PLoS Comput Biol. 2023;19:e1011288. doi:10.1371/journal.pcbi.1011288

[bib12] 12. Kobak D, Berens P. The art of using t-SNE for single-cell transcriptomics. Nat Commun. 2019;10:1–14. doi:10.1038/s41467-019-13056-x

[bib13] 13. Traag VA, Waltman L, van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep. 2019;9:5233. doi:10.1038/s41598-019-41695-z

[bib14] 14. Megill C, Martin B, Weaver C, et al. Cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices. bioRxiv. 2021. https://www.biorxiv.org/content/10.1101/2021.04.05.438318v1 (Accessed: 01-Feb-2023)

[bib15] 15. Li B, Gould J, Yang Y, et al. Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq. Nat Methods. 2020;17:793–98. doi:10.1038/s41592-020-0905-x

[bib16] 16. Zheng GXY, Terry JM, Belgrader P, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8:14049. doi:10.1038/ncomms14049

[bib17] 17. Speir ML, Bhaduri A, Markov NS, et al. UCSC Cell Browser: visualize your single-cell data. Bioinformatics. 2021;37:4578–80. doi:10.1093/bioinformatics/btab503

[bib18] 18. Prieto C, Barrios D, Villaverde A. SingleCAnalyzer: interactive analysis of single cell RNA-seq data on the cloud. Front Bioinform. 2022;2:793309. doi:10.3389/fbinf.2022.793309

[bib19] 19. Innes BT, Bader GD. scClustViz—single-cell RNAseq cluster assessment and visualization. F1000Research. 2019;7:1522. doi:10.12688/f1000research.16198.2

[bib20] 20. Tabaka M, Gould J, Regev A. scSVA: an interactive tool for big data visualization and exploration in single-cell omics. bioRxiv. 2019. doi:10.1101/512582 (Accessed: 01-Feb-2023)

[bib21] 21. Hasanaj E, Wang J, Sarathi A, et al. Interactive single-cell data analysis using Cellar. Nat Commun. 2022;13:1998. doi:10.1038/s41467-022-29744-0

[bib22] 22. Hillje R, Pelicci PG, Luzi L. Cerebro: interactive visualization of scRNA-seq data. Bioinformatics. 2020;36:2311–13. doi:10.1093/bioinformatics/btz877

[bib23] 23. Kotliar D, Colubri A. Sciviewer enables interactive visual interrogation of single-cell RNA-seq data from the Python programming environment. Bioinformatics. 2021;37:3961–63. doi:10.1093/bioinformatics/btab689

[bib24] 24. The Tabula Muris Consortium. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature. 2018;562:367–72. doi:10.1038/s41586-018-0590-4

[bib25] 25. The Tabula Sapiens Consortium. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science. 2022;376:eabl4896. doi:10.1126/science.abl4896

[bib26] 26. Glickman MS, Sawyers CL. Converting cancer therapies into cures: lessons from infectious diseases. Cell. 2012;148:1089–98. doi:10.1016/j.cell.2012.02.015

[bib27] 27. Sharma SV, Lee DY, Li B, et al. A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. Cell. 2010;141:69–80. doi:10.1016/j.cell.2010.02.027

[bib28] 28. Taavitsainen S, Engedal N, Cao S, et al. Single-cell ATAC and RNA sequencing reveal pre-existing and persistent cells associated with prostate cancer relapse. Nat Commun. 2021;12:5307. doi:10.1038/s41467-021-25624-1

[bib29] 29. Celeste FV, Powers S. Induction of multiple alternative mitogenic signaling pathways accompanies the emergence of drug-tolerant cancer cells. Cancers. 2024;16:1001. doi:10.3390/cancers16051001

[bib30] 30. Aissa AF, Islam AB, Ariss MM, et al. Single-cell transcriptional changes associated with drug tolerance and response to combination therapies in cancer. Nat Commun. 2021;12:1628. doi:10.1038/s41467-021-21884-z

[bib31] 31. Johnson KE, Howard GR, Morgan D, et al. Integrating transcriptomics and bulk time course data into a mathematical framework to describe and predict therapeutic resistance in cancer. Phys Biol. 2020;18:016001. doi:10.1088/1478-3975/abb09c

[bib32] 32. Breitling R, Armengaud P, Amtmann A, et al. Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett. 2004;573:83–92. doi:10.1016/j.febslet.2004.07.055

[bib33] 33. Itahana Y, Singh J, Sumida T, et al. Role of Id-2 in the maintenance of a differentiated and noninvasive phenotype in breast cancer cells. Cancer Res. 2003;63:7098–105. https://pubmed.ncbi.nlm.nih.gov/14612502/

[bib34] 34. Stighall M, Manetopoulos C, Axelson H, et al. High ID2 protein expression correlates with a favourable prognosis in patients with primary breast cancer and reduces cellular invasiveness of breast cancer cells. Int J Cancer. 2005;115:403–11. doi:10.1002/ijc.20875

[bib35] 35. Barbie DA, Tamayo P, Boehm JS, et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature. 2009;462:108–12. doi:10.1038/nature08460

[bib36] 36. Kozma SC, Bogaard ME, Buser K, et al. The human c-Kirsten ras gene is activated by a novel mutation in codon 13 in the breast carcinoma cell line MDA-MB231. Nucleic Acids Res. 1987;15:5963–71. doi:10.1093/nar/15.15.5963

[bib37] 37. Karapetis CS, Khambata-Ford S, Jonker DJ, et al. K-Ras mutations and benefit from cetuximab in advanced colorectal cancer. N Engl J Med. 2008;359:1757–65. doi:10.1056/NEJMoa0804385

[bib38] 38. Aguirre AJ, Hahn WC. Synthetic lethal vulnerabilities in KRAS-mutant cancers. Cold Spring Harb Perspect Med. 2018;8:a031518. doi:10.1101/cshperspect.a031518

[bib39] 39. Scholl C, Fröhling S, Dunn IF, et al. Synthetic lethal interaction between oncogenic KRAS dependency and STK33 suppression in human cancer cells. Cell. 2009;137:821–34. doi:10.1016/j.cell.2009.03.017

[bib40] 40. Zhou Y, Zhou B, Pache L, et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10:1523. doi:10.1038/s41467-019-09234-6

[bib41] 41. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci. 2005;102:15545–50. doi:10.1073/pnas.0506580102

[bib42] 42. Fenelon JC, Renfree MB. The history of the discovery of embryonic diapause in mammals. Biol Reprod. 2018;99:242–51. doi:10.1093/biolre/ioy112

[bib43] 43. Rehman SK, Haynes J, Collignon E, et al. Colorectal cancer cells enter a diapause-like DTP state to survive chemotherapy. Cell. 2021;184:226–42. doi:10.1016/j.cell.2020.11.018

[bib44] 44. Dhimolea E, de Matos Simoes R, Kansara D, et al. An embryonic diapause-like adaptation with suppressed Myc activity enables tumor treatment persistence. Cancer Cell. 2021;39:240–56. doi:10.1016/j.ccell.2020.12.002

[bib45] 45. Korsunsky I, Millard N, Fan J, et al. Fast, sensitive and accurate integration of single-cell data with harmony. Nat Methods. 2019;16:1289–96. doi:10.1038/s41592-019-0619-0

[bib46] 46. Shu L, Chen A, Xiong M, et al. Efficient SPectrAl neighborhood blocking for entity resolution. 2011. 2011 IEEE 27th International Conference on Data Engineering: Hannover, Germany. 1067–78. doi:10.1109/ICDE.2011.5767835

[bib47] 47. Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004;69:026113. doi:10.1103/PhysRevE.69.026113

[bib48] 48. Liberzon A, Subramanian A, Pinchback R, et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27:1739–40. doi:10.1093/bioinformatics/btr260

[bib49] 49. Klamann C, Lau CJ, Ruiz-Ramírez J, et al. TooManyCellsInteractive: a visualization tool for dynamic exploration of single-cell data. Software Heritage. 2024. https://archive.softwareheritage.org/browse/embed/swh:1:dir:4269544ffe9b965db0133e901d941f8d6e237529/

[bib50] 50. Klamann C, Lau CJ, Ruiz-Ramírez J, et al. TooManyCellsInteractive: a visualization tool for dynamic exploration of single-cell data. Figshare. 2023. doi:10.6084/m9.figshare.24247426.v1
