Nat Methods. Author manuscript; available in PMC: 2021 Oct 1.
Published in final edited form as: Nat Methods. 2021 Apr;18(4):327–328. doi: 10.1038/s41592-021-01102-w

User-friendly, scalable tools and workflows for single-cell RNAseq analysis

P Moreno 1,*, N Huang 1,2, JR Manning 1, S Mohammed 1, A Solovyev 1, K Polanski 2, W Bacon 1, R Chazarra 1, C Talavera-López 1,2, MA Doyle 3,4, G Marnier 1, B Grüning 5, H Rasche 5, N George 1, S Fexova 1, M Alibi 1, Z Miao 1, Y Perez-Riverol 1, M Haeussler 6, A Brazma 1, S Teichmann 2, KB Meyer 2, I Papatheodorou 1,*
PMCID: PMC8299072  NIHMSID: NIHMS1704043  PMID: 33782609

To the Editor: As single-cell RNA-Seq (scRNA-Seq) becomes widespread, accessible and scalable computational pipelines for data analysis are needed. We introduce an interactive computational environment for single-cell studies, based on Galaxy¹, that incorporates functions from established workflows. The Single Cell interactive application (SCiAp) provides easy access to data from the Human Cell Atlas (HCA) and EMBL-EBI’s Single Cell Expression Atlas (SCEA)² projects and can be deployed on different computing platforms, making single-cell data analysis of large-scale projects accessible to the scientific community.

Consortia such as the HCA and the Fly Cell Atlas, among others, are generating large numbers of scRNA-Seq datasets that researchers will be able to reuse and to draw on when analyzing their own datasets. For instance, SCEA provides scRNA-Seq datasets comprising more than 3 million cells across cell types, tissues and more than 14 species. This large collection of scRNA-Seq data demands adequate computational infrastructure, analysis tools and workflows to help researchers make the most of it.

The Galaxy framework has enabled flexible and scalable deployment across multiple clouds through the Galaxy-Kubernetes integration³, supporting the analysis of large datasets. Galaxy offers a user-friendly framework for building and sharing workflows, and it is supported by a vibrant community of bioinformaticians who continuously enrich its tool repository with analysis methods for, e.g., scRNA-seq⁴. Built on Galaxy, SCiAp facilitates data access (from HCA, SCEA and users’ own data), downstream analysis and visualisation of scRNA-Seq datasets. In SCiAp we share tools and workflows (including those used in SCEA) that can be run through the web interface or the command line. An instance, known as the HCA Galaxy instance, is available at https://humancellatlas.usegalaxy.eu/ (Figure 1). Additional technical details, usability and many other topics are covered in the Supplementary Methods.
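As a rough illustration of the command-line route, the sketch below builds (but does not send) a request to the Galaxy REST API that invokes a stored workflow on datasets already present in a history. It assumes the `POST /api/workflows/{workflow_id}/invocations` endpoint, API-key authentication through the `x-api-key` header, and workflow inputs keyed by step index; the function name and all IDs are hypothetical placeholders, and the exact payload accepted may differ between Galaxy releases.

```python
import json
import urllib.request


def build_invocation_request(galaxy_url, api_key, workflow_id, history_id, dataset_ids):
    """Build (but do not send) a POST request that invokes a stored Galaxy workflow.

    Hypothetical helper. `dataset_ids` maps workflow input step indices to
    IDs of datasets already in the target history ("hda" = history dataset).
    """
    inputs = {
        str(step): {"src": "hda", "id": dataset_id}
        for step, dataset_id in dataset_ids.items()
    }
    payload = {
        "history_id": history_id,
        "inputs": inputs,
        "inputs_by": "step_index",  # assumed: interpret input keys as step indices
    }
    url = f"{galaxy_url.rstrip('/')}/api/workflows/{workflow_id}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Assumed authentication scheme: API key passed in the x-api-key header.
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it. For anything beyond a quick script, the BioBlend Python library provides a higher-level client for the same Galaxy API.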

Figure 1:

(A) Load matrix data from HCA or SCEA directly into SCiAp Galaxy. (B) Run configurable scRNA-Seq analyses through SCiAp. (C) Inspect results interactively through UCSC CellBrowser and plots within Galaxy.

A key feature of SCiAp is the ability to integrate tools from different workflows, written in different languages. We break monolithic tools into analysis modules, enabling users to try different competing toolsets and, where possible, to combine them in the same workflows. For example, we produced more than 20 modules for Scanpy⁵, covering data input, filtering, normalisation, variable genes, clustering, dimensionality reduction and trajectory methods, among others. Supplementary Table 1 lists all the integrated tools and the functional modules into which each was broken; Supplementary Note 1 shows how modules from different tools are combined in analysis workflows. SCiAp provides functionality from Scanpy, Seurat⁶, Monocle3⁷, SC3⁸, scmap⁹, Scater¹⁰, SCCAF¹¹, scPred¹², SCEasy and UCSC CellBrowser. Supplementary Figure 1 maps the scRNA-seq data analysis functionality covered by the tool wrappers contributed as part of this work and by the incorporated external contributions, distinguishing between the two.
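To illustrate the idea of breaking a monolithic pipeline into interchangeable modules, here is a toy, dependency-free sketch in which each step is a configured function with the same input and output type, so steps can be reordered, swapped or chained into a workflow. It is not Scanpy’s or SCiAp’s code: the module names, the dict-of-lists matrix format and the parameters are hypothetical, and in Galaxy each module would instead be a separate tool whose output dataset feeds the next step.

```python
import math
from typing import Callable, Dict, List

# Toy expression matrix (hypothetical format): cell barcode -> per-gene counts.
Matrix = Dict[str, List[float]]
Module = Callable[[Matrix], Matrix]


def filter_cells(min_genes: int) -> Module:
    """Keep cells in which at least `min_genes` genes have a non-zero count."""
    def run(matrix: Matrix) -> Matrix:
        return {
            cell: counts
            for cell, counts in matrix.items()
            if sum(1 for c in counts if c > 0) >= min_genes
        }
    return run


def normalise_total(target_sum: float) -> Module:
    """Rescale each cell's counts so that they sum to `target_sum`."""
    def run(matrix: Matrix) -> Matrix:
        out: Matrix = {}
        for cell, counts in matrix.items():
            total = sum(counts)
            # Leave all-zero cells unchanged to avoid dividing by zero.
            out[cell] = [c * target_sum / total for c in counts] if total else list(counts)
        return out
    return run


def log1p_transform() -> Module:
    """Apply log(1 + x) to every value."""
    def run(matrix: Matrix) -> Matrix:
        return {cell: [math.log1p(c) for c in counts] for cell, counts in matrix.items()}
    return run


def run_workflow(matrix: Matrix, modules: List[Module]) -> Matrix:
    """Apply each module in order, feeding each output into the next step."""
    for module in modules:
        matrix = module(matrix)
    return matrix


if __name__ == "__main__":
    cells = {"AAAC": [0, 2, 2], "AAAG": [0, 0, 5], "AAAT": [1, 1, 2]}
    steps = [filter_cells(min_genes=2), normalise_total(target_sum=10.0)]
    print(run_workflow(cells, steps))
    # {'AAAC': [0.0, 5.0, 5.0], 'AAAT': [2.5, 2.5, 5.0]}
```

Because every module shares one signature, an alternative implementation of a step (for example, a different normalisation) can replace the original without touching the rest of the workflow, which is the property that lets competing toolsets coexist.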

In summary, SCiAp is a suite of components derived from commonly used tools in scRNA-Seq analysis. Being based on Galaxy, it can be deployed on large computational infrastructures or on existing Galaxy instances, reducing software engineering complexity for the biological research community. Supplementary Table 2 compares SCiAp with similar services; in this comparison, SCiAp stands out for its wider accessibility and the variety of toolsets it provides. We also provide the underlying tools, with their software dependencies resolved via Bioconda¹³ and BioContainers¹⁴, frameworks commonly used in bioinformatics. Lab-based scientists with a deep understanding of a cellular system can use this computational framework, through the user-friendly Galaxy environment, to interrogate scRNA-Seq data, propose further hypotheses and guide their experiments to explore the translational potential of large-scale single-cell studies.
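BioContainers automatically builds a container image for each Bioconda package, which is what allows a suitably configured Galaxy server to run a tool inside a container instead of installing its dependencies locally. A minimal sketch of how such an image reference is composed, assuming the `quay.io/biocontainers/<package>:<version>--<build>` naming convention; the helper, the package-name pattern and the example package `example-tool` are hypothetical.

```python
import re

# Registry and tag layout assumed from the BioContainers naming convention.
BIOCONTAINERS_REGISTRY = "quay.io/biocontainers"
# Conda package names are lowercase; this pattern is a simplifying assumption.
PACKAGE_RE = re.compile(r"[a-z0-9][a-z0-9._-]*")


def biocontainer_image(package: str, version: str, build: str) -> str:
    """Compose '<registry>/<package>:<version>--<build>' for a Bioconda package."""
    if not PACKAGE_RE.fullmatch(package):
        raise ValueError(f"unexpected package name: {package!r}")
    return f"{BIOCONTAINERS_REGISTRY}/{package}:{version}--{build}"


if __name__ == "__main__":
    # 'example-tool' and its version/build strings are hypothetical.
    print(biocontainer_image("example-tool", "1.0.0", "py_0"))
    # quay.io/biocontainers/example-tool:1.0.0--py_0
```

Pinning both the version and the build string in the tag is what makes a containerised tool step reproducible across Galaxy instances.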

Supplementary Material

1704043_Supp_material

Acknowledgements

PM, JM, NH, KBM, IP: Silicon Valley Community Foundation 2018-183498

MH: Silicon Valley Community Foundation 2018-182809 and NHGRI 5U41HG002371-19

The authors wish to acknowledge the invaluable support from the Bioconda, Biocontainers and Galaxy communities.

Footnotes

Data Availability

Example input data, in the form of Galaxy histories, are available at usegalaxy.eu, with direct links in the Supplementary section “Supplementary Note 1: Workflows”. Single Cell Expression Atlas data are available directly from https://www.ebi.ac.uk/gxa/sc and from its FTP site at ftp.ebi.ac.uk/pub/databases/microarray/data/atlas/sc_experiments/. Human Cell Atlas data are available from https://data.humancellatlas.org/. In both cases, dedicated Galaxy modules retrieve the data directly from the respective atlas.
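For scripted access outside Galaxy, a small sketch that maps a SCEA experiment accession to a directory under the FTP path given above. It assumes that each experiment has its own subdirectory named after its accession and that accessions follow an `E-XXXX-<number>` pattern; `E-MTAB-1234` is a placeholder rather than a specific dataset, the helper names are hypothetical, and the listing helper needs network access.

```python
import ftplib
import re

SCEA_FTP_HOST = "ftp.ebi.ac.uk"
SCEA_FTP_DIR = "/pub/databases/microarray/data/atlas/sc_experiments"
# Assumed accession pattern, e.g. E-MTAB-1234 or E-GEOD-1234.
ACCESSION_RE = re.compile(r"E-[A-Z]{4}-\d+")


def _check_accession(accession: str) -> None:
    if not ACCESSION_RE.fullmatch(accession):
        raise ValueError(f"not a recognised experiment accession: {accession!r}")


def experiment_url(accession: str) -> str:
    """Return the FTP URL of an experiment's (assumed) per-accession directory."""
    _check_accession(accession)
    return f"ftp://{SCEA_FTP_HOST}{SCEA_FTP_DIR}/{accession}/"


def list_experiment_files(accession: str) -> list:
    """List file names in an experiment's directory over anonymous FTP (needs network)."""
    _check_accession(accession)
    with ftplib.FTP(SCEA_FTP_HOST) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(f"{SCEA_FTP_DIR}/{accession}")
        return ftp.nlst()
```

Validating the accession before building the path keeps malformed input from producing a misleading URL or an FTP error deep inside the listing call.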

Code Availability

All code contributed here is available through multiple GitHub repositories, BioContainers, Bioconda recipes and Galaxy ToolShed entries, as listed and linked in the supplementary sections “Supplementary Table 1” and “Supplementary Note 2”.

Ethics Declaration

Competing interests

The authors declare no competing interests.

References

1. Afgan E et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res. 46, W537–W544 (2018).
2. Papatheodorou I et al. Expression Atlas update: from tissues to single cells. Nucleic Acids Res. 48, D77–D83 (2020).
3. Moreno P et al. Galaxy-Kubernetes integration: scaling bioinformatics workflows in the cloud. bioRxiv 488643 (2018). doi: 10.1101/488643.
4. Tekman M et al. A single-cell RNA-sequencing training and analysis suite using the Galaxy framework. GigaScience 9 (2020).
5. Wolf FA, Angerer P & Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
6. Stuart T et al. Comprehensive Integration of Single-Cell Data. Cell 177, 1888–1902.e21 (2019).
7. Cao J et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
8. Kiselev VY et al. SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods 14, 483–486 (2017).
9. Kiselev VY, Yiu A & Hemberg M. scmap: projection of single-cell RNA-seq data across data sets. Nat. Methods 15, 359–362 (2018).
10. McCarthy DJ, Campbell KR, Lun ATL & Wills QF. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017).
11. Miao Z et al. Putative cell type discovery from single-cell gene expression data. Nat. Methods 17, 621–628 (2020).
12. Alquicira-Hernandez J, Sathe A, Ji HP, Nguyen Q & Powell JE. scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data. Genome Biol. 20, 264 (2019).
13. Grüning B et al. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat. Methods 15, 475–476 (2018).
14. da Veiga Leprevost F et al. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics 33, 2580–2582 (2017).
