Abstract
Here, we introduce Trackplot, a Python package for generating publication-quality visualizations through a programmable and interactive web-based approach. Compared with existing programs for generating sashimi plots, Trackplot offers a versatile platform for visually interpreting genomic data from a wide variety of sources without any preprocessing. These sources include gene annotation with functional domain mapping, isoform expression, isoform structures identified by scRNA-seq and long-read sequencing, and chromatin accessibility and architecture. Trackplot also supports a broad range of output formats that satisfy the requirements of major journals. Trackplot is open-source software, freely available on Bioconda (https://anaconda.org/bioconda/trackplot), Docker (https://hub.docker.com/r/ygidtu/trackplot), PyPI (https://pypi.org/project/trackplot/) and GitHub (https://github.com/ygidtu/trackplot), and it provides a built-in web server for local deployment.
Author summary
Simultaneously visualizing how isoform expression, protein-DNA/RNA interactions, and chromatin accessibility and architecture differ across conditions and cell types could inform our understanding of the regulatory mechanisms and functional consequences of alternative splicing. However, existing tools for generating sashimi plots remain inflexible, complicated, and user-unfriendly when integrating data from multiple bioinformatic formats or various genomic assays. A more scalable visualization tool is therefore needed to broaden the scope of sashimi plots. To overcome these limitations, we present Trackplot, a comprehensive tool that delivers high-quality plots via a programmable and interactive web-based platform. Trackplot seamlessly integrates diverse data sources and uses multithreading, enabling users to explore genomic signals in large-scale sequencing datasets.
Introduction
Alternative isoform expression expands proteome diversity and transcript functionality [1], so uncovering differential isoform expression is crucial. Various library protocols and sequencing methods, such as single-cell RNA sequencing (scRNA-seq) [2] and long-read sequencing [3], have been developed and widely used to explore the heterogeneity of isoform expression in single cells. Despite the availability of advanced tools for analyzing and visualizing genomic data, several challenges persist. Existing tools such as sashimi [4], ggsashimi [5], and SplicePlot [6] lack efficiency and flexibility when handling the ever-growing volume and size of data. Moreover, these tools often provide only a command-line interface, which can be daunting for inexperienced programmers. Conventional interactive genome browsers such as the Integrative Genomics Viewer (IGV) [7], in turn, offer limited flexibility in output format. To address these limitations, we introduce Trackplot, a comprehensive tool that generates high-quality plots in a programmable and interactive web-based format. Trackplot offers integrated visualization of diverse data sources, including gene annotation with functional domain mapping, isoform expression, isoform structures identified by scRNA-seq and long-read sequencing, and chromatin accessibility and architecture.
Design and implementation
Trackplot is a platform that leverages Python and JavaScript to visualize genomic data from diverse sources and generate publication-ready plots. It is easily accessible and ensures high reproducibility. Users can download Trackplot from GitHub and install it from source code or via PyPI, Pipenv, Bioconda, AppImage, or a Docker image. It provides multiple ways to generate plots: an application programming interface (API) for scripts and Jupyter Notebooks, a command-line interface (CLI), and a user-friendly web interface. Trackplot supports most standard bioinformatics data formats, including BAM, BED, bigWig, bigBed, GTF, BedGraph, HiCExplorer’s native h5 format, and depth files generated by samtools [8] (Fig 1).
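Because Trackplot accepts many file formats, a script that assembles a list of tracks may want to infer each file's data category from its name. The following is a minimal sketch of that idea, not part of Trackplot's API: the extension table, the category strings (for example "hic" for HiCExplorer h5 files and "depth" for samtools depth output), and the function name `guess_track_category` are all hypothetical.

```python
from pathlib import Path

# Hypothetical mapping from file extension to track category (illustration only).
EXTENSION_CATEGORIES = {
    ".bam": "bam",
    ".bed": "bed",
    ".bw": "bigwig",
    ".bigwig": "bigwig",
    ".bb": "bigbed",
    ".bigbed": "bigbed",
    ".gtf": "gtf",
    ".bedgraph": "bedgraph",
    ".bdg": "bedgraph",
    ".h5": "hic",
    ".depth": "depth",
}

# Compression suffixes that commonly wrap text formats such as GTF or BED.
COMPRESSION_SUFFIXES = {".gz", ".bgz"}


def guess_track_category(path: str) -> str:
    """Guess a track category from a file name, ignoring compression suffixes."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    # Strip trailing compression suffixes, e.g. "genes.gtf.gz" -> ".gtf".
    while suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    if not suffixes:
        raise ValueError(f"cannot infer a track category for {path!r}")
    category = EXTENSION_CATEGORIES.get(suffixes[-1])
    if category is None:
        raise ValueError(f"unsupported extension {suffixes[-1]!r} in {path!r}")
    return category
```

Only the last non-compression suffix is inspected, so names such as "tumor.sorted.bam" resolve to "bam"; an explicit category in the meta file should always take precedence over such a guess.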
To generate plots, Trackplot requires the genomic coordinates of the region of interest and a meta file specifying, for each data track, the file path, data category, display label, color, and strandedness. Additional annotations, such as cell metadata for demultiplexing or highlighted regions marking polyadenylation sites, can be supplied as parameters. When the domain model is activated, Trackplot automatically queries the UniProt [9] API (https://rest.uniprot.org/uniprotkb/search?&query="ID") with the transcript ID to retrieve the corresponding protein ID; a single transcript may map to multiple protein IDs. Each protein ID associated with the transcript is then used to query the Ensembl [10] API (https://www.ebi.ac.uk/proteins/api/features/"uniprot_ID"). The tool checks whether the length of the coding sequence (CDS) is three times the protein length and, if so, collects the domain information and visualizes it subject to user-defined filters. All configuration information is then stored in a Plot object for further processing. Once configuration is complete, Trackplot uses packages such as pysam, pyBigWig, or hicmatrix to parse the input track files. It extracts the abundance, splice junctions, gene annotation, and protein domains of the target region and stores them in a pandas DataFrame, which can be used for further analysis and processing. Finally, Trackplot renders the plots with matplotlib, which allows flexible adjustment of figure size and resolution (dots per inch, DPI). It also supports various output formats, including PNG, PDF, and TIFF, ensuring compatibility with the requirements of major scientific journals.
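The domain-mapping step described above can be sketched as a few pure functions that operate on already-parsed JSON responses; the HTTP requests themselves are omitted, and this is not Trackplot's implementation. The JSON field names (`results` and `primaryAccession` for the UniProt search response; `features` with `type`, `begin`, `end`, and `description` for the features endpoint) are assumptions about the response schemas, and the `allow_stop` tolerance in `cds_matches_protein` is a hypothetical extension of the length check. `domain_to_genomic` projects 1-based amino-acid positions onto genomic CDS blocks, which is what is needed to draw a domain on a gene model.

```python
from typing import Iterable, List, Sequence, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def accessions_from_search(payload: dict) -> List[str]:
    """Collect unique UniProt accessions from a parsed search response (assumed schema)."""
    accessions = [hit.get("primaryAccession") for hit in payload.get("results", [])]
    # dict.fromkeys removes duplicates while preserving order; empty values are dropped.
    return list(dict.fromkeys(a for a in accessions if a))


def domain_features(payload: dict,
                    wanted_types: Iterable[str] = ("DOMAIN",)) -> List[Tuple[str, int, int]]:
    """Return (description, begin, end) for features of the requested types (assumed schema)."""
    wanted = set(wanted_types)
    return [
        (feat.get("description", ""), int(feat["begin"]), int(feat["end"]))
        for feat in payload.get("features", [])
        if feat.get("type") in wanted
    ]


def cds_matches_protein(cds_length: int, protein_length: int, allow_stop: bool = True) -> bool:
    """Check that the CDS length is three times the protein length.

    allow_stop (a hypothetical option) also accepts one extra codon for an included stop codon.
    """
    if cds_length == 3 * protein_length:
        return True
    return allow_stop and cds_length == 3 * (protein_length + 1)


def domain_to_genomic(begin_aa: int, end_aa: int,
                      cds_blocks: Sequence[Interval], strand: str) -> List[Interval]:
    """Project a 1-based amino-acid range onto CDS blocks sorted by genomic start."""
    # 0-based, inclusive offsets of the domain's nucleotides within the CDS.
    nt_start, nt_end = (begin_aa - 1) * 3, end_aa * 3 - 1
    # Walk the blocks in transcript order (5' to 3').
    ordered = list(cds_blocks) if strand == "+" else list(reversed(cds_blocks))
    pieces: List[Interval] = []
    offset = 0  # CDS offset of the current block's first transcribed base
    for start, end in ordered:
        length = end - start + 1
        lo, hi = max(nt_start, offset), min(nt_end, offset + length - 1)
        if lo <= hi:
            if strand == "+":
                pieces.append((start + lo - offset, start + hi - offset))
            else:
                # On the minus strand, transcript order runs from the block's end downwards.
                pieces.append((end - (hi - offset), end - (lo - offset)))
        offset += length
    return sorted(pieces)
```

A domain that spans an intron comes back as several genomic pieces, one per CDS block it overlaps, so it can be drawn directly on the exons of the gene model.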
In addition to supporting features already present in existing software, such as sample aggregation, normalization to reads per kilobase of transcript per million mapped reads (RPKM) or reads per million mapped reads (RPM), and intron shrinkage, Trackplot outperforms existing sashimi plot tools in speed and efficiency (S1 Fig). In summary, Trackplot provides a highly accessible, reproducible, and flexible tool for plotting genomic data.
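RPM and RPKM follow their standard definitions: RPM = count × 10^6 / total mapped reads, and RPKM = count × 10^9 / (feature length in bp × total mapped reads). Intron shrinkage can be modeled as a piecewise-linear map from genomic to plot coordinates that keeps exons at full width and compresses introns. The sketch below assumes a hypothetical rule in which each intron is drawn at `intron_scale` times its length but never narrower than `min(length, min_intron_width)`; Trackplot's own shrinkage rule and parameter names may differ.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]
Segment = Tuple[float, float, float, float]  # (genomic start, genomic end, plot start, plot end)


def rpm(count: float, total_mapped: float) -> float:
    """Reads per million mapped reads."""
    return count * 1e6 / total_mapped


def rpkm(count: float, feature_length_bp: float, total_mapped: float) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return count * 1e9 / (feature_length_bp * total_mapped)


def merge_intervals(intervals: Sequence[Interval]) -> List[Interval]:
    """Merge overlapping or touching exons pooled from several transcripts."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def shrink_introns(exons: Sequence[Interval], intron_scale: float = 0.1,
                   min_intron_width: float = 50.0) -> List[Segment]:
    """Build piecewise-linear segments: exons keep their width, introns are compressed."""
    segments: List[Segment] = []
    x = 0.0
    prev_end = None
    for start, end in merge_intervals(exons):
        if prev_end is not None:
            gap = start - prev_end
            # Scale the intron, but keep short introns (<= min_intron_width) at full width.
            width = max(gap * intron_scale, min(gap, min_intron_width))
            segments.append((prev_end, start, x, x + width))
            x += width
        segments.append((start, end, x, x + (end - start)))
        x += end - start
        prev_end = end
    return segments


def to_plot_x(pos: float, segments: Sequence[Segment]) -> float:
    """Map a genomic position to the shrunken plot axis by linear interpolation."""
    starts = [seg[0] for seg in segments]
    # Positions outside the covered range are extrapolated from the first or last segment.
    i = max(bisect_right(starts, pos) - 1, 0)
    g0, g1, x0, x1 = segments[i]
    if g1 == g0:
        return x0
    return x0 + (pos - g0) * (x1 - x0) / (g1 - g0)
```

Because the map is monotonic, every track in a figure (coverage, junction arcs, domains) can be passed through the same `to_plot_x` so that all panels stay aligned after shrinking.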
Results
Like previous sashimi plot packages, Trackplot takes all spliced reads, including those spanning novel junctions, from BAM files and gene model annotations from GTF or BED files as input to visualize the differential usage of exons or transcripts. S2A Fig shows an example plot generated by Trackplot for eight bulk RNA-seq samples from the TNP GBM model [11], which suggests gradual exclusion of the middle exon during tumorigenesis. The plot shows that the long isoform, which encodes a protein with key functional domains, is gradually spliced out, while the short isoform lacking these domains becomes the major isoform (S2A Fig). Trackplot also accepts input in various bioinformatics formats, making it flexible in integrating data from multiple sources. By integrating RNA-binding signal data (bigWig) with coverage data (BAM), Trackplot illustrates the enrichment of PTBP1 at exon 2 of PTBP3 (S2B Fig). This observation suggests that PTBP1 may directly regulate the alternative splicing of PTBP3 exon 2, consistent with previous findings [12].
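Counting splice junctions from BAM alignments reduces to walking each read's CIGAR string: every `N` (reference skip) operation marks an intron. The sketch below works on `(reference_start, cigartuples)` pairs using pysam's numeric CIGAR codes, so it runs without a BAM file; the helper names and the `min_reads` filter are illustrative assumptions, not Trackplot's code. With pysam installed, such pairs can be taken from `read.reference_start` and `read.cigartuples` for each read returned by `pysam.AlignmentFile(path, "rb").fetch(chrom, start, end)`.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# pysam's numeric CIGAR codes: 0=M, 1=I, 2=D, 3=N, 4=S, 5=H, 6=P, 7='=', 8=X
CIGAR_REF_SKIP = 3
# Operations that advance along the reference: M, D, N, =, X
CONSUMES_REFERENCE = {0, 2, 3, 7, 8}

Junction = Tuple[int, int]  # 0-based, half-open intron coordinates


def junctions_from_cigar(reference_start: int,
                         cigartuples: Sequence[Tuple[int, int]]) -> List[Junction]:
    """Return the introns (N operations) spanned by one alignment."""
    pos = reference_start
    introns: List[Junction] = []
    for op, length in cigartuples:
        if op == CIGAR_REF_SKIP:
            introns.append((pos, pos + length))
        if op in CONSUMES_REFERENCE:
            pos += length
    return introns


def count_junctions(alignments: Iterable[Tuple[int, Sequence[Tuple[int, int]]]],
                    min_reads: int = 1) -> Counter:
    """Count split reads per junction and drop junctions supported by fewer than min_reads."""
    counts: Counter = Counter()
    for reference_start, cigartuples in alignments:
        counts.update(junctions_from_cigar(reference_start, cigartuples))
    return Counter({junction: n for junction, n in counts.items() if n >= min_reads})
```

In a sashimi plot, each key of the resulting counter becomes an arc between two splice sites, labeled with its read count; novel junctions are included automatically because nothing here consults the annotation.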
The advent of long-read sequencing platforms, such as Pacific Biosciences and Oxford Nanopore Technologies, has revolutionized transcriptome analysis by providing full-length transcript structures without the need for assembly. However, existing sashimi plot tools are designed primarily for short-read sequencing data and display reads by aggregating the depth at each coordinate, thereby losing the exon connectivity within individual reads. Trackplot addresses this limitation with a read-by-read visualization that offers exon-sorting options. This feature enables Trackplot to distinctly present the exon-intron structure of each isoform, providing a more comprehensive view of the transcriptome (S3A Fig). Moreover, Trackplot can extract and visualize additional information from BAM tags, such as poly(A) tail length or the modification status of each base (S3B Fig). With these features, Trackplot offers deeper insight into the complexity and diversity of transcriptomes.
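A read-by-read view needs each long read split into exon-like blocks at its `N` operations, plus a sort order that places reads sharing an intron chain next to each other. The following is a minimal sketch of one possible exon-sort option, again using pysam's numeric CIGAR codes; the function names and the sort key (intron chain, then start position, then read name) are assumptions rather than Trackplot's actual ordering. Per-read values such as poly(A) length could be attached with pysam's `AlignedSegment.get_tag`, but the tag names depend on the upstream tool and are not shown here.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Block = Tuple[int, int]  # 0-based, half-open aligned block on the reference

# pysam's numeric CIGAR codes
MATCH_LIKE = {0, 7, 8}  # M, '=', X
DELETION = 2            # D: kept inside a block so small deletions do not split an exon
REF_SKIP = 3            # N: an intron, which ends the current block


def exon_blocks(reference_start: int, cigartuples: Sequence[Tuple[int, int]]) -> List[Block]:
    """Split one long read into exon-like blocks, breaking only at N operations."""
    blocks: List[Block] = []
    pos = reference_start
    block_start = None
    for op, length in cigartuples:
        if op in MATCH_LIKE or op == DELETION:
            if block_start is None:
                block_start = pos
            pos += length
        elif op == REF_SKIP:
            if block_start is not None:
                blocks.append((block_start, pos))
                block_start = None
            pos += length
        # I, S, H and P do not consume reference bases and are skipped.
    if block_start is not None:
        blocks.append((block_start, pos))
    return blocks


def intron_chain(blocks: Sequence[Block]) -> Tuple[Block, ...]:
    """Introns implied by consecutive blocks; identical chains mean identical splicing."""
    return tuple((blocks[i][1], blocks[i + 1][0]) for i in range(len(blocks) - 1))


def sort_reads_by_structure(reads: Dict[str, List[Block]]) -> List[str]:
    """Order read names so that reads sharing an intron chain are drawn next to each other."""
    def key(name: str):
        blocks = reads[name]
        first_start = blocks[0][0] if blocks else -1
        return (intron_chain(blocks), first_start, name)
    return sorted(reads, key=key)


def isoform_counts(reads: Dict[str, List[Block]]) -> Counter:
    """Number of reads supporting each distinct intron chain."""
    return Counter(intron_chain(blocks) for blocks in reads.values())
```

Sorting on the full intron chain groups reads by isoform regardless of where each read happens to start, which is what keeps truncated reads from scattering across the panel.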
Several methods, including SCAPE [13], have recently been proposed to identify and estimate alternative polyadenylation (APA) events at the single-cell level. Existing tools, however, cannot demultiplex reads into distinct cell populations, so users often have to manually split and deduplicate BAM files before analysis. Trackplot automates this step by demultiplexing and deduplicating reads based on a user-provided meta file that lists cell barcodes and their corresponding cell types. This feature enables Trackplot to present differential APA events among cell populations in 3’-enriched scRNA-seq data more clearly and accurately, as illustrated in S4A Fig. Furthermore, Trackplot supports single-cell data that simultaneously profile the transcriptome and chromatin accessibility. In an example analysis, Trackplot shows differential chromatin accessibility at U2AF1L4 between CD4 naïve T cells and CD16 monocytes, which coincides with distinct usage of alternative polyadenylation sites (pA1 and pA2) in these two cell populations (S4B Fig). These findings highlight the utility of Trackplot for exploring the relationship between transcriptional enhancers and 3’ end processing. In summary, Trackplot provides a comprehensive platform for investigating isoform diversity within cell populations and exploring potential enhancer elements involved in regulating gene and isoform expression. Its automated demultiplexing and its integration of transcriptomic and chromatin accessibility data make it a valuable tool for unraveling the complex regulatory mechanisms underlying gene expression.
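Demultiplexing and deduplication can be sketched as grouping reads by the cell type of their barcode and discarding reads whose (cell barcode, UMI, position) combination has already been seen. In 10x Genomics BAM files the corrected cell barcode and UMI are usually stored in the `CB` and `UB` tags, which pysam exposes through `read.get_tag`. The two-column meta-file layout (`barcode` and `cell_type` headers), the deduplication key, and the function names below are assumptions for illustration; Trackplot's meta-file columns and deduplication rule may differ.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def load_barcode_meta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse a tab-separated meta file with 'barcode' and 'cell_type' columns (hypothetical layout)."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["barcode"]: row["cell_type"] for row in reader}


def demultiplex_dedup(records: Iterable[Tuple[str, str, int]],
                      barcode_to_type: Dict[str, str]) -> Dict[str, List[int]]:
    """Group read positions by cell type, keeping one read per (barcode, UMI, position)."""
    seen: Set[Tuple[str, str, int]] = set()
    by_type: Dict[str, List[int]] = defaultdict(list)
    for barcode, umi, position in records:
        cell_type = barcode_to_type.get(barcode)
        if cell_type is None or not umi:
            continue  # barcode absent from the meta file, or read without a UMI
        key = (barcode, umi, position)
        if key in seen:
            continue  # treated as a PCR duplicate of a read already kept
        seen.add(key)
        by_type[cell_type].append(position)
    return dict(by_type)
```

Each list of positions can then be turned into a per-cell-type coverage track, which is what allows differential APA to be compared across populations from a single pooled BAM file.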
Availability and future directions
With Trackplot, users can integrate multiple data sources from a wide variety of genomic assays and generate publication-ready plots, with flexible input and output formats for NGS data. To ensure maximum reproducibility, Trackplot is distributed through PyPI, Bioconda, and Docker, allowing easy installation and use in different computational environments. It can also be used from the command line or through an API in interactive environments such as Jupyter Notebook. In summary, Trackplot offers an easy, fast, and reliable way to visualize genomic data. Because the tool is written in Python and JavaScript, it is straightforward to maintain, and we will continue to update and upgrade the package based on suggestions and comments from the community.
The Python package Trackplot is open-source and freely available on Docker (https://hub.docker.com/r/ygidtu/trackplot), GitHub (https://github.com/ygidtu/Trackplot), PyPI (https://pypi.org/project/Trackplot/) and Bioconda (https://anaconda.org/bioconda/trackplot). The scripts used to generate S2–S4 Figs, along with several reproducible examples, are available on GitHub (https://github.com/ygidtu/Trackplot/tree/main/example/Article_figures).
Supporting information
Acknowledgments
We thank Prof. Jingwen Lin and all members of Lu Chen’s lab for testing and advice on the alpha version of Trackplot.
Data Availability
The Python package Trackplot is open-source and freely available on Docker (https://hub.docker.com/r/ygidtu/trackplot), GitHub (https://github.com/ygidtu/Trackplot), PyPI (https://pypi.org/project/Trackplot/) and Bioconda (https://anaconda.org/bioconda/trackplot). The scripts used to generate S2–S4 Figs, along with several reproducible examples, are available on GitHub (https://github.com/ygidtu/Trackplot/tree/main/example/Article_figures).
Funding Statement
This work is supported by the National Natural Science Foundation of China (82303975 to R.Z. and 82273117 to Y.W.), the National Key Research and Development Program of China, Stem Cell and Translational Research (2022YFA1105200 to Y.W.), the China Postdoctoral Science Foundation (2022TQ0226 to R.Z.), and Post-Doctor Research Project, West China Hospital, Sichuan University (2023HXBH100 to R.Z.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
1. Wright CJ, Smith CWJ, Jiggins CD. Alternative splicing as a source of phenotypic diversity. Nat Rev Genet. 2022. Epub 2022/07/13. doi: 10.1038/s41576-022-00514-4.
2. Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. 2015;161(5):1202–14. Epub 2015/05/23. doi: 10.1016/j.cell.2015.05.002; PubMed Central PMCID: PMC4481139.
3. Jain M, Olsen HE, Paten B, Akeson M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 2016;17(1):239. Epub 2016/11/27. doi: 10.1186/s13059-016-1103-0; PubMed Central PMCID: PMC5124260.
4. Katz Y, Wang ET, Silterra J, Schwartz S, Wong B, Thorvaldsdottir H, et al. Quantitative visualization of alternative exon expression from RNA-seq data. Bioinformatics. 2015;31(14):2400–2. Epub 2015/01/27. doi: 10.1093/bioinformatics/btv034; PubMed Central PMCID: PMC4542614.
5. Garrido-Martin D, Palumbo E, Guigo R, Breschi A. ggsashimi: Sashimi plot revised for browser- and annotation-independent splicing visualization. PLoS Comput Biol. 2018;14(8):e1006360. Epub 2018/08/18. doi: 10.1371/journal.pcbi.1006360; PubMed Central PMCID: PMC6114895.
6. Wu E, Nance T, Montgomery SB. SplicePlot: a utility for visualizing splicing quantitative trait loci. Bioinformatics. 2014;30(7):1025–6. Epub 2013/12/24. doi: 10.1093/bioinformatics/btt733; PubMed Central PMCID: PMC3967110.
7. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14(2):178–92. Epub 2012/04/21. doi: 10.1093/bib/bbs017; PubMed Central PMCID: PMC3603213.
8. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. Epub 2009/06/10. doi: 10.1093/bioinformatics/btp352; PubMed Central PMCID: PMC2723002.
9. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2018;46(5):2699. Epub 2018/02/10. doi: 10.1093/nar/gky092; PubMed Central PMCID: PMC5861450.
10. Cunningham F, Allen JE, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, et al. Ensembl 2022. Nucleic Acids Res. 2022;50(D1):D988–D95. Epub 2021/11/19. doi: 10.1093/nar/gkab1049; PubMed Central PMCID: PMC8728283.
11. Wang X, Zhou R, Xiong Y, Zhou L, Yan X, Wang M, et al. Sequential fate-switches in stem-like cells drive the tumorigenic trajectory from human neural stem cells to malignant glioma. Cell Res. 2021;31(6):684–702. Epub 2021/01/05. doi: 10.1038/s41422-020-00451-z; PubMed Central PMCID: PMC8169837.
12. Tan LY, Whitfield P, Llorian M, Monzon-Casanova E, Diaz-Munoz MD, Turner M, et al. Generation of functionally distinct isoforms of PTBP3 by alternative splicing and translation initiation. Nucleic Acids Res. 2015;43(11):5586–600. Epub 2015/05/06. doi: 10.1093/nar/gkv429; PubMed Central PMCID: PMC4477659.
13. Zhou R, Xiao X, He P, Zhao Y, Xu M, Zheng X, et al. SCAPE: a mixture model revealing single-cell polyadenylation diversity and cellular dynamics during cell differentiation and reprogramming. Nucleic Acids Res. 2022;50(11):e66. Epub 2022/03/16. doi: 10.1093/nar/gkac167; PubMed Central PMCID: PMC9226526.