Trackplot: a fast and lightweight R script for epigenomic enrichment plots

Anand Mayakonda; Frank Westermann

doi:10.1093/bioadv/vbae031

. 2024 Feb 28;4(1):vbae031. doi: 10.1093/bioadv/vbae031

Trackplot: a fast and lightweight R script for epigenomic enrichment plots

Anand Mayakonda ^1,^2,^✉, Frank Westermann ^3,⁴

Editor: Aida Ouangraoua

PMCID: PMC10932608 PMID: 38476298

Abstract

Motivation

BigWig files serve as essential inputs in epigenomic data visualization. However, current R packages for visualizing these files are limited, slow, and burdened by numerous dependencies.

Results

We introduce trackplot, a minimal R script designed for the rapid generation of integrative genomics viewer (IGV) style track plots, profile plots, and heatmaps from bigWig files. This script offers speed, owing to its reliance on bwtool, resulting in performance gains of several magnitudes compared to equivalent packages. The script is lightweight, requiring only the data.table and bwtool packages as primary dependencies. Notably, the plots are generated in base R graphics, eliminating the need for additional packages. trackplot queries the University of California Santa Cruz (UCSC) genome browser for gene models thereby enhancing the reproducibility of analyses. The script extends its support to general transfer format (GTF) further enhancing its versatility. This tool addresses the gaps in existing bigWig visualization approaches by offering speed, simplicity, and minimal dependencies, thereby presenting a valuable asset to researchers in the fields of epigenomics.

Availability and implementation

trackplot is implemented in R is made available under MIT license at https://github.com/PoisonAlien/trackplot.

1 Introduction

Efficient data visualization is a critical step in the analysis of epigenomics and transcriptomics, enabling the interpretation of complex biological insights. These analyses often demand the representation of quantitative variations across genomic locations and experimental conditions. Notably, sequencing methodologies like chromatin immunoprecipitation with sequencing (ChIP-seq), assays for transposase-accessible chromatin with sequencing (ATAC-seq), and whole genome bisulfite sequencing (WGBS) produce voluminous data, underscoring the significance of effective visualization tools. BigWig files have emerged as essential components of data representation (Kent et al. 2010). These binary, compressed, and indexed files offer rapid data accessibility, making them the preferred choice for storing extensive datasets. As such, they also constitute the primary data source for visualization.

However, the current landscape of data visualization tools in R such as Gviz and karyoploteR comes with limitations (Hahne and Ivanek 2016, Gel and Serra 2017). These tools often suffer from cumbersome dependencies and speed when handling large datasets.

To address these limitations, we introduce trackplot—an R- script that takes a lightweight and efficient approach, leveraging bwtool (Pohl and Beato 2014) to significantly enhance data extraction from bigWig files, and base R graphics for complex visualizations. Notably, trackplot significantly minimizes dependencies, simplifying installation and maintenance. This provides an elegant solution for streamlined epigenomic data analyses.

2 Results

2.1 Lightweight and minimal dependency

trackplot leverages the data.table R package for rapid data import and manipulation, utilizing bwtool to process bigWig files of queried regions. Dynamic SQL queries enable the extraction of gene models from the UCSC genome browser (Nassar et al. 2023), and complex visualizations are executed using base R graphics, circumventing heavy dependencies like ggplot2 and tidyverse packages—that often suffer from version compatibility issues, external system dependencies, longer run-times, and frequent installation errors. These design choices result in swift processing and a lightweight package structure suitable for UNIX-based systems—commonly used in bioinformatic analysis. Comparative analyses of package dependencies, in contrast to similar R packages such as Gviz, and karyoploteR, underscore the advantages of trackplot (Fig. 1A) (Gu and Hübschmann 2022). Its efficient, dependency-light approach facilitates accelerated analysis and visualization workflows.

Figure 1. — (A) Number of dependency packages required to install *trackplot, karyoploteR*, and *Gviz* R packages (lower is better). (B) Heatmap and (C) Profile plot for BRD4 binding sites in U87 cell lines treated with DMSO and BRD4 degrader at 2 and 24 hours respectively (data from GSE99171 (Xu *et al.* 2018)). (D) Time taken by *trackplot* and *deeptools* to generate (B) and (C). (E) ENCODE ChIP-seq data for human embryonic stem cell line H1 visualized as an IGV style track. The top panel shows OCT4 and NANOG binding sites. The bottom plots show chromHMM models, gene models, and cytoband extracted from the UCSC genome browser. (F) Principal component analysis of hematopoietic stem and myeloid progenitor cells based on the H3K27Ac signal across the promoters of the protein-coding genes (1200 bp upstream and 800 bp downstream of transcription start sites). The plot shows the standard differentiation trajectory of HSCs on principal component 2 (top to bottom). Data was obtained from GEO accession number GSE231426 (Subramanian *et al.* 2023). HSC: Haemopoietic stem cells; CMP: Common myeloid progenitor; MEP: Megakaryocte-Erythroid Progenitor cells; GMP: Granulocyte monocyte progenitor cells.

2.2 Profile plots and heatmaps for epigenomic datasets

tracklplot allows generating profile plots and heatmaps, two fundamental visualizations in epigenomic datasets such as ChIP-seq and ATAC-seq (Fig. 1B and C). Leveraging the bwtool, the script extracts the underlying signal for user-defined genomic regions from BED files or can automatically retrieve transcript models from the UCSC genome browser. Comparing trackplot with deeptools (Ramírez et al. 2014) – a commonly used software for processing bigwig files, showed that trackplot is up to 20 times faster for generating profile plots and heatmaps from the same dataset (Fig. 1D). Of note, extracting signals from bigwig files across thousands of genomic regions is often computationally expensive and time consuming. Trackplot significantly reduces the run times while offering additional functionalities. This speed is primarily owed to the efficient processing of bigwig files by bwtool which is further enhanced by rapid aggregation and plotting of the resulting output in R. Please refer to the Supplementary Material for the details on the datasets, reproducible code, and bench-marking experiments.

2.3 Visualizing IGV style tracks

Key visualization in trackplot is a user-friendly mechanism for generating IGV (Integrative Genomics Viewer) style signal tracks (Fig. 1E). This functionality allows for the extraction and visualization of tracks from either specific genomic regions of interest or individual target genes. The script extracts gene models for the queried regions from the UCSC genome browser database by employing dynamic SQL queries. Additionally, the tool extends its capabilities to incorporate chromHMM data visualization. chromHMM models can be from user-provided datasets or loaded from 20 cell lines accessible via the UCSC genome browser. The option to incorporate custom gene models in GTF format adds a layer of flexibility to the visualization process. Moreover, the plotting function enables customization such as highlighting target loci of interest, summarizing replicates per condition, and auto-scaling the tracks. Notably, in the absence of bigWig files the script accommodates the use of narrowPeak or broadPeak files to depict peak intensities.

2.4 Comprehensive insights through summary plots and PCA

In addition to the plotting functions, trackplot can be employed to extract peak intensities, thereby enabling a quantitative overview of genomic loci across conditions. In particular, for uniformly processed and normalized bigWig files, these values can be utilized for dimensional reduction such as principal component analysis (PCA; Fig. 1F).

3 Conclusion

Although trackplot does not implement any novel algorithms or data structures, it excels in bringing together existing solutions to provide an easy-to-use toolset—implemented in R programming language. Overall, trackplot offers an elegant data visualization solution for epigenomic datasets. Its versatile functionalities can be used for various analysis needs, ranging from IGV-style genomic tracks to profile plots, heatmaps, summary plots, and PCA. Furthermore, by mitigating the heavy R package dependencies, trackplot demonstrates its potential in data visualization while offering speed and ease of use.

Supplementary Material

vbae031_Supplementary_Data

vbae031_supplementary_data.zip^{(1.1MB, zip)}

Acknowledgements

The authors would like to thank Joschka Hey for the helpful suggestions and all the users for their valuable feedback and feature requests.

Contributor Information

Anand Mayakonda, Hopp Children’s Cancer Center (KiTZ), 69120 Heidelberg, Germany; Division of Neuroblastoma Genomics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany.

Frank Westermann, Hopp Children’s Cancer Center (KiTZ), 69120 Heidelberg, Germany; Division of Neuroblastoma Genomics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany.

Supplementary data

Supplementary data are available at Bioinformatics Advances online.

Conflict of interest

None declared.

Funding

This work was supported by the Dieter-Hopp-Foundation grant “Innovative Therapies for Children with Cancer—International Data Integration Platform to Prioritize Drug Development and Access for Children with Cancer” – work package 5: Data visualization and Sharing.

Data availability

All the data sources used are properly cited with the accession numbers wherever required.

References

Gel B, Serra E.. karyoploteR: an R/bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics 2017;33:3088–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gu Z, Hübschmann D.. Pkgndep: a tool for analyzing dependency heaviness of R packages. Bioinformatics 2022;38:4248–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hahne F, Ivanek R.. Visualizing genomic data using gviz and bioconductor. Methods Mol Biol 2016;1418:335–51. [DOI] [PubMed] [Google Scholar]
Kent WJ, Zweig AS, Barber G. et al. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 2010;26:2204–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nassar LR, Barber GP, Benet-Pagès A. et al. The UCSC genome browser database: 2023 update. Nucleic Acids Res 2023;51:D1188–95. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pohl A, Beato M.. bwtool: a tool for bigWig files. Bioinformatics 2014;30:1618–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ramírez F, Dündar F, Diehl S. et al. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res 2014;42:W187–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
Subramanian S, Thoms JAI, Huang Y. et al. Genome-wide transcription factor-binding maps reveal cell-specific changes in the regulatory architecture of human HSPCs. Blood 2023;142:1448–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
Xu L, Chen Y, Mayakonda A. et al. Targetable BET proteins- and E2F1-dependent transcriptional program maintains the malignancy of glioblastoma. Proc Natl Acad Sci USA 2018;115:E5086–95. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

vbae031_Supplementary_Data

vbae031_supplementary_data.zip^{(1.1MB, zip)}

Data Availability Statement

All the data sources used are properly cited with the accession numbers wherever required.

[vbae031-B1] Gel B, Serra E.. karyoploteR: an R/bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics 2017;33:3088–90. [DOI] [PMC free article] [PubMed] [Google Scholar]

[vbae031-B2] Gu Z, Hübschmann D.. Pkgndep: a tool for analyzing dependency heaviness of R packages. Bioinformatics 2022;38:4248–51. [DOI] [PMC free article] [PubMed] [Google Scholar]

[vbae031-B3] Hahne F, Ivanek R.. Visualizing genomic data using gviz and bioconductor. Methods Mol Biol 2016;1418:335–51. [DOI] [PubMed] [Google Scholar]

[vbae031-B4] Kent WJ, Zweig AS, Barber G. et al. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 2010;26:2204–7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[vbae031-B5] Nassar LR, Barber GP, Benet-Pagès A. et al. The UCSC genome browser database: 2023 update. Nucleic Acids Res 2023;51:D1188–95. [DOI] [PMC free article] [PubMed] [Google Scholar]

[vbae031-B6] Pohl A, Beato M.. bwtool: a tool for bigWig files. Bioinformatics 2014;30:1618–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[vbae031-B7] Ramírez F, Dündar F, Diehl S. et al. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res 2014;42:W187–91. [DOI] [PMC free article] [PubMed] [Google Scholar]

[vbae031-B8] Subramanian S, Thoms JAI, Huang Y. et al. Genome-wide transcription factor-binding maps reveal cell-specific changes in the regulatory architecture of human HSPCs. Blood 2023;142:1448–62. [DOI] [PMC free article] [PubMed] [Google Scholar]

[vbae031-B9] Xu L, Chen Y, Mayakonda A. et al. Targetable BET proteins- and E2F1-dependent transcriptional program maintains the malignancy of glioblastoma. Proc Natl Acad Sci USA 2018;115:E5086–95. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Trackplot: a fast and lightweight R script for epigenomic enrichment plots

Anand Mayakonda

Frank Westermann

Roles

Abstract

Motivation

Results

Availability and implementation

1 Introduction

2 Results

2.1 Lightweight and minimal dependency

Figure 1.

2.2 Profile plots and heatmaps for epigenomic datasets

2.3 Visualizing IGV style tracks

2.4 Comprehensive insights through summary plots and PCA

3 Conclusion

Supplementary Material

Acknowledgements

Contributor Information

Supplementary data

Conflict of interest

Funding

Data availability

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Trackplot: a fast and lightweight R script for epigenomic enrichment plots

Anand Mayakonda

Frank Westermann

Roles

Abstract

Motivation

Results

Availability and implementation

1 Introduction

2 Results

2.1 Lightweight and minimal dependency

Figure 1.

2.2 Profile plots and heatmaps for epigenomic datasets

2.3 Visualizing IGV style tracks

2.4 Comprehensive insights through summary plots and PCA

3 Conclusion

Supplementary Material

Acknowledgements

Contributor Information

Supplementary data

Conflict of interest

Funding

Data availability

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases