Abstract
Circos plots are widely used to display multi-dimensional next-generation genomic data, but existing implementations of Circos are not interactive and support only a limited range of data types. Here, we developed next-generation Circos (NG-Circos), a flexible JavaScript-based circular genome visualization tool for designing highly interactive Circos plots using 21 functional modules with various data types. To our knowledge, NG-Circos is the most powerful software for constructing interactive Circos plots. By supporting diverse data types in a dynamic browser interface, NG-Circos will accelerate next-generation data visualization and interpretation, thus promoting reproducible research in biomedical sciences and beyond. NG-Circos is available at https://wlcb.oit.uci.edu/NG-Circos and https://github.com/YaCui/NG-Circos.
INTRODUCTION
Visualizing increasing volumes of next-generation biological data is critical to the interpretation of such data. Circos plots are circular two-dimensional visual representations that provide a comprehensive solution for presentation and interpretation of multi-dimensional genomic data. Circos (1), the predominant tool for making Circos plots, has been widely used for complex biological data visualization in many studies. However, Circos's outputs are not interactive. Other Circos-derived tools, such as Circoletto (2), CIRCUS (3), J-Circos (4), shinyCircos (5), RCircos (6), Circleator (7), OmicCircos (8) and ggbio (9), are either incapable of producing interactive Circos plots in a web browser or limited to specific data types. Our previously developed tool, BioCircos.js (10), appears to be the only published software capable of producing interactive Circos plots and has become the state-of-the-art tool in the field (11,12). Nonetheless, BioCircos.js (10) implements only nine functional modules, which limits the range of analytical tasks it can support.
To address this weakness, we developed next-generation Circos (NG-Circos), a JavaScript-based circular genome visualization tool that extends the framework of BioCircos.js (10) to integrate and interpret genomic data types through interactive Circos plots. NG-Circos currently contains 21 modules, enabling various functions that are absent from other tools (including BioCircos.js (10)). By supporting diverse genomic data types in an interactive browser interface, NG-Circos will accelerate next-generation data visualization and interpretation, thus promoting reproducible research in biomedical sciences and beyond.
MATERIALS AND METHODS
Implementation of NG-Circos
NG-Circos is written in JavaScript and generates interactive graphics as SVG elements using D3.js (Data-Driven Documents) and jQuery.js. Because it is JavaScript-based, NG-Circos can be used without installing additional packages. After downloading NG-Circos, users can reproduce almost all circular plots drawn by Circos in a web browser. Note that NG-Circos itself is not a web application but a library for building interactive Circos plots in web applications.
Implementing image-download function in NG-Circos
The download function in NG-Circos is built using svg-crowbar.js (https://nytimes.github.io/svg-crowbar/) from The New York Times. NG-Circos currently supports the SVG and PNG formats. The SVG format gives users high-quality images that can be edited further in Adobe Illustrator.
Input data processing in NG-Circos
We provide a data processing script (written in Python and shell) for processing raw data, enabling users to easily transform their data into JSON format with default parameters for the corresponding module. Notably, the input data of NG-Circos can be either generated by the supporting Python scripts or written directly in the well-documented JSON data formats. Users can therefore integrate NG-Circos into an existing JavaScript-based web application that has its own internal JSON data structures. We provide an example for each module to illustrate the input data structure and all the steps needed to recreate that example (https://wlcb.oit.uci.edu/modules/).
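As a rough illustration of the kind of transformation described above, the following Python sketch converts tab-delimited records into a JSON array. It is not one of the scripts distributed with NG-Circos: the function name `tsv_to_json`, the column names (`chr`, `pos`, `value`) and the toy values are assumptions made for this example, and the keys each module actually expects are defined in the module documentation at the URL above.

```python
import csv
import io
import json


def tsv_to_json(tsv_text, numeric_fields=("pos", "value")):
    """Convert tab-delimited text with a header row into a JSON array string.

    The field names are hypothetical; consult the module documentation for
    the keys that each NG-Circos module actually expects.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = []
    for row in reader:
        record = dict(row)
        for field in numeric_fields:
            if field in record:
                # Store whole numbers as int and everything else as float.
                number = float(record[field])
                record[field] = int(number) if number.is_integer() else number
        records.append(record)
    return json.dumps(records, indent=2)


# Toy input, not real genomic data.
example = "chr\tpos\tvalue\n1\t1000\t0.5\n2\t2500\t3\n"
print(tsv_to_json(example))
```

Keeping the conversion in a single function makes it straightforward to swap in the key names required by a given module without touching the parsing logic.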
Processing GWAS data in LocusZoom plot
For Figure 1F, we used PLINK (13) to calculate r² values for specific populations and to extract recombination rates from the HapMap3 data (14) for the specified SNPs.
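For readers unfamiliar with the statistic, r² between two biallelic SNPs can be defined from haplotype and allele frequencies as r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), with D = p_AB − p_A p_B. The Python sketch below applies this definition to phased haplotypes. It illustrates the definition only and is not the procedure PLINK uses, since PLINK operates on genotype data; the function name and the toy haplotypes are assumptions.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic SNPs from phased haplotypes.

    `haplotypes` is a list of (a, b) pairs, where a and b are the 0/1
    alleles carried at SNP A and SNP B on the same haplotype.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


# Toy haplotypes: the two SNPs always carry the same allele.
print(ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)]))
```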
Web browsers supported by NG-Circos
The running speed of NG-Circos depends on the computing power of the browser and hardware. NG-Circos has been tested and debugged in all major web browsers, including Google Chrome, Internet Explorer/Edge, Mozilla Firefox, Safari and Opera.
RESULTS
Workflow of NG-Circos
NG-Circos has a user-friendly workflow with three main steps for drawing an interactive Circos plot. Step 1 draws chromosomes (or other segments) as the coordinate axes. Step 2 adds data tracks using the relevant modules, with high flexibility in module choice (21 modules are currently implemented; Supplementary Table S1). The input data of NG-Circos can be either generated by the supporting Python scripts or written directly in the well-documented JSON data formats. For each module, we provide one example that includes the input data files and all the steps needed to recreate it (https://wlcb.oit.uci.edu/modules/). Finally, step 3 adds interactive animations, mouse events (Supplementary Table S2) and toolboxes for graphic elements. NG-Circos is highly customizable, allowing users to adjust settings to their needs. We also provide a set of carefully evaluated default settings for each module and many demos to make NG-Circos easy to use. In addition, the capabilities of NG-Circos can be extended simply by including more functional modules in step 2.
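To make steps 1 and 2 more concrete, the sketch below shows one way a circular layout can assign each segment an angular span and then place a data point on a track. It is a conceptual illustration in Python rather than NG-Circos code; the function names, the gap size and the toy segment lengths are all assumptions.

```python
import math


def layout_segments(lengths, gap_deg=2.0):
    """Step 1: give each segment an angular span proportional to its length.

    `lengths` maps segment name -> length in bp (insertion order is kept).
    Returns name -> (start_angle, end_angle) in degrees, leaving `gap_deg`
    of empty space after every segment.
    """
    total_bp = sum(lengths.values())
    usable = 360.0 - gap_deg * len(lengths)
    if usable <= 0:
        raise ValueError("gaps leave no room for the segments")
    spans, cursor = {}, 0.0
    for name, length in lengths.items():
        width = usable * length / total_bp
        spans[name] = (cursor, cursor + width)
        cursor += width + gap_deg
    return spans


def place_point(spans, lengths, name, pos, radius):
    """Step 2: convert (segment, position) into x/y on a track of `radius`.

    Uses SVG-style coordinates (y grows downward), so an angle of
    0 degrees lands at the 12 o'clock position.
    """
    start, end = spans[name]
    angle = start + (end - start) * pos / lengths[name]
    theta = math.radians(angle - 90.0)
    return radius * math.cos(theta), radius * math.sin(theta)


lengths = {"segA": 300, "segB": 100}  # toy lengths, not real chromosomes
spans = layout_segments(lengths, gap_deg=10.0)
print(spans)
print(place_point(spans, lengths, "segA", 150, radius=100.0))
```

Separating the layout from the point placement mirrors the workflow above: the axes are computed once, and any number of data tracks can then reuse them.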
NG-Circos provides flexible module choices for diverse Circos plots
The current version of NG-Circos consists of 21 modules (Supplementary Table S1). Combining these modules allows users to construct diverse types of Circos plots. For example, NG-Circos can reproduce complex published Circos plots (15) by combining the ARC, GENE, HEATMAP, LINK and WIG modules (Figure 1A). Beyond reproducing published plots, NG-Circos also provides interactive versions of popular plot types (e.g. Lollipop, Wig and LocusZoom (16) plots; Figure 1B–F) (15,17–19) that are not available in other tools. Moreover, we offer more demos on the website (https://wlcb.oit.uci.edu/NG-Circos) to show the power of this tool: users can easily replace the demo data with their own data to produce their own plots. All figures can be downloaded in the SVG and PNG formats; the SVG format gives users high-quality images that can be edited further in other applications such as Adobe Illustrator. Overall, NG-Circos offers users great flexibility in module choices and Circos plot types.
Case study for interactive data exploration using NG-Circos
Here we present a case study to further illustrate the power of interactive data exploration using NG-Circos. In this case, users can interactively explore driver single nucleotide polymorphisms (SNPs), gene fusions and their impact on protein structure in lung cancer (Figure 2). For example, mouseover events show the SNP frequencies in lung cancer from the Catalogue of Somatic Mutations in Cancer (COSMIC) database (Figure 2B) (20) and the three-dimensional (3D) protein structure of an EML4-ALK gene fusion (Figure 2C) (21). NG-Circos can also link elements (such as SNPs or gene fusions) to external resources. For instance, clicking on a SNP, such as the EGFR T790M variant, opens a Protein Data Bank (PDB) webpage displaying the 3D structure of EGFR affected by the T790M variant (Figure 2D; PDB code: 2JIT) (22). In summary, NG-Circos lets users explore genomic data interactively and extract additional information by hovering over and clicking on elements of the plot.
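One simple way to support this kind of link-out is to attach a target URL to each record before it is handed to the plotting code, so that a click handler only has to open the stored address. The Python sketch below does this for PDB identifiers. The record keys (`name`, `pdb`, `link`) and the helper names are hypothetical, and the URL pattern for RCSB PDB structure pages is an assumption of this example rather than something taken from NG-Circos.

```python
# Assumed URL pattern for RCSB PDB structure pages.
RCSB_STRUCTURE_URL = "https://www.rcsb.org/structure/{pdb_id}"


def pdb_link(pdb_id):
    """Build a structure-page URL for a four-character PDB identifier."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB code: {pdb_id!r}")
    return RCSB_STRUCTURE_URL.format(pdb_id=pdb_id)


def add_links(records):
    """Return copies of `records` with a 'link' key where a PDB code exists."""
    linked = []
    for record in records:
        record = dict(record)  # copy so the caller's data is not modified
        if record.get("pdb"):
            record["link"] = pdb_link(record["pdb"])
        linked.append(record)
    return linked


# Hypothetical records; only the first one has a structure to link to.
snps = [{"name": "EGFR T790M", "pdb": "2JIT"}, {"name": "unannotated SNP"}]
print(add_links(snps))
```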
DISCUSSION
Interactive data exploration across diverse data types will promote next-generation data visualization and interpretation, as successful examples in cancer research such as cBioPortal (23) have shown. Circos plots are widely used to display voluminous next-generation genomic data, but existing implementations of Circos do not generate interactive outputs, which hinders their usability. To address this issue, NG-Circos provides flexible module choices for interactive data exploration and diverse Circos plot types. As additional types of genomic data are generated in the future, we will keep adding functional modules to extend the power of NG-Circos. We will also actively maintain NG-Circos and respond to inquiries from users. By supporting diverse types of genomic data in an interactive web interface, NG-Circos will, we believe, enhance genomic research in the biomedical field.
ACKNOWLEDGEMENTS
We acknowledge Tianyi Zang, Yadong Wang and members of the Li lab for constructive discussions and support.
Contributor Information
Ya Cui, Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, CA 92697, USA.
Zhe Cui, Division of Biostatistics, Dan L Duncan Cancer Center and Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA.
Jianfeng Xu, Division of Biostatistics, Dan L Duncan Cancer Center and Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA.
Dapeng Hao, Division of Biostatistics, Dan L Duncan Cancer Center and Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA.
Jiejun Shi, Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, CA 92697, USA.
Dan Wang, Department of Medicine, Division of Cardiology, University of California, Los Angeles, CA 90095, USA.
Hui Xiao, Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, CA 92697, USA.
Xiaohong Duan, ChosenMed Technology (Beijing) Co. Ltd, Beijing 100176, China.
Runsheng Chen, CAS Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.
Wei Li, Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, CA 92697, USA.
SUPPLEMENTARY DATA
Supplementary Data are available at NARGAB Online.
FUNDING
No external funding.
Conflict of interest statement. None declared.
REFERENCES
- 1. Krzywinski M., Schein J., Birol I., Connors J., Gascoyne R., Horsman D., Jones S.J., Marra M.A. Circos: an information aesthetic for comparative genomics. Genome Res. 2009; 19:1639–1645.
- 2. Darzentas N. Circoletto: visualizing sequence similarity with Circos. Bioinformatics. 2010; 26:2620–2621.
- 3. Naquin D., d’Aubenton-Carafa Y., Thermes C., Silvain M. CIRCUS: a package for Circos display of structural genome variations from paired-end and mate-pair sequencing data. BMC Bioinformatics. 2014; 15:198.
- 4. An J., Lai J., Sajjanhar A., Batra J., Wang C., Nelson C.C. J-Circos: an interactive Circos plotter. Bioinformatics. 2015; 31:1463–1465.
- 5. Yu Y., Ouyang Y., Yao W. ShinyCircos: an R/Shiny application for interactive creation of Circos plot. Bioinformatics. 2018; 34:1229–1231.
- 6. Zhang H., Meltzer P., Davis S. RCircos: an R package for Circos 2D track plots. BMC Bioinformatics. 2013; 14:244.
- 7. Crabtree J., Agrawal S., Mahurkar A., Myers G.S., Rasko D.A., White O. Circleator: flexible circular visualization of genome-associated data with BioPerl and SVG. Bioinformatics. 2014; 30:3125–3127.
- 8. Hu Y., Yan C., Hsu C.H., Chen Q.R., Niu K., Komatsoulis G.A., Meerzaman D. Omiccircos: a simple-to-use R package for the circular visualization of multidimensional Omics data. Cancer Inform. 2014; 13:13–20.
- 9. Yin T., Cook D., Lawrence M. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol. 2012; 13:R77.
- 10. Cui Y., Chen X., Luo H., Fan Z., Luo J., He S., Yue H., Zhang P., Chen R. BioCircos.js: an interactive Circos JavaScript library for biological data visualization on web applications. Bioinformatics. 2016; 32:1740–1742.
- 11. Juanillas V., Dereeper A., Beaume N., Droc G., Dizon J., Mendoza J.R., Perdon J.P., Mansueto L., Triplett L., Lang J. et al. Rice galaxy: an open resource for plant science. Gigascience. 2019; 8:giz028.
- 12. Nott A., Holtman I.R., Coufal N.G., Schlachetzki J.C.M., Yu M., Hu R., Han C.Z., Pena M., Xiao J., Wu Y. et al. Brain cell type–specific enhancer–promoter interactome maps and disease-risk association. Science. 2019; 366:1134–1139.
- 13. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A.R., Bender D., Maller J., Sklar P., De Bakker P.I.W., Daly M.J. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007; 81:559–575.
- 14. Belmont J.W., Hardenbol P., Willis T.D., Yu F., Yang H., Ch’Ang L.Y., Huang W., Liu B., Shen Y., Tam P.K.H. et al. The international HapMap project. Nature. 2003; 426:789–796.
- 15. Akdemir K.C., Jain A.K., Allton K., Aronow B., Xu X., Cooney A.J., Li W., Barton M.C. Genome-wide profiling reveals stimulus-specific functions of p53 during differentiation and DNA damage of human embryonic stem cells. Nucleic Acids Res. 2014; 42:205–223.
- 16. Pruim R.J., Welch R.P., Sanna S., Teslovich T.M., Chines P.S., Gliedt T.P., Boehnke M., Abecasis G.R., Willer C.J., Frishman D. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2011; 26:2336–2337.
- 17. Twohig J.P., Cardus Figueras A., Andrews R., Wiede F., Cossins B.C., Derrac Soria A., Lewis M.J., Townsend M.J., Millrine D., Li J. et al. Activation of naïve CD4+ T cells re-tunes STAT1 signaling to deliver unique cytokine responses in memory CD4+ T cells. Nat. Immunol. 2019; 20:458–470.
- 18. Schultheis A.M., Martelotto L.G., De Filippo M.R., Piscuglio S., Ng C.K.Y., Hussein Y.R., Reis-Filho J.S., Soslow R.A., Weigelt B. TP53 mutational spectrum in endometrioid and serous endometrial cancers. Int. J. Gynecol. Pathol. 2016; 35:289–300.
- 19. Cho S.W., Xu J., Sun R., Mumbach M.R., Carter A.C., Chen Y.G., Yost K.E., Kim J., He J., Nevins S.A. et al. Promoter of lncRNA gene PVT1 is a tumor-suppressor DNA boundary element. Cell. 2018; 173:1398–1412.
- 20. Forbes S.A., Beare D., Boutselakis H., Bamford S., Bindal N., Tate J., Cole C.G., Ward S., Dawson E., Ponting L. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 2017; 45:D777–D783.
- 21. Wang D., Li D., Qin G., Zhang W., Ouyang J., Zhang M., Xie L. The structural characterization of tumor fusion genes and proteins. Comput. Math. Methods Med. 2015; 2015:doi:10.1155/2015/912742.
- 22. Yun C.H., Mengwasser K.E., Toms A.V., Woo M.S., Greulich H., Wong K.K., Meyerson M., Eck M.J. The T790M mutation in EGFR kinase causes drug resistance by increasing the affinity for ATP. Proc. Natl. Acad. Sci. U.S.A. 2008; 105:2070–2075.
- 23. Gao J., Aksoy B.A., Dogrusoz U., Dresdner G., Gross B., Sumer S.O., Sun Y., Jacobsen A., Sinha R., Larsson E. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 2013; 6:pl1.
- 24. Jiang S., Xie Y., He Z., Zhang Y., Zhao Y., Chen L., Zheng Y., Miao Y., Zuo Z., Ren J. m6ASNP: a tool for annotating genetic variants by m6A function. Gigascience. 2018; 7:giy035.
- 25. Mateo L., Guitart-Pla O., Pons C., Duran-Frigola M., Mosca R., Aloy P. A PanorOmic view of personal cancer genomes. Nucleic Acids Res. 2017; 45:W195–W200.
- 26. Teng X., Chen X., Xue H., Tang Y., Zhang P., Kang Q., Hao Y., Chen R., Zhao Y., He S. NPInter v4.0: an integrated database of ncRNA interactions. Nucleic Acids Res. 2020; 48:D160–D165.