Abstract
VennPainter is a program for depicting unique and shared sets of gene lists and generating Venn diagrams, built with the Qt C++ framework. The software produces Classic Venn, Edwards’ Venn and Nested Venn diagrams and allows for up to eight sets in graph mode and up to 31 sets in data-processing-only mode. In comparison, previous programs produce Classic Venn and Edwards’ Venn diagrams and allow for a maximum of six sets. The software incorporates user-friendly features and works on Windows, Linux and Mac OS. Its graphical interface does not require a user to have programming skills. Because diagrams are output as Scalable Vector Graphics, users can modify diagram content for up to eight datasets. VennPainter can provide output results in vertical, horizontal and matrix formats, which facilitates sharing datasets as required for further identification of candidate genes. Users can obtain gene lists from shared sets by clicking the numbers on the diagram. Thus, VennPainter is an easy-to-use, highly efficient, cross-platform and powerful program that provides a more comprehensive tool for identifying candidate genes and visualizing the relationships among genes or gene families in comparative analysis.
Introduction
In comparative genomics, the visualization of results can help viewers discover correlations and trends in large datasets [1–4]. Many methods can visualize statistical analysis (e.g., scatter diagrams, line graphs, and histograms) [2,3,5–8], biological networks (e.g., pathway and functional networks) [7,9–11] and comparisons of large-scale ‘omic’ data (e.g., clusters, heatmaps, and circsters) [1,3,4,6,12]. Venn diagrams, first developed by John Venn in 1880 [13], are widely used for comparing multiple genomic, transcriptomic and proteomic datasets due to their ease of interpretation and graphical simplicity [14–21]. These diagrams help to identify candidate genes and gene networks for downstream analyses. For example, the simple n-Venn diagram is a collection of n simple intersecting closed curves in the plane. It indicates the relationships among datasets, including intersections, sums and complements [13,22]. The curves divide the plane into $2^n - 1$ distinct intersections, each defined by its intersection of the interior or exterior of each of the curves [23]. Generally, the Classic Venn diagram deciphers no more than four sets. The development of Symbolic Logic has facilitated several approaches for constructing Venn diagrams with more than five sets, including Classic, Edwards’, Lewis Carroll’s and Nested Venn diagrams [24,25]. Edwards’ and Nested methods might generate Venn diagrams for an unlimited number of sets, but the partition of sets among multiple datasets might have complex associations because the number of distinct open regions increases exponentially with the number of sets. This makes it difficult to generate intuitive diagrams that display associations among datasets.
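To make the exponential growth concrete, the short Python sketch below counts the regions of an n-set diagram as the number of non-empty membership patterns, $2^n - 1$. The function name `region_count` is only illustrative and is not part of VennPainter.

```python
def region_count(n_sets: int) -> int:
    """Number of non-empty membership patterns (regions) among n_sets sets."""
    return 2 ** n_sets - 1


# Set counts that appear in this article: typical small diagrams,
# the six-set limit of earlier tools, and the eight-set and 31-set limits.
for n in (3, 4, 6, 7, 8, 31):
    print(f"{n:>2} sets -> {region_count(n):,} regions")
```

Going from six to eight sets already raises the number of regions that must be drawn and labelled from 63 to 255.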
Many open access programs can generate Venn diagrams, such as Venny [26], VennDiagram [27], BioVenn [28], GeneVenn [29], 4-way Venn Diagram Generator, DrawVenn, VennMaster [30], VennPlex [31], VennTure [32] and others. However, these programs have some limitations. For example, DrawVenn requires the manual drawing of diagrams and cannot process data. VennMaster [30] provides area-proportional Euler diagrams for functional GO analysis of microarrays only. VennPlex [31] compares and visualizes datasets with differentially regulated data points. The powerful VennDiagram package [27] generates Venn and Euler diagrams in R and provides a large number of customizable features. Unfortunately, its command-line operation is not user-friendly. VennTure [32] can generate six-set Venn diagrams with a graphical user interface (GUI), yet it consumes large amounts of memory and has low computational efficiency. Venny [26], BioVenn [28], GeneVenn [29], and 4-way Venn Diagram Generator are web applications. Despite being powerful and user-friendly, none of them can evaluate more than four datasets. The latest program, jVenn [33], can handle six input lists at most but only provides Classic and Edwards’ Venn diagrams.
Available programs generate no more than six-set Venn diagrams and only support Classic and Edwards’ Venn layouts. Larger datasets often present an insurmountable challenge to deciphering and drawing Venn diagrams of shared relationships manually. This complexity might explain the dearth of applications [34–37]. To rectify this limitation and address Venn-based demands, we report the development of VennPainter, a program that introduces a new nested Venn layout. Fig 1 illustrates seven-set Edwards’ (Fig 1a) and Classic (Fig 1b) Venn diagrams. The former illustrates that intersections become smaller with increasing numbers of sets, which presents a challenge for interpretation. The irregular curves of the latter approach are equally challenging. In comparison, the nested Venn (Fig 1c) is far more easily interpreted. VennPainter incorporates the nested Venn layout and increases the number of allowable datasets to eight with diagram output. It also offers text output for up to 31 datasets for downstream analyses. Further, VennPainter improves computational efficiency.
Implementation and Method
VennPainter and its availability
VennPainter (Fig 2) was developed with Qt 4.8.5 under its LGPL v2.1 license. The Qt C++ framework was chosen for its cross-platform capabilities, open-source nature, and secure language construction for communicating between objects (signals and slots) (http://qt-project.org/). For analyses of nine to 31 datasets, VennPainter provides vertical, horizontal and matrix text-based formats for the benefit of downstream analyses. The user manual and basic instructions appear in the initial interface of the program, and can be downloaded together with VennPainter at https://github.com/linguoliang/VennPainter/.
Algorithm
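VennPainter uses set theory to generate Venn diagrams. For two sets $A$ and $B$, the intersection is defined as follows:
$$A \cap B = \{x \mid x \in A \text{ and } x \in B\}$$
and the complement of a set $A$ as:
$$A^{c} = \{x \mid x \notin A\}$$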
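Technically, an integer $a_x$ is assigned to label element $x$.
For $n$ input sets $S_1, \dots, S_n$, $a_x$ can be represented as follows:
$$a_x = \sum_{i=1}^{n} 2^{\,i-1} f_i(x)$$
and
$$f_i(x) = \begin{cases} 1, & x \in S_i \\ 0, & x \notin S_i \end{cases}$$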
Thus, if $a_{x_1} = a_{x_2}$, then $x_1$ and $x_2$ belong to the same intersection. VennPainter labels every intersection $U_m$ with an integer $m$ in the Venn diagram (S1, S2 and S3 Figs). If $a_x = m$, then $x \in U_m$. The flowchart (Fig 3) shows how VennPainter works.
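The labelling scheme described in this section can be sketched in a few lines of Python. This is a minimal illustration and not VennPainter's C++ source: it assumes that the integer label is a bitmask in which bit i is set when the element belongs to the i-th input set, and the function name `label_elements` and the gene names are hypothetical.

```python
def label_elements(sets):
    """Map each element x to an integer label a_x.

    Assumption: bit i of a_x is 1 exactly when x is in sets[i], so two
    elements with equal labels lie in the same region of the diagram.
    """
    labels = {}
    for i, members in enumerate(sets):
        for x in members:
            labels[x] = labels.get(x, 0) | (1 << i)
    return labels


# Three small, made-up gene lists.
s1 = {"geneA", "geneB", "geneC"}
s2 = {"geneB", "geneC", "geneD"}
s3 = {"geneC", "geneE"}

labels = label_elements([s1, s2, s3])
for gene in sorted(labels):
    print(gene, format(labels[gene], "03b"))
```

Under this assumption a label fits in a single machine integer for any number of sets up to the word size, which is consistent with the 31-set limit reported for data processing mode.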
Adapted Venn Diagrams in VennPainter
Users can select Classic Venn, Edwards’ Venn and Nested Venn diagrams [25] (Fig 4a–4c).
Input and Output
VennPainter requires that each set be input as a text file. A whitespace character (space, tab or newline) must separate every element in the set. After all files are loaded, the program stores all elements in a hash table and classifies them. The algorithm obtains all statistics from a single read of the hash table. VennPainter can export integrated data as a text file (Fig 5) in Matrix, Vertical and Horizontal text-based formats. In the Matrix format, the first row contains all datasets and the first column contains all elements from the datasets. Other columns contain elements belonging to the respective datasets. In the Vertical mode, each row indicates an intersection. For example, a six-set Venn diagram has 64 intersections and, thus, the text file contains 64 rows. Horizontal mode is identical to the Vertical mode except that columns and rows are exchanged. Further, VennPainter can export single shared datasets. Users can obtain a specific shared dataset by clicking the number on the diagram and then the ‘export’ button. Exported images are in SVG (Scalable Vector Graphics) format [38,39], which can be read and modified easily by many vector graphics editors, such as Adobe Illustrator, Inkscape and CorelDRAW. The software provides tooltips when the mouse pointer hovers over buttons or numbers in the diagram.
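The input handling and the Vertical export can be illustrated with the following Python sketch. It is written under stated assumptions and is not taken from VennPainter: the bitmask labels, the helper names (`parse_set`, `classify`, `region_name`, `vertical_rows`) and the exact column layout of each row (region name, count, elements) are hypothetical. In-memory strings stand in for the input text files.

```python
from collections import defaultdict


def parse_set(text):
    """Split one input file's text on any whitespace and drop duplicates."""
    return set(text.split())


def classify(sets):
    """Group elements by region; bit i of a label marks membership in sets[i]."""
    table = {}  # hash table: element -> integer label
    for i, members in enumerate(sets):
        for x in members:
            table[x] = table.get(x, 0) | (1 << i)
    regions = defaultdict(list)
    for x, label in table.items():  # single pass over the hash table
        regions[label].append(x)
    return {label: sorted(xs) for label, xs in regions.items()}


def region_name(label, names):
    """Join the names of the sets whose bit is set in the label."""
    return "&".join(n for i, n in enumerate(names) if (label >> i) & 1)


def vertical_rows(regions, names):
    """One tab-separated row per non-empty region: name, count, elements."""
    rows = []
    for label in sorted(regions):
        xs = regions[label]
        rows.append("\t".join([region_name(label, names), str(len(xs))] + xs))
    return rows


names = ["A", "B", "C"]
texts = ["g1 g2\tg3\n", "g2 g3 g4", "g3\ng5"]  # mixed whitespace separators
regions = classify([parse_set(t) for t in texts])
rows = vertical_rows(regions, names)
print("\n".join(rows))
```

A Horizontal export would be the transpose of these rows. Looking up `regions[label]` for a single label corresponds to exporting one shared dataset.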
Results and Discussion
Example Application
To demonstrate the functions of VennPainter, we use it to depict shared gene sets in the goldfish × common carp hybrid system using eight annotated gene lists generated from RNA-seq data (S4 Fig) [37]. The Nested Venn diagram shows unique and shared relationships of eight sets by inlaying four unique-shared diagrams into the other four sets’ unique sharing diagram. The number in the center-most area (27,681) in the black rectangle shows the number of genes shared by all eight samples (S4 Fig). In a very intuitive manner, Nested Venn shows that each sample had more than 200 unique genes. It efficiently obtains candidate genes and facilitates downstream analyses of GO enrichment and KEGG annotation [37].
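One way to read the nested layout is that an eight-set region is addressed by a pair of four-set regions: a cell of the outer diagram and a cell of the diagram nested inside it. The Python sketch below illustrates that reading only. It assumes bitmask region labels and assumes that the first four sets form the outer diagram; neither detail is confirmed by the source, and `split_nested` is a hypothetical helper.

```python
def split_nested(label, outer_bits=4):
    """Split a region label into (outer, inner) parts.

    Assumption: the low `outer_bits` bits describe membership in the outer
    sets and the remaining bits describe membership in the nested sets.
    """
    outer = label & ((1 << outer_bits) - 1)
    inner = label >> outer_bits
    return outer, inner


ALL_EIGHT = (1 << 8) - 1  # label of the region shared by all eight sets
print(split_nested(ALL_EIGHT))   # innermost cell of the innermost diagram
print(split_nested(0b00010001))  # first outer set and first nested set only
```

Under this reading, the center-most number in the diagram is the one whose outer and inner parts both have all four bits set.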
We evaluate the following seven primate gene lists from GFF files (NCBI Genome database; S1 Table) using VennPainter: Homo sapiens, Gorilla gorilla, Macaca mulatta, Nomascus leucogenys, Pongo abelii, Pan paniscus, and Rhinopithecus roxellana. A comparison of our analyses with that of Zhou et al. 2014 [36] is informative. Analyses by the latter authors discovered 38 unique or shared sets, of which only 14 were marked with gene numbers, and reported that 10,244 genes or gene families were shared by the seven primates (Fig 6a). In contrast, VennPainter depicts all 127 ($2^7 - 1$) intersections that a seven-set Venn diagram should resolve, and shows that these primates share 8,452 annotated genes (Fig 6b). Their Venn diagram therefore did not depict all possible logical relationships among the sets.
Benchmark Test
To evaluate VennPainter’s relative performance, we use benchmarking data (Table 1). The benchmarking database contains four files of about 0.5 MB each. Comparisons use an Intel Core i5-5200U, 12 GB of memory, and Windows 10 (64-bit). jVenn and Venny run in Google Chrome 47.0.2526.111 m (64-bit). jVenn, Venny and VennDiagram take 3554 milliseconds (ms), 979 ms and 1078 ms, respectively, while VennPainter takes only 137 ms. VennTure crashes after $8.4 \times 10^5$ ms. Thus, VennPainter is more than seven times faster than the other tested programs. The increased speed is owing to VennPainter being programmed in C++, while Venny and jVenn are programmed in JavaScript and VennDiagram in R.
Table 1. Comparison of BioVenn, Venny, jVenn, VennDiagram, VennTure, and VennPainter.
Feature | BioVenn | Venny | jVenn | VennDiagram | VennTure | VennPainter |
---|---|---|---|---|---|---|
Application type | web application | web application | web application | R package | Standalone (Windows only) | Standalone (Cross-platform) |
Fill-shape color | Yes | Yes | Yes | Yes | No | Yes |
Maximum sets | 3 | 4 | 6 | 5 | 6 | 8 in graph, 31 in data |
Image format | PNG and SVG | PNG | PNG | TIFF | EMF | SVG |
Layouts | Classic | Classic | Classic and Edwards | Classic | Edwards | Nested Venn, Classic and Edwards |
Interface | Graphical User Interface | Graphical User Interface | Graphical User Interface | Command Line Interface | Graphical User Interface | Graphical User Interface |
Performance with benchmark data | - (a) | 979 ms | 3554 ms | 1078 ms | $8.4 \times 10^5$ ms (b) | 137 ms |
a: BioVenn is based on a browser/server architecture, so its running time cannot be measured on a local machine.
b: Time was recorded when VennTure ran out of memory (2 GB). VennTure is a Win32 program that cannot manage more than 2 GB of memory.
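The speed-up quoted for the benchmark can be checked directly from the timings in Table 1. The short Python calculation below only uses the numbers reported there; VennTure is left out because it did not finish.

```python
# Running times from Table 1, in milliseconds.
times_ms = {"jVenn": 3554, "Venny": 979, "VennDiagram": 1078}
vennpainter_ms = 137

# Ratio of each tool's time to VennPainter's time.
speedups = {tool: t / vennpainter_ms for tool, t in times_ms.items()}

for tool, ratio in sorted(speedups.items(), key=lambda kv: kv[1]):
    print(f"{tool}: {ratio:.1f}x slower than VennPainter")
```

The smallest ratio, for Venny, is about 7.1, which supports the statement that VennPainter is more than seven times faster than the other tools that completed the benchmark.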
Platforms and GUI
Several features make VennPainter more efficient at processing data than other available tools. VennPainter works with Windows, Linux and Mac operating systems (Table 1) and it has a concise GUI that eliminates the need for programming skills. Simply clicking on a number in any diagram returns the corresponding gene list, which promotes downstream analyses. Unlike other programs, VennPainter provides three diagram types for flexibility: Classic Venn, Edwards’ Venn and Nested Venn. Nested Venn is the default depiction when evaluating more than six sets because its regions are more evenly distributed than those of Edwards’ Venn and more orderly than those of Classic Venn [34]. This approach makes it easy to fill in and visualize numbers. Nested Venn diagrams are particularly effective when considering more than six datasets, and VennPainter extends the processing capacity to eight datasets. So far, only VennPainter can achieve this comparison. Thus, VennPainter can be applied to all shared data that need to be extracted from dataset(s) for genomic and transcriptomic comparison.
Supporting Information
Acknowledgments
This research was supported by the National Natural Science Foundation of China (91331105, 31360514 and 61363021), Program for Innovative Research Team (in Science and Technology) in University of Yunnan Province, and the State Key laboratory of Genetics, Resources and Evolution, Kunming Institute of Zoology, CAS.
Data Availability
VennPainter is available at https://github.com/linguoliang/VennPainter/. The GFF files were downloaded from the NCBI Genome database.
Funding Statement
This research was supported by the National Natural Science Foundation of China (91331105, 31360514 and 61363021), Program for Innovative Research Team (in Science and Technology) in University of Yunnan Province, and the State Key laboratory of Genetics, Resources and Evolution, Kunming Institute of Zoology, CAS. The funders JL and WZ (three projects of the National Natural Science Foundation of China) conceived and designed the study. The other funders (the last two projects in the above list) had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Cain AA, Kosara R, Gibas CJ (2012) Genosets: Visual analytic methods for comparative genomics. PLoS ONE 7(10): e46401. doi:10.1371/journal.pone.0046401
- 2. Bare JC, Koide T, Reiss DJ, Tenenbaum D, Baliga NS (2010) Integration and visualization of systems biology data in context of the genome. BMC Bioinformatics 11: 382. doi:10.1186/1471-2105-11-382
- 3. Goecks J, Eberhard C, Too T, Nekrutenko A, Taylor J (2013) Web-based visual analysis for high-throughput genomics. BMC Genomics 14: 397. doi:10.1186/1471-2164-14-397
- 4. Baran R, Robert M, Suematsu M, Soga T, Tomita M (2007) Visualization of three-way comparisons of omics data. BMC Bioinformatics 8: 72.
- 5. Thorvaldsen S, Fla T, Willassen NP (2010) DeltaProt: a software toolbox for comparative genomics. BMC Bioinformatics 11: 573. doi:10.1186/1471-2105-11-573
- 6. Phanstiel DH, Boyle AP, Araya CL, Snyder MP (2014) Sushi.R: flexible, quantitative and integrative genomic visualizations for publication-quality multi-panel figures. Bioinformatics 30(19): 2808–2810. doi:10.1093/bioinformatics/btu379
- 7. Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21(18): 3674–3676.
- 8. Rasko DA, Myers GS, Ravel J (2005) Visualization of comparative genomic analyses by BLAST score ratio. BMC Bioinformatics 6: 2.
- 9. Forman JJ, Clemons PA, Schreiber SL, Haggarty SJ (2005) SpectralNET–an application for spectral graph analysis and visualization. BMC Bioinformatics 6: 260.
- 10. Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, et al. (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436. doi:10.1186/1471-2105-12-436
- 11. Demchak B, Hull T, Reich M, Liefeld T, Smoot M, Ideker T, et al. (2014) Cytoscape: the network visualization tool for GenomeSpace workflows. F1000Research 3: 151. doi:10.12688/f1000research.4492.2
- 12. Hu Y, Yan C, Hsu CH, Chen QR, Niu K, Komatsoulis GA, et al. (2014) OmicCircos: A Simple-to-Use R Package for the Circular Visualization of Multidimensional Omics Data. Cancer Informatics 13: 13–20.
- 13. Ruskey F, Weston M (1997) A survey of Venn diagrams. Electronic Journal of Combinatorics 4.
- 14. Tobias CM, Sarath G, Twigg P, Lindquist E, Pangilinan J, Penning BW, et al. (2008) Comparative genomics in switchgrass using 61,585 high-quality expressed sequence tags. The Plant Genome 1(2): 111–124.
- 15. Thomson NR, Clayton DJ, Windhorst D, Vernikos G, Davidson S, Churcher C, et al. (2008) Comparative genome analysis of Salmonella enteritidis PT4 and Salmonella gallinarum 287/91 provides insights into evolutionary and host adaptation pathways. Genome Research 18(10): 1624–1637. doi:10.1101/gr.077404.108
- 16. Beare PA, Unsworth N, Andoh M, Voth DE, Omsland A, Gilk SD, et al. (2009) Comparative genomics reveal extensive transposon-mediated genomic plasticity and diversity among potential effector proteins within the genus Coxiella. Infection and Immunity 77(2): 642–656. doi:10.1128/IAI.01141-08
- 17. Wilkinson P, Waterfield NR, Crossman L, Corton C, Sanchez-Contreras M, Vlisidou I, et al. (2009) Comparative genomics of the emerging human pathogen Photorhabdus asymbiotica with the insect pathogen Photorhabdus luminescens. BMC Genomics 10: 302. doi:10.1186/1471-2164-10-302
- 18. Bauer J, Antosh M, Chang C, Schorl C, Kolli S, Neretti N, et al. (2010) Comparative transcriptional profiling identifies takeout as a gene that regulates life span. Aging (Albany NY) 2(5): 298–310.
- 19. Chen Y, Stine OC, Badger JH, Gil AI, Nair GB, Nishibuchi M, et al. (2011) Comparative genomic analysis of Vibrio parahaemolyticus: serotype conversion and virulence. BMC Genomics 12: 294. doi:10.1186/1471-2164-12-294
- 20. Pan G, Xu J, Li T, Xia Q, Liu S-L, Zhang G, et al. (2013) Comparative genomics of parasitic silkworm microsporidia reveal an association between genome expansion and host adaptation. BMC Genomics 14: 186. doi:10.1186/1471-2164-14-186
- 21. Zhang Y, Zou X, Ding Y, Wang H, Wu X, Liang B (2013) Comparative genomics and functional study of lipid metabolic genes in Caenorhabditis elegans. BMC Genomics 14: 164. doi:10.1186/1471-2164-14-164
- 22. Henderson DW (1963) Venn diagrams for more than four classes. American Mathematical Monthly 70(4): 424–426.
- 23. Bultena B (2013) Face-balanced, Venn and polyVenn diagrams. University of Victoria.
- 24. Edwards AWF (2004) Cogwheels of the mind: the story of Venn diagrams. JHU Press.
- 25. Radcliffe NJ (2010) Nested Venn Diagrams. Available: http://www.stochasticsolutions.com/pdf/NestedVenn.pdf.
- 26. Oliveros J (2007) Venny. An interactive tool for comparing lists with Venn Diagrams. Available: http://bioinfogp.cnb.csic.es/tools/Venny/index.html.
- 27. Chen H, Boutros PC (2011) VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R. BMC Bioinformatics 12: 35. doi:10.1186/1471-2105-12-35
- 28. Hulsen T, de Vlieg J, Alkema W (2008) BioVenn–a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams. BMC Genomics 9: 488. doi:10.1186/1471-2164-9-488
- 29. Pirooznia M, Nagarajan V, Deng Y (2007) GeneVenn–A web application for comparing gene lists using Venn diagrams. Bioinformation 1(10): 420–422.
- 30. Kestler HA, Müller A, Kraus JM, Buchholz M, Gress TM, Liu H, et al. (2008) VennMaster: area-proportional Euler diagrams for functional GO analysis of microarrays. BMC Bioinformatics 9: 67. doi:10.1186/1471-2105-9-67
- 31. Cai H, Chen H, Yi T, Daimon CM, Boyle JP, Peers C, et al. (2013) VennPlex–A novel Venn diagram program for comparing and visualizing datasets with differentially regulated datapoints. PLoS ONE 8: e53388. doi:10.1371/journal.pone.0053388
- 32. Martin B, Chadwick W, Yi T, Park S-S, Lu D, Ni B, et al. (2012) VENNTURE–a novel Venn diagram investigational tool for multiple pharmacological dataset analysis. PLoS ONE 7: e36911. doi:10.1371/journal.pone.0036911
- 33. Bardou P, Mariette J, Escudié F, Djemiel C, Klopp C (2014) jVenn: an interactive Venn diagram viewer. BMC Bioinformatics 15: 293. doi:10.1186/1471-2105-15-293
- 34. Zhang Q-J, Zhu T, Xia E-H, Shi C, Liu Y-L, Zhang Y, et al. (2014) Rapid diversification of five Oryza AA genomes associated with rice adaptation. Proceedings of the National Academy of Sciences USA 111(46): E4954–62.
- 35. Chen Z-Y, Guo X-J, Chen Z-X, Chen W-Y, Liu D-C, Zheng YL, et al. (2015) Genome-wide characterization of developmental stage- and tissue-specific transcription factors in wheat. BMC Genomics 16(1): 125.
- 36. Zhou X, Wang B, Pan Q, Zhang J, Kumar S, Sun X, et al. (2014) Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history. Nature Genetics 46: 1303–1310. doi:10.1038/ng.3137
- 37. Liu S, Luo J, Chai J, Ren L, Zhou Y, Huang F, et al. (2016) Genomic incompatibilities in the diploid and tetraploid offspring of the goldfish × common carp cross. Proceedings of the National Academy of Sciences USA 113(5): 1327–1332.
- 38. Ferraiolo J, Jun F, Jackson D (2000) Scalable vector graphics (SVG) 1.0 specification. World Wide Web Consortium. Available: https://www.w3.org/TR/1999/WD-SVG-19991203.pdf.
- 39. Quint A (2003) Scalable vector graphics. IEEE Multimedia 10(3): 99–102.