Abstract
ImageGP is an extensively utilized, open‐access platform for online data visualization and analysis. Over the past 7 years, it has catered to more than 700,000 usages globally, garnering substantial user feedback. The updated version, ImageGP 2 (available at https://www.bic.ac.cn/BIC), introduces a redesigned interface leveraging cutting‐edge web technologies to enhance functionality and user interaction. Key enhancements include the following: (i) Addition of modules for data format transformation, facilitating operations such as matrix merging, subsetting, and transformation between long and wide formats. (ii) Streamlined workflows with features like preparameter selection data validation and grouping of parameters with similar attributes. (iii) Expanded repertoire of visualization functions and analysis tools, including Weighted Gene Co‐Expression Network Analysis, differential gene expression analysis, and FASTA sequence processing. (iv) Personalized user space for uploading large data sets, tracking analysis history, and sharing reproducible analysis data, scripts, and results. (v) Enhanced user support through a simplified error debugging feature accessible with a single click. (vi) Introduction of an R package, ImageGP, enabling local data visualization and analysis. These updates position ImageGP 2 as a versatile tool serving both wet‐lab and dry‐lab researchers with expanded capabilities.
Keywords: biology cloud platform, data analysis, data transformation, data visualization, ImageGP
The infinity symbol illustrates the seamless workflow of ImageGP 2, encompassing essential functions such as data format transformation, data validation, and parameter combination. This process culminates in the generation of diverse visual outputs, including line, point, and bar plots. Key features include a personalized user center for managing large data sets, interactive visualizations, and streamlined error feedback mechanisms. Additionally, the introduction of the ImageGP R package enables local and batch analyses. Overall, the infinity symbol embodies the limitless potential for data analysis and visualization offered by ImageGP 2.

Highlights
Advanced user interface, expanded analytical capabilities, and seamless data handling.
New modules for data transformation and preparameter selection data validation.
Personalized user center, reproducible scripts, seamless error debugging, and the introduction of local analysis capabilities.
INTRODUCTION
In the era of “Omics” in life sciences, the accumulation of vast and complex biological data sets has become ubiquitous across genomic, transcriptomic, epigenomic, proteomic, metabolomic, and clinical domains [1]. The big science projects such as the Human Genome Project [2, 3], ENCODE [4], Human Cell Atlas [5], Earth BioGenome Project [6], Protist 10,000 Genomes Project [7], and the Proteomic Navigator of the Human Body (π‐HuB) are generating unprecedented volumes of data. For instance, GenBank's latest release (261.0) alone encompasses over 3.38 billion whole genome sequencing records, containing 27.9 trillion base pairs of genome data [8]. Similarly, the NGDC GSA warehouse holds nearly 50 petabytes of data (June 2024) [9], while IMP catalogs 716 billion genome base pairs from plant genomes [10]. The management and integration of such heterogeneous data pose significant challenges, necessitating robust tools and methodologies.
Effective utilization of these vast biological data sets promises substantial value but requires overcoming numerous challenges, including data complexity, integration across diverse resources, and the establishment of standardized principles for big data handling [11, 12, 13, 14]. Tools facilitating the analysis of large biological data sets play a pivotal role in translating this wealth of information into comprehensive insights into biomedical mechanisms, which are crucial for applications in translational and personalized medicine [1].
In this context, data visualization emerges as a crucial tool for researchers striving to comprehend and communicate complex biological insights effectively. Data visualization tools, particularly those in the form of interactive dashboards, offer intuitive graphical representations such as charts, graphs, and maps. These visual aids enable researchers to discern trends, identify outliers, and uncover patterns within data sets, thereby enhancing understanding and facilitating data‐driven decision‐making. The accessibility and interpretability offered by visual representations transcend the barriers posed by raw data, making complex scientific findings accessible to broader audiences.
Numerous tools have been developed to cater to the diverse needs of data visualization. These tools can broadly be categorized into command‐line tools (e.g., R, Python, Perl, LaTeX, JavaScript, MATLAB, Gnuplot, Graphviz), desktop software (e.g., Excel, PowerPoint, Cytoscape [15], Gephi [16], IGV [17], Mayavi [18], TBtools [19]), and online platforms (e.g., ImageGP [20], EVenn [21], HemI [22], Sangerbox [23], OmicStudio [24], shinyCircos [25], TOmicsVis [26], Wekemo Bioincloud [27], iMetaLab Suite [28], iNAP [29], Majorbio Cloud [30], MetOrigin [31]). Each category offers distinct advantages and limitations, balancing flexibility, ease of use, and computational resources.
ImageGP 2 (accessible at https://www.bic.ac.cn/BIC/#/) represents a significant evolution in online data visualization and analysis platforms, tailored to meet the advanced needs of biomedical researchers. This updated version introduces a redesigned interface utilizing state‐of‐the‐art web technologies, enhancing functionality and user interaction. Key features include modules for data format transformation, streamlined workflows with preparameter selection validation, an expanded array of visualization functions and analytical tools (e.g., Weighted Gene Co‐Expression Network Analysis [WGCNA], differential gene expression analysis), and personalized user spaces for managing large data sets and analysis history. Additionally, the integration of an R package, ImageGP, extends these capabilities to local environments, addressing challenges related to usability and data management in both wet‐lab and dry‐lab settings.
RESULT
Overview of ImageGP 2
ImageGP 2 represents a substantial redesign aimed at enhancing user experience and functionality based on feedback from its predecessor. It incorporates 45 distinct tools and includes 13 instructional resources in six thematic sections (Figure 1). These encompass professional plot generation, data transformation and extraction capabilities, bioinformatics analyses, interactive visualization tools, as well as text and video tutorials. A dedicated bioinformatics resource section is also provided.
Figure 1. Functional framework of ImageGP 2. The inner circle illustrates six key sections: professional plot generation, data transformation and extraction, bioinformatics analyses, interactive visualization tools, a bioinformatics resource hub, and text and video tutorials. The outer circle presents examples of representative visualization outputs.
The platform boasts 17 data visualization tools tailored for creating diverse plots such as heatmaps, box plots, bar plots, scatter plots (including variants like enrichment and volcano plots), Principal Co‐ordinates Analysis plots, histograms, line plots, and various Venn diagrams (Figure 1). While many of these tools were available in the earlier version of ImageGP, the backend code has been entirely restructured. This overhaul introduces enhanced data validation logic to minimize user input errors and offers expanded parameter options for data screening and for exploring aesthetic attributes. Comprehensive tutorials are available in both text and video formats, accessible within the tutorial section and alongside each tool for user convenience. Each tool features a carousel chart outlining input data formats, parameters, and output formats, accompanied by demo buttons to facilitate the reproducibility of illustrative examples.
Addition of modules for data format transformation
In the context of biological analysis, the majority of processed data is stored in matrix formats, such as gene expression matrices and matrices detailing abundance levels of bacteria, proteins, or metabolites. Typically, these matrices adopt a wide format structure, suitable for comparisons across all samples, for example, in a heatmap. However, visualization tools like ggplot2, grounded in The Grammar of Graphics, require data in a long format to effectively map variables to visual aesthetics [32].
Nominally, a wide matrix has more columns and a long matrix has more rows (Figure 2A), but this is not the main difference. A wide format matrix typically contains numerous columns where each column (except the first) holds homogeneous data. For instance, in a gene expression matrix, every numeric value, whichever column it is in, represents a gene expression level. Using this format for visualizations, such as mapping expression values to point size, is challenging because the values must first be gathered from all columns.
Figure 2. Matrix formats and transformations. (A) Illustration of the transformation between wide and long format matrices. (B) Overview of matrix merging capabilities, demonstrating five modes: left, right, inner, outer, and horizontal stacking (hstack). (C) Description of the “explode matrix” function, which expands matrix dimensions by splitting elements in a single column and duplicating values from other columns.
Conversely, a long format matrix derived from the gene expression table is structured with fewer columns (e.g., samples, genes, values) and more rows, each representing a unique combination of gene and sample (Figure 2A). This format accommodates heterogeneous data among columns, facilitating straightforward mapping of specific columns to distinct aesthetic attributes. Notably, the same matrix may be treated as either wide or long format depending on the analytical requirements. For example, a gene expression matrix functions as a wide format when comparing all samples, yet serves as a long format when generating a scatter plot of the correlation between any two samples, where each sample column is treated as a separate attribute.
Another common operation involves matrix merging, executed in five modes: left (retaining all items from the left matrix), right (retaining all items from the right matrix), inner (retaining common items from both matrices), outer (combining all items from both matrices), and hstack (concatenating all columns) (Figure 2B). This functionality is typically employed to integrate a long‐format abundance matrix with metadata matrices, thereby incorporating additional sample attributes. Furthermore, this operation supports matrix subset extraction using left, right, or inner modes. For instance, in the left merging mode, only the subset of expression data corresponding to target genes in the left matrix is extracted. Additionally, matrix merging facilitates tasks such as ID mapping for gene identification transfer.
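The five merge modes can be illustrated with a short Python sketch. This is not the ImageGP implementation; the function names `merge_matrices` and `hstack`, the dictionary representation keyed by the first column, and the use of an empty string for missing values are all assumptions made for this example.

```python
def merge_matrices(left, right, how="left", fill=""):
    """Merge two matrices stored as {row_id: [values, ...]} dictionaries.

    `how` is one of 'left', 'right', 'inner', or 'outer'; these names are
    hypothetical and simply mirror the modes described in the text.
    """
    n_left = len(next(iter(left.values()), []))
    n_right = len(next(iter(right.values()), []))
    if how == "left":
        keys = list(left)
    elif how == "right":
        keys = list(right)
    elif how == "inner":
        keys = [k for k in left if k in right]
    elif how == "outer":
        keys = list(left) + [k for k in right if k not in left]
    else:
        raise ValueError(f"unknown merge mode: {how}")
    # Rows missing from one side are padded with the fill value.
    return {
        k: left.get(k, [fill] * n_left) + right.get(k, [fill] * n_right)
        for k in keys
    }


def hstack(left_rows, right_rows):
    """Concatenate columns row by row without matching row identifiers."""
    if len(left_rows) != len(right_rows):
        raise ValueError("hstack needs the same number of rows on both sides")
    return [a + b for a, b in zip(left_rows, right_rows)]


expr = {"Samp1": [5.0, 9.9], "Samp2": [5.0, 8.0], "Samp7": [9.9, 9.9]}
meta = {"Samp1": ["Root"], "Samp7": ["Leaf"], "Samp8": ["Leaf"]}

left_merged = merge_matrices(expr, meta, how="left")    # keeps Samp1, Samp2, Samp7
inner_merged = merge_matrices(expr, meta, how="inner")  # keeps Samp1, Samp7
outer_merged = merge_matrices(expr, meta, how="outer")  # keeps all four samples
```

In this sketch, the inner mode doubles as subset extraction: only rows whose identifiers appear in both matrices survive.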
Here, we utilize a gene expression plot to illustrate the application of these matrix operation functions. Typically, the gene expression matrix is structured in a wide matrix format, as shown below:
| Gene | Samp1 | Samp2 | Samp3 | Samp4 | Samp5 | Samp6 | Samp7 | Samp8 | Samp9 | Samp10 | Samp11 | Samp12 |
| Gene1 | 5.0 | 5.0 | 5.0 | 5.8 | 5.0 | 5.8 | 9.9 | 9.6 | 9.9 | 9.6 | 8.9 | 10.4 |
| Gene2 | 9.9 | 8.0 | 4.2 | 4.7 | 4.2 | 4.7 | 9.9 | 8.6 | 9.9 | 8.6 | 4.2 | 4.2 |
| Gene3 | 8.2 | 8.1 | 9.7 | 7.8 | 9.7 | 7.8 | 12.5 | 10.8 | 12.5 | 10.8 | 12.3 | 11.1 |
| Gene4 | 6.5 | 7.1 | 6.8 | 6.9 | 6.8 | 6.9 | 11.1 | 10.8 | 11.1 | 10.8 | 10.5 | 11.1 |
Suppose we want to analyze the expression distribution of Gene1 across all samples using a density plot. Because each gene occupies a row in the wide matrix, we first transpose the matrix so that each gene becomes a column:
| Samp | Gene1 | Gene2 | Gene3 | Gene4 |
| Samp1 | 5.0 | 9.9 | 8.2 | 6.5 |
| Samp2 | 5.0 | 8.0 | 8.1 | 7.1 |
| Samp3 | 5.0 | 4.2 | 9.7 | 6.8 |
| Samp4 | 5.8 | 4.7 | 7.8 | 6.9 |
| Samp5 | 5.0 | 4.2 | 9.7 | 6.8 |
| Samp6 | 5.8 | 4.7 | 7.8 | 6.9 |
| Samp7 | 9.9 | 9.9 | 12.5 | 11.1 |
| Samp8 | 9.6 | 8.6 | 10.8 | 10.8 |
| Samp9 | 9.9 | 9.9 | 12.5 | 11.1 |
| Samp10 | 9.6 | 8.6 | 10.8 | 10.8 |
| Samp11 | 8.9 | 4.2 | 12.3 | 10.5 |
| Samp12 | 10.4 | 4.2 | 11.1 | 11.1 |
In this transposed matrix, each column represents one gene, so the matrix can be treated as a long matrix suitable for gene‐centric analysis, with each gene as a separate attribute. By pasting this data into the histogram plot tool and configuring parameters, we can generate the expression distribution profile for Gene1 in all samples (Figure S1).
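The transposition step can be sketched in plain Python as follows. The function name `transpose` and the list-of-lists representation are assumptions for illustration, and only the first three samples and two genes of the example matrix are used.

```python
def transpose(rows):
    """Swap rows and columns of a rectangular matrix given as a list of lists."""
    widths = {len(r) for r in rows}
    if len(widths) > 1:
        raise ValueError("all rows must have the same number of columns")
    return [list(col) for col in zip(*rows)]


wide = [
    ["Gene", "Samp1", "Samp2", "Samp3"],
    ["Gene1", 5.0, 5.0, 5.0],
    ["Gene2", 9.9, 8.0, 4.2],
]
by_sample = transpose(wide)
by_sample[0][0] = "Samp"  # the corner label now names the sample column
```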
To extend the analysis to compare gene expression profiles among different sample groups, we incorporate metadata:
| Samp | Group |
| Samp1 | Root |
| Samp2 | Root |
| Samp3 | Root |
| Samp4 | Root |
| Samp5 | Root |
| Samp6 | Root |
| Samp7 | Leaf |
| Samp8 | Leaf |
| Samp9 | Leaf |
| Samp10 | Leaf |
| Samp11 | Leaf |
| Samp12 | Leaf |
Using the “merge matrix” tool, we combine these matrices to create a merged data set (Figure S2):
| Samp | Gene1 | Gene2 | Gene3 | Gene4 | Group |
| Samp1 | 5.0 | 9.9 | 8.2 | 6.5 | Root |
| Samp2 | 5.0 | 8.0 | 8.1 | 7.1 | Root |
| Samp3 | 5.0 | 4.2 | 9.7 | 6.8 | Root |
| Samp4 | 5.8 | 4.7 | 7.8 | 6.9 | Root |
| Samp5 | 5.0 | 4.2 | 9.7 | 6.8 | Root |
| Samp6 | 5.8 | 4.7 | 7.8 | 6.9 | Root |
| Samp7 | 9.9 | 9.9 | 12.5 | 11.1 | Leaf |
| Samp8 | 9.6 | 8.6 | 10.8 | 10.8 | Leaf |
| Samp9 | 9.9 | 9.9 | 12.5 | 11.1 | Leaf |
| Samp10 | 9.6 | 8.6 | 10.8 | 10.8 | Leaf |
| Samp11 | 8.9 | 4.2 | 12.3 | 10.5 | Leaf |
| Samp12 | 10.4 | 4.2 | 11.1 | 11.1 | Leaf |
This merged data set allows us to conduct a comparative analysis between sample groups, visualizing the expression distribution profile of each gene across different conditions, again using the histogram plot tool (Figure S3).
If we want to analyze multiple genes or all genes simultaneously, this matrix is not suitable because each gene is a separate attribute. One solution is to collapse all gene names into one column and all expression values into another, which is the function of the “Wide to long matrix” tool (Figure S4):
| Samp | Group | Gene | value |
| Samp1 | Root | Gene1 | 5.0 |
| Samp2 | Root | Gene1 | 5.0 |
| Samp3 | Root | Gene1 | 5.0 |
| Samp4 | Root | Gene1 | 5.8 |
| Samp5 | Root | Gene1 | 5.0 |
| Samp6 | Root | Gene1 | 5.8 |
| Samp7 | Leaf | Gene1 | 9.9 |
| Samp8 | Leaf | Gene1 | 9.6 |
| Samp9 | Leaf | Gene1 | 9.9 |
| Samp10 | Leaf | Gene1 | 9.6 |
| Samp11 | Leaf | Gene1 | 8.9 |
| Samp12 | Leaf | Gene1 | 10.4 |
| Samp1 | Root | Gene2 | 9.9 |
| Samp2 | Root | Gene2 | 8.0 |
| Samp3 | Root | Gene2 | 4.2 |
| Samp4 | Root | Gene2 | 4.7 |
| Samp5 | Root | Gene2 | 4.2 |
| Samp6 | Root | Gene2 | 4.7 |
| Samp7 | Leaf | Gene2 | 9.9 |
| Samp8 | Leaf | Gene2 | 8.6 |
| Samp9 | Leaf | Gene2 | 9.9 |
| Samp10 | Leaf | Gene2 | 8.6 |
| Samp11 | Leaf | Gene2 | 4.2 |
| Samp12 | Leaf | Gene2 | 4.2 |
| Samp1 | Root | Gene3 | 8.2 |
| Samp2 | Root | Gene3 | 8.1 |
| Samp3 | Root | Gene3 | 9.7 |
| Samp4 | Root | Gene3 | 7.8 |
| Samp5 | Root | Gene3 | 9.7 |
| Samp6 | Root | Gene3 | 7.8 |
| Samp7 | Leaf | Gene3 | 12.5 |
| Samp8 | Leaf | Gene3 | 10.8 |
| Samp9 | Leaf | Gene3 | 12.5 |
| Samp10 | Leaf | Gene3 | 10.8 |
| Samp11 | Leaf | Gene3 | 12.3 |
| Samp12 | Leaf | Gene3 | 11.1 |
| Samp1 | Root | Gene4 | 6.5 |
| Samp2 | Root | Gene4 | 7.1 |
| Samp3 | Root | Gene4 | 6.8 |
| Samp4 | Root | Gene4 | 6.9 |
| Samp5 | Root | Gene4 | 6.8 |
| Samp6 | Root | Gene4 | 6.9 |
| Samp7 | Leaf | Gene4 | 11.1 |
| Samp8 | Leaf | Gene4 | 10.8 |
| Samp9 | Leaf | Gene4 | 11.1 |
| Samp10 | Leaf | Gene4 | 10.8 |
| Samp11 | Leaf | Gene4 | 10.5 |
| Samp12 | Leaf | Gene4 | 11.1 |
Subsequently, utilizing the histogram plot tool with appropriate configurations provides expression distribution profiles for selected genes across different groups (Figure S5).
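The wide-to-long transformation that produced the table above can be sketched in Python. This is an illustrative sketch rather than the code behind the “Wide to long matrix” tool; the function name `wide_to_long` and its parameters `id_cols`, `var_name`, and `value_name` are assumptions.

```python
def wide_to_long(header, rows, id_cols, var_name="Gene", value_name="value"):
    """Collapse every non-identifier column into (variable, value) pairs."""
    id_idx = [header.index(c) for c in id_cols]
    value_idx = [i for i in range(len(header)) if i not in id_idx]
    long_rows = [list(id_cols) + [var_name, value_name]]
    for i in value_idx:  # one block of output rows per collapsed column
        for row in rows:
            long_rows.append([row[j] for j in id_idx] + [header[i], row[i]])
    return long_rows


header = ["Samp", "Gene1", "Gene2", "Group"]
rows = [
    ["Samp1", 5.0, 9.9, "Root"],
    ["Samp7", 9.9, 9.9, "Leaf"],
]
long_matrix = wide_to_long(header, rows, id_cols=["Samp", "Group"])
```

The identifier columns (here `Samp` and `Group`) are repeated on every output row, which is what lets a plotting tool map gene, group, and value to separate aesthetics.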
The platform also includes a function termed “explode matrix,” which expands the matrix by splitting the elements within one column and duplicating the values of the other columns in the same row (Figure 2C). This “exploding” increases the number of rows, as demonstrated in previous applications that transformed gene ontology enrichment tables into network formats to visualize pathway‐gene relationships [33].
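A minimal Python sketch of the explode operation is given below. The function name `explode`, the comma separator, and the pathway names in the example are assumptions for illustration and do not come from the ImageGP code base.

```python
def explode(header, rows, column, sep=","):
    """Split one column on `sep` and emit one output row per split item.

    Values in all other columns are duplicated unchanged.
    """
    idx = header.index(column)
    out = []
    for row in rows:
        for item in str(row[idx]).split(sep):
            new_row = list(row)
            new_row[idx] = item.strip()
            out.append(new_row)
    return out


header = ["Pathway", "Genes"]
rows = [
    ["Photosynthesis", "Gene1,Gene2"],  # hypothetical enrichment rows
    ["Glycolysis", "Gene3"],
]
edges = explode(header, rows, "Genes")
```

Each output row is a pathway-gene pair, which is the edge-list shape a network visualization expects.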
Refined tool usage workflows
The operational procedures for each tool have been refined to enhance user interaction. Initially, users are prompted to define input parameters, such as specifying whether the input matrix is in the long or wide format if necessary and selecting between directly pasting data into a text area or using previously uploaded files. Following input submission, users engage the “Check Data” feature to validate adherence to predefined rules. For single matrices, validation includes checks for matrix legality (uniform row and column dimensions), absence of special characters in header rows (typically reserved for column names), absence of duplicate entries in the first column (typically reserved for row names in wide format), and numeric consistency for wide format matrices. Clear explanations are provided for detected errors, including error type, specific items causing issues, and their respective positions within the matrix. Users are empowered to rectify input data before proceeding with subsequent operations.
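The single-matrix checks described above can be sketched as follows. This is a simplified illustration, not the platform's validation code: the function name `check_matrix`, the error labels, and the set of characters treated as "special" are assumptions.

```python
import re


def check_matrix(rows, wide=True):
    """Return a list of (error_type, item, position) tuples; empty means valid."""
    if not rows:
        return [("empty matrix", None, None)]
    errors = []
    header, body = rows[0], rows[1:]
    # 1. Every row must have as many fields as the header.
    for r, row in enumerate(rows, start=1):
        if len(row) != len(header):
            errors.append(("column count mismatch", len(row), f"row {r}"))
    # 2. Header names must not contain special characters (assumed set).
    for c, name in enumerate(header, start=1):
        if re.search(r"[^A-Za-z0-9_.\- ]", str(name)):
            errors.append(("special character in header", name, f"column {c}"))
    # 3. Row names in the first column must be unique.
    seen = set()
    for r, row in enumerate(body, start=2):
        if not row:
            continue
        if row[0] in seen:
            errors.append(("duplicate row name", row[0], f"row {r}"))
        seen.add(row[0])
    # 4. Wide matrices must be numeric outside the first column.
    if wide:
        for r, row in enumerate(body, start=2):
            for c, cell in enumerate(row[1:], start=2):
                try:
                    float(cell)
                except (TypeError, ValueError):
                    errors.append(("non-numeric value", cell, f"row {r}, column {c}"))
    return errors


matrix = [
    ["Gene", "Samp1", "Samp2"],
    ["Gene1", "5.0", "5.8"],
    ["Gene1", "9.9", "NA"],
]
problems = check_matrix(matrix)
```

Returning the error type, the offending item, and its position together mirrors the kind of explanation the text says is shown to users.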
In cases where multiple matrices are involved, intermatrix relationships are scrutinized. For instance, in the context of a heatmap analysis comprising three matrices—heatmap data, row annotations, and column annotations—the system verifies that all items in the first column of annotation matrices align with corresponding entries in the heatmap data matrix. Continuous refinement of file validation logic is implemented based on user feedback to mitigate runtime failures effectively. Upon successful data validation, users can adjust additional parameters as desired to advance further on the tool page.
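The inter-matrix check for the heatmap example can be sketched in Python as below. The function name `unmatched_annotation_ids` and the annotation labels are hypothetical; for column annotations, the identifiers would be compared against the header of the heatmap matrix instead of its first column.

```python
def unmatched_annotation_ids(data_ids, annotation_rows):
    """Return annotation identifiers that do not occur in the data matrix."""
    known = set(data_ids)
    return [row[0] for row in annotation_rows if row[0] not in known]


heatmap_row_ids = ["Gene1", "Gene2", "Gene3"]
row_annotation = [
    ["Gene1", "TypeA"],  # hypothetical annotation labels
    ["Gene5", "TypeB"],
]
missing = unmatched_annotation_ids(heatmap_row_ids, row_annotation)
```

An empty result means every annotated item has a matching entry in the heatmap data.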
Parameters are logically grouped to simplify user interaction and presented in an accordion format. Groups lacking essential parameters remain folded, ensuring streamlined navigation. For tools like “heatmap,” where no essential parameters exist, all groups remain folded after data validation, allowing users to proceed with the submission promptly. Conversely, tools like “boxplot” feature essential parameters denoted by a red star, such as “X‐axis variable” and “Y‐axis variable,” which are initially expanded for convenient user selection. Parameters without a star are optional, so users can focus solely on the essential configurations at first. After the initial run, users can freely explore parameter explanations, experiment with adjustments, and observe their effects.
Further enhancements include parameter optimization for clarity and functionality. Ambiguous parameters, such as those specifying data types, have been eliminated to reduce user confusion, and automated data type checks have been integrated into backend operations to enhance reliability. Parameters involving geometry ordering, such as “X‐axis variable order,” now include data filtering capabilities: users can select the desired values from a dropdown to screen data or determine the plot layout. Expanded parameter options encompass data preprocessing, statistical labeling, color customization, facet plots, and support for diverse output formats like interactive plots and PowerPoint presentations. Additionally, flexibility in column order and header names has been extended, broadening ImageGP's applicability beyond biological data sets to data from any field, such as chemistry and physics, provided the input is a structured matrix.
Expanded toolset
The updated version of ImageGP features an enhanced array of tools with optimized parameter organization and increased flexibility. For instance, the box plot tool now supports various configurations such as single‐group, multiple‐group, pair‐lined, and facet box plots. Users can seamlessly transform these plots into violin plots, dot plots, jitter plots, or combinations thereof, and adjust the layout between vertical and horizontal orientations. Specific parameters are also provided for presenting single‐cell marker gene box plots (Figure 3A).
Figure 3. Representative visualization and analysis results. (A) Various configurations of box plots generated through parameter combinations, including single‐group, multiple‐group, paired‐line, facet plots, and beeswarm plots. (B) Weighted gene co‐expression network analysis results and corresponding comprehensive reports structured in eight sections. (C) Interactive plots for multiple sequence alignments, allowing dynamic layout adjustments. (D) Phylogenetic tree representation enriched with associated annotation heatmaps, facilitating the integration of qualitative and quantitative data for enhanced visualization.
For Linear discriminant analysis Effect Size (LEfSe) analysis, users can directly input modified output solely to generate plots. Additionally, users can assign colors to each group and produce editable vector images with embedded text. Unlike the previous version, where all results were delivered in one zipped file, users can now conveniently browse results through an online document enriched with text and images. This updated approach to result presentation not only facilitates comprehensive analysis with multiple steps and outcomes but also supports the integration of additional bioinformatics tools such as WGCNA and differential gene/protein expression analysis [34].
Beyond data transformation capabilities, this version introduces 10 new bioinformatics analysis tools, including WGCNA, limma differential expression analysis, multiple sequence alignment, reverse complement FASTA, RNA translation, motif search, FASTA extraction, points detection in specified areas, GXF to BED conversion, and CDS/protein sequence extraction from GXF files. Additionally, three interactive plot tools have been incorporated.
WGCNA analysis, for example, involves a structured process of eight steps designed for ease of use. Users simply paste or upload their expression data and initiate the analysis to receive comprehensive reports. Each report comprises eight sections corresponding to the analysis steps, featuring subsections with static or interactive visuals, tables, explanations, and options for downloading all results (Figure 3B). This reporting format can be extended to show results from cohesive workflows integrated by separate tools.
The multiple sequence alignment tool introduces interactive plots for the first time, allowing users to dynamically adjust layouts without recomputation. Interactive features enable users to hover over plot elements for detailed information (Figure 3C). Notably, the Circle phylogenetic tree tool supports phylogenetic analysis based on Newick format input, incorporating various annotations (Figure 3D). Phylogenetic trees are pivotal for organizing biological diversity knowledge, structuring classifications, and providing evolutionary insights. Users can enrich these trees with attribute matrices to set branch and node colors, integrating qualitative and quantitative information for enhanced data visualization.
Personalized user environment
While ImageGP does not mandate login for usage, registered users gain access to personalized features for managing large data sets more efficiently. For example, users may encounter browser crashes when directly pasting large input matrices into the web page text area, leading to a poor user experience. To mitigate this issue, a personalized user space has been implemented for registered users, consisting of two main components: File Management and Tools Records. Registration is straightforward and free of charge.
In the File Management section, users can upload, copy, move, rename files, and organize directories. Uploaded files can be selected for use in tool pages, with a preview feature displaying the initial five lines of text content. For files exceeding 500 characters per line, only the first 500 alphabetic characters in each line are shown. The selection of directories displays the first five files/folders contained within, allowing users to verify file choices efficiently without overwhelming the web page with excessive content. However, the complete file content is utilized during subsequent parameter selection and actual analysis phases.
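The preview behavior described above amounts to truncating in two directions, as in the following Python sketch. The function name `preview` and its parameter names are assumptions; the limits of five lines and 500 characters are taken from the text.

```python
def preview(text, max_lines=5, max_chars=500):
    """Return at most `max_lines` lines, each cut to `max_chars` characters."""
    return [line[:max_chars] for line in text.splitlines()[:max_lines]]


# Eight long tab-separated lines standing in for an uploaded file.
sample = "\n".join(f"row{i}\t" + "x" * 600 for i in range(8))
shown = preview(sample)
```

Only the display is truncated in this sketch; the full text would still be passed on to the analysis, as the paragraph above notes.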
Within the Tools Records section, logged‐in users can review submission times, execution statuses, and results of each submitted job. Collaboration is facilitated through the ability to share results with collaborators. Users can adjust parameters based on previous selections, conduct reanalysis of jobs either retaining previous parameters or specifying new ones, and save results into distinct folders as either updates to existing jobs or entirely new analyses. This functionality enhances user control and workflow management across multiple analysis sessions.
Improved user support with streamlined error reporting
While we conduct preliminary checks on input data before submission, unforeseen errors may still occur due to user‐defined parameters or specific data content such as empty values or conflicting symbols. These instances are categorized as operational challenges for online tools. To address these issues proactively, we implement rigorous data and parameter validation based on accumulated experience. However, users may still encounter runtime errors, and those unfamiliar with programming may struggle to provide comprehensive information for debugging.
To streamline error resolution, a “Request for Help” button is now integrated into the result page. This feature automatically detects errors during program execution and prompts users to submit error logs directly to our team. Upon receiving these logs, our developers promptly begin debugging to refine the program code and address the identified issues. Users who opt to leave a contact email can receive a response within 1–3 days, containing pertinent debug information and resolutions.
This enhancement not only simplifies the error reporting process for users but also enables continuous program optimization based on real‐world feedback. As a result, ImageGP 2 remains agile in addressing evolving data complexities and user requirements through iterative updates and enhancements.
The R package ImageGP
ImageGP 2 is a reimagined web server built around the new R package ImageGP, which is pivotal to the functionality of various analysis and visualization tools within the platform.
Previously, the online version of ImageGP utilized bash scripts to dynamically generate R scripts based on user inputs, resulting in repetitive code logic across scripts. This approach complicated code revision for debugging and the inclusion of new functionalities. Moreover, users unfamiliar with bash scripting faced challenges in executing these scripts. In the redesigned version, we have transitioned to using pure R code to handle user inputs and parameter validation. Similar functional blocks of code have been modularized into R functions, totaling 96 functions including 12 primary plotting functions. These functions encompass data transformation, logical checks, and attribute mapping, systematically employed across all visualization tools and other operational contexts. Consequently, bug fixes applied to individual functions propagate throughout all tools, enhancing efficiency and maintenance.
The R package ImageGP retains the same parameters as its online counterpart. Each data visualization tool generates an R script tailored to user inputs and parameters. Users have the option to download these scripts, open them in RStudio or other R Integrated Development Environments, and customize file paths and output prefixes for local execution. This capability offers several advantages: first, it enables users to simulate data copies with identical headers, select parameters online, generate visualizations, and subsequently replace file locations locally to produce real results, ensuring data privacy. Second, users can introduce additional tuning parameters directly into the local R script for customized analyses. Third, the local script facilitates batch plot generation through iterative processes.
This enhanced integration of the R package ImageGP not only enhances user flexibility and data privacy but also empowers advanced users to extend functionality and optimize analyses beyond standard parameters available online.
DISCUSSION
ImageGP was originally developed to enhance data visualization capabilities for researchers, which has proven beneficial. However, it is crucial for users to recognize that visualizations containing numerous data points can lead to misinterpretations. Poorly designed visualizations may introduce biases or confusion. A simple graph might fail to capture attention or convey significant insights, whereas an elaborate visualization could obscure the intended message or offer profound clarity.
In recent years, there has been an increasing demand for effective visual representation of information, especially in scientific research contexts. Successful data visualization transcends mere graphical depiction by requiring clear objectives that drive design choices. Researchers must determine what specific aspects of the data they wish to visualize. This involves decisions on geometric elements (e.g., points, lines, bars), mapping data columns to aesthetic attributes such as color, shape, and size, applying statistical transformations, and specifying the coordinate system for the plot. Techniques like faceting enable the visualization of different data subsets. The integration of these components defines the graphical output.
When designing ImageGP, our approach guides users through selecting the plot type and configuring data attributes like x‐axis, y‐axis, color, size, and shape. This approach aims to familiarize users with the data visualization process and facilitate the interpretation of visual results. ImageGP also provides users with the flexibility to experiment with various visualization types to identify the most suitable one for their needs.
ImageGP 2 represents a substantial evolution from its predecessor. Our ongoing updates focus on expanding functionality and enhancing user accessibility. Currently, we offer text and video tutorials along with training courses to reduce usability barriers. Looking ahead, our development efforts for ImageGP will concentrate on two primary objectives. The first is transforming ImageGP into a computational platform that simplifies the transition from command‐line tools to online tools, thereby broadening accessibility for researchers. The second is integrating individual tools into workflows that enable users to initiate analyses directly from raw data, such as raw sequence data in FASTQ format, progressing seamlessly through mapping, quantification, and subsequent visualizations.
CONCLUSION
In summary, ImageGP 2 emerges as a versatile tool at the forefront of biomedical research, facilitating seamless integration between data matrices and visual representations. Its enhanced capabilities promise to empower researchers in leveraging big biological data for transformative discoveries and applications in various scientific domains.
METHODS
ImageGP 2 is hosted on a high‐performance computing server with 180 threads, 386 GB memory, and 15 TB storage to handle 2000 jobs per day. ImageGP 2 is implemented as a web application using JavaScript and HTML for front‐end development. The core JavaScript libraries used include Vue.js (https://vuejs.org) for the main frame, vis.js (https://visjs.org) for network display, and echarts (https://echarts.apache.org), plotly.js (https://plotly.com/), and D3.js (https://d3js.org/) for interactive charts (such as multiple sequence alignment, phylogenetic tree visualization, and maps). Backend data transport is handled by the high‐level web framework Django (https://www.djangoproject.com). The MySQL open‐source database management system is used for saving and accessing table data. All submitted jobs are managed by the distributed task queue Celery (https://docs.celeryq.dev/en/stable/index.html), scheduled in two queues (a data analysis queue and a data visualization queue) with Redis as the backend. Most data visualizations are generated with the R package ImageGP, which mainly depends on the ggplot2, WGCNA, and limma packages [32, 34, 35]. The R packages plotly and eoffice are used to convert plot objects to interactive plots or Microsoft PowerPoint formats. Most data format transformation and sequence processing are handled by Python scripts.
AUTHOR CONTRIBUTIONS
Tong Chen wrote the manuscript with all figures. Mei Yang drafted the first two figures. Tao Chen, Mei Yang, Siqing Fan, Minglei Shi, Buqing Wei, Huijiao Lv, Wandi Cao, Chongming Wang, Jianzhou Cui, Jiwen Zhao, Yilai Han, Jiao Xi, and Ziqiang Zheng tested the tools and gave useful feedback. Tong Chen, Yong‐Xin Liu, and Luqi Huang supervised and funded this project, and revised the manuscript. All authors have read the final manuscript and approved it for publication.
CONFLICT OF INTEREST STATEMENT
Tong Chen and Yong‐Xin Liu hold the position of Editor‐in‐Chief for iMeta and are blinded from peer review and decision‐making for the manuscript.
ETHICS STATEMENT
No animals or humans were involved in this study.
Supporting information
Figure S1: Density plot showing the expression distribution profile for Gene1 in all samples.
Figure S2: Combine these two matrices to create a merged data set. All configured parameters are highlighted in yellow.
Figure S3: Visualizing the expression distribution profile of each gene across different conditions.
Figure S4: Transforming a wide matrix into a long matrix.
Figure S5: Displaying expression distribution profiles for selected genes across different groups.
ACKNOWLEDGMENTS
We would like to thank all 50,000+ users for their suggestions and usage of ImageGP. This work was supported by grants from the Scientific and Technology Innovation Project of China Academy of Chinese Medical Sciences (CI2021A0411), The Fundamental Research Funds for the Central Public Welfare Research Institutes (ZZ13‐YQ‐095), Scientific and Technological Innovation Project of China Academy of Chinese Medical Sciences (CI2023E002), the Key Project at Central Government Level: The Ability Establishment of Sustainable Use for Valuable Chinese Medicine Resources (2060302), National Key R&D Program of China (2020YFA0908000), Agricultural Science and Technology Innovation Program (CAAS‐ZDRW202308), the Natural Science Foundation of China (U23A20148), and Youth Innovation Promotion Association of CAS (No. 2020425).
Chen, Tong, Liu Yong‐Xin, Chen Tao, Yang Mei, Fan Siqing, Shi Minglei, Wei Buqing, et al. 2024. “ImageGP 2 for Enhanced Data Visualization and Reproducible Analysis in Biomedical Research.” iMeta 3: e239. 10.1002/imt2.239
Contributor Information
Tong Chen, Email: chent@nrc.ac.cn, Email: chentong_biology@163.com.
Yong‐Xin Liu, Email: liuyongxin@caas.cn.
Luqi Huang, Email: huangluqi01@126.com.
DATA AVAILABILITY STATEMENT
This paper does not generate any new data. ImageGP 2 can be accessed at https://www.bic.ac.cn/BIC/#/. The R package ImageGP is available at https://github.com/Tong-Chen/ImageGP and https://gitee.com/ct5869/ImageGP. Supplementary materials (figures, tables, scripts, graphical abstract, slides, videos, Chinese translated version, and update materials) may be found in the online DOI or iMeta Science http://www.imeta.science/.
REFERENCES
- 1. Li, Yixue, and Luonan Chen. 2014. “Big Biological Data: Challenges and Opportunities.” Genomics, Proteomics & Bioinformatics 12: 187–189. 10.1016/j.gpb.2014.10.001
- 2. Lander, Eric S., Linton Lauren M., Birren Bruce, Nusbaum Chad, Zody Michael C., Baldwin Jennifer, Devon Keri, et al. 2001. “Initial Sequencing and Analysis of the Human Genome.” Nature 409: 860–921. 10.1038/35057062
- 3. Venter, J. Craig, Adams Mark D., Myers Eugene W., Li Peter W., Mural Richard J., Sutton Granger G., Smith Hamilton O., et al. 2001. “The Sequence of the Human Genome.” Science 291: 1304–1351. 10.1126/science.1058040
- 4. The ENCODE Project Consortium. 2011. “A User's Guide to the Encyclopedia of DNA Elements (ENCODE).” PLoS Biology 9: e1001046. 10.1371/journal.pbio.1001046
- 5. Regev, Aviv, Teichmann Sarah A., Lander Eric S., Amit Ido, Benoist Christophe, Birney Ewan, Bodenmiller Bernd, et al. 2017. “The Human Cell Atlas.” eLife 6: e27041. 10.7554/eLife.27041
- 6. Lewin, Harris A., Robinson Gene E., Kress W. John, Baker William J., Coddington Jonathan, Crandall Keith A., Durbin Richard, et al. 2018. “Earth BioGenome Project: Sequencing Life for the Future of Life.” Proceedings of the National Academy of Sciences 115: 4325–4333. 10.1073/pnas.1720115115
- 7. Gao, Xinxin, Chen Kai, Xiong Jie, Zou Dong, Yang Fangdian, Ma Yingke, Jiang Chuanqi, et al. 2024. “The P10K Database: A Data Portal for the Protist 10 000 Genomes Project.” Nucleic Acids Research 52: D747–D755. 10.1093/nar/gkad992
- 8. Sayers, Eric W., Cavanaugh Mark, Clark Karen, Pruitt Kim D., Sherry Stephen T., Yankie Linda, and Karsch‐Mizrachi Ilene. 2024. “GenBank 2024 Update.” Nucleic Acids Research 52: D134–D137. 10.1093/nar/gkad903
- 9. Bai, Xue, Bao Yiming, Bei Shaoqi, Bu Congfan, Cao Ruifang, Cao Yongrong, Cen Hui, et al. 2024. “Database Resources of The National Genomics Data Center, China National Center for Bioinformation in 2024.” Nucleic Acids Research 52: D18–D32. 10.1093/nar/gkad1078
- 10. Chen, Tong, Yang Mei, Cui Guanghong, Tang Jinfu, Shen Ye, Liu Juan, Yuan Yuan, Guo Juan, and Huang Luqi. 2024. “IMP: Bridging the Gap for Medicinal Plant Genomics.” Nucleic Acids Research 52: D1347–D1354. 10.1093/nar/gkad898
- 11. Wong, Bang. 2012. “Visualizing Biological Data.” Nature Methods 9: 1131. 10.1038/nmeth.2258
- 12. Pavlopoulos, Georgios A., Malliarakis Dimitris, Papanikolaou Nikolas, Theodosiou Theodosis, Enright Anton J., and Iliopoulos Ioannis. 2015. “Visualizing Genome and Systems Biology: Technologies, Tools, Implementation Techniques and Trends, Past, Present and Future.” GigaScience 4: 38. 10.1186/s13742-015-0077-2
- 13. O'Donoghue, Seán I., Baldi Benedetta Frida, Clark Susan J., Darling Aaron E., Hogan James M., Kaur Sandeep, Maier‐Hein Lena, et al. 2018. “Visualization of Biomedical Data.” Annual Review of Biomedical Data Science 1: 275–304. 10.1146/annurev-biodatasci-080917-013424
- 14. O'Donoghue, Seán I. 2021. “Grand Challenges in Bioinformatics Data Visualization.” Frontiers in Bioinformatics 1: 13. 10.3389/fbinf.2021.669186
- 15. Shannon, Paul, Markiel Andrew, Ozier Owen, Baliga Nitin S., Wang Jonathan T., Ramage Daniel, Amin Nada, Schwikowski Benno, and Ideker Trey. 2003. “Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks.” Genome Research 13: 2498–2504. 10.1101/gr.1239303
- 16. Bastian, Mathieu, Heymann Sebastien, and Jacomy Mathieu. 2009. “Gephi: An Open Source Software for Exploring and Manipulating Networks.” Proceedings of the International AAAI Conference on Web and Social Media 3: 361–362. 10.1609/icwsm.v3i1.13937
- 17. Robinson, James T., Thorvaldsdóttir Helga, Winckler Wendy, Guttman Mitchell, Lander Eric S., Getz Gad, and Mesirov Jill P. 2011. “Integrative Genomics Viewer.” Nature Biotechnology 29: 24–26. 10.1038/nbt.1754
- 18. Ramachandran, Prabhu, and Varoquaux Gael. 2011. “Mayavi: 3D Visualization of Scientific Data.” Computing in Science & Engineering 13: 40–51. 10.1109/MCSE.2011.35
- 19. Chen, Chengjie, Wu Ya, and Xia Rui. 2022. “A Painless Way to Customize Circos Plot: From Data Preparation to Visualization Using TBtools.” iMeta 1: e35. 10.1002/imt2.35
- 20. Chen, Tong, Liu Yong‐Xin, and Huang Luqi. 2022. “ImageGP: An Easy‐To‐Use Data Visualization Web Server for Scientific Researchers.” iMeta 1: e5. 10.1002/imt2.5
- 21. Chen, Tong, Zhang Haiyan, Liu Yu, Liu Yong‐Xin, and Huang Luqi. 2021. “EVenn: Easy to Create Repeatable and Editable Venn Diagrams and Venn Networks Online.” Journal of Genetics and Genomics 48: 863–866. 10.1016/j.jgg.2021.07.007
- 22. Ning, Wanshan, Wei Yuxiang, Gao Letian, Han Cheng, Gou Yujie, Fu Shanshan, Liu Dan, et al. 2022. “HemI 2.0: An Online Service for Heatmap Illustration.” Nucleic Acids Research 50: W405–W411. 10.1093/nar/gkac480
- 23. Shen, Weitao, Song Ziguang, Zhong Xiao, Huang Mei, Shen Danting, Gao Pingping, Qian Xiaoqian, et al. 2022. “Sangerbox: A Comprehensive, Interaction‐Friendly Clinical Bioinformatics Analysis Platform.” iMeta 1: e36. 10.1002/imt2.36
- 24. Lyu, Fengye, Han Feiran, Ge Changli, Mao Weikang, Chen Li, Hu Huipeng, Chen Guoguo, Lang Qiulei, and Fang Chao. 2023. “OmicStudio: A Composable Bioinformatics Cloud Platform with Real‐Time Feedback That Can Generate High‐Quality Graphs for Publication.” iMeta 2: e85. 10.1002/imt2.85
- 25. Wang, Yazhou, Jia Lihua, Tian Ge, Dong Yihan, Zhang Xiao, Zhou Zhengfu, Luo Xiang, Li Yang, and Yao Wen. 2023. “shinyCircos‐V2.0: Leveraging the Creation of Circos Plot with Enhanced Usability and Advanced Features.” iMeta 2: e109. 10.1002/imt2.109
- 26. Miao, Ben‐Ben, Dong Wei, Han Zhao‐Fang, Luo Xuan, Ke Cai‐Huan, and You Wei‐Wei. 2023. “TOmicsVis: An All‐In‐One Transcriptomic Analysis and Visualization R Package with Shinyapp Interface.” iMeta 2: e137. 10.1002/imt2.137
- 27. Gao, Yunyun, Zhang Guoxing, Jiang Shunyao, and Liu Yong‐Xin. 2024. “Wekemo Bioincloud: A User‐Friendly Platform for Meta‐Omics Data Analyses.” iMeta 3: e175. 10.1002/imt2.175
- 28. Li, Leyuan, Ning Zhibin, Cheng Kai, Zhang Xu, Simopoulos Caitlin M. A., and Figeys Daniel. 2022. “iMetaLab Suite: A One‐Stop Toolset for Metaproteomics.” iMeta 1: e25. 10.1002/imt2.25
- 29. Feng, Kai, Peng Xi, Zhang Zheng, Gu Songsong, He Qing, Shen Wenli, Wang Zhujun, et al. 2022. “iNAP: An Integrated Network Analysis Pipeline for Microbiome Studies.” iMeta 1: e13. 10.1002/imt2.13
- 30. Ren, Yi, Yu Guo, Shi Caiping, Liu Linmeng, Guo Quan, Han Chang, Zhang Dan, et al. 2022. “Majorbio Cloud: A One‐Stop, Comprehensive Bioinformatic Platform for Multiomics Analyses.” iMeta 1: e12. 10.1002/imt2.12
- 31. Yu, Gang, Xu Cuifang, Zhang Danni, Ju Feng, and Ni Yan. 2022. “MetOrigin: Discriminating the Origins of Microbial Metabolites for Integrative Analysis of the Gut Microbiome and Metabolome.” iMeta 1: e10. 10.1002/imt2.10
- 32. Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Cham, Switzerland: Springer International Publishing.
- 33. Yang, Mei, Chen Tong, Liu Yong‐Xin, and Huang Luqi. 2024. “Visualizing Set Relationships: EVenn's Comprehensive Approach to Venn Diagrams.” iMeta 3: e184. 10.1002/imt2.184
- 34. Langfelder, Peter, and Horvath Steve. 2008. “WGCNA: An R Package for Weighted Correlation Network Analysis.” BMC Bioinformatics 9: 559. 10.1186/1471-2105-9-559
- 35. Ritchie, Matthew E., Phipson Belinda, Wu Di, Hu Yifang, Law Charity W., Shi Wei, and Smyth Gordon K. 2015. “Limma Powers Differential Expression Analyses for RNA‐sequencing and Microarray Studies.” Nucleic Acids Research 43: e47. 10.1093/nar/gkv007
