PLOS One. 2025 Jul 22;20(7):e0323079. doi: 10.1371/journal.pone.0323079

Datavzrd: Rapid programming- and maintenance-free interactive visualization and communication of tabular data

Felix Wiegand 1,*, David Lähnemann 1,2, Felix Mölder 1,3, Hamdiye Uzuner 1, Adrian Prinz 1, Alexander Schramm 4, Johannes Köster 1
Editor: Vivek Kumar 5
PMCID: PMC12282858  PMID: 40694543

Abstract

Tabular data, often scattered across multiple tables, is the primary output of data analyses in virtually all scientific fields. Exchange and communication of tabular data is therefore a central challenge. We present Datavzrd, a tool for creating portable, visually rich, interactive reports from tabular data in any scientific discipline. Datavzrd unifies the strengths of currently common generic approaches for interactive visualization like R Shiny with the portability, ease of use and sustainability of plain spreadsheets. The generated reports require neither the maintenance of a web server nor the installation of specialized software for viewing and can simply be attached to emails, shared via cloud services, or serve as manuscript supplements. They can be specified without requiring imperative programming, thereby enabling rapid development and offering accessibility for non-computational scientists, unlocking the look and feel of dedicated manually crafted web applications without the maintenance and development burden. Datavzrd reports scale from small tables to thousands or millions of rows and offer the ability to link multiple related tables, allowing users to jump between corresponding rows or hierarchically explore growing levels of detail.

Introduction

Tabular datasets, prevalent across diverse scientific fields such as biology, physics, economics, and environmental science, stand as the primary outcomes of scientific investigations. Communicating such results in a way that others can access the contained information at ideally the same level of detail as the original analysis author is a major cornerstone of reproducibility and transparency after publication. Moreover, it is important for the efficiency of scientific communication before publication, enabling non-computational scientists to, for example, dive deeply into data underlying high-level figures.

Spreadsheet applications like Excel (with their corresponding file format xlsx, in the following called Excel files for simplicity) are a traditional option for such communication and are still commonly regarded as a primary choice, especially among non-computer science professionals. However, most spreadsheet applications, including Excel, lack the capability to quickly and easily visualize different columns of a dataset. They also do not support reproducing results for similar datasets without repeating the work. Finally, multiple investigations of supplementary Excel files show that the automated conversion of certain values can lead to the misinterpretation of data: for example, gene names such as SEPT2 (Septin 2) are inadvertently converted into dates, leading to errors in genomic research [4,7].

Beyond plain spreadsheet representations, other solutions are available. Naturally, any general purpose programming language (e.g. Python or R) or transformation language (e.g. XSLT) could be used (e.g. in combination with helper libraries like great-tables [26]) to implement entirely custom reporting. Moreover, various frameworks have been developed that further simplify these tasks. Shiny [9] works by implementing the intended visuals and corresponding interfaces in an R or Python script, which is then executed in a server process that provides the resulting user interface in the form of served HTML pages. ShinyApps.io [27] enables easy hosting of Shiny applications without the need to maintain one's own web server. However, at the time of writing, its free plan is limited to five applications, 25 usage hours per month, and 1 GB per dataset. Similar to Shiny, Dash [28] allows creating interactive dashboards with Python code, but also requires a running server process. Lumen [29] provides data-driven dashboards that can be configured without imperative programming, using the declarative YAML language, but also requires a server process. Charting libraries such as D3.js [30] or Plotly [31] enable the creation of interactive visualizations, but using them within tables requires substantial bespoke coding and web development effort. MultiQC [2] enables the summarization of various standard formats into single-page HTML-based reports. Unlike Dash, Lumen, and Shiny, MultiQC reports are portable and do not require a running web server. Apart from pre-implemented standard formats, MultiQC also offers the ability to specify custom data, but without the ability to interactively explore large, interrelated, complex tables.

Finally, there are numerous specialized web applications that aim to make certain domain-specific kinds of data explorable [14–18]. Those applications often combine query interfaces with tabular and plot-based visualizations, backed by a server process that extracts information from a database. Their development is complex, requiring thousands of lines of code and work hours even though they share similar visuals, and maintaining their installation likewise demands substantial effort. Finally, data security requirements often turn their exposure to the public into a complex and sometimes even impossible task, so that they cannot be easily used to communicate the results supporting a publication in a transparent way.

So far, no available solution has combined a visual and interactive interface to tabular data of any kind and size with maintenance-free, reliable, and sustainable availability.

To overcome this situation, we present Datavzrd (https://datavzrd.github.io), which aims to provide visually rich, interactive, and portable representations of tabular data, while avoiding the caveats of the approaches listed above. Datavzrd can be used in virtually any kind of data analysis for any kind of tabular data type in a rapid and ad-hoc fashion, obviating the need for specialized implementations and continuous maintenance as well as deployment of a web application.

Datavzrd is implemented as a command line application in the Rust [32] programming language. It is available as MIT-licensed open source software via GitHub [33], can be installed via Cargo [34] and Conda [35], or used as a Snakemake wrapper [36] for rapid integration into reproducible data analysis workflows.

Results

Feature overview

Without requiring imperative programming, Datavzrd can produce highly interactive and visually rich interfaces (cf. Fig 2) from tabular input files in various formats (CSV, TSV, Parquet [37] or JSON). Using simple YAML-based declarative specifications, users define datasets along with the desired visualizations for each column. The self-contained report is then generated by executing Datavzrd with the command datavzrd path/to/config.yaml -o path/to/output. Within the YAML specifications, a plethora of visual standard elements can be activated per table column, including tick plots (Fig 11A), bar plots (Fig 11B), heatmaps (Fig 11C), pill plots (Fig 11D), ellipsis, custom plots, linkouts (Fig 11E), and scientific notation of floating point numbers. The resulting per-cell visualizations are complemented by extensive annotation and control over the displayed tables, including sorting, filtering and searching (Fig 9), optional line number display, per-column histograms, additional header rows, table descriptions, and the ability to hide columns into a detail-mode.

Fig 2. Screenshot of a Datavzrd report with annotated visual elements and controls.

The underlying example dataset entails genomic variants along with various scores and predictions. Gene names and coordinates have been altered to de-identify the data. Source: https://github.com/snakemake-workflows/dna-seq-varlociraptor. An interactive version of this report is available under https://datavzrd.github.io/example-molecular-tumor-board.

Fig 11. Different pre-defined column visualizations of Datavzrd.

Left: corresponding YAML specification in the configuration file. Right: resulting column visualization. A: A Datavzrd tick plot definition with user-defined domain. B: A Datavzrd bar plot definition with user-defined domain and additional color domain. C: A Datavzrd heatmap for a column named Rated. D: Pill plot definition for a cell containing multiple values separated by any delimiter. E: YAML specification of a linkout to the NCBI gene database.

Fig 9. Comparison of search and filter mode (text input, filter brush or multi-select).

Beyond single tables, Datavzrd offers the ability to jump back and forth between corresponding rows in multiple related tables and to create custom visualizations that entirely replace the default table views with Vega-Lite [3] plots or plain HTML. Importantly, Datavzrd reports are encoded as standalone HTML files that do not require a web server for viewing and can simply be opened in any HTML5 [38] compliant web browser. This way, they can be easily shared via email or attached as supplementary files to publications. Additionally, this eliminates server maintenance and reduces server load by shifting processing load to the browser (without hampering scalability, see section Scalability).

For enhanced accessibility and ease of hosting, Datavzrd provides a publish subcommand, which automates the upload of reports to a newly created or existing GitHub repository using a simple command-line invocation. Afterwards it provides the user with a link to the repository settings where GitHub Pages [39] need to be enabled manually — once upon initial publication of the repository. This way authors can ensure transparency and continuous public availability of their results without the burden of maintaining a web server.

To further simplify the configuration process, Datavzrd also includes the suggest subcommand, which automatically generates a YAML configuration file based on one or more input CSV files. By inferring column types and suggesting suitable visualizations, it enables rapid report creation, allowing users to focus on their data rather than technical details.

The result is that Datavzrd’s approach offers a more rapid path towards reproducible, transparent, comprehensive and informative communication of tabular scientific results that moreover is free of continued maintenance tasks (see Fig 1 and the Shiny vs. Datavzrd comparison in the supplement). Via compression methods and data partitioning strategies, Datavzrd is scalable towards big datasets without overwhelming the browser memory while still maintaining interactive capabilities. An overview of Datavzrd’s out of the box visual interface capabilities can be found in Fig 2.

Fig 1. Timeline comparison.

Comparison of the work items needed for publication and communication and the interaction capabilities of Datavzrd compared to above mentioned alternative approaches. Time per step is meant as a relative approximation of the true times, which may of course vary depending on the actual kind of data that shall be handled. We postulate that, for example, simple configuration requires less time than backend or frontend implementation and running a command line tool like Datavzrd is less effort than deploying a web service on a given infrastructure. Finally, the required effort for Datavzrd ends with the publication of the generated static HTML files, whereas server based approaches require continuous maintenance of their deployment in order to remain secure, operable, and accessible.

Community driven extensibility

Datavzrd offers the ability to share recurrent configuration snippets for columns or views as so-called spells. This can be helpful since data analyses are likely to share columns with values that can be interpreted and visualized similarly, even when their field of application might be completely different. Examples for such cases are columns with boolean values (e.g. 0/1 or true/false) or columns with p-values or posterior probabilities.

A spell is a reusable template that defines the rendering of a specific column or view in a Datavzrd report. Each spell is written in YAML and supports parameterization through YTE [40], a YAML template engine, allowing users to pass custom values to the spell using the keyword 'with'. For example, a p-value spell might define a heatmap to visually represent statistical significance, where a configurable significance threshold determines the color gradient. In practice, users can apply a spell by specifying it directly within a report configuration as shown in the configuration of Fig 4.

Fig 4. Datavzrd boolean spell and its application to a gene table.

Left: Datavzrd spell for displaying boolean values of the column highly_variable of the report to the right. The true_value and false_value parameters define which values are rendered into a plus sign and a minus sign. Right: Example Datavzrd view of a bioinformatics spreadsheet supplement of Klein et al. [20]. Gene names have been rendered as links to a public database, boolean values have been rendered using Datavzrd’s boolean spell [56] (see Community driven extensibility), other columns have been rendered as heatmaps with categorical or linear color scale.

Spells can be published in a central repository [41] and explored via an automatically updated catalog [42] so that others can reuse them in their own Datavzrd reports. This collaborative approach fosters a shared library of visualization techniques that grows and adapts to evolving research needs across disciplines, thereby obviating the need to reinvent the wheel for visualizing common data types. Since spells can also encode full views, they can be used to easily create reports with pre-defined visualizations for virtually any tool creating tabular output.

Example reports for various scientific disciplines

To highlight Datavzrd’s field-agnostic application across various scientific disciplines, we created Datavzrd reports from the supplementary data of four recently published papers [19–22], each originating from a different field: bioinformatics [43], social science [44], astronomy [45], and anthropology [46]. We have published the reports on GitHub Pages using the publish subcommand outlined above. Tables 13 to 18 from Klein et al. [20] let the user navigate between driver genes of different cell types while also directly providing access to these genes in the Ensembl database. Table 25 of the same publication directly provides access to given genomic regions in the Integrative Genomics Viewer [23] using a linkout. Instead of simply displaying numeric values with a given uncertainty (e.g. 20.2±0.2) as plain text, a table containing radiocarbon data from L. G. Sanjuán et al. [19] shows them in a convenient tick plot, with the range of the uncertainty depicted as a red range bar and the central value represented as a blue tick. Individual example views are presented in Figs 3, 4, 5 and 6.

Fig 3. Example Datavzrd view of an astronomy spreadsheet supplement of Barraud et al. [22].

Boolean columns have been rendered with Datavzrd’s boolean spell [56], numeric columns have been rendered as heatmaps. The squarish link buttons at the far right of each row (generated by Datavzrd’s dataset linking functionality, see section Interactivity and visuals) allow users to jump to corresponding rows in other views.

Fig 5. Example Datavzrd view of a social science spreadsheet supplement of Alayón-Gamboa et al. [21].

Categorical columns have been rendered as heatmaps with a categorical color scale. The squarish link buttons at the far right of each row (generated by Datavzrd’s dataset linking functionality, see section Interactivity and visuals) allow users to jump to corresponding rows in other views.

Fig 6. Example Datavzrd view of an anthropology spreadsheet supplement of L. G. Sanjuán et al. [19].

Numerical values have been rendered as tick plots or heatmaps. Categorical columns have been rendered as heatmaps with categorical color scale.

Memory and storage footprint

We evaluated the storage utilization of Datavzrd reports in comparison to the raw input data and an Excel file. The analysis has been implemented as a fully reproducible Snakemake [5] workflow [47]. Our dataset [48] used for testing comprises 17 columns and 173534 rows with information about registered electric vehicles in the state of Washington. Seven of the columns contained numeric, ten contained nominal values.

In Fig 7, we compare the on-disk storage footprint of two Datavzrd reports (one that represents the dataset entirely in memory and one that uses data partitioning for the same dataset) with the footprint of the CSV and Excel representations of the same dataset. The Datavzrd in-memory report occupies nearly as little space as the plain Excel file, while being (naturally) much smaller than the plain text CSV representation. Further, a bit less than half of its size is occupied by static resources like JavaScript libraries and HTML pages. In contrast, the partitioned version of the report requires additional overhead for storing search indices and suffers from decreased compression rates, because each table page has to be compressed separately.

Fig 7. Storage usage of Datavzrd reports compared with Excel and raw input data.

While it is apparent that both Excel and Datavzrd are able to successfully compress the dataset, Fig 7 indicates that compression rates of Datavzrd depend on the number of table rows that can be compressed together (i.e. the page size or whether the entire dataset is held in memory). Fig 8 shows that the compression rate for the in-memory mode of Datavzrd exceeds that of Excel, while smaller page sizes lead to a monotonic decrease of compression rates.

Fig 8. Comparison of compression rates for increasing input sizes and different page sizes.

Methods

We shed light on the central technical challenges and solutions within Datavzrd by considering portability, interactivity and scalability.

Portability

In the context of scientific research, seamless data exchange among researchers is a key factor for collaborative progress. With this in mind, Datavzrd implements various features targeting a high portability of the resulting reports.

A key feature of Datavzrd is a self-contained report architecture — eliminating the need for a dedicated backend or server. The entire report, structured as HTML files in a regular folder, is platform-agnostic, offering compatibility across all operating systems.

Browsers subject HTML pages to the same-origin-policy [8], which prohibits them from dynamically accessing resources hosted on domains other than their own. In case of standalone HTML pages that are accessed from the file system instead of being provided by a web server, this implies that the page may only access local files via static HTML tags. Hence it becomes impossible to dynamically load data stored in binary formats such as HDF5 [49], Parquet [37], or Flatbuffers [51], as well as to directly access databases like SQLite [52] or DuckDB [50]. To work around this limitation, Datavzrd stores data in JavaScript files, which are then loaded via static script tags. This approach requires regenerating the report whenever the input dataset changes; however, because data and configuration are completely separated, no updates to the latter are needed as long as the structure of the data remains unchanged. We recommend embedding the invocation of Datavzrd into automated workflows (e.g., using Snakemake [5], Nextflow [24], or CWL [25]) to facilitate reproducible report generation after data updates.
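The static loading approach described above can be illustrated with a minimal Python sketch. It is not Datavzrd's actual implementation (which is written in Rust); the file names data.js and index.html and the variable name REPORT_DATA are hypothetical and chosen only for illustration.

```python
import json
from pathlib import Path


def write_static_report(rows, out_dir):
    """Write rows into a JavaScript file plus an HTML page that loads it statically."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # The data becomes a plain JavaScript assignment. The page therefore needs
    # no dynamic request, which the same-origin policy would block when the
    # page is opened directly from the file system.
    data_js = "const REPORT_DATA = " + json.dumps(rows) + ";\n"
    (out_dir / "data.js").write_text(data_js, encoding="utf-8")

    # The HTML page references the data only via a static script tag.
    html = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><script src="data.js"></script></head>
<body>
<pre id="out"></pre>
<script>
document.getElementById("out").textContent = JSON.stringify(REPORT_DATA, null, 2);
</script>
</body>
</html>
"""
    index_path = out_dir / "index.html"
    index_path.write_text(html, encoding="utf-8")
    return index_path
```

Because the data lives in a generated file, rerunning such a function after a data update regenerates the report without touching the configuration, which mirrors the separation of data and configuration described above.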

In addition to sharing full reports, Datavzrd provides a share button for each row, enabling users to share individual rows of a dataset with others. This is achieved by encoding the row data into a URL parameter. The URL directs the user to a website hosted on GitHub that can display the exchanged dataset row by reading the URL parameter. Since the row data is in theory discoverable from URL parameters that might occur in server logs, the base URL of the HTML page can be customized through the configuration file, allowing organizations to ensure that sensitive data remains within their local network by hosting the previously mentioned HTML page themselves. Datavzrd can also encode this URL into a QR code, providing a practical method for presenters to share data during presentations.
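The idea of carrying a row inside a URL parameter can be sketched as follows. The exact encoding used by Datavzrd is not specified here, so this sketch assumes JSON serialization followed by URL-safe Base64; the parameter name row and the base URL in the usage note are hypothetical.

```python
import base64
import json
from urllib.parse import parse_qs, urlencode, urlparse


def encode_row_url(base_url, row):
    """Serialize a row to JSON, Base64-encode it, and append it as a URL parameter."""
    payload = json.dumps(row, separators=(",", ":")).encode("utf-8")
    token = base64.urlsafe_b64encode(payload).decode("ascii")
    return base_url + "?" + urlencode({"row": token})


def decode_row_url(url):
    """Recover the row from the URL parameter (what the viewer page would do)."""
    token = parse_qs(urlparse(url).query)["row"][0]
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
```

For example, encode_row_url("https://example.org/row-viewer.html", {"gene": "SEPT2"}) yields a link whose query string fully contains the row. This also shows why the row would appear in server logs of the viewer page, and why a configurable base URL is useful for sensitive data.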

Individual pages of the report can be exported into SVG format, for example in order to embed them into manuscript figures. Additionally, Datavzrd reports offer an export to Excel (XLSX). This way, they can serve as replacement for sharing spreadsheets without hampering the ability of collaborators to further analyze aspects of the data in their traditional spreadsheet environment.

Interactivity and visuals

The most obvious kind of interaction with a table is to filter or search for rows of interest. Depending on whether the dataset is small enough to reside entirely in memory, Datavzrd offers either a filter or a search mode. The filter mode presents a widget for each column, which, depending on the type of data in the column, allows the user to choose from discrete values, specify a search keyword, or select a numeric range. The widgets of the filter mode, along with the search mode, are displayed in Fig 9. Filters for different columns can be combined, yielding the intersection of the resulting rows. For a synthetic dataset with 100,000 rows and 30 columns (including nominal, integer, and string-valued columns), interactive filtering operations consistently complete in under 100 ms. Measurements were taken using the browser’s performance.now() API in Safari 18.5 on macOS 13.7.6 with a 3.8 GHz Quad-Core Intel Core i5 and 48 GB of RAM. The search mode (due to constraints on representing large datasets out of memory in standalone HTML files) offers the ability to search for keywords or numbers and subsequently jump to the corresponding pages of the table (see section Scalability).
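The combination of per-column filters into an intersection can be sketched in a few lines of Python. This is only an illustration of the logic, not the browser-side code of Datavzrd; all function names are hypothetical.

```python
def one_of(*values):
    """Filter for discrete values (multi-select widget)."""
    allowed = set(values)
    return lambda cell: cell in allowed


def contains(keyword):
    """Filter for a case-insensitive search keyword (text input widget)."""
    needle = keyword.lower()
    return lambda cell: needle in str(cell).lower()


def in_range(low, high):
    """Filter for a closed numeric range (brush widget)."""
    return lambda cell: low <= cell <= high


def apply_filters(rows, filters):
    """Keep the rows that satisfy every column filter, i.e. the intersection."""
    return [
        row
        for row in rows
        if all(predicate(row[column]) for column, predicate in filters.items())
    ]
```

A call such as apply_filters(rows, {"make": one_of("TESLA"), "year": in_range(2018, 2022)}) returns only rows passing both filters; an empty filter dictionary returns all rows.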

In practice, data can often be best represented by multiple tabular datasets that are related to each other. It may even be beneficial to divide wider tables into multiple ones, each containing a specific subset of columns [10]. Therefore, Datavzrd offers the capability to link between datasets, allowing the user to jump back and forth between corresponding rows of related tables.

The configuration of the datasets allows for the values of any column to be used as foreign keys to create a link to a row of another dataset with one column configured to be used as the primary key. Each primary key value must only occur once in the linked dataset to preserve uniqueness of the key lookup.

The same mechanism enables hierarchical linking by utilizing values from one column as foreign keys to connect with different tables. This feature allows for concise overview tables to establish connections with more detailed subtables, enhancing the exploration of specific details within the given data without requiring the entire hierarchical structure to be present in browser memory. An example for a dataset definition using both linking options is provided in Fig 10.

Fig 10. A Datavzrd dataset definition including different linkouts.

In this example, table-a creates a link to table-b where column gene-name of table-a and column gene of table-b match in value. With the gene details definition, table-a is linked to one of many tables existing for each value of gene-name.
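The two linking variants can be sketched in Python as a key lookup and a name lookup. This is a simplified illustration under stated assumptions: the function names and the {value} placeholder in the view pattern are hypothetical and do not claim to reproduce Datavzrd's configuration syntax.

```python
def build_primary_index(rows, key_column):
    """Map each primary key value to its row position, enforcing uniqueness."""
    index = {}
    for position, row in enumerate(rows):
        key = row[key_column]
        if key in index:
            raise ValueError(
                f"duplicate primary key {key!r} in column {key_column!r}"
            )
        index[key] = position
    return index


def resolve_row_link(source_row, foreign_column, target_index):
    """Row linking: return the position of the matching row, or None if absent."""
    return target_index.get(source_row[foreign_column])


def resolve_view_link(source_row, foreign_column, view_pattern):
    """Hierarchical linking: derive the name of a detail view from the key value."""
    return view_pattern.format(value=source_row[foreign_column])
```

With an index built over column gene of table-b, resolve_row_link finds the row matching the gene-name value of a row in table-a, while resolve_view_link selects one of many detail tables by name, so only the requested detail table has to be loaded.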

Columns in Datavzrd reports can be displayed in different modes: normal, detail, available, pinned, and hidden. These modes are designed to address the challenge of maintaining clarity while accommodating a large amount of data. Research indicates that an ideal table typically contains 3-5 columns to ensure readability and prevent information overload [11]. While the exact number of columns will depend on their information content and the context of the table (on paper or on screen), Datavzrd’s display modes offer the ability to follow these guidelines easily. In particular, the hidden mode allows for complete concealment of less critical columns, thereby keeping the table concise and focused. Meanwhile, the detail mode enables selective unfolding of additional columns per row as needed, allowing users to access more comprehensive data without compromising the table’s clarity. Using available hides a column by default while still providing the user with the option to reveal it using a multi-select column interface on the top right when exploring the dataset. Complementary to this, pinned keeps the column visible in the table but excludes it from the interface, preventing it from being accidentally hidden and reducing clutter in the selection interface.

Guidelines for effective tables recommend that column headers be kept short or use abbreviations [12]. This is possible using the label keyword, which allows specifying column labels for the resulting report without the need to modify the given input dataset. To make tables self-explanatory, the description keyword can be used to provide additional information and context [12]. This keyword supports Markdown-formatted text [13] as well as LaTeX-style math equations, enabling the report creator to include detailed and complex annotations, ensuring the table can be understood by any observer. Additionally, Datavzrd provides the option to set a character limit for a column using the ellipsis keyword. This only shows the first n characters in the cell, with the rest still accessible via a tooltip that appears while hovering over the cell. External resources like knowledge databases can provide additional context with even more details to researchers. For easy access to external resources, Datavzrd provides the option to configure one or more linkouts for a cell using the value of the cell or other values of the same row to format the URL.
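The behavior of the ellipsis and linkout options can be sketched as two small Python functions. Both are hypothetical illustrations rather than Datavzrd code; the {value} placeholder is an assumption, and the sketch only covers linkouts built from the cell's own value, not from other values of the same row.

```python
from urllib.parse import quote


def ellipsize(text, n):
    """Return (visible_text, tooltip); tooltip is None when nothing was cut off."""
    text = str(text)
    if len(text) <= n:
        return text, None
    # Show the first n characters and keep the full text for the tooltip.
    return text[:n] + "...", text


def format_linkout(url_template, value):
    """Insert the percent-encoded cell value into a URL template."""
    return url_template.format(value=quote(str(value), safe=""))
```

For instance, format_linkout("https://www.ncbi.nlm.nih.gov/gene/?term={value}", "SEPT2") produces a search link to the NCBI gene database for that gene name.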

Datavzrd relies on Vega-Lite as its underlying engine for generating a wide variety of plots [3]. Internally, Vega-Lite specifications compile to Vega, which itself builds on D3.js for rendering. Datavzrd provides an even higher level of abstraction with its pre-made plot configurations for ticks, bars, pills and heatmaps. When specified for a column (cf. Fig 11), these visualizations are automatically applied to and configured for every cell within that column. Extending the existing plot configurations, Datavzrd also allows including Vega-Lite plots that can be fully customized through JSON specifications, giving researchers the flexibility to create visuals tailored to their specific needs and preferences. Through this mechanism, users can harness the full expressiveness of Vega-Lite, which supports a wide range of visualizations such as scatter plots, line and area charts, bar charts, heatmaps, geographic maps and faceted layouts (see examples for the extensive set of supported types). If none of the previously mentioned options is suitable for a given task, configurations of columns may include any JavaScript function to process data and return plain HTML that is rendered into the respective table cell.

By using Vega-Lite plots, users can not only specify custom plots using values of a single row to display inside a table but also harness the entire dataset to craft plots that convey meaningful insights independently of the tabular view and stand as a single view inside the report. Full plot views in Datavzrd retain the previously introduced linking feature for datasets, enabling users to click on data points within a plot for navigating to corresponding rows of complementary tables.

Finally, beyond tabular and plot views, Datavzrd offers the possibility to configure plain HTML views, which allow embedding any kind of custom interface or visualization.

Scalability

To ensure scalability towards big datasets, Datavzrd uses a combination of different compression techniques. Initially, the data is compressed using JSONM [53]. JSONM uses memoization principles on JSON objects in order to eliminate any unnecessary repetition of data. In particular, it stores keys only once, such that objects of the same kind can be represented via arrays of values. The JSONM representation is then compressed with lz-string [54], a JavaScript library that implements Lempel-Ziv compression algorithms while ensuring that the resulting compressed word can be represented as a JavaScript string (thus not containing any invalid binary-only characters).
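The principle of storing keys only once and then applying a Lempel-Ziv style compressor can be sketched in Python. This is a simplified stand-in and not the actual JSONM format; likewise, zlib (DEFLATE, which is LZ77-based) is used here in place of lz-string, and unlike lz-string its output is raw bytes rather than a valid JavaScript string.

```python
import json
import zlib


def pack_rows(rows):
    """Store the keys once and each row as an array of values.

    Assumes all rows share the same keys, as in a table.
    """
    keys = list(rows[0].keys()) if rows else []
    return {"keys": keys, "values": [[row[key] for key in keys] for row in rows]}


def unpack_rows(packed):
    """Rebuild the list of row objects from the packed representation."""
    return [dict(zip(packed["keys"], values)) for values in packed["values"]]


def compress_packed(packed):
    """Serialize compactly and compress (zlib as a stand-in for lz-string)."""
    text = json.dumps(packed, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"))


def decompress_packed(blob):
    """Invert compress_packed."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Packing already removes the per-row repetition of column names, and the subsequent compression step exploits repeated values across rows, which is why compressing more rows together tends to improve the compression rate.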

Any tabular dataset encoded by the above approach will, upon rendering the report’s HTML pages, still be represented in uncompressed form in the browser memory. If the uncompressed dataset size exceeds the available system memory this can lead to slowdowns or crashes. In order to avoid this situation, Datavzrd partitions large datasets into multiple chunks, each of them being only loaded when the corresponding table page in the report is accessed. The threshold for and size of these chunks can be configured upon report generation (default: 20000 rows). Since those chunks cannot be loaded dynamically due to the same-origin-policy, we encode each table page as a separate HTML file. Page sizes may be configured by the user to address variations in performance across different machines and account for datasets with differing numbers of columns.
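The partitioning decision can be sketched as follows. The parameter names max_in_memory_rows and page_size and the page file naming scheme are hypothetical and chosen for illustration only.

```python
def partition_rows(rows, page_size):
    """Split rows into consecutive pages of at most page_size rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [rows[start:start + page_size] for start in range(0, len(rows), page_size)]


def page_file_name(page_number):
    """Hypothetical naming scheme: one HTML file per table page."""
    return f"index_{page_number}.html"


def plan_report(rows, max_in_memory_rows, page_size):
    """Keep small datasets in one piece and paginate larger ones."""
    if len(rows) <= max_in_memory_rows:
        return {"mode": "in-memory", "pages": [rows]}
    return {"mode": "paginated", "pages": partition_rows(rows, page_size)}
```

Since each page ends up in its own HTML file, only the rows of the page currently being viewed need to be decompressed into browser memory.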

Searching throughout such chunked datasets, paginated across multiple HTML pages, presents a challenge as direct access to these pages is restricted by the same-origin-policy. To overcome this limitation, Datavzrd implements a pre-built search index for each column, providing an effective solution for search and navigation across the distributed dataset. The index includes the values of each column alongside their respective page locations, and is displayed upon request using an embedded iframe. This way, we avoid that the indexes of all columns are loaded entirely into memory when rendering a page. Within the index table, the user can filter for values of interest, such that the pages of occurrence are displayed and can be navigated to. Upon the latter, the row corresponding to the value of interest is highlighted.
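A per-column search index of this kind can be sketched in Python as a mapping from values to the pages on which they occur. The function names are hypothetical, and the sketch ignores how the index is rendered in the report.

```python
from collections import defaultdict


def build_search_index(pages, column):
    """Map each value of a column to the sorted page numbers where it occurs."""
    index = defaultdict(set)
    for page_number, page in enumerate(pages, start=1):
        for row in page:
            index[str(row[column])].add(page_number)
    return {value: sorted(numbers) for value, numbers in index.items()}


def search_index(index, keyword):
    """Return all indexed values containing the keyword, with their pages."""
    needle = keyword.lower()
    return {
        value: numbers
        for value, numbers in index.items()
        if needle in value.lower()
    }
```

Because the index is built once at report generation time and kept per column, a search only has to consult the index of the column of interest and can then jump to the listed pages.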

If the dataset is small enough to fit entirely into memory, Datavzrd can instead rely on a versatile filtering mechanism that enables users to dynamically select and combine data ranges across multiple columns.

Conclusion

Tabular data, often scattered across multiple tables, is the primary output of data analyses in virtually all scientific fields. Exchange and communication of tabular data is therefore a central challenge in data science. So far, this has been mostly handled by spreadsheet applications like Microsoft Excel - offering limited visual and interactive capabilities as well as potential problems with compatibility, replicability and value encoding [4] - or server based approaches like R Shiny, Lumen or dedicated web applications, requiring maintenance of web servers and potentially extensive imperative programming while limiting sustainability and reproducibility upon publication.

To overcome this situation, we have developed Datavzrd, which unifies rich visual capabilities with the portability offered by plain spreadsheets. It enables the generation of server-free, portable, interactive visual reports on tabular data that can be rapidly configured via a declarative specification language, in many cases obviating the need for imperative programming. Once a configuration file is created, Datavzrd can be repeatedly applied to various datasets of the same structure, offering automated, efficient, and scalable table visualization. Linking features, various display modes, extensive visualization capabilities, and the ability to specify fully custom non-tabular views, simplify structured dataset exploration, even across multiple related tables. Datavzrd’s versatile export functionalities offer straightforward inclusion into manuscript figures. Via spells, custom column or view configurations can be shared and developed as a community project.

Future work on Datavzrd will entail a further improvement of the data compression and decompression strategy. One approach is to directly compress binary encoded JSONM with lz-string. Provided that a machine-independent decoding of the binary encoding can be realized from within JavaScript, this approach would substantially reduce the space required for numeric values. Furthermore, we plan to extend support to additional data types, such as temporal data. Currently, Datavzrd can already be integrated into other standalone reporting tools, for example Snakemake reports. Using web components [55], we plan to generalize the user interface so that its components can be embedded into arbitrary web application frameworks. Currently, Datavzrd’s aggregation capabilities are limited to simple, per-column summaries such as histograms. More complex operations like window functions or pivot tables are not supported. We see this as a deliberate design choice: such aggregations are better handled upstream in the analysis workflow [5,24,25], where they can be executed transparently and reproducibly. In future versions, we plan to extend Datavzrd with lightweight interactive features such as scatterplots for user-selected column pairs.

In summary, Datavzrd streamlines, simplifies, and enriches the exchange and communication of tabular data, one of the most abundant outputs of scientific data analyses, while scaling seamlessly from small tables to thousands or even millions of rows without losing interactive and visually rich exploration. All these capabilities can be used on any kind of tabular data, and the resulting reports can be used as supplementary files for published manuscripts, or shared with collaborating researchers before publication. This enables collaborators and readers to interactively explore the data underlying the findings of an analysis or published manuscript directly in their web browser without imposing additional maintenance or implementation burdens on the analysis authors. Likewise, it enables journals or scientific data hosting services like Zenodo [1] to sustainably ensure that the interactive resources associated with a manuscript also remain available, by simply hosting a set of static HTML files instead of maintaining server processes.

Software availability

Datavzrd is implemented as a command line application in the Rust [32] programming language. It is available as MIT-licensed open source software via GitHub [33], can be installed via Cargo [34] and Conda [35], or used as a Snakemake wrapper [36] for rapid integration into reproducible data analysis workflows.

Supporting information

S1 Appendix. Comparison of Shiny and Datavzrd for Interactive Table Creation.

We illustrate how Datavzrd supports rich, per-cell visualizations using Vega-Lite and compare its concise, declarative configuration against a minimal Shiny implementation. Example configurations, rendered outputs, and setup instructions are provided for both tools.

(PDF)

pone.0323079.s001.pdf (1.3MB, pdf)
S2 File. Example report for Fig 2.

Interactive Datavzrd report showcasing genomic variants with associated scores and predictions in a molecular tumor board context. The dataset has been de-identified by altering gene names and coordinates; see the original workflow at https://github.com/snakemake-workflows/dna-seq-varlociraptor and explore the interactive report at https://datavzrd.github.io/example-molecular-tumor-board.

(ZIP)

pone.0323079.s002.zip (1.3MB, zip)

Acknowledgments

We thank everyone who contributed by opening issues or submitting pull requests to Datavzrd. Although not listed as co-authors, their input helped refine the software and is acknowledged with appreciation.

Data Availability

Code is available under https://github.com/datavzrd/datavzrd.

Funding Statement

The author(s) received no specific funding for this work.

References

Decision Letter 0

Vivek Kumar

28 May 2025

PONE-D-25-17640

Datavzrd: Rapid programming- and maintenance-free interactive visualization and communication of tabular data

PLOS ONE

Dear Dr. Wiegand,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jul 12 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Vivek Kumar, Ph. D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that your Data Availability Statement is currently missing the repository name and/or the DOI/accession number of each dataset OR a direct link to access each database. If your manuscript is accepted for publication, you will be asked to provide these details on a very short timeline. We therefore suggest that you provide this information now, though we will not hold up the peer review process if you are unable.

3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

4. We are unable to open your Supporting Information file “datavzrd-report-supplement.zip”. Please kindly revise as necessary and re-upload.

Additional Editor Comments:

The paper introduces Datavzrd, a declarative framework for generating server-free, interactive HTML visualizations from tabular data. It is clearly written and presents an appealing tool that could benefit a wide range of disciplines working with structured data. The framework emphasizes ease of use, portability, and client-side rendering, reducing the need for complex server infrastructure or imperative programming.

However, the manuscript lacks detailed workflows, comprehensive comparisons with existing tools (e.g., D3, Plotly, Dash), and quantified performance evaluations. Technical details, including supported data formats and visualization configurations, are either missing or insufficiently explained. Visual elements like figures and code snippets also need improvement for clarity and completeness.

The authors should ensure that the supporting files (such as datavzrd-supplement.pdf and datavzrd-report-supplement.zip and its contents) are accessible in the manuscript’s Supporting Information section so they are not missed by the reviewers and are eventually accessible to the readers of the publication as well.

Overall, Datavzrd shows good potential, but the paper would benefit from additional empirical support, clearer technical exposition, and a more balanced discussion of its limitations and place within the broader visualization tool landscape.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper presents a framework called Datavzrd that allows easy visualization of tabular data via HTML pages. It does not report substantial research findings, but presents a new research method which could be interesting for many disciplines dealing with tabular data.

The paper is written in comprehensible language and presented in an appealing style. It claims many advantages of their tool over existing alternatives, but misses evidence for these claims.

Major comments:

1. The authors claim that their tool allows more efficient visualizations with less code than existing alternatives, in particular Shiny. This is argued in text, but not substantiated by examples. An example workflow for a simple data table should be included, which puts the Datavzrd approach in contrast to Shiny (similar to the supplementary material plus brief explanations).

2. In Sec. 1 Excel and the xlsx format are mentioned. It is not clear, however, which other – esp. non-proprietary – file formats are supported and how the tool is applied to these.

3. Furthermore it remains unclear how the Datavzrd definition of visualizations interacts with the dataset. There are very few code snippets (e.g. Fig. 4) which, however, do not show how visualizations are defined. An example workflow as suggested in comment 1 could help.

Minor comments:

4. In the related works section of the introduction, D3, Plotly, and Dash could be mentioned as further widely used tools (also requiring lots of code, though).

5. Figure 2 is too small and thus unreadable. It should be rotated by 90 degrees to fill a whole page.

6. Code and visualization in Fig. 4 do not match. Furthermore, the very small code snippet does not explain what a "spell" is and how the values true and false are translated in visual representations.

7. Several references are missing bibliographical information such as journal/conference, page numbers etc., such as [14], [16], [17], [22]. Furthermore, journal names should be capitalized. More care should be taken when preparing the next version.

Reviewer #2: In this paper, the authors introduce Datavzrd, a framework designed for generating server-free, portable, and interactive visual reports from tabular data. The system leverages a declarative specification language to facilitate rapid report configuration, eliminating the need for imperative programming. The paper is generally well-written and is supported by illustrative examples and clear explanations.

One notable advantage the authors might emphasize is that client-side JavaScript frameworks reduce server load, as all rendering, interactivity, and event handling occur in the client’s browser. Given the capabilities of modern PCs and smartphones, this approach is both practical and scalable.

However, to substantiate the authors' claim regarding the uniqueness of Datavzrd, it is important to include a more comprehensive comparison with existing client-side JavaScript frameworks. Specifically, a discussion of alternatives such as D3.js, Vega, Plotly.js, and even older technologies like XSLT would strengthen the paper by placing Datavzrd within the context of existing tools.

Many server-based reporting tools provide advanced aggregation capabilities such as window functions, multi-level aggregations, predictive analytics, and pivot operations. These typically require multiple data passes and significant memory. The authors should clarify the scope of Datavzrd’s aggregation capabilities, which currently seem limited to simple aggregations, such as column histograms and heatmaps.

On page 5, the authors state: “Via compression methods and data partitioning strategies, Datavzrd is scalable towards big datasets without overwhelming the browser memory while still maintaining interactive capabilities.” The authors should quantify performance characteristics to improve clarity and reproducibility. Consider including details such as:

Performance benchmarks: e.g., “Interactive operations such as filtering respond in under 300 milliseconds with datasets up to 5 GB in size.”

Test environment: e.g., “Tested on a machine with 32 GB RAM, Intel i7 CPU, running Windows 11.”

Visualization types supported: e.g., “Line chart, heatmap, 3D surface plot.”

Resource usage: e.g., “CPU and RAM utilization during peak operations.”

Avoid vague claims such as “handles big files efficiently” without accompanying empirical evidence.

Additional Comments:

Page 4:

Figure 2 is too small to be legible. The authors should enlarge the figure to ensure that the visualizations are clearly discernible.

Page 4:

The manuscript states:

“Datavzrd provides a publish subcommand, which automates the upload of reports to a newly created or existing GitHub repository and enables hosting via GitHub Pages with a single click.”

The phrase "with a single click" is misleading. A more accurate description would be:

“...and enables hosting via GitHub Pages using a simple command-line invocation.”

The process requires executing a command in the following format:

datavzrd publish --repo-name <repo_name> --report-path <report_path> [--org <organization>]

This is not a one-click operation and should be described as such.

Page 10:

The authors note:

“Datavzrd stores data in JavaScript files, which are then loaded via static script tags.”

This is an important implementation detail. However, the authors should also acknowledge that when data updates are needed, the visualizations must be regenerated using the Datavzrd command line. This introduces some operational overhead, and its mention would provide a more balanced and transparent evaluation.

Page 14:

The statement:

“For the first time, this enables the collaborators and readers to interactively explore the data underlying the findings…” is overly assertive. Without rigorous comparative analysis, claims of precedence should be avoided.

Evaluation of the Linked Resources:

Live Demo

https://datavzrd.github.io/example-report-moscot/Suppl.%20Tab.%2013/index_1.html

The Birth date field is interpreted as a string, preventing effective range-based searches. Additionally, the column histogram is not time-ordered. These are areas where usability could be improved.

Demonstration Table

https://datavzrd.github.io/example-report-moscot/Suppl.%20Tab.%207/index_1.html

In the third field, after initiating a search, the data range in the histogram appears confusing. This could hinder interpretability and should be refined.

GitPod Tutorial

The tutorial is well-structured and user-friendly. However, a bug exists: after executing the show-report-url command and clicking the generated link, the hide button on top of each field does not actually hide the corresponding field in the report. This issue should be addressed.

GitHub Live Preview

https://datavzrd.github.io/datavzrd/index.html

The font size in the Markdown file is inconsistent and should be normalized for a professional appearance.

The hide button on top of each field does not actually hide the corresponding field in the following links:

https://datavzrd.github.io/datavzrd/oscars/index_1.html

https://datavzrd.github.io/datavzrd/movies/index_1.html

Additionally, data visualizations are missing in the following links:

https://datavzrd.github.io/datavzrd/movies-plot/index_1.html

https://datavzrd.github.io/datavzrd/oscar-plot/index_1.html

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Xiaorong Cao

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Jul 22;20(7):e0323079. doi: 10.1371/journal.pone.0323079.r002

Author response to Decision Letter 1


27 Jun 2025

We thank both reviewers for the comprehensive and positive reviews. In the following, we answer each comment in detail.

Reviewer 1

General comments The paper presents a framework called Datavzrd that allows easy visualization of tabular data via HTML pages. It does not report substantial research findings, but presents a new research method which could be interesting for many disciplines dealing with tabular data. The paper is written in comprehensible language and presented in an appealing style. It claims many advantages of their tool over existing alternatives, but misses evidence for these claims.

Response We thank the reviewer for the constructive summary and for recognizing the potential interdisciplinary relevance and clarity of the manuscript. We have reworked various areas of the manuscript to better and more clearly support the claimed advantages. See our responses to specific comments below.

Comment 1 The authors claim that their tool allows more efficient visualizations with less code than existing alternatives, in particular Shiny. This is argued in text, but not substantiated by examples. An example workflow for a simple data table should be included, which puts the Datavzrd approach in contrast to Shiny (similar to the supplementary material plus brief explanations).

Response

We have extended the comparison in the supplement with explanations on the usage of both Shiny and Datavzrd. Additionally we have added another new supplementary section that further explains the comparison of the Shiny and Datavzrd configuration example. The supplement highlights that the Datavzrd approach requires about half of the lines compared to Shiny (28 vs. 49), involves no programming, and avoids data manipulation or plotting libraries. Instead, the full configuration is expressed in a short, human-readable YAML file. This addition should further substantiate our claim that Datavzrd enables more concise and accessible visualizations compared to imperative frameworks like Shiny.

Comment 2 In Sec. 1 Excel and the xlsx format are mentioned. It is not clear, however, which other – esp. non-proprietary – file formats are supported and how the tool is applied to these.

Response We have clarified the supported file formats in the revised manuscript. In short, Datavzrd supports CSV (and TSV) or Apache Parquet (see footnote 1) as input formats, both being non-proprietary. For exporting tabular views from the browser display, Datavzrd supports CSV and Excel.

Comment 3 Furthermore it remains unclear how the Datavzrd definition of visualizations interacts with the dataset. There are very few code snippets (e.g. Fig. 4) which, however, do not show how visualizations are defined. An example workflow as suggested in comment 1 could help.

Response We have unified the examples of column configurations into a single figure and extended them with a bar and pill plot definition. Furthermore, we expanded the supplement with an example of a custom plot definition that uses Vega-Lite syntax. In short, the user specifies a Datavzrd configuration file. When that file is applied to one or more tables (by invoking Datavzrd as a command line tool), Datavzrd uses the specified visualizations to render the tables into self-contained interactive HTML, which can then be sent around, stored in a cloud service, attached as a supplement to a manuscript, or hosted via static web servers (including e.g. GitHub Pages). To clarify the core workflow, we expanded the manuscript with an explanation of how configuration and dataset interact:

Using simple YAML-based declarative specifications, users define datasets along with the desired visualizations for each column. The self-contained report is then generated by executing Datavzrd with the command datavzrd path/to/config.yaml -o path/to/output.
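As an illustration of this workflow, the following Python sketch writes a tiny table and a matching configuration file and assembles the command quoted above. The command form `datavzrd <config> -o <output>` is taken from the text; the configuration keys (`datasets`, `views`, `render-table`, `columns`, `plot`, `ticks`, `scale`) reflect our understanding of the Datavzrd configuration format and should be checked against the official documentation. The file names, the example data, and the helper functions are hypothetical.

```python
from pathlib import Path

# Assumed configuration layout: one dataset and one view that renders the
# "score" column as a tick plot. Verify the keys against the Datavzrd docs.
CONFIG = """\
datasets:
  scores:
    path: scores.csv
views:
  scores:
    dataset: scores
    render-table:
      columns:
        score:
          plot:
            ticks:
              scale: linear
"""


def write_example(workdir):
    """Write a small CSV table and the matching config; return the config path."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    (workdir / "scores.csv").write_text("sample,score\nA,0.2\nB,0.9\n")
    config_path = workdir / "config.yaml"
    config_path.write_text(CONFIG)
    return config_path


def datavzrd_command(config_path, output_dir):
    """Build the command line quoted in the text: datavzrd <config> -o <output>."""
    return ["datavzrd", str(config_path), "-o", str(output_dir)]


command = datavzrd_command("config.yaml", "report")
```

With Datavzrd installed, the command could be run from inside the example directory, for instance via `subprocess.run(command, check=True, cwd=workdir)`, so that the relative dataset path resolves as intended.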

1 https://parquet.apache.org/

Comment 4 In the related works section of the introduction, D3, Plotly, and Dash could be mentioned as further widely used tools (also requiring lots of code, though).

Response We thank the reviewer for the suggestion. In response, we have added a sentence to the related works section of the introduction mentioning Dash, D3.js, and Plotly, and positioning them in the broader spectrum of solutions:

Similar to Shiny, Dash allows to create interactive dashboards with Python code, but also requiring a running server process. [...] Of course, charting libraries such as D3.js or Plotly enable the creation of interactive visualizations, but using them within tables requires substantial bespoke coding and web-development effort.

Furthermore, we have clarified the position of Vega-Lite, Vega and D3.js within Datavzrd's ecosystem in the interactivity section:

Datavzrd relies on Vega-Lite as its underlying engine for generating a wide variety of plots. Internally, Vega-Lite specifications compile to Vega, which itself builds on D3.js for rendering. Datavzrd provides an even higher level of abstraction with its available pre-made plot configurations for ticks, bars, pills and heatmaps. When specified for a column (cf. Figure 9), these visualizations are automatically applied and configured to every cell within that column.

Comment 5 Figure 2 is too small and thus unreadable. It should be rotated by 90 degrees to fill a whole page.

Response We have rotated the figure accordingly to improve readability.

Comment 6 Code and visualization in Fig. 4 do not match. Furthermore, the very small code snippet does not explain what a “spell” is and how the values true and false are translated in visual representations.

Response We fully agree with the reviewer. In response, we have switched Figs. 4 and 5 to better align the code with the corresponding visualization. We also improved the explanation of how the spell definition on the left corresponds to the rendered report example on the right.

Comment 7 Several references are missing bibliographical information such as journal/conference, page numbers etc., such as [14], [16], [17], [22]. Furthermore, journal names should be capitalized. More care should be taken when preparing the next version.

Response We thank the reviewer for pointing out the missing details in some of the references. We have carefully reviewed all references in the manuscript and, to the best of our ability, added all available bibliographic information, including journal/conference, DOIs, and page numbers where applicable.

For the specific references mentioned by the reviewer, we have updated the entries to include additional details. The changes are summarized in the list below.

[14] A multi-analytical study of [...]: We have added more information, including the article identifier eadp1917, since Science Advances doesn’t use traditional page numbers.

[16] Sustainable data analysis with Snakemake: We have added the missing DOI and issue number. F1000 does not provide page numbers.

[17] The Not-So-Same-Origin Policy: We have added a missing author as well as a link to the article.

[22] Zenodo: We could not find a better option for citing Zenodo itself. We use the BibTeX entry they provide at the bottom of their “About” page: https://about.zenodo.org.

Reviewer 2

General comments In this paper, the authors introduce Datavzrd, a framework designed for generating server-free, portable, and interactive visual reports from tabular data. The system leverages a declarative specification language to facilitate rapid report configuration, eliminating the need for imperative programming. The paper is generally well-written and is supported by illustrative examples and clear explanations.

Response We thank the reviewer for the positive and encouraging feedback. We are glad the manuscript provided clear explanations and illustrative examples that were helpful in conveying the core ideas and design of Datavzrd.

Comment 1 One notable advantage the authors might emphasize is that client-side JavaScript frameworks reduce server load, as all rendering, interactivity, and event handling occur in the client’s browser. Given the capabilities of modern PCs and smartphones, this approach is both practical and scalable.

Response We thank the reviewer for this helpful suggestion. We have incorporated this point into the manuscript:

Additionally, it also eliminates server maintenance and reduces server load by shifting processing load to the browser (without hampering scalability, see section Scalability).

Comment 2 However, to substantiate the authors’ claim regarding the uniqueness of Datavzrd, it is important to include a more comprehensive comparison with existing client-side JavaScript frameworks. Specifically, a discussion of alternatives such as D3.js, Vega, Plotly.js, and even older technologies like XSLT would strengthen the paper by placing Datavzrd within the context of existing tools.

Response We have extended our discussion of Datavzrd’s functionality in the context of other approaches in the introduction (thereby also adding XSLT, Dash, D3.js, Vega, Plotly.js):

Naturally, any general purpose programming language (like e.g. Python or R) or transformation languages (e.g. XSLT) could be used (also e.g. in comparison with helper libraries like great-tables) to implement entirely custom reporting. [...] Similar to Shiny, Dash allows to create interactive dashboards with Python code, but also requiring a running server process. [...] Of course, charting libraries such as D3.js or Plotly enable the creation of interactive visualizations, but using them within tables requires substantial bespoke coding and web-development effort.

Furthermore, we have clarified the position of Vega-Lite, Vega and D3.js within Datavzrd's ecosystem in the interactivity section:

Datavzrd relies on Vega-Lite as its underlying engine for generating a wide variety of plots. Internally, Vega-Lite specifications compile to Vega, which itself builds on D3.js for rendering. Datavzrd provides an even higher level of abstraction with its available pre-made plot configurations for ticks, bars, pills and heatmaps.

Comment 3 Many server-based reporting tools provide advanced aggregation capabilities such as window functions, multi-level aggregations, predictive analytics, and pivot operations. These typically require multiple data passes and significant memory. The authors should clarify the scope of Datavzrd’s aggregation capabilities, which currently seem limited to simple aggregations, such as column histograms and heatmaps.

Response We clarified the limit of Datavzrd’s aggregation capabilities together with an explanation on why these are not developed further at the moment:

Currently, Datavzrd’s aggregation capabilities are limited to simple, per-column summaries such as histograms. More complex operations like window functions or pivot tables are not supported. We see this as a deliberate design choice: such aggregations are better handled upstream in the analysis workflow, where they can be executed transparently and reproducibly. In future versions, we plan to extend Datavzrd with lightweight interactive features such as scatterplots for user-selected column pairs.

Comment 4 On page 5, the authors state: “Via compression methods and data partitioning strategies, Datavzrd is scalable towards big datasets without overwhelming the browser memory while still maintaining interactive capabilities.” The authors should quantify performance characteristics to improve clarity and reproducibility. Consider including details such as: Performance benchmarks: e.g., “Interactive operations such as filtering respond in under 300 milliseconds with datasets up to 5 GB in size.” Test environment: e.g., “Tested on a machine with 32 GB RAM, Intel i7 CPU, running Windows 11.” Visualization types supported: e.g., “Line chart, heatmap, 3D surface plot.” Resource usage: e.g., “CPU and RAM utilization during peak operations.” Avoid vague claims such as “handles big files efficiently” without accompanying empirical evidence.

Response We conducted additional benchmarking experiments on a large synthetic dataset and added the resulting quantitative performance characteristics for filter operation times and the corresponding hardware and software specifications to the manuscript.

For a synthetic dataset with 100,000 rows and 30 columns (including nominal, integer, and string-valued columns), interactive filtering operations consistently complete in under 100 ms. Measurements were taken using the browser’s performance.now() API in Safari 18.5 on macOS 13.7.6 with a 3.8 GHz Quad-Core Intel Core i5 and 48 GB of RAM.
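A minimal sketch of how such a timing could be taken with `performance.now()`, assuming a plain in-memory array of row objects. The synthetic generator, the column names, and the filter predicate are hypothetical; this is not Datavzrd's internal filtering code and is not expected to reproduce the reported numbers.

```typescript
// Hypothetical synthetic table mixing nominal, integer and string columns.
type Cell = string | number;
type TableRow = Record<string, Cell>;

function makeSyntheticTable(nRows: number, nCols: number): TableRow[] {
  const table: TableRow[] = [];
  for (let i = 0; i < nRows; i++) {
    const row: TableRow = {};
    for (let j = 0; j < nCols; j++) {
      const name = `col${j}`;
      if (j % 3 === 0) {
        row[name] = "ABCD".charAt((i + j) % 4); // nominal category
      } else if (j % 3 === 1) {
        row[name] = (i * 31 + j) % 1000; // integer value
      } else {
        row[name] = `value-${i}-${j}`; // free-form string
      }
    }
    table.push(row);
  }
  return table;
}

// Time a single filter pass using the high-resolution performance.now() clock.
function timeFilter(
  table: TableRow[],
  predicate: (row: TableRow) => boolean
): { matches: number; elapsedMs: number } {
  const start = performance.now();
  const matches = table.filter(predicate).length;
  const elapsedMs = performance.now() - start;
  return { matches, elapsedMs };
}

// Same dimensions as the synthetic dataset described above.
const syntheticTable = makeSyntheticTable(100000, 30);
const timing = timeFilter(
  syntheticTable,
  (row) => row["col0"] === "A" && Number(row["col1"]) < 500
);
console.log(`${timing.matches} matching rows in ${timing.elapsedMs.toFixed(1)} ms`);
```

A single run is noisy; repeating the measurement and reporting a median would give a more stable figure.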

Regarding supported visualization types, we clarified that, in addition to Datavzrd’s predefined plot types (see Section “Feature overview” incl. new figure), users can embed fully custom Vega-Lite specifications. This enables the creation of a wide range of visualization types supported by Vega-Lite, including scatter plots, maps, faceted views, and more.

Through this mechanism, users can harness the full expressiveness of Vega-Lite, which supports a wide range of visualizations such as scatter plots, line and area charts, bar charts, heatmaps, geographic maps and faceted layouts (see https://vega.github.io/vega-lite/examples/ for the extensive set of supported types).
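For illustration, a minimal Vega-Lite scatter plot specification is sketched below as a TypeScript object literal that serializes to the JSON that Vega-Lite expects. The field names `x_value` and `y_value` and the inline data are hypothetical, and how such a spec is referenced from a Datavzrd configuration is not shown here.

```typescript
// Minimal Vega-Lite scatter plot spec; the data and field names are made up.
const scatterSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  data: {
    values: [
      { x_value: 1, y_value: 2.5 },
      { x_value: 2, y_value: 3.1 },
      { x_value: 3, y_value: 1.8 },
    ],
  },
  mark: "point",
  encoding: {
    x: { field: "x_value", type: "quantitative" },
    y: { field: "y_value", type: "quantitative" },
  },
};

// Vega-Lite specs are plain JSON, so the object can be serialized directly.
const scatterSpecJson = JSON.stringify(scatterSpec, null, 2);
console.log(scatterSpecJson);
```

Changing `mark` (for example to `"line"` or `"bar"`) and the `encoding` channels is how the other chart types listed above are expressed in Vega-Lite.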

Comment 5 Figure 2 is too small to be legible. The authors should enlarge the figure to ensure that the visualizations are clearly discernible.

Response We have rotated the figure to increase the size and readability.

Comment 6 The manuscript states: “Datavzrd provides a publish subcommand, which automates the upload of reports to a newly created or existing GitHub repository and enables hosting via GitHub Pages with a single click.” The phrase “with a single click” is misleading. A more accurate description would be: “...and enables hosting via GitHub Pages using a simple command-line invocation.” The process requires executing a command in the following format: datavzrd publish --repo-name --report-path [--org ]. This is not a one-click operation and should be described as such.

Response We agree that this was poorly phrased and misleading. The click is needed afterwards to enable GitHub Pages for the repository, since this is not possible via the GitHub CLI. We have clarified the corresponding section of the manuscript accordingly:

For enhanced accessibility and ease of hosting, Datavzrd provides a publish subcommand, which automates the upload of reports to a newly created or existing GitHub repository using a simple command-line invocation. Afterwards, it provides the user with a link to the repository settings, where GitHub Pages needs to be enabled manually (only once, upon initial publication of the repository).
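The invocation can be sketched as an argument list built from the flags quoted in the reviewer's comment (`--repo-name`, `--report-path`, optional `--org`). The `buildPublishArgs` helper and the example values are hypothetical, and the sketch only assembles the command line without running it.

```typescript
// Options mirroring the flags quoted above; the values are supplied by the caller.
interface PublishOptions {
  repoName: string;
  reportPath: string;
  org?: string; // optional, as indicated by the brackets in the quoted command
}

// Assemble the argument list for the publish subcommand (the command is not executed).
function buildPublishArgs(opts: PublishOptions): string[] {
  const args = [
    "datavzrd",
    "publish",
    "--repo-name",
    opts.repoName,
    "--report-path",
    opts.reportPath,
  ];
  if (opts.org !== undefined) {
    args.push("--org", opts.org);
  }
  return args;
}

// Hypothetical example values.
const publishCommand = buildPublishArgs({
  repoName: "my-report",
  reportPath: "results/report",
}).join(" ");
console.log(publishCommand);
```

Enabling GitHub Pages in the repository settings remains a separate manual step after the upload.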

Comment 7 The authors note: “Datavzrd stores data in JavaScript files, which are then loaded via static script tags.” This is an important implementation detail. However, the authors should also acknowledge that when data updates are needed, the visualizations must be regenerated using the Datavzrd command line. This introduces some operational overhead, and its mention would provide a more balanced and transparent evaluation.

Response We have clarified in the manuscript that updating the input dataset requires regenerating the report:

This approach requires regenerating the report whenever the input dataset changes; however, because data and configuration are completely separated, no updates to the latter are needed as long as the structure of the data remains unchanged. We recommend embedding the invocation of Datavzrd into automated workflows (e.g., using Snakemake, Nextflow, or CWL) to facilitate reproducible report generation after data updates.

Comment 8 The statement: “For the first time, this enables the collaborators and readers to interactively explore the data underlying the findings. . . ” is overly assertive. Without rigorous comparative analysis, claims of precedence should be avoided.

Response We agree and have removed the phrase “For the first time”.

Comment 9 Live Demo: https://datavzrd.github.io/example-report-moscot/Suppl.%20Tab.%2013/index_1.html The Birth date field is interpreted as a string, preventing effective range-based searches. Additionally

Attachment

Submitted filename: Response_to_reviewers_Datavzrd.pdf

pone.0323079.s003.pdf (179.5KB, pdf)

Decision Letter 1

Vivek Kumar

8 Jul 2025

Datavzrd: Rapid programming- and maintenance-free interactive visualization and communication of tabular data

PONE-D-25-17640R1

Dear Dr. Wiegand,

Thank you for submitting the revised version of your manuscript, "Datavzrd: Rapid programming- and maintenance-free interactive visualization and communication of tabular data" to PLOS One. I have carefully reviewed the revisions alongside the reviewers' comments and am pleased to see that you have addressed the feedback thoroughly and thoughtfully. In light of the improvements made and the overall strength of the manuscript, I do not believe a second round of peer review is necessary.

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Vivek Kumar, Ph. D.

Academic Editor

PLOS ONE

Acceptance letter

Vivek Kumar

PONE-D-25-17640R1

PLOS ONE

Dear Dr. Wiegand,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Vivek Kumar

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Comparison of Shiny and Datavzrd for Interactive Table Creation.

    We illustrate how Datavzrd supports rich, per-cell visualizations using Vega-Lite and compare its concise, declarative configuration against a minimal Shiny implementation. Example configurations, rendered outputs, and setup instructions are provided for both tools.

    (PDF)

    pone.0323079.s001.pdf (1.3MB, pdf)
    S2 File. Example report for Fig 2.

    Interactive Datavzrd report showcasing genomic variants with associated scores and predictions in a molecular tumor board context. The dataset has been de-identified by altering gene names and coordinates; see the original workflow at https://github.com/snakemake-workflows/dna-seq-varlociraptor and explore the interactive report at https://datavzrd.github.io/example-molecular-tumor-board.

    (ZIP)

    pone.0323079.s002.zip (1.3MB, zip)

    Data Availability Statement

    Code is available under https://github.com/datavzrd/datavzrd.


    Articles from PLOS One are provided here courtesy of PLOS