F1000Research. 2020 Mar 19;8:ISCB Comm J-1750. Originally published 2019 Oct 14. [Version 2] doi: 10.12688/f1000research.20590.2

Interactive Clustered Heat Map Builder: An easy web-based tool for creating sophisticated clustered heat maps

Michael C Ryan 1, Mark Stucky 1, Chris Wakefield 2, James M Melott 2, Rehan Akbani 2, John N Weinstein 2,3,#, Bradley M Broom 2,a,#
PMCID: PMC7111501  PMID: 32269754

Version Changes

Revised. Amendments from Version 1

This version of the article has been revised to address the comments and questions of reviewers and to update the manuscript to reflect a few updates in the latest release of the Interactive CHM Builder.

Abstract

Clustered heat maps are the most frequently used graphics for visualization and interpretation of genome-scale molecular profiling data in biology. Construction of a heat map generally requires the assistance of a biostatistician or bioinformatics analyst capable of working in R or a similar programming language to transform the study data, perform hierarchical clustering, and generate the heat map. Our web-based Interactive Heat Map Builder can be used by investigators with no bioinformatics experience to generate high-caliber, publication-quality maps. Preparation of the data and construction of a heat map is rarely a simple linear process. Our tool allows a user to move back and forth iteratively through the various stages of map generation to try different options and approaches. Finally, the heat map the builder creates is available in several forms, including an interactive Next-Generation Clustered Heat Map that can be explored dynamically to investigate the results more fully.

Keywords: Bioinformatics, Genomics, Heat Map, Web Tool, Website, Hierarchical Clustering

Introduction

Many thousands of publications on genomics studies include clustered heat maps (CHMs) because the hierarchical clustering and intuitive visualization provide insight into the relationships among sample sub-groups and key biological processes 1–8. Construction of a CHM requires data transformation, application of clustering methods, association of covariate (classification) data, and production of the heat map visualization. Generally, those tasks require the assistance of an analyst with biostatistics or bioinformatics skills who can work in R or a similar language to manipulate the study data and generate the map. This is usually not a simple linear process because data transformation and clustering methods are often revisited to find the ideal match for the study, and modifications are often made to heat map visualizations to select the best colors, adjust covariates, insert gaps, etc. Our Interactive CHM Builder is a web-based tool for data transformation, clustering, and generation of high-quality heat maps. It can be used by investigators with no bioinformatics experience and only modest exposure to biostatistical methods. The tool guides users through the steps of creating a heat map and supports iterative refinement of the map by working backward and forward through the steps to refine data transformation, annotation, clustering, and formatting options. (Caveat: Iterative exploration of different options may introduce a multiple-comparisons issue that would have to be taken into account if the map were used for formal statistical inference, rather than discovery.)

One obvious limitation of traditional heat maps is that they contain a huge amount of information but are static in nature and do not readily support a deeper exploration of the biology behind the image. The Interactive CHM Builder produces traditional heat map images as PDF files but can also produce interactive next-generation CHMs (NG-CHMs). NG-CHMs support interactive exploration of patterns in the data through zooming, panning, searching, and advanced link-outs to dozens of external resources. An NG-CHM file can be downloaded and viewed locally with the NG-CHM viewer and, importantly, can be embedded in a study results webpage or publication.

The Interactive CHM Builder 9, available at https://build.ngchm.net/NGCHM-web-builder/, is easy to try out using sample data provided at the site. Other methods of producing NG-CHMs, including an R library and a set of tools for the Galaxy platform 10, 11, are described at https://www.ngchm.net/.

Methods

Implementation

The Interactive Builder 9 is a web-based application that accepts an uploaded data matrix and then walks the user through several steps to transform the data, perform hierarchical clustering, and format the resulting CHM. The application is implemented as HTML, CSS, and JavaScript on the browser side and Java servlets on the web server. Data manipulation and heat map generation are implemented in Java classes used by the servlets. The clustering is performed by a servlet using the Renjin engine (https://www.renjin.org) to run R clustering functions in Java. Browser sessions are tracked by the server to create a working area for each user and prevent users from seeing each other’s data or maps. In addition to the working version of the data matrix on which transformations are performed, an original version of the matrix is preserved. Returning to a previous matrix state is accomplished by restoring the original version and then re-applying transformations until the requested state is restored. The site retains constructed heat maps and the related uploaded data only for the duration of the HTTP session.
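
The restore-and-replay approach to returning to an earlier matrix state can be illustrated with a minimal Python sketch. It is not the Builder's Java code: the class name MatrixHistory, its methods, and the two toy transforms are hypothetical and exist only to show the idea of keeping the original matrix untouched and rebuilding any earlier state by re-applying the recorded transformations in order.

```python
import math
from typing import Callable, List, Optional

Matrix = List[List[Optional[float]]]
Transform = Callable[[Matrix], Matrix]


class MatrixHistory:
    """Hypothetical helper: keeps the original matrix plus an ordered transform list."""

    def __init__(self, original: Matrix) -> None:
        # Copy row by row so later work cannot alter the preserved original.
        self._original = [row[:] for row in original]
        self._steps: List[Transform] = []

    def apply(self, step: Transform) -> Matrix:
        """Record a transform and return the resulting working matrix."""
        self._steps.append(step)
        return self.state(len(self._steps))

    def state(self, n_steps: int) -> Matrix:
        """Rebuild the matrix as it was after the first n_steps transforms."""
        matrix = [row[:] for row in self._original]
        for step in self._steps[:n_steps]:
            matrix = step(matrix)
        return matrix

    def reset_to(self, n_steps: int) -> Matrix:
        """Discard transforms after n_steps and return that earlier state."""
        self._steps = self._steps[:n_steps]
        return self.state(n_steps)


def log10_transform(matrix: Matrix) -> Matrix:
    # Missing (None) and non-positive values stay or become missing.
    return [[math.log10(v) if v is not None and v > 0 else None for v in row]
            for row in matrix]


def add_one(matrix: Matrix) -> Matrix:
    return [[v + 1 if v is not None else None for v in row] for row in matrix]


history = MatrixHistory([[1.0, 10.0], [100.0, None]])
history.apply(log10_transform)
history.apply(add_one)
restored = history.reset_to(1)  # back to the state after the log transform only
```

Replaying from the original costs more than storing a snapshot per step, but only one extra copy of the matrix has to be kept however many transformations are tried.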

A Java NG-CHM heat map generator .jar file is used to construct the heat map repeatedly as options are selected in each step of the builder. The heatmapProperties.json file, which contains all options selected by the user, conveys the selected options to the generator. The current NG-CHM file set is stored in a directory under the session ID. The NG-CHM file is a zipped version of the NG-CHM directory. The downloaded .ngchm file can be saved locally and viewed interactively using a local instance of the NG-CHM viewer that can also be downloaded from the builder site. An overview is given in Figure 1.
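
The hand-off described above (options written to a JSON properties file, then the map directory zipped into a single downloadable file) can be sketched in Python. This is an illustration under assumptions, not the Builder's implementation: the option keys, the session and map names, and the internal layout of the archive are hypothetical, since the article does not describe the actual heatmapProperties.json schema or the contents of an .ngchm file.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def write_properties(map_dir: Path, options: dict) -> Path:
    """Write the selected options to a JSON file inside the map directory."""
    map_dir.mkdir(parents=True, exist_ok=True)
    props_path = map_dir / "heatmapProperties.json"
    props_path.write_text(json.dumps(options, indent=2), encoding="utf-8")
    return props_path


def zip_map_directory(map_dir: Path, archive_path: Path) -> Path:
    """Zip every file under map_dir into one archive, keeping the folder name."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(map_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(map_dir.parent).as_posix())
    return archive_path


# Hypothetical layout: <work area>/<session id>/<map name>/
work_area = Path(tempfile.mkdtemp())
map_dir = work_area / "session-1234" / "bladder_map"
options = {"chm_name": "bladder_map", "row_order": "Hierarchical"}  # illustrative keys
write_properties(map_dir, options)
archive = zip_map_directory(map_dir, work_area / "bladder_map.ngchm")
```

Keeping all options in one file means the generator can be re-run after every change in the builder without tracking which individual setting was modified.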

Figure 1. High-level overview of the interaction of heat map builder components.


Heat maps are built on a webserver. A browser session ID is used to create a separate, temporary working area for each user. Heat map construction sessions are cleaned up when the session is ended, but PDF and NG-CHM heat map files can be downloaded.

The full source code for the Interactive Builder is available on GitHub (https://github.com/MD-Anderson-Bioinformatics/NG-CHM_GUI_BUILDER).

Operation

There is no need to install software to use the Interactive Builder 9; it is available for public use on our server at https://build.ngchm.net/NGCHM-web-builder/. If, however, a local private installation of the Interactive Builder is preferred, there are two simple installation methods.

Organizations familiar with Docker (https://docs.docker.com/) can run the Builder as a Docker container. To do this, clone the git repository. The base folder of the repository contains a Docker build file. Run the docker build command in this directory with the -t option to name the resulting Docker image. For example: docker build . -t ngchm_builder. Then use the docker run command to start a container from the image. The heat maps created by the software are transient and last only for the duration of a user's HTTP session, so there is no need to mount an external directory to the container for persistent storage. The port for connecting to the web server in the container does need to be specified in the docker run command: map the desired external port to the port of the Tomcat instance in the container. For example: docker run --name="ngchm_builder" -d -p 8888:80 ngchm_builder. Users should then be able to connect to the Interactive Builder with their browser, using the address of the Docker host and the external port chosen. For example, http://<docker machine IP or URL>:8888/NGCHM-web-builder.

The other option for deploying the software is to install it on an existing web server such as Tomcat (https://tomcat.apache.org/tomcat-9.0-doc). To do this, first clone the git repository and then use the ant script, ant_buildfile.xml, in the NG-CHM_GUI_BUILDER folder to create a .war file. Then simply copy the .war file to the webapps directory of the web server. The application should then be available at http://<server URL>/NGCHM-web-builder.

Use case

The starting point for a CHM is a matrix of data. In this use-case example, we focus on gene expression data from The Cancer Genome Atlas (TCGA) bladder cancer project 12, 13. The rows and columns of the matrix require identifiers, in this case sample ids and gene symbols, and the cells of the matrix must be numeric values. The builder will accept either a tab-delimited text file (*.txt), comma-separated text file (*.csv) or Excel spreadsheet (*.xlsx).
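
The expected input layout (a labeled matrix with numeric cells) can be checked before upload with a few lines of Python. The sketch below is an assumption about how such a file might look, with gene symbols in the first column and sample ids in the first row of a tab-delimited file. The function name read_matrix and the sample identifiers S1 and S2 are hypothetical, and the Builder itself may treat blank or non-numeric cells differently.

```python
import csv
import io
from typing import List, Optional, Tuple


def read_matrix(text: str, delimiter: str = "\t"
                ) -> Tuple[List[str], List[str], List[List[Optional[float]]]]:
    """Parse a labeled matrix: first row = column labels, first column = row labels."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    col_labels = rows[0][1:]
    row_labels: List[str] = []
    data: List[List[Optional[float]]] = []
    for record in rows[1:]:
        values: List[Optional[float]] = []
        for cell in record[1:]:
            try:
                values.append(float(cell))
            except ValueError:
                values.append(None)  # blank or non-numeric cell, treated as missing
        if len(values) != len(col_labels):
            raise ValueError(f"row {record[0]!r} has {len(values)} values, "
                             f"expected {len(col_labels)}")
        row_labels.append(record[0])
        data.append(values)
    return row_labels, col_labels, data


sample = "gene\tS1\tS2\nTP53\t5.1\t7.3\nFGFR3\tNA\t2.0\n"
row_labels, col_labels, data = read_matrix(sample)
```

For a comma-separated file, the same function can be called with delimiter=",".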

Select matrix

The Open Matrix File button on the first page of the builder (Figure 2) is used to upload the data matrix. A name and optional description to be associated with the heat map are entered. When the data have been loaded, the Select Matrix page will show the first few rows and columns of the matrix. It is important that the builder correctly identify the row labels, column labels, and matrix data; the backgrounds of labels and matrix data should be blue and green, respectively. If the input file has extra rows or columns, you may need to correct the identification of labels and matrix data by selecting the appropriate radio button and then clicking on the correct location in the matrix displayed.

Figure 2. Heat map creation starts with importing a text matrix file (e.g., *.txt, *.csv or Excel *.xlsx file) and identifying the row labels, column labels and numerical data values.


Note that several screens in the builder include advanced features that are hidden by default to simplify the process for first-time users. The use-case example here does not require advanced features, but be aware that additional capabilities can be accessed using the Advanced Features checkbox.

Transform/filter the data

Creating a good heat map depends on proper data preparation. The second step in the build process is the Data Transform page (Figure 3), which provides three primary categories of matrix transformations: functions that identify and replace missing/invalid values, filters to remove rows or columns, and transforms to perform mathematical operations on data values. There are additional choices in advanced mode for transposing the matrix and calculating correlations.

Figure 3. The data transform page makes it easy to perform operations on the matrix like log transformation or filtering to reduce and normalize data.


The right-hand panel of the Transform page provides summary statistics about the data matrix, including the number of rows and columns, a histogram of the data distribution, and an indication of the number of invalid cells in the matrix. The top of the page also provides suggestions about transformations that can be performed and flags any problems with the data. The use-case matrix is too large for the Interactive Builder to use in creating a heat map interactively; the clustering time, which increases approximately as the square of the larger matrix dimension for most clustering algorithms, is limiting. Currently, the website limits the heat map to no more than 5,000 total rows and columns (for example 1,000 samples and 4,000 genes) at the clustering stage. However, users can upload much larger matrices as long as filters on the transform page reduce the size to 5,000. For practical purposes, that often means extracting the most relevant data (e.g., with few enough missing values, sufficient signal, and sufficient standard deviation across samples) for clustering. We are also progressively increasing the size limit as compute power and clustering algorithms advance.

For this use case the transform tab is used to fix duplicate column headers; set a minimum threshold to reduce the influence of noise in the heat map; normalize the data with a log transform and mean center; and filter to remove rows with many missing values and to keep only rows with strong variation across samples. The transforms applied were:

  • Action: Duplicates Duplicates process: Rename. Column. Suffix duplicates with underscore and instance number. Apply.

  • Action: Transform Data Transform: Threshold. Set Values Below 0.00001 to NA. Apply.

  • Action: Transform Data Transform: Logarithmic. Log Base 10. Apply.

  • Action: Transform Data Transform: Mean Center Row. Apply.

  • Action: Filter Data Filter: Missing Data Row. Remove if > 50% Missing Values. Apply.

  • Action: Filter Data Filter: Standard Deviation Row. Keep 500 rows with highest Standard Deviation. Apply.
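
For readers who want to see what these steps do to the numbers, the following Python sketch applies the same sequence (threshold, log base 10, row mean centering, missing-value filter, standard-deviation filter) to a toy matrix. It is an illustration, not the Builder's Java code: the function names and the toy gene labels are hypothetical, the duplicate-renaming step is omitted, and the use of the sample standard deviation is an assumption about how rows are ranked.

```python
import math
import statistics
from typing import Dict, List, Optional

Row = List[Optional[float]]
Matrix = Dict[str, Row]  # row label (gene symbol) -> values across samples


def threshold_to_missing(matrix: Matrix, minimum: float) -> Matrix:
    """Threshold: set values below `minimum` to missing (None)."""
    return {label: [v if v is not None and v >= minimum else None for v in row]
            for label, row in matrix.items()}


def log10_transform(matrix: Matrix) -> Matrix:
    """Logarithmic, base 10. Non-positive values have no logarithm and become missing."""
    return {label: [math.log10(v) if v is not None and v > 0 else None for v in row]
            for label, row in matrix.items()}


def mean_center_rows(matrix: Matrix) -> Matrix:
    """Mean Center Row: subtract the mean of each row's non-missing values."""
    centered = {}
    for label, row in matrix.items():
        present = [v for v in row if v is not None]
        mean = statistics.mean(present) if present else 0.0
        centered[label] = [v - mean if v is not None else None for v in row]
    return centered


def drop_sparse_rows(matrix: Matrix, max_missing: float = 0.5) -> Matrix:
    """Missing Data Row: remove rows with more than `max_missing` of values missing."""
    return {label: row for label, row in matrix.items()
            if sum(v is None for v in row) / len(row) <= max_missing}


def row_sd(row: Row) -> float:
    present = [v for v in row if v is not None]
    return statistics.stdev(present) if len(present) > 1 else 0.0


def keep_most_variable(matrix: Matrix, n_keep: int) -> Matrix:
    """Standard Deviation Row: keep the n_keep rows with the highest SD."""
    ranked = sorted(matrix.items(), key=lambda item: row_sd(item[1]), reverse=True)
    return dict(ranked[:n_keep])


toy = {
    "GENE_A": [1.0, 10.0, 100.0, 1000.0],
    "GENE_B": [10.0, 10.0, 10.0, 10.0],   # no variation across samples
    "GENE_C": [0.0, 0.0, 0.0, 5.0],       # mostly below the threshold
    "GENE_D": [1.0, 100.0, 1.0, 100.0],
}
step = threshold_to_missing(toy, 0.00001)
step = log10_transform(step)
step = mean_center_rows(step)
step = drop_sparse_rows(step, 0.5)
result = keep_most_variable(step, 2)  # the use case keeps 500 rows; 2 here for brevity
```

In the toy data, GENE_C is removed by the missing-value filter and GENE_B by the standard-deviation filter, which mirrors how the use case reduces a large expression matrix to its most informative rows.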

After applying the transformations, the matrix contains no errors and should be suitable for heat map generation (Figure 4). Note that the left-hand panel shows the history of transformations performed on the matrix, and one can ‘undo’ back to any previous state of the matrix (including the original version) by selecting the desired previous state and clicking Reset. More generally, the entire process of creating a heat map is iterative; the Next and Previous buttons can be used to return to previous steps to try different options. If, after generating the heat map, it appears that there should be more or fewer rows or different transforms, one can return to the pertinent screen and use the history and the Reset option to adjust the data matrix. Finally, as an added feature, the Transform screen enables the user to download the filtered, transformed matrix for use in other analyses.

Figure 4. The transformed dataset has a better distribution and size for heat map generation than did the original.


The history of transformations in the left-hand panel can be used to undo changes and revert to previous matrix states.

Clustering

The next step is clustering (Figure 5). The row order and column order drop-down menus can be used to select the clustering algorithm and distance measure to be applied to the rows and/or columns. Ward’s algorithm with the Euclidean distance metric is one common choice, but the menus include many other possibilities, appropriate for different purposes and data characteristics. For the sample case, the Ward/Euclidean options provide strong separation in the dendrogram and interesting groups of samples. The menus also allow the rows and columns to be left in original order or randomized. Additional options will be provided in the future.
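
To make the Ward/Euclidean choice concrete, here is a deliberately naive Python sketch of agglomerative clustering with Ward's criterion: at each step it merges the two clusters whose union least increases the within-cluster sum of squared Euclidean distances. It is a teaching example with hypothetical names and toy points, not the R routine that the Builder calls through Renjin, and it is far slower than library implementations.

```python
from itertools import combinations
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    return [sum(axis) / len(points) for axis in zip(*points)]


def ward_cost(a: List[Point], b: List[Point]) -> float:
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clusters(points: List[Point], n_clusters: int) -> List[List[int]]:
    """Merge the cheapest pair repeatedly until n_clusters remain.

    Clusters are returned as lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost([points[k] for k in clusters[pair[0]]],
                                       [points[k] for k in clusters[pair[1]]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Five toy samples in two dimensions: three near the origin, two near (5, 5).
samples = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2)]
groups = sorted(sorted(c) for c in ward_clusters(samples, 2))
```

Recording the order and cost of the merges, rather than stopping at a fixed number of clusters, is what yields the dendrogram drawn beside the heat map.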

Figure 5. The clustering step supports many different clustering methods and distance measures.


The Apply button performs clustering and displays the resulting dendrograms.

Please be aware that clustering of larger matrices may take a few minutes to complete. (The time it takes to cluster data increases approximately as the square of the number of rows or number of columns, whichever is larger.)
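
The quadratic growth follows from the number of pairwise distances that hierarchical clustering has to consider: n items give n(n-1)/2 distinct pairs. The short Python calculation below illustrates this; the function name is hypothetical and the sizes are arbitrary examples.

```python
def pairwise_distances(n_items: int) -> int:
    """Number of distinct item pairs, n(n-1)/2, in a distance matrix."""
    return n_items * (n_items - 1) // 2


for n in (1000, 2000, 4000):
    print(f"{n:>5} items -> {pairwise_distances(n):>10,} pairwise distances")
```

Doubling the number of rows or columns therefore roughly quadruples the number of distances, which is why filtering to the most informative rows before clustering has such a large effect on run time.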

Covariate bars

The next page allows covariate (classification) bars to be added to the heat map (Figure 6). Covariate bars add descriptive information about the rows or columns of the heat map. A covariate bar file has the same labels as the rows or columns in the matrix and an annotation value. In this use case we will use TCGA clinical data to add age, smoking status, gender, and tumor stage to the heat map. The covariate file contains sample ids and clinical values – one value per line. When a covariate file is added, one must identify it as a row or column covariate and specify whether it contains discrete (categorical) data or continuous values. In this case smoker status, gender, and stage are discrete column covariates, and age is a continuous column covariate.
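
A covariate file of this kind is simple to prepare and check. The Python sketch below assumes a tab-delimited file with one label and one value per line and no header row, which is an assumption about the layout rather than a documented specification. The function names, sample ids, and values are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional


def read_covariate(text: str) -> Dict[str, str]:
    """Read 'label<TAB>value' lines into a dictionary keyed by label."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def align_to_labels(covariate: Dict[str, str],
                    labels: List[str]) -> List[Optional[str]]:
    """Order covariate values to match the matrix labels; None where absent."""
    return [covariate.get(label) for label in labels]


def as_continuous(values: List[Optional[str]]) -> List[Optional[float]]:
    """Convert a continuous covariate to numbers, keeping missing values as None."""
    converted: List[Optional[float]] = []
    for value in values:
        try:
            converted.append(float(value))
        except (TypeError, ValueError):
            converted.append(None)
    return converted


column_labels = ["S1", "S2", "S3"]             # sample ids from the data matrix
age_text = "S1\t61\nS3\t74\n"                  # continuous covariate, S2 missing
smoker_text = "S1\tCurrent\nS2\tNever\nS3\tNever\n"  # discrete covariate

age = as_continuous(align_to_labels(read_covariate(age_text), column_labels))
smoker = align_to_labels(read_covariate(smoker_text), column_labels)
smoker_categories = sorted({v for v in smoker if v is not None})
```

A discrete covariate needs one color per category, whereas a continuous covariate such as age is drawn with a color gradient, which is why the type has to be declared when the file is added.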

Figure 6. The covariate screen allows for the addition of supplemental data that describes the rows or columns of the data.


This screen is also used to change the color of values and ordering of the covariate bars.

After covariate bars have been added, the colors associated with the covariate values can be changed. If the color scheme might be useful for other maps, the palette can be saved to the server using the See Palettes button. Covariates can be reordered on the same screen.

An advanced feature, accessed on the cluster page, is the ability to generate a covariate bar based on the clustering dendrogram. If, for example there are four distinct clusters in the data and one wants to emphasize them in discussion of the heat map, a covariate that identifies the four top clusters based on the four top branches of the dendrogram can be generated.

Another notable advanced feature is the ability to include classification data in the original matrix uploaded in the first step, rather than providing individual covariate files on the covariate page. Choosing advanced features on the first page enables the user to identify covariates as well as labels and data in the uploaded matrix.

Format heat map

The format screen (Figure 7) supports the final step in generation of a heat map, adjustment of its appearance:

Figure 7. The format step is used to make changes to the appearance of the heat map, for example, changing the color scheme or altering the breakpoints associated with the colors.


Many appearance change options are available.

  • Adjustment of colors and break points in the body of the heat map.

  • Formatting of labels.

  • Formatting of the dendrograms.

  • Specification of the data type of the labels for link-outs.

For this use case, several changes were made: (i) a slight adjustment to the break points to emphasize high and low values in the matrix, (ii) identification of row labels as gene symbols, and (iii) identification of column labels as TCGA sample identifiers. Associating the labels with known data types activates available type-specific link-outs to external data resources.
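
The effect of moving break points can be seen in a small Python sketch that maps a value to a color by interpolating between the colors assigned to the surrounding break points. Linear interpolation in RGB is one common approach and is an assumption here, since the article does not state how the Builder computes intermediate colors. The function name, break points, and colors are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Color = Tuple[int, int, int]


def color_for(value: float, breakpoints: List[float], colors: List[Color]) -> Color:
    """Interpolate an RGB color between the two break points surrounding value."""
    if value <= breakpoints[0]:
        return colors[0]      # everything below the lowest break point is clipped
    if value >= breakpoints[-1]:
        return colors[-1]     # everything above the highest break point is clipped
    hi = bisect_right(breakpoints, value)
    lo = hi - 1
    t = (value - breakpoints[lo]) / (breakpoints[hi] - breakpoints[lo])
    return tuple(round(a + t * (b - a)) for a, b in zip(colors[lo], colors[hi]))


BLUE, WHITE, RED = (0, 0, 255), (255, 255, 255), (255, 0, 0)
breakpoints = [-2.0, 0.0, 2.0]   # low, middle, high
colors = [BLUE, WHITE, RED]

low_color = color_for(-3.0, breakpoints, colors)
mid_color = color_for(0.0, breakpoints, colors)
high_color = color_for(5.0, breakpoints, colors)
```

Pulling the outer break points toward the middle makes moderate values reach the fully saturated colors sooner, which is one way to emphasize high and low values as was done for the use case.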

Interesting advanced features on the same page include the addition of ‘top items’ that will be displayed in the global (i.e., full) heat map view. For example, to show the positions of a few key genes, they can be entered on the page and will show on the global heat map display. Another powerful advanced feature is the ability to add gaps to emphasize sub-groups in the heat map.

Heat map – view and download

The heat map is now complete, but the Prev button can still be used to go back to previous build steps to try different options. On this final page of the Interactive Builder (Figure 8), the map can be explored dynamically and downloaded. The Get Heat Map PDF button downloads a PDF of the summary and/or detail views as they appear on the screen – including a version of the detail view zoomed as desired. The legends and other metadata are shown on a separate page of the PDF. The final screen can also be used to explore the dynamic heat map by zooming, panning, searching, dendrogram selection, and link-outs. Clicking the Expand Map button devotes the whole browser window to the map.

Figure 8. The heat map review and download screen shows the completed heat map, allows for dynamic exploration of the map, and provides download options for a PDF, an NG-CHM, and/or the construction history.


Heat maps constructed on the Interactive Builder website are not saved. However, NG-CHMs can be downloaded to save and explore dynamically on your own computer. Select the Get NG-CHM file to obtain a map and then select the Get Heat Map Viewer to get a stand-alone NG-CHM viewer to run on your computer. See our NG-CHM site for more details on the capabilities of dynamic heat maps, additional builders to generate NG-CHMs (Galaxy and R) 2, and instructions on how to embed dynamic heat maps in your websites - https://www.ngchm.net/. Also see our YouTube channel for tutorials on NG-CHM features.

NG-CHM

The interactive NG-CHM produced by the Builder for the use case can be viewed here. Try the pan, zoom, search, and link-out features.

Reproducibility

Reproducibility of results is becoming increasingly important for publication in high-impact journals 14. Therefore, it is important to be able to report the exact steps performed to transform data and create a heat map. That is particularly challenging with an iterative tool that facilitates exploration of alternative options. The Get Creation Log button on the file page of the Interactive Builder is meant to address that need. The history provided by the log shows each option, including the data transformations that were performed to produce the current map. With the original data file and the history, it is possible to recreate a heat map exactly.

Conclusions

The Interactive CHM Builder 9 is an easy to use yet powerful tool for creating custom clustered heat maps for any type of study that generates a matrix of data. It has an intuitive step by step process to prepare the data and build high-quality CHMs. A sample dataset is built-in so it takes just seconds to try out the process and become familiar with the basic steps for heat map generation. It is also easy to back up to previous steps or data states to try alternative approaches and refine formatting. Finally, heat maps can be downloaded as either PDF files or NG-CHM files that support in-depth exploration of the maps.

Although there are many methods available to correct/normalize/filter data, perform hierarchical clustering, and present the resulting heat maps, most of them require programming and biostatistical skills. For non-programmers the options are more limited. The best-known software packages for that purpose are Cluster 3.0 15 for data manipulation and clustering combined with TreeView 16 for display of heat maps. Newer tools in the category include Morpheus (https://software.broadinstitute.org/morpheus/) and Heatmapper 17. Some advantages of the Interactive CHM Builder are:

  • Unlike Cluster 3.0/TreeView, no software installation and configuration are required. Interactive CHM Builder is available as a web service.

  • Unlike other heat map tools, Interactive CHM Builder provides a step by step process starting with an unprocessed matrix that includes: correction of invalid/missing values, data normalization and transformation, data filtering, clustering, addition of covariates, and advanced customization of heat map display including link outs. At each step of the process we provide histograms and incremental heat map visualizations to assist with understanding the data and the effect of option selection.

  • It is a fluid tool that supports the iterative nature of heat map creation, enabling users to move easily back and forth to revisit and modify any step of the process.

  • Unlike other tools, it provides a complete history of each option selected to transform the data and generate the heat map. That capability enables the user to reproduce the heat map even months or years later.

  • Finally, the resulting NG-CHMs provide enhanced ability to support dynamic exploration of patterns in the data. They can be shared with collaborators and larger research communities on a website with an NG-CHM plugin or as a stand-alone heat map and viewer.

Data availability

Open Science Framework: NG-CHM Interactive Builder Use-Case Data. https://doi.org/10.17605/OSF.IO/H7ZS2 13.

This project contains the sample TCGA bladder cancer matrix used in the use-case.

Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

Software availability

The Interactive CHM Builder is freely available for use as a web resource at: https://build.ngchm.net/NGCHM-web-builder/.

Source code available from: https://github.com/MD-Anderson-Bioinformatics/NG-CHM_GUI_BUILDER.

Archived source code at time of publication: https://doi.org/10.5281/zenodo.3460673 9.

License: GNU General Public License version 2.

Funding Statement

This work was supported in part by Grant Numbers U24CA143883, U24CA199461, U24CA210949 and U24CA210950 from the National Cancer Institute, as well as generous gifts from the Mary K. Chapman Foundation and the Michael & Susan Dell Foundation.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

[version 2; peer review: 2 approved]

References

  • 1. Weinstein JN, Myers T, Buolamwini J, et al.: Predictive statistics and artificial intelligence in the U.S. National Cancer Institute's Drug Discovery Program for Cancer and AIDS. Stem Cells. 1994;12(1):13–22. doi: 10.1002/stem.5530120106
  • 2. Weinstein JN, Myers TG, O'Connor PM, et al.: An information-intensive approach to the molecular pharmacology of cancer. Science. 1997;275(5298):343–9. doi: 10.1126/science.275.5298.343
  • 3. Myers TG, Anderson NL, Waltham M, et al.: A protein expression database for the molecular pharmacology of cancer. Electrophoresis. 1997;18(3–4):647–53. doi: 10.1002/elps.1150180351
  • 4. Eisen MB, Spellman PT, Brown PO, et al.: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95(25):14863–8. doi: 10.1073/pnas.95.25.14863
  • 5. Scherf U, Ross DT, Waltham M, et al.: A gene expression database for the molecular pharmacology of cancer. Nat Genet. 2000;24(3):236–44. doi: 10.1038/73439
  • 6. Ross DT, Scherf U, Eisen MB, et al.: Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet. 2000;24(3):227–35. doi: 10.1038/73432
  • 7. Zeeberg BR, Qin H, Narasimhan S, et al.: High-Throughput GoMiner, an 'industrial-strength' integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID). BMC Bioinformatics. 2005;6:168. doi: 10.1186/1471-2105-6-168
  • 8. Weinstein JN: Biochemistry. A postgenomic visual icon. Science. 2008;319(5871):1772–3. doi: 10.1126/science.1151888
  • 9. mstucky, flikeda, Ryan M, et al.: MD-Anderson-Bioinformatics/NG-CHM_GUI_BUILDER 2.15.1. (Version 2.15.1). Zenodo. 2019. doi: 10.5281/zenodo.3460673
  • 10. Broom BM, Ryan MC, Brown RE, et al.: A Galaxy implementation of next-generation clustered heatmaps for interactive exploration of molecular profiling data. Cancer Res. 2018;77(21):e23–e26. doi: 10.1158/0008-5472.CAN-17-0318
  • 11. Goecks J, Nekrutenko A, Taylor J: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11(8):R86. doi: 10.1186/gb-2010-11-8-r86
  • 12. Robertson AG, Kim J, Al-Ahmadie H, et al.: Comprehensive Molecular Characterization of Muscle-Invasive Bladder Cancer. Cell. 2017;171(3):540–556.e25. doi: 10.1016/j.cell.2017.09.007
  • 13. Ryan M: NG-CHM Interactive Builder Use-Case Data. 2019. doi: 10.17605/OSF.IO/H7ZS2
  • 14. McNutt M: Reproducibility. Science. 2014;343(6168):229. doi: 10.1126/science.1250475
  • 15. de Hoon MJ, Imoto S, Nolan J, et al.: Open source clustering software. Bioinformatics. 2004;20(9):1453–1454. doi: 10.1093/bioinformatics/bth078
  • 16. Saldanha AJ: Java Treeview--extensible visualization of microarray data. Bioinformatics. 2004;20(17):3246–3248. doi: 10.1093/bioinformatics/bth349
  • 17. Babicki S, Arndt D, Marcu A, et al.: Heatmapper: web-enabled heat mapping for all. Nucleic Acids Res. 2016;44(W1):W147–53. doi: 10.1093/nar/gkw419

F1000Res. 2020 Mar 30. doi: 10.5256/f1000research.25215.r61493

Reviewer response for version 2

Melissa S Cline 1

The authors have faithfully addressed the reviewers' feedback. This is a well-written write up of an excellent piece of research software, and represents a fine resource for the research community.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2020 Feb 21. doi: 10.5256/f1000research.22637.r59179

Reviewer response for version 1

Melissa S Cline 1

Ryan et al. present a manuscript for the interactive clustered heat map builder tool that has been widely used in cancer research consortia. The tool is of excellent quality overall, is very useful, and is intuitive in its design and execution. The manuscript is well-written, with some caveats below.

Major feedback:

The authors are not doing justice to the tool, which offers much more than user-friendly heatmap generation. To put the functionality in perspective, they should contrast it with the Cluster/TreeView suite, which also offers a user-friendly interface to filtering and data transformation.

Minor feedback

The Operation subsection assumes knowledge of Docker and Tomcat. The authors should cite appropriate background reference material for readers who aren't familiar with these technologies.

For the use case, the authors summarized how they transformed the data, but did not indicate how those transformations were done with their tool. This needs to be clarified, because it's not obvious.

The sample data in the OSF Storage site is stored as a single tarball. This is awkward, as the entire tarball has to be downloaded and expanded in order to access any single file.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

F1000Res. 2020 Mar 11.
Michael Ryan 1

Thank you for your feedback and suggestions.  Below we have described how each was addressed.

Major feedback:

The authors are not doing justice to the tool, which offers much more than user-friendly heatmap generation. To put the functionality in perspective, they should contrast it with the Cluster/TreeView suite, which also offers a user-friendly interface to filtering and data transformation.

Good suggestion. We have modified the 5th sentence in the first paragraph of the introduction to broaden the description of the scope of the tool and have included a new paragraph in the conclusion section to contrast our tool with Cluster 3.0/TreeView.

Minor feedback

The Operation subsection assumes knowledge of Docker and Tomcat. The authors should cite appropriate background reference material for readers who aren't familiar with these technologies.

We agree.  Additional detail including links to the appropriate reference material for Docker and Tomcat, has been added to the Operation section.

For the use case, the authors summarized how they transformed the data, but did not indicate how those transformations were done with their tool. This needs to be clarified, because it's not obvious.

Thank you for pointing that out.  The transforms section of the use case has been modified to provide the exact path through the screen options for each transform performed.  That should make it easier to follow the steps exactly.

The sample data in the OSF Storage site is stored as a single tarball. This is awkward, as the entire tarball has to be downloaded and expanded in order to access any single file.

Agreed.  Accordingly, we have modified the OSF Storage site to contain a folder with all of the files needed for the use case as individual, uncompressed files.

F1000Res. 2019 Oct 29. doi: 10.5256/f1000research.22637.r55133

Reviewer response for version 1

Natasha Caplen 1, Soumya Sundara Rajan 1

Ryan and co-workers have developed the software tool Interactive Clustered Heat Map (CHM) builder to enable investigators with minimal expertise in bioinformatics and biostatistics to generate publication-quality heatmaps. The use of heatmaps to visualize related datasets is a common feature in many reports of the results of studies that include genome or transcriptome-scale experiments. However, the statistical underpinnings of a heatmap require the application of appropriate transformation and clustering procedures. The interactive CHM tool makes use of user-uploaded data that is then processed to generate heatmaps defined by a set of standardized options; for example, the user can select different distance metrics (e.g., the calculation of Euclidean distance versus Manhattan distance) or clustering options (random versus hierarchical). The user also has the option to input possible co-variant data sets for the further stratification of the primary results. Furthermore, the user can customize the visual properties of the heatmap by selecting the output of the computational pipeline from a palette of colors. The article itself is well-written, though, as stated below, we recommend some edits to the current text. A particularly positive feature of this CHM tool is the inclusion of a dynamic capability that allows the user to explore their data in greater depth. Many of the features of the graphical user interface (GUI) are easy to use, and the user does not have to refer to the accompanying article describing the builder software continually. However, to enhance the impact of this resource, we recommend modification of the current versions of their article and software tool to address the following points. 

Article

In the Introduction, the authors discuss the user’s ability to use their tool to generate heatmaps reiteratively, refining data transformation, annotation, clustering, and formatting. The authors also point out that this may introduce the risk of generating a multiple-comparison issue. To help the user avoid such issues, can the authors briefly mention other resources (e.g., review articles) that the user can refer to when considering which of the transformation, clustering, and distance metrics will be most applicable to their dataset?

The authors should include a discussion of how the Interactive CHM Builder compares to other free heatmap generators, for example, Heatmapper (heatmapper.ca; Babicki et al., Heatmapper: web-enabled heat mapping for all. Nucleic Acids Res. 2016 May 17 (epub ahead of print). DOI:10.1093/nar/gkw419) and the Morpheus software from the Broad Institute (https://software.broadinstitute.org/morpheus/).

Some datasets require non-hierarchical clustering to obtain the most appropriate and meaningful interpretation of the results. Please explain why this software offers only hierarchical clustering, random ordering, or no clustering.

Website

In some test runs, after completing the workflow and generating a heatmap from a dataset, we could generate a new heatmap (from the same or a different dataset) only by closing the website and reopening the homepage. The reset function may need modification.

In some test runs, when choosing the formatting and then palettes after adding covariates, the Apply button in the left-hand window had lines running through it.

It is easy to maneuver and resize the highlighter box over any region of the heatmap generated using the sample data. However, we noted that not all heatmaps generated from user-uploaded data performed as well.

Please state clearly on the website’s front page that the website limits the heat map to “no more than 4,000 total rows and columns and no more than 3,500 elements on either axis.” In the absence of this statement on the front page, users may attempt to upload larger datasets.

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2020 Mar 11.
Michael Ryan 1

Thank you for your detailed comments and suggestions on the article and the tool.  Each suggestion/comment is addressed below:

Article

In the Introduction, the authors discuss the user’s ability to use their tool to generate heatmaps reiteratively, refining data transformation, annotation, clustering, and formatting. The authors also point out that this may introduce the risk of generating a multiple-comparison issue. To help the user avoid such issues, can the authors briefly mention other resources (e.g., review articles) that the user can refer to when considering which of the transformation, clustering, and distance metrics will be most applicable to their dataset?

We are not aware of any review article that covers the topic comprehensively. However, in the text we cite an article of ours (Weinstein JN: A postgenomic visual icon. Science. 2008;319: 1772) that provides additional background on some of the relevant subtleties of heat map generation. As you correctly point out, the optimum approach depends on the specifics of the individual dataset and the objectives of the study. In light of your comment, we will consider writing a review article that addresses those issues at greater length. We are also contemplating a future enhancement to the Interactive CHM Builder that would provide templates or workflows based on study type as a starting point for navigating data transformations and heat map generation.

The authors should include a discussion of how the Interactive CHM Builder compares to other free heatmap generators, for example, Heatmapper (heatmapper.ca; Babicki et al., Heatmapper: web-enabled heat mapping for all. Nucleic Acids Res. 2016 May 17 (epub ahead of print). DOI:10.1093/nar/gkw419) and the Morpheus software from the Broad Institute (https://software.broadinstitute.org/morpheus/).

Thank you for the suggestion.  We have added a paragraph to the conclusions section to enumerate what we feel are the advantages of our Interactive CHM builder compared with the other similar tools.

Some datasets require non-hierarchical clustering to obtain the most appropriate and meaningful interpretation of the results. Please explain why this software offers only hierarchical clustering, random ordering, or no clustering.

We agree.  Thank you. The methods we have currently implemented are the ones that are most heavily used in publications of omics research.  We will add non-hierarchical clustering methods to our requested features list for a future release.

Website

In some test runs, after completing the workflow and generating a heatmap from a dataset, we could generate a new heatmap (from the same or a different dataset) only by closing the website and reopening the homepage. The reset function may need modification.

Thank you for reporting this issue.  We have modified the restart flow and believe the problem has been corrected.

In some test runs, when choosing the formatting and then palettes after adding covariates, the Apply button in the left-hand window had lines running through it.

We have been unable to reproduce that issue in the latest release of the software, so we believe it has been corrected. If you encounter it again, we would appreciate it if you submitted an issue to the project's git repository, noting the browser and operating system on which it occurs.

It is easy to maneuver and resize the highlighter box over any region of the heatmap generated using the sample data. However, we noted that not all heatmaps generated from user-uploaded data performed as well.

Thank you for the report. Since submission of the paper, we have made several improvements to the selection/sizing features and have tested many odd-sized, asymmetrical matrices. We will continue to improve the selection mechanics if additional issues arise.

Please state clearly on the website’s front page that the website limits the heat map to “no more than 4,000 total rows and columns and no more than 3,500 elements on either axis.” In the absence of this statement on the front page, users may attempt to upload larger datasets.

For many studies, an important step in preparing data for clustering and heat map generation is filtering out rows and/or columns that have a high proportion of missing values or that show little variance across samples.   We want to allow users to upload matrices that are above the clustering limit because the filtering step will often reduce the size of the matrices such that they can be clustered.  The manuscript was not clear on this point.  Thank you for pointing this out.  We have modified the Use Case section “Transform/filter the data”, paragraph 2, to explicitly discuss clustering limits and the use of filtering to reduce larger datasets.
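
As an illustration only, the filtering step described above might be sketched in Python as follows. This is not the Builder's implementation: the function name `filter_rows`, the two thresholds, and the use of `None` to mark a missing value are all assumptions made for the example.

```python
from statistics import pvariance
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def filter_rows(
    matrix: Sequence[Row],
    max_missing_fraction: float = 0.5,  # hypothetical threshold
    min_variance: float = 0.0,          # hypothetical threshold
) -> List[int]:
    """Return the indices of rows that survive both filters.

    A row is dropped if too many of its values are missing (None) or if the
    variance of its observed values does not exceed min_variance.
    """
    kept = []
    for index, row in enumerate(matrix):
        if not row:
            continue  # an empty row carries no information
        present = [value for value in row if value is not None]
        missing_fraction = 1 - len(present) / len(row)
        if missing_fraction > max_missing_fraction:
            continue  # too many missing values
        if len(present) < 2 or pvariance(present) <= min_variance:
            continue  # too little variation across samples
        kept.append(index)
    return kept


example = [
    [1.0, 2.0, 3.0, 4.0],     # complete and variable: kept
    [None, None, None, 1.0],  # 75% missing: dropped
    [5.0, 5.0, 5.0, 5.0],     # zero variance: dropped
    [0.0, None, 2.0, 4.0],    # 25% missing, variable: kept
]
print(filter_rows(example))  # [0, 3]
```

Columns could be filtered the same way by transposing the matrix first. A matrix that exceeds the clustering limit on upload may fall under it once such filters have been applied.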

The interactive nature of the tool does limit the maximum matrix we can cluster.  As you know, the compute time for most clustering algorithms essentially increases as the square of the largest dimension.  The tool’s limit for the clustering step has been increased from 4,000 to 5,000 total rows/columns. The 3,500 axis limit has been removed.  We’ve also added new system messages that more clearly explain those issues, and we plan to continue pursuing increases in the limits as computational power increases and clustering algorithms advance.
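
The scaling argument above can be made concrete with a small sketch. It rests on two assumptions that are not taken from the Builder's code: that the limit applies to the sum of rows and columns, and that clustering cost is represented by the number of pairwise distances among the items on an axis, n(n-1)/2. The function names and the `total_limit` parameter are hypothetical.

```python
def pairwise_distance_count(n_items: int) -> int:
    """Number of distinct pairs among n_items, i.e. n(n-1)/2."""
    return n_items * (n_items - 1) // 2


def within_clustering_limit(n_rows: int, n_cols: int, total_limit: int = 5000) -> bool:
    """Assumed reading of the limit: rows plus columns must not exceed it."""
    return n_rows + n_cols <= total_limit


# Doubling the axis length roughly quadruples the number of distances.
small = pairwise_distance_count(2000)
large = pairwise_distance_count(4000)
print(small, large, round(large / small, 2))

print(within_clustering_limit(3000, 2000))  # exactly at the assumed limit
print(within_clustering_limit(4000, 1500))  # over the assumed limit
```

Under these assumptions, a modest increase in the permitted matrix size translates into a much larger increase in work, which is consistent with the limit being raised gradually.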

Associated Data

    Data Citations

    1. mstucky, flikeda, Ryan M, et al. : MD-Anderson-Bioinformatics/NG-CHM_GUI_BUILDER 2.15.1. (Version 2.15.1). Zenodo. 2019. 10.5281/zenodo.3460673 [DOI]
    2. Ryan M: NG-CHM Interactive Builder Use-Case Data. 2019. 10.17605/OSF.IO/H7ZS2 [DOI]

    Data Availability Statement

    Open Science Framework: NG-CHM Interactive Builder Use-Case Data. https://doi.org/10.17605/OSF.IO/H7ZS2 13.

    This project contains the sample TCGA bladder cancer matrix used in the use-case.

    Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).


    Articles from F1000Research are provided here courtesy of F1000 Research Ltd
