Abstract
Most imaging studies in the biological sciences rely on analyses that are relatively simple. However, manual repetition of analysis tasks across multiple regions in many images can complicate even the simplest analysis, making record keeping difficult, increasing the potential for error, and limiting reproducibility. While fully automated solutions are necessary for very large data sets, they are sometimes impractical for the small- and medium-sized data sets that are common in biology. This paper introduces Slide Set, a framework for reproducible image analysis and batch processing with ImageJ. Slide Set organizes data into tables, associating image files with regions of interest and other relevant information. Analysis commands are automatically repeated over each image in the data set, and multiple commands can be chained together for more complex analysis tasks. All analysis parameters are saved, ensuring transparency and reproducibility. Slide Set includes a variety of built-in analysis commands and can be easily extended to automate other ImageJ plugins, reducing the manual repetition of image analysis without the set-up effort or programming expertise required for a fully automated solution.
Keywords: image analysis, image processing, automation, reproducibility, software
Introduction
Advances in computer technology and computational methods have enabled increasingly complex analyses of scientific imaging data. Yet many common imaging studies in the biological sciences rely on analyses that are, at their core, relatively simple. For example, measuring the average signal or correlation between two channels in an area of an immunofluorescence image, the density of a band in a Western blot, or the distance between two image features does not require complex algorithms or highly specialized software. Many widely-used software packages, both commercial (Adobe Photoshop, MetaMorph, etc.) and open-source (ImageJ (1)), can be used to perform such tasks. However, while it can be quite easy to analyze one region of a single image, actual imaging studies generally require analysis of several regions in many images. This repetition can render even the simplest analysis tasks quite complicated in practice. The principal difficulties include recording regions of interest (ROIs) and linking them to the resulting quantifications, saving input parameters so the analysis can be reliably replicated, tracking the dependencies of multiple linked analysis steps, and managing the tedium and error risk derived from repetition. Given the recent concern regarding reproducibility of scientific research (2,3), any successful image analysis strategy must address these issues.
One approach to improve the reproducibility of image analysis is to use programming or scripting to automate the analysis process, thus removing variability from interactive user input. Rerunning the same analysis program on the same data should produce the same result. However, while image processing libraries are available for most commonly used programming languages, their use requires considerable programming expertise. High-level scripting supported by several image analysis programs can remove some of the barriers of low-level languages. For example, ImageJ supports scripts written in its own macro language, as well as BeanShell, Clojure, JavaScript, Python, and Ruby, and Adobe Photoshop supports JavaScript. In addition, MathWorks MATLAB, an interactive environment for numerical computation, contains robust support for scripted image analysis. Although high-level scripting languages are simpler to apply than developing a stand-alone application from scratch, they may remain inaccessible to researchers without any programming experience. Furthermore, developing a new scripted solution for each data set can be inefficient, especially for simple analyses.
Several software packages also support fully automated image analysis workflows with little scripting, including CellProfiler, PhenoRipper, and the Protocols feature of Icy (4-7). But while these programs increase the accessibility of automated image analysis, they do so at the expense of interactivity. This is a suitable approach for very large data sets, where the number of images alone makes interactivity impractical, but it creates additional obstacles for small- or medium-sized data sets. Importantly, automatic segmentation of images into ROIs is not a trivial problem. Even with advances in machine learning algorithms, automated segmentation generally requires special considerations during experimental design and image acquisition, such as including additional markers and ensuring very low background signal. While high signal-to-noise ratios and specific feature markers are certainly desirable characteristics for large and small imaging studies alike, they are not always possible. Thus, for some small- and medium-sized data sets, interactive ROI segmentation may be the only available option. Even when automatic segmentation is possible, tuning the segmentation algorithm parameters can be extremely time-consuming. For smaller data sets, interactive segmentation may simply be much more efficient. Additionally, the initial analysis of small- and medium-sized data sets can greatly benefit from an interactive approach that allows the comparison of different individual analyses or different analysis parameters. Programs optimized for the full automation required for the largest data sets do not allow this flexibility. As evidenced by the fact that the large majority of imaging studies in the biological sciences do not use scripted or fully automated analyses (Table 1), a need remains for image analysis software which allows interactivity, but records any choices made by the user and automates repetitive steps whenever possible.
Table 1.
Many imaging studies do not use scripted or fully-automated analyses.
Journal issue | Research articles with quantitative image analysis | Fully interactive workflow | Scripted workflow | Automated, non-scripted workflow | Workflow type unable to determine |
---|---|---|---|---|---|
Journal of Cell Biology, February 3, 2015 | 4 | 4 (100%)a | 0 (0%)a | 0 (0%)a | 0 |
Journal of Cell Biology, February 16, 2015 | 7 | 6 (100%) | 2 (33%) | 0 (0%) | 1 |
Cell, February 26, 2015 | 11 | 2 (67%) | 2 (67%) | 0 (0%) | 8 |
Nature Cell Biology, March 1, 2015 | 10 | 10 (100%) | 2 (20%) | 1 (10%) | 0 |
Journal of Cell Science, March 1, 2015 | 14 | 8 (80%) | 2 (20%) | 0 (0%) | 4 |
Total | 46 | 30 (91%) | 8 (24%) | 1 (3%) | 13 |
a Frequency of the workflow type among articles where the workflow type could be identified.
A majority of research articles in recent issues of selected cell biology journals which have emphasized quantitative image analysis employed some form of interactive workflow. Only one of the papers employed an automated non-scripted workflow; this was part of a high-content integrated imaging and analysis system. More than one-quarter of the papers surveyed did not provide a complete description of the image analysis workflows used.
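The percentages in Table 1 are computed relative to the articles whose workflow type could be identified, and the categories overlap (30 + 8 + 1 exceeds 33), so they need not sum to 100. As a quick check of the totals row, a minimal Python sketch (the variable names are illustrative only):

```python
# Totals row of Table 1. Percentages use articles with an identifiable
# workflow as the denominator, not all articles surveyed.
total_articles = 46
unable_to_determine = 13
identified = total_articles - unable_to_determine  # 33 articles

counts = {"fully interactive": 30, "scripted": 8, "automated, non-scripted": 1}

# Workflow categories overlap, so these shares need not sum to 100.
percentages = {name: round(100 * n / identified) for name, n in counts.items()}

# Share of all surveyed articles without a complete workflow description.
undetermined_share = round(100 * unable_to_determine / total_articles)

print(percentages)
print(undetermined_share)
```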
Slide Set, an open-source Java-based plugin for the Fiji distribution of ImageJ (8), provides a framework for interactive and reproducible image analysis, controlled through a graphical user interface with intuitive menu-based commands. By organizing data into tables, Slide Set allows image files to be associated with other important information, such as experimental conditions and time point information. ROIs can be recorded interactively and saved using the widely-supported scalable vector graphics (SVG) format (9), ensuring their availability for later inspection or reuse. Slide Set runs analysis commands on data tables rather than individual images, allowing easy and consistent repetition of the analysis across every image in the table. Analysis results are also organized into tables, including records of the commands, parameters, and input data used to generate them. Additionally, command sequences can be saved and reused on additional datasets. With a wide variety of built-in analysis commands, as well as easy extensibility to include other ImageJ plugins, Slide Set builds flexible image analysis workflows that are transparent and reproducible.
Materials and Methods
Recommended installation and online resources
Slide Set is a plugin for the Fiji distribution of ImageJ (http://fiji.sc/) (8), and will run on any computer that runs Fiji. The author’s website (http://cellbio.emory.edu/bnanes/slideset) contains the most up-to-date Slide Set version, as well as instructions for installing Slide Set using the Fiji updater. The website also includes a copy of the Slide Set user documentation, step-by-step tutorials, and further information for developers. A public source code repository is also available (https://github.com/bnanes/slideset), further supporting the openness and reproducibility of image analysis with Slide Set (10). In addition to the general description of Slide Set’s data model and workflow structure below, five walk-through examples demonstrate the use of Slide Set for various analysis tasks. The capabilities demonstrated include interactive ROI selection (Example S1), calculating between-channel correlations (Example S1), calculating average values and using them as thresholds to compute Manders’ colocalization coefficients (Example S2), creating ROIs based on threshold values (Example S3), automating general ImageJ plugins (Example S4), separating immunohistochemistry images into two absorbance components (Example S5), and automating a trainable segmentation plugin (Example S5) (11).
System requirements and additional installation options
Slide Set requires Java SE 6 or newer (included with Fiji), and runs on Windows, Mac OS X, and Linux operating systems. Slide Set depends on components from ImageJ2 (http://developer.imagej.net), which are included in Fiji, but not with ordinary ImageJ (http://rsbweb.nih.gov/ij/). The recommended way to install the most up-to-date version of Slide Set is through the Fiji updater. From the Fiji updater, select “Manage Update Sites,” then add the Slide Set update site (http://cellbio.emory.edu/bnanes/slideset/update) to the list. Once the update site is added, new Slide Set updates will be automatically installed whenever the Fiji updater is run. Alternatively, Slide Set can be installed manually by extracting the Slide Set distribution archive (slideset-core-<version>-dist.zip) into the Fiji plugins directory. Note that Slide Set will not be automatically updated if it is installed manually. The latest Slide Set distribution can be downloaded from the author’s website (http://cellbio.emory.edu/bnanes/slideset). Slide Set version 1.3.1 is also available here as Program S1. Because Slide Set version 1.3.1 is targeted to ImageJ1 version 1.50a and ImageJ2 version 2.0.0-rc-34, it may not be compatible with the latest Fiji updates. The Slide Set source code is available from a public repository (https://github.com/bnanes/slideset). The project can be built on any system with a Java development kit using the Apache Maven build system (http://maven.apache.org/). The source code contains detailed documentation of the APIs for creating Slide Set command plugins and handling data types not supported by the Slide Set core. Slide Set source code and binaries are distributed under a Simplified BSD license.
Results and Discussion
Slide Set facilitates image analysis workflows that are both interactive and reproducible through table-based data organization and a built-in ROI editor. Each table column represents one field in the dataset and each row contains one entry (Figure 1). Table columns have defined data types, such as numeric or text. Most data types are stored directly in the table. However, more complex data, such as images and ROI sets, are stored in separate files, and the table contains links to those files. The images and ROIs are then loaded from the linked files when needed. Representing images as links to image files rather than copying the actual image data into the table saves disk space and ensures that all analyses are done on the original images, rather than copies, eliminating a potential source of error. Tables organize data used for analysis command inputs, which are matched to table columns based on data type, as well as command results, which are stored in a new data table linked to the input table. Results tables also contain records of the analysis commands and input parameters used to create the table. Multiple analyses can be chained together to form a tree of data tables, stored in a project file. Slide Set uses an XML-based format for project files, so the data can be inspected in any text editor. Data from individual tables can also be exported as comma separated value spreadsheets for further analysis using other software.
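The table-based data model described above can be illustrated with a minimal Python sketch. This is not Slide Set's actual implementation (which is written in Java); the class name `DataTable`, its fields, and the file paths are hypothetical and chosen only to show typed columns, links to image and ROI files, a link from a results table to its input table, and export as comma separated values.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class DataTable:
    """Toy table with typed columns; a hypothetical stand-in, not Slide Set's classes."""
    columns: dict                                   # column name -> type label
    rows: list = field(default_factory=list)        # one dict per row
    parent: "DataTable | None" = None               # input table, for results tables
    properties: dict = field(default_factory=dict)  # command and parameters used

    def add_row(self, **values):
        # Every row must supply exactly the declared columns.
        if set(values) != set(self.columns):
            raise ValueError("row does not match the table columns")
        self.rows.append(values)

    def to_csv(self):
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(self.columns),
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.rows)
        return buffer.getvalue()


# Images and ROI sets are stored as links (paths), not copied into the table.
root = DataTable(columns={"image": "image file", "rois": "ROI set", "label": "text"})
root.add_row(image="images/a.tif", rois="roi/a.svg", label="control")
root.add_row(image="images/b.tif", rois="roi/b.svg", label="treated")

# A results table keeps a link to its input table and a record of its origin.
results = DataTable(columns={"label": "text", "mean": "numeric"}, parent=root,
                    properties={"command": "Region Statistics"})

print(root.to_csv())
```

Because the table holds only paths, every analysis reads the original image files, which mirrors the rationale given in the text.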
Figure 1. Schematic representation of the Slide Set data model.
The root data table (top) contains columns for images, regions of interest (ROIs), and image labels. The associated image files, ROI files, and labels are grouped together in rows. Analysis command inputs can be filled from data table columns or with constant values. The command is repeated for each row in the input data table, and command results are stored as columns in a new table. Columns from the input table can also be copied to the results table, and values are repeated as necessary for commands that produce multiple results. The results tables shown contain two results from image A, indicating that the ROI set associated with that image contains two regions. Results tables can be used as inputs for further analysis commands.
Slide Set includes a built-in editor for interactive specification of ROIs for images in data tables (Figure 2A; Example S1). ROIs are automatically saved in files linked from the table, allowing reuse of the same ROIs for multiple analyses. Rather than use a custom file format for storing ROI data, Slide Set stores ROIs as SVG files. SVG is a commonly used format for representing vector image data (9), and can be read by a variety of software, including many Web browsers. SVG elements directly correspond to the most common ROI types used in ImageJ, including points, lines, curves, rectangles, circles, and polygons. One SVG file can contain all the ROIs for a single image. With a standard file format, ROIs can also be viewed and edited by other programs if necessary. In addition, SVG files containing both ROI and image data can be exported from Slide Set for use in the creation of figures for publication. The SVG format is limited to representations of two-dimensional ROIs. If multi-dimensional ROIs are needed, Slide Set also supports storing ROIs as special ROI Set files, zipped archives of individual ROI files in ImageJ2’s native ROI format.
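Because SVG is a standard XML format, ROI files can be read with general-purpose tools. The following Python sketch parses rectangles and polygons from an SVG document using only the standard library. The example file content is hypothetical: the exact elements and attributes that Slide Set writes are not specified here, and the parser assumes polygon points are written as space-separated `x,y` pairs.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical ROI file: one rectangle and one polygon on a 512 x 512 image.
roi_svg = """<svg xmlns="http://www.w3.org/2000/svg" width="512" height="512">
  <rect x="10" y="20" width="100" height="50"/>
  <polygon points="200,200 260,200 230,250"/>
</svg>"""


def read_rois(svg_text):
    """Return (shape, geometry) tuples for the rect and polygon elements."""
    rois = []
    for element in ET.fromstring(svg_text).iter():
        tag = element.tag.replace(SVG_NS, "")
        if tag == "rect":
            geometry = tuple(float(element.get(k))
                             for k in ("x", "y", "width", "height"))
            rois.append(("rect", geometry))
        elif tag == "polygon":
            # Assumes "x,y x,y ..." formatting of the points attribute.
            pairs = element.get("points").split()
            geometry = [tuple(float(v) for v in pair.split(",")) for pair in pairs]
            rois.append(("polygon", geometry))
    return rois


rois = read_rois(roi_svg)
print(rois)
```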
Figure 2. The Slide Set user interface and command workflow.
(A) The Slide Set region of interest (ROI) editor allows interactive ROI specification using the ImageJ selection tools. Selected ROIs are automatically saved as scalable vector graphics (SVG) format files. (B) The main Slide Set user interface contains two panes. In the right pane, tables in the project are organized as a tree, with the root data table at the top, and the results of analysis commands descending from it. Multiple commands can be combined for more complex analyses. In the left pane, a log tracks the progress of Slide Set commands as they run. (C) Command inputs are matched to data table columns or set with constant values. (D) Command results can be saved in the results table, discarded, or, for images and ROIs, saved as separate files. Values from the input table can also be copied directly into the results table for reference or reuse as inputs for subsequent analysis commands. (E) Command results are stored in a new data table, shown in the tree as a descendant of the input data table.
From the user perspective, each Slide Set analysis begins with a project file (Example S6 contains a demonstration Slide Set project, along with a step-by-step record of its creation). New projects contain an empty data table, to which images can be added. Additional columns can be added for related information, such as experimental group codes. After images are added to the data table, ROIs can be added using the ROI editor, which allows ROI specification using the usual ImageJ selection tools. Selected regions are automatically saved as SVG files, which are added to the data table in a new column. Once the necessary ROIs have been selected, the desired analysis commands can be run. In the example project, four different analyses have been run on the root data table, calculating channel averages within each ROI, channel averages along ROI boundaries, and correlation coefficients between channels, and segmenting the image based on threshold values (Figure 2B). Although each command is different, set-up and execution of every command follows the same general workflow.
First, command input parameters are set, either from the source data table or with constant values (Figure 2C). If an input parameter is matched to an appropriately typed column from the source data table, the parameter value will be set to the value in the relevant row of that column each time the command runs. For example, an image input parameter is typically matched to a table column containing a list of image files, allowing the command to cycle through each image in the data set. In contrast, if an input parameter is set to a constant value, the same parameter value is used each time the command runs. This is most useful for parameters that should not be varied across different images or groups, such as threshold values. After the command input parameters have been matched to table columns or set with constant values, the user specifies handling for the command results (Figure 2D). Although most results are stored directly in the results table, some commands produce image or ROI results, which must be stored in separate files. For these results, file name patterns can be specified. Results that are not needed can be discarded (Example S3). Additionally, columns from the input table can be copied into the results table. In the example project, the image labels have been copied for later reference. Finally, after the command input parameters and results handling have been set, the command is repeated for each row in the data table, and its results are stored in a new table (Figure 2E).
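The workflow above (parameters bound to table columns or to constants, the command repeated for each row, and selected input columns copied into the results) can be sketched in a few lines of Python. This is an illustration of the idea only, not Slide Set's code; `run_command`, `count_above`, and the toy data are hypothetical.

```python
def run_command(command, rows, bindings, constants, copy_columns=()):
    """Repeat `command` once per input row and collect the results as new rows.

    bindings:  parameter name -> input column name (value varies by row)
    constants: parameter name -> fixed value (same for every row)
    """
    results = []
    for row in rows:
        params = {name: row[column] for name, column in bindings.items()}
        params.update(constants)
        # A command may produce several results per row (e.g. one per ROI),
        # so copied input values are repeated for each result.
        for result in command(**params):
            out = {column: row[column] for column in copy_columns}
            out.update(result)
            results.append(out)
    return results


def count_above(values, threshold):
    """Stand-in analysis command: one result per region."""
    for region in values:
        yield {"n_above": sum(1 for v in region if v > threshold)}


rows = [
    {"label": "A", "regions": [[1, 5, 9], [7]]},  # image A has two regions
    {"label": "B", "regions": [[2, 3]]},          # image B has one region
]

table = run_command(count_above, rows,
                    bindings={"values": "regions"},   # varies with each row
                    constants={"threshold": 4},       # same for every row
                    copy_columns=("label",))
print(table)
```

Here the results table has two rows for image A and one for image B, matching the behavior described for commands that produce multiple results per input row.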
Command results can be inspected from within Slide Set or exported as comma separated value spreadsheets for further analysis with other software. Additionally, results tables can be used as input data for other commands, allowing the combination of commands for more complex analyses (Example S2). In the example data set, the Threshold Segmentation command is used to define ROIs (Figure 3, A–C), which are then used as inputs for a second command, to calculate the average signal within the automatically selected regions (Figure 3, D and E). The resulting tables are organized into a tree, with the original data set at the base and each results table in a path descending from the original data (Figure 2B). Each table also contains a record of the analysis command and input parameters responsible for its creation (Figure 4). Thus, the analysis workflow is fully recorded, greatly facilitating replication or independent verification. Further supporting replication of analysis strategies, Slide Set can export command skeletons, which contain linear sequences of analysis commands along with their input parameters. Command skeletons can then be re-applied to different input data, making it easy to repeat complex command sequences.
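The idea of a saved command sequence that is re-applied to new data, with each resulting table keeping a record of the step that produced it, can be sketched as follows. This is a conceptual illustration under stated assumptions: the storage format of Slide Set's command skeletons is not described here, JSON is used only for convenience, and the commands `threshold_rows` and `scale_column` are hypothetical stand-ins.

```python
import json


# Stand-in commands; each maps a list of rows to a new list of rows.
def threshold_rows(rows, column, minimum):
    return [r for r in rows if r[column] >= minimum]


def scale_column(rows, column, factor):
    return [{**r, column: r[column] * factor} for r in rows]


COMMANDS = {"threshold": threshold_rows, "scale": scale_column}


def apply_skeleton(skeleton, rows):
    """Apply each step in order, keeping every table with the record that made it."""
    tables = [{"properties": None, "rows": rows}]  # the root data table
    for step in skeleton:
        rows = COMMANDS[step["command"]](rows, **step["parameters"])
        tables.append({"properties": step, "rows": rows})
    return tables


# A skeleton is plain data, so it survives a save/load round trip unchanged.
skeleton = json.loads(json.dumps([
    {"command": "threshold", "parameters": {"column": "mean", "minimum": 10}},
    {"command": "scale", "parameters": {"column": "mean", "factor": 0.5}},
]))

first = apply_skeleton(skeleton, [{"mean": 4}, {"mean": 12}, {"mean": 30}])
second = apply_skeleton(skeleton, [{"mean": 50}, {"mean": 8}])
print(first[-1]["rows"], second[-1]["rows"])
```

Keeping the command name and parameters alongside each table is what allows the same sequence to be repeated, or independently checked, on a different data set.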
Figure 3. Multiple analysis commands can be combined.
(A) The Threshold Segmentation command input parameters are set to select regions of high intensity on the green channel. (B) The regions of interest (ROIs) selected by the command will be stored in separate files, with the file paths recorded in the results table. In addition, the image file names will be copied into the results table. (C) The ROI manager displays the resulting ROIs drawn on the original images. (D) The Region Statistics command is run on the results table from the Threshold Segmentation command. (E) This command then calculates the average values on each channel of the ROIs automatically selected by Threshold Segmentation. Since there were many ROIs selected, the results table contains more than one row for each image.
Figure 4. Table properties record the analysis command and input parameters that created the table.
The properties of this table show that it was created using the Border Statistics command with the border width set to 5 pixels and no thresholds, ensuring that the analysis can be reliably replicated.
Slide Set includes a variety of built-in analysis commands, and the documentation included with the program details the input parameters for each command and the results produced. Built-in commands include some of the most commonly used image analysis tasks, including calculating basic ROI statistics (Example S2), colocalization analyses (Examples S1 and S2), image segmentation (Examples S3 and S5), absorbance unmixing (Example S5), and utility functions (Table 2). In addition, Slide Set can be used to automate any general ImageJ2 plugin, provided that the plugin’s input and output data types are supported (Example S4; Table 2). Command plugins can also be developed specifically for Slide Set using a Java application programming interface (API) very similar to the ImageJ2 plugin API. Additionally, an API is available for handling data types not supported by the Slide Set core. Plugins are recognized and loaded at runtime, so they can be compiled individually and installed simply by placing the plugin JAR file in the ImageJ plugins directory. Detailed API specifications and documentation for developers are included in the Slide Set source code.
Table 2.
Slide Set core analysis commands.
Category | Command | Description |
---|---|---|
Basic statistics | Region Statistics | Calculate signal intensity within ROIs |
Basic statistics | Border Statistics | Calculate signal intensity along regions defined by ROI borders |
Basic statistics | ROI Lengths | Calculate the lengths of ROI borders |
Colocalization | Pearson’s Correlation | Calculate correlations between two channels within ROIs |
Colocalization | Manders’ Coefficients | Calculate Manders’ colocalization coefficients between two channels within ROIs |
Segmentation | Threshold Segmentation | Segment an image into ROIs based on threshold values |
Segmentation | Otsu Segmentation | Segment an image based on thresholds automatically calculated using Otsu’s method (12) |
Segmentation | Bin Regions | Categorize one set of ROIs based on a second set of ROIs |
Segmentation | Filter Regions | Filter one set of ROIs based on a second set of ROIs |
Segmentation | Create Mask Image | Create a binary image mask from a set of ROIs |
Segmentation | Trainable Weka Segmentation | Bridge to the trainable segmentation plugin included with Fiji (11) |
Other | Unmix Absorbances | Separate a color image into two absorbance components |
Other | Math Functions | Add, subtract, multiply, or divide numeric data |
Other | Round | Round numeric data to integers |
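For readers unfamiliar with the colocalization measures listed in Table 2, the following Python sketch computes Pearson's correlation coefficient and Manders' coefficients for two lists of pixel intensities, such as the two channels within one ROI. It uses the standard textbook definitions and is not taken from the Slide Set source code; thresholded variants of Manders' coefficients differ between implementations, so the threshold handling here is an assumption.

```python
from math import sqrt


def pearson(a, b):
    """Pearson's correlation coefficient between two equal-length pixel lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def manders(a, b, threshold_a=0, threshold_b=0):
    """Manders' coefficients (M1, M2).

    M1 is the fraction of channel-a signal found where channel b exceeds its
    threshold; M2 is the converse. Assumes both channels have nonzero total
    signal.
    """
    m1 = sum(x for x, y in zip(a, b) if y > threshold_b) / sum(a)
    m2 = sum(y for x, y in zip(a, b) if x > threshold_a) / sum(b)
    return m1, m2


# Toy pixel values for two channels within one region.
red = [10, 20, 30, 40]
green = [0, 10, 0, 30]

r = pearson(red, green)
m1, m2 = manders(red, green)
print(round(r, 3), m1, m2)
```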
In conclusion, Slide Set approaches image analysis in a way unlike any other currently available software package (Table 3). It provides a balance between fully automated or scripted image analysis systems, which can be impractical for many small- and medium-sized data sets, and manual repetition of image analysis commands, which can be tedious, error prone, and difficult to reproduce. By allowing interactive ROI specification, organizing images and related data into tables, and recording the dependencies of linked analysis tasks, Slide Set significantly improves the efficiency and reproducibility of image analysis with moderately sized data sets without requiring any programming expertise. Several existing software packages, such as CellProfiler, PhenoRipper, and the Protocols feature of Icy, support fully automated, non-interactive image analysis. However, fully automated analysis strategies, while necessary for the largest data sets, require certain trade-offs. For example, while these programs contain excellent tools for automated ROI segmentation, their support for interactive ROI specification, recording, and reuse is necessarily limited. For those small- and medium-sized data sets that are not suitable for automated segmentation—especially those that lack highly specific regional markers—Slide Set may provide a more appropriate analysis toolkit. Additionally, Slide Set can be particularly useful during the initial stages of image analysis, when different results at one analysis step might suggest different approaches at subsequent steps. Fully automated systems generally require running the entire analysis workflow at once. In fact, the large majority of published studies do not use fully automated analysis workflows (Table 1). Instead, they rely on fully interactive workflows which may be difficult to reproduce, a deficit that Slide Set is uniquely positioned to address. 
Finally, Slide Set’s ability to automate general ImageJ2 plugins is likely to be an increasingly useful feature. ImageJ and Fiji support a dynamic ecosystem of independently-developed image analysis plugins. As the ImageJ community transitions from ImageJ1 to ImageJ2, many plugins will be compatible with Slide Set without any additional development effort (Example S4). Thus, the variety of analysis tasks that can be included in a Slide Set workflow will continue to increase. As an added benefit, since Slide Set is a plugin for ImageJ and uses many standard ImageJ components—for example, display controls and ROI selection tools—users already familiar with ImageJ will not need to learn an entirely new software system in order to begin using Slide Set. Slide Set is therefore particularly well suited to the biological sciences, where many imaging studies rely on relatively simple analyses, but are complicated by the difficulty of reliable repetition. Improving the efficiency and reproducibility of image analysis frees time and resources for further scientific investigation, rather than simple repetitive tasks.
Table 3.
Slide Set compared to other image analysis software packages.
Slide Set | ImageJ | MATLAB | CellProfiler | PhenoRipper | ||
---|---|---|---|---|---|---|
Automation without scripting | + | + | + | |||
Automation requires scripting | + | + | ||||
Organize images and metadata in tables | + | |||||
Workflow characteristics | ||||||
Modular workflows | + | + | ||||
Automatically record analysis parameters | + | + | + | |||
Interactive: results after each command | + | + | ||||
Easily compare parameter changes at each step | + | |||||
Optimized for very large datasets | + | + | ||||
Regions of interest | ||||||
Interactive selection | + | + | ||||
Automatically save and reuse | + | |||||
Interactive revision | + | |||||
Automated segmentation | + | + | + | + | + | |
Export as SVG | + | |||||
Plugin/extension/scripting support | +a | + | + | + | ||
Open source | + | + | + | + |
a Most plugins designed for ImageJ2 will work with Slide Set automatically.
Supplementary Material
Method Summary.
Slide Set improves the transparency and reproducibility of image analysis workflows while maintaining the flexibility of interactive region of interest selection and a graphical user interface. By organizing data into tables, recording all parameter choices, and automating task repetition across multiple images, Slide Set makes interactive image analysis more efficient and less prone to error.
Acknowledgments
I thank Alexa Mattheyses for helpful discussions and reviewing the manuscript; Andrew Kowalczyk and members of the Kowalczyk laboratory for their help and advice; and Chantel Cadwell, Joshua Lewis, Wenji Su, and everyone else who tested early versions of Slide Set. This work was supported by a grant from the National Institutes of Health (F30HL110447), and this paper is subject to the NIH Public Access Policy.
Footnotes
Competing Interests Statement
The author declares no competing interests.
References
- 1. Schneider CA, Rasband WS, Eliceiri KW. NIH Image to ImageJ: 25 years of image analysis. Nature methods. 2012;9:671–675. doi: 10.1038/nmeth.2089.
- 2. Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nature reviews. Drug discovery. 2011;10:712. doi: 10.1038/nrd3439-c1.
- 3. Reducing our irreproducibility. Nature. 2013;496:398–398.
- 4. Carpenter AE, Jones TR, Lamprecht MR, Clarke C, Kang IH, Friman O, Guertin DA, Chang JH, et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome biology. 2006;7:R100. doi: 10.1186/gb-2006-7-10-r100.
- 5. Kamentsky L, Jones TR, Fraser A, Bray MA, Logan DJ, Madden KL, Ljosa V, Rueden C, et al. Improved structure, function and compatibility for CellProfiler: modular high-throughput image analysis software. Bioinformatics. 2011;27:1179–1180. doi: 10.1093/bioinformatics/btr095.
- 6. Rajaram S, Pavie B, Wu LF, Altschuler SJ. PhenoRipper: software for rapidly profiling microscopy images. Nature methods. 2012;9:635–637. doi: 10.1038/nmeth.2097.
- 7. de Chaumont F, Dallongeville S, Chenouard N, Herve N, Pop S, Provoost T, Meas-Yedid V, Pankajakshan P, et al. Icy: an open bioimage informatics platform for extended reproducible research. Nature methods. 2012;9:690–696. doi: 10.1038/nmeth.2075.
- 8. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, et al. Fiji: an open-source platform for biological-image analysis. Nature methods. 2012;9:676–682. doi: 10.1038/nmeth.2019.
- 9. World Wide Web Consortium. Scalable Vector Graphics (SVG) 1.1 W3C Recommendation. 2011.
- 10. Ince DC, Hatton L, Graham-Cumming J. The case for open computer programs. Nature. 2012;482:485–488. doi: 10.1038/nature10836.
- 11. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. SIGKDD Explorations. 2009;11:10–18.
- 12. Otsu N. A threshold selection method from gray-level histograms. IEEE Trans. Sys., Man, Cyber. 1979;9:62–66.