Whole slide imaging (WSI) is believed to have the potential to replace the optical microscope as the means of reviewing histopathology for anatomic diagnosis. Potential benefits include improved efficiency, cost-effectiveness, and accessibility of high-quality pathology review, as well as potentially improved diagnostic accuracy and reproducibility. Many parties, including practicing and academic pathologists, manufacturers, health system administrators, regulators, and patients, want to see the widespread adoption of safe and effective digital pathology. That adoption has nonetheless faced obstacles, paramount among them the fact that manual interpretation has been the gold standard for over a century, but also economics, infrastructure, workflow, and concerns that whole slide images may not be adequate for diagnostic needs. This last point is embodied by the fact that WSI has not been approved for broad use in the US, owing to a lack of the regulatory science and data needed to assure safety and effectiveness. In an effort to advance the field of digital pathology, the authors propose an open working group focused on defining and characterizing the technical and clinical components of digital pathology.
A whole slide image (virtual slide) is a digital reproduction of an optical image of a real physical object (a section of stained tissue on a glass slide). WSI systems, as with all digital imaging systems, capture a finite amount of information from the object. Which of that information is important, within the broad scope of histo- and cytomorphologic detail a pathologist uses to render a diagnosis, remains unclear. These issues are often discussed in terms of “image fidelity” or “image quality”; however, neither term is entirely appropriate or accurate. Rather, the question is: How do the imaging characteristics affect the degree to which a digital image is “fit for purpose”, for example, fit for some diagnostic task performed on a computer screen by a pathologist? In fact, the impact may differ across diagnostic tasks and may differ again when image analysis is involved, as with computer-aided diagnosis. The challenge is to identify tasks that stress imaging characteristics and yield results that generalize to other tasks, and to design studies that compare pathologist performance on these tasks with WSI to performance with a microscope.
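To make the distinction concrete: a generic full-reference fidelity score, such as the structural similarity index (SSIM), can be computed between co-registered captures of the same field from a microscope camera and a WSI scanner. The sketch below (a minimal example; the file names are hypothetical and spatial alignment is assumed to have been done upstream) shows such a computation, while the point of the passage above is that a generic score like this does not by itself establish fitness for any particular diagnostic task.

```python
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

# Co-registered captures of the same field (file names are hypothetical);
# spatial alignment is assumed to have been handled upstream.
ref = np.asarray(Image.open("microscope_field.png").convert("L"))
wsi = np.asarray(Image.open("wsi_field.png").convert("L"))

# SSIM summarizes structural agreement on a 0-1 scale; a high score does
# not by itself establish that the image is fit for a diagnostic task.
score = structural_similarity(ref, wsi, data_range=255)
print(f"SSIM (WSI vs. microscope capture): {score:.3f}")
```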
Experienced pathologists can readily detect differences between the microscope image and whole slide images, and indeed between whole slide images or microscopes from different vendors. They may even cite these subjective differences as a deficiency of WSI (and as a reason for nonadoption of the technology), but it is not clear that these differences lead to differences in diagnostic performance. One reason for the lack of clarity is that the quantitative assessment of technical performance characteristics (resolution, contrast, dynamic range) throughout the WSI imaging chain (illumination, optics, scanner, image processing, transmission, and display) has not been comprehensively evaluated and appreciated, despite being considered the first level of efficacy to be addressed.[1] The differences between the optimal microscope and WSI are sufficiently substantial that comparisons of user performance must examine the broad scope of the activity of histopathologic diagnosis. Diagnostic performance is the second level of efficacy to be addressed.[1] If this level is approached with a clear and comprehensive understanding of the technical performance characteristics, it will be possible to link the two. At the same time, data from a technical assessment can complement clinical data and may reduce the size, duration, and cost of clinical studies. Currently, it is not clear how technical performance metrics affect the diagnostic performance of pathologists. On one hand, several reports of the implementation of WSI in clinical use and initial validation studies have been published, some quite large and long-term.[2,3,4] On the other hand, WSI systems might be generating images that carry an inherent risk of increased diagnostic error compared with diagnoses made with a microscope. Regardless, systematic evaluation is an element of quality assurance.
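To illustrate what quantitative assessment of these characteristics might look like in practice, the following minimal sketch estimates dynamic range, contrast, and a crude resolution proxy from a scanned test target. The file name, region coordinates, and the assumption of a dark-to-bright edge are all hypothetical and depend on the target layout; a rigorous protocol would instead use standardized targets and full modulation transfer function analysis.

```python
import numpy as np
from PIL import Image

# Load a scanned test-target tile as grayscale (hypothetical file name).
img = np.asarray(Image.open("scanned_target.tif").convert("L"), dtype=float)

# Dynamic range: spread between near-black and near-white content,
# estimated robustly from the 1st and 99th intensity percentiles.
lo, hi = np.percentile(img, [1, 99])
dynamic_range = hi - lo

# Michelson contrast over a bar-pattern region of interest
# (coordinates are hypothetical and depend on the target layout).
roi = img[100:200, 100:200]
i_max, i_min = roi.max(), roi.min()
michelson = (i_max - i_min) / (i_max + i_min)

# Resolution proxy: the 10-90% rise distance of an edge-spread function
# sampled along one scan line crossing a dark-to-bright edge.
edge_profile = img[150, 300:360]
span = edge_profile.max() - edge_profile.min()
lo_t = edge_profile.min() + 0.1 * span
hi_t = edge_profile.min() + 0.9 * span
rise_px = np.argmax(edge_profile > hi_t) - np.argmax(edge_profile > lo_t)

print(f"dynamic range: {dynamic_range:.1f} gray levels")
print(f"Michelson contrast: {michelson:.3f}")
print(f"edge rise distance: {rise_px} px")
```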
Data on the performance of the pathologist in relation to technical performance are limited, at best. Apart from obvious comments about the difficulty of identifying small “objects”, such as Helicobacter organisms or mitoses, or general comments about “out of focus” areas on whole slide images, only rare qualitative allusions to the general effect of technical performance on diagnostic performance have been published. Examples include difficulties interpreting dysplasia/atypia,[5] difficulties with micrometastasis detection in one of the authors’ work,[6] the distinction of reactive atypia from adenocarcinoma and (less seriously) of neutrophils from eosinophils in esophagitis,[2] and discordances in the diagnosis of skin lesions attributed to difficulty in identifying eosinophils and apoptotic cells and in grading cytological atypia.[7] In all cases, the difficulties were noted to be rare and sporadic. No single causative factor was explicitly identified; rather, the discordances were attributed by the authors to a combination of factors, including scan focus, compression, color reproduction, dynamic range, and display factors, as well as to the specific clinical problem being addressed. Gilbertson et al.[5] commented that the effects of various image quality factors may be additive.
These findings have typically not been examined in quantitative experiments; they emerged from early-phase studies focused on the general application of digital pathology as a proof of concept. As such, these studies did not address the limits of the new technology or the requirements for its adoption into clinical practice. For those questions to be answered, appropriate study designs are needed. Many of the published trials and reviews of WSI are small in terms of sample size per diagnostic task;[8] they often lack a description of the technical characteristics of the WSI imaging chain (e.g., illumination, scanner optics, camera calibration, display calibration), and they do not control for variables such as inter-scanner, inter-display, or inter-observer variability, or for differences in tissue preparation or staining protocols that might interact with image properties. Issues such as adequate sampling for specific tasks, enriching data for important population subgroups, and accounting for reader variability need to be thoughtfully addressed. Such study designs and analyses are not trivial and have not yet been embraced in the WSI evaluation literature, but they have been instrumental in the evaluation, regulatory approval, and community adoption of digital imaging systems in radiology.[9,10,11]
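By way of illustration, the reader studies used in radiology follow a multi-reader, multi-case design[9,10,11] in which a reader-averaged figure of merit, such as the area under the ROC curve (AUC), is compared across modalities. The sketch below is a minimal, fully simulated Python example: the reader counts, case counts, and effect sizes are invented, and a real analysis would also model the correlations between readers and cases (e.g., with Obuchowski-Rockette variance components or bootstrap resampling) rather than a simple average.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(scores_pos, scores_neg):
    """Empirical AUC: P(score_pos > score_neg), with ties counted as 1/2."""
    pos = scores_pos[:, None]
    neg = scores_neg[None, :]
    return (pos > neg).mean() + 0.5 * (pos == neg).mean()

# Simulated multi-reader multi-case study: each reader rates diseased and
# non-diseased cases under both modalities (all values are synthetic).
n_readers, n_pos, n_neg = 8, 60, 60
auc_wsi, auc_scope = [], []
for _ in range(n_readers):
    # Hypothetical effect sizes; a real study estimates these from ratings.
    auc_scope.append(auc(rng.normal(1.2, 1, n_pos), rng.normal(0, 1, n_neg)))
    auc_wsi.append(auc(rng.normal(1.1, 1, n_pos), rng.normal(0, 1, n_neg)))

diff = np.mean(auc_wsi) - np.mean(auc_scope)
print(f"reader-averaged AUC difference (WSI - microscope): {diff:+.3f}")
```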
This lack of knowledge and inability to assess the impact of the technical characteristics of WSI lead to uncertainty with multiple consequences. We do not know with certainty whether the specifications of existing systems are adequate for general diagnostic use, whether they might be made adequate with relatively minor adjustments (e.g., to image compression levels or display settings) or with additional pathologist training,[12,13] or whether digital pathology at its current technology level is adequate for some diagnostic tasks but not for others. Existing users of WSI may be unaware of the technical performance levels of their device, how to maintain those levels, and what tasks they can accomplish at those levels. New users may be reluctant to take up WSI if its technical or diagnostic performance is perceived as inferior to the microscope. Regulators currently do not have the data necessary to approve WSI as a primary diagnostic modality. Vendors do not have clear guidance on what constitutes “adequate” technical performance for diagnostic use, relying instead on subjective and often conflicting pathologist opinion.
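As a concrete example of one such “relatively minor adjustment”, the effect of JPEG compression level on a representative tile can be screened objectively before committing to a reader study. The sketch below is a minimal example with a hypothetical file name; PSNR is used only as a cheap screening surrogate, since, as argued above, adequacy must ultimately be established per diagnostic task.

```python
import io
import numpy as np
from PIL import Image

def psnr(a, b):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255**2 / mse)

# A representative tile from a scanned slide (hypothetical file name).
tile = Image.open("wsi_tile.png").convert("RGB")
ref = np.asarray(tile)

# Re-encode at decreasing JPEG quality and track fidelity; a task-based
# study would instead measure pathologist performance at each setting.
for q in (95, 85, 70, 50, 30):
    buf = io.BytesIO()
    tile.save(buf, format="JPEG", quality=q)
    buf.seek(0)
    rec = np.asarray(Image.open(buf).convert("RGB"))
    print(f"quality={q:2d}  PSNR={psnr(ref, rec):.1f} dB")
```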
To address the issues above, we propose a working group of stakeholders (industry, clinicians, academia, and government) interested in advancing the evaluation of WSI. The overarching goal of the working group is to characterize WSI properly, with systematic technical measurements and validation studies that would allow the clinical utility of digital pathology to be maximized. Much of this characterization will use the microscope to establish baseline performance expectations. The short-term objectives of the working group are:
To form a group of interested parties
To lay out the key technical performance metrics for WSI: Gather information on the current state of the science, identify gaps in knowledge and unmet needs, and identify circumstances in which technical performance has been linked to diagnostic performance
To raise awareness of the issues among pathologist users, vendors, regulators, and research and healthcare funding agencies.
If we are successful with these modest short-term objectives, we will consider more ambitious long-term objectives that facilitate and promote research in this area, aiming to:
Develop, standardize, and explore a range of technical performance metrics in WSI
Design and execute experiments investigating pathologist performance as a function of image quality
Create and disseminate methods, tools, and examples for evaluating technical and diagnostic performance (phantoms, shared sets of slides, WSI images, protocols, study designs, analysis methods, and source code); a minimal phantom-generation sketch follows this list.
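As a minimal example of the kind of shareable tool envisioned in the last item, the sketch below generates a synthetic frequency-sweep (chirp) phantom: spatial frequency increases from left to right, so the column at which bar contrast visibly fades gives a quick probe of a system's resolution limit. The pattern dimensions and frequency bounds are arbitrary choices for illustration, not a proposed standard.

```python
import numpy as np
from PIL import Image

# Generate a horizontal frequency-sweep (chirp) phantom: spatial frequency
# rises from f0 to f1 cycles/pixel across the image width.
h, w = 256, 1024
x = np.arange(w, dtype=float)
f0, f1 = 0.005, 0.25                   # frequencies at the left/right edges
freq = f0 + (f1 - f0) * x / (w - 1)
phase = 2 * np.pi * np.cumsum(freq)    # integrate frequency to get phase
pattern = 127.5 * (1 + np.sin(phase))  # map [-1, 1] to [0, 255]
phantom = np.tile(pattern, (h, 1)).astype(np.uint8)

Image.fromarray(phantom).save("chirp_phantom.png")
```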
These objectives will be further refined based on feedback and the expertise of the working group participants. We expect that the contributions of this group will make it easier for investigators to answer the key questions related to the validation of digital pathology and its adoption into clinical practice.
We invite you to join this working group (http://nciphub.org/groups/wsi_working_group), and we ask that you share this invitation with motivated and interested groups and individuals. We believe that the time and environment are ripe for this working group, and we have received encouragement and support from industry and government.
Footnotes
Available FREE in open access from: http://www.jpathinformatics.org/text.asp?2015/6/1/4/151880
REFERENCES
- 1. Fryback DG, Thornbury JR. The efficacy of diagnostic imaging. Med Decis Making. 1991;11:88–94. doi: 10.1177/0272989X9101100203.
- 2. Bauer TW, Schoenfield L, Slaw RJ, Yerian L, Sun Z, Henricks WH. Validation of whole slide imaging for primary diagnosis in surgical pathology. Arch Pathol Lab Med. 2013;137:518–24. doi: 10.5858/arpa.2011-0678-OA.
- 3. Thorstenson S, Molin J, Lundström C. Implementation of large-scale routine diagnostics using whole slide imaging in Sweden: Digital pathology experiences 2006-2013. J Pathol Inform. 2014;5:14. doi: 10.4103/2153-3539.129452.
- 4. Evans AJ, Chetty R, Clarke BA, Croul S, Ghazarian DM, Kiehl TR, et al. Primary frozen section diagnosis by robotic microscopy and virtual slide telepathology: The University Health Network experience. Hum Pathol. 2009;40:1070–81. doi: 10.1016/j.humpath.2009.04.012.
- 5. Gilbertson JR, Ho J, Anthony L, Jukic DM, Yagi Y, Parwani AV. Primary histologic diagnosis using automated whole slide imaging: A validation study. BMC Clin Pathol. 2006;6:4. doi: 10.1186/1472-6890-6-4.
- 6. Randell R, Ambepitiya T, Mello-Thoms C, Ruddle RA, Brettle D, Thomas RG, et al. Effect of display resolution on time to diagnosis with virtual pathology slides in a systematic search task. J Digit Imaging. 2014. doi: 10.1007/s10278-014-9726-8.
- 7. Velez N, Jukic D, Ho J. Evaluation of 2 whole-slide imaging applications in dermatopathology. Hum Pathol. 2008;39:1341–9. doi: 10.1016/j.humpath.2008.01.006.
- 8. Gavrielides MA, Conway C, O’Flaherty N, Gallas BD, Hewitt SM. Observer variability in the interpretation of HER2/neu immunohistochemical expression with unaided and computer-aided digital microscopy. Anal Cell Pathol. 2014. doi: 10.3233/ACP-140090.
- 9. Wagner RF, Metz CE, Campbell G. Assessment of medical imaging systems and computer aids: A tutorial review. Acad Radiol. 2007;14:723–48. doi: 10.1016/j.acra.2007.03.001.
- 10. Gallas BD, Chan HP, D’Orsi CJ, Dodd LE, Giger ML, Gur D, et al. Evaluating imaging and computer-aided detection and diagnosis devices at the FDA. Acad Radiol. 2012;19:463–77. doi: 10.1016/j.acra.2011.12.016.
- 11. Zhou XH, Obuchowski NA, McClish DK. Statistical Methods in Diagnostic Medicine. Hoboken, NJ: John Wiley and Sons; 2009.
- 12. Weinstein RS, Descour MR, Liang C, Bhattacharyya AK, Graham AR, Davis JR, et al. Telepathology overview: From concept to implementation. Hum Pathol. 2001;32:1283–99. doi: 10.1053/hupa.2001.29643.
- 13. Dunn BE, Choi H, Recla DL, Kerr SE, Wagenman BL. Robotic surgical telepathology between the Iron Mountain and Milwaukee Department of Veterans Affairs Medical Centers: A 12-year experience. Hum Pathol. 2009;40:1092–9. doi: 10.1016/j.humpath.2009.04.007.