Abstract
This paper describes the results of comparisons of digitally scanned whole slide images (WSI) and glass microscope slides for diagnosis of tissues under peer review by the National Toxicology Program (NTP). Findings in this paper were developed as a result of data collected from six Pathology Working Groups (PWGs), one pathology peer review, and survey comments from over 25 participating pathologists. For each PWG, 6–14 pathologists examined 10–143 tissues per study from 6 and 9-month perinatal studies and 2-year carcinogenicity studies. Overall it was found that evaluation of WSI is generally equivalent to using glass slides. Concordance of PWG consensus diagnoses based upon review of WSI vs. glass slides ranged from 74% to 100% (median 86%). The intra- and inter-observer diagnostic variation did not appear to influence the conclusions of any study. Based upon user opinions collected from surveys, WSI may be less optimal than glass slides for evaluation of subtle lesions, large complex lesions, small lesions in a large section of tissue, and foci of altered hepatocytes. These results indicate that, although there may be some limitations, the use of WSI can effectively accomplish the objectives of a conventional glass slide review and definitely serves as a useful adjunct to the conduct of PWGs.
Keywords: digital images, digital pathology, digital slides, pathology working group, peer review, whole slide images, whole slide imaging
INTRODUCTION
The rapid technical advances recently achieved in digital scanning of glass microscope slides, image processing, and digital storage, combined with high speed networks and improvements in personal computers, are making examination of digitally imaged tissue sections an increasingly viable option for histopathology evaluation. The potential benefits resulting from this technology are seemingly limitless. Diagnostic evaluations and second opinions can be done remotely, worldwide, and without the additional expense and time associated with travel. Slides do not need to be shipped, thus avoiding loss, breakage, or shipping expenses. As other disciplines increasingly utilize the digital medium to advance their work, the use of this technology may be viewed as an essential progression for pathologists.
Many of the newer concepts, strategies, and approaches discussed by McCullough et al. (2004) in regard to the integration of digital imaging and light microscopy have come to fruition. However, it is imperative that digital technology not be automatically implemented in toxicologic pathology without due consideration of accuracy, advantages, and disadvantages in the regulated environment. Guidelines for validation of digital pathology systems in the regulated nonclinical environment or the clinical diagnostic environment are documented in Long et al. (2013) and Pantanowitz et al. (2013). As indicated by McCullough et al. (2004), there will be an expected reluctance to accept new or unproven methodology. Therefore, to improve the comfort level of pathologists and ensure adequacy, these new methods need to be compared to, and evaluated against, time-tested and accepted conventional methodologies.
The National Toxicology Program (NTP) staff began the transition from film-based technology to digital technology to capture images of lesions on microscopic slides in 1997. In 2002, the NTP, and later the National Center for Toxicologic Research (NCTR), began using whole slide scanning technology to acquire and store whole slide images of glass microscope slides. With the advent of this technology, the NTP has created a digital image database consisting of digitally converted 2×2 slides, original digital photographs and photomicrographs, magnetic resonance imaging (MRI) images, ultrasound images, and digitally scanned whole slide images. There are currently over 80,000 digital images in the NTP database. The NTP’s digital images have been used in a variety of endeavors, including international nomenclature harmonization efforts and development of the online NTP Non-Neoplastic Lesion Atlas (Cesta et al., 2014).
The NTP has recently evaluated the use of digitally scanned microscope slides (also referred to as whole slide images or, hereafter, WSI) for pathology peer reviews and Pathology Working Groups (PWGs). The current peer review process for NTP studies entails a multi-level review of the findings initially generated by the Study Pathologist at a contract laboratory, including a pathology data review, an audit of pathology specimens, a pathology Quality Assessment (QA) peer review, and finally a PWG. The PWG review, the last stage of the peer review process for NTP studies, is typically a face-to-face meeting in which a panel of pathologists, including experts in the topic area of the PWG, reviews slides and then discusses and votes on the diagnoses of challenging lesions. The majority vote is considered the consensus diagnosis. The Study Pathologist and QA Pathologist attend the PWG in person or via tele- and videoconference.
There has been interest in using WSI for reviews currently performed by NTP staff and NCTR support personnel, with the goals of increasing efficiency and decreasing travel time and expenses while involving the Study Pathologist and more outside expert opinions. After individual pathologists have evaluated the WSI remotely, they can then use web-conferencing to discuss and determine a consensus diagnosis with other pathologists while the images are being viewed in “real-time” by a PWG. To this end, the NTP carried out two initial “virtual” PWGs in 2007 using WSI. The diagnoses and conclusions for these studies were subsequently confirmed by PWGs using the original glass microscope slides. Since these initial PWGs in 2007, WSI have been used in some manner in all NTP PWGs. Typically, the consensus opinions are derived from direct examination of the tissue sections on glass microscope slides. The WSI have largely been utilized to supplement the PWG process by illustrating findings, pointing out areas of interest during discussion of diagnoses, resolving questions, and ensuring that all PWG members observe key diagnostic features of a finding and are considering the same tissue or region of tissue as intended by the PWG Coordinator. This paper describes a series of evaluations to define and refine the use of this technology in the PWG and peer review setting.
The objective of this series of exercises was to compare diagnoses made using digitally scanned WSI in PWGs (hereafter referred to as a digital PWG) to diagnoses made based on glass slides (hereafter referred to as a conventional glass PWG) to evaluate the utility of WSI for pathology peer review. This study also presents the diagnoses and opinion poll results of over 25 participating pathologists evaluating the experience of using WSI for peer review and PWGs. The overall goal was to assess the sufficiency of using WSI for data interpretation in a regulated nonclinical environment for pathology peer review and PWGs.
MATERIALS AND METHODS
This paper describes a series of 7 exercises used to determine whether the examination of digitally scanned whole slide images (WSI) would result in the same outcome as the direct examination of the glass microscope slides. Studies included 6 and 9-month perinatal studies and 2-year carcinogenicity bioassays. The exercises included proliferative and non-proliferative findings. Most changes under review were proliferative and included those in a continuum of pre-neoplastic lesions, hyperplasias, and neoplasms. All slides and images were relabeled prior to scanning so the participating pathologists were blinded to any treatment or control designations. Diagnostic reproducibility, observational data by the authors, and comments from participating pathologists were documented. For exercises in which the same pathologists participated in both the digital and conventional glass PWGs/reviews, the time between the two PWGs/reviews ranged from 2 to 7 weeks. The first 4 exercises with ongoing conventional glass slide PWGs compared the diagnoses between the conventional glass and digital PWGs. In one instance, a QA peer review was conducted at a contract research organization using WSI (Exercise 3). A larger group of pathologists with no prior knowledge of individual diagnoses from the original conventional PWGs participated in the final 3 exercises, one of which focused on relatively subtle findings in common target organs (Exercise 7). The results of these comparisons were used to evaluate the usefulness of WSI. In all cases, the glass slide evaluations remained the official NTP and NCTR data.
RESULTS
Exercise No. 1: Comparison of Conventional Glass and Digital PWGs on Proliferative Lesions in p53 (+/−) Transgenic Mice in 6 and 9-Month Perinatal Studies
An initial digital PWG was followed by a conventional PWG three weeks later. Forty-seven slides were scanned at 40x magnification. The slides represented the following issues: differences of opinion between the Study Pathologist and the QA Pathologist that were not resolved following the QA review; potential treatment-related proliferative changes in the liver, spleen, lymph nodes, and thymus; and unusual neoplasms. PWG participants reviewed the WSI and were instructed to record their diagnoses on worksheets for each image prior to the digital PWG. The digital PWG was held via teleconference with the PWG Coordinator and the six participants located in North Carolina, Arkansas, and the state of Washington. During the digital PWG teleconference, each image was reviewed and available for viewing via projection. The pathologists discussed the diagnosis or diagnoses for each image and reexamined images if necessary. A consensus diagnosis was reached when at least 4 of the 6 PWG participants were in agreement. The same 6 pathologists reviewed the same 47 slides in a conventional glass PWG held at the NCTR in Arkansas three weeks later. This exercise compared concordance of the consensus diagnoses from the digital PWG to those of the conventional glass slide PWG.
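The 4-of-6 consensus threshold described above can be sketched in a few lines of Python. This is only an illustration: the function name `consensus_diagnosis`, the `min_votes` parameter, and the example votes are hypothetical and are not taken from NTP worksheets or software.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_diagnosis(votes: Sequence[str], min_votes: int = 4) -> Optional[str]:
    """Return the diagnosis backed by at least `min_votes` participants, else None."""
    if not votes:
        return None
    # most_common(1) yields the single most frequent diagnosis and its count.
    diagnosis, count = Counter(votes).most_common(1)[0]
    return diagnosis if count >= min_votes else None


# Hypothetical votes from six participants on one liver image.
votes = ["hepatocellular adenoma"] * 4 + ["basophilic cell focus"] * 2
print(consensus_diagnosis(votes))

# A 3-3 split does not reach the 4-of-6 threshold, so no consensus is returned.
split = ["hepatocellular adenoma"] * 3 + ["hepatocellular carcinoma"] * 3
print(consensus_diagnosis(split))
```

Returning `None` rather than an arbitrary winner keeps unresolved cases visible so that they can be flagged for further discussion.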
Exercise No. 1 Results
Overall, the consensus diagnoses of the digital PWG and the conventional glass PWG, held 3 weeks apart with the same participants, were concordant in 46 of 47 (98%) cases. The single discrepant diagnosis, which had no effect on the study conclusions, was in the consensus diagnosis for the liver of 1 animal (hepatocellular adenoma in the digital PWG, basophilic cell focus in the conventional glass PWG). In the digital PWG, only 3 cases out of 47 did not have a consensus concerning the diagnosis prior to further discussion. These involved the differentiation between foci of altered hepatocytes and hepatocellular adenoma and between hepatocellular adenoma and carcinoma. Participants noted that it was very difficult to differentiate between normal splenic tissue and minimal degrees of lymphoid hyperplasia using WSI in the digital PWG and between lymphoid hyperplasia and lymphoma using glass slides in the conventional PWG. A variety of distinctive tumors and lesions were easily confirmed, including olfactory neuroblastoma, alveolar bronchiolar adenoma, yolk sac carcinoma, and seminiferous tubule degeneration.
Exercise No. 2: Comparison of Conventional Glass PWG after a Digital Review of Carcinogenicity Bioassays in F344 Rats and B6C3F1 Mice
The PWG review included a two-phase process with the same 7 pathologists and PWG Coordinator in each phase. In the first phase, the pathologists remotely examined the scanned images at their work locations in North Carolina, Arkansas, and the state of Washington. The pathologists were instructed to complete their digital review of the scanned images using the web viewer and submit their recorded diagnoses to the PWG Coordinator. The slides were scanned at 20x magnification. Slides selected for rats included degenerative lesions in the mesenteric lymph node, as well as proliferative lesions in the forestomach and large intestine (100 total rat cases). Slides selected for mice included proliferative lesions of the Harderian gland, large intestine, glandular stomach, liver, and pituitary gland (48 total mouse cases). The second phase was a conventional glass PWG held two weeks later at the NCTR. This exercise compared concordance of the consensus diagnoses based on review of WSI to those of the conventional glass slide PWG.
Exercise No. 2 Results: F344 Rat
Agreement between the digital and glass slide consensus diagnoses for the rat lesions was 89%. The final PWG conclusion would have been the same whether using WSI or glass slides. Of the 100 rat cases evaluated using both methods, there were 11 differences between the digital and glass slide consensus diagnoses, and they often involved the distinction between hyperplasia, adenoma, and carcinoma (Table 1). Agreement between each pathologist's own WSI and glass slide diagnoses (a measure of intra-observer variability) for the 6 of 7 pathologists who completed the rat cases ranged from 55% to 91% (mean 74%, median 73%).
Table 1.
Discordant diagnoses between digital images and glass slides for rats (11 of 100 cases, Exercise No. 2).
Digital consensus | Glass consensus | No. of cases |
---|---|---|
Forestomach – hyperplasia | Forestomach – papilloma | 1 |
Large intestine – adenoma | Large intestine – autolysis | 1 |
Large intestine – adenoma | Large intestine – carcinoma | 1 |
Large intestine – adenoma | Large intestine – hyperplasia | 2 |
Large intestine – carcinoma | Large intestine – adenoma | 1 |
Large intestine – carcinoma | Large intestine – hyperplasia | 1 |
Large intestine – hyperplasia | Large intestine – adenoma | 1 |
Large intestine – hyperplasia | Large intestine – carcinoma | 3 |
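
As a cross-check, the counts in Table 1 can be tallied to reproduce the 89% consensus agreement and to see how many discordant cases fall within the hyperplasia, adenoma, and carcinoma continuum. The sketch below is illustrative only; the variable names and the choice of which diagnoses count as part of the continuum are assumptions made for this tally.

```python
# Rows of Table 1: (organ, digital consensus, glass consensus, number of cases).
TABLE_1 = [
    ("forestomach", "hyperplasia", "papilloma", 1),
    ("large intestine", "adenoma", "autolysis", 1),
    ("large intestine", "adenoma", "carcinoma", 1),
    ("large intestine", "adenoma", "hyperplasia", 2),
    ("large intestine", "carcinoma", "adenoma", 1),
    ("large intestine", "carcinoma", "hyperplasia", 1),
    ("large intestine", "hyperplasia", "adenoma", 1),
    ("large intestine", "hyperplasia", "carcinoma", 3),
]
TOTAL_RAT_CASES = 100

# Diagnoses treated here as points on the proliferative continuum (an assumption
# made for this tally; papilloma and autolysis are deliberately left out).
CONTINUUM = {"hyperplasia", "adenoma", "carcinoma"}

discordant = sum(n for _, _, _, n in TABLE_1)
agreement_pct = 100 * (TOTAL_RAT_CASES - discordant) / TOTAL_RAT_CASES
on_continuum = sum(
    n for _, digital, glass, n in TABLE_1
    if digital in CONTINUUM and glass in CONTINUUM
)

print(f"discordant cases: {discordant}")
print(f"consensus agreement: {agreement_pct:.0f}%")
print(f"discordant cases within the continuum: {on_continuum} of {discordant}")
```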
Exercise No. 2 Results: B6C3F1 Mouse
Agreement between the consensus diagnoses based on mouse WSI and those based on glass slides was 75%. In the mouse review, the final PWG conclusions and study interpretation would also have been the same regardless of methodology. Of the 48 cases reviewed using WSI and glass slides, there were 12 differences (Table 2). Intra-observer agreement for the 6 of 7 individual pathologists who completed the mouse cases ranged from 67% to 89% (mean 83%, median 84%). Lack of concordance between the WSI and glass slide review consensus diagnoses for both the rat and mouse lesions usually centered on distinguishing minimal lesions and borderline lesions along the proliferative continuum (Tables 1 and 2).
Table 2.
Discordant diagnoses between digital images and glass slides for mice (12 of 48 cases, Exercise No. 2).
Digital consensus | Glass consensus | No. of cases |
---|---|---|
Colon – goblet cell hyperplasia | No remarkable lesion | 1 |
Harderian gland – adenoma | Harderian gland – carcinoma | 1 |
Hepatocellular adenoma | Hepatocellular carcinoma | 1 |
No consensus | Glandular stomach – epithelial hyperplasia | 2 |
No consensus | Hepatocellular adenoma | 2 |
No consensus | Pituitary gland, pars distalis – hyperplasia | 1 |
No remarkable lesion | Glandular stomach – epithelial hyperplasia | 2 |
Pituitary gland, pars distalis – hyperplasia | No remarkable lesion | 1 |
Small intestine – adenoma | Small intestine – carcinoma | 1 |
Note: Because only 6 of the 7 pathologists completed the review of the mouse digital images, no consensus was established if the diagnoses were evenly divided between the two diagnostic choices.
Exercise No. 3: Focused Digital Review of Brain and Spinal Cord Proliferative Lesions
This exercise describes our experience with a relatively large QA peer review of digitally scanned slides. Rat and mouse brain and spinal cord slides (1916 slides) from a 2-year carcinogenesis bioassay were scanned at 20x magnification and the WSI were remotely examined using a web viewer. One pathologist evaluated the WSI of the rat and a second pathologist reviewed the WSI of the mouse. Both were asked to review for proliferative lesions only. The results were compared to the Study Pathologist’s (SP) diagnoses.
Exercise No. 3 Results
Although there were delays owing to limitations in the amount of storage available on the server, the time required to scan and view the large number of slides, and the slow image loading in the web viewer, the QA review process using WSI was completed sooner than could have been achieved by having the reviewing pathologists coordinate traveling to the NCTR to examine the glass microscope slides. Additionally, the digital QA review resulted in significant cost savings. This digital review, which focused on identification of proliferative lesions, was sufficient to identify five additional brain tumors and three cases of spinal cord gliosis. These new findings prompted a glass slide review, which verified the findings of the digital review.
Exercise No. 4: Comparison of Digital and Conventional Glass PWGs of a Chronic Bioassay in p53 (+/−) Transgenic Mice
This exercise compared consensus diagnoses from a conventional glass PWG held at the NCTR with those from a digital PWG held 50 days later. The 91 slides were scanned at 20x magnification and represented ranges of normal (notable variability in pancreatic islet size), spontaneous lesions (hydronephrosis), and diagnostic difficulty (lymphoreticular neoplasms). Images reviewed at the digital PWG were the same cases evaluated at the conventional glass PWG. All participants in the digital PWG reviewed the images ahead of time and submitted their diagnoses to the PWG Coordinator. Five participants from the conventional glass PWG, plus two additional pathologists, participated in the digital PWG at their work locations in Arkansas, North Carolina, Maryland, and Connecticut via computer connections and teleconference.
Exercise No. 4 Results
Overall, there was 85% agreement (77/91 cases) between consensus diagnoses based on glass slides from the original conventional PWG and those made using WSI during the subsequent digital PWG. There was 100% agreement on the relatively straightforward diagnoses of pancreatic islet hyperplasia (20/20 cases), hydronephrosis (9/9 cases), and Harderian gland tumor (1/1 case). There was 83% agreement (5/6 cases) when distinguishing sarcoma, fibrosarcoma, leiomyosarcoma, neuroblastoma, and granuloma in various tissues. As one might expect, there was less agreement (80%, 8/10 cases) on lesions within the diagnostic continuum of basophilic focus of altered hepatocytes, hepatocyte hyperplasia, hepatocellular adenoma, and hepatocellular carcinoma. The lowest agreement (75%, 34/45 cases) occurred when distinguishing between lymphoid hyperplasia, malignant lymphoma, histiocytic sarcoma, and granulocytic leukemia.
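The category counts above add up to the overall 77 of 91 figure, and they also show which diagnostic group contributed most of the discordant cases. The following sketch makes that arithmetic explicit; the category labels are shorthand chosen for this example, not terms used in the PWG reports.

```python
# Exercise 4 counts as reported in the text: category -> (concordant, total).
EXERCISE_4 = {
    "pancreatic islet hyperplasia": (20, 20),
    "hydronephrosis": (9, 9),
    "Harderian gland tumor": (1, 1),
    "sarcoma / neuroblastoma / granuloma group": (5, 6),
    "hepatocellular continuum": (8, 10),
    "lymphoid and hematopoietic lesions": (34, 45),
}

concordant = sum(agree for agree, _ in EXERCISE_4.values())
total = sum(n for _, n in EXERCISE_4.values())
overall_pct = 100 * concordant / total

# Share of all discordant cases contributed by each category.
discordant_share = {
    name: (n - agree) / (total - concordant)
    for name, (agree, n) in EXERCISE_4.items()
}

print(f"{concordant}/{total} concordant ({overall_pct:.0f}%)")
for name, share in sorted(discordant_share.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.0%} of discordant cases")
```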
Exercise No. 5: Comparison of Digital and Conventional Glass PWGs with Each Other and Previously Recorded PWGs
This exercise compared results from contemporaneous digital and glass PWGs to each other, and each was also compared to the original conventional glass NTP PWGs of 2-year carcinogenicity bioassays, using 14 pathologists who had no prior knowledge of the original conventional glass NTP PWG diagnoses. There were 17 rat slides and 11 mouse slides, all scanned at 20x magnification. Changes reviewed included proliferative lesions of the clitoral gland, larynx, liver, mammary gland, nose, pituitary gland, skin, small intestine, and thyroid gland. Non-proliferative lesions included inflammation of the larynx, metaplasia of the larynx, inflammation of the Harderian gland, and inflammation and atrophy in the nose. Two groups of seven pathologists reviewed both the glass slides and the WSI, separated in time by one month. All review answer sheets were submitted to and tabulated by the PWG Coordinator. The PWGs in this exercise were conducted in the traditional “round table” manner where the participants reviewed the data, examined either the glass slides or WSI, and discussed and voted on the lesions to achieve consensus (Fig. 1). Questionnaires were distributed after the PWGs to evaluate the users’ experiences and opinions.
Figure 1.
Pathologists participating in a digital Pathology Working Group.
Exercise No. 5 Results
Concordance of consensus diagnoses was 100% for rats (17/17) and 91% for mice (10/11), for an overall concordance of 96% between the digital and glass PWGs within this exercise. Interestingly, the consensus diagnoses of both the digital and the glass PWGs in this exercise differed more from those of the original glass NTP PWGs, which used different groups of pathologists than in the current exercise. The digital PWG consensus diagnoses in this exercise were the same as the original glass NTP PWGs 71% of the time. The glass PWG consensus diagnoses in this exercise were the same as the original glass NTP PWGs 75% of the time. Most disagreements were in diagnoses involving the diagnostic continuum of normal, hyperplasia, adenoma, and carcinoma. This suggests that intrinsic diagnostic variability may be a greater issue than the method of evaluation of lesions using glass slides or WSI. A disadvantage of this experimental design was that most pathologists (10 of 14) reported having a memory of cases from the previous review one month prior, be it glass or digital.
Participants noted that an important advantage of using WSI for a PWG included the ease of viewing critical areas of pertinent lesions on the screen by the group. The majority (12/14) of the participants agreed with the statement “Digitized images best serve as an adjunct to traditional PWGs”. The majority (11/14) of the participants thought it was very useful or essential to have the ability to project WSI of problem cases during past conventional NTP PWGs. The majority (12/14) of the participants agreed with the statement “Experts on the organ of interest should be able to view images digitally and vote remotely”.
Opinion questionnaires captured the concerns of participants that WSI evaluation may be more difficult than glass slide evaluation for the following: large tissue sections (e.g., brain), complex large lesions, large benign neoplasms vs. minimally invasive carcinomas, foci of altered hepatocytes (eosinophilic vs. mixed vs. basophilic), inflammatory processes with exuberant fibroplasia vs. mesenchymal neoplasia, nuclear changes such as contracted condensed nuclei vs. mitotic figures, subtle lesions, small lesions and/or intracellular changes for which 20x or 40x magnification is required, multifocal lesions such as cardiomyopathy, and tinctorial changes. Evaluation may be hampered by possible limitations in adjusting the focus depth in a tissue using WSI. These limitations perceived by the participants may have led to disagreement by the majority (9/14) of the participants with the statement “Digital PWGs will replace conventional PWGs in the future”. Most participants (10/14) felt that the quality of WSI was worse than viewing glass slides on a microscope. It should be noted, however, that a limitation in this exercise and some of the other exercises herein was that slides were scanned at 20x magnification rather than 40x. The conventional glass slide PWG took approximately the same amount of time as the digital PWG in this exercise, but the majority of participants felt that the conventional glass slide PWG was more accurate.
Exercise No. 6: Comparison of Digital Review with Previously Recorded Conventional Glass PWG
This exercise represents the largest comparison of WSI vs. glass slide review of PWG slides done thus far by the NTP. A panel of 14 pathologists, none of whom were present at the original PWG, reviewed WSI scanned at 20x magnification of 122 cases for rats and 143 cases for mice. Participants were given about one month to finish the task and submit their diagnoses.
Exercise No. 6 Results
Agreement of consensus diagnoses between the WSI review in this exercise and the original glass PWG was 86% for rat and 84% for mouse lesions (Tables 3 and 4). Only 9 of the 14 participants completed the task, for which there was no formal digital PWG conference. Participants took from 8 to greater than 20 hours to complete the review. Lack of agreement usually centered on the distinction along the proliferative continuum of hyperplasia, adenoma, and carcinoma, especially for thyroid and clitoral glands in rats or for skin in mice.
Table 3.
Digital and glass diagnostic agreement in a chronic study in rats (Exercise No. 6).
Original glass slide PWG diagnosis | No. of cases | Disagreements digital vs. glass | % Agreement digital and glass |
---|---|---|---|
Spleen, Liver – mononuclear cell leukemia | 8 | 0 | 100 |
Testis, Epididymis – mesothelioma | 20 | 0 | 100 |
Zymbal’s Gland carcinoma | 7 | 0 | 100 |
Oral Mucosa, Forestomach, Tongue – hyperplasia, papilloma | 20 | 2 | 90 |
Miscellaneous tumors | 14 | 2 | 86 |
Mammary Gland tumors | 11 | 2 | 82 |
Heart – cardiomyopathy, schwannoma | 20 | 4 | 80 |
Thyroid Gland – adenoma, carcinoma | 12 | 3 | 75 |
Clitoral Gland – hyperplasia, adenoma, carcinoma | 10 | 4 | 60 |
Total | 122 | 17 | 86 |
Table 4.
Digital and glass diagnostic agreement in a chronic study in mice (Exercise No. 6).
Original glass slide PWG diagnosis | No. of cases | Disagreements digital vs. glass | % Agreement digital and glass |
---|---|---|---|
Lung proliferative lesions | 9 | 0 | 100 |
Various lesions | 8 | 0 | 100 |
Mammary Gland tumors | 16 | 1 | 94 |
Forestomach proliferative lesions | 27 | 2 | 93 |
Ovary tumors | 8 | 1 | 88 |
Harderian Gland tumors, Eye – cataract | 34 | 5 | 85 |
Miscellaneous tissues/tumors for which SP and QAP had disagreed | 10 | 3 | 70 |
Skin/subcutaneous tumors | 31 | 11 | 65 |
Total | 143 | 23 | 84 |
Note: SP = Study Pathologist; QAP = Quality Assessment review Pathologist.
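
The percentages in Tables 3 and 4 follow directly from the case and disagreement counts. The sketch below recomputes the totals and lists the organ groups with the lowest agreement. The helper names and the 80% cutoff used to flag low agreement are arbitrary choices made for this illustration.

```python
# (organ group, number of cases, disagreements) copied from Tables 3 and 4.
RAT = [
    ("Spleen, Liver", 8, 0), ("Testis, Epididymis", 20, 0),
    ("Zymbal's Gland", 7, 0), ("Oral Mucosa, Forestomach, Tongue", 20, 2),
    ("Miscellaneous tumors", 14, 2), ("Mammary Gland", 11, 2),
    ("Heart", 20, 4), ("Thyroid Gland", 12, 3), ("Clitoral Gland", 10, 4),
]
MOUSE = [
    ("Lung", 9, 0), ("Various lesions", 8, 0), ("Mammary Gland", 16, 1),
    ("Forestomach", 27, 2), ("Ovary", 8, 1), ("Harderian Gland, Eye", 34, 5),
    ("Miscellaneous (SP/QAP disagreement)", 10, 3),
    ("Skin/subcutaneous", 31, 11),
]


def pct_agreement(cases: int, disagreements: int) -> int:
    """Percent agreement, rounded half up to a whole number."""
    return int(100 * (cases - disagreements) / cases + 0.5)


def summarize(rows, cutoff=80):
    """Return total cases, total disagreements, overall %, and groups below cutoff."""
    cases = sum(c for _, c, _ in rows)
    disagreements = sum(d for _, _, d in rows)
    low = [(name, pct_agreement(c, d)) for name, c, d in rows
           if pct_agreement(c, d) < cutoff]
    return cases, disagreements, pct_agreement(cases, disagreements), low


for species, rows in (("rat", RAT), ("mouse", MOUSE)):
    print(species, summarize(rows))
```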
Exercise No. 7: Comparison of Digital Review of Minimal to Mild Lesions in Common Target Organs with Previously Recorded Conventional Glass PWGs
The final exercise was designed to evaluate the use of WSI in the diagnosis of subtle, usually minimal to mild, lesions in 6 common target organs: liver, kidney, lung, skin, brain, and nasal cavity. A total of 22 slides with 31 diagnoses from previously recorded conventional PWGs were scanned at 40x magnification. A panel of 9 pathologists with no prior knowledge of the original diagnoses recorded their diagnoses on a worksheet with predetermined multiple-choice options to select from for each diagnosis. Results were compared to the diagnoses from previously conducted conventional glass PWGs. The pathologists were asked to rate tissues in order of ease of evaluation using a grading scale ranging from 1 (easiest) to 6 (most difficult). Following this exercise, participants completed a questionnaire evaluating their experiences and confidence in evaluating WSI.
Exercise No. 7 Results
Overall, there was 74% agreement of the individual diagnoses for participating pathologists with the original PWG diagnoses (range 58% to 84%). The least successful (22%) diagnostic concordance with an original PWG diagnosis was for a mixed cell focus in a mouse liver. The most successful (100%) diagnostic concordance involved diagnoses of hematopoietic cell proliferation and hepatocellular hypertrophy in the liver and papillary necrosis and tubular necrosis in the kidney in both rats and mice.
Of the six tissue types evaluated, skin (mean score = 1.8) was considered as the least difficult tissue to evaluate using WSI, followed by lung (2.1). Nasal cavity (3.2) and kidney (4.4) were considered intermediate in difficulty. Liver (4.9) was considered the most difficult tissue to evaluate digitally, followed by brain (4.6).
All participants agreed that the process was simple and easy to learn and use but that the WSI quality was slightly inferior to glass slides. Participants were most confident in using WSI to identify lesions that could be easily diagnosed at low magnification. They were least confident when using digital technology to detect subtle intracellular changes (e.g., hyaline droplet accumulation in the kidney), as well as slight tinctorial differences. Some participants expressed the opinion that large organs were especially challenging and difficult to examine. Users strongly disagreed with the effectiveness of using WSI by the Study Pathologist to evaluate an entire 90-day subchronic or a 2-year chronic study.
A few participants commented that they were dissatisfied and frustrated with some of the technical features and limitations of the image viewing technology (server access, refresh rate, restricted fine focus, etc.). Several participants commented on difficulty achieving optimal image focus (particularly at high magnification). Inefficient viewing was stated to result from protracted “refresh/resolution rates,” which led participants to describe the process as “too slow” or “not good/quick enough”. Several also mentioned that the time required to review WSI as compared to glass slides was generally excessive. Despite the technical limitations, participants stated that remote access to WSI, the capability of sharing images at multiple sites, and the ease of recording and archiving images were highly beneficial applications.
DISCUSSION
This series of exercises suggests that the review of whole slide images (WSI) is a useful, reliable, potentially cost-saving, and productive method in the PWG and/or peer review process. In general, the diagnostic quality and accuracy of toxicologic pathology diagnoses of rodent lesions made by viewing digitally scanned WSI were equivalent to those of diagnoses made using glass slides.
Accuracy of Whole Slide Images for Peer Review and PWGs
There was generally a high degree of diagnostic concordance between rodent lesion diagnoses based on glass slides and WSI evaluated during the PWG stage of the NTP peer review process (range 74% to 100%, mean 87%, median 86%). This range is remarkably similar to the concordance range of 73% to 98% reported in a literature review of studies validating WSI for clinical diagnostic use (Pantanowitz et al. 2013). The percentage of discordance may be less important than the type of discordance. In studies comparing concordance of diagnoses based on WSI and glass slides in the clinical setting of human medicine, concordance is often classified as no discrepancy (complete agreement between the two diagnoses), minor discrepancy (a difference in the two diagnoses with no effect on clinical care or prognosis), and major discrepancy (a difference that affects clinical care or prognosis) (e.g., see Jones et al. 2015). In the exercises described herein, many of the discrepancies were along the proliferative continuum (e.g., hyperplasia, adenoma, carcinoma). Such discrepancies would certainly have clinical ramifications (e.g., a patient with a precursor lesion vs. benign vs. malignant tumor). However, all proliferative findings along the continuum are taken into account in the final carcinogenic activity calls in NTP studies (i.e., clear evidence, some evidence, equivocal evidence, or no evidence of carcinogenic activity), so the final outcomes in the nonclinical, toxicologic pathology setting would generally not be affected. Although diagnoses based on the glass slides remain the official NTP data, it is important to note that the study interpretations as a result of PWGs based on using WSI would not have differed from those based on examination of conventional glass slides.
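The summary statistics quoted above can be reproduced from the concordance figures reported for the individual exercises. The text does not state which figures were pooled, so the selection below is an assumption: it takes one value per exercise and species wherever a percentage was reported, which excludes Exercise 3 and includes the individual-diagnosis agreement from Exercise 7.

```python
from statistics import mean, median

# Percent concordance per exercise and species, as reported in the Results.
# Which values were pooled for the summary is an assumption of this sketch.
CONCORDANCE_PCT = {
    "Exercise 1": 98,
    "Exercise 2, rat": 89,
    "Exercise 2, mouse": 75,
    "Exercise 4": 85,
    "Exercise 5, rat": 100,
    "Exercise 5, mouse": 91,
    "Exercise 6, rat": 86,
    "Exercise 6, mouse": 84,
    "Exercise 7": 74,
}

values = list(CONCORDANCE_PCT.values())
print(f"range: {min(values)}% to {max(values)}%")
print(f"mean: {mean(values):.0f}%")
print(f"median: {median(values)}%")
```

Under this selection the script gives a range of 74% to 100%, a mean of 87%, and a median of 86%, matching the figures in the text.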
The results seem to indicate that evaluation of WSI is adequate for obtaining an accurate diagnosis for most types of lesions. Discrepant diagnoses in many exercises frequently centered on distinguishing minimal lesions from normal tissue; distinguishing lesions along the diagnostic continuum of hyperplasia, adenoma, and carcinoma; and distinguishing among lymphoid hyperplasia, lymphoma, and lymphoma types. However, difficulties in these diagnoses are also typical with conventional glass slide reviews. It appeared that for many of the exercises, different pathologists might render different opinions due to differences in the normal range of diagnostic interpretation, irrespective of whether glass slides or WSI are used. In Exercise No. 5, there was excellent agreement (96%) between the digital and glass consensus diagnoses made by the same pathologists, but lesser agreement (71% and 75%, respectively) when compared to the original previously recorded PWG diagnoses made by a different group of pathologists. Intrinsic diagnostic variability may thus be a greater issue than methodology in some instances.
In some cases, however, effective use of WSI may be somewhat limited. Based on user opinions, WSI may not be optimal for evaluation of subtle lesions, large complex lesions, small lesions in a large section of tissue, foci of altered hepatocytes, cardiomyopathy, or when focus on varying depths within a tissue is required. Pathologists had more confidence using WSI for lesions that could be diagnosed at lower magnification than lesions that required higher magnification and distinction of intracellular details (mitotic figures, cellular pleomorphism, intracellular accumulations, etc.). There was a lesser degree of agreement on distinguishing malignant lymphoma, histiocytic sarcoma, and granulocytic leukemia, which are lesions sometimes dependent upon a more careful detailed cytologic evaluation. When viewing WSI, participants frequently mentioned the limited ability to differentiate subtle color variation and intracellular changes. These challenges were likely reflected in poor correlative scores for commonly observed lesions such as hyaline droplet accumulation in the kidney, foci of altered hepatocytes, and necrosis in the brain, which are lesions dependent on tinctorial contrast to accurately diagnose.
Other factors that may have contributed to diagnostic disagreements include the absence of a concurrent control (e.g., sebaceous hyperplasia in skin), marginal image quality as a result of either poor staining or age of the section, as well as user experience and technical ability with the image viewer. Certain measures can be taken to ensure data quality when evaluating WSI. In general, slides should be scanned at the maximum magnification (40x). In 5 of the 7 exercises herein, slides were scanned at 20x magnification, which may have contributed to difficulty ascertaining intracellular detail, decreased diagnostic correlation, and user dissatisfaction. In addition, measures should be in place to ensure that all tissues on the glass slide are present in the WSI, the tinctorial quality of the glass slide is accurately reflected, and the image is otherwise a true representation of the glass slide (Long et al. 2013, Pantanowitz et al. 2013).
The time between digital and conventional glass PWGs by the same participants in this paper ranged from 2 to 7 weeks. Most pathologists in Exercise No. 5 mentioned that they had some memory of their previous diagnosis when the PWGs were separated by about 1 month. However, the time intervals in these exercises are consistent with the evidence-based College of American Pathologists guideline for validating whole slide imaging systems for diagnostic purposes, which recommends a “washout period” of at least 2 weeks between viewing WSI and glass slides (Pantanowitz et al. 2013).
Advantages of Whole Slide Images in Peer Review and PWGs
Whole slide images are unarguably beneficial to the PWG process in many ways. Use of WSI allows accurate measurement of a lesion, which can be important in the diagnosis of hyperplastic lesions vs. adenomas, for example. Having WSI available to review in advance of the PWG can aid in developing familiarity with a study being reviewed. Participants can preview the slides at their convenience, spend as little or as much time as preferred, and prepare for active discussion of the lesions being reviewed. Consequently, the PWG can function more efficiently and expediently.
Having the WSI available during the PWG is generally very helpful for demonstrating lesions and guiding pathologists to a critical area of a lesion or key features to aid in diagnosis, and annotating WSI is generally easier and more accurate than annotating glass slides. The ability to display WSI during the PWG discussion and voting phase ensures that all pathologists are viewing the correct area of the slide or lesion for cases that are controversial or show a lack of consensus. This assures that each PWG participant observes and considers the same morphologic features that form the basis for a specific diagnosis, thereby facilitating consensus building.
A distinct advantage of the use of WSI in the peer review and PWG process is the flexibility it allows. Digital pathology systems allow remote participation by the Study Pathologist at the contract laboratory or outside experts who can contribute significantly to the PWG review and discussion. Most participants supported the concept that experts on the tissue of interest should be able to evaluate WSI and vote remotely and that WSI are an excellent adjunct to PWGs. A valuable part of the PWG experience is the face-to-face round table discussion atmosphere. Participants preferred having face-to-face interactions in PWGs to foster discussion and resolve discrepancies in the diagnoses. Technology such as video conferencing can enable remote pathologists to demonstrate specific controversial lesions and help maintain the round table discussions of the PWG.
Disadvantages of Whole Slide Images in Peer Review and PWGs
In general, although the diagnostic quality and accuracy of the pathology results of these rodent findings made by using WSI may have been equivalent to using glass slides, the time required to evaluate WSI was generally regarded as excessive. Processing delays associated with the scanning of the tissue sections and data storage also slowed the overall review process, but this is improving as the technology advances. Completion of WSI evaluation prior to a formal PWG meeting may be limited by individual time constraints. It is significant that 5 of the 14 pathologists selected to do Exercise No. 6 did not complete the task when given a one-month turnaround time. However, there was no formal teleconference that otherwise may have prompted completion of the task.
Due to technical limitations, pathologists in several exercises noted that image loading and refresh times in the viewing software made viewing of large areas of tissues very tedious and slower than if one was evaluating the glass slide. Monitor size and resolution and computer, server, and connectivity speed all influence the WSI viewing experience. Some of these limitations likely led the participants to list the liver and brain as the most difficult organs to evaluate. It should be noted that advances in digital pathology technology since commencement of these exercises, in addition to diminution of data storage and connectivity limitations, have resulted in decreased slide scanning time and faster loading and viewing of high resolution WSI.
Pathologists commented that it took some time to learn how to navigate the use of the digital pathology system and this increased the time spent viewing WSI as compared to glass slides. Some participants noted that time spent evaluating WSI decreased with practice. Therefore, a recommendation to increase efficiency, apart from technological advances, would be to have formal training on image viewing technology for participants. Adequate training of pathologists in the use of digital pathology technology has been shown in a literature review to result in greater accuracy of WSI interpretation (95% with training vs. 79% without training), better concordance between WSI and glass slide diagnoses (89% with training vs. 84% without training), and shorter interpretation time (4.9 minutes with training vs. 11.5 minutes without training) (Pantanowitz et al. 2013).
Conclusions
The quantitative results of these exercises, as well as the subjective survey comments by participants, indicate that WSI evaluation is equivalent to the evaluation of glass slides for the PWG and pathology peer review processes, with some limitations in time, image quality, and lesion types. Concordance between diagnoses based on WSI and glass slide evaluation in PWGs ranged from 74% to 100%. While it is difficult to specify what percentage of concordance is acceptable or unacceptable, the ultimate question in these cases is whether or not the outcome of the PWG, which leads ultimately to the conclusions of the studies, would have differed based on WSI vs. glass slide evaluation. Since the conclusions of the studies would have been the same regardless of methodology, evaluations of WSI or glass slides were equivalent in the PWGs described herein. The use of WSI for digital PWGs has met the goals of involving remote participants such as the original Study Pathologist and outside experts, improving the quality of PWG discussions and consensus building, and saving resources and costs by eliminating or limiting travel as well as (potentially) slide shipping expenses. The use of WSI has to some degree met the goals of saving pathologists' time and providing high quality images with convenient access and easily navigable viewing software. Furthermore, it is expected that digital imaging technology will continue to develop finer image quality and faster refresh/resolution rates, which should result in a more user-friendly experience. This should permit the evaluation of larger numbers of images as well as enhance the diagnostic confidence of the user.
Given its current state of development, this technology is an effective tool to review, discuss, and acquire valuable input on a limited number of images and/or lesions. Changes along a diagnostic continuum (e.g., normal, hyperplasia, adenoma, carcinoma) may be diagnostically challenging regardless of whether glass slides or WSI are used. If the WSI cannot be duly interpreted, the glass microscope slide should be used for pathology peer review or the PWG (Toumari et al. 2007). It is up to the pathologist’s judgment to assess whether the WSI is of sufficient quality to render a diagnosis, as holds true for light microscopy. In conclusion, although technical limitations may curb user confidence in the evaluation of subtle lesions as well as large sections of tissue, evaluation of WSI equals glass slide evaluation in the accuracy of diagnosing rodent lesions in toxicologic pathology, and digital pathology systems have multiple useful applications for the peer review and PWG processes.
Acknowledgments
The authors give their sincere thanks to all participating pathologists in the exercises reported herein: Amy Brix, Karen Cimon, Mark Cesta, John Cullen, Gordon Flake, Sabine Francke-Carroll, Ronald Herbert, Georgette Hill, Mark Hoenerhoff, Brian Knight, Linda Kooistra, John Latendresse, Robert Maronpot, Paul Mellick, Steven Mog, Rebecca Moore, James Morrison, Todd Painter, Cynthia Shackleford, Robert Sills, and Jerrold Ward. The authors also thank Ann Chavis, Lorri Ezedin, Julie Foley, Carrie Prince, Maureen Puccini, Annette Shambley, Emily Singletary, Alan Warbritton, and Lisa Wiley for technical support. We thank Arun Pandiri and Vivian Chen for their review of the manuscript. This project has been funded in whole or in part with Federal funds from the National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services.
Abbreviations
- NCTR: National Center for Toxicological Research
- NTP: (U.S.) National Toxicology Program
- PWG: Pathology Working Group
- SP: Study Pathologist
- QA: Quality Assessment
- WSI: whole slide image(s)
References
- Cesta MF, Malarkey DE, Herbert RA, Brix A, Hamlin MH, II, Singletary E, Sills RC, Bucher JR, Birnbaum LS. The National Toxicology Program web-based nonneoplastic lesion atlas: A global toxicology and pathology resource. Toxicol Pathol. 2014;42:458–460. doi: 10.1177/0192623313517304. Website: http://ntp.niehs.nih.gov/nnl/
- Jones NC, Nazarian RM, Duncan LM, Kamionek M, Lauwers GY, Tambouret RH, Wu CL, Nielsen GP, Brachtel EF, Mark EJ, Sadow PM, Grabbe JP, Wilbur DC. Interinstitutional whole slide imaging teleconsultation service development: assessment using internal training and clinical consultation cases. Arch Pathol Lab Med. 2015;139:627–635. doi: 10.5858/arpa.2014-0133-OA.
- Long RE, Smith A, Machotka SV, Chlipala E, Cann J, Knight B, Kawano Y, Ellin J, Lowe A. Scientific and Regulatory Policy Committee (SRPC) paper: Validation of digital pathology systems in the regulated nonclinical environment. Toxicol Pathol. 2013;41:115–24. doi: 10.1177/0192623312451162.
- McCullough B, Ying X, Monticello T, Bonnefoi M. Digital microscopy imaging and new approaches in toxicologic pathology. Toxicol Pathol. 2004;32(Suppl 2):49–58. doi: 10.1080/01926230490451734.
- Pantanowitz L, Sinard JH, Henricks WH, Fatheree LA, Carter AB, Contis L, Beckwith BA, Evans AJ, Otis CN, Lal A, Parwani AV. Validating whole slide imaging for diagnostic purposes in pathology: Guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch Pathol Lab Med. 2013;137:1710–22. doi: 10.5858/arpa.2013-0093-CP.
- Toumari DL, Kemp RK, Sellers R, Yarrington JT, Geoly F, Fouillet XLM, Dybdal N, Perry R, Long P. Society of Toxicologic Pathology position paper on pathology image data: Compliance with 21 CFR parts 58 and 11. Toxicol Pathol. 2007;35:450–55. doi: 10.1080/01926230701284509.