NIHPA Author Manuscripts
Author manuscript; available in PMC: 2022 Jun 28.
Published in final edited form as: Lab Invest. 2021 Dec 28;102(5):554–559. doi: 10.1038/s41374-021-00718-y

A generative adversarial approach to facilitate archival-quality histopathologic diagnoses from frozen tissue sections

Kianoush Falahkheirkhah 1,2, Tao Guo 7, Michael Hwang 9, Pheroze Tamboli 7, Christopher G Wood 10, Jose A Karam 8,10, Kanishka Sircar 7,8, Rohit Bhargava 1,2,3,4,5,6,*
PMCID: PMC9050807  NIHMSID: NIHMS1762079  PMID: 34963688

Abstract

In clinical diagnostics and research involving histopathology, formalin-fixed, paraffin-embedded (FFPE) tissue is almost universally favored for its superb image quality. However, tissue processing time (>24 hours) can slow decision-making. In contrast, fresh frozen (FF) processing (<1 hour) can yield rapid information, but diagnostic accuracy is suboptimal due to lack of clearing, morphologic deformation and more frequent artifacts. Here, we bridge this gap using artificial intelligence. We synthesize FFPE-like images (“virtual FFPE”) from FF images using a generative adversarial network (GAN) trained on 98 paired kidney samples derived from 40 patients. Five board-certified pathologists evaluated the results in a blinded test. Image quality of the virtual FFPE data was assessed to be high and showed a close resemblance to real FFPE images. Clinical assessments of disease on the virtual FFPE images showed a higher inter-observer agreement compared to FF images. The nearly instantaneously generated virtual FFPE images can not only reduce time to information but can facilitate more precise diagnosis from routine FF images without extraneous costs and effort.

Keywords: Nephropathology, clear cell renal cell carcinoma, digital pathology, artificial intelligence, deep learning, generative adversarial networks

Introduction

Histopathologic examination of microscopic morphologic patterns within stained tissues is often a critical diagnostic step in many clinical and research activities. Notably, the content and quality of images depend on the selection of one of two methods to prepare thin sections: formalin-fixed, paraffin-embedding (FFPE) or fresh frozen (FF). FFPE processing is commonly used for stability of tissue to make it broadly available, requires fewer resources, enables easier sample preparation and better preserves tissue morphology, but is time-consuming. Pathologists are historically accustomed to examining stained FFPE sections, and determinations made on them are the typical gold standard. In contrast, FF processing can enable rapid decision-making at the point of care and is more suitable for molecular analyses1,2, but needs greater resources and may suffer from artifacts. FF techniques are especially suitable for intraoperative pathology since images can be obtained significantly faster (<30 minutes) compared to FFPE approaches (>24 hr)3,4. While frozen section (FS) processing allows decision-making in near real-time,5 diagnoses are more challenging due to the greater prevalence of artifacts and variability compared to archival-quality FFPE images. This gap in diagnostic quality may lead to a deferral or inaccurate diagnosis6–11. In the case of kidney cancer, for example, FF processing causes difficulties in diagnosis12 and discordance with FFPE gold standards13, leading to an uncertain benefit for intraoperative assessment14. Here, we report bridging this gap by utilizing artificial intelligence (AI) to model the underlying relationship between the morphology and contrast of FFPE- and FF-stained images.

Deep learning (DL)15 is a powerful AI technique with successes in a variety of fields, including image processing16, speech recognition17, self-driving cars18, and healthcare19. Instead of relying on extensive information to develop our DL framework, as is common for unconstrained problems, we sought to take advantage of the dual preparation methods in both the choice of the DL framework and our study design. Generative adversarial networks (GANs)20 are a special class of DL methods that have garnered attention in image style transfer and synthesis21–24. They have also shown great promise for a variety of digital pathology tasks such as tumor classification25–27, stain normalization28–30, and virtual staining of label-free tissues31–33. We chose GANs to generate “virtual FFPE” images from FF images since they can be especially powerful in our study design. We use renal cell carcinoma (RCC) as an exemplar of the general problem in pathology of relating FS to FFPE since pathologists are familiar with evaluating both types of images and extensive ground truth is available. Moreover, RCC is not as prevalent as some of the major cancers but provides a large enough cohort to develop and validate the approach here. Since a paired FF-FFPE dataset from the same histological section is not feasible, we employed the next best available option: FF and FFPE processed sections from the same kidney, as close to each other as possible. Appropriate to this constraint, we developed a Cycle-GAN30 framework for unpaired image-to-image translation. A typical GAN constructs two different convolutional neural networks: a generator and a discriminator. While the discriminator tries to distinguish real images from fake ones, the generator learns to produce images that are difficult for the discriminator to distinguish. Cycle-GANs, in contrast, employ generators and discriminators for each of the two domains so that images can be translated between them. Here, we first modify the architecture of the generator and discriminator to rapidly convert FF to FFPE images and synthesize important finer morphology. The generator is a combination of U-Net34 and Res-Net35 and has been successfully implemented previously36. The discriminator is multi-scale, which is suitable for recovering finer details23. Second, we evaluate the AI results using assessments by board-certified pathologists, who routinely examine FF and FFPE images from the same patients and thus provide the best human judges of realism and utility.

Methods

1. Sample preparation

Archived formalin-fixed, paraffin-embedded tissues and frozen samples from The University of Texas MD Anderson Cancer Center (Houston, TX) were obtained after informed consent and using an institutional review board-approved protocol (IRB# LAB 08–670). For this retrospective study, we searched our institution’s database for surgically resected kidney cases between 2019 and 2020 using the keywords “partial nephrectomy” and “clear cell renal cell carcinoma”. We found 40 cases of clear cell renal cell carcinoma that comprised the study cohort. H&E sections with tumor and non-neoplastic kidney were manually outlined using a marker by 2 fellowship-trained genitourinary pathologists (TG and MH), with 5 and 6 years of experience, respectively, and 1 senior genitourinary pathologist with over 20 years of experience. Slides were scanned at 20x magnification on Leica Aperio AT2 scanners, saved as .svs files, and viewed using the Aperio ImageScope v12.4.3 software. Briefly, 98 samples were extracted from 40 patients, comprising 49 frozen sections and 49 adjacent permanent sections (FFPE) from the same tissue block. Patients ranged in age from 22 to 81 years and included 13 females and 27 males.

2. Model design

The overall objective of the proposed study is to learn the translation between the FF and FFPE domains. We introduce our framework in Fig. 1. After digitizing the FF and FFPE samples, we extract patches at 20x. Our framework follows Cycle-GAN30, which includes two generators and two discriminators (GFFPE, GFF, DFFPE, and DFF) for translation between the FF and FFPE domains. GFFPE is responsible for translating images from the FF to the FFPE domain, whereas GFF reverses the process and maps images from the FFPE domain to FF. Simultaneously, the discriminators send feedback on whether the images are real or fake: DFF takes FF domain images and DFFPE takes FFPE images. The proposed framework needs to be trained using proper loss functions to achieve stable, high-quality performance. For calculating the GAN loss, we use 2 discriminators that have an identical layout, applied at different image scales similar to a previous study23. One of them operates on the full-resolution image while the other operates on images down-sampled by a factor of 2. Such multi-scale discriminators have rarely been used in biomedical imaging; however, they can significantly improve the analysis of histological images, as they evaluate images at different fields of view and resolutions, similar to how pathologists make decisions. Moreover, we utilize a modified U-Net architecture that has been used successfully in a previous study37. These modifications to the original Cycle-GAN framework improve image quality and better transfer the morphology.

Figure 1. An overview of the proposed method. Our framework includes two generators and two discriminators for translation between the FF and FFPE domains and vice versa.
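To make the architecture concrete, the sketch below shows the two key ingredients in PyTorch: a generator combining U-Net skip connections with Res-Net-style residual blocks, and a pair of identical discriminators applied at full and half resolution. This is a minimal illustration, not the authors' released implementation (available at the GitHub link in the Implementation details); the channel widths, depths, and normalization choices here are our own assumptions.

```python
# Minimal sketch of the framework's building blocks; layer sizes are
# illustrative assumptions, not the published configuration.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block (Res-Net element) used in the generator bottleneck."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)  # identity skip mitigates vanishing gradients

class Generator(nn.Module):
    """U-Net-style encoder/decoder with residual blocks in the bottleneck."""
    def __init__(self, ch=64, n_res=6):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 7, padding=3), nn.ReLU(True))
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, 2, 1), nn.ReLU(True))
        self.res = nn.Sequential(*[ResBlock(ch * 2) for _ in range(n_res)])
        self.dec1 = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.ReLU(True))
        self.out = nn.Conv2d(ch * 2, 3, 7, padding=3)  # ch*2: U-Net skip concat
    def forward(self, x):  # assumes even spatial dimensions
        e1 = self.enc1(x)
        h = self.dec1(self.res(self.enc2(e1)))
        return torch.tanh(self.out(torch.cat([h, e1], dim=1)))

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator; outputs a map of real/fake scores."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch * 2, 1, 4, padding=1))
    def forward(self, x):
        return self.body(x)

class MultiScaleDiscriminator(nn.Module):
    """Two identical discriminators: full resolution and 2x down-sampled."""
    def __init__(self):
        super().__init__()
        self.d_full, self.d_half = PatchDiscriminator(), PatchDiscriminator()
        self.down = nn.AvgPool2d(2)
    def forward(self, x):
        return self.d_full(x), self.d_half(self.down(x))
```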

3. Loss functions

We optimized the parameters of our framework with respect to a combination of two different loss functions: cycle loss and adversarial loss. The main idea behind cycle-GAN is to translate between domains through a cycle. In our case, the cycle is FF → Fake FFPE → Reconstructed FF and simultaneously, FFPE → Fake FF → Reconstructed FFPE. It is necessary to minimize the difference between reconstructed images and original images, which is called cycle loss and defined as:

$\mathcal{L}_{\mathrm{cycle}}\{G_{FF}, G_{FFPE}\} = \lVert G_{FF}(G_{FFPE}(FF)) - FF \rVert_1 + \lVert G_{FFPE}(G_{FF}(FFPE)) - FFPE \rVert_1$  (1)

For the translation from FF to FFPE, the adversarial loss is defined as:

$\mathcal{L}_{\mathrm{adv}}\{G_{FFPE}\} = \big(1 - D_{FFPE}(G_{FFPE}(FF))\big)^2$  (2)

$\mathcal{L}_{\mathrm{adv}}\{D_{FFPE}\} = \big(1 - D_{FFPE}(FFPE)\big)^2 + D_{FFPE}(G_{FFPE}(FF))^2$  (3)

where Equations 2 and 3 represent the adversarial losses for the generator and discriminator, respectively. Similarly, for the mapping from FFPE to FF, $\mathcal{L}_{\mathrm{adv}}\{G_{FF}\}$ and $\mathcal{L}_{\mathrm{adv}}\{D_{FF}\}$ are calculated in the same way. The overall objective function is a linear combination of the aforementioned losses:

$\mathcal{L}_{\mathrm{total}}\{G_{FF}, G_{FFPE}, D_{FF}, D_{FFPE}\} = \mathcal{L}_{\mathrm{adv}}\{D_{FF}, D_{FFPE}, G_{FF}, G_{FFPE}\} + \gamma_1 \mathcal{L}_{\mathrm{cycle}}\{G_{FF}, G_{FFPE}\}$  (4)

where $\gamma_1$ is a regularization term that weights the contribution of the cycle loss to the total objective; it is set to 10, as in the original cycle-GAN21.
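The sketch below expresses Equations 1–3 in PyTorch, assuming the least-squares adversarial form written above and discriminators that return one score map per scale (as in the architecture sketch earlier). It is illustrative rather than the authors' exact code.

```python
# Loss helpers mirroring Eqs. (1)-(3); names follow the paper's notation.
import torch
import torch.nn.functional as F

GAMMA_1 = 10.0  # cycle-loss weight, as in the original cycle-GAN

def cycle_loss(G_ff, G_ffpe, ff, ffpe):
    """Eq. (1): L1 distance after a full round trip through both generators."""
    return (F.l1_loss(G_ff(G_ffpe(ff)), ff) +
            F.l1_loss(G_ffpe(G_ff(ffpe)), ffpe))

def adv_loss_generator(D, fake):
    """Eq. (2): the generator pushes D's scores on fakes toward 1 ("real")."""
    return sum(torch.mean((1 - s) ** 2) for s in D(fake))  # sum over scales

def adv_loss_discriminator(D, real, fake):
    """Eq. (3): D pushes reals toward 1 and (detached) fakes toward 0."""
    real_term = sum(torch.mean((1 - s) ** 2) for s in D(real))
    fake_term = sum(torch.mean(s ** 2) for s in D(fake.detach()))
    return real_term + fake_term
```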

4. Implementation details

The framework is implemented in PyTorch 1.3, CUDA 10.1, and Python 3.7.1, and computations are performed on a single NVIDIA® GeForce® RTX 2080 SUPER GPU and an Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz. All of the generative and discriminative models are initialized to random weights. Adam38 was used to optimize the parameters of the models with an initial learning rate of 10−4, which is multiplied by 0.96 after every 1000 iterations. Training stops after a total of 100,000 iterations. For data augmentation, we apply a random affine transformation to the images at each iteration, including −10% to 10% random translation and 0 to 180 degree random rotation with nearest-neighbor interpolation. The implementation can be found here: https://github.com/kiakh93/Virtual_FFPE
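A minimal training scaffold consistent with these stated hyperparameters might look as follows. It reuses the models and loss helpers from the earlier sketches; `loader`, assumed to yield unpaired (FF, FFPE) patch batches indefinitely, is a placeholder, and the linked repository should be treated as authoritative.

```python
# Training-loop sketch: Adam at lr 1e-4 decayed by 0.96 every 1000 iterations,
# affine augmentation, 100,000-iteration budget.
import itertools
import torch
import torchvision.transforms as T

G_ffpe, G_ff = Generator(), Generator()
D_ffpe, D_ff = MultiScaleDiscriminator(), MultiScaleDiscriminator()

# -10% to 10% translation, 0-180 degree rotation, nearest-neighbor resampling
augment = T.RandomAffine(degrees=(0, 180), translate=(0.1, 0.1),
                         interpolation=T.InterpolationMode.NEAREST)

opt_G = torch.optim.Adam(
    itertools.chain(G_ffpe.parameters(), G_ff.parameters()), lr=1e-4)
opt_D = torch.optim.Adam(
    itertools.chain(D_ffpe.parameters(), D_ff.parameters()), lr=1e-4)
sched_G = torch.optim.lr_scheduler.StepLR(opt_G, step_size=1000, gamma=0.96)
sched_D = torch.optim.lr_scheduler.StepLR(opt_D, step_size=1000, gamma=0.96)

for it, (ff, ffpe) in enumerate(loader):
    ff, ffpe = augment(ff), augment(ffpe)
    fake_ffpe, fake_ff = G_ffpe(ff), G_ff(ffpe)

    # Generator update: adversarial terms plus the weighted cycle loss, Eq. (4)
    opt_G.zero_grad()
    loss_G = (adv_loss_generator(D_ffpe, fake_ffpe) +
              adv_loss_generator(D_ff, fake_ff) +
              GAMMA_1 * cycle_loss(G_ff, G_ffpe, ff, ffpe))
    loss_G.backward()
    opt_G.step()

    # Discriminator update on real images and detached fakes, Eq. (3)
    opt_D.zero_grad()
    loss_D = (adv_loss_discriminator(D_ffpe, ffpe, fake_ffpe) +
              adv_loss_discriminator(D_ff, ff, fake_ff))
    loss_D.backward()
    opt_D.step()

    sched_G.step()
    sched_D.step()
    if it + 1 >= 100_000:  # training stops after 100k iterations
        break
```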

5. Evaluation

The evaluation of results is not straightforward since the FF and FFPE images are from different tissue sections, so there is no one-to-one correspondence between virtual FFPE and real FFPE images. Thus, pixel-level evaluation metrics such as PSNR and RMSE cannot be used. Instead, we use two different approaches to evaluate our results: pathologist review and perceptual similarity.

5.1. Pathologist review

We created a survey to assess the quality of diagnoses. The purpose of this survey was to evaluate the quality and diagnostic reliability of virtual FFPE images compared to real FFPE. The survey included 35 FF, 35 virtual FFPE, and 35 real FFPE samples. The virtual FFPE images were generated from the FF images using the generative network (GFFPE), whereas the real FFPE images were selected from parallel sections corresponding to the FF samples. Each image in the survey has a field of view of 2 mm × 2 mm. Board-certified pathologists were asked to review each of the images and to answer two questions:

  • How closely does the image resemble FFPE tissue, on a scale of 1 to 5? (1 is an FF section and 5 is a permanent FFPE section)

  • What is the grade of cancer?

The first question quantitatively evaluates the similarity between virtual FFPE images and real FFPE images. Hence, for each group of images (virtual FFPE, FFPE, FF) we obtain an average score between 1 and 5 that indicates the similarity of that group to the FFPE or FF domain. The second question measures pathologists’ concordance rate in grading clear cell renal cell carcinoma for FFPE, virtual FFPE, and FF samples. The hypothesis is that the virtual FFPE images increase the concordance rate compared to FF images, which can be tested by comparing their Fleiss’ kappa39. This measure quantifies the degree of agreement in classification above that which would be expected by chance: a value of 0 indicates agreement occurring only by chance and a value of 1 indicates perfect agreement. Upon completion of the survey by the pathologists, we analyzed both the concordance rate of diagnostic interpretation and inter-observer variability (question 2 in the survey) using the Fleiss’ kappa statistic, a common statistical approach for analyzing inter-observer variability.
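Fleiss' kappa itself is straightforward to compute from a matrix of category counts. A minimal sketch follows, using an illustrative toy ratings matrix rather than the study's actual survey data.

```python
# Fleiss' kappa from a (n_items x n_categories) count matrix, where each cell
# holds how many raters assigned that item to that category.
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Each row of `counts` must sum to the (constant) number of raters."""
    n_items, _ = counts.shape
    n_raters = counts[0].sum()
    p_j = counts.sum(axis=0) / (n_items * n_raters)   # category proportions
    # per-item observed agreement among rater pairs
    P_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()         # observed vs chance
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 4 images, 3 grade categories, 5 raters per image.
ratings = np.array([[5, 0, 0], [4, 1, 0], [3, 2, 0], [0, 5, 0]])
print(round(fleiss_kappa(ratings), 3))
```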

Results

Since the FF and FFPE samples are from the same patient but are not serial sections, they are not exactly matched; we therefore developed a cycle-GAN21 framework for this unpaired collection (Fig. 1). To train our framework, we used a dataset that includes hematoxylin and eosin (H&E)-stained whole slide images of FF sections and FFPE sections from the same 20 samples (with a total of 6396 patches of 512 × 512 pixels) across different grades of clear cell RCC. After training, the model estimates FFPE images from FF images alone, without the need for other FFPE images or any further information. The key elements of these virtual FFPE images involve morphological structures, staining levels and contrast, each of which can be appreciated in Fig. 2. Importantly, we show mitigation of several artifact types in the virtual FFPE images. While Fig. 2a demonstrates that folding and thickness-variation artifacts have been mitigated in the virtual FFPE image, Fig. 2b shows that unwanted non-tissue materials can also be eliminated, which improves diagnostic quality. In Fig. 2c, the input frozen image is blurry and overly smooth, which can be due to the imaging system focusing poorly or the sample being lifted from the slide; the virtual FFPE image overcomes the blurriness and improves the morphological details in the processed image. Finally, in Fig. 2d, we demonstrate that our framework can reduce the freezing artifacts that commonly exist in frozen samples. Additional examples can be found in supplementary Figure 1. Overall, these examples indicate the potential of the reported approach in not only reducing the time to accurate analysis but also presenting an opportunity to overcome physical factors that affect image quality for frozen sections. The inference itself is fast, requiring 0.105 seconds for a 2000 × 2000 pixel region, and can be further accelerated using more powerful computing hardware with parallelization capabilities. We especially note that virtual FFPE images can be expected to have a more consistent appearance over a population, with uniformity in color and contrast across the image and a tunable level that will likely not be matched by real stains. Although visual agreement between virtual and real FFPE images is good in terms of morphologic patterns, color and contrast, there are also inevitable differences between the precise structures in these images as they are from different parts of the tissue. A quantitative evaluation is therefore needed to assess the quality and utility of the predictions.

Figure 2. Visual comparison of artifact mitigation in frozen samples: a, sample folding and thickness artifacts; b, non-tissue material artifacts; c, blurring artifacts; d, freezing artifacts.
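The per-region inference timing quoted above suggests simple tile-wise processing of a whole-slide image. The sketch below illustrates one way this could be organized, assuming the trained generator from the earlier sketches and tile dimensions that divide the image evenly; it is not the authors' inference code.

```python
# Tile-wise virtual FFPE inference over a large FF image, using the same
# 2000 x 2000-pixel regions timed in the text.
import torch

@torch.no_grad()
def virtual_ffpe(image: torch.Tensor, G_ffpe, tile: int = 2000) -> torch.Tensor:
    """image: (3, H, W) FF image scaled to [-1, 1]; H and W divisible by `tile`."""
    G_ffpe.eval()
    out = torch.empty_like(image)
    _, H, W = image.shape
    for y in range(0, H, tile):
        for x in range(0, W, tile):
            patch = image[:, y:y + tile, x:x + tile].unsqueeze(0)  # add batch dim
            out[:, y:y + tile, x:x + tile] = G_ffpe(patch).squeeze(0)
    return out
```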

We used determinations of practicing pathologists as an independent test of the results. Five pathologists completed the survey described in the Methods section to assess the images. The pathologists are board-certified, practicing clinicians with the following experience: KS (25 years); TG (5 years); MH (6 years); AA (5 years); BN (6 years). The survey was designed with 35 examples each of FF, virtual FFPE, and FFPE samples to ensure that a large number of images were analyzed, but not so many as to induce fatigue that might introduce error during the tests. The set was the same for all pathologists, but the order of the images was randomized for each pathologist. To quantify the assessment on a common scale, we developed a Pathologist Evaluation Index (PEI) that assigns a 1–5 score to each image, where 1 indicates that the presented field of view closely resembled an FF image and 5 an FFPE sample. As summarized in Table 1, pathologists assigned lower PEI scores to FF images and higher scores to FFPE images, serving to confirm the relative magnitudes of the assessment and bounding the expected range for both the test and individuals. Importantly, we should bear in mind that the FFPE sections used here were not the standard sections that are immediately placed in formalin. Instead, these FFPE samples were derived from a tissue block that was initially frozen in OCT, sectioned for frozen section analysis and then subsequently placed in formalin for fixation. This does alter the staining qualities of such an FFPE section relative to one placed directly in formalin from the outset. In addition, PEI scores higher than 1 for FF and lower than 5 for FFPE likely arise from the limited fields of view that we had to present to each pathologist to ensure that the regions they examined were consistent. The intermediate scores may also reflect each pathologist's knowledge that the images come from three sources. The distribution of scores helps bound the effects of this bias and of individual scoring proclivities, emphasizing that the raw scores are not globally absolute. In all cases, importantly, the virtual FFPE scores were intermediate between those for FF and FFPE. Artifacts, for example from freezing and sample folding, persisted in some virtual FFPE images; consequently, their average score is lower than that assigned to real FFPE images. It is possible, in principle, to use AI methods to eliminate these known artifacts and their effects in virtual FFPE images. While we show this possibility in the Discussion section, we did not attempt to optimize diagnostic quality in this study to keep it focused on FF-FFPE translation.

Table 1.

Pathologists’ review scores for each image domain, reported as mean ± standard deviation of the scores assigned by each pathologist. Scores range from 1 to 5, where 1 indicates an FF-like sample and 5 an FFPE-like sample.

                 Pathologist 1   Pathologist 2   Pathologist 3   Pathologist 4   Pathologist 5   Total
Frozen           1.86 ± 0.94     3.26 ± 1.06     1.34 ± 0.87     1.46 ± 0.78     1.48 ± 0.77     1.88 ± 1.14
Virtual FFPE     2.53 ± 0.84     3.79 ± 0.47     2.53 ± 1.33     2.56 ± 0.93     1.87 ± 0.93     2.65 ± 1.18
FFPE             2.91 ± 1.01     3.94 ± 0.33     3.72 ± 1.25     3.28 ± 0.98     2.40 ± 1.14     3.25 ± 1.07

While the physical quality assessment is reassuring, the impact of these images on pathologic assessment is more important. In particular, detecting tumor for margin assessment is a valuable application of FF processing. Hence, we next focused on evaluating the performance of pathologists in distinguishing benign from cancerous regions by calculating inter-observer agreement using Fleiss’ kappa. Further, to assess the utility in characterizing disease, we also examined their determination of tumor grade for clear cell RCC (question 2 in the survey). As shown in Table 2, the Fleiss’ kappa value (κ) for FF, virtual FFPE, and FFPE images is 0.52, 0.67, and 0.89, respectively, for distinguishing benign from cancerous samples; for grading clear cell RCC, it is 0.39, 0.51, and 0.63, respectively. Notably, the inter-observer agreement for grading RCC on FFPE has previously been reported as 0.2241, which is lower than our findings. The main reason is that we selected ROIs of 2 mm × 2 mm rather than whole slides in our analysis; furthermore, we did not use multi-institutional images and our sample size is relatively small. As a guide to interpretation, a kappa value of 0 indicates agreement by chance, 0.01–0.20 none to slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement42. Thus, our findings suggest that pathologists’ agreement for virtual FFPE images increases from moderate to substantial for distinguishing benign from cancerous tissue and from fair to moderate for grading clear cell RCC. This is not surprising, as discriminating benign from cancerous tissue is much easier than assigning grade, which shows greater inter-observer variability43,44. In addition, taking the consensus reference based on real FFPE as the ground truth class for each sample, the average accuracy of pathologists for grading was calculated as 0.8667 for frozen and 0.9133 for virtual FFPE images; similarly, the average accuracy for discriminating benign from cancerous was 0.6933 for frozen and 0.7467 for virtual FFPE. These findings demonstrate that the virtual FFPE framework improves the quality of diagnosis.

Table 2.

Inter-observer agreement for each domain, calculated from the pathologists’ survey responses using Fleiss’ kappa with 95% confidence intervals.

Fleiss’ kappa (95% CI)

                 Benign vs cancer      Grading
Frozen           0.52 (0.42, 0.62)     0.39 (0.34, 0.44)
Virtual FFPE     0.67 (0.59, 0.75)     0.51 (0.44, 0.58)
FFPE             0.89 (0.78, 1.00)     0.63 (0.56, 0.70)

Discussion

In this study, we present a DL framework that facilitates rapid and more assured diagnoses by combining the speed advantages of FF processing with the quality of FFPE processing. Without any additional tissue processing of FF sections, our computational framework transforms the morphologic textures and colors of FF samples to those similar to images obtained after FFPE processing, in the form of virtual FFPE images. Overall, the use of virtual FFPE images increases inter-observer agreement between pathologists compared to using FF alone, importantly without increasing workflow time, requiring additional training, or using any additional reagents. This is not unexpected, as images that resemble the gold standard (FFPE) offer the potential for more precise diagnosis with fewer complicating factors. The presented approach thus offers significant value for the small effort needed to implement and use it. The present study examined a specific organ and tumor type; however, our strategy may be employed to generate images for other organs and diseases with appropriate tuning of the DL framework. Hence, the pathologists’ workflow may be facilitated when evaluating FF samples for a variety of indications where faster, more assured information is desirable, including diagnosis and subtyping of tumors, assessment of surgical resection margin status, evaluation of allografts for organ transplantation, and screening at the time of tissue procurement for ancillary testing.

To better transfer morphology and remove artifacts, we modified the architectures and loss functions of the original cycle-GAN30. We utilized a generator that combines the U-Net34 and Res-Net35 architectures, leveraging the advantages of both, such as learning multi-level features and preventing the vanishing-gradient problem. Furthermore, we used 2 discriminators with identical architecture that operate at different image scales, similar to previous studies45. One discriminator operates at a larger receptive field, guiding the generator to synthesize images with consistent global context; the other performs locally, encouraging the generator to synthesize finer morphology and the semantically valid structures that exist in FFPE images. Following these modifications, we observed that virtual FFPE images look more realistic and better smooth out freezing artifacts (supplementary Figure 3) compared to the original cycle-GAN. A comprehensive examination of artifacts and strategies for their elimination is possible, and the contributions of such efforts can now be evaluated in larger sample sets since we have demonstrated the primary FF-FFPE translation.

Like every other machine learning model, our framework has potential failure cases. For instance, occasional artifacts from the FF domain persist in virtual FFPE images, such as the sample being folded or torn (supplementary Figure 4). Obviously, a style transfer approach cannot repair physical damage, but an additional module in our workflow that focuses on inpainting could potentially help46. Other artifacts are also well known; for example, when the freezing process is slow, cell morphology may not be well preserved and cells typically appear larger. This problem of bloated cell morphology cannot be fully resolved using our framework at present, though further extensions can be made to address known artifacts. Finally, to apply this approach to other tissue types, training on the corresponding tissue datasets is necessary for accurate synthesis of virtual FFPE images. Different types of tissue comprise different cell types, morphologies, and structures, whose textures have to be introduced to the prediction framework at the training stage. In addition, different institutions use different protocols, sample preparations, and imaging scanners, which cause color variation in histological images. Thus, various types of tissue from different institutions are needed to generalize the translation between FF and FFPE domains. We anticipate that a larger number of samples and greater diversity can lead to better training of the model and more powerful predictions.

Our method is very rapid; for example, for a 1 cm × 1 cm image with 0.5 μm pixels, the computation takes approximately 10.5 seconds on a single GPU (such an image is 20,000 × 20,000 pixels, or 100 of the 2000 × 2000 pixel regions timed above at 0.105 s each). Given that modern optical scanners can digitize a high-resolution image within minutes, our approach has the potential for real-time analysis during intraoperative consultation. As most intraoperative frozen sections involve interpretation of fewer than 5 slides per case, the extra time required using our approach would be a few minutes. In practice, a pathologist will use their judgement to decide whether it is worthwhile to perform FF to virtual FFPE translation in a given case, but we do not anticipate that the extra few minutes will be a barrier. Pathologists may use this technique for an organ site or tumor type that is known to be problematic intraoperatively, for a particularly challenging case, or when a more precise diagnosis is desired. Just as with ordering a recut section or calling for a colleague’s opinion, our method could prove to be a valuable tool in the pathologist’s arsenal. This method also suggests potentially powerful analysis strategies; for example, performing FF to FFPE translation on one slide while waiting for a recut section or other sections from the same case to be cut and stained. Ultimately, this tool gives the pathologist the option to evaluate a near-FFPE-quality section without waiting until the following day. To further evaluate the impact of virtual FFPE images on diagnostic accuracy, a large-scale clinical evaluation is needed that includes a wide range of tissue types and pathologists with differing levels of expertise and experience, carefully analyzing a large number of images to recognize other potential failure cases and drawbacks.

The technique presented here also offers new opportunities for addressing the diversity in pathologist capability for assessing frozen section pathology on two fronts. In general, the accuracy of diagnoses from FF sections improves with pathologist experience. Given that relatively few, mostly academic medical centers perform a high volume of FF sections, expertise is highly variable and can take a long time to acquire. An approach like the one presented can aid the development of expertise by allowing a user to analyze FF images and obtain their FFPE “twin”. This approach can be especially useful in enabling diagnoses among the larger cohort of community pathologists47, where opportunities to gain FF section pathology expertise are fewer, as well as for organs in which disease is not especially prevalent. In contrast to highly prevalent cancers, such as breast cancer, where opportunities for understanding and training can be significant, the use of our method for cancers with fewer occurrences can be more impactful. Another complementary aspect of our framework is generating virtual FF images from real FFPE images (supplementary Figure 2). While virtual FF images do not have additional diagnostic value for pathologists, they similarly offer significant value for educational purposes. New technologies to aid pathologist education can be useful where the quality of extensive training delivered in a short period of time is a concern48.

Conclusions

We report a machine learning approach to bridge the gap between the two prevalent methods of tissue processing for histopathology. Using a deep learning framework, we synthesize FFPE-like images of kidney samples, which we term virtual FFPE, from clinical FF images. The quality and utility of the virtual FFPE images were validated using a survey administered to multiple board-certified, experienced pathologists, demonstrating that the virtual FFPE images are of high quality and increase inter-observer agreement, as calculated by Fleiss’ kappa, for detecting cancerous regions and assigning a grade to clear cell RCC within the sample. This framework can be broadly applied to other types of tissue across the biomedical sciences to generalize the mapping between FF and FFPE domains and derive the advantages of both. This study paves the way for routine FF assessment to be augmented with the benefits of FFPE information without adding to the cost, time or effort required, thereby increasing the quality of histopathologic examinations using machine learning.

Supplementary Material

Supplementary Figures
Nature Competing Interests Form
Nature Editorial Policy Checklist
Nature Code and Software Submission Checklist

Acknowledgements

In addition to the co-authors, we would like to acknowledge pathologists who completed this survey (AA; BN).

Funding

This work was supported by the National Institutes of Health via grant number R01CA260830.

Footnotes

Ethics Approval

The study was approved by the University of Illinois at Urbana-Champaign Institutional Review Board (IRB #06684).

Competing Interests Statement

There are no conflicts of interest to report.

Data availability statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

1. Perlmutter MA et al. Comparison of snap freezing versus ethanol fixation for gene expression profiling of tissue specimens. J. Mol. Diagnostics 6, 371–377 (2004).
2. Ripoli FL et al. A comparison of fresh frozen vs. formalin-fixed, paraffin-embedded specimens of canine mammary tumors via branched-DNA assay. Int. J. Mol. Sci. 17 (2016).
3. Zordan A. Fluorescence in situ hybridization on formalin-fixed, paraffin-embedded tissue sections. Methods Mol. Biol. 730, 189–202 (2011).
4. Fejzo MS & Slamon DJ. Frozen tumor tissue microarray technology for analysis of tumor RNA, DNA, and proteins. Am. J. Pathol. 159, 1645–1650 (2001).
5. Jaafar H. Intra-operative frozen section consultation: concepts, applications and limitations. Malaysian Journal of Medical Sciences 13, 4–12 (2006).
6. Evans CA & Suvarna SK. Intraoperative diagnosis using the frozen section technique. J. Clin. Pathol. 59, 334 (2006).
7. Mahe E et al. Intraoperative pathology consultation: error, cause and impact. Can. J. Surg. 56 (2013).
8. Ferreiro JA, Myers JL & Bostwick DG. Accuracy of frozen section diagnosis in surgical pathology: review of a 1-year experience with 24,880 cases at Mayo Clinic Rochester. Mayo Clin. Proc. 70, 1137–1141 (1995).
9. Howanitz PJ, Hoffman GG & Zarbo RJ. The accuracy of frozen-section diagnoses in 34 hospitals. Arch. Pathol. Lab. Med. 114, 355–359 (1990).
10. White VA & Trotter MJ. Intraoperative consultation/final diagnosis correlation: relationship to tissue type and pathologic process. Arch. Pathol. Lab. Med. 132, 29–36 (2008).
11. Novis DA, Gephardt GN & Zarbo RJ. Interinstitutional comparison of frozen section consultation in small hospitals: a College of American Pathologists Q-Probes study of 18532 frozen section consultation diagnoses in 233 small hospitals. Arch. Pathol. Lab. Med. 120, 1087–1093 (1996).
12. Lam JS, Bergman J, Breda A & Schulam PG. Importance of surgical margins in the management of renal cell carcinoma. Nat. Clin. Pract. Urol. 5, 308–317 (2008).
13. Breda A et al. Positive margins in laparoscopic partial nephrectomy in 855 cases: a multi-institutional survey from the United States and Europe. J. Urol. 178, 47–50 (2007).
14. Kafka IZ & Averch TD. Intraoperative assessment of tumor resection margins. Smith's Textb. Endourol. 1097–1100 (2018). doi:10.1002/9781119245193.CH94
15. LeCun Y, Bengio Y & Hinton G. Deep learning. Nature 521, 436–444 (2015).
16. Ulyanov D, Vedaldi A & Lempitsky V. Deep image prior. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 9446–9454 (2018).
17. Nassif AB, Shahin I, Attili I, Azzeh M & Shaalan K. Speech recognition using deep neural networks: a systematic review. IEEE Access 7, 19143–19165 (2019).
18. Maqueda AI, Loquercio A, Gallego G, Garcia N & Scaramuzza D. Event-based vision meets deep learning on steering prediction for self-driving cars. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 5419–5427 (2018). doi:10.1109/CVPR.2018.00568
19. Miotto R, Wang F, Wang S, Jiang X & Dudley JT. Deep learning for healthcare: review, opportunities and challenges. Brief. Bioinform. 19, 1236–1246 (2017).
20. Goodfellow I et al. Generative adversarial nets. in Advances in Neural Information Processing Systems 2672–2680 (2014).
21. Zhu J-Y, Park T, Isola P & Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. (2017).
22. Johnson J, Alahi A & Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution. in European Conference on Computer Vision 694–711 (2016).
23. Wang T-C et al. High-resolution image synthesis and semantic manipulation with conditional GANs. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 8798–8807 (2018).
24. Ben-Ezra M, Lin Z & Wilburn B. Penrose pixels: super-resolution in the detector layout domain. in Proceedings of the IEEE International Conference on Computer Vision (2007). doi:10.1109/ICCV.2007.4408888
25. Campanella G et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
26. Jayapandian CP et al. Development and evaluation of deep learning–based segmentation of histologic structures in the kidney cortex with multiple histologic stains. Kidney Int. 99, 86–101 (2021).
27. Iizuka O et al. Deep learning models for histopathological classification of gastric and colonic epithelial tumours. Sci. Rep. 10, 1–11 (2020).
28. BenTaieb A & Hamarneh G. Adversarial stain transfer for histopathology image analysis. IEEE Trans. Med. Imaging 37, 792–802 (2018).
29. Shaban MT, Baur C, Navab N & Albarqouni S. StainGAN: stain style transfer for digital histological images. in Proceedings of the International Symposium on Biomedical Imaging 953–956 (2019).
30. Zhu J-Y, Park T, Isola P & Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. in Proceedings of the IEEE International Conference on Computer Vision 2223–2232 (2017).
31. Rivenson Y et al. Virtual histological staining of unlabelled tissue-autofluorescence images via deep learning. Nat. Biomed. Eng. (2019).
32. Rivenson Y et al. PhaseStain: the digital staining of label-free quantitative phase microscopy images using deep learning. Light Sci. Appl. 8, 23 (2019).
33. Rana A et al. Use of deep learning to develop and analyze computational hematoxylin and eosin staining of prostate core biopsy images for tumor diagnosis. JAMA Netw. Open 3, e205111 (2020).
34. Ronneberger O, Fischer P & Brox T. U-Net: convolutional networks for biomedical image segmentation. in International Conference on Medical Image Computing and Computer-Assisted Intervention 234–241 (2015).
35. He K, Zhang X, Ren S & Sun J. Deep residual learning for image recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 770–778 (2016).
36. Falahkheirkhah K, Yeh K, Mittal S, Pfister L & Bhargava R. Deep learning-based protocols to enhance infrared imaging systems. Chemom. Intell. Lab. Syst. 217, 104390 (2021).
37. Falahkheirkhah K, Yeh K, Mittal S, Pfister L & Bhargava R. A deep learning framework for morphologic detail beyond the diffraction limit in infrared spectroscopic imaging. arXiv preprint arXiv:1911.04410 (2019).
38. Kingma DP & Ba J. Adam: a method for stochastic optimization. in International Conference on Learning Representations (ICLR) (2015).
39. Schreiber D, Wong AT, Rineer J, Weedon J & Schwartz D. Prostate biopsy concordance in a large population-based sample: a Surveillance, Epidemiology and End Results study. J. Clin. Pathol. 68, 453–457 (2015).
40. Zhang R, Isola P, Efros AA, Shechtman E & Wang O. The unreasonable effectiveness of deep features as a perceptual metric. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 586–595 (2018).
41. Lang H et al. Multicenter determination of optimal interobserver agreement using the Fuhrman grading system for renal cell carcinoma. Cancer 103, 625–629 (2005).
42. McHugh ML. Interrater reliability: the kappa statistic. Biochem. Medica 22, 276–282 (2012).
43. Al-Aynati M et al. Interobserver and intraobserver variability using the Fuhrman grading system for renal cell carcinoma. Arch. Pathol. Lab. Med. 127, 593–596 (2003).
44. Lang H et al. Multicenter determination of optimal interobserver agreement using the Fuhrman grading system for renal cell carcinoma. Cancer 103, 625–629 (2005).
45. Park T, Liu M-Y, Wang T-C & Zhu J-Y. Semantic image synthesis with spatially-adaptive normalization. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2337–2346 (2019). doi:10.1109/cvpr.2019.00244
46. Bertalmio M, Sapiro G, Caselles V & Ballester C. Image inpainting. in Proceedings of ACM SIGGRAPH 417–424 (2000). doi:10.1145/344779.344972
47. Horowitz RE. Expectations and essentials for the community practice of pathology. Hum. Pathol. 37, 969–973 (2006).
48. Domen RE & Baccon J. Pathology residency training: time for a new paradigm. Hum. Pathol. 45, 1125–1129 (2014).
