BJU Int. 2024 Jul 11;135(1):133–139. doi: 10.1111/bju.16464

External validation of an artificial intelligence model for Gleason grading of prostate cancer on prostatectomy specimens

Bogdana Schmidt 1, Simon John Christoph Soerensen 2,3, Hriday P Bhambhvani 7, Richard E Fan 2, Indrani Bhattacharya 4, Moon Hyung Choi 9, Christian A Kunder 5, Chia‐Sui Kao 8, John Higgins 5, Mirabela Rusu 2,4,6, Geoffrey A Sonn 2,4
PMCID: PMC11628895  NIHMSID: NIHMS2033274  PMID: 38989669

Abstract

Objectives

To externally validate the performance of the DeepDx Prostate artificial intelligence (AI) algorithm (Deep Bio Inc., Seoul, South Korea) for Gleason grading on whole‐mount prostate histopathology, considering potential variations observed when applying AI models trained on biopsy samples to radical prostatectomy (RP) specimens due to inherent differences in tissue representation and sample size.

Materials and Methods

The commercially available DeepDx Prostate AI algorithm is an automated Gleason grading system that was previously trained on 1133 prostate core biopsy images and validated on 700 biopsy images from two institutions. We assessed the performance of the AI algorithm, which outputs Gleason patterns (3, 4, or 5), on 500 1-mm² tiles created from 150 whole-mount RP specimens from a third institution. These patterns were then grouped into grade groups (GGs) for comparison with expert pathologist assessments. The reference standard was the International Society of Urological Pathology GG, established by two experienced uropathologists with a third expert adjudicating discordant cases. The main metric was agreement with the reference standard, measured using Cohen's kappa.

Results

The agreement between the two experienced pathologists in determining GGs at the tile level had a quadratically weighted Cohen's kappa of 0.94. The agreement between the AI algorithm and the reference standard in differentiating cancerous vs non‐cancerous tissue had an unweighted Cohen's kappa of 0.91. Additionally, the AI algorithm's agreement with the reference standard in classifying tiles into GGs had a quadratically weighted Cohen's kappa of 0.89. In distinguishing cancerous vs non‐cancerous tissue, the AI algorithm achieved a sensitivity of 0.997 and specificity of 0.88; in classifying GG ≥2 vs GG 1 and non‐cancerous tissue, it demonstrated a sensitivity of 0.98 and specificity of 0.85.

Conclusion

The DeepDx Prostate AI algorithm showed excellent agreement with expert uropathologists and excellent performance in cancer identification and grading on RP specimens, despite being trained on biopsy specimens from an entirely different patient population.

Keywords: radical prostatectomy, artificial intelligence, Gleason grading, prostate cancer, whole‐mount prostatectomy


Abbreviations

AI: artificial intelligence
GG: Grade Group
GU: genitourinary
(N)(P)PV: (negative) (positive) predictive value
RP: radical prostatectomy
QWK: quadratically weighted Cohen's kappa
UWK: unweighted Cohen's kappa

Introduction

Prostate cancer is the second most frequent cancer and fifth leading cause of cancer death in men, accounting for ~1 400 000 new cases and 375 000 deaths worldwide in 2020 alone [1]. A key factor in determining the management of prostate cancer is the Gleason grade from prostate histopathology, which has proven to be a robust predictor of long‐term prognosis [2]. Accurate grade assessment is critical not only for patient counselling and decision‐making but also directly influences treatment outcomes [3].

Despite its pivotal role in the management of prostate cancer, the Gleason grading system faces challenges. Its application is laborious, time-consuming, and prone to substantial interobserver variability [4]. Furthermore, the grading of prostate cancer differs between core biopsy and prostatectomy pathology, as core biopsy specimens do not offer a complete picture of the extent of the cancer [5]. These issues emphasise the need for more consistent and faster prostate cancer grading. Applying artificial intelligence (AI) may improve the speed and accuracy of prostate cancer grading, and AI's potential to revolutionise industries reliant on visual assessment has been increasingly recognised. Examples of AI's potential to improve prostate cancer care include optimising interpretation of prostate MRI [6], prostate MRI and histopathology registration [7, 8, 9, 10, 11, 12], and identifying cancerous lesions on prostate MRI [13, 14, 15, 16, 17, 18].

Artificial intelligence algorithms have also been developed for genitourinary (GU) pathology interpretation [19, 20, 21, 22, 23, 24, 25, 26]. Among them, the Prostate cANcer graDe Assessment (PANDA) challenge, which aimed at automating Gleason grading, demonstrated the burgeoning capabilities of deep learning in this domain [27]. Another significant contribution is the DeepDx Prostate AI algorithm (Deep Bio Inc., Seoul, South Korea), an automated Gleason scoring system that employs deep neural networks to automatically grade and outline cancerous prostatic tissue [23]. The AI algorithm was trained and validated on 1133 prostate core needle biopsy images from two hospitals in South Korea, with initial results revealing almost perfect agreement with experienced pathologists (quadratically weighted Cohen's kappa of 0.91) [23].

Artificial intelligence for medical visual applications requires extensive external validation to ensure that it generalises to patient populations beyond those included in the training data [6]. This is particularly important in the context of prostate cancer, where the ability of AI trained on biopsy specimens to generalise to prostatectomies is crucial for understanding the extent of disease within the prostate. In this context, our study aimed to externally validate the performance of DeepDx Prostate on whole-mount prostate histopathology, considering potential variations observed when applying AI models trained on biopsy samples to radical prostatectomy (RP) specimens due to inherent differences in tissue representation and sample size. This product has Conformité Européenne (CE) mark approval as a commercial product in Europe. It is not approved by the United States Food and Drug Administration (FDA) and is not in clinical use in the USA. By offering a more objective, cost-effective, and efficient grading method, we envision that this system could substantially enhance Gleason grading, bolster the efficiency of pathologists, and foster advancements in prostate cancer care.

Materials and Methods

This study of 150 whole-mount RP specimens, all acquired from a single institution, was approved by the Institutional Review Board at Stanford University.

Data Collection

We acquired digital histopathology images of 150 whole-mount RP specimens from our institution, scanned at ×20 magnification using a Leica Aperio CS scanner (Leica Biosystems, Nußloch, Baden-Württemberg, Germany). These whole-mount images were split into 2048 × 2048-pixel tiles at 0.5 μm/pixel, each representing ~1 mm × 1 mm of tissue. Of these prostate tissue tiles, 500 were selected for this reader study. The tile dimensions were chosen based on feedback from an expert GU pathologist (C.A.K.) so that a full tile could be viewed at full magnification on a typical computer monitor. We extracted individual tiles and ensured that all tiles contained prostate tissue. Tile selection was manual and non-randomised to ensure a representative mix of normal, cancerous, and mixed benign and cancerous tissue. To reduce bias in the analysis, tiles were selected by an independent investigator (R.E.F.) rather than by a pathologist who would later review them. In selecting tiles, we sought a representative distribution of benign tissue vs the different Gleason patterns.
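The tiling step can be sketched in a few lines. This is a minimal numpy illustration (the function name and toy slide are hypothetical, not from the study's pipeline; production workflows typically read whole-slide formats with a library such as OpenSlide):

```python
import numpy as np

def tile_image(slide, tile_size=2048):
    """Split a whole-slide image array (H, W, 3) into non-overlapping
    tile_size x tile_size tiles, discarding partial tiles at the edges.
    At 0.5 um/pixel, a 2048-pixel tile covers ~1 mm x 1 mm of tissue."""
    h, w = slide.shape[:2]
    tiles = []
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            tiles.append(slide[y:y + tile_size, x:x + tile_size])
    return tiles

# Toy example: a 4096 x 6144 "slide" yields a 2 x 3 grid of six tiles.
slide = np.zeros((4096, 6144, 3), dtype=np.uint8)
print(len(tile_image(slide)))  # 6
```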

Prostate Tissue Evaluation

Two experienced GU pathologists (C.A.K. and C.‐S.K.) independently evaluated each tile to determine the grade group (GG). The International Society of Urological Pathology (ISUP) GG reference standard was established for each tile by a consensus of the two experienced pathologists. In cases with initial disagreement, the two pathologists discussed cases to come to an agreement. In cases where agreement could not be reached, a third GU pathology expert (J.H.) served as the tiebreaker.

The AI algorithm outlined and graded cancerous and non-cancerous tissue in each whole-slide image. After we divided the whole-slide images into tiles, the algorithm reported the percentage of each tile that was benign or Gleason pattern 3, 4, and/or 5. The AI algorithm outputs individual Gleason patterns 3–5 for each tile, which we subsequently grouped into GGs according to standard urological pathology practice for comparison with the pathologists' assessments.

Only Gleason patterns representing >5% of tissue were considered in the analysis. In the case of tiles (n = 93) where the AI algorithm reported a minor growth pattern of ≤5%, this pattern was excluded from contributing to the overall predicted GG.
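The pattern-to-grade-group mapping with the 5% minor-pattern cut-off can be expressed compactly. The sketch below assumes the standard ISUP primary + secondary rule applied at the tile level; the function name is hypothetical, and since the paper does not state how tiles with three patterns above the threshold were handled, this version simply keeps the two most abundant:

```python
def grade_group(pattern_pct, min_pct=5.0):
    """Map Gleason pattern percentages for one tile, e.g. {3: 60.0, 4: 40.0},
    to an ISUP grade group, ignoring minor patterns at or below min_pct.
    Returns 0 for benign tissue (no pattern above the threshold)."""
    patterns = sorted(
        (p for p, pct in pattern_pct.items() if pct > min_pct),
        key=lambda p: pattern_pct[p],
        reverse=True,
    )
    if not patterns:
        return 0  # benign
    primary = patterns[0]
    secondary = patterns[1] if len(patterns) > 1 else primary
    score = primary + secondary
    if score <= 6:
        return 1                        # Gleason 3+3 or less
    if score == 7:
        return 2 if primary == 3 else 3  # 3+4 vs 4+3
    if score == 8:
        return 4
    return 5                            # Gleason 9-10

print(grade_group({3: 55.0, 4: 45.0}))  # 2
print(grade_group({4: 70.0, 3: 30.0}))  # 3
print(grade_group({4: 96.0, 5: 4.0}))   # 4 (minor pattern 5 is dropped)
```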

Statistical Analysis

Grading concordance was evaluated in four scenarios: (i) inter‐pathologist in differentiating individual GGs, (ii) reference standard (consensus of expert pathologists) vs the AI algorithm for cancer detection, (iii) reference standard vs the AI algorithm in differentiating individual GGs, and (iv) reference standard vs the AI algorithm in classifying tissue into risk groups: benign, low (GG 1), intermediate (GG 2–3), and high‐risk (GG 4–5). Confusion matrices were generated for each of the analyses. For all analyses, except for binary comparisons, both an unweighted (UWK) and a quadratically weighted (QWK) Cohen's kappa and corresponding 95% CIs were calculated. A kappa of <0.00 is considered poor agreement, 0.00–0.20 slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement, and 0.81–1.00 almost perfect agreement [28].
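Although the study's analyses were run in R, the two agreement statistics are straightforward to reproduce. The following Python sketch (a hypothetical helper, not the authors' code) computes both the unweighted and quadratically weighted kappa from a confusion matrix of label counts:

```python
import numpy as np

def cohens_kappa(confusion, weights=None):
    """Cohen's kappa from a square confusion matrix of label counts.
    weights=None gives the unweighted kappa (UWK); weights='quadratic'
    gives the quadratically weighted kappa (QWK) suited to ordinal GGs."""
    confusion = confusion.astype(float)
    n = confusion.shape[0]
    i, j = np.indices((n, n))
    if weights == "quadratic":
        w = (i - j) ** 2 / (n - 1) ** 2   # disagreement cost grows quadratically
    else:
        w = (i != j).astype(float)        # all disagreements cost the same
    total = confusion.sum()
    observed = confusion / total
    expected = np.outer(confusion.sum(axis=1), confusion.sum(axis=0)) / total ** 2
    return 1.0 - (w * observed).sum() / (w * expected).sum()

# Two raters agreeing on 90 of 100 tiles, split evenly across two classes:
conf = np.array([[45, 5], [5, 45]])
print(round(cohens_kappa(conf), 2))  # 0.8
```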

We tested the AI algorithm's performance at three clinically relevant cut-offs: distinguishing malignant vs benign tiles, differentiating GG ≥2 vs GG 1 and benign tiles, and discerning GG ≥3 vs GG 1–2 and benign tiles. For each cut-off we calculated accuracy, defined as (true positive + true negative)/(true positive + false positive + true negative + false negative), as well as sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
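These definitions can be checked numerically. The sketch below (hypothetical helper) computes the five metrics from a 2 × 2 confusion; the example counts are inferred from the reported totals (350 positive, 150 negative tiles), the single false negative mentioned in the Discussion, and the reported specificity of 0.88 (132/150 true negatives), so they are illustrative rather than taken from the study's raw data:

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV, and NPV as defined in the text."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Malignant vs benign: 350 positive tiles with 1 false negative,
# 150 negative tiles with 18 false positives (inferred counts).
m = binary_metrics(tp=349, fp=18, tn=132, fn=1)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.962, 'sensitivity': 0.997, 'specificity': 0.88, 'ppv': 0.951, 'npv': 0.992}
```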

All statistical analyses were performed in R 4.0.5 (R Foundation for Statistical Computing, Vienna, Austria).

Results

Inter‐Pathologist and DeepDx Prostate AI Agreement in Gleason Grading

In the analysis of 500 tiles, initial agreement in discerning individual GGs occurred between expert pathologists on 389 (77.8%) tiles. The agreement between the two experienced pathologists in determining GGs at the tile level had a QWK of 0.94 (95% CI 0.92–0.96) and UWK of 0.73 (95% CI 0.68–0.77) (Fig. 1a). After initial agreement between experts on GGs in 389 tiles, consensus was reached on a further 53 tiles through discussion. Disagreements on the remaining 58 tiles were settled by a third pathologist. The final consensus of all 500 tiles became the pathologist reference standard. In all, 150 tiles were deemed to be benign, 84 were GG 1, 57 were GG 2, 81 were GG 3, 57 were GG 4, and 71 were GG 5 by the pathologists.

Fig. 1.

Fig. 1

Agreement in cancer diagnosis and grading between expert GU pathologists and the DeepDx Prostate AI algorithm. (a) Inter-pathologist agreement on GG determination, exhibiting almost perfect agreement with a QWK of 0.94 and a UWK of 0.73. (b) Almost perfect agreement in identifying cancerous vs non-cancerous tissue between the reference standard and the AI algorithm, with a UWK of 0.91. (c) Almost perfect agreement between the reference standard and the AI algorithm in distinguishing individual GGs, represented by a QWK of 0.89 and a UWK of 0.56. (d) Almost perfect agreement in risk group classification between the reference standard and the AI algorithm, characterised by a QWK of 0.89 and a UWK of 0.73.

The agreement between the AI algorithm and the reference standard in differentiating cancerous vs non‐cancerous tissue had a UWK of 0.91 (95% CI 0.87–0.95) (Fig. 1b). For evaluation across GGs, Gleason patterns output by the AI were grouped into GGs, achieving a QWK of 0.89 (95% CI 0.87–0.92) and UWK of 0.56 (95% CI 0.51–0.61) in comparison with the consensus GGs established by the pathologists (Fig. 1c).

The agreement between the AI algorithm and the reference standard in categorising tiles into risk groups of benign, low (GG 1), intermediate (GG 2–3), and high‐risk (GG 4–5) had QWK of 0.89 (95% CI 0.86–0.92) and UWK of 0.73 (95% CI 0.69–0.78) (Fig. 1d). Figure 2 shows an example whole‐slide image, with a corresponding tile graded by both the AI algorithm and a pathologist.

Fig. 2.

Fig. 2

The DeepDx Prostate AI algorithm shows agreement with an expert GU pathologist in detecting, localising, and grading prostate cancer on whole-mount histopathology images. (a) The cancer extent predicted by the AI algorithm (green and yellow pixels) overlaps closely with the region outlined by the expert GU pathologist. (b) In magnified views of the cancer region, (c) the AI algorithm and an expert GU pathologist show high concordance in Gleason grading. Note that the AI algorithm localises Gleason patterns at the gland level, an intractable task for a human pathologist.

DeepDx Prostate AI Performance in Prostate Cancer GG Classification

For the differentiation of malignant vs benign cases, there was an accuracy of 0.96, sensitivity of 0.997, specificity of 0.88, NPV of 0.99, and PPV of 0.95. For the distinction between GG ≥2 vs GG 1 and benign, the performance metrics were an accuracy of 0.92, sensitivity of 0.98, specificity of 0.85, NPV of 0.97, and PPV of 0.88. When separating GG ≥3 from GG 1–2 and benign, the observed values were an accuracy of 0.92, sensitivity of 0.95, specificity of 0.90, NPV of 0.96, and PPV of 0.87 (Table 1).

Table 1.

Performance metrics of the DeepDx Prostate AI algorithm for clinically relevant classifications.

Positive tiles, n Negative tiles, n Accuracy Sensitivity Specificity NPV PPV
Malignant vs benign 350 150 0.96 0.997 0.88 0.99 0.95
GG ≥2 vs GG 1 and benign 266 234 0.92 0.98 0.85 0.97 0.88
GG ≥3 vs GG 1–2 and benign 209 291 0.92 0.95 0.90 0.96 0.87

This table presents the classification performance of the AI algorithm for three different scenarios: distinguishing malignant vs benign tiles, differentiating GG ≥2 vs GG 1 and benign tiles, and discerning GG ≥3 vs GG 1–2 and benign tiles.

Discussion

Our study has four key findings. First, despite the algorithm being trained on biopsy images from a different patient population and institutions, it showed robust performance when validated on a separate set of whole‐mount RP specimens. The AI algorithm's ability to adapt to a different cohort and from biopsy to RP data without fine‐tuning enhances its clinical utility, offering a practical tool for physicians.

The AI algorithm provided Gleason patterns, which were then grouped into GGs to enable a direct comparison with the GGs determined by the pathologists. The AI algorithm demonstrated strong concordance with expert GU pathologists in differentiating cancerous from non-cancerous tissue (UWK = 0.91, 95% CI 0.87–0.95), in classifying individual GGs (QWK = 0.89, 95% CI 0.87–0.92; UWK = 0.56, 95% CI 0.51–0.61), and in categorising tiles into risk groups of benign, low (GG 1), intermediate (GG 2–3), and high-risk (GG 4–5) (QWK = 0.89, 95% CI 0.86–0.92; UWK = 0.73, 95% CI 0.69–0.78). This indicates that the AI algorithm generalises well across different datasets and tissue types.

Second, the AI algorithm's capability to distinguish malignant vs benign cases demonstrated near‐perfect sensitivity at 0.997 with a single false‐negative and a high specificity of 0.88. When identifying GG ≥2 vs GG 1 and benign, the sensitivity was 0.98 with specificity at 0.85. For discerning GG ≥3 vs GG 1–2 and benign, sensitivity was 0.95 with specificity of 0.90. These capabilities may reduce pathologists’ workload, improve inter‐observer variability, and facilitate quicker, more efficient data labelling for further AI algorithm development.

Third, AI algorithms such as DeepDx Prostate have the potential to rapidly grade whole-slide images from whole-mount RP specimens. Whole-mount RP slides are twice as wide as regular pathology slides while maintaining the same length, resulting in whole-slide images of 7 gigapixels. With the algorithm, a whole-slide image could be annotated and graded in 10 min. This level of speed and efficiency is not possible for humans and could facilitate more quantitative pathology, rather than just grade grouping. Studies, including work by Sauter et al. [29], have demonstrated that more subtle distinctions within GGs, such as distinguishing GG 2 and GG 3 tumours based on the predominant Gleason pattern, can have significant prognostic implications, further supporting the shift towards a more nuanced, quantitative understanding of prostate cancer pathology. The AI algorithm's rapid grading and annotation could facilitate this shift, enabling more accurate risk stratification and labelling for other AI models based on prostate histopathology.

Fourth, the ability of the AI algorithm to quantitatively analyse Gleason pattern proportions and segment those areas potentially enhances the accuracy and prognostic discrimination within GGs. Notable work by Sauter et al. [29] and Kachanov et al. [30] demonstrates the value of quantifying Gleason pattern 4 fractions for prognostication and risk stratification. In a clinical setting, pathologists only roughly estimate the proportions of each Gleason pattern in prostate cancer. However, AI can directly and precisely indicate the proportions of Gleason patterns based on segmentation, which is expected to provide greater assistance in risk stratification.
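As a concrete illustration of such quantification, the Gleason pattern 4 fraction can be read directly off a per-pixel segmentation. The label map below is made up for illustration and does not reflect the AI algorithm's actual output format:

```python
import numpy as np

# Hypothetical per-pixel label map: 0 = benign, 3/4/5 = Gleason pattern.
labels = np.array([
    [0, 0, 3, 3],
    [0, 4, 4, 3],
    [0, 4, 3, 3],
    [0, 0, 0, 3],
])
cancer_pixels = labels >= 3
# Fraction of pattern-4 pixels among all cancerous pixels:
gp4_fraction = (labels == 4).sum() / cancer_pixels.sum()
print(round(gp4_fraction, 2))  # 0.33
```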

Our study has noteworthy limitations. First, because of inter-observer variability in Gleason grading across pathologists, a reference standard can be difficult to establish; to address this, we used a consensus method between multiple GU pathologists. Second, the validation dataset originates from a single centre, which could limit the generalisability of our findings. However, the robustness of the results from this independent validation suggests strong potential for broad generalisation, as our single centre is a tertiary referral hospital seeing a breadth of prostate cancer pathology; this indicates that the model is likely to perform well when applied to additional datasets. Third, while our focus was on predicting GGs, we did not aim to predict more meaningful outcomes such as biochemical recurrence, metastasis, or prostate cancer death in men undergoing RP, because such data were unavailable. Fourth, the clinical utility of our findings is somewhat limited because we did not analyse how the AI algorithm's outputs translate into real-world clinical benefit; building on the demonstrated performance of the algorithm, further research is necessary to investigate this. We plan to compare Gleason grading of the entire RP specimen to the pathology report and publish these data in a separate manuscript. Fifth, manual selection of tiles may introduce selection bias, potentially affecting the AI model's performance estimates; however, selections were made to ensure a balanced pathology representation reflecting the diverse tissue morphologies of prostate cancer.

Notwithstanding these limitations, the DeepDx Prostate algorithm showed promise in this study. The algorithm's ability to identify and grade cancerous lesions on RP specimens could lead to near‐term clinical adoption. The AI algorithm demonstrated robust performance, not only in matching the diagnostic accuracy of expert GU pathologists but also in its ability to distinguish among various clinically relevant cut‐offs. Most importantly, despite being trained on a completely different dataset of biopsy tissues, the model showed a high degree of generalisability when validated on a separate set of whole‐mount RP specimens. This adaptability is critical for the wide‐scale implementation of such AI algorithms across various healthcare settings. Furthermore, the robust performance of the AI algorithm makes it a valuable tool for data labelling, thereby paving the way for additional AI research projects in prostate cancer.

Conclusion

The DeepDx Prostate AI algorithm is an accurate tool for identifying and grading prostate cancer on digital histopathology images of whole‐mount RP specimens, demonstrating almost perfect concordance with expert GU pathologists and impressive performance in various clinically relevant tasks. AI digital pathology algorithms have great promise for clinical use in grading biopsy and RP specimens.

Disclosure of Interests

All support for the present manuscript: this work was supported by the Departments of Radiology and Urology at Stanford University and the National Cancer Institute of the National Institutes of Health under Award Number R37CA260346. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Dr Bhambhvani was previously an employee of Valar Labs (January 2022–October 2022). Dr Rusu received consultant fees from Roche Molecular Systems until April 2023, unrelated to this research. Dr Rusu received research grants from GE Healthcare and Philips Healthcare. There are no financial conflicts of interest between any company and the authors. All authors certify that they have disclosed all relationships that could be viewed as presenting a potential conflict of interest.

Acknowledgements

We thank Dr Beatrice Knudsen for her assistance in discussing the study and editing the manuscript. We also thank Joonyoung Cho (jycho@deepbio.co.kr) and Sun Woo Kim (swkim@deepbio.co.kr) for running the DeepDx algorithm on the tiles.

B.S. and S.J.C.S. shared first authorship.

M.R. and G.A.S. shared senior authorship.

References

1. Sung H, Ferlay J, Siegel RL et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021; 71: 209–249
2. Egevad L, Granfors T, Karlberg L, Bergh A, Stattin P. Prognostic value of the Gleason score in prostate cancer. BJU Int 2002; 89: 538–542
3. Albertsen PC, Fryback DG, Storer BE, Kolon TF, Fine J. Long-term survival among men with conservatively treated localized prostate cancer. JAMA 1995; 274: 626–631
4. Allsbrook WC Jr, Mangold KA, Johnson MH et al. Interobserver reproducibility of Gleason grading of prostatic carcinoma: urologic pathologists. Hum Pathol 2001; 32: 74–80
5. Priester A, Natarajan S, Khoshnoodi P et al. Magnetic resonance imaging underestimation of prostate cancer geometry: use of patient specific molds to correlate images with whole mount pathology. J Urol 2017; 197: 320–326
6. Soerensen SJC, Fan RE, Seetharaman A et al. Deep learning improves speed and accuracy of prostate gland segmentations on magnetic resonance imaging for targeted biopsy. J Urol 2021; 206: 604–612
7. Shao W, Banh L, Kunder CA et al. ProsRegNet: a deep learning framework for registration of MRI and histopathology images of the prostate. Med Image Anal 2021; 68: 101919
8. Sood RR, Shao W, Kunder C et al. 3D registration of pre-surgical prostate MRI and histopathology images via super-resolution volume reconstruction. Med Image Anal 2021; 69: 101957
9. Rusu M, Shao W, Kunder CA et al. Registration of presurgical MRI and histopathology images from radical prostatectomy via RAPSODI. Med Phys 2020; 47: 4177–4188
10. Shao W, Bhattacharya I, Soerensen SJ et al. Weakly supervised registration of prostate MRI and histopathology images. In de Bruijne M, Cattin PC, Cotin S et al. eds, Medical Image Computing and Computer Assisted Intervention – MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part IV. Cham: Springer, 2021: 98–107
11. Li L, Pahwa S, Penzias G et al. Co-registration of ex vivo surgical histopathology and in vivo T2 weighted MRI of the prostate via multi-scale spectral embedding representation. Sci Rep 2017; 7: 8717
12. Sandgren K, Nilsson E, Keeratijarut Lindberg A et al. Registration of histopathology to magnetic resonance imaging of prostate cancer. Phys Imaging Radiat Oncol 2021; 18: 19–25
13. Seetharaman A, Bhattacharya I, Chen LC et al. Automated detection of aggressive and indolent prostate cancer on magnetic resonance imaging. Med Phys 2021; 48: 2960–2972
14. Bhattacharya I, Seetharaman A, Shao W et al. CorrSigNet: learning CORRelated prostate cancer SIGnatures from radiology and pathology images for improved computer aided diagnosis. In Martel AL, Abolmaesumi P, Stoyanov D et al. eds, Medical Image Computing and Computer Assisted Intervention – MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part II. Cham: Springer, 2020: 315–325
15. Bhattacharya I, Seetharaman A, Kunder C et al. Selective identification and localization of indolent and aggressive prostate cancers via CorrSigNIA: an MRI-pathology correlation and deep learning framework. Med Image Anal 2022; 75: 102288
16. Bhattacharya I, Shao W, Soerensen SJ et al. Integrating zonal priors and pathomic MRI biomarkers for improved aggressive prostate cancer detection on MRI. In Medical Imaging 2022: Computer-Aided Diagnosis. Bellingham: SPIE, 2022: 192–198
17. Bosma JS, Saha A, Hosseinzadeh M, Slootweg I, de Rooij M, Huisman H. Semi-supervised learning with report-guided pseudo labels for deep learning-based prostate cancer detection using biparametric MRI. Radiol Artif Intell 2023; 5: e230031
18. Saha A, Hosseinzadeh M, Huisman H. End-to-end prostate cancer detection in bpMRI via 3D CNNs: effects of attention mechanisms, clinical priori and decoupled false positive reduction. Med Image Anal 2021; 73: 102155
19. Acs B, Rantalainen M, Hartman J. Artificial intelligence as the next step towards precision pathology. J Intern Med 2020; 288: 62–81
20. Bulten W, Balkenhol M, Belinga JJA et al. Artificial intelligence assistance significantly improves Gleason grading of prostate biopsies by pathologists. Mod Pathol 2021; 34: 660–671
21. Nir G, Karimi D, Goldenberg SL et al. Comparison of artificial intelligence techniques to evaluate performance of a classifier for automatic grading of prostate cancer from digitized histopathologic images. JAMA Netw Open 2019; 2: e190442
22. Bulten W, Pinckaers H, van Boven H et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol 2020; 21: 233–241
23. Ryu HS, Jin MS, Park JH et al. Automated Gleason scoring and tumor quantification in prostate core needle biopsy images using deep neural networks and its comparison with pathologist-based assessment. Cancers (Basel) 2019; 11: 1860
24. Melo PAS, Estivallet CLN, Srougi M, Nahas WC, Leite KRM. Detecting and grading prostate cancer in radical prostatectomy specimens through deep learning techniques. Clinics (Sao Paulo) 2021; 76: e3198
25. Nagpal K, Foote D, Liu Y et al. Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ Digit Med 2019; 2: 48
26. Nagpal K, Foote D, Tan F et al. Development and validation of a deep learning algorithm for Gleason grading of prostate cancer from biopsy specimens. JAMA Oncol 2020; 6: 1372–1380
27. Bulten W, Kartasalo K, Chen PHC et al. Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nat Med 2022; 28: 154–163
28. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159–174
29. Sauter G, Steurer S, Clauditz TS et al. Clinical utility of quantitative Gleason grading in prostate biopsies and prostatectomy specimens. Eur Urol 2016; 69: 592–598
30. Kachanov M, Budäus L, Beyersdorff D et al. Targeted multiparametric magnetic resonance imaging/ultrasound fusion biopsy for quantitative Gleason 4 grading prediction in radical prostatectomy specimens: implications for active surveillance candidate selection. Eur Urol Focus 2023; 9: 303–308

Articles from BJU International are provided here courtesy of Wiley
