Abstract
Objectives
To develop and validate an artificial intelligence algorithm to decide on the necessity of dynamic contrast-enhanced sequences (DCE) in prostate MRI.
Methods
This study was approved by the institutional review board and requirement for study-specific informed consent was waived. A convolutional neural network (CNN) was developed on 300 prostate MRI examinations. Consensus of two expert readers on the necessity of DCE acted as reference standard. The CNN was validated in a separate cohort of 100 prostate MRI examinations from the same vendor and 31 examinations from a different vendor. Sensitivity/specificity were calculated using ROC curve analysis and results were compared to decisions made by a radiology technician.
Results
The CNN reached a sensitivity of 94.4% and specificity of 68.8% (AUC: 0.88) for the necessity of DCE, correctly assigning 44%/34% of patients to a biparametric/multiparametric protocol. In 2% of all patients, the CNN incorrectly decided on omitting DCE. With a technician reaching a sensitivity of 63.9% and specificity of 89.1%, the use of the CNN would allow for an increase in sensitivity of 30.5 percentage points. The CNN achieved an AUC of 0.73 in a set of examinations from a different vendor.
Conclusions
The CNN would have correctly assigned 78% of patients to a biparametric or multiparametric protocol, with only 2% of all patients requiring re-examination to add DCE sequences. Integrating this CNN in clinical routine could render the requirement for on-table monitoring obsolete by performing contrast-enhanced MRI only when needed.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13244-021-01058-7.
Keywords: Multiparametric MRI, Prostate cancer, Artificial Intelligence
Key points
AI helps in automated decision-making between biparametric and multiparametric prostate MRI protocols.
AI would have correctly assigned 78% of patients to a biparametric/multiparametric protocol.
Re-examinations would have only been necessary in 2% of all patients.
The performance of the trained network differed slightly between MRIs from different vendors.
Background
Prostate MRI has shown considerable clinical value in detection and staging of prostate cancer and is part of clinical routine in most institutions worldwide [1–4]. Conventionally, “multiparametric” prostate MRI consists of high-resolution T2-weighted, diffusion-weighted and dynamic contrast-enhanced (DCE) sequences [5]. Recently, the use of an abbreviated MRI protocol without application of a contrast agent (termed “biparametric MRI”) has been proposed and several investigations have reported a comparable performance of the protocol in cancer detection compared to the complete multiparametric protocol [6–11]. Omission of DCE-MRI from the acquisition results in shorter examinations, an optimization much needed in times of increasing demand for prostate MRI. Furthermore, biparametric MRI of the prostate avoids any contrast agent side effects, improves cost-effectiveness, and optimizes general workflow in the radiology department. In addition, DCE sequences are deemed of diagnostic quality in only a subset of patients, as recently reported by the PRECISION study group [12]. However, as DCE-MRI is known to reduce the number of indeterminate lesions, and to be of particular value in examinations with poor image quality or for the less-experienced radiologist [5, 13, 14], on-table monitoring and an individualized per-patient decision on DCE are currently proposed by the PI-RADS committee [15].
Ideally, the decision to inject a contrast agent should therefore be made by an experienced radiologist on a per-scan and ad-hoc basis, and its application should be limited to those cases in which it is deemed to improve clinical decision-making. However, given the expected rise in examinations due to the inclusion of prostate MRI into national and international urologic guidelines, such an individual and timely decision on every prostate MRI may no longer be feasible in a clinical setting for most institutions.
Therefore, we sought to develop, train, and validate a convolutional neural network (CNN) that would automatically identify patients in whom acquisition of a DCE sequence would be beneficial. This would allow for shorter biparametric examinations for many patients and increase patient safety by the omission of contrast media injection while simultaneously avoiding a decreased diagnostic accuracy in those patients who would benefit from a complete multiparametric MRI protocol.
Materials and methods
Patient cohorts and image analysis
This study was approved by the institutional review board and the requirement for study-specific informed consent was waived. A retrospective search was performed on our prospectively maintained institutional database from 02/01/2018 to 11/30/2019 for consecutive patients undergoing multiparametric prostate MRI (in accordance with PI-RADS guidelines [5]) for suspicion of prostate cancer. Three distinct cohorts were formed: (1) a group of 300 multiparametric prostate MRI examinations for training of the neural network (“training set”) (see Additional file 1: Appendix S1), (2) a group of 100 multiparametric prostate MRI examinations for validation of the trained network (“validation set”) and (3) a group of 31 patients undergoing prostate MRI on a scanner from a different vendor (“different vendor set”). The MRI examinations included in the “training set” and “validation set” were performed in 363 patients (age at time of MRI: 64.4 years, mean PSA at time of MRI: 8.33 ng/ml). All MRI scans of the first two groups were performed on Siemens Skyra scanners (Siemens Healthineers, Erlangen, Germany) at a field strength of 3 Tesla using a 60 Ch or 18 Ch phased-array body coil, while the “different vendor set” underwent examinations on a GE Discovery MR750w (GE Healthcare, Chicago, Illinois, USA) using a 16 Ch phased-array body coil at a field strength of 3 Tesla. Typical MRI parameters for axial T2-weighted and diffusion-weighted sequences can be found in Additional file 1: Table S1. Two board-certified radiologists with 10 and 7 years of experience in dedicated prostate imaging (‘expert radiologists’ according to ESUR/ESUI consensus [16], O.F.D. and A.M.H., R1 and R2) independently reviewed all examinations of the “training set”, “validation set” and “different vendor set” and scored whether DCE sequences would have been beneficial for diagnosis (regardless of the reason, e.g. distortions by rectal gas, low signal-to-noise ratio, etc.).
After completion of readings a consensus was reached by the two readers by reviewing all examinations with discrepant decisions. The resulting consensus on the “training set” was used as reference standard for training of the CNN. In addition, the “validation set” was reviewed by a technician (R3) with daily practice in acquiring prostate MRI examinations and again the necessity of DCE sequences was noted.
Training and validation of a neural network
A detailed account including technical specifications of the neural network can be found in Additional file 1: Appendix S1. The accompanying PyTorch code and the training scheme can be found at https://github.com/enderkon/ProstateQC.git.
Development of the convolutional neural network was based on T2-weighted axial images and corresponding diffusion-weighted images (b values of 100, 600 and 1000 s/mm²). After exporting the anonymized image data of these sequences from PACS, images were pre-processed to standardize pixel size across all images. The convolutional neural network was trained on a set of 300 MRI examinations (“training set”) with the consensus on desirability of DCE by two experienced radiologists as reference standard and then applied to a separate set of 100 MRI examinations not used for training (“validation set”). Separate branches for anatomical (T2-weighted) and diffusion-weighted images were created, whose outputs were then combined to yield a single probability (ranging from 0 to 1, with 0 meaning “DCE not desirable” and 1 meaning “DCE highly desirable”). For training, 10 random experiments were performed, each with the training set being split randomly into training (80%), validation (5%) and testing (15%) partitions: the training partition was used to train the network, the validation partition to decide when to stop training, and the testing partition to monitor the error. The model was trained for 30 epochs in each random experiment, and the iteration that led to the lowest classification error and cross-entropy loss on the validation partition was selected as the final model for that experiment. When analyzing validation data, all 10 trained networks were applied separately, and results were aggregated through averaging to yield the final prediction for each image. The trained network was subsequently validated in the “validation set” with images of 100 additional patients to evaluate its sensitivity and specificity and in a separate set of 31 prostate MRIs performed on a scanner from a different MRI vendor, see Fig. 1.
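The splitting and averaging scheme described above can be illustrated with a minimal Python sketch. This is not the authors' implementation (which is available in the repository linked above); the function names `split_indices` and `ensemble_probability`, the use of seeds 0 to 9, and the representation of a trained model as a plain callable returning a probability are assumptions made purely for illustration.

```python
import random

def split_indices(n_exams, seed, fractions=(0.80, 0.05, 0.15)):
    """Randomly partition exam indices into training/validation/testing parts.

    Hypothetical helper: the 80/5/15 proportions follow the text, the rest
    (seeding, rounding) is an assumption.
    """
    rng = random.Random(seed)
    indices = list(range(n_exams))
    rng.shuffle(indices)
    n_train = round(fractions[0] * n_exams)
    n_val = round(fractions[1] * n_exams)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, about 15%
    return train, val, test

def ensemble_probability(models, exam):
    """Average the DCE probabilities predicted by all trained models."""
    probabilities = [model(exam) for model in models]
    return sum(probabilities) / len(probabilities)

# Ten random experiments on the 300-examination training set.
splits = [split_indices(300, seed) for seed in range(10)]
train, val, test = splits[0]
print(len(train), len(val), len(test))
```

With 300 examinations, each experiment uses 240 examinations for training, 15 for early stopping and 45 for monitoring the error. Averaging the ten resulting models is one simple way to reduce the dependence of the final prediction on any single random split.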
Statistics
To assess inter-reader agreement, Cohen’s kappa was estimated and interpreted as proposed by Landis and Koch [17] and as follows: excellent agreement > 0.75, good agreement 0.59–0.75, fair agreement 0.40–0.58, poor agreement < 0.4. Diagnostic accuracy was assessed by the area under the curve (AUC) of a receiver operating characteristic (ROC) analysis, and the best cut-off value was estimated by maximizing the Youden index (J = sensitivity + specificity − 1).
All statistical analyses were performed using IBM SPSS Statistics 26 (IBM Inc., Armonk, USA) and MedCalc 18.2.1 (MedCalc Software Ltd, Ostend, Belgium).
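For readers who wish to reproduce these statistics outside of the commercial packages named above, the following Python sketch shows how Cohen’s kappa for two binary raters and a Youden-optimal cut-off could be computed. It is an illustration only, not the procedure used in this study; the function names are hypothetical, ratings are assumed to be coded as 0/1, and both classes are assumed to be present (otherwise the divisions are undefined).

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters giving binary (0/1) ratings."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    p_a = sum(ratings_a) / n  # fraction rated 1 by rater A
    p_b = sum(ratings_b) / n  # fraction rated 1 by rater B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    return (observed - expected) / (1 - expected)

def youden_cutoff(probabilities, labels):
    """Return (cutoff, J) for the rule 'probability > cutoff' with maximal J."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(probabilities)):
        tp = sum(p > cutoff and y == 1 for p, y in zip(probabilities, labels))
        tn = sum(p <= cutoff and y == 0 for p, y in zip(probabilities, labels))
        j = tp / positives + tn / negatives - 1  # sensitivity + specificity - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Toy data (not study data): two raters and four predicted probabilities.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 1, 0]))
print(youden_cutoff([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))
```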
Results
Inter-reader agreement
R1 and R2 agreed on the necessity for DCE sequences in 267/300 (89%) cases of the “training set” (kappa: 0.76), 89/100 (89%) cases of the “validation set” (kappa: 0.76) and in 26/31 cases of the “different vendor set” (kappa: 0.64). In the remaining examinations, a consensus reading was needed to complete the standard of reference. Agreement between R3 (technician) and the reference standard (consensus of R1 and R2) for the “validation set” was fair with a kappa of 0.55 (see Table 1).
Table 1.
| | DCE necessary | DCE not necessary | Agreement | AUC (95% CI) | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Consensus (R1/R2) | 36/100 (36%) | 64/100 (64%) | Ref | Ref | Ref | Ref |
| Technician (R3) | 70/100 (70%) | 30/100 (30%) | 0.55 | 0.765 (0.669–0.844) | 63.9% | 89.1% |
| Artificial Intelligence (AI) | 56/100 (56%) | 44/100 (44%) | 0.54 | 0.881 (0.801–0.937) | 94.4% | 68.8% |
Agreement: kappa with Consensus as reference standard; AUC: Area-under-the-curve; Sensitivity and Specificity of the artificial intelligence based on ROC analysis with a maximized Youden index and high sensitivity to avoid re-examinations
Diagnostic accuracy of the neural network
The final neural network showed a sensitivity of 94.4% and specificity of 68.8% in the “validation set” when maximizing the Youden index (AUC: 0.88, J: 0.63) in ROC analysis (see Fig. 2a). When aiming for a low rate of false negatives (a low re-examination rate for adding DCE sequences), this would result in 2% of all patients (and 2/36 patients with a need for DCE, 5.6%) needing a supplementary examination including the injection of a contrast agent (false negatives), while 44% of patients correctly underwent biparametric and 34% of patients correctly underwent multiparametric MRI (see Table 2). In 20% of patients, the CNN decided to perform DCE while the radiologists did not deem DCE to be necessary (false positives).
Table 2.
| | Radiologists: DCE necessary | Radiologists: DCE not necessary |
|---|---|---|
| AI: DCE necessary | 34/100 (34%) | 20/100 (20%) |
| AI: DCE not necessary | 2/100 (2%) | 44/100 (44%) |
| | Sensitivity: 94.4% | Specificity: 68.8% |
“DCE necessary/not necessary” is based on the consensus of two expert radiologists
AI Artificial Intelligence
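The figures reported for the validation set follow directly from the four cells of Table 2. The short Python sketch below recomputes them; the variable names are chosen for illustration only.

```python
# Counts taken from Table 2 (validation set, n = 100).
tp, fp = 34, 20  # AI: DCE necessary (radiologists: necessary / not necessary)
fn, tn = 2, 44   # AI: DCE not necessary (radiologists: necessary / not necessary)

total = tp + fp + fn + tn
sensitivity = tp / (tp + fn)              # 34/36
specificity = tn / (tn + fp)              # 44/64
youden_j = sensitivity + specificity - 1
correctly_assigned = (tp + tn) / total    # correct biparametric + multiparametric
missed_among_needed = fn / (tp + fn)      # 2/36

print(f"sensitivity:          {sensitivity:.3f}")
print(f"specificity:          {specificity:.3f}")
print(f"Youden index J:       {youden_j:.2f}")
print(f"correctly assigned:   {correctly_assigned:.2f}")
print(f"missed if DCE needed: {missed_among_needed:.3f}")
```

These values correspond to the reported sensitivity of 94.4%, specificity of 68.8%, J of 0.63, the 78% of patients correctly assigned to a protocol, and the 5.6% of patients with a need for DCE in whom it would have been omitted.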
With R3 (technician) reaching a sensitivity of 63.9% and specificity of 89.1%, the use of the neural network would allow for an increase in sensitivity of 30.5 percentage points, albeit at a lower specificity (see Figs. 2b and 3).
When applying the trained neural networks to a set of MRI examinations from a different vendor, ROC analysis with maximized Youden index (AUC: 0.73, J: 0.42) demonstrated a sensitivity of 100% and a specificity of 42.1% (see Table 3).
Table 3.
| | AUC (95% CI) | Criterion | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| AI validation set | 0.881 (0.801–0.937) | > 0.221 | 94.4 | 68.8 |
| AI different vendor set | 0.726 (0.537–0.870) | > 0.171 | 100.0 | 42.1 |
Discussion
Due to its necessity in order to perform targeted biopsies, the widespread integration of prostate MRI in the diagnostic workup of patients with suspected prostate cancer will likely lead to an increased number of examinations to be performed by radiology departments in the near future [1]. This represents a challenge, as it not only requires improving the radiological workflow [18], but also ensuring optimal image quality of the examinations, as stressed by the recently published PI-QUAL scoring system from the PRECISION trial group [12, 19]. We developed and validated a CNN to independently decide on the necessity of dynamic contrast-enhanced sequences (DCE) with high accuracy and with a very low false negative rate (i.e. a low rate of patients who falsely did not undergo DCE).
Multiparametric prostate MRI includes T2-weighted, diffusion-weighted and dynamic contrast-enhanced sequences. Recently, several authors suggested that dynamic contrast-enhanced sequences could be omitted from the MRI protocol (“biparametric MRI”), thus shortening examination times and avoiding any potential unwanted side effects from the contrast agent [6, 7]. However, it is also known that contrast-enhanced sequences can be of value in a subset of patients undergoing prostate MRI [20, 21] and that they can be of particular value for the inexperienced radiologist and in examinations with artifacts (e.g., from hip prostheses) or poor image quality [14]. While previous papers focused on the use of AI in prostate MRI to improve planning and image quality of the examinations [18] or tumor detection [22, 23], we aimed to harness the benefits of artificial intelligence to improve quality control and workflow-relevant decision making.
Ideally, the decision on whether to perform DCE should be made on a per-patient and ad-hoc basis, as currently proposed by the PI-RADS committee [15]. However, applying on-table monitoring and having a radiologist render this decision on a per-case basis is often not feasible in clinical routine, and re-examinations should be avoided so as not to forfeit the time savings gained from not performing DCE sequences in every patient. In this study we sought to delegate the task of deciding between a biparametric and multiparametric protocol to artificial intelligence, which would allow for real-time decision-making and a straightforward implementation into the clinical workflow. The trained neural network was able to correctly decide on contrast agent application with a very high sensitivity of > 94%. This approach would have correctly assigned 44% of patients to a biparametric protocol, thus sparing them from contrast injection, and 34% to a standard multiparametric MRI. Twenty percent of patients would have undergone multiparametric instead of biparametric MRI based on the algorithm’s decision. At the same time, only 2% of all patients (5.6% in the subgroup of patients with the need for DCE) would not have received DCE when expert radiologists would have deemed it necessary. Depending on the clinical question posed in these patients, they could be scheduled for a re-examination. Finally, the neural network outperformed the radiology technician acquiring the images (reader 3 in this study) in terms of AUC (0.88 vs. 0.77) and sensitivity, which suggests that having the neural network decide on DCE would be beneficial in clinical routine. This would particularly apply in a setting where the biparametric MRI protocol is used as an institutional standard for detection of target lesions on prostate MRI.
The AI could automatically detect a scan that might require the acquisition of DCE sequences with high accuracy and alert the attending radiologist (who still has to supervise the application of contrast agent due to legal reasons and the possibility for adverse reactions).
When applying the neural network trained on in-house scanners to a set of MRI examinations performed on a different scanner, a sensitivity of 100% was achieved, albeit at a lower specificity of 42.1%. While this is certainly an encouraging result and shows that the network is not restricted to images from scanners of a certain manufacturer, a dedicated training set based on scans from this different vendor would likely further increase the accuracy of the neural network.
In addition, our approach could be improved and refined in a few ways. Though the testing set for the neural network consisted of sequentially acquired clinical MRI examinations, the artificial intelligence requires validation in clinical routine in the future. In this study, the decision rendered by the neural network was regarded as dichotomous. However, it would be possible to define a range of probability values in which the AI is unsure in its decision, which would prompt the technician to call a radiologist for this particular examination. This approach could reduce the number of re-examinations, while still allowing for omission of DCE in many cases. Also, the consensus of two “expert level” radiologists was used as standard of reference in this study. However, there might be cases in which a more novice reader would have appreciated a DCE sequence when the more experienced reader does not require it. In addition, while we assessed the performance of the neural network in a set of MRI examinations from a different vendor, the number of scans included in the “different vendor set” was rather low. However, as the code for the neural network will be freely available, our results can easily be tested in a different institution and with different scanners.
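The idea of an uncertainty range could be sketched as a three-way rule on the averaged network probability. The following Python example is hypothetical: the function name `triage` and the bounds 0.15 and 0.30 are assumptions chosen only so that the validation-set cut-off of 0.221 (Table 3) lies inside the band; suitable bounds would have to be determined and validated on clinical data.

```python
def triage(probability, lower=0.15, upper=0.30):
    """Map the network's DCE probability to a protocol decision.

    `lower` and `upper` are hypothetical bounds of the uncertainty band.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability < lower:
        return "biparametric"       # confident that DCE is not needed
    if probability > upper:
        return "multiparametric"    # confident that DCE is needed
    return "call radiologist"       # unsure: defer to a human reader

for p in (0.05, 0.22, 0.90):
    print(p, triage(p))
```

Widening the band would send more examinations to a radiologist and fewer to an automatic decision, so the bounds trade radiologist workload against the rates of missed and unnecessary DCE acquisitions.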
In conclusion, we designed a neural network with the ability to accurately decide between acquisition of a full multiparametric MRI protocol including DCE and a faster biparametric protocol. The rate of patients who would have left the MRI scanner without an ultimately needed DCE sequence based on the decision made by the CNN was very low. Hence, integration of AI into quality assessment and decision making could allow for shorter examination times and a more streamlined clinical workflow, while maintaining diagnostic accuracy by including DCE only when truly needed.
Abbreviations
- AI: Artificial intelligence
- AUC: Area-under-the-curve
- CNN: Convolutional neural network
- DCE: Dynamic contrast-enhanced
- DWI: Diffusion-weighted imaging
- mpMRI: Multiparametric MRI
- PI-RADS: Prostate imaging reporting and data system
- ROC: Receiver-operating characteristics
Authors' contributions
AMH: Study design, Data acquisition, Data analysis, Manuscript preparation. RDM: Data acquisition, Manuscript preparation. AT: Data acquisition. EK: Study design, Technical Development, Data acquisition, Data analysis, Manuscript preparation. OFD: Study design, Data acquisition, Data analysis, Manuscript preparation. All authors read and approved the final manuscript.
Funding
None.
Availability of data and materials
A detailed account including technical specifications of the neural network can be found in Additional file 1: Appendix S1. The accompanying PyTorch code and the training scheme can be found at https://github.com/enderkon/ProstateQC.git.
Declarations
Ethics approval and consent to participate
This study was approved by the institutional review board and the requirement for study-specific informed consent was waived.
Consent for publication
Not applicable.
Competing interests
None.
References
- 1. Ahmed HU, El-Shater Bosaily A, Brown LC, et al. Diagnostic accuracy of multi-parametric MRI and TRUS biopsy in prostate cancer (PROMIS): a paired validating confirmatory study. Lancet. 2017. doi: 10.1016/S0140-6736(16)32401-1
- 2. Padhani AR, Barentsz J, Villeirs G, et al. PI-RADS Steering Committee: the PI-RADS multiparametric MRI and MRI-directed biopsy pathway. Radiology. 2019. doi: 10.1148/radiol.2019182946
- 3. Mehralivand S, Shih JH, Harmon S, et al. A grading system for the assessment of risk of extraprostatic extension of prostate cancer at multiparametric MRI. Radiology. 2019. doi: 10.1148/radiol.2018181278
- 4. Muehlematter UJ, Burger IA, Becker AS, et al. Diagnostic accuracy of multiparametric MRI versus (68)Ga-PSMA-11 PET/MRI for extracapsular extension and seminal vesicle invasion in patients with prostate cancer. Radiology. 2019. doi: 10.1148/radiol.2019190687
- 5. Turkbey B, Rosenkrantz AB, Haider MA, et al. Prostate Imaging Reporting and Data System Version 2.1: 2019 update of Prostate Imaging Reporting and Data System Version 2. Eur Urol. 2019. doi: 10.1016/j.eururo.2019.02.033
- 6. Kuhl CK, Bruhn R, Krämer N, Nebelung S, Heidenreich A, Schrading S. Abbreviated biparametric prostate MR imaging in men with elevated prostate-specific antigen. Radiology. 2017. doi: 10.1148/radiol.2017170129
- 7. Bosaily AE-S, Frangou E, Ahmed HU, et al. Additional value of dynamic contrast-enhanced sequences in multiparametric prostate magnetic resonance imaging: data from the PROMIS study. Eur Urol. 2020. doi: 10.1016/j.eururo.2020.03.002
- 8. Knaapila J, Jambor I, Ettala O, et al. Negative predictive value of biparametric prostate magnetic resonance imaging in excluding significant prostate cancer: a pooled data analysis based on clinical data from four prospective, registered studies. Eur Urol Focus. 2020. doi: 10.1016/j.euf.2020.04.007
- 9. Tamada T, Kido A, Yamamoto A, et al. Comparison of biparametric and multiparametric MRI for clinically significant prostate cancer detection with PI-RADS Version 2.1. J Magn Reson Imaging. 2020;5:4. doi: 10.1002/jmri.27283
- 10. Barth BK, de Visschere PJL, Cornelius A, et al. Detection of clinically significant prostate cancer: short dual-pulse sequence versus standard multiparametric MR imaging: a multireader study. Radiology. 2017. doi: 10.1148/radiol.2017162020
- 11. Weiss J, Martirosian P, Notohamiprodjo M, et al. Implementation of a 5-minute magnetic resonance imaging screening protocol for prostate cancer in men with elevated prostate-specific antigen before biopsy. Invest Radiol. 2018;53(3):186–190.
- 12. Giganti F, Allen C, Emberton M, Moore CM, Kasivisvanathan V. Prostate imaging quality (PI-QUAL): a new quality control scoring system for multiparametric magnetic resonance imaging of the prostate from the PRECISION trial. Eur Urol Oncol. 2020. doi: 10.1016/j.euo.2020.06.007
- 13. Zawaideh JP, Sala E, Shaida N, et al. Diagnostic accuracy of biparametric versus multiparametric prostate MRI: assessment of contrast benefit in clinical practice. Eur Radiol. 2020. doi: 10.1007/s00330-020-06782-0
- 14. Gatti M, Faletti R, Calleris G, et al. Prostate cancer detection with biparametric magnetic resonance imaging (bpMRI) by readers with different experience: performance and comparison with multiparametric (mpMRI). Abdom Radiol (NY). 2019. doi: 10.1007/s00261-019-01934-3
- 15. Schoots IG, Barentsz JO, Bittencourt LK, et al. PI-RADS committee position on MRI without contrast medium in biopsy naive men with suspected prostate cancer: a narrative review. AJR Am J Roentgenol. 2020. doi: 10.2214/AJR.20.24268
- 16. de Rooij M, Israël B, Tummers M, et al. ESUR/ESUI consensus statements on multi-parametric MRI for the detection of clinically significant prostate cancer: quality requirements for image acquisition, interpretation and radiologists’ training. Eur Radiol. 2020. doi: 10.1007/s00330-020-06929-z
- 17. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977. doi: 10.2307/2529310
- 18. Esser M, Zinsser D, Kündel M, et al. Performance of an automated workflow for magnetic resonance imaging of the prostate: comparison with a manual workflow. Invest Radiol. 2020. doi: 10.1097/RLI.0000000000000635
- 19. Giganti F, Kirkham A, Kasivisvanathan V, et al. Understanding PI-QUAL for prostate MRI quality: a practical primer for radiologists. Insights Imaging. 2021. doi: 10.1186/s13244-021-00996-6
- 20. Padhani AR, Schoots I, Villeirs G. Contrast medium or no contrast medium for prostate cancer diagnosis. That is the question. J Magn Reson Imaging. 2020;5:4. doi: 10.1002/jmri.27180
- 21. de Rooij M, Israël B, Bomers JGR, Schoots IG, Barentsz JO. Can biparametric prostate magnetic resonance imaging fulfill its PROMIS? Eur Urol. 2020. doi: 10.1016/j.eururo.2020.04.062
- 22. Schelb P, Kohl S, Radtke JP, et al. Classification of cancer at prostate MRI: deep learning versus clinical PI-RADS assessment. Radiology. 2019. doi: 10.1148/radiol.2019190938
- 23. Bonekamp D, Kohl S, Wiesenfarth M, et al. Radiomic machine learning for characterization of prostate lesions with MRI: comparison to ADC values. Radiology. 2018. doi: 10.1148/radiol.2018173064