Analysis and Evaluation of a Deep Learning Reconstruction Approach with Denoising for Orthopedic MRI

Kevin M Koch; Mohammad Sherafati; V Emre Arpinar; Sampada Bhave; Robin Ausman; Andrew S Nencka; R Marc Lebel; Graeme McKinnon; S Sivaram Kaushik; Douglas Vierck; Michael R Stetz; Sujan Fernando; Rajeev Mannem

doi:10.1148/ryai.2021200278

2021 Aug 11;3(6):e200278. doi: 10.1148/ryai.2021200278


PMCID: PMC8637471 PMID: 34870214

Abstract

Purpose

To evaluate two settings (noise reduction of 50% or 75%) of a deep learning (DL) reconstruction model relative to each other and to conventional MR image reconstructions on clinical orthopedic MRI datasets.

Materials and Methods

This retrospective study included 54 patients who underwent two-dimensional fast spin-echo MRI for hip (n = 22; mean age, 44 years ± 13 [standard deviation]; nine men) or shoulder (n = 32; mean age, 56 years ± 17; 17 men) conditions between March 2019 and June 2020. MR images were reconstructed with conventional methods and the vendor-provided and commercially available DL model applied with 50% and 75% noise reduction settings (DL 50 and DL 75, respectively). Quantitative analytics, including relative anatomic edge sharpness, relative signal-to-noise ratio (rSNR), and relative contrast-to-noise ratio (rCNR), were computed for each dataset. In addition, the image sets were randomized, blinded, and presented to three board-certified musculoskeletal radiologists for ranking based on overall image quality and diagnostic confidence. Statistical analysis was performed with a nonparametric hypothesis test comparing derived quantitative metrics from each reconstruction approach. In addition, inter- and intrarater agreement analysis was performed on the radiologists’ rankings.

Results

Both denoising settings of the DL reconstruction showed improved edge sharpness, rSNR, and rCNR relative to the conventional reconstructions. The reader rankings demonstrated strong agreement, with both DL reconstructions outperforming the conventional approach (Gwet agreement coefficient = 0.98). However, there was lower agreement between the readers on which DL reconstruction denoising setting produced higher-quality images (Gwet agreement coefficient = 0.31 for DL 50 and 0.35 for DL 75).

Conclusion

The vendor-provided DL MRI reconstruction showed higher edge sharpness, rSNR, and rCNR in comparison with conventional methods; however, optimal levels of denoising may need to be further assessed.

Keywords: MRI Reconstruction Method, Deep Learning, Image Analysis, Signal-to-Noise Ratio, MR-Imaging, Neural Networks, Hip, Shoulder, Physics, Observer Performance, Technology Assessment

Supplemental material is available for this article.


Summary

A deep learning MRI reconstruction algorithm with two different denoising settings was evaluated, and output images were found to have higher image quality compared with images processed with conventional reconstruction methods.

Key Points

■ Deep learning–based MRI raw data reconstructions showed quantifiable image quality improvements relative to conventional reconstructions in edge sharpness (P ≤ .01), signal-to-noise ratio, contrast-to-noise ratio, and contrast ratio (P < .002).
■ Radiologists’ scoring of the deep learning reconstruction largely agreed with the automated analytical quality improvements, although there were variations in the assessment of optimal levels of denoising across readers (interrater γ value = 0.31–0.35).

Introduction

One of the key remaining challenges of MRI compared with other imaging modalities (eg, radiography, CT) is its relatively long data acquisition duration. Over the past 2 decades, acquisition acceleration techniques, such as partial Fourier encoding (1), parallel imaging (2–4), and incoherent sparse sampling of data points (5–7), have substantially shortened image acquisition durations. However, all MRI undersampling techniques ultimately reach performance limits whereby conventional reconstructions cannot adequately reconstruct the undersampled data. To improve image quality beyond the limitations of conventional MRI acquisition and reconstruction techniques, applications of deep convolutional neural network models (8–12) to MRI have recently undergone a surge of technical advancement and development (13–21).

In this study, a premarket vendor-provided prototype for raw data MRI reconstruction using a deep learning (DL) network model with an adjustable noise reduction setting was evaluated on clinical orthopedic MRI datasets. The evaluated prototype allowed for varying noise reduction through a tunable noise reduction factor ranging from 0 to 100. Although the DL reconstruction technology evaluated in this study was a premarket prototype, since the conclusion of data collection for the present study, this technology has received regulatory clearance for clinical use.

In this study, orthopedic intermediate-weighted MR images of the hip and shoulder were analyzed. Using these datasets, edge sharpness, relative signal-to-noise ratio (rSNR), relative contrast ratio (rCR), and relative contrast-to-noise ratio (rCNR) were estimated on conventionally reconstructed MR images and on DL reconstructed images with noise reduction of 50% (DL 50) and 75% (DL 75). In addition, three board-certified musculoskeletal radiologists (R.M., S.F., M.R.S.) performed a shuffled and blinded reader study. This study sought to test the hypothesis that the DL reconstruction approaches could improve image quality in terms of reader evaluation scores and objective image quality metrics compared with conventional reconstruction methods.

Materials and Methods

This study was performed with partial grant support from GE Healthcare. Authors who were not employees of GE Healthcare had control over the inclusion of any data and information that might have presented a conflict of interest for those authors who are employees of GE Healthcare. The authors who were not employees of GE Healthcare had full control of the data and guarantee the study validity/reported results.

Patient Inclusion

A retrospective analysis of clinical data was performed under a Health Insurance Portability and Accountability Act–compliant registration protocol approved by the institutional review board of the Medical College of Wisconsin. Ethics approval and study design documents can be provided upon request. The study sought to perform exploratory image quality analysis of the evaluated DL reconstruction. As the study analyzed source imaging data collected for clinical purposes, the study was performed under an institutional review board–granted waiver of consent.

De-identified imaging data (following Digital Imaging and Communications in Medicine presentation state [DICOM PS] 3.15–2011 Annex E Basic using 113100) from 54 patients were included in the study. Data were gathered from two MRI scanners spanning different generations of vendor hardware. Eligibility criteria for this study included patients who underwent diagnostic MRI of the hip or shoulder in which the scanner operators were able to manually invoke the premarket prototype DL reconstruction on the raw acquired data. Data were collected between March 2019 and June 2020. All included patients were imaged owing to symptomatic presentation of their hip or shoulder. All but one patient had clinically relevant imaging findings. The study sample size was limited by the data acquisition duration of the study and the ability to manually run online prototype reconstructions without disturbing imaging clinic workflows. Owing to this convenience-based patient screening process, only 71 total patient datasets were considered (screened) for the study, with 54 datasets satisfying the inclusion and exclusion criteria. Inclusion criteria included imaging examinations of the shoulder or hip with intermediate-weighted fast/turbo spin-echo acquisitions that were successfully processed using the prototype DL reconstruction algorithm. Screened patients who underwent imaging of other joints or for whom the DL reconstruction algorithm was not run on an appropriate intermediate-weighted turbo/fast spin-echo acquisition were excluded.

Imaging Method and Analysis

The turbo/fast spin-echo imaging data analyzed in this study were extracted from routine clinical examinations performed on each included patient, with only one acquisition within the examination used for detailed retrospective DL reconstruction and analysis. Two-dimensional fast spin-echo acquisition parameters for shoulder and hip imaging, respectively, were as follows: sagittal or coronal plane; field of view, 12 cm and 18 cm; echo time, 49 msec and 63 msec; repetition time, 2.8 sec and 3.0 sec; echo-train length, nine for both; section thickness, 320 × 224 in-plane data matrix for both; pixel bandwidth, 163 Hz and 109 Hz; and flip angle, 111° for both. No parallel imaging acquisition methods were used on the data collected in this study. All datasets were acquired with fat saturation, with the exception of two hip datasets, which were collected for radial analysis and did not use fat saturation methods. Datasets were collected using 3.0-T Discovery MR750 and 3.0-T Signa Architect scanners operating on software version DV26 (GE Healthcare). For hip images, radiofrequency coil selection within the geometry embracing method array was performed automatically using system software. Shoulder images were collected using a vendor-provided hard-shell eight-channel shoulder array.

DL Reconstruction Approach

The evaluated neural network prototype for MRI reconstruction was trained with a supervised learning approach using pairs of high-spatial-resolution high-signal-to-noise ratio images and synthesized low-resolution low-signal-to-noise ratio images, which were augmented to provide a training database of 4 million distinct image and augmentation combinations (22). The evaluated DL network does not perform denoising based on previously reconstructed DICOM images; rather, the network is constructed to accept raw complex imaging data as input. The evaluated DL reconstruction is compatible with raw data collected on two-dimensional Cartesian sampling grids (except for echo planar trajectories) and produces DICOM images. Although the reconstruction outputs magnitude DICOM images, DL inferencing occurs on fully sampled complex imaging data after the application of any necessary parallel imaging algorithms.

The publicly available details of this vendor-developed DL reconstruction network architecture and training approach have been reported elsewhere (22). Briefly, the network contains 4.4 million trainable parameters in approximately 10 000 kernels. It uses no bias terms and employs rectified linear unit activation, thereby preserving the network’s broad applicability to low- and high-intensity images and ability to perform blind denoising. Network training was performed in a single epoch of the training database, using the Adam optimizer (23) to minimize loss between the predicted and the near-perfect images. The commercially available implementation of this algorithm uses a T4 graphics processing unit chip (NVIDIA) for inferencing. For the datasets analyzed in this study, inference on the graphics processing unit completed in less than 1.5 seconds.
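The link between a bias-free rectified linear unit design and applicability across image intensity levels can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the vendor network: a small stack of fully connected layers with random weights (the function name, layer sizes, and scale factor are all hypothetical) shows that scaling the input by a positive factor scales the output by the same factor when no bias terms are present.

```python
import numpy as np

def bias_free_relu_net(x, weights):
    """Apply bias-free linear layers with ReLU between them (toy stand-in, not the vendor model)."""
    h = x
    for w in weights[:-1]:
        h = np.maximum(w @ h, 0.0)  # linear map without bias, followed by ReLU
    return weights[-1] @ h          # final linear layer, also without bias

rng = np.random.default_rng(0)
weights = [
    rng.standard_normal((16, 8)),
    rng.standard_normal((16, 16)),
    rng.standard_normal((8, 16)),
]
x = rng.standard_normal(8)

scale = 37.5  # arbitrary positive intensity scale factor (assumed for illustration)
out_of_scaled_input = bias_free_relu_net(scale * x, weights)
scaled_output = scale * bias_free_relu_net(x, weights)

# True when the network is positively homogeneous: f(a * x) == a * f(x) for a > 0.
homogeneous = bool(np.allclose(out_of_scaled_input, scaled_output))
print(homogeneous)
```

Adding a nonzero bias to any layer would break this property, which is one way to read the design choice described above.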

Image Analysis

Conventional reconstruction was used for routine clinical diagnostics and is termed the no DL method in this study. In addition, DL reconstruction images were computed with noise reduction factors of 50% (DL 50) and 75% (DL 75). In the context of this study, conventional reconstruction refers to Fourier transform reconstruction and apodization of fully sampled k-space data. Although the datasets used for this evaluation study did not use parallel imaging acquisition methods, the evaluated DL reconstruction can be deployed on such datasets after synthesis of skipped k-space points using conventional parallel imaging methods.
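As an illustration of what Fourier transform reconstruction with apodization involves, the sketch below reconstructs a magnitude image from fully sampled two-dimensional k-space. It is a simplified single-coil example, not the scanner implementation: the Hamming window is an assumed apodization filter, and the function name and synthetic phantom are hypothetical.

```python
import numpy as np

def conventional_recon(kspace, apodize=True):
    """Reconstruct a magnitude image from centered, fully sampled 2-D k-space."""
    k = np.asarray(kspace, dtype=complex)
    if apodize:
        # Separable Hamming window (assumed filter) that tapers high spatial frequencies.
        window = np.outer(np.hamming(k.shape[0]), np.hamming(k.shape[1]))
        k = k * window
    # The k-space center sits in the middle of the array, so shift before and after the inverse FFT.
    image = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(k)))
    return np.abs(image)

# Synthetic check: a square phantom, forward-transformed to centered k-space.
phantom = np.zeros((64, 64))
phantom[24:40, 24:40] = 1.0
kspace = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(phantom)))

recon_plain = conventional_recon(kspace, apodize=False)
recon_apodized = conventional_recon(kspace, apodize=True)
```

Without apodization the phantom is recovered exactly; with the window applied, edges are smoothed, which is the usual trade-off between ringing suppression and sharpness in this kind of filter.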

To quantitatively assess the effect of each reconstruction on anatomic boundaries within the images, perpendicular slopes—computed as a spatial gradient on imaging data—were estimated across designated anatomic boundaries with anticipated sharp transitions. In the shoulder, the superior and caudal margins of the infraspinatus and supraspinatus tendons were analyzed in 18 datasets, the supraspinatus tendon was analyzed in eight datasets, and the humeral head boundary was analyzed in six datasets. The anatomic regions used in each dataset were chosen based on optimal visibility of the anatomy on the conventionally reconstructed images. Figure E1 (supplement) provides example images and plots of hip and shoulder images used for the boundary analysis. For hip images (in 22 patients), the borders of the femoral head were evaluated. Two patients who underwent hip imaging were excluded from the edge analysis component of the study owing to poor visibility of the boundary on the conventional reconstruction dataset. In this work, this computed relative slope is termed the relative edge sharpness index. The details of this computational approach are provided in Appendix E1 (supplement).
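The slope-based measurement described above can be sketched as follows. The actual procedure is given in Appendix E1 (supplement) and is not reproduced here; this example assumes that an intensity profile has already been sampled along a line perpendicular to the boundary, that sharpness is taken as the maximum absolute gradient along that profile, and that each value is normalized by the largest slope among the compared reconstructions. The function names and synthetic profiles are hypothetical.

```python
import numpy as np

def edge_slope(profile, spacing_mm):
    """Maximum absolute intensity slope along a 1-D profile crossing a boundary."""
    gradient = np.gradient(np.asarray(profile, dtype=float), spacing_mm)
    return float(np.max(np.abs(gradient)))

def relative_edge_sharpness(profiles, spacing_mm):
    """Normalize each slope by the largest slope in the set (assumed normalization)."""
    slopes = {name: edge_slope(p, spacing_mm) for name, p in profiles.items()}
    reference = max(slopes.values())
    return {name: s / reference for name, s in slopes.items()}

# Synthetic profiles across the same boundary with different amounts of blur (illustrative only).
x = np.linspace(-5.0, 5.0, 101)
profiles = {
    "no_DL": 1.0 / (1.0 + np.exp(-x / 1.0)),  # softer transition
    "DL_50": 1.0 / (1.0 + np.exp(-x / 0.5)),  # sharper transition
}
index = relative_edge_sharpness(profiles, spacing_mm=x[1] - x[0])
print(index)
```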

The rSNR, rCNR, and rCR were also estimated using bone and muscle regions within each fat-saturated image set. The details of these calculation procedures are also provided in Appendix E1 (supplement).
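For orientation, region-based signal metrics of this kind are often computed along the following lines. This is a sketch under stated assumptions rather than the procedure in Appendix E1: signal is the mean within a region of interest, noise is approximated by the standard deviation within a region, the contrast ratio is the ratio of the two regional means, and how the study defines the "relative" normalization is not assumed here. All names and the synthetic image are hypothetical.

```python
import numpy as np

def roi_metrics(image, muscle_mask, bone_mask):
    """Compute simple ROI-based signal metrics from two boolean masks."""
    muscle = image[muscle_mask]
    bone = image[bone_mask]
    noise = bone.std(ddof=1)  # assumed noise estimate: spread within the darker (bone) ROI
    return {
        "snr_muscle": muscle.mean() / muscle.std(ddof=1),
        "snr_bone": bone.mean() / bone.std(ddof=1),
        "contrast_ratio": muscle.mean() / bone.mean(),
        "cnr": abs(muscle.mean() - bone.mean()) / noise,
    }

# Synthetic two-region image: darker "bone" on the left, brighter "muscle" on the right.
rng = np.random.default_rng(2)
base = np.full((40, 40), 40.0)
base[:, 20:] = 100.0
muscle_mask = np.zeros(base.shape, dtype=bool)
muscle_mask[:, 20:] = True
bone_mask = ~muscle_mask

noise_field = rng.standard_normal(base.shape)
noisy = roi_metrics(base + 10.0 * noise_field, muscle_mask, bone_mask)
denoised = roi_metrics(base + 5.0 * noise_field, muscle_mask, bone_mask)
```

Halving the noise amplitude in this toy image raises the signal-to-noise and contrast-to-noise values, which is the direction of change reported for the DL reconstructions.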

Finally, the reconstructed image datasets were blinded and randomized. Three board-certified musculoskeletal radiologists (R.M., S.F., M.R.S.) averaging 10 years of experience each (12, 11, and 8 years of experience, respectively) then evaluated each image set for anatomic and pathologic conspicuity and overall image quality. On the basis of these evaluations, ratings were assigned to each reconstructed image set on a scale of 1 to 3 (1, best; 2, intermediate; 3, worst).

Statistical and Quantitative Assessments

Continuous variables were reported as the mean ± standard deviation and median. The 95% CIs were computed for all reported descriptive metric statistics. The two-sided Wilcoxon signed rank test (24) was used to test for differences between paired continuous variables, which were mostly observed to have nonnormal distributions. The level of significance for statistical hypothesis tests was set at .05. For the multireader study, Gwet agreement coefficient γ was utilized as a measure of scoring consistency (25,26). Python packages SciPy (27), Pingouin (28), and Pandas (https://pandas.pydata.org/) were used for data analysis and pairwise test calculations, and the MATLAB-based package mReliability (29) was used for the multireader scoring reliability analyses.
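A paired comparison of this kind can be run with SciPy as sketched below. The data are synthetic and the variable names hypothetical; only the `scipy.stats.wilcoxon` call and the .05 significance level correspond to the analysis described above.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
n_patients = 20

# Synthetic per-patient metric for the conventional reconstruction.
metric_no_dl = rng.normal(0.75, 0.15, n_patients)
# Paired values for a DL reconstruction, shifted upward by a positive amount (illustrative only).
metric_dl = metric_no_dl + rng.uniform(0.10, 0.30, n_patients)

# Two-sided Wilcoxon signed rank test on the paired differences.
result = wilcoxon(metric_dl, metric_no_dl, alternative="two-sided")
significant = result.pvalue < 0.05
print(result.statistic, result.pvalue, significant)
```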

Results

Patient Overview

A total of 54 patients were included in this study. Thirty-two patients (mean age, 56 years ± 17; 17 men) underwent shoulder imaging, and 22 patients (mean age, 44 years ± 13; nine men) underwent hip imaging. Detailed imaging findings from each patient are outlined in Table E1 (supplement).

Image Quality Metric Comparisons

Figure 1 comprises a box and whisker plot of the relative edge sharpness index metric. Edge sharpness was higher for hip images with the DL 50 (mean, 0.96 [95% CI: 0.93, 0.98]) and DL 75 (mean, 0.97 [95% CI: 0.94, 0.99]) models compared with the conventional method (mean, 0.75 [95% CI: 0.67, 0.83]). Similarly, for shoulder images the DL models had higher edge sharpness (DL 50: mean, 0.97 [95% CI: 0.96, 0.99]; DL 75: mean, 0.98 [95% CI: 0.97, 0.99]) compared with the conventional method (mean, 0.90 [95% CI: 0.85, 0.93]). A table of descriptive statistics summarizing the data presented in Figure 1 is included in Table E2 (supplement). For both hip and shoulder images, there were no differences in edge sharpness between DL 50 and DL 75 images.

Box and whisker plot of the normalized sharpness (relative edge sharpness index) for each reconstruction method. *P < .001 compared with no DL, †P = .01 compared with no DL, DL = deep learning, DL 50 = DL noise reduction factor of 50%, DL 75 = DL noise reduction factor of 75%. Diamonds are outlier points in the box distribution plots.

Figure 2 comprises box and whisker plots summarizing the computed rSNR, rCNR, and rCR measurements. In all analyses (with the exception of rSNR for no DL compared with DL 75) the DL 50 and DL 75 images have higher image quality metrics than the conventional (no DL) images. Descriptive statistics of the data underlying these plots are provided in Tables E3–E5 (supplement).

Box and whisker plot for the computed signal analysis metrics, including (A) relative signal-to-noise ratio (rSNR), (B) relative contrast ratio (rContrast), and (C) relative contrast-to-noise ratio (rCNR). Box and whisker plots are provided for each relevant metric distribution. (A) The rSNR measurements are provided for both captured signal regions (muscle and bone), whereas the (B) rContrast and (C) rCNR present only one distribution per reconstruction method, as both derived regional signals are used for the contrast calculation. *P < .001 compared with no DL, †P < .001 compared with DL-50, ‡P = .002 compared with no DL, DL-50 = DL noise reduction factor of 50%, DL-75 = DL noise reduction factor of 75%, No-DL = no deep learning. Diamonds are outlier points in the box distribution plots.

As indicated in Figure 2, pairwise hypothesis tests of the rSNR, rCNR, and rCR distributions showed pronounced differences between the non-DL and DL reconstructions, and these differences were also a function of the DL denoising setting. As with the sharpness analysis, the observed signal quality performance trends were consistently found in both the hip and shoulder cohorts. Of note, the only observed nonsignificant differences were between two of the bone compartment analyses in the hip dataset. It is hypothesized that this lack of significance is due to the complex noise signatures observed in the fat-suppressed bone images and the relatively low cohort size of the hip rSNR analysis (n = 20).

Reader Study

Table 1 provides descriptive statistics of the scores given to each image set by three radiologists. All three raters consistently agreed that the conventional reconstruction (no DL) had the lowest ranking (ie, a score of 3, indicating the worst image quality and conspicuity). The mean score for the hip images was 2.97 ± 0.25 for no DL, 1.73 ± 0.45 for DL 50, and 1.3 ± 0.5 for DL 75. Additionally, the mean score for shoulder images was 2.97 ± 0.31 for no DL, 1.71 ± 0.48 for DL 50, and 1.26 ± 0.46 for DL 75. For both the hip and the shoulder, Wilcoxon signed rank tests of the ranking distributions showed significant differences for each DL compared with no DL as well as for DL 50 compared with DL 75 (P < .001).

Table 1:

Ranking Data from Reader Study of Conventional and DL Reconstruction Approaches


Rater reliability results are provided in Table 2. Both interrater and intrarater Gwet first-order agreement coefficient values are presented. Of interest, there was a high level of interrater consistency in the conventional (no DL) approach, which was consistently deemed to have the lowest image quality and conspicuity (score of 3). Across both hip and shoulder images the agreement between DL 50 and DL 75 was lower, however. Within this study, the computed intrarater reliability provides a metric indicating how consistently one rater rated each of the methods across the patient samples. Two raters (rater 1 and rater 3) consistently ranked the reconstruction methods across the patient sample. Rater 2 was substantially less consistent in ranking the DL 75 and DL 50 methods, however.
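For readers unfamiliar with the Gwet first-order agreement coefficient, the sketch below implements its standard multirater form in plain Python. The study itself used the MATLAB package mReliability; how the rankings were grouped per reconstruction method for Table 2 is not reproduced here, and the function name and example ratings are hypothetical.

```python
from collections import Counter

def gwet_ac1(ratings, categories):
    """Gwet's first-order agreement coefficient (AC1) for multiple raters.

    ratings: list of per-subject lists, one category label per rater.
    categories: all labels that raters could assign.
    """
    q = len(categories)
    n = len(ratings)
    agreement_sum = 0.0
    prevalence = {k: 0.0 for k in categories}
    for subject in ratings:
        r = len(subject)
        counts = Counter(subject)
        # Fraction of rater pairs that agree on this subject.
        agreement_sum += sum(c * (c - 1) for c in counts.values()) / (r * (r - 1))
        for k in categories:
            prevalence[k] += counts.get(k, 0) / r
    pa = agreement_sum / n                      # observed agreement
    pi = {k: v / n for k, v in prevalence.items()}
    # Chance agreement under Gwet's model.
    pe = sum(p * (1.0 - p) for p in pi.values()) / (q - 1)
    return (pa - pe) / (1.0 - pe)

scores = [1, 2, 3]
# Three raters all assigning the worst rank to every image set: perfect agreement.
unanimous = gwet_ac1([[3, 3, 3]] * 5, scores)
# Three raters splitting between ranks 1 and 2: weak agreement.
split = gwet_ac1([[1, 1, 2], [2, 2, 1], [1, 2, 1], [2, 1, 2]], scores)
print(unanimous, split)
```

Unlike Cohen-type kappa statistics, this coefficient remains defined when every rater assigns the same category to every subject, which is relevant for the near-unanimous no DL rankings.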

Table 2:

Inter- and Intrarater Gwet Agreement Coefficients


Example Images from Study Sample

Figures 3–5 present visual evidence supporting the improved diagnostic performance of the DL reconstruction approaches in intermediate-weighted fat-suppressed images analyzed in this study. Figure 3 presents images from a patient showing hematopoietic marrow in the femoral acetabulum. The presented DL reconstruction images show improved visualization of focal marrow heterogeneity patterns with each successive denoising factor. Figure 4 presents the reconstructed images from a patient with a subchondral femoral head fatigue fracture, labral tearing, and cartilage disease. Finally, Figure 5 presents a shoulder image from a patient with a thickened and hyperintense rotator interval and coracohumeral ligament, subcortical cystic changes, and subscapularis tendinosis.

Coronal intermediate proton density–weighted fat-suppressed MR image of the hip using (A) no deep learning (DL) reconstruction, (B) DL reconstruction with 50% noise reduction setting, and (C) DL reconstruction with 75% noise reduction setting. The variations of signal to noise in the resulting image are clear, having substantial effects on the visibility of edema patterns within the femoral acetabulum (green arrow in C). Improved boundary conspicuity is evident in the cartilage and labrum (blue arrow in A).

Oblique coronal intermediate proton density–weighted fat-suppressed MR image of the shoulder using (A) no deep learning (DL) reconstruction, (B) DL reconstruction with 50% noise reduction setting, and (C) DL reconstruction with 75% noise reduction setting. Improved sharpness of abnormal features is seen in the DL reconstruction images, including thickened and hyperintense rotator interval and coracohumeral ligament (blue arrow), subcortical cystic changes (yellow arrow), and subscapularis tendinosis (green arrows) in panel C.

Discussion

The purpose of this study was to assess two DL settings (50% and 75% noise reduction) of MRI reconstruction in comparison with conventional reconstruction techniques. Such DL-based reconstruction techniques may allow for accelerated acquisition times while retaining sufficient clinical image quality. To assess the DL reconstruction performance, the analysis in this study sought to provide objective quantitative measures that could be paired with radiologist rankings of the reconstruction approaches. By using this analytic approach, the DL reconstruction network showed notable improvements in rSNR, rCNR, and rCR, which increased with the tunable denoising factor. Of more interest, the quantitative edge sharpness analysis performed in this study showed that the DL reconstruction methods improved the sharpness of edges relative to the conventional approach. As expected, the denoising factor in the DL reconstruction algorithm did not affect the edge sharpness metrics.

For several years, DL reconstruction in CT (4,5) has been shown to offer superior capabilities relative to advanced iterative CT reconstruction algorithms (30,31). These technical advances have culminated in commercial releases that are altering the state of the art in diagnostic CT imaging (32). The underlying physical and mathematic principles of conventional MRI signal acquisition and reconstruction are different from those applied in CT. However, the application of DL to raw data reconstruction in MRI holds similar promise to that demonstrated with CT.

It is notable that the DL reconstruction affected the raw image contrast measurements in this study. DL noise reduction is performed on complex-valued images with zero mean noise. Their subsequent conversion to magnitude-valued images results in a shift in the mean image level owing to the modified noise statistics (33). In the magnitude images, the noise has a nonzero mean that is proportionally larger at low signal levels. Thus, there is an apparent contrast increase between high and low signal regions (ie, muscle and bone) using the DL reconstruction, with dark regions in the magnitude-valued images appearing darker owing to a reduction in the noise mean.
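The magnitude bias described above can be reproduced with a short numerical simulation. This is an illustrative sketch only: the signal levels, noise standard deviations, and function name are assumed values, and lowering the noise level stands in for the effect of denoising on the complex data.

```python
import numpy as np

def magnitude_mean(true_signal, sigma, n_samples=200_000, seed=0):
    """Mean magnitude of a constant signal plus zero-mean complex Gaussian noise."""
    rng = np.random.default_rng(seed)
    noise = sigma * (rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples))
    return float(np.mean(np.abs(true_signal + noise)))

dark, bright = 5.0, 100.0  # assumed true signal levels (e.g., low-signal bone vs muscle)

# Higher noise level (before denoising) and lower noise level (after denoising).
dark_high = magnitude_mean(dark, sigma=10.0)
bright_high = magnitude_mean(bright, sigma=10.0)
dark_low = magnitude_mean(dark, sigma=2.5)
bright_low = magnitude_mean(bright, sigma=2.5)

contrast_high_noise = bright_high / dark_high
contrast_low_noise = bright_low / dark_low
print(contrast_high_noise, contrast_low_noise)
```

In the magnitude data the dark region's mean sits well above its true level when the noise is large and moves toward the true level as the noise shrinks, while the bright region is nearly unaffected, so the measured contrast ratio rises.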

To a large degree, the independent reader analyses of the reconstructed images agreed with the objective quantitative measures. That is, the ranking statistically showed that the DL 75 setting was preferred over the DL 50 setting, and both DL settings were preferred over the conventional reconstruction. The premarket DL implementation used in the present study allowed for tuned adjustment of the DL reconstruction settings (ie, 50 vs 75). The released commercial product offers three denoising settings of low, medium, and high. Performance reproduction of the DL 50 and DL 75 settings evaluated in the present work can be achieved with the commercial product using the medium and high settings, respectively.

Although still significant, the weakest difference between the reader rankings was for the comparison between DL 75 and DL 50 images. This observation can be further analyzed by exploring the inter- and intrarater repeatability coefficients provided in Table 2. In particular, the conventional reconstruction consistently obtained lower image quality scores across all raters and patients. However, there was substantially less agreement within the DL reconstruction denoising setting ratings. Although the DL 75 setting yielded the best cumulative performance, it was not particularly consistent across raters. When examining the three intrarater coefficients, two raters were relatively consistent in their rating of the DL 75 versus DL 50 settings, but one rater was far less consistent. These results point to potential nuances of the specific DL reconstruction settings that may require customization depending on user preferences within each radiologist group.

This study has demonstrated the benefits of the evaluated DL MRI reconstruction algorithm, but there were several limitations in its scope. First, the study had only a modest sample size of 54 patients. A consequence of this sample size is that the study data did not demonstrate substantial motion artifacts. Although some minor motion was observed and showed artifacts in all performed reconstructions, the overall behavior of the DL reconstruction in the presence of substantial motion was not evaluated. The imaging data used in this evaluation were not collected with parallel imaging techniques. Although the evaluated DL reconstruction technique is compatible with parallel imaging methods that fill in missing k-space points, such data were not evaluated in the present study. Additionally, this study was specifically focused on acquisitions encountered in orthopedic imaging applications using fat-saturated, intermediate-weighted contrasts. Further evaluations of the DL reconstruction approach will be required for other anatomic regions, applications, and contrasts.

Efforts were made to perform a blinded reader study, but it was impossible to disguise images that showed clear image feature differences. In the present study, the clear image quality differences in the DL reconstructions were likely evident to study raters and may have imparted some bias to the scoring. Finally, because the study was focused on analysis of images collected in a practical clinical setting, it was necessarily performed without a reference standard image set. In a dedicated research setting, higher-quality images could have been collected with substantial increases in acquisition time relative to the conventional clinical images. The ensuing analysis could have then measured the standard of care and the DL reconstructed images against this ideal reference standard image set.

In conclusion, this preliminary analysis of a vendor-supplied DL MRI reconstruction prototype has shown encouraging image quality improvements when applied to challenging orthopedic MRI acquisitions in the hip and shoulder. Quantitatively, the study established that the DL 75 denoising setting provided improved objective image quality relative to the DL 50 setting. However, the multirater scoring component of the study has shown that radiologist perceptions of the reconstructed images may require adjustments of this setting depending on local group preferences. In summary, this study has demonstrated that DL-based MRI reconstruction methods can improve the quality of two-dimensional fast/turbo spin-echo MRI used for orthopedic diagnostic techniques.

Acknowledgments

GE Healthcare provided evaluation technology and reviewed this manuscript for technical accuracy.

Supported in part by a grant from GE Healthcare.

Disclosures of Conflicts of Interest: K.M.K. institution received a grant from GE Healthcare. M.S. disclosed no relevant relationships. V.E.A. disclosed no relevant relationships. S.B. disclosed no relevant relationships. R.A. disclosed no relevant relationships. A.S.N. institution received funding from GE Healthcare for work in neuroimaging MRI technology development and dissemination; is an inventor on patents including MRI technology focusing on multispectral imaging and magnetic field measurement and modulation; is a scientific advisor for and holds stock in Vasognosis, a start-up company focused on neurovascular imaging applications. R.M.L. is employed by and holds stock options in GE Healthcare; GE Healthcare has patents pending on the algorithms used in this work, but no money has been received. G.M. is employed by GE Healthcare; has been issued U.S. patent no. US10635943B1. S.S.K. is employed by GE Healthcare; received royalties from the Medical College of Wisconsin for a licensed patent unrelated to this work that was filed in 2015. D.V. disclosed no relevant relationships. M.R.S. disclosed no relevant relationships. S.F. disclosed no relevant relationships. R.M. disclosed no relevant relationships.

Abbreviations:

DICOM: Digital Imaging and Communications in Medicine
DL: deep learning
DL 50: DL reconstruction image with a noise reduction factor of 50%
DL 75: DL reconstruction image with a noise reduction factor of 75%
rCR: relative contrast ratio
rCNR: relative contrast-to-noise ratio
rSNR: relative signal-to-noise ratio
