Author manuscript; available in PMC: 2021 Mar 1.
Published in final edited form as: J Magn Reson Imaging. 2019 Jul 16;51(3):768–779. doi: 10.1002/jmri.26872

Utility of Deep Learning Super-Resolution in the Context of Osteoarthritis MRI Biomarkers

Akshay S Chaudhari 1, Kathryn J Stevens 1,3, Jeff P Wood 2, Amit K Chakraborty 1, Eric K Gibbons 4, Zhongnan Fang 5, Arjun D Desai 1, Jin Hyung Lee 6,7,8, Garry E Gold 1,3,7, Brian A Hargreaves 1,7,9
PMCID: PMC6962563  NIHMSID: NIHMS1045686  PMID: 31313397

Abstract

Background:

Super-resolution is an emerging method for enhancing MRI resolution; however, its impact on image quality is still unknown.

Purpose:

To evaluate MRI super-resolution using quantitative and qualitative metrics of cartilage morphometry, osteophyte detection, and global image blurring.

Study Type:

Retrospective

Population:

176 MRI studies of subjects at varying stages of osteoarthritis.

Field Strength/Sequence:

Original-resolution 3D double-echo steady-state (DESS) and DESS with 3x thicker slices retrospectively enhanced using super-resolution and tricubic interpolation (TCI) at 3T.

Assessment:

A quantitative comparison of femoral cartilage morphometry was performed for the original-resolution DESS, the super-resolution, and the TCI scans in 17 subjects. A reader study by three musculoskeletal radiologists assessed cartilage image quality, overall image sharpness, and osteophyte incidence in all three sets of scans. A reference-less blurring metric evaluated blurring in all three image dimensions for the three sets of scans.

Statistical Tests:

Mann-Whitney U-tests compared Dice coefficients (DC) of segmentation accuracy for the DESS, super-resolution, and TCI images, along with the image quality readings and blurring metrics. Sensitivity, specificity, and diagnostic odds ratio (DOR) with 95% confidence intervals compared osteophyte detection for the super-resolution and TCI images, with the original resolution as a reference.

Results:

DC for the original-resolution (90.2±1.7%) and super-resolution (89.6±2.0%) were significantly higher (p<0.001) than TCI (86.3±5.6%). Segmentation overlap of super-resolution with the original-resolution (DC=97.6±0.7%) was significantly higher (p<0.0001) than TCI overlap (DC=95.0±1.1%). Cartilage image quality for sharpness and contrast levels, and the through-plane quantitative blur factor for super-resolution images was significantly (p<0.001) better than TCI. Super-resolution osteophyte detection sensitivity of 80% (76–82%), specificity of 93% (92–94%), and DOR of 32 (22–46) was significantly higher (p<0.001) than TCI sensitivity of 73% (69–76%), specificity of 90% (89–91%), and DOR of 17 (13–22).

Data Conclusion:

Super-resolution appears to consistently outperform naïve interpolation and may improve image quality without biasing quantitative biomarkers.

Keywords: Super-Resolution, Artificial Intelligence, Machine Learning Interpretability, Osteoarthritis Biomarkers, Image Acceleration, Cartilage Segmentation

INTRODUCTION:

Osteoarthritis (OA) affects over 30 million adults in the United States and is a leading source of pain (1). Knee OA accounts for approximately 80% of the burden of the disease (2). While radiography currently serves as the standard for OA diagnosis, it is insensitive to the multitude of soft tissues affected in knee OA activity and can only detect late-stage OA changes (3). MRI is a common tool for studying soft-tissue changes caused by OA and may provide potential biomarkers sensitive to early OA (4, 5).

Within MRI, there is great interest in improving resolution for detecting subtle anatomical defects occurring in early OA activity and for generating high-resolution 3D images. However, high-resolution MRI is primarily limited by long scan times, which can increase the time and costs of large research studies. For example, the Osteoarthritis Initiative (OAI) included an 11-minute high-resolution double-echo steady-state (DESS) pulse sequence that was used to scan over 25,000 subjects at varying timepoints (6). Traditional MRI acceleration methods such as parallel imaging may come at the expense of noisier images due to g-factor limitations, while compressed sensing can come at the expense of long reconstruction durations and tradeoffs between artifacts and blurring due to the use of empirical regularization parameters (7-9). The use of 3D fast spin echo sequences shows promise for expediting image acquisition; however, such images may be susceptible to image blurring due to k-space modulation over long echo train lengths (10).

Image super-resolution (SR) comprises an established family of computer vision techniques that operate in image space by directly transforming low-resolution images into higher-resolution images, using a variety of algorithms (11). Convolutional neural networks (CNNs) and deep-learning-based SR may provide an alternative technique to reduce MRI acquisition time without the tradeoffs of conventional acceleration methods (12, 13). Recently, SR has been applied to DESS acquisitions from the OAI with promising results; however, the evaluation was performed only using image quality metrics such as structural similarity (SSIM), mean-square error (MSE), and peak signal-to-noise ratio (pSNR) (14, 15). While these metrics are common loss functions for CNN training, they do not always correlate with perceived image quality (16, 17).

Previous applications of DESS SR have optimized only for heuristic image quality metrics and for the accuracy of regional T2 relaxation time measurements (18, 19). However, the impact of SR on perceptual image quality and on resolution-dependent techniques such as quantitative morphological analysis (through segmentation) and abnormality detection has yet to be evaluated. Thus, it remains unclear whether the same biomarkers can be obtained from SR DESS images as from the original high-resolution DESS images. Specifically, the DESS sequence in the OAI was primarily used to evaluate articular cartilage morphometry and osteophytes, which are osteo-cartilaginous protrusions developing on the margins of osteoarthritic joints (20). Variations in both cartilage and osteophytes are known hallmarks of OA activity (21, 22). Consequently, the purpose of this study was to determine the impact of SR on potential imaging biomarkers of OA progression.

METHODS:

Subject Population:

We used publicly available DESS data from the OAI, which is a longitudinal observational cohort study investigating the natural history of and risk factors for knee OA. The scan parameters for DESS were as follows: FOV=14cm, matrix=384×307 (zero-filled to 384×384), spatial resolution=0.36mm×0.45mm, TE/TR=5/16ms, slice thickness=0.7mm, and number of slices=160 (6). A total of 124 3D DESS image volumes were used for network training, 35 for validation, and 17 for testing. The distribution of subjects with varying Kellgren-Lawrence grades (KLG) of knee OA severity was maintained approximately equally across the 124 training datasets (3 KLG-1, 41 KLG-2, 71 KLG-3, 9 KLG-4), 35 validation datasets (1 KLG-1, 12 KLG-2, 20 KLG-3, 2 KLG-4), and 17 testing datasets (6 KLG-2, 10 KLG-3, 1 KLG-4). KLG grades were obtained directly from the public OAI readings database, where they were defined through centralized readings of fixed flexion radiographs by two readers (a radiologist and a rheumatologist) (23). In case of disagreement, a consensus read was performed with a third reader. The placement of subjects into the training, validation, and testing groups was performed in a fully randomized manner. 3D DESS images from all 17 testing subjects for all 3 image sets (original high-resolution, DeepResolve super-resolution, and tricubic interpolation) were used in the quantitative and qualitative experiments described below.

Super-Resolution:

We utilized a 3D CNN entitled DeepResolve to transform low-resolution images into higher-resolution images by learning a difference image (the residual image) between the two (13). Given a set of low-resolution (x(i)) and high-resolution (y(i)) images, and residuals r(i) = y(i) − x(i), DeepResolve learned a residual transformation r̂ = f(x) during training. The training was subject to an L2 loss between the high-resolution and super-resolution images. During inference, a super-resolution image ŷ = x + f(x) was estimated by summing the underlying low-resolution image x and the estimated residual image f(x).
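This residual formulation can be sketched in a few lines of NumPy. Here `residual_model` is a stand-in callable for the trained DeepResolve CNN, not the actual network:

```python
import numpy as np

def super_resolve(x, residual_model):
    """Residual super-resolution inference: y_hat = x + f(x).

    x              -- interpolated low-resolution volume (NumPy array)
    residual_model -- callable estimating the residual r_hat = f(x)
                      (a stand-in for the trained DeepResolve CNN)
    """
    r_hat = residual_model(x)   # estimated residual image
    return x + r_hat            # super-resolution estimate

# Toy check: if the "model" recovered the true residual exactly,
# the output would match the high-resolution target.
x = np.array([[1.0, 2.0], [3.0, 4.0]])   # low-resolution input
y = np.array([[1.5, 2.5], [3.5, 4.5]])   # high-resolution target
true_residual = y - x                     # r = y - x, as learned in training
y_hat = super_resolve(x, lambda v: true_residual)
assert np.allclose(y_hat, y)
```

Learning the residual rather than the full image means the network only has to model the missing high-frequency detail, which is a common design choice in SR networks.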

The original DESS acquisition included slices with a thickness of 0.7mm; DeepResolve was applied to DESS slices with a thickness of 2.1mm. First, low-resolution DESS slices with 3x thicker slices (2.1mm) were generated using a 48th-order anti-aliasing filter. Second, the commonly used approach of tricubic interpolation (TCI) was used to interpolate the thicker slices to the same slice locations as the original high-resolution sequence, thereby creating paired low- and high-resolution images (13). The 3x resolution downsampling factor was chosen based on previous findings, which demonstrated that higher downsampling factors with the current SR network may lead to excessive image blurring that cannot be recovered using DeepResolve (13). Overall, the TCI images were the input to DeepResolve, whose outputs were the super-resolved images. DeepResolve training was performed using 32×32×32 patches with a CNN consisting of 20 convolution operators with 64 filters each, of kernel size 3×3×3, and rectified linear unit activations (Fig. 1a) (24). The network structure has been described in detail previously (13, 25).
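The pair-generation pipeline can be approximated as follows. The moving-average filter here is a simplified stand-in for the paper's 48th-order anti-aliasing filter, so this is an illustrative sketch of the preprocessing rather than the exact implementation:

```python
import numpy as np
from scipy.ndimage import uniform_filter1d, zoom

def make_training_pair(hr_volume, factor=3):
    """Simulate a thick-slice acquisition and tricubic re-interpolation.

    A simple moving-average low-pass along the slice axis stands in for
    the paper's 48th-order anti-aliasing filter; scipy's `zoom` with
    order=3 performs the tricubic interpolation (TCI) back to the
    original slice grid.
    """
    # Low-pass filter along the slice (last) axis, then keep every
    # `factor`-th slice to emulate 3x thicker slices.
    filtered = uniform_filter1d(hr_volume.astype(float), size=factor, axis=-1)
    thick = filtered[..., ::factor]
    # Tricubic interpolation back to the original slice locations.
    lr_interp = zoom(thick, (1, 1, factor), order=3)
    return lr_interp[..., :hr_volume.shape[-1]]

hr = np.random.rand(32, 32, 30)           # toy high-resolution volume
tci = make_training_pair(hr)              # blurry TCI input for the network
assert tci.shape == hr.shape
```

The (TCI, high-resolution) pairs produced this way correspond to the network input and training target, respectively.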

Figure 1:


The neural network architecture for the magnetic resonance super-resolution method (DeepResolve) includes twenty 3D convolutional layers (with 64 filters each) and rectified linear unit activation blocks (with no activation for the final convolutional layer). DeepResolve converts a low-resolution image into a residual image, and a summation of the two creates the super-resolution image (a). The femoral cartilage segmentation network utilizes a 3D U-Net autoencoder approach consisting of six encoding and decoding convolutional layers, with filter counts growing exponentially from 32 to 512 (b).

Quantitative Cartilage Morphometry:

We assessed potential blurring induced by DeepResolve by evaluating variations in cartilage morphometry, with the hypothesis that similar-appearing images should produce the same cartilage segmentation results. However, manual cartilage segmentation is a challenging task, with intra-reader and inter-reader discrepancies of approximately 2–4% coefficient of variation (5). Such variability may not adequately decouple variations in cartilage segmentation caused by image blurring from those caused by human variability. Consequently, we designed an additional 3D CNN to perform highly accurate cartilage segmentation of the original high-resolution, DeepResolve, and TCI images in order to eliminate human inter-reader variability.

Inspired by previous successful approaches for cartilage segmentation, we designed a U-Net CNN based on the efficient encoder-decoder approach (26-28). However, previous approaches have utilized 2D CNNs, which may not be adequate to capture the through-plane variations in image quality of DeepResolve. As a result, we extended the current state-of-the-art 2D segmentation models to 3D. The 3D U-Net CNN utilized in this study included 5 encoding-decoding steps with filter counts increasing exponentially from 32 to 512, each with kernel size 3×3×3 and rectified linear unit activations (Fig. 1b). Ground-truth segmentation labels were obtained from the OAI. All slices were downsampled by a factor of 2 to increase signal-to-noise ratio and reduce computational complexity, based on recommendations that approximately 1.5mm slices are adequate for cartilage morphometry (29). Network training on the original high-resolution images was performed on input image dimensions of 288×288×32 using a soft Dice-coefficient loss function with identical training, validation, and testing splits as the DeepResolve training.
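The soft Dice-coefficient loss has a compact form; a minimal NumPy sketch is shown below (the actual training would use a differentiable framework implementation, e.g. in Keras):

```python
import numpy as np

def soft_dice_loss(y_true, y_pred, eps=1e-7):
    """Soft Dice-coefficient loss for segmentation training.

    Operates on continuous probabilities (hence "soft"), which makes it
    differentiable; 0 means perfect overlap and 1 means no overlap.
    """
    intersection = np.sum(y_true * y_pred)
    denom = np.sum(y_true) + np.sum(y_pred)
    dice = (2.0 * intersection + eps) / (denom + eps)
    return 1.0 - dice

# A perfect prediction gives (near-)zero loss; a fully wrong one, near 1.
mask = np.zeros((8, 8, 8))
mask[2:6, 2:6, 2:6] = 1.0
assert soft_dice_loss(mask, mask) < 1e-6
assert soft_dice_loss(mask, 1.0 - mask) > 0.99
```

The small epsilon stabilizes the ratio when both masks are empty, a standard convention for this loss.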

Dice coefficients (DC), volumetric overlap error (VOE), absolute volume difference (VD), and the cartilage volume root-mean-square coefficient of variation (RMS-CV%) were calculated to assess segmentation accuracy of the network for the high-resolution, DeepResolve, and TCI images compared to the ground-truth manual labels. In addition, the same accuracy metrics were compared for the DeepResolve and TCI segmentations with the original high-resolution images serving as the ground truth. The hypothesis behind this experiment was that if DeepResolve maintained identical image quality to the original high-resolution data, it would have exact segmentation overlap with the original images, and any variations in the overlap would signify the extent of image variability.
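These overlap metrics can be computed directly from binary masks. The sketch below uses the standard definitions; the paper does not print its exact formulas, so the normalization of VD to the reference volume is an assumption:

```python
import numpy as np

def segmentation_metrics(ref, test):
    """Overlap metrics on binary segmentation masks.

    DC  -- Dice coefficient (%)
    VOE -- volumetric overlap error (%), 100 * (1 - |A∩B| / |A∪B|)
    VD  -- absolute volume difference (%), relative to the reference
    """
    ref, test = ref.astype(bool), test.astype(bool)
    inter = np.logical_and(ref, test).sum()
    union = np.logical_or(ref, test).sum()
    dc = 100.0 * 2.0 * inter / (ref.sum() + test.sum())
    voe = 100.0 * (1.0 - inter / union)
    vd = 100.0 * abs(int(test.sum()) - int(ref.sum())) / ref.sum()
    return dc, voe, vd

# Two equal-volume cubes offset by one slice: partial overlap, zero VD.
a = np.zeros((10, 10, 10), bool); a[2:8, 2:8, 2:8] = True
b = np.zeros((10, 10, 10), bool); b[2:8, 2:8, 3:9] = True
dc, voe, vd = segmentation_metrics(a, b)
assert 0 < dc < 100 and 0 < voe < 100 and vd == 0.0
```

Note that VD can be zero even when the overlap is imperfect (as in the offset-cube example), which is why DC and VOE were the more discriminating metrics in the results.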

Neural Network Training:

Network training for both 3D DeepResolve and the 3D U-Net was performed using Keras with a TensorFlow backend (Google, Mountain View, CA) and an Adam optimizer (parameters β1=0.99, β2=0.995, ε=1e-08) (30, 31). A static learning rate of 0.0001 was used for DeepResolve training over 20 epochs, while a dynamic learning rate was used for the U-Net (initially 0.01, decaying by 50% every 4 epochs for 40 epochs). DeepResolve and U-Net training were performed on NVIDIA 1080 Ti and NVIDIA Titan Xp graphics processing units (NVIDIA, Santa Clara, CA), respectively. The models with the best loss on the 35 validation datasets were chosen as the final models for inference.

Reader Study:

Qualitative cartilage image quality was assessed by two musculoskeletal radiologists (K.S. and J.W.) with varying levels of experience (K.S. – 20 years, J.W. – 3 years) and one musculoskeletal radiology resident (Am.C.). All images were presented to the radiologists in a fully blinded and randomized manner. A minimum of a one-week washout period was maintained between reading images from the same subject to minimize memory bias. No additional information apart from the images was provided to the readers.

All three readers scored the three sets of images (original high-resolution, DeepResolve, and TCI) for sharpness, contrast, signal-to-noise ratio (SNR), and artifacts, specifically for articular cartilage. A scoring of overall image sharpness was also performed. Scoring used a 1–5 Likert scale (1=non-diagnostic, 2=limited quality, 3=minimum diagnostic quality, 4=good, 5=high-quality). In addition to cartilage quality, the radiologists also used the DESS sequence to locate and quantify the size of osteophytes in 14 sub-regions of the knee. Osteophyte scoring guidelines were based on the recommendations provided in the MRI Osteoarthritis Knee Score (MOAKS), with two modifications (32). First, the original MOAKS suggests scoring osteophytes in 12 sub-regions of the knee; in this study, we added two more sub-regions for analysis: the central and peripheral posterior femoral condyles (Fig. 2). Second, the original MOAKS guidelines suggested scoring osteophyte sizes on a scale of 0–3, without established criteria for differentiating readings. In this study, we assigned osteophyte scores by measuring the distance the osteophyte projected beyond the joint margin, where grade 1 = 0–2mm projection, grade 2 = 2–4mm projection, and grade 3 = >4mm projection.
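The distance-based grading rule amounts to a simple threshold function. The text does not specify how boundary values (exactly 2mm or 4mm) are handled, so the convention below (boundary assigned to the larger grade) is an assumption:

```python
def osteophyte_grade(projection_mm):
    """Modified MOAKS osteophyte size grade from the measured projection
    beyond the joint margin.  Boundary values are assigned to the larger
    grade here -- the paper does not state the convention used."""
    if projection_mm <= 0:
        return 0   # no osteophyte
    if projection_mm < 2:
        return 1   # 0-2 mm projection
    if projection_mm < 4:
        return 2   # 2-4 mm projection
    return 3       # >4 mm projection

assert [osteophyte_grade(v) for v in (0, 1, 3, 6)] == [0, 1, 2, 3]
```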

Figure 2:


The sub-divisions for the analysis of osteophytes according to the MRI Osteoarthritis Knee Score (MOAKS) criteria. In addition to the originally proposed 12 sub-regions, we scored two additional regions by sub-dividing the posterior femoral condyle into central (Cen) and peripheral (Per) compartments. The remaining sub-divisions were consistent with the original MOAKS definitions. The femoral trochlea, central, and posterior edges are defined using vertical lines arising from the anterior and posterior tibial aspects. The sub-spinous notch (SS) and the patellar crista are defined as parts of the medial compartment. Osteophytes of the patella are scored on the superior and inferior poles.

Reference-less Blur Factor:

Most quantitative image quality metrics, such as SSIM, RMSE, and pSNR, rely on a high-quality reference image and a lower-quality test image. However, in many real-life scenarios a high-quality reference is not available, which necessitates a reference-less image quality metric. Toward this end, we utilized a reference-less 'blur factor' metric to estimate the extent of image blurring on a single-image basis. The blur factor was originally proposed for evaluating natural images and has also been used previously in MRI (33, 34). This metric applies a low-pass filter to the test image and calculates the differences between the filtered and original images, normalized to the intensity of the original. Sharp images cause larger differences between the original and filtered images than images that were blurry to begin with. Due to the normalization, the blur factor lies in a range of 0 to 1, with 0 being least blurry and 1 being most blurry. Since the blur factor relies on low-pass filtering with kernels that can be decoupled into different directions, we calculated the blurring in all three image dimensions of the original high-resolution, DeepResolve, and TCI scans. One-dimensional separable Shah functions of length 9 were used as blurring functions in all three dimensions.
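A simplified single-axis version of this metric might look as follows. The averaging kernel stands in for the length-9 blurring function described above, and the gradient-comparison form follows the cited natural-image blur metric rather than the paper's exact implementation:

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def blur_factor(volume, axis, size=9):
    """Reference-less blur factor along one image dimension (simplified).

    Blur the image with a length-`size` averaging kernel along `axis`,
    then measure how much gradient energy the blurring removed.  Images
    that were already blurry lose little gradient energy, so the factor
    is near 1; sharp images lose a lot, so the factor is near 0.
    """
    vol = volume.astype(float)
    blurred = uniform_filter1d(vol, size=size, axis=axis)
    d_orig = np.abs(np.diff(vol, axis=axis))       # original gradients
    d_blur = np.abs(np.diff(blurred, axis=axis))   # gradients after blurring
    variation = np.sum(np.maximum(0.0, d_orig - d_blur))
    total = np.sum(d_orig)
    return (total - variation) / total             # in [0, 1]

# A step edge is sharp; a pre-blurred copy of it should score blurrier.
sharp = np.zeros((1, 64)); sharp[0, 32:] = 1.0
smooth = uniform_filter1d(sharp, size=15, axis=1)
assert blur_factor(sharp, axis=1) < blur_factor(smooth, axis=1)
```

Because the test filter is one-dimensional, the same function can be applied along each of the three image axes to obtain per-direction blur factors, as done in the study.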

Statistical Analysis:

Mann-Whitney U-tests (α=0.05) with Bonferroni corrections compared the 3D U-Net segmentation performance metrics (DC, VOE, and VD) along with the blur factors for the three image sets. The same tests were also used to compare the segmentation overlap between the DeepResolve and TCI segmentations, using the original high-resolution segmentations as a reference. In the reader study, Mann-Whitney U-tests (α=0.05) with Bonferroni corrections also assessed variations in quality scores between the three image sets. Fleiss' kappa (κ) was used to measure overall inter-reader concordance, while linearly-weighted Cohen's kappa (κ) measured pairwise concordance between the three readers for cartilage image quality and overall image sharpness readings.
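The pairwise comparisons with Bonferroni correction can be sketched with SciPy. The group labels and simulated samples below are illustrative stand-ins, not the study's data:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bonferroni_mwu(groups):
    """Pairwise two-sided Mann-Whitney U-tests with Bonferroni correction.

    `groups` maps a label (e.g. an image set) to its metric samples;
    returns {(a, b): corrected_p} for every unordered pair of labels.
    """
    labels = sorted(groups)
    pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]
    results = {}
    for a, b in pairs:
        _, p = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        results[(a, b)] = min(1.0, p * len(pairs))   # Bonferroni correction
    return results

# Simulated Dice coefficients roughly matching the reported means/SDs.
rng = np.random.default_rng(0)
dc = {"original": rng.normal(90.2, 1.7, 17),
      "deepresolve": rng.normal(89.6, 2.0, 17),
      "tci": rng.normal(86.3, 5.6, 17)}
p_values = bonferroni_mwu(dc)
assert all(0.0 <= v <= 1.0 for v in p_values.values())
```

With three image sets there are three pairwise tests, so each raw p-value is multiplied by three (capped at 1.0).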

For the osteophyte analysis, sensitivity, specificity, accuracy, and their corresponding 95% confidence intervals (CI) were calculated for the DeepResolve and TCI images, using the original high-resolution as the reference standard. Using the aforementioned metrics, a diagnostic odds ratio (DOR) and its CI was calculated and tested for DeepResolve and TCI. Cochran-Mantel-Haenszel tests stratified by DeepResolve and TCI scans were also used to assess variations in osteophyte detection. Fleiss’ κ measured inter-reader concordance for osteophyte detection while Cohen’s κ measured intra-reader concordance comparing the DeepResolve and TCI sequences to the original high-resolution sequence. All statistical analysis was performed in Python (v3.6.1) using the NumPy (v1.12.1) and SciPy (v0.19.1) libraries.
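The diagnostic odds ratio and its confidence interval follow from the standard 2x2 counts via the log-normal approximation. The counts below are hypothetical, chosen only to exercise the formula:

```python
import math

def diagnostic_odds_ratio(tp, fp, fn, tn, z=1.96):
    """Diagnostic odds ratio with a 95% log-normal confidence interval.

    DOR = (TP/FN) / (FP/TN); the interval uses the standard error of
    log(DOR), sqrt(1/TP + 1/FP + 1/FN + 1/TN).
    """
    dor = (tp / fn) / (fp / tn)
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lo = math.exp(math.log(dor) - z * se)
    hi = math.exp(math.log(dor) + z * se)
    return dor, (lo, hi)

# Illustrative (hypothetical) counts, not the paper's data:
dor, (lo, hi) = diagnostic_odds_ratio(tp=160, fp=40, fn=40, tn=560)
assert lo < dor < hi
```

A higher DOR indicates better overall discrimination; non-overlapping intervals between two methods suggest a real difference in detection performance.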

RESULTS:

Example comparisons between multi-planar reformations of the original, DeepResolve, and TCI images (Fig. 3) showed that, compared to the original, DeepResolve maintains adequate image quality. However, TCI images exhibited considerable blurring, which likely affected quantitative segmentation accuracy (dotted arrows, Fig. 3) and the visualization of bony osteophytes (solid arrows, Fig. 3).

Figure 3:


Example multi-planar reformations for the original high-resolution images, DeepResolve, and tricubic interpolated (TCI) images. Image acquisition directions are depicted as follows: readout (R), phase-encoding (P), and slice-encoding (S). Examples of osteophytes (solid arrows) in the femoral trochlea (sagittal and axial slices) and in the medial tibia (coronal slice) show that compared to the original images, DeepResolve has high image fidelity but TCI images blur out osteophyte detail. Additionally, small cartilage features (dotted arrows) in the posterior lateral femoral condyle (coronal and axial) that are depicted well on the original and DeepResolve images are blurred out in TCI images, which affects quantitative segmentation accuracy.

Quantitative Cartilage Morphometry:

The 3D U-Net network generated Dice coefficient (DC), volumetric overlap error (VOE), and absolute volume difference (VD) values of 90.2 ± 1.7%, 17.8 ± 2.9%, and 3.8 ± 3.7%, respectively, for the original high-resolution images (Fig. 4a-c). The DeepResolve DC, VOE, and VD values of 89.6 ± 2.0%, 18.8 ± 3.2%, and 4.6 ± 3.9% did not differ significantly from the original (p=0.38, p=0.38, and p=0.08, respectively). The TCI DC, VOE, and VD values of 86.3 ± 5.6%, 24.0 ± 4.7%, and 5.8 ± 5.1% differed significantly from the original for DC (p<0.001) and VOE (p<0.001), but not VD (p=0.28). The RMS-CV% values for the original high-resolution, DeepResolve, and TCI images compared to the ground-truth labels were 3.1%, 2.8%, and 4.9%, respectively.

Figure 4:


The 3D U-Net segmentation network (a–c) demonstrates accurate segmentation metrics of Dice coefficient (DC), volumetric overlap error (VOE), and volume difference (VD) values for the original high-resolution and DeepResolve images, using manual segmentations as the ground truth. TCI had significantly lower DC and VOE than DeepResolve (p<0.001). Using the cartilage surface generated by segmenting the original high-resolution dataset as the reference, DC, VOE, and VD metrics were calculated for DeepResolve and TCI images. DeepResolve had significantly higher overlap (p<0.0001) with the original cartilage surface than TCI.

In comparisons of the segmentation overlap (Fig. 4d-f) of DeepResolve and TCI with respect to the original scans, DeepResolve had DC, VOE, and VD values of 97.6 ± 0.7%, 4.7 ± 1.4%, and 1.4 ± 1.0%, while TCI had values of 95.0 ± 1.1%, 9.5 ± 2.0%, and 1.5 ± 1.0%. For these comparisons, the DC and VOE metrics were significantly different (p<0.00001) between the two scans, but not VD (p=0.33). Compared to the original, the DeepResolve and TCI RMS-CV% for overall cartilage volume were 0.4% and 2.6%, respectively.

Reader Study – Image Quality:

For the reader study assessing variations in cartilage quality, DeepResolve images consistently performed better than the TCI images for all three readers (Fig. 5a-c). Following pooling of scores from the three readers (Fig. 5d), both DeepResolve and the original high-resolution images had significantly higher cartilage sharpness scores (p<1e-9) and contrast scores (p<0.001) than TCI. The original high-resolution images had significantly higher cartilage sharpness than DeepResolve (p<0.001). All DeepResolve image quality metrics maintained at least the minimum diagnostic quality score of 3, unlike TCI, which had a sharpness score of 2.3 ± 0.5.

Figure 5:


Reader 1 found TCI sharpness and artifacts significantly worse than the original and DeepResolve images (a). Readers 2 and 3 found significant sharpness differences between TCI and both the original and DeepResolve images (b,c). They also perceived significant sharpness differences between DeepResolve and original-resolution images, as well as SNR differences between the original images and both the DeepResolve and TCI images. Overall, both DeepResolve and the original images had significantly better sharpness and contrast scores than TCI, while the original images had significantly better sharpness than DeepResolve (d). * indicates significant (p<0.05) differences compared to the original high-resolution. ** indicates significant (p<0.05) differences compared to DeepResolve.

Inter-reader cartilage quality readings had a Fleiss κ=0.20 (observed agreement = 0.54, expected agreement = 0.42). Readers 1 and 2 had κ = 0.17 (0.05–0.28), readers 1 and 3 had κ = 0.09 (0–0.20), and readers 2 and 3 had κ = 0.52 (0.41–0.64). While there was only slight agreement between pairs of readers, each reader consistently scored DeepResolve higher than TCI (contingency tables in Table 2 and Fig. 5a-c). The overall image sharpness rating was 4.2 ± 0.5 for the original high-resolution images, 3.7 ± 0.8 for the DeepResolve images, and 2.7 ± 0.9 for the TCI images. The original-resolution images had significantly higher overall sharpness ratings than the DeepResolve and TCI images (p<0.0001), while the DeepResolve images also had significantly higher overall sharpness than TCI (p<0.0001).

Table 2:

Contingency tables depicting the inter-reader variations between pairs of the three readers in the analysis of cartilage image quality. All readings are pooled across the readings of sharpness, signal-to-noise ratio, contrast, and artifacts for 17 subjects assessed using all three image sets (original high-resolution, DeepResolve super-resolution, and tricubic interpolation). Note that no scores of 1 (non-diagnostic) were assigned.

Reader 1 (rows) vs. Reader 2 (columns):

            2     3     4    5   Total
  2         6     6     7    0     19
  3         9    39    39    0     87
  4         1    34    53    0     88
  5         0     6     4    0     10
  Total    16    85   103    0    204

Reader 1 (rows) vs. Reader 3 (columns):

            2     3     4    5   Total
  2         5     7     7    0     19
  3         8    34    45    0     87
  4         1    38    49    0     88
  5         0     6     4    0     10
  Total    14    85   105    0    204

Reader 2 (rows) vs. Reader 3 (columns):

            2     3     4    5   Total
  2        14     0     0    0     14
  3         2    54    29    0     85
  4         0    31    74    0    105
  5         0     0     0    0      0
  Total    16    85   103    0    204

Reader Study – Osteophyte Detection:

The sensitivity, specificity, and accuracy of the DeepResolve scans were higher for all osteophyte grades compared to TCI (Table 3). The overall accuracy for DeepResolve was significantly higher (p<0.001) than TCI, as assessed using the Cochran-Mantel-Haenszel test. The DeepResolve DOR was 32.0 (22.3–45.9) while that of TCI was 16.8 (13.0–21.8), indicating a significant (p<0.01) difference in osteophyte detection between DeepResolve and TCI. Compared to the original high-resolution images, DeepResolve had a κ of 0.72 (0.68–0.77) for osteophyte detection, while TCI had a κ of 0.63 (0.58–0.68). The overall Fleiss' κ was 0.61 (observed agreement = 0.75, expected agreement = 0.37).

Table 3:

The sensitivity, specificity, and accuracy, with the corresponding 95% confidence intervals (in parentheses), for grading osteophytes on a scale of 0–3 using the DeepResolve and TCI images, with the original high-resolution images as the reference standard. All results are pooled across all patients and readers.

Grade 1 (incidence: 270)
  Sensitivity: DeepResolve 0.80 (0.74–0.84); TCI 0.73 (0.68–0.89)
  Specificity: DeepResolve 0.88 (0.85–0.91); TCI 0.85 (0.82–0.89)
  Accuracy:    DeepResolve 0.85 (0.82–0.88); TCI 0.81 (0.78–0.84)

Grade 2 (incidence: 332)
  Sensitivity: DeepResolve 0.79 (0.74–0.83); TCI 0.70 (0.65–0.75)
  Specificity: DeepResolve 0.83 (0.79–0.86); TCI 0.78 (0.73–0.82)
  Accuracy:    DeepResolve 0.81 (0.78–0.84); TCI 0.74 (0.71–0.78)

Grade 3 (incidence: 109)
  Sensitivity: DeepResolve 0.81 (0.72–0.88); TCI 0.78 (0.69–0.85)
  Specificity: DeepResolve 0.97 (0.95–0.98); TCI 0.94 (0.91–0.95)
  Accuracy:    DeepResolve 0.94 (0.92–0.96); TCI 0.91 (0.89–0.93)

Total (incidence: 711)
  Sensitivity: DeepResolve 0.80 (0.76–0.82); TCI 0.73 (0.69–0.76)
  Specificity: DeepResolve 0.93 (0.92–0.94); TCI 0.91 (0.90–0.92)
  Accuracy:    DeepResolve 0.90 (0.89–0.91); TCI 0.86 (0.85–0.88)

Reference-less Blur Factor:

The blur factors for the in-plane dimensions (x and y) were similar for all three sets of images, while there was a larger difference between the blur factors in the through-plane dimension (z) (Fig. 6). The variances of the blur factors were small, suggesting repeatable behavior of the metric for images with similar pre-processing and post-processing pipelines. While the DeepResolve z-blur factor did not match that of the original images, it was significantly (p<0.001) better than that of the TCI images, as corroborated by the example multi-planar reformations shown in Figure 3. Additional examples of blurring in the TCI images that DeepResolve can improve, enhancing the visualization of cartilage and osteophytes in the coronal and axial planes, are shown in Figure 7.

Figure 6:


The reference-less blur factor is calculated by convolving the test image with a blurring kernel and evaluating the absolute value of the image gradients (|∇f|) that are normalized (denoted by “N”) to the original image intensity (a). The blur factors demonstrate minimal blurring variations between the three scans in the ‘x’ and ‘y’ directions (in-plane), but the blurring in the ‘z’ (through-plane) was considerably different between scans (b). TCI had the worst through-plane blurring, while DeepResolve was able to start with TCI images and enhance their quality to reduce the through-plane blurring. * indicates significant (p<0.05) differences compared to the original high-resolution. ** indicates significant (p<0.05) differences compared to DeepResolve.

Figure 7:


Example coronal reformats (a-c) demonstrate the variations in the depiction of cartilage across the three image sets. In the zoomed-in inlays (dotted box), DeepResolve enhanced the appearance of jagged cartilage artifacts (solid arrow) and sharpened the contours of the cartilage-bone interface (dotted arrow) compared to TCI. Similarly, example axial reformats (d-f) and the zoomed-in inlays show that DeepResolve enhanced the depiction of several osteophytes whose contours were blurred out in TCI images. The depiction of both cartilage and osteophytes using DeepResolve more closely resembled the sharper original-resolution images than the blurrier TCI images.

DISCUSSION:

In this study, we demonstrated the utility of deep-learning-based super-resolution beyond the assessment of image quality using quantitative SSIM, pSNR, and RMSE. We first utilized an MRI super-resolution method (DeepResolve) to enhance the slice resolution of lower-resolution DESS scans with threefold thicker slices, without biasing the quantitative osteoarthritis biomarkers for which the original high-resolution DESS sequence was used. To quantify variations caused by potential blurring of the cartilage, a highly accurate, fully automated convolutional neural network demonstrated nearly identical quantitative segmentation metrics for the original high-resolution images and the DeepResolve images, both considerably better than the TCI images. In a reader study of qualitative image quality assessed by radiologists, DeepResolve significantly enhanced image quality compared to TCI. Additionally, DeepResolve significantly outperformed TCI in the detection of osteophytes. We also demonstrated the utility of the blur factor metric as a reference-less measure of image blurring. DeepResolve did not match the high image quality of the original high-resolution data, but it considerably outperformed TCI in image quality and biomarker accuracy. This finding is promising since several MRI visualization and analysis techniques rely on interpolation methods to resize images. Moreover, these results demonstrate that super-resolution is not only a promising method for enhancing image quality, but can also be used to recreate quantitative biomarkers that otherwise necessitate high-resolution MRI.

The 3D U-Net segmentation network performed automated femoral cartilage segmentation with very high accuracy. Using an automated method with an accuracy coefficient of variation similar to the variability between two human readers eliminated human uncertainty during cartilage segmentation (5, 29). The high overlap between the original high-resolution segmented volume and the DeepResolve segmented volume indicated that the neural network perceived both sets of images similarly. Identical images fed through the 3D U-Net would produce overlap metrics of 100% DC and 0% VOE, VD, and CV%, values close to the actual results generated. In comparison, traditional methods such as TCI performed significantly worse, which was expected since interpolation has been shown to blur out subtle MRI features (35).

While the 3D U-Net evaluated image quality through a quantitative metric, the reader study determined subjective image quality specific to the tissues of interest. The DeepResolve metrics of contrast and artifacts were comparable to those of the original high-resolution images. The DeepResolve sharpness metric was slightly lower than that of the high-resolution images, but considerably higher than that of TCI, a commonly used technique. There was, however, low concordance amongst the three readers, caused primarily by variations in grading the quality metrics as either 3 (minimum diagnostic quality) or 4 (good quality), which can be challenging to distinguish, especially when analyzing subtle tissues such as cartilage. While Likert scales are based on subjective readings that can be affected by reader experience and comfort levels, the general trend of DeepResolve outperforming TCI was consistent across all three readers.

The reader study also demonstrated that the DeepResolve images had higher conspicuity for detecting osteophytes than the TCI images. Identifying osteophytes requires multi-planar reformations, which demand high in-plane and through-plane resolution, making this an ideal application for through-plane slice resolution enhancement. The DeepResolve sensitivity for detecting osteophytes was consistent across all osteophyte grades, and the DeepResolve sensitivities for subtle (grade 1 and 2) osteophytes were considerably higher than those of TCI. Specificity was highest for the largest (grade 3) osteophytes for both DeepResolve and TCI scans, suggesting that a sufficiently large osteophyte has similar conspicuity on both scans. The inter-reader and intra-reader agreement was comparable to previous studies in which the initial MOAKS scoring criteria were presented and validated by expert musculoskeletal radiologists (32). The overall high diagnostic odds ratio, inter- and intra-reader agreement, and accuracy of DeepResolve suggest that super-resolution may not bias the detection of osteophytes.
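The sensitivity, specificity, and diagnostic odds ratio reported for osteophyte detection are related by a standard formula; a minimal sketch follows, using hypothetical confusion-matrix counts rather than the study's data.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn):
    """DOR = (TP/FN) / (FP/TN); higher values indicate better discrimination."""
    return (tp / fn) / (fp / tn)

def dor_from_sens_spec(sens, spec):
    """Equivalent form: DOR = (sens/(1-sens)) / ((1-spec)/spec)."""
    return (sens / (1.0 - sens)) / ((1.0 - spec) / spec)

# Hypothetical counts: 90 true positives, 10 false negatives (sensitivity 0.90),
# 95 true negatives, 5 false positives (specificity 0.95)
print(diagnostic_odds_ratio(tp=90, fp=5, fn=10, tn=95))  # approximately 171
```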

The DESS sequence was included in the OAI primarily to evaluate cartilage morphometry, and its high resolution was later utilized to evaluate variations in bone shape (6, 29, 36). In this study, we evaluated the impact of using super-resolution to extract the biomarkers that the original DESS enabled. The increased accuracy of cartilage and osteophyte findings and the higher image quality of DeepResolve images compared to TCI images showed that DeepResolve considerably enhanced the quality of the TCI inputs. However, the differences between the DeepResolve and original-resolution images demonstrated that the SR method could be further improved to remove the residual blurring between the two sets of images. Nonetheless, in such a case study that necessitates high resolution, DeepResolve may be a promising method to accelerate future MRI scans acquired at lower resolution. In addition, the underlying super-resolution neural network has been extended to other pulse sequences such as qDESS, which separates the two echoes generated in DESS and also calculates an automatic T2 relaxation time map (19, 37).

Accurately evaluating image quality is one of the primary challenges in deep learning medical image reconstruction and enhancement techniques. Reductions in traditional cost functions such as mean absolute error or mean squared error do not correspond to perceptual image quality (16). Here, we proposed a novel method for quantifying the results of the DeepResolve technique, which successfully determined the extent of blurring in the three sets of images and, conversely, the resolution enhancement of the super-resolution methods. As depicted in the blur factor plots, the in-plane blurring (x and y directions) was primarily induced during the TCI process, and overall, DeepResolve induced only minimal changes in in-plane blurring compared to the original-resolution images. An additional benefit of the blur factor is that it does not require a reference high-quality image for comparison, which could be beneficial as an optimization criterion in unsupervised learning algorithms or when a reference image is unavailable. The low-pass filter used in this study utilized a 1D Shah function (of length 9) as per previous recommendations; however, future studies could investigate the use of additional blurring filters. Additionally, in this work, slice resolution enhancement was chosen because most MRI vendors offer slice-interpolation options and because musculoskeletal MRI is normally performed with 2D fast spin echo sequences that have high in-plane resolution but thicker slices and slice gaps. However, future work will be necessary to investigate the tradeoffs of downsampling in different dimensions in order to determine the ideal technique for limiting scan time.
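A reference-less blur metric of this kind can be sketched briefly. The following is a simplified illustration in the spirit of the no-reference metric of Crété-Roffet et al. (33), not the study's implementation: the `blur_factor` name is hypothetical, a plain length-9 averaging kernel stands in for the Shah-function-derived low-pass filter, and the normalization may differ from the published metric. The idea is that re-blurring a sharp image destroys much of its neighbor-to-neighbor variation, whereas re-blurring an already-blurred image changes little.

```python
import numpy as np

def blur_factor(img, axis=0, k=9):
    """Reference-less blur estimate along one axis. Re-blurs the image with a
    length-k low-pass filter and measures how little neighbor-to-neighbor
    variation is lost; returns a value in [0, 1], higher = more blurred."""
    img = img.astype(float)
    kernel = np.ones(k) / k  # simple averaging low-pass (an assumption)
    blurred = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, img)
    d_img = np.abs(np.diff(img, axis=axis))      # variation of the input
    d_blur = np.abs(np.diff(blurred, axis=axis)) # variation after re-blurring
    lost = np.maximum(0.0, d_img - d_blur).sum() # variation removed by re-blurring
    total = d_img.sum()
    return 1.0 - lost / total if total > 0 else 0.0

# A pre-blurred edge scores as more blurred than a sharp edge
sharp = np.zeros((32, 32))
sharp[16:] = 1.0
soft = np.apply_along_axis(
    lambda v: np.convolve(v, np.ones(9) / 9, mode="same"), 0, sharp)
print(blur_factor(soft) > blur_factor(sharp))  # True
```

Because the metric needs no reference image, it can be evaluated along each of the three image dimensions independently, which is how directional (x, y, and slice) blurring can be compared across reconstructions.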

While the results from this study demonstrate the promise of DeepResolve, the study had several limitations. First, the analysis of quantitative cartilage morphometry was performed using an additional CNN rather than a trained human observer. However, since the expected variation between the original high-resolution and the DeepResolve scans was subtle, it would have been challenging to decouple human segmentation variations from the inter-scan variations. Second, while the reader study assessed cartilage image quality, an analysis of cartilage lesions may have been useful. However, the DESS sequence used in the OAI has low sensitivity to cartilage lesions, likely due to the combination of the two separate DESS echo contrasts (38, 39). Future studies separating the two DESS echoes, akin to qDESS, may provide an improved estimate of structural abnormalities. Third, the analysis of osteophytes was binned into discrete 0–3 categories instead of reporting actual size, due to the challenges of standardizing readings between different readers. Future studies could perform segmentation of the osteophytes to evaluate whether super-resolution changes their perceived shape and dimensions. Moreover, applying these findings to larger cohorts would be beneficial to determine the utility of SR, especially for subjects with varying OA severities. Our preliminary findings also suggest that DeepResolve (trained on Siemens single-contrast DESS images) can be fine-tuned to enhance GE multi-contrast quantitative DESS images (which generate two varied contrasts) using only 30 patients (19). However, additional characterization of the generalizability of DeepResolve to additional sequences, protocols, and vendors will be necessary for widespread adoption.

In conclusion, in an effort to interpret the results of a deep-learning-based super-resolution neural network, we have quantitatively shown that super-resolution minimally affects perceived global image blurring, and qualitatively and quantitatively shown that it minimally biases cartilage and osteophyte biomarkers and image quality. Given this performance, which minimally blurs or biases subtle musculoskeletal tissues, super-resolution may be a more promising technique than naïve interpolation for accelerating image acquisition by transforming rapidly acquired low-resolution images into higher-resolution images.

Table 1:

The osteophyte scoring paradigm, which summarizes the locations at which osteophytes were scored, along with the criteria for determining osteophyte grade based on the extent of protrusion from the joint surface.

Bone    | Region    | Side               | Scan Plane
--------|-----------|--------------------|-----------
Femur   | Trochlea  | Medial             | Axial
Femur   | Trochlea  | Lateral            | Axial
Femur   | Posterior | Medial Peripheral  | Axial
Femur   | Posterior | Medial Central     | Axial
Femur   | Posterior | Lateral Peripheral | Axial
Femur   | Posterior | Lateral Central    | Axial
Femur   | Central   | Medial             | Coronal
Femur   | Central   | Lateral            | Coronal
Patella | N/A       | Superior           | Sagittal
Patella | N/A       | Inferior           | Sagittal
Patella | N/A       | Medial             | Axial
Patella | N/A       | Lateral            | Axial
Tibia   | N/A       | Medial             | Coronal
Tibia   | N/A       | Lateral            | Coronal

Protrusion       | 0–2 mm | 2–4 mm | 4+ mm
Osteophyte Grade | 1      | 2      | 3
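The protrusion-to-grade rule in the table can be expressed as a simple binning function. This is a hypothetical helper for illustration only, not part of the study's software; treating exact 2 mm and 4 mm protrusions as the lower grade, and a measurement of zero as "no osteophyte" (grade 0), are assumptions about boundary handling.

```python
def osteophyte_grade(protrusion_mm):
    """Map osteophyte protrusion (mm) to the 1-3 grade in the table above.
    Returns 0 when there is no measurable protrusion (assumed convention);
    boundary values fall into the lower grade (also an assumption)."""
    if protrusion_mm <= 0:
        return 0
    if protrusion_mm <= 2:
        return 1
    if protrusion_mm <= 4:
        return 2
    return 3

print([osteophyte_grade(p) for p in (0, 1.5, 3, 5)])  # [0, 1, 2, 3]
```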

Acknowledgments

Grant Support:

Contract grant sponsor: National Institutes of Health (NIH); contract grant numbers NIH R01 AR063643, R01 EB002524, K24 AR062068, and P41 EB015891. Contract grant sponsor: GE Healthcare (research support). Image data was acquired from the Osteoarthritis Initiative (OAI). The OAI is a public-private partnership comprised of five contracts (N01-AR-2-2258; N01-AR-2-2259; N01-AR-2-2260; N01-AR-2-2261; N01-AR-2-2262) funded by the National Institutes of Health, a branch of the Department of Health and Human Services, and conducted by the OAI Study Investigators. Private funding partners include Merck Research Laboratories; Novartis Pharmaceuticals Corporation, GlaxoSmithKline; and Pfizer, Inc. Private sector funding for the OAI is managed by the Foundation for the National Institutes of Health. This manuscript was prepared using an OAI public use data set and does not necessarily reflect the opinions or views of the OAI investigators, the NIH, or the private funding partners.

Disclosures:

A.C. has provided consulting services to Skope MR Inc, Subtle Medical, and Chondrometrics GmBH; and is a shareholder of Subtle Medical, LVIS Corporation, and Brain Key. Z.F. is an employee of LVIS Corporation. G.G. and B.H. have received research support from GE Healthcare and Philips. B.H. is a shareholder of LVIS Corporation. None of these organizations was involved in the design, execution, data analysis, or reporting of this study.

REFERENCES:

  • 1. Wallace IJ, Worthington S, Felson DT, et al.: Knee osteoarthritis has doubled in prevalence since the mid-20th century. Proc Natl Acad Sci 2017; 114:201703856.
  • 2. Vos T, Flaxman AD, Naghavi M, et al.: Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990–2010: A systematic analysis for the Global Burden of Disease Study 2010. Lancet 2012; 380:2163–2196.
  • 3. Guermazi A, Roemer FW, Burstein D, Hayashi D: Why radiography should no longer be considered a surrogate outcome measure for longitudinal assessment of cartilage in knee osteoarthritis. Arthritis Res Ther 2011; 13:247.
  • 4. Baum T, Joseph GB, Karampinos DC, Jungmann PM, Link TM, Bauer JS: Cartilage and meniscal T2 relaxation time as non-invasive biomarker for knee osteoarthritis and cartilage repair procedures. Osteoarthritis Cartilage 2013; 21:1474–84.
  • 5. Eckstein F, Kwoh CK, Link TM: Imaging research results from the Osteoarthritis Initiative (OAI): a review and lessons learned 10 years after start of enrolment. Ann Rheum Dis 2014; 2006:1289–1300.
  • 6. Peterfy CG, Schneider E, Nevitt M: The osteoarthritis initiative: report on the design rationale for the magnetic resonance imaging protocol for the knee. Osteoarthr Cartil 2008; 16:1433–1441.
  • 7. Blaimer M, Breuer F, Mueller M, Heidemann RM, Griswold MA, Jakob PM: SMASH, SENSE, PILS, GRAPPA: how to choose the optimal method. Top Magn Reson Imaging 2004; 15:223–36.
  • 8. Hollingsworth KG: Reducing acquisition time in clinical MRI by data undersampling and compressed sensing reconstruction. Phys Med Biol 2015; 60:R297–R322.
  • 9. Heidemann RM, Özsarlak Ö, Parizel PM, et al.: A brief review of parallel magnetic resonance imaging. Eur Radiol 2003; 13:2323–2337.
  • 10. Busse RF, Hariharan H, Vu A, Brittain JH: Fast spin echo sequences with very long echo trains: design of variable refocusing flip angle schedules and generation of clinical T2 contrast. Magn Reson Med 2006; 55:1030–1037.
  • 11. Park SC, Park MK, Kang MG: Super-resolution image reconstruction: A technical overview. IEEE Signal Process Mag 2003; 20:21–36.
  • 12. Chen Y, Xie Y, Zhou Z, Shi F, Christodoulou AG, Li D: Brain MRI super resolution using 3D deep densely connected neural networks. 2018:1–4.
  • 13. Chaudhari AS, Fang Z, Kogan F, et al.: Super-resolution musculoskeletal MRI using deep learning. Magn Reson Med 2018; 80:2139–2154.
  • 14. Lustig M, Donoho D, Pauly JM: Sparse MRI: The application of compressed sensing for rapid MR imaging. Magn Reson Med 2007; 58:1182–95.
  • 15. Zhao H, Gallo O, Frosio I, Kautz J: Loss functions for neural networks for image processing. arXiv preprint arXiv:1511.08861 2015:1–11.
  • 16. McCann MT, Jin KH, Unser M: Convolutional neural networks for inverse problems in imaging: A review. IEEE Signal Process Mag 2017; 34:85–95.
  • 17. Mardani M, Gong E, Cheng JY, et al.: Deep generative adversarial neural networks for compressive sensing (GANCS) MRI. IEEE Trans Med Imaging 2018; PP(c):1–1.
  • 18. Chaudhari AS, Fang Z, Kogan F, et al.: Super-resolution musculoskeletal MRI using deep learning. Magn Reson Med 2018.
  • 19. Chaudhari A, Fang Z, Lee JH, Gold G, Hargreaves B: Deep learning super-resolution enables rapid simultaneous morphological and quantitative magnetic resonance imaging. In Int Work Mach Learn Med Image Reconstr; 2018:3–11.
  • 20. Altman RD, Gold GE: Atlas of individual radiographic features in osteoarthritis, revised. Osteoarthr Cartil 2007; 15 Suppl A:A1–56.
  • 21. Ding C, Garnero P, Cicuttini F, Scott F, Cooley H, Jones G: Knee cartilage defects: association with early radiographic osteoarthritis, decreased cartilage volume, increased joint surface area and type II collagen breakdown. Osteoarthr Cartil 2005; 13:198–205.
  • 22. Zhang Y, Jordan JM: Epidemiology of osteoarthritis. Clin Geriatr Med 2010; 26:355–69.
  • 23. Felson DT, Niu J, Guermazi A, Sack B, Aliabadi P: Defining radiographic incidence and progression of knee osteoarthritis: Suggested modifications of the Kellgren and Lawrence scale. Ann Rheum Dis 2011; 70:1884–1886.
  • 24. Nair V, Hinton GE: Rectified linear units improve restricted Boltzmann machines. Proc 27th Int Conf Mach Learn 2010:807–814.
  • 25. Kim J, Kwon Lee J, Mu Lee K: Accurate image super-resolution using very deep convolutional networks. In Proc IEEE Conf Comput Vis Pattern Recognit; 2016:1646–1654.
  • 26. Ronneberger O, Fischer P, Brox T: U-Net: Convolutional networks for biomedical image segmentation. MICCAI 2015:234–241.
  • 27. Norman B, Pedoia V, Majumdar S: Use of 2D U-Net convolutional neural networks for automated cartilage and meniscus segmentation of knee MR imaging data to determine relaxometry and morphometry. Radiology 2018; 000:172322.
  • 28. Liu F, Zhou Z, Jang H, Samsonov A, Zhao G, Kijowski R: Deep convolutional neural network and 3D deformable approach for tissue segmentation in musculoskeletal magnetic resonance imaging. 2017; 00.
  • 29. Eckstein F, Hudelmaier M, Wirth W, et al.: Double echo steady state magnetic resonance imaging of knee articular cartilage at 3 Tesla: a pilot study for the Osteoarthritis Initiative. Ann Rheum Dis 2006; 65:433–41.
  • 30. Kingma DP, Ba J: Adam: A method for stochastic optimization. 2014:1–15.
  • 31. Desai AA, Gold GE, Hargreaves BA, Chaudhari AS: Technical considerations for semantic segmentation in MRI using convolutional neural networks. arXiv preprint arXiv:1902.01977 2019.
  • 32. Hunter DJ, Guermazi A, Lo GH, et al.: Evolution of semi-quantitative whole joint assessment of knee OA: MOAKS (MRI Osteoarthritis Knee Score). Osteoarthr Cartil 2011; 19:990–1002.
  • 33. Crété-Roffet F, Dolmiere T, Ladret P, et al.: The blur effect: Perception and estimation with a new no-reference perceptual blur metric. Hum Vis Electron Imaging XII 2007; 6492:64920I.
  • 34. Kamesh Iyer S, Tasdizen T, Burgon N, et al.: Compressed sensing for rapid late gadolinium enhanced imaging of the left atrium: A preliminary study. Magn Reson Imaging 2016; 34:846–54.
  • 35. Greenspan H, Oz G, Kiryati N, Peled S: MRI inter-slice reconstruction using super-resolution. Magn Reson Imaging 2002; 20:437–446.
  • 36. Neogi T, Bowes MA, Niu J, et al.: Magnetic resonance imaging-based three-dimensional bone shape of the knee predicts onset of knee osteoarthritis: Data from the osteoarthritis initiative. Arthritis Rheum 2013; 65:2048–2058.
  • 37. Chaudhari AS, Black MS, Eijgenraam S, et al.: Five-minute knee MRI for simultaneous morphometry and T2 relaxometry of cartilage and meniscus and for semiquantitative radiological assessment using double-echo in steady-state at 3T. J Magn Reson Imaging 2018; 47:1328–1341.
  • 38. Kohl S, Meier S, Ahmad SS, et al.: Accuracy of cartilage-specific 3-Tesla 3D-DESS magnetic resonance imaging in the diagnosis of chondral lesions: comparison with knee arthroscopy. J Orthop Surg Res 2015; 10:191.
  • 39. Chaudhari AS, Stevens KJ, Sveinsson B, et al.: Combined 5-minute double-echo in steady-state with separated echoes and 2-minute proton-density-weighted 2D FSE sequence for comprehensive whole-joint knee MRI assessment. J Magn Reson Imaging 2018:1–12.