Abstract
Objective
To develop and evaluate deep learning (DL) risk assessment models for predicting the progression of radiographic medial joint space loss using baseline knee X-rays.
Methods
Knees from the Osteoarthritis Initiative without and with progression of radiographic joint space loss (defined as a ≥0.7mm decrease in minimum medial joint space width between baseline and 48-month follow-up X-rays) were randomly stratified into training (1400 knees), validation (150 knees), and hold-out testing (400 knees) datasets. A DL network was trained to predict the progression of radiographic joint space loss using the baseline knee X-rays. An artificial neural network was used to develop a traditional model for predicting progression from demographic and radiographic risk factors. A combined joint training model was developed using a DL network to extract information from baseline knee X-rays as a feature vector, which was then concatenated with the risk factor data vector. Area under the curve (AUC) analysis was performed on the hold-out testing dataset to evaluate model performance.
Results
The traditional model had an AUC of 0.660 (61.5% sensitivity and 64.0% specificity) for predicting progression. The DL model had an AUC of 0.799 (78.0% sensitivity and 75.5% specificity), which was significantly higher (p<0.001) than that of the traditional model. The combined model had an AUC of 0.863 (80.5% sensitivity and specificity), which was significantly higher than those of the DL (p=0.015) and traditional (p<0.001) models.
Conclusion
DL models using baseline knee X-rays had higher diagnostic performance for predicting the progression of radiographic joint space loss than the traditional model using demographic and radiographic risk factors.
Keywords: Osteoarthritis, Deep Learning, Radiographs, Risk Assessment Models
INTRODUCTION
Osteoarthritis (OA) is one of the most prevalent and disabling chronic diseases in the United States and worldwide 1,2. The knee is the joint most commonly affected by OA 3,4. Identifying individuals at high risk for knee OA incidence and progression would provide a window of opportunity for disease modification during the earliest stages of the disease process, when interventions such as weight loss, physical activity, and range of motion and strengthening exercises are likely to be most effective 5. Identifying individuals at high risk for OA progression would also be useful for selecting the most suitable subjects for inclusion in clinical trials investigating new disease-modifying drugs 6,7. Clinical drug trials currently require large numbers of subjects and long follow-up periods because of the inherently variable rates of disease progression in individuals with knee OA 8–13. However, exclusive selection of subjects at high risk for knee OA progression for inclusion in clinical trials could reduce study size and duration, decrease the required financial resources, and potentially increase the likelihood of successful development of new disease-modifying drugs 6,7.
There is an important need to create OA risk assessment models for widespread use in clinical practice and clinical drug trials. However, current models, which have primarily used clinical and radiographic risk factors including age, gender, race, body mass index (BMI), history of knee injury, and Kellgren-Lawrence (KL) radiographic grade, have shown only moderate success for predicting the incidence 14–17 and progression 18,19 of knee OA. Incorporation of semi-quantitative and quantitative measures of knee joint pathology on baseline X-rays 20–23 and magnetic resonance (MR) images 24 has improved the diagnostic performance of OA risk assessment models. However, the time and expertise needed to acquire these imaging parameters make them difficult to incorporate into widespread, cost-effective OA risk assessment models. Thus, new and improved strategies are needed to create comprehensive risk assessment models for the incidence and progression of knee OA.
Deep learning (DL) is an advanced artificial intelligence method that uses multiple levels of representation obtained by composing simple nonlinear modules, each of which transforms the representation at one level into a representation at a higher and more abstract level 25–28. With the composition of enough such transformations, very complex features can be learned 29. DL has tremendous potential for creating OA risk assessment models by providing a new, rapid, and fully automated method to extract useful prognostic information from imaging studies. DL could potentially learn a representative subset of features on baseline imaging studies associated with the incidence and progression of knee OA. Our study was performed to develop and evaluate DL risk assessment models for predicting the progression of radiographic medial joint space loss using baseline knee X-rays. We hypothesized that DL models would have higher diagnostic performance for predicting the progression of radiographic joint space loss than traditional models using demographic and radiographic risk factors.
METHODS
Eligible Study Participants
Eligible study participants were selected from the Osteoarthritis Initiative (OAI) database, a multi-center study that collected longitudinal clinical and imaging data over a nine-year follow-up period in 4796 subjects between the ages of 45 and 75 years with or at high risk for knee OA 30. Study participants were selected from both the incidence cohort of subjects without radiographic knee OA but with knee pain and risk factors for OA incidence and the progression cohort of subjects with radiographic knee OA and risk factors for OA progression. The OAI was approved by the Committee on Human Research and the Institutional Review Boards at the University of California, San Francisco and at each individual clinical recruitment site.
Eligible study participants had the following clinical and imaging data publicly available in the OAI database: 1) age, gender, race, BMI, and history of knee injury (defined according to a question on the standardized OAI questionnaire as a nonspecific acute injury preventing weight bearing for at least two days) at baseline, 2) KL grade of knee OA 31 provided by central reading at baseline, 3) anatomic axis alignment (tibiofemoral angle) measurements 32 provided by central reading at baseline, and 4) minimum medial joint space width measurements 33 provided by central reading at baseline and 48-month follow-up. KL grade, tibiofemoral angle, and minimum medial joint space width measurements were obtained from bilateral standing posterior-anterior knee X-rays acquired using a standardized technique with a SynaFlexor fixed-flexion positioner 34. There were 2301 subjects with 4602 knees with the above-mentioned clinical and imaging data available in the OAI database. One hundred fifty-five knees in 154 subjects had a KL grade of 4 at baseline and were excluded because their minimum medial joint space width measurements would be expected to be 0mm at baseline. Thus, there were 2300 subjects with 4447 knees in the OAI database eligible to participate in our study.
Outcome Measure for the OA Risk Assessment Models
The outcome measure for the OA risk assessment models was definitive progression of medial joint space loss on longitudinal bilateral standing posterior-anterior knee X-rays between baseline and 48-month follow-up, measured using semi-automated software. The software determined the minimum joint space width across the medial compartment of the knee joint 33. Definitive progression of radiographic joint space loss was defined according to the National Institutes of Health OA Biomarkers Consortium Project as a decrease greater than or equal to 0.7mm in minimum medial joint space width between baseline and 48-month follow-up. This cutoff was based on the mean and standard deviation of one-year changes in minimum medial joint space width measurements on bilateral standing posterior-anterior X-rays in 90 knees in the OAI reference control cohort with a KL grade of 0 and WOMAC pain score of 0 at both baseline and 24-month follow-up 35.
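The outcome definition above reduces to a simple per-knee rule. The following minimal sketch assumes that minimum medial joint space width (mJSW) values are available in millimetres for the baseline and 48-month visits; the function name and the small floating-point tolerance are illustrative choices, not part of the OAI software.

```python
# Sketch of the binary outcome label: definitive progression of radiographic
# medial joint space loss = decrease in minimum medial JSW >= 0.7 mm.
PROGRESSION_CUTOFF_MM = 0.7  # cutoff stated in the text
_EPS = 1e-9  # tolerance so values such as 3.9 - 3.2 are not lost to float rounding


def progression_label(mjsw_baseline_mm: float, mjsw_48m_mm: float) -> int:
    """Return 1 if minimum medial JSW decreased by >= 0.7 mm, else 0.

    Hypothetical helper; inputs are assumed to be centrally read mJSW values.
    """
    decrease = mjsw_baseline_mm - mjsw_48m_mm
    return 1 if decrease >= PROGRESSION_CUTOFF_MM - _EPS else 0


if __name__ == "__main__":
    print(progression_label(4.5, 3.6))  # 0.9 mm decrease -> 1
    print(progression_label(4.5, 4.2))  # 0.3 mm decrease -> 0
```

Note that a knee whose joint space widens over follow-up (a negative decrease) is simply labeled as non-progressing under this rule.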
OA Risk Assessment Models
Traditional Risk Assessment Models
Traditional risk assessment models were developed using five alternative approaches: random forest 36, logistic regression 37, and three different artificial neural networks (ANNs) 14–16,24,38,39. Random forest is an ensemble-learning model that creates a multitude of decision trees during training, with the output being the mode of the classifications of the individual trees. Logistic regression is a multivariable method for binary classification that uses a logistic function to map the input to a confidence score between 0 and 1 for the output. The first ANN model (ANN), illustrated in Figure 1, had an architecture identical to that of an ANN that showed high diagnostic performance for creating OA risk assessment models in previous studies and consisted of four layers: an input layer, two hidden layers with 64 and 32 hidden nodes, and an output layer 38,39. The second ANN model (ANN 2) consisted of four layers: an input layer, two hidden layers with 85 and 25 hidden nodes, and an output layer 40. The third ANN model (ANN 3) consisted of five layers: an input layer, three hidden layers with 20, 26, and 18 hidden nodes, and an output layer 38. For all ANN models, a softmax output layer was used to compress the information 38 and provide a confidence value between 0 and 1 indicating the likelihood for progression of radiographic joint space loss. The input of the risk assessment models consisted of seven demographic and radiographic risk factors: baseline age, gender, race, BMI, history of knee injury, KL grade, and tibiofemoral angle 7,41–43. Continuous risk factors were normalized by their means and standard deviations 44. No pre-processing of categorical variables was performed.
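A minimal sketch of the input preparation and the 7–64–32–2 forward pass of the first ANN follows. It assumes that age, BMI, and tibiofemoral angle are the continuous factors (z-scored with training-set statistics) and that gender, race, knee injury, and KL grade are passed through as integer codes; the ReLU hidden activation, the random weight initialization, and all dictionary keys are hypothetical, since the text does not specify them. Training is not shown.

```python
import math
import random

CONTINUOUS = ("age", "bmi", "tibiofemoral_angle")  # assumed continuous factors
CATEGORICAL = ("gender", "race", "knee_injury", "kl_grade")  # left unprocessed


def fit_zscore(rows, keys=CONTINUOUS):
    """Training-set mean and (population) standard deviation per continuous factor."""
    stats = {}
    for k in keys:
        vals = [r[k] for r in rows]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        stats[k] = (mean, std if std > 0 else 1.0)
    return stats


def to_input_vector(row, stats):
    """Seven-element input: z-scored continuous factors, then raw categorical codes."""
    z = [(row[k] - stats[k][0]) / stats[k][1] for k in CONTINUOUS]
    return z + [float(row[k]) for k in CATEGORICAL]


def dense(x, weights, bias, relu=True):
    """Fully connected layer, optionally followed by ReLU (activation is an assumption)."""
    out = [sum(w * xi for w, xi in zip(w_row, x)) + b for w_row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases, for illustration only."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def ann_forward(x, layers):
    h = x
    for weights, bias in layers[:-1]:
        h = dense(h, weights, bias)
    logits = dense(h, *layers[-1], relu=False)
    return softmax(logits)  # [p(no progression), p(progression)]


rng = random.Random(0)
layers = [init_layer(7, 64, rng), init_layer(64, 32, rng), init_layer(32, 2, rng)]
```

The two softmax outputs sum to 1, so the second output can be read directly as the confidence value for progression.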
Figure 1:
Illustration of the architecture of the traditional ANN risk assessment model for predicting the progression of radiographic joint space loss. The ANN had four layers including an input layer with seven demographic and radiographic risk factors, two hidden layers with 64 and 32 hidden nodes, and an output layer with two nodes providing a confidence value between 0 and 1 indicating the likelihood for progression of radiographic joint space loss.
DL Risk Assessment Model
The DL risk assessment model consisted of two separate deep convolutional neural networks (CNNs) connected in a cascaded fashion to create a fully automated processing pipeline. The first, joint cropping CNN was used to crop regions of interest around each individual knee joint on the baseline bilateral standing posterior-anterior knee X-rays to narrow the range of information utilized for DL analysis. The second, classification CNN evaluated the cropped images of the knee joint to determine the likelihood for progression of radiographic joint space loss. The detailed structure of the joint cropping and classification CNNs is described in Supplemental Table 1. The processing pipeline framework was implemented in a hybrid computing environment involving Python (version 2.7, Python Software Foundation, Wilmington, DE) and MATLAB (version 2018a, MathWorks, Natick, MA). The CNNs were coded using TensorFlow (version 1.08, Google, Mountain View, CA).
The fully automated joint cropping CNN was adapted from You Only Look Once (YOLO) 45, which consists of 24 convolutional layers followed by two fully connected layers. The input of the CNN was the baseline knee X-rays in DICOM format, which were resized to a 448×448 matrix size, normalized by the means and standard deviations of images in the ImageNet training dataset 26,44, and converted to NumPy arrays in Python. The convolutional and fully connected layers were used to extract image features to provide the coordinates of two square boxes that defined the regions of each individual knee joint on the X-rays. The predicted square boxes were doubled in area to compensate for potential localization errors and superimposed over the original DICOM X-ray images with full matrix size. Cropped images containing each individual knee joint were then obtained, downsized to a 224×224 matrix size, and used as the input to the classification CNN.
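Doubling the area of a square box means scaling its side length by √2 about the box centre. The sketch below shows that expansion, clipping to the image border, and a crop followed by a nearest-neighbour resize to 224×224. It assumes the predicted box has already been mapped back to full-resolution pixel coordinates as a centre and side length; the function names, the plain nested-list image, and the nearest-neighbour interpolation are all illustrative assumptions rather than the authors' implementation.

```python
import math


def expand_square_box(cx, cy, side, img_w, img_h, area_factor=2.0):
    """Grow a square box about its centre so its area is multiplied by area_factor.

    Doubling the area scales the side by sqrt(2). The result is clipped to the
    image, so boxes near the border may end up non-square.
    """
    new_side = side * math.sqrt(area_factor)
    half = new_side / 2.0
    x0 = max(0, int(round(cx - half)))
    y0 = max(0, int(round(cy - half)))
    x1 = min(img_w, int(round(cx + half)))
    y1 = min(img_h, int(round(cy + half)))
    return x0, y0, x1, y1


def crop_and_resize(image, box, out_size=224):
    """Crop a 2D image (list of rows) to box and resize by nearest-neighbour sampling."""
    x0, y0, x1, y1 = box
    crop = [row[x0:x1] for row in image[y0:y1]]
    h, w = len(crop), len(crop[0])
    return [
        [crop[int(r * h / out_size)][int(c * w / out_size)] for c in range(out_size)]
        for r in range(out_size)
    ]


if __name__ == "__main__":
    box = expand_square_box(cx=500, cy=400, side=100, img_w=2000, img_h=1000)
    print(box)  # side grows from 100 px to about 141 px
```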
The classification CNN was adapted from Densely Connected Convolutional Networks (DenseNet) 46 and consisted of three dense blocks, with adjacent blocks connected by a convolutional layer and a max-pooling layer. A global average pooling (GAP) layer followed the dense blocks. A softmax output layer with two nodes was used to compress the information 38 and provide a confidence value between 0 and 1 indicating the likelihood for progression of radiographic joint space loss. The GAP layer was modified using a gradient back-propagation method to calculate saliency maps matching the input image size that showed the regions of discriminative high activation on the X-ray on which the classification CNN based its interpretation 47.
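One common way to turn back-propagated gradients into a saliency map is the Grad-CAM-style weighting sketched below: each feature-map channel is weighted by the spatial average (a GAP) of the gradient of the progression score with respect to that channel, the weighted channels are summed, negative values are clipped, and the map is upsampled to the input size. Whether reference 47 uses exactly this weighting is an assumption. In practice the gradients come from the framework's automatic differentiation; here they are supplied as plain nested lists so the sketch is self-contained.

```python
def gap(maps):
    """Global average pooling: one scalar per channel, maps[k][i][j]."""
    return [sum(sum(row) for row in m) / (len(m) * len(m[0])) for m in maps]


def gradcam_style_map(feature_maps, gradients):
    """Weight channels by spatially averaged gradients, sum, ReLU, scale to [0, 1].

    feature_maps: activations A_k of the last convolutional stage, shape [K][H][W].
    gradients:    d(progression score)/d(A_k), same shape (assumed precomputed).
    """
    alphas = gap(gradients)  # alpha_k: importance of channel k
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [
        [max(0.0, sum(a * fm[i][j] for a, fm in zip(alphas, feature_maps))) for j in range(w)]
        for i in range(h)
    ]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def upsample_nearest(cam, out_h, out_w):
    """Nearest-neighbour upsampling so the map matches the input image size."""
    h, w = len(cam), len(cam[0])
    return [[cam[i * h // out_h][j * w // out_w] for j in range(out_w)] for i in range(out_h)]
```

A real pipeline would typically use bilinear interpolation for the upsampling step; nearest-neighbour is used here only to keep the example dependency-free.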
Combined Traditional and DL Risk Assessment Models
Combined traditional and DL risk assessment models were developed using two different approaches. In the first approach, a simple logistic regression model was used to provide a final confidence value between 0 and 1 indicating the likelihood for progression of radiographic joint space loss based on the individual confidence values generated by the best traditional model and the DL model 37. The logistic regression model provided a final confidence value for the progression of radiographic joint space loss, h(X), based on the two inputs:
$$h(X) = \frac{1}{1 + e^{-W^{T}X}}$$
where $X = [X_{T} \; X_{DL}]$ was a vector of the confidence values of the traditional model ($X_{T}$) and the DL model ($X_{DL}$), and $W$ was a vector of the parameters of the logistic regression model.
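The logistic combination h(X) = 1/(1 + exp(-WᵀX)) can be sketched in a few lines. This version assumes a constant 1 is prepended to X as an intercept term (so X = [1, X_T, X_DL]), which the text does not state, and fits W by plain batch gradient descent on the mean log-loss; the learning rate, epoch count, and toy confidence pairs in the usage example are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def h(x, w):
    """h(X) = 1 / (1 + exp(-W^T X)); x = [1, X_T, X_DL] with an assumed intercept."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit W by batch gradient descent on the mean log-loss (illustrative optimizer)."""
    w = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = h(x, w) - y  # derivative of log-loss w.r.t. the linear score
            for k, xk in enumerate(x):
                grad[k] += err * xk / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Hypothetical (traditional, DL) confidence pairs, not study data.
    xs = [[1, 0.2, 0.1], [1, 0.3, 0.2], [1, 0.4, 0.3], [1, 0.8, 0.9], [1, 0.7, 0.85], [1, 0.6, 0.7]]
    ys = [0, 0, 0, 1, 1, 1]
    w = fit_logistic(xs, ys)
    print(h([1, 0.9, 0.9], w), h([1, 0.1, 0.1], w))
```

Because both inputs are already probabilities, this second-stage model only has to learn how much to trust each base model, which is why a two-parameter (plus intercept) logistic regression is sufficient here.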
In the second approach, a joint training model was used to take into account the demographic and radiographic risk factors and the DL analysis of baseline knee X-rays as individual inputs. The combined model was developed using YOLO 45 and DenseNet 46 to extract DL information as a feature vector, which was then concatenated with the information extracted from the demographic and radiographic risk factor data. The joint training model contained three components: a feature extractor for DL analysis of baseline knee X-rays, a feature extractor for demographic and radiographic risk factor data, and a fully connected network to combine the information. The feature extractor for DL analysis of baseline knee X-rays had the same architecture as the DL risk assessment model. The feature extractor for demographic and radiographic risk factor data was a two-layer fully connected network. The risk factor data were normalized by means and standard deviations and used as the input into a seven-dimensional fully connected layer. The outputs of the two feature extractors were concatenated into a new vector, which was then used as the input into another fully connected network for joint model training. The CNNs and fully connected layers were connected in a cascaded fashion to create a fully automated processing pipeline, as shown in Figure 2.
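The fusion step of the joint training model can be sketched as a forward pass: the seven normalized risk factors go through two seven-unit fully connected layers, the result is concatenated with the image feature vector from the CNN branch, and a further fully connected network produces the two-class softmax output. The image feature dimension, the 64-unit fusion layer, the ReLU activations, and the random initialization below are hypothetical, and end-to-end training (back-propagating through both branches together) is not shown.

```python
import math
import random


def linear(x, weights, bias):
    """Fully connected layer: y = Wx + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_fc(n_in, n_out, rng):
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


class JointTrainingHead:
    """Risk-factor branch (two 7-unit FC layers), concatenation, FC classifier."""

    def __init__(self, image_feature_dim, risk_dim=7, fused_hidden=64, seed=0):
        rng = random.Random(seed)
        self.risk_fc1 = init_fc(risk_dim, risk_dim, rng)
        self.risk_fc2 = init_fc(risk_dim, risk_dim, rng)
        self.fuse_fc = init_fc(image_feature_dim + risk_dim, fused_hidden, rng)
        self.out_fc = init_fc(fused_hidden, 2, rng)

    def __call__(self, image_features, risk_factors):
        r = relu(linear(risk_factors, *self.risk_fc1))
        r = relu(linear(r, *self.risk_fc2))
        fused = list(image_features) + r  # concatenate image and risk-factor features
        hidden = relu(linear(fused, *self.fuse_fc))
        return softmax(linear(hidden, *self.out_fc))  # [p(no progression), p(progression)]
```

The design point is that the concatenated vector lets a single classifier learn interactions between imaging features and risk factors, instead of combining two separately produced probabilities as in the first approach.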
Figure 2:
Illustration of the architecture of the combined joint training model for predicting the progression of radiographic joint space loss. The proposed model consisted of two separate convolutional neural networks connected in a cascaded fashion to create a fully-automated pipeline. The combined joint training model was created using YOLO and DenseNet to extract DL information from baseline knee X-rays as a feature vector, which was further concatenated with the normalized demographic and radiographic risk factor data vector. BN: batch normalization, Conv2D: 2D convolution, ReLU: rectified linear activation, 2D: two-dimensional.
Model Training and Evaluation
Training and evaluation of the OA risk assessment models were performed on a desktop computer running a 64-bit Linux operating system (Ubuntu 16.04) with an Intel i7 7700k quad-core CPU with 32 GB DDR3 RAM and two Nvidia GTX 1080-Ti graphics cards, each with 3584 CUDA cores and 11GB GDDR5X RAM. A detailed description of the training and evaluation methods used for each model is provided in the Supplemental Material.
A total of 1950 of the 4447 knees from the 2300 subjects in the OAI database eligible to participate in our study were randomly selected for model training and evaluation, with the number chosen based upon limitations in computational efficiency and capacity. Knees with and without progression of radiographic joint space loss were randomly stratified into three non-overlapping datasets for training, validation, and hold-out testing. The randomization process was performed using a random data generator API (tf.data) in TensorFlow (version 1.08, Google, Mountain View, CA), which used a non-deterministic random shuffled order method to select knees for the training, validation, and hold-out testing datasets. The training dataset consisted of 1400 knees (735 knees without and 665 knees with progression of radiographic joint space loss), which was used to iteratively optimize model parameters. The validation dataset consisted of 150 knees (76 knees without and 74 knees with progression of radiographic joint space loss), which was used to select the optimal model during the training process. The hold-out testing dataset consisted of 400 knees (200 knees without and 200 knees with progression of radiographic joint space loss), which was used for final evaluation of the optimal model to guard against over-fitting and to assess whether learned features generalized to new data. Since the outcome measure for the OA risk assessment models was the progression of medial joint space loss, the baseline X-rays of all knees with a KL grade of 3 were reviewed by a fellowship-trained musculoskeletal radiologist with 17 years of clinical experience to ensure that only knees with joint space loss more advanced in the medial than the lateral compartment were included. The distribution of demographic and radiographic risk factors for knees in the training, validation, and hold-out testing datasets is provided in the Supplemental Material.
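The stratified split described above can be sketched as follows: knees are grouped by outcome, each group is shuffled independently, and fixed per-class counts are dealt out to non-overlapping training, validation, and hold-out testing sets. The counts are those reported in the text; the function name and the use of Python's `random` module (rather than the tf.data shuffling the study used) are assumptions for illustration. Passing `seed=None` gives a non-deterministic shuffle as in the text, while a fixed seed makes the split reproducible.

```python
import random

# Per-class counts from the text: (without progression, with progression).
SPLIT_COUNTS = {"train": (735, 665), "validation": (76, 74), "test": (200, 200)}


def stratified_split(knee_ids, labels, counts=SPLIT_COUNTS, seed=None):
    """Shuffle each outcome class separately and deal out non-overlapping subsets."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for kid, y in zip(knee_ids, labels):
        by_class[y].append(kid)
    for ids in by_class.values():
        rng.shuffle(ids)

    splits, offset = {}, {0: 0, 1: 0}
    for name, (n_neg, n_pos) in counts.items():
        subset = []
        for cls, n in ((0, n_neg), (1, n_pos)):
            if offset[cls] + n > len(by_class[cls]):
                raise ValueError(f"not enough class-{cls} knees for {name}")
            subset += by_class[cls][offset[cls]:offset[cls] + n]
            offset[cls] += n
        splits[name] = subset
    return splits
```

With 1011 non-progressing and 939 progressing knees (the totals implied by the reported counts), every knee is assigned to exactly one of the three datasets.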
Statistical Analysis
Statistical analysis was performed using MATLAB (version 2019a, MathWorks, Natick, MA) and MedCalc (version 14.8; MedCalc Software, Ostend, Belgium). All analyzed data consisted of statistically independent observations. Statistical significance was defined as a p-value less than 0.05.
Receiver operating characteristic (ROC) analysis with areas under the curve (AUCs) was used to determine the diagnostic performance of the traditional risk assessment models, DL model, combined logistic regression model, and combined joint training model for predicting the progression of radiographic joint space loss for knees in the hold-out testing dataset. Two-sided exact binomial tests were used to calculate 95% confidence intervals. Sensitivity and specificity were also determined for the best traditional model, DL model, combined logistic regression model, and combined joint training model 26,48. The Youden index was used to determine optimal model sensitivity and specificity. AUCs of the best traditional risk assessment model, DL model, combined logistic regression model, and combined joint training model were compared using a nonparametric approach 49.
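The evaluation quantities above can be sketched without external libraries: the AUC as the probability that a randomly chosen progressing knee receives a higher score than a randomly chosen non-progressing knee (the Mann-Whitney form), the operating threshold that maximizes the Youden index J = sensitivity + specificity − 1, and an exact (Clopper-Pearson) two-sided confidence interval for a proportion such as sensitivity. This is a minimal illustration, not the MATLAB or MedCalc implementations used in the study, and the nonparametric comparison of correlated AUCs (reference 49) is not reproduced here.

```python
import math


def roc_auc(scores, labels):
    """AUC = P(score_pos > score_neg), counting ties as 1/2 (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):  # predict progression when score >= t
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (t, sens, spec)
    return best


def _binom_cdf(k, n, p):
    # Direct summation; adequate for n of a few hundred (float overflow beyond ~1000).
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05, iters=60):
    """Exact two-sided CI for k/n, solving the binomial tail equations by bisection."""
    def solve(f):  # f is increasing in p on [0, 1]
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: (1 - _binom_cdf(k - 1, n, p)) - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: alpha / 2 - _binom_cdf(k, n, p))
    return lower, upper


if __name__ == "__main__":
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
    print(clopper_pearson(156, 200))  # 95% CI for 156/200 = 78.0%
```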
RESULTS
Table I compares the distribution of baseline KL grades in knees without and with progression of radiographic joint space loss in the training, validation, and testing datasets. The distribution of baseline KL grades for knees without and with progression of radiographic joint space loss was similar for the training, validation, and testing datasets. For all datasets, there were more knees without progression of radiographic joint space loss that had a baseline KL grade of 0 and more knees with progression of radiographic joint space loss that had baseline KL grades of 2 and 3.
Table I:
Distribution of baseline KL grades for all knees, knees without progression of radiographic joint space loss, and knees with progression of radiographic joint space loss in the training, validation, and hold-out testing datasets.
Knees in Training Dataset | | | |
KL Grade | All Knees (N=1400) | Knees Without Progression (N=735) | Knees With Progression (N=665) |
0 | 581 | 435 | 146 |
1 | 218 | 115 | 103 |
2 | 338 | 95 | 243 |
3 | 263 | 90 | 173 |
Knees in Validation Dataset | | | |
KL Grade | All Knees (N=150) | Knees Without Progression (N=76) | Knees With Progression (N=74) |
0 | 63 | 45 | 18 |
1 | 19 | 10 | 9 |
2 | 38 | 10 | 28 |
3 | 30 | 11 | 19 |
Knees in Hold-Out Testing Dataset | | | |
KL Grade | All Knees (N=400) | Knees Without Progression (N=200) | Knees With Progression (N=200) |
0 | 170 | 126 | 44 |
1 | 46 | 24 | 22 |
2 | 102 | 26 | 76 |
3 | 82 | 24 | 58 |
KL: Kellgren-Lawrence
The ANN model was the traditional OA risk assessment model with the highest diagnostic performance. The AUCs for predicting the progression of radiographic joint space loss for all knees were 0.660 (95% confidence interval, 0.620–0.714) for the ANN model, 0.637 (95% confidence interval, 0.588–0.684) for the ANN 2 model, 0.621 (95% confidence interval, 0.571–0.668) for the ANN 3 model, 0.590 (95% confidence interval, 0.540–0.639) for the random forest model, and 0.572 (95% confidence interval, 0.522–0.621) for the logistic regression model.
Table II shows the sensitivity, specificity, and AUCs of the OA risk assessment models for predicting the progression of radiographic joint space loss in knees in the hold-out testing dataset, with the ROC curves shown in Figures 3 and 4. The traditional ANN model had the lowest diagnostic performance, with an AUC of 0.660 (61.5% sensitivity and 64.0% specificity) for all knees, 0.639 (64.9% sensitivity and 68.0% specificity) for KL grade 0 and 1 knees, and 0.681 (64.9% sensitivity and 68.0% specificity) for KL grade 2 and 3 knees. The combined joint training model had the highest diagnostic performance, with an AUC of 0.863 (80.5% sensitivity and specificity) for all knees, 0.882 (90.4% sensitivity and 71.3% specificity) for KL grade 0 and 1 knees, and 0.857 (79.9% sensitivity and 84.1% specificity) for KL grade 2 and 3 knees. Figure 5 shows saliency maps for baseline knee X-rays without and with progression of radiographic joint space loss evaluated by the combined joint training model. The discriminative high activation regions on the X-rays on which the classification CNN based its interpretation were centered on the joint space and surrounding bone.
Table II:
Sensitivity, specificity, and AUCs for the OA risk assessment models for predicting the progression of radiographic joint space loss in knees in the hold-out testing dataset.
All Knees with Baseline KL Grades 0, 1, 2, and 3 (50.0% of Knees with Progression of Radiographic Joint Space Loss) | | | |
Models | Sensitivity (95% CI) | Specificity (95% CI) | AUC (95% CI) |
Traditional ANN Model | 61.5% (54.4% – 68.3%) | 64.0% (56.9% – 70.6%) | 0.660 (0.611 – 0.706) |
DL Model | 78.0% (71.6% – 83.5%) | 75.5% (68.9% – 81.3%) | 0.799 (0.756 – 0.837) |
Combined Logistic Regression Model | 76.5% (70.0% – 82.2%) | 76.5% (70.0% – 82.2%) | 0.823 (0.781 – 0.859) |
Combined Joint Training Model | 80.5% (74.3% – 85.8%) | 80.5% (74.3% – 85.8%) | 0.863 (0.825 – 0.895) |
Knees with Baseline KL Grades 0 and 1 (30.6% of Knees with Progression of Radiographic Joint Space Loss) | | | |
Models | Sensitivity (95% CI) | Specificity (95% CI) | AUC (95% CI) |
Traditional ANN Model | 64.9% (56.2% – 73.0%) | 68.0% (53.3% – 80.5%) | 0.639 (0.572 – 0.704) |
DL Model | 80.3% (68.7% – 89.1%) | 73.3% (65.5% – 80.2%) | 0.787 (0.726 – 0.840) |
Combined Logistic Regression Model | 78.8% (67.0% – 87.9%) | 70.7% (62.7% – 77.8%) | 0.824 (0.767 – 0.873) |
Combined Joint Training Model | 90.4% (83.2% – 97.5%) | 71.3% (63.4% – 78.4%) | 0.882 (0.831 – 0.922) |
Knees with Baseline KL Grades 2 and 3 (72.8% of Knees with Progression of Radiographic Joint Space Loss) | | | |
Models | Sensitivity (95% CI) | Specificity (95% CI) | AUC (95% CI) |
Traditional ANN Model | 64.9% (56.2% – 73.0%) | 68.0% (53.3% – 80.5%) | 0.681 (0.608 – 0.748) |
DL Model | 76.1% (68.0% – 83.1%) | 84.0% (70.9% – 92.8%) | 0.822 (0.759 – 0.875) |
Combined Logistic Regression Model | 91.0% (84.9% – 95.3%) | 64.1% (49.2% – 77.1%) | 0.833 (0.771 – 0.884) |
Combined Joint Training Model | 79.9% (72.1% – 86.3%) | 84.1% (70.9% – 92.8%) | 0.857 (0.798 – 0.904) |
KL: Kellgren-Lawrence
Figure 3:
Receiver operating characteristic (ROC) curves showing the diagnostic performance of the OA risk assessment models for predicting the progression of radiographic joint space loss for knees with all baseline KL grades in the hold-out testing dataset.
Figure 4:
Receiver operating characteristic (ROC) curves showing the diagnostic performance of the OA risk assessment models for predicting the progression of radiographic joint space loss for knees in the hold-out testing dataset (a) without radiographic OA (baseline KL grades of 0 and 1) and (b) with radiographic OA (baseline KL grades of 2 and 3).
Figure 5:
Saliency maps for baseline knee X-rays in the hold-out testing dataset (a) without progression of radiographic joint space loss and (b) with progression of radiographic joint space loss evaluated by the combined joint training model. Note that the discriminative high activation regions on the X-rays on which the classification CNN based its interpretation were centered on the joint space and surrounding bone (color regions).
DL analysis of baseline knee X-rays improved the diagnostic performance for predicting the progression of radiographic joint space loss when compared to the traditional ANN model using demographic and radiographic risk factors. The DL model, combined logistic regression model, and combined joint training model had significantly higher AUCs than the traditional ANN model for all knees (p<0.001), KL grade 0 and 1 knees (p<0.001), and KL grade 2 and 3 knees (p=0.001–0.010). The combined joint training model had a significantly higher AUC than the DL model for all knees (p=0.015) and KL grade 0 and 1 knees (p=0.006) but not for KL grade 2 and 3 knees (p=0.170). There were no significant differences in AUCs between the combined logistic regression model and the DL model (p=0.415–0.469) or between the combined logistic regression model and the combined joint training model (p=0.183–0.531).
DISCUSSION
In our study, DL models were found to have significantly higher (p<0.001) diagnostic performance for predicting the progression of radiographic medial joint space loss when compared to the traditional ANN model using demographic and radiographic risk factors. The combined joint training model had the highest overall diagnostic performance, with an AUC of 0.863 for predicting the progression of radiographic joint space loss for all knees. The diagnostic performance of the combined joint training model compares favorably to other risk assessment models for knee OA reported in the literature, which have had AUCs ranging from 0.70 to 0.82 for predicting OA incidence 14–17,23,24,50 and from 0.71 to 0.79 for predicting OA progression 15,19–22.
The combined joint training model had an AUC of 0.857 for predicting the progression of radiographic joint space loss for KL grade 2 and 3 knees in our study, which was significantly higher (p<0.001) than the AUC of 0.681 for the traditional model. Previous studies have also shown the benefits of analyzing baseline imaging studies in risk assessment models for predicting the progression of knee OA. LaValley et al showed that incorporating quantitative tibial subchondral bone mineral density measures on baseline dual-energy X-ray absorptiometry (DXA) could significantly increase (p<0.05) the AUC of a traditional OA risk assessment model from 0.65 to 0.73 19. Studies by Janvier et al 21 and Kraus et al 22 found that incorporating quantitative subchondral tibial bone texture measures on baseline knee X-rays could significantly increase (p<0.05) the AUC of a traditional OA risk assessment model. In these studies, the AUCs for the traditional models ranged between 0.57 and 0.71, while the AUCs of the models combining demographic and radiographic risk factors with subchondral tibial bone texture measures ranged between 0.77 and 0.79.
The combined joint training model had an AUC of 0.882 for predicting the progression of radiographic joint space loss for KL grade 0 and 1 knees in our study, which was significantly higher (p<0.001) than the AUC of 0.639 for the traditional model. Previous studies have also shown the benefits of analyzing baseline imaging studies in risk assessment models for predicting the incidence of knee OA. Janvier et al showed that incorporating quantitative subchondral tibial bone texture measures on baseline knee X-rays significantly increased (p<0.05) the AUC of a traditional OA risk assessment model from 0.57 to 0.73 23. Joseph et al found that incorporating semi-quantitative measures of meniscal tears and cartilage lesions and quantitative measures of cartilage T2 relaxation time on baseline MR images significantly increased (p<0.05) the AUC of a traditional OA risk assessment model from 0.67 to 0.73 24. Unlike these previous studies, our study did not define the incidence of knee OA as the development of definitive osteophytes on knee X-rays. However, a study by Ratzlaff et al showed that the mean decrease in minimum medial joint space width in knees that transitioned from a KL grade of 0 or 1 at baseline to a KL grade of 2 at follow-up ranged between 0.18mm and 0.28mm 51. Thus, it is likely that most KL grade 0 and 1 knees in our study that showed a decrease of 0.7mm or more in minimum medial joint space width over time also demonstrated the formation of definitive osteophytes on knee X-rays. Even if this was not the case, there is still a benefit of using knees without radiographic OA to investigate the importance of risk factors for OA progression, as doing so avoids collider bias 52.
Previous studies have clearly shown the benefits of incorporating quantitative and semi-quantitative measures of knee joint pathology on baseline imaging studies in OA risk assessment models 19,21–24. However, obtaining quantitative parameters typically requires segmenting joint structures, identifying specific features that warrant investigation based on a priori knowledge, and then extracting the features from the image datasets. Obtaining semi-quantitative parameters requires assessment of each individual structural feature on the imaging studies using a categorical scoring system. Acquiring quantitative and semi-quantitative imaging parameters is time-consuming and reader-dependent and thus would be difficult to incorporate into widespread, cost-effective OA risk assessment models. One distinct advantage of the DL models developed in our study is that they can automatically learn a representative subset of features on baseline imaging studies associated with OA incidence and progression. The fully automated DL models could be widely applied in clinical practice and clinical drug trials to rapidly predict the progression of radiographic joint space loss using readily obtainable demographic and radiographic risk factors and baseline knee X-rays.
The architecture of the DL models provided high diagnostic performance for predicting the progression of radiographic joint space loss despite a relatively small training dataset. The DenseNet classification CNN used in our study has dense connectivity, in which each layer within a dense block receives the feature maps of all preceding layers; this allows direct propagation of information throughout the network and thereby reduces the number of parameters needed to create prediction models 46. DenseNet also allows the creation of saliency maps that can be used to determine whether the regions of high activation on which the classification CNN based its interpretation are located in reasonable areas of the X-ray, such as along the joint space or in regions of osteophyte formation. The weights of DenseNet in our study were also initialized with weights pre-trained on ImageNet 53 to increase training efficiency. Finally, combined joint training was used to maximize the diagnostic performance of the DL models. Combined joint training allows the models to extract and analyze the demographic and radiographic risk factors and the DL analysis of baseline knee X-rays together to achieve the best prediction performance. The same joint training approach could be used in future studies to further improve diagnostic performance by incorporating DL analysis of baseline MR images in risk assessment models for the incidence and progression of knee OA.
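The dense connectivity pattern mentioned above can be illustrated abstractly: within a dense block, each layer receives the concatenation of the block input and the outputs of every earlier layer, and contributes a fixed number of new channels (the growth rate, k). After l layers the block therefore carries k0 + l·k channels, where k0 is the number of input channels. In the sketch below each "channel" is a single number standing in for a feature map, and the layer functions, k0, and k are hypothetical; the growth rate used in the study is not stated in the text.

```python
def dense_block(x_channels, layer_fns):
    """Apply layers so each one sees the concatenation of all previous feature sets."""
    features = [list(x_channels)]
    for fn in layer_fns:
        concat = [c for f in features for c in f]  # channel-wise concatenation
        features.append(fn(concat))  # each layer adds growth_rate new channels
    return [c for f in features for c in f]


def dense_block_channels(k0, growth_rate, num_layers):
    """Channel count after 0..num_layers layers: k0 + l * growth_rate."""
    return [k0 + layer * growth_rate for layer in range(num_layers + 1)]


if __name__ == "__main__":
    # Toy layers with growth rate 2: each emits two copies of the sum of its inputs.
    layers = [lambda c: [sum(c)] * 2 for _ in range(4)]
    out = dense_block([1.0, 2.0, 3.0], layers)
    print(len(out), dense_block_channels(3, 2, 4))  # 11 [3, 5, 7, 9, 11]
```

Because every layer reuses all earlier feature maps instead of relearning them, each layer can be narrow (a small growth rate), which is the source of the parameter efficiency described in the paragraph above.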
Our study has several limitations. First, the training and testing datasets did not include healthy knees from the OAI reference control cohort, whose subjects had no knee pain or risk factors for OA. Second, the diagnostic performance of the OA risk assessment models was evaluated only on a hold-out testing dataset drawn from the OAI database. Third, the OA risk assessment models were developed and evaluated only for predicting the progression of radiographic joint space loss in the medial compartment over a 48-month follow-up period. Fourth, only 1950 knees from the OAI database were used for model training and evaluation because of limitations in computational efficiency and capacity. Finally, the DL models provide no mechanistic information regarding the factors responsible for the progression of radiographic joint space loss.
In conclusion, our study has demonstrated the feasibility of using DL risk assessment models to predict the progression of radiographic medial joint space loss over a 48-month follow-up period using baseline knee X-rays. The DL models had significantly higher (p<0.001) diagnostic performance for predicting the progression of radiographic joint space loss than the traditional model using demographic and radiographic risk factors. However, the DL risk assessment models require further validation in different subject populations. Future work is also needed to develop more comprehensive risk assessment models incorporating DL analysis of baseline MR images for predicting the incidence and progression of knee OA.
ROLE OF FUNDING SOURCE
Research support was provided by National Institutes of Health grants P41 EB022544, R01 AR068373–01, R01 EB027087, and K24 AR070892.
COMPETING INTERESTS
Ali Guermazi is the President of Boston Imaging Core Lab and a consultant for GE Healthcare, Merck Serono, Tissue Gene, Roche, Galapagos, AstraZeneca, and Pfizer. Richard Kijowski is a consultant for Boston Imaging Core Lab and GE Healthcare. No other authors have competing interests to disclose.
SUPPLEMENTAL MATERIAL
Details of Model Training and Evaluation
Training and evaluation of the OA risk assessment models were performed on a desktop computer running a 64-bit Linux operating system with an Intel i7 7700k quad-core CPU, 32 GB of DDR3 RAM, and two Nvidia GTX 1080-Ti graphics cards, each with 3584 CUDA cores and 11 GB of GDDR5X RAM. Knees with and without progression of radiographic joint space loss were randomly stratified into three non-overlapping datasets for training, validation, and hold-out testing. Several approaches were used during training to reduce the risk of overfitting. For the traditional models, which were trained from scratch, relatively shallow networks were used so that the number of trainable parameters remained smaller than the number of training samples. For the DL models, data augmentation and fine-tuning of a model pre-trained on ImageNet 53 were used to increase training efficiency. The training process was also closely monitored to ensure that validation accuracy did not fall below training accuracy, since such a gap is an important indicator of overfitting. Finally, the models were evaluated on a hold-out testing dataset that did not include any knees used during training and validation.
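Random stratification into non-overlapping datasets can be sketched in plain Python. The per-class counts below are those reported for the training, validation, and hold-out testing datasets in this supplement; the function name, the label coding (1 = progression, 0 = no progression), and the fixed random seed are assumptions for illustration.

```python
import random

def stratified_split(labels, counts, seed=0):
    """Assign sample indices to non-overlapping subsets with fixed per-class counts.

    labels: sequence of 0/1 labels (assumed: 1 = progression, 0 = no progression)
    counts: dict mapping subset name -> (n_without, n_with)
    Returns a dict mapping subset name -> list of indices.
    """
    rng = random.Random(seed)
    pools = {c: [i for i, y in enumerate(labels) if y == c] for c in (0, 1)}
    for pool in pools.values():
        rng.shuffle(pool)  # random order within each class
    cursor = {0: 0, 1: 0}
    subsets = {}
    for name, (n_without, n_with) in counts.items():
        picked = []
        for cls, n in ((0, n_without), (1, n_with)):
            start = cursor[cls]
            if start + n > len(pools[cls]):
                raise ValueError(f"not enough class-{cls} samples for {name}")
            picked.extend(pools[cls][start:start + n])
            cursor[cls] = start + n  # advance so that subsets never overlap
        subsets[name] = picked
    return subsets

labels = [0] * 1011 + [1] * 939  # 1950 knees in total
split = stratified_split(labels, {
    "train": (735, 665), "validation": (76, 74), "test": (200, 200)})
print({k: len(v) for k, v in split.items()})
# {'train': 1400, 'validation': 150, 'test': 400}
```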
The average training time for the OA risk assessment models using the training datasets was 20 minutes, 13 hours, 13.5 hours, and 13.8 hours for the traditional ANN model, DL model, combined logistic regression model, and combined joint training model, respectively. However, once training was completed, the average time for the models to provide a probability score for the progression of radiographic joint space loss using the hold-out testing datasets was 0.5 seconds, 6.7 seconds, 7.1 seconds, and 8.9 seconds for the traditional ANN model, DL model, combined logistic regression model, and combined joint training model, respectively.
Traditional Risk Assessment Models
Traditional risk assessment models were developed using five alternative approaches: random forest 36, logistic regression 37, and three different ANNs 14–16,24,38,39. The inputs to the models were the demographic and radiographic risk factors, with continuous variables normalized by their means and standard deviations. The Machine Learning Toolbox in MATLAB (version 2019a, MathWorks, Natick, MA) was used to train the random forest and logistic regression models. For the random forest model, an ensemble of 200 decision trees was used to estimate confidence values, with early stopping to avoid overfitting. The logistic regression model was trained using standard methodology. The ANN models were trained using the backpropagation algorithm, with all training data processed simultaneously in each update. The weights of the models were randomly initialized and updated using gradient descent with a learning rate of 0.01 for a total of 50 epochs on the training datasets.
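A minimal NumPy sketch of one such ANN, reading "all input data trained simultaneously" as full-batch gradient descent. It follows the reported settings (z-score normalization of continuous inputs, random weight initialization, learning rate 0.01, 50 epochs); the single hidden layer, its size, the tanh activation, and the initialization scale are assumptions not given in the text.

```python
import numpy as np

def zscore(x, mean=None, std=None):
    """Normalize columns by mean and standard deviation; reuse training statistics for new data."""
    mean = x.mean(axis=0) if mean is None else mean
    std = x.std(axis=0) if std is None else std
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero for constant columns
    return (x - mean) / safe_std, mean, std

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ann(x, y, hidden=8, lr=0.01, epochs=50, seed=0):
    """One-hidden-layer network trained by full-batch backpropagation (hidden size assumed)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)               # hidden activations, shape (n, hidden)
        p = sigmoid(h @ w2 + b2)               # predicted probability of progression
        g = (p - y) / n                        # gradient of mean cross-entropy w.r.t. the logit
        grad_w2 = h.T @ g
        grad_b2 = g.sum()
        gh = np.outer(g, w2) * (1.0 - h ** 2)  # backpropagate through tanh
        grad_w1 = x.T @ gh
        grad_b1 = gh.sum(axis=0)
        w1 -= lr * grad_w1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2

    def predict(x_new):
        return sigmoid(np.tanh(x_new @ w1 + b1) @ w2 + b2)

    return predict

rng = np.random.default_rng(1)
raw = rng.normal(50.0, 10.0, size=(100, 4))  # synthetic stand-in for continuous risk factors
labels = (raw[:, 0] > 50.0).astype(float)
xn, mu, sd = zscore(raw)
predict = train_ann(xn, labels)
probs = predict(xn)
```

New data (for example, the hold-out set) would be normalized with the training statistics, `zscore(x_test, mu, sd)`, so that no information leaks from the test set.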
DL Risk Assessment Model
The joint cropping CNN and the classification CNN were trained separately. However, once training was completed, the DL risk assessment model operated as a fully automated processing pipeline. Several image augmentation methods, including image translation, rotation, and flipping, were used during model training to increase training efficiency 27,46,54.
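The three augmentation operations can be illustrated with NumPy alone. This is a sketch under stated assumptions: the flip probability, shift range, rotation range, zero fill, and nearest-neighbour resampling are illustrative choices, since the text does not report augmentation parameters.

```python
import numpy as np

def translate(img, dy, dx):
    """Shift a 2D image by integer pixels, filling uncovered pixels with zeros."""
    h, w = img.shape
    out = np.zeros_like(img)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # shifted entirely out of frame
    ys, yd = (slice(0, h - dy), slice(dy, h)) if dy >= 0 else (slice(-dy, h), slice(0, h + dy))
    xs, xd = (slice(0, w - dx), slice(dx, w)) if dx >= 0 else (slice(-dx, w), slice(0, w + dx))
    out[yd, xd] = img[ys, xs]
    return out

def rotate(img, degrees):
    """Rotate a 2D image about its centre using nearest-neighbour inverse mapping."""
    h, w = img.shape
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # Map every output pixel back to the source pixel it comes from.
    sy = np.cos(theta) * (yy - cy) + np.sin(theta) * (xx - cx) + cy
    sx = -np.sin(theta) * (yy - cy) + np.cos(theta) * (xx - cx) + cx
    sy = np.rint(sy).astype(int)
    sx = np.rint(sx).astype(int)
    valid = (sy >= 0) & (sy < h) & (sx >= 0) & (sx < w)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out

def augment(img, rng, max_shift=10, max_angle=10.0):
    """Random horizontal flip, translation, and small rotation (ranges are assumptions)."""
    if rng.random() < 0.5:
        img = np.fliplr(img)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    img = translate(img, int(dy), int(dx))
    return rotate(img, rng.uniform(-max_angle, max_angle))

sample = augment(np.ones((32, 32)), np.random.default_rng(0))
```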
The reference standard for training the joint cropping CNN was a manual delineation of each individual knee joint on the baseline bilateral standing posterior-anterior knee X-rays, performed by a fellowship-trained musculoskeletal radiologist with 17 years of clinical experience. The radiologist placed two square boxes on each baseline X-ray, one isolating each knee joint. The x-y coordinates of the top-left and bottom-right corners of each box were recorded and used for network training, and the CNN was trained to automatically output the coordinates of the boxes outlining each knee joint. The inputs to the joint cropping CNN were the baseline knee X-rays in DICOM format, which were resized to a 448×448 matrix, normalized by the means and standard deviations of images in the ImageNet training dataset 53, and converted to NumPy arrays, the array format of NumPy, the core library for scientific computing in Python. The outputs were two square boxes, one around each knee joint, which were then doubled in area to compensate for potential localization errors. The model was trained with a batch size of 32 images using multi-class cross-entropy loss 25. The weights of the model were initialized with a model pre-trained on ImageNet 53 and updated using stochastic gradient descent 25 with a fixed learning rate of 0.0001 and momentum of 0.9 for a total of 100 epochs on the training dataset. To assess the accuracy of box placement, the mean average precision (mAP) was calculated from the Intersection over Union (IoU) between the reference boxes placed by the radiologist and both the initial and the expanded boxes produced by the joint cropping CNN during the training process.
Across all training cases, the mAP was 90.0 for the initial boxes and 98.4 for the expanded boxes used as input to the classification CNN, indicating very high localization accuracy.
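The box geometry behind these numbers can be sketched in plain Python: IoU between two boxes given by their top-left and bottom-right corners, and expansion of a box about its centre so that its area doubles, which scales each side by √2. The function names and the optional clipping to the image border are assumptions; computing mAP itself would additionally require an IoU threshold and confidence ranking, which the text does not report.

```python
import math

def box_area(box):
    """Area of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes (top-left, bottom-right corners)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0

def expand_box(box, area_factor=2.0, width=None, height=None):
    """Scale a box about its centre so its area grows by area_factor; optionally clip to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    scale = math.sqrt(area_factor)  # doubling the area scales each side by sqrt(2)
    half_w, half_h = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
    out = (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
    if width is not None and height is not None:
        out = (max(0.0, out[0]), max(0.0, out[1]),
               min(float(width), out[2]), min(float(height), out[3]))
    return out

box = (100, 100, 200, 200)
big = expand_box(box)
print(round(box_area(big)))     # 20000
print(round(iou(box, big), 3))  # 0.5
```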
The reference standard for training the classification CNN was a binary label (0 or 1) indicating whether or not the knee showed progression of radiographic joint space loss. The inputs to the classification CNN were the cropped images of each individual knee joint produced by the joint cropping CNN, normalized by the means and standard deviations of images in the ImageNet training dataset 53. To increase training efficiency, the network weights were initialized with a model pre-trained on ImageNet 53 and then retrained on the training dataset using a memory-efficient training strategy 27,46,55. During fine-tuning, the training process was closely monitored to ensure that validation accuracy did not fall below training accuracy, since such a gap is an important indicator of overfitting. Use of a pre-trained model required the input images to the classification CNN to have the same 224×224 matrix size as the images in ImageNet 53. The model was trained using an adaptive gradient-based optimization algorithm 56 implemented in TensorFlow (version 1.08, Google, Mountain View, CA) with an initial learning rate of 0.001, image normalization by means and standard deviations, a cross-entropy loss function, three dense blocks, a dense block growth rate of 12, a dense block depth of 6, and 50 epochs. These training parameters were selected either by a grid hyperparameter search based on model performance on the validation dataset or by reference to previous implementations that showed favorable performance 46.
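The reported dense block settings determine how feature width grows through the network. Within a dense block, layer l receives the block input plus the k feature maps produced by every earlier layer, so its input width is k0 + (l - 1)k, and the block outputs k0 + Lk channels for a depth of L. The sketch below computes these widths for a growth rate of 12, a depth of 6, and three dense blocks; the initial width of 24 and the transition-layer compression factor are assumptions not given in the text.

```python
def dense_block_channels(in_channels, growth_rate=12, depth=6):
    """Input width seen by each layer of a dense block, and the block's output width.

    Layer l (1-indexed) sees the block input concatenated with the growth_rate
    feature maps from each of the l - 1 preceding layers.
    """
    per_layer = [in_channels + (layer - 1) * growth_rate for layer in range(1, depth + 1)]
    return per_layer, in_channels + depth * growth_rate

def network_channels(initial=24, blocks=3, growth_rate=12, depth=6, compression=1.0):
    """Output width of each dense block; transition layers scale width by `compression` (assumed)."""
    widths = []
    channels = initial
    for block in range(blocks):
        _, channels = dense_block_channels(channels, growth_rate, depth)
        widths.append(channels)
        if block < blocks - 1:
            channels = int(channels * compression)  # transition layer between blocks
    return widths

print(dense_block_channels(24))          # ([24, 36, 48, 60, 72, 84], 96)
print(network_channels())                # [96, 168, 240]
print(network_channels(compression=0.5)) # [96, 120, 132]
```

Because each layer adds only 12 new feature maps while reusing all earlier ones, individual layers stay narrow, which is the source of the parameter efficiency attributed to DenseNet.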
Combined Traditional and DL Risk Assessment Models
The reference standard for training the combined logistic regression model was a binary label (0 or 1) indicating whether or not the knee showed progression of radiographic joint space loss. The inputs to the model were the confidence values for progression of radiographic joint space loss provided by the best traditional and DL models. The weights of the model were randomly initialized, and the model was trained using gradient descent with a learning rate of 0.001 for a total of 50 epochs on the training dataset.
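A minimal NumPy sketch of the combined logistic regression, taking the two confidence values as its only inputs. The defaults mirror the reported learning rate (0.001) and number of epochs (50); the function names, the initialization scale, and the use of mean binary cross-entropy as the loss are assumptions.

```python
import numpy as np

def fit_combined_logistic(conf_trad, conf_dl, y, lr=0.001, epochs=50, seed=0):
    """Logistic regression on [traditional confidence, DL confidence], full-batch gradient descent."""
    x = np.column_stack([conf_trad, conf_dl])
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.01, 2)  # random initialization (scale is an assumption)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        g = (p - y) / len(y)      # gradient of mean binary cross-entropy w.r.t. the logit
        w -= lr * (x.T @ g)
        b -= lr * g.sum()
    return w, b

def predict_combined(conf_trad, conf_dl, w, b):
    """Combined probability of progression from the two model confidences."""
    x = np.column_stack([conf_trad, conf_dl])
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Synthetic confidences: the DL score is informative, the traditional score is not.
y = np.array([0, 1] * 50)
conf_dl = 0.1 + 0.8 * y
conf_trad = np.full(100, 0.5)
w, b = fit_combined_logistic(conf_trad, conf_dl, y, lr=1.0, epochs=2000)
probs = predict_combined(conf_trad, conf_dl, w, b)
```

The toy call uses a larger learning rate and more epochs than reported only so that the effect of training is visible on synthetic data.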
The reference standard for training the combined joint training model was a binary label (0 or 1) indicating whether or not the knee showed progression of radiographic joint space loss. The combined joint training model contained three components: a feature extractor for DL analysis of baseline knee X-rays, a feature extractor for demographic and radiographic risk factor data, and a fully connected network to combine the information. The feature extractor for baseline knee X-rays used the same architecture as the DL risk assessment model, and the feature extractor for risk factor data was a two-layer fully connected network. The outputs of the two feature extractors were concatenated into a single vector, which was used as the input to another fully connected network for joint model training. For the X-ray feature extractor, YOLO 45 and DenseNet 46 were initialized with models pre-trained on ImageNet 53. For the risk factor feature extractor and the joint training network, the weights of the fully connected layers were randomly initialized. The model was trained with a batch size of 32 image patches, and its weights were updated using an adaptive gradient-based optimization algorithm 56 with an initial learning rate of 0.001 for a total of 100 epochs on the training dataset. Model training was a complete joint training process in which the weights of the CNNs and fully connected layers were updated simultaneously.
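The fusion step can be sketched as a NumPy forward pass. The pooled X-ray feature vector is taken as given (in the described model it comes from the YOLO and DenseNet pipeline), the risk factors pass through a two-layer fully connected extractor, and the two are concatenated before a fully connected head. All layer widths, the ReLU activations, the He-style initialization, and the seven-factor input are illustrative assumptions; in the described model every weight, including those of the CNNs, is updated together during training, which this forward-only sketch does not show.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_layer(n_in, n_out):
    """Randomly initialized fully connected layer as (weights, bias)."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

def linear(layer, x):
    w, b = layer
    return x @ w + b

def relu(z):
    return np.maximum(z, 0.0)

IMG_FEATURES = 240  # width of the pooled CNN feature vector (illustrative)
RISK_FACTORS = 7    # number of risk-factor inputs (illustrative)
HIDDEN = 32         # hidden width of the fully connected layers (illustrative)

risk_fc1 = dense_layer(RISK_FACTORS, HIDDEN)          # risk-factor extractor, layer 1
risk_fc2 = dense_layer(HIDDEN, HIDDEN)                # risk-factor extractor, layer 2
head_fc1 = dense_layer(IMG_FEATURES + HIDDEN, HIDDEN) # fusion network, layer 1
head_fc2 = dense_layer(HIDDEN, 1)                     # fusion network, output layer

def joint_forward(img_features, risk_factors):
    """img_features: (n, IMG_FEATURES); risk_factors: (n, RISK_FACTORS) normalized values."""
    r = relu(linear(risk_fc1, risk_factors))
    r = relu(linear(risk_fc2, r))
    z = np.concatenate([img_features, r], axis=1)  # joint feature vector
    z = relu(linear(head_fc1, z))
    logit = linear(head_fc2, z)[:, 0]
    return 1.0 / (1.0 + np.exp(-logit))            # probability of progression

probs = joint_forward(rng.normal(size=(4, IMG_FEATURES)), rng.normal(size=(4, RISK_FACTORS)))
```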
Details of the Training, Validation, and Hold-Out Testing Datasets
Knees with and without progression of radiographic joint space loss were randomly stratified into three non-overlapping datasets for training, validation, and hold-out testing. The training dataset consisted of 1400 knees (735 knees without and 665 knees with progression of radiographic joint space loss), the validation dataset consisted of 150 knees (76 knees without and 74 knees with progression), and the hold-out testing dataset consisted of 400 knees (200 knees without and 200 knees with progression). Supplemental Table 2 describes the distribution of demographic and radiographic risk factors for knees in the training, validation, and hold-out testing datasets. Non-parametric Kruskal-Wallis tests were used to compare clinical and radiographic risk factors between the three datasets. There were no significant differences (p=0.228–0.591) between knees in the training, validation, and hold-out testing datasets in baseline age, gender, race, BMI, history of knee injury, KL grade, or tibiofemoral angle.
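For three datasets the Kruskal-Wallis statistic H is referred to a chi-square distribution with 2 degrees of freedom, whose survival function is simply exp(-H/2). The plain-Python sketch below uses that fact, with average ranks and the usual tie correction; the function name is hypothetical, and in practice a general implementation such as `scipy.stats.kruskal` would normally be used.

```python
import math

def kruskal_wallis_3(*groups):
    """Kruskal-Wallis H test for exactly three groups, with tie correction.

    With three groups the chi-square reference has 2 degrees of freedom,
    whose survival function is exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    h /= correction
    return h, math.exp(-h / 2.0)

h, p = kruskal_wallis_3([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 3), round(p, 4))  # 7.2 0.0273
```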
Supplementary Material
REFERENCES
- 1. Felson DT, Zhang Y. An update on the epidemiology of knee and hip osteoarthritis with a view to prevention. Arthritis Rheum. 1998;41(8):1343–1355.
- 2. Felson DT. An update on the pathogenesis and epidemiology of osteoarthritis. Radiol Clin North Am. 2004;42(1):1–9. doi: 10.1016/S0033-8389(03)00161-1
- 3. Bedson J, Jordan K, Croft P. The prevalence and history of knee osteoarthritis in general practice: a case-control study. Fam Pract. 2004;22(1):103–108. doi: 10.1093/fampra/cmh700
- 4. Peat G, McCarney R, Croft P. Knee pain and osteoarthritis in older adults: a review of community burden and current use of primary health care. Ann Rheum Dis. 2001;60(2):91–97. doi: 10.1136/ard.60.2.91
- 5. Felson DT, Hodgson R. Identifying and treating preclinical and early osteoarthritis. Rheum Dis Clin North Am. 2014;40(4):699–710. doi: 10.1016/j.rdc.2014.07.012
- 6. Karsdal MA, Michaelis M, Ladel C, et al. Disease-modifying treatments for osteoarthritis (DMOADs) of the knee and hip: lessons learned from failures and opportunities for the future. Osteoarthr Cartil. 2016;24(12):2013–2021. doi: 10.1016/J.JOCA.2016.07.017
- 7. Hunter DJ. Risk stratification for knee osteoarthritis progression: a narrative review. Osteoarthr Cartil. 2009;17(11):1402–1407. doi: 10.1016/J.JOCA.2009.04.014
- 8. Cicuttini FM, Wluka AE, Wang Y, Stuckey SL. Longitudinal study of changes in tibial and femoral cartilage in knee osteoarthritis. Arthritis Rheum. 2004;50(1):94–97. doi: 10.1002/art.11483
- 9. Wluka AE, Stuckey S, Snaddon J, Cicuttini FM. The determinants of change in tibial cartilage volume in osteoarthritic knees. Arthritis Rheum. 2002;46(8):2065–2072. doi: 10.1002/art.10460
- 10. Hanna F, Ebeling PR, Wang Y, et al. Factors influencing longitudinal change in knee cartilage volume measured from magnetic resonance imaging in healthy men. Ann Rheum Dis. 2005;64(7):1038–1042. doi: 10.1136/ard.2004.029355
- 11. Raynauld J-P, Martel-Pelletier J, Berthiaume M-J, et al. Long term evaluation of disease progression through the quantitative magnetic resonance imaging of symptomatic knee osteoarthritis patients: correlation with clinical symptoms and radiographic changes. Arthritis Res Ther. 2005;8(1):R21. doi: 10.1186/ar1875
- 12. Gandy SJ, Dieppe PA, Keen MC, Maciewicz RA, Watt I, Waterton JC. No loss of cartilage volume over three years in patients with knee osteoarthritis as assessed by magnetic resonance imaging. Osteoarthr Cartil. 2002;10(12):929–937. doi: 10.1053/JOCA.2002.0849
- 13. Wirth W, Hellio Le Graverand M-P, Wyman BT, et al. Regional analysis of femorotibial cartilage loss in a subsample from the Osteoarthritis Initiative progression subcohort. Osteoarthr Cartil. 2009;17(3):291–297. doi: 10.1016/J.JOCA.2008.07.008
- 14. Yoo TK, Kim DW, Choi SB, Oh E, Park JS. Simple Scoring System and Artificial Neural Network for Knee Osteoarthritis Risk Prediction: A Cross-Sectional Study. Crispin J, ed. PLoS One. 2016;11(2):e0148724. doi: 10.1371/journal.pone.0148724
- 15. Lazzarini N, Runhaar J, Bay-Jensen AC, et al. A machine learning approach for the identification of new biomarkers for knee osteoarthritis development in overweight and obese women. Osteoarthr Cartil. 2017;25(12):2014–2021. doi: 10.1016/J.JOCA.2017.09.001
- 16. Kerkhof HJM, Bierma-Zeinstra SMA, Arden NK, et al. Prediction model for knee osteoarthritis incidence, including clinical, genetic and biochemical risk factors. Ann Rheum Dis. 2014;73(12):2116–2121. doi: 10.1136/annrheumdis-2013-203620
- 17. Zhang W, McWilliams DF, Ingham SL, et al. Nottingham knee osteoarthritis risk prediction models. Ann Rheum Dis. 2011;70(9):1599–1604. doi: 10.1136/ARD.2011.149807
- 18. Halilaj E, Le Y, Hicks JL, Hastie TJ, Delp SL. Modeling and predicting osteoarthritis progression: data from the osteoarthritis initiative. Osteoarthr Cartil. 2018;26(12):1643–1650. doi: 10.1016/J.JOCA.2018.08.003
- 19. LaValley MP, Lo GH, Price LL, Driban JB, Eaton CB, McAlindon TE. Development of a clinical prediction algorithm for knee osteoarthritis structural progression in a cohort study: value of adding measurement of subchondral bone density. Arthritis Res Ther. 2017;19(1):95. doi: 10.1186/s13075-017-1291-3
- 20. Woloszynski T, Podsiadlo P, Stachowiak GW, Kurzynski M, Lohmander LS, Englund M. Prediction of progression of radiographic knee osteoarthritis using tibial trabecular bone texture. Arthritis Rheum. 2012;64(3):688–695. doi: 10.1002/art.33410
- 21. Janvier T, Jennane R, Valery A, et al. Subchondral tibial bone texture analysis predicts knee osteoarthritis progression: data from the Osteoarthritis Initiative: Tibial bone texture & knee OA progression. Osteoarthr Cartil. 2017;25(2):259–266. doi: 10.1016/J.JOCA.2016.10.005
- 22. Kraus VB, Feng S, Wang S, et al. Trabecular morphometry by fractal signature analysis is a novel marker of osteoarthritis progression. Arthritis Rheum. 2009;60(12):3711–3722. doi: 10.1002/art.25012
- 23. Janvier T, Jennane R, Toumi H, Lespessailles E. Subchondral tibial bone texture predicts the incidence of radiographic knee osteoarthritis: data from the Osteoarthritis Initiative. Osteoarthr Cartil. 2017;25(12):2047–2054. doi: 10.1016/J.JOCA.2017.09.004
- 24. Joseph GB, McCulloch CE, Nevitt MC, et al. Tool for osteoarthritis risk prediction (TOARP) over 8 years using baseline clinical data, X-ray, and MRI: Data from the osteoarthritis initiative. J Magn Reson Imaging. 2018;47(6):1517–1526. doi: 10.1002/jmri.25892
- 25. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444. doi: 10.1038/nature14539
- 26. Liu F, Guan B, Zhou Z, et al. Fully Automated Diagnosis of Anterior Cruciate Ligament Tears on Knee MR Images by Using Deep Learning. Radiol Artif Intell. 2019;1(3):180091. doi: 10.1148/ryai.2019180091
- 27. Guan B, Zhang J, Sethares WA, Kijowski R, Liu F. SpecNet: Spectral Domain Convolutional Neural Network. May 2019. http://arxiv.org/abs/1905.10915 Accessed May 28, 2019.
- 28. Liu F. SUSAN: segment unannotated image structure using adversarial network. Magn Reson Med. December 2018. doi: 10.1002/mrm.27627
- 29. Suzuki K. Overview of deep learning in medical imaging. Radiol Phys Technol. 2017;10(3):257–273. doi: 10.1007/s12194-017-0406-5
- 30. Lester G. Clinical research in OA--the NIH Osteoarthritis Initiative. J Musculoskelet Neuronal Interact. 8(4):313–314. http://www.ncbi.nlm.nih.gov/pubmed/19147953 Accessed June 20, 2019.
- 31. Kellgren JH, Lawrence JS. Radiological assessment of osteo-arthrosis. Ann Rheum Dis. 1957;16(4):494–502. doi: 10.1136/ard.16.4.494
- 32. Felson DT, Cooke TDV, Niu J, et al. Can anatomic alignment measured from a knee radiograph substitute for mechanical alignment from full limb films? Osteoarthr Cartil. 2009;17(11):1448–1452. doi: 10.1016/J.JOCA.2009.05.012
- 33. Neumann G, Hunter D, Nevitt M, et al. Location specific radiographic joint space width for osteoarthritis progression. Osteoarthr Cartil. 2009;17(6):761–765. doi: 10.1016/J.JOCA.2008.11.001
- 34. Kothari M, Guermazi A, von Ingersleben G, et al. Fixed-flexion radiography of the knee provides reproducible joint space width measurements in osteoarthritis. Eur Radiol. 2004;14(9):1568–1573. doi: 10.1007/s00330-004-2312-6
- 35. Osteoarthritis Biomarkers Consortium FNIH Project: Study Design. https://www.oai.ucsf.edu/datarelease/biospecimens.asp Accessed June 20, 2019.
- 36. Huang L, Jin Y, Gao Y, Thung KH, Shen D. Longitudinal clinical score prediction in Alzheimer’s disease with soft-split sparse regression based random forest. Neurobiol Aging. 2016;46:180–191. doi: 10.1016/j.neurobiolaging.2016.07.005
- 37. Yuan Z, Ghosh D. Combining Multiple Biomarker Models in Logistic Regression. Biometrics. 2008;64(2):431–439. doi: 10.1111/j.1541-0420.2007.00904.x
- 38. Kadhim Al-Shayea Q. Artificial Neural Networks in Medical Diagnosis. IJCSI Int J Comput Sci Issues. 2011;8(2). www.IJCSI.org Accessed June 20, 2019.
- 39. Hafezi-Nejad N, Guermazi A, Roemer FW, et al. Prediction of medial tibiofemoral compartment joint space loss progression using volumetric cartilage measurements: Data from the FNIH OA biomarkers consortium. Eur Radiol. 2017;27(2):464–473. doi: 10.1007/s00330-016-4393-4
- 40. Doi K. Current status and future potential of computer-aided diagnosis in medical imaging. Br J Radiol. 2005;78(suppl_1):s3–s19. doi: 10.1259/bjr/82933343
- 41. Silverwood V, Blagojevic-Bucknall M, Jinks C, Jordan JL, Protheroe J, Jordan KP. Current evidence on risk factors for knee osteoarthritis in older adults: a systematic review and meta-analysis. Osteoarthr Cartil. 2015;23(4):507–515. doi: 10.1016/J.JOCA.2014.11.019
- 42. Bastick AN, Belo JN, Runhaar J, Bierma-Zeinstra SMA. What Are the Prognostic Factors for Radiographic Progression of Knee Osteoarthritis? A Meta-analysis. Clin Orthop Relat Res. 2015;473(9):2969–2989. doi: 10.1007/s11999-015-4349-z
- 43. Blagojevic M, Jinks C, Jeffery A, Jordan KP. Risk factors for onset of osteoarthritis of the knee in older adults: a systematic review and meta-analysis. Osteoarthr Cartil. 2010;18(1):24–33. doi: 10.1016/J.JOCA.2009.08.010
- 44. Rajpurkar P, Irvin J, Zhu K, et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. November 2017. http://arxiv.org/abs/1711.05225 Accessed June 20, 2019.
- 45. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Vol 2016-December. IEEE Computer Society; 2016:779–788. doi: 10.1109/CVPR.2016.91
- 46. Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely Connected Convolutional Networks. August 2016. http://arxiv.org/abs/1608.06993 Accessed September 12, 2018.
- 47. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning Deep Features for Discriminative Localization. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2016:2921–2929. doi: 10.1109/CVPR.2016.319
- 48. Liu F, Zhou Z, Jang H, Samsonov A, Zhao G, Kijowski R. Deep Convolutional Neural Network and 3D Deformable Approach for Tissue Segmentation in Musculoskeletal Magnetic Resonance Imaging. Magn Reson Med. July 2017. doi: 10.1002/mrm.26841
- 49. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–845.
- 50. Riddle DL, Stratford PW, Perera RA. The incident tibiofemoral osteoarthritis with rapid progression phenotype: development and validation of a prognostic prediction rule. Osteoarthr Cartil. 2016;24(12):2100–2107. doi: 10.1016/J.JOCA.2016.06.021
- 51. Ratzlaff C, Ashbeck EL, Guermazi A, Roemer FW, Duryea J, Kwoh CK. A quantitative metric for knee osteoarthritis: reference values of joint space loss. Osteoarthr Cartil. 2018;26(9):1215–1224. doi: 10.1016/J.JOCA.2018.05.014
- 52. Zhang Y, Niu J, Felson DT, Choi HK, Nevitt M, Neogi T. Methodologic challenges in studying risk factors for progression of knee osteoarthritis. Arthritis Care Res (Hoboken). 2010;62(11):1527–1532. doi: 10.1002/acr.20287
- 53. Russakovsky O, Deng J, Su H, et al. ImageNet Large Scale Visual Recognition Challenge. Int J Comput Vis. 2015;115(3):211–252. doi: 10.1007/s11263-015-0816-y
- 54. Shin H-C, Roth HR, Gao M, et al. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning. IEEE Trans Med Imaging. 2016;35(5):1285–1298. doi: 10.1109/TMI.2016.2528162
- 55. Pleiss G, Chen D, Huang G, Li T, van der Maaten L, Weinberger KQ. Memory-Efficient Implementation of DenseNets. July 2017. http://arxiv.org/abs/1707.06990 Accessed September 12, 2018.
- 56. Kingma DP, Ba JL. Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings. International Conference on Learning Representations, ICLR; 2015.