Norman (2019) [18] |
Journal of Digital Imaging |
OA severity (KL grade) |
DenseNet neural network architectures |
Sensitivity & specificity: 84% & 86% (KL grades 0–1), 70% & 84% (KL grade 2), 69% & 97% (KL grade 3), 86% & 99% (KL grade 4). |
Sensitivity and specificity comparable to manual KL grading and to previous automatic systems using other AI/ML algorithms |
Training, validation and testing sets were selected from the same dataset. KL grading misclassifications typically occurred when hardware was present in the knee. |
Provides additional data supporting the potential of AI in automatic assessment of OA radiological severity. |
Tiulpin (2018) [19] |
Scientific Reports |
OA severity (KL grade) |
Deep Siamese CNN architecture |
Average multi-class accuracy: 66.71%. AUC: 0.93. Kappa coefficient (agreement with expert annotations on test dataset): 0.83 (excellent). MSE value: 0.48. |
Different datasets used for initial training and testing |
Validation and testing sets were selected from the same dataset. |
Providing a probability distribution over KL grades for each prediction may help clinicians choose the KL grade in ambiguous cases. |
Heisinger (2020) [13] |
Journal of Clinical Medicine |
Need for TKA |
Artificial neural networks (ANNs) with linear, radial basis function and three-layer perceptron architectures |
Total percentage of correctly predicted knees: 80%. Positive predictive value: 84%. Negative predictive value: 73%. Sensitivity: 41%. Specificity: 30%. |
First study to consider longitudinal change in symptomology (pain, function, quality of life) and radiographic structural change over the 4-year period before TKA |
Training and testing sets were selected from the same dataset. |
Future externally validated algorithms that can predict TKA need in advance from routinely available patient data could be highly useful for referral and triage decisions in primary care. |
Leung (2020) [15] |
Radiology |
Need for TKA |
Multitask deep learning model (ResNet34) trained with transfer learning |
AUC: 0.87. Sensitivity: 83%. Specificity: 77%. |
First study to directly predict TKA from knee radiographs using a deep learning model |
Limited dataset size (radiographs from 728 individuals in total) / Training and testing sets were selected from the same dataset. |
TKA prediction models solely based on radiological data have limited clinical utility, although they may serve as a reference for future ML studies. |
El-Galaly (2020) [12] |
Clinical Orthopaedics and Related Research |
Need for early revision TKA |
LASSO regression, random forest classifier, gradient boosting model, neural network |
AUCs: 0.57–0.60. |
First study to predict early revision TKA (within 2 years of primary TKA) using preoperative patient data from arthroplasty registries / Temporal external validation was conducted (testing set drawn from a separate hold-out year not included in the training set). |
Training and testing sets were selected from the same dataset. |
Results from this study suggest that future models predicting early revision TKA may benefit from including more pre-operative information or predicting revision over a longer follow-up duration. |