Table 1.
Serial No. | Year | Modality | Aim (task assisted) | MSK tissue | AI Approach | Dataset used | Performance Output | Obvious limitations | Reference |
---|---|---|---|---|---|---|---|---|---|
1. | 2021 | Radiograph | Application of a deep learning algorithm to detect and visualize vertebral fractures on plain frontal radiographs | Bone | ImageNet-pretrained convolutional neural network (CNN) | 1306 | Area under the curve = 0.72; Sensitivity = 73%; Specificity = 73% | -Small dataset | Chen HY et al.35 |
2. | 2020 | Radiograph | Assessment of a deep-learning system for fracture detection in musculoskeletal radiographs | Bone | Ensemble of CNNs | 715,343 | Overall AUC = 0.974; Sensitivity = 95.2%; Specificity = 81.3%; PPV = 47.4%; NPV = 99.7% | -Over-representation of infrequently acquired body regions | Jones RM et al.36 |
3. | 2020 | Radiograph | Bone fracture detection using a two-stage Crack-Sensitive Convolutional Neural Network system | Bone | Two CNN models in sequence: FastNet followed by CrackNet | 3053 | Accuracy = 0.91; Precision = 0.89; Recall = 0.90; F-measure = 0.90 | -Small dataset | Ma Y et al.25 |
4. | 2019 | Radiograph | Classification of hip fracture using radiographs combined with patient traits and hospital process variables | Bone | CNNs | 23,602 | Fracture was predicted moderately well from the image alone (AUC = 0.78) and better when image features were combined with patient data (AUC = 0.86) | -Absence of a reliable gold standard. -Limited label accuracy. -Limited accuracy of covariate data. -Pre-processing reduces image resolution. | Badgeley M et al.37 |
5. | 2018 | Radiograph | Deep neural network improves fracture detection by clinicians (all extremities for pretraining but wrist radiographs for final training, validation and testing) | Bone | CNN | 132,345 | The average clinician's sensitivity was 80.8% (95% CI, 76.7–84.1%) unaided and 91.5% (95% CI, 89.3–92.9%) aided; specificity was 87.5% (95% CI, 85.3–89.5%) unaided and 93.9% (95% CI, 92.9–94.9%) aided. | -Single-institution study. -Ground truth is subject to the experience of the radiologist. | Lindsey R et al.38 |
6. | 2018 | Radiograph | The ability of a deep learning algorithm to detect and classify proximal humerus fractures using AP shoulder radiographs | Bone | CNN | 1891 | Sensitivity = 0.99; Specificity = 0.97; Youden index = 0.97; Area under the curve = 1.0 | -Neer classification was used, which is only moderately reliable. -Not directly applicable in clinical settings. | Chung SW et al.39 |
7. | 2017 | Radiograph | Automated deep learning system to detect hip fractures from frontal pelvic X-rays | Bone | Regression-based CNN | 53,000 | Area under the ROC curve = 0.994 | -Small labelled dataset | Gale W et al.23 |
8. | 2017 | Radiograph | Automated fracture detection on plain radiographs (wrist radiographs) | Bone | Inception V3 CNN | 11,112 | Area under the ROC curve = 0.954 | -Ground truth was a radiologist (human). -Small labelled dataset. | Kim DH et al.40 |
9. | 2017 | Radiograph | Automatic classification of proximal femur fractures | Bone | Attention model (spatial transformer) | 1000 | High sensitivity and specificity | -Small dataset (single-institution study) | Kazi et al.41 |
10. | 2021 | CT | A fully automated rib fracture detection system on chest CT images and its impact on radiologist performance | Bone | CNN | 8529 | -Increased detection recall and classification accuracy (0.922 and 0.863) compared with the radiologists alone (0.812 and 0.850). -The radiologists achieved a higher precision rate, recall rate, and F1-score for fracture detection when using the deep learning model (0.943, 0.978, and 0.960, respectively). | NA | Meng XH et al.42 |
11. | 2020 | CT | A multiscale Deep Learning Method for Quantitative Visualization of Traumatic Hemoperitoneum at CT: Assessment of Feasibility and Comparison with Subjective Categorical Estimation | Bone | 3D U-Net | 130 | Mean DSC for the multiscale algorithm was 0.61 ± 0.15 compared with 0.32 ± 0.16 for the 3D U-Net method and 0.52 ± 0.17. AUCs for automated volume measurement and categorical estimation were 0.86 and 0.77, respectively (P = .004). An optimal cutoff of 278.9 mL yielded Accuracy = 84%; Sensitivity = 82%; Specificity = 93%; PPV = 86%; NPV = 83%. | -Single-institution study | Dreizin D et al.43 |
12. | 2020 | CT | Automatic Detection and Classification of Rib Fractures on Thoracic CT Using Convolutional Neural Network: Accuracy and Feasibility | Bone | Faster R-CNN and YOLOv3 | 1079 | With artificial intelligence-assisted diagnosis, the precision of the five radiologists improved from 80.3% to 91.1% and their sensitivity increased from 62.4% to 86.3%. On average, the radiologists' diagnosis time was reduced by 73.9 s. | -The current model cannot show the anatomical location of the rib fractures (right or left side, rib number, anatomical name of the fractured rib). -Small validation test set. | Zhou QQ et al.44 |
13. | 2018 | CT | An automatic system to detect incidental osteoporotic vertebral fractures on chest, abdomen, and pelvis CT examinations | Bone | ResNet34 for feature extraction; long short-term memory (LSTM) model | 1432 | The CNN/LSTM approach showed high efficacy for diagnosing osteoporotic vertebral fractures, with performance on par with practicing radiologists. | -Single-institution study, so generalisability is uncertain. -Single label for the entire scan, raising the chance of confounding. | Tomita N et al.45 |
14. | 2020 | MRI | MRI-based Diagnosis of Rotator Cuff Tears using Deep Learning and Weighted Linear Combinations | Muscle, tendons | Base model = VGG-16 | 2492 | Mean area under the curve = 0.98 | -Single Institution study | Kim M et al.46 |
15. | 2019 | MRI | Deep Learning Algorithm in Detecting Osteonecrosis of the Femoral Head on MRI | Bone | ResNet CNN | 1892 hips (1037 diseased and 855 normal) | Sensitivity and specificity of the DL algorithm were 84.8% and 91.3% for the external test set, and 75.2% and 97.2% for the geographic external test set. Performance was higher than that of the less experienced radiologist and comparable to that of the experienced radiologist. | -Ideal testing environment. -Slight selection bias. -Unclear whether model performance would be hindered by other diseases affecting the trabecular pattern. | Chee CG et al.47 |
16. | 2018 | MRI | Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet | Ligament | MRNet (CNN) | 1370 | In detecting abnormalities, ACL tears, and meniscal tears, the model achieved area under the ROC curve (AUC) values of 0.937 (95% CI 0.895, 0.980), 0.965 (95% CI 0.938, 0.993), and 0.847 (95% CI 0.780, 0.914), respectively, on the internal validation set. | -Performance was below that of radiologists. | Bien N et al.48 |
17. | 2018 | MRI | Super-resolution musculoskeletal MRI using deep learning | NA (scan quality) | 3D CNN ("DeepResolve") | 124 double-echo in steady-state (DESS) datasets with 0.7-mm slice thickness; tested on 17 patients | Significantly better structural similarity, peak signal-to-noise ratio, and root mean square error than tricubic interpolation, Fourier interpolation, and sparse-coding super-resolution for all down-sampling factors. | -Did not match the image quality of the high-resolution ground-truth images, although it outperformed other resolution-enhancement methods. | Chaudhari AS et al.49 |
18. | 2018 | USG | Investigation of the feasibility of using deep learning methods for arbitrary, full-spatial-resolution regression analysis of B-mode ultrasound images of human skeletal muscle | Muscle | Feature engineering (wavelet), convolutional neural networks (CNN), residual convolutional neural networks (ResNet), and deconvolutional neural networks | 8 | The deconvolutional neural network outperformed the CNN/ResNet models, which in turn outperformed the wavelet-based approach | None stated | Cunningham R et al.50 |
19. | 2017 | USG | Ultrasound-aided vertebral level localization for lumbar surgery | Bone | Deep CNN, Random Forest | 19 | The DL method outperformed the Random Forest on the test dataset (F-measure 0.90 vs. 0.83) | -Semi-automatic (therefore user-dependent) | Baka N et al.51 |
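Most of the performance figures reported in Table 1 (sensitivity, specificity, PPV, NPV, AUC, F-measure, Youden index, and the Dice similarity coefficient used in the segmentation study of row 11) are standard metrics derived from model outputs and ground-truth labels. The snippet below is a minimal sketch of how these metrics can be computed with NumPy and scikit-learn; the labels, scores, and the 0.5 decision threshold are illustrative assumptions and are not taken from any of the cited studies.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix, f1_score

# Hypothetical ground-truth labels and model scores for a binary
# fracture / no-fracture task (1 = fracture present).
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.91, 0.20, 0.75, 0.66, 0.42, 0.08, 0.83, 0.55, 0.70, 0.31])
y_pred  = (y_score >= 0.5).astype(int)          # illustrative operating threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)                    # recall / true-positive rate
specificity = tn / (tn + fp)                    # true-negative rate
ppv         = tp / (tp + fp)                    # positive predictive value (precision)
npv         = tn / (tn + fn)                    # negative predictive value
auc         = roc_auc_score(y_true, y_score)    # area under the ROC curve
f_measure   = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
youden      = sensitivity + specificity - 1     # Youden index

def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity coefficient (DSC) between two binary segmentation masks."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * intersection / (mask_a.sum() + mask_b.sum())

print(f"Sens={sensitivity:.2f} Spec={specificity:.2f} PPV={ppv:.2f} "
      f"NPV={npv:.2f} AUC={auc:.2f} F1={f_measure:.2f} Youden={youden:.2f}")
```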
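Most of the classification studies summarised above fine-tune a CNN backbone pretrained on ImageNet (e.g., ResNet, VGG-16, Inception V3) for a task-specific label such as fracture versus no fracture. The sketch below illustrates that general transfer-learning pattern in PyTorch/torchvision only; the ResNet-34 backbone, binary label set, batch size, and learning rate are assumptions chosen for illustration and do not reproduce the pipeline of any study cited in the table.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 2  # assumed binary task: fracture vs. no fracture

# Start from an ImageNet-pretrained backbone (weights download on first use).
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)

# Replace the final fully connected layer with a new task-specific head.
model.fc = nn.Linear(model.fc.in_features, num_classes)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of 8 "radiographs",
# resized to 224x224 and replicated to 3 channels to match the backbone input.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))

model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"dummy-batch loss: {loss.item():.4f}")
```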