Abstract
Purpose: To develop a deep learning model for automated classification of orthopedic hardware on pelvic and hip radiographs, which can be clinically implemented to decrease radiologist workload and improve consistency among radiology reports.
Materials and Methods: Pelvic and hip radiographs from 4279 studies in 1073 patients were retrospectively obtained and reviewed by musculoskeletal radiologists. Two convolutional neural networks, EfficientNet-B4 and NFNet-F3, were trained to classify images into the most represented categories: no hardware, total hip arthroplasty (THA), hemiarthroplasty, intramedullary nail, femoral neck cannulated screws, dynamic hip screw, lateral blade/plate, THA with additional femoral fixation, and post-infectious hip. Model performance was assessed on an independent test set of 851 studies from 262 patients and compared to the individual performance of five subspecialty-trained radiologists using leave-one-out analysis against an aggregate gold standard label.
Results: For multiclass classification, the area under the receiver operating characteristic curve (AUC) for NFNet-F3 was 0.99 or greater for all classes; for EfficientNet-B4, it was 0.99 or greater for all classes except post-infectious hip (AUC, 0.97). When compared with human observers, the models achieved an accuracy of 97%, which was non-inferior to four out of five radiologists and superior to one radiologist. Cohen’s kappa coefficient for both models ranged from 0.96 to 0.97, indicating excellent inter-reader agreement.
Conclusion: A deep learning model can be used to classify a range of orthopedic hip hardware with high accuracy and comparable performance to subspecialty-trained radiologists.
Keywords: Deep learning, Image classification, Pelvic radiography, Hip radiography, Orthopedic hardware
Introduction
Orthopedic hardware is one of the most commonly encountered entities in pelvis and hip radiography, ranging from total hip arthroplasty (THA) to various fixation devices utilized in the treatment of femoral fractures. Approximately 500,000 THAs are performed annually in the USA alone, with volumes expected to rise substantially over the next few decades in the setting of an aging population [1–3]. In the USA, hip fractures have an estimated incidence of 340,000 annually and are associated with significantly increased mortality and impact on the healthcare system [4, 5]. Choice of surgical implant, either prosthesis or osteosynthesis, is guided by fracture location and degree of displacement [6]. Despite the variety of surgical management options, rates of reoperation remain high (10.0–48.8%) [7–9]. Pelvis and hip radiographs are routinely performed for postoperative follow-up, to identify potential complications, and for presurgical planning. As certain pathologies are associated with different hardware types, accurate and specific identification of hardware in the radiology report is especially important if complications are suspected clinically or further operative interventions such as revisions are planned.
Prior studies have demonstrated the effectiveness of deep learning in performing a range of musculoskeletal imaging interpretation tasks related to pelvic/hip pathology, for example, hip fracture detection and subclassification [10], determination of the presence or absence of hardware [10, 11], and identification of specific hardware complications such as femoral component subsidence and periprosthetic dislocations [12–14]. Many of these prior studies have included hardware detection as an ancillary function for models developed for another primary purpose. For example, models developed for the detection and subclassification of hip fractures [10], pelvic and acetabular fractures [11], and quantitative leg length analysis [15] include a hardware detection function, but they do not subclassify the type of hardware. Gong et al. developed a model to identify four hip arthroplasty designs from leading manufacturers; however, their study was restricted to arthroplasty implants with specific appearances [16].
To our knowledge, this represents the first general deep learning-based hardware subclassification system for pelvis/hip radiographs in the literature. Though THAs are the most common device encountered, the array of orthopedic hardware in the hip is diverse in appearance and function, and more precise classification is clinically useful and adds value to the radiology report. Our goal is to develop practical artificial intelligence (AI) systems that improve radiology workflow efficiency by automating common tasks that can be performed with a high degree of certainty and a low degree of error. In this regard, orthopedic hardware follow-up comprises a significant percentage of our pelvis/hip radiographic volume. In this study, we trained and validated a deep learning model to identify the most common types of orthopedic hardware on pelvic and hip radiographs. We also compared model performance to five subspecialty-trained musculoskeletal radiologists.
Methods
Following institutional review board approval of this study with a waiver of informed consent, studies were retrospectively identified using the Picture Archiving and Communication System (PACS) report search function at two hospitals within an integrated health system. A total of 5454 pelvis and hip radiographs were obtained from May 2013 to January 2022. Single anteroposterior (AP) views of the pelvis as well as one to three views of the hips (AP, frog leg, and/or cross-table lateral) were included. Images were exported in DICOM format and manually reviewed to ensure no burned-in protected health information (PHI) was present, then converted to JPEG format for complete metadata anonymization and for further processing and analysis. Pelvic radiographs containing bilateral hips were secondarily divided at the midpoint of the image width to derive two images, one of each hip. Five hundred fifty-five radiographs (10%), which were technically inadequate (i.e., poor exposure/image quality, or not including the entire joint or hardware) or contained overlaid measurements/outlines for arthroplasty planning, were excluded.
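The midpoint division of bilateral pelvis radiographs can be sketched as follows. This is a minimal illustration assuming images are loaded as 2-D grayscale arrays; the function name `split_pelvis_radiograph` is hypothetical and not part of the study's pipeline:

```python
import numpy as np

def split_pelvis_radiograph(image: np.ndarray):
    """Divide a bilateral pelvis radiograph at the midpoint of its
    width, yielding two single-hip images (left half, right half)."""
    mid = image.shape[1] // 2
    return image[:, :mid], image[:, mid:]

# Example with a dummy 2048 x 2456 grayscale image
pelvis = np.zeros((2048, 2456), dtype=np.uint8)
hip_a, hip_b = split_pelvis_radiograph(pelvis)
```

Each derived image then enters the dataset as an independent single-hip radiograph.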
Initial image labels were extracted from finalized reports interpreted by the original radiologist. All images underwent further manual review and classification by a musculoskeletal radiologist. The most commonly represented categories were selected, and the training categories were defined as follows: (0) No hardware, (1) total hip arthroplasty (THA), (2) hemiarthroplasty, (3) intramedullary nail (IMN), (4) femoral neck cannulated screws, (5) sliding/dynamic hip screw (DHS), (6) lateral blade or plate fixation, (7) combination of THA with additional femoral fixation (e.g., plate or cerclage wires), and (8) post-infectious hip (e.g., antibiotic-impregnated cement spacer). Hardware involving the lumbar spine and/or sacroiliac joints was not considered, and the category was determined exclusively by the hardware related to the hip joint. Six hundred twenty-one images (11%) that did not conform to these categories were excluded from the dataset, as there was insufficient representation to adequately perform machine training. Excluded categories included, for instance, acetabular or other pelvic plate and screw fixations. This left a total of 4279 radiographs from 1073 patients to comprise the overall dataset. Frontal radiographs of the hip depicting examples of the different implant types are shown in Fig. 1.
Fig. 1.

Frontal radiographs of the hip depicting different implant types in the training and test sets. a No hardware. b Total hip arthroplasty. c Hemiarthroplasty. d Intramedullary nail. e Femoral neck cannulated screws. f Sliding/dynamic hip screw. g Lateral blade or plate fixation. h Combination of THA with additional femoral fixation. i + j Two examples of post-infectious hip
A 70:10:20 split was used on the overall dataset to create the training, validation, and test sets, respectively, with randomization by class distribution to ensure adequate sample sizes of the different classes. This resulted in a test set comprising 851 studies from 262 patients. All images from a given patient were assigned to only one of the training, validation, or test sets, so no patient overlap was present across the sets. Each image in the test set was independently reviewed and labeled by five fellowship-trained musculoskeletal radiologists, with a range of 1 to 28 years of experience.
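A patient-level stratified split along these lines can be sketched as below. This is an illustrative implementation assuming one predominant class label per patient; `patient_level_split` is a hypothetical helper, not the study's actual code:

```python
import random
from collections import defaultdict

def patient_level_split(label_by_patient, seed=0):
    """Assign whole patients to train/validation/test (70:10:20),
    stratifying by class so proportions are preserved and no
    patient spans two sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for patient, label in label_by_patient.items():
        by_class[label].append(patient)
    split = {"train": [], "val": [], "test": []}
    for patients in by_class.values():
        rng.shuffle(patients)
        n = len(patients)
        n_train, n_val = round(0.7 * n), round(0.1 * n)
        split["train"] += patients[:n_train]
        split["val"] += patients[n_train:n_train + n_val]
        split["test"] += patients[n_train + n_val:]
    return split

# Demo: 100 synthetic patients across 3 classes
demo = {f"p{i}": i % 3 for i in range(100)}
sets = patient_level_split(demo)
```

Keeping whole patients within one set prevents near-duplicate images of the same hip from leaking between training and test data.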
An EfficientNet and an NFNet image classification model were selected to perform classification of the hip hardware images [17, 18]. EfficientNets are a family of convolutional neural networks previously reported in the medical imaging literature to diagnose COVID-19 on radiographs, identify diabetic retinopathy, and identify osteoporosis on hip radiographs [19–21]. Pretrained weights from the ImageNet dataset classification task, further optimized using the NoisyStudent training algorithm, were utilized to initialize model weights and improve convergence [22]. A B4-size EfficientNet model was chosen as a compromise between computational efficiency and theoretical model accuracy.
NFNets are a family of convolutional neural networks previously utilized to diagnose ulcerative colitis from colonoscopy images and COVID-19 on radiographs [23, 24]. NFNets have been shown to achieve greater test accuracy than EfficientNets while having less reliance on the computationally demanding large batch sizes required by EfficientNets [18]. An F3-size NFNet model was chosen as a compromise between computational efficiency and model accuracy.
Model weights and architecture were obtained from an open-source library based on the PyTorch 1.8 framework [25]. This was implemented into a custom training and inference routine utilizing the FastAI package in Python 3.8.5 [26]. Training was performed on a workstation with an NVIDIA RTX 3090 graphics processing unit. As part of the training pipeline, images were automatically downsampled to a maximum dimension of 380 × 380 pixels for EfficientNet and 320 × 320 pixels for NFNet. Random augmentation of images was performed at the time of training, with randomized adjustments to image contrast, brightness, zoom, warp, and rotation. A staggered training approach was utilized, initially on only the final layer and then on the entire model, for 4 and 50 epochs respectively, for a total of 54 epochs. Cyclically variable learning rates peaking at 0.001 and 0.0001 for EfficientNet and NFNet training, respectively, were used for each stage of training.
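The on-the-fly brightness/contrast jitter can be illustrated with a minimal numpy sketch. The study used a library-provided augmentation pipeline that also applies zoom, warp, and rotation; `jitter` here is a hypothetical, simplified stand-in for the intensity adjustments only:

```python
import numpy as np

def jitter(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly perturb brightness and contrast at training time.
    Contrast is scaled about the image mean; brightness is an
    additive shift. Output stays a valid 8-bit image."""
    img = image.astype(np.float32)
    brightness = rng.uniform(-0.1, 0.1) * 255.0   # additive shift
    contrast = rng.uniform(0.9, 1.1)              # multiplicative scale
    mean = img.mean()
    out = (img - mean) * contrast + mean + brightness
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
frame = np.full((320, 320), 128, dtype=np.uint8)  # dummy 320 x 320 input
augmented = jitter(frame, rng)
```

Because the perturbation is drawn fresh for every image on every epoch, the model effectively never sees the exact same training image twice.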
Accuracy, precision, recall, Cohen’s kappa scores, and area under the receiver operating characteristic (ROC) curve (AUC) were computed to assess model performance. Confidence intervals (95%) and p-values were calculated with bootstrapping, and the threshold for statistical significance was p < 0.05. The consensus label among the five radiologists was used to establish the ground truth, against which overall model performance was measured. For discordant cases, the final ground truth label was determined by majority vote. All image review was conducted blinded to patient demographics, clinical information, and referring physician. To further analyze inter-reader agreement between radiologists and machine classifications, a leave-one-out analysis was performed to directly compare individual radiologist performance with machine performance against an aggregate gold standard label.
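The accuracy and Cohen's kappa computations, together with percentile bootstrap confidence intervals, can be sketched in plain Python as below. These are the standard formulas, not the authors' code, and the function names are illustrative:

```python
import random
from collections import Counter

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohens_kappa(y_true, y_pred):
    """Agreement beyond chance between two label sequences."""
    n = len(y_true)
    po = accuracy(y_true, y_pred)               # observed agreement
    ct, cp = Counter(y_true), Counter(y_pred)   # marginal frequencies
    pe = sum(ct[c] * cp[c] for c in ct) / n**2  # chance agreement
    return (po - pe) / (1 - pe)

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, seed=0):
    """Percentile 95% CI by resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_pred[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot)]
```

Bootstrapping makes no distributional assumption about the per-class error rates, which is convenient given the skewed class frequencies in Table 1.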
Grad-CAM heatmaps reflecting machine visual saliency of the test set were computed to visually confirm accurate identification of features by the trained models [27].
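The core Grad-CAM weighting step can be sketched as follows, assuming the activations and gradients of the last convolutional layer have already been captured via hooks. This is a simplified illustration of the published method [27], not the study's implementation:

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM: weight each feature map by its spatially averaged
    gradient, sum across channels, and keep only positive evidence
    (ReLU). Inputs are (C, H, W) arrays from the last conv layer;
    the result is an (H, W) saliency map normalized to [0, 1]."""
    weights = gradients.mean(axis=(1, 2))                      # (C,)
    cam = (weights[:, None, None] * activations).sum(axis=0)   # (H, W)
    cam = np.maximum(cam, 0)                                   # ReLU
    if cam.max() > 0:
        cam /= cam.max()
    return cam

# Demo on random tensors standing in for captured layer outputs
acts = np.random.default_rng(1).random((8, 10, 10))
grads = np.random.default_rng(2).random((8, 10, 10)) - 0.5
cam = grad_cam(acts, grads)
```

The low-resolution map is then upsampled to the input size and overlaid on the radiograph as a heatmap, as in Fig. 4.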
Results
The multiclass distribution and patient demographics of the dataset are presented in Table 1. For THA with additional femoral fixation and post-infectious hip, relatively more cases were included in the dataset because of the greater heterogeneity of their postoperative appearances. For the training and test sets, the mean age of patients was 66.8 and 67.4 years, and the percentage of female patients was 55.7% and 53.2%, respectively.
Table 1.
Multiclass distribution and patient demographics of the training and test sets
| | Training/validation | Test |
|---|---|---|
| Number of images | 3428 | 851 |
| • No hardware | 1098 (32.0%) | 274 (32.2%) |
| • Total hip arthroplasty (THA) | 577 (16.8%) | 146 (17.2%) |
| • Hemiarthroplasty | 279 (8.1%) | 67 (7.9%) |
| • Intramedullary nail | 230 (6.7%) | 57 (6.7%) |
| • Femoral neck cannulated screw | 214 (6.2%) | 53 (6.2%) |
| • Dynamic hip screw | 140 (4.1%) | 35 (4.1%) |
| • Lateral blade or plate fixation | 118 (3.4%) | 29 (3.4%) |
| • THA with additional femoral fixation | 288 (8.4%) | 71 (8.3%) |
| • Post-infectious hip | 484 (14.1%) | 119 (14.0%) |
| Number of patients | 811 | 262 |
| • Mean age (years) | 66.8 | 67.4 |
| • Female (%) | 55.7 | 53.2 |
Of the 851 cases in the test set, the ground truth was established by a unanimous consensus label among the five interpreting radiologists for 734 cases (86.3%). Of the remaining 117 cases (13.7%) with at least one reader disagreement, 113 (97%) were resolved by at least a majority (3 out of 5) vote, and 4 (3%) were resolved by the most frequently assigned label.
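The consensus-resolution rule described above can be sketched as a small helper (the name `resolve_ground_truth` is hypothetical; a real pipeline would also need an adjudication rule for the rare case of a tie with no plurality winner):

```python
from collections import Counter

def resolve_ground_truth(reader_labels):
    """Return (consensus label, unanimous?) for one case.
    Unanimous if all readers agree; otherwise the most frequently
    assigned label wins (majority vote, falling back to plurality)."""
    counts = Counter(reader_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes == len(reader_labels)
```

For example, five concordant reads yield a unanimous label, while a 3-to-2 split resolves to the majority label with the unanimity flag cleared.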
For machine inferences, the highest-confidence label was designated as the predicted classification. Performance statistics of precision, recall, F1-score, and AUC are summarized in Table 2. Multiclass classification ROC curves for the two convolutional neural networks are shown in Fig. 2, with NFNet-F3 demonstrating AUCs of 0.99 or greater for all classes, indicating excellent agreement with ground truth. EfficientNet-B4 had an AUC of 0.97 for the post-infectious category and of 0.99 or greater for all other categories. Per-class AUC differences between the two networks were not statistically significant. Confusion matrices for multiclass classification for the two networks are shown in Fig. 3.
Table 2.
Performance metrics of the two deep learning models on the test set (EfficientNet-B4/NFNet-F3)
| | Precision | Recall | F1-score | AUC (95% confidence interval) |
|---|---|---|---|---|
| No hardware | 1.00/1.00 | 0.99/0.99 | 0.99/1.00 | 1.00 (1.00, 1.00)/1.00 (1.00, 1.00) |
| Total hip arthroplasty (THA) | 0.98/0.98 | 0.99/0.97 | 0.98/0.97 | 0.99 (0.98, 1.00)/0.99 (0.98, 1.00) |
| Hemiarthroplasty | 0.97/0.96 | 0.97/1.00 | 0.97/0.98 | 0.99 (0.97, 1.00)/1.00 (1.00, 1.00) |
| Intramedullary nail | 0.97/0.95 | 1.00/0.98 | 0.98/0.97 | 1.00 (1.00, 1.00)/1.00 (1.00, 1.00) |
| Femoral neck cannulated screw | 0.98/1.00 | 1.00/0.98 | 0.99/0.99 | 1.00 (1.00, 1.00)/0.99 (0.98, 1.00) |
| Dynamic hip screw | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00 (1.00, 1.00)/1.00 (1.00, 1.00) |
| Lateral blade or plate fixation | 1.00/0.97 | 0.93/0.97 | 0.96/0.97 | 1.00 (1.00, 1.00)/1.00 (1.00, 1.00) |
| THA with additional femoral fixation | 0.81/0.91 | 0.96/0.97 | 0.88/0.94 | 1.00 (1.00, 1.00)/1.00 (1.00, 1.00) |
| Post-infectious hip | 0.96/0.95 | 0.87/0.91 | 0.91/0.93 | 0.97 (0.94, 0.99)/0.99 (0.98, 1.00) |
Fig. 2.

Receiver operating characteristic curves for multiclass classification for NFNet-F3 (top) and EfficientNet-B4 (bottom) convolutional neural networks
Fig. 3.
Confusion matrices for multiclass classification for NFNet-F3 (top) and EfficientNet-B4 (bottom) convolutional neural networks
Representative heatmaps for correct predictions in each of the categories are depicted in Fig. 4. Across different views (AP, frog leg, and cross-table lateral), the model highlighted the area of the hardware, with the salient features of each hardware type demonstrating the strongest signal. For post-infectious cases, the model also emphasized the region of the cement spacer and/or antibiotic beads.
Fig. 4.
Example heatmaps from the NFNet-F3 model for the models’ correct predictions for each hardware classification category, including frontal, frog leg, and cross-table lateral views. a Total hip arthroplasty. b Hemiarthroplasty. c Intramedullary nail. d Femoral neck cannulated screws. e Sliding/dynamic hip screw. f Lateral blade or plate. g Combination of THA with additional femoral fixation. h Post-infectious hip
We further analyzed the incorrect predictions made by the models on the test set, consisting of 23 cases (2.7%) for NFNet-F3 and 28 cases (3.3%) for EfficientNet-B4. With both models, the most frequently misclassified category was post-infectious hip [11/23 (47.8%) for NFNet-F3; 16/28 (57.1%) for EfficientNet-B4] (Fig. 5a). A second, less common cause of misclassification was THA labeled as hemiarthroplasty and vice versa [5/23 (21.7%) for NFNet-F3; 2/28 (7.1%) for EfficientNet-B4] (Fig. 5b).
Fig. 5.
a Post-infectious hip. Model prediction of “Combination of THA with additional femoral fixation.” The model likely made the classification based on the residual hardware present. b Total hip arthroplasty (with thin lucency along the press-fit acetabular component). Model prediction of “Hemiarthroplasty.” Distinguishing THA from hemiarthroplasty can sometimes be challenging, especially in cases of bipolar hemiarthroplasty, or on cross-table lateral views, where it is more difficult to determine the presence or absence of the native acetabulum
Results of model performance and individual radiologist interpretations on the test set compared to the gold standard for accuracy and Cohen’s kappa coefficient are shown in Table 3. Overall, NFNet-F3 demonstrated marginally higher accuracy (97.1–97.3%) than EfficientNet-B4 (96.5–96.7%), but the difference was not statistically significant (p > 0.05). Accuracy for both models was non-inferior to four out of five radiologists and superior to one radiologist. Cohen’s kappa coefficient demonstrated excellent agreement between the models and radiologists, with NFNet-F3 kappa scores ranging from 0.964 to 0.967 and EfficientNet-B4 kappa scores ranging from 0.957 to 0.960. Bar charts comparing model accuracy and Cohen’s kappa to individual radiologists are depicted in Fig. 6.
Table 3.
Performance metrics of the deep learning model versus radiologist interpretation on the test set compared to the gold standard (GS)
| Accuracy (%) | Radiologist vs GS | NF-F3 vs GS | EN-B4 vs GS |
|---|---|---|---|
| Rad1 | 98.5 (97.6, 99.3) | 97.1 (95.9, 98.2) | 96.5 (95.2, 97.7) |
| Rad2 | 94.7 (93.2, 96.2) | 97.2 (96.1, 98.3) | 96.6 (95.4, 97.8) |
| Rad3 | 98.5 (97.6, 99.3) | 97.2 (96.1, 98.3) | 96.6 (95.4, 97.8) |
| Rad4 | 93.7 (92.0, 95.3) | 97.3 (96.2, 98.4) | 96.7 (95.5, 97.9) |
| Rad5 | 96.7 (95.5, 97.9) | 97.3 (96.2, 98.4) | 96.7 (95.5, 97.9) |

| Kappa | Radiologist vs GS | NF-F3 vs GS | EN-B4 vs GS |
|---|---|---|---|
| Rad1 | 0.981 (0.971, 0.991) | 0.964 (0.951, 0.978) | 0.957 (0.942, 0.972) |
| Rad2 | 0.936 (0.918, 0.954) | 0.966 (0.952, 0.979) | 0.959 (0.944, 0.973) |
| Rad3 | 0.981 (0.971, 0.991) | 0.966 (0.952, 0.979) | 0.959 (0.944, 0.973) |
| Rad4 | 0.923 (0.903, 0.943) | 0.967 (0.954, 0.98) | 0.96 (0.946, 0.975) |
| Rad5 | 0.96 (0.946, 0.975) | 0.967 (0.954, 0.98) | 0.96 (0.946, 0.975) |
Note: Data in parentheses are 95% confidence intervals
Fig. 6.

Comparison of accuracy (top) and Cohen’s kappa coefficient (bottom) for the two models versus each radiologist’s interpretation, with 95% confidence intervals depicted by the black vertical lines at the top of each bar
Discussion
In this study, we developed a deep learning-based system to automatically perform orthopedic hardware classification on pelvis and hip radiographs, with the goal of deploying the system to interpret and pre-dictate studies as part of the clinical workflow. Both neural networks, NFNet-F3 and EfficientNet-B4, performed exceptionally well, with per-class AUCs of 0.99 or greater for nearly all categories when compared against the gold standard of aggregate radiologist interpretations. Model accuracy was non-inferior to expert-level interpretations from subspecialty-trained radiologists (and superior to one radiologist), and the models demonstrated excellent inter-reader agreement.
An analysis of failure cases provides insight into model shortcomings and opportunities for refinement, which may be of value during clinical deployment. For both models, the most miscategorized entity was post-infectious hip, likely due to the greater complexity and heterogeneity of postoperative appearances of the pelvis/hip, depending on the initial hardware placed and subsequent surgical intervention(s). The models most often misclassified these based on the residual hardware present, such as THA with additional femoral fixation. A second common cause of misclassification was the distinction between THA and hemiarthroplasty. This can be especially difficult for bipolar hemiarthroplasties, due to the presence of both a femoral head component and an outer bearing, and on cross-table lateral views, where more overlapping soft tissue in the region of the hip obscures evaluation for the presence or absence of the native acetabulum.
Prior studies have developed neural networks that can perform hardware detection. Krogue et al. developed a deep learning model to identify and subclassify hip fractures as well as detect the presence of orthopedic hardware on radiography [10]. However, classification was limited to two categories, arthroplasty and open reduction internal fixation (ORIF), which together comprised only a small subset of the overall dataset (172 arthroplasty and 59 ORIF out of 3026 cases). Other studies have included hardware detection as part of models developed for other purposes, for example, pelvic/acetabular fracture detection and quantitative leg length analysis [11, 15], but these models do not subclassify the type of hardware. Gong et al. developed a model to identify four hip arthroplasty designs from leading manufacturers; however, their study was restricted specifically to arthroplasty implants, and their training and validation dataset of 357 images from 313 patients was much smaller [16].
To our knowledge, this is the first instance of a deep learning-based approach to subclassify hardware in pelvis/hip radiographs. While THAs are the most frequently observed devices, the range of orthopedic hardware in the hip/pelvic region is extensive in appearance and function, and more precise and consistent classification enhances the clinical utility of radiology reports. We improve upon the previous work described by expanding the range and specificity of hardware identification, training on a larger dataset, and incorporating additional radiographic views. Compared to prior models that considered only a single image in performing classification tasks, usually the frontal view, our model can perform classification tasks based also on other views, which adds clinical utility and more closely approximates human readers who look at several views.
Our goal was to develop a model that can be clinically implemented to decrease radiologist workload by automating a task that can be performed with a high degree of certainty and a low degree of error. Furthermore, automated subclassification of hardware type can increase consistency among different radiologists’ reports and provide more clinically useful information. This may be an improvement over freeform reporting styles, which vary widely across radiologists in the degree of specificity and the particular phrases used, such as “orthopedic hardware” or “postsurgical changes of ORIF.” We envision that the model could prepopulate a preliminary report, which a human radiologist then reviews for accuracy and evidence of complications; because this automated step can occur at the time of acquisition, prior to radiologist review of the images, it may also save time in the radiologist workflow.
There are several limitations of this study. The dataset is derived from a single health system; as with other deep learning-based projects, data from other sites would provide external validation and improve model generalizability. There are also other types of hardware in the pelvis, for example, acetabular or sacroiliac fixation, that our model is not trained to categorize. In the current study, we excluded 11% of cases that did not meet inclusion criteria, which limits the overall impact of the algorithm. Lastly, as mentioned above, the model is descriptive of hardware and should only be implemented in the context of aiding radiologist reporting, as human interpretation is still necessary to identify other pertinent findings and potential hardware complications.
In the future, determining how best to integrate the model into the clinical workflow will be paramount. Once the model is clinically implemented, continuous assessment and data collection to evaluate its impact on metrics such as accuracy, efficiency, and need for report revision will be essential for long-term success. We also recognize that classification of hardware types is a relatively straightforward task for trained radiologists, with near-perfect accuracy, so the success of the AI models was not surprising. We view the current work as a stepping stone to future work expanding model functionality to other hardware-related tasks, such as the evaluation of implant positioning, loosening, or infection. For such tasks, the ability of the model to evaluate hardware on multiple radiographic views may be especially crucial to contribute additional pertinent information. Finally, it would be interesting to correlate model performance with clinical and surgical outcomes to determine whether the model could identify patients at higher risk of subsequent revision.
In conclusion, we developed a deep learning model to classify commonly encountered orthopedic hardware in the pelvis and hip with high accuracy and comparable performance to subspecialty MSK-trained radiologists. Automated reporting using our model has the potential to enhance radiologist workflow/efficiency and improve consistency across reporting of orthopedic hardware. In addition, accurate classification of implant types can serve as a first stage for detecting hardware malpositioning and other complications.
Authors’ Contributions
YM, JLB, BHD, and CXF contributed to the study conception and design. Data collection was performed by YM, JLB, and AHY. YM, JLB, CFB, LY, and CXF evaluated the radiographs. Data analysis was performed by YM and CXF. YM and CXF wrote the manuscript. All authors reviewed and commented on previous versions of the manuscript. All authors approved the final manuscript.
Data Availability
Sample data generated or analyzed during the study are available from the corresponding author by request, to the extent permitted by institutional data sharing policy.
Declarations
Competing Interests
The authors declare no competing interests.
References
- 1. Kurtz S, Ong K, Lau E, Mowat F, Halpern M. Projections of primary and revision hip and knee arthroplasty in the United States from 2005 to 2030. J Bone Joint Surg Am. 2007;89(4):780–785. 10.2106/JBJS.F.00222.
- 2. Sloan M, Premkumar A, Sheth NP. Projected Volume of Primary Total Joint Arthroplasty in the U.S., 2014 to 2030. J Bone Joint Surg Am. 2018;100(17):1455–1460. 10.2106/JBJS.17.01617.
- 3. Shichman I, Roof M, Askew N, et al. Projections and Epidemiology of Primary Hip and Knee Arthroplasty in Medicare Patients to 2040-2060. JB JS Open Access. 2023;8(1):e22.00112. 10.2106/JBJS.OA.22.00112.
- 4. Brauer CA, Coca-Perraillon M, Cutler DM, Rosen AB. Incidence and mortality of hip fractures in the United States. JAMA. 2009;302(14):1573–1579. 10.1001/jama.2009.1462.
- 5. Johnell O, Kanis JA. An estimate of the worldwide prevalence, mortality and disability associated with hip fracture. Osteoporos Int. 2004;15(11):897–902. 10.1007/s00198-004-1627-0.
- 6. Palm H. Hip Fracture: The Choice of Surgery. In: Falaschi P, Marsh D, editors. Orthogeriatrics: The Management of Older Patients with Fragility Fractures. 2nd ed. Cham (CH): Springer; 2021. http://www.ncbi.nlm.nih.gov/books/NBK565572/. Accessed October 16, 2023.
- 7. Bhandari M, Devereaux PJ, Swiontkowski MF, et al. Internal Fixation Compared with Arthroplasty for Displaced Fractures of the Femoral Neck: A Meta-Analysis. J Bone Joint Surg Am. 2003;85(9):1673.
- 8. Mundi S, Pindiprolu B, Simunovic N, Bhandari M. Similar mortality rates in hip fracture patients over the past 31 years. Acta Orthop. 2014;54–59. 10.3109/17453674.2013.878831.
- 9. Li J, Zhao Z, Yin P, Zhang L, Tang P. Comparison of three different internal fixation implants in treatment of femoral neck fracture—a finite element analysis. J Orthop Surg Res. 2019;14(1):76. 10.1186/s13018-019-1097-x.
- 10. Krogue JD, Cheng KV, Hwang KM, et al. Automatic Hip Fracture Identification and Functional Subclassification with Deep Learning. Radiol Artif Intell. 2020;2(2):e190023. 10.1148/ryai.2020190023.
- 11. Kitamura G. Deep learning evaluation of pelvic radiographs for position, hardware presence, and fracture detection. Eur J Radiol. 2020;130:109139. 10.1016/j.ejrad.2020.109139.
- 12. Rouzrokh P, Wyles CC, Kurian SJ, et al. Deep Learning for Radiographic Measurement of Femoral Component Subsidence Following Total Hip Arthroplasty. Radiol Artif Intell. 2022;4(3):e210206. 10.1148/ryai.210206.
- 13. Wei J, Li D, Sing DC, et al. Detecting total hip arthroplasty dislocations using deep learning: clinical and Internet validation. Emerg Radiol. 2022;29(5):801–808. 10.1007/s10140-022-02060-2.
- 14. Rouzrokh P, Ramazanian T, Wyles CC, et al. Deep Learning Artificial Intelligence Model for Assessment of Hip Dislocation Risk Following Primary Total Hip Arthroplasty From Postoperative Radiographs. J Arthroplasty. 2021;36(6):2197-2203.e3. 10.1016/j.arth.2021.02.028.
- 15. Larson N, Nguyen C, Do B, et al. Artificial Intelligence System for Automatic Quantitative Analysis and Radiology Reporting of Leg Length Radiographs. J Digit Imaging. 2022;35(6):1494–1505. 10.1007/s10278-022-00671-2.
- 16. Gong Z, Fu Y, He M, Fu X. Automated identification of hip arthroplasty implants using artificial intelligence. Sci Rep. 2022;12(1):12179. 10.1038/s41598-022-16534-3.
- 17. Tan M, Le QV. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv; 2020. http://arxiv.org/abs/1905.11946. Accessed May 14, 2022.
- 18. Brock A, De S, Smith SL, Simonyan K. High-Performance Large-Scale Image Recognition Without Normalization. arXiv; 2021. 10.48550/arXiv.2102.06171.
- 19. Marques G, Agarwal D, de la Torre Díez I. Automated medical diagnosis of COVID-19 through EfficientNet convolutional neural network. Appl Soft Comput. 2020;96:106691. 10.1016/j.asoc.2020.106691.
- 20. Chetoui M, Akhloufi MA. Explainable Diabetic Retinopathy using EfficientNET. Annu Int Conf IEEE Eng Med Biol Soc. 2020;2020:1966–1969. 10.1109/EMBC44109.2020.9175664.
- 21. Yamamoto N, Sukegawa S, Kitamura A, et al. Deep Learning for Osteoporosis Classification Using Hip Radiographs and Patient Clinical Covariates. Biomolecules. 2020;10(11):1534. 10.3390/biom10111534.
- 22. Xie Q, Luong M-T, Hovy E, Le QV. Self-training with Noisy Student improves ImageNet classification. arXiv; 2020. 10.48550/arXiv.1911.04252.
- 23. Turan M, Durmus F. UC-NfNet: Deep learning-enabled assessment of ulcerative colitis from colonoscopy images. Med Image Anal. 2022;82:102587. 10.1016/j.media.2022.102587.
- 24. Akter S, Shamrat FMJM, Chakraborty S, Karim A, Azam S. COVID-19 Detection Using Deep Learning Algorithm on Chest X-ray Images. Biology (Basel). 2021;10(11):1174. 10.3390/biology10111174.
- 25. PyTorch Image Models. Hugging Face; 2023. https://github.com/huggingface/pytorch-image-models. Accessed November 22, 2023.
- 26. fast.ai—Making neural nets uncool again. fast.ai. https://www.fast.ai/. Accessed November 22, 2023.
- 27. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Int J Comput Vis. 2020;128(2):336–359. 10.1007/s11263-019-01228-7.