Abstract
Background
Occult scaphoid fractures on initial radiographs of an injury are a diagnostic challenge to physicians. Although artificial intelligence models based on the principles of deep convolutional neural networks (CNN) offer a potential method of detection, it is unknown how such models perform in the clinical setting.
Questions/purposes
(1) Does CNN-assisted image interpretation improve interobserver agreement for scaphoid fractures? (2) What is the sensitivity and specificity of image interpretation performed with and without CNN assistance (as stratified by type: normal scaphoid, occult fracture, and apparent fracture)? (3) Does CNN assistance improve time to diagnosis and physician confidence level?
Methods
This survey-based experiment presented 15 scaphoid radiographs (five normal, five apparent fractures, and five occult fractures) with and without CNN assistance to physicians in a variety of practice settings across the United States and Taiwan. Occult fractures were identified by follow-up CT scans or MRI. Eligible participants were resident physicians (Postgraduate Year 3 or above) in plastic surgery, orthopaedic surgery, or emergency medicine; hand fellows; and attending physicians. Among the 176 invited participants, 120 completed the survey and met the inclusion criteria. Of the participants, 31% (37 of 120) were fellowship-trained hand surgeons, 43% (52 of 120) were plastic surgeons, and 69% (83 of 120) were attending physicians. Most participants (73% [88 of 120]) worked in academic centers, whereas the remainder worked in large, urban private practice hospitals. Recruitment occurred between February 2022 and March 2022. Radiographs with CNN assistance were accompanied by predictions of fracture presence and gradient-weighted class activation mapping of the predicted fracture site. Sensitivity and specificity of the CNN-assisted physician diagnoses were calculated to assess diagnostic performance. We calculated interobserver agreement with the Gwet agreement coefficient (AC1). Physician diagnostic confidence was estimated using a self-assessment Likert scale, and the time to arrive at a diagnosis for each case was measured.
Results
Interobserver agreement among physicians for occult scaphoid radiographs was higher with CNN assistance than without (AC1 0.42 [95% CI 0.17 to 0.68] versus 0.06 [95% CI 0.00 to 0.17], respectively). No clinically relevant differences were observed in time to arrive at a diagnosis (18 ± 12 seconds with CNN assistance versus 30 ± 27 seconds without; mean difference 12 seconds [95% CI 6 to 17]; p < 0.001) or in diagnostic confidence (7.2 ± 1.7 versus 6.2 ± 1.6 points on a 10-point Likert scale; mean difference 1.0 point [95% CI 0.5 to 1.3]; p < 0.001) for occult fractures.
Conclusion
CNN assistance improves physician diagnostic sensitivity and interobserver agreement for the diagnosis of occult scaphoid fractures. The differences observed in diagnostic speed and confidence are likely not clinically relevant. Despite these improvements in clinical diagnoses of scaphoid fractures with the CNN, it is unknown whether development and implementation of such models is cost effective.
Level of Evidence
Level II, diagnostic study.
Introduction
Scaphoid fractures account for a considerable proportion of hand trauma, with a mean annual incidence of 1.5 to 4.3 per 10,000 patients [2, 8, 11]. The scaphoid is the most commonly fractured carpal bone, accounting for 60% of all carpal fractures [8]. There is no consensus about scaphoid diagnostic imaging protocols, but most hospitals obtain a wrist radiograph and repeat radiographs in 10 to 14 days [6]. Some institutions use MRI, CT, or bone scintigraphy as the second-line imaging technique instead of repeat radiography [6]. Up to 20% of scaphoid fractures are not visible on initial injury radiographs, which is a diagnostic challenge to physicians [26]. Scaphoid nonunions from delays in diagnosis and inadequate treatment can lead to degenerative wrist arthritis, chronic wrist pain, and carpal collapse [9]. These are especially challenging to treat successfully; therefore, early and accurate diagnosis of scaphoid fractures is essential.
Studies have shown that even with repeat scaphoid radiographs 6 weeks after the injury, the positive predictive value of physician diagnosis is poor, ranging from 14% to 26% [14]. In addition, the interobserver agreement of follow-up radiographs in cases of suspected scaphoid fractures was poor to fair in a previous study [24]. To date, the level of training, observer experience, image presentation, and simplification of classification have been shown to have a minimal impact on the reliability and accuracy of diagnosing scaphoid fractures [14].
We previously developed a convolutional neural network (CNN) that can detect apparent scaphoid fractures with a sensitivity and specificity of 87% and 92%, respectively, and occult scaphoid fractures with a sensitivity and specificity of 79% and 72%, respectively [28].
Such artificial intelligence (AI) algorithms can augment work efficiency and increase clinical throughput by decreasing human error and fatigue. However, the high performance of a new machine-learning model in a controlled laboratory setting may not be translatable clinically if it does not integrate seamlessly into the current clinical workflow or fill a clinical gap. For example, a recent multicenter study developed a computerized decision support system that automatically analyzed fetal heart tracings and alerted physicians, but a randomized trial demonstrated that it failed to decrease perinatal operative delivery rates [3]. Similarly, it is unknown whether implementing the neural network we have developed will substantially improve the clinical detection of scaphoid fractures. Therefore, we asked: (1) Does CNN-assisted image interpretation improve interobserver agreement for scaphoid fractures? (2) What is the sensitivity and specificity of image interpretation performed with and without CNN assistance (as stratified by type: normal scaphoid, occult fracture, and apparent fracture)? (3) Does CNN assistance improve time to diagnosis and physician confidence level?
Patients and Methods
Study Design and Setting
Hand surgeons, plastic surgeons, emergency medicine physicians, rheumatologists, and internists were prospectively recruited using a crowd-sourcing methodology via email in the United States and Taiwan. Initial recruitment emails were distributed in February 2022, and two solicitation emails were sent in mid-February 2022 and March 2022. Resident physicians above Postgraduate Year 3 in the aforementioned specialties were recruited to ensure adequate competency in interpreting scaphoid radiographs. A short email describing the purpose of the study and providing a link to the internet-based study platform with scaphoid radiographs was sent to the participants. The survey was built using the Qualtrics software platform (Qualtrics XM) and collected participant demographic data, including age, gender, level of training, years of experience, medical specialty, hand fellowship training, practice setting, and baseline confidence level for reading scaphoid radiographs.
The survey included five normal scaphoid radiographs, five apparent fracture radiographs, and five occult fracture radiographs in a randomized order. Any identifiable patient information was removed from the radiographs. For each patient radiograph, a posteroanterior (PA) view, lateral view, and scaphoid view (PA with ulnar deviation) were displayed to the participant. The participants first interpreted the radiograph without any CNN input. They then proceeded immediately to the second half of the survey, which consisted of the same images in a different order in conjunction with CNN predictions. Before commencing the second portion of the survey, the participants were informed that the subsequent 15 images would be accompanied by CNN predictions. The CNN outputs included a categorical prediction of apparent fracture, occult fracture, or normal scaphoid and gradient-weighted class activation mapping isolating the fracture location on the scaphoid (Fig. 1).
Fig. 1. Example CNN-assisted radiographs presented to the survey participants. (A) The CNN first predicted the presence or absence of a fracture, then (B) displayed gradient-weighted class activation mapping of the fracture site.
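Although the study's model code is not reproduced here, gradient-weighted class activation mapping (Grad-CAM) is a standard technique. The minimal PyTorch sketch below illustrates how such a heat map can be produced; the backbone (resnet18), target layer, and input tensor are placeholder assumptions, not the study's architecture.

```python
# Minimal Grad-CAM sketch (PyTorch). The backbone, target layer, and input
# are illustrative assumptions; this is not the study's implementation.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None)  # stand-in backbone, not the study's CNN
model.eval()

activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["feat"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    gradients["feat"] = grad_output[0].detach()

# Hook the last convolutional block (the choice of layer is an assumption)
model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)  # placeholder for a preprocessed radiograph
scores = model(x)
scores[0, scores.argmax(dim=1).item()].backward()  # predicted-class gradient

# Grad-CAM: weight each feature map by its mean gradient, sum, then ReLU
weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)   # (1, C, 1, 1)
cam = F.relu((weights * activations["feat"]).sum(dim=1))     # (1, h, w)
cam = F.interpolate(cam.unsqueeze(1), size=x.shape[2:],
                    mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)     # map to [0, 1]
```

The normalized map can then be overlaid on the radiograph as a color heat map, as in Figure 1B.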
This study followed the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) reporting guidelines [17].
Participants
Potential participants were recruited from collaborators in the senior author’s (KCC) prior clinical trials, multiple departments at Chang Gung Medical Center (orthopaedics, plastic surgery, emergency medicine, family medicine, and rheumatology), and University of Michigan plastic surgery and emergency medicine residents above Postgraduate Year 3. A total of 68% (120 of 176) of those invited completed the survey and met the inclusion criteria. Of these, 69% (83 of 120) were attending physicians and 31% (37 of 120) were resident physicians or fellows. Most participants were plastic surgeons or orthopaedic surgeons, but diverse specialties were represented, including primary care and emergency medicine, which are integral in the diagnosis of and referral for scaphoid fractures (Table 1). Most participants were men (74% [89 of 120]), and participants reported a mean baseline confidence of 6.8 of 10 (10 being the most confident) in diagnosing scaphoid fractures.
Table 1.
Participant descriptive statistics (n = 120)
| Category | Value |
| --- | --- |
| Age in years, mean ± SD | 38 ± 10 |
| Gender, % (n) | |
| Men | 74 (89) |
| Women | 22 (26) |
| Declined to answer | 4 (5) |
| Level of training, % (n) | |
| Attending physicians | 69 (83) |
| Residents or fellows | 31 (37) |
| Years of experience, mean ± SD | 9 ± 8 |
| Specialty, % (n) | |
| Plastic surgery | 43 (52) |
| Orthopaedic surgery | 30 (36) |
| Primary care | 7 (8) |
| General surgery | 1 (1) |
| Emergency medicine | 12 (14) |
| Rheumatology | 8 (9) |
| Hand fellowship, % (n) | |
| Yes | 31 (37) |
| No | 69 (83) |
| Practice setting, % (n) | |
| Academic | 73 (88) |
| Private | 21 (25) |
| Other | 6 (7) |
Scaphoid Radiographs
All 15 included radiographs were selected from scaphoid radiographs obtained between 2000 and 2020 at the senior author’s (KCC) institution. All included radiographs were interpreted by a musculoskeletal radiologist at the time of injury and by a hand surgeon (KCC) during the study period to ensure ground truth accuracy (presence or absence of fracture). The 15 selected radiographs were in Digital Imaging and Communications in Medicine (DICOM) format for CNN interpretation and were converted to high-resolution JPEG images for compatibility with Qualtrics. All occult fractures were initially missed by the radiologist on plain radiographs but subsequently confirmed by another radiologist via repeat radiographs, CT images, or MR images. The initial radiograph was then examined by the senior author (KCC), and the fracture was confirmed as occult.
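As a hypothetical illustration of this conversion step (the authors do not describe their tooling, so the library choices, windowing, and filenames below are assumptions), a DICOM file can be exported to an 8-bit JPEG as follows:

```python
# Hypothetical DICOM-to-JPEG conversion sketch using pydicom and Pillow;
# the filename and min-max windowing are illustrative assumptions.
import numpy as np
import pydicom
from PIL import Image

ds = pydicom.dcmread("scaphoid_pa_view.dcm")   # placeholder filename
arr = ds.pixel_array.astype(np.float32)

# Scale the raw pixel values to 8-bit grayscale for JPEG export
arr = (arr - arr.min()) / (arr.max() - arr.min() + 1e-8)
img = Image.fromarray((arr * 255).astype(np.uint8), mode="L")
img.save("scaphoid_pa_view.jpg", quality=95)    # high-resolution JPEG
```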
Outcomes
The primary outcomes of this study were sensitivity and specificity of physician diagnoses for scaphoid fractures on radiographs, with and without CNN assistance. Additionally, we measured the interobserver reliability among physicians in detecting normal scaphoids, apparent scaphoid fractures, and occult scaphoid fractures. Secondary outcomes included time to diagnosis and self-assessed confidence in diagnosis with and without CNN assistance.
Ethical Approval
Our institutional review board approved this study and deemed it as no more than minimal risk (HUM00171115).
Statistical Analysis
Interobserver agreement among physicians was calculated using the Gwet agreement coefficient (AC1). The Fleiss κ statistic, frequently used to describe the agreement among observers’ interpretations [10], is known to be paradoxically low for rare observations or extremely low prevalence [25]. Because of this, the κ statistic yielded paradoxically low values for detection of apparent scaphoid fractures, because more than 97% of participants interpreted the radiographs correctly; therefore, we elected to use AC1 instead, which is more robust for low-prevalence scenarios [7]. AC1 values between 0.01 and 0.20 indicate slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 0.99, near-perfect agreement [15]. Although these descriptors offer an intuitive way to interpret interobserver agreement, they must also be interpreted in the context of confidence intervals and bias [20]. Sensitivity and specificity were assessed according to standard formulas. We determined 95% CIs using binomial confidence limits. Time to diagnosis was measured by the mean length of time the participant spent on a single case during the survey. Diagnostic confidence level was self-assessed with a 10-point Likert scale, with 1 being the least confident and 10 being the most confident. A two-tailed α of 0.05 was the criterion for significance. Statistical analyses were performed using R version 4.2.1 (the R Foundation). A post hoc power analysis was performed, and a sample size of 120 in each group had 98% power to detect a difference in means of 0.11, assuming the common standard deviation is 0.2, using a two-group t-test with a 0.05 two-sided significance level.
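As an illustration of the key calculations (the study used R, so this Python sketch with simulated ratings and invented counts is only an assumption-laden rendering, not the authors' code), Gwet's AC1, an exact binomial CI, and the post hoc power computation can be reproduced as follows:

```python
# Illustrative sketch of the statistical methods; ratings are simulated and
# the counts passed to clopper_pearson are invented, not the study's data.
import numpy as np
from scipy.stats import beta
from statsmodels.stats.power import TTestIndPower

def gwet_ac1(ratings):
    """Gwet's first-order agreement coefficient (AC1) for complete data.
    ratings: (n_subjects, n_raters) array of categorical labels."""
    ratings = np.asarray(ratings)
    n, r = ratings.shape
    cats = np.unique(ratings)
    # counts[i, k]: number of raters assigning subject i to category k
    counts = np.stack([(ratings == c).sum(axis=1) for c in cats], axis=1)
    # Observed agreement: mean pairwise agreement per subject
    pa = ((counts * (counts - 1)).sum(axis=1) / (r * (r - 1))).mean()
    # Chance agreement under Gwet's model
    pi_k = counts.mean(axis=0) / r
    pe = (pi_k * (1 - pi_k)).sum() / (cats.size - 1)
    return (pa - pe) / (1 - pe)

def clopper_pearson(successes, trials, alpha=0.05):
    """Exact binomial CI, as used for sensitivity and specificity."""
    lo = beta.ppf(alpha / 2, successes, trials - successes + 1) if successes else 0.0
    hi = beta.ppf(1 - alpha / 2, successes + 1, trials - successes) if successes < trials else 1.0
    return lo, hi

# Simulated ratings: 15 radiographs x 120 raters, three categories
rng = np.random.default_rng(42)
truth = np.repeat([0, 1, 2], 5)   # 0 = normal, 1 = apparent, 2 = occult
ratings = np.where(rng.random((15, 120)) < 0.8,
                   truth[:, None], rng.integers(0, 3, (15, 120)))
print(f"AC1 = {gwet_ac1(ratings):.2f}")
print("example sensitivity CI:", clopper_pearson(successes=430, trials=600))

# Post hoc power: two-group t-test, d = 0.11 / 0.2, n = 120 per group
power = TTestIndPower().power(effect_size=0.11 / 0.2, nobs1=120, alpha=0.05)
print(f"power = {power:.2f}")   # roughly 0.98-0.99, matching the reported 98%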
Results
Interobserver Agreement
The interobserver agreement for all 15 radiographs among participants with and without CNN assistance was not different (AC1 0.67 [95% CI 0.48 to 0.86] versus 0.44 [95% CI 0.20 to 0.69], respectively). A subgroup analysis showed that using the CNN improved interobserver reliability for occult scaphoid fractures (AC1 0.42 [95% CI 0.17 to 0.68] versus 0.06 [95% CI 0.00 to 0.17]) but not for apparent fractures (AC1 0.98 [95% CI 0.95 to 1.00] versus 0.95 [95% CI 0.87 to 1.00]) or normal scaphoids (AC1 0.75 [95% CI 0.38 to 1.00] versus 0.50 [95% CI 0.17 to 0.84]) (Table 2).
Table 2.
Interobserver agreement
| | Without CNN (n = 120), AC1^a (95% CI) | With CNN (n = 120), AC1^a (95% CI) |
| --- | --- | --- |
| Overall | 0.44 (0.20 to 0.69) | 0.67 (0.48 to 0.86) |
| Apparent fracture | 0.95 (0.87 to 1.00) | 0.98 (0.95 to 1.00) |
| Normal scaphoid | 0.50 (0.17 to 0.84) | 0.75 (0.38 to 1.00) |
| Occult fracture | 0.06 (0.00 to 0.17) | 0.42 (0.17 to 0.68) |

^a Gwet agreement coefficient (AC1).
Sensitivity and Specificity of Diagnosis
The sensitivity and specificity of scaphoid fracture diagnosis without CNN were 0.72 (95% CI 0.69 to 0.74) and 0.78 (95% CI 0.74 to 0.81), respectively. Sensitivity and specificity with CNN assistance were 0.87 (95% CI 0.85 to 0.89) and 0.76 (95% CI 0.72 to 0.80), respectively (Table 3). The only clinically relevant difference in sensitivity was for occult fractures (0.75 [95% CI 0.72 to 0.79] with CNN versus 0.46 [95% CI 0.42 to 0.50] without CNN).
Table 3.
Physician diagnostic performance
| | Sensitivity, without CNN (95% CI) | Sensitivity, with CNN (95% CI) | Specificity, without CNN (95% CI) | Specificity, with CNN (95% CI) |
| --- | --- | --- | --- | --- |
| Overall | 0.72 (0.69 to 0.74) | 0.87 (0.85 to 0.89) | 0.78 (0.74 to 0.81) | 0.76 (0.72 to 0.80) |
| Apparent fracture | 0.97 (0.96 to 0.99) | 0.99 (0.98 to 1.00) | NA | NA |
| Normal scaphoid | NA | NA | 0.78 (0.74 to 0.81) | 0.80 (0.76 to 0.83) |
| Occult fracture | 0.46 (0.42 to 0.50) | 0.75 (0.72 to 0.79) | NA | NA |
Time to Diagnosis and Diagnostic Confidence
We found no clinically relevant differences in time to diagnosis without or with CNN assistance for apparent fractures (15 ± 16 seconds versus 11 ± 5 seconds, mean difference 4 [95% CI 1 to 7]; p = 0.004), occult fractures (30 ± 27 seconds versus 18 ± 12 seconds, mean difference 12 [95% CI 6 to 17]; p < 0.001), or normal scaphoids (26 ± 17 seconds versus 20 ± 30 seconds, mean difference 6 [95% CI 0 to 13]; p = 0.04) (Table 4). There were no clinically relevant differences in diagnostic confidence level for apparent fractures (Likert scale 8.8 ± 1.2 versus 9.2 ± 0.8, mean difference 0.4 [95% CI 0.2 to 0.7]; p = 0.001), occult fractures (Likert scale 6.2 ± 1.6 versus 7.2 ± 1.7, mean difference 1.0 [95% CI 0.5 to 1.3]; p < 0.001), or normal scaphoids (Likert scale 6.7 ± 1.5 versus 8.0 ± 1.4, mean difference 1.3 [95% CI 0.9 to 1.6]; p < 0.001) (Table 4).
Table 4.
Time to diagnosis and confidence level
| | Without CNN (n = 120) | With CNN (n = 120) | Mean difference (95% CI) | p value |
| --- | --- | --- | --- | --- |
| Time to diagnosis in seconds | | | | |
| Apparent fracture | 15 ± 16 | 11 ± 5 | 4 (1 to 7) | 0.004 |
| Occult fracture | 30 ± 27 | 18 ± 12 | 12 (6 to 17) | < 0.001 |
| Normal | 26 ± 17 | 20 ± 30 | 6 (0 to 13) | 0.04 |
| Confidence level^a | | | | |
| Apparent fracture | 8.8 ± 1.2 | 9.2 ± 0.8 | 0.4 (0.2 to 0.7) | 0.001 |
| Occult fracture | 6.2 ± 1.6 | 7.2 ± 1.7 | 1.0 (0.5 to 1.3) | < 0.001 |
| Normal | 6.7 ± 1.5 | 8.0 ± 1.4 | 1.3 (0.9 to 1.6) | < 0.001 |

Data are presented as mean ± SD. ^a Physician confidence level for each diagnosis per case on a 10-point Likert scale (1 = least confident; 10 = most confident).
Discussion
Plain radiographs continue to be the most common first-line method to diagnose scaphoid fractures [6]. However, scaphoid fractures present a diagnostic challenge because up to 20% are invisible on the initial radiograph [26]. CNNs have been successfully implemented for many medical image classification problems, including cardiac function assessment [5, 18], lung cancer screening [1], and mammography interpretations [16, 27]. We trained a deep CNN to improve performance and interobserver reliability among physicians interpreting scaphoid radiographs [28]. High-fidelity model performance does not guarantee success in the clinical setting; therefore, we aimed to evaluate the performance of our CNN-based tool in assisting physician diagnoses. A previous study has demonstrated poor interobserver reliability for scaphoid radiographs even 6 weeks after the initial injury [14]. We found that interobserver reliability improved when physicians interpreted occult scaphoid radiographs with CNN assistance. Physician diagnostic performance of scaphoid fractures in radiographs improved with CNN assistance, as evidenced by higher sensitivity for occult fractures and decreased time to diagnosis.
Limitations
First, some degree of recall bias is possible because the same 15 scaphoid radiographs were used in succession to assess changes in physician performance with CNN assistance. However, this recall bias was minimized by rearranging the order of radiographs for the second portion of the survey. In addition, the correct diagnoses based on the scaphoid radiographs were not revealed at any point during the first portion of the survey. Recall bias could have been further minimized if the images with and without CNN input had been collectively randomized. Because most of the recruited participants were experts in hand surgery who were familiar with scaphoid fractures, the true effects of the CNN on physician diagnostic performance are likely higher in the real world. Although approximately 30% of participants were in primary care, rheumatology, or emergency medicine, most were plastic and orthopaedic surgeons, so this study may lack generalizability. This survey experiment did not include any radiologists because the scaphoid CNN was originally designed to assist emergency medicine physicians, primary care physicians, and hand surgeons, who most frequently treat patients with scaphoid fractures and who need to make decisions based on the radiographs before a formal radiology read. Nonetheless, the absence of radiologists among the study participants is a limitation, although given the high level of expertise among our participants, including radiologists likely would not have altered our findings. In this study, 31% of participants were resident physicians with lower levels of expertise, but inclusion was limited to senior residents (Postgraduate Year 3 or higher) in specialties that regularly manage scaphoid fractures. The heterogeneous group of physicians from different specialties, training levels, and countries likely decreased the interobserver agreement compared with a study conducted at a single institution among surgeons of similar experience levels. Finally, the survey software did not permit physicians to modulate the images’ brightness, contrast, or magnification levels, which could have affected diagnostic performance. However, all radiographic images uploaded to the survey were derived from high-resolution DICOM images, and the software limitations were present for both portions (without and with scaphoid CNN input) of the survey.
Interobserver Agreement
Scaphoid CNN assistance improved interobserver agreement primarily for the detection of occult scaphoid fractures. Interobserver reliability without CNN assistance for diagnosing scaphoid fractures was low to moderate and within the ranges reported in previous studies [13, 14, 23]. The slightly higher overall baseline interobserver agreement is likely attributable to the high expertise of the participant group: nearly 70% of the participants were fully trained attending physicians, and 73% were orthopaedic or plastic surgeons with extensive training in hand surgery. Despite this, interobserver reliability increased from 0.44 to 0.67 with CNN assistance. Given the advanced expertise of the participant group and the extremely high interobserver agreement for apparent fractures with or without CNN assistance (AC1 > 0.95), our results likely underestimate the CNN’s true effect on interobserver reliability; when physicians who are less familiar with scaphoid fractures use the CNN, the difference will likely be larger. Considering the participants’ extensive experience with scaphoid fractures, the extremely high sensitivity of 97% for apparent scaphoid fractures without CNN assistance was not surprising. These results suggest that the scaphoid CNN’s true clinical impact lies in the detection of occult scaphoid fractures.
Sensitivity and Specificity of Diagnosis
The sensitivity of detecting occult fractures was improved by the CNN. However, likely because of the participants’ high expertise in hand surgery, the CNN did not affect the sensitivity and specificity for the detection of normal scaphoids or apparent scaphoid fractures, and the true improvements in detection sensitivity and specificity across all physicians are likely underestimated in the current study. Even among an experienced group of participants, the CNN improved the detection performance for occult scaphoid fractures. This is a testament to the difficulty of detecting occult scaphoid fractures on initial radiographs, which can be missed by expert hand surgeons and radiologists. The CNN erroneously classified one of the normal scaphoid images as fractured, which decreased the model’s specificity. For conditions such as occult scaphoid fractures, which cannot be easily detected on radiographs, lowering the model’s decision threshold (accepting more false positives to avoid missed fractures) is more clinically beneficial than the reverse. This is a critical observation, underscoring the importance of fidelity in AI models designed to complement physicians: errors in the model can decrease physician performance. Legal or ethical frameworks in line with recent proposals about the regulation of AI must be applied to potential medical errors induced by misinterpretation of AI [19]. Physician liability when adopting or rejecting an algorithm’s recommendations is a necessary area of future work in this field. The emerging frameworks for AI applications in healthcare that encompass risks, benefits, ethics, and accountability should be closely adhered to, and physicians applying AI in clinical practice should be knowledgeable about these guidelines [12]. A recent study showed that AI-assisted clinical experts outperform both humans and AI alone when detecting lung malignancies [21]. However, to achieve such synergistic performance in the clinical setting, the model’s accuracy and reliability must be thoroughly validated using datasets from multiple sources.
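To make the threshold trade-off concrete, the synthetic sketch below (the score distributions are invented, not the study's data) shows how lowering a model's decision threshold raises sensitivity at the cost of specificity:

```python
# Synthetic illustration of threshold selection; scores are simulated and
# do not reflect the study's model outputs.
import numpy as np

rng = np.random.default_rng(7)
fracture_scores = rng.normal(0.65, 0.15, 1000)  # true occult fractures
normal_scores = rng.normal(0.40, 0.15, 1000)    # true normal scaphoids

def operating_point(threshold):
    sensitivity = (fracture_scores >= threshold).mean()
    specificity = (normal_scores < threshold).mean()
    return sensitivity, specificity

for t in (0.60, 0.50, 0.45):  # progressively lower thresholds
    sens, spec = operating_point(t)
    print(f"threshold {t:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

For a fracture that is costly to miss, the operating point would be pushed toward the lower thresholds, trading specificity for sensitivity.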
Time to Diagnosis and Diagnostic Confidence
Assistance from the CNN-based tool shortened diagnosis times and strengthened physician confidence for diagnosing scaphoid fractures; although the differences observed are statistically significant, these findings are likely not clinically relevant. The mean differences in time to diagnosis were between 4 and 12 seconds, and those in confidence level were between 0.4 and 1.3 points on a 10-point Likert scale, which are small differences. In large academic practices, approximately 40% of all inpatient imaging examinations are marked as requiring immediate attention [4]; therefore, it is impossible for radiologists to interpret every radiograph in real time. Neural networks can preanalyze high-volume wrist radiographs to prioritize cases to prevent observer fatigue, in addition to extracting information in images not apparent to the naked eye [22]. Frequently, physicians are tasked with making management decisions for patients with scaphoid fractures before a radiologist makes a final analysis. For primary care physicians, the decision might be whether to obtain advanced imaging such as MRI or send the patient to the emergency room. For a hand surgeon, it is usually a decision between cast immobilization and surgery. Improvement in the speed and accuracy of scaphoid fracture diagnoses may not only enhance clinic workflow but also expedite patient care. However, the minimal differences in diagnostic confidence and speed noticed in this study will result in only a modest improvement in workflow and should not be the primary reason for adopting the CNN clinically.
Conclusion
AI algorithms such as CNNs can improve physician diagnostic performance when detecting occult scaphoid fractures on plain radiographs. Interobserver agreement also improved with CNN assistance for ascertaining occult scaphoid fractures on radiographs. These findings suggest that AI can help bridge the clinical gap posed by challenges in diagnosing occult fractures and improve patient care by decreasing time to diagnosis, improving physician confidence, and enabling early diagnosis of occult fractures. Future studies should continue to improve the model’s performance, especially for occult scaphoid fractures, and consider the cost-effectiveness of implementing this tool clinically in the context of the potential clinical benefit. AI will likely be an important diagnostic adjunct tool for physicians in the future and revolutionize musculoskeletal medicine.
Footnotes
The institution of one or more of the authors (APY) has received, during the study period, funding from the National Endowment for Plastic Surgery Grant from the Plastic Surgery Foundation (694527). The institution of one or more of the authors (KCC) has received, during the study period, funding from the National Institutes of Health and from Sonex Health. One of the authors (KCC) certifies receipt of personal payments or benefits, during the study period, in an amount of USD 10,000 to USD 100,000 from Wolters Kluwer and in an amount of USD 10,000 to USD 100,000 from Elsevier.
All ICMJE Conflict of Interest Forms for authors and Clinical Orthopaedics and Related Research® editors and board members are on file with the publication and can be viewed on request.
Ethical approval for this study was deemed exempt by the institutional review board of the University of Michigan Medical School (HUM00171115).
This work was performed at the University of Michigan Medical School, Ann Arbor, MI, USA.
Contributor Information
Alfred P. Yoon, Email: alfredy@med.umich.edu.
William T. Chung, Email: wtomokic@med.umich.edu.
Chien-Wei Wang, Email: chienwew@med.umich.edu.
Chang-Fu Kuo, Email: zandis@gmail.com.
Chihung Lin, Email: lin3031@gmail.com.
References
- 1.Ardila D, Kiraly AP, Bharadwaj S, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med. 2019;25:954-961. [DOI] [PubMed] [Google Scholar]
- 2.Barton NJ. Twenty questions about scaphoid fractures. J Hand Surg Br. 1992;17:289-310. [DOI] [PubMed] [Google Scholar]
- 3.Belfort MA, Saade GR, Thom EA. Intrapartum fetal ECG ST-segment analysis. N Engl J Med. 2015;373:2480-2481. [DOI] [PubMed] [Google Scholar]
- 4.Chan KT, Carroll T, Linnau KF, Lehnert B. Expectations among academic clinicians of inpatient imaging turnaround time: does it correlate with satisfaction? Acad Radiol. 2015;22:1449-1456. [DOI] [PubMed] [Google Scholar]
- 5.Ghorbani A, Ouyang D, Abid A, et al. Deep learning interpretation of echocardiograms. NPJ Digit Med. 2020;3:10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Groves AM, Kayani I, Syed R, et al. An international survey of hospital practice in the imaging of acute scaphoid trauma. AJR Am J Roentgenol. 2006;187:1453-1456. [DOI] [PubMed] [Google Scholar]
- 7.Gwet KL. Computing inter‐rater reliability and its variance in the presence of high agreement. Br J Math Stat Psychol. 2008;61:29-48. [DOI] [PubMed] [Google Scholar]
- 8.Hove LM. Epidemiology of scaphoid fractures in Bergen, Norway. Scand J Plast Reconstr Surg Hand Surg. 1999;33:423-426. [DOI] [PubMed] [Google Scholar]
- 9.Kawamura K, Chung KC. Treatment of scaphoid fractures and nonunions. J Hand Surg Am. 2008;33:988-997. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Landis JR, Koch GG. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics. 1977;33:363-374. [PubMed] [Google Scholar]
- 11.Larsen CF, Brøndum V, Skov O. Epidemiology of scaphoid fractures in Odense, Denmark. Acta Orthop Scand. 1992;63:216-218. [DOI] [PubMed] [Google Scholar]
- 12.Lekadir K, Quaglio G, Garmendia AT, Gallin C. Artificial intelligence in healthcare: applications, risks, and ethical and societal impacts. Available at: https://www.europarl.europa.eu/thinktank/en/document/EPRS_STU(2022)729512. Accessed February 2, 2023.
- 13.Low G, Raby N. Can follow-up radiography for acute scaphoid fracture still be considered a valid investigation? Clin Radiol. 2005;60:1106-1110. [DOI] [PubMed] [Google Scholar]
- 14.Mallee WH, Mellema JJ, Guitton TG, et al. 6-week radiographs unsuitable for diagnosis of suspected scaphoid fractures. Arch Orthop Trauma Surg. 2016;136:771-778. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22:276-282. [PMC free article] [PubMed] [Google Scholar]
- 16.McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020;577:89-94. [DOI] [PubMed] [Google Scholar]
- 17.Mongan J, Moy L, Kahn CE, Jr. Checklist for artificial intelligence in medical imaging (CLAIM): a guide for authors and reviewers. Radiol Artif Intell. 2020;2:e200029. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Ouyang D, He B, Ghorbani A, et al. Video-based AI for beat-to-beat assessment of cardiac function. Nature. 2020;580:252-256. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Price WN, 2nd, Gerke S, Cohen IG. Potential liability for physicians using artificial intelligence. JAMA. 2019;322:1765-1766. [DOI] [PubMed] [Google Scholar]
- 20.Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005;85:257-268. [PubMed] [Google Scholar]
- 21.Sim Y, Chung MJ, Kotter E, et al. Deep convolutional neural network-based software improves radiologist detection of malignant lung nodules on chest radiographs. Radiology. 2020;294:199-209. [DOI] [PubMed] [Google Scholar]
- 22.Thrall JH, Li X, Li Q, et al. Artificial intelligence and machine learning in radiology: opportunities, challenges, pitfalls, and criteria for success. J Am Coll Radiol. 2018;15:504-508. [DOI] [PubMed] [Google Scholar]
- 23.Tiel-van Buul MM, van Beek EJ, Borm JJ, Gubler FM, Broekhuizen AH, Royen EAV. The value of radiographs and bone scintigraphy in suspected scaphoid fracture: a statistical analysis. J Hand Surg Br. 1993;18:403-406. [DOI] [PubMed] [Google Scholar]
- 24.Tiel-van Buul MM, van Beek EJ, Broekhuizen AH, Nooitgedacht EA, Davids PH, Bakker AJ. Diagnosing scaphoid fractures: radiographs cannot be used as a gold standard! Injury. 1992;23:77-79. [DOI] [PubMed] [Google Scholar]
- 25.Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37:360-363. [PubMed] [Google Scholar]
- 26.Waeckerle JF. A prospective study identifying the sensitivity of radiographic findings and the efficacy of clinical findings in carpal navicular fractures. Ann Emerg Med. 1987;16:733-737. [DOI] [PubMed] [Google Scholar]
- 27.Wu N, Phang J, Park J, et al. Deep neural networks improve radiologists' performance in breast cancer screening. IEEE Trans Med Imaging. 2020;39:1184-1194. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Yoon AP, Lee YL, Kane RL, Kuo CF, Lin C, Chung KC. Development and validation of a deep learning model using convolutional neural networks to identify scaphoid fractures in radiographs. JAMA Netw Open. 2021;4:e216096. [DOI] [PMC free article] [PubMed] [Google Scholar]
