Abstract
Purpose
Olecranon fractures are commonly classified using the widely accepted Mayo classification. Its reliability has previously been analyzed on radiographs. Because of joint involvement, a CT scan is often obtained. The purpose of this study was to evaluate the intra- and interobserver reliability of the Mayo classification based on CT examination.
Methods
Radiographic and CT images of 20 olecranon fractures were classified by four surgeons at two time points 30 days apart. Intra- and interobserver reliability were assessed using kappa coefficients.
Results
Mean intraobserver reliability between X-rays was substantial and between CTs almost perfect (0.76 and 0.82, respectively). Mean interobserver reliability was fair for X-rays and moderate for CTs (0.32 and 0.44, respectively).
Conclusion
Despite the more detailed imaging compared with radiography, only moderate interobserver reliability was found for the classification of olecranon fractures based on CT imaging. This might lead to inconsistent fracture classification in both scientific and clinical settings.
Keywords: Olecranon fractures, Mayo classification, Intraobserver reliability, Interobserver reliability, CT-scans
1. Introduction
Fracture classifications assist in grading injury severity, selecting adequate therapy with the best functional outcome, and estimating prognosis, and they improve the international comparability of study results.1
The Mayo classification by Cabanela and Morrey was introduced in 1993. The system is based on displacement, fracture comminution and stability of the ulnohumeral joint.2,3 Three groups exist, each with two subgroups (Fig. 1). Undisplaced fractures belong to group I, displaced fractures to group II, and fracture dislocations to group III. The subgroups indicate the presence of comminution: A denotes noncomminuted simple fractures and B multifragmentary fracture patterns.3 The classification is characterized by rather simple grouping and therapy recommendations and is frequently used among upper extremity surgeons.4,5 Functional outcome correlates with injury severity.6
Fig. 1.
Illustration of the Mayo classification for olecranon fractures.
Olecranon fractures are a frequent injury and account for 10 % of all elbow fractures. Fracture management is based upon patients' functional demand and radiological examination.7 Since fracture patterns can be more complex, a CT scan is often performed in clinical practice.
The reliability of the classification has already been investigated using radiographs.8,9
Hence, the aim of this study was to investigate the intra- and interobserver reliability of the Mayo classification for olecranon fractures primarily based on CT imaging. We hypothesized that reliability would be superior to that based on radiographs.
2. Materials & methods
2.1. Patients
A database search of patients treated at our hospital for isolated olecranon fractures was conducted. Inclusion criteria were a complete radiological set of images with both radiographs and CT scans. Cases with concomitant injuries such as Monteggia-(like) injuries, or non-evaluable radiographs were excluded. The radiographs had to consist of validated strictly lateral and anterior-posterior views to ensure adequate quality for classification. Based on a power analysis our database was searched until 20 cases were enrolled (see 2.3 Statistical analysis).
The CT scans were presented to four investigators at two different time points. The CT scans were completely anonymized and presented in randomized order. The investigators included two experienced upper extremity surgeons (O3 and O4) and two residents in orthopaedic trauma surgery with six years of experience (O1 and O2). Before the CT scans were presented, training was provided to refresh the investigators' knowledge of the Mayo classification. For this purpose, typical injury images were shown both as radiographs and as CT scans together with the classification scheme. This was followed by the evaluation of the 20 fractures. The investigators were not allowed to share their assessments with each other or to use the scheme during evaluation.
2.2. Imaging review
The CT scans were performed on a dual-energy CT scanner (IQon, Philips Healthcare, Best, The Netherlands). The CT images were acquired with a slice thickness of 0.4–0.7 mm (voxel dimensions of 0.4–0.7 mm in the transverse plane, 0.22–0.42 mm in the coronal plane, 0.22–0.42 mm in the sagittal plane). CT scans were provided with the ability to scroll through all planes (axial, coronal, sagittal) and 3D reconstructions using a digital image database (IMPAX EE R20 XVIII, AGFA HealthCare, Bonn, Germany). After the initial assessment (d1), the CT scans were resubmitted in a different order 30 days later, and a second assessment was performed (d2). In addition, on d1 and d2, the corresponding radiographs in two planes were also evaluated using the Mayo classification. The radiographs were shown independently of the CT scans.
All analyses performed in our study involving X-rays and CT scans from patients were in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Institutional ethics committee approval was given prior to this study (VT 21-1248). Informed consent was not required.
2.3. Statistical analysis
To calculate the required sample size for the kappa analysis between four observers a power analysis was performed with an expected kappa of 0.5 (moderate) ± 0.25 between observers or time points, respectively. For a power of 0.80 the analysis resulted in 20 cases in total. The sample size is comparable to other studies investigating intra- and interobserver reliability in fracture classification.8,10,11
The weighted kappa was used to calculate intraobserver reliability between the two time points (d1 and d2). To calculate interobserver reliability of the Mayo classification, we used the Fleiss kappa coefficient. Fleiss kappa is a validated statistical measure used to assess the agreement of more than two raters. Kappa takes the value one at total agreement. A negative kappa indicates that the agreement found is lower than that expected by chance. In general, values of the kappa coefficient are interpreted as follows: 0.81 to 1.00, almost perfect; 0.61 to 0.80, substantial; 0.41 to 0.60, moderate; 0.21 to 0.40, fair; 0.00 to 0.20, slight; and <0.00, poor.8,12
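The interpretation bands quoted above can be sketched as a small lookup. This is only an illustration: the function name `landis_koch_label` is hypothetical, and because the bands are stated to two decimals, the sketch assumes that values falling between two bands (for example 0.205) belong to the higher band.

```python
def landis_koch_label(kappa: float) -> str:
    """Return the verbal category for a kappa coefficient.

    Bands follow the ranges quoted in the text (Landis and Koch).
    Assumption of this sketch: values between two stated bands,
    such as 0.205, are assigned to the higher band.
    """
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Mean values reported in the abstract of this study.
reported_means = {
    "intraobserver X-ray": 0.76,
    "intraobserver CT": 0.82,
    "interobserver X-ray": 0.32,
    "interobserver CT": 0.44,
}
labels = {name: landis_koch_label(value) for name, value in reported_means.items()}
```

Applied to the reported means, the lookup reproduces the categories used in the abstract (substantial, almost perfect, fair and moderate, respectively).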
3. Results
The dataset included CT scans of eight male and 12 female patients. Mean age was 52.1 years (range: 25–79).
Table 1 summarizes, for each investigator, the intraobserver reliability of the X-ray-based and of the CT-based classification between the first and second evaluation day, as well as the agreement between X-ray-based and CT-based classification at d1 and at d2.
Table 1.
Weighted Kappa coefficients for intraobserver reliability between the two time points (d1, d2, 30 days apart) and between X-ray and CT using the Mayo classification.
| Observer | X-ray (d1) / X-ray (d2) | CT (d1) / CT (d2) | X-ray (d1) / CT (d1) | X-ray (d2) / CT (d2) |
|---|---|---|---|---|
| O1 | 0.72 (0.47–0.96) | 0.83 (0.62–1.03) | 0.58 (0.33–0.83) | 0.66 (0.41–0.9) |
| O2 | 1 (1–1) | 1 (1–1) | 0.85 (0.69–1.02) | 0.85 (0.69–1.02) |
| O3 | 0.6 (0.37–0.82) | 0.78 (0.6–0.96) | 0.46 (0.2–0.73) | 0.76 (0.54–0.98) |
| O4 | 0.75 (0.56–0.94) | 0.70 (0.47–0.94) | 0.56 (0.34–0.78) | 0.63 (0.39–0.87) |
| Mean | 0.76 | 0.82 | 0.61 | 0.72 |
d1 = time point 1; d2 = time point 2; brackets include 95 % confidence intervals; CT=Computed Tomography; O1 and O2 = six-year residents; O3 and O4 = senior upper extremity surgeons.
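As an illustration of how a weighted kappa of the kind reported in Table 1 is obtained, the following sketch compares two rating sessions of one observer. It rests on assumptions not stated in the paper: linear disagreement weights, and an ordinal ordering of the Mayo types from IA to IIIB. The ratings are invented toy data, not the study data, and the names `weighted_kappa` and `mayo_codes` are hypothetical.

```python
from typing import List, Sequence

# Ordinal order of the Mayo types assumed for weighting (assumption of this sketch).
MAYO_TYPES = ["IA", "IB", "IIA", "IIB", "IIIA", "IIIB"]


def mayo_codes(labels: Sequence[str]) -> List[int]:
    """Convert Mayo type labels to integer codes 0..5."""
    return [MAYO_TYPES.index(label) for label in labels]


def weighted_kappa(first: Sequence[int], second: Sequence[int], n_categories: int) -> float:
    """Linearly weighted Cohen's kappa for two ratings of the same cases."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("both sessions must rate the same non-empty set of cases")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    n = len(first)
    k = n_categories
    # Observed joint proportions of (session 1, session 2) ratings.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # linear disagreement weight
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]
    if disagree_exp == 0.0:
        # Every rating is the same single category; kappa is undefined,
        # this sketch treats the case as perfect agreement.
        return 1.0
    return 1.0 - disagree_obs / disagree_exp


# Invented toy ratings of six cases at d1 and d2 (not the study data).
d1 = mayo_codes(["IA", "IB", "IIA", "IIB", "IIB", "IIIB"])
d2 = mayo_codes(["IA", "IB", "IIB", "IIB", "IIA", "IIIB"])
kappa_d1_d2 = weighted_kappa(d1, d2, len(MAYO_TYPES))
```

With linear weights, confusing neighbouring types (for example IIA and IIB) is penalized less than confusing distant types (for example IA and IIIB); quadratic weights would be an equally plausible choice.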
The highest reliability was found between CT-based classification at d1 and d2 (weighted kappa = 0.82) and the lowest results were obtained between X-rays and CTs on d1 (weighted kappa = 0.61).
Table 2 shows the Fleiss kappa values for interobserver reliability for X-rays and CT scans at d1 and d2, respectively. The best agreement was obtained between CT-based classification at d1 (Fleiss kappa = 0.47). The worst agreement was found between X-rays at d1 (Fleiss kappa = 0.25).
Table 2.
Fleiss kappa coefficients for interobserver reliability at the two time points (d1, d2, 30 days apart) using the Mayo classification.
| Observation point | x-ray | CT |
|---|---|---|
| d1 | 0.25 (0.15–0.35) | 0.47 (0.36–0.57) |
| d2 | 0.39 (0.29–0.49) | 0.42 (0.31–0.52) |
| Mean | 0.32 | 0.44 |
d1 = time point 1; d2 = time point 2; brackets include 95 % confidence intervals; CT=Computed Tomography; 4 observers involved at each timepoint.
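The Fleiss kappa values in Table 2 summarize agreement among all four observers at once. A minimal sketch of the calculation is given below; the function name `fleiss_kappa` is hypothetical, and the count matrix is invented toy data with two categories, not the study data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss kappa from a cases-by-categories count matrix.

    counts[i][j] is the number of raters who assigned case i to category j.
    Every case must be rated by the same number of raters.
    """
    n_cases = len(counts)
    if n_cases == 0:
        raise ValueError("at least one case is required")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of raters")
    n_categories = len(counts[0])
    # Share of all assignments that fell into each category.
    p_category = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Proportion of agreeing rater pairs for each case.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_case) / n_cases
    p_expected = sum(p * p for p in p_category)
    if p_expected == 1.0:
        # All ratings fall into one category; kappa is undefined,
        # this sketch treats the case as perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented toy example: 3 cases, 4 raters, 2 categories (not the study data).
toy_counts = [
    [4, 0],  # all four raters chose the first category
    [2, 2],  # raters split evenly
    [0, 4],  # all four raters chose the second category
]
toy_kappa = fleiss_kappa(toy_counts)
```

In the study setting each row would hold, for one fracture, how many of the four observers chose each Mayo type.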
Regarding interobserver reliability based on fracture type, the lowest agreement at both observation points was found with respect to type IIIB in the classification based on radiographs (d1 = 0.16, d2 = 0.15). In the classification based on CT scans, type IIB at d1 (0.31) and type IIA at d2 (0.27) showed the lowest agreement (Table 3). In X-rays the highest agreement was found for type IB at d1 (0.74) and type IA at d2 (0.84). In CT scans the highest agreement was found for type IA at both observation points (d1 = 0.84, d2 = 0.84) (Table 3).
Table 3.
Fleiss kappa coefficients for interobserver reliability at the two time points (d1, d2, 30 days apart) using the Mayo classification, divided according to the fracture types.
| Fracture type | x-ray d1 | x-ray d2 | CT d1 | CT d2 |
|---|---|---|---|---|
| IA | 0.66 (0.41–0.77) | 0.84 (0.66–1) | 0.84 (0.66–1) | 0.84 (0.66–1) |
| IB | 0.74 (0.52–0.88) | 0.42 (0.24–0.6) | 0.72 (0.54–0.9) | 0.39 (0.22–0.57) |
| IIA | 0.31 (0–0.25) | 0.51 (0.33–0.69) | 0.36 (0.18–0.54) | 0.27 (0.09–0.45) |
| IIB | 0.38 (0–0.3) | 0.21 (0.3–0.38) | 0.31 (0.13–0.49) | 0.36 (0.18–0.54) |
| IIIB | 0.16 (0–0.19) | 0.15 (0–0.32) | 0.47 (0.29–0.65) | 0.53 (0.35–0.7) |
d1 = time point 1; d2 = time point 2; brackets include 95 % confidence intervals; CT=Computed Tomography; 4 observers involved at each timepoint; none of the observers classified fractures as type IIIA.
4. Discussion
This study investigated the intra- and interobserver reliability of the Mayo classification of olecranon fractures based on CT scans. CT-based classification showed higher mean intra- and interobserver agreement than radiograph-based classification.
Complete agreement between surgeons can almost never be achieved due to the complexity of injuries. There are numerous examples of classification systems failing to achieve near-perfect agreement.13–19 We think that this is due both to the complexity of the classification systems themselves and to the fact that each surgeon tries to fit the fracture at hand into the pattern of a known or looked-up classification. This is where interindividual differences eventually arise, as almost no fracture is like another.
One way to optimize intra- and interobserver reliability is to keep the classification system as simple as possible. This was attempted by Cabanela and Morrey in 1993.2 Their classification is appealing for its memorable separation into 3 main types with no more than 2 subtypes each (either multifragmentary (B) or not (A)). The current literature contains studies investigating the intra- and interobserver reliability of this classification based on X-rays.8,9 In our study, the investigation of its reliability was extended to include classification based on CT. CT examinations of joint-related fractures are the norm because CT is quickly and safely available. Combined with the overall lower radiation exposure of modern CT devices, this has reduced the reluctance to perform a CT in order to fully understand the injury at hand and to safely rule out possible concomitant injuries.
We found almost perfect mean intraobserver reliability for the Mayo classification between CT scans at two different time points 30 days apart (0.82) and substantial mean agreement when comparing X-rays with CT scans (0.61 at d1 and 0.72 at d2). The lowest agreement was found when comparing X-rays with CT scans at d1 (Table 1). This was probably because intermediate fragments can be assessed more reliably on CT scans and smaller fragments are more likely to be missed on radiographs. Thus, a fracture was more likely to be classified as subtype A based on the radiographs but as subtype B based on the CT scan (Fig. 2). The agreement increased slightly at time point d2, which might indicate a certain learning curve in the awareness of intermediate fragments. We assume that the high intraobserver agreement between the CT scans at d1 and d2 was made possible by the detailed imaging of the fractures in the CT. Here, it was possible to examine and grade the entire fracture morphology in all three available planes, underlining the major advantage of CT examination in a trauma setting. In their quantitative three-dimensional computed tomography analysis of olecranon fractures, Lubberts et al. concluded that this form of imaging has the potential to enhance our understanding of fracture morphologies and patterns and thereby might help to improve surgical management and eventually functional outcome.20
Fig. 2.
Example of an olecranon fracture in (A) radiographic lateral view and (B) sagittal CT imaging showing a Mayo type IIB fracture with a slightly displaced intermediate fragment at the base of the coronoid process.
Level of experience had no effect on the consistent assessment of fracture patterns for either radiographs or CT scans between the 2 time points. The resident with 6 years of experience (O2) showed the highest intraobserver agreement for both radiographs and CT scans at both time points (see Table 1).
Interobserver agreement in our study was fair for radiographs and moderate for CT scans. The CT-based classification at d1 showed the highest value (0.47), which corresponds to moderate agreement according to Landis and Koch (Table 2).12 Again, we assume that the more detailed imaging in CT improves reproducibility.
The lowest interobserver agreement by fracture type was documented for type IIIB fractures based on radiographs (Table 3). We believe that one surgeon, for example, classified such a fracture as a displaced fracture without joint dislocation (IIB) and another surgeon as a fracture dislocation (IIIB). We believe that this sub-classification is difficult even in detailed CT imaging. The classification system lacks more definite radiological criteria for the cases in which a dislocation is to be suspected. For CT scans, type IIB fractures showed the lowest agreement at d1 and type IIA fractures at d2. We assume that small intermediate fragments could be weighted differently by observers. Thus, one surgeon may see a small osteochondral fragment as still attached to the larger fragment or not worth mentioning at all (IIA), whereas another surgeon defines it as a free intermediate fragment (IIB). As displaced fractures of the olecranon are generally treated by surgical fixation, this classification difference has, however, no clinical impact.
Not surprisingly, the highest interobserver agreement for both imaging methods was found in fracture types IA (undisplaced, noncomminuted) and IB (undisplaced, comminuted). Greater displacement of an olecranon fracture can be excluded well with radiographs and even more reliably with CT scans. This led to a high interobserver agreement in the evaluation of such fracture patterns.
Our results are comparable to those found in the literature regarding the Mayo classification based on radiographs. Benetton et al. documented an intraobserver kappa of 0.18 for specialists, which is slight according to Landis and Koch, and a moderate value of 0.51 for nonspecialists. For interobserver reliability, a value of 0.19 was obtained, with the evaluation time points also 30 days apart.8
Tamaoki et al. found an intraobserver reliability of a kappa of 0.63 within 3 time points of measurement, which were each 3 weeks apart. The interobserver reliability was 0.33 on average.9
The results of these studies are only comparable with our results obtained from radiographs. Here, with a mean kappa of 0.76, we documented a stronger intraobserver agreement than both aforementioned studies. The interobserver reliability, with a value of 0.32, was at the same level as the results documented by Tamaoki et al. (mean kappa of 0.33).
We have to keep in mind that, in general, no classification system is universally accepted and each one includes a certain degree of interobserver variability.8 This is evident not only in the classification of olecranon fractures, but also in other fracture regions.9,13–17 Brunner et al. investigated the agreement for proximal humerus fractures using the Neer and AO classifications based on CT imaging. They documented a mean interobserver reliability for the Neer classification with respect to grading the group and fracture type of 0.48 and 0.42, respectively, and for the AO classification with respect to the type and group classification of 0.61 and 0.48, respectively. These results are thus comparable with our CT-based classification with respect to interobserver agreement and indicate that an almost perfect agreement does not appear possible, even with CT images.21
Our study has some limitations. One limitation is that the time required per examiner to classify the fractures was not documented. Thus, we are unable to determine whether more or less time was taken for assessment than in a clinical setting and whether this had an impact on the final classification.
We included only cases in which a CT had been performed to better understand the fracture morphology and to safely rule out possible associated injuries. This inevitably leads to an over-representation of fractures of type IIB and higher. If we assume that the most common fracture type is the type IIA fracture, then the most common fracture types are under-represented in our study with regard to observer reliability.2
Finally, it should be noted that the Mayo classification was originally designed for the evaluation of radiographic images.2 Therefore, it should be questioned whether it is also suitable for the evaluation of CT scans. It is, however, characterized by a memorable and simple system that can be used with fewer inconsistent interobserver sub-classifications based on CT imaging.
5. Conclusion
This paper is the first to present a scientific evaluation of the reliability of the Mayo classification for olecranon fractures based on CT scans. In summary, intraobserver reliability of the Mayo classification assessed by CT imaging was almost perfect, whereas interobserver reliability was only moderate. This might lead to inconsistent fracture classification in both scientific and clinical settings. Agreement based on CT scans appears to be superior to that based on X-rays.
Consent to participate
Not applicable.
Consent to publish
Not applicable.
Availability of data and materials
All data and materials are available from the corresponding author.
Code availability
Not applicable.
Ethical statement
All analyses performed in our study involving X-rays and CT scans from patients were in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Institutional ethics committee approval was given prior to this study (VT 21-1248). Informed consent was not required.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Guardian/patient’s consent
Not applicable.
CRediT authorship contribution statement
Andreas Harbrecht: Conceptualization, Methodology, Formal analysis, Data curation, Visualization, Role of Observer, Writing – original draft, Writing – review & editing. Michael Hackl: Supervision, Review of the original draft. Nadine Ott: Role of Observer, Supervision, Review of the original draft. Stephan Uschok: Role of Observer, Supervision, Review of the original draft. Kilian Wegmann: Supervision, Review of the original draft. Lars P. Müller: Supervision, Review of the original draft. Tim Leschinger: Methodology, Conceptualization, Role of Observer, Review of the original draft.
Declaration of competing interest
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Acknowledgments
None.
References
1. Garbuz D.S., Masri B.A., Esdaile J., Duncan C.P. Classification systems in orthopaedics. J Am Acad Orthop Surg. 2002;10(4):290–297. doi: 10.5435/00124635-200207000-00007.
2. Cabanela M., Morrey B. The Elbow and its Disorders. 2 ed. WB Saunders; Philadelphia: 1993.
3. Bruggemann A., Mukka S., Wolf O. Epidemiology, classification and treatment of olecranon fractures in adults: an observational study on 2462 fractures from the Swedish Fracture Register. Eur J Trauma Emerg Surg. 2022;48(3):2255–2263. doi: 10.1007/s00068-021-01765-2.
4. Sullivan C.W., Desai K. Classifications in brief: Mayo classification of olecranon fractures. Clin Orthop Relat Res. 2019;477(4):908–910. doi: 10.1097/CORR.0000000000000614.
5. Cantore M., Candela V., Sessa P., Giannicola G., Gumina S. Epidemiology of isolated olecranon fractures: a detailed survey on a large sample of patients in a suburban area. JSES Int. 2022;6(2):309–314. doi: 10.1016/j.jseint.2021.11.015.
6. Patino J.M., Rullan Corna A.F., Michelini A.E., Abdon I.M., Marinucci B. Olecranon fractures: do they lead to osteoarthritis? Long-term outcomes and complications. Int Orthop. 2020;44(11):2379–2384. doi: 10.1007/s00264-020-04695-7.
7. Hamoodi Z., Duckworth A.D., Watts A.C. Olecranon fractures: a critical analysis Review. JBJS Rev. 2023;11(1). doi: 10.2106/JBJS.RVW.22.00150.
8. Benetton C.A., Cesa G., El-Kouba Junior G., Ferreira A.P., Vissoci J.R., Pietrobon R. Agreement of olecranon fractures before and after the exposure to four classification systems. J Shoulder Elbow Surg. 2015;24(3):358–363. doi: 10.1016/j.jse.2014.10.025.
9. Tamaoki M.J., Matsunaga F.T., Silveira J.D., Balbachevsky D., Matsumoto M.H., Belloti J.C. Reproducibility of classifications for olecranon fractures. Injury. 2014;45(Suppl 5):S18–S20. doi: 10.1016/S0020-1383(14)70015-4.
10. Eismann E.A., Stephan Z.A., Mehlman C.T., et al. Pediatric triplane ankle fractures: impact of radiographs and computed tomography on fracture classification and treatment planning. J Bone Joint Surg Am. 2015;97(12):995–1002. doi: 10.2106/JBJS.N.01208.
11. Jones G.L., Bishop J.Y., Lewis B., Pedroza A.D., Group M.S. Intraobserver and interobserver agreement in the classification and treatment of midshaft clavicle fractures. Am J Sports Med. 2014;42(5):1176–1181. doi: 10.1177/0363546514523926.
12. Landis J.R., Koch G.G. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
13. Foroohar A., Tosti R., Richmond J.M., Gaughan J.P., Ilyas A.M. Classification and treatment of proximal humerus fractures: inter-observer reliability and agreement across imaging modalities and experience. J Orthop Surg Res. 2011;6. doi: 10.1186/1749-799X-6-38.
14. Belloti J.C., Tamaoki M.J., Franciozi C.E., et al. Are distal radius fracture classifications reproducible? Intra and interobserver agreement. Sao Paulo Med J. 2008;126(3):180–185. doi: 10.1590/S1516-31802008000300008.
15. Matsunaga F.T., Tamaoki M.J., Cordeiro E.F., et al. Are classifications of proximal radius fractures reproducible? BMC Muscoskel Disord. 2009;10:120. doi: 10.1186/1471-2474-10-120.
16. Walton N.P., Harish S., Roberts C., Blundell C. AO or Schatzker? How reliable is classification of tibial plateau fractures? Arch Orthop Trauma Surg. 2003;123(8):396–398. doi: 10.1007/s00402-003-0573-1.
17. Flikkila T., Nikkola-Sihto A., Kaarela O., Paakko E., Raatikainen T. Poor interobserver reliability of AO classification of fractures of the distal radius. Additional computed tomography is of minor value. J Bone Joint Surg Br. 1998;80(4):670–672. doi: 10.1302/0301-620x.80b4.8511.
18. Dust T., Hartel M.J., Henneberg J.E., et al. The influence of 3D printing on inter- and intrarater reliability on the classification of tibial plateau fractures. Eur J Trauma Emerg Surg. 2023;49(1):189–199. doi: 10.1007/s00068-022-02055-1.
19. Pieroh P., Hoch A., Hohmann T., et al. Fragility fractures of the pelvis classification: a multicenter assessment of the intra-rater and inter-rater reliabilities and percentage of agreement. J Bone Joint Surg Am. 2019;101(11):987–994. doi: 10.2106/JBJS.18.00930.
20. Lubberts B., Janssen S., Mellema J., Ring D. Quantitative 3-dimensional computed tomography analysis of olecranon fractures. J Shoulder Elbow Surg. 2016;25(5):831–836. doi: 10.1016/j.jse.2015.10.002.
21. Brunner A., Honigmann P., Treumann T., Babst R. The impact of stereo-visualisation of three-dimensional CT datasets on the inter- and intraobserver reliability of the AO/OTA and Neer classifications in the assessment of fractures of the proximal humerus. J Bone Joint Surg Br. 2009;91(6):766–771. doi: 10.1302/0301-620X.91B6.22109.