Abstract
Objective To evaluate the inter- and intraobserver agreement regarding the Walch classification system for shoulder arthritis.
Methods Computed tomography scans of the shoulder joint of adult patients were selected between 2012 and 2016, and they were classified by physicians with different levels of expertise in orthopedics. The images were examined at three different times, and the analyses were evaluated by the Fleiss Kappa index to verify the intra- and interobserver agreement.
Results The Kappa index for the intraobserver agreement ranged from 0.305 to 0.545. The interobserver agreement was very low at the end of the three evaluations (κ = 0.132).
Conclusion The intraobserver agreement regarding the modified Walch classification varied from moderate to poor. The interobserver agreement was low.
Keywords: shoulder joint, osteoarthritis/classification, reproducibility of results
Introduction
Osteoarthritis (OA) is defined as joint degeneration of primary or secondary origin. The resulting limitation makes daily activities difficult to perform and can become disabling.
Shoulder arthrosis can affect up to 20% of the elderly population. 1 The primary form is insidious, occurs without previous shoulder disorders, and usually affects other joints as well. In the secondary form, however, there is a history of a previous shoulder disorder. 1
The initial treatment for OA is based on clinical and drug management. Surgical treatment is frequently indicated for patients whose daily activities remain impaired and who did not respond to medical treatment.
The number of shoulder arthroplasties and hemiarthroplasties has been growing over the past few decades. Previous studies show a 10.6% and 6.7% increase in the number of total shoulder arthroplasties and hemiarthroplasties, respectively, between 1993 and 2007. 2
Imaging scans aid in the diagnosis and staging of the disease, as well as in the indication of the treatment. Radiographs are routinely obtained in three views: anteroposterior, scapular, and axillary. 1 The main objective of computed tomography (CT) scans is to show glenoid anteversion and to provide a detailed view of joint involvement. 3
The main purpose of a classification is to enable communication among professionals studying a given disease and to standardize diagnoses and treatments in clinical research. Thus, a good classification must be reproducible and able to predict the prognosis of a particular condition. 4
One method to evaluate the reproducibility of a classification system is the analysis of the intra- and interobserver agreement. Intraobserver agreement refers to the concordance in the observations made by the same observer in different observation intervals, whereas interobserver agreement refers to the concordance between different observers.
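As an illustration of how such agreement can be quantified, the following is a minimal Python sketch of the Fleiss Kappa index, the statistic named in the Methods of the abstract. The function name `fleiss_kappa`, its arguments, and the sample ratings are hypothetical and are not taken from the analysis performed in this study; the sketch assumes that every image was rated the same number of times.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss Kappa for a list of subjects, each rated by the same number of raters.

    ratings    -- list of lists; ratings[i] holds the labels given to subject i
    categories -- list of all possible labels (hypothetical argument name)
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])

    # Build the subject-by-category count table.
    counts = []
    for row in ratings:
        if len(row) != n_raters:
            raise ValueError("every subject needs the same number of ratings")
        tally = Counter(row)
        line = [tally[c] for c in categories]
        if sum(line) != n_raters:
            raise ValueError("a rating uses a label that is not in categories")
        counts.append(line)

    # Observed agreement: mean proportion of agreeing rater pairs per subject.
    p_i = [
        (sum(n * n for n in line) - n_raters) / (n_raters * (n_raters - 1))
        for line in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Agreement expected by chance, from the overall share of each category.
    total = n_subjects * n_raters
    p_j = [sum(line[j] for line in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        # All ratings fall in a single category; kappa is undefined.
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical ratings: three readings for each of four images.
    sample = [
        ["A1", "A1", "A2"],
        ["B2", "B2", "B2"],
        ["B1", "B2", "B3"],
        ["C", "C", "D"],
    ]
    levels = ["A1", "A2", "B1", "B2", "B3", "C", "D"]
    print(round(fleiss_kappa(sample, levels), 3))
```

In this sketch, each row may hold the ratings of different observers for one image (interobserver agreement) or, as an assumption about how repeated readings could be arranged, the ratings given by a single observer at different times (intraobserver agreement).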
There are several classifications for shoulder OA. The most widely used system was proposed by Walch et al 3 in 1999 and modified in 2016. 4 This system stages and assesses the progression of shoulder OA based on CT scans of the patients' joints. It considers glenoid morphology, its retroversion angle, and its relationship with the humeral head. These data help determine the best type of arthroplasty to treat the condition.
However, there is little information on reproducibility and agreement, especially regarding the 2016 modification.
The present study aims to evaluate the intra- and interobserver agreement regarding the modified Walch classification for shoulder OA.
Materials and Methods
The present study is a retrospective, cross-sectional, analytical study of agreement regarding classifications. The research project was approved by the Ethics in Research Committee of Plataforma Brasil (under C.A.A.E. no. 66863817.3.0000.5505).
Results
No correct answer was defined for any scan; only the intra- and interobserver agreement (the greatest agreement and the greatest disagreement) was observed.
Figure 1 shows the Kappa index for the intraobserver agreement at three distinct assessments using seven levels (A1, A2, B1, B2, B3, C and D). The best result revealed a moderate agreement (κ = 0.545).
Fig. 1. Mean intraobserver agreement at the end of the three evaluations. Abbreviations: ELE1 and ELE2, expert level examiners; ALE, advanced level examiner; BLE, basic level examiner; UMS, undergraduate medical student.
Figure 2 shows the Kappa index for the interobserver agreement for separate assessments, as well as the overall agreement at the completion of the three assessments using the same seven levels. The best agreement was obtained at the first evaluation, but it was deemed small (κ = 0.214). After the three assessments, there was very little interobserver agreement (κ = 0.132).
Fig. 2. Interobserver agreement at each of the three evaluations and at the general agreement evaluation.
The agreement calculations were also made using only the four basic levels of the Walch classification (A, B, C, and D). Images rated as A1 and A2 were grouped as A; images classified as B1, B2 and B3 were grouped as B.
Figure 3 shows the Kappa index for the intraobserver agreement using only the four basic levels. In this scenario, the best result showed substantial agreement, close to the threshold for virtually perfect agreement (κ = 0.798).
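The grouping described above can be sketched in Python as a simple lookup. The names `BASIC_LEVEL` and `group_ratings` are hypothetical, chosen only for illustration, and the sketch is not the procedure actually used by the authors; it only mirrors the stated rule that A1 and A2 become A, and B1, B2 and B3 become B, while C and D are unchanged.

```python
# Mapping from the seven levels of the modified Walch classification
# to the four basic levels, as described in the text.
BASIC_LEVEL = {
    "A1": "A", "A2": "A",
    "B1": "B", "B2": "B", "B3": "B",
    "C": "C",
    "D": "D",
}


def group_ratings(ratings):
    """Replace every seven-level label with its basic level.

    ratings -- list of lists of labels, one inner list per image.
    Raises KeyError if a label is not one of the seven known levels.
    """
    return [[BASIC_LEVEL[label] for label in row] for row in ratings]


if __name__ == "__main__":
    # Hypothetical ratings for two images.
    print(group_ratings([["A1", "A2", "B1"], ["B3", "C", "D"]]))
```

Because the grouping merges labels that observers may have confused (for example, B1 and B2), recomputing an agreement index on the grouped ratings cannot reduce the number of matching pairs.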
Fig. 3. Mean intraobserver agreement using four levels. Abbreviations: ELE1 and ELE2, expert level examiners; ALE, advanced level examiner; BLE, basic level examiner; UMS, undergraduate medical student.
Figure 4 compares the interobserver Kappa indices obtained with the seven levels (A1, A2, B1, B2, B3, C and D) and after grouping into the four basic levels. Although the classification system was simplified, the best interobserver agreement remained very small (κ = 0.172).
Fig. 4. Comparison of the interobserver agreement at each of the three evaluations and at the general agreement evaluation using seven and four levels.
Discussion
The Walch classification was chosen because it is widely used by orthopedists to determine shoulder joint involvement in patients with primary arthrosis. Intra- and interobserver agreement is very important to the evaluation of any orthopedic classification system.
The Kappa index regarding the intraobserver agreement ranged from 0.305 (ELE1) to 0.545 (BLE), showing that there was small to moderate agreement for the same evaluator. The wide variation in the results probably stems from the complexity of this classification system. Professional experience did not have the expected effect on intraobserver agreement, since the highest index was obtained by the BLE, and the lowest index was obtained by the ELE1.
Interobserver agreement was very low at the completion of the three evaluations (κ = 0.132). The index decreased across the three evaluations. This reduction showed that time and familiarization with the classification system had no relevant effect at the end of the evaluations; in addition, the training performed prior to the first evaluation may have influenced the results.
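For reference, Kappa values are often translated into verbal labels with fixed cut-offs. The sketch below assumes the widely used Landis and Koch bands; the function name `interpret_kappa` is hypothetical, and the wording used in this article (for example, "small" or "poor") may follow a different scale, so the labels returned here are not necessarily those of the authors.

```python
def interpret_kappa(kappa):
    """Return a verbal label for a Kappa value.

    Assumes the Landis and Koch bands:
    below 0 poor, 0 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate,
    0.61 to 0.80 substantial, 0.81 to 1.00 almost perfect.
    """
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label


if __name__ == "__main__":
    # Values reported in this article.
    for value in (0.132, 0.305, 0.545, 0.798):
        print(value, interpret_kappa(value))
```

Under this assumed scale, a value just below 0.80 still falls in the substantial band, which is why small differences near a cut-off should be read with caution.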
Our work showed lower Kappa indices compared to studies assessing the agreement regarding different classification systems, as well as lower intra- and interobserver agreement concerning the Walch classification when compared to other studies. Matsunaga et al, 8 analyzing the Mason classification for proximal radial fractures, demonstrated satisfactory intra- (κ = 0.582) and interobserver (κ = 0.429–0.560) agreement.
The use of the four basic levels of assessment resulted in a better intraobserver agreement, with substantial values obtained for most evaluators. This finding highlights the difficulty in evaluating the Walch classification subdivisions, and it shows that a simplification of the classification leads to a better agreement.
Belloti et al 9 demonstrated that intra- and interobserver agreement for distal radius fracture classifications was higher when there were fewer variables. This finding is in line with the present study, in which agreement increased when fewer variables were used.
Our results reveal an important difference compared to those reported by Bercik et al, 4 who demonstrated very good interobserver and virtually perfect intraobserver agreement. This difference may be explained by the use, in that study, of specialized software to determine the version angle of the glenoid and of three-dimensional (3D) reconstructions of CT scans, neither of which was employed by us.
The use of 3D reconstruction images seems to improve the understanding of glenoid morphology. Osteoarthritis can cause bone degeneration in the sagittal, coronal and axial planes, thus presenting itself as a 3D defect that is difficult to see in two-dimensional images.
Scalise et al 10 and Budge et al 11 used CT with 3D reconstruction. Both showed that there was a better morphological understanding of the glenoid and, thus, a better agreement between the evaluators when 3D images were analyzed.
It is worth noting that the present study was limited to evaluating the opinions of the examiners; it did not have the goal of establishing a correct answer for each scan evaluated. Therefore, the accuracy of each observer was not assessed. This would require analyzing each observer's responses and comparing them with a gold standard method (with high specificity and sensitivity) for diagnosis.
Conclusion
The intraobserver agreement of the modified Walch classification varied from moderate to poor. The interobserver agreement, however, was low.
Conflicts of Interest The authors declare that there are no conflicts of interest.
Work developed at Hospital da Pontifícia Universidade Católica de Campinas, Campinas, SP, Brazil. Originally Published by Elsevier.
References
- 1. Cofield R H, Briggs B T. Glenohumeral arthrodesis. Operative and long-term functional results. J Bone Joint Surg Am. 1979;61(05):668–677.
- 2. Day J S, Lau E, Ong K L, Williams G R, Ramsey M L, Kurtz S M. Prevalence and projections of total shoulder and elbow arthroplasty in the United States to 2015. J Shoulder Elbow Surg. 2010;19(08):1115–1120. doi: 10.1016/j.jse.2010.02.009.
- 3. Walch G, Badet R, Boulahia A, Khoury A. Morphologic study of the glenoid in primary glenohumeral osteoarthritis. J Arthroplasty. 1999;14(06):756–760. doi: 10.1016/s0883-5403(99)90232-2.
- 4. Bercik M J, Kruse K, II, Yalizis M, Gauci M O, Chaoui J, Walch G. A modification to the Walch classification of the glenoid in primary glenohumeral osteoarthritis using three-dimensional imaging. J Shoulder Elbow Surg. 2016;25(10):1601–1606. doi: 10.1016/j.jse.2016.03.010.
- 5. Viera A J, Garrett J M. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37(05):360–363.
- 6. Altman D G. Practical statistics for medical research. 3rd ed. London: Chapman and Hall; 1995.
- 7. Svanholm H, Starklint H, Gundersen H J, Fabricius J, Barlebo H, Olsen S. Reproducibility of histomorphologic diagnoses with special reference to the kappa statistic. APMIS. 1989;97(08):689–698. doi: 10.1111/j.1699-0463.1989.tb00464.x.
- 8. Matsunaga F T, Tamaoki M J, Cordeiro E F et al. Are classifications of proximal radius fractures reproducible? BMC Musculoskelet Disord. 2009;10:120. doi: 10.1186/1471-2474-10-120.
- 9. Belloti J C, Tamaoki M J, Franciozi C E et al. Are distal radius fracture classifications reproducible? Intra and interobserver agreement. Sao Paulo Med J. 2008;126(03):180–185. doi: 10.1590/S1516-31802008000300008.
- 10. Scalise J J, Codsi M J, Bryan J, Brems J J, Iannotti J P. The influence of three-dimensional computed tomography images of the shoulder in preoperative planning for total shoulder arthroplasty. J Bone Joint Surg Am. 2008;90(11):2438–2445. doi: 10.2106/JBJS.G.01341.
- 11. Budge M D, Lewis G S, Schaefer E, Coquia S, Flemming D J, Armstrong A D. Comparison of standard two-dimensional and three-dimensional corrected glenoid version measurements. J Shoulder Elbow Surg. 2011;20(04):577–583. doi: 10.1016/j.jse.2010.11.003.




