Abstract
Background:
Musculoskeletal ultrasound (MSUS) is increasingly being utilized in the evaluation of pediatric musculoskeletal diseases. To provide objective assessments of arthritis, reliable MSUS scoring systems are needed. Recently, joint-specific scoring systems for arthritis of the pediatric elbow, wrist and finger joints were proposed by the Childhood Arthritis and Rheumatology Research Alliance (CARRA) MSUS workgroup. This study aims to assess the reliability of these scoring systems when used by sonographers with different levels of expertise.
Methods:
Members of the CARRA MSUS workgroup attended training sessions for scoring the elbow, wrist and finger. Subsequently, scoring exercises of B-mode and Power Doppler (PD)-mode still-images for each joint were performed. Inter-reader reliability was determined using two-way single score intra-class correlation coefficients (ICC) for synovitis and Cohen’s kappa for tenosynovitis.
Results:
Seventeen pediatric rheumatologists with different levels of MSUS expertise (1–15 years) completed a 2-hour training session and calibration exercise for each joint. Excellent reliability (ICC >0.75) was found after the first scoring exercise for all of the finger and elbow views evaluated on B-mode and PD-mode, and for all of the wrist views on B-mode. After a second training session and scoring exercise, the wrist PD-mode views reached excellent reliability as well.
Conclusion:
The preliminary CARRA MSUS scoring systems for assessing arthritis of the pediatric elbow, wrist and finger joints demonstrate excellent reliability among pediatric MSUS sonographers with different levels of expertise. With further validation, these reliable joint-specific scoring systems could serve as a clinical tool and scientific outcome measure.
Keywords: juvenile idiopathic arthritis, diagnostic imaging, ultrasonography
Introduction
Juvenile idiopathic arthritis (JIA) is a significant cause of morbidity worldwide1. Persistent joint inflammation can lead to functional limitations and lower health-related quality of life1. Clinical evaluation of JIA disease activity includes the assessment of active joint count (AJC), physician global assessment (PGA), parent/patient global assessment, presence and duration of morning stiffness and biologic markers of inflammation2. While these variables are included in validated composite outcome measures such as the Juvenile Arthritis Disease Activity Score (JADAS)3, recent guidelines acknowledge the need for further standardization of the patient/parent and PGA assessments2. The PGA can have poor interrater reliability among providers, particularly in patients with low disease activity or inactive disease4. In addition, the reliability of AJC is limited5 and cannot always adequately identify joints with synovitis6.
Musculoskeletal ultrasound (MSUS) is increasingly being utilized in children7. It is well tolerated, readily available and relatively inexpensive compared to other imaging modalities. Normal age-related findings and definitions of pediatric synovitis on MSUS have been developed8–11. MSUS can provide point-of-care information, including the identification of subclinical disease6,12,13. To provide objective assessments of arthritis, reliable scoring systems are necessary. Such systems exist for rheumatoid arthritis14,15, but in light of the unique sonographic features of the pediatric joint, specific MSUS scoring systems for JIA are needed. Our group recently proposed a joint-specific MSUS scoring system for the assessment of arthritis of the pediatric elbow, wrist and finger joints, which demonstrated excellent reliability when used by experienced ultrasonographers (>7 years of experience)16. As MSUS use increases in pediatric rheumatology, scoring systems that are reliable across different levels of experience are needed. The objective of this study was to assess the interreader reliability of a B-mode and Power Doppler (PD) mode scoring system for arthritis of the pediatric elbow, wrist and finger16 among sonographers with different levels of experience.
Methods
Seventeen pediatric rheumatology providers who are members of the Childhood Arthritis and Rheumatology Research Alliance (CARRA) MSUS workgroup participated. All providers had prior formal training in pediatric MSUS with 1 to 15 years of subsequent clinical experience in pediatric MSUS. For the analysis, participants were divided into two groups: an expert group, defined as participants with >5 years of experience in MSUS and >10 MSUS studies/week in children (n = 5), and a non-expert group (n = 12). This study was approved by the Cincinnati Children’s Hospital Medical Center Institutional Review Board (approval number: 2018-7939). Written assent and consent to participate were obtained from all children whose images were used in the scoring exercises. Written informed consent for publication was obtained from all the study participants.
For each of the joints (elbow, wrist and finger) participants received an initial 2-hour online virtual training session from an expert who had contributed to the preliminary CARRA scoring system (PVF, EO, JR)16. The training session included reviews of normal sonoanatomy, pathologic findings in JIA, the preliminary CARRA semiquantitative scoring system and case-based examples of scoring including pitfalls. Participants then took part in a calibration exercise using still images. Through a subsequent debrief, any remaining questions were addressed.
The preliminary CARRA scoring system for the elbow, wrist and finger joints consists of semiquantitative grading from 0 to 3 (0 = normal/no pathology, 3 = severe pathology) for both B-mode and PD-mode images; tenosynovitis is assessed with a binary score of 0 (no pathology) or 1 (presence of pathology) in B-mode and PD-mode (Supplementary Tables 1–4)16. Anonymized MSUS images of children 2 to 17 years of age were used, and the age of the patient was available to the participants.
Scoring exercises of both B-mode and PD-mode images were completed for each joint. Interreader reliability was estimated using two-way single score intraclass correlation coefficients (ICC), a validated statistical measure of interreader reliability when variables in a study are rated by multiple coders17. An ICC was considered excellent for values of 0.75–1.00, good for 0.60–0.74, fair for 0.40–0.59, and poor for <0.4018. Because tenosynovitis is scored as a nominal variable, agreement for the extensor tendons of the wrist in transverse view was assessed using Cohen’s kappa coefficient. Kappa values from 0.0 to 0.2 indicate slight agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 substantial agreement, and 0.81 to 1.0 almost perfect or perfect agreement17. For views in which the lower end of the 95% confidence interval (CI) did not reach excellent reliability (ICC) or moderate agreement (kappa) for all participants, the participants underwent a subsequent round of calibration and scoring exercises using a different set of B-mode and PD-mode images. Statistical analyses were performed with SAS v9.4 (SAS Institute, Cary, NC).
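The analyses above were run in SAS, but the two statistics are straightforward to reproduce. As an illustration only (the ratings below are invented, not study data), a two-way random-effects single-measure ICC, often written ICC(2,1), and Cohen's kappa can be sketched in Python:

```python
import numpy as np

def icc_2_1(ratings):
    """Two-way random-effects, single-measure ICC(2,1).

    ratings: (n_subjects, k_raters) array; rows are images, columns are raters.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # ANOVA sums of squares for the two-way layout
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between images
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

def cohen_kappa(a, b):
    """Cohen's kappa for two raters assigning nominal categories."""
    a, b = np.asarray(a), np.asarray(b)
    p_obs = np.mean(a == b)  # observed agreement
    cats = np.union1d(a, b)
    # chance agreement from each rater's marginal category frequencies
    p_exp = sum(np.mean(a == c) * np.mean(b == c) for c in cats)
    return (p_obs - p_exp) / (1 - p_exp)

# Invented example: 6 images scored 0-3 for synovitis by 3 raters
scores = np.array([
    [0, 0, 0],
    [1, 1, 2],
    [2, 2, 2],
    [3, 3, 3],
    [1, 0, 1],
    [2, 3, 2],
])
print(f"ICC(2,1) = {icc_2_1(scores):.2f}")

# Invented binary tenosynovitis scores (0/1) from two raters
print(f"kappa    = {cohen_kappa([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]):.2f}")
```

The single-measure form is used because each image is scored once per rater; an average-measure ICC would describe the reliability of the mean of all raters instead.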
Results
A total of 300 still-images were used for the first round of calibration and scoring exercises. These images were obtained in children aged 2–18 years, distributed equally across this age range, and included a broad range of images for each of the grade 0–3 categories (Supplementary Table 5). Interreader reliability results for the entire group of raters are shown in the Table. In general, experts and non-experts combined demonstrated excellent interreader agreement for all B-mode and PD-mode views of the elbow and finger joints, as well as for the distal radioulnar and midline radiocarpal views of the wrist. For all participants, the view of the radiocarpal joint in ulnar probe position reached excellent reliability in B-mode but only good reliability in PD-mode at the lower limit of the CI (excellent for the ICC itself). Images of the extensor tendons of the wrist demonstrated moderate agreement in B-mode (kappa criteria; see Methods) but only fair agreement in PD-mode for all participants. Given the overall excellent reliability for synovitis and moderate agreement for tenosynovitis in B-mode, rather than modifying the scoring system, PD-mode scoring was repeated for the ulnar view of the radiocarpal joint and the extensor tendons following a second training session with special focus on the distinction of physiologic and pathologic findings. Regardless of level of expertise, excellent interrater reliability of 0.96 (95% CI 0.94–0.97) and moderate agreement of 0.72 (95% CI 0.61–0.83) were obtained following the second training and scoring exercise. Separate results for the expert and non-expert groups are shown in Supplementary Tables 6 and 7.
Table.
Interreader Reliability Exercise for the Pediatric Elbow, Wrist, and Finger Joints
| Joint | View | Exercise 1 (n = 300 images): B-mode ICC (95% CI)1 | Exercise 1: PD-mode ICC (95% CI)1 | Exercise 2 (n = 28 images): B-mode ICC (95% CI)1 | Exercise 2: PD-mode ICC (95% CI)1 |
|---|---|---|---|---|---|
| Elbow | Anterior humeroradial and humeroulnar joint recesses in longitudinal view | 0.97 (0.96–0.98) | 0.89 (0.84–0.92) | N/A | N/A |
| Elbow | Posterior humeroulnar joint recess in longitudinal view | 0.96 (0.94–0.97) | 0.90 (0.86–0.92) | N/A | N/A |
| Wrist | Distal radioulnar joint recess in transverse view | 0.94 (0.92–0.96) | 0.96 (0.94–0.97) | N/A | N/A |
| Wrist | Dorsal radiocarpal joint recess in midline longitudinal view | 0.93 (0.91–0.94) | 0.97 (0.96–0.98) | N/A | N/A |
| Wrist | Dorsal radiocarpal joint recess in ulnar longitudinal view | 0.87 (0.81–0.93) | 0.83 (0.61–0.91) | N/A | 0.96 (0.94–0.97) |
| Wrist | Extensor tendons in transverse view2 | 0.67 (0.53–0.81) | 0.51 (0.37–0.65) | N/A | 0.72 (0.61–0.83) |
| Finger | MCP dorsal joint recess in longitudinal view | 0.93 (0.91–0.97) | 0.97 (0.96–0.98) | N/A | N/A |
| Finger | MCP volar joint recess in longitudinal view | 0.93 (0.91–0.94) | 0.87 (0.84–0.90) | N/A | N/A |
| Finger | PIP volar joint recess in longitudinal view | 0.96 (0.95–0.97) | 0.89 (0.86–0.91) | N/A | N/A |
| Finger | PIP dorsal joint recess in longitudinal view | 0.96 (0.94–0.97) | 0.95 (0.93–0.97) | N/A | N/A |
CI: confidence interval. MCP: metacarpophalangeal joint, PIP: proximal interphalangeal joint.
1Intraclass correlation coefficient (ICC) was based on a two-way random effects model for a single measure. Excellent ICC was defined as 0.75–1.00, good 0.60–0.74, fair 0.40–0.59 and poor <0.4018.
2Kappa values from 0.0 to 0.2 indicate slight agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 substantial agreement, and 0.81 to 1.0 almost perfect or perfect agreement17.
Discussion
This study demonstrated excellent reliability of the preliminary semiquantitative CARRA MSUS scoring system for the pediatric elbow, wrist, and finger joints for providers with different levels of expertise. In addition, advanced MSUS concepts were successfully taught in a virtual format. By demonstrating the reliability of this semiquantitative measurement instrument, our study supports the potential use of MSUS scoring systems as an objective outcome measure at bedside and in further research studies.
Several MSUS scoring systems have been published in recent years16,19–22. Only a few of these scoring systems have been evaluated with sonographers of variable experience. The pediatric MSUS scoring system of the knee proposed by the CARRA group was tested in pediatric rheumatology providers from the CARRA JIA Ultrasound workgroup (n=16) with <1 to 10 years of US experience. This exercise demonstrated good to excellent reliability for B-mode and PD-mode views22. Most recently, Rossi-Semerano et al23 reported the reliability of the OMERACT pediatric US synovitis scoring system among 13 pediatric ultrasonographers of diverse subspecialty backgrounds: nine rheumatologists, two pediatricians, and two radiologists with varying degrees of experience. This group used a total of 75 images to evaluate the reliability of the most representative view of the wrist, elbow, MCP II, knees, and ankle joints. For the scoring systems of the MCP II, wrist, and elbow, they found fair to good reliability for the B-mode and excellent reliability for PD-mode. However, the scoring system used was not joint-specific, which the authors acknowledged23. Our study used a larger sample of normal and pathologic images (n=300), including all views recommended in the evaluation of JIA and involved 17 pediatric rheumatology sonographers with different levels of expertise. Our joint-specific scoring system is based on one-plane view per area, using the most representative view to capture pathology, with any abnormal findings confirmed in a second plane.
Since all views reached excellent reliability18 after the second training session, the differences in ICC noted after the first scoring exercise may have resulted from variability in the participants’ experience interpreting MSUS rather than from intrinsic issues with the scoring system. The lower ICC values and wider 95% CI ranges in the MCP and PIP volar views in PD-mode for the expert group were a function of the small number of positive ratings; reliability coefficients assume an equal distribution of positive and negative findings24. The cutoffs used for differentiating the various levels of agreement (poor to excellent) were based on Cicchetti18. Other authors (Koo and Li25) have proposed slightly higher cutoffs, and it is important to note that no universal agreement exists on how to define these levels. The ICC ranges between 0.00 and 1.00, with values closer to 1.00 representing stronger reliability. Given that most of our ICC values are 0.9 or above, with the lower end of the range very close to it, the results suggest very good reliability independent of the specific cutoffs used. We also considered the lower end of the 95% CI, rather than the ICC value alone, when deciding whether agreement was sufficient.
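The sensitivity of agreement coefficients to an unequal distribution of positive and negative findings can be made concrete with a small, entirely hypothetical example: when positive findings are rare, two raters can agree on 96% of images yet produce a kappa at or below zero, because chance agreement on the dominant negative category is already very high.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two raters over nominal categories."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # chance agreement from each rater's marginal frequencies
    p_exp = sum(ca[c] / n * cb[c] / n for c in set(a) | set(b))
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical: 50 images with tenosynovitis present in only one or two;
# the raters agree on all 48 clear negatives but split on the rest.
rater_a = [0] * 49 + [1]
rater_b = [0] * 48 + [1, 0]

p_obs = sum(x == y for x, y in zip(rater_a, rater_b)) / 50
print(f"raw agreement = {p_obs:.2f}")  # 0.96
print(f"kappa         = {cohen_kappa(rater_a, rater_b):.3f}")  # negative
```

This base-rate effect, rather than any deficiency of the raters, is one reason scoring exercises benefit from image sets with a broad spread of pathology grades.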
This study demonstrated the value of training sessions that include a review of the definitions of key MSUS findings in healthy children and in children with arthritis, normal sonoanatomy, the scoring system, sonographic images with pathology, and a calibration session. Participant feedback following the first and second training exercises noted that careful review of the normal spectrum of sonographic findings related to the degree of skeletal maturation was most helpful. Given the in-person meeting limitations imposed by the COVID-19 pandemic, this project was conducted in an online virtual setting. Major benefits of this format included the ability to record the sessions and to reach a larger group; the recordings were used by participants unable to attend the virtual meetings. The excellent reliability reached in this project supports the use of an online virtual format as an effective method for pediatric MSUS training, including training for scoring exercises.
Additional assessments of the reliability of the proposed MSUS scoring systems with real-time ultrasound imaging across different patient age groups, for instance via patient-based exercises, may follow. Future studies will also need to assess the construct and predictive validity of this preliminary MSUS scoring system.
Conclusion
A novel MSUS scoring system for B-mode and PD-mode of the pediatric elbow, wrist and finger showed excellent reliability among pediatric rheumatology ultrasonographers with varying levels of expertise. This was supported by an in-depth virtual training format. This joint-specific scoring system for pediatric arthritis could serve as a clinical and scientific outcome measure following further refinement and validation.
Supplementary Material
Acknowledgments
The authors wish to acknowledge the Childhood Arthritis and Rheumatology Research Alliance (CARRA) and the ongoing Arthritis Foundation financial support of CARRA. This project was funded by a CARRA-Arthritis Foundation Small Grant. Dr. Vega-Fernandez’s work was supported by the Center for Clinical & Translational Science & Training (CCTST) at the University of Cincinnati, funded by the National Institutes of Health (NIH) Clinical and Translational Science Award (CTSA) program (grants 2UL1TR001425-05A1 and 2KL2TR001426-05A), and by the National Institute of Arthritis and Musculoskeletal and Skin Diseases under Award Number P30AR076316 (Cincinnati Children’s Hospital Medical Center Medical Diversity and Health Disparities Award). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Footnotes
Financial Interest: The authors have no financial interests to report.
References
1. Palman J, Shoop-Worrall S, Hyrich K, McDonagh JE. Update on the epidemiology, risk factors and disease outcomes of juvenile idiopathic arthritis. Best Pract Res Clin Rheumatol 2018;32:206–22.
2. Ringold S, Angeles-Han ST, Beukelman T, et al. 2019 American College of Rheumatology/Arthritis Foundation Guideline for the Treatment of Juvenile Idiopathic Arthritis: Therapeutic Approaches for Non-Systemic Polyarthritis, Sacroiliitis, and Enthesitis. Arthritis Rheumatol 2019;71:846–63.
3. Consolaro A, Ruperto N, Bazso A, et al. Development and validation of a composite disease activity score for juvenile idiopathic arthritis. Arthritis Rheum 2009;61:658–66.
4. Taylor J, Giannini EH, Lovell DJ, Huang B, Morgan EM. Lack of Concordance in Interrater Scoring of the Provider’s Global Assessment of Children With Juvenile Idiopathic Arthritis With Low Disease Activity. Arthritis Care Res (Hoboken) 2018;70:162–6.
5. Guzman J, Burgos-Vargas R, Duarte-Salazar C, Gomez-Mora P. Reliability of the articular examination in children with juvenile rheumatoid arthritis: interobserver agreement and sources of disagreement. J Rheumatol 1995;22:2331–6.
6. Vega-Fernandez POE, Henrickson M, Huggins J, Altaye M, Cassedy A, Roth J, Ting T. Correlation of Subclinical Synovitis with Juvenile Idiopathic Arthritis Outcome Measurements [abstract]. Arthritis Rheumatol 2021;73 (suppl 10):10.
7. Windschall D, Malattia C. Ultrasound imaging in paediatric rheumatology. Best Pract Res Clin Rheumatol 2020;34:101570.
8. Collado P, Vojinovic J, Nieto JC, et al. Toward Standardized Musculoskeletal Ultrasound in Pediatric Rheumatology: Normal Age-Related Ultrasound Findings. Arthritis Care Res (Hoboken) 2016;68:348–56.
9. Roth J, Jousse-Joulin S, Magni-Manzoni S, et al. Definitions for the sonographic features of joints in healthy children. Arthritis Care Res (Hoboken) 2015;67:136–42.
10. Roth J, Ravagnani V, Backhaus M, et al. Preliminary Definitions for the Sonographic Features of Synovitis in Children. Arthritis Care Res (Hoboken) 2017;69:1217–23.
11. Collado P, Windschall D, Vojinovic J, et al. Amendment of the OMERACT ultrasound definitions of joints’ features in healthy children when using the DOPPLER technique. Pediatr Rheumatol Online J 2018;16:23.
12. Janow GL, Panghaal V, Trinh A, Badger D, Levin TL, Ilowite NT. Detection of active disease in juvenile idiopathic arthritis: sensitivity and specificity of the physical examination vs ultrasound. J Rheumatol 2011;38:2671–4.
13. De Lucia O, Ravagnani V, Pregnolato F, et al. Baseline ultrasound examination as possible predictor of relapse in patients affected by juvenile idiopathic arthritis (JIA). Ann Rheum Dis 2018;77:1426–31.
14. Terslev L, Naredo E, Aegerter P, et al. Scoring ultrasound synovitis in rheumatoid arthritis: a EULAR-OMERACT ultrasound taskforce-Part 2: reliability and application to multiple joints of a standardised consensus-based scoring system. RMD Open 2017;3:e000427.
15. Bruyn GAW, Siddle HJ, Hanova P, et al. Ultrasound of Subtalar Joint Synovitis in Patients with Rheumatoid Arthritis: Results of an OMERACT Reliability Exercise Using Consensual Definitions. J Rheumatol 2019;46:351–9.
16. Vega-Fernandez P, Ting TV, Oberle EJ, et al. The MUSICAL pediatric ultrasound examination - a comprehensive, reliable, time efficient assessment of synovitis. Arthritis Care Res (Hoboken) 2021.
17. Hallgren KA. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial. Tutor Quant Methods Psychol 2012;8:23–34.
18. Cicchetti DV. Multiple comparison methods: establishing guidelines for their valid application in neuropsychological research. J Clin Exp Neuropsychol 1994;16:155–61.
19. Collado P, Naredo E, Calvo C, et al. Reduced joint assessment vs comprehensive assessment for ultrasound detection of synovitis in juvenile idiopathic arthritis. Rheumatology (Oxford) 2013;52:1477–84.
20. Vojinovic J, Magni-Manzoni S, Collado P, et al. SAT0636 Ultrasonography definitions for synovitis grading in children: the OMERACT pediatric ultrasound task force. BMJ Publishing Group Ltd; 2017.
21. Sande NK, Boyesen P, Aga AB, et al. Development and reliability of a novel ultrasonographic joint-specific scoring system for synovitis with reference atlas for patients with juvenile idiopathic arthritis. RMD Open 2021;7.
22. Ting TV, Vega-Fernandez P, Oberle EJ, et al. Novel Ultrasound Image Acquisition Protocol and Scoring System for the Pediatric Knee. Arthritis Care Res (Hoboken) 2019;71:977–85.
23. Rossi-Semerano L, Breton S, Semerano L, et al. Application of the OMERACT synovitis ultrasound scoring system in juvenile idiopathic arthritis: a multicenter reliability exercise. Rheumatology (Oxford) 2021;60:3579–87.
24. Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther 2005;85:257–68.
25. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med 2016;15:155–63.