Abstract
Objective
To test the reliability of the Outcome Measures in Rheumatology Giant cell arteritis (GCA) Ultrasonography Score (OGUS) and other composite scores in a patient-based exercise involving experts and non-experts in vascular ultrasonography.
Methods
Six GCA patients were scanned twice (two rounds separated by ≥3 hours) by 12 experts and 12 non-experts. Non-experts received 90 min of theoretical and 240 min of practical training between rounds 1 and 2. Ultrasonography was conducted on temporal arteries (common superficial, frontal and parietal branches) and axillary arteries bilaterally to calculate the OGUS, the Southend score and the Halo count. Inter-reader and intra-reader reliability were assessed by intraclass correlation coefficient (ICC).
Results
Mean age of GCA patients was 78±5.1 years, 2 (33.3%) were women, and all were in clinical remission. Expert inter-reader ICC of the OGUS was 0.60 in both rounds, 0.40 in round 1 and 0.51 in round 2 for the Southend score and 0.45 and 0.52, respectively, for the Halo count. Median ICCs for intra-reader reliability were 0.86, 0.73 and 0.65 for the OGUS, Southend score and Halo count, respectively.
For non-experts, inter-reader ICCs in round 1 were 0.20 for the OGUS, 0.20 for a normalised Southend score (=score divided by available segments) and 0.35 for a normalised Halo count. After training, inter-reader reliability ICCs improved to 0.52, 0.29 and 0.54, respectively.
Conclusion
Inter-reader reliability was fair to moderate, and intra-reader reliability was good for OGUS, Southend score and Halo count among experts. Inter-reader reliability of non-experts in vascular ultrasonography improved after the training.
Keywords: Giant Cell Arteritis, Ultrasonography, Vasculitis
WHAT IS ALREADY KNOWN ON THIS TOPIC
The Outcome Measures in Rheumatology Giant cell arteritis Ultrasonography Score (OGUS) revealed good reliability in an online exercise, as well as feasibility, convergent construct validity and sensitivity to change in a prospective cohort study.
WHAT THIS STUDY ADDS
We demonstrate in a patient-based reliability exercise a moderate inter-rater and good intra-rater reliability of OGUS among experts in vascular ultrasonography.
Inter-rater reliability of non-experts in vascular ultrasonography improved after a specific training programme.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
OGUS might be used as an outcome measure in future clinical trials in giant cell arteritis, particularly when agents are investigated that interfere directly with acute-phase reactants.
Introduction
Giant cell arteritis (GCA) is the most common primary vasculitis in the elderly.1 Imaging, particularly ultrasonography, has been established as a complementary tool in the diagnostic work-up of the disease, while its role for monitoring is still uncertain.2
Clinical symptoms and acute phase reactants, including erythrocyte sedimentation rate and C-reactive protein, have traditionally been used to assess disease activity in clinical practice and trials.3 However, modern biological and targeted synthetic drugs often interfere directly with the interleukin-6 pathway, resulting in normal inflammatory markers even when the disease is clinically active.4 Therefore, alternative objective parameters are warranted for monitoring inflammatory activity as well as vascular structural damage, particularly when clinical symptoms are equivocal.
The application of ultrasonography as a monitoring tool in GCA has been investigated recently. For example, a prospective study on 47 GCA patients demonstrated a reduction of the halo sign as well as intima-media thickness (IMT) from week 8 of glucocorticoid treatment onwards. Sensitivity to change was higher in temporal as compared with axillary arteries, where new ultrasonographic lesions emerged in some patients despite clinical remission.5 The ‘GCA treatment with Ultra-Short glucocorticoid and Tocilizumab (GUSTO)’ trial documented a rapid reduction of the IMT in temporal and axillary arteries after glucocorticoid pulses. This was followed by a re-increment of the IMT to almost baseline levels after 4 weeks despite tocilizumab monotherapy, and subsequently by a gradual but steady decrease.6 The ‘Prognosis of Temporal Arteritis (PROTEA)’ study observed that the halo sign was sensitive to change on standard glucocorticoid treatment, with rapid reduction of the IMT in temporal arteries and a delayed response in axillary arteries.7
The Outcome Measures in Rheumatology (OMERACT) ultrasonography large vessel vasculitis working group has recently published a new ultrasonography score for GCA, the OMERACT Giant cell arteritis Ultrasonography Score (OGUS).8 OGUS incorporates measurements of the IMT of eight vessels including bilateral common temporal arteries, parietal and frontal branches as well as axillary arteries. The Halo count (representing the number of vessels with the presence of a halo sign) was suggested as an alternative.8 A third tool, the Southend score, requires a conversion of IMT measurements into a semi-quantitative score.9 Both the Halo count and the Southend score are based on the same eight vessels as the OGUS.
OGUS, Southend score and Halo count all revealed feasibility, convergent construct validity and sensitivity to change in a previous study.8 Inter-reader and intra-reader reliability were previously tested in an online exercise, using stored images from patients. While this is an accepted approach at the stage of score development, it is insufficient for validation according to the OMERACT Filter Instrument Selection Algorithm (OFISA), given that it ignores the possibility of acquisition variability between different investigators.10 One previous study addressed the patient-based reliability of OGUS, Halo count and Southend scores as part of the protocol to develop a prediction tool for GCA.11 This reliability exercise, however, did not follow the usual OMERACT procedure and is therefore insufficient to fully validate the reliability of OGUS.
In the present work, we studied the reliability of OGUS and the other two ultrasonography scores for GCA in a patient-based exercise involving experts and non-experts in vascular ultrasonography. We involved non-experts not only to increase the generalisability of our results, but also to investigate whether a short theoretical and practical training module, which was part of the European Large Vessel Vasculitis Imaging Course (EULVIC), would have a positive effect on inter-rater reliability in this group.
Patients and methods
Study design, patients and setting
This study consisted of two parts: (1) a patient-based reliability exercise of experts in vascular sonography and (2) a substudy to investigate the effects of EULVIC on the inter-rater reliability of non-experts. Six patients with an established clinical diagnosis of GCA were invited to participate in both parts of the study. They were selected based on good general health to be able to participate in an exercise lasting several hours, a positive ultrasonography result of temporal and/or axillary arteries at the time of diagnosis, presence of acute or chronic ultrasonographic lesions in at least one vascular territory at the most recent routine clinical visit before the study and willingness to participate.
Part 1: 12 experts in vascular ultrasonography from the OMERACT ultrasonography large vessel vasculitis subgroup participated in this project. All of them were previously involved in the development of the OGUS including the online reliability exercise using stored images. A preliminary half-day meeting was held in advance of the EULVIC conference in Innsbruck, Austria, in June 2023 to discuss details on ultrasound machine settings and measurements before the exercise.
The reliability exercise was conducted the next day in accordance with previous studies of the OMERACT ultrasonography working groups to test the patient-based reliability of ultrasonography elementary lesions and scores.12 13
Part 2: Participants of EULVIC were asked to complete a survey ahead of the course regarding their interest in participating in this study, their experience with vascular ultrasonography and the number of GCA patients examined in a year. We selected 12 participants (termed non-experts) based on their experience (prioritising those with the least experience) and considering a country balance. Non-experts conducted the reliability exercise at the start of EULVIC and repeated it after the ultrasonography teaching sessions. EULVIC is an international imaging course in ultrasonography, MRI, CT, positron emission tomography and histology in large vessel vasculitis. Concerning ultrasonography, EULVIC offers 90 min of theoretical lectures and live demonstrations as well as 240 min of practical workshops (maximum of eight participants per ultrasound machine and tutor) with healthy individuals and patients (different patients than those participating in the reliability exercise). All speakers and tutors are international experts in the field and were also involved as experts in the reliability exercise.
Ultrasonography examinations and data collection
Experts and non-experts individually examined the six patients with GCA according to a standardised protocol, performing a bilateral examination of common superficial temporal arteries with their frontal and parietal branches and axillary arteries in longitudinal and transverse scans. Vessel segments were considered pathological when the measured IMT was above the cut-off of normal, as previously described.14,16 Measurements of the IMT of each arterial wall were conducted in accordance with the recommendations for the evaluation of the OGUS.8 Briefly, IMT was measured in the area of greatest thickness, preferentially in longitudinal view and at the vessel wall distal to the ultrasound probe, given that the intima-media complex is not constantly visible at the superficial wall because of potential technical artefacts. In cases where the intima-media complex could not be depicted clearly (eg, in case of very low IMT in normal vessels), it was recommended to compress the vessel until no lumen was visible anymore, subsequently measuring both walls and dividing the result by two. All measurements were conducted in greyscale without the use of Doppler. Measurements were recorded to two decimal places.
Experts had 15 min and non-experts 10 min for each patient. Then they rotated to the next patient until every sonographer had examined all patients. The shorter time for non-experts was because the reliability exercise was organised during EULVIC and we intended to avoid too much overlap with lectures and workshops. Data were collected immediately (with the assistance of medical students) using a paper-based case report form (CRF) and were eventually transferred into an electronic CRF. Sonographers and medical students were blinded to all routine clinical, laboratory and imaging data and there was no communication among sonographers, nor between sonographers and patients regarding their disease and findings. In order not to interfere with blinding, only patients who had not undergone temporal artery biopsy were included. None of the patients had visibly swollen or tender temporal arteries.
A second round with an identical sequence of examinations (but in a different order of patients) was conducted; the time differences between the first and second rounds were 3 hours for experts and 1–2 days for non-experts.
We used six Canon Aplio A ultrasound machines, all equipped with an 8–17 MHz hockey stick (for temporal arteries) and 5–14 MHz linear probes (for axillary arteries). Settings were optimised by experts before the study; the same settings were subsequently used by all participants. The following settings were applied for the examination of temporal arteries (axillary arteries): B-mode frequency, 17 MHz (14 MHz); image depth, 1.5 cm (3 cm); and one focus point at 0.5 cm (1.5 cm) below the skin surface. The use of Colour Doppler or superb microvascular imaging modes was allowed to identify vessels and the lumen but was eventually switched off for the measurements.
Calculation of ultrasonography scores
The OGUS was calculated as previously described: OGUS = (Common Right/0.4 mm + Common Left/0.4 mm + Parietal Right/0.3 mm + Parietal Left/0.3 mm + Frontal Right/0.3 mm + Frontal Left/0.3 mm + Axillary Right/1.0 mm + Axillary Left/1.0 mm)/number of available segments.8 In case of missing segments (eg, due to anatomical variants or because the arterial segment was not found), all available segments were measured, and the final score was divided by the number of evaluated segments. The Southend score was determined as previously described.9 Briefly, based on prespecified cut-off values, each IMT measurement was transformed into a semiquantitative score (range per segment 0–4). For axillary arteries, the obtained values were multiplied by 3. Subsequently, all values were added for the final score (range 0–48). We also determined the Halo count, which is the sum of all segments with an abnormal IMT (range 0–8), based on the rounded cut-off value of normal for the respective segment (common temporal artery: 0.4 mm, frontal/parietal branch: 0.3 mm, axillary artery: 1.0 mm).14 17
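The score definitions above can be illustrated with a minimal Python sketch. It is not the software used in the study. The segment keys (for example `frontal_left`), the use of `None` for a missing segment and the strict "greater than" comparison for an abnormal IMT are assumptions of this sketch. The Southend helper only sums grades that have already been assigned, because the grading cut-offs are not reproduced in this text.

```python
# Rounded cut-off values of normal IMT in mm, taken from the text.
CUTOFF_MM = {"common": 0.4, "frontal": 0.3, "parietal": 0.3, "axillary": 1.0}


def vessel(segment):
    """Map a hypothetical key such as 'frontal_left' to its vessel type."""
    return segment.rsplit("_", 1)[0]


def ogus(imt_mm):
    """Mean IMT-to-cut-off ratio over the segments that were measured."""
    ratios = [v / CUTOFF_MM[vessel(s)] for s, v in imt_mm.items() if v is not None]
    if not ratios:
        raise ValueError("no segment was measured")
    return sum(ratios) / len(ratios)


def halo_count(imt_mm):
    """Number of measured segments with an IMT above the cut-off (0 to 8)."""
    return sum(1 for s, v in imt_mm.items()
               if v is not None and v > CUTOFF_MM[vessel(s)])


def southend_total(grades):
    """Sum of per-segment grades (0 to 4 each); axillary grades count three times."""
    return sum(g * (3 if vessel(s) == "axillary" else 1) for s, g in grades.items())


# Illustrative values only, not study data; None marks a segment that was not found.
example = {
    "common_right": 0.50, "common_left": 0.40,
    "frontal_right": 0.30, "frontal_left": 0.45,
    "parietal_right": 0.20, "parietal_left": None,
    "axillary_right": 1.20, "axillary_left": 0.80,
}
print(round(ogus(example), 2), halo_count(example))
```

With all eight segments graded 4, `southend_total` returns the stated maximum of 48 (6 × 4 for the temporal segments plus 2 × 4 × 3 for the axillary arteries).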
For non-experts, we also calculated a normalised Southend score and a normalised Halo count. This was done because of missing IMT measurements, given that scores from cases with missing values could not be compared directly with cases having complete data sets. To obtain normalised scores, the actual Southend score/Halo count was divided by the number of segments examined.
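As a minimal sketch of this normalisation (the function name is hypothetical), the raw score is simply divided by the number of segments that were actually examined, so that a reader with missing segments can be compared with a reader who measured all eight:

```python
def normalised_score(raw_score, n_examined):
    """Raw Southend score or Halo count divided by the number of examined segments."""
    if n_examined <= 0:
        raise ValueError("at least one segment must have been examined")
    return raw_score / n_examined


# Illustrative values only: three abnormal segments out of six examined give
# the same normalised Halo count as four abnormal segments out of eight.
print(normalised_score(3, 6), normalised_score(4, 8))
```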
Statistical analysis
For descriptive statistics, we provide the absolute number and percentages, mean and SD or median and IQR, as indicated. Inter-reader and intra-reader reliability were assessed by the ICC. ICC estimates and their 95% CIs were based on a single-rating, absolute agreement, two-way random-effects model (inter-rater reliability) or a two-way mixed-effects model (intra-rater reliability). For intra-reader reliability, the median and IQR of all experts are reported. ICC values <0.5 are indicative of poor reliability, values 0.5–0.75 of moderate reliability, 0.76–0.9 of good reliability and values >0.90 indicate excellent reliability.18 To investigate how the scoring results of non-experts differed from those of experts in both rounds, we first retrieved the mean scores of experts. Subsequently, we calculated the absolute differences between the individual scoring of non-experts and the mean scores of experts (OGUSdiff / Southenddiff / Halodiff = │individual value of OGUS / Southend / Halo of non-expert sonographer – mean value of OGUS / Southend / Halo of expert sonographers│).
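For readers who want to reproduce the point estimate of the inter-rater ICC outside SPSS, the following Python sketch computes the single-rating, absolute-agreement ICC of a two-way random-effects model from the usual ANOVA mean squares. It is an illustration under stated assumptions, not the SPSS routine used here: it requires a complete patients × readers table, it does not compute CIs, and the function names are hypothetical. The classifier applies the thresholds given above; values between 0.75 and 0.76, which the thresholds leave open, are treated as good.

```python
def icc_a1(ratings):
    """Single-rating, absolute-agreement ICC (two-way random effects).

    `ratings` is a list of rows (one per patient), each row holding one
    score per reader; missing values are not handled in this sketch.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >=2 patients and >=2 readers")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between readers
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    return (ms_rows - ms_err) / denom


def classify_icc(icc):
    """Reliability category according to the thresholds stated in the text."""
    if icc < 0.5:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Illustrative scores only (3 patients x 3 readers), not study data.
scores = [[0.9, 1.0, 0.8], [1.2, 1.1, 1.3], [0.7, 0.8, 0.7]]
value = icc_a1(scores)
print(round(value, 2), classify_icc(value))
```

Because a composite score contributes only one row per patient, a table of six patients leaves few degrees of freedom for the between-patient mean square, so the estimate is sensitive to small changes in individual scores.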
The analyses were performed using IBM SPSS statistical software, V.29 (SPSS Chicago, Illinois, USA).
Results
Demographic data of participants and patients
Mean age of experts in vascular ultrasonography was 53 (±12.7) years, and 4 (33.3%) of them were women. They had a median of 16.5 (IQR 9.8–20.0) years of experience with vascular ultrasonography and examined a median of 70 (IQR 34–93) patients with suspected or established GCA per year. Nine (75%) work primarily in an academic setting and three (25%) in a community-based hospital; three (25%) additionally see patients in private practice.
Non-experts were younger (mean age 38.1 (±7.5) years) and were more commonly women (n=7, 58.3%). Four of the 12 non-experts (33.3%) had no experience in vascular ultrasonography, while 8/12 (66.7%) reported a median experience of 1.8 years (IQR 0–3.3). The non-experts examined a median of 10 (IQR 5–30) GCA patients per year. Nine (75%) of them work primarily in an academic setting and three (25%) in a community-based hospital; two (16.7%) additionally see patients in private practice.
The full clinical characteristics of patients are depicted in table 1. In brief, mean age was 78 (±5.1) years, and 2 (33.3%) were women. Median disease duration was 22.5 (IQR 16.3–28.3) months, all had established GCA and were in clinical remission at the time of the study. Clinical phenotypes were cranial GCA in two (33.3%), extra-cranial GCA in three (50%) and mixed type (cranial+extra-cranial GCA) in one (16.7%) case. Four (66.7%) patients were on glucocorticoid treatment at the time of the study, three (50%) received a disease-modifying anti-rheumatic drug (one each on methotrexate, leflunomide or tocilizumab).
Table 1. Patients’ characteristics.
| Pt ID | Sex | Age (years) | Disease duration (months) | Temporal artery abnormality* | Headache* | Jaw claudication* | PMR* | Constitutional symptoms* | ESR (mm/hour)* | CRP (mg/dL)* | GCA phenotype* | GC therapy | DMARD therapy |
| 1 | M | 87 | 28 | Yes | Yes | No | No | No | 74 | 11.9 | cGCA | 0 | LEF |
| 2 | M | 75 | 6 | No | Yes | No | No | No | n.d. | 10.3 | cGCA | 1 | |
| 3 | M | 79 | 16 | No | Yes | No | Yes | Yes | n.d. | 5.5 | cGCA+LV-GCA | 1 | |
| 4 | F | 77 | 17 | No | No | No | Yes | No | 65 | 10 | LV-GCA | 0 | MTX |
| 5 | F | 72 | 29 | No | Yes | Yes | No | Yes | 69 | 2.7 | LV-GCA | 1 | TCZ |
| 6 | M | 78 | 55 | No | No | No | Yes | Yes | 52 | 0.9 | LV-GCA | 1 | |
*At the time of diagnosis.
cGCA, cranial giant cell arteritis; CRP, C-reactive protein; DMARD, disease-modifying anti-rheumatic drug; ESR, erythrocyte sedimentation rate; F, female; GC, glucocorticoid; GCA, giant cell arteritis; LEF, leflunomide; LV-GCA, extra-cranial large-vessel GCA; M, male; MTX, methotrexate; n.d., not determined; PMR, polymyalgia rheumatica; Pt ID, patient identifier; TCZ, tocilizumab.
Scoring results, number of positive vessels
Mean scoring results per patient, resulting from the evaluation of experts, are depicted in table 2. Figure 1 depicts individual OGUS results and corresponding means of experts and non-experts in vascular ultrasonography in the first and second rounds. All patients had at least one abnormal vessel. The mean OGUS (resulting from average scorings in round 1) was 0.89 (±0.16), the mean Southend score was 19.0 (±4.6) and the mean Halo count was 2.8 (±1.4). The most common positive vessels across all patients were axillary arteries, with a median (all experts, round 1) of 6 (IQR 3–7) out of 12 vessels, followed by frontal (median 5, IQR 4–5), common (4, IQR 3–5) and parietal branches (3, IQR 2–4).
Table 2. Results of the OGUS, Southend score and Halo count by experts in vascular ultrasonography.
| Patient number | OGUS, round 1 | OGUS, round 2 | Southend score, round 1 | Southend score, round 2 | Halo count, round 1 | Halo count, round 2 |
| 1 | 1.18 (0.18) | 1.16 (0.20) | 24.5 (7.1) | 24.3 (6.2) | 5.3 (1.8) | 5.4 (2.1) |
| 2 | 0.90 (0.10) | 0.86 (0.16) | 10.6 (3.6) | 10.6 (4.2) | 3.3 (1.7) | 3.0 (1.8) |
| 3 | 0.92 (0.13) | 0.91 (0.16) | 20.6 (5.5) | 22.2 (4.2) | 2.8 (1.7) | 3.0 (2.1) |
| 4 | 0.81 (0.11) | 0.76 (0.14) | 18.3 (7.6) | 18.6 (5.5) | 1.5 (1.1) | 1.1 (0.9) |
| 5 | 0.72 (0.12) | 0.65 (0.08) | 19.3 (3.6) | 15.8 (4.5) | 1.8 (1.1) | 1.3 (1.0) |
| 6 | 0.80 (0.12) | 0.74 (0.08) | 20.8 (3.8) | 19.8 (2.7) | 2.0 (1.4) | 1.8 (0.6) |
Data indicate mean (±SD) scoring result per patient from all experts in vascular ultrasonography.
OGUS, Outcome Measures in Rheumatology Giant cell arteritis Ultrasonography Score; SD, standard deviation.
Figure 1. Individual measurements and means of the OGUS for experts in ultrasonography and non-experts are depicted for rounds 1 (A) and 2 (B). OGUS, Outcome Measures in Rheumatology Giant cell arteritis Ultrasonography Score.
Reliability of experts regarding ultrasonography measurements and scores
As detailed in table 3, the inter-reader ICCs of IMT measurements were good in both rounds, while the inter-reader ICCs of OGUS were moderate. The Southend score and the Halo count revealed poor (round 1) and moderate (round 2) inter-rater reliabilities. Analysing the reliability of IMT measurements at individual vessels, we found ICCs ranging from 0.35 (95% CI 0.18 to 0.63, axillary arteries) to 0.55 (95% CI 0.35 to 0.79, common temporal arteries) in round 1, and from 0.44 (95% CI 0.25 to 0.71, axillary arteries) to 0.54 (95% CI 0.34 to 0.79, frontal branch) in round 2. Analysing the reliability of IMT measurements at individual patients, we observed ICCs ranging from 0.54 (95% CI 0.31 to 0.84) to 0.90 (95% CI 0.79 to 0.98) in round 1 and from 0.53 (95% CI 0.29 to 0.84) to 0.91 (95% CI 0.81 to 0.98) in round 2.
Table 3. Inter-rater and intra-rater reliability of experts regarding the ultrasonography scores in GCA.
| Score | Inter-rater round 1: ICC | Inter-rater round 1: 95% CI | Inter-rater round 2: ICC | Inter-rater round 2: 95% CI | Intra-rater: median ICC | Intra-rater: IQR |
| OGUS (n=6) | 0.60 | 0.33 to 0.90 | 0.60 | 0.33 to 0.91 | 0.86 | 0.65–0.92 |
| Southend Score (n=6) | 0.40 | 0.17 to 0.82 | 0.51 | 0.24 to 0.87 | 0.73 | 0.61–0.87 |
| Halo count (n=6) | 0.45 | 0.20 to 0.84 | 0.52 | 0.25 to 0.87 | 0.65 | 0.40–0.84 |
| Individual IMT measurements (n=48) | 0.79 | 0.72 to 0.86 | 0.84 | 0.78 to 0.90 | 0.88 | 0.80–0.95 |
Data indicate the inter-rater ICC of the ultrasonography scores and IMT measurements in both rounds as well as the median ICC for intra-rater reliability.
GCA, giant cell arteritis; ICC, intraclass correlation coefficient; IMT, intima-media thickness; OGUS, Outcome Measures in Rheumatology Ultrasonography Score for GCA.
Median ICC for intra-reader reliability was good for the OGUS and individual measurements, and moderate for the Southend score and the Halo count. CIs were wider for the composite scores as compared with the individual measurements as detailed in table 3.
Reliability of non-experts and the effect of EULVIC
In round 1, one sonographer had three missing IMT values, and two declared that five and two segments were not visible, while in round 2, seven sonographers had one to three (median 1) missing IMT values.
As detailed in table 4, inter-reader reliability was poor in round 1 for all scores as well as for individual measurements. In round 2, moderate reliability was found for the OGUS, the normalised Halo count and individual IMT measurements, while reliability remained poor for the normalised Southend score. Reliability for the (non-normalised) Southend scores and Halo counts could not be calculated because of missing values.
Table 4. Inter-rater reliability of non-experts in vascular ultrasonography before and after ultrasonography training at the European Large Vessel Vasculitis Imaging Course (EULVIC).
| Score | Round 1: n | Round 1: ICC | Round 1: 95% CI | Round 2: n | Round 2: ICC | Round 2: 95% CI |
| OGUS | 6 | 0.20 | 0.05 to 0.65 | 6 | 0.52 | 0.25 to 0.88 |
| Southend score | 2 | 0 | 0 to 0.96 | 1 | n.d. | |
| Normalised Southend score* | 6 | 0.20 | 0.05 to 0.64 | 6 | 0.29 | 0.10 to 0.73 |
| Halo count | 2 | 0 | 0 to 0.98 | 1 | n.d. | |
| Normalised Halo count* | 6 | 0.35 | 0.14 to 0.78 | 6 | 0.54 | 0.28 to 0.88 |
| Individual IMT measurements | 46 | 0.24 | 0.15 to 0.37 | 43 | 0.64 | 0.53 to 0.75 |
Data indicate the inter-rater ICC of the ultrasonography scores and IMT measurements before and after the training at EULVIC.
*Normalised scores for the Halo count and the Southend score were calculated because of missing intima-media thickness measurements in the non-expert group, by dividing the actual Southend score/Halo count by the number of segments examined.
GCA, giant cell arteritis; ICC, intraclass correlation coefficient; IMT, intima-media thickness; n.d., not determined; OGUS, Outcome Measures in Rheumatology Ultrasonography Score for GCA.
Comparison of scores determined by experts and non-experts
Next, we investigated whether the values of non-experts differed from those of experts. As outlined in the Methods section, we obtained the absolute difference between the individual scores obtained by non-experts and the mean scoring result of experts. Median OGUSdiff was 0.17 (IQR 0.07 to 0.26) in round 1 and 0.15 (IQR 0.07 to 0.22) in round 2 (p=0.22 for the difference between rounds 1 and 2); median Southenddiff was 7.8 (IQR 4.0 to 10.0) and 6.4 (IQR −3.3 to 9.0), respectively (p=0.46), and median Halodiff was 1.73 (IQR 1.24 to 2.32) and 1.23 (IQR 0.46 to 1.96), respectively (p=0.86).
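The difference measure can be sketched in a few lines of Python. The function name and the example values are hypothetical and do not reproduce study data. The sketch assumes that differences are computed per patient against the mean of all expert scores for that patient and are then pooled for the median.

```python
from statistics import mean, median


def abs_diff_to_expert_mean(non_expert_scores, expert_scores):
    """Absolute difference between each non-expert score and the expert mean."""
    expert_mean = mean(expert_scores)
    return [abs(score - expert_mean) for score in non_expert_scores]


# Illustrative OGUS values for one patient, not study data.
experts = [0.9, 1.0, 1.1]
non_experts = [0.8, 1.3, 1.05]
diffs = abs_diff_to_expert_mean(non_experts, experts)
print([round(d, 2) for d in diffs], round(median(diffs), 2))
```

Because the measure is an absolute value, every difference is non-negative, and so are the median and quartiles derived from it.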
Discussion
We demonstrate in the present study that inter-reader and intra-reader reliability of experts in vascular ultrasonography regarding the OGUS was moderate and good, respectively, while it was poorer for the other ultrasonography scores. Non-experts yielded a moderate inter-reader reliability of the OGUS after a training module at EULVIC. These results are a further step in the validation of OGUS according to the OFISA.10
One of the most intriguing observations is that only limited effort is required to improve the ultrasonography skills of non-experts in order to achieve an acceptable reliability of IMT measurements and OGUS values in GCA patients with established disease. This observation is even more notable considering the heterogeneous cultural and clinical backgrounds of the physicians involved. Given that a certain level of expertise and reliability of examination results are essential to investigate the efficacy of a drug in clinical trials,19 20 our data might encourage trial designers to incorporate OGUS as an outcome measure in future studies in GCA. We further observed a trend toward the homogenization of scoring results between experts and non-experts in round 2, further underlining the positive effects of the instructions at EULVIC. These findings, however, do not imply that non-experts had already gained sufficient experience for using vascular sonography in clinical routine, particularly not for diagnosing GCA or for differentiating vasculitic lesions from atherosclerosis or other findings.
Few previous studies have investigated the influence of a training programme on the reliability of ultrasonography findings in GCA, reporting good to excellent inter-observer reliabilities.21 22 These studies, however, focused only on the halo sign rather than on the calculation of an ultrasonography composite score and investigated the reliability only after the course; hence, it was impossible to evaluate whether training improved reliability.
While education in ultrasonography has been included as an overarching principle in recent EULAR recommendations on the use of imaging in large vessel vasculitis, the specific length and content of this training remains to be determined.2 In addition to the organisation of national and international imaging courses like EULVIC, ultrasonography of large vessel vasculitis should be incorporated as a mandatory component in national and international training curricula of rheumatologists.23 In Germany, for example, it has already been included in the model curriculum of the German Society for Rheumatology.24
The OGUS revealed a slightly higher reliability than the Southend score and the Halo count in both expert and non-expert physicians (after the training). Since all scores are based on the same IMT measurements, this can only be explained by the different score calculations or weighting of the individual measures. The Southend score, for example, weights the axillary artery three times higher than other vessels, and we observed that inter-reader reliability was lowest at that site.9 The Halo count has a lower variability than the other scores (given that each value of the IMT is dichotomised), which might inflate the ICC.25 Besides, we cannot exclude that results for the Halo count might have been different if we had asked for the evaluation of the presence or absence of a ‘halo’ based on the OMERACT definition for this lesion.26 Some measurements slightly above the threshold of normal might not have been considered as a halo by experts, given that the definition is more closely linked to the acute phase of the disease.27 In contrast, all our patients had established GCA revealing (most likely) chronic changes or scarring of the arterial wall. Future studies are required to determine whether the Halo count performs better when based on visual assessment of the halo, rather than on IMT measurements.
A recent patient-based reliability exercise performed in five GCA patients and one control by five experts reported an excellent inter-rater reliability for the Halo count, Southend score and OGUS with ICCs of 0.916, 0.977 and 0.963, respectively, as well as excellent intra-rater reliabilities with ICCs of 0.910, 0.974 and 0.973, respectively.11 The ICCs obtained in our study were somewhat lower, particularly for the Southend score and the Halo count, which might be explained by the differences in the patient cohorts (more acute GCA patients in the previous study), the use of higher frequency probes (18–22 MHz probes vs 8–17 MHz, personal communication) as well as a more generous time schedule for the sonographic evaluation per patient (20 vs 15 min) in the former as compared with our exercise.11 We further observed in previous studies that the Southend score and the Halo count revealed slightly worse reliabilities on stored images, as well as lower sensitivities to change and convergent construct validities.8 Based on those data and our current findings, we are of the opinion that the OGUS should be the preferred score for clinical trials, while the Southend score and the Halo count might be more useful for diagnostic purposes and disease stratification.9 28,30
An important next step in the implementation of the OGUS as an outcome measure in clinical trials would be the definition of a cut-off for remission, given that this state is considered the most relevant clinical outcome in GCA.31 Intuitively, a score ≤1 could be such a cut-off, given that it would be achieved when all IMT measurements are within the normal range.8 On the other hand, a value ≤1 does not exclude abnormal findings in a few vessels provided that the IMT is far below the cut-off of normal at other sites. In our study, the mean OGUS was below 1 in almost all patients, even though the mean Halo count ranged from 1.1 up to 5.4, indicating that at least one vessel was abnormal in each patient. Recently, a Danish prospective follow-up study reported that an OGUS cut-off of 1.1 yielded a sensitivity of 60% and a specificity of 95% to discriminate between patients in remission and relapse; however, these results still have to be confirmed by future research.5
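A short worked example makes the averaging effect concrete. The IMT values below are purely illustrative and not taken from the study; the cut-offs are the rounded values given in the Methods, and "abnormal" is assumed to mean strictly above the cut-off.

```python
CUTOFF_MM = {"common": 0.4, "frontal": 0.3, "parietal": 0.3, "axillary": 1.0}

# (vessel type, IMT in mm); illustrative values, not study data.
segments = [
    ("common", 0.50),    # above the 0.4 mm cut-off
    ("common", 0.30),
    ("frontal", 0.20), ("frontal", 0.20),
    ("parietal", 0.20), ("parietal", 0.20),
    ("axillary", 0.70), ("axillary", 0.70),
]

ratios = [imt / CUTOFF_MM[v] for v, imt in segments]
ogus = sum(ratios) / len(ratios)                      # mean IMT-to-cut-off ratio
abnormal = sum(1 for v, imt in segments if imt > CUTOFF_MM[v])
print(f"OGUS = {ogus:.2f}, abnormal segments = {abnormal}")
```

Here one clearly thickened common temporal artery is outweighed by seven thin segments, so the OGUS stays well below 1 although the Halo count is 1.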
Although OGUS has been developed as an outcome measure for clinical studies, there have been efforts to test this score for diagnostic purposes. In a prospective, multicentre study including patients with a suspicion of GCA, an OGUS of 0.80 was identified as the optimal cut-off for the distinction of GCA and non-GCA patients yielding a sensitivity of 87% and specificity of 93%.11 While there might be future studies to further elaborate on or validate this cut-off, we emphasise that the primary use of OGUS should be monitoring in the setting of a clinical trial and not diagnosis.
The most important limitation of our study is the small number of patients and the fact that all patients were in clinical and laboratory remission. ICC calculation depends on the number of cases and the variability of data points.25 32 Consequently, ICCs of composite scores (producing a single value per patient) were generally lower and CIs wider than individual IMT measurements (8 values × 6 cases). This explains also in part why the reliability of OGUS was poorer in the present patient-based reliability exercise than in the previous online study where the sample size was much larger.8 Unfortunately, it is almost impossible to organise a patient-based reliability exercise in a different way, including (many) more patients, given the high costs and the limited time patients and physicians can dedicate to such a project. Besides, we acknowledge the fact that in GCA, ultrasonography findings in temporal and other arteries are usually most prominent before, or within the first days of glucocorticoid treatment.6 7 However, patients cannot be left without therapy, and it is a matter of chance whether patients with a newly diagnosed GCA can be recruited for such a study.
Experts as well as non-experts conducted the reliability exercise using ultrasound equipment and settings that they were not used to, which might have negatively impacted our results. Consequently, we would expect a better intra-rater reliability in clinical trials where investigators use ultrasound machines and settings they are familiar with.
Other limitations are the facts that non-experts had different (although low) levels of experience and that they had less time to measure the IMT than experts. In clinical practice, novice sonographers usually need substantially more time for the examination (including IMT measurements) of the eight vessels required to calculate the OGUS. We cannot exclude that the training offered at EULVIC improved skills to different degrees depending on previous experience with vascular ultrasound and enthusiasm for GCA. Unfortunately, it is almost impossible to recruit a homogeneous group of participants during an imaging course. Future studies among ‘ultrasound naive’ trainees or medical students might shed more light on this issue. A consequence of the limited time for non-experts was that several IMT values were missing, which might have introduced bias. Besides, it required the calculation of a normalised Southend score and Halo count, which have not been defined previously.8 9 However, non-expert physicians were participants of EULVIC and preferred attending lectures and workshops rather than spending more time on this exercise.
All these limitations might have led to an underestimation of the reliability of OGUS and the other ultrasonography scores, which ultimately strengthens our conclusion that the scores are sufficiently reliable to qualify as an outcome measure in clinical trials.
What are the next steps for the full validation of the OGUS according to the OFISA? First, the score needs to be tested in a randomised controlled trial, also to determine its discriminative potential between treatment responders and non-responders. Second, a cut-off surrogating remission versus active disease needs to be defined and validated as outlined. Third, it must be tested whether the IMT cut-off values for normal/abnormal are still valid in GCA patients with concomitant atherosclerosis. Fourth, the minimum training requirements and the minimum threshold for accuracy qualifying a sonographer to evaluate the OGUS for a clinical trial have to be determined. Our work will hopefully stimulate researchers around the globe to fill in these gaps with prospective data.
In conclusion, we confirmed moderate inter-rater and good intra-rater reliability of the OGUS among experts. Additionally, we demonstrated that a brief theoretical and practical instruction of non-expert physicians is sufficient to achieve moderate inter-rater reliability of the score. This is a further step in the validation of the OGUS according to the OFISA and supports the utilisation of this ultrasonography composite score in future clinical trials in GCA.
Acknowledgements
We thank the patients for their kind support in the reliability exercise, as well as the team of Canon Austria for providing the six ultrasound machines used in the exercise and for the local support. Besides, we would like to thank Lilien Heit, Barbara Lindhuber, Francesco Teso, Johannes Miehling, Hannah Oberhammer and Julia Brandtner for their help with data acquisition.
Footnotes
Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Patient consent for publication: Not applicable.
Ethics approval: This study involves human participants and was approved by the institutional review board of the Medical University Innsbruck, Austria (approval number 1118/2023). Participants gave informed consent to participate in the study before taking part.
Provenance and peer review: Not commissioned; externally peer reviewed.
Data availability statement
Data are available upon reasonable request.
References
- 1. Dejaco C, Brouwer E, Mason JC, et al. Giant cell arteritis and polymyalgia rheumatica: current challenges and opportunities. Nat Rev Rheumatol. 2017;13:578–92. doi: 10.1038/nrrheum.2017.142.
- 2. Dejaco C, Ramiro S, Bond M, et al. EULAR recommendations for the use of imaging in large vessel vasculitis in clinical practice: 2023 update. Ann Rheum Dis. Published Online First 2023. doi: 10.1136/ARD-2023-224543.
- 3. Sanchez-Alvarez C, Bond M, Soowamber M, et al. Measuring treatment outcomes and change in disease activity in giant cell arteritis: a systematic literature review informing the development of the EULAR-ACR response criteria on behalf of the EULAR-ACR response criteria in giant cell arteritis task force. RMD Open. 2023;9:e003233. doi: 10.1136/rmdopen-2023-003233.
- 4. Dejaco C, Ramiro S, Touma Z, et al. What is a response in randomised controlled trials in giant cell arteritis? Ann Rheum Dis. 2023;82:897–900. doi: 10.1136/ard-2022-223751.
- 5. Nielsen BD, Therkildsen P, Keller KK, et al. Ultrasonography in the assessment of disease activity in cranial and large-vessel giant cell arteritis: a prospective follow-up study. Rheumatology (Oxford). 2023;62:3084–94. doi: 10.1093/rheumatology/kead028.
- 6. Seitz L, Christ L, Lötscher F, et al. Quantitative ultrasound to monitor the vascular response to tocilizumab in giant cell arteritis. Rheumatology (Oxford). 2021;60:5052–9. doi: 10.1093/rheumatology/keab484.
- 7. Ponte C, Monti S, Scirè CA, et al. Ultrasound halo sign as a potential monitoring tool for patients with giant cell arteritis: a prospective analysis. Ann Rheum Dis. 2021;80:1475–82. doi: 10.1136/annrheumdis-2021-220306.
- 8. Dejaco C, Ponte C, Monti S, et al. The provisional OMERACT ultrasonography score for giant cell arteritis. Ann Rheum Dis. 2023;82:556–64. doi: 10.1136/ard-2022-223367.
- 9. van der Geest KSM, Borg F, Kayani A, et al. Novel ultrasonographic Halo Score for giant cell arteritis: assessment of diagnostic accuracy and association with ocular ischaemia. Ann Rheum Dis. 2020;79:393–9. doi: 10.1136/annrheumdis-2019-216343.
- 10. Terslev L, Naredo E, Keen HI, et al. The OMERACT Stepwise Approach to Select and Develop Imaging Outcome Measurement Instruments: The Musculoskeletal Ultrasound Example. J Rheumatol. 2019;46:1394–400. doi: 10.3899/jrheum.181158.
- 11. Sebastian A, van der Geest KSM, Tomelleri A, et al. Development of a diagnostic prediction model for giant cell arteritis by sequential application of Southend Giant Cell Arteritis Probability Score and ultrasonography: a prospective multicentre study. Lancet Rheumatol. 2024;6:e291–9. doi: 10.1016/S2665-9913(24)00027-4.
- 12. Hočevar A, Bruyn GA, Terslev L, et al. Development of a new ultrasound scoring system to evaluate glandular inflammation in Sjögren’s syndrome: an OMERACT reliability exercise. Rheumatology (Oxford). 2022;61:3341–50. doi: 10.1093/rheumatology/keab876.
- 13. Schäfer VS, Chrysidis S, Dejaco C, et al. Assessing Vasculitis in Giant Cell Arteritis by Ultrasound: Results of OMERACT Patient-based Reliability Exercises. J Rheumatol. 2018;45:1289–95. doi: 10.3899/jrheum.171428.
- 14. Schäfer VS, Juche A, Ramiro S, et al. Ultrasound cut-off values for intima-media thickness of temporal, facial and axillary arteries in giant cell arteritis. Rheumatology (Oxford). 2017;56:1479–83. doi: 10.1093/rheumatology/kex143.
- 15. Bosch P, Dejaco C, Schmidt WA, et al. Ultrasound for diagnosis and follow-up of chronic axillary vasculitis in patients with long-standing giant cell arteritis. Ther Adv Musculoskelet Dis. 2021;13:1759720X21998505. doi: 10.1177/1759720X21998505.
- 16. Czihal M, Köhler A, Lottspeich C, et al. Temporal artery compression sonography for the diagnosis of giant cell arteritis in elderly patients with acute ocular arterial occlusions. Rheumatology (Oxford). 2021;60:2190–6. doi: 10.1093/rheumatology/keaa515.
- 17. Ješe R, Rotar Ž, Tomšič M, et al. The cut-off values for the intima-media complex thickness assessed by colour Doppler sonography in seven cranial and aortic arch arteries. Rheumatology (Oxford). 2021;60:1346–52. doi: 10.1093/rheumatology/keaa578.
- 18. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016;15:155–63. doi: 10.1016/j.jcm.2016.02.012.
- 19. Consortium CPIECOA (eCOA) Training the Raters: An Important Factor in Clinical Trial Success. Appl Clin Trials. 2023;32.
- 20. Kobak KA, Engelhardt N, Williams JBW, et al. Rater training in multicenter clinical trials: issues and recommendations. J Clin Psychopharmacol. 2004;24:113–7. doi: 10.1097/01.JCP.0000116651.91923.54.
- 21. De Miguel E, Castillo C, Rodríguez A, et al. Learning and reliability of colour Doppler ultrasound in giant cell arteritis. Clin Exp Rheumatol. 2009;27:S53–8.
- 22. Chrysidis S, Terslev L, Christensen R, et al. Vascular ultrasound for the diagnosis of giant cell arteritis: a reliability and agreement study based on a standardised training programme. RMD Open. 2020;6:e001337. doi: 10.1136/rmdopen-2020-001337.
- 23. Union Européenne des Médecins Spécialistes. European standards in medical training. 2014. Available: https://www.uems.eu/__data/assets/pdf_file/0005/44438/UEMS-2014.21-European-Training-Requirements-Rheumatology-.pdf
- 24. Pfeil A, Krusche M, Vossen D, et al. Model curriculum of the German society for Rheumatology for advanced training in the discipline internal medicine and rheumatology. English version. Z Rheumatol. 2021;80:64–7. doi: 10.1007/s00393-021-01080-6.
- 25. Pleil JD, Wallace MAG, Stiegel MA, et al. Human biomarker interpretation: the importance of intra-class correlation coefficients (ICC) and their calculations based on mixed models, ANOVA, and variance estimates. J Toxicol Environ Health, Part B. 2018;21:161–80. doi: 10.1080/10937404.2018.1490128.
- 26. Chrysidis S, Duftner C, Dejaco C, et al. Definitions and reliability assessment of elementary ultrasound lesions in giant cell arteritis: a study from the OMERACT Large Vessel Vasculitis Ultrasound Working Group. RMD Open. 2018;4:e000598. doi: 10.1136/rmdopen-2017-000598.
- 27. Schäfer VS, Chrysidis S, Schmidt WA, et al. OMERACT definition and reliability assessment of chronic ultrasound lesions of the axillary artery in giant cell arteritis. Semin Arthritis Rheum. 2021;51:951–6. doi: 10.1016/j.semarthrit.2021.04.014.
- 28. van der Geest KSM, Wolfe K, Borg F, et al. Ultrasonographic Halo Score in giant cell arteritis: association with intimal hyperplasia and ischaemic sight loss. Rheumatology (Oxford). 2021;60:4361–6. doi: 10.1093/rheumatology/keaa806.
- 29. Sebastian A, Tomelleri A, Kayani A, et al. Probability-based algorithm using ultrasound and additional tests for suspected GCA in a fast-track clinic. RMD Open. 2020;6:e001297. doi: 10.1136/rmdopen-2020-001297.
- 30. Tomelleri A, van der Geest KSM, Khurshid MA, et al. Disease Stratification in Giant Cell Arteritis and Polymyalgia Rheumatica: State of the Art and Future Perspectives. Nat Rev Rheumatol. 2023;19:446–59. doi: 10.1038/s41584-023-00976-8.
- 31. Dejaco C, Kerschbaumer A, Aletaha D, et al. Treat-to-target recommendations in giant cell arteritis and polymyalgia rheumatica. Ann Rheum Dis. 2024;83:48–57. doi: 10.1136/ard-2022-223429.
- 32. Bujang MA. A simplified guide to determination of sample size requirements for estimating the value of intraclass correlation coefficient: A review. Arch Orofac Sci. 2017;12:1–11.