Table 3. Interrater reliability by training level.
Scale and raters | Sellar invasion a: reliability (95% CI) | Sellar invasion a: percent agreement | Suprasellar extension b: reliability (95% CI) | Suprasellar extension b: percent agreement |
---|---|---|---|---|
Full scale | ||||
Faculty raters | 0.67 (0.48–0.80) | 9/50 (18%) | 0.80 (0.68–0.88) | 19/50 (38%) |
Resident raters | 0.68 (0.49–0.80) | 22/50 (44%) | 0.78 (0.64–0.87) | 14/50 (28%) |
Intermediate scores | ||||
Faculty raters | 0.14 (−0.29 to 0.52) | 2/23 (9%) | 0.27 (−0.15 to 0.61) | 13/24 (54%) |
Resident raters | 0.13 (−0.30 to 0.52) | 9/23 (39%) | 0.49 (0.11–0.75) | 9/24 (38%) |
Scale ends | ||||
Faculty raters | 0.73 (0.49–0.87) | 7/27 (26%) | 0.85 (0.70–0.93) | 6/26 (23%) |
Resident raters | 0.86 (0.71–0.95) | 13/27 (48%) | 0.86 (0.70–0.95) | 3/26 (12%) |
Dichotomous scale | ||||
Faculty raters | 0.58 (0.36–0.74) | 36/50 (72%) | 0.51 (0.27–0.69) | 43/50 (86%) |
Resident raters | 0.62 (0.41–0.77) | 36/50 (72%) | 0.15 (−0.22 to 0.48) | 45/50 (90%) |
Abbreviation: CI, confidence interval.
a Full scale: Grades 0–IV. Dichotomous scale: Grades 0–III versus Grade IV.
b Full scale: Types 0–D. Dichotomous scale: Types 0–C versus Type D.