Table 2.
| | Average Kappa Score | Average Agreement (%) | Robust Agreement (%) |
|---|---|---|---|
| Pathologist 1 | 0.454 (0.372–0.536) | 61.3 (53.3–69.3) | 66.9 (59.2–74.6) |
| Pathologist 2 | 0.515 (0.433–0.597) | 64.8 (57.0–72.6) | 72.3 (65.0–79.6) |
| Pathologist 3 | 0.514 (0.432–0.596) | 63.1 (55.2–71.0) | 75.4 (68.3–82.5) |
| Inter-pathologist | 0.479 (0.397–0.561) | 62.7 (54.8–70.6) | 71.5 (64.1–78.9) |
| Baseline model^24 | 0.445 (0.364–0.526) | 60.1 (52.1–68.1) | 69.0 (61.4–76.6) |
| Our model | 0.525 (0.443–0.607) | 66.6 (58.9–74.3) | 76.7 (69.8–83.6) |
The average kappa score is calculated by averaging an annotator's pairwise kappa scores. For instance, the average for Pathologist 1 is obtained by averaging the kappa scores of Pathologist 1 & Pathologist 2, Pathologist 1 & Pathologist 3, and Pathologist 1 & our model. Average agreement is calculated in the same fashion. Robust agreement is the percentage of cases in which an annotator agrees with at least two of the three other annotators. 95% confidence intervals are shown in parentheses.
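
The pairwise averaging described above can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the authors' code: the toy grade lists, the annotator names used as dictionary keys, and the helper function names are assumptions made for the example, and Cohen's kappa is taken from scikit-learn. The "Inter-pathologist" row would average only the three pathologist–pathologist pairs, which is not shown here.

```python
# Minimal sketch of the per-annotator statistics in Table 2 (toy data, not the study data).
from sklearn.metrics import cohen_kappa_score

# One categorical grade per case for each annotator (hypothetical values).
labels = {
    "Pathologist 1": [1, 2, 2, 3, 1, 2],
    "Pathologist 2": [1, 2, 3, 3, 1, 1],
    "Pathologist 3": [1, 1, 2, 3, 2, 2],
    "Our model":     [1, 2, 2, 3, 1, 2],
}

def average_kappa(annotator):
    """Average of Cohen's kappa between `annotator` and each of the other annotators."""
    others = [a for a in labels if a != annotator]
    scores = [cohen_kappa_score(labels[annotator], labels[o]) for o in others]
    return sum(scores) / len(scores)

def average_agreement(annotator):
    """Average percentage of cases on which `annotator` matches each other annotator."""
    others = [a for a in labels if a != annotator]
    n = len(labels[annotator])
    rates = [
        100.0 * sum(x == y for x, y in zip(labels[annotator], labels[o])) / n
        for o in others
    ]
    return sum(rates) / len(rates)

def robust_agreement(annotator):
    """Percentage of cases where `annotator` matches at least two of the three others."""
    others = [a for a in labels if a != annotator]
    n = len(labels[annotator])
    hits = sum(
        sum(labels[annotator][i] == labels[o][i] for o in others) >= 2
        for i in range(n)
    )
    return 100.0 * hits / n

for name in labels:
    print(name,
          round(average_kappa(name), 3),
          round(average_agreement(name), 1),
          round(robust_agreement(name), 1))
```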