Sci Rep. 2019 Mar 4;9:3358. doi: 10.1038/s41598-019-40041-7

Table 2.

Comparison of pathologists and our model for classification of predominant subtypes in 143 whole-slide images in our test set.

                      Average Kappa Score   Average Agreement (%)   Robust Agreement (%)
Pathologist 1         0.454 (0.372–0.536)   61.3 (53.3–69.3)        66.9 (59.2–74.6)
Pathologist 2         0.515 (0.433–0.597)   64.8 (57.0–72.6)        72.3 (65.0–79.6)
Pathologist 3         0.514 (0.432–0.596)   63.1 (55.2–71.0)        75.4 (68.3–82.5)
Inter-pathologist     0.479 (0.397–0.561)   62.7 (54.8–70.6)        71.5 (64.1–78.9)
Baseline model [24]   0.445 (0.364–0.526)   60.1 (52.1–68.1)        69.0 (61.4–76.6)
Our model             0.525 (0.443–0.607)   66.6 (58.9–74.3)        76.7 (69.8–83.6)

The average kappa score for an annotator is calculated by averaging that annotator's pairwise kappa scores with the three other annotators. For instance, the average for Pathologist 1 is the mean of the kappa scores for Pathologist 1 & Pathologist 2, Pathologist 1 & Pathologist 3, and Pathologist 1 & our model. Average agreement is calculated in the same fashion. Robust agreement indicates the percentage of slides on which an annotator agrees with at least two of the three other annotators. 95% confidence intervals are shown in parentheses.
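
As an illustration of how these per-annotator statistics could be computed, the sketch below averages pairwise Cohen's kappa and raw agreement for each annotator, and counts the slides on which an annotator matches at least two of the three others. This is not the authors' code: the annotator names follow Table 2, but the label lists are hypothetical toy data, the helper functions are invented for this example, and scikit-learn's cohen_kappa_score is assumed as the kappa implementation.

```python
# Minimal sketch of the Table 2 statistics (hypothetical data, not the authors' code).
from sklearn.metrics import cohen_kappa_score

# Each annotator's predominant-subtype call per slide (toy example, 4 slides).
annotators = {
    "Pathologist 1": ["acinar", "solid", "lepidic", "papillary"],
    "Pathologist 2": ["acinar", "solid", "acinar", "papillary"],
    "Pathologist 3": ["acinar", "micropapillary", "lepidic", "papillary"],
    "Our model":     ["acinar", "solid", "lepidic", "micropapillary"],
}

def average_scores(name, labels_by_annotator):
    """Average pairwise Cohen's kappa and raw agreement of `name` vs. the other annotators."""
    own = labels_by_annotator[name]
    kappas, agreements = [], []
    for other, other_labels in labels_by_annotator.items():
        if other == name:
            continue
        kappas.append(cohen_kappa_score(own, other_labels))
        agreements.append(sum(a == b for a, b in zip(own, other_labels)) / len(own))
    return sum(kappas) / len(kappas), sum(agreements) / len(agreements)

def robust_agreement(name, labels_by_annotator):
    """Fraction of slides where `name` agrees with at least two of the three other annotators."""
    own = labels_by_annotator[name]
    others = [labels for k, labels in labels_by_annotator.items() if k != name]
    hits = sum(
        sum(label == other[i] for other in others) >= 2
        for i, label in enumerate(own)
    )
    return hits / len(own)

for name in annotators:
    kappa, agree = average_scores(name, annotators)
    print(f"{name}: kappa={kappa:.3f}, agreement={agree:.1%}, "
          f"robust agreement={robust_agreement(name, annotators):.1%}")
```

On the full test set, the label lists would hold each annotator's predominant-subtype call for all 143 whole-slide images; the inter-pathologist row would restrict the averaging to pathologist-pathologist pairs only.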