Sci Rep. 2019 Mar 4;9:3358. doi: 10.1038/s41598-019-40041-7

Table 2.

Comparison of pathologists and our model for classification of predominant subtypes in 143 whole-slide images in our test set.

                      Average Kappa Score   Average Agreement (%)   Robust Agreement (%)
Pathologist 1         0.454 (0.372–0.536)   61.3 (53.3–69.3)        66.9 (59.2–74.6)
Pathologist 2         0.515 (0.433–0.597)   64.8 (57.0–72.6)        72.3 (65.0–79.6)
Pathologist 3         0.514 (0.432–0.596)   63.1 (55.2–71.0)        75.4 (68.3–82.5)
Inter-pathologist     0.479 (0.397–0.561)   62.7 (54.8–70.6)        71.5 (64.1–78.9)
Baseline model [24]   0.445 (0.364–0.526)   60.1 (52.1–68.1)        69.0 (61.4–76.6)
Our model             0.525 (0.443–0.607)   66.6 (58.9–74.3)        76.7 (69.8–83.6)

The average kappa score for an annotator is calculated by averaging that annotator's pairwise kappa scores with the three other annotators. For instance, the average for Pathologist 1 is the mean of the kappa scores for Pathologist 1 & Pathologist 2, Pathologist 1 & Pathologist 3, and Pathologist 1 & our model. Average agreement is calculated in the same fashion. Robust agreement indicates the percentage of slides on which an annotator agrees with at least two of the three other annotators. 95% confidence intervals are shown in parentheses.
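
As an illustration of how these per-annotator statistics could be computed, the sketch below averages pairwise Cohen's kappa and raw agreement for each annotator, and counts the slides on which an annotator matches at least two of the three others. This is not the authors' code: the annotator names follow Table 2, but the label lists are hypothetical toy data, the helper functions are invented for this example, and scikit-learn's cohen_kappa_score is assumed as the kappa implementation.

```python
# Minimal sketch of the Table 2 statistics (hypothetical data, not the authors' code).
from sklearn.metrics import cohen_kappa_score

# Each annotator's predominant-subtype call per slide (toy example, 4 slides).
annotators = {
    "Pathologist 1": ["acinar", "solid", "lepidic", "papillary"],
    "Pathologist 2": ["acinar", "solid", "acinar", "papillary"],
    "Pathologist 3": ["acinar", "micropapillary", "lepidic", "papillary"],
    "Our model":     ["acinar", "solid", "lepidic", "micropapillary"],
}

def average_scores(name, labels_by_annotator):
    """Average pairwise Cohen's kappa and raw agreement of `name` vs. the other annotators."""
    own = labels_by_annotator[name]
    kappas, agreements = [], []
    for other, other_labels in labels_by_annotator.items():
        if other == name:
            continue
        kappas.append(cohen_kappa_score(own, other_labels))
        agreements.append(sum(a == b for a, b in zip(own, other_labels)) / len(own))
    return sum(kappas) / len(kappas), sum(agreements) / len(agreements)

def robust_agreement(name, labels_by_annotator):
    """Fraction of slides where `name` agrees with at least two of the three other annotators."""
    own = labels_by_annotator[name]
    others = [labels for k, labels in labels_by_annotator.items() if k != name]
    hits = sum(
        sum(label == other[i] for other in others) >= 2
        for i, label in enumerate(own)
    )
    return hits / len(own)

for name in annotators:
    kappa, agree = average_scores(name, annotators)
    print(f"{name}: kappa={kappa:.3f}, agreement={agree:.1%}, "
          f"robust agreement={robust_agreement(name, annotators):.1%}")
```

On the full test set, the label lists would hold each annotator's predominant-subtype call for all 143 whole-slide images; the inter-pathologist row would restrict the averaging to pathologist-pathologist pairs only.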