
Table 5.

Subject-level AUC and kappa agreement between automated and visually derived Ki67 scores for the subset of participating studies with visual scores available (N = 1,849)

Study                  Cases (N)   AUC (95% CI)   Observed agreement (95% CI)   Kappa (95% CI)
ABCS                   215         86 (79, 94)    87 (82, 87)                   0.52 (0.45, 0.59)
CNIO                   154         87 (78, 97)    79 (72, 85)                   0.39 (0.32, 0.47)
ESTHER                 244         95 (93, 98)    92 (88, 95)                   0.69 (0.62, 0.74)
PBCS                   1,236       88 (87, 91)    89 (87, 91)                   0.50 (0.47, 0.52)
TMA in training set*
  Yes                  613         90 (86, 93)    87 (84, 90)                   0.54 (0.50, 0.58)
  No                   1,236       89 (87, 91)    89 (87, 91)                   0.50 (0.47, 0.52)
Overall                1,849       90 (88, 91)    88 (87, 90)                   0.65 (0.63, 0.67)

Semi‐quantitative categories of visual scores were used to determine kappa agreement. AUC was determined using continuous automated scores and dichotomous categories of visual scores.

*Agreement analyses were stratified by whether a study had TMAs (tissue microarrays) in the training set: ABCS, CNIO and ESTHER did; PBCS did not.
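
For readers who want to reproduce these two statistics, the following is a minimal sketch, assuming scikit-learn, of how the metrics described above could be computed for a single study: AUC from continuous automated scores against dichotomous visual categories, and Cohen's kappa after binning the automated scores into the same semi-quantitative categories as the visual scores. All data values, variable names and category cut-points below are hypothetical illustrations, not taken from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, cohen_kappa_score

# Hypothetical data for a handful of cases in one study.
automated_continuous = np.array([3.2, 15.8, 42.0, 7.5, 28.1])  # automated Ki67 score (% positive nuclei)
visual_dichotomous = np.array([0, 1, 1, 0, 1])                 # visual score dichotomised (e.g. high vs low)

# AUC: continuous automated scores evaluated against the
# dichotomous categories of the visual scores.
auc = roc_auc_score(visual_dichotomous, automated_continuous)

# Kappa: automated scores binned into the same semi-quantitative
# categories as the visual scores, then compared case by case.
cut_points = [10, 20]  # illustrative category boundaries, not from the paper
automated_categories = np.digitize(automated_continuous, cut_points)  # categories 0, 1, 2
visual_categories = np.array([0, 1, 2, 1, 2])                         # hypothetical visual categories
kappa = cohen_kappa_score(visual_categories, automated_categories)

print(f"AUC = {auc:.2f}, kappa = {kappa:.2f}")
```

Confidence intervals such as those reported in the table would typically come from a resampling procedure (e.g. bootstrapping over cases), which is omitted here for brevity.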