2023 Apr 6;25:e41233. doi: 10.2196/41233

Table 1.

Attempts to optimize the aggregation of raw scores using the training set. In intergrader assessment (IGA) simulations, the true prevalence was 40%; otherwise, the training set gold standard prevalence was 2%.

Model and cutoff description (≥cutoff)			% correct		Sensitivity (%)		Specificity (%)		Kappa		Prevalence (%)		False positives, n		Skilled grader burden reduction (%)
IGA simulation
	Maximize kappa (6)	86		90		83		0.715		46.0		N/A^a		N/A
Total raw score
	Maximized kappa (from IGA) (6)	80		87		80		0.115		21.6		3		78
	Naïve/majority rules (11)	98		65		98		0.534		2.8		8		97
	WHO^b minimum accepted sensitivity (10)	97		70		97		0.450		4.0		7		96
	Maximize kappa (full set) (14)	99		48		100		0.605		1.1		12		99
	Mimic true prevalence (12)	98		61		99		0.587		2.1		9		98
Truncated mean approach (simulated IGA sample)
	Maximize kappa (4)	90		95		87		0.797		46.0		N/A		N/A
Truncated mean approach
	Maximized kappa (from IGA) (4)	85		91		84		0.162		17.1		2		84
	Naïve/majority rules (8)	98		61		98		0.497		2.8		9		97
	WHO minimum accepted sensitivity (6)	94		74		95		0.326		6.5		6		93
	Maximize kappa (full set) (11)	99		43		100		0.565		1.0		13		99
	Mimic true prevalence (9)	98		52		99		0.524		1.9		11		98
	Create 90% skilled grader burden reduction (5)	91		78		91		0.229		10.3		5		90
Virtual reading center model with overreads
	Truncated mean with maximized kappa from IGA; with skilled overread of all positive images (n=196)	99		78		99		0.685 (0.786 in IGA sample)		2.5		5		84
	Truncated mean with 90% skilled grader burden; with skilled overread of all positive images (n=115)	99		74		99		0.673 (0.741 in IGA sample)		2.4		6		90

^aN/A: not applicable.

^bWHO: World Health Organization.
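
The columns in Table 1 are linked arithmetically: given a sensitivity, a specificity, and a gold standard prevalence, the percentage correct, the share of images called positive, and Cohen's kappa all follow from the implied 2×2 table. The following is a minimal sketch of that arithmetic, not the authors' code. The function name `agreement_from_rates` is hypothetical, and it assumes that the "Prevalence (%)" column reports the share of images the model calls positive, which the table does not state explicitly.

```python
def agreement_from_rates(sensitivity, specificity, prevalence):
    """Return (accuracy, positive_rate, kappa) for a binary call vs. a gold standard.

    All inputs and outputs are proportions in [0, 1]. Hypothetical helper
    written for illustration; it is not taken from the article.
    """
    # Cell proportions of the implied 2x2 table.
    tp = prevalence * sensitivity
    tn = (1 - prevalence) * specificity
    fp = (1 - prevalence) * (1 - specificity)

    accuracy = tp + tn        # observed agreement ("% correct")
    positive_rate = tp + fp   # share of images called positive

    # Agreement expected by chance, from the two marginal positive rates.
    expected = (prevalence * positive_rate
                + (1 - prevalence) * (1 - positive_rate))
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, positive_rate, kappa


# IGA simulation row: sensitivity 90%, specificity 83%, true prevalence 40%.
acc, pos, kappa = agreement_from_rates(0.90, 0.83, 0.40)
print(f"accuracy={acc:.3f} positive_rate={pos:.3f} kappa={kappa:.3f}")
```

For the IGA simulation row, this gives values close to the reported 86% correct, 46.0% prevalence, and kappa of 0.715. Small differences are expected because the table's sensitivity and specificity are rounded to whole percentages.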