. 2023 Apr 6;25:e41233. doi: 10.2196/41233

Table 1.

Attempts to optimize the aggregation of raw scores using the training set. In the intergrader assessment (IGA) simulations, the true prevalence was 40%; in all other rows, the training set gold standard prevalence was 2%.

| Model and cutoff description (≥cutoff) | % correct | Sensitivity (%) | Specificity (%) | Kappa | Prevalence (%) | False positives, n | Skilled grader burden reduction (%) |
|---|---|---|---|---|---|---|---|
| IGA simulation | | | | | | | |
| Maximize kappa (6) | 86 | 90 | 83 | 0.715 | 46.0 | N/A^a | N/A |
| Total raw score | | | | | | | |
| Maximized kappa (from IGA) (6) | 80 | 87 | 80 | 0.115 | 21.6 | 3 | 78 |
| Naïve/majority rules (11) | 98 | 65 | 98 | 0.534 | 2.8 | 8 | 97 |
| WHO^b minimum accepted sensitivity (10) | 97 | 70 | 97 | 0.450 | 4.0 | 7 | 96 |
| Maximize kappa (full set) (14) | 99 | 48 | 100 | 0.605 | 1.1 | 12 | 99 |
| Mimic true prevalence (12) | 98 | 61 | 99 | 0.587 | 2.1 | 9 | 98 |
| Truncated mean approach (simulated IGA sample) | | | | | | | |
| Maximize kappa (4) | 90 | 95 | 87 | 0.797 | 46.0 | N/A | N/A |
| Truncated mean approach | | | | | | | |
| Maximized kappa (from IGA) (4) | 85 | 91 | 84 | 0.162 | 17.1 | 2 | 84 |
| Naïve/majority rules (8) | 98 | 61 | 98 | 0.497 | 2.8 | 9 | 97 |
| WHO minimum accepted sensitivity (6) | 94 | 74 | 95 | 0.326 | 6.5 | 6 | 93 |
| Maximize kappa (full set) (11) | 99 | 43 | 100 | 0.565 | 1.0 | 13 | 99 |
| Mimic true prevalence (9) | 98 | 52 | 99 | 0.524 | 1.9 | 11 | 98 |
| Create 90% skilled grader burden reduction (5) | 91 | 78 | 91 | 0.229 | 10.3 | 5 | 90 |
| Virtual reading center model with overreads | | | | | | | |
| Truncated mean with maximized kappa from IGA; skilled overread of all positive images (n=196) | 99 | 78 | 99 | 0.685 (0.786 in IGA sample) | 2.5 | 5 | 84 |
| Truncated mean with 90% skilled grader burden; with skilled overread of all positive images (n=115) | 99 | 74 | 99 | 0.673 (0.741 in IGA sample) | 2.4 | 6 | 90 |

^a N/A: not applicable.

^b WHO: World Health Organization.
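
As a reading aid for the columns above, the following minimal sketch shows how percent correct, sensitivity, specificity, and Cohen's kappa are conventionally computed once an aggregated score has been dichotomized at a cutoff (score ≥ cutoff counts as positive, following the column header). The function names and the example counts are hypothetical illustrations, not values or code from the study, and the study's own computation may differ in detail.

```python
def confusion_counts(scores, truth, cutoff):
    """Count TP, FP, FN, TN when a score >= cutoff is called positive."""
    tp = fp = fn = tn = 0
    for score, is_positive in zip(scores, truth):
        predicted_positive = score >= cutoff
        if predicted_positive and is_positive:
            tp += 1
        elif predicted_positive and not is_positive:
            fp += 1
        elif not predicted_positive and is_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Return agreement metrics for a 2x2 table (standard definitions)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # proportion correct
    # Chance agreement: product of marginal proportions, summed over classes.
    expected = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    return {
        "percent_correct": 100 * observed,
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


if __name__ == "__main__":
    # Hypothetical scores and gold-standard labels, for illustration only.
    scores = [7, 3, 6, 2, 9]
    truth = [True, False, False, True, True]
    print(summarize(*confusion_counts(scores, truth, cutoff=6)))
```

Because kappa corrects for chance agreement, a cutoff can yield a high percent correct and still a low kappa when the condition is rare, which is the pattern visible in the low-prevalence rows of the table.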