Sleep. 2022 Jul 3;46(2):zsac154. doi: 10.1093/sleep/zsac154

Table 3.

Cohen’s kappa values comparing manual- and auto-scoring against three different comparators

| Dataset | Comparator | All stages (Manual) | All stages (Auto) | W (Manual) | W (Auto) | N1 (Manual) | N1 (Auto) | N2 (Manual) | N2 (Auto) | N3 (Manual) | N3 (Auto) | R (Manual) | R (Auto) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | vs. individual scorers | 0.62 ± 0.071 | 0.69 ± 0.054 | 0.80 ± 0.053 | 0.83 ± 0.052 | 0.32 ± 0.119 | 0.39 ± 0.113 | 0.59 ± 0.110 | 0.67 ± 0.081 | 0.42 ± 0.156 | 0.56 ± 0.185 | 0.79 ± 0.041 | 0.84 ± 0.041 |
| A | vs. unbiased consensus of scorers | 0.69 ± 0.063 | 0.78 ± 0.013 | 0.85 ± 0.046 | 0.89 ± 0.004 | 0.42 ± 0.099 | 0.51 ± 0.025 | 0.68 ± 0.094 | 0.77 ± 0.017 | 0.51 ± 0.151 | 0.69 ± 0.058 | 0.83 ± 0.032 | 0.90 ± 0.007 |
| A | vs. any scorer | 0.90 ± 0.050 | 0.96 ± 0.005 | 0.93 ± 0.045 | 0.96 ± 0.001 | 0.80 ± 0.089 | 0.88 ± 0.008 | 0.90 ± 0.072 | 0.96 ± 0.006 | 0.89 ± 0.113 | 0.99 ± 0.013 | 0.93 ± 0.034 | 0.98 ± 0.002 |
| B | vs. individual scorers | 0.62 ± 0.062 | 0.66 ± 0.033 | 0.76 ± 0.047 | 0.79 ± 0.030 | 0.33 ± 0.097 | 0.41 ± 0.095 | 0.60 ± 0.090 | 0.65 ± 0.048 | 0.65 ± 0.093 | 0.72 ± 0.071 | 0.80 ± 0.056 | 0.82 ± 0.040 |
| B | vs. unbiased consensus of scorers | 0.69 ± 0.038 | 0.75 ± 0.005 | 0.82 ± 0.042 | 0.85 ± 0.003 | 0.42 ± 0.053 | 0.53 ± 0.028 | 0.67 ± 0.057 | 0.74 ± 0.008 | 0.72 ± 0.065 | 0.81 ± 0.020 | 0.85 ± 0.047 | 0.87 ± 0.005 |
| B | vs. any scorer | 0.95 ± 0.028 | 0.97 ± 0.002 | 0.96 ± 0.021 | 0.98 ± 0.002 | 0.92 ± 0.036 | 0.95 ± 0.005 | 0.95 ± 0.036 | 0.98 ± 0.004 | 0.96 ± 0.033 | 1.00 ± 0.001 | 0.96 ± 0.039 | 0.97 ± 0.002 |
| C | vs. individual scorers | 0.60 ± 0.055 | 0.64 ± 0.038 | 0.75 ± 0.049 | 0.76 ± 0.041 | 0.32 ± 0.086 | 0.39 ± 0.084 | 0.57 ± 0.068 | 0.62 ± 0.052 | 0.50 ± 0.177 | 0.55 ± 0.080 | 0.84 ± 0.050 | 0.88 ± 0.036 |
| C | vs. unbiased consensus of scorers | 0.69 ± 0.045 | 0.76 ± 0.004 | 0.82 ± 0.042 | 0.82 ± 0.006 | 0.44 ± 0.078 | 0.54 ± 0.012 | 0.66 ± 0.057 | 0.76 ± 0.006 | 0.59 ± 0.133 | 0.75 ± 0.005 | 0.88 ± 0.046 | 0.93 ± 0.003 |
| C | vs. any scorer | 0.96 ± 0.024 | 0.99 ± 0.001 | 0.97 ± 0.021 | 0.98 ± 0.002 | 0.94 ± 0.036 | 0.97 ± 0.002 | 0.96 ± 0.026 | 0.99 ± 0.001 | 0.96 ± 0.034 | 1.00 ± 0.001 | 0.97 ± 0.033 | 1.00 ± 0.001 |

Data are presented as mean ± SD Cohen’s kappa values. Individual scorers: pairwise comparison between the evaluated scorer (manual- or auto-scoring) and each remaining scorer; the resulting kappa values are averaged. Unbiased consensus of scorers: each evaluated scorer is compared to the consensus of the remaining scorers (unbiased consensus), and the auto-scoring is compared to the same unbiased consensus for each evaluated scorer; the resulting kappa values are averaged. Any scorer: each evaluated scorer is compared to all remaining scorers, with an epoch counted as correct if at least one remaining scorer agrees with the evaluated scorer, and the auto-scoring is compared to the same combinations of scorers; the resulting kappa values are averaged. Kappa values of 0.21–0.40, 0.41–0.60, 0.61–0.80, and >0.80 represent fair, moderate, substantial, and almost-perfect agreement, respectively [32].
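
To make the three comparator schemes concrete, the sketch below shows one way they could be computed in Python for a single evaluated hypnogram against the remaining scorers. Cohen’s kappa is the chance-corrected agreement (p_o − p_e) / (1 − p_e), here taken from scikit-learn’s cohen_kappa_score. The integer stage coding, the epoch-wise majority vote used as the consensus, and the fallback label for epochs where no remaining scorer agrees in the “any scorer” scheme are assumptions of this illustration, not details taken from the paper.

```python
# Hypothetical sketch of the three comparator schemes described in the table footnote.
# Assumes hypnograms are integer-coded epoch sequences (e.g. W=0, N1=1, N2=2, N3=3, R=4).
import numpy as np
from sklearn.metrics import cohen_kappa_score


def kappa_vs_individual_scorers(evaluated, others):
    # Mean pairwise kappa between the evaluated hypnogram and each remaining scorer.
    return float(np.mean([cohen_kappa_score(evaluated, o) for o in others]))


def majority_vote(others):
    # Epoch-wise majority label of the remaining scorers (ties go to the lowest label).
    stacked = np.stack(others)                      # shape: (n_scorers, n_epochs)
    return np.array([np.bincount(col).argmax() for col in stacked.T])


def kappa_vs_unbiased_consensus(evaluated, others):
    # Kappa against the consensus of the remaining scorers only (evaluated scorer excluded).
    return cohen_kappa_score(evaluated, majority_vote(others))


def kappa_vs_any_scorer(evaluated, others):
    # An epoch counts as agreement if ANY remaining scorer matches the evaluated label;
    # disagreeing epochs fall back to the majority label (an assumption of this sketch).
    stacked = np.stack(others)
    any_match = (stacked == np.asarray(evaluated)).any(axis=0)
    reference = np.where(any_match, evaluated, majority_vote(others))
    return cohen_kappa_score(evaluated, reference)


# Example: one manual hypnogram and two remaining scorers over eight epochs.
manual = np.array([0, 0, 1, 2, 2, 3, 4, 4])
others = [np.array([0, 0, 2, 2, 2, 3, 4, 4]),
          np.array([0, 1, 1, 2, 3, 3, 4, 4])]
print(kappa_vs_individual_scorers(manual, others))
print(kappa_vs_unbiased_consensus(manual, others))
print(kappa_vs_any_scorer(manual, others))
```

Averaging such values over all evaluated scorers, separately per dataset and per stage, would yield numbers in the format reported in Table 3.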