2023 Feb 10;46(5):zsad028. doi: 10.1093/sleep/zsad028

Table 3.

Scorers' performance on the IS-RC, DOD-H, and DOD-O datasets: Soft-Agreement (SA), overall accuracy (%Acc.), macro F1-score (%MF1), Cohen's kappa (κ), weighted-average F1-score (%F1), and per-class F1-score (%). The scorer with the best performance (i.e., the highest agreement with the consensus among the different physicians) is indicated in bold.
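The caption names the metrics but does not give their formulas. The sketch below assumes the standard definitions:

* Accuracy.
* Per-class F1 = 2TP / (2TP + FP + FN).
* Macro F1 as the unweighted mean over stages.
* Weighted F1, using each stage's share of reference epochs as its weight.
* Cohen's κ.

It compares one scorer's hypnogram with a reference hypnogram, such as a consensus of the other physicians. The function name `scoring_metrics` and the toy hypnograms are hypothetical. Soft-Agreement is not reproduced because its definition is not given in this table.

```python
STAGES = ("W", "N1", "N2", "N3", "R")  # stage labels as used in the table


def scoring_metrics(reference, scorer, labels=STAGES):
    """Agreement metrics between two epoch-by-epoch hypnograms.

    `reference` and `scorer` are equal-length sequences of stage labels.
    Values are returned as fractions in [0, 1]; the table reports them in %.
    """
    if len(reference) != len(scorer):
        raise ValueError("both hypnograms must contain the same number of epochs")
    n = len(reference)
    if n == 0:
        raise ValueError("hypnograms are empty")
    idx = {lab: i for i, lab in enumerate(labels)}
    k = len(labels)

    # Confusion matrix: rows = reference stage, columns = scorer stage.
    cm = [[0] * k for _ in range(k)]
    for ref, hyp in zip(reference, scorer):
        cm[idx[ref]][idx[hyp]] += 1

    support = [sum(cm[i]) for i in range(k)]                          # TP + FN
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # TP + FP
    agree = [cm[i][i] for i in range(k)]                              # TP

    # Per-class F1 = 2TP / (2TP + FP + FN). Stages absent from both hypnograms
    # are skipped (an assumed convention, matching scikit-learn's default).
    per_class = {
        lab: 2 * agree[i] / (support[i] + predicted[i])
        for i, lab in enumerate(labels)
        if support[i] + predicted[i] > 0
    }

    accuracy = sum(agree) / n
    macro_f1 = sum(per_class.values()) / len(per_class)
    weighted_f1 = sum(per_class[lab] * support[idx[lab]] for lab in per_class) / n

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = sum(support[i] * predicted[i] for i in range(k)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else float("nan")

    return {"accuracy": accuracy, "macro_f1": macro_f1, "kappa": kappa,
            "weighted_f1": weighted_f1, "per_class_f1": per_class}


if __name__ == "__main__":
    ref = ["W", "W", "N1", "N2", "N2", "N3", "R", "R"]
    hyp = ["W", "N1", "N1", "N2", "N2", "N2", "R", "W"]
    m = scoring_metrics(ref, hyp)
    # prints: Acc=62.5% MF1=52.7% kappa=0.52
    print(f"Acc={100 * m['accuracy']:.1f}% MF1={100 * m['macro_f1']:.1f}% "
          f"kappa={m['kappa']:.2f}")
```

In the toy example, the scorer misses the only N3 epoch. That drives N3's F1 to 0 and pulls MF1 well below accuracy. The same effect is visible in the table, where low N1 and N3 scores lower MF1 relative to %Acc.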

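Overall metrics: SA, Acc., MF1, κ, F1. Per-class F1-score (%): W, N1, N2, N3, R.

| Dataset | Scorer | SA | Acc. (%) | MF1 (%) | κ | F1 (%) | W | N1 | N2 | N3 | R |
|---|---|---|---|---|---|---|---|---|---|---|---|
| IS-RC | Scorer-1 | 0.79 | 83.0 | 69.5 | 0.72 | 83.8 | 83.1 | 47.2 | 87.3 | 48.0 | 82.1 |
| IS-RC | Scorer-2 | 0.81 | 89.4 | 72.8 | 0.82 | 89.2 | 91.3 | 57.6 | 92.5 | 32.9 | 89.8 |
| IS-RC | Scorer-3 | 0.53 | 40.7 | 26.5 | 0.11 | 40.8 | 29.8 | 14.7 | 54.5 | 17.9 | 15.6 |
| IS-RC | Scorer-4 | 0.52 | 38.9 | 26.1 | 0.12 | 40.5 | 28.6 | 14.7 | 54.2 | 15.4 | 17.5 |
| IS-RC | Scorer-5 | 0.70 | 73.7 | 61.6 | 0.63 | 75.8 | 88.7 | 36.9 | 70.2 | 25.8 | 86.2 |
| IS-RC | Scorer-6 | 0.79 | 87.2 | 77.2 | 0.81 | 88.2 | 92.5 | 54.6 | 89.4 | 59.8 | 89.5 |
| IS-RC | Average | 0.69 | 68.7 | 55.5 | 0.53 | 69.7 | 68.9 | 37.6 | 74.7 | 33.3 | 63.5 |
| DOD-H | Scorer-1 | 0.88 | 87.0 | 81.5 | 0.81 | 87.4 | 87.5 | 60.0 | 89.4 | 84.8 | 85.7 |
| DOD-H | Scorer-2 | 0.91 | 89.3 | 84.1 | 0.84 | 89.7 | 87.4 | 65.1 | 91.6 | 84.3 | 92.2 |
| DOD-H | Scorer-3 | 0.92 | 90.6 | 84.5 | 0.86 | 90.4 | 89.9 | 67.5 | 92.1 | 77.9 | 95.3 |
| DOD-H | Scorer-4 | 0.84 | 82.6 | 76.7 | 0.75 | 83.1 | 76.5 | 49.1 | 85.4 | 80.7 | 92.0 |
| DOD-H | Scorer-5 | 0.92 | 89.9 | 83.6 | 0.85 | 89.9 | 86.7 | 66.0 | 92.1 | 81.0 | 92.2 |
| DOD-H | Average | 0.89 | 87.9 | 82.1 | 0.82 | 88.1 | 85.5 | 61.5 | 90.0 | 81.7 | 91.5 |
| DOD-O | Scorer-1 | 0.87 | 85.0 | 75.1 | 0.77 | 84.6 | 90.0 | 49.5 | 85.2 | 67.6 | 83.3 |
| DOD-O | Scorer-2 | 0.87 | 85.0 | 78.2 | 0.78 | 86.0 | 89.3 | 58.4 | 85.4 | 69.1 | 88.6 |
| DOD-O | Scorer-3 | 0.88 | 86.0 | 75.0 | 0.78 | 84.6 | 91.0 | 54.3 | 86.5 | 56.1 | 87.0 |
| DOD-O | Scorer-4 | 0.88 | 86.7 | 77.7 | 0.80 | 87.2 | 91.2 | 59.3 | 89.4 | 62.9 | 85.8 |
| DOD-O | Scorer-5 | 0.91 | 89.9 | 82.3 | 0.84 | 90.0 | 93.7 | 68.3 | 90.7 | 70.5 | 88.2 |
| DOD-O | Average | 0.88 | 86.5 | 77.6 | 0.79 | 86.4 | 91.0 | 58.0 | 87.3 | 65.2 | 86.5 |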
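Under the standard definition, MF1 is the unweighted mean of the five per-class F1-scores, so the table can be checked row by row. The sketch below transcribes the scorer rows of Table 3 (the Average rows are omitted). It confirms that each printed MF1 matches the mean of W, N1, N2, N3, and R to within the roughly 0.1 allowed by one-decimal rounding. The names `ROWS`, `TOLERANCE`, and `mf1_gaps` are illustrative, not from the source.

```python
# Scorer rows transcribed from Table 3: (dataset, scorer, MF1, per-class F1)
# with per-class F1 in the order W, N1, N2, N3, R (all in %).
ROWS = [
    ("IS-RC", "Scorer-1", 69.5, (83.1, 47.2, 87.3, 48.0, 82.1)),
    ("IS-RC", "Scorer-2", 72.8, (91.3, 57.6, 92.5, 32.9, 89.8)),
    ("IS-RC", "Scorer-3", 26.5, (29.8, 14.7, 54.5, 17.9, 15.6)),
    ("IS-RC", "Scorer-4", 26.1, (28.6, 14.7, 54.2, 15.4, 17.5)),
    ("IS-RC", "Scorer-5", 61.6, (88.7, 36.9, 70.2, 25.8, 86.2)),
    ("IS-RC", "Scorer-6", 77.2, (92.5, 54.6, 89.4, 59.8, 89.5)),
    ("DOD-H", "Scorer-1", 81.5, (87.5, 60.0, 89.4, 84.8, 85.7)),
    ("DOD-H", "Scorer-2", 84.1, (87.4, 65.1, 91.6, 84.3, 92.2)),
    ("DOD-H", "Scorer-3", 84.5, (89.9, 67.5, 92.1, 77.9, 95.3)),
    ("DOD-H", "Scorer-4", 76.7, (76.5, 49.1, 85.4, 80.7, 92.0)),
    ("DOD-H", "Scorer-5", 83.6, (86.7, 66.0, 92.1, 81.0, 92.2)),
    ("DOD-O", "Scorer-1", 75.1, (90.0, 49.5, 85.2, 67.6, 83.3)),
    ("DOD-O", "Scorer-2", 78.2, (89.3, 58.4, 85.4, 69.1, 88.6)),
    ("DOD-O", "Scorer-3", 75.0, (91.0, 54.3, 86.5, 56.1, 87.0)),
    ("DOD-O", "Scorer-4", 77.7, (91.2, 59.3, 89.4, 62.9, 85.8)),
    ("DOD-O", "Scorer-5", 82.3, (93.7, 68.3, 90.7, 70.5, 88.2)),
]

# Each printed value is rounded to one decimal (error <= 0.05), so the
# recomputed mean and the printed MF1 can differ by at most about 0.1.
TOLERANCE = 0.1 + 1e-9


def mf1_gaps(rows):
    """Return (dataset, scorer, printed MF1, recomputed mean, |gap|) per row."""
    out = []
    for dataset, scorer, mf1, per_class in rows:
        mean = sum(per_class) / len(per_class)  # unweighted mean over stages
        out.append((dataset, scorer, mf1, mean, abs(mean - mf1)))
    return out


if __name__ == "__main__":
    for dataset, scorer, mf1, mean, gap in mf1_gaps(ROWS):
        status = "ok" if gap <= TOLERANCE else "MISMATCH"
        print(f"{dataset:6} {scorer:9} MF1={mf1:5.1f} mean={mean:6.2f} {status}")
```

All 16 scorer rows pass. The largest gap is 0.04 percentage points, which is consistent with MF1 being the plain average of the per-class F1-scores.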