2021 Jul 1;17(7):1343–1354. doi: 10.5664/jcsm.9192

Table 2.

Kappa values and accuracy for manual vs CReSS sleep staging.

| Sleep Stage Discrimination | MESA (n = 296): Kappa (95% CI) | MESA: Macro-F1 (95% CI) | MESA: Weighted Macro-F1 (95% CI) | MESA: Percent Accuracy | SHHS (n = 296): Kappa (95% CI) | SHHS: Macro-F1 (95% CI) | SHHS: Weighted Macro-F1 (95% CI) | SHHS: Percent Accuracy |
|---|---|---|---|---|---|---|---|---|
| CReSS Applied to Heart Rate and Airflow Signals | | | | | | | | |
| Wake/LS/DS/REM sleep | 0.643 (0.641–0.645) | 0.728 (0.717–0.740) | 0.777 (0.768–0.785) | 77.6 | 0.578 (0.576–0.581) | 0.692 (0.679–0.706) | 0.739 (0.728–0.750) | 73.3 |
| Wake/sleep | 0.711 (0.708–0.714) | 0.855 (0.843–0.864) | 0.897 (0.888–0.903) | 89.3 | 0.634 (0.631–0.638) | 0.816 (0.803–0.827) | 0.890 (0.882–0.897) | 88.3 |
| NREM sleep/REM sleep | 0.790 (0.786–0.793) | 0.895 (0.885–0.902) | 0.936 (0.930–0.940) | 93.5 | 0.756 (0.752–0.759) | 0.878 (0.866–0.887) | 0.926 (0.919–0.931) | 92.4 |
| LS/DS | 0.469 (0.462–0.475) | 0.734 (0.718–0.748) | 0.870 (0.861–0.878) | 86.8 | 0.445 (0.439–0.451) | 0.721 (0.707–0.733) | 0.846 (0.837–0.854) | 83.6 |
| Wake/NREM sleep/REM sleep | 0.719 (0.717–0.722) | 0.819 (0.808–0.827) | 0.850 (0.841–0.857) | 84.8 | 0.665 (0.662–0.668) | 0.781 (0.765–0.793) | 0.832 (0.821–0.841) | 82.6 |
| CReSS Applied to Heart Rate, Airflow, and Thoracic Respiratory Effort Signals | | | | | | | | |
| Wake/LS/DS/REM sleep | 0.680 (0.678–0.682) | 0.748 (0.737–0.759) | 0.800 (0.791–0.807) | 79.8 | 0.635 (0.633–0.637) | 0.750 (0.723–0.769) | 0.805 (0.782–0.819) | 76.7 |
| Wake/sleep | 0.756 (0.753–0.759) | 0.878 (0.868–0.887) | 0.911 (0.903–0.918) | 90.8 | 0.705 (0.701–0.708) | 0.885 (0.862–0.900) | 0.909 (0.889–0.920) | 90.4 |
| NREM sleep/REM sleep | 0.823 (0.820–0.827) | 0.912 (0.906–0.918) | 0.944 (0.940–0.949) | 94.5 | 0.807 (0.803–0.810) | 0.908 (0.894–0.920) | 0.942 (0.932–0.950) | 93.8 |
| LS/DS | 0.461 (0.454–0.468) | 0.730 (0.715–0.745) | 0.874 (0.865–0.881) | 87.0 | 0.464 (0.458–0.470) | 0.735 (0.705–0.771) | 0.881 (0.863–0.895) | 84.3 |
| Wake/NREM sleep/REM sleep | 0.762 (0.760–0.764) | 0.847 (0.838–0.856) | 0.871 (0.863–0.878) | 86.9 | 0.729 (0.727–0.731) | 0.847 (0.826–0.864) | 0.869 (0.851–0.882) | 85.8 |

An F1 score is the harmonic mean of positive predictive value (precision) and sensitivity (recall); the macro-F1 score presented here for each sleep stage discrimination is the arithmetic mean of the F1 scores calculated across all sleep stages in that discrimination. In addition, we present weighted macro-F1 scores, in which the F1 scores are weighted according to the frequency of each sleep stage within the dataset. Discriminations of wake/LS/DS/REM sleep and wake/NREM sleep/REM sleep are based on all epochs. For the discrimination of NREM sleep/REM sleep, which does not include wake, we transformed the confusion matrix by removing the wake column and row; the same transformation was undertaken for the LS/DS discrimination by removing both wake and REM sleep. CI = confidence interval, CReSS = CardioRespiratory Sleep Staging, DS = deep sleep (corresponding to N3), LS = light sleep (corresponding to N1 + N2), NREM = non-REM, REM = rapid eye movement.
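The metric definitions in this footnote can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that rows of the confusion matrix are manual stages and columns are automated stages, that "kappa" means Cohen's kappa, and that the weighted score uses the manual-stage frequencies (row sums) as weights. The function names and the matrix `example_cm` are hypothetical, and the counts are invented for illustration only; they are not study data.

```python
def per_class_f1(cm):
    """F1 per stage from a square confusion matrix (rows = manual, cols = automated)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column sum: epochs scored k by the algorithm
        actual = sum(cm[k])                          # row sum: epochs scored k manually
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        # Harmonic mean of precision and recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


def macro_f1(cm):
    """Arithmetic mean of the per-stage F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


def weighted_f1(cm):
    """Per-stage F1 scores weighted by manual-stage frequency (assumed weighting)."""
    total = sum(sum(row) for row in cm)
    return sum(f * sum(row) / total for f, row in zip(per_class_f1(cm), cm))


def accuracy(cm):
    """Fraction of epochs on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for chance (assumed kappa variant)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = accuracy(cm)
    p_expected = sum(
        sum(cm[k]) * sum(cm[i][k] for i in range(n)) for k in range(n)
    ) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def drop_stages(cm, drop):
    """Remove the rows and columns of the given stage indices."""
    keep = [k for k in range(len(cm)) if k not in drop]
    return [[cm[i][j] for j in keep] for i in keep]


# Hypothetical epoch counts, stage order: wake, LS, DS, REM (not study data)
example_cm = [
    [80, 15, 0, 5],
    [10, 150, 10, 10],
    [0, 20, 40, 0],
    [5, 10, 0, 45],
]

nrem_rem_cm = drop_stages(example_cm, {0})   # NREM/REM: remove wake
ls_ds_cm = drop_stages(example_cm, {0, 3})   # LS/DS: remove wake and REM

print("kappa:", cohen_kappa(example_cm))
print("macro-F1:", macro_f1(example_cm))
print("weighted F1:", weighted_f1(example_cm))
print("accuracy:", accuracy(example_cm))
```

Note that `nrem_rem_cm` still keeps LS and DS as separate stages after wake is removed; merging them into a single NREM class would be a further step that this sketch does not cover.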