Table 2.
Kappa values and accuracy for manual vs CReSS sleep staging.
| Sleep Stage Discrimination | MESA Dataset (n = 296) | | | | SHHS Dataset (n = 296) | | | |
|---|---|---|---|---|---|---|---|---|
| | Kappa (95% CI) | Macro-F1 (95% CI) | Weighted Macro-F1 (95% CI) | Percent Accuracy | Kappa (95% CI) | Macro-F1 (95% CI) | Weighted Macro-F1 (95% CI) | Percent Accuracy |
| CReSS Applied to Heart Rate and Airflow Signals | ||||||||
| Wake/LS/DS/REM sleep | 0.643 (0.641–0.645) | 0.728 (0.717–0.740) | 0.777 (0.768–0.785) | 77.6 | 0.578 (0.576–0.581) | 0.692 (0.679–0.706) | 0.739 (0.728–0.750) | 73.3 |
| Wake/sleep | 0.711 (0.708–0.714) | 0.855 (0.843–0.864) | 0.897 (0.888–0.903) | 89.3 | 0.634 (0.631–0.638) | 0.816 (0.803–0.827) | 0.890 (0.882–0.897) | 88.3 |
| NREM sleep/REM sleep | 0.790 (0.786–0.793) | 0.895 (0.885–0.902) | 0.936 (0.930–0.940) | 93.5 | 0.756 (0.752–0.759) | 0.878 (0.866–0.887) | 0.926 (0.919–0.931) | 92.4 |
| LS/DS | 0.469 (0.462–0.475) | 0.734 (0.718–0.748) | 0.870 (0.861–0.878) | 86.8 | 0.445 (0.439–0.451) | 0.721 (0.707–0.733) | 0.846 (0.837–0.854) | 83.6 |
| Wake/NREM sleep/REM sleep | 0.719 (0.717–0.722) | 0.819 (0.808–0.827) | 0.850 (0.841–0.857) | 84.8 | 0.665 (0.662–0.668) | 0.781 (0.765–0.793) | 0.832 (0.821–0.841) | 82.6 |
| CReSS Applied to Heart Rate, Airflow, and Thoracic Respiratory Effort Signals | ||||||||
| Wake/LS/DS/REM sleep | 0.680 (0.678–0.682) | 0.748 (0.737–0.759) | 0.800 (0.791–0.807) | 79.8 | 0.635 (0.633–0.637) | 0.750 (0.723–0.769) | 0.805 (0.782–0.819) | 76.7 |
| Wake/sleep | 0.756 (0.753–0.759) | 0.878 (0.868–0.887) | 0.911 (0.903–0.918) | 90.8 | 0.705 (0.701–0.708) | 0.885 (0.862–0.900) | 0.909 (0.889–0.920) | 90.4 |
| NREM sleep/REM sleep | 0.823 (0.820–0.827) | 0.912 (0.906–0.918) | 0.944 (0.940–0.949) | 94.5 | 0.807 (0.803–0.810) | 0.908 (0.894–0.920) | 0.942 (0.932–0.950) | 93.8 |
| LS/DS | 0.461 (0.454–0.468) | 0.730 (0.715–0.745) | 0.874 (0.865–0.881) | 87.0 | 0.464 (0.458–0.470) | 0.735 (0.705–0.771) | 0.881 (0.863–0.895) | 84.3 |
| Wake/NREM sleep/REM sleep | 0.762 (0.760–0.764) | 0.847 (0.838–0.856) | 0.871 (0.863–0.878) | 86.9 | 0.729 (0.727–0.731) | 0.847 (0.826–0.864) | 0.869 (0.851–0.882) | 85.8 |
An F1 score is the harmonic mean of positive predictive value (precision) and sensitivity (recall); the macro-F1 score presented here for each sleep stage discrimination is the arithmetic mean of the F1 scores calculated across all sleep stages. In addition, we present weighted macro-F1 scores, in which the F1 score of each sleep stage is weighted according to the frequency of that stage within the dataset. Discriminations of wake/LS/DS/REM sleep and wake/NREM sleep/REM sleep are based on all epochs. For the NREM sleep/REM sleep discrimination, which does not include wake, we transformed the confusion matrix by removing the wake column and row; the same transformation was undertaken for the LS/DS discrimination by removing both wake and REM sleep. CI = confidence interval, CReSS = CardioRespiratory Sleep Staging, DS = deep sleep (corresponding to N3), LS = light sleep (corresponding to N1 + N2), NREM = non-REM, REM = rapid eye movement.
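
The definitions in this footnote can be illustrated with a minimal sketch. It is not the analysis code used for the table: the function names, the row/column convention (rows = manual staging, columns = CReSS staging), and the confusion matrix `toy_cm` are hypothetical and chosen only for illustration. The sketch computes the per-stage F1 score, the macro-F1 (arithmetic mean), and the frequency-weighted F1, and then derives an LS/DS matrix by removing the wake and REM rows and columns.

```python
from typing import Iterable, List

Matrix = List[List[int]]


def per_class_f1(cm: Matrix) -> List[float]:
    """Per-class F1 from a square confusion matrix.

    Assumed convention: rows = reference (manual) stage, columns = predicted stage.
    F1 = 2*TP / (2*TP + FP + FN), which equals the harmonic mean of
    precision and recall.
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # reference k, predicted as another stage
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, reference is another stage
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm: Matrix) -> float:
    """Arithmetic mean of the per-class F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


def weighted_f1(cm: Matrix) -> float:
    """Per-class F1 scores averaged with weights equal to each class's
    frequency among the reference labels (row sums)."""
    support = [sum(row) for row in cm]
    total = sum(support)
    return sum(f * s for f, s in zip(per_class_f1(cm), support)) / total


def drop_classes(cm: Matrix, drop: Iterable[int]) -> Matrix:
    """Remove the rows and columns of the given class indices."""
    dropped = set(drop)
    keep = [i for i in range(len(cm)) if i not in dropped]
    return [[cm[i][j] for j in keep] for i in keep]


# Hypothetical epoch counts, stage order: wake, LS, DS, REM (not study data).
toy_cm = [
    [80, 15, 0, 5],
    [10, 150, 20, 20],
    [0, 25, 50, 0],
    [5, 10, 0, 60],
]

# LS/DS discrimination: remove wake (index 0) and REM (index 3).
ls_ds_cm = drop_classes(toy_cm, [0, 3])

print("4-stage macro-F1:   ", round(macro_f1(toy_cm), 3))
print("4-stage weighted F1:", round(weighted_f1(toy_cm), 3))
print("LS/DS macro-F1:     ", round(macro_f1(ls_ds_cm), 3))
```

Because the weighted score follows stage frequency, it is pulled toward the F1 of the most common stage, whereas the macro-F1 treats every stage equally; the two diverge most when a rare stage is classified poorly.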