Table 3.
DOD-H: Healthy controls, N = 25

Scorer | Fit | Wake | N1 | N2 | N3 | REM | Mean |
---|---|---|---|---|---|---|---|
Expert 1 | – | 0.83 ± 0.11 | 0.49 ± 0.15 | 0.86 ± 0.12 | 0.78 ± 0.24 | 0.84 ± 0.16 | 0.76 ± 0.11 |
Expert 2 | – | 0.83 ± 0.14 | 0.52 ± 0.11 | 0.88 ± 0.05 | 0.78 ± 0.23 | 0.89 ± 0.06 | 0.78 ± 0.07 |
Expert 3 | – | 0.84 ± 0.12 | 0.54 ± 0.13 | 0.88 ± 0.05 | 0.74 ± 0.25 | 0.93 ± 0.05 | 0.79 ± 0.07 |
Expert 4 | – | 0.73 ± 0.18 | 0.40 ± 0.15 | 0.83 ± 0.07 | 0.75 ± 0.22 | 0.90 ± 0.09 | 0.72 ± 0.11 |
Expert 5 | – | 0.83 ± 0.14 | 0.53 ± 0.12 | 0.89 ± 0.04 | 0.76 ± 0.24 | 0.90 ± 0.09 | 0.78 ± 0.08 |
U-Sleep | ✗ | 0.88 ± 0.10 | 0.56 ± 0.14 | 0.86 ± 0.05 | 0.73 ± 0.23 | 0.93 ± 0.05 | 0.79 ± 0.06 |
SimpleNet | ✓ | 0.83 ± 0.13 | 0.57 ± 0.14 | 0.90 ± 0.04 | 0.80 ± 0.23 | 0.90 ± 0.09 | 0.80 ± 0.07 |
DeepSleepNet | ✓ | 0.84 ± 0.10 | 0.56 ± 0.13 | 0.90 ± 0.05 | 0.79 ± 0.24 | 0.88 ± 0.10 | 0.79 ± 0.07 |
SeqSleepNet | ✓ | 0.81 ± 0.18 | 0.54 ± 0.14 | 0.87 ± 0.08 | 0.73 ± 0.25 | 0.86 ± 0.12 | 0.76 ± 0.11 |

DOD-O: OSA patients, N = 55

Scorer | Fit | Wake | N1 | N2 | N3 | REM | Mean |
---|---|---|---|---|---|---|---|
Expert 1 | – | 0.87 ± 0.11 | 0.38 ± 0.15 | 0.82 ± 0.13 | 0.59 ± 0.31 | 0.81 ± 0.25 | 0.69 ± 0.12 |
Expert 2 | – | 0.87 ± 0.09 | 0.46 ± 0.17 | 0.82 ± 0.11 | 0.61 ± 0.29 | 0.86 ± 0.22 | 0.72 ± 0.12 |
Expert 3 | – | 0.88 ± 0.09 | 0.42 ± 0.16 | 0.83 ± 0.13 | 0.46 ± 0.33 | 0.85 ± 0.22 | 0.69 ± 0.11 |
Expert 4 | – | 0.89 ± 0.09 | 0.46 ± 0.15 | 0.84 ± 0.07 | 0.52 ± 0.33 | 0.83 ± 0.24 | 0.71 ± 0.12 |
Expert 5 | – | 0.90 ± 0.08 | 0.48 ± 0.15 | 0.86 ± 0.08 | 0.62 ± 0.33 | 0.85 ± 0.22 | 0.74 ± 0.11 |
U-Sleep | ✗ | 0.89 ± 0.09 | 0.53 ± 0.14 | 0.85 ± 0.08 | 0.66 ± 0.30 | 0.88 ± 0.20 | 0.76 ± 0.10 |
SimpleNet | ✓ | 0.89 ± 0.09 | 0.52 ± 0.16 | 0.88 ± 0.11 | 0.63 ± 0.35 | 0.85 ± 0.22 | 0.75 ± 0.11 |
DeepSleepNet | ✓ | 0.86 ± 0.11 | 0.46 ± 0.17 | 0.87 ± 0.10 | 0.67 ± 0.30 | 0.84 ± 0.22 | 0.74 ± 0.12 |
SeqSleepNet | ✓ | 0.84 ± 0.13 | 0.46 ± 0.20 | 0.86 ± 0.10 | 0.59 ± 0.33 | 0.77 ± 0.28 | 0.71 ± 0.14 |
The highest scores among the human experts and U-Sleep are highlighted in bold. Scores where one of the trained ML models (last three rows) performed as well as or better than U-Sleep are underlined. Note, however, that these models were fit to the particular datasets, whereas U-Sleep saw no data from DOD-H or DOD-O during model building and training; this is indicated by checkmarks and crosses in the Fit column. Numbers shown are the mean ± 1 standard deviation of per-subject F1 scores, computed between the output of a single model or human expert and the consensus scores generated from the 4 (N − 1) remaining human annotators (when comparing a human to the consensus) or the 4 best human annotators (when comparing a model to the consensus).
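
The evaluation described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`majority_consensus`, `f1_per_stage`, `summarize`) are hypothetical, and three choices are assumptions, since the caption does not specify them: the consensus is formed by a plain epoch-wise majority vote with ties broken in favor of the earliest-listed scorer, the standard deviation is the sample standard deviation, and stages absent from both the prediction and the consensus of a subject are skipped.

```python
from collections import Counter
from statistics import mean, stdev

STAGES = ["Wake", "N1", "N2", "N3", "REM"]


def majority_consensus(scorings):
    """Epoch-wise majority vote over several scorers' hypnograms.

    `scorings` is a list of equally long label lists, one per scorer.
    Assumption: ties go to the earliest-listed scorer among the tied labels.
    """
    consensus = []
    for labels in zip(*scorings):
        counts = Counter(labels)
        top = max(counts.values())
        consensus.append(next(l for l in labels if counts[l] == top))
    return consensus


def f1_per_stage(pred, ref):
    """Per-stage F1 of one scorer (`pred`) against the consensus (`ref`)."""
    scores = {}
    for stage in STAGES:
        tp = sum(p == stage and r == stage for p, r in zip(pred, ref))
        fp = sum(p == stage and r != stage for p, r in zip(pred, ref))
        fn = sum(p != stage and r == stage for p, r in zip(pred, ref))
        denom = 2 * tp + fp + fn
        # None marks a stage that occurs in neither sequence (F1 undefined).
        scores[stage] = 2 * tp / denom if denom else None
    return scores


def summarize(per_subject_scores):
    """Mean and sample SD across subjects for each stage, skipping None."""
    summary = {}
    for stage in STAGES:
        vals = [s[stage] for s in per_subject_scores if s[stage] is not None]
        if not vals:
            summary[stage] = None
        else:
            sd = stdev(vals) if len(vals) > 1 else 0.0
            summary[stage] = (mean(vals), sd)
    return summary


# Toy usage: one expert scored against the consensus of three other scorers.
others = [
    ["Wake", "N1", "N2", "N2", "REM"],
    ["Wake", "N2", "N2", "N2", "REM"],
    ["Wake", "N2", "N2", "N3", "REM"],
]
expert = ["Wake", "N1", "N2", "N2", "REM"]
consensus = majority_consensus(others)
subject_scores = f1_per_stage(expert, consensus)
```

Collecting one `f1_per_stage` result per subject and passing the list to `summarize` yields per-stage entries of the form mean ± SD, analogous to the cells in the table.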