Table 3.
Confusion matrix statistics for the six subjects used to compare automated detection methods
A. Spindle Auto

| Subject | Precision | Recall | F1 | Overlap % |
|---|---|---|---|---|
| 1 | 0.86 | 0.37 | 0.52 | 49 |
| 2 | 0.99 | 0.28 | 0.44 | 50 |
| 3 | 0.99 | 0.30 | 0.47 | 36 |
| 4 | 0.98 | 0.33 | 0.49 | 39 |
| 5 | 0.96 | 0.37 | 0.53 | 39 |
| 6 | 0.97 | 0.30 | 0.46 | 36 |
| Mean ± SD | 0.96 ± 0.05 | 0.33 ± 0.04 | 0.48 ± 0.04 | 41 ± 6.3 |

B. Spindle Auto F1-optimized

| Subject | Precision | Recall | Max F1 | Overlap % | Threshold at max F1 |
|---|---|---|---|---|---|
| 1 | 0.74 | 0.76 | 0.75 | 62 | 0.93 |
| 2 | 0.85 | 0.79 | 0.82 | 61 | 0.45 |
| 3 | 0.91 | 0.90 | 0.90 | 56 | 0.19 |
| 4 | 0.85 | 0.82 | 0.83 | 48 | 0.67 |
| 5 | 0.87 | 0.88 | 0.88 | 58 | 0.56 |
| 6 | 0.86 | 0.89 | 0.88 | 57 | 0.24 |
| Mean ± SD | 0.85 ± 0.06 | 0.84 ± 0.06 | 0.84 ± 0.05 | 57 ± 5.1 | 0.51 ± 0.28 |

C. TFσ peak Auto

| Subject | Precision | Recall | F1 | Overlap % |
|---|---|---|---|---|
| 1 | 0.68 | 0.95 | 0.79 | 76 |
| 2 | 0.86 | 0.92 | 0.89 | 74 |
| 3 | 0.93 | 0.91 | 0.92 | 62 |
| 4 | 0.79 | 0.95 | 0.86 | 73 |
| 5 | 0.81 | 0.96 | 0.88 | 73 |
| 6 | 0.90 | 0.88 | 0.89 | 64 |
| Mean ± SD | 0.83 ± 0.09 | 0.93 ± 0.03 | 0.87 ± 0.04 | 70 ± 5.8 |
This table presents confusion matrix statistics (described in Methods) comparing several automated detection methods against hand-scored TFσ peaks. The methods are evaluated in six subjects from the control cohort who have full-night recordings and hand-scored TFσ peaks; these subjects are distinct from the six DREAMS database segments shown in Table 1. (A) Auto-detected spindles recapitulate the patterns observed for hand-scored spindles in Table 1, except that precision scores are now close to 1 owing to the more reliable scoring of spindles by the automated algorithm. (B) Automated spindle detection with F1-optimized thresholds shows both high precision and high recall, as expected given that the thresholds were chosen to maximize F1. However, the threshold that maximizes F1 varies widely across subjects (0.19 to 0.93), highlighting the difficulty of pre-selecting a single threshold for all subjects. (C) The automated TFσ peak detection algorithm also achieves high precision and recall relative to hand-scored TFσ peaks. Importantly, its F1-scores are at least as high in every subject as the best F1-score achievable by spindle detection with subject-specific thresholds (higher in five subjects and tied at two decimal places in subject 5). This result demonstrates the benefit of detecting TFσ peaks directly in the time-frequency domain and shows that the automated algorithm robustly emulates the hand-scoring of TFσ peaks.
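Precision, recall, and F1 in this table all follow from event-level counts of true positives (TP), false positives (FP), and false negatives (FN), and panel B additionally sweeps a detection threshold per subject to maximize F1. The Python sketch below illustrates these computations under stated assumptions only: events are taken to be (start, end) intervals, and a detection is counted as a true positive when it overlaps a hand-scored event that has not already been matched. The actual matching rule is defined in Methods and may differ, and the function names and toy data are hypothetical. The usage block also recomputes two values from the table (the subject 1 F1 in panel A and the mean ± SD of the panel B thresholds); the sample SD (n − 1) reproduces the reported 0.51 ± 0.28.

```python
# Hypothetical sketch: event-level precision, recall, F1, and a per-subject
# F1-maximizing threshold sweep. ASSUMPTION: events are (start, end) intervals
# and a detection is a true positive if it overlaps a reference event that has
# not already been matched. The paper's Methods may define matching differently.
from statistics import mean, stdev


def overlaps(a, b):
    """True if intervals a and b share any time (touching endpoints do not count)."""
    return a[0] < b[1] and b[0] < a[1]


def confusion_counts(detected, reference):
    """Greedy one-to-one matching of detections to reference events."""
    matched = set()
    tp = 0
    for d in detected:
        for i, r in enumerate(reference):
            if i not in matched and overlaps(d, r):
                matched.add(i)
                tp += 1
                break
    fp = len(detected) - tp   # detections with no reference partner
    fn = len(reference) - tp  # reference events never matched
    return tp, fp, fn


def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def scores(tp, fp, fn):
    """Return (precision, recall, F1) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


def best_threshold(candidates, reference, thresholds):
    """Return (max F1, threshold) over a threshold sweep.

    candidates: (start, end, score) triples from a hypothetical detector;
    only events with score >= threshold are kept. Ties keep the first threshold.
    """
    best_f1, best_t = -1.0, None
    for t in thresholds:
        kept = [(s, e) for s, e, score in candidates if score >= t]
        f1 = scores(*confusion_counts(kept, reference))[2]
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_f1, best_t


if __name__ == "__main__":
    # F1 recomputed from panel A, subject 1 (P = 0.86, R = 0.37).
    print(round(f1_from_pr(0.86, 0.37), 2))  # → 0.52

    # Mean ± sample SD (n - 1) of the panel B thresholds at max F1.
    th = [0.93, 0.45, 0.19, 0.67, 0.56, 0.24]
    print(f"{mean(th):.2f} ± {stdev(th):.2f}")  # → 0.51 ± 0.28

    # Invented toy data: three hand-scored events, four scored candidates.
    ref = [(0, 1), (5, 6), (10, 11)]
    cands = [(0.2, 0.8, 0.9), (5.1, 5.5, 0.4), (7, 8, 0.6), (10.2, 10.9, 0.3)]
    f1, t = best_threshold(cands, ref, [0.2, 0.5, 0.8])
    print(round(f1, 3), t)  # → 0.857 0.2
```

Greedy first-overlap matching is chosen here only because it is the simplest one-to-one rule; optimal assignment or a minimum-overlap-fraction criterion would be reasonable alternatives and can change the counts when events cluster closely.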