2023 Mar 6;6:33. doi: 10.1038/s41746-023-00784-0

Table 3.

(ii) Generalizability on different data centers with a heterogeneous dataset.

Datasets	U-Sleep-v1	U-Sleep-v1 (S)	U-Sleep-v1 (FT)
ABC	73.6 ± 11.4	71.4 ± 13.9	69.0 ± 12.5
CCSHS	84.9 ± 5.1	77.3 ± 7.2	77.3 ± 6.7
CFS	76.6 ± 11.6	70.2 ± 10.8	70.9 ± 10.2
CHAT	82.1 ± 6.5	72.9 ± 8.0	68.8 ± 8.7
DCSM	79.3 ± 9.3	71.5 ± 11.2	69.3 ± 10.5
HPAP	73.8 ± 10.8	68.9 ± 11.1	67.9 ± 12.5
MESA	72.7 ± 10.8	68.5 ± 14.3	68.7 ± 11.9
MROS	71.4 ± 12.1	61.7 ± 13.7	63.9 ± 13.2
PHYS	74.2 ± 10.7	72.9 ± 11.2	73.2 ± 11.4
SEDF-SC	77.8 ± 7.9	75.8 ± 8.0	77.9 ± 7.7
SEDF-ST	77.2 ± 10.1	64.3 ± 15.4	67.5 ± 12.4
SHHS	76.9 ± 9.7	70.9 ± 9.3	73.0 ± 8.9
SOF	74.8 ± 9.8	64.6 ± 12.6	67.5 ± 11.2
avg OA	76.5 ± 10.6	69.9 ± 11.9	70.2 ± 11.1
BSDB	72.5 ± 12.0 ^(DT)	77.6 ± 11.3	77.3 ± 11.4

Performance of U-Sleep-v1, pre-trained on the OA datasets and evaluated on the test sets of all the OA datasets and on the test set of the BSDB dataset (data splits in Supplementary Table 7 and Supplementary Table 8). We also report the performance of U-Sleep-v1 trained from scratch (S) or fine-tuned (FT) on the BSDB dataset and evaluated on the test sets of all the available datasets. All values are F1-scores (%F1), given as the mean and standard deviation (μ ± σ) computed across the recordings.
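
As an illustration of how a "μ ± σ across recordings" entry of this kind can be computed, the following minimal sketch scores each recording separately and then summarizes the per-recording scores. It rests on several assumptions that the caption does not state: the F1-score is taken as the unweighted (macro) average over sleep-stage classes, classes absent from both the labels and the predictions of a recording are skipped, and σ is the population standard deviation. The function names `macro_f1` and `summarize` are hypothetical and are not taken from the U-Sleep code.

```python
from statistics import mean, pstdev


def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1 for one recording (averaging scheme is an assumption)."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        if denom == 0:
            # Class absent from both labels and predictions: skip it (assumption).
            continue
        scores.append(2 * tp / denom)
    return mean(scores)


def summarize(recordings, labels):
    """Return (mu, sigma) of the per-recording %F1 scores.

    `recordings` is a list of (y_true, y_pred) pairs, one pair per recording.
    The population standard deviation is used here as an assumption.
    """
    per_recording = [100 * macro_f1(t, p, labels) for t, p in recordings]
    return mean(per_recording), pstdev(per_recording)


# Toy data: two short "recordings" with integer stage labels.
toy = [
    ([0, 1, 2], [0, 1, 2]),
    ([0, 0, 1, 1], [0, 1, 1, 1]),
]
mu, sigma = summarize(toy, labels=[0, 1, 2])
print(f"{mu:.1f} ± {sigma:.1f}")
```

Scoring each recording first and only then averaging weights every recording equally regardless of its length, which is what a spread "across the recordings" implies; pooling all epochs into one confusion matrix would give a single score with no per-recording standard deviation.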