. 2024 Jun 6;15:4853. doi: 10.1038/s41467-024-49027-0

Table 2.

Intraclass correlation results reflecting the model’s ability to reproduce gold-standard measures based on expert review of the videos

Outcome measure, ICC (95% CI)	Test set	1st place	2nd place	3rd place	4th place	5th place
% Time frozen	Private test	0.949** (0.85–0.98)	0.934** (0.80–0.98)	0.942** (0.83–0.98)	0.886** (0.69–0.96)	0.877** (0.67–0.96)
% Time frozen	Private+public test	0.869** (0.77–0.93)	0.884** (0.79–0.94)	0.898** (0.82–0.94)	0.870** (0.77–0.93)	0.852** (0.74–0.92)
No. of FOG episodes	Private test	0.763** (0.04–0.94)	0.869** (0.64–0.96)	0.717** (0.34–0.90)	0.093 (−0.22 to 0.50)	0.885** (0.68–0.96)
No. of FOG episodes	Private+public test	0.500** (0.18–0.71)	0.456** (0.18–0.67)	0.597** (0.30–0.78)	0.084 (−0.12 to 0.32)	0.346* (0.04–0.59)
FOG duration	Private test	0.991** (0.97–1.00)	0.991** (0.97–1.00)	0.985** (0.95–0.99)	0.965** (0.90–0.99)	0.985** (0.96–1.00)
FOG duration	Private+public test	0.955** (0.92–0.98)	0.944** (0.89–0.97)	0.965** (0.93–0.98)	0.950** (0.91–0.97)	0.907** (0.82–0.95)

*p < 0.05, **p < 0.001 (exact p-values are shown in Supplementary Table 2) in an ICC2(2,1) test. Note that for some models and outcome measures (e.g., the 1st place model and the no. of FOG episodes), performance on the combined public and private test sets was lower than on the private test set alone. Because the data were randomly divided into the different test sets, this finding is somewhat counterintuitive. Notably, it occurred for the number of FOG episodes, an outcome that was generally less robust than FOG duration or % time frozen (perhaps because the splitting or lumping of adjacent episodes affects the number of episodes much more than the duration or % time frozen).
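The splitting-or-lumping effect suggested in the note can be illustrated with a small sketch. The label sequences below are hypothetical (not data from the study), and `episode_stats` is an illustrative helper rather than the authors' scoring code. It assumes FOG is annotated as a per-sample 0/1 sequence and that an episode is a maximal run of 1s.

```python
from itertools import groupby


def episode_stats(labels):
    """Return (number of episodes, total frozen samples) for a 0/1 label sequence."""
    # groupby collapses consecutive equal labels into runs.
    runs = [(value, len(list(group))) for value, group in groupby(labels)]
    episode_lengths = [length for value, length in runs if value == 1]
    return len(episode_lengths), sum(episode_lengths)


# Hypothetical expert annotation: two episodes separated by a 2-sample gap.
expert = [0] * 10 + [1] * 40 + [0] * 2 + [1] * 40 + [0] * 8
# Hypothetical model output: identical, except that the short gap is filled in.
model = [0] * 10 + [1] * 82 + [0] * 8

print(episode_stats(expert))  # (2, 80)
print(episode_stats(model))   # (1, 82)
```

In this made-up case, filling a two-sample gap halves the episode count (2 to 1), while the frozen time changes only from 80 to 82 samples out of 100.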

ICCs intraclass correlation coefficients.
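For readers unfamiliar with the statistic, the following is a minimal sketch of a single-measure, two-way random-effects, absolute-agreement ICC, i.e., ICC(2,1) in the Shrout and Fleiss notation. It assumes that this is the form the footnote refers to; the software actually used for the table is not stated here. The function name `icc_2_1` and the example values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater."""
    n = len(ratings)       # number of subjects (e.g., recordings)
    k = len(ratings[0])    # number of raters (e.g., expert and model)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-subjects mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical % time frozen per subject: [expert, model].
example = [[12.0, 14.5], [40.0, 37.0], [5.0, 6.5], [63.0, 60.0], [25.0, 29.0]]
print(round(icc_2_1(example), 3))
```

Because ICC(2,1) measures absolute agreement, a constant offset between the model and the expert lowers the coefficient even when the two are perfectly correlated.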