Table 2.
Intraclass correlation results reflecting each model’s ability to reproduce gold-standard measures based on expert review of the videos
ICCs (95% CI) | Test set | 1st place | 2nd place | 3rd place | 4th place | 5th place |
---|---|---|---|---|---|---|
% Time frozen | Private test | 0.949** (0.85–0.98) | 0.934** (0.80–0.98) | 0.942** (0.83–0.98) | 0.886** (0.69–0.96) | 0.877** (0.67–0.96) |
% Time frozen | Private+public test | 0.869** (0.77–0.93) | 0.884** (0.79–0.94) | 0.898** (0.82–0.94) | 0.870** (0.77–0.93) | 0.852** (0.74–0.92) |
No. of FOG episodes | Private test | 0.763** (0.04–0.94) | 0.869** (0.64–0.96) | 0.717** (0.34–0.90) | 0.093 (−0.22 to 0.50) | 0.885** (0.68–0.96) |
No. of FOG episodes | Private+public test | 0.500** (0.18–0.71) | 0.456** (0.18–0.67) | 0.597** (0.30–0.78) | 0.084 (−0.12 to 0.32) | 0.346* (0.04–0.59) |
FOG duration | Private test | 0.991** (0.97–1.00) | 0.991** (0.97–1.00) | 0.985** (0.95–0.99) | 0.965** (0.90–0.99) | 0.985** (0.96–1.00) |
FOG duration | Private+public test | 0.955** (0.92–0.98) | 0.944** (0.89–0.97) | 0.965** (0.93–0.98) | 0.950** (0.91–0.97) | 0.907** (0.82–0.95) |
*p < 0.05, **p < 0.001 (exact p-values are shown in Supplementary Table 2) in an ICC2(2,1) test. Note that for some models and outcome measures (e.g., the 1st place model, no. of FOG episodes), performance on the combined public and private test sets was lower than on the private test set alone. Because the data were randomly divided into the test sets, this finding is somewhat counterintuitive. Notably, it occurred for the number of FOG episodes, an outcome that was generally less robust than FOG duration or % time frozen (perhaps because splitting or lumping adjacent episodes affects the number of episodes much more than the duration or % time frozen).
ICCs intraclass correlation coefficients.
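For readers unfamiliar with the statistic, the following is a minimal sketch of how an ICC of this kind can be computed. It assumes that the "ICC2(2,1)" named in the footnote corresponds to the Shrout and Fleiss ICC(2,1) form (two-way random effects, absolute agreement, single measurement); the source does not state which software or exact variant was used. The function name `icc_2_1` and the example values are hypothetical and are not taken from the study data.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) in the Shrout and Fleiss sense (assumed variant, see lead-in).

    ratings: array of shape (n_subjects, k_raters). Here the two "raters"
    would be the expert gold standard and the model output.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical values: column 0 = expert review, column 1 = model output.
perfect = [[1, 1], [2, 2], [3, 3]]
offset = [[1, 2], [2, 3], [3, 4]]   # model overestimates by a constant 1

print(icc_2_1(perfect))  # 1.0
print(icc_2_1(offset))   # about 0.667
```

Because this form measures absolute agreement, a constant bias between the model and the expert lowers the coefficient even when the two are perfectly correlated, as the second example shows.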