npj Parkinson's Disease. 2020 Dec 8;6:36. doi: 10.1038/s41531-020-00135-w

Table 2. Population-level and subject-level inter-rater agreement for clinical MDS-UPDRS III subscores.

| Subtest | Number of categories | Kappa (SEM) | 95% confidence interval | Agreement | p value | Subject-level agreement (a/b/c, %) |
| --- | --- | --- | --- | --- | --- | --- |
| Left hand rest tremor | 4 | 0.68 (0.06) | 0.65−0.71 | Substantial | <0.00001 | 96.7/1.6/1.6 |
| Right hand rest tremor | 4 | 0.62 (0.06) | 0.58−0.65 | Substantial | <0.00001 | 72.6/27.4/0 |
| Left leg rest tremor | 3 | 0.58 (0.07) | 0.54−0.61 | Moderate | <0.00001 | 93.5/6.45/0 |
| Right leg rest tremor | 3 | 0.72 (0.06) | 0.69−0.75 | Substantial | <0.00001 | 82.5/17.5/0 |
| Left hand postural tremor | 4 | 0.76 (0.06) | 0.73−0.79 | Substantial | <0.00001 | 79.3/20.6/0 |
| Right hand postural tremor | 4 | 0.75 (0.06) | 0.72−0.78 | Substantial | <0.00001 | 82.5/17.5/0 |
| Left hand kinetic tremor | 3 | 0.62 (0.07) | 0.59−0.66 | Substantial | <0.00001 | 69.8/30.2/0 |
| Right hand kinetic tremor | 3 | 0.45 (0.07) | 0.41−0.49 | Moderate | <0.00001 | 61.9/38.1/0 |
| Left fingertap | 5 | 0.54 (0.04) | 0.52−0.56 | Moderate | <0.00001 | 50.0/48.4/1.6 |
| Right fingertap | 4 | 0.64 (0.05) | 0.61−0.66 | Substantial | <0.00001 | 64.5/35.5/0 |
| Left pronation/supination | 5 | 0.42 (0.05) | 0.40−0.44 | Moderate | <0.00001 | 42.9/52.4/4.8 |
| Right pronation/supination | 4 | 0.37 (0.05) | 0.34−0.39 | Fair | <0.00001 | 42.9/52.4/4.8 |
| Left leg agility | 5 | 0.55 (0.05) | 0.53−0.58 | Moderate | <0.00001 | 57.1/41.3/1.6 |
| Right leg agility | 4 | 0.57 (0.06) | 0.54−0.60 | Moderate | <0.00001 | 61.9/38.1/0 |

The population-level inter-rater agreement for each MDS-UPDRS III subscore was calculated using Fleiss' Kappa, which ranges from −1 to 1: 0 indicates chance agreement, 1 indicates complete agreement, and −1 indicates complete disagreement. Kappa values are shown for each subtest together with the standard error of the mean (SEM) and the 95% confidence interval. The number of rating categories observed in the sample (across all raters) is also shown. All estimates of agreement were highly significant. Subject-level agreement was calculated as the percentage of subjects for whom the blinded rating clinicians (a) agreed completely (all 3 raters agreed), (b) agreed moderately (2 of 3 raters agreed), or (c) disagreed (all 3 ratings were different); this is reported as a/b/c in the table above. Complete disagreement was rare (<5% for all subitems).
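For illustration, a minimal sketch of the population-level computation for one subtest, assuming a subjects × raters matrix of integer subscores; the variable names and the 3-rater toy data are hypothetical (not from the study), and statsmodels supplies Fleiss' Kappa directly:

```python
# Minimal sketch: population-level Fleiss' Kappa for one subtest.
# `ratings` is a hypothetical (subjects x raters) array of integer
# MDS-UPDRS III subscores; the toy data below is illustrative only.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

ratings = np.array([
    [0, 0, 1],   # subject 1: scores from the 3 blinded raters
    [2, 2, 2],   # subject 2
    [1, 0, 1],   # subject 3
    [3, 3, 2],   # subject 4
])

# Convert rater-wise scores into a (subjects x categories) count table,
# then compute Fleiss' Kappa over the categories observed in the sample.
table, categories = aggregate_raters(ratings)
kappa = fleiss_kappa(table, method='fleiss')
print(f"Fleiss' Kappa = {kappa:.2f} over {len(categories)} observed categories")
```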
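A companion sketch of the subject-level a/b/c tally described in the footnote, again with hypothetical names and toy data; it counts, per subject, whether all three raters matched, exactly two matched, or all three differed:

```python
# Minimal sketch: subject-level agreement (a/b/c percentages) for 3 raters.
# `ratings` is a hypothetical (subjects x 3) array of integer subscores.
import numpy as np

def abc_agreement(ratings: np.ndarray) -> tuple[float, float, float]:
    a = b = c = 0
    for row in ratings:
        n_unique = len(set(row))
        if n_unique == 1:      # (a) all 3 raters agreed
            a += 1
        elif n_unique == 2:    # (b) exactly 2 raters agreed
            b += 1
        else:                  # (c) all 3 ratings were different
            c += 1
    n = len(ratings)
    return 100 * a / n, 100 * b / n, 100 * c / n

ratings = np.array([[0, 0, 1], [2, 2, 2], [1, 0, 1], [3, 3, 2]])
print("a/b/c = %.1f/%.1f/%.1f" % abc_agreement(ratings))
```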