Author manuscript; available in PMC: 2011 Nov 1.

Published in final edited form as: Arthritis Care Res (Hoboken). 2010 Jun 25;62(11):1533–1541. doi: 10.1002/acr.20280

Table 4.

Final results for the best-performing definitions of improvement (DI), all with kappa > 0.8. Definitions are ordered by final score.

Definitions	Chi square^*	% Sensitivity	% Specificity	% False Neg	% False Pos	AUC	Kappa	Rank^#	Final Score^#
3 of any 6 improved by at least 20%, no more than 1 worsened by more than 30% which cannot be muscle strength	90.3	98	87	9	3	92	0.86	131	113
3 of any 6 improved by at least 20%, no more than 2 worsened by ≥ 25%, which cannot be muscle strength (IMACS definition P1) (13)	90.3	98	87	9	3	92	0.86	104	90
3 of any 6 improved by at least 20%, no more than 2 worsened by more than 30% which cannot be muscle strength	90.3	98	87	9	3	92	0.86	81	70
2 of any 6 improved by at least 40%, no more than 1 worsened by more than 30% which cannot be muscle strength	85.2	97	87	13	3	92	0.84	61	51
2 of any 6 improved by at least 30%, no more than 1 worsened by more than 30% which cannot be muscle strength	90.1	100	78	0	5	89	0.85	46	39
3 of any 6 improved by at least 20%, no more than 1 worsened by more than 30%	84.3	98	83	10	4	90	0.83	36	30
3 of any 6 improved by at least 20%, no more than 2 worsened by more than 30%	84.3	98	83	10	4	90	0.83	17	14
3 of any 6 improved by at least 20%, no more than 2 worsened by ≥ 25% (IMACS definition P2) (13)	84.3	98	83	10	4	90	0.83	13	11
2 of any 6 improved by at least 40%, no more than 1 worsened by more than 30%	79.2	97	83	14	4	90	0.81	13	11
2 of any 6 improved by at least 40%, no more than 2 worsened by more than 30%, which cannot be muscle strength	79.2	97	83	14	4	90	0.81	13	11
2 of any 6 improved by at least 30%, no more than 2 worsened by more than 30%, which cannot be muscle strength	84.3	100	74	0	6	87	0.82	6	5
3 of any 6 improved by at least 20% (IMACS definition P3) (13)	84.3	98	83	10	4	90	0.83	3	3
2 of any 6 improved by at least 30%, no more than 1 worsened by more than 30%	84.3	100	74	0	6	87	0.82	1	1

^* All chi-square values correspond to a p value < 0.0001.

^# The ranks were obtained by asking the attendees of the consensus meeting to decide which of the best-performing definitions of improvement were easiest to use and most credible (content validity). Then, for each definition, the content validity rankings were summed and the resulting sum was multiplied by the corresponding value of the kappa statistic to obtain the “final score,” which incorporates both statistical criteria and the experts’ judgments.
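The arithmetic behind the final score can be illustrated with a short sketch. It assumes that the Rank column of Table 4 holds the summed content validity rankings, so that the final score is approximately Rank multiplied by kappa. The helper name `final_score` and the list `TABLE_4` are hypothetical and introduced only for illustration; the numbers are copied from the table.

```python
# Illustrative check of the "final score" described in the footnote.
# Assumption: the Rank column holds the summed content validity rankings.
# Each tuple is (summed rank, kappa, published final score), copied from Table 4.
TABLE_4 = [
    (131, 0.86, 113),
    (104, 0.86, 90),
    (81, 0.86, 70),
    (61, 0.84, 51),
    (46, 0.85, 39),
    (36, 0.83, 30),
    (17, 0.83, 14),
    (13, 0.83, 11),
    (13, 0.81, 11),
    (13, 0.81, 11),
    (6, 0.82, 5),
    (3, 0.83, 3),
    (1, 0.82, 1),
]


def final_score(rank_sum, kappa):
    """Summed ranking weighted by the kappa statistic (hypothetical helper)."""
    return rank_sum * kappa


# Largest gap between the recomputed product and the published score.
max_gap = max(abs(final_score(r, k) - s) for r, k, s in TABLE_4)

for rank_sum, kappa, published in TABLE_4:
    product = final_score(rank_sum, kappa)
    print(f"{rank_sum:>4} x {kappa:.2f} = {product:7.2f}  (published: {published})")
```

Because the kappa values in the table are printed to two decimal places, the recomputed products need not match the published scores exactly. With the values as listed, each product falls within one unit of the published final score.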