Table 2.
Report | ni | nj | nk | Statistics used | Notes |
---|---|---|---|---|---|
Item-by-item comparisons: items treated as distinct | | | | | |
Flocke, 2004[9] | 10 | 19 | 138 | Sensitivity = a/(a + c) | The item-by-item statistics are illustrated in the first sketch following the table. |
Stange, 1998[19] | 79 | 32 | 138 | | |
Ward, 1996[20] | 2 | 26 | 41 | | |
Wilson, 1994[21] | 3 | 20 | 16 | | |
Zuckerman, 1975[22] | 15 | 17 | 3 | | |
Stange, 1998[19] | 79 | 32 | 138 | Specificity = d/(b + d) | |
Ward, 1996[20] | 2 | 26 | 41 | | |
Wilson, 1994[21] | 3 | 20 | 16 | | |
Zuckerman, 1975[22] | 15 | 17 | 3 | | |
Dresselhaus, 2000*[8] | 7 | 8 | 20 | Agreement: comparison of (i) (a + b)/T and (ii) (a + c)/T | Agreement was assessed by comparing the proportion of recommended behaviours performed as measured by the direct and proxy measures. Three reports performed hypothesis tests, using analysis of variance [8], Cochran's Q-test [15], and McNemar's test [18]. |
Gerbert, 1988[11] | 4 | 3 | 63 | | |
Pbert, 1999*[15] | 15 | 9 | 12 | | |
Rethans, 1987*[18] | 24 | 1 | 25 | | |
Wilson, 1994[21] | 3 | 20 | 16 | | |
Gerbert, 1988*[11] | 4 | 3 | 63 | kappa = 2(ad - bc)/{(a + c)(c + d) + (b + d)(a + b)} | All three reports used kappa statistics to summarise agreement; two reports [11,15] also used them for hypothesis testing. |
Pbert, 1999*[15] | 15 | 9 | 12 | | |
Stange, 1998[19] | 79 | 32 | 138 | | |
Gerbert, 1988[11] | 4 | 3 | 63 | Disagreement = (i) c/T, (ii) b/T, (iii) (b + c)/T | Disagreement was assessed as the proportion of items recorded as performed by one measure but not by the other. |
Item-by-item comparisons: items treated as interchangeable within categories of behaviour | | | | | |
Luck, 2000[12] | NR | 8 | 20 | Sensitivity = a/(a + c) | |
Page, 1980[14] | 16-17 | 1 | 30 | | |
Rethans, 1994[17] | 25-36 | 3 | 35 | | |
Luck, 2000[12] | NR | 8 | 20 | Specificity = d/(b + d) | |
Page, 1980[14] | 16-17 | 1 | 30 | | |
Gerbert, 1986[10] | 20 | 3 | 63 | Convergent validity = (a + d)/T | Convergent validity was assessed as the proportion of items showing agreement. |
Page, 1980[14] | 16-17 | 1 | 30 | | |
Comparisons of summary scores for each consultation: summary scores were the number (or proportion) of recommended items performed (summary score: Σi xijk) | | | | | |
Luck, 2000*[12] | NR | 8 | 20 | Analysis of variance to compare means of scores on direct measure and proxy. | These summary-score comparisons are illustrated in the second sketch following the table. |
Pbert, 1999*[15] | 15 | 9 | 12 | | |
Rethans, 1987*[18] | 24 | 1 | 25 | Paired t-tests to compare means of scores on direct measure and proxy. | |
Pbert, 1999*[15] | 15 | 9 | 12 | Pearson correlation of the scores on direct measure and proxy. | |
Comparisons of summary scores for each clinician: summary scores were the number (or proportion) of recommended items performed (summary score: Σj Σi xijk) | | | | | |
O'Boyle, 2001[13] | 1 | NA | 120 | Comparison of means of scores on direct measure and proxy. | |
O'Boyle, 2001*[13] | 1 | NA | 120 | Pearson correlation of scores on direct measure and proxy. | |
Rethans, 1994*[17] | 25-36 | 3 | 25 | | |
Comparisons of summary scores for each consultation: summary scores were weighted sums of the number of recommended items performed (summary score: Σi ωi xijk) | | | | | |
Peabody, 2000*[16] | 21 | 8 | 28 | Analysis of variance to compare means of scores on direct measure and proxy. | |
Page, 1980*[14] | 16-17 | 1 | 30 | Pearson correlation of scores on direct measure and proxy. | |
a, b, c, d, and T are defined in Table 1; i = item; j = consultation; k = clinician; ni = average number of items per consultation; nj = average number of consultations per clinician; nk = number of clinicians assessed; ωi = weight for the ith item; xijk = 0 if the item is not performed and 1 if it is performed.
NR = Not reported; NA = Not applicable.
* This study used this method for hypothesis testing.
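
To make the item-by-item formulas concrete, the following minimal Python sketch evaluates them for a single hypothetical 2 × 2 table of counts. The counts a, b, c, d and their orientation (direct measure taken as the reference standard, so a + c is the number of items performed according to the direct measure) are assumptions for illustration, not data from any reviewed study.

```python
# Minimal sketch of the item-by-item statistics in Table 2.
# Counts follow Table 1's 2x2 layout, with the direct measure as
# the reference standard:
#   a = performed per both measures, b = performed per proxy only,
#   c = performed per direct measure only, d = performed per neither.
a, b, c, d = 40, 5, 10, 45          # hypothetical counts for illustration
T = a + b + c + d

sensitivity = a / (a + c)           # proxy detects behaviour the direct measure recorded
specificity = d / (b + d)           # proxy agrees the behaviour was not performed
agreement_proxy = (a + b) / T       # proportion performed per proxy measure
agreement_direct = (a + c) / T      # proportion performed per direct measure
convergent_validity = (a + d) / T   # proportion of items on which the measures agree
disagreement = (b + c) / T          # items recorded as performed by one measure only
kappa = 2 * (a * d - b * c) / ((a + c) * (c + d) + (b + d) * (a + b))

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, "
      f"kappa={kappa:.2f}, convergent validity={convergent_validity:.2f}")
```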
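
The summary-score comparisons can be sketched in the same spirit: each consultation yields one score per measure, and the paired scores are compared. The simulated scores, sample size, and noise level below are illustrative assumptions only; the scipy routines stand in for the studies' own analyses.

```python
# Minimal sketch of the summary-score comparisons in Table 2, with
# hypothetical data: 25 consultations, each scored on the direct measure
# and on the proxy (score = number of recommended items performed).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
direct = rng.integers(10, 20, size=25).astype(float)  # hypothetical direct-measure scores
proxy = direct + rng.normal(0.0, 1.5, size=25)        # hypothetical proxy scores with noise

f_stat, f_p = stats.f_oneway(direct, proxy)           # analysis of variance comparing mean scores
t_stat, t_p = stats.ttest_rel(direct, proxy)          # paired t-test comparing mean scores
r, r_p = stats.pearsonr(direct, proxy)                # Pearson correlation of the paired scores

print(f"ANOVA F = {f_stat:.2f} (p = {f_p:.3f}); "
      f"paired t = {t_stat:.2f} (p = {t_p:.3f}); Pearson r = {r:.2f}")
```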