2011 Jul 6;21(4):651–657. doi: 10.1007/s11136-011-9960-1

Table 1. Example of one COSMIN box with 4-point scale

Box B. Reliability: relative measures (including test–retest reliability, inter-rater reliability, and intra-rater reliability)
Rating columns: Excellent | Good | Fair | Poor

Descriptors for each item are listed in column order, from Excellent to Poor. An item with fewer than four descriptors does not use every rating.

Design requirements

1. Was the percentage of missing items given?
   - Percentage of missing items described
   - Percentage of missing items NOT described

2. Was there a description of how missing items were handled?
   - Described how missing items were handled
   - Not described but it can be deduced how missing items were handled
   - Not clear how missing items were handled

3. Was the sample size included in the analysis adequate?
   - Adequate sample size (≥100)
   - Good sample size (50–99)
   - Moderate sample size (30–49)
   - Small sample size (<30)

4. Were at least two measurements available?
   - At least two measurements
   - Only one measurement

5. Were the administrations independent?
   - Independent measurements
   - Assumable that the measurements were independent
   - Doubtful whether the measurements were independent
   - Measurements NOT independent

6. Was the time interval stated?
   - Time interval stated
   - Time interval NOT stated

7. Were patients stable in the interim period on the construct to be measured?
   - Patients were stable (evidence provided)
   - Assumable that patients were stable
   - Unclear whether patients were stable
   - Patients were NOT stable

8. Was the time interval appropriate?
   - Time interval appropriate
   - Doubtful whether time interval was appropriate
   - Time interval NOT appropriate

9. Were the test conditions similar for both measurements (e.g., type of administration, environment, and instructions)?
   - Test conditions were similar (evidence provided)
   - Assumable that test conditions were similar
   - Unclear whether test conditions were similar
   - Test conditions were NOT similar

10. Were there any important flaws in the design or methods of the study?
   - No other important methodological flaws in the design or execution of the study
   - Other minor methodological flaws in the design or execution of the study
   - Other important methodological flaws in the design or execution of the study

Statistical methods

11. For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?
   - ICC calculated and model or formula of the ICC is described
   - ICC calculated but model or formula of the ICC not described. Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred
   - Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred
   - No ICC or Pearson or Spearman correlations calculated

12. For dichotomous/nominal/ordinal scores: Was kappa calculated?
   - Kappa calculated
   - Only percentage agreement calculated

13. For ordinal scores: Was a weighted kappa calculated?
   - Weighted kappa calculated
   - Unweighted kappa calculated
   - Only percentage agreement calculated

14. For ordinal scores: Was the weighting scheme described (e.g., linear, quadratic)?
   - Weighting scheme described
   - Weighting scheme NOT described
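
Items 12 to 14 distinguish unweighted kappa from weighted kappa and ask whether the weighting scheme (linear or quadratic) is reported. The following is a minimal sketch of how these statistics differ, not part of the COSMIN checklist itself. The function name `weighted_kappa`, the coding of ordinal categories as integers 0 to k-1, and the scaling of disagreement weights by k-1 are assumptions made for illustration; the formula is the standard Cohen's kappa written in terms of disagreement weights.

```python
from typing import Sequence


def weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    n_categories: int,
    scheme: str = "linear",
) -> float:
    """Cohen's kappa for two sets of ratings coded 0..n_categories-1.

    scheme is "unweighted", "linear", or "quadratic" (illustrative names).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters need the same non-zero number of ratings")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    if scheme not in ("unweighted", "linear", "quadratic"):
        raise ValueError(f"unknown weighting scheme: {scheme}")

    k = n_categories
    n = len(rater_a)

    # Observed joint proportions: observed[i][j] is the share of subjects
    # rated i on the first measurement and j on the second.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each measurement.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        # Penalty for rating i versus rating j; 0 means full agreement.
        if scheme == "unweighted":
            return 0.0 if i == j else 1.0
        distance = abs(i - j) / (k - 1)
        return distance if scheme == "linear" else distance * distance

    observed_dis = sum(
        disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_dis = sum(
        disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    if expected_dis == 0.0:
        raise ValueError("kappa is undefined when expected disagreement is zero")

    # Kappa is 1 minus observed over chance-expected weighted disagreement.
    return 1.0 - observed_dis / expected_dis


if __name__ == "__main__":
    # Hypothetical test and retest ratings on a 3-point ordinal scale.
    test = [0, 0, 1, 1, 2, 2]
    retest = [0, 1, 1, 2, 2, 2]
    for name in ("unweighted", "linear", "quadratic"):
        print(name, round(weighted_kappa(test, retest, 3, name), 3))
```

On the hypothetical ratings above, the three schemes give different values (about 0.5, 0.625, and 0.75), because all disagreements are between adjacent categories and the weighted schemes penalize those less than unweighted kappa does. This is why a reported weighted kappa is hard to interpret when the weighting scheme is not described.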