2011 Jul 6;21(4):651–657. doi: 10.1007/s11136-011-9960-1

Table 1.

Example of one COSMIN box with 4-point scale

Box B. Reliability: relative measures (including test–retest reliability, inter-rater reliability, and intra-rater reliability)
	Excellent	Good	Fair	Poor
Design requirements
1. Was the percentage of missing items given?	Percentage of missing items described	Percentage of missing items NOT described
2. Was there a description of how missing items were handled?	Described how missing items were handled	Not described but it can be deduced how missing items were handled	Not clear how missing items were handled
3. Was the sample size included in the analysis adequate?	Adequate sample size (≥100)	Good sample size (50–99)	Moderate sample size (30–49)	Small sample size (<30)
4. Were at least two measurements available?	At least two measurements			Only one measurement
5. Were the administrations independent?	Independent measurements	Assumable that the measurements were independent	Doubtful whether the measurements were independent	Measurements NOT independent
6. Was the time interval stated?	Time interval stated		Time interval NOT stated
7. Were patients stable in the interim period on the construct to be measured?	Patients were stable (evidence provided)	Assumable that patients were stable	Unclear whether patients were stable	Patients were NOT stable
8. Was the time interval appropriate?	Time interval appropriate		Doubtful whether time interval was appropriate	Time interval NOT appropriate
9. Were the test conditions similar for both measurements? e.g., type of administration, environment, and instructions	Test conditions were similar (evidence provided)	Assumable that test conditions were similar	Unclear whether test conditions were similar	Test conditions were NOT similar
10. Were there any important flaws in the design or methods of the study?	No other important methodological flaws in the design or execution of the study		Other minor methodological flaws in the design or execution of the study	Other important methodological flaws in the design or execution of the study
Statistical methods
11. For continuous scores: Was an intraclass correlation coefficient (ICC) calculated?	ICC calculated and model or formula of the ICC is described	ICC calculated but model or formula of the ICC not described, or Pearson or Spearman correlation coefficient calculated with evidence provided that no systematic change has occurred	Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic change has occurred or WITH evidence that systematic change has occurred	No ICC or Pearson or Spearman correlations calculated
12. For dichotomous/nominal/ordinal scores: Was kappa calculated?	Kappa calculated			Only percentage agreement calculated
13. For ordinal scores: Was a weighted kappa calculated?	Weighted kappa calculated		Unweighted kappa calculated	Only percentage agreement calculated
14. For ordinal scores: Was the weighting scheme described? e.g., linear, quadratic	Weighting scheme described	Weighting scheme NOT described
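
As an illustration of two items in the box, the sketch below maps a sample size to the item 3 rating using the thresholds in the table, and computes a weighted kappa with the linear or quadratic weighting schemes named in item 14. It is a minimal sketch, not part of the COSMIN checklist: the function names are hypothetical, ordinal scores are assumed to be coded as integers 0 to k-1, and the kappa uses the common disagreement-weight formulation of Cohen's weighted kappa, which the table itself does not specify.

```python
from collections import Counter


def rate_sample_size(n):
    """Return the item 3 rating for a sample size, per the table thresholds."""
    if n >= 100:
        return "Excellent"
    if n >= 50:
        return "Good"
    if n >= 30:
        return "Fair"
    return "Poor"


def weighted_kappa(ratings_a, ratings_b, n_categories, scheme="linear"):
    """Weighted kappa for two sets of ordinal ratings (hypothetical helper).

    Assumes ratings are integer codes in 0..n_categories-1.
    scheme: "linear" or "quadratic" disagreement weights.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")

    power = 1 if scheme == "linear" else 2
    n = len(ratings_a)
    k = n_categories

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant pair.
        return (abs(i - j) / (k - 1)) ** power

    pair_counts = Counter(zip(ratings_a, ratings_b))
    marginal_a = Counter(ratings_a)
    marginal_b = Counter(ratings_b)

    # Observed weighted disagreement, as a proportion of all rated subjects.
    observed = sum(weight(i, j) * c for (i, j), c in pair_counts.items()) / n

    # Weighted disagreement expected by chance from the marginal totals.
    expected = sum(
        weight(i, j) * marginal_a[i] * marginal_b[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)

    if expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1 - observed / expected
```

Quadratic weights penalize large disagreements more heavily than linear weights, so the two schemes generally give different kappa values for the same data. This is one reason item 14 asks whether the weighting scheme was reported.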