Table 3.
Box B. Reliability: Relative Measures (Including Test-Retest Reliability, Interrater Reliability, and Intrarater Reliability)
Item | Cevidanes et al.25 | Cevidanes et al.16 | Nada et al.12 | DeCesare et al.7 | Gkantidis et al.21 | Weissheimer et al.3 |
Rating columns per study | E, G, F, P | E, G, F, P | E, G, F, P | E, G, F, P | E, G, F, P | E, G, F, P |
Design requirements |
Was the percentage of missing items given? | x | x | x | x | x | x |
Was there a description of how missing items were handled? | x | x | x | x | x | x |
Was the sample size included in the analysis adequate? | x | x | x | x | x | x |
Were at least two measurements available? | x | x | x | x | x | x |
Were the administrations independent? | x | x | x | x | x | x |
Was the time interval stated? | x | x | x | x | x | x |
Were patients stable in the interim period on the construct to be measured? | x | x | x | x | x | x |
Was the time interval appropriate? | x | x | x | x | x | x |
Were the test conditions similar for both measurements (e.g., type of administration, environment, instructions)? | x | x | x | x | x | x |
Were there any important flaws in the design or methods of the study? | x | x | x | x | x | x |
Statistical methods |
For continuous scores: Was an intraclass correlation coefficient calculated? | x | x | x | x | x | x |
For dichotomous/nominal/ordinal scores: Was kappa calculated? | x | x | x | x | x | x |
For ordinal scores: Was a weighted kappa calculated? | x | x | x | x | x | x |
For ordinal scores: Was the weighting scheme described (e.g., linear, quadratic)? | x | x | x | x | x | x |
Score | Poor | Poor | Poor | Poor | Poor | Poor |
COSMIN box with four-point scale for methodological quality: E, excellent; G, good; F, fair; P, poor. A methodological quality score per box was obtained by taking the lowest rating of any item in a box ("worst score counts"). A poor score on any item was thus considered to represent a fatal flaw. In the scoring system, items 1 and 2 (on the number of missing items and how missing items are handled) were scored less strictly than the other items, as this information is often not reported in articles. In all boxes, a small sample size was considered poor methodological quality.24 COSMIN indicates Consensus-Based Standards for the Selection of Health Measurement Instruments.
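
The "worst score counts" rule described in the note can be illustrated with a minimal sketch. The function name `box_score`, the `RATING_ORDER` list, and the example ratings are assumptions introduced here for illustration; they are not taken from the article or from Table 3.

```python
# Ratings on the COSMIN four-point scale, ordered from worst to best.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def box_score(item_ratings):
    """Return the lowest rating among the items of one box.

    Implements the "worst score counts" rule: a single poor item
    makes the whole box poor. Unknown rating labels raise ValueError.
    """
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # The rating with the smallest position in RATING_ORDER is the worst one.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one study (not the values behind Table 3).
example = ["excellent", "good", "poor", "fair"]
print(box_score(example))  # poor
```

Under this rule a study that rates excellent on most items but poor on one, such as sample size, still receives a box score of poor, which is consistent with every study in the table being scored Poor.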