Table 5.
Interrater reliability per METRICS item.
| METRICS<sup>a</sup> item | Score, mean<sup>b</sup> (SD) | Score, range | Quality | Cohen κ | Asymptotic standard error | Approximate T | P value |
|---|---|---|---|---|---|---|---|
| Model | 3.72 (0.58) | 2.5-5.0 | Very good | 0.820 | 0.090 | 6.044 | <.001 |
| Timing | 2.90 (1.93) | 1.0-5.0 | Good | 0.853 | 0.076 | 6.565 | <.001 |
| Count | 3.04 (1.32) | 1.0-5.0 | Good | 0.962 | 0.037 | 10.675 | <.001 |
| Specificity of prompts and language | 3.44 (1.25) | 1.0-5.0 | Very good | 0.765 | 0.086 | 8.083 | <.001 |
| Evaluation | 3.31 (1.16) | 1.0-5.0 | Good | 0.885 | 0.063 | 9.668 | <.001 |
| Individual factors | 2.50 (1.42) | 1.0-5.0 | Satisfactory | 0.865 | 0.087 | 6.860 | <.001 |
| Transparency | 3.24 (1.01) | 1.0-5.0 | Good | 0.558 | 0.112 | 5.375 | <.001 |
| Range | 3.24 (1.07) | 2.0-5.0 | Good | 0.836 | 0.076 | 8.102 | <.001 |
| Randomization | 1.31 (0.87) | 1.0-4.0 | Suboptimal | 0.728 | 0.135 | 5.987 | <.001 |
| Overall | 3.01 (0.58) | 1.5-4.1 | Good | 0.381 | 0.086 | 10.093 | <.001 |
<sup>a</sup>METRICS: Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, and Specificity of prompts and language.
<sup>b</sup>Mean scores are the scores assigned to the included studies, averaged across the 2 raters.
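
For readers less familiar with the Cohen κ column, the following is a minimal sketch of how an unweighted Cohen κ can be computed from two raters' scores for one item. It is an illustration only: the table does not state whether a weighted or unweighted κ was used or which software produced the values, so the unweighted form is an assumption, and the function name `cohen_kappa` and the example ratings are hypothetical rather than taken from the study data.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen kappa for two equal-length lists of categorical ratings."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater1)
    # Observed agreement: share of items given the same category by both raters.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement, from each rater's marginal category frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 1-5 scores from two raters for six studies (not data from Table 5).
rater_a = [5, 4, 3, 3, 1, 2]
rater_b = [5, 4, 3, 2, 1, 2]
kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

In this made-up example the raters agree on 5 of 6 studies, and κ discounts the agreement expected by chance from the raters' marginal score frequencies.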