2024 Feb 15;13:e54704. doi: 10.2196/54704

Table 5.

Interrater reliability per METRICS item.

METRICS^a item	Score, mean^b (SD)	Score, range	Quality	Cohen κ	Asymptotic standard error	Approximate T	P value
Model	3.72 (0.58)	2.5-5.0	Very good	0.820	0.090	6.044	<.001
Timing	2.90 (1.93)	1.0-5.0	Good	0.853	0.076	6.565	<.001
Count	3.04 (1.32)	1.0-5.0	Good	0.962	0.037	10.675	<.001
Specificity of prompts and language	3.44 (1.25)	1.0-5.0	Very good	0.765	0.086	8.083	<.001
Evaluation	3.31 (1.16)	1.0-5.0	Good	0.885	0.063	9.668	<.001
Individual factors	2.50 (1.42)	1.0-5.0	Satisfactory	0.865	0.087	6.860	<.001
Transparency	3.24 (1.01)	1.0-5.0	Good	0.558	0.112	5.375	<.001
Range	3.24 (1.07)	2.0-5.0	Good	0.836	0.076	8.102	<.001
Randomization	1.31 (0.87)	1.0-4.0	Suboptimal	0.728	0.135	5.987	<.001
Overall	3.01 (0.58)	1.5-4.1	Good	0.381	0.086	10.093	<.001

^aMETRICS: Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, and Specificity of prompts and language.

^bMean scores are the scores assigned to the included studies, averaged across the 2 raters.
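
As an illustration of the two quantities reported per item, the following minimal Python sketch computes an unweighted Cohen κ for 2 raters and the mean of the rater-averaged scores. It is a sketch under stated assumptions, not the authors' procedure: this excerpt does not say whether κ was weighted or which software was used, the function names `cohen_kappa` and `item_mean` are hypothetical, and the example scores are invented for demonstration and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen kappa for two equal-length lists of categorical scores."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater1)
    # Observed agreement: share of studies given the same score by both raters.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement, from each rater's marginal score frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


def item_mean(rater1, rater2):
    """Mean item score after averaging the 2 raters' scores for each study."""
    per_study = [(a + b) / 2 for a, b in zip(rater1, rater2)]
    return sum(per_study) / len(per_study)


# Hypothetical 1-5 scores for one item across 6 studies (not data from the paper).
r1 = [5, 4, 3, 3, 1, 2]
r2 = [5, 4, 3, 2, 1, 2]

kappa = cohen_kappa(r1, r2)
mean_score = item_mean(r1, r2)
print(f"kappa={kappa:.3f}, mean={mean_score:.2f}")
```

In this toy example the raters disagree on 1 of 6 studies, so observed agreement is 5/6 and chance agreement is 7/36, giving κ = 23/29. The standard error and approximate T columns in the table are not reproduced by this sketch.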