. 2026 Feb 11;5:e84322. doi: 10.2196/84322

Table 4.

Runtime and accuracy of model predictions.

Model	Run	Runtime (seconds)	Exact match accuracy (95% CI)	Loose match accuracy	Q1^a accuracy	Q2^b accuracy	Q3^c accuracy	Q4^d accuracy
Llama 3.3-70B	Run 1	1331.919	0.275 (0.227-0.297)	0.755	0.16	0.86	0.08	0
Llama 3.3-70B	Run 2	1318.557	0.25 (0.227-0.297)	0.77	0.22	0.76	0.02	0
Llama 3.3-70B	Run 3	1326.938	0.255 (0.227-0.297)	0.765	0.26	0.74	0.02	0
Mistral-7B	Run 1	1245.114	0.295 (0.284-0.358)	0.785	0.46	0.52	0.2	0
Mistral-7B	Run 2	1249.270	0.35 (0.284-0.358)^e	0.775	0.52	0.64	0.24	0
Mistral-7B	Run 3	1244.751	0.315 (0.284-0.358)	0.76	0.38	0.64	0.24	0
Gemma 2-9B	Run 1	1250.046	0.255 (0.257-0.329)	0.77	0.06	0.26	0.7	0
Gemma 2-9B	Run 2	1439.940	0.315 (0.257-0.329)	0.82	0	0.46	0.76	0.04
Gemma 2-9B	Run 3	1229.739	0.305 (0.257-0.329)	0.795	0	0.42	0.8	0
DeepSeek r1–distill Qwen-14B	Run 1	1317.195	0.28 (0.252-0.324)	0.81	0	0.06	0.82	0.24
DeepSeek r1–distill Qwen-14B	Run 2	1309.082	0.27 (0.252-0.324)	0.815	0	0.14	0.9	0.04
DeepSeek r1–distill Qwen-14B	Run 3	1257.635	0.31 (0.252-0.324)	0.8	0	0.22	0.94	0.08
Qwen 2.5-7B	Run 1	7211.855	0.296 (0.270-0.343)	0.835	0	0.48	0.68	0.02
Qwen 2.5-7B	Run 2	7302.680	0.315 (0.270-0.343)	0.84	0	0.44	0.74	0.08
Qwen 2.5-7B	Run 3	7231.687	0.305 (0.270-0.343)	0.825	0	0.56	0.64	0.02

^aQ1: quartile 1.

^bQ2: quartile 2.

^cQ3: quartile 3.

^dQ4: quartile 4.

^eItalicization indicates the highest accuracy.
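
To compare models rather than individual runs, the per-run values in Table 4 can be averaged. The following is a minimal Python sketch of that arithmetic, not part of the article's own analysis. The names `RUNS`, `summarize`, and `best_model` are illustrative assumptions, and the numbers are transcribed from the runtime and exact match accuracy columns only.

```python
from statistics import mean

# Per-run values transcribed from Table 4:
# (runtime in seconds, exact match accuracy). Names here are hypothetical.
RUNS = {
    "Llama 3.3-70B": [(1331.919, 0.275), (1318.557, 0.25), (1326.938, 0.255)],
    "Mistral-7B": [(1245.114, 0.295), (1249.270, 0.35), (1244.751, 0.315)],
    "Gemma 2-9B": [(1250.046, 0.255), (1439.940, 0.315), (1229.739, 0.305)],
    "DeepSeek r1–distill Qwen-14B": [(1317.195, 0.28), (1309.082, 0.27), (1257.635, 0.31)],
    "Qwen 2.5-7B": [(7211.855, 0.296), (7302.680, 0.315), (7231.687, 0.305)],
}


def summarize(runs):
    """Return {model: (mean runtime, mean exact match accuracy)} across runs."""
    return {
        model: (mean(r for r, _ in rows), mean(a for _, a in rows))
        for model, rows in runs.items()
    }


summary = summarize(RUNS)

# Model with the highest exact match accuracy averaged over its three runs.
best_model = max(summary, key=lambda m: summary[m][1])

for model, (runtime, accuracy) in summary.items():
    print(f"{model}: {runtime:.1f} s, exact match {accuracy:.3f}")
print("Highest mean exact match accuracy:", best_model)
```

A simple mean over three runs ignores the reported 95% CIs, so it is only a rough way to rank the models.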