Table 4. Performance of the LLM and Text String Search Compared With Criterion Standard (n = 400)ᵃ

| | Precision: wearing helmet | Precision: not wearing helmet | Precision: unknown | Recall: wearing helmet | Recall: not wearing helmet | Recall: unknown | F1 Score: wearing helmet | F1 Score: not wearing helmet | F1 Score: unknown | Cohen κ (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|
| LLM | 0.98 | 1.00 | 0.98 | 0.98 | 0.96 | 1.00 | 0.98 | 0.98 | 1.00 | 0.98 (0.96-1.00) |
| Text string search | 0.98 | 1.00 | 0.98 | 0.98 | 0.96 | 1.00 | 0.98 | 0.98 | 1.00 | 0.98 (0.96-1.00) |
Abbreviation: LLM, large language model.
ᵃ These 400 records were a random draw from the high-detail prompt (December 7, 2023).
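As a minimal sketch of how these metrics can be reproduced, the snippet below computes per-category precision, recall, F1 score, and Cohen κ with scikit-learn. The label arrays are hypothetical toy data standing in for the 400 manually coded records and the classifier outputs (LLM or text string search); they are not the study data.

```python
from sklearn.metrics import cohen_kappa_score, precision_recall_fscore_support

# Hypothetical toy labels: criterion-standard (manually coded) categories and
# the classifier's output for the same records. Categories match the table.
labels = ["wearing helmet", "not wearing helmet", "unknown"]
criterion = ["wearing helmet", "not wearing helmet", "unknown", "wearing helmet",
             "not wearing helmet", "unknown", "wearing helmet", "not wearing helmet"]
predicted = ["wearing helmet", "not wearing helmet", "unknown", "wearing helmet",
             "wearing helmet", "unknown", "wearing helmet", "not wearing helmet"]

# Per-category precision, recall, and F1 score, reported in the order of `labels`.
precision, recall, f1, _ = precision_recall_fscore_support(
    criterion, predicted, labels=labels, zero_division=0
)
for category, p, r, f in zip(labels, precision, recall, f1):
    print(f"{category}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")

# Cohen kappa summarizes agreement with the criterion standard beyond chance.
print(f"Cohen kappa = {cohen_kappa_score(criterion, predicted):.2f}")
```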