JAMA Netw Open. 2024 Aug 13;7(8):e2425981. doi: 10.1001/jamanetworkopen.2024.25981

Table 4. Performance of the LLM and Text String Search Compared With Criterion Standard (n = 400)ᵃ

Metric            Class                 LLM                Text string search
Precision         Wearing helmet        0.98               0.98
                  Not wearing helmet    1.00               1.00
                  Unknown               0.98               0.98
Recall            Wearing helmet        0.98               0.98
                  Not wearing helmet    0.96               0.96
                  Unknown               1.00               1.00
F1 score          Wearing helmet        0.98               0.98
                  Not wearing helmet    0.98               0.98
                  Unknown               1.00               1.00
Cohen κ (95% CI)                        0.98 (0.96-1.00)   0.98 (0.96-1.00)

Abbreviation: LLM, large language model.

ᵃ These 400 records were randomly drawn from the high-detail prompt (December 7, 2023).
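For readers checking how the table's metrics relate to one another, the following is a minimal Python sketch of how per-class precision, recall, and F1 score, plus unweighted Cohen κ, can be computed from paired criterion-standard and predicted labels. It assumes one-vs-rest counting for each class and the standard formula κ = (p_o - p_e) / (1 - p_e). The function names and the 10 toy labels are hypothetical illustrations, not the study's code or data, and the sketch does not compute the 95% CI for κ.

```python
from collections import Counter

# Class names taken from the table; the ordering is only for display.
LABELS = ("wearing helmet", "not wearing helmet", "unknown")


def per_class_metrics(gold, pred, labels=LABELS):
    """Return {label: (precision, recall, f1)}, treating each label one-vs-rest.

    Hypothetical helper for illustration only.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    results = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = (precision, recall, f1)
    return results


def cohen_kappa(gold, pred):
    """Unweighted Cohen kappa: (p_o - p_e) / (1 - p_e).

    p_o is observed agreement; p_e is the agreement expected by chance
    from each rater's marginal label frequencies. Hypothetical helper.
    """
    n = len(gold)
    if n == 0 or n != len(pred):
        raise ValueError("need two non-empty label lists of equal length")
    p_o = sum(g == p for g, p in zip(gold, pred)) / n
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    categories = set(gold) | set(pred)
    p_e = sum(gold_counts[c] * pred_counts[c] for c in categories) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy labels, NOT records from the study.
    gold = ["wearing helmet"] * 4 + ["not wearing helmet"] * 4 + ["unknown"] * 2
    pred = ["wearing helmet"] * 4 + ["not wearing helmet"] * 3 + ["unknown"] * 3
    for label, (p, r, f) in per_class_metrics(gold, pred).items():
        print(f"{label}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
    print(f"Cohen kappa: {cohen_kappa(gold, pred):.2f}")
```

On these toy labels, one "not wearing helmet" record is misclassified as "unknown". That lowers recall for "not wearing helmet" (0.75) and precision for "unknown" (0.67) at the same time, which is why a single error shows up in two columns of a per-class table. Cohen κ (about 0.85 here) discounts the agreement expected by chance from the label frequencies. A library routine such as scikit-learn's `cohen_kappa_score` could replace the hand-written version.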