2025 Jun 14;59(5):1148–1159. doi: 10.1007/s43441-025-00798-8

Table 2.

LLM assessment results

LLM assessed   Scored value   N   % [95% CI]
Flan           1              1   10 [1.8, 40.4]
               2              3   30 [10.8, 60.3]
               3              6   60 [31.3, 83.2]
               4              0   0 [0.0, 28.0]
GPT35          1              0   0 [0.0, 28.0]
               2              0   0 [0.0, 28.0]
               3              7   70 [40.0, 89.0]
               4              3   30 [10.8, 60.3]
GPT4           1              0   0 [0.0, 28.0]
               2              0   0 [0.0, 28.0]
               3              4   40 [17.0, 69.0]
               4              6   60 [31.3, 83.2]
Granite        1              5   50 [24.0, 76.0]
               2              2   20 [5.7, 51.0]
               3              3   30 [10.8, 60.3]
               4              0   0 [0.0, 28.0]
Llama 2        1              1   10 [1.8, 40.4]
               2              1   10 [1.8, 40.4]
               3              4   40 [17.0, 69.0]
               4              4   40 [17.0, 69.0]
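
The table does not say how the 95% confidence intervals were computed. The reported bounds look consistent with a Wilson score interval on 10 assessments per model, so the sketch below uses that method as an assumption, not as the authors' confirmed procedure. The function name `wilson_interval` and the sample size of 10 (inferred from the N column summing to 10 for each model) are illustrative choices.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Return (lower, upper) Wilson score bounds for a binomial proportion.

    z defaults to the two-sided 95% normal quantile.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Counts from the Flan rows of Table 2. The sample size n = 10 is an
# assumption inferred from the N column, not a value stated in the table.
for score, count in [(1, 1), (2, 3), (3, 6), (4, 0)]:
    lo, hi = wilson_interval(count, 10)
    print(f"score {score}: {100 * count / 10:.0f}% [{100 * lo:.1f}, {100 * hi:.1f}]")
```

Under these assumptions the first three Flan rows reproduce the tabulated bounds ([1.8, 40.4], [10.8, 60.3], [31.3, 83.2]). The zero-count row gives an upper bound of about 27.8 rather than the 28.0 shown, which suggests that some table entries were rounded to whole percentages.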