2025 Oct 16;13:e71252. doi: 10.2196/71252

Table 5. Detailed hallucination type distribution by model.

| Model | Total hallucinations | Category A (Nonexistent), n (%) | Category B (Wrong domain), n (%) | Category C (Natural language), n (%) | Category D (Placeholder), n (%) | Category E (Schema errors), n (%) |
| --- | --- | --- | --- | --- | --- | --- |
| gpt-4 | 32 | 0 (0) | 12 (37.5) | 0 (0) | 12 (37.5) | 8 (25.0) |
| gpt-3.5-turbo | 25 | 0 (0) | 10 (40.0) | 0 (0) | 5 (20.0) | 10 (40.0) |
| claude-3-sonnet | 47 | 5 (10.6) | 20 (42.6) | 0 (0) | 10 (21.3) | 12 (25.5) |
| llama3:8b | 20 | 5 (25.0) | 3 (15.0) | 3 (15.0) | 5 (25.0) | 4 (20.0) |
| deepseek-r1 | 32 | 3 (9.4) | 1 (3.1) | 1 (3.1) | 13 (40.6) | 14 (43.8) |
| qwen2.5 | 35 | 2 (5.7) | 14 (40.0) | 0 (0) | 9 (25.7) | 10 (28.6) |
| phi3 | 1 | 0 (0) | 0 (0) | 1 (100) | 0 (0) | 0 (0) |
| gemma3 | 43 | 5 (11.6) | 20 (46.5) | 0 (0) | 13 (30.2) | 5 (11.7) |
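
Each percentage in Table 5 appears to be the category count divided by that model's total number of hallucinations. The following is a minimal sketch, not the authors' analysis code, that recomputes those shares from the counts in the table. The names `TABLE5`, `CATEGORIES`, and `category_shares` are hypothetical and introduced only for illustration.

```python
# Counts transcribed from Table 5: (total, A, B, C, D, E) per model.
# The dictionary layout and all names here are illustrative assumptions.
TABLE5 = {
    "gpt-4": (32, 0, 12, 0, 12, 8),
    "gpt-3.5-turbo": (25, 0, 10, 0, 5, 10),
    "claude-3-sonnet": (47, 5, 20, 0, 10, 12),
    "llama3:8b": (20, 5, 3, 3, 5, 4),
    "deepseek-r1": (32, 3, 1, 1, 13, 14),
    "qwen2.5": (35, 2, 14, 0, 9, 10),
    "phi3": (1, 0, 0, 1, 0, 0),
    "gemma3": (43, 5, 20, 0, 13, 5),
}
CATEGORIES = ("A", "B", "C", "D", "E")


def category_shares(total, counts):
    """Return each category's share of `total` as a percentage (1 decimal)."""
    # Guard against transcription errors: category counts must add up to the total.
    if sum(counts) != total:
        raise ValueError(f"counts {counts} do not sum to total {total}")
    return [round(100 * c / total, 1) for c in counts]


# Recompute the percentage row for every model.
shares = {
    model: category_shares(row[0], row[1:]) for model, row in TABLE5.items()
}

for model, pct in shares.items():
    print(model, dict(zip(CATEGORIES, pct)))
```

Because each share is rounded independently, a recomputed row need not sum to exactly 100, and a value may differ from the printed table by 0.1.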