. 2025 Jul 29;8:481. doi: 10.1038/s41746-025-01830-9

Table 2.

Generative LLM performance domain, criteria, and grading scale

| Domain | Criteria | Grading (1–100 scale) |
| --- | --- | --- |
| Accuracy | Alignment with current clinical guidelines and standard practices | 1 = Poor alignment, 100 = Excellent alignment |
| Hallucination | Frequency of unverifiable or false information | 1 = Frequent hallucinations, 100 = No hallucinations |
| Specificity and relevance | Detail in response, relevance for patient’s own situation | 1 = Vague/irrelevant, 100 = Highly specific/relevant |
| Empathy and understandability | Tone sensitivity to patients’ possible emotional state | 1 = Insensitive/confusing, 100 = Empathetic/clear |
| Actionability | Practical steps and clarity in recommendations | 1 = Unclear/no steps, 100 = Clear/practical steps |
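
The rubric in Table 2 can be written down as a small data structure, which makes it easy to check that a set of grades covers every domain and stays on the 1–100 scale. The following is a minimal sketch and not the authors' implementation: the names `Domain`, `RUBRIC`, and `validate_scores` are hypothetical, and accepting any integer or decimal value between 1 and 100 is an assumption, since the table does not say whether fractional grades were allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """One row of Table 2 (hypothetical container, not from the paper)."""
    criteria: str
    low_anchor: str   # meaning of a grade of 1
    high_anchor: str  # meaning of a grade of 100


# Domain names, criteria, and anchors are copied from Table 2.
RUBRIC = {
    "Accuracy": Domain(
        "Alignment with current clinical guidelines and standard practices",
        "Poor alignment", "Excellent alignment"),
    "Hallucination": Domain(
        "Frequency of unverifiable or false information",
        "Frequent hallucinations", "No hallucinations"),
    "Specificity and relevance": Domain(
        "Detail in response, relevance for patient's own situation",
        "Vague/irrelevant", "Highly specific/relevant"),
    "Empathy and understandability": Domain(
        "Tone sensitivity to patients' possible emotional state",
        "Insensitive/confusing", "Empathetic/clear"),
    "Actionability": Domain(
        "Practical steps and clarity in recommendations",
        "Unclear/no steps", "Clear/practical steps"),
}

SCORE_MIN, SCORE_MAX = 1, 100  # grading scale stated in Table 2


def validate_scores(scores):
    """Check that `scores` has exactly one in-range grade per rubric domain."""
    missing = set(RUBRIC) - set(scores)
    unknown = set(scores) - set(RUBRIC)
    if missing or unknown:
        raise ValueError(
            f"missing domains: {sorted(missing)}, unknown domains: {sorted(unknown)}")
    for name, value in scores.items():
        # Assumption: numeric grades only; booleans are rejected explicitly
        # because Python treats them as integers.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name}: grade must be a number, got {value!r}")
        if not SCORE_MIN <= value <= SCORE_MAX:
            raise ValueError(
                f"{name}: grade {value} is outside {SCORE_MIN}-{SCORE_MAX}")
    return dict(scores)
```

A reviewer's grades for one response would be passed as a dictionary keyed by domain name, for example `validate_scores({name: 50 for name in RUBRIC})`. The sketch deliberately stops at validation: Table 2 does not describe how grades are combined across domains or reviewers, so no aggregate score is computed here.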