. 2025;33(1):4–10. doi: 10.5455/aim.2024.33.4-10

Table 2. PEM metrics evaluated by the included studies. An asterisk (*) marks the technical metrics.

| Metric | Frequency (no. of studies) | References | Definition |
|---|---|---|---|
| Citation Support* | 1 | [41] | Whether the LLM includes and properly references credible sources. |
| Patient Satisfaction | 1 | [40] | The extent to which a patient would find the answer helpful and comforting. |
| Similarity* | 1 | [64] | How closely the responses of one LLM align with those of another. |
| Bias* | 2 | [41], [59] | The presence of unfair or prejudiced assumptions in the text. |
| Hallucinations* | 2 | [19], [41] | Fabrication of information or presentation of false details as fact. |
| Reasoning* | 2 | [22], [59] | The clarity and logical soundness of the argument or explanation. |
| Response Length* | 2 | [7], [43] | The conciseness or verbosity of the answer. |
| Responsiveness* | 2 | [7], [43] | The time the LLM takes to complete its response. |
| Reproducibility* | 5 | [1], [18], [19], [23], [56] | Consistency of the answer when the same question is asked multiple times. |
| Safety | 5 | [22], [37], [41], [59], [69] | Avoidance of harmful, unethical, or disallowed content. |
| Clarity | 6 | [5], [9], [19], [39], [41], [60] | How easily the text can be understood. |
| Actionability | 8 | [2], [6], [13], [21], [30], [38], [54], [55] | Whether the response provides usable advice or next steps. |
| Tone | 11 | [5], [9], [10], [20], [30], [39]–[41], [47], [51], [60] | The emotional or stylistic manner of the answer. |
| Appropriateness | 13 | [9], [19], [22], [23], [27], [33], [39], [41], [45], [46], [50], [60], [69] | Suitability of the response for the context and audience. |
| Understandability | 13 | [2], [6], [9], [13], [21], [30], [38], [51], [53]–[55], [59], [60] | How straightforward and comprehensible the language is. |
| Reliability* | 15 | [10], [15]–[17], [20], [26]–[28], [31], [43], [47], [49], [57], [61], [64] | Trustworthiness and factual correctness of the content. |
| Quality* | 19 | [5], [11], [15]–[17], [25], [26], [28], [37], [46], [49], [50], [53], [54], [57], [61], [65], [66], [69] | Overall caliber and usefulness of the response. |
| Comprehensiveness | 24 | [1], [3], [4], [7], [10], [11], [18], [19], [22], [32]–[34], [36], [40], [44], [46]–[48], [50], [51], [56], [60], [67], [68] | The degree to which the answer covers all relevant points. |
| Readability* | 51 | [1]–[3], [6]–[11], [13]–[17], [20], [21], [23]–[26], [28], [30], [31], [34], [35], [38], [41], [42], [44]–[57], [59]–[67] | The ease with which the text can be read and parsed. |
| Accuracy* | 54 | [1]–[5], [7], [10]–[13], [16], [18]–[23], [25]–[34], [36], [38]–[44], [46]–[51], [53]–[56], [58]–[60], [62], [63], [66]–[69] | Correctness and precision of the information provided. |
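Readability is the most frequently evaluated technical metric in Table 2, but the table does not state which readability instruments the included studies applied. As an illustration only, the sketch below assumes two widely used formulas, Flesch Reading Ease and Flesch-Kincaid Grade Level, and shows how such a score is derived from sentence, word, and syllable counts. The function names are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for illustration, not a validated tool.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, not dictionary-based)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent (e.g. "take"), but keep "-le" endings (e.g. "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) using simple punctuation-based splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    """Higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """Approximate U.S. school grade level needed to understand the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    s, w, y = text_stats("Take one pill a day. Drink water.")
    print(round(flesch_reading_ease(s, w, y), 1), round(flesch_kincaid_grade(s, w, y), 1))
```

Published readability calculators typically use dictionary-based syllable counts and stricter sentence segmentation, so scores from this sketch will differ somewhat from theirs; the formula coefficients themselves are the standard ones.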