Artificial Authority: The Promise and Perils of LLM Judges in Healthcare

2026 Jan 16;13(1):108. doi: 10.3390/bioengineering13010108

AI	Artificial Intelligence
LLM	Large Language Model
LLM-as-a-Judge	Large Language Model as a Judge
EHR	Electronic Health Record
BLEU	Bilingual Evaluation Understudy
ROUGE	Recall-Oriented Understudy for Gisting Evaluation
BERTScore	Bidirectional Encoder Representations from Transformers Scoring
CoT	Chain-of-Thought
G-EVAL	Generative Evaluation Architecture using LLMs
PDSQI-9	Provider Documentation Summarization Quality Instrument (9-item)
ICC	Intraclass Correlation Coefficient
RAG	Retrieval-Augmented Generation
BHC	Brief Hospital Course
SOAP	Subjective, Objective, Assessment, Plan
GENMOD	General Model for Multi-Section Note Generation
SPECMOD	Specialized Model for Independent Section Generation
κ (Kappa)	Cohen’s Kappa Statistic
QA	Question Answering
CDC	Centers for Disease Control and Prevention
MedQuAD	Medical Question Answering Dataset
WAI-O-S	Working Alliance Inventory–Observer Short Form
ORS	Outcome Rating Scale
PoLL	Panel of LLM Evaluators
F1 Score	Harmonic Mean of Precision and Recall