| AI | Artificial Intelligence |
| LLM | Large Language Model |
| LLM-as-a-Judge | Large Language Model as a Judge |
| EHR | Electronic Health Record |
| BLEU | Bilingual Evaluation Understudy |
| ROUGE | Recall-Oriented Understudy for Gisting Evaluation |
| BERTScore | BERT-based Similarity Score (BERT: Bidirectional Encoder Representations from Transformers) |
| CoT | Chain-of-Thought |
| G-EVAL | Generative Evaluation Architecture using LLMs |
| PDSQI-9 | Provider Documentation Summarization Quality Instrument (9-item) |
| ICC | Intraclass Correlation Coefficient |
| RAG | Retrieval-Augmented Generation |
| BHC | Brief Hospital Course |
| SOAP | Subjective, Objective, Assessment, Plan |
| GENMOD | General Model for Multi-Section Note Generation |
| SPECMOD | Specialized Model for Independent Section Generation |
| κ (Kappa) | Cohen’s Kappa Statistic |
| QA | Question Answering |
| CDC | Centers for Disease Control and Prevention |
| MedQuAD | Medical Question Answering Dataset |
| WAI-O-S | Working Alliance Inventory–Observer Short Form |
| ORS | Outcome Rating Scale |
| PoLL | Panel of LLM Evaluators |
| F1 Score | Harmonic Mean of Precision and Recall |