ABSTRACT
Background
Systemic infections are a leading cause of hospitalization and death among patients with cirrhosis. Timely and accurate infection identification is essential for both clinical care and the development of predictive models. However, existing methods such as ICD-10 coding are unreliable, and manual chart review is resource-intensive and difficult to scale. This study aimed to develop and validate an automated large language model (LLM)-based approach for infection classification and subtyping in patients with cirrhosis presenting to the emergency department (ED).
Method
We developed INFEHR (INfection identification and subtyping using Free-text EHR analysis), an LLM-powered pipeline utilizing Claude 3.5 Sonnet to analyze clinical notes from the first 72 hours of admission. Model outputs were compared against a physician-adjudicated gold standard in a cohort of 1,000 encounters from patients with cirrhosis who presented to the ED. Performance was benchmarked against ICD-10 code–based labeling and CDC Adult Sepsis Event criteria.
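As an illustration of the note-selection step, the following minimal sketch keeps only notes written within the first 72 hours of admission. It is not the INFEHR implementation: the `Note` record, its field names, and the `notes_in_window` helper are hypothetical, and treating the window as closed on both ends and anchored at the admission timestamp is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Note:
    """Hypothetical clinical note record (field names are assumptions)."""
    encounter_id: str
    written_at: datetime
    text: str


def notes_in_window(notes, admitted_at, hours=72):
    """Return notes written between admission and admission + `hours`.

    Both ends of the window are inclusive (an assumption made here).
    """
    cutoff = admitted_at + timedelta(hours=hours)
    return [n for n in notes if admitted_at <= n.written_at <= cutoff]


# Toy example data, for illustration only.
admitted = datetime(2024, 1, 1, 8, 0)
example_notes = [
    Note("enc-1", datetime(2024, 1, 1, 9, 0), "ED triage note"),
    Note("enc-1", datetime(2024, 1, 3, 20, 0), "Progress note, day 3"),
    Note("enc-1", datetime(2024, 1, 5, 10, 0), "Progress note, day 5"),
]
selected = notes_in_window(example_notes, admitted)
```

In this sketch the selected notes would then be passed to the LLM for classification; that call is omitted because the prompt and API details are not described here.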
Results
INFEHR achieved 94.7% overall accuracy, with 99.5% sensitivity and 92.8% positive predictive value for identifying infection presence, outperforming ICD-10–based classification across all metrics (p < 0.0001). The model also demonstrated strong performance in classifying pathogen type and infection site. This pipeline processed notes within seconds, offering improvements in efficiency and scalability over manual review.
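For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, and positive predictive value can be computed from binary model outputs compared against a gold standard. The function name and the toy labels are illustrative assumptions; they are not the study data or the authors' analysis code.

```python
def infection_metrics(gold, predicted):
    """Compute accuracy, sensitivity, and PPV for binary infection labels.

    `gold` and `predicted` are equal-length sequences of booleans, where
    True means "infection present". Undefined ratios are returned as NaN.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positives / all gold positives
        "ppv": ratio(tp, tp + fp),          # true positives / all predicted positives
    }


# Toy labels for illustration only (not study data).
gold_labels = [True, True, False, False, True]
model_labels = [True, False, False, True, True]
metrics = infection_metrics(gold_labels, model_labels)
```

Sensitivity answers "of the encounters with a true infection, how many did the model flag?", whereas positive predictive value answers "of the encounters the model flagged, how many truly had an infection?".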
Conclusion
INFEHR offers a scalable, reproducible, and accurate method for infection phenotyping in cirrhosis. By overcoming limitations of traditional coding and manual review, it supports high-throughput infection surveillance, improves cohort construction for clinical research, and enables future integration into real-time decision-support tools in hepatology.
Full Text Availability
The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.
