European Heart Journal. Digital Health. 2026 Jan 12;7(Suppl 1):ztaf143.087. doi: 10.1093/ehjdh/ztaf143.087

Large language model-based clustering versus standard questionnaires for evaluating quality of life and therapeutic compliance in chronic heart failure: a study protocol

M Coriano' 1, C A Papappicco 2, C Lanera 3, S A Inderst 4, A Di Ninno 5, G Megale 6, L Ferrara 7, F Perillo 8, G Lorenzoni 9, D Gregori 10, F Tona 11,a
PMCID: PMC12795159

Abstract

Background

Quality of life (QoL) and medication adherence are critical predictors of heart failure (HF) outcomes. Although standard questionnaires (SQs) are commonly used to assess these domains, they often lack the sensitivity to capture subtle but clinically meaningful changes. Large language models (LLMs) present a promising opportunity to interpret free-text patient input, offering potentially richer and more individualized data.

Purpose

The HEALING-LLM study aims to investigate the feasibility and potential of using an LLM to process free-text responses regarding QoL and medication adherence in patients with chronic HF, and to compare these results with those obtained through traditional instruments.

Methods/Results

HEALING-LLM is a prospective, observational, monocentric cohort study (CET code: 5924/AO/24; URC code: AOP3264). Patients admitted to the cardiology unit with stage C or D HF are enrolled at discharge. Follow-up is conducted online at 1, 6, and 12 months post-discharge via the MyCap (REDCap®) mobile app or email. Each follow-up includes the EQ-5D-5L to assess QoL, a free-text question mirroring the five EQ-5D-5L dimensions, the 8-item Morisky Medication Adherence Scale (MMAS-8), and a corresponding free-text question reflecting MMAS-8 content.

Responses from the SQs are categorized according to established severity levels for the EQ-5D-5L and probability of non-adherence for the MMAS-8. Free-text responses are processed through a structured LLM pipeline using GPT-4o mini (OpenAI®). Data are first imported into R using the {REDCapTidieR} package. Free-text responses are then sent to the LLM through a prompt architecture consisting of a system-level component that defines the model's role and context, and a user-level component that specifies the task, output formatting, and examples. The model returns structured classifications aligned with EQ-5D-5L and MMAS-8 criteria in JSON format; these outputs are parsed and validated in R using the {tidyverse} and {jsonlite} packages to ensure compliance with REDCap SQL standards.
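A minimal sketch of how such a pipeline could be assembled in R is shown below. The REDCap URI and token, the prompt wording, the instrument and field names, and the use of {httr2} to reach the chat-completions endpoint are illustrative assumptions; only {REDCapTidieR}, {tidyverse}, and {jsonlite} are named in the protocol.

```r
## Illustrative sketch only: endpoint, credentials, prompt wording, and column
## names are assumptions; {httr2} is used here as a generic HTTP client.
library(REDCapTidieR)  # import REDCap records as tidy tibbles
library(tidyverse)     # data wrangling (dplyr, purrr, tidyr)
library(jsonlite)      # parse the model's JSON output
library(httr2)         # HTTP client for the chat-completions API

# 1. Import follow-up records from REDCap (URI and token are placeholders)
records <- read_redcap(
  redcap_uri = "https://redcap.example.org/api/",
  token      = Sys.getenv("REDCAP_TOKEN")
)

# 2. Prompt architecture: a system-level role/context plus a user-level task
system_prompt <- paste(
  "You are a clinical rater. Classify a heart failure patient's free-text",
  "answer into the five EQ-5D-5L dimensions (levels 1-5) and an MMAS-style",
  "adherence category. Reply with JSON only."
)

classify_free_text <- function(free_text) {
  resp <- request("https://api.openai.com/v1/chat/completions") |>
    req_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))) |>
    req_body_json(list(
      model = "gpt-4o-mini",
      response_format = list(type = "json_object"),
      messages = list(
        list(role = "system", content = system_prompt),
        list(role = "user", content = paste(
          "Patient answer:", free_text,
          "\nReturn fields: mobility, self_care, usual_activities,",
          "pain_discomfort, anxiety_depression, adherence_category."))
      )
    )) |>
    req_perform()

  # 3. Parse the structured JSON classification into a one-row tibble
  resp_body_json(resp)$choices[[1]]$message$content |>
    fromJSON() |>
    as_tibble()
}

# Hypothetical follow-up instrument name and free-text field
free_text_tbl <- extract_tibble(records, "followup")

llm_scores <- free_text_tbl |>
  mutate(llm = map(qol_free_text, classify_free_text)) |>
  unnest(llm)
```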

Planned statistical analyses include evaluation of the model's robustness to prompt variation, comparison of LLM-derived classifications with SQ scores using weighted Cohen's kappa and Bland-Altman plots, and correlation with clinical reference standards via Spearman coefficients. In addition, HF specialists will review the LLM outputs to identify discordant cases, assess clinical utility, and rate usability. The primary outcome is the agreement between LLM-based and SQ-derived measures, together with their reproducibility across patient subgroups and responsiveness to clinical change over time.
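The sketch below illustrates these planned comparisons on simulated paired scores; the data, column names, and the choice of the {irr} package for the weighted kappa are assumptions rather than the study's prespecified analysis code.

```r
## Illustrative analysis sketch: the simulated data, column names, and the
## {irr} package for the weighted kappa are assumptions, not protocol code.
library(tidyverse)
library(irr)   # weighted Cohen's kappa

# Simulated paired scores standing in for one EQ-5D-5L dimension
set.seed(1)
scores <- tibble(
  sq_level  = sample(1:5, 200, replace = TRUE),                      # questionnaire level
  llm_level = pmin(pmax(sq_level + sample(-1:1, 200, TRUE), 1), 5),  # LLM-assigned level
  clinical  = sample(1:4, 200, replace = TRUE)                       # e.g. NYHA class
)

# Agreement between LLM-based and questionnaire-derived classifications
kappa2(scores[, c("sq_level", "llm_level")], weight = "squared")

# Bland-Altman plot of the two measurement approaches
ba   <- scores |>
  mutate(avg = (sq_level + llm_level) / 2, diff = llm_level - sq_level)
bias <- mean(ba$diff)
loa  <- bias + c(-1.96, 1.96) * sd(ba$diff)

ggplot(ba, aes(avg, diff)) +
  geom_point(alpha = 0.4) +
  geom_hline(yintercept = bias, linetype = "dashed") +
  geom_hline(yintercept = loa, linetype = "dotted") +
  labs(x = "Mean of SQ and LLM levels", y = "LLM minus SQ difference")

# Correlation of LLM-derived levels with a clinical reference standard
cor.test(scores$llm_level, scores$clinical, method = "spearman")
```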

Conclusions

Integrating LLMs into patient-reported outcome systems represents an innovative strategy to assess QoL and medication adherence in HF. The HEALING-LLM study will provide essential groundwork for validating this technology in future large-scale clinical trials.

Graphical abstract, part 1

Graphical abstract, part 2


