
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

medRxiv [Preprint]. 2026 Jan 19:2026.01.16.26344310. [Version 1] doi: 10.64898/2026.01.16.26344310

Using Artificial Intelligence to Assess Treatment-Effect Heterogeneity in Pragmatic Cardiovascular Trials: Insights from TRANSFORM-HF

Guangyu Tong, Changjun Li, Fan Li, Yukang Zeng, Stephen J Greene, Kevin J Anstrom, Jeffrey Testani, Robert J Mentz, Eric J Velazquez
PMCID: PMC12870580  PMID: 41646819

Abstract

Background and Aims

Pragmatic clinical trials assess interventions in real-world settings, and their broad inclusion criteria and clinical variability create opportunities to explore heterogeneity of treatment effects. TRANSFORM-HF, a pragmatic trial, found no overall survival difference between torsemide and furosemide in patients hospitalized with heart failure (HF), which motivates analytic strategies that can uncover clinically meaningful variation in treatment response.

The aim of this study was to evaluate whether baseline patient characteristics modify the relative survival benefit of torsemide versus furosemide using machine learning methods.

Methods

This was a post hoc analysis of the pragmatic, multicenter, open-label, randomized TRANSFORM-HF trial, which enrolled 2,859 patients hospitalized with HF across 60 US hospitals and randomized them to torsemide or furosemide. The outcome of interest was all-cause mortality over follow-up. More than 50 baseline covariates were incorporated into a Bayesian accelerated failure time model with Bayesian Additive Regression Trees (AFT-BART) to estimate an individualized survival treatment effect (ISTE) for each patient and to identify the key baseline covariates modifying relative treatment effects. An exploratory decision tree was used to aid interpretability and to highlight influential effect modifiers.
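The exploratory tree step can be illustrated with a minimal sketch. The abstract does not say how the tree was fit, so the code below assumes the simplest possible case: a one-level regression split (a "stump") that partitions patients on a single baseline covariate so as to minimize the within-group squared error of their ISTE estimates. The function names (`sse`, `best_split`), the covariate keys (`af`, `bnp`), and all numbers are hypothetical and synthetic; they are not trial data and not the authors' implementation.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, iste, features):
    """Return (feature, threshold, score) for the single split that
    minimizes the within-group squared error of the ISTE estimates."""
    best = None
    for f in features:
        levels = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, iste) if r[f] <= t]
            right = [y for r, y in zip(rows, iste) if r[f] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

# Synthetic, illustrative values only (NOT trial data).
# af: 1 = atrial fibrillation present; bnp: a natriuretic peptide level.
rows = [
    {"af": 0, "bnp": 300}, {"af": 0, "bnp": 900}, {"af": 0, "bnp": 450},
    {"af": 1, "bnp": 350}, {"af": 1, "bnp": 1200}, {"af": 1, "bnp": 800},
]
# Hypothetical per-patient ISTE estimates (e.g. posterior means).
iste = [0.40, 0.30, 0.45, -0.30, -0.40, -0.35]

feature, threshold, score = best_split(rows, iste, ["af", "bnp"])
print(f"first split: {feature} <= {threshold}")
```

In this toy data the ISTE sign tracks the `af` flag, so the stump splits on `af`. A full tree would apply the same search recursively within each resulting group.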

Results

While the overall trial showed no survival difference, machine learning analyses revealed heterogeneity in treatment effects. Atrial fibrillation (AF), BNP/NT-proBNP levels, and prior loop diuretic use emerged as the most influential baseline modifiers. Patients without AF, with lower BNP/NT-proBNP levels, and without prior loop diuretic use showed suggestive benefit from torsemide (fitted mean ISTE up to 0.49; 95% CrI, –0.07 to 1.04), whereas patients with AF, elevated BNP/NT-proBNP levels, and prior loop diuretic use showed relative benefit with furosemide (fitted mean ISTE –0.40; 95% CrI, –0.78 to –0.05). Although most subgroup effect estimates were statistically inconclusive because of wide credible intervals, the split points of the tree were statistically significant, supporting the importance of these candidate modifiers.
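To make the reading of these intervals concrete, the following sketch shows one common way to summarize posterior draws of a subgroup effect into a mean and a 95% credible interval, and to check whether that interval excludes zero. It assumes an equal-tailed interval with linear interpolation between order statistics; the abstract does not state which interval type the authors used. The helper names (`quantile`, `summarize`, `excludes_zero`) and the simulated draws are hypothetical.

```python
import random
from statistics import mean

def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def summarize(draws, level=0.95):
    """Posterior mean and equal-tailed credible interval from a list of draws."""
    s = sorted(draws)
    tail = (1 - level) / 2
    return mean(s), quantile(s, tail), quantile(s, 1 - tail)

def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

# Simulated draws for illustration only (NOT posterior draws from the trial).
rng = random.Random(0)
draws = [rng.gauss(0.2, 0.3) for _ in range(4000)]
post_mean, lower, upper = summarize(draws)
print(f"mean={post_mean:.2f}, 95% CrI=({lower:.2f}, {upper:.2f})")

# Intervals as reported in the Results above.
print(excludes_zero(-0.07, 1.04))   # torsemide-favoring subgroup
print(excludes_zero(-0.78, -0.05))  # furosemide-favoring subgroup
```

Applied to the reported intervals, the check returns `False` for the torsemide-favoring subgroup (the interval spans zero, hence "suggestive") and `True` for the furosemide-favoring subgroup.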

Conclusions

In this exploratory AI-driven analysis of TRANSFORM-HF, we identified meaningful heterogeneity in diuretic response, with atrial fibrillation, natriuretic peptide levels, and prior diuretic use modifying relative survival benefit. These findings illustrate how integrating artificial intelligence into the analysis of pragmatic trials can move beyond average treatment effects to generate individualized, hypothesis-generating insights, and underscore the value of anticipating and systematically evaluating treatment-effect heterogeneity when designing and interpreting pragmatic cardiovascular trials.

What is the clinical question being addressed?

To apply artificial intelligence to detect treatment-effect heterogeneity in the pragmatic TRANSFORM-HF trial and to identify patient subgroups with differential benefit from torsemide versus furosemide, supporting precision medicine in heart failure.

What is the main finding?

Machine learning analyses within this pragmatic trial revealed substantial heterogeneity in the survival effects of torsemide versus furosemide. Torsemide appeared more beneficial among patients without atrial fibrillation and with lower BNP/NT-proBNP levels, whereas furosemide was favored in those with atrial fibrillation, higher BNP/NT-proBNP concentrations, and prior loop diuretic use. These exploratory analyses suggest that AI-driven evaluation of treatment-effect heterogeneity can leverage the clinical diversity inherent in pragmatic trial designs and may generate personalized, practice-relevant insights.

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.


Articles from medRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
