JAMA. 2025 Mar 18;333(19):1688–1698. doi: 10.1001/jama.2025.3046

Optimal Vasopressin Initiation in Septic Shock

The OVISS Reinforcement Learning Study

Alexandre Kalimouttou 1,2,3, Jason N Kennedy 4,5, Jean Feng 6, Harvineet Singh 6, Suchi Saria 7,8,9,10,11, Derek C Angus 4,5, Christopher W Seymour 4,5, Romain Pirracchio 1,2,6
PMCID: PMC11920879  PMID: 40098600

Key Points

Question

Does a reinforcement learning model identify the optimal initiation rule for vasopressin in patients with septic shock who are receiving norepinephrine?

Findings

Among 14 453 critically ill patients with septic shock from 232 hospitals in 4 independent datasets, a reinforcement learning model recommended the initiation of vasopressin more frequently, sooner, at a lower norepinephrine dose, and at a lower organ failure score compared with the average observed actions of clinicians. Patients in whom vasopressin was initiated similarly, vs differently, to that recommended by the reinforcement learning model had statistically significantly reduced in-hospital mortality (adjusted odds ratio, 0.81).

Meaning

The initiation of vasopressin following recommendations by a reinforcement learning model was associated with decreased mortality in patients with septic shock already receiving norepinephrine.

Abstract

Importance

Norepinephrine is the first-line vasopressor for patients with septic shock. When and whether a second agent, such as vasopressin, should be added is unknown.

Objective

To derive and validate a reinforcement learning model to determine the optimal initiation rule for vasopressin in adult, critically ill patients receiving norepinephrine for septic shock.

Design, Setting, and Participants

Reinforcement learning was used to generate the optimal rule for vasopressin initiation to improve short-term and hospital outcomes, using electronic health record data from 3608 patients who met the Sepsis-3 shock criteria at 5 California hospitals from 2012 to 2023. The rule was evaluated in 628 patients from the California dataset and 3 external datasets comprising 10 217 patients from 227 US hospitals, using weighted importance sampling and pooled logistic regression with inverse probability weighting.

Exposures

Clinical, laboratory, and treatment variables grouped hourly for 120 hours in the electronic health record.

Main Outcome and Measure

The primary outcome was in-hospital mortality.

Results

The derivation cohort (n = 3608) included 2075 men (57%) and had a median (IQR) age of 63 (56-70) years and Sequential Organ Failure Assessment (SOFA) score at shock onset of 5 (3-7 [range, 0-24, with higher scores associated with greater mortality]). The validation cohorts (n = 10 217) were 56% male (n = 5743) with a median (IQR) age of 67 (57-75) years and a SOFA score of 6 (4-9). In validation data, the model suggested vasopressin initiation in more patients (87% vs 31%), earlier relative to shock onset (median [IQR], 4 [1-8] vs 5 [1-14] hours), and at lower norepinephrine doses (median [IQR], 0.20 [0.08-0.45] vs 0.37 [0.17-0.69] µg/kg/min) compared with clinicians’ actions. The rule was associated with a larger expected reward in validation data compared with clinician actions (weighted importance sampling difference, 31 [95% CI, 15-52]). The adjusted odds of hospital mortality were lower if vasopressin initiation was similar to the rule compared with different (odds ratio, 0.81 [95% CI, 0.73-0.91]), a finding consistent across external validation sets.

Conclusions and Relevance

In adult patients with septic shock receiving norepinephrine, the use of vasopressin was variable. A reinforcement learning model developed and validated in several observational datasets recommended more frequent and earlier use of vasopressin than average care patterns and was associated with reduced mortality.


This multicenter study uses reinforcement learning to examine the treatment implications of a vasopressin initiation rule developed to improve short-term and hospital outcomes in critically ill adults with septic shock receiving norepinephrine.

Introduction

Sepsis is common and can be fatal, accounting for more than 270 000 deaths in the US annually.1 Emergency care of patients with sepsis and life-threatening organ dysfunction includes fluid and vasopressor administration.2

International guidelines3 recommend using norepinephrine as the first-line vasopressor. They suggest considering vasopressin as a second-line agent when blood pressure remains poorly responsive to norepinephrine. However, septic shock is a dynamic condition, complicating the decision to initiate vasopressin. Although vasopressin use has increased,4 it is used inconsistently, and there is little guidance from randomized clinical trials on optimal timing.

Reinforcement learning is a branch of machine learning in which a virtual agent learns by trial and error an optimized set of treatment rules that maximizes the probability of a good outcome.5 It is well suited to complex, dynamic problems such as septic shock. The Optimal Vasopressin Initiation in Septic Shock (OVISS) study applied reinforcement learning in multiple electronic health record datasets to derive, validate, and measure the treatment implications of a vasopressin initiation rule optimized to improve both short-term and hospital outcomes among critically ill adults receiving norepinephrine for septic shock.

Methods

This study is reported according to the TRIPOD+AI guidelines6 (Supplement 1). Certified deidentification removed the institutional review board approval requirement for the University of California, San Francisco (UCSF) data. Medical Information Mart for Intensive Care (MIMIC-IV) and eICU Collaborative Research Database (eICU-CRD) data are publicly available. Data from University of Pittsburgh Medical Center (UPMC) were transferred to UCSF under a data-sharing agreement (DUA00003086). The University of Pittsburgh Institutional Review Board waived the requirement of written informed consent (IRB number: STUDY19030218).

Derivation and Validation Cohorts

We developed a reinforcement learning model to identify the optimal vasopressin initiation strategy (referred to as the reinforcement learning rule) in critically ill patients with septic shock who are receiving norepinephrine. We assessed the treatment implications of the rule by comparing the recommendations provided by the reinforcement learning rule with clinicians’ actual vasopressin initiation practices (referred to as the clinician-observed action).

First, the training, testing, and internal validation of the model used data from the UCSF De-Identified Clinical Data Warehouse.7 These data included more than 120 000 critically ill admissions from 2012 to 2023. We included unique patients experiencing their first episode of community- or hospital-onset septic shock, defined by the Sepsis-3 criteria,2 which could be met in either the emergency department or intensive care unit (ICU), and were already receiving norepinephrine.

Second, to externally validate the reinforcement learning model, we used 3 datasets: MIMIC-IV,8 eICU-CRD,9 and a previously curated electronic health record dataset of adult critically ill patients admitted to 18 hospitals at UPMC from 2018 to 2020.10 These 3 databases include more than 250 000 admissions in 227 US hospitals from 2008 to 2020. MIMIC-IV and eICU-CRD contain only data during intensive care, while UPMC includes data irrespective of critical care location. Abstracted data included demographics, vital signs, severity of illness, treatment, and laboratory results. Race was derived using fixed categories consistent with the Centers for Medicare & Medicaid Services meaningful use dataset. Race and ethnicity data were collected to ensure fairness and equity.

Data Preparation for Reinforcement Learning Models

UCSF data were split into derivation (training and test) and internal validation sets using a 70/15/15 random splitting procedure. The validation cohorts consisted of the UCSF internal validation set and 3 external datasets (Figure 1). To prepare for model training,11,12 each hospital stay was considered as a single trajectory, discretized into 1-hour epochs starting at t = 1 hour after shock onset. Data were truncated at the earliest of shock recovery (defined as ≥12 hours off norepinephrine), ICU discharge (dead or alive), or t = 120 hours.
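
The truncation step can be illustrated with a minimal Python sketch. It assumes a hypothetical per-patient list of hourly mean norepinephrine doses and a convention, not stated in the text, that a trajectory ends at the epoch in which the 12th consecutive hour off norepinephrine is reached. The function name and inputs are illustrative and are not taken from the study code.

```python
MAX_EPOCHS = 120      # trajectories stop at t = 120 hours (from the text)
RECOVERY_HOURS = 12   # hours off norepinephrine that define shock recovery


def last_epoch(norepi_by_epoch, icu_discharge_epoch=None):
    """Return the last 1-hour epoch (1-indexed) kept for one trajectory.

    norepi_by_epoch[k - 1] is the mean norepinephrine dose during epoch k.
    icu_discharge_epoch is the epoch of ICU discharge (dead or alive), if any.
    Both inputs are hypothetical data structures used for illustration.
    """
    limit = min(len(norepi_by_epoch), MAX_EPOCHS)
    if icu_discharge_epoch is not None:
        limit = min(limit, icu_discharge_epoch)

    hours_off = 0
    for epoch in range(1, limit + 1):
        if norepi_by_epoch[epoch - 1] > 0:
            hours_off = 0          # still on norepinephrine: reset the counter
        else:
            hours_off += 1
            if hours_off >= RECOVERY_HOURS:
                return epoch       # assumed convention: stop once recovery is met
    return limit
```

For example, a patient who receives norepinephrine for 5 hours and none afterward would, under this convention, contribute 17 epochs.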

Figure 1. Patient Accrual in a Study of a Reinforcement Learning Rule for Vasopressin Initiation in Septic Shock.

eICU-CRD indicates eICU Collaborative Research Database; MIMIC-IV, Medical Information Mart for Intensive Care; UCSF, University of California, San Francisco; UPMC, University of Pittsburgh Medical Center.

Each patient was described by a fixed set of baseline characteristics, including age, sex, height, weight, race and ethnicity, Sequential Organ Failure Assessment (SOFA) score (range, 0-24 points, with higher scores indicating worse organ function), mean arterial pressure, serum lactate, norepinephrine dose, fluid before inclusion, mechanical ventilation, continuous kidney replacement therapy, and Charlson Comorbidity Index score, a method of categorizing comorbidities based on International Classification of Diseases, Ninth Revision (ICD-9) diagnosis scores in administrative data (range, 0-37). Other data, such as source of infection, microbiology, or time to source control, were not captured or would not be pragmatically available to clinicians at the time of vasopressor initiation.

Each 1-hour epoch was described by 14 clinical, time-varying features (eTable 1 in Supplement 1), including patient vital signs, laboratory measurements, and organ support (eg, mechanical ventilation, kidney replacement therapy). Additional data included the volume of intravenous fluids administered up to the current epoch after shock onset, including colloids but excluding transfusion and medication fluids. The mean vasopressor doses administered to each patient were abstracted in each epoch. If a measurement was unavailable for a given variable in a specific epoch, the last observed value was carried forward, reflecting clinical practice.13 If multiple measurements were available for a variable during an epoch, the mean value was used.
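
The two aggregation rules described above (mean of multiple measurements within an epoch, last observation carried forward otherwise) can be sketched as follows. This is an illustration only: the input format, the function name, and the assumption that epoch k covers the interval from k - 1 to k hours after shock onset are not taken from the study.

```python
import math


def hourly_series(measurements, n_epochs):
    """Aggregate (hours_since_onset, value) pairs into 1-hour epochs.

    Multiple values in one epoch are averaged; an epoch with no value takes
    the last observed epoch value. Epochs before the first measurement stay
    None, since there is nothing to carry forward.
    """
    buckets = [[] for _ in range(n_epochs)]
    for hours, value in measurements:
        index = math.floor(hours)  # assumed: epoch k (1-indexed) covers [k-1, k) hours
        if 0 <= index < n_epochs:
            buckets[index].append(value)

    series, last_value = [], None
    for values in buckets:
        if values:
            last_value = sum(values) / len(values)  # mean within the epoch
        series.append(last_value)                   # otherwise carry forward
    return series
```

With two lactate values in the first hour and one in the fourth, the first three epochs share the first-hour mean and the remaining epochs carry the later value.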

Model Training

A reinforcement learning model is a specific type of machine learning algorithm where an agent learns how to make decisions by interacting with an environment. The agent aims to maximize rewards over time by choosing actions based on the current state, with the decision-making process guided by the likelihood of future outcomes (eMethods and eFigure 1 in Supplement 1).

We defined the action for each epoch as a binary decision (eg, “start vasopressin” or “do not start vasopressin”). The model reward per epoch was calculated as a weighted combination of in-hospital mortality, changes in serum lactate levels, mean arterial pressure, SOFA score,14 and norepinephrine dose, with each element scaled by a predetermined coefficient (eMethods in Supplement 1). Gamma, the discount factor that determines the effective horizon of the reinforcement learning agent, was set at 0.99. We further adjusted the model hyperparameters (ie, parameters that influence how the learning process happens) and determined the optimal number of reinforcement learning algorithm iterations (eMethods in Supplement 1). Assumptions regarding model parameters were tested in sensitivity analyses (eMethods in Supplement 1).
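
As a rough illustration of how a weighted per-epoch reward and a discount factor combine, consider the sketch below. The coefficient values and sign conventions are hypothetical placeholders; the coefficients actually used are described only in the study's supplement. Only gamma = 0.99 comes from the text.

```python
GAMMA = 0.99  # discount factor reported in the text

# Hypothetical coefficients for illustration only (not the study's values).
# Negative weights penalize death and increases in lactate, SOFA score, and
# norepinephrine dose; a positive weight rewards a rise in mean arterial pressure.
WEIGHTS = {
    "death": -10.0,
    "delta_lactate": -1.0,
    "delta_map": 0.1,
    "delta_sofa": -1.0,
    "delta_norepi": -1.0,
}


def epoch_reward(components, weights=WEIGHTS):
    """Weighted sum of the reward components observed in one epoch."""
    return sum(weights[name] * components[name] for name in weights)


def discounted_return(rewards, gamma=GAMMA):
    """Sum of rewards where the reward k epochs ahead is scaled by gamma**k."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With gamma = 0.99, a reward 100 epochs in the future still carries a weight of about 0.37 (0.99 to the power 100), so outcomes across most of the 120-hour window influence each hourly decision.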

Statistical Analysis

Continuous variables were described as mean (SD) or median (IQR) and categorical variables as counts (percentage). For descriptive analyses, the significance level was set at .05, with no adjustment for multiple comparisons.

Evaluation and Treatment Implications of the Reinforcement Learning Rule

Model performance was evaluated using weighted importance sampling,15 an off-policy evaluation16 method that compares the mean individual reward obtained with the reinforcement learning rule with the mean reward obtained from the clinician-observed action. Weighted importance sampling and the weighted importance sampling difference between the reinforcement learning rule and the clinician-observed action were calculated in both internal and external validation data for the overall reward and individual reward components (eMethods in Supplement 1).17,18,19
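
A minimal sketch of a trajectory-level weighted importance sampling estimate is shown below. It assumes that, for each observed trajectory, the probability of every observed action under both the evaluated rule and the clinician (behavior) policy has already been estimated; how the study estimated these probabilities, and the exact variant of the estimator it used, are described in its supplement and are not reproduced here. The dictionary keys are hypothetical.

```python
def importance_weight(eval_probs, behavior_probs):
    """Product over epochs of P(action | rule) / P(action | clinician policy)."""
    weight = 1.0
    for p_eval, p_behavior in zip(eval_probs, behavior_probs):
        weight *= p_eval / p_behavior
    return weight


def wis_difference(trajectories):
    """Return (WIS estimate for the rule, mean observed return, difference).

    Each trajectory is a dict with hypothetical keys 'eval_probs',
    'behavior_probs' (one entry per epoch), and 'return' (discounted return).
    """
    weights = [
        importance_weight(t["eval_probs"], t["behavior_probs"])
        for t in trajectories
    ]
    returns = [t["return"] for t in trajectories]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no observed trajectory is compatible with the rule")

    # Self-normalized (weighted) estimate: weights are divided by their sum.
    wis = sum(w * g for w, g in zip(weights, returns)) / total_weight
    observed = sum(returns) / len(returns)
    return wis, observed, wis - observed
```

A positive difference indicates a higher estimated reward under the evaluated rule than under the observed actions. Trajectories in which clinicians took an action the rule would never take receive a weight of 0 and do not contribute to the estimate for the rule.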

We evaluated the treatment implications of the rule recommended by the reinforcement learning algorithm in the 3 external datasets. We described the percentage of patients who would receive vasopressin, timing of initiation relative to shock onset, dose of norepinephrine, SOFA score, and serum lactate at vasopressin initiation under the clinician-observed action and under the optimal reinforcement learning rule.

We used weighted pooled logistic regression to estimate the adjusted odds of in-hospital mortality, comparing patients in whom care was similar vs different to the reinforcement learning rule in each time block, treating time as a discrete variable. Specifically, we defined concordance in each time block as present if the clinician-observed action matched the action recommended by the reinforcement learning algorithm. To account for time-varying confounders and treatment decisions being made hourly, the regression model was also weighted using inverse probability of treatment weighting20 (eMethods in Supplement 1). Predicted probabilities from the model were used to estimate survival curves, censored at 120 hours for illustration. This evaluation also included assessment of risk-adjusted odds of in-hospital mortality, comparing several simple rules for vasopressin initiation vs observed clinical actions, including (1) vasopressin initiation at a serum lactate level greater than 4 mmol/L, (2) norepinephrine dose greater than 0.7 µg/kg/min, and (3) mean arterial pressure less than 65 mm Hg and more than 12 hours from shock onset. Secondary outcomes included the use of mechanical ventilation and kidney replacement therapy during the patient trajectory, conditional on not yet receiving that level of organ support.
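
The three simple comparison rules and the per-epoch definition of concordance can be written down directly, as in the sketch below. The state keys and function names are hypothetical. The final helper shows one common way to build time-varying inverse probability weights from estimated probabilities of the observed action; it is a generic construction and not necessarily the weighting scheme the study used.

```python
def rule_lactate(state):
    """Simple rule 1: serum lactate greater than 4 mmol/L."""
    return state["lactate"] > 4.0


def rule_norepinephrine(state):
    """Simple rule 2: norepinephrine dose greater than 0.7 ug/kg/min."""
    return state["norepi"] > 0.7


def rule_map_and_time(state):
    """Simple rule 3: MAP below 65 mm Hg and more than 12 hours from shock onset."""
    return state["map"] < 65 and state["hours"] > 12


def concordance(clinician_actions, recommended_actions):
    """Per-epoch indicator that the observed action matched the recommendation."""
    return [c == r for c, r in zip(clinician_actions, recommended_actions)]


def cumulative_ip_weights(prob_observed_action):
    """Running product of 1 / P(observed action | history), one value per epoch."""
    weights, running = [], 1.0
    for p in prob_observed_action:
        running /= p
        weights.append(running)
    return weights
```

In a pooled logistic regression, each patient-epoch would then form one row, with the concordance indicator as the exposure and the cumulative weight applied to that row.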

Robustness of Results to Confounding

Both the derivation of the rule and all validation exercises were conducted in observational data. As such, the study is susceptible to confounding bias, where features about the patients not captured in the data nevertheless influenced both clinician actions and patient outcomes. Thus, we conducted 3 evaluations of the robustness of our results to confounding (eMethods in Supplement 1): (1) calculation of E-values,21 (2) inclusion of ICU- and hospital-level features, and (3) falsification analyses, using a negative outcome (“death occurring on odd vs even time blocks”) and a negative exposure (“introduce vasopressin systematically 2 hours after shock onset”).22

Sensitivity Analyses

Several sensitivity analyses were performed. First, we attempted further confounder control by repeating analyses in restricted cohorts,23,24 including (1) patients for whom the model recommended vasopressin and who actually received it, (2) patients for whom the model advised against introducing vasopressin at any point in their treatment trajectory, (3) patients without severe acute kidney injury (eMethods in Supplement 1), and (4) patients in whom vasopressin was started within 48 hours. These smaller cohorts exclude patients in whom clinicians may have important unmeasured biases about the initiation of vasopressin. Second, we assessed models after varying gamma from 0.99 to 0.95 and 0.90. Third, we repeated models using 2-hour time epochs. Fourth, we used alternative weighting for the reward in which each component of the reward function was given the same weight and repeated the analyses.

Reinforcement Learning Model Interpretability

To enhance interpretability, the contribution of each feature in the reinforcement learning rule was quantified using Shapley Additive Explanations (SHAP) values.25 Specifically, we analyzed SHAP values to determine the relative importance of clinical features at the time the model recommended vasopressin initiation (eMethods in Supplement 1).
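
SHAP values are based on Shapley values, which attribute a model output to each feature by averaging that feature's marginal contribution over all orders in which features can be added. The toy sketch below computes exact Shapley values for a made-up scoring function over four feature names; it does not use the SHAP library, and the scoring function is invented for illustration and bears no relation to the study's model.

```python
from itertools import permutations

FEATURES = ["hours_since_onset", "sofa", "norepi_dose", "lactate"]


def toy_score(present):
    """Invented score given the set of features that are 'known' to the model."""
    score = 0.0
    if "hours_since_onset" in present:
        score += 2.0
    if "sofa" in present:
        score += 1.0
    if "norepi_dose" in present and "lactate" in present:
        score += 1.0  # interaction: credit is shared between the two features
    return score


def shapley_values(value_fn, features):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    contributions = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        previous = value_fn(present)
        for name in order:
            present.add(name)
            current = value_fn(present)
            contributions[name] += current - previous
            previous = current
    return {name: total / len(orderings) for name, total in contributions.items()}
```

The attributions sum to the difference between the score with all features and the score with none. Exact enumeration is feasible only for a handful of features, which is why practical tools rely on approximations.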

Model Fairness and Equity

To study model fairness and equity, we stratified the evaluation analyses and treatment implications by sex (male, female), race (Black, White, and other/unknown), and age (<65 and ≥65 years).

Results

Patients

There were 14 453 adult patients with septic shock receiving norepinephrine in the 4 datasets (Figure 1; Table 1). Of these, 3608 were randomly selected from the UCSF cohort for the derivation cohort (median [IQR] age, 63 [56-70] years; 2075 [57%] were male; median SOFA score at shock onset, 5 [3-7]). An additional 628 UCSF patients comprised the internal validation cohort, and external cohorts included 3056 patients from MIMIC-IV, 910 patients from eICU-CRD, and 6251 patients from UPMC (overall median [IQR] age, 67 [57-76] years; 5743 [56%] were male; median SOFA score at shock onset, 6 [4-9]).

Table 1. Patient Characteristics[a]

Values are No. (%) unless otherwise indicated.[b] The first data column is the derivation cohort; the remaining 4 columns are the internal and external validation cohorts (validation set).

| Characteristic | Derivation: UCSF training and test sets (n = 3608) | Validation: UCSF internal (n = 628) | Validation: MIMIC-IV (n = 3056) | Validation: eICU-CRD (n = 910) | Validation: UPMC (n = 6251) |
| --- | --- | --- | --- | --- | --- |
| Summary of datasets | | | | | |
| Enrollment period | 2012-2023 | 2012-2023 | 2008-2019 | 2014-2015 | 2019-2020 |
| No. of hospitals | 5 | 5 | 1 | 208 | 18 |
| Total No. of admissions | 119 730 | 119 730 | 76 540 | 200 859 | 102 467 |
| Setting | Integrated health system, California | Integrated health system, California | Integrated health system, Boston | Health systems across US | Integrated health system in Pennsylvania |
| Patient characteristics | | | | | |
| Age, median (IQR), y | 63 (56-70) | 63 (56-69) | 67 (56-77) | 67 (57-77) | 66 (57-75) |
| Admission weight, median (IQR), kg | 74 (62-90) | 74 (61-90) | 80 (66-96) | 75 (64-96) | 77 (54-97) |
| Sex: male | 2075 (57) | 351 (56) | 1820 (59) | 468 (51) | 3455 (55) |
| Sex: female | 1533 (43) | 277 (44) | 1236 (41) | 442 (49) | 2796 (45) |
| Race[c]: Black | 342 (10) | 65 (11) | 262 (9) | 86 (9) | 716 (11) |
| Race[c]: White | 1654 (45) | 270 (44) | 1858 (61) | 640 (70) | 4970 (80) |
| Race[c]: other/unknown[d] | 1612 (45) | 293 (45) | 936 (31) | 184 (21) | 565 (9) |
| Charlson Comorbidity Index score, mean (SD)[e] | 4 (2) | 4 (1) | 5 (2) | 5 (3) | 2 (1) |
| Clinical presentation | | | | | |
| Heart rate, mean (SD), beats/min | 97 (25) | 99 (24) | 90 (22) | 98 (24) | 95 (25) |
| Systolic blood pressure, median (IQR), mm Hg | 105 (81-121) | 112 (88-119) | 102 (88-117) | 104 (85-113) | 102 (91-116) |
| Respiration rate, mean (SD), breaths/min | 21 (6) | 20 (5) | 20 (6) | 22 (8) | 22 (7) |
| Temperature, mean (SD), °C | 37.1 (1.9) | 37.0 (1.9) | 36.7 (1.0) | 36.7 (1.3) | 36.6 (1.4) |
| Glasgow Coma Scale score, mean (SD) | 11 (4) | 10 (5) | 10 (5) | 12 (4) | 10 (5) |
| SOFA score, median (IQR)[f] | 5 (3-7) | 5 (3-9) | 4 (3-6) | 4 (3-7) | 5 (3-8) |
| Biological results | | | | | |
| White blood cell count, median (IQR), ×10⁹/L | 13.2 (9.1-17.8) | 13.2 (9.1-18.0) | 12.5 (8.2-18.3) | 12.4 (8.0-18.3) | 13.8 (8.6-20.2) |
| Serum lactate, median (IQR), mmol/L | 1.9 (1.2-3.5) | 1.8 (1.2-3.8) | 1.8 (1.7-3.0) | 1.8 (1.6-3.4) | 2.8 (1.6-5.1) |
| Serum creatinine, median (IQR), mg/dL | 1.3 (0.9-2.2) | 1.3 (0.8-2.2) | 1.4 (1.0-2.1) | 1.7 (1.1-2.7) | 1.7 (1.1-2.8) |
| Blood urea nitrogen, median (IQR), mg/dL | 27 (17-45) | 25 (15-44) | 27 (17-42) | 35 (23-51) | 33 (20-51) |
| Hemoglobin, mean (SD), g/dL | 10.3 (2.3) | 10.4 (2.3) | 10.6 (2.7) | 10.1 (2.6) | 10.3 (2.8) |
| Platelet count, median (IQR), ×10⁹/L | 194 (131-289) | 192 (130-287) | 190 (129-270) | 168 (97-237) | 171 (103-250) |
| International normalized ratio, median (IQR) | 1.6 (1.2-2.0) | 1.5 (1.2-1.9) | 1.3 (1.2-1.7) | 1.4 (1.2-1.8) | 1.6 (1.3-2.1) |
| Bilirubin, median (IQR), mg/dL | 0.9 (0.4-1.1) | 0.9 (0.5-1.1) | 0.7 (0.4-1.3) | 0.7 (0.4-1.2) | 1.1 (0.6-2.3) |
| Organ support | | | | | |
| Norepinephrine dose, median (IQR), µg/kg/min | 0.05 (0.03-0.10) | 0.06 (0.03-0.10) | 0.12 (0.06-0.23) | 0.26 (0.12-0.46) | 0.16 (0.06-0.45) |
| Vasopressin introduction during shock | 908 (25) | 151 (24) | 1485 (48) | 179 (20) | 1522 (24) |
| Mechanical ventilation[g]: at shock onset | 692 (19) | 127 (20) | 22 (1) | 77 (8) | 3551 (57) |
| Mechanical ventilation[g]: during the entire trajectory | 1252 (35) | 225 (35) | 225 (9) | 124 (14) | 4673 (75) |
| Kidney replacement therapy[h]: at shock onset | 168 (5) | 27 (4) | 22 (1) | 11 (1) | 421 (7) |
| Kidney replacement therapy[h]: during the entire trajectory | 268 (7) | 36 (6) | 504 (16) | 141 (15) | 1278 (20) |
| Outcome | | | | | |
| In-hospital mortality | 1603 (44) | 254 (41) | 1186 (38) | 256 (28) | 2705 (43) |

Abbreviations: EHR, electronic health record; eICU-CRD, eICU Collaborative Research Database; MIMIC-IV, Medical Information Mart for Intensive Care; SOFA, Sequential Organ Failure Assessment; UCSF, University of California, San Francisco; UPMC, University of Pittsburgh Medical Center.

SI conversion factors: To convert white blood cell count to /μL, divide by 0.001; lactate to mg/dL, divide by 0.111; creatinine to μmol/L, multiply by 88.4; urea nitrogen to mmol/L, multiply by 0.357; hemoglobin to g/L, multiply by 10.0; platelet count to ×10³/μL, divide by 1.0; bilirubin to μmol/L, multiply by 17.104.

[a] Each value represents the first-hour measurement at shock onset when available; otherwise, the most recent value from the previous 24 hours was used.

[b] Unless otherwise indicated.

[c] Race was self-reported by patients and recorded in the EHR. It was categorized using fixed classifications aligned with the Centers for Medicare & Medicaid Services’ EHR meaningful use dataset.

[d] Includes Asian, Chinese, Filipino, Hawaiian, Hispanic, Native American/American Indian, and Other Pacific Islander.

[e] A method of categorizing comorbidities of patients based on the International Classification of Diseases, Ninth Revision (ICD-9) diagnosis scores found in administrative data; score ranges from 0 to 37.

[f] Corresponds to the severity of organ dysfunction across 6 organ systems (cardiovascular, hepatic, hematologic, respiratory, neurological, and renal), each scored from 0 to 4 points. The total score range is 0 to 24 points.

[g] Corresponds to endotracheal or tracheostomy tube to assist or replace spontaneous breathing.

[h] Term includes intermittent hemodialysis and continuous kidney replacement therapy to assist kidney function.

In the 3 external datasets, clinicians initiated vasopressin in 3186 of 10 217 patients already receiving norepinephrine (31%) (Table 2). Vasopressin was started at a median (IQR) SOFA score of 9 (6-12), a median 5 (1-14) hours after shock onset, norepinephrine dose of 0.37 (0.17-0.69) µg/kg/min, and serum lactate of 3.6 (1.8-6.8) mmol/L. In-hospital mortality ranged from 28% to 43%. UPMC patients received more organ support, and clinicians there initiated vasopressin at a higher norepinephrine dose (UPMC, 0.6 µg/kg/min vs UCSF, 0.14 µg/kg/min).

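Table 2. Patient Data in the Epoch Containing Vasopressin Initiation in External Validation Sets.

| Measure | Dataset | Clinician-observed action | Reinforcement learning rule | P value |
| --- | --- | --- | --- | --- |
| Patients with vasopressin started, No. (%) | Overall | 3186 (31) | 8884 (87) | <.001 |
| Patients with vasopressin started, No. (%) | MIMIC-IV | 1485 (47) | 2649 (87) | <.001 |
| Patients with vasopressin started, No. (%) | eICU-CRD | 179 (20) | 813 (89) | <.001 |
| Patients with vasopressin started, No. (%) | UPMC | 1522 (24) | 5422 (87) | <.001 |
| Norepinephrine dose, median (IQR), µg/kg/min | Overall | 0.37 (0.17-0.69) | 0.2 (0.08-0.45) | <.001 |
| Norepinephrine dose, median (IQR), µg/kg/min | MIMIC-IV | 0.28 (0.14-0.40) | 0.16 (0.08-0.30) | <.001 |
| Norepinephrine dose, median (IQR), µg/kg/min | eICU-CRD | 0.68 (0.35-1.35) | 0.35 (0.13-1.12) | <.001 |
| Norepinephrine dose, median (IQR), µg/kg/min | UPMC | 0.6 (0.20-1.40) | 0.2 (0.08-0.51) | <.001 |
| Time since shock onset, median (IQR), h | Overall | 5 (1-14) | 4 (1-8) | <.001 |
| Time since shock onset, median (IQR), h | MIMIC-IV | 4 (1-12) | 5 (2-9) | .01 |
| Time since shock onset, median (IQR), h | eICU-CRD | 6 (2-14) | 3 (1-7) | <.001 |
| Time since shock onset, median (IQR), h | UPMC | 6 (2-17) | 4 (1-7) | <.001 |
| SOFA score, median (IQR)[a] | Overall | 9 (6-12) | 7 (5-10) | <.001 |
| SOFA score, median (IQR)[a] | MIMIC-IV | 10 (8-12) | 10 (7-12) | <.001 |
| SOFA score, median (IQR)[a] | eICU-CRD | 13 (11-15) | 11 (9-13) | <.001 |
| SOFA score, median (IQR)[a] | UPMC | 7 (5-10) | 6 (3-8) | <.001 |
| Serum lactate, median (IQR), mmol/L | Overall | 3.6 (1.8-6.8) | 2.5 (1.7-4.9) | <.001 |
| Serum lactate, median (IQR), mmol/L | MIMIC-IV | 2.6 (1.7-5.2) | 2.3 (1.7-4.5) | <.001 |
| Serum lactate, median (IQR), mmol/L | eICU-CRD | 3.2 (1.7-5.9) | 1.7 (1.1-2.9) | <.001 |
| Serum lactate, median (IQR), mmol/L | UPMC | 4.7 (2.5-8.8) | 2.7 (1.7-5.3) | <.001 |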

Abbreviations: eICU-CRD, eICU Collaborative Research Database; MIMIC-IV, Medical Information Mart for Intensive Care; SOFA, Sequential Organ Failure Assessment; UPMC, University of Pittsburgh Medical Center.

SI conversion factor: To convert lactate to mg/dL, divide by 0.111.

[a] Corresponds to the severity of organ dysfunction across 6 organ systems (cardiovascular, hepatic, hematologic, respiratory, neurological, and renal), each scored from 0 to 4 points. The total score range is 0 to 24 points.

Evaluation of Reinforcement Learning Model for Vasopressin Initiation

Among 2362 patients in whom vasopressin was recommended by the reinforcement learning rule and initiated by clinicians during their trajectory, 338 (14%) had vasopressin initiated in the same epoch as the reinforcement learning rule recommendation. Compared with the clinician-observed action in external data, the reinforcement learning model suggested vasopressin initiation in more patients (8884 of 10 217 [87%]; P < .001), at a lower median (IQR) SOFA score (7 [5-10]; P < .001), earlier after shock onset (4 [1-8] hours; P < .001), at lower norepinephrine doses (0.20 [0.08-0.45] µg/kg/min; P < .001), and at lower serum lactate (2.5 [1.7-4.9] mmol/L; P < .001) (Figure 2). These results were consistent across each of the 3 validation sets, except for timing of vasopressin initiation in MIMIC-IV (median [IQR], 4 [1-12] hours for clinician-observed action vs 5 [2-9] hours for reinforcement learning rule; P = .01) (Table 2; eTable 2 in Supplement 1).

Figure 2. Comparison of Clinician-Observed Administration of Vasopressin With Treatment Recommended by the Reinforcement Learning Rule.

A and B, 100 randomly selected patients included in each panel. Each line represents 1 patient trajectory. Red indicates the patient received norepinephrine alone, with a color scale representing the norepinephrine dose, and purple indicates both norepinephrine and vasopressin were infused. White corresponds to discharge alive. Black boxes at the end of the trajectory represent mortality. C, Number of patients in whom vasopressin was initiated in each time block for the clinician-observed actions. D, Number of patients in whom vasopressin was initiated per the reinforcement learning rule.

The reinforcement learning rule outperformed the clinician-observed action (weighted importance sampling difference, 31 [95% CI, 15-52]) (Figure 3). The weighted importance sampling difference was consistent for each component of the reward (in-hospital mortality, 29 [95% CI, 27-34]; mean arterial pressure, 32 [95% CI, 29-36]; SOFA score, 28 [95% CI, 21-37]; norepinephrine dose, 35 [95% CI, 26-49]; and serum lactate, 25 [95% CI, 22-27]). These findings were consistent across each of the datasets (Figure 3; eTable 3 in Supplement 1).

Figure 3. Weighted Importance Sampling.

Weighted importance sampling measures the mean individual reward obtained using the reinforcement learning rule and the mean reward associated with the clinician-observed actions. Weighted importance sampling was estimated in the internal and external validation sets (overall reward) for each reward component independently (reward component) and each internal and external validation set separately (internal/external validation). Results are presented as the difference in weighted importance sampling between the reinforcement learning rule and the clinician’s observed rule, with bootstrapped 95% CIs. A negative weighted importance sampling difference indicated that the clinician-observed actions were associated with a higher reward, whereas a positive difference suggested the reinforcement learning rule yielded a higher reward. For example, the reinforcement learning rule was associated with a higher overall reward (weighted importance sampling difference, 31 [95% CI, 15-52]) as well as higher rewards for each component individually. In the UCSF internal validation set, the lower bound of the 95% CI crossed 0 (weighted importance sampling difference, 15 [95% CI, −48 to 129]), indicating that the overall reward obtained with the reinforcement learning rule was not statistically higher than that associated with the clinician-observed actions. The dotted line is the reference line (ie, no difference in weighted importance sampling between the algorithm rule and the clinician-observed actions).

eICU-CRD indicates eICU Collaborative Research Database; MIMIC-IV, Medical Information Mart for Intensive Care; SOFA, Sequential Organ Failure Assessment; UCSF, University of California, San Francisco; UPMC, University of Pittsburgh Medical Center.

Using pooled logistic regression with inverse probability of treatment weighting in 3 external datasets, concordance with the reinforcement learning rule in each time block was associated with decreased odds of in-hospital mortality at each time point (adjusted odds ratio [aOR], 0.81 [robust 95% CI, 0.73-0.91]) (Figure 4; eFigure 2 in Supplement 1). Reinforcement learning rule concordance was also associated with reduced odds of requiring kidney replacement therapy at each time point (aOR, 0.47 [robust 95% CI, 0.46-0.49]), but not with the odds of requiring mechanical ventilation at each time point (aOR, 1.00 [robust 95% CI, 0.96-1.04]) (eTable 4 in Supplement 1). Simpler decision rules for initiating vasopressin were associated with statistically significantly worse odds of in-hospital mortality compared with the clinician-observed action (Figure 4).

Figure 4. Risk-Adjusted Odds of In-Hospital Mortality Comparing Concordance With the Reinforcement Learning Rule or a Simple Clinical Rule With Clinician-Observed Actions.

Distribution using a regular standard error estimator or a robust standard error estimator. Reinforcement learning rule results displayed for combined validation cohort as well as each individual cohort. Simple clinical decision rule results displayed for combined validation cohort only. The risk-adjusted odds for in-hospital mortality were derived from inverse probability of treatment weighted pooled logistic regression models, adjusting for baseline and time-varying confounders. The results for vasopressin initiated per the reinforcement learning rule show the ORs for in-hospital mortality of concordance with the reinforcement learning rule in each 1-hour epoch compared with the clinician-observed actions for the overall external validation set and for each external validation dataset separately. The results for vasopressin initiated per the simple clinical rule show the ORs for in-hospital mortality of concordance with 3 independent simple clinical rules for vasopressin initiation in each 1-hour epoch for the overall external validation set. The 3 simple rules are: “initiate vasopressin when serum lactate is >4 mmol/L,” “initiate vasopressin when norepinephrine dose is >0.7 μg/kg/min,” and “initiate vasopressin when MAP is <65 mm Hg and time from shock onset is at least 12 hours.”

eICU-CRD indicates eICU Collaborative Research Database; MAP, mean arterial pressure; MIMIC-IV, Medical Information Mart for Intensive Care; OR, odds ratio; UPMC, University of Pittsburgh Medical Center.

Robustness to Confounding

First, these analyses were supported by an E-value of 1.46. Second, when adding ICU- and hospital-level variables to models, the association between concordance and the odds of in-hospital mortality at each time point was consistent (UPMC aOR for in-hospital mortality, 0.87 [95% CI, 0.75-1.01]) (eTable 5 in Supplement 1). Third, falsification analyses found no association between concordance with the reinforcement learning rule and the negative outcome (aOR, 1.17 [robust 95% CI, 0.94-1.44]) nor between the negative exposure and mortality (aOR, 1.10 [robust 95% CI, 0.94-1.30]).
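
The reported E-value can be approximately reproduced from the pooled adjusted odds ratio. The sketch below uses the standard E-value formula for a risk ratio and, as an assumption that the text does not confirm, the square-root transformation commonly applied to an odds ratio when the outcome is common (in-hospital mortality here ranged from 28% to 43%). The function name is illustrative.

```python
import math


def e_value_from_odds_ratio(odds_ratio, common_outcome=True):
    """E-value for a point estimate reported as an odds ratio.

    For a common outcome the odds ratio is first converted to an approximate
    risk ratio by taking its square root (assumed here, not confirmed by the
    text). Estimates below 1 are inverted before applying the formula
    E = RR + sqrt(RR * (RR - 1)).
    """
    risk_ratio = math.sqrt(odds_ratio) if common_outcome else odds_ratio
    if risk_ratio < 1:
        risk_ratio = 1 / risk_ratio
    return risk_ratio + math.sqrt(risk_ratio * (risk_ratio - 1))


e_value = e_value_from_odds_ratio(0.81)  # adjusted odds ratio from the text
```

Under these assumptions, an odds ratio of 0.81 gives an E-value of about 1.46, in line with the reported figure. An unmeasured confounder would need to be associated with both concordance and mortality by a risk ratio of roughly that size to fully explain the observed association.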

Sensitivity Analyses

The model performed similarly among restricted cohorts of patients in whom vasopressin was recommended or recommended early (within 48 hours). The findings were neutral when no vasopressin was recommended by the rule or if vasopressin was already initiated by clinicians early (within 12 hours) (eTable 5 in Supplement 1). When the gamma parameter was varied, the weighted importance sampling results were similar, but the association with mortality was muted (gamma, 0.95; aOR for in-hospital mortality, 0.91 [95% CI, 0.79-1.11]). Models were not statistically significant using a 2-hour time epoch or alternative reward weighting (eTable 5 in Supplement 1).

Model Interpretability

According to SHAP values, the 4 most important features informing the vasopressin initiation reinforcement learning rule were the time since shock onset, SOFA score, norepinephrine dose, and serum lactate at the time of vasopressin initiation (eFigure 3 in Supplement 1).

Fairness and Equity

The reinforcement learning rule showed no significant differences in recommendations for vasopressin initiation across the sex, race, and age subgroups (eFigure 4 in Supplement 1). There was also no significant difference in the evaluation or treatment implications of the reinforcement learning rule between the sex, race, and age subgroups (eTables 7 and 8 in Supplement 1).

Discussion

This multicenter study of 232 hospitals and 14 453 patients in 4 datasets derived, validated, and tested the clinical implications of a reinforcement learning model for vasopressin initiation in patients with septic shock already receiving norepinephrine. Compared with average clinician actions, the reinforcement learning rule was associated with a higher reward and reduced in-hospital mortality.

International practice guidelines suggest vasopressin as the second-line vasopressor if the mean arterial pressure remains inadequate despite low to moderate norepinephrine doses.3 While vasopressin use has increased,4 there are many unanswered questions. There are no randomized clinical trials that identify the optimal administration strategy, at what dose of norepinephrine to initiate vasopressin, or how to wean the drug. Randomized clinical trials, even those with 3 to 4 groups, may not capture the universe of possible treatment approaches for vasopressor use. Meanwhile, prior work has used reinforcement learning to optimize complex medical decisions, such as for the treatment of diabetes26,27 or mechanical ventilation.18 This study reports the first evaluation of reinforcement learning to enhance clinical decision-making for the initiation of vasopressin in septic shock. The OVISS reinforcement learning rule suggests initiating vasopressin at a lower norepinephrine dose compared with the average practice of clinicians. These findings align with subgroup analyses from the Vasopressin and Septic Shock Trial (VASST),28 which suggested that patients who received vasopressin may have improved outcomes in a less severe shock stratum (interaction P = .10).

The reinforcement learning rule for vasopressin initiation may improve outcomes through several mechanisms. First, although clinicians had access to the same data, the rule may capture subclinical features about the mechanisms and benefits of vasopressin and thus weight clinical data differently than clinicians do when deciding to initiate vasopressin. Second, from a biological perspective, exogenous vasopressin may compensate for the relative deficiency observed in septic shock.29,30,31,32 Third, vasopressin may increase glomerular filtration pressure,33 consistent with the results herein where rule concordance was associated with a lower need for kidney replacement therapy. Fourth, prior studies like Vasopressin vs Norepinephrine as Initial Therapy in Septic Shock (VANISH) and VASST demonstrate vasopressin’s catecholamine-sparing effects, which may reduce the adrenergic burden and the need for high norepinephrine doses, perhaps lowering the risk of tachyarrhythmias or myocardial ischemia. Fifth, earlier vasopressin administration may reflect increased clinical vigilance and aggressive monitoring, potentially leading to improved outcomes. Alternatively, although the reinforcement learning rule appears to be associated with improved outcomes, there is the possibility that confounding obscured the true effects.

Limitations

This study has limitations. First, the study did not prospectively test the reinforcement learning rule, instead using offline data. Future bedside use requires more rigorous assessments using randomized clinical trials. Second, the model may have limited transportability to groups underrepresented in the derivation data.34 To mitigate this, 4 datasets from more than 200 hospitals were used. The study also tested the reinforcement learning rule in age, sex, and race strata, finding no significant differences. These analyses did not include low- or middle-income countries with varying practice.35 Third, because the models were derived offline and evaluated using existing data, there were several factors, such as clinician subjectivity, which were not fully captured by the model. Analyses after adjustment for hospital- or ICU-level characteristics were consistent. Fourth, models do not include unmeasured variables, such as infectious source or timing of source control, or seek to vary the lag adjustment for confounding control. For the observed relationships to be fully explained by an unmeasured confounder, that confounder would need to increase both the likelihood of adhering to the reinforcement learning rule and mortality by more than 45%. Fifth, the study addressed missing values by carrying forward the last observed value, a common method that simplifies analysis and avoids introducing noise. It offers a conservative estimate and was used in prior studies with reinforcement learning.11,36 Sixth, the reinforcement learning rule recommended the initiation of vasopressin, not a dosing strategy or protocol for weaning. Seventh, long-term outcomes among survivors, such as cognitive or functional status, were not available.
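
The carry-forward approach to missing values mentioned in the fifth limitation can be illustrated with a minimal sketch. It assumes a single clinical feature recorded once per 1-hour epoch, with `None` marking a missing measurement; the function name and the example values are hypothetical, and the study's actual preprocessing is described in the eMethods.

```python
def carry_forward(values):
    """Replace each missing entry with the most recent observed value.

    Entries before the first observation stay missing because there is
    nothing to carry forward.
    """
    filled, last_observed = [], None
    for value in values:
        if value is not None:
            last_observed = value
        filled.append(last_observed)
    return filled


# Hypothetical hourly serum lactate series (mmol/L) with gaps.
print(carry_forward([None, 2.1, None, None, 3.4, None]))
# [None, 2.1, 2.1, 2.1, 3.4, 3.4]
```

Because a carried-forward value can be many hours old, this method treats a stale measurement as current, which is one reason it is listed as a limitation.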

Conclusions

In adult patients with septic shock receiving norepinephrine, the use of vasopressin was variable. A reinforcement learning model developed and validated in several observational datasets recommended more frequent and earlier use of vasopressin than average care patterns and was associated with reduced mortality.

Supplement 1.

TRIPOD+AI checklist

eMethods

eTable 1. List of clinical features considered in each 1-hour epoch

eFigure 1. Architecture of Reinforcement Learning model

eFigure 2. Estimated Survival Curves

eFigure 3. Feature importance based on Shapley values

eTable 2. Patient parameters at vasopressin initiation

eTable 3. Weighted Importance Sampling results

eTable 4. Primary and Secondary Outcomes Results

eTable 5. Sensitivity Analyses

eTable 6. Characteristics of patients according to vasopressin recommendation

eTable 7. Fairness analysis

eFigure 4. Fairness analysis boxplot

eTable 8. Distribution of Races in each Dataset

jama-e253046-s001.pdf (1.8MB, pdf)
Supplement 2.

Data Sharing Statement

jama-e253046-s002.pdf (12KB, pdf)

References

  • 1. Fleischmann-Struzek C, Mellhammar L, Rose N, et al. Incidence and mortality of hospital- and ICU-treated sepsis: results from an updated and expanded systematic review and meta-analysis. Intensive Care Med. 2020;46(8):1552-1562. doi: 10.1007/s00134-020-06151-x
  • 2. Singer M, Deutschman CS, Seymour CW, et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA. 2016;315(8):801-810. doi: 10.1001/jama.2016.0287
  • 3. Evans L, Rhodes A, Alhazzani W, et al. Executive summary: Surviving Sepsis Campaign: international guidelines for the management of sepsis and septic shock 2021. Crit Care Med. 2021;49(11):1974-1982. doi: 10.1097/CCM.0000000000005357
  • 4. Vail EA, Gershengorn HB, Hua M, Walkey AJ, Wunsch H. Epidemiology of vasopressin use for adults with septic shock. Ann Am Thorac Soc. 2016;13(10):1760-1767. doi: 10.1513/AnnalsATS.201604-259OC
  • 5. Sutton RS, Barto AG. The reinforcement learning problem. 1998. Accessed July 3, 2023. http://incompleteideas.net/book/first/Chap3PrePub.pdf
  • 6. Collins GS, Moons KGM, Dhiman P, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378. doi: 10.1136/bmj-2023-078378
  • 7. UCSF Data. How to get de-identified clinical data for cohort studies, pattern recognition, and more. Accessed January 12, 2024. https://data.ucsf.edu/research/deid-data
  • 8. Johnson A, Bulgarelli L, Pollard T, Horng S, Celi LA, Mark R. MIMIC-IV. PhysioNet. March 16, 2021. Accessed March 3, 2025. https://physionet.org/content/mimiciv/1.0/
  • 9. Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data. 2018;5:180178. doi: 10.1038/sdata.2018.178
  • 10. Seymour CW, Kennedy JN, Wang S, et al. Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis. JAMA. 2019;321(20):2003-2017. doi: 10.1001/jama.2019.5791
  • 11. Futoma J, Hughes M, Doshi-Velez F. POPCORN: Partially Observed Prediction Constrained Reinforcement Learning. March 31, 2020. Accessed March 3, 2025. https://finale.seas.harvard.edu/sites/g/files/omnuum4281/files/finale/files/popcorn_partially_observed_prediction_constrained_reinforcement_learning.pdf
  • 12. Ernst D, Geurts P, Wehenkel L. Tree-based batch mode reinforcement learning. J Mach Learn Res. April 2005. Accessed March 3, 2025. https://www.jmlr.org/papers/volume6/ernst05a/ernst05a.pdf
  • 13. Rubinstein RY, Kroese DP. Simulation and the Monte Carlo Method. John Wiley & Sons; 1981.
  • 14. Vincent JL, Moreno R, Takala J, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. Intensive Care Med. 1996;22(7):707-710. doi: 10.1007/BF01709751
  • 15. Voloshin C, Le HM, Jiang N, Yue Y. Empirical study of off-policy policy evaluation for reinforcement learning. arXiv. Revised November 27, 2021. Accessed March 3, 2025. http://arxiv.org/abs/1911.06854
  • 16. Precup D, Sutton RS, Singh S. Eligibility traces for off-policy policy evaluation. Accessed April 23, 2023. https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1079&context=cs_faculty_pubs
  • 17. Tang S, Wiens J. Model selection for offline reinforcement learning: practical considerations for healthcare settings. Proc Mach Learn Res. 2021;149:2-35.
  • 18. Peine A, Hallawa A, Bickenbach J, et al. Development and validation of a reinforcement learning algorithm to dynamically optimize mechanical ventilation in critical care. NPJ Digit Med. 2021;4(1):32. doi: 10.1038/s41746-021-00388-6
  • 19. Saria S. Individualized sepsis treatment using reinforcement learning. Nat Med. 2018;24(11):1641-1642. doi: 10.1038/s41591-018-0253-x
  • 20. Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med. 2015;34(28):3661-3679. doi: 10.1002/sim.6607
  • 21. VanderWeele TJ, Ding P. Sensitivity analysis in observational research: introducing the E-value. Ann Intern Med. 2017;167(4):268-274. doi: 10.7326/M16-2607
  • 22. Keele L, Zhao Q, Kelz RR, Small D. Falsification tests for instrumental variable designs with an application to tendency to operate. Med Care. 2019;57(2):167-171. doi: 10.1097/MLR.0000000000001040
  • 23. Psaty BM, Siscovick DS. Minimizing bias due to confounding by indication in comparative effectiveness research: the importance of restriction. JAMA. 2010;304(8):897-898. doi: 10.1001/jama.2010.1205
  • 24. Psaty BM, Koepsell TD, Lin D, et al. Assessment and control for confounding by indication in observational studies. J Am Geriatr Soc. 1999;47(6):749-754. doi: 10.1111/j.1532-5415.1999.tb01603.x
  • 25. Lundberg SM, Nair B, Vavilala MS, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. 2018;2(10):749-760. doi: 10.1038/s41551-018-0304-0
  • 26. Jafar A, Pasqua MR, Olson B, Haidar A. Advanced decision support system for individuals with diabetes on multiple daily injections therapy using reinforcement learning and nearest-neighbors: in-silico and clinical results. Artif Intell Med. 2024;148:102749. doi: 10.1016/j.artmed.2023.102749
  • 27. Lauffenburger JC, Yom-Tov E, Keller PA, et al. The impact of using reinforcement learning to personalize communication on medication adherence: findings from the REINFORCE trial. NPJ Digit Med. 2024;7(1):1-12. doi: 10.1038/s41746-024-01028-5
  • 28. Russell JA, Walley KR, Singer J, et al.; VASST Investigators. Vasopressin versus norepinephrine infusion in patients with septic shock. N Engl J Med. 2008;358(9):877-887. doi: 10.1056/NEJMoa067373
  • 29. Holmes CL, Patel BM, Russell JA, Walley KR. Physiology of vasopressin relevant to management of septic shock. Chest. 2001;120(3):989-1002. doi: 10.1378/chest.120.3.989
  • 30. Landry DW, Levin HR, Gallant EM, et al. Vasopressin deficiency contributes to the vasodilation of septic shock. Circulation. 1997;95(5):1122-1125. doi: 10.1161/01.CIR.95.5.1122
  • 31. Sharshar T, Carlier R, Blanchard A, et al. Depletion of neurohypophyseal content of vasopressin in septic shock. Crit Care Med. 2002;30(3):497-500. doi: 10.1097/00003246-200203000-00001
  • 32. Sharshar T, Blanchard A, Paillard M, Raphael JC, Gajdos P, Annane D. Circulating vasopressin levels in septic shock. Crit Care Med. 2003;31(6):1752-1758. doi: 10.1097/01.CCM.0000063046.82359.4A
  • 33. Cowley AW Jr. Long-term control of arterial blood pressure. Physiol Rev. 1992;72(1):231-300. doi: 10.1152/physrev.1992.72.1.231
  • 34. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453. doi: 10.1126/science.aax2342
  • 35. Bitton E, Zimmerman S, Azevedo LCP, et al. An international survey of adherence to Surviving Sepsis Campaign guidelines 2016 regarding fluid resuscitation and vasopressors in the initial management of septic shock. J Crit Care. 2022;68:144-154. doi: 10.1016/j.jcrc.2021.11.016
  • 36. Zhang K, Wang H, Du J, et al. An interpretable RL framework for pre-deployment modeling in ICU hypotension management. NPJ Digit Med. 2022;5(1):173. doi: 10.1038/s41746-022-00708-4

Articles from JAMA are provided here courtesy of American Medical Association
