Author manuscript; available in PMC: 2025 Jun 15.
Published in final edited form as: Intensive Care Med. 2025 Mar 13;51(3):556–569. doi: 10.1007/s00134-025-07848-7

A common longitudinal intensive care unit data format (CLIF) for critical illness research

Juan C Rojas 1, Patrick G Lyons 2, Kaveri Chhikara 3, Vaishvik Chaudhari 1, Sivasubramanium V Bhavani 4, Muna Nour 4, Kevin G Buell 3, Kevin D Smith 3, Catherine A Gao 5, Saki Amagai 6, Chengsheng Mao 6, Yuan Luo 6, Anna K Barker 7, Mark Nuppnau 7, Michael Hermsen 9, Jay L Koyner 3, Haley Beck 8, Rachel Baccile 3, Zewei Liao 10, Kyle A Carey 3, Brenna Park-Egan 2, Xuan Han 11, Alexander C Ortiz 12, Benjamin E Schmid 13, Gary E Weissman 13, Chad H Hochberg 14, Nicholas E Ingraham 15, William F Parker 3,8,16,17,*
PMCID: PMC12167506  NIHMSID: NIHMS2075294  PMID: 40080116

Abstract

Rationale:

Critical illness threatens millions of lives annually. Electronic health record (EHR) data are a source of granular information that could generate crucial insights into the nature and optimal treatment of critical illness.

Objectives:

Overcome the data management, security, and standardization barriers to large-scale critical illness EHR studies.

Methods:

We developed a Common Longitudinal Intensive Care Unit (ICU) data Format (CLIF), an open-source database format to harmonize EHR data necessary to study critical illness. We conducted proof-of-concept studies with a federated research architecture: (1) an external validation of an in-hospital mortality prediction model for critically ill patients and (2) an assessment of 72-h temperature trajectories and their association with mechanical ventilation and in-hospital mortality using group-based trajectory models.

Measurements and main results:

We converted longitudinal data from 111,440 critically ill patient admissions from 2020 to 2021 (mean age 60.7 years [standard deviation 17.1], 28% Black, 7% Hispanic, 44% female) across 9 health systems and 39 hospitals into CLIF databases. The in-hospital mortality prediction model had varying performance across CLIF consortium sites (AUCs: 0.73–0.81, Brier scores: 0.06–0.10) with degradation in performance relative to the derivation site. Temperature trajectories were similar across health systems. Hypothermic and hyperthermic slow-resolver patients consistently had the highest mortality.

Conclusions:

CLIF enables transparent, efficient, and reproducible critical care research across diverse health systems. Our federated case studies showcase CLIF’s potential for disease sub-phenotyping and clinical decision-support evaluation. Future applications include pragmatic EHR-based trials, target trial emulations, foundational artificial intelligence (AI) models of critical illness, and real-time critical care quality dashboards.

Keywords: Critical care data, Temperature trajectory modeling, Machine learning

Introduction

The intensive care unit (ICU) is an optimal setting for data science because of voluminous longitudinal electronic health record (EHR) data, as demonstrated by exemplar de-identified databases such as MIMIC [1–4]. Clinical artificial intelligence (AI) applications in the ICU range from early warning systems for patient deterioration to programs designed to optimize resource allocation and personalize treatment recommendations [5–7]. However, real-world ICU data science is often inefficient and difficult to scale because of challenges in acquiring, organizing, cleaning, and harmonizing EHR data. ICU EHR data are complex, highly correlated, and subject to degradation through data capture and storage procedures designed for purposes other than research [8].

Local EHR data repositories, or Electronic Data Warehouses (EDWs), are designed to maintain source data integrity and meet various institutional research and operational needs [9]. EDWs often have unique idiosyncrasies, syntax, and data vocabularies, which means extensive preprocessing is required before data can be analyzed for a specific use case [10, 11]. Established open-source common data models (CDMs), such as the Observational Medical Outcomes Partnership (OMOP) [12], address this data harmonization and standardization challenge for the entire EHR. While OMOP is capable of representing critical care data elements such as ventilator settings, infusion titrations, and mechanical circulatory support, these concepts are captured inconsistently—and often without granularity—across OMOP implementations, making multi-center critical care studies with OMOP extremely challenging [13–16].

In this manuscript, we describe the Common Longitudinal ICU data Format (CLIF), an open-source critical care database format we designed to perform reproducible research across our diverse health systems. Intended to complement existing CDMs, CLIF is a standardized representation of a minimum set of essential Common ICU Data Elements (mCIDE) organized into a human-readable structure clinicians can understand (Fig. 1). This manuscript begins by describing CLIF’s rationale, structure, and key processes. We then present two case studies demonstrating CLIF’s value and conclude with its broader implications for critical care research.

Fig. 1.

Fig. 1

CLIF Entity Relationship Diagram. This diagram depicts the relationships between various tables in the Common Longitudinal ICU Data Format (CLIF) schema. The diagram includes the following 22 tables: 1. Patient, 2. Hospitalization, 3. Admission Diagnosis, 4. Provider, 5. ADT (Admission, Discharge, Transfer), 6. Vitals, 7. Scores, 8. Dialysis, 9. Intake/Output, 10. Procedures, 11. Therapy Session, 12. Therapy Details, 13. Respiratory Support, 14. Position, 15. ECMO (Extracorporeal Membrane Oxygenation) and Mechanical Circulatory Support (MCS), 16. Labs, 17. Microbiology Culture, 18. Sensitivity, 19. Microbiology Non-culture, 20. Medication Orders, 21. Medication Admin Intermittent, 22. Medication Admin Continuous. Each table represents a specific aspect of ICU data, and lines between tables indicate how they are related through shared identifiers, primarily hospitalization_id. The depicted entity-relationship model was version 1.0, used for the case studies in this manuscript. The CLIF format is maintained with the git version control system; release 2.0.0 is available at https://clif-consortium.github.io/website/

Methods

CLIF consortium objectives and process

We assembled a geographically diverse group of US-based clinician-scientists and data scientists experienced in EHR-based clinical outcomes and AI research. We met virtually starting in July 2023 to identify the practical challenges of using EHR data to study critical illness locally and across centers (Table 1) and addressed them by developing operating procedures, terminologies, and quality control methods. Our guiding principles were: (1) efficient, clinically understandable data structures; (2) consistent and harmonizable data elements; (3) scalability and flexibility for future advancements; (4) federated analysis for collaborative research while maintaining data privacy and security; and (5) open-source development in line with the 2023 National Institutes of Health (NIH) Data Management and Sharing Policy and FAIR (Findable, Accessible, Interoperable, Reusable) data principles [17, 18]. Our overall objective is a standardized framework for representing critical illness data across all hospital care settings, from ICUs to emergency departments and hospital wards, designed to allow researchers to accomplish benchmark critical care informatics tasks, such as identifying sepsis and describing the trajectory of respiratory failure, wherever critical illness begins.

Table 1.

Practical Challenges to EHR Data Science in the Hospital

Challenge: Complex longitudinal data with differing frequencies drawn from multiple sources
Description: Diverse domains such as vital signs, laboratory measurements, medications, and respiratory support require different data structures for representation and analysis
Example: Vital signs are frequently recorded during hospitalization (e.g., hourly) while laboratory results occur much less frequently and can be distributed across different record id numbers for the same patient (e.g., tests obtained in a clinic before the patient is referred to the emergency department or admitted). Microbiology tests are similarly recorded but are further complicated by the possibility of multiple observations per test (e.g., blood culture positive for multiple organisms) and nested antimicrobial susceptibility testing

Challenge: Interdependent data
Description: Complex care processes are implicitly embedded in the presence, absence, frequency, and content of structured data
Example: Continuous neuromuscular blockade (recorded in the medication administration table) requires invasive mechanical ventilation (recorded in the respiratory flowsheet tables)

Challenge: Temporally-dependent data
Example: Sepsis onset is defined by complex temporal heuristics involving the sequencing and timing of antimicrobials, infectious tests, and abnormal physiology

Challenge: Inefficient and inaccurate data capture
Description: Many bedside measurements (e.g., vital signs, respiratory parameters) require manual recording or human validation of automatically recorded data before they are available in the EHR
Example: Respiratory flowsheets often contain carryforward and copy/paste observations, leading to internal inconsistencies (e.g., patients recorded as receiving low-flow nasal oxygen and invasive ventilation simultaneously)

Challenge: Complex data storage
Description: ICU data storage is fragmented across different systems (e.g., ventilator data, laboratory systems, vital signs), making comprehensive data analysis challenging without sophisticated integration efforts. Diverse end-user needs and goals (e.g., operational quality reporting vs. clinical research) lead enterprise and research data warehouses to adopt a “one size fits all” content and format approach for EHR data, which may not be optimal for specific research or operational needs
Example: ICU data on ventilator settings may be stored separately from laboratory results or vital signs, requiring complex data integration for analysis. Additionally, the “one size fits all” approach in data warehouses can result in data formats that are not ideal for specific research tasks, such as temporal analyses or patient-specific interventions

Challenge: Local idiosyncrasies
Description: ICU practices and data recording can vary significantly between institutions, with local protocols influencing how data are recorded and stored, leading to variability that complicates multicenter studies
Example: ICU triage decisions, such as when to escalate care to invasive ventilation, are often based on local protocols, which can differ significantly between hospitals, leading to challenges in generalizing study findings across different settings

Minimum common ICU data elements

The NIH defines a Common Data Element (CDE) as a “standardized, precisely defined question, paired with a set of allowable responses, used systematically across different sites, studies, or clinical trials to ensure consistent data collection” [19]. We developed a minimum set of Common ICU Data Elements (mCIDE) denoted with the “_category” suffix. Each mCIDE (1) represents a precisely defined clinical entity essential for characterizing critical illness and (2) has a limited set of permissible values. For example, “vital_category” has nine permissible values, corresponding to the standard set of vital signs that are essential for critical care decision making. CLIF preserves site-specific source EHR data elements using *_name variables. For example, lab_name (e.g., “LAB HEMOGLOBIN—AUTOMATED”) preserves the specific lab test name as used at the site, while lab_category (e.g., “Hemoglobin”) maps this test to a specific permissible value of the lab_category CDE. This structure creates standardized ICU elements while preserving original data labels for quality control, as is common practice in many data models and standards (e.g., OMOP’s _concept_id and source_value fields, and FHIR’s codeableConcept) [12, 20].
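The paired _name/_category pattern can be sketched as a simple crosswalk. The source strings, category labels, and function below are illustrative only, not the official CLIF vocabulary:

```python
# Hypothetical sketch of a site-level crosswalk from source EHR lab names
# (lab_name) to mCIDE permissible values (lab_category). The map entries and
# the helper below are made up for illustration.
LAB_CATEGORY_MAP = {
    "LAB HEMOGLOBIN - AUTOMATED": "hemoglobin",
    "HGB, POC": "hemoglobin",
    "CREATININE, SERUM": "creatinine",
}

def to_clif_lab(lab_name: str, value: float) -> dict:
    """Return a CLIF-style record keeping both the source name and the mapped category."""
    return {
        "lab_name": lab_name,                                    # site-specific label, preserved
        "lab_category": LAB_CATEGORY_MAP.get(lab_name.upper()),  # mCIDE value, or None if unmapped
        "lab_value": value,
    }
```

Keeping the unmapped source label alongside the standardized category is what enables the quality-control checks described above.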

The selection of mCIDE was guided by consensus among site PIs, with emphasis on clinical relevance, feasibility, and harmonization across CLIF sites. Whenever possible, we adopted NIH-endorsed CDEs into CLIF. We created several novel CIDEs for CLIF, such as modes of mechanical ventilation (mode_category), preserving source EHR data elements in “_name” fields.

CLIF entity-relationship model

CLIF’s entity-relationship (ER) model is inspired by how critical care researchers organize and analyze clinical data in practice (Fig. 1). It organizes the mCIDE into over 20 clinically relevant longitudinal tables linked by patient and hospitalization, defined by clinical information type and organ system. CLIF’s ER model features specialized critical care tables such as respiratory support, continuous medications, position (for prone mechanical ventilation), and patient assessments (e.g. Glasgow Coma Scale or Richmond Agitation-Sedation Scale). CLIF also contains other standard inpatient tables (e.g. vitals, labs) likely to be found in any EDW. CLIF is language agnostic and can be implemented as a Structured Query Language (SQL) database or as efficient flat files (e.g. parquet). Table E1 in the Supplement provides links to publicly available extract-transform-load (ETL) guidelines and EHR vendor-specific resources (e.g., Epic) to support the adoption and implementation of CLIF.
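As a concrete illustration, a flat-file CLIF implementation can be queried with ordinary dataframe tools. The toy tables below follow the *_category naming convention and the shared hospitalization_id key, but the columns shown are a simplified, hypothetical subset of the real schema:

```python
import pandas as pd

# Toy CLIF-style long tables; in practice each table would be read from disk,
# e.g. pd.read_parquet("vitals.parquet"). Columns are a simplified subset.
vitals = pd.DataFrame({
    "hospitalization_id": ["H1", "H1", "H2"],
    "recorded_dttm": pd.to_datetime(
        ["2021-01-01 08:00", "2021-01-01 09:00", "2021-01-02 10:00"]),
    "vital_category": ["heart_rate", "heart_rate", "temp_c"],
    "vital_value": [88.0, 92.0, 38.4],
})
adt = pd.DataFrame({
    "hospitalization_id": ["H1", "H2"],
    "location_category": ["icu", "ward"],
})

# Tables link through shared identifiers, primarily hospitalization_id.
merged = vitals.merge(adt, on="hospitalization_id", how="left")
```

The same join works identically whether the tables live in parquet files or a SQL database, which is the sense in which CLIF is language and storage agnostic.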

Federated analytics in the CLIF consortium distributed database

Unlike centralized data repositories such as MIMIC-IV or the Amsterdam UMC Data Warehouse, our federated analysis approach is similar to how the Food and Drug Administration Sentinel System [21] and the European Medicines Agency Data Analysis and Real-World Interrogation Network (DARWIN EU) [22] operate. Our process is (1) the primary study site develops a project using the CLIF format and distributes code to all consortium sites, (2) each site runs the analysis, communicates any errors to the primary site, and returns aggregate results, and (3) the primary site combines the aggregate results. No patient-level data are exchanged at any point in the process, so each participating institution maintains control of its data while contributing to a distributed research database. All CLIF databases (Fig. E1) include data from medical and surgical ICUs, as well as blended ICUs that manage patients from both populations in the same unit. This design ensures the representation of diverse ICU types and a wide range of critically ill patients across participating centers.
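The three-step federated workflow can be sketched as follows; the record fields and function names are hypothetical, standing in for the consortium's actual project scripts:

```python
# Sketch of the federated pattern: each site runs the same script on its local
# CLIF database and returns only aggregates; no patient-level rows leave the site.
# Field and function names here are illustrative, not the consortium's real code.

def site_aggregate(encounters: list[dict]) -> dict:
    """Step 2 (run locally at each site): reduce patient-level records to aggregates."""
    n = len(encounters)
    deaths = sum(1 for e in encounters if e["died_in_hospital"])
    return {"n": n, "deaths": deaths, "mortality": deaths / n if n else None}

def combine(aggregates: list[dict]) -> dict:
    """Step 3 (run at the primary site): pool the per-site aggregate results."""
    n = sum(a["n"] for a in aggregates)
    deaths = sum(a["deaths"] for a in aggregates)
    return {"n": n, "mortality": deaths / n}
```

Only the small dictionaries returned by site_aggregate cross institutional boundaries, which is what preserves data privacy and security.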

Case study cohort identification

We used the same cohort-discovery script and the CLIF admission-discharge-transfer (ADT) and patient tables to identify all adults (≥ 18 years) admitted to an ICU within 48 h of hospitalization and staying at least 24 h, from January 1, 2020, to December 31, 2021, at each site across the consortium (Fig. E2). We chose these inclusion criteria to identify the general ICU population, excluding patients who die shortly after ICU admission or are admitted to the ICU for non-critical reasons (e.g., to facilitate a procedure). After cohort identification, we ran a standardized outlier handling script on the CLIF tables to remove physiologically impossible values (e.g. creatinine < 0 mg/dL).
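A minimal restatement of the inclusion rules, assuming hypothetical timestamp fields drawn from the hospitalization and ADT tables:

```python
from datetime import datetime, timedelta

# Illustrative sketch of the stated cohort rules: adults (>= 18 years) admitted
# to an ICU within 48 h of hospitalization and staying at least 24 h.
# The argument names are hypothetical, not the actual CLIF column names.
def meets_inclusion(age: int, hosp_admit: datetime,
                    icu_in: datetime, icu_out: datetime) -> bool:
    return (
        age >= 18
        and icu_in - hosp_admit <= timedelta(hours=48)  # ICU within 48 h of admission
        and icu_out - icu_in >= timedelta(hours=24)     # ICU stay of at least 24 h
    )
```

In the actual cohort-discovery script these checks run against the ADT table's location transitions rather than single timestamps, but the logic is the same.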

Case study I: Development and external validation of an in-hospital mortality model for ICU patients

Accurate and reliable hospital mortality predictions for critically ill patients may help clinical teams prioritize therapeutic interventions, facilitate more informed shared decision-making around goals of care, and optimize resource allocation within healthcare systems. Existing prediction models are limited by suboptimal accuracy and significant performance variation across hospitals and differential performance among vulnerable populations may exacerbate baseline inequities in access to (and quality of) critical care. In this case study, we developed and externally validated an AI model to predict hospital mortality using clinical data from the first 24 h in the ICU.

We trained a light gradient boosted machine binary classifier (LightGBM) [23] to predict in-hospital death on a separate cohort of ICU admissions in CLIF format from Rush University Medical Center using data from 2019, 2022, and 2023, performing hyperparameter tuning through a grid search with fivefold cross-validation. We selected LightGBM for its high discrimination and its ability to handle missing data without the need for imputation or exclusion of cases with high levels of missingness. We selected 30 demographic, laboratory, and vital sign variables a priori from the first 24 h of ICU admission for their established clinical relevance, supported by prior research and validated ICU scoring systems such as SOFA and APACHE [24–28] (Table E2). We report missingness for each variable by site in Table E3. After training at Rush, the final model was shared with consortium members through a public repository in the standard LightGBM text format.

We then evaluated this model on the 2020–2021 cohort described above at Rush and all other CLIF sites using a federated approach with a common model evaluation script, the model object, and each site’s local CLIF database. To comprehensively assess the model’s generalizability, we applied the TRIPOD-AI checklist (Table E4) across all test sites [29]. We evaluated model discrimination using the area under the receiver operating characteristic curve (AUC), calibration using Brier scores and calibration plots, and clinical utility through decision curve analysis [29, 30].
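For reference, the two headline metrics can be computed from scratch. These are generic textbook implementations, not the consortium's evaluation script (which lives in the public repository):

```python
# Pure-Python versions of the two reported performance metrics.

def brier_score(y_true: list[int], y_prob: list[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def auc(y_true: list[int], y_prob: list[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count
    half), i.e. the Mann-Whitney formulation of the ROC AUC."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))
```

Lower Brier scores indicate better calibration (0 is perfect), while AUC values closer to 1 indicate better discrimination, which is how the site-level ranges reported below should be read.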

Case study II: Temperature trajectory subphenotyping

Growing recognition of heterogeneity within critical illness syndromes has led to the emergence of algorithmic clinical subphenotyping as a means of generating new hypotheses for investigation, improving clinical prognostication, and characterizing heterogeneous treatment effects [31, 32]. Despite the potential value of these advances in precision medicine, subphenotyping models are rarely externally validated [33].

In our second case study, we externally validated a previously developed unsupervised temperature trajectory subphenotyping model [34]. This approach uses group-based trajectory modeling and patient temperature trends over 72 h to assign patient encounters into one of four mutually exclusive subphenotypes: normothermic (NT), hypothermic (HT), hyperthermic fast-resolver (HFR), and hyperthermic slow-resolver (HSR). In 1- and 2-hospital studies of patients with undifferentiated suspected infection and COVID-19 (regardless of ICU status), these subphenotypes have demonstrated distinct immune profiles and different ICU utilization and mortality rates [35, 36]. However, the temperature trajectory model has not been evaluated within a broader critically ill population.

We developed analysis scripts that standardized body temperature measurements during the first 72 h of ICU admission and classified each patient into the temperature trajectory subgroup with the lowest sum of the mean squared errors between the patient’s observed temperature and the subphenotype’s reference trajectory. Finally, we assessed differences in patient characteristics by subphenotype and the association of subphenotypes with in-hospital mortality and invasive mechanical ventilation using multivariable logistic regression adjusted for age, sex, race, and ethnicity.
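The assignment rule can be sketched as follows. The reference temperature values below are made up for illustration; the real trajectories come from Bhavani et al.'s fitted group-based trajectory model [34]:

```python
# Sketch of the assignment rule: each encounter goes to the subphenotype whose
# reference trajectory minimizes the mean squared error against the observed
# temperatures. Reference values are illustrative placeholders only.
REFERENCE = {
    "NT":  [37.0, 37.0, 37.0],   # normothermic
    "HT":  [36.2, 36.3, 36.4],   # hypothermic
    "HFR": [38.5, 37.4, 37.0],   # hyperthermic fast-resolver
    "HSR": [38.6, 38.3, 38.0],   # hyperthermic slow-resolver
}

def assign_subphenotype(observed: list[float]) -> str:
    """Return the label of the reference trajectory with the lowest MSE."""
    def mse(ref: list[float]) -> float:
        return sum((o - r) ** 2 for o, r in zip(observed, ref)) / len(observed)
    return min(REFERENCE, key=lambda k: mse(REFERENCE[k]))
```

Because assignment needs only the published reference trajectories and each site's local temperature data, the same script runs unchanged at every consortium site.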

CLIF open-source commitment, AI disclosure, and IRB approval

CLIF continues to mature via our collaborative development process supported by git version control and is released under the open-source Apache 2.0 license. Our website and code repository (clif-consortium.github.io/website/) contains data dictionaries, ETL pipeline examples, quality control scripts, and complete analysis code for each case study. Each of the nine CLIF consortium sites independently received IRB approval to conduct observational studies or to build and/or quality-check a research EDW (Table E5). We used AI-assisted technologies, including large language models (LLMs), to edit the manuscript and code analysis scripts. The authors carefully reviewed and verified all AI-generated content to ensure accuracy and originality. All quoted material is properly cited.

Results

CLIF consortium demographics

To date, we have established CLIF databases at nine US health systems, comprising 39 unique hospitals and 111,440 ICU admissions. Health system-level populations were similar in terms of age (mean 60.7 years overall [standard deviation 17.1], range 56.4 [18.5] to 63.1 [17.1]) and sex (44% female overall, range 41–47% female) but racially diverse (3% to 66% Black). Patients received invasive mechanical ventilation in 42,442 hospitalizations (39% overall, range 30–55%) and 10,721 patients died in the hospital (9.6% overall, range 7.4–12.5%). Table 2 outlines detailed ICU patient characteristics, such as demographics, mortality, and ventilator parameters, while Tables E6 and E7 include SOFA scores in the first 24 h and vasopressor use, respectively. These data elements illustrate the breadth of clinical information captured by CLIF.

Table 2.

Characteristics and outcomes of ICU patient encounters in 2020–2021 across CLIF sites

Characteristic Emory University Johns Hopkins Health System Northwestern University Oregon Health & Science University Rush University University of Chicago University of Michigan University of Minnesota University of Pennsylvania CLIF Consortium (Combined)
Hospitalizations with an ICU stay, n 15,124 18,242 17,150 8559 9853 8053 7301 13,130 14,028 111,440
Hospitals, n 4 5 8 2 1 1 1 11 6 39
Age (years), mean (SD) 61.3 (16.4) 60.8 (17.4) 63.1 (17.1) 60.3 (17.4) 59.6 (16.9) 56.4 (18.5) 58.9 (16.2) 61.7 (17.3) 60.3 (16.9) 60.7 (17.1)
Female n (%) 7100 (46.9%) 8315 (45.6%) 7425 (43.3%) 3517 (41.1%) 4660 (47.3%) 3411 (42.4%) 3011 (41.2%) 5726 (43.6%) 6193 (44.1%) 49,358 (44.3%)
Race n (%)
Asian 521 (3.4%) 790 (4.3%) 593 (3.5%) 258 (3.0%) 327 (3.3%) 148 (1.8%) 143 (2.0%) 847 (6.5%) 435 (3.1%) 4062 (3.6%)
Black 6707 (44.3%) 5643 (30.9%) 2199 (12.8%) 214 (2.5%) 3995 (40.5%) 5290 (65.7%) 879 (12.0%) 896 (6.8%) 5240 (37.4%) 31,063 (27.9%)
White 6953 (46.0%) 10,255 (56.2%) 12,723 (74.2%) 6960 (81.3%) 3697 (37.5%) 2015 (25.0%) 5780 (79.2%) 10,790 (82.2%) 7158 (51.0%) 66,331 (59.5%)
Others 943 (6.2%) 1554 (8.5%) 1635 (9.5%) 1127 (13.2%) 1834 (18.6%) 600 (7.5%) 499 (6.8%) 597 (4.5%) 1195 (8.5%) 9984 (8.9%)
Ethnicity n (%)
Hispanic or Latino 577 (3.8%) 1088 (6.0%) 1753 (10.2%) 657 (7.7%) 1968 (20.0%) 514 (6.4%) 495 (3.5%) 249 (1.9%) 495 (3.5%) 7511 (6.7%)
Hospital mortality, n (%) 1307 (8.6%) 1905 (10.4%) 1502 (8.8%) 677 (7.9%) 729 (7.4%) 1004 (12.5%) 847 (11.6%) 1032 (7.9%) 1718 (12.2%) 10,721 (9.6%)
Mechanical ventilation, n (%) 6031 (39.9%) 6804 (37.3%) 5080 (29.6%) 3231 (37.7%) 2859 (29.0%) 3462 (43.0%) 4007 (54.9%) 4801 (36.6%) 6167 (44.0%) 42,442 (39.1%)
FiO2, median [IQR] 0.4 [0.4, 0.6] 0.4 [0.4, 0.6] 0.4 [0.4, 0.6] 0.35 [0.3, 0.5] 0.4 [0.35, 0.7] 0.4 [0.4, 0.6] 0.4 [0.3, 0.6] 0.4 [0.35, 0.6] 0.4 [0.4, 0.6] 0.4 [0.37, 0.60]
PEEP, median [IQR] 6.0 [6.0, 10.0] 5.0 [5.0, 8.0] 5.0 [5.0, 7.0] 5.0 [5.0, 8.0] 8.0 [8.0, 8.0] 5.0 [5.0, 5.0] 5.0 [5.0, 10.0] 5.1 [5.0, 8.0] 5.0 [5.0, 10.0] 5.0 [5.4, 8.03]
Initial Mode category, n (%)
Assist control-volume control 2501 (41.57%) 3942 (57.95%) 4284 (84.33%) 2623 (81.18%) 337 (11.79%) 2810 (81.17%) 1097 (27.38%) 4453 (92.77%) 3664 (59.41%) 25,711 (60.58%)
Pressure-regulated volume control 1004 (16.65%) 1335 (19.62%) 174 (3.43%) 60 (1.86%) 1740 (60.86%) 0 (0%) 2416 (60.29%) 52 (1.08%) 2 (0.03%) 6783 (15.98%)
Pressure control 124 (2.06%) 106 (1.56%) 151 (2.97%) 89 (2.75%) 280 (9.79%) 80 (2.31%) 225 (5.62%) 73 (1.52%) 60 (0.97%) 1188 (2.80%)
Pressure support/CPAP 466 (7.73%) 277 (4.07%) 161 (3.17%) 260 (8.05%) 402 (14.06%) 419 (12.1%) 37 (0.92%) 91 (1.9%) 944 (15.31%) 3057 (7.20%)
SIMV 668 (11.08%) 1086 (15.96%) 196 (3.86%) 0 (0%) 25 (0.87%) 52 (1.5%) 0 (0%) 61 (1.27%) 1398 (22.67%) 3486 (8.21%)
Other 31 (0.51%) 57 (0.84%) 5 (0.1%) 185 (5.73%) 8 (0.28%) 31 (0.9%) 212 (5.29%) 23 (0.48%) 29 (0.47%) 581 (1.37%)
No mode documented 1237 (20.51%) 1 (0.0%) 109 (2.15%) 14 (0.43%) 67 (2.34%) 70 (2.02%) 20 (0.5%) 48 (0.98%) 70 (1.14%) 1636 (3.85%)

CLIF common longitudinal ICU data format, SD standard deviation

Case study I: development and external validation of an in-hospital mortality model for ICU patients

The Rush training cohort (N = 17,139 ICU admissions) had similar demographics to the Rush test cohort (Table E8). Details of the final LightGBM hyperparameters and TRIPOD-AI checklist are provided in Supplementary Tables E5 and E4. The proportion of missing model features across participating CLIF sites is documented in Table E3. The most important features were minimum albumin level, maximum aspartate aminotransferase (AST), minimum pulse rate, minimum diastolic blood pressure (DBP), and mean AST (see variable importance plot, Fig. E3).

In the hold-out test cohort of 111,440 ICU admissions (Table 2), the AUC for predicting in-hospital mortality varied across sites, ranging from 0.73 to 0.81. Specifically, Rush University, University of Chicago, and Johns Hopkins Health System exhibited the highest AUCs, with values of 0.81 [95% CI: 0.79–0.83], 0.81 [95% CI: 0.79–0.82], and 0.81 [95% CI: 0.80–0.82], respectively. In contrast, Emory and the University of Minnesota reported the lowest AUCs, at 0.76 [95% CI: 0.75–0.78] and 0.73 [95% CI: 0.72–0.75], respectively (Fig. 2a). Brier scores ranged from 0.064 at Oregon Health & Science University (OHSU) to 0.096 at both the University of Chicago and Penn Medicine. The calibration plot (Fig. 2b) demonstrates similar predicted versus observed probabilities across all sites, except for overestimation of mortality among higher-risk patients at some sites.

Fig. 2.

Fig. 2

Mortality model performance validation via receiver operating characteristic curves (A), calibration curves (B), and decision curves (C). A ROC Curve. Purpose: Demonstrates the model’s ability to distinguish between patients who survive and those who do not (discrimination) across sites. How to interpret: A curve closer to the top left corner reflects better discrimination. The Area Under the Curve (AUC) quantifies this, with higher values (closer to 1) indicating stronger predictive performance. B Calibration plot. Purpose: Evaluate how closely the model’s predicted probabilities align with the actual outcomes at each site (calibration). How to interpret: A curve that closely follows the diagonal “ideal” line means the predicted risks are accurate. Deviations at higher probabilities suggest the model may over- or underestimate risk in those cases, with the degree of deviation varying by site. C Decision analysis curve. Purpose: Evaluate how beneficial the model is for decision-making at different prediction thresholds, balancing true positives and false positives. How to interpret: The vertical red line marks the high-risk threshold chosen by Rush University (0.218). At this threshold, the model’s net benefit varies across sites, with the University of Chicago showing the highest value (0.026), indicating the model would add the most clinical value there. Conversely, the University of Minnesota and OHSU have the lowest net benefits (0.002 and 0.007, respectively), followed by Northwestern (0.009), meaning the model is less beneficial but still better than a ‘no alert’ or ‘alert all’ approach at these sites

Net benefit, a weighted average of true positives and false positives intended to quantify the clinical utility of a model at different treatment thresholds [30], varied across sites, as shown in the decision-curve analysis in Fig. 2c. At the high-risk threshold of a 0.21 probability of death determined a priori by Rush University, the model conferred the highest net benefit at the University of Chicago (0.026), followed by the University of Pennsylvania (0.025) and Johns Hopkins Health System (0.020). The model yielded the lowest net benefit at the University of Minnesota (0.008) and Northwestern University (0.009), sites that also had overestimation calibration errors and lower AUCs. The model had a positive net benefit at all sites at this threshold, indicating it increased utility compared to an “alert none” strategy (net benefit = 0 by definition).
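The net benefit reported here follows the standard decision-curve formula, net benefit = TP/n − (FP/n) × (p_t / (1 − p_t)) [30]. A minimal sketch with illustrative counts:

```python
# Standard decision-curve-analysis net benefit at threshold p_t.
# The counts used in the test are illustrative, not data from this study.
def net_benefit(tp: int, fp: int, n: int, p_t: float) -> float:
    """tp/fp: true/false positives at threshold p_t; n: cohort size."""
    return tp / n - (fp / n) * (p_t / (1.0 - p_t))
```

The weighting term p_t / (1 − p_t) encodes how many false positives one true positive is worth at the chosen threshold, and an "alert none" strategy has net benefit 0 by definition.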

Case study II: Temperature trajectory subphenotypes

Across the nine participating institutions, this case study analyzed 111,440 ICU admissions; each subphenotype had a consistently observed temperature trajectory across all sites (Fig. 3). Normothermic encounters had the highest prevalence overall and at each site (range 43.5–66.0%), followed by hypothermic encounters (range 11.4–37.9%), hyperthermic slow resolvers (range 8.2–15.8%), and hyperthermic fast resolvers (range 6.2–8.3%). The distribution of patient characteristics across subphenotypes was similar across sites. Hypothermic patients were consistently older than other groups (range 60.7–68.2 years), and hyperthermic slow resolvers were the youngest group at all sites but one (range 47.4–59.8 years). Several consistent outcome patterns were observed across sites (Figs. E4 and E5). Hyperthermic slow resolvers had the highest rates of invasive mechanical ventilation at all sites (range 38.2–82.4%). Mechanical ventilation rates were lowest among normothermic and hypothermic patients. Mortality was lowest among normothermic patients (range 4.6–8.5%) and highest among hypothermic (10.1–25.7%) and hyperthermic slow-resolving (8.6–19.7%) patients. After adjustment for age, sex, race, and ethnicity, subphenotype membership was consistently and independently associated with these outcomes (Fig. 3B). HSR and HFR subphenotypes were associated with significantly increased odds of invasive mechanical ventilation (IMV) (as compared to normothermic) at all sites. Additionally, HSR, HFR, and HT subphenotypes were associated with significantly increased odds of mortality at all sites.

Fig. 3.

Fig. 3

Temperature trends across temperature trajectory subphenotypes at all sites. Purpose: Visualize temperature trajectories over time (hours) for different patient subphenotypes: HSR (yellow), HFR (green), NT (red), and HT (blue). How to interpret: HSR patients exhibit the highest initial temperature, which stabilizes at a higher average compared to other groups. HFR patients experience a sharp temperature drop within the first 24 h. NT patients maintain a relatively stable temperature around 37 °C. HT patients have the lowest temperature trajectory, remaining below 37 °C throughout. The variations in site-specific trends suggest that patient temperature responses may differ based on institutional practices or population characteristics

Discussion

We developed CLIF to standardize complex ICU data into a consistent, longitudinal format necessary for transparent and reproducible critical care research. While our initial proof-of-concept studies involved over 100,000 patients across U.S. institutions, CLIF is designed to represent the entire longitudinal course of critical illness in any hospital setting. Our open-source development model allows critical care clinicians and data scientists worldwide to collaborate in defining common ICU data elements. This approach promotes scalability, enhances collaboration, and accommodates international differences in critical illness presentation and ICU treatment practices.

In our first case study, the mortality model demonstrated good discrimination and calibration in the internal RUSH validation cohort. However, its performance varied across the other eight CLIF consortium sites. Decision curve analysis revealed positive but varying clinical utility across sites, demonstrating the model’s overall potential as a clinical decision support tool but also its sensitivity to local clinical and operational differences. These findings underscore the challenge of generalizing prognostic models across diverse healthcare settings in a one-size-fits-all fashion [37]. Healthcare systems implementing mortality prediction models should first validate them using local data and potentially recalibrate them to account for site-specific factors, ensuring optimal clinical utility for their patient population and care practices. Despite these challenges, this case study highlights CLIF’s value in enabling rigorous, multi-site evaluations of prediction models in the ICU [38].

Our second case study expands Bhavani et al.’s temperature trajectory subphenotyping model to a larger, broader, and more diverse cohort of undifferentiated patients with critical illness [34]. Mirroring prior findings in sepsis, we observed the highest mortality rates in the hypothermic group, suggesting this subphenotype robustly predicts outcomes in a general critical illness population. The consistent association of temperature trajectories with mortality and mechanical ventilation across health systems highlights the potential of longitudinal data analysis for critical illness phenotyping and personalizing ICU treatment. These findings have direct clinical implications: for instance, hyperthermic slow resolvers, who exhibit elevated rates of invasive mechanical ventilation and mortality, could be prioritized for early intervention strategies such as enhanced monitoring or timely escalation of care. This trajectory-based risk stratification approach could be integrated into real-time ICU decision-making to improve patient outcomes and is closely aligned with the objectives of the ongoing PRECISE trial, which aims to assess how similar precision-based tools can optimize care and improve outcomes for patients with sepsis [39].
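As a toy illustration of trajectory-based subphenotyping: the published work fits group-based trajectory models, and the polynomial-summary-plus-k-means approach below is a simplified stand-in with entirely synthetic data, not the study's method.

```python
import numpy as np

def trajectory_features(hours, temps):
    """Summarize a patient's 72-h temperature series as quadratic polynomial
    coefficients, a crude proxy for the latent-class polynomials fit by
    group-based trajectory models."""
    return np.polyfit(hours, temps, deg=2)

def kmeans(X, k, iters=25):
    """Tiny k-means with a deterministic spread initialization."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Synthetic cohort: 5 resolving-fever patients, then 5 normothermic patients
hours = np.arange(0, 72, 4.0)
cohort = ([39.0 - 0.03 * hours for _ in range(5)]
          + [np.full_like(hours, 37.0) for _ in range(5)])
X = np.array([trajectory_features(hours, t) for t in cohort])
labels = kmeans(X, k=2)
```

On this synthetic cohort the two temperature patterns fall into separate clusters, mirroring how trajectory shape, rather than any single measurement, drives subphenotype assignment.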

Limitations and areas for improvement

The CLIF format has limitations, and we are actively seeking feedback. First, CLIF required substantial data science and critical care expertise to implement at each consortium site. Implementation challenges included mapping vital signs from different measurement methods (e.g., arterial versus cuff-based blood pressure), standardizing laboratory data from multiple sources (e.g., point-of-care versus standard testing), and harmonizing diverse documentation practices for respiratory support, particularly ventilator settings—all of which required both technical expertise and deep understanding of site-specific ICU workflows to resolve. To address these implementation barriers, the consortium is developing open-source tools, including a quality assurance application, to help diverse healthcare institutions adopt CLIF in a scalable and flexible manner.
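Harmonization of this kind reduces to site-maintained mapping tables applied during extraction. A minimal sketch follows; the site flowsheet names, CLIF column names, and category labels are hypothetical illustrations, not the official mCIDE vocabulary.

```python
# Hypothetical mapping from one site's flowsheet rows to CLIF vital categories.
VITAL_MAP = {
    "ART BP SYSTOLIC": ("sbp", "arterial"),
    "NBP SYSTOLIC":    ("sbp", "cuff"),
    "TEMPERATURE (F)": ("temp_c", None),
}

def to_clif_vital(site_name, value):
    """Map one site-specific vital-sign row into a CLIF-style record,
    converting units where the site deviates from the target standard."""
    category, method = VITAL_MAP[site_name]
    if site_name == "TEMPERATURE (F)":
        value = round((value - 32) * 5 / 9, 1)  # Fahrenheit -> Celsius
    return {"vital_category": category, "method": method, "vital_value": value}
```

Keeping the measurement method (arterial versus cuff) alongside the harmonized category preserves information that downstream analyses may need.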

Second, CLIF is not currently linked to the leading Health Level Seven (HL7) Fast Healthcare Interoperability Resources (FHIR) standard for healthcare data exchange. While FHIR integration was not needed for the retrospective case studies presented here, it is critical for future real-time clinical CLIF applications. The mCIDE required for CLIF tables is already represented in FHIR resources (e.g., vital signs in Observation, medications in MedicationAdministration). Future work includes developing applications that use FHIR to extract the mCIDE and populate real-time CLIF tables, which could enable predictive models for clinical decision support, ICU care quality dashboards, and patient- and family-facing applications that give patients access to, and summaries of, their ICU medical data. FHIR integration will make CLIF a scalable tool bridging retrospective research and real-time clinical applications, enhancing its impact on critical care. Additionally, we have begun mapping CLIF elements to established clinical terminologies where possible (e.g., LOINC for laboratory data, RxNorm for medications). Aligning CLIF with existing CDMs such as OMOP represents a crucial next step that will enable integration with standardized patient data from other clinical settings (e.g., outpatient visits before and after critical illness), thereby enhancing broader adoption and cross-system interoperability.
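For instance, extracting a vitals row from a FHIR Observation resource requires little more than flattening the resource and mapping its LOINC code to an mCIDE category. In this sketch the CLIF column names and category labels are illustrative assumptions; the Observation structure and LOINC codes (8310-5 body temperature, 8867-4 heart rate) follow the FHIR and LOINC standards.

```python
# LOINC codes mapped to hypothetical CLIF vital categories.
LOINC_TO_CLIF = {"8310-5": "temp_c", "8867-4": "heart_rate"}

def observation_to_clif(resource):
    """Flatten a FHIR Observation into a CLIF-style vitals record."""
    code = resource["code"]["coding"][0]["code"]
    return {
        "recorded_dttm": resource["effectiveDateTime"],
        "vital_category": LOINC_TO_CLIF.get(code),
        "vital_value": resource["valueQuantity"]["value"],
    }

obs = {
    "resourceType": "Observation",
    "code": {"coding": [{"system": "http://loinc.org", "code": "8310-5"}]},
    "effectiveDateTime": "2021-03-01T08:00:00Z",
    "valueQuantity": {"value": 38.2, "unit": "Cel"},
}
row = observation_to_clif(obs)
```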

Third, the case studies presented in this manuscript used ADT data to define the critical illness population. This approach is susceptible to selection bias (for example, inclusion may reflect local ICU triage idiosyncrasies rather than physiological status) and left-censoring [40]. However, a key strength of the CLIF framework is that it is designed to represent critical acute illness wherever it occurs in the hospital, including the emergency department and hospital wards. Future CLIF analyses can therefore use physiology and life-support treatments to define the onset of critical illness rather than relying on a patient's physical location or ADT data to define a disease state. Furthermore, while a small number of CLIF tables (e.g., mechanical circulatory support) will primarily be relevant to the ICU, most will be useful for general acute care clinical research.
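A treatment-based onset definition of this kind can be expressed directly against CLIF's longitudinal event data. A minimal sketch, in which the event category names are hypothetical rather than the official mCIDE vocabulary:

```python
# Hypothetical life-support categories; the real mCIDE vocabulary may differ.
LIFE_SUPPORT = {"invasive_mech_vent", "vasopressor", "mech_circ_support"}

def critical_illness_onset(events):
    """Return the earliest timestamp at which any life-support therapy
    begins, regardless of the patient's physical location in the hospital.
    `events` is an iterable of (ISO-8601 timestamp, category) pairs."""
    times = [t for t, category in events if category in LIFE_SUPPORT]
    return min(times) if times else None
```

Because ISO-8601 timestamps sort lexicographically, the earliest qualifying event is found without parsing dates; a patient ventilated on the ward would be captured just as one ventilated in the ICU.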

Fourth, our federated analysis currently relies on manually executed analysis code, collaborative debugging, and site-specific reviews, which ensures high data quality but makes the process labor-intensive and difficult to scale. To improve efficiency, we are developing privacy-preserving distributed algorithms to automate workflows. To ensure the overall quality of the distributed CLIF database and support new CLIF sites, we are developing the CLIF Lighthouse, which automates quality-control checks, identifies outliers, and allows sites to compare data completeness (e.g., missing PaO2 values) with the consortium. By enabling cross-site consistency checks, the Lighthouse system helps detect local errors and enhances the reliability of federated analyses. Beyond individual data elements, the CLIF Lighthouse will also quality-check derived concepts that require multiple CLIF tables (e.g., diagnosis of acute hypoxic respiratory failure, duration of mechanical ventilation).
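Checks of this kind are straightforward to express per variable. The sketch below shows the sort of per-site summary such a tool could emit; the plausibility limits and field names are assumptions for illustration, not Lighthouse internals.

```python
import numpy as np

# Hypothetical physiologic plausibility limits per CLIF variable.
LIMITS = {"pao2": (30.0, 700.0), "temp_c": (25.0, 44.0)}

def qc_summary(values, variable):
    """Per-variable missingness and out-of-range rates, suitable for
    comparing one site's CLIF table against consortium-wide norms."""
    lo, hi = LIMITS[variable]
    arr = np.asarray(values, dtype=float)
    missing = np.isnan(arr).mean()
    observed = arr[~np.isnan(arr)]
    out_of_range = ((observed < lo) | (observed > hi)).mean() if observed.size else 0.0
    return {"variable": variable,
            "pct_missing": round(100 * missing, 1),
            "pct_out_of_range": round(100 * out_of_range, 1)}
```

Sharing only these aggregate summaries, rather than patient-level values, is what lets cross-site comparisons proceed within a federated, privacy-preserving design.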

Fifth, we acknowledge that CLIF is an evolving framework and the mCIDE needs to grow to represent critical illness comprehensively across all acute care settings. For instance, we are actively developing tables to capture advanced care planning and ‘code status’ decisions throughout hospitalization. This will enable analysis of temporal changes and differentiation between comfort care-related and other ICU mortality causes, addressing a key gap in critical care research.

Finally, while our work demonstrates the benefits of federated analysis, we would ideally release de-identified versions of our CLIF databases for public use. We are developing an open-source pipeline from MIMIC-IV to the CLIF data format (“MIMIC-IV-Ext-CLIF”), which will extend access to the CLIF format beyond the consortium and broaden the impact of our work (Table E1). Once CLIF’s utility as a format is firmly established, we hope to make the case to our health systems’ leadership for the large investment required to follow the inspirational examples of AmsterdamUMCdb and MIMIC [3, 4].

Conclusions and future directions

We developed and implemented an open-source Common Longitudinal ICU data Format (CLIF) across nine diverse health systems and demonstrated its value in two proof-of-concept case studies. We believe CLIF is a scalable and versatile tool for harmonizing critical care data across institutions. With sufficient development, CLIF could serve as the data format for pragmatic EHR-based trials, target trial emulations for causal inference, and foundational multimodal AI models of critical illness. Beyond its value for research, CLIF can directly support quality reporting, benchmarking, and evidence-based practice, providing actionable insights for bedside ICU clinicians. These aspirations highlight CLIF’s potential to drive innovation and improve the standard of care in intensive care globally.

Supplementary Material

The online version contains supplementary material available at https://doi.org/10.1007/s00134-025-07848-7.

Take-home message

The Common Longitudinal ICU data Format (CLIF) is an open-source relational database structure designed to harmonize and standardize electronic health record data for critical illness research. Two case studies involving over 100,000 critically ill adults across a federated network of 9 distinct CLIF databases demonstrate the potential of CLIF to improve the care of critically ill patients through reproducible and transparent data science.

Acknowledgements

The authors would like to acknowledge Bhakti Patel MD (University of Chicago) for her feedback on the manuscript.

Funding

Dr. Lyons is supported by NIH/NCI K08CA270383. Dr. Rojas is supported by NIH/NIDA R01DA051464 and the Robert Wood Johnson Foundation and has received consulting fees from Truveta. Dr. Bhavani is supported by NIH/NIGMS K23GM144867. Dr. Buell is supported by an institutional research training grant (NIH/NHLBI T32 HL007605). Dr. Gao is supported by NIH/NHLBI K23HL169815, a Parker B. Francis Opportunity Award, and an American Thoracic Society Unrestricted Grant. Dr. Luo is supported in part by NIH U01TR003528 and R01LM013337. Dr. Hochberg is supported by NIH/NHLBI K23HL169743. Dr. Ingraham is supported by NIH/NHLBI K23HL166783. Dr. Ortiz is supported by an institutional research training grant (NIH/NHLBI T32 HL007891). Dr. Weissman is supported by NIH/NIGMS R35GM155262. Dr. Parker is supported by NIH K08HL150291, R01LM014263, and the Greenwall Foundation. The other authors have no conflicts of interest to disclose.

Data availability

The underlying data from each institution are not publicly available due to privacy, regulatory, and institutional restrictions. This study was conducted using a federated analysis approach, without centralizing patient level data at any single site. Data were analyzed locally at participating institutions, ensuring that patient-level data remained within each contributing site. The CLIF Consortium welcomes opportunities for collaboration and data sharing through a standard project intake and approval process, which includes review and approval by the consortium’s leadership and participating sites. Researchers interested in collaborating with the CLIF Consortium are encouraged to contact the consortium for more information.

References

1. Johnson AEW, Pollard TJ, Shen L et al. (2016) MIMIC-III, a freely accessible critical care database. Sci Data 3:160035
2. Pollard TJ, Johnson AEW, Raffa JD et al. (2018) The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data 5:180178
3. Thoral PJ, Peppink JM, Driessen RH et al. (2021) Sharing ICU patient data responsibly under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: the Amsterdam University Medical Centers Database (AmsterdamUMCdb) example. Crit Care Med 49:e563–e577
4. Johnson AEW, Bulgarelli L, Shen L et al. (2023) MIMIC-IV, a freely accessible electronic health record dataset. Sci Data 10:1
5. Rojas JC, Carey KA, Edelson DP et al. (2018) Predicting intensive care unit readmission with machine learning using electronic health record data. Ann Am Thorac Soc 15:846–853
6. Peine A, Hallawa A, Bickenbach J et al. (2021) Development and validation of a reinforcement learning algorithm to dynamically optimize mechanical ventilation in critical care. NPJ Digit Med 4:32
7. Gutierrez G (2020) Artificial intelligence in the intensive care unit. Crit Care 24:101
8. Hersh WR, Weiner MG, Embi PJ et al. (2013) Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care 51:S30–S37
9. Denney MJ, Long DM, Armistead MG et al. (2016) Validating the extract, transform, load process used to populate a large clinical research database. Int J Med Inform 94:271–274
10. Sun H, Depraetere K, De Roo J et al. (2015) Semantic processing of EHR data for clinical research. J Biomed Inform 58:247–259
11. Pedrera-Jiménez M, García-Barrio N, Cruz-Rojo J et al. (2021) Obtaining EHR-derived datasets for COVID-19 research within a short time: a flexible methodology based on Detailed Clinical Models. J Biomed Inform 115:103697
12. OMOP Common Data Model. https://ohdsi.github.io/CommonDataModel/. Accessed 1 Feb 2025
13. Paris N, Lamer A, Parrot A (2021) Transformation and evaluation of the MIMIC database in the OMOP common data model: development and usability study. JMIR Med Inform 9:e30970
14. Leese P, Anand A, Girvin A et al. (2023) Clinical encounter heterogeneity and methods for resolving in networked EHR data: a study from N3C and RECOVER programs. J Am Med Inform Assoc 30:1125–1136
15. Sikora A, Keats K, Murphy DJ et al. (2024) A common data model for the standardization of intensive care unit medication features. JAMIA Open 7:ooae033
16. Mapping of Critical Care EHR Flowsheet data to the OMOP CDM via SSSOM. https://www.ohdsi.org/2023showcase-501/. Accessed 1 Feb 2025
17. Wilkinson MD, Dumontier M, Aalbersberg IJJ et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3:160018
18. Data Management and Sharing Policy. https://sharing.nih.gov/data-management-and-sharing-policy. Accessed 1 Feb 2025
19. NIH Common Data Elements (CDE) Repository. https://cde.nlm.nih.gov/home. Accessed 1 Feb 2025
20. Selvaraj S (2024) Bridging healthcare systems: revolutionizing U.S. public healthcare through HL7 FHIR interoperability and API technology. Int J Sci Res (Raipur) 13:684–690
21. Desai RJ, Marsolo K, Smith J et al. (2024) The FDA Sentinel real world evidence data enterprise (RWE-DE). Pharmacoepidemiol Drug Saf 33:e70028
22. Data Analysis and Real World Interrogation Network (DARWIN EU). European Medicines Agency (EMA). https://www.ema.europa.eu/en/about-us/how-we-work/big-data/real-world-evidence/data-analysis-real-world-interrogation-network-darwin-eu. Accessed 1 Feb 2025
23. Ke G, Meng Q, Finley T et al. (2017) LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30:3149–3157
24. Lambden S, Laterre PF, Levy MM, Francois B (2019) The SOFA score: development, utility and challenges of accurate assessment in clinical trials. Crit Care 23:374
25. Bennett CE, Wright RS, Jentzer J et al. (2019) Severity of illness assessment with application of the APACHE IV predicted mortality and outcome trends analysis in an academic cardiac intensive care unit. J Crit Care 50:242–246
26. Schupp T, Weidner K, Rusnak J et al. (2023) Diagnostic and prognostic value of the AST/ALT ratio in patients with sepsis and septic shock. Scand J Gastroenterol 58:392–402
27. Atrash AK, de Vasconcellos K (2020) Low albumin levels are associated with mortality in the critically ill: a retrospective observational study in a multidisciplinary intensive care unit. South Afr J Crit Care 36:74
28. Churpek MM, Adhikari R, Edelson DP (2016) The value of vital sign trends for detecting clinical deterioration on the wards. Resuscitation 102:1–5
29. Collins GS, Dhiman P, Andaur Navarro CL et al. (2021) Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open 11:e048008
30. Vickers AJ, Elkin EB (2006) Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 26:565–574
31. Bos LDJ, Sjoding M, Sinha P et al. (2021) Longitudinal respiratory subphenotypes in patients with COVID-19-related acute respiratory distress syndrome: results from three observational cohorts. Lancet Respir Med 9:1377–1386
32. Lyons PG, Meyer NJ, Maslove DM (2024) The road to precision in critical care. Crit Care Med 52:999–1001
33. Gordon AC, Alipanah-Lechner N, Bos LD et al. (2024) From ICU syndromes to ICU subphenotypes: consensus report and recommendations for developing precision medicine in the ICU. Am J Respir Crit Care Med 210:155–166
34. Bhavani SV, Carey KA, Gilbert ER et al. (2019) Identifying novel sepsis subphenotypes using temperature trajectories. Am J Respir Crit Care Med 200:327–335
35. Bhavani SV, Verhoef PA, Maier CL et al. (2022) Coronavirus disease 2019 temperature trajectories correlate with hyperinflammatory and hypercoagulable subphenotypes. Crit Care Med 50:212–223
36. Benzoni NS, Carey KA, Bewley AF et al. (2023) Temperature trajectory subphenotypes in oncology patients with neutropenia and suspected infection. Am J Respir Crit Care Med 207:1300–1309
37. Lyons PG, Hofford MR, Yu SC et al. (2023) Factors associated with variability in the performance of a proprietary sepsis prediction model across 9 networked hospitals in the US. JAMA Intern Med 183:611–612
38. Rockenschaub P, Akay EM, Carlisle BG et al. (2025) External validation of AI-based scoring systems in the ICU: a systematic review and meta-analysis. BMC Med Inform Decis Mak 25:5
39. Bhavani SV, Holder A, Miltz D et al. (2024) The precision resuscitation with crystalloids in sepsis (PRECISE) trial: a trial protocol. JAMA Netw Open 7:e2434197
40. Haneuse S, Daniels M (2016) A general framework for considering selection bias in EHR-based studies: what data are observed and why? EGEMS (Wash DC) 4:1203
