Abstract
Background
Delirium, an acute and fluctuating neurocognitive disorder prevalent among hospitalized and geriatric surgical patients, remains a pervasive yet underrecognized clinical challenge. Leveraging Electronic Health Records (EHRs), Machine Learning (ML) models have emerged as promising tools for early prediction and intervention. This scoping review synthesizes the existing literature, identifies current research gaps, and outlines future directions to advance delirium prediction modeling.
Methods
Following the PRISMA Extension for Scoping Reviews (PRISMA-ScR) guidelines, literature from 2020 to 2025 was systematically searched across Google Scholar, EMBASE, PubMed, Scopus, and Web of Science using a comprehensive query strategy.
Results
The review highlights a significant reliance on structured preoperative and intraoperative EHR data for delirium prediction, despite the availability of abundant and highly informative unstructured clinical narratives. Furthermore, substantial heterogeneity exists in the delirium identification methodologies used (e.g., Nursing Delirium Screening Scale (Nu-DESC), Delirium Observation Screening Scale (DOSS), International Classification of Diseases (ICD) criteria, 4AT delirium detection, Confusion Assessment Method (CAM), Intensive Care Delirium Screening Checklist (ICDSC), Cornell Assessment of Pediatric Delirium (CAPD), Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5), and natural language processing (NLP)-based analysis), alongside a focus on specific surgical subgroups. This limited data utilization and methodological variation pose challenges to ML model generalizability and robustness. The literature also showed a research emphasis on critically ill patients, potentially overlooking subtle delirium in low-severity cases.
Conclusions
Future research should focus on early risk stratification and prioritize four key areas: (1) expanded utilization of both tabular EHR data and unstructured clinical notes; (2) development of integrated multimodal fusion models adaptable to dynamic patient states; (3) investigation of the temporal dynamics of delirium development using time-series analysis; and (4) application of causal inference methods to elucidate the relationships between risk factors and delirium. Superior prediction performance may be achievable by leveraging cutting-edge architectures (e.g., transformers) and parallel computing efficiencies to move beyond traditional machine learning. To enhance real-world adoption, future work should integrate Explainable AI tools such as Shapley Additive Explanations (SHAP) within EHR-based decision support systems, improving interpretability and mitigating subgroup disparities in localized risk assessment.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12911-026-03362-y.
Keywords: Delirium; Electronic health record (EHR); Machine learning; Prediction models; Clinical notes; Natural language processing (NLP); Multimodal fusion; Explainable AI (XAI); Time-series analysis; Causal inference
Background
Developing machine learning and artificial intelligence prediction models from EHRs to improve healthcare outcomes is a critical and ongoing research objective [1–3]. These algorithms are increasingly applied to construct predictive models for cognitive syndromes and related mental health conditions, such as Alzheimer’s disease and related dementias (ADRD), attention deficit hyperactivity disorder (ADHD), and delirium [4–6]. Postoperative delirium manifests as acute and fluctuating alterations in attention, cognition, and awareness [7, 8]. Older patients undergoing procedures such as cardiac and hip fracture surgery are at elevated risk [9–11]. This multifactorial cognitive syndrome, occurring in 15–80% of patients, is linked to long-term cognitive decline, increased mortality, prolonged hospitalization, and higher costs [12–18]. Early identification is crucial, as 30–40% of cases are potentially preventable [19, 20]. However, delirium risk factors are often unrecognized [21].
By consolidating diverse patient data encoding contextual, categorical, and chronological information, EHRs have become a cornerstone for predictive modeling research. Most existing delirium prediction models primarily rely on static, on-admission, or intraoperative variables, limiting their applicability for longitudinal and anticipatory care planning [6, 22–24]. Models based solely on preoperative features demonstrate modest predictive performance; for instance, a Fast-and-Frugal Tree (FFT) analysis using preoperative demographics, American Society of Anesthesiologists (ASA) classification [25], comorbidities (e.g., hypertension, coronary artery disease (CAD), diabetes), baseline cognition, medications, and alcohol use achieved a balanced accuracy of 0.637 [26]. In contrast, incorporating dynamic intraoperative variables such as anesthesia duration and pre-medication improved model performance, particularly with tree-based classifiers, yielding approximately a 10% gain in test accuracy. The EHR-derived variables commonly used for delirium prediction can thus be temporally categorized as follows:
- Preoperative Variables: These predominantly comprise baseline patient characteristics assessed immediately prior to admission and pre-existing conditions, including:
  - Demographics: Age, sex, body mass index (BMI), race/ethnicity.
  - Comorbidities: Prevalent chronic conditions such as hypertension, diabetes mellitus, cardiovascular disease (e.g., CAD, heart failure), preexisting cognitive impairment, and documented mental health disorders (e.g., depression, anxiety).
  - Medication History: Chronic use of specific pharmacological agents, including benzodiazepines and antidepressants.
  - Lifestyle Factors: Tobacco smoking status, drug abuse, and alcohol consumption history.
  - Preoperative Laboratory Biomarkers: Serum creatinine, serum sodium, serum albumin, white blood cell count, and serum glucose levels.
- Intraoperative Variables: These pertain to factors and events occurring during the surgical procedure:
  - Surgical Characteristics: Surgical specialty and specific procedure type (e.g., cardiac, orthopedic), anesthesia type, and specific technical aspects relevant to the surgical intervention.
  - Temporal Aspects: Duration of the surgical procedure, anesthesia duration.
  - Physiological Perturbations: e.g., vital signs, estimated blood loss, blood clot.
  - Pharmacological Interventions: Intraoperative administration of specific medications.
- Postoperative Variables: These represent data points collected in the immediate and subsequent post-surgical period:
  - Healthcare Utilization: Postoperative length of hospital stay (LOS), mechanical ventilation (MV) usage, duration and mode of MV.
  - Postoperative Laboratory Biomarkers: Serial measurements of blood analytes.
  - Pharmacological Management: Postoperative drug prescriptions (e.g., opioids, anticholinergics).
  - Physiological Monitoring: Serial vital sign measurements (e.g., heart rate, blood pressure, oxygen saturation, temperature).
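The temporal categorization above can be sketched as a simple lookup that restricts a model to the variables already recorded at the moment a prediction is made. This is a minimal illustration only: the variable names, the `Phase` enum, and the `available_features` helper are hypothetical and do not correspond to any schema used by the reviewed studies.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Perioperative phases, ordered in time."""
    PREOPERATIVE = 0
    INTRAOPERATIVE = 1
    POSTOPERATIVE = 2


# Hypothetical variable names; real EHR schemas differ by institution.
VARIABLE_PHASE = {
    "age": Phase.PREOPERATIVE,
    "asa_class": Phase.PREOPERATIVE,
    "serum_albumin_preop": Phase.PREOPERATIVE,
    "anesthesia_duration_min": Phase.INTRAOPERATIVE,
    "estimated_blood_loss_ml": Phase.INTRAOPERATIVE,
    "postop_opioid_prescribed": Phase.POSTOPERATIVE,
    "mechanical_ventilation_hours": Phase.POSTOPERATIVE,
}


def available_features(prediction_phase):
    """Return the variables recorded at or before the given phase, sorted by name."""
    return sorted(
        name for name, phase in VARIABLE_PHASE.items() if phase <= prediction_phase
    )


preop_only = available_features(Phase.PREOPERATIVE)
up_to_intraop = available_features(Phase.INTRAOPERATIVE)
```

Filtering by phase in this way is one simple guard against training on information that would not yet exist at prediction time.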
Despite the demonstrated utility of preoperative and on-admission structured EHR variables for delirium prediction, limited attention has been given to pre-admission and unstructured clinical data, such as progress notes or physician narratives. The present scoping review systematically maps existing delirium prediction models across data modalities to identify methodological trends and opportunities for advancement.
Methods
This review followed the PRISMA Extension for Scoping Reviews (PRISMA-ScR) [30] guidelines.
Search strategy
To ensure transparency and reproducibility, the search strategy was formalized using explicit Boolean logic and predefined search parameters.
Data Sources: Searches were conducted using Harzing’s Publish or Perish (PoP) software (version 8.17) [31], which retrieved records from Google Scholar, EMBASE, PubMed, Scopus, and Web of Science. Searches were performed using free-text keywords; controlled vocabularies (e.g., MeSH or Emtree terms) were not explicitly applied.
Search Period: The search was restricted to studies published between 2020 and 2025. The final search was executed in August 2025.
Search Terms and Boolean Logic: The search strategy employed free-text keywords organized into two thematic concept groups. Keywords within each group were joined by the OR operator, and the two groups were combined using the AND operator, yielding the combined query (Clinical Concept) AND (Technical Concept):
(1) Clinical Concept: “Delirium” OR “Cognition” OR “Cognitive”
(2) Technical Concept: “prediction” OR “machine learning” OR “artificial intelligence” OR “deep learning” OR “electronic health record”
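The Boolean logic described above can be reproduced programmatically. The sketch below assembles the combined query string from the two keyword groups; the helper names `or_group` and `build_query` are illustrative, and the exact query syntax accepted by each database or by Publish or Perish may differ from this generic form.

```python
# Keyword groups as listed in the search strategy.
CLINICAL = ["Delirium", "Cognition", "Cognitive"]
TECHNICAL = [
    "prediction",
    "machine learning",
    "artificial intelligence",
    "deep learning",
    "electronic health record",
]


def or_group(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Combine any number of OR-groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


QUERY = build_query(CLINICAL, TECHNICAL)
print(QUERY)
```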
Study selection
Studies were eligible for inclusion if they met the following criteria:
Machine Learning Focus: Employed and analyzed one or more machine learning algorithms.
Defined Population: Clearly described the characteristics of the study population and the inclusion/exclusion criteria, such as the specific surgical patients and their severity levels.
Data Transparency: Explicitly reported the data source.
Delirium Evaluation: Clearly defined the delirium assessment method.
Study Designs: Employed a retrospective, prospective, or observational design. Reviews, surveys, and trials were excluded.
Sample Size: Involved a minimum of 20 individual patients.
Screening and inclusion of articles
A total of 1746 records were identified through database searches. After removal of duplicate records, 479 articles were screened at the title and abstract level, of which 74 required full-text examination. In total, 63 studies met the inclusion criteria and were included in the final review. Among these, 56 studies focused on structured data–based delirium prediction models. The study selection process is summarized in the PRISMA flow diagram (Fig. 1). Detailed characteristics of included studies are provided in the Supplementary Table. Study screening and eligibility assessment were conducted using predefined inclusion and exclusion criteria, with iterative consultation among the author team when eligibility questions arose.
Fig. 1.
PRISMA-ScR [30] flow diagram showing the stages of article identification, screening, and selection employed in this scoping review, as guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR)
Data charting and synthesis
Data charting was conducted using a structured extraction framework to support descriptive synthesis. Extracted information comprised study cohort characteristics, clinical setting, study design, delirium assessment method, data modality (structured, unstructured, or multimodal), machine learning algorithms, feature construction strategies, reported performance metrics (e.g., Area Under the Receiver Operating Characteristic Curve (AUC-ROC)), and the use of interpretability or explainability techniques. These data were used to organize the tables, figures, and narrative synthesis.
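Because AUC-ROC is the performance metric most often charted in this review, a small worked example may help readers interpret the reported values. The sketch below computes AUC-ROC from its probabilistic definition: the chance that a randomly chosen delirium case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The function name and the toy labels and scores are illustrative and are not drawn from any included study.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes (1 = delirium).
    scores: predicted risks aligned with labels.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ranked correctly.
example_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of patients, so it is meant for illustration; rank-based implementations are preferable for large cohorts.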
Categorization of included studies
Included studies were grouped according to data modality and modeling approach:
Structured Data Models: Studies using tabular EHR data variables encoded as numerical or categorical variables.
Unstructured Data Models: Studies incorporating clinical notes or other free-text data.
Multimodal Fusion Models: Studies integrating heterogeneous data sources (e.g., tabular variables and clinical text), which remain limited in number and represent an important area for future research.
Structured data delirium prediction models: characteristics of included studies
Delirium prediction models have benefited significantly from various supervised machine learning algorithms, including Logistic Regression (LR), probability-based Naive Bayes, ensemble methods like Random Forests and eXtreme Gradient Boosting (XGBoost), neural networks leveraging layered learning to capture complex patterns, and discriminative Support Vector Machines [6, 32–36]. As shown in Fig. 2, feature engineering presents a critical methodological bifurcation. Predictive models either perform explicit feature selection prior to training, commonly using LASSO regression or tree-based feature importance metrics to identify a relevant subset, or adopt a comprehensive baseline approach, training on all available features and relying on established delirium risk factors for predictive power.
Fig. 2.
Methodological workflow for delirium prediction
Logistic Regression (LR) remains the cornerstone of delirium prediction (63% of surveyed structured-data studies), valued for its interpretability and widespread clinical use. Gradient boosting ensemble methods such as eXtreme Gradient Boosting (XGBoost) follow in prevalence (35% of surveyed structured-data studies). A recent study comparing LR and several ML models, including gradient boosting, reported comparable AUC-ROC values, with LR showing higher sensitivity and requiring fewer variables, highlighting its practical advantages [37]. However, direct performance comparisons across studies remain limited due to heterogeneity in cohorts, feature sets, study design, outcome definitions, and validation strategies.
Features/Variables selection considerations
Optimal feature selection requires balancing predictive power with pragmatic accessibility. While established high-risk factors like dementia, cerebrovascular accident (CVA), and traumatic brain injury (TBI) are powerful predictors, their inclusion risks creating a model that simply learns to identify these pre-existing, at-risk individuals [38]. Consequently, careful consideration is crucial to avoid collinearity and overfitting while ensuring generalizability across a wider population. Table 1 systematically details the varied approaches to managing pre-existing neurological disorders in delirium prediction studies, either through feature inclusion or patient cohort selection. Specific studies, by excluding patients with dementia, psychiatric or neurological disorders, TBI, or low Mini-Mental State Examination (MMSE) scores [62], report AUC-ROC values ranging from 0.72 to 0.90 in focused intensive care unit (ICU), non-cardiac, cardiovascular, Coronary Artery Bypass Graft (CABG) surgical, and general anesthesia surgical cohorts [32–34, 63–69]. Conversely, studies without explicit neurological exclusion criteria show a broader AUC-ROC range of 0.66 to 0.93 across general hospitalized and surgical populations [70–73]. Notably, some cardiac surgery models further exclude patients with pre-existing cognitive dysfunction or neurosurgical history, highlighting that such neurological comorbidities may introduce performance bias [74].
Table 1.
Overview of structured-data studies including pre-existing neurological disorders
| S No. | Previous Neurological Disorder | Ref |
|---|---|---|
| 1 | Dementia | [39–54] |
| 2 | History of delirium, Mental Illness, Head Injury | [44, 50, 52, 54, 55] |
| 3 | Psychotic/Neurological Disorder, Trauma | [40] |
| 4 | Neurological Surgery | [56] |
| 5 | Parkinson’s disease, Headache Disorder | [37, 48, 57] |
| 6 | Known neurological disorder or cognitive impairments | [51, 52, 58, 59] |
| 7 | Brain tumor, Metastasis, CBD | [37, 45, 60] |
| 8 | History of Stroke | [46, 47, 49, 53, 54, 61] |
| 9 | Schizophrenia spectrum, Bipolar Disorder | [51, 52] |
Feature importance assessment
Feature importance assessment in delirium prediction studies broadly follows two complementary paradigms (Table 2).
Feature Selection prior to model training: This pre-training step aims to reduce dimensionality, improve model stability, and enhance interpretability by identifying a subset of salient predictors before final model fitting. Common approaches include L1-penalized regression (LASSO), elastic-net regularization, recursive feature elimination, step-wise backward selection using Akaike Information Criterion (AIC), and univariate statistical filtering based on variance or hypothesis testing. Approximately 33% of the reviewed structured-data studies employed such antecedent feature selection, reflecting a conventional clinical machine learning workflow encompassing preprocessing, feature selection, hyperparameter tuning, model training, and internal or external validation. While effective in identifying strong individual predictors, these approaches typically do not capture higher-order feature interactions.
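As one concrete instance of univariate statistical filtering, the sketch below scores each candidate predictor with Welch's t statistic between delirium and non-delirium groups and keeps those whose absolute value exceeds a cutoff. The function names, the cutoff of 2.0, and the toy data are assumptions made for illustration; the reviewed studies typically filtered on p-values, which would additionally require the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        # Degenerate case: no within-group spread in either sample.
        return 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    return diff / se


def univariate_filter(rows, labels, names, t_threshold=2.0):
    """Keep features whose |t| between outcome groups meets the threshold."""
    kept = []
    for j, name in enumerate(names):
        cases = [row[j] for row, y in zip(rows, labels) if y == 1]
        controls = [row[j] for row, y in zip(rows, labels) if y == 0]
        if abs(welch_t(cases, controls)) >= t_threshold:
            kept.append(name)
    return kept


# Toy cohort: columns are (age in years, serum sodium in mmol/L).
NAMES = ["age", "serum_sodium"]
ROWS = [
    [80, 140], [82, 138], [78, 141], [85, 139],   # delirium
    [60, 139], [65, 141], [58, 138], [62, 140],   # no delirium
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

selected = univariate_filter(ROWS, LABELS, NAMES)
```

In this toy cohort only age separates the groups, which also shows the limitation noted above: each feature is judged in isolation, so a predictor that matters only in combination with another would be discarded.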
Model-intrinsic and post-hoc feature importance analysis: A second and increasingly prevalent paradigm involves assessing feature importance after model training, either through model-intrinsic measures or post-hoc explainable artificial intelligence (XAI) frameworks. Model-intrinsic approaches derive importance directly from trained models, including tree-based importance metrics from Random Forests, XGBoost, and LightGBM, odds ratio estimates from logistic regression, and nomograms. In contrast, post-hoc XAI methods such as SHapley Additive exPlanations (SHAP) [85], Local Interpretable Model-Agnostic Explanations (LIME) [86], and Layer-wise Relevance Propagation (LRP) [87] provide model-agnostic or model-specific explanations by attributing contributions of individual features to model predictions. SHAP, a method grounded in game theory, was employed in approximately 25% of the reviewed structured-data studies for feature importance analysis. It quantifies each feature’s contribution to an individual prediction by comparing the model’s output with the expected (baseline) prediction across all possible feature combinations. Computed post hoc, SHAP values explicitly capture the effects of feature interactions, providing a data-driven understanding of model behavior. This contrasts with pre-training feature selection techniques, which emphasize marginal feature effects and may obscure interaction-driven contributions. Given its ability to support nuanced interpretability, SHAP-based analyses appear particularly valuable for future delirium prediction research.
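The idea of averaging a feature's contribution over all possible feature combinations can be made concrete with an exact Shapley computation on a tiny model. The sketch below is not the SHAP library and makes two simplifying assumptions: absent features are replaced by a single baseline vector (rather than averaged over background data), and the risk function `toy_risk` with its three binary inputs is entirely hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions for model(x) relative to model(baseline).

    Features outside a coalition take their baseline value. The cost grows
    as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score with an age x low-albumin interaction term."""
    age_over_75, long_surgery, low_albumin = z
    return 0.1 + 0.2 * age_over_75 + 0.1 * long_surgery + 0.3 * age_over_75 * low_albumin


patient = [1, 1, 1]
reference = [0, 0, 0]
phi = shapley_values(toy_risk, patient, reference)
```

Here the interaction term is split equally between age and albumin (0.15 each), so the attributions are 0.35, 0.10, and 0.15, and they sum to the difference between the patient's score and the baseline score.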
Table 2.
Feature importance strategies adopted by structured-data delirium risk prediction studies
| A. Feature Selection (Pre Modeling) | References |
|---|---|
| LASSO Regression | [63, 75] |
| Feature Selector Packages in R | [41] |
| Recursive Feature Elimination (RFE) | [39] |
| Univariate Variance (p-value filtering/sample t-tests) | [53, 54, 74] |
| Random forest-based recursive variable selection | [44] |
| Step-wise backward predictor selection using Akaike Information Criterion (AIC)/LR | [76, 77] |
| B. Model-Intrinsic Feature Importance (During/After Training) | References |
| Feature importance for tree &/LR model | [49, 58, 59, 61, 64, 69, 78] |
| Ensemble model combining XGBoost & RF | [55] |
| XGBoost Importance | [51, 57] |
| Top features selected by decision trees, RF, LightGBM, LR/GBM | [60, 79] |
| Nomogram of LR model | [37] |
| Odds Ratio estimates | [50, 54] |
| Information gain for XGBoost | [80] |
| Multi-Layer Perceptron (MLP) | [81] |
| C. AI tools after Model development | References |
| SHapley Additive exPlanations (SHAP) | |
| Layer-wise Relevance Propagation (LRP) | [46] |
Across the reviewed literature, feature importance analyses consistently identified age, surgical type and duration, American Society of Anesthesiologists (ASA) physical status classification, intraoperative variables, and preexisting comorbidities such as neurological disorders and chronic obstructive pulmonary disease (COPD) as key delirium predictors [32, 34, 35, 40, 56, 63, 70, 82, 84]. SHAP-based analyses further highlighted the importance of clinical and intraoperative factors like Glasgow Coma Scale (GCS) [88], Richmond Agitation Sedation Scale (RASS) [89], sedation levels, mechanical ventilation (MV) and body temperature [33, 34, 71, 83]. Alcohol use and smoking status were also identified as influential predictors [32, 35]. Emerging evidence additionally suggests potential model performance biases associated with sociodemographic variables such as sex and race, underscoring the importance of fairness-aware interpretability analyses [90].
Variable distribution across studies
The reviewed studies demonstrate substantial heterogeneity in study populations, encompassing pediatric to geriatric cohorts across diverse medical and surgical settings. Demographic variables such as age at admission, sex, and body mass index (BMI) were the most frequently utilized, appearing in 80% of the reviewed structured-data studies, reflecting their well-established association with delirium risk [91]. Following demographic factors, in-hospital, ICU-specific, and surgical variables were commonly incorporated, including laboratory analytes (e.g., serum albumin, sodium), vital signs, biosignals (e.g., electrocardiogram), hematological measures, and operative parameters such as anesthesia duration, surgical duration and site, ASA physical status, surgical specialty, anesthesia modality, intraoperative complications, intraoperative heart rate, and perioperative medication use. Figure 3 illustrates overall variable utilization trends.
Fig. 3.
Feature categories leveraged in structured-data delirium prediction studies. Brackets show the counts of studies where these feature categories are utilized
Temporal focus of predictors
Most contemporary studies emphasize in-hospital features or variables obtained near admission, appearing in 67% of the reviewed structured-data studies, while pre-existing comorbidities are also frequently assessed (57% of structured-data studies), consistent with their established contribution to delirium susceptibility [92]. Delirium’s complex, multifactorial etiology likely reflects underlying neurobiological disturbances that evolve dynamically over time [92]. Variables capturing neurological history and cognitive status such as dementia, prior delirium, and psychiatric illness were examined less often (25% of structured-data studies). A small subset of studies leveraged the temporal dimension of EHR data (such as vital signs, laboratory measurements, and medication data) using architectures such as bidirectional long short-term memory (BiLSTM) networks and self-attention [71, 83]. While this review highlights a prevalent reliance on static in-hospital features, future research may benefit from incorporating time-series modeling approaches to better capture evolving patient trajectories and improve temporal precision in delirium risk prediction.
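To illustrate how a self-attention layer relates time points in a sequence of measurements, the sketch below implements scaled dot-product self-attention in its simplest form. It is a didactic reduction, not the architecture of the cited studies: queries, keys, and values are all set equal to the input (no learned projections), and the three time steps of standardized vital signs are invented for the example.

```python
from math import exp, sqrt


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity projections.

    sequence: list of time steps, each a list of d feature values.
    Returns (outputs, weights); weights[t] is how strongly time step t
    attends to every time step in the sequence.
    """
    d = len(sequence[0])
    outputs, weights = [], []
    for query in sequence:
        scores = [
            sum(q * k for q, k in zip(query, key)) / sqrt(d) for key in sequence
        ]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[t] * sequence[t][j] for t in range(len(sequence))) for j in range(d)]
        )
    return outputs, weights


# Hypothetical standardized (heart rate, temperature) readings at three times.
vitals = [[0.1, -0.2], [0.5, 0.3], [1.2, 0.9]]
attended, attention_weights = self_attention(vitals)
```

Each output row is a weighted mixture of all time steps, which is how such layers let later predictions draw on earlier physiological changes.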
Delirium assessment and risk factor analysis
Delirium, an acute neurocognitive disorder that manifests in diverse subtypes (e.g., hypoactive, hyperactive, mixed) and clinical contexts (ICU, postoperative, subsyndromal), is characterized by substantial diagnostic heterogeneity, resulting in outcome definition variability. Across the reviewed studies, delirium labels were derived using diverse assessment approaches, including validated screening instruments such as the Confusion Assessment Method (CAM), Nursing Delirium Screening Scale (Nu-DESC), Intensive Care Delirium Screening Checklist (ICDSC), Diagnostic and Statistical Manual of Mental Disorders (DSM-5), Cornell Assessment of Pediatric Delirium (CAPD), International Classification of Diseases (ICD-9/10), chart reviews, and NLP-driven analyses [13, 93–97]. This heterogeneity reflects differences in assessment purpose (screening vs. diagnosis), measurement frequency, temporal horizon, and ascertainment quality, and represents a central methodological challenge for model generalizability.
The subjectivity inherent in clinical delirium assessment, compounded by the disorder’s fluctuating course and multifactorial etiology, contributes to under-recognition and inconsistent labeling across studies and care settings [21]. The resulting diagnostic heterogeneity likely explains the challenge of developing robust and generalizable machine learning models suitable for clinical deployment. Furthermore, variability in diagnostic criteria likely contributes to discrepancies between the delirium incidence reported in ICD-coded administrative databases and that reported in prospective studies [48, 98]. The Supplementary Table summarizes the distribution of delirium assessment methods across study cohorts and clinical settings.
Given delirium’s prevalence in critical care, 15 studies specifically focused on ICU patients, predominantly employing CAM-ICU [99] and ICDSC [94] as screening tools [35, 40–42, 49, 53, 63, 68, 69, 75, 83, 84, 100–102]. A systematic review highlights CAM-ICU’s high effectiveness for both screening and diagnosis, and ICDSC’s utility in identifying subsyndromal delirium with prognostic implications [103]. The Supplementary Table provides a detailed view of the delirium assessment techniques employed by the reviewed studies. Some studies utilized the quicker, behavior-focused Nu-DESC [93], while 13 studies leveraged DSM (alone or with CAM) for delirium assessment [32, 34, 36, 55, 57, 60, 64, 65, 72, 74, 80–82].
Administrative coding approaches using ICD codes were employed in a smaller subset of studies [39, 41, 48, 73, 104]. A few also utilized chart-based reviews [97], where trained nurses, blinded to interview data, abstracted information to analyze factors impacting accurate delirium identification [43, 44, 67]. Less commonly, standardized scales were utilized, including the 16-item Delirium Rating Scale-Revised-98 (DRS-R-98) [105], a validated tool for assessing delirium severity and diagnosis, particularly valuable for longitudinal studies [45, 65]. The 25-item Delirium Observation Screening Scale (DOSS) was also employed, designed for early identification based on nurses’ observations aligned with DSM or CAM criteria [41, 51, 58, 60, 76]. Additionally, one observational study used NLP for sentence labeling of delirium-related keywords and CAM terminology in clinical notes [100].
To diagnose delirium in children, two studies relied on the Cornell Assessment of Pediatric Delirium (CAPD) [95], a standardized tool specifically designed for this age group [75, 106]. One single-center retrospective cohort study modeled Pediatric Emergence Delirium (PED), using the Watcha scale to assess emergence delirium [107, 108]. The Watcha scale is a four-point arousal scale, scored from 1 to 4, for assessing the presence of emergence delirium and emergence agitation in children recovering from general anesthesia. Together, these findings highlight outcome definition heterogeneity as a core barrier to cross-study comparability, rather than a deficiency of individual modeling approaches, consistent with the aims of a scoping review.
Observed limitations
A consistently reported limitation across these studies is the challenge of generalizing delirium prediction models and integrating them into diverse clinical environments. This limitation primarily stems from the heterogeneity of institutional protocols, clinical documentation practices, and EHR systems. Consequently, models developed and validated at a single center often exhibit diminished performance when applied to external, real-world data. This necessitates the establishment of standardized data pipelines and comprehensive guidelines to enhance model applicability.
Methodological quality assessment
The methodological quality of the included studies was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST) [109] framework. Consistent with the scope of machine learning-based delirium prediction studies, the assessment focused on key PROBAST domains, including participants, predictors, outcome, and analysis, rather than generating a formal risk of bias (ROB) score for each study.
Information relevant to methodological quality was synthesized across the study characteristics tables and figures (Table 1, Table 2, Fig. 3, and the Supplementary Table). Participant selection and study design, including study cohort characteristics, study type (retrospective, prospective, or observational), and the inclusion of patients with pre-existing neurological disorders, were examined to assess potential selection bias and applicability concerns. Particular attention was given to whether the inclusion of high-risk populations, such as patients with pre-existing neurological disorders, may have influenced delirium incidence and model performance. Predictor handling and feature selection strategies, both prior to and after model development, were assessed to identify potential risks related to data leakage and model interpretability. Outcome definition and measurement were evaluated based on the reported delirium assessment methods (e.g., CAM, ICDSC, or ICD-based definitions), recognizing that subjective or administrative definitions may introduce outcome misclassification. Finally, modeling approaches and reported performance metrics (e.g., AUC-ROC) were considered to evaluate analytical rigor and reporting practices.
In line with the objectives of a scoping review and PRISMA-ScR guidance, this methodological quality assessment was used to contextualize and interpret the evidence rather than to exclude studies or formally grade risk of bias. Accordingly, the review prioritizes breadth of evidence and methodological transparency over in-depth appraisal of individual study quality.
Algorithm comparison across included studies
As summarized in the Supplementary Table, several studies evaluated multiple machine learning algorithms within the same cohort, reporting a range of AUC-ROC values rather than performance for a single model. Commonly compared methods included logistic regression, random forests, gradient boosting models, support vector machines, and neural networks. Studies were categorized based on their use of within-study multi-algorithm comparisons. However, the rigor of benchmarking varied substantially: some studies provided comprehensive evaluations across diverse architectures, while others reported limited or selective comparisons. An emerging best practice in clinical machine learning is the systematic comparison of multiple algorithms under a unified evaluation framework, as exemplified by recent work employing automated machine learning (Auto-ML) pipelines such as PyCaret [110]. Such approaches facilitate standardized preprocessing, consistent validation strategies, and transparent performance comparisons, thereby supporting more robust and reproducible model selection in clinical prediction research.
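The core of a unified evaluation framework is that every candidate algorithm is scored on identical data splits with the same metric. The sketch below shows that idea with a minimal k-fold harness written from scratch; it does not use PyCaret, and the two stand-in "models" (a majority-class baseline and a one-feature threshold rule), the toy cohort, and all function names are assumptions for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices once and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_majority(X, y):
    """Baseline: always predict the most frequent training label."""
    label = 1 if 2 * sum(y) >= len(y) else 0
    return lambda row: label


def fit_threshold(X, y):
    """Rule on feature 0: cut at the midpoint between the class means."""
    pos = [r[0] for r, t in zip(X, y) if t == 1]
    neg = [r[0] for r, t in zip(X, y) if t == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda row: 1 if row[0] >= cut else 0


def compare(models, X, y, k=4, seed=0):
    """Score every model on the same folds and return mean accuracy per model."""
    folds = k_fold_indices(len(y), k, seed)
    results = {}
    for name, fit in models.items():
        scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train = [i for i in range(len(y)) if i not in held_out]
            predict = fit([X[i] for i in train], [y[i] for i in train])
            scores.append(
                accuracy([y[i] for i in test_idx], [predict(X[i]) for i in test_idx])
            )
        results[name] = sum(scores) / k
    return results


# Toy cohort: a single feature (age) and a binary delirium label.
X = [[60], [62], [58], [65], [80], [82], [78], [85]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
MODELS = {"majority_class": fit_majority, "threshold_rule": fit_threshold}
results = compare(MODELS, X, y)
```

Because the folds are generated once and reused, differences in the resulting scores reflect the algorithms rather than the splits.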
Unstructured data delirium prediction models: characteristics of included studies
Unstructured data sources such as clinical notes, physician narratives, radiology reports, and patient surveys remain largely underutilized in delirium prediction modeling [111–114]. Critical patient context, including social, behavioral, and lifestyle factors, is often not captured in a structured format yet can significantly influence delirium risk. Recent advances in NLP and deep learning enable extraction of predictive signals from these unstructured modalities [115]. Beyond text, medical imaging and surgical video data analyzed through convolutional and transformer-based architectures offer additional insights [115–117]. Specifically, MRI and cerebral images have been used by healthcare providers to study and identify delirium [118–120].
Although few studies have addressed delirium prediction from unstructured data, one large-scale retrospective analysis stands out for its scope and methodological rigor. The study processed 1.5 million clinical notes from 10,000 patients spanning dementia, COVID-19, and neurocritical cohorts [100]. From these, 200,000 sentences were annotated using delirium-related keywords to train predictive models. Comparative experiments using SVMs, Recurrent Neural Networks (RNNs), and transformer architectures demonstrated the superiority of contextual language models. The transformer-based model achieved an AUC-ROC of 0.99, outperforming the bag-of-words SVM baseline that encoded unigram and bigram features. However, this high performance likely reflects dataset-specific signals, such as the prevalence of neurological comorbidities and explicit delirium mentions, limiting generalizability across broader clinical settings.
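For readers unfamiliar with the bag-of-words baseline mentioned above, the sketch below shows how a sentence can be turned into unigram and bigram counts of the kind an SVM would consume. The tokenization rule, the function name, and the example sentence are assumptions for illustration and are not taken from the cited study.

```python
import re
from collections import Counter


def ngram_features(sentence):
    """Count unigrams and bigrams in a lowercased, alphanumeric tokenization."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


features = ngram_features("Patient is confused and agitated.")
```

Such features record which words and adjacent word pairs occur but not their wider context, which is one reason contextual language models can outperform this baseline.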
A recent protocol outlines the design of machine learning models for predicting hospital-induced delirium using both structured and unstructured EHR data [121]. In alignment with TRIPOD reporting guidelines [122], it recommends tree-based algorithms and logistic regression for structured features, alongside diverse NLP methods for unstructured text. The protocol introduces a delirium-specific thesaurus of keywords and n-grams to support manual sentence labeling by nurses under expert supervision from clinicians and text-mining specialists. For text processing, Clinical Named Entity Recognition (NER) using Conditional Random Fields (CRFs) is proposed to extract delirium-related entities, while Latent Dirichlet Allocation (LDA) topic modeling will uncover latent risk factors for early detection. The framework further proposes a three-stage multimodal early-fusion architecture integrating both data modalities for prediction. Results from this protocol are forthcoming and have not yet been reported.
A rule-based NLP approach was employed in one study to identify delirium from clinical notes, without incorporating deep learning or machine learning architectures [123]. Using the open-source MedTaggerIE pipeline [124], the researchers integrated domain-specific knowledge from the CAM to detect delirium-related concepts. The system scanned notes for direct mentions and key phrases, normalizing lexical variants (e.g., “unresponsiveness” and “decreased responsiveness” were both coded as “disconnected”), and achieved an accuracy of 0.967. A subsequent multi-institutional validation study compared NLP-derived delirium cases against ICD-coded diagnoses [125], revealing that the NLP method detected more than twice as many delirium episodes (7.36 vs. 3.02 per 100 hospitalizations). These findings underscore the underdiagnosis of delirium in conventional coding systems, though the authors note that even advanced NLP methods may underestimate true prevalence due to incomplete or inconsistent clinical documentation.
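The general mechanics of rule-based concept normalization can be sketched as follows. This is not the MedTaggerIE rule set: only the mapping of "unresponsiveness" and "decreased responsiveness" to "disconnected" comes from the description above, and the remaining patterns, the concept names, and the function `extract_concepts` are hypothetical placeholders. Negation and section handling, which a production system would need, are omitted.

```python
import re

# Concept -> lexical variants. Only the "disconnected" variants are taken from
# the text above; the other entries are hypothetical examples.
CONCEPT_PATTERNS = {
    "disconnected": [r"\bunresponsive(?:ness)?\b", r"\bdecreased responsiveness\b"],
    "inattention": [r"\binattenti(?:on|ve)\b", r"\beasily distracted\b"],
    "delirium_mention": [r"\bdelirium\b", r"\bdelirious\b"],
}

# Compile once so each note is scanned with case-insensitive patterns.
_COMPILED = {
    concept: [re.compile(p, re.IGNORECASE) for p in patterns]
    for concept, patterns in CONCEPT_PATTERNS.items()
}


def extract_concepts(note):
    """Return the sorted list of normalized concepts whose variants appear in the note."""
    found = {
        concept
        for concept, patterns in _COMPILED.items()
        if any(p.search(note) for p in patterns)
    }
    return sorted(found)
```

Because every variant is mapped to one canonical label, downstream rules (for example, CAM-style feature combinations) can operate on concepts rather than on raw strings.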
Collectively, these findings highlight a clear need to advance delirium prediction research through the integration of sophisticated machine learning and artificial intelligence architectures, including transformer-based and other deep learning models. Despite the central role of clinical narratives in documenting delirium-related symptoms, relatively few studies have incorporated unstructured data, such as clinical notes, in delirium prediction models. This limitation reflects several methodological challenges, including variability in documentation practices, the absence of standardized annotations, and the risk of label leakage arising from temporal overlap between clinical text and delirium assessment. From a methodological perspective, future research would benefit from clearly defined temporal windows for text extraction and robust strategies to mitigate information leakage, while maintaining a balance between interpretability and predictive performance. Addressing these challenges and moving beyond conventional rule-based NLP approaches is essential to fully leverage the rich, unstructured information embedded in clinical narratives, ultimately enabling the development of more robust, generalizable, and clinically meaningful delirium prediction systems.
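One way to operationalize a temporal window and a basic leakage safeguard is sketched below. It assumes each note is a dictionary with hypothetical `time` and `text` keys, uses an illustrative 12-hour look-back, and masks a small, assumed list of outcome-revealing terms. The term list is a placeholder and would need clinical review before any real use.

```python
import re
from datetime import datetime, timedelta

# Hypothetical list of terms that directly reveal the outcome or its assessment.
LEAKAGE_TERMS = re.compile(r"\b(deliri\w*|cam-icu|cam)\b", re.IGNORECASE)


def notes_in_window(notes, prediction_time, lookback_hours=12):
    """Keep only notes written in [prediction_time - lookback, prediction_time)."""
    start = prediction_time - timedelta(hours=lookback_hours)
    return [n for n in notes if start <= n["time"] < prediction_time]


def mask_leakage(text, placeholder="[MASKED]"):
    """Replace explicit outcome-related terms so the model cannot read the label."""
    return LEAKAGE_TERMS.sub(placeholder, text)


def build_text_input(notes, prediction_time, lookback_hours=12):
    """Concatenate masked, in-window notes in chronological order."""
    selected = sorted(
        notes_in_window(notes, prediction_time, lookback_hours),
        key=lambda n: n["time"],
    )
    return " ".join(mask_leakage(n["text"]) for n in selected)
```

Excluding notes written at or after the prediction time addresses temporal leakage, while masking addresses explicit label mentions; the two safeguards are complementary rather than interchangeable.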
Multimodal prediction
Multimodal models integrate diverse EHR modalities, combining structured data (e.g., laboratory results, diagnosis codes) with unstructured data (e.g., clinical notes). While single-source data such as screening, diagnosis, or treatment variables have been widely utilized for outcome modeling [126], multimodal fusion has demonstrated superior predictive performance [127]. In precision health, multimodal machine learning enables more comprehensive representations of patient states by leveraging complementary information across data types. Research in multimodal learning has categorized its core challenges into five domains: representation, translation, alignment, fusion, and co-learning [128]. Recent advances, such as the Chameleon architecture, exemplify progress in this field by unifying text and image modalities through token-level fusion within autoregressive transformers [129]. Architectural innovations such as query-key normalization and strategically placed layer normalizations have improved training stability and performance, with Chameleon outperforming LLaMA-2 even on text-only benchmarks.
Multimodal techniques have been increasingly applied across clinical domains such as respiratory and infectious disease diagnosis and prognosis [130, 131]. In the context of delirium, a recent study employed a multimodal machine learning framework integrating structured EHR data, including demographics, vital signs, laboratory results, and medication history, with unstructured clinical notes [132]. The notes, aggregated within a 12-hour window preceding risk stratification (approximately 24 hours before CAM assessment), were processed using a conventional NLP pipeline involving sentence segmentation, tokenization, stemming, lemmatization, and bag-of-words feature extraction. The derived text features, refined with expert clinical input, were combined with structured EHR data to train a random forest–based fusion model, achieving strong discriminative performance (AUC-ROC = 0.94). However, reliance on bag-of-words representations limits the model’s capacity to capture contextual semantics within clinical narratives. Incorporating transformer-based embeddings and extending temporal observation windows could enhance early risk detection and enable more proactive delirium management.
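The feature-level combination described above can be illustrated with a minimal sketch. The field names in `STRUCTURED_FIELDS`, the vocabulary in `TEXT_VOCAB`, and the function names are hypothetical and are not taken from the cited study; stemming and lemmatization are omitted, and the classifier itself is left out.

```python
import re
from collections import Counter

# Hypothetical structured fields and text vocabulary, for illustration only.
STRUCTURED_FIELDS = ["age", "heart_rate", "sodium"]
TEXT_VOCAB = ["agitated", "confused", "pulling"]


def text_features(note_text, vocab=TEXT_VOCAB):
    """Bag-of-words counts restricted to a fixed, pre-agreed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", note_text.lower()))
    return [float(counts[term]) for term in vocab]


def fuse(structured, note_text):
    """Concatenate tabular values and text counts into one feature vector."""
    tabular = [float(structured[field]) for field in STRUCTURED_FIELDS]
    return tabular + text_features(note_text)
```

The fused vector would then be passed to a downstream classifier, such as the random forest used in the study described above. Fixing the field order and vocabulary in advance keeps feature positions stable between training and deployment.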
A preliminary protocol paper outlined a three-stage predictive framework for delirium [121]. Its architecture begins with an early fusion stage that integrates structured EHR variables with textual features, followed by a logistic regression model with LASSO regularization for dimensionality reduction and feature selection. In the final stage, a delirium risk classifier is applied, with candidate algorithms including Bayesian classifiers, decision trees, and SVMs.
Despite the promise of multimodal learning, the limited number of multimodal delirium prediction studies reflects several practical and methodological barriers. First, integrating heterogeneous EHR modalities requires careful alignment of structured variables and unstructured text across differing temporal resolutions, increasing preprocessing complexity and susceptibility to information leakage. Second, reliable labeling remains challenging, as delirium onset is often transient, inconsistently documented, and temporally misaligned with clinical notes, complicating supervised multimodal learning. Third, institutional barriers, including restricted access to raw clinical text and imaging data as well as cross-departmental data governance constraints, limit the availability of harmonized multimodal datasets. Additional challenges include increased computational demands, the need for domain expertise to curate modality-specific features, and difficulties in ensuring model interpretability and clinical trust. Collectively, these barriers help explain the scarcity of multimodal delirium prediction models despite their demonstrated potential and highlight the need for standardized data integration pipelines and robust temporal modeling strategies in future work.
Discussion
The multifactorial and fluctuating nature of delirium presents significant challenges for accurate prediction, particularly in already vulnerable hospitalized and surgical patient populations. Given the added burden a delirium diagnosis imposes, early identification is paramount for proactive intervention. This imperative underpins the need for this scoping review, in which we synthesize current literature to propose key methodological and modeling considerations for future delirium prediction research. These include: (a) leveraging unstructured clinical data with advanced context-based transformer models while implementing robust safeguards against label leakage; (b) expanding focus to include subtle or subsyndromal delirium, rather than exclusively critically ill patients, to enhance broader clinical applicability; (c) incorporating longitudinal modeling of EHR time-series data using attention-based encoders, Transformers, BiLSTMs, or state-space architectures to capture temporal dynamics; and (d) integrating Explainable AI (XAI) tools, such as SHAP, to enhance model interpretability, facilitate clinical integration, and potentially mitigate generalizability challenges by providing data-driven insights into predictions.
Integrating machine learning models into clinical practice
Despite promising progress in predictive modeling, the real-world adoption of machine learning models for delirium remains limited. Bridging this translational gap requires aligning model development and evaluation with clinical workflows, patient demographics, and causal structures inherent to each healthcare setting. Models trained on balanced or demographically matched cohorts may perform inconsistently when applied to local populations differing in age, sex, or comorbidity distributions. These discrepancies underscore the importance of site-specific calibration and causal inference–guided validation to ensure fairness, reliability, and interpretability. Incorporating causal discovery and counterfactual modeling frameworks can further elucidate whether identified risk factors influence delirium outcomes through genuine causal pathways or merely reflect confounded associations.
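As one simple illustration of site-specific calibration, the sketch below performs an intercept-only recalibration (sometimes called calibration-in-the-large): it finds a single shift on the logit scale so that the mean predicted risk matches the delirium rate observed at the local site. This is only one of several recalibration strategies and is offered as an assumption-laden example; the function names are illustrative, and it does not address the causal questions raised above.

```python
import math


def _logit(p):
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # clip to avoid infinite logits
    return math.log(p / (1.0 - p))


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_intercept_shift(probs, outcomes, tol=1e-9):
    """Find the logit-scale shift that makes mean predicted risk equal the
    locally observed event rate, using bisection (the mean is monotone in the shift)."""
    target = sum(outcomes) / len(outcomes)
    if not 0.0 < target < 1.0:
        raise ValueError("local sample must contain both events and non-events")
    logits = [_logit(p) for p in probs]
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_pred = sum(_sigmoid(z + mid) for z in logits) / len(logits)
        if mean_pred < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def recalibrate(probs, shift):
    """Apply the fitted shift; the ranking of patients is unchanged."""
    return [_sigmoid(_logit(p) + shift) for p in probs]
```

Because the shift is applied uniformly on the logit scale, discrimination (for example, AUC-ROC) is unaffected; only the absolute risk levels move toward the local prevalence.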
Capturing the temporal dynamics of patient physiology represents another critical step toward clinical translation. Many current models rely on static EHR snapshots, overlooking time-dependent fluctuations in vital signs and laboratory parameters that often precede delirium onset. Researchers leveraging longitudinal EHR data, particularly vital signs, medication trajectories, and laboratory results, could encode these as temporal sequences, benefiting from architectures such as attention-based temporal encoders, Transformer-based time-series models, or bidirectional LSTMs (BiLSTMs) to model patient trajectories over time. However, standard transformer architectures are constrained by quadratic computational complexity and fixed context windows, which limit scalability for long and irregular clinical sequences [133]. Emerging sequence-aware models, including state-space architectures (e.g., Mamba) [134], offer a computationally efficient alternative for modeling long-range dependencies. Recent benchmarking studies in medical imaging, including work on ensemble strategies, highlight the potential of such architectures in complex healthcare data [135, 136]. While largely explored in image-based applications, the underlying principle of leveraging model diversity may also be relevant for delirium prediction, particularly when integrating static tabular features with temporal and unstructured EHR representations. These approaches have not yet been systematically evaluated on longitudinal EHR data for delirium prediction. Addressing this gap may support more robust risk stratification and earlier, more actionable risk identification.
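Before any sequence model can be applied, irregularly sampled measurements have to be turned into an ordered input. The sketch below shows one common preprocessing choice, stated here as an assumption rather than a recommendation from the reviewed studies: hourly binning with forward-fill plus an explicit observation mask. The function name `to_hourly_sequence` and the input layout are hypothetical.

```python
from datetime import datetime, timedelta


def to_hourly_sequence(measurements, start, n_hours):
    """Convert irregular (timestamp, value) pairs into a fixed-length hourly sequence.

    Returns (values, observed): `values` holds the last measurement in each
    hourly bin, forward-filled across empty bins (None before the first
    observation); `observed` marks which bins contained a real measurement.
    """
    bins = [None] * n_hours
    for ts, value in sorted(measurements):
        offset = int((ts - start).total_seconds() // 3600)
        if 0 <= offset < n_hours:
            bins[offset] = value  # later measurements in the same bin overwrite earlier ones
    values, observed, last = [], [], None
    for b in bins:
        if b is not None:
            last = b
        values.append(last)
        observed.append(b is not None)
    return values, observed
```

Keeping the mask alongside the filled values lets a downstream model distinguish a stable measurement from a missing one, which matters because measurement frequency itself often reflects clinical concern.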
Recent implementation-adjacent studies illustrate both the promise and challenges of translating delirium prediction models into clinical workflows [137]. Prospective evaluations using real-time EHR data, with predictions generated at admission and recalculated as new clinical information becomes available, highlight the importance of temporally aligned feature construction and careful outcome ascertainment to mitigate data leakage and under-detection. These studies also underscore persistent real-world limitations, including incomplete capture of environmental and precipitating risk factors, variability in screening fidelity, and difficulty identifying transient or medication-induced delirium episodes.
Finally, integrating model predictions within rule-based clinical decision pathways anchored to patient priors and contextual variables may help mitigate systemic bias and support real-time decision-making. Embedding model outputs within EHR systems through clinician-facing dashboards, augmented by interpretable XAI frameworks, can facilitate trust and adoption. Future work should therefore prioritize causal calibration, temporal modeling, and explainable deployment pipelines to advance the safe, equitable, and clinically meaningful integration of AI into delirium care.
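One widely used family of XAI methods is Shapley-value attribution, which underlies SHAP. The sketch below computes exact Shapley values for a single prediction by enumerating feature subsets. It is not the SHAP library, which relies on faster approximations; the enumeration here is exponential in the number of features and is only practical for toy models. The function name and the baseline-replacement convention for "absent" features are illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley attribution for one prediction.

    `model` maps a feature dict to a risk score. Features outside the current
    coalition are set to their `baseline` value (an assumed reference patient).
    """
    features = list(instance)
    n = len(features)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in features}
        return model(x)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi
```

The attributions sum to the difference between the prediction for the patient and the prediction for the baseline, which is what allows a dashboard to present each feature's contribution as a share of the patient's excess risk.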
Best practices for real-world generalizability
Despite promising predictive performance reported across studies, several recurring methodological factors may limit real-world generalizability of delirium prediction models. Data leakage remains a key concern, particularly when predictors are derived from information collected after delirium onset or when clinical notes contain explicit references to delirium, assessment tools, or diagnostic terminology. Such leakage can inflate apparent model performance and undermine clinical validity at deployment. Validation strategies also vary widely. Reliance on random train–test splits may overestimate performance in longitudinal or temporally evolving clinical data, whereas time-based or site-based splits are more aligned with real-world implementation but are less commonly reported. In addition, clinical utility is infrequently assessed. Approaches such as decision-curve analysis or net benefit evaluation can contextualize model predictions relative to clinical thresholds and workflows, yet they remain uncommon in the literature.
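Two of the practices named above can be sketched briefly. The first function performs a time-based split on a hypothetical `admit_time` field; the second computes net benefit at a single risk threshold using the standard decision-curve formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The record layout, the example cutoff, and the function names are assumptions for illustration.

```python
from datetime import date

# Hypothetical cutoff: train on admissions before this date, test on the rest.
EXAMPLE_CUTOFF = date(2023, 1, 1)


def time_based_split(records, cutoff=EXAMPLE_CUTOFF):
    """Split so that every test record is admitted on or after the cutoff."""
    train = [r for r in records if r["admit_time"] < cutoff]
    test = [r for r in records if r["admit_time"] >= cutoff]
    return train, test


def net_benefit(labels, probs, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: intervene on every patient regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))
```

A decision curve is obtained by evaluating `net_benefit` over a range of clinically plausible thresholds and comparing it with the treat-all strategy and with treating no one, whose net benefit is zero.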
Conclusion
This scoping review highlights both the progress and persistent limitations in delirium risk prediction modeling. While machine learning approaches leveraging structured EHR data have shown promising performance, the underutilization of unstructured clinical text and longitudinal information remains a key barrier to advancing predictive performance, broader generalizability, and clinical translation. Future efforts should focus on integrating multimodal data streams through advanced transformer-based and temporal modeling architectures, paired with explainable and causally robust validation frameworks. Such approaches have the potential to facilitate the transition of delirium prediction models from experimental settings to reliable, real-world clinical tools that support early intervention and improved patient outcomes.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Abbreviations
- ADL
Activities of Daily Living
- AQT
A Quick Test of cognitive Speed
- APACHE
Acute Physiology and Chronic Health Evaluation
- ASA
American Society of Anesthesiologists
- BP
Blood Pressure
- CAD
Coronary Artery Disease
- CCI
Charlson Comorbidity Index
- COPD
Chronic Obstructive Pulmonary Disease
- CVD
Cardiovascular diseases
- DT
Decision Trees
- EEG
Electroencephalogram
- EHR
Electronic Health Record
- GCS
Glasgow Coma Scale
- HR
Heart Rate
- ICU
Intensive Care Unit
- LOS
Length of Stay
- MMSE
Mini Mental State Examination
- MoCA
Montreal Cognitive Assessment
- MV
Mechanical Ventilation
- PRISMA-ScR
PRISMA Extension for Scoping Reviews
- RR
Respiratory Rate
- SOFA
Sequential Organ Failure Assessment
- SPO2
Saturation of Peripheral Oxygen
Author contributions
Lena Ara: Writing, review & editing, Methodology, Investigation, Formal analysis. Zina Ben Miled: Conceptualization, Writing, review & editing, Methodology, Investigation, Formal analysis. Malaz Boustani: Conceptualization, Review & editing, Methodology, Formal analysis, Validation. Sanjay Mohanty: Conceptualization, Writing, review & editing, Methodology, Investigation, Formal analysis, Validation, Funding acquisition.
Funding
This research was supported by the National Institute on Aging, K23AG071945.
Data availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Dahiwade D, Patle G, Meshram E. Designing disease prediction model using machine learning approach. 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC). 2019. p. 1211–15.
- 2. Collins GS, Moons KGM. Reporting of artificial intelligence prediction models. The Lancet. 2019;393(10181):1577–79. doi:10.1016/S0140-6736(19)30037-6.
- 3. Uddin S, Khan A, Hossain ME, Moni MA. Comparing different supervised machine learning algorithms for disease prediction. BMC Med Inf Decis Mak. 2019;19(1):281. doi:10.1186/s12911-019-1004-8.
- 4. Javeed A, Dallora AL, Berglund JS, Ali A, Ali L, Anderberg P. Machine learning for dementia prediction: a systematic review and future research directions. J Med Syst. 2023;47(1):17. doi:10.1007/s10916-023-01906-7.
- 5. Kavitha C, Mani V, Srividhya SR, Khalaf OI, Tavera Romero CA. Early-stage Alzheimer’s disease prediction using machine learning models. Front Public Health. 2022;10:853294. doi:10.3389/fpubh.2022.853294.
- 6. Ruppert MM, Lipori J, Patel S, Ingersent E, Cupka J, Ozrazgat-Baslanti T, et al. ICU delirium-prediction models: a systematic review. Crit Care Explor. 2020;2(12):e0296. doi:10.1097/CCE.0000000000000296.
- 7. Fong TG, Inouye SK. The inter-relationship between delirium and dementia: the importance of delirium prevention. Nat Rev Neurol. 2022;18(10):579–96. doi:10.1038/s41582-022-00698-7.
- 8. Mohanty S, Gillio A, Lindroth H, Ortiz D, Holler E, Azar J, et al. Major surgery and long term cognitive outcomes: the effect of postoperative delirium on dementia in the year following discharge. J Surg Res. 2022;270:327–34. doi:10.1016/j.jss.2021.08.043.
- 9. Gleason LJ, Schmitt EM, Kosar CM, Tabloski P, Saczynski JS, Robinson T, et al. Effect of delirium and other major complications on outcomes after elective surgery in older adults. JAMA Surg. 2015;150(12):1134–40. doi:10.1001/jamasurg.2015.2606.
- 10. Mahanna-Gabrielli E, Schenning KJ, Eriksson LI, Browndyke JN, Wright CB, Culley DJ, et al. State of the clinical science of perioperative brain health: report from the American Society of Anesthesiologists brain health initiative summit 2018 [published correction appears in Br J Anaesth. 2019 Dec;123(6):917]. doi:10.1016/j.bja.2019.07.004.
- 11. Boone MD, Sites B, von Recklinghausen FM, Mueller A, Taenzer AH, Shaefi S. Economic burden of postoperative neurocognitive disorders among US Medicare patients. JAMA Netw Open. 2020;3(7):e208931. doi:10.1001/jamanetworkopen.2020.8931.
- 12. Lindroth H, Bratzke L, Purvis S, Brown R, Coburn M, Mrkobrada M, et al. Systematic review of prediction models for delirium in the older adult inpatient. BMJ Open. 2018;8(4):e019223. doi:10.1136/bmjopen-2017-019223.
- 13. Inouye SK, van Dyck CH, Alessi CA, Balkin S, Siegal AP, Horwitz RI. Clarifying confusion: the confusion assessment method. A new method for detection of delirium. Ann Intern Med. 1990;113(12):941–48. doi:10.7326/0003-4819-113-12-941.
- 14. Goldberg TE, Chen C, Wang Y, Jung E, Swanson A, Ing C, et al. Association of delirium with long-term cognitive decline: a meta-analysis [published correction appears in JAMA Neurol. 2020 Nov 1;77(11):1452. doi:10.1001/jamaneurol.2020.3284]. JAMA Neurol. 2020;77(11):1373–81. doi:10.1001/jamaneurol.2020.2273.
- 15. Inouye SK, Marcantonio ER, Kosar CM, Tommet D, Schmitt EM, Travison TG, et al. The short-term and long-term relationship between delirium and cognitive trajectory in older surgical patients. Alzheimers Dement. 2016;12(7):766–75. doi:10.1016/j.jalz.2016.03.005.
- 16. Dasgupta M, Dumbrell AC. Preoperative risk assessment for delirium after noncardiac surgery: a systematic review. J Am Geriatr Soc. 2006;54(10):1578–89. doi:10.1111/j.1532-5415.2006.00893.x.
- 17. Milstein A, Pollack A, Kleinman G, Barak Y. Confusion/delirium following cataract surgery: an incidence study of 1-year duration. Int Psychogeriatr. 2002;14(3):301–06. doi:10.1017/s1041610202008499.
- 18. Marcantonio E, Ta T, Duthie E, Resnick NM. Delirium severity and psychomotor types: their relationship with outcomes after hip fracture repair. J Am Geriatr Soc. 2002;50(5):850–57. doi:10.1046/j.1532-5415.2002.50210.x.
- 19. Fong TG, Tulebaev SR, Inouye SK. Delirium in elderly adults: diagnosis, prevention and treatment. Nat Rev Neurol. 2009;5(4):210–20. doi:10.1038/nrneurol.2009.24.
- 20. Qureshi O, Arthur ME. Recent advances in predicting, preventing, and managing postoperative delirium. Fac Rev. 2023;12:19. doi:10.12703/r/12-19.
- 21. Lange PW, Lamanna M, Watson R, Maier AB. Undiagnosed delirium is frequent and difficult to predict: results from a prevalence survey of a tertiary hospital. J Clin Nurs. 2019;28(13–14):2537–42. doi:10.1111/jocn.14833.
- 22. Lee A, Mu JL, Joynt GM, Chiu CH, Lai VKW, Gin T, et al. Risk prediction models for delirium in the intensive care unit after cardiac surgery: a systematic review and independent external validation. Br J Anaesth. 2017;118(3):391–99. doi:10.1093/bja/aew476.
- 23. Xie Q, Wang X, Pei J, Wu Y, Guo Q, Su Y, et al. Machine learning-based prediction models for delirium: a systematic review and meta-analysis. J Am Med Dir Assoc. 2022;23(10):1655–68.e6. doi:10.1016/j.jamda.2022.06.020.
- 24. Strating T, Shafiee Hanjani L, Tornvall I, Hubbard R, Scott I. Navigating the machine learning pipeline: a scoping review of inpatient delirium prediction models. BMJ Health Care Inf. 2023 Jul;30. doi:10.1136/bmjhci-2023-100767.
- 25. Hendrix JM, Garmon EH. American Society of Anesthesiologists physical status classification system. In: StatPearls. Treasure Island, FL: StatPearls Publishing; 2025. Updated February 11, 2025. https://www.ncbi.nlm.nih.gov/books/NBK441940/.
- 26. Heinrich M, Woike JK, Spies CD, Wegwarth O. Forecasting postoperative delirium in older adult patients with fast-and-frugal decision trees. J Clin Med. 2022;11(19):5629. doi:10.3390/jcm11195629.
- 27. Wagner DP, Draper EA. Acute Physiology and Chronic Health Evaluation (APACHE II) and Medicare reimbursement. Health Care Financing Rev. 1984;Suppl(Suppl):91–105.
- 28. Bieniek J, Wilczyński K, Szewieczek J. Fried frailty phenotype assessment components as applied to geriatric inpatients. Clin Interv Aging. 2016;11:453–59. doi:10.2147/CIA.S101369.
- 29. Cheung A, Haas B, Ringer TJ, McFarlan A, Wong CL. Canadian Study of Health and Aging Clinical Frailty Scale: does it predict adverse outcomes among geriatric trauma patients? J Am Coll Surg. 2017;225(5):658–65.e3. doi:10.1016/j.jamcollsurg.2017.08.008.
- 30. Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169(7):467–73. doi:10.7326/M18-0850.
- 31. Harzing AW. Publish or Perish. 2007. Computer software. https://harzing.com/resources/publish-or-perish.
- 32. Lee DY, Oh AR, Park J, Lee SH, Choi B, Yang K, et al. Machine learning-based prediction model for postoperative delirium in noncardiac surgery. BMC Psychiatry. 2023;23(1):317. doi:10.1186/s12888-023-04768-y.
- 33. Nowakowska K, Sakellarios A, Kaźmierski J, Fotiadis DI, Pezoulas VC. AI-enhanced predictive modeling for identifying depression and delirium in cardiovascular patients scheduled for cardiac surgery. Diagn (Basel). 2023;14(1):67. doi:10.3390/diagnostics14010067.
- 34. Liu Y, Shen W, Tian Z. Using machine learning algorithms to predict high-risk factors for postoperative delirium in elderly patients. Clin Interv Aging. 2023;18:157–68. doi:10.2147/CIA.S398314.
- 35. Zhang Y, Hu J, Hua T, Zhang J, Zhang Z, Yang M. Development of a machine learning-based prediction model for sepsis-associated delirium in the intensive care unit. Sci Rep. 2023;13:12697. doi:10.1038/s41598-023-38650-4.
- 36. Nagata C, Hata M, Miyazaki Y, Masuda H, Wada T, Kimura T, et al. Development of postoperative delirium prediction models in patients undergoing cardiovascular surgery using machine learning algorithms [published correction appears in Sci Rep. 2024 Feb 22;14(1):4396. doi:10.1038/s41598-024-51975-y]. Sci Rep. 2023;13(1):21090. doi:10.1038/s41598-023-48418-5.
- 37. Song YX, Yang XD, Luo YG, Ouyang CL, Yu Y, Ma YL, et al. Comparison of logistic regression and machine learning methods for predicting postoperative delirium in elderly patients: a retrospective study. CNS Neurosci Ther. 2023;29(1):158–67. doi:10.1111/cns.13991.
- 38. Bramley P, McArthur K, Blayney A, McCullagh I. Risk factors for postoperative delirium: an umbrella review of systematic reviews. Int J Surg. 2021 Sep;93:106063. doi:10.1016/j.ijsu.2021.106063.
- 39. Benovic S, Ajlani AH, Leinert C, Fotteler M, Wolf D, Steger F, et al. Introducing a machine learning algorithm for delirium prediction-the supporting SURgery with GEriatric Co-Management and AI project (SURGE-Ahead). Age Ageing. 2024;53(5). doi:10.1093/ageing/afae101.
- 40. Gong KD, Lu R, Bergamaschi TS, Sanyal A, Guo J, Kim HB, et al. Predicting intensive care delirium with machine learning: model development and external validation. Anesthesiology. 2023;138(3):299–311. doi:10.1097/ALN.0000000000004478.
- 41. Schulthess-Lisibach AE, Gallucci G, Benelli V, Kälin R, Schulthess S, Cattaneo M, et al. Predicting delirium in older non-intensive care unit inpatients: development and validation of the DELIrium risK tool (DELIKT). Int J Clin Pharm. 2023;45(5):1118–27. doi:10.1007/s11096-023-01566-0.
- 42. Kim JH, Hua M, Whittington RA, Lee J, Liu C, Ta CN, et al. A machine learning approach to identifying delirium from electronic health records. JAMIA Open. 2022;5(2):ooac042. doi:10.1093/jamiaopen/ooac042.
- 43. Castro VM, Sacks CA, Perlis RH, McCoy TH. Development and external validation of a delirium prediction model for hospitalized patients with coronavirus disease 2019. J Acad Consult Liaison Psychiatry. 2021;62(3):298–308. doi:10.1016/j.jaclp.2020.12.005.
- 44. Oosterhoff JHF, Karhade AV, Oberai T, Franco-Garcia E, Doornberg JN, Schwab JH. Prediction of postoperative delirium in geriatric hip fracture patients: a clinical prediction model using machine learning algorithms. Geriatr Orthop Surg Rehabil. 2021;12:21514593211062277. doi:10.1177/21514593211062277.
- 45. Kurisu K, Inada S, Maeda I, Ogawa A, Iwase S, Akechi T, et al. A decision tree prediction model for a short-term outcome of delirium in patients with advanced cancer receiving pharmacological interventions: a secondary analysis of a multicenter and prospective observational study (phase-R). Palliat Support Care. 2022;20(2):153–58. doi:10.1017/S1478951521001565.
- 46. Zhao H, You J, Peng Y, Feng Y. Machine learning algorithm using electronic chart-derived data to predict delirium after elderly hip fracture surgeries: a retrospective case-control study. Front Surg. 2021;8:634629. doi:10.3389/fsurg.2021.634629.
- 47. Xue B, Li D, Lu C, King CR, Wildes T, Avidan MS, et al. Use of machine learning to develop and evaluate models using preoperative and intraoperative data to identify risks of postoperative complications. JAMA Netw Open. 2021;4(3):e212240. doi:10.1001/jamanetworkopen.2021.2240.
- 48. Jauk S, Kramer D, Großauer B, Rienmüller S, Avian A, Berghold A, et al. Risk prediction of delirium in hospitalized patients using machine learning: an implementation and prospective evaluation study. J Am Med Inf Assoc. 2020;27(9):1383–92. doi:10.1093/jamia/ocaa113.
- 49. Hur S, Ko RE, Yoo J, Ha J, Cha WC, Chung CR. A machine learning-based algorithm for the prediction of intensive care unit delirium (PRIDE): retrospective study. JMIR Med Inf. 2021;9(7):e23401. doi:10.2196/23401.
- 50. Pagali SR, Miller D, Fischer K, Schroeder D, Egger N, Manning DM, et al. Predicting delirium risk using an automated Mayo delirium prediction tool: development and validation of a risk-stratification model. Mayo Clin Proc. 2021;96(5):1229–35. doi:10.1016/j.mayocp.2020.08.049.
- 51. Mueller B, Street WN, Carnahan RM, Lee S. Evaluating the performance of machine learning methods for risk estimation of delirium in patients hospitalized from the emergency department. Acta Psychiatr Scand. 2023;147(5):493–505. doi:10.1111/acps.13551.
- 52. Hercus C, Hudaib AR. Delirium misdiagnosis risk in psychiatry: a machine learning-logistic regression predictive algorithm. BMC Health Serv Res. 2020;20(1):151. doi:10.1186/s12913-020-5005-1.
- 53. Haight TN, Marsh EB. Identifying delirium early after stroke: a new prediction tool for the intensive care unit. J Stroke Cerebrovasc Dis. 2020;29(11):105219. doi:10.1016/j.jstrokecerebrovasdis.2020.105219.
- 54. Bowman K, Jones L, Masoli J, Mujica-Mota R, Strain D, Butchart J, et al. Predicting incident delirium diagnoses using data from primary-care electronic health records. Age Ageing. 2020;49(3):374–81. doi:10.1093/ageing/afaa006.
- 55. Kim YJ, Lee H, Woo HG, Lee SW, Hong M, Jung EH, et al. Machine learning-based model to predict delirium in patients with advanced cancer treated with palliative care: a multicenter, patient-based registry cohort. Sci Rep. 2024;14(1):11503. doi:10.1038/s41598-024-61627-w.
- 56. Bishara A, Chiu C, Whitlock EL, Douglas VC, Lee S, Butte AJ, et al. Postoperative delirium prediction using machine learning models and preoperative electronic health record data. BMC Anesthesiol. 2022;22(1):8. doi:10.1186/s12871-021-01543-y.
- 57. Jung JW, Hwang S, Ko S, Jo C, Park HY, Han HS, et al. A machine-learning model to predict postoperative delirium following knee arthroplasty using electronic health records. BMC Psychiatry. 2022;22(1):436. doi:10.1186/s12888-022-04067-y.
- 58. Spiller TR, Tufan E, Petry H, Böttger S, Fuchs S, Duek O, et al. Delirium screening in an acute care setting with a machine learning classifier based on routinely collected nursing data: a model development study. J Psychiatr Res. 2022;156:194–99. doi:10.1016/j.jpsychires.2022.10.018.
- 59. de la Varga-Martínez O, Gómez-Pesquera E, Muñoz-Moreno MF, Marcos-Vidal JM, López-Gómez A, Rodenas-Gómez F, et al. Development and validation of a delirium risk prediction preoperative model for cardiac surgery patients (DELIPRECAS): an observational multicentre study. J Clin Anesth. 2021;69:110158. doi:10.1016/j.jclinane.2020.110158.
- 60. Wang Y, Lei L, Ji M, Tong J, Zhou CM, Yang JJ. Predicting postoperative delirium after microvascular decompression surgery with machine learning. J Clin Anesth. 2020;66:109896. doi:10.1016/j.jclinane.2020.109896.
- 61. Li X, Wang G, He Y, Wang Z, Zhang M. White-cell derived inflammatory biomarkers in prediction of postoperative delirium in elderly patients undergoing surgery for lower limb fracture under non-general anaesthesia. Clin Interv Aging. 2022;17:383–92. doi:10.2147/CIA.S346954.
- 62.Kurlowicz L, Mini Mental WM, State E. (MMSE); 1999. Issue 3, January 1999, comprehensive geriatric assessment toolkit. Internet. https://cgatoolkit.ca/Uploads/ContentDocuments/MMSE.pdf.
- 63.Tang D, Ma C, Xu Y. Interpretable machine learning model for early prediction of delirium in elderly patients following intensive care unit admission: a derivation and validation study. Front Med (Lausanne). 2024;11:1399848. 10.3389/fmed.2024.1399848. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Hata M, Miyazaki Y, Nagata C, Masuda H, Wada T, Takahashi S, et al. Predicting postoperative delirium after cardiovascular surgeries from preoperative portable electroencephalography oscillations. Front Psychiatry. 2023;14:1287607. 10.3389/fpsyt.2023.1287607. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Zhang Y, Wan DH, Chen M, Li YL, Ying H, Yao GL, et al. Automated machine learning-based model for the prediction of delirium in patients after surgery for degenerative spinal disease. CNS Neurosci Ther. 2023;29(1):282–95. 10.1111/cns.14002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Hu XY, Liu H, Zhao X, Sun X, Zhou J, Gao X, et al. Automated machine learning-based model predicts postoperative delirium using readily extractable perioperative collected electronic data. CNS Neurosci Ther. 2022;28(4):608–18. 10.1111/cns.13758. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Racine AM, Tommet D, D’Aquila ML, Fong TG, Gou Y, Tabloski PA, et al. Machine learning to Develop and internally Validate a predictive model for post-operative delirium in a prospective, observational clinical cohort study of older surgical patients. J Gen Intern Med. 2021;36(2):265–73. 10.1007/s11606-020-06238-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Mulkey MA, Huang H, Albanese T, Kim S, Yang B. Supervised deep learning with vision transformer predicts delirium using limited lead EEG. Sci Rep. 2023;13(1):7890. 10.1038/s41598-023-35004-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Wang J, Ji Y, Wang N, Chen W, Bao Y, Qin Q, et al. Establishment and validation of a delirium prediction model for neurosurgery patients in intensive care. Int J Nurs Prac. 2020;26(4):e12818. 10.1111/ijn.12818. [DOI] [PubMed] [Google Scholar]
- 70.Matsumoto K, Nohara Y, Sakaguchi M, Takayama Y, Fukushige S, Soejima H, et al. Temporal generalizability of machine learning models for Predicting postoperative delirium using electronic health record data: Model development and validation study. JMIR Perioperative Med. 2023;6:e50895. 10.2196/50895. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Liu S, Schlesinger JJ, McCoy AB, Reese TJ, Steitz B, Russo E, et al. New onset delirium prediction using machine learning and long shortterm memory (LSTM) in electronic health record. J Am Med Inf Assoc. 2022;30(1):120–31. 10.1093/jamia/ocac210. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Ro¨hr V, Blankertz B, Radtke FM, Spies C, Koch S. Machine-learning model predicting postoperative delirium in older patients using intraoperative frontal electroencephalographic signatures. Front Aging Neurosci. 2022;14:911088. 10.3389/fnagi.2022.911088. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Wang L, Zhang Y, Chignell M, Shan B, Sheehan KA, Razak F, et al. Boosting delirium identification accuracy with sentiment-based Natural language processing: mixed methods study. JMIR Med Inf. 2022;10(12):e38161. 10.2196/38161. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Yang T, Yang H, Liu Y, Liu X, Ding YJ, Li R, et al. Postoperative delirium prediction after cardiac surgery using machine learning models. Comput Bio Med. 2024;169:107818. 10.1016/j.compbiomed.2023.107818. [DOI] [PubMed] [Google Scholar]
- 75.Lei L, Zhang S, Yang L, Yang C, Liu Z, Xu H, et al. Machine learningbased prediction of delirium 24 h after pediatric intensive care unit admission in critically ill children: a prospective cohort study. Int J Nurs Stud. 2023;146:104565. 10.1016/j.ijnurstu.2023.104565. [DOI] [PubMed] [Google Scholar]
- 76.Menzenbach J, Kirfel A, Guttenthaler V, Feggeler J, Hilbert T, Ricchiuto A, et al. Pre-operative prediction of postoperative DElirium by appropriate SCreening (PROPDESC) development and validation of a pragmatic POD risk screening score based on routine preoperative data. J Clin Anesth. 2022;78:110684. 10.1016/j.jclinane.2022.110684. [DOI] [PubMed] [Google Scholar]
- 77.Segern¨as A, Skoog J, Ahlgren AE, Almerud OS, ¨ Thulesius H, Zachrisson H. Prediction of postoperative delirium after cardiac surgery with a Quick test of cognitive Speed, Mini-Mental State Examination and hospital anxiety and Depression Scale. Clin Interv Aging. 2022;17:359–68. 10.2147/CIA.S350195. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Xu Y, Meng Y, Qian X, Wu H, Liu Y, Ji P, et al. Prediction model for delirium in patients with cardiovascular surgery: development and validation. J Cardiothorac Surg. 2022;17(1):247. 10.1186/s13019-022-02005-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Li Q, Zhao Y, Chen Y, Yue J, Xiong Y. Developing a machine learning model to identify delirium risk in geriatric internal medicine inpatients. Eur Geriatr Med. 2022;13(1):173–83. 10.1007/s41999-021-00562-9. [DOI] [PubMed] [Google Scholar]
- 80.Ahmed A, Garcia-Agundez A, Petrovic I, Radaei F, Fife J, Zhou J, et al. Delirium detection using wearable sensors and machine learning in patients with intracerebral hemorrhage. Front Neurol. 2023;14:1135472. 10.3389/fneur.2023.1135472. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Xue X, Chen W, Chen X. A novel radiomics-based machine learning framework for prediction of acute kidney Injury-related delirium in patients who underwent cardiovascular surgery. Comput Math Methods Med. 2022, 2022;4242069. 10.1155/2022/4242069. [DOI] [PMC free article] [PubMed]
- 82.Song Y, Zhang D, Wang Q, Liu Y, Chen K, Sun J, et al. Prediction models for postoperative delirium in elderly patients with machinelearning algorithms and SHapley Additive exPlanations. Transl Psychiatry. 2024;14(1):57. 10.1038/s41398-024-02762-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Sheikhalishahi S, Bhattacharyya A, Celi LA, Osmani V. An interpretable deep learning model for time-series electronic health records: case study of delirium prediction in critical care. Artif Intell Med. 2023;144:102659. 10.1016/j.artmed.2023.102659. [DOI] [PubMed] [Google Scholar]
- 84.Bhattacharyya A, Sheikhalishahi S, Torbic H, Yeung W, Wang T, Birst J, et al. Delirium prediction in the ICU: designing a screening tool for preventive interventions. JAMIA Open. 2022;5(2):ooac048. 10.1093/jamiaopen/ooac048. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30.
- 86.Ribeiro MT, Singh S, Guestrin C.”why should I trust You?”: explaining the predictions of any classifier. 2016. https://arxivorg/abs/1602.04938. [Google Scholar]
- 87.Binder A, Montavon G, Bach S, Mu¨ller K, Samek W. Layer-wise Relevance Propagation for Neural networks with Local renormalization layers. CoRr. 2016;abs/1604.00825. http://arxiv.org/abs/1604.00825.
- 88.Jain S, Margetis K, Iverson LM. StatPearls [Internet]. Glasgow Coma Scale. Treasure Island (FL): StatPearls Publishing; 2025. [Updated 2025 Jun 23; Cited 2025]. doi: https://www.ncbi.nlm.nih.gov/books/NBK513298/.
- 89.Scale MKRA-S. 2020. https://www.mdcalc.com/richmond-agitation-sedation-scale-rass. Accessed July 5, 2020.
- 90.Tripathi S, Fritz BA, Avidan MS, Chen Y, King CR. Algorithmic bias in machine learning based delirium prediction. arXiv. Preprint posted online November 8, 2022. arXiv: 221104442.
- 91.Ormseth CH, LaHue SC, Oldham MA, Josephson SA, Whitaker E, Douglas VC. Predisposing and precipitating factors associated with delirium: a systematic review. JAMA Network Open. 2023, 01;6(1):e2249950–0. 10.1001/jamanetworkopen.2022.49950. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Wilson JE, Mart MF, Cunningham C, Shehabi Y, Girard TD, MacLullich AMJ, et al. Delirium [Published correction appears in Nat Rev Dis Primers.Published. 2020 Nov 12;6(1):90. 10.1038/s41572-020-00223-4 [DOI] [PMC free article] [PubMed]
- 93.Gaudreau JD, Gagnon P, Harel F, Tremblay A, Roy MA. Fast, systematic, and continuous delirium assessment in hospitalized patients: the nursing delirium screening scale. J Pain Symptom Manage. 2005;29(4):368–75. 10.1016/j.jpainsymman.2004.07.009. [DOI] [PubMed] [Google Scholar]
- 94.Bergeron N, Dubois MJ, Dumont M, Dial S, Skrobik Y. Intensive care delirium screening Checklist: evaluation of a new screening tool. Intensive Care Med. 2001;27(5):859–64. 10.1007/s001340100909. [DOI] [PubMed] [Google Scholar]
- 95.Traube C, Silver G, Kearney J, Patel A, Atkinson TM, Yoon MJ, et al. Cornell assessment of pediatric delirium: a valid, rapid, observational tool for screening delirium in the PICU*. Crit Care Med. 2014;42(3):656–63. 10.1097/CCM.0b013e3182a66b76. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96.Association ED. American delirium Society. The DSM-5 criteria, level of arousal and delirium diagnosis: inclusiveness is safer. BMC Med. 2014;12:141. 10.1186/s12916-014-0141-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 97.Inouye SK, Leo-Summers L, Zhang Y, Bogardus STJ, Leslie DL, Agostini JV. A chart-based method for identification of delirium: validation compared with interviewer ratings using the confusion assessment method. J Am Geriatr Soc. 2005;53(2):312–18. 10.1111/j.1532-5415.2005.53120.x. [DOI] [PubMed] [Google Scholar]
- 98.Siddiqi N, House AO, Holmes JD. Occurrence and outcome of delirium in medical in-patients: a systematic literature review. Age Ageing. 2006;35(4):350–64. 10.1093/ageing/afl005. [DOI] [PubMed] [Google Scholar]
- 99.Miranda F, Gonzalez F, Plana MN, Zamora J, Quinn TJ, Seron P. Confusion assessment method for the intensive care unit (CAM-ICU) for the diagnosis of delirium in adults in critical care settings. Cochrane Database Syst Rev. 2023;11(11): CD013126. doi: 10.1002/14651858.CD013126.pub2. [DOI] [PMC free article] [PubMed]
- 100.Ge W, Alabsi H, Jain A, Ye E, Sun H, Fernandes M, et al. Identifying patients with delirium based on unstructured clinical notes: observational study. JMIR Form Res. 2022;6(6):e33834. 10.2196/33834. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101.Cherak SJ, Soo A, Brown KN, Ely EW, Stelfox HT, Fiest KM. Development and validation of delirium prediction model for critically ill adults parameterized to ICU admission acuity. PLoS One. 2020;15(8):e0237639. 10.1371/journal.pone.0237639. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102.Bao L, Liu T, Zhang Z, Pan Q, Wang L, Fan G, et al. The prediction of postoperative delirium with the preoperative bispectral index in older aged patients: a cohort study. Aging Clin Exp Res. 2023;35(7):1531–39. 10.1007/s40520-023-02408-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103.Gusmao-Flores D, Salluh JIF, Chalhub RA, Quarantini LC. The con-´ fusion assessment method for the intensive care unit (CAM-ICU) and intensive care delirium screening checklist (ICDSC) for the diagnosis of delirium: a systematic review and meta-analysis of clinical studies. Crit Care. 2012;16(4):R115. 10.1186/cc11407. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 104.Netzer M, Hackl WO, Schaller M, Alber L, Marksteiner J, Ammenwerth E. Evaluating performance and interpretability of machine learning methods for Predicting delirium in gerontopsychiatric patients. Stud Health Technol Inf. 2020;271:121–28. 10.3233/SHTI200087. [DOI] [PubMed] [Google Scholar]
- 105.Trzepacz PT, Mittal D, Torres R, Kanary K, Norton J, Jimerson N. Validation of the delirium rating Scale-revised-98: comparison with the delirium rating scale and the cognitive test for delirium [published correction appears in J Neuropsychiatry Clin Neurosci 2001 Summer; 13(3): 433. J Neuropsychiatry Clin Neurosci. 2001;13(2):229–42. 10.1176/jnp.13.2.229. [DOI] [PubMed] [Google Scholar]
- 106.Lin N, Liu K, Feng J, Chen R, Ying Y, Lv D, et al. Development and validation of a postoperative delirium prediction model for pediatric patients: a prospective, observational, single-center study. Medicine. 2021;100(20):e25894. 10.1097/MD.0000000000025894. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 107.Sikich N, Lerman J. Development and psychometric evaluation of the pediatric anesthesia emergence delirium scale. Anesthesiology. 2004;100(5):1138–45. 10.1097/00000542-200405000-00015. [DOI] [PubMed] [Google Scholar]
- 108.Bajwa SA, Costi D, Cyna AM. A comparison of emergence delirium scales following general anesthesia in children. Paediatr Anaesth. 2010;20(8):704–11. 10.1111/j.1460-9592.2010.03328.x. [DOI] [PubMed] [Google Scholar]
- 109.Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–58. [DOI] [PubMed] [Google Scholar]
- 110.Gul S, Ayturan K, Hardalac¸ F. PyCaret for Predicting type 2 diabetes: a phenotype- and gender-based approach with the “nurses’ health study” and the “health professionals’ follow-up study”. Datasets. J Personalized Med. 2024;14(8):804. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 111.Seinen TM, Fridgeirsson EA, Ioannou S, Jeannetot D, John LH, Kors JA, et al. Use of unstructured text in prognostic clinical prediction models: a systematic review. J Am Med Inf Assoc. 2022;29(7):1292302. 10.1093/jamia/ocac058. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 112.Li I, Pan J, Goldwasser J, Verma N, Wong WP, Nuzumlalı MY, et al. Neural Natural language processing for unstructured data in electronic health records: a review. Comput Sci Rev. 2022;46:100511. 10.1016/j.cosrev.2022.100511. [Google Scholar]
- 113.Chen M, Hao Y, Hwang K, Wang L, Wang L. Disease prediction by machine learning over big data from healthcare communities. IEEE Access. 2017;5:8869–79. 10.1109/ACCESS.2017.2694446. [Google Scholar]
- 114.Mahbub M, Srinivasan S, Danciu I, Peluso A, Begoli E, Tamang S, et al. Unstructured clinical notes within the 24 hours since admission predict short, mid & long-term mortality in adult ICU patients. PLoS One. 2022;17(1):e0262182. 10.1371/journal.pone.0262182. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 115.Mall PK, Singh PK, Srivastav S, Narayan V, Paprzycki M, Jaworska T, et al. A comprehensive review of deep neural networks for medical image processing: recent developments and future opportunities. Healthcare Analytics. 2023;4:100216. 10.1016/j.health.2023.100216. [Google Scholar]
- 116.Farhad M, Masud MM, Beg A, Ahmad A, Ahmed L. A review of medical diagnostic video analysis using deep learning techniques. Appl Sci. 2023;13(11). https://www.mdpi.com/2076-3417/13/11/6582.
- 117.Li M, Jiang Y, Zhang Y, Zhu H. Medical image analysis using deep learning algorithms. Front Public Health. 2023;11:1273253. 10.3389/fpubh.2023.1273253. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 118.Morandi A, Gunther ML, Vasilevskis EE, Girard TD, Hopkins RO, Jackson JC, et al. Neuroimaging in delirious intensive care unit patients: a preliminary case series report. Psychiatry (Edgmont). 2010;7(9):28–33. [PMC free article] [PubMed] [Google Scholar]
- 119.Song RJ, Guo FJ, Huang XF, Li M, Sun YY, Yu AY, et al. Brain functional magnetic resonance imaging in ICU patients who developed delirium. Front Phys. 2024;12:1391104. 10.3389/fphy.2024.1391104. [Google Scholar]
- 120.Hijazi Z, Lange P, Watson R, Maier AB. The use of cerebral imaging for investigating delirium aetiology. Eur J Intern Med. 2018;52:35–39. 10.1016/j.ejim.2018.01.024. [DOI] [PubMed] [Google Scholar]
- 121.Ser SE, Shear K, Snigurska UA, Prosperi M, Wu Y, Magoc T, et al. Clinical prediction models for hospital-induced delirium using structured and unstructured electronic health record data: protocol for a development and validation study. JMIR Res Protoc. 2023;12:e48521. 10.2196/48521. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 122.Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378. 10.1136/bmj-2023-078378. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 123.Fu S, Lopes GS, Pagali SR, Thorsteinsdottir B, LeBrasseur NK, Wen A, et al. Ascertainment of delirium status using Natural language processing from electronic health records. J Gerontol Biol Sci Med Sci. 2022;77(3):524–30. 10.1093/gerona/glaa275. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 124.Liu H, Bielinski SJ, Sohn S, Murphy S, Wagholikar KB, Jonnalagadda SR, et al. An information extraction framework for cohort identification using electronic health records. AMIA Jt Summits Transl Sci Proc. 2013, 2013;149–53. [PMC free article] [PubMed]
- 125.St Sauver J, Fu S, Sohn S, Weston S, Fan C, Olson J, et al. Identification of delirium from real-world electronic health record clinical notes. J Clin Transl Sci. 2023;7(1):e187. 10.1017/cts.2023.610. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 126.Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol. 2017;2(4):230–43. 10.1136/svn-2017-000101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 127.Kline A, Wang H, Li Y, Dennis S, Hutch M, Xu Z, et al. Multimodal machine learning in precision health: a scoping review. NPJ Digit Med. 2022;5(1):171. 10.1038/s41746-022-00712-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 128.Baltruˇsaitis T, Ahuja C, Morency LP. Multimodal machine learning: a survey and taxonomy. IEEE Trans Pattern Anal Mach Intell. 2018;41(2):423–43. 10.1109/TPAMI.2018.2798607. [DOI] [PubMed] [Google Scholar]
- 129.Team C. Chameleon: mixed-modal early-fusion foundation models. arXiv. 2025. 10.48550/arXiv.2405.09818. [Google Scholar]
- 130.Sait U, Gl KV, Shivakumar S, Kumar T, Bhaumik R, Prajapati S, et al. A deep-learning based multimodal system for covid-19 diagnosis using breathing sounds and chest X-ray images. Appl Soft Comput. 2021;109:107522. 10.1016/j.asoc.2021.107522. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 131.Kumar S, Bhagat V, Sahu P, Chaube MK, Behera AK, Guizani M, et al. A novel multimodal framework for early diagnosis and classification of COPD based on CT scan images and multivariate pulmonary respiratory diseases. Comput Methods Programs Biomed. 2024;243:107911. 10.1016/j.cmpb.2023.107911. [DOI] [PubMed] [Google Scholar]
- 132.Friedman JI, Parchure P, Cheng FY, Fu W, Cheertirala S, Timsina P, et al. Machine learning multimodal model for delirium risk stratification. JAMA Netw Open. 2025;8(5):e258874. 10.1001/jamanetworkopen.2025.8874. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 133.Tay Y, Dehghani M, Bahri D, Metzler D. Efficient Transformers: a survey. 2022. https://arxiv.org/abs/2009.06732.
- 134.Gu A, Dao T. Mamba: linear-time sequence modeling with selective State spaces. 2024. doi: https://arxiv.org/abs/2312.00752.
- 135.Yanar E, Kutan F, Ayturan K, Kutbay U, Algın O, Hardala¸c F, et al. A comparative analysis of the Mamba, transformer, and CNN architectures for multi-label chest X-Ray anomaly detection in the NIH ChestX-Ray14 dataset. Diagnostics. 2025;15(17):2215. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 136.Yanar E, Hardala¸c F, Ayturan K. CELM: an ensemble deep learning model for early cardiomegaly diagnosis in Chest Radiography. Diagnostics. 2025;15(13):1602. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 137.Jauk S, Kramer D, Sumerauer S, Veeranki SPK, Schrempf M, Puchwein P. Machine learning-based delirium prediction in surgical in-patients: a prospective validation study. JAMIA Open. 2024, 09;7(3):ooae091. 10.1093/jamiaopen/ooae091. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.



