Alzheimer's & Dementia: Translational Research & Clinical Interventions. 2022 Sep 29;8(1):e12351. doi: 10.1002/trc2.12351

Risk factors and machine learning model for predicting hospitalization outcomes in geriatric patients with dementia

Xin Wang 1, Chika F Ezeana 1, Lin Wang 1, Mamta Puppala 1, Yan‐Siang Huang 2, Yunjie He 1, Xiaohui Yu 1, Zheng Yin 1, Hong Zhao 1, Eugene C Lai 3, Stephen T C Wong 1,4
PMCID: PMC9520763  PMID: 36204350

Abstract

Introduction

Geriatric patients with dementia incur higher healthcare costs and longer hospital stays than other geriatric patients. We aimed to identify risk factors for hospitalization outcomes that could be mitigated early to improve outcomes and impact overall quality of life.

Methods

We identified risk factors, that is, demographics, hospital complications, pre‐admission, and post‐admission risk factors including medical history and comorbidities, affecting hospitalization outcomes determined by hospital stays and discharge dispositions. Over 150 clinical and demographic factors of 15,678 encounters (8407 patients) were retrieved from our institution's data warehouse. We further narrowed them down to twenty factors through feature selection engineering by using analysis of variance (ANOVA) and Glmnet. We developed an explainable machine‐learning model to predict hospitalization outcomes among geriatric patients with dementia.

Results

Our model is based on stacking ensemble learning and achieved accuracy of 95.6% and area under the curve (AUC) of 0.757. It outperformed prevalent methods of risk assessment for encounters of patients with Alzheimer's disease dementia (ADD) (4993), vascular dementia (VD) (3735), Parkinson's disease with dementia (PDD) (4173), and other unspecified dementias (OUD) (2777). Top identified hospitalization outcome risk factors, mostly from medical history, include encephalopathy, number of medical problems at admission, pressure ulcers, urinary tract infections, falls, admission source, age, race, anemia, etc., with several overlaps in multi‐dementia groups.

Discussion

Our model identified several predictive factors that can be modified or targeted by interventions so that efforts can be made to prevent recurrence or mitigate their adverse effects. Knowledge of the modifiable risk factors would help guide early interventions for patients at high risk for poor hospitalization outcome as defined by hospital stays longer than seven days, undesirable discharge disposition, or both. The interventions include starting specific protocols on modifiable risk factors like encephalopathy, falls, and infections, where non‐existent or not routine, to improve hospitalization outcomes of geriatric patients with dementia.

Highlights

  • A total of 15,678 encounters of geriatric patients with dementia were analyzed, with a final set of 20 risk factors.

  • Developed a predictive model for hospitalization outcomes for multi‐dementia types.

  • Risk factors for each type were identified including those amenable to interventions.

  • Top factors are encephalopathy, pressure ulcers, urinary tract infection (UTI), falls, and admission source.

  • With accuracy of 95.6%, our ensemble predictive model outperforms other models.

Keywords: cognitive impairment, dementia, explainable artificial intelligence, geriatric patient risk factors, hospitalization outcome prediction, machine learning, multi‐dementia modalities, multi‐view ensemble learning model

1. INTRODUCTION

A Global Health and Aging report presented by the World Health Organization (WHO) reveals that the number of people aged 65 or older will triple, from an estimated 524 million in 2010 to nearly 1.5 billion in 2050. 1 More than six million Americans are living with Alzheimer's disease dementia (ADD), the most common type of dementia. 2 , 3 This number will reach 14 million by 2060. 4

The risk of being affected by dementia, including ADD, vascular dementia (VD), and dementia associated with Parkinson's disease, etc., increases dramatically with age. Older persons with dementia incur higher healthcare costs and are hospitalized more often than other geriatric patients. 3 , 5 , 6 When they are admitted to the hospital, they stay longer and their medical costs are significantly higher than those of other seniors without dementia admitted for the same medical conditions. 6 While hospitalization may be unavoidable due to reasons such as trauma, infection, stroke, surgery, or exacerbation of systemic illnesses, it is important to understand the determining risk factors of hospitalization outcomes in order to decrease hospital stays, lessen the patient's ordeal, and minimize unfavorable dispositions.

Diverse computational approaches, including machine learning, a subfield of artificial intelligence (AI), have been applied to various problems associated with dementia conditions. For example, such approaches have recently been employed in predicting progression from mild cognitive impairment to ADD, predicting progression and cognitive impairment in ADD, prioritizing drug targets, and repositioning old drugs for ADD. 7 , 8 , 9 While machine learning approaches have also been used in estimating incidence, diagnosis, and classification of dementias, 10 , 11 they have not been applied in predicting hospitalization outcomes of geriatric patients with dementias. It is time for machine learning to play a role in solving this important issue. 12 , 13 , 14

In this study, we developed a stacking ensemble learning model for predicting hospitalization outcomes of geriatric patients with dementia on the first or second day of hospital admission. Having such an early assessment of hospitalization outcome will allow timely interventions, especially when and where they are not routine, and better care coordination to improve hospitalization outcomes, enhance quality of life, and reduce hospital readmission risk.

2. METHODS

2.1. Data description, data preprocessing, and modeling

Clinical records of patients aged 65 years and above with a diagnosis of dementia at Houston Methodist's nine‐hospital system over ten years (January 2010 to December 2019), with over 150 risk factors relating to hospitalization outcomes, including demographics, hospital complications, pre‐admission, and post‐admission risk factors such as past medical history and comorbidities, were derived from the Houston Methodist clinical data warehouse. 15 Diagnoses and applicable risk factors were extracted using ICD codes, and the quality of the data was extensively validated manually prior to the study. Dementia diagnoses include ADD, VD, Parkinson's disease with dementia (PDD), and others such as frontotemporal dementia, dementia with Lewy body, and Huntington's disease dementia, grouped into other unspecified dementias (OUD) (Supplementary Figure S1). For PDD group, we identified patients with Parkinson's disease first by code G20 and then associated them with dementia diagnosis.

RESEARCH IN CONTEXT

  1. Systematic review: We searched for published machine‐learning based models predicting hospitalization outcomes in geriatric patients with dementia in PubMed, Google Scholar, and other bibliographical archives. While there are computational modeling and evidence‐based research associated with dementia conditions such as predicting incidence, diagnosis, disease progression, admission reasons, and frequencies, etc., surprisingly, none used powerful machine‐learning approaches for hospitalization outcome prediction.

  2. Interpretation: We applied machine‐learning methods to reveal risk factors, including modifiable, that is, preventable, controllable, and improvable, factors for hospitalization outcomes. Modifiable factors like encephalopathy, infections, anemia, falls, etc., were associated with unfavorable hospitalization outcomes.

  3. Future directions: Machine‐learning augmented predictive models and knowledge of modifiable risk factors from patient's medical history can help prioritize clinical resource allocation and ensure timely interventions to prevent or improve these factors, reduce length of stay, enable better discharge disposition, and improve hospitalization outcomes. Further studies including outcomes of interventional protocols will duly follow.

Hospitalization outcomes (desirable/good, average, or non‐desirable/poor) were determined based on two factors: length of stay (Supplementary Figure S2) and discharge disposition. Desirable (good) outcomes had a hospital stay of three or fewer days and a discharge disposition of home, home health, or rehabilitation. Average outcomes had a hospital stay of four to seven days with discharge dispositions of transfer to other healthcare institutions, including mental health facilities, or home health with intravenous access. Non‐desirable (poor) outcomes had hospital stays of more than seven days as well as discharge to hospice, long‐term acute care facility, nursing home, or skilled nursing facility, or death. These outcome categories were determined by the study team, which included a senior neurologist. We combined the discharge disposition group and the length‐of‐stay category into a single hospitalization outcome target, predicted this target in the four dementia sub‐groups, and derived risk classes. To handle the multi‐class classification task, we used single‐class classification: the classes were dichotomized into low‐risk versus not low‐risk and high‐risk versus not high‐risk.
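The outcome definition above can be sketched as a small labeling function. This is an illustration only, not the study's code: the disposition strings and function names are hypothetical, and the rule that both criteria must hold for the low‐risk and high‐risk classes is an assumption (the text could also be read as requiring either criterion).

```python
# Hypothetical disposition groupings, paraphrased from the paragraph above.
DESIRABLE = {"home", "home health", "rehabilitation"}
NON_DESIRABLE = {"hospice", "long-term acute care", "nursing home",
                 "skilled nursing facility", "died"}


def los_category(days):
    """Length-of-stay category: I (0-3 days), II (4-7 days), III (>7 days)."""
    if days <= 3:
        return "I"
    if days <= 7:
        return "II"
    return "III"


def disposition_group(disposition):
    """Map a discharge disposition to desirable / average / non-desirable."""
    if disposition in DESIRABLE:
        return "desirable"
    if disposition in NON_DESIRABLE:
        return "non-desirable"
    return "average"  # e.g., transfer to another healthcare institution


def outcome_class(days, disposition):
    """Assumed rule: both criteria must agree for low-risk or high-risk."""
    category = los_category(days)
    group = disposition_group(disposition)
    if category == "I" and group == "desirable":
        return "low-risk"
    if category == "III" and group == "non-desirable":
        return "high-risk"
    return "average"


def dichotomize(outcome, positive):
    """Binary target: 1 if the encounter belongs to `positive`, else 0."""
    return int(outcome == positive)


# Made-up encounters: (length of stay in days, discharge disposition).
encounters = [(2, "home"), (5, "home health"), (12, "hospice"), (9, "home")]
labels = [outcome_class(days, disp) for days, disp in encounters]
low_risk_target = [dichotomize(y, "low-risk") for y in labels]
high_risk_target = [dichotomize(y, "high-risk") for y in labels]
print(labels)
```

Each dichotomized target can then be handed to a separate binary classifier, which is one way to read the low‐risk versus not low‐risk and high‐risk versus not high‐risk setup.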

We first narrowed the 150 initial risk factors down to the 35 most statistically significant ones (Supplementary Table S1) by analysis of variance (ANOVA) and subsequently used Glmnet to identify the twenty most important variables, that is, visit age, sex, race, marital status, body mass index (BMI), previous diagnosis of dementia, encephalopathy, delirium, physical restraint status, UTI, diabetes, admission source, number of medical problems at admission, deep venous thrombosis (DVT), falls, opioids, anemia, pressure ulcers, anti‐psychotics, and number of discharge diagnoses. The selected risk factors were then encoded and passed to our prediction model (Supplementary Figure S3, Supplementary Table S2). Supplementary Figure S1 illustrates the methodology of the model establishment. Ordinal encoding and one‐hot encoding were used to encode categorical variables as integers (Supplementary Table S2).
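The ANOVA screening step can be illustrated with a one‐way F statistic computed per feature across the outcome groups, followed by ranking. This is a minimal sketch on made‐up values: the feature names, the data, and the cut‐off of two features are hypothetical, and the subsequent Glmnet (penalized regression) step is not shown.

```python
from statistics import mean


def anova_f(values, groups):
    """One-way ANOVA F statistic for one numeric feature across groups."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    grand_mean = mean(values)
    k, n = len(by_group), len(values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(vs) * (mean(vs) - grand_mean) ** 2
                     for vs in by_group.values())
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(vs)) ** 2
                    for vs in by_group.values() for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: nine encounters in three outcome groups (values are invented).
outcome = ["good"] * 3 + ["average"] * 3 + ["poor"] * 3
features = {
    "n_problems": [2, 3, 4, 5, 6, 7, 9, 10, 11],
    "bmi": [24, 31, 27, 26, 30, 25, 29, 24, 28],
    "age": [70, 85, 78, 72, 88, 80, 75, 90, 83],
}

f_scores = {name: anova_f(values, outcome) for name, values in features.items()}
ranked = sorted(f_scores, key=f_scores.get, reverse=True)
selected = ranked[:2]  # keep the top-k features; the study kept 35 at this stage
print(ranked)
```

A larger F means the feature's mean differs more between outcome groups relative to its spread within groups, which is why it serves as a first‐pass filter before a multivariable method.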

The most widely used baseline models for predicting hospital outcomes include K‐nearest neighbors (KNN), random forest, and decision tree models. We integrated two tree‐based algorithms, namely, decision tree and random forest, into an ensemble multi‐target detection model on multi‐dementia modality using the stacking ensemble method, so that discriminative information left unexploited by other multi‐target machine learning models could be captured. 16 The data were randomly split at a 2:1 ratio: the model was trained on a cohort of 10,452 encounters, and an independent cohort of 5226 encounters was used for model testing. We compared our model performance to that of a multi‐layer perceptron (MLP) neural network and other baseline models. Finally, we performed feature importance analysis and evaluated the effect of the number of missing features. We evaluated the performance of the prediction models using AUCROC (area under the curve of the receiver operating characteristics) and precision‐recall analysis. We employed the method proposed by DeLong et al. 17 for pairwise comparison of receiver operating characteristic (ROC) curves to determine the best model.
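A stacking ensemble of this kind can be sketched with scikit‐learn's StackingClassifier. The block below is a sketch under stated assumptions rather than the authors' implementation: the data are synthetic, all hyperparameters are guesses, and the decision tree meta‐learner follows the description given later in the Discussion.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for 20 encoded risk factors; NOT the study data.
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)

# Hold out one third of the rows, as the encounter counts above imply
# (10,452 for training versus 5226 for testing).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0)

# Base learners: decision tree and random forest. Meta-learner: decision tree.
# Depths and tree counts are illustrative assumptions.
stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    final_estimator=DecisionTreeClassifier(max_depth=3, random_state=0),
    cv=5,  # base-model predictions for the meta-learner come from 5-fold CV
)
stack.fit(X_train, y_train)

proba = stack.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, stack.predict(X_test))
auc = roc_auc_score(y_test, proba)
print(f"accuracy={accuracy:.3f}  AUCROC={auc:.3f}")
```

The internal cross‐validation matters: the meta‐learner is trained on out‐of‐fold predictions of the base models, which limits leakage of training labels into the second stage.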

2.2. Statistical and feature importance analyses

Analyses were performed using Python and R version 3.4.3. 18 , 19 The Keras package in Python was used to design and train deep‐learning models. The model selection package in Python was used to generate the stratified ten‐fold training and validation datasets. The Glmnet and speedglm packages in R were used to fit the Glmnet regression and logistic regression models. The Scikit‐learn package in Python was used to perform AUCROC and precision‐recall analysis of the model's prediction results. Feature importance analysis was performed using the SHapley Additive exPlanations (SHAP) algorithm, a game‐theoretic approach to explore interactions between individual risk features.
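To make the stratified ten‐fold splitting concrete, the sketch below deals the sample indices of each class out to folds in round‐robin fashion, so every fold keeps roughly the overall class ratio. It is a hand‐written illustration with invented labels; in practice scikit‐learn's StratifiedKFold provides this functionality.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so class proportions stay similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)  # deal out round-robin
    return folds


# Toy imbalanced target: 30 positive and 170 negative encounters (made up).
labels = [1] * 30 + [0] * 170
folds = stratified_folds(labels, n_folds=10)

train_sizes = []
for held_out in folds:
    train = [i for fold in folds if fold is not held_out for i in fold]
    train_sizes.append(len(train))
    # ... fit a model on `train` and evaluate it on `held_out` here ...

positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold)
```

Stratification matters most for imbalanced targets such as the dichotomized risk classes, where a plain random split could leave some folds with very few positive cases.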

3. RESULTS

3.1. Statistical distribution and demographics

There was a total of 15,678 patient encounters (8407 distinct patients) recorded, comprising 4993 encounters with ADD, 4173 encounters with PDD, 3735 encounters with VD, and 2777 encounters that were OUD. At admission, median ages and ranges of the patients were 83 years (65–106) for the entire cohort, 84 years (65–106) for ADD, 82 years (65–101) for VD, 80 years (65–100) for PDD, and 84 years (65–105) for OUD. Women were the majority of the entire cohort, 55.1%; women were 62.5%, 58.1%, 40.7%, and 59.2% for ADD, VD, PDD, and OUD, respectively. Other details are in Table 1.

TABLE 1. Demographic distribution of entire cohort and dementia groups

| Characteristic | Total cohort | Alzheimer's disease dementia | Vascular dementia | Parkinson's disease with dementia | Other unspecified dementia |
|---|---|---|---|---|---|
| Number of patient encounters (N) | 15,678 | 4993 | 3735 | 4173 | 2777 |
| Age (years) | | | | | |
| Median | 83 | 84 | 82 | 80 | 84 |
| Mean | 81.99 | 83.53 | 81.31 | 80.23 | 82.78 |
| Range | (65,106) | (65,106) | (65,101) | (65,100) | (65,105) |
| Sex, n (proportion) | | | | | |
| Male | 7043 (0.449) | 1873 (0.375) | 1564 (0.419) | 2474 (0.593) | 1132 (0.407) |
| Female | 8635 (0.551) | 3120 (0.625) | 2171 (0.581) | 1699 (0.407) | 1645 (0.592) |
| Race, n (%) | | | | | |
| Asian | 408 (2.6) | 89 (1.8) | 81 (2.6) | 178 (4.2) | 60 (2.2) |
| Black | 3582 (22.8) | 1157 (23.2) | 1155 (30.9) | 614 (14.7) | 656 (23.7) |
| Caucasian | 10,452 (66.7) | 3384 (67.8) | 2269 (60.7) | 3105 (74.4) | 1694 (61) |
| Hispanic | 243 (1.5) | 89 (1.8) | 45 (1.2) | 69 (1.6) | 40 (1.4) |
| Indian/Native American | 14 (0.1) | 1 (0) | 1 | 11 (0.2) | 1 |
| Others and declined | 979 (6.2) | 273 (5.4) | 184 (4.9) | 196 (4.6) | 326 (11.7) |
| Body mass index | 29.44 | 29.19 | 35.00 | 26.81 | 24.78 |

The age distribution across our dataset showed that, although the study was restricted to patients aged ≥65 years, patients with VD and PDD were on average younger than patients with ADD (Supplementary Figure S4). When age and sex were combined (Supplementary Figure S5), the mean age of women tended to be higher than that of men in each dementia group. When the risk classes were dichotomized into low‐risk versus not low‐risk and then high‐risk versus not high‐risk, the selected models were trained with supervised learning to identify the outcomes of primary concern, low‐risk (green) and high‐risk (red) hospitalization outcomes, and achieved good accuracy (Table 2).

TABLE 2. Categorization of length of stay and discharge disposition groups in Alzheimer's disease, vascular dementia, Parkinson's with dementia, and unspecified dementia

| Dementia group | Discharge disposition group | Category I | Category II | Category III |
|---|---|---|---|---|
| Alzheimer's disease | Desirable discharge | 488 | 985 | 346 |
| Alzheimer's disease | Non‐desirable discharge | 160 | 902 | 845 |
| Alzheimer's disease | Average discharge | 118 | 608 | 380 |
| Vascular dementia | Desirable discharge | 440 | 802 | 389 |
| Vascular dementia | Non‐desirable discharge | 80 | 680 | 836 |
| Vascular dementia | Average discharge | 46 | 320 | 250 |
| Parkinson's disease with dementia | Desirable discharge | 590 | 1078 | 440 |
| Parkinson's disease with dementia | Non‐desirable discharge | 265 | 550 | 617 |
| Parkinson's disease with dementia | Average discharge | 87 | 296 | 250 |
| Unspecified dementia | Desirable discharge | 420 | 642 | 340 |
| Unspecified dementia | Non‐desirable discharge | 134 | 240 | 598 |
Average discharge 60 343

Note: Category: Refers to the level of length of stay: I (0–3 days), II (4–7 days), III (>7 days).

Note: Discharge disposition group:

Desirable discharge: discharged to home, rehabilitation, etc.

Average discharge: discharged to home‐health re‐admit, transfer to other medical centers, etc.

Non‐desirable discharge: discharged to nursing home, transfer to nursing home‐based skilled nursing facility, long‐term acute care, expired, etc.

3.2. Model performance evaluation

Our stacking ensemble machine‐learning model achieved the best AUCROC and the best accuracy (95.6%), outperforming the baseline models, that is, random forest, decision tree, logistic regression, and MLP, as seen in Figure 1A and 1B, respectively. In stratified ten‐fold cross‐validation to predict the low‐risk target on ADD, the AUCROC was 0.757 (95% CI, 0.752−0.762, Figure 2A), higher than that of the baseline models, and the precision‐recall AUC was 0.795 (95% CI, 0.794−0.796, Figure 2B). Figure 2A shows the ROC curve in black (AUCROC 0.757) and, in the gray band, the variance of the curve when the training dataset was split into different subsets. The precision‐recall curve was significantly better than the straight (luck/chance) line, indicating our model was not penalized for predicting the majority class in all cases (Figure 2B). For comparison of performance of our model with other models, see Supplementary Table S3.
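For readers less familiar with the metric, AUCROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way and summarizes per‐fold values with a normal‐approximation interval. The scores and fold values are invented, and the interval formula is an assumption; the text does not state how the reported confidence intervals were derived.

```python
def auc_roc(y_true, scores):
    """AUCROC as P(score of a random positive > score of a random negative)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def mean_and_ci(values, z=1.96):
    """Mean and normal-approximation 95% interval across folds (assumed method)."""
    m = sum(values) / len(values)
    variance = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    half_width = z * (variance / len(values)) ** 0.5
    return m, m - half_width, m + half_width


# Toy predictions: 1 marks the target class, scores are model probabilities.
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
auc = auc_roc(y_true, scores)  # 7 of 9 positive-negative pairs ranked correctly

fold_aucs = [0.75, 0.76, 0.77]  # hypothetical per-fold AUCROC values
fold_mean, ci_low, ci_high = mean_and_ci(fold_aucs)
print(round(auc, 3), round(fold_mean, 3))
```

Because this pairwise definition ignores the decision threshold, AUCROC can be modest even when accuracy is high on an imbalanced target, which is why both metrics are reported side by side.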

FIGURE 1. AUCROC comparison curves of models and boxplots of model accuracy in Alzheimer's disease dementia (ADD). (A) AUCROC curves for ensemble learning models and other baseline models on ADD. AUC, area under the curve; ROC, receiver operating characteristic. (B) Boxplot for model accuracy distribution of ensemble learning models and other baseline models on ADD

FIGURE 2. Ten‐fold cross‐validation AUCROC and ten‐fold cross‐validation precision‐recall on Alzheimer's disease dementia (ADD) to predict low‐risk target. (A) Ten‐fold cross‐validation AUCROC on ADD. AUC, area under the curve; ROC, receiver operating characteristic. (B) Ten‐fold cross‐validation precision‐recall on ADD. AUCPR, area under the precision‐recall curve

For the important high‐risk outcome detection, the best result achieved an accuracy of 87.40% and AUCROC of 0.659 (95% CI, 0.654–0.663) on ADD using ten repeats of stratified ten‐fold cross‐validation. Similarly, the overall accuracies were 76.75% and 80% for VD and PDD, respectively. In the confusion matrix for ADD (Figure 3A), the lower right block shows that 90.91% of 845 high‐risk samples were correctly classified while 9.09% were misclassified. The upper left block shows that 83.26% of 4148 non‐high‐risk cases were correctly classified while 16.74% were misclassified. Figure 3B–D show similar matrices for the other groups.
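The percentages quoted for the ADD confusion matrix can be turned back into approximate cell counts, which makes it easier to see what a high‐risk flag means in practice. The counts below are reconstructed by rounding and are therefore approximations, and the precision value is a derived figure that is not reported in the text.

```python
# Figures quoted in the text for ADD (Figure 3A).
high_risk_total = 845   # encounters that truly were high-risk
other_total = 4148      # all remaining ADD encounters
sensitivity = 0.9091    # share of high-risk encounters classified correctly
specificity = 0.8326    # share of other encounters classified correctly

# Approximate cell counts, reconstructed by rounding.
true_pos = round(high_risk_total * sensitivity)
false_neg = high_risk_total - true_pos
true_neg = round(other_total * specificity)
false_pos = other_total - true_neg

# Precision: of the encounters flagged high-risk, the share that really were.
precision = true_pos / (true_pos + false_pos)
print(true_pos, false_neg, true_neg, false_pos, round(precision, 3))
```

Under these reconstructed counts, roughly half of the encounters flagged as high‐risk would be truly high‐risk, a reminder that a high per‐class hit rate and a high precision are different properties when the target class is small.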

FIGURE 3. Confusion matrix of the high‐risk outcomes across: (A) Alzheimer's disease dementia (ADD), (B) vascular dementia (VD), (C) Parkinson's disease with dementia (PDD), and (D) other unspecified dementias (OUD)

3.3. Predictive features in multi‐dementia groups

Four different prediction models were built for the dementia groups with each group's predictive features. The four models shared certain important hospitalization outcome predictive features, including visit age, BMI, encephalopathy, admission source, number of medical problems at admission, and pressure ulcers ranked among the top ten for all four, that is, 60% overlap amongst the list of top ten features. Certain features like the number of medical problems at admission and pressure ulcers ranked high in all four dementia groups and were strongly predictive of a poor or non‐desirable hospitalization outcome. Encephalopathy and admission source, that is, the location through which the patient was admitted, such as emergency room, physician office, or transfer from another Houston Methodist hospital or clinic, were also strongly predictive in three dementia groups. For admission sources, physician or clinical offices were consistently correlated with better hospitalization outcomes than sources like emergency rooms and hospital acute care facilities. In VD and OUD, fall history or risk was predictive of poor outcomes if falls occurred during the hospitalization (Table 3, Supplementary Figure S6).

TABLE 3. Hospitalization outcome predictive factors in multi‐dementia groups (ranked from most to least predictive factors) including modifiable factors

| Rank | ADD | VD | PDD | OUD |
|---|---|---|---|---|
| 1 | Encephalopathy ᵃ | Number of medical problems at admission | Number of medical problems at admission | Number of medical problems at admission |
| 2 | Urinary tract infection ᵃ | Body mass index | Visit age | Encephalopathy ᵃ |
| 3 | Pressure ulcers ᵃ | Admission source | Body mass index | Pressure ulcers ᵃ |
| 4 | Number of medical problems at admission | Marital status | Pressure ulcers ᵃ | Anemia ᵃ |
| 5 | Race | Falls ᵃ | Encephalopathy ᵃ | Falls ᵃ |
| 6 | Admission source | Pressure ulcers ᵃ | Diabetes ᵃ | Admission source |
| 7 | Body mass index | Encephalopathy ᵃ | Admission source | Race |
| 8 | Sex | Visit age | Previous diagnosis of dementia | Visit age |
| 9 | Diabetes ᵃ | Delirium | Race | Marital status |
| 10 | Visit age | Physical restraint status | Marital status | Urinary tract infection ᵃ |
| 11 | Delirium | Deep venous thrombosis ᵃ | Physical restraint status | Deep venous thrombosis ᵃ |
| 12 | Anemia ᵃ | Anemia ᵃ | Delirium | Opioids ᵃ |
| 13 | Opioids ᵃ | No. of discharge diagnoses | Deep venous thrombosis ᵃ | Physical restraint status |
| 14 | Falls ᵃ | Race | Antipsychotics ᵃ | Previous diagnosis of dementia |
| 15 | Deep venous thrombosis ᵃ | Sex | No. of discharge diagnoses | Diabetes ᵃ |
| 16 | Antipsychotics ᵃ | Previous diagnosis of dementia | Anemia ᵃ | Delirium |
| 17 | Marital status | Urinary tract infection ᵃ | Opioids ᵃ | Antipsychotics ᵃ |
| 18 | Previous diagnosis of dementia | Diabetes ᵃ | Falls ᵃ | Body mass index |
| 19 | No. of discharge diagnoses | Antipsychotics ᵃ | Sex | Sex |
| 20 | Physical restraint status | Opioids ᵃ | Urinary tract infection ᵃ | Antipsychotics ᵃ |

Abbreviations: ADD, Alzheimer's disease dementia; OUD, other unspecified dementias; PDD, Parkinson's disease with dementia; VD, vascular dementia.

ᵃ Modifiable factors.

3.4. Feature importance analysis and model interpretability

We deployed the SHAP method 20 for determining the importance of features (Supplementary Figure S7A–D). The beeswarm plots in the bottom part of each sub‐figure summarize how the top features in a dataset impact the model's output. A randomly selected 94‐year‐old man from the ADD group, with a BMI of 18.83 and without UTI, delirium, encephalopathy, or diabetes in his history, or other medical problems at admission, was correctly predicted as low risk, with a high‐risk probability of only 17% (Supplementary Figure S8A–C). The long blue bar for encephalopathy in Supplementary Figure S8A indicates this feature was crucial in the prediction. Supplementary Figure S8B is equivalent to rotating Supplementary Figure S8A by ninety degrees and stacking the results horizontally, thus revealing the weights assigned to each feature. The SHAP decision plot (Supplementary Figure S8C) shows how the model arrives at its prediction.
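The game‐theoretic idea behind SHAP can be shown on a toy model by computing exact Shapley values: each feature's contribution is its average marginal effect over all orders in which features are revealed. This brute‐force sketch is for intuition only. The risk function, its coefficients, and the feature set are invented, and the SHAP library uses far more efficient estimators than enumerating permutations.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values: mean marginal contribution over all feature orders."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)  # start from the reference patient
        previous = model(current)
        for name in order:
            current[name] = instance[name]  # reveal this feature's real value
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orders) for name, total in totals.items()}


def toy_risk(x):
    """Made-up risk score with an interaction term; not the study's model."""
    return (0.10 + 0.30 * x["encephalopathy"] + 0.15 * x["uti"]
            + 0.10 * x["encephalopathy"] * x["uti"] + 0.05 * x["falls"])


baseline = {"encephalopathy": 0, "uti": 0, "falls": 0}
patient = {"encephalopathy": 1, "uti": 1, "falls": 0}
phi = shapley_values(toy_risk, patient, baseline)
print(phi)
```

The contributions sum to the difference between the patient's score and the baseline score, and the interaction term is shared equally between the two interacting features. This additivity is what lets a force or decision plot decompose a single prediction feature by feature.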

3.5. The effect of missing value on model performance

Considering the risk‐benefit trade‐off involving missing value imputation in the context of feature selection, we applied Recursive Feature Elimination (RFE) on the training dataset to eliminate features one by one without replacing them with random noise. The box‐and‐whisker plot in Supplementary Figure S9A shows the distribution of accuracy scores with two to nine features removed or missing, that is, the overall performance our model could achieve with these features missing. Elimination of additional features resulted in a decrease in prediction performance. In ADD, the drop in the model's performance became significant (P = .0211) when six or more features were eliminated. Thus, five features should be the maximum allowable number of missing features to retain trusted model performance and accuracy. Supplementary Figure S9B shows the results for VD, for which the cut‐off number of missing features is four.

4. DISCUSSION

Different machine‐learning algorithms including support vector machine, decision tree, random forest, and KNN have been used to detect and stage Alzheimer's disease (AD). 21 The use of random forest models in classifying AD has been reviewed by Sarica et al., 22 while El‐Sappagh et al. applied ensemble machine‐learning classifiers based on random forest to a multimodal AD dataset to develop more accurate prediction models for AD clinical decision support. 23 Random forest has also been used in several other studies. 24 , 25 , 26 Danso et al. used an explainable ensemble‐based model to predict ADD and used SHAP for interpreting risk factors. 27 In our stacking ensemble model, a decision tree meta‐model combines the predictions from the base models (random forest and decision tree) to achieve high accuracy, AUCROC, and precision‐recall AUC. Our model outperformed other base models (Supplementary Table S3).

The presented results show that our predictive model not only attained the best performance but also helped discover key risk features across different dementia subtypes to support effective and timely clinical decision‐making. Predictive factors can be modifiable or non‐modifiable. Modifiable factors are preventable, controllable, or improvable risk factors, often from the patient's history, for which we could institute interventions to mitigate their effect on the hospitalization outcomes or prevent occurrence or recurrence in the current admission, for example, encephalopathy, infection, anemia, pressure ulcers, and falls. Non‐modifiable factors are risk factors not amenable to such clinical interventions, for example, age, sex, admission source, previous diagnosis of dementia, etc. (Table 3). Our predictive model may help boost the clinician's confidence in predicting patient outcome and facilitate communication with patients on their expectations and management during hospitalization. Furthermore, its performance can lead to early identification of modifiable risk factors that can translate to timely interventions, including implementation of protocols that could mitigate anticipated poor hospitalization outcomes. As seen in the example above, the model demonstrates the role that the absence (low‐risk) or presence (high‐risk) of UTI, delirium, encephalopathy, diabetes, and other factors would play in the hospitalization outcome of the 94‐year‐old man with ADD. Our predictive model has potential to improve dementia services, such as allocating more appropriate medical resources to assist Alzheimer's patients with encephalopathy. Timely treatment of the underlying causes of encephalopathy, including metabolic, toxic, or infectious abnormalities at admission, or early institution of a delirium protocol, can improve the hospitalization outcomes of geriatric patients with dementia.

Despite the promising results achieved by machine learning approaches for AD and other dementias, their translation to clinical practice remains limited, partly due to the difficulty in interpreting these models. 28 To fill the gap, we applied feature importance analysis on the training dataset to improve the model interpretability and increase prediction performance. 29 Also, we examined the effect of removing two to nine features or risk factors on the model performance. The combination of stacking ensemble learning with RFE is shown to be able to handle the challenges of constraints commonly encountered in clinical practice, namely, imbalanced data, reliable prediction models, 30 and missing feature evaluation.

Interestingly, several risk factors of hospitalization outcomes identified are similar to those acknowledged in a study on dementia prevention. 31 Age, obesity, and diabetes were reported as important risk features both in our study and the Lancet Commission report on dementia prevention, intervention, and care. 31 Another example of an explainable output of our predictive model can be seen in the effect of the admission source variable. Our model suggests that admission source has an impact on the hospitalization outcome such that emergency rooms and hospital acute care facility sources were associated with longer hospital stay when compared with physician or clinical office sources. Patients admitted through the former sources would often be expected to be sicker and thus stay longer in the hospital than those admitted from the latter.

Interestingly, UTI, a hospitalization‐outcome predictive factor, intersects with infections, that is, UTI, pneumonia and other respiratory infections, gastrointestinal infections, etc., published as risk factors for hospitalizations, emergency department visits, poor prognosis, and hospice care in advanced dementia. 32 , 33 Neuropsychiatric symptoms like delirium, falls, and infections were identified as common admission reasons in Lewy body dementia while antipsychotics were implicated in longer hospital stays and poorer outcomes. 34 Geriatric patients with dementia had higher complication rates for UTI, pressure ulcers, pneumonia, and delirium when compared with geriatric patients without dementia. 35 Also noteworthy is that falls, delirium, and comorbidities were published as leading reasons for hospitalization or admission diagnoses in dementia patients 36 , 37 , 38 while diabetes, pneumonia, UTI, and fall‐related fracture were significantly associated with prolonged admission and recurrence of admissions, 39 all overlapping with a number of our hospitalization outcome risk factors. However, the objectives of these published studies were different from ours, and none employed advanced machine‐learning approaches as ours did. Importantly, our study identified novel factors such as certain demographics, admission source, number of medical problems at admission, opioids, etc., never mentioned in other published studies.

4.1. Clinical significance and implications

In this study, we identified the risk factors for hospitalization outcomes of patients with dementia including ADD, VD, PDD, and OUD. A major win for both the patient and the clinician is the ability to use this model to recognize the risk factors and projected outcome early during hospitalization. In the era of digital health, the engagement of machine learning in hospitalization outcome prediction is important. The strengths of this study include a large patient cohort and subgroups of geriatric patients with dementia, the use of electronic medical record (EMR) with defined data elements and risk factors, the identification of modifiable risk factors, and the application of advanced machine‐learning in predictive modeling.

The new insight on the factors (modifiable and non‐modifiable) associated with desirable and non‐desirable hospitalization outcomes is novel. Healthcare teams can use such knowledge for more judicious resource allocation, focused care management where needed most, timely protocol implementation, and development and institution of clinical interventions towards patients at high risk of undesirable hospitalization outcomes. These interventions could be incorporated into the smart clinical pathway of these more vulnerable patients, thereby improving outcomes. For example, for ADD, our model identified the top fifteen factors most associated with hospitalization outcomes: encephalopathy, UTI, pressure ulcers, number of medical problems at admission, race, admission source, BMI, sex, diabetes, visit age, delirium, anemia, opioids, falls, and DVT. While not much can be done for non‐modifiable factors such as race, age, and admission source, interventions can be implemented for the modifiable ones. Many healthcare institutions already have preventive protocols for many modifiable risk factors in place and routinely institute them for patients with these risk factors in their medical history. For example, preventive interventions such as regular position adjustments while lying down could be instituted for a patient with a history or risk of pressure ulcers, while frequent ambulation, mandatory thromboprophylaxis, and avoidance of interventions that might promote the recurrence of DVT could be put in place for a patient with a history or risk of DVT. There is always room for improvement of these protocols based on individual clinical scenarios and patient characteristics. A significance of our work is that interventions to prevent urinary tract infections, falls, pressure ulcers, and DVT, or correct anemia, among others, can be designed, deliberately implemented, and followed through from admission.

In ongoing work, we are applying the predictive model in a prospective randomized study, in which we aim to use the model to stratify these patients, implement interventions targeted at mitigating these risk factors in a treatment group, and compare the results with those of standard‐of‐care recipients.

4.2. Limitations

Our study is based on clinical data from a major multi‐hospital health system in the greater Houston area. It would be interesting to see whether the results extend to dementia patients from different regions, as well as to more granular dementia groups or a revised grouping system, such as treating Lewy body dementia (that is, PDD and dementia with Lewy bodies) as a single group. The Methods section above details how we classified the dementia groups in this study.

Diagnoses and clinical features were extracted using ICD codes, which, we acknowledge, may have some reliability limitations despite our efforts to validate the data. In the same vein, we considered encephalopathy a syndrome caused by brain dysfunction with multiple etiologies, whereas delirium is considered a clinical symptom that may or may not be a manifestation of encephalopathy. Both are coded in our hospital EMR and thus are listed individually based on the coder's documentation for the admission. This study is based on encounters, and some patients had multiple visits; thus, the presence or absence of certain risk factors and their counts in these patients could arguably pose a limitation. Nevertheless, in our data, ninety percent of patients had no more than three visits, and there was no significant difference among number‐of‐visits groups in the risk factors or in poor versus good outcomes (Supplementary Table S4). This limitation is thus rather minor. Our hospitalization‐outcome categorization, based on length of stay and discharge disposition groupings, is arguably not a universally adopted classification metric. However, this categorization was designed by our clinical care team, which includes a senior neurologist with extensive experience. Furthermore, although we achieved good performance in predicting the low‐risk hospitalization outcomes across the four groups of dementia patients, the predictive performance on the high‐risk outcomes could be improved further.
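As an illustration of the multiple‐visit check mentioned above, the following minimal Python sketch counts encounters per patient and reports the share of patients at or below a visit threshold. It is not the analysis code used in this study: the function name `visit_share`, the `max_visits` parameter, and the patient identifiers are hypothetical, and the encounter log is synthetic.

```python
from collections import Counter


def visit_share(encounter_patient_ids, max_visits=3):
    """Return the fraction of patients with at most `max_visits` encounters.

    `encounter_patient_ids` holds one patient ID per encounter, so a patient
    with several visits appears several times (hypothetical input format).
    """
    visits_per_patient = Counter(encounter_patient_ids)
    if not visits_per_patient:
        raise ValueError("no encounters supplied")
    within = sum(1 for n in visits_per_patient.values() if n <= max_visits)
    return within / len(visits_per_patient)


# Synthetic, illustrative encounter log (not study data).
encounters = ["p1", "p1", "p2", "p3", "p3", "p3",
              "p4", "p4", "p4", "p4", "p5"]

share = visit_share(encounters)
print(f"{share:.0%} of patients had no more than 3 visits")  # prints "80% ..."
```

In an encounter‐level dataset, a share close to one suggests that repeated visits by the same patient contribute relatively few correlated records, which is the sense in which the limitation above is described as minor.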

5. CONCLUSION

This study systematically integrated machine learning and big data to identify risk factors for hospitalization outcomes in geriatric patients with dementia, rank their importance, and generate a predictive model for early assessment. The results of this study are valuable to healthcare systems and clinicians, who can gain insight, early in the hospital admission process, into what constitutes a high‐risk factor for undesirable hospitalization outcomes in this population. They would thus be able to add interventions to the clinical pathway of these more vulnerable patients in order to mitigate these factors, achieve more desirable outcomes, and improve patients’ quality of life.

CONFLICTS OF INTEREST

The authors of this work declare that, apart from funding support for this work from the National Institutes of Health (R01AG057635 and R01AG069082), the T.T. and W.F. Chao Foundation, the John S. Dunn Research Foundation, the Houston Methodist Cornerstone Award, and the Paul Richard Jeanneret Research Fund, there are no other potential conflicts of interest. Author disclosures are available in the supporting information.

Supporting information

SUPPORTING INFORMATION


ACKNOWLEDGMENTS

This study is supported by NIH R01AG057635, NIH R01AG069082, the T.T. and W.F. Chao Foundation, the John S. Dunn Research Foundation, the Houston Methodist Cornerstone Award, and the Paul Richard Jeanneret Research Fund. We thank the Clinical Informatics Oversight Committee and Hospital IT Division of Houston Methodist Hospital for their advice and support of this project, Dr. Guihua Li for assisting in part of the data review, and Dr. Rebecca Danforth for proofreading the manuscript. The sponsors had no role in the study's conception and design; in the collection, analysis, and interpretation of the data; or in the writing of this manuscript. The corresponding authors have full access to all the data and reserve the right to share them. They have final responsibility for submitting the manuscript for publication.

Wang X, Ezeana CF, Wang L, et al. Risk factors and machine learning model for predicting hospitalization outcomes in geriatric patients with dementia. Alzheimer's Dement. 2022;8:e12351. doi: 10.1002/trc2.12351

Xin Wang, Chika F Ezeana, and Lin Wang contributed equally as co‐first authors.



Articles from Alzheimer's & Dementia : Translational Research & Clinical Interventions are provided here courtesy of Wiley
