Scientific Reports. 2025 Aug 22;15:30979. doi: 10.1038/s41598-025-16376-9

Machine learning enhanced expert system for detecting heart failure decompensation using patient reported vitals and electronic health records

Shumit Saha 1,2,3, Heather Ross 4,6,7, Pedro Elkind Velmovitsky 1, Chloe X Wang 7, Julie K K Vishram-Nielsen 4,5, Cedric Manlhiot 4, Bo Wang 7,8,9,10, Joseph A Cafazzo 1,2,9,11
PMCID: PMC12373751  PMID: 40846740

Abstract

Heart failure (HF) is a condition with periods of stability interrupted by periods of worsening symptoms, known as decompensation episodes. Digital interventions are promising tools to alleviate burdens on HF management through automated alerts at the earliest decompensation sign. To accomplish this, our lab developed Medly, an expert system-enhanced digital therapeutic program for HF patients. Medly’s algorithm is a knowledge-based system that analyzes weight, blood pressure, and heart rate and sends automated alerts to clinicians and patients if deterioration is identified. Rules were set conservatively to account for false negatives. However, reducing false negatives resulted in an increase in false positives, which can lead to unnecessary clinical workload. Further, patients’ electronic health records (EHR) were not used when developing the rules-based algorithm. This study aimed to enhance Medly’s performance with machine learning and include a richer set of data, including EHR, for predicting decompensated HF episodes. We performed a retrospective study using XGBoost for the binary classification of whether the patient needed to be contacted for a possible decompensation episode. Features included blood pressure, weight change, heart rate, and EHR data (e.g., blood work, medication history). We further performed interpretability analysis to investigate the importance of including EHR data in the model. The enhanced algorithm achieved 98.08% accuracy, 95.26% sensitivity, 98.86% specificity, and a PPV of 88.18% – a marked improvement over the 55.8% in the rules-based algorithm. EHR data, mainly B-type natriuretic peptide (BNP) and total cholesterol, was crucial in predicting decompensation and correcting false-positive alerting.

Keywords: Heart failure, Decompensated HF, Machine learning, Electronic health records, EHR

Subject terms: Cardiology, Machine learning

Introduction

Heart failure (HF) is a progressive condition with periods of stability interrupted by periods of worsening symptoms and instability1, known as decompensation. Without immediate intervention, decompensated HF significantly impacts patients’ quality of life and frequently leads to hospitalization2. The course of HF may include multiple episodes of decompensation separated by stable periods of varying duration3. Therefore, a primary goal of HF management is to maintain stability. However, maintaining stability in HF patients requires frequent follow-up with clinicians. Further, failure to have early follow-up after hospital discharge is one of the most common reasons for HF-related re-hospitalization4. Studies have shown that the 30-day readmission rate after an HF-related hospitalization is 22%5, increasing to over 50% within six months3. Re-hospitalization makes HF one of the costliest chronic diseases, consuming significant healthcare resources6. Therefore, an efficient tool is required in HF management to maintain stability, thereby improving quality of life and reducing hospital admission, by predicting and preventing episodes of decompensation.

Digital health interventions, combined with an expert system, are promising tools that can potentially alleviate some of the burdens of HF management by empowering patients to engage in self-care and enabling efficient clinical care through automated alerts at the earliest sign of decompensation episodes7. To accomplish this, experts from the University Health Network (UHN) developed Medly, a digital therapeutic program for patients with HF8,9. Using Medly, patients capture daily weight, blood pressure, heart rate, and symptoms either manually or via Bluetooth through the Medly application on a mobile phone or tablet8. Automated alerts are sent to nurse coordinators and the patient themselves if any deterioration is identified through the rules-based expert system that analyzes their measurements and symptoms8. Moreover, a single nurse can support more than 250 patients with the Medly platform, depending on the mix of patient acuity. Medly has shown success in HF management by significantly improving HF-related quality of life and self-care maintenance and management9–12. Further, a recent study by Ware et al. demonstrated that the use of Medly led to a 50% reduction in HF-related hospitalizations and a 24% reduction in all-cause hospitalizations11.

However, a limitation of the Medly rules-based algorithm is that it was developed conservatively, sacrificing positive predictive value to ensure safety and avoid false negatives. This resulted in a higher number of false positives and, in turn, additional clinical follow-up caused by alerts. The rules-based algorithm also relies only on the most recent 2-day weight change and the current day’s heart rate, blood pressure, and reported symptoms in making its determination. By adding the patient’s clinical history from the electronic health record (EHR), including lab results, medication history, and other chronic disease history, there may be room to improve the current rules-based algorithm by reducing false positives and improving clinic efficiency, and to determine the importance of incorporating EHR data.

The purpose of this study was to implement a machine learning-enhanced expert system algorithm that includes EHR data in addition to the original rules-based expert system algorithm for predicting decompensated episodes of HF through Medly. Furthermore, we attempted to answer two crucial questions in this study while implementing the machine learning algorithm: (a) what clinical inputs are most important to predict the decompensation episodes, (b) how much of the medical history from the EHR can help reduce false positives experienced with the rules-based algorithm.

Methods

Ethics statement

The study was approved by the Research Ethics Board of the University Health Network (UHN), Toronto, Canada. All participants gave written consent before participating in the study (REB No: 19-5213). The study was performed in accordance with the approved guidelines and regulations.

Study participants

This was a single-center retrospective cohort study. We used historical Medly data (from August 2016 to August 2019) collected through the clinical Medly program at UHN. The eligibility criteria for enrolling in the Medly program were: age 18 years or older, diagnosed with HF and followed by a cardiologist at the HF clinic, able to speak and read English (or have an informal caregiver who did) to understand the text adequately in the Medly app, and able to comply with using Medly-generated instructions and alerts11. As the Medly program has been implemented as part of the standard of care, there were no explicit exclusion criteria for participating in the program.

The Medly program

Medly is an expert system-enhanced digital therapeutic platform for patients with HF. The application is a Health Canada-cleared, Class 2 medical device. In the standard protocol of Medly, the application collects the vital signs (i.e., weight, blood pressure, heart rate, and symptoms) of each patient, each morning11,12. Data are automatically transmitted to a data server. An in-app rules-based expert system algorithm generates an alert of possible decompensation based on the morning entry. When an alert is generated, a message is sent to the patient’s mobile phone along with a notification to the most responsible clinician8. Based on the level of the alert, appropriate counseling is provided to the patient from the app. Since the Medly program is integrated as part of the standard of care, all patients enrolled into the study were monitored through Medly.

Medly’s rules-based algorithm

Seto et al. developed a rule set based on extensive inputs from HF clinicians to generate alerts and instructions for patients8. The alerts and instructions were generated based on the patient’s daily measurements of weight, blood pressure, heart rate, and symptoms. A matrix of possible outcome states from the measurements was defined in the rules-based system. The set of rules and alert messages and actions associated with the outcome states were defined and validated based on interviews with ten HF clinicians. Details of the algorithm are presented in our previous work8.

Data analysis

Input variables

The rules-based algorithm used the morning measurements of blood pressure, weight change, heart rate, and symptoms to generate an alert for the same day. To improve the rules-based algorithm, we added the previous 2 days’ morning measurements of these parameters to the same-day measurement, for a total of 3 consecutive days. Furthermore, we included information on the laboratory investigations of each patient. The laboratory investigations included complete blood count (CBC), lipid profile, serum electrolytes, renal function test, serum urea, serum creatinine, and B-type natriuretic peptide (BNP). We also included patients’ medication history, their other co-morbid conditions, and their smoking history. Table 1 shows the complete set of input variables that were used in the analysis.

Table 1. Input variables.

| Feature | Type | # of features (variable type) |
| --- | --- | --- |
| Age | | 1 (continuous) |
| Sex | | 1 (categorical) |
| Same day features | Blood pressure (systolic and diastolic) | 2 (continuous) |
| | Heart rate | 1 (continuous) |
| | Weight change | 1 (continuous) |
| | Symptoms (unusual heartbeat, tiredness, shortness of breath, light-headedness, chest pain, reduced activities, ICD fired, night breathing worsened, swollen ankles) | 10 (categorical-yes/no) |
| Previous 2-day features | Blood pressure (systolic and diastolic) | 4 (continuous) |
| | Heart rate | 2 (continuous) |
| | Weight change | 2 (continuous) |
| | Symptoms (unusual heartbeat, tiredness, shortness of breath, light-headedness, chest pain, reduced activities, ICD fired, night breathing worsened, swollen ankles) | 20 (categorical-yes/no) |
| EHR | Complete blood count (CBC): Platelets (10⁹ cells/L) | 1 (continuous) |
| | RBC (10¹² cells/L) | 1 (continuous) |
| | WBC/Leukocytes | 1 (continuous) |
| | HCT | 1 (continuous) |
| | Neutrophils/Polys | 1 (continuous) |
| | Lymphocytes (10⁹ cells/L) | 1 (continuous) |
| | Monocytes (10⁹ cells/L) | 1 (continuous) |
| | Eosinophils (10⁹ cells/L) | 1 (continuous) |
| | Basophils (10⁹ cells/L) | 1 (continuous) |
| | Renal function test: eGFR (mL/min/1.73 m²) | 1 (continuous) |
| | African eGFR (mL/min/1.73 m²) | 1 (continuous) |
| | Lipid profile: Total cholesterol (mmol/L) | 1 (continuous) |
| | HDL-C (mmol/L) | 1 (continuous) |
| | LDL-C (mmol/L) | 1 (continuous) |
| | Triglycerides (mmol/L) | 1 (continuous) |
| | Serum electrolytes: Sodium (mmol/L) | 1 (continuous) |
| | Potassium (mmol/L) | 1 (continuous) |
| | Chloride (mmol/L) | 1 (continuous) |
| | Calcium (mmol/L) | 1 (continuous) |
| | Phosphate (mmol/L) | 1 (continuous) |
| | Magnesium (mmol/L) | 1 (continuous) |
| | Serum urea (mmol/L) | 1 (continuous) |
| | Serum creatinine (µmol/L) | 1 (continuous) |
| | B-type natriuretic peptide: BNP (pg/mL) | 1 (continuous) |
| | Hemoglobin: Hgb (g/L) | 1 (continuous) |
| | Drug history: ACEI | 1 (categorical-yes/no) |
| | Beta-blocker | 1 (categorical-yes/no) |
| | MRA | 1 (categorical-yes/no) |
| | Diuretics | 1 (categorical-yes/no) |
| Other disease history | Hypertension | 1 (categorical-yes/no) |
| | Diabetes | 1 (categorical-yes/no) |
| | Atrial fibrillation | 1 (categorical-yes/no) |
| | COPD | 1 (categorical-yes/no) |
| | Chronic respiratory disease (CRD) | 1 (categorical-yes/no) |
| | Cancer | 1 (categorical-yes/no) |
| Other history | Smoking | 1 (categorical-yes/no) |
| Total features | | 81 |

Outcome measure

Medly’s rules-based algorithm outputs different alerts. A total of eight alerts were generated based on patients’ vitals, where 1 corresponds to ‘everything is normal,’ and 8 corresponds to ‘high-level emergency or call 911’8. In the rules-based output, alert numbers 1–3 corresponded to no contact needed with the physicians or clinic. However, alert numbers 4–8 were associated with some warning, diuretic instructions, and required contact with the corresponding clinic. For simplification purposes, we classified these eight alerts into 2 classes: “Contact” or “No Contact”. If the patient was asked to contact the care team at any stage, that alert was categorized as a “Contact”. Otherwise, all other alerts were classified as “No Contact”. Furthermore, clinician experts J. K. K. V-N and C.M. trained and oversaw 3 highly qualified personnel in evaluating all alerts manually, using clinical data, to determine the true positives, true negatives, false positives, and false negatives for our evaluation system. The definitions of the true positives, true negatives, false positives, and false negatives for our system are as follows:

  1. True Positive: The rules-based algorithm generated a “Contact” alert deemed appropriate by the cardiologist (change in treatment, counseling provided, or care escalation).

  2. True Negative: Rules-based algorithm did not generate a “Contact” alert, and the patient remained stable (no emergency visit or critical clinical deterioration within the next 24 h).

  3. False Positive: Rules-based algorithm generated a “Contact” alert that was deemed inappropriate by the cardiologist (no change in treatment, counseling provided, or care escalation).

  4. False Negative: Rules-based algorithm did not generate a “Contact” alert, but the patient experienced clinical deterioration within the next 24 h and/or received emergency care.
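The labelling scheme above can be illustrated with a minimal Python sketch. The function names and the `reviewer_says_contact` flag are hypothetical and only summarize the manual review described in the text; they are not part of the Medly implementation.

```python
# Alert levels 1-3 need no contact; levels 4-8 ask the patient to contact the
# clinic (cut-off taken from the text above).
CONTACT_MIN_LEVEL = 4


def binarize_alert(level):
    """Collapse an eight-level alert into "Contact" / "No Contact"."""
    if not 1 <= level <= 8:
        raise ValueError(f"alert level must be 1-8, got {level}")
    return "Contact" if level >= CONTACT_MIN_LEVEL else "No Contact"


def label_outcome(level, reviewer_says_contact):
    """Return "TP", "FP", "TN" or "FN" for one incident.

    reviewer_says_contact is a hypothetical boolean summarizing the manual
    review: True when a "Contact" alert was deemed appropriate, or when a
    patient without an alert deteriorated within the next 24 h.
    """
    alerted = binarize_alert(level) == "Contact"
    if alerted:
        return "TP" if reviewer_says_contact else "FP"
    return "FN" if reviewer_says_contact else "TN"
```

For example, a level-6 alert that the reviewer judged unnecessary would be counted as a false positive under this scheme.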

HF decompensation episode estimation

Model development

The predictive model was developed using the XGBoost framework, employing a gradient-boosted decision tree ensemble with a binary logistic objective function13. One main advantage of XGBoost is that it provides parallel tree boosting that solves many data science problems quickly and accurately. For these reasons, XGBoost has been widely used in solving several healthcare-related classification and regression problems14–19. The XGBoost Python implementation was used (https://xgboost.readthedocs.io/en/stable/), version 1.4.2.

Hyperparameter tuning was conducted through a randomized grid search to optimize model performance while preventing overfitting. The search explored combinations of key hyperparameters, including tree depth, learning rate, column subsampling ratio, class weighting to address imbalance, and the number of boosting rounds. Each configuration was evaluated using five-fold stratified cross-validation, with area under the receiver operating characteristic curve (AUC) as the scoring metric.

Following hyperparameter optimization, the best parameter set was merged with a fixed baseline configuration that included the use of the GBTree booster, subsampling of training instances, and a fixed random seed for reproducibility. The final model was trained on the full training dataset, with performance monitored on a held-out validation set using AUC as the evaluation criterion. Early stopping with a patience of 30 iterations was employed to terminate training once validation performance plateaued, enhancing generalizability and reducing the risk of overfitting.

To handle missing data, the model leveraged XGBoost’s native mechanism, which automatically learns the optimal split direction for missing values during tree construction. All continuous variables were standardized prior to training to mitigate the influence of outliers and ensure numerical stability. Categorical variables were one-hot encoded, and features derived from patient-reported daily measures, rolling summaries of prior days’ vitals, and laboratory-based EHR values were included in the model without manual feature selection, allowing the algorithm to learn relevant feature interactions directly.

We used the feature set from Table 1. In conjunction with the feature set, we used the output from the Medly rules-based algorithm and used it as an input to the XGBoost classifier. Overall, the model’s input was: Same day features + previous two days features + Medly rules-based algorithm output + EHR data (Fig. 1).

Fig. 1.


Algorithm flow: This diagram illustrates the overall architecture of the proposed XGBoost-based prediction pipeline. Input data includes same-day vital signs and symptoms (e.g., blood pressure, heart rate, weight change), two-day historical trends of these vitals, and EHR data such as lab results and comorbidity history. The output from the Medly rules-based algorithm is also included as an input feature, leveraging clinical knowledge. The integration of these data sources into the XGBoost model reflects our approach to enhancing prediction accuracy while maintaining alignment with real-world clinical data availability.

Validation and evaluation

The dataset was divided into training and test sets. Our dataset consisted of 342 patients who had daily measurements for 1 to 3 years. We first split the dataset in an 80:20 ratio into training and testing sets at the patient level, such that the two sets had no overlapping patients. The resulting training set contained 52,932 incidents, and the testing set contained 11,161 incidents (labeled by 3 highly qualified personnel, as previously mentioned). However, the training dataset had an overall “No Contact” to “Contact” ratio of 10:1. To alleviate the effect of class imbalance on model training, we randomly subsampled the “No Contact” incidents in the training set until the ratio of “Contact” to “No Contact” incidents was 0.75. This resulted in 11,636 incidents used in training. The “Contact” to “No Contact” ratio in the testing set remained unchanged. The parameter search was performed with 5-fold cross-validation on the training set only. After the best parameters were obtained, the model was trained on the training set. All data were standardized using the StandardScaler class of the scikit-learn library to handle outliers and make the process more efficient. Any missing medication dosage information was treated as no prescription given.
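The patient-level split and the majority-class subsampling can be sketched as follows. This is a minimal illustration under assumed names: the `patient_id` and `label` keys and both function names are hypothetical, and the study's actual sampling code is not described in the text.

```python
import random


def split_by_patient(incidents, test_fraction=0.2, seed=0):
    """Split incidents so that no patient appears in both sets."""
    patient_ids = sorted({inc["patient_id"] for inc in incidents})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = round(len(patient_ids) * test_fraction)
    test_ids = set(patient_ids[:n_test])
    train = [inc for inc in incidents if inc["patient_id"] not in test_ids]
    test = [inc for inc in incidents if inc["patient_id"] in test_ids]
    return train, test


def subsample_majority(train, ratio=0.75, seed=0):
    """Randomly keep "No Contact" incidents so Contact / No Contact ~= ratio."""
    contact = [inc for inc in train if inc["label"] == "Contact"]
    no_contact = [inc for inc in train if inc["label"] == "No Contact"]
    n_keep = min(len(no_contact), round(len(contact) / ratio))
    rng = random.Random(seed)
    return contact + rng.sample(no_contact, n_keep)
```

With 4,987 “Contact” incidents, `round(4987 / 0.75)` gives 6,649 “No Contact” incidents, which agrees with the training-set counts reported in the Results (11,636 incidents in total).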

We evaluated the performance of the architectures in the testing set and calculated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy. Furthermore, we calculated the precision, recall, and F1 score to assess the performance of the model. To ensure fair comparisons with the Medly rules-based algorithm, we further selected classification thresholds based on the Medly rules-based algorithm’s true positive rate. By holding the true positive rate at the same level, we assessed changes in PPV, which would allow us to investigate improvements in PPV without sacrificing sensitivity.
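The evaluation metrics and the sensitivity-matched threshold selection can be sketched in plain Python. The function names are hypothetical, labels are assumed to be 1 for “Contact” and 0 for “No Contact”, and the sketch assumes both classes and both predictions occur so that no denominator is zero.

```python
def confusion_counts(y_true, y_score, threshold):
    """Count (TP, FP, TN, FN) when scores >= threshold are called "Contact"."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted = score >= threshold
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Standard binary metrics; PPV is precision and sensitivity is recall."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


def threshold_for_tpr(y_true, y_score, target_tpr):
    """Highest score threshold whose sensitivity is at least target_tpr."""
    for threshold in sorted(set(y_score), reverse=True):
        tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
        if tp / (tp + fn) >= target_tpr:
            return threshold
    raise ValueError("target_tpr cannot be reached")
```

Passing the rules-based algorithm's sensitivity as `target_tpr` yields a threshold at which the two systems miss a comparable share of true episodes, so any remaining difference in PPV reflects fewer false alerts rather than a trade against sensitivity.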

Interpretability analysis

We further conducted feature importance analyses to determine which features played a significant role in predicting the decompensation episodes and correcting the false positives from the rules-based algorithm. To accomplish this, we performed three analyses: (a) the SHAP summary plot of the model, (b) ablation studies and (c) performance analysis of the addition of the EHR data between the original Medly rules-based algorithm vs. XGBoost algorithm. Each of these strategies for interpretability of the model, and feature importance, are detailed below.

SHAP summary

We used the SHAP method to investigate the feature importance20. The SHAP method evaluates the relative contribution of each feature to the model prediction and assigns a Shapley score to each feature as a measure of its importance. We used the SHAP summary plot to visualize the feature importance20. The SHAP summary plot is more informative than standard feature importance bar charts, which do not represent the range and distribution of each feature’s impact on the model’s output. Instead, the SHAP summary plot represents feature importance as feature effects on the model. In the SHAP summary plot, each point represents a sample. Features are listed along the Y-axis, the X-axis represents the Shapley value of each sample (> 0: contribution towards positive prediction; < 0: contribution towards negative prediction), and the color of each point represents the feature value: red indicates a higher feature value, while blue indicates a lower feature value.

We used SHAP’s Python implementation (https://shap.readthedocs.io/en/latest/), version 0.39.0.
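To make the idea of a Shapley score concrete, the sketch below computes exact Shapley values by enumerating feature coalitions for a small toy value function. This is not the SHAP library's tree-based algorithm; the function names, feature names, and `toy_value` numbers are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a coalition value function.

    value(frozenset_of_features) -> float. The cost grows as 2**n, so this is
    only practical for a handful of features.
    """
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_value(coalition):
    """Made-up model output given which features are 'known'."""
    score = 0.0
    if "rules_alert" in coalition:
        score += 0.5
    if "bnp" in coalition:
        score += 0.2
    if "rules_alert" in coalition and "bnp" in coalition:
        score += 0.1  # interaction, shared equally by both features
    return score


phi = shapley_values(["rules_alert", "bnp", "weight_change"], toy_value)
```

Here the attributions sum to the value of the full coalition, and the interaction term is split evenly between the two features that create it, which is the property that makes Shapley scores comparable across features.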

Ablation studies

We further performed several ablation studies to check the feature importance. We applied several combinations of input features to the XGBoost model and checked the performance of the algorithm. The hypothesis was that if removing a set of features reduced the PPV rate, then that set of features played a significant role in reducing the false positives. The list of ablation studies is given below:

  1. Ablation Study 1: Same day Features.

  2. Ablation Study 2: Same day features + Medly rules-based algorithm output.

  3. Ablation Study 3: Same day features + Medly rules-based algorithm output + EHR data.

  4. Ablation Study 4: Same day features + Previous 2 days features.

  5. Ablation Study 5: Same day features + Previous 2 days features + Medly rules-based algorithm output.

Evaluation of these ablation studies was performed based on PPV, NPV, sensitivity, and specificity.
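The five configurations above amount to selecting different groups of input columns before training. A minimal sketch follows; the column names are hypothetical placeholders, since the actual variable names are not given in the text.

```python
# Hypothetical group -> column-name mapping (placeholder names only).
FEATURE_GROUPS = {
    "same_day": ["sbp_d0", "dbp_d0", "hr_d0", "weight_change_d0"],
    "previous_2_days": ["sbp_d1", "sbp_d2", "hr_d1", "hr_d2"],
    "rules_output": ["medly_alert"],
    "ehr": ["bnp", "phosphate", "platelets"],
}

# Feature groups used by each ablation study, as listed in the text.
ABLATION_CONFIGS = {
    1: ["same_day"],
    2: ["same_day", "rules_output"],
    3: ["same_day", "rules_output", "ehr"],
    4: ["same_day", "previous_2_days"],
    5: ["same_day", "previous_2_days", "rules_output"],
}


def columns_for(study):
    """Return the flat list of input columns for one ablation study."""
    columns = []
    for group in ABLATION_CONFIGS[study]:
        columns.extend(FEATURE_GROUPS[group])
    return columns
```

Each study would then train and evaluate the same model on `columns_for(study)` so that differences in PPV can be attributed to the feature groups that were added or withheld.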

After the first set of ablation studies, we performed another set on the EHR data. Here, we removed one-by-one each continuous variable from the EHR data set, re-trained the model, and tested the performance on the test data. In addition, we evaluated the PPV and F1-score (harmonic mean between precision and recall) after removing the variables. The hypothesis was that if the PPV and F1 score were reduced from the overall score after removing the variable, that variable might play an essential role in reducing the false positives. Thus, we performed 25 ablation studies and compared the performance of the algorithm in each instance.
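The one-variable-at-a-time procedure can be sketched as a simple loop. The `train_and_score` callable is a hypothetical stand-in for re-training the model on a column subset and returning a test-set metric such as PPV; `toy_scorer` uses made-up numbers purely to show the mechanics.

```python
def leave_one_out_ablation(ehr_columns, all_columns, train_and_score):
    """Re-train once per removed EHR variable and report the change in score.

    train_and_score(columns) -> float is a hypothetical callable that trains a
    model on the given columns and returns a test-set metric (e.g. PPV).
    """
    baseline = train_and_score(list(all_columns))
    deltas = {}
    for removed in ehr_columns:
        kept = [c for c in all_columns if c != removed]
        deltas[removed] = train_and_score(kept) - baseline
    # Most negative first: the variables whose removal hurts the score most.
    return sorted(deltas.items(), key=lambda item: item[1])


def toy_scorer(columns):
    """Made-up scorer: pretends BNP and phosphate each add to the PPV."""
    return 0.7 + 0.1 * ("bnp" in columns) + 0.05 * ("phosphate" in columns)


ranking = leave_one_out_ablation(
    ehr_columns=["bnp", "phosphate", "platelets"],
    all_columns=["hr_d0", "bnp", "phosphate", "platelets"],
    train_and_score=toy_scorer,
)
```

A variable that lands at the top of the ranking with a clearly negative change is a candidate for being important in reducing false positives, which is the hypothesis stated above.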

Performance analysis of the use of EHR data between Medly rules-based vs. XGBoost algorithm

We calculated the average and standard deviation of each continuous EHR variable for the estimated true negatives, false positives, true positives, and false negatives for both the Medly rules-based algorithm and the XGBoost algorithm. Then, we compared the trend of change in each EHR variable between the false positives estimated by the rules-based and XGBoost algorithms. When the average value of an EHR variable increased in the estimated false positives from rules-based to XGBoost, this indicated that XGBoost had used the lower values of that variable in the process of correcting false positives.
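The comparison can be sketched with the standard library. The `outcome` key, the function name, and the BNP values below are hypothetical; the numbers are made up solely to show why the mean among false positives rises when low-value incidents are no longer flagged.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_outcome(incidents, variable):
    """Mean and sample SD of one EHR variable within each outcome category."""
    grouped = defaultdict(list)
    for inc in incidents:
        value = inc.get(variable)
        if value is not None:  # skip missing lab values
            grouped[inc["outcome"]].append(value)
    return {
        outcome: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for outcome, values in grouped.items()
    }


# Made-up false positives from a first model, with BNP in pg/mL.
first_model = [{"outcome": "FP", "bnp": v} for v in (50, 80, 600, 900)]
# A second model that no longer flags the two low-BNP incidents.
second_model = [{"outcome": "FP", "bnp": v} for v in (600, 900)]

mean_first = summarize_by_outcome(first_model, "bnp")["FP"][0]
mean_second = summarize_by_outcome(second_model, "bnp")["FP"][0]
```

In this toy case the mean BNP among false positives rises from 407.5 to 750 because only the high-BNP incidents remain misclassified, mirroring the reasoning in the paragraph above.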

Results

Dataset

We had 11,636 and 11,161 incidents in the training and testing sets, respectively. Of the 11,636 incidents in the training set, 6,649 incidents were marked as “No Contact”, and 4,987 points were scored as “Contact”. In the 11,161 incidents in the testing set, 10,252 incidents were marked as “No Contact”, and 909 points were scored as a “Contact”.

Results of Medly rules-based algorithm

The original rules-based algorithm obtained 93.6% accuracy with 94% sensitivity and 93% specificity. However, the PPV was 55.8%, as the original rules-based algorithm was deliberately conservative to minimize missed decompensation episodes (Fig. 2).

Fig. 2.


Performance comparison between Medly rules-based algorithm vs. XGBoost algorithm shows that XGBoost can improve the positive predictive value (PPV) while maintaining similar sensitivity and accuracy.

Results of XGBoost algorithm

The XGBoost algorithm obtained 98.08% accuracy with 95.26% sensitivity and 98.86% specificity. The PPV was 88.18%. The confusion matrix showed 116 false positives, compared with 684 for the rules-based algorithm. Thus, it was evident that the XGBoost algorithm reduced the false positives with the aid of EHR data or the previous 2-day measurements (Fig. 2).

Interpretability analysis

Below are the results of each of the interpretability analyses, providing more details on the contributions and importance of each feature.

SHAP summary

We used a SHAP summary plot to investigate the relationship between the value of a feature and the impact on the prediction (Fig. 3-a). From the summary plot, it was clear that the Medly Rules-based algorithm played a dominant role in the final prediction from the XGBoost model.

Fig. 3.


(a) SHAP summary plot: SHAP summary plot showing the relative importance of input features. The Medly rules-based output is the top contributor, followed by key EHR markers like BNP and phosphate. The plot also shows directional effects of each feature (e.g., higher BNP increases likelihood of decompensation). (b) Ablation study results: Results of ablation studies comparing different input feature configurations. The inclusion of EHR data leads to notable improvement in PPV (from 0.69 to 0.86), and adding previous 2-day vitals further boosts performance to 0.88. This analysis supports the additive value of each feature category, particularly for reducing false positives.

From the EHR data, we found that higher BNP, lower platelets, having a history of chronic respiratory disease, lower total cholesterol, and lower LDL were associated with a greater likelihood of decompensation episodes (Fig. 3).

From the previous 2-days measurements, we found that the weight change, blood pressure, and heart rate played a vital role in predicting the risk of decompensation episodes.

Ablation studies

From the ablation studies, we found that the existing Medly rules-based algorithm output was significant in obtaining the high sensitivity and specificity of the XGBoost algorithm (Fig. 3-b). However, we obtained a PPV of 0.69 when using only same-day features + rules-based algorithm output as input to the XGBoost model (Fig. 3-b, 3rd column). After adding the EHR data as an input alongside the same-day features and rules-based algorithm output, the PPV increased to 0.86 (Fig. 3-b, 4th column). Adding the previous 2-day measurements to the same-day features, rules-based algorithm output, and EHR data (current proposed algorithm, Fig. 3-b, 1st column) further improved the PPV from 0.86 to 0.88. Overall, EHR data variables appeared to play a larger role in improving the PPV than the previous 2-day measurements.

In the next set of ablation studies, we removed one-by-one each continuous variable from the EHR data and evaluated the PPV and F1-score after removing the variables (Fig. 4-a and b). While using all variables, the PPV and F1 scores were 0.867 and 0.908, respectively. After removing BNP and phosphate, the PPV and F1 score reduced to 0.83 and 0.89, respectively. However, eliminating total cholesterol and platelets did not reduce the PPV rate.

Fig. 4.


Finer ablation studies: (a) Change in PPV when each EHR variable is individually removed. Removal of BNP causes the largest decline, confirming its strong predictive value; (b) Corresponding drop in F1 score (harmonic mean of precision and recall), again highlighting BNP and phosphate as critical variables; (c) Mean BNP values in incidents classified as false positives by both the Medly rules-based and XGBoost models. The increase in BNP among false positives identified by XGBoost indicates that the model correctly deprioritizes alerts when BNP levels are low, improving precision.

Performance analysis of the use of EHR data between Medly rules-based vs. XGBoost algorithm

We found that the proposed XGBoost algorithm reduced the false positives from the rules-based algorithm in estimating the decompensation episodes by incorporating the EHR data. Therefore, we compared how XGBoost was using EHR data in the process of correcting the false positives from the Medly rules-based algorithm. We found that the average BNP value in the false-positive incidents for XGBoost was higher than for the rules-based algorithm (Fig. 4-c). The average rose because incidents with lower BNP values were no longer among the estimated false positives. This indicated that XGBoost used the lower values of BNP in the process of correcting false positives.

Discussion

The most significant finding of this study was that implementing a machine learning-enhanced expert system with EHR data in Medly improved the overall prediction of HF decompensation episodes. As shown in Fig. 4, BNP and phosphate played vital roles in predicting decompensation episodes and correcting the false-positive incidents generated by the original Medly rules-based algorithm. Overall, our study has demonstrated the concept that EHR data should be incorporated in enhancing rules-based expert systems for the daily prediction of HF decompensation.

This approach presents some advantages over using machine learning or an expert system approach exclusively. Domain knowledge can be explicitly incorporated into the algorithm, improving its explainability and making the output more deterministic. This, in turn, creates more flexibility in updating the model and, as demonstrated, improves the model’s robustness.

Such approaches are not new, and have been demonstrated in heart disease prediction21, brain tumor classification22, and autism diagnosis23, incorporating other methods such as fuzzy logic and particle swarm optimization. It should also be noted that the machine learning component of the Medly AI algorithm is designed to use the most recent available laboratory results and continue to rely on them until the next scheduled test becomes available. This design ensures that the model functions entirely within existing clinical care pathways, without necessitating additional or more frequent testing. As a result, the model avoids introducing any added burden to patients or clinicians in terms of care demands, logistical complexity, or cost. This integration is particularly important in outpatient or ambulatory care settings, where lab tests may not be performed routinely, and real-time availability of EHR data can be limited. This approach will enhance decision-making and clinical efficiency without introducing friction into the care process or increasing the burden on the healthcare system.

HF is one of the most expensive chronic diseases24. An analysis in 2012 showed that the global cost of HF management was $108 billion per annum25, with $65 billion attributed to direct costs and $43 billion to indirect costs25. In the USA, the total cost of care (direct and indirect costs) for HF in 2020 is estimated at $43.6 billion, with over 70% of costs attributed to medical costs26. The majority of this cost is driven by hospitalization due to HF decompensation episodes24. HF decompensation also carries significant mortality, with 22–42% of patients dying within 1 to 5 years from the first HF-related hospitalization27. These outcomes can be improved with proper patient self-care and active clinical monitoring. A recent meta-analysis has shown that self-management interventions can reduce HF-related hospitalization or all-cause death and improve HF-related quality of life28. Proper self-care support can be achieved by accurately predicting HF decompensation on a daily basis while the patient is at home. Taken together, accurate prediction of HF decompensation is essential for better management of HF, which may reduce HF-related hospitalization and mortality.

Accurate prediction of HF decompensation can be achieved with automated computerized decision support systems, such as Medly. Furthermore, with the aid of machine learning and appropriate data, this type of clinical decision support system can be better equipped to handle the level of complexity inherent to patients with HF. Thus, a machine learning approach could potentially have a large impact on improving HF management. However, machine learning approaches in the field of HF have largely focused on predicting outcomes (such as medication adherence29,30 and hospital readmission31–35) in retrospective patient cohorts. To the best of our knowledge, no previous studies have used machine learning to optimize the management of patients with HF either in-hospital or using remote management strategies on a daily basis.

In this study, we improved the performance of an existing rules-based expert system and demonstrated the importance of integrating EHR data into the analysis. Our findings suggest that implementing the machine learning algorithm improved the accuracy, sensitivity, and specificity of the current Medly rules-based algorithm. Most importantly, the machine learning algorithm improved the PPV from 55.8% to 88.18%, indicating a substantial reduction in false positives without an increase in false negatives. A reduction in false positives could improve overall clinic efficiency without compromising safety, as no associated increase in false negatives was observed. Moreover, it may reduce the consumption of healthcare resources, including nurse and physician time and unnecessary follow-ups. Taken together, these findings suggest that a machine learning algorithm supported by EHR data and previous-day data improves the overall performance of Medly’s HF decompensation estimation.
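To make the relationship between false positives and PPV concrete, the following is a minimal Python sketch of the standard confusion-matrix metrics. The class name `ConfusionCounts` and all counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of alert decisions against the clinical ground truth."""
    tp: int  # alerts raised for true decompensation episodes
    fp: int  # alerts raised when no episode occurred
    tn: int  # no alert and no episode
    fn: int  # episodes that raised no alert

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


# Hypothetical counts, NOT the study's data: both classifiers miss the same
# number of episodes (fn), but the second raises far fewer false alerts (fp).
rules_like = ConfusionCounts(tp=95, fp=75, tn=825, fn=5)
ml_like = ConfusionCounts(tp=95, fp=13, tn=887, fn=5)

for name, c in (("rules-like", rules_like), ("ml-like", ml_like)):
    print(f"{name}: sens={c.sensitivity():.3f} spec={c.specificity():.3f} "
          f"ppv={c.ppv():.3f} acc={c.accuracy():.3f}")
```

Because PPV depends only on true and false positives, removing false alerts raises it even when sensitivity is unchanged, which is the pattern described above.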

We further performed an interpretability analysis of our results to understand which features played a significant role in predicting decompensation episodes and reducing false positives. Based on the Shapley summary analysis, we found that same-day and previous 2-day weight changes, heart rate, and blood pressure play a significant role in predicting decompensation episodes. These findings are in line with previous studies showing that blood pressure, weight, and heart rate can predict the early onset of a decompensation episode36–39. Overall, weight changes, heart rate, and blood pressure were the most important features for predicting decompensation episodes in this model.
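For readers unfamiliar with Shapley attribution, the sketch below computes exact Shapley values for a single prediction by enumerating feature coalitions. It is an illustration under stated assumptions, not the pipeline used in this study: the risk function `toy_risk`, its thresholds, the feature names, and the reference values are all invented, and practical SHAP analyses of tree ensembles rely on far more efficient algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) relative to predict(baseline).

    Features absent from a coalition take their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk(f):
    """Hypothetical risk score with invented thresholds (illustration only)."""
    score = 0.0
    if f["weight_change_kg"] >= 1.0:
        score += 0.4
        if f["bnp"] >= 400:  # interaction: weight gain counts more with high BNP
            score += 0.3
    if f["heart_rate"] >= 100:
        score += 0.2
    return score


patient = {"weight_change_kg": 1.5, "heart_rate": 105, "bnp": 600}
reference = {"weight_change_kg": 0.0, "heart_rate": 70, "bnp": 100}
phi = shapley_values(toy_risk, patient, reference)
print(phi)
```

The attributions sum to the difference between the patient's score and the reference score, and the interaction term is shared equally between the two features that jointly produce it.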

Among EHR data, higher BNP and phosphate, lower platelets, lower LDL, lower total cholesterol, and a history of cardiorespiratory diseases were important features in predicting decompensation episodes. It is well known that BNP levels increase in line with HF severity40. Myocardial stretch is a major stimulus for BNP secretion. The physiological effects of BNP include vasodilation, natriuresis, and inhibition of the sympathetic nervous system and the renin-angiotensin-aldosterone system, all of which are closely linked to HF. Taken together, BNP is a useful marker of the heart’s status during treatment for heart failure. By the same token, high phosphate levels contribute to heart failure, having been shown to influence mitochondrial dysfunction and myocardial energy metabolism remodeling41. Thus, the inclusion of BNP and phosphate in an automated algorithm is important and justified.

Previous studies have shown that lower total cholesterol, LDL, and HDL were associated with worse symptoms of HF and higher in-hospital mortality42,43. Although it is not clear why lower total cholesterol is associated with HF decompensation, multiple factors such as malnutrition and inflammation are thought to be possible underlying reasons. Overall, total cholesterol and LDL are important EHR variables that should be considered in predicting worsening HF.

It should be noted that, among the treatment options, only diuretics were relevant in correcting false positives based on the SHAP analysis.

Although previous-day measurements and EHR data both played a vital role in predicting decompensation episodes, EHR data played a more significant role in reducing the false positives of the Medly rules-based algorithm. Our analysis revealed that the machine learning algorithm took lower BNP values into account, which helped correct the false positives of the rules-based algorithm. This analysis further justifies incorporating EHR variables when predicting HF decompensation.

Our study is subject to some limitations. We used a retrospective dataset to develop and validate the machine learning algorithm. Hence, it will be essential to further validate this algorithm in a new prospective dataset before implementing it in the Medly platform.

Overall, this study showed the importance of implementing a machine learning algorithm for predicting HF decompensation. We also demonstrated the importance of incorporating EHR data, especially BNP and cholesterol, to predict decompensation episodes. Incorporating such EHR data lowered the false positives of the rules-based algorithm, which may substantially reduce the use of healthcare resources. Once validated in a prospective dataset, this algorithm could prove very useful in the management of HF patients.

Acknowledgements

The authors would like to acknowledge the support of the Vector Institute Pathfinder project and the Wolfond Chair in Digital Health.

Author contributions

Conceptualization: SS, HR, BW, JC. Data curation: CW, JV, PEV, CM. Formal analysis: SS, CW. Funding acquisition: HR, BW, JC. Investigation: SS, HR, BW, JC. Methodology: SS, HR, BW, JC. Project administration: SS. Resources, software: BW, JC. Supervision: HR, BW, JC. Validation, Visualization: SS, CW. Writing – original draft: SS. Writing – review & editing: SS, HR, E, CW, BW, JC.

Funding

Vector Institute Pathfinder Program, Wolfond Chair in Digital Health.

Data availability

The datasets generated and/or analysed during the current study are not publicly available because they contain personal health information, but are available from the corresponding author on reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Conflict of interest

Members of the research team (JC and HJR) hold intellectual property rights to the Medly system.

Ethics statement

All participants gave written informed consent before participating in the study (REB No: 19-5213). The study was performed in accordance with the approved guidelines and regulations.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Cowie, M. R. et al. Improving care for patients with acute heart failure: before, during and after hospitalization. ESC Heart Fail. 1 (2), 110–145 (2014).
  • 2. Gheorghiade, M. et al. Acute heart failure syndromes: current state and framework for future research. Circulation 112 (25), 3958–3968 (2005).
  • 3. Desai, A. S. & Stevenson, L. W. Rehospitalization for heart failure: predict or prevent? Circulation 126 (4), 501–506 (2012).
  • 4. White, M., Garbez, R., Carroll, M., Brinker, E. & Howie-Esquivel, J. Is teach-back associated with knowledge retention and hospital readmission in hospitalized heart failure patients? J. Cardiovasc. Nurs. 28 (2), 137–146 (2013).
  • 5. Cox, Z. L., Lai, P., Lewis, C. M. & Lenihan, D. J. Centers for medicare and medicaid services’ readmission reports inaccurately describe an institution’s decompensated heart failure admissions. Clin. Cardiol. 40 (9), 620–625 (2017).
  • 6. Rosseter, R. Nursing Shortage Fact Sheet (American Association of Colleges of Nursing, 2014).
  • 7. Ong, M. K. et al. Effectiveness of remote patient monitoring after discharge of hospitalized patients with heart failure: the better effectiveness after transition–heart failure (BEAT-HF) randomized clinical trial. JAMA Intern. Med. 176 (3), 310–318 (2016).
  • 8. Seto, E. et al. Developing healthcare rule-based expert systems: case study of a heart failure telemonitoring system. Int. J. Med. Informatics. 81 (8), 556–565 (2012).
  • 9. Seto, E. et al. Mobile phone-based telemonitoring for heart failure management: a randomized controlled trial. J. Med. Internet. Res. 14 (1), e31 (2012).
  • 10. Ware, P. et al. Patient adherence to a mobile Phone–Based heart failure telemonitoring program: A longitudinal Mixed-Methods study. JMIR mHealth uHealth. 7 (2), e13259 (2019).
  • 11. Ware, P. et al. Outcomes of a heart failure telemonitoring program implemented as the standard of care in an outpatient heart function clinic: Pretest-Posttest pragmatic study. J. Med. Internet. Res. 22 (2), e16538 (2020).
  • 12. Ware, P., Ross, H. J., Cafazzo, J. A., Laporte, A. & Seto, E. Implementation and evaluation of a smartphone-based telemonitoring program for patients with heart failure: mixed-methods study protocol. JMIR Res. Protocols. 7 (5), e121 (2018).
  • 13. Chen, T. & Guestrin, C. (eds) Xgboost: A scalable tree boosting system. Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining; (2016).
  • 14. Li, M., Fu, X. & Li, D. (eds) Diabetes prediction based on xgboost algorithm. IOP Conference Series: Materials Science and Engineering; : IOP Publishing. (2020).
  • 15. Li, S. & Zhang, X. Research on orthopedic auxiliary classification and prediction model based on XGBoost algorithm. Neural Comput. Appl. 32 (7), 1–9 (2019).
  • 16. Luckner, M., Topolski, B. & Mazurek, M. (eds) Application of XGBoost algorithm in fingerprinting localisation task. IFIP International Conference on Computer Information Systems and Industrial Management; : Springer. (2017).
  • 17. Ogunleye, A. & Wang, Q-G. XGBoost model for chronic kidney disease diagnosis. IEEE/ACM Trans. Comput. Biology Bioinf. 17 (6), 2131–2140 (2019).
  • 18. Torlay, L., Perrone-Bertolotti, M., Thomas, E. & Baciu, M. Machine learning–XGBoost analysis of Language networks to classify patients with epilepsy. Brain Inf. 4 (3), 159–169 (2017).
  • 19. Zhang, H. et al. Novel framework for image attribute annotation with gene selection XGBoost algorithm and relative attribute model. Appl. Soft Comput. 80, 57–79 (2019).
  • 20. Lundberg, S. M. & Lee, S-I. (eds) A unified approach to interpreting model predictions. Proceedings of the 31st international conference on neural information processing systems; (2017).
  • 21. Al Bataineh, A. & Manacek, S. MLP-PSO hybrid algorithm for heart disease prediction. J. Pers. Med. 12 (8), 1208 (2022).
  • 22. Celik, M. & Inik, O. Development of hybrid models based on deep learning and optimized machine learning algorithms for brain tumor Multi-Classification. Expert Syst. Appl. 238, 122159 (2024).
  • 23. Algaysi, M. E., Albahri, A. S. & Hamid, R. A. Evaluation and benchmarking of hybrid machine learning models for autism spectrum disorder diagnosis using a 2-tuple linguistic neutrosophic fuzzy sets-based decision-making model. Neural Comput. Appl. 36 (29), 18161–18200 (2024).
  • 24. Savarese, G. & Lund, L. H. Global public health burden of heart failure. Cardiac Fail. Rev. 3 (1), 7 (2017).
  • 25. Cook, C., Cole, G., Asaria, P., Jabbour, R. & Francis, D. P. The annual global economic burden of heart failure. Int. J. Cardiol. 171 (3), 368–376 (2014).
  • 26. Heidenreich, P. A. et al. Forecasting the impact of heart failure in the united states: a policy statement from the American heart association. Circulation Heart Fail. 6 (3), 606–619 (2013).
  • 27. Bytyçi, I. & Bajraktari, G. Mortality in heart failure patients. Anatol. J. Cardiol. 15 (1), 63 (2015).
  • 28. Jonkman, N. H. et al. Do self-management interventions work in patients with heart failure? An individual patient data meta-analysis. Circulation 133 (12), 1189–1198 (2016).
  • 29. Son, Y-J., Kim, H-G., Kim, E-H., Choi, S. & Lee, S-K. Application of support vector machine for prediction of medication adherence in heart failure patients. Healthc. Inf. Res. 16 (4), 253–259 (2010).
  • 30. Karanasiou, G. S. et al. Predicting adherence of patients with HF through machine learning techniques. Healthc. Technol. Lett. 3 (3), 165–170 (2016).
  • 31. Koulaouzidis, G., Iakovidis, D. & Clark, A. Telemonitoring predicts in advance heart failure admissions. Int. J. Cardiol. 216, 78–84 (2016).
  • 32. Mortazavi, B. J. et al. Analysis of machine learning techniques for heart failure readmissions. Circulation: Cardiovasc. Qual. Outcomes. 9 (6), 629–640 (2016).
  • 33. Zheng, B. et al. Predictive modeling of hospital readmissions using metaheuristics and data mining. Expert Syst. Appl. 42 (20), 7110–7120 (2015).
  • 34. Frizzell, J. D. et al. Prediction of 30-Day All-Cause readmissions in patients hospitalized for heart failure: comparison of machine learning and other statistical approaches. JAMA Cardiol. 2 (2), 204–209 (2017).
  • 35. Tripoliti, E. E., Papadopoulos, T. G., Karanasiou, G. S., Naka, K. K. & Fotiadis, D. I. Heart failure: diagnosis, severity Estimation and prediction of adverse events through machine learning techniques. Comput. Struct. Biotechnol. J. 15, 26–47 (2017).
  • 36. Joshi, R. & Gyllensten, I. C. Changes in daily measures of blood pressure and heart rate improve weight-based detection of heart failure deterioration in patients on telemonitoring. IEEE J. Biomedical Health Inf. 23 (3), 1041–1048 (2018).
  • 37. Zhang, J., Goode, K. M., Cuddihy, P. E., Cleland, J. G. & Investigators, T. H. Predicting hospitalization due to worsening heart failure using daily weight measurement: analysis of the Trans-European Network-Home-Care management system (TEN-HMS) study. Eur. J. Heart Fail. 11 (4), 420–427 (2009).
  • 38. Henriques, J. et al. Prediction of heart failure decompensation events by trend analysis of telemonitoring data. IEEE J. Biomedical Health Inf. 19 (5), 1757–1769 (2014).
  • 39. Hasan, A. & Paul, V. Telemonitoring in chronic heart failure. Eur. Heart J. 32 (12), 1457–1464 (2011).
  • 40. Yoshimura, M., Yasue, H. & Ogawa, H. Pathophysiological significance and clinical application of ANP and BNP in patients with heart failure. Can. J. Physiol. Pharmacol. 79 (8), 730–735 (2001).
  • 41. Turner, M. E. et al. Phosphate in cardiovascular disease: from new insights into molecular mechanisms to clinical implications. Arterioscler. Thromb. Vasc Biol. 44 (3), 584–602 (2024).
  • 42. Horwich, T. B., Hernandez, A. F., Dai, D., Yancy, C. W. & Fonarow, G. C. Cholesterol levels and in-hospital mortality in patients with acute decompensated heart failure. Am. Heart J. 156 (6), 1170–1176 (2008).
  • 43. Rauchhaus, M. et al. The relationship between cholesterol and survival in patients with chronic heart failure. J. Am. Coll. Cardiol. 42 (11), 1933–1940 (2003).

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group