PLOS One. 2022 Dec 22;17(12):e0279433. doi: 10.1371/journal.pone.0279433

Prediction of hospital readmission of multimorbid patients using machine learning models

Jules Le Lay 1,#, Edgar Alfonso-Lizarazo 2,#, Vincent Augusto 1,‡,*, Bienvenu Bongue 3,4, Malek Masmoudi 5, Xiaolan Xie 1, Baptiste Gramont 6, Thomas Célarier 4,7,8
Editor: Antonio De Vincentis
PMCID: PMC9779015  PMID: 36548386

Abstract

Objective

The objective of this study is twofold. First, we seek to understand the characteristics of the multimorbid population that needs hospital care by using all diagnoses information (ICD-10 codes) and two aggregated multimorbidity and frailty scores. Second, we use machine learning prediction models on these multimorbid patients characteristics to predict rehospitalization within 30 and 365 days and their length of stay.

Methods

This study was conducted on 8 882 anonymized patients hospitalized at the University Hospital of Saint-Étienne. A descriptive statistical analysis was performed to better understand the characteristics of the patient population. Multimorbidity was measured using raw diagnoses information and two specific scores based on clusters of diagnoses: the Hospital Frailty Risk Score and the Calderon-Larrañaga index. Based on these variables different machine learning models (Decision Tree, Random forest and k-nearest Neighbors) were used to predict near future rehospitalization and length of stay (LoS).

Results

The use of random forest algorithms yielded better performance to predict both 365 and 30 days rehospitalization and using the diagnoses ICD-10 codes directly was significantly more efficient. However, using the Calderon-Larrañaga’s clusters of diagnoses can be used as an efficient substitute for diagnoses information for predicting readmission. The predictive power of the algorithms is quite low on length of stay indicator.

Conclusion

Using machine learning techniques using patients’ diagnoses information and Calderon-Larrañaga’s score yielded efficient results to predict hospital readmission of multimorbid patients. These methods could help improve the management of care of multimorbid patients in hospitals.

Introduction

The management of care for multimorbid patients in hospitals is a rising concern in the scientific community. Multimorbid patients tend to have more complex needs and require coordinated care from several providers [1]. Multimorbidity, defined as the “co-occurrence of multiple chronic or acute diseases and medical conditions within one person” [2], is highly prevalent in Europe. Based on the Survey on Health, Aging and Retirement in Europe (SHARE), Nielsen et al. [3] found that 31.42% of the participants above 50 years old in 14 European countries and Israel were affected by multimorbidity. Southern Europe (Italy, Spain, France and Israel) and Northern Europe (Denmark, Sweden, and the Netherlands) had slightly lower multimorbidity prevalences (29.8% and 26.2%, respectively).

Currently, multiple research projects aim to improve the overall quality of care both inside and outside of healthcare centers by establishing dedicated care pathways for multimorbid patients [4–6].

Barnett et al. [7] reported an association between age, sex, deprivation and multimorbidity based on a list of 40 medical conditions. This list was built using policy recommendations and the important chronic conditions identified in [8]. However, counting conditions can be quite limiting and is a controversial way of measuring multimorbidity [9]. [10] highlighted the importance of using a standard measure of multimorbidity to analyze and compare the results of studies in which different scores have been built to describe multimorbidity.

The most common measure of multimorbidity is the Charlson comorbidity index score, originally introduced in 1987 [11], first updated in 1994 [12] and revised numerous times since to be applied to administrative databases [13, 14] or to predict other outcomes [15]. In a recent systematic review, [9] explored the different multimorbidity measures developed beyond simple counts of conditions. The hospital frailty risk score (HFRS), which uses weighted counts [16], and the Calderon-Larrañaga score, which counts clusters of diagnosis groups [17], are other ways to build a more efficient index.

Healthcare services can be monitored through several performance indicators. In this study we are interested in the patients’ readmission and length of stay (LoS) indicators. Rehospitalization (or readmission) can be defined as “an admission to a hospital within a certain time frame (which can be 7, 15, 30, 60, 90 days or even as long as a year) following an original (index) admission and discharge” [18]. According to [19], monitoring readmission and predicting the readmission of patients during their initial hospitalization are essential for two main reasons. First, authorities use this metric to evaluate and report the efficiency of healthcare centers, with a higher readmission count being associated with lower efficiency. Second, providing a clinically relevant readmission risk early in a hospitalization stay allows hospital workers to trigger preventive action and avoid a subsequent admission, reducing the consumption of medical supplies and improving the cost-effectiveness of the patients’ care. From a cost-effectiveness perspective, this metric is even more crucial for patients with additional chronic conditions, as comorbidities are associated with higher costs of care [20]. The LoS in hospitals is a key quality-of-care metric for patients and care providers; it relates to the occupancy rate of the service and is used to improve the care given.

Machine learning is a powerful set of data analysis techniques that identify and use patterns in data to make predictions without explicitly specifying the procedure. Its use in healthcare over the past years has been extensive for the prediction of outcomes, as shown in [21]. A scoping review recently covered the use of machine learning algorithms for the prediction of hospital readmission [22]. According to this review there is a relatively high interest in tree-based methods (decision trees, random forest and boosted tree methods), although other techniques such as neural networks and regularized logistic regression are also used. [23] predicted the LoS of stroke patients using J48 and a Bayesian network.

The objectives of the study presented in this article are 1) to describe the characteristics of elderly multimorbid patients in the studied hospital using diagnoses information, multimorbidity and frailty scores, and 2) to assess the ability of machine learning models to predict rehospitalization within 30 and 365 days and patients’ length of stay, using diagnoses information and two aggregative scores: the hospital frailty risk score and the Calderon-Larrañaga score.

Materials and methods

Data description

The data used in the present study were extracted from the anonymized patients’ electronic records of the hospital of Saint-Étienne (CHUSE) under ‘Commission Nationale Informatique et Libertés’ (CNIL) authorization number 919300, which also waived the need for consent [24]. An exception to the obligation to inform the patient was granted, as the effort required to contact all the patients involved in the study was deemed disproportionate. All patient data accessed during this research were anonymized. CHUSE is a university hospital at the heart of the regional healthcare network, the Groupement Hospitalier de la Loire. In 2019, CHUSE had more than one hundred thousand stays in one of its thousand beds in the medicine, surgery and obstetrics areas.

We focused on adult patients above 60 years old hospitalized in the CHUSE and discharged in 2017 with diagnoses in 2 different chapters of the ICD-10 classification. We excluded patients following a highly controlled care pathway, such as dialysis or outpatient surgery, except when this outpatient care led to an extended hospital stay.

For each stay we analyzed variables related to the general information of the patient as well as information concerning their care pathway. These variables are listed below:

  • anonymous patient identifier

  • age and sex of the patient

  • service sequence

  • date of admission in each unit

  • length of stay (LoS) at each service of the sequence

  • total length of stay (sum of the length of stays in individual services)

  • admission modality and origin of the patient

  • discharge modality and destination of the patient

  • the list of diagnoses made at each service

From this list we were able to predict the same-hospital readmission within 30 and 365 days by linking the different stays of a unique patient using his/her anonymous identifier.
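As an illustration, this linkage step can be sketched with pandas; the column names and dates below are hypothetical, not the actual CHUSE schema:

```python
import pandas as pd

# Hypothetical stay-level records linked by the anonymous patient
# identifier (column names are illustrative, not the CHUSE schema).
stays = pd.DataFrame({
    "pat_id": [1, 1, 2, 2, 3],
    "admission": pd.to_datetime(["2017-01-05", "2017-01-20", "2017-02-01",
                                 "2017-09-15", "2017-03-10"]),
    "discharge": pd.to_datetime(["2017-01-10", "2017-01-25", "2017-02-06",
                                 "2017-09-20", "2017-03-12"]),
})

# Sort each patient's stays chronologically, then compare the next
# admission date with the current discharge date.
stays = stays.sort_values(["pat_id", "admission"]).reset_index(drop=True)
next_adm = stays.groupby("pat_id")["admission"].shift(-1)
delay = (next_adm - stays["discharge"]).dt.days  # NaN for a patient's last stay

# Comparisons with NaN evaluate to False, so a patient's last recorded
# stay is counted as non-readmitted.
stays["re_hosp30"] = delay <= 30
stays["re_hosp365"] = delay <= 365
```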

As previously mentioned, there are numerous methods in the literature to describe multimorbidity. To compare the different approaches, we decided in this study to capture multimorbidity by using all diagnoses information and by using two multimorbidity scores: the hospital frailty risk score [16] and the Calderon-Larrañaga score [17].

The hospital frailty risk score [16] was developed to identify older patients presenting frailty diagnoses. A higher risk score is associated with a higher risk of adverse outcomes and a higher use of medical resources. Weights were calculated using a logistic regression targeting the identified frail population and were validated for the prediction of adverse outcomes.
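The weighted-count principle behind the HFRS can be sketched as follows; the weights shown here are placeholders chosen for illustration, not the published HFRS values:

```python
# Illustrative HFRS-style computation: each frailty-related 3-digit
# ICD-10 group carries a weight, and a patient's score is the sum of
# the weights of the groups present among their diagnoses. The weights
# below are placeholders, NOT the published HFRS values.
HFRS_WEIGHTS = {"F00": 7.1, "G81": 4.4, "R26": 3.2, "W19": 3.2, "I67": 1.1}

def hfrs(diagnoses):
    """Sum the weight of each distinct frailty group found among the
    patient's ICD-10 codes (truncated to 3 digits)."""
    groups = {d[:3] for d in diagnoses}
    return sum(w for code, w in HFRS_WEIGHTS.items() if code in groups)

score = hfrs(["F001", "R260", "E11"])  # two weighted groups present
```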

Calderon-Larrañaga [17] explored a different approach. To build this multimorbidity score, [17] gathered a panel of medical experts to group diagnoses into categories according to “clinical criteria and relevance (pathophysiological pathway, treatment, prognosis, and prevalence)” and defined the score of a patient as the number of categories where the patients had at least one diagnosis.
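A minimal sketch of this category-counting principle, with a toy category mapping (the published mapping covers far more codes):

```python
# Sketch of the Calderon-Larrañaga index: diagnoses are mapped to
# expert-defined disease categories, and the score is the number of
# distinct categories in which the patient has at least one diagnosis.
# The mapping below is a toy example, not the published grouping.
CATEGORY = {"I10": "hypertension", "I25": "ischemic heart disease",
            "E11": "diabetes", "E66": "metabolic", "E78": "metabolic"}

def calderon_larranaga(diagnoses):
    """Count distinct categories with at least one active diagnosis."""
    return len({CATEGORY[d] for d in diagnoses if d in CATEGORY})

score = calderon_larranaga(["I10", "E66", "E78", "J96"])
```

Here E66 and E78 fall into the same category, so they contribute only once to the score.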

In order to capture multimorbidity using all diagnosis information, we built a database containing the exhaustive list of diagnoses made for each patient during their stay (3-digit ICD-10 codes), allowing a comparison with the performance obtained using thematic groups of diagnoses.

The Charlson comorbidity index score was computed for comparative purposes using the ICD-10 translation of the index established by [13]; however, this score was not used for predictive procedures.

Statistical analysis methodology

The use of different measures to synthesize multimorbidity in the literature and the tendency to use exhaustive data in process mining raises the question of the relevance of using aggregated scores in advanced statistical procedures. For this reason we compare the ability of several machine learning algorithms to predict readmission within 30 and 365 days and length of stay in the elderly multimorbid patient population using all diagnosis information and two multimorbidity scores: the hospital frailty risk score (HFRS) [16] and the Calderon-Larrañaga score [17]. In our study, we also use different categories of the Calderon-Larrañaga’s and HFRS, to perform our statistical analysis and assess the loss of predictive power when aggregating the scores. For the Calderon-Larrañaga score we use the categories, and for HFRS we use groups of diagnoses having an equal weight in the score’s calculation. We will refer to these nonaggregated versions of the scores as Calderon-Larrañaga portfolio and HFRS portfolio in the remainder of the paper. In the portfolio versions of the measures the category information was coded as a binary value equal to 1 if the category/diagnosis was active for the patient. We chose HFRS because it was designed to predict frailty and was validated for the prediction of 30-day readmission [16]. As mentioned before we built a database containing an exhaustive list of diagnoses made during the patients’ stay (3-digit ICD-10 codes).

It is important to note that for the readmission analysis we excluded patients who died during their initial hospitalization, as readmission is not applicable for deceased patients. This resulted in the exclusion of 780 patients from the readmission database. However, we kept the data from these patients to predict length of stay. Palliative care or complex pathologies and care resulting in the death of the patient might be associated with longer hospital stays.

However, it is also important to note that for this study we only have access to the patients’ data during their hospitalization, which means that we do not have access to their status after one year.

The machine learning methods used in this study for the prediction of hospital readmission were a decision tree classifier, a random forest classifier and a k-nearest neighbors classifier.

For the length of stay prediction we used a tree regressor and a random forest regressor. All experiments were performed using Python 3.7, pandas [25, 26] and scikit-learn [27].

For all learning experiments, the data were split between training and testing samples, with the training sample representing 75% of the original dataset. We used a grid search with cross-validation to parametrize the learning algorithms. The parameters that were tested and optimized were the depth and number of leaves for the decision tree approach, the depth, number of leaves and number of estimators for the random forest approach and the algorithm, leaf size and number of neighbors for the k-nearest neighbors approach.
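This protocol can be sketched with scikit-learn as follows, using synthetic data and an illustrative (much smaller) parameter grid than the one actually searched:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the stay-level feature matrix.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# 75% of the data for training, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.75, random_state=0)

# Grid over depth, number of leaves and number of estimators for the
# random forest approach (grid values here are illustrative).
param_grid = {"max_depth": [4, None],
              "max_leaf_nodes": [32, None],
              "n_estimators": [25, 50]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="f1")
search.fit(X_tr, y_tr)
test_f1 = search.score(X_te, y_te)  # F1-score on the held-out 25%
```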

As previously mentioned, we used the patients’ anonymous identifiers to link their stays. In addition to diagnosis information, we used the age, sex, length of stay, ED admission information and number of steps in the pathway to predict same-hospital readmission within 30 and 365 days after discharge of the initial stay from the inpatient database. The same variables, plus the residential zip code and excluding the length of stay itself, were used for the prediction of length of stay.

The class size of readmitted patients within 30 days (1 965, or 15.86% of the stays) appeared to be far smaller than that of nonreadmitted patients. To address these imbalanced classes, we performed resampling techniques on the dataset using the imbalanced-learn Python package presented in [28] and used appropriate metrics to evaluate the results.

Resampling is a method that changes the composition of the dataset to allow training on a balanced dataset. Some techniques delete samples from the majority class, others generate samples of the minority class, and some combine the two. We used the imbalanced-learn Python package presented in [28]. The different methods were applied to train the learning algorithms on the balanced dataset, and we selected the best performing combination of resampling and learning algorithm. For the prediction of hospital readmission within 30 days we used the same 3 classifier algorithms: decision tree, random forest and k-nearest neighbors.
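Random undersampling, one of the techniques tested, can be sketched without the package as follows (imbalanced-learn's `RandomUnderSampler().fit_resample(X, y)` implements the same idea); the data here are a toy stand-in mirroring the ~16% positive rate of the 30-day readmission outcome:

```python
import numpy as np

rng = np.random.default_rng(0)

# Imbalanced toy labels: 16% positives, mirroring the 30-day
# readmission rate reported in the study.
y = np.array([1] * 16 + [0] * 84)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature matrix

# Random undersampling: keep all minority samples and draw an equal
# number of majority samples without replacement.
minority = np.flatnonzero(y == 1)
majority = rng.choice(np.flatnonzero(y == 0), size=len(minority), replace=False)
keep = np.concatenate([minority, majority])
X_bal, y_bal = X[keep], y[keep]  # balanced training set
```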

Some metrics, such as accuracy, are not an appropriate target for assessing the performance of algorithms on unbalanced datasets. Therefore, we decided to focus on the F1-score, a weighted average of precision and recall, and the receiver operating characteristic area under the curve (ROC-AUC), which compares sensitivity and specificity. These give a better understanding of the classifiers’ efficiency.

To evaluate the regression algorithms, we used two well-known metrics: mean absolute error and mean squared error. The classifiers’ performances were assessed using accuracy (percentage of correctly predicted instances) and F1-score for rehospitalization within 365 days, and receiver operating characteristic area under the curve and F1-score for rehospitalization within 30 days, where patients were unevenly distributed between positive and negative classes.
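All four metrics are available in scikit-learn; a toy sketch with made-up predictions:

```python
import numpy as np
from sklearn.metrics import (f1_score, roc_auc_score,
                             mean_absolute_error, mean_squared_error)

# Toy predictions for a binary readmission outcome.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.2, 0.4, 0.8, 0.3, 0.1, 0.9])  # classifier scores
y_pred = (y_prob >= 0.5).astype(int)

f1 = f1_score(y_true, y_pred)        # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_prob)  # threshold-free sensitivity/specificity trade-off

# Toy predictions for length of stay (in days).
los_true = np.array([3.0, 10.0, 7.0])
los_pred = np.array([4.0, 8.0, 7.0])
mae = mean_absolute_error(los_true, los_pred)  # mean of |error|
mse = mean_squared_error(los_true, los_pred)   # mean of error squared
```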

Database

Table 1 presents the general variables used by the prediction algorithms. Additionally, the algorithms use information on the diagnoses made to the patients. Five versions of this database were coded in order to compare the diagnoses information and aggregated multimorbidity scores:

Table 1. General variables regarding the patient and descriptive information on their values.

Variable Nature Signification Mean Range (min-max) Std dev
pat_id Integer Anonymous patient identifier - - -
sex Binary Sex of the patient (1 = M) - - -
age Integer Age of the patient (in years) 76.35 60–104 9.28
duration Integer Total length of stay of the patient in hospital (in days) 10.66 0–250 11.81
geo Zip Code Residential zip code of the patient - - -
nb_steps Integer Number of medical units visited during pathway 1.68 1–14 1.02
mortalitya Binary True if the patient died during the index stay 0.05 - -
ed_adm Binary True if the patient was admitted through the ED (Emergency department) 0.36 - -
re_hosp30 Binary True if the patient was readmitted within 30 days after index hospitalization 0.16 - -
re_hosp365 Binary True if the patient was readmitted within 365 days after index hospitalization 0.47 - -

a Used as a predictive feature only for length of stay.

  1. using raw information on the diagnoses: one column per diagnosis, with binary values;

  2. using Calderon-Larrañaga score;

  3. using Calderon-Larrañaga portfolio (as explained in section Statistical Analysis Methodology);

  4. using the HFRS score; and

  5. using the HFRS portfolio.

These features were used to train and test prediction models targeting (i) readmission within 365 days, (ii) readmission within 30 days after the end of the initial hospitalization and (iii) the total length of stay.
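Version 1 (raw diagnoses) corresponds to a one-hot, stay-by-code matrix, which can be built with pandas; the stay identifiers and codes below are illustrative:

```python
import pandas as pd

# Stay-level diagnosis list in long format (illustrative codes).
diags = pd.DataFrame({
    "stay_id": [1, 1, 2, 2, 2, 3],
    "icd10":   ["I10", "E11", "I10", "I48", "N18", "E66"],
})

# Version 1 of the database: one binary column per 3-digit ICD-10 code,
# equal to 1 if the code was diagnosed during the stay.
binary = (pd.crosstab(diags["stay_id"], diags["icd10"]) > 0).astype(int)
```

The portfolio versions of the scores are built the same way, with one binary column per category or diagnosis group instead of per ICD-10 code.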

Results

Epidemiological description

As specified above, we included patients above 60 years old, with diagnoses in 2 different chapters of the ICD-10 classification, who were discharged in 2017. This database includes 12 391 hospital stays of 8 882 unique patients (4 541 male patients and 4 341 female patients). The distribution of ages for this population is shown in Fig 1, with male patients on the left side of the graph and female patients on the right side.

Fig 1. Age pyramid of the multimorbid patients of the hospital of Saint-Étienne (male patients on the left side of the graph and female patients on the right side of the graph).


We observe that the mean age for both male and female patients is quite high (74.73 and 77.93 years, respectively), with medians of 74 and 79 years, respectively. In our sample, older multimorbid people thus appear more susceptible to hospitalization.

We calculated the age-adjusted Charlson comorbidity index score [12], the Calderon-Larrañaga score [17] and the HFRS [16] for each patient. The mean age-adjusted Charlson’s score was 5.65, the mean Calderon-Larrañaga’s score was 4.94 and the mean HFRS was 6.32. The mean absolute difference between age-adjusted Charlson comorbidity and Calderon-Larrañaga scores was 2.39, which shows how the different scores highlight different multimorbidity profiles.

Table 2 shows the ten ICD-10 codes that are the most frequently diagnosed in the 12 391 stays of the database. Eight of these 10 codes refer to chronic diseases; among them we can notice hypertension, which appears in 51.6% of the stays, type 2 diabetes mellitus (25.7%), overweight and obesity (13.4%), and chronic ischemic heart diseases (13.3%).

Table 2. The ten most frequent diagnoses appearing in the database.

ICD-10 Code diagnoses Number of occurrences
I10 Essential hypertension 6 391
E11 Type 2 diabetes mellitus 3 189
I48 Atrial fibrillation 2 681
E78 Disorders of lipoprotein metabolism a 2 512
N18 Chronic kidney disease 2 063
I50 Heart failure 2 035
J96 Respiratory failure 1 718
E66 Overweight and obesity 1 658
I25 Chronic ischemic heart disease 1 643
Z74 Problems related to care-provider dependency 1 626

a E78: “Disorders of lipoprotein metabolism and other lipidemias”, contains more specific sub-categories, such as “E780: Pure hypercholesterolemia”, “E781: Pure hyperglyceridemia”.

Prediction of readmission within 365 days

The results obtained from the different combinations of algorithms and multimorbidity measures are displayed in Table 3. We display the mean observed value and the 95% confidence interval calculated using a bootstrap method.
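Such a bootstrap confidence interval can be computed as follows; the sample size, metric and number of resamples here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-instance correctness of a hypothetical classifier on a test set
# (~70% accuracy on 1 000 instances).
correct = rng.random(1000) < 0.7

# Bootstrap: resample the test set with replacement many times and take
# the 2.5th and 97.5th percentiles of the metric's distribution.
boot = [correct[rng.integers(0, len(correct), len(correct))].mean()
        for _ in range(2000)]
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
```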

Table 3. Accuracy, F1-score and computation times obtained for the prediction of readmission within 365 days.

Metric Accuracy F1-score Computation time
DT1—All Diags 0.597 [0.580–0.614] 0.540 [0.518–0.563] 557.7s
RF2—All Diags 0.826 [0.811–0.840] 0.812 [0.794–0.829] 5 291.4s
KNN3—All Diags 0.549 [0.532–0.568] 0.551 [0.532–0.572] 335.1s
DT—CL4 score 0.547 [0.529–0.564] 0.527 [0.505–0.551] 6.7s
RF—CL score 0.616 [0.598–0.634] 0.632 [0.612–0.651] 1 472.2s
KNN—CL score 0.534 [0.517–0.552] 0.542 [0.521–0.563] 2.8s
DT—CL portfolio 0.595 [0.578–0.613] 0.570 [0.548–0.592] 29.5s
RF—CL portfolio 0.730 [0.716–0.748] 0.704 [0.685–0.725] 1 754.8s
KNN—CL portfolio 0.544 [0.527–0.560] 0.551 [0.528–0.560] 98.9s
DT—HFRS score 0.553 [0.536–0.570] 0.523 [0.502–0.545] 8.2s
RF—HFRS score 0.594 [0.577–0.611] 0.607 [0.587–0.627] 1 581.6s
KNN—HFRS score 0.550 [0.531–0.567] 0.568 [0.544–0.588] 21.8s
DT—HFRS portfolio 0.576 [0.559–0.594] 0.583 [0.562–0.602] 18.3s
RF—HFRS portfolio 0.719 [0.702–0.734] 0.696 [0.676–0.715] 1 583.3s
KNN—HFRS portfolio 0.533 [0.515–0.552] 0.548 [0.527–0.568] 96.3s

1 DT = Decision Tree,

2 RF = Random forest,

3 KNN = K-nearest neighbors,

4 CL = Calderon-Larrañaga.

Random forest appears to be the best performing algorithm. The best performance measures, accuracy (0.826) and F1-score (0.812), are obtained using all diagnosis information. Using the aggregated multimorbidity and frailty scores yields lower performance. However, using the components of the scores (portfolios) improves performance: with the Calderon-Larrañaga portfolio we obtain an accuracy of 0.730 and an F1-score of 0.704, and with the HFRS portfolio we obtain 0.719 and 0.696, respectively, using the random forest classifiers. Similar behavior is observed with the KNN classifiers.

Although using all diagnostic information provides the best performance, its computation time is higher than that of the aggregated multimorbidity scores.

A good time-performance trade-off is obtained by using the Calderon-Larrañaga portfolio and random forest, as the decrease in accuracy and F1-score is small (−0.096 and −0.108, respectively), with a net improvement in the computation time. Using the HFRS portfolio and random forest is the second best option. These performance gains can be explained by the thematic groups of diagnoses created by the experts specifically for the score, which likely make the features easier for the algorithms to exploit. Both metrics indicate that the random forest algorithm, used on the Calderon-Larrañaga and HFRS portfolios, is a viable alternative for the prediction of rehospitalization within 365 days.

Prediction of readmission within 30 days

As presented in the Statistical analysis methodology section, we implemented resampling methods to train the algorithms on balanced training sets before testing them on unbalanced sets. We used oversampling techniques (random oversampling, the synthetic minority oversampling technique or SMOTE, and its variations), undersampling techniques (random undersampling, near miss, condensed nearest neighbors, Tomek links, edited nearest neighbors), as well as mixed methods (SMOTE combined with edited nearest neighbors (SMOTE-ENN) or SMOTE combined with Tomek links). All related results are available in Fig 2 of the appendix “Results of the resampling techniques used to predict the within 30 day readmission”.

Fig 2. Results of the different resampling techniques and classifiers for the prediction of readmission within 30 days.


We singled out the most efficient combinations of classifier and resampling techniques for each dataset to perform a grid search with cross validation when considering the combination of F1-score and ROC-AUC. For the five datasets, it appeared that random undersampling was the most efficient technique.

The performance is higher when the algorithm considers all information related to the patients’ diagnoses, as seen in Table 4. For example, the combination of random forest and random undersampling with all diagnoses information provides an ROC-AUC score of 0.625 with a 95% confidence interval (CI) of [0.602–0.649] on the testing samples. Similar to the results for readmission within 365 days, using the Calderon-Larrañaga portfolio is more efficient than using the aggregated score alone. With an ROC-AUC score of 0.594 [0.573–0.620], this algorithm gives the best result after all diagnoses and is significantly more efficient than the experiments using the HFRS. The ROC curves and calibration curves are displayed in Fig 3. We note that the results are quite different for the HFRS, where the ROC-AUC and F1-score are comparable for the two versions of this multimorbidity index (the HFRS score and HFRS portfolio).

Table 4. Accuracy, F1-score and ROC-AUC results obtained for the prediction of readmission within 30 days.

Best combination Set Accuracy F1-score ROC AUC
All diags: RF and RU1 Balanced TrS2 0.891 [0.879–0.902] 0.889 [0.877–0.901] 0.891 [0.879–0.902]
TrS 0.703 [0.694–0.712] 0.500 [0.483–0.515] 0.772 [0.763–0.781]
TeS3 0.603 [0.586–0.620] 0.358 [0.328–0.387] 0.625 [0.602–0.649]
CL score: RF and RU Balanced TrS 0.636 [0.620–0.652] 0.649 [0.632–0.667] 0.636 [0.619–0.653]
TrS 0.544 [0.535–0.555] 0.334 [0.319–0.349] 0.596 [0.583–0.608]
TeS 0.526 [0.507–0.545] 0.307 [0.275–0.334] 0.565 [0.535–0.589]
CL portfolio: RF and RU Balanced TrS 0.829 [0.816–0.842] 0.834 [0.822–0.848] 0.829 [0.817–0.842]
TrS 0.634 [0.624–0.645] 0.444 [0.427–0.460] 0.725 [0.713–0.735]
TeS 0.556 [0.537–0.575] 0.331 [0.304–0.358] 0.594 [0.573–0.620]
HFRS: RF and RU Balanced TrS 0.633 [0.616–0.649] 0.653 [0.634–0.670] 0.633 [0.615–0.650]
TrS 0.535 [0.525–0.546] 0.335 [0.320–0.349] 0.597 [0.583–0.610]
TeS 0.510 [0.491–0.532] 0.304 [0.275–0.328] 0.561 [0.536–0.584]
HFRS’ portfolio: RF and RU Balanced TrS 0.835 [0.821–0.850] 0.830 [0.813–0.843] 0.835 [0.822–0.849]
TrS 0.645 [0.644–0.664] 0.440 [0.422–0.455] 0.713 [0.702–0.723]
TeS 0.562 [0.544–0.580] 0.309 [0.282–0.336] 0.571 [0.547–0.594]

1 RU: Random Undersampling,

2 TrS: Training Set,

3 TeS: Testing Set.

Fig 3. ROC curves and calibration curves for the prediction of readmission within 30 days.


Prediction of length of stay

Overall, the random forest algorithm performed better than the decision tree, with an improvement of 10 to 30 days² in mean squared error over the decision tree results. This improvement is most significant when using all available diagnoses (−0.802 MAE and −30.462 MSE). These results cannot be considered conclusive, as the best mean absolute error is still above 5 days, which represents half of the mean length of stay in the database. The raw results are displayed in Table 5.

Table 5. Length of stay prediction results.

Algorithm MAE MSE
Decision Tree—All diags 6.010 103.149
Random forest—All diags 5.208 72.687
Decision Tree—Calderon-Larrañaga’s score 6.297 97.854
Random forest—Calderon-Larrañaga’s score 6.146 88.163
Decision Tree—Calderon-Larrañaga’s portfolio 5.911 82.767
Random forest—Calderon-Larrañaga’s portfolio 5.894 81.903
Decision Tree—HFRS score 5.849 104.728
Random forest—HFRS score 5.609 77.532
Decision Tree—HFRS portfolio 6.137 100.737
Random forest—HFRS portfolio 5.728 82.811

Discussion

In this study we included patients based on the number of diagnoses in different ICD-10 categories, before applying different scores (the Charlson comorbidity score, Calderon-Larrañaga score and HFRS).

The mean age of the studied patients is 76 years. In addition, female multimorbid patients tend to be older than male patients. The mean age-adjusted Charlson morbidity score was 5.65, and the mean Calderon-Larrañaga score was 4.94. The most prevalent diseases diagnosed in patients were hypertension and type 2 diabetes mellitus.

We built 5 versions of our database for each outcome, each taking into account various information on the diagnosis. The raw diagnosis information was compared to 2 aggregation scores: the HFRS, a score built to measure the frailty of patients based on diagnoses information, and the Calderon-Larrañaga score, a multimorbidity score based on groups of diagnoses.

The prediction of medium-term readmission was efficient, with the best score achieved using random forest and all diagnoses information (an accuracy of 0.826 [0.811–0.840] and an F1-score of 0.812 [0.794–0.829]). Using the Calderon-Larrañaga portfolio resulted in a slight decrease in performance for both indicators (−0.096 and −0.108, respectively). We believe that the clusters of diseases used in the Calderon-Larrañaga portfolio can serve as an efficient substitute for diagnoses information when predicting readmission within 365 days after initial discharge.

We tested multiple resampling solutions to account for the imbalance in the rehospitalization within 30 days dataset. We obtained at best an ROC-AUC score of 0.625, which is acceptable, although slightly lower than recent results in the literature: [22] reported a median AUC of 0.68 across the studies they identified. The accuracy on the unbalanced testing set is 0.603. When using the combination of Calderon-Larrañaga portfolio, random forest and random undersampling, we obtained a mean ROC-AUC score of 0.594 and a mean accuracy of 0.556. The use of the Calderon-Larrañaga portfolio can be a viable alternative for the prediction of within 30 day readmission on a medico-administrative database.

Overall, the predictive power of our algorithms is quite low for length of stay prediction. The use of a random forest regressor gave the best results for the two metrics used, MAE and MSE. Both the HFRS and the Calderon-Larrañaga score were outperformed by the use of all information on diagnosis. However, the HFRS performed better than the Calderon-Larrañaga score. The HFRS seems to be an acceptable alternative to the use of exhaustive information on diagnosis with a random forest algorithm for predicting length of stay.

In general, the experiments show that for readmission within 365 days, using all diagnoses gives the best results, while the Calderon-Larrañaga score and HFRS give comparable results. The Calderon-Larrañaga portfolio gives significantly better results than the experiments using the HFRS score and portfolio for the prediction of readmission within 30 days, but is still outperformed by the random forest with random undersampling on all diagnosis information. The different experiments show that the machine learning algorithms for the prediction of length of stay give at best a mean absolute error of 5.208 days and a mean squared error of 72.687 (random forest used with all diagnosis information, see Table 5).

Calderon-Larrañaga portfolio is a standardized and thorough tool that accounts for most ICD-10 codes and chronic conditions, and we believe that it gives a quite accurate view on multimorbidity. Thus, it can be a good alternative to using the raw information on diagnoses for predicting readmission within 365 and 30 days.

The main limitations of this study are related to the lack of information on the vital status of patients after their hospitalization. This could represent a bias, as it is possible that patients died between the initial discharge and the end of the period of interest. Thus, the absence of a hospital stay within 30 or 365 days after discharge may be caused by the death of the patient.

A key component of multimorbidity according to Barnett et al. [7] that we could not grasp in this study is the socioeconomic aspect. Similar to [7], we can use geographical information as a proxy, but we do not have access in the present database to an evaluation of the socioeconomic status per area. In addition, we have access only to the residential zip-code of the patient, which can cover quite a large area and hide many disparities between patients. A favorable familial situation, with an available caregiver, is a key component of the support and recovery of patients, and this information is not available to us. We believe that these two key components of care for multimorbid patients would be a valuable addition when using the methodology presented in this paper.

Conclusion

In this study we described the general characteristics of the multimorbid population hospitalized at the CHUSE in 2017 using diagnosis information and multimorbidity and frailty scores. Various machine learning techniques were then used to predict key components of the hospitalization pathway: the length of stay and rehospitalization within 30 and 365 days after initial discharge.

Random forest algorithms were the most efficient for predicting all three outcomes. Using all diagnosis information gave better results, at the price of high computation times, and the Calderon-Larrañaga portfolio is an efficient alternative for predicting rehospitalization within 30 and 365 days. For LoS prediction, however, the HFRS metrics gave the best results.

For future research, we intend to apply prediction techniques to patient data to single out complicated pathways and to combine this with the results obtained here in a discrete-event simulation model. Our goal is to evaluate the organizational impact of redirecting those patients toward a newly created unit dedicated to multimorbid patients. It would also be of interest to combine the approach of this paper with prospective data collected at admission, which would provide additional valuable information, such as socioeconomic data and familial context, while respecting patient anonymity. This could allow us to evaluate whether a patient is fit for the new multimorbid pathway and to track the decision process of the care team when deciding to which unit the patient should be routed.

Appendices: Results of the resampling techniques used to predict readmission within 30 days

Fig 2 shows the results obtained when testing the different resampling techniques. The algorithms were trained on a dataset balanced with the specified resampling technique, using a random search with cross-validation from the scikit-learn package [27]. Only the ROC-AUC score on the testing set is displayed here.
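The training procedure described above (balance the training data with a resampling technique, then tune hyperparameters by random search with cross-validation) can be sketched roughly as follows. The dataset, the choice of random undersampling, and the parameter grid are illustrative assumptions, not the study's actual configuration:

```python
# Sketch: rebalance the training data, then run scikit-learn's
# RandomizedSearchCV over a small (hypothetical) hyperparameter grid.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(600, 30)).astype(float)   # toy diagnosis flags
y = (rng.random(600) < 0.2).astype(int)                # imbalanced label

# Random undersampling: keep all positives, sample an equal number of negatives.
pos = np.flatnonzero(y == 1)
neg = rng.choice(np.flatnonzero(y == 0), size=len(pos), replace=False)
idx = np.concatenate([pos, neg])

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [50, 100, 200],
                         "max_depth": [3, 5, None]},
    n_iter=5, scoring="roc_auc", cv=3, random_state=0,
)
search.fit(X[idx], y[idx])   # search runs on the balanced set
print(search.best_params_, round(search.best_score_, 3))
```

An alternative design is to place the resampler inside a pipeline (e.g. with imbalanced-learn [28]) so that resampling is refit on each cross-validation fold rather than once up front.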

Appendices: ROC and calibration curves for the prediction of readmission within 30 days

Fig 3 displays the ROC and calibration curves for the five best performing algorithms for predicting 30-day all-cause readmission. We used functions from the scikit-learn package [27] to generate these curves on the testing set.

Data Availability

The data cannot be shared publicly. Data are available from the CNIL for researchers who meet the criteria for access to confidential data (data accessed with CNIL authorization number 919300). CNIL web site: https://www.cnil.fr/.

Funding Statement

This work was funded by the ‘Agence Nationale de la Recherche’, which funded the thesis on the study of multimorbid pathways under grant number ANR-18-CE19-0016 (thesis of author JLL, project manager VA). https://anr.fr/ The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Rijken M, Struckmann V, van der Heide I, Hujala A, Barbabella F, van Ginneken E, et al. How to improve care for people with multimorbidity in Europe? Richardson E, Van Ginneken E, editors. European Observatory Policy Briefs. Copenhagen (Denmark): European Observatory on Health Systems and Policies; 2017. Available from: http://www.ncbi.nlm.nih.gov/books/NBK464548/. [PubMed] [Google Scholar]
  • 2. van den Akker M, Buntinx F, Knottnerus JA. Comorbidity or multimorbidity: what’s in a name? A review of literature. European Journal of General Practice. 1996;2(2):65–70. doi: 10.3109/13814789609162146 [DOI] [Google Scholar]
  • 3. Nielsen CR, Halling A, Andersen-Ranberg K. Disparities in multimorbidity across Europe—Findings from the SHARE Survey. European Geriatric Medicine. 2017;1(8):16–21. doi: 10.1016/j.eurger.2016.11.010 [DOI] [Google Scholar]
  • 4. Rijken M, Hujala A, van Ginneken E, Melchiorre MG, Groenewegen P, Schellevis F. Managing multimorbidity: Profiles of integrated care approaches targeting people with multiple chronic conditions in Europe. Health Policy. 2018;122(1):44–52. doi: 10.1016/j.healthpol.2017.10.002 [DOI] [PubMed] [Google Scholar]
  • 5. van der Heide I, Snoeijs SP, Boerma WG, Schellevis FG, Rijken MP. How to strengthen patient-centredness in caring for people with multimorbidity in Europe? Richardson E, Van Ginneken E, editors. European Observatory Policy Briefs. Copenhagen (Denmark): European Observatory on Health Systems and Policies; 2017. Available from: http://www.ncbi.nlm.nih.gov/books/NBK464537/. [PubMed] [Google Scholar]
  • 6. Shakib S, Dundon BK, Maddison J, Thomas J, Stanners M, Caughey GE, et al. Effect of a Multidisciplinary Outpatient Model of Care on Health Outcomes in Older Patients with Multimorbidity: A Retrospective Case Control Study. PloS One. 2016;11(8):e0161382. doi: 10.1371/journal.pone.0161382 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Barnett K, Mercer SW, Norbury M, Watt G, Wyke S, Guthrie B. Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study. The Lancet. 2012;380(9836):37–43. doi: 10.1016/S0140-6736(12)60240-2 [DOI] [PubMed] [Google Scholar]
  • 8. Diederichs C, Berger K, Bartels DB. The Measurement of Multiple Chronic Diseases—A Systematic Review on Existing Multimorbidity Indices. The Journals of Gerontology: Series A. 2010;66A(3):301–311. doi: 10.1093/gerona/glq208 [DOI] [PubMed] [Google Scholar]
  • 9. Stirland LE, González-Saavedra L, Mullin DS, Ritchie CW, Muniz-Terrera G, Russ TC. Measuring multimorbidity beyond counting diseases: systematic review of community and population studies and guide to index choice. BMJ. 2020; p. m160. doi: 10.1136/bmj.m160 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Diederichs C, Bartels DB, Berger K. The Importance Of A Standardized Instrument To Assess The Burden Of Multimorbidity. The Journals of Gerontology: Series A. 2011;66A(12):1395–1396. doi: 10.1093/gerona/glr162 [DOI] [Google Scholar]
  • 11. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation. Journal of Chronic Diseases. 1987;40(5):373–383. doi: 10.1016/0021-9681(87)90171-8 [DOI] [PubMed] [Google Scholar]
  • 12. Charlson ME, Szatrowski T, Peterson J, Gold J. Validation of a combined comorbidity index. Journal of Clinical Epidemiology. 1994;47(11):1245–1251. doi: 10.1016/0895-4356(94)90129-5 [DOI] [PubMed] [Google Scholar]
  • 13. Quan H, Sundararajan V, Halfon P, Fong A, Burnand B, Luthi JC, et al. Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data. Medical Care. 2005;43(11):1130–1139. doi: 10.1097/01.mlr.0000182534.19832.83 [DOI] [PubMed] [Google Scholar]
  • 14. Sundararajan V, Henderson T, Perry C, Muggivan A, Quan H, Ghali WA. New ICD-10 version of the Charlson comorbidity index predicted in-hospital mortality. Journal of Clinical Epidemiology. 2004;57(12):1288–1294. doi: 10.1016/j.jclinepi.2004.03.012 [DOI] [PubMed] [Google Scholar]
  • 15. Quan H, Li B, Couris CM, Fushimi K, Graham P, Hider P, et al. Updating and Validating the Charlson Comorbidity Index and Score for Risk Adjustment in Hospital Discharge Abstracts Using Data From 6 Countries. American Journal of Epidemiology. 2011;173(6):676–682. doi: 10.1093/aje/kwq433 [DOI] [PubMed] [Google Scholar]
  • 16. Gilbert T, Neuburger J, Kraindler J, Keeble E, Smith P, Ariti C, et al. Development and validation of a Hospital Frailty Risk Score focusing on older people in acute care settings using electronic hospital records: an observational study. The Lancet. 2018;391(10132):1775–1782. doi: 10.1016/S0140-6736(18)30668-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Calderón-Larrañaga A, Vetrano DL, Onder G, Gimeno-Feliu LA, Coscollar-Santaliestra C, Carfí A, et al. Assessing and Measuring Chronic Multimorbidity in the Older Population: A Proposal for Its Operationalization. The Journals of Gerontology Series A: Biological Sciences and Medical Sciences. 2016. doi: 10.1093/gerona/glw233 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Yu S, Farooq F, van Esbroeck A, Fung G, Anand V, Krishnapuram B. Predicting readmission risk with institution-specific prediction models. Artificial Intelligence in Medicine. 2015;65(2):89–96. doi: 10.1016/j.artmed.2015.08.005 [DOI] [PubMed] [Google Scholar]
  • 19. Kansagara D, Englander H, Salanitro A, Kagen D, Theobald C, Freeman M, et al. Risk Prediction Models for Hospital Readmission: A Systematic Review. JAMA. 2011;306(15):1688–1698. doi: 10.1001/jama.2011.1515 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Shwartz M, Iezzoni LI, Moskowitz MA, Ash AS, Sawitz E. The Importance of Comorbidities in Explaining Differences in Patient Costs. Medical Care. 1996;34(8):767–782. doi: 10.1097/00005650-199608000-00005 [DOI] [PubMed] [Google Scholar]
  • 21. Shailaja K, Seetharamulu B, Jabbar MA. Machine Learning in Healthcare: A Review. In: 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA); 2018. p. 910–914.
  • 22. Huang Y, Talwar A, Chatterjee S, Aparasu RR. Application of machine learning in predicting hospital readmissions: a scoping review of the literature. BMC Medical Research Methodology. 2021;21(1):96. doi: 10.1186/s12874-021-01284-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Al Taleb AR, Hoque M, Hasanat A, Khan MB. Application of data mining techniques to predict length of stay of stroke patients. In: 2017 International Conference on Informatics, Health Technology (ICIHT); 2017. p. 1–5.
  • 24. Commission Nationale de l’Informatique et des Libertés. https://www.cnil.fr/en/home
  • 25. The pandas development team. pandas-dev/pandas: Pandas 1.3.2; 2021. Available from: 10.5281/zenodo.5203279. [DOI]
  • 26. McKinney W. Data Structures for Statistical Computing in Python. In: van der Walt S, Millman J, editors. Proceedings of the 9th Python in Science Conference; 2010. p. 56–61.
  • 27. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830. [Google Scholar]
  • 28. Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. Journal of Machine Learning Research. 2017;18(17):1–5. [Google Scholar]

Decision Letter 0

Antonio De Vincentis

20 Apr 2022

PONE-D-22-08556: Prediction of Hospital Readmission of Multimorbid Patients Using Machine Learning Models (PLOS ONE)

Dear Dr. LE LAY,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 04 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Antonio De Vincentis

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service. 

Whilst you may use any professional scientific editing service of your choice, PLOS has partnered with both American Journal Experts (AJE) and Editage to provide discounted services to PLOS authors. Both organizations have experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. To take advantage of our partnership with AJE, visit the AJE website (http://learn.aje.com/plos/) for a 15% discount off AJE services. To take advantage of our partnership with Editage, visit the Editage website (www.editage.com) and enter referral code PLOSEDIT for a 15% discount off Editage services. If the PLOS editorial team finds any language issues in text that either AJE or Editage has edited, the service provider will re-edit the text for free.

Upon resubmission, please provide the following:

● The name of the colleague or the details of the professional service that edited your manuscript

● A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a *supporting information* file)

● A clean copy of the edited manuscript (uploaded as the new *manuscript* file)

3. Please ensure you include in the Methods section of your manuscript all information regarding the data access authorization you have provided in the Ethics Statement section of the online submission form, including information regarding the exception for patient consent granted by the authority who approved the data access.

Additionally, please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. 

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I would like to thank the authors for their work.

This is an interesting paper, which aims to understand the characteristics of the multimorbid population that needs hospital care by using all diagnoses information and two aggregated multimorbidity and frailty scores, and to use machine learning prediction models on these multimorbid patients characteristics to predict rehospitalization within 30 and 365 days and their length of stay.

Before the publication I would just highlight two possible minor revisions:

1) Page 9, line 31: the quotation mark before "an admissione" seems "inverted";

2) Materials and methods: in the "epidemiological description" section some results are described. I would suggest to adapt and move this part to the "Results" section.

Reviewer #2: The Authors present an interesting study, aiming at understanding the characteristics of the multimorbid population that needs hospital care and predict, using ML algorithms, the re-hospitalization.

The paper is generally well written.

My comments:

- I think that, for the models listed in tab 4, a figure showing the ROC curves may be beneficial for the readers as it would allow to understand if cut-offs other than predict probability = 0.5 are viable and what is their performance. This may be particularly interesting for the potential “clinical” application of these models where, according to one’s aim, a trade-off between sensitivity and specificity may be acceptable.

- I think that for the models listed in tab 4, reliability plots should be shown. This would allow the readers to better understand the reliability of the proposed models. It may also help the Authors to understand if any problem with calibration arises. RF models, for example, may need some kind of recalibration, and this issue may be particularly important given that the models were trained on “artificial data” with solved class imbalance and applied to “real” testing data. I’d suggest the Authors to try and see if Platt scaling or isotonic regression are viable ways to improve the calibration of the models (if any issue arise).

- In general, models seem to be prone to overfitting (as expected). I would suggest to try and run the models including only a subsample of available variables (based on the importance of each variable, obtained after the fitting of complete RF models – for example best 5%, best 25%, best 50%). This may have several advantages: it may reduce the overfitting, may reduce the calculation time, and may have sense from a clinical point of view. Indeed, it has been proven that some chronic conditions (or group of conditions) are strongly linked to the risk of hospitalization (heart failure, COPD, dementia for example), whereas other are not (osteoarthritis for example). These latter conditions may even introduce some form of statistical noise in the models.

- Please, add 95%CI to the performance metrics listed in the paper (accuracy, F1, AUCs) to allow a better comparison between models.

- I’d like to know what is the performance (Accuracy, F1, AUC) of the simple count of chronic conditions according to the Calderon-Larranaga’s list of chronic diseases categories (for example, used in a simple logistic regression). This also goes for age used as single predictor. From an application point of view, it is important to know what the actual benefit of the implementation of ML models in comparison with much more simpler measures is.

- For the prediction of LoS, the performance shown by the models is not optimal (as highlighted by the Authors). Here, I would suggest, as sensitivity analyses, to either exclude or aggregate those participants with very long LoS (for example, outliers with LoS > Q75 + 2*IQR). From a clinical point of view, these very long LoS are likely to be associated with hospitalizations characterized by adverse events and complications (delirium, nosocomial infections and so on).

- The Calderon-Larranaga list of chronic conditions has been created using data from a population-based study enrolling only persons aged 60+. Has the HFRS been validated in younger persons? If not, it would be preferable to exclude patients younger than 60 from the analysis as these persons are more likely to be affected by a few chronic conditions with a strong impact on health, whereas those older are more likely to accumulate an elevated number chronic conditions.

- In some Countries, the diagnoses reported on the discharge letter from hospitals are used to obtain some kind of public (re)fundings. This is likely to introduce a bias in the reported diagnoses: those diagnoses that lead to higher (re)fundings are more likely to be reported. Is this the case also for the setting used by the Authors?

- It would be interesting to look at the predictive performance of the models in different age strata of the population. It is likely that among very old individuals with multiple chronic conditions (and likely affected by frailty and disability), a general tendency toward non-hospitalization and at-home (or nursing home) management is present. This may lead to a paradox where younger persons with multimorbidity are likely to be (re)hospitalized, whereas older persons are likely to be (re)hospitalized only when they are “healthier”.

Reviewer #3: In their paper, Le Lay et al studied the accuracy of machine learning models in predicting length of stay and hospital readmission. Although the topic is interesting, all the sections of the study should be improved to clarify the objectives and the results of the study.

Introduction

The Introduction should be improved: the authors should explain the rationale of using machine learning to predict hospitalizations and length of stay, expand what it is known on this topic and what this study will add to the literature. Furthermore, Introduction should include the objectives of the study, that are lacking. Reading the Abstract, it is not clear which is the objective of the study. To compare different machine learning predicting models?

Methods

Epidemiological description included results such as the total number of the participants and their mean age. This information should be moved in the Results.

There is no information about ethic committee approval.

Why using three different methods to classify multimorbidity?

Which were the variables included in the learning experiments?

Results

General characteristics of the study sample should be reported. Otherwise, it is very difficult to interpret the results.

Prediction of readmission within 30 days: I think that could be more appropriate to describe resampling, the metrics used, etc in the Methods section and to report in this subsection (and in the Results section in general) only the results.

Discussion/Conclusions

The first paragraph of the discussion should report a summary of the main results of the study.

It seems that the conclusions simply discuss the results of the study (thus, most part of this section should be moved to the Discussion, comparing them with the available literature).

I suggest to report all the limitations of the study in a single paragraph and not throughout the Discussion.

The conclusion should briefly summarize the conclusions of the study, potential clinical implications and future perspectives.

In the conclusions the authors stated that they reported the general characteristics of the multimorbidity population, but I can’t find this information.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Dec 22;17(12):e0279433. doi: 10.1371/journal.pone.0279433.r002

Author response to Decision Letter 0


9 Aug 2022

Jules Le Lay

École des Mines de Saint-Étienne

158 cours Fauriel,

42100 Saint-Étienne

Dear Editors of the PLOS ONE journal,

Thank you for the opportunity to submit a revised version of our manuscript entitled “Prediction of Hospital Readmission of Multimorbid Patients Using Machine Learning Models” by Le Lay, Alfonso-Lizarazo, Augusto, Bongue, Masmoudi, Xie, Gramont, and Célarier. We appreciate the efforts you and the reviewers made in commenting on our manuscript, which have allowed us to improve the article.

Below is a point-by-point response to the editors’ and reviewers’ comments:

Editors:

1) Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at :

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

R/ We added elements and renamed the files to meet the requirements. Please let us know if we missed any of the points listed.

2) We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service

R/ The initial manuscript was edited for language, spelling and grammar by AJE editing services; we will attach the editing certificate to the resubmission. The modifications in this version concern mostly the figures and tables, and only a few sentences. Please let us know if another round of editing is required.

3) Please ensure you include in the Methods section of your manuscript all information regarding the data access authorization you have provided in the Ethics Statement section of the online submission form, including information regarding the exception for patient consent granted by the authority who approved the data access.

R/ We added the information regarding data access to the Methods section: this study was approved by the French national commission on informatics and liberty (CNIL) for access to the data under authorization number 919300. (This information was added to the revised paper and was already present on the online submission form.)

4) We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly.

R/ Indeed, the data used for the experiments presented in the paper are anonymized patient data extracted from the medical records of the University Hospital of Saint-Étienne. We were given access to this database by the ‘Commission Nationale Informatique et Libertés’ under authorization number 919300. The authors do not have the right to share these data publicly.

Reviewer 1:

5) Page 9, line 31: the quotation mark before "an admissione" seems "inverted";

R/ Indeed, the wrong quotation mark was used; this was corrected in the revised version of the paper.

6) Materials and methods: in the "epidemiological description" section some results are described. I would suggest to adapt and move this part to the "Results" section.

R/ We moved this paragraph to the beginning of the “Results” section as suggested.

Reviewer 2:

7) I think that, for the models listed in tab 4, a figure showing the ROC curves may be beneficial for the readers as it would allow to understand if cut-offs other than predict probability = 0.5 are viable and what is their performance. This may be particularly interesting for the potential “clinical” application of these models where, according to one’s aim, a trade-off between sensitivity and specificity may be acceptable.

R/ The ROC curves have been built for those models and have been included in the revised paper.

8) I think that for the models listed in tab 4, reliability plots should be shown. This would allow the readers to better understand the reliability of the proposed models. It may also help the Authors to understand if any problem with calibration arises. RF models, for example, may need some kind of recalibration, and this issue may be particularly important given that the models were trained on “artificial data” with solved class imbalance and applied to “real” testing data. I’d suggest the Authors to try and see if Platt scaling or isotonic regression are viable ways to improve the calibration of the models (if any issue arise).

R/ Calibration curves were plotted for those models and have been included in the revised paper, in the appendix.
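The calibration curves the authors mention can be sketched as follows: bin the predicted probabilities and compare each bin's mean prediction with the observed event rate. This is a minimal toy illustration of the idea (the data and bin count are placeholders, not the paper's).

```python
# Toy sketch of a reliability (calibration) curve: bin predicted
# probabilities and compare the mean prediction in each bin with the
# observed event rate. A well-calibrated model has mean_pred ~= obs_rate.
# Data below are illustrative only.

def reliability_bins(y_true, y_prob, n_bins=5):
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))
    curve = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            obs_rate = sum(y for y, _ in b) / len(b)
            curve.append((mean_pred, obs_rate))
    return curve

y_true = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.15, 0.3, 0.35, 0.5, 0.55, 0.6, 0.8, 0.85, 0.9]
for mean_pred, obs_rate in reliability_bins(y_true, y_prob):
    print(f"predicted={mean_pred:.2f}, observed={obs_rate:.2f}")
```

Platt scaling or isotonic regression, as the reviewer suggests, would then be fitted to pull these points back toward the diagonal.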

9) In general, models seem to be prone to overfitting (as expected). I would suggest to try and run the models including only a subsample of available variables (based on the importance of each variable, obtained after the fitting of complete RF models – for example best 5%, best 25%, best 50%). This may have several advantages: it may reduce the overfitting, may reduce the calculation time, and may have sense from a clinical point of view. Indeed, it has been proven that some chronic conditions (or group of conditions) are strongly linked to the risk of hospitalization (heart failure, COPD, dementia for example), whereas other are not (osteoarthritis for example). These latter conditions may even introduce some form of statistical noise in the models.

R/ We think that this is an interesting improvement opportunity. Unfortunately, we did not have enough time to experiment with this approach.


10) Please, add 95% CI to the performance metrics listed in the paper (accuracy, F1, AUCs) to allow a better comparison between models.

R/ We implemented a bootstrap method to calculate 95% CIs for the classification metrics. This information was included in the revised paper.
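The bootstrap procedure the authors describe can be sketched in a few lines: resample (label, prediction) pairs with replacement, recompute the metric on each resample, and take the 2.5th/97.5th percentiles. The data and metric (accuracy) below are placeholders, not the paper's.

```python
# Illustrative sketch of a percentile bootstrap 95% CI for a
# classification metric: resample pairs with replacement, recompute
# the metric, and read off the 2.5th/97.5th percentiles.
import random

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0] * 10
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0] * 10
low, high = bootstrap_ci(y_true, y_pred, accuracy)
print(f"accuracy 95% CI: [{low:.3f} - {high:.3f}]")
```

The same resampling loop works unchanged for F1 or AUC by swapping the metric function.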

11) I’d like to know what is the performance (Accuracy, F1, AUC) of the simple count of chronic conditions according to the Calderon-Larrañaga’s list of chronic diseases categories (for example, used in a simple logistic regression). This also goes for age used as single predictor. From an application point of view, it is important to know what the actual benefit of the implementation of ML models in comparison with much more simpler measures is.

R/ Our work focuses on the use of classical machine learning methods on medico-economic datasets enriched with medical tools used to describe multimorbid patients. Although a comparison with other methods is not part of our objectives, we carried out an experiment using logistic regression for hospital readmission within 30 days, with random undersampling and all diagnoses as predictors, and obtained the following results (not included in the article):

                       Accuracy               F1-score               ROC-AUC score

Balanced training set  0.686 [0.670 – 0.704]  0.680 [0.661 – 0.698]  0.686 [0.670 – 0.703]

Training set           0.636 [0.626 – 0.645]  0.371 [0.636 – 0.663]  0.649 [0.636 – 0.663]

Testing set            0.600 [0.585 – 0.617]  0.291 [0.262 – 0.322]  0.577 [0.550 – 0.603]

Based on these results, logistic regression with random undersampling using all diagnoses yields lower performance than the method presented in the article.
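The random undersampling step mentioned above can be sketched as follows: drop majority-class records at random until both classes have the same count, then fit on the balanced set. This is a toy illustration with made-up data, not the study's pipeline.

```python
# Toy sketch of random undersampling to balance a binary training set:
# keep all minority-class records and a random subsample of the majority
# class of the same size. Data below are illustrative only.
import random

def random_undersample(records, labels, seed=0):
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    majority, minority = (neg, pos) if len(neg) > len(pos) else (pos, neg)
    kept = rng.sample(majority, len(minority)) + minority
    rng.shuffle(kept)
    return [records[i] for i in kept], [labels[i] for i in kept]

X = [[i] for i in range(10)]
y = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1]  # 3 readmitted vs 7 not
Xb, yb = random_undersample(X, y)
print(sum(yb), len(yb) - sum(yb))  # prints "3 3": classes are balanced
```

Note that the balancing is applied to the training set only; the test set keeps its natural class ratio, which is why the balanced-training metrics above differ from the testing-set ones.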

12) For the prediction of LoS, the performance shown by the models is not optimal (as highlighted by the Authors). Here, I would suggest, as sensitivity analyses, to either exclude or aggregate those participants with very long LoS (for example, outliers with LoS > Q75 + 2*IQR). From a clinical point of view, these very long LoS are likely to be associated with hospitalizations characterized by adverse events and complications (delirium, nosocomial infections and so on).

R/ Although in our dataset a very small number of patients have a very long LoS compared to the average patient, we decided not to exclude or aggregate records. Indeed, we believe it is important to include these records in the training of machine learning algorithms to better identify characteristic patterns of prolonged stays.

13) The Calderon-Larrañaga list of chronic conditions has been created using data from a population-based study enrolling only persons aged 60+. Has the HFRS been validated in younger persons? If not, it would be preferable to exclude patients younger than 60 from the analysis as these persons are more likely to be affected by a few chronic conditions with a strong impact on health, whereas those older are more likely to accumulate an elevated number of chronic conditions.

R/ We have taken this remark into account: for this revised paper we carried out the experiments excluding patients younger than 60 and reported the results in the revised paper. Compared with the results for the entire population (including patients younger than 60), they are slightly lower or slightly higher depending on the indicator under analysis. For example, the mean absolute error and mean squared error improved for LoS prediction. Accuracy and F1-score were approximately the same for 365-day hospital readmission, with a slight improvement in the best-case scenario (using all diagnoses and random forest). Accuracy, F1-score and ROC-AUC for 30-day readmission slightly decreased.

14) In some Countries, the diagnoses reported on the discharge letter from hospitals are used to obtain some kind of public (re)fundings. This is likely to introduce a bias in the reported diagnoses: those diagnoses that lead to higher (re)fundings are more likely to be reported. Is this the case also for the setting used by the Authors?

R/ The anonymized patient data used in our study is extracted from the hospital electronic records. The methodology used in this study does not allow us to identify the existence of this kind of bias or to quantify it if it exists.

15) It would be interesting to look at the predictive performance of the models in different age strata of the population. It is likely that among very old individuals with multiple chronic conditions (and likely affected by frailty and disability), a general tendency toward non-hospitalization and at-home (or nursing home) management is present. This may lead to a paradox where younger persons with multimorbidity are likely to be (re)hospitalized, whereas older persons are likely to be (re)hospitalized only when they are “healthier”.

R/ As mentioned in the response to comment 13, in the revised paper we present the results excluding patients younger than 60; for this reason we decided not to perform an analysis by age strata.

Reviewer 3:

1) The Introduction should be improved: the authors should explain the rationale of using machine learning to predict hospitalizations and length of stay, expand what it is known on this topic and what this study will add to the literature. Furthermore, Introduction should include the objectives of the study, that are lacking. Reading the Abstract, it is not clear which is the objective of the study. To compare different machine learning predicting models.

R/ The introduction section has been modified including these elements.

The objective of this study is the prediction of rehospitalization within 30 and 365 days and length of stay of multimorbid patients using machine learning models in which multimorbidity is measured using all diagnoses information (ICD-10 codes) and two aggregated multimorbidity and frailty scores.

2) Epidemiological description included results such as the total number of the participants and their mean age. This information should be moved in the Results.

R/ We moved this section at the beginning of the “Results” section.

3) There is no information about ethic committee approval.

R/ The study was approved by the French national commission on informatics and liberty (CNIL) for the access to data under authorization number 919300. This information was added to the revised paper.

4) Why using three different methods to classify multimorbidity?

R/ Since our study focuses on multimorbid patients, it is important for us to consider the different approaches reported in the literature to measure multimorbidity and to evaluate which of them allows a better prediction of re-hospitalization for this type of patient.

5) Which were the variables included in the learning experiments?

R/ We have included in the revised paper (section materials and methods) an explicit list of the variables used in the models.

6) General characteristics of the study sample should be reported. Otherwise, it is very difficult to interpret the results.

R/ We reported the general characteristics of the dataset in the “Results/Epidemiological description” section of the paper.

7) Prediction of readmission within 30 days: I think that could be more appropriate to describe resampling, the metrics used, etc in the Methods section and to report in this subsection (and in the Results section in general) only the results.

R/ This comment has been considered in the revised paper.

8) The first paragraph of the discussion should report a summary of the main results of the study.

R/ This comment has been considered in the revised paper.

9) It seems that the conclusions simply discuss the results of the study (thus, most part of this section should be moved to the Discussion, comparing them with the available literature).

R/ This comment has been considered in the revised paper.

10) I suggest to report all the limitations of the study in a single paragraph and not throughout the Discussion.

R/ This comment has been considered in the revised paper.

11) The conclusion should briefly summarize the conclusions of the study, potential clinical implications and future perspectives.

R/ This comment has been considered in the revised paper.

12) In the conclusions the authors stated that they reported the general characteristics of the multimorbidity population, but I can’t find this information.

R/ The general characteristics of the multimorbidity population studied are described in the “Results” section, in the “Epidemiological description” subsection.

We would like to thank all reviewers for their detailed comments; we believe the modifications made to the experimental design and the reporting of the results improve the quality of the paper.

We look forward to hearing from you regarding our submission and to respond to any additional questions or comments you may have.

Jules Le Lay, Edgar Alfonso-Lizarazo, Vincent Augusto, Bienvenu Bongue, Malek Masmoudi, Xiaolan Xie, Baptiste Gramont and Thomas Célarier.

Attachment

Submitted filename: Response to reviewers.docx

Decision Letter 1

Antonio De Vincentis

26 Sep 2022

PONE-D-22-08556R1

Prediction of Hospital Readmission of Multimorbid Patients Using Machine Learning Models

PLOS ONE

Dear Dr. LE LAY,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. In particular, please address the comments raised by reviewer 3.

Please submit your revised manuscript by Nov 10 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Antonio De Vincentis

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: (No Response)

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The revised version of the manuscript is improved and all my comments have been addressed.

Thank you!

Reviewer #3: The manuscript is significantly improved after revision. However, I have two other comments to further improve it:

The Introduction lacks the objectives of the study (they have been added in the abstract only). Please add them.

The Conclusion section is more than one page long, also reporting the results of the study and discussing them. I suggest shortening it (moving the discussion of the results to the Discussion), reporting in this section only the conclusions and future perspectives/clinical applications.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: Yes: Diana Lelli

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Dec 22;17(12):e0279433. doi: 10.1371/journal.pone.0279433.r004

Author response to Decision Letter 1


9 Nov 2022

Jules Le Lay

École des Mines de Saint-Étienne

158 cours Fauriel,

42100 Saint-Étienne

Dear Editors of the PLOS ONE journal,

Thank you once again for giving us the opportunity to submit a revised version of our manuscript, titled “Prediction of Hospital Readmission of Multimorbid Patients Using Machine Learning Models”, to your journal. This article was written by Le Lay, Alfonso-Lizarazo, Augusto, Bongue, Masmoudi, Xie, Gramont, and Célarier. We would like to thank you and the reviewers for taking the time to review and comment on this paper.

You will find hereafter our responses to your comments and to the reviewers' comments.

Editor’s comments:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Response: Thank you for this remark. We reviewed our paper carefully and checked all the references.

We did not change the reference list, except for the addition of a DOI in reference 15, (linking to a correction of supplementary materials).

Comments from Reviewer 2: 

The revised version of the manuscript is improved and all my comments have been addressed. Thank you!

Response: Thank you very much for taking the time to review our manuscript, all the comments received in the reviewing process have helped us a lot.

Comments from Reviewer 3:

The manuscript is significantly improved after revision. However, I have two other comments to further improve it:

1/ The Introduction lacks the objectives of the study (they have been added in the abstract only). Please add them.

Response: Thank you for this remark, we added the objectives of the study at the end of the Introduction section, at lines 54-59:

“The objectives of the study presented in this article are 1) to describe the characteristics of elderly multimorbid patients in the studied hospital using diagnoses information, multimorbidity and frailty scores, and 2) to assess the ability of machine learning models to predict rehospitalization within 30 and 365 days and patients’ length of stay, using diagnoses information and two aggregative scores: the hospital frailty risk score and the Calderon-Larrañaga score.”

2/ The Conclusion section is more than one page long, also reporting the results of the study and discussing them. I suggest shortening it (moving the discussion of the results to the Discussion), reporting in this section only the conclusions and future perspectives/clinical applications.

Response: The discussion and conclusion sections have been modified to improve the coherence of the paper, on pages 14-17. Comments on the results were moved to the discussion section as suggested, and we modified the structure of this section.

Thank you once again for your comments which have allowed us to considerably improve our paper.

Dr Jules Le Lay

Attachment

Submitted filename: Response to reviewers.pdf

Decision Letter 2

Antonio De Vincentis

7 Dec 2022

Prediction of Hospital Readmission of Multimorbid Patients Using Machine Learning Models

PONE-D-22-08556R2

Dear Dr. LE LAY,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Antonio De Vincentis

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: The authors have addressed all my comments. The manuscript has significantly improved. I have no further comments.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: Yes: Diana Lelli

**********

Acceptance letter

Antonio De Vincentis

14 Dec 2022

PONE-D-22-08556R2

Prediction of Hospital Readmission of Multimorbid Patients Using Machine Learning Models

Dear Dr. Le Lay:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Antonio De Vincentis

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Response to reviewers.docx

    Attachment

    Submitted filename: Response to reviewers.pdf

    Data Availability Statement

    The data cannot be shared publicly. Data are available from the CNIL for researchers who meet the criteria for access to confidential data (data accessed with CNIL authorization number 919300). CNIL web site: https://www.cnil.fr/.

