PLOS One. 2024 Jan 25;19(1):e0294362. doi: 10.1371/journal.pone.0294362

In-hospital real-time prediction of COVID-19 severity regardless of disease phase using electronic health records

Hyungjun Park 1, Chang-Min Choi 2,3, Sung-Hoon Kim 4, Su Hwan Kim 5,6, Deog Kyoem Kim 5,7, Ji Bong Jeong 5,6,*
Editor: John Adeoye
PMCID: PMC10810421  PMID: 38271404

Abstract

Coronavirus disease 2019 (COVID-19) has strained healthcare systems worldwide. Predicting COVID-19 severity could optimize the allocation of resources such as oxygen devices and intensive care. If a machine learning model could forecast the severity of COVID-19 patients, hospital resource allocation would become considerably easier. This study evaluated machine learning models using electronic health records from 3,996 COVID-19 patients to forecast mild, moderate, or severe disease up to 2 days in advance. A deep neural network (DNN) model achieved 91.8% accuracy, 0.96 AUROC, and 0.90 AUPRC for 2-day predictions, regardless of disease phase. Tree-based models achieved slightly better metrics (random forest: 94.1% accuracy, 0.98 AUROC, 0.95 AUPRC; gradient boost: 94.1% accuracy, 0.98 AUROC, 0.94 AUPRC), prioritizing treatment factors such as steroid use. In terms of SHAP value importance, however, the DNN relied more on fixed patient factors such as demographics and symptoms. Since treatment patterns vary between hospitals, the DNN may be more generalizable than the tree-based models (random forest, gradient boost). The results demonstrate accurate short-term forecasting of COVID-19 severity using routine clinical data. DNN models may balance predictive performance and generalizability better than other methods. Severity predictions by machine learning models could facilitate resource planning, such as ICU arrangement and the provision of oxygen devices.

Introduction

COVID-19 swiftly propagated on a global scale, leading the World Health Organization (WHO) to officially declare it a pandemic [1]. Although more than half of all infections in younger people are subclinical, 43% of those older than 65 were hospitalized, and this group accounted for 76% of deaths as of September 2021 [2]. To control the spread of COVID-19, pharmaceutical interventions (vaccination) [3] and non-pharmaceutical interventions such as face masks, isolation, quarantine, and social distancing were exercised [4, 5]. Although the stringency and duration of COVID-19 restrictions differed between countries, most countries maintained severe restrictions for more than 10 months [4]. At the peak of the pandemic, these interventions were necessary to reduce shortages of hospital care resources, particularly critical care resources such as intensive care units (ICUs) and oxygen supplies [6, 7]. When the number of COVID-19 patients exceeded hospital capacity, some patients received inadequate care, resulting in poorer outcomes than expected from the natural course of COVID-19 [6]. Moreover, severe COVID-19 patients require high-flow oxygen devices and mechanical ventilation, so preparation of those facilities was warranted [8]. Therefore, the Korean government designated public hospitals as infectious disease-dedicated hospitals, monitored the daily number of severe COVID-19 patients, and determined whether this number would exceed hospital capacity [9].

Prediction of the future need for intensive care is required for managing facility capacity in medical centers. Timely preparation of essential medical devices based on short-term forecasts (1 or 2 days) is critical for efficient management of healthcare resources. By predicting short-term patient outcomes, informed decisions can be made regarding the transfer of patients to alternate hospitals or to care units within the same facility's ICU. With the development of artificial intelligence, several models were established to predict the short-term outcomes of COVID-19 [10–12]. In the early phase of the pandemic, many models were developed using static predictions, which forecast future events at admission [10–12] but did not indicate when the event would occur [13]. In contrast, dynamic or real-time predictions estimate a temporal event, such as the daily risk of acute kidney injury [14] or the need for ICU care [15]. Dynamic prediction in the medical context refers to the daily forecasting of impending outcomes, such as complications or mortality. For COVID-19, near-future outcomes of interest include patient mortality [11], the requirement for mechanical ventilation [16], and discharge or mortality [17]. The capacity to predict complications or mortality empowers healthcare workers to concentrate on correctable issues in patients and to prepare adequately for future intensive care unit (ICU) demands related to mechanical ventilation. Given the restrictions on available resources, forecasting not only adverse outcomes but also the potential recovery of patients from mechanical ventilation or oxygen support is important in hospital resource management. Consequently, accurate predictions of patients' recovery from mechanical ventilation and their subsequent discharge from the ICU enable informed decision-making concerning the transfer of other patients in need of mechanical ventilation. However, previous studies focused only on the start date of the event and not on recovery.

We examined various machine learning and deep learning models to predict the daily severity of COVID-19 at a hospital dedicated to COVID-19 patients. In addition, we assessed the importance of input variables in each model and considered the possible hazard of bias associated with models that displayed exceptional performance.

Material and methods

Clinical dataset description

Deidentified records of patients with a confirmed diagnosis of COVID-19 from January 1, 2020, to October 31, 2021 at Boramae Medical Center (BMC) were consecutively included. The authors obtained access to this dataset from February 1, 2022, following approval from the Institutional Review Board (IRB). BMC was dedicated to infectious diseases and admitted only COVID-19 patients. During the pandemic, BMC used telemonitoring to monitor patients with COVID-19 in residential treatment centers; in case of symptom progression at a residential treatment center, the patient was referred to BMC. All COVID-19 cases were diagnosed by nasopharyngeal swab reverse transcription-polymerase chain reaction. Patients younger than 18 years of age were excluded from this study. This study was approved by the ethics committee of the BMC and was conducted in accordance with the Declaration of Helsinki. The requirement for informed consent was waived by the ethics committee of the BMC (approval number 10-2021-130), considering the retrospective nature of the study.

COVID-19 severity as the target outcome

Disease severity, defined by the World Health Organization (WHO) clinical progression scale, was recorded daily at BMC at midnight [18]. The primary objective of our study was to forecast the severity of the condition prior to the event, focusing on predictions for days 0, 1, and 2. Forecasting a patient's severity up to 2 days in advance was deemed sufficient to prepare the necessary oxygen and ICU resources. The WHO clinical progression scale ranges from 0 (not infected) to 10 (dead) [18]. In our model, the scale was collapsed into 3 classes: mild cases (monitored, no oxygen therapy; WHO scale < 5), moderate disease (hospitalized, oxygen by mask or nasal prong; WHO scale = 5), and severe disease (oxygen by high flow, non-invasive ventilation, or mechanical ventilation; WHO scale > 5). Utilization of the oxygen device was meticulously documented with corresponding timestamps, ensuring that all fluctuations in severity were accurately captured on a daily basis.
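The three-class target described above can be sketched as a small helper (the function name is ours; the thresholds are those stated in the text):

```python
def who_scale_to_severity(score: int) -> str:
    """Collapse the WHO clinical progression scale (0-10) into the
    three severity classes used as prediction targets."""
    if not 0 <= score <= 10:
        raise ValueError(f"WHO scale must be between 0 and 10, got {score}")
    if score < 5:        # monitored, no oxygen therapy
        return "mild"
    if score == 5:       # oxygen by mask or nasal prong
        return "moderate"
    return "severe"      # high-flow oxygen, NIV, mechanical ventilation, or worse
```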

Data preprocessing and feature selection

From the electronic medical record system at BMC, we extracted the following data: demographic features (age, sex, underlying disease, smoking status), daily symptom records, laboratory data, drug use, and vital signs (S1 and S2 Tables). The laboratory data encompassed a range of parameters, including complete blood cell count, chemistry markers (such as protein, albumin, liver function tests, and BUN/creatinine), electrolyte levels, inflammatory markers (CRP, LDH, ferritin, procalcitonin), and coagulation measurements. These variables were selected by the attending clinicians responsible for patient care, who deemed them relevant for the management and treatment of COVID-19 patients. All these data served as input variables for predicting daily severity fluctuations.

Binary and categorical features, such as the presence of symptoms and the type of oxygen device used, were encoded as 0 or 1 by one-hot encoding. Although information on the type of oxygen device and fraction of oxygen was available, it was not included in the model to prevent data leakage, because it directly represented the outcome. Antiviral agent use was treated as a binary variable; steroid use, represented as the daily summed dose, was treated as a continuous variable. For continuous variables, data cleansing was conducted by removing error values (such as negative blood pressure values) and capping outlier values. Data clipping is usually conducted at the 1st and 99th percentile values [15]; however, because several of these percentile values did not reflect the severity of the laboratory findings, we instead used clinician-defined outlier values as the clipping points. Each data point is described in S1 Table. Among the extracted features, rarely checked laboratory examinations or vital signs, such as HbA1c and neurologic examinations, were excluded from the input dataset. Patients' symptoms were recorded daily and categorized into 18 types; positive symptoms during admission are described in S2 Table. Each day, all symptom records were scrutinized and marked as present or absent. If a patient was intubated, the corresponding symptoms were deemed absent.
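A minimal sketch of the cleansing step, assuming a clinician-defined bounds table (the variable names and bounds below are invented for illustration; the study's actual clipping points are listed in S1 Table):

```python
import pandas as pd

# Hypothetical clinician-defined clipping bounds per continuous variable.
CLINICIAN_BOUNDS = {
    "sbp": (40.0, 250.0),   # systolic blood pressure, mmHg
    "crp": (0.0, 50.0),     # C-reactive protein, mg/dL
}

def clean_continuous(df: pd.DataFrame) -> pd.DataFrame:
    """Remove impossible values, then cap outliers at clinician-set bounds."""
    out = df.copy()
    for col, (lo, hi) in CLINICIAN_BOUNDS.items():
        out[col] = out[col].mask(out[col] < 0)        # error values -> missing
        out[col] = out[col].clip(lower=lo, upper=hi)  # cap remaining outliers
    return out
```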

Data exclusion and imputation

Data integrity is important for precise prediction [19]. Laboratory tests were conducted less frequently than vital sign measurements because patients with mild COVID-19 did not routinely undergo laboratory testing. Admission events were excluded if >50% of their vital sign data, >80% of their laboratory data, or their COVID-19 severity outcome data were missing during the entire admission period (S1 Fig).

Missing value imputation was conducted in two steps. First, if a variable had an existing previous value (as with laboratory data and vital signs), the last observation carried forward method was used. Second, when a patient was hospitalized without laboratory results, we assumed that predictions based on normal values would be robust until abnormal data were observed. As many of the data distributions were skewed, the mode was appropriate for representing normal values in our dataset (S2 and S3 Figs). Thus, values with no previously observed data were imputed with the mode of each variable.
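The two-step imputation can be sketched with pandas (the column and identifier names are hypothetical; `mode_values` would be computed from the training set, e.g. `train[cols].mode().iloc[0].to_dict()`):

```python
import pandas as pd

def impute_two_step(df: pd.DataFrame, mode_values: dict) -> pd.DataFrame:
    """Step 1: last observation carried forward within each admission.
    Step 2: gaps with no prior observation are filled with the mode."""
    out = df.sort_values(["admission_id", "day"]).copy()
    cols = list(mode_values)
    out[cols] = out.groupby("admission_id")[cols].ffill()  # LOCF per admission
    return out.fillna(mode_values)                         # mode for the rest
```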

Models

We evaluated several models for predicting the daily severity of COVID-19, classifying each model as non-temporal or temporal. The non-temporal models, which received daily data, were logistic regression, random forest [20], gradient boost (XG boost) [21], and a deep neural network (DNN) [22]. The detailed hyperparameters of the models are described in the Supplement methods. On the day of prediction, these models receive the vital signs of the last 8 hours (hours 16–24) along with the laboratory and symptom data and other daily information. As the temporal model, a Transformer received trend information for prediction [23]: its input consists of n days of vital sign trends and n days of laboratory data. Because the vital sign and laboratory data had different sampling frequencies, the index time window was defined as 8 hours for vital signs and 24 hours for laboratory data. The detailed input pipeline is explained in the Supplemental methods (S1 File). To determine the effect of input length, we evaluated the temporal model with input lengths varied in steps of two hospital days.
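As a minimal sketch of the non-temporal setup, each row is one patient-day feature vector and the label is that day's severity class. The data below are random placeholders and the hyperparameters are illustrative, not the study's tuned values:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))       # placeholder daily feature vectors
y = rng.integers(0, 3, size=200)     # 0 = mild, 1 = moderate, 2 = severe

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
probs = model.predict_proba(X)       # per-class probabilities for each day
```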

The source code for developing our models and drawing the figures was implemented in Python (version 3.8) with numpy (version 1.22), scikit-learn (version 1.0.2), pandas (version 1.4.0), PyTorch (version 1.10), and matplotlib (version 3.5.1).

Statistical analysis

Baseline characteristics are described as mean and standard deviation for continuous variables and as number and percentage for binary variables. Differences in continuous variables were tested by ANOVA; binary variables were compared using the chi-square test. Model performance for this multi-class classification task was evaluated by the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). AUROC and AUPRC were calculated by binarizing each target class against the rest, and each target severity was plotted with its own receiver operating characteristic and precision-recall curves. For a more comprehensive comparison of the models, we implemented a decision curve analysis [24]. Given that this methodology was originally conceptualized for binary classification, the data were likewise reformatted in a "one versus the rest" manner for our mild, moderate, and severe predictions. The patients were divided into training (60%), validation (20%), and test (20%) sets by stratified splitting, and the models were trained with early stopping based on the validation set. To calculate the mean and standard deviation of performance metrics according to input length, we repeated splitting and training five times per input length.
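The one-versus-rest evaluation can be sketched with scikit-learn (the function name is ours; the metric calls are standard):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.preprocessing import label_binarize

def ovr_metrics(y_true, y_prob, classes=("mild", "moderate", "severe")):
    """AUROC and AUPRC per class, binarizing each target against the rest."""
    y_bin = label_binarize(y_true, classes=list(classes))
    return {
        cls: {
            "auroc": roc_auc_score(y_bin[:, i], y_prob[:, i]),
            "auprc": average_precision_score(y_bin[:, i], y_prob[:, i]),
        }
        for i, cls in enumerate(classes)
    }
```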

Feature importance was computed using Shapley additive explanation (SHAP) values, which are model-independent. Using this strategy, variable importance was assessed for each of the random forest, gradient boost, and deep neural network models. Input variable importance was further assessed by data type, distinguishing between hospital/treatment-dependent factors (extrinsic factors) and patient-dependent factors (intrinsic factors). Hospital/treatment-dependent factors encompassed the number of vital sign measurements (count, variance) and drug dosage (max, min, median). Patient-dependent factors included vital sign values (max, mean, median, min), demographics (operation history, previous admission, age, sex), and the presence of symptoms.
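The domain-wise grouping can be sketched as follows (the feature names and the extrinsic prefix list are hypothetical; in the study the grouping was applied to the top-ranked features by SHAP importance):

```python
# Hypothetical prefixes marking hospital/treatment-dependent (extrinsic) features.
EXTRINSIC_PREFIXES = ("steroid", "spo2_count", "pulse_count")

def patient_dependent_share(top_features):
    """Fraction of the top-ranked features that are patient-dependent."""
    intrinsic = [f for f in top_features
                 if not f.startswith(EXTRINSIC_PREFIXES)]
    return len(intrinsic) / len(top_features)
```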

Results

In total, 4,660 patients were identified, of whom 3,996 remained after applying the exclusion criteria regarding age and data integrity. Baseline characteristics classified by maximum COVID-19 severity during admission are shown in Table 1. Older age, presence of underlying disease, previous operation, and previous admission were associated with severe COVID-19 outcomes. The median duration of hospital admission was 9 days (interquartile range: 7–12 days; S4 Fig). A representative example of model prediction is provided in Fig 1: the model can predict the aggravation and recovery of COVID-19 1 and 2 days in advance.

Table 1. Baseline characteristics of included patients.

Severity Mild (N = 2704) Moderate (N = 955) Severe (N = 337) p-value
Age (years) 49.9 ± 17.0 58.4 ± 16.3 66.5 ± 15.0 <0.005
Sex (male) 1202 (44.5%) 499 (52.3%) 214 (63.5%) <0.005
Smoking 346 (12.8%) 73 (7.6%) 38 (11.3%) <0.005
Alcohol 736 (27.2%) 208 (21.8%) 70 (20.8%) 0.001
Weight (kg) 66.1 ± 15.1 69.2 ± 16.7 67.9 ± 15.4 <0.005
Height (cm) 164.8 ± 10.8 164.3 ± 11.4 164.2 ± 9.1 0.374
BMI (kg/m2) 26.1 ± 61.9 26.4 ± 19.4 25.0 ± 4.3 0.906
Previous medical history
    Hypertension 609 (22.5%) 343 (35.9%) 164 (48.7%) <0.005
    Diabetes mellitus 294 (10.9%) 213 (22.3%) 99 (29.4%) <0.005
    Tuberculosis 37 (1.4%) 16 (1.7%) 8 (2.4%) 0.333
    Dyslipidemia 208 (7.7%) 97 (10.2%) 35 (0.4%) 0.028
    Congestive heart disease 77 (2.8%) 23 (2.4%) 19 (5.6%) 0.009
    Neurologic disease 67 (2.5%) 55 (5.8%) 25 (7.4%) <0.005
    Pulmonary disease 56 (2.1%) 19 (2.0%) 11 (3.3%) 0.336
    Liver disease 32 (1.2%) 17 (1.8%) 6 (1.8%) 0.318
    Other diseases 1089 (40.3%) 404 (42.3%) 187 (55.5%) <0.005
    Operation history 875 (32.4%) 319 (33.4%) 135 (40.1%) 0.018
    Previous admission 716 (26.5%) 271 (28.4%) 115 (34.1%) 0.01

Previous admission denotes any prior admission to BMC, regardless of the period. Severity was classified at the patient level, and the highest severity during the admission was used for classification.

Fig 1. Example of prediction of daily severity during admission.

Fig 1

The target is the actual daily severity, and day 0 to day 2 represent the model predictions. The patient's status aggravated to severe on day 2, recovered to moderate on day 9, and became mild on day 13. The model (day 2) predicts disease aggravation and recovery 2 days before the event.

Model performances

The models predicted disease severity per day, for the present day (day 0), day 1, and day 2. The random forest model had the best performance, followed by the XG boost, deep neural network, and logistic regression models. Prediction performance peaked at day 0 with an AUROC of 0.981 and an AUPRC of 0.940 (DNN), and it decreased as the prediction horizon expanded (Table 2, S3 Table).

Table 2. The best model performance for daily COVID-19 severity.

Models Accuracy (%) AUROC AUPRC
Day 0 Logistic regression 89 [88.4–89.6] 0.962 [0.959–0.966] 0.872 [0.861–0.883]
DNN 92.9 [92.3–93.4] 0.972 [0.969–0.976] 0.934 [0.926–0.943]
XG boost 95.7 [95.3–96.1] 0.991 [0.99–0.993] 0.972 [0.967–0.977]
Random forest 95.2 [0.966–0.861] 0.992 [0.99–0.993] 0.977 [0.973–0.98]
Day 1 Logistic regression 87.1 [86.4–87.7] 0.948 [0.944–0.952] 0.837 [0.826–0.849]
DNN 91.9 [91.3–92.4] 0.966 [0.962–0.97] 0.923 [0.913–0.931]
XG boost 94.4 [94–94.9] 0.985 [0.982–94.9] 0.958 [0.951–0.963]
Random forest 94.4 [93.9–94.9] 0.988 [0.987–0.99] 0.969 [0.965–0.973]
Day 2 Logistic regression 86.2 [85.5–86.9] 0.938 [0.934–0.942] 0.808 [0.795–0.82]
DNN 91.6 [91.1–92.2] 0.962 [0.958–0.966] 0.913 [0.904–0.921]
XG boost 93.6 [93.1–94.1] 0.979 [0.976–0.982] 0.946 [0.939–0.953]
Random forest 93.9 [93.4–94.4] 0.985 [0.983–0.987] 0.959 [0.953–0.964]

Prediction accuracy decreased with increasing time horizon (DNN accuracy: 92.6% [day 1], 91.8% [day 2]). The transformer, although more complex than the DNN, did not show better performance (Table 3). By severity class, the DNN model showed AUROC values of 0.958, 0.928, and 0.987 for mild, moderate, and severe COVID-19, respectively, at 2 days; the corresponding AUPRC values were 0.985, 0.823, and 0.947 (Fig 2).

Table 3. Comparison of the model performances between DNN and transformer.

Prediction horizon (days) Models Input lengths (days) Accuracy AUROC AUPRC
0 DNN 1 93.45 0.981 0.94
Transformer 2 91.33 0.981 0.938
6 93.44 0.984 0.945
10 93.12 0.983 0.94
14 92.51 0.982 0.936
20 93.22 0.984 0.943
1 DNN 1 92.62 0.969 0.914
Transformer 2 89.49 0.967 0.909
6 89.72 0.967 0.904
10 90.17 0.968 0.91
14 88.29 0.963 0.891
20 88.46 0.963 0.897
2 DNN 1 91.82 0.963 0.907
Transformer 2 85.08 0.952 0.875
6 88.84 0.96 0.894
10 87.83 0.954 0.874
14 86.59 0.954 0.876
20 85.46 0.95 0.866

Fig 2. The AUROC and AUPRC for daily prediction of COVID-19 severity.

Fig 2

The upper panels show the receiver operating characteristic curves for predicting day 0, day 1, and day 2. The lower panels show the precision-recall curves for each severity outcome.

In addition, our decision curve analysis highlighted the net benefit of the models across varying threshold probabilities. Consistent with the AUROC comparisons, the random forest and XG boost models demonstrated higher net benefit for the mild, moderate, and severe predictions for day 2, as displayed in Fig 3.

Fig 3. The comparison of the model by decision curve analysis.

Fig 3

The figure shows the decision curve analysis for severity predictions two days ahead. The baseline comparison (All positive) assumes that every two-day event will be mild, moderate, or severe. All of our models demonstrated net benefit exceeding this baseline assumption made without any model (All positive).

Feature importance of different models

The SHAP method was used to demonstrate the models' feature importance (Fig 4). We selected the top 20 features by importance for predicting severe COVID-19 after 2 days. Although feature importance was determined on the same subset of the test set, the selected features varied among models. For the DNN, the most important features were, in order: operation history, male sex, older age, hypertension, and nausea. For the XG boost model, the ranking was: steroid (maximum, mean), lactate dehydrogenase (minimum), older age, and SpO2 (minimum, count). For the random forest model: steroid (maximum), SpO2 (count), pulse rate (count), steroid (minimum), and SpO2 (minimum).

Fig 4. The feature importance among the models.

Fig 4

The three models' feature importances were compared by the SHAP method. The tree-based models place greater focus on steroid usage, SpO2 count, and vital signs (pulse rate and diastolic BP). The DNN model placed greater focus on demographic data and symptoms.

The factors were categorized as patient-dependent and hospital/treatment-dependent. Patient factors cannot be altered by the hospital to which the patient is admitted, whereas hospital/treatment factors, such as the number of vital sign measurements and the dosage of steroid administered, can differ between hospitals. In the SHAP variable analysis, the patient-dependent percentage was 95% for the DNN, 75% for XG boost, and 35% for random forest (p-value: 0.065). Thus, the DNN prioritized patient factors, whereas the random forest and XG boost prioritized hospital/treatment factors. The same phenomenon was observed for predicting mild and moderate COVID-19 (S5 Fig).

Discussion

We examined a range of models to forecast daily COVID-19 severity. The DNN model had an accuracy, AUROC, and AUPRC of 91.8%, 0.96, and 0.90, respectively, for predicting the severity of COVID-19 as mild, moderate, or severe 2 days in advance. The transformer model, which received trend data, did not perform better than the DNN model, which received only daily data. The tree-based models (random forest, XG boost) outperformed the DNN; however, their feature importance was heavily weighted toward hospital/treatment factors, leaving their generalizability to other hospitals unclear.

Severe COVID-19 shows a disease course similar to that of acute respiratory distress syndrome, which requires high-level oxygen support from high-flow nasal oxygen, ventilator support, and ICU care [25]. Respiratory devices are restricted resources; thus, treatment of severe COVID-19 patients is limited by device and healthcare worker availability [6]. Our model can help identify patients likely to show aggravation or recovery in the near future; thus, clinicians could be advised to start or wean respiratory devices in a limited-resource setting. Previous studies focused on predicting disease aggravation at admission or in real time [16, 26]. While it is essential to identify patients whose infection may worsen, identifying those who will recover is equally important in managing overall hospital resources. By integrating our model into clinical practice, clinicians can readily monitor patients at risk of deterioration from COVID-19 as well as those showing signs of recovery. Such insights are valuable for anticipating potential strains on medical facilities and allow proactive arrangements, including network preparations for transferring patients requiring ventilator support and intensive care unit (ICU) facilities.

A static prediction forecasts a future event from a single specific time point [15]. For example, a logistic model incorporating clinical symptoms and abnormal laboratory tests at admission could predict whether patients would develop serious illness within 30 days [10]. Dynamic or real-time prediction forecasts outcomes at each time point [15]. Recently, dynamic prediction models for forecasting COVID-19-related aggravation [16] or mortality [11] were developed. Although temporal models (e.g., RNNs) have improved performance on numerous tasks [13, 27, 28], the daily severity prediction task did not benefit from a temporal model in this study. We anticipated improved performance by incorporating temporal information into the transformer model, but any such improvement appears to be task-dependent. In this study, the AUROC and AUPRC for predicting severity 2 days in advance were >0.9, which may indicate that the task was relatively simple and that trend information provided no additional information beyond the daily data.

Among the non-temporal models, tree-based models, such as random forest and XG boost, performed better than the DNN and logistic regression models. When logistic regression performs equivalently to machine learning models, including deep learning models, the basic model may be preferred because it is more interpretable and understandable [29]. In this study, as the dimension of the input data exceeded 300, machine learning enhanced model performance, suggesting that complex computation is more advantageous for this prediction task than linear computation [30]. On performance alone, the random forest model would be preferred, as it exhibited the best results among the non-temporal models. However, the SHAP method evaluates model reliability through the significance of features rather than performance. The tree-based models assigned higher importance to drug use and the frequency of vital sign checks, factors that depend on treatment patterns and hospital rules. Moreover, in the random forest and XG boost models, steroid usage was the most important factor; steroid usage can be considered a surrogate marker for severe COVID-19, even though it is intended to treat the disease. Performance may therefore be diminished if treatment patterns or hospital rules for vital sign monitoring change. This risk was identified in a prior study that employed an ensemble of two tree models: higher internal validation performance was not maintained at other hospitals [31], because the ensemble model was hospital- and treatment-dependent and its predictions were not transferable. In contrast, the deep learning model in this study placed greater emphasis on patient demographics and symptoms. As these features cannot be altered even if the patient is transferred to a different hospital, the model's predictions are more transferable to other hospital settings. Thus, we judged the DNN model superior to the tree-based models due to its greater generalizability, despite its somewhat inferior performance.

The strength of this study is that our deep learning model predicts COVID-19 severity regardless of the phase of the disease. To the best of our knowledge, this is the first study to predict future severity using a time-series dataset regardless of the course of COVID-19. We assessed a variety of models, including machine learning and deep learning models, as well as data structures derived from non-temporal and temporal input. Model interpretation was evaluated using the SHAP method, considering each model's important features and their generalizability through a domain-wise knowledge approach. Furthermore, our model's predictive capability is particularly beneficial at night, when timely interventions are pivotal. By accurately estimating the risk of patient deterioration, clinicians are better equipped to allocate essential medical devices, ensuring that patients receive the necessary support even during off-peak hours.

This study had some limitations. First, it was based on single-center retrospective cohort data; thus, the model may not perform as well on other hospitals' datasets. External validation was restricted by the difficulty of accessing other hospitals with a comparable patient group. Variability in the frequency of vital sign and laboratory tests may result in a higher incidence of missing data, and the unavailability of daily symptom records in other hospitals could further challenge the generalizability of our model. Nevertheless, we evaluated the models' essential components using the SHAP method and their generalizability through variable characteristics. We followed a common technique for processing time-series data; thus, the problem regarding hospital- and treatment-related factors would be similar for other prediction tasks, and our method for assessing variable importance will benefit future studies focusing on generalizability. Further validation is required to support our findings. Second, the severity of COVID-19 was determined by WHO guidelines and the application of oxygen, which can depend on clinician preference. If the pattern of oxygen usage at another hospital differed from ours, our model would not accurately predict the application and weaning of oxygen at that hospital. A severity definition established independently of physician preference would be a more appropriate target for future research. Third, this study has not yet undergone prospective validation; hence, it does not provide guidance on what to do when clinical judgment and model predictions disagree. A subsequent prospective study is needed to collect actual clinical experience and reassess how to proceed in such scenarios.

Conclusions

Our models showed high predictive performance for daily COVID-19 severity throughout hospital admission, regardless of disease phase. These predictions could be useful for COVID-19 patients by forecasting their future outcomes and aiding the distribution of respiratory devices.

Supporting information

S1 File. Supplement methods.

(DOCX)

S1 Fig. Missing value proportion per patient.

(TIF)

S2 Fig. Exploration of laboratory distribution.

(TIF)

S3 Fig. Difference of mode, median, and mean in ferritin value.

(TIF)

S4 Fig. Histogram of the length of hospital admission duration.

(TIF)

S5 Fig. The feature importance of the models for predicting mild and moderate COVID-19.

(TIF)

S1 Table. Percentile and clipping values of continuous variables.

(DOCX)

S2 Table. Symptom presentation of COVID-19 patients during hospital admission.

(DOCX)

S3 Table. Detailed model performance comparison.

(DOCX)

Data Availability

The data that support the findings of this study were provided by the Institutional Review Board (IRB) of Boramae Medical Center. While these data are not publicly accessible, they can be provided upon a reasonable request and with the approval of the IRB. For further inquiries or data requests, please directly contact the IRB of Boramae Medical Center at Tel: 82-2-870-1851. We have acknowledged the concern about the authors being the sole individuals responsible for data access and have taken appropriate measures to address this.

Funding Statement

This work was supported by research funding from the Seoul Metropolitan Government Seoul National University (SMG-SNU) Boramae Medical Center (04-2022-0004), and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), which is funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C2383). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Nka AD, Ka’e AC, Bouba Y, Semengue ENJ, Tchouaket MCT, Takou D, et al. Global burden of SARS-CoV-2 infection, hospitalization and case fatality rate among COVID-19 vaccinated individuals and its associated factors: A systematic review and meta-analysis protocol. PLoS One. 2022;17: 1–7. doi: 10.1371/journal.pone.0272839 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Centers for Disease Control and Prevention (CDC). Cases, Data, and Surveillance. [cited 25 May 2023]. Available: https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/burden.html [Google Scholar]
  • 3.Ssentongo P, Ssentongo AE, Voleti N, Groff D, Sun A, Ba DM, et al. SARS-CoV-2 vaccine effectiveness against infection, symptomatic and severe COVID-19: a systematic review and meta-analysis. BMC Infect Dis. 2022;22: 1–12. doi: 10.1186/s12879-022-07418-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Fink G, Tediosi F, Felder S. Burden of Covid-19 restrictions: National, regional and global estimates. eClinicalMedicine. 2022;45. doi: 10.1016/j.eclinm.2022.101305 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Bo Y, Guo C, Lin C, Zeng Y, Li HB, Zhang Y, et al. Effectiveness of non-pharmaceutical interventions on COVID-19 transmission in 190 countries from 23 January to 13 April 2020. Int J Infect Dis. 2021;102: 247–253. doi: 10.1016/j.ijid.2020.10.066 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Zhuang Z, Cao P, Zhao S, Han L, He D, Yang L. The shortage of hospital beds for COVID-19 and non-COVID-19 patients during the lockdown of Wuhan, China. Ann Transl Med. 2021;9: 200–200. doi: 10.21037/atm-20-5248 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Overwhelmed Hospitals Are Shipping COVID-19 Patients To Far-Off Cities: NPR. [cited 1 Sep 2022]. Available: https://www.npr.org/2021/08/19/1029378744/hospital-beds-shortage-covid-coronavirus-states
  • 8.Alhazzani W, Evans L, Alshamsi F, Møller MH, Ostermann M, Prescott HC, et al. Surviving Sepsis Campaign Guidelines on the Management of Adults with Coronavirus Disease 2019 (COVID-19) in the ICU: First Update. Crit Care Med. 2021;2019: E219–E234. doi: 10.1097/CCM.0000000000004899 [DOI] [PubMed] [Google Scholar]
  • 9.Her M. Repurposing and reshaping of hospitals during the COVID-19 outbreak in South Korea. One Health. 2020;10: 100137. doi: 10.1016/j.onehlt.2020.100137 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Liang W, Liang H, Ou L, Chen B, Chen A, Li C, et al. Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19. JAMA Intern Med. 2020;180: 1081–1089. doi: 10.1001/jamainternmed.2020.2033 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Yan L, Zhang H-T, Goncalves J, Xiao Y, Wang M, Guo Y, et al. An interpretable mortality prediction model for COVID-19 patients. Nat Mach Intell. 2020;2: 283–288. doi: 10.1038/s42256-020-0180-7 [DOI] [Google Scholar]
  • 12.Ji D, Xu J, Chen Z, Yang T, Zhao P, Chen G, et al. Prediction for Progression Risk in Patients with COVID-19 Pneumonia: the CALL Score. Clin Infect Dis. 2012;0954162: 1–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Rasmy L, Nigo M, Kannadath BS, Xie Z, Mao B, Patel K, et al. Recurrent neural network models (CovRNN) for predicting outcomes of patients with COVID-19 on admission to hospital: model development and validation using electronic health record data. Lancet Digit Health. 2022;4: e415–e425. doi: 10.1016/S2589-7500(22)00049-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Tomašev N, Glorot X, Rae JW, Zielinski M, Askham H, Saraiva A, et al. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature. 2019;572: 116–119. doi: 10.1038/s41586-019-1390-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Roy S, Mincu D, Loreaux E, Mottram A, Protsyuk I, Harris N, et al. Multitask prediction of organ dysfunction in the intensive care unit using sequential subnetwork routing. J Am Med Informatics Assoc. 2021;28: 1936–1946. doi: 10.1093/jamia/ocab101 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Hao B, Hu Y, Sotudian S, Zad Z, Adams WG, Assoumou SA, et al. Development and validation of predictive models for COVID-19 outcomes in a safety-net hospital population. J Am Med Informatics Assoc. 2022;29: 1253–1262. doi: 10.1093/jamia/ocac062 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Rodrigues DS, Nastri ACS, Magri MM, Oliveira MS de, Sabino EC, Figueiredo PHMF, et al. Predicting the outcome for COVID-19 patients by applying time series classification to electronic health records. BMC Med Inform Decis Mak. 2022;22: 1–15. doi: 10.1186/s12911-022-01931-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Marshall JC, Murthy S, Diaz J, Adhikari N, Angus DC, Arabi YM, et al. A minimal common outcome measure set for COVID-19 clinical research. Lancet Infect Dis. 2020;20: e192–e197. doi: 10.1016/S1473-3099(20)30483-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Jin Y, Schneeweiss S, Merola D, Lin KJ. Impact of longitudinal data-completeness of electronic health record data on risk score misclassification. J Am Med Inform Assoc. 2022;29: 1225–1232. doi: 10.1093/jamia/ocac043 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Breiman L. Random Forests. Mach Learn. 2001;45: 5–32. doi: 10.1023/A:1010933404324 [DOI] [Google Scholar]
  • 21.Natekin A, Knoll A. Gradient boosting machines, a tutorial. Front Neurorobot. 2013;7. doi: 10.3389/fnbot.2013.00021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Murtagh F. Multilayer perceptrons for classification and regression. Neurocomputing. 1991;2: 183–197. doi: 10.1016/0925-2312(91)90023-5 [DOI] [Google Scholar]
  • 23.Phuong M, Hutter M. Formal Algorithms for Transformers. 2022; 1–16. Available: http://arxiv.org/abs/2207.09238 [Google Scholar]
  • 24.Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagnostic Progn Res. 2019;3: 1–8. doi: 10.1186/s41512-019-0064-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Matthay MA, Thompson BT, Ware LB. The Berlin definition of acute respiratory distress syndrome: should patients receiving high-flow nasal oxygen be included? Lancet Respir Med. 2021;9: 933–936. doi: 10.1016/S2213-2600(21)00105-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Schwab P, Mehrjou A, Parbhoo S, Celi LA, Hetzel J, Hofer M, et al. Real-time prediction of COVID-19 related mortality using electronic health records. Nat Commun. 2021;12. doi: 10.1038/s41467-020-20816-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Lee J, Ta C, Kim JH, Liu C, Weng C. Severity Prediction for COVID-19 Patients via Recurrent Neural Networks. AMIA. Annu Symp proceedings AMIA Symp. 2021;2021: 374–383. [PMC free article] [PubMed] [Google Scholar]
  • 28.Park HJ, Jung DY, Ji W, Choi CM. Detection of bacteremia in surgical in-patients using recurrent neural network based on time series records: Development and validation study. J Med Internet Res. 2020;22: 1–10. doi: 10.2196/19512 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Volovici V, Syn NL, Ercole A, Zhao JJ, Liu N. Steps to avoid overuse and misuse of machine learning in clinical research. Nat Med. 2022;28: 1996–1999. doi: 10.1038/s41591-022-01961-6 [DOI] [PubMed] [Google Scholar]
  • 30.Grinsztajn L, Oyallon E, Varoquaux G. Why do tree-based models still outperform deep learning on tabular data? 2022. Available: http://arxiv.org/abs/2207.08815 [Google Scholar]
  • 31.Roimi M, Neuberger A, Shrot A, Paul M, Geffen Y, Bar-Lavie Y. Early diagnosis of bloodstream infections in the intensive care unit using machine-learning algorithms. Intensive Care Med. 2020;46: 454–462. doi: 10.1007/s00134-019-05876-8 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

John Adeoye

21 Jul 2023

PONE-D-23-14493

In-hospital real-time prediction of COVID-19 severity regardless of disease phase using electronic health records

PLOS ONE

Dear Dr. Jeong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 04 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

John Adeoye

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following financial disclosure:

“This work was supported by research funding from the Seoul Metropolitan Government Seoul National University (SMG-SNU) Boramae Medical Center (04-2022-0004), and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), which is funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C2383)”

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This work was supported by research funding from the Seoul Metropolitan Government Seoul National University (SMG-SNU) Boramae Medical Center (04-2022-0004), and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), which is funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C2383)”

We note that you have provided funding information that is currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“This work was supported by research funding from the Seoul Metropolitan Government Seoul National University (SMG-SNU) Boramae Medical Center (04-2022-0004), and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), which is funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C2383)”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf."

4. Thank you for stating the following in your Competing Interests section: 

“Declarations of interest: none”

Please complete your Competing Interests on the online submission form to state any Competing Interests. If you have no competing interests, please state "The authors have declared that no competing interests exist.", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now

 This information should be included in your cover letter; we will change the online submission form on your behalf.

5. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Additional Editor Comments:

1. Comment on the quality of structured dataset used. You may check the article https://doi.org/10.1186/s40537-023-00703-w for details.

2. Please comment on the net benefit of the outperforming model using decision curve analysis.

3. Address reviewer recommendations and suggestions.
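Regarding comment 2 above, the net benefit in decision curve analysis (Vickers et al. [24]) at a threshold probability p_t is TP/n − FP/n × p_t/(1 − p_t), compared against treat-all and treat-none strategies. A minimal sketch with hypothetical labels and predicted probabilities:

```python
# Minimal decision-curve sketch following Vickers et al.; the labels and
# predicted probabilities below are hypothetical, for illustration only.
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit = TP/n - FP/n * (pt / (1 - pt)) at threshold pt."""
    treat = y_prob >= threshold
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)

y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])

for pt in (0.1, 0.3, 0.5):
    model_nb = net_benefit(y, p, pt)
    treat_all_nb = net_benefit(y, np.ones_like(p), pt)  # treat everyone
    print(pt, round(model_nb, 3), round(treat_all_nb, 3))
```

Plotting these curves over a range of thresholds shows whether the best-performing model adds clinical value beyond the default strategies.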

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Partly

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Kindly find the below comments:

Although the topic you selected is good, the organization of the manuscript needs improvement.

1. as the statement written in abstract "Objective: We reviewed various forecasting models to predict the daily severity of COVID-19 using electronic health records"

Kindly provide the various forecasting models that you studied in the literature review section. and this also may not be stands for an objective.

2. Remove the heads in abstracts like Objective, Materials and Methods, Results, Conclusion (Need to Refer other published manuscripts for writing an abstract)

3. Arrange the section as Abstract, Introduction, Literature Review, Proposed Methodology, Results and discussion , Conclusion and future scope, references

4. The literature review section is missing. Also, add a table comparing previous methods in this section with the following column heads: Ref. No., Title of paper, Methodology/Algorithm used, Major findings, Limitations of the study.

5. Model architecture is not available. Also model working is missing.

6. Need to visualize the results.

7. Performance evaluation is missing (precision and recall are reported directly as percentages). Kindly show graphs for accuracy, precision, F1-score, support, and any other parameters. Also conduct the Q-measure test, Friedman test, and Student's t-test for your model.

8. Model testing with other methods/algorithms should be there in the result section.

9. Need to perform various ML processing techniques WITH NOVELTY on your dataset.

10. At least 30+ references should be there.

Reviewer #2: General Comments: The manuscript presents a study on predicting the daily severity of COVID-19 using electronic health records (EHR) and various machine learning and deep learning models. The authors evaluated the performance of different models and assessed the importance of input variables. The study provides valuable insights into predicting COVID-19 severity in real-time and highlights the need to consider hospital-specific factors. Overall, the manuscript is well-structured, and the methods and results are adequately described. Addressing the following comments would further improve the clarity and comprehensiveness of the manuscript:

Major Comments:

1. The abstract provides a concise summary of the study, but it would be helpful to include specific details on the performance of each model in terms of AUROC and AUPRC. Additionally, it would be beneficial to highlight the significance and implications of the findings quantitatively.

1. The introduction provides a good background on COVID-19 and the need for predicting severity. However, it would be helpful to include some references to support the statements made in the introduction. Additionally, the introduction should clearly state the research objectives and the research gap that the study aims to address.

2. The methods section provides a detailed description of the dataset, data preprocessing, and the models used. However, there are several areas that need clarification:

- It is not clear how the authors determined the prediction horizon (e.g., 2 days in advance). This should be explained in more detail.

- The authors mention that missing data were imputed using the last observation carried forward method and the mode. However, it is not clear how missing data were handled for patients without any previous data. This should be explained.

- The authors should provide more information about the hyper-parameters of the models used, such as the number of hidden layers and the learning rate for the deep neural network model.

- It would be helpful to provide more information about the performance metrics used to evaluate the models.

- Data preprocessing and feature selection: The description of the data preprocessing and feature selection steps is clear. However, it would be useful to provide more details on the specific features selected for the models and the rationale behind their selection.

- The models used for prediction are well-described. However, it would be beneficial to provide more information on the hyperparameters of each model to ensure reproducibility of the study. Additionally, it would be helpful to provide references for the transformer model and clarify how it handles the temporal aspect of the data.
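The imputation scheme discussed above (last observation carried forward per patient, with a cohort mode as fallback when a patient has no previous value) can be sketched as follows; the column names and values are hypothetical, not the study's schema:

```python
# Hypothetical LOCF-plus-mode imputation sketch for per-patient time series;
# the DataFrame layout is illustrative, not the study's actual schema.
import pandas as pd

df = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "day":        [0, 1, 2, 0, 1],
    "crp":        [5.0, None, 7.0, None, 3.0],
})

# 1) Carry each patient's last observed value forward in time.
df = df.sort_values(["patient_id", "day"])
df["crp"] = df.groupby("patient_id")["crp"].ffill()

# 2) Values with no prior observation fall back to the cohort mode.
mode = df["crp"].mode().iloc[0]
df["crp"] = df["crp"].fillna(mode)
print(df)
```

Patient 2's day-0 value has no prior observation, so it receives the mode; all other gaps are filled from the same patient's history.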

2. The results section provides a comprehensive overview of the findings. However, it would be beneficial to include specific details on the performance metrics (AUROC and AUPRC) for each model at different prediction horizons (day 0, day 1, day 2). It would also be useful to provide statistical significance testing or confidence intervals to assess the differences in performance between models.

3. The discussion provides a good interpretation of the results and highlights the importance of considering hospital-specific factors in real-time prediction models. However, it would be helpful to discuss the limitations of the study, such as the generalizability of the findings to other healthcare settings and the potential impact of missing data on the model performance. Additionally, it would be helpful to compare the findings of this study with previous studies that have predicted COVID-19 severity.

Minor Comments:

1. In the introduction, it would be helpful to define the term "short-term outcomes" as used in the context of COVID-19 severity prediction.

2. In the methods section, it would be useful to provide more information on the number of patients included in each severity category (mild, moderate, severe) to assess the distribution of severity levels in the dataset.

3. In the results section, it would be beneficial to include a figure or table summarizing the performance metrics (AUROC, AUPRC) of each model at different prediction horizons.

4. In the discussion, it would be helpful to provide some insights into the clinical implications of the findings and how real-time prediction of COVID-19 severity can improve patient care and resource allocation.

5. The manuscript could benefit from proofreading and minor grammatical edits to improve readability.

Reviewer #3: While many media sources continue to discount or obfuscate the long-term effects of COVID-19, I remain isolated to protect an immunocompromised spouse. Consequently, I am very familiar with how risk containment has changed over time and with best practices. It is also interesting to see a snapshot of predicted risk spanning data from pre-pandemic through roughly the Delta wave.

For the introduction, this is an excellent summary of the first few years of the pandemic and the thesis of the manuscript. It is true that early precautionary measures were useful for reducing risk, and that this prevention or mitigation strategy was used in both mild and severe cases. I had not been aware of South Korea's efforts to minimize disease severity, but that is great. To continue reducing disease severity and better estimate hospitals being at full capacity, yes, predicting disease severity is critical. The point about event risk is well-taken. Most studies arbitrarily chose a given endpoint versus baseline, rather than taking all timepoints or dynamic timepoints into account. It is also true that many of these other studies, ours included, focused on onset of COVID-19 EHR instead of estimating recovery. Thus, predicting COVID-19 onset and recovery at the BMC is a very useful extension of prior work.

For methods, in summary, I see no problems here at all. To begin, given the secondary nature of the data, it makes sense that informed consent would be waived. The timestamped data for a robustly large cohort is also good. The data collected reflect what is available through EHRs and what is routinely collected in many tertiary care settings. Scaling variables is described surprisingly well. I commend the authors on this point. All of the decisions seem fine with regard to making variables binary, continuous, or bringing in the distribution tails (with clinician direction) when values might be unforeseen outliers. Supplementary Figures 1 and 2 also show a willingness for the raw data to be transparent, which I appreciate. The distributions are all similar to what I would expect in relatively healthy middle-aged to aged adults who present at a given clinic. Data missingness for mild cases is also understandable, as this will stochastically vary depending on the nursing staff, attending physician, and capacity of the tertiary clinic. Supplemental Table 1 goes into this in detail. For imputation, it is reasonable to include the mode to avoid bizarre behavior for vitals and other data in estimation analyses. I appreciate the additional data in Supplement Figures 3-5 that describe raw data, as well as data fit for mild to moderate COVID-19 cases using different estimation methods. For statistical methods, this all seems standard regarding classification metrics, split-model training vs. assessment probands, and even some reasonable mean +/- SD for input length.

For results, it is refreshing and welcome to read a brief summary of Table 1, as well as initial figures, and for it all to make intuitive sense. Table 2 reveals that regardless of the model type tested, the AUROC or AUPRC was outstanding. It was interesting how RF did better than DNN. Given the sparsity of the model set and N, however, too many interaction terms may have loaded and diluted overall model fit.

My only suggestion here is to list, either in text or the tables, if Model X significantly differed from Model Y (e.g., if Prediction Horizon day 0, 1, or 2 showed any difference for Input Length for Accuracy). In other words, just some basic statistics to formally show what is described in section 3.1. In section 3.2, the authors describe an intriguing pattern in the data: that the best fit methods predominantly extracted hospital/treatment factors (i.e., external factors) compared to DNN which extracted patient factors (i.e., internal factors). To strengthen or formalize this observation, a statistical test comparing factors on a binary scale ('0' = internal, '1' = external) might be useful. This is just a friendly suggestion for substantiating the claim made and is not a critique.

For the discussion, I again agree that predicting recovery is just as important as initial infection and degree of disease aggravation. Comparisons with other studies are appropriate and thoughtful. The strengths and limitations sections are both thorough and, again, thoughtful.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Auriel A. Willette

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Jan 25;19(1):e0294362. doi: 10.1371/journal.pone.0294362.r002

Author response to Decision Letter 0


20 Aug 2023

9 October 2023

To Academic Editor

Plos one

Dear Editor:

We are resubmitting our revised manuscript titled "In-hospital real-time prediction of COVID-19 severity regardless of disease phase using electronic health records" for your consideration. We thank the reviewers for providing valuable feedback to improve our work.

In the revised manuscript, we have addressed all concerns raised. Specifically, we clarified our methods for determining the prediction horizon, handling missing data, selecting model features, and evaluating performance. We also enhanced the introduction and discussion with additional references and clearer statements of objectives, limitations, and clinical implications. The revised article contains improved readability, expanded tables detailing model metrics, new figures illustrating decision curve analysis, and a categorized analysis of intrinsic and extrinsic variables.

While our single-center retrospective study has limitations in generalizability, we believe this rigorously analyzed, clinically-informed model provides timely insights into predicting COVID-19 severity for optimizing resource allocation. By considering both worsening and recovery phases, our approach advances knowledge beyond models focused solely on adverse events like mortality.

We are grateful for the reviewers' thoughtful feedback and have diligently incorporated their recommendations. We hope you will find this significantly improved resubmission suitable for publication in PLOS ONE. Please do not hesitate to contact us with any additional questions.

I look forward to hearing from you.

Sincerely,

Ji Bong Jeong

Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea, Division of Gastroenterology, Department of Internal Medicine, Seoul Metropolitan Government Seoul National University Boramae Medical Center, 20, Boramae‐ro 5‐gil, Dongjak‐gu, Seoul, 07061, Republic of Korea.

Email: jibjeong@snu.ac.kr

Phone: +82-2-870-2222

Fax: +82-2-870-3861

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

- We revised the manuscript according to PLOS ONE's style requirements. Thank you.

2. Thank you for stating the following financial disclosure:

“This work was supported by research funding from the Seoul Metropolitan Government Seoul National University (SMG-SNU) Boramae Medical Center (04-2022-0004), and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), which is funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C2383)”

Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

- We moved the funding information from the manuscript to the cover letter; the Acknowledgments section no longer includes funding details.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This work was supported by research funding from the Seoul Metropolitan Government Seoul National University (SMG-SNU) Boramae Medical Center (04-2022-0004), and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), which is funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C2383)”

We note that you have provided funding information that is currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“This work was supported by research funding from the Seoul Metropolitan Government Seoul National University (SMG-SNU) Boramae Medical Center (04-2022-0004), and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), which is funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C2383)”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf."

- We removed the funding text from the manuscript and included the amended Funding Statement in the cover letter.

4. Thank you for stating the following in your Competing Interests section:

“Declarations of interest: none”

Please complete your Competing Interests on the online submission form to state any Competing Interests. If you have no competing interests, please state "The authors have declared that no competing interests exist.", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now

This information should be included in your cover letter; we will change the online submission form on your behalf.

- We have stated our competing interests in the cover letter and removed the declaration from the manuscript.

5. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

- The raw data include detailed patient-level information, ranging from vital signs to laboratory results. Although patient IDs were removed, the Boramae Medical Center IRB did not consider the dataset fully de-identified. We therefore followed option (a) and added the corresponding author's contact information to the Data Availability section.

6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

- We re-uploaded all Supporting Information files with captions according to the guidelines.

Additional Editor Comments:

1. Comment on the quality of structured dataset used. You may check the article https://doi.org/10.1186/s40537-023-00703-w for details.

- Thank you for your comment on dataset quality.

To address it in detail, we break the assessment of dataset quality into the following questions.

1. Does the dataset include commonly available types of input and outcome variables?

- The input variables were selected by a clinician who treated more than a thousand COVID-19 patients and prepared medical devices for patients at risk of deterioration. These variables, such as demographics, vital signs, laboratory results, and patient symptoms, are commonly collected in other COVID-19 hospitals. Although patient symptoms were not recorded as structured fields in the EMR system, they were used to assess COVID-19 severity and were typically documented in the EMR chart notes.

2. How suitable is the dataset for constructing machine learning models for the outcome?

- We collected raw values from the EMR system, including vital signs, laboratory values, and symptoms. The dataset was rigorously cleansed, and the cleansing method is thoroughly described in the Supplement Method along with the dataset distributions.

3. Which data quality criteria were fulfilled or deficient in the dataset used to construct the machine learning models?

- The criteria for handling missing data and accepting a certain percentage of missingness per patient were meticulously determined in consultation with clinicians (Hyungjun Park, Sung-Hoon Kim, Ji Bong Jeong). The threshold for exclusion was set based on the distribution of missingness per admission.

Several factors accounted for the missingness. For instance, some patients were enrolled near the end of the inclusion period, resulting in incomplete data extraction. In other cases, patients exhibited mild symptoms, making frequent vital sign checks unnecessary. Given the complexity and diversity of individual cases, it is challenging to enumerate all possible reasons for missingness during data cleansing.

However, we would like to assure you that the data cleansing process was conducted diligently and involved substantial clinical input, ensuring the quality of the dataset used for our models.

4. What is the effect of data quality on the median performance metrics of the machine learning models constructed for the outcome?

- We assessed missing data and excluded patients where this was excessively prevalent. Typically, these missing entries were noted in patients with milder conditions, who did not require intense observation during their hospital stay. Details pertaining to missing data can be found in Supplement Figure 1 and within the exclusion criteria.

In this study, we did not directly investigate the effect of missingness on performance. Generally, higher degrees of missing data could lead to diminished predictive performance. Nevertheless, the description of missingness could offer valuable insights regarding the usual proportion of missing data among in-hospital patients. Furthermore, outlining missingness as a proportion of the entire admission period could provide beneficial perspectives to researchers interested in real-time prediction using EMR datasets.

2. Please comment on the net benefit of the outperforming model using decision curve analysis.

For model comparison, we implemented a decision curve analysis. As this method was initially designed to handle binary classification problems, we needed to modify the data format to a "one versus the rest" framework for our predictions of mild, moderate, and severe outcomes. This approach aligned well with our previous comparisons using AUROC and AUPRC. The modified statistical analysis is depicted as follows:

- For a more comprehensive comparison of the models, we implemented a decision curve analysis. [24] Given that this methodology was originally conceptualized for binary classification, it necessitated the modification of the data format to a "one versus the rest" approach for our predictions classified as mild, moderate, or severe.

We have supplemented our results section with an additional paragraph and figure:

- In addition, our decision curve analysis highlighted the net-benefit of the model based on varying threshold probabilities. Consistent with our previous AUROC comparisons, the random forest and XG boost models demonstrated higher net-benefits for the mild, moderate, and severe predictions for day 2, as displayed in Fig 3.
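For readers unfamiliar with the method, the one-versus-the-rest net-benefit calculation underlying these curves can be sketched as below (an illustrative sketch using the standard net-benefit formula, not the analysis code used in the study):

```python
import numpy as np

def net_benefit(y_true, y_prob, pt):
    """Net benefit at threshold probability pt for a one-vs-rest class.

    y_true: 1 if the case belongs to the class of interest (e.g. "severe"
    in a one-versus-the-rest framing), 0 otherwise.
    y_prob: predicted probability of that class.
    """
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    n = len(y_true)
    treat = y_prob >= pt                    # cases flagged at this threshold
    tp = np.sum(treat & (y_true == 1))      # true positives
    fp = np.sum(treat & (y_true == 0))      # false positives
    # Standard net benefit: benefit of TPs minus harm of FPs, weighted
    # by the odds implied by the threshold probability.
    return tp / n - fp / n * (pt / (1 - pt))
```

Sweeping `pt` over a range of thresholds and plotting `net_benefit` for each model, alongside the treat-all and treat-none reference lines, yields a decision curve.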

In response to a previous review, we aim to assess the model based on its performance and significance. While the random forest and XG boost models showed enhanced predictive scores, the potential risk associated with the use of extrinsic factors—which can vary among hospitals—might undermine performance in external validation.

We kindly ask you to consider our diligent efforts to provide a thorough analysis despite the limitations of our dataset.

3. Address reviewer recommendations and suggestions.

- We revised considerably according to the reviewer’s suggestion.

Comments to the Author

Reviewer #1: Kindly find the below comments:

Although the topic that you selected is good a but organization of the manuscript is well.

1. as the statement written in abstract "Objective: We reviewed various forecasting models to predict the daily severity of COVID-19 using electronic health records"

Kindly provide the various forecasting models that you studied in the literature review section. and this also may not be stands for an objective.

- Thank you for your comment. What we intended to convey was that we evaluated several models (DNN, transformer, random forest, gradient boost) for predicting the outcome, together with the potential overfitting risks of those models; we agree the original wording was confusing.

Thus, we changed the sentence to “Among multiple forecasting models to predict the daily severity of COVID-19, we compare the accuracy and potential over-fitting risks of the models.”.

2. Remove the heads in abstracts like Objective, Materials and Methods, Results, Conclusion (Need to Refer other published manuscripts for writing an abstract)

- As you suggested, we revised the abstract to an unstructured format.

3. Arrange the section as Abstract, Introduction, Literature Review, Proposed Methodology, Results and discussion , Conclusion and future scope, references

- Thank you for your comment on the structure of the manuscript. Most medical articles in PLOS ONE follow the format “Introduction, Materials and Methods, Results, Discussion, Conclusion” (https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf). We followed this format per the editor's guidance. If your suggested structure is required for publication, we will discuss revising it with the editorial office.

4. Literature review section is messing. Also, add comparative analysis of previous methods table in this section with following column heads as Ref. No., Title of paper , Methodology/Algorithm Used, Major findings, Limitations of the study

As we have detailed previously, our manuscript adheres to the typical stylistic conventions of the medical literature, and similar pieces have been published using this literature review format. For your reference, you can view an example of such a format in the following article (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0284965).

In order to facilitate a more straightforward comprehension of our work, we have made amendments to our introduction. Now, it clearly outlines why it is crucial to predict both the exacerbation and recovery, along with defining what constitutes a short-term outcome.

5. Model architecture is not available. Also model working is missing.

- The model architectures for the deep neural network, transformer, and tree-based models were thoroughly elucidated in the Supplement Method section. Within this section, detailed information pertaining to the model's hyperparameters, layers, activation functions, and other relevant aspects was provided. (S Table 3)

6. Need to visualize the results.

In regard to your request for a visual representation of the results, we have done so in three distinct ways. First, we displayed a practical example of the model's application in a real-world scenario. Second, we illustrated the performance of the models (including DNN, logistic regression, random forest, and XG boost) over three consecutive days (day 0, day 1, day 2) using AUROC and AUPRC metrics; moreover, we added the decision curve analysis with visualization in Figure 3. Third, we presented the SHAP values for the top three models (DNN, XG boost, random forest). These visual representations of our findings can be viewed in Figures 1 through 4.

7. Performance (As you put the % of precision, recall directly, ) and model evaluation is missing. Kindly show the graphs for accuracy, precision, f1-score, support, and any others parameter. Also need to find Q-measure test Friedman test and student t-test for your model.

For the sake of comprehensive reporting and given the constraints of table presentation, we have included detailed performance metrics of the model in Supplement Table 4. In addition, we have expanded Table 2 to present the confidence intervals of the AUROC, AUPRC, and accuracy measurements.

8. Model testing with other methods/algorithms should be there in the result section.

In response to your suggestion regarding testing our model with other methods or algorithms, we have indeed made comparisons with logistic regression, random forest, gradient boost, DNN, and transformer models, as detailed in Tables 2 and 3. We acknowledge that other models could potentially be applied to this dataset; however, our assumption was that these five models are the ones most commonly utilized for data of this nature.

9. Need to perform various ML processing techniques WITH NOVELTY on your dataset.

- In our study, we conducted a comparative analysis of logistic regression, random forest, gradient boost, and deep neural network models, including the transformer model that processes time-series data. While the transformer model had access to a broader spectrum of information, it did not surpass the performance of the random forest and gradient boost models that utilized single-day input data.

When considering model performance for future predictions, our findings suggest that tree-based models might be the most suitable. However, earlier studies, despite differing outcomes, have indicated a potential lack of generalizability with tree-based models (Early diagnosis of bloodstream infections in the intensive care unit using machine-learning algorithms. Intensive Care Med.). To explore potential causes of overfitting commonly associated with tree-based models, we analyzed the SHAP values in our study.

We wish to emphasize that our study does not assert our model's superior performance; instead, we highlight its reduced risk of overfitting due to less reliance on extrinsic factors, such as the number of vital sign checks or treatment patterns, which could vary across hospitals. For enhanced readability, we have revised our results in light of this aspect.

- The factors were categorized as patient-dependent and hospital/treatment-dependent. For predicting a patient's severity, patient factors cannot be altered by the hospital where the patient is admitted. However, hospital/treatment factors, such as the number of vital signs that should be monitored and the dosage of steroid administered, can differ between hospitals. In the SHAP variable analysis, the patient-dependent percentage for each model was 95% for the DNN, 75% for XGBoost, and 35% for random forest. Thus, the DNN prioritized patient factors, whereas the random forest and XGBoost prioritized hospital/treatment factors. The same phenomenon was also observed for predicting mild and moderate COVID-19 infection (S5 Fig).
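The patient-dependent percentage reported above can be computed as sketched below (illustrative only: the feature names and the ranking of features by mean absolute SHAP value are assumptions, not the study's exact variables):

```python
def patient_dependent_share(shap_importance, patient_features, top_n=20):
    """Fraction of the top-N SHAP features that are patient-dependent.

    shap_importance: dict mapping feature name -> mean |SHAP| value.
    patient_features: set of features the hospital cannot alter
    (demographics, symptoms), as opposed to hospital/treatment factors.
    """
    top = sorted(shap_importance, key=shap_importance.get, reverse=True)[:top_n]
    return sum(f in patient_features for f in top) / len(top)

# Hypothetical example: two patient factors, two hospital/treatment factors.
importance = {"age": 0.5, "crp": 0.4, "steroid_dose": 0.3, "vital_sign_count": 0.2}
share = patient_dependent_share(importance, {"age", "crp"}, top_n=4)  # 0.5
```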

10. At least 30+ references should be there.

- In response to your valuable comment, we have included additional references in this study, resulting in the total number of references exceeding 30.

Reviewer #2

Major Comments:

1. The abstract provides a concise summary of the study, but it would be helpful to include specific details on the performance of each model in terms of AUROC and AUPRC. Additionally, it would be beneficial to highlight the significance and implications of the findings quantitatively.

- Thank you for your comment. To add details of each model's performance, we reorganized the abstract accordingly.

1. The introduction provides a good background on COVID-19 and the need for predicting severity. However, it would be helpful to include some references to support the statements made in the introduction. Additionally, the introduction should clearly state the research objectives and the research gap that the study aims to address.

- In accordance with your feedback, we have enhanced the introduction section by incorporating a more comprehensive set of references and expounding upon the research gap that distinguishes our study from previous ones. Specifically, while previous investigations primarily concentrated on adverse outcomes such as mortality or the initiation of mechanical ventilation in the near future, our research specifically addresses the recuperation from such adverse outcomes, which is instrumental in facilitating the optimization of hospital resource allocation.

2. The methods section provides a detailed description of the dataset, data preprocessing, and the models used. However, there are several areas that need clarification:

- It is not clear how the authors determined the prediction horizon (e.g., 2 days in advance). This should be explained in more detail.

- The main objective of this study was to construct a predictive model for assessing future disease severity, with the intention of facilitating the preparation of essential medical resources such as ICU and oxygen devices. It was considered practically beneficial if medical professionals could identify patients likely to recover or experience worsening conditions up to 2 days in advance, as this would significantly enhance real-world clinical practice. Consequently, we revised the manuscript to reflect the targeted outcome of COVID-19 severity within a 2-day time horizon in the Method section.

- The primary objective of our study was to forecast the severity of the condition prior to the event, focusing on predictions for day 0, 1, and 2. It was deemed sufficient to forecast the patient's severity up to 2 days in advance to adequately prepare the necessary oxygen and ICU resources.

- The authors mention that missing data were imputed using the last observation carried forward method and the mode. However, it is not clear how missing data were handled for patients without any previous data. This should be explained.

- The missing value imputation process involved two steps. Firstly, when previous data was available, the last observation carried forward method was employed. Secondly, in cases where data was missing without any preceding values, it was assumed that medical practitioners would determine the patient's values to be within the normal range. The normal value was calculated using the mode value, taking into account the skewed distribution observed in each laboratory data. Additional information and graphical representations of the imputation procedure can be found in Supplement Figures 2 and 3. A comprehensive description of the missing value handling approach is provided in the "Data Exclusion and Imputation" section.

- First, if the variable had an existing previous value, the last observation carried forward method was used (such as for laboratory data and vital signs). Second, when the patient was hospitalized without laboratory results, we assumed that predictions based on normal values until abnormal data were observed would be robust. As many data distributions were skewed, the mode was appropriate for representing normal values in our dataset (S2, S3 Figs). Thus, values with no previously observed data were imputed with each variable's mode.
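The two-step imputation described above can be sketched as follows (an illustrative sketch, not the study's preprocessing code; `mode_value` stands for the precomputed mode of the variable's distribution):

```python
def impute_series(values, mode_value):
    """Impute a patient's daily series: LOCF first, then mode fill.

    values: chronological measurements, with None marking missing entries.
    mode_value: the variable's mode, used when no prior observation exists
    (the patient is assumed to be in the normal range until measured).
    """
    out, last = [], None
    for v in values:
        if v is not None:
            last = v                 # update the last observed value
        out.append(last if last is not None else mode_value)
    return out

# Hypothetical series with gaps before and between measurements.
filled = impute_series([None, 7.2, None, 7.5, None], mode_value=7.0)
# filled == [7.0, 7.2, 7.2, 7.5, 7.5]
```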

- The authors should provide more information about the hyper-parameters of the models used, such as the number of hidden layers and the learning rate for the deep neural network model.

Due to space limitations in the manuscript, a detailed exposition of the parameters of the DNN (Deep Neural Network) model was omitted. The model's architecture is not overly complex, and the majority of architectural specifics have been provided in the Supplement Method section. The following provides a comprehensive description of the specific architecture utilized in the model.

- Our model comprises an input layer, three hidden layers, and an output layer, each utilizing a different number of nodes. The model architecture is as follows.

Input Layer: The input layer consists of nodes equal to the number of features in our training data. This layer passes the data directly to the first hidden layer.

First Hidden Layer: This layer contains 512 nodes. The layer utilizes a dropout (20%) regularization technique to prevent overfitting. The activation function employed in this layer is the Rectified Linear Unit (ReLU) function, which introduces non-linearity into the model.

Second Hidden Layer: This layer consists of 1024 nodes. Like the first hidden layer, this layer also uses dropout regularization (20%) and the ReLU activation function.

Third Hidden Layer: This layer is made up of 512 nodes. It also applies dropout regularization and the ReLU activation function.

Output Layer: The concluding layer of our model comprises three nodes and does not incorporate dropout or an activation function. The number of nodes corresponds to the number of categories we predict: the daily severity levels mild, moderate, and severe.
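A minimal sketch of the inference-time forward pass through this architecture follows (illustrative only: `n_features` is a placeholder, the weights are random rather than trained, and dropout is omitted because it is active only during training):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 30  # placeholder for the actual training feature count

# Layer widths: input -> 512 -> 1024 -> 512 -> 3 (mild/moderate/severe).
sizes = [n_features, 512, 1024, 512, 3]
weights = [rng.normal(0, 0.05, (a, b)) for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

def forward(x):
    """Forward pass with ReLU hidden layers and a linear 3-node output."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ w + b, 0.0)   # ReLU activation
    return h @ weights[-1] + biases[-1]  # no activation on the output layer

logits = forward(rng.normal(size=(5, n_features)))  # shape (5, 3)
```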

- It would be helpful to provide more information about the performance metrics used to evaluate the models.

- We used accuracy, AUROC, and AUPRC as model metrics, and the importance of the input variables was calculated with SHAP values. These are described in the Statistical Analysis section of the Methods.

- Data preprocessing and feature selection: The description of the data preprocessing and feature selection steps is clear. However, it would be useful to provide more details on the specific features selected for the models and the rationale behind their selection.

The process of feature selection was solely determined by the clinician serving as the first author. Given our extensive care of numerous COVID-19 patients, we carefully assessed patient severity based on a comprehensive set of variables, including laboratory results, vital signs, demographics, and other relevant factors. We firmly believed that these selected variables provided a sufficiently robust measure of patient severity and future outcome. As a result, we elaborated on the rationale behind the inclusion of these specific features in the "Data Processing and Feature Selection" section.

- The laboratory data encompassed a range of parameters, including complete blood cell count, chemistry markers (such as protein, albumin, liver function tests, BUN/Creatinine), electrolyte levels, inflammatory markers (CRP, LDH, ferritin, procalcitonin), and coagulation measurements. The selection of these specific variables was determined by the attending clinicians responsible for patient care, who deemed them relevant for the management and treatment of COVID-19 patients.

- The models used for prediction are well-described. However, it would be beneficial to provide more information on the hyperparameters of each model to ensure reproducibility of the study. Additionally, it would be helpful to provide references for the transformer model and clarify how it handles the temporal aspect of the data.

- As a consequence of space constraints within the manuscript, a comprehensive account of the hyperparameters for each model was relegated to the Supplement Method section. A more exhaustive and complete description of all hyperparameters can be found in the Supplement Method file.

2. The results section provides a comprehensive overview of the findings. However, it would be beneficial to include specific details on the performance metrics (AUROC and AUPRC) for each model at different prediction horizons (day 0, day 1, day 2). It would also be useful to provide statistical significance testing or confidence intervals to assess the differences in performance between models.

In order to ascertain statistical significance, we carried out random sampling of the test set 1000 times and calculated the AUROC, AUPRC, and accuracy, setting the confidence interval at the 2.5% and 97.5% percentiles. The resulting data have been incorporated into Table 2. Moreover, the models' threshold-based decisions were compared using decision curve analysis, with the result added in Fig 4. This allows straightforward comparison of model performance across threshold probabilities for the mild, moderate, and severe predictions.
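The resampling procedure described here can be sketched as below (illustrative code shown for accuracy; the same resampling would be applied to AUROC and AUPRC):

```python
import numpy as np

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=1000, seed=0):
    """95% CI for accuracy by resampling the test set with replacement."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    accs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # one bootstrap resample
        accs.append(np.mean(y_true[idx] == y_pred[idx]))
    return np.percentile(accs, [2.5, 97.5])        # CI bounds
```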

3. The discussion provides a good interpretation of the results and highlights the importance of considering hospital-specific factors in real-time prediction models. However, it would be helpful to discuss the limitations of the study, such as the generalizability of the findings to other healthcare settings and the potential impact of missing data on the model performance. Additionally, it would be helpful to compare the findings of this study with previous studies that have predicted COVID-19 severity.

- In the discussion, we added the potential limitations in the generalizability of our model as follows.

This study had some limitations. First, it was based on single-centered retrospective cohort data; thus, the model may not show high performance using other hospital datasets. Due to the difficulty of accessing other hospitals with a comparable patient group, external validation was restricted.

- This study had some limitations. First, it was based on single-centered retrospective cohort data; thus, the model may not show high performance on other hospital datasets. Due to the difficulty of accessing other hospitals with a comparable patient group, external validation was restricted. Variability in the frequency of vital sign checks and laboratory tests may result in a higher incidence of missing data. Additionally, the unavailability of daily symptom records at other hospitals could present challenges to the generalizability of our model.

Minor Comments:

1. In the introduction, it would be helpful to define the term "short-term outcomes" as used in the context of COVID-19 severity prediction.

- To enhance readability, we have revised the introduction to provide a clear explanation of the short-term outcome, as depicted below.

- Prediction of future need for intensive care is required for controlling capacities for facilities in medical centers. Timely preparation of essential medical devices based on short-term forecasts (1 day or 2 days) is critical for ensuring efficient management of healthcare resources. By predicting short-term patient outcomes, informed decisions can be made regarding the potential transfer of patients to alternate hospitals or care units within the same facility's ICU.

2. In the methods section, it would be useful to provide more information on the number of patients included in each severity category (mild, moderate, severe) to assess the distribution of severity levels in the dataset.

The distribution of patients based on severity is presented in Table 1. Among the observed cases, there were 2704 patients classified as mild, 955 as moderate, and 337 as severe.

3. In the results section, it would be beneficial to include a figure or table summarizing the performance metrics (AUROC, AUPRC) of each model at different prediction horizons.

Tables presenting the outcomes for various prediction horizons have been compiled as Table 2 (utilizing one-day information) and Table 3 (employing prior hospital day information by transformer). The details regarding the information horizon employed in the transformer model have been comprehensively explicated in the Supplement Method section and Supplement Figures 6 and 7.

4. In the discussion, it would be helpful to provide some insights into the clinical implications of the findings and how real-time prediction of COVID-19 severity can improve patient care and resource allocation.

For enhanced clarity and ease of comprehension, we have included the following sentences in the discussion section.

- By integrating our model into clinical practice, clinicians can readily monitor patients who may be at risk of deterioration due to COVID-19 as well as those showing signs of recovery. Such insights prove valuable in anticipating potential strains on medical facilities and allow for proactive arrangements, including network preparations for transferring patients requiring ventilator support and intensive care unit (ICU) facilities.

5. The manuscript could benefit from proofreading and minor grammatical edits to improve readability.

- We sincerely appreciate your invaluable feedback. Rest assured that we have taken great care in addressing all minor grammatical errors and rectifying any miswritten words, thanks to the expertise of an English language specialist. While the initial revision was conducted by a native English editor, we have undertaken a meticulous review and comprehensive amendment of the entire manuscript in this subsequent revision.

Reviewer #3: While many media sources continue to discount or obfuscate the long-term effects of COVID-19, I remain isolated to protect an immunocompromised spouse. Consequently, I am very familiar with how risk containment has changed over time and best practices. It is also cool to see a snapshot of what predicted risk from pre-pandemic to around Delta invasivity data.

For the introduction, this is an excellent summary of the first few years of the pandemic and the thesis of the manuscript. It is true that early precautionary measures were useful for reducing risk, and that this prevention or mitigation strategy was used in both mild and severe cases. I had not been aware of South Korea's efforts to minimize disease severity, but that is great. To continue reducing disease severity and better estimate hospitals being at full capacity, yes, predicting disease severity is critical. The point about event risk is well-taken. Most studies arbitrarily chose a given endpoint versus baseline, rather than taking all timepoints or dynamic timepoints into account. It is also true that many of these other studies, ours included, focused on onset of COVID-19 EHR instead of estimating recovery. Thus, predicting COVID-19 onset and recovery at the BMC is a very useful extension of prior work.

For methods, in summary, I see no problems here at all. To begin, given the secondary nature of the data, it makes sense that informed consent would be waived. The timestamped data for a robustly large cohort is also good. The data collected reflect what is available through EHRs and what is routinely collected in many tertiary care settings. Scaling variables is described surprisingly well. I commend the authors on this point. All of the decisions seem fine with regard to making variables binary, continuous, or bringing in the distribution tails (with clinician direction) when values might be unforeseen outliers. Supplementary Figures 1 and 2 also show a willingness for the raw data to be transparent, which I appreciate. The distributions are all similar to what I would expect in relatively healthy middle-aged to aged adults who present at a given clinic. Data missingness for mild cases is also understandable, as this will stochastically vary depending on the nursing staff, attending physician, and capacity of the tertiary clinic. Supplemental Table 1 goes into this in detail. For imputation, it is reasonable to include the mode to avoid bizarre behavior for vitals and other data in estimation analyses. I appreciate the additional data in Supplement Figures 3-5 that describe raw data, as well as data fit for mild to moderate COVID-19 cases using different estimation methods. For statistical methods, this all seems standard regarding classification metrics, split-model training vs. assessment probands, and even some reasonable mean +/- SD for input length.

For results, it is refreshing and welcome to read a brief summary of Table 1, as well as initial figures, and for it all to make intuitive sense. Table 2 reveals that regardless of the model type tested, the AUROC or AUPRC was outstanding. It was interesting how RF did better than DNN. Given the sparsity of the model set and N, however, too many interaction terms may have loaded and diluted overall model fit.

My only suggestion here is to list, either in text or the tables, if Model X significantly differed from Model Y (e.g., if Prediction Horizon day 0, 1, or 2 showed any difference for Input Length for Accuracy). In other words, just some basic statistics to formally show what is described in section 3.1. In section 3.2, the authors describe an intriguing pattern in the data: that the best fit methods predominantly extracted hospital/treatment factors (i.e., external factors) compared to DNN which extracted patient factors (i.e., internal factors). To strengthen or formalize this observation, a statistical test comparing factors on a binary scale ('0' = internal, '1' = external) might be useful. This is just a friendly suggestion for substantiating the claim made and is not a critique.

For the discussion, I again agree that predicting recovery is just as important as initial infection and degree of disease aggravation. Comparisons with other studies are appropriate and thoughtful. The strengths and limitations sections are both thorough and, again, thoughtful.

- Thank you for your exceptional review. Your thoughtful feedback has been immensely valuable, and we genuinely appreciate the insightful comments you have provided. While we recognized the differences in variable characteristics, we found it challenging to effectively illustrate these distinctions in our analysis. We eagerly welcome your valuable insights in this regard.

- To address the differentiation in input variables, we have incorporated a classification that categorizes variables as either extrinsic (hospital-related) or intrinsic (patient-related) in the Statistical Analysis section.

Input variable importance was assessed based on the data type, distinguishing between hospital/treatment-dependent factors (extrinsic factors) and patient-dependent factors (intrinsic factors). Hospital/treatment-dependent factors encompassed the number of vital sign occurrences (count, variance) and the dosage of drugs (max, min, median). On the other hand, patient-dependent factors included vital sign values (max, mean, median, min), demographics, and the presence of symptoms.

- The proportion of intrinsic factors among the top 20 most important variables is described in the Results section as below. Although the chi-square p-value did not show a clear distinction for this classification (intrinsic vs. extrinsic), we believe the issue lies in the small number of observations rather than the absence of actual differences between the models. Consequently, we posit that tree-based models may carry a higher risk of bias for time-series analysis.

- The factors were categorized as patient-dependent and hospital/treatment-dependent. When predicting a patient's severity, patient-dependent factors cannot be altered by the hospital to which the patient is admitted. However, hospital/treatment factors, such as the number of vital signs to be monitored and the dosage of steroids administered, can differ between hospitals. In the SHAP variable analysis, the patient-dependent percentage for each model was 95% in DNN, 75% in XGBoost, and 35% in random forest (p = 0.065).
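The comparison described above can be sketched as a chi-square test of independence on a model-by-factor-type contingency table. This is a minimal illustration, not the study's analysis code: the counts below are derived from the percentages quoted in the text for a top-20 feature list and need not reproduce the p-value reported above, which may stem from a different comparison.

```python
import math

# Hypothetical counts of patient-dependent (intrinsic) vs hospital/treatment-
# dependent (extrinsic) variables among each model's top-20 SHAP features.
# Illustrative numbers only.
table = {
    "DNN":          (19, 1),   # 95% intrinsic
    "XGBoost":      (15, 5),   # 75% intrinsic
    "RandomForest": (7, 13),   # 35% intrinsic
}

rows = list(table.values())
n = sum(sum(r) for r in rows)
col_totals = [sum(r[j] for r in rows) for j in range(2)]

# Pearson chi-square statistic for the 3x2 model-by-factor-type table
chi2 = 0.0
for r in rows:
    row_total = sum(r)
    for j in range(2):
        expected = row_total * col_totals[j] / n
        chi2 += (r[j] - expected) ** 2 / expected

dof = (len(rows) - 1) * (2 - 1)  # = 2
p = math.exp(-chi2 / 2)          # chi-square survival function for dof = 2
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4g}")
```

For two degrees of freedom the chi-square survival function reduces to exp(-x/2), so no statistics library is needed; with `scipy` available, `scipy.stats.chi2_contingency(rows)` performs the same test for any table shape.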

Attachment

Submitted filename: rebuttal_letter_plos one.doc

Decision Letter 1

John Adeoye

23 Oct 2023

PONE-D-23-14493R1

In-hospital real-time prediction of COVID-19 severity regardless of disease phase using electronic health records

PLOS ONE

Dear Dr. Jeong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 07 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

John Adeoye

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: All comments have been addressed

Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: The authors have addressed my comments. I have no further concerns. I think the authors did a good job addressing my comments

Reviewer #4: The article discusses the development and evaluation of machine learning and deep learning models to predict the daily severity of COVID-19 in patients at a dedicated hospital. The goal is to forecast the severity of the condition up to 2 days in advance to help manage healthcare resources efficiently. The study uses a dataset of COVID-19 patients from a specific medical center and assesses various models for their predictive accuracy.

The key findings and points in the article include:

- The importance of predicting the daily severity of COVID-19 to facilitate proactive resource allocation, such as respiratory devices.

- The comparison of different types of prediction models, including non-temporal (logistic regression, random forest, gradient boost) and temporal models (transformer).

- The use of patient data, including demographics, symptoms, laboratory tests, and vital signs as input features for prediction.

- The performance of the models in predicting severity for different time horizons (day 0, day 1, and day 2), with the random forest model outperforming others for predicting day 0 severity.

- The importance of feature selection and model interpretability using the SHAP method, with differences in feature importance between models.

- The discussion of model generalizability to other hospitals and the potential limitations of the study, including data availability and clinical variations.

In conclusion, the study presents a machine learning model, particularly the random forest model, as a valuable tool for predicting COVID-19 severity, aiding healthcare resource allocation, and offering insights into patient outcomes. The hierarchical transformer model also demonstrated good performance for certain periods of hospital admission. Further validation and research are needed to confirm the findings and adapt the model for use in different healthcare settings.

Comments that should be addressed by the authors:

1. Early identification of patients who are on the path to recovery could provide a justification for reducing or discontinuing ventilatory support and potentially adjusting the timing of treatment interventions?

2. The utilization of an exceptionally large number of predictors raises concerns about potential overfitting of the model to the dataset.

3. The algorithm predicts severity of disease within 2 days. How should a clinician interpret the results of the algorithm if the prediction is opposed to clinical evidence? For example, if a patient is at clinical stability and the algorithm predicts impending severe disease, what kind of information can be deduced by the clinician to address the causes of impending aggravation of the disease?

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: Yes: Auriel A. Willette

Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Jan 25;19(1):e0294362. doi: 10.1371/journal.pone.0294362.r004

Author response to Decision Letter 1


24 Oct 2023

Review Comments to the Author

Reviewer #3: The authors have addressed my comments. I have no further concerns. I think the authors did a good job addressing my comments

- Thank you for your positive feedback and acknowledgment of our efforts in addressing your comments. We greatly appreciate your constructive insights throughout the review process.

Reviewer #4: The article discusses the development and evaluation of machine learning and deep learning models to predict the daily severity of COVID-19 in patients at a dedicated hospital. The goal is to forecast the severity of the condition up to 2 days in advance to help manage healthcare resources efficiently. The study uses a dataset of COVID-19 patients from a specific medical center and assesses various models for their predictive accuracy.

The key findings and points in the article include:

- The importance of predicting the daily severity of COVID-19 to facilitate proactive resource allocation, such as respiratory devices.

- The comparison of different types of prediction models, including non-temporal (logistic regression, random forest, gradient boost) and temporal models (transformer).

- The use of patient data, including demographics, symptoms, laboratory tests, and vital signs as input features for prediction.

- The performance of the models in predicting severity for different time horizons (day 0, day 1, and day 2), with the random forest model outperforming others for predicting day 0 severity.

- The importance of feature selection and model interpretability using the SHAP method, with differences in feature importance between models.

- The discussion of model generalizability to other hospitals and the potential limitations of the study, including data availability and clinical variations.

In conclusion, the study presents a machine learning model, particularly the random forest model, as a valuable tool for predicting COVID-19 severity, aiding healthcare resource allocation, and offering insights into patient outcomes. The hierarchical transformer model also demonstrated good performance for certain periods of hospital admission. Further validation and research are needed to confirm the findings and adapt the model for use in different healthcare settings.

Comments that should be addressed by the authors:

1. Early identification of patients who are on the path to recovery could provide a justification for reducing or discontinuing ventilatory support and potentially adjusting the timing of treatment interventions?

- We value your insights. In our study, the primary goal was to proactively detect signs of patient deterioration and recovery. Although our model doesn't directly change patient trajectory upon early identification, its strength lies in the anticipation of potential deterioration. This early prediction becomes pivotal when considering resource allocation. On days where the model predicts fewer severe cases, receiving a call about a deteriorating patient might lead to more flexible allocation of resources, such as quickly assigning ventilatory support like HFNC. However, on days with a higher predicted count of at-risk patients, the same call might necessitate more judicious and possibly stringent resource allocation, ensuring that the most critical patients receive timely intervention. This nuanced approach to resource allocation, guided by our predictive model, aims to optimize patient care outcomes in varied scenarios.

2. The utilization of an exceptionally large number of predictors raises concerns about potential overfitting of the model to the dataset.

We acknowledge your concern regarding the potential for overfitting due to the extensive number of predictors. It's worth noting that our inputs predominantly consist of variables routinely collected during standard patient care, enhancing their potential for generalization across different hospitals. While many contemporary predictive models strive to incorporate as many inputs as possible, we are aware of the risks associated with overfitting. To mitigate this, we employed a methodological approach by partitioning the dataset into training, validation, and test sets—a standard practice that we have applied in our study. Furthermore, we recognize the potential for bias, especially given that some high-performing tree-based models may produce overfitted results by prioritizing hospital-dependent factors. As a proactive measure, we advocate for the use of deep learning models that exclude these potential biases (hospital-dependent elements), ensuring not only effective near-term outcome prediction but also facilitating model generalization across various hospitals.
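The partitioning approach described above can be sketched as follows. This is a minimal illustration under assumed parameters: the split ratios, random seed, and patient count (3,996, from the abstract) are stand-ins, not the study's exact protocol. Splitting at the patient level keeps all records from one patient in a single partition, which guards against leakage-driven overfitting.

```python
import random

def split_patients(patient_ids, train=0.7, val=0.15, seed=42):
    """Shuffle patient IDs and split them into train/validation/test sets.

    Ratios and seed are illustrative assumptions. The test fraction is
    whatever remains after the train and validation slices.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * val)
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])

train_ids, val_ids, test_ids = split_patients(range(3996))
print(len(train_ids), len(val_ids), len(test_ids))  # 2797 599 600
```

Model selection (hyperparameters, early stopping) would use only the validation partition, with the test partition held out for the final metrics reported.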

3. The algorithm predicts severity of disease within 2 days. How should a clinician interpret the results of the algorithm if the prediction is opposed to clinical evidence? For example, if a patient is at clinical stability and the algorithm predicts impending severe disease, what kind of information can be deduced by the clinician to address the causes of impending aggravation of the disease?

We recognize the significance of your concern. In time-series prediction models, including ours, there's an inherent challenge when the model's forecast diverges from the observed clinical status. This challenge isn't unique to our model but is a general consideration for all time-series prediction models. In our current study, our primary focus has been on forecasting potential risks and recovery. The question of how to act upon a prediction, especially when it appears to contradict clinical observations, is indeed the subsequent phase of our model's objectives. We understand that further prospective studies are warranted to assess the potential risks associated with such discrepancies. For the time being, our model has been developed with an emphasis on addressing the challenges of overcrowded hospitals, where clinicians are constantly grappling with decisions related to ICU admissions, HFNC assignments, and potential discontinuations. In such scenarios, our model could serve as a valuable tool for healthcare professionals navigating high-demand environments.

Attachment

Submitted filename: rebuttal_letter_plos_one1024.docx

Decision Letter 2

John Adeoye

26 Oct 2023

PONE-D-23-14493R2

In-hospital real-time prediction of COVID-19 severity regardless of disease phase using electronic health records

PLOS ONE

Dear Dr. Jeong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 10 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

John Adeoye

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Dear Authors,

Please include the concerns raised by 'Reviewer #4' in the last review round as potential limitations or recommendations for future work in the manuscript.

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Jan 25;19(1):e0294362. doi: 10.1371/journal.pone.0294362.r006

Author response to Decision Letter 2


29 Oct 2023

Response to Reviewer

Reviewer #3: The authors have addressed my comments. I have no further concerns. I think the authors did a good job addressing my comments

- Thank you for your positive feedback and acknowledgment of our efforts in addressing your comments. We greatly appreciate your constructive insights throughout the review process.

Reviewer #4: All comments have been addressed. (The Reviewer #4 summary and the author responses to comments 1-3 are identical to those in the author response to Decision Letter 1 above.)

Attachment

Submitted filename: response to reviewer 1024.docx

Decision Letter 3

John Adeoye

31 Oct 2023

In-hospital real-time prediction of COVID-19 severity regardless of disease phase using electronic health records

PONE-D-23-14493R3

Dear Dr. Jeong,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

John Adeoye

Academic Editor

PLOS ONE

Acceptance letter

John Adeoye

18 Jan 2024

PONE-D-23-14493R3

PLOS ONE

Dear Dr. Jeong,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. John Adeoye

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Supplement methods.

    (DOCX)

    S1 Fig. Missing value proportion per patient.

    (TIF)

    S2 Fig. Exploration of laboratory distribution.

    (TIF)

    S3 Fig. Difference of mode, median, and mean in ferritin value.

    (TIF)

    S4 Fig. Histogram of the length of hospital admission duration.

    (TIF)

    S5 Fig. The feature importance of the models for predicting mild and moderate COVID-19.

    (TIF)

    S1 Table. Percentile and clipping values of continuous variables.

    (DOCX)

    S2 Table. Symptom presentation of COVID-19 patients during hospital admission.

    (DOCX)

    S3 Table. Detailed model performance comparison.

    (DOCX)

    Attachment

    Submitted filename: rebuttal_letter_plos one.doc

    Attachment

    Submitted filename: rebuttal_letter_plos_one1024.docx

    Attachment

    Submitted filename: response to reviewer 1024.docx

    Data Availability Statement

    The data that support the findings of this study were provided by the Institutional Review Board (IRB) of Boramae Medical Center. While these data are not publicly accessible, they can be provided upon a reasonable request and with the approval of the IRB. For further inquiries or data requests, please directly contact the IRB of Boramae Medical Center at Tel: 82-2-870-1851. We have acknowledged the concern about the authors being the sole individuals responsible for data access and have taken appropriate measures to address this.

