Medical Journal of the Islamic Republic of Iran. 2022 Apr 4;36:30. doi: 10.47176/mjiri.36.30

Predicting the Need for Intubation among COVID-19 Patients Using Machine Learning Algorithms: A Single-Center Study

Raoof Nopour 1, Mostafa Shanbehzadeh 2, Hadi Kazemi-Arpanahi 3,4,*
PMCID: PMC9386770  PMID: 35999913

Abstract

Background: Owing to the shortage of ventilators, there is a crucial demand for an objective and accurate prognosis for critically ill patients with coronavirus disease 2019 (COVID-19) who may require a mechanical ventilator (MV). This study aimed to construct a predictive model using machine learning (ML) algorithms to help frontline clinicians triage at-risk patients and prioritize those who would need MV.

Methods: In this retrospective single-center study, the data of 482 COVID-19 patients from February 9, 2020, to December 20, 2020, were analyzed by several ML algorithms, including multilayer perceptron (MLP), logistic regression (LR), J-48 decision tree, and Naïve Bayes (NB). First, the most important clinical variables were identified using the Chi-square test at P < 0.01. Then, the performance of the ML algorithms was compared using several evaluation criteria, including TP-Rate, FP-Rate, precision, recall, F-Score, MCC, and Kappa, to identify the best-performing one.

Results: Predictive models were trained using 15 validated features, including cough, contusion, oxygen therapy, dyspnea, loss of taste, rhinorrhea, blood pressure, absolute lymphocyte count, pleural fluid, activated partial thromboplastin time, blood glucose, white cell count, cardiac diseases, length of hospitalization, and other underlying diseases. The results indicated that the J-48 algorithm, with F-score = 0.868 and AUC = 0.892, yielded the best performance for predicting the intubation requirement.

Conclusion: ML algorithms have the potential to improve on traditional clinical criteria for forecasting the need for intubation in hospitalized COVID-19 patients. Such ML-based prediction models may help physicians optimize the timing of intubation, allocate MV resources and personnel better, and improve patients' clinical status.

Keywords: COVID-19, Coronavirus, Machine Learning, Intubation, Prognosis, Mechanical Ventilator

Introduction

What is “already known” in this topic:

Despite effective, large-scale vaccination programs, the number of new COVID-19 cases, driven by the extensive spread of multiple variants, has increased. The pandemic has overwhelmed health care systems across the world and caused severe shortages of critical medical resources.

What this article adds:

In this study, we applied several machine learning algorithms to predict the likelihood that hospitalized COVID-19 patients would need mechanical ventilation, based on routine clinical data collected at the time of admission. The results showed that machine learning algorithms achieve a reasonable level of accuracy in predicting the risk of intubation among hospitalized COVID-19 patients.

Coronavirus disease 2019 (COVID-19) is a highly contagious viral infection that has spread rapidly around the world since it was first reported in early December 2019 in Wuhan, Hubei province, China (1,2). COVID-19 is characterized by a varied and multi-dimensional clinical picture. Disease severity ranges from asymptomatic infection to mild symptoms, with baseline comorbidities appearing about one week after infection onset, and even to serious progressive complications in a small proportion of patients who require intensive care unit (ICU) admission (3,4). Despite effective, large-scale vaccination plans, the number of new COVID-19 cases, driven by multiple highly contagious variants, has plateaued (5-7). Old age, male sex, pre-existing conditions, and hypoxemia have been identified as significant factors leading to the critical stage (8-10). The critical stage of COVID-19 is characterized by serious complications such as acute respiratory distress syndrome (ARDS), cytokine storm syndrome, and multi-system organ dysfunction (MOF) (10,11).

COVID-19 patients with acute respiratory insufficiency require a mechanical ventilator (MV) and supplemental oxygen (12). Therefore, to manage the scarcity of MVs, clinical judgment is required to decide who needs early intubation, for whom it can be postponed, and who does not need it (13). Furthermore, the course and outcome of COVID-19 are unpredictable, which complicates this situation. There is a high degree of uncertainty in the deterioration of a patient's clinical status and in the speed at which cases develop respiratory distress requiring MV. Estimation of the number of patients who need MV has been considered in previous studies (8-11,14,15). To address these problems, we aimed to develop machine learning (ML)-based prediction models that help frontline clinical workers and public health authorities triage at-risk patients and prioritize those who would need MV.

ML, a sub-category of artificial intelligence (AI), is increasingly employed for COVID-19 screening, diagnosis, and outcome prediction and prognosis (16,17). It can rapidly synthesize and analyze high-dimensional data. ML algorithms are employed to generate prognostic models that support and improve clinical decision-making for a wide variety of outcomes (18,19). In prior studies, a large number of ML-based models were developed for estimating the risk of COVID-19 severity and patient deterioration (16,20), ICU admission (20-24), and death (21,22,25-30). Thus, this study aimed to construct and compare several ML-based models for predicting which COVID-19 patients are severe enough to require MV.

Methods

This retrospective single-center study aimed to predict the need for MV among hospitalized COVID-19 patients using four popular ML algorithms.

Dataset definition

In this study, a hospital-based COVID-19 registry from Ayatollah Taleghani hospital, Abadan city, Southwest of Khuzestan, Iran, was retrospectively reviewed for the period from February 9, 2020, to December 20, 2020. During this period, a total of 6854 suspected COVID-19 cases were referred to this center, of whom 1853 were identified as COVID-19 positive, 2472 as negative, and 2529 as unspecified.

The inclusion criteria for patient selection were: 1- hospitalized patients with confirmed COVID-19, 2- patients older than 18 years of age, 3- those with good-quality and comprehensive medical documentation (less than 70% missing), and 4-. The exclusion criteria were: 1- non-COVID-19 cases, non-hospitalized COVID-19 cases, or patients with unknown disposition, 2- patients younger than 18 years of age, who fall within the scope of pediatric studies, 3- incomplete case records (more than 70% missing), and 4- discharge or death from the emergency department or unknown patient disposition. The data on 1853 RT-PCR-positive patients were extracted from the Ayatollah Taleghani hospital registry database. As shown in Table 1, 53 clinical features in six classes, namely demographic data (five features), clinical features (14 features), history of personal diseases (five features), epidemiological factors (two features), laboratory results (26 features), and remedies (one feature), together with an output variable (0: non-intubation and 1: intubation), were extracted from the dataset. Table 1 lists all candidate determinants associated with the prediction of intubation.

Table 1. All extracted clinical features from the dataset.

| Mode | Feature classes | Features |
|---|---|---|
| Inputs | Basic | Age, sex, height, weight, and blood group |
| Inputs | Clinical | Cough, nausea, headache, gastrointestinal (GI) manifestation, chill, loss of taste and smell, rhinorrhea, sore throat, contusion, fever, muscular pain, vomiting, and dyspnea |
| Inputs | History of diseases | Cardiac disease, pneumonia, hypertension, diabetes, and other underlying diseases |
| Inputs | Laboratory | Red-cell count, hematocrit, hemoglobin, absolute lymphocyte count, blood calcium, blood potassium, absolute neutrophil count, alanine aminotransferase (ALT), magnesium, prothrombin time, alkaline phosphatase, platelet count, hypersensitive troponin, creatinine, white cell count, aspartate aminotransferase (ASP), blood glucose, total bilirubin, erythrocyte sedimentation rate (ESR), C-reactive protein, albumin, activated partial thromboplastin time, lactate dehydrogenase (LDH), blood phosphorus, blood sodium, and blood urea nitrogen (BUN) |
| Inputs | Epidemiological | Smoking, alcohol addiction |
| Inputs | Remedy | Oxygen therapy |
| Output | Outcome | Endotracheal intubation (Yes, No) |

Dataset normalization and preprocessing

In this study, all included cases were first reviewed by two health information managers (R.N. and H.K.A.) in consultation with two infectious disease and virology specialists. After all patients' records were reviewed, those with more than 70% missing values were omitted from the analysis. The remaining missing values were imputed in the RapidMiner Studio V7.1.001 environment: the mean of the available values was used for quantitative variables, and the K-Nearest Neighbors (KNN) method with Euclidean distance was used for qualitative variables.
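The two imputation strategies described above can be illustrated with a minimal sketch. It is not the RapidMiner workflow used in the study: the function names, the toy records, and the choice of k are assumptions made for illustration only.

```python
import math
from collections import Counter

def impute_mean(values):
    """Replace None entries in a numeric column with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def impute_knn_category(rows, target_idx, feature_idxs, k=3):
    """Fill a missing categorical value (None) with the majority label among the
    k nearest complete rows, using Euclidean distance on the numeric features."""
    complete = [r for r in rows if r[target_idx] is not None]
    filled = []
    for row in rows:
        if row[target_idx] is not None:
            filled.append(list(row))
            continue
        # Rank the complete rows by their distance to the incomplete row.
        nearest = sorted(
            complete,
            key=lambda c: math.dist([row[i] for i in feature_idxs],
                                    [c[i] for i in feature_idxs]),
        )[:k]
        label = Counter(c[target_idx] for c in nearest).most_common(1)[0][0]
        new_row = list(row)
        new_row[target_idx] = label
        filled.append(new_row)
    return filled

# Hypothetical toy data: (blood glucose, white cell count / 1000, cough)
glucose = impute_mean([110.0, None, 150.0, 190.0])
rows = [
    (110.0, 7.2, "Yes"),
    (115.0, 7.5, "Yes"),
    (200.0, 12.1, "No"),
    (112.0, 7.0, None),   # cough is missing for this record
]
filled_rows = impute_knn_category(rows, target_idx=2, feature_idxs=[0, 1], k=2)
print(glucose)
print(filled_rows[-1])
```

In practice the numeric features would usually be scaled before distances are computed, so that no single variable dominates the neighbor search.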

Feature selection

To reduce the dimensionality of the dataset, we used the Chi-square (χ2) test in SPSS software V25 to determine the relationship between each independent variable (53 variables) and the dependent variable (intubation: Yes or No) as the output class. P<0.01 was considered statistically significant.
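As a rough illustration of this filter, the sketch below computes the Pearson Chi-square statistic for a single 2x2 contingency table and applies the P<0.01 threshold. The counts are hypothetical, the function name is an assumption, and no continuity correction is applied, so the output need not match what SPSS reports for the study data.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = symptom present / absent, columns = intubated / not intubated
table = [[30, 10],
         [20, 40]]
chi2, p_value = chi_square_2x2(table)
keep_feature = p_value < 0.01   # significance threshold used in the study
print(round(chi2, 3), p_value, keep_feature)
```

A feature is retained for modeling only when `keep_feature` is true. Numeric variables would first need to be binned into categories before a test of this form applies.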

ML algorithms

Four ML algorithms were used in Weka V3.9 to build the prediction models for intubation risk assessment among hospitalized COVID-19 patients. They were chosen because of their frequent use in recent articles and their higher data classification performance compared with other data mining algorithms.

Multilayer Perceptron (MLP): MLP is one of the most popular artificial neural networks (ANNs) used for knowledge modeling in different scientific domains. An MLP consists of at least three layers of nodes: input, hidden, and output layers. Each node has its own weights for communication with other nodes. The input layer consists of the variables affecting the study output(s), so the number of nodes in this layer equals the number of independent variables in the study. The hidden, or processing, layer contains nodes, organized in one or more layers, that apply a mathematical function with logistic activation to produce suitable output values for different inputs. The number of nodes in the output layer equals the number of output variables, and this layer gives the results of the ANN calculations using linear activation (31-34). In this research, a back-propagation neural network (BPNN) with the tansig activation function was used to train the prediction model.
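The layered computation described above can be sketched as a single forward pass. This is an illustration under stated assumptions, not the Weka model from the study: it uses one hidden layer with the tansig (hyperbolic tangent) function, squashes the output with a logistic function so it reads as a risk score, and all weights are hypothetical rather than trained by back-propagation.

```python
import math

def tansig(z):
    """Hyperbolic tangent sigmoid, commonly called 'tansig'."""
    return math.tanh(z)

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a perceptron with a single hidden layer."""
    hidden = [
        tansig(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(hidden_weights, hidden_biases)
    ]
    score = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return logistic(score)  # assumption: squash to (0, 1) to read it as a risk score

# Hypothetical weights for 3 inputs and 2 hidden nodes
x = [1.0, 0.0, 0.5]
hidden_weights = [[0.4, -0.2, 0.1], [-0.3, 0.8, 0.5]]
hidden_biases = [0.0, 0.1]
output_weights = [1.2, -0.7]
output_bias = 0.05
risk = mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias)
print(risk)
```

Training would adjust the weights by propagating the prediction error backwards through these same layers.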

Logistic Regression (LR): LR has various applications, especially in health domains, for example in estimating outcomes from different influencing factors and building prognostic models (35,36). It is a probabilistic statistical model that can predict the dependent variable(s) in two situations: 1- the dependent variable is qualitative with two or more values, also known as binominal and polynominal variables, respectively, and 2- the independent variables are highly correlated with respect to the output class. In this situation, we can evaluate the effects of the variables on each other in predicting the probability of the output class. The formula of LR is given in Equation 1, where y is the predicted output, a is the intercept term or bias, and b is the coefficient for the single input value x (37-39).

Equation 1:

$y = \dfrac{e^{a+bx}}{1+e^{a+bx}}$
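Equation 1 can be evaluated directly. The sketch below is a minimal illustration for a single input; the intercept and coefficient values are hypothetical and are not estimates from the study.

```python
import math

def logistic_probability(x, a, b):
    """Equation 1: predicted probability of the positive class for a single input x."""
    z = a + b * x
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical intercept (a) and coefficient (b)
a, b = -2.0, 0.5
p_at_zero = logistic_probability(0.0, a, b)
p_at_four = logistic_probability(4.0, a, b)   # here a + b*x = 0, the decision boundary
print(p_at_zero, p_at_four)
```

The output always lies between 0 and 1, and a positive coefficient b means the predicted probability rises as x increases.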

J-48: The C4.5 decision tree algorithm, known as J-48 in the Weka data mining environment, is a more advanced algorithm developed from the ID3 decision tree algorithm. Because rule sets can be extracted from it, this algorithm is more widely applicable than other algorithms. As in other decision trees, the classes of the dependent variable lie in the leaves of the tree, and the input variables lie on the paths from the root node, which holds the independent variable with the highest information gain (IG), to the leaf nodes. These paths are called branches, and rules can be extracted from them. The IG is a splitting criterion used to build the decision tree: it is the difference between the entropy of the parent node and the weighted entropies of its branches. Equation 2 shows a simple formula used in calculating the IG, where C is the number of classes in the dataset and Pi is the probability of randomly selecting an element of class i.

Equation 2:

$IG = -\sum_{i=1}^{C} P_i \log_2 P_i$

In general, this algorithm offers several useful features, such as pruning the decision tree by setting the confidence factor, handling continuous and numerical variables, accounting for missing values when classifying samples, and deriving rules, which give it an advantage over other algorithms, especially other decision trees (40-42).
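The entropy and information gain calculation behind the node-splitting step can be sketched as follows. The labels and the split are hypothetical, and the function names are assumptions. C4.5 itself normalizes this quantity into a gain ratio, which the sketch omits to stay close to the plain IG described above.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, -sum(p_i * log2(p_i)), over the class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Hypothetical split of 8 patients on a binary feature
parent = ["Yes"] * 4 + ["No"] * 4
left = ["Yes", "Yes", "Yes", "No"]
right = ["Yes", "No", "No", "No"]
gain = information_gain(parent, [left, right])
print(round(entropy(parent), 3), round(gain, 3))
```

The variable whose split yields the largest gain is placed at the node, and the procedure repeats on each branch.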

Naïve Bayes (NB): NB is a simple algorithm based on Bayes' theorem. It treats all feature values used to predict the output class as independent of one another, in contrast with most other algorithms, such as LR, which take correlations between input variables into account, and it gives all features equal weight in determining the output. It can be used for mining high-dimensional datasets. Some outstanding features of this algorithm are: 1- training time that is linear in the number of features, 2- low variance: although its classification is highly biased, it is a low-variance algorithm because it does not use a search method, and 3- insensitivity to missing values: all features in the database are used in predicting the output class, so when the value of one feature is missing, the other features can still be used for prediction with only a slight decrease in performance. In general, because it uses all features in the database and is probabilistic in nature, this algorithm is less sensitive to noise and missing values. The probability of the output class under NB is calculated with Equation 3: the probability of Y given X equals the probability of feature X given that the output class Y occurs, multiplied by the probability of the output class, P(Y), and divided by the probability of X. This equation reflects the independent contribution of each input feature to the occurrence of the output class (42-45).

Equation 3:

$P(Y \mid X) = \dfrac{P(Y)\,P(X \mid Y)}{P(X)}$
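Equation 3, combined with the independence assumption, can be applied to several features at once by multiplying the per-feature likelihoods. The sketch below is illustrative only: the priors and likelihoods are hypothetical numbers, not values estimated from the study data, and the function name is an assumption.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Posterior P(Y | X) for each class, assuming features are independent given Y."""
    unnormalized = {}
    for label, prior in priors.items():
        score = prior
        for feature, value in observed.items():
            score *= likelihoods[label][feature][value]
        unnormalized[label] = score          # P(Y) * product of P(x_j | Y)
    evidence = sum(unnormalized.values())    # P(X)
    return {label: s / evidence for label, s in unnormalized.items()}

# Hypothetical probabilities, not estimated from the study data
priors = {"intubation": 0.4, "no_intubation": 0.6}
likelihoods = {
    "intubation":    {"dyspnea": {"Yes": 0.9, "No": 0.1}, "cough": {"Yes": 0.8, "No": 0.2}},
    "no_intubation": {"dyspnea": {"Yes": 0.5, "No": 0.5}, "cough": {"Yes": 0.7, "No": 0.3}},
}
posterior = naive_bayes_posterior(priors, likelihoods, {"dyspnea": "Yes", "cough": "Yes"})
print(posterior)
```

A feature with a missing value can simply be left out of `observed`, which is one way to see why the method tolerates missing data.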

Performance evaluation of selected ML algorithms

In this study, the confusion matrix (Table 2) was used to measure the classification capability of each data mining algorithm. In this table, True Positive (TP) represents hospitalized COVID-19 patients who underwent intubation and were correctly classified by the data mining algorithm; True Negative (TN) represents hospitalized COVID-19 patients who were not intubated and were correctly classified by the model. False Negative (FN) and False Positive (FP) represent hospitalized COVID-19 patients who were and were not intubated, respectively, and were incorrectly classified by the model. Based on the confusion matrix, the TP-Rate, FP-Rate, Precision, Recall, F-Score, Matthews Correlation Coefficient (MCC), Kappa statistic, and area under the receiver operating characteristic (ROC) curve (AUC) of each algorithm were measured, and the capability of each algorithm was assessed using these criteria. 10-fold cross-validation was used for this purpose. Finally, the best data mining algorithm is described in more detail.
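The 10-fold cross-validation step can be sketched as an index split: the records are shuffled once, divided into ten folds, and each fold serves as the test set exactly once. The function name and the fixed seed are assumptions. Tools such as Weka typically also stratify the folds by class, which this sketch does not do.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k folds of nearly equal size
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(482, k=10))
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)
```

Each model is trained on the nine training folds and scored on the held-out fold, and the evaluation criteria are aggregated over the ten rounds.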

Table 2. Confusion matrix.

| | Predicted + | Predicted - |
|---|---|---|
| Real + | TP | FP |
| Real - | FN | TN |
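For reference, the standard textbook definitions of the confusion-matrix criteria listed above are given below. They follow the verbal definitions in the text (FN counts intubated patients who were misclassified, FP counts non-intubated patients who were misclassified), with N = TP + FP + FN + TN. The source does not state whether the reported values are per-class or class-averaged figures, so these formulas are a guide and not a statement of how Weka produced the reported numbers.

```latex
\begin{align*}
\text{TP-Rate} = \text{Recall} &= \frac{TP}{TP + FN}, &
\text{FP-Rate} &= \frac{FP}{FP + TN}, \\
\text{Precision} &= \frac{TP}{TP + FP}, &
\text{F-Score} &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \\
\text{MCC} &= \frac{TP \cdot TN - FP \cdot FN}
  {\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}, \\
\kappa &= \frac{p_o - p_e}{1 - p_e}, &
p_o &= \frac{TP + TN}{N}, \\
p_e &= \frac{(TP+FN)(TP+FP) + (FP+TN)(FN+TN)}{N^2}.
\end{align*}
```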

Results

After the exclusion criteria were applied, 482 case records were selected for the study (191 intubated and 291 non-intubated hospitalized COVID-19 patients) (Fig. 1).

Fig. 1. Flow chart describing patient selection

The Chi-square test of the association between each factor and the intubation outcome showed that the following variables had P>0.01. They were therefore not considered important factors for predicting intubation among hospitalized COVID-19 patients and were excluded from the analysis. Basic variables: age (χ2=3.222, P=0.124), sex (χ2=6.222, P=0.126), height (χ2=2.256, P=0.068), weight (χ2=16.226, P=0.285), and blood group (χ2=4.446, P=0.123). Clinical manifestations: nausea (χ2=12.567, P=0.072), headache (χ2=1.114, P=0.049), GI manifestation (χ2=2.774, P=0.171), chill (χ2=21.552, P=0.243), loss of smell (χ2=4.771, P=0.110), sore throat (χ2=5.54, P=0.086), fever (χ2=13.446, P=0.121), muscular pain (χ2=21.256, P=0.056), and vomiting (χ2=14.954, P=0.151). Laboratory findings: red-cell count (χ2=3.223, P=0.068), hematocrit (χ2=6.532, P=0.113), hemoglobin (χ2=1.32, P=0.081), blood calcium (χ2=4.412, P=0.095), blood potassium (χ2=3.12, P=0.072), absolute neutrophil count (χ2=14.889, P=0.171), ALT (χ2=2.226, P=0.144), blood magnesium (χ2=1.112, P=0.085), alkaline phosphatase (χ2=5.847, P=0.062), platelet count (χ2=1.776, P=0.041), hypersensitive troponin (χ2=4.112, P=0.075), creatinine (χ2=7.412, P=0.041), ASP (χ2=2.745, P=0.093), total bilirubin (χ2=18.745, P=0.166), ESR (χ2=14.256, P=0.083), C-reactive protein (χ2=5.445, P=0.143), albumin (χ2=12.332, P=0.121), activated partial thromboplastin time (χ2=13.227, P=0.165), LDH (χ2=4.556, P=0.064), blood phosphorus (χ2=1.226, P=0.082), blood sodium (χ2=7.747, P=0.188), and BUN (χ2=2.266, P=0.121). Epidemiological factors: alcohol consumption (χ2=16.227, P=0.075) and smoking (χ2=8.887, P=0.111). History of diseases: pneumonia (χ2=4.536, P=0.162) and diabetes (χ2=11.447, P=0.061).

Fifteen variables had a meaningful relationship with the output class (intubation prediction) at P<0.01; they are shown in Table 3.

Table 3. Important features related to the prediction of the need for MV.

| No | Variable | Variable's type | Frequency or Mean ± SD | χ2 | p-value |
|---|---|---|---|---|---|
| 1 | Cough | Nominal | Yes (401), No (81) | 5.949 | <0.001 |
| 2 | Contusion | Nominal | Yes (180), No (302) | 4.997 | <0.001 |
| 3 | Oxygen therapy | Nominal | Yes (437), No (45) | 7.01 | <0.001 |
| 4 | Dyspnea | Nominal | Yes (442), No (40) | 15.023 | <0.001 |
| 5 | Loss of taste | Nominal | Yes (124), No (358) | 7.722 | <0.001 |
| 6 | Rhinorrhea | Nominal | Yes (202), No (280) | 10.239 | <0.001 |
| 7 | Blood pressure | Nominal | Yes (189), No (293) | 7.281 | <0.001 |
| 8 | Absolute lymphocyte count | Numeric | 21.702±12.01 | 23.46 | <0.001 |
| 9 | Pleural fluid | Nominal | Yes (275), No (78) | 19.583 | <0.001 |
| 10 | Activated partial thromboplastin time | Numeric | 35.453±9.25 | 17.458 | <0.001 |
| 11 | Blood glucose | Numeric | 148.4±96.946 | 12.884 | <0.001 |
| 12 | White cell count | Numeric | 9684±1241 | 14.424 | <0.001 |
| 13 | Cardiac diseases | Nominal | Yes (157), No (325) | 12.491 | <0.001 |
| 14 | Length of hospitalization | Numeric | 5.03±2.188 | 2.713 | <0.001 |
| 15 | Other underlying diseases | Nominal | Yes (339), No (143) | 13.277 | <0.001 |

According to Table 3, 15 variables showed a meaningful association at P<0.01. Of these, five variables, namely history of cardiac diseases (χ2=12.491, P<0.001), pleural fluid (χ2=19.583, P<0.001), absolute lymphocyte count (χ2=23.46, P<0.001), cough (χ2=5.949, P<0.001), and dyspnea (χ2=15.023, P<0.001), yielded the highest association at P<0.001 for predicting the need for MV among hospitalized COVID-19 patients. The confusion matrices obtained from classifying the samples are shown in Table 4.

Table 4. The data mining algorithm’s confusion matrix.

| No | Algorithm | TP | FP | FN | TN |
|---|---|---|---|---|---|
| 1 | MLP | 212 | 78 | 108 | 84 |
| 2 | LR | 241 | 49 | 95 | 97 |
| 3 | J-48 | 266 | 24 | 39 | 153 |
| 4 | NB | 195 | 95 | 56 | 136 |

According to Table 4, the J-48 decision tree algorithm, with TP=266 and TN=153, yielded the highest performance in predicting the need for MV. With FP=24 and FN=39, it also had the fewest incorrectly classified samples. The performance of the selected ML algorithms in terms of TP-Rate, FP-Rate, Precision, Recall, F-Score, MCC, and Kappa statistic is shown in Figure 2.
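The counts in Table 4 allow some of the summary statistics to be re-derived. The following is a minimal sketch and not the authors' Weka workflow; the function name `summarize` is hypothetical. It computes accuracy, Cohen's kappa, and MCC, which are all unchanged when FP and FN are swapped and so do not depend on how the off-diagonal cells are labeled.

```python
import math

# Counts copied from Table 4: (TP, FP, FN, TN)
confusion = {
    "MLP":  (212, 78, 108, 84),
    "LR":   (241, 49, 95, 97),
    "J-48": (266, 24, 39, 153),
    "NB":   (195, 95, 56, 136),
}

def summarize(tp, fp, fn, tn):
    """Accuracy, Cohen's kappa, and MCC from the four confusion-matrix cells."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return accuracy, kappa, mcc

results = {name: summarize(*cells) for name, cells in confusion.items()}
for name, (accuracy, kappa, mcc) in results.items():
    print(f"{name}: accuracy={accuracy:.3f} kappa={kappa:.3f} MCC={mcc:.3f}")
```

For J-48 this gives values close to the reported Kappa of 0.723 and MCC of 0.725, and for MLP values close to the reported 0.173 and 0.175.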

Fig. 2. Visual comparison of ML algorithm capabilities for prediction of the need for MV

The AUC results for sample classification are shown in Figure 3 (the vertical and horizontal axes show the TP-Rate and FP-Rate, respectively).

Fig. 3. The ROC diagrams of selected ML algorithms

According to Figures 2 and 3, the J-48 decision tree algorithm, with TP-Rate=0.869, FP-Rate=0.155, Precision=0.869, Recall=0.869, F-Score=0.868, MCC=0.725, Kappa=0.723, and AUC=0.892, had the best capability for early prediction of the risk of intubation in hospitalized COVID-19 patients. In contrast, the MLP, with TP-Rate=0.614, FP-Rate=0.446, Precision=0.605, Recall=0.614, F-Score=0.607, MCC=0.175, Kappa=0.173, and AUC=0.639, had the worst predictive performance. The J-48 decision tree, built with a confidence factor of 0.15, is therefore depicted in Figure 4.

Fig. 4. The pruned J-48 decision tree algorithm

Several clinical rules were extracted from the J-48 decision tree; the two most important, which classify the largest numbers of samples, are presented here.

Rule 1: IF (Activated partial thromboplastin time <=31) THEN Intubation=True. This rule can be interpreted as follows: among the 64 research samples with an activated partial thromboplastin time of 31 or less, 47 underwent intubation. As the root node of the J-48 decision tree, this variable was considered the most important factor for determining the risk of endotracheal intubation among hospitalized COVID-19 patients.

Rule 2: IF (Activated partial thromboplastin time >31 && Pleural fluid=Yes && White cell count <9200 && Activated partial thromboplastin time <=43) THEN endotracheal intubation risk=negative. In this study, 221 samples matched this rule, and 187 of them were correctly classified by it as negative, or at low risk of endotracheal intubation. Because it classified the most samples, this rule was recognized as the most important decision rule in this research.
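The two rules above translate directly into conditional logic. The sketch below encodes only these two branches. The function and parameter names are hypothetical, measurement units are not stated in the text, and the remaining branches of the tree are not reproduced because they are not listed.

```python
def j48_rule_prediction(aptt, pleural_fluid, white_cell_count):
    """Apply the two reported rules; return None when neither rule covers the case."""
    # Rule 1: activated partial thromboplastin time of 31 or less means intubation.
    if aptt <= 31:
        return True
    # Rule 2: aPTT above 31 and at most 43, pleural fluid present, and
    # white cell count below 9200 mean low risk of intubation.
    if aptt <= 43 and pleural_fluid and white_cell_count < 9200:
        return False
    return None  # covered by other branches of the tree, not listed in the text

first = j48_rule_prediction(aptt=29, pleural_fluid=False, white_cell_count=11000)
second = j48_rule_prediction(aptt=38, pleural_fluid=True, white_cell_count=8000)
third = j48_rule_prediction(aptt=50, pleural_fluid=True, white_cell_count=8000)
print(first, second, third)
```

Returning None for uncovered cases keeps the sketch from implying a prediction that the two published rules do not support.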

Discussion

Given the wide spectrum of COVID-19 clinical manifestations, it is important to construct models for estimating the likelihood of intubation using ML techniques. We therefore trained four ML-based models on the parameters most strongly related to the risk of intubation, as identified by statistical analysis. The ML methods employed included ANN, LR, J-48, and NB, which were trained using the most important predictors from 482 hospitalized, laboratory-confirmed COVID-19 patients at the time of admission. Based on our analysis, the J-48 classifier, with an F-score of 0.868 and AUC of 0.892, performed better than the other selected ML algorithms.

During the COVID-19 pandemic, informed decision-making is imperative, especially where the healthcare system faces a growing surge of patients and scarcities in intensive care resources such as ICU beds and ventilators (46,47). Clinicians have reported difficulty in forecasting disease progression in hospitalized COVID-19 patients, as well as in detecting patients who are susceptible to rapid decompensation (48). In response to this life-threatening infection, the design and implementation of clinical decision support systems (CDSSs) will be critical for making optimal use of limited hospital resources and supporting clinical decisions (16,49). CDSSs equipped with ML can support clinical decisions by informing caregivers and recommending interventions based on objective and generalizable experimental data (50). Our study indicates that ML algorithms, particularly the J-48 algorithm, improve the analytic precision and discriminative power of these variables, increasing their usefulness for estimating the need for MV among hospitalized COVID-19 patients.

So far, several studies have evaluated the application of ML techniques in predicting poor COVID-19 outcomes. Saha et al. (2021) designed an intelligent system based on several ML algorithms, using a dataset of 1023 patients, to predict future intubation among hospitalized patients with COVID-19. The best performance was yielded by the DT algorithm, with an AUC of 0.84 (51). Alotaibi et al. (2021) assessed the performance of three ML algorithms for early prediction of disease severity using the patient history and laboratory findings of patients with COVID-19; the best performance among the applied techniques was yielded by Random Forest (RF) (AUC=0.897) (52). In a study performed by Cobre (2021), the data of 5,643 COVID-19 negative and positive samples were analyzed to predict individual severity with selected ML models. The results showed that the DT algorithm had good discriminative ability, with an accuracy of 86% (53). Yadaw et al. (2020) assessed the performance of four ML algorithms using a dataset of 3841 COVID-19 records for the prediction of COVID-19 deterioration and severity. The DT model, with an AUC of 0.92, was introduced as the most appropriate algorithm (29). Pan and colleagues (2020) assessed the performance of four ML algorithms for anticipating deterioration in patients with COVID-19, and the best performance was reported for the RF model (AUC=0.92) (24). Gao and colleagues (2020) retrospectively studied the medical records of 2520 hospitalized COVID-19 patients, with 13 physical features, to construct an intelligent predictive model for physiological deterioration and endotracheal intubation using selected ML algorithms. The DT model, with an AUC of 0.9760, achieved the best performance (25).
Similarly, in the current study, the J-48 decision tree, with an F-score of 0.868 and AUC of 0.892, showed the best capability for early prediction of the risk of intubation in hospitalized COVID-19 patients.

The high predictive measures attained by the J-48 model developed in our study show that it can correctly distinguish COVID-19 patients at high risk of requiring MV from those at low risk. The innovation of the current study lies in the fact that, in contrast to prior studies, we predicted the possibility of intubation based on the most pertinent predictors derived from feature selection. Furthermore, to precisely detect predictors of intubation in patients infected with COVID-19, we evaluated the patients' features at the time of admission and not at the progressive or severe stage of the disease. For this reason, some important laboratory features, such as increased ALT/ASP, high BUN, elevated C-reactive protein, and increased lymphocyte or neutrophil counts, were not identified as intubation predictors in our study, because these factors may develop only in the advanced stage, which was omitted from our analysis.

Zhou (11), Choron (54), Allenbach (23), Lei (55), and Yadaw (29) stated that predictors such as age (elderly), BMI (high), gender (male sex), ALT/ASP (raised), C-reactive protein (elevated), and oxygen saturation (decreased) were related to poor COVID-19 outcomes and patient deterioration. However, these factors are also very common in moderate or asymptomatic presentations of COVID-19, and our analysis did not demonstrate an association between these variables and intubation as a critical outcome of COVID-19. This discrepancy may originate from the analysis of only selected patients admitted to the hospital rather than a population-based investigation. Hence, if validated, these predictors could be used for estimating patients' risk of intubation and may support effective patient triage.

This work has some limitations that need to be addressed. First, because this was an analysis of a single-center, retrospective dataset with a limited sample size, and because the outcome of intubation is rare, the study design might be affected by several hypothesis-testing biases. External validation is therefore essential in further studies. Second, the dynamic variations of some significant variables must be followed up to recognize patients at higher risk of poor outcomes better and in a more timely manner. Finally, the selected dataset lacks some important clinical variables, such as radiological indicators. In the future, the accuracy and generalizability of our model could be enhanced by testing more ML techniques on a larger, multicenter, prospective dataset with higher-quality, validated data.

Conclusion

In this article, we analyzed data from a hospital registry to develop and test models capable of predicting the need for MV in hospitalized COVID-19 patients according to 15 baseline clinical features. The results showed satisfactory performance of the tuned J-48 decision tree model, which indicates that adopting such models is acceptable. Given the considerable challenges concerning hospital resources, including MV, during the COVID-19 pandemic, an accurate estimate of which patients are expected to require intubation may provide vital guidance for prioritizing patients and assigning restricted resources to those who urgently need them. Further, timely detection of such patients may allow for planned intubation and reduce some known risks related to urgent intubation. These prediction models may therefore contribute to better care delivery, a lower clinician workload, and less illness and death in the COVID-19 pandemic.

Acknowledgment

We thank the research deputy of the Abadan University of Medical Sciences for financially supporting this project (IR.ABADANUMS.REC.1400.071).

Conflict of Interests

The authors declare that they have no competing interests.

Cite this article as: Nopour R, Shanbehzadeh M, Kazemi-Arpanahi H. Predicting the Need for Intubation among COVID-19 Patients Using Machine Learning Algorithms: A Single-Center Study. Med J Islam Repub Iran. 2022 (4 Apr);36:30. https://doi.org/10.47176/mjiri.36.30

Articles from Medical Journal of the Islamic Republic of Iran are provided here courtesy of Iran University of Medical Sciences
