Abstract
In this study, we integrate a deep neural network (DNN) with hybrid approaches (feature selection and instance clustering) to build prediction models for predicting mortality risk in patients with COVID-19. In addition, we use cross-validation to evaluate the performance of these prediction models, including the feature-based DNN, cluster-based DNN, DNN, and neural network (multi-layer perceptron). The COVID-19 dataset with 12,020 instances and 10-fold cross-validation are used to evaluate the prediction models. The experimental results showed that the proposed feature-based DNN model, with Recall (98.62%), F1-score (91.99%), Accuracy (91.41%), and False Negative Rate (1.38%), outperforms the original prediction model (neural network). Furthermore, the proposed approach uses only the Top 5 features to build a DNN prediction model with high prediction performance, performing as well as the model built with all 57 features. The novelty of this study is that we integrate feature selection, instance clustering, and DNN techniques to improve prediction performance. Moreover, the proposed approach, which is built with fewer features, performs much better than the original prediction models on many metrics while still maintaining high prediction performance.
Keywords: COVID-19, Mortality risk, Deep learning, Feature-based DNN, Feature selection
1. Introduction
A new coronavirus disease known as COVID-19 has spread worldwide and has been declared a pandemic by the World Health Organization (Covid et al., 2020). Although the most important clinical symptoms are fever and cough, symptoms such as fatigue, headache and shortness of breath can also be seen. However, diagnostic tests are needed because these symptoms are not specific to the disease and the disease can progress rapidly to severe pneumonia (Akçay et al., 2020, Chen et al., 2020). Conghy et al. (2020) considered that once a coronavirus outbreak starts, it will take less than four weeks to overwhelm the healthcare system, and once hospital capacity is overwhelmed, the death rate jumps. Therefore, how to predict mortality risk in patients with COVID-19 using machine learning (ML) techniques is an interesting research issue.
There are numerous studies in the literature on COVID-19 disease detection by analyzing images. For example, Ozturk et al. (2020) proposed the DarkCovidNet model to classify COVID-19 in X-ray images. Hemdan et al. (2020) proposed a deep learning model called COVIDX-Net to analyze 25 COVID-19 and 25 healthy images. Wang et al. (2021) proposed a new architecture called M-inception, which modifies the classical Inception network, to diagnose COVID-19 from 1119 CT (computed tomography) images.
However, although ML algorithms provide data-driven capabilities, their performance and reliability are often limited by the quality of the data representation used to train and test them (Ellefsen et al., 2019). Moreover, datasets with many variables (high dimensionality) and redundant variables (features) degrade ML algorithm performance (Aremu et al., 2020, Russell and Norvig, 2016). Specifically, high-dimensional datasets highlight the limitations of ML algorithms (Laurence et al., 2019, Russell and Norvig, 2016). A dataset that contains a high amount of redundancy and low information content can result in poor ML performance and increased computation time (Cai et al., 2018, Li et al., 2019).
Pourhomayoun and Shakibi (2021) used several machine learning algorithms, including support vector machine (SVM), artificial neural networks (ANN), random forest, decision tree, logistic regression, and k-nearest neighbor (KNN), to predict the mortality rate in patients with COVID-19. How to evaluate better feature selection methods and integrate them with a deep neural network (DNN) to build more accurate prediction models is therefore an interesting research issue.
Different from COVID-19 disease detection by deep learning techniques in analyzing images, we attempt to integrate several approaches (feature selection, instance clustering, and DNN) to predict mortality risk in patients with COVID-19. We focus on enabling the proposed model to achieve high performance using fewer features. Therefore, how to use fewer, significant features to build a parsimonious prediction model with higher prediction performance is the main objective of this study. In addition, two feature selection approaches (filter and wrapper) are integrated into the DNN prediction models.
2. Related work
In this section, we review the issues and techniques related to COVID-19 disease detection and the deep learning methods used for COVID-19 disease prediction. We also review feature selection techniques.
2.1. COVID-19 disease detection by deep learning techniques in analyzing images
An increasing number of cases of novel coronavirus (2019-nCoV)-infected pneumonia (NCIP) have been identified since December 2019. The World Health Organization (WHO) defined an official name, COVID-19, for the infectious disease caused by the novel coronavirus. COVID-19 has spread worldwide and has been declared a pandemic by the WHO (Covid et al., 2020).
There are numerous studies in the literature on COVID-19 disease detection by analyzing images. Ozturk et al. (2020) proposed the DarkCovidNet model to classify X-ray images into COVID-19, healthy, and pneumonia classes, achieving an 87.02% accuracy rate. Hemdan et al. (2020) proposed a deep learning model called COVIDX-Net to analyze 25 COVID-19 and 25 healthy images, obtaining a 90% success rate. Wang et al. (2021) proposed a new architecture called M-inception, which modifies the classical Inception network, to diagnose COVID-19 from 1119 CT images, obtaining an accuracy of 89.5%, specificity of 88%, and sensitivity of 87%. Zhao et al. (2020) integrated transfer learning and data augmentation with deep learning to diagnose COVID-19 from 275 CT images, achieving an accuracy of 84.7%. Moreover, Loey et al. (2020) integrated a deep transfer learning model with classical data augmentation and a conditional generative adversarial network (CGAN) for detecting COVID-19 from chest CT images.
2.2. Feature selection techniques
The success of ML algorithms depends upon the quality of the data used to obtain a generalized predictive model of the classification problem. A dataset that contains large amounts of redundancy and low information content would result in poor performance of ML algorithms and increased computation time (Li et al., 2019). Therefore, the importance of feature selection (FS) for improving data quality, and subsequently the performance of ML algorithms, has been presented in many studies.
The classification of surface electromyography (sEMG) signals plays an important role in man-machine interfaces for properly controlling prosthetic devices with multiple degrees of freedom. Mukhopadhyay and Samui (2020) presented a detailed empirical exploration of a DNN-based classification system for upper limb position-invariant myoelectric signals. In their study, the DNN-based system outperformed other existing classifiers.
Uneven environmental conditions, such as branch and leaf occlusion, illumination variation, clusters of tomatoes, and shading, make fruit detection very challenging. Lawal (2021) therefore proposed a modified YOLOv3 model called the YOLO-Tomato model to detect tomatoes in complex environmental conditions. Because inter-subject variability, inherent complex properties, and the low signal-to-noise ratio (SNR) of electroencephalogram (EEG) signals are major challenges, Roy (2022) proposed an efficient transfer learning (TL)-based multi-scale feature fused convolutional neural network (MSFFCNN) that can capture the distinguishable features of various non-overlapping canonical frequency bands of EEG signals at different convolutional scales for multi-class MI classification. Automatic analysis, recognition, and prediction of the behavior of large-scale crowds in video-surveillance data is a research field of paramount importance for the security of modern societies. Matkovic et al. (2022) proposed a novel method for generating meta-tracklets and recognizing dominant motion patterns as a basis for automatic crowd behavior analysis at the macroscopic level, where a crowd is treated as an entity.
Considering Chi-square feature selection, which ranks features by a statistical significance test and retains only those features that are dependent on the class label, Thaseen et al. (2019) developed an intrusion detection model utilizing feature selection (Chi-square) and an ensemble of classifiers, including SVM, modified Naive Bayes (MNB), and LPBoost.
To achieve a higher classification accuracy, Ozyurt et al. (2021) proposed two basic feature generation functions (FRDEPFGN and RFINCA), which are used to extract statistical and textural features. The selected most informative features are forwarded to ANN and DNN for classification.
Similar to FS, feature extraction (FeExt) involves the derivation of new attributes from the prevailing attributes. Shastry and Sanjay (2021) proposed a hybrid FS and FeExt strategy, modified-Genetic Algorithm (m-GA) and weighted principal component analysis (wgt-PCA), for selecting features from the agricultural data set to achieve a higher classification accuracy.
Yuvaraj et al. (2021) developed a novel deep decision tree classifier that utilizes the hidden layers of a DNN as its tree nodes to process the input elements. In their study, three feature selection methods (information gain, χ², and Pearson correlation) are used to avoid a failure in classifying with limited features.
In the context of the COVID-19 outbreak, Akçay et al. (2020) stated that diagnostic tests are needed because the symptoms are not specific to the disease and the disease can progress rapidly to severe pneumonia. Besides, Conghy et al. (2020) considered that once a coronavirus outbreak starts, it will take less than four weeks to overwhelm the healthcare system; once hospital capacity is overwhelmed, the death rate jumps. Motivated by recent advances and applications of artificial intelligence (AI) and big data in various areas, Pham et al. (2020) emphasized their importance in responding to the COVID-19 outbreak and preventing the severe effects of the COVID-19 pandemic. They also provided researchers and communities with new insights into the ways that AI and big data can improve the COVID-19 situation and drive further studies on stopping the COVID-19 outbreak.
Cai et al. (2018) considered that feature selection methods can be broadly classified into three categories: filter, wrapper, and embedded methods. In embedded methods, feature selection is integrated into the learning algorithm itself. Wrapper methods evaluate feature importance based on the predictor algorithm's performance using various feature subsets. Filter methods select features by ranking them according to various criteria, such as feature variance and independence.
Different from COVID-19 disease detection by deep learning techniques in analyzing images, we attempt to integrate feature selection methods and DNN to predict mortality risk in patients with COVID-19. Two feature selection approaches (filter and wrapper) are integrated into the DNN to build prediction models with high prediction performance. How to use fewer features to build a prediction model with higher prediction performance is the main objective of this study.
2.3. Comparison of our research and literature
To describe the differences between our study and prior studies in terms of techniques, including machine learning, feature selection, and instance clustering, we provide a comparison in Table 1. Furthermore, we describe the differences between our study and prior studies on diagnosing or predicting COVID-19, as shown in Table 2.
Table 1.
Comparison of our research and literature.
| Studies | Machine learning | Feature selection | Instance clustering |
|---|---|---|---|
| Lawal (2021) | YOLOv3 | N | N |
| Mukhopadhyay and Samui (2020) | DNN | N | N |
| Ozyurt et al. (2021) | ANN, DNN | Functions (FRDEPFGN, RFINCA) | N |
| Roy (2022) | MSFFCNN | N | N |
| Shastry and Sanjay (2021) | m-GA, wgt-PCA | N | N |
| Thaseen et al. (2019) | SVM, MNB, LPBoost | χ² | N |
| Yuvaraj et al. (2021) | DNN | Information gain, χ², Pearson correlation | N |
| This study | DNN | χ², Pearson correlation, information gain, DT, LR, RF | Y |
ANN: Artificial Neural Networks; CGAN: Conditional Generative Adversarial Network; DNN: Deep Neural Networks; DT: Decision Tree; LR: Logistic Regression; KNN: k-Nearest Neighbor; m-GA: modified-Genetic Algorithm; MNB: Modified Naive Bayes; MSFFCNN: multi-scale feature fused CNN; RF: Random Forest; SVM: Support Vector Machine; wgt-PCA: weighted principal component analysis.
Table 2.
Comparison of our research and literature in analyzing COVID-19.
| Studies | Aim | Machine learning | Feature selection | Instance clustering |
|---|---|---|---|---|
| Hemdan et al. (2020) | Detect COVID-19 disease by analyzing images | COVIDX-Net | N | N |
| Loey et al. (2020) | Detect COVID-19 by analyzing chest CT images | CGAN | N | N |
| Ozturk et al. (2020) | Detect COVID-19 disease by analyzing images | DarkCovidNet | N | N |
| Wang et al. (2021) | Diagnose COVID-19 from CT images | M-inception | N | N |
| Pourhomayoun and Shakibi (2021) | Predict mortality risk in patients with COVID-19 | SVM, ANN, RF, DT, LR, KNN | Correlation | N |
| This study | Predict mortality risk in patients with COVID-19 | DNN | χ², Pearson correlation, information gain, DT, LR, RF | Y |
CGAN: Conditional Generative Adversarial Network; DNN: Deep Neural Networks; DT: Decision Tree; LR: Logistic Regression; KNN: k-Nearest Neighbor; MNB: Modified Naive Bayes; ANN: Artificial Neural Networks; RF: Random Forest; SVM: Support Vector Machine.
3. Research methodology
In order to improve the accuracy and performance of prediction models, we develop a new framework which integrates feature selection, clustering methods and deep learning to build prediction models. The proposed framework, development of prediction models, and assessment metrics are illustrated as follows.
3.1. The proposed framework for prediction models
In this section, we describe the four steps (data pre-processing, feature selection, instance clustering, and prediction model construction) of the proposed framework, as shown in Fig. 1. The four steps are introduced as follows.
Fig. 1.
The proposed framework.
Step 1: Data pre-processing
Data pre-processing is a technique that transforms the raw data into a useful format for applying machine learning techniques. At the data pre-processing stage, useless and redundant data are removed, as are unlabeled data instances.
Step 2: Feature selection
Two feature selection strategies (filter and wrapper) are used to select the most important features to build prediction models. Filter methods (χ², Pearson correlation, and information gain) select features by ranking them according to various criteria, such as feature variance and independence. After that, wrapper methods are used to evaluate feature importance based on the predictor algorithm's performance on various feature subsets.
Step 3: Instance clustering
We use specific attributes to cluster the instances of the dataset into sub-datasets. The attributes can be determined by experts or by clustering methods, such as k-means, EM (Expectation-Maximization), and DBSCAN. After that, specific clusters of instances can be used to build prediction models in the next step.
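As an illustration, the following is a minimal sketch of this step in Python (pandas/scikit-learn). It assumes the processed dataset is loaded as a DataFrame with a `country` column, and it shows k-means as the alternative when no expert-defined attribute is available; the file name, label column name, and number of clusters are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Expert-defined clustering: split instances by a chosen attribute (e.g., country).
df = pd.read_csv("covid19_processed.csv")            # hypothetical file name
country_subsets = {c: g for c, g in df.groupby("country")}

# Alternative: unsupervised clustering (k-means) on the feature columns.
feature_cols = [c for c in df.columns if c != "outcome"]   # "outcome" is a hypothetical label column
kmeans = KMeans(n_clusters=5, random_state=0, n_init=10)
df["cluster"] = kmeans.fit_predict(df[feature_cols])
cluster_subsets = {k: g for k, g in df.groupby("cluster")}
```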
Step 4: Prediction models building
We use the DNN to build prediction models with the important features selected by the above two feature selection strategies (filter and wrapper).
After the prediction models are built, we use popular assessment metrics to evaluate their prediction performance. The assessment metrics are described in Section 3.4.
3.2. Feature selection techniques
Two feature selection strategies (filter and wrapper) are used to select the most important features to build prediction models. Filter methods (χ², Pearson correlation, and information gain) select features by ranking them according to various criteria. Wrapper methods, namely LR (logistic regression), DT (decision tree), and RF (random forest), are used to evaluate feature importance based on the predictor algorithm's performance on various feature subsets. These selection methods are illustrated as follows.
3.2.1. Feature filter methods (χ², Pearson correlation, and information gain)
This study applies three feature filter methods, namely Chi-square (Bahassine et al., 2020, Forman, 2003, Thaseen et al., 2019), Pearson correlation (Tan et al., 2006, Yuvaraj et al., 2021), and information gain (Quinlan, 1986, Quinlan, 1987), to filter features. These three methods are described as follows.
Chi-Square (Forman, 2003)
The χ² statistic determines the level of independence between a feature (t) and the class label (c), and the result is compared to the χ² distribution with one degree of freedom. The Chi-square statistic is defined as:

$$\chi^{2}(t,c)=\frac{N(AD-CB)^{2}}{(A+C)(B+D)(A+B)(C+D)} \tag{1}$$

where

A: frequency with which feature (t) and class label (c) co-occur in the dataset.

B: frequency with which feature (t) appears without class label (c) in the dataset.

C: frequency with which class label (c) appears without feature (t) in the dataset.

D: frequency with which neither class label (c) nor feature (t) appears in the dataset.

N: total number of records.
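For illustration, a minimal NumPy sketch of Eq. (1) for a binary (0/1) feature and class label, written directly from the counts A, B, C, D, and N defined above (variable names are illustrative, not from the original code):

```python
import numpy as np

def chi_square(feature: np.ndarray, label: np.ndarray) -> float:
    """Chi-square statistic of Eq. (1) for binary (0/1) feature/label vectors."""
    A = int(np.sum((feature == 1) & (label == 1)))   # feature present, class present
    B = int(np.sum((feature == 1) & (label == 0)))   # feature present, class absent
    C = int(np.sum((feature == 0) & (label == 1)))   # feature absent, class present
    D = int(np.sum((feature == 0) & (label == 0)))   # neither present
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return float(N * (A * D - C * B) ** 2 / denom) if denom else 0.0

# Example ranking (binary features assumed):
# scores = {col: chi_square(X[col].values, y.values) for col in X.columns}
```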
Pearson Correlation (Tan et al., 2006, Yuvaraj et al., 2021)
The Pearson correlation coefficient is used in the present study to estimate optimal features by calculating the degree of linear correlation between the extracted feature and the original class. The Pearson correlation coefficient between two data objects, x and y, is defined as follows:

$$corr(x,y)=\frac{\sum_{k=1}^{n}(x_{k}-\bar{x})(y_{k}-\bar{y})}{\sqrt{\sum_{k=1}^{n}(x_{k}-\bar{x})^{2}}\sqrt{\sum_{k=1}^{n}(y_{k}-\bar{y})^{2}}} \tag{2}$$

where

$\bar{x}$: the mean of x.

$\bar{y}$: the mean of y.
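A minimal NumPy sketch of Eq. (2), equivalent to `np.corrcoef` (names are illustrative):

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient of Eq. (2)."""
    xc = x - x.mean()                       # x_k - x_bar
    yc = y - y.mean()                       # y_k - y_bar
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

# Example ranking of features by their coefficient with the class label (cf. Table 8):
# ranking = sorted(X.columns, key=lambda c: pearson(X[c].values, y.values), reverse=True)
```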
Information gain (Quinlan, 1986, Quinlan, 1987)
Let D be a set of class-labeled instances. Suppose the class label attribute has m distinct values defining m distinct classes $C_i$ (for $i = 1, 2, \ldots, m$). Let $C_{i,D}$ be the set of instances of class $C_i$ in D, and let $|D|$ and $|C_{i,D}|$ denote the number of instances in D and $C_{i,D}$, respectively. The expected information needed to classify an instance in D is given by

$$Info(D)=-\sum_{i=1}^{m}p_{i}\log_{2}(p_{i}) \tag{3}$$

where

$p_i$: the probability that an arbitrary instance in D belongs to class $C_i$, estimated by $|C_{i,D}|/|D|$.

A feature (also called an attribute) A can be used to split D into v partitions or subsets $\{D_1, D_2, \ldots, D_v\}$, where $D_j$ contains those instances in D that have outcome $a_j$ of A. The expected information needed to classify instances in D after partitioning by attribute A is given by

$$Info_{A}(D)=\sum_{j=1}^{v}\frac{|D_{j}|}{|D|}\times Info(D_{j}) \tag{4}$$

where

$|D_j|/|D|$: the weight of the jth partition.

$Info_A(D)$: the expected information required to classify an instance from D based on the partitioning by attribute A.

Information gain is defined as the difference between the original information requirement (i.e., based on just the proportion of classes) and the new requirement (i.e., obtained after partitioning by attribute A), that is, $Gain(A) = Info(D) - Info_A(D)$. Gain(A) tells us how much would be gained by branching on A. If attribute A has the highest information gain, it is chosen as the splitting attribute at node N. That is, we partition on attribute A to obtain the "best classification", so as to minimize the amount of information still required to finish classifying the instances, i.e., minimum $Info_A(D)$.
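A short sketch of Eqs. (3), (4), and Gain(A) for a categorical attribute, assuming NumPy arrays of attribute values and class labels (names illustrative):

```python
import numpy as np

def info(labels: np.ndarray) -> float:
    """Expected information Info(D), Eq. (3)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def info_gain(attribute: np.ndarray, labels: np.ndarray) -> float:
    """Gain(A) = Info(D) - Info_A(D), with Info_A(D) as in Eq. (4)."""
    total = len(labels)
    info_a = 0.0
    for value in np.unique(attribute):
        subset = labels[attribute == value]      # partition D_j with outcome a_j of A
        info_a += (len(subset) / total) * info(subset)
    return info(labels) - info_a
```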
3.2.2. Feature wrapper methods
This study applies three feature wrapper methods, namely logistic regression (LR) (Sperandei, 2014), decision tree (DT) (Quinlan, 1979, Quinlan, 2014), and random forest (RF) (Breiman, 2001), to rank features. These three methods are described as follows.
LR
Logistic regression works very similarly to linear regression, but with a binomial response variable. A logistic regression models the chance of an outcome based on individual characteristics (Sperandei, 2014). Because chance is a ratio, what is actually modeled is the logarithm of the chance (the log odds), given by:

$$\log\left(\frac{p}{1-p}\right)=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\cdots+\beta_{k}x_{k} \tag{5}$$

where

p: the probability of an event.

$\beta_0, \beta_1, \ldots, \beta_k$: the regression coefficients associated with the reference group and the explanatory variables $x_1, \ldots, x_k$.
DT
A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node. Quinlan (1979) developed the ID3 decision tree algorithm and later presented C4.5, a successor of ID3 (Quinlan, 2014), which has become a benchmark against which newer supervised learning algorithms are often compared. ID3 uses information gain as its attribute selection measure; however, this measure is biased toward tests with many outcomes. C4.5 uses the gain ratio as its attribute selection measure, which attempts to overcome this bias. The two measures, information gain and gain ratio, can be formalized as follows:

$$Gain(A)=Info(D)-Info_{A}(D) \tag{6}$$

$$SplitInfo_{A}(D)=-\sum_{j=1}^{v}\frac{|D_{j}|}{|D|}\times\log_{2}\left(\frac{|D_{j}|}{|D|}\right) \tag{7}$$

$$GainRatio(A)=\frac{Gain(A)}{SplitInfo_{A}(D)} \tag{8}$$

where

Info(D): the average amount of information needed to identify the class label of a tuple in D.

$Info_A(D)$: the expected information required to classify a tuple from D based on the partitioning by attribute A.
RF
Random forest is a class of ensemble methods specially designed for decision tree classifiers. It combines the predictions made by multiple decision trees, where each tree is generated based on the values of an independent set of random vectors (Breiman, 2001). The strength of a set of classifiers refers to the average performance of the classifiers, where performance is measured probabilistically in terms of the classifier's margin:

$$margin(X,Y)=P\left(Y_{\theta}(X)=Y\right)-\max_{Z\neq Y}P\left(Y_{\theta}(X)=Z\right) \tag{9}$$

where

$Y_{\theta}(X)$: the predicted class of X according to a classifier built from some random vector $\theta$.
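A minimal scikit-learn sketch of the wrapper-style rankings, using model-derived feature importances for DT and RF and coefficient magnitudes for LR; the hyper-parameters shown are illustrative, not the settings used in this study:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def rank_features(X, y, feature_names):
    """Return one feature ranking (best first) per wrapper method (DT, LR, RF)."""
    rankings = {}
    dt = DecisionTreeClassifier(random_state=0).fit(X, y)
    rankings["DT"] = np.argsort(dt.feature_importances_)[::-1]
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    rankings["RF"] = np.argsort(rf.feature_importances_)[::-1]
    lr = LogisticRegression(max_iter=1000).fit(X, y)
    rankings["LR"] = np.argsort(np.abs(lr.coef_[0]))[::-1]
    return {m: [feature_names[i] for i in idx] for m, idx in rankings.items()}
```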
3.3. Development of prediction models
Two classification techniques, namely the multi-layer perceptron (MLP) and the DNN, are used for constructing prediction models. These techniques are briefly described below.
- (1) MLP: An ANN is an abstract computational model of a human brain. The architecture of an artificial neural network is defined by the characteristics of a node and the characteristics of the node's connectivity in the network (Haykin and Lippmann, 1994). The perceptron is the simplest model in the ANN family. MLPs can learn powerful non-linear transformations: in fact, with enough hidden units they can represent arbitrarily complex but smooth functions. In a perceptron, each input node is connected via a weighted link to the output node. The output of a perceptron model can be expressed as follows (Tan et al., 2006):

  $$\hat{y}=sign\left(w_{1}x_{1}+w_{2}x_{2}+\cdots+w_{d}x_{d}\right) \tag{10}$$

  where $w_1, w_2, \ldots, w_d$ are the weights of the input links, $x_1, x_2, \ldots, x_d$ are the input attribute values, w is the weight vector, and x is the input vector. The sign function, which acts as an activation function for the output neuron, outputs +1 if its argument is positive and −1 if its argument is negative. An artificial neural network has a more complex structure than a perceptron model. The goal of the MLP learning algorithm is to determine a set of weights w that minimize the total sum of squared errors:

  $$E(\mathbf{w})=\frac{1}{2}\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2} \tag{11}$$

  where $y_{i}-\hat{y}_{i}$ is the prediction error. The weight update formula of the gradient descent method can be written as follows:

  $$w_{j}\leftarrow w_{j}-\lambda\frac{\partial E(\mathbf{w})}{\partial w_{j}} \tag{12}$$

  where $\lambda$ is the learning rate.
- (2) DNN: A DNN can be considered as a conventional MLP with many hidden layers (hence "deep"). The DNN parameters are optimized with back propagation using stochastic gradient descent. A DNN, i.e., an (L + 1)-layer MLP, is used to model the posterior probability of a hidden Markov model (HMM) tied state s given an observation vector o. The first L layers, $l = 0, \ldots, L-1$, are hidden layers that model the posterior probability of hidden nodes given the input vector from the previous layer, while the top layer L computes the posterior probability over all tied states using softmax (Pan et al., 2012):

  $$\mathbf{v}^{l}=\sigma\left(\mathbf{z}^{l}\right)=\sigma\left(\mathbf{W}^{l}\mathbf{v}^{l-1}+\mathbf{b}^{l}\right),\quad 0\leq l<L \tag{13}$$

  $$P(s\mid\mathbf{o})=softmax_{s}\left(\mathbf{z}^{L}\right)=\frac{\exp\left(z_{s}^{L}\right)}{\sum_{s'}\exp\left(z_{s'}^{L}\right)} \tag{14}$$

  where $\mathbf{W}^{l}$ and $\mathbf{b}^{l}$ denote the weight matrix and bias vector for hidden layer l, and $z_{j}^{l}$ and $v_{j}^{l}$ denote the jth component of the hidden node vector $\mathbf{z}^{l}$ and its activation $\mathbf{v}^{l}$, respectively, with the sigmoid activation

  $$\sigma(z)=\frac{1}{1+e^{-z}} \tag{15}$$

  A small forward-pass sketch of these equations is given after this list.
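The following NumPy sketch illustrates the forward pass of Eqs. (13)-(15); the layer sizes and random parameters are purely illustrative and are not the hyper-parameters used later in this study.

```python
import numpy as np

def sigmoid(z):                                    # Eq. (15)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):                                    # Eq. (14), top layer
    e = np.exp(z - z.max())
    return e / e.sum()

def dnn_forward(o, weights, biases):
    """Forward pass: hidden layers follow Eq. (13); the top layer applies softmax."""
    v = o
    for W, b in zip(weights[:-1], biases[:-1]):    # hidden layers l = 0 .. L-1
        v = sigmoid(W @ v + b)                     # v^l = sigma(W^l v^(l-1) + b^l)
    return softmax(weights[-1] @ v + biases[-1])   # posterior over tied states

# Illustrative random parameters for a 10-20-20-3 network.
rng = np.random.default_rng(0)
sizes = [10, 20, 20, 3]
weights = [rng.standard_normal((m, n)) * 0.1 for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
probs = dnn_forward(rng.standard_normal(10), weights, biases)
```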
3.4. Assessment metrics
There are six metrics, Precision, Recall, F1-score, Accuracy, FPR (false positive rate), and FNR (false negative rate), that are commonly used to evaluate the machine learning algorithms proposed in this study. These six metrics (Tan et al., 2006) are defined as follows.

$$Precision=\frac{TP\#}{TP\#+FP\#} \tag{16}$$

$$Recall=\frac{TP\#}{TP\#+FN\#} \tag{17}$$

$$F1\text{-}score=\frac{2\times Precision\times Recall}{Precision+Recall} \tag{18}$$

$$Accuracy=\frac{TP\#+TN\#}{TP\#+TN\#+FP\#+FN\#} \tag{19}$$

$$FPR=\frac{FP\#}{FP\#+TN\#} \tag{20}$$

$$FNR=\frac{FN\#}{FN\#+TP\#} \tag{21}$$

where TP# and TN# denote the numbers of instances correctly predicted as positive and negative, respectively, and FP# and FN# denote the numbers of instances incorrectly predicted as positive and negative, respectively.
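For reference, a short sketch that computes Eqs. (16)-(21) from the confusion-matrix counts, assuming scikit-learn and treating label 1 as the positive class (function and variable names are illustrative):

```python
from sklearn.metrics import confusion_matrix

def evaluate(y_true, y_pred):
    """Return the six metrics of Eqs. (16)-(21) as a dictionary."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "Precision": precision,
        "Recall": recall,
        "F1-score": 2 * precision * recall / (precision + recall),
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }
```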
4. Experimental results
The dataset used to evaluate the prediction models and the hyper-parameters of the DNN model are described in Section 4.1. In Section 4.2, we first compare the prediction performance of two methods, the neural network (ANN with MLP) and the DNN, using the COVID-19 dataset provided by Pourhomayoun and Shakibi (2021) to understand the performance difference between the two methods on several metrics. In Section 4.3, we investigate the impact of the important features on prediction performance when building DNN models. Two feature selection strategies (filter and wrapper) are used to choose important features for building prediction models. In Section 4.4, we divide the COVID-19 dataset into sub-datasets according to the country attribute and build country-based DNN prediction models. Finally, the experimental results are summarized in Section 4.5. The prediction models were implemented in Python and tested on a PC running Windows 10.
Table 3.
The performance difference of prediction models in metrics.
| Measures | DNN (This study) | ANN (Pourhomayoun and Shakibi, 2021) | ANN* (Pourhomayoun and Shakibi, 2021) |
|---|---|---|---|
| Recall | 98.62% | 94.20% | 95.49% |
| Precision | 86.20% | 86.86% | 85.57% |
| F1-score | 91.99% | 90.38% | 90.44% |
| Accuracy | 91.41% | 89.98% | 89.91% |
| FPR | 15.79% | 14.24% | 15.67% |
| FNR | 1.38% | 5.79% | 4.51% |
ANN*: We re-executed the Python code provided by Pourhomayoun and Shakibi (2021) on the same computer that was used to evaluate the performance of the DNN (this study).
4.1. The dataset description and hyper-parameters of DNN model
The original dataset consists of more than 2,670,000 laboratory-confirmed COVID-19 patients from 146 countries around the world, including 307,382 labeled samples containing both male and female patients with an average age of 44.75. At the data cleaning stage, Pourhomayoun and Shakibi (2021) removed useless and redundant data elements and the unlabeled data samples. After that, data imputation techniques, including mean/median/mode value replacement and the KNN technique, were used to handle missing values. Moreover, equal numbers of recovered and deceased patients were included by Pourhomayoun and Shakibi (2021) to make sure the dataset is balanced. Finally, 57 features were chosen out of 112 features, and the dataset contains 12,020 instances. Pourhomayoun and Shakibi (2021) have provided this processed dataset in the hope of benefiting the research community; we refer to it as the COVID-19 dataset in this study.
The hyper-parameters of the DNN model are specified as follows. First, we set the learning rate to 0.0005. Second, the network structure of the DNN is designed as follows: in the input layer, we use "ReLU" as the activation function and set the number of cells to 200. We then incorporate four hidden layers, each using "ReLU" as the activation function; hidden layer #1 contains 300 cells, hidden layer #2 contains 200 cells, hidden layer #3 contains 500 cells, and hidden layer #4 contains 250 cells. Finally, in the output layer, we set the activation function to "sigmoid" and the number of cells to 1 to obtain the desired model output.
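A minimal Keras sketch of this architecture is shown below. It assumes TensorFlow/Keras, the Adam optimizer, and binary cross-entropy loss; the optimizer, loss, and input dimension handling are not stated above and are assumptions for illustration only.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_dnn(n_features: int = 57) -> tf.keras.Model:
    """DNN with the hyper-parameters described above (optimizer/loss assumed)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        layers.Dense(200, activation="relu"),   # input layer, 200 cells
        layers.Dense(300, activation="relu"),   # hidden layer #1
        layers.Dense(200, activation="relu"),   # hidden layer #2
        layers.Dense(500, activation="relu"),   # hidden layer #3
        layers.Dense(250, activation="relu"),   # hidden layer #4
        layers.Dense(1, activation="sigmoid"),  # output layer, 1 cell
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```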
4.2. Performance of prediction models (ANN and DNN)
We first investigate the performance of the two methods (ANN and DNN) on the following metrics, Precision, Recall, F1-score, Accuracy, FPR, and FNR, to understand the difference in prediction performance on the COVID-19 dataset provided by Pourhomayoun and Shakibi (2021). Table 3 reports the prediction performance metrics of the two methods on the COVID-19 dataset. Compared to the prediction performance reported by Pourhomayoun and Shakibi (2021), DNN outperforms ANN (MLP) on Recall, F1-score, Accuracy, and FNR, whereas ANN (MLP) outperforms DNN only on Precision and FPR. In addition, we re-executed the Python code provided by Pourhomayoun and Shakibi (2021) on the same computer used to evaluate the DNN, and the DNN again outperforms ANN (MLP) on Recall, F1-score, Accuracy, and FNR.
Second, we investigate the performance of the two methods (ANN and DNN) in terms of the area under the receiver operating characteristic curve (AUC of ROC) on the COVID-19 dataset. Table 4 reports the AUC of ROC of the two methods. Compared to the value reported by Pourhomayoun and Shakibi (2021), ANN (MLP) with an AUC of ROC of 92.76% outperforms DNN with an AUC of ROC of 91.41%. However, when we re-executed the Python code provided by Pourhomayoun and Shakibi (2021) on the same computer used to evaluate the DNN, DNN with an AUC of ROC of 91.41% outperforms ANN (MLP) with an AUC of ROC of 89.91%.
Table 4.
The performance difference of prediction models in the area of ROC curve.
| DNN (This study) | ANN (Pourhomayoun and Shakibi, 2021) | ANN* (Pourhomayoun and Shakibi, 2021) |
|---|---|---|
| 91.41% | 92.76% | 89.91% |
ANN*: We re-executed the Python code provided by Pourhomayoun and Shakibi (2021) on the same computer that was used to evaluate the performance of the DNN.
Finally, we investigate the performance of the two methods (ANN and DNN) in terms of the area under the precision–recall curve (AUC of PRC) on the COVID-19 dataset. Table 5 reports the AUC of PRC of the two methods. Compared to the value reported by Pourhomayoun and Shakibi (2021), DNN with an AUC of PRC of 92.75% outperforms ANN (MLP) with an AUC of PRC of 91.99%. When we re-executed the Python code provided by Pourhomayoun and Shakibi (2021) on the same computer used to evaluate the DNN, DNN with an AUC of PRC of 92.75% still outperforms ANN (MLP) with an AUC of PRC of 91.82%. Therefore, DNN outperforms ANN (MLP) in terms of AUC of PRC.
Table 5.
The performance difference of prediction models in the area of PRC curve.
| DNN (This study) | ANN (Pourhomayoun and Shakibi, 2021) | ANN* (Pourhomayoun and Shakibi, 2021) |
|---|---|---|
| 92.75% | 91.99% | 91.82% |
ANN*: We re-executed the Python code provided by Pourhomayoun and Shakibi (2021) on the same computer that was used to evaluate the performance of the DNN.
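For completeness, a hedged sketch of how AUC of ROC and AUC of PRC can be computed from predicted probabilities with scikit-learn; this is illustrative and not the authors' evaluation code, and the variable names are assumptions.

```python
from sklearn.metrics import auc, precision_recall_curve, roc_auc_score

def auc_scores(y_true, y_prob):
    """AUC of the ROC curve and AUC of the precision-recall curve."""
    roc_auc = roc_auc_score(y_true, y_prob)
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    prc_auc = auc(recall, precision)
    return roc_auc, prc_auc
```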
4.3. The impact of the important features
4.3.1. Feature filter methods (χ², Pearson correlation, and information gain)
We compare the prediction performance of DNN models built with different features recommended by the feature filter methods (χ², Pearson correlation, and information gain) using the criteria Precision, Recall, F1-score, Accuracy, ROC, PRC, FPR, and FNR.
First, we determine the level of independence between each feature and the class label using the χ² filter method. The Chi-square values of all 57 features are shown in Table 6. After that, we choose the Top N features, according to the Chi-square values, to build DNN prediction models. From the results in Table 7, we know that the DNN model built with the Top 25 features performs very well. That is, we could use the Top 25 features to build a prediction model with prediction performance comparable to the model built with all 57 features. A sketch of this Top-N procedure is given below.
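The following sketch outlines the Top-N procedure used throughout Section 4.3. It assumes a ranking list `ranked_features` (e.g., ordered by the χ² values in Table 6), pandas DataFrames for the train/test split, and the `build_dnn` and `evaluate` helpers sketched in Sections 4.1 and 3.4; the epoch count, batch size, and 0.5 decision threshold are assumptions, not reported settings.

```python
def top_n_experiment(X_train, y_train, X_test, y_test, ranked_features):
    """Train a DNN on the Top-N ranked features and collect the six metrics."""
    results = {}
    for n in (5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 57):
        cols = ranked_features[:n]
        model = build_dnn(n_features=len(cols))           # sketched in Section 4.1
        model.fit(X_train[cols], y_train, epochs=50, batch_size=64, verbose=0)
        y_pred = (model.predict(X_test[cols]) > 0.5).astype(int).ravel()
        results[n] = evaluate(y_test, y_pred)             # Eqs. (16)-(21)
    return results
```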
Table 6.
The Chi-square values of all features.
| No | Feature | chi | No | Feature | chi |
|---|---|---|---|---|---|
| 1 | city | 0.000000 | 30 | chronic_disease_HIV | 0.157299 |
| 2 | province | 0.000000 | 31 | chronic_disease_Parkinson | 0.157299 |
| 3 | country | 0.000000 | 32 | anorexia | 0.157299 |
| 4 | age | 0.000000 | 33 | expectoration | 0.157299 |
| 5 | travel_history_location | 0.000000 | 34 | lesions on chest radiographs | 0.157299 |
| 6 | chronic_disease_binary | 0.000000 | 35 | hypertension | 0.157299 |
| 7 | chronic_disease_Hypertension | 0.000000 | 36 | cardiac disease | 0.157299 |
| 8 | sex | 0.000000 | 37 | hypoxia | 0.157299 |
| 9 | pneumonia | 0.000000 | 38 | chronic_disease_prostate | 0.179712 |
| 10 | respiratory distress | 0.000000 | 39 | chronic_disease_TB | 0.317311 |
| 11 | chronic_disease_Diabetes | 0.000000 | 40 | chronic_disease_cereberal | 0.317311 |
| 12 | septic shock | 0.000013 | 41 | conjunctivitis | 0.317311 |
| 13 | chronic_disease_kidney | 0.000162 | 42 | dizziness | 0.317311 |
| 14 | Heart attack | 0.000311 | 43 | emesis | 0.317311 |
| 15 | rhinorrhea | 0.001565 | 44 | eye irritation | 0.317311 |
| 16 | sore throat | 0.004509 | 45 | obnubilation | 0.317311 |
| 17 | kidney failure | 0.004678 | 46 | myelofibrosis | 0.317311 |
| 18 | chronic_disease_heart | 0.008151 | 47 | somnolence | 0.317311 |
| 19 | chronic_disease_cardiac | 0.014306 | 48 | cough | 0.324756 |
| 20 | dyspnea | 0.014306 | 49 | Myalgia | 0.479500 |
| 21 | gasp | 0.014306 | 50 | chronic_disease_hypothyroidism | 0.563703 |
| 22 | headache | 0.019631 | 51 | diarrhea | 0.563703 |
| 23 | chronic_disease_COPD | 0.025347 | 52 | sputum | 0.563703 |
| 24 | fever | 0.048815 | 53 | cold | 0.563703 |
| 25 | chronic_disease_asthma | 0.058782 | 54 | shortness of breath | 0.654721 |
| 26 | chest pain | 0.058782 | 55 | chronic_disease_cancer | 1.000000 |
| 27 | chronic_disease_bronchitis | 0.083265 | 56 | chronic_disease_dyslipidemia | 1.000000 |
| 28 | chills | 0.102470 | 57 | fatigue | 1.000000 |
| 29 | chronic_disease_Hepatitis | 0.157299 |
Table 7.
The prediction performance of Top N features by the χ² method.
| Top N | Precision | Recall | F1-score | Accuracy | ROC | PRC | FPR | FNR |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.8636 | 0.9745 | 0.9157 | 0.9103 | 0.9103 | 0.9254 | 0.1539 | 0.0255 |
| 10 | 0.8601 | 0.9699 | 0.9117 | 0.9061 | 0.9061 | 0.9225 | 0.1577 | 0.0301 |
| 15 | 0.8665 | 0.9647 | 0.9130 | 0.9081 | 0.9081 | 0.9244 | 0.1486 | 0.0353 |
| 20 | 0.8627 | 0.9814 | 0.9182 | 0.9126 | 0.9126 | 0.9267 | 0.1562 | 0.0186 |
| 25 | 0.8608 | 0.9815 | 0.9172 | 0.9114 | 0.9114 | 0.9258 | 0.1587 | 0.0185 |
| 30 | 0.8629 | 0.9732 | 0.9148 | 0.9093 | 0.9093 | 0.9248 | 0.1546 | 0.0268 |
| 35 | 0.8633 | 0.9784 | 0.9172 | 0.9117 | 0.9117 | 0.9262 | 0.1549 | 0.0216 |
| 40 | 0.8632 | 0.9872 | 0.9211 | 0.9154 | 0.9154 | 0.9284 | 0.1564 | 0.0128 |
| 45 | 0.8663 | 0.9689 | 0.9147 | 0.9097 | 0.9097 | 0.9254 | 0.1496 | 0.0311 |
| 50 | 0.8636 | 0.9775 | 0.9170 | 0.9116 | 0.9116 | 0.9262 | 0.1544 | 0.0225 |
| 54 | 0.8674 | 0.9607 | 0.9117 | 0.9069 | 0.9069 | 0.9239 | 0.1469 | 0.0393 |
| 57 | 0.8620 | 0.9862 | 0.9199 | 0.9141 | 0.9141 | 0.9275 | 0.1579 | 0.0138 |
Second, we calculate the Pearson correlation coefficient values of all 57 features using the Pearson correlation filter method, as shown in Table 8. After that, we choose the Top N features, according to the Pearson correlation coefficient values, to build DNN prediction models. From the results in Table 9, we know that only the DNN model built with the Top 55 features filtered by the Pearson method performs well. That is, we would need the Top 55 features to build a prediction model with prediction performance comparable to the model built with all 57 features.
Table 8.
The Pearson correlation coefficient values of all features.
| No | Feature | Pearson | No | Feature | Pearson |
|---|---|---|---|---|---|
| 1 | country | 0.502119 | 30 | chronic_disease_hypothyroidism | 0.0053 |
| 2 | age | 0.126401 | 31 | diarrhea | 0.0053 |
| 3 | sex | 0.114197 | 32 | cold | 0.0053 |
| 4 | chronic_disease_binary | 0.089623 | 33 | fatigue | 0.0000 |
| 5 | chronic_disease_Hypertension | 0.077358 | 34 | chronic_disease_dyslipidemia | −0.0000 |
| 6 | chronic_disease_Diabetes | 0.059286 | 35 | chronic_disease_cancer | −0.0000 |
| 7 | chronic_disease_kidney | 0.034424 | 36 | shortness of breath | −0.0041 |
| 8 | rhinorrhea | 0.028855 | 37 | dizziness | −0.0091 |
| 9 | sore throat | 0.025922 | 38 | emesis | −0.0091 |
| 10 | chronic_disease_heart | 0.024139 | 39 | obnubilation | −0.0091 |
| 11 | chronic_disease_cardiac | 0.022348 | 40 | myelofibrosis | −0.0091 |
| 12 | headache | 0.021291 | 41 | somnolence | −0.0091 |
| 13 | chronic_disease_COPD | 0.020400 | 42 | anorexia | −0.0129 |
| 14 | fever | 0.018040 | 43 | expectoration | −0.0129 |
| 15 | chronic_disease_asthma | 0.017242 | 44 | hypertension | −0.0129 |
| 16 | chronic_disease_bronchitis | 0.015800 | 45 | cardiac disease | −0.0129 |
| 17 | chills | 0.014898 | 46 | hypoxia | −0.0129 |
| 18 | chronic_disease_Hepatitis | 0.012900 | 47 | chest pain | −0.0172 |
| 19 | chronic_disease_HIV | 0.012900 | 48 | dyspnea | −0.0223 |
| 20 | chronic_disease_Parkinson | 0.012900 | 49 | gasp | −0.0223 |
| 21 | lesions on chest radiographs | 0.012900 | 50 | kidney failure | −0.0258 |
| 22 | chronic_disease_prostate | 0.012240 | 51 | Heart attack | −0.0329 |
| 23 | chronic_disease_cereberal | 0.009123 | 52 | septic shock | −0.0398 |
| 24 | chronic_disease_TB | 0.009121 | 53 | city | −0.0456 |
| 25 | conjunctivitis | 0.009121 | 54 | respiratory distress | −0.0650 |
| 26 | eye irritation | 0.009121 | 55 | pneumonia | −0.0716 |
| 27 | cough | 0.009007 | 56 | province | −0.1025 |
| 28 | Myalgia | 0.006452 | 57 | travel_history_location | −0.1265 |
| 29 | sputum | 0.005267 |
Table 9.
The prediction performance of Top N features by Pearson method.
| Top N | Precision | Recall | F1-score | Accuracy | ROC | PRC | FPR | FNR |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.9370 | 0.7975 | 0.8617 | 0.8720 | 0.8720 | 0.9179 | 0.0536 | 0.2025 |
| 10 | 0.8888 | 0.8085 | 0.8467 | 0.8537 | 0.8537 | 0.8965 | 0.1012 | 0.1915 |
| 15 | 0.9306 | 0.8098 | 0.8660 | 0.8747 | 0.8747 | 0.9178 | 0.0604 | 0.1902 |
| 20 | 0.9303 | 0.8090 | 0.8654 | 0.8742 | 0.8742 | 0.9174 | 0.0606 | 0.1910 |
| 25 | 0.9307 | 0.8108 | 0.8666 | 0.8752 | 0.8752 | 0.9180 | 0.0604 | 0.1892 |
| 30 | 0.9299 | 0.8116 | 0.8667 | 0.8752 | 0.8752 | 0.9178 | 0.0612 | 0.1884 |
| 35 | 0.9292 | 0.8125 | 0.8669 | 0.8753 | 0.8753 | 0.9177 | 0.0619 | 0.1875 |
| 40 | 0.9292 | 0.8103 | 0.8657 | 0.8743 | 0.8743 | 0.9172 | 0.0617 | 0.1897 |
| 45 | 0.9296 | 0.8105 | 0.8660 | 0.8745 | 0.8745 | 0.9174 | 0.0614 | 0.1895 |
| 50 | 0.9290 | 0.8118 | 0.8665 | 0.8749 | 0.8749 | 0.9174 | 0.0621 | 0.1882 |
| 55 | 0.8661 | 0.9742 | 0.9170 | 0.9118 | 0.9118 | 0.9266 | 0.1506 | 0.0258 |
| 57 | 0.8620 | 0.9862 | 0.9199 | 0.9141 | 0.9141 | 0.9275 | 0.1579 | 0.0138 |
Finally, the information gain values of all 57 features calculated by the information gain filter method are shown in Table 10. After that, we choose the Top N features, according to the information gain values, to build DNN prediction models. From the results in Table 11, we know that the DNN model built with the Top 5 features performs very well. That is, we can use only the Top 5 features to build a prediction model with high prediction performance, performing as well as the model built with all 57 features, while saving redundant computation cost.
Table 10.
The information gain (info) values of all features.
| No | Feature | Info | No | Feature | Info |
|---|---|---|---|---|---|
| 1 | city | 0.411180 | 30 | cardiac disease | 0.0001 |
| 2 | province | 0.409138 | 31 | chronic_disease_Hepatitis | 0.0001 |
| 3 | age | 0.326288 | 32 | chronic_disease_HIV | 0.0001 |
| 4 | country | 0.280728 | 33 | chronic_disease_Parkinson | 0.0001 |
| 5 | travel_history_location | 0.017563 | 34 | expectoration | 0.0001 |
| 6 | sex | 0.006536 | 35 | hypertension | 0.0001 |
| 7 | chronic_disease_binary | 0.004766 | 36 | hypoxia | 0.0001 |
| 8 | chronic_disease_Hypertension | 0.003733 | 37 | lesions on chest radiographs | 0.0001 |
| 9 | pneumonia | 0.003241 | 38 | chronic_disease_prostate | 0.0001 |
| 10 | respiratory distress | 0.002587 | 39 | chronic_disease_TB | 0.0001 |
| 11 | chronic_disease_Diabetes | 0.002259 | 40 | conjunctivitis | 0.0001 |
| 12 | septic shock | 0.001097 | 41 | dizziness | 0.0001 |
| 13 | Heart attack | 0.000750 | 42 | emesis | 0.0001 |
| 14 | chronic_disease_kidney | 0.000718 | 43 | eye irritation | 0.0001 |
| 15 | rhinorrhea | 0.000577 | 44 | myelofibrosis | 0.0001 |
| 16 | kidney failure | 0.000462 | 45 | obnubilation | 0.0001 |
| 17 | chronic_disease_heart | 0.000404 | 46 | somnolence | 0.0001 |
| 18 | sore throat | 0.000375 | 47 | chronic_disease_cereberal | 0.0000 |
| 19 | chronic_disease_cardiac | 0.000346 | 48 | cough | 0.0000 |
| 20 | dyspnea | 0.000346 | 49 | Myalgia | 0.0000 |
| 21 | gasp | 0.000346 | 50 | chronic_disease_hypothyroidism | 0.0000 |
| 22 | chronic_disease_COPD | 0.000288 | 51 | cold | 0.0000 |
| 23 | headache | 0.000258 | 52 | diarrhea | 0.0000 |
| 24 | chronic_disease_bronchitis | 0.000173 | 53 | sputum | 0.0000 |
| 25 | chest pain | 0.000165 | 54 | shortness of breath | 0.0000 |
| 26 | chronic_disease_asthma | 0.000165 | 55 | chronic_disease_cancer | 0.0000 |
| 27 | fever | 0.000164 | 56 | chronic_disease_dyslipidemia | 0.0000 |
| 28 | chills | 0.000121 | 57 | fatigue | 0.0000 |
| 29 | anorexia | 0.000115 |
Table 11.
The prediction performance of Top N features by information gain method.
| Top N | Precision | Recall | F1-score | Accuracy | ROC | PRC | FPR | FNR |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.8586 | 0.9822 | 0.9163 | 0.9102 | 0.9102 | 0.9249 | 0.1617 | 0.0178 |
| 10 | 0.8495 | 0.9852 | 0.9123 | 0.9053 | 0.9053 | 0.9210 | 0.1745 | 0.0148 |
| 15 | 0.8506 | 0.9855 | 0.9131 | 0.9062 | 0.9062 | 0.9217 | 0.1730 | 0.0145 |
| 20 | 0.8525 | 0.9875 | 0.9150 | 0.9083 | 0.9083 | 0.9231 | 0.1709 | 0.0125 |
| 25 | 0.8545 | 0.9814 | 0.9136 | 0.9072 | 0.9072 | 0.9226 | 0.1671 | 0.0186 |
| 30 | 0.8512 | 0.9920 | 0.9162 | 0.9093 | 0.9093 | 0.9236 | 0.1734 | 0.0080 |
| 35 | 0.8516 | 0.9920 | 0.9165 | 0.9096 | 0.9096 | 0.9238 | 0.1729 | 0.0080 |
| 40 | 0.8506 | 0.9925 | 0.9161 | 0.9091 | 0.9091 | 0.9234 | 0.1744 | 0.0075 |
| 45 | 0.8520 | 0.9925 | 0.9169 | 0.9101 | 0.9101 | 0.9241 | 0.1724 | 0.0075 |
| 50 | 0.8514 | 0.9940 | 0.9172 | 0.9102 | 0.9102 | 0.9242 | 0.1735 | 0.0060 |
| 54 | 0.8506 | 0.9935 | 0.9165 | 0.9095 | 0.9095 | 0.9237 | 0.1745 | 0.0065 |
| 57 | 0.8620 | 0.9862 | 0.9199 | 0.9141 | 0.9141 | 0.9275 | 0.1579 | 0.0138 |
From the above discussion, we find that the information gain filter method performs best. Therefore, we consider the information gain filter method a better way to select features for building DNN prediction models.
4.3.2. Feature wrapper methods (DT, LR, and RF)
We compare the prediction performance of DNN models built with different features ranked by the feature wrapper methods (DT, LR, and RF) using the criteria Precision, Recall, F1-score, Accuracy, ROC, PRC, FPR, and FNR.
First, we rank the 57 features using the DT wrapper method. Then, we build DNN models with the Top N features according to their rankings. The DNN prediction performance with the Top N features selected by the DT wrapper method is shown in Table 12. From the results in Table 12, we know that the Top 5 features selected by the DT wrapper method perform well. That is, we could use the Top 5 features to build a DNN prediction model with prediction performance comparable to the model built with all 57 features.
Table 12.
The prediction performance of Top N features by wrapper method (DT).
| Top N | Precision | Recall | F1-score | Accuracy | ROC | PRC | FPR | FNR |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.8611 | 0.9819 | 0.9175 | 0.9117 | 0.9117 | 0.9260 | 0.1584 | 0.0181 |
| 10 | 0.8644 | 0.9755 | 0.9166 | 0.9112 | 0.9112 | 0.9261 | 0.1531 | 0.0245 |
| 15 | 0.8630 | 0.9854 | 0.9201 | 0.9145 | 0.9145 | 0.9278 | 0.1564 | 0.0146 |
| 20 | 0.8603 | 0.9870 | 0.9193 | 0.9134 | 0.9134 | 0.9269 | 0.1602 | 0.0130 |
| 25 | 0.8643 | 0.9789 | 0.9180 | 0.9126 | 0.9126 | 0.9268 | 0.1537 | 0.0211 |
| 30 | 0.8599 | 0.9879 | 0.9195 | 0.9135 | 0.9135 | 0.9269 | 0.1609 | 0.0121 |
| 35 | 0.8620 | 0.9875 | 0.9205 | 0.9147 | 0.9147 | 0.9279 | 0.1581 | 0.0125 |
| 40 | 0.8614 | 0.9832 | 0.9183 | 0.9125 | 0.9125 | 0.9265 | 0.1582 | 0.0168 |
| 45 | 0.8644 | 0.9789 | 0.9181 | 0.9126 | 0.9126 | 0.9269 | 0.1536 | 0.0211 |
| 50 | 0.8761 | 0.9516 | 0.9123 | 0.9085 | 0.9085 | 0.9259 | 0.1346 | 0.0484 |
Second, we rank the 57 features using the LR wrapper method. Then, we build DNN models with the Top N features according to their rankings. The DNN prediction performance with the Top N features selected by the LR wrapper method is shown in Table 13. From the results in Table 13, we know that the Top 50 features selected by the LR wrapper method perform well. That is, we could use the Top 50 features to build a DNN prediction model with prediction performance comparable to the model built with all 57 features.
Table 13.
The prediction performance of Top N features by wrapper method (LR).
| Top N | Precision | Recall | F1-score | Accuracy | ROC | PRC | FPR | FNR |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.4962 | 0.4013 | 0.4437 | 0.4969 | 0.4969 | 0.5984 | 0.4075 | 0.5987 |
| 10 | 0.5017 | 0.5050 | 0.5033 | 0.5017 | 0.5017 | 0.6271 | 0.5017 | 0.4950 |
| 15 | 0.5037 | 0.4065 | 0.4499 | 0.5030 | 0.5030 | 0.6035 | 0.4005 | 0.5935 |
| 20 | 0.4965 | 0.5953 | 0.5415 | 0.4958 | 0.4958 | 0.6471 | 0.6037 | 0.4047 |
| 25 | 0.5577 | 0.6276 | 0.5906 | 0.5649 | 0.5649 | 0.6857 | 0.4978 | 0.3724 |
| 30 | 0.5566 | 0.6303 | 0.5911 | 0.5641 | 0.5641 | 0.6859 | 0.5022 | 0.3697 |
| 35 | 0.5573 | 0.6303 | 0.5916 | 0.5648 | 0.5648 | 0.6862 | 0.5007 | 0.3697 |
| 40 | 0.9303 | 0.7413 | 0.8251 | 0.8428 | 0.8428 | 0.9004 | 0.0556 | 0.2587 |
| 45 | 0.9286 | 0.7556 | 0.8332 | 0.8488 | 0.8488 | 0.9032 | 0.0581 | 0.2444 |
| 50 | 0.9305 | 0.8128 | 0.8677 | 0.8760 | 0.8760 | 0.9184 | 0.0607 | 0.1872 |
Finally, we rank the 57 features using the RF wrapper method. Then, we build DNN models with the Top N features according to their rankings. The DNN prediction performance with the Top N features selected by the RF wrapper method is shown in Table 14. From the results in Table 14, we know that the Top 10 features selected by the RF wrapper method perform well. That is, we could use the Top 10 features to build a DNN prediction model with prediction performance comparable to the model built with all 57 features.
Table 14.
The prediction performance of Top N features by wrapper method (RF).
| Top N | Precision | Recall | F1-score | Accuracy | ROC | PRC | FPR | FNR |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.8694 | 0.9634 | 0.9140 | 0.9093 | 0.9093 | 0.9255 | 0.1448 | 0.0366 |
| 10 | 0.8629 | 0.9809 | 0.9181 | 0.9125 | 0.9125 | 0.9266 | 0.1559 | 0.0191 |
| 15 | 0.8695 | 0.9679 | 0.9161 | 0.9113 | 0.9113 | 0.9267 | 0.1453 | 0.0321 |
| 20 | 0.8722 | 0.9686 | 0.9178 | 0.9133 | 0.9133 | 0.9282 | 0.1419 | 0.0314 |
| 25 | 0.8590 | 0.9832 | 0.9169 | 0.9109 | 0.9109 | 0.9253 | 0.1614 | 0.0168 |
| 30 | 0.8661 | 0.9772 | 0.9183 | 0.9131 | 0.9131 | 0.9273 | 0.1511 | 0.0228 |
| 35 | 0.8632 | 0.9792 | 0.9175 | 0.9120 | 0.9120 | 0.9264 | 0.1552 | 0.0208 |
| 40 | 0.8654 | 0.9699 | 0.9146 | 0.9095 | 0.9095 | 0.9251 | 0.1509 | 0.0301 |
| 45 | 0.8607 | 0.9857 | 0.9189 | 0.9131 | 0.9131 | 0.9268 | 0.1596 | 0.0143 |
| 50 | 0.8611 | 0.9725 | 0.9134 | 0.9078 | 0.9078 | 0.9237 | 0.1569 | 0.0275 |
From the above discussion, we find that the DT wrapper method performs best among the wrapper methods. Therefore, we consider the DT wrapper method a better way to select features for building DNN prediction models.
4.4. Performance of prediction models (ANN and DNN) in different countries
There are 12,020 instances in the COVID-19 dataset. We first cluster the instances according to the country attribute and then select the top 5 countries with more than 100 instances: China (139), Ethiopia (113), India (7309), Philippines (4058), and Singapore (111). We compare the prediction performance of ANN/DNN models built with the instances of each country using the criteria Precision, Recall, F1-score, Accuracy, ROC, PRC, FPR, and FNR.
From the results in Table 15, we know that the DNN model outperforms ANN (MLP) on most prediction performance metrics (e.g., Precision, F1-score, and Accuracy) for China (139), India (7309), and the Philippines (4058), and performs as well as the ANN model for the other two countries, Ethiopia (113) and Singapore (111). That is, we consider DNN a good method for building COVID-19 prediction models for predicting mortality risk in patients.
Table 15.
The prediction performance of ANN and DNN (Country-based instances).
| Country | Method | Precision | Recall | F1-score | Accuracy | ROC | PRC | FPR | FNR |
|---|---|---|---|---|---|---|---|---|---|
| China (139) | DNN | 0.9084 | 0.9835 | 0.9444 | 0.8995 | 0.6584 | 0.9531 | 0.6667 | 0.0165 |
| China (139) | ANN | 0.8705 | 1.0000 | 0.9308 | 0.8702 | 0.5000 | 0.9353 | 1.0000 | 0.0000 |
| Ethiopia (113) | DNN | 0.9646 | 1.0000 | 0.9820 | 0.9649 | 0.5000 | 0.9823 | 1.0000 | 0.0000 |
| Ethiopia (113) | ANN | 0.9646 | 1.0000 | 0.9820 | 0.9649 | 0.5000 | 0.9823 | 1.0000 | 0.0000 |
| India (7309) | DNN | 0.7021 | 0.9739 | 0.8159 | 0.9033 | 0.9286 | 0.8409 | 0.1167 | 0.0261 |
| India (7309) | ANN | 0.6983 | 0.8285 | 0.7578 | 0.8834 | 0.8637 | 0.7822 | 0.1011 | 0.1715 |
| Philippines (4058) | DNN | 0.9423 | 0.9932 | 0.9671 | 0.9367 | 0.5506 | 0.9709 | 0.8919 | 0.0068 |
| Philippines (4058) | ANN | 0.9362 | 1.0000 | 0.9670 | 0.9362 | 0.5000 | 0.9681 | 1.0000 | 0.0000 |
| Singapore (111) | DNN | 0.9640 | 1.0000 | 0.9817 | 0.9640 | 0.5000 | 0.9820 | 1.0000 | 0.0000 |
| Singapore (111) | ANN | 0.9640 | 1.0000 | 0.9817 | 0.9640 | 0.5000 | 0.9820 | 1.0000 | 0.0000 |
Furthermore, the information gain values of all 57 features are calculated by the information gain filter method. After that, we choose the Top N features, according to the information gain values, to build prediction models (ANN and DNN) for two countries (India and the Philippines). We select the top 37 features from the India dataset and the top 27 features from the Philippines dataset to investigate the prediction performance.
From the results in Table 16, Table 17, we know that the models (ANN and DNN) built with the Top 10 features of the India dataset perform very well. That is, we could use the Top 10 features to build a prediction model with prediction performance comparable to the model built with all features. Besides, from the results in Table 18, Table 19, we know that the models (ANN and DNN) built with the Top 5 features of the Philippines dataset perform very well. That is, we could use the Top 5 features to build a prediction model with high prediction performance.
Table 16.
The prediction performance of ANN with Top N features (India).
| Top N | Precision | Recall | F1-score | Accuracy | ROC | PRC | FPR | FNR |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.6999 | 0.7582 | 0.7279 | 0.8752 | 0.8332 | 0.7557 | 0.0918 | 0.2418 |
| 10 | 0.6799 | 0.8173 | 0.7423 | 0.8751 | 0.8543 | 0.7687 | 0.1086 | 0.1827 |
| 15 | 0.6995 | 0.7551 | 0.7262 | 0.8747 | 0.8318 | 0.7543 | 0.0916 | 0.2449 |
| 20 | 0.6838 | 0.8291 | 0.7494 | 0.8780 | 0.8604 | 0.7752 | 0.1082 | 0.1709 |
| 25 | 0.6806 | 0.8198 | 0.7437 | 0.8756 | 0.8556 | 0.7700 | 0.1086 | 0.1802 |
| 30 | 0.6951 | 0.7837 | 0.7368 | 0.8767 | 0.8433 | 0.7632 | 0.0970 | 0.2163 |
| 35 | 0.6949 | 0.8707 | 0.7730 | 0.8874 | 0.8814 | 0.7971 | 0.1079 | 0.1293 |
| 37 | 0.6900 | 0.8745 | 0.7714 | 0.8859 | 0.8818 | 0.7961 | 0.1109 | 0.1255 |
Table 17.
The prediction performance of DNN with Top N features (India).
| Top N | Precision | Recall | F1-score | Accuracy | ROC | PRC | FPR | FNR |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.6940 | 0.9316 | 0.7954 | 0.8945 | 0.9078 | 0.8203 | 0.1160 | 0.0684 |
| 10 | 0.6949 | 0.9739 | 0.8111 | 0.9001 | 0.9266 | 0.8373 | 0.1207 | 0.0261 |
| 15 | 0.6871 | 0.9633 | 0.8021 | 0.8953 | 0.9197 | 0.8292 | 0.1239 | 0.0367 |
| 20 | 0.6957 | 0.9764 | 0.8125 | 0.9008 | 0.9279 | 0.8387 | 0.1205 | 0.0236 |
| 25 | 0.6950 | 0.9671 | 0.8087 | 0.8993 | 0.9236 | 0.8346 | 0.1198 | 0.0329 |
| 30 | 0.6944 | 0.9789 | 0.8125 | 0.9005 | 0.9286 | 0.8390 | 0.1216 | 0.0211 |
| 35 | 0.6918 | 0.9739 | 0.8090 | 0.8988 | 0.9257 | 0.8357 | 0.1225 | 0.0261 |
| 37 | 0.6869 | 0.9764 | 0.8065 | 0.8968 | 0.9254 | 0.8343 | 0.1256 | 0.0236 |
Table 18.
The prediction performance of ANN with Top N features (Philippines).
| Top N | Precision | Recall | F1-score | Accuracy | ROC | PRC | FPR | FNR |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.9362 | 1.0000 | 0.9670 | 0.9362 | 0.5000 | 0.9681 | 1.0000 | 0.0000 |
| 10 | 0.9362 | 1.0000 | 0.9670 | 0.9362 | 0.5000 | 0.9681 | 1.0000 | 0.0000 |
| 15 | 0.9362 | 1.0000 | 0.9670 | 0.9362 | 0.5000 | 0.9681 | 1.0000 | 0.0000 |
| 20 | 0.9362 | 1.0000 | 0.9670 | 0.9362 | 0.5000 | 0.9681 | 1.0000 | 0.0000 |
| 25 | 0.9362 | 1.0000 | 0.9670 | 0.9362 | 0.5000 | 0.9681 | 1.0000 | 0.0000 |
| 27 | 0.9362 | 1.0000 | 0.9670 | 0.9362 | 0.5000 | 0.9681 | 1.0000 | 0.0000 |
Table 19.
The prediction performance of DNN with Top N features (Philippines).
| Top N | Precision | Recall | F1-score | Accuracy | ROC | PRC | FPR | FNR |
|---|---|---|---|---|---|---|---|---|
| 5 | 0.9394 | 0.9958 | 0.9668 | 0.9359 | 0.5269 | 0.9696 | 0.9421 | 0.0042 |
| 10 | 0.9409 | 0.9926 | 0.9661 | 0.9347 | 0.5388 | 0.9702 | 0.9151 | 0.0074 |
| 15 | 0.9398 | 0.9942 | 0.9662 | 0.9349 | 0.5299 | 0.9697 | 0.9344 | 0.0058 |
| 20 | 0.9397 | 0.9929 | 0.9656 | 0.9337 | 0.5293 | 0.9696 | 0.9344 | 0.0071 |
| 25 | 0.9393 | 0.9939 | 0.9659 | 0.9342 | 0.5259 | 0.9695 | 0.9421 | 0.0061 |
| 27 | 0.9399 | 0.9955 | 0.9669 | 0.9362 | 0.5306 | 0.9698 | 0.9344 | 0.0045 |
From the above experimental results, we find that the proposed approach, which integrates the deep learning method (DNN) with hybrid methods (feature selection and instance clustering), performs very well in predicting mortality risk in patients with COVID-19 using fewer features.
4.5. Summary of experimental results
To investigate the difference in prediction performance, we first build a DNN model on the COVID-19 dataset provided by Pourhomayoun and Shakibi (2021). We find that the built DNN model outperforms the ANN model of Pourhomayoun and Shakibi (2021) on the criteria Recall, F1-score, Accuracy, ROC, and PRC.
In addition, we investigate the impact of the important features using three feature filter methods (χ², Pearson correlation, and information gain). From the experimental results, we find that the information gain filter method performs best; its top 5 features are sufficient to build a DNN prediction model that performs as well as the original DNN model built with all 57 features. Therefore, we consider the information gain filter method a better way to select features for building DNN prediction models.
Furthermore, we investigate the impact of the important features using three feature wrapper methods (DT, LR, and RF). From the experimental results, we find that the DT wrapper method performs best among the three wrapper methods. That is, the DT wrapper method selects the top 10 important features to build a DNN prediction model that performs as well as the original DNN model built with all 57 features.
Finally, we study the difference in prediction performance between ANN (MLP) and DNN across different countries: China (139), Ethiopia (113), India (7309), Philippines (4058), and Singapore (111). From the experimental results, we find that the DNN model outperforms ANN (MLP) on most prediction performance metrics (e.g., Precision, F1-score, and Accuracy) for China (139), India (7309), and the Philippines (4058). That is, the DNN model seems to be a better method for predicting mortality risk in patients with COVID-19. Moreover, the proposed hybrid approach (instance clustering, feature selection, and deep learning) performs very well in predicting mortality risk in patients with COVID-19 using fewer features.
5. Conclusion
In this study, we integrate deep learning with hybrid approaches (instance clustering and feature selection) to build prediction models for predicting mortality risk in patients with COVID-19.
The experimental results showed that the proposed feature-based DNN model, with Recall (98.62%), F1-score (91.99%), Accuracy (91.41%), and FNR (1.38%), outperforms the original prediction model (ANN). Furthermore, the proposed approach uses only the Top 5 features to build a DNN prediction model whose performance matches that of the model built with all 57 features. We also find that the information gain filter performs better than the other two feature filter methods (χ² and Pearson correlation). Therefore, the proposed feature-based DNN framework can use fewer features to build a prediction model with higher prediction performance for predicting mortality risk in patients with COVID-19.
The weaknesses and limitations of the proposed model are as follows. First, we use only one dataset to evaluate the proposed model. Second, we do not evaluate the performance of integrating the feature filter or wrapper methods with other machine learning algorithms, such as support vector machine, artificial neural networks, random forest, decision tree, logistic regression, and k-nearest neighbor.
Several issues remain to be addressed in the future. First, other deep learning techniques could be considered. Second, it would be interesting to apply other data normalization methods to build more accurate prediction models. Finally, exploring other feature selection methods to further increase prediction performance remains an interesting issue. Incorporating these issues would be a promising direction for future research.
CRediT authorship contribution statement
Thing-Yuan Chang: Conceptualization, Supervision. Cheng-Kui Huang: Methodology, Writing – review & editing. Cheng-Hsiung Weng: Conceptualization, Methodology, Writing – review & editing, Software. Jing-Yuan Chen: Writing – original draft, Software.
Declaration of Competing Interest
The authors declare that they have no conflict of interest.
Data availability
Data will be made available on request.
References
- Akçay M.Ş., Özlü T., Yilmaz A. Radiological approaches to COVID-19 pneumonia. Turk. J. Med. Sci. 2020;50(SI-1):604–610. doi: 10.3906/sag-2004-160.
- Aremu O.O., Cody R.A., Hyland-Wood D., McAree P.R. A relative entropy based feature selection framework for asset data in predictive maintenance. Comput. Ind. Eng. 2020;145.
- Bahassine S., Madani A., Al-Sarem M., Kissi M. Feature selection using an improved Chi-square for Arabic text classification. J. King Saud Univ.-Comput. Inf. Sci. 2020;32(2):225–231.
- Breiman L. Random forests. Mach. Learn. 2001;45(1):5–32.
- Cai J., Luo J., Wang S., Yang S. Feature selection in machine learning: A new perspective. Neurocomputing. 2018;300:70–79.
- Chen Y.C., Lu P.E., Chang C.S., Liu T.H. A time-dependent SIR model for COVID-19 with undetectable infected persons. IEEE Trans. Netw. Sci. Eng. 2020;7(4):3279–3294. doi: 10.1109/TNSE.2020.3024723.
- Conghy T., Pon B., Anderson E. 2020. When does hospital capacity get overwhelmed in USA. https://medium.com/@trentmc0/when-does-hospital-capacity-get-overwhelmed-in-usa-germany-a06cf2835f89.
- Covid C.D.C., Team R., COVID C., Team R., COVID C., Team … R., Sauber-Schatz E. Severe outcomes among patients with coronavirus disease 2019 (COVID-19)—United States, February 12–March 16, 2020. Morb. Mortal. Wkly. Rep. 2020;69(12):343. doi: 10.15585/mmwr.mm6912e2.
- Ellefsen A.L., Bjørlykhaug E., Æsøy V., Ushakov S., Zhang H. Remaining useful life predictions for turbofan engine degradation using semi-supervised deep architecture. Reliab. Eng. Syst. Saf. 2019;183:240–251.
- Forman G. An extensive empirical study of feature selection metrics for text classification. J. Mach. Learn. Res. 2003;3(Mar):1289–1305.
- Haykin S., Lippmann R. Neural networks, a comprehensive foundation. Int. J. Neural Syst. 1994;5(4):363–364.
- Hemdan E.E.D., Shouman M.A., Karar M.E. 2020. Covidx-net: A framework of deep learning classifiers to diagnose covid-19 in x-ray images. arXiv preprint arXiv:2003.11055.
- Laurence E., Doyon N., Dubé L.J., Desrosiers P. Spectral dimension reduction of complex dynamical networks. Phys. Rev. X. 2019;9(1).
- Lawal M.O. Tomato detection based on modified YOLOv3 framework. Sci. Rep. 2021;11(1):1–11. doi: 10.1038/s41598-021-81216-5.
- Li X., Zhang W., Ding Q. Deep learning-based remaining useful life estimation of bearings using multi-scale feature extraction. Reliab. Eng. Syst. Saf. 2019;182:208–218.
- Loey M., Manogaran G., Khalifa N.E.M. A deep transfer learning model with classical data augmentation and cgan to detect covid-19 from chest ct radiography digital images. Neural Comput. Appl. 2020:1–13. doi: 10.1007/s00521-020-05437-x.
- Matkovic F., Ivasic-Kos M., Ribaric S. A new approach to dominant motion pattern recognition at the macroscopic crowd level. Eng. Appl. Artif. Intell. 2022;116.
- Mukhopadhyay A.K., Samui S. An experimental study on upper limb position invariant EMG signal classification based on deep neural network. Biomed. Signal Process. Control. 2020;55.
- Ozturk T., Talo M., Yildirim E.A., Baloglu U.B., Yildirim O., Acharya U.R. Automated detection of COVID-19 cases using deep neural networks with X-ray images. Comput. Biol. Med. 2020;121. doi: 10.1016/j.compbiomed.2020.103792.
- Ozyurt F., Tuncer T., Subasi A. An automated COVID-19 detection based on fused dynamic exemplar pyramid feature extraction and hybrid feature selection using deep learning. Comput. Biol. Med. 2021;132. doi: 10.1016/j.compbiomed.2021.104356.
- Pan J., Liu C., Wang Z., Hu Y., Jiang H. Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMs in acoustic modeling. In: 2012 8th International Symposium on Chinese Spoken Language Processing. IEEE; 2012. pp. 301–305.
- Pham Q.V., Nguyen D.C., Huynh-The T., Hwang W.J., Pathirana P.N. 2020. Artificial intelligence (AI) and big data for coronavirus (COVID-19) pandemic: A survey on the state-of-the-arts.
- Pourhomayoun M., Shakibi M. Predicting mortality risk in patients with COVID-19 using machine learning to help medical decision-making. Smart Health. 2021;20. doi: 10.1016/j.smhl.2020.100178.
- Quinlan J.R. Discovering rules by induction from large collections of examples. In: Expert Systems in the Micro Electronics Age. 1979.
- Quinlan J.R. Induction of decision trees. Mach. Learn. 1986;1(1):81–106.
- Quinlan J.R. Simplifying decision trees. Int. J. Man-Mach. Stud. 1987;27(3):221–234.
- Quinlan J.R. C4.5: Programs for Machine Learning. Elsevier; 2014.
- Roy A.M. Adaptive transfer learning-based multiscale feature fused deep convolutional neural network for EEG MI multiclassification in brain–computer interface. Eng. Appl. Artif. Intell. 2022;116.
- Russell S.J., Norvig P. Artificial Intelligence: A Modern Approach. Pearson Education Limited; Malaysia: 2016.
- Shastry K.A., Sanjay H.A. A modified genetic algorithm and weighted principal component analysis based feature selection and extraction strategy in agriculture. Knowl.-Based Syst. 2021;232.
- Sperandei S. Understanding logistic regression analysis. Biochem. Med. 2014;24(1):12–18. doi: 10.11613/BM.2014.003.
- Tan P.N., Steinbach M., Kumar V. Introduction to Data Mining, Vol. 1. Pearson Addison Wesley; Boston: 2006.
- Thaseen I.S., Kumar C.A., Ahmad A. Integrated intrusion detection model using chi-square feature selection and ensemble of classifiers. Arab. J. Sci. Eng. 2019;44(4):3357–3368.
- Wang S., Kang B., Ma J., Zeng X., Xiao M., Guo … J., Xu B. A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19). Eur. Radiol. 2021:1–9. doi: 10.1007/s00330-021-07715-1.
- Yuvaraj N., Chang V., Gobinathan B., Pinagapani A., Kannan S., Dhiman G., Rajan A.R. Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification. Comput. Electr. Eng. 2021;92.
- Zhao J., Zhang Y., He X., Xie P. 2020. Covid-ct-dataset: a ct scan dataset about covid-19. arXiv preprint arXiv:2003.13865, 490.