Engineering Applications of Artificial Intelligence. 2023 Jun 19;124:106644. doi: 10.1016/j.engappai.2023.106644

Feature-based deep neural network approach for predicting mortality risk in patients with COVID-19

Thing-Yuan Chang a, Cheng-Kui Huang b, Cheng-Hsiung Weng a,c,*, Jing-Yuan Chen a
PMCID: PMC10277846  PMID: 37366394

Abstract

In this study, we integrate a deep neural network (DNN) with hybrid approaches (feature selection and instance clustering) to build models for predicting mortality risk in patients with COVID-19. We use cross-validation to evaluate the performance of these prediction models, which include the feature-based DNN, the cluster-based DNN, a plain DNN, and a neural network (multi-layer perceptron). The COVID-19 dataset, with 12,020 instances, and 10-fold cross-validation are used to evaluate the prediction models. The experimental results show that the proposed feature-based DNN model, with Recall of 98.62%, F1-score of 91.99%, Accuracy of 91.41%, and a False Negative Rate of 1.38%, outperforms the original prediction model (neural network). Furthermore, the proposed approach uses only the Top 5 features to build a DNN prediction model that predicts as well as the model built with all 57 features. The novelty of this study is the integration of feature selection, instance clustering, and DNN techniques to improve prediction performance. Moreover, the proposed approach, built with far fewer features, outperforms the original prediction models on many metrics while retaining high prediction performance.

Keywords: COVID-19, Mortality risk, Deep learning, Feature-based DNN, Feature selection

1. Introduction

COVID-19, a new coronavirus disease, has spread worldwide and has been declared a pandemic by the World Health Organization (Covid et al., 2020). Although the most important clinical symptoms are fever and cough, symptoms such as fatigue, headache, and shortness of breath can also be seen. However, diagnostic tests are needed because these symptoms are not specific to the disease, and the disease can progress rapidly to severe pneumonia (Akçay et al., 2020, Chen et al., 2020). Conghy et al. (2020) estimated that once a coronavirus outbreak starts, it takes less than four weeks to overwhelm the healthcare system; once hospital capacity is overwhelmed, the death rate jumps. Therefore, how to predict mortality risk in patients with COVID-19 using machine learning (ML) techniques is an interesting research issue.

There are numerous studies in the literature on COVID-19 detection by analyzing images. Ozturk et al. (2020) proposed the DarkCovidNet model for classifying COVID-19 in X-ray images. Hemdan et al. (2020) proposed a deep learning model called COVIDX-Net to analyze 25 COVID-19 and 25 healthy images. Wang et al. (2021) proposed a new architecture called M-inception, modifying the classical Inception network, to diagnose COVID-19 from 1,119 CT (computed tomography) images.

However, when ML algorithms are applied to data-driven tasks, their performance and reliability are often limited by the quality of the data representation used to train and test them (Ellefsen et al., 2019). Moreover, datasets with many variables (high dimensionality) and redundant variables (features) degrade ML algorithm performance (Aremu et al., 2020, Russell and Norvig, 2016). Specifically, high-dimensional datasets highlight the limitations of ML algorithms (Laurence et al., 2019, Russell and Norvig, 2016). A dataset that contains a high amount of redundancy and low information content can result in poor ML performance and increased computation time (Cai et al., 2018, Li et al., 2019).

Pourhomayoun and Shakibi (2021) used several machine learning algorithms, including support vector machine (SVM), artificial neural networks (ANN), random forest, decision tree, logistic regression, and k-nearest neighbor (KNN), to predict the mortality rate in patients with COVID-19. How to evaluate feature selection methods and integrate them with a deep neural network (DNN) to build more accurate prediction models is therefore an interesting research issue.

Unlike COVID-19 detection by deep learning techniques that analyze images, we integrate several approaches (feature selection, instance clustering, and DNN) to predict mortality risk in patients with COVID-19. Our focus is on achieving high performance with fewer features. Therefore, the main objective of this study is to use fewer, more significant features to build a parsimonious prediction model with higher prediction performance. In addition, two feature selection approaches (filter and wrapper) are integrated into the DNN prediction models.

2. Related work

In this section, we review issues and techniques related to COVID-19 disease detection and the deep learning methods used for COVID-19 prediction. We also review feature selection techniques.

2.1. COVID-19 disease detection by deep learning techniques in analyzing images

An increasing number of cases of novel coronavirus (2019-nCoV)-infected pneumonia (NCIP) have been identified since December 2019. The World Health Organization (WHO) designated COVID-19 as the official name for the infectious disease caused by the novel coronavirus. The disease has spread worldwide and has been declared a pandemic by the WHO (Covid et al., 2020).

There are numerous studies in the literature on COVID-19 detection by analyzing images. Ozturk et al. (2020) proposed the DarkCovidNet model for classifying X-ray images into COVID-19, healthy, and pneumonia classes, achieving an 87.02% accuracy rate. Hemdan et al. (2020) proposed a deep learning model called COVIDX-Net to analyze 25 COVID-19 and 25 healthy images, obtaining a 90% success rate. Wang et al. (2021) proposed a new architecture called M-inception, modifying the classical Inception network, to diagnose COVID-19 from 1,119 CT images, obtaining accuracy of 89.5%, specificity of 88%, and sensitivity of 87%. Zhao et al. (2020) integrated transfer learning and data augmentation with deep learning to diagnose COVID-19 from 275 CT images, achieving an accuracy of 84.7%. Moreover, Loey et al. (2020) integrated a deep transfer learning model with classical data augmentation and a conditional generative adversarial network (CGAN) to detect COVID-19 from chest CT images.

2.2. Feature selection techniques

The success of ML algorithms depends on the quality of the data used to obtain a generalized predictive model for the classification problem. A dataset that contains large amounts of redundancy and low information content results in poor ML performance and increased computation time (Li et al., 2019). Therefore, the importance of feature selection (FS) for improving data quality, and in turn the performance of ML algorithms, has been demonstrated in many studies.

The classification of surface electromyography (sEMG) signals plays an important role in man-machine interfaces for properly controlling prosthetic devices with multiple degrees of freedom. Mukhopadhyay and Samui (2020) presented a detailed empirical exploration of a DNN-based classification system for upper limb position-invariant myoelectric signals. In their study, the DNN-based system outperformed the other existing classifiers.

Uneven environmental conditions, such as branch and leaf occlusion, illumination variation, clusters of tomatoes, and shading, make fruit detection very challenging; Lawal (2021) therefore proposed modified YOLOv3 models, called YOLO-Tomato, to detect tomatoes in complex environmental conditions. Because inter-subject variability, inherently complex properties, and a low signal-to-noise ratio (SNR) in electroencephalogram (EEG) signals are major challenges, Roy (2022) proposed an efficient transfer learning (TL)-based multi-scale feature fused convolutional neural network (MSFFCNN) that captures the distinguishable features of various non-overlapping canonical frequency bands of EEG signals at different convolutional scales for multi-class MI classification. Automatic analysis, recognition, and prediction of the behavior of large-scale crowds in video-surveillance data is a research field of paramount importance for the security of modern societies. Matkovic et al. (2022) proposed a novel method for generating meta-tracklets and recognizing dominant motion patterns as a basis for automatic crowd behavior analysis at the macroscopic level, where a crowd is treated as an entity.

Thaseen et al. (2019) developed an intrusion detection model that combines Chi-square feature selection, which ranks features by a statistical significance test and retains only those features dependent on the class label, with an ensemble of classifiers including SVM, modified Naive Bayes (MNB), and LPBoost.

To achieve higher classification accuracy, Ozyurt et al. (2021) proposed two basic feature generation functions (FRDEPFGN and RFINCA), which are used to extract statistical and textural features. The selected most informative features are forwarded to an ANN and a DNN for classification.

Similar to FS, feature extraction (FeExt) involves deriving new attributes from existing attributes. Shastry and Sanjay (2021) proposed a hybrid FS and FeExt strategy, combining a modified genetic algorithm (m-GA) and weighted principal component analysis (wgt-PCA), for selecting features from agricultural datasets to achieve higher classification accuracy.

Yuvaraj et al. (2021) developed a novel deep decision tree classifier that utilizes the hidden layers of a DNN as its tree nodes to process the input elements. In their study, three feature extraction methods (information gain, χ2, and Pearson correlation) are used to avoid classification failure with limited features.

Regarding the COVID-19 outbreak, Akçay et al. (2020) stated that diagnostic tests are needed because the symptoms are not specific to the disease and the disease can progress rapidly to severe pneumonia. Conghy et al. (2020) estimated that once a coronavirus outbreak starts, it takes less than four weeks to overwhelm the healthcare system; once hospital capacity is overwhelmed, the death rate jumps. Motivated by recent advances and applications of artificial intelligence (AI) and big data in various areas, Pham et al. (2020) emphasized their importance in responding to the COVID-19 outbreak and preventing the severe effects of the pandemic. They also provided researchers and communities with new insights into the ways AI and big data can improve the COVID-19 situation and drive further studies on stopping the outbreak.

Cai et al. (2018) classified feature selection methods into three categories: filter, wrapper, and embedded methods. Embedded methods integrate feature selection into the learning algorithm itself. Wrapper methods evaluate feature importance based on the predictor algorithm's performance over various feature subsets. Filter methods select features by ranking them according to various criteria, such as feature variance and independence properties.

Unlike COVID-19 detection by deep learning on images, we integrate feature selection methods and a DNN to predict mortality risk in patients with COVID-19. Two feature selection approaches (filter and wrapper) are integrated into the DNN to build prediction models with high prediction performance. How to use fewer features to build a prediction model with higher prediction performance is the main objective of this study.

2.3. Comparison of our research and literature

To describe the differences between our study and prior studies in terms of techniques (machine learning, feature selection, and instance clustering), we provide a comparison in Table 1. Furthermore, we describe the differences between our study and prior studies on diagnosing COVID-19, as shown in Table 2.

Table 1.

Comparison of our research and literature.

Studies | Machine learning | Feature selection | Instance clustering
Lawal (2021) | YOLOv3 | N | N
Mukhopadhyay and Samui (2020) | DNN | N | N
Ozyurt et al. (2021) | ANN, DNN | Functions (FRDEPFGN, RFINCA) | N
Roy (2022) | MSFFCNN | N | N
Shastry and Sanjay (2021) | m-GA, wgt-PCA | N | N
Thaseen et al. (2019) | SVM, MNB, LPBoost | χ2 | N
Yuvaraj et al. (2021) | DNN | Information gain, χ2, Pearson correlation | N
This study | DNN | χ2, Pearson correlation, information gain, DT, LR, RF | Y

ANN: Artificial Neural Networks; CGAN: Conditional Generative Adversarial Network; DNN: Deep Neural Networks; DT: Decision Tree; LR: Logistic Regression; KNN: k-Nearest Neighbor; m-GA: modified-Genetic Algorithm; MNB: Modified Naive Bayes; MSFFCNN: multi-scale feature fused CNN; RF: Random Forest; SVM: Support Vector Machine; wgt-PCA: weighted principal component analysis.

Table 2.

Comparison of our research and literature in analyzing COVID-19.

Studies | Aim | Machine learning | Feature selection | Instance clustering
Hemdan et al. (2020) | Detect COVID-19 disease by analyzing images | COVIDX-Net | N | N
Loey et al. (2020) | Detect COVID-19 by analyzing chest CT images | CGAN | N | N
Ozturk et al. (2020) | Detect COVID-19 disease by analyzing images | DarkCovidNet | N | N
Wang et al. (2021) | Diagnose COVID-19 from CT images | M-inception | N | N
Pourhomayoun and Shakibi (2021) | Predict mortality risk in patients with COVID-19 | SVM, ANN, RF, DT, LR, KNN | Correlation | N
This study | Predict mortality risk in patients with COVID-19 | DNN | χ2, Pearson correlation, information gain, DT, LR, RF | Y

CGAN: Conditional Generative Adversarial Network; DNN: Deep Neural Networks; DT: Decision Tree; LR: Logistic Regression; KNN: k-Nearest Neighbor; MNB: Modified Naive Bayes; ANN: Artificial Neural Networks; RF: Random Forest; SVM: Support Vector Machine.

3. Research methodology

To improve the accuracy and performance of prediction models, we develop a new framework that integrates feature selection, clustering methods, and deep learning to build prediction models. The proposed framework, the development of the prediction models, and the assessment metrics are described below.

3.1. The proposed framework for prediction models

In this section, we describe the four steps of the proposed framework (data pre-processing, feature selection, instance clustering, and prediction model construction), as shown in Fig. 1. The four steps are introduced as follows.

Fig. 1.

The proposed framework.

Step 1: Data pre-processing

Data pre-processing transforms the raw data into a format suitable for applying machine learning techniques. At this stage, useless and redundant data are removed, as are unlabeled data instances.

Step 2: Feature selection

Two feature selection strategies (filter and wrapper) are used to select the most important features for building prediction models. Filter methods (χ2, Pearson correlation, and information gain) select features by ranking them according to various criteria, such as feature variance and independence properties. Wrapper methods then evaluate feature importance based on the predictor algorithm's performance over various feature subsets.

Step 3: Instances clustering

We use specific attributes to cluster the instances of the dataset into sub-datasets. The attributes can be determined by experts or by clustering methods such as k-means, EM (Expectation-Maximization), and DBSCAN. The resulting clusters of instances are then used to build prediction models in the next step.
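For illustration, the following is a minimal Python sketch of this step using k-means (one of the listed options); the attribute matrix and the number of clusters are assumptions made for the example, not values from the study.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in matrix of selected clustering attributes (illustrative only).
rng = np.random.default_rng(0)
X = rng.random((300, 3))

# Assign each instance to one of 4 clusters, then split into sub-datasets.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
sub_datasets = {k: X[labels == k] for k in range(4)}
print({k: len(v) for k, v in sub_datasets.items()})  # instances per cluster
```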

Step 4: Prediction models building

We use DNN to build prediction models by important features which are selected by the above two feature selection strategies (filter and wrapper).

After the prediction models are built, we use popular assessment metrics to evaluate their prediction performance. The assessment metrics are described in Section 3.4.

3.2. Feature selection techniques

Two feature selection strategies (filter and wrapper) are used to select the most important features for building prediction models. Filter methods (χ2, Pearson correlation, and information gain) select features by ranking them according to various criteria. Wrapper methods, namely LR (logistic regression), DT (decision tree), and RF (random forest), evaluate feature importance based on the predictor algorithm's performance over various feature subsets. These selection methods are described below.

3.2.1. Feature filter methods (χ2, Pearson correlation, and information gain)

This study applies three feature filter methods, namely Chi-square (χ2) (Bahassine et al., 2020, Forman, 2003, Thaseen et al., 2019), Pearson correlation (Tan et al., 2006, Yuvaraj et al., 2021), and information gain (Quinlan, 1986, Quinlan, 1987). The three methods are given as follows.

• Chi-Square (χ2) (Forman, 2003)

The χ2 statistic measures the level of independence between a feature (xi) and a class label (yj), and the result is compared against the χ2 distribution with one degree of freedom. The Chi-square statistic is defined as:

$$\chi^2(x_i, y_j) = \frac{N(AD - BC)^2}{(A+B)(C+D)(A+C)(B+D)} \tag{1}$$

where

A: frequency of feature (xi) and class label (yj) in the dataset.

B: frequency of feature (xi) appearing without class label (yj) in the dataset.

C: frequency of class label (yj) appearing without feature (xi) in the dataset.

D: frequency of neither class label (yj) nor feature (xi) appearing in the dataset.

N: total number of records.
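For illustration, the following is a minimal Python sketch of Eq. (1) for one binary feature against a binary class label; the function name and toy arrays are illustrative, not part of the original study.

```python
import numpy as np

def chi_square(x, y):
    """Chi-square statistic of Eq. (1) for binary feature x and class y."""
    A = np.sum((x == 1) & (y == 1))  # feature and class co-occur
    B = np.sum((x == 1) & (y == 0))  # feature present, class absent
    C = np.sum((x == 0) & (y == 1))  # class present, feature absent
    D = np.sum((x == 0) & (y == 0))  # neither present
    N = len(x)
    denom = (A + B) * (C + D) * (A + C) * (B + D)
    return N * (A * D - B * C) ** 2 / denom if denom else 0.0

x = np.array([1, 1, 0, 0, 1, 0])  # e.g., presence of a symptom
y = np.array([1, 1, 0, 1, 1, 0])  # e.g., deceased (1) vs. recovered (0)
print(chi_square(x, y))  # larger values indicate stronger dependence
```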

• Pearson Correlation (Tan et al., 2006, Yuvaraj et al., 2021)

The Pearson correlation coefficient is used in the present study to estimate optimal features by calculating the degree of linear correlation between the extracted class and the original class. The Pearson correlation coefficient between two data objects, x and y, is defined as follows:

$$\mathrm{corr}(x,y) = \frac{\frac{1}{n-1}\sum_{k=1}^{n}(x_k-\bar{x})(y_k-\bar{y})}{\sqrt{\frac{1}{n-1}\sum_{k=1}^{n}(x_k-\bar{x})^2}\times\sqrt{\frac{1}{n-1}\sum_{k=1}^{n}(y_k-\bar{y})^2}} \tag{2}$$

where

$\bar{x}$: the mean of x.

$\bar{y}$: the mean of y.
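A minimal Python sketch of Eq. (2); the toy vectors are illustrative, and the result can be checked against NumPy's built-in correlation.

```python
import numpy as np

def pearson_corr(x, y):
    """Pearson correlation coefficient of Eq. (2)."""
    n = len(x)
    cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
    std_x = np.sqrt(np.sum((x - x.mean()) ** 2) / (n - 1))
    std_y = np.sqrt(np.sum((y - y.mean()) ** 2) / (n - 1))
    return cov / (std_x * std_y)

x = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
y = np.array([1.0, 0.0, 1.0, 0.0, 0.0])
print(pearson_corr(x, y))  # matches np.corrcoef(x, y)[0, 1]
```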

• Information gain (Quinlan, 1986, Quinlan, 1987)

Let D be a set of class-labeled instances. Suppose the class label attribute has m distinct values defining m distinct classes Ci (for i = 1, 2, …, m). Let Ci,D be the set of instances of class Ci in D, and let |D| and |Ci,D| denote the number of instances in D and Ci,D, respectively. The expected information needed to classify an instance in D is given by

$$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i) \tag{3}$$

where

$p_i$: the probability that an arbitrary instance in D belongs to class Ci, estimated by |Ci,D|/|D|.

A feature (also called an attribute) A can be used to split D into v partitions or subsets, {D1, D2, …, Dv}, where Dj contains the instances in D that have outcome aj for A. The expected information needed to classify instances in D after partitioning by attribute A is given by

$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j) \tag{4}$$

where

$\frac{|D_j|}{|D|}$: the weight of the jth partition.

$Info_A(D)$: the expected information required to classify an instance from D based on the partitioning by attribute A.

Information gain is defined as the difference between the original information requirement (i.e., based only on the proportion of classes) and the new requirement (i.e., obtained after partitioning on attribute A); that is, Gain(A) = Info(D) − Info_A(D). Gain(A) tells us how much would be gained by branching on A. If attribute A holds the highest information gain, it is chosen as the splitting attribute at node N. That is, we partition on attribute A to obtain the "best classification," minimizing the amount of information still required to finish classifying the instances, i.e., minimizing Info_A(D).
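A minimal Python sketch of Eqs. (3) and (4) and of Gain(A); the toy data frame and its column names are illustrative, not from the COVID-19 dataset.

```python
import numpy as np
import pandas as pd

def info(labels):
    """Info(D) of Eq. (3): expected information to classify an instance."""
    p = labels.value_counts(normalize=True)  # class proportions p_i
    return float(-np.sum(p * np.log2(p)))

def gain(df, attribute, target):
    """Gain(A) = Info(D) - Info_A(D), following Eqs. (3) and (4)."""
    info_d = info(df[target])
    info_a = sum(len(part) / len(df) * info(part[target])
                 for _, part in df.groupby(attribute))  # Eq. (4)
    return info_d - info_a

df = pd.DataFrame({"pneumonia": [1, 1, 0, 0, 1],
                   "outcome":   [1, 1, 0, 1, 0]})
print(gain(df, "pneumonia", "outcome"))
```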

3.2.2. Feature wrapper methods

This study applies three feature wrapper methods, namely logistic regression (LR) (Sperandei, 2014), decision tree (DT) (Quinlan, 1979, Quinlan, 2014), and random forest (RF) (Breiman, 2001), to rank features. The three methods are given as follows.

• LR

Logistic regression works very much like linear regression, but with a binomial response variable. A logistic regression models the odds of an outcome based on individual characteristics (Sperandei, 2014). Because the odds are a ratio, what is actually modeled is the logarithm of the odds, given by:

$$\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m \tag{5}$$

where

p: indicates the probability of an event.

βi: the regression coefficients associated with the reference group and the xi variables.

• DT

A decision tree is a flowchart-like tree structure in which each internal (non-leaf) node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf (terminal) node holds a class label. The topmost node in a tree is the root node. Quinlan (1979) developed the ID3 decision tree algorithm, and Quinlan (2014) later presented C4.5 (a successor of ID3), which became a benchmark against which newer supervised learning algorithms are often compared. ID3 uses information gain as its attribute selection measure; however, this measure is biased toward tests with many outcomes. C4.5 uses the gain ratio as its attribute selection measure, which attempts to overcome this bias. The two measures, gain ratio and information gain, are formalized as follows:

$$GainRatio(A) = \frac{Gain(A)}{SplitInfo_A(D)} \tag{6}$$
$$Gain(A) = Info(D) - Info_A(D) \tag{7}$$
$$SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right) \tag{8}$$

where

Info(D): the average amount of information needed to identify the class label of a tuple in D.

InfoA(D): the expected information required to classify a tuple from D based on the partitioning by attribute (A).

• RF

Random forest is a class of ensemble methods specifically designed for decision tree classifiers. It combines the predictions made by multiple decision trees, where each tree is generated from the values of an independent set of random vectors. A random forest is defined formally by Breiman (2001). The strength of a set of classifiers refers to the average performance of the classifiers, where performance is measured probabilistically in terms of the classifier's margin:

$$margin,\; M(X,Y) = P(\hat{Y}_\theta = Y) - \max_{Z \neq Y} P(\hat{Y}_\theta = Z) \tag{9}$$

where

$\hat{Y}_\theta$: the predicted class of X according to a classifier built from a random vector θ.
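As a sketch of how the three wrapper models can rank features, the following reads importance from the absolute coefficients for LR and from impurity-based importances for DT and RF; the paper does not state how importances are extracted, so this is an assumption, and the toy data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def wrapper_rankings(X, y):
    """Rank feature indices, best first, under each wrapper model."""
    lr = LogisticRegression(max_iter=1000).fit(X, y)
    dt = DecisionTreeClassifier(random_state=0).fit(X, y)
    rf = RandomForestClassifier(random_state=0).fit(X, y)
    return {"LR": np.argsort(-np.abs(lr.coef_[0])),
            "DT": np.argsort(-dt.feature_importances_),
            "RF": np.argsort(-rf.feature_importances_)}

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10)).astype(float)  # toy binary features
y = rng.integers(0, 2, size=200)                      # toy binary labels
print(wrapper_rankings(X, y)["DT"][:5])  # Top 5 feature indices under DT
```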

3.3. Development of prediction models

Two classification techniques, the multi-layer perceptron (MLP) and the DNN, are used to construct the prediction models. These techniques are briefly described below.

  • (1)
    MLP: An ANN is an abstract computational model of a human brain. The architecture of an artificial neural network is defined by the characteristics of a node and the characteristics of the node’s connectivity in the network (Haykin and Lippmann, 1994). The perceptron is the simplest model in the ANN family. An MLP can learn powerful non-linear transformations: with enough hidden units, it can represent arbitrarily complex but smooth functions. In a perceptron, each input node is connected via a weighted link to the output node. The output of a perceptron model can be expressed as follows (Tan et al., 2006):
    $$\hat{y} = \mathrm{sign}(w_d x_d + w_{d-1} x_{d-1} + \cdots + w_1 x_1 + w_0 x_0) = \mathrm{sign}(\mathbf{w} \cdot \mathbf{x}) \tag{10}$$
    where w0, w1, …, wd are the weights of the input links, x0, x1, …, xd are the input attribute values, w is the weight vector, and x is the input vector. The sign function, which acts as an activation function for the output neuron, outputs +1 if its argument is positive and −1 if its argument is negative. An artificial neural network has a more complex structure than a perceptron model. The goal of the MLP learning algorithm is to determine a set of weights w that minimizes the total sum of squared errors:
    $$E(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2 \tag{11}$$
    where $(y_i - \hat{y}_i)$ is the prediction error. The weight update formula for the gradient descent method can be written as follows:
    $$w_j = w_j - l\,\frac{\partial E(\mathbf{w})}{\partial w_j} \tag{12}$$
    where l is the learning rate.
  • (2)
    DNN: A DNN can be considered a conventional MLP with many hidden layers (hence "deep"). The DNN parameters are optimized by back-propagation using stochastic gradient descent. The DNN, an (L + 1)-layer MLP, is used to model the posterior probability $P_{s|o}(s|o)$ of a hidden Markov model (HMM) tied state s given an observation vector o. The first L layers, l = 0, …, L−1, are hidden layers that model the posterior probability of hidden nodes $h^l$ given the input vector $v^l$ from the previous layer, while the top layer L computes the posterior probability of all tied states using a softmax (Pan et al., 2012):
    $$P^{l}_{h|v}(h^l_j = 1 \mid v^l) = \frac{1}{1 + e^{-z^l_j(v^l)}}, \quad 0 \le l < L \tag{13}$$
    $$P^{L}_{s|v}(s \mid v^L) = \mathrm{softmax}_s\left(z^L(v^L)\right) \tag{14}$$
    $$z^l(v^l) = (W^l)^{T} v^l + \alpha^l \tag{15}$$
    where $W^l$ and $\alpha^l$ denote the weight matrix and bias vector of hidden layer l, and $h^l_j$ and $z^l_j(v^l)$ denote the jth components of the hidden vector $h^l$ and of its activation $z^l(v^l)$, respectively. A minimal sketch of this forward pass follows this list.
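A minimal NumPy sketch of the forward pass in Eqs. (13)-(15), with toy layer sizes; it is illustrative only, not the study's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def dnn_forward(v0, weights, biases):
    """Sigmoid hidden layers (Eq. 13), softmax top layer (Eq. 14)."""
    v = v0
    for W, a in zip(weights[:-1], biases[:-1]):
        v = sigmoid(W.T @ v + a)            # Eqs. (13) and (15)
    z_top = weights[-1].T @ v + biases[-1]  # Eq. (15) at layer L
    return softmax(z_top)                   # Eq. (14)

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]  # toy layer widths: input, two hidden, output
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]
print(dnn_forward(rng.normal(size=4), weights, biases))  # sums to 1
```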

3.4. Assessment metrics

Six metrics, Precision, Recall, F1-score, Accuracy, FPR (false positive rate), and FNR (false negative rate), are commonly used to evaluate the machine learning algorithms proposed in this study. The six metrics (Tan et al., 2006) are defined as follows.

Precision=TP#/(TP#+FP#) (16)
Recall=TP#/(TP#+FN#) (17)
F1-score=2×Precision×Recall/(Precision+Recall) (18)
Accuracy=(TP#+TN#)/(TP#+TN#+FP#+FN#) (19)
FPR=FP#/(TN#+FP#) (20)
FNR=FN#/(TP#+FN#), (21)

where TP# and TN# denote the numbers of positive and negative instances that are classified correctly, and FP# and FN# denote the numbers of instances that are misclassified as positive and as negative, respectively.
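A small Python helper implementing Eqs. (16)-(21); the confusion-matrix counts in the example are toy values, not the paper's results.

```python
def assessment_metrics(tp, tn, fp, fn):
    """Compute the six metrics of Eqs. (16)-(21) from confusion counts."""
    precision = tp / (tp + fp)                      # Eq. (16)
    recall = tp / (tp + fn)                         # Eq. (17)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (18)
    accuracy = (tp + tn) / (tp + tn + fp + fn)      # Eq. (19)
    fpr = fp / (tn + fp)                            # Eq. (20)
    fnr = fn / (tp + fn)                            # Eq. (21)
    return {"Precision": precision, "Recall": recall, "F1-score": f1,
            "Accuracy": accuracy, "FPR": fpr, "FNR": fnr}

# Toy counts only, for demonstration.
print(assessment_metrics(tp=950, tn=880, fp=120, fn=50))
```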

4. Experimental results

The dataset used to evaluate the prediction models and the hyper-parameters of the DNN model are described in Section 4.1. In Section 4.2, we first compare the prediction performance of two methods, a neural network (ANN with MLP) and a DNN, using the COVID-19 dataset provided by Pourhomayoun and Shakibi (2021), to understand the performance difference between the two methods on several metrics. In Section 4.3, we investigate the impact of the important features on the prediction performance of the DNN models, using two feature selection strategies (filter and wrapper) to choose the important features. In Section 4.4, we divide the COVID-19 dataset into country sub-datasets according to the country attribute and build country-based DNN prediction models. Finally, the experimental results are summarized in Section 4.5. The prediction models were implemented in Python and tested on a PC running Windows 10.

Table 3.

The performance difference of prediction models in metrics.

Measures DNN (This study) ANN (Pourhomayoun and Shakibi, 2021) ANN* (Pourhomayoun and Shakibi, 2021)
Recall 98.62% 94.20% 95.49%
Precision 86.20% 86.86% 85.57%
F1-score 91.99% 90.38% 90.44%
Accuracy 91.41% 89.98% 89.91%
FPR 15.79% 14.24% 15.67%
FNR 1.38% 5.79% 4.51%

ANN*: We re-executed the Python code provided by Pourhomayoun and Shakibi (2021) on the same computer used to evaluate the DNN (this study).

4.1. The dataset description and hyper-parameters of DNN model

The original dataset consists of more than 2,670,000 laboratory-confirmed COVID-19 patients from 146 countries, including 307,382 labeled samples of male and female patients with an average age of 44.75. At the data cleaning stage, Pourhomayoun and Shakibi (2021) removed useless and redundant data elements and the unlabeled samples. Data imputation techniques, including mean/median/mode value replacement and a KNN technique, were then used to handle missing values. Moreover, Pourhomayoun and Shakibi (2021) sampled both recovered and deceased patients for training and testing to make sure the dataset is balanced. Finally, 57 of the 112 features were chosen, leaving 12,020 instances in the dataset. Pourhomayoun and Shakibi (2021) provided this processed dataset in the hope of benefiting the research community; we refer to it as the COVID-19 dataset in this study.
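As a hedged sketch of the imputation described above (performed by Pourhomayoun and Shakibi, 2021, not by us), the following shows mode replacement for a categorical column and KNN imputation for numeric columns; the file name and column names are hypothetical.

```python
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.read_csv("covid19_labeled.csv")  # hypothetical file name

# Mode replacement for a categorical column (column name hypothetical).
df["sex"] = df["sex"].fillna(df["sex"].mode()[0])

# KNN imputation for the numeric columns.
numeric_cols = df.select_dtypes("number").columns
df[numeric_cols] = KNNImputer(n_neighbors=5).fit_transform(df[numeric_cols])
```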

The hyper-parameters of the DNN model are specified as follows. First, we set the learning rate to 0.0005. Second, the network structure of the DNN is designed as follows: the input layer uses "ReLU" as the activation function with 200 cells; four hidden layers follow, each using "ReLU" as the activation function, with 300 cells in hidden layer #1, 200 cells in hidden layer #2, 500 cells in hidden layer #3, and 250 cells in hidden layer #4; finally, the output layer uses "sigmoid" as the activation function with 1 cell to produce the desired model output.
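A sketch of this architecture in Keras; the 57-feature input, the Adam optimizer, and the binary cross-entropy loss are assumptions, since the paper specifies only the learning rate, layer sizes, and activation functions.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(57,)),              # 57 features (assumed input)
    layers.Dense(200, activation="relu"),   # input layer, 200 cells
    layers.Dense(300, activation="relu"),   # hidden layer #1
    layers.Dense(200, activation="relu"),   # hidden layer #2
    layers.Dense(500, activation="relu"),   # hidden layer #3
    layers.Dense(250, activation="relu"),   # hidden layer #4
    layers.Dense(1, activation="sigmoid"),  # output layer, 1 cell
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0005),
              loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```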

4.2. Performance of prediction models (ANN and DNN)

We first compare the performance of the two methods (ANN and DNN) on the metrics Precision, Recall, F1-score, Accuracy, FPR, and FNR, using the COVID-19 dataset provided by Pourhomayoun and Shakibi (2021). Table 3 shows the prediction performance of the two methods on this dataset. Compared to the performance reported by Pourhomayoun and Shakibi (2021), the DNN outperforms the ANN (MLP) on Recall, F1-score, Accuracy, and FNR, while the ANN (MLP) outperforms the DNN only on Precision and FPR. We also re-executed the Python code provided by Pourhomayoun and Shakibi (2021) on the same computer used to evaluate the DNN, and the DNN again outperforms the ANN (MLP) on Recall, F1-score, Accuracy, and FNR.

Second, we compare the two methods (ANN and DNN) on the AUC (area under the curve) of the ROC (receiver operating characteristic) curve. Table 4 shows the AUC of ROC for the two methods on the COVID-19 dataset. Against the performance reported by Pourhomayoun and Shakibi (2021), the ANN (MLP), with an AUC of ROC of 92.76%, outperforms the DNN, with 91.41%. However, when we re-executed the Python code provided by Pourhomayoun and Shakibi (2021) on the same computer, the DNN (AUC of ROC 91.41%) outperforms the re-executed ANN (MLP) (89.91%).

Table 4.

The performance difference of prediction models in the area of ROC curve.

Model AUC of ROC
DNN (This study) 91.41%
ANN (Pourhomayoun and Shakibi, 2021) 92.76%
ANN* (Pourhomayoun and Shakibi, 2021) 89.91%

ANN*: We re-executed the Python code provided by Pourhomayoun and Shakibi (2021) on the same computer used to evaluate the DNN.

Finally, we compare the two methods (ANN and DNN) on the AUC of the PRC (precision–recall curve). Table 5 shows the AUC of PRC for the two methods on the COVID-19 dataset. Compared to the performance reported by Pourhomayoun and Shakibi (2021), the DNN, with an AUC of PRC of 92.75%, outperforms the ANN (MLP), with 91.99%. When we re-executed the Python code provided by Pourhomayoun and Shakibi (2021) on the same computer, the DNN (92.75%) still outperforms the ANN (MLP) (91.82%). Therefore, the DNN outperforms the ANN (MLP) in AUC of PRC.

Table 5.

The performance difference of prediction models in the area of PRC curve.

Model AUC of PRC
DNN (This study) 92.75%
ANN (Pourhomayoun and Shakibi, 2021) 91.99%
ANN* (Pourhomayoun and Shakibi, 2021) 91.82%

ANN*: We re-executed the Python code provided by Pourhomayoun and Shakibi (2021) on the same computer used to evaluate the DNN.

4.3. The impact of the important features

4.3.1. Feature filter methods (χ2, Pearson correlation, and information gain)

We compare the prediction performance of DNN models built from the features recommended by the three feature filter methods (χ2, Pearson correlation, and information gain), using the criteria Precision, Recall, F1-score, Accuracy, ROC, PRC, FPR, and FNR.

First, we measure the independence between each feature and the class label with the χ2 filter method. The Chi-square values of all 57 features are shown in Table 6. We then choose the Top N features, according to these values, to build DNN prediction models. From the results in Table 7, the DNN model built with the Top 25 features performs very well. That is, we could use the Top 25 features to build a prediction model whose performance matches that of the model built with all 57 features.
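A sketch of this Top-N step with scikit-learn's chi2 scoring; the random stand-in data replace the real feature matrix, and ranking by ascending p-value is an assumption consistent with the values reported in Table 6 (smaller indicates stronger dependence).

```python
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 57))  # stand-in for the 57 features
y = rng.integers(0, 2, size=500)        # stand-in for the mortality label

scores, p_values = chi2(X, y)           # chi-square score per feature
top_25 = np.argsort(p_values)[:25]      # Top 25 features
X_top = X[:, top_25]                    # reduced matrix for the DNN
```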

Table 6.

The Chi-square values of all features.

No Feature chi No Feature chi
1 city 0.000000 30 chronic_disease_HIV 0.157299
2 province 0.000000 31 chronic_disease_Parkinson 0.157299
3 country 0.000000 32 anorexia 0.157299
4 age 0.000000 33 expectoration 0.157299
5 travel_history_location 0.000000 34 lesions on chest radiographs 0.157299
6 chronic_disease_binary 0.000000 35 hypertension 0.157299
7 chronic_disease_Hypertension 0.000000 36 cardiac disease 0.157299
8 sex 0.000000 37 hypoxia 0.157299
9 pneumonia 0.000000 38 chronic_disease_prostate 0.179712
10 respiratory distress 0.000000 39 chronic_disease_TB 0.317311
11 chronic_disease_Diabetes 0.000000 40 chronic_disease_cereberal 0.317311
12 septic shock 0.000013 41 conjunctivitis 0.317311
13 chronic_disease_kidney 0.000162 42 dizziness 0.317311
14 Heart attack 0.000311 43 emesis 0.317311
15 rhinorrhea 0.001565 44 eye irritation 0.317311
16 sore throat 0.004509 45 obnubilation 0.317311
17 kidney failure 0.004678 46 myelofibrosis 0.317311
18 chronic_disease_heart 0.008151 47 somnolence 0.317311
19 chronic_disease_cardiac 0.014306 48 cough 0.324756
20 dyspnea 0.014306 49 Myalgia 0.479500
21 gasp 0.014306 50 chronic_disease_hypothyroidism 0.563703
22 headache 0.019631 51 diarrhea 0.563703
23 chronic_disease_COPD 0.025347 52 sputum 0.563703
24 fever 0.048815 53 cold 0.563703
25 chronic_disease_asthma 0.058782 54 shortness of breath 0.654721
26 chest pain 0.058782 55 chronic_disease_cancer 1.000000
27 chronic_disease_bronchitis 0.083265 56 chronic_disease_dyslipidemia 1.000000
28 chills 0.102470 57 fatigue 1.000000
29 chronic_disease_Hepatitis 0.157299
Table 7.

The prediction performance of Top N features by χ2 method.

Top N Precision Recall F1-score Accuracy ROC PRC FPR FNR
5 0.8636 0.9745 0.9157 0.9103 0.9103 0.9254 0.1539 0.0255
10 0.8601 0.9699 0.9117 0.9061 0.9061 0.9225 0.1577 0.0301
15 0.8665 0.9647 0.9130 0.9081 0.9081 0.9244 0.1486 0.0353
20 0.8627 0.9814 0.9182 0.9126 0.9126 0.9267 0.1562 0.0186
25 0.8608 0.9815 0.9172 0.9114 0.9114 0.9258 0.1587 0.0185
30 0.8629 0.9732 0.9148 0.9093 0.9093 0.9248 0.1546 0.0268
35 0.8633 0.9784 0.9172 0.9117 0.9117 0.9262 0.1549 0.0216
40 0.8632 0.9872 0.9211 0.9154 0.9154 0.9284 0.1564 0.0128
45 0.8663 0.9689 0.9147 0.9097 0.9097 0.9254 0.1496 0.0311
50 0.8636 0.9775 0.9170 0.9116 0.9116 0.9262 0.1544 0.0225
54 0.8674 0.9607 0.9117 0.9069 0.9069 0.9239 0.1469 0.0393
57 0.8620 0.9862 0.9199 0.9141 0.9141 0.9275 0.1579 0.0138

Second, we compute the Pearson correlation coefficient values of all 57 features with the Pearson filter method, as shown in Table 8. We then choose the Top N features, according to these values, to build DNN prediction models. From the results in Table 9, only the DNN model built with the Top 55 features selected by the Pearson method performs well. That is, we would need the Top 55 features to build a prediction model whose performance matches that of the model built with all 57 features.

Table 8.

The Pearson correlation coefficient values of all features.

No Feature Pearson No Feature Pearson
1 country 0.502119 30 chronic_disease_hypothyroidism 0.0053
2 age 0.126401 31 diarrhea 0.0053
3 sex 0.114197 32 cold 0.0053
4 chronic_disease_binary 0.089623 33 fatigue 0.0000
5 chronic_disease_Hypertension 0.077358 34 chronic_disease_dyslipidemia −0.0000
6 chronic_disease_Diabetes 0.059286 35 chronic_disease_cancer −0.0000
7 chronic_disease_kidney 0.034424 36 shortness of breath −0.0041
8 rhinorrhea 0.028855 37 dizziness −0.0091
9 sore throat 0.025922 38 emesis −0.0091
10 chronic_disease_heart 0.024139 39 obnubilation −0.0091
11 chronic_disease_cardiac 0.022348 40 myelofibrosis −0.0091
12 headache 0.021291 41 somnolence −0.0091
13 chronic_disease_COPD 0.020400 42 anorexia −0.0129
14 fever 0.018040 43 expectoration −0.0129
15 chronic_disease_asthma 0.017242 44 hypertension −0.0129
16 chronic_disease_bronchitis 0.015800 45 cardiac disease −0.0129
17 chills 0.014898 46 hypoxia −0.0129
18 chronic_disease_Hepatitis 0.012900 47 chest pain −0.0172
19 chronic_disease_HIV 0.012900 48 dyspnea −0.0223
20 chronic_disease_Parkinson 0.012900 49 gasp −0.0223
21 lesions on chest radiographs 0.012900 50 kidney failure −0.0258
22 chronic_disease_prostate 0.012240 51 Heart attack −0.0329
23 chronic_disease_cereberal 0.009123 52 septic shock −0.0398
24 chronic_disease_TB 0.009121 53 city −0.0456
25 conjunctivitis 0.009121 54 respiratory distress −0.0650
26 eye irritation 0.009121 55 pneumonia −0.0716
27 cough 0.009007 56 province −0.1025
28 Myalgia 0.006452 57 travel_history_location −0.1265
29 sputum 0.005267
Table 9.

The prediction performance of Top N features by Pearson method.

Top N Precision Recall F1-score Accuracy ROC PRC FPR FNR
5 0.9370 0.7975 0.8617 0.8720 0.8720 0.9179 0.0536 0.2025
10 0.8888 0.8085 0.8467 0.8537 0.8537 0.8965 0.1012 0.1915
15 0.9306 0.8098 0.8660 0.8747 0.8747 0.9178 0.0604 0.1902
20 0.9303 0.8090 0.8654 0.8742 0.8742 0.9174 0.0606 0.1910
25 0.9307 0.8108 0.8666 0.8752 0.8752 0.9180 0.0604 0.1892
30 0.9299 0.8116 0.8667 0.8752 0.8752 0.9178 0.0612 0.1884
35 0.9292 0.8125 0.8669 0.8753 0.8753 0.9177 0.0619 0.1875
40 0.9292 0.8103 0.8657 0.8743 0.8743 0.9172 0.0617 0.1897
45 0.9296 0.8105 0.8660 0.8745 0.8745 0.9174 0.0614 0.1895
50 0.9290 0.8118 0.8665 0.8749 0.8749 0.9174 0.0621 0.1882
55 0.8661 0.9742 0.9170 0.9118 0.9118 0.9266 0.1506 0.0258
57 0.8620 0.9862 0.9199 0.9141 0.9141 0.9275 0.1579 0.0138

Finally, the information gain values of all 57 features, calculated by the information gain filter method, are shown in Table 10. We then choose the Top N features, according to these values, to build DNN prediction models. From the results in Table 11, the DNN model built with only the Top 5 features performs very well. That is, we can use the Top 5 features to build a prediction model whose performance matches that of the model built with all 57 features, saving redundant computation cost.

Table 10.

The information gain (info) values of all features.

No Feature Info No Feature Info
1 city 0.411180 30 cardiac disease 0.0001
2 province 0.409138 31 chronic_disease_Hepatitis 0.0001
3 age 0.326288 32 chronic_disease_HIV 0.0001
4 country 0.280728 33 chronic_disease_Parkinson 0.0001
5 travel_history_location 0.017563 34 expectoration 0.0001
6 sex 0.006536 35 hypertension 0.0001
7 chronic_disease_binary 0.004766 36 hypoxia 0.0001
8 chronic_disease_Hypertension 0.003733 37 lesions on chest radiographs 0.0001
9 pneumonia 0.003241 38 chronic_disease_prostate 0.0001
10 respiratory distress 0.002587 39 chronic_disease_TB 0.0001
11 chronic_disease_Diabetes 0.002259 40 conjunctivitis 0.0001
12 septic shock 0.001097 41 dizziness 0.0001
13 Heart attack 0.000750 42 emesis 0.0001
14 chronic_disease_kidney 0.000718 43 eye irritation 0.0001
15 rhinorrhea 0.000577 44 myelofibrosis 0.0001
16 kidney failure 0.000462 45 obnubilation 0.0001
17 chronic_disease_heart 0.000404 46 somnolence 0.0001
18 sore throat 0.000375 47 chronic_disease_cereberal 0.0000
19 chronic_disease_cardiac 0.000346 48 cough 0.0000
20 dyspnea 0.000346 49 Myalgia 0.0000
21 gasp 0.000346 50 chronic_disease_hypothyroidism 0.0000
22 chronic_disease_COPD 0.000288 51 cold 0.0000
23 headache 0.000258 52 diarrhea 0.0000
24 chronic_disease_bronchitis 0.000173 53 sputum 0.0000
25 chest pain 0.000165 54 shortness of breath 0.0000
26 chronic_disease_asthma 0.000165 55 chronic_disease_cancer 0.0000
27 fever 0.000164 56 chronic_disease_dyslipidemia 0.0000
28 chills 0.000121 57 fatigue 0.0000
29 anorexia 0.000115
Table 11.

The prediction performance of Top N features by information gain method.

Top N Precision Recall F1-score Accuracy ROC PRC FPR FNR
5 0.8586 0.9822 0.9163 0.9102 0.9102 0.9249 0.1617 0.0178
10 0.8495 0.9852 0.9123 0.9053 0.9053 0.9210 0.1745 0.0148
15 0.8506 0.9855 0.9131 0.9062 0.9062 0.9217 0.1730 0.0145
20 0.8525 0.9875 0.9150 0.9083 0.9083 0.9231 0.1709 0.0125
25 0.8545 0.9814 0.9136 0.9072 0.9072 0.9226 0.1671 0.0186
30 0.8512 0.9920 0.9162 0.9093 0.9093 0.9236 0.1734 0.0080
35 0.8516 0.9920 0.9165 0.9096 0.9096 0.9238 0.1729 0.0080
40 0.8506 0.9925 0.9161 0.9091 0.9091 0.9234 0.1744 0.0075
45 0.8520 0.9925 0.9169 0.9101 0.9101 0.9241 0.1724 0.0075
50 0.8514 0.9940 0.9172 0.9102 0.9102 0.9242 0.1735 0.0060
54 0.8506 0.9935 0.9165 0.9095 0.9095 0.9237 0.1745 0.0065
57 0.8620 0.9862 0.9199 0.9141 0.9141 0.9275 0.1579 0.0138

From the above discussion, the information gain filter method performs best. Therefore, we consider information gain a better way to select features for building the DNN prediction model.

4.3.2. Feature wrapper methods (DT, LR, and RF)

We compare the prediction performance of DNN models built from the features ranked by the three feature wrapper methods (DT, LR, and RF), using the criteria Precision, Recall, F1-score, Accuracy, ROC, PRC, FPR, and FNR.

First, we rank the 57 features with the DT wrapper method and build DNN models with the Top N features according to this ranking. The resulting prediction performance is shown in Table 12. From these results, the Top 5 features selected by the DT wrapper already perform well. That is, we could use the Top 5 features to build a DNN prediction model whose performance matches that of the model built with all 57 features.

Table 12.

The prediction performance of Top N features by wrapper method (DT).

Top N Precision Recall F1-score Accuracy ROC PRC FPR FNR
5 0.8611 0.9819 0.9175 0.9117 0.9117 0.9260 0.1584 0.0181
10 0.8644 0.9755 0.9166 0.9112 0.9112 0.9261 0.1531 0.0245
15 0.8630 0.9854 0.9201 0.9145 0.9145 0.9278 0.1564 0.0146
20 0.8603 0.9870 0.9193 0.9134 0.9134 0.9269 0.1602 0.0130
25 0.8643 0.9789 0.9180 0.9126 0.9126 0.9268 0.1537 0.0211
30 0.8599 0.9879 0.9195 0.9135 0.9135 0.9269 0.1609 0.0121
35 0.8620 0.9875 0.9205 0.9147 0.9147 0.9279 0.1581 0.0125
40 0.8614 0.9832 0.9183 0.9125 0.9125 0.9265 0.1582 0.0168
45 0.8644 0.9789 0.9181 0.9126 0.9126 0.9269 0.1536 0.0211
50 0.8761 0.9516 0.9123 0.9085 0.9085 0.9259 0.1346 0.0484

Second, we rank the 57 features with the LR wrapper method and build DNN models with the Top N features according to this ranking. The resulting prediction performance is shown in Table 13. From these results, only the Top 50 features selected by the LR wrapper perform well. That is, we would need the Top 50 features to build a DNN prediction model whose performance matches that of the model built with all 57 features.

Table 13.

The prediction performance of Top N features by wrapper method (LR).

Top N Precision Recall F1-score Accuracy ROC PRC FPR FNR
5 0.4962 0.4013 0.4437 0.4969 0.4969 0.5984 0.4075 0.5987
10 0.5017 0.5050 0.5033 0.5017 0.5017 0.6271 0.5017 0.4950
15 0.5037 0.4065 0.4499 0.5030 0.5030 0.6035 0.4005 0.5935
20 0.4965 0.5953 0.5415 0.4958 0.4958 0.6471 0.6037 0.4047
25 0.5577 0.6276 0.5906 0.5649 0.5649 0.6857 0.4978 0.3724
30 0.5566 0.6303 0.5911 0.5641 0.5641 0.6859 0.5022 0.3697
35 0.5573 0.6303 0.5916 0.5648 0.5648 0.6862 0.5007 0.3697
40 0.9303 0.7413 0.8251 0.8428 0.8428 0.9004 0.0556 0.2587
45 0.9286 0.7556 0.8332 0.8488 0.8488 0.9032 0.0581 0.2444
50 0.9305 0.8128 0.8677 0.8760 0.8760 0.9184 0.0607 0.1872

Finally, we rank the 57 features with the RF wrapper method and build DNN models with the Top N features according to this ranking. The resulting prediction performance is shown in Table 14. From these results, the Top 10 features selected by the RF wrapper perform well. That is, we could use the Top 10 features to build a DNN prediction model whose performance matches that of the model built with all 57 features.

Table 14.

The prediction performance of Top N features by wrapper method (RF).

Top N Precision Recall F1-score Accuracy ROC PRC FPR FNR
5 0.8694 0.9634 0.9140 0.9093 0.9093 0.9255 0.1448 0.0366
10 0.8629 0.9809 0.9181 0.9125 0.9125 0.9266 0.1559 0.0191
15 0.8695 0.9679 0.9161 0.9113 0.9113 0.9267 0.1453 0.0321
20 0.8722 0.9686 0.9178 0.9133 0.9133 0.9282 0.1419 0.0314
25 0.8590 0.9832 0.9169 0.9109 0.9109 0.9253 0.1614 0.0168
30 0.8661 0.9772 0.9183 0.9131 0.9131 0.9273 0.1511 0.0228
35 0.8632 0.9792 0.9175 0.9120 0.9120 0.9264 0.1552 0.0208
40 0.8654 0.9699 0.9146 0.9095 0.9095 0.9251 0.1509 0.0301
45 0.8607 0.9857 0.9189 0.9131 0.9131 0.9268 0.1596 0.0143
50 0.8611 0.9725 0.9134 0.9078 0.9078 0.9237 0.1569 0.0275

From the above discussion, the DT wrapper method performs best among the wrapper methods. Therefore, we consider DT a better way to select features for building the DNN prediction model.

4.4. Performance of prediction models (ANN and DNN) in different countries

There are 12,020 instances in the COVID-19 dataset. We first cluster the instances according to the country attribute and select the five countries with more than 100 instances: China (139), Ethiopia (113), India (7,309), Philippines (4,058), and Singapore (111). We then compare the prediction performance of ANN and DNN models built from the instances of each country, using the criteria Precision, Recall, F1-score, Accuracy, ROC, PRC, FPR, and FNR.
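A sketch of this country-based clustering step in pandas; the file name is hypothetical, and the `country` column follows the feature list in Table 6.

```python
import pandas as pd

df = pd.read_csv("covid19_labeled.csv")  # hypothetical file name

# Keep only countries with more than 100 instances, then split by country.
counts = df["country"].value_counts()
large_countries = counts[counts > 100].index
sub_datasets = {c: df[df["country"] == c] for c in large_countries}
for country, sub in sub_datasets.items():
    print(country, len(sub))  # e.g., India 7309, Philippines 4058, ...
```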

From the results in Table 15, the DNN model outperforms the ANN (MLP) on Precision, F1-score, and Accuracy for China (139), India (7,309), and the Philippines (4,058), and performs the same as the ANN model in the other two countries, Ethiopia (113) and Singapore (111). We therefore consider the DNN a good method for building COVID-19 prediction models for predicting mortality risk in patients.

Table 15.

The prediction performance of ANN and DNN (Country-based instances).

Country Method Precision Recall F1-score Accuracy ROC PRC FPR FNR
China (139) DNN 0.9084 0.9835 0.9444 0.8995 0.6584 0.9531 0.6667 0.0165
China (139) ANN 0.8705 1.0000 0.9308 0.8702 0.5000 0.9353 1.0000 0.0000
Ethiopia (113) DNN 0.9646 1.0000 0.9820 0.9649 0.5000 0.9823 1.0000 0.0000
Ethiopia (113) ANN 0.9646 1.0000 0.9820 0.9649 0.5000 0.9823 1.0000 0.0000
India (7309) DNN 0.7021 0.9739 0.8159 0.9033 0.9286 0.8409 0.1167 0.0261
India (7309) ANN 0.6983 0.8285 0.7578 0.8834 0.8637 0.7822 0.1011 0.1715
Philippines (4058) DNN 0.9423 0.9932 0.9671 0.9367 0.5506 0.9709 0.8919 0.0068
Philippines (4058) ANN 0.9362 1.0000 0.9670 0.9362 0.5000 0.9681 1.0000 0.0000
Singapore (111) DNN 0.9640 1.0000 0.9817 0.9640 0.5000 0.9820 1.0000 0.0000
Singapore (111) ANN 0.9640 1.0000 0.9817 0.9640 0.5000 0.9820 1.0000 0.0000

Furthermore, we calculate the information gain values of all 57 features with the information gain filter method and choose the Top N features to build prediction models (ANN and DNN) for the two largest countries (India and the Philippines). We select the top 37 features from the India dataset and the top 27 features from the Philippines dataset to investigate the prediction performance.

From the results in Table 16 and Table 17, the models (ANN and DNN) built with the Top 10 features of the India dataset perform very well; the Top 10 features suffice to build a prediction model whose performance matches that of the model built with all selected features. Likewise, from the results in Table 18 and Table 19, the models built with the Top 5 features of the Philippines dataset perform very well, so the Top 5 features suffice to build a prediction model with high prediction performance.

Table 16.

The prediction performance of ANN with Top N features (India).

Top N Precision Recall F1-score Accuracy ROC PRC FPR FNR
5 0.6999 0.7582 0.7279 0.8752 0.8332 0.7557 0.0918 0.2418
10 0.6799 0.8173 0.7423 0.8751 0.8543 0.7687 0.1086 0.1827
15 0.6995 0.7551 0.7262 0.8747 0.8318 0.7543 0.0916 0.2449
20 0.6838 0.8291 0.7494 0.8780 0.8604 0.7752 0.1082 0.1709
25 0.6806 0.8198 0.7437 0.8756 0.8556 0.7700 0.1086 0.1802
30 0.6951 0.7837 0.7368 0.8767 0.8433 0.7632 0.0970 0.2163
35 0.6949 0.8707 0.7730 0.8874 0.8814 0.7971 0.1079 0.1293
37 0.6900 0.8745 0.7714 0.8859 0.8818 0.7961 0.1109 0.1255

Table 17.

The prediction performance of DNN with Top N features (India).

Top N Precision Recall F1-score Accuracy ROC PRC FPR FNR
5 0.6940 0.9316 0.7954 0.8945 0.9078 0.8203 0.1160 0.0684
10 0.6949 0.9739 0.8111 0.9001 0.9266 0.8373 0.1207 0.0261
15 0.6871 0.9633 0.8021 0.8953 0.9197 0.8292 0.1239 0.0367
20 0.6957 0.9764 0.8125 0.9008 0.9279 0.8387 0.1205 0.0236
25 0.6950 0.9671 0.8087 0.8993 0.9236 0.8346 0.1198 0.0329
30 0.6944 0.9789 0.8125 0.9005 0.9286 0.8390 0.1216 0.0211
35 0.6918 0.9739 0.8090 0.8988 0.9257 0.8357 0.1225 0.0261
37 0.6869 0.9764 0.8065 0.8968 0.9254 0.8343 0.1256 0.0236

Table 18.

The prediction performance of ANN with Top N features (Philippines).

Top N Precision Recall F1-score Accuracy ROC PRC FPR FNR
5 0.9362 1.0000 0.9670 0.9362 0.5000 0.9681 1.0000 0.0000
10 0.9362 1.0000 0.9670 0.9362 0.5000 0.9681 1.0000 0.0000
15 0.9362 1.0000 0.9670 0.9362 0.5000 0.9681 1.0000 0.0000
20 0.9362 1.0000 0.9670 0.9362 0.5000 0.9681 1.0000 0.0000
25 0.9362 1.0000 0.9670 0.9362 0.5000 0.9681 1.0000 0.0000
27 0.9362 1.0000 0.9670 0.9362 0.5000 0.9681 1.0000 0.0000

Table 19.

The prediction performance of DNN with Top N features (Philippines).

Top N Precision Recall F1-score Accuracy ROC PRC FPR FNR
5 0.9394 0.9958 0.9668 0.9359 0.5269 0.9696 0.9421 0.0042
10 0.9409 0.9926 0.9661 0.9347 0.5388 0.9702 0.9151 0.0074
15 0.9398 0.9942 0.9662 0.9349 0.5299 0.9697 0.9344 0.0058
20 0.9397 0.9929 0.9656 0.9337 0.5293 0.9696 0.9344 0.0071
25 0.9393 0.9939 0.9659 0.9342 0.5259 0.9695 0.9421 0.0061
27 0.9399 0.9955 0.9669 0.9362 0.5306 0.9698 0.9344 0.0045

From the above experimental results, the proposed approach, which integrates a deep learning method (DNN) with hybrid methods (feature selection and instance clustering), performs very well in predicting mortality risk in patients with COVID-19 using fewer features.

4.5. Summary of experimental results

To investigate the differences in prediction performance, we first build a DNN model with the COVID-19 dataset provided by Pourhomayoun and Shakibi (2021). The built DNN model outperforms their ANN model on Recall, F1-score, Accuracy, ROC, and PRC.

In addition, we investigate the impact of the important features using the three filter methods (χ2, Pearson correlation, and information gain). From the experimental results, the information gain filter method performs best: the top 5 features it selects build a DNN prediction model that performs as well as the original DNN built with all 57 features. Therefore, we consider information gain a better way to select features for building the DNN prediction model.

Furthermore, we investigate the impact of the important features using the three wrapper methods (DT, LR, and RF). From the experimental results, the DT wrapper method performs best among the three: the top 5 features it selects build a DNN prediction model that performs as well as the original DNN built with all 57 features.

Finally, we study the difference in prediction performance between the ANN (MLP) and the DNN across five countries: China (139), Ethiopia (113), India (7,309), Philippines (4,058), and Singapore (111). From the experimental results, the DNN model outperforms the ANN (MLP) on Precision, F1-score, and Accuracy for China, India, and the Philippines. That is, the DNN appears to be the better method for predicting mortality risk in patients with COVID-19. Moreover, the proposed hybrid approach (instance clustering, feature selection, and deep learning) performs very well in predicting mortality risk in patients with COVID-19 using fewer features.

5. Conclusion

In this study, we integrate deep learning with hybrid approaches (instance clustering and feature selection) to build models for predicting mortality risk in patients with COVID-19.

The experimental results show that the proposed feature-based DNN model, with Recall of 98.62%, F1-score of 91.99%, Accuracy of 91.41%, and FNR of 1.38%, outperforms the original prediction model (ANN). Furthermore, the proposed approach uses only the Top 5 features to build a DNN prediction model that performs as well as the model built with all 57 features. We find that the information gain filter method performs better than the other two filter methods (χ2 and Pearson correlation). Therefore, the proposed framework, the feature-based DNN approach, can use fewer features to build a prediction model with higher prediction performance for predicting mortality risk in patients with COVID-19.

The weaknesses and limitations of the proposed model are as follows. First, we use only one dataset to evaluate the proposed model. Second, we do not evaluate the performance of integrating the feature filter or wrapper methods with other machine learning algorithms, including support vector machine, artificial neural networks, random forest, decision tree, logistic regression, and k-nearest neighbor.

Several issues remain to be addressed in the future. First, other deep learning techniques could be considered. Second, it would be interesting to apply other data normalization methods to build more accurate prediction models. Finally, exploring other feature selection methods to increase prediction performance remains an interesting issue. Incorporating the above issues would be a good new research direction.

CRediT authorship contribution statement

Thing-Yuan Chang: Conceptualization, Supervision. Cheng-Kui Huang: Methodology, Writing – review & editing. Cheng-Hsiung Weng: Conceptualization, Methodology, Writing – review & editing, Software. Jing-Yuan Chen: Writing – original draft, Software.

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

Data availability

Data will be made available on request.
