IEEE Sensors Journal. 2020 Sep 22;22(18):17573–17582. doi: 10.1109/JSEN.2020.3025855

A Seven-Layer Convolutional Neural Network for Chest CT-Based COVID-19 Diagnosis Using Stochastic Pooling

Yudong Zhang 1, Suresh Chandra Satapathy 2, Li-Yao Zhu 3, Juan Manuel Górriz 4, Shuihua Wang 5
PMCID: PMC9564037  PMID: 36346095

Abstract

(Aim) The COVID-19 pandemic has caused numerous deaths to date. Chest CT is an effective imaging sensor system for making an accurate diagnosis. (Method) This article proposes a novel seven-layer convolutional neural network-based smart diagnosis model for COVID-19 (7L-CNN-CD). We propose a 14-way data augmentation to enhance the training set, and introduce stochastic pooling to replace traditional pooling methods. (Results) Ten runs of 10-fold cross-validation show that our 7L-CNN-CD approach achieves a sensitivity of 94.44±0.73%, a specificity of 93.63±1.60%, and an accuracy of 94.03±0.80%. (Conclusion) Our proposed 7L-CNN-CD is effective in diagnosing COVID-19 in chest CT images. It performs better than several state-of-the-art algorithms, and the data augmentation and stochastic pooling methods are proven effective.

Keywords: Convolutional neural network, data augmentation, deep learning, stochastic pooling, COVID-19

I. Introduction

COVID-19, the disease caused by the SARS-CoV-2 coronavirus, was declared a Public Health Emergency of International Concern on 30 January 2020, and a pandemic on 11 March 2020.

As of 2 September 2020, the COVID-19 pandemic had caused 25.8 million confirmed cases and 858.2 thousand deaths (US 187.4k deaths, Brazil 122.5k, India 66.3k, Mexico 65.2k, UK 41.5k, etc.).

The global economy has also suffered from COVID-19. For example, Balsalobre-Lorente et al. [1] analyzed the consequences of COVID-19 for the social isolation of the Chinese economy. Chaudhary et al. [2] presented reflections for policy and programme on the effect of COVID-19 on the economy in India.

Two prevailing diagnosis methods are available. One is viral testing via a nasopharyngeal swab for the presence of viral RNA fragments [3]. The other is imaging, among which chest computed tomography (CCT) [4] provides one of the highest sensitivities. CCT uses an X-ray generator and X-ray sensors that rotate around the subject.

The main biomarkers in CCT differentiating COVID-19 patients from healthy people are asymmetric peripheral ground-glass opacities (GGOs) without pleural effusions [5]. This study collects CCT slices showing such findings.

However, manual interpretation by radiologists is tedious and easily influenced by inter-expert and intra-expert factors (such as fatigue and emotion). Smart diagnosis systems based on computer vision and artificial intelligence can benefit patients, radiologists, and hospitals. Traditional artificial intelligence (AI) and modern deep learning (DL) methods have achieved excellent results in analyzing medical images. For example, Lu [6] proposed a radial-basis-function neural network (RBFNN) to detect pathological brains. Yang [7] presented a kernel-based extreme learning classifier (K-ELM) to create a novel pathological brain detection system; their method was robust and effective. Lu [8] proposed an extreme learning machine trained by the bat algorithm (ELM-BA). Li and Liu [9] introduced real-coded biogeography-based optimization (RCBBO) to detect diseased brains. Jiang [10] used a six-layer convolutional neural network (6L-CNN) to recognize sign-language fingerspelling. Szegedy et al. [11] presented GoogLeNet. Yu and Wang [12] suggested ResNet18 for mammogram abnormality detection. Furthermore, some smart health systems have succeeded in emotion-aware security [13], authentication [14], and IoT [15].

We propose a novel seven-layer convolutional neural network for COVID-19 diagnosis (7L-CNN-CD). To improve its performance, two improvements are introduced in this study: (i) a 14-way data augmentation (DA-14) to enhance the training set; and (ii) stochastic pooling to replace traditional pooling methods.

II. Dataset

Image acquisition configuration and method: Philips Ingenuity 64-row spiral CT scanner; tube voltage 120 kV; tube current 240 mAs; slice thickness 3 mm; slice spacing 3 mm; pitch 1.5; lung window (W: 1500 HU, L: −500 HU); mediastinum window (W: 350 HU, L: 60 HU). Thin-layer reconstruction (slice thickness and spacing of 1 mm, lung window) was performed according to the lesion display. Patients were placed in a supine position, held their breath after deep inspiration, and were scanned conventionally from the lung apex to the costophrenic angle.

For each subject, 1–4 slices were chosen using a slice level selection (SLS) method: for COVID-19 pneumonia patients, the slice showing the largest size and number of lesions was selected; for normal subjects, any slice level could be selected. The resolution of all images is 1024 × 1024 × 3. Table I shows the demographics, where HC means healthy control.

TABLE I. Demographics of Subjects Used in This Study.

Group No. of subjects (m/f) No. of Images Age Range
COVID-19 142 (95/47) 320 22-91
HC 142 (88/54) 320 21-76

Each image was analyzed by two junior radiologists. When there were differences between the two analyses, a senior radiologist was consulted to reach a consensus. Suppose Y denotes a CCT image scan and l_k(Y) the labelling of the k-th expert; the final labelling L(Y) is obtained by

L(Y) = MV{l_1(Y), l_2(Y), l_3(Y)},

where MV denotes majority voting over the labels of all three experts.
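The consensus rule above amounts to a majority vote over the expert labels. A minimal sketch (the label strings and function name are our own, for illustration):

```python
from collections import Counter

def majority_vote(labels):
    """Return the label chosen by most experts.

    `labels` holds one label per expert: the two junior radiologists,
    plus the senior radiologist when the juniors disagree.
    """
    return Counter(labels).most_common(1)[0][0]

# Two juniors agree: consensus without the senior.
print(majority_vote(["COVID-19", "COVID-19"]))        # COVID-19
# Juniors disagree: the senior's opinion breaks the tie.
print(majority_vote(["COVID-19", "HC", "COVID-19"]))  # COVID-19
```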

III. Methodology

Table VIII lists the abbreviations and their full names for ease of reading of the methodology.

TABLE VIII. Abbreviation List.

Abbreviation Meaning
MV majority voting
SLS Slice level selection
HC Healthy control
CCT Chest computed tomography
DS downsampling
HS histogram stretching
CR compression ratio
DA Data augmentation
(A)(M)(S)(L2)P (Average) (Max) (Stochastic) (l2-norm) pooling
MCC Matthews correlation coefficient
FMI Fowlkes–Mallows index

A. Preprocessing

The original dataset, containing 320 COVID-19 images and 320 HC images, is denoted

Y^o = {y^o(i)}, i = 1, …, 640.

Figure 1(a) shows a raw COVID-19 CCT image, and Figure 1(b) shows the flowchart of our preprocessing procedure. First, we converted all color images to grayscale, reserving only the luminance information: a grayscale image need not be stored in three color channels, and feeding the original RGB images directly to the neural network would increase the computational burden. Thus, we get the grayscale image set Y^g as

y^g(i) = G(y^o(i)), i = 1, …, 640,

where G(·) denotes the grayscale operation.

Fig. 1. Preprocessing on the raw dataset.

Second, the histogram stretching (HS) method was used to increase each slice's contrast. For the i-th image y^g(i), we first calculate its minimum and maximum grayscale values:

g_min(i) = min_{x,y} y^g(i | x, y),  g_max(i) = max_{x,y} y^g(i | x, y),

where (x, y) are the coordinates of a pixel in image y^g(i). The new histogram-stretched image y^h(i) is obtained by

y^h(i | x, y) = [y^g(i | x, y) − g_min(i)] / [g_max(i) − g_min(i)].

In all, we get the histogram-stretched image set Y^h = {y^h(i)} as above.

Third, we crop the images to remove the texts in the margin areas and the checkup bed at the bottom. Thus, we get the cropped dataset Y^c as

y^c(i) = Crop(y^h(i) | top, bottom, left, right),

where Crop represents the crop operation and the four crop variables (top, bottom, left, right) give the number of pixels removed on each side. In this study all four values equal 150, so the size of each image is reduced from 1024 × 1024 to 724 × 724.

Fourth, we downsampled each image to the size of [256, 256], obtaining the resized image set Y^d:

y^d(i) = DS(y^c(i), [256, 256]),

where DS(·) is the downsampling function and y^d(i) is the downsampled version of y^c(i).

Table II compares the size and storage of each image (y^o, y^g, y^h, y^c, y^d) at every preprocessing step. After preprocessing, each image costs only about 2.08% of its original storage or size. The compression ratio (CR) of the i-th image, final state to original state, was calculated by

CR(i) = S[y^d(i)] / S[y^o(i)],

where S(·) denotes the storage of an image.

TABLE II. Image Size and Storage per Image at Each Preprocessing Step.

Preprocess Symbol Size (per image) Storage (per image)
Original y^o 1024 × 1024 × 3 12,582,912
Grayscale y^g 1024 × 1024 4,194,304
HS y^h 1024 × 1024 4,194,304
Crop y^c 724 × 724 2,096,704
DS y^d 256 × 256 262,144

As Table II shows, the storage CR equals the size CR for each image. Figure 2 shows two samples from the preprocessed dataset Y^d.
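The four preprocessing steps can be sketched end to end. A minimal numpy version, where the function names are our own and nearest-neighbour downsampling is an assumption (the article does not state its interpolation method):

```python
import numpy as np

def grayscale(rgb):
    # Keep only the luminance (ITU-R BT.601 weights).
    return rgb @ np.array([0.299, 0.587, 0.114])

def histogram_stretch(img):
    # Linearly stretch intensities to the full [0, 1] range.
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo)

def crop(img, margin=150):
    # Remove `margin` pixels from top, bottom, left, and right.
    return img[margin:-margin, margin:-margin]

def downsample(img, size=256):
    # Nearest-neighbour resampling keeps the sketch dependency-free.
    r = np.linspace(0, img.shape[0] - 1, size).round().astype(int)
    c = np.linspace(0, img.shape[1] - 1, size).round().astype(int)
    return img[np.ix_(r, c)]

rng = np.random.default_rng(0)
raw = rng.random((1024, 1024, 3))   # stand-in for one raw CCT slice
out = downsample(crop(histogram_stretch(grayscale(raw))))
print(out.shape)                    # (256, 256)
print(1024 - 2 * 150)               # cropped side length: 724
```

The final 256 × 256 image holds 256² / (1024² × 3) ≈ 2.08% of the original elements, matching the compression ratio quoted above.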

Fig. 2. Two samples of the preprocessed dataset Y^d.

B. Improvement I: Data Augmentation

Generally, the CCT image set faces the small-size dataset (SSD) and lack-of-generalization (LoG) problems. To break the curse of SSD and LoG, four types of solutions are possible: (i) data generation (DG); (ii) regularization approaches (RA); (iii) ensemble approaches (EA); and (iv) data augmentation (DA). All four are effective in handling the SSD and LoG problems.

We propose a 14-way DA method, as shown in Figure 3. We use the 10-fold cross-validation technique: the preprocessed CCT image set Y^d is split into ten folds, nine of which form the training set T, while the remaining fold forms the test set S, so that

|T| : |S| = 9 : 1,

where |·| denotes the cardinality of a set. For ease of reading, we omit the run index in the following text and assume the training set T contains N_t images. For each image y in T, we define the 14 DA operations below.

Fig. 3. Illustration of our DA-14.

1). Rotation:

The rotation angle θ varies from −30° to 30° in steps of 2°, skipping 0° since it corresponds to the original image y. The rotation factor vector is therefore

θ ∈ {−30°, −28°, …, −2°, 2°, …, 30°},

which yields 30 rotated images per input.

2). Scaling:

All training CCT images were scaled with a scaling factor s, whose values vary from 0.7 to 1.3 in steps of 0.02, skipping 1:

s ∈ {0.70, 0.72, …, 0.98, 1.02, …, 1.30},

which yields 30 scaled images per input.

3). Noise Injection (NI):

Zero-mean Gaussian noise with variance 0.01 was added to every CCT training image 30 times, producing 30 noised images per input. The values 0 and 0.01 are the default mean and variance of Gaussian noise, respectively.

4). Random Translation (RT):

Each CCT image y was translated 30 times with a random horizontal shift Δx and a random vertical shift Δy, where Δx and Δy take values in the range [−15, 15] pixels and obey a uniform distribution U(−15, 15).

5). Gamma Correction (GC):

GC helps adjust the contrast of the original image [16]. The GC factor g varies from 0.4 to 1.6 in steps of 0.04, skipping 1:

g ∈ {0.40, 0.44, …, 0.96, 1.04, …, 1.60},

which yields 30 gamma-corrected images per input.

6). Horizontal Shear Transform (HST):

We generate 30 horizontal shear transform (HST) images, with HST factor values assigned from −0.15 to 0.15 in steps of 0.01, skipping 0:

h ∈ {−0.15, −0.14, …, −0.01, 0.01, …, 0.15}.

7). Vertical Shear Transform (VST):

Similarly, we generate 30 vertical shear transform (VST) images; the values of the VST factor vector are the same as those of the HST factor vector.

8). Mirror:

The original image y is mirrored and we obtain a new image y^m. Suppose M(·) is the mirror function; we have

y^m = M(y).

Applying M to each of the first seven DA results defines the remaining DA operations.

9). Concatenation:

All the first seven DA results are concatenated:

y_DA(1–7) = concat[DA_1(y), …, DA_7(y)],

where concat means concatenation. The size of y_DA(1–7) is 7 × 30 = 210 images. The results of DA techniques 8–14 are then obtained by mirroring:

y_DA(8–14) = M[y_DA(1–7)],

which contributes another 210 images. Finally, one original image y yields 1 + 14 × 30 = 421 images (containing itself) in the enhanced training set.
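The parameter grids above can be enumerated to check the bookkeeping. A pure-Python sketch, with the grid values taken from the text; the total per image is consistent with the DA training-set size in Table III (576 × 421 = 242,496):

```python
# Parameter grids for the seven base DA ways, taken from the text;
# each grid skips the identity value, leaving 30 settings per way.
rotation = [a for a in range(-30, 31, 2) if a != 0]                    # degrees
scaling  = [round(0.70 + 0.02 * k, 2) for k in range(31) if k != 15]   # skips 1.00
gamma    = [round(0.40 + 0.04 * k, 2) for k in range(31) if k != 15]   # skips 1.00
shear    = [round(-0.15 + 0.01 * k, 2) for k in range(31) if k != 15]  # skips 0.00
n_noise, n_translate = 30, 30        # 30 random draws for each of NI and RT

for grid in (rotation, scaling, gamma, shear):
    assert len(grid) == 30

# 7 base ways, plus the same 7 applied to the mirrored image = 14 ways.
per_image = 1 + 14 * 30              # the original plus 420 augmented copies
print(per_image)                     # 421
print(576 * per_image)               # 242496, the DA training size in Table III
```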

C. Improvement II: Stochastic Pooling

In a traditional CNN, the activation maps (AMs) are usually too large [17] (i.e., they contain too many features), which causes (i) overfitting of the training set and (ii) large computational costs. Thus, pooling layers (PLs) are frequently used to reduce the size of AMs. PLs also help provide invariance to translation. There are three commonly used pooling techniques: (i) l2-norm pooling (L2P); (ii) average pooling (AP); and (iii) max pooling (MP). Assume pooling is a function P that maps a region of the AM to a single value.

L2P calculates the l2 norm [18] of a given region R. Suppose R is a 2 × 2 region with elements r_1, r_2, r_3, r_4; the L2P output is defined as

P_L2(R) = sqrt[(1/4) × (r_1² + r_2² + r_3² + r_4²)].

In this study, we add the constant 1/4 under the square root to make L2P easier to compare with the other pooling methods; this constant does not influence training or inference.

AP [19] calculates the mean value of region R:

P_A(R) = (1/4) × (r_1 + r_2 + r_3 + r_4).

Finally, MP picks out the maximal value of region R:

P_M(R) = max(r_1, r_2, r_3, r_4).

Figure 4 showcases the differences between the pooling methods, where both the pooling size and the pooling stride equal 2. For the top-left region R_1, L2P, AP, and MP output the root mean square, the mean, and the maximum of its four elements, respectively.

Fig. 4. A toy example of four pooling techniques (L2P = l2-norm pooling; AP = average pooling; MP = max pooling; SP = stochastic pooling).

SP was invented to conquer the problems of the three aforementioned pooling methods. Neither L2P nor AP works well, since they consider all pixels in R and can thus dilute strong activations with surrounding near-zero pixels. MP avoids this obstruction, but it tends to overfit the training set and causes the LoG problem.

Instead of computing the l2 norm, average, or maximum, the output of SP is attained by sampling from a multinomial distribution [20] formed from the activations of the elements in region R [21]:

  • (1) Compute the probability p_k of each element r_k in R:
    p_k = r_k / Σ_j r_j.
  • (2) Select a location k* within R in accordance with the probabilities p, scanning R from top to bottom and left to right [22]:
    k* ~ Multinomial(p_1, …, p_4).
  • (3) The output is the value at location k*:
    P_S(R) = r_{k*}.

We use the first region R_1 in Figure 4 as an instance. Each element of R_1 is divided by the region's sum to obtain the probability map p; a position is then randomly selected in accordance with p, and its value becomes the SP output (6 in Figure 4). Instead of barely considering the maximum, or considering all elements of the region equally, SP randomly uses non-maximal activations within the region R.
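The four pooling rules can be compared on a single 2 × 2 region. A minimal numpy sketch; the example region [1, 2; 3, 4] is our own illustration, not the values of Figure 4:

```python
import numpy as np

def pool(region, mode, rng=None):
    """Pool one 2x2 region of non-negative activations."""
    v = np.asarray(region, dtype=float).ravel()
    if mode == "l2":                 # sqrt of the mean of squares (1/4 constant)
        return np.sqrt(np.mean(v ** 2))
    if mode == "avg":                # mean of the four elements
        return v.mean()
    if mode == "max":                # maximum of the four elements
        return v.max()
    if mode == "stochastic":         # sample one element, p proportional to value
        p = v / v.sum()
        return rng.choice(v, p=p)
    raise ValueError(mode)

region = [[1.0, 2.0], [3.0, 4.0]]
rng = np.random.default_rng(0)
print(pool(region, "l2"))    # sqrt((1 + 4 + 9 + 16) / 4) ≈ 2.7386
print(pool(region, "avg"))   # 2.5
print(pool(region, "max"))   # 4.0
print(pool(region, "stochastic", rng))  # one of 1..4, drawn with p = v / 10
```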

D. Measures and Indicators

We set a 10-fold cross-validation on the whole dataset Y^d. Each fold contains 32 COVID-19 images and 32 HC images. Within each trial, the training set contains 288 + 288 = 576 images, and the test set contains 32 + 32 = 64 images. After combining all 10 trials, the test set contains 640 images. The above 10-fold cross-validation is run 10 times, so the final report is based on 10 × 640 = 6,400 test images. Table III shows the split setting of our dataset.

TABLE III. Split Setting of our Dataset.

Set Percentage COVID-19 HC Total
Training T 90% 288 288 576
DA training 121,248 121,248 242,496
Test S 10% 32 32 64
Total 100% 320 320 640
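The fold bookkeeping above can be checked with a short pure-Python sketch; the interleaved fold assignment and the 0/1 label encoding are our own illustration (the study uses a random split):

```python
# Bookkeeping sketch of the split in Table III.
labels = [1] * 320 + [0] * 320                 # 1 = COVID-19, 0 = HC

# Ten disjoint folds of 64 images each (interleaved for illustration).
folds = [list(range(f, 640, 10)) for f in range(10)]
assert all(len(fold) == 64 for fold in folds)
assert all(sum(labels[i] for i in fold) == 32 for fold in folds)  # 32 + 32 per fold

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(640) if i not in held_out]
    assert len(train_idx) == 576 and len(test_idx) == 64

# One run's 10 test folds cover all 640 images exactly once; over
# 10 runs the final report is therefore based on 10 * 640 test images.
print(10 * 640)   # 6400
```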

The proposed seven-layer convolutional neural network for COVID-19 diagnosis (7L-CNN-CD) is tested by 10 runs of 10-fold cross-validation. The ideal confusion matrix C*(r, t) over the test set at the t-th trial of the r-th run is

C*(r, t) = [32, 0; 0, 32],

where the value 32 (the test row in Table III) is the number of COVID-19 cases and of HC cases in the test set. After running through trials 1–10, the ideal confusion matrix of one run of 10-fold CV becomes

C*(r) = [320, 0; 0, 320].

In realistic inference, we cannot obtain this perfect diagonal matrix with all off-diagonal elements zero. Suppose the confusion matrix at the r-th run is

C(r) = [TP(r), FN(r); FP(r), TN(r)],

where TP(r) + FN(r) = 320 and FP(r) + TN(r) = 320 in this study. TP and TN represent true positives and true negatives, respectively; the positive class (P) is COVID-19, and the negative class (N) is healthy control. FN and FP represent false negatives and false positives, respectively. We can define four simple measures as

Sen(r) = TP(r) / [TP(r) + FN(r)],
Spc(r) = TN(r) / [TN(r) + FP(r)],
Prc(r) = TP(r) / [TP(r) + FP(r)],
Acc(r) = [TP(r) + TN(r)] / [TP(r) + TN(r) + FP(r) + FN(r)].

Three advanced measures are defined below. The F1 score is

F1(r) = 2 × TP(r) / [2 × TP(r) + FP(r) + FN(r)].

The Matthews correlation coefficient (MCC) is defined as

MCC(r) = [TP(r) × TN(r) − FP(r) × FN(r)] / sqrt{[TP(r) + FP(r)] × [TP(r) + FN(r)] × [TN(r) + FP(r)] × [TN(r) + FN(r)]}.

The Fowlkes–Mallows index (FMI) is defined as

FMI(r) = sqrt{Prc(r) × Sen(r)}.

After combining the 10 runs (r = 1, …, 10), we calculate the mean and standard deviation (SD) of each of the seven measures across runs.
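The seven measures can be computed from a single confusion matrix. A short sketch, where the TP/FN/FP/TN values are back-derived from run 10 of the L2P rows in Table V as a consistency check (not reported data):

```python
import math

def measures(tp, fn, fp, tn):
    """The seven indicators used in this paper, from one run's confusion matrix."""
    sen = tp / (tp + fn)                       # sensitivity (recall)
    spc = tn / (tn + fp)                       # specificity
    prc = tp / (tp + fp)                       # precision
    acc = (tp + tn) / (tp + tn + fp + fn)      # accuracy
    f1  = 2 * tp / (2 * tp + fp + fn)          # F1 score
    mcc = (tp * tn - fp * fn) / math.sqrt(     # Matthews correlation coefficient
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    fmi = math.sqrt(prc * sen)                 # Fowlkes-Mallows index
    return sen, spc, prc, acc, f1, mcc, fmi

# Run 10 of L2P in Table V reports 93.75 for five indicators and 87.50 for
# MCC; on 320 + 320 test images that corresponds to TP = TN = 300, FN = FP = 20.
vals = measures(tp=300, fn=20, fp=20, tn=300)
print([round(100 * v, 2) for v in vals])
# [93.75, 93.75, 93.75, 93.75, 93.75, 87.5, 93.75]
```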

E. Proposed 7L-CNN-CD Algorithm

Figure 5 presents the structure of the proposed 7-layer CNN (7L-CNN). After training, the network used to diagnose COVID-19 is called 7L-CNN-CD. The size of the activation map is labelled at each cube in Figure 5. Table IV shows the pseudocode of our 7L-CNN-CD model, divided into two phases: (I) preprocessing and (II) 10 runs of 10-fold cross-validation.

Fig. 5. Structure of the proposed 7-layer CNN.

TABLE IV. Pseudocode of Our 7L-CNN-CD Model.

Input: original image set Y^o.
Ground truth: labels obtained from two junior and one senior radiologist. See Eq. (1.a).
Phase I: Preprocessing
  Grayscale: Y^g ← G(Y^o). See Eq. (3).
  Histogram stretching: Y^h ← HS(Y^g). See Eq. (5.a).
  Image crop: Y^c ← Crop(Y^h). See Eq. (6.a).
  Downsampling: Y^d ← DS(Y^c). See Eq. (7).
Phase II: 10 runs of 10-fold cross validation
  for r = 1:10                     % r is the run index
    Randomly split the preprocessed set Y^d into 10 folds;
    for t = 1:10                   % t is the trial index
      Step II.A: Training & test set
        Test set S(r, t) ← the t-th fold;
        Training set T(r, t) ← the other nine folds;
        Enhanced training set T_DA(r, t) ← DA[T(r, t)], see Eq. (27);
      Step II.B: Create initial CNN model
        Create an initial deep network via the 7L-CNN model;
        Use SP to replace all pooling layers in the 7L-CNN model, see Eq. (34);
      Step II.C: Train the 7L-CNN-CD model
        Train the 7L-CNN network using T_DA(r, t) and the ground-truth labels;
      Step II.D: Confusion-matrix performance
        Obtain test predictions on S(r, t);
        Obtain test performance by comparing predictions with ground truth;
    end
    Summarize all 10 trials and get C(r), see Eq. (38);
    Calculate the seven measures, see Eqs. (39.a)–(42);
  end
Output: mean and SD of the seven measures, see Eq. (43.a).

IV. Results and Discussion

A. Result of Data Augmentation

Let the image in Figure 2(a) be the input; Figure 6 shows the DA(1–7) results. Due to the page limit, the mirrored results DA(8–14) are not presented in this article, and we show only 15 of the 30 generated images per DA technique.

Fig. 6. Half of the DA(1–7) results.

Figure 6(a) presents the 15 rotated new images. Figures 6(b)–(e) present 15 scaled, 15 noise-injected, 15 randomly translated, and 15 gamma-corrected images, respectively. Figures 6(f)–(g) present the 15 HST and 15 VST new images, respectively.

B. SP Compared With Other Three Pooling Methods

The results of SP against the other three pooling methods are presented in Table V, which indicates that SP obtained the best sensitivity, accuracy, F1, MCC, and FMI. The definitions of the seven measures can be found in Eqs. (39.a)–(42).

TABLE V. Ten Runs of Different Pooling Methods.

L2P Sen Spc Prc Acc F1 MCC FMI
1 90.63 94.69 94.46 92.66 92.50 85.38 92.52
2 91.56 92.81 92.72 92.19 92.14 84.38 92.14
3 92.50 94.06 93.97 93.28 93.23 86.57 93.23
4 93.13 93.75 93.71 93.44 93.42 86.88 93.42
5 92.19 93.44 93.35 92.81 92.77 85.63 92.77
6 91.56 93.44 93.31 92.50 92.43 85.01 92.43
7 93.13 94.38 94.30 93.75 93.71 87.51 93.71
8 93.75 91.56 91.74 92.66 92.74 85.33 92.74
9 93.13 95.31 95.21 94.22 94.15 88.46 94.16
10 93.75 93.75 93.75 93.75 93.75 87.50 93.75
Mean±SD 92.53±1.04 93.72±1.04 93.65±0.96 93.13±0.66 93.08±0.67 86.27±1.31 93.09±0.66
AP Sen Spc Prc Acc F1 MCC FMI
1 91.25 94.38 94.19 92.81 92.70 85.67 92.71
2 91.88 94.06 93.93 92.97 92.89 85.96 92.90
3 92.50 92.19 92.21 92.34 92.36 84.69 92.36
4 92.81 94.69 94.59 93.75 93.69 87.52 93.70
5 92.81 95.00 94.89 93.91 93.84 87.83 93.84
6 91.25 92.50 92.41 91.88 91.82 83.76 91.83
7 92.50 92.50 92.50 92.50 92.50 85.00 92.50
8 93.44 95.31 95.22 94.38 94.32 88.77 94.33
9 92.81 94.38 94.29 93.59 93.54 87.20 93.55
10 95.63 94.38 94.44 95.00 95.03 90.01 95.03
Mean±SD 92.69±1.25 93.94±1.12 93.87±1.09 93.31±0.98 93.27±0.99 86.64±1.96 93.27±0.99
MP Sen Spc Prc Acc F1 MCC FMI
1 94.69 95.31 95.28 95.00 94.98 90.00 94.98
2 92.19 92.81 92.77 92.50 92.48 85.00 92.48
3 94.69 94.69 94.69 94.69 94.69 89.38 94.69
4 93.75 92.81 92.88 93.28 93.31 86.57 93.31
5 92.50 94.38 94.27 93.44 93.38 86.89 93.38
6 95.31 91.56 91.87 93.44 93.56 86.94 93.57
7 94.38 93.44 93.50 93.91 93.93 87.82 93.94
8 95.00 94.69 94.70 94.84 94.85 89.69 94.85
9 94.06 93.13 93.19 93.59 93.62 87.19 93.62
10 94.38 94.38 94.38 94.38 94.38 88.75 94.38
Mean±SD 94.09±1.03 93.72±1.15 93.75±1.08 93.91±0.80 93.92±0.80 87.82±1.60 93.92±0.80
SP (Ours) Sen Spc Prc Acc F1 MCC FMI
1 95.00 90.63 91.02 92.81 92.97 85.71 92.99
2 93.13 92.50 92.55 92.81 92.83 85.63 92.84
3 94.69 93.13 93.23 93.91 93.95 87.82 93.96
4 94.69 95.31 95.28 95.00 94.98 90.00 94.98
5 95.31 92.81 92.99 94.06 94.14 88.15 94.14
6 94.06 95.31 95.25 94.69 94.65 89.38 94.66
7 93.75 95.00 94.94 94.38 94.34 88.76 94.34
8 94.69 92.19 92.38 93.44 93.52 86.90 93.53
9 93.75 94.69 94.64 94.22 94.19 88.44 94.19
10 95.31 94.69 94.72 95.00 95.02 90.00 95.02
Mean±SD 94.44±0.73 93.63±1.60 93.70±1.47 94.03±0.80 94.06±0.76 88.08±1.59 94.06±0.76

For the specificity and precision indicators, AP achieved the best performance. Considering all indicators, SP wins on five out of seven. Hence, SP gives the best overall performance among the four pooling methods.

C. Effect of DA

We compared our 14-way DA (DA14) against not using DA (DA0) to explore the effect of the DA14 strategy. The comparison is presented in Table VI.

TABLE VI. Comparison of DA0 and DA14.

DA Sen Spc Prc Acc F1 MCC FMI
DA0 92.06±0.85 91.59±1.60 91.65±1.48 91.83±0.96 91.85±0.92 83.67±1.92 91.85±0.92
DA14 (Ours) 94.44±0.73 93.63±1.60 93.70±1.47 94.03±0.80 94.06±0.76 88.08±1.59 94.06±0.76

We can observe that training with DA14 provides significantly better performance than DA0 in terms of all seven indicators. Furthermore, the SDs of the DA14 results are slightly smaller than those of DA0.

D. Comparison to State-of-the-art Methods

Our 7L-CNN-CD method was compared with five state-of-the-art approaches: RBFNN [6], K-ELM [7], ELM-BA [8], GoogLeNet [11], and ResNet18 [12].

All performances were compared on the test set and are presented in Table VII. Omitting the SD information, the comparison plot is presented in Figure 7, with the measurement indicators ranging from sensitivity to FMI.

TABLE VII. Comparison to State-of-the-art Approaches.

Approach Sen Spc Prc Acc F1 MCC FMI
RBFNN[6] 67.08 74.48 72.52 70.78 69.64 41.74 69.64
K-ELM[7] 57.29 61.46 59.83 59.38 58.46 18.81 58.46
ELM-BA [8] 57.08±3.86 72.40±3.03 67.48±1.65 64.74±1.26 61.75±2.24 29.90±2.45 61.76±2.24
GoogLeNet [11] 76.88±3.92 83.96±2.29 82.84±1.58 80.42±1.40 79.65±1.92 61.10±2.62 79.65±1.91
ResNet18 [12] 78.96±2.90 89.48±1.64 88.30±1.50 84.22±1.23 83.31±1.53 68.89±2.33 83.32±1.53
7L-CNN-CD (Ours) 94.44±0.73 93.63±1.60 93.70±1.47 94.03±0.80 94.06±0.76 88.08±1.59 94.06±0.76

Fig. 7. Bar plot of the performances of six different methods.

V. Conclusion

In this COVID-19 diagnosis study, a novel 7L-CNN-CD was proposed, using a seven-layer standard convolutional neural network as the backbone and integrating data augmentation and stochastic pooling.

Experimental results showed that our 7L-CNN-CD algorithm obtained excellent test performance: a sensitivity of 94.44±0.73%, a specificity of 93.63±1.60%, a precision of 93.70±1.47%, an accuracy of 94.03±0.80%, an F1 of 94.06±0.76%, an MCC of 88.08±1.59%, and an FMI of 94.06±0.76%. These results are better than those of five state-of-the-art algorithms for COVID-19 diagnosis.

In future studies, we shall attempt to (i) test more advanced data augmentation techniques; (ii) collect more COVID-19 data to test our algorithm; and (iii) move our algorithm to a cloud computing platform to benefit radiologists.

Biographies


Yudong Zhang (Senior Member, IEEE) received the B.E. degree in information sciences and the M.Phil. degree in communication and information engineering from the Nanjing University of Aeronautics and Astronautics in 2004 and 2007, respectively, and the Ph.D. degree in signal and information processing from Southeast University in 2010. He serves as a Professor with the University of Leicester.


Suresh Chandra Satapathy (Senior Member, IEEE) is currently pursuing the Ph.D. degree in computer science engineering with the School of Computer Engineering, KIIT, Bhubaneshwar, India. He is working as a Professor with the School of Computer Engineering and the Dean Research with KIIT. He has developed two new optimization algorithms, social group optimization (SGO) published in Springer Journal and social evolution and learning algorithm (SELO) published in Elsevier.


Li-Yao Zhu received the bachelor’s degree in clinical medicine from Xuzhou Medical University in 1989. He has presided over five municipal scientific research projects, published more than 50 articles, including seven SCI articles, and co-edited a monograph. He has won three Municipal Scientific and Technological Progress Awards and three Municipal New Technology Introduction Awards.


Juan Manuel Górriz received the B.Sc. degree in physics and the B.Sc. degree in electronic engineering from the University of Granada, Spain, in 2000 and 2001, respectively, the Ph.D. degree from the University of Cádiz, Spain, in 2003, and the Ph.D. degree from the University of Granada in 2006. He is currently a Full Professor with the University of Granada.


Shuihua Wang (Senior Member, IEEE) received the bachelor’s degree in information sciences from Southeast University in 2008, the master’s degree in electrical engineering from the City College of New York in 2012, and the Ph.D. degree in electrical engineering from Nanjing University in 2017. She is working as a Research Associate with the University of Leicester, U.K.

Appendix A

(See Table VIII.)

Funding Statement

This work was supported in part by the Natural Science Foundation of China under Grant 61602250; in part by the Henan Key Research and Development Project under Grant 182102310629; in part by the Fundamental Research Funds for the Central Universities under Grant CDLS-2020-03; in part by the Key Laboratory of Child Development and Learning Science (Southeast University), Ministry of Education; in part by the Royal Society International Exchanges Cost Share Award, U.K., under Grant RP202G0230; in part by the Medical Research Council Confidence in Concept Award, U.K., under Grant MC_PC_17171; and in part by the Hope Foundation for Cancer Research, U.K., under Grant RM60G0680.

Contributor Information

Yudong Zhang, Email: yudongzhang@ieee.org.

Suresh Chandra Satapathy, Email: sureshsatapathy@ieee.org.

Li-Yao Zhu, Email: zhu_liyao@126.com.

Juan Manuel Górriz, Email: gorriz@ugr.es.

Shuihua Wang, Email: shuihuawang@ieee.org.

References

  • [1].Balsalobre-Lorente D., Driha O. M., Bekun F. V., Sinha A., and Adedoyin F. F., “Consequences of COVID-19 on the social isolation of the chinese economy: Accounting for the role of reduction in carbon emissions,” Air Qual., Atmos. Health, vol. 13, pp. 1–13, Aug. 2020. [Google Scholar]
  • [2].Chaudhary M., Sodani P. R., and Das S., “Effect of COVID-19 on economy in India: Some reflections for policy and programme,” J. Health Manage., vol. 22, no. 2, pp. 169–180, Jun. 2020. [Google Scholar]
  • [3].Campos G. S.et al. , “Ion torrent-based nasopharyngeal swab metatranscriptomics in COVID-19,” J. Virol. Methods, vol. 282, Aug. 2020, Art. no. 113888. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [4].Mahdavi A.et al. , “The role of repeat chest CT scan in the COVID-19 pandemic,” Academic Radiol., vol. 27, no. 7, pp. 1049–1050, Jul. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Li Y. and Xia L., “Coronavirus disease 2019 (COVID-19): Role of chest CT in diagnosis and management,” Amer. J. Roentgenol., vol. 214, no. 6, pp. 1280–1286, Jun. 2020. [DOI] [PubMed] [Google Scholar]
  • [6].Lu Z., Lu S., Liu G., Zhang Y., Yang J., and Phillips P., “A pathological brain detection system based on radial basis function neural network,” J. Med. Imag. Health Informat., vol. 6, no. 5, pp. 1218–1222, Sep. 2016. [Google Scholar]
  • [7].Lu S., Lu Z., Yang J., Yang M., and Wang S., “A pathological brain detection system based on kernel based ELM,” Multimedia Tools Appl., vol. 77, no. 3, pp. 3715–3728, Feb. 2018. [Google Scholar]
  • [8].Lu S.et al. , “A pathological brain detection system based on extreme learning machine optimized by bat algorithm,” CNS Neurol. Disorders—Drug Targets, vol. 16, no. 1, pp. 23–29, Jan. 2017. [DOI] [PubMed] [Google Scholar]
  • [9].Wang S.et al. , “Pathological brain detection via wavelet packet tsallis entropy and real-coded biogeography-based optimization,” Fundamenta Informaticae, vol. 151, nos. 1–4, pp. 275–291, Mar. 2017. [Google Scholar]
  • [10].Jiang X., “Chinese Sign Language Fingerspelling Recognition via Six-Layer Convolutional Neural Network with Leaky Rectified Linear Units for Therapy and Rehabilitation,” J. Med. Imag. Health Informat., vol. 9, pp. 2031–2038, 2019. [Google Scholar]
  • [11].Szegedy C.et al. , “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9. [Google Scholar]
  • [12].Yu X. and Wang S.-H., “Abnormality diagnosis in mammograms by transfer learning based on ResNet18,” Fundamenta Informaticae, vol. 168, nos. 2–4, pp. 219–230, Sep. 2019. [Google Scholar]
  • [13].Zhang Y., Qian Y., Wu D., Hossain M. S., Ghoneim A., and Chen M., “Emotion-aware multimedia systems security,” IEEE Trans. Multimedia, vol. 21, no. 3, pp. 617–624, Mar. 2019. [Google Scholar]
  • [14].Zhang Y., Gravina R., Lu H., Villari M., and Fortino G., “PEA: Parallel electrocardiogram-based authentication for smart healthcare systems,” J. Netw. Comput. Appl., vol. 117, pp. 10–16, Sep. 2018. [Google Scholar]
  • [15].Zhang Y., Ma X., Zhang J., Hossain M. S., Muhammad G., and Amin S. U., “Edge intelligence in the cognitive Internet of Things: Improving sensitivity and interactivity,” IEEE Netw., vol. 33, no. 3, pp. 58–64, May 2019. [Google Scholar]
  • [16].Veluchamy M. and Subramani B., “Fuzzy dissimilarity contextual intensity transformation with gamma correction for color image enhancement,” Multimedia Tools Appl., vol. 79, pp. 19945–19961, Apr. 2020. [Google Scholar]
  • [17].Górriz J. M., “Artificial intelligence within the interplay between natural and artificial computation: Advances in data science, trends and applications,” Neurocomputing, vol. 410, pp. 237–270, 2020. [Google Scholar]
  • [18].Rezaei M., Yang H., and Meinel C., “Deep Neural Network with l2-Norm Unit for Brain Lesions Detection,” in Proc. Int. Conf. Neural Inf. Process. (ICNIP), Cham, Switzerland, 2017, pp. 798–807. [Google Scholar]
  • [19].Ghosh A., Singh S., and Sheet D., “Simultaneous localization and classification of acute lymphoblastic leukemic cells in peripheral blood smears using a deep convolutional network with average pooling layer,” in Proc. IEEE Int. Conf. Ind. Inf. Syst. (ICIIS), Dec. 2017, pp. 529–534. [Google Scholar]
  • [20].Wang S.-H.et al. , “Multiple sclerosis identification by 14-layer convolutional neural network with batch normalization, dropout, and stochastic pooling,” Frontiers Neurosci., vol. 12, Nov. 2018. Art. no. 818. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [21].Jiang X., Lu M., and Wang S.-H., “An eight-layer convolutional neural network with stochastic pooling, batch normalization and dropout for fingerspelling recognition of chinese sign language,” Multimedia Tools Appl., vol. 79, nos. 21–22, pp. 15697–15715, Jun. 2020. [Google Scholar]
  • [22].Sun S., Hu B., Yu Z., and Song X., “A stochastic max pooling strategy for convolutional neural network trained by noisy samples,” Int. J. Comput. Commun. Control, vol. 15, no. 1, Feb. 2020, Art. no. 1007. [Google Scholar]

Articles from IEEE Sensors Journal are provided here courtesy of the Institute of Electrical and Electronics Engineers
