IEEE Sensors Journal. 2020 Sep 22;22(18):17573–17582. doi: 10.1109/JSEN.2020.3025855

A Seven-Layer Convolutional Neural Network for Chest CT-Based COVID-19 Diagnosis Using Stochastic Pooling

Yudong Zhang 1, Suresh Chandra Satapathy 2, Li-Yao Zhu 3, Juan Manuel Górriz 4, Shuihua Wang 5
PMCID: PMC9564037  PMID: 36346095

Abstract

(Aim) The COVID-19 pandemic has caused numerous deaths to date. Chest CT is an effective imaging sensor system for making an accurate diagnosis. (Method) This article proposes a novel seven-layer convolutional neural network-based smart diagnosis model for COVID-19 (7L-CNN-CD). We propose a 14-way data augmentation to enhance the training set, and introduce stochastic pooling to replace traditional pooling methods. (Results) Ten runs of 10-fold cross-validation show that our 7L-CNN-CD approach achieves a sensitivity of 94.44±0.73%, a specificity of 93.63±1.60%, and an accuracy of 94.03±0.80%. (Conclusion) Our proposed 7L-CNN-CD is effective in diagnosing COVID-19 in chest CT images. It performs better than several state-of-the-art algorithms, and the data augmentation and stochastic pooling methods are proven effective.

Keywords: Convolutional neural network, data augmentation, deep learning, stochastic pooling, COVID-19

I. Introduction

COVID-19, the disease caused by the SARS-CoV-2 coronavirus, was declared a Public Health Emergency of International Concern on 30 January 2020, and a pandemic on 11 March 2020.

As of 2 September 2020, the COVID-19 pandemic had caused 25.8 million confirmed cases and 858.2 thousand deaths (US 187.4k deaths, Brazil 122.5k, India 66.3k, Mexico 65.2k, UK 41.5k, etc.).

The global economy has also suffered from COVID-19. For example, Balsalobre-Lorente et al. [1] analyzed the consequences of COVID-19 for the social isolation of the Chinese economy. Chaudhary et al. [2] presented reflections for policy and programme on the effect of COVID-19 on the economy in India.

Two prevailing diagnosis methods are available. One is viral testing via a nasopharyngeal swab for the presence of viral RNA fragments [3]. The other is imaging, among which chest computed tomography (CCT) [4] provides one of the highest sensitivities. CCT uses an X-ray generator and X-ray sensors that rotate around the subject.

The main biomarkers in CCT differentiating COVID-19 patients from healthy people are asymmetric peripheral ground-glass opacities (GGOs) without pleural effusions [5]. This study collects CCT slices showing such findings.

However, manual interpretation by radiologists is tedious and easily influenced by inter-expert and intra-expert factors (such as fatigue and emotion). Smart diagnosis systems based on computer vision and artificial intelligence can benefit patients, radiologists, and hospitals. Traditional artificial intelligence (AI) and modern deep learning (DL) methods have achieved excellent results in analyzing medical images. For example, Lu [6] proposed a radial-basis-function neural network (RBFNN) to detect pathological brains. Yang [7] presented a kernel-based extreme learning classifier (K-ELM) to create a novel pathological brain detection system; their method was robust and effective. Lu [8] proposed an extreme learning machine trained by the bat algorithm (ELM-BA). Li and Liu [9] introduced real-coded biogeography-based optimization (RCBBO) to detect diseased brains. Jiang [10] used a six-layer convolutional neural network (6L-CNN) to recognize sign-language fingerspelling. Szegedy et al. [11] presented GoogLeNet. Yu and Wang [12] suggested ResNet18 for mammogram abnormality detection. Furthermore, some smart health systems have succeeded in emotion-aware security [13], authentication [14], and IoT [15].

We propose a novel seven-layer convolutional neural network for COVID-19 diagnosis (7L-CNN-CD). To improve its performance, two improvements are introduced in this study: (i) a 14-way data augmentation (DA-14) to enhance the training set; and (ii) stochastic pooling to replace traditional pooling methods.

II. Dataset

Image acquisition configuration and method: Philips Ingenuity 64-row spiral CT scanner; tube voltage 120 kV; tube current 240 mAs; slice thickness 3 mm; slice spacing 3 mm; pitch 1.5; lung window (W: 1500 HU, L: −500 HU); mediastinum window (W: 350 HU, L: 60 HU). Thin-layer reconstruction (slice thickness and spacing of 1 mm, lung window) was performed according to the lesion display. Patients were placed in a supine position, held their breath after deep inspiration, and were scanned conventionally from the lung apex to the costophrenic angle.

For each subject, 1–4 slices were chosen using a slice level selection (SLS) method: for COVID-19 pneumonia patients, the slice showing the largest size and number of lesions was selected; for normal subjects, any slice level could be selected. The resolution of all images is 1024 × 1024 × 3. Table I shows the demographics, where HC means healthy control.

TABLE I. Demographics of Subjects Used in This Study.

Group No. of subjects (m/f) No. of Images Age Range
COVID-19 142 (95/47) 320 22-91
HC 142 (88/54) 320 21-76

Each image was analyzed by two junior radiologists. When there were differences between the two analyses, a senior radiologist was consulted to reach a consensus. Suppose Y denotes a CCT image scan and l_k(Y) the labelling of the k-th expert; the final labelling L(Y) is obtained by

L(Y) = MV{l_1(Y), l_2(Y), l_3(Y)},

where MV denotes majority voting over the labels of all three experts.
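The consensus rule above amounts to a majority vote over the expert labels. A minimal sketch (the label strings and function name are our own, for illustration):

```python
from collections import Counter

def majority_vote(labels):
    """Return the label chosen by most experts.

    `labels` holds one label per expert: the two junior radiologists,
    plus the senior radiologist when the juniors disagree.
    """
    return Counter(labels).most_common(1)[0][0]

# Two juniors agree: consensus without the senior.
print(majority_vote(["COVID-19", "COVID-19"]))        # COVID-19
# Juniors disagree: the senior's opinion breaks the tie.
print(majority_vote(["COVID-19", "HC", "COVID-19"]))  # COVID-19
```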

III. Methodology

Table VIII lists the abbreviations and their full names for ease of reading of the methodology.

TABLE VIII. Abbreviation List.

Abbreviation Meaning
MV majority voting
SLS Slice level selection
HC Healthy control
CCT Chest computed tomography
DS downsampling
HS histogram stretching
CR compression ratio
DA Data augmentation
(A)(M)(S)(L2)P (Average) (Max) (Stochastic) (l2-norm) pooling
MCC Matthews correlation coefficient
FMI Fowlkes–Mallows index

A. Preprocessing

The original dataset, containing 320 COVID-19 images and 320 HC images, is denoted

Y^o = {y^o(i)}, i = 1, …, 640.

Figure 1(a) shows a raw COVID-19 CCT image, and Figure 1(b) shows the flowchart of our preprocessing procedure. First, we converted all color images to grayscale, reserving only the luminance information: a grayscale image need not be stored in three color channels, and feeding the original RGB images directly to the neural network would increase the computational burden. Thus, we get the grayscale image set Y^g as

y^g(i) = G(y^o(i)), i = 1, …, 640,

where G(·) denotes the grayscale operation.

Fig. 1. Preprocessing on the raw dataset.

Second, the histogram stretching (HS) method was used to increase each slice's contrast. For the i-th image y^g(i), we first calculate its minimum and maximum grayscale values:

g_min(i) = min_{x,y} y^g(i | x, y),  g_max(i) = max_{x,y} y^g(i | x, y),

where (x, y) are the coordinates of a pixel in image y^g(i). The new histogram-stretched image y^h(i) is obtained by

y^h(i | x, y) = [y^g(i | x, y) − g_min(i)] / [g_max(i) − g_min(i)].

In all, we get the histogram-stretched image set Y^h = {y^h(i)} as above.

Third, we crop the images to remove the texts in the margin areas and the checkup bed at the bottom. Thus, we get the cropped dataset Y^c as

y^c(i) = Crop(y^h(i) | top, bottom, left, right),

where Crop represents the crop operation and the four crop variables (top, bottom, left, right) give the number of pixels removed on each side. In this study all four values equal 150, so the size of each image is reduced from 1024 × 1024 to 724 × 724.

Fourth, we downsampled each image to the size of [256, 256], obtaining the resized image set Y^d:

y^d(i) = DS(y^c(i), [256, 256]),

where DS(·) is the downsampling function and y^d(i) is the downsampled version of y^c(i).

Table II compares the size and storage of each image (y^o, y^g, y^h, y^c, y^d) at every preprocessing step. After preprocessing, each image costs only about 2.08% of its original storage or size. The compression ratio (CR) of the i-th image, final state to original state, was calculated by

CR(i) = S[y^d(i)] / S[y^o(i)],

where S(·) denotes the storage of an image.

TABLE II. Image Size and Storage per Image at Each Preprocessing Step.

Preprocess Symbol Size (per image) Storage (per image)
Original y^o 1024 × 1024 × 3 12,582,912
Grayscale y^g 1024 × 1024 4,194,304
HS y^h 1024 × 1024 4,194,304
Crop y^c 724 × 724 2,096,704
DS y^d 256 × 256 262,144

As Table II shows, the storage CR equals the size CR for each image. Figure 2 shows two samples from the preprocessed dataset Y^d.
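The four preprocessing steps can be sketched end to end. A minimal numpy version, where the function names are our own and nearest-neighbour downsampling is an assumption (the article does not state its interpolation method):

```python
import numpy as np

def grayscale(rgb):
    # Keep only the luminance (ITU-R BT.601 weights).
    return rgb @ np.array([0.299, 0.587, 0.114])

def histogram_stretch(img):
    # Linearly stretch intensities to the full [0, 1] range.
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo)

def crop(img, margin=150):
    # Remove `margin` pixels from top, bottom, left, and right.
    return img[margin:-margin, margin:-margin]

def downsample(img, size=256):
    # Nearest-neighbour resampling keeps the sketch dependency-free.
    r = np.linspace(0, img.shape[0] - 1, size).round().astype(int)
    c = np.linspace(0, img.shape[1] - 1, size).round().astype(int)
    return img[np.ix_(r, c)]

rng = np.random.default_rng(0)
raw = rng.random((1024, 1024, 3))   # stand-in for one raw CCT slice
out = downsample(crop(histogram_stretch(grayscale(raw))))
print(out.shape)                    # (256, 256)
print(1024 - 2 * 150)               # cropped side length: 724
```

The final 256 × 256 image holds 256² / (1024² × 3) ≈ 2.08% of the original elements, matching the compression ratio quoted above.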

Fig. 2. Two samples of the preprocessed dataset Y^d.

B. Improvement I: Data Augmentation

Generally, the CCT image set faces the small-size dataset (SSD) and lack-of-generalization (LoG) problems. To break the curse of SSD and LoG, four types of solutions are possible: (i) data generation (DG); (ii) regularization approaches (RA); (iii) ensemble approaches (EA); and (iv) data augmentation (DA). All four are effective in handling the SSD and LoG problems.

We propose a 14-way DA method, as shown in Figure 3. We use the 10-fold cross-validation technique: the preprocessed CCT image set Y^d is split into ten folds, nine of which form the training set T, while the remaining fold forms the test set S, so that

|T| : |S| = 9 : 1,

where |·| denotes the cardinality of a set. For ease of reading, we omit the run index in the following text and assume the training set T contains N_t images. For each image y in T, we define the 14 DA operations below.

Fig. 3. Illustration of our DA-14.

1). Rotation:

The rotation angle θ varies from −30° to 30° in steps of 2°, skipping 0° since it corresponds to the original image y. The rotation factor vector is therefore

θ ∈ {−30°, −28°, …, −2°, 2°, …, 30°},

which yields 30 rotated images per input.

2). Scaling:

All training CCT images were scaled with a scaling factor s, whose values vary from 0.7 to 1.3 in steps of 0.02, skipping 1:

s ∈ {0.70, 0.72, …, 0.98, 1.02, …, 1.30},

which yields 30 scaled images per input.

3). Noise Injection (NI):

Zero-mean Gaussian noise with variance 0.01 was added to every CCT training image 30 times, producing 30 noised images per input. The values 0 and 0.01 are the default mean and variance of Gaussian noise, respectively.

4). Random Translation (RT):

Each CCT image y was translated 30 times with a random horizontal shift Δx and a random vertical shift Δy, where Δx and Δy take values in the range [−15, 15] pixels and obey a uniform distribution U(−15, 15).

5). Gamma Correction (GC):

GC helps adjust the contrast of the original image [16]. The GC factor g varies from 0.4 to 1.6 in steps of 0.04, skipping 1:

g ∈ {0.40, 0.44, …, 0.96, 1.04, …, 1.60},

which yields 30 gamma-corrected images per input.

6). Horizontal Shear Transform (HST):

We generate 30 horizontal shear transform (HST) images, with HST factor values assigned from −0.15 to 0.15 in steps of 0.01, skipping 0:

h ∈ {−0.15, −0.14, …, −0.01, 0.01, …, 0.15}.

7). Vertical Shear Transform (VST):

Similarly, we generate 30 vertical shear transform (VST) images; the values of the VST factor vector are the same as those of the HST factor vector.

8). Mirror:

The original image y is mirrored and we obtain a new image y^m. Suppose M(·) is the mirror function; we have

y^m = M(y).

Applying M to each of the first seven DA results defines the remaining DA operations.

9). Concatenation:

All the first seven DA results are concatenated:

y_DA(1–7) = concat[DA_1(y), …, DA_7(y)],

where concat means concatenation. The size of y_DA(1–7) is 7 × 30 = 210 images. The results of DA techniques 8–14 are then obtained by mirroring:

y_DA(8–14) = M[y_DA(1–7)],

which contributes another 210 images. Finally, one original image y yields 1 + 14 × 30 = 421 images (containing itself) in the enhanced training set.
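The parameter grids above can be enumerated to check the bookkeeping. A pure-Python sketch, with the grid values taken from the text; the total per image is consistent with the DA training-set size in Table III (576 × 421 = 242,496):

```python
# Parameter grids for the seven base DA ways, taken from the text;
# each grid skips the identity value, leaving 30 settings per way.
rotation = [a for a in range(-30, 31, 2) if a != 0]                    # degrees
scaling  = [round(0.70 + 0.02 * k, 2) for k in range(31) if k != 15]   # skips 1.00
gamma    = [round(0.40 + 0.04 * k, 2) for k in range(31) if k != 15]   # skips 1.00
shear    = [round(-0.15 + 0.01 * k, 2) for k in range(31) if k != 15]  # skips 0.00
n_noise, n_translate = 30, 30        # 30 random draws for each of NI and RT

for grid in (rotation, scaling, gamma, shear):
    assert len(grid) == 30

# 7 base ways, plus the same 7 applied to the mirrored image = 14 ways.
per_image = 1 + 14 * 30              # the original plus 420 augmented copies
print(per_image)                     # 421
print(576 * per_image)               # 242496, the DA training size in Table III
```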

C. Improvement II: Stochastic Pooling

In a traditional CNN, the activation maps (AMs) are usually too large [17] (i.e., they contain too many features), which causes (i) overfitting of the training set and (ii) large computational costs. Thus, pooling layers (PLs) are frequently used to reduce the size of AMs. PLs also help provide invariance to translation. There are three commonly used pooling techniques: (i) l2-norm pooling (L2P); (ii) average pooling (AP); and (iii) max pooling (MP). Assume pooling is a function P that maps a region of the AM to a single value.

L2P calculates the l2 norm [18] of a given region R. Suppose R is a 2 × 2 region with elements r_1, r_2, r_3, r_4; the L2P output is defined as

P_L2(R) = sqrt[(1/4) × (r_1² + r_2² + r_3² + r_4²)].

In this study, we add the constant 1/4 under the square root to make L2P easier to compare with the other pooling methods; this constant does not influence training or inference.

AP [19] calculates the mean value of region R:

P_A(R) = (1/4) × (r_1 + r_2 + r_3 + r_4).

Finally, MP picks out the maximal value of region R:

P_M(R) = max(r_1, r_2, r_3, r_4).

Figure 4 showcases the differences between the pooling methods, where both the pooling size and the pooling stride equal 2. For the top-left region R_1, L2P, AP, and MP output the root mean square, the mean, and the maximum of its four elements, respectively.

Fig. 4. A toy example of four pooling techniques (L2P = l2-norm pooling; AP = average pooling; MP = max pooling; SP = stochastic pooling).

SP was invented to conquer the problems of the three aforementioned pooling methods. Neither L2P nor AP works well, since they consider all pixels in R and can thus dilute strong activations with surrounding near-zero pixels. MP avoids this obstruction, but it tends to overfit the training set and causes the LoG problem.

Instead of computing the l2 norm, average, or maximum, the output of SP is attained by sampling from a multinomial distribution [20] formed from the activations of the elements in region R [21]:

  • (1) Compute the probability p_k of each element r_k in R:
    p_k = r_k / Σ_j r_j.
  • (2) Select a location k* within R in accordance with the probabilities p, scanning R from top to bottom and left to right [22]:
    k* ~ Multinomial(p_1, …, p_4).
  • (3) The output is the value at location k*:
    P_S(R) = r_{k*}.

We use the first region R_1 in Figure 4 as an instance. Each element of R_1 is divided by the region's sum to obtain the probability map p; a position is then randomly selected in accordance with p, and its value becomes the SP output (6 in Figure 4). Instead of barely considering the maximum, or considering all elements of the region equally, SP randomly uses non-maximal activations within the region R.
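The four pooling rules can be compared on a single 2 × 2 region. A minimal numpy sketch; the example region [1, 2; 3, 4] is our own illustration, not the values of Figure 4:

```python
import numpy as np

def pool(region, mode, rng=None):
    """Pool one 2x2 region of non-negative activations."""
    v = np.asarray(region, dtype=float).ravel()
    if mode == "l2":                 # sqrt of the mean of squares (1/4 constant)
        return np.sqrt(np.mean(v ** 2))
    if mode == "avg":                # mean of the four elements
        return v.mean()
    if mode == "max":                # maximum of the four elements
        return v.max()
    if mode == "stochastic":         # sample one element, p proportional to value
        p = v / v.sum()
        return rng.choice(v, p=p)
    raise ValueError(mode)

region = [[1.0, 2.0], [3.0, 4.0]]
rng = np.random.default_rng(0)
print(pool(region, "l2"))    # sqrt((1 + 4 + 9 + 16) / 4) ≈ 2.7386
print(pool(region, "avg"))   # 2.5
print(pool(region, "max"))   # 4.0
print(pool(region, "stochastic", rng))  # one of 1..4, drawn with p = v / 10
```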

D. Measures and Indicators

We set a 10-fold cross-validation on the whole dataset Y^d. Each fold contains 32 COVID-19 images and 32 HC images. Within each trial, the training set contains 288 + 288 = 576 images, and the test set contains 32 + 32 = 64 images. After combining all 10 trials, the test set contains 640 images. The above 10-fold cross-validation is run 10 times, so the final report is based on 10 × 640 = 6,400 test images. Table III shows the split setting of our dataset.

TABLE III. Split Setting of our Dataset.

Set Percentage COVID-19 HC Total
Training T 90% 288 288 576
DA training 121,248 121,248 242,496
Test S 10% 32 32 64
Total 100% 320 320 640
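The fold bookkeeping above can be checked with a short pure-Python sketch; the interleaved fold assignment and the 0/1 label encoding are our own illustration (the study uses a random split):

```python
# Bookkeeping sketch of the split in Table III.
labels = [1] * 320 + [0] * 320                 # 1 = COVID-19, 0 = HC

# Ten disjoint folds of 64 images each (interleaved for illustration).
folds = [list(range(f, 640, 10)) for f in range(10)]
assert all(len(fold) == 64 for fold in folds)
assert all(sum(labels[i] for i in fold) == 32 for fold in folds)  # 32 + 32 per fold

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(640) if i not in held_out]
    assert len(train_idx) == 576 and len(test_idx) == 64

# One run's 10 test folds cover all 640 images exactly once; over
# 10 runs the final report is therefore based on 10 * 640 test images.
print(10 * 640)   # 6400
```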

The proposed seven-layer convolutional neural network for COVID-19 diagnosis (7L-CNN-CD) is tested by 10 runs of 10-fold cross-validation. The ideal confusion matrix C*(r, t) over the test set at the t-th trial of the r-th run is

C*(r, t) = [32, 0; 0, 32],

where the value 32 (the test row in Table III) is the number of COVID-19 cases and of HC cases in the test set. After running through trials 1–10, the ideal confusion matrix of one run of 10-fold CV becomes

C*(r) = [320, 0; 0, 320].

In realistic inference, we cannot obtain this perfect diagonal matrix with all off-diagonal elements zero. Suppose the confusion matrix at the r-th run is

C(r) = [TP(r), FN(r); FP(r), TN(r)],

where TP(r) + FN(r) = 320 and FP(r) + TN(r) = 320 in this study. TP and TN represent true positives and true negatives, respectively; the positive class (P) is COVID-19, and the negative class (N) is healthy control. FN and FP represent false negatives and false positives, respectively. We can define four simple measures as

Sen(r) = TP(r) / [TP(r) + FN(r)],
Spc(r) = TN(r) / [TN(r) + FP(r)],
Prc(r) = TP(r) / [TP(r) + FP(r)],
Acc(r) = [TP(r) + TN(r)] / [TP(r) + TN(r) + FP(r) + FN(r)].

Three advanced measures are defined below. The F1 score is

F1(r) = 2 × TP(r) / [2 × TP(r) + FP(r) + FN(r)].

The Matthews correlation coefficient (MCC) is defined as

MCC(r) = [TP(r) × TN(r) − FP(r) × FN(r)] / sqrt{[TP(r) + FP(r)] × [TP(r) + FN(r)] × [TN(r) + FP(r)] × [TN(r) + FN(r)]}.

The Fowlkes–Mallows index (FMI) is defined as

FMI(r) = sqrt{Prc(r) × Sen(r)}.

After combining the 10 runs (r = 1, …, 10), we calculate the mean and standard deviation (SD) of each of the seven measures across runs.
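The seven measures can be computed from a single confusion matrix. A short sketch, where the TP/FN/FP/TN values are back-derived from run 10 of the L2P rows in Table V as a consistency check (not reported data):

```python
import math

def measures(tp, fn, fp, tn):
    """The seven indicators used in this paper, from one run's confusion matrix."""
    sen = tp / (tp + fn)                       # sensitivity (recall)
    spc = tn / (tn + fp)                       # specificity
    prc = tp / (tp + fp)                       # precision
    acc = (tp + tn) / (tp + tn + fp + fn)      # accuracy
    f1  = 2 * tp / (2 * tp + fp + fn)          # F1 score
    mcc = (tp * tn - fp * fn) / math.sqrt(     # Matthews correlation coefficient
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    fmi = math.sqrt(prc * sen)                 # Fowlkes-Mallows index
    return sen, spc, prc, acc, f1, mcc, fmi

# Run 10 of L2P in Table V reports 93.75 for five indicators and 87.50 for
# MCC; on 320 + 320 test images that corresponds to TP = TN = 300, FN = FP = 20.
vals = measures(tp=300, fn=20, fp=20, tn=300)
print([round(100 * v, 2) for v in vals])
# [93.75, 93.75, 93.75, 93.75, 93.75, 87.5, 93.75]
```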

E. Proposed 7L-CNN-CD Algorithm

Figure 5 presents the structure of the proposed 7-layer CNN (7L-CNN). After training, the network used to diagnose COVID-19 is called 7L-CNN-CD. The size of the activation map is labelled at each cube in Figure 5. Table IV shows the pseudocode of our 7L-CNN-CD model, divided into two phases: (I) preprocessing and (II) 10 runs of 10-fold cross-validation.

Fig. 5. Structure of the proposed 7-layer CNN.

TABLE IV. Pseudocode of Our 7L-CNN-CD Model.

Input: original image set Y^o.
Ground truth: labels obtained from two junior and one senior radiologist. See Eq. (1.a).
Phase I: Preprocessing
  Grayscale: Y^g ← G(Y^o). See Eq. (3).
  Histogram stretching: Y^h ← HS(Y^g). See Eq. (5.a).
  Image crop: Y^c ← Crop(Y^h). See Eq. (6.a).
  Downsampling: Y^d ← DS(Y^c). See Eq. (7).
Phase II: 10 runs of 10-fold cross validation
  for r = 1:10                     % r is the run index
    Randomly split the preprocessed set Y^d into 10 folds;
    for t = 1:10                   % t is the trial index
      Step II.A: Training & test set
        Test set S(r, t) ← the t-th fold;
        Training set T(r, t) ← the other nine folds;
        Enhanced training set T_DA(r, t) ← DA[T(r, t)], see Eq. (27);
      Step II.B: Create initial CNN model
        Create an initial deep network via the 7L-CNN model;
        Use SP to replace all pooling layers in the 7L-CNN model, see Eq. (34);
      Step II.C: Train the 7L-CNN-CD model
        Train the 7L-CNN network using T_DA(r, t) and the ground-truth labels;
      Step II.D: Confusion-matrix performance
        Obtain test predictions on S(r, t);
        Obtain test performance by comparing predictions with ground truth;
    end
    Summarize all 10 trials and get C(r), see Eq. (38);
    Calculate the seven measures, see Eqs. (39.a)–(42);
  end
Output: mean and SD of the seven measures, see Eq. (43.a).

IV. Results and Discussion

A. Result of Data Augmentation

Let the image in Figure 2(a) be the input; Figure 6 shows the DA(1–7) results. Due to the page limit, the mirrored results DA(8–14) are not presented in this article, and we show only 15 of the 30 generated images per DA technique.

Fig. 6. Half of the DA(1–7) results.

Figure 6(a) presents the 15 rotated new images. Figures 6(b)–(e) present 15 scaled, 15 noise-injected, 15 randomly translated, and 15 gamma-corrected images, respectively. Figures 6(f)–(g) present the 15 HST and 15 VST new images, respectively.

B. SP Compared With Other Three Pooling Methods

The results of SP against the other three pooling methods are presented in Table V, which indicates that SP obtained the best sensitivity, accuracy, F1, MCC, and FMI. The definitions of the seven measures can be found in Eqs. (39.a)–(42).

TABLE V. Ten Runs of Different Pooling Methods.

L2P Sen Spc Prc Acc F1 MCC FMI
1 90.63 94.69 94.46 92.66 92.50 85.38 92.52
2 91.56 92.81 92.72 92.19 92.14 84.38 92.14
3 92.50 94.06 93.97 93.28 93.23 86.57 93.23
4 93.13 93.75 93.71 93.44 93.42 86.88 93.42
5 92.19 93.44 93.35 92.81 92.77 85.63 92.77
6 91.56 93.44 93.31 92.50 92.43 85.01 92.43
7 93.13 94.38 94.30 93.75 93.71 87.51 93.71
8 93.75 91.56 91.74 92.66 92.74 85.33 92.74
9 93.13 95.31 95.21 94.22 94.15 88.46 94.16
10 93.75 93.75 93.75 93.75 93.75 87.50 93.75
Mean±SD 92.53±1.04 93.72±1.04 93.65±0.96 93.13±0.66 93.08±0.67 86.27±1.31 93.09±0.66
AP Sen Spc Prc Acc F1 MCC FMI
1 91.25 94.38 94.19 92.81 92.70 85.67 92.71
2 91.88 94.06 93.93 92.97 92.89 85.96 92.90
3 92.50 92.19 92.21 92.34 92.36 84.69 92.36
4 92.81 94.69 94.59 93.75 93.69 87.52 93.70
5 92.81 95.00 94.89 93.91 93.84 87.83 93.84
6 91.25 92.50 92.41 91.88 91.82 83.76 91.83
7 92.50 92.50 92.50 92.50 92.50 85.00 92.50
8 93.44 95.31 95.22 94.38 94.32 88.77 94.33
9 92.81 94.38 94.29 93.59 93.54 87.20 93.55
10 95.63 94.38 94.44 95.00 95.03 90.01 95.03
Mean±SD 92.69±1.25 93.94±1.12 93.87±1.09 93.31±0.98 93.27±0.99 86.64±1.96 93.27±0.99
MP Sen Spc Prc Acc F1 MCC FMI
1 94.69 95.31 95.28 95.00 94.98 90.00 94.98
2 92.19 92.81 92.77 92.50 92.48 85.00 92.48
3 94.69 94.69 94.69 94.69 94.69 89.38 94.69
4 93.75 92.81 92.88 93.28 93.31 86.57 93.31
5 92.50 94.38 94.27 93.44 93.38 86.89 93.38
6 95.31 91.56 91.87 93.44 93.56 86.94 93.57
7 94.38 93.44 93.50 93.91 93.93 87.82 93.94
8 95.00 94.69 94.70 94.84 94.85 89.69 94.85
9 94.06 93.13 93.19 93.59 93.62 87.19 93.62
10 94.38 94.38 94.38 94.38 94.38 88.75 94.38
Mean±SD 94.09±1.03 93.72±1.15 93.75±1.08 93.91±0.80 93.92±0.80 87.82±1.60 93.92±0.80
SP (Ours) Sen Spc Prc Acc F1 MCC FMI
1 95.00 90.63 91.02 92.81 92.97 85.71 92.99
2 93.13 92.50 92.55 92.81 92.83 85.63 92.84
3 94.69 93.13 93.23 93.91 93.95 87.82 93.96
4 94.69 95.31 95.28 95.00 94.98 90.00 94.98
5 95.31 92.81 92.99 94.06 94.14 88.15 94.14
6 94.06 95.31 95.25 94.69 94.65 89.38 94.66
7 93.75 95.00 94.94 94.38 94.34 88.76 94.34
8 94.69 92.19 92.38 93.44 93.52 86.90 93.53
9 93.75 94.69 94.64 94.22 94.19 88.44 94.19
10 95.31 94.69 94.72 95.00 95.02 90.00 95.02
Mean±SD 94.44±0.73 93.63±1.60 93.70±1.47 94.03±0.80 94.06±0.76 88.08±1.59 94.06±0.76

For the specificity and precision indicators, AP achieved the best performance. Considering all indicators, SP wins on five out of seven. Hence, SP gives the best overall performance among the four pooling methods.

C. Effect of DA

We compared our 14-way DA (DA14) against not using DA (DA0) to explore the effect of the DA14 strategy. The comparison is presented in Table VI.

TABLE VI. Comparison of DA0 and DA14.

DA Sen Spc Prc Acc F1 MCC FMI
DA0 92.06±0.85 91.59±1.60 91.65±1.48 91.83±0.96 91.85±0.92 83.67±1.92 91.85±0.92
DA14 (Ours) 94.44±0.73 93.63±1.60 93.70±1.47 94.03±0.80 94.06±0.76 88.08±1.59 94.06±0.76

We can observe that training with DA14 provides significantly better performance than DA0 in terms of all seven indicators. Furthermore, the SDs of the DA14 results are slightly smaller than those of DA0.

D. Comparison to State-of-the-art Methods

Our 7L-CNN-CD method was compared with five state-of-the-art approaches: RBFNN [6], K-ELM [7], ELM-BA [8], GoogLeNet [11], and ResNet18 [12].

All performances were compared on the test set and are presented in Table VII. Omitting the SD information, the comparison plot is presented in Figure 7, with the measurement indicators ranging from sensitivity to FMI.

TABLE VII. Comparison to State-of-the-art Approaches.

Approach Sen Spc Prc Acc F1 MCC FMI
RBFNN[6] 67.08 74.48 72.52 70.78 69.64 41.74 69.64
K-ELM[7] 57.29 61.46 59.83 59.38 58.46 18.81 58.46
ELM-BA [8] 57.08±3.86 72.40±3.03 67.48±1.65 64.74±1.26 61.75±2.24 29.90±2.45 61.76±2.24
GoogLeNet [11] 76.88±3.92 83.96±2.29 82.84±1.58 80.42±1.40 79.65±1.92 61.10±2.62 79.65±1.91
ResNet18 [12] 78.96±2.90 89.48±1.64 88.30±1.50 84.22±1.23 83.31±1.53 68.89±2.33 83.32±1.53
7L-CNN-CD (Ours) 94.44±0.73 93.63±1.60 93.70±1.47 94.03±0.80 94.06±0.76 88.08±1.59 94.06±0.76

Fig. 7. Bar plot of the performances of six different methods.

V. Conclusion

In this COVID-19 diagnosis study, a novel 7L-CNN-CD was proposed, using a seven-layer standard convolutional neural network as the backbone and integrating data augmentation and stochastic pooling.

Experimental results showed that our 7L-CNN-CD algorithm obtained excellent test performance: a sensitivity of 94.44±0.73%, a specificity of 93.63±1.60%, a precision of 93.70±1.47%, an accuracy of 94.03±0.80%, an F1 of 94.06±0.76%, an MCC of 88.08±1.59%, and an FMI of 94.06±0.76%. These results are better than those of five state-of-the-art algorithms for COVID-19 diagnosis.

In future studies, we shall attempt to (i) test more advanced data augmentation techniques; (ii) collect more COVID-19 data to test our algorithm; and (iii) move our algorithm to a cloud computing platform to benefit radiologists.

Biographies


Yudong Zhang (Senior Member, IEEE) received the B.E. degree in information sciences and the M.Phil. degree in communication and information engineering from the Nanjing University of Aeronautics and Astronautics in 2004 and 2007, respectively, and the Ph.D. degree in signal and information processing from Southeast University in 2010. He serves as a Professor with the University of Leicester.


Suresh Chandra Satapathy (Senior Member, IEEE) is currently pursuing the Ph.D. degree in computer science engineering with the School of Computer Engineering, KIIT, Bhubaneshwar, India. He is working as a Professor with the School of Computer Engineering and the Dean Research with KIIT. He has developed two new optimization algorithms, social group optimization (SGO) published in Springer Journal and social evolution and learning algorithm (SELO) published in Elsevier.


Li-Yao Zhu received the bachelor’s degree in clinical medicine from Xuzhou Medical University in 1989. He has presided over five municipal scientific research projects, published more than 50 articles, including seven SCI articles, and co-edited a monograph. He has won three Municipal Scientific and Technological Progress Awards and three Municipal New Technology Introduction Awards.


Juan Manuel Górriz received the B.Sc. degree in physics and the B.Sc. degree in electronic engineering from the University of Granada, Spain, in 2000 and 2001, respectively, the Ph.D. degree from the University of Cádiz, Spain, in 2003, and the Ph.D. degree from the University of Granada in 2006. He is currently a Full Professor with the University of Granada.


Shuihua Wang (Senior Member, IEEE) received the bachelor’s degree in information sciences from Southeast University in 2008, the master’s degree in electrical engineering from the City College of New York in 2012, and the Ph.D. degree in electrical engineering from Nanjing University in 2017. She is working as a Research Associate with the University of Leicester, U.K.

Appendix A

(See Table VIII.)

Funding Statement

This work was supported in part by the Natural Science Foundation of China under Grant 61602250; in part by the Henan Key Research and Development Project under Grant 182102310629; in part by the Fundamental Research Funds for the Central Universities under Grant CDLS-2020-03; in part by the Key Laboratory of Child Development and Learning Science (Southeast University), Ministry of Education; in part by the Royal Society International Exchanges Cost Share Award, U.K., under Grant RP202G0230; in part by the Medical Research Council Confidence in Concept Award, U.K., under Grant MC_PC_17171; and in part by the Hope Foundation for Cancer Research, U.K., under Grant RM60G0680.

Contributor Information

Yudong Zhang, Email: yudongzhang@ieee.org.

Suresh Chandra Satapathy, Email: sureshsatapathy@ieee.org.

Li-Yao Zhu, Email: zhu_liyao@126.com.

Juan Manuel Górriz, Email: gorriz@ugr.es.

Shuihua Wang, Email: shuihuawang@ieee.org.

References

  • [1].Balsalobre-Lorente D., Driha O. M., Bekun F. V., Sinha A., and Adedoyin F. F., “Consequences of COVID-19 on the social isolation of the chinese economy: Accounting for the role of reduction in carbon emissions,” Air Qual., Atmos. Health, vol. 13, pp. 1–13, Aug. 2020. [Google Scholar]
  • [2].Chaudhary M., Sodani P. R., and Das S., “Effect of COVID-19 on economy in India: Some reflections for policy and programme,” J. Health Manage., vol. 22, no. 2, pp. 169–180, Jun. 2020. [Google Scholar]
  • [3].Campos G. S.et al. , “Ion torrent-based nasopharyngeal swab metatranscriptomics in COVID-19,” J. Virol. Methods, vol. 282, Aug. 2020, Art. no. 113888. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [4].Mahdavi A.et al. , “The role of repeat chest CT scan in the COVID-19 pandemic,” Academic Radiol., vol. 27, no. 7, pp. 1049–1050, Jul. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Li Y. and Xia L., “Coronavirus disease 2019 (COVID-19): Role of chest CT in diagnosis and management,” Amer. J. Roentgenol., vol. 214, no. 6, pp. 1280–1286, Jun. 2020. [DOI] [PubMed] [Google Scholar]
  • [6].Lu Z., Lu S., Liu G., Zhang Y., Yang J., and Phillips P., “A pathological brain detection system based on radial basis function neural network,” J. Med. Imag. Health Informat., vol. 6, no. 5, pp. 1218–1222, Sep. 2016. [Google Scholar]
  • [7].Lu S., Lu Z., Yang J., Yang M., and Wang S., “A pathological brain detection system based on kernel based ELM,” Multimedia Tools Appl., vol. 77, no. 3, pp. 3715–3728, Feb. 2018. [Google Scholar]
  • [8].Lu S.et al. , “A pathological brain detection system based on extreme learning machine optimized by bat algorithm,” CNS Neurol. Disorders—Drug Targets, vol. 16, no. 1, pp. 23–29, Jan. 2017. [DOI] [PubMed] [Google Scholar]
  • [9].Wang S.et al. , “Pathological brain detection via wavelet packet tsallis entropy and real-coded biogeography-based optimization,” Fundamenta Informaticae, vol. 151, nos. 1–4, pp. 275–291, Mar. 2017. [Google Scholar]
  • [10].Jiang X., “Chinese Sign Language Fingerspelling Recognition via Six-Layer Convolutional Neural Network with Leaky Rectified Linear Units for Therapy and Rehabilitation,” J. Med. Imag. Health Informat., vol. 9, pp. 2031–2038, 2019. [Google Scholar]
  • [11].Szegedy C.et al. , “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9. [Google Scholar]
  • [12].Yu X. and Wang S.-H., “Abnormality diagnosis in mammograms by transfer learning based on ResNet18,” Fundamenta Informaticae, vol. 168, nos. 2–4, pp. 219–230, Sep. 2019. [Google Scholar]
  • [13].Zhang Y., Qian Y., Wu D., Hossain M. S., Ghoneim A., and Chen M., “Emotion-aware multimedia systems security,” IEEE Trans. Multimedia, vol. 21, no. 3, pp. 617–624, Mar. 2019. [Google Scholar]
  • [14].Zhang Y., Gravina R., Lu H., Villari M., and Fortino G., “PEA: Parallel electrocardiogram-based authentication for smart healthcare systems,” J. Netw. Comput. Appl., vol. 117, pp. 10–16, Sep. 2018. [Google Scholar]
  • [15].Zhang Y., Ma X., Zhang J., Hossain M. S., Muhammad G., and Amin S. U., “Edge intelligence in the cognitive Internet of Things: Improving sensitivity and interactivity,” IEEE Netw., vol. 33, no. 3, pp. 58–64, May 2019. [Google Scholar]
  • [16].Veluchamy M. and Subramani B., “Fuzzy dissimilarity contextual intensity transformation with gamma correction for color image enhancement,” Multimedia Tools Appl., vol. 79, pp. 19945–19961, Apr. 2020. [Google Scholar]
  • [17].Górriz J. M., “Artificial intelligence within the interplay between natural and artificial computation: Advances in data science, trends and applications,” Neurocomputing, vol. 410, pp. 237–270, 2020. [Google Scholar]
  • [18].Rezaei M., Yang H., and Meinel C., “Deep Neural Network with l2-Norm Unit for Brain Lesions Detection,” in Proc. Int. Conf. Neural Inf. Process. (ICNIP), Cham, Switzerland, 2017, pp. 798–807. [Google Scholar]
  • [19].Ghosh A., Singh S., and Sheet D., “Simultaneous localization and classification of acute lymphoblastic leukemic cells in peripheral blood smears using a deep convolutional network with average pooling layer,” in Proc. IEEE Int. Conf. Ind. Inf. Syst. (ICIIS), Dec. 2017, pp. 529–534. [Google Scholar]
  • [20].Wang S.-H.et al. , “Multiple sclerosis identification by 14-layer convolutional neural network with batch normalization, dropout, and stochastic pooling,” Frontiers Neurosci., vol. 12, Nov. 2018. Art. no. 818. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [21].Jiang X., Lu M., and Wang S.-H., “An eight-layer convolutional neural network with stochastic pooling, batch normalization and dropout for fingerspelling recognition of chinese sign language,” Multimedia Tools Appl., vol. 79, nos. 21–22, pp. 15697–15715, Jun. 2020. [Google Scholar]
  • [22].Sun S., Hu B., Yu Z., and Song X., “A stochastic max pooling strategy for convolutional neural network trained by noisy samples,” Int. J. Comput. Commun. Control, vol. 15, no. 1, Feb. 2020, Art. no. 1007. [Google Scholar]

Articles from IEEE Sensors Journal are provided here courtesy of the Institute of Electrical and Electronics Engineers
