Highlights
- We analysed 320 COVID-19 images and 320 healthy control images.
- We proposed an improved CNN to extract individual image-level features.
- We proposed to use a GCN to extract relation-aware representations.
- We proposed a DFF technology to combine features from the GCN and CNN.
- The proposed FGCNet gives better performance than 15 state-of-the-art approaches.
Keywords: Deep feature fusion, Convolutional neural network, Graph convolutional network, Multiple-way data augmentation, Batch normalization, Dropout, Rank-based average pooling
Abstract
(Aim) COVID-19 is an infectious disease that has spread across the world this year. In this study, we aim to develop an artificial-intelligence-based tool to diagnose COVID-19 from chest CT images.
(Method) On one hand, we extract features from a self-created convolutional neural network (CNN) to learn individual image-level representations. The proposed CNN employs several new techniques, such as rank-based average pooling and multiple-way data augmentation. On the other hand, relation-aware representations are learnt from a graph convolutional network (GCN). Deep feature fusion (DFF) is developed in this work to fuse the individual image-level features from the CNN and the relation-aware features from the GCN. The best model is named FGCNet.
(Results) The experiments first chose the best model from the eight proposed network models and then compared it with 15 state-of-the-art approaches.
(Conclusion) The proposed FGCNet model is effective and gives better performance than all 15 state-of-the-art methods. Thus, our proposed FGCNet model can assist radiologists in rapidly detecting COVID-19 from chest CT images.
1. Introduction
COVID-19 (also known as coronavirus disease) was declared a Public Health Emergency of International Concern on 30/01/2020, and declared a worldwide pandemic on 11/03/2020 [1]. As of 16 September, this COVID-19 pandemic had caused 29.6 million confirmed cases and 936.9 thousand deaths (US 199.1k deaths, Brazil 133.2k deaths, India 82.0k deaths, Mexico 71.6k deaths, UK 41.6k deaths, etc.).
Two prevailing diagnostic methods are available. One is viral testing via a nasopharyngeal swab for the presence of viral RNA fragments; the samples are then tested by real-time reverse transcription polymerase chain reaction (rRT-PCR) [2]. In some situations, a nasal swab or sputum sample may also be used. Results of rRT-PCR are generally available within a few hours to two days. The other is imaging, among which chest computed tomography (CCT) is one of the imaging modalities that provides the highest sensitivity. The main biomarkers in CCT differentiating COVID-19 from healthy people are asymmetric peripheral ground-glass opacities (GGOs) without pleural effusions. The advantage of imaging methods is that they can aid in screening or accelerate the speed of diagnosis, especially amid shortages of RT-PCR kits [3].
However, manual interpretation by radiologists is tedious and easily influenced by inter-expert and intra-expert factors (such as fatigue, emotion, etc.). Besides, the diagnostic throughput of human experts is not comparable with that of machines. In particular, early lesions are small and may be overlooked by human experts. Smart diagnosis systems based on computer vision and artificial intelligence can benefit patients, radiologists, experts, and hospitals.
Traditional artificial intelligence (AI) and modern deep learning (DL) methods have achieved excellent results in analyzing medical images. For example, Lu [4] proposed a radial-basis-function neural network (RBFNN) to detect pathological brains. Yang [5] presented a kernel-based extreme learning classifier (KELM) to create a novel pathological brain detection system; their methods were robust and effective. Lu [6] proposed a novel extreme learning machine trained by the bat algorithm (ELM-BA). Li and Liu [7] employed real-coded biogeography-based optimization (RCBBO) for pathological brain detection, and this RCBBO method can be transferred to COVID-19 detection. Jiang [8] proposed a six-layer convolutional neural network with leaky rectified linear units for fingerspelling recognition; their method was abbreviated as 6L-CLF. Szegedy, et al. [9] presented GoogLeNet. Guo and Du [10] suggested the use of ResNet-18 for thyroid ultrasound classification. Fulton, et al. [11] employed ResNet-50 for classification of Alzheimer's disease; their method is called RN-50-AD. Togacar, et al. [12] used SqueezeNet and MobileNetV2 to extract features and employed social mimic optimization (SMO) for feature selection and combination. Cohen, et al. [13] used a large non-COVID-19 chest X-ray set to construct features for COVID-19 images. The authors predicted a geographic extent score and a lung opacity score to gauge the severity of COVID-19 infection; their method was abbreviated as COVID severity score net (CSSNet). Loey, et al. [14] used a generative adversarial network (GAN) to generate more images and found that GoogLeNet with GAN works better for two-class classification; their method was called GGNet. Li, et al. [15] used ResNet-50 as the backbone. The CNN features were combined by a max-pooling operation, and the resulting feature map was fed to a fully-connected layer to generate the probability scores of COVID-19, community acquired pneumonia (CAP), and non-pneumonia; their method was called COVNet. Ni, et al. [16] used 3D U-Net and MVP-Net on CCT images of 96 COVID-19 patients for pulmonary lobe segmentation, COVID-19 lesion detection, and COVID-19 lesion segmentation; their method is called NiNet in this study. Ko, et al. [17] proposed a simple 2D deep learning framework for single CCT images. The authors compared four pretrained models: VGG16, ResNet-50, Inception-V3, and Xception, and found that ResNet-50 showed the best performance. They used two augmentation methods: image rotation and zoom. Their proposed additional layers consist of a flatten layer, a fully connected layer with 32 neurons, and a fully connected layer with 3 neurons. Their model classifies three classes: non-pneumonia, other pneumonia, and COVID-19; the method was named the fast-track COVID-19 classification network (FCONet). Wang, et al. [18] developed a weakly-supervised deep learning framework using 3D CT volumes for COVID-19 classification and lesion localization. In their method, the lung region was segmented via a pre-trained UNet, and the segmented 3D lung region was fed into a 3D deep neural network to predict the COVID-19 infection probability; the method was named the 3D deep convolutional neural network to detect COVID-19 (DeCovNet). Tabik, et al. [19] proposed COVID-SDNet for predicting COVID-19 from chest X-ray images.
However, most of the existing COVID-19 algorithms above employed a single feature representation (SFR) and ignored fusing multiple feature representations (MFRs). Commonly, MFRs yield better results than an SFR, because an MFR is more informative and accurate than any SFR and contains all the necessary information. The disadvantages of MFRs are: (i) they are high-dimensional and need fusion technology, but fusion sometimes introduces distortion into the fused features; (ii) fusion is not a static process in nature; (iii) fusion of trivial features may affect the results. For example, Hasan, et al. [20] fused MFRs in robust hemolytic peptide prediction; their cross-validation results showed their method outperformed state-of-the-art approaches. Li, et al. [21] used kernel sparse MFRs for hyperspectral (HS) image classification; their method gave better results than state-of-the-art HS image classification methods. There are more success cases in which MFRs yield better performance than an SFR.
The main contribution of this paper is deep feature fusion (DFF), viz., the fusion of multiple deep feature representations from both a convolutional neural network (CNN) and a graph convolutional network (GCN). The CNN yields an individual image-level representation (IIR), while the GCN yields a relation-aware representation (RAR). Hence, IIR and RAR are fused together at the feature level, and experiments proved that the proposed DFF is efficient: the fused features give better performance than the IIR features alone. In particular, our method addresses the related subproblems, e.g., feature selection, feature fusion, and classifier selection. The resulting system is an efficient pipeline for COVID-19 diagnosis. We propose a fully automatic method that aims to ease the burden on the radiologist. Other contributions of this study are: (i) we apply batch normalization and dropout in our deep neural network model; (ii) we use rank-based average pooling to replace traditional max pooling; (iii) we propose multiple-way data augmentation. Finally, our model was demonstrated to give better performance than state-of-the-art approaches.
2. Dataset and preprocessing
Tables 12 and 13 itemize the abbreviations and mathematical symbols used in this study for easy reading. See Appendix A and Appendix B.
2.1. Dataset
Image acquisition (CT configuration and method): Philips Ingenuity 64-row spiral CT machine; KV: 120; MAS: 240; layer thickness 3 mm; layer spacing 3 mm; pitch 1.5; lung window (W: 1500 HU, L: −500 HU); mediastinum window (W: 350 HU, L: 60 HU); thin-layer reconstruction according to the lesion display, with layer thickness and layer distance of 1 mm for the lung-window images. The patients were placed in a supine position, holding their breath after a deep inhalation, and were conventionally scanned from the lung apex to the costophrenic angle.
For each subject, 1–4 slices were chosen by radiologists using a slice-level selection method, because usually 4 slices are sufficient to cover the lesions. For COVID-19 pneumonia patients, the slices showing the largest size and number of lesions were selected. For normal subjects, any level of the image could be selected.
The resolutions of all selected images are 1024 × 1024 × 3. Table 1 lists the demographics of the subjects, where we have two categories: (i) COVID-19 patients and (ii) healthy control (HC) subjects.
Table 1.
Demographics of subjects used in this study.
| Category | Subject nos. | Image nos. | Age range |
|---|---|---|---|
| COVID-19 | 142 | 320 | 22–91 |
| HC | 142 | 320 | 21–76 |
When there were differences between the analyses of the two junior radiologists (A1, A2), a senior radiologist (A3) was consulted to reach a consensus. Suppose X denotes a CCT scan and L denotes the labeling of each individual expert; the final labeling is obtained by

Lall(X) = MV[L(X | A1), L(X | A2), L(X | A3)]   (1)

where Lall represents the labeling agreed by all three experts, and MV denotes majority voting.
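To make Eq. (1) concrete, the following is a minimal sketch of the majority-voting rule in Python; the function name and the 0/1 label encoding are our own illustrative choices, not part of the authors' labeling software.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label assigned by most experts (Eq. 1).

    labels: list of per-expert labels for one CT scan,
            e.g. [1, 1, 0] for experts A1, A2, A3
            (1 = COVID-19, 0 = healthy control).
    """
    return Counter(labels).most_common(1)[0][0]

# Example: A1 and A2 disagree with each other, A3 breaks the tie.
print(majority_vote([1, 0, 1]))  # -> 1 (COVID-19)
```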
2.2. Preprocessing
The original dataset contains 320 COVID-19 images and 320 HC images. The dataset is symbolized as U1, and each image is symbolized as u1(i), i = 1, …, |U1|, where |U1| = 640. The size of each image is W1 × H1 × C1, with W1 = H1 = 1024 and C1 = 3.
The raw images are not suitable for training deep neural networks, because (i) they contain redundant information in three color channels; (ii) their contrast is incoherent; (iii) they contain background, the checkup bed, and text information; and (iv) their sizes are too large. Fig. 1 shows the preprocessing pipeline of our COVID-19 dataset.
Fig. 1.
Illustration of preprocessing.
First, we converted the color images to grayscale by reserving only the luminance information, and thus obtained the grayscale image set U2 as

U2 = {u2(i)} = gray(U1) = {gray[u1(i)]}, i = 1, …, |U1|   (2)

where gray(·) means the grayscale operation. Now C2 = 1; here W2 = 1024, H2 = 1024.
Second, the histogram stretching (HS) method was used to increase every image's contrast. For the i-th image u2(i), we first calculate its minimum grayscale value μmin(i) and maximum grayscale value μmax(i) by

μmin(i) = min_{x,y} u2(i | x, y)   (3.a)

μmax(i) = max_{x,y} u2(i | x, y)   (3.b)

where (x, y) are the pixel coordinates within the image u2(i). The new histogram-stretched image u3(i) is obtained by

u3(i | x, y) = [u2(i | x, y) − μmin(i)] / [μmax(i) − μmin(i)]   (4)

In all, we get the histogram-stretched image set U3 = {u3(i)}.
Third, we crop the images to remove the text in the margin areas and the checkup bed at the bottom. Thus, we get the cropped dataset U4 as

U4 = {u4(i)} = crop[U3 | (vt, vb, vl, vr)]   (5)

where crop(·) represents the crop operation, and the parameters (vt, vb, vl, vr) are the crop margins, in pixels, from the top, bottom, left, and right. We set vt = vb = vl = vr = 150. Now the size of each image is W4 × H4 × C4, with W4 = H4 = 1024 − 2 × 150 = 724 and C4 = 1.
Fourth, we downsampled each image to size [W5, H5], and we now get the resized image set U5 as

U5 = {u5(i)} = ⇓(U4, [W5, H5])   (6)

where ⇓: x ↦ y means the downsampling (DS) function, y being the downsampled version of the original image x. In this study, W5 = H5 = 256. The advantages of DS are twofold: (i) it saves storage, as shown in Table 2; (ii) a smaller-size dataset helps prevent the following classification system from overfitting. The setting W5 = H5 = 256 was chosen by trial and error: a larger size brings in overfitting, which impairs performance, while a smaller size makes the images blurry, which also decreases the classifier's performance.
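To make the four steps concrete, here is a minimal NumPy/Pillow sketch of the preprocessing pipeline; the function name and the use of Pillow are our own choices, while the 150-pixel crop margin and the 256 × 256 target size follow the values stated above.

```python
import numpy as np
from PIL import Image

def preprocess(path, crop=150, target=(256, 256)):
    """Grayscale -> histogram stretching -> crop -> downsample (Fig. 1)."""
    # Step 1: keep only the luminance channel.
    g = np.asarray(Image.open(path).convert("L"), dtype=np.float32)

    # Step 2: histogram stretching (Eqs. 3-4), rescaling to [0, 1].
    lo, hi = g.min(), g.max()
    g = (g - lo) / (hi - lo + 1e-8)

    # Step 3: crop 150 pixels from every side to drop text / checkup bed.
    g = g[crop:-crop, crop:-crop]          # 1024x1024 -> 724x724

    # Step 4: downsample to 256x256 (Eq. 6).
    img = Image.fromarray((g * 255).astype(np.uint8)).resize(target, Image.BILINEAR)
    return np.asarray(img, dtype=np.float32) / 255.0
```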
Table 2.
Image size and storage per image at each preprocessing step.
| Preprocess | Symbol | W | H | C | Size (per image) | Storage (per image) |
|---|---|---|---|---|---|---|
| Original | u1(i) | 1024 | 1024 | 3 | 3,145,728 | 12,582,912 |
| Grayscale | u2(i) | 1024 | 1024 | 1 | 1,048,576 | 4,194,304 |
| HS | u3(i) | 1024 | 1024 | 1 | 1,048,576 | 4,194,304 |
| Crop | u4(i) | 724 | 724 | 1 | 524,176 | 2,096,704 |
| DS | u5(i) | 256 | 256 | 1 | 65,536 | 262,144 |
Table 2 compares the size and storage of each image at every preprocessing step. We can see that after the preprocessing procedure, each image costs only about 2.08% of its original storage or size. The compression ratio (CR) of the i-th image at the final stage U5 relative to the original stage U1 is calculated as CR(i) = size[u5(i)] / size[u1(i)] = 65,536 / 3,145,728 ≈ 2.08%. Fig. 2(a and b) shows two samples (COVID-19 in a and HC in b) from the preprocessed dataset U5. Fig. 2(c) delineates the lesions of (a) within red circles.
Fig. 2.
Two samples of preprocessed dataset U5.
3. Methodology
The motivation of this study is two-fold. First, we plan to create a custom convolutional neural network, comparable to the state of the art, with several improvements, including batch normalization, dropout, rank-based average pooling, and multiple-way data augmentation. The motivation of using a CNN is to extract individual image-level representations (IIRs).
Nevertheless, a CNN ignores the relations of a particular image with the rest of a group of images. In contrast, this relation-aware representation (RAR) can be captured by a graph convolutional network (GCN). Hence, the second main motivation is (i) to use a GCN to establish connectivity analysis and extract RAR features, and (ii) to fuse CNN features and GCN features together to enhance the classifier's performance.
Using a GCN together with a CNN can obtain better performance than using a CNN alone. Shi, et al. [22] used a GCN for cervical cell classification; their results were significantly better than ResNet-101 and DenseNet-121. Bin, et al. [23] used a GCN for structure-aware human pose estimation; their experiments on single- and multi-person estimation benchmark datasets showed that the GCN consistently outperforms competing state-of-the-art methods. Tian, et al. [24] proposed a novel GCN-based interactive prostate segmentation method for MR images; it yielded mean Dice similarity coefficients of 93.8 ± 1.2% and 94.4 ± 1.0% on their in-house and PROMISE12 datasets, respectively. All three studies show the power of GCNs.
3.1. Basics of CNN
Traditional machine learning has achieved excellent results in disease detection [25,26]. The convolutional neural network (CNN) is a newer type of artificial neural network. Generally, a CNN is composed of conv layers (CLs), pooling layers (PLs), non-linear activation functions (NLAFs), and fully connected layers (FCLs) [27,28].
The essential operation in CNN is convolution. A complete CL performs 2D convolution along the width and height directions [29]. Note that the weights in CNN are initialized randomly, and then weights are learnt from the data itself by network training. Fig. 3 illustrates the pipeline of input feature maps passing across a complete CL.
Fig. 3.
Illustration of a complete conv layer.
Fig. 3 shows that there are three steps in a complete conv layer: (i) kernel-based convolution; (ii) stack; (iii) NLAF. Assume there is an input matrix Γ, kernels Qj, ∀j ∈ [1, ⋅⋅⋅, J], and an output T (here T means the output of the whole three-step complete conv layer, not the output of the convolution operation alone). Note that a "conv layer" means the layer that runs the convolution, while a "complete conv layer" means the combination of the conv layer, the stack, and the NLAF. In Fig. 3, we used the same color to denote the input and the output, because the output is the input of the next conv layer.
For each kernel Qj, the convolution output is

f(j) = Γ ⊗ Qj   (7)

where ⊗ means the convolution operation. Then, all f(j) matrices are stacked into a three-dimensional matrix F:

F = stack[f(1), f(2), ⋯, f(J)]   (8)

where stack(·) means the stack operation. Finally, the matrix F is passed into the NLAF β, which outputs the final matrix [30]

T = β(F)   (9)
We can write the sizes S of the three main components (input, kernel, and output) as

S(Γ) = WΓ × HΓ × CΓ,  S(Qj) = WQ × HQ × CQ,  S(T) = WT × HT × CT   (10)

where the triple (W, H, C) represents the width, height, and number of channels of the matrix, respectively [31]. The subscripts Γ, Q, and T represent input, kernel, and output, respectively. J denotes the total number of filters. Note that CΓ = CQ, which means the channel number of the input CΓ should equal the channel number of the kernel CQ.
Assuming the filters move with padding vp and stride vs, we can get the size (WT × HT × CT) of the output matrix T by simple math as [32]:

WT = ⌊(WΓ − WQ + 2vp) / vs⌋ + 1,  HT = ⌊(HΓ − HQ + 2vp) / vs⌋ + 1,  CT = J   (11)

where ⌊·⌋ represents the floor function. The channel number of the output, CT, equals the number of filters J.
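A small helper illustrating Eq. (11); the function and parameter names are ours. With padding vp = 1 and stride vs = 1, a 3 × 3 kernel preserves the spatial size, which matches the activation-map sizes later listed in Table 6.

```python
from math import floor

def conv_output_size(w_in, h_in, w_k, h_k, pad, stride, n_filters):
    """Spatial size of the output feature map of one conv layer (Eq. 11)."""
    w_out = floor((w_in - w_k + 2 * pad) / stride) + 1
    h_out = floor((h_in - h_k + 2 * pad) / stride) + 1
    return w_out, h_out, n_filters   # channel count equals the number of filters J

# Example: the first conv layer of N(1) in Table 6.
print(conv_output_size(256, 256, 3, 3, pad=1, stride=1, n_filters=32))  # (256, 256, 32)
```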
For the last step, viz., the NLAF β, the rectified linear unit (ReLU) function [33] is usually selected. Suppose fij is an entry of the matrix F; we have

βReLU(fij) = max(0, fij)   (12)

ReLU is preferred to traditional NLAFs such as the sigmoid (SM) function and the hyperbolic tangent (HT), defined below:

βSM(fij) = 1 / (1 + e^(−fij))   (13)

βHT(fij) = (e^(fij) − e^(−fij)) / (e^(fij) + e^(−fij))   (14)
3.2. Improvement 1: batch normalization and dropout
The motivation of batch normalization (BAN) is to solve the “internal covariant shift (ICS)”, which means the effect of randomness of the distribution of inputs to internal CNN layers during training. The existence of ICS will worsen the CNN's performance [34,35].
This study introduced BAN to normalize those internal layers' inputs over every mini-batch Γ (suppose its size is |Γ|), in order to guarantee that the batch-normalized outputs have a uniform distribution. Mathematically, BAN learns a mapping from the mini-batch inputs to the normalized outputs:

BAN: Γ = {γ1, ⋯, γ|Γ|} ↦ T = {t1, ⋯, t|Γ|}   (15)
During training, the empirical mean μe and empirical variance ϕe can be calculated as

μe = (1/|Γ|) Σ_{i=1}^{|Γ|} γi,  ϕe = (1/|Γ|) Σ_{i=1}^{|Γ|} (γi − μe)²   (16)

The input γi ∈ Γ is first normalized to

γ̂i = (γi − μe) / √(ϕe + αs)   (17)
where αs in the denominator of Eq. (17) is a stability factor used to enhance numerical stability. Now the γ̂i have zero-mean and unit-variance characteristics. In order to have a more expressive deep neural network [36] (here "expressive" refers to the network's expressive power, i.e., its ability to express functions), a transformation is usually carried out as

ti = A1 γ̂i + A2   (18)

where A1 and A2 are two learnable parameters during training. The transformed output ti ∈ T is then passed to the next layer, and the normalized γ̂i remains internal to the current layer.
In the inference stage, we no longer have a mini-batch. So instead of calculating μe and ϕe, we calculate the population mean μp and population variance ϕp, and the output at the inference stage is

ti = A1 (γi − μp) / √(ϕp + αs) + A2   (19)
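A minimal NumPy sketch of Eqs. (16)–(19) for one batch-normalized layer; the exponential-moving-average update used to estimate the population statistics is a common convention and our assumption, since the text does not specify how μp and ϕp are tracked.

```python
import numpy as np

class BatchNorm1D:
    """Batch normalization over a mini-batch (Eqs. 16-19)."""

    def __init__(self, dim, alpha_s=1e-5, momentum=0.9):
        self.A1 = np.ones(dim)      # learnable scale
        self.A2 = np.zeros(dim)     # learnable shift
        self.alpha_s = alpha_s      # stability factor
        self.momentum = momentum
        self.mu_p = np.zeros(dim)   # population mean (for inference)
        self.phi_p = np.ones(dim)   # population variance (for inference)

    def forward(self, gamma, training=True):
        if training:
            mu_e = gamma.mean(axis=0)    # empirical mean, Eq. (16)
            phi_e = gamma.var(axis=0)    # empirical variance, Eq. (16)
            # track population statistics with a moving average (our assumption)
            self.mu_p = self.momentum * self.mu_p + (1 - self.momentum) * mu_e
            self.phi_p = self.momentum * self.phi_p + (1 - self.momentum) * phi_e
            gamma_hat = (gamma - mu_e) / np.sqrt(phi_e + self.alpha_s)        # Eq. (17)
        else:
            gamma_hat = (gamma - self.mu_p) / np.sqrt(self.phi_p + self.alpha_s)  # Eq. (19)
        return self.A1 * gamma_hat + self.A2                                  # Eq. (18)
```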
On the other hand, a dropout layer is introduced before the fully-connected layer. It is a regularization technique that randomly drops out neurons during training; dropout helps avoid overfitting of deep neural networks. Srivastava, et al. [37] proposed the concept of dropout neurons (DNs): neurons are randomly dropped from the CNN during training, with their neighboring weights set to zero. Suppose the collection of all fully-connected neurons is {N}, the collection of dropped neurons is {Nd}, and the collection of reserved neurons is {Nr} = {N} \ {Nd}. The selection of DNs is random, with a retention probability αrp defined as:

αrp = |{Nr}| / |{N}|   (20)
Suppose we have a neuron N(i, j) and its corresponding original weights are w(i, j). During training, the neuron's weights wT(i, j) update as:

wT(i, j) = w(i, j) if N(i, j) ∈ {Nr},  wT(i, j) = 0 if N(i, j) ∈ {Nd}   (21)
During inference, we run the entire CNN without dropout, but the weights wI(i, j) of the FCLs that used DNs are downscaled (viz., multiplied) by αrp:

wI(i, j) = αrp × w(i, j)   (22)
The compression ratio of learnable weights (CRLW) is, roughly, the squared value of the retention probability αrp:

CRLW = CD / C ≈ αrp²   (23)

where CD is the total number of learnable weights after dropout, and C is the total number of learnable weights before dropout. Fig. 4 shows a simplistic example, and the detailed analysis is in Appendix C.
Fig. 4.
A simplistic example of a 4-layer DON.
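A sketch of Eqs. (20)–(22) for one fully-connected layer, following the original dropout convention described above (training keeps the raw weights of the reserved neurons, inference rescales by αrp); the array shapes and the function name are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, w, alpha_rp=0.5, training=True):
    """Fully-connected layer with dropout on its input neurons (Eqs. 20-22).

    x: input activations, shape (n_in,)
    w: weight matrix, shape (n_out, n_in)
    """
    if training:
        # Randomly retain each neuron with probability alpha_rp (Eq. 21).
        mask = rng.random(x.shape) < alpha_rp
        return w @ (x * mask)
    # Inference: keep every neuron, downscale by alpha_rp (Eq. 22).
    return w @ (x * alpha_rp)

x = rng.random(8)
w = rng.random((4, 8))
print(dropout_forward(x, w, training=True))
print(dropout_forward(x, w, training=False))
```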
3.3. Improvement 2: rank-based average pooling
The pooling function fundamentally replaces the output of a layer (particularly a conv layer) with a summary statistic of the adjacent outputs at a particular position. Pooling makes the activations in the pooled map less sensitive to the exact locations of structures within the image than those in the original feature map.
For a region to be pooled, Ψ, with size n × n (here n means the pooling size), suppose the pixels within the region are

Ψ = {ψl}, l = 1, 2, ⋯, n²   (24)

The l2-norm pooling (L2P) calculates the l2 norm of the given region Ψ. In this study we add a constant 1/|Ψ| under the square root, where |Ψ| means the number of elements of the region Ψ; this constant does not influence training or inference. Assuming the output pooling matrix is P, the L2P output PL2P(Ψ) is defined as

PL2P(Ψ) = √[(1/|Ψ|) Σ_l ψl²]   (25)
The average pooling (AP) calculates the mean value in the region Ψ as

PAP(Ψ) = (1/|Ψ|) Σ_l ψl   (26)

The max pooling (MP) operates on the region Ψ and selects the maximum value:

PMP(Ψ) = max_l ψl   (27)
Shi, et al. [38] proposed three different rank-based pooling approaches. Their advantages compared to ordinary pooling methods are: (i) the ranking list is invariant under slight changes of the activation values; (ii) important activation values can be easily distinguished by their cognate ranks; and (iii) the use of ranks avoids the scale problems that arise in value-based pooling methods. Among the rank-based pooling approaches, rank-based average pooling (RAP) gives better performance than state-of-the-art tactics and has been applied in many fields. For example, Jiang [39] added RAP to convolutional neural networks for susceptibility-weighted-imaging-based cerebral microbleed detection and obtained a high accuracy of 97.18%. Sun, et al. [40] added RAP between each subspace mapping layer for facial expression recognition; their method is better than the PCANet and LDANet approaches. Akhtar and Ragavendran [41] compared rank-based pooling with traditional pooling methods and stated that the advantage of rank-based pooling is that it can assign ranks and weights to activations simultaneously.
RAP first calculates the rank matrix (RM) based on the value of each element ψl ∈ Ψ; lower ranks rl ∈ [1, 2, ⋅⋅⋅, n²] are assigned to higher values ψl:

ψ(l1) ≥ ψ(l2) ⇒ r(l1) ≤ r(l2)   (28)

For tied values (ψ(l1) = ψ(l2)), a constraint is added to Eq. (28):

ψ(l1) = ψ(l2), l1 < l2 ⇒ r(l1) < r(l2)   (29)

The RAP output of the input Ψ is PRAP(Ψ), which uses the ag greatest activations:

PRAP(Ψ) = (1/ag) Σ_{l: rl ≤ ag} ψl   (30)

where ag is the rank threshold. If ag = 1, RAP degrades to MP; on the other side, if ag = n², RAP degrades to AP. Therefore, RAP is regarded as a trade-off between average pooling and max pooling. Note that L2P, AP, MP, and RAP work on every slice separately. Fig. 5 shows a simplistic example of the four pooling techniques, and the explanation is given in Appendix D.
Fig. 5.
A simplistic example of four pooling technologies (L2P, AP, MP, and RAP).
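The four pooling operators of Eqs. (25)–(30) on one n × n region can be sketched in a few lines of NumPy; tie-breaking by position follows Eq. (29), and with the study's setting ag = 2 the RAP output averages the two largest activations. The function names are ours.

```python
import numpy as np

def l2_pool(psi):            # Eq. (25): l2 norm with the 1/|Psi| constant
    return np.sqrt(np.mean(psi ** 2))

def avg_pool(psi):           # Eq. (26)
    return np.mean(psi)

def max_pool(psi):           # Eq. (27)
    return np.max(psi)

def rap_pool(psi, a_g=2):
    """Rank-based average pooling (Eq. 30): mean of the a_g highest activations."""
    flat = psi.ravel()
    # Stable sort on -value: higher value -> lower rank; ties broken by position (Eq. 29).
    order = np.argsort(-flat, kind="stable")
    return flat[order[:a_g]].mean()

psi = np.array([[1.0, 3.0],
                [2.0, 2.0]])
print(l2_pool(psi), avg_pool(psi), max_pool(psi), rap_pool(psi, a_g=2))
```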
3.4. Improvement 3: multiple-way data augmentation
To circumvent the small-size dataset (SSD) and lack of generation (LG) problems, there are four possible types of solutions: data augmentation (DA), data generation (DG), ensemble approaches (EA), and regularization.
DA generates new images by perturbing existing data, e.g., by cropping or rotation. DG creates data from a sampled data source; the synthetic minority over-sampling technique (SMOTE) [42] is a typical DG algorithm. EA methods use multiple models to obtain better predictive performance than any single model alone [43]. Regularization mainly constrains the weights of the models: large weights make a model unstable, because minor variations in the inputs yield large differences in the output; smaller weights are regarded as more regular (i.e., less specialized), hence this type of technique is called weight regularization. DA is used here due to its simplicity and ease of realization.
We proposed an ηDA-way multiple-way data augmentation (MDA) technology. The difference between our MDA and traditional DA is that we use multiple (ηDA > 10) DA techniques. Assume the preprocessed dataset is U5 = {u5(i)}. The dataset U5 is divided into three sets:

U5 = Xt ∪ Xv ∪ Y   (31)

where Xt is the training set, Xv the validation set, and Y the test set. Meanwhile, the sum of the sizes of the training, validation, and test sets equals the size of the preprocessed dataset: |Xt| + |Xv| + |Y| = |U5|.
From the whole training image set Xt, we first performed the following seven DA techniques with different MDA factors χ (a code sketch follows this list). Note that each MDA technique generates ηn new images per training image. The output MDA training set is symbolized as XtD.
(i) Rotation. The rotation-angle vector χR skips the value of 0:

xR(i) = R[xt(i) | χR]   (32)

where R means the rotation operation.

(ii) Noise injection. Gaussian noises with mean μN and variance ϕN, with probability density function

p(z) = [1 / √(2π ϕN)] exp[−(z − μN)² / (2 ϕN)]   (33)

were added to all training images to produce ηn new noised images, where z is the gray level and p is the probability density function. We have

xN(i) = N[xt(i)]   (34)

where N means the noise-injection operation. We used Gaussian noise in this study because it is the most common type found in images, compared to impulse noise, speckle noise, and salt-and-pepper noise.

(iii) Horizontal shear (HS) transform. New images were generated by the HS transform

xH(i) = H[xt(i) | χH]   (35)

where H means the HS transform. The HS factor vector χH skips the value of 0. Mathematically, if the original coordinates are (u, v) and the HS-transformed coordinates are (u1, v1), then we have

u1 = u + χH v,  v1 = v   (36)

Clearly, the HS transform is a special affine transform, which can be written as

[u1; v1; 1] = [1, χH, 0; 0, 1, 0; 0, 0, 1] × [u; v; 1]   (37)

(iv) Vertical shear (VS) transform.

xV(i) = V[xt(i) | χV]   (38)

where V means the VS transform, which runs similarly to the HS transform. In particular, the VS factor vector is the same as the HS factor vector, χV = χH.

(v) Random translation (RT). All training images xt(i) were translated ηn times with a random horizontal shift ɛx and a random vertical shift ɛy, both of which lie in the range [−aZ, aZ] and obey a uniform distribution:

xRT(i) = RT[xt(i) | ɛx, ɛy]   (39)

where aZ is the maximum shift factor. So, we have

ɛx ~ U(−aZ, aZ),  ɛy ~ U(−aZ, aZ)   (40)

(vi) Gamma correction (GC). The GC factor vector χG skips the value of 1:

xG(i) = G[xt(i) | χG]   (41)

where G means the GC operation.

(vii) Scaling. All training images {xt(i)} were scaled with the scaling factor vector χS, skipping the value of 1:

xS(i) = S[xt(i) | χS]   (42)

(viii) Mirror and concatenation. All the above results are mirrored:

xM(i) = M[x(i)]   (43)

where M represents the mirror function. All the ηDA-way results are finally concatenated as

xtD(i) = concat[xt(i), xR(i), xN(i), xH(i), xV(i), xRT(i), xG(i), xS(i), and their mirrored versions]   (44)

where concat(·) means concatenation and ηEF is the enhance factor, meaning the ratio of the size of the enhanced training set to that of the original training set. ηEF is defined as

ηEF = |XtD| / |Xt|   (45)

With ηDA = 14 techniques (the seven transforms plus their mirrored versions), ηn = 30 new images per technique, and the original image itself, we can calculate ηEF = ηDA × ηn + 1 = 421. Thus, the MDA can be regarded as a function making the enhanced training set ηEF times as large as the raw training set Xt:

|XtD| = ηEF × |Xt|   (46)
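A compressed sketch of the MDA procedure for a single training image, using NumPy/Pillow operations as stand-ins for the exact implementations; only four of the seven transforms are spelled out (shear and scaling follow the same pattern), the random factor ranges are illustrative assumptions, and ηn = 30 and aZ = 25 follow Table 4.

```python
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)

def augment_one(img, eta_n=30, a_z=25):
    """Generate augmented variants of one training image (Section 3.4).

    img: 2-D float array in [0, 1]. Mirroring every result doubles the count,
    as in Eqs. (43)-(44).
    """
    pil = Image.fromarray((img * 255).astype(np.uint8))
    out = []
    for _ in range(eta_n):
        # (i) rotation, skipping angle 0
        angle = rng.uniform(1, 30) * rng.choice([-1, 1])
        out.append(np.asarray(pil.rotate(angle), dtype=np.float32) / 255.0)
        # (ii) Gaussian noise injection, mean 0, std 0.1 (variance 0.01)
        out.append(np.clip(img + rng.normal(0.0, 0.1, img.shape), 0, 1))
        # (v) random translation, shifts drawn uniformly from [-a_z, a_z]
        dx, dy = rng.integers(-a_z, a_z + 1, size=2)
        out.append(np.roll(np.roll(img, dy, axis=0), dx, axis=1))
        # (vi) gamma correction (illustrative gamma range)
        out.append(img ** rng.uniform(0.4, 2.5))
    # (viii) mirror every augmented image and concatenate
    return out + [np.fliplr(x) for x in out]

aug = augment_one(rng.random((256, 256)))
print(len(aug))   # 4 techniques x 30 copies, then doubled by mirroring = 240
```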
3.5. Improvement 4: deep feature fusion by graph convolutional network
To further improve the performance, we introduced a deep feature fusion (DFF) method based on a graph convolutional network (GCN). The GCN helps find the relation-aware representation (RAR) [44]; we thus fuse the RAR from the GCN with the IIR from the CNN.
Consider a given graph G = (V, E) with |V| nodes and corresponding links (vi, vj) ∈ E. We can define an adjacency matrix A of size |V| × |V| which embeds the relationships of all nodes. The purpose of the GCN is to encode the graph G via a neural network model f(X, A), where X is the |V| × D node-feature matrix and D is the feature dimension of each node [45]. Note that AX sums all neighboring node features, so the GCN can capture the RAR information [46].
A multi-layer GCN updates the node features with the following layer-wise rule:

H(L+1) = fReLU(Â H(L) W(L))   (47)

where Â represents the normalized version of the adjacency matrix A, fReLU is the ReLU function, and H(L) is the feature representation at the L-th layer [47].
To carry out the normalization, we first calculate the degree matrix δ of size |V| × |V|, which is a diagonal matrix:

δ(i, i) = Σ_j A(i, j)   (48)

The normalized Â is obtained from the original adjacency matrix A and the degree matrix δ [48].
Note that the input is H(0) = X, so for the two-layer GCN shown in Fig. 6, we have

H(1) = fReLU(Â X W(0))   (49.a)

H(2) = fReLU(Â H(1) W(1))   (49.b)

where W(0) (of size d0 × d1) and W(1) (of size d1 × d2) are two trainable weight matrices. (d0, d1, d2) are hyperparameters that will be set in the experiment.
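A NumPy sketch of the two-layer propagation in Eqs. (47)–(49); the symmetric normalization with added self-loops is the common GCN convention and is our assumption about the exact form of Eq. (48).

```python
import numpy as np

def normalize_adjacency(A):
    """Normalize A with the degree matrix (Eq. 48). Adding self-loops and using
    the symmetric D^(-1/2) (A + I) D^(-1/2) form is a common GCN convention; the
    exact form is not fully spelled out in the text."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def two_layer_gcn(X, A, W0, W1):
    """Two-layer GCN forward pass (Eqs. 49.a-49.b)."""
    A_norm = normalize_adjacency(A)
    H1 = np.maximum(0.0, A_norm @ X @ W0)     # H(1) = ReLU(A_hat X W(0))
    H2 = np.maximum(0.0, A_norm @ H1 @ W1)    # H(2) = ReLU(A_hat H(1) W(1))
    return H2

# Toy example: |V| = 4 nodes with d0 = 120, d1 = 60, d2 = 120 as in Table 4.
rng = np.random.default_rng(0)
A = (rng.random((4, 4)) > 0.5).astype(float)
A = np.maximum(A, A.T)                        # make the adjacency symmetric
X = rng.random((4, 120))
print(two_layer_gcn(X, A, rng.random((120, 60)), rng.random((60, 120))).shape)  # (4, 120)
```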
Fig. 6.

Illustration of a two-layer GCN. (Different color cylinders mean different cluster centroids).
In our COVID-19 classification task, the GCN is fused with the previous CNN models N(1)–N(4). The last FCL in the previous CNN models is used as the individual image-level representation (IIR) I. Afterwards, k-means clustering (KMC) is performed on those image-level representation features, and we get |V| cluster centroids X = {X1, ⋯, X|V|}, which serve as the graph nodes. The clustering correlation shows the potential relationships among images. The adjacency matrix A of size |V| × |V| is defined as

A(i, j) = 1 if Xj ∈ Δ(Xi) ∨ Xi ∈ Δ(Xj), and A(i, j) = 0 otherwise   (50)

where ∨ means the operation "or" and Δ means the k-nearest neighbors (kNN) based on cosine similarity. The number of neighbors in the kNN is a hyperparameter (set to 7 in this study; see Table 4).
Fig. 7 shows an example: node j is among the three nearest neighbors of node i, whereas node i is not among the three nearest neighbors of node j, so Xj ∈ Δ(Xi) and Xi ∉ Δ(Xj). Using the "or" operation we can conclude that A(i, j) = A(j, i) = 1. The node features X and the adjacency matrix A are sent into the two-layer GCN, and we get the RAR features H(2). The fusion between H(2) and I is accomplished by dot-product fusion; note that we need to set d2 equal to the dimension of I:
F = I · (H(2))ᵀ   (51)
Fig. 7.
Illustration of KNN-based adjacency matrix.
A linear projection (LP) with learnable weight W(2) of size |V| × C, where C means the number of categories, gives

z = F W(2) + b   (52)

where z has C elements and b represents the bias. C = 2 in this study because our task is a binary classification problem, i.e., COVID-19 versus healthy people. Finally, a softmax operation is performed on z, and the cross-entropy (CE) loss is calculated. Algorithm 1 shows the proposed deep feature fusion algorithm. During the inference stage, the CNN's IIR features are obtained, and the corresponding GCN RAR features are obtained via the pre-constructed graph and the trained two-layer GCN. Using both CNN and GCN, each image is represented by the fusion of its individual image-level representation and its relation-aware representation [22]. Fig. 8 shows the fusion flowchart.
Algorithm 1.
Proposed deep feature fusion.
| Input: IIR Feature I from CNN models |
| Algorithm of DFF |
| Step 1: Create RAR features H(2) from pre-constructed two-layer GCN model |
| Step 2: Dot product fusion combining IIR features and RAR features, |
| Step 3: Linear Projection, |
| Step 4: Softmax and cross-entropy (CE) loss |
Fig. 8.
Flowchart of deep feature fusion strategy. (LP = Linear Projection; CE = Cross Entropy).
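Putting Algorithm 1 together, the sketch below builds the kNN adjacency matrix of Eq. (50) on the k-means centroids of the IIR features, fuses the (pre-computed) RAR features H(2) with the IIR features by dot product, and applies the linear projection of Eq. (52). The matrix shapes, helper names, and the use of scikit-learn/SciPy are our assumptions; H2 here is a random stand-in for the output of the two-layer GCN sketched above.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def build_knn_adjacency(centroids, k=7):
    """Eq. (50): A(i, j) = 1 if X_j is among the k cosine-nearest neighbours
    of X_i, or vice versa (the 'or' makes A symmetric)."""
    sim = 1.0 - cdist(centroids, centroids, metric="cosine")
    np.fill_diagonal(sim, -np.inf)                 # exclude self-matches
    A = np.zeros_like(sim)
    nearest = np.argsort(-sim, axis=1)[:, :k]
    for i, neigh in enumerate(nearest):
        A[i, neigh] = 1.0
    return np.maximum(A, A.T)

def deep_feature_fusion(iir, H2, W2, b):
    """Dot-product fusion of IIR and RAR features, then linear projection
    (Algorithm 1, steps 2-3). Assumed shapes: iir (batch, 120), H2 (|V|, 120),
    W2 (|V|, 2)."""
    fused = iir @ H2.T                             # (batch, |V|), Eq. (51)
    return fused @ W2 + b                          # logits z, Eq. (52)

# Toy run with |V| = 8 centroids instead of 256, to keep the example small.
rng = np.random.default_rng(0)
iir_train = rng.random((100, 120))                 # FCL-1 (IIR) features of training images
centroids = KMeans(n_clusters=8, n_init=10, random_state=0).fit(iir_train).cluster_centers_
A = build_knn_adjacency(centroids, k=7)
H2 = rng.random((8, 120))                          # stand-in for the two-layer GCN output
logits = deep_feature_fusion(rng.random((4, 120)), H2, rng.random((8, 2)), np.zeros(2))
print(A.shape, logits.shape)                       # (8, 8) (4, 2) -> softmax + CE loss
```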
3.6. Summary of proposed eight networks
In total, we proposed eight new networks [N(1), ⋅⋅⋅, N(8)]: (i) We first designed a base network N(1), called the base network (BN). (ii) We added batch normalization (BAN) and dropout (DO) techniques and obtained the improved network N(2), named "BDBN". (iii) Next, we developed N(3), termed BDRBN, by introducing rank-based average pooling (RAP) to replace the traditional max pooling (MP) in N(2). (iv) Multiple-way data augmentation (MDA) was proposed and added to N(3), so we get the new network N(4), abbreviated BDRMBN.
For the remaining four networks, we add a new deep feature fusion (DFF) approach that combines the features from the above networks [N(1), N(2), N(3), N(4)] with the RAR features from the two-layer GCN. The short names of [N(5), N(6), N(7), N(8)] are DBN, DBDBN, DBDRBN, and DBDRMBN, respectively. Table 3 lists all eight proposed networks and their relationships.
Table 3.
Eight proposed networks.
| Index | Inheritance | Short Name | Description |
|---|---|---|---|
| N(1) | BN | Base Network | |
| N(2) | ←N(1)+BAN+DO | BDBN | Add BAN and DO to N(1) |
| N(3) | ←N(2)-MP+RAP | BDRBN | Use RAP to replace MP in N(2) |
| N(4) | ←N(3)+MDA | BDRMBN | Add MDA to N(3) |
| N(5) | ←N(1)+DFF | DBN | Add DFF to N(1) |
| N(6) | ←N(2)+DFF | DBDBN | Add DFF to N(2) |
| N(7) | ←N(3)+DFF | DBDRBN | Add DFF to N(3) |
| N(8) | ←N(4)+DFF | DBDRMBN (FGCNet*) | Add DFF to N(4) |
(* Experiment below shows N(8) gives the best performance, and we name it as FGCNet).
As expected, the best model should be among [N(5), ⋅⋅⋅, N(8)], since they fuse the features from the CNNs with the GCN. The best model is given the formal name FGCNet, which means Fusion of GCN and CNN networks.
The configuration of the proposed networks was designed by trial and error. We denote the number of conv layers as uc and the number of FCL layers as uf. The details of the base network can be found in Table 6.
Table 6.
Hyperparameters of N(1).
| Index | Layer | Hyperparameter | Size of activation map |
|---|---|---|---|
| 1 | Input | 256 × 256 × 1 | |
| 2 | Conv-1 | 32 3 × 3 | 256 × 256 × 32 |
| 3 | P-1 | /2 | 128 × 128 × 32 |
| 4 | Conv-2 | 64 3 × 3 | 128 × 128 × 64 |
| 5 | P-2 | /2 | 64 × 64 × 64 |
| 6 | Conv-3 | 128 3 × 3 | 64 × 64 × 128 |
| 7 | P-3 | /2 | 32 × 32 × 128 |
| 8 | Conv-4 | 128 3 × 3 | 32 × 32 × 128 |
| 9 | P-4 | /2 | 16 × 16 × 128 |
| 10 | Conv-5 | 256 3 × 3 | 16 × 16 × 256 |
| 11 | P-5 | /2 | 8 × 8 × 256 |
| 12 | Conv-6 | 256 3 × 3 | 8 × 8 × 256 |
| 13 | P-6 | /2 | 4 × 4 × 256 |
| 14 | Conv-7 | 512 3 × 3 | 4 × 4 × 512 |
| 15 | P-7 | /2 | 2 × 2 × 512 |
| 16 | Flatten | 1 × 1 × 2048 | |
| 17 | FCL-1 | 120 × 2048, 120 × 1 | 1 × 1 × 120 |
| 18 | FCL-2 | 2 × 120, 2 × 1 | 1 × 1 × 2 |
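A PyTorch sketch of the base network N(1) following Table 6; max pooling is used as in N(1) and N(2) (RAP replaces it in N(3)), and the batch-normalization, dropout, and MDA improvements of the later networks are omitted. The class name and the plain nn.Sequential layout are our own.

```python
import torch
import torch.nn as nn

class BaseNetwork(nn.Module):
    """Sketch of N(1) following Table 6: seven 3x3 conv blocks, each followed by
    a /2 pooling, then two fully-connected layers (120 -> 2)."""

    def __init__(self):
        super().__init__()
        channels = [1, 32, 64, 128, 128, 256, 256, 512]   # Conv-1 ... Conv-7
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(2)]                    # RAP replaces this in N(3)
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.Sequential(
            nn.Flatten(),                                  # 2 x 2 x 512 = 2048
            nn.Linear(2048, 120), nn.ReLU(inplace=True),   # FCL-1 (the IIR features)
            nn.Linear(120, 2))                             # FCL-2; softmax is in the loss

    def forward(self, x):                                  # x: (batch, 1, 256, 256)
        return self.classifier(self.features(x))

print(BaseNetwork()(torch.zeros(1, 1, 256, 256)).shape)    # torch.Size([1, 2])
```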
3.7. Measures
The algorithm runs W times, which helps to reduce randomness. At each run w ∈ [1, W], the ideal confusion matrix Ei and the realistic confusion matrix Er over the validation set are

Ei(w) = [|Xv|/2, 0; 0, |Xv|/2]   (53)

where the diagonal elements are |Xv|/2 because we have a balanced dataset. If the confusion matrix is computed on the test set, the diagonal elements become |Y|/2. In a realistic situation, suppose we have a confusion matrix

Er(w) = [a1(w), a2(w); a3(w), a4(w)]   (54)

where the four variables {a1(w), a2(w), a3(w), a4(w)} represent TP, FN, FP, and TN at the w-th run, respectively. Here P means COVID-19 and N means healthy lung. TP means a COVID-19 image is classified correctly as COVID-19, FN means a COVID-19 image is wrongly classified as healthy, FP means a healthy lung is wrongly classified as COVID-19, and TN means a healthy lung is classified correctly as healthy. It is obvious that a1(w) + a2(w) = a3(w) + a4(w) = |Xv|/2 if the confusion matrix is computed on the validation set.
Four simple measures {ν1(w), ν2(w), ν3(w), ν4(w)} are defined below, where ν1 means sensitivity, ν2 specificity, ν3 precision, and ν4 accuracy:

ν1(w) = a1(w) / [a1(w) + a2(w)]   (55)

ν2(w) = a4(w) / [a3(w) + a4(w)]   (56)

ν3(w) = a1(w) / [a1(w) + a3(w)]   (57)

ν4(w) = [a1(w) + a4(w)] / [a1(w) + a2(w) + a3(w) + a4(w)]   (58)
The F1 score at the w-th run is defined as ν5(w):

ν5(w) = 2 a1(w) / [2 a1(w) + a2(w) + a3(w)]   (59)

Besides, the F1 score can be expressed in terms of sensitivity and precision as

ν5(w) = 2 ν3(w) ν1(w) / [ν3(w) + ν1(w)]   (60)

The F1 score ν5 is the harmonic mean of the precision ν3 and the sensitivity ν1. The range of the F1 score is [0, 1]: the highest possible value 1 indicates perfect precision ν3 and sensitivity ν1, and the lowest possible value 0 means either the precision ν3 or the sensitivity ν1 is zero.
The Matthews correlation coefficient (MCC) at the w-th run, ν6(w), is defined as

ν6(w) = [a1(w) a4(w) − a3(w) a2(w)] / ɛ(w)   (61)

where ɛ(w) is a temporary variable defined as ɛ(w) = √{[a1(w)+a3(w)] [a1(w)+a2(w)] [a4(w)+a3(w)] [a4(w)+a2(w)]}.
Finally, the Fowlkes–Mallows index (FMI) at the w-th run, ν7(w), can be defined as:

ν7(w) = √{ a1(w)/[a1(w)+a3(w)] × a1(w)/[a1(w)+a2(w)] }   (62)

FMI can be expressed in terms of sensitivity and precision as ν7 = √(ν3 ν1).
After recording the seven indicators of all W runs, we calculate the mean and standard deviation (MSD) of each m-th measure (∀m ∈ [1, 7]) as

mean(νm) = (1/W) Σ_{w=1}^{W} νm(w)   (63)

SD(νm) = √{ [1/(W−1)] Σ_{w=1}^{W} [νm(w) − mean(νm)]² }   (64)

The final results over the W runs are reported in the Mean ± SD format.
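A small helper computing the seven indicators of Eqs. (55)–(62) from one confusion matrix, and the Mean ± SD summary of Eqs. (63)–(64) over several runs; the sample (W − 1) denominator in the standard deviation is our assumption, and the example confusion matrices are invented for illustration.

```python
import numpy as np

def measures(tp, fn, fp, tn):
    """Seven indicators v1..v7 (Eqs. 55-62) from one confusion matrix."""
    sen = tp / (tp + fn)                        # v1 sensitivity
    spc = tn / (tn + fp)                        # v2 specificity
    prc = tp / (tp + fp)                        # v3 precision
    acc = (tp + tn) / (tp + fn + fp + tn)       # v4 accuracy
    f1 = 2 * prc * sen / (prc + sen)            # v5 F1 score
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))   # v6 MCC
    fmi = np.sqrt(prc * sen)                    # v7 Fowlkes-Mallows index
    return np.array([sen, spc, prc, acc, f1, mcc, fmi])

# Mean +/- SD over W runs (Eqs. 63-64); each row is one run's (TP, FN, FP, TN).
runs = np.array([[60, 4, 3, 61],
                 [62, 2, 5, 59]], dtype=float)  # invented numbers for illustration
results = np.array([measures(*row) for row in runs])
print(results.mean(axis=0))
print(results.std(axis=0, ddof=1))              # sample SD (our assumption)
```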
3.8. Proposed algorithm
Algorithm 2 presents the pseudocode of our algorithm, composed of one input, one output, and five phases. Phase I presents the preprocessing. Phase II presents how to construct the eight network models. Phase III details the W runs over the validation set. Phase IV presents how to select the best network model based on the validation performance. Phase V shows how to calculate the test performance using the best network model.
Algorithm 2.
Pseudocode of our algorithm.
4. Experiments and results
4.1. Hyperparameter setting
Table 4 shows the hyperparameter settings in this study. Most of the values were set by trial and error. The stability factor αs and the retention probability αrp = 0.5 are listed in Table 4. The pooling size n is set to 2. The rank threshold ag is set to 2. The number of DA techniques ηDA and the number of new images per DA technique ηn are set to 14 and 30, respectively. The maximum shift factor aZ is 25, and the mean and variance of the noise injection are set to 0 and 0.01, respectively. The rotation parameter vector χR, horizontal shear parameter vector χH, gamma correction parameter vector χG, and scaling parameter vector χS are also listed there. The enhance factor ηEF is calculated as 421. The numbers of conv layers and fully connected layers are set as uc = 7 and uf = 2, respectively. The number of cluster centroids |V| is set to 256. The feature dimensions in the GCN are set as d0 = 120, d1 = 60, and d2 = 120. The number of neighbors in the kNN is set to 7. The number of runs W is 10, since it is a default value used in many other publications.
Table 4.
Hyperparameter setting.
| Parameter | Value |
|---|---|
| αs | |
| αrp | 0.5 |
| n | 2 |
| ag | 2 |
| ηDA | 14 |
| ηn | 30 |
| aZ | 25 |
| χR | . |
| 0 | |
| 0.01 | |
| χH | . |
| χG | |
| χS | . |
| ηEF | 421 |
| uc | 7 |
| uf | 2 |
| |V| | 256 |
| d0 | 120 |
| d1 | 60 |
| d2 | 120 |
| 7 | |
| W | 10 |
Table 5 shows the training, MDA training, validation, and test sets, where we can see that the total size of the training set is |Xt| = 320. The total size of the enhanced MDA training set is |XtD| = 134,720. The validation set's and test set's sizes are |Xv| = 128 and |Y| = 192, respectively. In total, the size of the whole dataset is 640.
Table 5.
Training, validation, and test set.
| Set | Symbol | COVID C | Healthy H |
|---|---|---|---|
| Training | Xt | 160 | 160 |
| MDA Training | XtD | 67,360 | 67,360 |
| Validation | Xv | 64 | 64 |
| Test | Y | 96 | 96 |
| Total (excl. MDA) | | 320 | 320 |
4.2. Base network configuration
The top row of Fig. 9(a) shows the activation maps of the proposed base network N(1). Here, the size of the input is 256 × 256 × 1, and the output of the first conv layer (C1) is 256 × 256 × 32. Then, after the first pooling (P1), the output is 128 × 128 × 32. We repeat the conv-pooling blocks seven times in total, and the output is S15 of size 2 × 2 × 512. S15 was flattened into one column vector S16 of 2048 elements and passed into two fully-connected blocks (the first block contains an FCL and ReLU, the last block contains an FCL and softmax), giving the outputs S17 of size 1 × 1 × 120 and S18 of size 1 × 1 × 2. All 18 matrices S(k), k ∈ [1, 18], correspond to the cuboids in Fig. 9(b). Note that S17 will be used as the IIR features. The hyperparameters of N(1) are presented in Table 6. Based on N(1), we can create the remaining seven networks.
Fig. 9.
Block chart of first three proposed networks.
4.3. Illustration of MDA
Fig. 10 shows the MDA results; the original image is Fig. 2(a). We can observe that one image yields 421 images in the enhanced training set. This is why we call our algorithm multiple-way data augmentation (MDA).
Fig. 10.
Results of proposed MDA.
4.4. Comparison among proposed networks
Table 7 gives the results of 10 runs using N(1) to N(4). N(1) is BN, N(2) is BDBN, N(3) is BDRBN, and N(4) is BDRMBN. Table 7 clearly shows that the N(1) model yielded the following seven performances: ν1 = 89.22 ± 2.38, ν2 = 92.50 ± 2.31, ν3 = 92.31 ± 2.10, ν4 = 90.86 ± 1.28, ν5 = 90.70 ± 1.31, ν6 = 81.82 ± 2.53, and ν7 = 90.73 ± 1.30. The definitions of ν can be found in Section 3.7.
Table 7.
Comparison among N(1–4) over validation set.
| N(1) | ν1 | ν2 | ν3 | ν4 | ν5 | ν6 | ν7 |
|---|---|---|---|---|---|---|---|
| 1 | 89.06 | 93.75 | 93.44 | 91.41 | 91.20 | 82.90 | 91.23 |
| 2 | 93.75 | 92.19 | 92.31 | 92.97 | 93.02 | 85.95 | 93.03 |
| 3 | 90.63 | 87.50 | 87.88 | 89.06 | 89.23 | 78.16 | 89.24 |
| 4 | 89.06 | 92.19 | 91.94 | 90.63 | 90.48 | 81.29 | 90.49 |
| 5 | 89.06 | 92.19 | 91.94 | 90.63 | 90.48 | 81.29 | 90.49 |
| 6 | 90.63 | 92.19 | 92.06 | 91.41 | 91.34 | 82.82 | 91.34 |
| 7 | 89.06 | 93.75 | 93.44 | 91.41 | 91.20 | 82.90 | 91.23 |
| 8 | 89.06 | 95.31 | 95.00 | 92.19 | 91.94 | 84.54 | 91.98 |
| 9 | 87.50 | 90.63 | 90.32 | 89.06 | 88.89 | 78.16 | 88.90 |
| 10 | 84.38 | 95.31 | 94.74 | 89.84 | 89.26 | 80.17 | 89.41 |
| MSD | 89.22±2.38 | 92.50±2.31 | 92.31±2.10 | 90.86±1.28 | 90.70±1.31 | 81.82±2.53 | 90.73±1.30 |
| N(2) | ν1 | ν2 | ν3 | ν4 | ν5 | ν6 | ν7 |
| 1 | 93.75 | 95.31 | 95.24 | 94.53 | 94.49 | 89.07 | 94.49 |
| 2 | 95.31 | 96.88 | 96.83 | 96.09 | 96.06 | 92.20 | 96.07 |
| 3 | 93.75 | 95.31 | 95.24 | 94.53 | 94.49 | 89.07 | 94.49 |
| 4 | 95.31 | 93.75 | 93.85 | 94.53 | 94.57 | 89.07 | 94.58 |
| 5 | 95.31 | 96.88 | 96.83 | 96.09 | 96.06 | 92.20 | 96.07 |
| 6 | 95.31 | 95.31 | 95.31 | 95.31 | 95.31 | 90.63 | 95.31 |
| 7 | 93.75 | 92.19 | 92.31 | 92.97 | 93.02 | 85.95 | 93.03 |
| 8 | 92.19 | 96.88 | 96.72 | 94.53 | 94.40 | 89.16 | 94.43 |
| 9 | 93.75 | 89.06 | 89.55 | 91.41 | 91.60 | 82.90 | 91.63 |
| 10 | 93.75 | 95.31 | 95.24 | 94.53 | 94.49 | 89.07 | 94.49 |
| MSD | 94.22±1.05 | 94.69±2.47 | 94.71±2.29 | 94.45±1.40 | 94.45±1.34 | 88.93±2.78 | 94.46±1.33 |
| N(3) | ν1 | ν2 | ν3 | ν4 | ν5 | ν6 | ν7 |
| 1 | 96.88 | 92.19 | 92.54 | 94.53 | 94.66 | 89.16 | 94.68 |
| 2 | 95.31 | 93.75 | 93.85 | 94.53 | 94.57 | 89.07 | 94.58 |
| 3 | 93.75 | 93.75 | 93.75 | 93.75 | 93.75 | 87.50 | 93.75 |
| 4 | 93.75 | 95.31 | 95.24 | 94.53 | 94.49 | 89.07 | 94.49 |
| 5 | 93.75 | 98.44 | 98.36 | 96.09 | 96.00 | 92.29 | 96.03 |
| 6 | 95.31 | 98.44 | 98.39 | 96.88 | 96.83 | 93.80 | 96.84 |
| 7 | 96.88 | 95.31 | 95.38 | 96.09 | 96.12 | 92.20 | 96.13 |
| 8 | 92.19 | 96.88 | 96.72 | 94.53 | 94.40 | 89.16 | 94.43 |
| 9 | 95.31 | 96.88 | 96.83 | 96.09 | 96.06 | 92.20 | 96.07 |
| 10 | 92.19 | 93.75 | 93.65 | 92.97 | 92.91 | 85.95 | 92.92 |
| MSD | 94.53±1.69 | 95.47±2.14 | 95.47±2.05 | 95.00±1.23 | 94.98±1.23 | 90.04±2.47 | 94.99±1.23 |
| N(4) | ν1 | ν2 | ν3 | ν4 | ν5 | ν6 | ν7 |
| 1 | 96.88 | 98.44 | 98.41 | 97.66 | 97.64 | 95.32 | 97.64 |
| 2 | 96.88 | 96.88 | 96.88 | 96.88 | 96.88 | 93.75 | 96.88 |
| 3 | 96.88 | 96.88 | 96.88 | 96.88 | 96.88 | 93.75 | 96.88 |
| 4 | 98.44 | 98.44 | 98.44 | 98.44 | 98.44 | 96.88 | 98.44 |
| 5 | 93.75 | 98.44 | 98.36 | 96.09 | 96.00 | 92.29 | 96.03 |
| 6 | 96.88 | 96.88 | 96.88 | 96.88 | 96.88 | 93.75 | 96.88 |
| 7 | 95.31 | 98.44 | 98.39 | 96.88 | 96.83 | 93.80 | 96.84 |
| 8 | 95.31 | 95.31 | 95.31 | 95.31 | 95.31 | 90.63 | 95.31 |
| 9 | 95.31 | 93.75 | 93.85 | 94.53 | 94.57 | 89.07 | 94.58 |
| 10 | 96.88 | 96.88 | 96.88 | 96.88 | 96.88 | 93.75 | 96.88 |
| MSD | 96.25±1.32 | 97.03±1.55 | 97.03±1.52 | 96.64±1.11 | 96.63±1.10 | 93.30±2.21 | 96.63±1.10 |
For N(2), the performances improved to ν1 = 94.22 ± 1.05, ν2 = 94.69 ± 2.47, ν3 = 94.71 ± 2.29, ν4 = 94.45 ± 1.40, ν5 = 94.45 ± 1.34, ν6 = 88.93 ± 2.78, and ν7 = 94.46 ± 1.33. Comparing these with the results of BN, i.e., N(1), we can observe the effectiveness of using batch normalization and dropout.
Furthermore, N(3) yields ν1 = 94.53 ± 1.69, ν2 = 95.47 ± 2.14, ν3 = 95.47 ± 2.05, ν4 = 95.00 ± 1.23, ν5 = 94.98 ± 1.23, ν6 = 90.04 ± 2.47, and ν7 = 94.99 ± 1.23. Comparing the seven indicators between BDBN, i.e., N(2), and BDRBN, i.e., N(3), we can conclude that rank-based average pooling gives better performance than the max pooling used in N(2).
Finally, N(4) in Table 7 obtained ν1 = 96.25 ± 1.32, ν2 = 97.03 ± 1.55, ν3 = 97.03 ± 1.52, ν4 = 96.64 ± 1.11, ν5 = 96.63 ± 1.10, ν6 = 93.30 ± 2.21, and ν7 = 96.63 ± 1.10. The performance increase of BDRMBN, i.e., N(4), over BDRBN, i.e., N(3), indicates that multiple-way data augmentation helps improve the performance of the AI classifier.
4.5. Visual explanation
We used Gradient-weighted Class Activation Mapping (Grad-CAM) [49] to visually show why our model, BDRMBN of N(4), makes its decisions. Grad-CAM uses the gradient of the classification score with respect to the convolutional features determined by the network in order to understand which parts of the image are most important for classification. The "jet" pseudo-color was used in the heat map; hence, red colors mean areas important for the AI diagnosis, and blue colors mean unimportant areas. A minimal implementation sketch is given after Fig. 12.
Fig. 11 shows the Grad-CAM heat map of a COVID-19 CCT slice. In the left part of Fig. 11(a), the red circle delineates the lesions, where we can see the GGO. Fig. 11(b) shows the corresponding heat map. We can see that the AI pays the most attention to the GGO lesion (see the red circle in Fig. 11(a)), indicating that the AI successfully captures the GGO lesions. Second, the AI pays some attention to the tracheae (in the middle of Fig. 11(b)). The reason may be that COVID-19 influences the grayscale values of tracheal tissues, where we can see yellow blots in the middle areas of Fig. 11(b).
Fig. 11.
Grad-CAM result on a covid-19 case.
Fig. 12 shows the Grad-CAM heat map of a normal CCT slice. Our AI model scans through the whole image and does not find any strong activations (suspicious areas). Hence, the AI model judges this image “healthy”.
Fig. 12.
Grad-CAM result on a normal case.
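A minimal hook-based Grad-CAM sketch in PyTorch; it follows the general recipe of [49] (channel-wise averaged gradients weighting the feature maps of a chosen conv layer) rather than the authors' exact code, and the target layer in the commented example refers to the BaseNetwork sketch given after Table 6.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    """Grad-CAM heat map: channel-wise averaged gradients of the class score
    weight the feature maps of `target_layer`; the positive part is upsampled
    to the input size and normalized to [0, 1]."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

    logits = model(image)                               # image: (1, 1, 256, 256)
    if class_idx is None:
        class_idx = int(logits.argmax(dim=1))
    model.zero_grad()
    logits[0, class_idx].backward()                     # d(score)/d(feature maps)
    h1.remove(); h2.remove()

    weights = grads["g"].mean(dim=(2, 3), keepdim=True)            # GAP of the gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()           # heat map in [0, 1]

# Usage with the BaseNetwork sketch (Conv-7 is module index 18 in `features`):
# model = BaseNetwork().eval()
# heat_map = grad_cam(model, torch.rand(1, 1, 256, 256), model.features[18])
```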
4.6. Effect of deep feature fusion
We compared the performance of using deep feature fusion against not using deep feature fusion. The comparison was done on the validation set. The results using N(5–8) are presented in Table 8 . Comparing Tables 7 and 8, we can find that using DFF can increase the classification performance.
Table 8.
Comparison among N(5–8) over validation set.
| N(5) | ν1 | ν2 | ν3 | ν4 | ν5 | ν6 | ν7 |
|---|---|---|---|---|---|---|---|
| 1 | 96.88 | 90.63 | 91.18 | 93.75 | 93.94 | 87.67 | 93.98 |
| 2 | 90.63 | 93.75 | 93.55 | 92.19 | 92.06 | 84.42 | 92.08 |
| 3 | 93.75 | 92.19 | 92.31 | 92.97 | 93.02 | 85.95 | 93.03 |
| 4 | 95.31 | 89.06 | 89.71 | 92.19 | 92.42 | 84.54 | 92.47 |
| 5 | 95.31 | 90.63 | 91.04 | 92.97 | 93.13 | 86.03 | 93.15 |
| 6 | 90.63 | 90.63 | 90.63 | 90.63 | 90.63 | 81.25 | 90.63 |
| 7 | 92.19 | 92.19 | 92.19 | 92.19 | 92.19 | 84.38 | 92.19 |
| 8 | 95.31 | 89.06 | 89.71 | 92.19 | 92.42 | 84.54 | 92.47 |
| 9 | 89.06 | 93.75 | 93.44 | 91.41 | 91.20 | 82.90 | 91.23 |
| 10 | 95.31 | 90.63 | 91.04 | 92.97 | 93.13 | 86.03 | 93.15 |
| MSD | 93.44±2.64 | 91.25±1.68 | 91.48±1.37 | 92.34±0.89 | 92.41±0.98 | 84.77±1.80 | 92.44±0.98 |
| N(6) | ν1 | ν2 | ν3 | ν4 | ν5 | ν6 | ν7 |
| 1 | 96.88 | 96.88 | 96.88 | 96.88 | 96.88 | 93.75 | 96.88 |
| 2 | 92.19 | 96.88 | 96.72 | 94.53 | 94.40 | 89.16 | 94.43 |
| 3 | 95.31 | 95.31 | 95.31 | 95.31 | 95.31 | 90.63 | 95.31 |
| 4 | 98.44 | 93.75 | 94.03 | 96.09 | 96.18 | 92.29 | 96.21 |
| 5 | 98.44 | 93.75 | 94.03 | 96.09 | 96.18 | 92.29 | 96.21 |
| 6 | 95.31 | 95.31 | 95.31 | 95.31 | 95.31 | 90.63 | 95.31 |
| 7 | 95.31 | 95.31 | 95.31 | 95.31 | 95.31 | 90.63 | 95.31 |
| 8 | 93.75 | 95.31 | 95.24 | 94.53 | 94.49 | 89.07 | 94.49 |
| 9 | 93.75 | 95.31 | 95.24 | 94.53 | 94.49 | 89.07 | 94.49 |
| 10 | 95.31 | 93.75 | 93.85 | 94.53 | 94.57 | 89.07 | 94.58 |
| MSD | 95.47±2.01 | 95.16±1.15 | 95.19±1.04 | 95.31±0.82 | 95.31±0.86 | 90.66±1.66 | 95.32±0.86 |
| N(7) | ν1 | ν2 | ν3 | ν4 | ν5 | ν6 | ν7 |
| 1 | 95.31 | 95.31 | 95.31 | 95.31 | 95.31 | 90.63 | 95.31 |
| 2 | 93.75 | 96.88 | 96.77 | 95.31 | 95.24 | 90.67 | 95.25 |
| 3 | 96.88 | 93.75 | 93.94 | 95.31 | 95.38 | 90.67 | 95.40 |
| 4 | 96.88 | 95.31 | 95.38 | 96.09 | 96.12 | 92.20 | 96.13 |
| 5 | 95.31 | 95.31 | 95.31 | 95.31 | 95.31 | 90.63 | 95.31 |
| 6 | 95.31 | 95.31 | 95.31 | 95.31 | 95.31 | 90.63 | 95.31 |
| 7 | 95.31 | 95.31 | 95.31 | 95.31 | 95.31 | 90.63 | 95.31 |
| 8 | 98.44 | 95.31 | 95.45 | 96.88 | 96.92 | 93.80 | 96.93 |
| 9 | 96.88 | 95.31 | 95.38 | 96.09 | 96.12 | 92.20 | 96.13 |
| 10 | 96.88 | 95.31 | 95.38 | 96.09 | 96.12 | 92.20 | 96.13 |
| MSD | 96.09±1.33 | 95.31±0.74 | 95.36±0.67 | 95.70±0.55 | 95.72±0.57 | 91.42±1.11 | 95.72±0.57 |
| N(8) | ν1 | ν2 | ν3 | ν4 | ν5 | ν6 | ν7 |
| 1 | 98.44 | 98.44 | 98.44 | 98.44 | 98.44 | 96.88 | 98.44 |
| 2 | 98.44 | 96.88 | 96.92 | 97.66 | 97.67 | 95.32 | 97.68 |
| 3 | 96.88 | 98.44 | 98.41 | 97.66 | 97.64 | 95.32 | 97.64 |
| 4 | 96.88 | 95.31 | 95.38 | 96.09 | 96.12 | 92.20 | 96.13 |
| 5 | 96.88 | 96.88 | 96.88 | 96.88 | 96.88 | 93.75 | 96.88 |
| 6 | 100.00 | 98.44 | 98.46 | 99.22 | 99.22 | 98.45 | 99.23 |
| 7 | 96.88 | 96.88 | 96.88 | 96.88 | 96.88 | 93.75 | 96.88 |
| 8 | 100.00 | 90.63 | 91.43 | 95.31 | 95.52 | 91.03 | 95.62 |
| 9 | 95.31 | 98.44 | 98.39 | 96.88 | 96.83 | 93.80 | 96.84 |
| 10 | 96.88 | 98.44 | 98.41 | 97.66 | 97.64 | 95.32 | 97.64 |
| MSD | 97.66±1.52 | 96.88±2.44 | 96.96±2.21 | 97.27±1.12 | 97.28±1.08 | 94.58±2.17 | 97.30±1.06 |
For a clearer view, the results of all eight proposed models are presented in Table 9. N(1–4) did not use DFF, while N(5–8) added DFF to the corresponding networks (see Table 3). Fig. 13 shows the mean and standard deviation of the eight neural network models.
Table 9.
Comparison of eight network models.
| Model | ν1 | ν2 | ν3 | ν4 | ν5 | ν6 | ν7 |
|---|---|---|---|---|---|---|---|
| N(1) | 89.22±2.38 | 92.50±2.31 | 92.31±2.10 | 90.86±1.28 | 90.70±1.31 | 81.82±2.53 | 90.73±1.30 |
| N(2) | 94.22±1.05 | 94.69±2.47 | 94.71±2.29 | 94.45±1.40 | 94.45±1.34 | 88.93±2.78 | 94.46±1.33 |
| N(3) | 94.53±1.69 | 95.47±2.14 | 95.47±2.05 | 95.00±1.23 | 94.98±1.23 | 90.04±2.47 | 94.99±1.23 |
| N(4) | 96.25±1.32 | 97.03±1.55 | 97.03±1.52 | 96.64±1.11 | 96.63±1.10 | 93.30±2.21 | 96.63±1.10 |
| N(5) | 93.44±2.64 | 91.25±1.68 | 91.48±1.37 | 92.34±0.89 | 92.41±0.98 | 84.77±1.80 | 92.44±0.98 |
| N(6) | 95.47±2.01 | 95.16±1.15 | 95.19±1.04 | 95.31±0.82 | 95.31±0.86 | 90.66±1.66 | 95.32±0.86 |
| N(7) | 96.09±1.33 | 95.31±0.74 | 95.36±0.67 | 95.70±0.55 | 95.72±0.57 | 91.42±1.11 | 95.72±0.57 |
| N(8) | 97.66±1.52 | 96.88±2.44 | 96.96±2.21 | 97.27±1.12 | 97.28±1.08 | 94.58±2.17 | 97.30±1.06 |
Fig. 13.
Error bar of proposed eight models.
Comparing BDRMBN of N(4) against DBDRMBN of N(8), we can see that adding DFF improves five indicators (ν1, ν4, ν5, ν6, ν7), though not (ν2, ν3). This indicates that DFF is effective in increasing the classifier's performance.
The same scenario can be observed by comparing BN of N(1) against DBN of N(5), BDBN of N(2) against DBDBN of N(6), and BDRBN of N(3) against DBDRBN of N(7). The reason why DFF can improve the performance is that DFF fuses features from the GCN, which learns the relation-aware representations (RARs) among the validation images. Hence, classifiers with DFF were more accurate than those without DFF.
Besides, from Table 9 and Fig. 13, we can find that the optimal model is N(8). That is, N(8) achieved the best performance among all eight proposed network models. This falls within our expectation, because DBDRMBN of N(8) fuses the RAR features from the GCN with the features from BDRMBN of N(4). This feature fusion helps our classifier obtain the best performance. As stated in Section 3.6, the best model among all eight proposed networks is dubbed FGCNet, indicating the fusion of GCN and CNN networks.
4.7. Comparison to state-of-the-art approaches
The above performances were obtained on the validation set. Now we run the best model, FGCNet, i.e., the DBDRMBN model, on the test set and report its performance. The results are shown in Table 10. As shown, ν1 = 97.71 ± 1.46, ν2 = 96.56 ± 1.48, ν3 = 96.61 ± 1.43, ν4 = 97.14 ± 1.26, ν5 = 97.15 ± 1.25, ν6 = 94.29 ± 2.52, and ν7 = 97.16 ± 1.25. Comparing Tables 9 and 10, we observe that the performances on the validation set and the test set are quite similar; the test performance is only slightly lower than the validation performance.
Table 10.
Performance of proposed N(8) FGCNet on test set.
| Run | ν1 | ν2 | ν3 | ν4 | ν5 | ν6 | ν7 |
|---|---|---|---|---|---|---|---|
| 1 | 98.96 | 96.88 | 96.94 | 97.92 | 97.94 | 95.85 | 97.94 |
| 2 | 96.88 | 96.88 | 96.88 | 96.88 | 96.88 | 93.75 | 96.88 |
| 3 | 96.88 | 98.96 | 98.94 | 97.92 | 97.89 | 95.85 | 97.90 |
| 4 | 96.88 | 95.83 | 95.88 | 96.35 | 96.37 | 92.71 | 96.37 |
| 5 | 96.88 | 95.83 | 95.88 | 96.35 | 96.37 | 92.71 | 96.37 |
| 6 | 97.92 | 95.83 | 95.92 | 96.88 | 96.91 | 93.77 | 96.91 |
| 7 | 95.83 | 95.83 | 95.83 | 95.83 | 95.83 | 91.67 | 95.83 |
| 8 | 100.00 | 97.92 | 97.96 | 98.96 | 98.97 | 97.94 | 98.97 |
| 9 | 96.88 | 93.75 | 93.94 | 95.31 | 95.38 | 90.67 | 95.40 |
| 10 | 100.00 | 97.92 | 97.96 | 98.96 | 98.97 | 97.94 | 98.97 |
| MSD | 97.71±1.46 | 96.56±1.48 | 96.61±1.43 | 97.14±1.26 | 97.15±1.25 | 94.29±2.52 | 97.16±1.25 |
We compared our DBDRMBN method, i.e., the N(8) model, with 15 state-of-the-art approaches: RBFNN [4], KELM [5], ELM-BA [6], RCBBO [7], 6L-CLF [8], GoogLeNet [9], ResNet-18 [10], RN-50-AD [11], SMO [12], CSSNet [13], GGNet [14], COVNet [15], NiNet [16], FCONet [17], and DeCovNet [18]. All the methods were compared on the test set of our 640-image dataset. Some methods were not proposed for detecting COVID-19, some work on chest X-ray images, and some are for multi-class classification; nevertheless, we modified and transferred their methods to our CCT images. The comparison and its plot are presented in Table 11 and Fig. 14. The results in Table 11 and Fig. 14 show that the proposed FGCNet (DBDRMBN) achieved the best results among all algorithms.
Table 11.
Comparison with state-of-the-art approaches.
| Approach | ν1 | ν2 | ν3 | ν4 | ν5 | ν6 | ν7 |
|---|---|---|---|---|---|---|---|
| RBFNN [4] | 67.08 | 74.48 | 72.52 | 70.78 | 69.64 | 41.74 | 69.64 |
| KELM [5] | 57.29 | 61.46 | 59.83 | 59.38 | 58.46 | 18.81 | 58.46 |
| ELM-BA [6] | 57.08±3.86 | 72.40±3.03 | 67.48±1.65 | 64.74±1.26 | 61.75±2.24 | 29.90±2.45 | 61.76±2.24 |
| RCBBO [7] | 69.48±4.47 | 81.15±3.16 | 78.79±1.80 | 75.31±0.82 | 73.72±1.86 | 51.10±1.28 | 73.93±1.66 |
| 6L-CLF [8] | 81.04±2.90 | 79.27±2.21 | 79.70±1.27 | 80.16±0.85 | 80.31±1.13 | 60.42±1.73 | 80.35±1.15 |
| GoogLeNet [9] | 76.88±3.92 | 83.96±2.29 | 82.84±1.58 | 80.42±1.40 | 79.65±1.92 | 61.10±2.62 | 79.65±1.91 |
| ResNet-18 [10] | 78.96±2.90 | 89.48±1.64 | 88.30±1.50 | 84.22±1.23 | 83.31±1.53 | 68.89±2.33 | 83.32±1.53 |
| RN-50-AD [11] | 83.96±3.19 | 90.31±2.14 | 89.73±1.78 | 87.14±1.07 | 86.69±1.34 | 74.50±2.00 | 86.77±1.27 |
| SMO [12] | 93.23±1.72 | 95.52±1.30 | 95.44±1.22 | 94.38±0.64 | 94.31±0.68 | 88.80±1.27 | 93.23±1.72 |
| CSSNet [13] | 92.08±1.01 | 93.33±2.61 | 93.32±2.40 | 92.71±0.95 | 92.67±0.85 | 85.47±1.93 | 92.69±0.86 |
| GGNet [14] | 94.38±2.09 | 90.63±2.02 | 91.00±1.77 | 92.50±1.16 | 92.64±1.14 | 85.11±2.34 | 92.66±1.15 |
| COVNet [15] | 90.83±2.45 | 96.67±0.82 | 96.47±0.82 | 93.75±1.13 | 93.55±1.24 | 87.68±2.14 | 93.60±1.21 |
| NiNet [16] | 95.31±1.13 | 77.19±2.62 | 80.73±1.70 | 86.25±1.02 | 87.40±0.79 | 73.76±1.73 | 87.71±0.73 |
| FCONet [17] | 93.54±1.69 | 96.04±1.46 | 95.96±1.38 | 94.79±0.92 | 94.72±0.94 | 89.64±1.83 | 94.74±0.94 |
| DeCovNet [18] | 90.00±1.49 | 90.52±1.99 | 90.52±1.72 | 90.26±0.65 | 90.24±0.61 | 80.56±1.32 | 90.25±0.61 |
| FGCNet (Ours) | 97.71±1.46 | 96.56±1.48 | 96.61±1.43 | 97.14±1.26 | 97.15±1.25 | 94.29±2.52 | 97.16±1.25 |
Fig. 14.
Comparison plot.
5. Conclusions
This paper proposed a total of eight network models for COVID-19 detection in CCT images. Our experiments showed that DBDRMBN, i.e., N(8), achieves the best performance among all eight proposed models. This model is also named FGCNet for short, and it obtains superior results to the other 15 state-of-the-art approaches.
The reasons why our FGCNet has the best performance are: (i) FGCNet (DBDRMBN) is a deep-feature-fusion combination of an improved CNN model, N(4) (BDRMBN), and a GCN model. Here, N(4) (BDRMBN) helps to extract learnt individual image-level representations, while the GCN helps to extract learnt relation-aware representations; the DFF strategy then fuses those two types of features. (ii) The proposed N(4) (BDRMBN) is a novel neural network whose structure was developed and whose weights were trained from scratch. Besides, BDRMBN used several advanced techniques, such as batch normalization, dropout, rank-based average pooling, and multiple-way data augmentation.
A shortcoming of the proposed FGCNet is that it can only handle CCT images. For chest X-ray images and other sources of data, FGCNet may not work correctly, so it is necessary to define a network that can fuse different sources/modalities of data and give an overall decision. Another shortcoming is that the proposed FGCNet has not been verified by a strict clinical test.
The future work directions are: (i) expand the dataset and test our algorithm on different sources of COVID-19 data, such as the combination of X-ray and CT; (ii) test other fusion strategies of CNN and GCN; (iii) try to employ a deeper GCN, and test whether a deeper GCN will help improve the performance.
CRediT authorship contribution statement
Shui-Hua Wang: Conceptualization, Methodology, Software, Validation, Data curation, Writing - original draft, Investigation, Data curation. Vishnu Varthanan Govindaraj: Formal analysis, Writing - original draft, Writing - review & editing. Juan Manuel Górriz: Writing - original draft, Writing - review & editing, Supervision, Funding acquisition. Xin Zhang: Writing - original draft, Writing - review & editing. Yu-Dong Zhang: Resources, Formal analysis, Investigation, Data curation, Writing - review & editing, Supervision, Project administration, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This paper is partially supported by Royal Society International Exchanges Cost Share Award, UK (RP202G0230); Hope Foundation for Cancer Research, UK (RM60G0680); Medical Research Council Confidence in Concept Award, UK (MC_PC_17171); British Heart Foundation Accelerator Award, UK; and MINECO/FEDER under the RTI2018-098913-B100 and A-TIC-080-UGR18 projects.
Appendix A
Table 12.
Abbreviation list.
| Abbreviation | Full name |
|---|---|
| CCT | chest computed tomography |
| GGO | ground-glass opacity |
| CAP | community acquired pneumonia |
| HS | hyperspectral |
| CNN | convolutional neural network |
| GCN | graph convolutional network |
| IIR | individual image-level representation |
| RAR | relation-aware representation |
| MV | majority voting |
| NLAF | non-linear activation function |
| ICS | internal covariant shift |
| BAN | Batch Normalization |
| DO | Dropout |
| L2P | l2 pooling |
| MDA | Multiple-way data augmentation |
| SSD | small-size dataset |
| LG | lack of generation |
| KMC | k-means clustering |
| kNN | k-nearest neighbors |
| BN | Base Network |
| FGCNet | Fusion of GCN and CNN Network |
Appendix B
Table 13.
Symbol list.
| Symbol | Meaning |
|---|---|
| L | Labeling from each individual expert |
| | Final labeling |
| U | Dataset |
| U1 | Raw dataset |
| U5 | Preprocessed dataset |
| |U| | Number of samples in the dataset |
| Xt | Training set |
| Xv | Validation set |
| Y | Test set |
| μmin | Min grayscale value |
| μmax | Max grayscale value |
| W | Width |
| H | Height |
| C | Channel |
| αs | Stability factor |
| αrp | Retention probability |
| μe | Empirical mean |
| ϕe | Empirical variance |
| μp | Population mean |
| ϕp | Population variance |
| Ψ | Region to be pooled |
| n | Pooling size |
| P | Pooling output |
| ag | Rank threshold |
| χ | Various data augmentation factor |
| ηDA | Number of MDA techniques |
| ηn | Number of new images generated for each DA |
| aZ | Maximum shift factor |
| | Mirror function |
| ηEF | Enhance factor |
| | Concatenation |
| uc | Number of conv layers |
| uf | Number of FCL layers |
| |V| | Cluster centroids |
| | Number of neighbors in kNN |
| W | Number of runs |
| w | Run index |
Appendix C
Fig. 4 shows a simplistic CNN example with four FCL layers. Suppose the $k$-th layer contains $N_k$ neurons, $k = 1, \ldots, 4$; thus, we have $\sum_{k=1}^{4} N_k$ nodes in total. Suppose we do not consider the weights entering the first layer or leaving the last layer, and do not count the biases. The size of the learnable weights $C(i, j)$, defined as the number of weights between layer $i$ and layer $j$ before dropout, can roughly be written as $C(1, 2) = N_1 N_2$, $C(2, 3) = N_2 N_3$, and $C(3, 4) = N_3 N_4$. In total, the number of learnable weights before dropout is $C = C(1, 2) + C(2, 3) + C(3, 4)$.
Using the retention probability $\alpha_{rp}$, roughly $\alpha_{rp} N_k$ neurons in the $k$-th layer remain active, so the size of the learnable weights after dropout between layer $i$ and layer $j$, symbolized as $C_D(i, j)$, is $C_D(i, j) = \alpha_{rp}^2 N_i N_j$. The total number of learnable weights after dropout is therefore $C_D = C_D(1, 2) + C_D(2, 3) + C_D(3, 4) = \alpha_{rp}^2 C$. The compression ratio of learnable weights (CRLW), roughly, is $\mathrm{CRLW} = C_D / C = \alpha_{rp}^2$.
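As a quick numerical check of the CRLW relation above, the short Python sketch below uses hypothetical layer widths (the concrete values of Fig. 4 are not repeated here) and an assumed retention probability of 0.5.

```python
# Hypothetical widths of the four FCL layers; Fig. 4's concrete values
# are not reproduced here.
layer_sizes = [8, 6, 4, 2]
retention_prob = 0.5  # alpha_rp, the dropout retention probability (assumed)

# Weights between consecutive layers, ignoring biases: C = sum of N_i * N_j.
weights_before = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# After dropout, roughly alpha_rp * N_k neurons remain active per layer.
weights_after = sum((retention_prob * a) * (retention_prob * b)
                    for a, b in zip(layer_sizes, layer_sizes[1:]))

crlw = weights_after / weights_before
print(weights_before, weights_after, crlw)  # CRLW equals retention_prob ** 2
```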
Appendix D
Using Fig. 5 as a simplistic example, assume the region $\Psi(1, 1)$ at the 1st row and 1st column of the input is chosen. For the sake of format, we represent it by its row vector in this paragraph, i.e., $\Psi(1, 1) \leftarrow \mathrm{vec}(\Psi(1, 1))$. The L2P result is $\sqrt{\tfrac{1}{n^2} \sum_{\psi \in \Psi(1, 1)} \psi^2}$, the AP result is the arithmetic mean $\tfrac{1}{n^2} \sum_{\psi \in \Psi(1, 1)} \psi$, and the MP result is the maximum $\max_{\psi \in \Psi(1, 1)} \psi$. The rank matrix of $\Psi(1, 1)$, also expressed in row-vector format $R_{\Psi(1, 1)} \leftarrow \mathrm{vec}(R_{\Psi(1, 1)})$, ranks the activations in descending order; the RAP result is the average of the $a_g$ top-ranked activations, i.e., $\tfrac{1}{a_g} \sum_{\psi: R(\psi) \le a_g} \psi$.
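The Python sketch below works through the four pooling operators on a hypothetical 2 × 2 region; the region values and the rank threshold $a_g = 2$ are assumed for illustration and do not reproduce Fig. 5.

```python
import numpy as np

# Hypothetical 2 x 2 region standing in for Psi(1, 1); values are assumed.
psi = np.array([[1.0, 3.0],
                [2.0, 4.0]])
a_g = 2  # rank threshold for RAP (assumed)

flat = psi.ravel()                     # row-vector form, vec(Psi(1, 1))
l2p = np.sqrt(np.mean(flat ** 2))      # l2 pooling: root of the mean square
ap = flat.mean()                       # average pooling
mp = flat.max()                        # max pooling

# Rank-based average pooling: rank activations in descending order and
# average the a_g top-ranked ones.
order = np.argsort(-flat)              # indices sorted by descending value
rap = flat[order[:a_g]].mean()

print(l2p, ap, mp, rap)                # e.g. 2.7386, 2.5, 4.0, 3.5
```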
References
- 1. Ataguba O.A., Ataguba J.E. Social determinants of health: the role of effective communication in the COVID-19 pandemic in developing countries. Glob. Health Action. 2020;13:5. doi: 10.1080/16549716.2020.1788263. Article ID: 1788263, Dec.
- 2. Ketencioglu B.B., Yigit F., Almadqa M., Tutar N., Yilmaz I. Non-infectious diseases compatible with COVID-19 pneumonia. Cureus. 2020;12:5. doi: 10.7759/cureus.9989. Article ID: e9989, Aug.
- 3. Hope M.D., Raptis C.A., Shah A., Hammer M.M., Henry T.S., Six S. A role for CT in COVID-19? What data really tell us so far. Lancet. 2020;395:1189–1190. doi: 10.1016/S0140-6736(20)30728-5. Apr.
- 4. Lu Z. A pathological brain detection system based on radial basis function neural network. J. Med. Imaging Health Inform. 2016;6:1218–1222.
- 5. Yang J. A pathological brain detection system based on kernel based ELM. Multimed. Tools Appl. 2018;77:3715–3728.
- 6. Lu S. A pathological brain detection system based on extreme learning machine optimized by bat algorithm. CNS Neurol. Disord.-Drug Targets. 2017;16:23–29. doi: 10.2174/1871527315666161019153259.
- 7. Li P., Liu G. Pathological brain detection via wavelet packet Tsallis entropy and real-coded biogeography-based optimization. Fundam. Inform. 2017;151:275–291.
- 8. Jiang X. Chinese sign language fingerspelling recognition via six-layer convolutional neural network with leaky rectified linear units for therapy and rehabilitation. J. Med. Imaging Health Inform. 2019;9:2031–2038.
- 9. Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Boston, MA, USA; 2015. pp. 1–9.
- 10. Guo M.H., Du Y.Z. Classification of thyroid ultrasound standard plane images using ResNet-18 networks. IEEE 13th International Conference on Anti-Counterfeiting, Security, and Identification; Xiamen, China; 2019. pp. 324–328.
- 11. Fulton L.V., Dolezel D., Harrop J., Yan Y., Fulton C.P. Classification of Alzheimer's disease with and without imagery using gradient boosted machines and ResNet-50. Brain Sci. 2019;9:16. doi: 10.3390/brainsci9090212. Article ID: 212, Sep.
- 12. Togacar M., Ergen B., Comert Z. COVID-19 detection using deep learning models to exploit social mimic optimization and structured chest X-ray images using fuzzy color and stacking approaches. Comput. Biol. Med. 2020;121:12. doi: 10.1016/j.compbiomed.2020.103805. Article ID: 103805, Jun.
- 13. Cohen J.P., Dao L., Morrison P., Roth K., Bengio Y., Shen B.Y. Predicting COVID-19 pneumonia severity on chest X-ray with deep learning. Cureus. 2020;12:10. doi: 10.7759/cureus.9448. Article ID: e9448, Jul.
- 14. Loey M., Smarandache F., Khalifa N.E.M. Within the lack of chest COVID-19 X-ray dataset: a novel detection model based on GAN and deep transfer learning. Symmetry-Basel. 2020;12:19. Article ID: 651, Apr.
- 15. Li L., Qin L., Xu Z., Yin Y., Wang X., Kong B. Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy. Radiology. 2020;296:E65–E71. doi: 10.1148/radiol.2020200905.
- 16. Ni Q.Q., Sun Z.Y., Qi L., Chen W., Yang Y., Wang L. A deep learning approach to characterize 2019 coronavirus disease (COVID-19) pneumonia in chest CT images. Eur. Radiol. 2020:11. doi: 10.1007/s00330-020-07044-9. https://link.springer.com/article/10.1007/s00330-020-07044-9
- 17. Ko H., Chung H., Kang W.S., Kim K.W., Shin Y., Kang S.J. COVID-19 pneumonia diagnosis using a simple 2D deep learning framework with a single chest CT image: model development and validation. J. Med. Internet Res. 2020;22:13. doi: 10.2196/19569. Article ID: e19569, Jun.
- 18. Wang X.G., Deng X.B., Fu Q., Zhou Q., Feng J.P., Ma H. A weakly-supervised framework for COVID-19 classification and lesion localization from chest CT. IEEE Trans. Med. Imaging. 2020;39:2615–2625. doi: 10.1109/TMI.2020.2995965. Aug.
- 19. Tabik S., Gómez-Ríos A., Martín-Rodríguez J., Sevillano-García I., Rey-Area M., Charte D., et al. COVIDGR dataset and COVID-SDNet methodology for predicting COVID-19 based on chest X-ray images. arXiv preprint, 2020.
- 20. Hasan M.M., Schaduangrat N., Basith S., Lee G., Shoombuatong W., Manavalan B. HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation. Bioinformatics. 2020;36:3350–3356. doi: 10.1093/bioinformatics/btaa160. Jun.
- 21. Li D., Wang Q., Kong F.Q. Adaptive kernel sparse representation based on multiple feature learning for hyperspectral image classification. Neurocomputing. 2020;400:97–112. Aug.
- 22. Shi J., Wang R., Zheng Y., Jiang Z., Yu L. Graph convolutional networks for cervical cell classification. Second MICCAI Workshop on Computational Pathology (COMPAT); Shenzhen, China; 2019.
- 23. Bin Y.R., Chen Z.M., Wei X.S., Chen X.Y., Gao C.X., Sang N. Structure-aware human pose estimation with graph convolutional networks. Pattern Recognit. 2020;106:13. Article ID: 107410, Oct.
- 24. Tian Z.Q., Li X.J., Zheng Y.Y., Chen Z., Shi Z., Liu L.Z. Graph-convolutional-network-based interactive prostate segmentation in MR images. Med. Phys. 2020:13. doi: 10.1002/mp.14327.
- 25. Castillo-Barnes D., Su L., Ramirez J., Salas-Gonzalez D., Martinez-Murcia F.J., Illan I.A. Autosomal dominantly inherited Alzheimer disease: analysis of genetic subgroups by machine learning. Inf. Fusion. 2020;58:153–167. doi: 10.1016/j.inffus.2020.01.001. Jun.
- 26. Rodriguez-Rivero J., Ramirez J., Martinez-Murcia F.J., Segovia F., Ortiz A., Salas D. Granger causality-based information fusion applied to electrical measurements from power transformers. Inf. Fusion. 2020;57:59–70. May.
- 27. Valueva M.V., Nagornov N.N., Lyakhov P.A., Valuev G.V., Chervyakov N.I. Application of the residue number system to reduce hardware costs of the convolutional neural network implementation. Math. Comput. Simul. 2020;177:232–243. Nov.
- 28. Tabik S., Alvear-Sandoval R.F., Ruiz M.M., Sancho-Gomez J.L., Figueiras-Vidal A.R., Herrera F. MNIST-NET10: a heterogeneous deep networks fusion based on the degree of certainty to reach 0.1% error rate. Ensembles overview and proposal. Inf. Fusion. 2020;62:73–80. Oct.
- 29. Mittal G., Korus P., Memon N. FiFTy: large-scale file fragment type identification using convolutional neural networks. IEEE Trans. Inf. Forensics Secur. 2021;16:28–41.
- 30. Azamfar M., Singh J., Bravo-Imaz I., Lee J. Multisensor data fusion for gearbox fault diagnosis using 2-D convolutional neural network and motor current signature analysis. Mech. Syst. Signal Process. 2020;144:18. Article ID: 106861, Oct.
- 31. Kim E. Interpretable and accurate convolutional neural networks for human activity recognition. IEEE Trans. Ind. Inf. 2020;16:7190–7198. Nov.
- 32. Jeon S., Moon J. Malware-detection method with a convolutional recurrent neural network using opcode sequences. Inf. Sci. 2020;535:1–15. Oct.
- 33. Nayak D.R., Das D., Dash R., Majhi S., Majhi B. Deep extreme learning machine with leaky rectified linear unit for multiclass classification of pathological brain images. Multimed. Tools Appl. 2020;79:15381–15396. Jun.
- 34. Górriz J.M. Artificial intelligence within the interplay between natural and artificial computation: advances in data science, trends and applications. Neurocomputing. 2020;410:237–270.
- 35. Zhang Y.-D. Advances in multimodal data fusion in neuroimaging: overview, challenges, and novel orientation. Inf. Fusion. 2020;64:149–187. doi: 10.1016/j.inffus.2020.07.006.
- 36. Kileel J., Trager M., Bruna J. On the expressive power of deep polynomial neural networks. In: Wallach H., Larochelle H., Beygelzimer A., d'Alché-Buc F., Fox E., Garnett R., editors. Advances in Neural Information Processing Systems, Vol. 32. 2019. pp. 1–10. https://papers.nips.cc/paper/9219-on-the-expressive-power-of-deep-polynomial-neural-networks.pdf
- 37. Srivastava N., Hinton G., Krizhevsky A., Sutskever I., Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014;15:1929–1958. Jun.
- 38. Shi Z.L., Ye Y.D., Wu Y.P. Rank-based pooling for deep convolutional neural networks. Neural Netw. 2016;83:21–31. doi: 10.1016/j.neunet.2016.07.003. Nov.
- 39. Jiang Y.Y. Cerebral micro-bleed detection based on the convolution neural network with rank based average pooling. IEEE Access. 2017;5:16576–16583.
- 40. Sun Z., Chiong R., Hu Z.P. An extended dictionary representation approach with deep subspace learning for facial expression recognition. Neurocomputing. 2018;316:1–9. Nov.
- 41. Akhtar N., Ragavendran U. Interpretation of intelligence in CNN-pooling processes: a methodological survey. Neural Comput. Appl. 2020;32:879–898. Feb.
- 42. Tarawneh A.S., Hassanat A.B.A., Almohammadi K., Chetverikov D., Bellinger C. SMOTEFUNA: synthetic minority over-sampling technique based on furthest neighbour algorithm. IEEE Access. 2020;8:59069–59082.
- 43. Jan Z., Verma B. Multiple strong and balanced cluster-based ensemble of deep learners. Pattern Recognit. 2020;107:11. Article ID: 107420, Nov.
- 44. Spinelli I., Scardapane S., Uncini A. Missing data imputation with adversarially-trained graph convolutional networks. Neural Netw. 2020;129:249–260. doi: 10.1016/j.neunet.2020.06.005. Sep.
- 45. Mallick T., Balaprakash P., Rask E., Macfarlane J. Graph-partitioning-based diffusion convolutional recurrent neural network for large-scale traffic forecasting. Transp. Res. Rec. 2020;2674:473–488.
- 46. Derr T., Ma Y., Fan W.Q., Liu X.R., Aggarwal C., Tang J.L. Epidemic graph convolutional network. 13th International Conference on Web Search and Data Mining; Houston, TX; 2020. pp. 160–168.
- 47. Jeong C., Jang S., Park E., Choi S. A context-aware citation recommendation model with BERT and graph convolutional networks. Scientometrics. 2020;124:1907–1922. Sep.
- 48. Kipf T.N., Welling M. Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations (ICLR); Palais des Congrès Neptune; 2017. pp. 1–14.
- 49. Selvaraju R.R., Cogswell M., Das A., Vedantam R., Parikh D., Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2020;128:336–359.