Information Fusion. 2020 Oct 9;67:208–229. doi: 10.1016/j.inffus.2020.10.004

Covid-19 classification by FGCNet with deep feature fusion from graph convolutional network and convolutional neural network

Shui-Hua Wang a,b,c,1, Vishnu Varthanan Govindaraj d,1, Juan Manuel Górriz e,f,, Xin Zhang g,, Yu-Dong Zhang b,h,
PMCID: PMC7544601  PMID: 33052196

Highlights

  • We analysed over 320 COVID-19 images and 320 healthy control images.

  • We proposed an improved CNN to extract individual image-level features.

  • We proposed to use GCN to extract relation-aware representations.

  • We proposed a DFF technology to combine features from GCN and CNN.

  • The proposed FGCNet gives better performance than 15 state-of-the-art approaches.

Keywords: Deep feature fusion, Convolutional neural network, Graph convolutional network, Multiple-way data augmentation, Batch normalization, Dropout, Rank-based average pooling

Abstract

(Aim) COVID-19 is an infectious disease that has spread across the world this year. In this study, we aim to develop an artificial intelligence based tool to diagnose COVID-19 from chest CT images.

(Method) On the one hand, we extract features from a self-created convolutional neural network (CNN) to learn individual image-level representations. The proposed CNN employs several new techniques such as rank-based average pooling and multiple-way data augmentation. On the other hand, relation-aware representations are learnt from a graph convolutional network (GCN). Deep feature fusion (DFF) was developed in this work to fuse the individual image-level features from the CNN and the relation-aware features from the GCN. The best model was named FGCNet.

(Results) The experiment first chose the best model from eight proposed network models, and then compared it with 15 state-of-the-art approaches.

(Conclusion) The proposed FGCNet model is effective and gives better performance than all 15 state-of-the-art methods. Thus, our proposed FGCNet model can assist radiologists to rapidly detect COVID-19 from chest CT images.

1. Introduction

COVID-19 (also known as coronavirus) was declared a Public Health Emergency of International Concern on 30/01/2020, and a worldwide pandemic on 11/03/2020 [1]. As of 16 September 2020, the COVID-19 pandemic had caused 29.6 million confirmed cases and 936.9 thousand deaths (US 199.1k deaths, Brazil 133.2k deaths, India 82.0k deaths, Mexico 71.6k deaths, UK 41.6k deaths, etc.).

Two prevailing diagnostic approaches are available. One is viral testing via a nasopharyngeal swab for the presence of viral RNA fragments. The samples are then tested by real-time reverse transcription polymerase chain reaction (rRT-PCR) [2]. In some situations, a nasal swab or sputum sample may also be used. Results of rRT-PCR are generally available within a few hours to two days. The other is imaging, among which chest computed tomography (CCT) provides the highest sensitivity. The main biomarkers in CCT differentiating COVID-19 from healthy people are asymmetric peripheral ground-glass opacities (GGOs) without pleural effusions. The advantage of imaging methods is that they can aid in screening or accelerate the speed of diagnosis, especially when RT-PCR is in short supply [3].

However, manual interpretation by radiologists is tedious and easily influenced by inter-expert and intra-expert factors (such as fatigue, emotion, etc.). Besides, the diagnosis throughput of human experts is not comparable with that of machines. In particular, early lesions are small and may be neglected by human experts. Smart diagnosis systems based on computer vision and artificial intelligence can benefit patients, radiologists, experts and hospitals.

Traditional artificial intelligence (AI) and modern deep learning (DL) methods have achieved excellent results in analyzing medical images. For example, Lu [4] proposed a radial-basis-function neural network (RBFNN) to detect pathological brains. Yang [5] presented a kernel-based extreme learning classifier (KELM) to create a novel pathological brain detection system. Their methods were robust and effective. Lu [6] proposed a novel extreme learning machine trained by the bat algorithm (ELM-BA). Li and Liu [7] employed real-coded biogeography-based optimization (RCBBO) for pathological brain detection, and this RCBBO method can be transferred to COVID-19 detection. Jiang [8] proposed a six-layer convolutional neural network with leaky rectified linear units for fingerspelling recognition; their method was abbreviated as 6L-CLF. Szegedy, et al. [9] presented GoogLeNet. Guo and Du [10] suggested the use of ResNet-18 for thyroid ultrasound classification. Fulton, et al. [11] employed ResNet-50 for classification of Alzheimer's disease; their method is called RN-50-AD. Togacar, et al. [12] used SqueezeNet and MobileNetV2 to extract features, and employed social mimic optimization (SMO) for feature selection and combination. Cohen, et al. [13] used a large non-COVID-19 chest X-ray set to construct features for COVID-19 images. The authors predicted the geographic extent score and lung opacity score to gauge the severity of COVID-19 infection; their method was abbreviated as COVID severity score net (CSSNet). Loey, et al. [14] used a generative adversarial network (GAN) to generate more images, and found that GoogLeNet with GAN works better for two-class classification; their method was called GGNet. Li, et al. [15] used ResNet-50 as the backbone. The CNN features were combined by a max-pooling operation, and the resulting feature map was fed to a fully-connected layer to generate the probability scores of COVID-19, community acquired pneumonia (CAP), and non-pneumonia; their method was called COVNet. Ni, et al. [16] used 3D U-Net and MVP-Net on CCT images of 96 COVID-19 patients for pulmonary lobe segmentation, COVID-19 lesion detection and COVID-19 lesion segmentation; their method is called NiNet in this study. Ko, et al. [17] proposed a simple 2D deep learning framework for single CCT images. The authors compared four pretrained models: VGG16, ResNet-50, Inception-V3, and Xception, and found that ResNet-50 showed the best performance. The authors used two augmentation methods: image rotation and zoom. Their proposed additional layers consist of a flatten layer, a fully connected layer with 32 neurons, and a fully connected layer with 3 neurons. Their model can classify three classes: non-pneumonia, other pneumonia, and COVID-19. Their method was named fast-track COVID-19 classification network (FCONet). Wang, et al. [18] developed a weakly-supervised deep learning framework using 3D CT volumes for COVID-19 classification and lesion localization. In their method, the lung region was segmented via a pre-trained UNet, and the segmented 3D lung region was fed into a 3D deep neural network to predict the COVID-19 infection probability. Their method was named 3D deep convolutional neural network to detect COVID-19 (DeCovNet). Tabik, et al. [19] proposed COVID-SDNet for predicting COVID-19 from chest X-ray images.

However, most of the existing COVID-19 algorithms above employed a single feature representation (SFR) and ignored fusing multiple feature representations (MFRs). Commonly, MFRs can yield better results than SFR, because MFRs are more informative and accurate than any SFR and contain all the necessary information. The disadvantages of MFRs are: (i) they are high-dimensional and need a fusion technology, and fusion sometimes introduces distortion into the fused features; (ii) fusion is not a static process in nature; (iii) fusion of trivial features may affect the results. For example, Hasan, et al. [20] fused MFRs in robust hemolytic peptide prediction. Their cross-validation results showed their method outperformed state-of-the-art approaches. Li, et al. [21] used kernel sparse MFRs for hyperspectral (HS) image classification. Their method gave better results than state-of-the-art HS image classification methods. There are more success cases where MFRs yielded better performances than SFR.

The main contribution of this paper is deep feature fusion (DFF), viz., the fusion of multiple deep feature representations from both a convolutional neural network (CNN) and a graph convolutional network (GCN). The CNN yields individual image-level representations (IIRs), while the GCN yields relation-aware representations (RARs). Hence, IIR and RAR are fused at the feature level, and experiments proved that the proposed DFF is efficient: the fused features give better performances than using IIR features alone. In particular, our method addresses related subproblems, e.g., feature selection, feature fusion, and classifier selection. The resulting system is an efficient pipeline for COVID-19 diagnosis. We propose a fully automatic method which aims to ease the burden of the radiologist. Other contributions of this study are: (i) we apply batch normalization and dropout to our deep neural network model; (ii) we use rank-based average pooling to replace traditional max pooling; (iii) we propose multiple-way data augmentation. Finally, our model was demonstrated to give better performance than state-of-the-art approaches.

2. Dataset and preprocessing

Tables 12 and 13 itemize the abbreviations and mathematical symbols used in this study for easy reading. See Appendix A and Appendix B.

2.1. Dataset

Image acquisition (CT configuration and method): Philips Ingenuity 64-row spiral CT machine; KV: 120; MAS: 240; layer thickness 3 mm; layer spacing 3 mm; screw pitch 1.5; lung window (W: 1500 HU, L: −500 HU); mediastinum window (W: 350 HU, L: 60 HU); thin-layer reconstruction according to the lesion display, with layer thickness and layer distance of 1 mm for the lung window image. The patients were placed in a supine position, held their breath after deep inspiration, and were conventionally scanned from the lung tip to the costophrenic angle.

For each subject, 1–4 slices were chosen by radiologists using slice level selection method, because usually 4 slices are sufficient to cover the lesion. For COVID-19 pneumonia patients, the slice showing the largest size and number of lesions was selected. For normal subjects, any level of the image can be selected.

The resolutions of all selected images are 1024 × 1024 × 3. Table 1 lists the demographics of subjects, where we have two categories: (i) COVID-19 patients, and (ii) healthy control (HC) subjects.

Table 1.

Demographics of subjects used in this study.

Category Subject nos. Image nos. Age range
COVID-19 142 320 22–91
HC 142 320 21–76

When there were differences between the analyses of the two junior radiologists (A1, A2), a senior doctor (A3) was consulted to reach a consensus. Suppose X means a CCT scan, L means the labeling of each individual expert, and the final labeling $\hat{L}$ is obtained by

$\hat{L}(X) = \begin{cases} L(A_1) & L(A_1) = L(A_2) \\ \mathrm{MV}(L_{all}) & \text{otherwise} \end{cases}$ (1)

where $L_{all}$ represents the labeling of all three experts, i.e., $L_{all} = [L(A_1), L(A_2), L(A_3)]$, and MV denotes majority voting.
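For illustration, the consensus labeling rule of Eq. (1) can be sketched in Python as below (the function and variable names are ours, not from the original implementation):

```python
from collections import Counter

def consensus_label(l_a1, l_a2, l_a3):
    """Consensus labeling of Eq. (1): accept the junior radiologists'
    label when they agree, otherwise fall back to majority voting
    over all three experts."""
    if l_a1 == l_a2:
        return l_a1
    votes = Counter([l_a1, l_a2, l_a3])        # L_all
    return votes.most_common(1)[0][0]          # MV(L_all)

# Example: the juniors disagree, the senior breaks the tie
print(consensus_label("COVID-19", "HC", "COVID-19"))   # -> COVID-19
```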

2.2. Preprocessing

The original dataset contains 320 COVID-19 images and 320 HC images. The dataset is symbolized as $U_1$, and each image is symbolized as $u_1(i) \in U_1, i = 1, 2, \dots, |U| = 640$. We have $U_1 = [u_1(1), u_1(2), \dots, u_1(i), \dots, u_1(|U|)]$. The size of each image is $\mathrm{size}[u_1(i)] = W_1 \times H_1 \times C_1$. Here $W_1 = H_1 = 1024$, $C_1 = 3$.

The raw images are not suitable to train the deep neural networks, because (i) they have redundant information in the three color channels; (ii) their contrast is inconsistent; (iii) they contain background, the checkup bed, and text information; and (iv) their sizes are too large. Fig. 1 shows the pipeline of preprocessing of our COVID-19 dataset.

Fig. 1. Illustration of preprocessing.

First, we converted the color images to grayscale by reserving only the luminance information, and thus obtained the grayscale image set $U_2$ as

$U_2 = G(U_1) = \{u_2(1), u_2(2), \dots, u_2(i), \dots, u_2(|U|)\}$ (2)

where $G$ means the grayscale operation. Now $\mathrm{size}[u_2(i)] = W_2 \times H_2 \times C_2$. Here $W_2 = H_2 = 1024$, $C_2 = 1$.

Second, the histogram stretching (HS) method was used to increase every image's contrast. For the $i$th image $u_2(i), i = 1, 2, \dots, |U|$, we first calculate its minimum grayscale value $\mu_{min}(i)$ and maximum grayscale value $\mu_{max}(i)$ respectively by

$\mu_{min}(i) = \min_{x=1}^{W_1} \min_{y=1}^{H_1} u_2(i|x, y)$ (3.a)
$\mu_{max}(i) = \max_{x=1}^{W_1} \max_{y=1}^{H_1} u_2(i|x, y)$ (3.b)

where $(x, y)$ means the coordinates of a pixel of the image $u_2(i)$. The new histogram-stretched image $u_3(i)$ is obtained by

$u_3(i) = \frac{u_2(i) - \mu_{min}(i)}{\mu_{max}(i) - \mu_{min}(i)}$ (4)

In all, we get the histogram-stretched image set $U_3 = HS(U_2) = \{u_3(1), u_3(2), \dots, u_3(i), \dots, u_3(|U|)\}$.

Third, we crop the images to remove the texts in the margin areas and the checkup bed at the bottom. Thus, we get the cropped dataset $U_4$ as

$U_4 = C(U_3, [v_t, v_b, v_l, v_r]) = \{u_4(1), u_4(2), \dots, u_4(i), \dots, u_4(|U|)\}$ (5)

where $C$ represents the crop operation. The parameters $(v_t, v_b, v_l, v_r)$ mean the crop values in units of pixels from the top, bottom, left, and right. We set $v_t = v_b = v_l = v_r = 150$. Now the size of each image is $\mathrm{size}[u_4(i)] = W_4 \times H_4 \times C_4$, with $W_4 = H_4 = 724$ and $C_4 = C_2 = 1$.

Fourth, we downsampled each image to the size $[W_5, H_5]$, and we now get the resized image set $U_5$ as

$U_5 = {\Downarrow}(U_4, [W_5, H_5]) = \{u_5(1), u_5(2), \dots, u_5(i), \dots, u_5(|U|)\}$ (6)

where $\Downarrow: x \to y$ means the downsampling (DS) function, in which $y$ is a downsampled image of the original image $x$. In this study, $W_5 = H_5 = 256$, $C_5 = 1$. The advantages of DS are twofold: (i) it saves storage, as shown in Table 2; (ii) a smaller-size dataset helps prevent the following classification system from overfitting. The reason why we set $W_5 = H_5 = 256$ is based on trial and error. We found that a larger size brings in overfitting, which impairs the performance, while a smaller size makes the images blurry, which also decreases the classifier's performance.
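The four preprocessing steps of Eqs. (2)–(6) can be sketched as follows; this is a minimal illustration using NumPy and Pillow under our own assumptions (helper name and file handling are hypothetical), not the authors' code:

```python
import numpy as np
from PIL import Image

def preprocess(path, crop=150, out_size=(256, 256)):
    """Preprocess one CCT slice following Section 2.2:
    grayscale -> histogram stretching -> crop -> downsample."""
    u1 = Image.open(path)                                   # 1024 x 1024 x 3
    u2 = np.asarray(u1.convert("L"), dtype=np.float32)      # Eq. (2)

    # Eq. (3)-(4): histogram stretching to [0, 1]
    u3 = (u2 - u2.min()) / (u2.max() - u2.min() + 1e-8)

    # Eq. (5): crop 150 pixels from each side (1024 -> 724)
    u4 = u3[crop:-crop, crop:-crop]

    # Eq. (6): downsample to 256 x 256
    u5 = Image.fromarray((u4 * 255).astype(np.uint8)).resize(out_size)
    return np.asarray(u5, dtype=np.float32) / 255.0
```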

Table 2.

Image size and storage per image at each preprocessing step.

Preprocess Symbol W H C Size (per image) Storage (per image, bytes)
Original u1(i) 1024 1024 3 3,145,728 12,582,912
Grayscale u2(i) 1024 1024 1 1,048,576 4,194,304
HS u3(i) 1024 1024 1 1,048,576 4,194,304
Crop u4(i) 724 724 1 524,176 2,096,704
DS u5(i) 256 256 1 65,536 262,144

Table 2 compares the size and storage of each image $u_s(i), s = 1, \dots, 5, i = 1, \dots, |U|$ at every preprocessing step. We can see that after the preprocessing procedure, each image only costs about 2.08% of its original storage or size. The compression ratio (CR) rates of the $i$th image of the final state $U_5$ to the original state $U_1$ were calculated as $CR_{storage}(i) = \mathrm{byte}[u_5(i)]/\mathrm{byte}[u_1(i)] = 262{,}144/12{,}582{,}912$ and $CR_{size}(i) = \mathrm{size}[u_5(i)]/\mathrm{size}[u_1(i)] = 65{,}536/3{,}145{,}728$. Hence, we get $CR_{storage}(i) = CR_{size}(i) = 2.083\%$, $i = 1, 2, \dots, |U|$. Fig. 2(a and b) shows two samples (COVID-19 in a and HC in b) from the preprocessed dataset $U_5$. Fig. 2(c) delineates the lesions of (a) within red circles.

Fig. 2. Two samples of preprocessed dataset U5.

3. Methodology

The motivation of this study is two-fold. First, we plan to create a custom convolutional neural network, comparable with the state of the art, with several improvements, including batch normalization, dropout, rank-based average pooling, and multiple-way data augmentation. The motivation of using the CNN is to extract individual image-level representations (IIRs).

Nevertheless, a CNN ignores the relation of a particular image to the other images in a group. In contrast, this relation-aware representation (RAR) can be captured by a graph convolutional network (GCN). Hence, the second main motivation is (i) to use GCN to establish connectivity analysis and extract RAR features; and (ii) to fuse CNN features and GCN features together to enhance the classifier's performance.

Using GCN together with CNN can obtain better performance than using CNN alone. Shi, et al. [22] used GCN for cervical cell classification. Their results were significantly better than ResNet-101 and DenseNet-121. Bin, et al. [23] used GCN to extract structure-aware features for human pose estimation. Their experiments on single- and multi-person estimation benchmark datasets showed that GCN consistently outperforms competing state-of-the-art methods. Tian, et al. [24] proposed a novel GCN-based interactive prostate segmentation in MR images. Their method yielded mean Dice similarity coefficients of 93.8 ± 1.2% and 94.4 ± 1.0% on their in-house and PROMISE12 datasets, respectively. All three studies show the power of GCN.

3.1. Basics of CNN

Traditional machine learning has achieved excellent results in disease detection [25,26]. The convolutional neural network (CNN) is a more recent type of artificial neural network. Generally, a CNN is composed of conv layers (CLs), pooling layers (PLs), non-linear activation functions (NLAFs) and fully connected layers (FCLs) [27,28].

The essential operation in CNN is convolution. A complete CL performs 2D convolution along the width and height directions [29]. Note that the weights in CNN are initialized randomly, and then weights are learnt from the data itself by network training. Fig. 3 illustrates the pipeline of input feature maps passing across a complete CL.

Fig. 3. Illustration of a complete conv layer.

Fig. 3 shows that there are three steps in a complete conv layer: (i) kernel-based convolution; (ii) stack; (iii) NLAF. Assume there is an input matrix Γ, kernels Qj, ∀j ∈ [1, ⋅⋅⋅, J], and an output T (here the output T means the output of the whole three-step complete conv layer, not the output of the convolution operation alone). Note that a conv layer means the layer that runs convolution, while the "complete conv layer" means the combination of the conv layer, the stack, and the NLAF. In Fig. 3, we used the same color to denote the input and output, because the output is the input of the next conv layer.

For each kernel Qj, the convolution output is

$\text{Step 1:}\; f(j) = \Gamma \otimes Q_j, \; \forall j \in [1, \dots, J]$ (7)

where ⊗ means the convolution operation. Then, all $f(j)$ matrices are stacked into a three-dimensional matrix $F$.

$\text{Step 2:}\; F = [f(1), \dots, f(J)]$ (8)

where $[\cdot]$ means the stack operation. Finally, the matrix $F$ is passed into the NLAF, which outputs the final matrix [30]

$\text{Step 3:}\; T = \mathrm{NLAF}(F)$ (9)

We can calculate the sizes $S$ of the three main components (input, kernel, and output) as

$S(x) = \begin{cases} W_\Gamma \times H_\Gamma \times C_\Gamma & x = \Gamma \\ W_Q \times H_Q \times C_Q & x = Q_j, \; \forall j \in [1, \dots, J] \\ W_T \times H_T \times C_T & x = T \end{cases}$ (10)

where the triple $(W, H, C)$ represents the width, height, and number of channels of the matrix, respectively [31]. The subscripts Γ, Q, and T represent input, kernel, and output, respectively, and J denotes the total number of filters. Note that $C_\Gamma = C_Q$, which means the channel number of the input $C_\Gamma$ should equal the channel number of the kernel $C_Q$.

Assuming those filters move with padding $v_p$ and stride $v_s$, we can get the size $(W_T \times H_T \times C_T)$ of the output matrix $T$ by simple math as [32]:

$\begin{cases} W_T = 1 + \left\lfloor \dfrac{2 \times v_p + W_\Gamma - W_Q}{v_s} \right\rfloor \\ H_T = 1 + \left\lfloor \dfrac{2 \times v_p + H_\Gamma - H_Q}{v_s} \right\rfloor \\ C_T = J \end{cases}$ (11)

where ⌊ · ⌋ represents the floor function. The channel number of the output $C_T$ equals the number of filters $J$.
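Eq. (11) can be verified numerically with a short helper function (our own illustration):

```python
from math import floor

def conv_output_size(w_in, h_in, w_k, h_k, n_filters, padding=0, stride=1):
    """Output size (W_T, H_T, C_T) of a conv layer, Eq. (11)."""
    w_out = 1 + floor((2 * padding + w_in - w_k) / stride)
    h_out = 1 + floor((2 * padding + h_in - h_k) / stride)
    return w_out, h_out, n_filters

# First conv layer of the base network: 256x256x1 input, 32 kernels of 3x3,
# padding=1 and stride=1 keep the spatial size at 256x256.
print(conv_output_size(256, 256, 3, 3, 32, padding=1, stride=1))  # (256, 256, 32)
```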

For the last step, viz., the NLAF β, the rectified linear unit (ReLU) function is usually selected [33]. Suppose $f_{ij}$ is an entry of the matrix $F$; we have

$\beta_{ReLU}(f_{ij}) = \mathrm{ReLU}(f_{ij}) = \max(0, f_{ij})$ (12)

ReLU is preferred to traditional NLAFs such as the sigmoid (SM) function and the hyperbolic tangent (HT), defined below:

$\beta_{SM}(f_{ij}) = \left(1 + e^{-f_{ij}}\right)^{-1}$ (13)
$\beta_{HT}(f_{ij}) = \tanh(f_{ij}) = \dfrac{e^{f_{ij}} - e^{-f_{ij}}}{e^{f_{ij}} + e^{-f_{ij}}}$ (14)

3.2. Improvement 1: batch normalization and dropout

The motivation of batch normalization (BAN) is to address the "internal covariate shift (ICS)", which means the effect of the randomness of the distribution of inputs to internal CNN layers during training. The existence of ICS worsens the CNN's performance [34,35].

This study introduced BAN to normalize those internal layers' inputs $\Gamma = \{\gamma_i\}$ over every mini-batch (suppose its size is $|\Gamma|$), in order to guarantee that the batch-normalized output $T = \{t_i\}$ has a uniform distribution. Mathematically, BAN learns a function from

$\underbrace{\{\gamma_i, i = 1, 2, \dots, |\Gamma|\}}_{\Gamma} \to \underbrace{\{t_i, i = 1, 2, \dots, |\Gamma|\}}_{T}$ (15)

During training, the empirical mean μe and empirical variance ϕe can be calculated as

$\begin{cases} \mu_e = \dfrac{1}{|\Gamma|} \sum_{i=1}^{|\Gamma|} \gamma_i \\ \phi_e = \dfrac{1}{|\Gamma|} \sum_{i=1}^{|\Gamma|} (\gamma_i - \mu_e)^2 \end{cases}$ (16)

The input $\gamma_i \in \Gamma$ is first normalized to $\gamma_i'$

$\gamma_i' = \dfrac{\gamma_i - \mu_e}{\sqrt{\phi_e + \alpha_s}}$ (17)

where $\alpha_s$ in the denominator of Eq. (17) is a stability factor used to enhance the numerical stability. Now $\gamma_i'$ has zero-mean and unit-variance characteristics. In order to have a more expressive deep neural network [36] (here expressive refers to the network's expressive power, i.e., its ability to express functions), a transformation is usually carried out as

$t_i = A_1 \times \gamma_i' + A_2, \quad i = 1, 2, \dots, |\Gamma|$ (18)

where the parameters $A_1$ and $A_2$ are two learnable parameters during training. The transformed output $t_i \in T$ is then passed to the next layer, while the normalized $\gamma_i'$ remains internal to the current layer.

In the inference stage, we no longer have mini-batches. So instead of calculating $\mu_e$ and $\phi_e$, we calculate the population mean $\mu_p$ and population variance $\phi_p$, and we have the output $\hat{t_i}$ at the inference stage as

$\hat{t_i} = A_1 \times \dfrac{\gamma_i - \mu_p}{\sqrt{\phi_p + \Delta}} + A_2$ (19)
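The training-time transform of Eqs. (16)–(18) amounts to the following NumPy sketch; we assume a square root over the variance term in the denominator, as in standard batch normalization, and the variable names are ours:

```python
import numpy as np

def batch_norm_train(gamma_batch, a1, a2, alpha_s=1e-5):
    """Batch normalization over one mini-batch, Eqs. (16)-(18)."""
    mu_e = gamma_batch.mean()                                   # empirical mean
    phi_e = gamma_batch.var()                                   # empirical variance
    gamma_hat = (gamma_batch - mu_e) / np.sqrt(phi_e + alpha_s)  # Eq. (17)
    return a1 * gamma_hat + a2                                   # Eq. (18)

x = np.random.randn(128) * 3.0 + 5.0       # a mini-batch of activations
t = batch_norm_train(x, a1=1.0, a2=0.0)
print(round(t.mean(), 4), round(t.std(), 4))   # approximately 0 and 1
```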

On the other hand, a dropout layer is introduced before the fully-connected layer. It is a regularization technique that randomly drops out neurons during training. Dropout can help avoid overfitting of deep neural networks. Srivastava, et al. [37] proposed the concept of dropout neurons (DN): neurons are dropped at random during training and their adjoining weights are set to zero, so the fully-connected neurons are split into a collection of dropped neurons and a collection of reserved neurons. The selection of DN is random, with a retention probability $\alpha_{rp}$ defined as:

$\alpha_{rp} = \dfrac{|DN|}{N}$ (20)

Suppose we have a neuron $N(i, j)$ whose corresponding original weight is $w(i, j)$. During training, the neuron's weight $w_T(i, j)$ is updated as:

$w_T(i, j) = \begin{cases} w(i, j) & N(i, j) \notin D \\ 0 & N(i, j) \in D \end{cases}$ (21)

where $D$ denotes the collection of dropped neurons.

During inference, we run the entire CNN without dropout, but the weights of FCLs wI(i, j) using DNs are downscaled (viz., multiplied) by αrp.

wI(i,j)=αrp×w(i,j) (22)

The compression ratio of learnable weights (CRLW) is the squared value of the retention probability $\alpha_{rp}$.

$CRLW = C_D / C = \alpha_{rp}^2$ (23)

where $C_D$ is the total number of learnable weights after dropout, and $C$ is the total number of learnable weights before dropout. Fig. 4 shows a simplistic example, and the detailed analysis is in Appendix C.
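A minimal sketch of the dropout behaviour of Eqs. (21)–(23) (our own illustration; the layer shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_rp = 0.5                        # retention probability
w = rng.standard_normal((10, 6))      # weights between two FCLs

# Training, Eq. (21): weights attached to dropped neurons are zeroed
keep_mask = rng.random(w.shape[0]) < alpha_rp
w_train = w * keep_mask[:, None]

# Inference, Eq. (22): all neurons are kept, weights are downscaled
w_infer = alpha_rp * w

# Eq. (23): compression ratio of learnable weights between two dropped layers
crlw = alpha_rp ** 2
print(crlw)   # 0.25
```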

Fig. 4. A simplistic example of a 4-layer DON.

3.3. Improvement 2: rank-based average pooling

The pooling function fundamentally replaces the output of a layer (particularly conv layers) with a summary statistic of the adjacent outputs at a particular position. Pooling makes the activations in the pooled map less sensitive to the exact locations of structures within the image than those in the original feature map.

Consider a region to be pooled, Ψ, of size n × n, where n is the pooling size. Suppose the pixels within the region $\Psi = \{\psi_{i,j}\}, (1 \le i, j \le n)$ are

$\Psi = \begin{bmatrix} \psi_{1,1} & \cdots & \psi_{1,n} \\ \vdots & \ddots & \vdots \\ \psi_{n,1} & \cdots & \psi_{n,n} \end{bmatrix}$ (24)

The $l_2$-norm pooling (L2P) calculates the $l_2$ norm of the given region Ψ. Assuming the output pooling matrix is P, the L2P output is defined as $P_{L2P}(\Psi) = \sqrt{\sum_{i,j=1}^{n} \psi_{ij}^2}$. In this study, we add a constant $1/|\Psi|$ under the square root, where $|\Psi|$ denotes the number of elements of region Ψ, i.e., $|\Psi| = n^2$; this constant does not influence training and inference.

$P_{L2P}(\Psi) = \sqrt{\dfrac{\sum_{i,j=1}^{n} \psi_{ij}^2}{|\Psi|}}$ (25)

The average pooling (AP) calculates the mean value in the region Ψ as

$P_{AP}(\Psi) = \mathrm{average}(\Psi) = \dfrac{\sum_{i,j=1}^{n} \psi_{i,j}}{|\Psi|}$ (26)

The max pooling (MP) operates on the region Ψ and selects the maximum value.

$P_{MP}(\Psi) = \max(\Psi) = \max_{i,j=1}^{n} \psi_{i,j}$ (27)

Shi, et al. [38] proposed three different rank-based pooling approaches. Their advantages compared to ordinary pooling methods are: (i) the ranking list is invariant under slight changes of activation values; (ii) important activation values can easily be distinguished by their cognate ranks; and (iii) using ranks avoids the scale problems that arise from value-based pooling methods. Among the rank-based pooling approaches, rank-based average pooling (RAP) gives better performance than state-of-the-art approaches, and it has been applied in many fields. For example, Jiang [39] added RAP to convolutional neural networks for susceptibility-weighted-imaging-based cerebral microbleed detection. They obtained a high accuracy of 97.18%. Sun, et al. [40] added RAP between the subspace mapping layers for facial expression recognition. Their method is better than the PCANet and LDANet approaches. Akhtar and Ragavendran [41] compared rank-based pooling with traditional pooling methods, and stated that the advantage of rank-based pooling is that it can assign ranks and weights to activations simultaneously.

RAP first calculates the rank matrix (RM) based on the values of each element $\psi_l \in \Psi$; lower ranks $r_l \in [1, 2, \dots, n^2]$ are assigned to higher values $\psi_l$:

$\psi_{l_1} > \psi_{l_2} \Rightarrow r_{l_1} < r_{l_2}$ (28)

For tied values ($\psi_{l_1} = \psi_{l_2}$), a constraint is added to Eq. (28):

$(\psi_{l_1} = \psi_{l_2}) \wedge (l_1 > l_2) \Rightarrow r_{l_1} > r_{l_2}$ (29)

The RAP output of input Ψ is $P_{RAP}(\Psi)$, which averages the $a_g$ greatest activations:

$P_{RAP}(\Psi) = \dfrac{1}{a_g} \sum_l \left(\psi_l \,\middle|\, 1 \le r_l \le a_g\right)$ (30)

where $a_g$ is the rank threshold. If $a_g = 1$, RAP degrades to MP. On the other hand, if $a_g = n^2$, RAP degrades to AP. Therefore, RAP is regarded as a trade-off between average pooling and max pooling. Note that L2P, AP, MP, and RAP work on every slice separately. Fig. 5 shows a simplistic example of the four pooling techniques, and the explanation is given in Appendix D.
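The four pooling operators of Eqs. (25)–(30) can be compared on a toy 2 × 2 region; the helper functions below are our own sketch:

```python
import numpy as np

def l2p(psi):                        # Eq. (25)
    return np.sqrt((psi ** 2).sum() / psi.size)

def ap(psi):                         # Eq. (26)
    return psi.mean()

def mp(psi):                         # Eq. (27)
    return psi.max()

def rap(psi, a_g=2):                 # Eq. (30): mean of the a_g largest values
    return np.sort(psi.ravel())[::-1][:a_g].mean()

psi = np.array([[1.0, 3.0],
                [2.0, 8.0]])
print(l2p(psi), ap(psi), mp(psi), rap(psi, a_g=2))
# rap with a_g=1 equals mp; rap with a_g=n*n equals ap
```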

Fig. 5. A simplistic example of four pooling technologies (L2P, AP, MP, and RAP).

3.4. Improvement 3: multiple-way data augmentation

To circumvent the small-size dataset (SSD) and lack of generation (LG) problems, there are four possible types of solutions, e.g., data augmentation (DA), data generation (DG), ensemble approaches (EA), and regularization.

DA generates fake images by perturbing existing data, e.g., by cropping or rotation. DG creates data from a sampled data source; the synthetic minority over-sampling technique (SMOTE) [42] is a typical DG algorithm. EA methods use multiple models to obtain better predictive performance than any single model [43]. Regularization mainly acts on the weights of models. Large weights make the models unstable, because minor variations in the inputs yield large differences in the output; smaller weights are regarded as more regular (i.e., less specialized). Hence, this type of technique is called weight regularization. DA is used here because it is simple and easy to realize.

We propose an $\eta_{DA}$-way multiple-way data augmentation (MDA) technology. The difference of our MDA from traditional DA is that we use multiple ($\eta_{DA} > 10$) DA techniques. Assume the preprocessed dataset is $U_5 = [u_5(1), \dots, u_5(|U|)]$. The dataset $U_5$ is divided into three sets:

$U_5 \xrightarrow{\text{split}} \{X_t, X_v, Y\}$ (31)

where the training set is $X_t = [x^t(1), \dots, x^t(i), \dots, x^t(|X_t|)]$, the validation set is $X_v = [x^v(1), \dots, x^v(|X_v|)]$, and the test set is $Y = [y(1), \dots, y(|Y|)]$. Meanwhile, the sum of the sizes of the training set, validation set, and test set equals the size of the preprocessed dataset: $|X_t| + |X_v| + |Y| = |U_5|$.

From the whole training image set $X_t$, we first performed the following seven DA techniques with different MDA factors χ. Note that each MDA technique generates $\eta_n$ new images. Suppose the output MDA training set is symbolized as $X_t^D = \{x^{tD}(i)\}$.

  • (i)
    Rotation. The rotation angle vector $\chi^R$ skips the value of 0.
    $x^{t1}(i) = R[x^t(i)] = [x_1^{t1}(i, \chi_1^R), \dots, x_{\eta_n}^{t1}(i, \chi_{\eta_n}^R)]$ (32)

where R means the rotation operation.

  • (ii)
    Noise injection. Gaussian noise with mean $\chi_m^N$ and variance $\chi_v^N$,
    $p(z) = \dfrac{1}{\sqrt{2\pi \chi_v^N}} \exp\left[-\dfrac{(z - \chi_m^N)^2}{2 \chi_v^N}\right]$ (33)

was added to all training images to produce $\eta_n$ new noised images, where z is the gray level and p is the probability density function. We have

$x^{t2}(i) = N[x^t(i)] = [x_1^{t2}(i), \dots, x_{\eta_n}^{t2}(i)]$ (34)

where N means the noise injection operation. We used Gaussian noise in this study because it is the most common type found in images, compared to impulse noise, speckle noise, and salt-and-pepper noise.

  • (iii)
    Horizontal shear (HS) transform. $\eta_n$ new images were generated by the HS transform
    $x^{t3}(i) = H[x^t(i)] = [x_1^{t3}(i, \chi_1^H), \dots, x_{\eta_n}^{t3}(i, \chi_{\eta_n}^H)]$ (35)

where H means the HS transform. The HS factors $\chi^H$ skip the value $\chi^H = 0$. Mathematically, if the original coordinates are (u, v) and the HS-transformed coordinates are $(u_1, v_1)$, then we have

$\begin{cases} u_1 = u + \chi^H \times v \\ v_1 = v \end{cases}$ (36)

Clearly, the HS transform is a special affine transform, which can be written as

$[u_1, v_1, 1] = [u, v, 1] \begin{bmatrix} 1 & 0 & 0 \\ \chi^H & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ (37)
  • (iv)
    Vertical shear (VS) transform.
    $x^{t4}(i) = V[x^t(i)] = [x_1^{t4}(i, \chi_1^V), \dots, x_{\eta_n}^{t4}(i, \chi_{\eta_n}^V)]$ (38)

where V means the VS transform, which runs similarly to the HS transform. In particular, the VS factors are the same as the HS factors: $\chi_j^V = \chi_j^H, \forall j \in 1, 2, \dots, \eta_n$.

  • (v)
    Random translation (RT). All training images $x^t(i)$ were translated $\eta_n$ times with a random horizontal shift $\varepsilon^x$ and a random vertical shift $\varepsilon^y$, both of which lie in the range $[-a_Z, a_Z]$ and obey a uniform distribution.
    $\varepsilon_m^\theta \in [-a_Z, a_Z], \; \forall m \in [1, \eta_n], \; \theta \in \{x, y\}$ (39)

where $a_Z$ is the maximum shift factor. So, we have

$x^{t5}(i) = RT[x^t(i)] = [x_1^{t5}(i, \varepsilon_1^x, \varepsilon_1^y), \dots, x_{\eta_n}^{t5}(i, \varepsilon_{\eta_n}^x, \varepsilon_{\eta_n}^y)]$ (40)
  • (vi)
    Gamma correction (GC). The GC factor $\chi^G$ skips the value of 1.
    $x^{t6}(i) = G[x^t(i)] = [x_1^{t6}(i, \chi_1^G), \dots, x_{\eta_n}^{t6}(i, \chi_{\eta_n}^G)]$ (41)

where G means the GC operation.

  • (vii)
    Scaling. All training images $\{x^t(i)\}$ were scaled with scaling factors $\chi^S$, skipping $\chi^S = 1$.
    $x^{t7}(i) = S[x^t(i)] = [x_1^{t7}(i, \chi_1^S), \dots, x_{\eta_n}^{t7}(i, \chi_{\eta_n}^S)]$ (42)
  • (viii)
    Mirror and concatenation. All of the above results are mirrored:
    $x^{t(n+\eta_{DA}/2)}(i) = \mathrm{mirror}[x^{tn}(i)], \; \forall n \in \{1, 2, \dots, \eta_{DA}/2\}$ (43)

where mirror(·) represents the mirror function. All the $\eta_{DA}$-way results are finally concatenated as

$\underbrace{x^{tD}(i)}_{\eta_{EF}} = \mathrm{concat}\{\underbrace{x^t(i)}_{1}, \underbrace{x^{t1}(i)}_{\eta_n}, \dots, \underbrace{x^{t\,\eta_{DA}}(i)}_{\eta_n}\}$ (44)

where concat(·) means concatenation and $\eta_{EF}$ is the enhance factor, meaning the ratio of the size of the enhanced training set to that of the original training set. $\eta_{EF}$ is defined as

$\eta_{EF} = \dfrac{|x^{tD}(i)|}{|x^t(i)|}$ (45)

We can calculate $\eta_{EF} = \eta_n \times \eta_{DA} + 1$. Thus, the MDA can be regarded as a function making the enhanced training set $\eta_{EF}$ times as large as the raw training set $X_t$:

$\{x^t(i) \in X_t\} \xrightarrow{\text{MDA}} \{x^{tD}(i) \in X_t^D\}$ (46)
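A simplified sketch of the MDA pipeline is given below; only rotation, noise injection, Gamma correction and mirroring are shown (shearing, translation and scaling follow the same pattern), the parameter ranges follow Table 4, and the helper is our own illustration rather than the authors' implementation:

```python
import numpy as np
from PIL import Image

def mda(img, eta_n=30):
    """Sketch of the multiple-way data augmentation of Section 3.4."""
    out = [img]                                              # the original image
    angles = [a for a in range(-30, 31, 2) if a != 0]        # chi^R, skips 0
    gammas = [g for g in np.arange(0.4, 1.61, 0.04) if abs(g - 1) > 1e-3][:eta_n]

    arr = np.asarray(img, dtype=np.float32) / 255.0
    for a in angles[:eta_n]:                                 # (i) rotation
        out.append(img.rotate(a))
    for _ in range(eta_n):                                   # (ii) Gaussian noise, var 0.01
        noisy = np.clip(arr + np.random.normal(0, 0.1, arr.shape), 0, 1)
        out.append(Image.fromarray((noisy * 255).astype(np.uint8)))
    for g in gammas:                                         # (vi) Gamma correction
        out.append(Image.fromarray((255 * arr ** g).astype(np.uint8)))

    out += [im.transpose(Image.FLIP_LEFT_RIGHT) for im in out[1:]]  # (viii) mirror
    return out
```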

3.5. Improvement 4: deep feature fusion by graph convolutional network

To further improve the performance, we introduce a deep feature fusion (DFF) method based on a graph convolutional network (GCN). The GCN helps find the relation-aware representation (RAR) [44], and the RAR from the GCN is then fused with the IIR from the CNN.

Consider a given graph $G = (V, E)$, where we have $|V|$ nodes $v_i \in V, i = 1, \dots, |V|$ and corresponding links $(v_i, v_j) \in E$. We can define an adjacency matrix $A \in \mathbb{R}^{|V| \times |V|}$ which embeds the relationship of all nodes. The purpose of the GCN is to encode the graph G via a neural network model $f(X, A)$, where $X \in \mathbb{R}^{|V| \times D}$ and D means the feature dimension of each node [45]. Note that $AX$ sums the features of all neighboring nodes, so a GCN can capture the RAR information [46].

A multi-layer GCN updates the node features based on the following layer-wise rule:

$H^{(l+1)} = f_{ReLU}\left(\hat{A} H^{(l)} W^{(l)}\right)$ (47)

where $\hat{A} \in \mathbb{R}^{|V| \times |V|}$ represents the normalized version of the adjacency matrix A, $f_{ReLU}$ is the ReLU function, and $H^{(l)} \in \mathbb{R}^{|V| \times d_l}$ is the feature representation at the $l$-th layer [47].

To carry out the normalization $A \to \hat{A}$, we first calculate the degree matrix $\delta \in \mathbb{R}^{|V| \times |V|}$, which is a diagonal matrix

$\delta_{ij} \overset{\mathrm{def}}{=} \begin{cases} \deg(v_i) & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$ (48)

The normalized $\hat{A}$ is obtained based on the original adjacency matrix A and the degree matrix δ [48].

Note that the input is $X = H^{(0)}$, so for a two-layer GCN as shown in Fig. 6, we have

$H^{(1)} = f_{ReLU}\left(\hat{A} X W^{(0)}\right)$ (49.a)
$H^{(2)} = f_{ReLU}\left(\hat{A} H^{(1)} W^{(1)}\right)$ (49.b)

where $W^{(0)} \in \mathbb{R}^{d_0 \times d_1}$ and $W^{(1)} \in \mathbb{R}^{d_1 \times d_2}$ are two trainable weight matrices. $(d_0, d_1, d_2)$ are hyperparameters that will be set in the experiment.
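A NumPy sketch of the two-layer GCN forward pass in Eq. (49) is shown below. The symmetric normalization with self-loops used in normalize_adjacency is the common choice of reference [48]; the paper does not spell it out, so it is an assumption here, as are the function names:

```python
import numpy as np

def normalize_adjacency(A):
    """Assumed symmetric normalization of the adjacency matrix (with self-loops)."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def relu(x):
    return np.maximum(0.0, x)

def gcn_forward(X, A, W0, W1):
    """Two-layer GCN of Eq. (49): H1 = ReLU(A_hat X W0), H2 = ReLU(A_hat H1 W1)."""
    A_hat = normalize_adjacency(A)
    H1 = relu(A_hat @ X @ W0)
    return relu(A_hat @ H1 @ W1)

# Toy dimensions following Table 4: |V|=256 centroids, d0=120, d1=60, d2=120
V, d0, d1, d2 = 256, 120, 60, 120
X = np.random.randn(V, d0)
A = (np.random.rand(V, V) < 0.03).astype(float); A = np.maximum(A, A.T)
H2 = gcn_forward(X, A, np.random.randn(d0, d1) * 0.1, np.random.randn(d1, d2) * 0.1)
print(H2.shape)     # (256, 120)
```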

Fig. 6. Illustration of a two-layer GCN. (Different color cylinders mean different cluster centroids).

In our COVID-19 classification task, the GCN is fused with the previous CNN models N(1)–N(4). The last FCL in the previous CNN models is used as the individual image-level representation (IIR) $I \in \mathbb{R}^{d_0}$. Afterwards, k-means clustering (KMC) is performed on those image-level representation features, and we get $|V|$ cluster centroids $X \in \mathbb{R}^{|V| \times d_0}$. The clustering correlation shows the potential relationships between images. The adjacency matrix $A \in \mathbb{R}^{|V| \times |V|}$ is defined as

$A_{mn} = \begin{cases} 1 & \text{if } X_m \in \Delta(X_n) \,\vee\, X_n \in \Delta(X_m) \\ 0 & \text{otherwise} \end{cases}$ (50)

where ∨ means the operation "or", and Δ means the k-nearest neighbors (kNN) based on cosine similarity. The number of neighbors in the kNN is a hyperparameter set in the experiments.

Fig. 7 shows an example: the three nearest neighbors of node i and node j are $\Delta(X_i) = (1, 2, j)$ and $\Delta(X_j) = (3, 4, 5)$. So, we have $X_j \in \Delta(X_i)$ and $X_i \notin \Delta(X_j)$. Using the 'or' operation we can conclude $A_{ij} = 1$. The node features X and the adjacency matrix A are sent into the two-layer GCN, and we get $H^{(2)} \in \mathbb{R}^{|V| \times d_2}$. The fusion between $H^{(2)}$ and I is accomplished by dot-product fusion; note that we need to set $d_0 = d_2$:

$y = H^{(2)} I$ (51)

Fig. 7. Illustration of kNN-based adjacency matrix.

A linear projection (LP) with learnable weight $W^{(2)} \in \mathbb{R}^{|V| \times C}$, where C is the number of categories, then gives

$z = y W^{(2)} + b$ (52)

where $z \in \mathbb{R}^C$ and b represents the bias. $C = 2$ in this study because our task is a binary classification problem, i.e., COVID-19 versus healthy people. Finally, a softmax operation is performed on z, and the cross-entropy (CE) loss is calculated. Algorithm 1 shows the proposed deep feature fusion algorithm. During the inference stage, the CNN's IIR features are obtained, and the corresponding GCN's RAR features are obtained from the pre-constructed graph and the trained two-layer GCN. Using both CNN and GCN, each image is represented by the fusion of its individual image-level representations and its relation-aware representations [22]. Fig. 8 shows the fusion flowchart.

Algorithm 1.

Proposed deep feature fusion.

Input: IIR Feature I from CNN models
Algorithm of DFF
Step 1: Create RAR features H(2) from pre-constructed two-layer GCN model
Step 2: Dot product fusion combining IIR features and RAR features, y=H(2)I
Step 3: Linear Projection, z=yW(2)+b
Step 4: Softmax and cross-entropy (CE) loss
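The kNN-based adjacency of Eq. (50) and the fusion/projection of Eqs. (51)–(52) can be sketched as follows; knn_adjacency and deep_feature_fusion are hypothetical helper names, and the softmax follows the description above:

```python
import numpy as np

def knn_adjacency(X, k=7):
    """Eq. (50): A_mn = 1 if X_m is among the k nearest neighbours of X_n
    (cosine similarity) or vice versa."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)                  # exclude the node itself
    nn = np.argsort(-sim, axis=1)[:, :k]
    A = np.zeros_like(sim)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, nn.ravel()] = 1.0
    return np.maximum(A, A.T)                       # the 'or' operation

def deep_feature_fusion(I, H2, W2, b):
    """Eqs. (51)-(52): dot-product fusion y = H2 I, then linear projection + softmax."""
    y = H2 @ I                                      # (|V|,) relation-aware weighting
    z = y @ W2 + b                                  # (C,) class scores
    z = z - z.max()                                 # numerical stability
    return np.exp(z) / np.exp(z).sum()              # softmax probabilities
```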

Fig. 8. Flowchart of deep feature fusion strategy. (LP = Linear Projection; CE = Cross Entropy).

3.6. Summary of proposed eight networks

In total, we proposed eight new networks [N(1), ⋅⋅⋅, N(8)]: (i) We first designed the base network N(1), which is called BN. (ii) We added batch normalization (BAN) and dropout (DO) techniques, and obtained the improved network N(2), named "BDBN". (iii) Next, we developed N(3), termed BDRBN, by introducing rank-based average pooling (RAP) to replace the traditional max pooling (MP) in N(2). (iv) Multiple-way data augmentation (MDA) was proposed and added to N(3), giving the new network N(4), short-named BDRMBN.

For the remaining four networks, we add the new deep feature fusion (DFF) approach, which combines the features from the above networks [N(1), N(2), N(3), N(4)] with RAR features from the 2-layer GCN. The short names of [N(5), N(6), N(7), N(8)] are defined as DBN, DBDBN, DBDRBN, and DBDRMBN, respectively. Table 3 gives the relationships of all eight proposed networks.

Table 3.

Eight proposed networks.

Index Inheritance Short Name Description
N(1) BN Base Network
N(2) N(1)+BAN+DO BDBN Add BAN and DO to N(1)
N(3) N(2)-MP+RAP BDRBN Use RAP to replace MP in N(2)
N(4) N(3)+MDA BDRMBN Add MDA to N(3)
N(5) N(1)+DFF DBN Add DFF to N(1)
N(6) N(2)+DFF DBDBN Add DFF to N(2)
N(7) N(3)+DFF DBDRBN Add DFF to N(3)
N(8) N(4)+DFF DBDRMBN (FGCNet*) Add DFF to N(4)

(* Experiment below shows N(8) gives the best performance, and we name it as FGCNet).

As expected, the best model should lie within N(5–8), since these fuse the features from the CNNs N(1–4) with the GCN. The best model is given the formal name FGCNet, which means Fusion of GCN and CNN networks.

The configuration of the proposed networks was designed by trial and error. We set the number of conv layers as $u_c$. Similarly, the number of FCL layers is symbolized as $u_f$. The details of the base network can be found in Table 6.

Table 6.

Hyperparameters of N(1).

Index Layer Hyperparameter Size of activation map
1 Input 256 × 256 × 1
2 Conv-1 32 3 × 3 256 × 256 × 32
3 P-1 /2 128 × 128 × 32
4 Conv-2 64 3 × 3 128 × 128 × 64
5 P-2 /2 64 × 64 × 64
6 Conv-3 128 3 × 3 64 × 64 × 128
7 P-3 /2 32 × 32 × 128
8 Conv-4 128 3 × 3 32 × 32 × 128
9 P-4 /2 16 × 16 × 128
10 Conv-5 256 3 × 3 16 × 16 × 256
11 P-5 /2 8 × 8 × 256
12 Conv-6 256 3 × 3 8 × 8 × 256
13 P-6 /2 4 × 4 × 256
14 Conv-7 512 3 × 3 4 × 4 × 512
15 P-7 /2 2 × 2 × 512
16 Flatten 1 × 1 × 2048
17 FCL-1 120 × 2048 (weights), 120 × 1 (bias) 1 × 1 × 120
18 FCL-2 2 × 120 (weights), 2 × 1 (bias) 1 × 1 × 2

3.7. Measures

The algorithm runs W runs, which helps to reduce randomness. At each run $w = 1, 2, \dots, W$, the ideal confusion matrix $E_i$ and the real confusion matrix $E_r$ over the validation set are

$E_i(w) = \begin{bmatrix} \frac{|X_v|}{2} & 0 \\ 0 & \frac{|X_v|}{2} \end{bmatrix}, \; \forall w \in 1, \dots, W$ (53)

where $|X_v|/2$ appears because we have a balanced dataset. If the confusion matrix is computed on the test set, the diagonal elements become $|Y|/2$. In a realistic situation, suppose we have a confusion matrix

$E_r(w) = \begin{bmatrix} a_1(w) & a_2(w) \\ a_3(w) & a_4(w) \end{bmatrix}, \; \forall w \in 1, \dots, W$ (54)

where the four variables $\{a_1(w), a_2(w), a_3(w), a_4(w)\}$ represent TP, FN, FP, and TN at the $w$-th run, respectively. Here P means COVID-19 and N means a healthy lung. TP means a COVID-19 image is classified correctly as COVID-19, FN means a COVID-19 image is wrongly classified as healthy, FP means a healthy lung is wrongly classified as COVID-19, and TN means a healthy lung is classified correctly as healthy. It is obvious that $0 \le a_k(w) \le \frac{|X_v|}{2}, \forall w \in 1, \dots, W, \forall k \in [1, 2, 3, 4]$ if the confusion matrix is computed on the validation set.

Four simple measures $\{\nu_1(w), \nu_2(w), \nu_3(w), \nu_4(w)\}$ can be defined as below, where $\nu_1$ means sensitivity, $\nu_2$ specificity, $\nu_3$ precision, and $\nu_4$ accuracy.

$\nu_1(w) = \dfrac{a_1(w)}{a_1(w) + a_2(w)}$ (55)
$\nu_2(w) = \dfrac{a_4(w)}{a_3(w) + a_4(w)}$ (56)
$\nu_3(w) = \dfrac{a_1(w)}{a_1(w) + a_3(w)}$ (57)
$\nu_4(w) = \dfrac{a_1(w) + a_4(w)}{a_1(w) + a_2(w) + a_3(w) + a_4(w)}$ (58)

The F1 score at the $w$-th run is defined as $\nu_5(w)$:

$\nu_5(w) = \dfrac{2 \times a_1(w)}{2 \times a_1(w) + a_2(w) + a_3(w)}$ (59)

Besides, the F1 score can be expressed in terms of sensitivity and precision as $\nu_5(w) = 2 \times [\nu_3(w) \times \nu_1(w)] \div [\nu_3(w) + \nu_1(w)]$. The F1 score $\nu_5$ is the harmonic mean of the precision $\nu_3$ and the sensitivity $\nu_1$. The range of the F1 score is [0, 1]. The highest possible value 1 indicates perfect precision $\nu_3$ and sensitivity $\nu_1$, and the lowest possible value 0 means that either the precision $\nu_3$ or the sensitivity $\nu_1$ is zero.

$\nu_5 = \begin{cases} 1 & \nu_3 = 1 \wedge \nu_1 = 1 \\ 0 & \nu_3 = 0 \vee \nu_1 = 0 \end{cases}$ (60)

The Matthews correlation coefficient (MCC) at the $w$-th run, $\nu_6(w)$, is defined as

$\nu_6(w) = \dfrac{a_4(w) \times a_1(w) - a_3(w) \times a_2(w)}{\sqrt{\varepsilon(w)}}$ (61)

where $\varepsilon(w)$ is a temporary variable defined as $\varepsilon(w) = [a_3(w) + a_1(w)] \times [a_1(w) + a_2(w)] \times [a_4(w) + a_3(w)] \times [a_4(w) + a_2(w)]$.

Finally, the Fowlkes–Mallows index (FMI) at the $w$-th run, $\nu_7(w)$, can be defined as:

$\nu_7(w) = \sqrt{\dfrac{a_1(w)}{a_1(w) + a_3(w)} \times \dfrac{a_1(w)}{a_1(w) + a_2(w)}}$ (62)

FMI can be expressed in terms of sensitivity and precision as $\nu_7(w) = \sqrt{\nu_3(w) \times \nu_1(w)}$.

After recording the seven indicators of all W runs, we calculate the mean and standard deviation (MSD) of the $m$-th ($\forall m \in [1, 7]$) measure as

$M(\nu_m) = \dfrac{1}{W} \times \sum_{w=1}^{W} \nu_m(w)$ (63)
$SD(\nu_m) = \sqrt{\dfrac{1}{W - 1} \times \sum_{w=1}^{W} \left[\nu_m(w) - M(\nu_m)\right]^2}$ (64)

The final results over W runs are reported in the Mean ± SD format.
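The seven measures of Eqs. (55)–(62) follow directly from one confusion matrix; a compact check with our own helper function:

```python
import numpy as np

def measures(tp, fn, fp, tn):
    """Seven measures of Section 3.7 from one confusion matrix
    [a1, a2; a3, a4] = [TP, FN; FP, TN]."""
    sen = tp / (tp + fn)                                    # nu_1 sensitivity
    spe = tn / (tn + fp)                                    # nu_2 specificity
    pre = tp / (tp + fp)                                    # nu_3 precision
    acc = (tp + tn) / (tp + fn + fp + tn)                   # nu_4 accuracy
    f1 = 2 * tp / (2 * tp + fn + fp)                        # nu_5 F1 score
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))      # nu_6 MCC
    fmi = np.sqrt(pre * sen)                                # nu_7 FMI
    return sen, spe, pre, acc, f1, mcc, fmi

print([round(100 * m, 2) for m in measures(tp=63, fn=1, fp=2, tn=62)])
```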

3.8. Proposed algorithm

Algorithm 2 presents the pseudocode of our algorithm, composed of one input, one output, and five phases. Phase I presents the preprocessing. Phase II presents how to construct the eight network models. Phase III shows the detailed procedure of the W runs over the validation set. Phase IV presents how to select the best network model based on validation performance. Phase V shows how to calculate the test performance using the best network model.

Algorithm 2.

Pseudocode of our algorithm.


4. Experiments and results

4.1. Hyperparameter setting

Table 4 shows the hyperparameter settings in this study. Most of the values were set by trial and error. The stability factor is set to $10^{-5}$. The retention probability is set to 0.5. The pooling size is set to 2. The rank threshold is set to 2. The number of DA techniques and the number of new images per DA are set to 14 and 30, respectively. The maximum shift factor is 25, and the mean and variance of the noise injection are set to 0 and 0.01, respectively. The rotation parameter vector $\chi^R$, horizontal shear parameter vector $\chi^H$, Gamma correction parameter vector $\chi^G$, and scaling parameter vector $\chi^S$ are all listed there. The enhance factor is calculated as 421. The numbers of conv layers and fully connected layers are set to 7 and 2, respectively. The number of cluster centroids is set to 256. The feature dimensions in the GCN are set to $d_0 = d_2 = 120$ and $d_1 = 60$. The number of neighbors in the kNN is set to 7. The number of runs W is 10, since it is a default value used in many other publications.

Table 4.

Hyperparameter setting.

Parameter Value
αs 10−5
αrp 0.5
n 2
ag 2
ηDA 14
ηn 30
aZ 25
χR χ1R = −30, χ2R = −28, …, χ15R = −2, χ16R = +2, …, χηnR = +30
χmN 0
χvN 0.01
χH χ1H = −0.15, χ2H = −0.14, …, χ15H = −0.01, χ16H = +0.01, …, χηnH = +0.15
χG χ1G=0.4,χ2G=0.44,,χ15G=0.96,χ16G=1.04,,χηnG=1.6
χS χ1S=0.7,χ2S=0.72,,χ15S=0.98,χ16S=1.02,,χηnS=1.3.
ηEF 421
uc 7
uf 2
|V| 256
d0 120
d1 60
d2 120
(Number of kNN neighbors) 7
W 10

Table 5 shows the training, MDA training, validation and test sets, where we can see that the total size of the training set is $|X_t| = 320$. The total size of the enhanced MDA training set is $|X_t^D| = 134{,}720$. The validation set's and test set's sizes are $|X_v| = 128$ and $|Y| = 192$, respectively. In total, the size of the whole dataset is $|U_5| = |X_t| + |X_v| + |Y| = 640$.

Table 5.

Training, validation, and test set.

Set Symbol COVID-19 (C) Healthy (H)
Training Xt 160 160
MDA Training XtD 67,360 67,360
Validation Xv 64 64
Test Y 96 96
Total U5 = Xt ∪ Xv ∪ Y 320 320

4.2. Base network configuration

The top row of Fig. 9(a) shows the activation maps of the proposed base network N(1). Here, the size of the input is $S_1 = 256 \times 256 \times 1$, and the output of the first conv layer (C1) is $S_2 = 256 \times 256 \times 32$. After the first pooling (P1), the output is $S_3 = 128 \times 128 \times 32$. We repeat the conv layers seven times in total, and the output is $S_{15} = 2 \times 2 \times 512$. $S_{15}$ is flattened into one column vector $S_{16} = 1 \times 1 \times 2048$ and passed into two fully-connected blocks (the first block contains an FCL and ReLU, the last block contains an FCL and softmax), and the outputs are $S_{17} = 1 \times 1 \times 120$ and $S_{18} = 1 \times 1 \times 2$. All 18 matrices $S(k), k \in [1, 18]$ correspond to the cuboids in Fig. 9(b). Note that $S_{17}$ will be used as the IIR features. The hyperparameters of N(1) are presented in Table 6. Based on N(1), we can create the remaining seven networks.
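Following Table 6, the base network N(1) can be sketched in PyTorch as below. This is a minimal reconstruction under our own assumptions ('same' padding for the 3 × 3 convolutions, 2 × 2 max pooling, and a ReLU after each conv layer); the later variants replace max pooling with RAP and add BAN/DO and MDA:

```python
import torch
import torch.nn as nn

channels = [1, 32, 64, 128, 128, 256, 256, 512]      # Conv-1 ... Conv-7 (Table 6)

layers = []
for c_in, c_out in zip(channels[:-1], channels[1:]):
    layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
               nn.ReLU(),
               nn.MaxPool2d(2)]                       # halves the spatial size

base_network = nn.Sequential(
    *layers,                                          # 256x256x1 -> 2x2x512
    nn.Flatten(),                                     # 1x1x2048
    nn.Linear(2048, 120), nn.ReLU(),                  # FCL-1 (used as IIR features)
    nn.Linear(120, 2),                                # FCL-2
)

print(base_network(torch.randn(1, 1, 256, 256)).shape)   # torch.Size([1, 2])
```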

Fig. 9. Block chart of first three proposed networks.

4.3. Illustration of MDA

Fig. 10 shows the MDA results. The original image is Fig. 2(a). We can observe that one image is expanded into an enhanced set of 421 images (the original plus 420 augmented ones). This is why we call our algorithm multiple-way data augmentation (MDA).

Fig. 10. Results of proposed MDA.

4.4. Comparison among proposed networks

Table 7 gives the results of 10 runs using N(1) to N(4). N(1) is BN, N(2) is BDBN, N(3) is BDRBN, and N(4) is BDRMBN. Table 7 clearly shows that the N(1) model yielded the following seven performances: ν1 = 89.22±2.38, ν2 = 92.50±2.31, ν3 = 92.31±2.10, ν4 = 90.86±1.28, ν5 = 90.70±1.31, ν6 = 81.82±2.53, ν7 = 90.73±1.30. The definitions of ν can be found in Section 3.7.

Table 7.

Comparison among N(1–4) over validation set.

N(1) ν1 ν2 ν3 ν4 ν5 ν6 ν7
1 89.06 93.75 93.44 91.41 91.20 82.90 91.23
2 93.75 92.19 92.31 92.97 93.02 85.95 93.03
3 90.63 87.50 87.88 89.06 89.23 78.16 89.24
4 89.06 92.19 91.94 90.63 90.48 81.29 90.49
5 89.06 92.19 91.94 90.63 90.48 81.29 90.49
6 90.63 92.19 92.06 91.41 91.34 82.82 91.34
7 89.06 93.75 93.44 91.41 91.20 82.90 91.23
8 89.06 95.31 95.00 92.19 91.94 84.54 91.98
9 87.50 90.63 90.32 89.06 88.89 78.16 88.90
10 84.38 95.31 94.74 89.84 89.26 80.17 89.41
MSD 89.22±2.38 92.50±2.31 92.31±2.10 90.86±1.28 90.70±1.31 81.82±2.53 90.73±1.30
N(2) ν1 ν2 ν3 ν4 ν5 ν6 ν7
1 93.75 95.31 95.24 94.53 94.49 89.07 94.49
2 95.31 96.88 96.83 96.09 96.06 92.20 96.07
3 93.75 95.31 95.24 94.53 94.49 89.07 94.49
4 95.31 93.75 93.85 94.53 94.57 89.07 94.58
5 95.31 96.88 96.83 96.09 96.06 92.20 96.07
6 95.31 95.31 95.31 95.31 95.31 90.63 95.31
7 93.75 92.19 92.31 92.97 93.02 85.95 93.03
8 92.19 96.88 96.72 94.53 94.40 89.16 94.43
9 93.75 89.06 89.55 91.41 91.60 82.90 91.63
10 93.75 95.31 95.24 94.53 94.49 89.07 94.49
MSD 94.22±1.05 94.69±2.47 94.71±2.29 94.45±1.40 94.45±1.34 88.93±2.78 94.46±1.33
N(3) ν1 ν2 ν3 ν4 ν5 ν6 ν7
1 96.88 92.19 92.54 94.53 94.66 89.16 94.68
2 95.31 93.75 93.85 94.53 94.57 89.07 94.58
3 93.75 93.75 93.75 93.75 93.75 87.50 93.75
4 93.75 95.31 95.24 94.53 94.49 89.07 94.49
5 93.75 98.44 98.36 96.09 96.00 92.29 96.03
6 95.31 98.44 98.39 96.88 96.83 93.80 96.84
7 96.88 95.31 95.38 96.09 96.12 92.20 96.13
8 92.19 96.88 96.72 94.53 94.40 89.16 94.43
9 95.31 96.88 96.83 96.09 96.06 92.20 96.07
10 92.19 93.75 93.65 92.97 92.91 85.95 92.92
MSD 94.53±1.69 95.47±2.14 95.47±2.05 95.00±1.23 94.98±1.23 90.04±2.47 94.99±1.23
N(4) ν1 ν2 ν3 ν4 ν5 ν6 ν7
1 96.88 98.44 98.41 97.66 97.64 95.32 97.64
2 96.88 96.88 96.88 96.88 96.88 93.75 96.88
3 96.88 96.88 96.88 96.88 96.88 93.75 96.88
4 98.44 98.44 98.44 98.44 98.44 96.88 98.44
5 93.75 98.44 98.36 96.09 96.00 92.29 96.03
6 96.88 96.88 96.88 96.88 96.88 93.75 96.88
7 95.31 98.44 98.39 96.88 96.83 93.80 96.84
8 95.31 95.31 95.31 95.31 95.31 90.63 95.31
9 95.31 93.75 93.85 94.53 94.57 89.07 94.58
10 96.88 96.88 96.88 96.88 96.88 93.75 96.88
MSD 96.25±1.32 97.03±1.55 97.03±1.52 96.64±1.11 96.63±1.10 93.30±2.21 96.63±1.10

For N(2), the performances improved to ν1 = 94.22±1.05, ν2 = 94.69±2.47, ν3 = 94.71±2.29, ν4 = 94.45±1.40, ν5 = 94.45±1.34, ν6 = 88.93±2.78, ν7 = 94.46±1.33. Comparing these with the results of BN, i.e., N(1), we can observe the effectiveness of using batch normalization and dropout.

Furthermore, N(3) yields the performances ν1 = 94.53±1.69, ν2 = 95.47±2.14, ν3 = 95.47±2.05, ν4 = 95.00±1.23, ν5 = 94.98±1.23, ν6 = 90.04±2.47, ν7 = 94.99±1.23. Comparing the seven indicators of BDBN, i.e., N(2), and BDRBN, i.e., N(3), we can conclude that rank-based average pooling gives significantly better performance than the max pooling used in N(2).

Finally, checking N(4) in Table 7, we obtain the performance ν1 = 96.25±1.32, ν2 = 97.03±1.55, ν3 = 97.03±1.52, ν4 = 96.64±1.11, ν5 = 96.63±1.10, ν6 = 93.30±2.21, ν7 = 96.63±1.10. The increase in the performance of BDRMBN, i.e., N(4), compared to BDRBN, i.e., N(3), indicates that multiple-way data augmentation can help improve the performance of the AI classifier.

4.5. Visual explanation

We used Gradient-weighted Class Activation Mapping (Grad-CAM) [49] to visually show why our model, BDRMBN of N(4), can make the decision. Grad-CAM uses the gradient of the classification score with respect to the convolutional features determined by the network in order to understand which parts of the image are most important for classification. The “jet” pseudo-color was used in the heat map. Hence, red colors mean important areas for AI diagnosis, and blue colors mean unimportant areas for AI diagnosis.

Fig. 11 shows the Grad-CAM heat map results of a COVID-19 CCT slice. In the left part, Fig. 11(a), the red circle delineates the lesions, where we can see the GGO. Fig. 11(b) shows the corresponding heat map. We can see that the AI pays the most attention to the GGO lesion (see the red circle in Fig. 11a), indicating that the AI successfully captures the GGO lesions. Secondly, the AI pays some attention to the tracheae (in the middle of Fig. 11b). The reason may be that COVID-19 influences the grayscale values of the tracheal tissues, where we can see yellow blots in the middle areas of Fig. 11(b).

Fig. 11. Grad-CAM result on a COVID-19 case.

Fig. 12 shows the Grad-CAM heat map of a normal CCT slice. Our AI model scans through the whole image and does not find any strong activations (suspicious areas). Hence, the AI model judges this image “healthy”.

Fig. 12. Grad-CAM result on a normal case.

4.6. Effect of deep feature fusion

We compared the performance of using deep feature fusion against not using deep feature fusion. The comparison was done on the validation set. The results using N(5–8) are presented in Table 8 . Comparing Tables 7 and 8, we can find that using DFF can increase the classification performance.

Table 8.

Comparison among N(5–8) over validation set.

N(5) ν1 ν2 ν3 ν4 ν5 ν6 ν7
1 96.88 90.63 91.18 93.75 93.94 87.67 93.98
2 90.63 93.75 93.55 92.19 92.06 84.42 92.08
3 93.75 92.19 92.31 92.97 93.02 85.95 93.03
4 95.31 89.06 89.71 92.19 92.42 84.54 92.47
5 95.31 90.63 91.04 92.97 93.13 86.03 93.15
6 90.63 90.63 90.63 90.63 90.63 81.25 90.63
7 92.19 92.19 92.19 92.19 92.19 84.38 92.19
8 95.31 89.06 89.71 92.19 92.42 84.54 92.47
9 89.06 93.75 93.44 91.41 91.20 82.90 91.23
10 95.31 90.63 91.04 92.97 93.13 86.03 93.15
MSD 93.44±2.64 91.25±1.68 91.48±1.37 92.34±0.89 92.41±0.98 84.77±1.80 92.44±0.98
N(6) ν1 ν2 ν3 ν4 ν5 ν6 ν7
1 96.88 96.88 96.88 96.88 96.88 93.75 96.88
2 92.19 96.88 96.72 94.53 94.40 89.16 94.43
3 95.31 95.31 95.31 95.31 95.31 90.63 95.31
4 98.44 93.75 94.03 96.09 96.18 92.29 96.21
5 98.44 93.75 94.03 96.09 96.18 92.29 96.21
6 95.31 95.31 95.31 95.31 95.31 90.63 95.31
7 95.31 95.31 95.31 95.31 95.31 90.63 95.31
8 93.75 95.31 95.24 94.53 94.49 89.07 94.49
9 93.75 95.31 95.24 94.53 94.49 89.07 94.49
10 95.31 93.75 93.85 94.53 94.57 89.07 94.58
MSD 95.47±2.01 95.16±1.15 95.19±1.04 95.31±0.82 95.31±0.86 90.66±1.66 95.32±0.86
N(7) ν1 ν2 ν3 ν4 ν5 ν6 ν7
1 95.31 95.31 95.31 95.31 95.31 90.63 95.31
2 93.75 96.88 96.77 95.31 95.24 90.67 95.25
3 96.88 93.75 93.94 95.31 95.38 90.67 95.40
4 96.88 95.31 95.38 96.09 96.12 92.20 96.13
5 95.31 95.31 95.31 95.31 95.31 90.63 95.31
6 95.31 95.31 95.31 95.31 95.31 90.63 95.31
7 95.31 95.31 95.31 95.31 95.31 90.63 95.31
8 98.44 95.31 95.45 96.88 96.92 93.80 96.93
9 96.88 95.31 95.38 96.09 96.12 92.20 96.13
10 96.88 95.31 95.38 96.09 96.12 92.20 96.13
MSD 96.09±1.33 95.31±0.74 95.36±0.67 95.70±0.55 95.72±0.57 91.42±1.11 95.72±0.57
N(8) ν1 ν2 ν3 ν4 ν5 ν6 ν7
1 98.44 98.44 98.44 98.44 98.44 96.88 98.44
2 98.44 96.88 96.92 97.66 97.67 95.32 97.68
3 96.88 98.44 98.41 97.66 97.64 95.32 97.64
4 96.88 95.31 95.38 96.09 96.12 92.20 96.13
5 96.88 96.88 96.88 96.88 96.88 93.75 96.88
6 100.00 98.44 98.46 99.22 99.22 98.45 99.23
7 96.88 96.88 96.88 96.88 96.88 93.75 96.88
8 100.00 90.63 91.43 95.31 95.52 91.03 95.62
9 95.31 98.44 98.39 96.88 96.83 93.80 96.84
10 96.88 98.44 98.41 97.66 97.64 95.32 97.64
MSD 97.66±1.52 96.88±2.44 96.96±2.21 97.27±1.12 97.28±1.08 94.58±2.17 97.30±1.06

For a clearer view, the results of all eight proposed models are presented in Table 9. N(1–4) did not use DFF, while N(5–8) added DFF to the corresponding networks (see Table 3). Fig. 13 shows the mean and standard deviation of the eight neural network models.

Table 9.

Comparison of eight network models.

Model ν1 ν2 ν3 ν4 ν5 ν6 ν7
N(1) 89.22±2.38 92.50±2.31 92.31±2.10 90.86±1.28 90.70±1.31 81.82±2.53 90.73±1.30
N(2) 94.22±1.05 94.69±2.47 94.71±2.29 94.45±1.40 94.45±1.34 88.93±2.78 94.46±1.33
N(3) 94.53±1.69 95.47±2.14 95.47±2.05 95.00±1.23 94.98±1.23 90.04±2.47 94.99±1.23
N(4) 96.25±1.32 97.03±1.55 97.03±1.52 96.64±1.11 96.63±1.10 93.30±2.21 96.63±1.10
N(5) 93.44±2.64 91.25±1.68 91.48±1.37 92.34±0.89 92.41±0.98 84.77±1.80 92.44±0.98
N(6) 95.47±2.01 95.16±1.15 95.19±1.04 95.31±0.82 95.31±0.86 90.66±1.66 95.32±0.86
N(7) 96.09±1.33 95.31±0.74 95.36±0.67 95.70±0.55 95.72±0.57 91.42±1.11 95.72±0.57
N(8) 97.66±1.52 96.88±2.44 96.96±2.21 97.27±1.12 97.28±1.08 94.58±2.17 97.30±1.06

Fig. 13. Error bar of proposed eight models.

Comparing BDRMBN, i.e., N(4), against DBDRMBN, i.e., N(8), we can see that adding DFF improves five of the seven indicators (ν1, ν4, ν5, ν6, ν7), the exceptions being ν2 and ν3. This indicates that DFF is effective in increasing the classifier's performance.

The same scenario can be observed by comparing BN of N(1) against DBN of N(5), BDBN of N(2) against DBDBN of N(6), and BDRBN of N(3) against DBDRBN of N(7). The reason why DFF can improve the performance is that DFF fuses features from the GCN, which learns the relation-aware representations (RARs) among the validation images. Hence, classifiers with DFF were more accurate than those without DFF.

Besides, from Table 9 and Fig. 13, we can find the optimal $n^* = 8$. That is, N(8) achieved the best performance among all our proposed eight network models. This matches our expectation, because DBDRMBN, i.e., N(8), fuses RAR features from the GCN with features from BDRMBN, i.e., N(4). This feature fusion helps our classifier obtain the best performance. As stated in Section 3.6, the best model among all eight proposed networks is dubbed FGCNet, indicating the fusion of GCN and CNN networks.

4.7. Comparison to state-of-the-art approaches

The above performances were obtained on the validation set. Now we run the best model, FGCNet, i.e., the DBDRMBN model, on the test set and report its performance. The results are shown in Table 10. As shown, ν1 = 97.71±1.46, ν2 = 96.56±1.48, ν3 = 96.61±1.43, ν4 = 97.14±1.26, ν5 = 97.15±1.25, ν6 = 94.29±2.52, ν7 = 97.16±1.25. Comparing Tables 9 and 10, we observe that the performances on the validation set and test set are quite similar, with the test performance only slightly lower than the validation performance.

Table 10.

Performance of proposed N(8) FGCNet on test set.

Run ν1 ν2 ν3 ν4 ν5 ν6 ν7
1 98.96 96.88 96.94 97.92 97.94 95.85 97.94
2 96.88 96.88 96.88 96.88 96.88 93.75 96.88
3 96.88 98.96 98.94 97.92 97.89 95.85 97.90
4 96.88 95.83 95.88 96.35 96.37 92.71 96.37
5 96.88 95.83 95.88 96.35 96.37 92.71 96.37
6 97.92 95.83 95.92 96.88 96.91 93.77 96.91
7 95.83 95.83 95.83 95.83 95.83 91.67 95.83
8 100.00 97.92 97.96 98.96 98.97 97.94 98.97
9 96.88 93.75 93.94 95.31 95.38 90.67 95.40
10 100.00 97.92 97.96 98.96 98.97 97.94 98.97
MSD 97.71±1.46 96.56±1.48 96.61±1.43 97.14±1.26 97.15±1.25 94.29±2.52 97.16±1.25

We compared our DBDRMBN method, i.e., the N(8) model, with 15 state-of-the-art approaches: RBFNN [4], KELM [5], ELM-BA [6], RCBBO [7], 6L-CLF [8], GoogLeNet [9], ResNet-18 [10], RN-50-AD [11], SMO [12], CSSNet [13], GGNet [14], COVNet [15], NiNet [16], FCONet [17], and DeCovNet [18]. All the methods were compared on the test set of our 640-image dataset. Some methods were not proposed for detecting COVID-19, some work on chest X-ray images, and some address multi-class classification; nevertheless, we modified and transferred these methods to our CCT images. The comparison and its plot are presented in Table 11 and Fig. 14. The results in Table 11 and Fig. 14 show that the proposed FGCNet (DBDRMBN) achieved the best results among all algorithms.

Table 11.

Comparison with state-of-the-art approaches.

Approach ν1 ν2 ν3 ν4 ν5 ν6 ν7
RBFNN [4] 67.08 74.48 72.52 70.78 69.64 41.74 69.64
KELM [5] 57.29 61.46 59.83 59.38 58.46 18.81 58.46
ELM-BA [6] 57.08±3.86 72.40±3.03 67.48±1.65 64.74±1.26 61.75±2.24 29.90±2.45 61.76±2.24
RCBBO [7] 69.48±4.47 81.15±3.16 78.79±1.80 75.31±0.82 73.72±1.86 51.10±1.28 73.93±1.66
6L-CLF [8] 81.04±2.90 79.27±2.21 79.70±1.27 80.16±0.85 80.31±1.13 60.42±1.73 80.35±1.15
GoogLeNet [9] 76.88±3.92 83.96±2.29 82.84±1.58 80.42±1.40 79.65±1.92 61.10±2.62 79.65±1.91
ResNet-18 [10] 78.96±2.90 89.48±1.64 88.30±1.50 84.22±1.23 83.31±1.53 68.89±2.33 83.32±1.53
RN-50-AD [11] 83.96±3.19 90.31±2.14 89.73±1.78 87.14±1.07 86.69±1.34 74.50±2.00 86.77±1.27
SMO [12] 93.23±1.72 95.52±1.30 95.44±1.22 94.38±0.64 94.31±0.68 88.80±1.27 93.23±1.72
CSSNet [13] 92.08±1.01 93.33±2.61 93.32±2.40 92.71±0.95 92.67±0.85 85.47±1.93 92.69±0.86
GGNet [14] 94.38±2.09 90.63±2.02 91.00±1.77 92.50±1.16 92.64±1.14 85.11±2.34 92.66±1.15
COVNet [15] 90.83±2.45 96.67±0.82 96.47±0.82 93.75±1.13 93.55±1.24 87.68±2.14 93.60±1.21
NiNet [16] 95.31±1.13 77.19±2.62 80.73±1.70 86.25±1.02 87.40±0.79 73.76±1.73 87.71±0.73
FCONet [17] 93.54±1.69 96.04±1.46 95.96±1.38 94.79±0.92 94.72±0.94 89.64±1.83 94.74±0.94
DeCovNet [18] 90.00±1.49 90.52±1.99 90.52±1.72 90.26±0.65 90.24±0.61 80.56±1.32 90.25±0.61
FGCNet (Ours) 97.71±1.46 96.56±1.48 96.61±1.43 97.14±1.26 97.15±1.25 94.29±2.52 97.16±1.25

Fig. 14. Comparison plot.

5. Conclusions

This paper proposed a total of eight network models for COVID-19 detection in CCT images. Our experiments showed that DBDRMBN, i.e., N(8), achieves the best performance among all eight proposed models. This model is also named FGCNet for short, and it obtains superior results to the other 15 state-of-the-art approaches.

The reason why our FGCNet has the best performance is that (i) FGCNet (DBDRMBN) is a deep feature fusion of an improved CNN model, N(4) (BDRMBN), and a GCN model. Here, N(4) (BDRMBN) extracts learnt individual image-level representations, while the GCN extracts learnt relation-aware representations, and the DFF strategy fuses those two types of features. (ii) The proposed N(4) (BDRMBN) is a novel neural network whose structure was developed and whose weights were trained from scratch. Besides, BDRMBN uses several advanced techniques, such as batch normalization, dropout, rank-based average pooling, and multiple-way data augmentation.

The shortcomings of this proposed FGCNet can only handles CCT images. For chest X-ray images and other sources of data, the FGCNet may not work correctly. So, it is necessary to define a network that can fuse different sources/modalities of data and give an overall decision. Another shortcoming is this proposed FGCNet is not verified by a strict clinical test.

The future work directions are: (i) Expand the dataset and test our algorithm on different sources of COVID-19 data, such as a combination of X-ray and CT. (ii) Test other fusion strategies for CNN and GCN. (iii) Employ deeper GCNs and test whether they improve performance.

CRediT authorship contribution statement

Shui-Hua Wang: Conceptualization, Methodology, Software, Validation, Data curation, Writing - original draft, Investigation. Vishnu Varthanan Govindaraj: Formal analysis, Writing - original draft, Writing - review & editing. Juan Manuel Górriz: Writing - original draft, Writing - review & editing, Supervision, Funding acquisition. Xin Zhang: Writing - original draft, Writing - review & editing. Yu-Dong Zhang: Resources, Formal analysis, Investigation, Data curation, Writing - review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This paper is partially supported by Royal Society International Exchanges Cost Share Award, UK (RP202G0230); Hope Foundation for Cancer Research, UK (RM60G0680); Medical Research Council Confidence in Concept Award, UK (MC_PC_17171); British Heart Foundation Accelerator Award, UK; and MINECO/FEDER under the RTI2018-098913-B100 and A-TIC-080-UGR18 projects.

Appendix A

Table 12.

Abbreviation list.

Abbreviation Full name
CCT chest computed tomography
GGO ground-glass opacity
CAP community acquired pneumonia
HS hyperspectral
CNN convolutional neural network
GCN graph convolutional network
IIR individual image-level representation
RAR relation-aware representation
MV majority voting
NLAF non-linear activation function
ICS internal covariant shift
BAN Batch Normalization
DO Dropout
L2P l2 pooling
MDA Multiple-way data augmentation
SSD small-size dataset
LG lack of generation
KMC k-means clustering
kNN k-nearest neighbors
BN Base Network
FGCNet Fusion of GCN and CNN Network

Appendix B

Table 13.

Symbol list.

Symbol Meaning
L Labeling from each individual expert
L^ Final labeling
U Dataset
U1 Raw dataset
U5 Preprocessed dataset
|U| Number of samples in the dataset
Xt Training set
Xv Validation Set
Y Test set
μmin Min grayscale value
μmax Max grayscale value
W Width
H Height
C Channel
αs Stability factor
αrp Retention probability
μe Empirical mean
ϕe Empirical variance
μp Population mean
ϕp Population variance
Ψ Region to be pooled
n Pooling size.
P Pooling output
ag Rank threshold
χ Various data augmentation factor
ηDA Number of MDA techniques
ηn Number of new images generated for each DA
aZ Maximum shift factor
Mirror function
ηEF Enhance factor
Concatenation
uc Number of conv layers
uf Number of FCL layers
|V| Cluster centroids
Number of neighbors in kNN
W Number of runs
w Run index

Appendix C

Fig. 4 shows a simplistic CNN example with four FCL layers. Suppose we have C(k), k = 1, 2, 3, 4 neurons at the k-th layer, and assume C(1) = 6, C(2) = 10, C(3) = 6, C(4) = 4. Thus, we have in total Σ_{k=1}^{4} C(k) = 26 nodes. Suppose we do not consider incoming and outgoing weights and do not count the biases; the size of the learnable weights C(i, j), where (i, j) ∈ {(1, 2), (2, 3), (3, 4)}, i.e., the number of weights between layer i and layer j before dropout, can roughly be calculated as C(1, 2) = 6 × 10 = 60, C(2, 3) = 10 × 6 = 60, C(3, 4) = 6 × 4 = 24. In total, the number of learnable weights before dropout is C = Σ_{i,j} C(i, j) = 144.

Using αrp = 0.5, the size of the learnable weights after dropout between layer i and layer j is denoted CD(i, j), and the total number of learnable weights becomes CD = Σ_{i,j} CD(i, j) = 15 + 15 + 6 = 36. The compression ratio of learnable weights (CRLW), roughly, is CD / C = 36 / 144 = 0.25.
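A short sketch of the calculation above, kept deliberately literal so it can be checked by running it (the layer sizes and the retention probability are exactly those assumed in this appendix; halving each layer follows from αrp = 0.5):

# Worked example from Appendix C: learnable weights in a 4-layer FCL stack,
# before and after dropout with retention probability alpha_rp = 0.5.
layer_sizes = [6, 10, 6, 4]
alpha_rp = 0.5

# Weights between consecutive layers (biases ignored, as in the text).
before = [a * b for a, b in zip(layer_sizes, layer_sizes[1:])]   # [60, 60, 24]
kept = [int(n * alpha_rp) for n in layer_sizes]                  # [3, 5, 3, 2]
after = [a * b for a, b in zip(kept, kept[1:])]                  # [15, 15, 6]

C, C_D = sum(before), sum(after)
print(f"C = {C}, C_D = {C_D}, CRLW = {C_D / C:.2f}")             # C = 144, C_D = 36, CRLW = 0.25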

Appendix D

Using Fig. 5 as a simplistic example, assume the region Ψ(1, 1) at the 1st row and 1st column of the input is chosen. For ease of presentation, we use its row-vector form Ψ(1, 1) ← vec(Ψ(1, 1)) in this paragraph, so we have Ψ(1, 1) = (2.5, 5, 1.6, 1.1). The result of L2P is PL2P(Ψ(1, 1)) = sqrt((2.5² + 5² + 1.6² + 1.1²) / 4) = sqrt(35.02 / 4) = sqrt(8.755) ≈ 2.96. The pooling result of AP is PAP(Ψ(1, 1)) = average(Ψ(1, 1)) = (2.5 + 5 + 1.6 + 1.1) / 4 = 2.55. The MP result is PMP(Ψ(1, 1)) = max(Ψ(1, 1)) = max(2.5, 5, 1.6, 1.1) = 5. The rank matrix of Ψ(1, 1), also expressed in row-vector format RΨ(1, 1) ← vec(RΨ(1, 1)), is RΨ(1, 1) = (2, 1, 3, 4). The RAP result, i.e., the average of the two highest-ranked elements, is PRAP(Ψ(1, 1)) = (2.5 + 5) / 2 = 3.75.
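The same worked example, expressed as a minimal sketch (the region values and the rank threshold of 2 are those assumed in this appendix; this illustrates the pooling operators, not the paper's implementation):

import numpy as np

region = np.array([2.5, 5.0, 1.6, 1.1])   # row-vector form of the pooled region Psi(1, 1)

p_l2p = np.sqrt(np.mean(region ** 2))     # L2 pooling      -> ~2.96
p_ap = np.mean(region)                    # average pooling -> 2.55
p_mp = np.max(region)                     # max pooling     -> 5.0

# Rank-based average pooling: rank elements (1 = largest) and average the top a_g of them.
a_g = 2                                   # rank threshold assumed in the example
ranks = np.argsort(np.argsort(-region)) + 1          # ranks: [2, 1, 3, 4]
p_rap = region[ranks <= a_g].mean()                  # (2.5 + 5.0) / 2 = 3.75

print(p_l2p, p_ap, p_mp, ranks, p_rap)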

References

1. Ataguba O.A., Ataguba J.E. Social determinants of health: the role of effective communication in the COVID-19 pandemic in developing countries. Glob. Health Action. 2020;13:5. doi: 10.1080/16549716.2020.1788263. Article ID: 1788263, Dec.
2. Ketencioglu B.B., Yigit F., Almadqa M., Tutar N., Yilmaz I. Non-infectious diseases compatible with COVID-19 pneumonia. Cureus. 2020;12:5. doi: 10.7759/cureus.9989. Article ID: e9989, Aug.
3. Hope M.D., Raptis C.A., Shah A., Hammer M.M., Henry T.S., Six S. A role for CT in COVID-19? What data really tell us so far. Lancet. 2020;395:1189–1190. doi: 10.1016/S0140-6736(20)30728-5. Apr.
4. Lu Z. A pathological brain detection system based on radial basis function neural network. J. Med. Imaging Health Inform. 2016;6:1218–1222.
5. Yang J. A pathological brain detection system based on kernel based ELM. Multimed. Tools Appl. 2018;77:3715–3728.
6. Lu S. A pathological brain detection system based on extreme learning machine optimized by bat algorithm. CNS Neurol. Disord.-Drug Targets. 2017;16:23–29. doi: 10.2174/1871527315666161019153259.
7. Li P., Liu G. Pathological brain detection via wavelet packet Tsallis entropy and real-coded biogeography-based optimization. Fundam. Inform. 2017;151:275–291.
8. Jiang X. Chinese sign language fingerspelling recognition via six-layer convolutional neural network with leaky rectified linear units for therapy and rehabilitation. J. Med. Imaging Health Inform. 2019;9:2031–2038.
9. Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Boston, MA, USA; 2015. pp. 1–9.
10. Guo M.H., Du Y.Z. Classification of thyroid ultrasound standard plane images using ResNet-18 networks. IEEE 13th International Conference on Anti-Counterfeiting, Security, and Identification; Xiamen, China; 2019. pp. 324–328.
11. Fulton L.V., Dolezel D., Harrop J., Yan Y., Fulton C.P. Classification of Alzheimer's disease with and without imagery using gradient boosted machines and ResNet-50. Brain Sci. 2019;9:16. doi: 10.3390/brainsci9090212. Article ID: 212, Sep.
12. Togacar M., Ergen B., Comert Z. COVID-19 detection using deep learning models to exploit social mimic optimization and structured chest X-ray images using fuzzy color and stacking approaches. Comput. Biol. Med. 2020;121:12. doi: 10.1016/j.compbiomed.2020.103805. Article ID: 103805, Jun.
13. Cohen J.P., Dao L., Morrison P., Roth K., Bengio Y., Shen B.Y. Predicting COVID-19 pneumonia severity on chest X-ray with deep learning. Cureus. 2020;12:10. doi: 10.7759/cureus.9448. Article ID: e9448, Jul.
14. Loey M., Smarandache F., Khalifa N.E.M. Within the lack of chest COVID-19 X-ray dataset: a novel detection model based on GAN and deep transfer learning. Symmetry-Basel. 2020;12:19. Article ID: 651, Apr.
15. Li L., Qin L., Xu Z., Yin Y., Wang X., Kong B. Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy. Radiology. 2020;296:E65–E71. doi: 10.1148/radiol.2020200905.
16. Ni Q.Q., Sun Z.Y., Qi L., Chen W., Yang Y., Wang L. A deep learning approach to characterize 2019 coronavirus disease (COVID-19) pneumonia in chest CT images. Eur. Radiol. 2020:11. doi: 10.1007/s00330-020-07044-9. https://link.springer.com/article/10.1007/s00330-020-07044-9
17. Ko H., Chung H., Kang W.S., Kim K.W., Shin Y., Kang S.J. COVID-19 pneumonia diagnosis using a simple 2D deep learning framework with a single chest CT image: model development and validation. J. Med. Internet Res. 2020;22:13. doi: 10.2196/19569. Article ID: e19569, Jun.
18. Wang X.G., Deng X.B., Fu Q., Zhou Q., Feng J.P., Ma H. A weakly-supervised framework for COVID-19 classification and lesion localization from chest CT. IEEE Trans. Med. Imaging. 2020;39:2615–2625. doi: 10.1109/TMI.2020.2995965. Aug.
19. Tabik S., Gómez-Ríos A., Martín-Rodríguez J., Sevillano-García I., Rey-Area M., Charte D., et al. COVIDGR dataset and COVID-SDNet methodology for predicting COVID-19 based on chest X-ray images. arXiv preprint. 2020.
20. Hasan M.M., Schaduangrat N., Basith S., Lee G., Shoombuatong W., Manavalan B. HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation. Bioinformatics. 2020;36:3350–3356. doi: 10.1093/bioinformatics/btaa160. Jun.
21. Li D., Wang Q., Kong F.Q. Adaptive kernel sparse representation based on multiple feature learning for hyperspectral image classification. Neurocomputing. 2020;400:97–112. Aug.
22. Shi J., Wang R., Zheng Y., Jiang Z., Yu L. Graph convolutional networks for cervical cell classification. Second MICCAI Workshop on Computational Pathology (COMPAT); Shenzhen, China; 2019.
23. Bin Y.R., Chen Z.M., Wei X.S., Chen X.Y., Gao C.X., Sang N. Structure-aware human pose estimation with graph convolutional networks. Pattern Recognit. 2020;106:13. Article ID: 107410, Oct.
24. Tian Z.Q., Li X.J., Zheng Y.Y., Chen Z., Shi Z., Liu L.Z. Graph-convolutional-network-based interactive prostate segmentation in MR images. Med. Phys. 2020:13. doi: 10.1002/mp.14327.
25. Castillo-Barnes D., Su L., Ramirez J., Salas-Gonzalez D., Martinez-Murcia F.J., Illan I.A. Autosomal dominantly inherited Alzheimer disease: analysis of genetic subgroups by machine learning. Inf. Fusion. 2020;58:153–167. doi: 10.1016/j.inffus.2020.01.001. Jun.
26. Rodriguez-Rivero J., Ramirez J., Martinez-Murcia F.J., Segovia F., Ortiz A., Salas D. Granger causality-based information fusion applied to electrical measurements from power transformers. Inf. Fusion. 2020;57:59–70. May.
27. Valueva M.V., Nagornov N.N., Lyakhov P.A., Valuev G.V., Chervyakov N.I. Application of the residue number system to reduce hardware costs of the convolutional neural network implementation. Math. Comput. Simul. 2020;177:232–243. Nov.
28. Tabik S., Alvear-Sandoval R.F., Ruiz M.M., Sancho-Gomez J.L., Figueiras-Vidal A.R., Herrera F. MNIST-NET10: a heterogeneous deep networks fusion based on the degree of certainty to reach 0.1% error rate. Ensembles overview and proposal. Inf. Fusion. 2020;62:73–80. Oct.
29. Mittal G., Korus P., Memon N. FiFTy: large-scale file fragment type identification using convolutional neural networks. IEEE Trans. Inf. Forensics Secur. 2021;16:28–41.
30. Azamfar M., Singh J., Bravo-Imaz I., Lee J. Multisensor data fusion for gearbox fault diagnosis using 2-D convolutional neural network and motor current signature analysis. Mech. Syst. Signal Process. 2020;144:18. Article ID: 106861, Oct.
31. Kim E. Interpretable and accurate convolutional neural networks for human activity recognition. IEEE Trans. Ind. Inf. 2020;16:7190–7198. Nov.
32. Jeon S., Moon J. Malware-detection method with a convolutional recurrent neural network using opcode sequences. Inf. Sci. 2020;535:1–15. Oct.
33. Nayak D.R., Das D., Dash R., Majhi S., Majhi B. Deep extreme learning machine with leaky rectified linear unit for multiclass classification of pathological brain images. Multimed. Tools Appl. 2020;79:15381–15396. Jun.
34. Górriz J.M. Artificial intelligence within the interplay between natural and artificial computation: advances in data science, trends and applications. Neurocomputing. 2020;410:237–270.
35. Zhang Y.-D. Advances in multimodal data fusion in neuroimaging: overview, challenges, and novel orientation. Inf. Fusion. 2020;64:149–187. doi: 10.1016/j.inffus.2020.07.006.
36. Kileel J., Trager M., Bruna J. On the expressive power of deep polynomial neural networks. Advances in Neural Information Processing Systems. Vol. 32. 2019. pp. 1–10. https://papers.nips.cc/paper/9219-on-the-expressive-power-of-deep-polynomial-neural-networks.pdf
37. Srivastava N., Hinton G., Krizhevsky A., Sutskever I., Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014;15:1929–1958. Jun.
38. Shi Z.L., Ye Y.D., Wu Y.P. Rank-based pooling for deep convolutional neural networks. Neural Netw. 2016;83:21–31. doi: 10.1016/j.neunet.2016.07.003. Nov.
39. Jiang Y.Y. Cerebral micro-bleed detection based on the convolution neural network with rank based average pooling. IEEE Access. 2017;5:16576–16583.
40. Sun Z., Chiong R., Hu Z.P. An extended dictionary representation approach with deep subspace learning for facial expression recognition. Neurocomputing. 2018;316:1–9. Nov.
41. Akhtar N., Ragavendran U. Interpretation of intelligence in CNN-pooling processes: a methodological survey. Neural Comput. Appl. 2020;32:879–898. Feb.
42. Tarawneh A.S., Hassanat A.B.A., Almohammadi K., Chetverikov D., Bellinger C. SMOTEFUNA: synthetic minority over-sampling technique based on furthest neighbour algorithm. IEEE Access. 2020;8:59069–59082.
43. Jan Z., Verma B. Multiple strong and balanced cluster-based ensemble of deep learners. Pattern Recognit. 2020;107:11. Article ID: 107420, Nov.
44. Spinelli I., Scardapane S., Uncini A. Missing data imputation with adversarially-trained graph convolutional networks. Neural Netw. 2020;129:249–260. doi: 10.1016/j.neunet.2020.06.005. Sep.
45. Mallick T., Balaprakash P., Rask E., Macfarlane J. Graph-partitioning-based diffusion convolutional recurrent neural network for large-scale traffic forecasting. Transp. Res. Rec. 2020;2674:473–488.
46. Derr T., Ma Y., Fan W.Q., Liu X.R., Aggarwal C., Tang J.L. Epidemic graph convolutional network. 13th International Conference on Web Search and Data Mining; Houston, TX; 2020. pp. 160–168.
47. Jeong C., Jang S., Park E., Choi S. A context-aware citation recommendation model with BERT and graph convolutional networks. Scientometrics. 2020;124:1907–1922. Sep.
48. Kipf T.N., Welling M. Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations (ICLR); 2017. pp. 1–14.
49. Selvaraju R.R., Cogswell M., Das A., Vedantam R., Parikh D., Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2020;128:336–359.
