2020 Nov 13;68:131–148. doi: 10.1016/j.inffus.2020.11.005

COVID-19 classification by CCSHNet with deep fusion using transfer learning and discriminant correlation analysis

Shui-Hua Wang a,b,c,#, Deepak Ranjan Nayak d,#, David S Guttery e,#, Xin Zhang f, Yu-Dong Zhang b,g
PMCID: PMC7837204  PMID: 33519321

Highlights

  • We proposed a novel (L, 2) transfer feature learning (L2TFL) approach.

  • L2TFL can elucidate the optimal layers to be removed prior to selection.

  • We developed a novel selection algorithm of pretrained networks for fusion (SAPNF).

  • SAPNF can determine the best two pretrained models for fusion.

  • We introduced a deep CCT fusion method based on discriminant correlation analysis (DCFDCA).

Keywords: Chest CT, COVID-19, Deep fusion, Transfer learning, Pretrained model, Discriminant correlation analysis, Micro-averaged F1

Abstract

Aim

COVID-19 is a disease caused by a new strain of coronavirus. Up to 18th October 2020, worldwide there have been 39.6 million confirmed cases resulting in more than 1.1 million deaths. To improve diagnosis, we aimed to design and develop a novel advanced AI system for COVID-19 classification based on chest CT (CCT) images.

Methods

Our dataset from local hospitals consisted of 284 COVID-19 images, 281 community-acquired pneumonia images, 293 secondary pulmonary tuberculosis images, and 306 healthy control images. We first used pretrained models (PTMs) to learn features, and proposed a novel (L, 2) transfer feature learning algorithm to extract features, with a hyperparameter of the number of layers to be removed (NLR, symbolized as L). Second, we proposed a selection algorithm of pretrained networks for fusion to determine the best two models characterized by PTM and NLR. Third, deep CCT fusion by discriminant correlation analysis was proposed to fuse the two features from the two models. The micro-averaged (MA) F1 score was used as the measuring indicator. The final determined model was named CCSHNet.

Results

On the test set, CCSHNet achieved sensitivities for the four classes of 95.61%, 96.25%, 98.30%, and 97.86%, respectively. The precision values for the four classes were 97.32%, 96.42%, 96.99%, and 97.38%, respectively. The F1 scores for the four classes were 96.46%, 96.33%, 97.64%, and 97.62%, respectively. The MA F1 score was 97.04%. In addition, CCSHNet outperformed 12 state-of-the-art COVID-19 detection methods.

Conclusions

CCSHNet is effective in detecting COVID-19 and other lung infectious diseases using first-line clinical imaging and can therefore assist radiologists in making accurate diagnoses based on CCTs.

1. Introduction

COVID-19 (coronavirus disease 2019) was declared a Public Health Emergency of International Concern on 30 January 2020, and a worldwide pandemic on 11 March 2020. Up to 18 October 2020, globally there have been 39.6 million confirmed cases and more than 1.1 million deaths (including 222.5k in the US, 153.6k in Brazil, 114.0k in India, 86.0k in Mexico, 43.5k in the UK, etc.) [1].

Two prevailing diagnostic methods are available for COVID-19 detection. One is viral testing by nasopharyngeal swabs [2] to test for the existence of viral RNA fragments using real-time reverse-transcriptase PCR (rRT-PCR), and the other is imaging methods such as chest X-ray (CXR) [3] and chest computed tomography (CCT) [4]. Compared to viral testing, CCT can avoid the problem of sample contamination. For example, the swab can touch contaminated surfaces or gloves, samples can be cross-contaminated, etc. It was reported that in March 2020, due to the problem of reagent contamination, the US Centers for Disease Control and Prevention (CDC) withdrew testing kits [5]. As an alternative, CCT scans can help to detect hazy, patchy, “ground glass” white spots in the lung, a tell-tale sign of COVID-19 infection, which can provide a more accurate result than viral tests. Furthermore, previous studies have shown that CCT can detect 97% of COVID-19 infections; whereas viral testing only detected 52% of patients with COVID-19 infection [6].

There are currently two imaging modalities that are used to detect COVID-19 infection. CXR is the most widely used diagnostic X-ray examination in medical practice, producing images of the blood vessels, airways, lungs, heart, bones of the spine, and chest. On the other hand, CCT uses computer-processed combinations of numerous X-ray images taken at different angles to produce a cross-sectional image of the region being scanned and to examine abnormalities. CCT is able to detect very small nodules in the lung compared to CXR [7]. In addition, CCT has advantages over CXR since it generates high-quality, detailed images by taking a 360-degree image of the chest and its internal organs. Moreover, CXR provides a 2D image that contains less information; whereas CCT provides 3D volumetric data that can highlight additional spatial features and abnormalities.

For diagnosis of COVID-19, CXR is sub-optimal since important abnormalities are undetectable due to the normal black appearance of the lung. However, CCT can clearly show a combination of multifocal peripheral lung changes of ground-glass opacity (GGO) [8] and/or consolidation [9], which indicate infection with COVID-19. Hence, in this study we used CCT to aid diagnosis of COVID-19 infection. Nevertheless, manual labeling by radiologists is tedious and time-consuming, while being affected by inter- and/or intra-expert factors (e.g., emotion, tiredness, lethargy, etc.). Further, diagnostic throughputs of radiologists are not comparable with digital methods and early symptoms are more difficult to measure and hence can potentially be missed by experts.

Improved diagnostic systems using image processing and machine learning can potentially benefit patients, experts, radiologists, consultants, and hospitals. Currently, most AI methods can differentiate COVID-19 infection in images from healthy subjects and/or community acquired pneumonia (CAP). Deep learning (DL) approaches are an emerging new type of machine learning, which consists of stacks of convolution layers and fully connected layers (FCLs).

For example, Li and Liu [10] employed wavelet packet Tsallis entropy as a feature descriptor, and used a real-coded biogeography-based optimization (RCBO) approach as a classifier. Lu [11] employed the bat algorithm to optimize an extreme learning machine. Their method was called ELM-BA. Jiang [12] proposed a six-level convolutional neural network (6L-CNN) for therapy and rehabilitation, improving performance by replacing the traditional rectified linear unit with the leaky rectified linear unit. Guo and Du [13] used ResNet-18 (RN-18) to classify thyroid ultrasound standard plane (TUSP), achieving a classification accuracy of 83.88%. Their experiment verified the effectiveness of RN-18. Fulton, et al. [14] utilized ResNet-50 to classify Alzheimer's disease (RN-50-AD) with and without imagery. The authors stated that ResNet-50 models might help identify AD patients prior to provider review. Although these five previous studies did not analyze COVID-19 positive patients, their algorithms can easily be transferred to the multi-class COVID-19 classification task in this study.

Numerous cutting-edge AI methods have been proposed to diagnose COVID-19 using either CXR or CCT. For CXR, Loey, et al. [15] employed a generative adversarial network (GAN) to produce new simulated images, showing that the combination of GAN and GoogleNet (GAN-GN) performs better for two-class classification than AlexNet and ResNet-18. Togacar, et al. [16] utilized SqueezeNet and MobileNetV2 to obtain image descriptors. The authors chose social mimic optimization (SMO) as a feature selection tool. The obtained features were then combined and passed into support vector machines. Cohen, et al. [17] employed a sizable non-COVID-19 CXR set to improve the features extracted from CXR images of COVID-19 patients and predicted two scores: (i) a lung opacity score; and (ii) a geographic extent score. Their method could gauge the severity of COVID-19. The method (termed COVID severity score or CSS) achieved a mean absolute error (MAE) of 1.14 on the geographic extent score, and an MAE of 0.78 on the lung opacity score. Tabik, et al. [18] built COVIDGR-1.0, a homogeneous and balanced database that includes all levels of severity, and presented a novel COVID-SDNet in order to classify COVID-19 based on CXR images.

For CCT, Ni, et al. [19] proposed NiNet, utilizing both 3D U-Net and MVP-Net on CCT scans from more than 90 COVID-19 patients, for the aims of (i) pulmonary lobe segmentation, (ii) lesion segmentation, and (iii) lesion detection. The authors found the deep learning algorithm could assist radiologists in making quicker diagnoses (all p values less than 0.01%) with first-class performance. Ko, et al. [20] presented a straightforward 2D fast-track deep learning system for single CCT images, termed FCONet (fast-track COVID-19 classification network). They analyzed 4 pretrained models: ResNet-50, VGG16, Xception, and Inception-V3, finding that ResNet-50 performed the best when classifying COVID-19 positive patients. They used two augmentation methods, zoom and image rotation, while proposing extra layers consisting of a flatten layer, a FCL (32 neurons), and a FCL (3 neurons). The final FCL has 3 neurons since their task is to classify 3 categories: COVID-19, other pneumonia, and non-pneumonia. As validation, the authors tested the FCONet approaches on an external set of embedded low-quality CCT images of COVID-19 patients. Li, et al. [21] developed COVNet, choosing ResNet50 as the backbone network. In their study, the deep representations were merged by a max-pooling procedure, with the obtained feature map being passed into a FCL to produce the probability scores of three categories: (i) COVID-19 infection, (ii) CAP, and (iii) non-pneumonia. Wang, et al. [22] proposed DeCovNet, a weakly-supervised DL framework on three-dimensional CT data for (i) lesion localization; and (ii) COVID-19 classification. The lung region was first segmented via a pre-trained UNet. Next, the segmented 3D lung region was passed to a three-dimensional deep network to predict the probability of COVID-19. When using a probability threshold of 0.5, DeCovNet yielded an accuracy of 90.1%, a negative predictive value of 98.2%, and a positive predictive value of 84.0%. Satapathy, et al. [23] proposed a seven-layer CNN with stochastic pooling. Their method achieved a specificity of 93.63%, a sensitivity of 94.44%, and an accuracy of 94.03%. Wu [24] combined wavelet Renyi entropy with a three-segment biogeography-based optimization (TSBO). The proposed TSBO can optimize the weights, biases, and order of Renyi entropy at the same time.

The inspiration for this study was to improve detection of COVID-19 infection in CCT images by developing a novel method to fuse the features from two neural network models. The main contributions of this paper are: (i) we proposed a novel (L, 2) transfer feature learning (L2TFL) approach to elucidate the optimal layers to be removed prior to selection by testing various pretrained networks with various settings; (ii) we developed a novel selection algorithm of pretrained networks for fusion (SAPNF) that can determine the best two pretrained models, and showed that it gives better performance than a greedy selection algorithm for fusion (GSAF); (iii) we introduced a deep CCT fusion by discriminant correlation analysis (DCFDCA) method that gives better performance than traditional addition and concatenation fusion methods; and (iv) we improved performance over current methods by implementing multiple-way data augmentation.

The structure of the paper is organized as below. Section 2 introduces the dataset, imaging protocol, slice selection method, ground-truth labeling, and preprocessing of the images. Section 3 describes every component of the proposed AI model for COVID-19 detection, and Section 4 presents the experimental results and discussions. Finally, Section 5 concludes the paper.

2. Dataset and preprocessing

Table 10 in Appendix A and Table 11 in Appendix B list the abbreviations and variable definitions used in this paper for ease of reading.

2.1. Slice selection

Four types of CCT were used in this study: (i) COVID-19 positive; (ii) community-acquired pneumonia (CAP); (iii) secondary pulmonary tuberculosis (SPT); and (iv) healthy control (HC). The three diseased classes were chosen since they are all infectious diseases of the chest. Our aim was to develop an AI system that can automatically predict the four categories.

For each subject, s = {1, 2, 3, 4} slices were chosen using a slice level selection (SLS) method: for the three diseased groups, the slices displaying the largest number and size of lesions were chosen; for healthy subjects, any slice of the 3D image was randomly chosen.

The resolutions of all images were 1024×1024×3. In total, we enrolled 521 subjects, and generated 1164 slice images using the SLS method, viz., 284 COVID-19 images, 281 CAP images, 293 SPT images; and 306 HC images. Image collection is challenging since it is expensive and labor-intensive, as well as requiring expert curation. Table 1 lists the demographics of the four-category subject cohort.

Table 1.

Subjects and images of four categories.

Category Patients (n) CCT Images (n)
COVID-19 125 284
CAP 123 281
SPT 134 293
HC 139 306

(n = number).

2.2. Ground-truth labelling

Three radiologists (Two juniors: B1 and B2, and one senior: B3) were assigned to curate all the images. Suppose x0 means one CCT scan, Z means the labeling of each individual expert, and the final labeling ZCCT of the CCT scan is obtained by

$Z_{CCT}[x_0]=\begin{cases} Z[B_1,x_0], & \text{if } Z[B_1,x_0]=Z[B_2,x_0] \\ MV\{Z_{all}[x_0]\}, & \text{otherwise} \end{cases}$  (1)

where $Z_{all}$ denotes the labeling of all radiologists, viz.,

$Z_{all}[x_0]=[Z(B_1,x_0),\ Z(B_2,x_0),\ Z(B_3,x_0)]$  (2)

and MV denotes majority voting. The above equation states that, when the two junior radiologists ($B_1$, $B_2$) disagree, a senior radiologist ($B_3$) is consulted to reach a consensus.
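As a concrete illustration, here is a minimal Python sketch of the consensus rule in Eqs. (1)–(2); the function name and the string labels are illustrative and not part of the original pipeline.

```python
from collections import Counter

def consensus_label(z_b1, z_b2, z_b3):
    """Consensus rule of Eqs. (1)-(2): keep the juniors' label when the two
    junior radiologists (B1, B2) agree; otherwise apply majority voting MV
    over all three radiologists, so the senior (B3) breaks the tie."""
    if z_b1 == z_b2:
        return z_b1
    votes = Counter([z_b1, z_b2, z_b3])
    return votes.most_common(1)[0][0]

# Example: the juniors disagree, the senior sides with B2, so the label is "CAP".
print(consensus_label("COVID-19", "CAP", "CAP"))
```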

2.3. Preprocessing

Preprocessing has already shown its success in medical image analysis [25, 26]. The original dataset contained $|X_0|$ slice images $\{x_0(i), i=1,2,\ldots,|X_0|\}$. The size of each image was $\mathrm{size}[x_0(i)]=W_0\times H_0\times C_0$. Fig. 1 shows the pipeline for preprocessing of our dataset.

Fig. 1. Illustration of preprocessing. (CAP: community-acquired pneumonia; SPT: secondary pulmonary tuberculosis; HC: healthy control; CCT: chest CT; HS: histogram stretching).

First, the color CCT images from the four classes were converted into grayscale by retaining the luminance channel, yielding the grayscale data set $X_1$:

$X_1=F_G(X_0)=\{x_1(1),x_1(2),\ldots,x_1(i),\ldots,x_1(|X|)\}$  (3)

where $F_G$ denotes the grayscale operation. Note that $\mathrm{size}[x_1(i)]=W_1\times H_1\times C_1$.

Second, histogram stretching (HS) [27, 28] was utilized to increase the contrast of all images. Taking the $i$-th image $x_1(i), i=1,2,\ldots,|X|$ as an example, its minimum and maximum grayscale values $a_l(i)$ and $a_h(i)$ were calculated as:

$a_l(i)=\min_{w=1}^{W_1}\min_{h=1}^{H_1}\min_{c=1}^{C_1} x_1(i\,|\,w,h,c), \quad a_h(i)=\max_{w=1}^{W_1}\max_{h=1}^{H_1}\max_{c=1}^{C_1} x_1(i\,|\,w,h,c)$  (4)

where $(w, h, c)$ indexes the width, height, and channel directions of image $x_1(i)$, respectively. The new histogram-stretched image set $X_2$ was calculated as:

$X_2=HS(X_1)=\left\{x_2(i)\overset{\text{def}}{=}\frac{x_1(i)-a_l(i)}{a_h(i)-a_l(i)}\right\}$  (5)

Third, cropping was carried out to remove the checkup bed at the bottom and the texts at the margins. The cropped dataset $X_3$ is obtained as

$X_3=F_C(X_2,[c_t,c_b,c_l,c_r])=\{x_3(1),x_3(2),\ldots,x_3(i),\ldots,x_3(|X|)\}$  (6)

where $F_C$ represents the crop operation. The parameters $(c_t,c_b,c_l,c_r)$ denote the number of pixels to be cropped from the top, bottom, left, and right, respectively. After this step, the resolution of each image is $\mathrm{size}[x_3(i)]=W_3\times H_3\times C_3$.

Fourth, we down-sampled each image to a size of $[W_4,H_4]$, obtaining the resized image set $X_4$ as

$X_4=F_D(X_3,[W_4,H_4])=\{x_4(1),x_4(2),\ldots,x_4(i),\ldots,x_4(|X|)\}$  (7)

where $F_D: a\to b$ represents the downsampling (DS) function, in which $b$ is a down-sampled image of the raw image $a$.

After the preprocessing procedure, each image occupied approximately 1.64% (explained below) of its original storage or size. The compression ratio (CR) of the $i$-th image at the final stage $X_4$ relative to the raw stage $X_0$ was measured by two variables: the storage CR ($\delta_1$) and the size CR ($\delta_2$)

$\delta_1=\frac{\mathrm{storage}[x_4(i)]}{\mathrm{storage}[x_0(i)]}, \quad \delta_2=\frac{\mathrm{size}[x_4(i)]}{\mathrm{size}[x_0(i)]}$  (8)

We have $\delta_1=206{,}116/12{,}582{,}912$ and $\delta_2=51{,}529/3{,}145{,}728$, hence $\delta_1(i)=\delta_2(i)=1.64\%,\ i=1,2,\ldots,|X|$, which demonstrates the value of preprocessing. Furthermore, Fig. 2 displays four samples from the preprocessed set $X_4$. The top row presents the preprocessed images, and the bottom row the delineated results in red curves. Overall, the advantages of preprocessing are three-fold: (i) compression minimizes the storage size; (ii) histogram stretching normalizes the contrast of all samples; (iii) cropping removes irrelevant contents from CCT images, so the AI model will focus on the lung region. Table 2 compares the storage and size of every image $x_s(i), s=0,\ldots,4, i=1,\ldots,|X|$ at each preprocessing step.
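For illustration, a minimal NumPy sketch of the $X_0\to X_4$ pipeline follows; the luminance weights, the nearest-neighbour resampling, and the function name are assumptions, since the paper does not specify the exact implementations.

```python
import numpy as np

def preprocess(img_rgb, crop=150, out_size=(227, 227)):
    """Minimal sketch of the X0 -> X4 pipeline (Eqs. (3)-(7)):
    grayscale -> histogram stretching -> cropping -> downsampling."""
    # Eq. (3): grayscale via the luminance channel (ITU-R BT.601 weights assumed)
    x1 = img_rgb[..., 0] * 0.299 + img_rgb[..., 1] * 0.587 + img_rgb[..., 2] * 0.114
    # Eqs. (4)-(5): histogram stretching to [0, 1]
    a_l, a_h = x1.min(), x1.max()
    x2 = (x1 - a_l) / (a_h - a_l + 1e-12)
    # Eq. (6): crop the same margin from top, bottom, left, and right
    x3 = x2[crop:-crop, crop:-crop]
    # Eq. (7): nearest-neighbour downsampling (a placeholder for the
    # unspecified resampling method of the paper)
    rows = np.linspace(0, x3.shape[0] - 1, out_size[0]).astype(int)
    cols = np.linspace(0, x3.shape[1] - 1, out_size[1]).astype(int)
    return x3[np.ix_(rows, cols)]

x4 = preprocess(np.random.rand(1024, 1024, 3))
print(x4.shape)  # (227, 227), matching the DS row of Table 2
```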

Fig. 2. Samples of X4. (CAP: community-acquired pneumonia; SPT: secondary pulmonary tuberculosis; HC: healthy control).

Table 2.

Storage and size per preprocessing step.

Preprocessing Step Variable W H C Storage* Size*
Raw x0(i) 1024 1024 3 12,582,912 3,145,728
Grayscale x1(i) 1024 1024 1 4,194,304 1,048,576
HS x2(i) 1024 1024 1 4,194,304 1,048,576
Crop x3(i) 724 724 1 2,096,704 524,176
DS x4(i) 227 227 1 206,116 51,529

* Storage and size are measured per image.

3. Methodology

The motivation of our algorithm was to use pretrained models to generate features from CCT images, and to fuse those features using the discriminant correlation analysis (DCA) method. Section 3.1 introduces transfer learning. Section 3.2 briefly reviews several state-of-the-art pretrained models and proposes a novel (L, 2) transfer feature learning (L2TFL) algorithm, answering the question of how to extract features using pretrained networks. Section 3.3 determines how to choose the optimal two pretrained models, and proposes a novel selection algorithm of pretrained networks for fusion (SAPNF). Section 3.4 details how to fuse the features, and introduces the DCA technique. Section 3.5 presents a novel data augmentation method to further improve the performance. Section 3.6 presents the experimental setup and measures. Section 3.7 summarizes and gives the pseudocode of the proposed algorithms.

3.1. Transfer learning

The basic idea of transfer learning (TL) is to utilize a complicated and successfully pre-trained model (PTM) [29], trained on a sizable amount of source data (viz., the 1000 categories of ImageNet), and then "transfer" the learnt knowledge [30] to a relatively simple task (the 4 categories of COVID-19, CAP, SPT, and HC in this study) with a small quantity of data.

Mathematically, suppose the source data is $X_S$ (representing ImageNet), the source label set is $L_S$ (the 1000-category labeling), and $O_S$ is the source objective-predictive function (i.e., the classifier); then the source domain knowledge $S$ is the triple

$S=\{X_S, L_S, O_S\}$  (9)

Similarly, we have the target triple: the target data $X_T$ represents the training set, $L_T$ represents the 4-class labeling (COVID-19, CAP, SPT, or HC), and $O_T$ represents the classifier to be established.

$T=\{X_T, L_T, O_T\}$  (10)

Using TL, the classifier to be created can be written as $O_T(X_T,L_T|S)$. Without using transfer learning, the classifier is written as $O_T(X_T,L_T)$.

$O_T=\begin{cases} O_T(X_T,L_T|S)=O_T(X_T,L_T|X_S,L_S,O_S), & \text{using TL}\\ O_T(X_T,L_T), & \text{not using TL}\end{cases}$  (11)

We can then expect $O_T(X_T,L_T|S)$ to be much closer to the ideal classifier $O_T^{Ideal}$ than the classifier using only the target domain, $O_T(X_T,L_T)$; viz., supposing we have a large number of samples $X$ and their labels $L$,

$\mathrm{err}[O_T(X_T,L_T|S)(X),\ L] < \mathrm{err}[O_T(X_T,L_T)(X),\ L]$  (12)

where $\mathrm{err}(a,b)$ is an error function measuring the disagreement between its two inputs $a$ and $b$.

In practice, three elements are vital in helping transfer learning outperform building and training a network [31] from scratch: (i) a successful PTM can save the user much hyper-parameter tuning; (ii) the initial layers in a PTM can be regarded as feature descriptors, which extract low-level features, e.g., tints, edges, blobs, shades, and textures; (iii) the target model may only need to re-train the last several layers of the pre-trained model, since the last several layers carry out the complex identification tasks. The basic idea of transfer learning is shown in Fig. 3.

Fig. 3. Idea of transfer learning. (PTM: pretrained model; CAP: community-acquired pneumonia; SPT: secondary pulmonary tuberculosis; HC: healthy control).

3.2. Novelty 1: (L, 2) transfer feature learning

As shown in Table 3, $N_{PTM}$ pretrained models were tested in this study: AlexNet, DenseNet201, ResNet50, ResNet101, VGG16, and VGG19. Traditional transfer learning usually modifies the neuron number of the last fully connected layer. The user may then either retrain the whole network (the weights of the retained layers may be initialized from the pretrained model or re-initialized) or retrain only the modified layer.

Table 3.

Candidate pretrained models.

PTM PTM Symbol Parameters (millions) Input Size
AlexNet MPTM(1) 61.0 227×227
DenseNet201 MPTM(2) 20.0 224×224
ResNet50 MPTM(3) 25.6 224×224
ResNet101 MPTM(4) 44.6 224×224
VGG16 MPTM(5) 138 224×224
VGG19 MPTM(6) 144 224×224

In this study, we proposed a new (L, 2) transfer feature learning algorithm (abbreviated as L2TFL). The motivation for L2TFL is two-fold: (i) we make L, the number of layers to be removed (NLR), adaptive, and the value of L was optimized to improve performance; (ii) we chose to add two new fully connected layers due to the arbitrary-width case of the universal approximation theorem.

For ease of understanding, the pseudocode of the proposed L2TFL algorithm is presented in Algorithm 1, where L is a parameter whose value was optimized.

  • Step 1. Read a PTM network from Table 3 and store it in variable $M_0$; suppose its number of learnable layers is $L_0$.

  • Step 2. Remove the last L learnable layers (the NLR) from $M_0$ to get $M_1$,
    $M_1=F_{rl}(M_0, L)$  (13)

where $F_{rl}$ denotes the remove-layer function, and the parameter L is the number of last layers to be removed. If there are shortcut connections whose outputs lie within the last L learnable layers, those shortcuts must be removed.

Step 3. Add 2 new fully connected layers

M2=Fafcl(M1,2) (14)

where $F_{afcl}$ denotes the add-fully-connected-layer function, and the constant 2 is the number of fully-connected layers to be appended to $M_1$. Here the number of learnable layers $L_2$ of network $M_2$ can be calculated as $L_2=L_0-L+2$. The first FCL has $N_{FCL}(1)$ nodes, and the second FCL has $N_{FCL}(2)$ nodes.

Step 4. Set the learning rate [32] of all the transferred layers to zero, in order to freeze those layers

$\mathrm{lr}[M_2(1:L_0-L)] \leftarrow 0$  (15)

where $\mathrm{lr}$ denotes the learning rate, and $M(a:b)$ denotes the layers from $a$ to $b$ in network $M$; in total $b-a+1$ layers are considered in $M(a:b)$.

Step 5. Make the last two newly added fully connected layers retrainable, i.e., set their learning rate to 1

$\mathrm{lr}[M_2(L_2-1:L_2)] \leftarrow 1$  (16)

Step 6. Retrain the whole network M2 using our four-class data and get the trained network M3.

M3=Frt(M2,X) (17)

where X is some dataset, and Frt means the retrain function.

Step 7. Use $M_3$ to generate the learnt features

$f_M(L)=F_{ac}(M_3, L_2-1)$  (18)

where $F_{ac}$ is the activation-extraction function: $F_{ac}(a,b)$ extracts the activations of network $a$ at the $b$-th layer, and $f_M(L)$ denotes the features learnt from network $M$ after removing L learnable layers.
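To make Steps 2–7 concrete, the following PyTorch sketch (the authors' framework is not specified, so this is illustrative only) removes the final classifier of ResNet-18 (i.e., L = 1 for brevity rather than the L = 2 of Fig. 4), appends two new FCLs with $N_{FCL}(1)=512$ and $N_{FCL}(2)=4$ nodes, freezes the transferred layers, and reads features from the first new FCL as in Eq. (18).

```python
import torch
import torch.nn as nn
from torchvision import models

def build_l2tfl(n_fcl1=512, n_classes=4):
    """L2TFL-style construction on ResNet-18 with L = 1 for brevity:
    drop the final classifier, append two new FCLs, freeze the
    transferred layers (Eq. (15)), and keep the new head trainable."""
    backbone = models.resnet18()               # randomly initialized here; pretrained ImageNet weights would be loaded in practice
    feat_dim = backbone.fc.in_features         # 512 for ResNet-18
    backbone.fc = nn.Identity()                # remove the last learnable layer (Eq. (13))
    for p in backbone.parameters():            # freeze all transferred layers
        p.requires_grad = False
    head = nn.Sequential(                      # two newly added FCLs (Eq. (14))
        nn.Linear(feat_dim, n_fcl1),
        nn.ReLU(inplace=True),
        nn.Linear(n_fcl1, n_classes),
    )
    return backbone, head

backbone, head = build_l2tfl()
x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    features = head[0](backbone(x))            # activations at the first new FCL (Eq. (18))
print(features.shape)                          # torch.Size([2, 512])
```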

Take ResNet18 as an example, Fig. 4 shows the diagram of our L2TFL algorithm, where L=2. Fig. 4(a) shows the part of ResNet18 with the last two learnable layers. Fig. 4(b) shows the structure of using our L2TFL, by which the last two learnable layers of ResNet18 were replaced by two newly added FCL layers with number of nodes of NFCL(1) and NFCL(2), respectively.

Fig. 4. A simplistic example of the L2TFL algorithm for ResNet18 (here NLR L = 2). (ReLU: rectified linear unit; FCL: fully-connected layer; L2TFL: (L, 2) transfer feature learning; NLR: number of layers to be removed).

To find the optimal value of the NLR L, we searched the range $[1, L_{max}]$ for each PTM, where $L_{max}$ is the maximum removable layer (MRL). Note that SqueezeNet [33] and GoogleNet [34] were not considered since their structures contain parallel branches and are not appropriate for our L2TFL algorithm.

3.3. Novelty 2: selection algorithm of pretrained networks for fusion

Previously, we discussed how to extract features from PTMs. The next question is how to select the two pretrained models. The naive idea is a greedy selection algorithm for fusion (GSAF): select the two individually best pretrained models, extract their features, and fuse those two features.

Suppose a dataset Y is split into a training set $Y_1$, a validation set $Y_2$, and a test set $Y_3$, such that $|Y_1|+|Y_2|+|Y_3|=|Y|$. GSAF uses $Y_2$ to create a performance rank list R, chooses the best two PTMs from that list, and fuses their corresponding features.

The procedure of GSAF is briefly described as: For a given k-th PTM M0(k), we use L2TFL via data Y1 and removing L layers to obtain M3(k,L)

M3(k,L)=FL2TFL[M0(k),L,Y1] (19)

where $F_{L2TFL}$ is our proposed L2TFL operation. Subsequently, an empty one-hidden-layer neural network (OHNN) [35] B was created for validation. The initial and trained networks are denoted $B_i$ and $B_t$, respectively. The input of B is $f_M(k,L)$, and the number of hidden neurons is denoted $N_{HL}$. The performance indicator I was calculated by comparing the output of $B_t$ over the validation set $Y_2$, viz.,

O2=Bt(Y2) (20)

with its ground truth labels Z(Y2), so I is calculated as

I=FMI[O2,Z(Y2)] (21)

where $F_{MI}$ is the measuring-indicator function; it can be accuracy, sensitivity, specificity, or any other measuring indicator. The indicator I is gathered over all possible hyperparameter combinations, giving the indicator vector

$I(k,L)\overset{\text{def}}{=}\left(I(k,L)\ \middle|\ k=1,\ldots,N_{PTM};\ L=1,\ldots,L_{max}\right)$  (22)

The indicator vector I(k,L) is used to compare all the NPTM possible models and all Lmax possible removable layers, and we obtain the rank list R by

RGSAF=FSD(I(k,L)) (23)

where $F_{SD}$ is the sort function in descending order, and $I(k,L)$ is the indicator obtained using features learnt from the $k$-th PTM after removing L learnable layers. Then $R_{GSAF}(1)$ and $R_{GSAF}(2)$ denote the indices of the top two models found by the GSAF method, as shown in Algorithm 2.

Nevertheless, this greedy selection algorithm cannot ensure that the fused feature achieves the best performance. For example, if the two best models both focus on the same region, their fusion does not improve the performance.

Hence, we proposed a novel selection algorithm of pretrained networks for fusion (SAPNF) to help choose the two pretrained models that specifically improve the performance of the fused features. The difference between SAPNF and GSAF is that the former searches a larger space covering both PTM candidates to be fused jointly, while the latter searches a smaller space containing only one PTM candidate at a time. The pseudocode of SAPNF is presented in Algorithm 3.

Mathematically, in SAPNF we retrained two models (with the PTM and NLR as hyperparameters). Hence, Eq. (19) was updated as

$M_3(k_1,L_{M1})=F_{L2TFL}[M_0(k_1),L_{M1},Y_1], \quad M_3(k_2,L_{M2})=F_{L2TFL}[M_0(k_2),L_{M2},Y_1]$  (24)

where the subscripts 1 and 2 in $(k_1,k_2)$ and $(L_{M1},L_{M2})$ index the candidate models. Note we should guarantee

$k_1\neq k_2\ \lor\ L_{M1}\neq L_{M2}$  (25)

which ensures that the two candidate models are not identical.

Now we can generate two features fM(k1,LM1) and fM(k2,LM2) from two different models M3(k1,LM1) and M3(k2,LM2), respectively. We used some fusion operation to generate a fused feature fF as

fF(k1,LM1,k2,LM2)=FDF[fM(k1,LM1),fM(k2,LM2)] (26)

where $F_{DF}$ is the deep fusion function, which will be discussed in the next section.

The indicator vector I is updated as

$I(k_1,L_{M1},k_2,L_{M2})\overset{\text{def}}{=}\left(I(k_1,L_{M1},k_2,L_{M2})\ \middle|\ k_1,k_2=1,\ldots,N_{PTM};\ L_{M1},L_{M2}=1,\ldots,L_{max};\ k_1\neq k_2\ \lor\ L_{M1}\neq L_{M2}\right)$  (27)

Similarly, we obtain the rank list RSAPNF by

RSAPNF=FSD(I(k1,LM1,k2,LM2)) (28)
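The search itself reduces to ranking candidate pairs, as in the following Python sketch; `candidates`, `fuse`, and `evaluate` are hypothetical placeholders for the L2TFL features, $F_{DF}$, and the measuring indicator described above.

```python
from itertools import product

def sapnf_search(candidates, fuse, evaluate):
    """Sketch of the SAPNF search (Eqs. (24)-(28)). `candidates` maps a key
    (PTM index k, NLR L) to the feature matrix learnt by L2TFL on Y1;
    `fuse` stands for F_DF (e.g., DCA fusion) and `evaluate` returns the
    measuring indicator I (here MA F1) on the validation set Y2. All three
    arguments are placeholders for the components described in the paper."""
    scores = {}
    keys = sorted(candidates)
    for m1, m2 in product(keys, keys):
        if m1 >= m2:            # enforce (k1, L_M1) != (k2, L_M2) and skip duplicate pairs
            continue
        scores[(m1, m2)] = evaluate(fuse(candidates[m1], candidates[m2]))
    # Rank list R_SAPNF: candidate pairs sorted by the indicator, descending (Eq. (28))
    return sorted(scores, key=scores.get, reverse=True)

# GSAF, by contrast, would rank single candidates by evaluate(candidates[m])
# and fuse the top two without re-evaluating the fused feature.
```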

3.4. Novelty 3: deep CCT fusion by discriminant correlation analysis

Feature-level fusion (FLF) aims to combine discriminative multiple features, while decision-level fusion (DLF) combines multiple decision answers. Commonly, DLF is simpler than FLF, but FLF outperforms DLF [36, 37]. In this study we chose feature-level fusion. In our future research, we will also consider some advanced fusion rules, such as score-level fusion, DLF, and hybrid fusion methods [38].

We have discussed how to carry out transfer feature learning and how to select pretrained models. Now we need to answer the question of how to fuse the extracted features. There are two commonly used FLF methods. In our case there are two $N_{FCL}(1)$-dimensional features from two PTMs; the features were generated by our L2TFL method, and the PTMs were selected by the SAPNF method.

Assume the two features are symbolized as fM1 with length q1 and fM2 with length q2, the fused feature is symbolized as fF. Serial fusion (SF) [39] concatenates the two features into one single feature

$f_F^{\,q_1+q_2}=F_{SF}\left(f_{M1}^{\,q_1},\ f_{M2}^{\,q_2}\right)$  (29)

where FSF represents the SF operation. The length of fF equals |fM1|+|fM2|=q1+q2.

Parallel fusion (PF) [40] combines fM1 and fM2 into one complex vector

fF=FPF(fM1,fM2)=fM1+i×fM2 (30)

where FPF represents the PF operation, and i the imaginary unit.
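A minimal NumPy sketch of these two classical FLF rules is shown below; the zero-padding used for unequal lengths in parallel fusion is an assumption, as the paper does not discuss that case.

```python
import numpy as np

def serial_fusion(f1, f2):
    """Serial fusion (Eq. (29)): concatenate the two feature vectors."""
    return np.concatenate([f1, f2])

def parallel_fusion(f1, f2):
    """Parallel fusion (Eq. (30)): combine the vectors into one complex vector.
    The shorter vector is zero-padded so the lengths match; the paper does not
    state how unequal lengths are handled, so this is an assumption."""
    q = max(len(f1), len(f2))
    return np.pad(f1, (0, q - len(f1))) + 1j * np.pad(f2, (0, q - len(f2)))

f1, f2 = np.random.rand(512), np.random.rand(512)
print(serial_fusion(f1, f2).shape, parallel_fusion(f1, f2).dtype)  # (1024,) complex128
```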

Sun, et al. [41] proposed canonical correlation analysis (CCA), which finds the optimal linear combinations of $f_{M1}$ and $f_{M2}$ that have maximum correlation with each other. Suppose $f_{M1}\in\mathbb{R}^{q_1\times N_{TF}}$ and $f_{M2}\in\mathbb{R}^{q_2\times N_{TF}}$, where $N_{TF}$ is the number of training samples. First, we can define two covariance matrices $S(M_1,M_1)$ and $S(M_2,M_2)$ as

$S(M_1,M_1)=F_{CCOV}(f_{M1},f_{M1})\in\mathbb{R}^{q_1\times q_1}, \quad S(M_2,M_2)=F_{CCOV}(f_{M2},f_{M2})\in\mathbb{R}^{q_2\times q_2}$  (31)

where FCCOV is the cross-covariance operation. Also, we can define the covariance matrix S(M1,M2) as

S(M1,M2)=FCCOV(fM1,fM2) (32)

We have S(M1,M2)=S(M2,M1)T.

The overall covariance matrix can be computed as

$S=\begin{bmatrix} S(M_1,M_1) & S(M_1,M_2)\\ S(M_2,M_1) & S(M_2,M_2)\end{bmatrix}\in\mathbb{R}^{(q_1+q_2)\times(q_1+q_2)}$  (33)

The aim of CCA is to seek the best linear projection

$\overline{f_{M1}}=W_{CCA,M1}^{T}f_{M1}, \quad \overline{f_{M2}}=W_{CCA,M2}^{T}f_{M2}$  (34)

where WCCA,M1 and WCCA,M2 are transformation matrices of CCA. The aim is to find the optimal (WCCA,M1,WCCA,M2) that maximizes the pair-wise correlation FPWC over the two feature sets:

(WCCA,M1,WCCA,M2)=argmaxWM1,WM2[FPWC(fM1¯,fM2¯)] (35)

where $F_{PWC}$ denotes the pair-wise correlation, defined as

$F_{PWC}(\overline{f_{M1}},\overline{f_{M2}})=\dfrac{W_{CCA,M1}^{T}\,S(M_1,M_2)\,W_{CCA,M2}}{\sqrt{W_{CCA,M1}^{T}\,S(M_1,M_1)\,W_{CCA,M1}}\ \times\ \sqrt{W_{CCA,M2}^{T}\,S(M_2,M_2)\,W_{CCA,M2}}}$  (36)

The detailed derivation and solution can be found in [41]. For the optimal weights (WCCA,M1,WCCA,M2), we have fM1¯=WCCA,M1TfM1, and fM2¯=WCCA,M2TfM2. Hence, the combination of the transformed features is carried out by either concatenation or summation as:

$f_{CCCA}=\begin{pmatrix}\overline{f_{M1}}\\ \overline{f_{M2}}\end{pmatrix}=\begin{pmatrix}W_{CCA,M1} & 0\\ 0 & W_{CCA,M2}\end{pmatrix}^{T}\begin{pmatrix}f_{M1}\\ f_{M2}\end{pmatrix}, \quad f_{SCCA}=\overline{f_{M1}}+\overline{f_{M2}}=\begin{pmatrix}W_{CCA,M1}\\ W_{CCA,M2}\end{pmatrix}^{T}\begin{pmatrix}f_{M1}\\ f_{M2}\end{pmatrix}$  (37)

where fCCCA and fSCCA represent the concatenation and summation of CCA features, respectively.

CCA has two issues: (i) in many real-world scenarios the number of samples is less than the number of features, i.e., $N_{TF}<q_1$ and $N_{TF}<q_2$, which makes the covariance matrices singular (non-invertible); (ii) CCA neglects the class-structure information. To solve these two issues, Haghighat, et al. [42] presented a discriminant correlation analysis (DCA) approach. DCA has been shown to offer improved performance over recent fusion approaches.

In this study, we used DCA to fuse the features from CCT images, and we named the procedure deep CCT fusion by discriminant correlation analysis (DCFDCA). Similar to CCA, suppose $f_{M1}\in\mathbb{R}^{q_1\times N_{TF}}$, where $N_{TF}$ is the number of training samples. The $N_{TF}$ columns of the data matrix can be segmented into C classes; supposing $N_i$ columns belong to the $i$-th class, we have

$N_{TF}=\sum_{i=1}^{C}N_i$  (38)

Let $f_{M1}^{ij}\in f_{M1}$ denote the feature extracted from the $j$-th image of the $i$-th category via model $M_1$, and let $\overline{f_{M1}^{i}}$ and $\overline{f_{M1}}$ denote the mean of $f_{M1}^{ij}$ over the $i$-th class and over the whole set, respectively. We have

$\overline{f_{M1}^{i}}=\frac{1}{N_i}\sum_{j=1}^{N_i}f_{M1}^{ij}, \quad \overline{f_{M1}}=\frac{1}{N_{TF}}\sum_{i=1}^{C}N_i\,\overline{f_{M1}^{i}}$  (39)

Thus, the between-class scatter (BCS) matrix $S_{BCS,M1}\in\mathbb{R}^{q_1\times q_1}$ is defined as

$S_{BCS,M1}=\sum_{i=1}^{C}N_i\left(\overline{f_{M1}^{i}}-\overline{f_{M1}}\right)\left(\overline{f_{M1}^{i}}-\overline{f_{M1}}\right)^{T}\overset{\text{def}}{=}\Phi_{BCS,M1}\Phi_{BCS,M1}^{T}$  (40)

where $\Phi_{BCS,M1}\in\mathbb{R}^{q_1\times C}$ is defined as

$\Phi_{BCS,M1}=\left[\sqrt{N_1}\left(\overline{f_{M1}^{1}}-\overline{f_{M1}}\right),\ \sqrt{N_2}\left(\overline{f_{M1}^{2}}-\overline{f_{M1}}\right),\ \ldots,\ \sqrt{N_C}\left(\overline{f_{M1}^{C}}-\overline{f_{M1}}\right)\right]$  (41)

Note that the number of features is greater than the number of classes in this study, i.e., $q_1\gg C$, so the method of [43] is used to work with the $C\times C$ matrix $\Phi_{BCS,M1}^{T}\Phi_{BCS,M1}\in\mathbb{R}^{C\times C}$. The most significant eigenvectors of $\Phi_{BCS,M1}\Phi_{BCS,M1}^{T}$ can be obtained economically by mapping the eigenvectors of $\Phi_{BCS,M1}^{T}\Phi_{BCS,M1}$; hence it is only necessary to acquire the eigenvectors of this $C\times C$ matrix. Assuming the classes are well-separated, $\Phi_{BCS,M1}^{T}\Phi_{BCS,M1}$ can be diagonalized as

$P_{OE}^{T}\left(\Phi_{BCS,M1}^{T}\Phi_{BCS,M1}\right)P_{OE}=\hat{\Lambda}$  (42)

where $P_{OE}$ denotes the matrix of orthogonal eigenvectors, and $\hat{\Lambda}$ the diagonal matrix of real, non-negative eigenvalues in decreasing order.

Assume $Q_{OE}\in\mathbb{R}^{C\times r}$ consists of the first r eigenvectors of $P_{OE}$, so that $Q_{OE}$ corresponds to the r largest non-zero eigenvalues in $\hat{\Lambda}$. We can deduce the following equation:

$Q_{OE}^{T}\left(\Phi_{BCS,M1}^{T}\Phi_{BCS,M1}\right)Q_{OE}=\Lambda\in\mathbb{R}^{r\times r}$  (43)

Therefore, the r most significant eigenvectors of $S_{BCS,M1}$ are acquired via the mapping $\Phi_{BCS,M1}Q_{OE}$ as

$\left(\Phi_{BCS,M1}Q_{OE}\right)^{T}S_{BCS,M1}\left(\Phi_{BCS,M1}Q_{OE}\right)=\Lambda$  (44)

Let $W_{BCS,M1}=\Phi_{BCS,M1}Q_{OE}\Lambda^{-1/2}$ be the transformation that unitizes $S_{BCS,M1}$ and reduces the data dimensionality from $q_1$ to r; we then have

$W_{BCS,M1}^{T}S_{BCS,M1}W_{BCS,M1}=I$  (45)

and

$\underset{r\times N_{TF}}{f'_{M1}}=\underset{r\times q_1}{W_{BCS,M1}^{T}}\times\underset{q_1\times N_{TF}}{f_{M1}}$  (46)

where $f'_{M1}$ denotes the projection of $f_{M1}$ into a temporary space in which the BCS matrix of the first feature set to be fused is I and the classes are separated. Notice that

$r\leq\min\left[F_{rank}(f_{M1}),\ F_{rank}(f_{M2}),\ C-1\right]$  (47)

where $F_{rank}$ is the rank function.

where Frank is the rank function.

Similarly, for the second feature set $f_{M2}$, we can find a transformation matrix $W_{BCS,M2}$ that unitizes the BCS matrix $S_{BCS,M2}$ of the second feature set to be fused and reduces the dimensionality of $f_{M2}$ from $q_2$ to r:

$W_{BCS,M2}^{T}S_{BCS,M2}W_{BCS,M2}=I$  (48)
$\underset{r\times N_{TF}}{f'_{M2}}=\underset{r\times q_2}{W_{BCS,M2}^{T}}\times\underset{q_2\times N_{TF}}{f_{M2}}$  (49)

The updated $\Phi'_{BCS,M1}$ and $\Phi'_{BCS,M2}$ are now non-square $r\times C$ orthonormal matrices. Note that $S'_{BCS,M1}=S'_{BCS,M2}=I$; nevertheless, the matrices $(\Phi'_{BCS,M1})^{T}\Phi'_{BCS,M1}$ and $(\Phi'_{BCS,M2})^{T}\Phi'_{BCS,M2}$ are strictly diagonally dominant matrices (DDMs), namely, if $b_{ij}$ denotes the entry of a DDM, then $|b_{ii}|>\sum_{j\neq i}|b_{ij}|$ for every i. In our study, the diagonal entries are near 1 and the off-diagonal entries are near zero.

So far, we have transformed $f_{M1}\to f'_{M1}$ and $f_{M2}\to f'_{M2}$, i.e., we have finished the unitization of the BCS matrices. The next step is to transform the features in one set to have nonzero correlation with their cognate features in the other set.

Mathematically, the between-set covariance (BSC) matrix $S'_{M1,M2}=f'_{M1}(f'_{M2})^{T}\in\mathbb{R}^{r\times r}$ of the transformed feature sets needs to be diagonalized. The singular value decomposition (SVD) is utilized at this step:

$S'_{M1,M2}=U\Sigma V^{T}\ \Rightarrow\ U^{T}S'_{M1,M2}V=\Sigma$  (50)

Remember that $f'_{M1}$ and $f'_{M2}$ are of rank r, so $S'_{M1,M2}$ is non-degenerate; we can deduce that $\Sigma$ is a diagonal matrix whose main diagonal elements are non-zero. Let

$W_{BSC,M1}\overset{\text{def}}{=}U\Sigma^{-1/2}, \quad W_{BSC,M2}\overset{\text{def}}{=}V\Sigma^{-1/2}$  (51)

We then have

$\left(U\Sigma^{-1/2}\right)^{T}S'_{M1,M2}\left(V\Sigma^{-1/2}\right)=I$  (52)

which unitizes the BSC matrix $S'_{M1,M2}$. Finally, the DCA-transformed features can be written as

$g_{M1}=W_{BSC,M1}^{T}f'_{M1}=W_{BSC,M1}^{T}W_{BCS,M1}^{T}f_{M1}\overset{\text{def}}{=}W_{DCA,M1}f_{M1}, \quad g_{M2}=W_{BSC,M2}^{T}f'_{M2}=W_{BSC,M2}^{T}W_{BCS,M2}^{T}f_{M2}\overset{\text{def}}{=}W_{DCA,M2}f_{M2}$  (53)

where $W_{DCA,M1}\overset{\text{def}}{=}W_{BSC,M1}^{T}W_{BCS,M1}^{T}$ and $W_{DCA,M2}\overset{\text{def}}{=}W_{BSC,M2}^{T}W_{BCS,M2}^{T}$ are the final DCA transformation matrices for $f_{M1}$ and $f_{M2}$, respectively. As with CCA, the combination of the transformed DCA features is done by either concatenation or summation:

$g_{CDCA}=\begin{pmatrix}g_{M1}\\ g_{M2}\end{pmatrix}=\begin{pmatrix}W_{DCA,M1} & 0\\ 0 & W_{DCA,M2}\end{pmatrix}\begin{pmatrix}f_{M1}\\ f_{M2}\end{pmatrix}, \quad g_{SDCA}=g_{M1}+g_{M2}=\begin{pmatrix}W_{DCA,M1} & W_{DCA,M2}\end{pmatrix}\begin{pmatrix}f_{M1}\\ f_{M2}\end{pmatrix}$  (54)

where $g_{CDCA}$ and $g_{SDCA}$ represent the concatenation and summation of the DCA features, respectively. In this study, $g_{SDCA}$ was chosen, since (i) the summed features have a lower dimensionality, and (ii) summation and concatenation were reported to give similar results in [42]. In addition, feature fusion helps improve the performance compared to using a single PTM model (see Sections 4.3 and 4.4).
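The following NumPy sketch outlines the DCFDCA computation of Eqs. (38)–(54) in simplified form: it fixes r = C − 1, assumes well-conditioned scatter and covariance matrices, and omits edge cases, so it should be read as an illustration rather than the authors' implementation.

```python
import numpy as np

def dca_fuse(f1, f2, labels, mode="sum"):
    """Simplified sketch of DCFDCA (Eqs. (38)-(54)): unitize the between-class
    scatter of each feature set, diagonalize the between-set covariance by SVD,
    then sum (or concatenate) the transformed features. f1: q1 x N, f2: q2 x N,
    labels: length-N class indices. Assumes well-conditioned matrices and r = C - 1."""
    def bcs_transform(f, labels, r):
        classes = np.unique(labels)
        mu = f.mean(axis=1)
        # Phi_BCS columns: sqrt(N_i) * (class mean - overall mean), Eq. (41)
        phi = np.stack([np.sqrt((labels == c).sum()) * (f[:, labels == c].mean(axis=1) - mu)
                        for c in classes], axis=1)
        vals, vecs = np.linalg.eigh(phi.T @ phi)          # small C x C problem, Eqs. (42)-(43)
        order = np.argsort(vals)[::-1][:r]
        q_oe, lam = vecs[:, order], vals[order]
        w_bcs = phi @ q_oe @ np.diag(lam ** -0.5)         # W_BCS, Eq. (45)
        return w_bcs.T @ f                                # f', Eq. (46)

    C = len(np.unique(labels))
    r = C - 1                                             # Eq. (47), simplified
    f1p, f2p = bcs_transform(f1, labels, r), bcs_transform(f2, labels, r)
    u, s, vt = np.linalg.svd(f1p @ f2p.T)                 # between-set covariance, Eq. (50)
    g1 = (u @ np.diag(s ** -0.5)).T @ f1p                 # W_BSC,M1^T f1', Eq. (53)
    g2 = (vt.T @ np.diag(s ** -0.5)).T @ f2p              # W_BSC,M2^T f2', Eq. (53)
    return g1 + g2 if mode == "sum" else np.vstack([g1, g2])   # Eq. (54)

# Toy usage: 512-D features from two PTMs for 40 samples in 4 classes.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 10)
fused = dca_fuse(rng.normal(size=(512, 40)), rng.normal(size=(512, 40)), labels)
print(fused.shape)  # (3, 40): r = C - 1 rows, one column per sample
```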

3.5. Data augmentation

Multiple-way data augmentation (MDA) [44] was used in this study. The difference between MDA and conventional DA is that MDA utilizes a large number of different data augmentation methods. There are two types [45] of MDA, offline and online. Offline means editing and storing the data on disk, while online means on-the-fly augmentation. In this study, we chose offline multiple-way data augmentation, as shown in Fig. 6. Online data augmentation is mainly applied when the dataset is large: the transformations happen in mini-batches and the transformed data is fed into the model to improve its generalization. However, we have a small dataset in this study; therefore, we chose offline data augmentation as a preprocessing step to expand the dataset.

Fig. 6. Diagram of the proposed offline MDA technology. (DA: data augmentation; MDA: multiple-way DA).

Suppose the number of different DA techniques used is $c_{DA}$, and consider one training image $x^{tr}(i)\in X_{tr}$, where $X_{tr}$ denotes the training set. Each offline MDA technique generates $n_{DA}$ images, so for each image we generate $c_{DA}\times n_{DA}$ new images. Over the entire training set $X_{tr}$, we perform the following seven DA methods (a code sketch of the complete MDA pipeline is given at the end of this subsection):

(i) noise injection

Gaussian noise with mean $h_m^{NI}$ and variance $h_v^{NI}$ was added to all training images to produce $n_{DA}$ new noised images.

$x^{tr1}(i)=F_{NI}[x^{tr}(i)]=\left[x_1^{tr1}(i),\ldots,x_{n_{DA}}^{tr1}(i)\right]$  (55)

where FNI means the noise injection function.

(ii) horizontal shear (HS) transform

New $n_{DA}$ images were made by the HS transform

$x^{tr2}(i)=F_{HS}[x^{tr}(i)]=\left[x_1^{tr2}(i,h_1^{HS}),\ldots,x_{n_{DA}}^{tr2}(i,h_{n_{DA}}^{HS})\right]$  (56)

where $F_{HS}$ denotes the HS transform function. The HS factors $h^{HS}$ do not include the value $h^{HS}=0$.

(iii) vertical shear (VS) transform

$x^{tr3}(i)=F_{VS}[x^{tr}(i)]=\left[x_1^{tr3}(i,h_1^{VS}),\ldots,x_{n_{DA}}^{tr3}(i,h_{n_{DA}}^{VS})\right]$  (57)

where $F_{VS}$ denotes the VS transform function, which operates similarly to the HS transform. The VS factors take the same values as the HS factors, $h_j^{VS}=h_j^{HS},\ j\in\{1,2,\ldots,n_{DA}\}$.

(iv) rotation

The rotation angle vector $h^{RO}$ skips the value of 0.

$x^{tr4}(i)=F_{RO}[x^{tr}(i)]=\left[x_1^{tr4}(i,h_1^{RO}),\ldots,x_{n_{DA}}^{tr4}(i,h_{n_{DA}}^{RO})\right]$  (58)

where FRO means rotation operation.

(v) gamma correction (GC)

The GC factor $h^{GC}$ skips the value of 1.

$x^{tr5}(i)=F_{GC}[x^{tr}(i)]=\left[x_1^{tr5}(i,h_1^{GC}),\ldots,x_{n_{DA}}^{tr5}(i,h_{n_{DA}}^{GC})\right]$  (59)

where $F_{GC}$ denotes the GC operation.

(vi) random translation (RT)

Every image in the training set $x^{tr}(i),\ i=1,2,\ldots,|X_{tr}|$ is translated $n_{DA}$ times with a random vertical shift $h^{rvs}$ and a random horizontal shift $h^{rhs}$. The values of $h^{rhs}$ and $h^{rvs}$ lie in the range $[-MSR,+MSR]$ and obey the uniform distribution V:

$h_i^{rhs}\sim V[-MSR,+MSR],\quad h_i^{rvs}\sim V[-MSR,+MSR],\quad i\in[1,n_{DA}]$  (60)

where MSR is the maximum shift range. Hence, we have

$x^{tr6}(i)=F_{RT}[x^{tr}(i)]=\left[x_1^{tr6}(i,h_1^{rhs},h_1^{rvs}),\ldots,x_{n_{DA}}^{tr6}(i,h_{n_{DA}}^{rhs},h_{n_{DA}}^{rvs})\right]$  (61)

(vii) scaling

All training images {xtr(i)} are scaled with scaling factor hSC, skipping hSC=1.

$x^{tr7}(i)=F_{SC}[x^{tr}(i)]=\left[x_1^{tr7}(i,h_1^{SC}),\ldots,x_{n_{DA}}^{tr7}(i,h_{n_{DA}}^{SC})\right]$  (62)

where FSC is the scaling operation.

(viii) mirror

All of the above $c_{DA}/2$ results are mirrored:

$x^{tr(k+c_{DA}/2)}(i)=F_{MIR}\left[x^{tr(k)}(i)\right],\quad k\in\{1,2,\ldots,c_{DA}/2\}$  (63)

where FMIR represents the mirror function.

(ix) concatenation

All of the $c_{DA}$-way results are concatenated as

$\underset{a_{DA}}{x^{DA}(i)}=F_{CON}\left\{\underset{1}{x^{tr}(i)},\ \underset{1}{F_{MIR}[x^{tr}(i)]},\ \underset{n_{DA}}{x^{tr1}(i)},\ \ldots,\ \underset{n_{DA}}{x^{tr(c_{DA})}(i)}\right\}$  (64)

where $F_{CON}$ denotes the concatenation operation, and $x^{DA}(i),\ i=1,2,\ldots,|X_{DA}|$ is the collection of MDA images generated from the original image $x^{tr}(i)$. $X_{DA}$ is the set of all augmented images and $|X_{DA}|$ is the size of the augmented dataset. $a_{DA}$ is the data augmentation factor (DAF), representing the ratio of the size of the augmented training set to the size of the original training set; it is calculated as

$a_{DA}=\frac{|x^{DA}(i)|}{|x^{tr}(i)|}=\frac{|X_{DA}|}{|X_{tr}|}$  (65)

We can calculate $a_{DA}=n_{DA}\times c_{DA}+2$. Therefore, MDA is a function making the enhanced training set $a_{DA}$ times as large as the original training set $X_{tr}$:

$\{x^{tr}(i)\in X_{tr}\}\ \xrightarrow{\ MDA\ }\ \{x^{DA}(i)\in X_{DA}\}$  (66)
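A reduced Python sketch of the offline MDA pipeline is given below (using SciPy for the geometric transforms); shear is omitted for brevity and the factor grids only mimic Table 4, so the exact outputs differ from the paper's augmentation.

```python
import numpy as np
from scipy import ndimage

def mda(image, n_da=30, rng=None):
    """Reduced sketch of offline MDA (Section 3.5): several DA methods applied to
    the image and to its mirror, then concatenated with both originals (Eq. (64)).
    Shear is omitted for brevity; the factor grids mimic Table 4 but are illustrative."""
    rng = rng or np.random.default_rng(0)
    def augment(x):
        out = [x + rng.normal(0, 0.1, x.shape) for _ in range(n_da)]                        # noise injection (variance 0.01)
        out += [ndimage.rotate(x, a, reshape=False, mode="nearest")
                for a in np.linspace(-30, 30, n_da) if a != 0]                              # rotation
        out += [np.clip(x, 1e-6, 1.0) ** g for g in np.linspace(0.4, 1.6, n_da) if g != 1]  # gamma correction
        out += [ndimage.shift(x, rng.integers(-20, 21, size=2), mode="nearest")
                for _ in range(n_da)]                                                       # random translation
        out += [ndimage.zoom(x, s, mode="nearest")
                for s in np.linspace(0.7, 1.3, n_da) if s != 1]                             # scaling (sizes differ; resize in practice)
        return out
    mirrored = np.fliplr(image)
    return [image, mirrored] + augment(image) + augment(mirrored)

augmented = mda(np.random.rand(64, 64), n_da=5)
print(len(augmented))  # 2 originals plus the augmented copies of both branches
```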

3.6. Experiment setup and measures

Two types of measures were used in our experiment: one for validation, to choose the best PTMs, and the other on the test set, to report unbiased performance for comparison with state-of-the-art approaches. The whole preprocessed dataset $X_4$ was split into a non-test set $X_4^{ntest}$ and a test set $X_4^{test}$, i.e., $X_4\to\{X_4^{ntest},X_4^{test}\}$. Roughly, the non-test set $X_4^{ntest}$ comprises 80% of the whole dataset, and the test set $X_4^{test}$ the remaining 20%. So we have

$N_k^{ntest}+N_k^{test}=N_k,\quad k=1,2,3,4$  (67)

where $N_k^{ntest}$ is the number of samples of the $k$-th class in the non-test set and $N_k^{test}$ the number of samples of the $k$-th class in the test set. Hence, $N_k^{ntest}/N_k\approx 80\%$ and $N_k^{test}/N_k\approx 20\%$.

For the validation phase, $R_v$ runs of 10-fold cross validation [46] were performed to obtain the validation performance. The ideal confusion matrix $L_{Val}^{I}$ combining all $R_v$ runs of 10 folds is

$L_{Val}^{I}=R_v\times\mathrm{diag}\left(N_1^{ntest},N_2^{ntest},N_3^{ntest},N_4^{ntest}\right)$  (68)

In the test phase, we ran our selected best models $R_t$ times, each run with different initial seeds; the ideal confusion matrix $L_{Test}^{I}$ is

$L_{Test}^{I}=R_t\times\mathrm{diag}\left(N_1^{test},N_2^{test},N_3^{test},N_4^{test}\right)$  (69)

For realistic runs, suppose $r_i$ is the run index; in each run we generate either a validation confusion matrix [47] $L_{Val}(r_i)$ or a test confusion matrix $L_{Test}(r_i)$. Summarizing all runs, we obtain the summed validation confusion matrix $L_{Val}$ as

$L_{Val}=\sum_{r_i=1}^{R_v}L_{Val}(r_i)$  (70)

and the summed test confusion matrix $L_{Test}$ as

$L_{Test}=\sum_{r_i=1}^{R_t}L_{Test}(r_i)$  (71)

For each class $k=1,2,3,4$, we set that class label as "positive" and the other three classes as "negative". Fig. 7 shows a schematic of a multiple-class confusion matrix, where we focus on the 3rd class. Hence, the element in the 3rd row and 3rd column is TP, the sum of the remaining entries in the 3rd row is FN, and the sum of the remaining entries in the 3rd column is FP. We can then define the per-class measures as

$Sen(k)=\frac{TP(k)}{TP(k)+FN(k)}$  (72)
$Prc(k)=\frac{TP(k)}{TP(k)+FP(k)}$  (73)
$F1(k)=\frac{2\times Prc(k)\times Sen(k)}{Prc(k)+Sen(k)}$  (74)

Fig. 7. Confusion matrix for multiple-class conditions.

Measures can also be given at an overall level. One is called macro-level, which computes the metric independently for each class and takes the average that gives equal weight to each class (treating all classes equally) [48]. In contrast, the other is micro-level, weighting all samples equally [49]. In this multiple classification research, we prefer the micro-averaged (MA) F1 as the dataset is slightly unbalanced. The MA F1 [50] (F1μ) is defined below as the main indicator in the validation phase.

$F1^{\mu}=\frac{2\times Prc^{\mu}\times Sen^{\mu}}{Prc^{\mu}+Sen^{\mu}}$  (75)

where $Prc^{\mu}$ and $Sen^{\mu}$ are the micro-averaged precision and micro-averaged sensitivity, defined as

$Sen^{\mu}=\frac{\sum_k TP(k)}{\sum_k\left[TP(k)+FN(k)\right]}$  (76)
$Prc^{\mu}=\frac{\sum_k TP(k)}{\sum_k\left[TP(k)+FP(k)\right]}$  (77)

We report $F1^{\mu}$ in this study, since its value equals $Sen^{\mu}$ and $Prc^{\mu}$.
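The micro-averaged measures can be computed directly from the summed confusion matrix, as in the following sketch; the toy matrix is made up for illustration.

```python
import numpy as np

def micro_f1(confusion):
    """Micro-averaged sensitivity, precision, and F1 (Eqs. (75)-(77)) from a
    C x C confusion matrix whose rows are true classes and columns are
    predicted classes, as in Fig. 7."""
    tp = np.trace(confusion)
    fn = confusion.sum(axis=1) - np.diag(confusion)   # per-class FN: row total minus diagonal
    fp = confusion.sum(axis=0) - np.diag(confusion)   # per-class FP: column total minus diagonal
    sen_mu = tp / (tp + fn.sum())
    prc_mu = tp / (tp + fp.sum())
    return 2 * prc_mu * sen_mu / (prc_mu + sen_mu)

# Toy 4-class confusion matrix; for single-label tasks micro F1 equals overall accuracy.
L = np.array([[55, 1, 1, 0],
              [2, 54, 0, 0],
              [1, 0, 58, 0],
              [0, 1, 0, 60]])
print(round(micro_f1(L), 4))  # 0.9742
```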

3.7. Pseudocode of CCSHNet

Algorithm 4 lists the pseudocode of the proposed AI model, named CCSHNet, which is an acronym of the four categories analyzed in this study: COVID-19, CAP, SPT, and HC. The proposed algorithm and experiment setup consist of five phases: Phase I covers the preprocessing; Phase II runs $R_v$ rounds of ten-fold CV on the non-test set; Phase III performs the PTM selection; Phase IV creates the CCSHNet model; and Phase V reports the test performance of the CCSHNet model.

Gradient-weighted Class Activation Mapping (Grad-CAM) [51] was used to give an explainable heat map. It utilizes the gradient of the classification score with respect to the convolutional features computed by the AI model, helping users comprehend which regions of the input image are most important for the AI model's decisions.
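The core Grad-CAM weighting can be sketched in a framework-agnostic way as below, assuming the convolutional activations and the gradients of the class score have already been extracted; everything else (hooks, backpropagation, upsampling) is framework-specific and omitted.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Core Grad-CAM weighting [51]: given the convolutional feature maps
    (K x H x W) and the gradient of the class score with respect to those maps
    (same shape), weight each map by its spatially averaged gradient and keep
    only positive evidence. Computing the gradients is framework-specific and
    omitted here."""
    weights = gradients.mean(axis=(1, 2))                                    # alpha_k: global average pooling
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)  # ReLU of the weighted sum
    return cam / (cam.max() + 1e-12)                                         # normalize to [0, 1] for display

cam = grad_cam(np.random.rand(64, 7, 7), np.random.randn(64, 7, 7))
print(cam.shape)  # (7, 7); upsample to the input size to overlay as a heat map
```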

4. Experiments, results, and discussions

4.1. Hyperparameter values

Table 4 itemizes the hyperparameter settings. The number of images obtained using slice level selection was $|X_0|=1164$. The size of each raw image was $1024\times1024\times3$. The crop values along the four directions were all set to 150 (we tested larger values and found that some important chest regions were removed). The number of PTM candidates was set to 6. The number of nodes in the first FCL was set to 512, and in the second FCL to 4, which corresponds to the number of classes in this task. The maximum removable layer was set to 3, so we searched for the best L in the range [1, 3]. The number of hidden neurons in the OHNN was set to 10.

Table 4.

Hyperparameter Setting.

Parameter Value
|X0| |X0|=284+281+293+306=1164
W0 1024
H0 1024
C0 3
ct 150
cb 150
cl 150
cr 150
NPTM 6
NFCL(1) 512
NFCL(2) 4
Lmax 3
NHL 10
cDA 14
nDA 30
hmNI 0
hvNI 0.01
hHS h1-15HS=[−0.15, −0.14, …, −0.01],
h16-30HS=[+0.01, +0.02, …, +0.15].
hRO h1-15RO=[−30, −28, …, −2],
h16-30RO=[+2, +4, …, +30].
hGC h1-15GC=[0.4, 0.44, …, 0.96],
h16-30GC=[1.04, 1.08, …, 1.6].
MSR 20
hSC h1-15SC=[0.7, 0.72, …, 0.98],
h16-30SC=[1.02, 1.04, …, 1.3].
aDA 422
Rv 10
Rt 10

For the offline MDA technique, the number of different DA techniques was set to 14. The number of images generated by each offline MDA technique was 30. The mean and variance of the injected Gaussian noise were 0 and 0.01, respectively. The HS factor hHS ranged from −0.15 to +0.15, excluding the value of 0. The RO factor hRO ranged from −30 to +30, excluding the value of 0. The GC factor hGC ranged from 0.4 to 1.6, skipping the value of 1. The maximum shift range was set to 20. The SC factor hSC ranged from 0.7 to 1.3, excluding the value of 1. The data augmentation factor was calculated as 422. The numbers of runs over the validation and test sets were both set to 10.

Table 5 itemizes the training, validation, and test sets for each category. For the non-test set, 10-fold cross validation was used: 9 folds were used for training and the remaining fold for validation, repeated 10 times so that every non-test sample was used for validation once. This 10-fold cross validation was repeated for $R_v$ runs, generating the summed validation confusion matrix $L_{Val}$. For the test set, $R_t$ runs generated the summed test confusion matrix $L_{Test}$.
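The split-and-repeat protocol can be sketched as follows with scikit-learn; the logistic-regression stand-in for the OHNN, the function names, and the seeds are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, train_test_split

def fit_and_confusion(X_tr, y_tr, X_ev, y_ev, seed=0):
    """Placeholder classifier standing in for the OHNN trained on fused features."""
    clf = LogisticRegression(max_iter=1000, random_state=seed).fit(X_tr, y_tr)
    return confusion_matrix(y_ev, clf.predict(X_ev), labels=[0, 1, 2, 3])

def evaluation_protocol(features, labels, r_v=10, r_t=10):
    """Sketch of Section 3.6 / Table 5: a stratified ~80/20 non-test/test split,
    R_v runs of 10-fold CV on the non-test set summed into L_Val (Eq. (70)),
    and R_t repeated test runs summed into L_Test (Eq. (71))."""
    X_nt, X_te, y_nt, y_te = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=0)
    L_val = np.zeros((4, 4), dtype=int)
    for run in range(r_v):
        skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=run)
        for tr, va in skf.split(X_nt, y_nt):
            L_val += fit_and_confusion(X_nt[tr], y_nt[tr], X_nt[va], y_nt[va])
    L_test = sum(fit_and_confusion(X_nt, y_nt, X_te, y_te, seed=run)
                 for run in range(r_t))
    return L_val, L_test
```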

Table 5.

Training, validation, and test set.

Category Non-test (9 folds for training and 1 fold for validation) Test Total
COVID-19 N1ntest=227 N1test=57 N1=284
CAP N2ntest=225 N2test=56 N2=281
SPT N3ntest=234 N3test=59 N3=293
HC N4ntest=245 N4test=61 N4=306

4.2. Illustration of multiple data augmentation

Fig. 8 displays the MDA results, where the hyperparameters can be found in Section 4.1. The raw image is Fig. 2(a), which generates 421 new images (1 mirror image, 210 new images obtained from the original image, and 210 new images obtained from the mirrored original image). Fig. 8(a-g) shows the noise injection, HS transform, VS transform, rotation, GC, RT, and scaling results, respectively.

Fig. 8. Results of the proposed MDA.

4.3. Top three models of the validation set

On the validation set, the best three models found using GSAF were: (i) MPTM(2), i.e., DenseNet201 with NLR of 1; (ii) DenseNet201 with NLR of 2; and (iii) ResNet101 with NLR of 1. These top three models found by GSAF are listed in Table 6. For the best model (DenseNet201 with NLR of 1), the sensitivities of the four classes were 94.63%, 93.16%, 98.12%, and 99.18%, the precision values of the four classes were 96.45%, 96.68%, 96.15%, and 96.16%, and the F1 scores of the four classes were 95.53%, 94.88%, 97.12%, and 97.65%. The MA F1 score was 96.35%. For the other two models, the F1μ values were 96.06% and 95.83%, respectively.

Table 6.

Top three models on the validation set.

Model Class Sen (%) Prc (%) F1 (%)
DenseNet201
(NLR=1)
C1 94.63 96.45 95.53
C2 93.16 96.68 94.88
C3 98.12 96.15 97.12
C4 99.18 96.16 97.65
MA 96.35
DenseNet201
(NLR=2)
C1 94.93 97.07 95.99
C2 93.91 95.57 94.73
C3 97.26 94.09 95.65
C4 97.92 97.52 97.72
MA 96.06
ResNet101
(NLR=1)
C1 96.91 96.57 96.74
C2 96.22 94.45 95.33
C3 94.44 95.75 95.09
C4 95.79 96.50 96.14
MA 95.83

(MA: micro-averaged; Sen: Sensitivity; Prc: Precision).

4.4. GSAF against SAPNF

Using the greedy version GSAF to select the two models, we chose the best two models as DenseNet201 with NLR of 1, and DenseNet201 with NLR of 2. Conversely, using the non-greedy algorithm SAPNF showed the two best models to be fused were DenseNet201 with NLR of 1 and ResNet101 with NLR of 1. The comparative results are presented in Table 7 .

Table 7.

GSAF against SAPNF on validation set.

Selection
Approach
Selected
Model
Class Sen (%) Prc (%) F1 (%)
GSAF DenseNet201
(NLR =1) &
DenseNet201
(NLR =2)
C1 94.80 96.58 95.68
C2 93.42 96.59 94.98
C3 97.90 96.22 97.05
C4 99.06 96.11 97.56
MA 96.37
SAPNF DenseNet201
(NLR=1) &
ResNet101
(NLR=1)
C1 96.43 98.07 97.24
C2 95.95 97.03 96.49
C3 97.64 96.82 97.23
C4 98.53 96.83 97.67
MA 97.18

(MA: micro-averaged; Sen: Sensitivity; Prc: Precision).

There are two findings from comparing Table 7 with Table 6. (i) First, fusion gives better performance than individual models alone. The MA F1 score F1μ of the best single model was 96.35%, while the fused models gave improved performance: 96.37% with GSAF and 97.18% with SAPNF. (ii) A non-greedy selection approach (SAPNF) obtains better results than the greedy selection approach, GSAF. The reason is that the two models selected greedily have similar strengths. For example, both DenseNet201 (NLR=1) and DenseNet201 (NLR=2) work optimally on the 3rd and 4th classes, so their fusion does not help with the weak spots (the 1st and 2nd classes). Nevertheless, the 3rd best model, i.e., ResNet101 (NLR = 1), shows exceptional classification ability on the 1st and 2nd classes. Hence, fusing the 1st and 3rd best models is more logical, which is the core idea of our SAPNF.

4.5. Visual explanation of fusion

Grad-CAM [51] was used to illustrate why the fusion of DenseNet201 (NLR = 1) and ResNet101 (NLR = 1) works best among all possible fusion combinations.

Fig. 9 displays the heat maps of a COVID-19 CCT slice produced by Grad-CAM for three models. Fig. 9(b, c, & d) presents the heat maps generated by DenseNet201 (NLR = 1), DenseNet201 (NLR = 2), and ResNet101 (NLR = 1), respectively. We can observe that the DenseNet201 networks with NLR of 1 and 2 capture the same GGO lesion in the bottom half of the image (see Fig. 9b & c), so fusing them adds little new information. In contrast, ResNet101 (NLR = 1) captures the top-left GGO areas, which are neglected by the two DenseNet models. Thus, fusing DenseNet201 (NLR = 1) and ResNet101 (NLR = 1) is reasonable and has a solid visual explanation.

Fig. 9. Grad-CAM result of a COVID-19 slice. (The "jet" pseudo-color map was used. Red colors indicate areas important for the AI diagnosis, and blue colors indicate less important areas.)

Fig. 10 displays the Grad-CAM heat map of a normal CCT slice using the top three models. Fig. 10(a) shows the original CCT image, and Fig. 10(b-d) gives the heat maps using DenseNet201 (NLR=1), DenseNet201 (NLR=2), and ResNet101 (NLR=1). All three AI models did not locate any strong indications of suspicious areas. Therefore, all three AI models classified this image as “normal”, which was subsequently confirmed by a radiologist.

Fig. 10. Grad-CAM result on a normal case.

4.6. Performance of CCSHNet on the test set

After completing our previous experiments on the validation set, and selecting the optimal pretrained models and optimal NLR values, we ran our model CCSHNet, i.e., the fusion of DenseNet201 (NLR = 1) and ResNet101 (NLR = 1) via DCA, on the test set and report its performance. The test results are summarized in Table 8. The sensitivities of the four classes were 95.61%, 96.25%, 98.30%, and 97.86%, respectively. The precision values for the four classes were 97.32%, 96.42%, 96.99%, and 97.38%, respectively. The F1 scores of the four classes were 96.46%, 96.33%, 97.64%, and 97.62%, respectively. The MA F1 score F1μ of CCSHNet on the test set was 97.04%, which is slightly lower than the validation F1μ of 97.18% (see Table 7).

Table 8.

Performance of proposed CCSHNet on test set (%).

Class Sen (%) Prc (%) F1 (%)
C1 95.61 97.32 96.46
C2 96.25 96.42 96.33
C3 98.30 96.99 97.64
C4 97.86 97.38 97.62
MA 97.04

(MA: micro-averaged; Sen: Sensitivity; Prc: Precision).

4.7. Comparison to state-of-the-art approaches

The proposed CCSHNet method was compared with 12 state-of-the-art approaches: RCBO [10], ELM-BA [11], 6L-CNN [12], RN-18 [13], RN-50-AD [14], GAN-GN [15], SMO [16], CSS [17], NiNet [19], FCONet [20], COVNet [21], and DeCovNet [22]. All of these approaches were evaluated on our dataset. The comparison and the corresponding MA F1 (F1μ) plots are presented in Table 9 and Fig. 11, respectively.

Table 9.

Comparison results of state-of-the-art methods.

Method Class Sen (%) Prc (%) F1 (%)
RCBO [10] C1 71.93 84.19 77.58
C2 72.86 72.73 72.79
C3 73.56 76.41 74.96
C4 80.66 68.91 74.32
MA 74.85
ELM-BA [11] C1 62.63 67.61 65.03
C2 64.29 65.10 64.69
C3 71.86 66.77 69.22
C4 63.93 63.52 63.73
MA 65.71
6L-CNN [12] C1 72.46 83.94 77.78
C2 78.93 77.82 78.37
C3 81.86 75.00 78.28
C4 89.84 87.54 88.67
MA 80.94
RN-18 [13] C1 82.81 82.66 82.73
C2 81.07 74.43 77.61
C3 74.24 76.98 75.58
C4 82.13 86.38 84.20
MA 80.04
RN-50-AD [14] C1 87.72 85.03 86.36
C2 87.68 91.26 89.44
C3 93.39 89.89 91.60
C4 84.92 87.65 86.26
MA 88.41
GAN-GN [15] C1 91.75 89.86 90.80
C2 92.86 91.87 92.36
C3 89.83 89.98 89.91
C4 91.64 94.27 92.93
MA 91.50
SMO [16] C1 97.02 92.63 94.77
C2 89.11 95.23 92.07
C3 94.92 94.92 94.92
C4 94.26 92.89 93.57
MA 93.86
CSS [17] C1 94.04 92.25 93.14
C2 93.75 95.11 94.42
C3 91.36 93.58 92.45
C4 94.43 92.75 93.58
MA 93.39
NiNet [19] C1 87.89 91.59 89.70
C2 80.89 85.47 83.12
C3 83.22 82.11 82.66
C4 92.30 85.95 89.01
MA 86.18
FCONet [20] C1 92.28 95.64 93.93
C2 96.79 94.43 95.59
C3 94.75 95.88 95.31
C4 94.92 92.94 93.92
MA 94.68
COVNet [21] C1 89.82 86.63 88.20
C2 89.82 92.63 91.21
C3 93.73 90.66 92.17
C4 87.38 90.96 89.13
MA 90.17
DeCovNet [22] C1 91.05 90.58 90.81
C2 93.75 90.99 92.35
C3 90.51 86.97 88.70
C4 88.69 95.58 92.01
MA 90.94
CCSHNet
(Ours)
C1 95.61 97.32 96.46
C2 96.25 96.42 96.33
C3 98.30 96.99 97.64
C4 97.86 97.38 97.62
MA 97.04

Fig. 11. Comparison plot of MA F1 for our algorithm against 12 state-of-the-art methods.

The results in Table 9 and Fig. 11 demonstrate that our CCSHNet accomplished the best outcomes among all methods. The reason CCSHNet obtains the best overall performance is that we proposed several new algorithms to improve our fusion model: (L, 2) transfer feature learning (L2TFL), the selection algorithm of pretrained networks for fusion (SAPNF), and deep CCT fusion by discriminant correlation analysis (DCFDCA); the fusion framework demonstrates their combined effectiveness. Meanwhile, the proposed multiple-way data augmentation prevents our AI model from overfitting, further increasing its performance.

Our method is unique in comparison to the other strategies. RCBO [10] used a real-coded strategy in the traditional biogeography-based optimization method; however, it still requires manual feature selection, and the manually curated features cannot be validated to fit this four-class classification task. ELM-BA [11] used an extreme learning classifier as the backbone, which employs random features (i.e., non-tuned random hidden nodes), so its performance may not be reliable. 6L-CNN [12] was proposed for fingerspelling classification during patients' rehabilitation. It used the leaky rectified linear unit to replace the traditional rectified linear unit. Nevertheless, the structure itself is shallow (only six layers) and thus may not handle the complicated internal mapping from CCT images to the four class labels. RN-18 [13] and RN-50-AD [14] used two variants of ResNet to classify thyroid ultrasound standard plane and Alzheimer's disease, respectively. The weights of these two networks were already adapted to their own data, so retraining of the weights is required, which results in suboptimal performance. GAN-GN [15] combined a generative adversarial network (GAN) and GoogleNet, but the image size and the size of the dataset affect the images generated by the GAN. SMO [16] used social mimic optimization for feature selection and fusion; nevertheless, SMO's performance needs further verification. CSS [17] predicted a COVID severity score. We transferred the score prediction in their paper to COVID-19 recognition in this task; the geographic extent score and lung opacity score may not relate directly to our COVID-19 recognition, so this transfer is cross-field, which makes it more challenging. NiNet [19] combined 3D U-Net and MVP-Net. However, the 3D neural network needs more samples to train; otherwise it is susceptible to overfitting. FCONet [20] is a fast-track COVID-19 classification network. Again, the authors used ResNet50 and trained their models on three categories. In contrast, our CCSHNet used deeper models and four categories of CCT images; hence, our model is more complicated and effective. COVNet [21] chose ResNet50, which has fewer layers than our proposed models (DenseNet201 and ResNet101). They trained their models on three categories; in contrast, our model was trained with four categories, providing an additional class, secondary pulmonary tuberculosis. DeCovNet [22] is a weakly-supervised DL method; nevertheless, it needs to train a UNet to extract lung regions, which requires more samples and more precise expert annotations.

5. Conclusions

This paper proposed a novel CCSHNet for COVID-19 detection in CCTs. The model fuses the features of two optimal pretrained models using the proposed DCFDCA algorithm; the two models are determined by the proposed SAPNF algorithm, which selects the best PTM and NLR combinations, and their feature learning is performed by the proposed L2TFL algorithm. Overall, our experiments showed that CCSHNet achieves the best performance among 12 state-of-the-art approaches and can potentially help radiologists make quicker, more accurate diagnoses of COVID-19 from CCTs.

The potential impact of our method on hospitals and society is promising. The experimental results indicate that our CCSHNet system can support decision making when diagnosing lung-related diseases from CCTs. Furthermore, CCSHNet can be improved by integrating it with AI models developed by other teams at other universities or in other countries. In addition, our algorithm can be redeployed on a new hospital's server at little cost using cloud-computing-based techniques.

The shortcomings of our CCSHNet are three-fold: (i) it cannot handle heterogeneous data, such as CCT combined with CXR, patient history, and other data types; (ii) it has not yet undergone rigorous clinical validation; and (iii) the dataset in this study is limited in both size and number of categories.

Future work will address the following aspects: (i) expand the dataset and test the CCSHNet model on larger, more heterogeneous data; (ii) evaluate more advanced PTMs, particularly those pretrained on medical lung images; (iii) apply advanced data preprocessing techniques to check whether the performance of our AI system can be further improved; (iv) embed our AI system into other automated healthcare systems [52], [53], [54]; (v) exploit IoT [55], [56], [57], [58] and communication technologies [59] to make our AI system more powerful; and (vi) test advanced or hybrid fusion rules.

CRediT authorship contribution statement

Shui-Hua Wang: Conceptualization, Methodology, Software, Validation, Data curation, Writing - original draft, Investigation. Deepak Ranjan Nayak: Formal analysis, Writing - original draft, Writing - review & editing. David S. Guttery: Writing - original draft, Writing - review & editing. Xin Zhang: Writing - original draft, Writing - review & editing. Yu-Dong Zhang: Resources, Formal analysis, Investigation, Data curation, Writing - review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This paper is partially supported by British Heart Foundation Accelerator Award, UK; Royal Society International Exchanges Cost Share Award, UK (RP202G0230); Hope Foundation for Cancer Research, UK (RM60G0680); Medical Research Council Confidence in Concept Award, UK (MC_PC_17171).

Appendix A

Table 10. Abbreviation List.

Abbreviation Full Name
(M)DA (multiple-way) data augmentation
BCS between-class scatter
BSC between-set covariance
CAP community-acquired pneumonia
CCA canonical correlation analysis
CCT chest computed tomography
DAF data augmentation factor
DCA discriminant correlation analysis
DCFDCA deep CCT fusion by discriminant correlation analysis
DDM diagonally dominant matrix
DL deep learning
DLF decision-level fusion
FCL fully-connected layer
FLF feature-level fusion
GSAF greedy selection algorithm for fusion
HC healthy control
ISP incompatible size problem
L2TFL (L,2) transfer feature learning
MRL maximum removable layer
MV majority voting
NLR number of layers to be removed
OHNN one-hidden layer neural network
PF parallel fusion
PTM pre-trained model
SAPNF selection algorithm of pretrained networks for fusion
SF serial fusion
SPT secondary pulmonary tuberculosis
SVD singular value decomposition
TL transfer learning

Appendix B

Table 11. Symbol List.

Symbol Meaning
X0 raw dataset
x0 raw slice CCT image
[W0,H0,C0] size of x0
[w,h,c] index of [W,H,C]
Z labeling
B radiologist
C class (viz., COVID, CAP, SPT, HC)
FG grayscale operation
al minimum grayscale of an image
ah maximum grayscale of an image
FC crop operation
(ct,cb,cl,cr) crop values in unit of pixel from four directions
X4 preprocessed dataset
x4 preprocessed slice CCT image
[W4,H4,C4] size of x4
δ1 storage compression ratio
δ2 size compression ratio
{XS,LS,OS} data, labeling, and classifier of source domain
{XT,LT,OT} data, labeling, and classifier of target domain
err error function
lr learning rate
NPTM number of PTMs
MPTM a specified model among all six models. See Table 3.
M0 a pretrained model
M1 M0 with last L layers removed
M2 M1 with 2 new fully connected layers added
M3 retrained M2
M(k) k-th PTM
M1 1st model
M2 2nd model
L number of last layers to be removed
LM1 number of last layers to be removed at 1st model
LM2 number of last layers to be removed at 2nd model
L0 number of learnable layers of M0
L2 number of learnable layers of M2
Lmax maximum removable layer
M(a:b) layers from a to b in network M
fM features learnt from network M
Frl remove layer function
Fafcl add fully-connected layer function
Frt retrain function
Fac activation function
NFCL(1) node number of first added FCL layer
NFCL(2) node number of second added FCL layer
{Y1,Y2,Y3} training, validation, and test set of a dataset Y.
NHL number of hidden neurons
Bi initialized OHNN
Bt trained OHNN
FSD sort function in descending way
R rank list
I(k,L) indicator by k-th PTM and removing L layers
I indicator vector
FL2TFL proposed L2TFL operation
O2 output on validation set
O3 output on test set
FMI measuring indicator function
fF fused feature
FDF deep fusion function
(fM1,fM2) features to be fused from two models (M1,M2)
FSF serial fusion operation
FPF parallel fusion operation
q length of feature
NTF number of trained features.
FCCOV cross-covariance operation
WCCA,M1 transformation matrix of CCA for model 1
WCCA,M2 transformation matrix of CCA for model 2
(fM1¯,fM2¯) transformed features by CCA
fCCCA concatenation of CCA features
fSCCA summation of CCA features
fM1ij feature extracted from i th image of j-th category via model M1
SBCS between-class scatter matrix
POE matrix of orthogonal eigenvectors
Λ^ diagonal matrix of real and non-negative eigenvalues in decreasing order.
Frank rank function
fM1 projection of fM1 where the BCS matrix is I
SM1,M2 between-set covariance matrix of transformed feature sets
WDCA,M1 transform matrix of DCA for model 1
WDCA,M2 transform matrix of DCA for model 2
(gM1,gM2) transformed feature sets by DCA
gCDCA concatenation of DCA features
gSDCA summation of DCA features
cDA number of different DA techniques
nDA number of generated images by each offline MDA technique
xtr(i) one training image
Xtr training set
Xva validation set
hmNI mean of Gaussian noise injected
hvNI variance of Gaussian noise injected
FNI noise injection operation
FHS horizontal shift transform function
FVS vertical shift transform function
FGC Gamma correction operation
FRO image rotation operation
FRT random translation operation
hrhs random horizontal shift
hrvs random vertical shift
MSR maximum shift range
V uniform distribution
FSC image scaling operation
FMIR mirror function
FCON concatenation operation
xDA(i) collection of generated MDA images with original image
XDA set of all augmented images
aDA data augmentation factor
X4ntest non-test set of preprocessed dataset
X4test test set of preprocessed dataset
Nkntest number of samples of non-test set in k-th class
Nktest number of samples of test set in k-th class.
LValI ideal confusion matrix over validation set
LTestI ideal confusion matrix over test set
Rv number of runs on validation set
Rt number of runs on test set
F1μ micro-averaged F1
Prcμ micro-averaged precision
Senμ micro-averaged sensitivity
ri run index
fi fold index

Algorithm 1.

Proposed L2TFL algorithm.

Step 1 Read one raw PTM network M0,
Step 2 Remove the last L learnable layers (the NLR) from M0 to get M1, M1 = Frl(M0, L),
Step 3 Add two new fully connected layers, M2 = Fafcl(M1, 2),
Step 4 Freeze the early layers, lr[M2(1 : L0 − L)] ← 0,
Step 5 Make the last two layers retrainable, lr[M2(L2 − 1 : L2)] ← 1,
Step 6 Retrain the whole network and obtain the new network M3 = Frt(M2, X),
Step 7 Output the learnt features fM = Fac(M3, L2 − 1).
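
To make Algorithm 1 concrete, the following minimal PyTorch-style sketch follows the same steps for the L = 1 case: drop the classifier head of a pretrained backbone, append two new fully connected layers, freeze the early layers, and retrain only the new head. The backbone (torchvision's DenseNet-201), the hidden width of 128, and the function name are illustrative assumptions, not our exact configuration.

# Minimal sketch of the L2TFL idea (Algorithm 1); illustrative, not the exact implementation.
import torch.nn as nn
from torchvision import models

def l2tfl_sketch(num_classes: int = 4, n_fcl1: int = 128) -> nn.Module:
    """Drop the classifier (the L = 1 case), add two FC layers, freeze early layers."""
    m0 = models.densenet201(weights="IMAGENET1K_V1")    # Step 1: raw PTM network M0
    backbone = m0.features                               # Step 2: M1 = M0 with its head removed
    for p in backbone.parameters():                      # Step 4: freeze early layers (lr -> 0)
        p.requires_grad = False
    head = nn.Sequential(                                # Steps 3 and 5: two new retrainable FCLs
        nn.ReLU(inplace=True), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(m0.classifier.in_features, n_fcl1), nn.ReLU(),
        nn.Linear(n_fcl1, num_classes),
    )
    return nn.Sequential(backbone, head)                 # M2; retraining it yields M3 (Step 6)

For a general NLR, the last L learnable blocks of the backbone would be sliced off instead of only the classifier, and Step 7 then corresponds to reading the activations of the first added FC layer of the retrained network.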

Algorithm 2.

Proposed GSAF for PTM selection.

Step 1 Input: Training set Y1 and validation set Y2
Step 2 for k = 1: NPTM (k is the index of PTM)
for L=1:Lmax (L is the index of NLR)
Step 2.1 PTM Retrain
Import k-th PTM M0(k),
Use L2TFL via data Y1 and removing L layers,
Obtain M3(k,L),
Step 2.2 Feature Extraction
Generate features fM(k,L) from M3(k,L).
Step 2.3 Train OHNN
Initialize OHNN Bi(k,L),
Train OHNN Bi(k,L) using input as fM(k,L),
Obtain Bt(k,L),
Step 2.4 Obtain Indicator
Obtain performance indicator I(k,L) over validation set Y2
end
end
Step 3 Generate and sort the indicator vector I(k,L),
Step 4 Obtain the rank list RGSAF,
Step 5 Choose the top two best models (determine PTM and NLR):
M[RGSAF(1)] and M[RGSAF(2)]
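
The greedy search in Algorithm 2 can be summarized by the sketch below: every (PTM, NLR) pair is retrained via L2TFL, its learnt features train a one-hidden-layer classifier, and the micro-averaged F1 on the validation set ranks the pairs. The helpers l2tfl_retrain, extract_features, train_ohnn, and micro_f1, as well as the .labels attribute of the data sets, are assumed placeholders for steps defined elsewhere in the paper.

# Sketch of GSAF (Algorithm 2); helper functions and data objects are assumed placeholders.
from itertools import product

def gsaf_sketch(ptms, L_max, Y1, Y2, l2tfl_retrain, extract_features, train_ohnn, micro_f1):
    scores = {}                                                   # indicator I(k, L)
    for k, L in product(range(len(ptms)), range(1, L_max + 1)):
        m3 = l2tfl_retrain(ptms[k], L, Y1)                        # Step 2.1: retrain k-th PTM, L layers removed
        f_tr, f_va = extract_features(m3, Y1), extract_features(m3, Y2)   # Step 2.2
        ohnn = train_ohnn(f_tr, Y1.labels)                        # Step 2.3: one-hidden-layer NN
        scores[(k, L)] = micro_f1(ohnn.predict(f_va), Y2.labels)  # Step 2.4: indicator on validation set
    rank = sorted(scores, key=scores.get, reverse=True)           # Steps 3-4: rank list R_GSAF
    return rank[0], rank[1]                                       # Step 5: top two (PTM, NLR) combinations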

Algorithm 3.

Proposed SAPNF for PTM selection.

Step 1 Input: Training set Y1 and validation set Y2
Step 2 for k1=1:NPTM (k1 is the index of PTM of 1st model)
for LM1=1:Lmax (LM1 is the index of NLR of 1st model)
for k2=1:NPTM (k2 is the index of PTM of 2nd model)
for LM2=1:Lmax (LM2 is the index of NLR of 2nd model)
Step 2.1 1st model Retrain
Import k1-th PTM M0(k1),
Use L2TFL via data Y1 and removing LM1 layers,
Obtain M3(k1,LM1),
Step 2.2 2nd model Retrain
Import k2-th PTM M0(k2),
Use L2TFL via data Y1 and removing LM2 layers,
Obtain M3(k2,LM2),
Step 2.3 Feature Extraction from two retrained PTMs
Generate features fM(k1,LM1) from M3(k1,LM1),
Generate features fM(k2,LM2) from M3(k2,LM2),
Step 2.4 Feature Fusion
Obtain fF(k1,LM1,k2,LM2)
Step 2.5 Train OHNN
Initialize OHNN Bi(k1,LM1,k2,LM2),
Train OHNN Bi(k1,LM1,k2,LM2) using input as fF(k1,LM1,k2,LM2),
Obtain Bt(k1,LM1,k2,LM2),
Step 2.6 Obtain Indicator
Obtain performance indicator I(k1,LM1,k2,LM2) over validation set Y2
end
end
end
end
Step 3 Generate and sort the indicator vector I(k1,LM1,k2,LM2),
Step 4 Obtain the rank list RSAPNF,
Step 5 Choose the top two best models (determine PTM and NLR)
M[RSAPNF(1)] and M[RSAPNF(2)]
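
SAPNF differs from GSAF in that it scores model pairs rather than single models, so the search runs over (k1, LM1, k2, LM2) and evaluates the fused feature. A hedged sketch under the same assumed helpers, with an additional fuse placeholder standing in for the DCFDCA step, is given below; the top-ranked combination already fixes the PTM and NLR of both fused models.

# Sketch of SAPNF (Algorithm 3); helpers, including the DCFDCA placeholder `fuse`, are assumed.
from itertools import product

def sapnf_sketch(ptms, L_max, Y1, Y2, l2tfl_retrain, extract_features, fuse, train_ohnn, micro_f1):
    scores = {}                                                   # indicator I(k1, L_M1, k2, L_M2)
    for k1, L1, k2, L2 in product(range(len(ptms)), range(1, L_max + 1), repeat=2):
        m_a = l2tfl_retrain(ptms[k1], L1, Y1)                     # Step 2.1: retrain 1st model
        m_b = l2tfl_retrain(ptms[k2], L2, Y1)                     # Step 2.2: retrain 2nd model
        f_tr = fuse(extract_features(m_a, Y1), extract_features(m_b, Y1))   # Steps 2.3-2.4
        f_va = fuse(extract_features(m_a, Y2), extract_features(m_b, Y2))
        ohnn = train_ohnn(f_tr, Y1.labels)                        # Step 2.5
        scores[(k1, L1, k2, L2)] = micro_f1(ohnn.predict(f_va), Y2.labels)  # Step 2.6
    return max(scores, key=scores.get)                            # Steps 3-5: best (PTM, NLR) combination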

Algorithm 4.

Pseudocode of our CCSHNet algorithm.

Input: Original Image Set X0 and its ground truth label ZCCT.
Phase I: Preprocessing X0 → X4
Grayscaling: X0 → X1. See Eq. (3).
HS: X1 → X2. See Eq. (5).
Crop: X2 → X3. See Eq. (6).
Downsampling: X3 → X4. See Eq. (7).
Phase II: Ten-fold CV on the Non-test Set
Split X4 into non-test set and test set: X4 → {X4ntest, X4test}
for ri = 1:Rv % ri is run index
  for fi = 1:10 % fi is fold index
     Step II.A Split into 10 folds
    Split the non-test set X4ntest into 10 folds {F4ntest(1|ri), …, F4ntest(10|ri)}.
     Step II.B Create training and validation set
    Training set: Xtr(ri) = F4ntest(1, …, fi − 1, fi + 1, …, 10 | ri)
    Validation set: Xva(ri) = F4ntest(fi | ri)
     Step II.C MDA on training set
    for i = 1:|Xtr|
      Training image: xtr(i,ri) and its ground-truth label ZCCT[xtr(i,ri)].
xtr(i,ri): i-th training image in the ri-th run
xtr(i,ri) → xDA(i,ri). See Eq. (66).
end
DA-enhanced training set: XDA(ri) = {xDA(i,ri) | i = 1, …, |Xtr(ri)|}.
Enhanced training set labels: ZCCT(ri) = {ZCCT[xtr(i,ri)] | i = 1, …, |Xtr(ri)|}.
Step II.D Model Selection by SAPNF, L2TFL, and DCFDCA.
See Algorithm 3, Algorithm 1, and Fig. 5.
Step II.E Validation confusion matrix at ri-th run and fi-th fold
Record LVal(ri,fi). See Eq. (68).
end
Validation confusion matrix at ri-th run: LVal(ri) = Σ_{fi=1}^{10} LVal(ri,fi)
end
Phase III: PTM and NLR Selection
Validation confusion matrix. See Eq. (70).
The indicator is chosen as the micro-averaged F1.
Obtain F1μ. See Eq. (75).
Obtain the rank list RSAPNF. See Eq. (28).
Output the top two models, i.e., best PTM and NLR combinations.
Output M[RSAPNF(1)] and M[RSAPNF(2)] and the corresponding removed layers LM1 and LM2
Phase IV: Create CCSHNet Model
Select the two optimal models M[RSAPNF(1)] and M[RSAPNF(2)].
Feature learning by L2TFL with NLR LM1 and LM2 layers removed.
Deep CCT fusion by DCFDCA.
Classification by the OHNN Bi.
Phase V: Report the test performance of the CCSHNet model
Training set is X4ntest, and its labels ZCCT(X4ntest).
Test set is X4test, and its labels ZCCT(X4test).
for ri = 1:Rt % ri is run index
Initialize a random seed S(ri) at each run.
Trained CCSHNet model ← trainnetwork{CCSHNet, MDA[X4ntest], ZCCT(X4ntest), S(ri)}
Prediction: Pred(ri) = predict[CCSHNet, X4test];
Test confusion matrix at ri-th run: LTest(ri) = compare[Pred(ri), ZCCT(X4test)].
Calculate indicators. See Eqs. (72)-(77).
end
Test confusion matrix: See Eq. (71).
Calculate indicators.
Output: The best model CCSHNet and its test performances.
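
Phase V summarizes the test runs through confusion matrices and micro-averaged indicators. A short sketch of how the micro-averaged precision, sensitivity, and F1 can be computed from a 4 × 4 confusion matrix is shown below; the matrix values and the row/column orientation (rows = true class, columns = predicted class) are assumptions for illustration only.

# Micro-averaged indicators from a multi-class confusion matrix (illustrative sketch).
import numpy as np

def micro_indicators(cm):
    """cm[i, j] = number of class-i samples predicted as class j (assumed orientation)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                    # samples wrongly predicted as each class
    fn = cm.sum(axis=1) - tp                    # samples of each class that were missed
    prc_mu = tp.sum() / (tp.sum() + fp.sum())   # micro-averaged precision
    sen_mu = tp.sum() / (tp.sum() + fn.sum())   # micro-averaged sensitivity
    f1_mu = 2 * prc_mu * sen_mu / (prc_mu + sen_mu)
    return prc_mu, sen_mu, f1_mu

# Hypothetical 4-class confusion matrix (COVID-19, CAP, SPT, HC); not real results.
cm = np.array([[95, 2, 1, 2],
               [1, 96, 2, 1],
               [1, 1, 97, 1],
               [1, 1, 1, 97]])
print(micro_indicators(cm))

For a single-label multi-class task the micro-averaged precision and sensitivity coincide with the overall accuracy, so F1μ reduces to accuracy; the general formulas are kept here for clarity.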

Fig. 5. Diagram of our proposed fusion method, indicating the relations among SAPNF, L2TFL, and DCFDCA. (SAPNF: selection algorithm of pretrained networks for fusion; L2TFL: (L, 2) transfer feature learning; DCFDCA: deep CCT fusion by discriminant correlation analysis; CCT: chest CT; PTM: pretrained model).
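
To make the DCFDCA block in Fig. 5 concrete, the sketch below follows the standard discriminant correlation analysis recipe [42]: whiten each feature set with respect to its between-class scatter, diagonalize the between-set covariance with an SVD, and concatenate (gCDCA) or sum (gSDCA) the transformed feature sets. It is a simplified illustration of the published DCA procedure rather than our exact implementation; regularization, dimensionality choices, and centring details are omitted.

# Simplified DCA fusion sketch (following Haghighat et al. [42]); illustrative only.
import numpy as np

def _bcs_whiten(X, labels):
    """Project X (features x samples) so its between-class scatter becomes the identity."""
    mu = X.mean(axis=1)
    phi = np.column_stack([np.sqrt((labels == c).sum()) * (X[:, labels == c].mean(axis=1) - mu)
                           for c in np.unique(labels)])          # Phi_b, one column per class
    evals, evecs = np.linalg.eigh(phi.T @ phi)                   # small c x c eigenproblem
    keep = evals > 1e-10                                         # numerical rank of S_BCS
    W = phi @ evecs[:, keep] / evals[keep]                       # W^T S_BCS W = I
    return W.T @ X

def dca_fuse(f_m1, f_m2, labels, mode="concat"):
    """f_m1, f_m2: (features x samples) matrices from the two models; labels: (samples,)."""
    X1, X2 = _bcs_whiten(f_m1, labels), _bcs_whiten(f_m2, labels)
    U, s, Vt = np.linalg.svd(X1 @ X2.T, full_matrices=False)     # between-set covariance S_M1,M2
    G1 = (U / np.sqrt(s)).T @ X1                                 # transformed feature set g_M1
    G2 = (Vt.T / np.sqrt(s)).T @ X2                              # transformed feature set g_M2
    return np.vstack([G1, G2]) if mode == "concat" else G1 + G2  # g_CDCA or g_SDCA

The fused vector is then fed to the OHNN classifier, as in Step 2.5 of Algorithm 3.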

References

  • 1.COVID-19 CORONAVIRUS PANDEMIC, 2020. (12/Oct/2020). Available: https://www.worldometers.info/coronavirus.
  • 2.Azar A., Wessell D.E., Janus J.R., Simon L.V. Fractured aluminum nasopharyngeal swab during drive-through testing for COVID-19: radiographic detection of a retained foreign body. Skeletal Radiol. [Article; Early Access]. 2020;5. doi: 10.1007/s00256-020-03582-x. [DOI] [PMC free article] [PubMed]
  • 3.de Barry O., Obadia I., Hajjam M.El, Carlier R.Y. Chest-X-ray is a mainstay for follow-up in critically ill patients with covid-19 induced. Eur. J. Radiol. 2020;129(2) doi: 10.1016/j.ejrad.2020.109075. Article ID: 109075, Aug. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Herpe G., Tasu J.P. Impact of the Prevalence on the Predictive Positive Value of Chest CT in the Diagnosis of Coronavirus Disease (COVID-19) Am. J. Roentgenol. 2020;215:W39. doi: 10.2214/AJR.20.23530. Sep. [DOI] [PubMed] [Google Scholar]
  • 5.Willman D. The Washington Post [Internet]; 2020. Contamination At CDC Lab Delayed Rollout of Coronavirus Tests. [Google Scholar]
  • 6.Ai T., Yang Z., Hou H., Zhan C., Chen C., Lv W. Correlation of Chest CT and RT-PCR Testing for Coronavirus Disease 2019 (COVID-19) in China: a Report of 1014 Cases. Radiology. 2020;296:E32–E40. doi: 10.1148/radiol.2020200642. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Imre A. A Typical Chest CT Appearance of a Case with Coronavirus Disease 2019 (COVID-19), Erciyes Med. J. Sep, 2020;42:346–347. [Google Scholar]
  • 8.Flor N., Tonolini M. From ground-glass opacities to pulmonary emboli. A snapshot of the evolving role of a radiology unit facing the COVID-19 outbreak. Clin. Radiol. 2020;75:556–557. doi: 10.1016/j.crad.2020.04.009. Jul, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Fry C.V., Cai X.J., Zhang Y., Wagner C.S. Consolidation in a crisis: patterns of international collaboration in early COVID-19 research. PLoS ONE. 2020;15:15. doi: 10.1371/journal.pone.0236307. Article ID: e0236307, Jul. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Li P., Liu G. Pathological Brain Detection via Wavelet Packet Tsallis Entropy and Real-Coded Biogeography-based Optimization. Fundam. Inform. 2017;151:275–291. [Google Scholar]
  • 11.Lu S. A Pathological Brain Detection System based on Extreme Learning Machine Optimized by Bat Algorithm. CNS Neurol. Dis. - Drug Targets. 2017;16:23–29. doi: 10.2174/1871527315666161019153259. [DOI] [PubMed] [Google Scholar]
  • 12.Jiang X. Chinese Sign Language Fingerspelling Recognition via Six-Layer Convolutional Neural Network with Leaky Rectified Linear Units for Therapy and Rehabilitation. J. Med. Imaging Health Inform. 2019;9:2031–2038. [Google Scholar]
  • 13.Guo M.H., Du Y.Z. Classification of Thyroid Ultrasound Standard Plane Images using ResNet-18 Networks. IEEE 13th International Conference on Anti-Counterfeiting, Security, and Identification; Xiamen, China; 2019. pp. 324–328. [Google Scholar]
  • 14.Fulton L.V., Dolezel D., Harrop J., Yan Y., Fulton C.P. Classification of Alzheimer's Disease with and without Imagery Using Gradient Boosted Machines and ResNet-50. Brain. Sci. 2019;9:16. doi: 10.3390/brainsci9090212. Article ID: 212, Sep. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Loey M., Smarandache F., Khalifa N.E.M. Within the Lack of Chest COVID-19 X-ray Dataset: a Novel Detection Model Based on GAN and Deep Transfer Learning. Symmetry-Basel. 2020;12:19. Article ID: 651, Apr. [Google Scholar]
  • 16.Togacar M., Ergen B., Comert Z. COVID-19 detection using deep learning models to exploit Social Mimic Optimization and structured chest X-ray images using fuzzy color and stacking approaches. Comput. Biol. Med. 2020;121:12. doi: 10.1016/j.compbiomed.2020.103805. Article ID: 103805, Jun. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Cohen J.P., Dao L., Morrison P., Roth K., Bengio Y., Shen B.Y. Predicting COVID-19 Pneumonia Severity on Chest X-ray With Deep Learning. Cureus. 2020;12:10. doi: 10.7759/cureus.9448. Article ID: e9448, Jul. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Tabik S., Gómez-Ríos A., Martín-Rodríguez J., Sevillano-García I., Rey-Area M., Charte D. 2020. COVIDGR Dataset and COVID-SDNet Methodology For Predicting COVID-19 Based On Chest X-Ray Images. arXiv Preprint. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Ni Q.Q., Sun Z.Y., Qi L., Chen W., Yang Y., Wang L. A deep learning approach to characterize 2019 coronavirus disease (COVID-19) pneumonia in chest CT images. Eur. Radiol. 2020:11. doi: 10.1007/s00330-020-07044-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Ko H., Chung H., Kang W.S., Kim K.W., Shin Y., Kang S.J. COVID-19 Pneumonia Diagnosis Using a Simple 2D Deep Learning Framework With a Single Chest CT Image: model Development and Validation. J. Med. Internet Res. 2020;22:13. doi: 10.2196/19569. Article ID: e19569, Jun. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Li L., Qin L., Xu Z., Yin Y., Wang X., Kong B. Using Artificial Intelligence to Detect COVID-19 and Community-acquired Pneumonia Based on Pulmonary CT: evaluation of the Diagnostic Accuracy. Radiology. 2020;296:E65–E71. doi: 10.1148/radiol.2020200905. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Wang X.G., Deng X.B., Fu Q., Zhou Q., Feng J.P., Ma H. A Weakly-Supervised Framework for COVID-19 Classification and Lesion Localization From Chest CT. IEEE Trans. Med. Imaging. Aug, 2020;39:2615–2625. doi: 10.1109/TMI.2020.2995965. [DOI] [PubMed] [Google Scholar]
  • 23.Satapathy S.C., Zhu L.Y., Górriz J.M. A seven-layer convolutional neural network for chest CT based COVID-19 diagnosis using stochastic pooling. IEEE Sens. J. 2020:1. doi: 10.1109/JSEN.2020.3025855. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Wu X. Diagnosis of COVID-19 by Wavelet Renyi Entropy and Three-Segment Biogeography-Based Optimization. Int. J. Comput. Intelligence Syst. 2020;13:1332–1344. 2020-09-17T09:29:20.000Z. [Google Scholar]
  • 25.Chen Y. A Feature-Free 30-Disease Pathological Brain Detection System by Linear Regression Classifier. CNS Neurol. Dis. - Drug Targets. 2017;16:5–10. doi: 10.2174/1871527314666161124115531. [DOI] [PubMed] [Google Scholar]
  • 26.Chen Y. Wavelet energy entropy and linear regression classifier for detecting abnormal breasts. Multimed. Tools Appl. 2018;77:3813–3832. [Google Scholar]
  • 27.Farhood H., Perry S., Cheng E., Kim J. Enhanced 3D Point Cloud from a Light Field Image. Remote Sens. (Basel) 2020;12 Article ID: 1125, Apr. [Google Scholar]
  • 28.Debnath S., Talukdar F.A. Brain tumour segmentation using memory based learning method. Multimed. Tools. Appl. Aug, 2019;78:23689–23706. [Google Scholar]
  • 29.Glatt R., Da Silva F.L., Bianchi R.A.D., Costa A.H.R. DECAF: deep Case-based Policy Inference for knowledge transfer in Reinforcement Learning. Expert Syst. Appl. 2020;156:13. Article ID: 113420, Oct, [Google Scholar]
  • 30.Benbahria Z., Sebari I., Hajji H., Smiej M.F. Intelligent mapping of irrigated areas from landsat 8 images using transfer learning. Int. J. Eng. Geoscie. 2021;6:41–51. Feb, [Google Scholar]
  • 31.Hundt A., Killeen B., Greene N., Wu H.T., Kwon H., Paxton C. Good Robot!": efficient Reinforcement Learning for Multi-Step Visual Tasks with Sim to Real Transfer. IEEE Robotics Automation Lett. 2020;5:6724–6731. Oct, [Google Scholar]
  • 32.Gessert N., Bengs M., Wittig L., Drömann D., Keck T., Schlaefer A. Deep transfer learning methods for colon cancer classification in confocal laser microscopy images. Int. J. Comput. Assist. Radiol. Surg. 2019;14:1837–1845. doi: 10.1007/s11548-019-02004-1. Nov, [DOI] [PubMed] [Google Scholar]
  • 33.Hassanpour M., Malek H. Learning Document Image Features With SqueezeNet Convolutional Neural Network. Int. J. Eng. Jul, 2020;33:1201–1207. [Google Scholar]
  • 34.Hirano G., Nemoto M., Kimura Y., Kiyohara Y., Koga H., Yamazaki N. Automatic diagnosis of melanoma using hyperspectral data and GoogLeNet. Skin Res. Technol. [Article; Early Access]. 2020;7 doi: 10.1111/srt.12891. [DOI] [PubMed] [Google Scholar]
  • 35.Venturi L., Bandeira A.S., Bruna J. Spurious Valleys in One-hidden-layer Neural Network Optimization Landscapes. J. Mach. Learn. Res. 2019;20:34. Article ID: 133. [Google Scholar]
  • 36.Planet S., Iriondo I. Comparison between decision-level and feature-level fusion of acoustic and linguistic features for spontaneous emotion recognition. 7th Iberian Conference on Information Systems and Technologies (CISTI 2012; Madrid, Spain; 2012. pp. 1–6. [Google Scholar]
  • 37.Gunatilaka A.H., Baertlein B.A. Feature-level and decision-level fusion of noncoincidently sampled sensors for land mine detection. IEEE Trans. Pattern Anal. Mach. Intell. 2001;23:577–589. [Google Scholar]
  • 38.Grover J., Hanmandlu M. Hybrid fusion of score level and adaptive fuzzy decision level fusions for the finger-knuckle-print based authentication. Appl. Soft Comput. 2015;31:1–13. 2015/06/01/ [Google Scholar]
  • 39.Liu C.J., Wechsler H. A shape- and texture-based enhanced fisher classifier for face recognition. IEEE Transactions on Image Processing. 2001;10:598–608. doi: 10.1109/83.913594. Apr. [DOI] [PubMed] [Google Scholar]
  • 40.Yang J., Yang J.Y. Generalized K-L transform based combined feature extraction. Pattern Recognit. 2002;35:295–297. Jan. [Google Scholar]
  • 41.Sun Q.S., Zeng S.G., Liu Y., Heng P.A., Xia D.S. A new method of feature fusion and its application in image recognition. Pattern Recognit. 2005;38:2437–2448. Dec. [Google Scholar]
  • 42.Haghighat M., Abdel-Mottaleb M., Alhalabi W. Discriminant Correlation Analysis: real-Time Feature Level Fusion for Multimodal Biometric Recognition. IEEE Trans. Inform. Forensics Security. 2016;11:1984–1996. Sep. [Google Scholar]
  • 43.Chaib S., Liu H., Gu Y.F., Yao H.X. Deep Feature Fusion for VHR Remote Sensing Scene Classification. IEEE Trans. Geosci. Remote Sens. Aug, 2017;55:4775–4784. [Google Scholar]
  • 44.Wang S.-.H., Govindaraj V.V., Górriz J.M., Zhang X., Zhang Y.-.D. Information Fusion; 2020. Covid-19 Classification by FGCNet With Deep Feature Fusion from Graph Convolutional Network and Convolutional Neural Network. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Mazzini D., Napoletano P., Piccoli F., Schettini R. A Novel Approach to Data Augmentation for Pavement Distress Segmentation. Comput. Industry. 2020;121 Article ID: 103225, Oct. [Google Scholar]
  • 46.Duncan M.J., Eyre E.L.J., Cox V., Roscoe C.M.P., Faghy M.A., Tallis J. Cross-validation of Actigraph derived accelerometer cut-points for assessment of sedentary behaviour and physical activity in children aged 8-11 years. Acta Paediatr. Sep, 2020;109:1825–1830. doi: 10.1111/apa.15189. [DOI] [PubMed] [Google Scholar]
  • 47.Hasnain M., Pasha M.F., Ghani I., Imran M., Alzahrani M.Y., Budiarto R. Evaluating Trust Prediction and Confusion Matrix Measures for Web Services Ranking. IEEE Access. 2020;8:90847–90861. [Google Scholar]
  • 48.Pondenkandath V., Alberti M., Eichenberger N., Ingold R., Liwicki M. Cross-Depicted Historical Motif Categorization and Retrieval with Deep Learning. J. Imaging. 2020;6:20. doi: 10.3390/jimaging6070071. Article ID: 71, Jul. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Fernandes B., Gonzalez-Briones A., Novais P., Calafate M., Analide C., Neves J. An Adjective Selection Personality Assessment Method Using Gradient Boosting Machine Learning. Processes. 2020;8:24. Article ID: 618, May. [Google Scholar]
  • 50.Krsnik I., Glavas G., Krsnik M., Miletic D., Stajduhar I. Automatic Annotation of Narrative Radiology Reports. Diagnostics. 2020;10:15. doi: 10.3390/diagnostics10040196. Article ID: 196Apr, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Selvaraju R.R., Cogswell M., Das A., Vedantam R., Parikh D., Batra D. Grad-CAM: visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020;128:336–359. 2020/02/01. [Google Scholar]
  • 52.Zhang Y., Qian Y.F., Wu D., Hossain M.S., Ghoneim A., Chen M. Emotion-Aware Multimedia Systems Security. IEEE Trans. Multimedia. Mar, 2019;21:617–624. [Google Scholar]
  • 53.Zhang Y., Gravina R., Lu H.M., Villari M., Fortino G. PEA: parallel electrocardiogram-based authentication for smart healthcare systems. J. Netw. Comput. Appl. Sep, 2018;117:10–16. [Google Scholar]
  • 54.Zhang Y., Li Y., Wang R., Lu J., Ma X., Qiu M. PSAC: proactive Sequence-aware Content Caching via Deep Learning at the Network Edge. IEEE Trans. Netw. Scie. Eng. 2020:1. doi: 10.1109/TNSE.2020.2990963. [DOI] [Google Scholar]
  • 55.Zhang Y., Wang R.R., Hossain M.S., Alhamid M.F., Guizani M. Heterogeneous Information Network-Based Content Caching in the Internet of Vehicles. IEEE Trans. Veh. Technol. Oct, 2019;68:10216–10226. [Google Scholar]
  • 56.Zhang Y., Ma X., Zhang J., Hossain M.S., Muhammad G., Amin S.U. Edge Intelligence in the Cognitive Internet of Things: improving Sensitivity and Interactivity. IEEE Netw. 2019;33:58–64. May-Jun. [Google Scholar]
  • 57.Zhang Y., Li Y., Wang R., Hossain M.S., Lu H. Multi-Aspect Aware Session-Based Recommendation for Intelligent Transportation Services. IEEE Trans. Intell. Transport. Syst. 2020:1–10. doi: 10.1109/TITS.2020.2990214. [DOI] [Google Scholar]
  • 58.Zhang Y., Wen H., Qiu F., Wang Z., Abbas H. iBike: intelligent public bicycle services assisted by data analytics. Future Gen. Comput. Syst. 2019;95:187–197. 2019/06/01/ [Google Scholar]
  • 59.Zhang Y., Hossain M.S., Ghoneim A., Guizani M. COCME: content-Oriented Caching on the Mobile Edge for Wireless Communications. IEEE Wireless Commun. 2019;26:26–31. [Google Scholar]
