Pattern Recognition Letters. 2021 Jul 14;150:8–16. doi: 10.1016/j.patrec.2021.06.021

MIDCAN: A multiple input deep convolutional attention network for Covid-19 diagnosis based on chest CT and chest X-ray

Yu-Dong Zhang, Zheng Zhang, Xin Zhang, Shui-Hua Wang
PMCID: PMC8277963  PMID: 34276114

Abstract

Background

COVID-19 had caused 3.34 million deaths as of 13 May 2021, and it continues to cause new confirmed cases and deaths every day.

Method

This study investigates whether fusing chest CT with chest X-ray can improve the diagnosis performance of an AI model. Data harmonization is employed to build a homogeneous dataset. We create an end-to-end multiple-input deep convolutional attention network (MIDCAN) using the convolutional block attention module (CBAM). One input of our model receives the 3D chest CT image, and the other input receives the 2D X-ray image. In addition, multiple-way data augmentation is used to generate synthetic data for the training set, and Grad-CAM is used to produce explainable heatmaps.

Results

The proposed MIDCAN achieves a sensitivity of 98.10±1.88%, a specificity of 97.95±2.26%, and an accuracy of 98.02±1.35%.

Conclusion

Our MIDCAN method provides better results than 8 state-of-the-art approaches. We demonstrate that using multiple modalities achieves better results than either individual modality, and that CBAM helps improve the diagnosis performance.

Keywords: Deep learning, Data harmonization, Multiple input, Convolutional neural network, Automatic differentiation, COVID-19, Chest CT, Chest X-ray, Multimodality

1. Introduction

The COVID-19 (also known as coronavirus disease 2019) pandemic is an ongoing outbreak of an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. As of 13 May 2021, there were over 161.14 million confirmed cases and over 3.34 million deaths attributed to COVID-19. The cumulative deaths of the top 10 countries are shown in Fig. 1.

Fig. 1. Top 10 countries in terms of cumulative deaths (13 May 2021).

The main symptoms of COVID-19 are a fever, a new and continuous cough, and a loss of or change to taste and smell. In the UK, the approved vaccines were developed by Pfizer/BioNTech, Oxford/AstraZeneca, and Moderna. The Joint Committee on Vaccination and Immunisation (JCVI) [2] determines the order in which people are offered the vaccine. As of April 2021, people aged 50 and over, people who are clinically (extremely) vulnerable, people living or working in care homes, health care providers, and people with a learning disability were being offered vaccination.

Two COVID-19 diagnosis methods are available. The first is viral testing, which detects viral RNA fragments [3]. The shortcomings of the swab test [4] are twofold: (i) the swab samples may be contaminated, and (ii) it takes from several hours to several days to obtain the results. The other method is chest imaging, for which two main modalities are available: chest computed tomography (CCT) and chest X-ray (CXR).

CCT is currently one of the best chest imaging techniques because it provides the finest resolution, is capable of detecting extremely small nodules [5], and yields high-quality volumetric 3D chest data. By contrast, CXR offers poor soft-tissue contrast and only provides a 2D image [6].

In this paper, we aim to fuse CCT and CXR images, expecting that the fusion can improve performance compared to using CCT or CXR individually. To this end, we create a novel multiple-input deep convolutional attention network (MIDCAN) that handles CCT and CXR images simultaneously and presents the diagnosis output. The contributions of this study are briefly itemized in the following five points:

  • An attention mechanism, the convolutional block attention module, is included in the proposed MIDCAN model to improve performance;

  • The proposed MIDCAN model can handle CCT and CXR images simultaneously;

  • Multiple-way data augmentation is employed to alleviate overfitting;

  • The proposed MIDCAN model achieves more accurate performance than individual-modality approaches;

  • The proposed MIDCAN model is superior to state-of-the-art COVID-19 diagnosis approaches.

2. Literature survey

Over the past year, the AI field has produced ongoing research on automatic COVID-19 diagnosis, which can reduce the workload of manual labelling.

For CCT-based COVID-19 diagnosis, Chen (2020) [7] employed the gray-level co-occurrence matrix (GLCM) as the feature extraction method and used a support vector machine (SVM) as the classifier. Yao (2020) [8] combined wavelet entropy (WE) and biogeography-based optimization (BBO). Wu (2020) [9] presented a novel method, wavelet Renyi entropy (WRE), to help diagnose COVID-19. El-kenawy, Ibrahim (2020) [10] proposed a feature selection voting classifier (FSVC) approach for COVID-19 classification. Satapathy (2021) [11] combined DenseNet with optimization of transfer learning setting (OTLS). Saood and Hatem (2021) [12] explored two structurally different deep learning (DL) methods, U-Net and SegNet, for COVID-19 CT image segmentation.

On the other side, several successful AI models exist for CXR-based COVID-19 diagnosis. For example, Ismael and Sengur (2020) [13] presented a multi-resolution analysis (MRA) approach. Loey, Smarandache (2020) [14] combined a generative adversarial network (GAN) with GoogleNet; their method is abbreviated as GG. Togacar, Ergen (2020) [15] employed social mimic optimization (SMO) for feature selection and combination. Das, Ghosh (2021) [16] used a weighted-average ensembling technique with a convolutional neural network (CNN) for automatic COVID-19 detection.

The above approaches have three main shortcomings: (i) they only consider an individual modality, either CCT or CXR; (ii) their AI models are either traditional feature-extraction-plus-classifier pipelines or modern deep neural networks, but they lack an attention mechanism; and (iii) efficient measures to resist overfitting are missing.

To solve or alleviate the above three shortcomings, we propose the multiple-input deep convolutional attention network. The dataset and the details of our method are discussed in Sections 3 and 4, respectively.

3. Dataset

3.1. Data harmonization

This retrospective study was granted an exemption from ethical approval. Forty-two COVID-19 patients and 44 healthy controls (HCs) were recruited, and all data were collected from local hospitals. Each subject n takes a CCT scan and a CXR scan, generating a CCT image A0(n) and a CXR image B0(n). Because chest sizes differ across people and the scans come from different machines, the size of A0(n) and the size of B0(n) vary.

To build a homogeneous dataset, data harmonization [17] is used. The central 64 slices of the CCT image and the central rectangular region of the CXR image are retained. The height and width of the CCT slices are resized to 1024×1024, and the CXR image is resized to 2048×2048; the results are named A1(n) and B1(n). We choose 64 and 2048 because this keeps the lung part of the images while removing unrelated body tissues. The details are displayed in Algorithm 1.

Algorithm 1. Data harmonization.

Input: CCT image A0(n) and CXR image B0(n) of subject n.
Step 1: For the CCT image A0(n), the 64 central slices are retained and the top/bottom slices are removed.
Step 2: For the CXR image B0(n), the central rectangular region is retained and the outskirt pixels are removed.
Step 3: The CCT slices are resized to 1024×1024, and the CXR image is resized to 2048×2048.
Output: CCT image A1(n) with size[A1(n)] = 1024×1024×64, and CXR image B1(n) with size[B1(n)] = 2048×2048.
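As a minimal sketch of Algorithm 1 (not the authors' code; function names are illustrative, NumPy and OpenCV are assumed, and the "central rectangular region" is taken as the central square for simplicity):

```python
# Hypothetical implementation sketch of Algorithm 1 (data harmonization).
import cv2
import numpy as np

def harmonize_cct(ct_volume: np.ndarray) -> np.ndarray:
    """Keep the 64 central slices and resize each slice to 1024x1024."""
    n_slices = ct_volume.shape[2]
    start = (n_slices - 64) // 2
    central = ct_volume[:, :, start:start + 64]            # drop top/bottom slices
    slices = [cv2.resize(central[:, :, i], (1024, 1024)) for i in range(64)]
    return np.stack(slices, axis=2)                        # 1024 x 1024 x 64

def harmonize_cxr(xray: np.ndarray) -> np.ndarray:
    """Keep the central region and resize it to 2048x2048."""
    h, w = xray.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    central = xray[top:top + side, left:left + side]       # remove outskirt pixels
    return cv2.resize(central, (2048, 2048))               # 2048 x 2048
```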

3.2. Data Preprocessing

Data preprocessing (see Fig. 2) is then applied, since both the CCT and CXR images contain redundant/unrelated spatial information and their sizes are still too large. First, all CCT and CXR images are converted to grayscale. Second, histogram stretching is carried out to enhance the image contrast, where (vmin, vmax) stand for the minimum and maximum grayscale values of our images. Third, the margins in four directions are cropped (e.g., the text on the right side and the check-up bed at the bottom of the CCT images, the neck region at the top of the CXR images, and the background regions in all four directions). Finally, the CCT images are resized to $H_{CCT} \times W_{CCT} \times C_{CCT}$ and the CXR images are resized to $H_{CXR} \times W_{CXR} \times 1$.
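The following hedged sketch illustrates this preprocessing chain for a CXR image (the margin value is a placeholder, not the authors' setting; CCT volumes would be processed slice by slice in the same way):

```python
# Illustrative preprocessing sketch: grayscale, histogram stretch, crop, resize.
import cv2
import numpy as np

def preprocess_cxr(img: np.ndarray, margin: int = 100,
                   out_size: tuple = (256, 256)) -> np.ndarray:
    if img.ndim == 3:                                   # step 1: grayscale
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    v_min, v_max = img.min(), img.max()                 # step 2: histogram stretch
    img = (img.astype(np.float32) - v_min) / max(v_max - v_min, 1) * 255.0
    img = img[margin:-margin, margin:-margin]           # step 3: crop four margins
    return cv2.resize(img, out_size)                    # step 4: H_CXR x W_CXR
```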

Fig. 2. Flowchart of preprocessing.

Fig. 3 gives examples of the preprocessed images of a COVID-19 patient. Fig. 3(a) displays one slice out of the 16 CCT slices, and Fig. 3(b) displays the CXR image.

Fig. 3. Pre-processed images of one COVID-19 patient.

4. Methodology

4.1. Convolutional block attention module

Table 1 gives the abbreviation list. Deep learning (DL) has achieved many successes in prediction/classification tasks. Among all DL structures, the convolutional neural network (CNN) [18,19] is particularly suitable for analyzing 2D/3D images. To boost CNN performance, researchers have proposed modifying CNN structures in terms of depth, cardinality, or width. Recently, scholars have studied attention mechanisms and attempted to integrate attention into DL structures. For example, Hu, Shen (2020) [20] proposed the squeeze-and-excitation (SE) network, and Woo, Park (2018) [21] presented the convolutional block attention module (CBAM), which improves the traditional convolutional block (CB) by integrating an attention mechanism. In this study we choose CBAM because, compared to SE, it provides both spatial attention and channel attention.

Table 1.

Abbreviation list.

Abbreviation Meaning
AM activation map
AI artificial intelligence
AP average pooling
BN batch normalization
CAM channel attention module
CCT chest computed tomography
CXR chest X-ray
CB convolutional block
CBAM convolutional block attention module
CNN convolutional neural network
DA data augmentation
DL deep learning
FMI Fowlkes–Mallows index
MCC Matthews correlation coefficient
MP max pooling
MSD mean and standard deviation
ReLU rectified linear unit
SAPN salt-and-pepper noise
SAM spatial attention module
SN speckle noise
SE squeeze-and-excitation

Taking a 2D-image input as an example, Fig. 4(a) displays the structure of a traditional CB. The output of the previous block is sent to m repetitions of a convolution layer, a batch normalization (BN) layer, and a rectified linear unit (ReLU) layer, and these m repetitions are followed by a pooling layer. The output is named the activation map (AM), symbolized as $G \in \mathbb{R}^{C \times H \times W}$, where (C, H, W) stand for the channel, height, and width sizes, respectively.

Fig. 4. Structural comparison.

In contrast to Fig. 4(a), Fig. 4(b) shows the structure of the CBAM, in which two modules, a channel attention module (CAM) and a spatial attention module (SAM), are added to refine the activation map G. The CBAM applies a 1D CAM $N_{CAM} \in \mathbb{R}^{C \times 1 \times 1}$ and a 2D SAM $N_{SAM} \in \mathbb{R}^{1 \times H \times W}$ in sequence to the input G. Hence, the channel-refined activation map is obtained as

$H = N_{CAM}(G) \otimes G$ (1)

and the final refined AM as

$I = N_{SAM}(H) \otimes H$ (2)

where $\otimes$ denotes element-wise multiplication. I is the refined AM, which replaces the output G of the traditional CB and is sent to the next block.

Note that if the two operands do not share the same dimensions, the attention values are broadcast: (i) the spatial attention values are copied along the channel dimension, and (ii) the channel attention values are copied along the spatial dimension.
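The following tiny PyTorch snippet (an illustration, not part of the original paper; tensor names are hypothetical) shows how this broadcasting behaves when a C×1×1 channel map and a 1×H×W spatial map multiply a C×H×W activation map:

```python
# Broadcasting illustration for Eqs. (1)-(2): attention maps are expanded
# automatically to match the activation map during element-wise multiplication.
import torch

C, H, W = 16, 8, 8
G = torch.randn(C, H, W)
n_cam = torch.rand(C, 1, 1)    # channel attention, one weight per channel
n_sam = torch.rand(1, H, W)    # spatial attention, one weight per pixel
refined = n_sam * (n_cam * G)  # both maps broadcast to C x H x W
assert refined.shape == (C, H, W)
```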

4.2. Channel Attention Module

The CAM is defined first. Both max pooling (MP) $z_{mp}$ and average pooling (AP) $z_{ap}$ are employed, producing two features $J_{ap}$ and $J_{mp}$, as shown in Fig. 5(a):

$J_{ap} = z_{ap}(G), \quad J_{mp} = z_{mp}(G)$ (3)

Fig. 5. Flowchart of two modules.

Both $J_{ap}$ and $J_{mp}$ are then sent to a shared multi-layer perceptron (MLP) to produce the output AMs, which are merged via element-wise summation $\oplus$. The merged sum $z_{mlp}[J_{ap}] \oplus z_{mlp}[J_{mp}]$ is finally forwarded to $\sigma$. That is,

$N_{CAM}(G) = \sigma\{z_{mlp}[J_{ap}] \oplus z_{mlp}[J_{mp}]\}$ (4)

where $\sigma$ is the sigmoid function.

To decrease the parameter space, the hidden size of the MLP is fixed to $\mathbb{R}^{C/r \times 1 \times 1}$, where r stands for the reduction ratio. Assuming $X_0 \in \mathbb{R}^{C/r \times C}$ and $X_1 \in \mathbb{R}^{C \times C/r}$ denote the MLP weights (see Fig. 5(a)), Eq. (4) can be rewritten as

$N_{CAM}(G) = \sigma\{X_1[X_0(J_{ap})] \oplus X_1[X_0(J_{mp})]\}$ (5)

Note that $X_0$ and $X_1$ are shared by both $J_{ap}$ and $J_{mp}$. Fig. 5(a) displays the diagram of the CAM.
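As a minimal sketch (assuming a 4D PyTorch input of shape (batch, C, H, W) and a reduction ratio r that the paper does not specify), the CAM of Eq. (5) could be written as follows; this is an illustration in the spirit of [21], not the authors' exact code:

```python
# Hypothetical channel attention module, Eq. (5).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, r: int = 8):
        super().__init__()
        # shared MLP: X0 (C -> C/r) and X1 (C/r -> C)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
        )

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = g.shape
        j_ap = g.mean(dim=(2, 3))                  # average pooling over H x W
        j_mp = g.amax(dim=(2, 3))                  # max pooling over H x W
        n_cam = torch.sigmoid(self.mlp(j_ap) + self.mlp(j_mp))
        return n_cam.view(b, c, 1, 1)              # N_CAM(G)
```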

4.3. Spatial Attention Module

Next, the SAM is defined in Fig. 5(b). The spatial attention module $N_{SAM}$ is a complementary procedure to the previous CAM $N_{CAM}$. Average pooling $z_{ap}$ and max pooling $z_{mp}$ are applied again, this time to the channel-refined activation map H:

$K_{ap} = z_{ap}(H), \quad K_{mp} = z_{mp}(H)$ (6)

Both $K_{ap}$ and $K_{mp}$ are two-dimensional AMs: $K_{ap} \in \mathbb{R}^{1 \times H \times W}$, $K_{mp} \in \mathbb{R}^{1 \times H \times W}$. They are concatenated along the channel dimension via the concatenation function $z_{con}$ as

$K = z_{con}(K_{ap}, K_{mp})$ (7)

Afterwards, the concatenated AM is passed through a standard convolution $z_{conv}$ with a kernel size of 7×7, followed by the sigmoid function $\sigma$. Overall, we attain

$N_{SAM}(H) = \sigma\{z_{conv}[K]\}$ (8)

$N_{SAM}(H)$ is then element-wise multiplied by H to obtain the final refined AM I, as in Eq. (2). The diagram of the SAM is portrayed in Fig. 5(b).
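A companion sketch for the SAM of Eq. (8) and the full CBAM refinement of Eqs. (1)-(2) is given below; it reuses the hypothetical ChannelAttention class from the previous sketch, shows only the 2D case, and is again an illustration rather than the authors' implementation (the 3D CCT branch would use Conv3d analogously):

```python
# Hypothetical spatial attention module, Eq. (8), and CBAM wrapper, Eqs. (1)-(2).
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        k_ap = h.mean(dim=1, keepdim=True)          # average pooling over channels
        k_mp = h.amax(dim=1, keepdim=True)          # max pooling over channels
        k = torch.cat([k_ap, k_mp], dim=1)          # Eq. (7)
        return torch.sigmoid(self.conv(k))          # N_SAM(H)

class CBAM(nn.Module):
    def __init__(self, channels: int, r: int = 8):
        super().__init__()
        self.cam = ChannelAttention(channels, r)    # from the previous sketch
        self.sam = SpatialAttention()

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        h = self.cam(g) * g                         # Eq. (1)
        return self.sam(h) * h                      # Eq. (2)
```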

4.4. Single Input and Multiple Input Deep Convolutional Attention Networks

In this study, we propose a novel multiple-input deep convolutional attention network (MIDCAN) based on the ideas of CBAM and multiple inputs. The structure of the proposed MIDCAN is determined by trial and error. The variable m varies per block, and we found the best values lie in the range [1,3]. We tested values larger than 3, which increase the computational burden without improving performance.

The structure of the proposed model is shown in Fig. 6(a), which is composed of two inputs. The left input, "Input-CCT", receives the CCT images, and the right input, "Input-CXR", receives the CXR images. Let $N_C^{CCT}$ and $N_C^{CXR}$ stand for the number of CBAM blocks in each branch; we set $N_C^{CCT} = N_C^{CXR} = 5$ in this study by trial and error.

Fig. 6. Variables and sizes of AMs of the three proposed models.

For the left branch, the CCT input A2 goes through $N_C^{CCT}$ 3D-CBAMs and generates the output AM A7, which is then flattened into A8. Similarly, the CXR input of the right branch, B2, goes through $N_C^{CXR}$ 2D-CBAMs and generates the output AM B7, which is flattened into B8. The deep CCT features and deep CXR features are then concatenated via the concatenation function $z_{con}$ as

MIDCAN: $F_1 = z_{con}(A_8, B_8)$ (9)

Note that in our experiments we perform ablation studies with two single-input deep convolutional attention network (SIDCAN) models, which retain only the CCT branch and only the CXR branch, respectively. The first SIDCAN model, shown in Fig. 6(b), uses only the CCT features, i.e.,

SIDCAN-CCT: $F_1 = A_8$ (10)

This model is given the short name SIDCAN-CCT.

The second SIDCAN model uses only the CXR features, i.e.,

SIDCAN-CXR: $F_1 = B_8$ (11)

This model is named SIDCAN-CXR, and its flowchart is displayed in Fig. 6(c). These two models are used as comparison methods in our experiments.

The feature F1 is then passed to two fully-connected layers (FCLs) [22]. The first FCL contains 500 neurons, and the last FCL contains NC neurons, where NC stands for the number of classes; in this study NC = 2. Finally, a softmax layer [23] turns F3 into probabilities. The loss function of MIDCAN is the cross-entropy function [24].

Table 2 gives the details of the proposed MIDCAN. For the kernel parameter in Table 2, "[3 × 3 × 3, 16]x3, [/2/2/2]" stands for 3 repetitions of 16 filters, each of size 3×3×3, followed by pooling with factors of 2, 2, and 2 along the three dimensions, respectively. In the FCL stage, the kernel parameter gives the sizes of the weight matrix and the bias vector, respectively.

Table 2.

Details of proposed MIDCAN model.

Name Kernel Parameter Variable and size
Input-CCT size(A2)=256×256×16
3D-CBAM-1 [3 × 3 × 3, 16]x3, [/2/2/2] size(A3)=128×128×8×16
3D-CBAM-2 [3 × 3 × 3, 32]x2, [/2/2/1] size(A4)=64×64×8×32
3D-CBAM-3 [3 × 3 × 3, 32]x2, [/2/2/2] size(A5)=32×32×4×32
3D-CBAM-4 [3 × 3 × 3, 64]x2, [/2/2/1] size(A6)=16×16×4×64
3D-CBAM-5 [3 × 3 × 3, 64]x2, [/2/2/2] size(A7)=8×8×2×64
Flatten size(A8)=8192
Input-CXR size(B2)=256×256
CBAM-1 [3 × 3, 16]x3, [/2/2] size(B3)=128×128×16
CBAM-2 [3 × 3, 32]x2, [/2/2] size(B4)=64×64×32
CBAM-3 [3 × 3, 64]x2, [/2/2] size(B5)=32×32×64
CBAM-4 [3 × 3, 64]x2, [/2/2] size(B6)=16×16×64
CBAM-5 [3 × 3, 128]x2, [/2/2] size(B7)=8×8×128
Flatten size(B8)=8192
Concatenate size(F1)=16,384
FCL-1 500 × 16384, 500 × 1 size(F2)=500
FCL-2 2 × 500, 2 × 1 size(F3)=2
Softmax
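For illustration only (not the authors' released code), a minimal PyTorch sketch of the fusion and classification head summarized in Table 2 and Eq. (9) might look as follows; the two CBAM backbones are abstracted away, and each branch is assumed to already deliver an 8192-dimensional flattened feature vector:

```python
# Hypothetical MIDCAN fusion and classification head (Eq. (9), FCL-1, FCL-2).
import torch
import torch.nn as nn

class MIDCANHead(nn.Module):
    def __init__(self, feat_cct: int = 8192, feat_cxr: int = 8192,
                 n_classes: int = 2):
        super().__init__()
        self.fcl1 = nn.Linear(feat_cct + feat_cxr, 500)   # FCL-1
        self.fcl2 = nn.Linear(500, n_classes)             # FCL-2

    def forward(self, a8: torch.Tensor, b8: torch.Tensor) -> torch.Tensor:
        f1 = torch.cat([a8, b8], dim=1)     # Eq. (9): concatenate deep features
        f2 = torch.relu(self.fcl1(f1))
        return self.fcl2(f2)                # logits; softmax and cross entropy
                                            # are applied by the loss function

# usage sketch with random features for a batch of 4 subjects
head = MIDCANHead()
logits = head(torch.randn(4, 8192), torch.randn(4, 8192))
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 0, 1]))
```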

4.5. 18-way data augmentation

Data augmentation (DA) [25] is an important tool applied to the training set to prevent classifiers from overfitting when they are applied to the test set. Meanwhile, DA can mitigate the small-dataset problem. Recently, Wang (2021) [26] proposed a 14-way DA, which applied seven different DA techniques to the preprocessed training image v(k) and to its horizontally mirrored image vH(k), respectively. Cheng (2021) [27] presented a 16-way DA and used the PatchShuffle technique to avoid overfitting.

This study enhances the 14-way DA method [26] to an 18-way DA by adding two new DA methods, salt-and-pepper noise (SAPN) and speckle noise (SN), applied to both v(k) and vH(k). Taking v(k) as an example, the SAPN-altered image is symbolized as vSAPN(k), with its values set as

$R(v_{SAPN} = v) = 1 - \gamma_{dsa}, \quad R(v_{SAPN} = v_{min}) = \gamma_{dsa}/2, \quad R(v_{SAPN} = v_{max}) = \gamma_{dsa}/2$ (12)

where $\gamma_{dsa}$ stands for the noise density and R for the probability function; $v_{min}$ and $v_{max}$ correspond to black and white, respectively.

On the other hand, the SN-altered image is defined as

$v_{SN}(k) = v(k) + U \times v(k)$ (13)

where U is uniformly distributed random noise whose mean and variance are symbolized as $U_{msn}$ and $U_{vsn}$, respectively. Taking Fig. 3(b) as the example, Fig. 7(a-b) displays the SAPN- and SN-altered images, respectively. Due to the page limit, the results of the other DA methods are not shown in this paper.
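A hedged NumPy sketch of these two augmentations, Eqs. (12)-(13), is given below; the function names are illustrative and image values are assumed to lie in [0, 255]:

```python
# Illustrative salt-and-pepper (Eq. 12) and speckle (Eq. 13) noise injection.
import numpy as np

def salt_and_pepper(v: np.ndarray, gamma_dsa: float = 0.05,
                    v_min: float = 0.0, v_max: float = 255.0) -> np.ndarray:
    out = v.copy()
    mask = np.random.rand(*v.shape)
    out[mask < gamma_dsa / 2] = v_min                          # pepper (black)
    out[(mask >= gamma_dsa / 2) & (mask < gamma_dsa)] = v_max  # salt (white)
    return out

def speckle(v: np.ndarray, mean: float = 0.0, var: float = 0.05) -> np.ndarray:
    half_width = np.sqrt(3.0 * var)        # uniform noise with the given variance
    u = np.random.uniform(mean - half_width, mean + half_width, v.shape)
    return v + u * v                       # Eq. (13)
```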

Fig. 7. Examples of the newly proposed DA methods.

Let $M_a$ stand for the number of DA techniques applied to the preprocessed image v(k), and $M_b$ for the number of newly generated images per DA technique. The proposed (2×$M_a$)-way DA algorithm consists of the four steps depicted below.

First, $M_a$ geometric/photometric/noise-injection DA transforms are applied to the preprocessed training image v(k). We use $w_{DA}^{(m)}, m = 1, \ldots, M_a$ to denote each DA operation, and each DA operation $w_{DA}^{(m)}$ yields $M_b$ new images. Thus, for a given image v(k), we obtain $M_a$ different datasets $w_{DA}^{(m)}[v(k)], m = 1, \ldots, M_a$, each containing $M_b$ new images.

Second, the horizontally mirrored image is generated as

$v_H(k) = z_M[v(k)]$ (14)

where $z_M$ denotes the horizontal mirror function.

Third, all $M_a$ DA methods are carried out on the mirrored image vH(k), generating $M_a$ different datasets $w_{DA}^{(m)}[v_H(k)], m = 1, \ldots, M_a$.

Fourth, the raw image v(k), the horizontally mirrored image vH(k), all $M_a$-way DA results of the preprocessed image $w_{DA}^{(m)}[v(k)]$, and all $M_a$-way DA results of the horizontally mirrored image $w_{DA}^{(m)}[v_H(k)]$ are fused together using the concatenation function $z_{con}$ defined in Eq. (9).

The final combined dataset Λ is defined as

$v(k) \to \Lambda = z_{con}\{v(k),\ v_H(k),\ \underbrace{w_{DA}^{(1)}[v(k)]}_{M_b},\ \underbrace{w_{DA}^{(1)}[v_H(k)]}_{M_b},\ \ldots,\ \underbrace{w_{DA}^{(M_a)}[v(k)]}_{M_b},\ \underbrace{w_{DA}^{(M_a)}[v_H(k)]}_{M_b}\}$ (15)

Therefore, one image v(k) will generate

$|\Lambda| = 2 \times M_a \times M_b + 2$ (16)

images (including the original image v(k)). Note that in our dataset different $M_b$ values are assigned to the CCT and CXR training images, since CCT images are 3D and CXR images are 2D. That means that for each DA technique we generate $M_b^{CCT}$ new images per CCT image and $M_b^{CXR}$ new images per CXR image.
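With the Table 3 settings, Eq. (16) gives 2×9×30 + 2 = 542 images per raw CXR image and 2×9×90 + 2 = 1622 images per raw CCT image. Purely as a schematic illustration (the DA operations are passed in as abstract callables, and the function names are placeholders), the (2×Ma)-way pipeline can be sketched as:

```python
# Schematic (2 x Ma)-way DA pipeline around Eqs. (14)-(16).
import numpy as np
from typing import Callable, List

def multiway_da(v: np.ndarray,
                da_ops: List[Callable[[np.ndarray], np.ndarray]],
                m_b: int) -> List[np.ndarray]:
    v_h = np.fliplr(v)                          # Eq. (14): horizontal mirror
    dataset = [v, v_h]
    for op in da_ops:                           # Ma operations ...
        for image in (v, v_h):                  # ... on both raw and mirrored image
            dataset.extend(op(image) for _ in range(m_b))  # Mb samples each
    return dataset                              # |dataset| = 2*Ma*Mb + 2, Eq. (16)
```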

4.6. Implementation and evaluation

K-fold cross validation is employed on both datasets. Suppose the confusion matrix J over the r-th run ($1 \le r \le R$) and k-th fold ($1 \le k \le K$) is defined as

$J(r,k) = \begin{bmatrix} j_{11}(r,k) & j_{12}(r,k) \\ j_{21}(r,k) & j_{22}(r,k) \end{bmatrix}$ (17)

where (j11, j12, j21, j22) stand for TP, FN, FP, and TN, respectively; P stands for the positive class, i.e., COVID-19, and N for the negative class, i.e., healthy control. k represents the index of the trial/fold, and r the index of the run. At the k-th trial, the k-th fold is used for testing and all remaining folds are used for training.

Note that J(r,k) is calculated on each test fold, and the results are then summarized across all K trials, as shown in Fig. 8. Afterwards, we obtain the confusion matrix of the r-th run, J(r), as

$J(r) = \sum_{k=1}^{K} J(r,k)$ (18)

Fig. 8. Diagram of one run of K-fold cross validation.

Seven indicators η(r) are computed based on the confusion matrix of the r-th run, J(r):

$J(r) \to \eta(r) = [\eta_1(r), \eta_2(r), \ldots, \eta_7(r)]$ (19)

where the first four indicators are η1 sensitivity, η2 specificity, η3 precision, and η4 accuracy; these four indicators are commonly used and their definitions are standard. η5 is the F1 score:

$\eta_5(r) = \frac{2 \times j_{11}(r)}{2 \times j_{11}(r) + j_{12}(r) + j_{21}(r)}$ (20)

η6 is the Matthews correlation coefficient (MCC):

$\eta_6(r) = \frac{j_{11}(r) \times j_{22}(r) - j_{21}(r) \times j_{12}(r)}{\sqrt{[j_{11}(r)+j_{21}(r)] \times [j_{11}(r)+j_{12}(r)] \times [j_{22}(r)+j_{21}(r)] \times [j_{22}(r)+j_{12}(r)]}}$ (21)

and η7 is the Fowlkes–Mallows index (FMI).

$\eta_7(r) = \sqrt{\frac{j_{11}(r)}{j_{11}(r)+j_{12}(r)} \times \frac{j_{11}(r)}{j_{11}(r)+j_{21}(r)}}$ (22)

Two indicators, η4 and η6, use all four basic measures (j11, j12, j21, j22). Considering that the range of η4 is $0 \le \eta_4 \le 1$ while the range of η6 is $-1 \le \eta_6 \le +1$, we choose η6 as the most important indicator. Besides, Chicco, Totsch (2021) [28] stated that MCC is more reliable than many other indicators.

The above procedure is one run of K-fold cross validation; we repeat it for R runs. The mean and standard deviation (MSD) of all seven indicators $\eta_m (m = 1, \ldots, 7)$ are calculated over the R runs:

$\mu(\eta_m) = \frac{1}{R} \sum_{r=1}^{R} \eta_m(r), \quad \sigma(\eta_m) = \sqrt{\frac{1}{R-1} \sum_{r=1}^{R} |\eta_m(r) - \mu(\eta_m)|^2}$ (23)

where μ stands for the mean value, and σ stands for the standard deviation. The MSDs are reported in the format of η=μ(η)±σ(η).
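The sketch below (illustrative only; the example confusion matrices are made up) computes the seven indicators of Eqs. (19)-(22) from a 2×2 confusion matrix laid out as [[TP, FN], [FP, TN]] and then summarizes them over runs as in Eq. (23):

```python
# Illustrative computation of the seven indicators and their MSD summary.
import numpy as np

def indicators(j: np.ndarray) -> np.ndarray:
    tp, fn, fp, tn = j[0, 0], j[0, 1], j[1, 0], j[1, 1]
    sen = tp / (tp + fn)                               # eta1 sensitivity
    spc = tn / (tn + fp)                               # eta2 specificity
    prc = tp / (tp + fp)                               # eta3 precision
    acc = (tp + tn) / (tp + fn + fp + tn)              # eta4 accuracy
    f1 = 2 * tp / (2 * tp + fn + fp)                   # eta5 F1, Eq. (20)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) # eta6 MCC, Eq. (21)
    fmi = np.sqrt(prc * sen)                           # eta7 FMI, Eq. (22)
    return np.array([sen, spc, prc, acc, f1, mcc, fmi])

# two made-up runs, summarized as mean and standard deviation, Eq. (23)
runs = np.array([indicators(np.array([[41, 1], [0, 44]])),
                 indicators(np.array([[41, 1], [1, 43]]))])
msd = runs.mean(axis=0), runs.std(axis=0, ddof=1)
```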

5. Experiments, results, and discussions

5.1. Parameter setting

Table 3 itemizes the parameter settings. The minimum and maximum values of our images are set to 0 and 255, respectively. The sizes of the preprocessed CCT and CXR images are set to 256×256×16 and 256×256, respectively. The number of CBAM blocks is set to 5 for both the CCT and CXR branches. The noise density of SAPN is set to 0.05, and the mean and variance of the uniformly distributed noise in SN are set to 0 and 0.05, respectively. Nine different DA methods are used, giving an 18-way DA when both the raw training image and its horizontally mirrored image are considered. For each DA method, 30 new images are generated per CXR image and 90 new images per CCT image. The number of folds is set to K = 10, and we run our model for R = 10 runs.

Table 3.

Parameter setting.

Parameter Value
(vmin, vmax) (0, 255)
$H_{CCT} \times W_{CCT} \times C_{CCT}$ 256×256×16
$H_{CXR} \times W_{CXR}$ 256×256
$N_C^{CCT}$ 5
$N_C^{CXR}$ 5
$\gamma_{dsa}$ 0.05
$U_{msn}$ 0
$U_{vsn}$ 0.05
$M_a$ 9
$M_b^{CXR}$ 30
$M_b^{CCT}$ 90
K 10
R 10

5.2. Statistics of proposed MIDCAN

We use both modalities, CCT and CXR, in this experiment; the structure of our model is shown in Fig. 6(a). The statistical results of the proposed MIDCAN are shown in Table 4. The sensitivity, specificity, precision, and accuracy are 98.10±1.88%, 97.95±2.26%, 97.92±2.24%, and 98.02±1.35%, respectively. Moreover, the F1 score is 97.98±1.37%, the MCC is 96.09±2.66%, and the FMI is 97.99±1.36%.

Table 4.

Statistical results of proposed MIDCAN model.

Run η1 η2 η3 η4 η5 η6 η7
1 97.62 100.00 100.00 98.84 98.80 97.70 98.80
2 97.62 97.73 97.62 97.67 97.62 95.35 97.62
3 100.00 100.00 100.00 100.00 100.00 100.00 100.00
4 100.00 97.73 97.67 98.84 98.82 97.70 98.83
5 95.24 97.73 97.56 96.51 96.39 93.04 96.39
6 100.00 93.18 93.33 96.51 96.55 93.26 96.61
7 100.00 100.00 100.00 100.00 100.00 100.00 100.00
8 97.62 95.45 95.35 96.51 96.47 93.05 96.48
9 95.24 100.00 100.00 97.67 97.56 95.44 97.59
10 97.62 97.73 97.62 97.67 97.62 95.35 97.62
MSD 98.10±1.88 97.95±2.26 97.92±2.24 98.02±1.35 97.98±1.37 96.09±2.66 97.99±1.36

5.3. Effect of multimodality and attention mechanism

We compare the multiple-modality model against single-modality models. Two models, viz., SIDCAN-CCT and SIDCAN-CXR, shown in Fig. 6(b-c), are used. Meanwhile, models with and without attention are compared.

The comparison results are shown in Table 5, where NA means no attention. Fig. 9 presents the error-bar comparison of all six settings. Comparing the settings with and without attention, we observe that the attention mechanism does help improve the classification performance, which is consistent with the conclusion of Ref. [21].

Table 5.

Comparison of different settings.

Method η1 η2 η3 η4 η5 η6 η7
MIDCAN 98.10±1.88 97.95±2.26 97.92±2.24 98.02±1.35 97.98±1.37 96.09±2.66 97.99±1.36
SIDCAN-CCT 96.19±1.23 95.91±1.44 95.76±1.40 96.05±0.60 95.96±0.60 92.11±1.20 95.97±0.60
SIDCAN-CXR 93.81±3.01 93.86±2.64 93.70±2.48 93.84±0.78 93.69±0.85 87.78±1.56 93.72±0.84
MIDCAN (NA) 94.29±2.30 94.32±1.61 94.10±1.45 94.30±0.86 94.17±0.94 88.65±1.69 94.18±0.93
SIDCAN-CCT (NA) 92.86±1.59 93.64±2.09 93.36±2.02 93.26±0.74 93.08±0.71 86.55±1.49 93.10±0.72
SIDCAN-CXR (NA) 89.52±2.30 90.68±2.92 90.29±2.63 90.12±0.82 89.85±0.76 80.31±1.64 89.88±0.76

Fig. 9. Error-bar comparison of six different settings.

Meanwhile, comparing MIDCAN with the two SIDCAN models, we can conclude that the multimodal model performs better than either single modality (CCT or CXR).

Fig. 10 displays the ROC curves of the six settings, where the blue patch spans the lower and upper bounds. For the three models using attention, the AUCs are 0.9855, 0.9695, and 0.9567 for MIDCAN, SIDCAN-CCT, and SIDCAN-CXR, respectively. After removing the CBAM module, the bottom part of Fig. 10 shows that the corresponding AUCs decrease to 0.9512, 0.9361, and 0.9262, respectively. These results again show that multimodality gives better performance than a single modality.
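For reference, ROC curves and AUC values like those in Fig. 10 can be computed from the predicted COVID-19 probabilities, for example with scikit-learn (an assumption; the authors do not state their tooling, and the labels/scores below are made up):

```python
# Illustrative ROC/AUC computation from predicted probabilities.
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([1, 1, 0, 0, 1, 0])               # 1 = COVID-19, 0 = healthy
y_score = np.array([0.9, 0.8, 0.3, 0.1, 0.6, 0.4])  # model output probabilities
fpr, tpr, _ = roc_curve(y_true, y_score)
print("AUC =", auc(fpr, tpr))
```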

Fig. 10. ROC curves of six settings.

5.4. Explainability of proposed model

Fig. 11 presents the manual delineation and heatmap results for the images in Fig. 3. The heatmaps are generated via the Grad-CAM method [29].

Fig. 11. Manual delineation and heatmap results of one patient.

From Fig. 11, we can observe that the proposed MIDCAN model accurately captures the lesions in both the CCT and the CXR images. This explainability via Grad-CAM can help doctors, radiologists, and patients better understand how our AI model works.
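A generic Grad-CAM sketch in the spirit of [29] is shown below for the 2D case; it uses PyTorch hooks on a chosen convolutional layer and is an illustration, not the authors' exact implementation:

```python
# Hypothetical minimal Grad-CAM for a 2D CNN (generic, not the authors' code).
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx):
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(a=go[0]))
    logits = model(x)                        # forward pass, x: (1, C, H, W)
    model.zero_grad()
    logits[0, class_idx].backward()          # gradient of the target class score
    h1.remove(); h2.remove()
    w = grads["a"].mean(dim=(2, 3), keepdim=True)     # channel weights
    cam = F.relu((w * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear",
                        align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap in [0, 1]
```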

5.5. Comparison to State-of-the-art approaches

We compare the proposed MIDCAN with 8 state-of-the-art methods: GLCM [7], WE-BBO [8], WRE [9], FSVC [10], OTLS [11], MRA [13], GG [14], and SMO [15]. These methods were originally developed for a single modality, either CCT or CXR, so we test each of them on the corresponding single-modality dataset.

All methods were evaluated via 10 runs of 10-fold cross validation. The MSD results η of all approaches are pictured in Fig. 12, which sorts the methods by η6, and are itemized in Table 6.

Fig. 12. 3D bar plot of approach comparison.

Table 6.

Comparison with SOTA approaches (Unit: %).

Approach η1 η2 η3 η4 η5 η6 η7
GLCM [7] 71.90±4.02 78.18±3.89 76.04±2.41 75.12±0.98 73.80±1.49 50.35±1.91 73.89±1.39
WE-BBO [8] 74.05±4.82 74.77±3.93 73.83±1.84 74.42±0.78 73.81±1.65 48.98±1.65 73.88±1.67
WRE [9] 86.43±3.18 86.36±3.86 86.01±3.13 86.40±0.56 86.12±0.39 72.95±1.15 86.17±0.40
FSVC [10] 91.90±2.56 90.00±2.44 89.85±1.99 90.93±0.49 90.82±0.55 81.97±0.99 90.85±0.56
OTLS [11] 95.95±2.26 96.59±1.61 96.45±1.56 96.28±1.07 96.17±1.13 92.60±2.09 96.19±1.12
MRA [13] 86.43±3.90 90.45±2.79 89.71±2.63 88.49±2.08 87.98±2.27 77.09±4.17 88.02±2.26
GG [14] 93.33±2.70 90.00±4.44 90.13±3.81 91.63±1.53 91.61±1.35 83.49±2.84 91.67±1.30
SMO [15] 93.10±2.37 95.23±2.50 94.99±2.45 94.19±1.10 93.99±1.13 88.45±2.16 94.02±1.11
MIDCAN (Ours) 98.10±1.88 97.95±2.26 97.92±2.24 98.02±1.35 97.98±1.37 96.09±2.66 97.99±1.36

From Table 6, we observe that the proposed MIDCAN outperforms all 8 baseline methods on all indicators.

The reasons why our MIDCAN method performs best lie in the following three facts: (i) we use multiple modalities instead of the traditional single modality; (ii) CBAM is used in our network, whose attention mechanism helps our AI model focus on the lesion regions; and (iii) multiple-way data augmentation is employed to overcome overfitting.

6. Conclusion

This paper proposed a novel multiple-input deep convolutional attention network (MIDCAN) for the diagnosis of COVID-19. The results show that our method achieves a sensitivity of 98.10±1.88%, a specificity of 97.95±2.26%, and an accuracy of 98.02±1.35%.

In future research, we shall pursue several directions: (i) expanding our dataset; (ii) incorporating other advanced network strategies, such as graph neural networks; and (iii) collecting IoT signals from subjects.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This paper is partially supported by Medical Research Council Confidence in Concept Award, UK (MC_PC_17171); Hope Foundation for Cancer Research, UK (RM60G0680); British Heart Foundation Accelerator Award, UK; Sino-UK Industrial Fund, UK (RP202G0289); Global Challenges Research Fund (GCRF), UK (P202PF11); Royal Society International Exchanges Cost Share Award, UK (RP202G0230).

Edited by: Maria De Marsico

References

  • 1. Turgutalp K., et al. Determinants of mortality in a large group of hemodialysis patients hospitalized for COVID-19. BMC Nephrol. 2021;22(1), Article 29. doi: 10.1186/s12882-021-02233-0.
  • 2. Hall A.J. The United Kingdom joint committee on vaccination and immunisation. Vaccine. 2010;28:A54–A57. doi: 10.1016/j.vaccine.2010.02.034.
  • 3. Sakanashi D., et al. Comparative evaluation of nasopharyngeal swab and saliva specimens for the molecular detection of SARS-CoV-2 RNA in Japanese patients with COVID-19. J. Infect. Chemother. 2021;27(1):126–129. doi: 10.1016/j.jiac.2020.09.027.
  • 4. Giannitto C., et al. Chest CT in patients with a moderate or high pretest probability of COVID-19 and negative swab. Radiol. Med. (Torino) 2020;125(12):1260–1270. doi: 10.1007/s11547-020-01269-w.
  • 5. Draelos R.L., et al. Machine-learning-based multiple abnormality prediction with large-scale chest computed tomography volumes. Med. Image Anal. 2021;67, Article 101857. doi: 10.1016/j.media.2020.101857.
  • 6. Braga A., et al. When less is more: regarding the use of chest X-ray instead of computed tomography in screening for pulmonary metastasis in postmolar gestational trophoblastic neoplasia. Br. J. Cancer. 2021. doi: 10.1038/s41416-020-01209-5.
  • 7. Chen Y. Covid-19 classification based on gray-level co-occurrence matrix and support vector machine. In: Santosh K.C., Joshi A., editors. COVID-19: Prediction, Decision-Making, and its Impacts. Springer Singapore; Singapore: 2020. pp. 47–55.
  • 8. Yao X. COVID-19 detection via wavelet entropy and biogeography-based optimization. In: Santosh K.C., Joshi A., editors. COVID-19: Prediction, Decision-Making, and its Impacts. Springer; 2020. pp. 69–76.
  • 9. Wu X. Diagnosis of COVID-19 by wavelet Renyi entropy and three-segment biogeography-based optimization. Int. J. Comput. Intell. Syst. 2020;13(1):1332–1344.
  • 10. El-kenawy E.S.M., et al. Novel feature selection and voting classifier algorithms for COVID-19 classification in CT images. IEEE Access. 2020;8:179317–179335. doi: 10.1109/ACCESS.2020.3028012.
  • 11. Satapathy S.C. Covid-19 diagnosis via DenseNet and optimization of transfer learning setting. Cognitive Comput. 2021. doi: 10.1007/s12559-020-09776-8.
  • 12. Saood A., et al. COVID-19 lung CT image segmentation using deep learning methods: U-Net versus SegNet. BMC Med. Imaging. 2021;21(1), Article 19. doi: 10.1186/s12880-020-00529-5.
  • 13. Ismael A.M., et al. The investigation of multiresolution approaches for chest X-ray image based COVID-19 detection. Health Inf. Sci. Syst. 2020;8(1), Article 29. doi: 10.1007/s13755-020-00116-6.
  • 14. Loey M., et al. Within the lack of chest COVID-19 X-ray dataset: a novel detection model based on GAN and deep transfer learning. Symmetry-Basel. 2020;12(4), Article 651.
  • 15. Togacar M., et al. COVID-19 detection using deep learning models to exploit Social Mimic Optimization and structured chest X-ray images using fuzzy color and stacking approaches. Comput. Biol. Med. 2020;121, Article 103805. doi: 10.1016/j.compbiomed.2020.103805.
  • 16. Das A.K., et al. Automatic COVID-19 detection from X-ray images using ensemble learning with convolutional neural network. Pattern Anal. Appl. 2021. doi: 10.1007/s10044-021-00970-4.
  • 17. Susukida R., et al. Data management in substance use disorder treatment research: implications from data harmonization of National Institute on Drug Abuse-funded randomized controlled trials. Clin. Trials. 2021. doi: 10.1177/1740774520972687.
  • 18. Kumari K., et al. Multi-modal aggression identification using convolutional neural network and binary particle swarm optimization. Future Generat. Comput. Syst. 2021;118:187–197.
  • 19. Hamer A.M., et al. Replacing human interpretation of agricultural land in Afghanistan with a deep convolutional neural network. Int. J. Remote Sens. 2021;42(8):3017–3038.
  • 20. Hu J., et al. Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020;42(8):2011–2023. doi: 10.1109/TPAMI.2019.2913372.
  • 21. Woo S., et al. CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV); Munich, Germany. Springer; 2018. pp. 3–19.
  • 22. Sindi H., et al. Random fully connected layered 1D CNN for solving the Z-bus loss allocation problem. Measurement. 2021;171, Article 108794.
  • 23. Kumar A., et al. Topic-document inference with the Gumbel-softmax distribution. IEEE Access. 2021;9:1313–1320.
  • 24. Sathya P.D., et al. Color image segmentation using Kapur, Otsu and minimum cross entropy functions based on exchange market algorithm. Expert Syst. Appl. 2021;172, Article 114636.
  • 25. Kim S., et al. Synthesis of brain tumor multicontrast MR images for improved data augmentation. Med. Phys. 2021. doi: 10.1002/mp.14701.
  • 26. Wang S.-H. Covid-19 classification by FGCNet with deep feature fusion from graph convolutional network and convolutional neural network. Inf. Fusion. 2021;67:208–229. doi: 10.1016/j.inffus.2020.10.004.
  • 27. Cheng X. PSSPNN: PatchShuffle stochastic pooling neural network for an explainable diagnosis of COVID-19 with multiple-way data augmentation. Comput. Math. Methods Med. 2021;2021, Article 6633755. doi: 10.1155/2021/6633755. [Retracted]
  • 28. Chicco D., et al. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Mining. 2021;14(1), Article 13. doi: 10.1186/s13040-021-00244-z.
  • 29. Selvaraju R.R., et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vision. 2020;128(2):336–359.
