Comput Biol Med. 2023 Jun 2;163:107113. doi: 10.1016/j.compbiomed.2023.107113

GIONet: Global information optimized network for multi-center COVID-19 diagnosis via COVID-GAN and domain adversarial strategy

Jing Zhang a,1, Yiyao Liu b,1, Baiying Lei b, Dandan Sun a, Siqi Wang a, Changning Zhou a, Xing Ding a, Yang Chen a, Fen Chen a, Tianfu Wang b, Ruidong Huang a, Kuntao Chen a,
PMCID: PMC10242645  PMID: 37307643

Abstract

The outbreak of coronavirus disease (COVID-19) in 2019 has highlighted the need for automatic diagnosis of the disease, which can develop rapidly into a severe condition. Nevertheless, distinguishing between COVID-19 pneumonia and community-acquired pneumonia (CAP) through computed tomography scans can be challenging due to their similar characteristics. Existing methods often perform poorly in the 3-class classification task of healthy, CAP, and COVID-19 pneumonia, and they have poor ability to handle the heterogeneity of multi-center data. To address these challenges, we design a COVID-19 classification model using a global information optimized network (GIONet) and a cross-center domain adversarial learning strategy. Our approach proposes a 3D convolutional neural network with a graph enhanced aggregation unit and a multi-scale self-attention fusion unit to improve the global feature extraction capability. We also verify that domain adversarial training can effectively reduce the feature distance between different centers to address the heterogeneity of multi-center data, and use a specialized generative adversarial network to balance the data distribution and improve diagnostic performance. Our experiments demonstrate satisfactory diagnostic results, with a mixed-dataset accuracy of 99.17% and cross-center task accuracies of 86.73% and 89.61%.

Keywords: COVID-19 diagnosis, Global information optimized network, Domain adversarial strategy, COVID-GAN

1. Introduction

Based on a study [1], the pneumonia caused by the novel coronavirus (COVID-19) is a severe acute illness, and the original strain in early 2020 had a mortality rate of nearly 2.0%. Once a person contracts COVID-19, they usually develop respiratory symptoms within a few days, which can quickly progress to acute respiratory distress syndrome. The rapid progression of the disease also makes it difficult to distinguish from community-acquired pneumonia (CAP). According to the latest report from the World Health Organization [2], COVID-19 has so far resulted in 624 million confirmed cases and 6.5 million deaths globally, imposing significant health risks and economic burdens worldwide. Although the current mainstream strain, Omicron, has a significantly reduced mortality rate, it is extremely contagious and can still cause a huge number of new COVID-19 cases. Early screening of COVID-19 cases is critical for effective prevention and treatment of the disease: detecting COVID-19 patients early and providing prompt treatment can effectively prevent disease progression and significantly reduce mortality rates. Therefore, finding accurate and efficient screening methods for COVID-19 patients is essential. Reverse transcription polymerase chain reaction (RT-PCR) has been widely used for COVID-19 screening; however, it is still limited by its large consumption of medical resources [3].

The typical medical diagnostic methods for pulmonary diseases include X-rays and computed tomography (CT) scans [4], so it can be inferred that these two medical imaging technologies are also feasible auxiliary diagnostic schemes for COVID-19 [5]. Although X-ray imaging can identify bronchitis, bronchiolitis, and shadows in the lung field, its rate of missed diagnosis is high, and there are no significant image features in the early stage of the disease [6]. On the other hand, ground-glass opacities (GGO) in the CT image are a typical manifestation that can confirm COVID-19 [7]. GGO and infiltrative shadows appear in both lungs as the disease develops, and pulmonary consolidation may occur in severe cases [8]. By analyzing the distribution and size of lesions in CT images, a pulmonary inflammation index can be mapped to assess the severity of lung inflammation [9]. However, manual interpretation requires many experienced medical personnel and is time-consuming, which is not feasible during the COVID-19 pandemic when medical resources are severely limited [10]. Therefore, it is crucial to use deep learning methods for the automatic diagnosis of COVID-19 [11].

Currently, many researchers are using deep learning to assist in the study of COVID-19. For example, Ardakani et al. [12] utilized ten typical convolutional neural networks (CNNs) for the joint diagnosis of COVID-19 on transverse CT images. Such methods improve diagnostic performance by cascading multiple neural networks, but training is cumbersome, the number of parameters is huge, and the final result is obtained only by integrating the outputs of multiple models. Similarly, Ko et al. [13] deployed a fast-track COVID-19 classification network for a three-class study of COVID-19 using transfer learning from four advanced networks. In addition to the aforementioned studies, Shah et al. [14] and Wang et al. [15] also conducted methodological research on two-dimensional convolutional neural networks. However, 2D slice images may lose the relevance of axial features, and multiple slices of the same case at different locations may conflict during classification; automating the selection of diagnostically significant slices remains a challenge. It can therefore be seen that research on 3D volume images is more clinically significant. For example, Serte et al. [16] used a 3D CNN to diagnose COVID-19 and achieved a 96.0% AUC score, but only a single backbone network was used and no comparison was made with different networks. In addition, He et al. [17] utilized a differentiable neural architecture search method to automatically find the optimal network structure for COVID-19 diagnosis, which has the advantages of automation, efficiency and flexibility, but suffers from high computational cost, large data requirements, and possibly suboptimal results due to the complexity of the search process.

Because COVID-19 is a global pandemic, multi-center and even cross-national research is urgently needed. However, only limited studies currently focus on multi-center research. One study [18] utilized clinical data and CT images to predict the mortality rate of patients across four centers; however, it only used data from different centers for classification and did not address the feature differences present in multi-center data. Another study [19] introduced a new robust weakly supervised learning approach to address the variation in CT image features among different centers. Given the heterogeneity of CT images from different centers described in the above studies, it is necessary to address this heterogeneity. Therefore, we plan to use a domain adversarial learning strategy.

Due to the scarcity of medical images, it is usually necessary to use data augmentation methods to expand the dataset. Generative models such as the Variational AutoEncoder (VAE) [20] and the generative adversarial network (GAN) [21] are used, and the GAN is one of the most popular methods at present. It generates new data that conforms to the real data distribution through the adversarial training of a discriminator and a generator. Many researchers have continually improved the generation quality of GANs: Gulrajani et al. [22] designed WGAN, which uses the Wasserstein distance instead of the cross-entropy loss, and Mirza et al. [23] fed conditional information and hidden variables into the generator and discriminator to control the type of generated data. Later, Brock et al. [24] designed BigGAN, which injects the conditional information and hidden variables into different stages of the generator to achieve data generation under specific conditions, and improved the generation quality by increasing the batch size. Mariani et al. [25] proposed BAGAN, which uses a VAE to obtain the distribution of data of different categories and feeds this into the generator to produce data of specified categories. However, due to the difficulty of training and the limitation of GPU memory, 3D GAN methods are still relatively rare and are mostly used for the generation of cross-modal brain images. Few GAN models have been used to generate CT images that balance data categories for the diagnosis of COVID-19.

In this study, we embed graph convolution aggregation and multi-scale self-attention fusion into a CNN backbone to build a global information optimized network (GIONet) with enhanced global feature extraction. Experimental results on the mixed-center task show that our enhanced 3D CNN achieves quite promising results for COVID-19 diagnosis, with an accuracy of 99.17%. In addition, accuracies of 86.73% and 89.61% were obtained in the two cross-center tasks. In summary, this work has three main contributions:

•We design a novel coronavirus disease generative adversarial network (COVID-GAN) to generate high-quality 3D CT data for imbalanced learning and COVID-19 diagnosis.

•We adopt a graph enhanced aggregation unit (GEAU) and multi-scale self-attention fusion unit (MSFU) in 3D CNN to enhance the ability of global features extraction, which can effectively extract features of lesions that are widely distributed in the lungs, such as GGO.

•We propose a domain adversarial strategy (DAS) for cross-center task to mitigate the difference of CT image characteristics between different centers.

2. Methods

2.1. Backbone

In this study, we have designed a COVID-GAN to balance the data before training. After obtaining the balanced data, we constructed a global information optimized network based on 3D SqueezeNet [26] due to its lightweight and high accuracy, making it easy to deploy and suitable for COVID-19 diagnosis. To enhance the global feature extraction capability of CNNs, we added two parallel branches in the middle of the backbone network. These branches are GEAU and MSFU, consisting of graph-enhanced aggregation operations and multi-scale self-attention fusion, respectively. GEAU utilizes the relationship between graph structure nodes and edges to enhance feature extraction, while MSFU utilizes self-attention to capture long-distance dependencies. Once feature extraction is complete, the input image is classified into its respective domain and disease categories using a domain classifier and a disease classifier. The overall block diagram of our approach is depicted in Fig. 1.

Fig. 1.

Fig. 1

Global information optimized network and domain adversarial strategy. GEAU represents the graph enhanced aggregation unit and MSFU represents the multi-scale self-attention fusion unit.

2.2. COVID-GAN for imbalanced-learning

Based on our statistics and analysis of the dataset, we can clearly see that there is an imbalance in the categories of the dataset. As shown in Table 1, in the training set of the mixed-center task, the number of cases in the healthy control group (normal) is 516, the number of cases in CAP is 810, and the number of cases in COVID-19 is 4466. In addition, in the training sets of the two cross-center tasks, the distribution of the number of samples of different categories in the dataset is 749:1007:4097 and 378:942:1029. Fig. 2 shows the imbalance in data distribution clearly and intuitively.

Table 1.

The detailed distribution of datasets in different tasks.

Task | Train (Normal / CAP / COVID-19) | Val (Normal / CAP / COVID-19) | Test (Normal / CAP / COVID-19)
Mixed-center | 516 / 810 / 4466 | 220 / 258 / 281 | 218 / 208 / 293
Task1 | 749 / 1007 / 4097 | 121 / 171 / 708 | 84 / 98 / 235
Task2 | 378 / 942 / 1029 | 93 / 236 / 258 | 84 / 98 / 235

Fig. 2.

Fig. 2

The distribution of data of different categories in the mixed-center task.

Such a severely unbalanced dataset will seriously affect COVID-19 diagnosis performance. Therefore, we designed a new conditional generative adversarial network adapted to 3D data, which expands the sample data of the minority categories to balance the dataset. Accordingly, the final COVID-19 diagnosis performance can be enhanced.

The specific implementation of COVID-GAN is shown in Fig. 3; it is composed of two parts: a generator and a discriminator. The input of the generator is a 480-dimensional noise vector drawn from a Gaussian distribution, together with a one-hot encoding of the image category. First, the Gaussian noise is divided into six equal-length sub-vectors, and the first sub-vector is expanded into a 1 × 4 × 4 3D matrix, which is the initial hidden variable input of the generator. After five stages of 3D convolution and up-sampling, a 32 × 128 × 128 generated image is finally obtained. More specifically, the remaining five Gaussian noise sub-vectors are gradually injected into different stages of the generator after being concatenated with the category encoding. As the conditional control information of the generator, they control the category of the generated image and maintain the diversity of the generated images. In this process, the features are weighted using the conditional control information encoding, as described in Eq. (1):

$f_{out} = \mathrm{BN}(f_{in}) \times \left[1 + \mathrm{Linear}_1(z_{condition})\right] + \mathrm{Linear}_2(z_{condition})$ (1)

where $\mathrm{BN}$ means batch normalization, $z_{condition}$ means the hidden variable with conditional information, and $\mathrm{Linear}_1$, $\mathrm{Linear}_2$ denote two different linear transformations.
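The conditional weighting of Eq. (1) can be sketched as a FiLM-style modulation: batch-normalize the features, then scale and shift them with two linear projections of the conditional latent. The following minimal NumPy sketch uses hypothetical weight names (`W1`, `b1`, `W2`, `b2`) standing in for the two learned linear layers; it is an illustration of the equation, not the authors' implementation.

```python
import numpy as np

def conditional_modulation(f_in, z_cond, W1, b1, W2, b2, eps=1e-5):
    """Eq. (1): BN(f_in) scaled by 1 + Linear_1(z_cond), shifted by Linear_2(z_cond)."""
    # Per-channel batch normalization (statistics over the batch axis).
    mean = f_in.mean(axis=0, keepdims=True)
    var = f_in.var(axis=0, keepdims=True)
    f_bn = (f_in - mean) / np.sqrt(var + eps)
    gamma = 1.0 + z_cond @ W1 + b1  # Linear_1: learned per-channel scale
    beta = z_cond @ W2 + b2         # Linear_2: learned per-channel shift
    return f_bn * gamma + beta
```

With zero weights the modulation reduces to plain batch normalization, which makes the role of the conditional branch easy to see.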

Fig. 3.

Fig. 3

COVID-GAN for COVID-19 diagnosis with imbalanced data. $z_{condition}$ means the hidden variable with conditional information.

In the discrimination process, the input is a real or generated image together with its category encoding. The input image is gradually reduced to a feature vector through multiple convolution and down-sampling layers, and the final feature value used for discrimination is obtained by weighting the feature vector with the category encoding. It is worth mentioning that a self-attention calculation is added to both the generator and the discriminator to enhance the global information of the features. The loss function of the adversarial process is described in Eq. (2):

$L_{GAN} = \min_G \max_D\; \mathbb{E}_{x \sim P_r}[\log D(x)] + \mathbb{E}_{\tilde{x} \sim P_g}[\log(1 - D(\tilde{x}))]$ (2)

where $\mathbb{E}_{x \sim P_r}$ means that $x$ is a sample from the real distribution and $\mathbb{E}_{\tilde{x} \sim P_g}$ means that $\tilde{x}$ is a sample from the generated distribution.
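Given discriminator outputs on real and generated batches, the two sides of Eq. (2) can be evaluated directly. The sketch below assumes `d_real` and `d_fake` are already probabilities in (0, 1); it shows only the loss arithmetic, not the alternating training loop.

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """Eq. (2): the discriminator maximizes E[log D(x)] + E[log(1 - D(x_tilde))],
    i.e. minimizes its negative; the generator minimizes E[log(1 - D(x_tilde))]."""
    d_loss = -(np.log(d_real).mean() + np.log(1.0 - d_fake).mean())
    g_loss = np.log(1.0 - d_fake).mean()
    return d_loss, g_loss
```

A discriminator that separates real from fake well (e.g. D(real) ≈ 0.9, D(fake) ≈ 0.1) incurs a lower `d_loss` than a confused one at 0.5/0.5.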

Finally, the generated CT images are resampled to 64 × 256 × 256 for training, which is consistent with the original images.

2.3. Graph enhanced aggregation unit

Recently, graph-based methods have gained popularity due to their high efficiency in relational reasoning. Following the approach of [27], which utilized graph convolution for global reasoning, we propose the graph enhanced aggregation unit illustrated in Fig. 4. First, we transform the coordinate space into a hidden space with a 1 × 1 × 1 convolution layer. Next, we construct a fully-connected adjacency matrix $V$ with a learnable projection function, as shown in Eq. (3). The features are then processed by two graph convolutional layers and projected back to the coordinate space. The overall GEAU process is represented by Eq. (4), where $F_{in}$ denotes the output features of the convolutional layer, $V$ denotes the fully-connected adjacency matrix whose nodes store features, and $W_{g1}$, $W_{g2}$ denote the parameter matrices used in graph convolution.

$V = \mathrm{projection}(\mathrm{Conv}(F_{in})) \times \mathrm{Conv}(F_{in})$ (3)

$\mathrm{Output}_{GEAU} = F_{in} + \mathrm{Conv}\big(\mathrm{reverse}(\mathrm{Conv}(F_{in})) \times W_{g1} V W_{g2}\big)$ (4)
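The projection/graph-convolution/reverse-projection pipeline of Eqs. (3)–(4) can be sketched on flattened features, with the 1 × 1 × 1 convolutions replaced by plain matrix multiplications. All weight names (`W_proj`, `Wg1`, `Wg2`) are illustrative stand-ins for the learned parameters; this is a simplified sketch of the GloRe-style unit, not the authors' exact layer.

```python
import numpy as np

def geau(F_in, W_proj, Wg1, Wg2):
    """Sketch of Eqs. (3)-(4) on F_in of shape (L, C):
    L flattened spatial positions, C channels."""
    B = F_in @ W_proj    # learnable projection to N graph nodes, shape (L, N)
    V = B.T @ F_in       # Eq. (3): node feature matrix, shape (N, C)
    V = Wg1 @ V @ Wg2    # two graph-convolution parameter matrices
    return F_in + B @ V  # reverse projection plus residual, Eq. (4)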

Fig. 4.

Fig. 4

Graph enhanced aggregation unit. $F_{in}$ represents the feature from each stage of the backbone and is also the input feature of this unit, GCN means the normal graph convolutional network, and $\mathrm{Output}_{GEAU}$ means the output feature of GEAU. $V$ denotes the fully-connected adjacency matrix whose nodes store features, and $W_{g1}$, $W_{g2}$ denote the parameter matrices used in graph convolution.

2.4. Multi-scale self-attention fusion unit

In addition to the graph enhanced aggregation unit of the previous section, we also use a multi-scale self-attention calculation to strengthen the long-distance relationships between features and further optimize global information. Unlike traditional Transformers [28], and inspired by dilated convolution and the Swin-Transformer [29], this paper uses three parallel multi-head self-attention branches, which perform window-based self-attention on features sampled with different dilation factors (e.g., 0, 1, 2). This expands the global perception ability by sampling a larger range of features without significantly increasing the computational cost. The specific operation is shown in Fig. 5, where Q, K and V represent the query, key and value vectors, respectively. After the self-attention calculation in each branch, the feature dimensions differ due to the dilated sampling. Therefore, before feature fusion, the feature dimensions are restored by padding with zero elements, with a padding interval consistent with the preceding dilation sampling coefficient. After unifying the dimensions, fusion is achieved by element-wise addition. Additionally, a layer normalization (LN) layer and a multilayer perceptron (MLP) layer are applied after the attention computation. The calculation process of MSFU is described in Eq. (5), where $df$ represents the dilation factor and $d$ stands for the feature dimension.

$\mathrm{Output}_{MSFU} = \frac{1}{3}\sum_{df=0}^{2} \mathrm{MLP}\Big(\mathrm{LN}\Big(\mathrm{Softmax}\Big(\frac{QK^T}{\sqrt{d}}\Big) \times V\Big)\Big)$ (5)
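The branch-pad-average structure of Eq. (5) can be sketched as follows. For brevity the sketch sets Q = K = V to the raw features and omits the LN and MLP layers and the multi-head split; dilation factor `df` is interpreted as sampling stride `df + 1`, which is an assumption about the exact sampling scheme.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X (core of Eq. (5))."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ X

def msfu(F, dilation_factors=(0, 1, 2)):
    """Eq. (5) sketch: attention on dilated samplings of F, zero-padded
    back to the full size, then averaged over the three branches."""
    out = np.zeros_like(F)
    for df in dilation_factors:
        stride = df + 1
        att = self_attention(F[::stride])  # attend over the dilated subset
        pad = np.zeros_like(F)
        pad[::stride] = att                # restore dimensions by zero padding
        out += pad
    return out / len(dilation_factors)
```

The zero-padding step is what lets branches with different sampling rates be fused by simple element-wise addition.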

Fig. 5.

Fig. 5

Multi-scale self-attention fusion unit. Fin represents the feature from each stage of the backbone and is also the input feature of this unit, Embed means the patch embed operation, MLP means the Multilayer Perceptron and OutputMSFU means the output feature of MSFU.

2.5. Domain adversarial strategy for cross-center task

Due to the varied equipment types and scanning methods utilized by different centers, the CT images exhibit distinct features that can pose a challenge for classifiers. In order to address this issue, we have employed a domain adversarial training approach in this study, with the aim of minimizing the feature discrepancies between centers. The schematic diagram of domain adversarial learning is shown in Fig. 6, where $s$ and $t$ represent the source domain and target domain, respectively. $x$ and $f$ denote the input images and features. In addition, $y$ and $d$ represent the class labels and domain labels, while $\tilde{y}$ and $\tilde{d}$ denote the predicted disease type and predicted domain. The strategy can be described as follows: images are input into the GIONet to obtain features; then a disease classifier and a domain discriminator produce the predictions; and finally, two loss functions are used to minimize the distance between features from different domains while maximizing the distance between features of different diseases.

Fig. 6.

Fig. 6

Schematic diagram of domain adversarial strategy. Forwardprop denotes the forward propagation and Backprop denotes the back propagation. Enhanced CNN means the backbone in Fig. 1.

The loss function used in our domain adversarial training strategy is shown in Eq. (6), where $L_y$ represents the loss function of the disease classifier, $y_s$ represents the real disease label, $\tilde{y}_s$ represents the disease type predicted by the network, $L_d$ represents the loss function of the domain discriminator, $d_s$ and $d_t$ represent the domain labels, $\tilde{d}_s$ and $\tilde{d}_t$ represent the domains predicted by the network, and $N$ denotes the total number of samples. Here, source refers to the source domain, which includes the training and validation sets, while target represents the target domain, i.e., the test set. The target domain only provides its domain label; its class labels are not disclosed. Both $L_y$ and $L_d$ are cross-entropy losses.

$L_{DAS} = \sum_{i=1}^{N_{source}} L_y(\tilde{y}_s, y_s) - \sum_{i=1}^{N_{source}} L_d(\tilde{d}_s, d_s) - \sum_{i=1}^{N_{target}} L_d(\tilde{d}_t, d_t)$ (6)
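Eq. (6) can be evaluated as plain arithmetic over per-sample cross-entropies. The sketch below assumes predictions are already softmax probability vectors and labels are class indices; in practice the minus signs on the domain terms are realized with a gradient-reversal layer rather than a literal subtraction, so this is only an illustration of the objective value.

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross-entropy of one predicted probability vector against a class index."""
    return -np.log(probs[label])

def das_loss(src_cls, src_y, src_dom, src_d, tgt_dom, tgt_d):
    """Eq. (6): disease loss over source samples minus domain losses over
    source and target samples (target samples contribute only a domain label)."""
    ly = sum(cross_entropy(p, y) for p, y in zip(src_cls, src_y))
    lds = sum(cross_entropy(p, d) for p, d in zip(src_dom, src_d))
    ldt = sum(cross_entropy(p, d) for p, d in zip(tgt_dom, tgt_d))
    return ly - lds - ldt
```

A confident disease classifier combined with a confused domain discriminator drives this objective down, which is exactly the adversarial goal.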

3. Results

3.1. Data and preprocessing

In this study, we utilize the publicly available COVIDx CT-3 dataset [30], which contains imaging data of COVID-19 patients from multiple countries and centers. The dataset includes 7270 CT scans of 6068 patients, comprising 431025 2D slice images collected from 11 data centers across more than 17 countries. We transform the 2D slice data into a 3D voxel image, where we upsample or downsample to 64 in the z-axis and 256 in the other two directions due to the inconsistent number of slices in the axial direction. This results in a final 3D volume size of [64, 256, 256]. We perform experiments on various tasks using this large dataset, and the detailed data partitioning is presented in Table 1. The mixed-center task employs data from all centers and splits the dataset into training, validation, and testing sets according to COVIDx CT-3 [30]. In Task 1, we use two centers for testing and the remaining centers for training. Task 2 uses the same test set as Task 1 but reduces the number of centers used for training.
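The slice-stack-to-volume step above can be sketched with nearest-neighbour index resampling to the fixed [64, 256, 256] grid. The paper does not specify its interpolation scheme, so this is an illustrative stand-in, not the authors' preprocessing code.

```python
import numpy as np

def resample_volume(vol, target=(64, 256, 256)):
    """Resample a stacked-slice volume to a fixed grid by nearest-neighbour
    index selection along each axis (up- or down-sampling as needed)."""
    idx = [np.clip(np.round(np.linspace(0, s - 1, t)).astype(int), 0, s - 1)
           for s, t in zip(vol.shape, target)]
    return vol[np.ix_(*idx)]
```

Cases with very few axial slices are handled here by repeating indices, which mirrors the repeated-sampling distortion the Discussion section mentions.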

3.2. Implementation details

In this research, we trained our models in parallel on four TITAN RTX GPUs with 24 GB memory, utilizing the PyTorch framework. Training used an initial learning rate of $5 \times 10^{-4}$ with a learning rate decay coefficient of 0.95. The experiments were run for up to 10,000 epochs, and the best model was selected through an early stopping mechanism. In addition, we set the batch size to 8. To assess the performance of our GIONet with COVID-GAN and DAS, we compared it with several state-of-the-art models, including ResNets [31], MobileNetV2 [32], ShuffleNet [33], EfficientNet [34], and SqueezeNet [26]. Transformer methods that have demonstrated outstanding performance in visual tasks, such as ViT [35] and Swin-Transformer [36], were also compared in the experiments, using their 3D versions. We also considered prior research on the same dataset, namely NASNet-A-Mobile [37], COVID-Net CT L [38], and COVID-Net CT S [39]; the results of these three methods in Table 4 are quoted directly from their respective papers. We measured the performance of our models using various metrics, including accuracy (ACC), sensitivity (SEN), precision (PRE), specificity (SPE), F1-score, and area under the curve (AUC), which were calculated using the macro-average method. We evaluated our COVID-GAN using the Fréchet Inception Distance (FID) [40] and Kernel Maximum Mean Discrepancy (MMD) [41]. The calculation formulas for the above metrics are shown below:

$ACC = \frac{1}{3}\sum_{i=1}^{3} \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}$ (7)

$SEN = \frac{1}{3}\sum_{i=1}^{3} \frac{TP_i}{TP_i + FN_i}$ (8)

$PRE = \frac{1}{3}\sum_{i=1}^{3} \frac{TP_i}{TP_i + FP_i}$ (9)

$SPE = \frac{1}{3}\sum_{i=1}^{3} \frac{TN_i}{TN_i + FP_i}$ (10)

$F1\text{-}score = \frac{2 \cdot PRE \times SEN}{PRE + SEN}$ (11)

$AUC = \frac{1}{3}\sum_{i=1}^{3} \frac{\sum_{j \in positive_i} rank_j - \frac{M_i(M_i + 1)}{2}}{M_i N_i}$ (12)

$FID = \|\mu_r - \mu_g\|^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big)$ (13)

where $\mu$ is the empirical mean, $\Sigma$ is the empirical covariance, $\mathrm{Tr}$ is the trace of a matrix, $r$ represents the real dataset, and $g$ represents the generated dataset. Specifically, FID is computed as follows: an InceptionV3 network maps the $n$ target images to $n \times 2048$ feature vectors, whose mean is $\mu_r$, and the $m$ generated images to $m \times 2048$ feature vectors, whose mean is $\mu_g$. The covariance matrices $\Sigma_r$ and $\Sigma_g$ are computed from the same feature vectors, and the FID is then obtained from Eq. (13).
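The macro-averaged classification metrics of Eqs. (7)–(11) can all be read off a single confusion matrix. The sketch below illustrates that computation for a K-class problem; it is a generic implementation of the formulas, not the authors' evaluation script.

```python
import numpy as np

def macro_metrics(cm):
    """Macro-averaged ACC, SEN, PRE, SPE and F1 (Eqs. (7)-(11)) from a KxK
    confusion matrix cm, where cm[i, j] counts class-i samples predicted as j."""
    K, total = cm.shape[0], cm.sum()
    acc = sen = pre = spe = 0.0
    for i in range(K):
        tp = cm[i, i]
        fn = cm[i].sum() - tp        # class-i samples predicted as other classes
        fp = cm[:, i].sum() - tp     # other classes predicted as class i
        tn = total - tp - fn - fp
        acc += (tp + tn) / total / K
        sen += tp / (tp + fn) / K
        pre += tp / (tp + fp) / K
        spe += tn / (tn + fp) / K
    f1 = 2 * pre * sen / (pre + sen)  # Eq. (11) applied to the macro averages
    return acc, sen, pre, spe, f1
```

On a perfectly diagonal confusion matrix every metric evaluates to 1, which is a quick sanity check on the implementation.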

$MMD = \mathbb{E}_{x, x' \sim P_r}[k(x, x')] - 2\,\mathbb{E}_{x \sim P_r,\, y \sim P_g}[k(x, y)] + \mathbb{E}_{y, y' \sim P_g}[k(y, y')]$ (14)

MMD is used to measure the distance between two probability distributions. In the MMD formula, $P_r$ is the real data distribution, $P_g$ is the generated data distribution, $x$ is a sample drawn from the real data distribution, $y$ is a sample drawn from the generated data distribution, and $k(x, x')$ is the Gaussian kernel function used to measure the similarity between samples $x$ and $x'$.
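Eq. (14) can be estimated from finite samples by averaging kernel evaluations over all pairs. The sketch below uses a biased estimator with a Gaussian kernel; the bandwidth `sigma` is a hypothetical choice, as the paper does not state its kernel parameters.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel k(x, y) between two sample vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def mmd(real, gen, sigma=1.0):
    """Eq. (14): biased kernel-MMD estimate between lists of real and
    generated sample vectors."""
    k_rr = np.mean([gaussian_kernel(x, xp, sigma) for x in real for xp in real])
    k_gg = np.mean([gaussian_kernel(y, yp, sigma) for y in gen for y in [y] for yp in gen])
    k_rg = np.mean([gaussian_kernel(x, y, sigma) for x in real for y in gen])
    return k_rr - 2.0 * k_rg + k_gg
```

Identical sample sets give an MMD of zero, and the value grows as the two distributions drift apart, matching its use here for judging generated CT images.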

3.3. Ablation study

To clearly demonstrate the effectiveness of each module, we conducted extensive ablation experiments. Table 2 shows the performance contributions of the generated images, GEAU, and MSFU during training. We can observe that each designed module improves the diagnostic performance: COVID-GAN, GEAU, and MSFU respectively increase the accuracy of the backbone network by 0.93, 0.18, and 0.37 percentage points. In addition, combinations of modules yield further improvements, and the complete model achieves the best performance.

Table 2.

The ablation study of different modules. Boldface denotes the best performance (%).

GAN | GEAU | MSFU | ACC | PRE | SEN | SPE | F1 | AUC
– | – | – | 97.59 | 96.60 | 96.27 | 98.09 | 96.43 | 98.62
✓ | – | – | 98.52 | 97.84 | 97.81 | 98.85 | 97.82 | 99.15
– | ✓ | – | 97.77 | 96.73 | 96.68 | 98.27 | 96.71 | 99.19
– | – | ✓ | 97.96 | 97.06 | 96.91 | 98.40 | 96.98 | 98.81
✓ | ✓ | – | 98.78 | 98.15 | 98.25 | 99.09 | 98.20 | 99.25
✓ | – | ✓ | 98.69 | 97.99 | 98.09 | 99.02 | 98.04 | 99.32
– | ✓ | ✓ | 98.05 | 97.22 | 97.02 | 98.47 | 97.12 | 99.38
✓ | ✓ | ✓ | 99.17 | 98.77 | 98.79 | 99.36 | 98.78 | 99.67

3.4. Comparative results

Table 3 displays the performance of our COVID-GAN following academic conventions. It is evident that our COVID-GAN obtains the lowest values for both MMD and FID, indicating that our GAN is capable of generating diverse CT images that closely approximate the true distribution. Table 4 showcases a comparative evaluation of our method and several neural network approaches on mixed-center task data. We observe that the use of SqueezeNet as the backbone leads to superior outcomes in various comparison metrics. Our GIONet achieves the best overall accuracy of 99.17%, with sensitivity and specificity values of 98.79% and 99.36%, respectively. Our results outperform both conventional neural networks and those reported in prior studies. Additionally, we can visualize the classification results for the mixed-center task through Fig. 7, where our method has the highest number of samples in the diagonal direction of confusion matrix and achieves the best results.

Table 3.

The performance of our COVID-GAN. Boldface denotes the best performance(%).

Methods MMD FID
BAGAN 3.55 75.24
BigGAN 2.01 35.41
WGAN 3.23 68.52
CGAN 2.52 42.33
Ours 1.47 30.54

Table 4.

The performance comparison of different methods in mixed dataset. Boldface denotes the best performance(%).

Methods ACC PRE SEN SPE F1 AUC
ResNet-18 96.48 95.03 94.62 97.22 94.83 98.71
ResNet-34 93.60 90.82 90.27 94.96 90.55 95.59
MobileNet 94.44 91.57 92.08 95.77 91.82 95.47
ShuffleNet 95.36 93.05 93.33 96.46 93.19 97.72
EfficientNet 85.91 90.46 78.57 89.02 79.50 96.05
SqueezeNet 97.59 96.60 96.27 98.09 96.43 98.62
ViT 96.75 95.17 95.15 97.50 95.16 96.88
Swin-Transformer 97.03 95.64 95.47 97.70 95.55 97.98
NASNet-A-Mobile [37] 98.8 98.7
COVID-Net CT L [38] 98.4 98.1
COVID-Net CT S [39] 98.3 97.3
Ours 99.17 98.77 98.79 99.36 98.78 99.67

Fig. 7.

Fig. 7

The confusion matrix of the mixed-center task.

Furthermore, Table 5 presents comparative experimental results for two cross-center tasks. Our method, utilizing domain adversarial training strategies, attains the best performance with 86.73% accuracy in the first cross-center task and 89.61% accuracy in the second task, surpassing other approaches in all measures. The abbreviation “w/o DAS” indicates that our network does not incorporate domain adversarial strategy. It is worth noting that the experimental data of both Task 1 and Task 2 in our method are balanced by COVID-GAN. Finally, although our method still exhibits some weaknesses in distinguishing between CAP and COVID-19, it outperforms other methods, as evidenced by Fig. 8. The results of Task 1 were not as good as Task 2, despite Task 1 having more training data, because the data distribution in Task 2 was more balanced, and the lesions in Task 2 were more representative of the overall population. Additionally, Task 1 had a larger class imbalance, which made it more challenging for the model to learn and generalize well to the test set.

Table 5.

The performance comparison of different methods in cross center dataset. Boldface denotes the best performance(%).

Task Methods ACC PRE SEN SPE F1 AUC
Task1 ResNet-18 81.77 67.08 64.77 81.00 65.90 72.12
ResNet-34 81.45 70.60 64.23 80.98 67.26 75.77
MobileNet 81.45 67.30 63.69 80.40 65.44 72.99
ShuffleNet 81.93 75.40 64.11 80.96 69.30 74.39
EfficientNet 81.14 65.54 62.24 80.46 63.85 74.07
SqueezeNet 83.85 74.54 68.20 82.95 71.22 79.03
ViT 80.02 58.65 62.81 79.69 60.66 70.25
Swin-Transformer 82.41 69.23 65.33 81.40 67.23 80.52
Ours(w/o DAS) 85.93 79.61 73.07 85.41 76.20 80.85
Ours 86.73 85.98 73.61 85.87 79.32 82.32

Task2 ResNet-18 76.98 63.94 68.78 81.06 66.27 80.40
ResNet-34 74.42 61.98 66.26 79.53 64.05 79.92
MobileNet 84.33 76.05 76.50 86.71 76.27 86.22
ShuffleNet 77.62 62.86 66.97 80.78 64.85 79.67
EfficientNet 57.31 36.15 54.17 71.72 43.36 70.14
SqueezeNet 85.45 77.25 77.49 86.98 77.37 87.52
ViT 84.81 76.38 70.72 84.35 73.44 85.08
Swin-Transformer 85.29 77.51 71.74 84.90 74.51 88.20
Ours(w/o DAS) 88.01 91.94 74.38 86.26 82.23 89.15
Ours 89.61 92.78 77.83 88.10 84.65 90.21

Fig. 8.

Fig. 8

The confusion matrix of Task1 and Task2.

From Table 6, we can see that our method has 3.45M parameters and a computational complexity of 23.2 GMac, while SqueezeNet, which has the second-best performance, has 1.84M parameters and a computational complexity of 15.61 GMac. With a small increase in the number of parameters, we achieve an improvement in accuracy, and in our experimental environment such an increase is completely acceptable.

Table 6.

The computational complexity comparison of different methods.

Methods Flops(GMac) Params(M)
ResNet-18 126.04 33.16
ResNet-34 216.72 63.47
MobileNet 9.75 2.36
ShuffleNet 2.63 1.30
EfficientNet 5.55 4.69
SqueezeNet 15.61 1.84
ViT 10.79 84.07
Swin-Transformer 103.1 27.54
Ours 23.2 3.45

3.5. Visualization

We visualize the transverse sections of the 3D CT images produced by the designed generator in Fig. 10, which shows that the quality of the generated images is good. In the last row of Fig. 10, the red boxes indicate the differences among Normal, CAP and COVID-19: shadows appear in the lungs of CAP cases and GGO in the lungs of COVID-19 cases. In contrast, there are problems with the images generated by other methods; for example, BAGAN suffers from mode collapse and generates very similar images, and CGAN produces some strange textures. Our method can obtain images with similar structure but different categories by changing the conditional information while keeping the hidden variable input fixed.

Fig. 10.

Fig. 10

The 2D visualization of generated volume CT images from different GAN methods; the last row shows the results of our COVID-GAN.

To visually demonstrate the effectiveness of the feature extraction and classification methods proposed in our work, we applied t-distributed stochastic neighbor embedding (t-SNE) visualization to the output features of each method [42]. Fig. 11 shows that our method can effectively distinguish different disease features. In contrast, the features generated by other methods often have a large overlap, while our method separates the features of different categories further apart. Additionally, domain adversarial training can minimize the distance between features from different domains and maximize the distance between features from different diseases. This can be observed in Fig. 12, where domain adversarial training effectively reduces the feature distance between different domains and increases the feature distance between different disease categories.

Fig. 11.

Fig. 11

The visualization of t-SNE characteristics of different methods in mixed-center task, in which green represents normal, blue represents CAP, and red represents COVID-19.

Fig. 12.

Fig. 12

The visualization of t-SNE characteristics in cross-center tasks, in which blue dots represent the characteristics of the source domain, red dots represent the characteristics of the target domain, and different shapes represent different disease types.

Fig. 9 shows intuitively that our method bases its classification mainly on disease-specific lung features, such as shadows, ground-glass opacity, and similar regions, whereas the classification basis of the other methods is clearly incorrect. For example, the activation map of ResNet34 is too diffuse to explain the model's different classification judgments across the image categories.

Fig. 9. The class activation maps of different methods on the three categories of data.
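A class activation map of the kind shown in Fig. 9 can be computed in a Grad-CAM style: average the class-score gradients over the spatial dimensions to obtain per-channel weights, take the weighted sum of the activations, and apply a ReLU. The NumPy sketch below uses synthetic activations and gradients as stand-ins; the exact CAM variant used in the paper is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_cam(activations, gradients):
    """Grad-CAM-style map: channel weights are the spatially averaged
    gradients; the map is the ReLU of the weighted activation sum."""
    weights = gradients.mean(axis=(1, 2))                  # (C,)
    cam = np.tensordot(weights, activations, axes=(0, 0))  # (H, W)
    return np.maximum(cam, 0.0)

# Synthetic stand-ins for a conv layer's activations and the class
# score's gradients w.r.t. them (C=4 channels, 7x7 spatial grid).
acts = rng.random((4, 7, 7))
grads = rng.standard_normal((4, 7, 7))
cam = grad_cam(acts, grads)
print(cam.shape)  # (7, 7)
```

For display, the low-resolution map is upsampled to the input size and overlaid on the CT slice as a heatmap, which is how the lung-region highlights in Fig. 9 are obtained.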

4. Conclusions

In summary, this paper presents a novel approach that enhances the global feature extraction ability of neural networks for accurate COVID-19 classification by leveraging a graph enhanced aggregation unit and a multi-scale self-attention fusion unit. Furthermore, a COVID-GAN is introduced to generate high-quality 3D CT images for data-balanced learning, while domain adversarial training optimizes performance on cross-center tasks by reducing the feature difference between domains. Experimental results demonstrate that the proposed method achieves exceptional accuracy, with 99.17% on the mixed-center task and 86.73% and 89.61% on the two cross-center tasks, surpassing existing methods in feature extraction ability and classification performance. These findings suggest that the proposed approach holds great potential for accurate COVID-19 diagnosis.

5. Discussion

This work has several potential limitations. First, because our three-dimensional data are reconstructed from two-dimensional slices, and some cases contain very few slices, we had to repeatedly resample certain slices during reconstruction. These cases therefore exhibit distortion in the three-dimensional data, which may affect the experimental results. In the future, we will use a GAN-based method to reconstruct 3D images from the few available slices of such cases instead of repeated sampling. Second, because several centers in this dataset contain only two or even one category of data, few centers could be selected for training in the cross-center experiments. In the future, we will collect more multi-center data to validate our multi-center domain adversarial COVID-19 diagnosis method more broadly. Third, our COVID-GAN can only augment data for category balancing; we hope to extend it to complete missing categories and to support domain adversarial adaptation across multiple centers. Finally, this work can also be extended to the diagnosis of other lung diseases, such as nodules and inflammations.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Footnotes

This work was supported partly by the Science and Technology Plan Project of Guizhou Province, China (Qiankehe Support [2020] No. 4Y179), the National Natural Science Foundation of China (No. 82260341, No. 62005325), the Medical Science and Technology Research Project of Guangdong Province, China (B2022144), and the Science and Technology Plan Grant of Guizhou Province, China (Qiankehe Foundation-ZK [2022] General 634).

References

  • 1.Xu Z., Shi L., Wang Y., Zhang J., Huang L., Zhang C., Liu S., Zhao P., Liu H., Zhu L., et al. Pathological findings of COVID-19 associated with acute respiratory distress syndrome. Lancet Respir. Med. 2020;8(4):420–422. doi: 10.1016/S2213-2600(20)30076-X. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Organization W.H., et al. World Health Organization; 2020. Coronavirus disease 2019 (COVID-19): Situation report, 100. [Google Scholar]
  • 3.Li Z., Liu F., Cui J., Peng Z., Chang Z., Lai S., Chen Q., Wang L., Gao G.F., Feng Z. Comprehensive large-scale nucleic acid–testing strategies support China’s sustained containment of COVID-19. Nat. Med. 2021;27(5):740–742. doi: 10.1038/s41591-021-01308-7. [DOI] [PubMed] [Google Scholar]
  • 4.Chung M., Bernheim A., Mei X., Zhang N., Huang M., Zeng X., Cui J., Xu W., Yang Y., Fayad Z.A., et al. CT imaging features of 2019 novel Coronavirus (2019-nCoV) Radiology. 2020;295(1):202–207. doi: 10.1148/radiol.2020200230. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Salehi S., Abedi A., Balakrishnan S., Gholamrezanezhad A., et al. Coronavirus disease 2019 (COVID-19): A systematic review of imaging findings in 919 patients. Ajr. Am. J. Roentgenol. 2020;215(1):87–93. doi: 10.2214/AJR.20.23034. [DOI] [PubMed] [Google Scholar]
  • 6.Ning R., Tang X., Conover D. X-ray scatter correction algorithm for cone beam CT imaging. Med. Phys. 2004;31(5):1195–1202. doi: 10.1118/1.1711475. [DOI] [PubMed] [Google Scholar]
  • 7.Kong W., Agarwal P.P. Chest imaging appearance of COVID-19 infection. Radiol. Cardiothoracic Imaging. 2020;2(1) doi: 10.1148/ryct.2020200028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Bernheim A., Mei X., Huang M., Yang Y., Fayad Z.A., Zhang N., Diao K., Lin B., Zhu X., Li K., et al. Chest CT findings in Coronavirus disease-19 (COVID-19): Relationship to duration of infection. Radiology. 2020;295(3):685–691. doi: 10.1148/radiol.2020200463. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Wan S., Yi Q., Fan S., Lv J., Zhang X., Guo L., Lang C., Xiao Q., Xiao K., Yi Z., et al. Relationships among lymphocyte subsets, cytokines, and the pulmonary inflammation index in Coronavirus (COVID-19) infected patients. Br. J. Haematol. 2020;189(3):428–437. doi: 10.1111/bjh.16659. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Jin C., Chen W., Cao Y., Xu Z., Tan Z., Zhang X., Deng L., Zheng C., Zhou J., Shi H., et al. Development and evaluation of an artificial intelligence system for COVID-19 diagnosis. Nature Commun. 2020;11(1):1–14. doi: 10.1038/s41467-020-18685-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Zhang K., Liu X., Shen J., Li Z., Sang Y., Wu X., Zha Y., Liang W., Wang C., Wang K., et al. Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell. 2020;181(6):1423–1433. doi: 10.1016/j.cell.2020.04.045. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Ardakani A.A., Kanafi A.R., Acharya U.R., Khadem N., Mohammadi A. Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks. Comput. Biol. Med. 2020;121 doi: 10.1016/j.compbiomed.2020.103795. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Ko H., Chung H., Kang W.S., Kim K.W., Shin Y., Kang S.J., Lee J.H., Kim Y.J., Kim N.Y., Jung H., et al. COVID-19 pneumonia diagnosis using a simple 2D deep learning framework with a single chest CT image: Model development and validation. J. Med. Internet Res. 2020;22(6) doi: 10.2196/19569. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Shah V., Keniya R., Shridharani A., Punjabi M., Shah J., Mehendale N. Diagnosis of COVID-19 using CT scan images and deep learning techniques. Emerg. Radiol. 2021;28(3):497–505. doi: 10.1007/s10140-020-01886-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Wang S., Kang B., Ma J., Zeng X., Xiao M., Guo J., Cai M., Yang J., Li Y., Meng X., et al. A deep learning algorithm using CT images to screen for Corona virus disease (COVID-19) Eur. Radiol. 2021;31(8):6096–6104. doi: 10.1007/s00330-021-07715-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Serte S., Demirel H. Deep learning for diagnosis of COVID-19 using 3D CT scans. Comput. Biol. Med. 2021;132 doi: 10.1016/j.compbiomed.2021.104306. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.He X., Yang X., Zhang S., Zhao J., Zhang Y., Xing E., Xie P. 2020. Sample-efficient deep learning for COVID-19 diagnosis based on CT scans. medrxiv, 2020–04. [Google Scholar]
  • 18.Meng L., Dong D., Li L., Niu M., Bai Y., Wang M., Qiu X., Zha Y., Tian J. A deep learning prognosis model help alert for COVID-19 patients at high-risk of death: A multi-center study. IEEE J. Biomed. Health Inf. 2020;24(12):3576–3584. doi: 10.1109/JBHI.2020.3034296. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Ye Q., Gao Y., Ding W., Niu Z., Wang C., Jiang Y., Wang M., Fang E.F., Menpes-Smith W., Xia J., et al. Robust weakly supervised learning for COVID-19 recognition using multi-center CT images. Appl. Soft Comput. 2022;116 doi: 10.1016/j.asoc.2021.108291. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Kingma D.P., Welling M. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. [Google Scholar]
  • 21.Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y. Generative adversarial networks. Commun. ACM. 2020;63(11):139–144. [Google Scholar]
  • 22.Gulrajani I., Ahmed F., Arjovsky M., Dumoulin V., Courville A.C. Improved training of wasserstein gans. Adv. Neural Inf. Process. Syst. 2017;30 [Google Scholar]
  • 23.Mirza M., Osindero S. 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784. [Google Scholar]
  • 24.Brock A., Donahue J., Simonyan K. 2018. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096. [Google Scholar]
  • 25.Mariani G., Scheidegger F., Istrate R., Bekas C., Malossi C. 2018. Bagan: Data augmentation with balancing gan. arXiv preprint arXiv:1803.09655. [Google Scholar]
  • 26.Iandola F.N., Han S., Moskewicz M.W., Ashraf K., Dally W.J., Keutzer K. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360. [Google Scholar]
  • 27.Y. Chen, M. Rohrbach, Z. Yan, Y. Shuicheng, J. Feng, Y. Kalantidis, Graph-based global reasoning networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 433–442.
  • 28.Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser Ł., Polosukhin I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017;30 [Google Scholar]
  • 29.Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
  • 30.Tuinstra T., Gunraj H., Wong A. 2022. COVIDx CT-3: A large-scale, multinational, open-source benchmark dataset for computer-aided COVID-19 screening from chest CT images. arXiv preprint arXiv:2206.03043. [Google Scholar]
  • 31.K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
  • 32.M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, Mobilenetv2: Inverted residuals and linear bottlenecks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
  • 33.N. Ma, X. Zhang, H.-T. Zheng, J. Sun, Shufflenet v2: Practical guidelines for efficient cnn architecture design, in: Proceedings of the European Conference on Computer Vision, ECCV, 2018, pp. 116–131.
  • 34.Tan M., Le Q. International Conference on Machine Learning. PMLR; 2019. Efficientnet: Rethinking model scaling for convolutional neural networks; pp. 6105–6114. [Google Scholar]
  • 35.Dosovitskiy A., Beyer L., Kolesnikov A., Weissenborn D., Zhai X., Unterthiner T., Dehghani M., Minderer M., Heigold G., Gelly S., et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. [Google Scholar]
  • 36.Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
  • 37.B. Zoph, V. Vasudevan, J. Shlens, Q.V. Le, Learning transferable architectures for scalable image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8697–8710.
  • 38.Gunraj H., Wang L., Wong A. Covidnet-ct: A tailored deep convolutional neural network design for detection of Covid-19 cases from chest ct images. Front. Med. 2020;7 doi: 10.3389/fmed.2020.608525. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Gunraj H., Sabri A., Koff D., Wong A. Covid-net ct-2: Enhanced deep neural networks for detection of Covid-19 from chest ct images through bigger, more diverse learning. Front. Med. 2021;8 doi: 10.3389/fmed.2021.729287. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Dowson D., Landau B. The Fréchet distance between multivariate normal distributions. J. Multivariate Anal. 1982;12(3):450–455. [Google Scholar]
  • 41.Dziugaite G.K., Roy D.M., Ghahramani Z. 2015. Training generative neural networks via maximum mean discrepancy optimization. arXiv preprint arXiv:1505.03906. [Google Scholar]
  • 42.Van der Maaten L., Hinton G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008;9(11) [Google Scholar]

Articles from Computers in Biology and Medicine are provided here courtesy of Elsevier
