Abstract
Since December 2019, coronavirus SARS-CoV-2 (COVID-19) has rapidly developed into a global epidemic, with millions of patients affected worldwide. As part of the diagnostic pathway, computed tomography (CT) scans are used to help patient management. However, parenchymal imaging findings in COVID-19 are non-specific and can be seen in other diseases. In this work, we propose to first segment lesions from CT images, and further, classify COVID-19 patients from healthy persons and common pneumonia patients. In detail, a novel Dynamic Fusion Segmentation Network (DFSN) that automatically segments infection-related pixels is first proposed. Within this network, low-level features are aggregated to high-level ones to effectively capture context characteristics of infection regions, and high-level features are dynamically fused to model multi-scale semantic information of lesions. Based on DFSN, Dynamic Transfer-learning Classification Network (DTCN) is proposed to distinguish COVID-19 patients. Within DTCN, a pre-trained DFSN is transferred and used as the backbone to extract pixel-level information. Then the pixel-level information is dynamically selected and used to make a diagnosis. In this way, the pre-trained DFSN is utilized through transfer learning, and clinical significance of segmentation results is comprehensively considered. Thus DTCN becomes more sensitive to typical signs of COVID-19. Extensive experiments are conducted to demonstrate effectiveness of the proposed DFSN and DTCN frameworks. The corresponding results indicate that these two models achieve state-of-the-art performance in terms of segmentation and classification.
Keywords: COVID-19, Computed tomography, Dynamical fusion, Transfer learning
1. Introduction
The world is experiencing a global pandemic due to the outbreak of coronavirus disease 2019 (COVID-19). This outbreak is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). As reported by the Center for Systems Science and Engineering (as of June 26, 2020), there have been 9,586,769 confirmed cases and 488,824 deaths worldwide. Reverse-transcription polymerase chain reaction (RT-PCR) is the gold standard for diagnosis of COVID-19. However, RT-PCR is time-consuming and labour-intensive. Therefore, radiological imaging techniques, particularly chest radiography (CXR) and computed tomography (CT), have emerged as an important complement to RT-PCR.
Among the radiological imaging techniques, CT screening is widely researched due to its low energy consumption and the anatomical information it provides about the lungs. Recent studies [1,2] have demonstrated that the parenchymal imaging findings of COVID-19 pneumonia on CT scans include multi-focal peripheral ground-glass opacity (GGO) and pulmonary consolidation, which appear in the early and late stages of COVID-19, respectively. Therefore, automated evaluation of CT scans can be helpful in fighting against COVID-19.
Many deep learning methods have been proposed and have achieved excellent performance [3]. For example, Fan et al. [4] use a parallel partial decoder to aggregate high-level features, and propose a semi-supervised network to segment COVID-19 CT scans. In Ref. [5], COVID-Net is proposed to diagnose COVID-19 patients from CXR. Zhang et al. [6] propose an anomaly detection model to analyse CXRs in COVID-19. Most existing research focuses only on either segmenting lesions or classifying CT images, which may be insufficient for assisting clinicians. This is because clinicians first review CT images to identify any lesions and their locations, and then decide on the likely diagnosis based on the location and morphology of those lesions. This indicates that segmentation and classification tasks are closely interrelated in clinical practice, and that the segmentation task can provide necessary information for classification. Therefore, we propose a COVID-19 diagnosis methodology that employs both segmentation and classification tasks. However, as discussed in previous works, COVID-19 segmentation in CT scans is still a challenging task due to two issues: 1) multi-scale context information is not fully utilized [7]; 2) the high variance in texture, size and position of infections is challenging for segmentation, and the inter-class variance of lesions is small [4]. On the other hand, for classification, the already challenging overlap in appearance between the different causes of pneumonia is further complicated by limited image resolution [8].
To address the above issues, this work proposes the Dynamic Fusion Segmentation Network (DFSN) and the Dynamic Transfer-learning Classification Network (DTCN); Fig. 1 provides an overview of the proposed networks. On the one hand, segmentation-related issues are addressed by two modules in DFSN, namely the inter-stage and intra-stage fusion modules. To address the first issue of utilizing context information, several inter-stage fusion modules are employed to combine low-level features with high-level ones, thus providing a global information guidance flow for generating segmentation results. In contrast to methods that use skip-connections to fuse features from symmetrical layers (i.e., layers that have the same order when counted from the front and from the end of the network) [7,9], the inter-stage fusion module takes features from unsymmetrical layers as inputs. It adopts pixel-wise convolution layers to estimate such guidance. Besides, high-level features are dynamically fused by an intra-stage module to address issues of lesion characteristics (e.g., texture, size and position). This module first uses several large-kernel convolution layers to estimate location-specific fusion weights, then uses these weights to fuse features related to different lesions. Based on DFSN, a transfer-learning-based network (i.e., DTCN) is then proposed to address the issue of distinguishing signs of diseases. Concretely, DTCN first utilizes a DFSN with transferred weights to extract pixel-level information, then adaptively selects this information according to clinical knowledge, and finally generates image labels. In this way, parenchymal changes due to infection can be comprehensively estimated, and by considering their clinical meaning, accurate classification can be achieved.
Fig. 1.
The proposed DFSN is trained to generate semantic segmentation results accurately. In the proposed DTCN, DFSN is transferred as the backbone and fine-tuned in datasets with different distributions.
The main contributions of this work can be summarized as follows:
● We propose a novel Dynamic Fusion Segmentation Network (DFSN) for COVID-19 segmentation. Inside this network, inter-stage and intra-stage fusion modules are employed to fuse multi-scale context information and semantic information. To the best of our knowledge, this is the first work to consider differences of low-level and high-level features for designing fusion methodologies.
● Based on DFSN, we further introduce a novel Dynamic Transfer-learning Classification Network (DTCN) for classifying COVID-19 patients. Inside DTCN, a pre-trained DFSN is transferred as the backbone to extract pixel-level semantic information. Through systematic experiments, we also demonstrate the clinical significance of pixel-level information and its importance for COVID-19 diagnosis. To the best of our knowledge, this is the first work to consider the clinical significance of segmentation results for designing deep networks.
● We evaluate the proposed methods through comprehensive experiments. The results demonstrate that our methods achieve state-of-the-art performance in terms of segmentation and classification.
2. Related work
In this section, we briefly review several studies closely related to this work, covering segmentation in CT, classification in CT and deep learning for COVID-19.
2.1. Segmentation in CT
Compared with RT-PCR, CT imaging is a more popular technique for the diagnosis and assessment of lung diseases [10,11]. As observed in Refs. [12,13], by segmenting organs and lesions from CT scans, doctors can quickly obtain information crucial for diagnosing lung diseases. Following this observation, many segmentation methods have been proposed and have achieved significant performance. For example, in Ref. [14], a support vector machine (SVM) classifier is used to segment lung nodules from CT images. Shen et al. [15] present an automated lung segmentation system based on the bidirectional chain code. Though this system achieves better performance than its counterparts, it still suffers from the similar visual appearance of nodules and other tissues. To address this issue, methods based on convolutional neural networks are widely studied. Wang et al. [16] introduce a central focused convolutional neural network to extract nodule-sensitive features from both 2D and 3D CT images. Jin et al. [17] utilize a generative adversarial network (GAN) with a multi-mask reconstruction loss to improve the robustness of the progressive holistically nested network (P-HNN). According to their experiments, additional data synthesized by the GAN is beneficial to overall segmentation performance. Jiang et al. [18] add multiple residual streams of varying resolutions and propose two networks to segment lung tumours. To further capture global and multi-scale context information, Feng et al. [7] combine two pyramid convolutional modules and propose the Context Pyramid Fusion Network (CPFNet). In this work, we further exploit such context information by employing two different feature fusion modules. Additionally, via comprehensive experiments, we demonstrate that, for encoder-decoder networks such as UNet and CPFNet, fusing features from the encoder helps locate infected regions, and fusing features from the decoder helps estimate pixel labels.
2.2. Classification in CT
Apart from segmenting organs and lesions on CT scans, computer-aided diagnosis (CAD) systems are also thought to be useful for classifying multi-category CT images. For example, Li et al. [19] design a 2D convolutional neural network (CNN) to distinguish CT images in three categories, i.e., COVID-19 pneumonia, community acquired pneumonia (CAP) and non-pneumonia. Specifically, input CT slices are fed through a pre-trained ResNet50 [20] to extract features. Then these features are combined and fed through a fully connected layer. In Ref. [8], Kang et al. conduct the classification task with multi-view representation learning, which is achieved by encoding information from different aspects of features. In contrast, Wang et al. [21] combine two 3D-ResNets [20] and use a prior-attention strategy to guide them to learn more discriminative representations for pneumonia-type classification. In Ref. [22], a 3D segmentation model is first used to segment lesion locations. Then, all locations are separately classified, and the overall classification result is obtained by using the Noisy-or-Bayesian function. In this work, we conduct a three-category classification task by transfer learning, which is achieved by fine-tuning a pre-trained segmentation model (i.e., DFSN). In addition, through systematic experiments, we observe that the clinical significance of segmentation results is vital for classifying COVID-19 CT scans.
2.3. Deep learning for COVID-19
Deep learning methods have been widely employed in many CAD systems for COVID-19 [3]. Joseph et al. [23] categorize these methods into three classes: patient scale (e.g., medical imaging for diagnosis), molecular scale (e.g., protein structure prediction) and population scale (e.g., epidemiology). In this section, we briefly discuss patient-scale methods. In Ref. [24], a Spatial Transformer Network is proposed to predict disease severity in lung ultrasonography and localize pathological signatures. In Ref. [25], FC-DenseNet is first used to provide segmentation results. Based on these results, different patches of a CXR are cropped and fed into different classification networks. Then, the final classification result is obtained by majority voting over the results of the cropped patches. Wu et al. [26] propose a Joint Classification and Segmentation (JCS) system to perform classification and segmentation of COVID-19 CT images. However, the JCS system performs segmentation only if the classification result is a COVID-19 prediction. In Ref. [27], the authors propose an AI system that conducts classification, segmentation and quantitative measurement tasks. Within this system, input images are initially put through a segmentation network to obtain lung-lesion maps. Then, the maps are taken as the input of a classification network for generating image labels, which are finally combined with clinical metadata to make quantitative measurements. Similar to Ref. [26], the authors of Ref. [28] propose a multi-task model, which contains parallel classification and segmentation branches and is supervised by a multi-task loss function.
Besides, several network architectures and mechanisms have been considered in developing COVID-19 diagnosis models. For example, given an input image, the method in Ref. [9] uses an encoder and a recurrent decoder to provide pixel-level diagnosis, while ResNet [20] and DenseNet [29] both utilize residual blocks and skip connections to make image-level classifications. However, because high-level features tend to lose details of the input image, the above methods are prone to failure on complicated imaging data. To solve this issue, mechanisms such as attention [30], multi-view representation learning [8] and semi-supervision [4] are incorporated into the above models. By contrast, in this work, we first explore the characteristics of multi-level features, and propose separate fusion blocks for features at different levels. As a result, the proposed DFSN can segment CT images more accurately. In addition, DFSN is used as the basis for transfer learning, which aims at diagnosing CT images. During this process, medical knowledge of different lesions is used to shape the model architecture.
3. Proposed method
In this Section, we first introduce the proposed Dynamic Fusion Segmentation Network (DFSN). Then, we discuss the motivation of integrating transfer learning and feature selection within a joint framework, which we coin Dynamic Transfer-learning Classification Network (DTCN).
3.1. Dynamic Fusion Segmentation Network
The architecture of the proposed DFSN is shown in Fig. 2. A CT image is first stacked into a three-dimensional tensor to alleviate the rapid increase of channels. After that, the three-dimensional tensor is fed through a U-shape model to extract features with different channels and resolutions. The tensor is first fed to an encoder (the five blocks on the left) to extract context information. Then encoder-generated features are fed to a decoder (the five blocks on the right) to obtain semantic information. Inside the encoder and decoder, feature maps are processed by several max pooling and convolution layers. Despite its demonstrated ability to estimate context information, this architecture blurs fine details such as lesion boundaries, leading to poor segmentation performance [7]. Inspired by existing methods [4], features at different levels are fused to avoid this problem. In addition, by considering the characteristics of multi-level features [31], two feature fusion modules are designed. For convenience, they are named the inter-stage and intra-stage fusion modules, and are detailed in the following subsections.
Fig. 2.
The overall architecture of the proposed DFSN.
3.1.1. Inter-stage fusion module
As discussed in Ref. [31], the encoder can learn context information, including boundaries and category characteristics of objects. However, this information may be progressively weakened after going through deep layers. Besides, the simple skip-connection between the encoder and decoder is an indiscriminate combination, which always introduces irrelevant clutter [7]. Therefore, in this work, we propose the inter-stage fusion module to better use context information.
Fig. 3 shows the proposed inter-stage fusion module, which has three different inputs, namely pooling indices from max-pooling layers, features from the encoder and features from the decoder. For convenience, these three inputs are denoted as P_I, F_e and F_d, respectively. Inside this module, P_I is used to magnify the resolution of F_d, which contains more semantic information than F_e [32]. After that, F_d is concatenated with F_e, thus integrating the context information extracted by the encoder and the semantic information estimated by the decoder. Three local fusion blocks, each comprising pixel-wise convolution, batch-normalization and ReLU layers, then pass this semantic and contextual information to the decoder. The proposed module differs from Refs. [4,9,29] for the following reasons: 1) The fused features are different. To be specific, Ronneberger et al. [9] fuse the encoder and decoder features in a balanced manner (e.g., in Fig. 2, features generated by the left-upper block are combined with those of the right-upper block), while the proposed module fuses features transported by the purple arrows. 2) The fusion methods are different. The method proposed in Ref. [9] directly adds different features, and the model proposed in Ref. [7] fuses features through dilated convolution layers with 3 × 3 kernels. In contrast, the proposed module only uses pixel-wise convolution layers with a 1 × 1 kernel. Comprehensive experiments are conducted and presented in Section 4.3.1, which demonstrate the effectiveness of the inter-stage fusion module. According to the results, pixel-wise convolution layers can effectively utilize context information.
Fig. 3.
The architecture of the proposed inter-stage fusion module has three inputs. It takes several pixel-wise convolution layers to fuse these inputs.
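The data flow described for the inter-stage fusion module (un-pool the decoder features with the stored pooling indices, concatenate them with encoder features, then apply three pixel-wise fusion blocks) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name, the channel sizes, the 2 × 2 pooling window and the requirement that the encoder features already match the un-pooled resolution are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class InterStageFusion(nn.Module):
    """Illustrative inter-stage fusion block (names and sizes are assumptions)."""

    def __init__(self, dec_channels: int, enc_channels: int, out_channels: int):
        super().__init__()
        # Un-pooling reuses the indices saved by an encoder max-pooling layer.
        # A 2x2 window with stride 2 is assumed here.
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        layers = []
        in_ch = dec_channels + enc_channels
        for _ in range(3):  # three local fusion blocks
            layers += [
                nn.Conv2d(in_ch, out_channels, kernel_size=1, bias=False),  # pixel-wise conv
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            ]
            in_ch = out_channels
        self.fuse = nn.Sequential(*layers)

    def forward(self, dec_feat, enc_feat, pool_indices):
        # Magnify the decoder features to the encoder feature resolution.
        target_size = list(enc_feat.shape[-2:])
        dec_up = self.unpool(dec_feat, pool_indices, output_size=target_size)
        # Concatenate semantic (decoder) and context (encoder) information.
        fused = torch.cat([dec_up, enc_feat], dim=1)
        return self.fuse(fused)
```

Because only 1 × 1 convolutions are used, each output pixel depends on the concatenated features at that same location; any spatial context has to come from the features themselves rather than from the fusion kernels.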
3.1.2. Intra-stage fusion module
As mentioned in the introduction, infection characteristics such as texture, size and position are highly variable. At the same time, the inter-class variance of lesions is small. However, semantic edge information in high-level features helps locate lesions and can provide useful constraints to guide label estimation for segmentation [4,33,34,35]. Thus, in this subsection, we propose an intra-stage fusion module to generate segmentation results dynamically.
Four feature maps from the decoder are fused by the proposed intra-stage fusion module, whose architecture is shown in Fig. 4. As the resolutions and channels of these inputs differ from each other, they are first processed by feature normalization blocks with different parameters. Each feature normalization block contains a pixel-wise convolution layer, a batch-normalization layer and a transposed convolution layer. The former two layers aim at reducing feature channels and covariate shift [36], while the last one recovers feature resolution. Then, to generate location-adaptive fusion weights, the output features of Feature Normalization-7 are further fed into the Adaptive Weight Learner proposed in Ref. [32]. Specifically, this learner is formed by three groups of convolution, batch normalization and ReLU layers. A notable difference is that, in this work, the kernel size of these convolution layers is set to 5 × 5. This is because lesion contours are blurrier than boundaries in healthy tissues [7], and compared with pixel-wise convolution, convolution layers with large receptive fields (e.g., large kernels) are more robust to blurry contours [37]. After that, the output features of the Adaptive Weight Learner and of the bottom four feature normalization layers are reshaped into two 4D tensors by different methods. Specifically, the output features of the Adaptive Weight Learner are reshaped according to channel numbers. In contrast, the features of Feature Normalization-5 are first concatenated with the other features, and these four concatenated features are then further concatenated to obtain the 4D tensor (indicated by coloured rectangles in Fig. 4). Finally, these two 4D tensors are multiplied to generate segmentation results (i.e., the right-bottom tensor). Overall, the intra-stage fusion module dynamically fuses features through the Adaptive Weight Learner, whose kernel sizes differ from those in Ref. [32]. To demonstrate the influence of this different configuration, several experiments are conducted in Section 4.3.1.
Fig. 4.
The architecture of the proposed intra-stage fusion module. Multi-level features from the decoder are taken as input and processed by different feature normalization blocks. After that, the output features of these blocks are dynamically fused.
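The location-adaptive fusion performed by the intra-stage module can be sketched as follows. This is a simplified illustration under stated assumptions rather than the published architecture: every feature normalization block is assumed to output one channel per class at full resolution, the weight learner is assumed to take the deepest feature as its source, and the multiplied tensors are assumed to be summed over the four levels to give per-class score maps. All names (`feature_normalization`, `IntraStageFusion`, `hidden`) are hypothetical.

```python
import torch
import torch.nn as nn


def feature_normalization(in_ch: int, out_ch: int, scale: int) -> nn.Sequential:
    """Pixel-wise conv and batch norm, then a transposed conv that upsamples by `scale`."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ConvTranspose2d(out_ch, out_ch, kernel_size=scale, stride=scale),
    )


class IntraStageFusion(nn.Module):
    """Illustrative dynamic fusion of multi-level decoder features."""

    def __init__(self, in_channels, scales, num_classes=4, hidden=16):
        super().__init__()
        self.num_classes = num_classes
        self.num_levels = len(in_channels)
        self.norms = nn.ModuleList(
            [feature_normalization(c, num_classes, s) for c, s in zip(in_channels, scales)]
        )
        # Separate normalization block feeding the weight learner (assumed source).
        self.weight_source = feature_normalization(in_channels[-1], hidden, scales[-1])
        chans = [hidden, hidden, hidden, num_classes * self.num_levels]
        learner = []
        for i in range(3):  # three conv / batch-norm / ReLU groups with 5x5 kernels
            learner += [
                nn.Conv2d(chans[i], chans[i + 1], kernel_size=5, padding=2, bias=False),
                nn.BatchNorm2d(chans[i + 1]),
                nn.ReLU(inplace=True),
            ]
        self.weight_learner = nn.Sequential(*learner)

    def forward(self, feats):
        batch = feats[0].shape[0]
        normed = [norm(f) for norm, f in zip(self.norms, feats)]
        height, width = normed[0].shape[-2:]
        # Features arranged as (batch, classes, levels, H, W).
        stacked = torch.stack(normed, dim=2)
        # One fusion weight per class, level and pixel location.
        weights = self.weight_learner(self.weight_source(feats[-1]))
        weights = weights.reshape(batch, self.num_classes, self.num_levels, height, width)
        # Multiply and reduce over levels (the reduction is an assumption).
        return (stacked * weights).sum(dim=2)
```

The 5 × 5 kernels mean each fusion weight is estimated from a neighbourhood rather than a single pixel, which is the property the text relies on for blurry lesion contours.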
3.2. Dynamic Transfer-learning classification network
Fig. 5 provides an overview of the proposed Dynamic Transfer-learning Classification Network (DTCN), which uses a pre-trained semantic segmentation network to conduct the image classification task. To be specific, inside DTCN, a pre-trained DFSN is taken as the backbone to extract pixel-level information (i.e., segmentation results represented by features). Then, these extracted features are adaptively selected according to medical knowledge. Finally, an adaptive pooling layer reduces feature resolution, and a fully connected layer generates the category of the input image. The motivation for DTCN comes from the observation that, during diagnosis, clinicians first scout CT images, then locate lesion regions, and finally make decisions according to the appearances and contours of the detected regions. Thus, the goal of the first step is similar to the semantic segmentation task that predicts pixel-level categories, and the other steps aim at making decisions. In addition, the above methodology can also be applied to other medical images, which suggests that the proposed DTCN may be instructive for other tasks. Therefore, semantic segmentation methods can provide classification models with pixel-level information [22]. However, whether the semantic segmentation methods should be pre-trained and whether segmentation results are beneficial to classification remain unclear. In the following subsections, we discuss these two questions by detailing the proposed DTCN.
Fig. 5.
The overall architecture of the proposed DTCN. The DTCN, with a CT scan as input, makes a classification using the transferred DFSN, feature selection layer, adaptive pooling layer, and the fully connected layer.
3.2.1. Joint segmentation and classification by transfer learning
It is widely known that semantic segmentation aims at estimating the category of each pixel, and image classification aims at obtaining image-level categories. Therefore, these two tasks are closely related to one another, and their complementary relationship has been explored in medical image analysis. To be specific, Wu et al. [26] jointly consider the results of semantic segmentation and classification tasks during diagnosis. Zhang et al. [27] first utilize a segmentation network to generate lung-lesion features, then provide prognosis analysis by considering the lung-lesion features and clinical metadata. However, the method proposed in Ref. [27] requires not only numerous manually segmented images, but also multi-modal data such as clinical metadata, and meeting these two requirements is difficult and time-consuming. In addition, as discussed in Refs. [38,39,40], fine-tuning pre-trained models on datasets with different distributions can improve the robustness and effectiveness of the learned feature space. Therefore, a pre-trained segmentation network is needed to provide pixel-level information.
3.2.2. Feature selection
Fig. 5 shows that the output features of DFSN form a 4D feature map. A simple method is to directly take this 4D feature map to make classifications. However, for a segmentation network (i.e., DFSN), the values at each pixel represent the probabilities of specific classes, and the category of a single pixel is obtained from the channel with the highest probability. Therefore, each channel contains information about a particular class, and not all channels are beneficial for diagnosing CT images. For example, when distinguishing COVID-19 CT scans, channels related to the background (e.g., tissues around the lung) are useless. In this work, DFSN is trained for segmenting pixels of four categories, including background, GGO, consolidation and pleural effusion. Therefore, in the 4D feature maps, the feature at the second channel relates to consolidation, which is characterized by a homogeneous increase in lung parenchymal attenuation [41]. However, as indicated in Ref. [42], the most frequently observed features of COVID-19 pneumonia are bilateral involvement and peripheral distribution with GGO, rather than consolidation. Besides, consolidations may be small, and their appearances may be similar to adjacent structures, resulting in false-negative detection [4]. For these two reasons, segmentation of consolidation is likely to be less useful for classifying COVID-19 images. To verify this observation, comprehensive experiments are conducted and presented in Section 4.3.2.
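A minimal PyTorch sketch of the pipeline described in Section 3.2 (transferred backbone, channel selection, adaptive pooling, fully connected layer) is given below. It is an illustration under assumptions, not the authors' code: a 1 × 1 convolution stands in for DFSN, average pooling is assumed because the type of adaptive pooling is not stated, and `pooled_size` is a hypothetical parameter. The default `keep_channels=(0, 1, 3)` mirrors the DTCN row of Table 5 (1st, 2nd and 4th features), written with 0-based indices.

```python
import torch
import torch.nn as nn


def transfer_backbone(pretrained: nn.Module, fresh: nn.Module) -> nn.Module:
    """Copy the weights of a pre-trained segmentation network into a new instance."""
    fresh.load_state_dict(pretrained.state_dict())
    return fresh


class FeatureSelectClassifier(nn.Module):
    """Illustrative classifier head on top of a segmentation backbone."""

    def __init__(self, backbone, keep_channels=(0, 1, 3), num_classes=3, pooled_size=8):
        super().__init__()
        self.backbone = backbone  # stays trainable, so it can be fine-tuned
        self.register_buffer("keep", torch.tensor(keep_channels, dtype=torch.long))
        self.pool = nn.AdaptiveAvgPool2d(pooled_size)  # pooling type is an assumption
        self.fc = nn.Linear(len(keep_channels) * pooled_size * pooled_size, num_classes)

    def forward(self, x):
        seg = self.backbone(x)                      # (batch, 4, H, W) class score maps
        selected = seg.index_select(1, self.keep)   # drop the unselected channel(s)
        pooled = self.pool(selected)                # reduce spatial resolution
        return self.fc(pooled.flatten(1))           # image-level logits


# Usage with a stand-in backbone (a real DFSN would replace the 1x1 convolutions).
pretrained_seg = nn.Conv2d(3, 4, kernel_size=1)
backbone = transfer_backbone(pretrained_seg, nn.Conv2d(3, 4, kernel_size=1))
classifier = FeatureSelectClassifier(backbone)
logits = classifier(torch.randn(2, 3, 32, 32))
```

Changing `keep_channels` reproduces the other selection baselines of Table 5 in the same way, for example `(0, 1, 2, 3)` for the variant without selection.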
4. Experiments
4.1. Data and metrics
4.1.1. Data for segmentation
As discussed in Ref. [4], only the COVID-19 CT Segmentation dataset, which consists of 100 axial CT images from different COVID-19 patients, is publicly accessible. In this dataset, all images are collected by the Italian Society of Medical and Interventional Radiology. For identifying lung infections, all CT images are manually segmented by a radiologist, and all pixels are labeled with one of four categories, i.e., background, GGO, consolidation and pleural effusion. For comparing DFSN and other methods, all images are randomly split into two subsets. Table 1 summarizes these two subsets. As discussed in Ref. [4], though this is the first open-access dataset for lung infection segmentation, it still suffers from limited size and low-resolution images. Therefore, we also conduct 5-fold experiments to further compare DFSN and other methods.
Table 1.
Summary of the segmentation subsets.
| Subset | Background | GGO | Consolidation | Pleural Effusion |
|---|---|---|---|---|
| Train | 70 | 66 | 56 | 19 |
| Test | 30 | 30 | 22 | 6 |
| Total | 100 | 96 | 78 | 25 |
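The random 70/30 split summarized in Table 1 and the additional 5-fold protocol can be reproduced in spirit with the standard library alone. The sketch below is an assumption about the procedure: the paper does not state the random seed or how folds were formed, so `seed`, the function names and the use of integer image IDs are hypothetical.

```python
import random


def random_split(ids, n_train, seed=0):
    """Shuffle the IDs once and cut them into train and test subsets."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def k_fold_splits(ids, k=5, seed=0):
    """Yield (train, validation) ID lists for each of k disjoint folds."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation


image_ids = list(range(100))             # 100 axial CT images
train_ids, test_ids = random_split(image_ids, n_train=70)
fold_pairs = list(k_fold_splits(image_ids, k=5))
```

With 100 images and five folds, each image is used for validation exactly once, which partly compensates for the small size of the dataset.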
4.1.2. Data for classification
For comparing DTCN and other methods, we obtain a three-category dataset by combining the COVID-CT-Dataset [43] and the dataset proposed in Ref. [27]. This approach ensures there are no duplicated images in the classification and segmentation datasets. In total, this combined dataset contains 1136 images, which are labeled as non-COVID-19, COVID-19 and common pneumonia. Images of common pneumonia are randomly extracted from the dataset described in Ref. [27], while images of the other categories are taken from the COVID-CT-Dataset. As Table 2 shows, all images are split into train, validation and test subsets.
Table 2.
Summary of the classification subsets.
| Subset | Common Pneumonia | Non COVID-19 | COVID-19 |
|---|---|---|---|
| Train | 200 | 234 | 191 |
| Validation | 100 | 58 | 60 |
| Test | 100 | 105 | 98 |
| Total | 400 | 387 | 349 |
4.1.3. Metrics
In this work, the Intersection over Union (IoU), Dice coefficient (Dice), Precision (Pre) and Accuracy (Acc) are taken to evaluate performance of the proposed DFSN. For the three-category classification task, the Accuracy (Acc), Precision (Pre) and Recall (Rec) are used to verify the classification ability of DTCN.
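For reference, the listed metrics can be written in terms of true positives (TP), false positives (FP) and false negatives (FN) for a given class: IoU = TP / (TP + FP + FN), Dice = 2TP / (2TP + FP + FN), Precision = TP / (TP + FP) and Recall = TP / (TP + FN), while accuracy is the fraction of correctly labeled items. The sketch below computes them per class from flat label sequences; whether the paper averages over classes or images is not stated, so the per-class form is an assumption, and the function names are hypothetical.

```python
def confusion_counts(pred, target, cls):
    """Count TP, FP, FN and TN for one class over paired label sequences."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p == cls and t == cls:
            tp += 1
        elif p == cls:
            fp += 1
        elif t == cls:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    """Return 0.0 when the denominator is zero (class absent everywhere)."""
    return num / den if den else 0.0


def class_metrics(pred, target, cls):
    tp, fp, fn, _ = confusion_counts(pred, target, cls)
    return {
        "iou": safe_div(tp, tp + fp + fn),
        "dice": safe_div(2 * tp, 2 * tp + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
    }


def accuracy(pred, target):
    return safe_div(sum(p == t for p, t in zip(pred, target)), len(target))


# Toy example with six pixels and three classes.
pred = [1, 1, 0, 2, 1, 0]
target = [1, 0, 0, 2, 1, 1]
metrics_cls1 = class_metrics(pred, target, cls=1)
overall_acc = accuracy(pred, target)
```

For a single class, Dice and IoU are tied by Dice = 2·IoU / (1 + IoU), so the two rank predictions identically on one class and differ only once values are averaged.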
4.2. Implementation details
In this work, the proposed DFSN and DTCN are implemented on the PyTorch platform with an NVIDIA Tesla K80 GPU. For training DFSN, the stochastic gradient descent (SGD) optimizer is used, with momentum and learning rate set to 0.9 and 0.001, respectively. During training, the batch size is fixed to 1, and the learning rate is decayed every 10 epochs. All training batches are randomly rotated for data augmentation. In addition, to alleviate the class imbalance problem in segmentation, a weighted cross-entropy loss function is adopted, with class weights of 0.0013, 0.0261, 0.0501 and 1.000, respectively. On the other hand, the proposed DTCN is trained with the Adam optimizer [44] and an unweighted cross-entropy loss. In Adam, the first- and second-moment coefficients are fixed to 0.9 and 0.999, and the weight decay is set to 0.1. The learning rate is set to 0.00001, and is dynamically decayed according to the validation loss.
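The optimizer and loss settings above map onto standard PyTorch components roughly as sketched below. Several details are not given in the text and are assumptions here: the decay factor of the step schedule (`gamma=0.1`), the use of `ReduceLROnPlateau` with its `factor` and `patience` values for the validation-driven decay, and the order of the class weights (taken to follow background, GGO, consolidation, pleural effusion). The builder function names are hypothetical.

```python
import torch
import torch.nn as nn


def build_dfsn_training(model: nn.Module):
    """SGD, step decay every 10 epochs, weighted cross-entropy (segmentation)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    # gamma is an assumption; the text only says the rate decays every 10 epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
    # Assumed class order: background, GGO, consolidation, pleural effusion.
    class_weights = torch.tensor([0.0013, 0.0261, 0.0501, 1.0])
    criterion = nn.CrossEntropyLoss(weight=class_weights)
    return optimizer, scheduler, criterion


def build_dtcn_training(model: nn.Module):
    """Adam, validation-driven decay, unweighted cross-entropy (classification)."""
    optimizer = torch.optim.Adam(
        model.parameters(), lr=1e-5, betas=(0.9, 0.999), weight_decay=0.1
    )
    # The scheduler type, factor and patience are assumptions.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=5
    )
    criterion = nn.CrossEntropyLoss()
    return optimizer, scheduler, criterion
```

In a training loop, the step scheduler would be advanced once per epoch with `scheduler.step()`, whereas the plateau scheduler expects the monitored value, as in `scheduler.step(validation_loss)`.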
4.3. Model analysis
4.3.1. Analysis of segmentation
As discussed in Section 3.1, DFSN uses two modules to capture complementary relationships between multi-level features. Therefore, here we provide several baselines to demonstrate the effectiveness of the two fusion modules. Configurations of these baselines are shown in Table 3. Specifically, as both SegNet [45] and the proposed DFSN use max pooling and un-pooling layers to change feature resolutions, we take SegNet, rather than the widely used UNet [9], as the baseline without the two proposed fusion modules. For the Addi-Fusion (Addition-Fusion) baseline, features from the encoder are additionally fused by the inter-stage fusion module.
Table 3.
Configurations of segmentation baselines. For the Addition-Fusion baseline, features from encoder are additionally fused by the inter-stage fusion module.
| Baselines | Inter-Stage Fusion | Intra-Stage Fusion | Depth | Kernels (Inter) | Kernels (Intra) |
|---|---|---|---|---|---|
| SegNet [45] | – | – | – | – | – |
| Inter-Fusion | ✓ | – | 3 | 1-1-1 | – |
| Intra-Fusion | – | ✓ | 3 | – | 1-1-1 |
| Dual-Fusion | ✓ | ✓ | 3 | 1-1-1 | 1-1-1 |
| Addi-Fusion | ✓ | ✓* | 3 | 1-1-1 | 1-1-1 |
| Deeper-Fusion | ✓ | ✓ | 6 | 1-1-1-1-1-1 | 1-1-1-1-1-1 |
| Dual-Fusion-3 | ✓ | ✓ | 3 | 1-3-1 | 1-3-1 |
| Dual-Fusion-5 | ✓ | ✓ | 3 | 1-5-1 | 1-5-1 |
| Dual-Fusion-7 | ✓ | ✓ | 3 | 1-7-1 | 1-7-1 |
| Dual-Fusion-5+ | ✓ | ✓ | 3 | 5-5-5 | 1-1-1 |
| DFSN | ✓ | ✓ | 3 | 1-1-1 | 5-5-5 |
Experimental results of these baselines are shown in Table 4. Comparing the metrics of SegNet, Inter-Fusion, Intra-Fusion and Dual-Fusion demonstrates the effectiveness of the two fusion modules. On the one hand, the performance degradation of Addi-Fusion indicates that fusing multi-level features from the encoder is not beneficial for extracting context information. Similarly, the comparison between Deeper-Fusion and Dual-Fusion indicates that using deep modules to fuse multi-level features is ineffective. There are two reasons for this observation: 1) deep fusion modules inevitably increase the number of model parameters, which makes the model hard to train with limited images; 2) deep fusion modules can easily cause the vanishing gradient problem. On the other hand, features from different stages should be fused differently. That is, to capture multi-scale context information, pixel-wise convolutions are more effective than large-kernel convolutions. The latter, however, perform better at fusing semantic information.
Table 4.
Results of segmentation baselines. The best and second-best results are respectively denoted by red and blue colors.
| Baselines | IoU | Dice | Pre | Acc |
|---|---|---|---|---|
| SegNet | 0.74 | 0.50 | 0.46 | 0.87 |
| Inter-Fusion | 0.78 | 0.49 | 0.46 | 0.87 |
| Intra-Fusion | 0.76 | 0.52 | 0.46 | 0.90 |
| Dual-Fusion | 0.79 | 0.51 | 0.47 | 0.90 |
| Addi-Fusion | 0.79 | 0.46 | 0.42 | 0.83 |
| Deeper-Fusion | 0.77 | 0.51 | 0.46 | 0.90 |
| Dual-Fusion-3 | 0.77 | 0.47 | 0.44 | 0.84 |
| Dual-Fusion-5 | 0.78 | 0.53 | 0.49 | 0.92 |
| Dual-Fusion-7 | 0.77 | 0.49 | 0.46 | 0.84 |
| Dual-Fusion-5+ | 0.79 | 0.46 | 0.43 | 0.83 |
| DFSN | 0.80 | 0.53 | 0.49 | 0.90 |
4.3.2. Analysis of classification
Here we provide seven baselines to verify the performance of the transferred DFSN and feature selection layer. Table 5 provides a clear view of these seven baselines. Among them, maintained features of DFSN-Class124+ are additionally fused with the 2nd feature through a point-wise convolution layer. For the SegNet-Class, a pre-trained SegNet is taken as the backbone.
Table 5.
Configurations of classification baselines.
| Baselines | Transfer Learning | 1st Feature | 2nd Feature | 3rd Feature | 4th Feature |
|---|---|---|---|---|---|
| w/o transfer | – | ✓ | ✓ | – | ✓ |
| SegNet-Class | SegNet | ✓ | ✓ | – | ✓ |
| w/o selection | DFSN | ✓ | ✓ | ✓ | ✓ |
| DFSN-Class123 | DFSN | ✓ | ✓ | ✓ | – |
| DFSN-Class134 | DFSN | ✓ | – | ✓ | ✓ |
| DFSN-Class234 | DFSN | – | ✓ | ✓ | ✓ |
| DFSN-Class124+ | DFSN | ✓ | ✓ | – | ✓ |
| DTCN | DFSN | ✓ | ✓ | – | ✓ |
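
The configurations in Table 5 can be read as keep/drop masks over the four feature maps produced by the segmentation backbone. The following sketch encodes the DFSN-based rows of the table and applies one of them to a list of features. It is a static stand-in written for illustration: the name `select_features` and the string placeholders are assumptions, and the sketch does not reproduce the learned selection layer of DTCN.

```python
from typing import Dict, List, Sequence, Tuple

# Keep (True) or drop (False) the 1st..4th feature, copied from Table 5.
KEPT_FEATURES: Dict[str, Tuple[bool, bool, bool, bool]] = {
    "w/o selection": (True, True, True, True),
    "DFSN-Class123": (True, True, True, False),
    "DFSN-Class134": (True, False, True, True),
    "DFSN-Class234": (False, True, True, True),
    "DTCN": (True, True, False, True),
}


def select_features(features: Sequence, baseline: str) -> List:
    """Return only the feature maps that the given baseline keeps."""
    mask = KEPT_FEATURES[baseline]
    if len(features) != len(mask):
        raise ValueError("expected one feature map per mask entry")
    return [f for f, keep in zip(features, mask) if keep]


# Placeholder strings stand in for real feature tensors.
features = ["f1", "f2", "f3", "f4"]
print(select_features(features, "DTCN"))  # keeps the 1st, 2nd and 4th feature
```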
Table 6 presents the results of DTCN and the proposed baselines. Comparing w/o transfer with DTCN shows that the pre-trained DFSN remarkably improves the overall performance. The results of SegNet-Class and DTCN also demonstrate the effectiveness of the transferred DFSN. The five baselines related to the feature selection layer indicate that features related to consolidation are unhelpful for classification. In detail, comparing the configurations of w/o selection, DFSN-Class123, DFSN-Class134, DFSN-Class234 and DTCN shows that features related to GGO are the most important, followed by features related to pleural effusion. In addition, the comparison between DFSN-Class124+ and DTCN further demonstrates the effectiveness of the proposed feature selection layer. Overall, these comparisons demonstrate that: 1) using a pre-trained segmentation network as the backbone of a classification model is a promising approach; 2) segmentation results should be considered according to their clinical significance.
Table 6.
Results of classification baselines. In the Pre and Rec columns, values in the three sub-columns are metrics related to certain categories (from left to right, non-COVID-19, COVID-19 and common pneumonia). The best and second best results are respectively denoted by red and blue colors.
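| Baselines | Acc | Pre (non-COVID-19) | Pre (COVID-19) | Pre (common pneumonia) | Rec (non-COVID-19) | Rec (COVID-19) | Rec (common pneumonia) |
|---|---|---|---|---|---|---|---|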
| w/o transfer | 0.64 | 0.55 | 0.00 | 0.75 | 0.90 | 0.00 | 0.98 |
| SegNet-Class | 0.40 | 0.63 | 0.31 | 0.00 | 0.49 | 0.70 | 0.00 |
| w/o selection | 0.74 | 0.70 | 0.66 | 0.82 | 0.69 | 0.53 | 0.99 |
| DFSN-Class123 | 0.65 | 0.56 | 0.00 | 0.78 | 0.93 | 0.00 | 1.00 |
| DFSN-Class134 | 0.47 | 0.37 | 0.71 | 0.00 | 0.78 | 0.60 | 0.00 |
| DFSN-Class234 | 0.56 | 0.00 | 0.67 | 0.51 | 0.00 | 0.72 | 1.00 |
| DFSN-Class124+ | 0.70 | 0.62 | 0.75 | 0.78 | 0.88 | 0.21 | 0.98 |
| DTCN | 0.77 | 0.73 | 0.76 | 0.82 | 0.76 | 0.57 | 0.98 |
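
As a reading aid for the Acc, Pre and Rec columns, the sketch below derives overall accuracy and per-class precision and recall from a three-class confusion matrix. The counts are hypothetical and were chosen so that the middle class is never predicted, which reproduces the pattern of 0.00 entries seen for some baselines. Reporting a precision of 0.0 for a class that is never predicted is a convention assumed here, not one stated in the paper.

```python
from typing import List, Tuple


def per_class_metrics(
    confusion: List[List[int]],
) -> Tuple[float, List[float], List[float]]:
    """confusion[i][j] counts samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    accuracy = correct / total if total else 0.0

    precision, recall = [], []
    for c in range(n):
        predicted_c = sum(confusion[i][c] for i in range(n))  # column sum
        actual_c = sum(confusion[c])                          # row sum
        tp = confusion[c][c]
        # Assumed convention: an empty denominator yields 0.0.
        precision.append(tp / predicted_c if predicted_c else 0.0)
        recall.append(tp / actual_c if actual_c else 0.0)
    return accuracy, precision, recall


# Hypothetical counts; rows/columns: non-COVID-19, COVID-19, common pneumonia.
confusion = [
    [9, 0, 1],
    [6, 0, 4],
    [0, 0, 10],
]
acc, pre, rec = per_class_metrics(confusion)
print(round(acc, 2), [round(p, 2) for p in pre], [round(r, 2) for r in rec])
```

In this example the classifier never outputs the COVID-19 label, so both the precision and the recall of that class are zero even though the overall accuracy stays above 0.6.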
4.4. Comparisons with other methods
4.4.1. Comparison of segmentation
In this subsection, the proposed DFSN is compared with several segmentation networks. As Table 7 shows, the compared methods include classical networks such as SegNet [45], DeepLab v2 [46] and DeepLab v3 [47]. Furthermore, five segmentation networks designed for medical images and their variants are also compared. UNet [9] uses an encoder-decoder architecture to learn semantic information and skip connections to fuse features from the encoder and decoder. Based on UNet, Attention UNet [30] and R2UNet [48] are obtained by employing attention mechanisms and residual blocks, respectively. Combining these two models yields Attention R2UNet, which has shown better segmentation performance. InfNet-ResNet and InfNet-VggNet are variants of InfNet [4]. Because most of the compared methods were originally trained on different datasets, all compared methods and DFSN are trained under the same protocol for a fair comparison.
Table 7.
Results of compared segmentation methods. The best and second-best results are respectively denoted by red and blue colors.
| Methods | IoU | Dice | Pre | Acc |
|---|---|---|---|---|
| SegNet [45] | 0.74 | 0.50 | 0.46 | 0.87 |
| UNet [9] | 0.47 | 0.52 | 0.61 | 0.96 |
| DeepLab v2 [46] | 0.59 | 0.42 | 0.39 | 0.85 |
| DeepLab v3 [47] | 0.68 | 0.40 | 0.37 | 0.76 |
| Attention UNet [30] | 0.75 | 0.49 | 0.43 | 0.89 |
| R2UNet [48] | 0.31 | 0.24 | 0.25 | 0.83 |
| Attention R2UNet a | 0.47 | 0.34 | 0.33 | 0.79 |
| InfNet-ResNet [4] | 0.72 | 0.45 | 0.40 | 0.87 |
| InfNet-VggNet [4] | 0.76 | 0.38 | 0.35 | 0.87 |
| Triage [28] | 0.75 | 0.47 | 0.44 | 0.85 |
| CopleNet [49] | 0.74 | 0.50 | 0.48 | 0.88 |
| DFSN | 0.80 | 0.53 | 0.49 | 0.90 |
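
For readers less familiar with the four column headers, the following sketch computes IoU, Dice, precision and accuracy for a single binary mask from pixel-wise counts. It is a minimal illustration with hypothetical flattened masks. The tables report multi-class results, and the way the scores are aggregated over classes is not described in this section, so values produced by this sketch are not directly comparable with the tabulated ones.

```python
from typing import Dict, Sequence


def binary_seg_metrics(pred: Sequence[int], target: Sequence[int]) -> Dict[str, float]:
    """Pixel-wise metrics for one binary mask (1 = lesion, 0 = background)."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)

    def ratio(num: int, den: int) -> float:
        # Assumed convention: an empty denominator yields 0.0.
        return num / den if den else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "pre": ratio(tp, tp + fp),
        "acc": ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical flattened masks of eight pixels.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
target = [1, 0, 0, 0, 1, 1, 0, 0]
print(binary_seg_metrics(pred, target))
```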
Comparison results are shown in Table 7, Table 8 and Fig. 6. As can be seen from Table 7, the proposed DFSN achieves the best performance in terms of IoU and Dice. For the other metrics (i.e., Pre and Acc), DFSN also achieves state-of-the-art performance. In detail, DFSN outperforms UNet-based methods such as Attention UNet, R2UNet and Attention R2UNet by about 6%. This demonstrates that the proposed fusion modules are more effective than the attention mechanisms used in Refs. [30,48]. Further, the comparison between SegNet and DFSN shows that context information is beneficial to segmentation, and that effectively utilizing such information can remarkably improve segmentation performance. As Table 8 shows, in the 5-fold experiment the proposed DFSN has the smallest standard deviation, which indicates that DFSN is more robust than the compared methods. On the other hand, compared with recent methods such as Triage [28], JCS [26] and CopleNet [49], DFSN outperforms them not only on most metrics but also in terms of standard deviation. These comparison results demonstrate that DFSN achieves comparable performance and better stability than recent works. Fig. 6 further demonstrates the effectiveness of DFSN: compared with other methods, the results of DFSN are closer to the ground truth. However, this figure also indicates that all methods wrongly classify some background pixels as consolidation or pleural effusion. There are two reasons for this phenomenon: 1) the adopted segmentation dataset is highly unbalanced (see Table 1); 2) most of the CT images contain not only a large number of dark regions (e.g., the corners of the bottom image), but also a broad and complex variation of tissues (e.g., the corners of the upper image).
Table 8.
The 5-fold results of compared segmentation methods. The best and second-best results are respectively denoted by red and blue colors. Each metric is shown together with the standard deviation.
| Methods | IoU | Dice | Pre | Acc |
|---|---|---|---|---|
| SegNet [45] | 0.67 ± 0.13 | 0.47 ± 0.08 | 0.43 ± 0.07 | 0.86 ± 0.04 |
| UNet [9] | 0.71 ± 0.07 | 0.51 ± 0.03 | 0.50 ± 0.06 | 0.90 ± 0.05 |
| DeepLab v2 [46] | 0.60 ± 0.08 | 0.37 ± 0.10 | 0.36 ± 0.06 | 0.72 ± 0.16 |
| DeepLab v3 [47] | 0.68 ± 0.06 | 0.50 ± 0.04 | 0.45 ± 0.04 | 0.88 ± 0.04 |
| Attention UNet [30] | 0.75 ± 0.03 | 0.51 ± 0.06 | 0.47 ± 0.06 | 0.88 ± 0.04 |
| R2UNet [48] | 0.47 ± 0.11 | 0.32 ± 0.06 | 0.36 ± 0.07 | 0.80 ± 0.08 |
| InfNet-ResNet [4] | 0.68 ± 0.09 | 0.47 ± 0.06 | 0.42 ± 0.05 | 0.86 ± 0.03 |
| InfNet-VggNet [4] | 0.70 ± 0.03 | 0.51 ± 0.06 | 0.48 ± 0.07 | 0.88 ± 0.03 |
| Triage [28] | 0.63 ± 0.06 | 0.50 ± 0.04 | 0.50 ± 0.04 | 0.91 ± 0.02 |
| JCS [26] | 0.41 ± 0.06 | 0.35 ± 0.08 | 0.36 ± 0.08 | 0.88 ± 0.04 |
| CopleNet [49] | 0.64 ± 0.06 | 0.47 ± 0.05 | 0.46 ± 0.05 | 0.86 ± 0.08 |
| DFSN | 0.73 ± 0.01 | 0.52 ± 0.03 | 0.48 ± 0.02 | 0.88 ± 0.02 |
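
Each entry of Table 8 has the form mean ± standard deviation over five folds. The sketch below shows how such an entry could be produced from per-fold scores. The fold values are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since this section does not state whether the population or the sample estimate was used.

```python
from statistics import mean, pstdev
from typing import Sequence


def format_summary(scores: Sequence[float]) -> str:
    """Summarize per-fold scores as 'mean ± std' with two decimals."""
    # Assumption: population standard deviation over the folds.
    return f"{mean(scores):.2f} ± {pstdev(scores):.2f}"


# Hypothetical per-fold IoU values for a 5-fold experiment.
fold_iou = [0.72, 0.74, 0.73, 0.72, 0.74]
print(format_summary(fold_iou))
```

A small spread across folds, as in this example, is what the robustness argument for DFSN relies on.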
Fig. 6.
Qualitative results of compared segmentation methods and the proposed DFSN. The red, green and blue pixels indicate GGO, consolidation and pleural effusion, respectively. Best viewed in color.
4.4.2. Comparison of classification
To demonstrate the performance of the proposed DTCN, several classification models are compared. Table 9 provides quantitative results of the compared models and DTCN. Among these models, COVNet [19] and DarkCovidNet [53] are recently published methods for classifying COVID-19 images. DenseNet [29] and ResNext-50 [52] generate the classification result using stacked convolution layers and skip connections, which are widely used in both medical image analysis and traditional image classification. Other methods, e.g., EfficientNet [50], rely on efficient network design, such as multiple branches and convolution layers with different kernel sizes.
Table 9.
Results of compared classification methods. In the Pre and Rec columns, values in the three sub-columns are metrics related to certain categories (from left to right, non-COVID-19, COVID-19 and common pneumonia). The best and second-best results are respectively denoted by red and blue colors.
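| Methods | Acc | Pre (non-COVID-19) | Pre (COVID-19) | Pre (common pneumonia) | Rec (non-COVID-19) | Rec (COVID-19) | Rec (common pneumonia) |
|---|---|---|---|---|---|---|---|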
| EfficientNet [50] | 0.65 | 0.71 | 0.51 | 0.80 | 0.19 | 0.79 | 0.99 |
| DenseNet [29] | 0.72 | 0.64 | 0.65 | 0.82 | 0.65 | 0.52 | 0.98 |
| Inception-v3 [51] | 0.37 | 0.41 | 0.34 | 0.40 | 0.27 | 0.53 | 0.33 |
| ResNext-50 [52] | 0.76 | 0.72 | 0.73 | 0.81 | 0.74 | 0.57 | 0.96 |
| DarkCovidNet [53] | 0.67 | 0.59 | 0.57 | 0.79 | 0.66 | 0.36 | 1.00 |
| Jin et al. [54] | 0.72 | 0.66 | 0.63 | 0.83 | 0.72 | 0.43 | 1.00 |
| AdderNet [55] | 0.52 | 0.52 | 0.45 | 0.58 | 0.43 | 0.40 | 0.75 |
| GhostNet [56] | 0.59 | 0.49 | 0.47 | 0.77 | 0.16 | 0.68 | 0.96 |
| Res2Net [57] | 0.65 | 0.58 | 0.51 | 0.81 | 0.56 | 0.42 | 0.98 |
| Res2Next [57] | 0.70 | 0.63 | 0.64 | 0.79 | 0.63 | 0.49 | 0.97 |
| COVNet [19] | 0.76 | 0.73 | 0.69 | 0.82 | 0.71 | 0.55 | 1.00 |
| DTCN | 0.77 | 0.73 | 0.76 | 0.82 | 0.76 | 0.57 | 0.98 |
As Table 9 indicates, the proposed DTCN achieves the best performance in terms of Acc, Pre and Rec. Concretely, DTCN outperforms COVNet and DarkCovidNet by 0.01 and 0.10 in terms of Acc, respectively. For the first and second categories, the performance of DTCN indicates that pixel-level information aids the classification of non-COVID-19 and COVID-19 CT scans. Thus DTCN can effectively locate lesions (e.g., GGO and pleural effusion) and make more accurate classifications than its counterparts. However, when detecting COVID-19 CT scans, the Rec of DTCN is worse than that of EfficientNet, which indicates that DTCN makes more false-negative classifications than EfficientNet. Analysis of the classification baselines of DTCN, together with DTCN and EfficientNet, suggests that this is because segmentation results related to the background frequently introduce irrelevant information. In addition, comparing DTCN with recent models shows that DTCN outperforms most of them. Specifically, compared with novel models such as AdderNet [55], GhostNet [56] and Res2Net [57], DTCN achieves better performance with limited parameters and a simple architecture, owing to transfer learning and feature selection. DTCN also achieves performance comparable to recent works proposed for medical image analysis, e.g., Jin et al. [54] and COVNet [19]. Compared with [54], the proposed DTCN is slightly worse in terms of Rec. However, the middle columns of Pre and Rec show that DTCN distinguishes COVID-19 images more accurately than [54]. Compared with COVNet [19], DTCN also achieves better performance in terms of Acc and Pre. On the Rec metric, although DTCN does not outperform COVNet in classifying images of common pneumonia, it still achieves better performance in classifying images of non-COVID-19 and COVID-19.
5. Conclusion
In this work, we propose a novel Dynamic Fusion Segmentation Network (DFSN), which adopts two fusion modules to improve the identification of lesion regions. Moreover, we design a transfer-learning-based network, the Dynamic Transfer-learning Classification Network (DTCN), to distinguish COVID-19 CT images from non-COVID-19 and common pneumonia scans. This network employs a pre-trained segmentation network to extract pixel-level information (i.e., segmentation results) and makes classifications according to selected pixel-level information. To demonstrate the effectiveness of DFSN and DTCN, extensive experiments are conducted on a benchmark segmentation dataset and a three-class classification dataset. Experimental results demonstrate that the proposed models not only achieve state-of-the-art performance, but also have great potential in the clinical assessment of patients with suspected COVID-19.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported by the National Natural Science Foundation of China [grant nos. 61922064, U2033210, 62101387].
Footnotes
For simplicity, batch normalization and ReLU layers of [7] and the proposed module are ignored.
References
- 1. Ai T., Yang Z., Hou H., Zhan C., Chen C., Lv W., et al. Correlation of chest ct and rt-pcr testing in coronavirus disease 2019 (covid-19) in China: a report of 1014 cases. Radiology. 2020. doi: 10.1148/radiol.2020200642.
- 2. Ye Z., Zhang Y., Wang Y., Huang Z., Song B. Chest ct manifestations of new coronavirus disease 2019 (covid-19): a pictorial review. Eur. Radiol. 2020:1–9. doi: 10.1007/s00330-020-06801-0.
- 3. Shi F., Wang J., Shi J., Wu Z., Wang Q., Tang Z., et al. Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19. IEEE Reviews in Biomedical Engineering. 2020. doi: 10.1109/RBME.2020.2987975.
- 4. Fan D., Zhou T., Ji G., Zhou Y., Chen G., Fu H., et al. Inf-net: automatic covid-19 lung infection segmentation from ct scans. arXiv preprint arXiv:2004.14133. 2020. doi: 10.1109/TMI.2020.2996645.
- 5. Wang L., Wong A. Covid-net: a tailored deep convolutional neural network design for detection of covid-19 cases from chest radiography images. arXiv preprint. 2020. doi: 10.1038/s41598-020-76550-z.
- 6. Zhang J., Xie Y., Li Y., Shen C., Xia Y. Covid-19 screening on chest x-ray images using deep learning based anomaly detection. arXiv preprint arXiv:2003.12338. 2020.
- 7. Feng S., Zhao H., Shi F., Cheng X., Wang M., Ma Y., et al. Cpfnet: context pyramid fusion network for medical image segmentation. IEEE Trans. Med. Imag. 2020. doi: 10.1109/TMI.2020.2983721.
- 8. Kang H., Xia L., Yan F., Wan Z., Shi F., Yuan H., et al. IEEE Transactions on Medical Imaging; 2020. Diagnosis of Coronavirus Disease 2019 (Covid-19) with Structured Latent Multi-View Representation Learning.
- 9. Xiaoqin Zhang, Runhua Jiang, Tao Wang, et al. IEEE Transactions on Circuits and Systems for Video Technology. IEEE; 2021. Recursive Neural Network for Video Deblurring; pp. 3025–3036.
- 10. Sluimer I., Schilham A., Prokop M., Van Ginneken B. Computer analysis of computed tomography scans of the lung: a survey. IEEE Trans. Med. Imag. 2006;25(4):385–405. doi: 10.1109/TMI.2005.862753.
- 11. Kamble B., Sahu S.P., Doriya R. Advances in Data and Information Sciences. Springer; 2020. A review on lung and nodule segmentation techniques; pp. 555–565.
- 12. Gordaliza P.M., Muñoz-Barrutia A., Abella M., Desco M., Sharpe S., Vaquero J.J. Unsupervised ct lung image segmentation of a mycobacterium tuberculosis infection model. Sci. Rep. 2018;8(1):1–10. doi: 10.1038/s41598-018-28100-x.
- 13. Munoz-Barrutia A., Ceresa M., Artaechevarria X., Montuenga L.M., Ortiz-de Solorzano C. Quantification of lung damage in an elastase-induced mouse model of emphysema. Int. J. Biomed. Imag. 2012;2012. doi: 10.1155/2012/734734.
- 14. Keshani M., Azimifar Z., Tajeripour F., Boostani R. Lung nodule segmentation and recognition using svm classifier and active contour modeling: a complete intelligent system. Comput. Biol. Med. 2013;43(4):287–300. doi: 10.1016/j.compbiomed.2012.12.004.
- 15. Shen S., Bui A.A., Cong J., Hsu W. An automated lung segmentation approach using bidirectional chain codes to improve nodule detection accuracy. Comput. Biol. Med. 2015;57:139–149. doi: 10.1016/j.compbiomed.2014.12.008.
- 16. Wang S., Zhou M., Liu Z., Liu Z., Gu D., Zang Y., et al. Central focused convolutional neural networks: developing a data-driven model for lung nodule segmentation. Med. Image Anal. 2017;40:172–183. doi: 10.1016/j.media.2017.06.014.
- 17. Jin D., Xu Z., Tang Y., Harrison A.P., Mollura D.J. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2018. Ct-realistic lung nodule simulation from 3d conditional generative adversarial networks for robust lung segmentation; pp. 732–740.
- 18. Jiang J., Hu Y., Liu C., Halpenny D., Hellmann M.D., Deasy J.O., et al. Multiple resolution residually connected feature streams for automatic lung tumor segmentation from ct images. IEEE Trans. Med. Imag. 2018;38(1):134–144. doi: 10.1109/TMI.2018.2857800.
- 19. Li L., Qin L., Xu Z., Yin Y., Wang X., Kong B., et al. Artificial intelligence distinguishes covid-19 from community acquired pneumonia on chest ct. Radiology. 2020. doi: 10.1148/radiol.2020200905.
- 20. He K., Zhang X., Ren S., Sun J. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. Deep residual learning for image recognition; pp. 770–778.
- 21. Wang J., Bao Y., Wen Y., Lu H., Luo H., Xiang Y., et al. Prior-attention residual learning for more discriminative covid-19 screening in ct images. IEEE Trans. Med. Imag. 2020. doi: 10.1109/TMI.2020.2994908.
- 22. Butt C., Gill J., Chun D., Babu B.A. Applied Intelligence; 2020. Deep Learning System to Screen Coronavirus Disease 2019 Pneumonia; p. 1.
- 23. Rajinikanth V., Dey N., Raj A.N.J., Hassanien A.E., Santosh K., Raja N. Harmony-search and otsu based system for coronavirus disease (covid-19) detection using lung ct scan images. arXiv preprint arXiv:2004.03431. 2020.
- 24. Roy S., Menapace W., Oei S., Luijten B., Fini E., Saltori C., et al. Deep learning for classification and localization of covid-19 markers in point-of-care lung ultrasound. IEEE Trans. Med. Imag. 2020. doi: 10.1109/TMI.2020.2994459.
- 25. Oh Y., Park S., Ye J.C. Deep learning covid-19 features on cxr using limited training data sets. IEEE Trans. Med. Imag. 2020. doi: 10.1109/TMI.2020.2993291.
- 26. Wu Y.H., Gao S.H., Mei J., Xu J., Fan D.P., Zhang R.G., et al. Jcs: an explainable covid-19 diagnosis system by joint classification and segmentation. IEEE Trans. Image Process. 2021;30:3113–3126. doi: 10.1109/TIP.2021.3058783.
- 27. Zhang K., Liu X., Shen J., Li Z., Sang Y., Wu X., et al. Clinically applicable ai system for accurate diagnosis, quantitative measurements, and prognosis of covid-19 pneumonia using computed tomography. Cell. 2020. doi: 10.1016/j.cell.2020.08.029.
- 28. Goncharov M., Pisov M., Shevtsov A., Shirokikh B., Kurmukov A., Blokhin I., et al. Ct-based covid-19 triage: deep multitask learning improves joint identification and severity quantification. Med. Image Anal. 2021;71. doi: 10.1016/j.media.2021.102054.
- 29. Huang G., Liu Z., Van Der Maaten L., Weinberger K.Q. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. Densely connected convolutional networks; pp. 4700–4708.
- 30. Oktay O., Schlemper J., Folgoc L.L., Lee M., Heinrich M., Misawa K., et al. Attention u-net: learning where to look for the pancreas. arXiv preprint arXiv:1804.03999. 2018.
- 31. Hu H., Zhang Z., Xie Z., Lin S. Proceedings of the IEEE International Conference on Computer Vision. 2019. Local relation networks for image recognition; pp. 3464–3473.
- 32. Hu Y., Chen Y., Li X., Feng J. Dynamic feature fusion for semantic edge detection. arXiv preprint arXiv:1902.09104. 2019.
- 33. Zhao J., Liu J., Fan D., Cao Y., Yang J., Cheng M. Proceedings of the IEEE International Conference on Computer Vision. 2019. Egnet: edge guidance network for salient object detection; pp. 8779–8788.
- 34. Wu Z., Su L., Huang Q.M. Proceedings of the IEEE International Conference on Computer Vision. 2019. Stacked cross refinement network for edge-aware salient object detection; pp. 7264–7273.
- 35. Zhang Z., Fu H., Dai H., Shen J., Pang Y., Shao L. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2019. Et-net: a generic edge-attention guidance network for medical image segmentation; pp. 442–450.
- 36. Ioffe S., Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. 2015.
- 37. Fu Z., Zheng Y., Ye H., Kong Y., Yang J., He L. Edge-aware deep image deblurring. arXiv preprint arXiv:1907.02282. 2019.
- 38. Long M., Wang J., Ding G., Sun J., Yu P.S. Proceedings of the IEEE International Conference on Computer Vision. 2013. Transfer feature learning with joint distribution adaptation; pp. 2200–2207.
- 39. Pan S.J., Tsang I.W., Kwok J.T., Yang Q. Domain adaptation via transfer component analysis. IEEE Trans. Neural Network. 2010;22(2):199–210. doi: 10.1109/TNN.2010.2091281.
- 40. Gong B., Shi Y., Sha F., Grauman K. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2012. Geodesic flow kernel for unsupervised domain adaptation; pp. 2066–2073.
- 41. Hansell D.M., Bankier A.A., MacMahon H., McLoud T.C., Muller N.L., Remy J. Fleischner society: glossary of terms for thoracic imaging. Radiology. 2008;246(3):697–722. doi: 10.1148/radiol.2462070712.
- 42. Salehi S., Abedi A., Balakrishnan S., Gholamrezanezhad A. Coronavirus disease 2019 (covid-19): a systematic review of imaging findings in 919 patients. Am. J. Roentgenol. 2020:1–7. doi: 10.2214/AJR.20.23034.
- 43. Zhao J., Zhang Y., He X., Xie P. Covid-ct-dataset: a ct scan dataset about covid-19. arXiv preprint arXiv:2003.13865. 2020.
- 44. Kingma D.P., Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
- 45. Badrinarayanan V., Kendall A., Cipolla R. Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017;39(12):2481–2495. doi: 10.1109/TPAMI.2016.2644615.
- 46. Chen L.C., Papandreou G., Kokkinos I., Murphy K., Yuille A.L. Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017;40(4):834–848. doi: 10.1109/TPAMI.2017.2699184.
- 47. Chen L.C., Papandreou G., Schroff F., Adam H. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587. 2017.
- 48. Alom M.Z., Hasan M., Yakopcic C., Taha T.M., Asari V.K. Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation. arXiv preprint arXiv:1802.06955. 2018.
- 49. Wang G., Liu X., Li C., Xu Z., Ruan J., Zhu H., et al. A noise-robust framework for automatic segmentation of covid-19 pneumonia lesions from ct images. IEEE Trans. Med. Imag. 2020;39(8):2653–2663. doi: 10.1109/TMI.2020.3000314.
- 50. Tan M., Le Q.V. Efficientnet: rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946. 2019.
- 51. Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. Rethinking the inception architecture for computer vision; pp. 2818–2826.
- 52. Xie S., Girshick R., Dollár P., Tu Z., He K. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. Aggregated residual transformations for deep neural networks; pp. 1492–1500.
- 53. Ozturk T., Talo M., Yildirim E.A., Baloglu U.B., Yildirim O., Acharya U.R. Automated detection of covid-19 cases using deep neural networks with x-ray images. Comput. Biol. Med. 2020. doi: 10.1016/j.compbiomed.2020.103792.
- 54. Jin C., Chen W., Cao Y., Xu Z., Tan Z., Zhang X., et al. Development and evaluation of an artificial intelligence system for covid-19 diagnosis. Nat. Commun. 2020;11(1):1–14. doi: 10.1038/s41467-020-18685-1.
- 55. Chen H., Wang Y., Xu C., Shi B., Xu C., Tian Q., et al. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. Addernet: do we really need multiplications in deep learning? pp. 1468–1477.
- 56. Han K., Wang Y., Tian Q., Guo J., Xu C., Xu C. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. Ghostnet: more features from cheap operations; pp. 1580–1589.
- 57. Gao S., Cheng M.M., Zhao K., Zhang X.Y., Yang M.H., Torr P.H. Res2net: a new multi-scale backbone architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2019. doi: 10.1109/TPAMI.2019.2938758.






