Computers in Biology and Medicine. 2022 Dec 15;152:106417. doi: 10.1016/j.compbiomed.2022.106417

Improving COVID-19 CT classification of CNNs by learning parameter-efficient representation

Yujia Xu 1, Hak-Keung Lam 1, Guangyu Jia 1, Jian Jiang 1, Junkai Liao 1, Xinqi Bao 1
PMCID: PMC9750504  PMID: 36543003

Abstract

The COVID-19 pandemic continues to spread rapidly around the world, causing a tremendous crisis in global health and the economy. Early detection and diagnosis are crucial for controlling its further spread. Many deep learning-based methods have been proposed to assist clinicians in automatic COVID-19 diagnosis based on computed tomography imaging. However, challenges remain, including low data diversity in existing datasets and unsatisfactory detection resulting from the insufficient accuracy and sensitivity of deep learning models. To enhance data diversity, we design augmentation techniques of incremental levels and apply them to the largest open-access benchmark dataset, COVIDx CT-2A. Meanwhile, similarity regularization (SR) derived from contrastive learning is proposed in this study to enable CNNs to learn more parameter-efficient representations, thus improving their accuracy and sensitivity. The results on seven commonly used CNNs demonstrate that CNN performance can be improved stably by applying the designed augmentation and SR techniques. In particular, DenseNet121 with SR achieves an average test accuracy of 99.44% over three trials for three-category classification, including normal, non-COVID-19 pneumonia, and COVID-19 pneumonia. The precision, sensitivity, and specificity for the COVID-19 pneumonia category are 98.40%, 99.59%, and 99.50%, respectively. These results suggest that our method surpasses the existing state-of-the-art methods on the COVIDx CT-2A dataset. Source code is available at https://github.com/YujiaKCL/COVID-CT-Similarity-Regularization.

Keywords: COVID-19, Computed tomography, CNNs, Deep learning, Similarity regularization

1. Introduction

The Coronavirus Disease 2019 (COVID-19) has become a worldwide pandemic, infecting over 493 million people as of April 2022 [1]. Its high infectivity and fatality rate, driven by strain variation, threaten human health and damage the global economy [2], [3], [4]. The effective reproduction number of the virus in many countries remains high, as reported in [5], indicating that COVID-19 continues to spread quickly around the world. Therefore, a timely and efficient diagnosis is crucial for the treatment of COVID-19 positive patients and the control of further disease spread.

In the early diagnosis of COVID-19 infection, real-time reverse transcription polymerase chain reaction (RT-PCR) is the primary choice due to its convenience and high specificity. However, research results [6], [7], [8] have suggested that RT-PCR is not sufficiently sensitive: some infected patients were confirmed positive only after several negative tests. These false-negative cases might continue to infect their close contacts without isolation or develop into severe illness. Chest computed tomography (CT) has the potential to identify pathological lung changes and can thus serve as a supplementary screening tool to RT-PCR in infection detection, as indicated in [9], [10], [11].

Since the pandemic started, researchers have been exploring the potential of convolutional neural networks (CNNs), a class of deep learning models that dominates computer vision tasks, in COVID-19 CT classification, and have reported high accuracy without clinician intervention. For example, Gunraj et al. [12] introduced a large-scale open-access COVID-19 CT dataset (COVIDx CT-1) and trained a CNN tailored specifically for COVID-19. Panwar et al. [13] utilized transfer learning to inherit cross-domain knowledge and improve model performance. These studies reveal that CNNs have the potential to assist clinicians in COVID-19 CT diagnosis.

Although CNNs have achieved remarkable results in CT diagnosis, challenges remain before they can be put into practical use. Deep learning methods often require large-scale standard datasets, while the existing COVID-19 CT datasets are insufficient. Also, CT scans collected from different institutes have inconsistent characteristics such as orientation and brightness. The trained models might be more sensitive to this irrelevant information than to the pneumonic pathologies that really matter. Furthermore, the increasingly large capacity of CNN-based models may not be fully exploited when learning from limited data sources. Hence, methods for learning more parameter-efficient representations are crucial for mitigating the data insufficiency issue and improving classification performance.

By addressing the problems above, a more reliable COVID-19 CT classification system can reduce the workload of clinicians and provide more accurate and sensitive computer-aided diagnoses. Motivated by these factors, this study aims to use deep learning techniques to improve the COVID-19 CT classification performance of commonly used CNNs. In particular, to alleviate data insufficiency and enhance data diversity, we design and apply augmentation of incremental levels to the currently largest COVID-19 CT benchmark dataset (COVIDx CT-2A) [14]. To find the optimal selection of CNN architectures and augmentation combinations, we explore seven commonly used CNN architectures under seven augmentation settings. The CNNs include SqueezeNet1.1 [15], MobileNetV2 [16], DenseNet121 [17], ResNet-18/34/50 [18], and InceptionV3 [19]. Meanwhile, contrastive learning is a promising self-supervised method for enabling deep learning models to learn more parameter-efficient features. We propose similarity regularization (SR), derived from contrastive learning, to learn more parameter-efficient representations and improve CNN classification. The experimental results demonstrate that SR improves the classification performance of CNNs stably and surpasses conventional contrastive learning. Our main contributions are summarized as follows:

  • (a)

    We investigate the impacts of augmentation and model selection in COVID-19 CT classification for three classes, including normal, non-COVID-19 pneumonia (NCP), and COVID-19 pneumonia (CP).

  • (b)

    We propose SR as a regularization term for learning more parameter-efficient representations. Comparisons between seven models with and without SR are conducted. The experimental results demonstrate that SR improves classification stably without introducing extra model parameters at test time.

  • (c)

    Our proposed model, DenseNet121-SR, achieves 99.44% test accuracy, 98.40% precision, 99.59% sensitivity and 99.50% specificity for COVID-19 positive class, achieving the state-of-the-art.

  • (d)

    On other COVID-19 CT datasets, i.e., SARS-CoV-2 and COVIDx CT-1, our DenseNet121-SR outperforms the existing methods in terms of efficiency and accuracy.

  • (e)

    We extend the study to seven classic natural datasets and find that our DenseNet121-SR is superior to the original DenseNet121 for all tasks, indicating that our method can be generalized to general classification problems.

The rest of this paper is organized as follows. In Section 2, we review the related works and analyze their pros and cons. Section 3 describes our proposed method. We demonstrate and analyze the experimental results in Section 4 and conduct ablation studies in Section 5. Section 6 discusses our achievements, limitations, and future works. Section 7 draws the conclusion.

2. Related works

2.1. COVID-19-related research

CNNs are increasingly improving the COVID-19 CT classification with advanced algorithms and enhanced datasets. Numerous CNN-based methods achieving high accuracy have been proposed, indicating the potential of CNNs in assisting practical diagnosis. Some representative methods on four benchmark datasets are listed in Table 1.

Table 1.

Comparison of classification metrics between multiple deep learning methods on four datasets. The precision, sensitivity, and specificity metrics are for the COVID-19 positive class only. (Decimal places are kept as reported in the original publications.)

Dataset Method Params. (M) Accuracy (%) Precision (%) Sensitivity (%) Specificity (%)
SARS-CoV-2 [20] Alshazly et al. [21] 86.74 99.4 99.6 99.8 99.6
Silva et al. [22] 4.78 98.99 99.20 98.80
Kundu et al. [23] 132.86 98.93 98.93 98.93 98.93
Jaiswal et al. [24] 96.25 96.29 96.29 96.21
Panwar et al. [13] 20.55 94.04 95.30 94.04 95.86
Jangam et al. [25] 202.87 93.5 89.91 98
Wang et al. [26] 90.83 95.75 85.89

COVID-CT [27] Chen et al. [28] 88.5 89.9 88.6
He et al. [29] 0.55 86
Polsinelli et al. [30] 1.26 85.03 85.01 87.55 81.95
Jangam et al. [25] 202.87 84.73 78.15 94.9
Wang et al. [26] 78.69 78.02 79.71

COVIDx CT-1 [12] Gunraj et al. [12] 1.40 99.1 99.7 97.3 99.9
Ter-Sarkisov [31] 34.14 91.66 90.80 94.75

COVIDx CT-2A [14] Zhao et al. [32] 23.51 99.2 98.5 98.7 99.5
Gunraj et al. [14] 0.45 98.1 97.2 98.2 98.8
Gunraj et al. [12] 1.40 94.5 90.2 99.0 95.7

In COVID-19 CT classification, there exists no gold-standard dataset so far. The four widely employed open-access datasets [12], [14], [20], [27] in Table 1 differ in many aspects, including patient/scan distribution, collection sources, dataset size, number of classes, labeling quality, etc. In particular, COVID-CT [27] and SARS-CoV-2 [20] are two small binary-classification datasets containing 812 and 2,482 CT scans, respectively, for COVID-19 positive and non-COVID classes. Gunraj et al. released a larger dataset, COVIDx CT-1 [12], consisting of 104,009 scans for the normal, NCP, and CP classes, upon which the authors later built COVIDx CT-2 [14]. COVIDx CT-2 is the largest existing dataset, containing 194,922 CT scans combined from multiple data sources. Generally, data-driven methods like CNNs depend heavily on dataset size; as the classification metrics in Table 1 suggest, methods trained on larger datasets tend to achieve higher performance. To ensure both data diversity and satisfactory results, our study employs COVIDx CT-2A [14] as the target dataset.

Building on these datasets, many CNN-based methods have been developed to continuously boost classification performance. In particular, researchers often use transfer learning [13], [21], [22], [23], [24], [25], [33], [34], [35] or ensemble learning [22], [23], [25], [33], [36], [37] to overcome data insufficiency in small-scale datasets like SARS-CoV-2 and COVID-CT. For example, Jaiswal et al. [24] and Panwar et al. [13] utilized transfer learning to pre-train the weights of DenseNet201 and VGG19 on ImageNet and then fine-tuned them on SARS-CoV-2, achieving 96.25% and 94.04% accuracy, respectively. Ensemble learning, which merges the decisions from multiple models into a more balanced decision, has also been widely applied through different merging approaches such as weighted sum [25], voting [22], and fuzzy rank-based fusion [23]. However, ensemble learning is rarely applied to large-scale datasets like COVIDx CT-1/2 [12], [14]. Specifically, COVID-Net CT-1/2 L [12], [14] are two lightweight tailored CNNs whose architectures are finely designed by automatic neural architecture search. The two models are extremely parameter-efficient, achieving 94.5% and 98.1% accuracy with only 1.40 M and 0.45 M parameters, respectively (Table 1). Another study [32] employed ResNet-50x1 pre-trained on ImageNet-21k and fine-tuned on COVIDx CT-2A, achieving 99.2% accuracy.

From the research works reviewed above, deep learning models can achieve higher performance in COVID-19 CT classification by: (1) training over data of higher diversity; (2) using carefully designed network architectures; (3) ensembling the decisions from multiple models; and (4) inheriting out-of-domain classification knowledge. Although models can benefit from these aspects, the expensive computational cost of neural architecture search and large-scale pre-training, and the long execution time caused by over-parameterization, should be considered as well.

2.2. Contrastive learning

In recent years, supervised deep learning models of increasing complexity and depth have shown great progress in many large-scale applications such as ImageNet classification [18], [19]. However, directly applying these models to COVID-19 datasets of smaller scale might cause over-parameterization, meaning that model capacity cannot be fully exploited and the extracted representations are not parameter-efficient. One promising approach to addressing this issue is contrastive learning.

In the deep learning field, it is widely recognized that model performance depends on the quality of the learned representations. Contrastive learning, also known as contrastive self-supervised representation learning, is a framework aiming at learning efficient representations without human-specified labels. In general, the main idea of contrastive learning is to project inputs into an embedding space where the embedded vectors of similar samples are close while those of dissimilar samples are far apart. More formally, for visual tasks, a pair of views augmented from one image is considered a positive pair, while pairs of views from different images are considered negative pairs. Hence, contrastive learning models aim to maximize the representation similarity between positive pairs and minimize that between negative pairs. In practical tasks, contrastive learning often pre-trains the front representation extractors of deep learning models in a self-supervised manner, and then fine-tunes the pre-trained weights in a conventional supervised manner.
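To make this objective concrete, the sketch below shows a generic InfoNCE/NT-Xent-style contrastive loss computed over a batch of positive pairs. It is an illustration of the general idea only; the function name and temperature value are ours, and this is not the loss used later in this paper.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Generic InfoNCE/NT-Xent-style contrastive loss (illustration only).

    z1, z2: (N, D) embeddings of two augmented views of the same N images.
    Each (z1[i], z2[i]) is a positive pair; all other in-batch pairs act as negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # positives sit on the diagonal
    # Maximize similarity for positives, minimize it for in-batch negatives.
    return F.cross_entropy(logits, targets)
```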

The state-of-the-art contrastive learning frameworks include MoCo [38], [39], SimCLR [40], [41], SimSiam [42], SwAV [43], BYOL [44], etc. These frameworks mainly differ in the loss function, representation projection, and negative pair formation [42], and these differences further determine their requirements on the complexity of augmentation policies and the batch size. Normally, to obtain a satisfactory result, contrastive methods depend on a large batch size to cover enough negative pairs [38], [39], [40], [41]. Among these models, BYOL, SwAV, and SimSiam are contrastive frameworks requiring no negative pairs. In ImageNet linear classification experiments [42], BYOL achieves relatively better performance, which is why we select BYOL as the basic framework for the SR calculation in Section 3.2.

The success of contrastive learning has inspired several applications in COVID-19 CT diagnosis [28], [29], [45]. He et al. [29] employed a MoCo-like [38] framework to enhance the CT scan representations extracted by DenseNet169 and fine-tuned the network, achieving 86% accuracy on COVID-CT [27]. Similarly, Chen et al. [28] employed a MoCo-v2-like [39] framework on the same dataset and reached 88.5% accuracy in a six-shot setting. Li et al. [45] used a contrastive loss as a regularization term and trained their CMT-CNN in an end-to-end manner, obtaining 93.46% accuracy. These studies suggest that contrastive learning can boost classification performance by learning more efficient representations.

3. Method

3.1. Augmentation of incremental levels

Data augmentation is vital for improving the performance of deep learning models, especially for contrastive learning [39], [40]. However, the optimal selection of augmentation for COVID-19 CT has not been studied. Inspired by the literature in Section 2, we design and evaluate a series of augmentation operations of incremental levels as follows, where "+" denotes the augmentation appended on top of the previous level:

Level 0

No augmentation.

Level 1

+ RandomResizedCrop: Randomly crop a region whose area is in the range [0.08, 1] of the original 256 × 256 image, with an aspect ratio sampled in the range [3/4, 4/3]. The crop is then resized back to the original size.

Level 2

+ Horizontal Flip: Randomly flip the input image horizontally with 50% probability.

Level 3

+ RandAugment [46]: Apply RandAugment with two operations, magnitude 9, and magnitude standard deviation 0.5.

Level 4

+ Random Erasing [47]: With 25% probability, select a rectangular region of the input image and erase it pixel-wise. The size of the selected region is randomly picked in the range [0.02, 1/3] of the image size.

Level 5

+ Mixup [48]: Mix two in-batch images with a ratio λ drawn from a beta distribution, λ ∼ B(1, 1). The mixup of images $I_A$ and $I_B$ can be written as $I_{\text{mix}}(x, y) = \lambda I_A(x, y) + (1 - \lambda) I_B(x, y)$, where $(x, y)$ denotes the pixel coordinate (a minimal code sketch of this operation is given after the list).

Level 6

+ CutMix [49]: Switch from Mixup to CutMix with 50% probability. Randomly replace a square region of the original image with a region from another in-batch image. The region size is randomly determined, subject to the square root of a beta distribution B(1, 1).
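As referenced above, the following is a minimal sketch of the Mixup operation of Level 5, assuming one-hot labels and α = 1 so that λ ∼ B(1, 1); the label mixing follows [48], and the CutMix region logic of Level 6 is omitted for brevity.

```python
import torch

def mixup_batch(images: torch.Tensor, labels_onehot: torch.Tensor, alpha: float = 1.0):
    """Minimal Mixup (Level 5): blend each image with another in-batch image.

    images: (N, C, H, W); labels_onehot: (N, num_classes).
    lambda ~ Beta(alpha, alpha), with alpha = 1 as in the level description.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))                     # pairs each I_A with some I_B
    mixed_images = lam * images + (1.0 - lam) * images[perm]  # I_mix = lam*I_A + (1-lam)*I_B
    mixed_labels = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_images, mixed_labels
```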

The augmented scans are visualized in Fig. 2. Specifically, RandomResizedCrop and horizontal flip are two commonly used augmentation operations in both supervised [18], [19] and self-supervised learning [39], [40], [41], [44]. Since contrastive learning requires more complicated augmentation [40], the two stronger augmentations, RandAugment and Random Erasing, are further introduced in levels 3 and 4; their implementations and parameters follow [50], [51]. In levels 5 and 6, Mixup and CutMix are two augmentations enabling higher data diversity by fusing in-batch images. In these two levels, we mainly examine whether such sample-fusing augmentations can improve COVID-19 classification. By comparing the performance of models under these incremental augmentation levels, an appropriate augmentation strategy for COVID-19 CT scans can be established.
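For reference, a hedged sketch of a Level 4 training transform using torchvision is shown below. The paper's implementation uses the timm library [51], whose RandAugment supports a magnitude standard deviation that torchvision's variant does not expose, so the parameters here only approximate the described policy.

```python
from torchvision import transforms

# Level 4 training transform: RandomResizedCrop + HorizontalFlip + RandAugment + RandomErasing.
# ImageNet mean/std as stated in Section 4.2; RandomErasing runs on the normalized tensor.
train_transform_level4 = transforms.Compose([
    transforms.RandomResizedCrop(256, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandAugment(num_ops=2, magnitude=9),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.25, scale=(0.02, 1 / 3)),
])
```

For SR, two such independently augmented views of each scan are drawn per iteration.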

Fig. 2. Illustration of the applied incremental augmentation of six levels. On the right side, six groups of images are augmented from the same COVID-19 positive scan shown on the left. From top to bottom, the groups correspond to augmentation levels 1 to 6. The bottom-left normal scan is the auxiliary original image that only participates in the Mixup/CutMix augmentation of levels 5 and 6. The two scans on the left are from COVIDx CT-2A.

3.2. Similarity regularization

Most mainstream conventional CNNs consist of two parts: a representation extractor f and a following fully connected layer FC. The extractor f extracts distinguishable representations of the given inputs, and FC predicts the class probability distribution by summarizing the extracted representations. This forward propagation is shown as the top branch in Fig. 1. More formally, the input image x is first transformed into a view v by a random on-the-fly augmentation operation t ∼ T, where T denotes an infinite collection of augmentation operations. Subsequently, the representation extractor f converts the input view v into a representation embedding vector h = f(v), and FC predicts the class probability distribution from the obtained representation, ŷ = FC(h). The training target of such a classifier is to minimize the distance between the predicted class probability distribution ŷ and the ground truth y according to the cross-entropy loss in Eq. (1), where i ∈ {0, 1, 2} denotes the class index.

$H(y, \hat{y}) = -\sum_{i=0}^{2} y_i \log \hat{y}_i$ (1)

Fig. 1. The overall structure of the models with our proposed similarity regularization during training. The two projectors, g1 and g2, and the online predictor p are implemented by non-linear MLPs. After training, only the online encoder f1 and the fully connected layer FC are preserved for testing.

In this conventional fully supervised scenario, the trained representations are optimized to project onto the human-specified class distribution. However, this can harm data efficiency, robustness, and generalization [52]. Contrastive learning, in contrast, enables learning more parameter-efficient representations from the inputs themselves rather than the specified annotations. We thus incorporate it into common CNNs to improve their representation learning ability.

The overall structure of our method is illustrated in Fig. 1. We keep the conventional supervised classifier unchanged in the top branch while introducing a contrastive learning framework in the bottom branch. As described in Section 2.2, contrastive learning maximizes the representation similarity between positive pairs. We penalize the positive-pair representation distance as a regularization term alongside the cross-entropy loss, and name this term similarity regularization (SR).

In particular, the contrastive framework is a siamese network like most mainstream frameworks [40], [41], [42], [44], consisting of an online network and a target network; the target network can be seen as a moving average of the online one. Given two views v1 and v2 augmented from the same input image x, the representation extractors f1 and f2 of the two networks extract the corresponding latent representation vectors h1 = f1(v1) and h2 = f2(v2). To prevent the representations from being overly affected by SR, the representation vectors are then projected into another embedding space, where z1 = g1(h1) and z2 = g2(h2), as in [41], [44]. Since the projectors g1 and g2 span slightly different feature spaces, the online projection z1 is further mapped to p(z1) of the same dimension via the online predictor p. The cosine representation similarity S, taking values in the range [−1, 1], is measured according to Eq. (2).

$S(p(z_1), z_2) = \dfrac{\langle p(z_1), z_2 \rangle}{\lVert p(z_1) \rVert_2 \, \lVert z_2 \rVert_2}$ (2)

where ⟨·,·⟩ and ‖·‖₂ denote the inner product and the L2 norm, respectively. A higher value indicates that the two vectors are more similar. To penalize a low cosine similarity between positive pairs with a bounded, non-negative penalty, SR is calculated as in Eq. (3).

$D(p(z_1), z_2) = 2 - 2S(p(z_1), z_2) = 2 - 2\,\dfrac{\langle p(z_1), z_2 \rangle}{\lVert p(z_1) \rVert_2 \, \lVert z_2 \rVert_2}$ (3)

Hence, for a positive pair (v1,v2), its total loss containing both cross-entropy loss and SR is written as in Eq. (4).

$L(z_1, z_2, y, \hat{y}) = (1 - \gamma)\, H(y, \hat{y}) + \gamma\, D(p(z_1), z_2)$ (4)

where γ is a scale factor for balancing the conventional cross-entropy loss and the introduced SR.

(v2,v1) is the symmetric positive pair with respect to (v1,v2). We calculate the losses for both symmetric pairs and take their mean as the final loss for fast convergence.
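A minimal sketch of Eqs. (2)-(4) with the symmetrized loss is given below. It assumes the online predictions p(z1), p(z2) for the two views and the target-branch projections t2 = g2(f2(v2)), t1 = g2(f2(v1)) have already been computed; the function names are ours.

```python
import torch.nn.functional as F

def sr_term(p_z, t):
    """Eq. (3): D = 2 - 2 * cosine similarity (Eq. (2)) between an online prediction
    p(z) and the corresponding target projection t of a positive pair."""
    return 2.0 - 2.0 * F.cosine_similarity(p_z, t, dim=1)

def combined_loss(y_hat, y, p_z1, t2, p_z2, t1, gamma: float = 0.5):
    """Eq. (4), averaged over the symmetric positive pairs (v1, v2) and (v2, v1).

    p_z1, p_z2: online predictions for views v1 and v2.
    t2, t1:     target projections for views v2 and v1 (computed without gradients).
    """
    sr = 0.5 * (sr_term(p_z1, t2) + sr_term(p_z2, t1)).mean()
    return (1.0 - gamma) * F.cross_entropy(y_hat, y) + gamma * sr
```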

As a regularization term, SR may raise the concern that it could dominate the combined loss and thus degrade classification. To address this concern and find an appropriate scheduler for γ, we design the three strategies listed below, where i denotes the current training iteration.

Constant (default)

: γ is set to be a constant value during all training iterations, 0.5 by default.

Linear Decay

: γ decays linearly to a minimum value γ_min = 0.01 over N training iterations according to Eq. (5).

$\gamma_i = \gamma_{\min} + \left(1 - \dfrac{i}{N}\right)(1 - \gamma_{\min})$ (5)

Cosine Decay

: γ decays to a minimum value γ_min = 0.01 over N training iterations according to the cosine annealing scheduler in Eq. (6).

$\gamma_i = \gamma_{\min} + \dfrac{1}{2}\left(1 + \cos\dfrac{i\pi}{N}\right)(1 - \gamma_{\min})$ (6)
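The three schedulers can be written as simple functions of the iteration index. The sketch below follows Eqs. (5) and (6) directly; the function names are ours.

```python
import math

def gamma_constant(i: int, N: int, value: float = 0.5) -> float:
    """Constant strategy (default): gamma = 0.5 for all training iterations."""
    return value

def gamma_linear(i: int, N: int, gamma_min: float = 0.01) -> float:
    """Linear decay, Eq. (5): gamma_i = gamma_min + (1 - i/N) * (1 - gamma_min)."""
    return gamma_min + (1.0 - i / N) * (1.0 - gamma_min)

def gamma_cosine(i: int, N: int, gamma_min: float = 0.01) -> float:
    """Cosine decay, Eq. (6): gamma_i = gamma_min + 0.5 * (1 + cos(i*pi/N)) * (1 - gamma_min)."""
    return gamma_min + 0.5 * (1.0 + math.cos(i * math.pi / N)) * (1.0 - gamma_min)
```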

Algorithm 1. Training pseudocode of models with SR.

After training, we discard all components except the online representation extractor f1 and the fully connected layer FC. Hence, introducing SR during training does not slow down inference. The training pseudocode of models with SR is given in Algorithm 1.
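Since Algorithm 1 appears only as a figure, the following is our hedged reconstruction of one training step from the descriptions in Sections 3.2 and 4.2 (momentum rate β = 0.99, gradient clipping at 5.0). The module and function names are illustrative, and details such as which view feeds the classification branch may differ from the authors' implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(online, target, beta: float = 0.99):
    """Update the target network as an exponential moving average of the online network."""
    for p_o, p_t in zip(online.parameters(), target.parameters()):
        p_t.data.mul_(beta).add_(p_o.data, alpha=1.0 - beta)

def sr_training_step(v1, v2, y, f1, g1, p, fc, f2, g2, optimizer, gamma=0.5, beta=0.99):
    """One training step with SR on two pre-augmented views v1, v2 of a batch with labels y.
    f1 is assumed to return a pooled feature vector."""
    h1 = f1(v1)                                          # online representation of view 1
    y_hat = fc(h1)                                       # classification branch (top of Fig. 1)
    p1 = p(g1(h1))                                       # online prediction for view 1
    p2 = p(g1(f1(v2)))                                   # online prediction for view 2
    with torch.no_grad():                                # target branch: momentum copies, no grad
        t2 = g2(f2(v2))
        t1 = g2(f2(v1))
    d12 = 2 - 2 * F.cosine_similarity(p1, t2, dim=1)     # Eq. (3) for the pair (v1, v2)
    d21 = 2 - 2 * F.cosine_similarity(p2, t1, dim=1)     # symmetric pair (v2, v1)
    loss = (1 - gamma) * F.cross_entropy(y_hat, y) + gamma * 0.5 * (d12 + d21).mean()  # Eq. (4)
    optimizer.zero_grad()
    loss.backward()
    online_params = [q for m in (f1, g1, p, fc) for q in m.parameters()]
    torch.nn.utils.clip_grad_norm_(online_params, max_norm=5.0)  # clipping at 5.0 (Section 4.2)
    optimizer.step()
    momentum_update(f1, f2, beta)                        # EMA update of the target extractor
    momentum_update(g1, g2, beta)                        # EMA update of the target projector
    return loss.item()
```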

4. Results and analysis

4.1. Dataset description

In this paper, we mainly train and evaluate our proposed method on the largest existing open-access COVID-19 CT dataset, COVIDx CT-2A. The dataset contains three classes: normal, non-COVID-19 pneumonia (NCP), and COVID-19 pneumonia (CP). Its class distribution is summarized in Table 2. The dataset is of high diversity, containing scans of 3,745 patients from eight open-access sources. It should be noted that the scans from one patient are kept within a single subset, preventing information leakage from training to validation or testing.

Table 2.

Class distribution of the employed COVIDx CT-2A dataset.

Set Normal NCP CP Total
Training 35,996 25,496 82,286 143,778
Validation 11,842 7,400 6,244 25,486
Testing 12,245 7,395 6,018 25,658

4.2. Experimental setting

In this paper, we keep the hyper-parameters consistent across all experiments for fair comparison. The code is implemented in PyTorch, with the CNN backbones and image augmentation provided by the torchvision and timm [51] libraries, respectively. For acceleration, we train models with PyTorch distributed data parallelism on four Nvidia V100 GPUs using Apex mixed precision at level O1. To alleviate concerns about randomness, we obtain the experimental statistics by averaging the measurements of three distinct trials.

During training, CT scans are resized to 256 × 256 with 3 channels using bicubic interpolation and normalized by the ImageNet mean and standard deviation. At test time, 256 × 256 CT scans are cropped from the center of the resized 293 × 293 original images; this works well empirically because the center crop preserves the main lung regions. To prevent models from becoming overconfident in one-class predictions, label smoothing [19] with a smoothing factor of 0.1 is applied to the cross-entropy loss in augmentation levels 0 to 4. In augmentation levels 5 and 6, in-batch paired labels are mixed according to the mixed inputs (see [48], [49] for details).

The optimizer is Adam with a weight decay of 10⁻⁶. After a 5-epoch linear warmup [53] from 5 × 10⁻⁷, we use a cosine annealing scheduler to decay the learning rate from 5 × 10⁻⁴ to 5 × 10⁻⁷ over the later 45 epochs. The batch size is set to 64 per process. Besides, the gradients are clipped to be no larger than 5.0 to avoid overflow.
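A sketch of this optimizer and learning-rate schedule using PyTorch's built-in schedulers is shown below. Whether the schedule steps per iteration or per epoch is our assumption (per iteration here), and the authors' implementation may differ.

```python
import torch

def build_optimizer_and_scheduler(model: torch.nn.Module, steps_per_epoch: int):
    """Adam + 5-epoch linear warmup followed by 45-epoch cosine decay (per Section 4.2)."""
    base_lr, min_lr = 5e-4, 5e-7
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr, weight_decay=1e-6)
    warmup = torch.optim.lr_scheduler.LinearLR(
        optimizer, start_factor=min_lr / base_lr, end_factor=1.0,
        total_iters=5 * steps_per_epoch)                       # warmup from 5e-7 to 5e-4
    cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=45 * steps_per_epoch, eta_min=min_lr)  # decay from 5e-4 to 5e-7
    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer, schedulers=[warmup, cosine], milestones=[5 * steps_per_epoch])
    return optimizer, scheduler
```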

In the SR calculation, the projectors g1,g2 and predictor p have the same multi-layer perceptron (MLP) architecture that consists of two linear layers connected by a batch normalization layer and a ReLU activation layer. The front linear layer projects the inputs to 512-D embedding vectors and the later linear layer outputs 128-D vectors. The analysis for the dimension setting is in Section 5.3. The momentum rate β for updating f2 and g2 is 0.99, a median value among contrastive frameworks [38], [42], [43], [44].
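A minimal sketch of this projector/predictor MLP is given below. The 1024-D input in the usage comment assumes a torchvision-style DenseNet121 backbone and is our assumption.

```python
import torch.nn as nn

def make_projection_mlp(in_dim: int, hidden_dim: int = 512, out_dim: int = 128) -> nn.Sequential:
    """Projector/predictor MLP for SR: Linear -> BatchNorm -> ReLU -> Linear (Section 4.2)."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim),
    )

# Example usage (feature dimension assumed for a torchvision DenseNet121 backbone):
# g1 = make_projection_mlp(1024)   # online projector: 1024 -> 512 -> 128
# p  = make_projection_mlp(128)    # online predictor: 128 -> 512 -> 128
```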

4.3. Results of ResNets under incremental augmentation levels

We first compare the performance of ResNets with or without SR under the incremental augmentation levels designed in Section 3.1 to determine an appropriate augmentation policy for the coming experiments. The averaged test accuracies are listed in Table 3. Since SR requires calculating the similarity between two augmented views, models with SR cannot be implemented under augmentation level 0.

Table 3.

Test accuracy comparison between original ResNets and ResNets with proposed SR under incremental augmentation levels. ResNet is abbreviated as R. The scale factor scheduler for scaling SR is the default constant scheduler γ=0.5.

Augmentation Method CNN backbone
R18 R34 R50
Level 0 Original 91.28 92.28 91.57

Level 1 Original 99.10 99.00 98.89
+SR (Ours) 99.23 99.27 99.09

Level 2 Original 99.17 99.11 99.00
+SR (Ours) 99.27 99.39 99.19

Level 3 Original 99.11 99.06 99.08
+SR (Ours) 99.26 99.29 99.20

Level 4 Original 99.12 99.09 99.13
+SR (Ours) 99.40 99.43 99.31

Level 5 Original 97.49 97.34 97.66

Level 6 Original 98.50 98.69 98.91

Table 3 shows that the original ResNets achieve their highest accuracy at level 2, do not improve in the following levels, and degrade heavily at levels 5 and 6. The degradation may result from the fact that sample-fusing augmentation sometimes transfers pneumonic pathologies from CP/NCP cases to normal cases; we thus do not apply SR at levels 5 and 6. In contrast to the original ResNets, ResNets with SR continue to improve after level 2 and achieve their highest accuracy at level 4. This is consistent with the findings of many contrastive learning studies that contrastive learning requires stronger augmentation than supervised models [39], [40], [54]. Hence, we select level 4 as the basic augmentation level for the following experiments. Overall, SR improves the classification performance of ResNets stably under all augmentation levels from 1 to 4.

4.4. Results under augmentation level 4

In this section, we extend the experiments to seven widely used CNNs, including SqueezeNet1.1 [15], MobileNetV2 [16], DenseNet121  [17], ResNet-18/34/50 [18], and InceptionV3 [19]. The experiments are under augmentation level 4 and a constant scale factor scheduler γ=0.5.

From the results in Table 4, all our models with SR surpass the original models in terms of averaged test accuracy. The best model, DenseNet121 with SR, achieves 99.44% accuracy with 7.33 M training parameters. Note that the extra parameters are discarded after training, so the parameter count at test time is identical for a model with or without SR. Another observation is that, in COVID-19 CT classification, model performance is not strictly proportional to model capacity across architectures. This suggests that careful design of the model architecture, rather than simply expanding depth or width, is more valuable in this task, as supported by [12], [14], [30].

Table 4.

Test accuracy of seven CNNs with or without SR under augmentation level 4, ordered by the number of parameters. The scale factor scheduler for scaling SR is the default constant strategy γ=0.5. Note that the extra parameters introduced by SR are discarded at test time after training.

Method Metric CNN backbone
SqueezeNet MobileNet DenseNet ResNet18 ResNet34 InceptionV3 ResNet50
Original Params. (M) 0.69 2.12 6.63 10.66 20.30 20.78 22.42
Acc. (%) 98.22 99.02 99.05 99.08 99.06 99.22 99.12

+SR (Ours) Params. (M) 1.13 2.94 7.33 11.10 20.74 21.97 23.62
Acc. (%) 98.41 99.18 99.44 99.39 99.43 99.32 99.31

Fig. 3 shows the confusion matrices for DenseNet121-SR over three training trials. Based on these matrices, we measure the performance of the model in terms of averaged accuracy, precision, sensitivity, and specificity, as listed in Table 5. The results show that our DenseNet121-SR outperforms the state-of-the-art models in nearly all measurements. In particular, DenseNet121-SR achieves a high sensitivity of 99.59% for the COVID-19 positive class, indicating that the model has the potential to prevent COVID-19 positive patients from being missed.

Fig. 3. Confusion matrices for DenseNet121-SR in three trials.

Table 5.

Comparison of DenseNet121-SR with the state-of-the-art methods on COVIDx CT-2A dataset.

Method Accuracy (%) Precision (%): Normal / NCP / CP Sensitivity (%): Normal / NCP / CP Specificity (%): Normal / NCP / CP
COVID-Net CT-1 [12] 94.5 96.1 97.6 90.2 98.8 80.2 99.0 96.3 99.4 95.7
COVID-Net CT-2 S [14] 97.9 99.3 96.4 97.0 98.9 95.7 98.1 99.3 98.9 98.8
COVID-Net CT-2 L [14] 98.1 99.4 96.7 97.2 99.0 96.2 98.2 99.5 99.0 98.8
Bit-M [32] 99.2 99.8 98.9 98.5 99.3 99.6 98.7 99.8 99.6 99.5
DenseNet121-SR (Ours) 99.44 99.89 99.55 98.40 99.12 99.83 99.59 99.91 99.82 99.50

To better understand the behavior of our model, we visualize the attention of DenseNet121-SR on three CT scans in different classes as in Fig. 4. It can be observed that our model mainly focuses its attention on some suspicious regions where the pathologies may exist.

Fig. 4. Attention of DenseNet121-SR visualized by Grad-CAM. The three groups of CT scans and heatmaps from left to right are in class normal, NCP, and CP, respectively. The highlighted parts are the regions based on which CNNs classify the CT scans.
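As an aside, heatmaps like those in Fig. 4 can be produced with a standard Grad-CAM computation. The sketch below is a minimal version, assuming a torchvision-style DenseNet121 whose final convolutional feature map is the output of model.features; the hook placement and normalization choices are ours and may differ from the authors' visualization code.

```python
import torch
import torch.nn.functional as F

def grad_cam_densenet(model, image, target_class=None):
    """Minimal Grad-CAM on the final feature map of a torchvision-style DenseNet121.

    image: a (1, 3, H, W) normalized tensor. Returns an (H, W) heatmap in [0, 1].
    """
    feats, grads = [], []

    def forward_hook(module, inputs, output):
        feats.append(output)                              # (1, C, h, w) feature map
        output.register_hook(lambda g: grads.append(g))   # gradient w.r.t. the feature map

    handle = model.features.register_forward_hook(forward_hook)
    model.eval()
    logits = model(image)
    cls = int(logits.argmax(dim=1)) if target_class is None else target_class
    model.zero_grad()
    logits[0, cls].backward()                             # backprop the chosen class score
    handle.remove()

    fmap, grad = feats[0][0], grads[0][0]                 # (C, h, w) each
    weights = grad.mean(dim=(1, 2))                       # global-average-pooled gradients
    cam = F.relu((weights[:, None, None] * fmap).sum(dim=0))
    cam = F.interpolate(cam[None, None], size=image.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return ((cam - cam.min()) / (cam.max() - cam.min() + 1e-8)).detach()
```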

4.5. Results on other datasets

On other COVID-19 CT datasets. Based on the experimental results above, we extend our method to two other COVID-19 CT datasets, SARS-CoV-2 and COVIDx CT-1. Note that, for SARS-CoV-2, we train DenseNet121-SR for 200 epochs with weights pre-trained on ImageNet because SARS-CoV-2 contains far fewer CT scans than the others. The results listed in Table 7 show that our method generalizes to other datasets and achieves high classification performance. Compared to the methods listed in Table 1, our DenseNet121-SR, with only 6.63 M parameters, is more parameter-efficient and outperforms the reviewed methods.

Table 7.

Classification results of DenseNet121-SR on the SARS-CoV-2 and COVIDx CT-1 datasets. The precision, sensitivity, and specificity metrics are for the COVID-19 positive class only.

Dataset Acc (%) Prec (%) Sens (%) Spec (%)
SARS-CoV-2 99.20 99.47 98.93 99.46
COVIDx CT-1 99.78 99.56 99.84 99.74

On classic natural datasets. Extensive experiments are also conducted on seven natural datasets to further evaluate the generalization ability of our method. To evaluate the effect of SR fairly, we keep the settings of Section 4.2 unchanged and initialize the model weights as pre-trained on ImageNet. Table 6 reports the classification accuracy of DenseNet121 with and without SR on the seven datasets, including FGVC Aircraft [55], CIFAR10/100 [56], the Describable Textures Dataset (DTD) [57], Oxford 102 Flowers [58], Oxford-IIIT Pets [59], and Stanford Cars [60]. DenseNet121-SR is superior to the original model in all tasks, indicating that our proposed SR generalizes to general classification problems.

Table 6.

Classification accuracy of DenseNet121 with or without SR over seven classic natural datasets.

Aircraft CIFAR10 CIFAR100 DTD Flowers102 OxfordIIITPet StanfordCars
DenseNet121 88.15 94.45 85.08 70.60 93.17 92.47 92.46
DenseNet121-SR (Ours) 88.18 94.47 85.08 71.01 94.42 92.88 92.55

5. Ablation study

The following ablation studies are conducted to better investigate the effects of our proposed SR.

5.1. Ablation on self-supervised learning

Fully self-supervised learning. Contrastive learning is widely adopted for pre-training CNNs that are later fine-tuned for downstream tasks. In our method, we instead turn the process into an end-to-end one by regularizing CNNs with the proposed SR derived from contrastive learning. Hence, a comparison between SR and conventional contrastive learning is necessary. Specifically, we design and evaluate the following methods for comparison.

  • (a)

    Linear Evaluation. First pre-train the representation extractor f, whose weights are then frozen while fine-tuning FC. The pre-training process is equivalent to setting γ=1 for all training epochs in Algorithm 1, after which only the linear layer FC is fine-tuned as usual. The fine-tuning hyper-parameters are: batch size 256, learning rate decaying from 40 to 4×10⁻⁶ according to a cosine decay scheduler [53], and SGD as the optimizer. Linear evaluation is conducted simply to verify the effect of contrastive learning in this task (a minimal sketch of this frozen-backbone setup is given after the list).

  • (b)

    Two-stage training (self-supervised contrastive learning followed by conventional supervised learning). This approach first pre-trains the representation extractor f and then trains the entire CNN starting from the pre-trained weights. The hyper-parameters are consistent with the others, as in Section 4.2.

  • (c)

    Apply SR to ResNets with a default constant γ=0.5.
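As referenced in (a), a minimal sketch of the frozen-backbone linear evaluation setup is shown below. The SGD momentum value and the omission of the cosine decay scheduler are our simplifications.

```python
import torch
import torch.nn as nn

def build_linear_eval(backbone: nn.Module, feature_dim: int, num_classes: int = 3):
    """Linear evaluation: freeze the pre-trained extractor f and train only the FC head."""
    for param in backbone.parameters():
        param.requires_grad = False            # frozen representation extractor f
    backbone.eval()
    fc = nn.Linear(feature_dim, num_classes)   # the only trainable component
    # lr = 40 as stated in (a); the momentum value is our assumption. The cosine decay to
    # 4e-6 would be added via torch.optim.lr_scheduler.CosineAnnealingLR (not shown here).
    optimizer = torch.optim.SGD(fc.parameters(), lr=40.0, momentum=0.9)
    return fc, optimizer
```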

The results of the designs above are listed in Table 8. Contrastive learning can learn efficient representations: even a simple linear evaluation on the frozen pre-trained extractor achieves over 92% test accuracy. For the second method, two-stage training, the pre-trained weights of the representation extractor might be hard to maintain in the later supervised training phase. Our SR instead maintains the representation by explicitly penalizing the representation difference between positive pairs, and the results in Table 8 verify that ResNets with SR surpass the two-stage contrastive learning method in most experiments. It is also worth noting that end-to-end training with SR does not require pre-training and thus saves computational resources.

Table 8.

Test accuracy comparison between two-stage training and end-to-end training with SR under incremental augmentation levels. The scale factor scheduler for scaling SR is the default constant scheduler γ=0.5.

Augmentation Method CNN backbone
R18 R34 R50
Level 1 Linear Eval 95.47 92.89 92.32
Two-stage 99.17 99.17 99.04
+SR (Ours) 99.23 99.27 99.09

Level 2 Linear Eval 95.76 92.03 94.02
Two-stage 99.25 99.23 99.20
+SR (Ours) 99.27 99.39 99.19

Level 3 Linear Eval 93.92 93.88 95.08
Two-stage 99.26 99.29 99.24
+SR (Ours) 99.26 99.29 99.20

Level 4 Linear Eval 93.12 93.23 94.53
Two-stage 99.38 99.30 99.18
+SR (Ours) 99.40 99.43 99.31

5.2. Ablation on the decay strategy for γ

The two-stage contrastive learning method can be approximated by running Algorithm 1 with γ=1 during pre-training and γ=0 during fine-tuning. The sharp fall of γ may destroy the representation space obtained in contrastive pre-training. To avoid this potential negative impact, we designed the two mild γ decay strategies in Section 3.2 in addition to the constant γ strategy. From the results in Fig. 5, we conclude that SR with all designed γ strategies stably improves classification accuracy. Moreover, SR is insensitive to the γ strategy, since all strategies achieve comparable performance. Due to the simplicity of the constant strategy (γ=0.5 for all iterations) and its slight superiority under level 4 augmentation, we select it as the default strategy in our experiments.

Fig. 5. Accuracy of ResNets obtained with different γ decay strategies under incremental augmentation levels from 1 to 4. The baselines are the original models without introduced SR. The γ value for the constant strategy is 0.5 by default, while γ decays from 1.0 to 0.01 in the linear and cosine strategies.

The ablation studies find that the constant strategy, γ=0.5, achieves the highest performance among the three strategies under level 4 augmentation.

The constant γ value itself still requires study to determine its effect on model performance. We thus vary the γ value of the constant strategy from 0.1 to 0.9 with an interval of 0.2 and repeat the experiments for CNNs with SR under augmentation level 4. As shown in Fig. 6, SR improves CNN classification performance when the γ value is in an appropriate range, around [0.5, 0.7]. In particular, a smaller γ cannot fully exploit the advantage of SR and sometimes even degrades performance, as in the SqueezeNet1.1 case. Meanwhile, setting γ to a large value like 0.9 is also risky, since SR then dominates the total loss while the primary cross-entropy loss for classification is underweighted.

Fig. 6. Accuracy of seven CNNs with SR controlled by the constant γ scale factor strategy, under augmentation level 4. γ varies from 0.1 to 0.9 with an interval of 0.2.

5.3. Ablation on the projection size in SR

The output dimension, or projection size, of both the projector and predictor in the SR calculation is set to 128 by default. We keep the hidden dimension of 512 unchanged to avoid redundant computation while varying the projection size to analyze its effect on classification accuracy. As visualized in Fig. 7, the differences in classification accuracy for all models except SqueezeNet are small (within about 0.2%). This indicates that the hyper-parameter setting in our proposed SR is robust.

Fig. 7. Impact of projection size in SR calculation.

6. Discussion

Since COVID-19 spreads rapidly worldwide, designing efficient and accurate classification systems is essential. Although some methods [12], [14], [21], [22], [32] have reported high classification accuracy (around 99%) on multiple datasets, we argue that even a slight improvement can mitigate further infection. Meanwhile, some high-performance methods require considerable computational resources, making them hard to deploy in practical healthcare systems. Hence, designing more efficient models with affordable parameter budgets should also be considered.

This paper mainly proposes an incremental augmentation strategy and SR to improve CNN classification performance on three COVID-19 CT datasets. The results illustrate that appropriate augmentation can significantly alleviate the data limitation problem in COVID-19 CT classification. Meanwhile, our proposed SR further improves the classification performance of seven CNNs by enhancing their representation learning ability. Specifically, on the largest dataset, COVIDx CT-2A, our model DenseNet121-SR achieves 99.44% accuracy and 99.59% sensitivity with only 6.63 M parameters at test time, outperforming all the reviewed state-of-the-art methods. Overall, the designed augmentation strategy appropriately enhances the diversity of the COVID-19 CT data, and the proposed SR better exploits the representation learning ability of CNNs within a fixed parameter budget; both lead to the high classification performance of our work.

Besides, we evaluate DenseNet121-SR on the other two datasets, achieving 99.78% and 99.20% accuracy on COVIDx CT-1 and SARS-CoV-2, respectively. To further justify the effect of SR, we extend DenseNet121-SR to seven classic natural datasets, illustrating that SR generalizes to general classification tasks. Furthermore, since SR derives from contrastive learning, we compare traditional contrastive learning and end-to-end training with SR in the ablation studies. The comparison demonstrates that SR is superior in classification accuracy and training efficiency and is robust to its hyper-parameter setting.

Despite the promising performance, our method has limitations. It requires either large amounts of training data or pre-training on other large-scale datasets. The high performance of our models with SR partly owes to the efforts of the workers who collected the numerous CT scans. For smaller-scale datasets like SARS-CoV-2, the backbone of our method requires pre-training; pre-training on ImageNet helps improve the accuracy from around 98% to 99.20% in the DenseNet121-SR case. Besides, we could not evaluate other contrastive frameworks due to limited computational resources. Similarly, we could not redesign the CNN backbones to better balance computational efficiency and classification performance because of the substantial computational load of neural architecture search and pre-training.

Before being deployed in clinical practice, computer-aided methods must be evaluated for robustness and generalization. Compared to the existing online CT datasets, CT scans in practical scenarios are obtained by more diverse imaging devices under changeable environments, so it is important that the methods adapt to these unfamiliar scenarios. Also, the interpretability of the methods, i.e., the ability to explain their decision principles, can make them more trustworthy. Besides, with the evolution of the coronavirus, whether the pathological changes can still be observed clearly in CT scans should also be considered. Computer-aided methods can convince clinicians and patients as assisting tools in practical use only if the issues above are addressed.

For future work, we will explore redesigning the network backbone, pre-training the redesigned backbones on large-scale datasets, and making networks more explainable in COVID-19 CT diagnosis.

7. Conclusion

This paper aims to improve CNN performance for COVID-19 CT classification by enabling CNNs to learn parameter-efficient representations from CT scans. We propose the SR technique derived from contrastive learning and apply it to seven commonly used CNNs. The experimental results show that SR stably improves CNN classification performance. Together with a well-designed augmentation strategy, our model DenseNet121-SR, with 6.63 M parameters, outperforms the existing methods on three COVID-19 CT datasets: SARS-CoV-2, COVIDx CT-1, and COVIDx CT-2A. Specifically, on the largest available dataset, COVIDx CT-2A, DenseNet121-SR achieves 99.44% accuracy, as well as 98.40% precision, 99.59% sensitivity, and 99.50% specificity for the COVID-19 pneumonia category. Furthermore, the extensive experiments on seven classic natural datasets demonstrate that SR generalizes to common classification problems.

CRediT authorship contribution statement

Yujia Xu: Conceptualization, Methodology, Writing – original draft. Hak-Keung Lam: Method improvement, Writing – review & editing, Supervision. Guangyu Jia: Method improvement. Jian Jiang: Writing – review & editing. Junkai Liao: Writing – review & editing. Xinqi Bao: Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by King’s College London and has been performed using resources from the Cirrus UK National Tier-2 HPC Service at EPCC funded by the University of Edinburgh and EPSRC (EP/P020267/1).


References

  • 1.WHO . 2022. WHO Coronavirus (COVID-19) dashboard. https://covid19.who.int/. (Online; Last accessed 7 April 2022) [Google Scholar]
  • 2.McKibbin W., Fernando R., et al. The economic impact of COVID-19. Economics in the Time of COVID-19. 2020;45(10.1162) [Google Scholar]
  • 3.Iacobucci G. COVID-19: New UK variant may be linked to increased death rate, early data indicate. Br. Med. J. 2021;372(230):n230. doi: 10.1136/bmj.n230. [DOI] [PubMed] [Google Scholar]
  • 4.Mahase E. COVID-19: Where are we on vaccines and variants? Br. Med. J. 2021;372:n597. doi: 10.1136/bmj.n597. [DOI] [PubMed] [Google Scholar]
  • 5.Gu Y. 2022. COVID-19 infections tracker. https://covid19-projections.com/infections-tracker/. (Online; Last Accessed 17 August 2021) [Google Scholar]
  • 6.Kucirka L.M., Lauer S.A., Laeyendecker O., Boon D., Lessler J. Variation in false-negative rate of reverse transcriptase polymerase chain reaction–based SARS-CoV-2 tests by time since exposure. Ann. Intern. Med. 2020;173(4):262–267. doi: 10.7326/M20-1495. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Li Y., Yao L., Li J., Chen L., Song Y., Cai Z., Yang C. Stability issues of RT-PCR testing of SARS-CoV-2 for hospitalized patients clinically diagnosed with COVID-19. J. Med. Virol. 2020;92(7):903–908. doi: 10.1002/jmv.25786. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Tahamtan A., Ardebili A. Real-time RT-PCR in COVID-19 detection: Issues affecting the results. Exp. Rev. Mol. Diagnostics. 2020;20(5):453–454. doi: 10.1080/14737159.2020.1757437. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Kovács A., Palásti P., Veréb D., Bozsik B., Palkó A., Kincses Z.T. The sensitivity and specificity of chest CT in the diagnosis of COVID-19. Eur. Radiol. 2021;31(5):2819–2824. doi: 10.1007/s00330-020-07347-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Fang Y., Zhang H., Xie J., Lin M., Ying L., Pang P., Ji W. Sensitivity of chest CT for COVID-19: Comparison to RT-PCR. Radiology. 2020;296(2):E115–E117. doi: 10.1148/radiol.2020200432. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Xie X., Zhong Z., Zhao W., Zheng C., Wang F., Liu J. Chest CT for typical coronavirus disease 2019 (COVID-19) pneumonia: Relationship to negative RT-PCR testing. Radiology. 2020;296(2):E41–E45. doi: 10.1148/radiol.2020200343. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Gunraj H., Wang L., Wong A. COVIDNet-CT: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest CT images. Front. Med. 2020;7:1025. doi: 10.3389/fmed.2020.608525. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Panwar H., Gupta P., Siddiqui M.K., Morales-Menendez R., Bhardwaj P., Singh V. A deep learning and grad-CAM based color visualization approach for fast detection of COVID-19 cases using chest X-ray and CT-Scan images. Chaos Solitons Fractals. 2020;140 doi: 10.1016/j.chaos.2020.110190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Gunraj H., Sabri A., Koff D., Wong A. 2021. COVID-Net CT-2: Enhanced deep neural networks for detection of COVID-19 from chest CT images through bigger, more diverse learning. arXiv:arXiv:2101.07433. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Iandola F.N., Han S., Moskewicz M.W., Ashraf K., Dally W.J., Keutzer K. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and 0.5 MB model size. arXiv preprint arXiv:1602.07360. [Google Scholar]
  • 16.M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, MobileNetV2: Inverted residuals and linear bottlenecks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
  • 17.G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
  • 18.K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
  • 19.C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
  • 20.Angelov P., Almeida Soares E. 2020. SARS-CoV-2 CT-scan dataset: A large dataset of real patients CT scans for SARS-CoV-2 identification. MedRxiv. [Google Scholar]
  • 21.Alshazly H., Linse C., Barth E., Martinetz T. Explainable COVID-19 detection using chest CT scans and deep learning. Sensors. 2021;21(2):455. doi: 10.3390/s21020455. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Silva P., Luz E., Silva G., Moreira G., Silva R., Lucio D., Menotti D. COVID-19 detection in CT images with deep learning: A voting-based scheme and cross-datasets analysis. Inform. Med. Unlocked. 2020;20 doi: 10.1016/j.imu.2020.100427. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Kundu R., Basak H., Singh P.K., Ahmadian A., Ferrara M., Sarkar R. Fuzzy rank-based fusion of CNN models using Gompertz function for screening COVID-19 CT-scans. Sci. Rep. 2021;11(1):1–12. doi: 10.1038/s41598-021-93658-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Jaiswal A., Gianchandani N., Singh D., Kumar V., Kaur M. Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning. J. Biomol. Struct. Dyn. 2020:1–8. doi: 10.1080/07391102.2020.1788642. [DOI] [PubMed] [Google Scholar]
  • 25.Jangam E., Annavarapu C.S.R. A stacked ensemble for the detection of COVID-19 with high recall and accuracy. Comput. Biol. Med. 2021;135 doi: 10.1016/j.compbiomed.2021.104608. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Wang Z., Liu Q., Dou Q. Contrastive cross-site learning with redesigned net for COVID-19 CT classification. IEEE J. Biomed. Health Inf. 2020;24(10):2806–2813. doi: 10.1109/JBHI.2020.3023246. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Zhao J., Zhang Y., He X., Xie P. 2020. COVID-CT-dataset: A CT scan dataset about COVID-19. arXiv preprint arXiv:2003.13865, 490. [Google Scholar]
  • 28.Chen X., Yao L., Zhou T., Dong J., Zhang Y. Momentum contrastive learning for few-shot COVID-19 diagnosis from chest CT images. Pattern Recognit. 2021;113 doi: 10.1016/j.patcog.2021.107826. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.He X., Yang X., Zhang S., Zhao J., Zhang Y., Xing E., Xie P. 2020. Sample-efficient deep learning for COVID-19 diagnosis based on CT scans. Medrxiv. [Google Scholar]
  • 30.Polsinelli M., Cinque L., Placidi G. A light CNN for detecting COVID-19 from CT scans of the chest. Pattern Recognit. Lett. 2020;140:95–100. doi: 10.1016/j.patrec.2020.10.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Ter-Sarkisov A. 2020. COVID-CT-mask-Net: Prediction of COVID-19 from CT scans using regional features. MedRxiv. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Zhao W., Jiang W., Qiu X. Deep learning for COVID-19 detection based on CT images. Sci. Rep. 2021;11(1):1–12. doi: 10.1038/s41598-021-93832-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Shaik N.S., Cherukuri T.K. Transfer learning based novel ensemble classifier for COVID-19 detection from chest CT-scans. Comput. Biol. Med. 2022;141 doi: 10.1016/j.compbiomed.2021.105127. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Garg S., Kumar S., Muhuri P.K. A novel approach for COVID-19 infection forecasting based on multi-source deep transfer learning. Comput. Biol. Med. 2022;149 doi: 10.1016/j.compbiomed.2022.105915. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Fallahpoor M., Chakraborty S., Heshejin M.T., Chegeni H., Horry M.J., Pradhan B. Generalizability assessment of COVID-19 3D CT data for deep learning-based disease detection. Comput. Biol. Med. 2022;145 doi: 10.1016/j.compbiomed.2022.105464. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Kundu R., Singh P.K., Mirjalili S., Sarkar R. COVID-19 detection from lung CT-Scans using a fuzzy integral-based CNN ensemble. Comput. Biol. Med. 2021;138 doi: 10.1016/j.compbiomed.2021.104895. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Akter S., Das D., Haque R.U., Tonmoy M.I.Q., Hasan M.R., Mahjabeen S., Ahmed M. AD-CovNet: An exploratory analysis using a hybrid deep learning model to handle data imbalance, predict fatality, and risk factors in Alzheimer’s patients with COVID-19. Comput. Biol. Med. 2022 doi: 10.1016/j.compbiomed.2022.105657. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.K. He, H. Fan, Y. Wu, S. Xie, R. Girshick, Momentum contrast for unsupervised visual representation learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9729–9738.
  • 39.Chen X., Fan H., Girshick R., He K. 2020. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297. [Google Scholar]
  • 40.Chen T., Kornblith S., Norouzi M., Hinton G. International Conference on Machine Learning. PMLR; 2020. A simple framework for contrastive learning of visual representations; pp. 1597–1607. [Google Scholar]
  • 41.Chen T., Kornblith S., Swersky K., Norouzi M., Hinton G. 2020. Big self-supervised models are strong semi-supervised learners. arXiv preprint arXiv:2006.10029. [Google Scholar]
  • 42.X. Chen, K. He, Exploring simple siamese representation learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15750–15758.
  • 43.Caron M., Misra I., Mairal J., Goyal P., Bojanowski P., Joulin A. 2020. Unsupervised learning of visual features by contrasting cluster assignments. arXiv preprint arXiv:2006.09882. [Google Scholar]
  • 44.Grill J.-B., Strub F., Altché F., Tallec C., Richemond P.H., Buchatskaya E., Doersch C., Pires B.A., Guo Z.D., Azar M.G., et al. 2020. Bootstrap your own latent: A new approach to self-supervised learning. arXiv preprint arXiv:2006.07733. [Google Scholar]
  • 45.Li J., Zhao G., Tao Y., Zhai P., Chen H., He H., Cai T. Multi-task contrastive learning for automatic CT and X-ray diagnosis of COVID-19. Pattern Recognit. 2021;114 doi: 10.1016/j.patcog.2021.107848. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.E.D. Cubuk, B. Zoph, J. Shlens, Q.V. Le, RandAugment: Practical automated data augmentation with a reduced search space, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 702–703.
  • 47.Z. Zhong, L. Zheng, G. Kang, S. Li, Y. Yang, Random erasing data augmentation, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 13001–13008.
  • 48.Zhang H., Cisse M., Dauphin Y.N., Lopez-Paz D. 2017. Mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412. [Google Scholar]
  • 49.S. Yun, D. Han, S.J. Oh, S. Chun, J. Choe, Y. Yoo, CutMix: Regularization strategy to train strong classifiers with localizable features, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 6023–6032.
  • 50.Touvron H., Cord M., Douze M., Massa F., Sablayrolles A., Jégou H. International Conference on Machine Learning. PMLR; 2021. Training data-efficient image transformers & distillation through attention; pp. 10347–10357. [Google Scholar]
  • 51.Wightman R. 2019. PyTorch image models. GitHub Repository, GitHub. [DOI] [Google Scholar]
  • 52.Khosla P., Teterwak P., Wang C., Sarna A., Tian Y., Isola P., Maschinot A., Liu C., Krishnan D. 2020. Supervised contrastive learning. arXiv preprint arXiv:2004.11362. [Google Scholar]
  • 53.Loshchilov I., Hutter F. 2016. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983. [Google Scholar]
  • 54.Asano Y.M., Rupprecht C., Vedaldi A. 2019. A critical analysis of self-supervision, or what we can learn from a single image. arXiv preprint arXiv:1904.13132. [Google Scholar]
  • 55.Maji S., Rahtu E., Kannala J., Blaschko M., Vedaldi A. 2013. Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151. [Google Scholar]
  • 56.Krizhevsky A., Hinton G., et al. Citeseer; 2009. Learning Multiple Layers of Features from Tiny Images. [Google Scholar]
  • 57.M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, A. Vedaldi, Describing textures in the wild, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3606–3613.
  • 58.Nilsback M.-E., Zisserman A. 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing. IEEE; 2008. Automated flower classification over a large number of classes; pp. 722–729. [Google Scholar]
  • 59.Parkhi O.M., Vedaldi A., Zisserman A., Jawahar C. 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2012. Cats and dogs; pp. 3498–3505. [Google Scholar]
  • 60.J. Krause, M. Stark, J. Deng, L. Fei-Fei, 3D object representations for fine-grained categorization, in: Proceedings of the IEEE International Conference on Computer Vision Workshops, 2013, pp. 554–561.
