Abstract
COVID-19, the disease caused by the SARS-CoV-2 virus, was first reported in early December 2019 and has caused immense damage globally. The most widely used clinical screening method for COVID-19 is the Reverse Transcription Polymerase Chain Reaction (RT-PCR) test. RT-PCR relies on respiratory samples, which makes this manual technique complicated, laborious and time-consuming. Besides its low sensitivity, it carries a considerable risk for the medical staff performing the tests. Hence, there is a need for an automated diagnosis system that can provide quick and efficient results. This research proposes a multi-scale lightweight CNN (LMNet) architecture for COVID-19 detection. The proposed model is computationally less expensive than previously available models and requires less memory space. An ensemble of the proposed LMNet model with DenseNet169 and MobileNetV2 outperforms other state-of-the-art models. The ensemble model can be integrated at the backend of smart devices and is therefore useful for the Internet of Medical Things (IoMT) environment.
Keywords: Convolutional neural network, Deep learning, Classification, Transfer learning, Ensemble learning, COVID-19
1. Introduction
The years 2020 and 2021 were plagued by the coronavirus (COVID-19) pandemic, which has affected lives worldwide. More than 250 million cases have been reported, with more than 5 million fatalities. COVID-19 was first reported in Wuhan, China, in December 2019 and was declared a pandemic on 11th March 2020 by the World Health Organization (WHO).¹ The virus causing the disease was named SARS-CoV-2.² It belongs to the Nidovirales order of viruses, which also includes the viruses that cause Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS). Symptoms such as breathlessness, fever, sneezing, and throat swelling are common indications of COVID-19 infection [1]. The infection has also been observed to cause severe acute respiratory syndrome, pneumonia, kidney failure and even death in severe cases. COVID-19 is more likely to become severe in persons with cardiovascular illness, diabetes, and cancer (usually older people) [1].
Though many vaccines have been developed throughout the world, no permanent cure for COVID-19 has been found to date. Hence, the need for early detection and prevention has risen. COVID-19 is transmitted when an infected person sneezes or coughs [1]. The droplets scatter in the air or settle on the ground and nearby surfaces, and anyone who inhales them or comes in contact with contaminated surfaces risks getting infected. RT-PCR is a widely used clinical screening method for COVID-19 detection. It uses respiratory samples for testing, which makes the technique complicated and time-consuming. Besides its low sensitivity, it carries a huge risk for the medical staff performing the tests. Further, the WHO suggests 10–30 tests per confirmed case as the optimal benchmark of adequate testing. Most COVID-19 cases show similar features on radiographic images: multi-focal, bilateral, ground-glass opacities with a peripheral or posterior distribution, present mainly in the lower lobes in the early stage and progressing to pulmonary consolidation in the later stage [2], [3], [4]. Radiological imaging has therefore become an essential diagnostic tool for COVID-19.
Although chest X-ray (CXR) images assist in the early detection of suspected COVID-19 cases, the CXR appearance of viral pneumonia is comparable to that of other communicable and inflammatory lung disorders [2], [4], making it difficult for radiologists to distinguish COVID-19 from other lung infections (mainly viral pneumonia) [1], [2], [5].
This research proposes a model called LMNet for COVID-19 detection. The LMNet model consists of convolutional layers, batch normalization, a flatten layer, a fully connected dense layer, and activation functions such as Leaky ReLU, ReLU and Softmax. The proposed LMNet model is computationally less expensive than available transfer learning models and requires less memory space; hence, it can easily be loaded into a computer's memory. The input CXR image is pre-processed, re-scaled, resized, and then fed into a queue. The trained model is kept in memory, and whenever a new image is inserted into the queue, the stored model predicts the probabilities of the three classes and reports them as its diagnosis in the IoMT environment. Section 3.6 discusses the model embedding and its working with IoMT devices in detail. The major contributions of our research are as follows:
1. Proposed a lightweight multi-scale CNN (LMNet) architecture for COVID-19 detection. The proposed model is computationally less expensive than previously available models and requires less memory space. The accuracy and F1-score achieved by the proposed LMNet and ensemble models are higher than those of other state-of-the-art methods for COVID-19 detection.
2. Due to its low computational complexity, the proposed model is suitable for embedding in Internet-enabled devices for early detection of COVID-19.
The organization of the paper is as follows: Section 2 reviews relevant work in the research area of COVID-19 detection. Section 3 describes the proposed system, the dataset and its pre-processing and preparation. Section 4 discusses the results of the experimented models, followed by the conclusion of the research work in Section 5.
2. Related works
Since the beginning of the COVID-19 pandemic, much research has been reported on predicting and identifying COVID-19-infected people. Chest X-ray (CXR) images are the most widely used input to the existing binary and multi-class classification models. Researchers mainly use recent technologies like transfer learning, deep learning, and Artificial Intelligence with IoT-based frameworks [6] to provide solutions. This section highlights the models used for COVID-19 prediction along with their performance and limitations.
Apostolopoulos and Mpesiana [7] used X-ray images of Normal, Pneumonia, and COVID-19 cases for automatic detection by applying transfer learning techniques. Their procedure extracted significant biomarkers related to COVID-19 with an accuracy of 96.78% on two-class classification using the MobileNetV2 model. Zhang et al. [8] proposed a Confidence-Aware Anomaly Detection (CAAD) model, a one-class model for fast and reliable scanning. Singh et al. [9] tuned a CNN on CT scans using multi-objective differential evolution (MODE) to classify people with and without COVID-19; their model achieved accuracy up to 93.5%. Jaiswal et al. [10] used a DenseNet201-based deep transfer learning model for a similar purpose and achieved a classification accuracy of 96%. Adhikari [11] suggested a method to reduce turnaround time for doctors using both X-ray and CT scans, proposing a two-stage DenseNet architecture for diagnosing COVID-19 patients. Alqudah et al. [12] studied two different scenarios for COVID-19 detection using CXR images: in the first, MobileNet, AOCTNet and ShuffleNet CNNs were used to extract the features; in the second, the extracted features were classified using machine learning algorithms.
Khan et al. [3] obtained a 95% accuracy on a three-class classification (COVID-19, Pneumonia and Normal) using the Xception architecture. Ozturk et al. [13] proposed a deep learning network for automating the COVID-19 diagnosis; their multi-class model achieved an accuracy of 87.02%, and their binary-class model attained 98.08%. Wang et al. [14] used a deep CNN model and achieved a testing accuracy of 92.60%. Farooq and Hafeez [15] achieved 96.23% accuracy using a ResNet CNN for four classes, namely Normal, COVID-19, bacterial pneumonia and viral pneumonia.
Wang et al. [16] used a customized VGG16 model for identifying infectious lung regions and classifying different types of pneumonia. Rajpurkar et al. [17] studied and proposed a 121-layer CNN on CXR images; they identified 14 unique disorders, including pneumonia, using a combination of distinct networks. Ho and Gwak [18] used a pre-trained DenseNet-121 along with feature extraction techniques for the precise identification of 14 thoracic disorders. Lakhani and Sundaram [19] obtained an AUC of 0.95 in pneumonia identification using GoogLeNet and AlexNet with image enhancement. Kong and Agarwal [20] reported opacities in the right infrahilar space of a COVID-19 patient. Jain et al. [4] experimented on three different pre-trained models, analysing the results obtained with the InceptionV3, Xception, and ResNeXt models; the best performing model was Xception, which achieved a testing accuracy of 97%. Maghdid et al. [5] proposed a modified pre-trained AlexNet model and a simple CNN architecture. Their AlexNet model achieved accuracy up to 98%, and the simple CNN model achieved 94.1%.
2.1. Research gaps and motivation
The limitations observed with the current research works in the domain of COVID-19 detection are as follows:
1. Most existing research has been carried out on small datasets with limited training samples, whereas an effective model should be trained on datasets with large variation in the training set.
2. Existing works extract features from X-ray images using a single-scale receptive field and fail to detect edge-variation features properly; multi-scale filters help the model learn edge variations.
3. Most studies rely on pretrained models such as VGG16, ResNet and Xception. These models have a large number of parameters, are computationally expensive, and require large memory space. The proposed model therefore targets a proper balance between the accuracy and complexity of the deep learning model.
3. Proposed methodology
The aim of this research is to build a lightweight model for COVID-19 detection that requires fewer resources and can easily be integrated with Internet-enabled devices. Existing transfer learning models require more resources, more computation time, and a high-end system configuration. This research addresses these limitations while achieving competitive prediction accuracy in most test cases. The complete workflow of the proposed LMNet model is shown in Fig. 1. The steps involved in building the model are discussed in the following subsections.
Fig. 1.
Overview of the proposed Lightweight Multi-scale CNN (LMNet) model.
Table 1.
Description of the dataset used to train and test the proposed LMNet model.
| Classes | Training data | Test data | Total count (per class) |
|---|---|---|---|
| COVID-19 | 460 | 116 | 576 |
| Normal | 1260 | 317 | 1577 |
| Pneumonia | 3418 | 855 | 4273 |
| Total | 5138 | 1288 | 6426 |
3.1. Dataset description
The dataset used in this research consists of 6426 CXR images of three classes: COVID-19, Normal and Pneumonia.³ The dataset is divided into training and testing sets in the ratio of 80% to 20%. As shown in Table 1, the training set consists of 5138 CXR images, of which 460 are COVID-19, 1260 are Normal, and 3418 are Pneumonia. The test set consists of 1288 CXR images: 116 COVID-19, 317 Normal, and 855 Pneumonia. To create variations of each image, data augmentation techniques were applied to all three classes of the training dataset.
3.2. Data pre-processing
The CXR images in the dataset come in different shapes and sizes; therefore, the images were resized before being passed to the different CNN architectures as input. Since this research also uses pre-trained deep CNN models, different image sizes need to be fed to different CNNs. For InceptionV3 and Xception, the images were resized to 299 × 299 pixels, whereas for DenseNet169, ResNet50V2, MobileNetV2 and the proposed LMNet model, the images were resized to 224 × 224 pixels. We also scaled down the input images by a factor of 255 before feeding them to the model. Every digital image is formed of pixels with values in the range 0 to 255, where 0 is black and 255 is white; scaling down by 255 therefore transforms every pixel value from the range [0, 255] to [0, 1]. All inputs share the same weights and learning rate, so when pixel values range over [0, 255], images with large pixel values produce disproportionately strong losses while those with small values produce weak losses, all of which contribute to the backpropagation update. Scaling every image to the same range [0, 1] lets each image contribute more evenly to the total loss.
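As an illustration of this step (a minimal sketch, not the authors' exact code), the resize-and-rescale operation can be written in TensorFlow as follows; the file name is hypothetical:

```python
import tensorflow as tf

def preprocess(path, size=(224, 224)):
    # Decode a CXR image, resize it to the target input shape, and
    # rescale pixel values from [0, 255] to [0, 1] by dividing by 255
    img = tf.io.decode_image(tf.io.read_file(path), channels=3,
                             expand_animations=False)
    img = tf.image.resize(img, size)
    return img / 255.0

x224 = preprocess("cxr_sample.jpeg")                    # LMNet, DenseNet169, ...
x299 = preprocess("cxr_sample.jpeg", size=(299, 299))   # InceptionV3, Xception
```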
3.3. Data augmentation
This paper used three different image augmentation techniques on the training data, i.e. translation, zoom and rotation, to generate variations of the available dataset. As shown in Fig. 2, translation was performed by flipping the images horizontally. Zoom augmentation was applied with a zoom range of 0.3, which randomly zooms the image and either adds new pixel values around the image or interpolates existing pixel values; the zoom factor is thus sampled between 0.7 (zoom in) and 1.3 (zoom out). Rotation augmentation was performed by rotating the images clockwise and anticlockwise in the range of 0 to 90 degrees. A sketch of these settings appears after Fig. 2.
Fig. 2.
Outcomes of the data augmentation process on a sample image.
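These settings map directly onto Keras' ImageDataGenerator. The following is a minimal sketch under that assumption; the training directory path is hypothetical:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Horizontal flips, zoom factors sampled in [0.7, 1.3], and rotations of up
# to 90 degrees, as described above; rescaling folds in the pre-processing
# step of Section 3.2.
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    horizontal_flip=True,
    zoom_range=0.3,
    rotation_range=90,
)

train_flow = train_gen.flow_from_directory(
    "data/train",             # hypothetical path: one sub-folder per class
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
)
```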
3.4. Pretrained models
Transfer learning models are widely used for accurate COVID-19 prediction. This research used multiple pre-trained transfer learning models and compared their outcomes with the proposed LMNet model. Details of the transfer learning models used can be found in the Keras applications library.⁴
- ResNet50V2: The residual neural network is a powerful backbone model that uses batch normalization at its core. ResNet uses identity connections, which protect the network from the vanishing gradient problem that occurs as the network gets deeper and more complex. In addition, the bottleneck residual block design makes training faster. ResNet50 is a CNN that is 50 layers deep and was trained on the ImageNet dataset.
- InceptionV3: The Inception module is a collection of convolution and pooling operations performed in parallel, which helps extract features using different kernel sizes. InceptionV3 also uses the Label Smoothing Regularization method along with batch normalization for the fully connected layer of the auxiliary classifier. Like ResNet50, this network was trained on the ImageNet dataset.
- Xception: It uses depthwise separable convolutions and relies on shortcuts between convolution blocks, similar to ResNet. Xception models are expensive to train but perform considerably better than Inception models.
- MobileNetV2: An architecture designed for mobile devices. It has residual connections between bottleneck layers, based on an inverted residual structure. It was designed to give high accuracy while keeping the parameters and mathematical operations to a minimum.
- EfficientNetB2: EfficientNets are a family of models designed by carefully balancing network depth, width, and resolution for better performance. The main building block of EfficientNet is the mobile inverted bottleneck MBConv, taken from MobileNetV2. It uses shortcuts between bottlenecks, which effectively reduces computation.
- VGG19: A 19-layer deep CNN. Rather than relying on a large number of hyper-parameters, it consistently uses convolution layers with "same" padding and max-pooling layers; the spatial padding preserves the spatial resolution of the image. At the end, it has two fully connected layers followed by a softmax output. VGG19 was trained on the ImageNet dataset.
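For reference, a typical way to adapt one of these backbones to the three-class task is sketched below. The paper does not specify the exact classification head, so the global-average-pooling and dense layers here are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Frozen ImageNet backbone with a small head for the three classes
# (COVID-19, Normal, Pneumonia); DenseNet169 is shown, and the other
# backbones in Table 2 can be swapped in the same way.
base = tf.keras.applications.DenseNet169(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False   # keep the ImageNet features, train only the head

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
```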
3.5. LMNet: Proposed model
This paper proposes a lightweight multi-scale network, inspired by [21], [22], for COVID-19 detection. Linearly connected single-branch convolutional layers focus on uniformly scaled filter sizes and thus ignore detailed edge information, which deteriorates the ability to learn minute changes in the pixel values across input classes. Multi-scale filters allow the network to preserve both extensive and small-space features, enhancing the network's robustness against these issues. When combined in a lightweight architecture, multi-scale filters achieve good classification accuracy while reducing the computational and space complexity for real-time COVID-19 detection applications. A detailed comparison of the number of parameters, size and hyperparameters of existing pretrained transfer learning models and the proposed LMNet model is shown in Table 2. The number of convolutional layers, the activation used at each layer and other details of the proposed LMNet model are shown in Table 3. Algorithm 1 presents the data preparation steps, and Algorithm 2 the steps involved in the proposed LMNet model.
Table 2.
Comparison of hyperparameters between proposed and existing transfer learning framework.
| Hyperparameter | VGG19 | InceptionV3 | DenseNet169 | ResNet50V2 | Xception | MobileNetV2 | Proposed LMNet |
|---|---|---|---|---|---|---|---|
| Input shape | (224, 224, 3) | (299, 299, 3) | (224, 224, 3) | (224, 224, 3) | (299, 299, 3) | (224, 224, 3) | (224, 224, 3) |
| Model size | 1.5 GB | 250 MB | 152.75 MB | 270 MB | 250.6 MB | 27.57 MB | 11.72 MB |
| Total parameters | 143,667,240 | 23,851,784 | 14,307,880 | 25,613,800 | 22,910,480 | 3,538,984 | 961,923 |
| Epochs | 30 (all models) | | | | | | |
| Batch size | 32 (all models) | | | | | | |
| Loss | Categorical cross-entropy (all models) | | | | | | |
| Optimizer | Adam, learning rate 0.001 (all models) | | | | | | |
Table 3.
Details of the convolutional layers and other hyperparameters used in the proposed LMNet model architecture.

| Layer | Filter size/Stride | Output dimension | Filters | Dilation | Activation | Parameters |
|---|---|---|---|---|---|---|
| Conv2D_1_a | 7 × 7 / 2 | 112 × 112 × 16 | 16 | – | LeakyReLU | 2,368 |
| Conv2D_1_b | 3 × 3 / 2 | 112 × 112 × 16 | 16 | – | LeakyReLU | 448 |
| Add_1_b | – | 112 × 112 × 16 | – | – | – | 0 |
| Conv2D_2_a | 5 × 5 / 2 | 56 × 56 × 32 | 32 | – | LeakyReLU | 12,832 |
| Conv2D_2_b | 1 × 1 / 2 | 56 × 56 × 32 | 32 | – | LeakyReLU | 544 |
| Add_2_b | – | 56 × 56 × 32 | – | – | – | 0 |
| BatchNormalization | – | 56 × 56 × 32 | – | – | – | 128 |
| Conv2D_3_a | 1 × 1 / 2 | 28 × 28 × 16 | 16 | – | ReLU | 528 |
| Conv2D_3_b | 1 × 1 / 2 | 28 × 28 × 16 | 16 | – | ReLU | 528 |
| Conv2D_3_c | 1 × 1 / 2 | 28 × 28 × 16 | 16 | – | ReLU | 528 |
| Conv2D_3_d | 1 × 1 / 2 | 28 × 28 × 16 | 16 | – | ReLU | 528 |
| Add_3_a | – | 28 × 28 × 16 | – | – | – | 0 |
| Add_3_b | – | 28 × 28 × 16 | – | – | – | 0 |
| Conv2D_4_a | 3 × 3 / 1 | 28 × 28 × 32 | 32 | 3 | ReLU | 4,640 |
| Conv2D_4_b | 3 × 3 / 1 | 28 × 28 × 32 | 32 | 3 | ReLU | 4,640 |
| Conv2D_4_c | 3 × 3 / 1 | 28 × 28 × 32 | 32 | – | ReLU | 4,640 |
| Conv2D_4_d | 3 × 3 / 1 | 28 × 28 × 32 | 32 | – | ReLU | 4,640 |
| Add_4_a | – | 28 × 28 × 32 | – | – | – | 0 |
| Add_4_b | – | 28 × 28 × 32 | – | – | – | 0 |
| Conv2D_5_a | 5 × 5 / 1 | 28 × 28 × 64 | 64 | 2 | ReLU | 51,264 |
| Conv2D_5_b | 5 × 5 / 1 | 28 × 28 × 64 | 64 | 2 | ReLU | 51,264 |
| Conv2D_5_c | 5 × 5 / 1 | 28 × 28 × 64 | 64 | – | ReLU | 51,264 |
| Conv2D_5_d | 5 × 5 / 1 | 28 × 28 × 64 | 64 | – | ReLU | 51,264 |
| Concat | – | 28 × 28 × 256 | – | – | – | 0 |
| Conv2D_6_a | 3 × 3 / 2 | 13 × 13 × 256 | 256 | – | ReLU | 590,080 |
| Flatten | – | 1 × 1 × 43,264 | – | – | – | 0 |
| Dense | – | 3 | – | – | Softmax | 129,795 |
| Total | | | | | | 961,923 |
LMNet begins with a block comprising four convolution layers, where 3 × 3 and 7 × 7 sized convolution filters slide over the input image simultaneously to extract minute variations in the pixel values of the input CXR image. The feature maps obtained from these two convolution filters are added to preserve the receptive field's inherited features. Next, 1 × 1 and 5 × 5 sized multi-scale convolution filters slide over the fused feature map simultaneously to capture other situational and environmental information present in the feature map. These multi-scale convolution filters allow the network to learn minute variations in the pixel values and enhance the model's distinctive capability with fewer parameters. The features obtained from the 1 × 1 and 5 × 5 sized multi-scale convolution filters are added using Eq. (1).
$$F_{\text{add}}(l, w, d) = F_{1\times1}(l, w, d) + F_{5\times5}(l, w, d) \tag{1}$$

where $F_{k\times k}(l, w, d)$ is the feature map obtained from the convolution operation with a $k \times k$ filter, and $l$, $w$ and $d$ represent the length, width and depth of the feature map extracted after the convolution operation, respectively.
The added features are then normalized using a batch normalization layer. Batch normalization reduces the dependence of gradients on the scale of the parameters or their initial values, and it reduces the need for dropout, local response normalization and other regularization techniques. The output of the batch normalization layer passes into a block consisting of four parallel paths, each comprising a convolution layer with 16 filters of size 1 × 1. Two of these four paths follow the sequential CNN flow, whereas the remaining two paths combine the features generated by previous layers. Similar blocks are added linearly to the previous block: the next two blocks consist of 32 and 64 filters of size 3 × 3 and 5 × 5, respectively. The top two lateral convolution operations are dilated with dilation rates of 3 and 2 for the 3 × 3 and 5 × 5 filters, respectively, as shown in Fig. 1. The output of the dilated convolution operation is calculated as shown in Eq. (2).
$$Y(l, w) = \sum_{i}\sum_{j} X(l + r \cdot i,\; w + r \cdot j)\, K(i, j) \tag{2}$$

where $Y$ is the output of the dilated convolution layer for a given input $X$ and a filter $K$ with dilation rate $r$, given that the length and width are $l$ and $w$, respectively.
The dilated convolution operation helps extract more information from the output of every convolution operation, and it achieves a larger receptive field at the same computational expense. The kernel size after dilation changes as shown in Eq. (3).
$$\hat{k} = k + (k - 1)(r - 1) \tag{3}$$

where $\hat{k}$ is the kernel size after dilation, $k$ represents the initial kernel size and $r$ is the dilation rate.
The feature maps extracted from the block with 64 filters of size 5 × 5 are concatenated and forwarded to the next convolution layer, which has 256 filters of size 3 × 3 and a stride of 2, as shown in Fig. 1. We chose a convolution layer with stride 2 instead of max pooling for dimension reduction of the feature maps in order to extract micro-level edge features. The resultant feature maps are then flattened and passed to the fully connected layer, whose output passes through the softmax activation function shown in Eq. (4) to predict the probability distribution over the three input classes: COVID-19, Normal and Pneumonia. The output layer uses the categorical cross-entropy loss function shown in Eq. (5) to calculate the loss between the actual and predicted labels.
$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{c=1}^{C} e^{z_c}}, \qquad j = 1, \ldots, C \tag{4}$$

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c} \tag{5}$$

where $z$ represents the values from the neurons of the fully connected layer, $N$ represents the number of training samples and $C$ is the total number of classes. The $y_{i,c}$ and $\hat{y}_{i,c}$ in Eq. (5) are the true and predicted labels, respectively. The proposed model is optimized using the Adam optimizer, which updates the weights so that the model attains the minimum total loss.
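A minimal Keras sketch of this architecture, following the layer inventory of Table 3, is given below. The cross-path Add connections (Add_3, Add_4 in Fig. 1) are simplified into independent parallel paths, so the wiring is an approximation, even though the per-layer configurations and the resulting parameter count (961,923) match Table 3:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_lrelu(x, filters, kernel, stride):
    # Convolution followed by LeakyReLU, as in the first two LMNet blocks
    x = layers.Conv2D(filters, kernel, strides=stride, padding="same")(x)
    return layers.LeakyReLU()(x)

def build_lmnet(input_shape=(224, 224, 3), num_classes=3):
    inp = layers.Input(shape=input_shape)

    # Multi-scale entry: 7x7 and 3x3 filters slide over the input in
    # parallel and their feature maps are added (Eq. (1) style fusion)
    x = layers.Add()([conv_lrelu(inp, 16, 7, 2),
                      conv_lrelu(inp, 16, 3, 2)])      # 112 x 112 x 16

    # Second multi-scale pair: 5x5 and 1x1 filters over the fused map
    x = layers.Add()([conv_lrelu(x, 32, 5, 2),
                      conv_lrelu(x, 32, 1, 2)])        # 56 x 56 x 32
    x = layers.BatchNormalization()(x)

    # Four parallel 1x1 paths (Conv2D_3_a..d)
    paths = [layers.Conv2D(16, 1, strides=2, activation="relu")(x)
             for _ in range(4)]                        # 28 x 28 x 16 each

    # 3x3 stage: the top two lateral paths are dilated with rate 3
    paths = [layers.Conv2D(32, 3, padding="same", activation="relu",
                           dilation_rate=3 if i < 2 else 1)(p)
             for i, p in enumerate(paths)]             # 28 x 28 x 32 each

    # 5x5 stage: the top two lateral paths are dilated with rate 2
    paths = [layers.Conv2D(64, 5, padding="same", activation="relu",
                           dilation_rate=2 if i < 2 else 1)(p)
             for i, p in enumerate(paths)]             # 28 x 28 x 64 each

    x = layers.Concatenate()(paths)                    # 28 x 28 x 256

    # Strided 3x3 convolution instead of max pooling (Conv2D_6_a)
    x = layers.Conv2D(256, 3, strides=2, activation="relu")(x)  # 13 x 13 x 256

    x = layers.Flatten()(x)                            # 43,264 features
    out = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inp, out)

model = build_lmnet()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
```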
3.6. Embedding in smart devices
Fig. 3 shows a graphical representation of embedding the proposed LMNet model into Internet-enabled smart devices for early prediction of COVID-19. The CXR image is given as input to the model, which processes and analyses it and then yields the predictive outcome. The X-ray machine produces a stream of electromagnetic radiation that interacts with the anode in the X-ray tube.⁵ The X-rays produced are projected onto the chest region of the patient, and X-ray-sensitive plates capture the transmitted rays to produce a digital image, which is transferred to the computer. This CXR image can be viewed on the computer screen by the medical expert.
Fig. 3.
Overview of model embedding and its working in IoMT devices.
The CXR image is pre-processed before final predictions are made: the input image is rescaled, resized and then fed into the input queue by the producer, a service that brings the images to be examined into the queue. The consumer keeps the trained model loaded in memory; whenever it finds a new image in the queue, it calls the model's predict function and removes the image from the queue, using a polling mechanism to keep track of the queue. The result obtained from the model is the probability of each class (COVID-19, Normal, Pneumonia) for the CXR image. The medical expert can consider the shown result together with the CXR image to give a diagnosis.
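A minimal sketch of this producer–consumer loop is given below; the polling timeout, the preprocessing helper and the way the model is obtained are assumptions, not the authors' implementation:

```python
import queue
import threading
import numpy as np
import tensorflow as tf

CLASSES = ["COVID-19", "Normal", "Pneumonia"]
image_queue = queue.Queue()

def preprocess(path):
    # Rescale to [0, 1] and resize to the LMNet input size (Section 3.2)
    img = tf.io.decode_image(tf.io.read_file(path), channels=3,
                             expand_animations=False)
    return (tf.image.resize(img, (224, 224)) / 255.0).numpy()

def producer(paths):
    # The producer service brings the images to be examined into the queue
    for p in paths:
        image_queue.put(preprocess(p))

def consumer(model):
    # The consumer keeps the trained model in memory and polls the queue;
    # each new image is predicted and then removed from the queue
    while True:
        try:
            img = image_queue.get(timeout=1.0)   # simple polling mechanism
        except queue.Empty:
            continue                             # nothing to examine yet
        probs = model.predict(img[np.newaxis, ...], verbose=0)[0]
        print({c: round(float(p), 3) for c, p in zip(CLASSES, probs)})
        image_queue.task_done()

# Hypothetical wiring: run the consumer in a background thread while the
# producer feeds newly captured CXR images into the queue.
# threading.Thread(target=consumer, args=(model,), daemon=True).start()
# producer(["cxr_sample.jpeg"])
```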
4. Results
The proposed model was written in Python and implemented in the Keras framework with TensorFlow as the backend. The extensive experiments on CXR image data were carried out using Kaggle's free access to NVIDIA Tesla P100 GPUs (16 GB VRAM), with a 2-core Intel Xeon CPU and 13 GB RAM. The pre-trained models were first trained and tested individually along with the LMNet model. The top three models, selected by model size and accuracy, were then ensembled, and the ensemble model was evaluated on the test data. The performance of the models under these settings is discussed separately in two subsections: (i) results with pre-trained transfer learning models (Section 4.1) and (ii) results of the proposed LMNet and ensemble model (Section 4.2). To evaluate the performance of the proposed model, precision, recall, F1-score, accuracy and the AUC-ROC curve are used. These evaluation metrics are defined in Eqs. (6) to (9):
- The precision for each class (COVID-19/Normal/Pneumonia) is the ratio of accurately predicted samples of that class to the total samples predicted as that class, as given in Eq. (6).
- The recall for each class is the ratio of accurately predicted samples of that class to the total actual samples of that class, as given in Eq. (7).
- The F1-score is the harmonic mean of precision and recall, as given in Eq. (8).
- Accuracy is the proportion of correctly predicted COVID-19, Normal and Pneumonia samples out of the total number of input images, as given in Eq. (9).
$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{9}$$

where TP represents true positives, TN true negatives, FP false positives and FN false negatives [23], [24].
The AUC-ROC curve, also called the Area Under the Receiver Operating Characteristic curve, is plotted as TPR against FPR, where TPR stands for True Positive Rate and FPR for False Positive Rate. The TPR and FPR values for the COVID-19 class are calculated using Eqs. (10) and (11), respectively; TPR and FPR for the Normal and Pneumonia classes are calculated similarly. The macro-average is a simple average over the classes, so every class is given equal weight independently of its proportion.
$$\text{TPR}_{\text{COVID-19}} = \frac{A}{B + C} \tag{10}$$

In Eq. (10), A = number of accurately predicted COVID-19 CXR, B = number of COVID-19 CXR predicted as COVID-19, and C = number of COVID-19 CXR predicted as Normal or Pneumonia.

$$\text{FPR}_{\text{COVID-19}} = \frac{D}{E + F} \tag{11}$$

In Eq. (11), D = number of Normal and Pneumonia CXR predicted as COVID-19, E = number of Normal and Pneumonia CXR predicted as COVID-19, and F = number of Normal and Pneumonia CXR predicted as Normal or Pneumonia.
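For reference only (not part of the original pipeline), the per-class precision, recall and F1-scores reported below, together with the confusion matrices from which TPR and FPR are derived, can be computed with scikit-learn; the labels here are placeholders:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder integer labels for illustration; in the experiments these come
# from the test set and model.predict (0 = COVID-19, 1 = Normal, 2 = Pneumonia)
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])

print(confusion_matrix(y_true, y_pred))    # rows: actual, columns: predicted
print(classification_report(
    y_true, y_pred, target_names=["COVID-19", "Normal", "Pneumonia"]))
```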
4.1. Results with pre-trained transfer learning models
The experiments started with InceptionV3, a pre-trained deep learning model. The performance of InceptionV3 on the test samples was measured with precision, recall, F1-score, accuracy and AUC-ROC; the outcomes are shown in Table 4. The precision for COVID-19 (say C1), the Normal class (say C2), and Pneumonia (say C3) was 1.00, 0.83, and 0.99, respectively. The recall for C1, C2, and C3 was 0.98, 0.99, and 0.93, respectively, and the overall accuracy of the InceptionV3 model was 96.58%. The model size was 250 MB with 22,910,480 parameters, which is huge and requires substantial resources to process the test data.
Table 4.
Experimental outcomes of the pretrained transfer learning models.
| Model | Class | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|---|
| InceptionV3 | COVID-19 | 1.00 | 0.98 | 0.99 | 96.58% |
| | Normal | 0.83 | 0.99 | 0.90 | |
| | Pneumonia | 0.99 | 0.93 | 0.96 | |
| | Macro average | 0.94 | 0.97 | 0.95 | |
| | Weighted average | 0.96 | 0.95 | 0.95 | |
| Xception | COVID-19 | 1.00 | 0.99 | 1.00 | 97.52% |
| | Normal | 0.93 | 0.96 | 0.94 | |
| | Pneumonia | 0.98 | 0.97 | 0.98 | |
| | Macro average | 0.97 | 0.97 | 0.97 | |
| | Weighted average | 0.97 | 0.97 | 0.97 | |
| DenseNet169 | COVID-19 | 1.00 | 1.00 | 1.00 | 97.63% |
| | Normal | 0.95 | 0.96 | 0.95 | |
| | Pneumonia | 0.98 | 0.98 | 0.98 | |
| | Macro average | 0.98 | 0.98 | 0.98 | |
| | Weighted average | 0.98 | 0.98 | 0.98 | |
| VGG19 | COVID-19 | 1.00 | 0.96 | 0.98 | 95.42% |
| | Normal | 0.90 | 0.92 | 0.91 | |
| | Pneumonia | 0.97 | 0.97 | 0.97 | |
| | Macro average | 0.96 | 0.95 | 0.95 | |
| | Weighted average | 0.95 | 0.95 | 0.95 | |
| ResNet50V2 | COVID-19 | 0.99 | 1.00 | 1.00 | 97.44% |
| | Normal | 0.93 | 0.95 | 0.94 | |
| | Pneumonia | 0.98 | 0.97 | 0.98 | |
| | Macro average | 0.97 | 0.97 | 0.97 | |
| | Weighted average | 0.97 | 0.97 | 0.97 | |
| MobileNetV2 | COVID-19 | 0.99 | 0.99 | 0.99 | 97.83% |
| | Normal | 0.95 | 0.97 | 0.96 | |
| | Pneumonia | 0.99 | 0.98 | 0.98 | |
| | Macro average | 0.98 | 0.98 | 0.98 | |
| | Weighted average | 0.98 | 0.98 | 0.98 |
The next model was Xception; its prediction accuracy was 97.52%, better than InceptionV3. However, this model also requires substantial resources because of its high parameter count and model size. The precision for C1 (COVID-19) with both InceptionV3 and Xception was 1.00, indicating 100% precision on that class. For Xception, the precision for C1, C2, and C3 was 1.00, 0.93, and 0.98, respectively, whereas the recall values were 0.99, 0.96, and 0.97. With DenseNet169, the precision for C1, C2 and C3 was 1.00, 0.95 and 0.98, the recall was 1.00, 0.96 and 0.98, and the corresponding F1-scores were 1.00, 0.95 and 0.98, respectively. The accuracy of the DenseNet169 model was 97.63%. DenseNet169 also achieved a precision of 1.00 for the C1 (COVID-19) class and 0.95 for the Normal class, the highest among InceptionV3, Xception and DenseNet169.
The precision for classes C1, C2, and C3 with the VGG19 model was poorer than with the other models, at 1.00, 0.90, and 0.97, respectively; the recall values, 0.96, 0.92 and 0.97, were also lower than those of the other experimented models. The accuracy of the VGG19 model was only 95.42%, indicating that around 5% of the test samples were misclassified. The next two models were ResNet50V2 and MobileNetV2. With ResNet50V2, the precision for C1, C2, and C3 was 0.99, 0.93, and 0.98 and the recall was 1.00, 0.95 and 0.97, respectively; its accuracy was 97.44%. The performance of MobileNetV2 was similar, with an accuracy of 97.83%, the best among all experimented pretrained transfer learning models, i.e., InceptionV3, Xception, DenseNet169, VGG19, ResNet50V2, and MobileNetV2. The best performing pretrained model, MobileNetV2, consists of 3,538,984 parameters and has a size of 27.57 MB. MobileNetV2 uses a single convolutional filter per input channel in its depthwise layers. Hence, there is a need for a model with multi-scale filters that extracts both extensive and small-space features from the input image and enhances the network's robustness, while having fewer parameters and requiring less storage and computational resources. Such a model can be embedded in an Internet-enabled low-cost machine and may therefore be helpful in monitoring and detecting COVID-19 in real time.
4.2. Results of the proposed model
The experiments started with the pre-trained transfer learning models, which achieved acceptable prediction accuracy, as shown in Table 4. However, due to their huge parameter counts and large model sizes, a model with fewer parameters and lower storage requirements was needed. The proposed LMNet model fills these gaps and achieves comparable prediction performance. The performance of the proposed LMNet model on the test samples was measured with the same metrics, i.e., precision, recall, F1-score, accuracy, and AUC-ROC. The C1, C2, and C3 class precision was 1.00, 0.92, and 0.98, respectively, and the recall was 0.97, 0.95, and 0.97, respectively. The overall accuracy of the proposed LMNet model was 96.03%, as shown in Table 5. The model size was 11.72 MB with 961,923 parameters, far fewer than the existing pre-trained transfer learning models (Table 2). The comparative performance of the models in terms of precision, recall, and F1-score is shown in Fig. 4. The ensemble of the best performing models, i.e., LMNet, DenseNet169, and MobileNetV2, achieved precision values for C1, C2, and C3 of 1.00, 0.96, and 0.99, respectively, with recall values of 1.00, 0.96, and 0.98. The accuracy of the ensemble model was 98.00%, outperforming the experimented pre-trained models. Algorithm 3 presents the majority-voting approach used in our proposed ensemble model; a sketch of this voting rule is given after Table 5. The accuracy of the ensemble framework confirms that out of every 100 test samples, only 2 were misclassified. The confusion matrices of the best performing pre-trained MobileNetV2, the proposed LMNet, and the ensemble model are shown in Fig. 5, and the corresponding AUC-ROC plots in Fig. 6.
Table 5.
Experimental outcomes of the Proposed LMNet, and Ensemble model.
| Model | Class | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|---|
| Proposed LMNet model | COVID-19 | 1.00 | 0.97 | 0.97 | 96.03% |
| | Normal | 0.92 | 0.95 | 0.93 | |
| | Pneumonia | 0.98 | 0.97 | 0.97 | |
| | Macro average | 0.97 | 0.96 | 0.96 | |
| | Weighted average | 0.97 | 0.96 | 0.96 | |
| Proposed ensemble model | COVID-19 | 1.00 | 1.00 | 1.00 | 98.00% |
| | Normal | 0.96 | 0.96 | 0.96 | |
| | Pneumonia | 0.99 | 0.98 | 0.98 | |
| | Macro average | 0.98 | 0.98 | 0.98 | |
| | Weighted average | 0.98 | 0.98 | 0.98 |
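A sketch of the majority-voting rule of Algorithm 3 follows. The exact tie-breaking behaviour is not spelled out in the paper, so falling back to the mean probability on a three-way split is an assumption:

```python
import numpy as np

def ensemble_predict(models, images):
    # Each of the three models votes for its highest-probability class;
    # the class with the majority of votes is the ensemble's prediction.
    probs = np.stack([m.predict(images, verbose=0) for m in models])  # (M, N, C)
    votes = probs.argmax(axis=-1)                                     # (M, N)
    preds = []
    for i in range(votes.shape[1]):
        counts = np.bincount(votes[:, i], minlength=probs.shape[-1])
        if counts.max() > 1:                 # at least two models agree
            preds.append(int(counts.argmax()))
        else:                                # three-way split (assumed rule)
            preds.append(int(probs[:, i].mean(axis=0).argmax()))
    return np.array(preds)

# Usage: preds = ensemble_predict([lmnet, densenet169, mobilenetv2], x_test)
```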
Fig. 4.
Performance comparison of the proposed LMNet and ensemble models with the best performing pretrained MobileNetV2 model in terms of precision, recall and F1-score.
Fig. 5.
Confusion matrices of (a) the best performing pretrained MobileNetV2, (b) the proposed LMNet and (c) the ensemble model.
Fig. 6.
AUC-ROC curves for (A) MobileNetV2, (B) the proposed LMNet, and (C) the ensemble model.
4.3. Comparison with state-of-the-art
This section compares the performance of the proposed models with state-of-the-art COVID-19 detection models. Table 6 lists similar research on COVID-19 prediction using comparable datasets, the models proposed, and the reported accuracy values. The proposed LMNet model has comparatively few parameters among all listed research while achieving similar accuracy. Moreover, the proposed ensemble model, which includes the LMNet model's predictions, outperforms the listed methods on precision, recall, F1-score, and accuracy.
Table 6.
Comparison with state-of-the-art models.
| Existing methods | Dataset used (COVID-19 positive) | Classes | Method used | Accuracy |
|---|---|---|---|---|
| Apostolopoulos and Mpesiana [7] | X-rays (1428) | Dataset 1: COVID-19, Pneumonia, Normal | Transfer learning | Dataset 1: 93.48% |
| | X-rays (1442) | Dataset 2: COVID-19, Pneumonia, Healthy | | Dataset 2: 94.72% |
| Jaiswal et al. [10] | CT scans (2492) | COVID-19, Normal | DenseNet201 | 96.00% |
| Khan et al. [3] | X-rays (192) | Normal, Pneumonia, COVID-19 | DCNN | 95.00% |
| Jain et al. [4] | X-rays (6432) | Normal, COVID-19, Pneumonia | Transfer learning | 97.00% |
| Maghdid et al. [5] | CTs and X-rays (431) | Normal, COVID-19 | Transfer learning | 98.00% |
| Wang et al. [14] | X-rays (13,800) | Normal, COVID-19, Pneumonia | COVID-Net | 92.60% |
| Farooq and Hafeez [15] | X-rays (5941) | COVID-19, Normal, Bacterial Pneumonia, Viral Pneumonia | COVID-ResNet | 96.23% |
| Ozturk et al. [13] | X-rays (625) | Binary: COVID-19, Normal | DarkCovidNet | Binary: 98.08% |
| | X-rays (1125) | Multi-class: COVID-19, Normal, Pneumonia | | Multi-class: 87.02% |
| Proposed (LMNet) | X-rays (6426) | COVID-19, Normal, Pneumonia | LMNet | 96.03% |
| Proposed (Ensemble) | X-rays (6426) | COVID-19, Normal, Pneumonia | LMNet + DenseNet169 + MobileNetV2 | 98.00% |
As shown in Table 6, the proposed LMNet model, despite being one of the most computationally economical models, outperformed several previous studies in the same domain. Khan et al. [3] performed classification on CXR images using a deep convolutional neural network based on the Xception architecture and achieved an overall accuracy of 95%, which is 1.03% less than the accuracy achieved by our proposed LMNet model. Wang et al. [14] worked with 13,800 images and proposed a neural network architecture with a lightweight projection–expansion–projection–extension design for COVID-19 detection, which greatly reduced computational complexity; their COVID-Net model had 11.75 million parameters and achieved an accuracy of 92.60%. Ozturk et al. [13] experimented with both binary and multi-class classification, proposing a DarkCovidNet architecture inspired by DarkNet but with fewer layers and filters than the original DarkNet architectures; their model achieved 98.08% accuracy on binary classification and 87.02% on multi-class classification. Apostolopoulos and Mpesiana [7] used transfer learning techniques for automatic COVID-19 detection and achieved an accuracy of 96.78%. Similarly, Jaiswal et al. [10] used the pre-trained DenseNet201 architecture for classifying CT-scan images and achieved an accuracy of 96%. Our proposed LMNet architecture outperformed these existing models [7], [10], [13], [14].

Jain et al. [4] achieved a testing accuracy of 97% with the Xception model. Maghdid et al. [5] proposed a simple convolutional neural network and a modified pre-trained AlexNet model; the pre-trained network achieved accuracy up to 98%, and the simple modified CNN achieved 94.1%. Farooq and Hafeez [15] presented a fine-tuned pre-trained ResNet-50 architecture called COVID-ResNet, using a three-step technique that progressively resizes the input images to 128 × 128 × 3, 224 × 224 × 3, and 229 × 229 × 3 pixels and fine-tunes the network at each stage; their model achieved an accuracy of 96.23%. These existing models [2], [4], [5], [15] performed better than the proposed LMNet model, but they are computationally expensive and have large numbers of parameters. As a general trend, highly complex models achieve high accuracy; the major drawback of the pre-trained models is that they require significant training time and high computational resources, which makes them expensive in real-world applications. A simpler model also helps prevent overfitting on small datasets.

In this paper, we have minimized the size of the neural network and the number of parameters. The simplicity and low complexity of the model is its edge over pre-existing models: the proposed model maintains high accuracy despite its low complexity and is thus suitable for devices with little computation power, such as an X-ray machine. A graph comparing the total number of parameters against accuracy is shown in Fig. 7, demonstrating that our model, while computationally inexpensive, delivers high accuracy. The proposed ensemble model consists of three models (LMNet, DenseNet169, and MobileNetV2) with fewer parameters and a comparatively smaller size, and it outperformed the models mentioned above with an accuracy of 98%. The precision, recall, and F1-score for the COVID-19 class are all 1.00 for the ensemble model.
Fig. 7.
Parameters vs. accuracy comparison.
5. Conclusion
This paper proposed a new lightweight multi-scale CNN model for COVID-19 detection using CXR images. The proposed LMNet model has fewer than 1 million parameters and performs better than many state-of-the-art techniques. The small number of parameters facilitates integration with Internet-enabled devices in the IoMT environment. The classification accuracy of the proposed model is 96.03%, and the precision and recall for COVID-19 are 100% and 97%, respectively. On further investigation, we found that LMNet, when ensembled with DenseNet169 and MobileNetV2, performs better than the other state-of-the-art methods for COVID-19 detection. The ensemble model also requires little memory space and can easily be integrated at the backend of smart devices; hence, it is useful for the Internet of Medical Things (IoMT) environment. The ensemble model achieved a classification accuracy of 98.00% for three-class classification, with both precision and recall for COVID-19 at 100%. The present research is limited to COVID-19 detection from CXR images. In the future, researchers can also analyse multi-modal datasets, combining coughing-sound and CXR image data, to detect COVID-19 patients. The present work can also be extended by treating CXR images as a multi-label classification problem, because a single CXR can simultaneously show many different medical conditions; a multi-label classifier for predicting diseases from CXR images would therefore be more useful for diagnosis.
Declaration of Competing Interest
No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.compeleceng.2022.108325.
Biographies
Vishwajeet Dwivedy is currently pursuing his B.Tech degree in Computer Science and Engineering from the Indian Institute of Information Technology Surat, India. His research interests include deep learning, machine learning, data analytics and natural language processing.
Harsh Deep Shukla is currently a graduate student in the Department of Computer Science and Engineering, Indian Institute of Information Technology Surat, India. His major research interests include machine learning and data engineering.
Pradeep Kumar Roy received his Ph.D. degree in Computer Science and Engineering from the National Institute of Technology Patna in 2018. He received a Certificate of Excellence for securing a top rank in the M. Tech course. He is currently an Assistant Professor at IIIT Surat, India. His area of specialization straddles across machine learning, deep learning, and computer vision.
Footnotes
This paper is for CAEE special section VSI-covid. Reviews processed and recommended for publication to the Editor-in-Chief by Guest Editor Dr. Sunil Kumar Singh.
1. https://3c5.com/MXWmr, accessed on 25-12-2021.
2. https://3c5.com/yoBST, accessed on 25-12-2021.
4. https://keras.io/api/applications/, accessed on 30-12-2021.
5. https://www.nibib.nih.gov/science-education/science-topics/x-rays, accessed on 30-12-2021.
Data availability
Data will be made available on request.
References
1. Chahar S., Roy P.K. COVID-19: A comprehensive review of learning models. Arch Comput Methods Eng. 2021;29:1915–1940. doi: 10.1007/s11831-021-09641-3.
2. Narin A., Kaya C., Pamuk Z. Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. Pattern Anal Appl. 2021:1–14. doi: 10.1007/s10044-021-00984-y.
3. Khan A.I., Shah J.L., Bhat M.M. CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest X-ray images. Comput Methods Programs Biomed. 2020;196. doi: 10.1016/j.cmpb.2020.105581.
4. Jain R., Gupta M., Taneja S., Hemanth D.J. Deep learning based detection and analysis of COVID-19 on chest X-ray images. Appl Intell. 2021;51(3):1690–1700. doi: 10.1007/s10489-020-01902-1.
5. Maghdid H.S., Asaad A.T., Ghafoor K.Z., Sadiq A.S., Mirjalili S., Khan M.K. Diagnosing COVID-19 pneumonia from X-ray and CT images using deep learning and transfer learning algorithms. In: Multimodal Image Exploitation and Learning 2021, Vol. 11734. International Society for Optics and Photonics; 2021. p. 117340E.
6. Singh A., Satapathy S.C., Roy A., Gutub A. AI-based mobile edge computing for IoT: Applications, challenges, and future scope. Arab J Sci Eng. 2022:1–31.
7. Apostolopoulos I.D., Mpesiana T.A. Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Phys Eng Sci Med. 2020;43(2):635–640. doi: 10.1007/s13246-020-00865-4.
8. Zhang J., Xie Y., Pang G., Liao Z., Verjans J., Li W., et al. Viral pneumonia screening on chest X-ray images using confidence-aware anomaly detection. 2020. arXiv preprint arXiv:2003.12338.
9. Singh D., Kumar V., Kaur M., et al. Classification of COVID-19 patients from chest CT images using multi-objective differential evolution–based convolutional neural networks. Eur J Clin Microbiol Infect Dis. 2020;39(7):1379–1389. doi: 10.1007/s10096-020-03901-z.
10. Jaiswal A., Gianchandani N., Singh D., Kumar V., Kaur M. Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning. J Biomol Struct Dyn. 2020:1–8. doi: 10.1080/07391102.2020.1788642.
11. Adhikari N.C.D. Infection severity detection of CoVid19 from X-rays and CT scans using artificial intelligence. Int J Comput (IJC). 2020;38(1):73–92.
12. Alqudah A.M., Qazan S., Alqudah A. Automated systems for detection of COVID-19 using chest X-ray images and lightweight convolutional neural networks. 2020.
13. Ozturk T., Talo M., Yildirim E.A., Baloglu U.B., Yildirim O., Acharya U.R. Automated detection of COVID-19 cases using deep neural networks with X-ray images. Comput Biol Med. 2020;121. doi: 10.1016/j.compbiomed.2020.103792.
14. Wang L., Lin Z.Q., Wong A. COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci Rep. 2020;10(1):1–12. doi: 10.1038/s41598-020-76550-z.
15. Farooq M., Hafeez A. COVID-ResNet: A deep learning framework for screening of COVID19 from radiographs. 2020. arXiv preprint arXiv:2003.14395.
16. Wang X., Peng Y., Lu L., Lu Z., Bagheri M., Summers R.M. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. p. 2097–2106.
17. Rajpurkar P., Irvin J., Ball R.L., Zhu K., Yang B., Mehta H., et al. Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med. 2018;15(11). doi: 10.1371/journal.pmed.1002686.
18. Ho T.K.K., Gwak J. Multiple feature integration for classification of thoracic disease in chest radiography. Appl Sci. 2019;9(19):4130.
19. Lakhani P., Sundaram B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology. 2017;284(2):574–582. doi: 10.1148/radiol.2017162326.
20. Kong W., Agarwal P.P. Chest imaging appearance of COVID-19 infection. Radiol Cardiothorac Imaging. 2020;2(1). doi: 10.1148/ryct.2020200028.
21. Verma M., Vipparthi S.K., Singh G., Murala S. LEARNet: Dynamic imaging network for micro expression recognition. IEEE Trans Image Process. 2019;29:1618–1627. doi: 10.1109/TIP.2019.2912358.
22. Verma M., Vipparthi S.K., Singh G. HiNet: Hybrid inherited feature learning network for facial expression recognition. IEEE Lett Comput Soc. 2019;2(4):36–39.
23. Roy P.K., Tripathy A.K., Das T.K., Gao X.-Z. A framework for hate speech detection using deep convolutional neural network. IEEE Access. 2020;8:204951–204962.
24. Roy P.K., Saumya S., Singh J.P., Banerjee S., Gutub A. Analysis of community question-answering issues via machine learning and deep learning: State-of-the-art review. CAAI Trans Intell Technol. 2022:1–23.