BMC Medical Imaging. 2025 Mar 4;25:71. doi: 10.1186/s12880-025-01604-5

Automated classification of chest X-rays: a deep learning approach with attention mechanisms

Burcu Oltu 1, Selda Güney 2, Seniha Esen Yuksel 3, Berna Dengiz 4
PMCID: PMC11877751  PMID: 40038588

Abstract

Background

Pulmonary diseases such as COVID-19 and pneumonia are life-threatening conditions that require prompt and accurate diagnosis for effective treatment. Chest X-ray (CXR) imaging has become the most common alternative method for detecting pulmonary diseases such as COVID-19, pneumonia, and lung opacity due to its availability, cost-effectiveness, and ability to facilitate comparative analysis. However, the interpretation of CXRs is a challenging task.

Methods

This study presents an automated deep learning (DL) model that outperforms multiple state-of-the-art methods in diagnosing COVID-19, Lung Opacity, and Viral Pneumonia. Using a dataset of 21,165 CXRs, the proposed framework introduces a seamless combination of the Vision Transformer (ViT) for capturing long-range dependencies, DenseNet201 for powerful feature extraction, and global average pooling (GAP) for retaining critical spatial details. This combination results in a robust classification system, achieving remarkable accuracy.

Results

The proposed methodology delivers outstanding results across all categories: achieving 99.4% accuracy and an F1-score of 98.43% for COVID-19, 96.45% accuracy and an F1-score of 93.64% for Lung Opacity, 99.63% accuracy and an F1-score of 97.05% for Viral Pneumonia, and 95.97% accuracy with an F1-score of 95.87% for Normal subjects.

Conclusion

The proposed framework achieves a remarkable overall accuracy of 97.87%, surpassing several state-of-the-art methods with reproducible and objective outcomes. To ensure robustness and minimize variability in train-test splits, our study employs five-fold cross-validation, providing reliable and consistent performance evaluation. For transparency and to facilitate future comparisons, the specific training and testing splits have been made publicly accessible. Furthermore, Grad-CAM-based visualizations are integrated to enhance the interpretability of the model, offering valuable insights into its decision-making process. This innovative framework not only boosts classification accuracy but also sets a new benchmark in CXR-based disease diagnosis.

Keywords: Deep learning, Convolutional neural network (CNN), Vision transformer (ViT), Chest X-Ray (CXR)

Background

COVID-19 is an infectious disease caused by the “Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2)” virus [1, 2]. The World Health Organization (WHO) proclaimed COVID-19 a global pandemic in March 2020, after it first appeared in Wuhan, China in December 2019 [3, 4]. Up to December 2023, there had been 772,386,069 confirmed cases, including 6,987,222 deaths globally [5].

The common symptoms of this deadly virus are fever, dry cough, shortness of breath, sore throat, fatigue, and headache [2–4, 6]. Additionally, COVID-19-related heart failure, septic shock, pneumonia, respiratory distress, and pulmonary edema are identified as the primary causes of death [1, 7]. Nevertheless, given the absence of an approved treatment for COVID-19, social distancing and accurate, rapid diagnosis remain essential measures to deal with this deadly virus [3, 6].

According to the WHO, the gold standard for COVID-19 diagnosis and screening is RT-PCR [2, 3, 6]. While RT-PCR is the most widely used diagnostic tool for COVID-19, it has several drawbacks, including longer detection times (up to 2 days), low sensitivity (around 60–70%), and high false-negative rates [2, 8]. Since accurate and rapid detection is necessary for stopping the spread of the virus and ensuring effective treatment, CXR and CT imaging techniques have become the alternative methods, as suggested by the WHO [3, 9]. Both CT and CXR images contain visual markers correlated with COVID-19 infection. However, CT imaging has certain drawbacks, as stated below, as well as higher maintenance costs compared to CXR [4, 6, 9]:

  • CT scanners are not portable and therefore there is a risk of transmitting the virus in rooms with fixed imaging systems.

  • CT delivers higher radiation doses.

  • CT scanners have higher costs and require a high level of expertise.

  • High-quality CT scanners are not accessible in rural areas.

CXR, on the other hand, is a commonly used, accessible imaging technique that has lower cost, delivers lower radiation doses, provides faster screening, and is widely available [10–14]. It also plays a crucial role in the early diagnosis and screening of chest diseases, including COVID-19.

Despite the advantages of CXR imaging, interpreting chest radiographs is challenging because anatomical and tissue structures overlap along the projection direction, and accurate diagnosis requires a high degree of skill, experience, and concentration. Moreover, the extensive use and high volume of CXR images have increased radiologists’ workload, which may lead them to misinterpret the images [15–19]. For these reasons, it is estimated that radiologists make an average of 3–5% “real-time” errors in their daily practice [20].

In recent years, to develop a low-cost, automated approach, especially for COVID-19 diagnosis and evaluation, that can aid radiologists in making rapid and precise diagnoses, researchers and the scientific community have focused on CXR images [3, 18]. In this context, Machine Learning (ML) and, mostly, Deep Learning (DL) algorithms have dominated the field of detecting COVID-19 from CXRs. However, the developed algorithms are mostly trained and tested on relatively small datasets, and they aimed to classify two (COVID-19 vs. normal) or three (COVID-19 vs. pneumonia vs. normal) classes, with fewer focused on the classification of four classes [3, 9, 12, 21–26]. Nevertheless, a large database, namely the Covid-19 Radiography Database, consisting of 3616 COVID-19, 10,192 Normal, 6012 Lung Opacity (non-COVID lung infection), and 1345 Viral Pneumonia images, was released in 2021, allowing researchers to evaluate their models on an extensive dataset [21, 27]. The studies carried out with this dataset are summarized below.

In 2021, Bashar et al. [21] introduced a classification model for distinguishing between normal, COVID-19, viral pneumonia, and lung opacity cases using AlexNet, GoogleNet, VGG16, VGG19, and DenseNet. Their approach achieved a maximum accuracy of 95.63% on normalized, augmented data.

Brima et al. [9] developed an end-to-end deep transfer learning (TL) framework to classify three types of pneumonia (COVID-19, viral pneumonia, and lung opacity) and normal CXRs. They tested VGG19, Densenet121, and ResNet50 with the SGDM optimizer, and obtained the best test accuracy of 93.99% using the VGG19 model.

In 2022, Ukwuoma et al. [28] proposed a methodology named LSCB-Inception (light-chroma separated branches), based on InceptionV3, for four-class classification. By replacing global average pooling (GAP) with global second-order pooling, they achieved 98.2% accuracy with a computationally efficient model.

Khan et al. [29] proposed a multi-class classification method using EfficientNetB1, MobileNetV2, and NasNetMobile with a new classification head. By balancing the dataset through augmentation techniques, their EfficientNetB1 achieved a maximum accuracy of 96.13%.

In the same year, Hassanlou et al. [6] introduced FirecovNet, a lightweight DL network inspired by DarkNet and SqueezeNet, for five different classification tasks. They attained an accuracy of 95.92% for a four-class classification task.

Pan et al. [30] developed a multi-channel feature deep neural network (MFDNN) algorithm for four-class classification. By utilizing multi-channel feature fusion, they achieved an average accuracy of 93.19%.

Roy et al. [31] introduced SVD-CLAHE Boosting, a data augmentation algorithm, and a novel loss function (Balanced Weighted Categorical Cross Entropy (BWCCE)) to classify a highly class-imbalanced CXR dataset. Using ResNet50 and VGG19, they improved classification performance for imbalanced datasets.

Ukwuoma et al. [32] proposed the Dual_Pachi approach, which combined CIE LAB conversion, global second-order pooling, and multi-head self-attention. They trained and tested the proposed approach on a sub-dataset they created, enlarging the pneumonia samples to 3,000 using various data augmentation techniques (i.e., rotation, horizontal flip, zoom). They then used a balanced dataset for training (3,000 per class), validation (300 per class), and testing (300 per class), achieving an accuracy of 0.97.

In a follow-up study, Ukwuoma et al. [33] developed an ensemble framework combining DenseNet201, VGG16, and GoogleNet models with global second-order pooling. In the proposed model, the fused features were further processed through a multi-head self-attention (MSA) layer and a multi-layer perceptron (MLP) for classification. Using the same sub-dataset as in [32], their approach improved the performance metrics by approximately 3% compared to their previous work [32].

Islam et al. [34] built an algorithm to classify CXRs into four classes using Xception, VGG19, and ResNet50 with slight modifications along the bottom layers. Consequently, they achieved a maximum accuracy of 93% using the Xception model and increased the interpretability of their algorithms with GradCam analysis, highlighting critical areas for classification.

In 2023, Azad et al. [12] presented an algorithm for COVID-19 detection from CXRs using local binary patterns and pre-trained CNN models. They classified extracted features with support vector machine (SVM), decision tree, random forest, and k-nearest neighbors classifier. An ensemble-CNN based SVM method using DenseNet201, EfficientNet-b0, and DarkNet53 achieved the best performance, within a four-class classification.

Ukwuoma et al. [10] proposed a DL framework based on the concatenation of features obtained from VGG16, InceptionV3, DenseNet, and a multi-head self-attention network. Their model achieved the best performance using the Adam optimizer, categorical cross-entropy, and a learning rate of 10⁻⁴.

Alablani and Alenazi [27] introduced a COVID-ConvNet, which consists of convolutional layers, maximum pooling layers, flattening layers, and dense layers, achieving an overall accuracy of 95.46%.

In 2023, Almalki et al. [35] evaluated the performance of the Swin transformer for the classification of CXRs. They compared the performance of seven DL models (ResNet50, DenseNet121, InceptionV3, EfficientNet-b2, VGG19, ViT, CaiT), concluding that the Swin transformer provided the best performance.

Table 1 summarizes the performance results obtained in the studies carried out with the dataset in question.

Table 1.

A summary of the literature studies conducted on the Covid-19 radiography database

Study Accuracy (%) Precision (%) Recall (Sensitivity) (%) F1-Score (%) AUC
2021, Bashar et al. [21] 95.63 99.18 98.78 - -
2021, Brima et al. [9] 93.99 - - - -
2022, Ukwuoma et al. [28] 98.19 99 98.798 - -
2022, Khan et al. [29] 96.13 97.25 96.50 97.50 -
2022, Hassanlou et al. [6] 95.92 95.9 95.94 95.9 1
2022, Pan et al. [30] 93.19 92.795 94.2775 93.3925 -
2022, Roy et al. [31] 94 96 95 95 -
2022, Ukwuoma et al. [32] 96.65 93.91 93.31 93.38 0.95547
2022, Ukwuoma et al. [33] 98 96.21 96.02 96.03 0.97341
2023, Islam et al. [34] 93 92.75 93 93 0.99
2023, Azad et al. [12] 97.41 94.91 94.81 94.86 -
2023, Ukwuoma et al. [10] 98 96.2 96.01 96.03 0.9734
2023, Alablani and Alenazi [27] 95.46 91 93 92.25 -
2023, Almalki et al. [35] 96.60 96.675 97.075 96.8 -

Apart from the studies that employed the Covid-19 Radiography Database, there are studies focusing on attention mechanisms and vision transformers (ViTs), which process pixels with attention mechanisms instead of convolution layers [36].

Shome et al. [37] proposed a ViT-based network for classifying COVID-19, pneumonia, and healthy CXRs. The study utilized the ViT L-16 model, replacing the original MSA block with a Gaussian error linear unit (GELU)-based MSA block. They concluded that the ViT-based model outperforms several existing architectures.

In 2022 Chetoui and Akhloufi [38] evaluated different ViT models for COVID-19 detection. The ViT-B32 model achieved superior performance compared to EfficientNet, DenseNet-121, NasNet, and MobileNet.

Yang et al. [39] introduced Covid-Vision-Transformers (CovidViT), a transformer-based model using self-attention mechanisms for COVID-19 diagnosis. They obtained 98.2% classification accuracy.

In 2023, Nafisah et al. [40] compared CNN and ViT models for COVID-19 detection. They showed that the best performance was obtained with the EfficientNetB7 CNN on a balanced dataset.

Chen et al. [41] proposed BoT-ViTNet based on ResNet50 by incorporating MSA and TRT-ViT blocks with transformers and bottlenecks into its final layers. This model demonstrated the benefits of integrating these techniques for multi-class classification (COVID-19, healthy, pneumonia).

Marefat et al. [7] introduced CCTCOVID, a Compact Convolutional Transformers architecture that combines CNN and ViT. Their model achieved 99.2% accuracy for COVID-19 detection, surpassing previous studies.

Wang et al. [42] proposed PneuNet, a hybrid ResNet18-ViT model for detecting COVID-19 from CXRs. In this model, ResNet18 serves as the backbone, extracting spatial features, while the ViT processes these features as a single patch using maximum pooling. The final classification is performed by an MLP, achieving an accuracy of 90.03% in multi-class classification.

Previous research has demonstrated the potential of DL in COVID-19 detection. However, many of these studies have focused on binary or three-class classification tasks (COVID-19 vs. healthy, or COVID-19 vs. pneumonia vs. normal), on datasets with a limited number of CXRs, or on artificially augmented datasets. Since DL models require large amounts of data, the generalization ability and performance of models trained and tested on small datasets can be unreliable. This study proposes a DL-based classification model using a large publicly available dataset containing CXRs with COVID-19 as well as three other distinct classes (viral pneumonia, lung opacity, and normal). Although there are studies in the literature using the same dataset, none directly compared their results against other studies that used the original data. Moreover, the performance obtained in these classification studies is not yet adequate and needs to be improved. Additionally, some studies applied data augmentation techniques to create new subsets from the existing one and conducted their analyses on these artificially augmented image subsets. However, this approach may introduce biases or artifacts not present in real-world data, impacting the model’s generalization to unseen clinical cases. Furthermore, a lack of transparency in data-splitting methodology is observed in some existing works utilizing the same dataset: certain studies did not specify the data split at all, while others neglected to specify which images were in the training, validation, or testing sets. This lack of clarity hinders the reproducibility and reliability of their results.

In our study, we propose an end-to-end attention-based DL network, which requires neither preprocessing nor hand-crafted features, and can distinguish COVID-19 CXRs from healthy CXRs and from other lung infections such as lung opacity and viral pneumonia. In the proposed network, a multi-head attention-based network based on ViTs is combined with DenseNet201 to capture spatial features and boost classification performance. Additionally, a global average pooling layer is employed to enhance the features extracted by DenseNet201. The result is a comprehensive framework that leverages DenseNet201’s robust feature extraction, ViT’s ability to attend to globally extracted spatial features, and GAP’s effectiveness in reducing dimensionality and improving feature representation. Moreover, since distinguishing COVID-19 from other lung diseases is essential for determining the correct treatment, the proposed model is trained and tested on the largest dataset of CXRs that includes COVID-19, two other disease classes, and normal cases. Additionally, this study employs a rigorous five-fold cross-validation approach. By evaluating the model with five-fold cross-validation rather than relying on a single train-test split, we ensure a more comprehensive and robust assessment of model performance, enhancing both reliability and stability. To ensure transparency and demonstrate the robustness of our model, we have also made the specific folds used in our five-fold cross-validation publicly available. As a result, our work not only improves the performance of COVID-19 detection but also offers a reliable and reproducible approach suitable for real-world implementation, hence a significant contribution to the field.

These CXRs are obtained from an open-source dataset on Kaggle, which is the largest dataset including the aforementioned classes [21, 32].

The key contributions of this study are summarized as follows:

  • We developed a comprehensive framework that integrates a pre-trained CNN model and attention mechanisms to detect and classify COVID-19 with outstanding performance. The proposed methodology achieves superior performance by combining pre-trained DenseNet201’s ability to extract strong features and ViT’s ability to capture long-distance dependencies and boost classification accuracy.

  • We compared the performance of the developed model in terms of accuracy, precision, recall, F1-score, the area under the curve (AUC), and the confusion matrix with various experiments.

  • We compiled a comprehensive summary of the studies conducted using the aforementioned “Covid-19 Radiography Database”, which, to the best of our knowledge, does not exist in the open literature. Additionally, we compare our test results with those of the studies that use the same dataset, a comparison not previously undertaken in any other study.

The rest of the article is organized as follows. Following the Background section, Sect. 2 presents the dataset’s details and the methodological approach of the study. Sect. 3 presents the obtained results. Sect. 4 discusses the results and compares the performance of the proposed study with that of literature studies. Finally, conclusions are drawn in Sect. 5.

Material and methodology

Dataset

In this study, the open-source COVID-19 Radiography Database is used [43, 44]. The dataset was created by a team of researchers from Qatar University, Doha, Qatar, and the University of Dhaka, Bangladesh, along with their collaborators from Pakistan and Malaysia. The dataset consists of four classes: COVID-19, viral pneumonia, lung opacity (i.e., non-COVID lung infection), and normal. All images are in “.png” format with a resolution of 299 × 299 pixels. Examples of images from each class are shown in Fig. 1. Table 2 presents the number of images in each class.

Fig. 1.

Fig. 1

Example CXRs

Table 2.

Dataset

Set COVID-19 Viral Pneumonia Lung Opacity Normal
Train 2604 968 4329 7339
Validation 289 108 481 815
Test 723 269 1202 2038
Total 3616 1345 6012 10192

In this study, five-fold cross-validation is utilized in all evaluations to generalize model performance [12], using 80% of the data as the training set and 20% as the test set (unseen data). Within each fold, 10% of the training split is used as a validation set to monitor classification performance and counter overfitting. The subsets are split using random shuffling.
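The splitting protocol above can be sketched as follows. The function and variable names are illustrative and not taken from the paper's released folds; for a dataset size not divisible by five, this simple sketch drops the remainder.

```python
import random

def five_fold_splits(n_samples, seed=42):
    """Split sample indices into five (train, val, test) partitions.

    Mirrors the protocol described above: each fold holds out 20% as an
    unseen test set, and 10% of the remaining training indices are set
    aside for validation. Indices are shuffled once up front.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_size = n_samples // 5
    splits = []
    for k in range(5):
        test = idx[k * fold_size:(k + 1) * fold_size]
        train_pool = idx[:k * fold_size] + idx[(k + 1) * fold_size:]
        n_val = len(train_pool) // 10          # 10% of the training split
        val, train = train_pool[:n_val], train_pool[n_val:]
        splits.append((train, val, test))
    return splits

splits = five_fold_splits(21165)   # total CXRs in the dataset used here
train, val, test = splits[0]
print(len(train), len(val), len(test))
```

Each fold thus sees roughly 72% of the images for training, 8% for validation, and 20% for testing, with train and test disjoint by construction.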

Methodology

The proposed algorithm, represented as a schematic diagram in Fig. 2, is based on DenseNet201 and ViT. DenseNet201 is the backbone of the algorithm, responsible for extracting spatial features from the CXRs. The output of DenseNet201 is fed into the ViT as a whole patch rather than being split into patches, as suggested in [42]. Afterward, the patches are encoded and embedded, passing into the transformer encoder. Following the transformer encoder, the output passes through the MLP. Additionally, GAP is applied to the output of DenseNet201. The outputs of GAP and the MLP are then concatenated and passed through a dense layer for classification. Each of these layers is described in the following sections.

Fig. 2.

Fig. 2

Schematic diagram of proposed algorithm

Resizing and normalization

The resolution of the images in the dataset is 299 × 299. However, to train DenseNet201, the images are first resized to 224 × 224. Second, to obtain a stable model, the images are normalized by dividing pixel values by 255 (min-max normalization) [3]. After resizing and normalization, the images are fed to DenseNet201.
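The resize-and-normalize step can be sketched as below; a nearest-neighbour resize stands in for whatever library resize the authors actually used, so this is illustrative only.

```python
import numpy as np

def preprocess(image):
    """Resize a 299x299 grayscale CXR to 224x224 (nearest-neighbour,
    for illustration) and min-max normalize pixel values to [0, 1]
    by dividing by 255, as described above."""
    h, w = image.shape
    rows = np.arange(224) * h // 224       # source row for each target row
    cols = np.arange(224) * w // 224       # source column for each target column
    resized = image[np.ix_(rows, cols)].astype(np.float32)
    return resized / 255.0

x = preprocess(np.random.randint(0, 256, (299, 299), dtype=np.uint8))
print(x.shape)  # (224, 224), values in [0, 1]
```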

DenseNet201

Convolutional neural networks are DL networks capable of extracting discriminative features from images and are therefore widely used in disease classification. CNNs are feed-forward networks consisting of multiple layers: convolutional layers, which extract local features; pooling layers, which downsample the extracted features to decrease computational complexity and prevent overfitting; and fully connected layers, which are in charge of the final classification [13, 45]. Moreover, due to their ability to learn the hidden features of images, CNNs have attracted great attention in the field of healthcare [21, 46, 47].

However, although CNNs deliver superior performance, huge datasets are required to train them and obtain strong results [3, 48]. To address this limitation, TL is often employed to enhance the generalization ability of CNNs even on limited datasets [45]. In this context, state-of-the-art pre-trained networks such as DenseNet201, InceptionV3, and MobileNet, trained on ImageNet, are frequently used for feature extraction and image classification, especially in the field of healthcare [9, 27]. Therefore, in this study, DenseNet201, pre-trained on ImageNet, is used, since this model has been shown to perform well on CXRs in studies such as [28, 49, 50].

DenseNet201, which comprises 201 convolution layers, was proposed by Huang et al. [51] to maximize the information flow between layers. It consists of multiple dense blocks in which each layer concatenates the inputs from the preceding layers with its own feature maps and sends the concatenation to the subsequent layers [49]. This reduces overfitting, especially on smaller training datasets [51].

In the proposed approach the feature maps obtained from the output of the last convolution layer of pre-trained DenseNet201 are fed into the transformer block. Additionally, GAP is applied to the obtained feature maps from DenseNet201 with the aim of preserving essential spatial information, leading to more effective and generalizable feature representations. The output of GAP layer is concatenated with the output of the transformer block for the classification.
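As a minimal illustration of the GAP step, each (H, W) activation map from the backbone is collapsed to a single scalar per channel. The 7 × 7 × 1920 shape below assumes DenseNet201's final feature map for 224 × 224 inputs.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Global average pooling: average each (H, W) activation map over
    its spatial dimensions, yielding one feature per channel."""
    return feature_maps.mean(axis=(0, 1))

fm = np.random.default_rng(0).random((7, 7, 1920))  # DenseNet201-like output
print(global_average_pool(fm).shape)  # (1920,)
```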

Transformer block

Transformers are neural networks that are based on self-attention mechanisms. Following their remarkable achievements in the field of natural language processing, transformers have become state-of-the-art in image classification and object detection [52]. Vision transformers, which are the most successful attempt to deploy transformers into image classification tasks, employ only the encoder block of transformers [52–54].

ViTs divide the input images into fixed-size patches and then linearly transform them into lower-dimensional embeddings. Afterwards, the patch embeddings are fed into the transformer encoder block to model the resemblance between these patches as sequences. The transformer encoder block consists of normalization layers, multi-head attention layers, residual addition layers, and MLP layers.

In this study, rather than generating patches from input images, each channel of the spatial features extracted at the output of DenseNet201 is treated as one patch, similar to the work of Wang et al. [42]. The patches are then transformed into embeddings and normalized. The normalized patch embeddings are fed into the multiple parallel self-attention heads of the multi-head attention layer, in which relationships and dependencies are captured. At the output of the multi-head attention layer, the attained output and the input embeddings are added element-wise to preserve the original information. Following this addition, the normalized output is passed through the MLP, where non-linear activation functions (i.e., GELU) are applied. As in the previous step, the input and the output of the MLP layer are added element-wise, yielding the output of the encoder. In this study, six transformer layers with 16 heads are used.
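A toy NumPy sketch of one multi-head self-attention layer over channel-wise "patches" is given below. The random weight matrices and the small sequence length (32 patches with 64-dimensional embeddings) are placeholders for illustration; the paper's model uses trained parameters, 16 heads, and six stacked transformer layers.

```python
import numpy as np

def multi_head_self_attention(x, num_heads=16, seed=0):
    """Illustrative multi-head self-attention over a sequence of
    "patches" (here, CNN feature-map channels treated as a sequence).
    Weight matrices are random placeholders, not trained parameters."""
    n, d = x.shape                          # (num_patches, embed_dim)
    assert d % num_heads == 0
    dh = d // num_heads                     # per-head dimension
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) * d ** -0.5 for _ in range(4))
    # Project to queries/keys/values and split into heads: (heads, n, dh)
    q = (x @ Wq).reshape(n, num_heads, dh).transpose(1, 0, 2)
    k = (x @ Wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (x @ Wv).reshape(n, num_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)      # (heads, n, n)
    attn = np.exp(scores - scores.max(-1, keepdims=True))
    attn /= attn.sum(-1, keepdims=True)                  # row-wise softmax
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)    # merge heads
    return out @ Wo + x                                  # residual connection

# Small stand-in sequence (the paper feeds DenseNet201's channels here)
feats = np.random.default_rng(1).standard_normal((32, 64))
y = multi_head_self_attention(feats)
print(y.shape)  # (32, 64): same shape in and out, as the residual requires
```

The element-wise residual addition at the end is what lets the layer preserve the original embedding information, as described above.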

Concatenation and classification

After the encoder, the output is fed into an MLP consisting of three dense layers, each utilizing ReLU activation. A 20% dropout is applied after each dense layer. The output from the MLP and the Densenet201, post-GAP, are concatenated. The concatenated features then pass through a final dense layer, ultimately producing the desired class labels.

Network implementation and training process

As mentioned earlier, all evaluations are conducted using a five-fold cross-validation technique, with train and test subsets split in an 80:20 ratio. Additionally, 10% of the training subset is used as a validation subset in each fold. The test data remains unseen and is only evaluated after training the model. Before training, all images are resized to 224 × 224 to fit the input requirements of DenseNet201. The data is then normalized to the range [0, 1] by dividing by 255. Image augmentation techniques, namely 20° rotation and 0.1 zoom, are applied exclusively to the training data to mitigate potential overfitting due to the dataset size [29]. Data augmentation is implemented using the ImageDataGenerator function provided by Keras, allowing real-time augmentation during model training instead of increasing the size of the training dataset. Only zooming and rotation are applied, as other augmentation methods could potentially disturb the semantic meaning of the CXR images [40]. Following these preprocessing steps, training is initiated with a learning rate (LR) of 0.01 using a Stochastic Gradient Descent optimizer. For fast convergence, an adaptive LR decay algorithm, ReduceLROnPlateau, is employed; it reduces the LR by a factor of 0.7 when the validation loss does not improve for three consecutive epochs. A categorical cross-entropy loss function is used. Model parameters are summarized in Table 3. All processes are executed on the Google Colab Pro platform, which provides an NVIDIA Tesla V100 GPU, using the TensorFlow framework. After training, the model is tested on the unseen test subset and performance evaluations are carried out. The model is trained for 30 epochs, and the version with the lowest validation loss is selected for testing.

Table 3.

Model parameters

Parameter Value
Learning Rate 0.01
Optimizer Stochastic Gradient Descent (SGD)
Scheduling Technique ReduceLROnPlateau
Factor 0.7
Patience 3 epochs
Loss function Categorical cross-entropy
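The adaptive LR schedule in Table 3 can be traced with a simplified re-implementation (illustrative, not the Keras source; the validation-loss values below are made up):

```python
def reduce_lr_on_plateau(val_losses, lr=0.01, factor=0.7, patience=3):
    """Trace a ReduceLROnPlateau-style schedule: multiply the learning
    rate by `factor` whenever the validation loss fails to improve for
    `patience` consecutive epochs. A simplified sketch of the Keras
    callback's behaviour, using the parameters from Table 3."""
    best, wait, trace = float("inf"), 0, []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0           # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                lr *= factor               # plateau detected: decay the LR
                wait = 0
        trace.append(round(lr, 6))
    return trace

# Hypothetical validation losses: a plateau appears after epoch 3
losses = [1.0, 0.8, 0.79, 0.81, 0.82, 0.80, 0.85, 0.78]
print(reduce_lr_on_plateau(losses))  # LR drops 0.01 -> 0.007 at epoch 6
```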

Performance evaluation

To assess the success of the proposed algorithm, metrics such as accuracy, precision, recall (sensitivity), specificity, F1-score, and AUC are evaluated. In this study, the class label provided by the dataset is considered the gold standard. Therefore, when computing sensitivity, specificity, F1-score, and accuracy, the data distribution in the confusion matrix (an example for COVID-19 is given in Fig. 3) is taken into account. The formulae for calculating accuracy, precision, recall, specificity, and F1-score are given in Eqs. 1–5:

Fig. 3.

Fig. 3

Confusion matrix for COVID-19 performance evaluation

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (1)

Precision = TP / (TP + FP)  (2)

Recall (Sensitivity) = TP / (TP + FN)  (3)

Specificity = TN / (TN + FP)  (4)

F1-score = 2 × (Precision × Recall) / (Precision + Recall)  (5)

Here, TP is the number of true positives, TN the number of true negatives, FN the number of false negatives, and FP the number of false positives [6, 55].
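A small helper shows how Eqs. 1–5 are computed from one-vs-rest confusion-matrix counts; the counts below are hypothetical and chosen only for illustration, not taken from the paper's folds.

```python
def class_metrics(tp, tn, fp, fn):
    """Per-class metrics of Eqs. 1-5 from one-vs-rest confusion-matrix
    counts, as in the COVID-19 example of Fig. 3. Returns percentages
    rounded to two decimals."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {name: round(value * 100, 2)
            for name, value in [("accuracy", accuracy), ("precision", precision),
                                ("recall", recall), ("specificity", specificity),
                                ("f1", f1)]}

# Hypothetical counts for illustration only
print(class_metrics(tp=703, tn=3486, fp=6, fn=20))
```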

The ROC curve is used to graphically observe the model’s ability to distinguish classes. The degree of separability of classes is defined by AUC [6].

Experimental results

Classification results

As explained in previous sections, the model’s performance is assessed using the unseen test set. Additionally, to provide a more reliable assessment of overall performance, a five-fold cross-validation approach is used instead of a single train-test split. The results of each fold are presented in Table 4. Confusion matrices for the four-class classification and the ROC curves for each fold are shown in Figs. 4 and 5, respectively.

Table 4.

Test results of each fold for five-folds, and their averages

Fold Class Accuracy (%) Precision (%) Recall (Sensitivity) (%) Specificity (%) F1-score (%) AUC
1 COVID-19 99.39 99.15 97.23 99.83 98.18 1.00
Lung Opacity 96.88 96.28 92.60 98.58 94.40 0.99
Normal 96.03 94.35 97.59 94.58 95.94 0.99
Viral Pneumonia 99.43 97.30 93.68 99.82 95.45 1.00
2 COVID-19 99.50 99.72 97.37 99.94 98.53 1.00
Lung Opacity 96.31 94.78 92.10 97.99 93.42 0.98
Normal 96.03 94.61 97.30 94.85 95.94 0.99
Viral Pneumonia 99.69 98.85 96.28 99.92 97.55 1.00
3 COVID-19 99.53 99.02 98.20 99.80 98.61 1.00
Lung Opacity 96.17 94.52 91.85 97.89 93.16 0.98
Normal 95.98 94.95 96.81 95.21 95.87 0.99
Viral Pneumonia 99.76 98.14 98.14 99.87 98.14 1.00
4 COVID-19 99.24 99.71 95.85 99.94 97.74 1.00
Lung Opacity 96.15 94.21 92.10 97.76 93.14 0.99
Normal 95.42 93.74 96.96 93.98 95.32 0.99
Viral Pneumonia 99.50 98.82 93.31 99.92 95.98 1.00
5 COVID-19 99.53 98.62 98.62 99.72 98.62 1.00
Lung Opacity 96.72 96.42 91.85 98.65 94.08 0.99
Normal 96.38 94.86 97.79 95.08 96.30 0.99
Viral Pneumonia 99.76 99.24 97.03 99.95 98.12 1.00
Average COVID-19 99.44 99.24 97.46 99.85 98.34 1.00
Lung Opacity 96.45 95.24 92.10 98.17 93.64 0.99
Normal 95.97 94.50 97.29 94.74 95.87 0.99
Viral Pneumonia 99.63 98.47 95.69 99.90 97.05 1.00

Fig. 4.

Fig. 4

Confusion matrices

Fig. 5.

Fig. 5

ROC Curves

Table 4 provides the results for each fold, with the average values summarized in the final rows. Across all folds, the model consistently demonstrates high performance, with overall accuracies ranging from 97.58 to 98.10%, confirming robust predictive ability. On average, the model identified COVID-19 cases with 99.44% accuracy and 99.24% precision, while maintaining high accuracy for Lung Opacity (96.45%), Viral Pneumonia (99.63%), and Normal cases (95.97%). Strong recall rates were observed across all classes, particularly for COVID-19 (97.46%) and Viral Pneumonia (95.69%). The high specificity rates indicate a low number of false positives, meaning actual negative cases are correctly identified by the model.

Table 5 and Fig. 6 display the overall average test results across all diseases. The model shows consistent and robust performance across all folds, with accuracy levels between 97.58% and 98.10%, indicating reliable disease prediction. Notably, the precision and recall scores are high, showing its ability to minimize false positives while accurately capturing true positives. High specificity scores further demonstrate the model’s ability to correctly identify negative cases, which is crucial for minimizing misdiagnoses. Overall, the strong F1-scores and AUC values also indicate the model’s success in detecting diseases and distinguishing between classes.

Table 5. Average test results

Fold Accuracy (%) Precision (%) Recall (Sensitivity) (%) Specificity (%) F1-score (%) AUC
1 97.93 96.77 95.28 98.20 96.00 0.99
2 97.89 96.99 95.76 98.18 96.36 0.99
3 97.86 96.66 96.25 98.19 96.45 0.99
4 97.58 96.62 94.55 97.90 95.55 0.99
5 98.10 97.28 96.32 98.35 96.78 1.00
Average 97.87 96.86 95.64 98.16 96.23 0.992

Fig. 6. Box plot of test results

Analyzing the confusion matrices, it is evident that COVID-19 and Viral Pneumonia CXRs are classified with minimal error. However, Normal and Lung Opacity CXRs are sometimes confused. This confusion is likely related to the severity of the lung opacity: when the opacity is mild, the images resemble Normal CXRs. Unfortunately, the dataset lacks information about the severity of the pathologies, preventing a detailed assessment of this issue. Additionally, as shown in Figs. 1 and 7, these images appear similar to an untrained eye, making it particularly challenging to differentiate between various lung diseases. Despite this difficulty, the trained model has proven successful in accurately distinguishing these cases.

Fig. 7. Example CXRs of Normal and Lung Opacity

As shown in Fig. 5, COVID-19 and Viral Pneumonia consistently achieved an AUC of 1.00 in each fold, while Lung Opacity and Normal cases demonstrated an AUC of 0.99.

In conclusion, the proposed methodology has demonstrated superior performance in four-class classification, despite potential measurement errors and erroneous class labels within the dataset, which is compiled from multiple sources.

In order to enhance the interpretability of the model, the Gradient-weighted Class Activation Mapping (Grad-CAM) algorithm is used in this study. Grad-CAM is an effective tool for visualizing the decision-making process of deep learning models by highlighting the regions of an image that the model prioritizes when producing predictions. The process begins by feeding an image into the model and calculating the gradient of the output score for a specific target class with respect to the activations in the final convolutional layer. These gradients reveal the contribution of each activation to the class prediction. Using global max pooling, the gradients are transformed into neuron importance weights [56]. These weights are then applied to the activation maps to produce a weighted combination, resulting in a heatmap. This heatmap is overlaid onto the original image to visually emphasize the regions most influential for the class prediction [57]. Examples of Grad-CAM for each class are illustrated in Fig. 8.
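The core of this computation can be sketched in a few lines of numpy. The sketch below is an illustrative reconstruction, not the authors' implementation; note that the original Grad-CAM formulation pools gradients with a global average, so both pooling choices are exposed via a parameter.

```python
import numpy as np

def grad_cam(activations, gradients, pooling="avg"):
    """Core Grad-CAM step on the final convolutional layer.

    activations: (H, W, K) feature maps; gradients: d(class score)/d(activations),
    same shape. Gradients are pooled into one importance weight per channel,
    the maps are combined with those weights, and a ReLU keeps only evidence
    in favor of the target class.
    """
    if pooling == "avg":
        weights = gradients.mean(axis=(0, 1))  # (K,) neuron importance weights
    else:
        weights = gradients.max(axis=(0, 1))   # global-max variant
    cam = np.tensordot(activations, weights, axes=([2], [0]))  # (H, W)
    cam = np.maximum(cam, 0)                   # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                  # normalize to [0, 1] for overlay
    return cam

# Toy example: channel 0 supports the class at the top-left pixel, channel 1 does not.
acts = np.zeros((2, 2, 2)); acts[..., 0] = [[1.0, 0.0], [0.0, 0.0]]
grads = np.zeros((2, 2, 2)); grads[..., 0] = 1.0
heatmap = grad_cam(acts, grads)  # highlights the top-left region
```

In practice the resulting heatmap is upsampled to the input resolution and blended with the CXR, producing overlays such as those in Fig. 8.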

Fig. 8. Grad-CAM Examples for Each Class

In Fig. 8, the heatmaps are rendered with a jet color scheme. In this scheme, blue corresponds to regions with minimal importance for classification, while yellow and green indicate moderate contributions. Conversely, red and dark red signify highly relevant regions, representing features that are strongly indicative of the target class [58]. Thus, Fig. 8 allows the identification of the critical areas where the discriminative features are most prominent for each of the four classes.

Ablation studies of the implemented models

To validate the effectiveness of the proposed model, ablation studies were performed to assess the contributions of its individual components: DenseNet201, ViT, and their combination without GAP. The results, summarized in Table 6, include performance metrics such as accuracy, sensitivity, specificity, precision, and F1-score. Table 6 clearly demonstrates the superior performance of the proposed model across all metrics.

Table 6. Ablation studies test results: individual contributions

Model Accuracy (%) Precision (%) Recall (Sensitivity) (%) Specificity (%) F1-score (%)
ViT 94.20 89.74 88.17 95.24 88.85
DenseNet201 97.52 96.20 94.83 97.90 95.47
DenseNet201 + ViT 97.53 96.25 94.86 97.90 95.53
Proposed (DenseNet201 + ViT + GAP) 97.87 96.86 95.64 98.16 96.23

Among the individual components, ViT exhibited the lowest performance, achieving an accuracy of 94.20%. While DenseNet201 performed slightly better, with an accuracy of 97.52%, it still lagged behind the proposed model. The combination of DenseNet201 and ViT, without incorporating GAP, yielded a minor improvement, with a +0.01% increase in accuracy and a +0.06% gain in F1-score compared to DenseNet201 alone.

In contrast, the proposed model, which integrates DenseNet201, ViT, and GAP, outperformed both the individual components and their combination. Relative to the DenseNet201 + ViT combination, it achieved improvements of +0.34% in accuracy, +0.61% in precision, +0.78% in sensitivity, +0.26% in specificity, and +0.70% in F1-score. These results highlight the effectiveness and superiority of the proposed methodology.
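At the shape level, the fusion evaluated in this ablation can be sketched as follows. This is a toy illustration of the idea only, not the trained model: the feature dimensions (a 7×7×1920 DenseNet201 final map, a 768-dimensional ViT embedding) and the random, untrained head weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two branch outputs on one CXR (illustrative shapes only):
cnn_maps = rng.standard_normal((7, 7, 1920))  # DenseNet201 final feature maps
vit_embed = rng.standard_normal(768)          # ViT embedding for the same image

# GAP collapses each CNN feature map to a single value, giving a compact
# channel-wise summary of the spatial features.
cnn_vec = cnn_maps.mean(axis=(0, 1))          # shape (1920,)

# Fuse the two branches by concatenation, then apply a 4-class linear head.
fused = np.concatenate([cnn_vec, vit_embed])  # shape (2688,)
W = rng.standard_normal((4, fused.size)) * 0.01  # untrained head, illustrative
logits = W @ fused
probs = np.exp(logits - logits.max()); probs /= probs.sum()  # softmax over 4 classes
```

The design intuition matches the ablation result: GAP contributes a low-dimensional spatial summary of the DenseNet201 maps that complements the ViT's long-range features, which neither branch provides alone.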

Additionally, to further demonstrate the effectiveness of our proposed model, ablation studies were conducted using different transfer learning (TL) approaches as backbone models. The results, presented in Table 7, include performance metrics such as accuracy, sensitivity, specificity, precision, and F1-score. The evaluation compares three TL backbones: VGG16, ResNet50, and InceptionV3 [28]. As shown in Table 7, our proposed model outperforms the VGG16-based model by +0.08% in accuracy, the InceptionV3-based model by +0.11%, and the ResNet50-based model by +0.41%.

Table 7. Ablation studies test results: different backbones

Backbone Accuracy (%) Precision (%) Recall (Sensitivity) (%) Specificity (%) F1-score (%)
VGG16 97.79 96.59 95.79 98.22 96.13
InceptionV3 97.76 96.41 95.99 98.16 96.18
ResNet50 97.46 96.14 95.51 97.86 95.91
Proposed 97.87 96.86 95.64 98.16 96.23

Comparing the results of both ablation studies reveals that the proposed model consistently demonstrates superior performance.

Discussion

Since its emergence and declaration as a global pandemic, COVID-19 continues to have a significant impact on the world population [4, 27]. Despite the availability of PCR tests and advanced imaging techniques, these methods are often inadequate, time-consuming, and require a high level of expertise. Consequently, the use of CXRs for the detection and diagnosis of COVID-19 has gained importance. In line with this trend, DL techniques have become a focal point for COVID-19 detection from CXRs.

In this study, we have proposed an end-to-end DL framework based on an attention mechanism to differentiate between COVID-19, Lung Opacity, Viral Pneumonia, and Normal CXRs. The proposed algorithm is trained and tested using an open-access dataset from Kaggle, specifically the COVID-19 Radiography Database. Table 8 presents other DL approaches using the same dataset with similar splits. For a fair comparison, only studies that utilized the entire dataset, without extracting subsets, and separated it into training, testing, and validation sets are included. Although all studies in Table 8 used the same dataset, direct comparisons are challenging due to differing dataset splits. Additionally, not all researchers employed k-fold cross-validation, which can lead to biased and inflated performance metrics. Nonetheless, it is evident that the proposed methodology outperforms the results achieved in studies [9, 27, 29–31]. Since Khan et al. [29] did not use k-fold cross-validation, some reported metrics of their study are slightly higher than ours; however, a one-to-one comparison is inappropriate. For instance, our results in Fold 5 still exceed those of Khan et al.'s study.
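Stratified k-fold splitting of the kind used here can be sketched in numpy. This is an illustrative re-implementation with toy class counts, not the authors' published fold lists; for exact reproduction, the publicly released fold lists should be used.

```python
import numpy as np

def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve class proportions.
    Returns a list of k index arrays that partition the dataset."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(k)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)   # all samples of class c
        rng.shuffle(idx)                    # random assignment within the class
        for i, chunk in enumerate(np.array_split(idx, k)):
            folds[i].extend(chunk.tolist()) # spread class c evenly across folds
    return [np.array(sorted(f)) for f in folds]

# Toy label vector with four classes (counts are illustrative, not the dataset's):
labels = np.repeat([0, 1, 2, 3], [40, 30, 20, 10])
folds = stratified_kfold(labels, k=5)  # each fold keeps the 40:30:20:10 ratio
```

Because every sample appears in exactly one test fold, averaging the per-fold metrics (as in Table 8) evaluates the model on the entire dataset while keeping each test set unseen during training.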

Table 8. Comparison of results in the literature using the entire dataset and the proposed algorithm

Accuracy (%) Precision (%) Recall (Sensitivity) (%) Specificity (%) F1-score (%) AUC CV
2021, Bashar et al. [21] 95.63 99.18 98.78 - 98.98 0.99
2021, Brima et al. [9] 93.99 - - - - 0.99 5-Fold
2022, Khan et al. [29] 96.13 97.25 96.5 - 97.5 -
2022, Pan et al. [30] 93.19 92.795 94.2775 - 93.3925 -
2022, Roy et al. [31] 94 96 95 - 95 0.967
2023, Alablani and Alenazi [27] 95.46 91 93 - 92.25 -
Proposed methodology Fold 1 97.93 96.77 95.28 98.20 96.00 0.99
Fold 2 97.89 96.99 95.76 98.18 96.36 0.99
Fold 3 97.86 96.66 96.25 98.19 96.45 0.99
Fold 4 97.58 96.62 94.55 97.90 95.55 0.99
Fold 5 98.10 97.28 96.32 98.35 96.78 1.00
Average 97.87 96.86 95.64 98.16 96.23 0.992 5-Fold

As discussed in Sect. 2, some studies trained and tested their approaches on specially created subsets of the dataset. Table 9 presents other DL approaches using these subsets, highlighting the data splits and obtained results for a fair comparison. Ukwuoma et al. [10, 28, 32, 33] conducted several studies using the same dataset, employing an unconventional partition of 3000 CXRs per class for training and 300 per class for validation and testing. Due to the limited number of Viral Pneumonia images (1345), they augmented the data to reach 3000 samples. Consequently, their results are not directly comparable with ours. However, to benchmark our methodology, we conducted an analysis using their data partition. Our method achieved an accuracy of 98.58%, precision of 98.59%, recall of 98.58%, F1-score of 98.59%, and an AUC of 1.00.

Table 9. Comparison of the results of the studies in the literature that use subsets and the proposed algorithm

Study Dataset split Accuracy (%) Precision (%) Recall (Sensitivity) (%) F1-score (%) AUC
2022, Hassanlou et al. [6] 1345 images per class; Train: 80%, Test: 20%, Val: 10% of Train 95.92 95.9 95.94 95.9 1
2022, Ukwuoma et al. [32] Train: 3000 per class; Val: 300 per class; Test: 300 per class 96.65 93.91 93.31 93.38 0.95547
2022, Ukwuoma et al. [33] Train: 3000 per class; Val: 300 per class; Test: 300 per class 98 96.21 96.02 96.03 0.97341
2022, Ukwuoma et al. [28] Train: 3000 per class; Val: 300 per class; Test: 300 per class 98.199 96.51 96.407 96.414 0.97603
2023, Islam et al. [34] 1000 images per class; Train: 80%, Test: 20%, Val: 20% of Train 93 92.75 93 93 0.99
2023, Azad et al. [12] 1340 images per class; Train: 1072, Test: 268 97.41 94.91 94.81 94.86 -
2023, Ukwuoma et al. [10] Train: 3000 per class; Val: 300 per class; Test: 300 per class 98 96.2 96.01 96.03 0.9734
Proposed methodology Train: 3000 per class; Val: 300 per class; Test: 300 per class 98.58 98.59 98.58 98.59 1

Overall, the performance of our proposed model surpasses that of all other studies using subsets of the dataset. In conclusion, the algorithm presented in this study demonstrates superior performance compared to similar approaches in the literature. As shown in Tables 8 and 9, our methodology outperforms all 13 algorithms that classify the four classes (COVID-19, Viral Pneumonia, Lung Opacity, and Normal) using the COVID-19 Radiography Database.

Conclusion and future works

In this paper, we propose a robust DL architecture that achieves outstanding performance in detecting COVID-19 and classifying CXRs into COVID-19, Lung Opacity, Viral Pneumonia, and Normal classes. As stated above, the current gold standard for COVID-19 detection is the PCR test, which is limited in quantity and time-consuming to analyze [35]. Apart from PCR tests, another approach is imaging modalities such as CT scanning, which requires time for imaging, is costly, and, due to its higher radiation exposure doses, is not suitable for children and pregnant women [9]. Therefore, CXRs became a popular imaging technique for COVID-19 detection and classification due to their low cost, accessibility, and lower radiation doses [27]. However, the interpretation of CXRs is challenging even for experienced radiologists; moreover, the increased workload caused by the extensive volume of CXRs may lead radiologists to misinterpret the images. Therefore, for an objective evaluation of CXRs, a computer-aided diagnosis system has become a necessity. To overcome the difficulties of detecting COVID-19, Lung Opacity, and Viral Pneumonia, we have developed an end-to-end DL framework based on an attention mechanism that discriminates among COVID-19, Lung Opacity, Viral Pneumonia, and Normal CXRs with highly favorable outcomes. Moreover, since our algorithm is tested on unseen test CXRs and we have employed five-fold cross-validation, the suggested algorithm yields reproducible and objective results. Therefore, it is suitable for computer-aided diagnosis systems that can assist radiologists as a second reader, improving the decision-making process in terms of both speed and accuracy. Furthermore, we provide the list of images in each fold for future fair comparisons.

This study identified some limitations of the proposed approach. We used a publicly available dataset that has been widely adopted as a benchmark in similar studies. However, publicly available datasets may not fully represent the diversity of real-world scenarios, which may limit the generalizability of the model in clinical settings [59]. Additionally, biases in these datasets can compromise reliability and reduce the generalizability of the models [60]. Although our proposed model showed strong performance on the test dataset, it may face several challenges in real-world applications due to varying imaging conditions, scanner settings, and patient demographics [61]. To address these concerns, we used five-fold cross-validation to ensure robustness and minimize the risk of overfitting to particular training-test splits. However, we acknowledge that real-world deployment requires further evaluation under varied conditions to ensure consistent reliability and generalizability. Therefore, in future studies, we plan to improve model performance by incorporating a significantly larger number of CXRs during training. Moreover, we intend to evaluate the performance of the model on more complex classification tasks, such as five-class or six-class problems. We will also focus on validating the proposed model using diverse, multicenter clinical datasets to better reflect real-world variability. These datasets will encompass images from various sources, regions, and patient demographics, enabling a more comprehensive evaluation of the model's robustness and generalizability. Additionally, future work can investigate the potential of integrating GAN-, Transformer-, or diffusion-based synthetic image generation into our model and compare its effectiveness against minimal augmentation techniques in addressing class imbalance and improving model performance.
While combining DenseNet201 with ViT introduces computational complexity, the resulting model is clearly successful. Future work will investigate lightweight architectures and optimization techniques that balance efficiency and accuracy, mitigating this concern and enabling classification on datasets with more classes. Moreover, we will explore loss functions designed to overcome class-imbalance issues. Future studies could also examine the impact of lung segmentation on the model's performance: by minimizing irrelevant background information, lung segmentation has the potential to significantly enhance the accuracy and effectiveness of the proposed approach. By addressing these challenges, we will be able to develop a powerful decision support system that not only increases the reliability and efficiency of the model but also reduces the workload of clinicians and improves diagnostic accuracy.

Acknowledgements

Not applicable.

Author contributions

B.O. played a role in developing the methodology, conducting the literature review, preparing the figures, and was responsible from the analysis. She also wrote the original draft of the manuscript. S.G. contributed significantly to the conceptualization of the research and the development of the methodology. She also ensured the validity of the work and participated in reviewing and editing the manuscript. S.E.Y. was actively involved in the conceptualization process and contributed to reviewing and editing the manuscript. B.D. provided critical oversight, contributing to the conceptual framework, methodology design, manuscript review, editing, and overall supervision of the paper.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data availability

In this study, a publicly available dataset was used. Necessary references and relevant links are provided appropriately.

Declarations

Ethics approval and consent to participate

This research is done on publicly available datasets that are already published and are therefore exempt from approval.

Consent for publication

The research is conducted on a publicly available anonymous dataset therefore this section is not applicable for this research.

Competing interests

The authors declare no competing interests.

Clinical trial number

Not applicable.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Li W, Deng X, Shao H, Wang X. Deep learning applications for COVID-19 analysis: A state-of-the-art survey, CMES - Computer Modeling in Engineering and Sciences. Tech Science Press. 2021;129(1):65–98. 10.32604/cmes.2021.016981
  • 2.Al-Antari MA, Hua C-H, Bang J, Lee S. Fast deep learning computer-aided diagnosis of COVID-19 based on digital chest X-ray images. Appl Intell. 2021. 10.1007/s10489-020-02076-6. [DOI] [PMC free article] [PubMed]
  • 3.Ahmad M, Bajwa UI, Mehmood Y, Anwar MW. Lightweight ResGRU: a deep learning-based prediction of SARS-CoV-2 (COVID-19) and its severity classification using multimodal chest radiography images. Neural Comput Appl. 2023. 10.1007/s00521-023-08200-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Chakraborty M, Dhavale SV, Ingole J. Corona-Nidaan: lightweight deep convolutional neural network for chest X-Ray based COVID-19 infection detection. Appl Intell. 2021;51(5):3026–43. 10.1007/s10489-020-01978-9. [DOI] [PMC free article] [PubMed]
  • 5.World Health Organization. WHO Coronavirus (COVID-19) Dashboard. https://covid19.who.int/
  • 6.Hassanlou L, Meshgini S, Afrouzian R, Farzamnia A, Moung EG. FirecovNet: a novel, lightweight, and fast deep learning-based network for detecting COVID-19 patients using chest X-rays. Electronics. 2022;11(19). 10.3390/electronics11193068.
  • 7.Marefat A, Marefat M, Hassannataj Joloudari J, Nematollahi MA, Lashgari R. CCTCOVID: COVID-19 detection from chest X-ray images using compact convolutional Transformers. Front Public Health. 2023;11. 10.3389/fpubh.2023.1025746. [DOI] [PMC free article] [PubMed]
  • 8.Minaee S, Kafieh R, Sonka M, Yazdani S, Jamalipour Soufi G. Deep-COVID: predicting COVID-19 from chest X-ray images using deep transfer learning. Med Image Anal. 2020;65. 10.1016/j.media.2020.101794. [DOI] [PMC free article] [PubMed]
  • 9.Brima Y, Atemkeng M, Djiokap ST, Ebiele J, Tchakounté F. Transfer learning for the detection and diagnosis of types of pneumonia including pneumonia induced by COVID-19 from chest X-ray images. Diagnostics. 2021;11(8). 10.3390/diagnostics11081480. [DOI] [PMC free article] [PubMed]
  • 10.Ukwuoma CC, et al. Deep learning framework for rapid and accurate respiratory COVID-19 prediction using chest X-ray images. J King Saud Univ - Comput Inf Sci. 2023;35(7):101596. 10.1016/j.jksuci.2023.101596. [DOI] [PMC free article] [PubMed]
  • 11.Liu J, Sun W, Zhao X, Zhao J, Jiang Z. Deep feature fusion classification network (DFFCNet): Towards accurate diagnosis of COVID-19 using chest X-rays images, Biomed. Signal Process. Control. 76, 2022. 10.1016/j.bspc.2022.103677 [DOI] [PMC free article] [PubMed]
  • 12.Azad AK, Mahabub-A-Alahi I, Ahmed, Ahmed MU. In search of an efficient and reliable deep learning model for identification of COVID-19 infection from chest X-ray images. Diagnostics. 2023;13(3). 10.3390/diagnostics13030574. [DOI] [PMC free article] [PubMed]
  • 13.Oltu B, Guney S, Dengiz B, Agildere M. Automated tuberculosis detection using Pre-Trained CNN and SVM, 2021, 10.1109/TSP52935.2021.9522644
  • 14.Sundaram SG, Aloyuni SA, Alharbi RA, Alqahtani T, Sikkandar MY, Subbiah C. Deep Transfer Learning Based Unified Framework for COVID19 Classification and Infection Detection from Chest X-Ray Images, Arab. J. Sci. Eng. 2022;47(2):1675–1692. 10.1007/s13369-021-05958-0 [DOI] [PMC free article] [PubMed]
  • 15.Zainab Yousuf Zaidi S, Usman Akram M, Jameel A, Alghamdi NS. Lung Segmentation-Based pulmonary disease classification using deep neural networks. IEEE Access. 2021;9:125202–14. 10.1109/ACCESS.2021.3110904. [Google Scholar]
  • 16.Souza JC, Bandeira Diniz JO, Ferreira JL, França da Silva GL, Corrêa A, Silva, de Paiva AC. An automatic method for lung segmentation and reconstruction in chest X-ray using deep neural networks, Comput. Methods Programs Biomed. 2019;177:285–296. 10.1016/j.cmpb.2019.06.005 [DOI] [PubMed]
  • 17.Li F et al. Lesion-aware convolutional neural network for chest radiograph classification, Clin. Radiol. 2021;76(2):155.e1-155.e14. 10.1016/j.crad.2020.08.027 [DOI] [PubMed]
  • 18.Sogancioglu E, et al. Deep learning for chest X-ray analysis: A survey. Elsevier B.V. 2021;72:102125. [DOI] [PubMed]
  • 19.Schalekamp S, et al. Performance of AI to exclude normal chest radiographs to reduce radiologists’ workload. Eur Radiol. 2024. 10.1007/s00330-024-10794-5. [DOI] [PMC free article] [PubMed]
  • 20.Brady AP. Error and discrepancy in radiology: inevitable or avoidable? Insights Imaging. 2017;8(1):171–82. 10.1007/s13244-016-0534-1. [DOI] [PMC free article] [PubMed]
  • 21.Bashar A, Latif G, Ben Brahim G, Mohammad N, Alghazo J. COVID-19 pneumonia detection using optimized deep learning techniques, Diagnostics. 11(11), 2021, 10.3390/diagnostics11111972 [DOI] [PMC free article] [PubMed]
  • 22.Türk F, Kökver Y. Detection of lung opacity and treatment planning with Three-Channel fusion CNN model. Arab J Sci Eng. 2024;49(3):2973–85. 10.1007/s13369-023-07843-4. [DOI] [PMC free article] [PubMed]
  • 23.Nayak SR, Nayak J, Sinha U, Arora V, Ghosh U, Satapathy SC. An automated lightweight deep neural network for diagnosis of COVID-19 from chest X-ray images. Arab J Sci Eng. 2021. 10.1007/s13369-021-05956-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Chauhan S, et al. Detection of COVID-19 using edge devices by a light-weight convolutional neural network from chest X-ray images. BMC Med Imaging. 2024;24(1):1–15. 10.1186/s12880-023-01155-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Guo K, Cheng J, Li K, Wang L, Lv Y, Cao D. Diagnosis and detection of pneumonia using weak-label based on X-ray images: a multi-center study. BMC Med Imaging. 2023;23(1):1–8. 10.1186/s12880-023-01174-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Wang W, Jiang Y, Wang X, Zhang P, Li J. Detecting COVID-19 patients via MLES-Net deep learning models from X-Ray images. BMC Med Imaging. 2022;22(1):135. 10.1186/s12880-022-00861-y. [DOI] [PMC free article] [PubMed]
  • 27.Alablani IAL, Alenazi MJF. COVID-ConvNet: A convolutional neural network classifier for diagnosing COVID-19 infection. Diagnostics. 2023;13(10). 10.3390/diagnostics13101675. [DOI] [PMC free article] [PubMed]
  • 28.Ukwuoma CC, et al. LCSB-inception: reliable and effective light-chroma separated branches for Covid-19 detection from chest X-ray images. Comput Biol Med. 2022;150. 10.1016/j.compbiomed.2022.106195. [DOI] [PMC free article] [PubMed]
  • 29.Khan E, Rehman MZU, Ahmed F, Alfouzan FA, Alzahrani NM, Ahmad J. Chest X-ray classification for the detection of COVID-19 using deep learning techniques. Sensors. 2022;22(3). 10.3390/s22031211. [DOI] [PMC free article] [PubMed]
  • 30.Pan L, et al. MFDNN: multi-channel feature deep neural network algorithm to identify COVID19 chest X-ray images. Heal Inf Sci Syst. 2022;10(1). 10.1007/s13755-022-00174-y. [DOI] [PMC free article] [PubMed]
  • 31.Roy S, Tyagi M, Bansal V, Jain V. SVD-CLAHE boosting and balanced loss function for Covid-19 detection from an imbalanced chest X-Ray dataset. Comput Biol Med. 2022;150. 10.1016/j.compbiomed.2022.106092. [DOI] [PMC free article] [PubMed]
  • 32.Ukwuoma CC, et al. Dual_Pachi: Attention-based dual path framework with intermediate second order-pooling for Covid-19 detection from chest X-ray images. Comput Biol Med. 2022;151. 10.1016/j.compbiomed.2022.106324. [DOI] [PMC free article] [PubMed]
  • 33.Ukwuoma CC et al. Nov., Automated Lung-Related Pneumonia and COVID-19 Detection Based on Novel Feature Extraction Framework and Vision Transformer Approaches Using Chest X-ray Images, Bioengineering. 2022, 9(11). 10.3390/bioengineering9110709 [DOI] [PMC free article] [PubMed]
  • 34.Islam MN et al. Interpretable Differential Diagnosis of Non-COVID Viral Pneumonia, Lung Opacity and COVID-19 Using Tuned Transfer Learning and Explainable AI, Healthc. 2023, 11(3). 10.3390/healthcare11030410 [DOI] [PMC free article] [PubMed]
  • 35.Almalki YE, et al. A Novel-based Swin transfer based diagnosis of COVID-19 patients. Intell Autom Soft Comput. 2023;35(1):163–80. 10.32604/iasc.2023.025580. [Google Scholar]
  • 36.Dosovitskiy A et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2020, [Online]. Available: http://arxiv.org/abs/2010.11929
  • 37.Shome D, et al. Covid-transformer: interpretable covid-19 detection using vision transformer for healthcare. Int J Environ Res Public Health. 2021;18(21). 10.3390/ijerph182111086. [DOI] [PMC free article] [PubMed]
  • 38.Chetoui M, Akhloufi MA. Explainable vision Transformers and radiomics for COVID-19 detection in chest X-rays. J Clin Med. 2022;11(11). 10.3390/jcm11113013. [DOI] [PMC free article] [PubMed]
  • 39.Yang H, Wang L, Xu Y, Liu X. CovidViT: a novel neural network with self-attention mechanism to detect Covid-19 through X-ray images, Int. J. Mach. Learn. Cybern. 2023;14(3):973–987. 10.1007/s13042-022-01676-7 [DOI] [PMC free article] [PubMed]
  • 40.Nafisah SI, Muhammad G, Hossain MS, AlQahtani SA. A Comparative Evaluation between Convolutional Neural Networks and Vision Transformers for COVID-19 Detection, Mathematics. 2023, 11(6). 10.3390/math11061489
  • 41.Chen H, Zhang T, Chen R, Zhu Z, Wang X. A Novel COVID-19 Image Classification Method Based on the Improved Residual Network, Electron. 2023, 12(1). 10.3390/electronics12010080
  • 42.Wang T, et al. PneuNet: deep learning for COVID-19 pneumonia diagnosis on chest X-ray image analysis using vision transformer. Med Biol Eng Comput. 2023. 10.1007/s11517-022-02746-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Chowdhury MEH, et al. Can AI help in screening viral and COVID-19 pneumonia? IEEE Access. 2020;8:132665–76. 10.1109/ACCESS.2020.3010287. [Google Scholar]
  • 44.Rahman T, et al. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Comput Biol Med. 2021;132:104319. 10.1016/j.compbiomed.2021.104319. [DOI] [PMC free article] [PubMed]
  • 45.Salini Y, HariKiran J. ViT: quantifying chest X-Ray images using vision transformer & XAI technique. SN Comput Sci. 2023;4(6). 10.1007/s42979-023-02204-2.
  • 46.Zhang X, et al. Attention to region: Region-based integration-and-recalibration networks for nuclear cataract classification using AS-OCT images. Med Image Anal. 2022;80:102499. 10.1016/j.media.2022.102499. [DOI] [PubMed]
  • 47.Xiao Z, Zhang X, Zheng B, Guo Y, Higashita R, Liu J. Multi-style Spatial attention module for cortical cataract classification in AS-OCT image with supervised contrastive learning. Comput Methods Programs Biomed. 2024;244. 10.1016/j.cmpb.2023.107958. [DOI] [PubMed]
  • 48.Oltu B, Karaca BK, Erdem H, Özgür AA. A Systematic Review of Transfer Learning-Based Approaches for Diabetic Retinopathy Detection, Gazi Univ. J. Sci. 2023;36(3):1140–1157. 10.35378/gujs.1081546
  • 49.Lee CP, Lim KM. COVID-19 diagnosis on chest radiographs with enhanced deep neural networks. Diagnostics. 2022;12(8). 10.3390/diagnostics12081828. [DOI] [PMC free article] [PubMed]
  • 50.Chutia U, Shanker A, Jyoti T, Singh P, Kumar V. Classification of lung diseases using an Attention– Based modified densenet model, 0123456789, 2024, 10.1007/s10278-024-01005-0 [DOI] [PMC free article] [PubMed]
  • 51.Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely Connected Convolutional Networks, Aug. 2016, [Online]. Available: http://arxiv.org/abs/1608.06993
  • 52.Usman M, Zia T, Tariq A. Analyzing Transfer Learning of Vision Transformers for Interpreting Chest Radiography, J. Digit. Imaging. 2022;35(6):1445–1462. 10.1007/s10278-022-00666-z [DOI] [PMC free article] [PubMed]
  • 53.Ukwuoma CC, et al. A hybrid explainable ensemble transformer encoder for pneumonia identification from chest X-ray images. J Adv Res. 2023;48:191–211. 10.1016/j.jare.2022.08.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Ko J, Park S, Woo HG. Optimization of vision transformer-based detection of lung diseases from chest X-ray images. BMC Med Inf Decis Mak. 2024;24(1):4–11. 10.1186/s12911-024-02591-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Oltu B, Akşahin MF, Kibaroğlu S. A novel electroencephalography based approach for Alzheimer’s disease and mild cognitive impairment detection. Biomed Signal Process Control. 2021;63. 10.1016/j.bspc.2020.102223.
  • 56.Wang B, Zhang W. MARnet: Multi-scale adaptive residual neural network for chest X-ray images recognition of lung diseases. Math Biosci Eng. 2022;19(1):331–50. 10.3934/mbe.2022017. [DOI] [PubMed] [Google Scholar]
  • 57.Choudhary P, Hazra A. Chest disease radiography in twofold: using convolutional neural networks and transfer learning, Evol. Syst. 2021;12(2):567–579. 10.1007/s12530-019-09316-2
  • 58.Umair M et al. Sep., Detection of COVID-19 using transfer learning and grad-cam visualization on indigenously collected X-ray dataset, Sensors. 2021;21(17). 10.3390/s21175813 [DOI] [PMC free article] [PubMed]
  • 59.Galanty M, et al. Assessing the Documentation of publicly available medical image and signal datasets and their impact on bias using the BEAMRAD tool. Sci Rep. 2024;14(1):31846. 10.1038/s41598-024-83218-5. [DOI] [PMC free article] [PubMed]
  • 60.Zhang S, Wang L, Ding L, Liu A, Zhu S, Tu D. Intrinsic bias identification on medical image datasets. 2022. arXiv:2203.12872.
  • 61.Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17(1):195. 10.1186/s12916-019-1426-2. [DOI] [PMC free article] [PubMed]
