Machine Learning with Applications. 2021 Aug 20;6:100138. doi: 10.1016/j.mlwa.2021.100138

COVID-19 detection in X-ray images using convolutional neural networks

Daniel Arias-Garzón a,⁎,1, Jesús Alejandro Alzate-Grisales a,1, Simon Orozco-Arias b,c, Harold Brayan Arteaga-Arteaga a, Mario Alejandro Bravo-Ortiz a, Alejandro Mora-Rubio a, Jose Manuel Saborit-Torres d, Joaquim Ángel Montell Serrano d, Maria de la Iglesia Vayá d, Oscar Cardona-Morales a, Reinel Tabares-Soto a
PMCID: PMC8378046  PMID: 34939042

Abstract

The COVID-19 global pandemic affects health care and lifestyles worldwide, and its early detection is critical to controlling the spread of cases and mortality. The current leading diagnostic test is Reverse Transcription Polymerase Chain Reaction (RT-PCR); however, its turnaround time and cost are high, so other fast and accessible diagnostic tools are needed. Inspired by recent research that correlates the presence of COVID-19 with findings in chest X-ray images, this paper's approach uses existing deep learning models (VGG19 and U-Net) to process these images and classify them as positive or negative for COVID-19. The proposed system involves a preprocessing stage with lung segmentation, removing the surroundings, which do not offer relevant information for the task and may produce biased results; after this initial stage comes the classification model, trained under the transfer learning scheme; and finally, results analysis and interpretation via heat map visualization. The best models achieved a COVID-19 detection accuracy of around 97%.

Keywords: COVID-19, Deep learning, Transfer learning, X-ray, Segmentation


1. Introduction

Coronavirus disease is related to the illnesses caused by the Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS) coronaviruses. COVID-19 is the disease caused by the novel coronavirus SARS-CoV-2 (Zhang, 2020). In December 2019, the first COVID-19 cases were reported in Wuhan city, Hubei province, China (Xu et al., 2020). The World Health Organization (WHO) declared COVID-19 a pandemic on March 11, 2020 (Ducharme, 2020); up to July 13, 2021, there were 188,404,506 reported cases around the world, which had caused 4,059,220 deaths (Worldometer, 2020).

These diseases cause respiratory problems that can often be treated without specialized medicine or equipment. Still, underlying medical conditions such as diabetes, cancer, and cardiovascular and respiratory illnesses can worsen the disease (World Health Organization, 2020). Reverse Transcription Polymerase Chain Reaction (RT-PCR) and gene sequencing of respiratory or blood samples are currently the main methods for COVID-19 detection (Wang et al., 2020). Other studies show that COVID-19 presents pathologies similar to pneumonia, leaving chest findings visible in medical images. Some research examines the correlation of RT-PCR with chest CT (Ai et al., 2020), while other studies examine its correlation with chest X-ray images (Kanne et al., 2020). Opacities or attenuation are the most common findings in these images, with ground-glass opacity in around 57% of cases (Kong & Agarwal, 2020). Even though expert radiologists can identify the visual patterns found in these images, considering the limited monetary resources at low-level medical institutions and the ongoing increase in cases, this diagnostic process is quite impractical. Recent research in Artificial Intelligence (AI), especially in Deep Learning, shows that these techniques perform well when applied to medical images.

There are only a few large open-access datasets of COVID-19 X-ray images; most published studies build on the COVID-19 Image Data Collection (Cohen et al., 2020), which was constructed with images from COVID-19 reports or articles, in collaboration with a radiologist to confirm the pathologies in the collected images. Past approaches use different strategies to deal with small datasets, such as transfer learning, data augmentation, or combining different datasets, with good results in papers such as Civit-Masot et al. (2020), using a VGG16 with 86% accuracy; Ozturk et al. (2020), whose DarkCovidNet reaches 87% accuracy on a three-class task that includes COVID-19; Yoo et al. (2020), who used a ResNet18 to obtain 95% accuracy; Sethy et al. (2020), who used a ResNet50 for 95.33% accuracy; Minaee et al. (2020), who used SqueezeNet for 95.45% accuracy; Panwar et al. (2020), who achieved 97.62% using nCOVnet; Apostolopoulos and Mpesiana (2020), who improved the results to 97.8% accuracy using VGG19 and MobileNet; and, finally, the highest results are found in Jain et al. (2020), using a ResNet101 with 98.95% accuracy, and Khan et al. (2020), with 99% accuracy using CoroNet, a model based on Xception.

This paper presents a new approach using existing Deep Learning models. It focuses on enhancing the preprocessing stage to obtain accurate and reliable results when classifying COVID-19 from chest X-ray images. The preprocessing step involves a network to filter the images based on their projection (lateral or frontal); common operations such as normalization, standardization, and resizing to reduce data variability, which may hurt the performance of the classification models; and a segmentation model (U-Net) to extract the lung region, which contains the relevant information, and discard the surroundings, which can produce misleading results (de Informática, 2020). Following the preprocessing stage comes the classification model (VGG16/VGG19), trained under the transfer learning scheme, which takes advantage of weights pre-trained on a much larger dataset, such as ImageNet, and helps the training process of the network in both performance and time to convergence. It is worth noting that the dataset used for this research is at least ten times bigger than the ones used in previous works. Finally, the visualization of heatmaps for different images provides helpful information about the regions of the images that contribute to the prediction of the network, which under ideal conditions should focus on the appearance of the lungs, supporting the importance of lung segmentation in the preprocessing stage. The rest of the paper is organized as follows: first, the methodology applied in this approach; followed by the experiments and results obtained; a discussion of the findings; and lastly, the conclusions.

2. Methodology

Our methodology consists of three main experiments to evaluate the performance of the models and assess the influence of the different stages of the process. Each experiment follows the workflow shown in Fig. 1. The difference between experiments is the dataset used: in all instances, the same images were used for COVID-19 positive cases, while three different datasets were used for negative cases. Accordingly, Experiments 1 and 2 evaluate positive vs. negative case datasets, and Experiment 3 involves Pre-COVID era images (images from 2015 to 2017).

Fig. 1. Experiment diagram: a is the first classification task, b is the lung segmentation task, c is COVID-19 prediction with standard images, d is COVID-19 prediction with only the lung regions of the images, and e is COVID-19 prediction with the lungs removed from the images.

2.1. Datasets

A total of nine chest X-ray image datasets were used in different stages:

2.1.1. COVID-19 classification datasets

The following datasets were used to train the classification models: BIMCV-COVID19+ (Vayá et al., 2020), BIMCV-COVID- (Medical Imaging Databank of the Valencia region BIMCV, 2020), and the Spain Pre-COVID era dataset. These datasets were provided by the Medical Imaging Databank of the Valencia Region (BIMCV). Also, to compare this process with previous works, we used two additional databases: for positive cases, the COVID-19 Image Data Collection by Cohen et al. (2020), and for negative cases, a database composed of Normal, Viral Pneumonia, and Bacterial Pneumonia images by Daniel Kermany et al. (2018); these last databases can be found at COVID-19 X rays (2020).

2.1.2. Image projection filtering

The images from the COVID-19 datasets have a label corresponding to the image projection: frontal (posteroanterior and anteroposterior) or lateral. Upon manual inspection, several mismatched labels were found, which affect model performance given the difference in the information available from the two views and the fact that not every patient had both views available. In order to automate the process of filtering the images according to projection, a classification model was trained on a subset of the BIMCV-Padchest dataset (Bustos et al., 2020), with 2481 frontal images and 815 lateral images. This model allowed us to filter the COVID-19 datasets efficiently and keep the frontal-projection images, which offer more information than lateral images.

Finally, to train the COVID-19 classification models, the positive dataset (BIMCV-COVID19+), once filtered, has 12,802 frontal images. In Experiment 1, images from the BIMCV-COVID- dataset were used as negative cases, with 4610 frontal images. BIMCV-COVID- was not curated; moreover, some of the patients from this dataset were confirmed as COVID-19 positive in a posterior evaluation. Therefore, the models trained on this data could have a biased or unfavorable performance due to the dataset size and the mislabeled cases later identified by radiologists. Experiment 2 used a curated version of BIMCV-COVID- for negative patients to avoid this bias: by eliminating the images of patients that overlap with the positive dataset, a total of 1370 images were excluded. Finally, Experiment 3 used a Pre-COVID dataset of images collected from European patients between 2015 and 2017. It contains 5469 images; this dataset was obtained from BIMCV, but it has not been published yet.

2.1.3. Lung segmentation

Three datasets were used to train the U-Net models for this segmentation: the Montgomery dataset (Jaeger et al., 2020) with 138 images, JSRT (Shiraishi et al., 2020) with 240, and NIH (Tang et al., 2020) with 100. Despite the apparently small amount of data, the quantity and variability of the images were enough to achieve a useful segmentation model.

2.2. Image separation

For the classification task, data were divided into train (60%), validation (20%), and test (20%) partitions, following the clinical information to avoid placing images from the same subject in two different partitions, which could generate bias and overfitting in the models (a patient-wise split is sketched at the end of this subsection). Accordingly, the data distribution was as follows:

  • For the classification model that filters images based on projection, the data was composed of 1150, 723, and 608 frontal images for the train, test, and validation partitions, and 375, 236, and 204 lateral images for the same partitions.

  • For the COVID-19 classification model, the positive cases dataset has 6475 images for the train set, 3454 for the test set, and 2873 for the validation set. Meanwhile, for the negative case datasets, the BIMCV-COVID- dataset is divided into 2342, 1228, and 1040 images for train, test, and validation. After the BIMCV-COVID- dataset was curated, there were 1645, 895, and 700 images for the train, test, and validation sets. Finally, the Pre-COVID era dataset was divided into 2803, 1401, and 1265 images for the train, test, and validation sets.

  • For the comparison with previous works, the COVID-19 cases dataset has 286 images for the train set, 96 for the test set, and 96 for the validation set. Meanwhile, for the negative case datasets, the Normal images are divided into 809 for training and 270 each for the test and validation sets, and the Pneumonia images into 2329 for training and 777 each for the other two sets.

The image quantity was considerably smaller for the segmentation task, so a test dataset was not created, leaving a distribution of 80% (382 images) for the train set and 20% (96 images) for the validation set.
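To make the patient-wise partitioning concrete, the following sketch shows one way to split by subject so that no patient appears in more than one set. It assumes scikit-learn's GroupShuffleSplit and hypothetical patient identifiers; it is illustrative, not the authors' original code.

```python
# Patient-wise 60/20/20 split sketch: groups are patient IDs, so all images
# from one subject land in a single partition.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def split_by_patient(image_paths, labels, patient_ids, seed=42):
    """Return (train, val, test) index arrays with no patient overlap."""
    paths = np.asarray(image_paths)
    y = np.asarray(labels)
    groups = np.asarray(patient_ids)

    # First carve out 20% of the data (by patient) for the test set.
    gss = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=seed)
    trainval_idx, test_idx = next(gss.split(paths, y, groups))

    # Then split the remaining 80% into 60/20 of the original data
    # (i.e. 25% of the remainder goes to validation).
    gss2 = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=seed)
    tr_rel, val_rel = next(gss2.split(paths[trainval_idx], y[trainval_idx],
                                      groups[trainval_idx]))

    return trainval_idx[tr_rel], trainval_idx[val_rel], test_idx
```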

2.3. Preprocessing

As the images come from several datasets with different image sizes and acquisition conditions, a preprocessing step is applied to reduce or remove the effect of data variability on the performance of the models. For instance, the BIMCV-Padchest dataset was collected entirely from the same hospital. In contrast, the COVID-19 datasets have images mainly from the Valencian region in Spain, other parts of Spain, and other European countries. On the other hand, the Montgomery and NIH segmentation datasets come from US images, while JSRT is a Japanese dataset. In general, this implies that many types of X-ray devices, with different technologies and resolutions, were used to take the images. The preprocessing layer, shown in orange in Fig. 1, consists of three steps. First, all images are resized to 224 × 224 pixels in one channel (grayscale). In the second step, the datasets are normalized according to Eq. (1), where x represents the original image and N the normalized image. Finally, the datasets are standardized according to Eq. (2), where Z is the standardized image and N the normalized image. When applying standardization to the validation and test sets, the mean and standard deviation (std) from the training set were used to unify the data distribution.

N_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \quad (1)
Z_i = \frac{N_i - \mathrm{mean}(N)}{\mathrm{std}(N)} \quad (2)
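A minimal sketch of this preprocessing, assuming OpenCV for resizing and the per-dataset statistics described above (illustrative, not the authors' exact implementation):

```python
# Resize to 224x224 grayscale, min-max normalize (Eq. 1), then standardize (Eq. 2)
# reusing the training-set mean and std for the validation and test sets.
import numpy as np
import cv2  # assumed here for resizing; any image library works

def preprocess(images, train_mean=None, train_std=None):
    resized = np.stack([cv2.resize(img, (224, 224)) for img in images]).astype("float32")
    # Eq. (1): N_i = (x_i - min(x)) / (max(x) - min(x))
    normalized = (resized - resized.min()) / (resized.max() - resized.min())
    # Eq. (2): Z_i = (N_i - mean(N)) / std(N)
    mean = normalized.mean() if train_mean is None else train_mean
    std = normalized.std() if train_std is None else train_std
    return (normalized - mean) / std, mean, std
```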

2.4. Segmentation

There are multiple ways to perform image segmentation; this paper uses a Deep Learning model based on the U-Net architecture (Ronneberger et al., 2020). Previous articles show that the U-Net architecture is accurate for the segmentation of medical images. This kind of model receives the target in the form of an image mask with ones (1) on the reconstruction area and zeros (0) on the rest; consequently, in a production setting, the model input is a chest X-ray image and the output is the predicted mask. Fig. 2 shows the structure of U-Net.

Fig. 2. U-Net used for the segmentation task.

For experimental purposes, we tested three different numbers of filters in the convolutional layers to find the optimal value for this task. The number of filters in the contraction blocks is computed according to Eq. (3), where F_0 is the number of initial filters and i is the index of the contraction block. Eq. (4) gives the number of filters for each expansion block, where F_f is the number of filters in the last contraction block and i is the index of the corresponding expansion block. In the expansion block, the transposed convolution layer uses the same number of filters as the convolutional layers.

\#\mathrm{Filters}_{\mathrm{cont}} = F_0 \cdot 2^{\,i-1} \quad (3)
\#\mathrm{Filters}_{\mathrm{expan}} = \frac{F_f}{2^{\,i}} \quad (4)

The values used for F_0 were 16, 112, and 64; the models will be identified as U-Net 1, 2, and 3, respectively.

2.4.1. Hyperparameters

The kernel size in the convolutional layers is 3 × 3 with he-normal kernel initialization and same padding. In the max-pooling layers, the pool size is 2 × 2. The dropout rate is 0.1 in the first two expansion and contraction blocks, 0.2 in blocks three and four, and 0.3 in contraction block five. The transposed convolutional layers use a kernel size of 2 × 2, strides of 2 × 2, and same padding. Finally, the last convolutional layer uses one filter and a kernel size of 1 × 1.
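Based on the hyperparameters above and Eqs. (3)-(4), one contraction block and one expansion block could be sketched in Keras as follows. The ReLU activations and the dropout placement are assumptions, since the paper does not list them; this is a sketch, not the authors' implementation.

```python
# Sketch of one contraction and one expansion block; filter counts follow Eqs. (3)-(4).
from tensorflow.keras import layers

def contraction_block(x, f0, i, dropout):
    filters = f0 * 2 ** (i - 1)                      # Eq. (3)
    c = layers.Conv2D(filters, 3, padding="same",
                      kernel_initializer="he_normal", activation="relu")(x)
    c = layers.Dropout(dropout)(c)
    c = layers.Conv2D(filters, 3, padding="same",
                      kernel_initializer="he_normal", activation="relu")(c)
    p = layers.MaxPooling2D(pool_size=(2, 2))(c)
    return c, p                                      # c feeds the skip connection

def expansion_block(x, skip, ff, i, dropout):
    filters = ff // 2 ** i                           # Eq. (4)
    u = layers.Conv2DTranspose(filters, 2, strides=(2, 2), padding="same")(x)
    u = layers.concatenate([u, skip])
    u = layers.Conv2D(filters, 3, padding="same",
                      kernel_initializer="he_normal", activation="relu")(u)
    u = layers.Dropout(dropout)(u)
    u = layers.Conv2D(filters, 3, padding="same",
                      kernel_initializer="he_normal", activation="relu")(u)
    return u

# The final 1x1 convolution with a single filter produces the predicted mask:
# outputs = layers.Conv2D(1, 1, activation="sigmoid")(u)
```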

2.5. Classification

There are two classification tasks in this research: the first separates frontal and lateral chest X-ray images, and the second distinguishes COVID-19 positive cases from negative ones. For both tasks, the VGG16 and VGG19 (Simonyan & Zisserman, 2020) Deep Learning models were used. The networks were trained using transfer learning (Bravo Ortíz et al., 2021) with weights pre-trained on the ImageNet dataset (Deng et al., 2020), which contains millions of images spanning more than 1000 classes. The use of pre-trained models takes advantage of features learned on a larger dataset so that a new model converges faster and performs better on a smaller dataset (Aggarwal, 2020). The pre-trained models come from the TensorFlow/Keras library; their weights expect three-channel images, while the X-ray data has one channel. The following weights were used to collapse the three RGB channels into one channel: Red 0.2989, Green 0.5870, and Blue 0.1140.
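A sketch of this channel conversion for the first VGG19 convolution, assuming the Keras applications API (illustrative only, not the authors' exact code):

```python
# Collapse the ImageNet-pretrained RGB kernels of the first VGG19 convolution
# into a single grayscale channel using the luminance weights above.
from tensorflow.keras.applications import VGG19

rgb_model = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
gray_model = VGG19(weights=None, include_top=False, input_shape=(224, 224, 1))

for rgb_layer, gray_layer in zip(rgb_model.layers, gray_model.layers):
    w = rgb_layer.get_weights()
    if rgb_layer.name == "block1_conv1":
        kernel, bias = w                                   # kernel shape: (3, 3, 3, 64)
        gray_kernel = (0.2989 * kernel[:, :, 0:1, :]
                       + 0.5870 * kernel[:, :, 1:2, :]
                       + 0.1140 * kernel[:, :, 2:3, :])    # shape: (3, 3, 1, 64)
        gray_layer.set_weights([gray_kernel, bias])
    elif w:                                                # copy all other layers as-is
        gray_layer.set_weights(w)
```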

3. Experiments and results

Regarding Fig. 1, in part a the dataset is filtered by a VGG19 model that determines whether a chest X-ray image is lateral or frontal. This network will be referred to as VGG19 FL to distinguish it from the other classification model. In part b, lung segmentation is performed with a U-Net model, only on the images classified as frontal in the previous stage. A VGG19 classification model is used in parts c, d, and e to predict COVID-19 positive and negative cases; to differentiate it from the other VGG19 model, we use the name VGG19 Covid. In variation c, the datasets pass through the classifier without lung segmentation. In variation d, the segmented images, obtained by multiplying the mask predicted in part b with the original images, are passed through the VGG19 Covid classifier. Finally, in variation e, the mask from part b is inverted and applied to the original images before being passed through the VGG19 Covid classifier. These three variations allowed us to assess the importance of the segmentation stage by giving the model full or partial information and analyzing which part of the images contributes to the prediction.
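Variations d and e amount to masking the input images before classification; a minimal sketch, assuming the mask is the one predicted by the U-Net from part b:

```python
# Variation d: keep only the lung region. Variation e: keep only the surroundings.
import numpy as np

def make_variants(image, predicted_mask, threshold=0.5):
    """image and predicted_mask: arrays of shape (224, 224, 1), mask values in [0, 1]."""
    binary_mask = (predicted_mask > threshold).astype(image.dtype)
    lungs_only = image * binary_mask            # variation d
    lungs_removed = image * (1 - binary_mask)   # variation e (inverted mask)
    return lungs_only, lungs_removed
```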

3.1. VGG19 FL

To filter frontal and lateral images, a subset of samples from the BIMCV-Padchest and BIMCV-COVID- datasets was labeled manually; experiments were performed using the VGG16 and VGG19 models with weights pre-trained on the ImageNet dataset. Table 1 shows the accuracy of these experiments; the best results correspond to VGG19, so that model is used in the subsequent parts of the experiment diagram. Each model was trained for 30 epochs with a batch size of 64.
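A possible transfer-learning setup for this classifier is sketched below. The dense head, optimizer, and loss are assumptions, since the paper specifies only the backbone, the number of epochs, and the batch size; the grayscale-kernel conversion shown earlier adapts the backbone to one-channel inputs.

```python
# VGG19 backbone with ImageNet weights plus a small head for binary classification.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

def build_vgg19_classifier(input_shape=(224, 224, 3)):
    base = VGG19(weights="imagenet", include_top=False, input_shape=input_shape)
    x = layers.Flatten()(base.output)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = models.Model(base.input, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# model = build_vgg19_classifier()
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=30, batch_size=64)
```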

Table 1.

Accuracy of part a models.

Model Train Validation Test
VGG16 0.9908 0.9803 0.9687
VGG19 0.9973 0.9975 0.9937

3.2. U-Net

Lung segmentation was performed with a U-Net model using a combination of three datasets. Three different models were trained, changing the number of filters in the convolutional layers for each U-Net as described in Section 2.4. Table 2 shows the Dice and Intersection over Union (IoU) metrics used to evaluate the segmentation task for each model. All networks were trained for 200 epochs with a batch size of 64.

Table 2.

Dice coefficient and Intersection over Union (IoU) for the U-Net models of part b.

Model Dice (Train) Dice (Validation) IoU (Train) IoU (Validation)
U-Net 1 0.9869 0.9645 0.9609 0.9416
U-Net 2 0.9828 0.9609 0.9520 0.9322
U-Net 3 0.9867 0.9648 0.9591 0.904
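For reference, the Dice coefficient and IoU reported in Table 2 can be computed as in the following NumPy sketch (not the authors' code):

```python
# Dice and IoU for binary masks (y_true, y_pred are arrays of 0s and 1s).
import numpy as np

def dice_coefficient(y_true, y_pred, eps=1e-7):
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

def iou(y_true, y_pred, eps=1e-7):
    intersection = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - intersection
    return (intersection + eps) / (union + eps)
```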

Fig. 3 shows an example of mask reconstruction for a specific type of image in the BIMCV-COVID19+ dataset. Despite U-Net 1 achieving a higher IoU value, some of its reconstructions are missing a lung portion, whereas U-Net 3 reconstructs the lungs better in most cases. U-Net 3 was therefore used for all subsequent processes.

Fig. 3. U-Net mask reconstructions for particular images from the BIMCV-COVID19+ dataset.

3.3. VGG19 covid

For each of the three variations, COVID-19 case prediction was implemented by selecting the better model between VGG16 and VGG19. In all cases, the model was trained for 30 epochs with a batch size of 64.

3.3.1. Experiment 1

Table 3 shows the results of part c, in which no segmentation is applied to the data. Meanwhile, Table 4 shows the results of part d, where lung segmentation is applied. Furthermore, Table 5 shows the results of part e, in which the segmentation masks were inverted and applied to the images. For all the tables above, the models used were VGG16 and VGG19.

Table 3.

Accuracy of part c models in Experiment 1.

Model Train Validation Test
VGG16 0.9883 0.8898 0.8274
VGG19 0.9863 0.9478 0.8996
Table 4.

Accuracy of part d models in Experiment 1.

Model Train Validation Test
VGG16 0.9835 0.9036 0.8767
VGG19 0.9628 0.9379 0.9113
Table 5.

Accuracy of part e models in Experiment 1.

Model Train Validation Test
VGG16 0.9872 0.9366 0.8983
VGG19 0.9954 0.9639 0.9538

Table 6 shows the accuracy, sensitivity, specificity, and F1 score for the COVID-19 label in parts c, d, and e with a threshold of 0.5.

Table 6.

Performance metrics of parts c, d, and e for the COVID-19 label in Experiment 1.

Part Accuracy Sensitivity Specificity F1 Score
c 0.939 0.972 0.883 0.965
d 0.933 0.968 0.871 0.961
e 0.956 0.967 0.917 0.969
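The threshold-0.5 metrics in Table 6 (and the analogous tables below) can be derived from the predicted probabilities as in this sketch:

```python
# Accuracy, sensitivity, specificity, and F1 for the positive (COVID-19) label.
import numpy as np

def covid_label_metrics(y_true, y_prob, threshold=0.5):
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, sensitivity, specificity, f1
```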

Fig. 4 shows the Receiver Operating Characteristic (ROC) curve of parts c, d, and e for the remaining thresholds, while Fig. 5 shows the precision–recall curve for the same parts in Experiment 1.

Fig. 4. The ROC curve of the COVID-19 test dataset in parts c, d and e, for Experiment 1.

Fig. 5. The Precision–Recall curve of the COVID-19 test dataset in parts c, d and e, for Experiment 1.
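These curves can be obtained from the test-set probabilities, for example with scikit-learn (a sketch, not the authors' plotting code):

```python
# ROC and Precision-Recall points for the COVID-19 label.
from sklearn.metrics import roc_curve, precision_recall_curve, auc

def curve_points(y_true, y_prob):
    fpr, tpr, _ = roc_curve(y_true, y_prob)
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    return (fpr, tpr, auc(fpr, tpr)), (precision, recall)
```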

Fig. 6, Fig. 7, and Fig. 8 show heatmaps of the last layer for some correctly predicted positive and negative cases in parts c, d, and e, respectively, for Experiment 1.

Fig. 6. Heatmaps of the last layer in some images for the part c experiment, for Experiment 1.

Fig. 7. Heatmaps of the last layer in some images for the part d experiment, for Experiment 1.

Fig. 8. Heatmaps of the last layer in some images for the part e experiment, for Experiment 1.
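The paper does not state how the last-layer heatmaps were generated; Grad-CAM is a common choice for this kind of visualization and is sketched below under that assumption (the layer name block5_conv4 corresponds to the last VGG19 convolution):

```python
# Grad-CAM sketch: class-activation heatmap from the last convolutional layer.
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name="block5_conv4"):
    """image: array of shape (1, 224, 224, C); returns a heatmap in [0, 1]."""
    grad_model = tf.keras.models.Model(
        model.input, [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, prediction = grad_model(image)
        score = prediction[:, 0]
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))           # channel importance
    heatmap = tf.reduce_sum(conv_out[0] * weights, axis=-1)
    heatmap = tf.maximum(heatmap, 0) / (tf.reduce_max(heatmap) + 1e-8)
    return heatmap.numpy()
```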

3.3.2. Experiment 2

Table 7 shows the results of part c, Table 8 the results of part d, and Table 9 the results of part e for Experiment 2. For all of them, the models used were VGG16 and VGG19.

Table 7.

Accuracy of part c models in Experiment 2.

Model Train Validation Test
VGG16 0.9886 0.8970 0.8739
VGG19 0.9743 0.9546 0.9241
Table 8.

Accuracy of part d models in Experiment 2.

Model Train Validation Test
VGG16 0.9774 0.8914 0.8705
VGG19 0.9669 0.9359 0.9344
Table 9.

Accuracy of part e models in Experiment 2.

Model Train Validation Test
VGG16 0.9774 0.8914 0.8705
VGG19 0.9669 0.9359 0.9344

Table 10 shows the accuracy, sensitivity, specificity, and F1 score for the COVID-19 label in parts c, d, and e with a threshold of 0.5 for Experiment 2.

Table 10.

Performance metrics of parts c, d, and e for the COVID-19 label in Experiment 2.

Part Accuracy Sensitivity Specificity F1 Score
c 0.942 0.982 0.858 0.973
d 0.963 0.951 0.913 0.964
e 0.952 0.987 0.882 0.978

Fig. 9 shows the ROC curve of parts c, d, and e for the remaining thresholds, while Fig. 10 shows the precision–recall curve for the same parts; both figures correspond to Experiment 2.

Fig. 9. The ROC curve of the COVID-19 test dataset in parts c, d and e, for Experiment 2.

Fig. 10. The Precision–Recall curve of the COVID-19 test dataset in parts c, d and e, for Experiment 2.

Fig. 11, Fig. 12, and Fig. 13 show heatmaps of the areas the model relies on for prediction in both classes, for parts c, d, and e, respectively, in Experiment 2.

Fig. 11. Heatmaps of the last layer in some images for the part c experiment, for Experiment 2.

Fig. 12. Heatmaps of the last layer in some images for the part d experiment, for Experiment 2.

Fig. 13. Heatmaps of the last layer in some images for the part e experiment, for Experiment 2.

3.3.3. Experiment 3

Table 11, Table 12, Table 13 show the results of parts c, d, and e, respectively, for models VGG16 and VGG19.

Table 11.

Accuracy of part c models in Experiment 3.

Model Train Validation Test
VGG16 0.9927 0.8385 0.8447
VGG19 0.9929 0.9645 0.8704
Table 12.

Accuracy of part d models in Experiment 3.

Model Train Validation Test
VGG16 0.9958 0.9376 0.9299
VGG19 0.9937 0.9449 0.9363
Table 13.

Accuracy of part e models in Experiment 3.

Model Train Validation Test
VGG16 0.9495 0.9388 0.9118
VGG19 0.9725 0.9690 0.9705

Table 14 presents the accuracy, sensitivity, specificity, and F1 score for the COVID-19 positive label in parts c, d, and e with a threshold of 0.5 for Experiment 3.

Table 14.

Performance metrics of parts c, d, and e for the COVID-19 label in Experiment 3.

Part Accuracy Sensitivity Specificity F1 Score
c 0.962 0.973 0.935 0.973
d 0.973 0.928 0.956 0.954
e 0.969 0.981 0.946 0.979

Fig. 14 shows the ROC curve of parts c, d, and e for the COVID-19 label over the remaining thresholds, while Fig. 15 shows the precision–recall curve for the same parts and label; both correspond to Experiment 3.

Fig. 14. The ROC curve of the COVID-19 test dataset in parts c, d and e, for Experiment 3.

Fig. 15. The Precision–Recall curve of the COVID-19 test dataset in parts c, d and e, for Experiment 3.

Fig. 16, Fig. 17, and Fig. 18 show heatmaps for correct predictions of COVID and No-COVID cases in parts c, d, and e, respectively, in Experiment 3.

Fig. 16. Heatmaps of the last layer in some images for the part c experiment, for Experiment 3.

Fig. 17. Heatmaps of the last layer in some images for the part d experiment, for Experiment 3.

Fig. 18. Heatmaps of the last layer in some images for the part e experiment, for Experiment 3.

3.4. Results of comparison dataset

Table 15 presents the accuracy, sensitivity, specificity, and F1 score for the COVID-19 positive label in parts c, d, and e with a threshold of 0.5 for the comparison with previous models.

Table 15.

Performance metrics of parts c, d, and e for the COVID-19 label on the comparison dataset.

Part Accuracy Sensitivity Specificity F1 Score
c 0.991 0.995 0.986 0.966
d 0.993 0.971 0.996 0.965
e 0.993 0.986 0.996 0.973

3.5. Hardware and software

To develop this project, we used Python 3.8.1. All models were designed with TensorFlow 2.2.0 using the Keras library. We used Google Colaboratory for most of the experiments; a Tensor Processing Unit (TPU) was used when possible, otherwise a Graphics Processing Unit (GPU), depending on the Colaboratory assignment. The RAM available in all instances was 12.72 GB. When Colaboratory was insufficient, we used a machine running Ubuntu 20.04 LTS with a GeForce RTX 2080 Ti GPU (11 GB of memory, 250 W), CUDA version 11.0, an AMD Ryzen 9 3950X 16-core processor, and 128 GB of RAM (4 modules of 32 GB at 2666 MHz).

3.6. Future works

For more accurate results, we identified two main opportunities for future work: first, semantic segmentation of pathologies in the lungs, especially the identification of Consolidation and Ground Glass Opacity; second, extending the COVID-19 datasets to generalize the information and overcome the problems of using different image sources.

4. Discussion

For the classification tasks proposed in this research, the best results were achieved using the VGG19 model. The first classification task was needed to filter the data, as mislabeled projections were a real problem within the datasets, and as their size increases over time, manual preprocessing becomes unmanageable. Beyond that, it is a powerful tool to prevent lateral chest X-ray images from being fed into the model's training process. It should be noted, however, that this classification does not prevent problems caused by images that are neither frontal nor lateral chest X-rays.

Following the order of the experiments, for Experiment 1 the test accuracies in Table 4 are better than those in Table 3, suggesting that segmentation helps; however, Table 5 shows even better accuracy. In that case the lungs are removed from the images, meaning the models use image characteristics other than lung pathologies for classification. As shown in Table 6, the COVID-19 positive label has higher accuracy in all parts; in general, these models tend to misclassify negative cases more often than positive ones. The ROC and precision–recall curves complement Table 3, Table 4, and Table 5 by showing that, with a different threshold, the models predict better, with part e consistently leading. The heatmaps reveal the regions the models mark: in part c (Fig. 6), the model uses almost any feature except the lungs for classification, making these models unsuitable for applications beyond this classification task. Meanwhile, part e (Fig. 8) commonly uses information in the immediate surroundings of the lungs and, for COVID-19 images, features in the removed lung areas. Finally, in part d, Fig. 7 shows that similar zones inside the lungs are identified for classification. In these experiments the negative dataset contains cases correlated with the positive one, so the model has difficulty recognizing those cases and may also mispredict images from outside the dataset. To solve these problems, the correlated cases were removed from the negative dataset, which leads to Experiment 2.

Experiment 2 shows better classification results than Experiment 1 in almost all parts, except part e. The accuracy for the COVID-19 positive label shows the same behavior as in the previous experiment, with the model misclassifying No-COVID patients more often. The ROC and precision–recall curves are also better than in the previous experiment. In this case, the heatmaps show that for COVID-19 images the model focuses mainly on lung information, although, as the top-left image in Fig. 11 shows, these features are not used in all cases. The part e results are similar to the previous experiment, with the model relying on the immediate surroundings and the removed lung zones. In contrast to Experiment 1, the part d heatmaps focus on a considerable portion of the lungs for prediction; for No-COVID images, the model focuses on areas outside the lungs because there is no relevant information inside, meaning that in these cases the model attends to lung pathologies.

Finally, Experiment 3 shows the effect of the Pre-COVID era data on the model; the first three tables show results that differ from Experiments 1 and 2 but follow the same tendency. Hence, segmentation enhances classification, but removing the lungs yields even better results, and the accuracy for the COVID-19 label is also better with lung-segmented images. On the other hand, the ROC and precision–recall curves show better outcomes for the part c and e experiments than for part d, meaning that with a different threshold part d could perform better. The heatmaps again provide valuable information: the part c model uses information both inside and outside the lungs, which represents noise-related bias when testing on other images. In Fig. 18, the model mainly uses information at the top-right of the pictures to classify COVID-19 cases, while for No-COVID cases it uses information near the lung area. Finally, Fig. 17 shows that information inside the lungs is taken into account for the COVID-19 class and information outside them for the opposite class.

In general, the experiments demonstrate how segmentation helps the model focus on relevant information. As lung characteristics and their distribution differ across datasets, the segmentation task provides information related to shape and size to parts d and e; hence, the mere fact of performing lung segmentation highlights details of the images that are relevant for better classification. Regarding part e, where the models show unexpectedly high results, this indicates how the information surrounding the lungs alters the outcome, predicting the correct label for an image without even having the data used to perform a medical diagnosis. It is worth mentioning that the lung segmentation is not always perfect, leaving small lung regions in the images in some cases. The segmentation model used for all experiments, U-Net 3, corresponds to the architecture presented in the original paper.

Finally, Table 16 shows the comparison of our development with other previous works.

Table 16.

Performance metrics of proposed method with other previous works using the comparison database.

Model Accuracy F1 score Recall Precision
Proposed Method 99.06 99.06 99.06 99.07
CoroNet (Khan et al., 2020) 99 98.5 99.3 98.3
VGG19 (Apostolopoulos & Mpesiana, 2020) 98.75 93.06 92.85 93.27

As shown, our results are better under similar conditions. However, it is worth noting that, because the Cohen dataset kept growing while the previous works were being developed, we had more images available; the same applies to the Normal and Pneumonia datasets, and the images were chosen randomly. Therefore, this is not an ideal comparison.

5. Conclusions

This approach shows how existing models can be helpful for multiple tasks, especially considering that the modified U-Net models did not perform better than the original. It also shows how image noise can bias the models. Most metrics suggest that images without segmentation are better for classifying COVID-19; however, further analysis shows that, even if those metrics are better, the clear evidence of COVID-19 lies in the pathologies visible across the lungs, so truly accurate models must focus on the lung regions for classification. In this case, segmentation is needed for reliable results because it reduces this bias. Transfer learning was vital for the results presented: classification models using this technique needed between 20 and 30 epochs to converge, while the segmentation models, trained without transfer learning, needed about 200. A series of models was presented to detect COVID-19 in chest X-ray images with a general accuracy of 92.72% when classifying COVID and No-COVID images; for the COVID-19 label only, the approach reaches 95.63% accuracy on the test dataset at a threshold of 0.5, and changing the threshold increases the accuracy of the models up to 98%.

The segmentation task most likely provides extra information to parts d and e in all experiments, improving the results by segmenting the lungs and adding shape information combined with the noise surrounding the lungs. This noise is associated with cables, capture devices, and the patient's age or gender, which makes the images without lungs carry more details for classification in these cases. However, any future application using models trained without the lungs would have a high chance of mislabeling images because of this noise bias. Further investigation is required to segment the pathologies identified by expert radiologists in order to rule out noise as a source of bias. It is also essential to highlight that the results presented do not necessarily translate to the same performance on all datasets; for example, the primary datasets come from European patients, and patients from other parts of the world may show slight changes in data capture or pathologies, so worldwide datasets are needed for better classification. In addition, separating the datasets by gender would provide more information on the model's scope, as the soft tissue of the breast may hide parts of the lungs, and it is unknown whether this constitutes a bias in the model's predictions.

CRediT authorship contribution statement

Daniel Arias-Garzón: Data curation, Investigation, Software, Validation, Visualization, Writing – original draft. Jesús Alejandro Alzate-Grisales: Data curation, Investigation, Software, Supervision. Simon Orozco-Arias: Project administration, Supervision. Harold Brayan Arteaga-Arteaga: Formal analysis, Writing – review & editing. Mario Alejandro Bravo-Ortiz: Formal analysis, Writing – review & editing. Alejandro Mora-Rubio: Formal analysis, Writing – review & editing. Jose Manuel Saborit-Torres: Data curation, Conceptualization. Joaquim Ángel Montell Serrano: Data curation, Conceptualization. Maria de la Iglesia Vayá: Conceptualization, Methodology, Project administration, Resources, Supervision. Reinel Tabares-Soto: Conceptualization, Project administration, Methodology.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by the Regional Ministry of Health of the Valencia Region, “The Medical Image Bank of the Valencian Community (BIMCV).” Part of the infrastructure used has been co-funded by the European Union through the Operational Program of the European Fund of Regional Development (FEDER) of the Valencian Community 2014-2020.

Funding

This work was supported by the Colombian Ministry of Science, Technology and Innovation (Minciencias) under identification code 0831-2020.

Footnotes

The code (and data) in this article has been certified as Reproducible by Code Ocean: (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.

References

  1. Aggarwal C.C. 2020. Neural networks and deep learning; pp. 351–352.
  2. Ai T., Yang Z., Hou H., Zhan C., Chen C., Lv W., Tao Q., Sun Z., Xia L. Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: A report of 1014 cases. Radiology. 2020;296(2):E32–E40. doi: 10.1148/radiol.2020200642.
  3. Apostolopoulos I.D., Mpesiana T.A. Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Physical and Engineering Sciences in Medicine. 2020;43(2):635–640. doi: 10.1007/s13246-020-00865-4. arXiv:2003.11617.
  4. Medical Imaging Databank of the Valencia region BIMCV. 2020. BIMCV-Covid19 – BIMCV. https://bimcv.cipf.es/bimcv-projects/bimcv-covid19/#1590859488150-148be708-c3f3.
  5. Bravo Ortíz M.A., Arteaga Arteaga H.B., Tabares Soto R., Padilla Buriticá J.I., Orozco-Arias S. Cervical cancer classification using convolutional neural networks, transfer learning and data augmentation. Revista EIA. 2021;18(35):1–12. doi: 10.24050/reia.v18i35.1462. https://revista.eia.edu.co/index.php/reveia/article/view/1462.
  6. Bustos A., Pertusa A., Salinas J.M., de la Iglesia-Vayá M. PadChest: A large chest x-ray image dataset with multi-label annotated reports. Medical Image Analysis. 2020;66. doi: 10.1016/j.media.2020.101797. arXiv:1901.07441.
  7. Civit-Masot J., Luna-Perejón F., Morales M.D., Civit A. Deep learning system for COVID-19 diagnosis aid using X-ray pulmonary images. Applied Sciences (Switzerland). 2020;10(13). doi: 10.3390/app10134640.
  8. Cohen J.P., Morrison P., Dao L. 2020. COVID-19 image data collection. ArXiv, arXiv:2003.11597.
  9. COVID-19 X rays. 2020. Kaggle. https://www.kaggle.com/andrewmvd/convid19-X-rays.
  10. Kermany D.S., Goldbaum M., Cai W., Lewis M.A., Xia H., Zhang K. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018;172. doi: 10.1016/j.cell.2018.02.010.
  11. Deng J., Dong W., Socher R., Li L.-J., Li K., Fei-Fei L. ImageNet: A large-scale hierarchical image database. CVPR09. 2020;20(11).
  12. Ducharme J. 2020. The WHO just declared coronavirus COVID-19 a pandemic — Time. https://time.com/5791661/who-coronavirus-pandemic-declaration/
  13. de Informática I.T. Early detection in chest images: report "In search for bias within the dataset". ITI. 2020.
  14. Jaeger S., Candemir S., Antani S., Wáng Y.-X.J., Lu P.-X., Thoma G. Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quantitative Imaging in Medicine and Surgery. 2020;4(6):475–477. doi: 10.3978/j.issn.2223-4292.2014.11.20.
  15. Jain G., Mittal D., Thakur D., Mittal M.K. A deep learning approach to detect Covid-19 coronavirus with X-Ray images. Biocybernetics and Biomedical Engineering. 2020;40(4):1391–1405. doi: 10.1016/j.bbe.2020.08.008.
  16. Kanne J.P., Little B.P., Chung J.H., Elicker B.M., Ketai L.H. Essentials for radiologists on COVID-19: An update—Radiology scientific expert panel. RSNA. 2020;78(May):1–15. doi: 10.1148/radiol.2020200527.
  17. Khan A.I., Shah J.L., Bhat M.M. CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest x-ray images. Computer Methods and Programs in Biomedicine. 2020;196. doi: 10.1016/j.cmpb.2020.105581. arXiv:2004.04931.
  18. Kong W., Agarwal P.P. Chest imaging appearance of COVID-19 infection. Radiology: Cardiothoracic Imaging. 2020;2(1). doi: 10.1148/ryct.2020200028.
  19. Minaee S., Kafieh R., Sonka M., Yazdani S., Jamalipour Soufi G. Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning. Medical Image Analysis. 2020;65. doi: 10.1016/j.media.2020.101794. arXiv:2004.09363.
  20. Ozturk T., Talo M., Yildirim E.A., Baloglu U.B., Yildirim O., Rajendra Acharya U. Automated detection of COVID-19 cases using deep neural networks with X-ray images. Computers in Biology and Medicine. 2020;121. doi: 10.1016/j.compbiomed.2020.103792.
  21. Panwar H., Gupta P.K., Siddiqui M.K., Morales-Menendez R., Singh V. Application of deep learning for fast detection of COVID-19 in X-Rays using nCOVnet. Chaos, Solitons & Fractals. 2020;138. doi: 10.1016/j.chaos.2020.109944.
  22. Ronneberger O., Fischer P., Brox T. U-net: Convolutional networks for biomedical image segmentation. In: Lecture Notes in Computer Science, Vol. 9351. 2020; pp. 234–241. arXiv:1505.04597. doi: 10.1007/978-3-319-24574-4_28.
  23. Sethy P.K., Behera S.K., Ratha P.K., Biswas P. Detection of coronavirus disease (COVID-19) based on deep features and support vector machine. International Journal of Mathematical, Engineering and Management Sciences. 2020;5(4):643–651. doi: 10.33889/IJMEMS.2020.5.4.052.
  24. Shiraishi J., Katsuragawa S., Ikezoe J., Matsumoto T., Kobayashi T., Komatsu K.I., Matsui M., Fujita H., Kodera Y., Doi K. Development of a digital image database for chest radiographs with and without a lung nodule: Receiver operating characteristic analysis of radiologists' detection of pulmonary nodules. American Journal of Roentgenology. 2020;174(1):71–74. doi: 10.2214/ajr.174.1.1740071.
  25. Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings. 2020; pp. 1–14. arXiv:1409.1556v6.
  26. Tang Y.B., Tang Y.X., Xiao J., Summers R.M. 2020. XLSor: A robust and accurate lung segmentor on chest x-rays using criss-cross attention and customized radiorealistic abnormalities generation; pp. 457–467. ArXiv.
  27. Vayá M.d.l.I., Saborit J.M., Montell J.A., Pertusa A., Bustos A., Cazorla M., Galant J., Barber X., Orozco-Beltrán D., García-García F., Caparrós M., González G., Salinas J.M. 2020. BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients; pp. 1–22. ArXiv, arXiv:2006.01174.
  28. Wang D., Hu B., Hu C., Zhu F., Liu X., Zhang J., Wang B., Xiang H., Cheng Z., Xiong Y., Zhao Y., Li Y., Wang X., Peng Z. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China. JAMA - Journal of the American Medical Association. 2020;323(11):1061–1069. doi: 10.1001/jama.2020.1585.
  29. World Health Organization. 2020. Coronavirus. https://www.who.int/health-topics/coronavirus#tab=tab_1.
  30. Worldometer. 2020. Coronavirus update (live): 55,912,871 cases and 1,342,598 deaths from COVID-19 virus pandemic - Worldometer. https://www.worldometers.info/coronavirus/
  31. Xu Y., Li X., Zhu B., Liang H., Fang C., Gong Y., Guo Q., Sun X., Zhao D., Shen J., Zhang H., Liu H., Xia H., Tang J., Zhang K., Gong S. Characteristics of pediatric SARS-CoV-2 infection and potential evidence for persistent fecal viral shedding. Nature Medicine. 2020;26(4):502–505. doi: 10.1038/s41591-020-0817-4.
  32. Yoo S.H., Geng H., Chiu T.L., Yu S.K., Cho D.C., Heo J., Choi M.S., Choi I.H., Cung Van C., Nhung N.V., Min B.J., Lee H. Deep learning-based decision-tree classifier for COVID-19 diagnosis from chest X-ray imaging. Frontiers in Medicine. 2020;7(July):1–8. doi: 10.3389/fmed.2020.00427.
  33. Zhang W. Imaging changes of severe COVID-19 pneumonia in advanced stage. Intensive Care Medicine. 2020;46(5):841–843. doi: 10.1007/s00134-020-05990-y.
