Convolutional Sparse Support Estimator-Based COVID-19 Recognition From X-Ray Images

Mehmet Yamac; Mete Ahishali; Aysen Degerli; Serkan Kiranyaz; Muhammad E H Chowdhury; Moncef Gabbouj

doi:10.1109/TNNLS.2021.3070467

2021 Apr 19;32(5):1810–1820. doi: 10.1109/TNNLS.2021.3070467

PMCID: PMC8544941 PMID: 33872157

Abstract

Coronavirus disease (COVID-19) has been the main agenda of the whole world ever since it came into sight. X-ray imaging is a common and easily accessible tool that has great potential for COVID-19 diagnosis and prognosis. Deep learning techniques can generally provide state-of-the-art performance in many classification tasks when trained properly over large data sets. However, data scarcity can be a crucial obstacle when using them for COVID-19 detection. Alternative approaches such as representation-based classification [collaborative or sparse representation (SR)] might provide satisfactory performance with limited-size data sets, but they generally fall short in performance or speed compared to the neural network (NN)-based methods. To address this deficiency, the convolution support estimation network (CSEN) has recently been proposed as a bridge between representation-based and NN approaches. It provides a noniterative real-time mapping from a query sample to the support of the ideally sparse representation coefficient vector, which is the critical information for the class decision in representation-based techniques. The main premises of this study can be summarized as follows: 1) A benchmark X-ray data set, namely QaTa-Cov19, containing over 6200 X-ray images is created. The data set covers 462 X-ray images from COVID-19 patients along with three other classes: bacterial pneumonia, viral pneumonia, and normal. 2) The proposed CSEN-based classification scheme, equipped with feature extraction from CheXNet, a state-of-the-art deep NN solution for X-ray images, achieves over 98% sensitivity and over 95% specificity for COVID-19 recognition directly from raw X-ray images, averaged over 5-fold cross validation on the QaTa-Cov19 data set. 3) With this level of COVID-19 assistive diagnosis performance, this study further provides evidence that COVID-19 induces a unique pattern in X-rays that can be discriminated with high accuracy.

Keywords: Coronavirus disease (COVID-19) recognition, representation-based classification, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus, transfer learning

I. Introduction

Coronavirus disease 2019 (COVID-19) was declared a pandemic by the World Health Organization (WHO) a few months after its first appearance. It has infected more than 70 million people, caused a few million casualties, and has so far paralyzed mobility all around the world. The spreading rate of COVID-19 is so high that the number of cases is expected to double every three days if social distancing is not strictly observed to slow this accretion [1]. Roughly half of the COVID-19 positive patients also exhibit a comorbidity [2], making it difficult to differentiate COVID-19 from other lung diseases. Automated and accurate COVID-19 diagnosis is critical both for saving lives and for preventing its rapid spread in the community. Currently, reverse transcription-polymerase chain reaction (RT-PCR) and computed tomography (CT) are the common diagnostic techniques. RT-PCR results are ready after 24 h at the earliest for critical cases and generally take several days to conclude a decision [3]. CT may be an alternative at initial presentation; however, it is expensive and not easily accessible [4]. The most common tool that medical experts use for both diagnosis and monitoring the course of the disease is X-ray imaging. Compared to an RT-PCR or CT test, acquiring an X-ray image is an extremely low-cost and fast process, usually taking only a few seconds. Recently, WHO reported that even RT-PCR may give false results in COVID-19 cases for several reasons, such as a poor-quality specimen from the patient, inappropriate processing of the specimen, or taking the specimen at an early or late stage of the disease [5]. For this reason, X-ray imaging has great potential as an alternative technological tool to be used along with the other tests for an accurate diagnosis.

In this study, we aim to differentiate X-ray images of COVID-19 patients from other classes: bacterial pneumonia, viral pneumonia, and normal. For this work, a benchmark COVID-19 X-ray data set, QaTa-Cov19 (Qatar University and Tampere University COVID-19 Data set), which contains 462 X-ray images from COVID-19 patients, was collected. The images in the data set differ in quality, resolution, and SNR level, as shown in Fig. 1. QaTa-Cov19 also contains many X-ray images from COVID-19 patients who are in the early stages; therefore, their X-ray images show mild or no signs of COVID-19 infection to the naked eye. Some sample images are shown in Fig. 2(b). Another fact that makes the diagnosis far more challenging is that the interclass similarity can be very high for many X-ray images, as the samples in Fig. 2(a) show. Against such high interclass similarities and intraclass variations, in this study, we aim for a high robustness level.

Fig. 1. — Sample COVID-19 X-ray images from QaTa-Cov19.

Fig. 2. — Sample QaTa-Cov19 X-ray images. (a) X-ray images from different classes. (b) X-ray images from the COVID-19 patients who are in the different stages.

In numerous classification tasks, deep learning techniques have been shown to achieve state-of-the-art performance in terms of both recognition accuracy and parallelizable computing structures, the latter playing an important role especially in real-time applications. Despite these advantages, achieving the desired performance level with a deep model usually requires proper training over a massive training data set. Unfortunately, this is infeasible for this problem since the available data is still rather limited.

An alternative supervised approach, which requires only a limited number of training samples to achieve satisfactory classification accuracy, is representation-based classification [6]–[8]. In representation-based classification systems, a dictionary is predefined whose columns consist of the training samples, stacked in such a way that a subset of them corresponds to each class. A test sample is expected to be a linear combination of all points from the same class as the test sample. Therefore, given a predefined dictionary matrix, $\mathbf{D}$, and a test sample, $\mathbf{y}$, we expect the solution $\hat{\mathbf{x}}$ from $\mathbf{y} = \mathbf{D}\mathbf{x}$ to carry enough information about the class of $\mathbf{y}$. Overall, in this study, we draw a convolutional support estimation network (CSEN) [9]-based solution pipeline, which fuses the representation-based classification scheme into a neural network (NN) body.

The rest of this article is organized as follows. In Section II, notations and mathematical preliminaries are given with emphasis on sparse representation (SR) and sparse support estimation (SE). Then in Section III, a literature review on deep learning models over X-ray images and representation-based classification is presented. The proposed CSEN-based COVID-19 recognition system is introduced in Section IV along with two recent alternative approaches that are used as the competing methods. The data collection is also explained in this section. Experimental setup and the main results are provided in Section V. Finally, Section VII concludes this article and suggests topics for future research.

II. Preliminaries and Mathematical Notations

A. Notations

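In this study, the $\ell_p$-norm of a vector $\mathbf{x} \in \mathbb{R}^n$ is defined as $\|\mathbf{x}\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$ for $p \geq 1$. On the other hand, the $\ell_0$-norm of the vector is defined as $\|\mathbf{x}\|_0 = \lim_{p \to 0} \sum_{i=1}^{n} |x_i|^p = \#\{j : x_j \neq 0\}$ and the $\ell_\infty$-norm is defined as $\|\mathbf{x}\|_\infty = \max_i |x_i|$. A signal $\mathbf{x}$ is called strictly $k$-sparse if $\|\mathbf{x}\|_0 \leq k$. The sparse support set, or simply support set, $\Lambda \subset \{1, 2, \ldots, n\}$ of a sparse signal $\mathbf{x}$ can be defined as the set of locations of its nonzero coefficients, i.e., $\Lambda := \{i : x_i \neq 0\}$.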
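The definitions above can be checked numerically with a minimal sketch. The function names below are illustrative only and are not part of the original work; indices are 1-based to match the text.

```python
def lp_norm(x, p):
    """l_p norm for p >= 1: (sum_i |x_i|^p)^(1/p)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def l0_norm(x):
    """Number of nonzero entries of x."""
    return sum(1 for v in x if v != 0)

def linf_norm(x):
    """Largest absolute entry of x."""
    return max(abs(v) for v in x)

def support_set(x):
    """Locations (1-based) of the nonzero coefficients of x."""
    return {i + 1 for i, v in enumerate(x) if v != 0}

x = [0.0, 3.0, 0.0, -4.0, 0.0]
print(lp_norm(x, 1), lp_norm(x, 2), l0_norm(x), linf_norm(x), support_set(x))
```

Here `x` is strictly 2-sparse, and its support set contains the second and fourth locations.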

B. Sparse Signal Representation

SR of a signal $\mathbf{s} \in \mathbb{R}^d$ in a predefined set of waveforms, $\mathbf{\Phi} \in \mathbb{R}^{d \times n}$, can be defined as representing $\mathbf{s}$ as a linear combination of only a small subset of atoms in the dictionary $\mathbf{\Phi}$, i.e., $\mathbf{s} = \mathbf{\Phi}\mathbf{x}$. Defining these sets, which dates back to Fourier’s pioneering work [10], has been extensively studied in the literature. In the early approaches, these sets of waveforms were selected as a collection of linearly independent and generally orthogonal waveforms (which are called a complete dictionary or basis, i.e., $d = n$) such as the Fourier transform, DCT, and wavelet transform, until the pioneering work of Mallat [11] on overcomplete dictionaries ($n > d$). In the last decade, interest in SR research has increased tremendously. Its wide range of applications includes denoising [12], classification [13], anomaly detection [14], [15], deep learning [16], and compressive sensing (CS) [17], [18].

With a possible dimensional reduction that can be achieved via a compression matrix $\mathbf{A} \in \mathbb{R}^{m \times d}$ ($m < d$), the sample $\mathbf{y}$ can be obtained from

$$\mathbf{y} = \mathbf{A}\mathbf{s} = \mathbf{A}\mathbf{\Phi}\mathbf{x} = \mathbf{D}\mathbf{x} \tag{1}$$

where $\mathbf{D} \in \mathbb{R}^{m \times n}$ can be called the equivalent dictionary. Because (1) describes an underdetermined system of linear equations, finding the representation coefficient vector $\mathbf{x}$ requires at least one more constraint to have a unique solution. Using the prior information about sparsity, the following representation:

$$\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{subject to} \quad \mathbf{D}\mathbf{x} = \mathbf{y} \tag{2}$$

which is also an SR of $\mathbf{y}$, has a unique solution provided that $\mathbf{x}$ is strictly sparse and $\mathbf{D}$ satisfies some required properties [19]. For instance, if $\|\mathbf{x}\|_0 = k$, the minimum number of linearly dependent columns of $\mathbf{D}$, $\operatorname{spark}(\mathbf{D})$, should be greater than $2k$, i.e., $\operatorname{spark}(\mathbf{D}) > 2k$, in order not to have $\mathbf{D}\mathbf{x}' = \mathbf{D}\mathbf{x}''$ for distinct $k$-sparse signals $\mathbf{x}'$ and $\mathbf{x}''$ [19]. However, the optimization problem in (2) is NP-hard. Fortunately, the following relaxation:

$$\min_{\mathbf{x}} \|\mathbf{x}\|_1 \quad \text{subject to} \quad \mathbf{D}\mathbf{x} = \mathbf{y} \tag{3}$$

produces exactly the same solution as that of (2) provided that $\mathbf{D}$ obeys some criteria: the equivalence of the $\ell_0$–$\ell_1$ minimization problems can be guaranteed when $\mathbf{D}$ satisfies a notion of the null space property (NSP) [20], [21], not only for exactly sparse signals but also for approximately sparse signals. Furthermore, the query sample $\mathbf{y}$ can be corrupted with an additive noise pattern. In this case, the equality constraint in (3) can be further relaxed, as in basis pursuit denoising (BPDN) [22]: $\min_{\mathbf{x}} \|\mathbf{x}\|_1$ subject to $\|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2 \leq \epsilon$, where $\epsilon$ is a small constant that depends on the noise level. In this case, a stronger property known as the restricted isometry property (RIP) [23], [24] is frequently used, which covers the conditions for both exact recovery with basis pursuit and stable recovery with BPDN, e.g., exact recovery of $\mathbf{x}$ from (3) is possible when $\mathbf{D}$ has the RIP and $m > k \log(n/k)$.

We may refer to the sparse SE problem as finding the index set, $\Lambda$, of the nonzero elements of $\mathbf{x}$ [25], [26]. Indeed, in many applications, SE can be more important than finding the magnitude and sign of $\mathbf{x}$ as well as $\Lambda$, which refers to sparse signal recovery (SSR) via a recovery technique such as (3). For example, in a sparse representation-based classification (SRC) system, a query sample $\mathbf{y}$ can be represented with a sparse coefficient vector, $\mathbf{x}$, in the dictionary, $\mathbf{D}$, in such a way that when we recover this representation coefficient from $\mathbf{y} = \mathbf{D}\mathbf{x}$, the solution vector $\hat{\mathbf{x}}$ is expected to have a significant number of nonzero coefficients coming from the particular locations corresponding to the class of $\mathbf{y}$.
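To make the "recover, then threshold" route to support estimation concrete, the sketch below solves an $\ell_1$-regularized least-squares problem with iterative soft thresholding (ISTA) and then thresholds the result. ISTA is a swapped-in, generic solver chosen for brevity, not the recovery tool used in the paper; the dictionary is random, and `lam`, `n_iter`, and `tau` are illustrative values.

```python
import numpy as np

def ista(D, y, lam=0.05, n_iter=1000):
    """Iterative soft thresholding for min_x 0.5*||y - Dx||_2^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2        # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))        # gradient step on the data term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return x

def estimate_support(x_hat, tau):
    """Indices (0-based) whose recovered magnitude exceeds tau."""
    return set(np.flatnonzero(np.abs(x_hat) > tau).tolist())

rng = np.random.default_rng(0)
m, n = 60, 100
D = rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)                    # unit-norm atoms

true_support = {5, 40, 77}
x_true = np.zeros(n)
x_true[[5, 40, 77]] = [1.5, -2.0, 1.0]
y = D @ x_true                                    # noiseless measurement

x_hat = ista(D, y)
est_support = estimate_support(x_hat, tau=0.3)
print(est_support == true_support)
```

The recovered magnitudes are slightly biased by the $\ell_1$ penalty, which illustrates why the support can be easier to estimate reliably than the exact coefficient values.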

Readers are referred to [9] for a more detailed literature review on SE and its applications. In the sequel, we briefly summarize the building blocks of the proposed approach.

III. Background and Prior Art

A. CheXNet

In the proposed approach, we first use the pretrained deep network, CheXNet, to extract discriminative features from raw X-ray images. CheXNet was developed for pneumonia detection from chest X-ray images [27]. In [27], it was claimed that CheXNet can perform even better than expert radiologists in the pneumonia detection problem. This deep NN design is based on the previously proposed DenseNet [28], which consists of 121 layers. It is first pretrained on the ImageNet data set [29] and then fine-tuned via transfer learning on 112,120 frontal-view chest X-ray images from the ChestX-ray14 data set [30].

B. Representation-Based Classification

Consider that we are given a test sample $\mathbf{y}$, which represents either the extracted features, $\mathbf{s}$, or their dimensionally reduced version, i.e., $\mathbf{y} = \mathbf{A}\mathbf{s}$. In developing the dictionary, training samples are stacked at particular locations in such a way that the optimal support for a given query should be the set of all points coming from the same class as $\mathbf{y}$. Therefore, a solution vector, $\hat{\mathbf{x}}$, of $\mathbf{y} = \mathbf{D}\mathbf{x}$ is supposed to carry enough information, i.e., the sparse support should be the set of location indices of the training samples from the same class as $\mathbf{y}$. This strategy is generally known as representation-based classification. However, a typical solution $\hat{\mathbf{x}}$ of $\mathbf{y} = \mathbf{D}\mathbf{x}$ is not necessarily a sparse one, especially when the dictionary grows with more training samples, which results in a highly underdetermined system of linear equations. Fortunately, if one estimates the representation coefficient vector with a sparse recovery design such as $\ell_1$-minimization as in (3), we can expect that the important nonzero entries of the solution, $\hat{\mathbf{x}}$, are grouped in the particular locations that correspond to the training samples from the same class as $\mathbf{y}$. This is a typical example of the scenarios where SE can be more valuable than the recovery of magnitudes and signs, as explained in Section II-B.

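For instance, Wright et al. [8] proposed a systematic way of determining the identity of face images using $\ell_1$-minimization. The authors develop a three-step classification technique that includes: (i) normalization of all the atoms in $\mathbf{D}$ and of $\mathbf{y}$ to have unit $\ell_2$-norm; (ii) estimating the representation coefficient vector via sparse recovery, i.e., $\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_1$ subject to $\|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2 \leq \epsilon$; and (iii) finding the residuals corresponding to each class via $e_i = \|\mathbf{y} - \mathbf{D}_i \hat{\mathbf{x}}_i\|_2$, where $\hat{\mathbf{x}}_i$ is the group of the estimated coefficients, $\hat{\mathbf{x}}$, that correspond to class $i$ and $\mathbf{D}_i$ contains the atoms of that class. The query is then assigned to the class with the smallest residual.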

This technique, which is known as SRC, and its variants have been applied to a wide range of applications in the literature [31], [32], e.g., human action recognition [33] and hyperspectral image classification [34], to name a few. Despite the good recognition accuracy of SRC systems, their main drawback is that their sparse recovery algorithms (e.g., $\ell_1$-minimization) are iterative and computationally costly, rendering them infeasible in real-time applications. Later, the authors of [6] introduced collaborative representation-based classification (CRC), which is similar to SRC except for the use of traditional $\ell_2$-minimization in the second step: $\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \left( \|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2^2 + \lambda \|\mathbf{x}\|_2^2 \right)$. Thus, CRC does not require an iterative solution to obtain the representation coefficients because $\ell_2$-minimization has a closed-form solution, $\hat{\mathbf{x}} = \left(\mathbf{D}^T\mathbf{D} + \lambda \mathbf{I}\right)^{-1}\mathbf{D}^T\mathbf{y}$. Although the sparsity of $\hat{\mathbf{x}}$ cannot be guaranteed, CRC has often been reported to achieve comparable classification performance, especially on small-size training data sets.
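The CRC decision rule (closed-form regularized least squares followed by class-wise residuals) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' MATLAB implementation; the function name, the toy dictionary, and the value of `lam` are assumptions.

```python
import numpy as np

def crc_classify(D, labels, y, lam=0.1):
    """Collaborative representation: ridge solution, then smallest class residual."""
    D = D / np.linalg.norm(D, axis=0)              # unit l2-norm atoms
    y = y / np.linalg.norm(y)                      # unit l2-norm query
    n = D.shape[1]
    # Closed-form solution x_hat = (D^T D + lam I)^{-1} D^T y
    x_hat = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(y - D[:, labels == c] @ x_hat[labels == c])
        for c in classes
    ])
    return int(classes[np.argmin(residuals)]), residuals

# Toy dictionary: class 0 atoms cluster around e_0, class 1 atoms around e_1.
rng = np.random.default_rng(0)
d, per_class = 20, 5
e0, e1 = np.eye(d)[0], np.eye(d)[1]
atoms0 = e0[:, None] + 0.05 * rng.standard_normal((d, per_class))
atoms1 = e1[:, None] + 0.05 * rng.standard_normal((d, per_class))
D = np.hstack([atoms0, atoms1])
labels = np.array([0] * per_class + [1] * per_class)

query = e1 + 0.05 * rng.standard_normal(d)        # a sample resembling class 1
predicted, residuals = crc_classify(D, labels, query)
print(predicted, residuals)
```

Because the matrix $(\mathbf{D}^T\mathbf{D} + \lambda\mathbf{I})^{-1}\mathbf{D}^T$ does not depend on the query, it can be precomputed once and reused for every test sample, which is what makes CRC noniterative.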

IV. Proposed Approach

For a computer-aided COVID-19 recognition system design, our primary objective is to achieve the highest sensitivity possible in the diagnosis of COVID-19 induced pneumonia with an acceptable false-alarm rate (e.g., specificity $> 95\%$). In particular, the misdiagnosis of a COVID-19 X-ray image as a normal case, i.e., a false negative (FN), should be minimized, whilst a small number of false positives (FPs) is tolerable.
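For reference, sensitivity and specificity for the COVID-19 class can be computed from predictions in a one-versus-rest fashion, as in the sketch below. The label strings and the toy predictions are made up for illustration.

```python
def sensitivity_specificity(y_true, y_pred, positive="covid"):
    """One-versus-rest sensitivity (TP rate) and specificity (TN rate)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1          # COVID-19 case missed: the costly error
        else:
            if p == positive:
                fp += 1          # false alarm: tolerable in small numbers
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

y_true = ["covid", "covid", "normal", "viral", "bacterial", "covid"]
y_pred = ["covid", "normal", "normal", "covid", "bacterial", "covid"]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```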

Our interest in representation-based classification methods is that they perform well in classification tasks even in cases where training data is scarce. As mentioned, the two well-known representation-based classification methodologies are SRC [7] and CRC [6]. Among them, SRC provides slightly improved accuracy by solving an SR problem, i.e., producing a sparse solution $\hat{\mathbf{x}}$ from $\mathbf{y} = \mathbf{D}\mathbf{x}$. Then, the location of the nonzero elements of $\hat{\mathbf{x}}$, which is also known as the support set, provides the class information of the query $\mathbf{y}$. Despite the improved recognition accuracy, SRC solutions are iterative and can be computationally demanding compared to CRC. In a recent work [9], a compact NN design that can be considered as a bridge between NN-based and representation-based methodologies was proposed. The so-called CSEN uses a predefined dictionary and learns a direct mapping using a moderate/low size training set, which maps query samples, $\mathbf{y}$, directly to the support set of the representation coefficients, $\mathbf{x}$ (as it should be purely sparse in the ideal case).

In this study, to address the data scarcity limitations in COVID-19 diagnosis from X-ray images, we propose a CSEN-based approach. Since one of the largest sets of COVID-19 X-ray images compiled so far is used in this study, the proposed approach can be evaluated rigorously against a high level of diversity to obtain a reliable analysis. The general pipeline of the proposed CSEN-based recognition scheme is illustrated in Fig. 3. In order to obtain highly discriminative features, we use the recently proposed CheXNet [27], which is a fine-tuned version of the 121-layer Dense Convolutional Network (DenseNet-121) [28] trained on over 100,000 frontal-view X-ray images from 14 classes. Having the pretrained CheXNet for feature extraction, we develop two different strategies to obtain the classes of query X-ray images: 1) using CRC with proper preprocessing; 2) a slightly modified version of our recently proposed convolution support estimator network (CSEN) models. In the sequel, both techniques will be explained in detail, as well as alternative solutions.

Fig. 3. — Proposed approach for COVID-19 recognition from X-ray images. The proposed convolution support estimator network (CSEN) can be trained from a moderate-size training set. The pipeline employs the pretrained deep NN for feature extraction. $\mathbf{A}$ is the dimensional reduction (PCA) matrix; the coarse estimation of the representation coefficient (sparse in the ideal case), $\tilde{\mathbf{x}}$, is obtained via the denoiser matrix, $\mathbf{B} = \left(\mathbf{D}^T\mathbf{D} + \lambda \mathbf{I}\right)^{-1}\mathbf{D}^T$, where $\mathbf{D} = \mathbf{A}\mathbf{\Phi}$ and $\mathbf{\Phi}$ is the predefined dictionary matrix of training samples (before dimensional reduction).

A. Benchmark Data Set: QaTa-Cov19

Several recent works [35]–[38] have been proposed for COVID-19 detection/classification from X-ray images. However, they use rather small data sets (the largest containing only a few hundred X-ray images), with only a few COVID-19 samples. This makes it difficult to generalize their results in practice. To address this deficiency and provide reliable results, in this study the researchers of Qatar University and Tampere University have compiled a benchmark COVID-19 data set, called QaTa-Cov19. Compared to the earlier benchmark data sets created in this domain, such as COVID Chestxray Data set [39] or COVID-19 DATA SET [40], QaTa-Cov19 has the following unique benchmarking properties. First, it is a larger data set, not only in terms of the number of images (more than 6200 images) but also in its versatility, i.e., QaTa-Cov19 contains additional major pneumonia categories, such as viral and bacterial, along with the control (normal) class. Moreover, this is a diverse data set encapsulating X-ray images from several countries (e.g., Italy, Spain, China, etc.) produced by different X-ray machines.

COVID-19 chest X-ray images were gathered from different publicly available but scattered image sources. The major sources of COVID-19 images are the Italian Society of Medical and Interventional Radiology (SIRM) COVID-19 Database [40], Radiopaedia [41], Chest Imaging (Spain) at thread reader [42], and online articles and news portals [43]. The authors have carried out the task of collecting and indexing the X-ray images for COVID-19 positive cases reported in the published and preprint articles from China, South Korea, USA, Taiwan, Spain, and Italy, as well as online news portals (up to 20th April 2020). Therefore, these X-ray images represent different age groups, genders, ethnicities, and countries. The COVID-19 negative cases are normal, viral pneumonia, and bacterial pneumonia chest X-ray images collected from the Kaggle chest X-ray database, which contains 5863 chest X-ray images of normal, viral, and bacterial pneumonia with varying resolutions [44]. Out of these 5863 chest X-ray images, 1583 images are normal images and the remaining are bacterial and viral pneumonia images. Sample X-ray images from the QaTa-Cov19 data set are shown in Fig. 4.

Fig. 4. — Samples from the benchmark QaTa-Cov19 data set.

B. Feature Extraction

With their outstanding performance in image classification along with other inference tasks, deep NNs became a dominant paradigm. However, these techniques usually necessitate a large number of training samples (e.g., several hundred thousand to millions, depending on the network size) to achieve an adequate generalization capability. Nevertheless, we can still leverage their power by finding properly pretrained models for similar problems. To this end, we use a state-of-the-art pneumonia detection network, CheXNet, whose details are summarized in Section III-A. With the pretrained model, we extract 1024-long vectors right after the last average pooling layer. After data normalization (zero mean and unit variance), we obtain a feature vector $\mathbf{s} \in \mathbb{R}^{d}$ with $d = 1024$.

Dimensionality reduction via PCA is applied to $\mathbf{s}$ in order to get the query sample, $\mathbf{y} = \mathbf{A}\mathbf{s}$, where $\mathbf{A} \in \mathbb{R}^{m \times d}$ is the PCA matrix ($m < d$).
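The normalization and PCA projection steps can be sketched with NumPy as below. Random vectors stand in for the 1024-long CheXNet features, and the reduced dimension `m=64` is an arbitrary illustrative value, not the one used in the experiments.

```python
import numpy as np

def fit_standardizer(F):
    """Per-feature mean and standard deviation estimated on the training set."""
    mu = F.mean(axis=0)
    sigma = F.std(axis=0)
    sigma[sigma == 0] = 1.0                      # guard against constant features
    return mu, sigma

def fit_pca_matrix(S, m):
    """PCA matrix A (m x d): the top-m principal directions of the rows of S."""
    S_centered = S - S.mean(axis=0)
    _, _, Vt = np.linalg.svd(S_centered, full_matrices=False)
    return Vt[:m]

rng = np.random.default_rng(0)
F_train = 3.0 * rng.standard_normal((200, 1024)) + 1.0   # stand-in for extracted features

mu, sigma = fit_standardizer(F_train)
S_train = (F_train - mu) / sigma                 # zero mean, unit variance
A = fit_pca_matrix(S_train, m=64)

s = (F_train[0] - mu) / sigma                    # one normalized feature vector
y = A @ s                                        # query sample y = A s
print(A.shape, y.shape)
```

The statistics `mu`, `sigma`, and the matrix `A` would be fitted on the training fold only and then reused unchanged for test samples.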

C. Proposed CSEN-Based Classification

Considering the limited number of training data in our COVID-19 data set, a representation-based classification can be applied hereafter to obtain the class of $\mathbf{y}$ using the dictionary $\mathbf{\Phi}$ (in the form of $\mathbf{D} = \mathbf{A}\mathbf{\Phi}$), whose columns are stacked training samples with class-specific locations.

As discussed earlier, SRC is an SE problem, which is expected to be an easier task than an SSR problem. Even if exact signal recovery is not possible in noisy cases or in cases where $\mathbf{x}$ is not exactly but approximately sparse (which is the case almost all the time in dictionary-based classification problems), it is still possible to recover the support set exactly [25], [38], [45], [46] or partially [46]–[48]. However, many works in the literature dealing with SE problems tend to first apply a sparse recovery technique on $\mathbf{y} = \mathbf{D}\mathbf{x}$ to get $\hat{\mathbf{x}}$, and then use simple thresholding over $\hat{\mathbf{x}}$ to obtain a sparse SE, $\hat{\Lambda}$. SSR techniques such as $\ell_1$-minimization are rather slow, and their performance varies from one SSR tool to another [9]. In our previous work [9], we proposed an alternative to this iterative sparse recovery approach, which aims to learn a direct mapping from a test sample $\mathbf{y}$ to the corresponding support set $\Lambda$. Along with the speed and stability compared to conventional SSR-based techniques and recent deep learning-based SSR solutions, CSEN has the crucial advantage of having a compact design that can achieve a good performance level even over scarce training data.

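Mathematically speaking, an ideal CSEN is supposed to yield a binary mask $\mathbf{v} \in \{0, 1\}^n$

$$v_i = \begin{cases} 1, & i \in \Lambda \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

which indicates the true support, i.e., $\Lambda = \{i \in \{1, 2, \ldots, n\} : v_i = 1\}$. In order to approximate this ideal case, a CSEN network, $\mathcal{P}(\mathbf{y}, \mathbf{D})$, produces a probability vector $\mathbf{p}$, which returns a measure of the probability of each index being in $\Lambda$ such that $p_i \in [0, 1]$. Having the estimated probability map, the support can easily be estimated via $\hat{\Lambda} = \{i \in \{1, 2, \ldots, n\} : p_i > \tau\}$, i.e., by thresholding $\mathbf{p}$ with $\tau$, where $\tau$ is a fixed threshold.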
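The thresholding step can be written in a few lines. The probability values and the threshold of 0.5 below are made up for illustration; indices are 1-based to match the text.

```python
def estimate_support_from_probs(p, tau=0.5):
    """Indices (1-based) whose estimated probability exceeds the threshold tau."""
    return {i + 1 for i, p_i in enumerate(p) if p_i > tau}

def support_to_mask(support, n):
    """Binary mask of length n with ones at the support locations."""
    return [1 if i in support else 0 for i in range(1, n + 1)]

p = [0.05, 0.92, 0.40, 0.71, 0.10]      # hypothetical network output
lambda_hat = estimate_support_from_probs(p, tau=0.5)
v_hat = support_to_mask(lambda_hat, len(p))
print(lambda_hat, v_hat)
```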

A CSEN is composed of fully convolutional layers, and as input it takes a proxy, $\tilde{\mathbf{x}}$, of the sparse coefficient vector, which is a coarse estimation of $\mathbf{x}$, i.e., $\tilde{\mathbf{x}} = \left(\mathbf{D}^T\mathbf{D} + \lambda \mathbf{I}\right)^{-1}\mathbf{D}^T\mathbf{y}$ or simply $\tilde{\mathbf{x}} = \mathbf{D}^T\mathbf{y}$. Then, it yields the aforementioned probability-like vector $\mathbf{p}$ via fully convolutional layers. Using such a proxy of $\mathbf{x}$, instead of making inference directly on $\mathbf{y}$, has also been studied in a few more recent studies. For instance, in [49] and [50], the authors proposed reconstruction-free image classification from compressively sensed images. Alternatively, one may design a network to learn the proxy $\tilde{\mathbf{x}}$ with fully connected dense layers [49]. However, this increases the computational complexity and may even result in over-fitting with scarce training data [9].

The input vector $\tilde{\mathbf{x}}$ is reshaped to have a 2-D plane representation in order to use it with 2-D convolutional layers. This transformation is performed by reordering the indices of the atoms in $\mathbf{D}$ in such a way that the nonzero elements of the representation vector for a specific class come together in the 2-D plane. A representative illustration of the proposed dictionary design compared to the traditional one is shown in Fig. 5.
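One possible way to realize such a reordering is sketched below. It assumes, purely for illustration, four classes with 625 atoms each, stored class by class, and places each class in its own 25 × 25 quadrant of a 50 × 50 plane; the actual layout is the one shown in Fig. 5 and may differ.

```python
import numpy as np

def to_plane(x_tilde, grid=(2, 2), block=(25, 25)):
    """Map a class-ordered coefficient vector to a 2-D plane of class blocks."""
    gr, gc = grid
    br, bc = block
    assert x_tilde.size == gr * gc * br * bc
    # (class row, class col, row in block, col in block) -> (plane row, plane col)
    return (x_tilde.reshape(gr, gc, br, bc)
                   .transpose(0, 2, 1, 3)
                   .reshape(gr * br, gc * bc))

x_tilde = np.zeros(4 * 625)
x_tilde[2 * 625:3 * 625] = 1.0          # coefficients of class index 2 only
X = to_plane(x_tilde)
print(X.shape, X[25:, :25].sum())
```

With this layout, the coefficients of class index 2 fill the lower-left quadrant, so a 2-D convolution sees each class as a contiguous spatial region.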

Fig. 5. — Illustration of proposed dictionary design versus conventional design in representation-based classifiers.

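Hereafter, the proxy $\tilde{\mathbf{x}}$ is convolved with the weight kernels $\{\mathbf{w}_i^1\}_{i=1}^{N_1}$, connecting the input with the next layer with $N_1$ filters to yield the inputs of the next layer, with the biases $\{b_i^1\}_{i=1}^{N_1}$, as follows:

$$\mathbf{f}^1 = \left\{ S\left(\operatorname{ReLU}\left(b_i^1 + \mathbf{w}_i^1 * \tilde{\mathbf{x}}\right)\right) \right\}_{i=1}^{N_1} \tag{5}$$

where $b_i^1$ is the bias, $S(\cdot)$ is either the identity or a subsampling operator predefined according to the network structure, and $\operatorname{ReLU}(x) = \max(0, x)$. For the other layers, i.e., $l > 1$, the $k$th feature map of layer $l$ is defined as

$$\mathbf{f}_k^l = S\left(\operatorname{ReLU}\left(b_k^l + \sum_{i=1}^{N_{l-1}} \operatorname{conv2D}\left(\mathbf{w}_{ik}^l, \mathbf{f}_i^{l-1}\right)\right)\right) \tag{6}$$

where $S(\cdot)$ is either the identity operator or one of the down- and up-sampling operations, and $N_l$ is the number of feature maps in the $l$th layer. Therefore, the trainable parameters of CSEN will be $\Theta_{\text{CSEN}} = \left\{ \{\mathbf{w}_i^1, b_i^1\}_{i=1}^{N_1}, \{\mathbf{w}_{ik}^2, b_k^2\}, \ldots, \{\mathbf{w}_{ik}^L, b_k^L\} \right\}$ for an $L$-layer CSEN design.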

In developing the dictionary that is to be used in SRC, the training samples are stacked by grouping them according to their classes. Thus, instead of using the traditional $\ell_1$-minimization formulation as in (3), the following group $\ell_1$-minimization formulation may result in increased classification accuracy:

$$\min_{\mathbf{x}} \left\{ \|\mathbf{D}\mathbf{x} - \mathbf{y}\|_2^2 + \lambda \sum_{i=1}^{c} \|\mathbf{x}_{G_i}\|_2 \right\} \tag{7}$$

where $\mathbf{x}_{G_i}$ is the group of coefficients from the $i$th class and $c$ is the number of classes. In this manner, one possible cost function for an SE network would be

$$E(\mathbf{x}) = \sum_{p} \left( P_\Theta(\tilde{\mathbf{x}})_p - v_p \right)^2 + \lambda \sum_{i=1}^{c} \left\| P_\Theta(\tilde{\mathbf{x}})_{G_i} \right\|_2 \tag{8}$$

where $P_\Theta(\tilde{\mathbf{x}})_p$ is the network output at location $p$ and $v_p$ is the ground-truth binary mask of the sparse code $\mathbf{x}$. Due to its high computational complexity, we approximate the cost function in (8) with a simpler average pooling layer after the convolutional layers, which can directly produce the estimated class in our CSEN design. An illustration of the proposed CSEN-based COVID-19 recognition is shown in Fig. 3.
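The idea of replacing the group-wise cost with average pooling can be illustrated as follows: the output map is averaged over each class region, and the class with the largest average is selected. The sketch assumes, for illustration only, a 50 × 50 output map divided into a 2 × 2 grid of class blocks; it shows the pooling step alone, not the trained network.

```python
import numpy as np

def class_scores(P, grid=(2, 2)):
    """Average-pool a 2-D probability map over a grid of equally sized class blocks."""
    gr, gc = grid
    br, bc = P.shape[0] // gr, P.shape[1] // gc
    pooled = P.reshape(gr, br, gc, bc).mean(axis=(1, 3))   # one score per block
    return pooled.ravel()                                  # class index = row * gc + col

P = np.full((50, 50), 0.02)           # hypothetical network output map
P[:25, 25:] = 0.9                     # strong response in the block of class index 1
scores = class_scores(P)
predicted_class = int(np.argmax(scores))
print(scores, predicted_class)
```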

D. Competing Methods

This section summarizes the competing methods that are selected among numerous alternatives due to their superior performance levels obtained in similar problems. For fair comparative evaluations, all classification methods have the same input feature vectors fed to the proposed CSENs.

1). Collaborative Representation-Based Classification:

As a possible competing technique to the proposed CSEN-based technique, which is a hybrid method, CRC [6] is a direct representation-based classification method that can be applied to this problem, as shown in Fig. 6. It is a noniterative SE technique that is faster than SRC with comparable classification performance, while being more stable than existing iterative sparse recovery tools, as shown in [9]. In the first step of CRC, the tradeoff parameter $\lambda$ of the regularized least-squares solution is selected by a grid search over a log scale.

Fig. 6. — Baseline Approach I: CRC is fed by deep learning-based extracted features that are preprocessed.

2). Multilayer Perceptron (MLP) Classification:

The proposed COVID-19 recognition pipeline can be modified by replacing the CSEN or CRC part with another classifier. As one of the most common classifiers, a multilayer perceptron (MLP) with four hidden layers is used for this problem, as shown in Fig. 7. For training, we used back-propagation (BP) with the Adam optimization technique [51], specified by a learning rate $\alpha$ and moment-update coefficients $\beta_1$ and $\beta_2$, for 50 epochs. Fig. 8 illustrates the network configuration in detail. This network configuration achieved the best performance among the alternatives (deeper and shallower): the deeper configurations suffered from over-fitting, while the shallower ones exhibited inferior learning performance.

Fig. 7. — Baseline Approach II: A 5-layer MLP is used over the features of CheXNet.

3). Support Vector Machines (SVMs):

For a multiclass problem, the first objective is to select the SVM topology for ensemble learning: one-versus-one or one-versus-all. In order to find the optimal topology and the hyperparameters (e.g., kernel type and its parameters), we first performed a grid search with the following variations and settings: kernel function {linear, radial basis function (RBF)}, box constraint ($C$ parameter) on a log scale, and kernel scale ($\gamma$ for the RBF kernel) on a log scale.

4). k-Nearest-Neighbor (k-NN):

Finally, a traditional approach, $k$-nearest neighbor ($k$-NN), is used with PCA dimensionality reduction. In a similar fashion, the distance metric and the $k$-value are optimized by a prior grid search. The following distance metrics are evaluated: City-block, Chebyshev, correlation, cosine, Euclidean, Hamming, Jaccard, Mahalanobis, Minkowski, standardized Euclidean, and Spearman metrics. The $k$-value is varied on a log scale.

V. Experimental Results

A. Experimental Setup

We have performed our experiments over the QaTa-Cov19 data set, which consists of normal and three pneumonia classes: bacterial, viral, and COVID-19. The proposed approach is evaluated using a stratified fivefold cross-validation (CV) scheme with a ratio of 80% for training and 20% for the test (unseen folds) splits, respectively.
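A stratified fivefold split can be sketched in plain Python as below. This is a generic illustration, not the authors' exact splitting code; the seed and the round-robin assignment are assumptions, and the class sizes are those reported for QaTa-Cov19.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Assign sample indices to k folds so every fold keeps the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[j % k].append(idx)        # deal samples of one class round-robin
    return folds

labels = (["bacterial"] * 2760 + ["viral"] * 1485 +
          ["normal"] * 1579 + ["covid"] * 462)
folds = stratified_kfold(labels)

test_idx = folds[0]                                    # unseen fold
train_idx = [i for f in folds[1:] for i in f]          # remaining four folds
covid_per_fold = [sum(labels[i] == "covid" for i in f) for f in folds]
print(len(train_idx), len(test_idx), covid_per_fold)
```

Each fold then serves once as the test set, giving roughly the 80%/20% train/test ratio per class; fold sizes can differ by one sample when a class size is not divisible by five.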

Table II shows the number of X-ray images per class in the QaTa-Cov19 data set. Since the data set is unbalanced, we have applied data augmentation to the training set in order to balance the size of each class. Therefore, the X-ray images in the viral pneumonia, COVID-19, and normal classes are augmented up to the same number as the bacterial pneumonia class in the training set. We use the Image Data Generator by Keras to perform data augmentation by randomly rotating the X-ray images within a range of 10° and randomly shifting them both horizontally and vertically. In each CV fold, we use a total of 8832 and 1257 images in the train and test (unseen in the fold) sets, respectively.

TABLE II. Number of Images per Class and per-Fold Before and After Data Augmentation.

Class	# of Samples	Training Samples	Augmented Training Samples	Test Samples
Bacterial Pneumonia	2760	2208	2208	552
Viral Pneumonia	1485	1188	2208	297
Normal	1579	1263	2208	316
COVID-19	462	370	2208	92
Total	6286	5029	8832	1257
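The per-fold numbers in Table II follow from an 80%/20% split and from augmenting every class up to the size of the largest training class. The short check below reproduces them; the variable names are illustrative.

```python
counts = {
    "Bacterial Pneumonia": 2760,
    "Viral Pneumonia": 1485,
    "Normal": 1579,
    "COVID-19": 462,
}

train = {c: round(0.8 * n) for c, n in counts.items()}     # 80% for training
test = {c: counts[c] - train[c] for c in counts}           # remaining 20%
target = max(train.values())                               # size of the largest class
extra = {c: target - train[c] for c in counts}             # augmented images needed

print(train, test, extra)
print(sum(train.values()), target * len(counts), sum(test.values()))
```

For example, the COVID-19 class has 370 training images per fold, so 1838 augmented images are needed to reach 2208.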

The experimental evaluations of SVM, $k$-NN, and CRC are performed using MATLAB version 2019a, running on a PC with an Intel® i7-8650U CPU and 32 GB of system memory. On the other hand, the MLP and CSEN methods are implemented using the TensorFlow library [52] with Python on an NVIDIA® TITAN-X GPU card. For the CSEN training, the Adam optimizer [51] is used with its proposed default learning parameters: learning rate $\alpha = 10^{-3}$ and moment updates $\beta_1 = 0.9$, $\beta_2 = 0.999$, with only 15 back-propagation epochs. Neither grid search nor any other parameter or configuration optimization was performed for CSEN.

B. Experimental Results

The same network configurations are used for CSEN as in [9]. Accordingly, we use two compact CSEN designs: CSEN1 and CSEN2. CSEN1 consists of only two hidden convolutional layers; the first layer has 48 neurons and the second has 24. The ReLU activation function is used in the hidden layers, and the filter size is 3 × 3, which is consistent with the parameter counts in Table III. CSEN2, on the other hand, uses max-pooling and has one additional hidden layer with 24 neurons to perform transposed convolution. CSEN1 and CSEN2 are compared against the 6 competing methods under the same experimental setup.

For the dictionary construction in each CSEN design, 625 images from each class (taken from the augmented training samples of each fold) are stacked in such a way that the representation coefficient, arranged in the 2-D plane, has the size shown in Fig. 5. The remaining training images, i.e., 1583 samples from each class, are used to train each CSEN. A PCA dimensionality-reduction matrix is applied to the features at the chosen compression ratio. This yields an equivalent dictionary and a denoiser matrix, which together give a coarse estimate of the representation coefficients (sparse in the ideal case). The CSEN networks are then trained to obtain the class information of the query sample from this coarse estimate, as illustrated in Fig. 3.

Due to the lack of other learning-based SE studies in the literature, we chose a deeper network than the CSEN designs to investigate the role of network depth in this problem. ReconNet [53] was proposed as a noniterative deep learning solution to the CS problem, and it is one of the state-of-the-art methods in the compressively sensed image recognition task. It consists of six fully convolutional layers and one dense layer in front of the convolutional ones, which acts as a learned denoiser that maps the measurements to a coarse estimate of the signal. The convolutional layers are then responsible for producing the reconstructed signal from this coarse estimate. Therefore, by replacing the dense layer with the denoiser matrix, this network can be used as a competing method.

Both CSEN and the modified ReconNet take the coarse estimate as their input, which is produced using the equivalent dictionary and its pseudo-inverse matrix.

In designing the dictionary of the CRC system, all training samples, i.e., 2208 samples from each class, are stacked in the dictionary. The same PCA matrix used in CSEN-based recognition is applied to the features. Therefore, a correspondingly larger dictionary and its denoiser matrix are used in the CRC framework.
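For orientation, the standard regularized least-squares form of CRC from [6] can be written as below. The symbols are assumptions introduced here for illustration and may not match the notation of the earlier sections: $\mathbf{D}$ is the (PCA-projected) dictionary, $\mathbf{y}$ the projected query feature, $\mathbf{B}$ the denoiser matrix, $\lambda$ a regularization weight, and $\mathbf{D}_i$, $\hat{\mathbf{x}}_i$ the atoms and coefficients that belong to class $i$.

```latex
\hat{\mathbf{x}} = \mathbf{B}\,\mathbf{y},
\qquad
\mathbf{B} = \left(\mathbf{D}^{\top}\mathbf{D} + \lambda \mathbf{I}\right)^{-1}\mathbf{D}^{\top},
\qquad
\hat{c} = \arg\min_{i}
\frac{\lVert \mathbf{y} - \mathbf{D}_i \hat{\mathbf{x}}_i \rVert_2}{\lVert \hat{\mathbf{x}}_i \rVert_2}
```

Because $\mathbf{B}$ has one row per dictionary atom, both its storage and the cost of the product $\mathbf{B}\mathbf{y}$ grow linearly with the number of training samples stacked in the dictionary.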

Overall, the confusion matrix elements are formed as follows: true positive (TP): the number of correctly detected positive class members, true negative (TN): the number of correctly detected negative class samples, false positive (FP): the number of misclassified negative class members as positive, and FN: the number of misclassified positive class samples as negative (i.e., missed positive cases). Then, the standard performance evaluation metrics are defined as follows:

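$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

where sensitivity (or recall) is the rate of correctly detected positive samples in the positive class,

$$\text{Specificity} = \frac{TN}{TN + FP}$$

where specificity is the ratio of accurately detected negative class samples to all negative class samples,

$$\text{Precision} = \frac{TP}{TP + FP}$$

where precision is the rate of correctly classified positive class samples among all the members classified as positive,

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where accuracy is the ratio of correctly classified elements among all the data, and

$$F_{\beta} = \left(1 + \beta^{2}\right) \frac{\text{Precision} \times \text{Sensitivity}}{\beta^{2} \times \text{Precision} + \text{Sensitivity}}$$

where the $F_{\beta}$-score is defined by the weighting parameter $\beta$. The $F_1$-score is calculated with $\beta = 1$, which is the harmonic average of precision and sensitivity.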
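The metric definitions above can be checked numerically against the reported results. The following minimal sketch computes all five metrics from the four confusion-matrix counts; the function name `binary_metrics` is hypothetical, and the example counts are the cumulative COVID-19 figures of CSEN2 from Table I.

```python
def binary_metrics(tp, tn, fp, fn, beta=1.0):
    """Compute one-vs-rest metrics from confusion-matrix counts.

    Hypothetical helper written for illustration; it follows the standard
    textbook definitions of each metric.
    """
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_beta": f_beta,
    }


# Cumulative COVID-19 counts of CSEN2 over the five folds (Table I).
covid_csen2 = binary_metrics(tp=455, tn=5572, fp=252, fn=7)
for name, value in covid_csen2.items():
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, the sensitivity (0.985), specificity (0.957), accuracy (0.959), and F1-score (0.778) agree with the CSEN2 column of Table I. The precision of about 0.644 is not tabulated, but it explains why the F1-score is much lower than the sensitivity: the COVID-19 class is small, so even a low false-positive rate produces many false positives relative to the 462 true cases.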

The classification performance of the proposed CSEN-based approach and the competing methods is presented in Table I. As can be observed from Table I, the proposed approaches surpass all competing methods in COVID-19 recognition performance, achieving 98.5% sensitivity and over 95% specificity. As shown in Table III, the proposed CSEN designs are very compact compared to MLP and ReconNet, and they are computationally efficient. This is evident in Table IV, where the computational complexity (measured as the total computation time over the 1257 test images) is reported.

TABLE I. Classification Performances of the Proposed CSEN and Competing Methods. The Best COVID-19 Recognition Rates Are Highlighted.

		k-NN	SVM	MLP	CRC	ReconNet	CSEN1	CSEN2
Accuracy	Bacterial	0.777	0.780	0.763	0.820	0.765	0.793	0.794
	Viral	0.801	0.787	0.765	0.827	0.785	0.805	0.803
	Normal	0.903	0.934	0.933	0.928	0.918	0.926	0.927
	COVID-19	0.950	0.945	0.949	0.955	0.936	0.955	0.959
TN	Bacterial	3166	3219	3114	3063	3180	3177	3173
	Viral	4123	3965	3923	4385	4005	4109	4091
	Normal	4253	4444	4442	4380	4364	4388	4396
	COVID-19	5525	5489	5522	5554	5435	5548	5572
TP	Bacterial	1720	1687	1680	2091	1629	1810	1818
	Viral	909	979	884	816	928	954	959
	Normal	1420	1427	1421	1456	1407	1431	1428
	COVID-19	446	452	444	447	448	455	455
FP	Bacterial	360	307	412	463	346	349	353
	Viral	678	836	878	416	796	692	710
	Normal	454	263	265	327	343	319	311
	COVID-19	299	335	302	270	389	276	252
FN	Bacterial	1040	1073	1080	669	1131	950	942
	Viral	576	506	601	669	557	531	526
	Normal	159	152	158	123	172	148	151
	COVID-19	16	10	18	15	14	7	7
Sensitivity	Bacterial	0.623	0.611	0.609	0.758	0.590	0.656	0.659
	Viral	0.612	0.660	0.595	0.550	0.625	0.642	0.646
	Normal	0.899	0.904	0.900	0.922	0.891	0.906	0.904
	COVID-19	0.965	0.978	0.961	0.968	0.970	0.985	0.985
Specificity	Bacterial	0.898	0.913	0.883	0.869	0.902	0.901	0.900
	Viral	0.859	0.826	0.817	0.913	0.834	0.856	0.852
	Normal	0.904	0.944	0.944	0.931	0.927	0.932	0.934
	COVID-19	0.949	0.943	0.948	0.954	0.933	0.953	0.957
F1-score	Bacterial	0.711	0.710	0.693	0.787	0.688	0.736	0.737
	Viral	0.592	0.593	0.545	0.601	0.578	0.609	0.608
	Normal	0.823	0.873	0.870	0.866	0.845	0.860	0.861
	COVID-19	0.740	0.724	0.735	0.758	0.690	0.763	0.778


TABLE III. Number of Network Parameters of Each Method.

	MLP	CSEN1	CSEN2	ReconNet
# of trainable parameters	672,836	11,089	16,297	22,914
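The CSEN parameter counts in Table III can be cross-checked with a few lines of arithmetic. The sketch below assumes 3 × 3 kernels with biases, a single-channel input, and a final convolutional layer that produces a single-channel output map; these are assumptions chosen because they reproduce the reported totals, not details confirmed in this section. The helper names are hypothetical.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one 2-D (or transposed) convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def csen_params(hidden_channels, kernel=3):
    """Total trainable parameters of a fully convolutional network.

    Assumes one input channel and one output channel; pooling layers
    contribute no trainable parameters.
    """
    channels = [1] + list(hidden_channels) + [1]
    return sum(conv_params(a, b, kernel) for a, b in zip(channels, channels[1:]))


csen1 = csen_params([48, 24])          # two hidden layers
csen2 = csen_params([48, 24, 24])      # one extra 24-filter transposed-conv layer
print(csen1, csen2)  # 11089 16297
```

With the same assumptions, adding one or two further 24-filter layers gives 21,505 and 26,713 parameters, which matches the CSEN3 and CSEN4 entries of Table VIII. Each extra layer therefore costs 5208 parameters.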


TABLE IV. Computation Times (Sec) of Each Method Over 1257 Test Images.

	CRC (light)	CRC	CSEN1	CSEN2	ReconNet	MLP
Computation Time (in sec.)	13.4176	40.7878	0.2196	0.2272	0.2993	0.2935
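To put the totals of Table IV on a per-image scale, the short sketch below converts them to milliseconds per image and to a speed-up factor relative to the full CRC. The variable names are illustrative; the timings are copied from Table IV.

```python
# Total computation time in seconds over the 1257 test images (Table IV).
TOTAL_SECONDS = {
    "CRC (light)": 13.4176,
    "CRC": 40.7878,
    "CSEN1": 0.2196,
    "CSEN2": 0.2272,
    "ReconNet": 0.2993,
    "MLP": 0.2935,
}
N_TEST_IMAGES = 1257

# Average time per image in milliseconds.
per_image_ms = {m: 1000 * t / N_TEST_IMAGES for m, t in TOTAL_SECONDS.items()}

# How many times faster each method is than the full CRC.
speedup_vs_crc = {m: TOTAL_SECONDS["CRC"] / t for m, t in TOTAL_SECONDS.items()}

for method in TOTAL_SECONDS:
    print(f"{method:12s} {per_image_ms[method]:8.3f} ms/image "
          f"{speedup_vs_crc[method]:7.1f}x vs CRC")
```

Both CSEN designs need well under a millisecond per image and are more than 150 times faster than the full CRC. Note that the comparison mixes platforms: CRC was timed in MATLAB on a CPU, whereas the networks were run on a GPU, so the ratios indicate the order of magnitude rather than a like-for-like benchmark.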


Finally, Table V presents the overall (cumulative) confusion matrix of the proposed CSEN-based COVID-19 recognition approach over the new QaTa-Cov19 data set. The most critical misclassifications are the false negatives, i.e., the COVID-19 X-ray images assigned to another class. The confusion matrix shows that the proposed approach has misclassified seven COVID-19 images (out of 462). Three of these seven are placed in the “viral pneumonia” category, which is an expected confusion given the viral nature of COVID-19. However, the other four cases are misclassified as “Normal,” which is indeed a severe clinical misdiagnosis. A close look at these false negatives in Fig. 9 reveals that they are indeed very similar to normal images, in which typical COVID-19 patterns are hardly visible even to an expert’s naked eye. It is possible that these images come from patients who were in the very early stages of COVID-19.

TABLE V. Overall (Cumulative) Confusion Matrix of the Proposed Recognition Scheme.

CSEN2	Predicted
		Bacterial	Viral	Normal	COVID-19
Real	Bacterial	1818	636	180	126
	Viral	338	959	127	61
	Normal	15	71	1428	65
	COVID-19	0	3	4	455
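The per-class TP, TN, FP, and FN counts reported for CSEN2 in Table I can be derived directly from this confusion matrix by treating each class in turn as the positive class. The sketch below does so; the helper name `one_vs_rest` is hypothetical, and rows are taken as real classes and columns as predicted classes, as in Table V.

```python
CLASSES = ["Bacterial", "Viral", "Normal", "COVID-19"]

# Cumulative confusion matrix of CSEN2 (Table V): rows = real, columns = predicted.
CONFUSION = [
    [1818, 636, 180, 126],
    [338, 959, 127, 61],
    [15, 71, 1428, 65],
    [0, 3, 4, 455],
]


def one_vs_rest(matrix, k):
    """Return (TP, TN, FP, FN) when class index k is treated as positive."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # real k, predicted as something else
    fp = sum(row[k] for row in matrix) - tp   # predicted k, really something else
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


counts = {name: one_vs_rest(CONFUSION, i) for i, name in enumerate(CLASSES)}
for name, (tp, tn, fp, fn) in counts.items():
    print(f"{name:10s} TP={tp} TN={tn} FP={fp} FN={fn}")
```

For COVID-19 this gives TP = 455, TN = 5572, FP = 252, and FN = 7, in agreement with the CSEN2 column of Table I. The 252 false positives come mostly from bacterial pneumonia images (126), followed by normal (65) and viral pneumonia (61) images.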


Fig. 9. FNs (false negatives) of the proposed COVID-19 recognition scheme.

VI. Discussion

A. CRC Versus CSEN

When compared against CRC in particular, CSEN-based classification has two advantages: computational efficiency and superior COVID-19 recognition performance. The computational efficiency comes from the fact that CRC uses a larger dictionary matrix (built from all 2208 training samples per class rather than 625) and hence requires more computation in its matrix-vector multiplications. Furthermore, storing the trainable CSEN parameters and the coefficients of the light dictionary matrix on the test device is more memory efficient than storing the coefficients of the larger dictionary used in CRC.

For further analysis, we also tested the CRC framework with the light dictionary (625 samples per class) used in CSEN-based recognition. We call this variant CRC (light). As can be seen in Table VI, the performance of CRC degrades further, and the reduction in computational cost is not sufficient to close the gap to the CSENs (13.4 s versus about 0.22 s over the test set; see Table IV). As for building deeper convolutional networks instead of the CSEN designs, such as the modified ReconNet, the results presented in Table I show that compact CSEN structures are indeed preferable for achieving superior classification performance compared to deeper networks.

TABLE VI. Performance of the CRC Algorithm With the Light Dictionary (625 Samples per Class) Used in CSEN.

	CRC (Light)
	Accuracy	Sensitivity	Specificity
Bacterial	0.8129	0.7464	0.8650
Viral	0.8163	0.5461	0.8998
Normal	0.9267	0.9170	0.9299
COVID-19	0.9564	0.9394	0.9578


B. Compact Versus Deep CSENs

Representation-based classification is known for providing satisfactory performance on data sets of limited size. Deep artificial NNs, on the other hand, usually require a large training set to achieve a satisfactory generalization capability.

In a representation-based (dictionary) classification scheme, the computational complexity of the method increases drastically as the dictionary grows, i.e., as the number of training samples increases. The proposed CSEN is an alternative approach that handles both moderate and scarce data sets with NN structures that are as compact as possible for dictionary-based classification.

Since there is no other learning-based SE method except CSEN in the literature, we chose ReconNet as a possible competing algorithm for this problem, as explained in detail in Section V. ReconNet has six fully convolutional layers. As an ablation study, we also add more hidden layers to the proposed CSEN models for comparison: CSEN3 and CSEN4 are obtained by adding one and two hidden layers, respectively, to CSEN2 after the transposed convolutional layer. The additional layers have 24 neurons, ReLU activation functions, and a filter size of 3 × 3, which is consistent with the parameter counts in Table VIII. As we can observe from Tables VII and VIII, the proposed compact designs, CSEN1 and CSEN2, match or exceed their deeper counterparts in COVID-19 accuracy and specificity while requiring fewer parameters; the deeper models gain only marginally in COVID-19 sensitivity (0.990 and 0.987 versus 0.985).

TABLE VII. Performance of Alternative Deeper Designs Compared to Compact CSENs.

	Accuracy		Sensitivity		Specificity
	CSEN3	CSEN4	CSEN3	CSEN4	CSEN3	CSEN4
Bacterial	0.793	0.792	0.651	0.653	0.904	0.900
Viral	0.808	0.805	0.642	0.638	0.859	0.856
Normal	0.922	0.921	0.907	0.899	0.927	0.928
COVID-19	0.954	0.954	0.990	0.987	0.951	0.952


TABLE VIII. Number of Network Parameters of Competing SE Networks.

	CSEN1	CSEN2	CSEN3	ReconNet	CSEN4
# of trainable parameters	11,089	16,297	21,505	22,914	26,713


VII. Conclusion

The commonly used methods in COVID-19 diagnosis, namely RT-PCR and CT, have certain limitations and drawbacks, such as long processing times and unacceptably high misdiagnosis rates. These drawbacks are also shared by most of the recent deep learning-based works in the literature because of the scarcity of data from COVID-19 cases. Although deep learning-based recognition techniques are dominant in computer vision, where they have achieved state-of-the-art performance, their performance degrades quickly under data scarcity, which is the reality of the problem at hand. This study aims to address such limitations by proposing a robust and highly accurate COVID-19 recognition approach that works directly from X-ray images. The proposed approach is based on the CSEN, which can be seen as a bridge between deep learning models and representation-based methods. CSEN uses both a dictionary and a set of training samples to learn a direct mapping from the query samples to the sparse support set of the representation coefficients. With this unique ability and the advantage of a compact network, the proposed CSEN-based COVID-19 recognition systems surpass the competing methods and achieve over 98% sensitivity and over 95% specificity. Furthermore, they yield the most computationally efficient scheme in terms of speed and memory.

Acknowledgment

The authors would like to thank the following team of medical doctors for their generous feedback and continuous proofreading: Khalid Hameed, MD at Reem Medical Center, Doha, Qatar; Tahir Hamid, consultant cardiologist at Hamad Medical Corporation Hospital and with Weill Cornell Medicine-Qatar, Doha; and Rashid Mazhar, MD at Hamad Medical Corporation Hospital, Doha, Qatar.

Biographies


Mehmet Yamaç received the B.S. degree in electrical and electronics engineering from Anadolu University, Eskisehir, Turkey, in 2009, and the M.S. degree in electrical and electronics engineering from Bogazici University, Istanbul, Turkey, in 2014. He is currently pursuing the Ph.D. degree with the Department of Computing Sciences, Tampere University, Tampere, Finland.

He was a Research and Teaching Assistant with Bogazici University from 2012 to 2017 and a Researcher with Tampere University from 2017 to 2020. He is currently working as a Senior Researcher with Huawei Technologies Oy, Helsinki, Finland. He has coauthored the articles nominated for the “Best Paper Award” or the “Student Best Paper Award” in EUVIP 2018 and EUSIPCO 2019. His research interests are computer and machine vision, machine learning, and compressive sensing.


Mete Ahishali received the B.Sc. degree (Hons.) in electrical and electronics engineering from the Izmir University of Economics, Izmir, Turkey, in 2017, and the M.Sc. degree (Hons.) in data engineering and machine learning from Tampere University, Tampere, Finland, in 2019, where he is currently pursuing the Ph.D. degree in computing and electrical engineering.

Since 2017, he has been working as a Researcher with the Signal Analysis and Machine Intelligence Research Group under the supervision of Prof. Gabbouj. His research interests are pattern recognition, machine learning, and semantic segmentation with applications in computer vision, remote sensing, and biomedical images.


Aysen Degerli received the B.Sc. degree (Hons.) in electrical and electronics engineering from the Izmir University of Economics, Izmir, Turkey, in 2017, and the M.Sc. degree (Hons.) in data engineering and machine learning from Tampere University, Tampere, Finland, in 2019, where she is currently pursuing the Ph.D. degree in computing and electrical engineering with the Signal Analysis and Machine Intelligence Research Group led by Prof. M. Gabbouj.

Her research interests include machine learning, compressive sensing, and biomedical image processing.


Serkan Kiranyaz (Senior Member, IEEE) is a Professor with Qatar University, Doha, Qatar. He has published two books, five book chapters, more than 80 journal articles in high-impact journals, and 100 articles in international conferences. He has made contributions to evolutionary optimization, machine learning, bio-signal analysis, and computer vision with applications to recognition, classification, and signal processing. He has coauthored articles that were nominated for or received the “Best Paper Award” in ICIP 2013, ICPR 2014, ICIP 2015, and IEEE Transactions on Signal Processing (TSP) 2018. He had the most-popular articles in the years 2010 and 2016, and the most-cited article in 2018 in IEEE Transactions on Biomedical Engineering. From 2010 to 2015, he authored the 4th most-cited article of the Neural Networks journal. His research team won the second and first places in the PhysioNet Grand Challenges 2016 and 2017, among 48 and 75 international teams, respectively. His theoretical contributions aim to advance the current state of the art in modeling and representation, targeting high long-term impact, while his work on algorithmic, system-level design and implementation issues targets medium- and long-term challenges for the next five to ten years. In particular, he aims at investigating scientific questions and inventing cutting-edge solutions in “personalized biomedicine,” which is one of the most dynamic areas where science combines with technology to produce efficient signal and information processing systems.

Prof. Kiranyaz received the “Research Excellence Award” and the “Merit Award” of Qatar University in 2019.


Muhammad E. H. Chowdhury (Senior Member, IEEE) received the Ph.D. degree from the University of Nottingham, Nottingham, U.K., in 2014.

He worked as a Postdoctoral Research Fellow with the Sir Peter Mansfield Imaging Center, University of Nottingham. He is currently working as an Assistant Professor with the Department of Electrical Engineering, Qatar University, Doha, Qatar. He holds two patents and has published around 80 peer-reviewed journal articles, conference papers, and four book chapters. His current research interests include biomedical instrumentation, signal processing, wearable sensors, medical image analysis, machine learning, embedded system design, and simultaneous EEG/fMRI. He is also running several QNRF grants and internal grants from Qatar University, along with academic and government projects and various national and international projects. He has worked as a Consultant for the project entitled “Driver Distraction Management Using Sensor Data Cloud” (2013–14), Information Society Innovation Fund (ISIF) Asia.

Dr. Chowdhury received the ISIF Asia Community Choice Award 2013 for a project entitled, “Design and Development of Precision Agriculture Information System for Bangladesh.” He has recently won the COVID-19 Data Set Award for his contribution to the fight against COVID-19. He is serving as an Associate Editor for IEEE Access and a Topic Editor for Frontiers in Neuroscience.


Moncef Gabbouj (Fellow, IEEE) received the B.S. degree from Oklahoma State University, Stillwater, OK, USA, in 1985, and the M.S. and Ph.D. degrees from Purdue University, in 1986 and 1989, respectively, all in electrical engineering.

He is a Professor of signal processing with the Department of Computing Sciences, Tampere University, Tampere, Finland. He was an Academy of Finland Professor from 2011 to 2015. His research interests include big data analytics, multimedia content-based analysis, indexing and retrieval, artificial intelligence, machine learning, pattern recognition, nonlinear signal and image processing and analysis, voice conversion, and video processing and coding.

Dr. Gabbouj is a member of the Academia Europaea and the Finnish Academy of Science and Letters. He is the past Chairman of the IEEE CAS TC on DSP and the Committee Member of the IEEE Fourier Award for Signal Processing. He served as an Associate Editor and the Guest Editor of many IEEE, and international journals and a Distinguished Lecturer for the IEEE CASS. He is the Finland Site Director of the NSF IUCRC funded Center for Visual and Decision Informatics (CVDI) and leads the Artificial Intelligence Research Task Force of the Ministry of Economic Affairs and Employment funded Research Alliance on Autonomous Systems (RAAS).

Footnotes

^¹

The statements belong to the medical doctors whose names are listed in the Acknowledgment section.

References

[1] Pellis L. et al., “Challenges in control of COVID-19: Short doubling time and long delay to effect of interventions,” 2020, arXiv:2004.00117. [Online]. Available: http://arxiv.org/abs/2004.00117
[2] Zhou F. et al., “Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A retrospective cohort study,” Lancet, vol. 395, no. 10229, pp. 1054–1062, Mar. 2020.
[3] Fang Y. et al., “Sensitivity of chest CT for COVID-19: Comparison to RT-PCR,” Radiology, vol. 296, no. 2, pp. E115–E117, Aug. 2020.
[4] Erickson K. A., Mackenzie K., and Marshall A., “Advanced but expensive technology. Balancing affordability with access in rural areas,” Can. Family Physician Medecin de Famille Canadien, vol. 39, pp. 28–30, Jan. 1993.
[5] World Health Organization, “Laboratory testing for coronavirus disease (COVID-19) in suspected human cases: Interim guidance,” World Health Org., Tech. Rep. WHO/COVID-19/laboratory/2020.5, Mar. 2020.
[6] Zhang L., Yang M., and Feng X., “Sparse representation or collaborative representation: Which helps face recognition?” in Proc. Int. Conf. Comput. Vis., Nov. 2011, pp. 471–478.
[7] Wright J., Yang A. Y., Ganesh A., Sastry S. S., and Ma Y., “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009.
[8] Wright J., Ma Y., Mairal J., Sapiro G., Huang T. S., and Yan S., “Sparse representation for computer vision and pattern recognition,” Proc. IEEE, vol. 98, no. 6, pp. 1031–1044, Jun. 2010.
[9] Yamac M., Ahishali M., Kiranyaz S., and Gabbouj M., “Convolutional sparse support estimator network (CSEN) from energy efficient support estimation to learning-aided compressive sensing,” 2020, arXiv:2003.00768. [Online]. Available: http://arxiv.org/abs/2003.00768
[10] de Fourier B. and Joseph J. B., Théorie Analytique de la Chaleur. Firmin Didot, 1822.
[11] Mallat S. G. and Zhang Z., “Matching pursuits with time-frequency dictionaries,” IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415, Dec. 1993.
[12] Starck J.-L., Candes E. J., and Donoho D. L., “The curvelet transform for image denoising,” IEEE Trans. Image Process., vol. 11, no. 6, pp. 670–684, Jun. 2002.
[13] Yang J., Yu K., Gong Y., and Huang T., “Linear spatial pyramid matching using sparse coding for image classification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 1794–1801.
[14] Adler A., Elad M., Hel-Or Y., and Rivlin E., “Sparse coding with anomaly detection,” J. Signal Process. Syst., vol. 79, no. 2, pp. 179–188, May 2015.
[15] Carrera D., Boracchi G., Foi A., and Wohlberg B., “Detecting anomalous structures by convolutional sparse models,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2015, pp. 1–8.
[16] Wen W., Wu C., Wang Y., Chen Y., and Li H., “Learning structured sparsity in deep neural networks,” in Adv. Neural Inf. Process. Syst., 2016, pp. 2074–2082.
[17] Donoho D. L., “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[18] Candès E. J., “Compressive sampling,” in Proc. Int. Congr. Math., vol. 3. Madrid, Spain, 2006, pp. 1433–1452.
[19] Donoho D. L. and Elad M., “Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization,” Proc. Nat. Acad. Sci. USA, vol. 100, no. 5, pp. 2197–2202, Mar. 2003.
[20] Cohen A., Dahmen W., and DeVore R., “Compressed sensing and best K-term approximation,” J. Amer. Math. Soc., vol. 22, no. 1, pp. 211–231, 2009.
[21] Rauhut H., “Compressive sensing and structured random matrices,” in Theoretical Foundations and Numerical Methods for Sparse Recovery (Radon Series on Computational and Applied Mathematics), vol. 9, M. Fornasier, Ed. deGruyter, 2010, pp. 1–92.
[22] Chen S. S., Donoho D. L., and Saunders M. A., “Atomic decomposition by basis pursuit,” SIAM Rev., vol. 43, no. 1, pp. 129–159, Jan. 2001.
[23] Candès E. J. and Tao T., “Decoding by linear programming,” IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[24] Candès E. J., “The restricted isometry property and its implications for compressed sensing,” Comp. Rendus Mathematique, vol. 346, nos. 9–10, pp. 589–592, May 2008.
[25] Wang W., Wainwright M. J., and Ramchandran K., “Information-theoretic limits on sparse support recovery: Dense versus sparse measurements,” in Proc. IEEE Int. Symp. Inf. Theory, Jul. 2008, pp. 2197–2201.
[26] Haupt J. and Baraniuk R., “Robust support recovery using sparse compressive sensing matrices,” in Proc. 45th Annu. Conf. Inf. Sci. Syst., Mar. 2011, pp. 1–6.
[27] Rajpurkar P. et al., “CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning,” 2017, arXiv:1711.05225. [Online]. Available: http://arxiv.org/abs/1711.05225
[28] Huang G., Liu Z., Van Der Maaten L., and Weinberger K. Q., “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708.
[29] Deng J., Dong W., Socher R., Li L.-J., Li K., and Fei-Fei L., “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[30] Wang X., Peng Y., Lu L., Lu Z., Bagheri M., and Summers R. M., “ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2097–2106.
[31] Shekhar S., Patel V. M., Nasrabadi N. M., and Chellappa R., “Joint sparse representation for robust multimodal biometrics recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 1, pp. 113–126, Jan. 2014.
[32] Mei X. and Ling H., “Robust visual tracking and vehicle classification via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 11, pp. 2259–2272, Nov. 2011.
[33] Guha T. and Ward R. K., “Learning sparse representations for human action recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 8, pp. 1576–1588, Aug. 2012.
[34] Li W. and Du Q., “A survey on representation-based classification and detection in hyperspectral remote sensing imagery,” Pattern Recognit. Lett., vol. 83, pp. 115–123, Nov. 2016.
[35] Chowdhury M. E. H. et al., “Can AI help in screening viral and COVID-19 pneumonia?” 2020, arXiv:2003.13145. [Online]. Available: http://arxiv.org/abs/2003.13145
[36] Apostolopoulos I. D. and Mpesiana T. A., “Covid-19: Automatic detection from X-ray images utilizing transfer learning with convolutional neural networks,” Phys. Eng. Sci. Med., vol. 43, no. 2, pp. 635–640, Jun. 2020.
[37] Hall L. O., Paul R., Goldgof D. B., and Goldgof G. M., “Finding covid-19 from chest X-rays using deep learning on a small dataset,” 2020, arXiv:2004.02060. [Online]. Available: http://arxiv.org/abs/2004.02060
[38] Wainwright M., “Information-theoretic bounds on sparsity recovery in the high-dimensional and noisy setting,” in Proc. IEEE Int. Symp. Inf. Theory, Jun. 2007, pp. 961–965.
[39] Cohen J. P., Morrison P., and Dao L., “COVID-19 image data collection,” 2020, arXiv:2003.11597. [Online]. Available: http://arxiv.org/abs/2003.11597
[40] (2020). COVID-19 database. [Online]. Available: https://www.sirm.org/category/senza-categoria/covid-19/
[41] (2020). [Online]. Available: https://radiopaedia.org/playlists/25975?
[42] (2020). [Online]. Available: https://threadreaderapp.com/thread/1243928581983670272.html
[43] Monteral J. C. (2020). COVID-Chestxray Database. [Online]. Available: https://github.com/ieee8023/covid-chestxray-dataset
[44] Mooney P. (2018). Chest X-ray Images (Pneumonia). Kaggle, Mar. [Online]. Available: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
[45] Rahnama Rad K., “Nearly sharp sufficient conditions on exact sparsity pattern recovery,” IEEE Trans. Inf. Theory, vol. 57, no. 7, pp. 4672–4679, Jul. 2011.
[46] Scarlett J. and Cevher V., “Limits on support recovery with probabilistic models: An information-theoretic framework,” IEEE Trans. Inf. Theory, vol. 63, no. 1, pp. 593–620, Jan. 2017.
[47] Reeves G. and Gastpar M., “Sampling bounds for sparse support recovery in the presence of noise,” in Proc. IEEE Int. Symp. Inf. Theory, Jul. 2008, pp. 2187–2191.
[48] Reeves G. and Gastpar M. C., “Approximate sparsity pattern recovery: Information-theoretic lower bounds,” IEEE Trans. Inf. Theory, vol. 59, no. 6, pp. 3451–3465, Jun. 2013.
[49] Degerli A., Aslan S., Yamac M., Sankur B., and Gabbouj M., “Compressively sensed image recognition,” in Proc. 7th Eur. Workshop Vis. Inf. Process. (EUVIP), Nov. 2018, pp. 1–6.
[50] Lohit S., Kulkarni K., and Turaga P., “Direct inference on compressive measurements using convolutional neural networks,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2016, pp. 1913–1917.
[51] Kingma D. P. and Ba J., “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980
[52] Abadi M. et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” 2016, arXiv:1603.04467. [Online]. Available: http://arxiv.org/abs/1603.04467
[53] Kulkarni K., Lohit S., Turaga P., Kerviche R., and Ashok A., “ReconNet: Non-iterative reconstruction of images from compressively sensed measurements,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 449–458.

[ref1] [1].Pellis L.et al. , “Challenges in control of COVID-19: Short doubling time and long delay to effect of interventions,” 2020, arXiv:2004.00117. [Online]. Available: http://arxiv.org/abs/2004.00117 [DOI] [PMC free article] [PubMed]

[ref2] [2].Zhou F.et al. , “Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A retrospective cohort study,” Lancet, vol. 395, no. 10229, pp. 1054–1062, Mar. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref3] [3].Fang Y.et al. , “Sensitivity of chest CT for COVID-19: Comparison to RT-PCR,” Radiology, vol. 296, no. 2, pp. E115–E117, Aug. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] [4].Erickson K. A., Mackenzie K., and Marshall A., “Advanced but expensive technology. Balancing affordability with access in rural areas,” Can. Family Physician Medecin de Famille Canadien, vol. 39, pp. 28–30, Jan. 1993. [PMC free article] [PubMed] [Google Scholar]

[ref5] [5].World Health Organization, “Laboratory testing for coronavirus disease (COVID-19) in suspected human cases: Interim guidance,” World Health Org., Tech. Rep. WHO/COVID-19/laboratory/2020.5, Mar. 2020.

[ref6] [6].Zhang L., Yang M., and Feng X., “Sparse representation or collaborative representation: Which helps face recognition?” in Proc. Int. Conf. Comput. Vis., Nov. 2011, pp. 471–478. [Google Scholar]

[ref7] [7].Wright J., Yang A. Y., Ganesh A., Sastry S. S., and Ma Y., “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009. [DOI] [PubMed] [Google Scholar]

[ref8] [8].Wright J., Ma Y., Mairal J., Sapiro G., Huang T. S., and Yan S., “Sparse representation for computer vision and pattern recognition,” Proc. IEEE, vol. 98, no. 6, pp. 1031–1044, Jun. 2010. [Google Scholar]

[ref9] [9].Yamac M., Ahishali M., Kiranyaz S., and Gabbouj M., “Convolutional sparse support estimator network (CSEN) from energy efficient support estimation to learning-aided compressive sensing,” 2020, arXiv:2003.00768. [Online]. Available: http://arxiv.org/abs/2003.00768 [DOI] [PubMed]

[10] Fourier J. B. J., Théorie Analytique de la Chaleur. Firmin Didot, 1822.

[11] Mallat S. G. and Zhang Z., “Matching pursuits with time-frequency dictionaries,” IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415, Dec. 1993.

[12] Starck J.-L., Candès E. J., and Donoho D. L., “The curvelet transform for image denoising,” IEEE Trans. Image Process., vol. 11, no. 6, pp. 670–684, Jun. 2002.

[13] Yang J., Yu K., Gong Y., and Huang T., “Linear spatial pyramid matching using sparse coding for image classification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 1794–1801.

[14] Adler A., Elad M., Hel-Or Y., and Rivlin E., “Sparse coding with anomaly detection,” J. Signal Process. Syst., vol. 79, no. 2, pp. 179–188, May 2015.

[15] Carrera D., Boracchi G., Foi A., and Wohlberg B., “Detecting anomalous structures by convolutional sparse models,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2015, pp. 1–8.

[16] Wen W., Wu C., Wang Y., Chen Y., and Li H., “Learning structured sparsity in deep neural networks,” in Adv. Neural Inf. Process. Syst., 2016, pp. 2074–2082.

[17] Donoho D. L., “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.

[18] Candès E. J., “Compressive sampling,” in Proc. Int. Congr. Math., vol. 3. Madrid, Spain, 2006, pp. 1433–1452.

[19] Donoho D. L. and Elad M., “Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization,” Proc. Nat. Acad. Sci. USA, vol. 100, no. 5, pp. 2197–2202, Mar. 2003.

[20] Cohen A., Dahmen W., and DeVore R., “Compressed sensing and best K-term approximation,” J. Amer. Math. Soc., vol. 22, no. 1, pp. 211–231, 2009.

[21] Rauhut H., “Compressive sensing and structured random matrices,” in Theoretical Foundations and Numerical Methods for Sparse Recovery (Radon Series on Computational and Applied Mathematics), vol. 9, M. Fornasier, Ed. De Gruyter, 2010, pp. 1–92.

[22] Chen S. S., Donoho D. L., and Saunders M. A., “Atomic decomposition by basis pursuit,” SIAM Rev., vol. 43, no. 1, pp. 129–159, Jan. 2001.

[23] Candès E. J. and Tao T., “Decoding by linear programming,” IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.

[24] Candès E. J., “The restricted isometry property and its implications for compressed sensing,” Comp. Rendus Mathematique, vol. 346, nos. 9–10, pp. 589–592, May 2008.

[25] Wang W., Wainwright M. J., and Ramchandran K., “Information-theoretic limits on sparse support recovery: Dense versus sparse measurements,” in Proc. IEEE Int. Symp. Inf. Theory, Jul. 2008, pp. 2197–2201.

[26] Haupt J. and Baraniuk R., “Robust support recovery using sparse compressive sensing matrices,” in Proc. 45th Annu. Conf. Inf. Sci. Syst., Mar. 2011, pp. 1–6.

[27] Rajpurkar P. et al., “CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning,” 2017, arXiv:1711.05225. [Online]. Available: http://arxiv.org/abs/1711.05225

[28] Huang G., Liu Z., Van Der Maaten L., and Weinberger K. Q., “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708.

[29] Deng J., Dong W., Socher R., Li L.-J., Li K., and Fei-Fei L., “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.

[30] Wang X., Peng Y., Lu L., Lu Z., Bagheri M., and Summers R. M., “ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2097–2106.

[31] Shekhar S., Patel V. M., Nasrabadi N. M., and Chellappa R., “Joint sparse representation for robust multimodal biometrics recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 1, pp. 113–126, Jan. 2014.

[32] Mei X. and Ling H., “Robust visual tracking and vehicle classification via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 11, pp. 2259–2272, Nov. 2011.

[33] Guha T. and Ward R. K., “Learning sparse representations for human action recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 8, pp. 1576–1588, Aug. 2012.

[34] Li W. and Du Q., “A survey on representation-based classification and detection in hyperspectral remote sensing imagery,” Pattern Recognit. Lett., vol. 83, pp. 115–123, Nov. 2016.

[35] Chowdhury M. E. H. et al., “Can AI help in screening viral and COVID-19 pneumonia?” 2020, arXiv:2003.13145. [Online]. Available: http://arxiv.org/abs/2003.13145

[36] Apostolopoulos I. D. and Mpesiana T. A., “Covid-19: Automatic detection from X-ray images utilizing transfer learning with convolutional neural networks,” Phys. Eng. Sci. Med., vol. 43, no. 2, pp. 635–640, Jun. 2020.

[37] Hall L. O., Paul R., Goldgof D. B., and Goldgof G. M., “Finding COVID-19 from chest X-rays using deep learning on a small dataset,” 2020, arXiv:2004.02060. [Online]. Available: http://arxiv.org/abs/2004.02060

[38] Wainwright M., “Information-theoretic bounds on sparsity recovery in the high-dimensional and noisy setting,” in Proc. IEEE Int. Symp. Inf. Theory, Jun. 2007, pp. 961–965.

[39] Cohen J. P., Morrison P., and Dao L., “COVID-19 image data collection,” 2020, arXiv:2003.11597. [Online]. Available: http://arxiv.org/abs/2003.11597

[40] (2020). COVID-19 Database. [Online]. Available: https://www.sirm.org/category/senza-categoria/covid-19/

[41] (2020). [Online]. Available: https://radiopaedia.org/playlists/25975?

[42] (2020). [Online]. Available: https://threadreaderapp.com/thread/1243928581983670272.html

[43] Monteral J. C. (2020). COVID-Chestxray Database. [Online]. Available: https://github.com/ieee8023/covid-chestxray-dataset

[44] Mooney P. (2018). Chest X-Ray Images (Pneumonia). Kaggle, Mar. [Online]. Available: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia

[45] Rahnama Rad K., “Nearly sharp sufficient conditions on exact sparsity pattern recovery,” IEEE Trans. Inf. Theory, vol. 57, no. 7, pp. 4672–4679, Jul. 2011.

[46] Scarlett J. and Cevher V., “Limits on support recovery with probabilistic models: An information-theoretic framework,” IEEE Trans. Inf. Theory, vol. 63, no. 1, pp. 593–620, Jan. 2017.

[47] Reeves G. and Gastpar M., “Sampling bounds for sparse support recovery in the presence of noise,” in Proc. IEEE Int. Symp. Inf. Theory, Jul. 2008, pp. 2187–2191.

[48] Reeves G. and Gastpar M. C., “Approximate sparsity pattern recovery: Information-theoretic lower bounds,” IEEE Trans. Inf. Theory, vol. 59, no. 6, pp. 3451–3465, Jun. 2013.

[49] Degerli A., Aslan S., Yamac M., Sankur B., and Gabbouj M., “Compressively sensed image recognition,” in Proc. 7th Eur. Workshop Vis. Inf. Process. (EUVIP), Nov. 2018, pp. 1–6.

[50] Lohit S., Kulkarni K., and Turaga P., “Direct inference on compressive measurements using convolutional neural networks,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2016, pp. 1913–1917.

[51] Kingma D. P. and Ba J., “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980

[52] Abadi M. et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” 2016, arXiv:1603.04467. [Online]. Available: http://arxiv.org/abs/1603.04467

[53] Kulkarni K., Lohit S., Turaga P., Kerviche R., and Ashok A., “ReconNet: Non-iterative reconstruction of images from compressively sensed measurements,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 449–458.
