Abstract
Accurate and rapid diagnosis of coronavirus disease 2019 (COVID-19) from chest CT scans is of great importance and urgency during the worldwide outbreak. However, radiologists have to distinguish COVID-19 pneumonia from other pneumonia in a large number of CT scans, which is tedious and inefficient. Thus, an efficient and accurate diagnostic tool is urgently needed to help radiologists with this difficult task. In this study, we proposed a deep supervised autoencoder (DSAE) framework to automatically identify COVID-19 using multi-view features extracted from CT images. To fully explore features characterizing CT images from different frequency domains, the DSAE learns the latent representation by multi-task learning. It was designed both to encode valuable information from different frequency features and to construct a compact class structure for separability. To achieve this, we designed a multi-task loss function, which consists of a supervised loss and a reconstruction loss. Our proposed method was evaluated on a newly collected dataset of 787 subjects including COVID-19 pneumonia patients, other pneumonia patients, and normal subjects without abnormal CT findings. Extensive experimental results demonstrated that our proposed method achieved encouraging diagnostic performance and may have potential clinical application for the diagnosis of COVID-19.
Keywords: COVID-19, deep supervised autoencoder, multi-view features, multi-task learning
1. Introduction
Since the coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first reported in December 2019, in Wuhan, China [1], it has brought tremendous panic all around the world. The challenges that COVID-19 poses to the global medical system, in particular the shortage of medical resources and staff, remain intractable. As of January 29, over one hundred million cases were confirmed in the world [2]. Given that COVID-19 is highly contagious and no effective vaccine is routinely used in clinical practice, strict prevention and early diagnosis remain the most effective ways to fight the outbreak of COVID-19.
Reverse-transcription polymerase chain reaction (RT-PCR) has become the standard of care in the diagnosis of COVID-19 [3]. However, its inherent disadvantage, namely false-negative results, can limit its clinical applicability for early diagnosis of the disease [4], [5]. A delayed diagnosis can increase the risk of viral transmission, which hinders epidemic prevention and control. Therefore, a more sensitive diagnostic tool is urgently needed. Computed tomography (CT) has played a vital role in the screening, diagnosis and evaluation of treatment response of COVID-19 [6], [7], [8]. More importantly, previous studies have reported that some patients have typical chest CT findings and symptoms of COVID-19 although their initial RT-PCR results are negative [5], [9], [10]. In this context, CT was considered as a clinical diagnostic tool in China and helped to screen out and isolate suspected cases. In this way, many cases were diagnosed in time and the spread of the virus was substantially limited. Although CT has high sensitivity, it has pitfalls such as a relatively low specificity. The typical imaging findings of COVID-19 pneumonia are bilateral and peripheral ground-glass and consolidative opacities [6], [11]. However, other lung diseases may also present these imaging manifestations [9], [12]. Moreover, accurately differentiating COVID-19 pneumonia from other pneumonia in a large number of CT examinations is tedious and inefficient work, which could compromise accuracy. An efficient and accurate diagnostic tool is therefore urgently needed to help radiologists with this difficult task.
As state-of-the-art data analysis strategies, artificial intelligence (AI) technologies, especially convolutional neural networks (CNNs), have achieved remarkable success in medical image analysis. Numerous studies have shown great potential in automatic diagnosis of COVID-19 from medical images [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24]. According to their inputs and screening classifiers, we categorize the pioneering methods for classifying COVID-19 into four classes. The first class is the feature engineering-based approaches that manually annotate the infection areas, quantify radiomic features, and then train machine learning classifiers based on the features [17], [18]. A preliminary study conducted by Fang et al. reported that some image biomarkers had strong predictive power for screening out COVID-19 pneumonia from CT images [17]. Subsequently, they built a radiomic signature by combining clinical features and radiomic features to improve the diagnostic performance [18]. However, the number of data samples used in their studies is relatively small, so their findings require further validation in larger prospective multicenter studies.
The second class is the transfer learning-based methods that take a series of CT slices as input, employ state-of-the-art pre-trained models as backbones to generate features, and perform slice-wise decisions [19], [20], [25]. Li et al. used the segmented lung regions as input and employed the ResNet50 architecture, which won first place in multi-intelligence tasks [26], as the backbone to extract features for differentiating COVID-19 pneumonia from community acquired pneumonia and other lung diseases [19]. Similarly, Bai et al. used the EfficientNet architecture, which achieved state-of-the-art accuracy on the ImageNet and CIFAR-100 datasets [27], to distinguish COVID-19 from other pneumonia with CT images [20]. Their high predictive performance partially benefited from carefully preprocessed and selected data, such as manual corrections and annotations.
The third class is the 3D CNN-based methods that directly take 3D CT images as input and train their proposed 3D CNNs to identify COVID-19 pneumonia [21], [22], [28], [29]. Wang et al. applied a 3D connected component algorithm [30] for lesion localization in an unsupervised manner, and then fed 3D CT images with the corresponding 3D lesion masks into a CNN model to generate the probabilities of COVID-19 positive and COVID-19 negative [21]. Although it is no longer required to manually annotate the COVID-19 lesions on CT images, the segmentations they obtained are still imperfect. To improve the accuracy of automatic identification of infected regions, Ouyang et al. used an established segmentation model to automatically extract the lung regions with infection lesions and fed them into a dual-sampling attention network to focus on diagnosing COVID-19 pneumonia [22]. This work can avoid errors that may be caused by intermediate processes, but their network needs to be trained with a large amount of data to be reliable.
The fourth class is the representation learning-based methods that learn latent representations from CT images to diagnose COVID-19 pneumonia [23], [24]. Kang et al. explored multiple features from infected lesions and designed a multi-view representation-based framework for diagnosis of COVID-19 [23]. Han et al. employed attention-based deep multiple instance learning to obtain bag representations and transformed them into the final prediction by using two fully connected layers [24]. Since they are highly predictive and well interpretable, representation learning-based approaches have great potential in diagnosing COVID-19. Although significant advancements have been achieved, these diagnostic methods remain underexplored.
In this study, we proposed a classification framework to identify COVID-19 pneumonia rapidly and accurately from other pneumonia and normal subjects. It is worth noting that automatic and accurate segmentation of COVID-19 or other pneumonia lesions from CT images is extremely challenging due to the complex and changeable manifestations of pneumonia lesions. Since there is a large contrast difference between the lesion and the normal tissue in the lungs, we employed a morphological technique to detect lesions as an alternative to lesion segmentation. To further demonstrate the superiority of this alternative, we included subjects without abnormal CT findings as a normal group in the triple classification task. Thus, we only performed lung parenchyma segmentation from 3D CT images using a pre-trained 3D U-Net model [31]. Then, the lung parenchyma was texturized by using 3D wavelet transform to capture multiple different frequency subbands. We extracted multiple features, including gray features and texture features, from the subbands with different frequencies, which were considered as a multi-view feature set. Based on multi-view learning and the supervised autoencoder [32], we designed a deep supervised autoencoder (DSAE) to map the original features into a latent space, aiming to learn informative and structured representations. A series of experiments on a newly collected dataset from multiple institutions were conducted to evaluate our proposed method, and the results showed that our method could achieve encouraging diagnostic performance. Our main contributions are summarised as follows:
1) We employed a morphological technique, namely 3D wavelet transform, to decompose the original image with its lung mask into multiple frequency subbands, and then the features extracted from the subbands constituted a multi-view feature set for diagnosis of COVID-19.
2) We proposed a DSAE network to map the original features into a latent space to learn the latent representation.
3) We developed a multi-task loss function to make the latent representations more informative and structured.
4) We evaluated the performance of the proposed DSAE on a newly collected dataset from multiple institutions and provided clinical insights for diagnosis of COVID-19.
The remainder of this paper is organized as follows. Materials and methods are introduced in Section 2. Experiments and results are presented in Section 3. A discussion of our work is provided in Section 4. Finally, a brief conclusion to this study is provided in Section 5.
2. Materials and Methods
This retrospective study was approved by our Medical Ethical Committee (Approval Number: 2020024), which waived the requirement of informed consent of patients.
2.1. Dataset
In this study, we retrospectively included three groups. First, 317 confirmed COVID-19 cases were collected from nine institutions in Hunan Province, China (152 women, 165 men; mean age, 45.33 years ± 19.41 [SD]; age range, 1-84 years), named the COVID-19 group. The inclusion criteria for the COVID-19 group were: 1) patients were confirmed as COVID-19; 2) patients underwent CT scanning before or upon admission; and 3) patients had abnormal CT findings. Second, 248 cases of non-COVID-19 pneumonia were collected from our institution (103 women, 145 men; mean age, 49.32 years ± 19.49 [SD]; age range, 0-90 years), named the non-COVID-19 group. The inclusion criteria for the non-COVID-19 group were: 1) patients were identified as community acquired pneumonia (CAP); and 2) patients had imaging manifestations showing viral pneumonia but without accurate diagnosis. Lastly, 222 cases without abnormal imaging findings were retrieved from the picture archiving and communication system (PACS) as a comparison (114 women, 108 men; mean age, 27.18 years ± 18.90 [SD]; age range, 0-57 years), named the normal group. The CT images of all included cases were retrieved from PACS and anonymized for further investigation. Note that we only included cases with non-contrast-enhanced CT images and a slice thickness of less than 5 mm. For patients who had multiple CT scans, we only included the first CT scan. Finally, a total of 787 subjects were used in this study. According to the data collection dates, we divided all subjects into a primary cohort of 529 cases and a validation cohort of 258 cases. The gender, age, and institution distributions of the two cohorts are presented in Table 1.
TABLE 1. Clinical Characteristics of Patients in the Primary and Validation Cohorts.
Characteristic | Subgroup | Primary cohort (n = 529): COVID-19 | Primary cohort: Non-COVID-19 | Primary cohort: Normal | Validation cohort (n = 258): COVID-19 | Validation cohort: Non-COVID-19 | Validation cohort: Normal |
---|---|---|---|---|---|---|---|
Gender | Male | 119 [54.34] | 85 [49.71] | 71 [51.08] | 46 [46.94] | 60 [77.92] | 37 [44.58] |
Gender | Female | 100 [45.66] | 86 [50.29] | 68 [48.92] | 52 [53.06] | 17 [22.08] | 46 [55.42] |
Age (years) | Mean ± std | 45.39 ± 16.46 | 49.12 ± 19.84 | 30.46 ± 14.07 | 45.18 ± 17.59 | 49.76 ± 18.86 | 21.67 ± 15.60 |
Age (years) | 0-20 | 9 [4.11] | 17 [9.94] | 31 [22.30] | 4 [4.08] | 4 [5.19] | 40 [48.19] |
Age (years) | 20-40 | 66 [30.14] | 24 [14.04] | 73 [52.52] | 36 [36.73] | 21 [27.27] | 29 [34.94] |
Age (years) | 40-60 | 99 [45.20] | 70 [40.94] | 35 [25.18] | 36 [36.73] | 25 [32.47] | 14 [16.87] |
Age (years) | 60-80 | 41 [18.72] | 55 [32.16] | 0 | 20 [20.42] | 21 [27.27] | 0 |
Age (years) | 80-100 | 4 [1.83] | 5 [2.92] | 0 | 2 [2.04] | 6 [1.30] | 0 |
Institution | The Second Xiangya Hospital | 11 | 171 | 139 | 0 | 77 | 83 |
Institution | The First People Hospital of Changde | 14 | 0 | 0 | 0 | 0 | 0 |
Institution | The First People Hospital of Huaihua | 1 | 0 | 0 | 0 | 0 | 0 |
Institution | The First People Hospital of Yueyang | 51 | 0 | 0 | 0 | 0 | 0 |
Institution | The First People Hospital of Changsha | 34 | 0 | 0 | 98 | 0 | 0 |
Institution | The Second People Hospital of Chenzhou | 10 | 0 | 0 | 0 | 0 | 0 |
Institution | The Central Hospital of Zhuzhou | 12 | 0 | 0 | 0 | 0 | 0 |
Institution | The Central Hospital of Shaoyang | 75 | 0 | 0 | 0 | 0 | 0 |
Institution | The First Affiliated Hospital University of South China | 9 | 0 | 0 | 0 | 0 | 0 |
Total | | 219 [41.40] | 171 [32.32] | 139 [26.28] | 98 [37.98] | 77 [28.84] | 83 [32.17] |
Note that data are presented as mean ± SD, or n [%].
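The cohort sizes quoted in Section 2.1 can be cross-checked against the Total row of Table 1. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative assumptions, and only the counts come from the text.

```python
# Per-class subject counts taken from the "Total" row of Table 1.
primary = {"COVID-19": 219, "Non-COVID-19": 171, "Normal": 139}
validation = {"COVID-19": 98, "Non-COVID-19": 77, "Normal": 83}


def class_shares(cohort):
    """Return each class's share of a cohort in percent."""
    total = sum(cohort.values())
    return {name: 100.0 * count / total for name, count in cohort.items()}


# Group sizes over both cohorts, and the size of each cohort.
group_totals = {name: primary[name] + validation[name] for name in primary}
n_primary = sum(primary.values())
n_validation = sum(validation.values())
```

Under these counts, the three groups sum to the 317, 248, and 222 subjects described in the text, and the two cohorts to 529 and 258 cases.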
2.2. Data Preprocessing
In this study, all CT images of each patient were first reconstructed into a three-dimensional image using the dcm2nii package [33]. Then, each image was preprocessed with a U-Net model [31], which is widely used in medical image segmentation [34], [35], [36], to extract the lung parenchyma. To compensate for the varying slice thickness of the samples, the volumetric data of the lung parenchyma were resampled to a common voxel resolution by B-spline interpolation. After that, each segmented volume was texturized by using 3D wavelet transform (3D-WT) to capture eight different frequency subbands (Fig. 1A). Each frequency subband was treated as a view image. The 3D-WT provides a spatial and frequency representation of the original signal. For a wavelet decomposition, the 3D-WT can be denoted by a tensor product

$$\begin{aligned} I(x, y, z) &= (L^x \oplus H^x) \otimes (L^y \oplus H^y) \otimes (L^z \oplus H^z) \\ &= L^x L^y L^z \oplus L^x L^y H^z \oplus L^x H^y L^z \oplus L^x H^y H^z \oplus H^x L^y L^z \oplus H^x L^y H^z \oplus H^x H^y L^z \oplus H^x H^y H^z, \end{aligned} \tag{1}$$

where $\oplus$ and $\otimes$ represent a space direct sum and a convolution operation, respectively. $L^\alpha$ and $H^\alpha$ represent the low- and high-pass filters along the $\alpha$-axis, where $\alpha \in \{x, y, z\}$.
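To illustrate how a one-level 3D wavelet transform yields the eight subbands LLL to HHH, the following minimal sketch applies a low- or high-pass filter along each axis in turn. It assumes the Haar wavelet purely for simplicity (the wavelet family is not stated here), stores the volume as a dictionary of voxel indices, and uses hypothetical function names; it is not the implementation used in this study.

```python
import itertools
import math


def haar_along_axis(vol, shape, axis, band):
    """One-level Haar filtering and downsampling by 2 along one axis.

    vol maps (x, y, z) index tuples to floats; band is "L" or "H".
    """
    new_shape = list(shape)
    new_shape[axis] = shape[axis] // 2
    out = {}
    for idx in itertools.product(*(range(n) for n in new_shape)):
        even, odd = list(idx), list(idx)
        even[axis] = 2 * idx[axis]
        odd[axis] = 2 * idx[axis] + 1
        a, b = vol[tuple(even)], vol[tuple(odd)]
        # Low-pass: scaled sum of neighbors; high-pass: scaled difference.
        out[idx] = (a + b) / math.sqrt(2) if band == "L" else (a - b) / math.sqrt(2)
    return out, tuple(new_shape)


def haar_3d(vol, shape):
    """Return the eight one-level subbands, keyed "LLL" ... "HHH"."""
    subbands = {}
    for bands in itertools.product("LH", repeat=3):
        cur, cur_shape = vol, shape
        for axis, band in enumerate(bands):  # filter along x, then y, then z
            cur, cur_shape = haar_along_axis(cur, cur_shape, axis, band)
        subbands["".join(bands)] = cur
    return subbands


# Toy 4x4x4 volume with an intensity step along the x-axis only.
shape = (4, 4, 4)
volume = {
    idx: (1.0 if idx[0] >= 1 else 0.0)
    for idx in itertools.product(range(4), repeat=3)
}
subbands = haar_3d(volume, shape)
```

In this toy volume the intensity changes only along x, so the response is confined to subbands that are low-pass along y and z, while subbands such as LHL are identically zero.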
2.3. Multi-View Feature Extraction
Gray features, with a total of 18 features, mainly consist of the first-order statistics which describe the distribution of voxels within the volume of interest (VOI), such as entropy, energy, maximum, and mean. Texture features are extracted from the gray level co-occurrence matrix (GLCM, 24 features), gray level dependence matrix (GLDM, 14 features), gray level run length matrix (GLRLM, 16 features), gray level size zone matrix (GLSZM, 16 features) and neighboring gray tone difference matrix (NGTDM, 5 features). Thus, 93 radiomic features were extracted from each subband. A total of 744 radiomic features were extracted from the eight frequency subbands for each subject and treated as multi-view features in this study.
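The feature counts above can be verified with a few lines of arithmetic. In the sketch below, the per-family counts and subband names come from the text, whereas the column layout (views concatenated in the listed subband order) and the helper name `view_slice` are assumptions made for illustration.

```python
# Number of features per family, as listed in Section 2.3.
FEATURES_PER_FAMILY = {
    "first-order (gray)": 18,
    "GLCM": 24,
    "GLDM": 14,
    "GLRLM": 16,
    "GLSZM": 16,
    "NGTDM": 5,
}
SUBBANDS = ["LLL", "LLH", "LHL", "LHH", "HLL", "HLH", "HHL", "HHH"]

features_per_subband = sum(FEATURES_PER_FAMILY.values())
total_features = features_per_subband * len(SUBBANDS)


def view_slice(subband):
    """Column range of one view, ASSUMING views are concatenated in SUBBANDS order."""
    start = SUBBANDS.index(subband) * features_per_subband
    return slice(start, start + features_per_subband)
```

With this hypothetical layout, `view_slice("LLL")` selects the first 93 columns of the 744-dimensional feature vector and `view_slice("HHH")` the last 93.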
2.4. Deep Supervised Autoencoder and Representation Learning
The presence of heterogeneity in multiple frequency subbands may provide additional information for the diagnosis of COVID-19, and a multi-view learning-based approach [37], [38] was used in this study. To effectively exploit these features from multiple frequency subbands, a DSAE was proposed to learn the latent representation by multi-task learning.
2.4.1. Deep Supervised Autoencoder
An autoencoder is an artificial neural network designed to learn latent data representations in an unsupervised manner, such that the original data can be optimally reconstructed [39]. Autoencoders have therefore demonstrated the capacity for reducing dimensionality [40], [41] and mining latent features [42]. To learn latent representations with class structure, we proposed a DSAE framework for this diagnostic task. The structure of our proposed DSAE, shown in Fig. 1C, consists of three components: a) an encoder, which learns the latent representations from the original input; b) a decoder, which reconstructs the input from the latent representations; and c) a supervisor, which structures the latent representations and discriminates disease types. In our settings, the encoder has three hidden layers with 256, 128, and 16 neurons, respectively, and the last hidden layer serves as the representation layer. Conversely, the decoder acts as the reverse operation of the encoder. It has two hidden layers with 128 and 256 neurons, and the output of the decoder has the same size as the input layer of the encoder. The supervisor follows the representation layer and consists of a batch normalization layer, a dropout layer with a drop rate of 0.5, and a classification output layer. More formally, we defined the following notations. Let the training samples be $\{X, Y\}$, where $X = [x_1, \dots, x_n]^\top \in \mathbb{R}^{n \times d}$ is a multi-view feature set ($n$ and $d$ represent the number of samples and multi-view features, respectively) and $Y = [y_1, \dots, y_n]$ is the corresponding label set. Each label $y_i$ takes one of three values, which indicate a non-COVID-19 patient, a COVID-19 patient, and a normal subject, respectively.
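The layer sizes described above can be written down as a small configuration, which also makes it easy to check that the decoder mirrors the encoder. This is a minimal sketch: it assumes fully connected layers with bias terms, leaves activation functions unspecified because they are not stated here, and all names are hypothetical.

```python
INPUT_DIM = 744   # multi-view radiomic features per subject (Section 2.3)
NUM_CLASSES = 3   # COVID-19, non-COVID-19, normal

ENCODER = [INPUT_DIM, 256, 128, 16]   # last entry is the representation layer
DECODER = [16, 128, 256, INPUT_DIM]   # reverse of the encoder
SUPERVISOR = ["batch_norm(16)", "dropout(p=0.5)", f"dense(16 -> {NUM_CLASSES})"]


def dense_parameter_count(widths):
    """Weights plus biases of a stack of fully connected layers (ASSUMED layer type)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


encoder_params = dense_parameter_count(ENCODER)
decoder_params = dense_parameter_count(DECODER)
classifier_params = dense_parameter_count([ENCODER[-1], NUM_CLASSES])
```

Under these assumptions the encoder and decoder are of similar size (roughly 226,000 parameters each), while the classification head on the 16-dimensional representation is tiny by comparison.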
2.4.2. Representation Learning for Multi-View Features
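To discover a latent high-level representation for each subject, the multi-view features were used as input and encoded into a low-dimensional space. Then, the latent representation was reconstructed to the original dimension of the input. The reconstruction error was minimized through back propagation to learn two stable mappings, that is, $f_{\theta_e}$ in the encoding path and $g_{\theta_d}$ in the decoding path, where $\theta_e$ and $\theta_d$ indicate the parameters of the two paths. Let $X$ denote the input multi-view features, $H$ the learned latent representation, and $\hat{X}$ the decoded output. Thus, they can be formulated as

$$H = f_{\theta_e}(X), \tag{2}$$

$$\hat{X} = g_{\theta_d}(H). \tag{3}$$

Our proposed autoencoder learns the latent representation by minimizing the mean squared error (MSE) loss function between the inputs and outputs. The reconstruction loss is defined as

$$\mathcal{L}_r = \frac{1}{n} \sum_{i=1}^{n} \lVert x_i - \hat{x}_i \rVert_2^2, \tag{4}$$

where $x_i$ and $\hat{x}_i$ are the input and decoded feature vectors of the $i$th sample and $n$ is the number of samples.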
2.4.3. Structure for Latent Representation
To make the learned latent representation of these different pneumonia diseases well structured, a supervised block was introduced in the representation layer. The advantage is that it enables the network to better learn latent representations associated with pneumonia diseases. Batch normalization [43] and dropout [44] strategies were introduced into the supervised block to reduce overfitting, and a softmax layer was used to predict the subject class. The output probability can be computed as

$$p_{i,c} = \frac{\exp(z_{i,c})}{\sum_{k=1}^{C} \exp(z_{i,k})}, \tag{5}$$

where $p_{i,c}$ denotes the probability of the $i$th sample for class $c$, $z_i$ is the output vector of the last fully connected layer, and $C$ is the number of classes. For this supervised task, we minimized the cross-entropy loss function defined as (6) to enforce compactness within the same type of disease and to establish boundaries between COVID-19 pneumonia and the others:

$$\mathcal{L}_s = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} \mathbb{1}[y_i = c] \log p_{i,c}, \tag{6}$$

where $n$ is the number of samples, $y_i$ is the label of the $i$th sample, and $\mathbb{1}[\cdot]$ is the indicator function.
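The softmax output and the cross-entropy loss described above can be sketched in a few lines of plain Python. The function names and the toy inputs are illustrative assumptions; a practical implementation would rely on a deep learning framework instead.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one sample's class scores."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, labels):
    """Mean negative log-probability of the true class over a batch."""
    losses = []
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)


# Toy example with three classes: uninformative scores versus confident, correct scores.
uniform_loss = cross_entropy([[0.0, 0.0, 0.0]], [1])
confident_loss = cross_entropy([[0.0, 5.0, 0.0]], [1])
```

Uninformative scores give a loss of log 3 for three classes, and the loss shrinks as the score of the true class grows, which is what pushes samples of the same class together in the representation layer.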
To take both informativeness and separability into consideration, the two tasks are jointly trained with a multi-task loss (7), in which a balance factor weights the supervised loss against the reconstruction loss. In this study, the supervised loss serves as the major task to distinguish COVID-19 pneumonia from the others, and the reconstruction loss is used as an auxiliary task to learn the latent representation.
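The following is a minimal sketch of how the two losses might be combined. It assumes a convex combination, balance × supervised loss + (1 − balance) × reconstruction loss; this weighting form, the element-wise averaging in `mse`, and the function names are assumptions made for illustration rather than the definition used in this study.

```python
def mse(inputs, outputs):
    """Mean squared error between input rows and reconstructed rows."""
    n_values = sum(len(row) for row in inputs)
    squared = sum(
        (x - y) ** 2
        for in_row, out_row in zip(inputs, outputs)
        for x, y in zip(in_row, out_row)
    )
    return squared / n_values


def multi_task_loss(supervised_loss, reconstruction_loss, balance=0.75):
    """ASSUMED form: convex combination of the two task losses."""
    return balance * supervised_loss + (1.0 - balance) * reconstruction_loss


# Toy values: one sample with two features, and an arbitrary supervised loss.
reconstruction = mse([[1.0, 2.0]], [[1.0, 4.0]])
combined = multi_task_loss(1.0, reconstruction, balance=0.75)
```

Under this assumed form, a balance of 1 reduces the objective to the supervised loss alone, and smaller values give more weight to reconstruction.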
3. Experiments and Results
3.1. Experimental Settings
We conducted multiple experiments on the CT images to evaluate the proposed pipeline. Since the original features extracted from multi-view CT images differ considerably in their value ranges, standardizing the features is an essential preprocessing step for training the model. Thus, the widely used $z$-score standardization was employed and computed as

$$x'_j = \frac{x_j - \mu_j}{\sigma_j}, \quad j = 1, \dots, d, \tag{8}$$

where $x'_j$ is the standardized value of feature $x_j$ and $d$ denotes the number of features. $\mu_j$ and $\sigma_j$ are the mean value and standard deviation of feature $x_j$, respectively. For the training procedure, Adam [45] was used as the optimizer with an initial learning rate of 0.001, which was halved every 20 epochs. The batch size was set to 8, and the maximum number of epochs was set to 500. To avoid overfitting, we used an early stopping strategy in which training was terminated if the validation loss did not decrease within 50 epochs. Furthermore, we used 5-fold cross-validation on the primary cohort to determine the balance factor in (7), searching the range [0, 1] with an interval of 0.05. We found that the overall accuracy was the best when the balance factor was 0.75. Hence, it was fixed at 0.75 in the following experiments.
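The preprocessing and training settings above can be sketched as small helper functions. This is a sketch under stated assumptions: the population standard deviation is used for the z-score, constant features are mapped to zero, epochs are counted from zero, and all function names are hypothetical.

```python
import statistics


def zscore_columns(rows):
    """Standardize each feature (column) to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]  # ASSUMPTION: population std
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


def learning_rate(epoch, initial=0.001, halve_every=20):
    """Step schedule: the rate is halved after each block of 20 epochs."""
    return initial * 0.5 ** (epoch // halve_every)


def should_stop(val_losses, patience=50):
    """Early stopping: True if the last `patience` epochs brought no new minimum."""
    if not len(val_losses) > patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before


# Candidate balance factors: 0.00, 0.05, ..., 1.00.
balance_grid = [round(0.05 * k, 2) for k in range(21)]
```

Each value in `balance_grid` would be scored by 5-fold cross-validation on the primary cohort, and the one with the best overall accuracy retained.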
To ensure a fair comparison, we used the standardized data as input for all experimental methods. We compared the proposed method with radiomics-based methods and a deep neural network (DNN). The radiomics-based methods first used the minimum redundancy maximum relevance (mRMR) [46] algorithm to select features, and then the selected features were entered into logistic regression (LR) [47], random forest (RF) [48] and support vector machine (SVM) [49] classifiers to separately build a radiomic signature for the diagnosis task. The DNN consists of the remaining parts of the proposed DSAE after excluding the decoder. For each of these methods, we performed ten experiments on CT images and reported the mean and standard deviation. Diagnostic performance was evaluated using overall accuracy in a triple classification task. Furthermore, we used a one-vs-rest strategy, treating each class as the positive class in turn and the rest as negatives, to evaluate the performance with respect to accuracy (ACC), sensitivity (SEN), specificity (SPE) and F1-score (F1) metrics, which can be formulated as

$$\mathrm{ACC} = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_i + TN_i}{TP_i + FP_i + FN_i + TN_i}, \tag{9}$$

$$\mathrm{SEN} = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_i}{TP_i + FN_i}, \tag{10}$$

$$\mathrm{SPE} = \frac{1}{K} \sum_{i=1}^{K} \frac{TN_i}{TN_i + FP_i}, \tag{11}$$

$$\mathrm{F1} = \frac{1}{K} \sum_{i=1}^{K} \frac{2\,TP_i}{2\,TP_i + FP_i + FN_i}, \tag{12}$$

where $TP_i$, $FP_i$, $FN_i$, and $TN_i$ denote the number of true positives, false positives, false negatives and true negatives at the $i$th experiment, respectively. $K$ indicates the number of experiments, which is equal to 10 in this study.
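The one-vs-rest evaluation can be sketched as follows: each class is treated as positive in turn, the four confusion counts are collected per experiment, and every metric is averaged over the experiments. The function names and the toy labels are illustrative assumptions, and the sketch does not guard against a class that never occurs (which would cause a division by zero).

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN when `positive` is the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """ACC, SEN, SPE and F1 for a single experiment."""
    return {
        "ACC": (tp + tn) / (tp + fp + fn + tn),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


def mean_over_experiments(runs, positive):
    """Average each metric over K experiments; runs is a list of (y_true, y_pred)."""
    per_run = [metrics(*one_vs_rest_counts(t, p, positive)) for t, p in runs]
    return {name: sum(m[name] for m in per_run) / len(per_run) for name in per_run[0]}


# Two toy experiments with class labels 0, 1, 2; class 1 is evaluated as positive.
runs = [
    ([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]),
    ([1, 1, 0, 2], [1, 0, 0, 2]),
]
summary = mean_over_experiments(runs, positive=1)
```

Calling `mean_over_experiments` once per class yields the per-class rows that a one-vs-rest results table reports.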
3.2. Diagnostic Power of Different Frequency Features
To investigate the diagnostic power of different frequency features, we first used a visualization technique called t-distributed stochastic neighbor embedding (t-SNE) [50]. Fig. 2 shows the distributions of the 8 types of original features and their fused multi-view features. For quantitative analysis, we conducted five-fold cross-validation experiments on the primary cohort for each type of features. Table 2 shows the overall accuracy of the triple classification task. Tables 3, 4, 5, and 6 show the diagnostic performance of one-vs-rest in terms of mean accuracy, sensitivity, specificity and F1-score, respectively. We can first observe that different frequency features show large performance gaps for all methods. For example, the features extracted from high-frequency subbands have better predictive performance than those extracted from low-frequency subbands for COVID-19 patients. However, low-frequency subbands have a strong predictive power for normal subjects. As expected, the high-pass filter can detect lesions with large gradient changes, while the low-pass filter can detect normal tissues with smooth gradient changes. Notably, the features from different frequency subbands have different diagnostic power, and they can be regarded as multiple views that complement each other to enhance diagnostic power. As shown in Tables 2, 3, 4, 5, and 6, the approaches using multi-view features (i.e., eight different frequency features) have better predictive performance than those using an individual type of features.
TABLE 2. Mean Overall Accuracy of the Proposed Method and Compared Methods of the Triple Classification Task on the Primary Cohort.
Methods | Multi-view | LLL | LLH | LHL | LHH | HLL | HLH | HHL | HHH |
---|---|---|---|---|---|---|---|---|---|
LR | 80.51 ± 0.63 | 75.41 ± 0.79 | 73.72 ± 0.91 | 77.79 ± 0.49 | 74.14 ± 0.81 | 74.24 ± 0.86 | 74.14 ± 0.52 | 75.90 ± 0.64 | 74.46 ± 1.26 |
RF | 84.16 ± 0.98 | 75.01 ± 1.01 | 80.47 ± 0.79 | 76.59 ± 0.95 | 78.94 ± 0.87 | 74.23 ± 1.04 | 79.04 ± 0.67 | 79.36 ± 0.81 | 78.22 ± 0.61 |
SVM | 81.06 ± 1.21 | 75.67 ± 0.75 | 75.46 ± 1.14 | 77.19 ± 0.77 | 74.80 ± 0.91 | 75.11 ± 1.01 | 74.56 ± 0.68 | 75.88 ± 1.57 | 74.48 ± 1.22 |
DNN | 83.34 ± 0.93 | 75.93 ± 1.01 | 79.28 ± 0.85 | 77.77 ± 1.23 | 77.43 ± 1.00 | 77.62 ± 1.21 | 78.86 ± 0.96 | 78.65 ± 1.26 | 77.58 ± 1.03 |
DSAE | 86.44 ± 0.70 | 78.99 ± 1.20 | 81.19 ± 0.55 | 81.10 ± 0.91 | 80.58 ± 0.74 | 80.59 ± 0.89 | 81.15 ± 1.08 | 81.64 ± 0.84 | 80.62 ± 0.86 |
Note that LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH represent different frequency features extracted from the multiple frequency subbands.
TABLE 3. Mean Accuracy of the Proposed Method and Compared Methods Based on the Primary Cohort.
Method | Class | Multi-view | LLL | LLH | LHL | LHH | HLL | HLH | HHL | HHH |
---|---|---|---|---|---|---|---|---|---|---|
LR | COVID-19 | 89.62 ± 1.55 | 82.74 ± 0.94 | 84.52 ± 0.58 | 86.24 ± 0.63 | 86.34 ± 1.07 | 84.27 ± 0.83 | 87.07 ± 0.45 | 86.71 ± 0.62 | 87.41 ± 0.71 |
LR | Non-COVID-19 | 84.24 ± 1.41 | 81.13 ± 0.79 | 78.79 ± 1.33 | 81.98 ± 0.43 | 78.58 ± 0.64 | 80.26 ± 0.76 | 78.75 ± 0.67 | 81.10 ± 0.58 | 78.62 ± 1.00 |
LR | Normal | 88.19 ± 1.18 | 86.94 ± 0.47 | 84.14 ± 0.71 | 87.36 ± 0.67 | 83.35 ± 0.72 | 83.94 ± 0.76 | 82.46 ± 0.45 | 83.99 ± 0.75 | 82.89 ± 1.01 |
RF | COVID-19 | 93.95 ± 0.47 | 82.51 ± 0.98 | 92.19 ± 0.64 | 87.09 ± 0.61 | 91.98 ± 0.67 | 85.63 ± 0.94 | 92.27 ± 0.55 | 91.91 ± 0.54 | 91.55 ± 0.53 |
RF | Non-COVID-19 | 86.26 ± 1.22 | 80.47 ± 1.01 | 82.93 ± 0.73 | 80.33 ± 0.68 | 81.49 ± 0.80 | 79.81 ± 1.19 | 81.83 ± 0.61 | 82.27 ± 0.78 | 82.08 ± 0.60 |
RF | Normal | 88.83 ± 1.45 | 87.04 ± 0.75 | 85.82 ± 0.51 | 85.77 ± 0.83 | 84.41 ± 0.59 | 83.03 ± 0.84 | 83.97 ± 0.79 | 84.54 ± 0.89 | 82.82 ± 0.88 |
SVM | COVID-19 | 90.63 ± 2.01 | 83.33 ± 0.77 | 87.20 ± 0.69 | 86.69 ± 0.63 | 87.92 ± 0.85 | 86.09 ± 0.89 | 88.94 ± 0.50 | 88.22 ± 1.01 | 88.09 ± 0.95 |
SVM | Non-COVID-19 | 84.44 ± 1.27 | 80.76 ± 0.96 | 78.81 ± 1.22 | 81.16 ± 0.62 | 78.36 ± 0.62 | 79.47 ± 1.07 | 77.47 ± 0.60 | 79.94 ± 1.26 | 78.17 ± 1.34 |
SVM | Normal | 88.30 ± 1.32 | 87.26 ± 0.50 | 84.92 ± 1.09 | 86.53 ± 0.76 | 83.33 ± 0.89 | 84.65 ± 0.62 | 82.71 ± 0.60 | 83.60 ± 1.17 | 82.71 ± 0.76 |
DNN | COVID-19 | 93.74 ± 0.51 | 83.53 ± 0.69 | 89.99 ± 0.44 | 87.73 ± 1.12 | 90.30 ± 0.88 | 87.03 ± 0.88 | 91.74 ± 0.59 | 90.58 ± 0.91 | 90.43 ± 0.71 |
DNN | Non-COVID-19 | 85.23 ± 0.92 | 80.60 ± 0.77 | 81.98 ± 0.46 | 81.68 ± 0.72 | 79.96 ± 0.96 | 80.45 ± 1.01 | 80.79 ± 0.79 | 81.23 ± 1.10 | 80.18 ± 0.88 |
DNN | Normal | 88.16 ± 0.62 | 88.14 ± 0.91 | 85.03 ± 0.60 | 85.82 ± 0.97 | 82.74 ± 1.12 | 84.88 ± 1.01 | 83.12 ± 0.94 | 83.16 ± 1.23 | 83.29 ± 0.77 |
DSAE | COVID-19 | 94.61 ± 0.44 | 85.65 ± 0.92 | 91.70 ± 0.52 | 89.90 ± 0.52 | 92.34 ± 0.44 | 90.51 ± 0.45 | 92.61 ± 0.42 | 92.82 ± 0.57 | 93.46 ± 0.38 |
DSAE | Non-COVID-19 | 87.60 ± 0.61 | 82.91 ± 0.90 | 83.95 ± 0.71 | 83.80 ± 0.94 | 81.87 ± 1.10 | 82.99 ± 0.68 | 82.80 ± 0.65 | 83.25 ± 0.76 | 83.67 ± 0.59 |
DSAE | Normal | 90.38 ± 0.58 | 89.77 ± 0.40 | 87.52 ± 0.80 | 88.15 ± 0.67 | 86.09 ± 1.24 | 86.81 ± 0.67 | 86.26 ± 0.59 | 86.20 ± 0.73 | 85.82 ± 0.66 |
Note that LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH represent different frequency features extracted from the multiple frequency subbands.
TABLE 4. Mean Sensitivity of the Proposed Method and Compared Methods Based on the Primary Cohort.
Method | Class | Multi-view | LLL | LLH | LHL | LHH | HLL | HLH | HHL | HHH |
---|---|---|---|---|---|---|---|---|---|---|
LR | COVID-19 | 87.33 ± 1.86 | 78.50 ± 1.65 | 79.52 ± 1.69 | 84.03 ± 0.87 | 84.90 ± 1.45 | 83.22 ± 1.04 | 86.32 ± 0.42 | 84.57 ± 1.35 | 85.24 ± 1.25 |
LR | Non-COVID-19 | 73.29 ± 2.67 | 65.77 ± 2.05 | 65.53 ± 1.77 | 68.59 ± 0.93 | 66.53 ± 1.36 | 67.07 ± 2.32 | 66.86 ± 1.39 | 70.83 ± 1.36 | 67.53 ± 2.35 |
LR | Normal | 80.79 ± 3.01 | 83.00 ± 2.08 | 75.15 ± 1.93 | 79.70 ± 1.75 | 66.92 ± 2.16 | 69.47 ± 1.75 | 64.07 ± 1.10 | 68.82 ± 2.26 | 66.54 ± 1.74 |
RF | COVID-19 | 89.86 ± 0.74 | 79.03 ± 1.33 | 88.84 ± 0.80 | 82.87 ± 0.61 | 88.30 ± 0.85 | 82.62 ± 1.44 | 89.24 ± 1.04 | 88.94 ± 0.71 | 87.41 ± 0.90 |
RF | Non-COVID-19 | 79.37 ± 2.05 | 65.64 ± 2.28 | 73.64 ± 1.87 | 71.63 ± 2.50 | 71.87 ± 2.20 | 67.62 ± 2.29 | 72.32 ± 2.03 | 73.85 ± 1.04 | 73.17 ± 1.04 |
RF | Normal | 82.99 ± 3.10 | 80.52 ± 2.17 | 76.08 ± 1.53 | 73.52 ± 2.84 | 73.18 ± 2.15 | 69.22 ± 1.94 | 71.71 ± 1.40 | 71.63 ± 2.63 | 70.47 ± 1.66 |
SVM | COVID-19 | 86.67 ± 2.26 | 77.63 ± 1.29 | 80.11 ± 1.39 | 82.00 ± 0.95 | 85.03 ± 1.37 | 82.58 ± 1.38 | 86.55 ± 1.03 | 82.37 ± 0.99 | 84.09 ± 1.17 |
SVM | Non-COVID-19 | 75.27 ± 2.96 | 67.21 ± 1.73 | 71.20 ± 1.89 | 68.99 ± 1.41 | 74.43 ± 2.39 | 71.85 ± 1.59 | 76.66 ± 1.88 | 75.42 ± 2.03 | 71.40 ± 2.74 |
SVM | Normal | 81.99 ± 3.14 | 83.65 ± 1.23 | 74.08 ± 2.77 | 80.10 ± 2.09 | 59.49 ± 2.17 | 67.85 ± 2.20 | 53.51 ± 1.60 | 66.69 ± 3.62 | 63.70 ± 2.56 |
DNN | COVID-19 | 89.85 ± 0.41 | 79.46 ± 1.42 | 84.61 ± 0.81 | 84.06 ± 2.24 | 84.95 ± 1.38 | 85.20 ± 1.44 | 87.53 ± 0.91 | 84.64 ± 1.46 | 85.81 ± 1.18 |
DNN | Non-COVID-19 | 74.35 ± 2.09 | 65.61 ± 2.72 | 71.10 ± 1.88 | 68.91 ± 1.78 | 65.43 ± 3.03 | 62.84 ± 2.61 | 68.04 ± 3.00 | 67.74 ± 1.11 | 67.98 ± 2.38 |
DNN | Normal | 85.11 ± 2.48 | 84.16 ± 2.26 | 78.12 ± 1.50 | 78.49 ± 1.89 | 77.50 ± 2.36 | 78.82 ± 2.92 | 74.52 ± 3.11 | 78.39 ± 2.83 | 74.10 ± 2.27 |
DSAE | COVID-19 | 91.22 ± 1.02 | 79.21 ± 1.97 | 88.82 ± 1.06 | 85.04 ± 1.32 | 89.52 ± 0.78 | 87.14 ± 1.31 | 89.78 ± 1.09 | 88.26 ± 1.48 | 89.56 ± 0.92 |
DSAE | Non-COVID-19 | 80.32 ± 1.77 | 72.55 ± 3.58 | 75.27 ± 2.24 | 76.07 ± 2.20 | 75.91 ± 2.89 | 74.91 ± 2.33 | 76.27 ± 2.77 | 77.88 ± 2.45 | 78.18 ± 2.72 |
DSAE | Normal | 85.63 ± 1.84 | 87.22 ± 1.60 | 77.58 ± 3.58 | 80.34 ± 2.52 | 70.37 ± 3.28 | 75.27 ± 3.08 | 71.69 ± 2.64 | 73.49 ± 2.46 | 72.39 ± 1.99 |
Note that LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH represent different frequency features extracted from the multiple frequency subbands.
TABLE 5. Mean Specificity of the Proposed Method and Compared Methods Based on the Primary Cohort.
Method | Class | Multi-view | LLL | LLH | LHL | LHH | HLL | HLH | HHL | HHH |
---|---|---|---|---|---|---|---|---|---|---|
LR | COVID-19 | 91.22 ± 1.78 | 85.86 ± 1.59 | 88.14 ± 0.94 | 87.86 ± 0.72 | 87.41 ± 1.23 | 85.09 ± 0.82 | 87.72 ± 0.92 | 88.21 ± 0.99 | 89.00 ± 0.93 |
LR | Non-COVID-19 | 89.63 ± 1.14 | 88.63 ± 0.85 | 85.21 ± 1.32 | 88.51 ± 0.58 | 84.43 ± 0.96 | 86.74 ± 0.52 | 84.54 ± 0.63 | 86.16 ± 0.83 | 84.12 ± 0.84 |
LR | Normal | 90.85 ± 1.34 | 88.48 ± 1.02 | 87.52 ± 0.69 | 90.20 ± 0.59 | 89.32 ± 0.94 | 89.27 ± 0.99 | 89.03 ± 0.75 | 89.48 ± 0.99 | 88.86 ± 1.03 |
RF | COVID-19 | 96.88 ± 0.56 | 84.92 ± 1.21 | 94.62 ± 0.83 | 90.17 ± 0.92 | 94.60 ± 0.82 | 87.74 ± 1.11 | 94.43 ± 0.41 | 94.03 ± 0.74 | 94.47 ± 1.05 |
RF | Non-COVID-19 | 89.68 ± 1.12 | 87.62 ± 0.73 | 87.46 ± 0.68 | 84.74 ± 0.92 | 86.17 ± 0.92 | 85.68 ± 0.98 | 86.51 ± 0.98 | 86.39 ± 0.95 | 86.52 ± 0.81 |
RF | Normal | 91.09 ± 1.10 | 89.53 ± 0.77 | 89.40 ± 0.40 | 90.27 ± 0.96 | 88.54 ± 0.70 | 88.00 ± 1.25 | 88.54 ± 1.10 | 89.35 ± 0.50 | 87.37 ± 0.97 |
SVM | COVID-19 | 93.39 ± 2.43 | 87.46 ± 1.56 | 92.27 ± 0.72 | 90.03 ± 1.05 | 89.98 ± 1.22 | 88.66 ± 0.97 | 90.72 ± 6.23 | 92.40 ± 1.31 | 91.02 ± 1.34 |
SVM | Non-COVID-19 | 88.96 ± 0.99 | 87.37 ± 1.02 | 82.60 ± 1.22 | 87.15 ± 0.85 | 80.39 ± 0.89 | 83.26 ± 1.19 | 78.03 ± 0.76 | 82.27 ± 1.33 | 81.60 ± 1.50 |
SVM | Normal | 90.64 ± 1.39 | 88.68 ± 0.79 | 88.97 ± 1.02 | 88.92 ± 0.52 | 91.91 ± 0.89 | 90.80 ± 0.62 | 93.17 ± 0.98 | 89.76 ± 0.76 | 89.57 ± 0.95 |
DNN | COVID-19 | 96.51 ± 0.86 | 86.48 ± 1.25 | 93.07 ± 0.70 | 90.41 ± 1.51 | 94.17 ± 1.07 | 88.41 ± 1.40 | 94.77 ± 0.93 | 94.75 ± 0.82 | 93.71 ± 0.99 |
DNN | Non-COVID-19 | 90.47 ± 1.14 | 87.78 ± 1.86 | 87.33 ± 0.74 | 87.36 ± 1.25 | 86.97 ± 1.17 | 88.98 ± 1.17 | 86.98 ± 0.87 | 87.84 ± 1.39 | 86.15 ± 0.96 |
DNN | Normal | 89.24 ± 0.87 | 89.63 ± 0.99 | 87.41 ± 0.98 | 88.47 ± 1.11 | 84.76 ± 1.48 | 87.08 ± 1.28 | 86.02 ± 1.24 | 84.84 ± 0.73 | 86.45 ± 0.89 |
DSAE | COVID-19 | 97.00 ± 0.73 | 90.18 ± 1.72 | 93.75 ± 0.46 | 93.34 ± 1.02 | 94.33 ± 0.38 | 92.88 ± 0.64 | 94.59 ± 0.75 | 95.98 ± 0.98 | 96.21 ± 0.99 |
DSAE | Non-COVID-19 | 91.06 ± 0.77 | 87.83 ± 1.72 | 88.08 ± 1.10 | 87.47 ± 1.64 | 84.75 ± 1.69 | 86.76 ± 1.44 | 85.76 ± 1.06 | 85.73 ± 1.39 | 86.26 ± 0.99 |
DSAE | Normal | 92.04 ± 0.84 | 90.71 ± 0.48 | 90.93 ± 1.07 | 90.96 ± 0.90 | 91.56 ± 1.64 | 90.86 ± 0.96 | 91.19 ± 1.10 | 90.61 ± 0.95 | 90.50 ± 1.01 |
Note that LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH represent different frequency features extracted from the multiple frequency subbands.
TABLE 6. Mean F1-Score of the Proposed Method and Compared Methods Based on the Primary Cohort.
Method | Class | Multi-view | LLL | LLH | LHL | LHH | HLL | HLH | HHL | HHH |
---|---|---|---|---|---|---|---|---|---|---|
LR | COVID-19 | 87.32 ± 1.96 | 78.84 ± 1.07 | 80.88 ± 0.83 | 83.40 ± 0.72 | 83.68 ± 1.29 | 81.35 ± 0.95 | 84.59 ± 0.49 | 83.98 ± 0.76 | 84.78 ± 0.85 |
LR | Non-COVID-19 | 74.91 ± 2.27 | 69.07 ± 1.53 | 66.30 ± 1.94 | 70.30 ± 0.78 | 66.44 ± 0.92 | 68.47 ± 1.52 | 66.82 ± 1.05 | 70.60 ± 0.98 | 66.84 ± 1.63 |
LR | Normal | 77.88 ± 2.31 | 76.73 ± 0.77 | 71.03 ± 1.54 | 76.54 ± 1.41 | 67.51 ± 1.61 | 69.12 ± 1.32 | 65.35 ± 0.82 | 68.92 ± 1.71 | 66.93 ± 1.67 |
RF | COVID-19 | 92.39 ± 0.60 | 78.77 ± 1.17 | 90.33 ± 0.78 | 84.05 ± 0.76 | 90.03 ± 0.86 | 82.55 ± 1.19 | 90.42 ± 0.73 | 90.00 ± 0.69 | 89.45 ± 0.65 |
RF | Non-COVID-19 | 78.72 ± 1.84 | 68.26 ± 1.86 | 73.41 ± 1.20 | 69.97 ± 1.23 | 71.31 ± 1.38 | 68.14 ± 1.93 | 71.71 ± 1.23 | 72.68 ± 1.14 | 72.35 ± 0.93 |
RF | Normal | 79.43 ± 2.70 | 76.38 ± 1.47 | 73.53 ± 1.14 | 72.73 ± 1.81 | 70.84 ± 1.32 | 67.79 ± 1.25 | 69.97 ± 1.20 | 70.60 ± 1.97 | 68.02 ± 1.49 |
SVM | COVID-19 | 88.34 ± 2.47 | 79.24 ± 0.89 | 83.77 ± 0.96 | 83.52 ± 0.72 | 85.31 ± 1.03 | 82.99 ± 1.06 | 86.57 ± 0.68 | 85.15 ± 1.21 | 85.28 ± 1.11 |
SVM | Non-COVID-19 | 75.54 ± 2.19 | 69.10 ± 1.60 | 68.30 ± 1.59 | 70.07 ± 1.05 | 68.80 ± 1.13 | 69.12 ± 1.47 | 68.55 ± 1.15 | 70.62 ± 1.71 | 67.75 ± 2.05 |
SVM | Normal | 78.36 ± 2.36 | 77.33 ± 0.68 | 71.56 ± 2.46 | 75.45 ± 1.67 | 64.64 ± 2.00 | 69.52 ± 1.48 | 61.42 ± 0.82 | 67.51 ± 3.19 | 65.50 ± 1.91 |
DNN | COVID-19 | 92.15 ± 0.54 | 79.79 ± 0.83 | 87.39 ± 0.52 | 84.92 ± 1.46 | 87.80 ± 1.16 | 84.37 ± 1.03 | 89.70 ± 0.73 | 88.03 ± 1.22 | 88.05 ± 0.81 |
DNN | Non-COVID-19 | 76.25 ± 1.52 | 68.27 ± 1.61 | 71.68 ± 0.97 | 70.61 ± 0.99 | 67.46 ± 2.08 | 67.06 ± 1.89 | 69.22 ± 2.05 | 69.62 ± 1.60 | 68.60 ± 1.51 |
DNN | Normal | 78.99 ± 1.30 | 78.75 ± 1.62 | 73.00 ± 0.91 | 74.20 ± 1.51 | 70.17 ± 1.74 | 73.05 ± 1.76 | 69.59 ± 1.75 | 70.78 ± 2.30 | 69.67 ± 1.51 |
DSAE | COVID-19 | 93.24 ± 0.56 | 81.87 ± 1.09 | 89.76 ± 0.67 | 87.36 ± 0.63 | 90.55 ± 0.57 | 88.27 ± 0.64 | 90.86 ± 0.59 | 90.93 ± 0.78 | 91.82 ± 0.42 |
DSAE | Non-COVID-19 | 80.51 ± 1.06 | 73.00 ± 1.65 | 75.07 ± 1.24 | 75.03 ± 1.30 | 72.81 ± 1.65 | 73.78 ± 1.07 | 73.79 ± 1.30 | 74.80 ± 1.18 | 75.31 ± 1.24 |
DSAE | Normal | 82.17 ± 1.08 | 81.61 ± 0.88 | 76.16 ± 1.74 | 77.89 ± 1.39 | 72.31 ± 2.25 | 74.62 ± 1.44 | 72.89 ± 1.24 | 73.37 ± 1.48 | 72.53 ± 1.16 |
Note that LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH represent different frequency features extracted from the multiple frequency subbands.
To further demonstrate the diagnostic power of multi-view features, we explored the overall accuracy of our proposed model under different combinations of feature views. For simplicity, we randomly removed the feature views one by one and, for each setting, performed 10 runs of five-fold cross-validation on the primary cohort. Fig. 3 shows the performance trend of our proposed method as the number of removed feature views varies from 0 to 7. The overall accuracy decreases markedly as more feature views are removed, which supports combining the feature views of all eight frequency subbands. Fig. 4 shows the performance trends of our proposed method and the compared methods across different frequency features. It reveals that the methods perform better with multi-view features than with individual frequency features.
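The view-removal procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `removal_schedule`, the fixed random seed, and the use of Python's `random` module are assumptions, and the cross-validation step itself is only indicated in a comment.

```python
import random

# The eight frequency subbands named in the table notes.
VIEWS = ["LLL", "LLH", "LHL", "LHH", "HLL", "HLH", "HHL", "HHH"]

def removal_schedule(views, seed=0):
    """Return the views kept after randomly removing 0, 1, ..., len(views) - 1 views."""
    rng = random.Random(seed)      # hypothetical seed, for reproducibility only
    order = views[:]               # copy so the caller's list is untouched
    rng.shuffle(order)             # random removal order
    schedule = []
    for n_removed in range(len(views)):
        removed = set(order[:n_removed])
        schedule.append([v for v in views if v not in removed])
    return schedule

schedule = removal_schedule(VIEWS, seed=0)
for n_removed, kept in enumerate(schedule):
    # In the actual experiment, each subset would be scored with
    # 10 runs of five-fold cross-validation; here we only list the subsets.
    print(n_removed, kept)
```

Because each step removes one more view from the same random order, the subsets are nested, so the accuracy trend reflects a cumulative loss of views rather than unrelated feature sets.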
3.3. Efficacy and Discovery of Latent Representations
To demonstrate the effectiveness of latent representations, we visualized the learned features in representation layer and original multi-view features in both primary and validation cohorts. Fig. 5 shows the distributions of the original multi-view features and the latent representations and vividly illustrates that the latent representations are more informative and structured compared to the original multi-view features. More specifically, the visualization results in Fig. 5a) show that the underlying class structure is not well revealed for the original multi-view features, while Fig. 5b) indicates that the latent representations learned from original multi-view features and classes are more informative and well-structured. As expected, Fig. 5c) and Fig. 5d) also illustrate a similar situation in the validation cohort. As shown in Tables 2, 3, 4, 5, and 6, we can also observe that the performance with latent representations is better than that without latent representations. For example, the DSAE achieved an overall accuracy of 86.44 percent, which is 3.10 percent higher than the DNN without learning latent representations.
In addition, we discovered the following two phenomena from Fig. 5. One is that there is a certain margin between the COVID-19 patients and the others in the original multi-view features (Fig. 5a) and Fig. 5c)). This indicates that the feature extraction method we designed is beneficial to distinguishing COVID-19 patients from others. The other is that the learned latent representations have an internal class structure in COVID-19 patients (Fig. 5b) and Fig. 5d)). This means that COVID-19 patients can be further classified into multiple sub-clusters using our proposed method. We will further discuss this in Section 4.
3.4. Comparison With Other Methods
Fig. 4 and Table 2 show the overall accuracy of the proposed method and the compared methods on the triple classification task. The proposed method achieved the best overall accuracy, 84.66 percent, with multi-view feature learning. Compared to the radiomics-based methods, our latent representation-based approach improved the overall accuracy by 2.28 to 5.93 percent with multi-view feature learning. To further demonstrate the effectiveness of our proposed method, we used the DNN model on the different frequency features and the multi-view features to directly learn the mapping from the original features to the class labels. The experimental results show that the performance of the DNN model is comparable to that of the radiomics-based methods, but clearly worse than that of our proposed DSAE model. This indicates that the DSAE model can achieve better predictive performance through multi-task learning.
Tables 3, 4, 5, and 6 also show the performance of the proposed method and the compared methods on the three diagnostic tasks. Compared with the other methods, our proposed method overall achieved the best performance in each diagnostic task across different frequency features and multi-view features. More precisely, when identifying COVID-19 from others, our proposed method achieved the best performance with multi-view feature learning, with an accuracy of 94.61 percent, a sensitivity of 91.22 percent, a specificity of 97.00 percent, and an F1-score of 93.24 percent. When distinguishing non-COVID-19 from others, the proposed method achieved the best performance with multi-view feature learning, with an accuracy of 87.60 percent, a sensitivity of 80.32 percent, a specificity of 91.06 percent, and an F1-score of 80.51 percent. Similarly, it achieved the best diagnostic performance, with an accuracy of 93.38 percent, a sensitivity of 85.63 percent, a specificity of 92.04 percent, and an F1-score of 82.17 percent, in distinguishing normal subjects from COVID-19 and non-COVID-19 patients. As seen from the performance differences among the three diagnostic tasks, our proposed model has the strongest ability to identify COVID-19 subjects, followed by normal and non-COVID-19 subjects.
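To make the multi-task idea concrete, the sketch below combines a supervised term and a reconstruction term into one loss for a single sample. It is only an illustration under stated assumptions: cross-entropy for the supervised loss, mean squared error for the reconstruction loss, and the trade-off parameter `weight` are all assumed here, and the exact loss used by DSAE is the one defined in the method section.

```python
import math

def cross_entropy(probs, label):
    """Supervised term (assumed form): negative log-probability of the true class."""
    return -math.log(probs[label])

def mse(x, x_hat):
    """Reconstruction term (assumed form): mean squared error between input and decoder output."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def multitask_loss(probs, label, x, x_hat, weight=1.0):
    """Supervised loss plus weighted reconstruction loss; `weight` is a hypothetical trade-off."""
    return cross_entropy(probs, label) + weight * mse(x, x_hat)

# Toy values, not data from the study: three-class probabilities and a 2-D feature vector.
loss = multitask_loss(probs=[0.7, 0.2, 0.1], label=0,
                      x=[1.0, 2.0], x_hat=[1.0, 1.0], weight=0.5)
print(round(loss, 4))
```

A plain DNN classifier corresponds to dropping the reconstruction term, which is the comparison the paragraph above draws between DNN and DSAE.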
3.5. Independent Validation
To further validate our proposed method, we performed independent validation experiments on the validation cohort presented in Table 1. More specifically, we used the primary cohort as training data to obtain more generalized models and used the validation cohort to evaluate the performance. Fig. 6A) shows the confusion matrix of our proposed method, and Fig. 6B) shows the overall accuracy of our proposed method and the compared methods on the triple classification task in the validation cohort. Our proposed method achieves the best overall accuracy, 89.53 percent, in the independent validation cohort. Although our proposed DSAE achieves promising performance, 10 COVID-19 cases, 14 non-COVID-19 cases, and 3 normal cases are still misdiagnosed. We discuss the underlying reasons for these misclassifications in the next section. Moreover, Table 7 presents the corresponding diagnostic performance in terms of accuracy, sensitivity, specificity, and F1-score under the one-vs-rest strategy. As shown in Table 7, our proposed method shows a consistent pattern across the three binary classification tasks and achieves the best accuracy and F1-score in each task. Although the diagnostic performance of our proposed method is similar to that of the RF-based approach on the primary cohort, our proposed DSAE model performs better on the validation cohort. For example, the accuracy of our proposed DSAE model is 1.55, 7.76, and 6.20 percent higher than that of the RF-based approach in distinguishing the COVID-19, non-COVID-19, and normal categories, respectively. Moreover, the radiomics-based methods are less sensitive in distinguishing normal subjects from other patients, while our proposed method and the DNN-based method are more sensitive.
TABLE 7. Diagnostic Performance of the Proposed Method and Compared Methods on the Validation Cohort.
Method | Class | ACC(%) | SEN(%) | SPE(%) | F1(%) |
---|---|---|---|---|---|
LR | COVID-19 | 92.25 | 96.94 | 89.38 | 90.48 |
LR | Non-COVID-19 | 86.82 | 87.01 | 86.74 | 79.76 |
LR | Normal | 83.72 | 57.83 | 96.00 | 69.57 |
RF | COVID-19 | 94.57 | 96.94 | 93.14 | 93.14 |
RF | Non-COVID-19 | 83.33 | 76.62 | 86.18 | 73.29 |
RF | Normal | 85.66 | 68.67 | 93.71 | 75.50 |
SVM | COVID-19 | 92.64 | 92.86 | 92.50 | 80.55 |
SVM | Non-COVID-19 | 85.66 | 88.31 | 84.53 | 78.61 |
SVM | Normal | 82.95 | 59.04 | 94.29 | 69.01 |
DNN | COVID-19 | 95.74 | 88.78 | 100.00 | 94.05 |
DNN | Non-COVID-19 | 87.21 | 75.32 | 92.26 | 77.85 |
DNN | Normal | 89.92 | 93.98 | 88.00 | 85.71 |
DSAE | COVID-19 | 96.12 | 89.80 | 100.00 | 94.62 |
DSAE | Non-COVID-19 | 91.09 | 81.82 | 95.03 | 84.56 |
DSAE | Normal | 91.86 | 96.39 | 89.71 | 88.40 |
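The one-vs-rest metrics in Table 7 follow directly from a three-class confusion matrix. The sketch below illustrates the computation for DSAE. The matrix itself is a reconstruction, not a copy of Fig. 6A): it assumes the misclassification counts reported in the text (10 COVID-19, 14 non-COVID-19, and 3 normal cases misdiagnosed) together with the breakdown given in the Discussion, namely that 4 COVID-19 cases were called normal (so the remaining 6 are assumed to be called non-COVID-19), all 14 non-COVID-19 errors were called normal, and all 3 normal errors were called non-COVID-19.

```python
CLASSES = ["COVID-19", "Non-COVID-19", "Normal"]

# Rows: true class, columns: predicted class.
# Reconstructed from the counts in the text (an assumption, see lead-in).
CONFUSION = [
    [88, 6, 4],
    [0, 63, 14],
    [0, 3, 80],
]

def one_vs_rest(confusion, k):
    """Accuracy, sensitivity, specificity and F1 (in percent) for class k against the rest."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                   # class k predicted as something else
    fp = sum(row[k] for row in confusion) - tp    # other classes predicted as k
    tn = total - tp - fn - fp
    return {
        "ACC": 100 * (tp + tn) / total,
        "SEN": 100 * tp / (tp + fn),
        "SPE": 100 * tn / (tn + fp),
        "F1": 100 * 2 * tp / (2 * tp + fp + fn),
    }

overall = 100 * sum(CONFUSION[i][i] for i in range(3)) / sum(map(sum, CONFUSION))
print(f"overall accuracy: {overall:.2f}")
for k, name in enumerate(CLASSES):
    print(name, {m: round(v, 2) for m, v in one_vs_rest(CONFUSION, k).items()})
```

Under these assumptions the computed values agree with the DSAE rows of Table 7 and with the reported overall accuracy of 89.53 percent.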
To further examine the diagnostic power of the proposed method, other radiomics features, namely shape-based features (14 features) and gray-level and texture features (93 features, referred to as original features) extracted directly from the original images, were also used with the proposed DSAE method to distinguish COVID-19 pneumonia from others. As shown in Table 8, the DSAE method using shape features achieves the lowest overall accuracy, 46.51 percent, which means that shape-based features alone are insufficient to screen out COVID-19 pneumonia from others. The method using gray-level and texture features achieves a moderate performance, with an overall accuracy of 77.13 percent. In contrast, the method using our designed multi-view features achieves encouraging performance, with an overall accuracy of 89.53 percent. This demonstrates that the multi-view features we designed have strong diagnostic power for distinguishing COVID-19 pneumonia from others. Moreover, we reimplemented several published methods [19], [20], [23], [25] for comparison. Their performance is also shown in Table 8. We found that the representation learning-based method [23] performed better than the transfer learning-based methods [19], [20], [25]. More precisely, the overall accuracy of [23] is 10.85 percent higher than that of [25], which suggests that radiomic features have strong predictive power for identifying COVID-19. Our proposed method, which is based on multi-view representation learning, achieves better performance than all of these methods.
TABLE 8. Diagnostic Performance on the Validation Cohort When Distinguishing COVID-19 From Other Subjects.
Method | Class | ACC (%) | SEN (%) | SPE (%) | F1 (%) | Overall accuracy(%) |
---|---|---|---|---|---|---|
Li et al. [19] | COVID-19 | 76.36 | 63.27 | 84.38 | 67.03 | 63.95 |
Li et al. [19] | Non-COVID-19 | 79.07 | 65.39 | 85.00 | 65.39 | |
Li et al. [19] | Normal | 72.48 | 63.42 | 76.71 | 59.43 | |
Shah et al. [25] | COVID-19 | 79.85 | 59.18 | 92.50 | 69.05 | 71.32 |
Shah et al. [25] | Non-COVID-19 | 80.22 | 79.49 | 80.56 | 70.86 | |
Shah et al. [25] | Normal | 82.56 | 78.05 | 84.66 | 73.99 | |
Bai et al. [20] | COVID-19 | 70.93 | 38.78 | 90.63 | 50.33 | 62.02 |
Bai et al. [20] | Non-COVID-19 | 75.19 | 75.64 | 75.00 | 64.84 | |
Bai et al. [20] | Normal | 77.91 | 76.83 | 78.41 | 68.85 | |
Kang et al. [23] | COVID-19 | 94.57 | 88.78 | 98.13 | 92.55 | 82.17 |
Kang et al. [23] | Non-COVID-19 | 83.72 | 54.55 | 96.13 | 66.67 | |
Kang et al. [23] | Normal | 86.05 | 100.00 | 79.43 | 82.18 | |
DSAE (Shape) | COVID-19 | 57.36 | 36.74 | 70.00 | 39.56 | 46.51 |
DSAE (Shape) | Non-COVID-19 | 65.89 | 50.65 | 72.38 | 46.99 | |
DSAE (Shape) | Normal | 69.77 | 54.22 | 77.14 | 53.57 | |
DSAE (Original) | COVID-19 | 87.21 | 68.37 | 98.75 | 80.24 | 77.13 |
DSAE (Original) | Non-COVID-19 | 82.17 | 72.73 | 86.19 | 70.89 | |
DSAE (Original) | Normal | 84.88 | 91.56 | 81.71 | 79.58 | |
DSAE (Multi-view) | COVID-19 | 96.12 | 89.80 | 100.00 | 94.62 | 89.53 |
DSAE (Multi-view) | Non-COVID-19 | 91.09 | 81.82 | 95.03 | 84.56 | |
DSAE (Multi-view) | Normal | 91.86 | 96.39 | 89.71 | 88.40 | |
4. Discussion
The COVID-19 pandemic has infected more than one hundred million patients worldwide and has quickly become a major global health threat [2]. Computer-aided diagnosis systems are playing an increasingly important role in the diagnosis and monitoring of COVID-19, as they can reduce the burden on radiologists and help them make clinical decisions. In this study, we proposed an efficient and accurate diagnosis system for automatically differentiating COVID-19 pneumonia from other pneumonia and normal subjects. Compared with previous studies [17], [18], [20], [51], which require manual or semi-automatic lesion annotation on CT images, our proposed method needs only an easy-to-implement preprocessing step, that is, using an AI model to automatically segment the lung regions from CT images. It also does not require selecting key slices to represent a full 3D CT scan; instead, it uses the 3D-WT technique to decompose the 3D lung images into multiple frequency subbands from which multi-view features are extracted. More importantly, compared to the CNN-based methods, our proposed method can achieve promising performance on a limited amount of data.
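As an illustration of how a one-level 3D wavelet transform yields the eight subbands LLL to HHH, the sketch below applies a separable Haar filter to a small volume stored as nested lists. It is a toy example under explicit assumptions: the Haar wavelet is chosen only for simplicity (the wavelet family is not specified in this section), the mapping of the three letters to the three array axes is assumed, and the function name `haar_dwt3d` is hypothetical.

```python
import itertools

def haar_dwt3d(volume):
    """One-level separable Haar transform of a 3D volume with even dimensions.

    Returns a dict mapping subband names ("LLL" ... "HHH") to half-size volumes.
    Letter i of the name is the filter (L = low-pass, H = high-pass) on axis i.
    """
    d0, d1, d2 = len(volume), len(volume[0]), len(volume[0][0])
    scale = 2.0 ** 1.5  # (sqrt(2))^3, one factor per axis, keeps the transform orthonormal
    bands = {}
    for name in ("".join(p) for p in itertools.product("LH", repeat=3)):
        band = [[[0.0] * (d2 // 2) for _ in range(d1 // 2)] for _ in range(d0 // 2)]
        for i, j, k in itertools.product(range(d0 // 2), range(d1 // 2), range(d2 // 2)):
            total = 0.0
            # Combine the 2x2x2 block of voxels with signs set by the subband name.
            for a, b, c in itertools.product((0, 1), repeat=3):
                sign = 1
                for letter, offset in zip(name, (a, b, c)):
                    if letter == "H" and offset == 1:
                        sign = -sign
                total += sign * volume[2 * i + a][2 * j + b][2 * k + c]
            band[i][j][k] = total / scale
        bands[name] = band
    return bands

# Toy 2x2x2 constant volume: all energy should land in the LLL subband.
ones = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
bands = haar_dwt3d(ones)
print(list(bands))
```

In a pipeline like the one described, radiomic features would then be computed on each subband separately, giving one feature view per subband.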
Leveraging the complementarity of multiple views, multi-view representation learning can learn more informative and compact representations that improve predictive performance, as shown in this study (see Table 2 and Fig. 5). Moreover, the visualization of latent representations in Fig. 5b) and Fig. 5d) revealed an internal class structure in COVID-19 subjects. We further retrospectively investigated the severity of COVID-19 (non-severe and severe) based on clinical assessment criteria. The statistical results presented in Fig. 5b) and Fig. 5d) show that non-severe and severe subjects are not completely separated; instead, there are three types of structures with a high, medium, and low probability of containing severe subjects. This may be because the severity assessment criteria are not fully derived from CT imaging. In fact, CT evaluation has little reference value in clinical classification [52].
Automatically identifying patients with abnormal CT findings and further screening out COVID-19 pneumonia from other pneumonia is urgently needed in clinical practice. Inspired by these clinical requirements, we proposed a classification system to address the task. Despite the promising performance, 14 cases with non-COVID-19 pneumonia were identified as normal cases (see Fig. 6A)). After radiologists carefully reviewed the misdiagnosed cases, 9 of them were found to have small and low-density lesions, which were too subtle to detect. The same situation occurred in the COVID-19 group. Among the 10 COVID-19 cases misdiagnosed by our system, 4 cases with small and low-density lesions were identified as normal cases. This means that the sensitivity of our system requires further improvement. The other 6 misdiagnosed cases in the COVID-19 group had atypical imaging manifestations, possibly because they were at a relatively later stage of the disease. Although we used only the first CT scan of each case, the interval between symptom onset and the first CT scan varied. We acknowledge this as one of our limitations. Three cases with normal CT findings were misdiagnosed as non-COVID-19 pneumonia. All 3 failure cases had false lesions due to the relatively higher density of the posterior lung. It is worth noting that none of the failure cases in the normal and non-COVID-19 groups were identified as COVID-19 pneumonia, indicating a substantial inherent difference between the COVID-19 group and the other two groups. In clinical practice, it is easy to distinguish normal from abnormal lung CT findings. Therefore, we further investigated the binary classification task, that is, differentiating COVID-19 pneumonia from non-COVID-19 pneumonia. On the independent validation cohort, only two cases with small and low-density lesions from the COVID-19 group were identified as non-COVID-19 cases, and three cases in the non-COVID-19 group were identified as COVID-19 cases (see Fig. 7A)). Specifically, our proposed DSAE achieved encouraging predictive performance, with an accuracy of 97.14 percent, a sensitivity of 96.10 percent, a specificity of 97.37 percent, and an F1-score of 96.73 percent (see Fig. 7B)). As expected, performance on the binary classification task was higher than on the triple classification task, since the triple classification task is naturally more difficult.
Training a deep learning model that generalizes well, especially for multi-class classification tasks, may require more samples. One of the biggest advantages of the deep learning approach is its ability to automatically learn the latent features associated with pneumonia diseases. However, it still lacks interpretability and cannot extract quantitative features in the way radiomics does. Therefore, we used radiomic features from the multiple frequency subbands as the multi-view features for the diagnosis of COVID-19 in this study. Moreover, we further investigated whether our approach was influenced by age. Considering the size and imbalance of the samples, we divided the validation cohort into three groups based on age. Table 9 shows the diagnostic performance for the three age groups. We found that our proposed method yielded consistently good performance across age groups and was more sensitive to COVID-19 in subjects over the age of 40. These results suggest that our model is largely robust to age differences when distinguishing COVID-19 from non-COVID-19 pneumonia.
TABLE 9. Diagnostic Performance of Different Age Groups of the Proposed Method on the Validation Cohort When Distinguishing COVID-19 From Non-COVID-19 Subjects.
Age group | ACC (%) | SEN (%) | SPE (%) | F1 (%) |
---|---|---|---|---|
0-40 | 95.39 | 92.00 | 97.50 | 93.88 |
40-60 | 98.36 | 96.00 | 100.00 | 97.96 |
60-100 | 97.96 | 100.00 | 95.46 | 98.18 |
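The age-stratified evaluation behind Table 9 can be sketched as below. This is an illustrative outline only: the handling of the shared boundaries (40 and 60) is an assumption, the function names are hypothetical, and the records are toy values rather than data from the study.

```python
def age_group(age):
    """Map an age to the Table 9 groups; assigning 40 and 60 to the older group is an assumption."""
    if age < 40:
        return "0-40"
    if age < 60:
        return "40-60"
    return "60-100"

def accuracy_by_group(records):
    """records: iterable of (age, true_label, predicted_label); returns accuracy in percent per group."""
    hits, counts = {}, {}
    for age, truth, pred in records:
        group = age_group(age)
        counts[group] = counts.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + (truth == pred)
    return {group: 100 * hits[group] / counts[group] for group in counts}

# Hypothetical toy records (age, true label, predicted label), not data from the study.
records = [(25, 1, 1), (38, 0, 1), (45, 1, 1), (59, 0, 0), (72, 1, 1)]
result = accuracy_by_group(records)
print(result)
```

Sensitivity, specificity, and F1-score per group would be computed in the same way, by restricting the confusion counts to the records in each age group.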
Our study still has some limitations. First, only radiomic features were used, although deep learning features may also have the potential to identify COVID-19 pneumonia. In future work, we will collect more data and use CNN-based methods to automatically learn the latent features associated with pneumonia diseases. Second, we only included the first CT scans, so the longitudinal CT changes were not investigated. Whether our model has the same performance in distinguishing different stages of COVID-19 pneumonia from other pneumonia is unclear. Moreover, only patients from the early stage of the pandemic were considered in this study, and patients infected with SARS-CoV-2 variants remain to be investigated. Future work will focus on collecting more data on patients infected after virus mutations to validate the performance of the model. Third, our deep learning model only integrated chest CT features without involving clinical information such as symptoms and exposure history. A recent study has reported that combining CT imaging and clinical information can improve the diagnostic performance of AI models [51]. We will further validate this in future work. Finally, the interpretability of the deep learning system remains limited, and the clinical meaning of the features learned by the system is difficult to explain. We have visualized the original multi-view features and the latent representations in the two cohorts to explore the underlying mechanism, but further investigation is needed in future work.
5. Conclusion
In conclusion, we proposed an easy-to-use diagnostic method based on multi-view representation learning, which uses 3D CT images to rapidly screen out COVID-19 from other pneumonia and normal subjects without abnormal CT findings. Our proposed diagnostic model achieved an overall accuracy of 89.53 percent in the triple classification task on the independent validation cohort. When only distinguishing COVID-19 from non-COVID-19 pneumonia, the model performed better, with an accuracy of 97.14 percent, a sensitivity of 96.10 percent, a specificity of 97.37 percent, and an F1-score of 96.73 percent. These results demonstrate that our proposed method has great potential for accurately and rapidly screening out COVID-19 pneumonia, which is beneficial for fighting the current disease outbreak.
Acknowledgments
This work was supported in part by Key Emergency Project of Pneumonia Epidemic of novel coronavirus infection under Grant 2020SK3006, Emergency Project of Prevention and Control for COVID-19 of Central South University under Grant 160260005, Foundation from Changsha Scientific and Technical bureau, China under Grant kq2001001, National Natural Science Foundation of China under Grants 61802442 and 61877059, Natural Science Foundation of Hunan Province under Grant 2019JJ50775, 111 Project under Grant B18059, the Hunan Provincial Science and Technology Program under Grant 2018WK4001, the Hunan Provincial Science and Technology Innovation Leading Plan under Grant 2020GK2019, the Science and Technology Innovation Program of Hunan Province under Grant 2020SK53423, and Clinical Research Center for Medical Imaging In Hunan Province under Grant 2020SK4001. Jianhong Cheng and Wei Zhao contributed equally to this work.
Biographies
Jianhong Cheng received the BS degree from Liaoning Technical University, Fuxin, China, in 2014, the MS degree in software engineering from Central South University, Changsha, China, in 2017. He is currently working toward the PhD degree in the School of Computer Science and Engineering, Central South University, Changsha, China. His research interests include machine learning, deep learning, and medical image analysis.
Wei Zhao received the PhD degree in imaging and nuclear medicine from Fudan University, China. He is a radiologist of The Second Xiangya Hospital. His research interests include chest CT imaging, radiomics, and deep learning.
Jin Liu (Member, IEEE) received the PhD degree in computer science from Central South University, China, in 2017. He is currently a lecturer at the School of Computer Science and Engineering, Central South University, Changsha, Hunan, China. His current research interests include medical image analysis, machine learning, and bioinformatics.
Xingzhi Xie received the bachelor’s degree in clinical medicine from Central South University, China. She is currently working toward the graduate degree in imaging and nuclear medicine of The Second Xiangya Hospital. Her research interests include CT imaging, radiomics, and deep learning.
Shangjie Wu received the bachelor’s degree in clinical medicine from Central South University, China, the master’s degree in DME from Fudan University, China, and the PhD degree from Central South University, China. She is a postdoctoral fellow at the Cancer Research Institute of Central South University, China, and a visiting scholar at Imperial College London, U.K. Her research interests include basic and clinical research of pulmonary vascular diseases, venous thromboembolism, pulmonary hypertension, and evidence-based medicine.
Liangliang Liu received the MS degree from Henan University, China, in 2014, and the PhD degree from the School of Computer Science and Engineering, Central South University, Changsha, China, in 2020. He worked as a visiting scholar with Tulane University, New Orleans, Louisiana, from 2019 to 2020. He has been a distinguished professor with the College of Information and Management Science, Henan Agricultural University, Zhengzhou, China, since 2021. His research interests include machine learning, deep learning, and medical image analysis.
Hailin Yue received the BS degree from NanJing XiaoZhuang University, Nanjing, China, in 2019. He is currently working toward the MS degree in the School of Computer Science and Engineering, Central South University, Changsha, China. His research interests include machine learning, deep learning, and medical image analysis.
Junjian Li received the BS degree from the Hefei University of Technology, Hefei, China, in 2019. He is currently working toward the MS degree in the School of Computer Science and Engineering, Central South University, Changsha, China. His research interests include unsupervised learning, self-supervised learning, and medical image analysis.
Jianxin Wang (Senior Member, IEEE) received the BS and MS degrees in computer science and application from the Central South University of Technology, China, and the PhD degree in computer science and technology from Central South University, China. Currently, he is the dean and a professor in the School of Computer Science and Engineering, Central South University, Changsha, Hunan, China. He is also a leader in Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan, China. His current research interests include algorithm analysis and optimization, parameterized algorithm, bioinformatics and computer network. He has published more than 200 papers in various International journals and refereed conferences. He has been on numerous program committees and NSFC review panels, and served as editors for several journals such as the IEEE/ACM Transactions Computational Biology and Bioinformatics (TCBB), the International Journal of Bioinformatics Research and Applications, the Current Bioinformatics, the Current Protein & Peptide Science, the Protein & Peptide Letters.
Jun Liu is the director of Radiology Department of The Second Xiangya Hospital. He is also the leader of 225 subjects in Hunan Province, National member of the Neurology Group of the Chinese Society of Radiology, National Committee of the Neurology Group of the Radiological Branch of the Chinese Medical Association. His research interests include brain functional imaging, radiomics, and deep learning.
Contributor Information
Jianhong Cheng, Email: jianhong_cheng@csu.edu.cn.
Wei Zhao, Email: wei.zhao@csu.edu.cn.
Jin Liu, Email: liujin06@mail.csu.edu.cn.
Xingzhi Xie, Email: xingzhixie123@csu.edu.cn.
Shangjie Wu, Email: wushangjie@csu.edu.cn.
Liangliang Liu, Email: liuhau@126.com.
Hailin Yue, Email: yuehailin@mail.csu.edu.cn.
Junjian Li, Email: 194712147@mail.csu.edu.cn.
Jianxin Wang, Email: jxwang@mail.csu.edu.cn.
Jun Liu, Email: junliu123@csu.edu.cn.
References
- [1].WHO, “Novel coronavirus-China,” Website, Accessed: Jan. 12, 2020, 2020. [Online]. Available: https://www.who.int/csr/don/12-january-2020-novel-coronavirus-china/en/
- [2].WHO, “Coronavirus disease (COVID-19) pandemic,” Website, Accessed: Jan. 29, 2021, 2020. [Online]. Available: https://www.who.int/emergencies/diseases/novel-coronavirus-2019
- [3].CDC, “CDC diagnostic tests for COVID-19,” Website, Accessed: Aug. 5, 2020, 2020. [Online]. Available: https://www.cdc.gov/coronavirus/2019-ncov/lab/testing.html
- [4].Ai T. et al. , “Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: A report of 1014 cases,” Radiology, vol. 296, no. 2, pp. E32–E40, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Xie X., Zhong Z., Zhao W., Zheng C., Wang F., and Liu J., “Chest CT for typical coronavirus disease 2019 (COVID-19) pneumonia: Relationship to negative RT-PCR testing,” Radiology, vol. 296, no. 2, pp. E41–E45, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Zhao W., Zhong Z., Xie X., Yu Q., and Liu J., “Relation between chest CT findings and clinical conditions of coronavirus disease (COVID-19) pneumonia: A multicenter study,” Amer. J. Roentgenol., vol. 214, no. 5, pp. 1072–1077, 2020. [DOI] [PubMed] [Google Scholar]
- [7].Colombi D. et al. , “Well-aerated lung on admitting chest CT to predict adverse outcome in COVID-19 pneumonia,” Radiology, vol. 296, no. 2, pp. E86–E96, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Zhao W., Zhong Z., Xie X., Yu Q., and Liu J., “CT scans of patients with 2019 novel coronavirus (COVID-19) pneumonia,” Theranostics, vol. 10, no. 10, 2020, Art. no. 4606. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Bai H. X. et al. , “Performance of radiologists in differentiating COVID-19 from non-COVID-19 viral pneumonia at chest CT,” Radiology, vol. 296, no. 2, pp. E46–E54, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Fang Y. et al. , “Sensitivity of chest CT for COVID-19: Comparison to RT-PCR,” Radiology, vol. 296, no. 2, pp. E115–E117, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].Bernheim A. et al. , “Chest CT findings in coronavirus disease-19 (COVID-19): Relationship to duration of infection,” Radiology, vol. 295, no. 3, 2020, Art. no. 200463. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Liu M., Zeng W., Wen Y., Zheng Y., Lv F., and Xiao K., “COVID-19 pneumonia: CT findings of 122 patients and differentiation from influenza pneumonia,” Eur. Radiol., vol. 30, no. 10, pp. 5463–5469, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [13].Zhang Y.-D., Satapathy S. C., Zhang X., and Wang S.-H., “COVID-19 diagnosis via DenseNet and optimization of transfer learning setting,” Cogn. Comput., Springer, 2021. [Online]. Available: https://doi.org/10.1007/s12559-020-09776-8 [DOI] [PMC free article] [PubMed]
- [14].Wong H. Y. F. et al. , “Frequency and distribution of chest radiographic findings in COVID-19 positive patients,” Radiology, vol. 296, no. 2, pp. E72–E78, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [15].Wang S.-H., Nayak D. R., Guttery D. S., Zhang X., and Zhang Y.-D., “COVID-19 classification by CCSHNet with deep fusion using transfer learning and discriminant correlation analysis,” Inf. Fusion, vol. 68, pp. 131–148, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [16].Wang L., Lin Z. Q., and Wong A., “COVID-net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images,” Sci. Rep., vol. 10, no. 1, pp. 1–12, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Fang M. et al. , “CT radiomics can help screen the coronavirus disease 2019 (COVID-19): A preliminary study,” Sci. China Inf. Sci., vol. 63, no. 7, 2020, Art. no. 172103. [Google Scholar]
- [18].Chen H. J. et al. , “Machine learning-based CT radiomics model distinguishes COVID-19 from other viral pneumonia,” 2020. [Online]. Available: https://doi.org/10.21203/rs.3.rs-32511/v1 [DOI] [PMC free article] [PubMed]
- [19].Li L. et al. , “Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: Evaluation of the diagnostic accuracy,” Radiology, vol. 296, no. 2, pp. E65–E71, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [20].Bai H. X. et al. , “Artificial intelligence augmentation of radiologist performance in distinguishing COVID-19 from pneumonia of other origin at chest CT,” Radiology, vol. 296, no. 3, pp. E156–E165, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [21].Wang X. et al., “A weakly-supervised framework for COVID-19 classification and lesion localization from chest CT,” IEEE Trans. Med. Imag., vol. 39, no. 8, pp. 2615–2625, Aug. 2020.
- [22].Ouyang X. et al., “Dual-sampling attention network for diagnosis of COVID-19 from community acquired pneumonia,” IEEE Trans. Med. Imag., vol. 39, no. 8, pp. 2595–2605, Aug. 2020.
- [23].Kang H. et al., “Diagnosis of coronavirus disease 2019 (COVID-19) with structured latent multi-view representation learning,” IEEE Trans. Med. Imag., vol. 39, no. 8, pp. 2606–2614, Aug. 2020.
- [24].Han Z. et al., “Accurate screening of COVID-19 using attention-based deep 3D multiple instance learning,” IEEE Trans. Med. Imag., vol. 39, no. 8, pp. 2584–2594, Aug. 2020.
- [25].Shah V., Keniya R., Shridharani A., Punjabi M., Shah J., and Mehendale N., “Diagnosis of COVID-19 using CT scan images and deep learning techniques,” Emergency Radiol., vol. 28, pp. 497–505, 2021.
- [26].He K., Zhang X., Ren S., and Sun J., “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
- [27].Tan M. and Le Q., “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 6105–6114.
- [28].Zhang Y.-D., Satapathy S. C., Liu S., and Li G.-R., “A five-layer deep convolutional neural network with stochastic pooling for chest CT-based COVID-19 diagnosis,” Mach. Vis. Appl., vol. 32, no. 1, pp. 1–13, 2021.
- [29].Zhang X. et al., “Diagnosis of COVID-19 pneumonia via a novel deep learning architecture,” J. Comput. Sci. Technol., 2021. [Online]. Available: https://jcst.ict.ac.cn/EN/10.1007/s11390-020-0679-8
- [30].Liao F., Liang M., Li Z., Hu X., and Song S., “Evaluate the malignancy of pulmonary nodules using the 3-D deep leaky noisy-or network,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 11, pp. 3484–3495, Nov. 2019.
- [31].Hofmanninger J., Prayer F., Pan J., Röhrich S., Prosch H., and Langs G., “Automatic lung segmentation in routine imaging is primarily a data diversity problem, not a methodology problem,” Eur. Radiol. Exp., vol. 4, no. 1, pp. 1–13, 2020.
- [32].Le L., Patterson A., and White M., “Supervised autoencoders: Improving generalization performance with unsupervised regularizers,” in Proc. 32nd Int. Conf. Neural Inf. Process. Syst., 2018, vol. 31, pp. 107–117.
- [33].Li X., Morgan P. S., Ashburner J., Smith J., and Rorden C., “The first step for neuroimaging data analysis: DICOM to NIfTI conversion,” J. Neurosci. Methods, vol. 264, pp. 47–56, 2016.
- [34].Cheng J., Liu J., Liu L., Pan Y., and Wang J., “Multi-level glioma segmentation using 3D U-net combined attention mechanism with atrous convolution,” in Proc. IEEE Int. Conf. Bioinf. Biomed., 2019, pp. 1031–1036.
- [35].Liu L., Cheng J., Quan Q., Wu F.-X., Wang Y.-P., and Wang J., “A survey on U-shaped networks in medical image segmentations,” Neurocomputing, vol. 409, pp. 244–258, 2020.
- [36].Cheng J., Liu J., Jiang M., Yue H., Wu L., and Wang J., “Prediction of EGFR mutation status in lung adenocarcinoma using multi-source feature representations,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2021, pp. 1350–1354.
- [37].Zhang Y. and Yang Q., “A survey on multi-task learning,” IEEE Trans. Knowl. Data Eng., early access, Mar. 31, 2021. [Online]. Available: https://doi.org/10.1109/TKDE.2021.3070203
- [38].Sun S., “A survey of multi-view machine learning,” Neural Comput. Appl., vol. 23, no. 7/8, pp. 2031–2038, 2013.
- [39].Vincent P., Larochelle H., Lajoie I., Bengio Y., and Manzagol P.-A., “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” J. Mach. Learn. Res., vol. 11, no. Dec, pp. 3371–3408, 2010.
- [40].Wang Y., Yao H., and Zhao S., “Auto-encoder based dimensionality reduction,” Neurocomputing, vol. 184, pp. 232–242, 2016.
- [41].Cheng J. et al., “Multimodal disentangled variational autoencoder with game theoretic interpretability for glioma grading,” IEEE J. Biomed. Health Inform., 2021. [Online]. Available: https://doi.org/10.1109/JBHI.2021.3095476
- [42].Lu C., Wang Z.-Y., Qin W.-L., and Ma J., “Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification,” Signal Process., vol. 130, pp. 377–388, 2017.
- [43].Ioffe S. and Szegedy C., “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 448–456.
- [44].Srivastava N., Hinton G., Krizhevsky A., Sutskever I., and Salakhutdinov R., “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
- [45].Wang Y., Liu J., Xiang Y., Wang J., Chen Q., and Chong J., “MAGE: Automatic diagnosis of autism spectrum disorders using multi-atlas graph convolutional networks and ensemble learning,” Neurocomputing, 2021. [Online]. Available: https://doi.org/10.1016/j.neucom.2020.06.152
- [46].Cheng J., Liu J., Yue H., Bai H., Pan Y., and Wang J., “Prediction of glioma grade using intratumoral and peritumoral radiomic features from multiparametric MRI images,” IEEE/ACM Trans. Comput. Biol. Bioinf., 2020. [Online]. Available: https://doi.org/10.1109/TCBB.2020.3033538
- [47].Peng C.-Y. J., Lee K. L., and Ingersoll G. M., “An introduction to logistic regression analysis and reporting,” J. Educ. Res., vol. 96, no. 1, pp. 3–14, 2002.
- [48].Liaw A. et al., “Classification and regression by randomForest,” R News, vol. 2, no. 3, pp. 18–22, 2002.
- [49].Burges C. J., “A tutorial on support vector machines for pattern recognition,” Data Mining Knowl. Discov., vol. 2, no. 2, pp. 121–167, 1998.
- [50].Maaten L. V. D. and Hinton G., “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, no. Nov, pp. 2579–2605, 2008.
- [51].Mei X. et al., “Artificial intelligence–enabled rapid diagnosis of patients with COVID-19,” Nat. Med., vol. 26, no. 8, pp. 1224–1228, 2020.
- [52].Zheng C., “Time course of lung changes at chest CT during recovery from coronavirus disease 2019 (COVID-19),” Radiology, vol. 295, no. 3, pp. 715–721, 2020.