2021 Apr 26;23(6):1467–1497. doi: 10.1007/s10796-021-10131-x

Table 3.

Summary of selected research papers about COVID-19 and data used within those studies

| Reference | Dataset | COVID-19 data | Time interval | AI/ML method | Performance | Relevance | Shortcoming |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Car et al. (2020) | JHU CSSE | Time series. Infected, recovered, and deceased patients. 20,706 data points for 406 locations | Jan 22 – Mar 12, 2020 | MLP regressor. Limited-memory BFGS (Broyden–Fletcher–Goldfarb–Shanno algorithm) | R² (confirmed): 0.94; R² (recovered): 0.781; R² (deceased): 0.986, on 5-fold cross-validation | Model of novel viral infections with geographical and time data as inputs. Average training time 2357 min on 16 48-thread HPC nodes, for 5-fold cross-validation and grid search of 5376 items. | Models can be compared with various infectious diseases. Other approaches should be applied to gain explainability. |
| Zhu et al. (2020) | Huami Wearable Device | Time series. Physiological data. 1.3 million users (with or without COVID-19) | Jul 1, 2017 – Apr 8, 2020 | Regression model combining sparse categorical features and dense numerical features (CDNet), which concatenates 2 subnetworks: CatNN and DenNN | Pearson's coefficient ρ: 0.68 | Prediction using dynamic physiological data may have an advantage in recognition of the outbreak of infection. | The validity of the statistical description depends on both the user scale and diversity. |
| Ghamizi et al. (2020) | Google's Mobility Reports | Time series. 32 features (mobility trends over time and demographic features) for 97 different countries. 4,625 inputs of 32 features each | Jan 3 – Apr 29, 2020 | Feed-Forward Neural Network (FFNN) | R²: 0.97 vs R² (LSTM): 0.95 | FFNN provides accurate and interpretable predictions. | Better feature engineering or neural architecture search (with CNN or RNN) |
| Mackey et al. (2020) | Twitter and Instagram | Text. Sales of COVID-19 related products. 1,042 unique tweets and 596 Instagram posts | Feb 5 – May 7, 2020 | NLP & RNN and LSTM | AUC: 94–99 (based on Li et al. (2019)) | Identified over 1000 suspect selling posts | Multimodal methods that could analyze and distinguish both text and image have not been used. |
| Murphy et al. (2020) | Netherlands Hospitals | Images. Chest X-rays. 994 images including 512 images from COVID-19 positive subjects | Mar 4 – Apr 6, 2020 | CAD4COVID-Xray, based on CAD4TB v6, a commercial deep learning system | AUC: 0.81; Spec: 78%; Sens: 75% | Performance compared against 6 independent readers | Need to take into account related patient details. |
| Ls et al. (2020) | 2 hospitals in China | Images. CT scans. 408 COVID-19 patients | Jan 1 – Mar 18, 2020 | ResNet34 as a backbone model for multiple instance learning (MIL) framework training procedure | (ROC) AUC: 0.987; ACC: 97.4%, on 5-fold cross-validation | Model can be employed as a tool for prognosis prediction. Validated a MIL-based predictive model using CT imaging. | i) Sample size was relatively small; ii) Lack of transparency and interpretability (like all DL models) |
| Zhang et al. (2020) | Wuhan and Ecuador centers; Radiopaedia dataset | Images. CT images. 2,246 patients including 752 COVID-19 patients used for training | Jan 25 – Mar 25, 2020 | (i) Segmentation networks: U-net, DRUNET, FCN, SegNet, and DeepLabv3. (ii) Classification networks: ResNet-18 | ACC = 90.71%; Sens = 92.50%; Spec = 90.00% | Performance comparable to that of practicing radiologists. | To refine the clinical prognostic model with varying risk thresholds associated with different clinical prognoses. |
| Abdel-Basset et al. (2021a) | Italian Society of Medical and Interventional Radiology | Images. CT images. 80 COVID-19 patients used for image segmentation | Before Apr 11, 2020 | Few-shot segmentation (FSS) with four encoder blocks based on pre-trained Res2Net-50 | DSC: 0.798; Sens: 0.803; Spec: 0.986 | Model could outperform all approaches on multiple evaluation metrics | i) Comprehensive parameter improving to attain the highest results; ii) Predictions lack laborious uncertainty quantification, unable to achieve a very precise segmentation; iii) Accountability and interpretability need to be improved. |
| Roy et al. (2020) | ICLUS-DB | Video. Lung ultrasound (LUS) videos. 35 patients (including 17 COVID-19 patients) generating 277 videos | Mar – Apr, 2020 | ConvNet similar to van Sloun and Demi (2020); B-line, STN and CNN are jointly trained using the Adam optimizer | ACC: 96%; binary Dice score: 0.75 | i) Fully annotated dataset of LUS images; ii) Predicts the disease severity score associated with an input frame. | i) Leveraging the temporal structure between frames in a sequential model; ii) The data set should be wider and more balanced |
| Banerjee et al. (2020) | Hospital in Brazil | Time series. Laboratory test clinical data: age, outcome from SARS-CoV-2 test and standard full blood count (15 features). individual patients, including 81 COVID-19 patients | Mar 28 – Apr 3, 2020 | i) ANN; ii) random forest (RF) and Lasso-elastic-net regularized generalized linear (glmnet); iii) simple logistic regression (LR) | (i) (ROC) AUC: 0.95 ± 0.08; (ii) (ROC) AUC: 94%; (iii) (ROC) AUC: 81% | Improve initial screening for patients with limited PCR-based diagnostic tools. | Random forests and glmnet offer a clearer overview of the most relevant factors, compared to ANN, as well as a better indicator of how a decision has been reached. |
| Pan et al. (2021) | 2 isolation centers of Huazhong University of Science and Technology in Wuhan | Multimedia. Chest CT scans. 931 confirmed COVID-19 vs 1340 healthy persons | Until Mar 31, 2020 | COVID-Lesion Net based on a combination of U-net and fully convolutional networks | Dice coefficient: 82.08%; 85.00% for the training | Deep learning-based quantification for COVID-19, quantification of the lung volume and the percent of the lung involvement. | i) Performance measured against no standard for the lesion area quantification for viral pneumonia; ii) Not multi-center training |
| Ismael and Şengür (2021) | Three different sources (Cohen, Kaggle, Radiology Assistant) | Multimedia. Chest X-ray images. 180 COVID-19 and 200 normal (healthy) chest X-ray images | Mar 10, 2020 | Deep features model (ResNet50) and SVM with linear kernel | 94.7% accuracy; other: 89.1%–90.3% | Three CNN deep methods have been applied. In addition to different kernel functions, the deep features have been classified through SVM. | More testing needed. |
| Lopez-Rincon et al. (2021) | NCBI database of genetic variation and NGDC (National Genomics Data Center) | Sequences. 583 sequences (*.fasta files) from the NGDC | Mar 15, 2020 | CNN | Accuracy of 98.73 | The network was able to systematically discover significant sequences to isolate the various virus classes. | Further testing is necessary |
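
The Performance column mixes several metrics (R², accuracy, sensitivity, specificity, Dice score). As a reading aid, the following is a minimal sketch of how these quantities are conventionally computed from predictions. It is not taken from any of the cited studies; the function names and the toy data are hypothetical, and it assumes binary labels in which 1 marks the positive class and both classes are present.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1
        elif not t and not p:
            tn += 1
        elif p:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and Dice score.

    Assumes at least one positive and one negative in y_true,
    otherwise the ratios below would divide by zero.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "dice": 2 * tp / (2 * tp + fp + fn),    # overlap of predicted and true positives
    }


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data, for illustration only.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(binary_metrics(labels, preds))
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

For segmentation tasks the same Dice formula is applied per pixel or voxel rather than per image, which is one reason Dice values in the table are not directly comparable with the classification accuracies.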

During the first peak of the COVID-19 pandemic (stage 3), the most affected countries were in Europe and America, and therefore the databases generally come from these areas. The first examples of social network analysis are reported, with a limited number of instances. The temporal windows during which the data were gathered extend until April 2020. Although the time interval reported for Mackey et al. (2020) runs to May 7, 2020, the study is assigned to stage 3 because the pandemic in the USA was still in its early stages at that time.