. 2021 Apr 26;23(6):1467–1497. doi: 10.1007/s10796-021-10131-x

Table 4.

Timing and data availability in research papers after the first peak of the COVID-19 pandemic (stage 4)

Reference	Dataset	COVID-19 data	Time interval	AI/ML method	Performance	Relevance	Shortcoming
Doanvo et al. (2020)	CORD-19 dataset (Lu Wang et al. 2020)	text. research paper collection. 48,670 COVID-19 papers vs 137,326 overall papers	before Jul 31, 2020	NLP & SVD & LDA	N.A.	ML explores latent semantic information to recognize hidden patterns and does not rely on any a priori knowledge of topics.	LDA is an unsupervised probabilistic algorithm and lacks the quality of a supervised method.
Ramchandani et al. (2020)	SafeGraph; Mapbox and The New York Times GitHub repository	time series. Features related to: population attributes, population activities, mobility, and disease spread. 2,100 sociodemographic features and others	Apr 5 – June 28, 2020	deep learning model based on the high-level framework of DeepFM	average ACC: 63.7%	Method can derive embeddings from multivariate time series and multivariate spatial time series data by using both the temporal and spatial structure in a wide range of input features.	No suitable method for interpreting second-order interactions; higher-order interactions are only indirectly captured and cannot therefore be easily interpreted.
Kim et al. (2020)	Google Search Trend; and datasets in Data description sec.	time series. Intra-country and inter-country time series. Daily cases and deaths with anxiety search trend; daily roaming entrants, airlines arriving, count of imported cases	Mar 22 – May 5, 2020	Two-level hierarchical architecture of the Hi-COVIDNet model, which mainly consists of a country-level encoder and a continent-level encoder.	RMSE: 0.4045; RMSE (ARIMA): 0.4931; RMSE (multiv. LSTM): 0.5188	Exploits the geographic hierarchy as well as a hierarchical objective function to overcome a relatively short period of data collection for COVID-19.	Further testing is needed on data from other countries.
Minaee et al. (2020)	COVID-19 Image Data Collection (Cohen et al. 2020b); Chestxray and ChexPert datasets	images. X-Ray images. 5,184 X-ray images including 184 COVID-19	before May 3, 2020	ResNet18, ResNet50, SqueezeNet, and DenseNet-121	Specificity: ResNet18: 90.7% ± 1.1%; ResNet50: 89.6% ± 1.1%; SqueezeNet: 92.9% ± 0.9%; DenseNet-121: 75.1% ± 1.5%	Made a dataset of 5k images publicly available. Used transfer learning and fine-tuning on pre-trained convolutional models.	Only a benchmark for future works and comparisons. Considering the amount of labeled data, the outcome of the work is still preliminary, and a more definitive conclusion needs more tests on a larger dataset of COVID-19 labeled X-ray images.
Horry et al. (2020)	COVID-19 Image Data Collection (Cohen et al. 2020b); COVID-CT and POCOVID-Net datasets	images. a) chest X-Rays, b) CT scans and c) Ultrasound images. a) 729 patients including 139 COVID-19 patients; b) 746 patients including 349 COVID-19 patients; c) 911 patients including 339 COVID-19 patients	before May 11, 2020	Model selection with VGG19 and others	F1 score: a) 0.84-0.89, b) 0.81-0.83, c) 0.96-1.00	Provided a pre-processing pipeline aimed at removing the sampling bias and improving image quality. Showed that pre-trained models can be tuned effectively for the Ultrasound image samples.	Great caution is needed in developing clinical diagnostic models with the available COVID-19 image dataset. The study is to be extended with multimodal data fusion. A highly curated data set is not comparable to the available COVID-19 chest X-Ray dataset.
Jin et al. (2020)	3 centers in Wuhan; MosMedData, Tianchi-Alibaba, LIDC–IDRI databases	images. CT scans. 11,356 CT scans from 9,025 subjects of which 2,529 were COVID-19 scans	Feb 5 – Mar 29, 2020	i) Segmentation module based on U-Net, ii) Deep network backbone ResNet152, iii) Guided Grad-CAM for attentional regions.	AUC: 0.9745 - 0.9885; Dice (segmentation): 92.55%	AI system outperforms all of the radiologists. Unlike classical black-box deep learning approaches, it can decode an effective representation of COVID-19 on CT imaging.	Guided Grad-CAM can only extract attention regions rather than perform lesion segmentation. It is important to collect more data and build a wide data set with linked CT and clinical information to allow additional diagnostic analysis.
Sadefo Kamdem et al. (2020)	Boursorama database; COVID-19 dataset	time series. Daily prices for 4 trading commodities; confirmed cases and total deaths in 2 countries	before Apr 24, 2020	ARIMA-WBF model and LSTM model	ACC: 92.13% - 97.45%	Forecasting commodity prices and examining the effect of coronavirus on commodity price fluctuations.	Application of reinforcement learning. Price overreaction behavior is not analyzed.
Ou et al. (2020)	EIA weekly gasoline demand; datasets in Data availability sec.	time series. COVID-19 pandemic data, government policies and demographic information	Feb 15 – June 5, 2020	PODA model with 42 inputs, 2 layers, and 25 hidden nodes for each layer	RMSE: 6.2 - 65.2	Framework to investigate and project motor gasoline demand based on COVID-19 pandemic impacts.	Model does not consider the dynamic effect of travel mobility and assumes that the demand for gasoline from light-duty vehicles and other sectors is constant throughout the pandemic.
Wang et al. (2021b)	RSNA Pneumonia Detection Challenge & Cohen Dataset	multimedia chest X-ray. 3,545 chest X-ray images, with 225 COVID-19	Jan 25 – May 1, 2020	ResNet50 + feature pyramid network (FPN)	ACC: 95.12%	i) Automatically identifies COVID-19 patients with X-rays. ii) Automatic lung detection. iii) Proposes a CAD tool for assisting in the processing of large-scale chest X-ray data.	Lacks interpretability and does not address disease classification by severity.
Gupta et al. (2021)	Five different virus diseases (i.e., Covid-19, Ebola, MERS, SARS, Swine flu)	time series. Retrieved numbers of affected cases and deaths from the Coronavirus disease 2019 (COVID-19) time series	Jan 22 – Oct 9, 2020	Dense layers with LSTM	RMSE: 0.0766-0.0533; RMSE (SVM): 0.2801-0.4323; RMSE (DT): 0.1108-0.1223	The proposed DL model has been compared to other popular prediction methods and shows a lower RMSE.	For the model to work perfectly, the data must be accurate.

The time intervals generally extend from March to October 2020. There is a wide variety of data sets, with different kinds of diagnostic images and social data. In particular, the images derive from both CT and X-Ray scans, and many researchers use the Cohen et al. (2020b) X-Ray dataset. In this stage, data are freely available and are generated not only by hospitals: a wide variety of social data can be found, e.g. gasoline demand (Ou et al. 2020) and the number of incoming visitors to a country (Kim et al. 2020). The earlier time interval of Jin et al. (2020), compared with the other Stage 4 studies of the proposed temporal subdivision, is explained by the fact that the authors refer to the situation in their own country, China, where the pandemic was already at an advanced state with respect to the rest of the world.
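
The Performance column of Table 4 mixes several metrics (RMSE, ACC, specificity, F1 score), so the reported values are not directly comparable across rows. As a reminder of what each one measures, the following is a minimal Python sketch of their standard textbook definitions. The function names and the toy inputs are illustrative assumptions and are not taken from any of the surveyed papers, which may compute or average these metrics differently.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error for regression or forecasting outputs."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred, positive=1):
    """Fraction of all samples that are classified correctly (ACC)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return (tp + tn) / (tp + fp + fn + tn)


def specificity(y_true, y_pred, positive=1):
    """Fraction of true negatives among all actual negatives."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return tn / (tn + fp)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return 2 * tp / (2 * tp + fp + fn)


if __name__ == "__main__":
    # Toy labels only; 1 stands for a hypothetical "COVID-19 positive" class.
    labels = [1, 1, 0, 0]
    predictions = [1, 0, 0, 1]
    print(accuracy(labels, predictions))
    print(specificity(labels, predictions))
    print(f1_score(labels, predictions))
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

On strongly imbalanced data, such as the image collections in Table 4 where COVID-19 cases are a small minority, accuracy alone can look high even when few positive cases are detected, which is one reason some of the studies report specificity or F1 score instead.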