Elsevier - PMC COVID-19 Collection
2021 Aug 21;29:104631. doi: 10.1016/j.rinp.2021.104631

Modeling and forecasting the COVID-19 pandemic with heterogeneous autoregression approaches: South Korea

Eunju Hwang, SeongMin Yu
PMCID: PMC8378995  PMID: 34458082

Abstract

This paper presents a time series analysis of COVID-19 in South Korea. We adopt heterogeneous autoregressive (HAR) time series models and discuss statistical inference for various COVID-19 data. Seven data sets are analyzed: the cumulative confirmed (CC), cumulative recovered (CR) and cumulative death (CD) cases, the recovery rate, the fatality rate, and the infection rates for 14 and 21 days. The orders of the HAR models are selected by evaluating the root mean square error (RMSE) and mean absolute error (MAE) as well as $R^2$, AIC, and BIC. For the estimated models we provide coefficient estimates, standard errors and 95% confidence intervals. The fitted values from the HAR models match the real cumulative cases well, and the differenced fitted values match the real daily cases well. Additionally, because the CC and CD cases are strongly correlated, we use a bivariate HAR model for these two data sets. Out-of-sample forecasting is carried out on the COVID-19 data sets to obtain multi-step-ahead predicted values and 95% prediction intervals. Forecasting performance is evaluated with four accuracy measures: RMSE, MAE, mean absolute percentage error (MAPE) and root relative square error (RRSE). The contributions of this work are threefold. First, the HAR models are shown to fit the cumulative COVID-19 series well, with good criterion results. Second, a variety of COVID-19 series are analyzed: confirmed, recovered and death cases, as well as the related rates. Third, the forecast accuracy measures take small values, from which we conclude that the HAR model is a good prediction model for COVID-19.

Keywords: COVID-19, Heterogeneous autoregressive model, Estimation, Prediction

Introduction

The coronavirus disease 2019 (COVID-19), which emerged in December 2019, has been seriously threatening human life. The World Health Organization (WHO) declared the COVID-19 outbreak a pandemic in March 2020. Governments around the world have tried to control the spread of the virus, and medical institutions are committed to finding vaccines and treatments. Researchers in many fields are also trying to support health systems in preventing the spread of infection. Nevertheless, as of May 1, 2021, the COVID-19 pandemic has infected more than 151 million people worldwide and caused 3 million deaths.

As one of the academic efforts to overcome the crisis, mathematical modeling and forecasting of COVID-19 cases have been extensively carried out by statisticians and health scientists. The earliest time series work on COVID-19 was done in February 2020 by [1], who used an autoregressive integrated moving average (ARIMA) model to predict the prevalence and incidence of COVID-19. The ARIMA model is well known as a powerful and effective model for epidemic diseases; see Table 1 of [2]. The author of [2] also used ARIMA time series models for estimation of COVID-19 in Italy, Spain and France. Many authors, for instance [3], [4], [5], [6], [7], [8], have adopted ARIMA models for COVID-19 analysis. [3] studied time series analysis and forecasting with ARIMA models and neural network techniques for eight major European countries, whereas [4] compared ARIMA with various machine learning regression models for short-term forecasting of cumulative confirmed cases in Brazil. Also, [5] adopted the ARIMA and Prophet time series forecasting models and evaluated their prediction performance for the 10 most affected countries; the Prophet model was proposed by [9] and combines several linear and non-linear functions of time as components. [6] identified the best ARIMA model and applied it to forecast the future incidence of COVID-19 cases in India. Moreover, [7] used the Box–Jenkins method to find the best ARIMA model for predicting the number of people infected with COVID-19 in Iraq. [8] presented numerous statistical analyses, including Box–Jenkins ARIMA, fractal interpolation and fractal dimension, to forecast the future numbers of daily deaths and infections in European and African countries.

As a more general or more refined probability model, [10] adopted an AR model with two-piece scale-mixed normal distributed errors (TP-SMN-AR), while [11] adopted a TP-SMN-ARMA process, in order to capture the asymmetry of the distribution of the COVID-19 data. [12] introduced the autoregressive distributed lag (ADL) model to describe the development of the disease in the exponential phase. The ADL model allows describing non-monotonic changes in relative infection over time and predicting the outcomes of public health decisions. [13] proposed a mathematical model to characterize aspects of the COVID-19 pandemic in South Korea, Italy and Brazil, and compared the main features of the three countries as examples of very different pandemic scenarios. [14] examined the pandemic in India via sensitivity analysis, whereas [15] examined it in China via an optimization method. [16] dealt with the evolution of COVID-19 infection data with power-law growth and saturation, noting that the total number of infected cases exhibits exponential growth and then power-law growth before the curve flattens. [17] proposed a new gray prediction model based on the traditional Richards model and a modified gray action quantity for COVID-19 in China, Italy, Britain and Russia. [18] used an efficient prediction model based on the Verhulst equation to analyze COVID-19 in China, Italy and Spain.

Some efforts in the area of artificial intelligence have been made to analyze the COVID-19 epidemic. For instance, [19] predicted the number of COVID-19 daily cases in Turkey through a fuzzy rule-based system; [20] used Gaussian process regression, a recently developed machine learning technique, to predict COVID-19 death cases and compared its prediction ability with that of supervised artificial neural network methods. [21] implemented several forecasting techniques, such as the naive method, moving average, exponential smoothing, the Holt–Winters method and ARIMA, to compare predictions of COVID-19 worldwide cases. [22] proposed a new genetic programming based model for confirmed cases and death cases in India. [23] proposed a novel technique based on the meta-heuristic GWO (gray wolf optimizer) algorithm to optimize the hyperparameters of an LSTM (long short-term memory) network and compared it with baseline models including the ARIMA model. [24] proposed a time series prediction model using machine learning to obtain the disease curve and forecast the epidemic tendency, using linear regression, multi-layer perceptron, random forest and support vector machine (SVM) methods. [25] examined the advantages of Singular Spectrum Analysis (SSA) for forecasting the numbers of daily confirmed cases, deaths, and recoveries caused by COVID-19; the results of V-SSA and R-SSA were compared to those from ARIMA, ARFIMA, Exponential Smoothing, TBATS, and NNAR. [26] studied parameter estimation and prediction of the COVID-19 epidemic using SIR/SQAIR models. [27] presented a COVID-19 prediction study with artificial neural networks for the daily numbers of cases and deaths per million people, which differs from the other works mentioned above. [28], [29], [30], [31], [32] investigated forecasting for COVID-19 as well.

As for the infection fatality rates of COVID-19, [33], [34], [35], [36], [37], [38], [39] discussed empirical results. [33] calculated an estimate of the infection fatality rate by dividing the number of deaths by the total number of infected cases from seroprevalence data. [34] presented empirical results on infection and fatality rates from cross-country regression and found a significant positive impact of local air pollution on the rates. [39] forecasted COVID-19 growth rates with statistical, epidemiological, machine-learning and deep-learning models, and with a new hybrid forecasting method based on nearest neighbors and clustering. Meanwhile, [40], [41] analyzed the COVID-19 data curve and [42], [43], [44] dealt with finance and economics related to COVID-19.

Fortunately, COVID-19 vaccines have recently been developed and distributed worldwide. Even though vaccines are now available almost everywhere, the numbers of confirmed cases are not decreasing, and the virus still threatens human life and health systems. At the beginning, South Korea was reported to control the disease very well, as [13] mentioned. However, as winter came with low temperatures, South Korea faced the crisis of the COVID-19 pandemic like other countries, and even as spring arrives the number of confirmed cases has not declined. The ceaseless spread of COVID-19 leads us to study accurate prediction of the COVID-19 case time series, in order to provide useful information and help determine government policy. Accurate forecasts can support decisions on policies such as social distancing stages and on preparing health systems to accommodate the expected numbers of patients.

In this work we explore Korean COVID-19 pandemic time series, namely the confirmed, released (recovered) and death cases together with their related rates, and examine estimation and forecasting by means of a popular time series model. The model we adopt is the heterogeneous autoregressive (HAR) model, which is known to be powerful for long-memory financial volatility. The COVID-19 time series exhibit the long-memory feature, as seen in the ACFs in Fig. 1, which is the first reason we employ HAR models. Criteria such as $R^2$, AIC and BIC are computed for HAR models of order $p=2,3,4$. The second reason is that the coefficient of determination $R^2$ of the HAR model is near one for the cumulative series. $R^2$ measures the global fit of the model, and $R^2=1$ implies that the relative square error is zero. For these two reasons, the long memory of the COVID-19 data and the $R^2$-values of the fitted HAR models, we use HAR models rather than other models to study estimation and forecasting of the COVID-19 cases.

Fig. 1.

Korean COVID-19 time series data and their ACFs; Cumulative Confirmed, Cumulative Recovered and Cumulative Deaths cases (first row); Recovery Rate and Fatality Rate (second row); Infection Rates for 14 days and for 21 days (third row).

We handle seven data sets: the cumulative confirmed (CC), cumulative recovered (CR) and cumulative death (CD) cases, the recovery rate (RR), the fatality rate (FR), and the infection rates (IRs) for 14 and 21 days. The rates are treated as time-varying series and defined as follows. The RR on a day is the ratio of cumulative recovered cases to cumulative confirmed cases, and the FR is the ratio of cumulative death cases to cumulative confirmed cases. The IR on a day over a period of $\tau$ days is the proportion of confirmed cases on that day to the sum of confirmed cases over the last $\tau$ consecutive days; we set $\tau=14$ and $21$ because the incubation period of COVID-19 is between 14 and 21 days according to [45]. Our results show that the $R^2$-values of the HAR models fitted to the CC, CR, CD and RR are equal to one and that of the FR is 0.999, whereas the $R^2$-values of the IRs are below one but greater than 0.8, and their HAR models still perform well.

The orders of the HAR models are selected by evaluating the root mean square error (RMSE) and mean absolute error (MAE) as well as $R^2$, AIC, and BIC. For estimation we use the ordinary least squares estimator (OLSE) in the HAR models. The asymptotic normality of the OLSE in the HAR model was established, together with a Monte-Carlo simulation study, by [46]; we therefore defer to [46] for the simulation study and omit it in this work. Estimates of the parameters in the HAR models are computed along with standard errors and 95% confidence intervals, based on the theory of [46]. Our results show that the fitted values from the HAR models with the OLSEs match the real cumulative case data well, and that the differenced fitted values match the real daily case data well. Indeed, the proposed model yields small values of RMSE and MAE on the COVID-19 data, and is therefore an efficient and reliable tool for predicting COVID-19.

Additionally, because the CC and the CD cases are strongly correlated, we consider a bivariate HAR time series model to see the co-movements of a pair of the confirmed and death cases. By comparing the bivariate HAR model with the univariate HAR models, applied to the CC and the CD, we determine better prediction models for the two cases. It is demonstrated that the bivariate HAR model has smaller errors than univariate HAR models.

Out-of-sample forecasting is conducted on the seven COVID-19 data sets to compute multi-step-ahead predicted values and 95% prediction intervals. Four forecasting performance measures are evaluated: RMSE, MAE, mean absolute percentage error (MAPE) and root relative square error (RRSE). Most prediction intervals include the real values, except for the FR, which suddenly flattens after the start of the out-of-sample period. Even so, the predicted values and prediction interval of the FR follow a reasonable pattern, although the actual values in the out-of-sample period fall outside the interval. The CC and CD cases, to which both the univariate and the bivariate models are applied, have different prediction intervals under the two models. This is because in the bivariate HAR model the confirmed cases have the death cases as regressors, and the number of death cases increases very slowly after the start of the out-of-sample period, whereas in the univariate HAR models the two series are modeled independently. In the bivariate prediction, the forecast errors for the CC are similar to or smaller than those in the univariate prediction, and all the errors for the CD are smaller. Consequently, as expected, the bivariate HAR model performs better than the univariate models.

The main findings of this work are threefold. First, HAR models fit the cumulative COVID-19 series well, with good criterion results such as a coefficient of determination $R^2$ near one, which means that the relative square error is near zero. Second, this study analyzes a variety of COVID-19 time series: confirmed, recovered and death cases, and the related rates (recovery rate, fatality rate, and infection rates for 14 and 21 days), for all of which the HAR model can be adopted with good performance. Third, the forecast accuracy measures RMSE, MAE, MAPE and RRSE take small values, from which we conclude that the HAR model is a good prediction model for the COVID-19 pandemic.

The rest of the paper is organized as follows: In Section “Heterogeneous autoregressive (HAR) models” the HAR model and its background are briefly described. In Section “Estimation” data description and estimation are explored while in Section “Forecasting” forecasting is investigated. Conclusion and discussion are given in Section “Conclusion and discussion”.

Heterogeneous autoregressive (HAR) models

This section introduces briefly the heterogeneous autoregressive (HAR) model, which we adopt for analysis of Korean COVID-19 series. To do this, some background is stated here.

One of the theories that explain financial markets is the heterogeneous market hypothesis, proposed for stock markets by [47]. The theory assumes that investors react differently to the same investment information. Dealers and traders, who trade over short horizons, watch the short-term situation in the stock market. Investors who hold stocks as assets over the long term are sensitive to government policy changes but not very sensitive to the frequency of stock trading. Market participants thus take note of long-term events along with share price volatility. Such a market, in which investment decisions are made on different time horizons, is called a heterogeneous market.

[47] argued that stock market volatility should be analyzed using the heterogeneous market hypothesis to identify dynamic changes in the market. Studies based on the heterogeneous market hypothesis include the HARCH (heterogeneous autoregressive conditional heteroscedastic) model of [47] and the HAR-RV (heterogeneous autoregressive-realized volatility) model of [48]. Inspired by the heterogeneous market hypothesis and the HARCH model, [48] proposed the HAR-RV model, a linear autoregressive model with daily, weekly, and monthly moving averages that is theoretically equivalent to a (restricted) autoregressive model of order 22. The HAR-RV model uses not only the heterogeneous market hypothesis of [47] but also the realized volatility of [49], and it represents heterogeneity over daily, weekly and monthly periods. A main feature of the HAR model is long memory: it represents the long-memory characteristics of volatility well and is highly regarded for its good predictive performance on realized volatility.

The HAR-RV model of [48] is described as follows:

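$$RV_t = \alpha_0 + \alpha_d RV_{t-1} + \alpha_w RV_{t-5:t-1} + \alpha_m RV_{t-22:t-1} + \varepsilon_t$$

where $RV_{t-5:t-1}$ and $RV_{t-22:t-1}$ are the weekly and monthly average realized volatilities, and the parameters $\alpha_d$, $\alpha_w$ and $\alpha_m$ are the coefficients for the daily, weekly and monthly components, respectively.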
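The correspondence between the HAR-RV model and an autoregression of order 22 can be made explicit. The following is a sketch, assuming that the weekly and monthly terms are simple arithmetic means over the previous 5 and 22 days; the coefficients $c_j$ are notation introduced here for illustration only.

```latex
\begin{aligned}
RV_{t-5:t-1} &= \frac{1}{5}\sum_{j=1}^{5} RV_{t-j}, \qquad
RV_{t-22:t-1} = \frac{1}{22}\sum_{j=1}^{22} RV_{t-j}, \\
RV_t &= \alpha_0 + \sum_{j=1}^{22} c_j\, RV_{t-j} + \varepsilon_t, \\
c_j &=
\begin{cases}
\alpha_d + \dfrac{\alpha_w}{5} + \dfrac{\alpha_m}{22}, & j = 1, \\[4pt]
\dfrac{\alpha_w}{5} + \dfrac{\alpha_m}{22}, & 2 \le j \le 5, \\[4pt]
\dfrac{\alpha_m}{22}, & 6 \le j \le 22 .
\end{cases}
\end{aligned}
```

Under this reading, the 22 autoregressive coefficients are determined by only three free parameters, which is what makes the model parsimonious while still reaching far back in time.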

In this work, we adopt the HAR model to investigate estimation and forecasting of the Korean COVID-19 cumulative case series and related data. We consider two types of HAR models: univariate and bivariate. The cumulative case data have long-memory features, as seen in Fig. 1, and thus HAR models are appropriate for them. Theoretical analysis and simulation studies of the HAR model can be found in recent works, for example [46]. We describe the univariate and bivariate HAR models below.

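A (univariate) HAR model $\{X_t, t\in\mathbb{Z}\}$ of order $p$ is given by

$$X_t = \phi_1 X_{t-1}^{(1)} + \phi_2 X_{t-1}^{(2)} + \cdots + \phi_p X_{t-1}^{(p)} + \varepsilon_t \tag{1}$$

where $X_{t-1}^{(i)} = \frac{1}{h_i}\left(X_{t-1} + \cdots + X_{t-h_i}\right)$, $i=1,2,\ldots,p$, with positive integers $\{h_i, i=1,2,\ldots,p\}$ satisfying $1=h_1<h_2<\cdots<h_p$, and $\{\varepsilon_t\}$ is a sequence of random variables with mean zero and variance $\sigma_\varepsilon^2$. $\phi_1,\ldots,\phi_p$ are the coefficients of the model to be estimated, and it is assumed that $\sum_{i=1}^{p}\phi_i<1$.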
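The averaged lags in model (1) can be built directly from a series. The following is a minimal sketch in Python with numpy; the function name `har_regressors` and the toy input are illustrative assumptions, not taken from the paper's own code.

```python
import numpy as np

def har_regressors(x, lags=(1, 7, 14, 21)):
    """Return (y, W) for model (1).

    y holds the targets X_t; column i of W holds X_{t-1}^{(i)}, the mean of
    the h_i observations immediately before time t.
    (Hypothetical helper, written for illustration only.)
    """
    x = np.asarray(x, dtype=float)
    h_max = max(lags)
    # csum[k] = x[0] + ... + x[k-1], so window sums are differences of csum.
    csum = np.concatenate(([0.0], np.cumsum(x)))
    t = np.arange(h_max, len(x))  # positions that have a full h_max history
    W = np.column_stack([(csum[t] - csum[t - h]) / h for h in lags])
    return x[t], W

# Toy series 0, 1, ..., 29 with lag structure h = (1, 7).
x_demo = np.arange(30.0)
y_demo, W_demo = har_regressors(x_demo, lags=(1, 7))
```

The first `h_max` observations serve only as history, which mirrors the convention that the series is observed from time $-h_p+1$ while the regression uses $t=1,\ldots,n$.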

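A bivariate HAR model $\{(X_t,Y_t), t\in\mathbb{Z}\}$ of orders $(p,q)$ is defined as

$$\begin{aligned}
X_t &= \alpha_{11}X_{t-1}^{(1)} + \cdots + \alpha_{1p}X_{t-1}^{(p)} + \beta_{11}Y_{t-1}^{(1)} + \cdots + \beta_{1q}Y_{t-1}^{(q)} + \varepsilon_{1,t} \\
Y_t &= \alpha_{21}X_{t-1}^{(1)} + \cdots + \alpha_{2p}X_{t-1}^{(p)} + \beta_{21}Y_{t-1}^{(1)} + \cdots + \beta_{2q}Y_{t-1}^{(q)} + \varepsilon_{2,t}
\end{aligned} \tag{2}$$

where $Y_{t-1}^{(i)}$ is defined in the same way with $h_i$, $i=1,2,\ldots,q$; $\{\varepsilon_{1,t}\}$ and $\{\varepsilon_{2,t}\}$ are independent noise processes with mean zero. The coefficients $\alpha_{ji},\beta_{jk}$ are assumed to satisfy $\sum_{i=1}^{p}\alpha_{ji}+\sum_{k=1}^{q}\beta_{jk}<1$ for each $j=1,2$. The asymptotic properties of the least squares estimators in the bivariate HAR model, as well as a simulation study, were established by [46] along with an application to financial data. In this work we adopt the HAR models in order to analyze Korean COVID-19 pandemic time series data such as the confirmed, recovered and death cases and their related rates.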

Estimation

Data

In this work, several important COVID-19 case data sets are considered for estimation and forecasting. The COVID-19 data used in this paper are available at https://coronaboard.kr/ and http://www.ecdc.europa.eu/en/covid-19-pandemic. For all computations we use Python 3.8 with numpy, scipy and statsmodels.tsa. Each data set is fitted with a univariate HAR time series model, and in addition a pair of data sets is fitted with the bivariate HAR model. The basic series are the cumulative confirmed (CC), cumulative recovered (CR) and cumulative death (CD) cases. As related rates, the recovery rate (RR), the fatality rate (FR) and the infection rates (IR) for $\tau$ days ($\tau=14,21$) are analyzed. The RR on day $t$ is the ratio of the CR to the CC on that day, and the FR on day $t$ is the ratio of the CD to the CC:

$$RR_t = \frac{CR_t}{CC_t}\times 100, \qquad FR_t = \frac{CD_t}{CC_t}\times 100.$$

The IR on day $t$ is defined as the proportion of (daily) confirmed (C) cases to the sum of confirmed cases over $\tau$ days, where $\tau=14,21$ are the incubation periods chosen according to [45]:

$$IR_t(\tau) = \frac{C_t}{\sum_{l=0}^{\tau-1} C_{t-l}}\times 100, \qquad \tau=14,21.$$
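The three rate definitions can be computed from the cumulative series in a few lines. The sketch below is illustrative only: the function name `covid_rates` and the toy numbers are assumptions, and the daily confirmed count is taken as the first difference of the CC series with the first day's count equal to the first cumulative value.

```python
import numpy as np

def covid_rates(cc, cr, cd, tau=14):
    """Recovery rate, fatality rate and tau-day infection rate, in percent.

    Assumes cc > 0 on every day (hypothetical helper, not the paper's code).
    The infection rate is returned only for days with a full tau-day window,
    so it is tau - 1 entries shorter than the input series.
    """
    cc, cr, cd = (np.asarray(a, dtype=float) for a in (cc, cr, cd))
    rr = 100.0 * cr / cc
    fr = 100.0 * cd / cc
    daily = np.diff(cc, prepend=0.0)              # daily confirmed C_t
    csum = np.concatenate(([0.0], np.cumsum(daily)))
    window_sum = csum[tau:] - csum[:-tau]         # C_t + ... + C_{t-tau+1}
    ir = 100.0 * daily[tau - 1:] / window_sum
    return rr, fr, ir

# Toy cumulative series with a 3-day window.
rr_demo, fr_demo, ir_demo = covid_rates(
    cc=[1, 2, 4, 7, 11], cr=[0, 0, 1, 2, 4], cd=[0, 0, 0, 1, 1], tau=3
)
```

Dropping the first $\tau-1$ days is consistent with the sizes reported for the infection rates in Table 1, which are 13 and 20 observations shorter than the cumulative series.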

Summary statistics of the seven data sets $CC_t$, $CR_t$, $CD_t$, $RR_t$, $FR_t$ and $IR_t(\tau)$, $\tau=14,21$, are given in Table 1, and plots of the data and their autocorrelation functions (ACFs) are shown in Fig. 1.

Table 1.

Korean COVID-19 data during the in-sample period from Jan. 20, 2020 to Dec. 14, 2020. (Values in Size indicate $n+h_p$ in the HAR($p$) models, where $n$ is the sample size and $h_p=7,14,21$ for $p=2,3,4$, respectively.)

| Data | Size | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|---|
| Cumulative confirmed cases $CC_t$ | 330 | 15 351.89 | 9723.06 | 1 | 12 935.5 | 43 484 |
| Cumulative recovered cases $CR_t$ | 330 | 12 722.35 | 8790.52 | 0 | 11 721.5 | 32 102 |
| Cumulative death cases $CD_t$ | 330 | 276.91 | 156.49 | 0 | 282 | 587 |
| Recovery rate $RR_t$ | 330 | 70.239 | 31.595 | 0 | 87.227 | 93.694 |
| Fatality rate $FR_t$ | 330 | 1.6431 | 0.6910 | 0 | 1.7517 | 2.3964 |
| Infection rate for 14 days $IR_t(14)$ | 317 | 8.563 | 7.578 | 0 | 7.278 | 59.649 |
| Infection rate for 21 days $IR_t(21)$ | 310 | 6.266 | 6.744 | 0 | 4.814 | 57.057 |

Estimation in univariate HAR models

In the HAR($p$) models, orders $p=2,3,4$ are used with lag structures $h=(h_1,\ldots,h_p)=(1,7)$, $(1,7,14)$ and $(1,7,14,21)$, respectively. For estimation we use the OLSE in the HAR model (1). The asymptotic normality of the OLSE in the HAR model was established, and a simulation study conducted, for the multivariate case by [46]. Based on [46], we carry out the OLSE analysis for the HAR model and report the estimation results with standard errors (se) and 95% confidence intervals (CI). Suppose we observe $\{X_{-h_p+1}, X_{-h_p+2}, \ldots, X_0, X_1, \ldots, X_n\}$, where $n$ is regarded as the sample size, and compute the OLSE as follows.

Model (1) is rewritten as $X_t=\phi^{\top}\mathbf{X}_{t-1}+\varepsilon_t$, where $\phi=(\phi_1,\ldots,\phi_p)^{\top}$ and $\mathbf{X}_{t-1}=(X_{t-1}^{(1)},\ldots,X_{t-1}^{(p)})^{\top}$. The OLSE is given by

$$\hat\phi=(\hat\phi_1,\ldots,\hat\phi_p)^{\top}=\left(\sum_{t=1}^{n}\mathbf{X}_{t-1}\mathbf{X}_{t-1}^{\top}\right)^{-1}\sum_{t=1}^{n}\mathbf{X}_{t-1}X_t.$$
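In practice the OLSE above is a no-intercept least squares fit of $X_t$ on the averaged lags. The following sketch shows one way to compute it with numpy on simulated regressors; the function name `har_ols` is hypothetical, and the standard errors use the classical homoskedastic OLS formula, which is an assumption here rather than the exact asymptotic covariance derived in [46].

```python
import numpy as np

def har_ols(y, W):
    """No-intercept least squares fit of y on the columns of W.

    Returns (phi_hat, se, resid). The standard errors are the classical
    OLS ones, sigma^2 * (W'W)^{-1}, used here as an illustrative assumption.
    """
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    n, p = W.shape
    phi_hat, *_ = np.linalg.lstsq(W, y, rcond=None)   # solves the normal equations
    resid = y - W @ phi_hat
    sigma2 = resid @ resid / (n - p)                  # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(W.T @ W)))
    return phi_hat, se, resid

# Simulated regressors stand in for the averaged lags (illustration only).
rng = np.random.default_rng(0)
W_demo = rng.normal(size=(200, 3))
phi_true = np.array([0.6, 0.3, 0.05])
y_demo = W_demo @ phi_true + 0.01 * rng.normal(size=200)
phi_hat, se, resid = har_ols(y_demo, W_demo)
```

Using `np.linalg.lstsq` rather than forming the inverse explicitly for the point estimate is a numerical choice: the averaged lags of a cumulative series are highly collinear, so a least squares solver is generally the safer route.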

In order to compare estimations in the HAR models of orders $p=2,3,4$, we use some criteria such as $R^2$, AIC (Akaike’s Information Criterion) and BIC (Bayesian Information Criterion):

$$R^2 = 1-\frac{\sum_{t=1}^{n}(X_t-\hat X_t)^2}{\sum_{t=1}^{n}(X_t-\bar X)^2}$$

where $\hat X_t=\hat\phi_1X_{t-1}^{(1)}+\cdots+\hat\phi_pX_{t-1}^{(p)}$ and $\bar X=\frac{1}{n}\sum_{t=1}^{n}X_t$.

$$\mathrm{AIC}=-2LL+2k, \qquad \mathrm{BIC}=-2LL+k\log(n)$$

where $LL$ is the Log-likelihood function and $k$ is the number of parameters. Also we compute some performance errors: root mean squares error (RMSE) and mean absolute error (MAE):

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{t=1}^{n}e_t^2}, \qquad \mathrm{MAE}=\frac{1}{n}\sum_{t=1}^{n}|e_t|$$

where $e_t$ is the difference between the observed and estimated values: $e_t=X_t-\hat X_t=X_t-\hat\phi_1X_{t-1}^{(1)}-\cdots-\hat\phi_pX_{t-1}^{(p)}$.
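These criteria are straightforward to compute from observed and fitted values. The sketch below assumes a Gaussian log-likelihood evaluated at the maximum likelihood variance $\mathrm{SSE}/n$ and counts only the $p$ regression coefficients in $k$; both are assumptions, since the text does not state which likelihood was used. The function names are hypothetical.

```python
import numpy as np

def gaussian_ic(sse, n, k):
    """AIC and BIC under a Gaussian likelihood with variance sse / n (assumed)."""
    ll = -0.5 * n * (np.log(2.0 * np.pi * sse / n) + 1.0)
    return -2.0 * ll + 2.0 * k, -2.0 * ll + k * np.log(n)

def fit_criteria(y, y_hat, k):
    """RMSE, MAE, R^2, AIC and BIC for observed y and fitted y_hat."""
    y = np.asarray(y, dtype=float)
    e = y - np.asarray(y_hat, dtype=float)   # residuals e_t
    n = len(y)
    sse = float(e @ e)
    aic, bic = gaussian_ic(sse, n, k)
    return {
        "rmse": np.sqrt(sse / n),
        "mae": np.mean(np.abs(e)),
        "r2": 1.0 - sse / np.sum((y - y.mean()) ** 2),
        "aic": aic,
        "bic": bic,
    }

crit_demo = fit_criteria([1, 2, 3, 4], [1, 2, 3, 6], k=1)
# Consistency check against Table 2 (CC, p = 2): n = 330 - 7 = 323, RMSE = 67.310.
aic_cc, bic_cc = gaussian_ic(323 * 67.310 ** 2, 323, 2)
```

With $n=323$ and RMSE 67.310, this convention gives values that round to the AIC and BIC reported for the CC series with $p=2$ in Table 2, which suggests, but does not prove, that a similar convention was used.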

One of the reasons why we adopt the HAR models is that the Korean COVID-19 data fit the HAR models very well, in the sense that the coefficient of determination $R^2$ is near one, as seen in Tables 2 and 3. Comparisons in terms of $R^2$ values are commonly used in regression models; for instance, [50] recently computed $R^2$ values of univariate and multivariate HAR models for several financial data sets.

Table 2.

Criteria of the OLSEs in the univariate HAR(p) model on cumulative confirmed CCt, cumulative recovered CRt and cumulative deaths CDt cases.

| Series | $p$ | RMSE | MAE | $R^2$ | AIC | BIC |
|---|---|---|---|---|---|---|
| $CC_t$ | 2 | 67.310 (0.00692) | 36.658 (0.00377) | 1.00 | 3640 | 3647 |
| $CC_t$ | 3 | 63.139 (0.00649) | 34.647 (0.00356)ᵃ | 1.00 | 3523 | 3534 |
| $CC_t$ | 4 | 63.016 (0.00648)ᵃ | 34.747 (0.00357) | 1.00 | 3446ᵃ | 3460ᵃ |
| $CR_t$ | 2 | 54.935 (0.00625)ᵃ | 36.857 (0.00419)ᵃ | 1.00 | 3509 | 3516 |
| $CR_t$ | 3 | 55.081 (0.00627) | 37.650 (0.00428) | 1.00 | 3436 | 3448 |
| $CR_t$ | 4 | 55.121 (0.00627) | 37.747 (0.00429) | 1.00 | 3363ᵃ | 3378ᵃ |
| $CD_t$ | 2 | 1.529 (0.00976) | 1.104 (0.00785) | 1.00 | 1195 | 1203 |
| $CD_t$ | 3 | 1.524 (0.00974)ᵃ | 1.101 (0.00703)ᵃ | 1.00 | 1169 | 1180 |
| $CD_t$ | 4 | 1.531 (0.00978) | 1.114 (0.00712) | 1.00 | 1148ᵃ | 1163ᵃ |

Values in parentheses are the errors divided by the standard deviation.

ᵃ Denotes the best.

Table 3.

Criteria of the OLSEs in the univariate HAR(p) model on recovery rate RRt and fatality rate FRt.

| Series | $p$ | RMSE | MAE | $R^2$ | AIC | BIC |
|---|---|---|---|---|---|---|
| $RR_t$ | 2 | 1.531 (0.04866) | 0.535 (0.0169) | 1.00 | 1196 | 1203 |
| $RR_t$ | 3 | 1.546 (0.04847) | 0.550 (0.0174) | 1.00 | 1178 | 1190 |
| $RR_t$ | 4 | 1.525 (0.04795)ᵃ | 0.524 (0.0166)ᵃ | 1.00 | 1146ᵃ | 1161ᵃ |
| $FR_t$ | 2 | 0.0446 (0.06454)ᵃ | 0.0144 (0.02078)ᵃ | 0.999 | −1088ᵃ | −1081ᵃ |
| $FR_t$ | 3 | 0.0451 (0.06525) | 0.0148 (0.02138) | 0.999 | −1056 | −1045 |
| $FR_t$ | 4 | 0.0449 (0.06506) | 0.0145 (0.02094) | 0.999 | −1032 | −1017 |

Values in parentheses are the errors divided by the standard deviation.

ᵃ Denotes the best.

Tables 2, 3 and 4 report the performance errors and the criterion values for the seven data sets: CC, CR, CD, RR, FR and IR($\tau$), $\tau=14,21$. In the tables, the values in parentheses after RMSE and MAE are the RMSE and MAE divided by the standard deviation, which are equivalent to the RMSE and MAE of the HAR model fitted to the normalized data, that is, the data with the mean subtracted and then divided by the standard deviation. We report these normalized errors because the data sets have different scales; standardizing to mean zero and variance one makes the errors comparable across data sets. In most cases in Tables 2, 3 and 4, order $p=4$ gives the best results. In Table 2, which contains the criterion results for CC, CR and CD with all $R^2$ equal to one, all normalized RMSE and MAE values in parentheses are less than 0.01. Tables 5, 6 and 7, which correspond to Tables 2, 3 and 4, respectively, report the coefficient estimates, standard errors and confidence intervals of the parameters in the HAR models for the Korean COVID-19 data. The estimated HAR(4) model for the CC, letting $X_t=CC_t$, is written as

$$X_t = 1.492\,(0.023)\,X_{t-1}^{(1)} - 0.705\,(0.056)\,X_{t-1}^{(2)} + 0.284\,(0.058)\,X_{t-1}^{(3)} - 0.070\,(0.025)\,X_{t-1}^{(4)} + \varepsilon_t$$

where the coefficient estimates, with standard errors in parentheses, are taken from Table 5. The 95% confidence intervals of the coefficients are $(1.447, 1.536)$, $(-0.816, -0.594)$, $(0.170, 0.399)$ and $(-0.119, -0.022)$ for $\phi_1,\phi_2,\phi_3$ and $\phi_4$, respectively. None of the intervals contains zero, so all four coefficients are significant at the 5% level.
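The reported intervals can be approximately recovered from the estimates and standard errors. The sketch below assumes a normal-approximation interval of the form estimate $\pm\,1.96\times$ se; the exact quantile used in the paper is not stated, and the function name `wald_ci` is hypothetical. Small differences in the third decimal are expected because the inputs are rounded.

```python
import numpy as np

def wald_ci(est, se, z=1.96):
    """Normal-approximation 95% interval est +/- z * se (z = 1.96 is assumed).

    Also flags which intervals exclude zero, i.e. which coefficients are
    significant at the 5% level under this approximation.
    """
    est = np.asarray(est, dtype=float)
    se = np.asarray(se, dtype=float)
    lower, upper = est - z * se, est + z * se
    excludes_zero = (lower > 0) | (upper < 0)
    return np.column_stack([lower, upper]), excludes_zero

# HAR(4) estimates and standard errors for CC, taken from Table 5.
est_cc = [1.492, -0.705, 0.284, -0.070]
se_cc = [0.023, 0.056, 0.058, 0.025]
ci_cc, significant_cc = wald_ci(est_cc, se_cc)
```

Under this approximation every interval for the CC series excludes zero, in line with the significance statement for the HAR(4) model.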

Table 4.

Criteria of the OLSEs in the univariate HAR(p) model on individual infection rate IRt(τ).

| Series | $p$ | RMSE | MAE | $R^2$ | AIC | BIC |
|---|---|---|---|---|---|---|
| $IR_t(14)$ | 2 | 4.512 (0.5953) | 2.584 (0.3410) | 0.844 | 1818 | 1825 |
| $IR_t(14)$ | 3 | 4.522 (0.5967) | 2.605 (0.3438) | 0.846 | 1780 | 1792 |
| $IR_t(14)$ | 4 | 3.336 (0.4403)ᵃ | 2.297 (0.3032)ᵃ | 0.877ᵃ | 1561ᵃ | 1576ᵃ |
| $IR_t(21)$ | 2 | 3.489 (0.5174) | 1.923 (0.2851) | 0.859 | 1621 | 1629 |
| $IR_t(21)$ | 3 | 2.554 (0.3786) | 1.657 (0.2456) | 0.887ᵃ | 1401 | 1412 |
| $IR_t(21)$ | 4 | 2.198 (0.3259)ᵃ | 1.485 (0.2202)ᵃ | 0.880 | 1283ᵃ | 1298ᵃ |

Values in parentheses are the errors divided by the standard deviation.

ᵃ Denotes the best.

Table 5.

Coefficient estimates (standard errors) and 95% CI in univariate HAR(p), p=2,3,4, for cumulative confirmed CCt, cumulative recovered CRt, cumulative death CDt cases.


| Series | Coef. | $p=2$: est (se) | $p=2$: 95% CI | $p=3$: est (se) | $p=3$: 95% CI | $p=4$: est (se) | $p=4$: 95% CI |
|---|---|---|---|---|---|---|---|
| $CC_t$ | $\phi_1$ | 1.339(0.009) | [1.321,1.356] | 1.454(0.018) | [1.418,1.489] | 1.492(0.023) | [1.447,1.536] |
| $CC_t$ | $\phi_2$ | −0.338(0.009) | [−0.356,−0.320] | −0.571(0.035) | [−0.648,−0.511] | −0.705(0.056) | [−0.816,−0.594] |
| $CC_t$ | $\phi_3$ | | | 0.127(0.018) | [0.092,0.162] | 0.284(0.058) | [0.170,0.399] |
| $CC_t$ | $\phi_4$ | | | | | −0.070(0.025) | [−0.119,−0.022] |
| $CR_t$ | $\phi_1$ | 1.316(0.011) | [1.294,1.339] | 1.373(0.027) | [1.319,1.427] | 1.306(0.038) | [1.231,1.381] |
| $CR_t$ | $\phi_2$ | −0.316(0.012) | [−0.339,−0.293] | −0.432(0.052) | [−0.534,−0.329] | −0.217(0.099) | [−0.412,−0.022] |
| $CR_t$ | $\phi_3$ | | | 0.059(0.026) | [0.008,0.110] | −0.192(0.102) | [−0.392,0.009] |
| $CR_t$ | $\phi_4$ | | | | | 0.104(0.041) | [0.024,0.185] |
| $CD_t$ | $\phi_1$ | 1.299(0.016) | [1.268,1.331] | 1.175(0.043) | [1.091,1.260] | 1.138(0.047) | [1.045,1.231] |
| $CD_t$ | $\phi_2$ | −0.298(0.016) | [−0.331,−0.267] | −0.052(0.081) | [−0.212,−0.108] | 0.124(0.121) | [−0.115,0.362] |
| $CD_t$ | $\phi_3$ | | | −0.123(0.040) | [−0.201,−0.045] | −0.378(0.136) | [−0.045,−0.111] |
| $CD_t$ | $\phi_4$ | | | | | 0.117(0.059) | [0.000,0.233] |

Table 6.

Coefficient estimates (standard errors) and 95% CI in univariate HAR(p), p=2,3,4, for recovery rate RRt and fatality rate FRt.


| Series | Coef. | $p=2$: est (se) | $p=2$: 95% CI | $p=3$: est (se) | $p=3$: 95% CI | $p=4$: est (se) | $p=4$: 95% CI |
|---|---|---|---|---|---|---|---|
| $RR_t$ | $\phi_1$ | 1.219(0.024) | [1.171,1.267] | 1.235(0.033) | [1.170,1.300] | 1.237(0.035) | [1.169,1.306] |
| $RR_t$ | $\phi_2$ | −0.219(0.025) | [−0.267,−0.170] | −0.250(0.061) | [−0.380,−0.140] | −0.273(0.076) | [−0.423,−0.123] |
| $RR_t$ | $\phi_3$ | | | 0.025(0.034) | [−0.041,0.091] | 0.049(0.088) | [−0.125,0.222] |
| $RR_t$ | $\phi_4$ | | | | | −0.013(0.045) | [−0.102,0.076] |
| $FR_t$ | $\phi_1$ | 1.144(0.031) | [1.084,1.204] | 1.149(0.036) | [1.078,1.219] | 1.139(0.036) | [1.068,1.210] |
| $FR_t$ | $\phi_2$ | −0.144(0.031) | [−0.205,−0.084] | −0.159(0.066) | [−0.289,−0.030] | −0.253(0.073) | [−0.396,−0.109] |
| $FR_t$ | $\phi_3$ | | | 0.011(0.040) | [−0.069,0.090] | 0.288(0.102) | [0.088,0.489] |
| $FR_t$ | $\phi_4$ | | | | | −0.175(0.059) | [−0.291,−0.059] |

Table 7.

Coefficient estimates (standard errors) and 95% CI in univariate HAR(p) models of order p=2,3,4 for infection rate IRt(τ), τ=14,21.



| Series | Coef. | $p=2$: est (se) | $p=2$: 95% CI | $p=3$: est (se) | $p=3$: 95% CI | $p=4$: est (se) | $p=4$: 95% CI |
|---|---|---|---|---|---|---|---|
| $IR_t(14)$ | $\phi_1$ | 0.856(0.046) | [0.765,0.947] | 0.869(0.047) | [0.776,0.961] | 0.599(0.049) | [0.502,0.696] |
| $IR_t(14)$ | $\phi_2$ | 0.078(0.050) | [−0.021,0.177] | −0.041(0.092) | [−0.223,0.141] | 0.241(0.077) | [0.089,0.393] |
| $IR_t(14)$ | $\phi_3$ | | | 0.121(0.080) | [−0.037,0.278] | −0.093(0.118) | [−0.325,0.139] |
| $IR_t(14)$ | $\phi_4$ | | | | | 0.162(0.089) | [−0.012,0.337] |
| $IR_t(21)$ | $\phi_1$ | 0.910(0.044) | [0.824,0.996] | 0.597(0.044) | [0.510,0.684] | 0.612(0.061) | [0.492,0.733] |
| $IR_t(21)$ | $\phi_2$ | 0.022(0.048) | [−0.072,0.116] | 0.277(0.069) | [0.142,0.413] | 0.576(0.121) | [0.337,0.814] |
| $IR_t(21)$ | $\phi_3$ | | | 0.001(0.054) | [−0.105,0.107] | −0.498(0.119) | [−0.672,−0.225] |
| $IR_t(21)$ | $\phi_4$ | | | | | 0.204(0.070) | [0.066,0.343] |

Fig. 2 shows the fitted HAR models for CC, CR and CD, with the orders chosen in Table 2 and the estimated coefficients in Table 5. Their daily fitted values, obtained by differencing the HAR fits, are plotted together with the real daily data and the residuals in the right column of Fig. 2. Fig. 3 shows the fitted HAR models for RR, FR and IR($\tau$), $\tau=14,21$, with the orders chosen in Tables 3 and 4 and the estimated coefficients in Tables 6 and 7, respectively. As shown in Figs. 2 and 3, where real values, fits and residuals appear in blue, red and gray, respectively, the HAR models perform well with small residuals, even for RR, FR and IR($\tau$), $\tau=14,21$.

Fig. 2.

Cumulative Confirmed, Cumulative Recovered and Cumulative Deaths cases data and their univariate HAR fitting (first column); Daily data and differences from the HAR fittings (second column). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 3.

Recovery Rate, Fatality Rate and their HAR fittings (first row); Infection Rates for 14 and 21 days and their HAR fittings (second row). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

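Among others, from Table 7 we report the estimation results for the IR(14) in the HAR(4) model as follows: for $X_t=\mathrm{IR}_t(14)$,

$$X_t=\underset{(0.049)}{0.599}\,X_{t-1}^{(1)}+\underset{(0.077)}{0.241}\,X_{t-1}^{(2)}-\underset{(0.118)}{0.093}\,X_{t-1}^{(3)}+\underset{(0.089)}{0.162}\,X_{t-1}^{(4)}+\varepsilon_t,$$

where the standard errors are given in parentheses. The 95% confidence intervals of the four coefficients are (0.502, 0.696), (0.089, 0.393), (−0.325, 0.139) and (−0.012, 0.337). Only the first two intervals exclude zero, so only the first two coefficients are significant. The HAR(4) model of the IR(21), for $X_t=\mathrm{IR}_t(21)$, is given by

$$X_t=\underset{(0.061)}{0.612}\,X_{t-1}^{(1)}+\underset{(0.121)}{0.576}\,X_{t-1}^{(2)}-\underset{(0.119)}{0.498}\,X_{t-1}^{(3)}+\underset{(0.070)}{0.204}\,X_{t-1}^{(4)}+\varepsilon_t.$$

The 95% confidence intervals are (0.492, 0.733), (0.337, 0.814), (−0.672, −0.225) and (0.066, 0.343). All four intervals exclude zero, so all four coefficients are significant.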
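As a quick arithmetic check of the intervals quoted above, the following minimal Python sketch recomputes Wald-type intervals as estimate ± 1.96 × standard error from the rounded IR(14) values. The normal quantile 1.96 and the helper names `wald_ci` and `excludes_zero` are assumptions made for illustration; the text does not state how the intervals were computed, and differences in the last digit come from rounding of the reported estimates.

```python
# Recompute 95% intervals from the rounded IR(14) estimates reported in the text.
# Assumption: intervals are estimate +/- 1.96 * standard error (normal quantile).
Z_95 = 1.96


def wald_ci(estimate, std_error, z=Z_95):
    """Return the (lower, upper) bounds of a Wald-type confidence interval."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def excludes_zero(interval):
    """A coefficient is called significant here when its interval excludes 0."""
    lower, upper = interval
    return lower > 0 or upper < 0


# (estimate, standard error) for the four HAR(4) coefficients of IR(14).
IR14 = [(0.599, 0.049), (0.241, 0.077), (-0.093, 0.118), (0.162, 0.089)]

for k, (est, se) in enumerate(IR14, start=1):
    lower, upper = wald_ci(est, se)
    flag = "significant" if excludes_zero((lower, upper)) else "not significant"
    print(f"phi_{k}: [{lower:.3f}, {upper:.3f}] {flag}")
```

Under these assumptions the fourth interval has a small negative lower bound, which is consistent with only the first two coefficients being reported as significant.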

Estimation in bivariate HAR models

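Now we consider a bivariate HAR model for bivariate data of CC and CD. The matrix form of the model (2) is written as $Z_t=BW_{t-1}+E_t$, where

$$Z_t=(X_t,Y_t)^{\prime},\quad W_{t-1}=\big(X_{t-1}^{(1)},\ldots,X_{t-1}^{(p)},Y_{t-1}^{(1)},\ldots,Y_{t-1}^{(q)}\big)^{\prime},\quad E_t=(\varepsilon_{1,t},\varepsilon_{2,t})^{\prime}$$

and

$$B=\begin{pmatrix}\alpha_{11}&\cdots&\alpha_{1p}&\beta_{11}&\cdots&\beta_{1q}\\ \alpha_{21}&\cdots&\alpha_{2p}&\beta_{21}&\cdots&\beta_{2q}\end{pmatrix}.$$

The OLSE $\hat{B}$ of $B$ is given by $\hat{B}=\arg\min_{B}\sum_{j=1}^{2}\sum_{t=1}^{n}\varepsilon_{j,t}^{2}$ and it is obtained by

$$\hat{B}^{\prime}=\Big(\sum_{t=1}^{n}W_{t-1}W_{t-1}^{\prime}\Big)^{-1}\sum_{t=1}^{n}W_{t-1}Z_t^{\prime}.$$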
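The least squares formula above can be sketched numerically as follows. This is a minimal illustration with NumPy, not the authors' code: the function name `bivariate_har_ols`, the row-wise layout of the arrays and the synthetic numbers are all assumptions, and the regressor matrix is taken as already built from the HAR components.

```python
import numpy as np


def bivariate_har_ols(W, Z):
    """Least squares estimate of B in Z_t = B W_{t-1} + E_t.

    W : array of shape (n, p + q); row t holds the regressor vector W_{t-1}'.
    Z : array of shape (n, 2); row t holds the response vector Z_t'.
    Returns B_hat with shape (2, p + q).
    """
    W = np.asarray(W, dtype=float)
    Z = np.asarray(Z, dtype=float)
    gram = W.T @ W    # sum over t of W_{t-1} W_{t-1}'
    cross = W.T @ Z   # sum over t of W_{t-1} Z_t'
    # Solve the normal equations instead of forming the inverse explicitly.
    B_hat_transposed = np.linalg.solve(gram, cross)
    return B_hat_transposed.T


# Synthetic check with made-up numbers (not COVID-19 data), p = q = 2.
rng = np.random.default_rng(0)
B_true = np.array([[1.2, -0.3, 0.05, 0.00],
                   [0.1, -0.1, 0.90, 0.05]])
W_demo = rng.normal(size=(200, 4))
Z_demo = W_demo @ B_true.T + 0.01 * rng.normal(size=(200, 2))
B_hat = bivariate_har_ols(W_demo, Z_demo)
print(np.round(B_hat, 2))
```

Row j of the returned matrix collects the coefficients of the j-th equation, so the first row corresponds to the CC equation and the second to the CD equation in the notation of this section.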

The asymptotic normality of this estimator was established in [46], together with a simulation study for i.i.d. and correlated errors. We refer to [46] for the simulation and omit the simulation experiment in the present work. Table 8 presents the results of the criteria for the bivariate HAR(p,q) models with several pairs of orders for the two data sets CCt and CDt, where we use the normalized data because the two data sets have different scales. The table is also intended to compare the performance of the univariate HAR models of CCt and CDt with that of the bivariate HAR model.

Table 8.

Criteria of the estimates in the bivariate HAR(p,q) model on joint cumulative confirmed CCt and cumulative deaths CDt cases.

Bivariate CCt (j=1) CDt (j=2)
(p,q) RMSEsd MAEsd R2 AIC BIC RMSEsd MAEsd R2 AIC BIC
(2,2) 0.00686 0.00367 1.00 −2293a −2278a 0.00954 0.00684 1.00 −2080a −2034a
(3,3) 0.00654 0.00359 1.00 −2270 −2247 0.00917 0.00662a 1.00 −2057 −2034a
(3,4) 0.00662 0.00366 1.00 −2210 −2184 0.00924 0.00675 1.00 −2004 −1978
(4,3) 0.00647a 0.00352a 1.00 −2224 −2198 0.00911 0.00666 1.00 −2013 −1987
(4,4) 0.00647a 0.00357 1.00 −2222 −2192 0.00910a 0.00667 1.00 −2011 −1982
a Denotes the best.

In the bivariate HAR model, results of coefficient estimates, standard errors and 95% confidence intervals are given in Table 9 for the model of order (p,q)=(4,4), which has the smallest RMSE among the considered models. The bivariate HAR(4,4) model for Xt=CCt and Yt=CDt in Table 9 is given by

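$$\begin{aligned}X_t={}&\underset{(0.023)}{1.512}\,X_{t-1}^{(1)}-\underset{(0.058)}{0.751}\,X_{t-1}^{(2)}+\underset{(0.062)}{0.341}\,X_{t-1}^{(3)}-\underset{(0.028)}{0.103}\,X_{t-1}^{(4)}\\&-\underset{(0.036)}{0.051}\,Y_{t-1}^{(1)}+\underset{(0.084)}{0.085}\,Y_{t-1}^{(2)}-\underset{(0.101)}{0.040}\,Y_{t-1}^{(3)}+\underset{(0.046)}{0.006}\,Y_{t-1}^{(4)}+\varepsilon_{1,t},\\Y_t={}&\underset{(0.032)}{0.143}\,X_{t-1}^{(1)}-\underset{(0.081)}{0.286}\,X_{t-1}^{(2)}+\underset{(0.087)}{0.268}\,X_{t-1}^{(3)}-\underset{(0.039)}{0.119}\,X_{t-1}^{(4)}\\&+\underset{(0.051)}{0.973}\,Y_{t-1}^{(1)}+\underset{(0.117)}{0.197}\,Y_{t-1}^{(2)}-\underset{(0.142)}{0.128}\,Y_{t-1}^{(3)}-\underset{(0.065)}{0.047}\,Y_{t-1}^{(4)}+\varepsilon_{2,t}.\end{aligned}$$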

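From the results in Table 9, where the 95% confidence intervals carry the mark a if they are significant, we can see that the CC affects the CD, since all four coefficients of the lagged CC terms in the CD equation are significant, whereas the CD does not affect the CC much, since none of the coefficients of the lagged CD terms in the CC equation is significant.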

Table 9.

Coefficient estimates (standard errors) and 95% CI in bivariate HAR(4,4) model for (CCt,CDt).

Bivariate CCt (j=1) CDt (j=2)
(p,q) Coef. est (se) (95% CI) Coef. est (se) (95% CI)
(4,4) αj1 1.512(0.023) [1.467,1.558]a 0.143(0.032) [0.079,0.206]a
αj2 −0.751(0.058) [−0.864,−0.637]a −0.286(0.081) [−0.447,−0.126]a
αj3 0.341(0.062) [0.220,0.462]a 0.268(0.087) [0.098,0.431]a
αj4 −0.103(0.028) [−0.157,−0.048]a −0.119(0.039) [−0.196,−0.043]a
βj1 −0.051(0.036) [−0.121,0.020] 0.973(0.051) [0.873,1.072]a
βj2 0.085(0.084) [−0.079,0.249] 0.197(0.117) [−0.034,0.429]
βj3 −0.040(0.101) [−0.238,0.158] −0.128(0.142) [−0.407,0.150]
βj4 0.006(0.046) [−0.085,0.096] −0.047(0.065) [−0.175,0.079]
a Indicates significant CI.

Notice that in the univariate HAR models of order p=4 the RMSEs of CCt and CDt are 0.00648 and 0.00978, respectively, while in the bivariate case they are 0.00647 and 0.00910. The MAEs are also smaller in the bivariate model. This suggests that the bivariate HAR model fits the two data sets better, with most of the gain in the CDt series.

Forecasting

Now in this section we investigate forecasts of the Korean COVID-19 data by means of the HAR models. For the forecasting, we use the data during the in-sample-period from Jan.20.2020 to Jan.20.2021, and compute the out-of-sample forecasts between Jan.21.2021 and Feb.18.2021.
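This section does not spell out how the multi-step forecasts are generated from a fitted model. One common option is the iterated (plug-in) scheme, in which each one-step forecast is appended to the series and reused as data for the next step. The sketch below illustrates that scheme only; the averaging windows (1, 7, 14, 21), the function names and the toy numbers are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

# Assumption: averaging windows of the four HAR components (illustrative only).
ASSUMED_WINDOWS = (1, 7, 14, 21)


def har_components(history, windows=ASSUMED_WINDOWS):
    """Average of the most recent h observations, for each window h."""
    history = np.asarray(history, dtype=float)
    return np.array([history[-h:].mean() for h in windows])


def iterated_forecasts(history, coefficients, steps, windows=ASSUMED_WINDOWS):
    """Plug-in forecasts: each forecast is appended and reused as data."""
    coefficients = np.asarray(coefficients, dtype=float)
    path = [float(v) for v in history]
    forecasts = []
    for _ in range(steps):
        next_value = float(har_components(path, windows) @ coefficients)
        forecasts.append(next_value)
        path.append(next_value)
    return np.array(forecasts)


# Toy usage with made-up numbers (not the Korean data).
toy_history = np.linspace(100.0, 130.0, 31)
toy_coefficients = [0.6, 0.2, 0.1, 0.1]
print(iterated_forecasts(toy_history, toy_coefficients, steps=3))
```

The history passed in must contain at least as many observations as the longest window. A direct scheme, which fits a separate regression for each horizon, would be an alternative design.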

We examine forecast performance by evaluating four measures, namely root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and root relative square error (RRSE), for the $\ell$-step ahead forecast, $\ell=1,2,\ldots$. Suppose that $\{z_1,z_2,\ldots,z_n\}$ are the observed data during the in-sample period and $\{z_{n+\ell}:\ell=1,2,\ldots\}$ are the data used to compute out-of-sample forecasts. Let $\hat{z}_{n+\ell|n}$ be the $\ell$-step ahead forecast. In the four measures given below, $N$ indicates a general time epoch of the out-of-sample period, so that $N-m+\ell,\ldots,N-1+\ell$ are the time indices used to compute the difference between the real value and the forecast, and $z_{N-1+\ell}$ is the last observation in the total sample for each $\ell$. Here $m=m_\ell$ is the number of forecasts, which depends on $\ell$.

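Root Mean Square Error (RMSE):

$$\mathrm{RMSE}=\sqrt{\frac{1}{m}\sum_{n=N-m}^{N-1}\big(\hat{z}_{n+\ell|n}-z_{n+\ell}\big)^{2}}.$$

Mean Absolute Error (MAE):

$$\mathrm{MAE}=\frac{1}{m}\sum_{n=N-m}^{N-1}\big|\hat{z}_{n+\ell|n}-z_{n+\ell}\big|.$$

Mean Absolute Percentage Error (MAPE):

$$\mathrm{MAPE}=\frac{100}{m}\sum_{n=N-m}^{N-1}\left|\frac{\hat{z}_{n+\ell|n}-z_{n+\ell}}{z_{n+\ell}}\right|.$$

Root Relative Square Error (RRSE):

$$\mathrm{RRSE}=\sqrt{\frac{\sum_{n=N-m}^{N-1}\big(\hat{z}_{n+\ell|n}-z_{n+\ell}\big)^{2}}{\sum_{n=N-m}^{N-1}\big(\bar{z}-z_{n+\ell}\big)^{2}}}$$

where $\bar{z}=\frac{1}{m}\sum_{n=N-m}^{N-1}z_{n+\ell}$.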
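The four measures above translate directly into a few lines of NumPy. The following is a minimal sketch, assuming the m forecasts for a fixed horizon and the matching real values have already been aligned into two arrays; the function name `forecast_measures` and the toy numbers are illustrative assumptions.

```python
import numpy as np


def forecast_measures(forecasts, actuals):
    """RMSE, MAE, MAPE (in percent) and RRSE for aligned forecast/actual pairs."""
    forecasts = np.asarray(forecasts, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    errors = forecasts - actuals
    rmse = np.sqrt(np.mean(errors ** 2))
    mae = np.mean(np.abs(errors))
    # MAPE divides by the real values, so they must be nonzero.
    mape = 100.0 * np.mean(np.abs(errors / actuals))
    # RRSE compares the forecast with the mean of the real values as a benchmark.
    rrse = np.sqrt(np.sum(errors ** 2) / np.sum((actuals.mean() - actuals) ** 2))
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "RRSE": rrse}


# Toy usage with made-up numbers (not the Korean data).
toy = forecast_measures([2.0, 4.0, 6.0], [1.0, 5.0, 6.0])
print(toy)
```

An RRSE above one means that the forecasts have a larger squared error than the constant benchmark given by the mean of the real values over the same window.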

We evaluate the performance of the 1st, 7th, 14th and 21st step ahead forecasts (i.e., ℓ=1,7,14,21) for the Korean COVID-19 time series data in Tables 10 to 13. Table 10 reports the forecasting performance of the univariate HAR models for CC, CR and CD, whereas the bivariate HAR forecasting performance for (CC, CD) is given in Table 13. Tables 11 and 12 report the multi-step ahead forecasts of RR, FR and IR(τ). In Tables 10 to 13, the values in parentheses below RMSE and MAE are the forecast errors computed from the normalized data with mean zero and variance one, as in Tables 2 to 4.

Table 10.

Out-of-sample ℓ-step forecast performance of the univariate HAR(4) model on cumulative confirmed CCt, cumulative recovered CRt and cumulative deaths CDt cases. (Values in parentheses indicate the errors divided by standard deviation.)

CCt CRt CDt
ℓ=1 ℓ=7 ℓ=14 ℓ=21 ℓ=1 ℓ=7 ℓ=14 ℓ=21 ℓ=1 ℓ=7 ℓ=14 ℓ=21
RMSE 142.76 904.05 2519.56 3231.81 338.53 1049.06 3154.64 6074.15 7.66 37.92 94.59 172.85
(0.012) (0.079) (0.229) (0.317) (0.034) (0.112) (0.343) (0.677) (0.045) (0.235) (0.592) (1.100)

MAE 114.35 772.86 1884.05 3001.86 206.82 805.05 2644.25 5898.07 5.75 30.59 79.88 154.10
(0.010) (0.069) (0.175) (0.296) (0.021) (0.086) (0.289) (0.659) (0.034) (0.188) (0.499) (0.979)

MAPE 0.1961 1.2581 2.8957 4.4701 0.4895 1.9116 5.8221 12.1479 0.6972 3.6328 8.4457 14.8437

RRSE 0.0187 0.1571 0.7069 1.9811 0.0587 0.2129 0.8555 2.7305 0.0464 0.2803 0.9758 3.2299

Table 11.

Out-of-sample ℓ-step forecast performance of the univariate HAR(4) model on recovery rate RRt and fatality rate FRt. (Values in parentheses indicate the errors divided by standard deviation.)

RRt FRt
ℓ=1 ℓ=7 ℓ=14 ℓ=21 ℓ=1 ℓ=7 ℓ=14 ℓ=21
RMSE 0.5931 1.8122 4.7101 4.6843 0.0143 0.0869 0.1998 0.3322
(0.0222) (0.0648) (0.1657) (0.1557) (0.0235) (0.1377) (0.3097) (0.4963)

MAE 0.4038 1.2792 1.2792 3.5534 0.0097 0.0845 0.1990 0.3316
(0.0147) (0.0451) (0.0451) (0.1173) (0.0161) (0.1347) (0.3079) (0.4952)

MAPE 0.5656 1.8034 1.8034 4.8277 0.6669 5.7143 12.959 20.864

RRSE 0.3443 1.0360 1.0360 2.8141 0.1538 1.0592 3.0011 7.9863

Table 12.

Out-of-sample ℓ-step forecast performance of the univariate HAR(4) model on individual infection rate IRt(τ), τ=14,21. (Values in parentheses indicate the errors divided by standard deviation.)

IRt(14) IRt(21)
ℓ=1 ℓ=7 ℓ=14 ℓ=21 ℓ=1 ℓ=7 ℓ=14 ℓ=21
RMSE 1.092 1.082 1.123 1.393 0.7733 0.9360 1.0026 0.9103
(0.1842) (0.1711) (0.1499) (0.1842) (0.1762) (0.2119) (0.1719) (0.1359)

MAE 0.8923 0.9043 0.9611 0.9998 0.6507 0.7604 0.8440 0.6973
(0.1433) (0.1316) (0.1283) (0.1325) (0.1389) (0.1579) (0.1379) (0.1036)

MAPE 13.601 15.342 16.135 16.395 14.313 20.441 24.325 19.240

RRSE 0.685 0.9108 1.1141 1.408 0.5398 0.8646 1.2576 1.298

Moreover, the forecast plots are shown in Figs. 4, 5 and 6, where the predicted values and the 95% prediction intervals are depicted for the period between Jan.21.2021 and Feb.18.2021. In most cases the forecasts are close to the real data and the 95% prediction intervals include the real values; the exception is the fatality rate. In the fatality rate case, the predicted values and the prediction interval follow a reasonably increasing pattern, but the actual values are not contained in the prediction interval, because the real fatality rate suddenly flattens after Jan.21.2021.

Fig. 4.


Forecasting and 95% prediction intervals of Cumulative Confirmed, Cumulative Recovery, Cumulative Death cases using the univariate HAR models.

Fig. 5.


Forecasting and 95% prediction intervals of Recovery Rate, Fatality Rate (first row); Infection Rates for 14 and 21 days (second row).

Fig. 6.


Forecasting and 95% prediction intervals of Cumulative Confirmed, Cumulative Death cases using the bivariate HAR models.

We see, in the first plots of Figs. 4 and 6, that somewhat different prediction intervals for the CC are obtained in the univariate and bivariate HAR models. This happens because the univariate model of the CC in Fig. 4 is independent of the CD, whereas the bivariate model of the CC depends on the CD. As seen in the plots of the real CD data in Figs. 4 and 6, the numbers of death cases increase very slowly during the period from Jan.21.2021 to Feb.18.2021, and this is reflected in the slowly increasing predicted values of the CC in the bivariate model in Fig. 6. Also, we note from Tables 10 and 13 that in the bivariate prediction model the errors for the CC are similar at ℓ=1 and ℓ=7 and smaller at ℓ=21, although they are larger at ℓ=14, while all the errors for the CD are smaller than those in the univariate prediction model. For example, the univariate CD model has RMSEs (7.66, 37.92, 94.59, 172.85), whereas the bivariate model has (7.087, 27.48, 62.07, 104.08) for ℓ=1,7,14,21. Consequently, we conclude that the bivariate HAR model yields better forecasting performance than the univariate model for the CD, as we expected.
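The comparison between the two models can be made concrete by computing the relative change in RMSE at each horizon. The short sketch below uses the RMSE values reported in Tables 10 and 13; the function name `percent_change` and the dictionary layout are illustrative choices.

```python
HORIZONS = (1, 7, 14, 21)

# RMSE values copied from Table 10 (univariate) and Table 13 (bivariate).
RMSE = {
    "CC": {"univariate": (142.76, 904.05, 2519.56, 3231.81),
           "bivariate": (146.12, 999.56, 2984.10, 2283.31)},
    "CD": {"univariate": (7.66, 37.92, 94.59, 172.85),
           "bivariate": (7.087, 27.482, 62.07, 104.08)},
}


def percent_change(univariate, bivariate):
    """Relative change (in percent) of the bivariate error versus the univariate one."""
    return [100.0 * (b - u) / u for u, b in zip(univariate, bivariate)]


for series, values in RMSE.items():
    changes = percent_change(values["univariate"], values["bivariate"])
    summary = ", ".join(f"step {h}: {c:+.1f}%" for h, c in zip(HORIZONS, changes))
    print(series, summary)
```

A negative value means that the bivariate model has the smaller RMSE. With the table values, this holds at every horizon for CD and only at the 21-step horizon for CC.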

Table 13.

Out-of-sample ℓ-step forecast performance of the bivariate HAR(4,4) model on joint cumulative confirmed CCt and cumulative deaths CDt cases. (Values in parentheses indicate the errors divided by standard deviation.)

Bivariate CCt (j=1) CDt (j=2)
ℓ=1 ℓ=7 ℓ=14 ℓ=21 ℓ=1 ℓ=7 ℓ=14 ℓ=21
RMSE 146.12 999.56 2984.10 2283.31 7.087 27.482 62.07 104.08
(0.0126) (0.0904) (0.2750) (0.2215) (0.0417) (0.1718) (0.3897) (0.6633)

MAE 117.72 768.20 2270.15 1952.30 5.218 22.56 54.52 86.61
(0.0101) (0.0691) (0.2106) (0.1912) (0.0305) (0.1392) (0.3405) (0.5509)

MAPE 0.2025 1.2614 3.4759 2.8907 0.6363 2.7278 5.7639 8.4292

RRSE 0.0192 0.1737 0.8372 1.3996 0.0429 0.2031 0.6403 1.9449

Conclusion and discussion

The present work concerns the time series analysis of COVID-19 in South Korea. The COVID-19 pandemic has lasted more than one year, and people are still threatened by the virus. We deal with a variety of Korean COVID-19 time series data sets: confirmed, recovered and death cases as well as recovery rate, fatality rate and infection rates. An interesting and robust time series model for data with a long-memory structure is the HAR model, which has recently attracted much attention from econometricians and statisticians in the field of time series analysis.

In this work, instead of the traditional ARIMA model, HAR models are applied to the estimation and forecasting of the Korean COVID-19 time series data. The HAR models are chosen because the Korean COVID-19 data have long-memory features and because the fitted models perform well under criteria such as R2, AIC and BIC. Indeed, the R2 values of four data sets (cumulative confirmed, cumulative recovered and cumulative death cases, and recovery rate) round to one, and that of the fatality rate is 0.999. Those of the infection rates for 14 and 21 days are larger than 0.84.

For the model selection, we compute RMSE and MAE as well as R2, AIC and BIC to decide the order of the HAR models, and obtain the coefficient estimates by the least squares method for each model. Moreover, 95% confidence intervals along with standard errors are constructed. With the HAR models, the residuals between the actual values of the COVID-19 cases and the fitted values are very small, and the fitted plots match the data well, as seen in Figs. 2 and 3. Also, the real daily data of confirmed, recovered and death cases are well matched by the differences of the fitted values from our proposed HAR models. Furthermore, noticing that confirmed and death cases are strongly correlated with each other, a bivariate HAR model is applied to the two data sets. The bivariate HAR model shows a better fit, with smaller RMSE and MAE, than the univariate HAR models.

To investigate the out-of-sample forecasting, HAR models with orders chosen according to the selection criteria are used to compute multi-step ahead forecasts. As forecasting performance measures, RMSE, MAE, MAPE and RRSE are evaluated for the 1-, 7-, 14- and 21-step ahead forecasts. Predicted values and 95% prediction intervals from the out-of-sample forecasting are illustrated. In most cases the forecasts are close to the real values and the 95% prediction intervals contain them; the exception is the fatality rate. In the fatality rate case, the predicted values and the prediction interval follow a reasonably increasing pattern, but the actual values fall outside the prediction interval, because the fatality rate suddenly flattens after the start of the out-of-sample period.

For the two data sets of cumulative confirmed and cumulative death cases, as mentioned above, both univariate and bivariate models are considered for forecasting. The two approaches give somewhat different prediction intervals for the cumulative confirmed cases, because the univariate model for the confirmed cases is independent of the death cases, whereas the bivariate model uses the death cases as regressors. The numbers of death cases increase very slowly during the out-of-sample period, and for this reason slowly increasing predicted values of the confirmed cases appear in the bivariate model. Also, the forecast results show that in the bivariate prediction model the forecast errors for the cumulative confirmed cases are of similar size at the short horizons and smaller at the 21-step horizon, and all the errors for the cumulative death cases are smaller than those in the univariate prediction model. In conclusion, the bivariate HAR model improves on the performance of the univariate models, most clearly for the death cases.

The novelty of this paper is as follows. This work is the first attempt in the literature to apply HAR models to the analysis of COVID-19, and moreover we validate that the HAR model provides a good prediction model for the COVID-19 time series data. Various data sets from the Korean COVID-19 records, namely time series of confirmed, recovered and death cases as well as recovery, fatality and infection rates, are fitted by the HAR models. Parameter estimation and forecasting are illustrated, with good fitting performance and forecast accuracy. For the pair of the confirmed and death cases, the bivariate HAR model has better fitting performance than the univariate HAR models. Therefore, the proposed model can be an efficient tool for COVID-19 analysis, which will help governments and health systems to manage social policies for COVID-19 prevention during the pandemic.

We have explored the estimation and prediction using the HAR models for the COVID-19 time series data in South Korea. However, the Korean COVID-19 cumulative confirmed and death cases differ from the worldwide or US COVID-19 data, which show an oscillating pattern with 7-day periodicity. By imposing the periodicity on the HAR models, we can extend the discussion of this work to COVID-19 in other countries. For example, [51] recently proposed an integer-valued AR model with oscillating weighted cosine geometric innovations for modeling the COVID-19 series in some small island developing states. The weighted cosine geometric process accounts for oscillating patterns and, according to [52], it outperforms well-known competing discrete models. The HAR model with weighted cosine geometric innovation terms will be an interesting model to fit the COVID-19 series worldwide or in other countries with the oscillation feature. This extension will be studied in future research. Another interesting time series model we suggest is a partially periodic HAR model, which has some truncation in the oscillation. In general, as seen in the numbers of confirmed cases in the US, Germany, Brazil, India, etc., the oscillation seems to appear when the magnitude of the COVID-19 series is large, and furthermore, the larger the magnitude, the stronger the oscillation. To represent this phenomenon, a partial oscillation that has some truncation and is proportional to the magnitude might be more desirable than a pure oscillation. The partially periodic HAR model is also an interesting and promising topic.

CRediT authorship contribution statement

Eunju Hwang: Conceptualization, Methodology, Software, Formal analysis, Writing – original draft, Project administration, Writing – review and editing, Supervision, Funding acquisition. SeongMin Yu: Conceptualization, Methodology, Software, Formal analysis, Writing – original draft, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by the National Research Foundation of Korea (NRF-2018R1D1A1B07048745).

References

  • 1. Benvenuto D., Giovanetti M., Vassallo L., Angeletti S., Ciccozzi M. Application of the ARIMA model on the COVID-2019 epidemic dataset. Data Brief. 2020;29 doi: 10.1016/j.dib.2020.105340.
  • 2. Ceylan Z. Estimation of COVID-19 prevalence in Italy, Spain, and France. Sci Total Environ. 2020;729 doi: 10.1016/j.scitotenv.2020.138817.
  • 3. Kirbas I., Sozen A., Tuncer A.D., Kazancioglu F.S. Comparative analysis and forecasting of COVID-19 cases in various European countries with ARIMA, NARNN and LSTM approaches. Chaos Solitons Fractals. 2020;138 doi: 10.1016/j.chaos.2020.110015.
  • 4. Ribeiro M.H.D.M., Silva R.G.D., Mariani V.C., Coelho L.D.S. Short-term forecasting COVID-19 cumulative confirmed cases: Perspectives for Brazil. Chaos Solitons Fractals. 2020;135 doi: 10.1016/j.chaos.2020.109853.
  • 5. Kumar N., Susan S. 11th ICCCNT 2020 Conference. 2020. COVID-19 pandemic prediction using time series forecasting models.
  • 6. Tandon H., Ranjan P., Chakraborty T., Suhag V. 2020. Coronavirus (COVID-19): ARIMA based time-series analysis to forecast near future. arXiv:2004.07859.
  • 7. Mustafa H.I., Fareed N.Y. 2nd Al-Noor International Conference for Science and Technology (NICST) 2020. COVID-19 cases in Iraq; forecasting incidents using Box-Jenkins ARIMA model; pp. 22–26.
  • 8. Atangana A., Araz I.S. Modeling and forecasting the spread of COVID-19 with stochastic and deterministic approaches: Africa and Europe. Adv Difference Equ. 2021;57 doi: 10.1186/s13662-021-03213-2.
  • 9. Taylor S.J., Letham B. Forecasting at scale. Amer Statist. 2018;72:37–45.
  • 10. Maleki M., Mahmoudi M., Wraith D., Pho K. Time series modelling to forecast the confirmed and recovered cases of COVID-19. Travel Med Infect Dis. 2020;37 doi: 10.1016/j.tmaid.2020.101742.
  • 11. Maleki M., Mahmoudi M.R., Heydari M.H. Modeling and forecasting the spread and death rate of coronavirus (COVID-19) in the world using time series models. Chaos Solitons Fractals. 2020;140 doi: 10.1016/j.chaos.2020.110151.
  • 12. Soukhovolsky V., Kovalev A., Pitt A., Kessel B. A new modelling of the COVID 19 pandemic. Chaos Solitons Fractals. 2020;139 doi: 10.1016/j.chaos.2020.110039.
  • 13. Reis R.F., Quintela B.D.M., Campos J.D.O., Gomes J.M., Rocha B.M., Lobosco M., Santos R.W.D. Characterization of the COVID-19 pandemic and the impact of uncertainties, mitigation strategies, and underreporting of cases in South Korea, Italy, and Brazil. Chaos Solitons Fractals. 2020;136 doi: 10.1016/j.chaos.2020.109888.
  • 14. Sarkar K., Khajanchi S., Nieto J.J. Modeling and forecasting the COVID-19 pandemic in India. Chaos Solitons Fractals. 2020;139 doi: 10.1016/j.chaos.2020.110049.
  • 15. Mohammed A.A., Ewees Ahmed A., Fan Hong, Abd El Aziz Mohamed. Optimization method for forecasting confirmed cases of COVID-19 in China. J Clin Med. 2020;9:674. doi: 10.3390/jcm9030674.
  • 16. Chatterjee S., Asad A., Shayak B., Bhattacharya S., Alam S., Verma M.K. Evolution of COVID-19 pandemic: Power-law growth and saturation. Int License. 2020
  • 17. Luo X., Duan H., Xu K. A novel grey model based on traditional Richards model and its application in COVID-19. Chaos Solitons Fractals. 2021;142 doi: 10.1016/j.chaos.2020.110480.
  • 18. Sanchez-Cabalero S., Selles M.A., Peydro M.A., Perez-Bernabeu E. An efficient COVID-19 prediction model validated with the cases of China, Italy and Spain: Total or partial lockdowns? J Clin Med. 2020;9:1547. doi: 10.3390/jcm9051547.
  • 19. Cihan P. 4th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT) 2020. Fuzzy rule-based system for predicting daily case in COVID-19 outbreak; pp. 1–4.
  • 20. Jarndal A., Husain S., Zaatar O., Gumaei T.A., Hamadeh A. International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI) 2020. GPR and ANN based prediction models for COVID-19 death cases; pp. 1–5.
  • 21. Chaurasia V., Pal S. Application of machine learning time series analysis for prediction COVID-19 pandemic. Res Biomed Eng. 2020
  • 22. Salgotra R., Gandomi M., Gandomi A.H. Time series analysis and forecast of the COVID-19 pandemic in India using genetic programming. Chaos Solitons Fractals. 2020;138 doi: 10.1016/j.chaos.2020.109945.
  • 23. Prasanth S., Singh U., Kumar A., Tikkiwal V.A., Chong P.H.J. Forecasting spread of COVID-19 using Google Trends: A hybrid GWO-deep learning approach. Chaos Solitons Fractals. 2021;142 doi: 10.1016/j.chaos.2020.110336.
  • 24. Balli S. Data analysis of Covid-19 pandemic and short-term cumulative case forecasting using machine learning time series methods. Chaos Solitons Fractals. 2021;142 doi: 10.1016/j.chaos.2020.110512.
  • 25. Kalantari M. Forecasting COVID-19 pandemic using optimal singular spectrum analysis. Chaos Solitons Fractals. 2021;142 doi: 10.1016/j.chaos.2020.110547.
  • 26. Mehra A.H.A., Shafieirad M., Abbasi Z., Zamani I. Parameter estimation and prediction of COVID-19 epidemic turning point and ending time of a case study on SIR/SQAIR epidemic models. Comput Math Methods Med. 2020 doi: 10.1155/2020/1465923.
  • 27. Carvalho K., Vicente J.P., Jakovljevic M., Teixeira J.P. 2021. Forecasted incidence, intensive care unit admissions and projected mortality attributable to Covid-19 in Portugal, UK, Germany, Italy and France - 4 weeks ahead. Preprints.
  • 28. Feroze N. Forecasting the patterns of COVID-19 and causal impacts of lockdown in top five affected countries using Bayesian structural time series models. Chaos Solitons Fractals. 2020;140 doi: 10.1016/j.chaos.2020.110196.
  • 29. Katris C. A time series-based statistical approach for outbreak spread forecasting: Application of COVID-19 in Greece. Expert Syst Appl. 2021;166 doi: 10.1016/j.eswa.2020.114077.
  • 30. Marmarelis V.Z. Predictive modeling of COVID-19 data in the US: Adaptive phase-space approach. J Eng Med Biol. 2020 doi: 10.1109/OJEMB.2020.3008313.
  • 31. Liu L., Moon H.R., Schorfheide F. Panel forecasts of country-level COVID-19 infections. J Econometrics. 2021;220:2–22. doi: 10.1016/j.jeconom.2020.08.010.
  • 32. Sun J., Chen X., Zhang Z., Lai S., Zhao B., Liu H., Wang S., Huan W., Zhao R., Ng M.T.A., Zheng Y. Forecasting the long-term trend of COVID-19 epidemic using a dynamic model. Sci Rep. 2020;10:21122. doi: 10.1038/s41598-020-78084-w.
  • 33. Ioannidis J.P.A. Infection fatality rate of COVID-19 inferred from seroprevalence data. Bull World Health Organ. 2020 doi: 10.2471/BLT.20.265892.
  • 34. Bretschger L., Grieg E., Welfens P.J.J., Xiong T. COVID-19 infections and fatalities developments: empirical evidence for OECD countries and newly industrialized economies. Int Econ Economic Policy. 2020;17:801–847.
  • 35. Manski C.F., Molinari F. Estimating the COVID-19 infection rate: Anatomy of an inference problem. J Econometrics. 2020:60268–62600. doi: 10.1016/j.jeconom.2020.04.041.
  • 36. Jiang F., Zhao Z., Shao X. Time series analysis of COVID-19 infection curve: A change-point perspective. J Econometrics. 2020 doi: 10.1016/j.jeconom.2020.07.039.
  • 37. Meyerowitz-Katz G., Merone L. A systematic review and meta-analysis of published research data on COVID-19 infection fatality rates. Int J Infect Dis. 2020;101:138–148. doi: 10.1016/j.ijid.2020.09.1464.
  • 38. Lee S., Liao Y., Seo M.H., Shin Y. Sparse HP filter: Finding kinks in the COVID-19 contact rate. J Econometrics. 2021;220:158–180. doi: 10.1016/j.jeconom.2020.08.008.
  • 39. Nikolopoulos K., Punia S., Schäfers A., Tsinopoulos C., Vasilakis C. Forecasting and planning during a pandemic: COVID-19 growth rates, supply chain disruptions, and governmental decisions. European J Oper Res. 2021;290:99–115. doi: 10.1016/j.ejor.2020.08.001.
  • 40. Lee K.B., Han S., Jeong Y. COVID-19, flattening the curve, and Benford’s law. Physica A. 2020;559 doi: 10.1016/j.physa.2020.125090.
  • 41. Li S., Linton O. When will the COVID-19 pandemic peak? J Econometrics. 2021;220:130–157. doi: 10.1016/j.jeconom.2020.07.049.
  • 42. Keane M., Neal T. Consumer panic in the COVID-19 pandemic. J Econometrics. 2020:0304–4076. doi: 10.1016/j.jeconom.2020.07.045.
  • 43. Hanke M., Kosolapova M., Weissensteiner A. COVID-19 and market expectations: Evidence from option-implied densities. Econom Lett. 2020;195 doi: 10.1016/j.econlet.2020.109441.
  • 44. James Nick, Menzies Max. Association between COVID-19 cases and international equity indices. Physica D. 2021;417 doi: 10.1016/j.physd.2020.132809.
  • 45. Linton N.M., Kobayashi T., Yang Y., Hayashi K., Akhmetzhanov A.R., Jung S., Yuan B., Kinoshita R., Nishiura H. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. J Clin Med. 2020;9:538. doi: 10.3390/jcm9020538.
  • 46. Hong W.T., Lee J., Hwang E. A note on the asymptotic normality theory of the least squares estimates in multivariate HAR-RV models. Mathematics. 2020;8:2083.
  • 47. Müller U.A., Dacorogna M.M., Dave R.D., Pictet O.V., Olsen R.B., Ward J.R. Fractals and intrinsic time: a challenge to econometricians. In: Proceedings of the 39th International AEA Conference on Real Time Econometrics 1993, Luxembourg.
  • 48. Corsi F. A simple approximate long-memory model of realized volatility. J Financ Econom. 2009;7:174–196.
  • 49. Andersen T.G., Bollerslev T. Answering the skeptics: YES, standard volatility models do provide accurate forecasts. Internat Econom Rev. 1998;39:885–905.
  • 50. Wilms I., Rombouts J., Croux C. Multivariate volatility forecasts for stock market indices. Int J Forecast. 2021;37:484–499.
  • 51. Khan N.M., Bakouch H.S., Soobhug A.D., Scotto M.G. Insights on the trend of the novel coronavirus 2019 series in some small island developing states: A thinning-based modelling approach. Alexand Eng J. 2021;60:2535–2550.
  • 52. Chesneau C., Bakouch H.S., Hussain T., Para B.A. The cosine geometric distribution with count data modeling. J Appl Stat. 2020;48:124–137. doi: 10.1080/02664763.2019.1711364.

Articles from Results in Physics are provided here courtesy of Elsevier
