Soft Comput. 2021 Oct 21;25(23):14549–14559. doi: 10.1007/s00500-021-06362-4

Uncertain regression model with autoregressive time series errors

Dan Chen
PMCID: PMC8530374  PMID: 34703385

Abstract

The uncertain regression model is a powerful analytical tool for exploring the relationship between explanatory variables and response variables. It usually assumes that the errors of the regression equation are independent. However, in many cases, the error terms are highly positively autocorrelated. Assuming that the errors have an autoregressive structure, this paper first proposes an uncertain regression model with autoregressive time series errors. Then, the principle of least squares is used to estimate the unknown parameters in the model. This new methodology is then used to analyze and predict the cumulative number of confirmed COVID-19 cases in China. Finally, this paper gives a comparative analysis of the uncertain regression model, the difference plus uncertain autoregressive model, and the uncertain regression model with autoregressive time series errors. The comparison shows that the uncertain regression model with autoregressive time series errors can improve the accuracy of predictions compared with the uncertain regression model.

Keywords: Uncertain regression analysis, Uncertain time series analysis, Uncertain hypothesis test, Uncertainty theory, COVID-19

Introduction

Uncertain statistics is a set of mathematical techniques for collecting, analyzing, and interpreting data by uncertainty theory (Liu 2007). Uncertain regression analysis as a branch of uncertain statistics is a set of statistical techniques that use uncertainty theory to explore the relationship between explanatory variables and response variables. The study of uncertain regression analysis was started by Yao and Liu (2018) by assuming that the disturbance term is an uncertain variable instead of a stochastic variable. To make point estimation for the unknown parameters in an uncertain regression model, they suggested the least squares estimation. Then, Liu and Yang (2020) considered the least absolute deviations estimation, Chen (2020) investigated the Tukey biweight estimation, and Lio and Liu (2020) proposed the maximum likelihood estimation. Lio and Liu (2018) further gave interval estimation of the predictive response variables based on the uncertain disturbance term. To evaluate the appropriateness of fitted regression model and estimated disturbance term, Ye and Liu (2021) introduced uncertain hypothesis test. In addition, uncertain regression analysis has been extended in various directions, such as multivariate regression analysis (Song and Fu 2018; Ye and Liu 2020; Zhang et al. 2020) and cross-validation (Liu and Jia 2020; Liu 2019).

Uncertain time series analysis as another branch of uncertain statistics is a set of statistical techniques that use uncertainty theory to predict future values based on the previously observed values. The study of uncertain time series analysis was started by Yang and Liu (2019) by assuming that the disturbance term is an uncertain variable instead of a stochastic variable. To explore the relationship between these observations, they presented an uncertain autoregressive (UAR) model. Furthermore, they applied the principle of least squares to estimating the unknown autoregressive parameters. Then, Yang et al. (2020) considered the least absolute deviations estimation, and Chen and Yang (2021a, 2021b) investigated the ridge estimation and the maximum likelihood estimation. To determine the optimum order of the UAR model, Liu and Yang (2020) gave cross-validation. In addition, some researchers studied other uncertain time series models, such as the 1-order uncertain moving average model (Yang and Ni 2021) and the uncertain vector autoregressive model (Tang 2020).

To estimate the unknown parameters in an uncertain differential equation that fits the observed data as much as possible, several methods were proposed, for example, method of moments (Yao and Liu 2020), minimum cover estimation (Yang et al. 2020), least squares estimation (Sheng et al. 2020), generalized moment estimation (Liu 2021), and maximum likelihood estimation (Liu and Liu 2020). To obtain the unknown initial value of uncertain differential equation based on observed data, Lio and Liu (2021) proposed an estimation method.

In recent studies, these statistical techniques were utilized for COVID-19 prediction. For example, Liu (2021) applied the uncertain logistic growth model to fitting the cumulative number of confirmed COVID-19 cases in China. Ye and Yang (2021) applied the UAR model to analyzing the second difference of the cumulative number. Following that, Chen et al. (2021) presented an uncertain SIR model, and Jia and Chen (2021) proposed an uncertain SEIAR model by employing high-dimensional uncertain differential equations. Concerned about the time when COVID-19 started spreading in China, Lio and Liu (2021) inferred the zero-day of COVID-19 spread using the initial value estimation.

This paper proposes a new methodology for the analysis and prediction of time series data. Cochrane and Orcutt (1949) presented evidence that the error terms involved in most formulations of economic relations current at the time are highly positively autocorrelated. They indicated that in many cases the assumption of random error terms is not a very good approximation to the truth. Assuming that the errors are an autoregressive process with finite order, Durbin (1960) proposed a two-stage procedure that yields asymptotically efficient estimates in linear models. Similarly, it is an oversimplification to assume that error terms are independent in time in uncertain regression analysis. To improve this situation, this paper considers a different type of model in which the errors have an autoregressive structure, i.e., the uncertain regression model with autoregressive time series errors.

The rest of this paper is organized as follows. Section 2 introduces an uncertain regression model with autoregressive time series errors in detail, including parameter estimation, residual analysis, forecast value, and confidence interval. In Sect. 3, the approach is applied to modeling the cumulative number of confirmed COVID-19 cases in China. Section 4 gives a comparative study of the uncertain regression model, the difference plus UAR model, and the uncertain regression model with autoregressive time series errors. Section 5 shows that the stochastic regression model with autoregressive time series errors is not suitable. Finally, some conclusions are made in Sect. 6.

Uncertain regression model with autoregressive time series errors

The uncertain regression model with autoregressive time series errors has the form

$$Y_t = f(X_{t1}, X_{t2}, \ldots, X_{tp} \mid \boldsymbol{\beta}) + Z_t \tag{1}$$

where $Y_t$ is a response series, $(X_{t1}, X_{t2}, \ldots, X_{tp})$ is a vector of explanatory series, $\boldsymbol{\beta}$ is an unknown vector of parameters, $f(X_{t1}, X_{t2}, \ldots, X_{tp} \mid \boldsymbol{\beta})$ represents the effect of $(X_{t1}, X_{t2}, \ldots, X_{tp})$ on $Y_t$, and $Z_t$ is an error series for $t = 1, 2, \ldots, n$. We assume that the errors of (1) follow a $k$-order uncertain autoregressive model, that is,

$$Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i} + \varepsilon_t \tag{2}$$

where the autoregressive coefficients $a_0, a_1, \ldots, a_k$ are unknown, and $\varepsilon_t$ are uncertain disturbances (uncertain variables) for $t = k+1, k+2, \ldots, n$.

Remark 1

Uncertain regression (linear or nonlinear) and uncertain autoregressive models are special cases of (1).

Remark 2

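When $(X_{t1}, X_{t2}, \ldots, X_{tp})$ contains the time variable $t$, (1) is used by economists to study the trend of $Y_t$.

In the regression model with autoregressive time series errors (1), if $f(X_{t1}, X_{t2}, \ldots, X_{tp} \mid \boldsymbol{\beta})$ is a linear function, i.e.,

$$\begin{cases} Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \cdots + \beta_p X_{tp} + Z_t \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i} + \varepsilon_t, \end{cases} \tag{3}$$

then it is called a linear regression model with autoregressive time series errors. If $f(X_{t1}, X_{t2}, \ldots, X_{tp} \mid \boldsymbol{\beta})$ is a logistic function, i.e.,

$$\begin{cases} Y_t = \beta_0 / (1 + \beta_1 \exp(-\beta_2 X_t)) + Z_t, \quad \beta_0, \beta_1, \beta_2 > 0 \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i} + \varepsilon_t, \end{cases} \tag{4}$$

then it is called a logistic growth model with autoregressive time series errors.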

Parameter estimation

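Assume $(x_{t1}, x_{t2}, \ldots, x_{tp}, y_t)$ are the observed data at times $t$ for $t = 1, 2, \ldots, n$, respectively. Based on the observed data, the least squares estimate of $\boldsymbol{\beta}$ in the uncertain regression model with autoregressive time series errors

$$\begin{cases} Y_t = f(X_{t1}, X_{t2}, \ldots, X_{tp} \mid \boldsymbol{\beta}) + Z_t \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i} + \varepsilon_t \end{cases} \tag{5}$$

is the solution, $\boldsymbol{\beta}$, of the minimization problem,

$$\min_{\boldsymbol{\beta}} \sum_{t=1}^{n} \bigl( y_t - f(x_{t1}, x_{t2}, \ldots, x_{tp} \mid \boldsymbol{\beta}) \bigr)^2. \tag{6}$$

Then, for each index $t$ ($t = 1, 2, \ldots, n$), the errors can be calculated as

$$z_t = y_t - f(x_{t1}, x_{t2}, \ldots, x_{tp} \mid \boldsymbol{\beta}). \tag{7}$$

The errors $z_1, z_2, \ldots, z_n$ will be regarded as the samples of $Z_t$. The least squares estimate of $(a_0, a_1, \ldots, a_k)$ in the uncertain regression model with autoregressive time series errors is the solution, $(a_0, a_1, \ldots, a_k)$, of the minimization problem,

$$\min_{a_0, a_1, \ldots, a_k} \sum_{t=k+1}^{n} \Bigl( z_t - a_0 - \sum_{i=1}^{k} a_i z_{t-i} \Bigr)^2. \tag{8}$$

Thus, the fitted regression model with autoregressive time series errors is determined by

$$\begin{cases} Y_t = f(X_{t1}, X_{t2}, \ldots, X_{tp} \mid \boldsymbol{\beta}) + Z_t \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i}. \end{cases} \tag{9}$$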

Example 1

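Let $(x_{t1}, x_{t2}, \ldots, x_{tp}, y_t)$ be the observed data at times $t$ for $t = 1, 2, \ldots, n$, respectively. The least squares estimates of $\beta_0, \beta_1, \ldots, \beta_p$ and $a_0, a_1, \ldots, a_k$ in the linear regression model with autoregressive time series errors

$$\begin{cases} Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \cdots + \beta_p X_{tp} + Z_t \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i} + \varepsilon_t \end{cases} \tag{10}$$

solve the minimization problems,

$$\min_{\beta_0, \beta_1, \ldots, \beta_p} \sum_{t=1}^{n} \Bigl( y_t - \beta_0 - \sum_{j=1}^{p} \beta_j x_{tj} \Bigr)^2 \tag{11}$$

and

$$\min_{a_0, a_1, \ldots, a_k} \sum_{t=k+1}^{n} \Bigl( z_t - a_0 - \sum_{i=1}^{k} a_i z_{t-i} \Bigr)^2, \tag{12}$$

respectively, where

$$z_t = y_t - \beta_0 - \beta_1 x_{t1} - \beta_2 x_{t2} - \cdots - \beta_p x_{tp} \tag{13}$$

for $t = 1, 2, \ldots, n$.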
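The two-stage procedure of Example 1 can be sketched in Python for the simplest case $p = 1$ and $k = 1$, where both minimization problems (11) and (12) reduce to a simple linear regression with a closed-form solution. This is only an illustrative sketch: the helper names `simple_ols` and `two_stage_fit` and the toy data are assumptions made here, not part of the paper.

```python
def simple_ols(x, y):
    """Return (intercept, slope) minimizing sum((y - b0 - b1*x)^2)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_fit(x, y):
    """Stage 1: regress y on x. Stage 2: regress z_t on z_{t-1} (k = 1)."""
    b0, b1 = simple_ols(x, y)                        # problem (11) with p = 1
    z = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]  # errors, Eq. (13)
    a0, a1 = simple_ols(z[:-1], z[1:])               # problem (12) with k = 1
    return (b0, b1), (a0, a1), z


# Toy data (hypothetical): a linear trend plus slowly varying errors.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.4, 5.3, 7.1, 8.8, 10.9, 13.2, 15.3, 17.1]
beta, a, z = two_stage_fit(x, y)
print("beta =", beta)
print("a    =", a)
```

For general $p$ and $k$, each stage becomes an ordinary multiple least squares problem, so any linear least squares solver could take the place of `simple_ols`.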

Example 2

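Let $(x_t, y_t)$ be the observed data at times $t$ for $t = 1, 2, \ldots, n$, respectively. The least squares estimates of $\beta_0, \beta_1, \beta_2$ and $a_0, a_1, \ldots, a_k$ in the logistic growth model with autoregressive time series errors

$$\begin{cases} Y_t = \beta_0 / (1 + \beta_1 \exp(-\beta_2 X_t)) + Z_t, \quad \beta_0, \beta_1, \beta_2 > 0 \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i} + \varepsilon_t \end{cases} \tag{14}$$

solve the minimization problems,

$$\min_{\beta_0 > 0,\, \beta_1 > 0,\, \beta_2 > 0} \sum_{t=1}^{n} \bigl( y_t - \beta_0 / (1 + \beta_1 \exp(-\beta_2 x_t)) \bigr)^2 \tag{15}$$

and

$$\min_{a_0, a_1, \ldots, a_k} \sum_{t=k+1}^{n} \Bigl( z_t - a_0 - \sum_{i=1}^{k} a_i z_{t-i} \Bigr)^2, \tag{16}$$

respectively, where

$$z_t = y_t - \beta_0 / (1 + \beta_1 \exp(-\beta_2 x_t)) \tag{17}$$

for $t = 1, 2, \ldots, n$.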

Residual analysis

Definition 1

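Let $(x_{t1}, x_{t2}, \ldots, x_{tp}, y_t)$ be the observed data at times $t$ for $t = 1, 2, \ldots, n$, respectively, and let the fitted regression model with autoregressive time series errors be

$$\begin{cases} Y_t = f(X_{t1}, X_{t2}, \ldots, X_{tp} \mid \boldsymbol{\beta}) + Z_t \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i}. \end{cases} \tag{18}$$

Then, for each index $t$ ($t = k+1, k+2, \ldots, n$), the term

$$\hat{\varepsilon}_t = y_t - f(x_{t1}, x_{t2}, \ldots, x_{tp} \mid \boldsymbol{\beta}) - a_0 - \sum_{i=1}^{k} a_i \bigl( y_{t-i} - f(x_{t-i,1}, x_{t-i,2}, \ldots, x_{t-i,p} \mid \boldsymbol{\beta}) \bigr) \tag{19}$$

is called the $t$-th residual.

The residuals $\hat{\varepsilon}_{k+1}, \hat{\varepsilon}_{k+2}, \ldots, \hat{\varepsilon}_n$ will be regarded as the samples of the uncertain disturbance terms $\varepsilon_t$ in the uncertain regression model with autoregressive time series errors

$$\begin{cases} Y_t = f(X_{t1}, X_{t2}, \ldots, X_{tp} \mid \boldsymbol{\beta}) + Z_t \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i} + \varepsilon_t. \end{cases} \tag{20}$$

Thus, the expected value of $\varepsilon_t$ can be estimated as the average of residuals, i.e.,

$$\hat{e} = \frac{1}{n-k} \sum_{t=k+1}^{n} \hat{\varepsilon}_t, \tag{21}$$

and the variance can be estimated as

$$\hat{\sigma}^2 = \frac{1}{n-k} \sum_{t=k+1}^{n} \bigl( \hat{\varepsilon}_t - \hat{e} \bigr)^2. \tag{22}$$

Therefore, we may assume the estimated disturbance terms $\tilde{\varepsilon}_t$ follow the normal uncertainty distribution $\mathcal{N}(\hat{e}, \hat{\sigma})$.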
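Since $z_t = y_t - f(x_{t1}, \ldots, x_{tp} \mid \boldsymbol{\beta})$, the residual (19) depends on the data only through the error series, and (21) and (22) are its sample mean and (biased) sample variance. The sketch below illustrates this in Python; the function name `residual_stats` and the tiny input series are assumptions chosen so the result can be checked by hand.

```python
def residual_stats(z, a):
    """z: errors z_1..z_n as a list; a: [a0, a1, ..., ak].

    Returns the residuals for t = k+1..n (Eq. 19 written in terms of z),
    their average (Eq. 21) and their variance (Eq. 22).
    """
    k = len(a) - 1
    res = [
        z[t] - a[0] - sum(a[i] * z[t - i] for i in range(1, k + 1))
        for t in range(k, len(z))
    ]
    e_hat = sum(res) / len(res)
    var_hat = sum((r - e_hat) ** 2 for r in res) / len(res)
    return res, e_hat, var_hat


# Hand-checkable toy input: k = 1, a0 = 1, a1 = 1.
res, e_hat, var_hat = residual_stats([1.0, 2.0, 4.0, 8.0], [1.0, 1.0])
print(res, e_hat, var_hat)  # residuals 0, 1, 3; mean 4/3; variance 14/9
```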

Example 3

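Let $(x_{t1}, x_{t2}, \ldots, x_{tp}, y_t)$ be the observed data at times $t$ for $t = 1, 2, \ldots, n$, respectively, and let the fitted linear regression model with autoregressive time series errors be

$$\begin{cases} Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \cdots + \beta_p X_{tp} + Z_t \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i}. \end{cases} \tag{23}$$

The expected value of estimated disturbance terms $\tilde{\varepsilon}_t$ is

$$\hat{e} = \frac{1}{n-k} \sum_{t=k+1}^{n} \Bigl[ y_t - \beta_0 - \sum_{j=1}^{p} \beta_j x_{tj} - a_0 - \sum_{i=1}^{k} a_i \Bigl( y_{t-i} - \beta_0 - \sum_{j=1}^{p} \beta_j x_{t-i,j} \Bigr) \Bigr], \tag{24}$$

and the variance is

$$\hat{\sigma}^2 = \frac{1}{n-k} \sum_{t=k+1}^{n} \Bigl[ y_t - \beta_0 - \sum_{j=1}^{p} \beta_j x_{tj} - a_0 - \sum_{i=1}^{k} a_i \Bigl( y_{t-i} - \beta_0 - \sum_{j=1}^{p} \beta_j x_{t-i,j} \Bigr) - \hat{e} \Bigr]^2. \tag{25}$$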

Example 4

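Let $(x_t, y_t)$ be the observed data at times $t$ for $t = 1, 2, \ldots, n$, respectively, and let the fitted logistic growth model with autoregressive time series errors be

$$\begin{cases} Y_t = \beta_0 / (1 + \beta_1 \exp(-\beta_2 X_t)) + Z_t, \quad \beta_0, \beta_1, \beta_2 > 0 \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i}. \end{cases} \tag{26}$$

The expected value of estimated disturbance terms $\tilde{\varepsilon}_t$ is

$$\hat{e} = \frac{1}{n-k} \sum_{t=k+1}^{n} \Bigl[ y_t - \frac{\beta_0}{1 + \beta_1 \exp(-\beta_2 x_t)} - a_0 - \sum_{i=1}^{k} a_i \Bigl( y_{t-i} - \frac{\beta_0}{1 + \beta_1 \exp(-\beta_2 x_{t-i})} \Bigr) \Bigr], \tag{27}$$

and the variance is

$$\hat{\sigma}^2 = \frac{1}{n-k} \sum_{t=k+1}^{n} \Bigl[ y_t - \frac{\beta_0}{1 + \beta_1 \exp(-\beta_2 x_t)} - a_0 - \sum_{i=1}^{k} a_i \Bigl( y_{t-i} - \frac{\beta_0}{1 + \beta_1 \exp(-\beta_2 x_{t-i})} \Bigr) - \hat{e} \Bigr]^2. \tag{28}$$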

Remark 3

Once the residuals are available, the uncertain hypothesis test (Ye and Liu 2021) can be applied to evaluate the appropriateness of the fitted regression model with autoregressive time series errors and of the estimated disturbance terms.

Forecast value

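Let $(x_{n+1,1}, x_{n+1,2}, \ldots, x_{n+1,p})$ be an explanatory vector at time $n+1$. Assume (i) the fitted regression model with autoregressive time series errors is

$$\begin{cases} Y_t = f(X_{t1}, X_{t2}, \ldots, X_{tp} \mid \boldsymbol{\beta}) + Z_t \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i}, \end{cases} \tag{29}$$

and (ii) the estimated disturbance terms $\tilde{\varepsilon}_t$ follow the normal uncertainty distribution with expected value $\hat{e}$ determined by (21) and variance $\hat{\sigma}^2$ determined by (22). The forecast uncertain variable of $Y_{n+1}$ with respect to $(x_{n+1,1}, x_{n+1,2}, \ldots, x_{n+1,p})$ is determined by

$$\hat{Y}_{n+1} = f(x_{n+1,1}, \ldots, x_{n+1,p} \mid \boldsymbol{\beta}) + a_0 + \sum_{i=1}^{k} a_i \bigl( y_{n+1-i} - f(x_{n+1-i,1}, \ldots, x_{n+1-i,p} \mid \boldsymbol{\beta}) \bigr) + \tilde{\varepsilon}_{n+1}, \tag{30}$$

and the forecast value is defined as the expected value of the forecast uncertain variable $\hat{Y}_{n+1}$, i.e.,

$$\hat{y}_{n+1} = f(x_{n+1,1}, \ldots, x_{n+1,p} \mid \boldsymbol{\beta}) + a_0 + \sum_{i=1}^{k} a_i \bigl( y_{n+1-i} - f(x_{n+1-i,1}, \ldots, x_{n+1-i,p} \mid \boldsymbol{\beta}) \bigr) + \hat{e}. \tag{31}$$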

Example 5

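Let $(x_{n+1,1}, x_{n+1,2}, \ldots, x_{n+1,p})$ be an explanatory vector at time $n+1$. Assume (i) the fitted linear regression model with autoregressive time series errors is

$$\begin{cases} Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \cdots + \beta_p X_{tp} + Z_t \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i}, \end{cases} \tag{32}$$

and (ii) the estimated disturbance terms $\tilde{\varepsilon}_t$ follow the normal uncertainty distribution with expected value $\hat{e}$ and variance $\hat{\sigma}^2$. The forecast uncertain variable of $Y_{n+1}$ with respect to $(x_{n+1,1}, x_{n+1,2}, \ldots, x_{n+1,p})$ is

$$\hat{Y}_{n+1} = \beta_0 + \sum_{j=1}^{p} \beta_j x_{n+1,j} + a_0 + \sum_{i=1}^{k} a_i \Bigl( y_{n+1-i} - \beta_0 - \sum_{j=1}^{p} \beta_j x_{n+1-i,j} \Bigr) + \tilde{\varepsilon}_{n+1}, \tag{33}$$

and the forecast value of $Y_{n+1}$ is

$$\hat{y}_{n+1} = \beta_0 + \sum_{j=1}^{p} \beta_j x_{n+1,j} + a_0 + \sum_{i=1}^{k} a_i \Bigl( y_{n+1-i} - \beta_0 - \sum_{j=1}^{p} \beta_j x_{n+1-i,j} \Bigr) + \hat{e}. \tag{34}$$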

Example 6

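Let $x_{n+1}$ be an explanatory variable at time $n+1$. Assume (i) the fitted logistic growth model with autoregressive time series errors is

$$\begin{cases} Y_t = \beta_0 / (1 + \beta_1 \exp(-\beta_2 X_t)) + Z_t, \quad \beta_0, \beta_1, \beta_2 > 0 \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i}, \end{cases} \tag{35}$$

and (ii) the estimated disturbance terms $\tilde{\varepsilon}_t$ follow the normal uncertainty distribution with expected value $\hat{e}$ and variance $\hat{\sigma}^2$. The forecast uncertain variable of $Y_{n+1}$ with respect to $x_{n+1}$ is

$$\hat{Y}_{n+1} = \frac{\beta_0}{1 + \beta_1 \exp(-\beta_2 x_{n+1})} + a_0 + \sum_{i=1}^{k} a_i \Bigl( y_{n+1-i} - \frac{\beta_0}{1 + \beta_1 \exp(-\beta_2 x_{n+1-i})} \Bigr) + \tilde{\varepsilon}_{n+1}, \tag{36}$$

and the forecast value of $Y_{n+1}$ is

$$\hat{y}_{n+1} = \frac{\beta_0}{1 + \beta_1 \exp(-\beta_2 x_{n+1})} + a_0 + \sum_{i=1}^{k} a_i \Bigl( y_{n+1-i} - \frac{\beta_0}{1 + \beta_1 \exp(-\beta_2 x_{n+1-i})} \Bigr) + \hat{e}. \tag{37}$$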

Confidence interval

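Let $(x_{n+1,1}, x_{n+1,2}, \ldots, x_{n+1,p})$ be an explanatory vector at time $n+1$. Assume the forecast uncertain variable of $Y_{n+1}$ with respect to $(x_{n+1,1}, x_{n+1,2}, \ldots, x_{n+1,p})$ is

$$\hat{Y}_{n+1} = f(x_{n+1,1}, \ldots, x_{n+1,p} \mid \boldsymbol{\beta}) + a_0 + \sum_{i=1}^{k} a_i \bigl( y_{n+1-i} - f(x_{n+1-i,1}, \ldots, x_{n+1-i,p} \mid \boldsymbol{\beta}) \bigr) + \tilde{\varepsilon}_{n+1}. \tag{38}$$

Then, the forecast value of $Y_{n+1}$ is

$$\hat{y}_{n+1} = f(x_{n+1,1}, \ldots, x_{n+1,p} \mid \boldsymbol{\beta}) + a_0 + \sum_{i=1}^{k} a_i \bigl( y_{n+1-i} - f(x_{n+1-i,1}, \ldots, x_{n+1-i,p} \mid \boldsymbol{\beta}) \bigr) + \hat{e}. \tag{39}$$

It follows from the operational law that $\hat{Y}_{n+1}$ has a normal uncertainty distribution $\mathcal{N}(\hat{y}_{n+1}, \hat{\sigma})$, i.e.,

$$\hat{\Phi}_{n+1}(x) = \left( 1 + \exp\left( \frac{\pi (\hat{y}_{n+1} - x)}{\sqrt{3}\,\hat{\sigma}} \right) \right)^{-1}. \tag{40}$$

Taking $\alpha$ as the confidence level, it is easy to verify that

$$\hat{b} = \frac{\hat{\sigma}\sqrt{3}}{\pi} \ln \frac{1+\alpha}{1-\alpha} \tag{41}$$

is the minimum value $b$ such that

$$\hat{\Phi}_{n+1}(\hat{y}_{n+1} + b) - \hat{\Phi}_{n+1}(\hat{y}_{n+1} - b) \ge \alpha. \tag{42}$$

Hence the $\alpha$ confidence interval of $Y_{n+1}$ is

$$\hat{y}_{n+1} \pm \frac{\hat{\sigma}\sqrt{3}}{\pi} \ln \frac{1+\alpha}{1-\alpha}. \tag{43}$$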
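The half-width (41) can be checked numerically against the distribution (40): plugging $\hat{b}$ back into the left-hand side of (42) should return $\alpha$. A minimal Python sketch follows; the function names and the illustrative inputs (forecast value 10, $\hat{\sigma} = 1$) are assumptions made here.

```python
import math


def normal_uncertainty_cdf(x, e, sigma):
    """Normal uncertainty distribution with expected value e, as in Eq. (40)."""
    return 1.0 / (1.0 + math.exp(math.pi * (e - x) / (math.sqrt(3) * sigma)))


def confidence_interval(y_hat, sigma_hat, alpha):
    """Interval y_hat +/- b_hat with b_hat from Eq. (41)."""
    b_hat = sigma_hat * math.sqrt(3) / math.pi * math.log((1 + alpha) / (1 - alpha))
    return y_hat - b_hat, y_hat + b_hat


# Illustrative values only.
y_hat, sigma_hat, alpha = 10.0, 1.0, 0.95
lo, hi = confidence_interval(y_hat, sigma_hat, alpha)
coverage = (normal_uncertainty_cdf(hi, y_hat, sigma_hat)
            - normal_uncertainty_cdf(lo, y_hat, sigma_hat))
print(lo, hi, coverage)  # coverage equals alpha up to rounding
```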

Modeling the cumulative number of confirmed COVID-19 cases

In this section, the uncertain regression model with autoregressive time series errors is applied to analyzing the cumulative number of confirmed COVID-19 cases by local transmission in China. We use the same data for comparison with uncertain regression model (Liu 2021) and difference plus uncertain autoregressive model (Ye and Yang 2021), that is, the cumulative number of confirmed COVID-19 cases excluding imported cases from February 13 to March 23, 2020 (see Table 1). The data are plotted in Fig. 1.

Table 1.

Cumulative number of confirmed COVID-19 cases in China (excluding imported cases) from February 13 to March 23, 2020

63851 66492 68500 70548 72436 74185
74576 75465 76288 76936 77150 77658
78064 78497 78824 79251 79824 80026
80151 80270 80389 80516 80591 80632
80668 80685 80699 80708 80725 80729
80733 80737 80738 80739 80739 80739
80739 80740 80740 80744

Fig. 1. Cumulative number of confirmed COVID-19 cases excluding imported cases in China

Let $1, 2, \ldots, 40$ represent the dates ($t$) from February 13 to March 23. For example, $t = 1$ and $t = 40$ represent February 13 and March 23, respectively. In order to determine the functional relationship between $t$ (the date) and $Y_t$ (the cumulative number of confirmed COVID-19 cases on date $t$), we may use the observed data

$$(t, y_t), \quad t = 1, 2, \ldots, 40 \tag{44}$$

where $y_t$ are the cumulative numbers shown in Table 1 on days $t$, $t = 1, 2, \ldots, 40$, respectively. For example,

$$y_1 = 63851, \quad y_{40} = 80744. \tag{45}$$

In order to fit the above observed data, we employ the uncertain logistic growth model with autoregressive time series errors,

$$\begin{cases} Y_t = \beta_0 / (1 + \beta_1 \exp(-\beta_2 t)) + Z_t, \quad \beta_0, \beta_1, \beta_2 > 0 \\ Z_t = a_0 + \sum_{i=1}^{k} a_i Z_{t-i} + \varepsilon_t \end{cases} \tag{46}$$

where $Z_t$ is an error series for $t = 1, 2, \ldots, 40$, and $\varepsilon_t$ are uncertain disturbances for $t = k+1, k+2, \ldots, 40$.

Based on the observed data $(t, y_t)$, $t = 1, 2, \ldots, 40$, Liu (2021) obtained the fitted logistic growth component

$$Y_t = 80822 / (1 + 0.3100 \exp(-0.1802\, t)). \tag{47}$$

Then, the observed data of the error series $Z_t$ are

$$z_t = y_t - 80822 / (1 + 0.3100 \exp(-0.1802\, t)) \tag{48}$$

for $t = 1, 2, \ldots, 40$ (see Table 2). Next, we apply the UAR($k$) model to modeling $z_1, z_2, \ldots, z_{40}$.
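As a rough check, (48) can be evaluated directly from the Table 1 data. The Python sketch below does this with the coefficients as printed in (47); because those coefficients are rounded to four significant figures, the result should agree with Table 2 only approximately (differences of a unit or two are to be expected for early $t$). The variable and function names are assumptions made here.

```python
import math

# Cumulative confirmed cases y_1..y_40 from Table 1, read row by row.
Y = [
    63851, 66492, 68500, 70548, 72436, 74185,
    74576, 75465, 76288, 76936, 77150, 77658,
    78064, 78497, 78824, 79251, 79824, 80026,
    80151, 80270, 80389, 80516, 80591, 80632,
    80668, 80685, 80699, 80708, 80725, 80729,
    80733, 80737, 80738, 80739, 80739, 80739,
    80739, 80740, 80740, 80744,
]


def logistic(t, b0=80822.0, b1=0.3100, b2=0.1802):
    """Fitted logistic growth component, Eq. (47), with rounded coefficients."""
    return b0 / (1.0 + b1 * math.exp(-b2 * t))


# Error series z_t = y_t - f(t), Eq. (48); list index 0 corresponds to t = 1.
Z = [Y[t - 1] - logistic(t) for t in range(1, len(Y) + 1)]
print(round(Z[0], 2), round(Z[-1], 2))
```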

Table 2.

Error data $z_t$ ($t = 1, 2, \ldots, 40$)

-352.1443 35.2024 36.3348 313.1674
60.3347 152.8045 276.2352 163.0415
128.1786 44.6745 -363.0434 -381.9831
-421.5404 -364.5330 -354.2881 -193.7600
155.3367 169.3903 136.7594 123.6474
131.9887 166.3455 163.8171 139.9599
121.7188 93.3672 69.4558 46.7695
37.2898 19.1638 4.6775 -6.7664
-18.6679 -28.4448 -37.4466 -44.9653
-51.2452 -55.4901 -59.8705 -59.5288

First we determine the value of the order $k$ by rolling origin cross validation (Liu and Yang 2020). Assume that $T = 37$, and the average testing error ATE($k$) is

$$\mathrm{ATE}(k) = \sum_{m=0}^{2} \frac{1}{3-m} \sum_{t=38+m}^{40} \Bigl( z_t - a_0 - \sum_{i=1}^{k} a_i z_{t-i} \Bigr)^2 \tag{49}$$

where $(a_0, a_1, \ldots, a_k)_m$ are the least squares estimates obtained from the observation data in the training sets $\{z_1, z_2, \ldots, z_{37+m}\}$ for $m = 0, 1, 2$, respectively. Table 3 provides a quick summary of the value of ATE($k$) with $k \in \{1, 2, 3, 4, 5\}$. When $k = 4$, we get the minimum value of ATE($k$). Thus the UAR component is

$$Z_t = a_0 + \sum_{i=1}^{4} a_i Z_{t-i} + \varepsilon_t. \tag{50}$$

Table 3.

Value of ATE($k$) with $k \in \{1, 2, 3, 4, 5\}$

k 1 2 3 4 5
ATE(k) 2318 636 901 465 1335

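Using the error data $z_1, z_2, \ldots, z_{40}$ and solving the minimization problem

$$\min_{a_0, a_1, \ldots, a_4} \sum_{t=5}^{40} \Bigl( z_t - a_0 - \sum_{i=1}^{4} a_i z_{t-i} \Bigr)^2, \tag{51}$$

we obtain a fitted UAR(4) component

$$Z_t = -7.4091 + 0.8058 Z_{t-1} + 0.0642 Z_{t-2} - 0.0606 Z_{t-3} - 0.1655 Z_{t-4}. \tag{52}$$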
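The order selection (49) and the fit (51) can be sketched in plain Python as below. This is a hedged illustration, not the author's code: the helper names are made up here, the normal equations are solved by a small hand-written Gaussian elimination, and the test error uses the observed lagged values $z_{t-i}$ for every test point, which is an assumption about a detail the text does not spell out. For those reasons the printed ATE values need not coincide with Table 3, although the fitted UAR(4) coefficients should land close to (52) if Table 2 is the data that was used.

```python
# Error data z_1..z_40 from Table 2 (original values, before any revision).
Z = [
    -352.1443, 35.2024, 36.3348, 313.1674, 60.3347, 152.8045, 276.2352, 163.0415,
    128.1786, 44.6745, -363.0434, -381.9831, -421.5404, -364.5330, -354.2881, -193.7600,
    155.3367, 169.3903, 136.7594, 123.6474, 131.9887, 166.3455, 163.8171, 139.9599,
    121.7188, 93.3672, 69.4558, 46.7695, 37.2898, 19.1638, 4.6775, -6.7664,
    -18.6679, -28.4448, -37.4466, -44.9653, -51.2452, -55.4901, -59.8705, -59.5288,
]


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_uar(z, k):
    """Least squares estimate [a0, a1, ..., ak] of a UAR(k) model, as in (51)."""
    rows = [[1.0] + [z[t - i] for i in range(1, k + 1)] for t in range(k, len(z))]
    targets = z[k:]
    m = k + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(m)]
    return solve_linear(xtx, xty)


def predict(z, a, t):
    """One-step prediction of z[t] (0-based) from the observed values before it."""
    return a[0] + sum(a[i] * z[t - i] for i in range(1, len(a)))


def ate(z, k, T=37):
    """Average testing error shaped like Eq. (49); observed lags are an assumption."""
    n = len(z)
    total = 0.0
    for m in range(n - T):                      # m = 0, 1, 2 when n = 40, T = 37
        a = fit_uar(z[:T + m], k)               # train on z_1..z_{T+m}
        sq = sum((z[t] - predict(z, a, t)) ** 2 for t in range(T + m, n))
        total += sq / (n - T - m)
    return total


for k in range(1, 6):
    print(k, round(ate(Z, k)))
print("UAR(4) coefficients:", [round(c, 4) for c in fit_uar(Z, 4)])
```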

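From

$$\hat{\varepsilon}_t = z_t + 7.4091 - 0.8058 z_{t-1} - 0.0642 z_{t-2} + 0.0606 z_{t-3} + 0.1655 z_{t-4}, \tag{53}$$

we obtain 36 residuals $\hat{\varepsilon}_5, \hat{\varepsilon}_6, \ldots, \hat{\varepsilon}_{40}$ shown in Fig. 2. Thus, the expected value of estimated disturbance terms is

$$\hat{e} = \frac{1}{36} \sum_{t=5}^{40} \hat{\varepsilon}_t = 0.0000, \tag{54}$$

and the variance is

$$\hat{\sigma}^2 = \frac{1}{36} \sum_{t=5}^{40} (\hat{\varepsilon}_t - \hat{e})^2 = 96.0254^2. \tag{55}$$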

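Assume the estimated disturbance terms follow the normal uncertainty distribution

$$\mathcal{N}(0.0000, 96.0254). \tag{56}$$

In order to test whether it is appropriate, given a significance level $\alpha = 0.01$, the uncertain hypothesis test for the hypotheses

$$H_0: e = 0.0000 \text{ and } \sigma = 96.0254 \quad \text{versus} \quad H_1: e \ne 0.0000 \text{ or } \sigma \ne 96.0254$$

is

$$W = \{ (w_5, w_6, \ldots, w_{40}) \mid -243.2729 \le w_t \le 243.2729,\ t = 5, 6, \ldots, 40 \}^{c}. \tag{57}$$

Since $(\hat{\varepsilon}_5, \hat{\varepsilon}_6, \ldots, \hat{\varepsilon}_{40}) \in W$, we reject $H_0$. It follows from

$$\hat{\varepsilon}_{11} = -344.2653 < -243.2729, \tag{58}$$

$$\hat{\varepsilon}_{17} = 249.7626 > 243.2729 \tag{59}$$

that $z_{11}$ and $z_{17}$ are the outliers and are replaced with

$$z_{11} = z_{10} + \frac{z_{12} - z_{10}}{2} = 44.6745 + \frac{-381.9831 - 44.6745}{2} = -168.6543, \tag{60}$$

$$z_{17} = z_{16} + \frac{z_{18} - z_{16}}{2} = -193.7600 + \frac{169.3903 + 193.7600}{2} = -12.1848, \tag{61}$$

respectively (see Table 4).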
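One pass of this test-and-revise step can be sketched in Python as follows. Two points are assumptions rather than statements from the text: the acceptance bounds are computed as $\hat{e} \pm (\sqrt{3}\,\hat{\sigma}/\pi)\ln((1-\alpha)/\alpha)$, a formula inferred here only because it reproduces the bounds reported in (57), and each outlier is revised from the unrevised neighbouring values. All function names are hypothetical.

```python
import math

# Error data z_1..z_40 from Table 2 (original values).
Z = [
    -352.1443, 35.2024, 36.3348, 313.1674, 60.3347, 152.8045, 276.2352, 163.0415,
    128.1786, 44.6745, -363.0434, -381.9831, -421.5404, -364.5330, -354.2881, -193.7600,
    155.3367, 169.3903, 136.7594, 123.6474, 131.9887, 166.3455, 163.8171, 139.9599,
    121.7188, 93.3672, 69.4558, 46.7695, 37.2898, 19.1638, 4.6775, -6.7664,
    -18.6679, -28.4448, -37.4466, -44.9653, -51.2452, -55.4901, -59.8705, -59.5288,
]
A = [-7.4091, 0.8058, 0.0642, -0.0606, -0.1655]   # fitted UAR(4), Eq. (52)
E_HAT, SIGMA_HAT, ALPHA = 0.0, 96.0254, 0.01       # Eqs. (54)-(55), significance level


def residuals(z, a):
    """Map the 1-based index t to its residual for t = k+1..n, as in Eq. (53)."""
    k = len(a) - 1
    return {
        t + 1: z[t] - a[0] - sum(a[i] * z[t - i] for i in range(1, k + 1))
        for t in range(k, len(z))
    }


def acceptance_bounds(e, sigma, alpha):
    """ASSUMPTION: half-width sqrt(3)*sigma/pi * ln((1-alpha)/alpha), inferred from (57)."""
    half = math.sqrt(3) * sigma / math.pi * math.log((1 - alpha) / alpha)
    return e - half, e + half


def replace_outliers(z, outliers):
    """Replace z_t by z_{t-1} + (z_{t+1} - z_{t-1})/2, as in Eqs. (60)-(61)."""
    revised = list(z)
    for t in outliers:                 # t is 1-based
        if 1 < t < len(z):             # both neighbours must exist
            revised[t - 1] = z[t - 2] + (z[t] - z[t - 2]) / 2
    return revised


lo, hi = acceptance_bounds(E_HAT, SIGMA_HAT, ALPHA)
res = residuals(Z, A)
outliers = [t for t, r in res.items() if r < lo or r > hi]
Z_revised = replace_outliers(Z, outliers)
print("bounds:", round(lo, 4), round(hi, 4))
print("flagged t:", outliers)
```

The revised series would then be refitted and tested again, which is the iteration the text goes on to describe.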

Fig. 2. Residuals in the first iteration

Table 4.

Results of data modification

$z_t$       Original data    Revised data
$z_5$          60.3347        232.9860
$z_7$         276.2352        157.9230
$z_{11}$     -363.0434       -125.2114
$z_{12}$     -381.9831       -295.0974
$z_{17}$      155.3367        -12.1848

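Using the revised data $z_1, z_2, \ldots, z_{40}$, we obtain a new fitted UAR(4) component

$$Z_t = -6.9372 + 0.8836 Z_{t-1} + 0.0762 Z_{t-2} - 0.1614 Z_{t-3} - 0.1083 Z_{t-4}. \tag{62}$$

From

$$\hat{\varepsilon}_t = z_t + 6.9372 - 0.8836 z_{t-1} - 0.0762 z_{t-2} + 0.1614 z_{t-3} + 0.1083 z_{t-4}, \tag{63}$$

we obtain 36 residuals $\hat{\varepsilon}_5, \hat{\varepsilon}_6, \ldots, \hat{\varepsilon}_{40}$ shown in Fig. 3. Thus, the expected value of estimated disturbance terms is

$$\hat{e} = \frac{1}{36} \sum_{t=5}^{40} \hat{\varepsilon}_t = 0.0002, \tag{64}$$

and the variance is

$$\hat{\sigma}^2 = \frac{1}{36} \sum_{t=5}^{40} (\hat{\varepsilon}_t - \hat{e})^2 = 78.4578^2. \tag{65}$$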

Fig. 3. Residuals in the second iteration

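Assume the estimated disturbance terms follow the normal uncertainty distribution

$$\mathcal{N}(0.0002, 78.4578). \tag{66}$$

In order to test whether it is appropriate, given a significance level $\alpha = 0.01$, the uncertain hypothesis test for the hypotheses

$$H_0: e = 0.0002 \text{ and } \sigma = 78.4578 \quad \text{versus} \quad H_1: e \ne 0.0002 \text{ or } \sigma \ne 78.4578$$

is

$$W = \{ (w_5, w_6, \ldots, w_{40}) \mid -198.7665 \le w_t \le 198.7669,\ t = 5, 6, \ldots, 40 \}^{c}. \tag{67}$$

Since $(\hat{\varepsilon}_5, \hat{\varepsilon}_6, \ldots, \hat{\varepsilon}_{40}) \in W$, we reject $H_0$. It follows from

$$\hat{\varepsilon}_5 = -244.6483 < -198.7665 \tag{68}$$

that $z_5$ is an outlier, and is replaced with

$$z_5 = z_4 + \frac{z_6 - z_4}{2} = 313.1674 + \frac{152.8045 - 313.1674}{2} = 232.9860. \tag{69}$$

After repeating the iterative procedure 5 times, we obtain a new fitted UAR(4) component

$$Z_t = -4.6259 + 1.2608 Z_{t-1} - 0.2837 Z_{t-2} - 0.2820 Z_{t-3} + 0.0939 Z_{t-4}. \tag{70}$$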

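From

$$\hat{\varepsilon}_t = z_t + 4.6259 - 1.2608 z_{t-1} + 0.2837 z_{t-2} + 0.2820 z_{t-3} - 0.0939 z_{t-4}, \tag{71}$$

we obtain 36 residuals $\hat{\varepsilon}_5, \hat{\varepsilon}_6, \ldots, \hat{\varepsilon}_{40}$ shown in Fig. 4. Thus, the expected value of estimated disturbance terms is

$$\hat{e} = \frac{1}{36} \sum_{t=5}^{40} \hat{\varepsilon}_t = 0.0000, \tag{72}$$

and the variance is

$$\hat{\sigma}^2 = \frac{1}{36} \sum_{t=5}^{40} (\hat{\varepsilon}_t - \hat{e})^2 = 53.4133^2. \tag{73}$$

Assume the estimated disturbance terms follow the normal uncertainty distribution

$$\mathcal{N}(0.0000, 53.4133). \tag{74}$$

In order to test whether it is appropriate, given a significance level $\alpha = 0.01$, the uncertain hypothesis test for the hypotheses

$$H_0: e = 0.0000 \text{ and } \sigma = 53.4133 \quad \text{versus} \quad H_1: e \ne 0.0000 \text{ or } \sigma \ne 53.4133$$

is

$$W = \{ (w_5, w_6, \ldots, w_{40}) \mid -135.3184 \le w_t \le 135.3184,\ t = 5, 6, \ldots, 40 \}^{c}. \tag{75}$$

Since $(\hat{\varepsilon}_5, \hat{\varepsilon}_6, \ldots, \hat{\varepsilon}_{40}) \notin W$, we accept $H_0$. That is, the normal uncertainty distribution $\mathcal{N}(0.0000, 53.4133)$ is appropriate.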

Fig. 4. Residuals in the fifth iteration

Using the fitted logistic growth model with autoregressive time series errors and the estimated disturbance terms, we obtain that the forecast uncertain variable of $Y_{41}$ on day 41 is

$$\hat{Y}_{41} = \frac{80822}{1 + 0.3100 \exp(-0.1802 \times 41)} - 4.6259 + 1.2608 z_{40} - 0.2837 z_{39} - 0.2820 z_{38} + 0.0939 z_{37} + \tilde{\varepsilon}_{41}. \tag{76}$$

Then, the forecast value on day 41 (March 24, 2020) is

$$\hat{y}_{41} = \frac{80822}{1 + 0.3100 \exp(-0.1802 \times 41)} - 4.6259 + 1.2608 z_{40} - 0.2837 z_{39} - 0.2820 z_{38} + 0.0939 z_{37} + 0.0000 = 80755, \tag{77}$$

and the 95% confidence interval is

$$[80647, 80862]. \tag{78}$$

That is, we predict that the cumulative number on March 24, 2020 will be 80755, and we are 95% sure that the number falls into [80647, 80862].
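The arithmetic behind (77) and (78) can be retraced with a few lines of Python, using only numbers printed in the text: the logistic coefficients from (47), the UAR(4) coefficients from (70), $\hat{\sigma} = 53.4133$, and the last four entries of Table 2 (which Table 4 does not list as revised). The names below are illustrative assumptions. Because the printed coefficients are rounded, the interval endpoints agree with (78) only to within about one case.

```python
import math

B0, B1, B2 = 80822.0, 0.3100, 0.1802                 # logistic component, Eq. (47)
A = [-4.6259, 1.2608, -0.2837, -0.2820, 0.0939]      # UAR(4) component, Eq. (70)
E_HAT, SIGMA_HAT = 0.0, 53.4133                      # Eqs. (72)-(73)
Z_RECENT = {37: -51.2452, 38: -55.4901, 39: -59.8705, 40: -59.5288}  # Table 2


def logistic(t):
    """Fitted logistic growth component evaluated on day t."""
    return B0 / (1.0 + B1 * math.exp(-B2 * t))


def forecast(t_next):
    """Forecast value as in Eq. (77): trend + autoregressive part + e_hat."""
    ar_part = A[0] + sum(A[i] * Z_RECENT[t_next - i] for i in range(1, len(A)))
    return logistic(t_next) + ar_part + E_HAT


def half_width(sigma_hat, alpha):
    """Half-width of the alpha confidence interval, Eq. (41)."""
    return sigma_hat * math.sqrt(3) / math.pi * math.log((1 + alpha) / (1 - alpha))


y41 = forecast(41)
b = half_width(SIGMA_HAT, 0.95)
print(round(y41), (round(y41 - b, 1), round(y41 + b, 1)))
```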

Comparative analysis

In this section, we compare the uncertain regression model with autoregressive time series errors with the uncertain regression model (Liu 2021) and the difference plus UAR model (Ye and Yang 2021). The estimated standard deviation obtained by the uncertain logistic growth model, 183.82, is too large to be acceptable, whereas the one obtained by the uncertain logistic growth model with autoregressive time series errors, 53.4133, is far smaller (see Table 5). The forecast for day 41 is also closer to the actual value of 80744: 80755 versus 80807. The model with autoregressive time series errors therefore predicts better and loses less information, at the cost of more computation.

Table 5.

Effect of $Z_t$

$Z_t$            Actual value    Forecast value    Standard deviation
$\varepsilon$    80744           80807             183.82
UAR(4)           80744           80755             53.413

Ye and Yang (2021) modeled the second difference of the cumulative cases series using the UAR(5) model. However, so far there are no definitions of stationarity and differencing in uncertain time series analysis. In Ye and Yang (2021), the differencing operation was borrowed from stochastic time series analysis, so the difference plus UAR methodology lacks a theoretical basis within uncertainty theory. In this paper, by contrast, uncertainty theory supplies the theoretical justification for the uncertain regression model with autoregressive time series errors, and the method is easy to use.

Aslam and Albassam (2019) used the neutrosophic regression model to study the relationship between prostate cancer and dietary fat level. The essential difference between neutrosophic regression and uncertain regression lies in the statistical techniques. The former uses neutrosophic statistics, which was introduced based on the idea of neutrosophic logic, while the latter uses uncertain statistics, which was introduced based on uncertainty theory. In other words, the former deals with data having neutrosophy, inexact values, unclear observations, and interval values, while the latter deals with data containing precise and exact observations. The former provides the parameters, confidence interval and p-values in the indeterminacy interval range, while the latter provides determined values of all parameters.

Why is stochastic regression model with autoregressive time series errors not suitable for modeling cumulative number?

The difference between traditional regression model with autoregressive time series errors and uncertain regression model with autoregressive time series errors lies in how the disturbance terms are assumed. The former assumes the disturbance term is a stochastic variable, while the latter assumes the disturbance term is an uncertain variable. Since random variables and uncertain variables obey different operational laws, wrong assumptions may mislead the decision-maker.

In the COVID-19 example, we use the Lilliefors test to test the normality of the residuals (see Fig. 4). The test rejects the null hypothesis. That is, the disturbance term cannot be characterized as a normal random variable. Therefore, the stochastic regression model with autoregressive time series errors is not suitable for modeling the cumulative number.

Conclusions

This paper first proposed a new model, i.e., the uncertain regression model with autoregressive time series errors. Then, the principle of least squares was used to estimate the unknown parameters. Finally, we made a comparative analysis. The conclusion was that the uncertain regression model with autoregressive time series errors can improve the accuracy of predictions compared with the uncertain regression model.

In future research, we will investigate how to deal with imprecise observations using neutrosophic statistics or uncertain statistics. In addition, referring to goodness of fit test (Aslam 2021a), analysis of means (Aslam 2021c), and skewness and kurtosis estimators (Aslam 2021b), these techniques can be introduced into uncertain statistics as a future research endeavor.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61873329 and 61873084).

Declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Data availability

All data generated or analyzed during this study are included in Tables 1 and 2.

Code availability

All the codes implemented during this study are available from the corresponding author on reasonable request.

Ethical approval

This paper does not contain any studies with human participants or animals performed by any of the authors.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Aslam M. A new goodness of fit test in the presence of uncertain parameters. Complex Intell Syst. 2021;7(1):359–365. doi: 10.1007/s40747-020-00214-8.
  2. Aslam M. A study on skewness and kurtosis estimators of wind speed distribution under indeterminacy. Theo Appl Climatol. 2021;143(3–4):1227–1234. doi: 10.1007/s00704-020-03509-5.
  3. Aslam M. Analyzing wind power data using analysis of means under neutrosophic statistics. Soft Comput. 2021;25(10):7087–7093. doi: 10.1007/s00500-021-05661-0.
  4. Aslam M, Albassam M. Application of neutrosophic logic to evaluate correlation between prostate cancer mortality and dietary fat assumption. Symmetry. 2019;11(3):330. doi: 10.3390/sym11030330.
  5. Chen D. Tukey's biweight estimation for uncertain regression model with imprecise observations. Soft Comput. 2020;24(22):16803–16809. doi: 10.1007/s00500-020-04973-x.
  6. Chen D, Yang X. Maximum likelihood estimation for uncertain autoregressive model with application to carbon dioxide emissions. J Intell Fuzzy Syst. 2021;40(1):1391–1399. doi: 10.3233/JIFS-201724.
  7. Chen D, Yang X. Ridge estimation for uncertain autoregressive model with imprecise observations. Int J Uncertain Fuzzi Knowl-Based Syst. 2021;29(1):37–55. doi: 10.1142/S0218488521500033.
  8. Chen X, Li J, Xiao C, Yang P. Numerical solution and parameter estimation for uncertain SIR model with application to COVID-19. Fuzzy Optim Decis Mak. 2021;20(2):189–208. doi: 10.1007/s10700-020-09342-9.
  9. Cochrane D, Orcutt GH. Application of least squares regression to relationships containing autocorrelated error terms. J Am Stat Assoc. 1949;44(245):32–61.
  10. Durbin J. Estimation of parameters in time-series regression models. J Royal Stat Soc Series B. 1960;22(1):139–153.
  11. Jia L, Chen W. Uncertain SEIAR model for COVID-19 cases in China. Fuzzy Optim Decis Mak. 2021;20(2):243–259. doi: 10.1007/s10700-020-09341-w.
  12. Lio W, Liu B. Residual and confidence interval for uncertain regression model with imprecise observations. J Intell Fuzzy Syst. 2018;35(2):2573–2583. doi: 10.3233/JIFS-18353.
  13. Lio W, Liu B. Uncertain maximum likelihood estimation with application to uncertain regression analysis. Soft Comput. 2020;24(13):9351–9360. doi: 10.1007/s00500-020-04951-3.
  14. Lio W, Liu B. Initial value estimation of uncertain differential equations and zero-day of COVID-19 spread in China. Fuzzy Optim Decis Mak. 2021;20(2):177–188. doi: 10.1007/s10700-020-09337-6.
  15. Liu B. Uncertainty theory. 2nd ed. Berlin: Springer-Verlag; 2007.
  16. Liu S. Leave-p-out cross-validation test for uncertain Verhulst-Pearl model with imprecise observations. IEEE Access. 2019;7:131705–131709. doi: 10.1109/ACCESS.2019.2939386.
  17. Liu Z. Uncertain growth model for the cumulative number of COVID-19 infections in China. Fuzzy Optim Decis Mak. 2021;20(2):229–242. doi: 10.1007/s10700-020-09340-x.
  18. Liu Z. Generalized moment estimation for uncertain differential equations. Appl Math Comput. 2021;392:125724.
  19. Liu Z, Jia L. Cross-validation for the uncertain Chapman-Richards growth model with imprecise observations. Int J Uncertain Fuzzi Knowl-Based Syst. 2020;28(5):769–783. doi: 10.1142/S0218488520500336.
  20. Liu Z, Yang X. Cross validation for uncertain autoregressive model. Commun Stat Simul Comput. 2020. doi: 10.1080/03610918.2020.1747077.
  21. Liu Z, Yang Y. Least absolute deviations estimation for uncertain regression with imprecise observations. Fuzzy Optim Decis Mak. 2020;19(1):33–52. doi: 10.1007/s10700-019-09312-w.
  22. Liu Y, Liu B. Estimating unknown parameters in uncertain differential equation by maximum likelihood estimation. Technical Report; 2020.
  23. Sheng Y, Yao K, Chen X. Least squares estimation in uncertain differential equations. IEEE Trans Fuzzy Syst. 2020;28(10):2651–2655. doi: 10.1109/TFUZZ.2019.2939984.
  24. Song Y, Fu Z. Uncertain multivariable regression model. Soft Comput. 2018;22(17):5861–5866. doi: 10.1007/s00500-018-3324-5.
  25. Tang H. Uncertain vector autoregressive model with imprecise observations. Soft Comput. 2020;24(22):17001–17007. doi: 10.1007/s00500-020-04991-9.
  26. Yang X, Liu B. Uncertain time series analysis with imprecise observations. Fuzzy Optim Decis Mak. 2019;18(3):263–278. doi: 10.1007/s10700-018-9298-z.
  27. Yang X, Ni Y. Least-squares estimation for uncertain moving average model. Commun Stat Theory Methods. 2021;50(17):4134–4143. doi: 10.1080/03610926.2020.1713373.
  28. Yang X, Liu Y, Park G. Parameter estimation of uncertain differential equation with application to financial market. Chaos Solitons Fract. 2020;139:110026. doi: 10.1016/j.chaos.2020.110026.
  29. Yang X, Park G, Hu Y. Least absolute deviations estimation for uncertain autoregressive model. Soft Comput. 2020;24(23):18211–18217. doi: 10.1007/s00500-020-05079-0.
  30. Yao K, Liu B. Uncertain regression analysis: an approach for imprecise observations. Soft Comput. 2018;22(17):5579–5582. doi: 10.1007/s00500-017-2521-y.
  31. Yao K, Liu B. Parameter estimation in uncertain differential equations. Fuzzy Optim Decis Mak. 2020;19(1):1–12. doi: 10.1007/s10700-019-09310-y.
  32. Ye T, Liu Y. Multivariate uncertain regression model with imprecise observations. J Ambient Intell Human Comput. 2020;11(11):4941–4950. doi: 10.1007/s12652-020-01763-z.
  33. Ye T, Liu B. Uncertain hypothesis test with application to uncertain regression analysis. Fuzzy Optim Decis Mak. 2021. doi: 10.1007/s10700-021-09365-w.
  34. Ye T, Yang X. Analysis and prediction of confirmed COVID-19 cases in China with uncertain time series. Fuzzy Optim Decis Mak. 2021;20(2):209–228. doi: 10.1007/s10700-020-09339-4.
  35. Zhang C, Liu Z, Liu J. Least absolute deviations for uncertain multivariate regression model. Int J General Syst. 2020;49(4):449–465. doi: 10.1080/03081079.2020.1748615.

Articles from Soft Computing are provided here courtesy of Nature Publishing Group
