Journal of Applied Statistics, 2021, 49(9), pp. 2271–2286. doi: 10.1080/02664763.2021.1899142

Empirical likelihood estimation for linear regression models with AR(p) error terms with numerical examples

Şenay Özdemir, Yeşim Güney, Yetkin Tuaç, Olcay Arslan

Abstract

Linear regression models are useful statistical tools for analyzing data sets in different fields. There are several methods for estimating the parameters of a linear regression model, and these methods usually assume normally distributed and uncorrelated errors. If the error terms are correlated, the Conditional Maximum Likelihood (CML) estimation method under a normality assumption is often used to estimate the parameters of interest. The CML method requires a distributional assumption on the error terms; in practice, however, such an assumption may not be plausible. In this paper, we propose to estimate the parameters of a linear regression model with autoregressive error terms using the Empirical Likelihood (EL) method, which is distribution free. A small simulation study is provided to evaluate the performance of the proposed estimation method against the CML method. The results of the simulation study show that the proposed EL-based estimators are remarkably better than the CML estimators in terms of mean squared error (MSE) and bias in almost all simulation configurations. These findings are also confirmed by the results of the numerical and real data examples.

Keywords: AR(p) error terms, dependent error, empirical likelihood, linear regression

1. Introduction

Consider the following linear regression model

$y_t = x_t^T \beta + \varepsilon_t, \qquad t = 1, 2, \ldots, N \qquad (1)$

where $y_t$ is the $t$-th response variable, $x_t \in \mathbb{R}^M$ is the design vector, $\beta \in \mathbb{R}^M$ is the unknown $M$-dimensional parameter vector, and the $\varepsilon_t$ are uncorrelated errors with $E(\varepsilon_t) = 0$ and $\mathrm{Var}(\varepsilon_t) = \sigma^2$.

It is known that the least squares (LS) estimators of the regression parameters are obtained by minimizing the sum of squared residuals or, equivalently, by solving the following estimating equation

$\frac{1}{N}\sum_{t=1}^{N}(y_t - x_t^T\beta)\,x_t = 0. \qquad (2)$

The LS estimators are the minimum variance unbiased estimators of $\beta$ if the $\varepsilon_t$ are normally distributed. However, in real data applications the normality assumption may not be satisfied. If a normally distributed error term is not a reasonable assumption for a regression model, an alternative error distribution can be assumed and the maximum likelihood (ML) method applied to estimate the regression and other model parameters. On the other hand, if an error distribution is not easy to specify, distribution-free estimation methods should be preferred to obtain estimators for the parameters of interest. One such method is the EL method introduced by Owen [5–7]. A noticeable advantage of the EL method is that it enables likelihood-type inference without specifying any distributional model for the data. The EL method assumes unknown probability weights $\pi_t$, $t = 1, 2, \ldots, N$, one for each observation, and estimates these weights by maximizing an EL function, defined as the product of the $\pi_t$s, under constraints relating the $\pi_t$s to the regression parameters. The EL method can be stated mathematically as follows. Maximize the EL function

$L(\beta) = \prod_{t=1}^{N} \pi_t \qquad (3)$

under the constraints

$\sum_{t=1}^{N} \pi_t = 1 \qquad (4)$
$\sum_{t=1}^{N} \pi_t\, x_t (y_t - x_t^T\beta) = 0. \qquad (5)$

Note that the constraint in Equation (5) is similar to the estimating equation in Equation (2). The only difference is that Equation (2) uses the equal, known weight $1/N$ for each observation, whereas Equation (5) uses the unknown probability weights $\pi_t$, allowing each observation a different contribution to the estimation procedure. Further, if we also want to estimate the error variance along with the regression parameters, we add the following constraint to those in Equations (4)–(5).

$\sum_{t=1}^{N} \pi_t \left[(y_t - x_t^T\beta)^2 - \sigma^2\right] = 0. \qquad (6)$

The $\pi_t$s, and hence the model parameters $\beta$ and $\sigma^2$, can be estimated by maximizing the EL function in Equation (3) under the constraints (4)–(6). In general, the method of Lagrange multipliers can be used for this type of constrained optimization problem. For our problem the Lagrange function is

$\mathcal{L}(\pi,\beta,\lambda_0,\lambda_1,\lambda_2) = \sum_{t=1}^{N}\log(N\pi_t) + \lambda_0\left(\sum_{t=1}^{N}\pi_t - 1\right) + \lambda_1^T\sum_{t=1}^{N}\pi_t\, x_t(y_t - x_t^T\beta) + \lambda_2\sum_{t=1}^{N}\pi_t\left[(y_t - x_t^T\beta)^2 - \sigma^2\right]$

where $\pi = [\pi_1, \pi_2, \ldots, \pi_N]^T$, $\lambda_0, \lambda_2 \in \mathbb{R}$ and $\lambda_1 \in \mathbb{R}^M$ are Lagrange multipliers. Taking the derivative of the Lagrange function with respect to each $\pi_t$ and setting it to zero, we get

$\pi_t = \frac{1}{N + \lambda_1^T x_t(y_t - x_t^T\beta) + \lambda_2\left[(y_t - x_t^T\beta)^2 - \sigma^2\right]}. \qquad (7)$

Substituting $\pi_t$ from Equation (7) into the EL function and the constraints reduces the optimization problem to finding $\beta$, $\sigma$ and the Lagrange multipliers. However, this problem is still not easy to handle. Its solution has been considered by several authors using different approaches; for details of the suggested algorithms, see [5–10,18].
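
To make the profiling concrete, the following R sketch estimates $\beta$ for the uncorrelated-error model by maximizing the EL under constraints (4)–(5) only (so $\sigma^2$ is left out). It uses the `emplik` package for the inner optimization over the Lagrange multipliers; the simulated data, seed and variable names are illustrative, not from the paper.

```r
# A minimal sketch of profile EL estimation for model (1), assuming the
# 'emplik' package; rows of Z below are the estimating functions x_t(y_t - x_t'beta).
library(emplik)

set.seed(1)
N <- 100; M <- 2
X <- cbind(1, rnorm(N))                 # illustrative design with an intercept
y <- drop(X %*% c(1, 3) + rnorm(N))     # illustrative response

# -2 log EL ratio at a candidate beta; el.test() maximizes prod(pi_t)
# subject to sum_t pi_t = 1 and sum_t pi_t Z_t = 0.
neg2_log_el <- function(beta) {
  Z <- X * drop(y - X %*% beta)
  emplik::el.test(Z, mu = rep(0, M))$`-2LLR`
}

# The EL estimate maximizes the profile EL, i.e. minimizes the -2 log EL
# ratio; the LS fit is a natural starting value.
beta_el <- optim(qr.solve(X, y), neg2_log_el)$par
```

Profiling out the multipliers in this way mirrors the structure used for the AR(p) case in Section 2.2.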

It should be noted that all of the papers mentioned above consider the EL method for estimating the parameters of a regression model with uncorrelated error terms. However, in practice, the uncorrelated-error assumption may not be plausible for some data sets. For such data sets, regression analysis should be carried out with a regression model with autoregressive (AR(p)) error terms, defined as follows.

$y_t = \sum_{i=1}^{M} x_{t,i}\beta_i + e_t, \qquad t = 1, 2, \ldots, N, \qquad (8)$

where $y_t$ is the response variable, the $x_{t,i}$ are the predictor variables, the $\beta_i$ are the unknown regression parameters and $e_t$ is the AR(p) error term with

$e_t = \phi_1 e_{t-1} + \cdots + \phi_p e_{t-p} + a_t. \qquad (9)$

Here $\phi_j$, $j = 1, 2, \ldots, p$, are the unknown autoregressive parameters and $a_t$, $t = 1, 2, \ldots, N$, are i.i.d. random variables with $E(a_t) = 0$ and $\mathrm{Var}(a_t) = \sigma^2$. Note that this regression equation differs from the regression equation given in (1). However, using the back-shift operator $B$, it can be transformed into the usual regression equation as follows. Let

$a_t = \Phi(B)e_t = e_t - \phi_1 e_{t-1} - \cdots - \phi_p e_{t-p}, \qquad \Phi(B)y_t = y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}, \qquad (10)$

and

$\Phi(B)x_{t,i} = x_{t,i} - \phi_1 x_{t-1,i} - \cdots - \phi_p x_{t-p,i}. \qquad (11)$

Then, the regression model given in (8) can be rewritten as

$\Phi(B)y_t = \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i} + a_t, \qquad t = 1, 2, \ldots, N. \qquad (12)$
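
In code, the filtering in (10)–(11) is a one-line helper. The R sketch below (the helper name `phiB` is ours, not the paper's) returns the filtered series for $t = p+1, \ldots, N$; the first $p$ values are lost to the lags.

```r
# Back-shift filtering: z_t - phi_1 z_{t-1} - ... - phi_p z_{t-p}, t = p+1,...,N.
phiB <- function(z, phi) {
  p <- length(phi); N <- length(z)
  sapply((p + 1):N, function(t) z[t] - sum(phi * z[t - (1:p)]))
}

# Applying phiB to y and to each column of the design matrix X turns
# model (8)-(9) into the uncorrelated-error regression (12):
#   y_f <- phiB(y, phi);  X_f <- apply(X, 2, phiB, phi = phi)
```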

In the literature, the parameters of a regression model with autoregressive error terms are estimated using LS, ML or CML methods; some of the related papers are [1,3,15,16] and [17]. All of these papers assume a known distribution, such as the normal or t, for the error term in order to carry out estimation of the parameters of interest in this model. However, since imposing an appropriate distributional assumption on the error term of a regression model may not be easy, distribution-free estimation methods may be preferred for the regression analysis of a data set. In this study, unlike the papers in the literature, we do not assume any distribution for the error terms and propose to use the EL method to estimate the parameters of the linear regression model described in the previous paragraph. A brief outline of this proposal was given in [11] under the assumption that the variance of the error terms is known.

The rest of the paper is organized as follows. In Section 2, the CML and EL estimation methods for linear models with AR(p) error terms are given. In Section 3, a small simulation study, a numerical example and a real data example are provided to assess the performance of the EL-based estimators relative to the estimators obtained from the classical CML method. Finally, we draw some conclusions in Section 4.

2. Parameter estimation for linear regression models with AR(p) error terms

In this section, we describe in detail how the EL method is used to estimate the parameters of a regression model with autoregressive error terms. We first recall the CML method under the normality assumption on $a_t$. Note that since the exact likelihood function can be well approximated by the conditional likelihood function [2], the CML method is often used in cases where ML estimation is not feasible.

2.1. Conditional maximum likelihood estimation

In general, a system of nonlinear equations in the parameters has to be solved to obtain the ML estimators. However, since in most cases the ML estimators cannot be obtained analytically, numerical procedures must be used to compute the estimates. An alternative to numerical maximization of the exact likelihood function is to regard the values of the first $p$ observations as known and to maximize the likelihood function conditioned on them. In this part, the CML method is considered for the regression model given in (8).

Let the error terms $a_t$ in the regression model given in Equation (12) have a normal distribution with zero mean and variance $\sigma^2$. Then, the conditional log-likelihood function is

$\ln L = c - \frac{N-p}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=p+1}^{N}\left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)^2 \qquad (13)$

[1]. Taking the derivatives of the conditional log-likelihood function with respect to the unknown parameters, setting them to zero and rearranging the resulting equations yields the following estimating equations for the unknown parameters of the regression model under consideration

$\tilde{\beta} = \left[\tilde{\Phi}(B)X^T\,\tilde{\Phi}(B)X\right]^{-1}\left[\tilde{\Phi}(B)X^T\,\tilde{\Phi}(B)Y\right] \qquad (14)$
$\tilde{\sigma}^2 = \frac{1}{N-p}\left[\tilde{\Phi}(B)Y - \tilde{\Phi}(B)X\tilde{\beta}\right]^T\left[\tilde{\Phi}(B)Y - \tilde{\Phi}(B)X\tilde{\beta}\right] \qquad (15)$
$\tilde{\Phi} = R^{-1}(\tilde{\beta})\,R_0(\tilde{\beta}) \qquad (16)$

where $\tilde{\Phi}(B)X = [\tilde{\Phi}(B)x_{t,i}]$, $\tilde{\Phi}(B)Y = [\tilde{\Phi}(B)y_t]$,

$R(\tilde{\beta}) = \begin{bmatrix} \sum_{t=p+1}^{N} e_{t-1}^2 & \sum_{t=p+1}^{N} e_{t-1}e_{t-2} & \cdots & \sum_{t=p+1}^{N} e_{t-1}e_{t-p} \\ & \sum_{t=p+1}^{N} e_{t-2}^2 & \cdots & \sum_{t=p+1}^{N} e_{t-2}e_{t-p} \\ & & \ddots & \vdots \\ & & & \sum_{t=p+1}^{N} e_{t-p}^2 \end{bmatrix}$

(shown with its upper triangle; the matrix is symmetric) and $R_0(\tilde{\beta}) = \left[\sum_{t=p+1}^{N} e_t e_{t-1},\ \sum_{t=p+1}^{N} e_t e_{t-2},\ \ldots,\ \sum_{t=p+1}^{N} e_t e_{t-p}\right]^T$.

However, since these estimating equations cannot be solved explicitly, numerical methods must be used to compute the estimates. The estimating equations themselves suggest a simple iterative reweighting algorithm (IRA) for computing the estimates [1,16], as sketched below.
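
As an illustration, the R sketch below implements one such alternating scheme, reusing the `phiB` helper above. The function name, starting values and convergence defaults are our own choices; the updates follow Equations (14)–(16).

```r
# Alternating scheme for the CML estimates:
# update beta by (14) and phi by (16) in turn, then compute sigma^2 by (15).
cml_ira <- function(y, X, p, tol = 1e-8, maxit = 200) {
  N <- length(y)
  beta <- qr.solve(X, y)                       # LS start for beta
  phi  <- rep(0, p)
  for (it in 1:maxit) {
    yf <- phiB(y, phi)                         # filtered response
    Xf <- apply(X, 2, phiB, phi = phi)         # filtered design
    beta_new <- qr.solve(Xf, yf)               # equation (14)
    e <- drop(y - X %*% beta_new)              # raw residuals e_t
    E <- sapply(1:p, function(j) e[(p + 1 - j):(N - j)])   # e_{t-1},...,e_{t-p}
    phi_new <- drop(solve(crossprod(E), crossprod(E, e[(p + 1):N])))  # (16)
    done <- max(abs(c(beta_new - beta, phi_new - phi))) < tol
    beta <- beta_new; phi <- phi_new
    if (done) break
  }
  a <- phiB(y, phi) - drop(apply(X, 2, phiB, phi = phi) %*% beta)
  list(beta = drop(beta), phi = phi, sigma2 = sum(a^2) / (N - p))    # (15)
}
```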

2.2. Empirical likelihood estimation

In this section we consider the EL method for estimating the unknown parameters of the regression model given in Equation (12). The required constraints on the parameters are formed similarly to the EL estimation used in the classical regression case (the uncorrelated-error model). Since we adopt the conditional approach, we again assume that the first $p$ observations are known and form the conditional empirical likelihood (CEL) function using unknown probability weights $\pi_t$ for the observations $t = p+1, \ldots, N$. It should be noted that in the CML case the $a_t$ are assumed to be normally distributed, whereas in the CEL case we do not need to assume any specific distribution for $a_t$. The CEL estimation procedure can be formulated as follows.

Let $\pi_t$, $t = p+1, \ldots, N$, be the unknown probability weights for the observations. Then, maximize the following CEL function

$\max_{\pi_t \in (0,1)}\ \sum_{t=p+1}^{N} \log(\pi_t) \qquad (17)$

under the constraints

$\sum_{t=p+1}^{N} \pi_t = 1 \qquad (18)$
$\sum_{t=p+1}^{N} \pi_t \left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)\Phi(B)x_t = 0 \qquad (19)$
$\sum_{t=p+1}^{N} \pi_t \left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)e_{t,p} = 0 \qquad (20)$
$\sum_{t=p+1}^{N} \pi_t \left[\left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)^2 - \sigma^2\right] = 0 \qquad (21)$

to obtain CEL estimators for the parameters of the regression model given in Equation (12). Here,

$\Phi(B)x_t = \left[\Phi(B)x_{t,1}, \Phi(B)x_{t,2}, \ldots, \Phi(B)x_{t,M}\right]^T$ and $e_{t,p} = \left[e_{t-1}, e_{t-2}, \ldots, e_{t-p}\right]^T$.

Since this is a constrained optimization problem, the Lagrange multiplier method can be used to solve it. To this end, let $\lambda_0$, $\lambda = (\lambda_1^T, \lambda_2^T, \lambda_3)$, $\phi$ and $\pi$ denote the Lagrange multipliers, the vector of $\phi_i$, $i = 1, 2, \ldots, p$, and the vector of $\pi_t$, $t = p+1, \ldots, N$, respectively. Then, the Lagrange function for this optimization problem can be written as

$\mathcal{L}(\beta,\phi,\sigma^2,\pi,\lambda,\lambda_0) = \sum_{t=p+1}^{N}\log(\pi_t) + \lambda_0\left(\sum_{t=p+1}^{N}\pi_t - 1\right) + \lambda_1^T\left(\sum_{t=p+1}^{N}\pi_t\left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)\Phi(B)x_t\right) + \lambda_2^T\left(\sum_{t=p+1}^{N}\pi_t\left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)e_{t,p}\right) + \lambda_3\left(\sum_{t=p+1}^{N}\pi_t\left[\left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)^2 - \sigma^2\right]\right)$

Taking the derivatives of $\mathcal{L}(\beta,\phi,\sigma^2,\pi,\lambda,\lambda_0)$ with respect to $\pi_t$ and the Lagrange multipliers, and setting the resulting equations to zero, we obtain the first-order conditions of this optimization problem. Solving the first-order conditions for $\pi_t$ yields

$\pi_t = \frac{1}{(N-p) + \lambda_1^T\Psi_{1,t} + \lambda_2^T\Psi_{2,t} + \lambda_3\Psi_{3,t}}, \qquad t = p+1, \ldots, N \qquad (22)$

where

$\Psi_{1,t} = \left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)\Phi(B)x_t,$
$\Psi_{2,t} = \left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)e_{t,p},$
$\Psi_{3,t} = \left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)^2 - \sigma^2.$

Substituting these values of πt into Equation (17) we find that

$l(\lambda,\beta,\phi,\sigma^2) = \sum_{t=p+1}^{N}\log\pi_t = -\sum_{t=p+1}^{N}\log\left((N-p) + \lambda_1^T\Psi_{1,t} + \lambda_2^T\Psi_{2,t} + \lambda_3\Psi_{3,t}\right) \qquad (23)$

Since $0 < \pi_t < 1$, the EL method maximizes this function over the set where $(N-p) + \lambda_1^T\Psi_{1,t} + \lambda_2^T\Psi_{2,t} + \lambda_3\Psi_{3,t} > 1$. The rest of this optimization problem is carried out as follows.

For given β, ϕ and σ2 minimize the function l(λ,β,ϕ,σ2) given in equation (23) with respect to the Lagrange multipliers λ=(λ1,λ2,λ3). That is, for given β, ϕ and σ2 solve the following minimization problem to get the values of Lagrange multipliers

$\lambda(\beta,\phi,\sigma^2) = \arg\min_{\lambda}\ l(\lambda,\beta,\phi,\sigma^2).$

Since the solution of this minimization problem cannot be obtained explicitly, numerical methods must be used. Substituting the solution into $l(\lambda,\beta,\phi,\sigma^2)$ yields the function $l(\lambda(\beta,\phi,\sigma^2),\beta,\phi,\sigma^2)$, which depends only on the model parameters $\beta$, $\phi$ and $\sigma^2$. This function can be regarded as a profile conditional empirical log-likelihood function. The CEL estimators $\hat{\beta}$, $\hat{\phi}$ and $\hat{\sigma}^2$ are obtained by maximizing $l(\lambda(\beta,\phi,\sigma^2),\beta,\phi,\sigma^2)$ with respect to the model parameters. That is, the CEL estimators are the solutions of the following maximization problem

$(\hat{\beta},\hat{\phi},\hat{\sigma}^2) = \arg\max_{\beta,\phi,\sigma^2}\ l(\lambda(\beta,\phi,\sigma^2),\beta,\phi,\sigma^2).$

Since there is no explicit solution to this maximization problem, numerical methods must be used to obtain the CEL estimators $\hat{\beta}$, $\hat{\phi}$ and $\hat{\sigma}^2$. In this paper, we use a Newton-type algorithm to carry out this optimization. The steps of our algorithm are as follows.

Step 0. Set initial values $\lambda^{(0)}$, $\beta^{(0)}$, $\phi^{(0)}$ and $\sigma^{2(0)}$, and fix the stopping tolerance $\epsilon$. The starting value $\lambda^{(0)}$ can be set to the zero vector, but setting each Lagrange multiplier to $n$ yields faster convergence.

Step 1. The function $l(\lambda,\beta^{(0)},\phi^{(0)},\sigma^{2(0)})$ given in Equation (23) is minimized with respect to $\lambda$. This is done by iterating

$\lambda^{(j+1)} = \lambda^{(j)} - \frac{1}{2}\left(l''_{\lambda}\right)^{-1} l'_{\lambda}$

until convergence is satisfied. Here $l'_{\lambda}$ and $l''_{\lambda}$ are the first- and second-order derivatives of $l(\lambda,\beta^{(0)},\phi^{(0)},\sigma^{2(0)})$ with respect to $\lambda$. By doing so we obtain $\lambda^{(m)}$, the value of the Lagrange multipliers computed at the $m$-th step, $m = 1, 2, 3, \ldots$

Step 2. After finding $\lambda^{(m)}$ at Step 1, the function $l(\lambda^{(m)},\beta,\phi,\sigma^2)$ is maximized with respect to $\phi$, $\beta$ and $\sigma^2$ using the following updating equations:

$\phi^{(j+1)} = \phi^{(j)} - \frac{1}{2}\left(l''_{\phi}\right)^{-1} l'_{\phi}, \qquad \beta^{(j+1)} = \beta^{(j)} - \frac{1}{2}\left(l''_{\beta}\right)^{-1} l'_{\beta}, \qquad \sigma^{2(j+1)} = \sigma^{2(j)} - \frac{1}{2}\left(l''_{\sigma^2}\right)^{-1} l'_{\sigma^2}.$

Here $l'_{\beta}$, $l'_{\phi}$ and $l'_{\sigma^2}$ are the first-order derivatives of $l(\lambda^{(m)},\beta,\phi,\sigma^2)$ with respect to $\beta$, $\phi$ and $\sigma^2$, while $l''_{\beta}$, $l''_{\phi}$ and $l''_{\sigma^2}$ denote the corresponding second-order derivatives. At this step we obtain $\beta^{(m)}$, $\phi^{(m)}$ and $\sigma^{2(m)}$, for $m = 1, 2, 3, \ldots$

Step 3. Together, steps 1 and 2 accomplish one iteration. Therefore, after steps 1 and 2 we obtain the values λ(m), β(m), ϕ(m) and σ2(m) computed at m-th iteration. These are current estimates for the parameters and are used as initial values to obtain the estimates in (m+1)-th iteration. Therefore, repeat steps 1 and 2 until

$\left\|\beta^{(m+1)} - \beta^{(m)}\right\| + \left\|\phi^{(m+1)} - \phi^{(m)}\right\| + \left|\sigma^{2(m+1)} - \sigma^{2(m)}\right| < \epsilon$

is satisfied.
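
The sketch below condenses Steps 0–3 in R, reusing `phiB` and `cml_ira` from above. For brevity it replaces the analytic Newton updates with R's general-purpose optimizers, so it illustrates the nested (profile) structure of the algorithm rather than the authors' exact updates; all names are illustrative.

```r
# Profile CEL: inner minimization over lambda (Step 1) nested inside an
# outer maximization over (beta, phi, sigma^2) (Steps 2-3).
cel_fit <- function(y, X, p) {
  N <- length(y); M <- ncol(X)

  # For fixed parameters, build Psi_{1,t}, Psi_{2,t}, Psi_{3,t} and minimize
  # l(lambda) = -sum_t log((N-p) + lambda' Psi_t) over lambda.
  profile_l <- function(theta) {
    beta <- theta[1:M]; phi <- theta[M + (1:p)]; sigma2 <- exp(theta[M + p + 1])
    yf <- phiB(y, phi); Xf <- apply(X, 2, phiB, phi = phi)
    r  <- drop(yf - Xf %*% beta)                     # filtered residuals a_t
    e  <- drop(y - X %*% beta)                       # raw residuals e_t
    E  <- sapply(1:p, function(j) e[(p + 1 - j):(N - j)])
    Psi <- cbind(Xf * r, E * r, r^2 - sigma2)        # constraints (19)-(21)
    inner <- function(lam) {
      d <- (N - p) + drop(Psi %*% lam)
      if (any(d <= 1)) return(1e10)                  # keep 0 < pi_t < 1
      -sum(log(d))                                   # l = sum_t log(pi_t)
    }
    optim(rep(0, ncol(Psi)), inner)$value            # Step 1, numerically
  }

  # Step 0: start from the CML fit; sigma^2 is log-parameterized for positivity.
  th0 <- with(cml_ira(y, X, p), c(beta, phi, log(sigma2)))
  # Steps 2-3: maximize the profile CEL over all model parameters.
  th <- optim(th0, profile_l, control = list(fnscale = -1))$par
  list(beta = th[1:M], phi = th[M + (1:p)], sigma2 = exp(th[M + p + 1]))
}
```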

The performance of the proposed algorithm is evaluated with the help of the simulation study and the numerical example given in the next section.

The asymptotic distribution of the proposed estimator can be given under the following assumptions. Let $g(x,\beta_0) = \sum_{t=p+1}^{N}\pi_t\left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)\Phi(B)x_t = 0$ and assume that:

  • $E\left[g(x,\beta_0)\,g^T(x,\beta_0)\right]$ is positive definite.

  • $\partial g(x,\beta)/\partial\beta$ is continuous in a neighborhood of $\beta_0$.

  • $\left\|\partial g(x,\beta)/\partial\beta\right\|$ and $\left\|g(x,\beta)\right\|^3$ are bounded in this neighborhood of $\beta_0$.

  • The rank of $E\left[\partial g(x,\beta_0)/\partial\beta\right]$ is $k$.

  • $\partial^2 g(x,\beta)/\partial\beta\,\partial\beta^T$ is continuous in $\beta$ in a neighborhood of $\beta_0$, and $\left\|\partial^2 g(x,\beta)/\partial\beta\,\partial\beta^T\right\|$ is bounded.

Then, under these assumptions,

$\sqrt{n}\left(\hat{\beta} - \beta_0\right) \xrightarrow{d} N(0, V)$

where

$V = \left[E\left(\frac{\partial g(x,\beta_0)}{\partial\beta}\right)^T \left\{E\left(g(x,\beta_0)\,g(x,\beta_0)^T\right)\right\}^{-1} E\left(\frac{\partial g(x,\beta_0)}{\partial\beta}\right)\right]^{-1}$

[14].

One may prefer to use the observed Fisher information matrix to calculate standard errors or confidence intervals. The observed Fisher information matrix for the CEL can be obtained with the help of $l''_{\beta}$, $l''_{\phi}$ and $l''_{\sigma^2}$, the second-order derivatives mentioned in Step 2 of the algorithm.

3. Simulation study

In this section, we present a small simulation study and a numerical example to illustrate the performance of the empirical likelihood estimators for the regression model with autoregressive error terms relative to the estimators obtained under normally distributed error terms. We use R version 3.4.0 (2017-04-21) [13] to carry out the simulation study and the numerical example.

3.1. Simulation design

We consider the regression model with second-order autoregressive (AR(2)) error terms and $M = 3$ regression parameters. The sample sizes are taken as n = 25, 50 and 100. For each case the explanatory variables $x_t$ are generated from the standard normal distribution ($x_t \sim N(0,1)$). The parameter values are taken as $\beta = (\beta_1, \beta_2, \beta_3) = (1, 3, 5)$, $\sigma^2 = 1$ and $\phi = (\phi_1, \phi_2) = (0.8, -0.2)$. Note that the values of $\phi$ are chosen to guarantee the stationarity of the error terms (AR(2) stationarity requires, in particular, $\phi_1 + \phi_2 < 1$). We consider three different distributions for the error term $a_t$: $N(0,1)$, $0.9N(0,1) + 0.1N(30,1)$ and $0.9N(0,1) + 0.1N(0,10)$; the last two distributions are used to generate outliers in the y-direction. After setting the values of the model parameters $\beta$, $\phi$ and $\sigma^2$ and choosing the distribution of the error term, the values of the response variable are generated using $\Phi(B)y_t = \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i} + a_t$, as sketched below.
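
One replication of this design can be generated as follows (an R sketch; the seed is illustrative, and the innovation line is swapped for the mixture draws in the outlier scenarios).

```r
set.seed(123)                               # illustrative seed
n <- 100; M <- 3; p <- 2
beta <- c(1, 3, 5); phi <- c(0.8, -0.2); sigma2 <- 1

X <- matrix(rnorm(n * M), n, M)             # x_t ~ N(0, 1)
a <- rnorm(n, 0, sqrt(sigma2))              # innovations a_t; for outliers use e.g.
                                            # ifelse(runif(n) < 0.1, rnorm(n, 30, 1), rnorm(n))
e <- numeric(n)                             # AR(2) errors: e_t = phi1 e_{t-1} + phi2 e_{t-2} + a_t
for (t in (p + 1):n) e[t] <- sum(phi * e[t - (1:p)]) + a[t]
y <- drop(X %*% beta) + e                   # responses; equivalent to generating via (12)
```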

We compare the results of the CEL estimators with the results of the CML estimators. Mean squared error (MSE) and bias values are calculated as performance measures to compare the estimators. These values are calculated using the following equations for 1000 replications:

  1. $\mathrm{MSE}(\hat{\beta}) = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat{\beta}_i - \beta\right)^2$, $\ \mathrm{bias}(\hat{\beta}) = \bar{\beta} - \beta$,

  2. $\mathrm{MSE}(\hat{\phi}) = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat{\phi}_i - \phi\right)^2$, $\ \mathrm{bias}(\hat{\phi}) = \bar{\phi} - \phi$,

  3. $\mathrm{MSE}(\hat{\sigma}^2) = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat{\sigma}^2_i - \sigma^2\right)^2$, $\ \mathrm{bias}(\hat{\sigma}^2) = \bar{\sigma}^2 - \sigma^2$,

where $\bar{\beta} = \frac{1}{1000}\sum_{i=1}^{1000}\hat{\beta}_i$, $\bar{\phi} = \frac{1}{1000}\sum_{i=1}^{1000}\hat{\phi}_i$ and $\bar{\sigma}^2 = \frac{1}{1000}\sum_{i=1}^{1000}\hat{\sigma}^2_i$.
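
In code these measures amount to column means over the replication-by-parameter matrix of estimates (a small sketch; `est` and `truth` are illustrative names).

```r
# est: 1000 x K matrix of replicate estimates; truth: length-K vector.
perf <- function(est, truth) {
  list(MSE  = colMeans(sweep(est, 2, truth)^2),   # mean squared error per parameter
       bias = colMeans(est) - truth)              # mean estimate minus true value
}
```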

3.2. Simulation results

In Table 1, the mean estimates, MSE and bias values of the CEL and CML estimators, computed over 1000 replications, are reported for the normally distributed error case (the case without outliers). We observe from these results that both estimators behave similarly for large sample sizes when there are no outliers. On the other hand, for smaller sample sizes the CEL estimators perform better than the CML estimators in terms of MSE. Thus, for small sample sizes the EL-based estimator may be a good alternative to the CML method.

Table 1.

Estimates, MSE and bias values of the estimates for different sample sizes without outliers ($a_t \sim N(0,1)$).

  n=25 n=50 n=100
  Normal Empirical Normal Empirical Normal Empirical  
  β^1 0.98796 1.00138 0.99845 1.00178 0.99932 1.00068
β1 MSE 0.03276 0.0058 0.01445 0.00228 0.00584 0.00074
  BIAS 0.14255 0.05276 0.09522 0.0319 0.0605 0.01832
  β^2 3.00136 3.00697 2.99808 3.00009 2.9978 2.99753
β2 MSE 0.03247 0.00613 0.01346 0.00225 0.00664 0.00071
  BIAS 0.14406 0.05512 0.09181 0.03214 0.06437 0.01797
  β^3 5.00803 5.00139 4.99681 4.99966 4.99913 5.00266
β3 MSE 0.03481 0.00657 0.01421 0.00202 0.00583 0.00069
  BIAS 0.14491 0.05574 0.09389 0.03042 0.06045 0.01789
  ϕ^1 0.76906 0.80372 0.77997 0.8013 0.79587 0.80081
ϕ1 MSE 0.0457 0.00028 0.02118 0.0001 0.01042 0.00004
  BIAS 0.16869 0.01328 0.11537 0.00784 0.08196 0.00465
  ϕ^2 0.2095 0.19714 0.20945 0.199 0.21079 0.20002
ϕ2 MSE 0.04266 0.00029 0.01949 0.0001 0.00951 0.00004
  BIAS 0.16559 0.01394 0.11067 0.00824 0.07779 0.00467
  σ^2 0.88229 1.15221 0.94378 1.1022 0.96983 1.07259
σ2 MSE 0.03511 0.03062 0.01351 0.01454 0.00633 0.00784
  BIAS 0.15368 0.15458 0.09334 0.1022 0.06461 0.07326

Note: True values are $(\beta_1, \beta_2, \beta_3) = (1, 3, 5)$, $(\phi_1, \phi_2) = (0.8, -0.2)$ and $\sigma^2 = 1$.

The simulation results for the outlier cases are reported in Tables 2 and 3. From these tables we observe that, when outliers are introduced into the data, the CML estimators are noticeably influenced by them, with much higher MSE values. On the other hand, the estimators based on empirical likelihood retain their good performance in terms of smaller MSE and bias values. To sum up, the performance of the estimators obtained from the empirical likelihood is superior to that of the estimators obtained from the CML method when outliers are present in the data and/or the sample size is small.

Table 2.

Estimates, MSE and bias values of the estimates for different sample sizes with $a_t \sim 0.9N(0,1) + 0.1N(30,1)$.

    n=25 n=50 n=100
    Normal Empirical Normal Empirical Normal Empirical
  β^1 0.99758 1.0001 1.01599 1.0008 0.9947 1.00067
β1 MSE 0.53329 0.00588 0.11102 0.00189 0.02944 0.00071
  BIAS 0.56527 0.05219 0.26139 0.02957 0.13598 0.01776
  β^2 2.97227 3.00938 2.98684 2.9988 2.99961 3.00186
β2 MSE 0.51638 0.00659 0.10983 0.00222 0.02613 0.00068
  BIAS 0.5623 0.05636 0.26129 0.03185 0.12822 0.01766
  β^3 4.97749 5.00772 5.00388 5.00511 4.99569 5.00152
β3 MSE 0.50668 0.00559 0.10949 0.00218 0.0282 0.00072
  BIAS 0.55812 0.05204 0.2574 0.03174 0.13337 0.01798
  ϕ^1 1.77218 0.80245 1.7009 0.80143 1.63445 0.80104
ϕ1 MSE 0.9494 0.00029 0.81299 0.00009 0.69691 0.00003
  BIAS 0.97218 0.0136 0.9009 0.00746 0.83445 0.00446
  ϕ^2 0.94992 0.19518 0.75483 0.19879 0.64632 0.20002
ϕ2 MSE 0.58956 0.00033 0.31046 0.00011 0.19993 0.00004
  BIAS 0.74992 0.01471 0.55483 0.0083 0.44632 0.00462
  σ^2 5.95214 1.13923 4.42534 1.10108 3.30909 1.06915
σ2 MSE 24.68756 0.02581 11.7763 0.0142 5.34667 0.00688
  BIAS 4.95214 0.1419 3.42534 0.10183 2.30909 0.06921

Note: True values are $(\beta_1, \beta_2, \beta_3) = (1, 3, 5)$, $(\phi_1, \phi_2) = (0.8, -0.2)$ and $\sigma^2 = 1$.

Table 3.

Estimates, MSE and bias values of the estimates for different sample sizes with $a_t \sim 0.9N(0,1) + 0.1N(0,10)$.

    n=25 n=50 n=100
    Normal Empirical Normal Empirical Normal Empirical
  β^1 1.00012 1.00766 0.9878 0.99887 0.99299 1.00153
β1 MSE 0.21859 0.00777 0.11943 0.00276 0.06829 0.00076
  BIAS 0.33632 0.05886 0.26183 0.034 0.20502 0.01859
  β^2 2.99806 2.9978 2.98896 3.00449 2.98302 3.00347
β2 MSE 0.23989 0.0086 0.13169 0.00265 0.06363 0.00083
  BIAS 0.34908 0.06087 0.2692 0.03391 0.19512 0.01922
  β^3 5.01594 5.00257 4.99975 5.00495 4.99407 4.99616
β3 MSE 0.22748 0.0079 0.12487 0.00273 0.06541 0.00083
  BIAS 0.3443 0.05964 0.26638 0.03409 0.19757 0.01894
  ϕ^1 0.79014 0.80161 0.73106 0.80076 0.7298 0.80066
ϕ1 MSE 0.43476 0.00022 0.18784 0.00009 0.08888 0.00004
  BIAS 0.51506 0.01213 0.34467 0.00757 0.23919 0.00471
  ϕ^2 0.46927 0.19645 0.26362 0.19819 0.21955 0.19966
ϕ2 MSE 0.7917 0.00031 0.23481 0.0001 0.08613 0.00004
  BIAS 0.70074 0.01421 0.37436 0.00786 0.23868 0.00473
  σ^2 2.65719 1.14735 2.70776 1.09725 2.94881 1.06939
σ2 MSE 3.79346 0.02846 3.59029 0.01368 4.16932 0.00702
  BIAS 1.65951 0.14943 1.70791 0.09889 1.94881 0.0697

Note: True values are $(\beta_1, \beta_2, \beta_3) = (1, 3, 5)$, $(\phi_1, \phi_2) = (0.8, -0.2)$ and $\sigma^2 = 1$.

3.3. Numerical example

Montgomery et al. [4] give a soft drink data example relating annual regional advertising expenses to annual regional concentrate sales for a soft drink company over 20 years. After calculating the LS residuals for the linear regression model, the assumption of uncorrelated errors can be tested using the Durbin–Watson statistic, a test statistic for detecting positive autocorrelation. The critical values of the Durbin–Watson statistic are dL = 1.20 and dU = 1.41 for one explanatory variable and 20 observations. The calculated value of the Durbin–Watson statistic is d = 1.08, which is less than dL = 1.20. Therefore, the errors in the regression model can be said to be positively autocorrelated.
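
The Durbin–Watson statistic itself is a simple function of successive LS residuals; a short R sketch (the data frame and column names are hypothetical):

```r
fit <- lm(sales ~ expenses, data = softdrink)  # hypothetical soft drink data frame
e <- resid(fit)
d <- sum(diff(e)^2) / sum(e^2)                 # Durbin-Watson statistic
d < 1.20                                       # d below dL = 1.20 => positive autocorrelation
# Equivalently, lmtest::dwtest(fit) returns the statistic with a p-value.
```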

We consider a linear regression model with AR(p) error terms and find the LS estimates for the model parameters. Then, we use the LS estimates as initial values to run the algorithm computing the CEL and CML estimates. The parameter estimates are given in Table 4. We observe from these results that the CEL and CML methods give similar estimates for the regression and autoregressive parameters, but the CEL estimate of the error variance is smaller than the CML estimate. Figure 1(a) shows the scatter plot of the data set with the fitted regression lines obtained from the CEL and CML methods; the fitted lines nearly coincide. To further explore the behavior of the estimators in the presence of outliers, we replace the last observation with an outlier. For the new data set (the same data set with one outlier) we again compute the CEL and CML estimates for all the parameters; these estimates are also provided in Table 4. Figure 1(b) depicts the scatter plot of the data and the fitted lines obtained from the CEL and CML estimates. We observe that, unlike the fitted line obtained from the CEL method, the fitted line obtained from the CML method is badly influenced by the outlier.

Table 4.

The parameter estimates for soft drink data.

  Without outlier With one outlier
  Empirical Normal Empirical Normal
β^0 1593.95911 1645.37912 1617.57068 560.9601
β^1 20.07335 19.80988 19.93477 30.7569
ϕ^1 0.89579 0.56856 0.89814 2.5689
σ^2 6.43127 16.96544 2.44785 2025.174

Figure 1. Scatter plot of the soft drink data and the CEL and CML fits: (a) without outlier, (b) with one outlier.

3.4. Real data example

We use a real data set consisting of CO2 emission (metric kgs per capita) and energy usage (kg of oil equivalent per capita) for Turkey between the years 1974 and 2014. The data set can be obtained from https://data.worldbank.org. The scatter plot in Figure 2(a) shows a linear relationship between the logarithms of CO2 emission and energy usage. If the error terms are calculated using the LS estimates (−15.06489, 1.16258), it can be seen that the data have an AR(1) structure with autocorrelation coefficient $\hat{\phi} = 0.65$, since the Durbin–Watson test statistic and its p-value are 0.662571 and 1.235e-07, respectively. The ACF (autocorrelation function) and PACF (partial ACF) plots in Figure 3 support the same conclusion, and the Q-Q plot of the residuals in Figure 3 suggests that the assumption of normally distributed error terms is satisfied. A sketch of these diagnostics is given below.
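
These diagnostics take a few lines in R (a sketch; the variable names for the two series are hypothetical):

```r
fit <- lm(log(co2) ~ log(energy))   # LS fit on the log-log scale
e <- resid(fit)
sum(diff(e)^2) / sum(e^2)           # Durbin-Watson statistic (about 0.66 here)
acf(e); pacf(e)                     # AR(1) signature: geometric ACF decay, PACF cutoff at lag 1
qqnorm(e); qqline(e)                # normality check for the residuals
```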

Figure 2. Scatter plot of the CO2 emission–energy usage data and the CEL and CML fits: (a) original data, (b) data with one outlier.

Figure 3. ACF, PACF and Q-Q plots of the residuals for the CO2 emission–energy usage data.

For this data set, we use the CML and CEL methods to estimate the regression parameters, the autocorrelation structure and the error variance. Note that the LS estimates from the original data are used as initial values to compute the CML and CEL estimates given in Table 5. The fitted regression lines, along with the scatter plot of the original data, are given in Figure 2(a); both methods give similar results.

Table 5.

The parameter estimates for CO2 emission–energy usage data.

  Without outlier With one outlier
  Empirical Normal Empirical Normal
β^0 15.389316 13.827491 13.891223 25.869536
β^1 1.187502 1.074341 1.076084 1.952316
ϕ^1 0.655439 0.861551 0.511249 −0.201119
σ^2 0.824727 0.021447 1.148087 0.900177

To evaluate the performance of the CEL method when there are outliers in the data, or a departure from normality, we create an artificial outlier by multiplying the last observation by five, and use both estimation methods to estimate the model parameters. The estimates are provided in Table 5 and the fitted lines obtained from both methods are shown in Figure 2(b). We can easily see from Figure 2(b) that the CEL method fits the data with the outlier better than the CML method does. Although the real autocorrelation is positive, the CML estimate of the autocorrelation is negative because of the outlier effect. In this case the LS estimates of the regression parameters are (−27.4952, 2.0717) and the estimated autocorrelation is −0.0107.

4. Conclusion

In the literature, the CML or LS methods are often adopted to estimate the parameters of a regression model with autoregressive error terms. The CML method carries out the estimation under a distributional assumption on the error term. However, for some data sets it may not be plausible to impose a distributional assumption on the error term, due to the lack of information about the data. For those data sets, distribution-free methods should be preferred for the regression analysis. One such distribution-free method is the EL method, which can also be used for small sample sizes. In this paper, we have used the EL method to estimate the parameters of a regression model with autoregressive error terms. We have defined a CEL function and constructed the necessary constraints using probability weights and the normal equations borrowed from classical LS estimation. To evaluate and compare the performance of the proposed method with the CML method, we have provided a simulation study and examples, comparing the results in terms of MSE and bias. We have designed two simulation scenarios: data with outliers and data without outliers. The results of the simulation study and the numerical and real data examples have demonstrated that the CEL method can perform better than the CML method when there are outliers in the data set, which represents a deviation from the normality assumption; otherwise the two methods have similar performance. For further studies, an ARMA extension and constraints derived from the t-distribution for the CEL method will be considered.

Acknowledgements

The authors thank the anonymous reviewers and editors for their careful reading of this article and for their suggestions.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Alpuim T. and El-Shaarawi A., On the efficiency of regression analysis with AR(p) errors, J. Appl. Stat. 35 (2008), pp. 717–737.
  • 2. Ansley C.F., An algorithm for the exact likelihood of a mixed autoregressive-moving average process, Biometrika 66 (1979), pp. 59–65.
  • 3. Beach C.M. and MacKinnon J.G., A maximum likelihood procedure for regression with autocorrelated errors, Econometrica 46 (1978), pp. 51–58.
  • 4. Montgomery D.C., Peck E.A. and Vining G.G., Introduction to Linear Regression Analysis, Vol. 821, John Wiley & Sons, Hoboken, 2012.
  • 5. Owen A.B., Empirical likelihood ratio confidence intervals for a single functional, Biometrika 75 (1988), pp. 237–249.
  • 6. Owen A.B., Empirical likelihood confidence regions, Ann. Statist. 18 (1990), pp. 90–120.
  • 7. Owen A.B., Empirical likelihood for linear models, Ann. Statist. 19 (1991), pp. 1725–1747.
  • 8. Owen A.B., Empirical Likelihood, Chapman and Hall, New York, 2001.
  • 9. Owen A.B., Self-concordance for empirical likelihood, Can. J. Stat. 41 (2013), pp. 387–397.
  • 10. Özdemir Ş. and Arslan O., Combining empirical likelihood and robust estimation methods for linear regression models, Commun. Stat. Simul. Comput. (2019). Available at https://www.tandfonline.com/doi/full/10.1080/03610918.2019.1659968.
  • 11. Özdemir Ş., Güney Y., Tuaç Y. and Arslan O., Empirical likelihood estimation for linear regression models with AR(p) error term, 31st European Meeting of Statisticians (EMS 2017), Helsinki, July 2017. Available at http://ems2017.helsinki.fi/main.pdf.
  • 12. Özdemir Ş., Güney Y., Tuaç Y. and Arslan O., Empirical likelihood estimation for linear regression models with AR(p) error terms, preprint (2020). Available at https://arxiv.org/abs/2008.03282.
  • 13. R Core Team, R: A language and environment for statistical computing, 2017. Available at https://www.R-project.org/.
  • 14. Qin J. and Lawless J., Empirical likelihood and general estimating equations, Ann. Statist. 22 (1994), pp. 300–325.
  • 15. Tiku M.L., Wong W. and Bian G., Estimating parameters in autoregressive models in nonnormal situations: symmetric innovations, Commun. Stat. Theory Methods 28 (1999), pp. 315–341.
  • 16. Tuaç Y., Güney Y., Şenoğlu B. and Arslan O., Robust parameter estimation of regression model with AR(p) error terms, Commun. Stat. Simul. Comput. 47 (2018), pp. 2343–2359.
  • 17. Tuaç Y., Güney Y. and Arslan O., Parameter estimation of regression model with AR(p) error terms based on skew distributions with EM algorithm, Soft Comput. (2019). Available at https://doi.org/10.1007/s00500-019-04089-x.
  • 18. Yang D. and Small D.S., An R package and a study of methods for computing empirical likelihood, J. Stat. Comput. Simul. 83 (2013), pp. 1363–1372.
