Journal of Applied Statistics, 2021, 49(9), pp. 2271–2286. doi: 10.1080/02664763.2021.1899142

Empirical likelihood estimation for linear regression models with AR(p) error terms with numerical examples

Şenay Özdemir, Yeşim Güney, Yetkin Tuaç, Olcay Arslan

Abstract

Linear regression models are useful statistical tools for analyzing data sets in different fields. There are several methods for estimating the parameters of a linear regression model, and these methods usually assume normally distributed and uncorrelated errors. If the error terms are correlated, the Conditional Maximum Likelihood (CML) estimation method under a normality assumption is often used to estimate the parameters of interest. The CML method requires a distributional assumption on the error terms; in practice, however, such an assumption may not be plausible. In this paper, we propose to estimate the parameters of a linear regression model with autoregressive error terms using the Empirical Likelihood (EL) method, which is distribution free. A small simulation study is provided to evaluate the performance of the proposed estimation method against the CML method. The results of the simulation study show that the proposed EL-based estimators are remarkably better than the CML estimators in terms of mean squared error (MSE) and bias in almost all simulation configurations. These findings are also confirmed by the results of the numerical and real data examples.

Keywords: AR(p) error terms, dependent error, empirical likelihood, linear regression

1. Introduction

Consider the following linear regression model

$y_t = x_t^T \beta + \varepsilon_t, \qquad t = 1, 2, \ldots, N \qquad (1)$

where $y_t$ is the $t$-th response variable, $x_t \in \mathbb{R}^M$ is the design vector, $\beta \in \mathbb{R}^M$ is the unknown $M$-dimensional parameter vector, and the $\varepsilon_t$ are uncorrelated errors with $E(\varepsilon_t) = 0$ and $\mathrm{Var}(\varepsilon_t) = \sigma^2$.

It is known that the least squares (LS) estimators of the regression parameters are obtained by minimizing the sum of squared residuals or, equivalently, by solving the following estimating equation

$\frac{1}{N}\sum_{t=1}^{N}(y_t - x_t^T\beta)\,x_t = 0. \qquad (2)$

The LS estimators are the minimum variance unbiased estimators of $\beta$ if the $\varepsilon_t$ are normally distributed. However, in real data applications the normality assumption may not be satisfied. If a normally distributed error term is not a reasonable assumption for a regression model, an alternative error distribution can be assumed and the maximum likelihood (ML) method applied to estimate the regression and other model parameters. On the other hand, if an error distribution is not easy to specify, distribution-free estimation methods should be preferred to obtain estimators for the parameters of interest. One such method is the EL method introduced by Owen [5–7]. A noticeable advantage of the EL method is that it enables likelihood-type inference without specifying any distributional model for the data. The EL method assumes unknown probability weights $\pi_t$, $t = 1, 2, \ldots, N$, one for each observation, and estimates these weights by maximizing an EL function, defined as the product of the $\pi_t$s, under constraints relating the $\pi_t$s to the regression parameters. The EL method can be stated mathematically as follows. Maximize the EL function

$L(\beta) = \prod_{t=1}^{N} \pi_t \qquad (3)$

under the constraints

$\sum_{t=1}^{N} \pi_t = 1 \qquad (4)$
$\sum_{t=1}^{N} \pi_t\, x_t (y_t - x_t^T\beta) = 0. \qquad (5)$

Note that the constraint in Equation (5) is similar to the estimating equation in Equation (2). The only difference is that Equation (2) uses the equal, known weight $1/N$ for each observation, whereas Equation (5) uses the unknown probability weights $\pi_t$, allowing each observation a different contribution to the estimation procedure. Further, if we also want to estimate the error variance along with the regression parameters, we add the following constraint to those in Equations (4)–(5).

$\sum_{t=1}^{N} \pi_t \left[(y_t - x_t^T\beta)^2 - \sigma^2\right] = 0. \qquad (6)$

The $\pi_t$s, and hence the model parameters $\beta$ and $\sigma^2$, can be estimated by maximizing the EL function in Equation (3) under the constraints (4)–(6). In general, the method of Lagrange multipliers can be used for this type of constrained optimization problem. For our problem the Lagrange function is

$\mathcal{L}(\pi,\beta,\lambda_0,\lambda_1,\lambda_2) = \sum_{t=1}^{N}\log(N\pi_t) + \lambda_0\left(\sum_{t=1}^{N}\pi_t - 1\right) + \lambda_1^T\sum_{t=1}^{N}\pi_t\, x_t(y_t - x_t^T\beta) + \lambda_2\sum_{t=1}^{N}\pi_t\left[(y_t - x_t^T\beta)^2 - \sigma^2\right]$

where $\pi = [\pi_1, \pi_2, \ldots, \pi_N]^T$, $\lambda_0, \lambda_2 \in \mathbb{R}$ and $\lambda_1 \in \mathbb{R}^M$ are Lagrange multipliers. Taking the derivative of the Lagrange function with respect to each $\pi_t$ and setting it to zero, we get

$\pi_t = \frac{1}{N + \lambda_1^T x_t(y_t - x_t^T\beta) + \lambda_2\left[(y_t - x_t^T\beta)^2 - \sigma^2\right]}. \qquad (7)$

Substituting $\pi_t$ from Equation (7) into the EL function and the constraints reduces the optimization problem to finding $\beta$, $\sigma$ and the Lagrange multipliers. However, this problem is still not easy to handle. Its solution has been considered by several authors using different approaches; for details of the suggested algorithms, see [5–10,18].
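
To make the profiling concrete, the following R sketch estimates $\beta$ for the uncorrelated-error model by maximizing the EL under constraints (4)–(5) only (so $\sigma^2$ is left out). It uses the `emplik` package for the inner optimization over the Lagrange multipliers; the simulated data, seed and variable names are illustrative, not from the paper.

```r
# A minimal sketch of profile EL estimation for model (1), assuming the
# 'emplik' package; rows of Z below are the estimating functions x_t(y_t - x_t'beta).
library(emplik)

set.seed(1)
N <- 100; M <- 2
X <- cbind(1, rnorm(N))                 # illustrative design with an intercept
y <- drop(X %*% c(1, 3) + rnorm(N))     # illustrative response

# -2 log EL ratio at a candidate beta; el.test() maximizes prod(pi_t)
# subject to sum_t pi_t = 1 and sum_t pi_t Z_t = 0.
neg2_log_el <- function(beta) {
  Z <- X * drop(y - X %*% beta)
  emplik::el.test(Z, mu = rep(0, M))$`-2LLR`
}

# The EL estimate maximizes the profile EL, i.e. minimizes the -2 log EL
# ratio; the LS fit is a natural starting value.
beta_el <- optim(qr.solve(X, y), neg2_log_el)$par
```

Profiling out the multipliers in this way mirrors the structure used for the AR(p) case in Section 2.2.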

It should be noted that all of the papers mentioned above consider the EL method for estimating the parameters of a regression model with uncorrelated error terms. However, in practice, the uncorrelated-error assumption may not be plausible for some data sets. For such data sets, regression analysis should be carried out with a regression model with autoregressive (AR(p)) error terms, defined as follows.

$y_t = \sum_{i=1}^{M} x_{t,i}\beta_i + e_t, \qquad t = 1, 2, \ldots, N, \qquad (8)$

where $y_t$ is the response variable, the $x_{t,i}$ are the predictor variables, the $\beta_i$ are the unknown regression parameters and $e_t$ is the AR(p) error term with

$e_t = \phi_1 e_{t-1} + \cdots + \phi_p e_{t-p} + a_t. \qquad (9)$

Here $\phi_j$, $j = 1, 2, \ldots, p$, are the unknown autoregressive parameters and $a_t$, $t = 1, 2, \ldots, N$, are i.i.d. random variables with $E(a_t) = 0$ and $\mathrm{Var}(a_t) = \sigma^2$. Note that this regression equation differs from the regression equation given in (1). However, using the back-shift operator $B$, it can be transformed into the usual regression equation as follows. Let

$a_t = \Phi(B)e_t = e_t - \phi_1 e_{t-1} - \cdots - \phi_p e_{t-p}, \qquad \Phi(B)y_t = y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}, \qquad (10)$

and

$\Phi(B)x_{t,i} = x_{t,i} - \phi_1 x_{t-1,i} - \cdots - \phi_p x_{t-p,i}. \qquad (11)$

Then, the regression model given in (8) can be rewritten as

$\Phi(B)y_t = \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i} + a_t, \qquad t = 1, 2, \ldots, N. \qquad (12)$
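
In code, the filtering in (10)–(11) is a one-line helper. The R sketch below (the helper name `phiB` is ours, not the paper's) returns the filtered series for $t = p+1, \ldots, N$; the first $p$ values are lost to the lags.

```r
# Back-shift filtering: z_t - phi_1 z_{t-1} - ... - phi_p z_{t-p}, t = p+1,...,N.
phiB <- function(z, phi) {
  p <- length(phi); N <- length(z)
  sapply((p + 1):N, function(t) z[t] - sum(phi * z[t - (1:p)]))
}

# Applying phiB to y and to each column of the design matrix X turns
# model (8)-(9) into the uncorrelated-error regression (12):
#   y_f <- phiB(y, phi);  X_f <- apply(X, 2, phiB, phi = phi)
```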

In the literature, the parameters of a regression model with autoregressive error terms are estimated using LS, ML or CML methods; some of the related papers are [1,3,15,16] and [17]. All of these papers assume a known distribution, such as the normal or t, for the error term in order to carry out estimation of the parameters of interest in this model. However, since imposing an appropriate distributional assumption on the error term of a regression model may not be easy, distribution-free estimation methods may be preferred for the regression analysis of a data set. In this study, unlike the papers in the literature, we do not assume any distribution for the error terms and propose to use the EL method to estimate the parameters of the linear regression model described in the previous paragraph. A brief outline of this proposal was given in [11] under the assumption that the variance of the error terms is known.

The rest of the paper is organized as follows. In Section 2, the CML and EL estimation methods for linear models with AR(p) error terms are given. In Section 3, a small simulation study, a numerical example and a real data example are provided to assess the performance of the EL-based estimators relative to the estimators obtained from the classical CML method. Finally, we draw some conclusions in Section 4.

2. Parameter estimation for linear regression models with AR(p) error terms

In this section, we describe in detail how the EL method is used to estimate the parameters of a regression model with autoregressive error terms. We first recall the CML method under the normality assumption on $a_t$. Note that since the exact likelihood function can be well approximated by the conditional likelihood function [2], the CML method is often used in cases where ML estimation is not feasible.

2.1. Conditional maximum likelihood estimation

In general, a system of nonlinear equations in the parameters has to be solved to obtain the ML estimators. However, since in most cases the ML estimators cannot be obtained analytically, numerical procedures must be used to compute the estimates. An alternative to numerical maximization of the exact likelihood function is to regard the values of the first $p$ observations as known and to maximize the likelihood function conditioned on them. In this part, the CML method is considered for the regression model given in (8).

Let the error terms $a_t$ in the regression model given in Equation (12) have a normal distribution with zero mean and variance $\sigma^2$. Then, the conditional log-likelihood function is

$\ln L = c - \frac{N-p}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=p+1}^{N}\left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)^2 \qquad (13)$

[1]. Taking the derivatives of the conditional log-likelihood function with respect to the unknown parameters, setting them to zero and rearranging the resulting equations yields the following estimating equations for the unknown parameters of the regression model under consideration

$\tilde{\beta} = \left[\tilde{\Phi}(B)X^T\,\tilde{\Phi}(B)X\right]^{-1}\left[\tilde{\Phi}(B)X^T\,\tilde{\Phi}(B)Y\right] \qquad (14)$
$\tilde{\sigma}^2 = \frac{1}{N-p}\left[\tilde{\Phi}(B)Y - \tilde{\Phi}(B)X\tilde{\beta}\right]^T\left[\tilde{\Phi}(B)Y - \tilde{\Phi}(B)X\tilde{\beta}\right] \qquad (15)$
$\tilde{\Phi} = R^{-1}(\tilde{\beta})\,R_0(\tilde{\beta}) \qquad (16)$

where $\tilde{\Phi}(B)X = [\tilde{\Phi}(B)x_{t,i}]$, $\tilde{\Phi}(B)Y = [\tilde{\Phi}(B)y_t]$,

$R(\tilde{\beta}) = \begin{bmatrix} \sum_{t=p+1}^{N} e_{t-1}^2 & \sum_{t=p+1}^{N} e_{t-1}e_{t-2} & \cdots & \sum_{t=p+1}^{N} e_{t-1}e_{t-p} \\ & \sum_{t=p+1}^{N} e_{t-2}^2 & \cdots & \sum_{t=p+1}^{N} e_{t-2}e_{t-p} \\ & & \ddots & \vdots \\ & & & \sum_{t=p+1}^{N} e_{t-p}^2 \end{bmatrix}$

(shown with its upper triangle; the matrix is symmetric) and $R_0(\tilde{\beta}) = \left[\sum_{t=p+1}^{N} e_t e_{t-1},\ \sum_{t=p+1}^{N} e_t e_{t-2},\ \ldots,\ \sum_{t=p+1}^{N} e_t e_{t-p}\right]^T$.

However, since these estimating equations cannot be solved explicitly, numerical methods must be used to compute the estimates. The estimating equations themselves suggest a simple iterative reweighting algorithm (IRA) for computing the estimates [1,16], as sketched below.
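
As an illustration, the R sketch below implements one such alternating scheme, reusing the `phiB` helper above. The function name, starting values and convergence defaults are our own choices; the updates follow Equations (14)–(16).

```r
# Alternating scheme for the CML estimates:
# update beta by (14) and phi by (16) in turn, then compute sigma^2 by (15).
cml_ira <- function(y, X, p, tol = 1e-8, maxit = 200) {
  N <- length(y)
  beta <- qr.solve(X, y)                       # LS start for beta
  phi  <- rep(0, p)
  for (it in 1:maxit) {
    yf <- phiB(y, phi)                         # filtered response
    Xf <- apply(X, 2, phiB, phi = phi)         # filtered design
    beta_new <- qr.solve(Xf, yf)               # equation (14)
    e <- drop(y - X %*% beta_new)              # raw residuals e_t
    E <- sapply(1:p, function(j) e[(p + 1 - j):(N - j)])   # e_{t-1},...,e_{t-p}
    phi_new <- drop(solve(crossprod(E), crossprod(E, e[(p + 1):N])))  # (16)
    done <- max(abs(c(beta_new - beta, phi_new - phi))) < tol
    beta <- beta_new; phi <- phi_new
    if (done) break
  }
  a <- phiB(y, phi) - drop(apply(X, 2, phiB, phi = phi) %*% beta)
  list(beta = drop(beta), phi = phi, sigma2 = sum(a^2) / (N - p))    # (15)
}
```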

2.2. Empirical likelihood estimation

In this section we consider the EL method for estimating the unknown parameters of the regression model given in Equation (12). The required constraints on the parameters are formed similarly to the EL estimation used in the classical regression case (the uncorrelated-error model). Since we adopt the conditional approach, we again assume that the first $p$ observations are known and form the conditional empirical likelihood (CEL) function using unknown probability weights $\pi_t$ for the observations $t = p+1, \ldots, N$. It should be noted that in the CML case the $a_t$ are assumed to be normally distributed, whereas in the CEL case we do not need to assume any specific distribution for $a_t$. The CEL estimation procedure can be formulated as follows.

Let $\pi_t$, $t = p+1, \ldots, N$, be the unknown probability weights for the observations. Then, maximize the following CEL function

$\max_{\pi_t \in (0,1)}\ \sum_{t=p+1}^{N} \log(\pi_t) \qquad (17)$

under the constraints

$\sum_{t=p+1}^{N} \pi_t = 1 \qquad (18)$
$\sum_{t=p+1}^{N} \pi_t \left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)\Phi(B)x_t = 0 \qquad (19)$
$\sum_{t=p+1}^{N} \pi_t \left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)e_{t,p} = 0 \qquad (20)$
$\sum_{t=p+1}^{N} \pi_t \left[\left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)^2 - \sigma^2\right] = 0 \qquad (21)$

to obtain CEL estimators for the parameters of the regression model given in Equation (12). Here,

$\Phi(B)x_t = \left[\Phi(B)x_{t,1}, \Phi(B)x_{t,2}, \ldots, \Phi(B)x_{t,M}\right]^T$ and $e_{t,p} = \left[e_{t-1}, e_{t-2}, \ldots, e_{t-p}\right]^T$.

Since this is a constrained optimization problem, the Lagrange multiplier method can be used to solve it. To this end, let $\lambda_0$, $\lambda = (\lambda_1^T, \lambda_2^T, \lambda_3)$, $\phi$ and $\pi$ denote the Lagrange multipliers, the vector of $\phi_i$, $i = 1, 2, \ldots, p$, and the vector of $\pi_t$, $t = p+1, \ldots, N$, respectively. Then, the Lagrange function for this optimization problem can be written as

$\mathcal{L}(\beta,\phi,\sigma^2,\pi,\lambda,\lambda_0) = \sum_{t=p+1}^{N}\log(\pi_t) + \lambda_0\left(\sum_{t=p+1}^{N}\pi_t - 1\right) + \lambda_1^T\left(\sum_{t=p+1}^{N}\pi_t\left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)\Phi(B)x_t\right) + \lambda_2^T\left(\sum_{t=p+1}^{N}\pi_t\left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)e_{t,p}\right) + \lambda_3\left(\sum_{t=p+1}^{N}\pi_t\left[\left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)^2 - \sigma^2\right]\right)$

Taking the derivatives of $\mathcal{L}(\beta,\phi,\sigma^2,\pi,\lambda,\lambda_0)$ with respect to $\pi_t$ and the Lagrange multipliers, and setting the resulting equations to zero, we obtain the first-order conditions of this optimization problem. Solving the first-order conditions for $\pi_t$ yields

$\pi_t = \frac{1}{(N-p) + \lambda_1^T\Psi_{1,t} + \lambda_2^T\Psi_{2,t} + \lambda_3\Psi_{3,t}}, \qquad t = p+1, \ldots, N \qquad (22)$

where

$\Psi_{1,t} = \left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)\Phi(B)x_t,$
$\Psi_{2,t} = \left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)e_{t,p},$
$\Psi_{3,t} = \left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)^2 - \sigma^2.$

Substituting these values of πt into Equation (17) we find that

$l(\lambda,\beta,\phi,\sigma^2) = \sum_{t=p+1}^{N}\log\pi_t = -\sum_{t=p+1}^{N}\log\left((N-p) + \lambda_1^T\Psi_{1,t} + \lambda_2^T\Psi_{2,t} + \lambda_3\Psi_{3,t}\right) \qquad (23)$

Since $0 < \pi_t < 1$, the EL method maximizes this function over the set where $(N-p) + \lambda_1^T\Psi_{1,t} + \lambda_2^T\Psi_{2,t} + \lambda_3\Psi_{3,t} > 1$. The rest of this optimization problem is carried out as follows.

For given β, ϕ and σ2 minimize the function l(λ,β,ϕ,σ2) given in equation (23) with respect to the Lagrange multipliers λ=(λ1,λ2,λ3). That is, for given β, ϕ and σ2 solve the following minimization problem to get the values of Lagrange multipliers

$\lambda(\beta,\phi,\sigma^2) = \arg\min_{\lambda}\ l(\lambda,\beta,\phi,\sigma^2).$

Since the solution of this minimization problem cannot be obtained explicitly, numerical methods must be used. Substituting the solution into $l(\lambda,\beta,\phi,\sigma^2)$ yields the function $l(\lambda(\beta,\phi,\sigma^2),\beta,\phi,\sigma^2)$, which depends only on the model parameters $\beta$, $\phi$ and $\sigma^2$. This function can be regarded as a profile conditional empirical log-likelihood function. The CEL estimators $\hat{\beta}$, $\hat{\phi}$ and $\hat{\sigma}^2$ are obtained by maximizing $l(\lambda(\beta,\phi,\sigma^2),\beta,\phi,\sigma^2)$ with respect to the model parameters. That is, the CEL estimators are the solutions of the following maximization problem

$(\hat{\beta},\hat{\phi},\hat{\sigma}^2) = \arg\max_{\beta,\phi,\sigma^2}\ l(\lambda(\beta,\phi,\sigma^2),\beta,\phi,\sigma^2).$

Since there is no explicit solution to this maximization problem, numerical methods must be used to obtain the CEL estimators $\hat{\beta}$, $\hat{\phi}$ and $\hat{\sigma}^2$. In this paper, we use a Newton-type algorithm to carry out this optimization. The steps of our algorithm are as follows.

Step 0. Set initial values $\lambda^{(0)}$, $\beta^{(0)}$, $\phi^{(0)}$ and $\sigma^{2(0)}$, and fix the stopping tolerance $\epsilon$. The starting value $\lambda^{(0)}$ can be set to the zero vector, but setting each Lagrange multiplier to $n$ yields faster convergence.

Step 1. The function $l(\lambda,\beta^{(0)},\phi^{(0)},\sigma^{2(0)})$ given in Equation (23) is minimized with respect to $\lambda$. This is done by iterating

$\lambda^{(j+1)} = \lambda^{(j)} - \frac{1}{2}\left(l''_{\lambda}\right)^{-1} l'_{\lambda}$

until convergence is satisfied. Here $l'_{\lambda}$ and $l''_{\lambda}$ are the first- and second-order derivatives of $l(\lambda,\beta^{(0)},\phi^{(0)},\sigma^{2(0)})$ with respect to $\lambda$. By doing so we obtain $\lambda^{(m)}$, the value of the Lagrange multipliers computed at the $m$-th step, $m = 1, 2, 3, \ldots$

Step 2. After finding $\lambda^{(m)}$ at Step 1, the function $l(\lambda^{(m)},\beta,\phi,\sigma^2)$ is maximized with respect to $\phi$, $\beta$ and $\sigma^2$ using the following updating equations:

$\phi^{(j+1)} = \phi^{(j)} - \frac{1}{2}\left(l''_{\phi}\right)^{-1} l'_{\phi}, \qquad \beta^{(j+1)} = \beta^{(j)} - \frac{1}{2}\left(l''_{\beta}\right)^{-1} l'_{\beta}, \qquad \sigma^{2(j+1)} = \sigma^{2(j)} - \frac{1}{2}\left(l''_{\sigma^2}\right)^{-1} l'_{\sigma^2}.$

Here $l'_{\beta}$, $l'_{\phi}$ and $l'_{\sigma^2}$ are the first-order derivatives of $l(\lambda^{(m)},\beta,\phi,\sigma^2)$ with respect to $\beta$, $\phi$ and $\sigma^2$, while $l''_{\beta}$, $l''_{\phi}$ and $l''_{\sigma^2}$ denote the corresponding second-order derivatives. At this step we obtain $\beta^{(m)}$, $\phi^{(m)}$ and $\sigma^{2(m)}$, for $m = 1, 2, 3, \ldots$

Step 3. Together, steps 1 and 2 accomplish one iteration. Therefore, after steps 1 and 2 we obtain the values λ(m), β(m), ϕ(m) and σ2(m) computed at m-th iteration. These are current estimates for the parameters and are used as initial values to obtain the estimates in (m+1)-th iteration. Therefore, repeat steps 1 and 2 until

$\left\|\beta^{(m+1)} - \beta^{(m)}\right\| + \left\|\phi^{(m+1)} - \phi^{(m)}\right\| + \left|\sigma^{2(m+1)} - \sigma^{2(m)}\right| < \epsilon$

is satisfied.
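
The sketch below condenses Steps 0–3 in R, reusing `phiB` and `cml_ira` from above. For brevity it replaces the analytic Newton updates with R's general-purpose optimizers, so it illustrates the nested (profile) structure of the algorithm rather than the authors' exact updates; all names are illustrative.

```r
# Profile CEL: inner minimization over lambda (Step 1) nested inside an
# outer maximization over (beta, phi, sigma^2) (Steps 2-3).
cel_fit <- function(y, X, p) {
  N <- length(y); M <- ncol(X)

  # For fixed parameters, build Psi_{1,t}, Psi_{2,t}, Psi_{3,t} and minimize
  # l(lambda) = -sum_t log((N-p) + lambda' Psi_t) over lambda.
  profile_l <- function(theta) {
    beta <- theta[1:M]; phi <- theta[M + (1:p)]; sigma2 <- exp(theta[M + p + 1])
    yf <- phiB(y, phi); Xf <- apply(X, 2, phiB, phi = phi)
    r  <- drop(yf - Xf %*% beta)                     # filtered residuals a_t
    e  <- drop(y - X %*% beta)                       # raw residuals e_t
    E  <- sapply(1:p, function(j) e[(p + 1 - j):(N - j)])
    Psi <- cbind(Xf * r, E * r, r^2 - sigma2)        # constraints (19)-(21)
    inner <- function(lam) {
      d <- (N - p) + drop(Psi %*% lam)
      if (any(d <= 1)) return(1e10)                  # keep 0 < pi_t < 1
      -sum(log(d))                                   # l = sum_t log(pi_t)
    }
    optim(rep(0, ncol(Psi)), inner)$value            # Step 1, numerically
  }

  # Step 0: start from the CML fit; sigma^2 is log-parameterized for positivity.
  th0 <- with(cml_ira(y, X, p), c(beta, phi, log(sigma2)))
  # Steps 2-3: maximize the profile CEL over all model parameters.
  th <- optim(th0, profile_l, control = list(fnscale = -1))$par
  list(beta = th[1:M], phi = th[M + (1:p)], sigma2 = exp(th[M + p + 1]))
}
```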

The performance of the proposed algorithm is evaluated with the help of the simulation study and the numerical example given in the next section.

The asymptotic distribution of the proposed estimator can be given under the following assumptions. Let $g(x,\beta_0) = \sum_{t=p+1}^{N}\pi_t\left(\Phi(B)y_t - \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i}\right)\Phi(B)x_t = 0$ and assume that:

  • $E\left[g(x,\beta_0)\,g^T(x,\beta_0)\right]$ is positive definite.

  • $\partial g(x,\beta)/\partial\beta$ is continuous in a neighborhood of $\beta_0$.

  • $\left\|\partial g(x,\beta)/\partial\beta\right\|$ and $\left\|g(x,\beta)\right\|^3$ are bounded in this neighborhood of $\beta_0$.

  • The rank of $E\left[\partial g(x,\beta_0)/\partial\beta\right]$ is $k$.

  • $\partial^2 g(x,\beta)/\partial\beta\,\partial\beta^T$ is continuous in $\beta$ in a neighborhood of $\beta_0$, and $\left\|\partial^2 g(x,\beta)/\partial\beta\,\partial\beta^T\right\|$ is bounded.

Then, under these assumptions,

$\sqrt{n}\left(\hat{\beta} - \beta_0\right) \xrightarrow{d} N(0, V)$

where

$V = \left[E\left(\frac{\partial g(x,\beta_0)}{\partial\beta}\right)^T \left\{E\left(g(x,\beta_0)\,g(x,\beta_0)^T\right)\right\}^{-1} E\left(\frac{\partial g(x,\beta_0)}{\partial\beta}\right)\right]^{-1}$

[14].

One may prefer to use the observed Fisher information matrix to calculate standard errors or confidence intervals. The observed Fisher information matrix for the CEL can be obtained with the help of $l''_{\beta}$, $l''_{\phi}$ and $l''_{\sigma^2}$, the second-order derivatives mentioned in Step 2 of the algorithm.

3. Simulation study

In this section, we present a small simulation study and a numerical example to illustrate the performance of the empirical likelihood estimators for the regression model with autoregressive error terms relative to the estimators obtained under normally distributed error terms. We use R version 3.4.0 (2017-04-21) [13] to carry out the simulation study and the numerical example.

3.1. Simulation design

We consider the regression model with second-order autoregressive (AR(2)) error terms and $M = 3$ regression parameters. The sample sizes are taken as n = 25, 50 and 100. For each case the explanatory variables $x_t$ are generated from the standard normal distribution ($x_t \sim N(0,1)$). The parameter values are taken as $\beta = (\beta_1, \beta_2, \beta_3) = (1, 3, 5)$, $\sigma^2 = 1$ and $\phi = (\phi_1, \phi_2) = (0.8, -0.2)$. Note that the values of $\phi$ are chosen to guarantee the stationarity of the error terms (AR(2) stationarity requires, in particular, $\phi_1 + \phi_2 < 1$). We consider three different distributions for the error term $a_t$: $N(0,1)$, $0.9N(0,1) + 0.1N(30,1)$ and $0.9N(0,1) + 0.1N(0,10)$; the last two distributions are used to generate outliers in the y-direction. After setting the values of the model parameters $\beta$, $\phi$ and $\sigma^2$ and choosing the distribution of the error term, the values of the response variable are generated using $\Phi(B)y_t = \sum_{i=1}^{M}\beta_i\,\Phi(B)x_{t,i} + a_t$, as sketched below.
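
One replication of this design can be generated as follows (an R sketch; the seed is illustrative, and the innovation line is swapped for the mixture draws in the outlier scenarios).

```r
set.seed(123)                               # illustrative seed
n <- 100; M <- 3; p <- 2
beta <- c(1, 3, 5); phi <- c(0.8, -0.2); sigma2 <- 1

X <- matrix(rnorm(n * M), n, M)             # x_t ~ N(0, 1)
a <- rnorm(n, 0, sqrt(sigma2))              # innovations a_t; for outliers use e.g.
                                            # ifelse(runif(n) < 0.1, rnorm(n, 30, 1), rnorm(n))
e <- numeric(n)                             # AR(2) errors: e_t = phi1 e_{t-1} + phi2 e_{t-2} + a_t
for (t in (p + 1):n) e[t] <- sum(phi * e[t - (1:p)]) + a[t]
y <- drop(X %*% beta) + e                   # responses; equivalent to generating via (12)
```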

We compare the results of the CEL estimators with the results of the CML estimators. Mean squared error (MSE) and bias values are calculated as performance measures to compare the estimators. These values are calculated using the following equations for 1000 replications:

  1. $\mathrm{MSE}(\hat{\beta}) = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat{\beta}_i - \beta\right)^2$, $\ \mathrm{bias}(\hat{\beta}) = \bar{\beta} - \beta$,

  2. $\mathrm{MSE}(\hat{\phi}) = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat{\phi}_i - \phi\right)^2$, $\ \mathrm{bias}(\hat{\phi}) = \bar{\phi} - \phi$,

  3. $\mathrm{MSE}(\hat{\sigma}^2) = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat{\sigma}^2_i - \sigma^2\right)^2$, $\ \mathrm{bias}(\hat{\sigma}^2) = \bar{\sigma}^2 - \sigma^2$,

where $\bar{\beta} = \frac{1}{1000}\sum_{i=1}^{1000}\hat{\beta}_i$, $\bar{\phi} = \frac{1}{1000}\sum_{i=1}^{1000}\hat{\phi}_i$ and $\bar{\sigma}^2 = \frac{1}{1000}\sum_{i=1}^{1000}\hat{\sigma}^2_i$.
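
In code these measures amount to column means over the replication-by-parameter matrix of estimates (a small sketch; `est` and `truth` are illustrative names).

```r
# est: 1000 x K matrix of replicate estimates; truth: length-K vector.
perf <- function(est, truth) {
  list(MSE  = colMeans(sweep(est, 2, truth)^2),   # mean squared error per parameter
       bias = colMeans(est) - truth)              # mean estimate minus true value
}
```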

3.2. Simulation results

In Table 1, the mean estimates, MSE and bias values of the CEL and CML estimators, computed over 1000 replications, are reported for the normally distributed error case (the case without outliers). We observe from these results that both estimators behave similarly for large sample sizes when there are no outliers. On the other hand, for smaller sample sizes the CEL estimators perform better than the CML estimators in terms of MSE. Thus, for small sample sizes the EL-based estimator may be a good alternative to the CML method.

Table 1.

Estimates, MSE and bias values of the estimates for different sample sizes without outliers ($a_t \sim N(0,1)$).

  n=25 n=50 n=100
  Normal Empirical Normal Empirical Normal Empirical  
  β^1 0.98796 1.00138 0.99845 1.00178 0.99932 1.00068
β1 MSE 0.03276 0.0058 0.01445 0.00228 0.00584 0.00074
  BIAS 0.14255 0.05276 0.09522 0.0319 0.0605 0.01832
  β^2 3.00136 3.00697 2.99808 3.00009 2.9978 2.99753
β2 MSE 0.03247 0.00613 0.01346 0.00225 0.00664 0.00071
  BIAS 0.14406 0.05512 0.09181 0.03214 0.06437 0.01797
  β^3 5.00803 5.00139 4.99681 4.99966 4.99913 5.00266
β3 MSE 0.03481 0.00657 0.01421 0.00202 0.00583 0.00069
  BIAS 0.14491 0.05574 0.09389 0.03042 0.06045 0.01789
  ϕ^1 0.76906 0.80372 0.77997 0.8013 0.79587 0.80081
ϕ1 MSE 0.0457 0.00028 0.02118 0.0001 0.01042 0.00004
  BIAS 0.16869 0.01328 0.11537 0.00784 0.08196 0.00465
  ϕ^2 0.2095 0.19714 0.20945 0.199 0.21079 0.20002
ϕ2 MSE 0.04266 0.00029 0.01949 0.0001 0.00951 0.00004
  BIAS 0.16559 0.01394 0.11067 0.00824 0.07779 0.00467
  σ^2 0.88229 1.15221 0.94378 1.1022 0.96983 1.07259
σ2 MSE 0.03511 0.03062 0.01351 0.01454 0.00633 0.00784
  BIAS 0.15368 0.15458 0.09334 0.1022 0.06461 0.07326

Note: True values are $(\beta_1, \beta_2, \beta_3) = (1, 3, 5)$, $(\phi_1, \phi_2) = (0.8, -0.2)$ and $\sigma^2 = 1$.

The simulation results for the outlier cases are reported in Tables 2 and 3. From these tables we observe that, when outliers are introduced into the data, the CML estimators are noticeably influenced by them, with much higher MSE values. On the other hand, the estimators based on empirical likelihood retain their good performance in terms of smaller MSE and bias values. To sum up, the performance of the estimators obtained from the empirical likelihood is superior to that of the estimators obtained from the CML method when outliers are present in the data and/or the sample size is small.

Table 2.

Estimates, MSE and bias values of the estimates for different sample sizes with $a_t \sim 0.9N(0,1) + 0.1N(30,1)$.

    n=25 n=50 n=100
    Normal Empirical Normal Empirical Normal Empirical
  β^1 0.99758 1.0001 1.01599 1.0008 0.9947 1.00067
β1 MSE 0.53329 0.00588 0.11102 0.00189 0.02944 0.00071
  BIAS 0.56527 0.05219 0.26139 0.02957 0.13598 0.01776
  β^2 2.97227 3.00938 2.98684 2.9988 2.99961 3.00186
β2 MSE 0.51638 0.00659 0.10983 0.00222 0.02613 0.00068
  BIAS 0.5623 0.05636 0.26129 0.03185 0.12822 0.01766
  β^3 4.97749 5.00772 5.00388 5.00511 4.99569 5.00152
β3 MSE 0.50668 0.00559 0.10949 0.00218 0.0282 0.00072
  BIAS 0.55812 0.05204 0.2574 0.03174 0.13337 0.01798
  ϕ^1 1.77218 0.80245 1.7009 0.80143 1.63445 0.80104
ϕ1 MSE 0.9494 0.00029 0.81299 0.00009 0.69691 0.00003
  BIAS 0.97218 0.0136 0.9009 0.00746 0.83445 0.00446
  ϕ^2 0.94992 0.19518 0.75483 0.19879 0.64632 0.20002
ϕ2 MSE 0.58956 0.00033 0.31046 0.00011 0.19993 0.00004
  BIAS 0.74992 0.01471 0.55483 0.0083 0.44632 0.00462
  σ^2 5.95214 1.13923 4.42534 1.10108 3.30909 1.06915
σ2 MSE 24.68756 0.02581 11.7763 0.0142 5.34667 0.00688
  BIAS 4.95214 0.1419 3.42534 0.10183 2.30909 0.06921

Note: True values are $(\beta_1, \beta_2, \beta_3) = (1, 3, 5)$, $(\phi_1, \phi_2) = (0.8, -0.2)$ and $\sigma^2 = 1$.

Table 3.

Estimates, MSE and bias values of the estimates for different sample sizes with $a_t \sim 0.9N(0,1) + 0.1N(0,10)$.

    n=25 n=50 n=100
    Normal Empirical Normal Empirical Normal Empirical
  β^1 1.00012 1.00766 0.9878 0.99887 0.99299 1.00153
β1 MSE 0.21859 0.00777 0.11943 0.00276 0.06829 0.00076
  BIAS 0.33632 0.05886 0.26183 0.034 0.20502 0.01859
  β^2 2.99806 2.9978 2.98896 3.00449 2.98302 3.00347
β2 MSE 0.23989 0.0086 0.13169 0.00265 0.06363 0.00083
  BIAS 0.34908 0.06087 0.2692 0.03391 0.19512 0.01922
  β^3 5.01594 5.00257 4.99975 5.00495 4.99407 4.99616
β3 MSE 0.22748 0.0079 0.12487 0.00273 0.06541 0.00083
  BIAS 0.3443 0.05964 0.26638 0.03409 0.19757 0.01894
  ϕ^1 0.79014 0.80161 0.73106 0.80076 0.7298 0.80066
ϕ1 MSE 0.43476 0.00022 0.18784 0.00009 0.08888 0.00004
  BIAS 0.51506 0.01213 0.34467 0.00757 0.23919 0.00471
  ϕ^2 0.46927 0.19645 0.26362 0.19819 0.21955 0.19966
ϕ2 MSE 0.7917 0.00031 0.23481 0.0001 0.08613 0.00004
  BIAS 0.70074 0.01421 0.37436 0.00786 0.23868 0.00473
  σ^2 2.65719 1.14735 2.70776 1.09725 2.94881 1.06939
σ2 MSE 3.79346 0.02846 3.59029 0.01368 4.16932 0.00702
  BIAS 1.65951 0.14943 1.70791 0.09889 1.94881 0.0697

Note: True values are $(\beta_1, \beta_2, \beta_3) = (1, 3, 5)$, $(\phi_1, \phi_2) = (0.8, -0.2)$ and $\sigma^2 = 1$.

3.3. Numerical example

Montgomery et al. [4] give a soft drink data example relating annual regional advertising expenses to annual regional concentrate sales for a soft drink company over 20 years. After calculating the LS residuals for the linear regression model, the assumption of uncorrelated errors can be tested using the Durbin–Watson statistic, a test statistic for detecting positive autocorrelation. The critical values of the Durbin–Watson statistic are dL = 1.20 and dU = 1.41 for one explanatory variable and 20 observations. The calculated value of the Durbin–Watson statistic is d = 1.08, which is less than dL = 1.20. Therefore, the errors in the regression model can be said to be positively autocorrelated.
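
The Durbin–Watson statistic itself is a simple function of successive LS residuals; a short R sketch (the data frame and column names are hypothetical):

```r
fit <- lm(sales ~ expenses, data = softdrink)  # hypothetical soft drink data frame
e <- resid(fit)
d <- sum(diff(e)^2) / sum(e^2)                 # Durbin-Watson statistic
d < 1.20                                       # d below dL = 1.20 => positive autocorrelation
# Equivalently, lmtest::dwtest(fit) returns the statistic with a p-value.
```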

We consider a linear regression model with AR(p) error terms and find the LS estimates for the model parameters. Then, we use the LS estimates as initial values to run the algorithm computing the CEL and CML estimates. The parameter estimates are given in Table 4. We observe from these results that the CEL and CML methods give similar estimates for the regression and autoregressive parameters, but the CEL estimate of the error variance is smaller than the CML estimate. Figure 1(a) shows the scatter plot of the data set with the fitted regression lines obtained from the CEL and CML methods; the fitted lines nearly coincide. To further explore the behavior of the estimators in the presence of outliers, we replace the last observation with an outlier. For the new data set (the same data set with one outlier) we again compute the CEL and CML estimates for all the parameters; these estimates are also provided in Table 4. Figure 1(b) depicts the scatter plot of the data and the fitted lines obtained from the CEL and CML estimates. We observe that, unlike the fitted line obtained from the CEL method, the fitted line obtained from the CML method is badly influenced by the outlier.

Table 4.

The parameter estimates for soft drink data.

  Without outlier With one outlier
  Empirical Normal Empirical Normal
β^0 1593.95911 1645.37912 1617.57068 560.9601
β^1 20.07335 19.80988 19.93477 30.7569
ϕ^1 0.89579 0.56856 0.89814 2.5689
σ^2 6.43127 16.96544 2.44785 2025.174

Figure 1. Scatter plot of the soft drink data and the CEL and CML fits: (a) without outlier, (b) with one outlier.

3.4. Real data example

We use a real data set consisting of CO2 emission (metric kgs per capita) and energy usage (kg of oil equivalent per capita) for Turkey between the years 1974 and 2014. The data set can be obtained from https://data.worldbank.org. The scatter plot in Figure 2(a) shows a linear relationship between the logarithms of CO2 emission and energy usage. If the error terms are calculated using the LS estimates (−15.06489, 1.16258), it can be seen that the data have an AR(1) structure with autocorrelation coefficient $\hat{\phi} = 0.65$, since the Durbin–Watson test statistic and its p-value are 0.662571 and 1.235e-07, respectively. The ACF (autocorrelation function) and PACF (partial ACF) plots in Figure 3 support the same conclusion, and the Q-Q plot of the residuals in Figure 3 suggests that the assumption of normally distributed error terms is satisfied. A sketch of these diagnostics is given below.
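
These diagnostics take a few lines in R (a sketch; the variable names for the two series are hypothetical):

```r
fit <- lm(log(co2) ~ log(energy))   # LS fit on the log-log scale
e <- resid(fit)
sum(diff(e)^2) / sum(e^2)           # Durbin-Watson statistic (about 0.66 here)
acf(e); pacf(e)                     # AR(1) signature: geometric ACF decay, PACF cutoff at lag 1
qqnorm(e); qqline(e)                # normality check for the residuals
```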

Figure 2. Scatter plot of the CO2 emission–energy usage data and the CEL and CML fits: (a) original data, (b) data with one outlier.

Figure 3. ACF, PACF and Q-Q plots of the residuals for the CO2 emission–energy usage data.

For this data set, we use the CML and CEL methods to estimate the regression parameters, the autocorrelation structure and the error variance. Note that the LS estimates from the original data are used as initial values to compute the CML and CEL estimates given in Table 5. The fitted regression lines, along with the scatter plot of the original data, are given in Figure 2(a); both methods give similar results.

Table 5.

The parameter estimates for CO2 emission–energy usage data.

  Without outlier With one outlier
  Empirical Normal Empirical Normal
β^0 15.389316 13.827491 13.891223 25.869536
β^1 1.187502 1.074341 1.076084 1.952316
ϕ^1 0.655439 0.861551 0.511249 −0.201119
σ^2 0.824727 0.021447 1.148087 0.900177

To evaluate the performance of the CEL method when there are outliers in the data, or a departure from normality, we create an artificial outlier by multiplying the last observation by five, and use both estimation methods to estimate the model parameters. The estimates are provided in Table 5 and the fitted lines obtained from both methods are shown in Figure 2(b). We can easily see from Figure 2(b) that the CEL method fits the data with the outlier better than the CML method does. Although the real autocorrelation is positive, the CML estimate of the autocorrelation is negative because of the outlier effect. In this case the LS estimates of the regression parameters are (−27.4952, 2.0717) and the estimated autocorrelation is −0.0107.

4. Conclusion

In the literature, the CML or LS methods are often adopted to estimate the parameters of a regression model with autoregressive error terms. The CML method carries out the estimation under a distributional assumption on the error term. However, for some data sets it may not be plausible to impose a distributional assumption on the error term, due to the lack of information about the data. For those data sets, distribution-free methods should be preferred for the regression analysis. One such distribution-free method is the EL method, which can also be used for small sample sizes. In this paper, we have used the EL method to estimate the parameters of a regression model with autoregressive error terms. We have defined a CEL function and constructed the necessary constraints using probability weights and the normal equations borrowed from classical LS estimation. To evaluate and compare the performance of the proposed method with the CML method, we have provided a simulation study and examples, comparing the results in terms of MSE and bias. We have designed two simulation scenarios: data with outliers and data without outliers. The results of the simulation study and the numerical and real data examples have demonstrated that the CEL method can perform better than the CML method when there are outliers in the data set, which represents a deviation from the normality assumption; otherwise the two methods have similar performance. For further studies, an ARMA extension and constraints derived from the t-distribution for the CEL method will be considered.

Acknowledgements

The authors thank the anonymous reviewers and editors for their careful reading of this article and for their suggestions.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Alpuim T. and El-Shaarawi A., On the efficiency of regression analysis with AR(p) errors, J. Appl. Stat. 35 (2008), pp. 717–737.
  • 2. Ansley C.F., An algorithm for the exact likelihood of a mixed autoregressive-moving average process, Biometrika 66 (1979), pp. 59–65.
  • 3. Beach C.M. and MacKinnon J.G., A maximum likelihood procedure for regression with autocorrelated errors, Econometrica 46 (1978), pp. 51–58.
  • 4. Montgomery D.C., Peck E.A. and Vining G.G., Introduction to Linear Regression Analysis, Vol. 821, John Wiley & Sons, Hoboken, 2012.
  • 5. Owen A.B., Empirical likelihood ratio confidence intervals for a single functional, Biometrika 75 (1988), pp. 237–249.
  • 6. Owen A.B., Empirical likelihood confidence regions, Ann. Statist. 18 (1990), pp. 90–120.
  • 7. Owen A.B., Empirical likelihood for linear models, Ann. Statist. 19 (1991), pp. 1725–1747.
  • 8. Owen A.B., Empirical Likelihood, Chapman and Hall, New York, 2001.
  • 9. Owen A.B., Self-concordance for empirical likelihood, Can. J. Stat. 41 (2013), pp. 387–397.
  • 10. Özdemir Ş. and Arslan O., Combining empirical likelihood and robust estimation methods for linear regression models, Commun. Stat. Simul. Comput. (2019). Available at https://www.tandfonline.com/doi/full/10.1080/03610918.2019.1659968.
  • 11. Özdemir Ş., Güney Y., Tuaç Y. and Arslan O., Empirical likelihood estimation for linear regression models with AR(p) error term, 31st European Meeting of Statisticians (EMS 2017), Helsinki, July 2017. Available at http://ems2017.helsinki.fi/main.pdf.
  • 12. Özdemir Ş., Güney Y., Tuaç Y. and Arslan O., Empirical likelihood estimation for linear regression models with AR(p) error terms, preprint (2020). Available at https://arxiv.org/abs/2008.03282.
  • 13. R Core Team, R: A language and environment for statistical computing, 2017. Available at https://www.R-project.org/.
  • 14. Qin J. and Lawless J., Empirical likelihood and general estimating equations, Ann. Statist. 22 (1994), pp. 300–325.
  • 15. Tiku M.L., Wong W. and Bian G., Estimating parameters in autoregressive models in nonnormal situations: symmetric innovations, Commun. Stat. Theory Methods 28 (1999), pp. 315–341.
  • 16. Tuaç Y., Güney Y., Şenoğlu B. and Arslan O., Robust parameter estimation of regression model with AR(p) error terms, Commun. Stat. Simul. Comput. 47 (2018), pp. 2343–2359.
  • 17. Tuaç Y., Güney Y. and Arslan O., Parameter estimation of regression model with AR(p) error terms based on skew distributions with EM algorithm, Soft Comput. (2019). Available at https://doi.org/10.1007/s00500-019-04089-x.
  • 18. Yang D. and Small D.S., An R package and a study of methods for computing empirical likelihood, J. Stat. Comput. Simul. 83 (2013), pp. 1363–1372.
