Author manuscript; available in PMC: 2013 Sep 4.
Published in final edited form as: J Am Stat Assoc. 2012 Jan 1;105(491):1104–1112. doi: 10.1198/jasa.2010.tm09307

Least Absolute Relative Error Estimation

Kani CHEN 1, Shaojun GUO 2, Yuanyuan LIN 3,#, Zhiliang YING 4
PMCID: PMC3762514  NIHMSID: NIHMS491922  PMID: 24013644

Abstract

The multiplicative regression model, or accelerated failure time model, which becomes a linear regression model after logarithmic transformation, is useful in analyzing data with positive responses, such as stock prices or lifetimes, that are particularly common in economic/financial or biomedical studies. Least squares and least absolute deviation are among the most widely used criteria in statistical estimation for linear regression models. However, in many practical applications, especially in treating, for example, stock price data, the size of the relative error, rather than that of the error itself, is the central concern of practitioners. This paper offers an alternative to the traditional estimation methods by minimizing the sum of absolute relative errors for multiplicative regression models. We prove consistency and asymptotic normality and provide an inference approach via random weighting. We also specify the error distribution under which the proposed least absolute relative errors estimation is efficient. Supportive evidence is shown in simulation studies. The method is illustrated in an analysis of stock returns on the Hong Kong Stock Exchange.

Keywords: Multiplicative regression model, Logarithm transformation, Relative error, Random weighting

1. INTRODUCTION

The linear regression model is one of the most fundamental statistical models, and the most popular method of estimation, which dates back to Gauss, is the method of least squares (LS); see Gauss (1809) and Stigler (1981). Specifically, consider

$$Y_i = X_i^\top\beta + \varepsilon_i, \quad i = 1, \ldots, n, \tag{1}$$

where $Y_i$ and $X_i$ are, respectively, the response variable and the observable $p$-vector of covariates, $\beta$ is the $p$-vector of regression coefficients including an intercept, and $\varepsilon_i$ is the unobservable error term, independent of $X_i$. The least squares criterion is to minimize the sum of squares of the errors: $\sum_{i=1}^n (Y_i - X_i^\top\beta)^2$. The resulting LS estimator enjoys some important optimality properties, such as being the best linear unbiased estimator. It is efficient when the errors follow a normal distribution. An important alternative to the least squares method is the least absolute deviation (LAD) method, which minimizes the sum of absolute values of the errors: $\sum_{i=1}^n |Y_i - X_i^\top\beta|$. The LAD estimator is more robust than the LS estimator, and its computation and inference are now rather straightforward with the help of linear programming and random weighting. A comprehensive discussion may be found in Portnoy and Koenker (1997). We note that the LS method requires a finite second moment of the errors, while the LAD method requires positivity of the density of the errors at 0.
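The contrast between the two criteria can be seen in the simplest special case of model (1), an intercept-only model, where the LS minimizer is the sample mean and the LAD minimizer is the sample median. The following is a minimal sketch under that assumption; the data values and the function names `ls_loss` and `lad_loss` are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical responses for the intercept-only model Y_i = b + error_i;
# the last value plays the role of an outlier.
y = [1.0, 1.2, 0.9, 1.1, 25.0]

def ls_loss(b):
    """Least squares criterion: sum of squared errors."""
    return sum((v - b) ** 2 for v in y)

def lad_loss(b):
    """Least absolute deviation criterion: sum of absolute errors."""
    return sum(abs(v - b) for v in y)

ls_estimate = mean(y)     # minimizer of ls_loss
lad_estimate = median(y)  # a minimizer of lad_loss

print(ls_estimate, lad_estimate)
```

The single outlier pulls the LS estimate far from the bulk of the data, while the LAD estimate stays with it, which is the robustness property referred to above.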

The above LS and LAD criteria are based on absolute errors. In many practical applications, however, the relative errors, rather than the absolute errors, are of more concern. Narula and Wellington (1977) presented an estimation method based on minimizing the sum of absolute relative errors for the linear model. Makridakis et al (1984) used relative error as a model selection criterion in time series modeling. Khoshgoftaar et al (1992) gave sufficient conditions to ensure the strong consistency of the estimators minimizing the sum of squared relative errors, $\sum_{i=1}^n [\{Y_i - f(X_i,\beta)\}/Y_i]^2$ (RLS, for relative least squares), and minimizing the sum of the absolute relative errors, $\sum_{i=1}^n |\{Y_i - f(X_i,\beta)\}/Y_i|$ (MRE, for minimum relative errors), for the nonlinear regression model $Y_i = f(X_i,\beta)+\varepsilon_i$, where $f(x,\beta)$ is the regression function and $Y_i$, $X_i$, $\beta$, $\varepsilon_i$ are as in model (1). Park and Stefanski (1998) derived a closed form expression for the best mean squared relative error predictor of $Y$ given $X$, where $Y$ is the response variable and $X$ is the predictor variable. These approaches are conceptually appealing and quite easy to implement. Under certain restrictive modeling assumptions, such as parametric ones, Park and Stefanski (1998) and Khoshgoftaar et al (1992) reported some elegant results. However, the theoretical justification of the RLS and MRE methods is in general quite challenging. The consistency and asymptotic normality of RLS and MRE estimators for linear or nonlinear models have not been established under general regularity conditions. Moreover, in all these studies, the relative error is defined as the spread between the target value and the predictor divided by the target value, i.e., the ratio of the error relative to the target. Such a relative error can be quite inadequate when, in particular, the unknown target value is large and the predictor is relatively small. On the other hand, the ratio of the error relative to the predictor can very well be an alternative representation of the relative error. More discussion on the choice of criterion of relative errors is given in Section 2. A similar consideration is seen in an accounting model in Ye (2007).

In the next section, we propose the least absolute relative errors (LARE) criterion for multiplicative models, using both types of relative errors. Since the responses are usually positive when relative error is of concern, the multiplicative model, or accelerated failure time (AFT) model, naturally handles positive responses. In Section 3, a large sample theory including consistency and asymptotic normality is presented along with an inference procedure based on random weighting. Conditions, especially on the error terms, are also specified. In addition, the error distribution under which the LARE estimator is efficient is given. Section 4 contains results of simulation studies. An illustration with a real example is given in Section 5. All proofs are deferred to the Appendix.

2. THE MODEL AND THE LARE CRITERION

Consider the following multiplicative model or accelerated failure time model:

$$Y_i = \exp(X_i^\top\beta)\,\varepsilon_i, \quad i = 1, \ldots, n, \tag{2}$$

which, after a logarithmic transformation, is model (1) with the response and the error there replaced by $\log(Y_i)$ and $\log(\varepsilon_i)$. Such a logarithmic transformation is a reasonable choice in some cases due to its theoretical simplicity. However, a linear relationship in the transformed model is not linear in the original one, and one needs to transform the analysis results back to the original measurement scale.

Observe that the predictor of $Y_i$ with covariate $X_i$ is $\exp(X_i^\top\beta)$. It is intuitively appealing and interpretable to consider the relative error

$$\left|\frac{Y_i - \exp(X_i^\top\beta)}{Y_i}\right| \quad\text{or}\quad \left|\frac{Y_i - \exp(X_i^\top\beta)}{\exp(X_i^\top\beta)}\right|.$$

We note that $|\log(Y_i) - X_i^\top\beta|$ is approximately equal to $|\{Y_i - \exp(X_i^\top\beta)\}/Y_i|$ or $|\{Y_i - \exp(X_i^\top\beta)\}/\exp(X_i^\top\beta)|$ only when the relative error is very small.

Remark 1

A measurement of relative error in terms of the ratio of the error relative to the target value can be inappropriate. Consider, for example, $Y_i$ being large, say 100, and the predictor $\exp(X_i^\top\beta)$ being small, say 10. The relative error so defined, $|\{Y_i - \exp(X_i^\top\beta)\}/Y_i|$, returns a value of 0.9, whilst the alternative $|\{Y_i - \exp(X_i^\top\beta)\}/\exp(X_i^\top\beta)|$ returns 9. The latter, in this case, more properly reflects the inaccuracy of the predictor. The criteria RLS and MRE, which use the former as the relative error, are thus inadequate in this case. Conversely, using only the latter as the relative error can be equally inappropriate when the predictor is large but the response is small. The LARE criterion that we propose below takes into consideration both types of relative errors. We note that the criteria RLS and MRE, if using both types of relative errors, become increasingly difficult to analyze. In particular, the closed form expression of the best mean squared relative error predictor of $Y$ given $X$ would no longer be available.
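The numbers in Remark 1 can be reproduced directly. The short sketch below computes both types of relative error for the values quoted in the remark, and also for a nearly exact prediction, where both are close to the error on the log scale. The function name `relative_errors` and the second pair of values are assumptions made for illustration.

```python
import math

def relative_errors(y, pred):
    """Return (error relative to the target, error relative to the predictor)."""
    err = abs(y - pred)
    return err / y, err / pred

# Values from Remark 1: target 100, predictor 10.
target_based, predictor_based = relative_errors(100.0, 10.0)
print(target_based, predictor_based)

# A hypothetical near-exact prediction: both relative errors are close to
# the absolute error on the log scale, |log(y) - log(pred)|.
small = relative_errors(1.01, 1.0)
log_error = abs(math.log(1.01) - math.log(1.0))
print(small, log_error)
```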

The criterion we propose, called least absolute relative errors (LARE), is to minimize the sum of the absolute relative errors for model (2):

$$\mathrm{LARE}_n(\beta) \equiv \sum_{i=1}^n \left\{ \left|\frac{Y_i - \exp(X_i^\top\beta)}{Y_i}\right| + \left|\frac{Y_i - \exp(X_i^\top\beta)}{\exp(X_i^\top\beta)}\right| \right\}. \tag{3}$$

One advantage of relative errors is that they are scale free, or unit free. This is particularly important when applying the LARE criterion to certain types of data. For example, in regression analysis of a number of stocks, comparison of share prices of different stocks is generally meaningless, especially because of possible share splits or reverse splits. In other words, different stocks have different units which are not well defined. Criteria based on absolute errors are not directly applicable here without accounting for this heterogeneity.

The proposed LARE criterion is based on the sum of the two types of relative errors. There are also several other ways of combining the two types of errors. For example, one might consider the maximum of the two, as in Ye (2007), in which case a theory can be developed in an analogous fashion; see more discussion in Section 6. The minimization of $\mathrm{LARE}_n(\beta)$ can be carried out by conventional numerical tools, such as the Newton-Raphson method, or by programming similar to that of LAD regression, which is now standard practice.
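To make criterion (3) and its scale-free property concrete, here is a minimal sketch, not the authors' implementation. It evaluates the criterion for a general coefficient vector and, for the special case of an intercept-only model, minimizes it by ternary search, which is valid because the criterion is convex (Lemma 1 in the Appendix). The function names, the ternary search and the sample values are all assumptions made for illustration; a full multivariate fit would need a general convex optimizer.

```python
import math

def lare(beta, X, Y):
    """LARE criterion (3): sum of both types of absolute relative error."""
    total = 0.0
    for x, y in zip(X, Y):
        pred = math.exp(sum(b * xj for b, xj in zip(beta, x)))
        total += abs(y - pred) / y + abs(y - pred) / pred
    return total

def fit_intercept_only(Y, tol=1e-10):
    """Minimize LARE for Y_i = exp(b) * eps_i by ternary search on b.

    Each summand is minimized at b = log(Y_i), so the minimizer of the
    convex sum lies between the smallest and largest log(Y_i).
    """
    X = [[1.0]] * len(Y)
    lo, hi = math.log(min(Y)), math.log(max(Y))
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if lare([m1], X, Y) < lare([m2], X, Y):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

prices = [0.5, 1.2, 2.0, 3.5, 0.8]  # hypothetical positive responses
b_hat = fit_intercept_only(prices)
b_hat_scaled = fit_intercept_only([100.0 * y for y in prices])
print(b_hat, b_hat_scaled - math.log(100.0))
```

Rescaling every response by a constant (here 100, as in a change of units) only shifts the fitted intercept by the logarithm of that constant, which is one way to read the "scale free" remark above.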

3. ASYMPTOTIC PROPERTIES

Some notation is needed. Throughout the paper, ∥·∥ is the Euclidean norm and I(·) is the indicator function. For simplicity of presentation, we introduce a generic triple (X, Y, ε) and assume that (Xi, Yi, εi), i ≥ 1, are independent and identically distributed (i.i.d.) copies of (X, Y, ε), where Xi and εi are independent. Let β0 be the true value of β. The following assumptions are needed for the consistency and asymptotic normality of the LARE estimator.

Assumption 1

ε has a continuous density f(·) in a neighborhood of 1.

Assumption 2

P (ε > 0) = 1.

Assumption 3

$X$ is bounded, i.e., $P(\|X\| \le K) = 1$ for some $0 < K < \infty$, and does not concentrate on any hyperplane of dimension $p - 1$.

Assumption 4

$E(\varepsilon + \varepsilon^{-1}) < \infty$ and $E[(\varepsilon + \varepsilon^{-1})\operatorname{sgn}(\varepsilon - 1)] = 0$.

Assumption 5

$E\{(\varepsilon + \varepsilon^{-1})^2\} < \infty$.

Assumptions 1-3 are regularity conditions. In Assumption 4, the first moment condition $E(\varepsilon + \varepsilon^{-1}) < \infty$ ensures the weak consistency of the LARE estimator. The condition $E[(\varepsilon + \varepsilon^{-1})\operatorname{sgn}(\varepsilon - 1)] = 0$ is only an identifiability condition, which plays the same role as the assumptions of zero mean and zero median for the LS and LAD methods, respectively, for linear regression. In fact, as shown in Lemma 2 in the Appendix, if $\varepsilon$ is nondegenerate and satisfies $E(\varepsilon + \varepsilon^{-1}) < \infty$, then there exists a unique scale transformation $\varepsilon_a = a \cdot \varepsilon$ such that $E[(\varepsilon_a + \varepsilon_a^{-1})\operatorname{sgn}(\varepsilon_a - 1)] = 0$. Hence this condition ensures the identifiability of the intercept component of the parameter $\beta$ in model (2). Assumption 5 ensures the asymptotic normality of the LARE estimator, similar to the finite second moment assumption for the LS estimator in linear regression.

Remark 2

The first moment condition $E(\varepsilon + \varepsilon^{-1}) < \infty$ ensures consistency and the second moment condition $E\{(\varepsilon + \varepsilon^{-1})^2\} < \infty$ ensures the asymptotic normality of the LARE estimator, while the RLS estimator in Park and Stefanski (1998) requires the second moment condition $E(\varepsilon^{-2}) < \infty$ for consistency.

Remark 3

These technical conditions may not be the weakest possible ones. They are imposed to facilitate the proofs. Some conditions could be relaxed for general limit theory. Knight (1998) gave a general limit theory for LAD estimation. Correspondingly, we could follow those steps to construct more general limit theory. This leaves space for future research.

Assumption 3 implies that $\sum_{i=1}^n X_i X_i^\top$ is positive definite almost surely. By Lemma 1 in the Appendix, $\mathrm{LARE}_n(\beta)$ is strictly convex in $\beta$ under Assumption 3. Therefore, the minimizer of $\mathrm{LARE}_n(\beta)$, denoted by $\hat\beta_n$, exists and is unique almost surely. The following theorem establishes the consistency and asymptotic normality of $\hat\beta_n$.

Theorem 1

Suppose Assumptions 1-4 hold. Then $\hat\beta_n$ converges to $\beta_0$ in probability as $n \to \infty$. If, in addition to Assumptions 1-4, Assumption 5 holds, then as $n \to \infty$,

$$\sqrt{n}(\hat\beta_n - \beta_0) \to_D N\left(0, \tfrac{1}{4}\{J + 2f(1)\}^{-2} A V^{-1}\right),$$

where $\to_D$ denotes convergence in distribution, $A = E\{(\varepsilon + \varepsilon^{-1})^2\}$, $J = E\{\varepsilon\operatorname{sgn}(\varepsilon - 1)\}$ and $V = E(XX^\top)$.

Remark 4

Note that

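$$2E\{\varepsilon I(\varepsilon > 1)\} > E\{(\varepsilon + \varepsilon^{-1})I(\varepsilon > 1)\} = E\{(\varepsilon + \varepsilon^{-1})I(\varepsilon \le 1)\} > 2E\{\varepsilon I(\varepsilon \le 1)\}$$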

under Assumptions 1 and 4, which ensures J > 0. So the positivity of the density of the error in a neighborhood of 1 is not required here. It is different from the LAD estimation for linear regression models, where the positivity of the density of the error in a neighborhood of zero is essential to ensure the asymptotic normality.
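As a heuristic companion to Theorem 1 (the rigorous argument is in the Appendix), the covariance matrix can be read as a sandwich form. The display below is only a sketch under Assumptions 1-5, with $\psi_n$ and $W_n$ as defined in the Appendix; the symbol $\approx$ hides the remainder terms that the proof controls.

$$
\begin{aligned}
\frac{1}{\sqrt n} W_n = \frac{1}{\sqrt n}\sum_{i=1}^n (\varepsilon_i + \varepsilon_i^{-1})\operatorname{sgn}(\varepsilon_i - 1) X_i &\to_D N(0, A V), \\
\frac{1}{n} E\{\psi_n(\beta) - \psi_n(\beta_0)\} &\approx \{J + 2f(1)\}(\beta - \beta_0)^\top V (\beta - \beta_0), \\
\sqrt n(\hat\beta_n - \beta_0) &\approx \tfrac{1}{2}\{J + 2f(1)\}^{-1} V^{-1} \frac{1}{\sqrt n} W_n, \\
\tfrac{1}{4}\{J + 2f(1)\}^{-2} V^{-1} (A V) V^{-1} &= \tfrac{1}{4}\{J + 2f(1)\}^{-2} A V^{-1}.
\end{aligned}
$$

The first line uses the independence of $X$ and $\varepsilon$ together with Assumption 4 (mean zero) and Assumption 5 (finite variance); the last line is the covariance of the linear approximation in the third line.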

Unlike the least squares estimator, the asymptotic covariance matrix involves the density function of the error terms and cannot be properly estimated using the plug-in rules. To avoid density estimation, we propose a distributional approximation based on random weighting method by externally generating i.i.d. random variables. Let w1,…, wn be a sequence of i.i.d. nonnegative random variables, with mean and variance both equal to 1. For instance, the standard exponential distribution has mean and variance equal to 1. Define

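$$\mathrm{LARE}_n^*(\beta) \equiv \sum_{i=1}^n w_i \left\{ \left|\frac{Y_i - \exp(X_i^\top\beta)}{Y_i}\right| + \left|\frac{Y_i - \exp(X_i^\top\beta)}{\exp(X_i^\top\beta)}\right| \right\},$$

and $\hat\beta_n^* = \arg\min_{\beta \in B_0} \mathrm{LARE}_n^*(\beta)$. The distribution of $\sqrt n(\hat\beta_n - \beta_0)$ can be approximated by the resampling distribution of $\sqrt n(\hat\beta_n^* - \hat\beta_n)$. Let $\mathcal{L}^*$ denote the conditional distribution given $\{(Y_i, X_i), i = 1, \ldots, n\}$.

Proposition 1

Suppose Assumptions 1-5 hold. Then as $n \to \infty$,

$$\mathcal{L}^*\left(\sqrt n(\hat\beta_n^* - \hat\beta_n)\right) \to_D N\left(0, \tfrac{1}{4}\{J + 2f(1)\}^{-2} A V^{-1}\right),$$

which is the asymptotic distribution of $\sqrt n(\hat\beta_n - \beta_0)$, where $J$, $A$ and $V$ are given in Theorem 1.

The proof of Proposition 1 is similar to the proof of Theorem 1 in Chen et al (2008) and is omitted here. The inference procedure via resampling is as follows. First, nonnegative i.i.d. random weights $\{w_1, \ldots, w_n\}$ with mean one and variance one are generated $M$ times, where $M$ is a large number. Each time, $\hat\beta_n^*$ is computed. Denote the resulting values by $b_1, \ldots, b_M$. Then, the distribution of $\sqrt n(\hat\beta_n - \beta_0)$ is approximated by the empirical distribution of $\{\sqrt n(b_i - \hat\beta_n), i = 1, \ldots, M\}$.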
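The resampling steps just described can be sketched in a few lines for the special case of an intercept-only model. This is an illustration under stated assumptions rather than the authors' code: the data are simulated with a hypothetical true intercept of 1 and log-normal errors, the weighted criterion is minimized by ternary search (valid because it is convex), the standard error is taken as the root mean square of $b_i - \hat\beta_n$, and M is kept small for speed although the text calls for a large M. All function names are made up for this sketch.

```python
import math
import random

def weighted_lare(b, Y, w):
    """Randomly weighted LARE criterion for Y_i = exp(b) * eps_i."""
    pred = math.exp(b)
    return sum(wi * (abs(y - pred) / y + abs(y - pred) / pred)
               for y, wi in zip(Y, w))

def argmin_convex(f, lo, hi, tol=1e-7):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def random_weighting(Y, M=100, seed=0):
    """Return the LARE estimate, a resampling standard error, and the draws b_1..b_M."""
    rng = random.Random(seed)
    lo, hi = math.log(min(Y)), math.log(max(Y))
    ones = [1.0] * len(Y)
    beta_hat = argmin_convex(lambda b: weighted_lare(b, Y, ones), lo, hi)
    draws = []
    for _ in range(M):
        # Standard exponential weights: nonnegative, mean 1, variance 1.
        w = [rng.expovariate(1.0) for _ in Y]
        draws.append(argmin_convex(lambda b: weighted_lare(b, Y, w), lo, hi))
    # The spread of b_i around beta_hat mimics that of beta_hat around beta_0.
    se = math.sqrt(sum((b - beta_hat) ** 2 for b in draws) / M)
    return beta_hat, se, draws

data_rng = random.Random(42)
Y = [math.exp(1.0) * data_rng.lognormvariate(0.0, 0.5) for _ in range(100)]
beta_hat, se, draws = random_weighting(Y)
lower, upper = beta_hat - 1.96 * se, beta_hat + 1.96 * se  # normal-approximation interval
print(beta_hat, se, (lower, upper))
```

Using the normal approximation for the interval is a simplification; quantiles of the draws themselves could be used instead, in line with the empirical distribution described above.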

It is known that the variance of an efficient estimator attains the Cramer-Rao lower bound. The least squares estimator and least absolute deviation estimator are efficient when the error terms follow normal distribution and double exponential distribution, respectively. In the following, we give the error distribution with which the LARE estimator is efficient.

Proposition 2

Suppose Assumption 3 holds. If the error ε has a density function as follows:

$$f(x) = c\exp\left(-|1 - x| - |1 - x^{-1}| - \log x\right) I(x > 0),$$

where $c$ is a normalizing constant, then the estimator $\hat\beta_n$ is efficient.

Remark 5

If a random variable X is distributed with density f(x) in Proposition 2, then 1/X is equal in distribution to X.
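Remark 5 and the normalizing constant can be checked numerically. Substituting $t = \log x$ turns the density in Proposition 2 into one proportional to $\exp(-|1 - e^{t}| - |1 - e^{-t}|)$, which is symmetric in $t$; the sketch below integrates that kernel with a simple midpoint rule to approximate $c$ and then compares $f(x)$ with the density of $1/X$, namely $f(1/x)/x^2$. The integration range, step count and function names are assumptions made for illustration.

```python
import math

def log_scale_kernel(t):
    """Unnormalized density of T = log(eps) when eps has the density of Proposition 2."""
    return math.exp(-abs(1.0 - math.exp(t)) - abs(1.0 - math.exp(-t)))

def normalizing_constant(upper=6.0, steps=20000):
    """Approximate c by a midpoint rule on [-upper, upper]; the tails beyond are negligible."""
    h = 2.0 * upper / steps
    total = sum(log_scale_kernel(-upper + (k + 0.5) * h) for k in range(steps))
    return 1.0 / (total * h)

c = normalizing_constant()

def f(x):
    """Density of Proposition 2 with the numerically approximated constant c."""
    if x <= 0.0:
        return 0.0
    return c * math.exp(-abs(1.0 - x) - abs(1.0 - 1.0 / x) - math.log(x))

def density_of_reciprocal(x):
    """Density of 1/X at x when X has density f (change of variables)."""
    return f(1.0 / x) / x ** 2

print(c, f(2.5), density_of_reciprocal(2.5))
```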

4. SIMULATION STUDIES

Simulation studies are conducted to compare the finite sample efficiency of the least squares (LS), the least absolute deviation (LAD), the relative least squares (RLS) in which the predictor is the best mean squared relative error predictor of Y given X and our proposed least absolute relative errors (LARE) estimator. The studies are based on the model

$$Y_i = \exp(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i})\,\varepsilon_i, \quad i = 1, \ldots, n, \tag{4}$$

where X1i and X2i are two independent random variables following the standard normal distribution N(0, 1), and β0, β1 and β2 are the regression parameters. We consider three error distributions: ε follows the distribution with which the LARE estimator is efficient; log(ε) follows Uniform(−2, 2); and log(ε) follows N(0, 1). The sample size n is 200. The variance inference is based on the random weighting and the resampling size N is 500. The simulation results are based on 1000 replications.
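For readers who want to generate data of this kind, the following is one possible sketch of the simulation design, not the authors' code. The uniform and normal cases are drawn directly on the log scale. For the efficient distribution, the sketch uses the fact that $\log(\varepsilon)$ then has density proportional to $\exp(-2\sinh|t|)$ and applies rejection sampling with a Laplace proposal, which is valid because $\sinh|t| \ge |t|$; this sampler and all function names are assumptions introduced here.

```python
import math
import random

def sample_log_eps_efficient(rng):
    """Rejection sampler for T = log(eps) with density proportional to exp(-2*sinh|t|).

    Proposal: |T| ~ Exponential(rate 2), i.e. density proportional to exp(-2|t|).
    Acceptance probability: exp(-2*sinh|t|) / exp(-2|t|), which is at most 1.
    """
    while True:
        t = rng.expovariate(2.0)
        if rng.random() < math.exp(-2.0 * (math.sinh(t) - t)):
            return t if rng.random() < 0.5 else -t

def sample_eps(kind, rng):
    """Draw one error term from one of the three distributions of Section 4."""
    if kind == "efficient":
        return math.exp(sample_log_eps_efficient(rng))
    if kind == "uniform":
        return math.exp(rng.uniform(-2.0, 2.0))
    if kind == "normal":
        return math.exp(rng.gauss(0.0, 1.0))
    raise ValueError(kind)

def simulate(n, beta, kind, seed=0):
    """Generate (X1, X2, Y) triples from model (4) with standard normal covariates."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y = math.exp(beta[0] + beta[1] * x1 + beta[2] * x2) * sample_eps(kind, rng)
        data.append((x1, x2, y))
    return data

sample = simulate(200, (1.0, 1.0, 1.0), "efficient", seed=1)
print(sample[:2])
```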

We obtain the LS and LAD estimators by minimizing $\sum_{i=1}^n (\log Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i})^2$ and $\sum_{i=1}^n |\log Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i}|$, respectively. We obtain the RLS estimators by minimizing $\sum_{i=1}^n [\{Y_i - g_i(X)\}/Y_i]^2$, where $g_i(X) = E(Y_i^{-1} \mid X_{1i}, X_{2i}) / E(Y_i^{-2} \mid X_{1i}, X_{2i})$ for model (4) is the best mean squared relative error predictor proposed in Park and Stefanski (1998).

In Table 4-1, we present the bias of the estimates $\hat\beta_n$ (BIAS), the empirical standard error (SE), the average of the estimated standard errors (SEE) and the coverage probabilities (CP) of 95% confidence intervals based on the resampling. Table 4-2 shows the asymptotic standard errors for $\hat\beta_n$.

Table 4-1.

Comparison among various approaches with β = (1, 1, 1)T

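Error distributions: (i) ε ~ f(·); (ii) log(ε) ~ Unif(−2, 2); (iii) log(ε) ~ N(0, 1).

| Method | Statistic | (i) β̂0 | (i) β̂1 | (i) β̂2 | (ii) β̂0 | (ii) β̂1 | (ii) β̂2 | (iii) β̂0 | (iii) β̂1 | (iii) β̂2 |
|---|---|---|---|---|---|---|---|---|---|---|
| LARE | BIAS | 0.001 | 0.002 | 0.001 | 0.001 | 0.002 | 0.000 | 0.004 | 0.004 | 0.002 |
| LARE | SE | 0.032 | 0.033 | 0.034 | 0.077 | 0.075 | 0.073 | 0.076 | 0.073 | 0.076 |
| LARE | SEE | 0.033 | 0.034 | 0.034 | 0.075 | 0.075 | 0.075 | 0.073 | 0.072 | 0.072 |
| LARE | CP | 0.945 | 0.944 | 0.951 | 0.944 | 0.943 | 0.959 | 0.926 | 0.928 | 0.931 |
| LS | BIAS | 0.001 | 0.002 | 0.001 | 0.001 | 0.002 | 0.000 | 0.004 | 0.003 | 0.002 |
| LS | SE | 0.035 | 0.035 | 0.037 | 0.083 | 0.081 | 0.078 | 0.071 | 0.069 | 0.072 |
| LS | SEE | 0.035 | 0.035 | 0.035 | 0.081 | 0.080 | 0.080 | 0.070 | 0.069 | 0.070 |
| LS | CP | 0.945 | 0.952 | 0.926 | 0.948 | 0.937 | 0.951 | 0.950 | 0.939 | 0.935 |
| LAD | BIAS | 0.001 | 0.002 | 0.001 | 0.001 | 0.004 | 0.001 | 0.004 | 0.003 | 0.001 |
| LAD | SE | 0.033 | 0.034 | 0.034 | 0.143 | 0.140 | 0.135 | 0.090 | 0.085 | 0.090 |
| LAD | SEE | 0.036 | 0.038 | 0.038 | 0.145 | 0.144 | 0.144 | 0.093 | 0.094 | 0.094 |
| LAD | CP | 0.938 | 0.915 | 0.921 | 0.897 | 0.868 | 0.888 | 0.917 | 0.907 | 0.906 |
| RLS | BIAS | 0.145 | 0.010 | 0.003 | 0.268 | 0.004 | 0.002 | 0.071 | 0.001 | 0.003 |
| RLS | SE | 1.231 | 0.269 | 0.180 | 1.663 | 0.253 | 0.243 | 1.414 | 0.286 | 0.283 |
| RLS | SEE | 0.692 | 0.144 | 0.143 | 0.818 | 0.173 | 0.167 | 0.822 | 0.216 | 0.215 |
| RLS | CP | 0.854 | 0.889 | 0.872 | 0.925 | 0.921 | 0.925 | 0.660 | 0.708 | 0.751 |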

Note: $f(x) = c\exp(-|1 - x| - |1 - x^{-1}| - \log x) I(x > 0)$.

Table 4-2.

Asymptotic standard errors for estimators of β

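Error distributions: (i) ε ~ f(·); (ii) log(ε) ~ Unif(−2, 2); (iii) log(ε) ~ N(0, 1).

| Method | (i) β̂0 | (i) β̂1 | (i) β̂2 | (ii) β̂0 | (ii) β̂1 | (ii) β̂2 | (iii) β̂0 | (iii) β̂1 | (iii) β̂2 |
|---|---|---|---|---|---|---|---|---|---|
| LARE | 0.030 | 0.030 | 0.030 | 0.074 | 0.074 | 0.074 | 0.075 | 0.075 | 0.075 |
| LS | 0.035 | 0.035 | 0.035 | 0.082 | 0.082 | 0.082 | 0.071 | 0.071 | 0.071 |
| LAD | 0.031 | 0.031 | 0.031 | 0.141 | 0.141 | 0.141 | 0.089 | 0.089 | 0.089 |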

Note: $f(x) = c\exp(-|1 - x| - |1 - x^{-1}| - \log x) I(x > 0)$.

The main findings can be summarized as follows:

  • When ε follows the efficient distribution, LARE is slightly better than LS and LAD and much better than RLS in terms of accuracy and stability of the estimation of the regression parameters.

  • When log(ε) follows the uniform distribution, LARE performs considerably better than LS, LAD and RLS.

  • When log(ε) follows the normal distribution, LS is theoretically efficient for linear regression models. Tables 4-1 and 4-2 show that LARE does well, with results comparable to those of LS.

  • For the error distributions considered in our simulation, Tables 4-1 and 4-2 show that the SE, SEE and asymptotic standard error of the LARE estimator are generally close.

Further simulation shows that LARE is not reliable when log(ε) follows the double exponential distribution. This is not surprising, because Assumption 4 is not satisfied in this case: when log(ε) follows the Double Exponential(0, 1) distribution, $E(\varepsilon + \varepsilon^{-1})$ is infinite. Apart from such heavy-tailed cases, the proposed method performs well in the settings considered.

5. APPLICATIONS

The dataset to be analyzed was obtained from Reuters 3000 Xtra, a major tool used by financial and investment analysts worldwide. The dataset contains the monthly closing stock prices of 408 firms on the Hong Kong Stock Exchange from 2007 to 2008 and their corresponding Book Value Per Share (BVPS) and Earnings Per Share (EPS). The P/B ratio is the price-to-book ratio, a financial ratio that compares the book value of a company to its current market price. The P/E ratio is the price-to-earnings ratio, a financial ratio that measures the price paid for a share relative to the annual income or profit per share earned by the firm.

Let PCi and PNi be the current price and the price for a fixed period of time later for i = 1,…, n, respectively. The sample size n here is 408. We consider the following model:

$$PN_i = PC_i \exp(\beta_0 + \beta_1 PE_i + \beta_2 PB_i)\,\varepsilon_i, \quad i = 1, \ldots, n, \tag{5}$$

where PEi and PBi are the P/E ratio and P/B ratio corresponding to the current price PCi.
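Model (5) says that the log return $\log(PN_i / PC_i)$ is linear in the two ratios, up to the error term. As a small sketch of how a fitted model would be used for prediction, the code below plugs the January LARE coefficients reported in Table 5-1 into the predictor $PC_i \exp(\beta_0 + \beta_1 PE_i + \beta_2 PB_i)$. The firm-level inputs (current price 10, P/E 15, P/B 2) and the function names are hypothetical and serve only as an illustration.

```python
import math

def predict_next_price(pc, pe, pb, beta):
    """Predictor of PN under model (5): PC * exp(b0 + b1 * PE + b2 * PB)."""
    b0, b1, b2 = beta
    return pc * math.exp(b0 + b1 * pe + b2 * pb)

def log_return(pn, pc):
    """Log return over the period, log(PN / PC)."""
    return math.log(pn / pc)

beta_jan_lare = (1.0549, -0.0002, -0.0171)  # January LARE row of Table 5-1
pc, pe, pb = 10.0, 15.0, 2.0                # hypothetical firm-level inputs

pn_pred = predict_next_price(pc, pe, pb, beta_jan_lare)
print(pn_pred, log_return(pn_pred, pc))
```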

The purpose of this study is to analyze the stock returns by using LARE and LS to estimate $\beta = (\beta_0, \beta_1, \beta_2)$ in model (5). Table 5-1 presents the estimates $\hat\beta$ of $\beta$, where $PC_i$ are the monthly closing prices of 2007 and $PN_i$ are the corresponding monthly closing prices one year later in model (5). Table 5-2 shows summary statistics of $\hat\beta_0$, $\hat\beta_1$ and $\hat\beta_2$.

Table 5-1.

Comparison of regression coefficients: LARE vs LS

| Month | LARE β̂0 | LARE β̂1 | LARE β̂2 | LS β̂0 | LS β̂1 | LS β̂2 |
|---|---|---|---|---|---|---|
| JAN | 1.0549 | −0.0002 | −0.0171 | 1.0840 | 0.0001 | −0.0210 |
| FEB | 1.1976 | 0.0004 | −0.0222 | 1.2244 | 0.0002 | −0.0241 |
| MAR | 1.3198 | −0.0009 | −0.0168 | 1.3243 | −0.0005 | −0.0227 |
| APR | 0.9157 | −0.0012 | −0.0085 | 0.8961 | −0.0007 | −0.0092 |
| MAY | 0.6336 | −0.0009 | −0.0067 | 0.6291 | −0.0006 | −0.0069 |
| JUN | 0.6461 | −0.0007 | −0.0078 | 0.6330 | −0.0004 | −0.0071 |
| JUL | 0.4676 | −0.0007 | −0.0061 | 0.4478 | −0.0005 | −0.0056 |
| AUG | 0.2313 | −0.0003 | −0.0053 | 0.2838 | −0.0004 | −0.0048 |
| SEP | 0.0623 | −0.0002 | −0.0039 | 0.0844 | −0.0002 | −0.0031 |
| OCT | 0.0106 | −0.0000 | −0.0040 | 0.0373 | −0.0001 | −0.0038 |
| NOV | −0.1079 | −0.0000 | −0.0034 | −0.1060 | −0.0002 | −0.0035 |
| DEC | −0.1429 | −0.0003 | −0.0014 | −0.1442 | −0.0003 | −0.0033 |

Table 5-2.

Summary statistics: LARE vs LS

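| Method | Coefficient | Min | Max | Mean | Stdev | Median | 10th Percentile | 90th Percentile |
|---|---|---|---|---|---|---|---|---|
| LARE | β̂0 | −0.1429 | 1.3198 | 0.5241 | 0.5186 | 0.5506 | −0.1079 | 1.1976 |
| LARE | β̂1 | −0.0012 | 0.0004 | −0.0004 | 0.0005 | −0.0003 | −0.0009 | 0.0000 |
| LARE | β̂2 | −0.0222 | −0.0014 | −0.0086 | 0.0065 | −0.0064 | −0.0171 | −0.0034 |
| LS | β̂0 | −0.1442 | 1.3243 | 0.5328 | 0.5172 | 0.5384 | −0.1060 | 1.2244 |
| LS | β̂1 | −0.0007 | 0.0002 | −0.0003 | 0.0003 | −0.0004 | −0.0006 | 0.0001 |
| LS | β̂2 | −0.0241 | −0.0031 | −0.0096 | 0.0081 | −0.0063 | −0.0227 | −0.0033 |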

The results show that LARE and LS give similar estimates, which are statistically stable. The predictor based on LARE is financially meaningful and could give better estimates of the intrinsic value of a firm. Moreover, it can be seen that the proposed estimates of β1, the coefficient of the P/E ratio in model (5), are substantially more stable than those of β2, the coefficient of the P/B ratio.

6. CONCLUDING REMARKS

This paper proposes least absolute relative errors estimation for the multiplicative model. The main point of the paper is to advocate such a criterion, which may have broader applications in financial/economic data analysis, as shown in the real example of this paper and in Ye (2007), as well as in survival analysis and categorical data analysis. Heuristically, in survival analysis, less accuracy in terms of absolute error may be required for predicting longer lifetimes; and, in categorical data analysis, a category with a larger percentage of observations may require more accuracy of prediction in terms of absolute error. Such considerations follow the same rationale as using relative error rather than absolute error. Our future work will consider further extension of the method to censored data and categorical data.

The least absolute relative error criterion that we adopt in (3) is not necessarily the unique choice. There are variations such as

$$\mathrm{LARE}_n(\beta) \equiv \sum_{i=1}^n \max\left\{ \left|\frac{Y_i - \exp(X_i^\top\beta)}{Y_i}\right|, \left|\frac{Y_i - \exp(X_i^\top\beta)}{\exp(X_i^\top\beta)}\right| \right\}, \tag{6}$$

as also considered in Ye (2007). For such variations, asymptotic theories analogous to Theorem 1 and Propositions 1 and 2 can be established without further difficulty. In this paper, we choose to present a typical one of the criteria.

For completeness, we give the main results for the estimator based on the variation (6) here without proof. The assumptions parallel Assumptions 1-5 in Section 3. Similar to Lemma 1 in the Appendix, one can prove that the criterion in (6) is strictly convex in $\beta$ under Assumption 3. Therefore, there almost surely exists a unique minimizer of the criterion in (6), which in the remainder of this section is denoted by $\hat\beta_n$. Other than Assumptions 1-3, the following assumptions are needed for the consistency and asymptotic normality of this minimizer.

Assumption 6

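$E(\varepsilon + \varepsilon^{-1}) < \infty$ and $E\{\varepsilon^{-1} I(\varepsilon \le 1) - \varepsilon I(\varepsilon > 1)\} = 0$.

Assumption 7

$E\{\varepsilon^{2} I(\varepsilon > 1) + \varepsilon^{-2} I(\varepsilon \le 1)\} < \infty$.

Assumptions 6-7 play the same role as Assumptions 4-5 in Section 3. The condition $E\{\varepsilon^{-1} I(\varepsilon \le 1) - \varepsilon I(\varepsilon > 1)\} = 0$ plays the same role as $E[(\varepsilon + \varepsilon^{-1})\operatorname{sgn}(\varepsilon - 1)] = 0$ in Section 3 and is only an identifiability condition.

Proposition 3

Suppose Assumptions 1-3 and Assumption 6 hold. Then $\hat\beta_n$ converges to $\beta_0$ in probability as $n \to \infty$. If, in addition to Assumptions 1-3 and Assumption 6, Assumption 7 holds, then as $n \to \infty$,

$$\sqrt n(\hat\beta_n - \beta_0) \to_D N\left(0, \tfrac{1}{4}\{K + f(1)\}^{-2} B V^{-1}\right),$$

where $B = E\{\varepsilon^{2} I(\varepsilon > 1) + \varepsilon^{-2} I(\varepsilon \le 1)\}$, $K = E\{\varepsilon I(\varepsilon > 1)\}$ and $V = E(XX^\top)$.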
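The sum criterion (3) and the maximum criterion (6) are easy to compare numerically. The sketch below evaluates both for an intercept-only model on a few hypothetical responses; since the maximum of two nonnegative numbers lies between half their sum and their sum, the two criteria always stay within a factor of two of each other. The function names and data are assumptions made for illustration.

```python
import math

def relative_errors(y, pred):
    """Return (error relative to the target, error relative to the predictor)."""
    err = abs(y - pred)
    return err / y, err / pred

def lare_sum(b, Y):
    """Criterion (3) for the intercept-only model Y_i = exp(b) * eps_i."""
    pred = math.exp(b)
    return sum(sum(relative_errors(y, pred)) for y in Y)

def lare_max(b, Y):
    """Criterion (6): the larger of the two relative errors for each observation."""
    pred = math.exp(b)
    return sum(max(relative_errors(y, pred)) for y in Y)

Y = [0.5, 1.2, 2.0, 3.5, 0.8]  # hypothetical positive responses
for b in (-0.5, 0.0, 0.5):
    print(b, lare_sum(b, Y), lare_max(b, Y))
```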

Figure 1.

Plot of four densities.

density1: $f(x) = c\exp(-|1 - x| - |1 - x^{-1}| - \log x) I(x > 0)$.

density2: the density of ε where log(ε) ~ N(0, 1).

density3: the density of ε where log(ε) ~ Double Exponential(0, 1).

density4: the density of ε where log(ε) ~ Uniform(−2, 2).

Acknowledgments

The authors are grateful to two anonymous referees, the Associate Editor and the Editor for comments and suggestions that lead to substantial improvements in the paper.

APPENDIX. PROOFS

We state two lemmas that will be used later.

Lemma 1

Let $\psi(x, a) = |1 - a^{-1}e^{x}| + |1 - a e^{-x}|$ for $a > 0$ and $x \in \mathbb{R}$. Then, for fixed $a > 0$, $\psi(x, a)$ is a strictly convex function of $x \in \mathbb{R}$.

The proof is omitted.
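One possible way to see the claim, offered here as a sketch rather than as the authors' argument, is to note that the two quantities inside the absolute values, $1 - a^{-1}e^{x}$ and $a e^{-x} - 1$, always have the same sign (both are positive exactly when $a > e^{x}$), so their absolute values add up to the absolute value of their sum:

$$
\begin{aligned}
\psi(x, a) &= \left|(1 - a^{-1}e^{x}) + (a e^{-x} - 1)\right| = \left|a e^{-x} - a^{-1} e^{x}\right| = 2\sinh|x - \log a|, \\
h(t) &\equiv 2\sinh|t|, \qquad h'(t) = 2\operatorname{sgn}(t)\cosh t \quad (t \ne 0).
\end{aligned}
$$

The derivative $h'$ is strictly increasing on each side of $0$ and jumps from $-2$ to $2$ at $t = 0$, so $h$ is strictly convex, and hence so is $\psi(\cdot, a)$. With $a = Y_i$ and $x = X_i^\top\beta$, the same identity gives each summand of criterion (3) as $2\sinh|\log Y_i - X_i^\top\beta|$.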

Lemma 2

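Suppose that $\xi^*$ is nondegenerate and $E\{\exp(\xi^*) + \exp(-\xi^*)\} < \infty$. Let $\phi(a) = E[\{\exp(\xi^* - a) + \exp(a - \xi^*)\}\operatorname{sgn}(\xi^* - a)]$ and $a^* = \max\{a : \phi(a) \ge 0\}$. If $\phi(a)$ is continuous at $a^*$, then there exists a unique constant $a \in \mathbb{R}$ such that $\phi(a) = 0$.

Proof

Observe the following inequality

$$\{\exp(x - b) + \exp(b - x)\}\operatorname{sgn}(x - b) - \{\exp(x - a) + \exp(a - x)\}\operatorname{sgn}(x - a) \le \int_a^b \{-\exp(x - y) + \exp(-x + y)\}\operatorname{sgn}(x - y)\,dy, \tag{A.1}$$

for any $x$, $a$ and $b \in \mathbb{R}$ with $a < b$. Then,

$$\phi(b) - \phi(a) \le E\left[\int_a^b \{-\exp(\xi^* - y) + \exp(-\xi^* + y)\}\operatorname{sgn}(\xi^* - y)\,dy\right] = \int_a^b E\left[\{-\exp(\xi^* - y) + \exp(-\xi^* + y)\}\operatorname{sgn}(\xi^* - y)\right]dy. \tag{A.2}$$

It is easy to show that $\{\exp(-x + y) - \exp(x - y)\}\operatorname{sgn}(x - y) < 0$ for $x \ne y$. It follows that

$$E\left[\{-\exp(\xi^* - y) + \exp(-\xi^* + y)\}\operatorname{sgn}(\xi^* - y)\right] < 0,$$

which implies that $\phi(b) - \phi(a) < 0$. Thus, $\phi(\cdot)$ is strictly decreasing. On the other hand, it is seen from the expression of $\phi(\cdot)$ that

$$\phi(a) \to -\infty \ \text{as}\ a \to \infty \quad\text{and}\quad \phi(a) \to \infty \ \text{as}\ a \to -\infty.$$

Together with the continuity of $\phi(\cdot)$ at $a^*$, there exists a unique solution to $\phi(a) = 0$. The proof is complete.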

A.1. Proof of Theorem 1

The proof will be done in several steps.

Step 1

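To prove consistency, denote

$$\psi_n(\beta) \equiv \sum_{i=1}^n \left[\left|1 - \varepsilon_i^{-1}\exp\{X_i^\top(\beta - \beta_0)\}\right| + \left|1 - \varepsilon_i\exp\{-X_i^\top(\beta - \beta_0)\}\right|\right].$$

It follows from the Convexity Lemma in Pollard (1991, p. 187) and the convexity of $\psi_n(\beta)$ by Lemma 1 that, for any compact set $B$,

$$\sup_{\beta \in B} \frac{1}{n}\left|\psi_n(\beta) - E\{\psi_n(\beta)\}\right| \to 0 \tag{A.3}$$

in probability as $n \to \infty$. Then,

$$
\begin{aligned}
E\{\psi_n(\beta) - \psi_n(\beta_0)\}
&= \sum_{i=1}^n E\left[\left|1 - \varepsilon_i^{-1}\exp\{X_i^\top(\beta - \beta_0)\}\right| + \left|1 - \varepsilon_i\exp\{-X_i^\top(\beta - \beta_0)\}\right| - |1 - \varepsilon_i^{-1}| - |1 - \varepsilon_i|\right] \\
&= \sum_{i=1}^n E\left((\varepsilon_i + \varepsilon_i^{-1})\operatorname{sgn}(1 - \varepsilon_i)\left[\exp\{X_i^\top(\beta - \beta_0)\} - 1\right]\right) \\
&\quad + \sum_{i=1}^n E\left(\varepsilon_i\operatorname{sgn}(\varepsilon_i - 1)\left[\exp\{X_i^\top(\beta - \beta_0)\} + \exp\{-X_i^\top(\beta - \beta_0)\} - 2\right]\right) \\
&\quad + 2\sum_{i=1}^n E\left(\left\{I(\varepsilon_i \le \exp\{X_i^\top(\beta - \beta_0)\}) - I(\varepsilon_i \le 1)\right\}\left[\varepsilon_i^{-1}\exp\{X_i^\top(\beta - \beta_0)\} - \varepsilon_i\exp\{-X_i^\top(\beta - \beta_0)\}\right]\right).
\end{aligned}
\tag{A.4}
$$

By Assumption 4 and the independence of $X_i$ and $\varepsilon_i$, the first sum is 0. It follows from Assumptions 1 and 4 that

$$2E\{\varepsilon I(\varepsilon > 1)\} > E\{(\varepsilon + \varepsilon^{-1})I(\varepsilon > 1)\} = E\{(\varepsilon + \varepsilon^{-1})I(\varepsilon \le 1)\} > 2E\{\varepsilon I(\varepsilon \le 1)\}, \tag{A.5}$$

which implies $J = E\{\varepsilon\operatorname{sgn}(\varepsilon - 1)\} > 0$. This result leads to the fact that the second term in (A.4) is nonnegative. It is easy to check that the third term in (A.4) is also nonnegative. Hence, $E\{\psi_n(\beta) - \psi_n(\beta_0)\} \ge 0$ for all $\beta$. Furthermore, $E\{\psi_n(\beta) - \psi_n(\beta_0)\} = 0$ ensures

$$\sum_{i=1}^n E\left(\varepsilon_i\operatorname{sgn}(\varepsilon_i - 1)\left[\exp\{X_i^\top(\beta - \beta_0)\} + \exp\{-X_i^\top(\beta - \beta_0)\} - 2\right]\right) = 0.$$

Since $\exp(u) + \exp(-u)$ is uniquely minimized at $u = 0$, it follows from Assumption 3 and $E\{\varepsilon\operatorname{sgn}(\varepsilon - 1)\} > 0$ that $\beta = \beta_0$ is the unique minimizer of $E\{\psi_n(\beta) - \psi_n(\beta_0)\}$. Denote $\psi(\beta) = n^{-1}E\{\psi_n(\beta)\}$. Then, for every $\delta > 0$, there exists $\eta > 0$ such that $\psi(\beta) > \psi(\beta_0) + \eta$ for $\|\beta - \beta_0\| \ge \delta$. For any constants $\delta$ and $C$, let $\bar\beta_n$ be the minimizer of $\psi_n(\beta)$ over $\delta \le \|\beta - \beta_0\| \le C$. Then by (A.3), $n^{-1}\psi_n(\bar\beta_n) - \psi(\bar\beta_n) \to 0$ in probability as $n \to \infty$, and $\psi(\bar\beta_n) > \psi(\beta_0) + \eta$ for some $\eta > 0$. On the other hand, for any constant $\delta$,

$$\inf_{\|\beta - \beta_0\| \le \delta} n^{-1}\psi_n(\beta) \le n^{-1}\psi_n(\beta_0) \to \psi(\beta_0)$$

in probability by (A.3). Therefore, with probability going to 1, the minimum of $\psi_n(\beta)$ in $\|\beta - \beta_0\| \le C$ is achieved inside $\|\beta - \beta_0\| \le \delta$. Since $\psi_n(\beta)$ is strictly convex, the local minimizer inside $\|\beta - \beta_0\| \le \delta$ is the unique global minimizer. By the definition of $\hat\beta_n$, $P(\hat\beta_n \in \{\beta : \|\beta - \beta_0\| \le \delta\}) \to 1$ as $n \to \infty$. Thus, the weak consistency of $\hat\beta_n$ is proved by letting $\delta \to 0$.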

Step 2

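To prove asymptotic normality, we first approximate $E\{\psi_n(\beta) - \psi_n(\beta_0)\}$ for every fixed $\beta$ in a neighborhood of $\beta_0$. Observe that $\exp(x) + \exp(-x) - 2 = x^2 + O(|x|^3)$ if $x$ is close to zero. By the Taylor expansion,

$$
\begin{aligned}
\frac{1}{n}E\{\psi_n(\beta) - \psi_n(\beta_0)\}
&= J\,\frac{1}{n}\sum_{i=1}^n E\left[\exp\{X_i^\top(\beta - \beta_0)\} + \exp\{-X_i^\top(\beta - \beta_0)\} - 2\right] \\
&\quad + 2f(1)\,\frac{1}{n}\sum_{i=1}^n E\{(\beta - \beta_0)^\top X_i X_i^\top(\beta - \beta_0)\} + O(\|\beta - \beta_0\|^3) \\
&= J(\beta - \beta_0)^\top V(\beta - \beta_0) + 2f(1)(\beta - \beta_0)^\top V(\beta - \beta_0) + O(\|\beta - \beta_0\|^3) \\
&= \{J + 2f(1)\}(\beta - \beta_0)^\top V(\beta - \beta_0) + O(\|\beta - \beta_0\|^3),
\end{aligned}
\tag{A.6}
$$

where $J = E\{\varepsilon\operatorname{sgn}(\varepsilon - 1)\}$ and $V = E(XX^\top)$.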

Step 3

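Write $W_n = \sum_{i=1}^n (\varepsilon_i + \varepsilon_i^{-1})\operatorname{sgn}(\varepsilon_i - 1)X_i$. We are now in a position to show

$$\sup_{\|\beta - \beta_0\| \le Cn^{-1/2}} \left|\psi_n(\beta) - \psi_n(\beta_0) + W_n^\top(\beta - \beta_0) - E\{\psi_n(\beta) - \psi_n(\beta_0)\}\right| \to 0 \tag{A.7}$$

in probability as $n \to \infty$, for each positive constant $C$. To this end, let $\theta = \sqrt n(\beta - \beta_0)$; it is equivalent to show

$$\sup_{\|\theta\| \le C} \left|\psi_n(\beta_0 + \theta/\sqrt n) - \psi_n(\beta_0) + \frac{1}{\sqrt n}W_n^\top\theta - E\{\psi_n(\beta_0 + \theta/\sqrt n) - \psi_n(\beta_0)\}\right| \to 0 \tag{A.8}$$

in probability as $n \to \infty$. In order to establish (A.8), we shall first show that, for each fixed $\theta$,

$$\psi_n(\beta_0 + \theta/\sqrt n) - \psi_n(\beta_0) + \frac{1}{\sqrt n}W_n^\top\theta - E\{\psi_n(\beta_0 + \theta/\sqrt n) - \psi_n(\beta_0)\} \to 0 \tag{A.9}$$

in probability as $n \to \infty$. Analogous to (A.4), denote

$$G_i(\beta) \equiv \varepsilon_i\operatorname{sgn}(\varepsilon_i - 1)\left[\exp\{X_i^\top(\beta - \beta_0)\} + \exp\{-X_i^\top(\beta - \beta_0)\} - 2\right]$$

and

$$R_i(\beta) \equiv \left[I(\varepsilon_i > \exp\{X_i^\top(\beta - \beta_0)\}) - I(\varepsilon_i > 1)\right]\left[\varepsilon_i\exp\{-X_i^\top(\beta - \beta_0)\} - \varepsilon_i^{-1}\exp\{X_i^\top(\beta - \beta_0)\}\right].$$

Then,

$$\psi_n(\beta) - \psi_n(\beta_0) - E\{\psi_n(\beta) - \psi_n(\beta_0)\} = -\sum_{i=1}^n (\varepsilon_i + \varepsilon_i^{-1})\operatorname{sgn}(\varepsilon_i - 1)\left[\exp\{X_i^\top(\beta - \beta_0)\} - 1\right] + \sum_{i=1}^n \{G_i(\beta) - EG_i(\beta)\} + 2\sum_{i=1}^n \{R_i(\beta) - ER_i(\beta)\}.$$

For each fixed $\theta$,

$$
\begin{aligned}
\sum_{i=1}^n E\left[G_i(\beta_0 + \theta/\sqrt n) - E\{G_i(\beta_0 + \theta/\sqrt n)\}\right]^2
&\le \sum_{i=1}^n E\{\varepsilon_i\operatorname{sgn}(\varepsilon_i - 1)\}^2\, E\left\{\exp\left(\tfrac{1}{\sqrt n}X_i^\top\theta\right) + \exp\left(-\tfrac{1}{\sqrt n}X_i^\top\theta\right) - 2\right\}^2 \\
&= \sum_{i=1}^n E\{\varepsilon_i\operatorname{sgn}(\varepsilon_i - 1)\}^2\, E\left(\tfrac{1}{n}\theta^\top X_i X_i^\top\theta + a_i\right)^2, \ \text{say}, \\
&\to 0
\end{aligned}
\tag{A.10}
$$

as $n \to \infty$, where $P(\|a_i\| \le cn^{-3/2}) = 1$ for some constant $c$ and $i = 1, \ldots, n$. It then follows that

$$\sum_{i=1}^n \left[G_i(\beta_0 + \theta/\sqrt n) - E\{G_i(\beta_0 + \theta/\sqrt n)\}\right] \to 0 \tag{A.11}$$

in probability as $n \to \infty$. On the other hand, by the Taylor expansion, for each fixed $\theta$,

$$
\begin{aligned}
E\left\{\varepsilon\exp\left(-\tfrac{1}{\sqrt n}X^\top\theta\right) - \varepsilon^{-1}\exp\left(\tfrac{1}{\sqrt n}X^\top\theta\right)\right\}^2
&= E\left(-\varepsilon\tfrac{1}{\sqrt n}X^\top\theta - \varepsilon^{-1}\tfrac{1}{\sqrt n}X^\top\theta + \varepsilon - \varepsilon^{-1} + b\right)^2 \\
&= E\left\{-(\varepsilon - 1)\tfrac{1}{\sqrt n}X^\top\theta - (\varepsilon^{-1} - 1)\tfrac{1}{\sqrt n}X^\top\theta - \tfrac{2}{\sqrt n}X^\top\theta + (\varepsilon - 1) - (\varepsilon^{-1} - 1) + b\right\}^2 \\
&\le E\left[2\{(\varepsilon - 1)^2 + (\varepsilon^{-1} - 1)^2 + 4\}\tfrac{1}{n}\theta^\top XX^\top\theta + 2(\varepsilon - 1)^2 + 2(\varepsilon^{-1} - 1)^2 + b^2\right], \ \text{say},
\end{aligned}
$$

where $P(\|b\| \le cn^{-1}) = 1$ for some constant $c$. Hence, an argument similar to (A.10) leads to

$$
\begin{aligned}
\sum_{i=1}^n E\left[R_i(\beta_0 + \theta/\sqrt n) - E\{R_i(\beta_0 + \theta/\sqrt n)\}\right]^2
&\le \sum_{i=1}^n E\Big(\Big\{I\left(\tfrac{1}{\sqrt n}X_i^\top\theta > 0\right) I\left(0 < \log\varepsilon_i \le \tfrac{1}{\sqrt n}X_i^\top\theta\right) + I\left(\tfrac{1}{\sqrt n}X_i^\top\theta \le 0\right) I\left(0 \ge \log\varepsilon_i > \tfrac{1}{\sqrt n}X_i^\top\theta\right)\Big\} \\
&\qquad \times \left[2\{(\varepsilon_i - 1)^2 + (\varepsilon_i^{-1} - 1)^2 + 4\}\tfrac{1}{n}\theta^\top X_i X_i^\top\theta + 2(\varepsilon_i - 1)^2 + 2(\varepsilon_i^{-1} - 1)^2 + b_i^2\right]\Big), \ \text{say}, \\
&\to 0
\end{aligned}
$$

as $n \to \infty$, where $P(\|b_i\| \le cn^{-1}) = 1$ for some constant $c$ and $i = 1, \ldots, n$. Thus, for each fixed $\theta$,

$$\sum_{i=1}^n \left[R_i(\beta_0 + \theta/\sqrt n) - E\{R_i(\beta_0 + \theta/\sqrt n)\}\right] \to 0 \tag{A.12}$$

in probability as $n \to \infty$. Combining (A.11) and (A.12), together with Assumption 4, we have shown (A.9).

Next, $\psi_n(\beta_0 + \theta/\sqrt n) - \psi_n(\beta_0) + W_n^\top\theta/\sqrt n$ is convex in $\theta$ by Lemma 1. It follows from (A.9) and the Convexity Lemma in Pollard (1991, p. 187) that, for each constant $C > 0$,

$$\sup_{\|\theta\| \le C} \left|\psi_n(\beta_0 + \theta/\sqrt n) - \psi_n(\beta_0) + \frac{1}{\sqrt n}W_n^\top\theta - E\{\psi_n(\beta_0 + \theta/\sqrt n) - \psi_n(\beta_0)\}\right| \to 0$$

in probability. Then (A.7) is proved.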

Step 4

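Let $\xi_n(\beta) = \psi_n(\beta) - \psi_n(\beta_0) - n\{J + 2f(1)\}(\beta - \beta_0)^\top V(\beta - \beta_0) + W_n^\top(\beta - \beta_0)$. Combining Step 2 and Step 3, we have

$$\sup_{\|\beta - \beta_0\| \le Cn^{-1/2}} |\xi_n(\beta)| \to 0 \tag{A.13}$$

in probability as $n \to \infty$ for each constant $C > 0$. Let $\tilde\beta_n$ be the minimizer of $n\{J + 2f(1)\}(\beta - \beta_0)^\top V(\beta - \beta_0) - W_n^\top(\beta - \beta_0)$. Clearly $\tilde\beta_n - \beta_0 = \{J + 2f(1)\}^{-1}V^{-1}W_n/(2n)$. By the definition of $W_n$, for every $\delta > 0$, there exist some constants $K_\delta > 0$ and $N_\delta$ such that $P(\|\tilde\beta_n - \beta_0\| > K_\delta n^{-1/2}) \le \delta/2$ for any $n \ge N_\delta$. In view of (A.13), for every $\eta > 0$, there exists some constant $N_\eta$ such that, for any $n \ge N_\eta$,

$$P\left(\sup_{\|\beta - \beta_0\| \le K_\delta n^{-1/2}} |\xi_n(\beta)| > \eta\right) \le \delta/2.$$

Hence, for every $\delta, \eta > 0$, there exists $N = \max\{N_\delta, N_\eta\}$ such that, for any $n \ge N$,

$$
\begin{aligned}
P(|\xi_n(\tilde\beta_n)| > \eta)
&= P(|\xi_n(\tilde\beta_n)| > \eta, \|\tilde\beta_n - \beta_0\| > K_\delta n^{-1/2}) + P(|\xi_n(\tilde\beta_n)| > \eta, \|\tilde\beta_n - \beta_0\| \le K_\delta n^{-1/2}) \\
&\le P(\|\tilde\beta_n - \beta_0\| > K_\delta n^{-1/2}) + P\left(\sup_{\|\beta - \beta_0\| \le K_\delta n^{-1/2}} |\xi_n(\beta)| > \eta\right) \le \delta,
\end{aligned}
$$

which implies $\xi_n(\tilde\beta_n) = o_p(1)$. Similar arguments also lead to

$$\sup_{\|\beta - \tilde\beta_n\| \le Cn^{-1/2}} |\xi_n(\beta)| = o_p(1)$$

for each constant $C > 0$.

Observe that

$$\psi_n(\beta) - \psi_n(\beta_0) = n\{J + 2f(1)\}(\beta - \tilde\beta_n)^\top V(\beta - \tilde\beta_n) - \frac{1}{4n}\{J + 2f(1)\}^{-1}W_n^\top V^{-1}W_n + \xi_n(\beta).$$

For any constants $c$ and $C$ with $0 < c < C < \infty$,

$$
\begin{aligned}
\inf_{cn^{-1/2} \le \|\beta - \tilde\beta_n\| \le Cn^{-1/2}} \{\psi_n(\beta) - \psi_n(\beta_0)\}
&\ge \inf_{cn^{-1/2} \le \|\beta - \tilde\beta_n\| \le Cn^{-1/2}} \left[n\{J + 2f(1)\}(\beta - \tilde\beta_n)^\top V(\beta - \tilde\beta_n)\right] \\
&\quad - \frac{1}{4n}\left[\{J + 2f(1)\}^{-1}W_n^\top V^{-1}W_n\right] - \sup_{cn^{-1/2} \le \|\beta - \tilde\beta_n\| \le Cn^{-1/2}} |\xi_n(\beta)| \\
&\ge \{J + 2f(1)\}c^2\lambda - \frac{1}{4n}\{J + 2f(1)\}^{-1}W_n^\top V^{-1}W_n + o_p(1),
\end{aligned}
\tag{A.14}
$$

where $\lambda$ is the smallest eigenvalue of $V$. On the other hand, for any constant $c$,

$$
\begin{aligned}
\inf_{\|\beta - \tilde\beta_n\| \le cn^{-1/2}} \{\psi_n(\beta) - \psi_n(\beta_0)\}
&\le \psi_n(\tilde\beta_n) - \psi_n(\beta_0) \\
&= -\frac{1}{4n}\{J + 2f(1)\}^{-1}W_n^\top V^{-1}W_n + \xi_n(\tilde\beta_n) \\
&= -\frac{1}{4n}\{J + 2f(1)\}^{-1}W_n^\top V^{-1}W_n + o_p(1).
\end{aligned}
\tag{A.15}
$$

Together, (A.14) and (A.15) imply that, with probability going to 1, the minimum of $\psi_n(\beta) - \psi_n(\beta_0)$ in $\|\beta - \tilde\beta_n\| \le Cn^{-1/2}$ is achieved inside $\|\beta - \tilde\beta_n\| \le cn^{-1/2}$. Since $\psi_n(\beta) - \psi_n(\beta_0)$ is convex, the local minimizer inside $\|\beta - \tilde\beta_n\| \le cn^{-1/2}$ is the global minimizer. Thus,

$$\hat\beta_n - \beta_0 = \tilde\beta_n - \beta_0 + o_p(n^{-1/2}) = \frac{1}{2\sqrt n}\{J + 2f(1)\}^{-1}V^{-1}\frac{1}{\sqrt n}W_n + o_p(n^{-1/2}).$$

Hence, as $n \to \infty$,

$$\sqrt n(\hat\beta_n - \beta_0) \to_D N\left(0, \tfrac{1}{4}\{J + 2f(1)\}^{-2} A V^{-1}\right).$$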

A.2. Proof of Proposition 2

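For the given density of $\varepsilon$, the density of $Y_i$ given $X_i$ is

$$f_{Y_i \mid X_i}(y) = c\exp\left\{-\left|\frac{\exp(X_i^\top\beta) - y}{\exp(X_i^\top\beta)}\right| - \left|\frac{y - \exp(X_i^\top\beta)}{y}\right| - \log y\right\}.$$

Then, the likelihood function of $Y$ is

$$L(\beta) = c^n\exp\left[\sum_{i=1}^n\left\{-\left|\frac{\exp(X_i^\top\beta) - Y_i}{\exp(X_i^\top\beta)}\right| - \left|\frac{Y_i - \exp(X_i^\top\beta)}{Y_i}\right| - \log Y_i\right\}\right].$$

Maximizing this likelihood function is equivalent to minimizing our proposed LARE criterion

$$\sum_{i=1}^n\left\{\left|\frac{\exp(X_i^\top\beta) - Y_i}{\exp(X_i^\top\beta)}\right| + \left|\frac{Y_i - \exp(X_i^\top\beta)}{Y_i}\right|\right\}.$$

Therefore the estimator $\hat\beta_n$, which minimizes $\mathrm{LARE}_n(\beta)$, is efficient when $\varepsilon \sim f(\cdot)$ with $f(x) = c\exp(-|1 - x| - |1 - x^{-1}| - \log x) I(x > 0)$. The proof is complete.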

Contributor Information

Kani CHEN, Department of Mathematics, HKUST, Kowloon, Hong Kong, China (makchen@ust.hk).

Shaojun GUO, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R.China (guoshaoj@amss.ac.cn).

Yuanyuan LIN, Department of Mathematics, HKUST, Kowloon, Hong Kong, China (linyy@ust.hk).

Zhiliang YING, Department of Statistics, 618 Mathematics, Columbia University, New York NY 10027 (zying@stat.columbia.edu).

References

  1. Chen K, Ying Z, Zhang H, Zhao L. Analysis of least absolute deviation. Biometrika. 2008;95:107–122.
  2. Gauss CF. Theoria Motus Corporum Coelestium. Perthes; Hamburg: 1809. Translation reprinted as Theory of the Motions of the Heavenly Bodies Moving about the Sun in Conic Sections. Dover; New York: 1963.
  3. Khoshgoftaar TM, Bhattacharyya BB, Richardson GD. Predicting software errors, during development, using nonlinear regression models: a comparative study. IEEE Transactions on Reliability. 1992;41:390–395.
  4. Knight K. Limiting distribution for L1 regression estimators under general conditions. The Annals of Statistics. 1998;26:755–770.
  5. Makridakis S, Andersen A, Carbone R, Fildes R, Hibon M, Lewandowski R, Newton J, Parzen E, Winkler R. The Forecasting Accuracy of Major Time Series Methods. Wiley; New York: 1984.
  6. Narula SC, Wellington JF. Prediction, linear regression and the minimum sum of relative errors. Technometrics. 1977;19:185–190.
  7. Park H, Stefanski LA. Relative-error prediction. Statistics and Probability Letters. 1998;40:227–236.
  8. Pollard D. Asymptotics for least absolute deviations regression estimators. Econometric Theory. 1991;7:186–199.
  9. Portnoy S, Koenker R. The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators (with discussion). Statistical Science. 1997;12:279–300.
  10. Stigler SM. Gauss and the Invention of Least Squares. The Annals of Statistics. 1981;9:465–474.
  11. Ye J. Price models and the value relevance of accounting information. Technical report. 2007.
