Least Absolute Relative Error Estimation

Kani CHEN; Shaojun GUO; Yuanyuan LIN; Zhiliang YING

doi:10.1198/jasa.2010.tm09307

Author manuscript; available in PMC: 2013 Sep 4.

Published in final edited form as: J Am Stat Assoc. 2012 Jan 1;105(491):1104–1112. doi: 10.1198/jasa.2010.tm09307

PMCID: PMC3762514 NIHMSID: NIHMS491922 PMID: 24013644

Abstract

The multiplicative regression model, or accelerated failure time model, which becomes a linear regression model after logarithmic transformation, is useful for analyzing data with positive responses, such as stock prices or lifetimes, which are particularly common in economic, financial and biomedical studies. Least squares and least absolute deviation are among the most widely used criteria in statistical estimation for the linear regression model. However, in many practical applications, especially in treating, for example, stock price data, the size of the relative error, rather than that of the error itself, is the central concern of practitioners. This paper offers an alternative to the traditional estimation methods by minimizing the least absolute relative errors for multiplicative regression models. We prove consistency and asymptotic normality and provide an inference approach via random weighting. We also specify the error distribution under which the proposed least absolute relative errors estimation is efficient. Supportive evidence is shown in simulation studies. An application is illustrated with an analysis of stock returns on the Hong Kong Stock Exchange.

Keywords: Multiplicative regression model, Logarithm transformation, Relative error, Random weighting

1. INTRODUCTION

The linear regression model is one of the most fundamental statistical models, and the most popular method of estimation, which dates back to Gauss, is the method of least squares (LS); see Gauss (1809) and Stigler (1981). Specifically, consider

Y_{i}^{*} = X_{i}^{⊺} β + ε_{i}^{*}, i = 1, \dots, n,

(1)

where $Y_{i}^{*}$ and X_i are, respectively, the response variable and the observable p-vector of covariates, β is the p-vector of regression coefficients including an intercept, and $ε_{i}^{*}$ is the unobservable error term independent of X_i. The least squares criterion minimizes the sum of squares of the errors: $\sum_{i = 1}^{n} {(Y_{i}^{*} - X_{i}^{⊺} β)}^{2}$ . The resulting LS estimator enjoys some important optimality properties, such as being the best linear unbiased estimator. It is efficient when the errors follow a normal distribution. An important alternative to the least squares method is the least absolute deviation (LAD) method, which minimizes the sum of absolute values of the errors: $\sum_{i = 1}^{n} ∣ Y_{i}^{*} - X_{i}^{⊺} β ∣$ . The LAD estimator is more robust than the LS estimator, and its computation and inference procedures are now rather straightforward with the help of linear programming and random weighting. A comprehensive discussion may be found in Portnoy and Koenker (1997). We note that the LS method requires a finite second moment of the errors, while the LAD method requires positivity of the density of the errors at 0.

The above LS and LAD criteria are based on absolute errors. In many practical applications, however, the relative errors, rather than the absolute errors, are of greater concern. Narula and Wellington (1977) presented an estimation method based on minimizing the sum of absolute relative errors for the linear model. Makridakis et al (1984) used relative error as a model selection criterion in time series modeling. Khoshgoftaar et al (1992) gave sufficient conditions to ensure the strong consistency of the estimators minimizing the sum of squared relative errors, $\sum_{i = 1}^{n} [\{Y_{i} - f (X_{i}, β)\} ∕ Y_{i}]^{2}$ (RLS, for relative least squares), and minimizing the sum of the absolute relative errors, $\sum_{i = 1}^{n} ∣ \{Y_{i} - f (X_{i}, β)\} ∕ Y_{i} ∣$ (MRE, for minimum relative errors), for the nonlinear regression model Y_i = f(X_i,β)+ε_i, where f(x,β) is the regression function and Y_i, X_i, β, ε_i are as in model (1). Park and Stefanski (1998) derived a closed form expression for the best mean squared relative error predictor of Y given X, where Y is the response variable and X is the predictor variable. These approaches are conceptually appealing and quite easy to implement. Under certain restrictive modeling assumptions, such as parametric ones, Park and Stefanski (1998) and Khoshgoftaar et al (1992) reported some elegant results. However, the theoretical justification of the RLS and MRE methods is in general quite challenging. The consistency and asymptotic normality of the RLS and MRE estimators for linear or nonlinear models have not been established under general regularity conditions. Moreover, in all these studies, the relative error is defined as the spread between the target value and the predictor divided by the target value, i.e., the ratio of the error relative to the target. Such a relative error can be quite inadequate when, in particular, the unknown target value is large and the predictor is relatively small. On the other hand, the ratio of the error relative to the predictor can very well be an alternative representation of the relative error. More discussion on the choice of relative error criterion is given in Section 2. A similar consideration is seen in an accounting model in Ye (2007).

In the next section, we propose the least absolute relative errors (LARE) criterion for multiplicative models, using both types of relative errors. Responses are usually positive when relative error is of concern, and the multiplicative model, or accelerated failure time (AFT) model, naturally handles positive responses. In Section 3, a large sample theory including consistency and asymptotic normality is presented, along with an inference procedure based on random weighting. Conditions, especially on the error terms, are also specified. In addition, the error distribution under which the LARE estimator is efficient is given. Section 4 contains results of simulation studies. An illustration with a real example is given in Section 5. All proofs are deferred to the Appendix.

2. THE MODEL AND THE LARE CRITERION

Consider the following multiplicative model or accelerated failure time model:

Y_{i} = \exp (X_{i}^{⊺} β) ε_{i}, i = 1, \dots, n,

(2)

which, after logarithmic transformation, is model (1) with $Y_{i}^{*} = \log (Y_{i})$ and $ε_{i}^{*} = \log (ε_{i})$ . Such a logarithmic transformation is a reasonable choice in some cases because of its theoretical simplicity. However, a relationship that is linear in the transformed model is not linear in the original one, and one needs to transform the analysis results back to the original measurement scale.

Observe that the predictor of Y_i with covariate X_i is $\exp (X_{i}^{⊺} β)$ . It is intuitively appealing and interpretable to consider the relative error

∣ \frac{Y_{i} - \exp (X_{i}^{⊺} β)}{Y_{i}} ∣ or ∣ \frac{Y_{i} - \exp (X_{i}^{⊺} β)}{\exp (X_{i}^{⊺} β)} ∣ .

We note that $∣ \log (Y_{i}) - X_{i}^{⊺} β ∣$ is approximately equal to $∣ Y_{i} - \exp (X_{i}^{⊺} β) ∣ ∕ Y_{i}$ or $∣ Y_{i} - \exp (X_{i}^{⊺} β) ∣ ∕ \exp (X_{i}^{⊺} β)$ only when the relative error is very small.

Remark 1

A measurement of relative error in terms of the ratio of the error relative to the target value can be inappropriate. Consider, for example, Y_i being large, say, 100, and the predictor $\exp (X_{i}^{⊺} β)$ being small, say 10. The relative error so defined, $∣ Y_{i} - \exp (X_{i}^{⊺} β) ∣ ∕ Y_{i}$ , returns a value 0.9, whilst the alternative $∣ Y_{i} - \exp (X_{i}^{⊺} β) ∣ ∕ \exp (X_{i}^{⊺} β)$ returns 9. The latter, in this case, more properly reflects the inaccuracy of the predictor. The criteria RLS and MRE which use the former as the relative error are thus inadequate in this case. Conversely, only using the latter as relative error can be equally inappropriate when the predictor is large but the response is small. The criterion LARE that we propose below takes into consideration both types of relative errors. We note that the criteria RLS and MRE, if using both types of relative errors, are increasingly difficult to analyze. In particular, the closed form expression of the best mean squared relative error predictor of Y given X shall not be available anymore.
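The numbers in Remark 1 can be reproduced directly. The following is a minimal illustration; the helper name `relative_errors` is hypothetical and not from the paper.

```python
def relative_errors(y, pred):
    """Return (error relative to the target, error relative to the predictor)."""
    if y <= 0 or pred <= 0:
        raise ValueError("response and predictor must be positive")
    diff = abs(y - pred)
    return diff / y, diff / pred

# The situation in Remark 1: large response, small predictor.
to_target, to_predictor = relative_errors(100.0, 10.0)

# The reverse situation: small response, large predictor.
rev_target, rev_predictor = relative_errors(10.0, 100.0)

# Summing both types penalizes the two situations equally.
lare_term = to_target + to_predictor
rev_lare_term = rev_target + rev_predictor
```

In the first case the two relative errors are 0.9 and 9; in the reverse case they swap, so the sum of the two is the same in both situations.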

The criterion we propose, called least absolute relative errors (LARE), is to minimize the sum of the absolute relative errors for model (2):

{LARE}_{n} (β) \equiv \sum_{i = 1}^{n} \{∣ \frac{Y_{i} - \exp (X_{i}^{⊺} β)}{Y_{i}} ∣ + ∣ \frac{Y_{i} - \exp (X_{i}^{⊺} β)}{\exp (X_{i}^{⊺} β)} ∣\} .

(3)

One advantage of relative errors is that they are scale free, or unit free. This is particularly important when applying the LARE criterion to certain types of data. For example, in a regression analysis of a number of stocks, comparison of share prices of different stocks is generally meaningless, especially because of possible share splits or reverse splits. In other words, different stocks have different units, which are not well defined. Criteria based on absolute errors are not directly applicable here without accounting for this heterogeneity.
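The scale-free property can be checked numerically. The sketch below evaluates criterion (3) on hypothetical toy data (the function `lare`, the data and the coefficient values are assumptions made for illustration) and confirms that changing the unit of the response by a factor c, which only shifts the intercept by log c, leaves the criterion unchanged.

```python
import math

def lare(beta, X, Y):
    """Criterion (3): sum over observations of both absolute relative errors."""
    total = 0.0
    for x, y in zip(X, Y):
        pred = math.exp(sum(b * xj for b, xj in zip(beta, x)))
        total += abs(y - pred) / y + abs(y - pred) / pred
    return total

# Hypothetical toy data; each row of X starts with 1 for the intercept.
X = [(1.0, 0.2), (1.0, -0.5), (1.0, 1.3), (1.0, 0.7)]
Y = [2.1, 0.9, 6.5, 3.0]
beta = (0.8, 0.9)

# Change the unit of the response by a factor c.
c = 0.1
Y_rescaled = [c * y for y in Y]
beta_rescaled = (beta[0] + math.log(c), beta[1])  # only the intercept moves

original = lare(beta, X, Y)
rescaled = lare(beta_rescaled, X, Y_rescaled)  # equal to `original` up to rounding
```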

The proposed LARE criterion is based on the sum of the two types of relative errors. There are several other ways of combining the two types of errors. For example, one might consider the maximum of the two, as in Ye (2007), in which case a theory can be developed in an analogous fashion; see more discussion in Section 6. The minimization of LARE_n(β) can be carried out by conventional numerical tools, such as the Newton-Raphson method, or by programming methods similar to those for LAD regression, which are now standard practice.
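As a small illustration of the minimization, consider the intercept-only model Y_i = exp(b) ε_i, where the criterion is a convex function of a single parameter. The sketch below uses golden-section search, a simple substitute chosen here for a one-dimensional problem; it is not the Newton-Raphson or programming approach mentioned above, and the function names and data are hypothetical.

```python
import math

def lare_intercept(b, Y):
    """Criterion (3) for the intercept-only model Y_i = exp(b) * eps_i."""
    pred = math.exp(b)
    return sum(abs(y - pred) / y + abs(y - pred) / pred for y in Y)

def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - r * (hi - lo)
    d = lo + r * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:               # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - r * (hi - lo)
            fc = f(c)
        else:                     # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + r * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)

# Hypothetical responses whose logarithms (0.5 and 1.5) are symmetric about 1.
Y = [math.exp(0.5), math.exp(1.5)]
logs = [math.log(y) for y in Y]
b_hat = golden_section_min(lambda b: lare_intercept(b, Y),
                           min(logs) - 1.0, max(logs) + 1.0)
```

Because the logarithms of the two responses are symmetric about 1, the criterion is symmetric about b = 1 and the search returns a value close to 1.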

3. ASYMPTOTIC PROPERTIES

Some notation is needed. Throughout the paper, ∥·∥ is the Euclidean norm and I(·) is the indicator function. For simplicity of presentation, we introduce the notation (X, Y, ε) and assume that (X_i, Y_i, ε_i), i ≥ 1, are independent and identically distributed (i.i.d.) copies of (X, Y, ε), where X_i and ε_i are independent. Let β₀ be the true value of β. The following assumptions are needed for the consistency and asymptotic normality of the LARE estimator.

Assumption 1

ε has a continuous density f(·) in a neighborhood of 1.

Assumption 2

P (ε > 0) = 1.

Assumption 3

X is bounded, i.e., P (∥X∥ ≤ K) = 1 for some 0 < K < ∞, and does not concentrate on any hyperplane of dimension p – 1.

Assumption 4

E(ε + ε⁻¹) < ∞ and E[(ε + ε⁻¹)sgn(ε – 1)] = 0.

Assumption 5

E{(ε + ε⁻¹)²} < ∞.

Assumptions 1-3 are regularity conditions. In Assumption 4, the first moment condition E(ε + ε⁻¹) < ∞ ensures the weak consistency of the LARE estimator. The condition E[(ε + ε⁻¹)sgn(ε – 1)] = 0 is only an identifiability condition, which plays the same role as the assumptions of zero mean and zero median for the LS and LAD methods, respectively, for linear regression. In fact, as shown in Lemma 2 in the Appendix, if ε is nondegenerate and satisfies E(ε + ε⁻¹) < ∞, then there exists a unique scale transformation ε_a = a · ε such that $E [(ε_{a} + ε_{a}^{- 1}) sgn (ε_{a} - 1)] = 0$ . This implies that the condition ensures the identifiability of the intercept component of the parameter β in model (2). Assumption 5 ensures the asymptotic normality of the LARE estimator, similar to the finite second moment assumption for the LS estimator in linear regression.
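The scale normalization in Assumption 4 has a simple sample analogue. Writing ξ = log ε and s = −log a, the condition becomes ϕ(s) = 0 with ϕ as in Lemma 2, and since ϕ is decreasing, the sign change can be located by bisection. The following sketch works on a hypothetical sample of errors; the function names and the data are assumptions made for illustration only.

```python
import math

def phi(s, xi):
    """Sample analogue of phi in Lemma 2, with xi_i = log(eps_i)."""
    total = 0.0
    for x in xi:
        sign = (x > s) - (x < s)
        total += (math.exp(x - s) + math.exp(s - x)) * sign
    return total / len(xi)

def normalizing_shift(xi, tol=1e-10):
    """Bisection for the sign change of the decreasing function phi(., xi)."""
    lo, hi = min(xi), max(xi)     # phi(lo) >= 0 >= phi(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid, xi) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical errors: 3 times a sample whose logs are symmetric about 0.
eps = [1.5, 2.4, 3.75, 6.0]
xi = [math.log(e) for e in eps]
shift = normalizing_shift(xi)
scale = math.exp(-shift)          # the factor a with eps_a = a * eps
normalized = [scale * e for e in eps]
```

For this sample the normalizing factor is close to 1/3, which maps the errors back to 0.5, 0.8, 1.25 and 2.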

Remark 2

The first moment condition E(ε + ε⁻¹) < ∞ ensures consistency and the second moment condition E{(ε+ ε⁻¹)²} < ∞ ensures the asymptotic normality of the LARE estimator, while the RLS estimator in Park and Stefanski (1998) requires second moment condition E(ε⁻²) < ∞ for consistency.

Remark 3

These technical conditions may not be the weakest possible ones. They are imposed to facilitate the proofs. Some conditions could be relaxed for general limit theory. Knight (1998) gave a general limit theory for LAD estimation. Correspondingly, we could follow those steps to construct more general limit theory. This leaves space for future research.

Assumption 3 implies that $\sum_{i = 1}^{n} X_{i} X_{i}^{⊺}$ is positive definite almost surely. By Lemma 1 in the Appendix, LARE_n(β) is strictly convex in β under Assumption 3. Therefore, the minimizer of LARE_n(β), denoted as ${\hat{β}}_{n}$ , exists and is unique almost surely. The following theorem establishes the consistency and asymptotic normality for ${\hat{β}}_{n}$ .

Theorem 1

Suppose Assumptions 1-4 hold. Then, ${\hat{β}}_{n}$ converges to β₀ in probability as n → ∞. If, in addition to Assumptions 1-4, Assumption 5 holds, then as n→ ∞ ,

\sqrt{n} ({\hat{β}}_{n} - β_{0}) \overset{D}{\to} N (0, \frac{1}{4} \{J + 2 f (1)\}^{- 2} A V^{- 1}),

where $\overset{D}{\to}$ denotes convergence in distribution, A = E{(ε + ε⁻¹)²}, J = E{εsgn(ε – 1)} and V = E(XX^T).
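For the simulation design of Section 4, where X = (1, X_1, X_2)^T with independent standard normal X_1 and X_2 so that V is the identity matrix, the constants in Theorem 1 can be evaluated in closed form for two of the error laws. The sketch below is an illustrative calculation rather than the authors' code; the function name and the closed-form moments are worked out here under those assumptions.

```python
import math

def lare_asymptotic_se(A, J, f1, n):
    """sqrt of (1/4) {J + 2 f(1)}^{-2} A / n, assuming V is the identity."""
    return math.sqrt(0.25 * A / (J + 2.0 * f1) ** 2 / n)

n = 200

# log(eps) ~ N(0, 1), i.e. eps is standard lognormal.
A_norm = 2.0 * math.exp(2.0) + 2.0                        # E(eps^2) + 2 + E(eps^-2)
J_norm = math.exp(0.5) * math.erf(1.0 / math.sqrt(2.0))   # E{eps sgn(eps - 1)}
f1_norm = 1.0 / math.sqrt(2.0 * math.pi)                  # density of eps at 1
se_norm = lare_asymptotic_se(A_norm, J_norm, f1_norm, n)

# log(eps) ~ Uniform(-2, 2).
A_unif = math.sinh(4.0) / 2.0 + 2.0                       # E(eps^2) + 2 + E(eps^-2)
J_unif = (math.cosh(2.0) - 1.0) / 2.0                     # E{eps sgn(eps - 1)}
f1_unif = 0.25                                            # density of eps at 1
se_unif = lare_asymptotic_se(A_unif, J_unif, f1_unif, n)
```

With n = 200 this gives values of roughly 0.075 and 0.074, in line with the LARE row of Table 4-2.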

Remark 4

Note that

2 E {ε I (ε > 1)} > E {(ε + ε^{- 1}) I (ε > 1)} = E {(ε + ε^{- 1}) I (ε \leq 1)} > 2 E {ε I (ε \leq 1)}

under Assumptions 1 and 4, which ensures J > 0. So the positivity of the density of the error in a neighborhood of 1 is not required here. It is different from the LAD estimation for linear regression models, where the positivity of the density of the error in a neighborhood of zero is essential to ensure the asymptotic normality.

Unlike the least squares estimator, the asymptotic covariance matrix involves the density function of the error terms and cannot be properly estimated using the plug-in rules. To avoid density estimation, we propose a distributional approximation based on random weighting method by externally generating i.i.d. random variables. Let w₁,…, w_n be a sequence of i.i.d. nonnegative random variables, with mean and variance both equal to 1. For instance, the standard exponential distribution has mean and variance equal to 1. Define

{LARE}_{n}^{⋆} (β) \equiv \sum_{i = 1}^{n} w_{i} \{∣ \frac{Y_{i} - \exp (X_{i}^{⊺} β)}{Y_{i}} ∣ + ∣ \frac{Y_{i} - \exp (X_{i}^{⊺} β)}{\exp (X_{i}^{⊺} β)} ∣\},

and ${\hat{β}}_{n}^{⋆} = \arg \min_{β \in B_{0}} {LARE}_{n}^{⋆} (β)$ . The distribution of $\sqrt{n} ({\hat{β}}_{n} - β_{0})$ can be approximated by the resampling distribution of $\sqrt{n} ({\hat{β}}_{n}^{⋆} - {\hat{β}}_{n})$ . Let $L^{⋆}$ denote the conditional distribution given {(Y_i, X_i), i = 1,…, n}.

Proposition 1

Suppose Assumptions 1-5 hold. Then as n→ ∞,

L^{⋆} (\sqrt{n} ({\hat{β}}_{n}^{⋆} - {\hat{β}}_{n})) \overset{D}{\to} N (0, \frac{1}{4} \{J + 2 f (1)\}^{- 2} A V^{- 1}),

which is the asymptotic distribution of $\sqrt{n} ({\hat{β}}_{n} - β_{0})$ , where J, A and V are given in Theorem 1.

The proof of Proposition 1 is similar to the proof of Theorem 1 in Chen et al (2008) and is omitted here. The inference procedure via resampling is as follows. First, nonnegative i.i.d. random weights {w₁,…, w_n} with mean one and variance one are generated M times, where M is a large number. Each time, ${\hat{β}}_{n}^{⋆}$ is computed; denote the resulting values by b₁, …, b_M. Then the distribution of $\sqrt{n} ({\hat{β}}_{n}^{⋆} - {\hat{β}}_{n})$ is approximated by the empirical distribution of $\{\sqrt{n} (b_{i} - {\hat{β}}_{n}), i = 1, \dots, M\}$ .
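The resampling steps above can be sketched for the intercept-only model Y_i = exp(b) ε_i. This is a simplified illustration under stated assumptions: standard exponential weights, a golden-section minimizer in place of a general-purpose optimizer, simulated data, and hypothetical function names. The standard error is taken as the sample standard deviation of b_1, …, b_M.

```python
import math
import random

def weighted_lare(b, Y, w):
    """Randomly weighted LARE criterion for the intercept-only model."""
    pred = math.exp(b)
    return sum(wi * (abs(y - pred) / y + abs(y - pred) / pred)
               for y, wi in zip(Y, w))

def argmin_convex(f, lo, hi, tol=1e-8):
    """Golden-section search; valid because the criterion is convex in b."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - r * (hi - lo), lo + r * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - r * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + r * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)

def random_weighting_se(Y, M=200, seed=0):
    """Return (b_hat, resampling standard error) from M draws of the weights."""
    rng = random.Random(seed)
    logs = [math.log(y) for y in Y]
    lo, hi = min(logs), max(logs)          # the minimizer lies in this range
    ones = [1.0] * len(Y)
    b_hat = argmin_convex(lambda b: weighted_lare(b, Y, ones), lo, hi)
    draws = []
    for _ in range(M):
        w = [rng.expovariate(1.0) for _ in Y]   # mean 1 and variance 1
        draws.append(argmin_convex(lambda b: weighted_lare(b, Y, w), lo, hi))
    mean = sum(draws) / M
    se = math.sqrt(sum((b - mean) ** 2 for b in draws) / (M - 1))
    return b_hat, se

# Simulated data: log(eps) ~ Uniform(-2, 2), true intercept 1.
data_rng = random.Random(1)
Y = [math.exp(1.0 + data_rng.uniform(-2.0, 2.0)) for _ in range(100)]
b_hat, se = random_weighting_se(Y)
```

A normal-approximation 95% confidence interval is then b_hat ± 1.96 se; quantiles of the draws could be used instead.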

It is known that the variance of an efficient estimator attains the Cramér-Rao lower bound. The least squares estimator and the least absolute deviation estimator are efficient when the error terms follow the normal distribution and the double exponential distribution, respectively. In the following, we give the error distribution under which the LARE estimator is efficient.

Proposition 2

Suppose Assumption 3 holds. If the error ε has a density function as follows:

f (x) = c \exp (- ∣ 1 - x ∣ - ∣ 1 - x^{- 1} ∣ - \log x) I (x > 0),

where c is a normalizing constant, then the estimator ${\hat{β}}_{n}$ is efficient.

Remark 5

If a random variable X is distributed with density f(x) in Proposition 2, then 1/X is equal in distribution to X.
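Remark 5 follows from a change of variables: the density of 1/X at y is f(1/y)/y², and for the density in Proposition 2 this equals f(y). The short check below verifies the identity numerically at a few points, using the unnormalized density (the constant c cancels); the function names are hypothetical.

```python
import math

def g(x):
    """Unnormalized density from Proposition 2 (the constant c is omitted)."""
    if x <= 0:
        return 0.0
    return math.exp(-abs(1.0 - x) - abs(1.0 - 1.0 / x) - math.log(x))

def g_of_reciprocal(y):
    """Unnormalized density of 1/X at y, by the change of variables x = 1/y."""
    return g(1.0 / y) / y ** 2

points = [0.1, 0.5, 1.0, 2.0, 7.5]
max_gap = max(abs(g(y) - g_of_reciprocal(y)) for y in points)  # rounding error only
```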

4. SIMULATION STUDIES

Simulation studies are conducted to compare the finite sample efficiency of the least squares (LS), the least absolute deviation (LAD), the relative least squares (RLS) in which the predictor is the best mean squared relative error predictor of Y given X and our proposed least absolute relative errors (LARE) estimator. The studies are based on the model

Y_{i} = \exp (β_{0} + β_{1} X_{1_{i}} + β_{2} X_{2_{i}}) ε_{i}, i = 1, \dots, n,

(4)

where X_{1_i} and X_{2_i} are two independent random variables following the standard normal distribution N(0, 1), and β₀, β₁ and β₂ are the regression parameters. We consider three error distributions: ε follows the distribution with which the LARE estimator is efficient; log(ε) follows Uniform(−2, 2); and log(ε) follows N(0, 1). The sample size n is 200. The variance inference is based on the random weighting and the resampling size N is 500. The simulation results are based on 1000 replications.
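Data from model (4) can be generated as follows for the two error laws specified through log(ε). This is a sketch of the simulation design, not the authors' code; the function name and the seed are arbitrary, and the third error law (the density of Proposition 2) is omitted because sampling from it requires an extra step not described in the text.

```python
import math
import random

def simulate_model4(n, beta, error="normal", seed=0):
    """Draw (x1, x2, y) triples from model (4)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if error == "normal":
            log_eps = rng.gauss(0.0, 1.0)        # log(eps) ~ N(0, 1)
        elif error == "uniform":
            log_eps = rng.uniform(-2.0, 2.0)     # log(eps) ~ Uniform(-2, 2)
        else:
            raise ValueError("error must be 'normal' or 'uniform'")
        y = math.exp(beta[0] + beta[1] * x1 + beta[2] * x2 + log_eps)
        rows.append((x1, x2, y))
    return rows

# One replication with n = 200 and beta = (1, 1, 1), as in Table 4-1.
data = simulate_model4(200, (1.0, 1.0, 1.0), error="uniform", seed=42)
```

Both error laws are symmetric on the log scale, so the identifiability condition in Assumption 4 holds without rescaling.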

We obtain the LS and LAD estimators by minimizing $\sum_{i = 1}^{n} {(\log Y_{i} - β_{0} - β_{1} X_{1_{i}} - β_{2} X_{2_{i}})}^{2}$ and $\sum_{i = 1}^{n} ∣ \log Y_{i} - β_{0} - β_{1} X_{1_{i}} - β_{2} X_{2_{i}} ∣$ , respectively. We obtain the RLS estimators by minimizing $\sum_{i = 1}^{n} [\{Y_{i} - g_{i}^{*} (X)\} ∕ Y_{i}]^{2}$ , where $g_{i}^{*} (X) = E (Y_{i}^{- 1} ∣ X_{1_{i}}, X_{2_{i}}) ∕ E (Y_{i}^{- 2} ∣ X_{1_{i}}, X_{2_{i}})$ for model (4) is the best mean squared relative error predictor proposed in Park and Stefanski (1998).

In Table 4-1, we present the bias of the estimates ${\hat{β}}_{n}$ (BIAS), the empirical standard error (SE), the average of the estimated standard errors (SEE) and the coverage probabilities (CP) of 95% confidence intervals based on the resampling. Table 4-2 shows the asymptotic standard error of ${\hat{β}}_{n}$ .

Table 4-1.

Comparison among various approaches with β = (1, 1, 1)^T

		ε ~ f(·)^†			log(ε) ~ Unif(−2,2)			log(ε) ~ N(0, 1)

		${\hat{β}}_{0}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$	${\hat{β}}_{0}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$	${\hat{β}}_{0}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$
LARE	BIAS	0.001	0.002	0.001	0.001	0.002	0.000	0.004	0.004	0.002
	SE	0.032	0.033	0.034	0.077	0.075	0.073	0.076	0.073	0.076
	SEE	0.033	0.034	0.034	0.075	0.075	0.075	0.073	0.072	0.072
	CP	0.945	0.944	0.951	0.944	0.943	0.959	0.926	0.928	0.931

LS	BIAS	0.001	0.002	0.001	0.001	0.002	0.000	0.004	0.003	0.002
	SE	0.035	0.035	0.037	0.083	0.081	0.078	0.071	0.069	0.072
	SEE	0.035	0.035	0.035	0.081	0.080	0.080	0.070	0.069	0.070
	CP	0.945	0.952	0.926	0.948	0.937	0.951	0.950	0.939	0.935

LAD	BIAS	0.001	0.002	0.001	0.001	0.004	0.001	0.004	0.003	0.001
	SE	0.033	0.034	0.034	0.143	0.140	0.135	0.090	0.085	0.090
	SEE	0.036	0.038	0.038	0.145	0.144	0.144	0.093	0.094	0.094
	CP	0.938	0.915	0.921	0.897	0.868	0.888	0.917	0.907	0.906

RLS	BIAS	0.145	0.010	0.003	0.268	0.004	0.002	0.071	0.001	0.003
	SE	1.231	0.269	0.180	1.663	0.253	0.243	1.414	0.286	0.283
	SEE	0.692	0.144	0.143	0.818	0.173	0.167	0.822	0.216	0.215
	CP	0.854	0.889	0.872	0.925	0.921	0.925	0.660	0.708	0.751

† Note: f(x) = c exp(−∣1 − x∣ − ∣1 − x⁻¹∣ − log x)I(x > 0).

Table 4-2.

Asymptotic standard errors for estimators of β

	ε ~ f(·)^†			log(ε) ~ Unif(−2,2)			log(ε) ~ N(0, 1)
LARE	0.030	0.030	0.030	0.074	0.074	0.074	0.075	0.075	0.075
LS	0.035	0.035	0.035	0.082	0.082	0.082	0.071	0.071	0.071
LAD	0.031	0.031	0.031	0.141	0.141	0.141	0.089	0.089	0.089

† Note: f(x) = c exp(−∣1 − x∣ − ∣1 − x⁻¹∣ − log x)I(x > 0).

The main findings can be summarized as follows:

When ε follows the efficient distribution, LARE is slightly better than LS and LAD and much better than RLS in terms of accuracy and stability of the estimation of the regression parameters.
When log(ε) follows the uniform distribution, LARE performs considerably better than LS, LAD and RLS.
When log(ε) follows the normal distribution, LS is theoretically efficient for linear regression models. Tables 4-1 and 4-2 show that LARE does well, with results comparable to those of LS.
For the error distributions considered in our simulation, Tables 4-1 and 4-2 show that the SE, SEE and asymptotic standard error of the LARE estimator are generally close.

Further simulation shows that LARE is not reliable when log(ε) follows the double exponential distribution. This is not surprising, because Assumption 4 is not satisfied in this case: for the Double Exponential(0, 1) distribution, for example, E(ε) is infinite. Apart from such cases, our proposed method performs well in practical settings.

5. APPLICATIONS

The dataset to be analyzed was obtained from Reuters 3000 Xtra, a major tool used by financial and investment analysts worldwide. The dataset contains the monthly closing stock prices of 408 firms on the Hong Kong Stock Exchange from 2007 to 2008, together with their corresponding Book Value Per Share (BVPS) and Earnings Per Share (EPS). The P/B ratio is the price-to-book ratio, a financial ratio that compares the book value of a company to its current market price. The P/E ratio is the price-to-earnings ratio, a financial ratio that measures the price paid for a share relative to the annual income or profit per share earned by the firm.

Let P_{C_i} and P_{N_i} be the current price and the price for a fixed period of time later for i = 1,…, n, respectively. The sample size n here is 408. We consider the following model:

P_{N_{i}} = P_{C_{i}} \exp (β_{0} + β_{1} {PE}_{i} + β_{2} {PB}_{i}) ε_{i}, i = 1, \dots, n,

(5)

where PE_i and PB_i are the P/E ratio and P/B ratio corresponding to the current price P_{C_i}.
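Dividing (5) by P_{C_i} and taking logarithms gives an equivalent form of the model, which makes explicit that the quantity being modeled is the log gross return over the holding period. This is only a restatement of (5) and involves no additional assumptions:

```latex
\log\!\left(\frac{P_{N_i}}{P_{C_i}}\right)
  = \beta_0 + \beta_1\,\mathrm{PE}_i + \beta_2\,\mathrm{PB}_i + \log \varepsilon_i,
  \qquad i = 1, \dots, n.
```

On this scale LS regresses the log return on the two ratios, while LARE works with the relative errors of the gross return P_{N_i}/P_{C_i} and its predictor.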

The purpose of this study is to analyze the stock returns by using LARE and LS to estimate β = (β₀, β₁, β₂) in model (5). Table 5-1 presents the estimator $\hat{β}$ for β where P_{C_i} are the monthly close prices of 2007 and P_{N_i} are the corresponding monthly close prices one year later in model (5). Table 5-2 shows summary statistics of ${\hat{β}}_{0}, {\hat{β}}_{1}$ and ${\hat{β}}_{2}$ .

Table 5-1.

Comparison of regression coefficients: LARE vs LS

	LARE			LS

	${\hat{β}}_{0}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$	${\hat{β}}_{0}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$
JAN	1.0549	−0.0002	−0.0171	1.0840	0.0001	−0.0210
FEB	1.1976	0.0004	−0.0222	1.2244	0.0002	−0.0241
MAR	1.3198	−0.0009	−0.0168	1.3243	−0.0005	−0.0227
APR	0.9157	−0.0012	−0.0085	0.8961	−0.0007	−0.0092
MAY	0.6336	−0.0009	−0.0067	0.6291	−0.0006	−0.0069
JUN	0.6461	−0.0007	−0.0078	0.6330	−0.0004	−0.0071
JUL	0.4676	−0.0007	−0.0061	0.4478	−0.0005	−0.0056
AUG	0.2313	−0.0003	−0.0053	0.2838	−0.0004	−0.0048
SEP	0.0623	−0.0002	−0.0039	0.0844	−0.0002	−0.0031
OCT	0.0106	−0.0000	−0.0040	0.0373	−0.0001	−0.0038
NOV	−0.1079	−0.0000	−0.0034	−0.1060	−0.0002	−0.0035
DEC	−0.1429	−0.0003	−0.0014	−0.1442	−0.0003	−0.0033

Table 5-2.

Summary statistics: LARE vs LS

							10th	90th
		Min	Max	Mean	Stdev	Median	Percentile	Percentile
LARE	${\hat{β}}_{0}$	−0.1429	1.3198	0.5241	0.5186	0.5506	−0.1079	1.1976
	${\hat{β}}_{1}$	−0.0012	0.0004	−0.0004	0.0005	−0.0003	−0.0009	0.0000
	${\hat{β}}_{2}$	−0.0222	−0.0014	−0.0086	0.0065	−0.0064	−0.0171	−0.0034

LS	${\hat{β}}_{0}$	−0.1442	1.3243	0.5328	0.5172	0.5384	−0.1060	1.2244
	${\hat{β}}_{1}$	−0.0007	0.0002	−0.0003	0.0003	−0.0004	−0.0006	0.0001
	${\hat{β}}_{2}$	−0.0241	−0.0031	−0.0096	0.0081	−0.0063	−0.0227	−0.0033

The results show that LARE and LS give similar estimates, which are statistically stable. The predictor based on LARE is financially meaningful and could give better estimates of the intrinsic value of a firm. Moreover, the proposed estimates of β₁, the coefficient of the P/E ratio in model (5), are substantially more stable than those of β₂, the coefficient of the P/B ratio.

6. CONCLUDING REMARKS

This paper proposes least absolute relative errors estimation for the multiplicative model. The main point of the paper is to advocate such a criterion, which may have broader applications in financial and economic data analysis, as shown in the real example of this paper and in Ye (2007), as well as in survival analysis and categorical data analysis. Heuristically, in survival analysis, less accuracy in terms of absolute error may be required for predicting longer lifetimes; and, in categorical data analysis, a category with a larger percentage of observations may require more accuracy of prediction in terms of absolute error. Such considerations share the rationale of using relative error rather than absolute error. Our future work will consider extensions of the method to censored data and categorical data.

The least absolute relative error criterion that we adopt in (3) is not necessarily the unique choice. There are variations such as

{LARE}_{n}^{'} (β) \equiv \sum_{i = 1}^{n} \max \{∣ \frac{Y_{i} - \exp (X_{i}^{⊺} β)}{Y_{i}} ∣, ∣ \frac{Y_{i} - \exp (X_{i}^{⊺} β)}{\exp (X_{i}^{⊺} β)} ∣\},

(6)

as also considered in Ye (2007). For such variations, asymptotic theory analogous to Theorem 1 and Propositions 1 and 2 can be established without further difficulty. In this paper, we choose to present one typical criterion.
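The sum-type criterion (3) and the max-type criterion (6) can be compared on the same data. In the sketch below the data, the coefficient values and the function names are hypothetical; it illustrates the elementary bounds max ≤ sum ≤ 2 max, which hold term by term.

```python
import math

def relative_error_pair(y, pred):
    """Both absolute relative errors for one observation."""
    diff = abs(y - pred)
    return diff / y, diff / pred

def criteria(beta, X, Y):
    """Return (sum-type criterion (3), max-type criterion (6)) at beta."""
    total_sum, total_max = 0.0, 0.0
    for x, y in zip(X, Y):
        pred = math.exp(sum(b * xj for b, xj in zip(beta, x)))
        to_target, to_pred = relative_error_pair(y, pred)
        total_sum += to_target + to_pred
        total_max += max(to_target, to_pred)
    return total_sum, total_max

# Hypothetical toy data; each row of X starts with 1 for the intercept.
X = [(1.0, 0.2), (1.0, -0.5), (1.0, 1.3), (1.0, 0.7)]
Y = [2.1, 0.9, 6.5, 3.0]
lare_sum, lare_max = criteria((0.8, 0.9), X, Y)
```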

For completeness, we give the main results for the estimator under such variations here, without proof. The assumptions parallel Assumptions 1-5 in Section 3. Similar to Lemma 1 in the Appendix, one can prove that ${LARE}_{n}^{'} (β)$ is strictly convex in β under Assumption 3. Therefore, there exists a unique ${\hat{β}}_{n}^{'}$ that minimizes ${LARE}_{n}^{'} (β)$ almost surely. In addition to Assumptions 1-3, the following assumptions are needed for the consistency and asymptotic normality of ${\hat{β}}_{n}^{'}$ , the minimizer of ${LARE}_{n}^{'} (β)$ .

Assumption 6

E(ε + ε⁻¹) < ∞ and E{ε⁻¹I(ε ≤ 1) – εI(ε > 1)} = 0.

Assumption 7

E{ε²I(ε > 1) + ε⁻²I(ε ≤ 1)} < ∞.

Assumptions 6-7 play the same role as Assumptions 4-5 in Section 3. The condition E{ε⁻¹I(ε ≤ 1) – εI(ε > 1)} = 0, like E[(ε + ε⁻¹)sgn(ε – 1)] = 0 in Section 3, is only an identifiability condition.

Proposition 3

Suppose Assumptions 1-3 and Assumption 6 hold. Then, ${\hat{β}}_{n}^{'}$ converges to β₀ in probability as n → ∞. If, in addition to Assumptions 1-3 and Assumption 6, Assumption 7 holds, then as n→ ∞,

\sqrt{n} ({\hat{β}}_{n}^{'} - β_{0}) \overset{D}{\to} N (0, \frac{1}{4} \{K + f (1)\}^{- 2} B V^{- 1}),

where B = E{ε²I(ε > 1) + ε⁻²I(ε ≤ 1)}, K = E{εI(ε > 1)} and V = E(XX^T).

Plot of four densities.

density1: f(x) = c exp(−∣1 – x∣ – ∣1 – x⁻¹∣ − log x)I(x > 0).

density2: the density of ε where log(ε) ~ N(0, 1).

density3: the density of ε where log(ε) ~ Double Exponential(0, 1).

density4: the density of ε where log(ε) ~ Uniform(−2, 2).

Acknowledgments

The authors are grateful to two anonymous referees, the Associate Editor and the Editor for comments and suggestions that lead to substantial improvements in the paper.

APPENDIX. PROOFS

We state two lemmas that will be used later.

Lemma 1

Let ψ(x, a) = ∣1 – a⁻¹e^x∣ + ∣1 − ae^{−x}∣ for a > 0 and x ∈ R. Then, for fixed a > 0, ψ(x, a) is a strictly convex function in x ∈ R.

The proof is omitted.
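One way to see why the lemma holds: with t = x − log a, the two terms combine to ψ(x, a) = ∣1 − e^t∣ + ∣1 − e^{−t}∣ = 2 sinh∣t∣, which is even in t and strictly convex and increasing on t ≥ 0. The check below is a numerical illustration of this identity and of midpoint convexity on a grid; it is not a proof, and the value of a and the grid are arbitrary choices.

```python
import math

def psi(x, a):
    """psi(x, a) from Lemma 1."""
    return abs(1.0 - math.exp(x) / a) + abs(1.0 - a * math.exp(-x))

def psi_closed_form(x, a):
    """Equivalent form 2 * sinh|x - log a|."""
    return 2.0 * math.sinh(abs(x - math.log(a)))

a = 2.5
grid = [-3.0 + 0.25 * k for k in range(25)]   # points from -3 to 3

# Largest discrepancy between the two expressions on the grid.
identity_gap = max(abs(psi(x, a) - psi_closed_form(x, a)) for x in grid)

# Strict midpoint convexity for every pair of distinct grid points.
midpoint_ok = all(
    psi(0.5 * (x + z), a) < 0.5 * (psi(x, a) + psi(z, a))
    for x in grid for z in grid if x != z
)
```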

Lemma 2

Suppose that ξ* is nondegenerate and E{exp(ξ*) + exp(−ξ*)} < ∞. Let ϕ(a) = E[{exp(ξ* – a) + exp(a − ξ*)}sgn(ξ* − a)] and a* = max{a : ϕ(a) ≥ 0}. If ϕ(a) is continuous at a*, then there exists a unique constant a ∈ R such that ϕ(a) = 0.

Proof

Observe the following inequality

\begin{aligned}
& \{\exp (x - b) + \exp (b - x)\} sgn (x - b) - \{\exp (x - a) + \exp (a - x)\} sgn (x - a) \\
& \quad \leq \int_{a}^{b} \{- \exp (x - y) + \exp (- x + y)\} sgn (x - y) dy,
\end{aligned}

(A.1)

for any x, a and b ∈ R with a < b. Then,

\begin{aligned}
ϕ (b) - ϕ (a) & \leq E [\int_{a}^{b} \{- \exp (ξ^{*} - y) + \exp (- ξ^{*} + y)\} sgn (ξ^{*} - y) dy] \\
& = \int_{a}^{b} E [\{- \exp (ξ^{*} - y) + \exp (- ξ^{*} + y)\} sgn (ξ^{*} - y)] dy .
\end{aligned}

(A.2)

It is easy to show that {exp(−x + y) − exp(x − y)}sgn(x − y) < 0 for x ≠ y. It follows that

E [{- \exp (ξ^{*} - y) + \exp (- ξ^{*} + y)} sgn (ξ^{*} - y)] < 0,

which implies that ϕ(b) − ϕ(a) < 0. Thus, ϕ(·) is strictly decreasing. On the other hand, it is seen from the expression ϕ(·) that

ϕ (a) \to - \infty \text{ as } a \to \infty \quad \text{and} \quad ϕ (a) \to \infty \text{ as } a \to - \infty .

Together with the continuity of ϕ(·) at a*, there exists a unique solution to ϕ(a) = 0. The proof is complete.

A.1. Proof of Theorem 1

The proof will be done in several steps.

Step 1

To prove consistency, denote

ψ_{n} (β) \equiv \sum_{i = 1}^{n} [∣ 1 - ε_{i}^{- 1} \exp \{X_{i}^{⊺} (β - β_{0})\} ∣ + ∣ 1 - ε_{i} \exp \{- X_{i}^{⊺} (β - β_{0})\} ∣] .

It follows from the Convexity Lemma in Pollard (1991, p. 187) and the convexity of ψ_n(β) by Lemma 1 that, for any compact set $B$ ,

\sup_{β \in B} \frac{1}{n} ∣ ψ_{n} (β) - E \{ψ_{n} (β)\} ∣ \to 0

(A.3)

in probability as n → ∞. Then,

\begin{aligned}
& E \{ψ_{n} (β) - ψ_{n} (β_{0})\} \\
& = \sum_{i = 1}^{n} E [∣ 1 - ε_{i}^{- 1} \exp \{X_{i}^{⊺} (β - β_{0})\} ∣ + ∣ 1 - ε_{i} \exp \{- X_{i}^{⊺} (β - β_{0})\} ∣ - ∣ 1 - ε_{i}^{- 1} ∣ - ∣ 1 - ε_{i} ∣] \\
& = \sum_{i = 1}^{n} E ((ε_{i} + ε_{i}^{- 1}) sgn (1 - ε_{i}) [\exp \{X_{i}^{⊺} (β - β_{0})\} - 1]) \\
& \quad + \sum_{i = 1}^{n} E (ε_{i} sgn (ε_{i} - 1) [\exp \{X_{i}^{⊺} (β - β_{0})\} + \exp \{- X_{i}^{⊺} (β - β_{0})\} - 2]) \\
& \quad + 2 \sum_{i = 1}^{n} E (\{I (ε_{i} \leq \exp \{X_{i}^{⊺} (β - β_{0})\}) - I (ε_{i} \leq 1)\} [ε_{i}^{- 1} \exp \{X_{i}^{⊺} (β - β_{0})\} - ε_{i} \exp \{- X_{i}^{⊺} (β - β_{0})\}]) .
\end{aligned}

(A.4)

By Assumption 4, the first term in the summand is 0. It follows from Assumptions 1 and 4 that,

2 E {ε I (ε > 1)} > E {(ε + ε^{- 1}) I (ε > 1)} = E {(ε + ε^{- 1}) I (ε \leq 1)} > 2 E {ε I (ε \leq 1)},

(A.5)

which implies J = E{εsgn(ε − 1)} > 0. This result leads to the fact that the second term in (A.4) is nonnegative. It is easy to check that the third term in (A.4) is also nonnegative. Hence, E{ψ_n(β) − ψ_n(β₀)} ≥ 0 for all β. Furthermore, E{ψ_n(β) − ψ_n(β₀)} = 0 ensures

\sum_{i = 1}^{n} E (ε_{i} sgn (ε_{i} - 1) [\exp \{X_{i}^{⊺} (β - β_{0})\} + \exp \{- X_{i}^{⊺} (β - β_{0})\} - 2]) = 0 .

As β = β₀ is the unique minimizer of $\exp \{X_{i}^{⊺} (β - β_{0})\} + \exp \{- X_{i}^{⊺} (β - β_{0})\}$ , it follows from Assumption 3 and E{εsgn(ε − 1)} > 0 that β = β₀ is the unique minimizer of E{ψ_n(β) − ψ_n(β₀)}. Denote ψ(β) = n⁻¹E{ψ_n(β)}. Then, for every δ > 0, there exists η > 0 such that ψ(β) > ψ(β₀) + η for ∥β − β₀∥ ≥ δ. For any constants δ and C, let ${\hat{β}}_{n}^{*}$ be the minimizer of ψ_n(β) over δ ≤ ∥β − β₀∥ ≤ C. Then by (A.3), $ψ_{n} ({\hat{β}}_{n}^{*}) \to ψ ({\hat{β}}_{n}^{*})$ in probability as n → ∞ and $ψ ({\hat{β}}_{n}^{*}) > ψ (β_{0}) + η$ for some η > 0. On the other hand, for any constant δ,

\inf_{‖ β - β_{0} ‖ \leq δ} ψ_{n} (β) \leq ψ_{n} (β_{0}) \to ψ (β_{0})

in probability by (A.3). Therefore, with probability going to 1, the minimum of ψ_n(β) in ∥β − β₀∥ ≤ C is achieved inside ∥β − β₀∥ ≤ δ. Since ψ_n(β) is strictly convex, the local minimizer inside ∥β − β₀∥ ≤ δ is the unique global minimizer. By the definition of ${\hat{β}}_{n}$ , $P ({\hat{β}}_{n} \in \{β : ‖ β - β_{0} ‖ \leq δ\}) \to 1$ as n → ∞. Thus, the weak consistency of ${\hat{β}}_{n}$ is proved by letting δ → 0.

Step 2

To prove asymptotic normality, we first approximate E{ψ_n(β) − ψ_n(β₀)} for every fixed β in a neighborhood of β₀. Observe that exp(x) + exp(−x) − 2 = x² + O(∣x∣³) as x approaches zero. By the Taylor expansion,

\begin{aligned}
& \frac{1}{n} E \{ψ_{n} (β) - ψ_{n} (β_{0})\} \\
& = J \cdot \frac{1}{n} \sum_{i = 1}^{n} E [\exp \{- X_{i}^{⊺} (β - β_{0})\} + \exp \{X_{i}^{⊺} (β - β_{0})\} - 2] \\
& \quad + 2 f (1) \frac{1}{n} \sum_{i = 1}^{n} E \{(β - β_{0})^{⊺} X_{i} X_{i}^{⊺} (β - β_{0})\} + O (‖ β - β_{0} ‖^{3}) \\
& = J (β - β_{0})^{⊺} V (β - β_{0}) + 2 f (1) (β - β_{0})^{⊺} V (β - β_{0}) + O (‖ β - β_{0} ‖^{3}) \\
& = \{J + 2 f (1)\} (β - β_{0})^{⊺} V (β - β_{0}) + O (‖ β - β_{0} ‖^{3}),
\end{aligned}

(A.6)

where J = E{εsgn(ε − 1)} and V = E(XX^T).

Step 3

Write $W_{n} = \sum_{i = 1}^{n} (ε_{i} + ε_{i}^{- 1}) sgn (ε_{i} - 1) X_{i}$ . We are now in position to show

\sup_{‖ β - β_{0} ‖ \leq C n^{- 1 ∕ 2}} ∣ ψ_{n} (β) - ψ_{n} (β_{0}) + W_{n}^{⊺} (β - β_{0}) - E \{ψ_{n} (β) - ψ_{n} (β_{0})\} ∣ \to 0

(A.7)

in probability as n → ∞, for each positive constant C. To this end, let $θ = \sqrt{n} (β - β_{0})$ , it is equivalent to show

\sup_{‖ θ ‖ \leq C} ∣ ψ_{n} (β_{0} + \frac{θ}{\sqrt{n}}) - ψ_{n} (β_{0}) + \frac{1}{\sqrt{n}} W_{n}^{⊺} θ - E {ψ_{n} (β_{0} + \frac{θ}{\sqrt{n}}) - ψ_{n} (β_{0})} ∣ \to 0

(A.8)

in probability as n → ∞. In order to establish (A.8), we shall first show that, for each fixed θ,

ψ_{n} (β_{0} + \frac{θ}{\sqrt{n}}) - ψ_{n} (β_{0}) + \frac{1}{\sqrt{n}} W_{n}^{⊺} θ - E {ψ_{n} (β_{0} + \frac{θ}{\sqrt{n}}) - ψ_{n} (β_{0})} \to 0

(A.9)

in probability as n → ∞. Analogous to (A.4), denote

G_{i} (β) \equiv ε_{i} sgn (ε_{i} - 1) [\exp {X_{i}^{⊺} (β - β_{0})} + \exp {- X_{i}^{⊺} (β - β_{0})} - 2]

and

\begin{matrix} R_{i} (β) \equiv [I (ε_{i} > \exp {X_{i}^{⊺} (β - β_{0})}) & - I (ε_{i} > 1)] \\ [ε_{i} \exp {- X_{i}^{⊺} (β - β_{0})} - ε_{i}^{- 1} \exp {X_{i}^{⊺} (β - β_{0})}] . \end{matrix}

Then,

\begin{matrix} ψ_{n} (β) - ψ_{n} (β_{0}) - E {ψ_{n} (β) - ψ_{n} (β_{0})} \\ = & - \sum_{i = 1}^{n} (ε_{i} + ε_{i}^{- 1}) sgn (ε_{i} - 1) [\exp {X_{i}^{⊺} (β - β_{0})} - 1] \\ + \sum_{i = 1}^{n} {G_{i} (β) - {EG}_{i} (β)} + 2 \sum_{i = 1}^{n} {R_{i} (β) - {ER}_{i} (β)} . \end{matrix}
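The decomposition above appears to rest on a pointwise identity: for each observation, the change in the loss equals the linear term plus G_i(β) plus 2R_i(β), and centering then gives the displayed form. The sketch below checks that identity numerically for scalar u = X_i^⊺(β − β₀); the function names and the sampling ranges are illustrative assumptions.

```python
import math
import random


def sgn(v):
    """Sign function returning -1, 0, or 1."""
    return (v > 0) - (v < 0)


def loss_difference(eps, u):
    """Per-observation psi(beta) - psi(beta0), with u = x'(beta - beta0)."""
    at_beta = abs(1.0 - eps * math.exp(-u)) + abs(1.0 - math.exp(u) / eps)
    at_beta0 = abs(1.0 - eps) + abs(1.0 - 1.0 / eps)
    return at_beta - at_beta0


def decomposition(eps, u):
    """Linear term + G_i + 2 * R_i, following the definitions in the text."""
    s = sgn(eps - 1.0)
    linear = -(eps + 1.0 / eps) * s * (math.exp(u) - 1.0)
    g = eps * s * (math.exp(u) + math.exp(-u) - 2.0)
    indicator = (eps > math.exp(u)) - (eps > 1.0)
    r = indicator * (eps * math.exp(-u) - math.exp(u) / eps)
    return linear + g + 2.0 * r


rng = random.Random(1)
max_gap = 0.0
for _ in range(10000):
    eps = math.exp(rng.uniform(-2.0, 2.0))  # positive error, arbitrary range
    u = rng.uniform(-1.0, 1.0)
    gap = abs(loss_difference(eps, u) - decomposition(eps, u))
    max_gap = max(max_gap, gap)
print(f"largest discrepancy: {max_gap:.2e}")
```

The discrepancy should be at the level of floating-point rounding, which is consistent with the two sides agreeing exactly away from the ties ε = 1 and ε = exp(u).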

For each fixed θ,

\begin{matrix} \sum_{i = 1}^{n} E {[G_{i} (β_{0} + \frac{θ}{\sqrt{n}}) - E {G_{i} (β_{0} + \frac{θ}{\sqrt{n}})}]}^{2} \\ \leq & \sum_{i = 1}^{n} E {ε_{i} sgn (ε_{i} - 1)}^{2} E {\exp (- \frac{1}{\sqrt{n}} X_{i}^{⊺} θ) + \exp (\frac{1}{\sqrt{n}} X_{i}^{⊺} θ) - 2}^{2} \\ = & \sum_{i = 1}^{n} E {ε_{i} sgn (ε_{i} - 1)}^{2} E {(\frac{1}{n} θ^{⊺} X_{i} X_{i}^{⊺} θ + a_{i})}^{2}, say \\ \to & 0 \end{matrix}

(A.10)

as n → ∞, where P(∥a_i∥ ≤ cn^{−3/2}) = 1 for some constant c and i = 1,…, n. It then follows that

\sum_{i = 1}^{n} [G_{i} (β_{0} + \frac{θ}{\sqrt{n}}) - E {G_{i} (β_{0} + \frac{θ}{\sqrt{n}})}] \to 0

(A.11)

in probability as n → ∞. On the other hand, by the Taylor expansion, for each fixed θ,

\begin{matrix} E {ε \exp (- \frac{1}{\sqrt{n}} X^{⊺} θ) - ε^{- 1} \exp (\frac{1}{\sqrt{n}} X^{⊺} θ)}^{2} \\ = & E {(- ε \frac{1}{\sqrt{n}} X^{⊺} θ - ε^{- 1} \frac{1}{\sqrt{n}} X^{⊺} θ + ε - ε^{- 1} + b)}^{2} \\ = & E {- (ε - 1) \frac{1}{\sqrt{n}} X^{⊺} θ - (ε^{- 1} - 1) \frac{1}{\sqrt{n}} X^{⊺} θ - \frac{2}{\sqrt{n}} X^{⊺} θ + (ε - 1) - (ε^{- 1} - 1) + b}^{2} \\ \leq & E [2 {{(ε - 1)}^{2} + {(ε^{- 1} - 1)}^{2} + 4} \frac{1}{n} θ^{⊺} {XX}^{⊺} θ + 2 {(ε - 1)}^{2} + 2 {(ε^{- 1} - 1)}^{2} + b^{2}], say \end{matrix}

where P(∥b∥ ≤ cn⁻¹) = 1 for some constant c. Hence, an argument similar to (A.10) leads to

\begin{matrix} \sum_{i = 1}^{n} E {[R_{i} (β_{0} + \frac{θ}{\sqrt{n}}) - E {R_{i} (β_{0} + \frac{θ}{\sqrt{n}})}]}^{2} \\ \leq & \sum_{i = 1}^{n} E {{I (\frac{1}{\sqrt{n}} X_{i}^{⊺} θ > 0) I (0 < \log ε_{i} \leq \frac{1}{\sqrt{n}} X_{i}^{⊺} θ) \\ + I (\frac{1}{\sqrt{n}} X_{i}^{⊺} θ \leq 0) I (0 \geq \log ε_{i} > \frac{1}{\sqrt{n}} X_{i}^{⊺} θ)} \\ [2 {{(ε_{i} - 1)}^{2} + {(ε_{i}^{- 1} - 1)}^{2} + 4} \frac{1}{n} θ^{⊺} X_{i} X_{i}^{⊺} θ + 2 {(ε_{i} - 1)}^{2} + 2 {(ε_{i}^{- 1} - 1)}^{2} + b_{i}^{2}], say \\ \to & 0 \end{matrix}

as n → ∞, where P(∥b_i∥ ≤ cn⁻¹) = 1 for some constant c and i = 1,…, n. Thus, for each fixed θ,

\sum_{i = 1}^{n} [R_{i} (β_{0} + \frac{θ}{\sqrt{n}}) - E {R_{i} (β_{0} + \frac{θ}{\sqrt{n}})}] \to 0

(A.12)

in probability as n → ∞. Combining (A.11) and (A.12), together with Assumption 4, we have shown (A.9).

Next, $ψ_{n} (β_{0} + θ ∕ \sqrt{n}) - ψ_{n} (β_{0}) + W_{n}^{⊺} θ ∕ \sqrt{n}$ is convex by Lemma 1. It follows from (A.9) and the Convexity Lemma in Pollard (1991, p. 187) that, for each constant C > 0,

\sup_{‖ θ ‖ \leq C} ∣ ψ_{n} (β_{0} + \frac{θ}{\sqrt{n}}) - ψ_{n} (β_{0}) + \frac{1}{\sqrt{n}} W_{n}^{⊺} θ - E {ψ_{n} (β_{0} + \frac{θ}{\sqrt{n}}) - ψ_{n} (β_{0})} ∣ \to 0

in probability. Then (A.7) is proved.

Step 4

Let $ξ_{n} (β) = ψ_{n} (β) - ψ_{n} (β_{0}) - n {J + 2 f (1)} {(β - β_{0})}^{⊺} V (β - β_{0}) + W_{n}^{⊺} (β - β_{0})$. Combining Step 2 and Step 3, we have

\sup_{‖ β - β_{0} ‖ \leq {Cn}^{- 1 ∕ 2}} ∣ ξ_{n} (β) ∣ \to 0

(A.13)

in probability as n → ∞ for each constant C > 0. Let ${\hat{β}}_{n}^{*}$ be the minimizer of $n {J + 2 f (1)} {(β - β_{0})}^{⊺} V (β - β_{0}) - W_{n}^{⊺} (β - β_{0})$. Clearly ${\hat{β}}_{n}^{*} - β_{0} = {J + 2 f (1)}^{- 1} V^{- 1} W_{n} ∕ (2 n)$. By the definition of W_n, for every δ > 0, there exist constants K_δ > 0 and N_δ such that $P (‖ {\hat{β}}_{n}^{*} - β_{0} ‖ > K_{δ} n^{- 1 ∕ 2}) \leq δ ∕ 2$ for any n ≥ N_δ. In view of (A.13), for every η > 0, there exists some constant N_η such that, for any n ≥ N_η,

P (\sup_{‖ β - β_{0} ‖ \leq K_{δ} n^{- 1 ∕ 2}} ∣ ξ_{n} (β) ∣ > η) \leq δ ∕ 2 .

Hence, for every δ, η > 0, there exists N = max{N_δ, N_η} such that, for any n ≥ N,

\begin{matrix} P (∣ ξ_{n} ({\hat{β}}_{n}^{*}) ∣ > η) & = P (∣ ξ_{n} ({\hat{β}}_{n}^{*}) ∣ > η, ‖ {\hat{β}}_{n}^{*} - β_{0} ‖ > K_{δ} n^{- 1 ∕ 2}) \\ + P (∣ ξ_{n} ({\hat{β}}_{n}^{*}) ∣ > η, ‖ {\hat{β}}_{n}^{*} - β_{0} ‖ \leq K_{δ} n^{- 1 ∕ 2}) \\ \leq P (‖ {\hat{β}}_{n}^{*} - β_{0} ‖ > K_{δ} n^{- 1 ∕ 2}) + P (\sup_{‖ β - β_{0} ‖ \leq K_{δ} n^{- 1 ∕ 2}} ∣ ξ_{n} (β) ∣ > η) \\ \leq δ, \end{matrix}

which implies $ξ_{n} ({\hat{β}}_{n}^{*}) = o_{p} (1)$ . Similar arguments also lead to

\sup_{‖ β - {\hat{β}}_{n}^{*} ‖ \leq {Cn}^{- 1 ∕ 2}} ∣ ξ_{n} (β) ∣ = o_{p} (1)

for each constant C > 0.

Observe that

\begin{matrix} ψ_{n} (β) - ψ_{n} (β_{0}) = n {J + 2 f (1)} {(β - {\hat{β}}_{n}^{*})}^{⊺} & V (β - {\hat{β}}_{n}^{*}) \\ - \frac{1}{4 n} {J + 2 f (1)}^{- 1} W_{n}^{⊺} V^{- 1} W_{n} + ξ_{n} (β) . \end{matrix}
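For readers who want the intermediate algebra, the display above follows from the definition of ξ_n(β) by completing the square. In the sketch below, a = J + 2f(1) and Δ = β − β₀ are shorthand introduced only for this derivation, and V is taken to be symmetric and invertible.

```latex
\begin{aligned}
n a\, Δ^{⊺} V Δ - W_{n}^{⊺} Δ
&= n a \Bigl(Δ - \tfrac{1}{2 n a} V^{-1} W_{n}\Bigr)^{⊺} V \Bigl(Δ - \tfrac{1}{2 n a} V^{-1} W_{n}\Bigr)
   - \tfrac{1}{4 n a} W_{n}^{⊺} V^{-1} W_{n} \\
&= n a\, (β - \hat{β}_{n}^{*})^{⊺} V (β - \hat{β}_{n}^{*})
   - \tfrac{1}{4 n} a^{-1} W_{n}^{⊺} V^{-1} W_{n},
\end{aligned}
```

where the second line uses $\hat{β}_{n}^{*} - β_{0} = a^{-1} V^{-1} W_{n} ∕ (2 n)$. Adding ξ_n(β) to both sides recovers the stated expression for ψ_n(β) − ψ_n(β₀).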

For any constants c and C with 0 < c < C < ∞,

\begin{matrix} \inf_{{cn}^{- 1 ∕ 2} \leq ‖ β - {\hat{β}}_{n}^{*} ‖ \leq {Cn}^{- 1 ∕ 2}} {ψ_{n} (β) - ψ_{n} (β_{0})} \\ \geq & \inf_{{cn}^{- 1 ∕ 2} \leq ‖ β - {\hat{β}}_{n}^{*} ‖ \leq {Cn}^{- 1 ∕ 2}} [n {J + 2 f (1)} {(β - {\hat{β}}_{n}^{*})}^{⊺} V (β - {\hat{β}}_{n}^{*}) \\ - \frac{1}{4 n} {J + 2 f (1)}^{- 1} W_{n}^{⊺} V^{- 1} W_{n}] - \sup_{{cn}^{- 1 ∕ 2} \leq ‖ β - {\hat{β}}_{n}^{*} ‖ \leq {Cn}^{- 1 ∕ 2}} ∣ ξ_{n} (β) ∣ \\ \geq & {J + 2 f (1)} c^{2} λ - \frac{1}{4 n} {J + 2 f (1)}^{- 1} W_{n}^{⊺} V^{- 1} W_{n} + o_{p} (1), \end{matrix}

(A.14)

where λ is the smallest eigenvalue of V. On the other hand, for any constant c,

\begin{matrix} \inf_{‖ β - {\hat{β}}_{n}^{*} ‖ \leq {cn}^{- 1 ∕ 2}} {ψ_{n} (β) - ψ_{n} (β_{0})} & \leq ψ_{n} ({\hat{β}}_{n}^{*}) - ψ_{n} (β_{0}) \\ = - \frac{1}{4 n} {J + 2 f (1)}^{- 1} W_{n}^{⊺} V^{- 1} W_{n} + ξ_{n} ({\hat{β}}_{n}^{*}) \\ = - \frac{1}{4 n} {J + 2 f (1)}^{- 1} W_{n}^{⊺} V^{- 1} W_{n} + o_{p} (1) . \end{matrix}

(A.15)

Together, (A.14) and (A.15) imply that, with probability going to 1, the minimum of ψ_n(β) − ψ_n(β₀) in $‖ β - {\hat{β}}_{n}^{*} ‖ \leq {Cn}^{- 1 ∕ 2}$ is achieved inside $‖ β - {\hat{β}}_{n}^{*} ‖ \leq {cn}^{- 1 ∕ 2}$. Since ψ_n(β) − ψ_n(β₀) is convex, the local minimizer inside $‖ β - {\hat{β}}_{n}^{*} ‖ \leq {cn}^{- 1 ∕ 2}$ is the global minimizer. Thus,

\begin{matrix} {\hat{β}}_{n} - β_{0} & = {\hat{β}}_{n}^{*} - β_{0} + o_{p} (n^{- 1 ∕ 2}) \\ = \frac{1}{2 \sqrt{n}} {J + 2 f (1)}^{- 1} V^{- 1} \frac{1}{\sqrt{n}} W_{n} + o_{p} (n^{- 1 ∕ 2}) . \end{matrix}

Hence, as n → ∞,

\sqrt{n} ({\hat{β}}_{n} - β_{0}) \overset{D}{\to} N (0, \frac{1}{4} {J + 2 f (1)}^{- 2} A V^{- 1}),

A.2. Proof of Proposition 2

For the given density of ε, the density of Y_i given X_i is

f_{Y_{i} ∣ X_{i}} (y) = c \exp {- ∣ \frac{\exp (X_{i}^{⊺} β) - y}{\exp (X_{i}^{⊺} β)} ∣ - ∣ \frac{y - \exp (X_{i}^{⊺} β)}{y} ∣ - \log y} .

Then, the likelihood function of Y is

L (β) = c^{n} \exp [- \sum_{i = 1}^{n} {∣ \frac{\exp (X_{i}^{⊺} β) - Y_{i}}{\exp (X_{i}^{⊺} β)} ∣ + ∣ \frac{Y_{i} - \exp (X_{i}^{⊺} β)}{Y_{i}} ∣ + \log Y_{i}}] .

Maximizing this likelihood function is equivalent to minimizing our proposed LARE criterion

\sum_{i = 1}^{n} {∣ \frac{\exp (X_{i}^{⊺} β) - Y_{i}}{\exp (X_{i}^{⊺} β)} ∣ + ∣ \frac{Y_{i} - \exp (X_{i}^{⊺} β)}{Y_{i}} ∣} .

Therefore the estimator ${\hat{β}}_{n}$ , which minimizes LARE_n(β), is efficient when ε ~ f(·) = c exp(−∣1 − x∣ − ∣1 − x⁻¹∣ − log x)I(x > 0). The proof is complete.
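The equivalence between the likelihood and the LARE criterion can be illustrated numerically: under the stated density, the negative log-likelihood and LARE_n(β) should differ only by a quantity that does not depend on β. The sketch below uses a scalar covariate, arbitrary positive responses, and a placeholder value for log c (the normalizing constant is not needed for the comparison); all function names are ours.

```python
import math
import random


def lare(beta, xs, ys):
    """LARE criterion for a scalar coefficient."""
    total = 0.0
    for x, y in zip(xs, ys):
        fit = math.exp(x * beta)
        total += abs(fit - y) / fit + abs(y - fit) / y
    return total


def neg_log_lik(beta, xs, ys, log_c=0.0):
    """Negative log-likelihood under the density of Proposition 2.

    log_c is a placeholder for log(c); it only shifts the result by a constant.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        eps = y * math.exp(-x * beta)  # implied multiplicative error
        log_f_eps = log_c - abs(1.0 - eps) - abs(1.0 - 1.0 / eps) - math.log(eps)
        # Change of variables: f_Y(y) = f_eps(eps) * exp(-x * beta).
        total -= log_f_eps - x * beta
    return total


rng = random.Random(2)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
ys = [math.exp(rng.uniform(-1.0, 1.0)) for _ in range(50)]  # arbitrary positive data
gaps = [neg_log_lik(b, xs, ys) - lare(b, xs, ys) for b in (-1.0, 0.0, 0.5, 2.0)]
print(gaps)
```

With log_c set to 0, each printed gap should equal the sum of log Y_i up to rounding, so the two objectives share the same minimizer in β.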

Contributor Information

Kani CHEN, Department of Mathematics, HKUST, Kowloon, Hong Kong, China (makchen@ust.hk).

Shaojun GUO, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R.China (guoshaoj@amss.ac.cn).

Yuanyuan LIN, Department of Mathematics, HKUST, Kowloon, Hong Kong, China (linyy@ust.hk).

Zhiliang YING, Department of Statistics, 618 Mathematics, Columbia University, New York NY 10027 (zying@stat.columbia.edu).

References

Chen K, Ying Z, Zhang H, Zhao L. Analysis of least absolute deviation. Biometrika. 2008;95:107–122.
Gauss CF. Theoria Motus Corporum Coelestium. Perthes; Hamburg: 1809. Translation reprinted as Theory of the Motions of the Heavenly Bodies Moving about the Sun in Conic Sections. Dover; New York: 1963.
Khoshgoftaar TM, Bhattacharyya BB, Richardson GD. Predicting software errors, during development, using nonlinear regression models: a comparative study. IEEE Transactions on Reliability. 1992;41:390–395.
Knight K. Limiting distribution for L1 regression estimators under general conditions. The Annals of Statistics. 1998;26:755–770.
Makridakis S, Andersen A, Carbone R, Fildes R, Hibon M, Lewandowski R, Newton J, Parzen E, Winkler R. The Forecasting Accuracy of Major Time Series Methods. Wiley; New York: 1984.
Narula SC, Wellington JF. Prediction, linear regression and the minimum sum of relative errors. Technometrics. 1977;19:185–190.
Park H, Stefanski LA. Relative-error prediction. Statistics and Probability Letters. 1998;40:227–236.
Pollard D. Asymptotics for least absolute deviations regression estimators. Econometric Theory. 1991;7:186–199.
Portnoy S, Koenker R. The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators (with discussion). Statistical Science. 1997;12:279–300.
Stigler SM. Gauss and the Invention of Least Squares. The Annals of Statistics. 1981;9:465–474.
Ye J. Price models and the value relevance of accounting information. Technical report. 2007.

