Journal of Applied Statistics. 2020 May 3;48(13–15):2348–2368. doi: 10.1080/02664763.2020.1761951

Minimizing the expected value of the asymmetric loss function and an inequality for the variance of the loss

Naoya Yamaguchi, Yuka Yamaguchi, Ryuei Nishii
PMCID: PMC9041947  PMID: 35707067

ABSTRACT

Regression coefficients are usually estimated by solving minimization problems with asymmetric loss functions. In this paper, we instead correct predictions whose error follows a generalized Gaussian distribution. Our method not only minimizes the expected value of the asymmetric loss but also lowers the variance of the loss. Predictions inevitably contain errors, so they should be used with those errors taken into account; our approach does exactly that. Furthermore, even if we do not understand the prediction method (a common circumstance in, e.g. deep learning), we can use our method as long as we know the prediction error distribution and the asymmetric loss function. Our method can be applied to the procurement of electricity from electricity markets.

Keywords: Asymmetric loss function, gamma function, generalized Gaussian distribution, minimizing the expectation value, risk function

2010 Mathematics Subject Classifications: 62A99, 62E99

1. Introduction

In this paper, we treat minimization problems with loss functions as follows. Let $\{(x_i, y_i)\}_{1 \le i \le n}$ be a dataset, where the $x_i$ are $1 \times p$ explanatory vectors and the $y_i \in \mathbb{R}$ are target variables. We assume the following linear regression model:

$$y = X\beta + \varepsilon,$$

where $y = {}^t(y_1, \dots, y_n)$, $\varepsilon = {}^t(\varepsilon_1, \dots, \varepsilon_n)$, and $X$ is an $n \times p$ design matrix having $x_i$ as its $i$th row. Let $L$ be the loss function for the parameter estimation. The parameter vector $\beta$ can be estimated by

$$\hat{\beta} := \arg\min_{\beta} \sum_{i=1}^{n} L(\varepsilon_i).$$

The case of $L(\varepsilon) = \varepsilon^2$ is known as the quadratic loss function, which is symmetric (see, e.g. Refs. [1,10,12]). For asymmetric loss functions, we refer the reader to, e.g. Refs. [3,5,9,16]. Consider the case in which prediction accuracy is assessed by a risk function based on an asymmetric loss function. Let $y$ be an observed value and $\hat{y}$ a predicted value of $y$. Suppose that the prediction error $\hat{y} - y$ follows a generalized Gaussian distribution. Then, the error can be corrected by adding a constant $C$ to the prediction, i.e. $\hat{y} + C$, with $C$ determined by minimizing the risk function. We make the following assumptions:

  (I) The prediction error $z := \hat{y} - y$ is the realized value of a random variable $Z$ whose density function is a generalized Gaussian distribution function (see, e.g. Refs. [4,11,13]) with mean zero:
    $$f_Z(z) := \frac{1}{2ab\Gamma(a)}\exp\left(-\left|\frac{z}{b}\right|^{1/a}\right),$$
    where $\Gamma(a)$ is the gamma function and $a, b \in \mathbb{R}_{>0}$.
  (II) Let $k_1, k_2 \in \mathbb{R}_{>0}$. If there is a mismatch between $y$ and $\hat{y}$, we suffer a loss
    $$L(z) := \begin{cases} k_1 z, & z \ge 0, \\ -k_2 z, & z < 0. \end{cases}$$

Here, $f_Z(z)$ includes the Gaussian distribution $N(0, \frac{1}{2}b^2)$ and the Laplace distribution $\mathrm{Laplace}(0, b)$ with mean zero as special cases (see, e.g. [7, p. 80; 8, p. 164]). That is, $f_Z(z)$ is the probability density function of a Gaussian distribution when $a = \frac{1}{2}$, and of a Laplace distribution when $a = 1$.
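These two special cases can be verified numerically; the following is a minimal sketch, assuming NumPy and SciPy are available (the value of $b$ is illustrative):

```python
# Check that f_Z reduces to N(0, b^2/2) when a = 1/2 and to Laplace(0, b)
# when a = 1.  Sketch; assumes NumPy and SciPy.
import numpy as np
from scipy.special import gamma
from scipy.stats import laplace, norm

def gen_gauss_pdf(z, a, b):
    """Generalized Gaussian density f_Z(z) = exp(-|z/b|^(1/a)) / (2 a b Gamma(a))."""
    return np.exp(-np.abs(z / b) ** (1.0 / a)) / (2 * a * b * gamma(a))

b = 1.3
z = np.linspace(-4.0, 4.0, 201)
# a = 1/2: Gaussian with variance b^2/2, i.e. scale b/sqrt(2)
assert np.allclose(gen_gauss_pdf(z, 0.5, b), norm.pdf(z, scale=b / np.sqrt(2)))
# a = 1: Laplace with scale b
assert np.allclose(gen_gauss_pdf(z, 1.0, b), laplace.pdf(z, scale=b))
```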

Under the assumptions (I) and (II), we will derive the optimized predicted value $y^* = \hat{y} + C$ minimizing the expected value of the loss (risk). That is, we will solve the following risk minimization problem:

$$C = \arg\min_{c} E[L(Z+c)].$$

In our method, we not only minimize the expected value of the asymmetric loss, but also lower the variance of the loss:

$$V[L(Z+C)] \le V[L(Z)].$$

The variance of the loss is reduced by correcting the predicted value $\hat{y}$ to the optimized predicted value $y^* = \hat{y} + C$.

The motivation of our research is as follows: (1) Suppose we have a risk minimization problem based on an asymmetric loss function. Then, we can reduce the risk by correcting the predictors. For example, the paper [15] formulates a method for minimizing the expected value of the procurement cost of electricity in two popular spot markets, day-ahead and intra-day, under the assumption that the expected values of the unit prices and the distributions of the prediction errors for the electricity demand traded in the two markets are known. That paper showed that increasing or decreasing the procurement relative to the prediction can, in some cases, reduce the expected value of the procurement cost. Our method is a generalization of the method in [15].

(2) In recent years, prediction methods have become black boxes built on big data and machine learning (see, e.g. Ref. [6]). The day will soon come when we must minimize an objective function by using predictions obtained by such black-box methods. In our method, even if we do not know how the prediction $\hat{y}$ was obtained, we can determine the parameter $C$ as long as we know the prediction error distribution $f_Z$ and the asymmetric loss function $L$.

The remainder of this paper is organized as follows: In Section 2, we introduce the expected value and the variance of L(Z+c) and determine the value c = C that gives the minimum value of E[L(Z+c)]. In addition, we give a geometrical interpretation of the parameter C and give the minimized expected value E[L(Z+C)]. In Section 3, we prove the inequality for the variance of the loss. In Section 4, we report the results of simulations of our method using actual data.

2. Expected value and variance of the loss

In the following, we assume that (I) and (II) hold. Here, we introduce the expected value and the variance of L(Z+c) and determine the value c=C that gives the minimum value of E[L(Z+c)]. In addition, we give a geometrical interpretation of the parameter C and give the minimized expected value E[L(Z+C)].

2.1. Expected value and variance of the loss

Let $y$ be an observed value and $\hat{y}$ a predicted value of $y$. Let $\Gamma(a)$ be the gamma function, and let $\Gamma(a,x)$ and $\gamma(a,x)$ be the upper and the lower incomplete gamma functions, respectively (see, e.g. Refs. [2, p. 197; 4, p. 2; 14, p. 93]), defined by

$$\Gamma(a) := \int_0^{+\infty} t^{a-1}e^{-t}\,dt, \quad \Gamma(a,x) := \int_x^{+\infty} t^{a-1}e^{-t}\,dt, \quad \gamma(a,x) := \int_0^{x} t^{a-1}e^{-t}\,dt,$$

where $\operatorname{Re}(a) > 0$ and $x \ge 0$. Also, for $c \in \mathbb{R}$, let $\operatorname{sgn}(c) := 1$ $(c \ge 0)$; $-1$ $(c < 0)$. Then, the expected value and the variance of $L(Z+c)$ are as follows (see Appendix 1 for the proof): for any $c \in \mathbb{R}$, we have

$$E[L(Z+c)] = \frac{(k_1-k_2)c}{2} + \frac{(k_1+k_2)|c|}{2\Gamma(a)}\,\gamma\!\left(a,\left|\frac{c}{b}\right|^{1/a}\right) + \frac{(k_1+k_2)b}{2\Gamma(a)}\,\Gamma\!\left(2a,\left|\frac{c}{b}\right|^{1/a}\right), \tag{1}$$

$$\begin{aligned}
V[L(Z+c)] ={}& \frac{(k_1+k_2)^2c^2}{4} + \frac{(k_1^2-k_2^2)bc}{2\Gamma(a)}\,\Gamma\!\left(2a,\left|\frac{c}{b}\right|^{1/a}\right) - \frac{(k_1+k_2)^2b|c|}{2\Gamma(a)^2}\,\gamma\!\left(a,\left|\frac{c}{b}\right|^{1/a}\right)\Gamma\!\left(2a,\left|\frac{c}{b}\right|^{1/a}\right) \\
&- \frac{(k_1+k_2)^2c^2}{4\Gamma(a)^2}\,\gamma\!\left(a,\left|\frac{c}{b}\right|^{1/a}\right)^2 - \frac{(k_1+k_2)^2b^2}{4\Gamma(a)^2}\,\Gamma\!\left(2a,\left|\frac{c}{b}\right|^{1/a}\right)^2 \\
&+ \frac{(k_1^2+k_2^2)b^2\,\Gamma(3a)}{2\Gamma(a)} + \operatorname{sgn}(c)\,\frac{(k_1^2-k_2^2)b^2}{2\Gamma(a)}\,\gamma\!\left(3a,\left|\frac{c}{b}\right|^{1/a}\right). \tag{2}
\end{aligned}$$

From this, we have the following:

$$E[L(Z)] = \frac{(k_1+k_2)b\,\Gamma(2a)}{2\Gamma(a)}, \tag{3}$$

$$V[L(Z)] = \frac{(k_1^2+k_2^2)b^2\,\Gamma(3a)}{2\Gamma(a)} - \frac{(k_1+k_2)^2b^2\,\Gamma(2a)^2}{4\Gamma(a)^2}. \tag{4}$$

Let $\operatorname{erf}(x)$ be the error function (see, e.g. Ref. [2, p. 196]) defined by

$$\operatorname{erf}(x) := \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$$

for any $x \in \mathbb{R}$. We give two examples of $E[L(Z+c)]$ and $V[L(Z+c)]$. In the case of $Z \sim \mathrm{Laplace}(0,b)$, since $a = 1$, we have

$$E[L(Z+c)] = L(c) + \frac{(k_1+k_2)b}{2}\exp\left(-\frac{|c|}{b}\right),$$

$$V[L(Z+c)] = \left(k_1^2+k_2^2+\operatorname{sgn}(c)\left(k_1^2-k_2^2\right)\right)b^2 - \left((k_1+k_2)L(c) + \operatorname{sgn}(c)\left(k_1^2-k_2^2\right)b\right)b\exp\left(-\frac{|c|}{b}\right) - \frac{(k_1+k_2)^2b^2}{4}\exp\left(-\frac{2|c|}{b}\right).$$

To derive these expressions, we used the equations $\gamma(1,x) = 1-e^{-x}$, $\Gamma(2,x) = (1+x)e^{-x}$, and $\gamma(3,x) = 2-(2+2x+x^2)e^{-x}$, which are easily obtained from the definitions of the incomplete gamma functions. In the case of $Z \sim N(0,\frac{1}{2}b^2)$, since $a = \frac{1}{2}$, we have

$$E[L(Z+c)] = \frac{(k_1-k_2)c}{2} + \frac{(k_1+k_2)c}{2}\operatorname{erf}\!\left(\frac{c}{b}\right) + \frac{(k_1+k_2)b}{2\sqrt{\pi}}\exp\left(-\frac{c^2}{b^2}\right),$$

$$\begin{aligned}
V[L(Z+c)] ={}& \frac{(k_1^2+k_2^2)b^2}{4} + \frac{(k_1+k_2)^2c^2}{4} + \frac{(k_1^2-k_2^2)b^2}{4}\operatorname{erf}\!\left(\frac{c}{b}\right) - \frac{(k_1+k_2)^2c^2}{4}\operatorname{erf}^2\!\left(\frac{c}{b}\right) \\
&- \frac{(k_1+k_2)^2bc}{2\sqrt{\pi}}\operatorname{erf}\!\left(\frac{c}{b}\right)\exp\left(-\frac{c^2}{b^2}\right) - \frac{(k_1+k_2)^2b^2}{4\pi}\exp\left(-\frac{2c^2}{b^2}\right).
\end{aligned}$$

To derive these expressions, we used the equations $\gamma(\frac{1}{2},x) = \sqrt{\pi}\operatorname{erf}(\sqrt{x})$, $\Gamma(1,x) = e^{-x}$, and $\gamma(\frac{3}{2},x) = \frac{\sqrt{\pi}}{2}\operatorname{erf}(\sqrt{x}) - \sqrt{x}\,e^{-x}$, which are easily obtained from the definitions of the incomplete gamma functions. For $k_1 = 50$ and $b = 1$, we can plot $E[L(Z)]$ and $V[L(Z)]$ for the Laplace and the Gaussian distributions with respect to $k_2$ as follows:
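The Gaussian-case closed form for $E[L(Z+c)]$ can be checked by direct numerical integration of $L(z+c)f_Z(z)$; a sketch, assuming NumPy and SciPy, with illustrative parameter values:

```python
# Compare the a = 1/2 closed form for E[L(Z+c)] with numerical integration.
# Sketch; assumes NumPy and SciPy.
import numpy as np
from scipy.integrate import quad
from scipy.special import erf

k1, k2, b, c = 1.0, 5.0, 1.0, 0.7

def E_closed(c):
    # Closed form of E[L(Z+c)] for Z ~ N(0, b^2/2)
    return ((k1 - k2) * c / 2 + (k1 + k2) * c / 2 * erf(c / b)
            + (k1 + k2) * b / (2 * np.sqrt(np.pi)) * np.exp(-c**2 / b**2))

pdf = lambda z: np.exp(-(z / b) ** 2) / (b * np.sqrt(np.pi))  # f_Z for a = 1/2
integrand = lambda z: (k1 * (z + c) if z + c >= 0 else -k2 * (z + c)) * pdf(z)
# Split at the kink z = -c of the loss for accurate quadrature
E_num = quad(integrand, -np.inf, -c)[0] + quad(integrand, -c, np.inf)[0]
assert abs(E_num - E_closed(c)) < 1e-8
```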

In both cases, the graph of the expected value is a straight line with positive slope, and the graph of the variance is a convex quadratic curve in $k_2$ (Figures 1 and 2).

Figure 1. Plots for a Laplace distribution. (a) Expected value of $L(Z)$. (b) Variance of $L(Z)$.

Figure 2. Plots for a Gaussian distribution. (a) Expected value of $L(Z)$. (b) Variance of $L(Z)$.

2.2. Parameter value minimizing the expected value

Here, we determine the value $c = C$ that gives the minimum value of $E[L(Z+c)]$. Since

$$\frac{d}{dc}\,\Gamma\!\left(2a,\left|\frac{c}{b}\right|^{1/a}\right) = -\frac{c}{ab^2}\exp\left(-\left|\frac{c}{b}\right|^{1/a}\right); \qquad \frac{d}{dc}\,\gamma\!\left(a,\left|\frac{c}{b}\right|^{1/a}\right) = \operatorname{sgn}(c)\,\frac{1}{ab}\exp\left(-\left|\frac{c}{b}\right|^{1/a}\right),$$

we have

$$\frac{d}{dc}E[L(Z+c)] = \frac{k_1-k_2}{2} + \operatorname{sgn}(c)\,\frac{k_1+k_2}{2\Gamma(a)}\,\gamma\!\left(a,\left|\frac{c}{b}\right|^{1/a}\right).$$

We will denote the value of c satisfying (d/dc)E[L(Z+c)]=0 as C. Then, from the first derivative test, we find that E[L(Z+c)] has a minimum value at c = C.

c                    Less than C            C    More than C
(d/dc) E[L(Z+c)]     Negative               0    Positive
E[L(Z+c)]            Strictly decreasing         Strictly increasing

Also, it follows from

$$\gamma\!\left(a,\left|\frac{C}{b}\right|^{1/a}\right) = \operatorname{sgn}(C)\,\frac{k_2-k_1}{k_1+k_2}\,\Gamma(a) \tag{5}$$

that $\operatorname{sgn}(C) = \operatorname{sgn}(k_2-k_1)$, and $C = 0$ only when $k_1 = k_2$. Equation (5) presents us with a geometrical interpretation of $C$ as follows: the ratio of $\Gamma(a)$ to $\gamma(a, |C/b|^{1/a})$ is $1 : \frac{|k_2-k_1|}{k_1+k_2}$. That is, the vertical line $t = |C/b|^{1/a}$ divides the area between $t^{a-1}e^{-t}$ and the $t$-axis into $\frac{|k_2-k_1|}{k_1+k_2} : 1 - \frac{|k_2-k_1|}{k_1+k_2}$.

Let $\operatorname{erf}^{-1}(x)$ be the inverse error function. We give two examples of $C$. In the case of $Z \sim \mathrm{Laplace}(0,b)$, since $a = 1$, from $\gamma(1,x) = 1-e^{-x}$, we have

$$C = -\operatorname{sgn}(k_2-k_1)\,b\log\left(1 - \frac{|k_2-k_1|}{k_1+k_2}\right).$$

In the case of $Z \sim N(0,\frac{1}{2}b^2)$, since $a = \frac{1}{2}$, from $\gamma(\frac{1}{2},x) = \sqrt{\pi}\operatorname{erf}(\sqrt{x})$, we have

$$C = \operatorname{sgn}(k_2-k_1)\,b\,\operatorname{erf}^{-1}\!\left(\frac{|k_2-k_1|}{k_1+k_2}\right). \tag{6}$$
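Both closed forms for $C$ can be cross-checked against a direct numerical minimization of $E[L(Z+c)]$. A sketch, assuming NumPy and SciPy; the values $k_1 = 1$, $k_2 = 5$, $b = 1$ are illustrative:

```python
# Closed-form C for the Laplace (a = 1) and Gaussian (a = 1/2) cases,
# checked against numerical minimization of the risk.  Sketch; assumes
# NumPy and SciPy.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.special import erfinv

k1, k2, b = 1.0, 5.0, 1.0
r = abs(k2 - k1) / (k1 + k2)                       # |k2 - k1| / (k1 + k2)

C_laplace = -np.sign(k2 - k1) * b * np.log(1 - r)  # a = 1 case
C_gauss = np.sign(k2 - k1) * b * erfinv(r)         # a = 1/2 case, Equation (6)

def risk(c, pdf):
    """E[L(Z + c)] by numerical integration against the density pdf."""
    integrand = lambda z: (k1 * (z + c) if z + c >= 0 else -k2 * (z + c)) * pdf(z)
    # Split at the kink z = -c of the loss
    return quad(integrand, -np.inf, -c)[0] + quad(integrand, -c, np.inf)[0]

pdf_gauss = lambda z: np.exp(-(z / b) ** 2) / (b * np.sqrt(np.pi))  # N(0, b^2/2)
C_hat = minimize_scalar(lambda c: risk(c, pdf_gauss), bounds=(-5, 5), method="bounded").x
assert abs(C_hat - C_gauss) < 1e-4
```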

Figure 3 plots $C$ for the Laplace and Gaussian distributions with respect to $k_2$ for $k_1 = 50$ and $b = 1$.

Figure 3. Plots of $C$ for Laplace and Gaussian distributions. (a) When $Z \sim \mathrm{Laplace}(0,1)$. (b) When $Z \sim N(0,\frac{1}{2})$.

2.3. Minimized expected value of the loss

Here, we derive the minimum value of $E[L(Z+c)]$. Substituting $c = C$ into Equation (1), we have, from Equation (5),

$$E[L(Z+C)] = \frac{(k_1+k_2)b}{2\Gamma(a)}\,\Gamma\!\left(2a,\left|\frac{C}{b}\right|^{1/a}\right).$$

This is the minimum value of $E[L(Z+c)]$. From this and Equation (3), we have

$$E[L(Z)] - E[L(Z+C)] = \frac{(k_1+k_2)b}{2\Gamma(a)}\,\gamma\!\left(2a,\left|\frac{C}{b}\right|^{1/a}\right), \qquad \frac{E[L(Z+C)]}{E[L(Z)]} = \frac{1}{\Gamma(2a)}\,\Gamma\!\left(2a,\left|\frac{C}{b}\right|^{1/a}\right).$$

Figure 4 plots $E[L(Z)] - E[L(Z+C)]$ for the Laplace and Gaussian distributions with respect to $k_2$ for $k_1 = 50$ and $b = 1$.

Figure 4. Plots of $E[L(Z)] - E[L(Z+C)]$ for the Laplace and Gaussian distributions. (a) When $Z \sim \mathrm{Laplace}(0,1)$. (b) When $Z \sim N(0,\frac{1}{2})$.

3. Inequality for the variance of the loss

Here, we derive an inequality for the variance of L(Z+c). Let C be the value of c giving the minimum value of E[L(Z+c)]. Then, the following holds:

Theorem 3.1

We have

$$V[L(Z+C)] \le V[L(Z)],$$

where equality holds only when C=0; that is, when k1=k2.

Proof.

Put $C^* := |C/b|^{1/a}$. It follows from Equation (5) that

$$\frac{(k_1^2-k_2^2)bC}{2\Gamma(a)}\,\Gamma(2a,C^*) = -\operatorname{sgn}(C)\,\frac{k_2-k_1}{k_1+k_2}\cdot\frac{(k_1+k_2)^2bC}{2\Gamma(a)}\,\Gamma(2a,C^*) = -\frac{(k_1+k_2)^2b|C|}{2\Gamma(a)^2}\,\gamma(a,C^*)\,\Gamma(2a,C^*),$$

$$\operatorname{sgn}(C)\,\frac{(k_1^2-k_2^2)b^2}{2\Gamma(a)}\,\gamma(3a,C^*) = -\operatorname{sgn}(C)\,\frac{k_2-k_1}{k_1+k_2}\cdot\frac{(k_1+k_2)^2b^2}{2\Gamma(a)}\,\gamma(3a,C^*) = -\frac{(k_1+k_2)^2b^2}{2\Gamma(a)^2}\,\gamma(a,C^*)\,\gamma(3a,C^*).$$

Hence, substituting $c = C$ in Equation (2), we have

$$\begin{aligned}
V[L(Z+C)] ={}& \frac{(k_1+k_2)^2C^2}{4} - \frac{(k_1+k_2)^2b|C|}{\Gamma(a)^2}\,\gamma(a,C^*)\,\Gamma(2a,C^*) - \frac{(k_1+k_2)^2C^2}{4\Gamma(a)^2}\,\gamma(a,C^*)^2 \\
&- \frac{(k_1+k_2)^2b^2}{4\Gamma(a)^2}\,\Gamma(2a,C^*)^2 + \frac{(k_1^2+k_2^2)b^2\,\Gamma(3a)}{2\Gamma(a)} - \frac{(k_1+k_2)^2b^2}{2\Gamma(a)^2}\,\gamma(a,C^*)\,\gamma(3a,C^*).
\end{aligned}$$

From this and Equation (4), we obtain

$$\begin{aligned}
V[L(Z)] - V[L(Z+C)] ={}& -\frac{(k_1+k_2)^2C^2}{4} + \frac{(k_1+k_2)^2b|C|}{\Gamma(a)^2}\,\gamma(a,C^*)\,\Gamma(2a,C^*) + \frac{(k_1+k_2)^2C^2}{4\Gamma(a)^2}\,\gamma(a,C^*)^2 \\
&+ \frac{(k_1+k_2)^2b^2}{4\Gamma(a)^2}\,\Gamma(2a,C^*)^2 - \frac{(k_1+k_2)^2b^2\,\Gamma(2a)^2}{4\Gamma(a)^2} + \frac{(k_1+k_2)^2b^2}{2\Gamma(a)^2}\,\gamma(a,C^*)\,\gamma(3a,C^*) \\
={}& \frac{(k_1+k_2)^2b^2}{4\Gamma(a)^2}\,f(a,C^*),
\end{aligned}$$

where, for $a > 0$ and $x \ge 0$, $f(a,x)$ is defined as

$$f(a,x) := x^{2a}\gamma(a,x)^2 - x^{2a}\Gamma(a)^2 + 4x^{a}\gamma(a,x)\,\Gamma(2a,x) + \Gamma(2a,x)^2 - \Gamma(2a)^2 + 2\gamma(a,x)\,\gamma(3a,x).$$

Here, since

$$\frac{d}{dx}f(a,x) = 2ax^{a-1}\left(x^a\gamma(a,x)^2 - x^a\Gamma(a)^2 + 2\gamma(a,x)\,\Gamma(2a,x)\right) + 2x^{a-1}e^{-x}\gamma(3a,x) + 2x^{2a-1}e^{-x}\,\Gamma(2a,x),$$

from Lemma A.1 in Appendix 2, we have (d/dx)f(a,x)>0 (a>0, x>0). Also, f(a,0)=0 holds for a>0. Therefore, we obtain

$$V[L(Z)] - V[L(Z+C)] \ge 0,$$

where equality holds only when C = 0. Moreover, from Equation (5), we find that C = 0 holds only when k1=k2.

Figure 5 plots $V[L(Z)] - V[L(Z+C)]$ for the Laplace and Gaussian distributions with respect to $k_2$ for $k_1 = 50$ and $b = 1$.

Figure 5. Plots of $V[L(Z)] - V[L(Z+C)]$ for the Laplace and the Gaussian distributions. (a) When $Z \sim \mathrm{Laplace}(0,1)$. (b) When $Z \sim N(0,\frac{1}{2})$.
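The inequality of Theorem 3.1 can also be illustrated by simulation. A Monte Carlo sketch for the Gaussian case ($a = \frac{1}{2}$), assuming NumPy and SciPy, with illustrative parameter values:

```python
# Monte Carlo illustration of Theorem 3.1 for a = 1/2: shifting the
# prediction by C lowers both the expected loss and its variance.
# Sketch; assumes NumPy and SciPy.
import numpy as np
from scipy.special import erfinv

rng = np.random.default_rng(0)
k1, k2, b = 1.0, 5.0, 1.0
C = np.sign(k2 - k1) * b * erfinv(abs(k2 - k1) / (k1 + k2))  # Equation (6)

Z = rng.normal(0.0, b / np.sqrt(2), size=1_000_000)  # Z ~ N(0, b^2/2)
loss = lambda z: np.where(z >= 0, k1 * z, -k2 * z)

assert loss(Z + C).mean() < loss(Z).mean()  # the risk is reduced,
assert loss(Z + C).var() < loss(Z).var()    # and so is the variance of the loss
```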

4. Simulation

The simulations of our method used actual data, in particular the 'cars' dataset of the R datasets package, which records the speeds and stopping distances of cars.

We separated the cars data into training and test datasets as follows: the odd-numbered observations were selected as the training dataset, and the even-numbered observations were the test dataset. The following figure shows the scatter plots of the training and test data; the horizontal axis represents speed, and the vertical axis represents stopping distance (Figure 6).

Figure 6. Scatter plots of the training and test data. (a) Training data. (b) Verification data.

We used the training dataset to fit the parameters of the regression model $y = ax + b + \varepsilon$ and also used it to find the solution to the minimization problem:

$$(\tilde{a}, \tilde{b}) := \arg\min_{(a,b)} \sum_{i=1}^{n} L(ax_i + b - y_i).$$

The regression coefficients of $y = ax + b$ were estimated to be $\hat{a} = 4.09$ and $\hat{b} = -18.61$ by using the least squares method, and the unbiased sample variance of the error was 308.42. We obtained a prediction $\hat{y} = \hat{a}x + \hat{b}$ and modeled the error as $z = \hat{y} - y \sim N(0, 308.42)$. Fix the conditions as $k_1 = 1$ and $k_2 = 5$. Then $C = 16.99$ from Equation (6) (with $b = \sqrt{2 \cdot 308.42} \approx 24.84$, since $N(0,\sigma^2)$ corresponds to $b^2 = 2\sigma^2$), and the estimated solution to the minimization problem is $(\tilde{a}, \tilde{b}) = (5.33, -19.33)$. Figure 7 plots (i) $\hat{y} = \hat{a}x + \hat{b}$, (ii) $\hat{y} + C$, and (iii) $\tilde{y} = \tilde{a}x + \tilde{b}$.
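The value $C = 16.99$ can be reproduced from the numbers above; a sketch, assuming NumPy and SciPy:

```python
# Reproduce C from the fitted error variance: N(0, sigma^2) is the a = 1/2
# case with b = sqrt(2) * sigma.  Sketch; assumes NumPy and SciPy.
import numpy as np
from scipy.special import erfinv

sigma2 = 308.42          # unbiased sample variance of the regression error
k1, k2 = 1.0, 5.0
b = np.sqrt(2 * sigma2)  # N(0, sigma^2) = N(0, b^2 / 2)
C = np.sign(k2 - k1) * b * erfinv(abs(k2 - k1) / (k1 + k2))  # Equation (6)
print(round(C, 2))  # prints 16.99
```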

Figure 7. Scatter plots and plots of (i), (ii), and (iii). (a) Scatter plot for the training data and plots of (i), (ii), and (iii). (b) Scatter plot for the test data and plots of (i), (ii), and (iii).

The slope of $\tilde{y}$ is steeper than the slope of $\hat{y}$. Therefore, if the model $y(x) = ax + b + \varepsilon$ with $a = \hat{a}$ and $b = \hat{b}$ is true, then the value $\tilde{y}(x)$ is far from $y(x)$ when $x$ is large. That is, the loss $L(\tilde{y}(x) - y(x))$ is large when $x$ is large. This implies that using $\tilde{y}$ as a prediction is high risk. On the other hand, the slope of $\hat{y}$ is equal to the slope of $\hat{y} + C$. Therefore, using $\hat{y} + C$ as a prediction is low risk.

Table 1 lists the sample means and sample variances of the loss for the training and test data when k1=1 and k2=5.

Table 1. Sample means and sample variances of the loss for the training and the test data when k1=1 and k2=5.

  Mean (training) Mean (test) Variance (training) Variance (test)
(i) 37.31 28.05 3337.93 896.99
(ii) 31.31 21.86 834.35 105.40
(iii) 28.48 25.56 612.18 127.50

Figure 8 plots the sample means of the loss for the training and test data for (i), (ii), and (iii) with respect to $k_2$.

Figure 8. Plots of sample means for the training and the test data. (a) Sample means for the training data. (b) Sample means for the test data.

For the test data, the plot of the loss of (iii) is in the form of steps. This is because a~ and b~ are discontinuous with respect to k2. On the other hand, the plot of the loss of (ii) is smooth. This is because C is continuous with respect to k2. Therefore, the loss of (ii) is more stable than the loss of (iii). In addition, deriving a~ and b~ is troublesome, but deriving C is easy because C is explicitly a function of k1 and k2.

Figure 9 plots the sample variances of the loss for the training and test data for (i), (ii), and (iii) with respect to $k_2$.

Figure 9. Plots of sample variances for the training and the test data. (a) Sample variances for the training data. (b) Sample variances for the test data.

Clearly, from Figures 8 and 9, the sample means and sample variances of (ii) and (iii) are lower than those of (i) for any $k_2$. This simulation shows that it is best to use (ii) as the prediction. Other simulation results are listed in Appendix 3.

Appendix 1.

Calculation of the expected value and variance of the loss

A.1. Expected value of the loss

Put $\beta := (2ab\Gamma(a))^{-1}$. Then,

$$E[L(Z+c)] = \int_{-\infty}^{+\infty} L(z+c)f_Z(z)\,dz = -k_2\beta\int_{-\infty}^{-c}(z+c)\exp\left(-\left|\frac{z}{b}\right|^{1/a}\right)dz + k_1\beta\int_{-c}^{+\infty}(z+c)\exp\left(-\left|\frac{z}{b}\right|^{1/a}\right)dz.$$

Replace $z$ with $bz$ to get

$$E[L(Z+c)] = -k_2b\beta\int_{-\infty}^{-c/b}(bz+c)\exp\left(-|z|^{1/a}\right)dz + k_1b\beta\int_{-c/b}^{+\infty}(bz+c)\exp\left(-|z|^{1/a}\right)dz.$$

When $c \ge 0$, we have

$$\begin{aligned}
E[L(Z+c)] ={}& -k_2b\beta\int_{-\infty}^{-c/b}(bz+c)\exp\left(-(-z)^{1/a}\right)dz + k_1b\beta\int_{-c/b}^{0}(bz+c)\exp\left(-(-z)^{1/a}\right)dz \\
&+ k_1b\beta\int_{0}^{+\infty}(bz+c)\exp\left(-z^{1/a}\right)dz \\
={}& k_2b\beta\int_{c/b}^{+\infty}(bz-c)\exp\left(-z^{1/a}\right)dz + k_1b\beta\int_{0}^{c/b}(-bz+c)\exp\left(-z^{1/a}\right)dz \\
&+ k_1b\beta\int_{0}^{+\infty}(bz+c)\exp\left(-z^{1/a}\right)dz \\
={}& (k_1+k_2)b^2\beta\int_{c/b}^{+\infty}z\exp\left(-z^{1/a}\right)dz + (k_1-k_2)bc\beta\int_{0}^{+\infty}\exp\left(-z^{1/a}\right)dz \\
&+ (k_1+k_2)bc\beta\int_{0}^{c/b}\exp\left(-z^{1/a}\right)dz.
\end{aligned}$$

When $c < 0$, we have

$$\begin{aligned}
E[L(Z+c)] ={}& -k_2b\beta\int_{-\infty}^{0}(bz+c)\exp\left(-(-z)^{1/a}\right)dz - k_2b\beta\int_{0}^{-c/b}(bz+c)\exp\left(-z^{1/a}\right)dz \\
&+ k_1b\beta\int_{-c/b}^{+\infty}(bz+c)\exp\left(-z^{1/a}\right)dz \\
={}& k_2b\beta\int_{0}^{+\infty}(bz-c)\exp\left(-z^{1/a}\right)dz - k_2b\beta\int_{0}^{-c/b}(bz+c)\exp\left(-z^{1/a}\right)dz \\
&+ k_1b\beta\int_{-c/b}^{+\infty}(bz+c)\exp\left(-z^{1/a}\right)dz \\
={}& (k_1+k_2)b^2\beta\int_{-c/b}^{+\infty}z\exp\left(-z^{1/a}\right)dz + (k_1-k_2)bc\beta\int_{0}^{+\infty}\exp\left(-z^{1/a}\right)dz \\
&- (k_1+k_2)bc\beta\int_{0}^{-c/b}\exp\left(-z^{1/a}\right)dz.
\end{aligned}$$

From the above, for any $c \in \mathbb{R}$, we have

$$E[L(Z+c)] = (k_1+k_2)b^2\beta\int_{|c|/b}^{+\infty}z\exp\left(-z^{1/a}\right)dz + (k_1-k_2)bc\beta\int_{0}^{+\infty}\exp\left(-z^{1/a}\right)dz + (k_1+k_2)b|c|\beta\int_{0}^{|c|/b}\exp\left(-z^{1/a}\right)dz.$$

Now set $t := z^{1/a}$ to get

$$\begin{aligned}
E[L(Z+c)] ={}& (k_1+k_2)ab^2\beta\int_{c^*}^{+\infty}t^{2a-1}e^{-t}\,dt + (k_1-k_2)abc\beta\int_{0}^{+\infty}t^{a-1}e^{-t}\,dt + (k_1+k_2)ab|c|\beta\int_{0}^{c^*}t^{a-1}e^{-t}\,dt \\
={}& (k_1+k_2)ab^2\beta\,\Gamma(2a,c^*) + (k_1-k_2)abc\beta\,\Gamma(a) + (k_1+k_2)ab|c|\beta\,\gamma(a,c^*),
\end{aligned}$$

where $c^* := |c/b|^{1/a}$. Therefore, for any $c \in \mathbb{R}$, we have

$$E[L(Z+c)] = \frac{(k_1-k_2)c}{2} + \frac{(k_1+k_2)|c|}{2\Gamma(a)}\,\gamma(a,c^*) + \frac{(k_1+k_2)b}{2\Gamma(a)}\,\Gamma(2a,c^*).$$

A.2. Variance of the loss

Put $\beta := (2ab\Gamma(a))^{-1}$. Then,

$$E[L(Z+c)^2] = \int_{-\infty}^{+\infty} L(z+c)^2 f_Z(z)\,dz = k_2^2\beta\int_{-\infty}^{-c}(z+c)^2\exp\left(-\left|\frac{z}{b}\right|^{1/a}\right)dz + k_1^2\beta\int_{-c}^{+\infty}(z+c)^2\exp\left(-\left|\frac{z}{b}\right|^{1/a}\right)dz.$$

Replace $z$ with $bz$ to get

$$E[L(Z+c)^2] = k_2^2b\beta\int_{-\infty}^{-c/b}(bz+c)^2\exp\left(-|z|^{1/a}\right)dz + k_1^2b\beta\int_{-c/b}^{+\infty}(bz+c)^2\exp\left(-|z|^{1/a}\right)dz.$$

When $c \ge 0$, we have

$$\begin{aligned}
E[L(Z+c)^2] ={}& k_2^2b\beta\int_{-\infty}^{-c/b}(bz+c)^2\exp\left(-(-z)^{1/a}\right)dz + k_1^2b\beta\int_{-c/b}^{0}(bz+c)^2\exp\left(-(-z)^{1/a}\right)dz \\
&+ k_1^2b\beta\int_{0}^{+\infty}(bz+c)^2\exp\left(-z^{1/a}\right)dz \\
={}& k_2^2b\beta\int_{c/b}^{+\infty}(bz-c)^2\exp\left(-z^{1/a}\right)dz + k_1^2b\beta\int_{0}^{c/b}(bz-c)^2\exp\left(-z^{1/a}\right)dz \\
&+ k_1^2b\beta\int_{0}^{+\infty}(bz+c)^2\exp\left(-z^{1/a}\right)dz \\
={}& (k_1^2+k_2^2)b^3\beta\int_{0}^{+\infty}z^2\exp\left(-z^{1/a}\right)dz + (k_1^2-k_2^2)b^3\beta\int_{0}^{c/b}z^2\exp\left(-z^{1/a}\right)dz \\
&+ 2(k_1^2-k_2^2)b^2c\beta\int_{c/b}^{+\infty}z\exp\left(-z^{1/a}\right)dz + (k_1^2+k_2^2)bc^2\beta\int_{0}^{+\infty}\exp\left(-z^{1/a}\right)dz \\
&+ (k_1^2-k_2^2)bc^2\beta\int_{0}^{c/b}\exp\left(-z^{1/a}\right)dz.
\end{aligned}$$

When $c < 0$, we have

$$\begin{aligned}
E[L(Z+c)^2] ={}& k_2^2b\beta\int_{-\infty}^{0}(bz+c)^2\exp\left(-(-z)^{1/a}\right)dz + k_2^2b\beta\int_{0}^{-c/b}(bz+c)^2\exp\left(-z^{1/a}\right)dz \\
&+ k_1^2b\beta\int_{-c/b}^{+\infty}(bz+c)^2\exp\left(-z^{1/a}\right)dz \\
={}& k_2^2b\beta\int_{0}^{+\infty}(bz-c)^2\exp\left(-z^{1/a}\right)dz + k_2^2b\beta\int_{0}^{-c/b}(bz+c)^2\exp\left(-z^{1/a}\right)dz \\
&+ k_1^2b\beta\int_{-c/b}^{+\infty}(bz+c)^2\exp\left(-z^{1/a}\right)dz \\
={}& (k_1^2+k_2^2)b^3\beta\int_{0}^{+\infty}z^2\exp\left(-z^{1/a}\right)dz - (k_1^2-k_2^2)b^3\beta\int_{0}^{-c/b}z^2\exp\left(-z^{1/a}\right)dz \\
&+ 2(k_1^2-k_2^2)b^2c\beta\int_{-c/b}^{+\infty}z\exp\left(-z^{1/a}\right)dz + (k_1^2+k_2^2)bc^2\beta\int_{0}^{+\infty}\exp\left(-z^{1/a}\right)dz \\
&- (k_1^2-k_2^2)bc^2\beta\int_{0}^{-c/b}\exp\left(-z^{1/a}\right)dz.
\end{aligned}$$

From the above, for any $c \in \mathbb{R}$, we have

$$\begin{aligned}
E[L(Z+c)^2] ={}& (k_1^2+k_2^2)b^3\beta\int_{0}^{+\infty}z^2\exp\left(-z^{1/a}\right)dz + \operatorname{sgn}(c)(k_1^2-k_2^2)b^3\beta\int_{0}^{|c|/b}z^2\exp\left(-z^{1/a}\right)dz \\
&+ 2(k_1^2-k_2^2)b^2c\beta\int_{|c|/b}^{+\infty}z\exp\left(-z^{1/a}\right)dz + (k_1^2+k_2^2)bc^2\beta\int_{0}^{+\infty}\exp\left(-z^{1/a}\right)dz \\
&+ \operatorname{sgn}(c)(k_1^2-k_2^2)bc^2\beta\int_{0}^{|c|/b}\exp\left(-z^{1/a}\right)dz.
\end{aligned}$$

Now set $t := z^{1/a}$ to get

$$\begin{aligned}
E[L(Z+c)^2] ={}& (k_1^2+k_2^2)ab^3\beta\int_{0}^{+\infty}t^{3a-1}e^{-t}\,dt + \operatorname{sgn}(c)(k_1^2-k_2^2)ab^3\beta\int_{0}^{c^*}t^{3a-1}e^{-t}\,dt \\
&+ 2(k_1^2-k_2^2)ab^2c\beta\int_{c^*}^{+\infty}t^{2a-1}e^{-t}\,dt + (k_1^2+k_2^2)abc^2\beta\int_{0}^{+\infty}t^{a-1}e^{-t}\,dt \\
&+ \operatorname{sgn}(c)(k_1^2-k_2^2)abc^2\beta\int_{0}^{c^*}t^{a-1}e^{-t}\,dt \\
={}& (k_1^2+k_2^2)ab^3\beta\,\Gamma(3a) + \operatorname{sgn}(c)(k_1^2-k_2^2)ab^3\beta\,\gamma(3a,c^*) + 2(k_1^2-k_2^2)ab^2c\beta\,\Gamma(2a,c^*) \\
&+ (k_1^2+k_2^2)abc^2\beta\,\Gamma(a) + \operatorname{sgn}(c)(k_1^2-k_2^2)abc^2\beta\,\gamma(a,c^*),
\end{aligned}$$

where $c^* := |c/b|^{1/a}$. Therefore, for any $c \in \mathbb{R}$, we have

$$E[L(Z+c)^2] = \frac{(k_1^2+k_2^2)c^2}{2} + \operatorname{sgn}(c)\,\frac{(k_1^2-k_2^2)c^2}{2\Gamma(a)}\,\gamma(a,c^*) + \frac{(k_1^2-k_2^2)bc}{\Gamma(a)}\,\Gamma(2a,c^*) + \frac{(k_1^2+k_2^2)b^2\,\Gamma(3a)}{2\Gamma(a)} + \operatorname{sgn}(c)\,\frac{(k_1^2-k_2^2)b^2}{2\Gamma(a)}\,\gamma(3a,c^*).$$

Moreover, from Appendix A.1, we have

$$\begin{aligned}
E[L(Z+c)]^2 ={}& \frac{(k_1-k_2)^2c^2}{4} + \operatorname{sgn}(c)\,\frac{(k_1^2-k_2^2)c^2}{2\Gamma(a)}\,\gamma(a,c^*) + \frac{(k_1^2-k_2^2)bc}{2\Gamma(a)}\,\Gamma(2a,c^*) \\
&+ \frac{(k_1+k_2)^2b|c|}{2\Gamma(a)^2}\,\gamma(a,c^*)\,\Gamma(2a,c^*) + \frac{(k_1+k_2)^2c^2}{4\Gamma(a)^2}\,\gamma(a,c^*)^2 + \frac{(k_1+k_2)^2b^2}{4\Gamma(a)^2}\,\Gamma(2a,c^*)^2.
\end{aligned}$$

Therefore, for any $c \in \mathbb{R}$, we have

$$\begin{aligned}
V[L(Z+c)] ={}& E[L(Z+c)^2] - E[L(Z+c)]^2 \\
={}& \frac{(k_1+k_2)^2c^2}{4} + \frac{(k_1^2-k_2^2)bc}{2\Gamma(a)}\,\Gamma(2a,c^*) - \frac{(k_1+k_2)^2b|c|}{2\Gamma(a)^2}\,\gamma(a,c^*)\,\Gamma(2a,c^*) \\
&- \frac{(k_1+k_2)^2c^2}{4\Gamma(a)^2}\,\gamma(a,c^*)^2 - \frac{(k_1+k_2)^2b^2}{4\Gamma(a)^2}\,\Gamma(2a,c^*)^2 + \frac{(k_1^2+k_2^2)b^2\,\Gamma(3a)}{2\Gamma(a)} \\
&+ \operatorname{sgn}(c)\,\frac{(k_1^2-k_2^2)b^2}{2\Gamma(a)}\,\gamma(3a,c^*).
\end{aligned}$$

Appendix 2. Inequalities for the gamma and the incomplete gamma functions

Here, we prove the following lemma used in Theorem 3.1.

Lemma A.1

For a>0 and x>0,

$$x^a\gamma(a,x)^2 - x^a\Gamma(a)^2 + 2\gamma(a,x)\,\Gamma(2a,x) > 0.$$

To prove Lemma A.1, we need the following three lemmas:

Lemma A.2

For a>0,

$$2\Gamma(2a) - a\Gamma(a)^2 > 0.$$

Lemma A.3

For a>0 and x0,

$$a\gamma(a,x) \ge x^a e^{-x}.$$

Lemma A.4

For a>0 and bR,

$$\lim_{x\to+\infty} x^b\,\Gamma(a,x) = 0.$$

Proof of Lemma A.2. —

First, we prove

$$\sum_{n=1}^{\infty}\frac{1}{n(2n-1)} = 2\log 2. \tag{A1}$$

Let $S_n := \sum_{k=1}^{n}\frac{1}{k(2k-1)}$. Accordingly, we have

$$\begin{aligned}
S_n &= \sum_{k=1}^{n}\left(\frac{2}{2k-1} - \frac{1}{k}\right) = 2\sum_{k=1}^{n}\frac{1}{2k-1} - \sum_{k=1}^{n}\frac{1}{k} \\
&= 2\left(\sum_{k=1}^{2n}\frac{1}{k} - \sum_{k=1}^{n}\frac{1}{2k}\right) - \sum_{k=1}^{n}\frac{1}{k} = 2\sum_{k=1}^{2n}\frac{1}{k} - 2\sum_{k=1}^{n}\frac{1}{k} \\
&= 2\sum_{k=n+1}^{2n}\frac{1}{k} = 2\sum_{k=1}^{n}\frac{1}{k+n}.
\end{aligned}$$

Therefore, we obtain

$$\lim_{n\to\infty}S_n = 2\lim_{n\to\infty}\sum_{k=1}^{n}\frac{1}{k+n} = 2\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\frac{1}{1+\frac{k}{n}} = 2\int_{0}^{1}\frac{1}{1+x}\,dx = 2\log 2.$$

Thus, Equation (A1) is proved.
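Equation (A1) is also easy to confirm numerically: the partial sums approach $2\log 2$. A sketch, assuming NumPy:

```python
# Partial sums of 1/(n(2n-1)) converge to 2 log 2 (Equation (A1)).
# Sketch; assumes NumPy.
import numpy as np

n = np.arange(1, 200_001)
S = float(np.sum(1.0 / (n * (2 * n - 1))))
assert abs(S - 2 * np.log(2)) < 1e-5  # tail is O(1/N), about 2.5e-6 here
```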

Next, by using Equation (A1), we prove

$$4^a\,\Gamma\!\left(a+\tfrac{1}{2}\right) > \sqrt{\pi}\,\Gamma(a+1) \tag{A2}$$

for $a > 0$. Let

$$g(a) := \frac{4^a\,\Gamma\!\left(a+\frac{1}{2}\right)}{\sqrt{\pi}\,\Gamma(a+1)}.$$

To prove $g(a) > 1$ for $a > 0$, we use the following formula [2, p. 13, Theorem 1.2.5]:

$$\frac{d}{dx}\log\Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)} = -\gamma_0 + \sum_{n=1}^{\infty}\left(\frac{1}{n} - \frac{1}{x+n-1}\right),$$

where $\gamma_0$ is Euler's constant. Taking the logarithmic derivative of $g(a)$, from the above formula, we have

$$\begin{aligned}
\frac{d}{da}\log g(a) &= 2\log 2 + \frac{d}{da}\log\Gamma\!\left(a+\tfrac{1}{2}\right) - \frac{d}{da}\log\Gamma(a+1) \\
&= 2\log 2 + \sum_{n=1}^{\infty}\left(\frac{1}{n} - \frac{1}{a-\frac{1}{2}+n}\right) - \sum_{n=1}^{\infty}\left(\frac{1}{n} - \frac{1}{a+n}\right) \\
&= 2\log 2 - \frac{1}{2}\sum_{n=1}^{\infty}\frac{1}{(a+n)\left(a-\frac{1}{2}+n\right)} \\
&> 2\log 2 - \frac{1}{2}\sum_{n=1}^{\infty}\frac{1}{n\left(n-\frac{1}{2}\right)} = 2\log 2 - \sum_{n=1}^{\infty}\frac{1}{n(2n-1)}
\end{aligned}$$

for a>0. Moreover, from Equation (A1), we obtain (d/da)logg(a)>0 for a>0. This leads to (d/da)g(a)>0 for a>0. Equation (A2) follows from this and g(0)=1.

Using Equation (A2) and the formula [2, p. 22, Theorem 6.5.1]

$$\Gamma(2a) = \frac{2^{2a-1}}{\sqrt{\pi}}\,\Gamma(a)\,\Gamma\!\left(a+\tfrac{1}{2}\right),$$

we complete the proof of Lemma A.2 as follows:

$$2\Gamma(2a) - a\Gamma(a)^2 = \frac{2^{2a}}{\sqrt{\pi}}\,\Gamma(a)\,\Gamma\!\left(a+\tfrac{1}{2}\right) - \Gamma(a)\,\Gamma(a+1) = \frac{1}{\sqrt{\pi}}\,\Gamma(a)\left(4^a\,\Gamma\!\left(a+\tfrac{1}{2}\right) - \sqrt{\pi}\,\Gamma(a+1)\right) > 0.$$
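Lemma A.2 can also be spot-checked numerically on a grid of values of $a$. A sketch, assuming NumPy and SciPy:

```python
# Numerical spot check of Lemma A.2: 2*Gamma(2a) - a*Gamma(a)^2 > 0 for a > 0.
# Sketch; assumes NumPy and SciPy.
import numpy as np
from scipy.special import gamma

a = np.linspace(0.05, 5.0, 200)
assert np.all(2 * gamma(2 * a) - a * gamma(a) ** 2 > 0)
```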

Proof of Lemma A.3. —

For $a > 0$ and $x \ge 0$, we define

$$u(a,x) := a\gamma(a,x) - x^a e^{-x}.$$

Then, we have

$$\frac{d}{dx}u(a,x) = x^a e^{-x} \ge 0.$$

The lemma follows from this and u(a,0)=0.

Proof of Lemma A.4. —

When $b \le 0$, the statement easily follows from the definition of $\Gamma(a,x)$. When $b > 0$, we use L'Hôpital's rule to obtain

$$\lim_{x\to+\infty}\frac{\Gamma(a,x)}{x^{-b}} = \lim_{x\to+\infty}\frac{-x^{a-1}e^{-x}}{-b\,x^{-b-1}} = \lim_{x\to+\infty}\frac{x^{a+b}e^{-x}}{b} = 0.$$

Proof of Lemma A.1. —

For $a > 0$ and $x \ge 0$, we define

$$y_1(a,x) := x^a\gamma(a,x)^2 - x^a\Gamma(a)^2 + 2\gamma(a,x)\,\Gamma(2a,x).$$

Let us prove $y_1(a,x) > 0$ ($a > 0$, $x > 0$). For $a > 0$ and $x \ge 0$, we define

$$\begin{aligned}
y_2(a,x) &:= a\gamma(a,x)^2 - a\Gamma(a)^2 + 2e^{-x}\,\Gamma(2a,x);\\
y_3(a,x) &:= a\,x^{a-1}\gamma(a,x) - \Gamma(2a,x) - x^{2a-1}e^{-x};\\
y_4(a,x) &:= a(a-1)\gamma(a,x) + x^a e^{-x}(2x+1-a).
\end{aligned}$$

Then, we have

$$\frac{d}{dx}y_1(a,x) = x^{a-1}y_2(a,x);\quad \frac{d}{dx}y_2(a,x) = 2e^{-x}y_3(a,x);\quad \frac{d}{dx}y_3(a,x) = x^{a-2}y_4(a,x);\quad \frac{d}{dx}y_4(a,x) = x^a e^{-x}(3a+1-2x).$$

From these relations, we find that the signs of $(d/dx)y_i(a,x)$ and $y_{i+1}(a,x)$ ($i = 1, 2, 3$) are the same for $a > 0$ and $x > 0$. Let $p_i(a)$ ($i = 2, 3, 4$) be the value of $x$ satisfying $y_i(a,x) = 0$. It is easily verified that $\lim_{x\to0+}(d/dx)y_4(a,x) = \lim_{x\to+\infty}(d/dx)y_4(a,x) = \lim_{x\to0+}y_4(a,x) = 0$ and $\lim_{x\to+\infty}y_4(a,x) = a(a-1)\Gamma(a)$ for $a > 0$. Therefore, from the first derivative test, we obtain Tables A1 and A2. Moreover, using Lemmas A.3 and A.4 and L'Hôpital's rule, we obtain

$$\begin{aligned}
&\lim_{x\to0+}\frac{d}{dx}y_3(a,x) = +\infty\ (0<a<1),\ 0\ (a\ge1); \qquad \lim_{x\to+\infty}\frac{d}{dx}y_3(a,x) = 0\ (0<a<2),\ 2\ (a=2),\ +\infty\ (a>2);\\
&\lim_{x\to0+}y_3(a,x) = -\Gamma(2a)\ (a>0); \qquad \lim_{x\to+\infty}y_3(a,x) = 0\ (0<a<1),\ 1\ (a=1),\ +\infty\ (a>1);\\
&\lim_{x\to0+}\frac{d}{dx}y_2(a,x) = -2\Gamma(2a)\ (a>0); \qquad \lim_{x\to+\infty}\frac{d}{dx}y_2(a,x) = 0\ (a>0);\\
&\lim_{x\to0+}y_2(a,x) = 2\Gamma(2a)-a\Gamma(a)^2\ (a>0); \qquad \lim_{x\to+\infty}y_2(a,x) = 0\ (a>0);\\
&\lim_{x\to0+}\frac{d}{dx}y_1(a,x) = +\infty\ (0<a<1),\ 1\ (a=1),\ 0\ (a>1); \qquad \lim_{x\to+\infty}\frac{d}{dx}y_1(a,x) = 0\ (a>0);\\
&\lim_{x\to0+}y_1(a,x) = 0\ (a>0); \qquad \lim_{x\to+\infty}y_1(a,x) = 0\ (a>0).
\end{aligned}$$

From these results, Lemma A.2, and the fact that the signs of $(d/dx)y_i(a,x)$ and $y_{i+1}(a,x)$ ($i = 1, 2, 3$) are the same for $a > 0$ and $x > 0$, we obtain Tables A3 and A4. From Tables A3 and A4, we can verify that $y_1(a,x) > 0$ holds for $a > 0$ and $x > 0$. This completes the proof of the lemma.

Table A1. Case of $0 < a < 1$.

x                0    ⋯    (3a+1)/2    ⋯    p4(a)    ⋯    +∞
(d/dx)y4(a,x)    0    +    0           −    −        −    0
y4(a,x)          0    +    +           +    0        −    −

Table A2. Case of $a \ge 1$.

x                0    ⋯    (3a+1)/2    ⋯    +∞
(d/dx)y4(a,x)    0    +    0           −    0
y4(a,x)          0    +    +           +    0 (a=1), + (a>1)

Table A3. Case of $0 < a < 1$.

x                0    ⋯    p2(a)    ⋯    p3(a)    ⋯    p4(a)    ⋯    +∞
(d/dx)y3(a,x)    +    +    +        +    +        +    0        −    0
y3(a,x)          −    −    −        −    0        +    +        +    0
(d/dx)y2(a,x)    −    −    −        −    0        +    +        +    0
y2(a,x)          +    +    0        −    −        −    −        −    0
(d/dx)y1(a,x)    +    +    0        −    −        −    −        −    0
y1(a,x)          0    +    +        +    +        +    +        +    0

Table A4. Case of $a \ge 1$.

x                0    ⋯    p2(a)    ⋯    p3(a)    ⋯    +∞
(d/dx)y3(a,x)    0    +    +        +    +        +    0 (a<2), 2 (a=2), +∞ (a>2)
y3(a,x)          −    −    −        −    0        +    +
(d/dx)y2(a,x)    −    −    −        −    0        +    0
y2(a,x)          +    +    0        −    −        −    0
(d/dx)y1(a,x)    +    +    0        −    −        −    0
y1(a,x)          0    +    +        +    +        +    0

Appendix 3. Other simulations

We conducted three other simulations of our method using the same actual data, the 'cars' dataset of the R datasets package, which includes the speeds and stopping distances of automobiles.

We separated the dataset into training and test datasets as follows: The even-numbered data were selected as the training dataset, and the odd-numbered data were the test dataset. The regression coefficients of $y = ax + b$ were estimated as $\hat{a} = 3.80$ and $\hat{b} = -16.93$ by using the least squares method, and the unbiased sample variance of the error was 159.31. We obtained a prediction $\hat{y} = \hat{a}x + \hat{b}$ and modeled the error as $z = \hat{y} - y \sim N(0, 159.31)$. Fix the conditions as $k_1 = 1$ and $k_2 = 5$. Then $C = 16.99$ by Equation (6), and the estimated solution to the minimization problem is $(\tilde{a}, \tilde{b}) = (5.33, -19.33)$. Figure A1 plots (i) $\hat{y} = \hat{a}x + \hat{b}$, (ii) $\hat{y} + C$, and (iii) $\tilde{y} = \tilde{a}x + \tilde{b}$.

Figure A.1. Scatter plots and plots of (i), (ii), and (iii). (a) Scatter plot for the training data and plots of (i), (ii), and (iii). (b) Scatter plot for the test data and plots of (i), (ii), and (iii).

Table A5 lists the sample means and sample variances of the loss for the training and test data when $k_1 = 1$ and $k_2 = 5$.

Table A5. Sample means and sample variances of the loss for the training and test data when k1=1 and k2=5.

  Mean (training) Mean (test) Variance (training) Variance (test)
(i) 31.42 42.54 1307.61 4065.98
(ii) 20.04 31.74 216.34 1836.93
(iii) 19.56 30.40 169.21 1540.48

Figure A2 plots the sample means of the loss for the training and test data for (i), (ii), and (iii) with respect to $k_2$.

Figure A.2. Plots of sample means for the training and the test data. (a) Sample means for the training data. (b) Sample means for the test data.

Moreover, Figure A3 plots the sample variances of the loss for the training and test data for (i), (ii), and (iii) with respect to $k_2$.

Figure A.3. Plots of sample variances for the training and the test data. (a) Sample variances for the training data. (b) Sample variances for the test data.

Next, we separated the cars dataset into training and test datasets as follows: The 1st to 25th data of the cars dataset were selected as the training dataset (the data of the cars dataset are arranged in ascending order of speed), and the 26th to 50th data were the test dataset. The regression coefficients of $y = ax + b$ were estimated to be $\hat{a} = 3.29$ and $\hat{b} = -10.00$ by using the least squares method, and the unbiased sample variance of the error was 178.77. Figure A4 plots the sample means of the loss for the training and test data for (i), (ii), and (iii) with respect to $k_2$.

Figure A.4. Plots of sample means for the training and test data. (a) Sample means for the training data. (b) Sample means for the test data.

Moreover, Figure A5 plots the sample variances of the loss for the training and test data for (i), (ii), and (iii) with respect to $k_2$.

Figure A.5. Plots of sample variances for the training and test data. (a) Sample variances for the training data. (b) Sample variances for the test data.

Finally, we separated the cars dataset into training and test datasets as follows: The 26th to 50th data were selected as the training dataset, and the 1st to 25th data were the test dataset. The regression coefficients of $y = ax + b$ were estimated to be $\hat{a} = 5.15$ and $\hat{b} = -42.04$ by using the least squares method, and the unbiased sample variance of the error was 277.30. Figure A6 plots the sample means of the loss for the training and test data for (i), (ii), and (iii) with respect to $k_2$.

Figure A.6. Plots of sample means for the training and the test data. (a) Sample means for the training data. (b) Sample means for the test data.

Moreover, Figure A7 plots the sample variances of the loss for the training and test data for (i), (ii), and (iii) with respect to $k_2$.

Figure A.7. Plots of sample variances for the training and test data. (a) Sample variances for the training data. (b) Sample variances for the test data.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Aldrich J., Doing least squares: Perspectives from Gauss and Yule, Int. Stat. Rev. 66 (1998), pp. 61–81. doi: 10.1111/j.1751-5823.1998.tb00406.x
  • 2. Andrews G.E., Askey R., and Roy R., Special Functions, Encyclopedia of Mathematics and its Applications, Cambridge University Press, New York, 1999.
  • 3. Breckling J. and Chambers R., M-quantiles, Biometrika 75 (1988), pp. 761–771. doi: 10.1093/biomet/75.4.761
  • 4. Dytso A., Bustin R., Poor H.V., and Shamai S., Analytical properties of generalized Gaussian distributions, J. Stat. Distrib. Appl. 5 (2018), p. 6. doi: 10.1186/s40488-018-0088-5
  • 5. Efron B., Regression percentiles using asymmetric squared error loss, Stat. Sin. 1 (1991), pp. 93–125. Available at http://www.jstor.org/stable/24303995
  • 6. Guidotti R., Monreale A., Turini F., Pedreschi D., and Giannotti F., A survey of methods for explaining black box models, ACM Comput. Surv. 51 (2018), pp. 93:1–93:42.
  • 7. Johnson N., Kotz S., and Balakrishnan N., Continuous Univariate Distributions, 2nd ed., Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, Vol. 1, Wiley-Interscience, 1994.
  • 8. Johnson N., Kotz S., and Balakrishnan N., Continuous Univariate Distributions, 2nd ed., Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, Vol. 2, Wiley-Interscience, 1995.
  • 9. Koenker R. and Bassett G., Regression quantiles, Econometrica 46 (1978), pp. 33–50. doi: 10.2307/1913643
  • 10. Legendre A., Nouvelles méthodes pour la détermination des orbites des comètes, F. Didot, Paris, 1805. Available at https://books.google.co.jp/books?id=FRcOAAAAQAAJ
  • 11. Nadarajah S., A generalized normal distribution, J. Appl. Stat. 32 (2005), pp. 685–694. doi: 10.1080/02664760500079464
  • 12. Stigler S.M., Gauss and the invention of least squares, Ann. Statist. 9 (1981), pp. 465–474. doi: 10.1214/aos/1176345451
  • 13. Subbotin T., On the law of frequency of error, Recueil Math. 31 (1923), pp. 296–301.
  • 14. Wang Z.X. and Guo D.R., Special Functions, World Scientific, 1989. doi: 10.1142/0653
  • 15. Yamaguchi N., Hori M., and Ideguchi Y., Minimising the expectation value of the procurement cost in electricity markets based on the prediction error of energy consumption, Pac. J. Math. Ind. 10 (2018), p. 4. doi: 10.1186/s40736-018-0038-7
  • 16. Zellner A., Bayesian estimation and prediction using asymmetric loss functions, J. Am. Stat. Assoc. 81 (1986), pp. 446–451. doi: 10.1080/01621459.1986.10478289
