ABSTRACT
Regression coefficients are usually estimated by solving minimization problems with asymmetric loss functions. In this paper, we instead correct predictions whose errors follow a generalized Gaussian distribution. In our method, we not only minimize the expected value of the asymmetric loss, but also lower the variance of the loss. Predictions inevitably contain errors, and they should therefore be used with these errors taken into account; our approach does exactly that. Furthermore, even if we do not understand the prediction method, which is a possible circumstance in, e.g. deep learning, we can apply our method as long as we know the prediction error distribution and the asymmetric loss function. Our method can be applied to the procurement of electricity from electricity markets.
Keywords: Asymmetric loss function, gamma function, generalized Gaussian distribution, minimizing the expectation value, risk function
2010 Mathematics Subject Classifications: 62A99, 62E99
1. Introduction
In this paper, we treat minimization problems with loss functions as follows: Let $\{(\boldsymbol{x}_i, y_i)\}_{i=1}^{n}$ be a dataset, where $\boldsymbol{x}_i \in \mathbb{R}^{p}$ are explanatory vectors and $y_i \in \mathbb{R}$ are target variables. We assume the following linear regression model:
$$\boldsymbol{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$
where $\boldsymbol{y} = (y_1, \dots, y_n)^{\top}$, $\boldsymbol{\varepsilon} = (\varepsilon_1, \dots, \varepsilon_n)^{\top}$, and $X$ is an $n \times p$ design matrix having $\boldsymbol{x}_i^{\top}$ as the $i$th row. Let $L$ be the loss function for the parameter estimation. The parameter vector $\boldsymbol{\beta}$ can be estimated by
$$\hat{\boldsymbol{\beta}} = \operatorname*{arg\,min}_{\boldsymbol{\beta}} \sum_{i=1}^{n} L\bigl(y_i - \boldsymbol{x}_i^{\top}\boldsymbol{\beta}\bigr).$$
The case of $L(u) = u^{2}$ is known as the quadratic loss function, which is symmetric (see, e.g. Refs. [1,10,12]). For asymmetric loss functions, we refer the reader to, e.g. Refs. [3,5,9,16]. Consider the case in which the prediction accuracy is assessed by a risk function based on an asymmetric loss function. Let $y$ be an observed value and $\hat{y}$ a predicted value of $y$. Suppose that the prediction error follows a generalized Gaussian distribution. Then, the prediction can be corrected by adding a constant $C$, i.e. by using $\hat{y} + C$, where $C$ is determined by minimizing the risk function. We make the following assumptions:
- (I) If there is a mismatch between the observed value $y$ and the (corrected) predicted value, we suffer a loss given by an asymmetric loss function $L$.
- (II) The prediction error follows a generalized Gaussian distribution with probability density function $f$ and mean zero.

Here, $f$ could be the density of a Gaussian distribution or of a Laplace distribution with mean zero (see, e.g. [7, p. 80], [8, p. 164]). That is, writing $a$ for the shape parameter of the generalized Gaussian distribution, $f$ is the probability density function of a Gaussian distribution when $a = 2$, and of a Laplace distribution when $a = 1$.
Under assumptions (I) and (II), we will derive the optimized predicted value $\hat{y} + C$ minimizing the expected value of the loss (the risk). That is, we will solve the following risk minimization problem: choose the correction constant $c$ so as to minimize
$$\mathrm{E}\bigl[L(y, \hat{y} + c)\bigr].$$
In our method, we not only minimize the expected value of the asymmetric loss, but also lower the variance of the loss: the variance of the loss is reduced by correcting the predicted value $\hat{y}$ to the optimized predicted value $\hat{y} + C$.
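As a concrete illustration of this correction step (a minimal sketch, not the paper's exact formulation), the following Python code assumes a piecewise-linear asymmetric loss with slopes `alpha_under` and `alpha_over` and a generalized Gaussian prediction error generated with `scipy.stats.gennorm`, and estimates the correction constant $C$ by minimizing a Monte Carlo estimate of the expected loss.

```python
# A minimal sketch (not the paper's exact formulation): estimate the correction
# constant C by minimizing a Monte Carlo estimate of the expected asymmetric loss.
# The loss slopes alpha_under / alpha_over and the gennorm parametrization are
# illustrative assumptions.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gennorm

def asymmetric_loss(err, alpha_under=3.0, alpha_over=1.0):
    """Piecewise-linear loss on err = y - (y_hat + c): under-prediction (err > 0)
    is penalized more heavily than over-prediction (err < 0)."""
    return np.where(err > 0, alpha_under * err, alpha_over * (-err))

a, b = 2.0, 1.0                          # shape a (a=2: Gaussian, a=1: Laplace), scale 1/b
rng = np.random.default_rng(0)
eps = gennorm.rvs(beta=a, scale=1.0 / b, size=100_000, random_state=rng)  # errors y - y_hat

def risk(c):
    return asymmetric_loss(eps - c).mean()       # Monte Carlo estimate of E[L(y, y_hat + c)]

C = minimize_scalar(risk, bounds=(-5, 5), method="bounded").x
print(f"estimated C = {C:.3f}; risk at C = {risk(C):.3f}; risk at 0 = {risk(0.0):.3f}")
```

With these illustrative slopes the under-prediction side is the expensive one, so the estimated $C$ is positive, i.e. the corrected prediction is biased upward.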
Our research is motivated as follows: (1) Suppose we have a risk minimization problem based on an asymmetric loss function. Then, we can reduce the risk by correcting the predictions. For example, the paper [15] formulates a method for minimizing the expected value of the procurement cost of electricity in two popular spot markets, day-ahead and intra-day, under the assumption that the expected values of the unit prices and the distributions of the prediction errors for the electricity demand traded in the two markets are known. That paper showed that increasing or decreasing the procurement relative to the prediction can, in some cases, reduce the expected procurement cost. Our method is a generalization of the method in [15].
(2) In recent years, prediction methods have become black boxes owing to big data and machine learning (see, e.g. Ref. [6]). The day will soon come when we must minimize an objective function by using predictions obtained from such black-box methods. In our method, even if we do not know how the prediction $\hat{y}$ is produced, we can determine the parameter $C$ as long as we know the prediction error distribution and the asymmetric loss function $L$.
The remainder of this paper is organized as follows: In Section 2, we introduce the expected value and the variance of the loss and determine the value $c = C$ that gives the minimum value of the expected loss. In addition, we give a geometrical interpretation of the parameter $C$ and derive the minimized expected value of the loss. In Section 3, we prove an inequality for the variance of the loss. In Section 4, we report the results of simulations of our method using actual data.
2. Expected value and variance of the loss
In the following, we assume that (I) and (II) hold. We introduce the expected value and the variance of the loss and determine the value $c = C$ that gives the minimum value of the expected loss. In addition, we give a geometrical interpretation of the parameter $C$ and derive the minimized expected value of the loss.
2.1. Expected value and variance of the loss
Let $y$ be an observed value and $\hat{y}$ a predicted value of $y$. Let $\Gamma(s)$ be the gamma function, and let $\Gamma(s, x)$ and $\gamma(s, x)$ be the upper and the lower incomplete gamma functions, respectively (see, e.g. Refs. [2, p. 197], [4, p. 2], [14, p. 93]), defined by
$$\Gamma(s, x) = \int_{x}^{\infty} t^{s-1} e^{-t}\,dt, \qquad \gamma(s, x) = \int_{0}^{x} t^{s-1} e^{-t}\,dt,$$
where $s > 0$ and $x \ge 0$. Then, the expected value and the variance of the loss are as follows (see Appendix 1 for the proof): For any $c$, we have
(1)
(2)
From this, we have the following:
(3)
(4)
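For numerical work, the gamma and incomplete gamma functions appearing in such expressions can be evaluated with standard routines. The sketch below uses `scipy.special`, whose `gammainc` and `gammaincc` return the regularized lower and upper incomplete gamma functions, so the unregularized versions defined above are recovered by multiplying by $\Gamma(s)$.

```python
# Evaluating Gamma(s), the upper incomplete gamma Gamma(s, x), and the lower
# incomplete gamma gamma(s, x) with scipy.special (which returns the
# *regularized* versions, hence the multiplication by Gamma(s)).
import numpy as np
from scipy.special import erf, gamma, gammainc, gammaincc

def upper_incomplete_gamma(s, x):
    return gamma(s) * gammaincc(s, x)   # Gamma(s, x) = integral_x^inf t^{s-1} e^{-t} dt

def lower_incomplete_gamma(s, x):
    return gamma(s) * gammainc(s, x)    # gamma(s, x) = integral_0^x t^{s-1} e^{-t} dt

s, x = 0.5, 1.3
# The two pieces sum to the complete gamma function.
assert np.isclose(upper_incomplete_gamma(s, x) + lower_incomplete_gamma(s, x), gamma(s))
# For s = 1/2 the lower incomplete gamma reduces to the error function (defined below):
assert np.isclose(lower_incomplete_gamma(0.5, x), np.sqrt(np.pi) * erf(np.sqrt(x)))
```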
Let $\operatorname{erf}$ be the error function (see, e.g. Ref. [2, p. 196]) defined by
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^{2}}\,dt$$
for any $x$. We give two examples of the expected value and the variance of the loss. In the case of a Laplace distribution ($a = 1$), closed-form expressions follow from identities for the incomplete gamma functions that are easily obtained from their definitions. In the case of a Gaussian distribution ($a = 2$), the incomplete gamma functions reduce to expressions involving the error function, and closed forms are again obtained. For $b = 1$, we can plot the expected value and the variance of the loss for the Laplace and the Gaussian distributions with respect to the loss parameter as follows:
In both cases, the graph of the expected value is a straight line with positive slope, and the graph of the variance is a convex quadratic curve (Figures 1 and 2).
Figure 1.
Plots for a Laplace distribution. (a) Expected value of the loss. (b) Variance of the loss.
Figure 2.
Plots for a Gaussian distribution. (a) Expected value of the loss. (b) Variance of the loss.
2.2. Parameter value minimizing the expected value
Here, we determine the value $c = C$ that gives the minimum value of the expected loss. Differentiating the expected loss with respect to $c$ gives its first derivative. We denote by $C$ the value of $c$ at which this derivative vanishes. Then, from the first derivative test, we find that the expected loss has a minimum value at $c = C$.
| $c$ | $c < C$ | $c = C$ | $c > C$ |
|---|---|---|---|
| Derivative of the expected loss | Negative | 0 | Positive |
| Expected loss | Strictly decreasing | Minimum | Strictly increasing |
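For concreteness, here is the first-order condition written out under an assumed piecewise-linear asymmetric loss with coefficients $\alpha_1$ (under-prediction) and $\alpha_2$ (over-prediction); this is an illustration under that assumption, not a transcription of the paper's loss function.

```latex
% First-order condition for the correction constant C under an assumed
% piecewise-linear asymmetric loss applied to the error \varepsilon = y - \hat{y}.
\[
  \frac{d}{dc}\,\mathrm{E}\bigl[L(\varepsilon - c)\bigr]
    = -\alpha_1 \Pr(\varepsilon > c) + \alpha_2 \Pr(\varepsilon < c) = 0
  \quad\Longleftrightarrow\quad
  F(C) = \frac{\alpha_1}{\alpha_1 + \alpha_2},
\]
```

where $F$ denotes the cumulative distribution function of the prediction error $\varepsilon$; the sign pattern of this derivative around $C$ is exactly the one recorded in the table above.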
Also, it follows from
(5)
that $C = 0$ only when the two coefficients of the asymmetric loss are equal. Equation (5) also gives a geometrical interpretation of $C$: the ratio of the area under the error density $f$ to the left of $C$ and the area to the right of $C$ is determined by the coefficients of the loss. That is, the point $t = C$ divides the area between $f$ and the $t$-axis in this ratio.
Let $\operatorname{erf}^{-1}$ be the inverse error function. We give two examples of $C$. In the case of a Laplace distribution, since $a = 1$, Equation (5) yields a closed-form expression for $C$.
In the case of a Gaussian distribution, since $a = 2$, Equation (5) yields
(6)
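As a hedged illustration of such closed forms: under the assumed piecewise-linear loss with slopes `alpha_under` and `alpha_over`, $C$ is the quantile of the error distribution at level $\alpha_{\mathrm{under}}/(\alpha_{\mathrm{under}}+\alpha_{\mathrm{over}})$, which for a Gaussian error can be written with the inverse error function and for a Laplace error has a logarithmic form. The code below computes these standard quantile solutions; it is not a transcription of Equation (6).

```python
# Closed-form correction C under an assumed piecewise-linear asymmetric loss:
# C is the tau-quantile of the error distribution, tau = alpha_under / (alpha_under + alpha_over).
import numpy as np
from scipy.special import erfinv
from scipy.stats import laplace, norm

alpha_under, alpha_over = 3.0, 1.0
tau = alpha_under / (alpha_under + alpha_over)

sigma = 1.0                                    # scale of the Gaussian error (assumption)
b = 1.0                                        # scale of the Laplace error (assumption)

C_gauss = norm.ppf(tau, scale=sigma)
C_gauss_via_erfinv = sigma * np.sqrt(2.0) * erfinv(2.0 * tau - 1.0)   # same value via erf^{-1}
C_laplace = laplace.ppf(tau, scale=b)

assert np.isclose(C_gauss, C_gauss_via_erfinv)
print(f"Gaussian: C = {C_gauss:.4f}   Laplace: C = {C_laplace:.4f}")
```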
Figure 3 plots $C$ for the Laplace and Gaussian distributions with respect to the loss parameter for $b = 1$.
Figure 3.
Plots of $C$ for the Laplace and Gaussian distributions.
2.3. Minimized expected value of the loss
Here, we derive the minimum value of the expected loss. Substituting $c = C$ in Equation (A1) and using Equation (5), we obtain the minimum value of the expected loss. From this and Equation (3), we obtain a corresponding closed-form expression.
Figure 4 plots the minimized expected loss for the Laplace and Gaussian distributions with respect to the loss parameter for $b = 1$.
Figure 4.
Plots of the minimized expected loss for the Laplace and Gaussian distributions.
3. Inequality for the variance of the loss
Here, we derive an inequality for the variance of the loss. Let $C$ be the value of $c$ giving the minimum value of the expected loss. Then, the following holds:
Theorem 3.1
The variance of the loss at the optimized predicted value $\hat{y} + C$ does not exceed the variance of the loss at the uncorrected predicted value $\hat{y}$, where equality holds only when $C = 0$, that is, when the two coefficients of the asymmetric loss are equal.
Proof.
It follows from Equation (5) that, substituting $c = C$ in Equation (A2), we obtain the variance of the loss at $c = C$. Combining this with Equation (4) yields the difference between the variance at $c = 0$ and the variance at $c = C$, expressed in terms of an auxiliary function of $a$ and $x$ defined for $a > 0$ and $x \ge 0$. By Lemma A.1 in Appendix 2, this auxiliary function is positive for $a > 0$ and $x > 0$; together with a further inequality valid for all $a > 0$, this yields the claimed inequality, where equality holds only when $C = 0$. Moreover, from Equation (5), we find that $C = 0$ holds only when the two coefficients of the asymmetric loss are equal.
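A quick Monte Carlo check of this statement, under the same illustrative assumptions as before (piecewise-linear loss, generalized Gaussian error via `scipy.stats.gennorm`): after shifting by the estimated $C$, both the sample mean and the sample variance of the loss should be no larger than at the uncorrected prediction.

```python
# Monte Carlo check of the claim: correcting the prediction by C lowers both the
# sample mean and the sample variance of the (assumed) asymmetric loss.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gennorm

def loss(err, alpha_under=3.0, alpha_over=1.0):
    return np.where(err > 0, alpha_under * err, alpha_over * (-err))

rng = np.random.default_rng(1)
for a in (1.0, 2.0):                                     # Laplace (a=1) and Gaussian (a=2)
    eps = gennorm.rvs(beta=a, size=200_000, random_state=rng)
    C = minimize_scalar(lambda c: loss(eps - c).mean(), bounds=(-5, 5), method="bounded").x
    l0, lC = loss(eps), loss(eps - C)
    print(f"a={a}: mean {l0.mean():.3f} -> {lC.mean():.3f}, "
          f"variance {l0.var():.3f} -> {lC.var():.3f}")
```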
Figure 5 shows the corresponding plots for the Laplace and Gaussian distributions with respect to the loss parameter for $b = 1$.
Figure 5.
Plots for the Laplace and the Gaussian distributions.
4. Simulation
The simulations of our method used actual data, namely the ‘cars’ data from the R datasets package, which consist of speeds and stopping distances of cars.
We separated the cars data into training and test datasets as follows: the odd-numbered observations were selected as the training dataset, and the even-numbered observations were used as the test dataset. Figure 6 shows scatter plots of the training and test data; the horizontal axis represents speed, and the vertical axis represents stopping distance.
Figure 6.
Scatter plots of the training and test data. (a) Training data. (b) Test data.
We used the training dataset to fit the parameters of the regression model and also to find the solution to the minimization problem.
The regression coefficients of $y = ax + b$ were estimated by the least squares method, and the unbiased sample variance of the error was 308.42. Fixing the coefficients of the asymmetric loss, we obtained $C = 16.99$ from Equation (6) and the corresponding estimated solution to the minimization problem. Figure 7 plots (i) the least squares prediction, (ii) the least squares prediction corrected by $C$, and (iii) the prediction obtained by directly minimizing the empirical asymmetric loss.
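A sketch of this simulation setup in Python, assuming the same illustrative loss slopes as in the earlier sketches (the loss coefficients actually used in the paper are not reproduced here) and fetching the R `cars` data through statsmodels' `get_rdataset`: fit least squares on the odd-numbered rows, estimate the error variance, apply a Gaussian-quantile correction in the spirit of Equation (6), and compare the loss of the uncorrected and corrected predictions.

```python
# Sketch of the Section 4 simulation (illustrative loss coefficients assumed):
# odd-numbered rows as training data, even-numbered rows as test data.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

cars = sm.datasets.get_rdataset("cars", "datasets").data    # columns: speed, dist
train, test = cars.iloc[0::2], cars.iloc[1::2]

X_tr = sm.add_constant(train["speed"])
fit = sm.OLS(train["dist"], X_tr).fit()                     # least squares fit
sigma2 = fit.resid.var(ddof=2)                              # unbiased error variance
print("unbiased sample variance of the error:", round(float(sigma2), 2))

alpha_under, alpha_over = 3.0, 1.0                          # assumed loss slopes
tau = alpha_under / (alpha_under + alpha_over)
C = norm.ppf(tau, scale=np.sqrt(sigma2))                    # Gaussian-quantile correction
print("correction constant C:", round(float(C), 2))

def loss(err):
    return np.where(err > 0, alpha_under * err, alpha_over * (-err))

for name, df in (("train", train), ("test", test)):
    pred = fit.predict(sm.add_constant(df["speed"]))
    for label, corr in (("(i) uncorrected", 0.0), ("(ii) corrected", C)):
        e = df["dist"] - (pred + corr)
        print(name, label, "mean loss", round(loss(e).mean(), 2),
              "loss variance", round(loss(e).var(ddof=1), 2))
```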
Figure 7.
Scatter plots and plots of (i), (ii), and (iii). (a) Scatter plot for the training data and plots of (i), (ii), and (iii). (b) Scatter plot for the test data and plots of (i), (ii), and (iii).
The slope of (iii) is steeper than the slope of (i). Therefore, if the linear model is true, the values of (iii) are far from the observations when $x$ is large; that is, the loss is large when $x$ is large. This implies that using (iii) as a prediction is high risk. On the other hand, the slope of (ii) is equal to the slope of (i). Therefore, using (ii) as a prediction is low risk.
Table 1 lists the sample means and sample variances of the loss for the training and test data under the fixed loss coefficients.
Table 1. Sample means and sample variances of the loss for the training and the test data under the fixed loss coefficients.
| Prediction | Mean (training) | Mean (test) | Variance (training) | Variance (test) |
|---|---|---|---|---|
| (i) | 37.31 | 28.05 | 3337.93 | 896.99 |
| (ii) | 31.31 | 21.86 | 834.35 | 105.40 |
| (iii) | 28.48 | 25.56 | 612.18 | 127.50 |
Figure 8 plots the sample means of the loss for the training and test data for (i), (ii), and (iii) with respect to the loss parameter.
Figure 8.
Plots of sample means for the training and the test data. (a) Sample means for the training data. (b) Sample means for the test data.
For the test data, the plot of the loss of (iii) has the form of steps. This is because the coefficients obtained by directly minimizing the empirical asymmetric loss are discontinuous with respect to the loss parameter. On the other hand, the plot of the loss of (ii) is smooth, because $C$ is continuous with respect to the loss parameter. Therefore, the loss of (ii) is more stable than the loss of (iii). In addition, deriving the estimates in (iii) is troublesome, whereas deriving $C$ is easy because $C$ is given explicitly as a function of the loss coefficients and the parameters of the error distribution.
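A small illustration of this continuity point (again under the assumed piecewise-linear loss): as the asymmetry level $\tau$ varies, the closed-form $C$ for a Gaussian error moves continuously, whereas the minimizer of the empirical loss over a finite sample of errors jumps between sample points.

```python
# Closed-form C is continuous in the asymmetry level tau, while the minimizer
# of the empirical asymmetric loss over a finite sample is a step function
# (it always lands on one of the sample points).
import numpy as np
from scipy.stats import norm

def pinball(err, tau):
    # piecewise-linear (pinball) loss with asymmetry level tau in (0, 1)
    return np.where(err > 0, tau * err, (tau - 1.0) * err)

def empirical_minimizer(eps, tau):
    candidates = np.sort(eps)                       # optimum is attained at a sample point
    risks = [pinball(eps - c, tau).mean() for c in candidates]
    return candidates[int(np.argmin(risks))]

rng = np.random.default_rng(2)
eps = rng.normal(size=30)                           # a small sample of prediction errors

for tau in np.linspace(0.55, 0.95, 9):
    print(f"tau={tau:.2f}  closed-form C={norm.ppf(tau):+.3f}  "
          f"empirical minimizer={empirical_minimizer(eps, tau):+.3f}")
```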
Figure 9 plots the sample variances of the loss for the training and test data for (i), (ii), and (iii) with respect to the loss parameter.
Figure 9.
Plots of sample variances for the training and the test data. (a) Sample variance for the training data. (b) Sample variances for the training data. (c) Sample variance for the test data. (d) Sample variances for the test data.
Clearly, from Figures 8 and 9, the sample means and sample variances of (ii) and (iii) are lower than those of (i) for any value of the loss parameter. This simulation shows that it is best to use (ii) as the prediction. Other simulation results are listed in Appendix 3.
Appendix 1.
Calculation of the expected value and variance of the loss
A.1. Expected value of the loss
Put . Then,
Replace z with bz to get
When $c \ge 0$, we have
When c<0, we have
From the above, for any $c$, we have
Now set to get
Therefore, for any $c$, we have
A.2. Variance of the loss
Put . Then,
Replace z with bz to get
When $c \ge 0$, we have
When c<0, we have
From the above, for any $c$, we have
Now set to get
Therefore, for any $c$, we have
Moreover, from Appendix A.1, we have
Therefore, for any $c$, we have
Appendix 2. Inequalities for the gamma and the incomplete gamma functions
Here, we prove the following lemma used in Theorem 3.1.
Lemma A.1
For a>0 and
To prove Lemma A.1, we need the following three lemmas:
Lemma A.2
For
Lemma A.3
For a>0 and
Lemma A.4
For a>0 and
Proof of Lemma A.2 —
First, we prove
(A1)
Let . Accordingly, we have
Therefore, we obtain
Thus, Equation (A1) is proved.
Next, by using Equation (A1), we prove
(A2)
for a>0. Let
To prove for a>0, we use the following formula [2, p. 13, Theorem 1.2.5]:
where $\gamma$ denotes Euler's constant. Taking the logarithmic derivative and using the above formula, we have
for a>0. Moreover, from Equation (A1), we obtain for a>0. This leads to for a>0. Equation (A2) follows from this and .
Using Equation (A2) and the formula [2, p. 22, Theorem 6.5.1]
we complete the proof of Lemma A.2 as follows:
Proof of Lemma A.3 —
For a>0 and , we define
Then, we have
The lemma follows from this and .
Proof of Lemma A.4 —
When , the statement easily follows from the definition of . When b>0, we use L'Hôpital's rule to obtain
Proof of Lemma A.1 —
For a>0 and , we define
Let us prove (a>0, x>0). For a>0 and , we define
Then, we have
From these relations, we find that the signs of and (i = 1, 2, 3) are the same for a>0 and . Let (i = 2, 3, 4) be the value of x satisfying . It is easily verified that and for a>0. Therefore, from the first derivative test, we obtain Tables A1 and A2. Moreover, using Lemmas A.3 and A.4 and L'Hôpital's rule, we obtain
From these results, Lemma A.2, and the fact that the signs of and (i = 1, 2, 3) are the same for a>0 and x>0, we obtain Tables A3 and A4. From Tables A3 and A4, we can verify that the claimed inequality holds for $a > 0$ and $x > 0$. This completes the proof of the lemma.
Table A1. Sign table for the case $0 < a < 1$.
Table A2. Sign table for the case $a \ge 1$.
Table A3. Sign table for the case $0 < a < 1$.
Table A4. Sign table for the case $a \ge 1$.
Appendix 3. Other simulations
We conducted three other simulations of our method using the same actual data, i.e. the ‘cars’ data of the R datasets package, which include the speeds and stopping distances of automobiles.
We separated the dataset into training and test datasets as follows: the even-numbered observations were selected as the training dataset, and the odd-numbered observations were used as the test dataset. The regression coefficients of $y = ax + b$ were estimated by the least squares method, and the unbiased sample variance of the error was 159.31. Fixing the coefficients of the asymmetric loss as before, we obtained $C = 16.99$ by Equation (6) and the corresponding estimated solution to the minimization problem. Figure A1 plots (i) the least squares prediction, (ii) the least squares prediction corrected by $C$, and (iii) the prediction obtained by directly minimizing the empirical asymmetric loss.
Figure A.1.
Scatter plots and plots of (i), (ii), and (iii). (a) Scatter plot for the training data and plots of (i), (ii), and (iii). (b) Scatter plot for the test data and plots of (i), (ii), and (iii).
Table A5 lists the sample means and sample variances of the loss for the training and test data under the fixed loss coefficients.
Table A5. Sample means and sample variances of the loss for the training and test data under the fixed loss coefficients.
| Prediction | Mean (training) | Mean (test) | Variance (training) | Variance (test) |
|---|---|---|---|---|
| (i) | 31.42 | 42.54 | 1307.61 | 4065.98 |
| (ii) | 20.04 | 31.74 | 216.34 | 1836.93 |
| (iii) | 19.56 | 30.40 | 169.21 | 1540.48 |
Figure A2 plots the sample means of the loss for the training and test data for (i), (ii), and (iii) with respect to the loss parameter.
Figure A.2.
Plots of sample means for the training and the test data. (a) Sample means for the training data. (b) Sample means for the test data.
Moreover, Figure A3 plots the sample variances of the loss for the training and test data for (i), (ii), and (iii) with respect to the loss parameter.
Figure A.3.
Plots of sample variances for the training and the test data. (a) Sample variance for the training data. (b) Sample variances for the training data. (c) Sample variance for the test data. (d) Sample variances for the test data.
Next, we separated the cars dataset into training and test datasets as follows: the 1st to 25th observations were selected as the training dataset (the observations in the cars dataset are arranged in ascending order of speed), and the 26th to 50th observations were used as the test dataset. The regression coefficients of $y = ax + b$ were estimated by the least squares method, and the unbiased sample variance of the error was 178.77. Figure A4 plots the sample means of the loss for the training and test data for (i), (ii), and (iii) with respect to the loss parameter.
Figure A.4.
Plots of sample means for the training and test data. (a) Sample means for the training data. (b) Sample means for the test data.
Moreover, Figure A5 plots the sample variances of the loss for the training and test data for (i), (ii), and (iii) with respect to the loss parameter.
Figure A.5.
Plots of sample variances for the training and test data. (a) Sample variance for the training data. (b) Sample variances for the training data. (c) Sample variance for the test data. (d) Sample variances for the test data.
Finally, we separated the cars data into training and test datasets as follows: the 26th to 50th observations were selected as the training dataset, and the 1st to 25th observations were used as the test dataset. The regression coefficients of $y = ax + b$ were estimated by the least squares method, and the unbiased sample variance of the error was 277.30. Figure A6 plots the sample means of the loss for the training and test data for (i), (ii), and (iii) with respect to the loss parameter.
Figure A.6.
Plots of sample means for the training and the test data. (a) Sample means for the training data. (b) Sample means for the test data.
Moreover, Figure A7 plots the sample variances of the loss for the training and test data for (i), (ii), and (iii) with respect to the loss parameter.
Figure A.7.
Plots of sample variances for the training and test data. (a) Sample variance for the training data. (b) Sample variances for the training data. (c) Sample variance for the test data. (d) Sample variances for the test data.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Aldrich J., Doing least squares: Perspectives from Gauss and Yule, Int. Stat. Rev. 66 (1998), pp. 61–81. Available at https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1751-5823.1998.tb00406.x.
- 2. Andrews G.E., Askey R., and Roy R., Special Functions, Encyclopedia of Mathematics and its Applications, Cambridge University Press, New York, 1999.
- 3. Breckling J. and Chambers R., M-quantiles, Biometrika 75 (1988), pp. 761–771. Available at http://www.jstor.org/stable/2336317. doi: 10.1093/biomet/75.4.761
- 4. Dytso A., Bustin R., Poor H.V., and Shamai S., Analytical properties of generalized Gaussian distributions, J. Stat. Distrib. Appl. 5 (2018), p. 6. doi: 10.1186/s40488-018-0088-5
- 5. Efron B., Regression percentiles using asymmetric squared error loss, Stat. Sin. 1 (1991), pp. 93–125. Available at http://www.jstor.org/stable/24303995.
- 6. Guidotti R., Monreale A., Turini F., Pedreschi D., and Giannotti F., A survey of methods for explaining black box models, ACM Comput. Surv. 51 (2018), pp. 93:1–93:42.
- 7. Johnson N., Kotz S., and Balakrishnan N., Continuous Univariate Distributions, 2nd ed., Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, Vol. 1, Wiley-Interscience, 1994.
- 8. Johnson N., Kotz S., and Balakrishnan N., Continuous Univariate Distributions, 2nd ed., Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, Vol. 2, Wiley-Interscience, 1995.
- 9. Koenker R. and Bassett G., Regression quantiles, Econometrica 46 (1978), pp. 33–50. Available at http://www.jstor.org/stable/1913643. doi: 10.2307/1913643
- 10. Legendre A., Nouvelles méthodes pour la détermination des orbites des comètes, Nineteenth Century Collections Online (NCCO): Science, Technology, and Medicine: 1780–1925, F. Didot, 1805. Available at https://books.google.co.jp/books?id=FRcOAAAAQAAJ.
- 11. Nadarajah S., A generalized normal distribution, J. Appl. Stat. 32 (2005), pp. 685–694. doi: 10.1080/02664760500079464
- 12. Stigler S.M., Gauss and the invention of least squares, Ann. Statist. 9 (1981), pp. 465–474. doi: 10.1214/aos/1176345451
- 13. Subbotin T., On the law of frequency of error, Recueil Math. 31 (1923), pp. 296–301.
- 14. Wang Z.X. and Guo D.R., Special Functions, World Scientific, 1989. Available at https://www.worldscientific.com/doi/abs/10.1142/0653.
- 15. Yamaguchi N., Hori M., and Ideguchi Y., Minimising the expectation value of the procurement cost in electricity markets based on the prediction error of energy consumption, Pac. J. Math. Ind. 10 (2018), p. 4. doi: 10.1186/s40736-018-0038-7
- 16. Zellner A., Bayesian estimation and prediction using asymmetric loss functions, J. Am. Stat. Assoc. 81 (1986), pp. 446–451. Available at http://www.jstor.org/stable/2289234. doi: 10.1080/01621459.1986.10478289