Dichotomous unimodal compound models: application to the distribution of insurance losses

Salvatore D Tomarchio; Antonio Punzo

doi:10.1080/02664763.2020.1789076

. 2020 Jul 7;47(13-15):2328–2353. doi: 10.1080/02664763.2020.1789076

Dichotomous unimodal compound models: application to the distribution of insurance losses

Salvatore D Tomarchio ^1,^CONTACT, Antonio Punzo ¹

PMCID: PMC9041621 PMID: 35707426

ABSTRACT

A correct modelization of the insurance losses distribution is crucial in the insurance industry. This distribution is generally highly positively skewed, unimodal hump-shaped, and with a heavy right tail. Compound models are a profitable way to accommodate situations in which some of the probability masses are shifted to the tails of the distribution. Therefore, in this work, a general approach to compound unimodal hump-shaped distributions with a mixing dichotomous distribution is introduced. A 2-parameter unimodal hump-shaped distribution, defined on a positive support, is considered and reparametrized with respect to the mode and to another parameter related to the distribution variability. The compound is performed by scaling the latter parameter by means of a dichotomous mixing distribution that governs the tail behavior of the resulting model. The proposed model can also allow for automatic detection of typical and atypical losses via a simple procedure based on maximum a posteriori probabilities. Unimodal gamma and log-normal are considered as examples of unimodal hump-shaped distributions. The resulting models are firstly evaluated in a sensitivity study and then fitted to two real insurance loss datasets, along with several well-known competitors. Likelihood-based information criteria and risk measures are used to compare the models.

Keywords: Insurance losses, compound model, unimodal gamma distribution, log-normal distribution, value at risk, tail value at risk

2010 Mathematics Subject Classifications: 62F99, 62P05, 62H30

1. Introduction

It is crucial in insurance business to find adequate models for loss data, in order to correctly compute premiums, risk measures and the required reserves. Unfortunately, modeling insurance losses is not an easy task because of the distinctive characteristics of their distribution. As widely documented, the loss distribution is unimodal hump-shaped, highly positively skewed and with a heavy right tail [1,3,33,37].

Among the different approaches, the parametric one has been the most followed in the actuarial literature. Some authors argue that observed losses can be described by a single probability distribution, such as the log-normal [10,14], the gamma or the Pareto distribution [15,46]. As pointed out by Cooray and Ananda [18], the Pareto distribution, due to the monotonically decreasing shape of the density, does not provide a reasonable fit when the density of data is hump-shaped. In these cases, the log-normal distribution has been typically used, but it fades away to zero more quickly than the Pareto distribution. This implies that the log-normal model fails to cover the higher losses. Some models have been proposed to solve this issue in the actuarial literature [see, e.g., [1,18,47,50]]. Alternative models that have been increasingly discussed in finance and actuarial sciences are based on the skew-normal, skew-t or skew-logistic [2,26,38]. However, the use of distributions defined on the whole real line is not adequate, because of the allocation of probability mass to unrealistic negative losses (the so-called boundary bias problem [17]).

In Section 2, we propose a compound approach that allows to consider all the peculiarities of the loss data until here discussed. The underlying idea is derived from one of the most famous compound models, the normal scale mixture [45], in which the variability-related parameter of the normal distribution is scaled by a convenient mixing positive random variable. In detail, we consider a 2-parameter unimodal hump-shaped distribution, defined on a positive support and reparametrized with respect to the mode $θ > 0$ and to another parameter $γ > 0$ that is strictly related to the distribution variability. The γ parameter is then scaled by a dichotomous mixing variable that depends on a vector of parameters $ν$ governing the tails behavior. The resulting model can be seen as a 2-component contaminated model [44,48,49] in which one component, often called ‘contaminant’, is an inflated version of the other, herein called ‘target’ [19], and allows a more flexible accommodation of the outlying observations. Additionally, since both components have the same mode, our model guarantees unimodality in θ.

The proposed model can also allow for automatic detection of atypical losses via a simple procedure based on maximum a posteriori probabilities [4]. Such a detection rule allows the partition of the positive real line in two regions [23,36] that could be used to identify different categories of losses. Indeed, their classification can be of interest for insurance companies in tuning premiums and credit scores [39,62].

A drawback of the proposed model is that when extremely large losses need to be accounted for, this makes also heavier its left tail, rising the probability of losses close to zero. However, this is a minor problem for at least two reasons: (1) because of its distinctive characteristics, the loss distribution has a very short left tail that could be considered negligible; (2) risk managers are mainly interested in a good description of the right tail, because large losses, though rare in frequency, are the ones that have the most impact on the financial stability of insurance companies [8].

For illustration purposes, two examples of unimodal hump-shaped distributions are examined in Section 2. Parameter estimation via the maximum likelihood (ML) approach is discussed in Section 3, while model selection and computational aspects are considered in Section 4. A sensitivity analysis is described in Section 5, where the robustness of the ML estimator for our models is investigated. The application of our models to two real insurance loss datasets, along with other well-known competitors, is discussed in Section 6. Finally, some conclusions, with possible future extensions, are drawn in Section 7.

2. Methodology

2.1. A general framework

Let X be the positive random variable, denoting the insurance loss, with probability density function (pdf) $p (x)$ . Requiring that $p (x)$ should be unimodal hump-shaped and positively skewed, the general unimodal compound model proposed by Punzo et al. [50] has the pdf

p (x; θ, γ, ν) = \int_{0}^{\infty} f (x; θ, γ / w) h (w; ν) d w, x > 0,

(1)

where $f (x; θ, γ)$ is the unimodal hump-shaped pdf, with $θ > 0$ being the mode and $γ > 0$ governing the concentration of f around the mode, and $h (w; ν)$ is the mixing probability density or mass function that depends on the vector of parameters $ν$ . Note that, if W (random counterpart of w) is degenerate in 1 (i.e. $W \equiv 1$ ), then $f (x; θ, γ)$ is obtained, whereas in the other cases the tails of p are heavier than those of f.

2.2. Dichotomous unimodal compound models

Although only the case of a continuous mixing random variable W on $(0, \infty)$ is considered in Punzo et al. [50], a simple but effective special case of the compound model (1) is obtained if we consider

W = {\begin{cases} 1 & with probability π \\ 1 / η & with probability 1 - π, \end{cases}

(2)

where $π \in (0, 1)$ and $η > 1$ . The probability mass function (pmf) of W in (2) is

h (w; π, η) = π^{(w - 1 / η) / (1 - 1 / η)} {(1 - π)}^{(1 - w) / (1 - 1 / η)}, w \in {1, 1 / η},

with $(π, η)^{'} = ν$ . So, model (1) can be written as

p (x; θ, γ, η, π) = π f (x; θ, γ) + (1 - π) f (x; θ, η γ), x > 0,

(3)

Therefore, model (3) can be seen as a contaminated model, in which the contaminant distribution $f (x; θ, η γ)$ is an inflated version of target one $f (x; θ, γ)$ . As a result, atypical losses can be modeled in a better way. Specifically, atypical losses are defined with respect to the target distribution as points producing an overall density that is too heavy-tailed to be modeled by the target distribution only. From this point of view, it is natural to require, as often happens in robust statistics, that half of the losses are typical [16,51,59]; this is the reason why we will consider $π \in (0.5, 1)$ hereafter. Being a member of the family of compound unimodal models in (1), $p (x; θ, γ, η, π)$ will have heavier tails with respect to $f (x; θ, γ)$ . As a special case of (3), when $π \to 1^{-}$ and $η \to 1^{+}$ , we obtain $f (x; θ, γ)$ .

Differently from Punzo et al. [50], with respect to $f (x; θ, γ)$ , the additional parameters π and η have an interpretation of practical interest:

π is the proportion of points from the target distribution; in other words, it can be meant as the proportion of typical losses.
η denotes the degree of contamination and, because of the assumption $η > 1$ , it can be meant as the increase in variability due to the points which do not come from the target distribution, i.e. due to the presence of either an excessive number of losses close to zero or to excessively large losses. Therefore, it is an inflation parameter.

It is worth to be mentioned that model (3) assumes that the target and the contaminant distributions have the same pdf. Although this assumption may seem to be restrictive, it allows the target and the contaminant distributions to be comparable not only in terms of the mode θ, but also in terms of the variability parameter γ, so that η effectively acts as an inflation parameter.

Another peculiar characteristic of model (3) is that, once the parameters are estimated (marked with a ‘hat’ in the following), it is possible to determine whether a generic loss x is typical via the a posteriori probability

v (x; \hat{θ}, \hat{γ}, \hat{η}, \hat{π}) = \frac{\hat{π} f (x; \hat{θ}, \hat{γ})}{p (x; \hat{θ}, \hat{γ}, \hat{η}, \hat{π})} .

(4)

In detail, x is considered typical if $v (x; \hat{θ}, \hat{γ}, \hat{η}, \hat{π}) > 0.5$ , while it is considered atypical otherwise. Such a decision rule (also called ‘dichotomizer’) can be equivalently defined in terms of the discriminant functions

D_{typical} (x; \hat{θ}, \hat{γ}, \hat{π}) = \hat{π} f (x; \hat{θ}, \hat{γ})

and

D_{atypical} (x; \hat{θ}, \hat{γ}, \hat{η}, \hat{π}) = (1 - \hat{π}) f (x; \hat{θ}, \hat{η} \hat{γ}),

such that x is classified as typical if

D_{typical} (x; \hat{θ}, \hat{γ}, \hat{π}) > D_{atypical} (x; \hat{θ}, \hat{γ}, \hat{η}, \hat{π}),

(5)

and atypical otherwise [23,36]. By solving (5) as a function of x, the positive real line is partitioned in two regions of typical and atypical data, delimited by the intersection points between the two discriminant functions. Indeed, these points represent the situation of maximum assignment uncertainty, where the probabilities to be a typical or an atypical point coincide. As better visualized in Section 6, the region of atypical data involves the two tails of the model, whereas the area between them entails the typical region. This might be useful to identify different categories of losses that could be classified as atypically low (left tail), typical (center) and atypically high (right tail) with respect to the target distribution.

Among the existing 2-parameter unimodal hump-shaped distributions that can be used for f, log-normal and unimodal gamma will be treated in the next paragraphs for illustrative purposes.

2.3. Specific cases

2.3.1. Mode-parametrized log-normal distribution

The pdf of a log-normal (LN) distribution with the standard parametrization is given by

f (x; μ, σ) = \frac{e^{- {(\ln (x) - μ)}^{2} / 2 σ^{2}}}{\sqrt{2 π} σ x}, x > 0,

where $μ \in R$ and $σ > 0$ are the mean and the standard deviation of the variable's natural logarithm, respectively.

With the purpose of having a distribution that can be inserted in model (3), also in this case, a reparametrization is needed. Imposing

\begin{array}{ll} μ = \ln (θ) + γ \\ σ^{2} = γ \end{array} \Rightarrow \begin{array}{ll} θ = e^{μ - σ^{2}} \\ γ = σ^{2} \end{array},

the pdf becomes

f (x; θ, γ) = \frac{e^{- {(\ln (x) - \ln (θ) - γ)}^{2} / 2 γ}}{\sqrt{2 π γ} x}, x > 0,

(6)

with $θ > 0$ and $γ > 0$ . The variance of a random variable with pdf (6) is

(e^{γ} - 1) θ^{2} e^{3 γ} .

(7)

For a fixed θ in (7), the variance rises if γ increases, confirming that γ governs the variability of the distribution, as illustrated in Figure 1.

When the LN distribution is chosen in (3), the LN dichotomous unimodal compound model (LN-DUC) is obtained. In this case, the intersection points between the discriminant functions, delimiting the typical and the atypical regions, are

x_{1} = θ e^{- \sqrt{γ (η - 1) η [γ (η - 1) - 2 \ln (1 - α) + 2 \ln (α) + \ln (η)]} / (η - 1)},

with $x_{1} \in (0, θ)$ , and

x_{2} = θ e^{\sqrt{γ (η - 1) η [γ (η - 1) - 2 \ln (1 - α) + 2 \ln (α) + \ln (η)]} / (η - 1)},

with $x_{2} \in (θ, \infty)$ .

2.3.2. Mode-parametrized unimodal gamma distribution

The pdf of a unimodal hump-shaped gamma (UG) distribution with the standard parametrization is

f (x; α, β) = \frac{x^{α - 1} e^{- x / β}}{β^{α} Γ (α)}, x > 0,

with shape parameter $α > 1$ and scale parameter $β > 0$ . In order to have a distribution that can be inserted in model (3), a reparametrization is needed. Setting

\begin{array}{ll} α = \frac{θ}{γ} + 1 \\ β = γ \end{array} \Rightarrow \begin{array}{ll} θ = β (α - 1) \\ γ = β \end{array},

we obtain

f (x; θ, γ) = \frac{x^{θ / γ} e^{- x / γ}}{γ^{(θ / γ) + 1} Γ (\frac{θ}{γ} + 1)}, x > 0,

(8)

with $θ > 0$ and $γ > 0$ . The variance of a random variable with pdf (8) is

γ^{2} + θ γ .

(9)

Fixing θ in (9), the variance increases if γ increases, confirming that γ governs the variability of the distribution, as shown in Figure 2.

When the UG distribution is chosen in (3), the UG dichotomous unimodal compound model (UG-DUC) is obtained. Recovering a closed-form expression for the intersection points between the discriminant functions is analytically cumbersome. However, they can be easily obtained numerically by using, for instance, the uniroot.all() function of the rootSolve package [56] in R [52].

3. Maximum-likelihood estimation

To find the estimates of the parameters for our model, we used the maximum-likelihood (ML) approach. Given a random sample $x_{1}, \dots, x_{n}$ of size n from the pdf in (3), the corresponding log-likelihood function is

l (θ, γ, η, π) = \sum_{i = 1}^{n} \ln [p (x_{i}; θ, γ, η, π)] .

(10)

Operationally, maximization of (10) with respect to $(θ, γ, η, π)^{'}$ is obtained by the general-purpose numerical optimizer optim(), included in the stats package. The BFGS algorithm, passed to optim() via the argument method, is used for maximization.

To run the optimization algorithm, the starting values for the parameters must be specified. A system of two equations is solved to initialize θ and γ. The first equation matches the empirical and the theoretical modes. The second equation is model-dependent. Specifically, it matches the empirical and the theoretical variances, for the UG-DUC model, whereas the empirical and the theoretical means are equaled for the LN-DUC model. As already said in Section 2.2, when $π \to 1^{-}$ and $η \to 1^{+}$ , we obtain the target distribution $f (x; θ, γ)$ . For this reason, in our analyses we set $π_{0} = 0.99$ and $η_{0} = 1.01$ as starting values for π and η. From an operational point of view, thanks to the monotonicity property of the BFGS algorithm, this guarantees that the log-likelihood of model (3) will be always greater than, or equal to, the log-likelihood of the target distribution. This is an important consideration for choosing between the target distribution and its corresponding dichotomous unimodal compound version, when using likelihood-based model selection criteria.

All the parameters involved are subject to constraints, and to make the maximization of $l (θ, γ, η, π)$ unconstrained, as required by the BFGS algorithm, a transformation/back-transformation approach has been implemented [6,64]. Specifically, the original constrained parameters are mapped to unconstrained real values (marked with a ‘tilde’ in the following) and, after the log-likelihood is maximized with respect to the unconstrained parameters, a back-transformation is applied to obtain the constrained parameter estimates. The following transformations/back-transformations are used:

\tilde{θ} = \ln (θ) \leftrightarrow θ = \exp (\tilde{θ}), \tilde{γ} = \ln (γ) \leftrightarrow γ = \exp (\tilde{γ}),

(11)

\begin{aligned} \tilde{η} & = \ln (η - 1) \leftrightarrow η = \exp (\tilde{η}) + 1, \end{aligned}

(12)

\begin{aligned} \tilde{π} & = \ln (\frac{ϕ}{1 - ϕ}) \leftrightarrow π = \frac{0.5 + \exp (\tilde{π})}{1 + \exp (\tilde{π})}, \end{aligned}

(13)

where $ϕ = (π - 0.5) / 0.5$ .

An alternative theoretical way to obtain the ML estimates of model (3) could be the implementation of the well-known expectation-maximization (EM) algorithm [21]. However, closed-form expressions are not available for most of the parameters involved in the M-step of the algorithm; for these parameters, numerical methods, such as the BFGS algorithm mentioned above, must be used in this case too. Therefore, there is no need to derive and code any sophisticated algorithm when the log-likelihood can be directly maximized by simply using a general-purpose numerical optimizer (more on this topic can be found in [42]).

4. Computational and operative aspects

4.1. Model comparison

Several measures are adopted to compare the fitted models. In Section 4.1.1, we discuss on how a global goodness of fit evaluation is done. We then focus on the right tail goodness of fit in Section 4.1.2 because, as pointed out in Section 1, it is of particular interest for risk managers.

4.1.1. Global fit evaluation

We begin with a likelihood-ratio (LR) test, that can be used to compare the goodness of fit of two models, one of which (the null model) is a special case of the other (the alternative model). When the null model is the UG distribution, then the alternative model is the UG-DUC, while when the null model is the LN distribution, then the alternative model is the LN-DUC. Under the null hypothesis of no improvement, the test statistic is

LR = 2 [l ({\hat{δ}}_{1}) - l ({\hat{δ}}_{0})],

where ${\hat{δ}}_{1}$ and ${\hat{δ}}_{0}$ are the parameter vectors of the alternative and null models, respectively, and $l ({\hat{δ}}_{1})$ and $l ({\hat{δ}}_{0})$ are the maximized log-likelihood values of the alternative and null models, respectively. Unfortunately, regularity conditions do not hold for mixture-based models, and the LR statistic has not its usual asymptotic null distribution of a $χ^{2}$ random variable with m degrees of freedom, where $m \in N_{+}$ is the difference between the number of estimated parameters of the alternative and the null models. To overcome this issue, under the same null and alternative hypotheses, the following parametric double bootstrap procedure is implemented [43,45]:

fit the null and alternative models to the sample and compute the LR statistic, say ${LR}_{obs}$ ;
for $j = 1, \dots, B_{1}$
1. generate a (first-level) bootstrap sample, of size n, from the null model fitted to the sample;
2. fit the null and the alternative models and compute the first-level bootstrap LR statistic, say ${LR}_{j}^{*}$ ;
3. for $l = 1, \dots, B_{2}$
  1. generate a (second-level) bootstrap sample, of size n, from the null model fitted to the (first-level) bootstrap sample;
  2. fit the null and the alternative models and compute the second-level bootstrap LR statistic, say ${LR}_{j l}^{* *}$ ;
4. compute the second-level bootstrap p-value as
  $p_{j}^{* *} = \frac{1}{B_{2}} \sum_{l = 1}^{B_{2}} I ({LR}_{j l}^{* *} > {LR}_{j}^{*}),$
  where $I (\cdot)$ denotes the indicator function, which is equal to 1 when its argument is true and 0 otherwise;
calculate the first-level bootstrap p-value as
$p^{*} = \frac{1}{B_{1}} \sum_{j = 1}^{B_{1}} I ({LR}_{j}^{*} > {LR}_{obs});$
calculate the double bootstrap p-value as the proportion of the $p_{j}^{* *}$ that are more extreme than $p^{*}$ , i.e.
$p = \frac{1}{B_{1}} \sum_{j = 1}^{B_{1}} I (p_{j}^{* *} < p^{*}) .$

The double bootstrap procedure reduces the bias in the bootstrap estimates obtained from the first level, but is computationally demanding, as we need to calculate $1 + B_{1} + (B_{1} \times B_{2})$ test statistics. In our applications, we set $B_{1} = 500$ and $B_{2} = 250$ , producing a total of 125,501 estimates. The double bootstrap p-value is compared with the 0.05 significance level.

Besides comparing our models with the corresponding target distributions, two well-known (likelihood-based) information criteria, the AIC [5] and BIC [54], are used to make comparisons with some benchmark distributions. This provides an overall measure of goodness fit of the models, and allow to draw up a ranking. These information criteria are defined as

AIC = 2 l (\hat{δ}) - 2 k and BIC = 2 l (\hat{δ}) - k \ln (n),

(14)

where $l (\hat{δ})$ is the maximized log-likelihood value, k is the number of parameters of the model and n is the sample size. The BIC penalizes model complexity more than the AIC. With the formulation in (14), the higher is the value of these two information criteria, the better is the model.

4.1.2. Right tail fit evaluation

In the actuarial literature (see, e.g. [1,9,25,38,50]), a standard procedure used to assess the estimated tail behavior consists in comparing empirical risk measures, namely the value at risk (VaR) and tail value at risk (TVaR), with the same measures produced by the fitted models. These risk measures are related to the quantiles of a distribution; therefore, the closer the estimated risk measures are to the empirical ones, the better the fitting on the tail is. From a parametric point of view, a proper tail estimation can be important for computing tail probabilities of values that are outside the range of the observed data [7,12]. More generally, it allows the possibility to simulate and assess extreme scenarios that are not limited to those occurred in the sample [11,22]. Therefore, it is important to have a model that resembles the tail behavior of the empirical distribution, but permits to go beyond it. The VaR represents the maximum loss which can occur with probability c over a specified period of time, and it is defined as

{VaR}_{c} (X) = inf {x : F (x) \geq c}, 0 \leq c \leq 1,

(15)

where F is the cumulative distribution function of X. The TVaR quantifies the expected value of the loss given that an event outside a given probability level has occurred. It is defined as

{TVaR}_{c} (X) = E [X | X \geq {VaR}_{c} (X)], 0 \leq c \leq 1.

(16)

If the underlying distribution for X is continuous, then the TVaR is equivalent to the expected shortfall (ES). An alternative formulation, that will be useful in the following, expresses the TVaR in terms of the VaR as

{TVaR}_{c} (X) = \frac{1}{1 - c} \int_{c}^{1} {VaR}_{u} (X) d u .

(17)

In the following we will consider the probability levels c = 0.95 and c = 0.99 for both risk measures.

To evaluate the goodness of the VaR and TVaR estimates produced by the competing models, two backtesting procedures are also implemented. For the VaR, a binomial test examines, under the null hypothesis, if the proportion of violations $\hat{ρ}$ obtained using the estimates of the VaR ( $\hat{ρ} = y / n$ , where y is the number of losses exceeding the estimated VaR and n is the sample size), is compatible with the one expected $ρ = (1 - c)$ [40]. The test can be performed via the VaRTest() function of the rugarch package [34]. For the TVaR, we implemented the backtest suggested by Emmer et al. [28], that it based on a simple approximation of the TVaR representation in (17). Specifically, given a ${TVaR}_{c} (X)$ , they suggest to compute and backtest the VaR at the following four levels: ${VaR}_{c} (X)$ , ${VaR}_{0.75 c + 0.25} (X)$ , ${VaR}_{0.5 c + 0.5} (X)$ and ${VaR}_{0.25 c + 0.75} (X)$ . If all the four backtests are not rejected, then the estimate of TVaR can be considered acceptable. As a consequence, the minimum among the four p-values is enough to decide whether the TVaR estimate has to be discarded; such a minimum will be reported in the analyses of Section 6.

Lastly, both backtesting procedures are compared with the 0.05 significance level.

4.2. Competing models and approaches

The proposed models are compared to a lot of standard distributions used in the actuarial literature, and whose parameters are estimated by using the ML approach. They are listed in Section 4.2.1. In addition to the ML approach, two further methodologies are considered in Section 4.2.2 and Section 4.2.3, respectively.

4.2.1. ML-estimated competitors

Table 1 shows the competing distributions, along with the R functions and packages used to fit them to the data.

Table 1. R functions and packages used for the ML-based competitors.

Distribution	Function	Package
Exponential	`fitdistr()`	MASS [60]
Weibull	`fitdist()`	fitdistrplus [20]
Normal	`fitdist()`	fitdistrplus
Logistic	`fitdist()`	fitdistrplus
Skew-logistic	`glogisfit()`	glogis [63]
Skew-Normal	`snormFit()`	fGarch [61]
Skew-t	`sstdFit()`	fGarch
Hyperbolic	`hyperbFit()`	HyperbolicDist [55]

Open in a new tab

Clearly, the UG and LN distributions discussed in Section 2.3.1 and Section 2.3.2, are also considered. We implemented a convenient code to find the ML estimates of the parameters of these distributions, as done for our models. Specifically, the θ and γ parameters are estimated via the optim() function, by using the same strategy explained in Section 3.

4.2.2. The t-score approach

The t-score moment estimator has been proposed and discussed by Fabián [29–31]. When heavy-tailed distributions are considered, it has been shown to be robust against outliers [58]. Additionally, although the usual moments of many heavy-tailed distributions do not exist, or do exist only within a certain range of parameters, the existence of the corresponding t-score moments is guaranteed [30].

In the insurance framework this estimator has been considered by Stehlík et al. [57,58]. Specifically, the authors discussed the t-score moment estimator for the Pareto distribution, both in its ‘American’ and ‘European’ parametrization. The main difference between these two parametrizations is that the former starts in zero, whereas the latter begins in a threshold that either should be carefully estimated [57,58] or is assumed to be known [53]. We tried both versions in our applications, but the risk measures obtained with the ‘European’ parametrization were far from their empirical counterparts, suggesting that this parametrization is not adequate for modeling our data. On the contrary, the estimated risk measures obtained with the ‘American’ parametrization are close to the empirical values, and for this reason only this version of the Pareto distribution is considered hereafter. Specifically, it has pdf $α λ^{α} / (x + λ)^{α + 1}$ , with $α > 0$ and $λ > 0$ . According to Fabián [30], its parameters are estimated by solving the system of the following two equations:

\begin{aligned} \sum_{i = 1}^{n} \frac{α x_{i} - λ}{x_{i} + λ} = 0 \end{aligned}

(18)

\begin{aligned} \frac{1}{n} \sum_{i = 1}^{n} {(\frac{α x_{i} - λ}{x_{i} + λ})}^{2} = \frac{α}{α + 2} \end{aligned}

(19)

Therefore, by using a generalized method of the moments approach, Equations (18) and (19) match the theoretical t-score moments to their empirical counterparts.

4.2.3. The ${PORT-MO}_{p}$ approach

In the field of extreme value theory, the ${PORT-MO}_{p}$ estimator has been recently introduced by Gomes et al. [35]. This estimator has been used for VaR estimation by Figueiredo et al. [32]. Specifically, given a random sample $x_{1}, \dots, x_{n}$ , and the associated sample of ascending order statistics $x_{1 : n} \leq \dots \leq x_{n : n}$ , the ${PORT-MO}_{p}$ VaR can be computed as

{PORT-MO}_{p} VaR (k; p, s) = (x_{n - k : n} - x_{n_{s} : n}) {(\frac{k}{n c})}^{H_{k} (p, s)} + x_{n_{s} : n},

where $n_{s} := ⌊ n s ⌋ + 1$ , with $⌊ \cdot ⌋$ denoting the floor function, $H_{k} (p, s)$ is the ${PORT-MO}_{p}$ estimator, c is the probability level for compute the VaR and ${k, p, s}$ are the tuning parameters that must be estimated. The tuning parameters are estimated by combining the bootstrap scheme in Longin [41] with the double bootstrap algorithm discussed by Brilhante et al. [13]. Notice that, due to the specificity of the underlying concepts and the length of the whole procedure, its detailed explanation goes far beyond the scope of this manuscript. We simply mention the references and the execution order of the steps, by commenting only those where an arbitrary choice is required. In detail, the following scheme is implemented:

Steps 1 to 2.1 of Longin [41], page 130;
1. In step 1, we set Q to be the sequence of values from 0 to 1, with increments of 0.1.
Steps 1 to 16 of Brilhante et al. [13], pages 527–528;
1. In step 5, we set b to be the sequence of values from 0.925 to 0.995, with increments of 0.01.
Steps 3 to 5 of Longin [41], page 130.

Also in this case, we are in the presence of a reasonably sophisticated and time-consuming procedure, given the double bootstrap algorithm.

5. Simulation study

The aim of this section is to evaluate how the ML estimator for the θ and γ parameters of the UG and LN distributions is affected by atypical observations, and how its robustness is increased when the corresponding UG-DUC and LN-DUC models are considered. The underlying idea is borrowed by Stehlík et al. [58], where a simulation study shows how the ML estimator for the shape parameter of a Pareto distribution is affected when atypical observations are added to the data. With similar purposes we conduct two sensitivity analyses that differ depending on how the data are contaminated. In the first analysis, the following scenario is considered:

generate $π n$ typical observations from a UG distribution;
generate $(1 - π) n$ atypical observations from the same UG distribution, with the only difference that the γ-parameter is multiplied by an inflation factor η;
fit both the UG and UG-DUC to the merged data.

The same procedure is repeated by substituting the UG with the LN distribution and the UG-DUC model with the LN-DUC one. Thus, in the first analysis, typical and atypical points come from a distribution of the same type, as assumed in model (3).

In the second analysis, the following scenario is examined instead:

generate $π n$ typical observations from a UG distribution;
generate $(1 - π) n$ atypical observations from a LN distribution, with the same $θ^{*} = θ$ of the UG distribution and with variability parameter $γ^{*}$ ;
fit the UG and UG-DUC models to the merged data.

Again, this procedure is repeated for the complementary case where, in steps 1 and 3 the UG and UG-DUC models are substituted by the LN and LN-DUC models, respectively, and the UG replaces the LN distribution in step 2. In this second study, the atypical points come from a distribution of a different type with respect to the one generating the typical points. Therefore, we are evaluating the estimation performances in presence of a misspecification situation also for model (3).

In each study, the sample size is set to n = 1000, and three different proportions of typical points π are combined with three levels of contamination, governed by η in the first analysis and by the variability parameter $γ^{*}$ of the contaminant distribution in the second analysis. This yields a total of nine different contamination cases. For each of them, 10,000 replications are considered; hence, a total of $2 \times 10, 000 \times 9 \times 2 = 360, 000$ samples are generated. The mean and the standard deviation (in brackets) of the estimated θ and γ, over these replications, are reported. For ease of understanding, each simulation scenario is identified according to the data generating process (DGP), labeled by matching with a ‘+’ the name of the distributions generating the typical and the atypical observations, respectively.

5.1. Sensitivity analysis I

The parameters for the UG+UG DGP are $θ = 1$ and $γ = 1$ , and the average estimated parameters are shown in Table 2. To be more precise, each subtable displays the ML estimates of θ and γ obtained by fitting the UG and UG-DUC models, respectively. On the contrary, the parameters used for the LN+LN DGP are $θ = 1$ and $γ = 0.5$ . The parameters estimated for the corresponding LN and LN-DUC models are reported in Table 3.

Table 2. UG+UG DGP: Average $\hat{θ}$ and $\hat{γ}$ values, with corresponding standard deviations in brackets, estimated over 10,000 replications by the UG (a) and UG-DUC (b) models.

		η
		2.5	3.75	5
(a)
π	0.9	$\hat{θ} = 0.97 (0.05)$	$\hat{θ} = 0.91 (0.06)$	$\hat{θ} = 0.83 (0.07)$
		$\hat{γ} = 1.18 (0.06)$	$\hat{γ} = 1.37 (0.08)$	$\hat{γ} = 1.57 (0.11)$
	0.8	$\hat{θ} = 0.94 (0.06)$	$\hat{θ} = 0.83 (0.07)$	$\hat{θ} = 0.67 (0.09)$
		$\hat{γ} = 1.36 (0.07)$	$\hat{γ} = 1.72 (0.10)$	$\hat{γ} = 2.13 (0.14)$
	0.7	$\hat{θ} = 0.92 (0.06)$	$\hat{θ} = 0.76 (0.08)$	$\hat{θ} = 0.54 (0.10)$
		$\hat{γ} = 1.53 (0.08)$	$\hat{γ} = 2.06 (0.12)$	$\hat{γ} = 2.66 (0.17)$
(b)
π	0.9	$\hat{θ} = 1.00 (0.05)$	$\hat{θ} = 1.00 (0.05)$	$\hat{θ} = 1.00 (0.05)$
		$\hat{γ} = 0.95 (0.13)$	$\hat{γ} = 0.98 (0.10)$	$\hat{γ} = 0.99 (0.08)$
	0.8	$\hat{θ} = 1.00 (0.05)$	$\hat{θ} = 1.00 (0.06)$	$\hat{θ} = 1.00 (0.06)$
		$\hat{γ} = 0.96 (0.14)$	$\hat{γ} = 0.99 (0.11)$	$\hat{γ} = 0.99 (0.09)$
	0.7	$\hat{θ} = 1.00 (0.06)$	$\hat{θ} = 1.00 (0.06)$	$\hat{θ} = 1.00 (0.06)$
		$\hat{γ} = 0.98 (0.16)$	$\hat{γ} = 0.99 (0.13)$	$\hat{γ} = 0.99 (0.12)$

Open in a new tab

Table 3. LN+LN DGP: Average $\hat{θ}$ and $\hat{γ}$ values, with corresponding standard deviations in brackets, estimated over 10000 replications by the LN (a) and LN-DUC (b) models.

		η
		1.5	2	2.5
(a)
π	0.9	$\hat{θ} = 0.99 (0.03)$	$\hat{θ} = 0.97 (0.03)$	$\hat{θ} = 0.95 (0.04)$
		$\hat{γ} = 0.53 (0.02)$	$\hat{γ} = 0.57 (0.03)$	$\hat{γ} = 0.62 (0.03)$
	0.8	$\hat{θ} = 0.99 (0.03)$	$\hat{θ} = 0.96 (0.04)$	$\hat{θ} = 0.92 (0.04)$
		$\hat{γ} = 0.56 (0.03)$	$\hat{γ} = 0.64 (0.03)$	$\hat{γ} = 0.74 (0.04)$
	0.7	$\hat{θ} = 0.98 (0.03)$	$\hat{θ} = 0.95 (0.04)$	$\hat{θ} = 0.89 (0.04)$
		$\hat{γ} = 0.59 (0.03)$	$\hat{γ} = 0.70 (0.03)$	$\hat{γ} = 0.84 (0.04)$
(b)
π	0.9	$\hat{θ} = 1.00 (0.03)$	$\hat{θ} = 1.00 (0.04)$	$\hat{θ} = 1.00 (0.03)$
		$\hat{γ} = 0.48 (0.06)$	$\hat{γ} = 0.48 (0.06)$	$\hat{γ} = 0.49 (0.05)$
	0.8	$\hat{θ} = 1.00 (0.03)$	$\hat{θ} = 1.00 (0.04)$	$\hat{θ} = 1.00 (0.04)$
		$\hat{γ} = 0.49 (0.06)$	$\hat{γ} = 0.48 (0.06)$	$\hat{γ} = 0.49 (0.05)$
	0.7	$\hat{θ} = 1.00 (0.04)$	$\hat{θ} = 1.00 (0.04)$	$\hat{θ} = 1.00 (0.04)$
		$\hat{γ} = 0.50 (0.07)$	$\hat{γ} = 0.49 (0.07)$	$\hat{γ} = 0.49 (0.06)$

Open in a new tab

It is straightforward to note that, for a fixed π, the more η increases, the more the differences between the estimates produced by the competing models become. The same behavior occurs keeping fixed η and decreasing π. Moreover, in presence of atypical observations, the estimates produced by the UG-DUC and LN-DUC models are always closer to the true values, indicating the increased robustness of the ML estimator.

It is interesting to evaluate how the differences between the parameter estimates produced by the competing models have an effect on the corresponding quantile estimates, since the most common risk measures are based on them, as discussed in Section 4.1.2. As an example, we could focus our attention on the $(π, η)$ combinations that are highlighted with a gray background in Tables 2 and 3, and that represent situations with growing levels of contamination in the data, within each DGP. For this reason, Figure 3 displays the quantile values of the target distributions and their dichotomous unimodal compound versions, obtained by using the average estimated parameters in the diagonals of Tables 2 and 3, against the true quantiles of the corresponding DGPs, for growing probability levels.

Figure 3. — Quantile values from the target distribution, its dichotomous unimodal compound version, and the true DGP. (a) UG+UG DGP. (b) LN+LN DGP.

It is possible to see that, within each DGP and for low levels of contamination, the estimated quantiles of the competing models are close enough to true ones. However, when the contamination in the data starts to increase, the estimated quantiles of the target distributions start to diverge from the true ones, whereas the corresponding dichotomous compound models fit always better.

5.2. Sensitivity analysis II

The parameters used for the UG+LN DGP are $θ = θ^{*} = 1$ and $γ = 1$ . The average estimated θ and γ obtained by fitting both the UG distribution and the UG-DUC model are shown in Table 4. On the other hand, the parameters used for the LN+UG DGP are $θ = θ^{*} = 1$ and $γ = 0.5$ . The average estimated θ and γ obtained by fitting both the LN distribution and the LN-DUC model are reported in Table 5.

Table 4. UG+LN DGP: Average $\hat{θ}$ and $\hat{γ}$ values, with corresponding standard deviations in brackets, estimated over 10000 replications by the UG (a) and UG-DUC (b) models.

		$γ^{*}$
		0.75	1	1.25
(a)
π	0.9	$\hat{θ} = 0.98 (0.05)$	$\hat{θ} = 0.90 (0.07)$	$\hat{θ} = 0.73 (0.13)$
		$\hat{γ} = 1.12 (0.07)$	$\hat{γ} = 1.35 (0.12)$	$\hat{γ} = 1.72 (0.22)$
	0.8	$\hat{θ} = 0.96 (0.06)$	$\hat{θ} = 0.80 (0.09)$	$\hat{θ} = 0.46 (0.17)$
		$\hat{γ} = 1.25 (0.08)$	$\hat{γ} = 1.69 (0.16)$	$\hat{γ} = 2.44 (0.31)$
	0.7	$\hat{θ} = 0.95 (0.06)$	$\hat{θ} = 0.71 (0.11)$	$\hat{θ} = 0.22 (0.17)$
		$\hat{γ} = 1.37 (0.10)$	$\hat{γ} = 2.03 (0.19)$	$\hat{γ} = 3.13 (0.33)$
(b)
π	0.9	$\hat{θ} = 1.02 (0.05)$	$\hat{θ} = 1.02 (0.05)$	$\hat{θ} = 1.01 (0.05)$
		$\hat{γ} = 0.95 (0.10)$	$\hat{γ} = 1.01 (0.08)$	$\hat{γ} = 1.04 (0.07)$
	0.8	$\hat{θ} = 1.04 (0.05)$	$\hat{θ} = 1.03 (0.05)$	$\hat{θ} = 0.91 (0.24)$
		$\hat{γ} = 0.96 (0.11)$	$\hat{γ} = 1.03 (0.09)$	$\hat{γ} = 1.08 (0.10)$
	0.7	$\hat{θ} = 1.06 (0.05)$	$\hat{θ} = 1.05 (0.05)$	$\hat{θ} = 1.03 (0.09)$
		$\hat{γ} = 0.96 (0.11)$	$\hat{γ} = 1.05 (0.10)$	$\hat{γ} = 1.13 (0.14)$

Open in a new tab

Table 5. LN+UG DGP: Average $\hat{θ}$ and $\hat{γ}$ values, with corresponding standard deviations in brackets, estimated over 10000 replications by the LN (a) and LN-DUC (b) models.

		$γ^{*}$
		5	10	15
(a)
π	0.9	$\hat{θ} = 0.95 (0.04)$	$\hat{θ} = 0.88 (0.04)$	$\hat{θ} = 0.82 (0.04)$
		$\hat{γ} = 0.64 (0.03)$	$\hat{γ} = 0.76 (0.04)$	$\hat{γ} = 0.87 (0.04)$
	0.8	$\hat{θ} = 0.91 (0.04)$	$\hat{θ} = 0.81 (0.04)$	$\hat{θ} = 0.72 (0.04)$
		$\hat{γ} = 0.76 (0.04)$	$\hat{γ} = 0.99 (0.04)$	$\hat{γ} = 1.18 (0.05)$
	0.7	$\hat{θ} = 0.88 (0.05)$	$\hat{θ} = 0.77 (0.04)$	$\hat{θ} = 0.67 (0.04)$
		$\hat{γ} = 0.87 (0.04)$	$\hat{γ} = 1.18 (0.05)$	$\hat{γ} = 1.43 (0.06)$
(b)
π	0.9	$\hat{θ} = 0.99 (0.04)$	$\hat{θ} = 1.00 (0.04)$	$\hat{θ} = 1.00 (0.03)$
		$\hat{γ} = 0.44 (0.06)$	$\hat{γ} = 0.44 (0.04)$	$\hat{γ} = 0.45 (0.03)$
	0.8	$\hat{θ} = 0.98 (0.04)$	$\hat{θ} = 1.00 (0.04)$	$\hat{θ} = 1.00 (0.04)$
		$\hat{γ} = 0.45 (0.07)$	$\hat{γ} = 0.41 (0.04)$	$\hat{γ} = 0.41 (0.03)$
	0.7	$\hat{θ} = 0.96 (0.05)$	$\hat{θ} = 1.00 (0.04)$	$\hat{θ} = 1.01 (0.04)$
		$\hat{γ} = 0.55 (0.11)$	$\hat{γ} = 0.41 (0.04)$	$\hat{γ} = 0.38 (0.03)$

Open in a new tab

Similar conclusions to those previously discussed can be drawn, in terms of differences between the estimates produced by the competing models. Even in presence of a misspecification effect, the ML estimation carried out by the UG-DUC and LN-DUC models is more robust than the one obtained by fitting the corresponding target distributions.

Also in this case, we provide a comparison in terms of quantile values for the $(π, η)$ combinations that are highlighted with a gray background in Tables 4 and 5. This is illustrated in Figure 4 for growing probability values. With the exclusion of the combination $(π = 0.9, γ^{*} = 0.75)$ for the UG+LN DGP case, representing a situation of low contamination and where the right tail of the UG-DUC model is heavier than necessary, in all the other situations our models provide a better fit then their target counterparts. This is due to their greater flexibility, that allows the accommodation of the contaminating points in a better way, regardless of whether they are generated by a distribution of the same type or not.

Figure 4. — Quantile values from the target distribution, its dichotomous unimodal compound version, and the true DGP. (a) UG+LN DGP. (b) LN+UG DGP.

6. Application to insurance loss datasets

In this section, the dichotomous unimodal compound models, along with the nested target distributions, are applied to two real insurance loss datasets.

6.1. French business interruption losses

The first dataset (called Frebiloss hereafter) consists of 2387 French business interruption losses, in French francs (FF), over the period 1985 to 2000, and it is contained in the R package CASdatasets [24]. For each observation, the total cost in FF is considered. For scaling purposes, payments amount is divided by 1000; thus, thousand of French francs (TFF), instead of FF, are considered.

The histogram of the data is displayed in Figure 5, whereas their summary statistics are reported in Table 6. This dataset shares the classic characteristics of insurance losses described in Section 1. Indeed, the mean is considerably higher than the median and the third quartile, suggesting extreme right skewness and a heavy right tail, as also confirmed by some large losses that lay quite far from the bulk of the data. As stated in Section 1, these losses could be considered like atypical values that contaminate the target distribution, implying a heavier right tail than expected. This is supported by the estimated value of the degree of contamination parameter for the LN-DUC ( $\hat{η} = 2.64$ ) and the UG-DUC models ( $\hat{η} = 11.36$ ).

Table 6. Frebiloss: summary statistics.

Variable	Value
Min	100.29
1st Quart.	304.90
Median	762.25
Mean	2027.74
3rd Quart.	1829.39
Max	168,654.35
St. Dev	5938.06
Skewness	17.62
Kurtosis	438.89

Open in a new tab

The comparison between the likelihood-estimated models is presented in Table 7. AIC and BIC, which provide the same ranking, indicate that the LN-DUC model is the best one, while the UG-DUC one is the third. Importantly, both models provide an improvement compared to their target distributions, as also confirmed by the null p-values of the double bootstrap LR test. As highlighted by Kazemi et al. [38], the skew-logistic (in 9th position) seems to work slightly better than the skew-normal (in 11st position), while the skew-t (in 4th position) seems to behave better than both. Globally, our models appear to be very competitive.

Table 7. Frebiloss: AIC and BIC values for the competing models, along with their rankings.

Model	AIC	Rank	BIC	Rank	LR p-value
UG-DUC	−39,974.58	3	−39,997.69	3	0.00
LN-DUC	−39,693.97	1	−39,717.08	1	0.00
UG	−41,130.46	8	−41,142.01	8
LN	−39,790.59	2	−39,802.14	2
Exponential	−41,128.46	7	−41,134.23	7
Weibull	−40,513.47	5	−40,525.02	5
Normal	−48,258.95	12	−48,270.51	12
Logistic	−44,527.29	10	−44,538.85	10
Skew-logistic	−43,447.07	9	−43,464.40	9
Skew-normal	−45,191.31	11	−45,208.64	11
Skew-t	−40,086.33	4	−40,109.44	4
Hyperbolic	−40,899.14	6	−40,922.26	6

Open in a new tab

In the last column, p-values from the LR tests.

Table 8 reports the empirical and the estimated VaR values of all the competing models and approaches. Also in this case, a ranking is introduced in order to simplify the reading, but this time it is based on the absolute value of the percentage of variation with respect to the empirical VaR; the lower the difference, the better the position in the ranking. The corresponding backtesting results are also provided in the last two columns.

Table 8. Frebiloss: ${VaR}_{95} (X)$ and ${VaR}_{99} (X)$ with corresponding ranking and backtesting p-values.

	Value at risk				p-value
Model	${VaR}_{95} (X)$	Rank	${VaR}_{99} (X)$	Rank	${VaR}_{95} (X)$	${VaR}_{99} (X)$
Empirical	7675.81		18,293.88
UG-DUC	9454.65	8	18,931.39	1	0.00	0.55
LN-DUC	7787.84	1	21,810.33	6	0.90	0.05
UG	6074.52	7	9338.02	11	0.00	0.00
LN	6189.57	5	14,304.85	7	0.00	0.00
Exponential	6074.55	6	9338.07	10	0.00	0.00
Weibull	6893.35	2	12,358.04	9	0.01	0.00
Normal	11,792.92	13	15,838.83	4	0.00	0.16
Logistic	5097.50	11	7257.37	13	0.00	0.00
Skew-Logistic	4930.70	12	7093.86	14	0.00	0.00
Skew-Normal	12,385.83	14	16,246.27	3	0.00	0.31
Skew-t	5809.08	10	12,804.28	8	0.00	0.00
Hyperbolic	5879.79	9	8986.22	12	0.00	0.00
Pareto(t-score)	6327.10	4	15,284.05	5	0.00	0.08
${PORT-MO}_{p}$	8772.87	3	19,041.06	2	0.03	0.55

Open in a new tab

When c = 0.95, the LN-DUC model is ranked first, extremely close to the empirical value. On the contrary, some models seem to provide better VaR values than the UG-DUC one. However, if the corresponding backtesting p-values are examined, only the LN-DUC model does not lead to the rejection of the null hypothesis.

In contrast with the previous case, when the c = 0.99, the best model is the UG-DUC, providing a good estimate of the VaR. The LN-DUC model falls in sixth position, exceeding the empirical value. Even if our models have acceptable p-values, four benchmark models have good values too, among which we find the Pareto t-estimator and the ${PORT-MO}_{p}$ (with estimated tuning parameters k = 47, p = 0.19 and s = 0.2).

Table 9 shows the empirical and estimated values of the TVaR along with the corresponding backtesting results. The rankings are computed as in Table 8. According to the backtesting procedure, when c = 0.95, only the LN-DUC provides a p-value that does not lead to the rejection of the null hypothesis. On the contrary, when c = 0.99, our models pass the backtest, along with the Pareto t-estimator. Overall, our models seem to suggest a good description of the tail behavior of the empirical distribution.

Table 9. Frebiloss: ${TVaR}_{95} (X)$ and ${TVaR}_{99} (X)$ with corresponding ranking and backtesting p-values.

	Tail value at risk				min p-value
Model	${VaR}_{95} (X)$	Rank	${VaR}_{99} (X)$	Rank	${VaR}_{95} (X)$	${VaR}_{99} (X)$
Empirical	17,062.59		38,135.05
UG-DUC	15,341.69	2	24,789.96	3	0.00	0.07
LN-DUC	17,872.90	1	40,499.24	1	0.09	0.05
UG	8102.25	9	11,365.75	9	0.00	0.00
LN	11,822.42	6	23,777.22	5	0.00	0.00
Exponential	8102.29	10	11,365.80	10	0.00	0.00
Weibull	10,335.17	8	16,253.92	8	0.00	0.00
Normal	14,273.68	4	17,850.61	7	0.00	0.00
Logistic	6393.65	12	8508.83	12	0.00	0.00
Skew-logistic	6274.92	13	8422.52	13	0.00	0.00
Skew-normal	14,754.25	3	18,227.81	6	0.00	0.00
Skew-t	11,297.15	7	24,449.78	4	0.00	0.00
Hyperbolic	7809.57	11	10,914.98	11	0.00	0.00
Pareto(t-score)	13,176.71	5	29,258.68	2	0.00	0.05

Open in a new tab

Finally, Figure 6 illustrates the estimated probabilities to be typical or atypical points, as defined in (4), according to the UG-DUC and LN-DUC models. For a better graphical visualization, we focus our attention to the $(0, 7000]$ interval. As discussed in Section 2.1, these probabilities coincide in the intersection points between the discriminant functions, delimiting the typical and the atypical regions (marked with the vertical dashed lines in Figure 6). In detail, the intersection points are in $x_{1} = 0.12$ and $x_{2} = 2531.82$ for the UG-DUC model and in $x_{1} = 52.21$ and $x_{2} = 1057.17$ for the LN-DUC one. Considering that $x_{1}$ is lower than the minimum observed (100.29; see Table 6) for both models, none of the points is considered atypically low. However, a different classification of typical and atypically high data is produced in the interval $(1057.17, 2531.82)$ , containing the $21.49 %$ of the data. When a classification of the data is of interest, a possible way to choose among the two models could be to rely on the model producing the best overall fitting according to the information criteria, i.e. in this case the LN-DUC one. If it is of particular interest the set of atypically high data, another option could be to choose the classification produced by the model with the best fit in the right tail. Again the LN-DUC model seems to be the best option even if, when ${VaR}_{99} (X)$ is selected, it performs worse than the UG-DUC one. In any case, as it is possible to see from Figure 6, both models agree always more in defining the losses as atypically high, as their values become larger and larger.

6.2. Swedish fire insurance losses

This dataset (called Swefire hereafter) contains Swedish fire losses collected in 1982, and it can be extracted from Embrechts and Schmidli [27]. It consists of 218 observations, but the ones with claims (in millions of Swedish krona; SEK) equal to zero have been removed from the analysis, implying a final number of n = 215 observations. The histogram of the data is shown in Figure 7, whereas Table 10 reports the summary statistics. As for the previous dataset, it is possible to note positive skewness, leptokurtosis, and few large losses. The estimated values of η, that are $\hat{η} = 8.73$ for the LN-DUC model and $\hat{η} = 27.95$ for the UG-DUC one, are higher than for the previous dataset, suggesting a higher degree of contamination in these data.

Figure 7. — Swefire: histogram of the dataset.

Table 10. Swefire: summary statistics.

Variable	Value
Min	0.09
1st Quart.	0.63
Median	1.00
Mean	2.31
3rd Quart.	2.00
Max	34.00
St. Dev	4.18
Skewness	4.81
Kurtosis	27.48

Open in a new tab

The comparison between the likelihood-estimated models is presented in Table 11. Also in this case, AIC and BIC provide the same ranking, and our models are the best ones. Specifically, the LN-DUC model is ranked first, while the UG-DUC one is ranked second. The null p-values of the double bootstrap LR test suggest that our models provide an improvement compared to the respective target distributions.

Table 11. Swefire: AIC and BIC for the competing models, along with their rankings.

Model	AIC	Rank	BIC	Rank	LR p-value
UG-DUC	−665.86	2	−679.34	2	0.00
LN-DUC	−648.60	1	−662.08	1	0.00
UG	−794.71	8	−801.45	8
LN	−695.36	4	−702.10	4
Exponential	−792.71	7	−796.08	7
Weibull	−783.95	6	−790.70	6
Normal	−1228.33	12	−1235.07	12
Logistic	−1046.04	11	−1052.78	11
Skew-logistic	−930.55	9	−940.66	9
Skew-normal	−986.13	10	−996.24	10
Skew-t	−686.21	3	−699.69	3
Hyperbolic	−763.45	5	−776.93	5

Open in a new tab

In the last column, p-values from the LR tests.

Table 12 reports the empirical VaR as well as the VaR for all the considered models and approaches, along with the corresponding backtesting results. When c = 0.95, about our models only the LN-DUC performs well, even if by the analysis of the p-values, all models except the skew-t seem able to produce acceptable estimate of the empirical VaR (the estimated tuning parameters for the ${PORT-MO}_{p}$ are k = 11, p = 4.24 and s = 0.9). It is possible to appreciate the real difference between our models and the benchmark ones, moving deeper in the right tail of the empirical distribution. In fact, when c = 0.99 our models are the best, as also confirmed by the p-values of the backtest.

Table 12. Swefire: ${VaR}_{95} (X)$ and ${VaR}_{99} (X)$ with corresponding ranking and backtesting p-values.

	Value at risk				p-value
Model	${VaR}_{95} (X)$	Rank	${VaR}_{99} (X)$	Rank	${VaR}_{95} (X)$	${VaR}_{99} (X)$
Empirical	7.84		19.93
UG-DUC	9.45	10	16.47	2	0.80	0.10
LN-DUC	8.43	3	19.40	1	0.94	0.58
UG	6.93	6	10.65	9	0.50	0.00
LN	6.14	11	11.90	7	0.50	0.01
Exponential	6.93	5	10.65	8	0.50	0.00
Weibull	7.43	1	12.20	5	0.94	0.01
Normal	9.18	7	12.02	6	0.81	0.01
Logistic	5.38	13	7.54	14	0.12	0.00
Skew-logistic	5.44	12	7.78	13	0.12	0.00
Skew-normal	9.36	8	12.27	4	0.81	0.01
Skew-t	4.33	14	9.33	11	0.00	0.00
Hyperbolic	6.26	9	9.44	10	0.50	0.00
Pareto(t-score)	7.01	4	13.62	3	0.70	0.03
${PORT-MO}_{p}$	7.37	2	8.60	12	0.94	0.00

Open in a new tab

Table 13 reports the empirical and estimated values of the TVaR, with always the same ranking mechanism. Also in this case, the LN-DUC is the best model, as also confirmed by the backtest results. In an opposite way with respect to the previous application, the UG-DUC seems to provide a good estimate of the TVaR only when c = 0.95. Overall, the TVaR values from our models are the only ones being not rejected for at least one probability level c. However, it should be noticed that, for a lower 0.01 significance level, only the Pareto-t estimator would be considered reliable at both probability levels among the competitors.

Table 13. Swefire: ${TVaR}_{95} (X)$ and ${TVaR}_{99} (X)$ with corresponding ranking and backtesting p-values.

	Tail value at risk				min p-value
Model	${TVaR}_{95} (X)$	Rank	${TVaR}_{99} (X)$	Rank	${TVaR}_{95} (X)$	${TVaR}_{99} (X)$
Empirical	17.39		28.37
UG-DUC	13.81	2	20.77	2	0.20	0.03
LN-DUC	15.90	1	31.23	1	0.20	0.30
UG	9.25	9	12.97	10	0.00	0.00
LN	9.97	7	17.45	5	0.01	0.00
Exponential	9.25	8	12.97	9	0.00	0.00
Weibull	10.41	6	15.34	6	0.01	0.00
Normal	10.92	5	13.43	8	0.03	0.00
Logistic	6.73	13	8.86	13	0.00	0.00
Skew-logistic	6.90	12	9.22	12	0.00	0.00
Skew-normal	11.14	4	13.76	7	0.03	0.00
Skew-t	8.35	10	18.20	4	0.00	0.00
Hyperbolic	8.23	11	11.41	11	0.00	0.00
Pareto(t-score)	11.46	3	20.27	3	0.03	0.01

Open in a new tab

Lastly, Figure 8 illustrates the estimated probabilities to be typical or atypical points, as defined in (4), according to the UG-DUC and LN-DUC models. In this case, the interval $(0, 5]$ is shown for a better graphical visualization. The intersection points delimiting the typical and the atypical regions (marked with the vertical dashed lines in Figure 8) are in $x_{1} = 0.19$ and $x_{2} = 1.73$ for the UG-DUC model and in $x_{1} = 0.33$ and $x_{2} = 1.31$ for the LN-DUC one. The differences between the partitions produced by our models concern the interval $(0.19, 0.33)$ , containing the $1.39 %$ of the data, and the interval $(1.31, 1.73)$ , containing the $10.23 %$ of the data. Therefore, they are more similar to each other than they are in the previous application. Here, both from a global and a right tail goodness of fit point of view, the best model is the LN-DUC one, and then its partition of the data should be preferred. However, as it is possible to see from Figure 8, both models agree always more in defining the losses as atypically high, as their values become larger and larger.

7. Conclusions

Several models have been suggested in the actuarial literature for insurance loss data. However, losses distributions show characteristics that are hardly compatible with the choice of fitting a single parametric distribution, calling for more flexible models.

In this paper, a general dichotomous unimodal compound model has been introduced by compounding a unimodal hump-shaped distribution on a positive support with a convenient mixing dichotomous distribution. As illustrative examples, two hump-shaped distributions have been considered, i.e. the log-normal and the unimodal gamma. As a consequence, the main results of our work are given below.

The density of a model belonging to our family of distributions is: defined on positive support, unimodal hump-shaped, positively skewed, and with tails heavier than the target distribution. Moreover, compared to the target distribution, such a density allows a better accommodation of atypical losses and the additional parameters can be useful to evaluate the degree of contamination in the data.
On a related note, our models provide an automatic way to detect typical/atypical losses, and define the corresponding typical/atypical regions on the x-axis.
In the simulation study, we discussed how the ML estimator of the parameters of the target distribution is more robust under our models, with respect to the case where the target distribution is directly considered.
We applied our models to two real insurance datasets, along with several standard distributions whose parameters have been estimated by using the ML approach. Additionally, two further competing methodologies have been considered, namely the t-score estimator and the ${PORT-MO}_{p}$ approach. Firstly, a double bootstrap likelihood-ratio test has been implemented to evaluate if our models perform better than the nested target distributions. For comparison, classical model selection criteria (AIC and BIC) and two well-known risk measures (VaR and TVaR), along with their backtest, have been used. The results show that our models behave very well on the whole data, as indicated by AIC and BIC, but also when we focus on the right tail only, as confirmed by the comparison between theoretical and empirical VaR and TVaR values and by the corresponding backtests. Among our competitors, just few of them provide appropriate VaR estimates in both datasets, and only the Pareto t-estimator produces good TVaR estimates.

Possible extensions of this work could be to:

use alternative unimodal hump-shaped distributions (defined on a positive support);
consider a regression framework, in which the response variable is conditionally distributed according to one of our models;
extend our models to the multivariate context.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1.Abu Bakar S.A., Hamzah N.A., Maghsoudi M., and Nadarajah S., Modeling loss data using composite models, Insurance: Math. Econom. 61 (2015), pp. 146–154. [Google Scholar]
2.Adcock C., Eling M., and Loperfido N., Skewed distributions in finance and actuarial science: a review, Eur. J. Finance 21 (2015), pp. 1253–1281. doi: 10.1080/1351847X.2012.720269 [DOI] [Google Scholar]
3.Ahn S., Kim J.H.T., and Ramaswami V., A new class of models for heavy tailed distributions in finance and insurance risk, Insurance: Math. Econom. 51 (2012), pp. 43–52. [Google Scholar]
4.Aitkin M. and Wilson G.T., Mixture models, outliers, and the EM algorithm, Technometrics 22 (1980), pp. 325–331. doi: 10.1080/00401706.1980.10486163 [DOI] [Google Scholar]
5.Akaike H., A new look at the statistical model identification, IEEE. Trans. Automat. Contr. 19 (1974), pp. 716–723. doi: 10.1109/TAC.1974.1100705 [DOI] [Google Scholar]
6.Bagnato L. and Punzo A., Unconstrained representation of orthogonal matrices with application to common principle components, preprint (2019), Available at arXiv:1906.00587.
7.Beirlant J., Goegebeur Y., Segers J., and Teugels J.L., Statistics of Extremes: Theory and Applications, John Wiley & Sons, Chichester, 2006. [Google Scholar]
8.Berkowitz J., Testing density forecasts, with applications to risk management, J. Bus. Econ. Stat. 19 (2001), pp. 465–474. doi: 10.1198/07350010152596718 [DOI] [Google Scholar]
9.Bernardi M., Maruotti A., and Petrella L., Skew mixture models for loss distributions: a Bayesian approach, Insurance: Math. Econom. 51 (2012), pp. 617–623. [Google Scholar]
10.Bickerstaff D.R., Automobile collision deductibles and repair cost groups: the lognormal model, PCAS LIX 59 (1972), pp. 68–84. [Google Scholar]
11.Bohdalová M., A comparison of value-at-risk methods for measurement of the financial risk, Faculty of Management, Comenius University, Bratislava, Slovakia 10, 2007.
12.Brazauskas V. and Kleefeld A., Folded and log-folded-t distributions as models for insurance loss data, Scand. Actuar. J. 2011 (2011), pp. 59–74. doi: 10.1080/03461230903424199 [DOI] [Google Scholar]
13.Brilhante M.F., Gomes M.I., and Pestana D., A simple generalisation of the hill estimator, Comput. Stat. Data. Anal. 57 (2013), pp. 518–535. doi: 10.1016/j.csda.2012.07.019 [DOI] [Google Scholar]
14.Burnecki K., Kukla G., and Weron R., Property insurance loss distributions, Phys. A: Stat. Mech. Appl. 287 (2000), pp. 269–278. doi: 10.1016/S0378-4371(00)00453-2 [DOI] [Google Scholar]
15.Burnecki K., Misiorek A., and Weron R., Loss distributions, in Statistical Tools for Finance and Insurance, Springer, Berlin, 2005, pp. 289–317.
16.Cerioli A., Farcomeni A., and Riani M., Wild adaptive trimming for robust estimation and cluster analysis, Scand. J. Stat. 46 (2019), pp. 235–256. doi: 10.1111/sjos.12349 [DOI] [Google Scholar]
17.Chen S.X., Probability density function estimation using gamma kernels, Ann. Inst. Stat. Math. 52 (2000), pp. 471–480. doi: 10.1023/A:1004165218295 [DOI] [Google Scholar]
18.Cooray K. and Ananda M.M.A., Modeling actuarial data with a composite lognormal-Pareto model, Scand. Actuar. J. 2005 (2005), pp. 321–334. doi: 10.1080/03461230510009763 [DOI] [Google Scholar]
19.Davies L. and Gather U., The identification of multiple outliers, J. Am. Stat. Assoc. 88 (1993), pp. 782–792. doi: 10.1080/01621459.1993.10476339 [DOI] [Google Scholar]
20.Delignette-Muller M.L. and Dutang C., Fitdistrplus: an R package for fitting distributions, J. Stat. Softw. 64 (2015), pp. 1–34. Available at http://www.jstatsoft.org/v64/i04/. doi: 10.18637/jss.v064.i04 [DOI] [Google Scholar]
21.Dempster A.P., Laird N.M., and Rubin D.B., Maximum likelihood from incomplete data via the EM algorithm (with discussion), J. R. Stat. Soc. Ser. B (Stat. Methodol.) 39 (1977), pp. 1–38. [Google Scholar]
22.Dobránszky P., Comparison of historical and parametric value-at-risk methodologies, 2009. Available at SSRN 1508041.
23.Duda R.O., Hart P.E., and Stork D.G., Pattern Classification, John Wiley & Sons, New York, 2012. [Google Scholar]
24.Dutang C. and Charpentier A., CASdatasets: Insurance datasets (Official website), 2016; Version 1.0-6 (2016-05-28) available at http://cas.uqam.ca/.
25.Eling M., Fitting insurance claims to skewed distributions: are the skew-normal and skew-student good models, Insurance: Math. Econom. 51 (2012), pp. 239–248. [Google Scholar]
26.Eling M., Farinelli S., Rossello D., and Tibiletti L., Skewness in hedge funds returns: classical skewness coefficients vs Azzalini's skewness parameter, Int. J. Managerial Finance 6 (2010), pp. 290–304. doi: 10.1108/17439131011074459 [DOI] [Google Scholar]
27.Embrechts P. and Schmidli H., Modelling of extremal events in insurance and finance, Zeitschrift Für Oper. Res. 39 (1994), pp. 1–34. [Google Scholar]
28.Emmer S., Kratz M., and Tasche D., What is the best risk measure in practice? A comparison of standard measures, J Risk. 18 (2015), pp. 31–60. [Google Scholar]
29.Fabián Z., Johnson point and johnson variance, Proc. Prague Stochastics 2006 (2006), pp. 354–363. [Google Scholar]
30.Fabián Z., Parametric estimation using generalized moment method, Tech. Rep. Research report 1014, Inst. of Computer Sciences, Academy of Sciences of the Czech Republic, Prague, 2007.
31.Fabián Z., Scalar score function and score correlation, Tech. Rep. Research report 1077, Inst. of Computer Sciences, Academy of Sciences of the Czech Republic, Prague, 2010.
32.Figueiredo F., Gomes M.I., and Henriques-Rodrigues L., Value-at-risk estimation and the port mean-of-order-p methodology, Revstat 15 (2017), pp. 187–204. [Google Scholar]
33.Furman E., On a multivariate gamma distribution, Stat. Probab. Lett. 78 (2008), pp. 2353–2360. doi: 10.1016/j.spl.2008.02.012 [DOI] [Google Scholar]
34.Ghalanos A., Rugarch: univariate GARCH models, 2015; Version 1.3-6 (2015-08-16) available at https://cran.r-project.org/web/packages/rugarch/index.html.
35.Gomes M.I., Henriques-Rodrigues L., and Manjunath B., Mean-of-order-p location-invariant extreme value index estimation, Revstat 14 (2016), pp. 273–296. [Google Scholar]
36.Ingrassia S. and Punzo A., Decision boundaries for mixtures of regressions, J. Korean. Stat. Soc. 45 (2016), pp. 295–306. doi: 10.1016/j.jkss.2015.11.005 [DOI] [Google Scholar]
37.Jeon Y. and Kim J.H., A gamma kernel density estimation for insurance loss data, Insurance: Math. Econom. 53 (2013), pp. 569–579. [Google Scholar]
38.Kazemi R. and Noorizadeh M., A comparison between skew-logistic and skew-normal distributions, Matematika 31 (2015), pp. 15–24. [Google Scholar]
39.Kellison J.B. and Brockett P., Check the score: Credit scoring and insurance losses: Is there a connection? Texas Business Review, Bureau of Business Research, Austin, 2003.
40.Kupiec P., Techniques for verifying the accuracy of risk measurement models, J. Derivatives 3 (1995). doi: 10.3905/jod.1995.407942 [DOI] [Google Scholar]
41.Longin F., Extreme Events in Finance: A Handbook of Extreme Value Theory and Its Applications, John Wiley & Sons, Hoboken, 2016. [Google Scholar]
42.MacDonald I.L., Numerical maximisation of likelihood: a neglected alternative to em, Int. Stat. Rev. 82 (2014), pp. 296–308. doi: 10.1111/insr.12041 [DOI] [Google Scholar]
43.MacKinnon J.G., Bootstrap hypothesis testing, Handb. Comput. Econom. 183 (2009), pp. 213. [Google Scholar]
44.Mazza A. and Punzo A., Mixtures of multivariate contaminated normal regression models, Stat. Papers 61 (2020), pp. 787–822. doi: 10.1007/s00362-017-0964-y [DOI] [Google Scholar]
45.McLachlan G.J. and Peel D., Finite Mixture Models, John Wiley & Sons, New York, 2000. [Google Scholar]
46.Packová V. and Brebera D., Loss distributions in insurance risk management, Business Administration 19 (2015), pp. 17–22. [Google Scholar]
47.Pigeon M. and Denuit M., Composite lognormal-Pareto model with random threshold, Scand. Actuar. J. 2011 (2011), pp. 177–192. doi: 10.1080/03461231003690754 [DOI] [Google Scholar]
48.Punzo A., Blostein M., and McNicholas P.D., High-dimensional unsupervised classification via parsimonious contaminated mixtures, Pattern Recognit. 98 (2019), p. 107031. [Google Scholar]
49.Punzo A. and McNicholas P.D., Robust clustering in regression analysis via the contaminated Gaussian cluster-weighted model, J. Classif. 34 (2017), pp. 249–293. doi: 10.1007/s00357-017-9234-x [DOI] [Google Scholar]
50.Punzo A., Bagnato L., and Maruotti A., Compound unimodal distributions for insurance losses, Insurance: Math. Econom. 81 (2018), pp. 95–107. [Google Scholar]
51.Punzo A. and McNicholas P.D., Parsimonious mixtures of multivariate contaminated normal distributions, Biom. J. 58 (2016), pp. 1506–1537. doi: 10.1002/bimj.201500144 [DOI] [PubMed] [Google Scholar]
52.R Core Team R., A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2018. Available at https://www.R-project.org/.
53.Rytgaard M., Estimation in the pareto distribution, ASTIN Bulletin: J. IAA 20 (1990), pp. 201–216. doi: 10.2143/AST.20.2.2005443 [DOI] [Google Scholar]
54.Schwarz G., Estimating the dimension of a model, Ann. Stat. 6 (1978), pp. 461–464. doi: 10.1214/aos/1176344136 [DOI] [Google Scholar]
55.Scott D., HyperbolicDist: The hyperbolic distribution, 2009; R package version 0.6-2 available at https://CRAN.R-project.org/package=HyperbolicDist.
56.Soetaert K., rootSolve: Nonlinear root finding, equilibrium and steady-state analysis of ordinary differential equations, 2009; R-package version 1.6 available at https://CRAN.R-project.org/package=rootSolve.
57.Stehlík M., Potocký R., Waldl H., and Fabián Z., Some notes on the favourable estimation of fitting heavy tailed data, Tech. Rep. 32, IFAS Research Paper Series, Johannes Kepler University Linz, Linz, 2008.
58.Stehlík M., Potocký R., Waldl H., and Fabián Z., On the favorable estimation for fitting heavy tailed data, Comput. Stat. 25 (2010), pp. 485–503. doi: 10.1007/s00180-010-0189-1 [DOI] [Google Scholar]
59.Templ M., Gussenbauer J., and Filzmoser P., Evaluation of robust outlier detection methods for zero-inflated complex data, J. Appl. Stat. 47 (2019), pp. 1–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Venables W.N. and Ripley B.D., Modern Applied Statistics with S, 4th ed., Springer, New York, 2002. Available at http://www.stats.ox.ac.uk/pub/MASS4, ISBN 0-387-95457-0. [Google Scholar]
61.Wuertz D. and Chalabi Y., fGarch: Rmetrics – Autoregressive Conditional Heteroskedastic Modelling, 2016; Version 3010.82.1 (2016-08-15) available at https://cran.r-project.org/web/packages/fGarch/index.html.
62.Yeo A.C., Smith K.A., Willis R.J., and Brooks M., Clustering technique for risk classification and prediction of claim costs in the automobile insurance industry, Intell. Syst. Accounting Finance Manag. 10 (2001), pp. 39–50. doi: 10.1002/isaf.196 [DOI] [Google Scholar]
63.Zeileis A. and Windberger T., Glogis: fitting and testing generalized logistic distributions, 2014; R package version 1.0-0 available at http://CRAN.R-project.org/package=glogis.
64.Zucchini W., MacDonald I.L., and Langrock R., Hidden Markov models for time series: an introduction using R, Chapman & Hall/CRC Monographs on Statistics and Applied Probability, CRC Press, Boca Raton, 2017.

[CIT0001] 1.Abu Bakar S.A., Hamzah N.A., Maghsoudi M., and Nadarajah S., Modeling loss data using composite models, Insurance: Math. Econom. 61 (2015), pp. 146–154. [Google Scholar]

[CIT0002] 2.Adcock C., Eling M., and Loperfido N., Skewed distributions in finance and actuarial science: a review, Eur. J. Finance 21 (2015), pp. 1253–1281. doi: 10.1080/1351847X.2012.720269 [DOI] [Google Scholar]

[CIT0003] 3.Ahn S., Kim J.H.T., and Ramaswami V., A new class of models for heavy tailed distributions in finance and insurance risk, Insurance: Math. Econom. 51 (2012), pp. 43–52. [Google Scholar]

[CIT0004] 4.Aitkin M. and Wilson G.T., Mixture models, outliers, and the EM algorithm, Technometrics 22 (1980), pp. 325–331. doi: 10.1080/00401706.1980.10486163 [DOI] [Google Scholar]

[CIT0005] 5.Akaike H., A new look at the statistical model identification, IEEE. Trans. Automat. Contr. 19 (1974), pp. 716–723. doi: 10.1109/TAC.1974.1100705 [DOI] [Google Scholar]

[CIT0006] 6.Bagnato L. and Punzo A., Unconstrained representation of orthogonal matrices with application to common principle components, preprint (2019), Available at arXiv:1906.00587.

[CIT0007] 7.Beirlant J., Goegebeur Y., Segers J., and Teugels J.L., Statistics of Extremes: Theory and Applications, John Wiley & Sons, Chichester, 2006. [Google Scholar]

[CIT0008] 8.Berkowitz J., Testing density forecasts, with applications to risk management, J. Bus. Econ. Stat. 19 (2001), pp. 465–474. doi: 10.1198/07350010152596718 [DOI] [Google Scholar]

[CIT0009] 9.Bernardi M., Maruotti A., and Petrella L., Skew mixture models for loss distributions: a Bayesian approach, Insurance: Math. Econom. 51 (2012), pp. 617–623. [Google Scholar]

[CIT0010] 10.Bickerstaff D.R., Automobile collision deductibles and repair cost groups: the lognormal model, PCAS LIX 59 (1972), pp. 68–84. [Google Scholar]

[CIT0011] 11.Bohdalová M., A comparison of value-at-risk methods for measurement of the financial risk, Faculty of Management, Comenius University, Bratislava, Slovakia 10, 2007.

[CIT0012] 12.Brazauskas V. and Kleefeld A., Folded and log-folded-t distributions as models for insurance loss data, Scand. Actuar. J. 2011 (2011), pp. 59–74. doi: 10.1080/03461230903424199 [DOI] [Google Scholar]

[CIT0013] 13.Brilhante M.F., Gomes M.I., and Pestana D., A simple generalisation of the hill estimator, Comput. Stat. Data. Anal. 57 (2013), pp. 518–535. doi: 10.1016/j.csda.2012.07.019 [DOI] [Google Scholar]

[CIT0014] 14.Burnecki K., Kukla G., and Weron R., Property insurance loss distributions, Phys. A: Stat. Mech. Appl. 287 (2000), pp. 269–278. doi: 10.1016/S0378-4371(00)00453-2 [DOI] [Google Scholar]

[CIT0015] 15.Burnecki K., Misiorek A., and Weron R., Loss distributions, in Statistical Tools for Finance and Insurance, Springer, Berlin, 2005, pp. 289–317.

[CIT0016] 16.Cerioli A., Farcomeni A., and Riani M., Wild adaptive trimming for robust estimation and cluster analysis, Scand. J. Stat. 46 (2019), pp. 235–256. doi: 10.1111/sjos.12349 [DOI] [Google Scholar]

[CIT0017] 17.Chen S.X., Probability density function estimation using gamma kernels, Ann. Inst. Stat. Math. 52 (2000), pp. 471–480. doi: 10.1023/A:1004165218295 [DOI] [Google Scholar]

[CIT0018] 18.Cooray K. and Ananda M.M.A., Modeling actuarial data with a composite lognormal-Pareto model, Scand. Actuar. J. 2005 (2005), pp. 321–334. doi: 10.1080/03461230510009763 [DOI] [Google Scholar]

[CIT0019] 19.Davies L. and Gather U., The identification of multiple outliers, J. Am. Stat. Assoc. 88 (1993), pp. 782–792. doi: 10.1080/01621459.1993.10476339 [DOI] [Google Scholar]

[CIT0020] 20.Delignette-Muller M.L. and Dutang C., Fitdistrplus: an R package for fitting distributions, J. Stat. Softw. 64 (2015), pp. 1–34. Available at http://www.jstatsoft.org/v64/i04/. doi: 10.18637/jss.v064.i04 [DOI] [Google Scholar]

[CIT0021] 21.Dempster A.P., Laird N.M., and Rubin D.B., Maximum likelihood from incomplete data via the EM algorithm (with discussion), J. R. Stat. Soc. Ser. B (Stat. Methodol.) 39 (1977), pp. 1–38. [Google Scholar]

[CIT0022] 22.Dobránszky P., Comparison of historical and parametric value-at-risk methodologies, 2009. Available at SSRN 1508041.

[CIT0023] 23.Duda R.O., Hart P.E., and Stork D.G., Pattern Classification, John Wiley & Sons, New York, 2012. [Google Scholar]

[CIT0024] 24.Dutang C. and Charpentier A., CASdatasets: Insurance datasets (Official website), 2016; Version 1.0-6 (2016-05-28) available at http://cas.uqam.ca/.

[CIT0025] 25.Eling M., Fitting insurance claims to skewed distributions: are the skew-normal and skew-student good models, Insurance: Math. Econom. 51 (2012), pp. 239–248. [Google Scholar]

[CIT0026] 26.Eling M., Farinelli S., Rossello D., and Tibiletti L., Skewness in hedge funds returns: classical skewness coefficients vs Azzalini's skewness parameter, Int. J. Managerial Finance 6 (2010), pp. 290–304. doi: 10.1108/17439131011074459 [DOI] [Google Scholar]

[CIT0027] 27.Embrechts P. and Schmidli H., Modelling of extremal events in insurance and finance, Zeitschrift Für Oper. Res. 39 (1994), pp. 1–34. [Google Scholar]

[CIT0028] 28.Emmer S., Kratz M., and Tasche D., What is the best risk measure in practice? A comparison of standard measures, J Risk. 18 (2015), pp. 31–60. [Google Scholar]

[CIT0029] 29.Fabián Z., Johnson point and johnson variance, Proc. Prague Stochastics 2006 (2006), pp. 354–363. [Google Scholar]

[CIT0030] 30.Fabián Z., Parametric estimation using generalized moment method, Tech. Rep. Research report 1014, Inst. of Computer Sciences, Academy of Sciences of the Czech Republic, Prague, 2007.

[CIT0031] 31.Fabián Z., Scalar score function and score correlation, Tech. Rep. Research report 1077, Inst. of Computer Sciences, Academy of Sciences of the Czech Republic, Prague, 2010.

[CIT0032] 32.Figueiredo F., Gomes M.I., and Henriques-Rodrigues L., Value-at-risk estimation and the port mean-of-order-p methodology, Revstat 15 (2017), pp. 187–204. [Google Scholar]

[CIT0033] 33.Furman E., On a multivariate gamma distribution, Stat. Probab. Lett. 78 (2008), pp. 2353–2360. doi: 10.1016/j.spl.2008.02.012 [DOI] [Google Scholar]

[CIT0034] 34.Ghalanos A., Rugarch: univariate GARCH models, 2015; Version 1.3-6 (2015-08-16) available at https://cran.r-project.org/web/packages/rugarch/index.html.

[CIT0035] 35.Gomes M.I., Henriques-Rodrigues L., and Manjunath B., Mean-of-order-p location-invariant extreme value index estimation, Revstat 14 (2016), pp. 273–296. [Google Scholar]

[CIT0036] 36.Ingrassia S. and Punzo A., Decision boundaries for mixtures of regressions, J. Korean. Stat. Soc. 45 (2016), pp. 295–306. doi: 10.1016/j.jkss.2015.11.005 [DOI] [Google Scholar]

[CIT0037] 37.Jeon Y. and Kim J.H., A gamma kernel density estimation for insurance loss data, Insurance: Math. Econom. 53 (2013), pp. 569–579. [Google Scholar]

[CIT0038] 38.Kazemi R. and Noorizadeh M., A comparison between skew-logistic and skew-normal distributions, Matematika 31 (2015), pp. 15–24. [Google Scholar]

[CIT0039] 39.Kellison J.B. and Brockett P., Check the score: Credit scoring and insurance losses: Is there a connection? Texas Business Review, Bureau of Business Research, Austin, 2003.

[CIT0040] 40.Kupiec P., Techniques for verifying the accuracy of risk measurement models, J. Derivatives 3 (1995). doi: 10.3905/jod.1995.407942 [DOI] [Google Scholar]

[CIT0041] 41.Longin F., Extreme Events in Finance: A Handbook of Extreme Value Theory and Its Applications, John Wiley & Sons, Hoboken, 2016. [Google Scholar]

[CIT0042] 42.MacDonald I.L., Numerical maximisation of likelihood: a neglected alternative to em, Int. Stat. Rev. 82 (2014), pp. 296–308. doi: 10.1111/insr.12041 [DOI] [Google Scholar]

[CIT0043] 43.MacKinnon J.G., Bootstrap hypothesis testing, Handb. Comput. Econom. 183 (2009), pp. 213. [Google Scholar]

[CIT0044] 44.Mazza A. and Punzo A., Mixtures of multivariate contaminated normal regression models, Stat. Papers 61 (2020), pp. 787–822. doi: 10.1007/s00362-017-0964-y [DOI] [Google Scholar]

[CIT0045] 45.McLachlan G.J. and Peel D., Finite Mixture Models, John Wiley & Sons, New York, 2000. [Google Scholar]

[CIT0046] 46.Packová V. and Brebera D., Loss distributions in insurance risk management, Business Administration 19 (2015), pp. 17–22. [Google Scholar]

[CIT0047] 47.Pigeon M. and Denuit M., Composite lognormal-Pareto model with random threshold, Scand. Actuar. J. 2011 (2011), pp. 177–192. doi: 10.1080/03461231003690754 [DOI] [Google Scholar]

[CIT0048] 48.Punzo A., Blostein M., and McNicholas P.D., High-dimensional unsupervised classification via parsimonious contaminated mixtures, Pattern Recognit. 98 (2019), p. 107031. [Google Scholar]

[CIT0049] 49.Punzo A. and McNicholas P.D., Robust clustering in regression analysis via the contaminated Gaussian cluster-weighted model, J. Classif. 34 (2017), pp. 249–293. doi: 10.1007/s00357-017-9234-x [DOI] [Google Scholar]

[CIT0050] 50.Punzo A., Bagnato L., and Maruotti A., Compound unimodal distributions for insurance losses, Insurance: Math. Econom. 81 (2018), pp. 95–107. [Google Scholar]

[CIT0051] 51.Punzo A. and McNicholas P.D., Parsimonious mixtures of multivariate contaminated normal distributions, Biom. J. 58 (2016), pp. 1506–1537. doi: 10.1002/bimj.201500144 [DOI] [PubMed] [Google Scholar]

[CIT0052] 52.R Core Team R., A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2018. Available at https://www.R-project.org/.

[CIT0053] 53.Rytgaard M., Estimation in the pareto distribution, ASTIN Bulletin: J. IAA 20 (1990), pp. 201–216. doi: 10.2143/AST.20.2.2005443 [DOI] [Google Scholar]

[CIT0054] 54.Schwarz G., Estimating the dimension of a model, Ann. Stat. 6 (1978), pp. 461–464. doi: 10.1214/aos/1176344136 [DOI] [Google Scholar]

[CIT0055] 55.Scott D., HyperbolicDist: The hyperbolic distribution, 2009; R package version 0.6-2 available at https://CRAN.R-project.org/package=HyperbolicDist.

[CIT0056] 56.Soetaert K., rootSolve: Nonlinear root finding, equilibrium and steady-state analysis of ordinary differential equations, 2009; R-package version 1.6 available at https://CRAN.R-project.org/package=rootSolve.

[CIT0057] 57.Stehlík M., Potocký R., Waldl H., and Fabián Z., Some notes on the favourable estimation of fitting heavy tailed data, Tech. Rep. 32, IFAS Research Paper Series, Johannes Kepler University Linz, Linz, 2008.

[CIT0058] 58.Stehlík M., Potocký R., Waldl H., and Fabián Z., On the favorable estimation for fitting heavy tailed data, Comput. Stat. 25 (2010), pp. 485–503. doi: 10.1007/s00180-010-0189-1 [DOI] [Google Scholar]

[CIT0059] 59.Templ M., Gussenbauer J., and Filzmoser P., Evaluation of robust outlier detection methods for zero-inflated complex data, J. Appl. Stat. 47 (2019), pp. 1–24. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0060] 60.Venables W.N. and Ripley B.D., Modern Applied Statistics with S, 4th ed., Springer, New York, 2002. Available at http://www.stats.ox.ac.uk/pub/MASS4, ISBN 0-387-95457-0. [Google Scholar]

[CIT0061] 61.Wuertz D. and Chalabi Y., fGarch: Rmetrics – Autoregressive Conditional Heteroskedastic Modelling, 2016; Version 3010.82.1 (2016-08-15) available at https://cran.r-project.org/web/packages/fGarch/index.html.

[CIT0062] 62.Yeo A.C., Smith K.A., Willis R.J., and Brooks M., Clustering technique for risk classification and prediction of claim costs in the automobile insurance industry, Intell. Syst. Accounting Finance Manag. 10 (2001), pp. 39–50. doi: 10.1002/isaf.196 [DOI] [Google Scholar]

[CIT0063] 63.Zeileis A. and Windberger T., Glogis: fitting and testing generalized logistic distributions, 2014; R package version 1.0-0 available at http://CRAN.R-project.org/package=glogis.

[CIT0064] 64.Zucchini W., MacDonald I.L., and Langrock R., Hidden Markov models for time series: an introduction using R, Chapman & Hall/CRC Monographs on Statistics and Applied Probability, CRC Press, Boca Raton, 2017.

PERMALINK

Dichotomous unimodal compound models: application to the distribution of insurance losses

Salvatore D Tomarchio

Antonio Punzo

ABSTRACT

1. Introduction

2. Methodology

2.1. A general framework

2.2. Dichotomous unimodal compound models

2.3. Specific cases

2.3.1. Mode-parametrized log-normal distribution

Figure 1.

2.3.2. Mode-parametrized unimodal gamma distribution

Figure 2.

3. Maximum-likelihood estimation

4. Computational and operative aspects

4.1. Model comparison

4.1.1. Global fit evaluation

4.1.2. Right tail fit evaluation

4.2. Competing models and approaches

4.2.1. ML-estimated competitors

Table 1. R functions and packages used for the ML-based competitors.

4.2.2. The t-score approach

4.2.3. The PORT-MOp approach

5. Simulation study

5.1. Sensitivity analysis I

Table 2. UG+UG DGP: Average θ^ and γ^ values, with corresponding standard deviations in brackets, estimated over 10,000 replications by the UG (a) and UG-DUC (b) models.

Table 3. LN+LN DGP: Average θ^ and γ^ values, with corresponding standard deviations in brackets, estimated over 10000 replications by the LN (a) and LN-DUC (b) models.

Figure 3.

5.2. Sensitivity analysis II

Table 4. UG+LN DGP: Average θ^ and γ^ values, with corresponding standard deviations in brackets, estimated over 10000 replications by the UG (a) and UG-DUC (b) models.

Table 5. LN+UG DGP: Average θ^ and γ^ values, with corresponding standard deviations in brackets, estimated over 10000 replications by the LN (a) and LN-DUC (b) models.

Figure 4.

6. Application to insurance loss datasets

6.1. French business interruption losses

Figure 5.

Table 6. Frebiloss: summary statistics.

Table 7. Frebiloss: AIC and BIC values for the competing models, along with their rankings.

Table 8. Frebiloss: VaR95(X) and VaR99(X) with corresponding ranking and backtesting p-values.

Table 9. Frebiloss: TVaR95(X) and TVaR99(X) with corresponding ranking and backtesting p-values.

Figure 6.

6.2. Swedish fire insurance losses

Figure 7.

Table 10. Swefire: summary statistics.

Table 11. Swefire: AIC and BIC for the competing models, along with their rankings.

Table 12. Swefire: VaR95(X) and VaR99(X) with corresponding ranking and backtesting p-values.

Table 13. Swefire: TVaR95(X) and TVaR99(X) with corresponding ranking and backtesting p-values.

Figure 8.

7. Conclusions

Disclosure statement

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

4.2.3. The ${PORT-MO}_{p}$ approach

Table 2. UG+UG DGP: Average $\hat{θ}$ and $\hat{γ}$ values, with corresponding standard deviations in brackets, estimated over 10,000 replications by the UG (a) and UG-DUC (b) models.

Table 3. LN+LN DGP: Average $\hat{θ}$ and $\hat{γ}$ values, with corresponding standard deviations in brackets, estimated over 10000 replications by the LN (a) and LN-DUC (b) models.

Table 4. UG+LN DGP: Average $\hat{θ}$ and $\hat{γ}$ values, with corresponding standard deviations in brackets, estimated over 10000 replications by the UG (a) and UG-DUC (b) models.

Table 5. LN+UG DGP: Average $\hat{θ}$ and $\hat{γ}$ values, with corresponding standard deviations in brackets, estimated over 10000 replications by the LN (a) and LN-DUC (b) models.

Table 8. Frebiloss: ${VaR}_{95} (X)$ and ${VaR}_{99} (X)$ with corresponding ranking and backtesting p-values.

Table 9. Frebiloss: ${TVaR}_{95} (X)$ and ${TVaR}_{99} (X)$ with corresponding ranking and backtesting p-values.

Table 12. Swefire: ${VaR}_{95} (X)$ and ${VaR}_{99} (X)$ with corresponding ranking and backtesting p-values.

Table 13. Swefire: ${TVaR}_{95} (X)$ and ${TVaR}_{99} (X)$ with corresponding ranking and backtesting p-values.