Regression models for the full distribution to exceedance data

Fernando Ferraz do Nascimento; Aline Raquel Assunção Nunes

doi:10.1080/02664763.2022.2153812

2022 Dec 5;51(4):701–720.


PMCID: PMC10929683 PMID: 38476620

Abstract

The list of occurrences linked to significant climate change has grown in recent decades. These changes can be influenced by a set of covariates, such as temperature, location and period of the year. Analyzing the relation among elements and factors that influence the behavior of such events is extremely important for decision-making in order to minimize damages and losses. Exceedance analysis uses the tail of the distribution based on Extreme Value Theory (EVT). Extensions of these models have been proposed in the literature, such as regression models for the tail parameters and a parametric or semi-parametric distribution for the part that comes before the tail (known as the bulk distribution). This work presents a new extension of the exceedance model, in which the parameters of the bulk distribution capture the effect of covariates such as location and seasonality. We considered a Bayesian approach in the inference procedure. The estimation was done using Markov chain Monte Carlo (MCMC) methods. Application results for modeling maximum and minimum temperature data showed an efficient estimation of extreme quantiles and a predictive advantage compared to models previously used in the literature.

Keywords: Extreme value theory, exceedance models, regression models, MCMC, Bayesian inference

1. Introduction

Global warming is one of the most relevant issues in recent times. According to Parmesan et al. [16] and Sang and Gelfand [18], changes in extreme temperatures have more impact on people, causing more human and material losses, than changes in the mean temperature. In addition, knowing how often extreme events will occur helps in taking preventive measures, which can minimize human and financial losses.

Extreme Value Theory (EVT) emerged due, in part, to the need to predict the occurrence of these climatic events. The foundations of EVT were described in Fisher and Tippett [8], who defined three asymptotic distributions for block maxima of size n. In the 1950s, the Generalized Extreme Value (GEV) distribution, which encompasses the three distributions previously proposed by Fisher and Tippett [8], was proposed by von Mises [24] and Jenkinson [11]. The Generalized Pareto Distribution (GPD) was developed by Pickands [17], who proposed a limit distribution for excesses above a certain threshold.

Although these first works contributed greatly to EVT, specific problems in extremes created the need for more complex models. According to Davison and Smith [5], an important topic in statistics is the description of systematic variation in a variable of primary interest, a response, in terms of covariates. By combining multiple data sets using covariates, it is possible to use the GPD from a higher threshold and thus improve the fit of the model. According to Cabras et al. [2], the probability of extreme hydrological events depends on the period of the year or the characteristics of the geographical area. They related the GPD parameters to the available covariates through regression functions. In the context of modeling temperature, important covariates may be climate, location, altitude, and season.

1.1. The GPD distribution

The GPD was developed by Pickands [17], who showed that, as $u \to \infty$ , for a wide class of distributions with cdf F, the distribution of the exceedances over u converges to the GPD. Therefore, in EVT studies, a high threshold is chosen and the GPD is assumed to provide a good fit to the exceedances. The cdf of the GPD is given by:

G (x | ξ, η, u) = \begin{cases} 1 - \left(1 + ξ \frac{x - u}{η}\right)^{- 1 / ξ}, & \text{if } ξ \neq 0 \\ 1 - \exp \{- (x - u) / η\}, & \text{if } ξ = 0 \end{cases}

(1)

where $u > 0$ , $η > 0$ . The GPD is valid for x>u for $ξ \geq 0$ and $u < x < (u - η / ξ)$ for $ξ < 0$ . The parameters ξ, η and u represent shape, scale and location.

The GPD density is given by

g (x | ξ, η, u) = \begin{cases} \frac{1}{η} \left(1 + ξ \frac{x - u}{η}\right)^{- (1 + ξ) / ξ}, & \text{if } ξ \neq 0 \\ \frac{1}{η} \exp \{- (x - u) / η\}, & \text{if } ξ = 0 \end{cases}

(2)

According to Coles [4] and Embrechts et al. [7], in EVT, it is crucial to find a way to determine the high quantiles above the threshold, such that if X has a GPD distribution, it is important to know the probability of an event greater than or equal to q, i.e. $P (X > q) = 1 - p$ . The p-quantile of the GPD distribution is given by:

q (p ∣ ξ, η, u) = \begin{cases} u + \frac{((1 - p)^{- ξ} - 1) η}{ξ}, & \text{if } ξ \neq 0 \\ u - η \log (1 - p), & \text{if } ξ = 0 \end{cases}

For $ξ < 0$ , the GPD distribution has a finite upper bound, given by:

q_{0} = u - \frac{η}{ξ} .

(3)
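The formulas of this subsection can be checked numerically. The following is a minimal sketch in plain Python of Equations (1) to (3) and of the quantile expression; the function names and the numerical values in the usage note are illustrative assumptions, not part of the original work.

```python
import math

def gpd_cdf(x, xi, eta, u):
    """GPD cdf of Equation (1); returns 0 at or below the threshold u."""
    if x <= u:
        return 0.0
    z = (x - u) / eta
    if xi == 0.0:
        return 1.0 - math.exp(-z)
    base = 1.0 + xi * z
    if base <= 0.0:  # beyond the finite upper bound (only possible if xi < 0)
        return 1.0
    return 1.0 - base ** (-1.0 / xi)

def gpd_pdf(x, xi, eta, u):
    """GPD density of Equation (2); zero outside the support."""
    if x < u:
        return 0.0
    z = (x - u) / eta
    if xi == 0.0:
        return math.exp(-z) / eta
    base = 1.0 + xi * z
    if base <= 0.0:
        return 0.0
    return base ** (-(1.0 + xi) / xi) / eta

def gpd_quantile(p, xi, eta, u):
    """p-quantile of the GPD, i.e. the value q with P(X > q) = 1 - p."""
    if xi == 0.0:
        return u - eta * math.log(1.0 - p)
    return u + eta * ((1.0 - p) ** (-xi) - 1.0) / xi

def gpd_upper_bound(xi, eta, u):
    """Upper bound of Equation (3); infinite when xi >= 0."""
    return u - eta / xi if xi < 0.0 else math.inf
```

For instance, with the hypothetical values `xi = -0.2`, `eta = 3.0` and `u = 75.0`, `gpd_quantile(0.99, -0.2, 3.0, 75.0)` lies below `gpd_upper_bound(-0.2, 3.0, 75.0)`, and applying `gpd_cdf` to that quantile recovers 0.99.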

1.2. Mixture models to EVT

In recent decades, several works have presented different approaches to extreme value mixture models. These models describe not only the tail of the distribution but also the bulk distribution, which is the part of the distribution below the threshold.

Scarrott and MacDonald [19] discuss different approaches for the bulk distribution: fully parametric, semi-parametric and non-parametric. Figure 1 illustrates some of these approaches.

Figure 1(a) shows a model that was introduced in Behrens et al. [1]. It considers a GPD in the tail and a Gamma distribution in the bulk. The density of the Behrens et al. [1] model is given by

f (x | Θ) = \begin{cases} f_{G} (x | μ, ν), & \text{if } x \leq u \\ (1 - F_{G} (u | μ, ν)) g (x | ξ, η, u), & \text{if } x > u \end{cases},

(4)

where $Θ = (μ, ν, ξ, η, u)$ , g is the GPD density, and $f_{G}$ and $F_{G}$ are respectively the density and cumulative function of the Gamma distribution, whose density is given by

f_{G} (x | μ, ν) = \begin{cases} \frac{1}{Γ (ν)} (ν / μ)^{ν} x^{ν - 1} \exp (- (ν / μ) x), & \text{if } x \geq 0 \\ 0, & \text{if } x < 0 \end{cases},

(5)

with $μ > 0$ and $ν > 0$ . In this case, $E_{G} (X) = μ$ and $V_{G} (X) = μ^{2} / ν$ .
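As a brief check of this parameterization: density (5) is a Gamma density with shape ν and rate ν/μ, so the standard shape-rate moment formulas give the stated mean and variance.

```latex
E_{G}(X) = \frac{\nu}{\nu/\mu} = \mu,
\qquad
V_{G}(X) = \frac{\nu}{(\nu/\mu)^{2}} = \frac{\mu^{2}}{\nu}.
```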

Behrens et al. [1] proposed a Bayesian approach to parameter estimation and showed an advantage in quantile estimation compared with standard GPD models, which estimate the quantiles below the threshold empirically.

Figure 1(b) shows the approach proposed in Frigessi et al. [9]. It uses a Cauchy cumulative distribution function as the transition function between a Weibull distribution for the bulk and a GPD for the tail.

Figure 1(c) shows the model introduced by Carreau and Bengio [3], which includes constraints on the parameters to ensure continuity of the density up to its first derivative. In practice, it showed poor performance and was later extended to a mixture of hybrid Pareto distributions.

Figure 1(d) shows the approach of Tancredi et al. [23]. It is the first extreme value mixture model which combines a non-parametric estimator for the bulk distribution with GPD for tail distribution. The model uses a mixture of the uniform densities and provides a piecewise linear approximation for a cumulative distribution function below the threshold. This work was one of the first to consider the threshold as a parameter to be estimated.

Figure 1(e) presents the approach of MacDonald et al. [13]. It considers a kernel density estimator for the bulk distribution, combined with the GPD for the tail distribution. For populations with unbounded support, MacDonald et al. [13] use symmetric kernels, while for bounded support a boundary-corrected kernel density estimator is used.

Figure 1(f) presents the model of Nascimento et al. [14]. It represents an extension of the work by Behrens et al. [1]. It uses a non-parametric Bayesian approach with a finite mixture of Gamma distributions for the central part of the distribution and a GPD above the threshold. The BIC [21] and DIC [22] comparison criteria were used to choose an appropriate number of Gamma components in the mixture. The threshold determines where the distribution changes, and the estimation considers situations with and without a jump at the threshold.

1.3. Regression models in extreme parameters

Considering an approach that captures the effect of covariates, Nascimento et al. [13] introduced the MGPDR model, which is an extension of the Nascimento et al. [14] model, with density given by

f (x_{i} | Θ) = \begin{cases} \sum_{j = 1}^{k} p_{j} f_{G} (x_{i} | μ_{j}, ν_{j}), & \text{if } x_{i} \leq u_{i} \\ \left(1 - \sum_{j = 1}^{k} p_{j} F_{G} (u_{i} | μ_{j}, ν_{j})\right) g (x_{i} | ξ_{i}, η_{i}, u_{i}), & \text{if } x_{i} > u_{i} \end{cases},

(6)

where the parameters $(ξ_{i}, η_{i}, u_{i})$ are written as functions of covariates. The weight parameters satisfy $p_{j} \in (0, 1), j = 1, \dots, k$ , and $\sum_{j = 1}^{k} p_{j} = 1$ .

Nascimento et al. [13] showed in applications of environmental data that covariates of location and month of the year influence the parameters of the distribution and consequently their higher quantiles.

In this work, we propose a model with a regression structure for the bulk parameters, which distinguishes it from all of the other models discussed. We consider a Gamma distribution for the bulk with density as in (5), where the parameters μ and ν are written as functions of covariates. The threshold and tail parameters are also written as functions of covariates through regression coefficients, in a similar form to the work by Nascimento et al. [13]. With this model, we expect more accurate estimates for data that depend strongly on covariates such as location and seasonality, since each combination of covariates has its own tail and bulk distributions.

The work is organized as follows. Section 2 presents the proposed model and the estimation procedure for its parameters and quantiles. Section 3 presents two applications to extreme temperature data using the proposed model and compares it with previous models. Section 4 presents the main conclusions and final remarks.

2. The model

Considering the model of Behrens et al. [1] which proposes a Gamma distribution for the bulk and GPD distribution to tail, and considering the model of Nascimento et al. [13], in which the GPD parameters vary according to covariate effects, we present in this work a model in which the bulk parameters also vary as functions of covariates. The density is given by

f (x_{i} | Θ) = \begin{cases} f_{G} (x_{i} | μ_{i}, ν_{i}), & \text{if } x_{i} \leq u_{i} \\ (1 - F_{G} (u_{i} | μ_{i}, ν_{i})) g (x_{i} | ξ_{i}, η_{i}, u_{i}), & \text{if } x_{i} > u_{i} \end{cases}

(7)

The parameters of the Gamma and GPD distributions are written as functions of explanatory variables z and regression coefficients $Θ = (β_{ξ}, β_{η}, β_{u}, β_{μ}, β_{ν})$ . The model is denoted by $MGPDB (ξ, η, u, μ, ν)$ , or simply MGPDB, when there is no need to specify the parameters.

If an observation is greater than the threshold $u_{i}$ , it belongs to the GPD, whose parameters are written as functions of linear predictors of a set of covariates: $ξ_{i} = ξ (z_{1, i}^{'} β_{ξ})$ , with $β_{ξ} = (β_{ξ_{0}}, \dots, β_{ξ_{k_{ξ}}})$ , $η_{i} = η (z_{2, i}^{'} β_{η})$ , with $β_{η} = (β_{η_{0}}, \dots, β_{η_{k_{η}}})$ , and $u_{i} = u (z_{3, i}^{'} β_{u})$ , with $β_{u} = (β_{u_{0}}, \dots, β_{u_{k_{u}}})$ . The vectors $z_{1}, z_{2}$ and $z_{3}$ are the covariates of $ξ_{i}, η_{i}$ and $u_{i}$ respectively, and $β_{ξ}, β_{η}$ and $β_{u}$ are their respective parameter vectors. The link functions for the linear predictors of the tail parameters are as in Nascimento et al. [13], namely $ξ_{i} = \exp (z_{1, i}^{'} β_{ξ}) - 1$ and $η_{i} = \exp (z_{2, i}^{'} β_{η})$ . The restriction $ξ_{i} > - 1$ is necessary because, according to Smith [20], the maximum likelihood estimator does not exist for $ξ < - 1$ , which can make it difficult to obtain the estimators. For the threshold, the regression structure uses the identity link function $u_{i} = z_{3, i}^{'} β_{u}$ .

If an observation is lower than the threshold $u_{i}$ , it belongs to the Gamma distribution, whose parameters are written as functions of linear predictors: $μ_{i} = μ (z_{4, i}^{'} β_{μ})$ , with $β_{μ} = (β_{μ_{0}}, \dots, β_{μ_{k_{μ}}})$ , and $ν_{i} = ν (z_{5, i}^{'} β_{ν})$ , with $β_{ν} = (β_{ν_{0}}, \dots, β_{ν_{k_{ν}}})$ . The vectors $z_{4}$ and $z_{5}$ are the covariates of μ and ν, and $β_{μ}$ and $β_{ν}$ are their respective parameter vectors. The link functions for the linear predictors of the Gamma distribution are $μ_{i} = \exp (z_{4, i}^{'} β_{μ})$ and $ν_{i} = \exp (z_{5, i}^{'} β_{ν})$ . The exponential link function was chosen because the Gamma parameters are strictly positive.
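The link functions described above can be sketched in a few lines of Python. This is only an illustration: it assumes, for simplicity, that the same covariate vector z is used for all five parameters (as in the applications of Section 3), and the function names and coefficient values are hypothetical.

```python
import math

def linear_predictor(z, beta):
    """Inner product z' beta for one observation."""
    return sum(z_j * b_j for z_j, b_j in zip(z, beta))

def mgpdb_parameters(z, beta):
    """Map covariates to (xi, eta, u, mu, nu) through the link functions.

    z    : covariate vector, with z[0] = 1 for the intercept
    beta : dict with one coefficient list per parameter
    """
    return {
        "xi": math.exp(linear_predictor(z, beta["xi"])) - 1.0,  # guarantees xi > -1
        "eta": math.exp(linear_predictor(z, beta["eta"])),      # guarantees eta > 0
        "u": linear_predictor(z, beta["u"]),                    # identity link
        "mu": math.exp(linear_predictor(z, beta["mu"])),        # guarantees mu > 0
        "nu": math.exp(linear_predictor(z, beta["nu"])),        # guarantees nu > 0
    }

# Hypothetical coefficients for an intercept plus one covariate.
example_beta = {
    "xi": [0.0, 0.0],
    "eta": [math.log(2.0), 0.0],
    "u": [65.0, 2.0],
    "mu": [math.log(50.0), 0.1],
    "nu": [1.0, 0.0],
}
example_parameters = mgpdb_parameters([1.0, 0.5], example_beta)
```

Whatever the sign of the coefficients, the exponential links keep η, μ and ν positive and ξ above -1, which is why no further constraint on the regression coefficients is needed for these parameters.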

In EVT, high quantiles are an important summary, since they measure the probability of an extreme event. In the MGPDB model, the p-quantile can be written as

q (p ∣ Θ_{i}) = \begin{cases} u_{i} + \frac{((1 - p^{*})^{- ξ_{i}} - 1) η_{i}}{ξ_{i}}, & \text{if } ξ_{i} \neq 0 \\ u_{i} - η_{i} \log (1 - p^{*}), & \text{if } ξ_{i} = 0 \end{cases},

where

p^{*} = \frac{p - F_{G} (u_{i} | μ_{i}, ν_{i})}{1 - F_{G} (u_{i} | μ_{i}, ν_{i})} .

The expression for $q (p ∣ Θ_{i})$ above is valid for $p > F_{G} (u_{i} | μ_{i}, ν_{i})$ . Unlike in previously considered models, for $p < F_{G} (u_{i} | μ_{i}, ν_{i})$ the quantile can also be written as a function of covariates, by inverting the cdf of the Gamma distribution:

q (p ∣ Θ_{i}) = F_{G}^{- 1} (p, μ_{i}, ν_{i}) .
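The rescaling from p to p* can be made concrete with a short Python sketch of the tail branch of the quantile. Here the bulk probability F_G(u_i | μ_i, ν_i) is passed in as a number (`bulk_prob`) rather than computed, and the function name and the values in the usage note are hypothetical.

```python
import math

def mgpdb_tail_quantile(p, bulk_prob, xi, eta, u):
    """p-quantile of the MGPDB model for p above bulk_prob = F_G(u | mu, nu).

    The overall probability p is first rescaled to p*, the probability
    level within the GPD tail, and then the GPD quantile is applied.
    """
    if not bulk_prob < p < 1.0:
        raise ValueError("tail branch requires F_G(u) < p < 1")
    p_star = (p - bulk_prob) / (1.0 - bulk_prob)
    if xi == 0.0:
        return u - eta * math.log(1.0 - p_star)
    return u + eta * ((1.0 - p_star) ** (-xi) - 1.0) / xi
```

For example, if 90% of the mass lies in the bulk, the overall 99% quantile corresponds to the 90% quantile of the GPD above the threshold, since p* = (0.99 - 0.90) / (1 - 0.90) = 0.9. For p below the bulk probability, the Gamma cdf would have to be inverted instead, which is not shown in this sketch.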

2.1. Bayesian inference to the MGPDB model

In this work, we use a Bayesian approach to estimate the parameters of the proposed model, in the same manner as the related works presented in Section 1. Nascimento et al. [14] justify the choice of this approach due to its flexibility in considering the threshold as a parameter to be estimated, along with the parameters for the bulk distribution.

2.1.1. Prior distribution

As in the work of Nascimento et al. [13], since the parameters are now functions of the covariates and the associated coefficients can be positive or negative, normal prior distributions were chosen for the coefficients, given by $β_{u_{0}} \sim N (a_{0}, V_{β_{u_{0}}})$ and $β_{u_{i}} \sim N (0, V_{β_{u_{i}}})$ , with $i = 1, \dots, k_{u}$ , $β_{ξ_{0}} \sim N (b_{0}, V_{β_{ξ_{0}}})$ and $β_{ξ_{i}} \sim N (0, V_{β_{ξ_{i}}})$ , with $i = 1, \dots, k_{ξ}$ , $β_{η_{0}} \sim N (c_{0}, V_{β_{η_{0}}})$ and $β_{η_{i}} \sim N (0, V_{β_{η_{i}}})$ , with $i = 1, \dots, k_{η}$ , $β_{μ_{0}} \sim N (d_{0}, V_{β_{μ_{0}}})$ and $β_{μ_{i}} \sim N (0, V_{β_{μ_{i}}})$ , with $i = 1, \dots, k_{μ}$ and $β_{ν_{0}} \sim N (e_{0}, V_{β_{ν_{0}}})$ and $β_{ν_{i}} \sim N (0, V_{β_{ν_{i}}})$ , with $i = 1, \dots, k_{ν}$ , where $k_{u}, k_{ξ}, k_{η}, k_{μ}$ and $k_{ν}$ are the numbers of covariates for the respective parameters.

The values $a_{0}, b_{0}, c_{0}, d_{0}$ and $e_{0}$ , are the intercept effects for each parameter and we usually choose these values different from zero, considering the magnitude of each dataset. For example, the value of the intercept for u and μ, given by $a_{0}$ and $d_{0}$ respectively, depends on the sample space of the data. Usually, we consider a value at a high quantile for the intercept threshold $a_{0}$ , following the GPD assumption that considers that the GPD fits well for a sufficiently large threshold. For $d_{0}$ , which is the mean for the bulk, we can consider a value close to a measure of central tendency of the distribution.

The value of $c_{0}$ , which represents the intercept of the scale parameter of the GPD, was chosen to be around a scale measure (on the logarithmic scale) of the tail. For the shape parameter of the bulk distribution, the prior mean of the intercept $e_{0}$ was set to a low value, as the bulk distribution is usually asymmetric, as seen in previous works such as Nascimento et al. [13] and Nascimento et al. [14].

The prior intercept variances $V_{β_{u_{0}}}, V_{β_{ξ_{0}}}, V_{β_{η_{0}}}, V_{β_{μ_{0}}}$ and $V_{β_{ν_{0}}}$ are chosen to be reasonably large. According to Nascimento et al. [15], a very large prior variance for these parameters can make it difficult to recover the correct parameter values, while a small variance leads to underestimated temporal variation. With the variances proposed in Nascimento et al. [15], we can identify different types of behaviour for these parameters.

The coefficients $β_{u_{i}}, β_{ξ_{i}}, β_{η_{i}}, β_{μ_{i}}$ and $β_{ν_{i}}$ measure the covariate effect on each parameter and are assigned prior mean 0. In this way, the prior gives the same probability to positive and negative effects for each covariate, and the effects of the covariates are captured from the data.
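The prior structure just described can be illustrated with a short Python sketch of the log-prior of one coefficient vector, up to an additive constant. As a simplifying assumption, all covariate effects of a parameter share a single variance here; the function name is hypothetical, and the values in the usage note are those later used for the threshold coefficients in Application 1.

```python
def log_normal_prior(beta, intercept_mean, intercept_var, effect_var):
    """Log-prior (up to a constant) of one coefficient vector.

    beta[0] ~ N(intercept_mean, intercept_var) is the intercept;
    beta[1:] ~ N(0, effect_var) are the covariate effects.
    """
    value = -((beta[0] - intercept_mean) ** 2) / (2.0 * intercept_var)
    value -= sum(b ** 2 for b in beta[1:]) / (2.0 * effect_var)
    return value
```

For instance, `log_normal_prior([75.0, 1.0, 0.0, 0.0, 0.0], 65.0, 100.0, 10.0)` evaluates the prior N(65, 100) for the intercept and N(0, 10) for the four effects, giving -100/200 - 1/20 = -0.55. The full log-prior is the sum of such terms over the five coefficient vectors.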

2.1.2. Posterior distribution

The likelihood function of the MGPDB model is given by

L (u, μ, ν, η, ξ | x) = \prod_{i : x_{i} < u_{i}} (f_{G} (x_{i} | μ_{i}, ν_{i})) \prod_{i : x_{i} > u_{i}} ([1 - F_{G} (u_{i} | μ_{i}, ν_{i})] g (x_{i} | ξ_{i}, η_{i}, u_{i})),

(8)

with parameters given by $ξ_{i} = \exp (β_{ξ} {z^{'}}_{1, i}) - 1$ , $η_{i} = \exp (β_{η} {z^{'}}_{2, i})$ , $u_{i} = β_{u} {z^{'}}_{3, i}$ , $μ_{i} = \exp (β_{μ} {z^{'}}_{4, i})$ and $ν_{i} = \exp (β_{ν} {z^{'}}_{5, i})$ .

Given the prior and likelihood distribution, we can obtain the proportional form of posterior distribution of the model. The proportional form of the log-posterior distribution is given by

\begin{aligned} \log π (Θ | x) \propto {} & \sum_{i : x_{i} < u_{i}} \log [f_{G} (x_{i} | μ_{i}, ν_{i})] + \sum_{i : x_{i} \geq u_{i}} \log (1 - F_{G} (u_{i} | μ_{i}, ν_{i})) \\ & + \sum_{i : x_{i} \geq u_{i}} \log [g (x_{i} | ξ_{i}, η_{i}, u_{i})] \\ & - \frac{(β_{u_{0}} - a_{0})^{2}}{2 V_{β_{u_{0}}}} - \sum_{i = 1}^{k_{u}} \frac{β_{u_{i}}^{2}}{2 V_{β_{u_{i}}}} - \frac{(β_{ξ_{0}} - b_{0})^{2}}{2 V_{β_{ξ_{0}}}} - \sum_{i = 1}^{k_{ξ}} \frac{β_{ξ_{i}}^{2}}{2 V_{β_{ξ_{i}}}} \\ & - \frac{(β_{η_{0}} - c_{0})^{2}}{2 V_{β_{η_{0}}}} - \sum_{i = 1}^{k_{η}} \frac{β_{η_{i}}^{2}}{2 V_{β_{η_{i}}}} \\ & - \frac{(β_{μ_{0}} - d_{0})^{2}}{2 V_{β_{μ_{0}}}} - \sum_{i = 1}^{k_{μ}} \frac{β_{μ_{i}}^{2}}{2 V_{β_{μ_{i}}}} - \frac{(β_{ν_{0}} - e_{0})^{2}}{2 V_{β_{ν_{0}}}} - \sum_{i = 1}^{k_{ν}} \frac{β_{ν_{i}}^{2}}{2 V_{β_{ν_{i}}}} \end{aligned}

where $Θ = (β_{ξ}, β_{η}, β_{u}, β_{μ}, β_{ν})$ are the parameters.

The prior and likelihood support in Θ depend on the covariates $z$ and the restriction of the parameters of the MGPDB model for the GPD distribution that must satisfy, when $ξ \neq 0$ :

(1 + \frac{ξ_{i}}{η_{i}} (x_{i} - u_{i})) > 0,

(9)

for all $i = 1, \dots, n$ with $x_{i} \geq u_{i}$ . For the cases where $x_{i} < u_{i}$ , the restriction is not necessary, as these observations are modelled by the Gamma distribution.
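The likelihood (8) together with restriction (9) can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the Gamma cdf is evaluated with a simple power series (adequate for moderate arguments), observations equal to the threshold are assigned to the bulk as in (7), and all function names are hypothetical.

```python
import math

def gamma_logpdf(x, mu, nu):
    """Log of the Gamma density (5), with mean mu and shape nu."""
    if x <= 0.0:
        return -math.inf
    rate = nu / mu
    return nu * math.log(rate) + (nu - 1.0) * math.log(x) - rate * x - math.lgamma(nu)

def gamma_cdf(x, mu, nu):
    """Gamma cdf via the series of the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    t = nu * x / mu
    term, total, n = 1.0, 1.0, 0
    while term > 1e-17 * total:
        n += 1
        term *= t / (nu + n)
        total += term
    log_front = nu * math.log(t) - t - math.lgamma(nu + 1.0)
    return min(1.0, total * math.exp(log_front))

def gpd_logpdf(x, xi, eta, u):
    """Log of the GPD density (2); -inf when restriction (9) fails."""
    z = (x - u) / eta
    if xi == 0.0:
        return -math.log(eta) - z
    base = 1.0 + xi * z
    if base <= 0.0:
        return -math.inf
    return -math.log(eta) - (1.0 + xi) / xi * math.log(base)

def mgpdb_loglik(x, params):
    """Log-likelihood; params holds one dict (xi, eta, u, mu, nu) per observation."""
    total = 0.0
    for obs, p in zip(x, params):
        if obs <= p["u"]:  # bulk: Gamma contribution
            total += gamma_logpdf(obs, p["mu"], p["nu"])
        else:              # tail: Gamma survival at u times GPD density
            survival = 1.0 - gamma_cdf(p["u"], p["mu"], p["nu"])
            if survival <= 0.0:
                return -math.inf
            total += math.log(survival) + gpd_logpdf(obs, p["xi"], p["eta"], p["u"])
    return total
```

Returning minus infinity when (9) is violated means that, inside an MCMC scheme, any proposal outside the support is rejected automatically.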

The parameters were estimated using MCMC techniques. We ran 100,000 iterations, of which the first 60,000 were discarded as burn-in; from the remaining 40,000, every 100th draw was retained, resulting in 400 points for each parameter. Each tail parameter vector $β_{u}, β_{ξ}$ and $β_{η}$ and each bulk parameter vector $β_{μ}$ and $β_{ν}$ is updated as a block by the Metropolis-Hastings algorithm. Details of the algorithm are given in the Appendix.
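The sampling scheme can be sketched as a block-wise Metropolis-Hastings loop with burn-in and thinning. This is a generic illustration only: the Gaussian random-walk proposal, the common step size and all names are assumptions made here, and the proposal actually used by the authors is the one described in their Appendix.

```python
import math
import random

def block_metropolis(log_post, theta, blocks, step, n_iter, burn_in, thin, seed=0):
    """Random-walk Metropolis-Hastings that updates one coefficient block at a time.

    log_post : function mapping a dict of coefficient lists to the log-posterior
    theta    : starting values (must have a finite log-posterior)
    blocks   : names of the blocks to update, e.g. ["u", "xi", "eta", "mu", "nu"]
    """
    rng = random.Random(seed)
    current = log_post(theta)
    kept = []
    for it in range(n_iter):
        for name in blocks:
            proposal = dict(theta)
            proposal[name] = [b + rng.gauss(0.0, step) for b in theta[name]]
            candidate = log_post(proposal)
            # Symmetric proposal: accept with probability min(1, posterior ratio).
            if rng.random() < math.exp(min(0.0, candidate - current)):
                theta, current = proposal, candidate
        if it >= burn_in and (it - burn_in) % thin == 0:
            kept.append({k: list(v) for k, v in theta.items()})
    return kept
```

With `n_iter=100000`, `burn_in=60000` and `thin=100`, the retained iterations are 60000, 60100, ..., 99900, that is, 400 draws per parameter, matching the numbers reported above.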

3. Applications

To verify the applicability of the proposed model, two environmental datasets that present covariate effects related to seasonality and location were used in this work. The data consist of extreme temperatures.

3.1. Application 1: maximum temperature in US

The dataset consists of the average daily temperature, in °F, from 1995 to 2008 for 84 US cities. This dataset was collected from the University of Dayton website (academic.udayton.edu/kissock/http/weather). The analysis was performed using monthly maxima of 14,356 observations. Among the factors that can influence the temperature of cities, we highlight the location and period of the year. As the United States has a large continental territorial extension, the local temperature is strongly influenced by latitude. Since temperature data are seasonal, seasonality is incorporated into the model through trigonometric functions. The covariate vector for this model is given by $z_{i} = (z_{0, i}, z_{1, i}, z_{2, i}, z_{3, i}, z_{4, i})$ , where the intercept is represented by $z_{0, i} = 1$ , and the seasonal behaviour is given by $z_{1, i} = \cos (\frac{2 π m_{i}}{12})$ and $z_{2, i} = \sin (\frac{2 π m_{i}}{12})$ , where $m_{i}$ is the month of observation i. The latitude effect is captured by $z_{3, i} = \frac{l_{i} - m_{l}}{10}$ , where $l_{i}$ is the latitude of observation i and $m_{l}$ is the mean latitude of all observations, and the influence of altitude is indicated by $z_{4, i} = \frac{a_{i} - m_{a}}{100}$ , where $a_{i}$ is the altitude of observation i and $m_{a}$ is the mean altitude of all observations.
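The construction of this covariate vector can be written as a small Python function. The function name is hypothetical, months are assumed to be coded 1 to 12, and the mean latitude and altitude in the usage note are placeholders, since the sample means are not reported in the text.

```python
import math

def us_covariates(month, latitude, altitude, mean_latitude, mean_altitude):
    """Covariate vector (z0, z1, z2, z3, z4) for one observation.

    month     : month of the observation (assumed coded 1..12)
    latitude  : in decimal degrees; altitude in metres
    """
    angle = 2.0 * math.pi * month / 12.0
    return [
        1.0,                                 # z0: intercept
        math.cos(angle),                     # z1: seasonal cosine term
        math.sin(angle),                     # z2: seasonal sine term
        (latitude - mean_latitude) / 10.0,   # z3: centered latitude, per 10 degrees
        (altitude - mean_altitude) / 100.0,  # z4: centered altitude, per 100 m
    ]
```

For example, `us_covariates(12, 25.767, 2.0, 38.0, 300.0)` would describe a December observation at the latitude and altitude of Miami Beach, with 38.0 and 300.0 standing in for the unknown sample means. The cosine and sine pair lets each parameter follow a smooth annual cycle with a free phase and amplitude using only two coefficients.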

The parameters of the MGPDB model were written as functions of the covariate vector z. The difficulty in estimating the threshold, the parameter that separates the tail from the bulk, requires a prior variance that is not very high. Thus, the prior distribution for the parameters $β_{u}$ is given by $β_{u_{0}} \sim N (65, 100)$ and $β_{u_{i}} \sim N (0, 10)$ , with $i = 1, \dots, 4$ . We set the prior mean of the intercept to 65, as this value represents a high quantile of the dataset. A variance of 100 for the intercept and 10 for the effects of the covariates were chosen according to the magnitude of the data and their possible variations, with greater variation in the intercept than in the effects of the covariates. Works like Nascimento et al. [13] used these values for similar applications, obtaining good results and good fit measures. The prior distribution for the parameters $β_{ξ}$ was given by $β_{ξ_{0}} \sim N (0, 10)$ and $β_{ξ_{i}} \sim N (0, 10)$ , with $i = 1, \dots, 4$ . As this parameter is usually around 0 in many extreme value applications, we also used this value as the prior mean of the intercept. The prior distribution for $β_{η}$ was $β_{η_{0}} \sim N (2, 100)$ and $β_{η_{i}} \sim N (0, 100)$ , with $i = 1, \dots, 4$ . We chose these prior parameters in the same manner as Nascimento et al. [13]. For the parameter $β_{ν}$ the prior was given by $β_{ν_{0}} \sim N (3, 100)$ and $β_{ν_{i}} \sim N (0, 100)$ , in which the mean of the intercept represents a value around a bulk shape parameter for the distribution (on the logarithmic scale, as we use the log link for this parameter). For the parameters $β_{μ}$ a Normal prior distribution was considered, with mean centered around the median of the data and variance 100. Application results show that, with these variances, the prior has little influence on the estimation, so that the posterior information comes almost entirely from the data.

Table 1 presents the BIC and DIC fit measures for the US maximum temperature data, compared with previous models used in extreme value estimation: the MG model corresponds to a fully semi-parametric model with a mixture of Gammas [25]; the MGPD model corresponds to a mixture of Gammas with static GPD parameters [14] (when k = 1, this is the particular case of the Behrens et al. [1] model); and the MGPDR model is the approach of Nascimento et al. [13], which considers a mixture of Gammas for the bulk and a GPD for the tail, with a regression structure only in the tail parameters. We can observe that the proposed model presents better fit measures for this application.

Table 1.

Fit measures for the US data.

Model	DIC ($\times 10^{5}$)	BIC ($\times 10^{5}$)
MG	1.1321	1.1331
MGPD	1.1325	1.1335
MGPDR	0.9141	0.9163
MGPDB	0.7368	0.7390


Table 2 presents the posterior means and credibility intervals for all parameters of this model. We consider equal-tailed credibility intervals, given by the $α / 2$ and $(1 - α / 2)$ quantiles of the MCMC sampled points (see [10]). The seasonal coefficients $β_{1}$ and $β_{2}$ are significant for all parameters, indicating the effect of the month throughout the year. Latitude had a negative influence on all parameters, that is, the higher the latitude, the lower the values of $u, ξ, η, μ$ and ν. The effect of altitude is significant for almost all parameters. The exception is $β_{4, ξ}$ , the effect of altitude on the tail shape parameter ξ. Altitude has a negative influence on the parameter μ, that is, the higher the altitude, the lower the mean of the Gamma, while the parameters $u, η$ and ν increase with altitude.

Table 2.

Posterior means and $95 %$ credibility intervals for the parameters of the MGPDB model applied to the US maximum temperature data.

$β_{0, u}$	$β_{1, u}$	$β_{2, u}$	$β_{3, u}$	$β_{4, u}$
75.247	$- 3.582$	$- 2.725$	$- 3.720$	0.539
$(75.202, 75.282)$	$(- 3.616, - 3.549)$	$(- 2.772, - 2.661)$	$(- 3.754, - 3.676)$	$(0.533, 0.544)$
$β_{0, η}$	$β_{1, η}$	$β_{2, η}$	$β_{3, η}$	$β_{4, η}$
1.257	$- 0.691$	$- 0.442$	$- 0.229$	0.029
$(1.199, 1.317)$	$(- 0.754, - 0.625)$	$(- 0.499, - 0.387)$	$(- 0.293, - 0.168)$	$(0.017, 0.040)$
$β_{0, ξ}$	$β_{1, ξ}$	$β_{2, ξ}$	$β_{3, ξ}$	$β_{4, ξ}$
$- 0.234$	0.115	0.088	$- 0.098$	0.004
$(- 0.271, - 0.201)$	$(0.065, 0.164)$	$(0.055, 0.124)$	$(- 0.142, - 0.052)$	$(- 0.004, 0.011)$
$β_{0, μ}$	$β_{1, μ}$	$β_{2, μ}$	$β_{3, μ}$	$β_{4, μ}$
4.265	$- 0.238$	$- 0.152$	$- 0.218$	$- 0.0068$
$(4.263, 4.268)$	$(- 0.242, - 0.234)$	$(- 0.155, - 0.149)$	$(- 0.224, - 0.213)$	$(- 0.007, - 0.006)$
$β_{0, ν}$	$β_{1, ν}$	$β_{2, ν}$	$β_{3, ν}$	$β_{4, ν}$
4.474	$- 0.069$	$- 0.218$	$- 0.532$	0.057
$(4.439, 4.506)$	$(- 0.119, - 0.022)$	$(- 0.263, - 0.174)$	$(- 0.604, - 0.465)$	$(0.051, 0.062)$

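The equal-tailed credibility intervals reported in Table 2 are quantiles of the retained MCMC draws. A minimal sketch of that computation in plain Python follows; the linear interpolation between order statistics and the function names are assumptions of this illustration.

```python
import math

def equal_tailed_interval(draws, alpha=0.05):
    """(alpha/2, 1 - alpha/2) quantiles of a list of MCMC draws."""
    ordered = sorted(draws)

    def quantile(p):
        # Linear interpolation between neighbouring order statistics.
        pos = p * (len(ordered) - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return quantile(alpha / 2.0), quantile(1.0 - alpha / 2.0)

def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    lower, upper = interval
    return lower > 0.0 or upper < 0.0
```

Reading "significant" as "the interval excludes zero", the Table 2 interval for $β_{4, ξ}$ , (-0.004, 0.011), gives `excludes_zero((-0.004, 0.011)) == False`, while the interval for $β_{4, u}$ , (0.533, 0.544), gives `True`.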

Figure 2 presents the threshold estimates for different months for the cities of Miami Beach-FL, Seattle-WA, San Diego-CA and Portland-ME, with latitudes of $25^{\circ} 46^{'}$ , $47^{\circ} 36^{'}$ , $32^{\circ} 48^{'}$ and $43^{\circ} 39^{'}$ and altitudes of $2 m$ , $53 m$ , $19 m$ and $16 m$ respectively. According to the proposed model, the points above the threshold are modelled by the GPD, while the points below the threshold are modelled by the Gamma distribution. Sea-level cities like Miami tend to have higher temperatures, in both the coldest and warmest months, due to their lower altitudes. Also, as this city has one of the lowest latitudes in the US, it tends to have higher temperatures than other cities. Much of the data for this city lie in the tail of the distribution, that is, these data are considered extremes belonging to the GPD. Portland and Seattle behave similarly due to their high latitudes. In colder months, the Gamma distribution proved more suitable for fitting temperatures, and in warm months the data are modelled partly by the Gamma and partly by the GPD. These cities have a wide variation in temperatures, and the coldest months register very low temperatures. In San Diego, the maximum temperatures in the coldest months show little variation from the warmer months. For this city almost all data are fitted by the Gamma, as the observations are below the threshold.

Altitude is one of the factors that most influence the characteristics of a region. Cities at higher altitudes tend to have milder temperatures and cities at lower altitudes tend to have higher temperatures. This behaviour is captured by studying the threshold among all cities for each month of the year. As an example, Denver-CO, due to its high altitude, shows lower temperatures than cities located at low altitudes, such as Miami Beach-FL. Considering the four cities in Figure 2, we can note the importance of considering covariates for the bulk distribution: the four cities have many points below the threshold, with different behaviours that can only be captured by a model with covariates, unlike the previous works cited in Section 1, which consider the same bulk distribution for all locations and months.

Figure 3 shows the histograms of the MCMC posterior points of the upper bound $q_{0}$ given in Equation (3) for these cities in the months of January and July. The upper bound temperature that the city of Miami, with an altitude of $2 m$ , can reach on a January day is between ${75.45}^{\circ} F$ and ${75.55}^{\circ} F$ , while in Denver, whose altitude is $1597 m$ , in the same month, the upper bound is between ${70.20}^{\circ} F$ and ${70.40}^{\circ} F$ . In July, Miami can record upper bound temperatures between ${84.35}^{\circ} F$ and ${84.45}^{\circ} F$ , while in Denver the value is between ${79.20}^{\circ} F$ and ${79.28}^{\circ} F$ .

Figure 4 shows the histograms of the MCMC posterior points for the $99 %$ quantile. In Salt Lake City-UT, the probability of a temperature being higher than ${67.28}^{\circ} F$ in April, or higher than ${92.79}^{\circ} F$ in September, is $1 %$ , while in Houston the $99 %$ quantile is ${83.03}^{\circ} F$ in April and ${97.33}^{\circ} F$ in September.

Figure 5 presents the estimated $90 %, 95 %$ and $99 %$ quantiles for Miami, Salt Lake City, Seattle and Houston. Some observations are higher than the $90 %$ and $95 %$ quantiles throughout the year, especially in the winter months. For most months there is no value higher than the estimated $99 %$ quantile, although some observed values are close to it.

Figure 6 shows maps of the estimated $95 %$ and $99 %$ temperature quantiles for the entire US, obtained from the estimates for the 84 analyzed cities by ordinary kriging [6]. There is a large difference in temperature across the country between winter and summer, with the difference in extreme quantiles between the seasons reaching about $40^{\circ} F$ . We also observe that the estimate for the 99% quantile is higher than that for the 95% quantile by about $5^{\circ} F$ to $7^{\circ} F$ . The highest temperatures are observed in the Midwest and Florida, and the lowest maximum temperatures in the northwest region.

3.2. Application 2: minimum temperature in Rio de Janeiro

This dataset consists of daily minimum temperature data, in °C, and covers the period from 1961 to 2000 in 37 cities in Rio de Janeiro State, in Brazil. The data were collected from the INMET (National Institute of Meteorology) website (portal.inmet.gov.br). For the analysis, monthly minimum temperatures were collected, a total of 11,336 observations.

The tail of interest for the minimum temperature data is on the left, and the highest temperature found in the observations was 25.6 °C. To analyze this dataset with the proposed model, the transformation $x = 30.0 - y$ was used, where y are the original data, in order to guarantee positive and right-tailed observations, in addition to allowing for minimum observations greater than 25.6 °C. Figure 7 shows the histograms of the original and the transformed data, respectively.

Figure 7. Histogram of original and transformed data for the Rio de Janeiro minimum temperature.
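The transformation and its inverse are simple, but the change of tail is worth making explicit. In the sketch below (function names are hypothetical), a high quantile estimated on the transformed scale maps back to a low quantile of the original minimum temperatures.

```python
SHIFT = 30.0  # constant used in the transformation x = 30.0 - y

def to_model_scale(y):
    """Transform a minimum temperature y (in °C) into the right-tailed variable x."""
    return SHIFT - y

def to_temperature(x):
    """Map a value on the model scale back to a temperature in °C."""
    return SHIFT - x

def lower_quantile_from_upper(x_quantile):
    """If x_q is the p-quantile of x, then 30 - x_q is the (1 - p)-quantile of y."""
    return to_temperature(x_quantile)
```

For example, the highest observed minimum temperature, 25.6 °C, becomes `to_model_scale(25.6)`, which is about 4.4 and therefore positive, as required by the Gamma bulk. Since the transformation reverses the ordering, the coldest observations become the largest values of x and are the ones modelled by the GPD tail.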

The Rio de Janeiro state stands out for its diverse landscapes, with mountains and lowlands in its relief. Given the local geography and the small latitudinal variation of the state, the altitude covariate is more relevant. The covariate referring to the months of the year, also responsible for changes in temperature, is included through trigonometric transformations. The covariate vector is given by $z_{i} = (z_{0, i}, z_{1, i}, z_{2, i}, z_{3, i}, z_{4, i})$ , where the intercept is given by $z_{0, i} = 1$ , the monthly effect is given by $z_{1, i} = \cos (\frac{2 π m_{i}}{12})$ and $z_{4, i} = \sin (\frac{2 π m_{i}}{12})$ , where $m_{i}$ is the month of observation i, and the altitude effect is given by $z_{2, i} = \frac{a_{i} - m_{a}}{100}$ , where $a_{i}$ is the altitude of observation i and $m_{a}$ is the mean altitude over all observations. Finally, the interaction effect between month and altitude is given by $z_{3, i} = z_{1, i} z_{2, i}$ .
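The ordering of the entries in this second covariate vector differs from the first application, so a short Python sketch may help. The function name is hypothetical, months are assumed to be coded 1 to 12, and the mean altitude in the usage note is a placeholder.

```python
import math

def rio_covariates(month, altitude, mean_altitude):
    """Covariate vector (z0, z1, z2, z3, z4) for one Rio de Janeiro observation."""
    angle = 2.0 * math.pi * month / 12.0
    z1 = math.cos(angle)                    # seasonal cosine term
    z2 = (altitude - mean_altitude) / 100.0  # centered altitude, per 100 m
    z3 = z1 * z2                            # month-altitude interaction
    z4 = math.sin(angle)                    # seasonal sine term
    return [1.0, z1, z2, z3, z4]
```

For instance, `rio_covariates(6, 500.0, 300.0)` describes a June observation 200 m above a placeholder mean altitude of 300 m. Note that the interaction uses only the cosine term, exactly as in the definition $z_{3, i} = z_{1, i} z_{2, i}$ .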

The tail and bulk parameters of the MGPDB model proposed in this work were written as functions of the covariate vector z. As with the US maximum temperature data, the prior distribution for the threshold is more informative for the minimum temperature data. The prior distribution of the parameter $β_{u}$ is given by $β_{u_{0}} \sim N (6, 10)$ and $β_{u_{i}} \sim N (0, 1)$ , with $i = 1, \dots, 4$ . The prior distributions of $β_{ξ}$ , $β_{η}$ , $β_{μ}$ and $β_{ν}$ are given respectively by $β_{ξ_{0}} \sim N (0, 10)$ and $β_{ξ_{i}} \sim N (0, 10)$ , $β_{η_{0}} \sim N (0.5, 100)$ and $β_{η_{i}} \sim N (0, 100)$ , $β_{μ_{0}} \sim N (1.4, 10)$ and $β_{μ_{i}} \sim N (0, 1)$ and $β_{ν_{0}} \sim N (1, 100)$ and $β_{ν_{i}} \sim N (0, 100)$ , with $i = 1, \dots, 4$ . The criterion for choosing the prior parameters was similar to that used in the first application. As the magnitude of these data is smaller, the prior values for the intercepts of the parameters were smaller.

As with the data from the United States, the DIC and BIC fit measures for the minimum temperature data, given in Table 3, show an advantage of the proposed model over previous models. Table 4 presents the posterior means and credibility intervals for the parameters of the MGPDB model. All parameters are influenced by the seasonal effect of the month of the year, since all $β_{1}$ and $β_{4}$ are significant. Altitude ( $β_{2}$ ) is also related to the threshold and to the mean of the Gamma. Because we are working with transformed data, the interpretation for the original minimum temperature observations is that the higher the altitude, the lower the threshold and Gamma mean values. The effect of the interaction between the months of the year and altitude is almost non-existent since most of the results were not significant, except in $β_{3, u}$ , which is the effect of the interaction between altitude and season of the year in determining the threshold.

Table 3.

Fit measures for the Rio de Janeiro data.

Model	$10^{5}$ DIC	$10^{5}$ BIC
MG	0.5863	0.5876
MGPD	0.5866	0.5874
MGPDR	0.4222	0.4248
MGPDB	0.1146	0.1168


Table 4.

Posterior mean and $95 %$ credibility interval for the parameters of the MGPDB model applied to the Rio de Janeiro minimum data.

$β_{0, u}$	$β_{1, u}$	$β_{2, u}$	$β_{3, u}$	$β_{4, u}$
10.445	$- 1.618$	0.528	$- 0.092$	$- 1.501$
$(10.442, 10.450)$	$(- 1.623, - 1.613)$	$(0.527, 0.529)$	$(- 0.093, - 0.091)$	$(- 1.505, - 1.497)$
$β_{0, η}$	$β_{1, η}$	$β_{2, η}$	$β_{3, η}$	$β_{4, η}$
0.731	$- 0.454$	0.057	0.014	$- 0.158$
$(0.708, 0.754)$	$(- 0.482, - 0.424)$	$(0.050, 0.064)$	$(0.005, 0.024)$	$(- 0.188, - 0.123)$
$β_{0, ξ}$	$β_{1, ξ}$	$β_{2, ξ}$	$β_{3, ξ}$	$β_{4, ξ}$
$- 0.264$	0.096	$- 0.041$	$- 0.009$	0.051
$(- 0.277, - 0.249)$	$(0.079, 0.112)$	$(- 0.046, - 0.035)$	$(- 0.017, - 0.003)$	$(0.033, 0.068)$
$β_{0, μ}$	$β_{1, μ}$	$β_{2, μ}$	$β_{3, μ}$	$β_{4, μ}$
2.529	$- 0.217$	0.089	8.47219e−05	$- 0.243$
$(2.515, 2.546)$	$(- 0.234, - 0.201)$	$(0.083, 0.096)$	$(- 0.00798, - 0.00790)$	$(- 0.255, - 0.229)$
$β_{0, ν}$	$β_{1, ν}$	$β_{2, ν}$	$β_{3, ν}$	$β_{4, ν}$
3.849	$- 0.174$	$- 0.122$	0.032	0.157
$(3.748, 3.934)$	$(- 0.278, - 0.072)$	$(- 0.153, - 0.093)$	$(- 0.009, 0.073)$	$(0.616, 0.253)$


Figure 8 presents the evolution of the threshold throughout the year for the cities of Campos dos Goytacazes, Resende, Teresópolis and Cabo Frio. In the proposed MGPDB model, observations above the threshold are modeled by the GPD distribution and observations below the threshold by the Gamma distribution. Teresópolis, located at an altitude of $869 m$ , concentrates most of the above-average observations higher than the estimated threshold. The cities of Campos dos Goytacazes, Resende and Cabo Frio show similar behaviour regarding the fit to the observations. Because Resende is located at $407 m$ of altitude, its average temperatures in the hottest and coldest months are lower than those of Campos dos Goytacazes, with an elevation of $14 m$ , and Cabo Frio, with an elevation of $4 m$ .

When working with minimum data, extreme quantiles are analyzed through the transformed data, so that upper quantiles of the transformed data correspond to lower quantiles of the original data. To find the $1 %$ quantile for a city in a given month of the year, we find the distribution of the $99 %$ quantile of the transformed data and return this distribution to the scale of the original data. Figure 9 shows the distribution of the estimated $1 %$ quantile. Observing the line of the estimated $1 %$ quantile, we can say that the probability of the temperature being less than ${23.02}^{\circ} C$ in April and ${20.83}^{\circ} C$ in September in the city of Rio de Janeiro is $1 %$ , while in Cabo Frio the corresponding temperatures for April and September are ${23.01}^{\circ} C$ and ${20.82}^{\circ} C$ respectively.

Figure 9. Histogram of the estimated $1 %$ quantile for Rio de Janeiro and Cabo Frio.
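
The mapping from an upper quantile of the transformed data back to a lower quantile of the original data can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the usual mixture form in which the GPD describes the excesses above the threshold, and the names `sigma` (GPD scale), `xi` (GPD shape) and `bulk_mass` (probability that the bulk distribution assigns below the threshold) are introduced only for this example. The parametrization used in the paper may differ.

```python
import math

SHIFT = 30.0  # constant of the transformation x = 30.0 - y used in the text


def upper_quantile_transformed(p, u, sigma, xi, bulk_mass):
    """Quantile of level p of the transformed data x, for p in the tail.

    Assumes F(x) = bulk_mass + (1 - bulk_mass) * G(x) for x > u, where G is
    the GPD distribution function with threshold u, scale sigma and shape xi.
    """
    if not bulk_mass < p < 1.0:
        raise ValueError("p must lie in the tail: bulk_mass < p < 1")
    # Survival probability of the excess, conditional on exceeding u.
    tail_prob = (1.0 - p) / (1.0 - bulk_mass)
    if abs(xi) < 1e-12:
        # Exponential limit of the GPD when the shape is zero.
        return u - sigma * math.log(tail_prob)
    return u + sigma / xi * (tail_prob ** (-xi) - 1.0)


def lower_quantile_original(level, u, sigma, xi, bulk_mass):
    """Quantile of level `level` (e.g. 0.01) on the original scale y."""
    x_q = upper_quantile_transformed(1.0 - level, u, sigma, xi, bulk_mass)
    # x = SHIFT - y is decreasing, so P(y <= SHIFT - x_q) = P(x >= x_q) = level.
    return SHIFT - x_q
```

Applying `lower_quantile_original` to each posterior draw of the parameters, for a given city and month, would give a sample from the posterior distribution of the $1 %$ quantile, which is the kind of quantity summarized in Figure 9.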

Figure 10 shows the variation of the minimum quantiles throughout the year. The cities of Angra dos Reis ( $6 m$ of altitude) and Cabo Frio ( $4 m$ of altitude) show similar behaviour for the $1 %$ , $5 %$ and $10 %$ quantiles. Similarly, the cities of Pinheiral ( $345 m$ of altitude) and Carmo ( $347 m$ of altitude) show similarities regarding the extreme data captured between the months of May and December by the $1 %$ , $5 %$ and $10 %$ quantiles.

4. Conclusions

This work proposed a new extreme value mixture model, which considers a regression structure for both the bulk and the tail of the distribution. The MGPDB model uses a GPD distribution and a Gamma distribution with parameters varying according to pertinently selected covariates to estimate values below and above the threshold.

The BIC and DIC fit measures showed advantages in the estimation, since including the effect of the covariates in both the tail and the bulk of the distribution provides better fits than the previous MGPDR model, which uses regression only on the tail parameters.

In both applications, the model obtained significant results regarding the effect of covariates on almost all parameters. In the analysis of the behaviour of the threshold, characteristics relevant to the covariates latitude and altitude were evidenced through the evolution of the threshold in both data sets analyzed.

The MGPDB model has the advantage of providing characteristic behaviors of all locations and periods of the year, in addition to providing better BIC and DIC measures for the estimation of important indexes in extreme values such as the estimation of higher quantiles, when analyzing data on maximum temperatures, and lower quantiles, when minimum temperatures are studied.

Appendix -- MCMC Algorithm.

The MCMC algorithm updates the parameters of the MGPDB model in blocks.

At iteration s, the parameter chains are updated to their values at iteration s + 1 as follows:

Sampling $β_{ξ}$

For each $β_{ξ_{i}}, i = 0, \dots, k_{ξ}$ , we sample $β_{ξ_{i}}^{*}$ from $N (β_{ξ_{i}}^{(s)}, V_{β_{ξ_{i}}})$ . If the sampled vector $β_{ξ}^{*}$ does not satisfy the restriction in (9), we sample again until the vector satisfies the restriction.

Update $β_{ξ}^{(s + 1)} = β_{ξ}^{*}$ with probability $\min \left\{ 1, \frac{π (β_{ξ}^{*}, β_{η}^{(s)}, β_{u}^{(s)}, β_{μ}^{(s)}, β_{ν}^{(s)} | x)}{π (β_{ξ}^{(s)}, β_{η}^{(s)}, β_{u}^{(s)}, β_{μ}^{(s)}, β_{ν}^{(s)} | x)} \right\} .$
Sampling $β_{η}$

For each $β_{η_{i}}, i = 0, \dots, k_{η}$ , we sample $β_{η_{i}}^{*}$ from $N (β_{η_{i}}^{(s)}, V_{β_{η_{i}}})$ . If the sampled vector $β_{η}^{*}$ does not satisfy the restriction in (9), we sample again until the vector satisfies the restriction.

Update $β_{η}^{(s + 1)} = β_{η}^{*}$ with probability $\min \left\{ 1, \frac{π (β_{ξ}^{(s + 1)}, β_{η}^{*}, β_{u}^{(s)}, β_{μ}^{(s)}, β_{ν}^{(s)} | x)}{π (β_{ξ}^{(s + 1)}, β_{η}^{(s)}, β_{u}^{(s)}, β_{μ}^{(s)}, β_{ν}^{(s)} | x)} \right\} .$
Sampling $β_{u}$

For each $β_{u_{i}}, i = 0, \dots, k_{u}$ , we sample $β_{u_{i}}^{*}$ from $N (β_{u_{i}}^{(s)}, V_{β_{u_{i}}})$ . If the sampled vector $β_{u}^{*}$ does not satisfy the restriction in (9), we sample again until the vector satisfies the restriction.

Update $β_{u}^{(s + 1)} = β_{u}^{*}$ with probability $\min \left\{ 1, \frac{π (β_{ξ}^{(s + 1)}, β_{η}^{(s + 1)}, β_{u}^{*}, β_{μ}^{(s)}, β_{ν}^{(s)} | x)}{π (β_{ξ}^{(s + 1)}, β_{η}^{(s + 1)}, β_{u}^{(s)}, β_{μ}^{(s)}, β_{ν}^{(s)} | x)} \right\} .$
Sampling $β_{μ}$

For each $β_{μ_{i}}, i = 0, \dots, k_{μ}$ , we sample $β_{μ_{i}}^{*}$ from $N (β_{μ_{i}}^{(s)}, V_{β_{μ_{i}}})$ . The value $β_{μ}^{(s + 1)} = β_{μ}^{*}$ is accepted with probability $\min \left\{ 1, \frac{π (β_{ξ}^{(s + 1)}, β_{η}^{(s + 1)}, β_{u}^{(s + 1)}, β_{μ}^{*}, β_{ν}^{(s)} | x)}{π (β_{ξ}^{(s + 1)}, β_{η}^{(s + 1)}, β_{u}^{(s + 1)}, β_{μ}^{(s)}, β_{ν}^{(s)} | x)} \right\} .$
Sampling $β_{ν}$

For each $β_{ν_{i}}, i = 0, \dots, k_{ν}$ , we sample $β_{ν_{i}}^{*}$ from $N (β_{ν_{i}}^{(s)}, V_{β_{ν_{i}}})$ . The value $β_{ν}^{(s + 1)} = β_{ν}^{*}$ is accepted with probability $\min \left\{ 1, \frac{π (β_{ξ}^{(s + 1)}, β_{η}^{(s + 1)}, β_{u}^{(s + 1)}, β_{μ}^{(s + 1)}, β_{ν}^{*} | x)}{π (β_{ξ}^{(s + 1)}, β_{η}^{(s + 1)}, β_{u}^{(s + 1)}, β_{μ}^{(s + 1)}, β_{ν}^{(s)} | x)} \right\} .$
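
One sweep of the block updates described in this appendix can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: `log_post` stands for a user-supplied function returning the log of the posterior density $π (\cdot | x)$ up to a constant, `satisfies` stands for a check of the restriction in (9), and `prop_sd` holds the proposal standard deviations (the square roots of the variances $V$). All of these names are hypothetical. The acceptance step mirrors the ratio written above and works on the log scale for numerical stability.

```python
import math
import random


def update_block(name, state, log_post, prop_sd,
                 satisfies=lambda s: True, rng=random):
    """Random-walk Metropolis update of one coefficient block.

    state    -- dict mapping block names to lists of coefficients
    log_post -- callable(state) -> log posterior density, up to a constant
    prop_sd  -- proposal standard deviations for this block
    """
    current = state[name]
    while True:
        # Propose every coefficient of the block from N(current value, V).
        proposal = [b + rng.gauss(0.0, sd) for b, sd in zip(current, prop_sd)]
        candidate = dict(state)
        candidate[name] = proposal
        if satisfies(candidate):  # re-draw until the restriction holds
            break
    log_ratio = log_post(candidate) - log_post(state)
    # Accept with probability min{1, exp(log_ratio)}.
    if math.log(1.0 - rng.random()) < log_ratio:
        return candidate
    return state


def sweep(state, log_post, prop_sd, satisfies, rng=random):
    """One iteration s -> s + 1, updating the blocks in the appendix order."""
    for name in ("xi", "eta", "u"):   # blocks subject to the restriction
        state = update_block(name, state, log_post, prop_sd[name],
                             satisfies, rng)
    for name in ("mu", "nu"):         # bulk blocks, no restriction stated
        state = update_block(name, state, log_post, prop_sd[name], rng=rng)
    return state
```

Each block is updated conditionally on the most recent values of the others, which is why the acceptance ratios in the text carry the superscript $(s + 1)$ for blocks already updated in the current iteration.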

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Behrens C., Gamerman D., and Lopes H.F., Bayesian analysis of extreme events with threshold estimation, Stat. Modell. 4 (2004), pp. 227–244.
2. Cabras S., Nueda M.E.C., and Gamerman D., A default Bayesian approach for regression on extremes, Stat. Modell. 11 (2011), pp. 557–580.
3. Carreau J. and Bengio Y., A hybrid Pareto model for asymmetric fat-tailed data: the univariate case, Extremes 12 (2009), pp. 53–76.
4. Coles S., An introduction to statistical modeling of extreme values, in Extreme value theory and applications, J. Galambos, J. Lechner, E. Simiu, eds., Springer-Verlag, London, 2001.
5. Davison A.C. and Smith R.L., Models for exceedances over high thresholds (with discussion), J. R. Stat. Soc. Ser. B 52 (1990), pp. 393–442.
6. Diggle P.J. and Ribeiro P.J., Model-Based Geostatistics, Springer Series in Statistics, New York, 2007.
7. Embrechts P., Klüppelberg C., and Mikosch T., Modelling Extremal Events: For Insurance and Finance, Springer, New York, 2011.
8. Fisher R.A. and Tippett L.H.C., On the estimation of the frequency distributions of the largest and smallest member of a sample, Proc. Camb. Philos. Soc. 24 (1928), pp. 180–190.
9. Frigessi A., Haug O., and Rue H., A dynamic mixture model for unsupervised tail estimation without threshold selection, Extremes 5 (2003), pp. 219–235.
10. Gamerman D. and Lopes H.F., Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, Second Edition, Chapman and Hall CRC, Boca Raton, 2006.
11. Jenkinson A.F., The frequency distribution of the annual maximum (or minimum) values of meteorological events, Quart. J. R. Meteorolog. Soc. 81 (1955), pp. 158–172.
12. MacDonald A., Scarrott C.J., Lee D., Darlow B., Reale M., and Russell G., A flexible extreme value mixture model, Comp. Statist. Data Anal. 55 (2011a), pp. 2137–2157.
13. Nascimento F.F., Gamerman D., and Lopes H.F., Regression models for exceedance data via the full likelihood, Environ. Ecol. Stat. 18 (2011), pp. 495–512.
14. Nascimento F.F., Gamerman D., and Lopes H.F., A semiparametric Bayesian approach to extreme value estimation, Stat. Comput. 22 (2012), pp. 661–675.
15. Nascimento F.F., Gamerman D., and Lopes H.F., Time-varying extreme pattern with dynamic models, TEST 25 (2016), pp. 131–149.
16. Parmesan C., Root T.L., and Willig M.R., Impacts of extreme weather and climate on terrestrial biota, Bull. Am. Meteorol. Soc. 81 (2000), pp. 443–450.
17. Pickands J., Statistical inference using extreme order statistics, Ann. Stat. 3 (1975), pp. 131–199.
18. Sang H. and Gelfand A.E., Hierarchical modeling for extreme values observed over space and time, Environ. Ecol. Stat. 16 (2009), pp. 407–426.
19. Scarrott C. and MacDonald A., A review of extreme value threshold estimation and uncertainty quantification, REVSTAT Stat. J. 10 (2012), pp. 33–60.
20. Smith R.L., Threshold models for sample extremes, Stat. Extrem. Appl. 1 (1984), pp. 621–638.
21. Schwarz G., Estimating the dimension of a model, Ann. Stat. 6 (1978), pp. 461–464.
22. Spiegelhalter D., Best N.G., Carlin B.P., and van der Linde A., Bayesian measures of model complexity and fit, J. R. Stat. Soc. Ser. B 64 (2002), pp. 583–639.
23. Tancredi A., Anderson C., and O'Hagan A., Accounting for threshold uncertainty in extreme value estimation, Extremes 9 (2006), pp. 87–106.
24. von Mises R., La distribution de la plus grande de n valeurs, in Selected Papers, Vol. 2, American Mathematical Society, 1954, pp. 271–294.
25. Wiper M., Rios Insua D., and Ruggeri F., Mixtures of gamma distributions with applications, J. Comput. Graph. Stat. 9 (2001), pp. 440–454.
