Risk Analysis via Generalized Pareto Distributions

YI HE; LIANG PENG; DABAO ZHANG; ZIFENG ZHAO

doi:10.1080/07350015.2021.1874390

. Author manuscript; available in PMC: 2023 Jan 1.

Published in final edited form as: J Bus Econ Stat. 2021 Mar 3;40(2):852–867. doi: 10.1080/07350015.2021.1874390

Risk Analysis via Generalized Pareto Distributions

YI HE ¹, LIANG PENG ², DABAO ZHANG ³, ZIFENG ZHAO ⁴

PMCID: PMC9231421 NIHMSID: NIHMS1733545 PMID: 35756092

Abstract

We compute the value-at-risk of financial losses by fitting a generalized Pareto distribution to exceedances over a threshold. Following the common practice of setting the threshold as high sample quantiles, we show that, for both independent observations and time-series data, the asymptotic variance for the maximum likelihood estimation depends on the choice of threshold, unlike the existing study of using a divergent threshold. We also propose a random weighted bootstrap method for the interval estimation of VaR, with critical values computed by the empirical distribution of the absolute differences between the bootstrapped estimators and the maximum likelihood estimator. While our asymptotic results unify the inference with non-divergent and divergent thresholds, the finite sample studies via simulation and application to real data show that the derived confidence intervals well cover the true VaR in insurance and finance.

Keywords: ARMA-GARCH models, Generalized Pareto distribution, Random weighted bootstrap, Value-at-risk, Weighted empirical process

1. Introduction

Measuring risk and quantifying its uncertainty is crucial in insurance and finance. A well-studied and widely used risk measure is the so-called Value-at-Risk (VaR) at level 1 − p ∈ (0,1), which is defined as the quantile of the distribution function of a risk variable or a portfolio; see Duffie and Pan (1997) and Jorion (2006) for an overview of VaR. Given n identically distributed observations, the VaR at level 1 − p can be estimated nonparametrically by the sample quantile when n(1 − p) is not close to n or zero. The risk manager may quantify the estimation uncertainty via a direct estimation of the asymptotic variance or resampling methods such as the bootstrap and the empirical likelihood in Owen (2001).

The regulators often set the probability level 1 − p of VaR to be close to one such as 99% and 99.9%. Therefore, when the sample size is not particularly large, the nonparametric VaR estimator is inefficient and may seriously underestimate the risk. An obvious way to improve inference efficiency is to fit a parametric distribution family to the risk variable. It is known that an efficient likelihood based inference mainly uses the information around the center of data. As 1 − p is close to one, the information on the upper tail of the distribution becomes more crucial to the study of VaR. Therefore, one may build a parametric model for observations above a threshold to ensure that the upper tail’s fitting is accurate and robust. This raises an interesting question on how to model the excess distribution above a threshold, say, u given by

F_{u} (x) = P (X - u \leq x ∣ X > u) = \frac{F (x + u) - F (u)}{1 - F (u)} for 0 \leq x < x_{F} - u,

where x_F the right endpoint of the distribution function $F (x) = P (X \leq x)$ , i.e., x_F = sup{x : F(x) < 1}.

The extreme value theory states that, when F is in the domain of attraction of extreme value distribution, there exists a function β(u) > 0 such that

lim_{u \to x_{F}} sup_{0 \leq x < x_{F} - u} ∣ F_{u} (x) - G_{γ, β (u)} (x) ∣ = 0,

(1.1)

where G_γ,β(u)(x) = 1−(1 + γx / β(u))^−1/γ for 1 + γx / β(u) > 0 is the cumulative distribution function of the generalized Pareto distribution with the shape parameter γ and scale parameter β(u); see Balkema and de Haan (1974), and the overviews by Resnick (1987) and Embrechts et al. (1997). Fitting a generalized Pareto distribution to exceedances over a high threshold has been studied in the literature. For example, Smith (1987) and Drees et al. (2004) have studied the maximum likelihood estimation when a deterministic divergent threshold and a random divergent threshold are chosen, respectively; see also Davison and Smith (1990). The choice of the threshold depends on the order of the approximation errors in (1.1), which generally is defined as a second order regular variation. Typically, a large threshold gives a big variance, but a small threshold leads to large estimation bias. Given the difficulty in choosing this divergent threshold, researchers often advise practitioners to plot estimators against various thresholds and find a relatively stable region. In this case, the estimator has a non-negligible bias, which complicates the interval estimation.

Nevertheless, as a rule of thumb, practitioners often ignore the asymptotic bias and pick up 90% or 95% sample quantile as a threshold; see the discussions in Section 13.6.1 of Hull (2018). This is especially the case when modeling the so-called dynamic tail risk by some critical economic variables. Some applications of the generalized Pareto distribution include Rootzén and Tajvidi (1997) and Brodin and Rootzén (2009) for wind storm losses, Barro and Jin (2011) for economic disasters, and McNeil and Frey (2000), Chavez-Demoulin and Embrechts (2004), Bollerslev and Todorov (2011), and Allen et al. (2012) for financial time series. For dynamically modeling the generalized Pareto distribution, we refer to Chavez-Demoulin et al. (2005), Kelly and Jiang (2014), Chavez-Demoulin et al. (2014), Massacci (2017), and Zhao (2020) for financial returns and Hall and Tajvidi (2000) for climate data.

The common practice of using 90% or 95% sample quantile as the threshold ignores the estimation bias caused by the model approximation error. Hence, it becomes natural to model exceedances over an (unknown) fixed threshold by a generalized Pareto distribution. In other words, instead of fitting a parametric model to the entire data set, it is good to fit the exceedances over a threshold by the generalized Pareto distribution and model the data below the threshold nonparametrically as in Smith (1987) and Drees et al. (2004) for independent data, McNeil and Frey (2000) for an ARMA-GARCH model, and Martins-Filho et al. (2018) for nonparametric regression. Under such a model assumption, when the threshold is chosen as a sample quantile, the inference for the parameters and VaR depends on the random threshold selected, which is in stark contrast with the existing study of using a divergent threshold. A particular semi-parametric model we focus on is

F (x) = {\begin{matrix} \frac{θ G (x)}{G (x_{0})} & if x \leq x_{0} \\ 1 - (1 - θ) {(1 + \frac{γ (x - x_{0})}{σ})}^{- 1 ∕ γ} & if x > x_{0}, \end{matrix}

(1.2)

where θ ∈ (0,1), and G is a distribution function.

This paper aims to take such a model to provide a comprehensive inference on VaR for independent observations and time series data. The developed methodologies can be also applied to other tail-related risk measures such as expected shortfall and expectile. As insurance losses are arguably independent, we first derive the asymptotic distribution of the maximum likelihood estimator of the model parameters and the VaR based on independent data. We develop a unified inference theory for a universal threshold statistic, which can be a deterministic threshold based on prior knowledge, an order statistic based on the observations, or a more sophisticated quantile estimator. We show that the naive bootstrap method and the random weighted bootstrap method are both asymptotically correct for quantifying the estimation uncertainty.

For dependent data such as financial time series, we propose to infer on the conditional VaR by combining an ARMA-GARCH model and the semiparametric model for the residuals. To ensure the normality of VaR estimation for the ARMA-GARCH model with heavy-tailed errors, we propose a two-step self-weighted procedure to estimate the ARMA-GARCH model before fitting the residual distribution semiparametrically. We first estimate the ARMA parameters by a self-weighted least-squares method. Then estimate the GARCH parameters using the self-weighted exponential quasi-likelihood in Zhu and Ling (2011) with the least-squares ARMA residuals. Our approach maintains the natural condition that the ARMA errors have a zero mean, rather than a zero median in Zhu and Ling (2011), when relaxing the kurtosis condition on GARCH errors. To quantify the uncertainty of the conditional VaR estimation, we propose the random weighted bootstrap method, which is much less computationally intensive than the residual based bootstrap method.

The existing methods of using a divergent threshold face the severe difficulty of choosing the threshold. When one concerns interval estimation, the efficient way is to choose a larger threshold such that the estimation bias is negligible. This essentially assumes the exceedance follows an exact generalized Pareto distribution. In other words, when the exceedance has an approximate generalized Pareto distribution, our proposed point estimation and interval estimation are still valid as long as the divergent threshold is sufficiently large such that the model approximation error is at a smaller order than that of the estimation error.

We organize this paper as follows. Sections 2 and 3 present our methodologies and asymptotic results for independent observations and an ARMA-GARCH model, respectively. Sections 4 and 5 contain simulation study and data analysis. Finally, Section 6 concludes the paper and discusses future work. The detailed proofs of the theorems are available in the supplement. Throughout this paper, we denote by A^T the transpose of a matrix or vector A, and denote $\overset{d}{\to}$ as convergence in distribution and $\overset{P}{\to}$ as convergence in probability. All the asymptotic results hold as the sample size n → ∞.

2. Methodologies and Asymptotic Results for Independent Data

Consider a random variable $X \in R$ with distribution function F and quantile function Q(·) = F^←(·). For a threshold u₀ = F^←(1 − α₀) with exceeding probability α₀ ∈ (0,1), we make the following assumption for the exceedance X − u₀ ∣ X > u₀.

Assumption 1 (Generalized Pareto Model). There exist a shape parameter $γ_{0} \in R$ and a scale parameter σ_α₀ > 0 such that

P (X > u_{0} + x ∣ X > u_{0}) = {\begin{matrix} {(1 + \frac{γ_{0} x}{σ_{α_{0}}})}^{- 1 ∕ γ_{0}}, & γ_{0} \neq 0, \\ exp (- \frac{x}{σ_{α_{0}}}), & γ_{0} = 0 . \end{matrix}

where $α_{0} = P (X > u_{0})$ and we require 1 + γ₀x / σ_α₀ > 0 for γ₀ ≠ 0 and x > 0 for γ₀ = 0.

The shape parameter γ₀ is called the extreme value index for the exceedance X − u₀ ∣ X > u₀. When γ₀ < 0, there is a finite right endpoint $u^{*} = u_{0} - \frac{σ_{_{α_{0}}}}{γ_{0}}$ in the support of the distribution of X, i.e. F(x) = 1 for all x ≥ u^*. When γ = 0, X − u₀ ∣ X > u₀ has an exponential distribution with mean σ_α₀. When γ₀ > 0, X − u₀ ∣ X > u₀ has a heavy tail with up to $\frac{1}{γ_{0}}$ -th finite moments. Note that σ_α₀ is also a function of u₀ via $α_{0} = P (X > u_{0})$ .

Observe that, for any higher threshold u > u₀, the exceedance X − u ∣ X > u again follows the generalized Pareto distribution with the same shape parameter γ₀ but a different scale parameter $σ_{α} = {(\frac{α_{0}}{α})}^{γ_{0}} σ_{α_{0}}$ , where α = 1 − F(u) is the exceeding probability. Specifically,

P (X > u + x ∣ X > u) = {\begin{matrix} {(1 + \frac{γ_{0} x}{σ_{α}})}^{- 1 ∕ γ_{0}}, & γ_{0} \neq 0, \\ exp (- \frac{x}{σ_{α}}), & γ_{0} = 0 . \end{matrix}

A direct calculation yields the (1 − p)-th quantile of X, i.e., the VaR at level 1 − p takes the form

{VaR}_{X} (1 - p) = Q (1 - p) = {\begin{matrix} u + \frac{σ_{α}}{γ_{0}} ({(\frac{α}{p})}^{γ_{0}} - 1) & if γ_{0} \neq 0, \\ u + σ_{α} \log (\frac{α}{p}) & if γ_{0} = 0, \end{matrix}

(2.1)

for all given p ∈ (0, α₀).

As Assumption 1 above does not model the distribution below the threshold parametrically, computing VaR(1 − p) based on (2.1) is a semiparametric method and achieves a good balance between robustness and efficiency. It is easy to check that model (1.2) satisfies Assumption 1. Unlike the existing studies on fitting GPD to exceedances over a divergent threshold, we initially investigate the inference based on a non-divergent threshold. In this case, the threshold may play a role in quantifying the inference uncertainty of VaR in (2.1). On the other hand, if the threshold diverges fast enough such that the estimation bias is negligible, then the model approximation error is negligible. Hence, the developed method for fitting an exact generalized Pareto distribution is valid for using a larger divergent threshold under the setting that the exceedance has an approximate generalized Pareto distribution.

Suppose we have a random sample X₁, … , X_n from F satisfying Assumption 1. Let the order statistics be X_1:n ≤ … ≤ X_n:n. Take a large threshold, say, u_n, either deterministic or random, corresponding to the sample exceeding probability

{\hat{α}}_{n} = \frac{1}{n} \sum_{i = 1}^{n} δ (X_{i} - u_{n}),

(2.2)

where δ(x) := 1(x > 0) denotes the step function taking value 1 on the positive line and value 0 otherwise. Let α_n =1 − F(u_n) denote the adaptive exceeding probability, which may be either deterministic or random depending on our choice of the threshold u_n.

Given an exceedance X_i − u_n = x > 0, the log-likelihood function for the Pareto parameters $v ≔ (γ, \log σ)^{T} \in R^{2}$ is given by

l (v ∣ x) = - {\frac{1 + γ}{γ} \log (1 + \frac{γ x}{σ}) + \log σ} .

Note that the above function is well defined for γ = 0 by continuity as

l ((0, \log σ)^{T} ∣ x) = - \frac{x}{σ} - \log σ .

Thus, the full log-likelihood function with the observed X₁ − u_n, … , X_n − u_n is given by

\sum_{i = 1}^{n} δ (X_{i} - u_{n}) l (v ∣ X_{i} - u_{n}) .

Therefore, the maximum likelihood estimator of ν solves the score equations

\sum_{i = 1}^{n} δ (X_{i} - u_{n}) \frac{\partial l (v ∣ X_{i} - u_{n})}{\partial γ} = \sum_{i = 1}^{n} δ (X_{i} - u_{n}) s_{1} (v ∣ X_{i} - u_{n}) = 0,

(2.3)

\sum_{i = 1}^{n} δ (X_{i} - u_{n}) \frac{\partial l (v ∣ X_{i} - u_{n})}{\partial \log σ} = \sum_{i = 1}^{n} δ (X_{i} - u_{n}) s_{2} (v ∣ X_{i} - u_{n}) = 0,

(2.4)

where

s_{1} (γ, \log σ ∣ x) = \frac{1}{γ^{2}} (\log (1 + γ \frac{x}{σ}) - \frac{γ x ∕ σ}{1 + γ x ∕ σ}) - \frac{x ∕ σ}{1 + γ x ∕ σ}, s_{2} (γ, \log σ ∣ x) = - 1 + (1 + γ) \frac{x ∕ σ}{1 + γ x ∕ σ},

and for γ = 0 the above equations take the form

s_{1} (0, \log σ ∣ x) = \frac{1}{2} {(\frac{x}{σ})}^{2} - \frac{x}{σ}, s_{2} (0, \log σ ∣ x) = - 1 + \frac{x}{σ} .

In this paper, we only consider the regular case, i.e., $γ_{0} > - \frac{1}{2}$ as in Davison and Smith (1990) and Drees et al. (2004), and it is often the case of γ₀ > 0 regarding heavy-tailed losses in insurance and finance; see also Bücher and Segers (2017) for more discussions. For dealing with the irregular case, i.e., γ₀ ≤ −1/2, we refer to Smith (1985), Zhou (2009, 2010), and Peng and Qi (2009).

Davison and Smith (1990) disregard the randomness of threshold while Drees et al. (2004) obtain the asymptotic normality of MLE for a divergent random threshold (i.e., $\bar{α} = \bar{α} (n) \to 0$ as n → ∞) under (1.1), which holds for Assumption 1. Here, we present a universal asymptotic normality result under Assumption 1 in the sense of unifying the cases of using either a deterministic threshold or a random threshold:

Assumption 2 (Universal threshold statistic). The threshold u_n = u_n(X₁, … , X_n) is an arbitrary measurable statistic such that $u_{n} \overset{P}{\to} Q (1 - \bar{α})$ for some $\bar{α} \in (0, α_{0})$ .

Remark 1. Assumption 2 allows a flexible choice of the threshold u_n, regardless of it being deterministic or random. The practitioners may choose a deterministic threshold based on their prior knowledge, an order statistic based on the observations, or an even more sophisticated quantile estimator. Unifying these thresholds extends the scope of our inference theory, and it is necessary for our extension to time-series data in the next section where the threshold for the sample residuals may depend on the estimator of the ARMA-GARCH parameters.

Normalizing the estimators with the adaptive values $θ_{0}^{(n)} = (γ_{0}, \log σ_{α_{n}}, \log α_{n})$ rather than its limit $θ_{0} = (γ_{0}, \log σ_{\bar{α}}, \log \bar{α})$ , we have a unified inference procedure for a general threshold statistic u_n:

Theorem 1 (Universal inference for generalized Pareto parameters). Suppose that Assumption 1 holds with a true parameter $γ_{0} > - \frac{1}{2}$ and the choice of sequence u_n satisfies Assumption 2.

With probability tending to 1, there exists a maximum likelihood estimator $θ_{n} = ({\hat{γ}}_{n}, \log {\hat{σ}}_{n}, \log {\hat{α}}_{n})$ , solving the score equations (2.2)- (2.4) simultaneously, in the local parameter space
${\bar{Θ}}_{n}^{ε} = {θ \in R^{3} : ‖ θ - θ_{0}^{(n)} ‖ < n^{- 1 ∕ 2 + ε}},$ (2.5)

for any ε ∈ (0, min {γ₀ + 1 / 2, 1 / 2}), where $θ_{0}^{(n)} = (γ_{0}, \log σ_{α_{n}}, \log α_{n})$ denotes the adaptive true values.
Any maximum likelihood estimator sequence from part (i) is asymptotically normal in such a way that
$\sqrt{n \bar{α}} ({\hat{γ}}_{n} - γ_{0}, \frac{{\hat{σ}}_{n}}{σ_{α_{n}}} - 1, \frac{{\hat{α}}_{n}}{α_{n}} - 1) \overset{d}{\to} N (0, [\begin{matrix} I^{- 1} & 0 \\ 0 & 1 - \bar{α} \end{matrix}])$

where the inverse Fisher information matrix
$I^{- 1} = {(E (\frac{\partial l (v_{0} ∣ Z)}{\partial v} \frac{\partial l (v_{0} ∣ Z)}{\partial v^{T}} ∣ Z > 0))}^{- 1} = [\begin{matrix} (1 + γ_{0})^{2} & - (1 + γ_{0}) \\ - (1 + γ_{0}) & 2 (1 + γ_{0}) \end{matrix}]$

with $Z = X - Q (1 - \bar{α})$ .

In practice, it is common to fix a proportion of data, say, $\bar{α} \in (0, 1)$ and use the [ $n \bar{α}$ ]th largest observation $u_{n} = X_{n - [n \bar{α}] : n}$ as the threshold. It is then easy to deduce the following corollary.

Corollary 1. Under the conditions of Theorem 1 with $u_{n} = X_{n - [n \bar{α}] : n}$ , as n → ∞,

\sqrt{n \bar{α}} (\begin{matrix} {\hat{γ}}_{n} - γ_{0} \\ \frac{{\hat{σ}}_{n}}{σ_{\bar{α}}} - 1 \\ \frac{{\hat{α}}_{n}}{\bar{α}} - 1 \end{matrix}) \overset{d}{\to} N (0, [\begin{matrix} (1 + γ_{0})^{2} & - (1 + γ_{0}) & 0 \\ - (1 + γ_{0}) & 2 (1 + γ_{0}) + γ_{0}^{2} (1 - \bar{α}) & - γ_{0} (1 - \bar{α}) \\ 0 & - γ_{0} (1 - \bar{α}) & 1 - \bar{α} \end{matrix}]) .

Remark 2. Though Assumption 2 requires a fixed $\bar{α} \in (0, α_{0})$ , our asymptotic variance formula in Theorem 1 is indeed unified for both a finite and a divergent threshold. Specifically, in the supplement, we show that the asymptotic results in Theorem 1 remain valid if $\bar{α} = {\bar{α}}_{n}$ is an intermediate sequence such that $\bar{α} \to 0$ and $n \bar{α} \to \infty$ if we rewrite Assumption 2 with $u_{n} ∕ Q (1 - \bar{α}) \overset{P}{\to} 1$ , and the asymptotic variance formula in Theorem 1 remains valid by simply setting $\bar{α}$ to its limit 0. For a vanishing $\bar{α}$ (i.e. $\bar{α} \to 0$ ), we allow the true threshold α₀ to vanish as long as $\bar{α} ∕ α_{0}$ is bounded strictly below 1. Moreover, it can be seen from the proofs that our results remain true under a relaxed Assumption 1 as long as the approximation error between the exceedance distribution and a generalized Pareto distribution is at a smaller order than that of the estimation error. More specifically, suppose our observations ( $X_{1}^{(n)}, \dots, X_{n}^{(n)}$ ) come from a triangular array of i.i.d. random variables and denote their common distribution as F⁽ⁿ⁾. Our inference remains valid if the generalized Pareto model is approximately true, that is,

sup_{x \geq 0} ∣ \frac{1 - F_{u_{0}}^{(n)} (x)}{1 - G_{γ, σ_{α_{0}}} (x)} - 1 ∣ = o ((n α_{0})^{- 1 ∕ 2}),

(2.6)

where $F_{u_{0}}^{(n)} (x) = \frac{F^{(n)} (x + u_{0}) - F^{(n)} (u_{0})}{α_{0}}$ denotes the exceedance distribution function, the exceeding probability α₀ = 1 − F⁽ⁿ⁾(u₀) can be either fixed or vanishing, and G_{γ,σ_α₀} denotes the generalized Pareto distribution with the shape parameter γ and scale parameter σ_α₀. For a vanishing α₀, condition (2.6) indeed follows from the high-order extended regular variation conditions as shown in, e.g., Drees et al. (2004) which is a necessary assumption used in the extreme value literature for removing the estimation bias with a divergent threshold. In conclusion, our fixed- $\bar{α}$ approach is more robust than the existing extreme value approach, and covers more practical applications. We leave the studies under model misspecification (e.g., when the model approximation (2.6) does not hold) for future works.

Remark 3. As argued by Dombry (2015), there is no guarantee that the global maximum likelihood estimator is unique. Even if a global MLE is attainable, the classical regularity conditions in Cramér (1946) are not fulfilled, and it requires a detailed verification of the local asymptotic normality (LAN) conditions in Bücher and Segers (2017). Also, the global estimation theory in Bücher and Segers (2017) does not apply as our ‘true’ values $θ_{0}^{(n)}$ are a sequence of adaptive values depending on the (random) threshold statistic rather than a fixed point. Therefore, we consider a local maximum likelihood estimator and leave the global estimation theory for future research. Note that this challenge remains for a divergent threshold, as the asymptotic normality results in, e.g., Drees et al. (2004) are not guaranteed to hold for an arbitrary global estimator sequence; see, e.g., Zhou (2009) and Zhou (2010) for comments.

Plugging the estimator ( ${\hat{γ}}_{n}$ , ${\hat{σ}}_{n}$ , ${\hat{α}}_{n}$ ) from Theorem 1 in VaR formula (2.1), the MLE of VaR_X(1 − p) is given by

{VaR}_{X} (1 - p) = u_{n} + \frac{{\hat{σ}}_{n}}{{\hat{γ}}_{n}} ({(\frac{{\hat{α}}_{n}}{p})}^{{\hat{γ}}_{n}} - 1),

(2.7)

which takes the form ${VaR}_{X} (1 - p) = u_{n} + {\hat{σ}}_{n} \log (\frac{{\hat{α}}_{n}}{p})$ if ${\hat{γ}}_{n} = 0$ . The asymptotic normality of the quantile estimator (2.7) then follows directly from the continuous mapping theorem, since we can expand the true quantile in (2.1) similarly by

{VaR}_{X} (1 - p) = u_{n} + \frac{σ_{α_{n}}}{γ_{0}} ({(\frac{α_{n}}{p})}^{γ_{0}} - 1),

(2.8)

even with a random adaptive exceeding probability α_n, conditional on the event α_n > α₀, which occurs with probability tending to 1. Again, our quantile inference is asymptotically correct for a universal threshold statistic.

Theorem 2 (Universal inference for high quantile). Under the conditions of Theorem 1, for every p ∈ (0, α₀),

\frac{\sqrt{n \bar{α}}}{σ_{p}} ({VaR}_{X} (1 - p) - {VaR}_{X} (1 - p)) \overset{d}{\to} N (0, q {(\frac{\bar{α}}{p})}^{T} I^{- 1} q (\frac{\bar{α}}{p}) + 1 - \bar{α}),

where for γ₀ ≠ 0 the vector function

q (t) = {(\int_{1}^{t} {(\frac{S}{t})}^{γ_{0}} \frac{\log s}{s} d s, \frac{1 - t^{- γ_{0}}}{γ_{0}})}^{T}, t > 0,

and it should be interpreted by continuity as ${(\frac{1}{2} (\log t)^{2}, \log t)}^{T}$ when γ₀ = 0.

Remark 4. Ignoring all common factors, one may search for the best threshold as u_n = X_n−[nλp]:n with λ in a neighborhood of 1 minimizing the asymptotic variance

\frac{1}{λ} (\hat{q} (λ)^{T} I^{- 1} \hat{q} (λ) + 1),

where $I^{^{- 1}}$ and $\hat{q}$ may be constructed using some preliminary estimate $\hat{γ}$ of the extreme value index γ₀ as given below. If necessary, one may update $\hat{γ}$ with the new choice of λ until convergence. On the other hand, it is important to develop a distribution-free goodness-of-fit test for fitting a generalized Pareto distribution to exceedances over a threshold. It is challenging to extend the existing parametric testing methods in, e.g., Koul and Ling (2006) to our semi-parametric models, which will be our future research.

It is straightforward to quantify the uncertainty of VaR_X(1 − p) based on the normal approximation. More specifically, we estimate $\bar{α}$ by ${\hat{α}}_{n}$ (if $\bar{α}$ is unknown), the scale parameter σ_p by

{\hat{σ}}_{p} = {\hat{σ}}_{n} ({\hat{α}}_{n} ∕ p)^{{\hat{γ}}_{n}},

and the limiting variance by

{\hat{τ}}_{n}^{2} ≔ q {(\frac{{\hat{α}}_{n}}{p})}^{T} I^{^{- 1}} q (\frac{{\hat{α}}_{n}}{p}) + 1 - {\hat{α}}_{n}

with

I^{^{- 1}} = [\begin{matrix} (1 + {\hat{γ}}_{n})^{2} & - (1 + {\hat{γ}}_{n}) \\ - (1 + {\hat{γ}}_{n}) & 2 (1 + {\hat{γ}}_{n}) \end{matrix}] .

Hence, a normal approximation confidence interval of VaR_X(1 − p) with level a is

I_{N A} (a) = [{VaR}_{X} (1 - p) - \frac{z_{(1 + a) ∕ 2}}{\sqrt{n {\hat{α}}_{n}}} {\hat{σ}}_{n} {(\frac{{\hat{α}}_{n}}{p})}^{{\hat{γ}}_{n}} {\hat{τ}}_{n}, {VaR}_{X} (1 - p) + \frac{z_{(1 + a) ∕ 2}}{\sqrt{n {\hat{α}}_{n}}} {\hat{σ}}_{n} {(\frac{{\hat{α}}_{n}}{p})}^{{\hat{γ}}_{n}} {\hat{τ}}_{n}],

where z_(1+a)/2 is the $\frac{1 + a}{2}$ -quantile of the standard normal distribution. Unfortunately, our simulation study below shows that this interval has a poor coverage probability in small samples, which calls for more efficient methods.

To improve finite-sample coverage, we propose a resampling method called the random weighted bootstrap procedure. The random weighted bootstrap method is less computationally intensive than the naive bootstrap method when we estimate risk based on a time series model (in the next section).

Zhu (2016, 2019) recently applies this method to conduct a Portmanteau test and infer autoregressive models.

Step B1) Draw a random sample with sample size n from a distribution function with mean one and variance one such as the standard exponential distribution, say $ξ_{1}^{(b)}$ , ⋯ , $ξ_{n}^{(b)}$ .
Step B2) Choose a threshold statistic $u_{n}^{(b)}$ , possibly dependent on $ξ_{1}^{(b)}$ , … , $ξ_{n}^{(b)}$ . Solve the following random weighted score equations for $\bar{α}$ , γ, and log σ:
$\sum_{i = 1}^{n} ξ_{i} (δ (X_{i} - u_{n}^{(b)}) - \bar{α}) = 0$ (2.9)

$\sum_{i = 1}^{n} ξ_{i} δ (X_{i} - u_{n}^{(b)}) s_{1} (v ∣ X_{i} - u_{n}^{(b)}) = 0,$ (2.10)

$\sum_{i = 1}^{n} ξ_{i} δ (X_{i} - u_{n}^{(b)}) s_{2} (v ∣ X_{i} - u_{n}^{(b)}) = 0 .$ (2.11)

Denote these estimators by ${\hat{α}}_{n}^{(b)}$ , ${\hat{γ}}_{n}^{(b)}$ , and ${\hat{σ}}_{n}^{(b)}$ , we have
${VaR}_{X}^{(b)} (1 - p) = u_{n}^{(b)} + \frac{{\hat{σ}}_{n}^{(b)}}{{\hat{γ}}_{n}^{(b)}} (({\hat{α}}_{n}^{(b)} ∕ p)^{{\hat{γ}}_{n}^{(b)}} - 1) .$
Step B3) Repeat the above two steps B times to obtain ${{VaR}_{X}^{(b)} (1 - p)}_{b = 1}^{B}$ . Let ${\bar{D}}_{1 : B} \leq \dots \leq {\bar{D}}_{B : B}$ denote the order statistics of
$\log (\frac{{VaR}_{X}^{(b)} (1 - p)}{{VaR}_{X} (1 - p)}), b = 1, \dots, B,$

and let ${\bar{D}}_{(1)} \leq \dots \leq {\bar{D}}_{(B)}$ denote the order statistics of
$∣ \log (\frac{{VaR}_{X}^{(b)} (1 - p)}{{VaR}_{X} (1 - p)}) ∣, b = 1, \dots, B,$

Hence, the confidence intervals with level a for log(VaR_X(1 − p)) are
$I_{R W B, 1} (a) = [\log ({VaR}_{X} (1 - p)) - {\bar{D}}_{[\frac{B + B a}{2}] : B}, \log ({VaR}_{X} (1 - p)) - {\bar{D}}_{[\frac{B - B a}{2}] : B}]$

and
$I_{R W B, 2} (a) = [\log ({VaR}_{X} (1 - p)) - {\bar{D}}_{(B a)}, \log ({VaR}_{X} (1 - p)) + {\bar{D}}_{(B a)}] .$

The following theorem establishes the validity of our random weighted bootstrap method.

Theorem 3 (Random weighted bootstrap). Suppose the conditions of Theorem 1 hold. For an arbitrary bootstrap threshold statistic $u_{n}^{(b)} = u_{n} + o_{P} (1)$ and let $α_{n}^{(b)} = 1 - F (u_{n}^{(b)})$ :

With probability tending to 1, there exists a random weighted maximum likelihood estimator $θ_{n}^{(b)}$ = ( ${\hat{γ}}_{n}^{(b)}$ , $\log {\hat{σ}}_{n}^{(b)}$ , $\log {\hat{α}}_{n}^{(b)}$ ), solving the score equations (2.9)- (2.11) simultaneously in the local parameter space
$Θ_{ε}^{(b)} = {θ \in R^{3} : ‖ θ - θ_{0}^{(b)} ‖ < n^{- 1 ∕ 2 + ε}},$ (2.12)

for any ε ∈ (0, min{γ₀ + 1 / 2, 1 / 2}), where $θ_{0}^{(b)} = (γ_{0}, \log σ_{α_{n}^{(b)}}, \log α_{n}^{(b)})$ denotes the adaptive true values.
For each probability level a ∈ (0,1),
$P ({VaR}_{X} (1 - p) - {VaR}_{X} (1 - p) \leq c_{n} (a)) \to 1 - a,$

where
$c_{n} (a) = inf {x : P ({VaR}^{(b)} (1 - p) - {VaR}_{X} (1 - p) \leq x ∣ X_{1}, \dots, X_{n}) > 1 - a} .$

The result remains true if VaR_X(1 − p), VaR_X(1 − p), and VaR^(b)(1 − p) are substituted by their logarithms, respectively, for VaR_X(1 − p) > 0.

Remark 5. The random weighted bootstrap intervals for the extreme value index γ₀ is also valid. For each probability level a ∈ (0,1), $P ({\hat{γ}}_{n} - γ_{0} \leq c_{n, γ} (a)) \to 1 - a$ , where

c_{n, γ} (a) = inf {x : P ({\hat{γ}}_{n}^{(b)} - {\hat{γ}}_{n} \leq x ∣ X_{1}, \dots, X_{n}) > 1 - a} .

The result remains true if we substitute γ₀, ${\hat{γ}}_{n}$ and ${\hat{γ}}_{n}^{(b)}$ by their logarithms respectively, when γ₀ > 0. The random weighted bootstrap intervals for the adaptive scale parameter σ_{α_n} and the adaptive exceeding probability α_n, are asymptotically correct if the difference between the bootstrap threshold and the original threshold is asymptotically negligible in the sense that $u_{n}^{(b)} = u_{n} + o_{P} ((n \bar{α})^{- 1 ∕ 2})$ .

Remark 6 (Naive bootstrap). In the supplement, we show that Theorem 3 and Remark 5 remain true if replacing the random weighted bootstrap statistics with the naive bootstrap statistics. In simulations, we observe comparable performance between these two methods for independent data.

3. Methodologies and Asymptotic Results for ARMA-GARCH Models

Since the seminal work of Engle (1982) and Bollerslev (1986), it has become a common practice to model a financial time series by an ARMA-GARCH model given by

{\begin{matrix} Y_{_{t}} = μ + \sum_{i = 1}^{q_{1}} ϕ_{_{i}} Y_{_{t - i}} + \sum_{j = 1}^{q_{2}} ψ_{_{j}} ε_{_{t - j}} + ε_{_{t}} \\ ε_{_{t}} = \sqrt{{\underline{h}}_{_{t}}} η_{_{t}}, {\underline{h}}_{_{t}} = \underline{ω} + \sum_{i = 1}^{r} {\underline{a}}_{_{i}} ε_{_{t - i}}^{^{2}} + \sum_{j = 1}^{s} {\underline{b}}_{_{j}} {\underline{h}}_{_{t - j}}, \end{matrix}

(3.1)

where $\underline{ω} > 0$ , ${\underline{a}}_{i} \geq 0$ , ${\underline{b}}_{j} \geq 0$ , and {η_t} is a sequence of i.i.d. random variables with zero mean and variance one. In this case, the so-called one-step ahead conditional VaR is more useful in forecasting risk and is defined as the conditional quantile of Y_n+1 given the past information up to time n, i.e., $F_{n} = σ (\dots, Y_{n - 1}, Y_{n})$ . Hence, the one-step ahead conditional VaR is

{VaR}_{Y, n} (1 - p) = μ + \sum_{i = 1}^{q_{1}} ϕ_{i} Y_{n + 1 - i} + \sum_{j = 1}^{q_{2}} ψ_{j} ε_{n + 1 - j} + \sqrt{{\underline{h}}_{n + 1}} {VaR}_{η} (1 - p),

(3.2)

and note that ${\underline{h}}_{n + 1}$ is $F_{n}$ − measurable.

We remark that McNeil and Frey (2000) consider the model above, and Martins-Filho et al. (2018) study the nonparametric regression, which covers AR-ARCH models but not ARMA-GARCH models. Both papers only consider the case of vanishing risk level, i.e., p = p(n) → 0 as n → ∞. In this case, the estimation for the ARMA-GARCH model in McNeil and Frey (2000) and the kernel smoothing estimation for the conditional mean and conditional standard deviation in Martins-Filho et al. (2018) do not play a role in the asymptotic variance of the VaR estimation. Unlike these two papers, we aim to allow both fixed and vanishing risk levels and consider the uncertainties in fitting both the ARMA-GARCH model with fewer finite moments and the GPD to residuals.

As mentioned above, regulators often set p close to one, making it useful to model η_t over a high threshold by a GPD parametrically. To infer the conditional VaR, we need to estimate the unknown parameters in (3.1) and (2.1). An obvious inference method for model (3.1) is the so-called quasi maximum likelihood estimation. The asymptotic normality of the quasi-Gaussian maximum likelihood estimator is available in Francq and Zakoïan (2004), which requires finite fourth moments of both ε_t and η_t. However, in practice, it is often that $\sum_{i = 1}^{r} a_{i} + \sum_{j = 1}^{s} b_{j}$ is close to one, making it problematic to assume $E ε_{t}^{4} < \infty$ . When $E η_{t}^{4} < \infty$ , Ling (2007) proposes a self-weighted quasi-maximum likelihood estimator that has a normal limiting distribution and allows for $E ε_{t}^{4} = \infty$ . However, the asymptotic normality of these estimators may be lost when $E η_{t}^{4} = \infty$ , see, e.g., Hall and Yao (2003). To further allow both $E η_{t}^{4} = \infty$ and $E ε_{t}^{4} = \infty$ , Zhu and Ling (2011) propose a self-weighted exponential likelihood estimator, which has a normal limiting distribution but requires E∣η_t∣ = 1 and a zero median of η_t. Changing $E η_{t}^{2} = 1$ to $E ∣ η_{t} ∣ = 1$ requires a scale transformation of h_t, which does not affect the inference of the conditional VaR; however, changing a zero mean of η_t to a zero median involves a shift transformation, which makes the inference of the conditional VaR infeasible.

Here, we instead propose a three-step inference of the conditional VaR (3.2) under model (3.1), which allows both $E ε_{t}^{4} = \infty$ and $E η_{t}^{4} = \infty$ . This is important in estimating VaR_Y,n(1 − p) when p is treated as a fixed number rather than a number converging to zero as n → ∞.

We assume that $E ∣ η_{t} ∣ = d > 0$ unknown and put X_t = η_t / d, $h_{t} = d^{2} {\underline{h}}_{t}$ , $ω = \underline{ω} d^{2}$ , $a_{i} = {\underline{a}}_{i} d^{2}$ , and $b_{j} = {\underline{b}}_{j}$ . Then model (3.1) is equivalent to

{\begin{matrix} Y_{_{t}} = μ + \sum_{i = 1}^{q_{1}} ϕ_{_{i}} Y_{_{t - i}} + \sum_{j = 1}^{q_{2}} ψ_{_{j}} ε_{_{t - j}} + ε_{_{t}} \\ ε_{_{t}} = \sqrt{h_{_{t}}} X_{_{t}}, h_{_{t}} = ω + \sum_{i = 1}^{r} a_{_{i}} ε_{_{t - i}}^{^{2}} + \sum_{j = 1}^{s} b_{_{j}} h_{_{t - j}}, \end{matrix}

(3.3)

where $E ∣ X_{t} ∣ = E ∣ η_{t} ∣ ∕ d = 1$ . The ARMA coefficients remain the same for Y_t, and we can rewrite (3.2) as

{VaR}_{Y, n} (1 - p) = μ + \sum_{i = 1}^{q_{1}} ϕ_{i} Y_{n + 1 - i} + \sum_{j = 1}^{q_{2}} ψ_{j} ε_{n + 1 - j} + \sqrt{h_{n + 1}} {VaR}_{X} (1 - p) .

Model (3.3) has been studied in Zhu and Ling (2011), but here we maintain the zero mean condition on X_t as required by the original model (3.1).

Let $ψ = (ϕ^{T}, ϕ_{h}^{T})^{T}$ denote the parameters in (3.3) with ϕ = (μ, ϕ₁, … , ϕ_q₁, ψ₁, … , ψ_q₂)^T and ϕ_h =(ω, a₁, … , a_r, b₁, … , b_s)^T. Before moving on to the quantile inference, we first develop a two-step estimator of ψ that is asymptotic normal without requiring any fourth moment condition.

Given the observations Y₁, … , Y_n and the initial value ${\bar{Y}}_{0} = {Y_{t} : t \leq 0}$ generated by model (3.1), we can write the parametric model (3.3) as

ε_{_{t}} (ϕ) = Y_{_{t}} - μ - \sum_{i = 1}^{q_{1}} ϕ_{_{i}} Y_{_{t - i}} - \sum_{j = 1}^{q_{2}} ψ_{_{j}} ε_{_{t - j}} (ϕ), h_{_{t}} (ψ) = ω + \sum_{i = 1}^{r} a_{i} ε_{_{t - i}}^{^{2}} (ϕ) + \sum_{j = 1}^{s} b_{_{j}} h_{_{t - j}} (ψ), X_{t} (ψ) = ε_{_{t}} (ϕ) ∕ \sqrt{h_{t} (ψ)} .

Obviously, ε_t = ε_t(ϕ₀), h_t = h_t(ψ₀), and X_t = X_t (ψ₀), where ϕ₀ and $ψ_{0} = (ϕ_{0}^{T}, ϕ_{h 0}^{T})^{T}$ denote the true values of the parameters. In practice, however, we do not have the initial values ${\bar{Y}}_{0} = {Y_{t} : t \leq 0}$ , which makes the calculation of ε_t(ϕ), h_t(ψ) and X_t(ψ) infeasible. To make the estimation feasible, in what follows, we replace ${\bar{Y}}_{0} = {Y_{t} : t \leq 0}$ by zeros like Ling (2007) and Zhu and Ling (2011), and define the feasible parametric model ${\tilde{ε}}_{t} (ϕ)$ , ${\tilde{h}}_{t} (ψ)$ , ${\tilde{X}}_{t} (ψ)$ based on the new initial values.

First, we estimate ϕ by the self-weighted least squares estimator given by

ϕ = arg min_{ϕ} \sum_{t = 1}^{n} {\tilde{w}}_{_{t}}^{^{2}} {\tilde{ε}}_{_{t}}^{^{2}} (ϕ),

(3.4)

where { ${\tilde{w}}_{t}$ } are some proper weights designed to reduce the moment effect of {h_t}, and ${\tilde{ε}}_{t} (ϕ)$ is the feasible parametric model as defined above. We introduce the weights ${\tilde{w}}_{t}$ that downweight $\sum_{i = 1}^{\infty} ρ^{i} ∣ Y_{t - i} ∣$ for some ρ ∈ (0,1) to control the asymptotic order of the gradient of the least-squares error functions. The constant ρ depends on the true ARMA-GARCH model. Same as in He et al. (forthcoming), we use the feasible weight

{\tilde{w}}_{_{t}} = {\max (1, \sum_{i = 1}^{t - 1} e^{- \log^{2} (i + 1)} ∣ Y_{_{t - i}} ∣)}^{- 1},

which is a truncated version of the oracle weight

w_{_{t}} = {\max (1, \sum_{i = 1}^{\infty} e^{- \log^{2} (i + 1)} ∣ Y_{_{t - i}} ∣)}^{- 1} .

Second, we define the self-weighted estimator ϕ_h of ϕ_h, which minimizes the self-weighted negative log quasi-exponential-likelihood

\sum_{t = 1}^{n} {\tilde{w}}_{_{t}}^{4} {\tilde{l}}_{_{t}} (ϕ_{_{h}} ∣ ϕ) with {\tilde{l}}_{_{t}} (ϕ_{_{h}} ∣ ϕ) = \log \sqrt{{\tilde{h}}_{_{t}} (ϕ, ϕ_{_{h}})} + \frac{∣ {\tilde{ε}}_{_{t}} (ϕ) ∣}{\sqrt{{\tilde{h}}_{_{t}} (ϕ, ϕ_{_{h}})}},

(3.5)

where ϕ and ${\tilde{w}}_{t}$ are the least-squares estimator and self-weights from the first step, respectively, and ${\tilde{h}}_{t} (ϕ, ϕ_{h})$ is the feasible parametric model.

To establish the joint asymptotic normality of $ψ = {(ϕ^{^{T}}, ϕ_{h}^{^{T}})}^{T}$ , we need the following additional regularity conditions.

A1. Let $Θ^{ψ} = Θ^{ϕ} \times Θ^{ϕ_{h}} \subset R^{q_{1} + q_{2} + 1} \times [0, \infty)^{r + s + 1}$ denote the parameter space for $ψ = {(ϕ^{^{T}}, ϕ_{h}^{^{T}})}^{T}$ . Assume that Θ^ψ is compact and the true value of ψ is an interior point.

A2. For each ϕ ∈ Θ^ϕ, $1 - \sum_{i = 1}^{q_{1}} ϕ_{i} z^{i} \neq 0$ and $1 + \sum_{j = 1}^{q_{2}} ψ_{j} z^{j} \neq 0$ when ∣z∣≤1, and $1 - \sum_{i = 1}^{q_{1}} ϕ_{i} z^{i} = 0$ and $1 + \sum_{j = 1}^{q_{2}} ψ_{j} z^{j} = 0$ have no common root with ϕ_q₁ ≠ 0 and ψ_q₂ ≠ 0.

A3. For each ϕ_h ∈ Θ^ϕ_h, there is no common root for equations $\sum_{i = 1}^{r} a_{i} z^{i} = 0$ and $\sum_{j = 1}^{s} b_{j} z^{j} = 0$ . Further, $\sum_{i = 1}^{r} a_{i} \neq 0, a_{r} + b_{s} \neq 0$ , and $\sum_{j = 1}^{s} b_{i} < 1$ .

A4. $E ε_{t}^{2} < \infty$ .

A5. $X_{t} = η_{t} ∕ E ∣ η_{t} ∣$ and ${η_{t}}_{t = 1}^{n}$ is a sequence of independent and identically distributed random variables with mean zero, variance one, and continuous density function f such that f(0) > 0 and $sup_{x \in R} f (x) < \infty$ .

Conditions A1–A3 are standard stationarity, invertibility, and identification conditions for ARMA-GARCH model (3.3) as in, e.g., Ling (2007). Condition A4 is equivalent to requiring that there is no unit root in the underlying GARCH process (3.1), that is,

\sum_{i = 1}^{r} {\underline{a}}_{i} + \sum_{j = 1}^{s} {\underline{b}}_{j} < 1 .

By carefully checking our proofs, it can be seen that we may further relax the condition down to the first moment, that is, $E ∣ ε_{t} ∣ < \infty$ . This means that our results extend to IGARCH model with $\sum_{i = 1}^{r} {\underline{a}}_{i} + \sum_{j = 1}^{s} {\underline{b}}_{j} = 1$ under suitable conditions; see, e.g., part (iii) of Theorem 2.1 in Ling (2007). The second moment condition simplifies our subsequent inference for the generalized Pareto model, and therefore we keep it throughout for simplicity. Condition A5 is similar to Assumption 2.6 in Zhu and Ling (2011), but we maintain the necessary condition that X_t has a zero mean rather than a zero median.

Theorem 4. Assume conditions A1–A5 hold.

The self-weighted estimator is consistent, that is, $ψ ≔ {(ϕ^{^{T}}, ϕ_{h}^{^{T}})}^{T} \overset{P}{\to} ψ_{0}$ .

The self-weighted estimator is asymptotic normal in such a way that

\sqrt{n} (ψ - ψ_{0}) \overset{d}{\to} N (0, Σ^{- 1} Ω (Σ^{- 1})^{T}),

with

Σ = [\begin{matrix} Σ_{1} & 0 \\ Σ_{21} & Σ_{22} \end{matrix}], Ω = [\begin{matrix} Ω_{11} & Ω_{21}^{T} \\ Ω_{21} & Ω_{22} \end{matrix}],

where the sub-matrices

Σ_{1} = E (w_{_{t}}^{2} \frac{\partial ε_{_{t}}}{\partial ϕ} \frac{\partial ε_{_{t}}}{\partial ϕ^{T}}), Σ_{21} = \frac{1}{8} E (\frac{w_{_{t}}^{4}}{h_{_{t}}^{2}} \frac{\partial h_{_{t}}}{\partial ϕ_{_{h}}} \frac{\partial h_{_{t}}}{\partial ϕ^{T}}), Σ_{22} = \frac{1}{8} E (\frac{w_{_{t}}^{4}}{h_{_{t}}^{2}} \frac{\partial h_{_{t}}}{\partial ϕ_{_{h}}} \frac{\partial h_{_{t}}}{\partial ϕ^{T}}), Ω_{11} = E X_{_{t}}^{2} \cdot E (w_{_{t}}^{2} h_{_{t}} \frac{\partial ε_{_{t}}}{\partial ϕ} \frac{\partial ε_{_{t}}}{\partial ϕ^{T}}), Ω_{22} = \frac{E X_{_{t}}^{2} - 1}{4} \cdot E [\frac{w_{_{t}}^{8}}{h_{_{t}}^{2}} \frac{\partial h_{_{t}}}{\partial ϕ_{_{h}}} \frac{\partial h_{_{t}}}{\partial ϕ_{_{h}}^{T}}], and Ω_{21} = E [X_{_{t}}^{2} (1 (X_{_{t}} > 0) - 1 (X_{_{t}} < 0))] \cdot E [\frac{w_{_{t}}^{6}}{2 \sqrt{h_{_{t}}}} \frac{\partial h_{_{t}}}{\partial ϕ_{_{h}}} \frac{\partial ε_{_{t}}}{\partial ϕ^{T}}] .

Next, we estimate the high quantile of X_t under the generalized Pareto model with the additional assumption:

A6. $X_{_{t}} \equiv η_{_{t}} ∕ E ∣ η_{_{t}} ∣$ satisfies Assumption 1 with $γ_{0} \in (0, \frac{1}{2})$ and scale parameter σ_α₀ > 0.

Note that γ₀ < 1/2 above ensures that $E η_{t}^{2} < \infty$ . Let ${\hat{X}}_{_{1 : n}} \leq \dots \leq {\hat{X}}_{_{n : n}}$ denote the order statistics of residuals { ${\hat{X}}_{_{t}} ≔ {\tilde{X}}_{_{t}} (ψ) : t = 1, \dots, n$ }, where ${\tilde{X}}_{_{t}} (\cdot)$ is the feasible parametric model as defined above. We then choose a threshold statistic such as

u_{n} = {\hat{X}}_{n - [n \bar{α}] : n},

(3.6)

corresponding to an adaptive tail probability level α_n = 1 − F(u_n), where F denotes the distribution function of X_t. Under the conditions of Theorem 4, we show that the threshold estimator (3.6) is consistent, that is,

u_{n} \overset{P}{\to} Q (1 - \bar{α}), and equivalently α_{n} \overset{P}{\to} \bar{α},

(3.7)

where Q(·) = F^←(·) denotes the quantile function of X_t. In general, our theory allows an arbitrary threshold statistic u_n that satisfies our Assumption 2 above. With a general threshold statistic, we estimate the adaptive exceeding probability α_n, the shape parameter γ₀, and the scale parameter σ_{α_n} altogether by solving equations (2.2)-(2.4) with X_i therein replaced by the residual ${\hat{X}}_{_{t}}$ . Denote the estimators by $\hat{α}$ , $\hat{γ}$ , and $\hat{σ}$ respectively, which give the quantile estimator

{VaR}_{X} (1 - p) = u_{n} + \frac{\hat{σ}}{\hat{γ}} ({(\frac{\hat{α}}{p})}^{\hat{γ}} - 1) .

Thus the estimator of VaR_Y,n(1 − p) is given by

{VaR}_{Y, n} (1 - p) = \hat{μ} + \sum_{i = 1}^{q_{1}} {\hat{ϕ}}_{_{i}} Y_{_{n + 1 - i}} + \sum_{j = 1}^{q_{2}} {\hat{ψ}}_{j} {\tilde{ε}}_{_{n + 1 - j}} (ϕ) + \sqrt{{\tilde{h}}_{_{t}} (ϕ, ϕ_{_{h}})} {VaR}_{X} (1 - p) .

(3.8)

Note that u_n = u_n(ψ), $\hat{γ} = \hat{γ} (ψ)$ and $\hat{σ} = \hat{σ} (ψ)$ all depend on the self-weighted estimator ψ , whose effects do not fade away for any finite p ∈ (0,1).

Theorem 5. Assume conditions A1-A6 hold.

With probability tending to 1, there exists a maximum likelihood estimator $θ = (\hat{γ}, \log \hat{σ}, \log \hat{α})$ solving the score equations (2.2)- (2.4) simultaneously for { ${\hat{X}}_{_{t}}$ } in the local parameter space
${\bar{Θ}}_{n}^{ε} = {θ \in R^{3} : ‖ θ - θ_{0}^{(n)} ‖ < n^{- 1 ∕ 2 + ε}},$

for any ε ∈ (0, min{γ₀ +1 / 2, 1 / 2}), where $θ_{0}^{(n)} = (γ_{0}, \log σ_{α_{n}}, \log α_{n})$ and α_n = 1 − F(u_n) denote the adaptive true values.

Any maximum likelihood estimator sequence from part (i) is jointly asymptotic normal, in such a way that

\sqrt{n \bar{α}} [\begin{matrix} ψ - ψ_{0} \\ θ - θ_{0}^{(n)} \end{matrix}] \overset{d}{\to} N (0, ({\tilde{Σ}}^{- 1}) \tilde{Ω} ({\tilde{Σ}}^{- 1})^{T})

where

\tilde{Σ} = [\begin{matrix} Σ & 0 & 0 \\ - Γ_{1}^{T} Σ & I & 0 \\ - \frac{1}{1 - \bar{α}} Γ_{2}^{T} Σ & 0 & \frac{1}{1 - \bar{α}} \end{matrix}], \tilde{Ω} = [\begin{matrix} \bar{α} Ω & \bar{α} σ_{\bar{α}} Γ_{3} & \frac{\bar{α} σ_{\bar{α}}}{1 - \bar{α}} Γ_{4} \\ \bar{α} σ_{\bar{α}} Γ_{3}^{T} & I & 0 \\ \frac{\bar{α} σ_{\bar{α}}}{1 - \bar{α}} Γ_{4}^{T} & 0 & \frac{1}{1 - \bar{α}} \end{matrix}]

with

Γ_{1} = E [\frac{1}{2 h_{t}} \frac{\partial h_{t}}{\partial ψ}] {[\begin{matrix} \frac{1}{(1 + γ_{0}) (1 + 2 γ_{0})} \\ \frac{1}{1 + 2 γ_{0}} \end{matrix}]}^{T} + Γ_{2} {[\begin{matrix} - \frac{γ_{0}}{(1 + γ_{0}) (1 + 2 γ_{0})} \\ \frac{1 + γ_{0}}{1 + 2 γ_{0}} \end{matrix}]}^{T}, Γ_{2} = \frac{1}{σ_{\bar{α}}} {Q (1 - \bar{α}) E [\frac{1}{2 h_{t}} \frac{\partial h_{t}}{\partial ψ}] - E [\frac{1}{\sqrt{h_{t}}} \frac{\partial ε_{t}}{\partial ψ}]}, Γ_{3} = [\begin{matrix} E [w_{t}^{2} \frac{\partial ε_{t}}{\partial ϕ} {\sqrt{h}}_{t}] \\ E [\frac{w_{t}^{4}}{2 h_{t}} \frac{\partial h_{t}}{\partial ϕ_{h}}] \end{matrix}] {[\begin{matrix} \frac{1}{(1 - γ_{0})^{2}} \\ \frac{1}{1 - γ_{0}} \end{matrix}]}^{T}, Γ_{4} = [\begin{matrix} E [w_{t}^{2} \frac{\partial ε_{t}}{\partial ϕ} {\sqrt{h}}_{t}] (\frac{Q (1 - \bar{α})}{σ_{\bar{α}}} + \frac{1}{1 - γ_{0}}) \\ E [\frac{w_{t}^{4}}{2 h_{t}} \frac{\partial h_{t}}{\partial ϕ_{h}}] (\frac{Q (1 - \bar{α})}{σ_{\bar{α}}} + \frac{1}{1 - γ_{0}} - \frac{1}{σ_{\bar{α}}}) \end{matrix}],

and $I$ defined in Theorem 1.

Remark 7. Again, our inference is asymptotically correct regardless of a finite or divergent threshold, be it deterministic or random; see Remarks 1 and 2. By fixing $\bar{α}$ , we can effectively quantify the influence from the ARMA-GARCH model estimation errors on our generalized Pareto parameter inference based on the estimated residuals { ${\hat{X}}_{_{t}}$ } instead of the true errors {X_t}. When $\bar{α} = {\bar{α}}_{n} \to 0$ is an intermediate sequence such that $u_{n} ∕ Q (1 - \bar{α}) \overset{P}{\to} 1$ and $n \bar{α} ∕ n^{κ} \to \infty$ for some κ > 0 as in, e.g, McNeil and Frey (2000), Martins-Filho et al. (2018), and Hoga (2019), we deduce in the supplement that the estimation error from the ARMA-GARCH model indeed becomes asymptotically negligible as

\sqrt{n {\bar{α}}_{n}} (θ - θ_{0}^{(n)}) \overset{d}{\to} N (0, [\begin{matrix} I^{- 1} & 0 \\ 0 & 1 \end{matrix}]),

where the asymptotic variance is the same as using the true errors {X_t} rather than the estimated residuals { ${\hat{X}}_{_{t}}$ }, and coincides with that in Theorem 5 by setting $\bar{α}$ to its limit 0 in the asymptotic variance. In other words, our approach unifies the inference for both non-divergent and divergent thresholds. Following Remark 2, it is natural to expect that our methods remain asymptotically correct when the true errors are array data that could be sufficiently well modeled by the generalized Pareto distribution.

From Theorem 5, we can quantify the impact of the ARMA-GARCH model estimation errors to our inference of the generalized Pareto parameters using the estimated residuals rather than the true errors. In particular, observe that

\sqrt{n \bar{α}} (θ - θ_{0}^{(n)}) \overset{d}{\to} N (0, I_{\bar{α}}^{- 1} + \bar{α} V_{\bar{α}}),

where $I_{\bar{α}}^{- 1} = [\begin{matrix} I^{- 1} & 0 \\ 0 & 1 - \bar{α} \end{matrix}]$ is the asymptotic covariance matrix in Theorem 1, and we have an additional variance term depending on the ARMA-GARCH model given by

V_{\bar{α}} = I_{\bar{α}}^{- 1} (A Ω A^{T} + v A^{T} + A v^{T}) I_{\bar{α}}^{- 1}, A = [\begin{matrix} Γ_{1}^{T} \\ \frac{1}{1 - \bar{α}} Γ_{2}^{T} \end{matrix}], v = [\begin{matrix} σ_{\bar{α}} Γ_{3}^{T} \\ \frac{σ_{\bar{α}}}{1 - \bar{α}} Γ_{4}^{T} \end{matrix}] .

(3.9)

Now recall the quantile formula (2.8). The following quantile inference theorem follows by continuous mapping theorem.

Theorem 6. Under the conditions of Theorem 5, for any p ∈ (0, α₀)

\frac{\sqrt{n \bar{α}}}{σ_{p}} ({VaR}_{X} (1 - p) - {VaR}_{X} (1 - p)) \overset{d}{\to} N (0, τ^{2} (\bar{α}, p)),

where the variance

τ^{2} (\bar{α}, p) = q {(\frac{\bar{α}}{p})}^{T} I^{- 1} q (\frac{\bar{α}}{p}) + 1 - \bar{α} + \bar{α} [\begin{matrix} q (\bar{α} ∕ p) \\ 1 \end{matrix}] V_{\bar{α}} {[\begin{matrix} q (\bar{α} ∕ p) \\ 1 \end{matrix}]}^{T},

with $I^{- 1}$ defined in Theorem 1 and the additional variance term $V_{\bar{α}}$ given in (3.9).

We omit the proof as it is completely analogous to that of Theorem 2, but using Theorem 5 instead of Theorem 1. Now, with ${\hat{σ}}_{p} = {\hat{σ}}_{n} (\hat{α} ∕ p)^{{\hat{γ}}_{n}}$ and a consistent estimator ${\hat{τ}}^{2} (\bar{α}, p)$ (e.g., replacing the moments by their sample versions, $\bar{α}$ by $\hat{α}$ , γ₀ by $\hat{γ}$ , $σ_{\bar{α}}$ by ${\hat{σ}}_{n}$ , and $Q (1 - \bar{α})$ by u_n), a confidence interval with level a of VaR_X(1 − p) is given by

[{VaR}_{X} (1 - p) - \frac{z_{(1 + a) ∕ 2}}{\sqrt{n \bar{α}}} {\hat{σ}}_{n} (\hat{α} ∕ p)^{{\hat{γ}}_{n}} \hat{τ} (\bar{α}, p), {VaR}_{X} (1 - p) + \frac{z_{(1 + a) ∕ 2}}{\sqrt{n \bar{α}}} {\hat{σ}}_{n} (\hat{α} ∕ p)^{{\hat{γ}}_{n}} \hat{τ} (\bar{α}, p)] .

Substituting VaR_X(1 − p) in (3.8) by the values in the above interval, we can construct a prediction interval for VaR_Y,n(1 − p). Similar to the case of independent data, such an interval has a poor coverage probability in small samples. It is computationally intensive to employ the residual based bootstrap method. Here, to bypass the daunting task of estimating the asymptotic variance of the quantile estimator, we suggest a random weighted bootstrap procedure as follows.

Step C1) Draw a random sample with sample size n from a distribution function with mean one and variance one, say $ξ_{1}^{(b)}$ , ⋯ , $ξ_{n}^{(b)}$ .
Step C2) First, we estimate ϕ by
$ϕ^{(b)} = arg min_{ϕ} \sum_{t = 1}^{n} ξ_{t}^{(b)} {\tilde{w}}_{t}^{2} {\tilde{ε}}_{t}^{2} (ϕ) .$

Second, we estimate ϕ_h by maximizing
$\sum_{t = 1}^{n} ξ_{t}^{(b)} {\tilde{w}}_{t}^{4} {\tilde{l}}_{t} (ϕ_{h} ∣ ϕ^{(b)})$

and denote the estimator by $ϕ_{_{h}}^{^{(b)}}$ . Define ${\hat{X}}_{t}^{(b)} = {\tilde{ε}}_{_{t}} (ϕ^{^{(b)}}) ∕ \sqrt{{\tilde{h}}_{_{t}} (ϕ^{^{(b)}}, ϕ_{_{h}}^{^{(b)}})}$ for t = 1, ⋯ , n, ${\hat{u}}_{_{n}}^{^{(b)}} = {\hat{X}}_{n - [n \bar{α}] : n}^{(b)}$ , and estimate γ₀ and σ_{α_n} by solving
$\sum_{t = 1}^{n} ξ_{t}^{(b)} δ ({\hat{X}}_{t}^{(b)} - {\hat{u}}_{n}^{(b)}) s_{1} (v ∣ {\hat{X}}_{t}^{(b)} - {\hat{u}}_{n}^{(b)}) = 0, \sum_{t = 1}^{n} ξ_{t}^{(b)} δ ({\hat{X}}_{t}^{(b)} - {\hat{u}}_{n}^{(b)}) s_{2} (v ∣ {\hat{X}}_{t}^{(b)} - {\hat{u}}_{n}^{(b)}) = 0 .$

Denote the estimators by ${\hat{γ}}^{^{(b)}}$ and ${\hat{σ}}^{^{(b)}}$ , which gives
${VaR}_{X}^{(b)} (1 - p) = {\hat{u}}_{n}^{(b)} + \frac{{\hat{σ}}^{(b)}}{{\hat{γ}}^{(b)}} ((\bar{α} ∕ p)^{{\hat{γ}}^{(b)}} - 1) .$
Step C3) Repeat the above two steps B times to obtain ${{VaR}_{X}^{(b)} (1 - p)}_{b = 1}^{B}$ . Let ${\tilde{D}}_{1 : B} \leq \dots \leq {\tilde{D}}_{B : B}$ denote the order statistics of
$\log (\frac{{VaR}_{X}^{(b)} (1 - p)}{{VaR}_{X} (1 - p)}), b = 1, \dots, B,$

and let ${\tilde{D}}_{(1)} \leq \dots \leq {\tilde{D}}_{(B)}$ denote the order statistics of
$∣ \log (\frac{{VaR}_{X}^{(b)} (1 - p)}{{VaR}_{X} (1 - p)}) ∣, b = 1, \dots, B .$

Hence, the confidence intervals with level a for $\log ({VaR}_{\bar{X}} (1 - p))$ are
${\tilde{I}}_{R W B, 1} (a) = [\log ({VaR}_{X} (1 - p)) - {\tilde{D}}_{[\frac{B + B a}{2}] : B}, \log ({VaR}_{X} (1 - p)) - {\tilde{D}}_{[\frac{B - B a}{2}] : B}]$
and
${\tilde{I}}_{R W B, 2} (a) = [\log ({VaR}_{X} (1 - p)) - {\tilde{D}}_{(B a)}, \log ({VaR}_{X} (1 - p)) + {\tilde{D}}_{(B a)}] .$

Again, substituting VaR_X(1 − p) in (3.8) by the values in each interval above, we can construct the corresponding prediction interval for VaR_Y,n(1 − p). The simulation study below shows that the above procedure provides a good finite-sample coverage performance. The asymptotic theory for the random weighted bootstrap method can be derived with rather tedious calculations and thus is skipped.

4. Simulation Study

4.1. Independent data

This subsection carries out a simulation study to evaluate the finite-sample behavior of the proposed method for estimating VaR based on independent observations.

We draw 10000 random samples with sample size n = 500 or 1200 or 2500 from (1.2) with γ = 3 or 1/3, σ = 1, G being the standard normal distribution, and θ = 0.9. We use $\bar{α} = 0.05$ , p = 0.01 or 0.001, and B = 10000 in the naive bootstrap method and the random weighted bootstrap method. The details of the naive bootstrap methods are available in Section A of the supplement. Same as the random weighted bootstrap confidence intervals I_RWB,1(a) and I_RWB,2(a) given in Section 2, we construct two different types of naive bootstrap intervals I_Boot,1(a) and I_Boot,2(a) based on the nominal and absolute differences of the estimators respectively. We use the nlm function in the R statistical software to minimize the likelihood function with the following initial values for γ and $σ_{\bar{α}}$ .

Let $Y_{_{i}} = X_{_{(n - i + 1) : n}} - X_{_{(n - [n \bar{α}]) : n}}$ for i = 1, … , m with $m = [n \bar{α}]$ . As we consider a positive index γ, we use the initial values

γ^{i n i} = \frac{1}{\log 2} ∣ \log \frac{Y_{[m (1 - 3 ∕ 8)] : m} - Y_{[m (1 - 3 ∕ 16)] : m}}{Y_{[m (1 - 3 ∕ 4)] : m} - Y_{[m (1 - 3 ∕ 8)] : m}} ∣ and σ_{\bar{α}}^{i n i} = \frac{Y_{[m (1 - 3 ∕ 8)] : m} γ^{i n i}}{(3 ∕ 8)^{{- γ}^{i n i}} - 1} .

Here γⁱⁿⁱ is the Pickands (1975) tail index estimator.

The coverage probabilities of the proposed intervals with levels a = 90% and 95% are reported in Tables 1 and 2, which show that: i) the normal approximation method is the worst, and ii) it is much better to use the naive bootstrap method and the random weighted bootstrap method with critical values computed from the empirical distribution of the absolute differences between the bootstrapped estimators and the maximum likelihood VaR estimator. Further, the normal Q-Q plots in Figures B.1 and B.2 of the supplement show that the distribution of the VaR estimator is away from a normal distribution, especially when p is very small. Hence we prefer I_Boot,2(a) and I_RWB,2(a) to I_Boot,1(a) and I_RWB,1(a) in risk analysis.

Table 1.

Confidence intervals with level a = 90%. Empirical coverage probabilities are reported for the normal approximation confidence interval I_NA(a), the naive bootstrap intervals I_Boot,1(a) and I_Boot,2(a), and the random weighted bootstrap intervals I_RWB,1(a) and I_RWB,2(a). We take γ = 3 or 1/3, σ = 1, G ~ N(0,1), and θ = 0.9 in (1.2).

(n, p, γ)	I_NA(0.90)	I_Boot,1(0.90)	I_Boot,2(0.90)	I_RWB,1(0.90)	I_RWB,2(0.90)
(500,0.01,3)	0.7671	0.8447	0.9042	0.8516	0.9009
(500,0.001,3)	0.6634	0.8267	0.9012	0.8089	0.8936
(1200,0.01,3)	0.8247	0.8723	0.9005	0.8697	0.9012
(1200,0.001,3)	0.7392	0.8607	0.8971	0.8535	0.8984
(2500,0.01,3)	0.8573	0.8901	0.8987	0.8869	0.9007
(2500,0.001,3)	0.7837	0.8812	0.8957	0.8706	0.8972
(500,0.01,1/3)	0.8569	0.8571	0.8990	0.8591	0.8966
(500,0.001,1/3)	0.7453	0.7053	0.9318	0.6791	0.9210
(1200,0.01,1/3)	0.8803	0.8815	0.9034	0.8799	0.9036
(1200,0.001,1/3)	0.8027	0.7840	0.9145	0.7635	0.9136
(2500,0.01,1/3)	0.8928	0.8898	0.8985	0.8893	0.8975
(2500,0.001,1/3)	0.8494	0.8446	0.9029	0.8205	0.9048

Open in a new tab

Table 2.

Confidence intervals with level a = 95%. Empirical coverage probabilities are reported for the normal approximation confidence interval I_NA(a), the naive bootstrap intervals I_Boot,1(a) and I_Boot,2(a), and the random weighted bootstrap intervals I_RWB,1(a) and I_RWB,2(a). We take γ = 3 or 1/3, σ = 1, G ~ N(0,1), and θ = 0.9 in (1.2).

(n, p, γ)	I_NA(0.95)	I_Boot,1(0.95)	I_Boot,2(0.95)	I_RWB,1(0.95)	I_RWB,2(0.95)
(500,0.01,3)	0.7931	0.8928	0.9523	0.8806	0.9508
(500,0.001,3)	0.6817	0.8718	0.9512	0.8393	0.9479
(1200,0.01,3)	0.8538	0.9184	0.9491	0.9130	0.9489
(1200,0.001,3)	0.7626	0.9075	0.9480	0.8932	0.9486
(2500,0.01,3)	0.8908	0.9359	0.9480	0.9306	0.9486
(2500,0.001,3)	0.8102	0.9316	0.9478	0.9173	0.9483
(500,0.01,1/3)	0.9008	0.9102	0.9490	0.9126	0.9483
(500,0.001,1/3)	0.7852	0.7631	0.9655	0.7162	0.9607
(1200,0.01,1/3)	0.9235	0.9301	0.9520	0.9274	0.9524
(1200,0.001,1/3)	0.8396	0.8253	0.9572	0.7986	0.9567
(2500,0.01,1/3)	0.9375	0.9408	0.9485	0.9399	0.9491
(2500,0.001,1/3)	0.8838	0.8843	0.9519	0.8608	0.9532

Open in a new tab

4.2. ARMA-GARCH sequence

This subsection carries out a simulation study to evaluate the finite-sample behavior of the proposed method for estimating VaR based on an AR-GARCH sequence.

Due to the computation burden of the random weighted bootstrap method, we draw 1000 random samples with sample size n = 1200 and 2500 from the following AR-GARCH model:

Y_{t} = 0.0337 - 0.0620 Y_{t - 1} + ε_{t}, ε_{t} = \sqrt{h_{t}} X_{t}, h_{t} = 0.0123 + 0.0883 ε_{t - 1} + 0.8310 h_{t - 1},

where $X_{_{t}} = (e_{_{t}} - E e_{_{t}}) ∕ E ∣ e_{_{t}} ∣$ , e_t = δe_t,1−(1 − δ)e_t,2, and e_t,1 and e_t,2 are independent GPD random variables with CDF F(x) = 1 −(1 + γx)^−1/γ. The parameters are calibrated from the daily returns on the S&P500 index between 2012 and 2016. We consider γ = 1/3 and 1/6 to ensure $E X_{_{t}}^{^{2}} < \infty$ . We take δ = 0.5 and use the random weighted bootstrap method with B = 10000. The coverage probabilities of the proposed intervals with levels a = 90% and 95% are reported in Table 3, which show that ${\tilde{I}}_{R W B, 2} (a)$ is again better than ${\tilde{I}}_{R W B, 1} (a)$ and performs well except under the case (n, p) = (1200, 0.001), where over-coverage is observed.

Table 3. Confidence intervals for AR-GARCH models.

Empirical coverage probabilities are reported for the random weighted bootstrap confidence intervals ${\tilde{I}}_{R W B, 1} (a)$ and ${\tilde{I}}_{R W B, 2} (a)$ with a = 0.90 and 0.95.

(n, p, γ)	${\tilde{I}}_{R W B, 1} (0.90)$	${\tilde{I}}_{R W B, 2} (0.90)$	${\tilde{I}}_{R W B, 1} (0.95)$	${\tilde{I}}_{R W B, 2} (0.95)$
(1200,0.01,1/3)	0.843	0.908	0.901	0.961
(1200,0.001,1/3)	0.629	0.940	0.689	0.968
(2500,0.01,1/3)	0.878	0.894	0.928	0.949
(2500,0.001,1/3)	0.700	0.906	0.763	0.952
(1200,0.01,1/6)	0.857	0.898	0.918	0.951
(1200,0.001,1/6)	0.703	0.936	0.764	0.970
(2500,0.01,1/6)	0.868	0.903	0.930	0.943
(2500,0.001,1/6)	0.757	0.896	0.811	0.943

Open in a new tab

In summary, for quantifying the inference uncertainty of VaR estimation, we prefer the random weighted bootstrap method with critical values computed from the empirical distribution of the absolute differences between the bootstrapped risk estimators and the risk estimator, which works well for independent data and dependent data.

5. Data Analysis

5.1. Danish fire insurance losses

This subsection analyzes the Danish fire insurance data¹ in McNeil (1997) using the proposed semi-parametric GPD model in (1.2) by treating the fire losses as independent data. The dataset consists of 2167 large fire insurance claims (i.e., losses) in Denmark from January 1980 until December 1990.

We first check the validity of Assumption 1 via the probability-probability plot (P-P plot). Specifically, denote the data as ${X_{_{i}}}_{i = 1}^{2167}$ and its empirical distribution as $\hat{F} (^{.}) .$ . For a given threshold level $\bar{α}$ and the corresponding threshold $u = {\hat{F}}^{\leftarrow} (1 - \bar{α})$ , we estimate a GPD distribution $\hat{G} (\cdot; {\hat{σ}}_{\bar{α}}, {\hat{γ}}_{\bar{α}})$ based on {X_i ∣ X_i > u} and plot it against $\tilde{G} (^{.}) = (\hat{F} (^{.} + u) - \hat{F} (u)) ∕ (1 - \hat{F} (u))$ . Figure 1 gives the P-P plots for $\bar{α} = 0.1$ and 0.05. As can be seen, both P-P plots are roughly linear, supporting the validity of Assumption 1.

Fig. 1 — P-P plots for the Danish data with threshold levels $\bar{α} = 0.1$ and 0.05.

We then perform a sensitivity analysis of the proposed method with respect to the choice of threshold level $\bar{α}$ . Specifically, we calculate the (1 − p)×100% VaR at level p = 0.01, 0.005, 0.001 using varying threshold levels $\bar{α} = 0.05$ , 0.1. The confidence interval (C.I.) for VaR is calculated at a = 90% and 95% level. To construct the C.I., we implement the normal approximation (NA), the naive bootstrap method (Boot1 and Boot2), and the random weighted bootstrap method (RWB1 and RWB2). For comparison, we further conduct a naive nonparametric bootstrap (Naive), where we simply bootstrap the Danish fire insurance data and use sample quantile to estimate the VaR and its C.I.

The results are given in Figure 2. The performance of the semi-parametric GPD is fairly stable with respect to $\bar{α}$ for p = 0.01, 0.005 and shows some variation for p = 0.001. Note that the naive nonparametric bootstrap (Naive) gives a very wide (and thus non-informative) C.I. for extreme VaR (p = 0.001), which highlights the value/necessity of the proposed semiparametric C.I. construction approach. We also report the Q-Q plots of $\log ({VaR}_{X}^{(b)} (1 - p) ∕ {VaR}_{X} (1 - p))$ , b = 1, 2, ⋯, 10000, for Boot and RWB, respectively, in Figures B.3 and B.4 of the supplement. These figures show that the distribution is generally skewed, especially for p = 0.005 and 0.001.

Fig. 2 — Sensitivity analysis of constructed confidence intervals (C.I.) for the (1 − p)×100% VaR at level p = 0.01, 0.005, 0.001. The y-axis is in the unit of 1 million Danish Krone. EmpQ stands for VaR estimated by the sample quantile, and EstQ stands for VaR estimated by the semi-parametric GPD. The result of the naive approach (Naive) does not depend on $\bar{α}$ , thus is only plotted unders $\bar{α} = 0.05$ to avoid confusion.

We further conduct a leave-one-out validation (LOOV) for the proposed semi-parametric GPD model and compare it with the naive nonparametric sample quantile approach (Naive). Specifically, for each observation X_i with i = 1, 2, ⋯ , 2167, we use the leave-one-out sample X_−i to estimate the (1 − p)×100% VaR by either semi-parametric GPD or sample quantile (Naive). For evaluation, we use the empirical coverage rate, which is defined as the proportion of experiments where the left-out loss X_i is covered by (i.e. lower than) the estimated VaR based on the leave-one-out sample X_−i for i = 1, ⋯, 2167.

Table 4 reports the empirical coverage rate of the estimated VaR by the two approaches across the 2167 experiments and further gives the corresponding p-values from the binomial tests² for the null hypothesis that the coverage probability of the estimated (1 − p)×100% VaR is indeed the target level 1 − p. As can be seen, both approaches give a satisfactory result with GPD providing a perfect performance. Moreover, note that the performance of the GPD approach is insensitive to the threshold level of $\bar{α}$ , indicating the statistical stability of the proposed approach.

Table 4.

The empirical coverage rate (Emp. rate) of the estimated (1 − p)×100% VaR across the 2167 experiments at level p = 0.01, 0.005, 0.001. The result of the naive approach does not depend on $\bar{α}$ , thus is only reported under $\bar{α} = 0.05$ to avoid confusion.

p	$\bar{α}$	Emp. rate(GPD)	Emp. rate(Naive)	p-value(GPD)	p-value(Naive)
0.010	0.050	0.990	0.989	1.000	0.745
0.010	0.100	0.990	-	1.000	-
0.005	0.050	0.995	0.994	1.000	0.647
0.005	0.100	0.995	-	1.000	-
0.001	0.050	0.999	0.999	0.483	0.483
0.001	0.100	0.999	-	0.483	-

Open in a new tab

5.2. Losses in the S&P500 index

This subsection analyzes the daily negative log-returns (i.e., losses) in the S&P500 index using the proposed semi-parametric GPD method with an ARMA-GARCH model. Precisely, on each day t, based on the past 2500 historical observations (y_t−2499, y_t−2498, ⋯, y_t), we fit an AR(1)-GARCH(1,1) model using the proposed two-step self-weighted estimation method. We then calculate the one-day ahead (1 − p)×100% conditional VaR by the semi-parametric GPD method with a threshold $\bar{α}$ and construct the corresponding 90% or 95% C.I. by RWB.

For comparison, we also conduct the analysis using a traditional nonparametric approach (Trad). That is, we fit an AR(1)-GARCH(1,1) model by MLE and use the sample quantile of the fitted residuals to calculate the one-day ahead conditional VaR and bootstrap the residuals to construct the corresponding C.I. of VaR.

We let t start from 11/01/2007, which is roughly the start of the financial crisis, and we make the end date to be 10/20/2011, which roughly marks the end of the crisis. In other words, we aim to test the ability of the proposed GPD method for monitoring a financial system under stress.

There are 1000 predictions of one-day ahead conditional VaR given by the semi-parametric GPD approach and the traditional nonparametric approach (Trad). We vary p = 0.01, 0.005, 0.001 and set the confidence level of the C.I. to be 90% or 95%. For the semi-parametric GPD approach, we further vary $\bar{α} = 0.05$ , 0.1. Table 5 reports the empirical coverage rate of the estimated VaR by the two approaches across the 1000 predictions, which is defined as the proportion of predictions where the observed loss is lower than the estimated one-day ahead conditional VaR. Table 5 also gives the corresponding p-values from the binomial test. As can be seen, the traditional nonparametric approach tends to underestimate the true conditional VaR, and thus imposes serious under-reserve risk. On the other hand, the semi-parametric GPD gives satisfactory prediction performance and passes all the binomial tests. Moreover, note that the performance of the GPD approach is again insensitive to the threshold level of $\bar{α}$ , indicating the statistical stability of the proposed approach.

Table 5.

The empirical coverage rate (Emp. rate) of the estimated (1 − p)×100% VaR across the 1000 predictions at level p = 0.01, 0.005, 0.001. The result of the traditional approach does not depend on $\bar{α}$ , thus is only reported under $\bar{α} = 0.05$ to avoid confusion.

p	$\bar{α}$	Emp. rate(GPD)	Emp. rate(Trad)	p-value(GPD)	p-value(Trad)
0.010	0.050	0.985	0.979	0.111	0.002
0.010	0.100	0.985	-	0.111	-
0.005	0.050	0.995	0.989	1.000	0.020
0.005	0.100	0.995	-	1.000	-
0.001	0.050	1.000	0.998	0.632	0.264
0.001	0.100	1.000	-	0.632	-

Open in a new tab

For illustration, Figure 3 plots the estimated VaR and the corresponding C.I. given by RWB2 and the traditional nonparametric approach. For the plots, we set $\bar{α} = 0.05$ , the confidence level of C.I. a = 90% and vary p = 0.01, 0.005, 0.001. The result for RWB1 and the result for $\bar{α} = 0.1$ and the confidence level a = 95% are similar and thus are omitted. Note that compared to RWB2, the C.I. given by the traditional nonparametric method is narrower for p = 0.01, which may be possibly due to the fact that the C.I. by the traditional nonparametric approach does not incorporate the estimation uncertainty of the AR(1)-GARCH(1,1) model. On the other hand, the nonparametric approach gives much wider C.I. for extreme quantiles p = 0.005, 0.001, indicating that a naive nonparametric bootstrap cannot construct an informative C.I. for extreme quantiles. This phenomenon is observed in the Danish insurance data analysis as well. In Appendix C, we also show that our proposed method outperforms the filtered historical simulation methods in Kuester et al. (2006). The conclusions are qualitatively the same, and the details are available in the supplement.

Fig. 3 — Estimated one-day ahead conditional VaR (red line) and its 90% C.I. (blue dashed lines) by the random weight bootstrap (RWB2) and the traditional nonparametric approach (Trad). The black line denotes the negative daily log returns of the S&P500 index.

We conclude the analysis by providing a validity check of Assumption A6. Specifically, Figure 4 gives the P-P plots for $\bar{α} = 0.1$ and 0.05 based on the entire dataset from 11/24/1997 to 10/20/2011 (2500 + 1000 observations). The P-P plot is generated in the same fashion as in Section 5.1 except it is now based on the residuals of the AR(1)-GARCH(1,1) model. Both P-P plots are roughly linear and support Assumption A6.

Fig. 4 — P-P plots for the S&P500 index with threshold levels $\bar{α} = 0.1$ and 0.05.

6. Conclusions

Given that regulators often set a high VaR level in risk management, fitting distribution in the tail is essential. This paper infers a semi-parametric model, which only models exceedances over a non-divergent threshold by the generalized Pareto distribution. Asymptotic results for parameters and VaR estimation are first derived for independent data. For financial data modeled by an ARMA-GARCH process, a three-step weighted estimation procedure is proposed to ensure a normal limit for the estimated parameters and conditional VaR with heavy-tailed observations. For efficiently quantifying the uncertainty of risk forecast, a random weighted bootstrap method is proposed and shown to be consistent. A simulation study and real data analysis confirm the advantages of the proposed methodologies. It is crucial to develop a distribution free goodness-of-fit test and the asymptotic theory for dynamic modeling of the generalized Pareto distribution, which will be our future research plan.

Supplementary Material

Supp 1

NIHMS1733545-supplement-Supp_1.pdf^{(8MB, pdf)}

ACKNOWLEDGMENTS

We thank the editor, Professor Jianqing Fan, an associate editor, and two reviewers for their useful comments that led to this improved version of the manuscript. Peng’s research was partly supported by the Simons Foundation and the NSF grant, DMS2012448. Zhang’s research was partially supported by the National Cancer Institute and the National Institute of General Medical Sciences of the National Institutes of Health under award numbers R03CA235363 and R01GM131491, respectively.

Footnotes

SUPPLEMENTARY MATERIAL

Supplement to “Risk Analysis via Generalized Pareto Distributions” In this supplement, we provide the asymptotic theory for the naive bootstrap method for independent data (Remark 6), report some additional Q-Q plots discussed in Sections 4.1 and 5.1, discuss some additional empirical results for the S&P 500 index, prove Theorems 1–5, and deduce the results for divergent thresholds from Remarks 2 and 7 in details.

The dataset is publicly available via R package evir.

Under the null hypothesis, the number of coverage by the estimated (1 − p)×100% VaR across n experiments should follow a binomial distribution with parameters (n, 1 − p). See Kratz et al. (2018) for more details of the binomial test.

Contributor Information

YI HE, Amsterdam School of Economics, University of Amsterdam, Amsterdam 1001 NJ, The Netherlands.

LIANG PENG, Department of Risk Management and Insurance, Georgia State University, Atlanta, GA.

DABAO ZHANG, Department of Statistics, Purdue University, West Lafayette, IN.

ZIFENG ZHAO, Mendoza College of Business, University of Notre Dame, Notre Dame, IN.

References

Allen L, Bali TG, and Tang Y (2012), “Does Systemic Risk in the Financial Sector Predict Future Economic Downturns?,” The Review of Financial Studies, 25, 3000–3036. [Google Scholar]
Balkema A, and de Haan L (1974), “Residual Life Time at Great Age,” The Annals of Probability, 2, 792–804. [Google Scholar]
Barro RJ, and Jin T (2011), “On the Size Distribution of Macroeconomic Disasters,” Econometrica, 79, 1567–1589. [Google Scholar]
Bollerslev T (1986), “Generalized Autoregressive Conditional Heteroskedasticity,” Journal of Econometrics, 31, 307–327. [Google Scholar]
Bollerslev T, and Todorov V (2011), “Estimation of Jump Tails,” Econometrica, 79, 1727–1783. [Google Scholar]
Brodin E, and Rootzén H (2009), “Univariate and Bivariate GPD methods for Predicting Extreme Wind Storm Losses,” Insurance: Mathematics and Economics, 44, 345–356. [Google Scholar]
Bücher A, and Segers J (2017), “On the Maximum Likelihood Estimator for the Generalized Extreme-Value Distribution,” Extremes, 20, 839–872. [Google Scholar]
Chavez-Demoulin V, Davison AC, and McNeil AJ (2005), “Estimating Value-at-Risk: a Point Process Approach,” Quantitative Finance, 5, 227–234. [Google Scholar]
Chavez-Demoulin V, and Embrechts P (2004), “Smooth Extremal Models in Finance and Insurance,” The Journal of Risk and insurance 71, 183–199. [Google Scholar]
Chavez-Demoulin V, Embrechts P, and Sardy S (2014), “Extreme-quantile Tracking for Financial Time Series,” Journal of Econometrics, 181, 44–52. [Google Scholar]
Cont R (2001), “Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues,” Quantitative Finance, 1, 223–236. [Google Scholar]
Cramér H. (1946), Mathematical Methods of Statistics, Princeton University Press. [Google Scholar]
Davison AC, and Smith RL (1990), “Models for Exceedances over High Thresholds,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 52, 393–442. [Google Scholar]
de Haan L, and Ferreira A (2006), Extreme Value Theory: an Introduction, New York: Springer. [Google Scholar]
Dombry C (2015), “Existence and Consistency of the Maximum Likelihood Estimators for the Extreme Value Index within the block maxima framework.”, Bernoulli, 21, 420–436. [Google Scholar]
Drees H, Ferreira A, and de Haan L (2004), “On Maximum Likelihood Estimation of the Extreme Value Index,” The Annals of Applied Probability, 14, 1179–1201. [Google Scholar]
Duffie D, and Pan J (1997), “An Overview of Value at Risk,” The Journal of Derivatives, 4, 7–49. [Google Scholar]
Embrechts P, Klüppelberg C, and Mikosch T (1997), Modelling Extremal Events for insurance and Finance, Berlin: Springer. [Google Scholar]
Engle RF (1982), “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of U.K. Inflation,” Econometrica, 50, 987–1008. [Google Scholar]
Francq C, and Zakoïan J (2004), “Maximum Likelihood Estimation of Pure GARCH and ARMA-GARCH processes,” Bernoulli, 10, 605–637. [Google Scholar]
Hall P, and Tajvidi N (2000), “Nonparametric Analysis of Temporal Trend when Fitting Parametric Models to Extreme-Value Data,” Statistical Science, 15, 153–167. [Google Scholar]
Hall P, and Yao Q (2003), “Inference in ARCH and GARCH Models with Heavy Tailed Errors,” Econometrica, 71, 285–317. [Google Scholar]
He Y, Hou Y, Peng L, and Shen H (forthcoming), “Inference for Conditional Value-at-Risk of a Predictive Regression,” The Annals of Statistics. [Google Scholar]
Hoga Y (2019), “Confidence Intervals for Conditional Tail Risk Measures in ARMA-GARCH Models,” Journal of Business & Economic Statistics, 37, 613–624. [Google Scholar]
Hull JC (2018), Risk Management and Financial Institutions, Fifth ed.,Hoboken: Wiley. [Google Scholar]
Jorion P (2006), Value at Risk: the New Benchmark for Measuring Financial Risk, Third ed., New York:McGraw-Hill. [Google Scholar]
Kelly B, and Jiang H (2014), “Tail Risk and Asset Prices,” The Review of Financial Studies, 27, 2841–2871. [Google Scholar]
Koul HL, and Ling S (2006), “Fitting an Error Distribution in Some Heteroscedastic Time Series Models,” The Annals of Statistics, 34, 994–1012. [Google Scholar]
Kratz M, Lok YH, and McNeil A (2018), “Multinomial VaR backtests: A simple implicit approach to backtesting expected shortfall,” Journal of Banking & Finance, 88, 393–407. [Google Scholar]
Kuester K, Mittnik S, and Paolella MS. (2006), “Value-at-risk Prediction: A Comparison of Alternative Strategies,” Journal of Financial Econometrics, 4, 53–89. [Google Scholar]
Ling S (2007), “Self-Weighted and Local Quasi-Maximum Likelihood Estimators for ARMA-GARCH/IGARCH Models,” Journal of Econometrics, 140, 849–873. [Google Scholar]
Martins-Filho C, Yao F, and Torero M (2018), “Nonparametric Estimation of Conditional Value-at-risk and Expected Shortfall based on Extreme Value Theory,” Econometric Theory, 34, 23–67. [Google Scholar]
Massacci D (2017), “Tail Risk Dynamics in Stock Returns: Links to the Macroeconomy and Global Markets Connectedness,” Management Science, 63, 3072–3089. [Google Scholar]
McNeil AJ (1997), “Estimating the Tails of Loss Severity Distributions Using Extreme Value Theory,” ASTIN Bulletin, 27, 117–137. [Google Scholar]
McNeil AJ, and Frey R (2000), “Estimation of Tail-Related Risk Measures for Heteroscedastic Financial Time Series: an Extreme Value Approach,” Journal of Empirical Finance, 7, 271–300. [Google Scholar]
Owen A (2001), Empirical Likelihood, New York: Chapman & Hall. [Google Scholar]
Peng L, and Qi Y (2009). “Maximum Likelihood Estimation of Extreme Value Index for Irregular Cases,” Journal of Statistical Planning and inference, 139, 3361–3376. [Google Scholar]
Pickands J (1975), “Statistical Inference Using Extreme Order Statistics,” The Annals of Statistics, 3, 119–131. [Google Scholar]
Resnick SI (1987), Extreme Values, Regular Variation and Point Processes, New York: Springer. [Google Scholar]
Rootzén H, and Tajvidi N (1997), “Extreme Value Statistics and Wind Storm Losses: a Case Study,” Scandinavian Actuarial Journal 1, 70–94. [Google Scholar]
Smith RL (1985). “Maximum Likelihood Estimation in a Class of Nonregular Case,” Biometrika, 72, 67–90. [Google Scholar]
Smith RL (1987), “Estimating Tails of Probability Distributions,” The Annals of Statistics, 15, 1174–1207. [Google Scholar]
Zhao Z (2020), “Dynamic Bivariate Peak over Threshold Model for Joint Tail Risk Dynamics of Financial Markets,” Journal of Business & Economic Statistics. [Google Scholar]
Zhou C (2009), “Existence and Consistency of the Maximum Likelihood Estimator for the Extreme Value Index,” Journal of Multivariate Analysis, 100, 794–815. [Google Scholar]
Zhou C (2010), “The Extent of the Maximum Likelihood Estimator for the Extreme Value Index,” Journal of Multivariate Analysis, 101, 971–983. [Google Scholar]
Zhu K, and Ling S (2011), “Global Self-Weighted and Local Quasi-Maximum Exponential Likelihood Estimators for ARMA-GARCH/IGARCH Models,” The Annals of Statistics, 39, 2131–2163. [Google Scholar]
Zhu K (2016), “Bootstrapping the Portmanteau Tests in Weak Auto-Regressive Moving Average Models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78, 463–485. [Google Scholar]
Zhu K (2019), “Statistical Inference for Autoregressive Models Under Heteroscedasticity of Unknown Form,” The Annals of Statistics, 6, 3185–3215. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supp 1

NIHMS1733545-supplement-Supp_1.pdf^{(8MB, pdf)}

[R1] Allen L, Bali TG, and Tang Y (2012), “Does Systemic Risk in the Financial Sector Predict Future Economic Downturns?,” The Review of Financial Studies, 25, 3000–3036. [Google Scholar]

[R2] Balkema A, and de Haan L (1974), “Residual Life Time at Great Age,” The Annals of Probability, 2, 792–804. [Google Scholar]

[R3] Barro RJ, and Jin T (2011), “On the Size Distribution of Macroeconomic Disasters,” Econometrica, 79, 1567–1589. [Google Scholar]

[R4] Bollerslev T (1986), “Generalized Autoregressive Conditional Heteroskedasticity,” Journal of Econometrics, 31, 307–327. [Google Scholar]

[R5] Bollerslev T, and Todorov V (2011), “Estimation of Jump Tails,” Econometrica, 79, 1727–1783. [Google Scholar]

[R6] Brodin E, and Rootzén H (2009), “Univariate and Bivariate GPD methods for Predicting Extreme Wind Storm Losses,” Insurance: Mathematics and Economics, 44, 345–356. [Google Scholar]

[R7] Bücher A, and Segers J (2017), “On the Maximum Likelihood Estimator for the Generalized Extreme-Value Distribution,” Extremes, 20, 839–872. [Google Scholar]

[R8] Chavez-Demoulin V, Davison AC, and McNeil AJ (2005), “Estimating Value-at-Risk: a Point Process Approach,” Quantitative Finance, 5, 227–234. [Google Scholar]

[R9] Chavez-Demoulin V, and Embrechts P (2004), “Smooth Extremal Models in Finance and Insurance,” The Journal of Risk and insurance 71, 183–199. [Google Scholar]

[R10] Chavez-Demoulin V, Embrechts P, and Sardy S (2014), “Extreme-quantile Tracking for Financial Time Series,” Journal of Econometrics, 181, 44–52. [Google Scholar]

[R11] Cont R (2001), “Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues,” Quantitative Finance, 1, 223–236. [Google Scholar]

[R12] Cramér H. (1946), Mathematical Methods of Statistics, Princeton University Press. [Google Scholar]

[R13] Davison AC, and Smith RL (1990), “Models for Exceedances over High Thresholds,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 52, 393–442. [Google Scholar]

[R14] de Haan L, and Ferreira A (2006), Extreme Value Theory: an Introduction, New York: Springer. [Google Scholar]

[R15] Dombry C (2015), “Existence and Consistency of the Maximum Likelihood Estimators for the Extreme Value Index within the block maxima framework.”, Bernoulli, 21, 420–436. [Google Scholar]

[R16] Drees H, Ferreira A, and de Haan L (2004), “On Maximum Likelihood Estimation of the Extreme Value Index,” The Annals of Applied Probability, 14, 1179–1201. [Google Scholar]

[R17] Duffie D, and Pan J (1997), “An Overview of Value at Risk,” The Journal of Derivatives, 4, 7–49. [Google Scholar]

[R18] Embrechts P, Klüppelberg C, and Mikosch T (1997), Modelling Extremal Events for insurance and Finance, Berlin: Springer. [Google Scholar]

[R19] Engle RF (1982), “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of U.K. Inflation,” Econometrica, 50, 987–1008. [Google Scholar]

[R20] Francq C, and Zakoïan J (2004), “Maximum Likelihood Estimation of Pure GARCH and ARMA-GARCH processes,” Bernoulli, 10, 605–637. [Google Scholar]

[R21] Hall P, and Tajvidi N (2000), “Nonparametric Analysis of Temporal Trend when Fitting Parametric Models to Extreme-Value Data,” Statistical Science, 15, 153–167. [Google Scholar]

[R22] Hall P, and Yao Q (2003), “Inference in ARCH and GARCH Models with Heavy Tailed Errors,” Econometrica, 71, 285–317. [Google Scholar]

[R23] He Y, Hou Y, Peng L, and Shen H (forthcoming), “Inference for Conditional Value-at-Risk of a Predictive Regression,” The Annals of Statistics. [Google Scholar]

[R24] Hoga Y (2019), “Confidence Intervals for Conditional Tail Risk Measures in ARMA-GARCH Models,” Journal of Business & Economic Statistics, 37, 613–624. [Google Scholar]

[R25] Hull JC (2018), Risk Management and Financial Institutions, Fifth ed.,Hoboken: Wiley. [Google Scholar]

[R26] Jorion P (2006), Value at Risk: the New Benchmark for Measuring Financial Risk, Third ed., New York:McGraw-Hill. [Google Scholar]

[R27] Kelly B, and Jiang H (2014), “Tail Risk and Asset Prices,” The Review of Financial Studies, 27, 2841–2871. [Google Scholar]

[R28] Koul HL, and Ling S (2006), “Fitting an Error Distribution in Some Heteroscedastic Time Series Models,” The Annals of Statistics, 34, 994–1012. [Google Scholar]

[R29] Kratz M, Lok YH, and McNeil A (2018), “Multinomial VaR backtests: A simple implicit approach to backtesting expected shortfall,” Journal of Banking & Finance, 88, 393–407. [Google Scholar]

[R30] Kuester K, Mittnik S, and Paolella MS. (2006), “Value-at-risk Prediction: A Comparison of Alternative Strategies,” Journal of Financial Econometrics, 4, 53–89. [Google Scholar]

[R31] Ling S (2007), “Self-Weighted and Local Quasi-Maximum Likelihood Estimators for ARMA-GARCH/IGARCH Models,” Journal of Econometrics, 140, 849–873. [Google Scholar]

[R32] Martins-Filho C, Yao F, and Torero M (2018), “Nonparametric Estimation of Conditional Value-at-risk and Expected Shortfall based on Extreme Value Theory,” Econometric Theory, 34, 23–67. [Google Scholar]

[R33] Massacci D (2017), “Tail Risk Dynamics in Stock Returns: Links to the Macroeconomy and Global Markets Connectedness,” Management Science, 63, 3072–3089. [Google Scholar]

[R34] McNeil AJ (1997), “Estimating the Tails of Loss Severity Distributions Using Extreme Value Theory,” ASTIN Bulletin, 27, 117–137. [Google Scholar]

[R35] McNeil AJ, and Frey R (2000), “Estimation of Tail-Related Risk Measures for Heteroscedastic Financial Time Series: an Extreme Value Approach,” Journal of Empirical Finance, 7, 271–300. [Google Scholar]

[R36] Owen A (2001), Empirical Likelihood, New York: Chapman & Hall. [Google Scholar]

[R37] Peng L, and Qi Y (2009). “Maximum Likelihood Estimation of Extreme Value Index for Irregular Cases,” Journal of Statistical Planning and inference, 139, 3361–3376. [Google Scholar]

[R38] Pickands J (1975), “Statistical Inference Using Extreme Order Statistics,” The Annals of Statistics, 3, 119–131. [Google Scholar]

[R39] Resnick SI (1987), Extreme Values, Regular Variation and Point Processes, New York: Springer. [Google Scholar]

[R40] Rootzén H, and Tajvidi N (1997), “Extreme Value Statistics and Wind Storm Losses: a Case Study,” Scandinavian Actuarial Journal 1, 70–94. [Google Scholar]

[R41] Smith RL (1985). “Maximum Likelihood Estimation in a Class of Nonregular Case,” Biometrika, 72, 67–90. [Google Scholar]

[R42] Smith RL (1987), “Estimating Tails of Probability Distributions,” The Annals of Statistics, 15, 1174–1207. [Google Scholar]

[R43] Zhao Z (2020), “Dynamic Bivariate Peak over Threshold Model for Joint Tail Risk Dynamics of Financial Markets,” Journal of Business & Economic Statistics. [Google Scholar]

[R44] Zhou C (2009), “Existence and Consistency of the Maximum Likelihood Estimator for the Extreme Value Index,” Journal of Multivariate Analysis, 100, 794–815. [Google Scholar]

[R45] Zhou C (2010), “The Extent of the Maximum Likelihood Estimator for the Extreme Value Index,” Journal of Multivariate Analysis, 101, 971–983. [Google Scholar]

[R46] Zhu K, and Ling S (2011), “Global Self-Weighted and Local Quasi-Maximum Exponential Likelihood Estimators for ARMA-GARCH/IGARCH Models,” The Annals of Statistics, 39, 2131–2163. [Google Scholar]

[R47] Zhu K (2016), “Bootstrapping the Portmanteau Tests in Weak Auto-Regressive Moving Average Models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78, 463–485. [Google Scholar]

[R48] Zhu K (2019), “Statistical Inference for Autoregressive Models Under Heteroscedasticity of Unknown Form,” The Annals of Statistics, 6, 3185–3215. [Google Scholar]

PERMALINK

Risk Analysis via Generalized Pareto Distributions

YI HE

LIANG PENG

DABAO ZHANG

ZIFENG ZHAO

Abstract

1. Introduction

2. Methodologies and Asymptotic Results for Independent Data

3. Methodologies and Asymptotic Results for ARMA-GARCH Models