Author manuscript; available in PMC: 2023 Jan 1.
Published in final edited form as: J Bus Econ Stat. 2021 Mar 3;40(2):852–867. doi: 10.1080/07350015.2021.1874390

Risk Analysis via Generalized Pareto Distributions

YI HE 1, LIANG PENG 2, DABAO ZHANG 3, ZIFENG ZHAO 4
PMCID: PMC9231421  NIHMSID: NIHMS1733545  PMID: 35756092

Abstract

We compute the value-at-risk (VaR) of financial losses by fitting a generalized Pareto distribution to exceedances over a threshold. Following the common practice of setting the threshold at a high sample quantile, we show that, for both independent observations and time-series data, the asymptotic variance of the maximum likelihood estimator depends on the choice of threshold, unlike the existing studies that use a divergent threshold. We also propose a random weighted bootstrap method for the interval estimation of VaR, with critical values computed from the empirical distribution of the absolute differences between the bootstrapped estimators and the maximum likelihood estimator. Our asymptotic results unify the inference with non-divergent and divergent thresholds, and finite-sample studies via simulation and applications to real data show that the derived confidence intervals cover the true VaR well in insurance and finance.

Keywords: ARMA-GARCH models, Generalized Pareto distribution, Random weighted bootstrap, Value-at-risk, Weighted empirical process

1. Introduction

Measuring risk and quantifying its estimation uncertainty are crucial in insurance and finance. A well-studied and widely used risk measure is the so-called Value-at-Risk (VaR) at level 1 − p ∈ (0,1), which is defined as the (1 − p)-th quantile of the distribution of a risk variable or a portfolio; see Duffie and Pan (1997) and Jorion (2006) for an overview of VaR. Given n identically distributed observations, the VaR at level 1 − p can be estimated nonparametrically by the sample quantile when n(1 − p) is close to neither n nor zero. The risk manager may quantify the estimation uncertainty via a direct estimation of the asymptotic variance or via resampling methods such as the bootstrap and the empirical likelihood in Owen (2001).

Regulators often set the probability level 1 − p of VaR close to one, such as 99% or 99.9%. Therefore, when the sample size is not particularly large, the nonparametric VaR estimator is inefficient and may seriously underestimate the risk. An obvious way to improve inference efficiency is to fit a parametric distribution family to the risk variable. It is known, however, that an efficient likelihood-based inference mainly uses the information around the center of the data, whereas, as 1 − p gets close to one, the information in the upper tail of the distribution becomes more crucial to the study of VaR. Therefore, one may build a parametric model for observations above a threshold to ensure that the fit of the upper tail is accurate and robust. This raises an interesting question of how to model the excess distribution above a threshold, say u, given by

$$F_u(x)=P(X-u\le x\mid X>u)=\frac{F(x+u)-F(u)}{1-F(u)}\qquad\text{for }0\le x<x_F-u,$$

where x_F denotes the right endpoint of the distribution function F(x) = P(X ≤ x), i.e., x_F = sup{x : F(x) < 1}.

The extreme value theory states that, when F is in the domain of attraction of extreme value distribution, there exists a function β(u) > 0 such that

$$\lim_{u\to x_F}\ \sup_{0\le x<x_F-u}\bigl|F_u(x)-G_{\gamma,\beta(u)}(x)\bigr|=0,\qquad(1.1)$$

where Gγ,β(u)(x) = 1 − (1 + γx/β(u))^{−1/γ} for 1 + γx/β(u) > 0 is the cumulative distribution function of the generalized Pareto distribution with shape parameter γ and scale parameter β(u); see Balkema and de Haan (1974) and the overviews by Resnick (1987) and Embrechts et al. (1997). Fitting a generalized Pareto distribution to exceedances over a high threshold has been studied in the literature. For example, Smith (1987) and Drees et al. (2004) study the maximum likelihood estimation with a deterministic divergent threshold and a random divergent threshold, respectively; see also Davison and Smith (1990). The choice of the threshold depends on the order of the approximation error in (1.1), which is generally quantified by a second-order regular variation condition. Typically, a large threshold yields a large variance, while a small threshold leads to a large estimation bias. Given the difficulty of choosing this divergent threshold, researchers often advise practitioners to plot the estimators against various thresholds and find a relatively stable region. In this case, the estimator has a non-negligible bias, which complicates the interval estimation.

Nevertheless, as a rule of thumb, practitioners often ignore the asymptotic bias and pick the 90% or 95% sample quantile as the threshold; see the discussion in Section 13.6.1 of Hull (2018). This is especially the case when modeling the so-called dynamic tail risk by some critical economic variables. Applications of the generalized Pareto distribution include Rootzén and Tajvidi (1997) and Brodin and Rootzén (2009) for wind storm losses, Barro and Jin (2011) for economic disasters, and McNeil and Frey (2000), Chavez-Demoulin and Embrechts (2004), Bollerslev and Todorov (2011), and Allen et al. (2012) for financial time series. For dynamic modeling of the generalized Pareto distribution, we refer to Chavez-Demoulin et al. (2005), Kelly and Jiang (2014), Chavez-Demoulin et al. (2014), Massacci (2017), and Zhao (2020) for financial returns, and Hall and Tajvidi (2000) for climate data.

The common practice of using the 90% or 95% sample quantile as the threshold ignores the estimation bias caused by the model approximation error. Hence, it becomes natural to model exceedances over an (unknown) fixed threshold by a generalized Pareto distribution. In other words, instead of fitting a parametric model to the entire data set, it is preferable to fit the exceedances over a threshold by the generalized Pareto distribution and to model the data below the threshold nonparametrically, as in Smith (1987) and Drees et al. (2004) for independent data, McNeil and Frey (2000) for an ARMA-GARCH model, and Martins-Filho et al. (2018) for nonparametric regression. Under such a model assumption, when the threshold is chosen as a sample quantile, the inference for the parameters and the VaR depends on the random threshold selected, which is in stark contrast with the existing studies using a divergent threshold. A particular semi-parametric model we focus on is

$$F(x)=\begin{cases}\theta\,\dfrac{G(x)}{G(x_0)}&\text{if }x\le x_0,\\[1ex]1-(1-\theta)\left(1+\dfrac{\gamma(x-x_0)}{\sigma}\right)^{-1/\gamma}&\text{if }x>x_0,\end{cases}\qquad(1.2)$$

where θ ∈ (0,1), and G is a distribution function.

This paper takes such a model to provide a comprehensive inference on VaR for both independent observations and time series data. The developed methodologies can also be applied to other tail-related risk measures such as the expected shortfall and the expectile. As insurance losses are arguably independent, we first derive the asymptotic distribution of the maximum likelihood estimator of the model parameters and the VaR based on independent data. We develop a unified inference theory for a universal threshold statistic, which can be a deterministic threshold based on prior knowledge, an order statistic based on the observations, or a more sophisticated quantile estimator. We show that the naive bootstrap method and the random weighted bootstrap method are both asymptotically correct for quantifying the estimation uncertainty.

For dependent data such as financial time series, we propose to infer the conditional VaR by combining an ARMA-GARCH model with the semiparametric model for the residuals. To ensure the asymptotic normality of the VaR estimation for the ARMA-GARCH model with heavy-tailed errors, we propose a two-step self-weighted procedure to estimate the ARMA-GARCH model before fitting the residual distribution semiparametrically. We first estimate the ARMA parameters by a self-weighted least-squares method. Then we estimate the GARCH parameters using the self-weighted exponential quasi-likelihood in Zhu and Ling (2011) with the least-squares ARMA residuals. Our approach maintains the natural condition that the ARMA errors have a zero mean, rather than a zero median as in Zhu and Ling (2011), when relaxing the kurtosis condition on the GARCH errors. To quantify the uncertainty of the conditional VaR estimation, we propose the random weighted bootstrap method, which is much less computationally intensive than the residual-based bootstrap method.

The existing methods of using a divergent threshold face the severe difficulty of choosing the threshold. When interval estimation is of concern, an efficient way is to choose a larger threshold such that the estimation bias is negligible, which essentially assumes that the exceedance follows an exact generalized Pareto distribution. In other words, when the exceedance has only an approximate generalized Pareto distribution, our proposed point and interval estimation remain valid as long as the divergent threshold is sufficiently large that the model approximation error is of smaller order than the estimation error.

We organize this paper as follows. Sections 2 and 3 present our methodologies and asymptotic results for independent observations and an ARMA-GARCH model, respectively. Sections 4 and 5 contain a simulation study and a data analysis. Finally, Section 6 concludes the paper and discusses future work. The detailed proofs of the theorems are available in the supplement. Throughout this paper, we denote by A^T the transpose of a matrix or vector A, and write →d for convergence in distribution and →P for convergence in probability. All asymptotic results hold as the sample size n → ∞.

2. Methodologies and Asymptotic Results for Independent Data

Consider a random variable X ∈ R with distribution function F and quantile function Q(·) = F^{−1}(·). For a threshold u0 = Q(1 − α0) with exceeding probability α0 ∈ (0,1), we make the following assumption on the exceedance X − u0 ∣ X > u0.

Assumption 1 (Generalized Pareto Model). There exist a shape parameter γ0R and a scale parameter σα0 > 0 such that

$$P(X>u_0+x\mid X>u_0)=\begin{cases}\left(1+\dfrac{\gamma_0 x}{\sigma_{\alpha_0}}\right)^{-1/\gamma_0},&\gamma_0\ne 0,\\[1ex]\exp\!\left(-\dfrac{x}{\sigma_{\alpha_0}}\right),&\gamma_0=0,\end{cases}$$

where α0 = P(X > u0), and we require 1 + γ0x/σα0 > 0 for γ0 ≠ 0 and x > 0 for γ0 = 0.

The shape parameter γ0 is called the extreme value index of the exceedance X − u0 ∣ X > u0. When γ0 < 0, there is a finite right endpoint u* = u0 − σα0/γ0 in the support of the distribution of X, i.e., F(x) = 1 for all x ≥ u*. When γ0 = 0, X − u0 ∣ X > u0 has an exponential distribution with mean σα0. When γ0 > 0, X − u0 ∣ X > u0 has a heavy tail with finite moments of order up to 1/γ0. Note that σα0 is also a function of u0 via α0 = P(X > u0).

Observe that, for any higher threshold u > u0, the exceedance X − u ∣ X > u again follows the generalized Pareto distribution with the same shape parameter γ0 but a different scale parameter σα = (α0/α)^{γ0} σα0, where α = 1 − F(u) is the exceeding probability. Specifically,

$$P(X>u+x\mid X>u)=\begin{cases}\left(1+\dfrac{\gamma_0 x}{\sigma_{\alpha}}\right)^{-1/\gamma_0},&\gamma_0\ne 0,\\[1ex]\exp\!\left(-\dfrac{x}{\sigma_{\alpha}}\right),&\gamma_0=0.\end{cases}$$

A direct calculation yields the (1 − p)-th quantile of X, i.e., the VaR at level 1 − p takes the form

$$\mathrm{VaR}_X(1-p)=Q(1-p)=\begin{cases}u+\dfrac{\sigma_\alpha}{\gamma_0}\left(\left(\dfrac{\alpha}{p}\right)^{\gamma_0}-1\right)&\text{if }\gamma_0\ne 0,\\[1ex]u+\sigma_\alpha\log\!\left(\dfrac{\alpha}{p}\right)&\text{if }\gamma_0=0,\end{cases}\qquad(2.1)$$

for all given p ∈ (0, α0).
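As a quick numerical illustration (a minimal sketch; the parameter values are arbitrary, not from the paper), formula (2.1) and the threshold-stability relation σα = (α0/α)^{γ0} σα0 can be checked in a few lines:

```python
import numpy as np

def var_gpd(u, sigma, gamma, alpha, p):
    """VaR at level 1-p from (2.1): threshold u with exceeding probability
    alpha, and GPD shape gamma and scale sigma at that threshold."""
    if gamma == 0.0:
        return u + sigma * np.log(alpha / p)
    return u + sigma / gamma * ((alpha / p) ** gamma - 1.0)

# Threshold stability: raising the threshold from exceeding probability
# a0 to a1 < a0 rescales the GPD scale by (a0/a1)**gamma, but must leave
# the (1-p)-quantile unchanged.
gamma, s0, u0, a0, a1, p = 0.2, 1.0, 2.0, 0.10, 0.05, 0.01
u1 = var_gpd(u0, s0, gamma, a0, a1)   # the (1 - a1)-quantile as new threshold
s1 = s0 * (a0 / a1) ** gamma          # rescaled scale parameter
assert abs(var_gpd(u0, s0, gamma, a0, p)
           - var_gpd(u1, s1, gamma, a1, p)) < 1e-9
```

The assertion holds exactly (up to floating point) because the two-step quantile computation telescopes algebraically into the one-step one.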

As Assumption 1 above does not model the distribution below the threshold parametrically, computing VaR_X(1 − p) via (2.1) is a semiparametric method and achieves a good balance between robustness and efficiency. It is easy to check that model (1.2) satisfies Assumption 1. Unlike the existing studies on fitting a GPD to exceedances over a divergent threshold, we first investigate the inference based on a non-divergent threshold; in this case, the threshold may play a role in quantifying the inference uncertainty of the VaR in (2.1). On the other hand, if the threshold diverges fast enough, the model approximation error, and hence the estimation bias, is negligible. Therefore, the developed method for fitting an exact generalized Pareto distribution remains valid for a larger divergent threshold under the setting that the exceedance has an approximate generalized Pareto distribution.

Suppose we have a random sample X1, …, Xn from F satisfying Assumption 1, with order statistics X1:n ≤ ⋯ ≤ Xn:n. Take a large threshold un, either deterministic or random, with the corresponding sample exceeding probability

$$\hat\alpha_n=\frac{1}{n}\sum_{i=1}^n\delta(X_i-u_n),\qquad(2.2)$$

where δ(x) := 1(x > 0) denotes the step function taking value 1 on the positive line and value 0 otherwise. Let αn =1 − F(un) denote the adaptive exceeding probability, which may be either deterministic or random depending on our choice of the threshold un.

Given an exceedance Xi − un = x > 0, the log-likelihood function for the Pareto parameters v = (γ, log σ)^T ∈ R² is given by

$$l(v\mid x)=-\left\{\frac{1+\gamma}{\gamma}\log\!\left(1+\frac{\gamma x}{\sigma}\right)+\log\sigma\right\}.$$

Note that the above function is well defined for γ = 0 by continuity as

$$l\bigl((0,\log\sigma)^T\mid x\bigr)=-\frac{x}{\sigma}-\log\sigma.$$

Thus, the full log-likelihood function based on the observed exceedances X1 − un, …, Xn − un is given by

$$\sum_{i=1}^n\delta(X_i-u_n)\,l(v\mid X_i-u_n).$$

Therefore, the maximum likelihood estimator of v solves the score equations

$$\sum_{i=1}^n\delta(X_i-u_n)\frac{\partial l(v\mid X_i-u_n)}{\partial\gamma}=\sum_{i=1}^n\delta(X_i-u_n)\,s_1(v\mid X_i-u_n)=0,\qquad(2.3)$$
$$\sum_{i=1}^n\delta(X_i-u_n)\frac{\partial l(v\mid X_i-u_n)}{\partial\log\sigma}=\sum_{i=1}^n\delta(X_i-u_n)\,s_2(v\mid X_i-u_n)=0,\qquad(2.4)$$

where

$$s_1(\gamma,\log\sigma\mid x)=\frac{1}{\gamma^2}\left(\log\!\left(1+\frac{\gamma x}{\sigma}\right)-\frac{\gamma x/\sigma}{1+\gamma x/\sigma}\right)-\frac{x/\sigma}{1+\gamma x/\sigma},\qquad s_2(\gamma,\log\sigma\mid x)=-1+\frac{(1+\gamma)\,x/\sigma}{1+\gamma x/\sigma},$$

and for γ = 0 the above score functions take the form

$$s_1(0,\log\sigma\mid x)=\frac{1}{2}\left(\frac{x}{\sigma}\right)^2-\frac{x}{\sigma},\qquad s_2(0,\log\sigma\mid x)=-1+\frac{x}{\sigma}.$$
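The likelihood pieces above translate directly into code; the following small check (a sketch, not taken from the paper) confirms that s1 and s2 are indeed the partial derivatives of l(v ∣ x) in γ and log σ:

```python
import numpy as np

def loglik(gamma, log_sigma, x):
    """GPD log-likelihood l(v | x) for one exceedance x > 0."""
    sigma = np.exp(log_sigma)
    if gamma == 0.0:
        return -x / sigma - log_sigma
    return -((1.0 + gamma) / gamma) * np.log1p(gamma * x / sigma) - log_sigma

def s1(gamma, log_sigma, x):
    """Score in gamma; the gamma = 0 branch is the continuity limit."""
    sigma = np.exp(log_sigma)
    if gamma == 0.0:
        return 0.5 * (x / sigma) ** 2 - x / sigma
    z = (x / sigma) / (1.0 + gamma * x / sigma)
    return (np.log1p(gamma * x / sigma) - gamma * z) / gamma**2 - z

def s2(gamma, log_sigma, x):
    """Score in log sigma (same formula covers gamma = 0)."""
    sigma = np.exp(log_sigma)
    z = (x / sigma) / (1.0 + gamma * x / sigma)
    return -1.0 + (1.0 + gamma) * z

# Sanity check: s1, s2 agree with central-difference derivatives of l.
g, ls, x, h = 0.3, 0.1, 1.7, 1e-6
d1 = (loglik(g + h, ls, x) - loglik(g - h, ls, x)) / (2 * h)
d2 = (loglik(g, ls + h, x) - loglik(g, ls - h, x)) / (2 * h)
assert abs(d1 - s1(g, ls, x)) < 1e-6 and abs(d2 - s2(g, ls, x)) < 1e-6
```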

In this paper, we only consider the regular case, i.e., γ0 > −1/2, as in Davison and Smith (1990) and Drees et al. (2004); for heavy-tailed losses in insurance and finance it is often the case that γ0 > 0. See also Bücher and Segers (2017) for more discussion. For the irregular case, i.e., γ0 ≤ −1/2, we refer to Smith (1985), Zhou (2009, 2010), and Peng and Qi (2009).

Davison and Smith (1990) disregard the randomness of the threshold, while Drees et al. (2004) obtain the asymptotic normality of the MLE for a divergent random threshold (i.e., ᾱ = ᾱ(n) → 0 as n → ∞) under (1.1), which holds under Assumption 1. Here, we present a universal asymptotic normality result under Assumption 1, unifying the cases of a deterministic and a random threshold:

Assumption 2 (Universal threshold statistic). The threshold un = un(X1, …, Xn) is an arbitrary measurable statistic such that un →P Q(1 − ᾱ) for some ᾱ ∈ (0, α0).

Remark 1. Assumption 2 allows a flexible choice of the threshold un, regardless of it being deterministic or random. The practitioners may choose a deterministic threshold based on their prior knowledge, an order statistic based on the observations, or an even more sophisticated quantile estimator. Unifying these thresholds extends the scope of our inference theory, and it is necessary for our extension to time-series data in the next section where the threshold for the sample residuals may depend on the estimator of the ARMA-GARCH parameters.

Normalizing the estimators by the adaptive values θ0(n) = (γ0, log σαn, log αn)^T rather than their limit θ0 = (γ0, log σᾱ, log ᾱ)^T, we have a unified inference procedure for a general threshold statistic un:

Theorem 1 (Universal inference for generalized Pareto parameters). Suppose that Assumption 1 holds with true parameter γ0 > −1/2 and that the threshold un satisfies Assumption 2.

  1. With probability tending to one, there exists a maximum likelihood estimator θ̂n = (γ̂n, log σ̂n, log α̂n)^T solving the score equations (2.2)-(2.4) simultaneously in the local parameter space
$$\bar\Theta_n^\varepsilon=\left\{\theta\in\mathbb{R}^3:\|\theta-\theta_0^{(n)}\|<n^{-1/2+\varepsilon}\right\},\qquad(2.5)$$

    for any ε ∈ (0, min{γ0 + 1/2, 1/2}), where θ0(n) = (γ0, log σαn, log αn)^T denotes the adaptive true values.

  2. Any maximum likelihood estimator sequence from part 1 is asymptotically normal, in the sense that
$$\sqrt{n\bar\alpha}\left(\hat\gamma_n-\gamma_0,\ \frac{\hat\sigma_n}{\sigma_{\alpha_n}}-1,\ \frac{\hat\alpha_n}{\alpha_n}-1\right)^T\xrightarrow{d}N\!\left(0,\begin{bmatrix}I^{-1}&0\\0&1-\bar\alpha\end{bmatrix}\right),$$
    where the inverse Fisher information matrix
$$I^{-1}=\left(E\!\left(\frac{\partial l(v_0\mid Z)}{\partial v}\frac{\partial l(v_0\mid Z)}{\partial v^T}\,\Big|\,Z>0\right)\right)^{-1}=\begin{bmatrix}(1+\gamma_0)^2&-(1+\gamma_0)\\-(1+\gamma_0)&2(1+\gamma_0)\end{bmatrix}$$

    with Z = X − Q(1 − ᾱ).

In practice, it is common to fix a proportion of the data, say ᾱ ∈ (0,1), and use the [nᾱ]-th largest observation un = X_{n−[nᾱ]:n} as the threshold. It is then easy to deduce the following corollary.

Corollary 1. Under the conditions of Theorem 1 with un = X_{n−[nᾱ]:n}, as n → ∞,

$$\sqrt{n\bar\alpha}\begin{pmatrix}\hat\gamma_n-\gamma_0\\\hat\sigma_n/\sigma_{\bar\alpha}-1\\\hat\alpha_n/\bar\alpha-1\end{pmatrix}\xrightarrow{d}N\!\left(0,\begin{bmatrix}(1+\gamma_0)^2&-(1+\gamma_0)&0\\-(1+\gamma_0)&2(1+\gamma_0)+\gamma_0^2(1-\bar\alpha)&\gamma_0(1-\bar\alpha)\\0&\gamma_0(1-\bar\alpha)&1-\bar\alpha\end{bmatrix}\right).$$

Remark 2. Though Assumption 2 requires a fixed ᾱ ∈ (0, α0), our asymptotic variance formula in Theorem 1 is in fact unified for both a finite and a divergent threshold. Specifically, in the supplement we show that the asymptotic results in Theorem 1 remain valid if ᾱ = ᾱn is an intermediate sequence such that ᾱ → 0 and nᾱ → ∞, provided we rewrite Assumption 2 as un/Q(1 − ᾱ) →P 1; the asymptotic variance formula in Theorem 1 then remains valid by simply setting ᾱ to its limit 0. For a vanishing ᾱ (i.e., ᾱ → 0), we allow the true exceeding probability α0 to vanish as well, as long as ᾱ/α0 stays bounded strictly below one. Moreover, it can be seen from the proofs that our results remain true under a relaxed Assumption 1, as long as the approximation error between the exceedance distribution and a generalized Pareto distribution is of smaller order than the estimation error. More specifically, suppose our observations (X1(n), …, Xn(n)) come from a triangular array of i.i.d. random variables with common distribution F(n). Our inference remains valid if the generalized Pareto model is approximately true, that is,

$$\sup_{x\ge 0}\left|\frac{1-F_{u_0}^{(n)}(x)}{1-G_{\gamma,\sigma_{\alpha_0}}(x)}-1\right|=o\!\left((n\alpha_0)^{-1/2}\right),\qquad(2.6)$$

where F_{u0}^{(n)}(x) = (F(n)(x + u0) − F(n)(u0))/α0 denotes the exceedance distribution function, the exceeding probability α0 = 1 − F(n)(u0) can be either fixed or vanishing, and G_{γ,σα0} denotes the generalized Pareto distribution with shape parameter γ and scale parameter σα0. For a vanishing α0, condition (2.6) indeed follows from the higher-order extended regular variation conditions in, e.g., Drees et al. (2004), which are necessary in the extreme value literature for removing the estimation bias with a divergent threshold. In conclusion, our fixed-ᾱ approach is more robust than the existing extreme value approach and covers more practical applications. We leave studies under model misspecification (e.g., when the approximation (2.6) fails) for future work.

Remark 3. As argued by Dombry (2015), there is no guarantee that the global maximum likelihood estimator is unique. Even if a global MLE is attainable, the classical regularity conditions in Cramér (1946) are not fulfilled, and it requires a detailed verification of the local asymptotic normality (LAN) conditions in Bücher and Segers (2017). Also, the global estimation theory in Bücher and Segers (2017) does not apply as our ‘true’ values θ0(n) are a sequence of adaptive values depending on the (random) threshold statistic rather than a fixed point. Therefore, we consider a local maximum likelihood estimator and leave the global estimation theory for future research. Note that this challenge remains for a divergent threshold, as the asymptotic normality results in, e.g., Drees et al. (2004) are not guaranteed to hold for an arbitrary global estimator sequence; see, e.g., Zhou (2009) and Zhou (2010) for comments.

Plugging the estimators (γ̂n, σ̂n, α̂n) from Theorem 1 into the VaR formula (2.1), the MLE of VaR_X(1 − p) is given by

$$\widehat{\mathrm{VaR}}_X(1-p)=u_n+\frac{\hat\sigma_n}{\hat\gamma_n}\left(\left(\frac{\hat\alpha_n}{p}\right)^{\hat\gamma_n}-1\right),\qquad(2.7)$$

which takes the form un + σ̂n log(α̂n/p) if γ̂n = 0. The asymptotic normality of the quantile estimator (2.7) then follows directly from the continuous mapping theorem, since we can expand the true quantile in (2.1) similarly as

$$\mathrm{VaR}_X(1-p)=u_n+\frac{\sigma_{\alpha_n}}{\gamma_0}\left(\left(\frac{\alpha_n}{p}\right)^{\gamma_0}-1\right),\qquad(2.8)$$

even with a random adaptive exceeding probability αn, conditional on the event αn < α0, which occurs with probability tending to one. Again, our quantile inference is asymptotically correct for a universal threshold statistic.
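The point estimator (2.7) is easy to reproduce end-to-end. The sketch below simulates exact-GPD losses and uses a standard profile-likelihood grid search as a stand-in for solving the score equations (2.3)-(2.4); this numerical device is an illustrative assumption, not the paper's prescribed procedure:

```python
import numpy as np

def fit_gpd(exc):
    """GPD MLE on exceedances via the profile likelihood: with b = gamma/sigma,
    the inner maximizer is gamma(b) = mean(log(1 + b*exc)), sigma = gamma/b."""
    exc = np.asarray(exc, float)
    b_grid = np.geomspace(1e-3, 1e3, 1000) / exc.mean()
    gam = np.log1p(b_grid[:, None] * exc).mean(axis=1)
    prof = -(np.log(gam / b_grid) + gam + 1.0)   # profile log-likelihood / n
    k = np.argmax(prof)
    return gam[k], gam[k] / b_grid[k]            # (gamma_hat, sigma_hat)

rng = np.random.default_rng(0)
X = (rng.uniform(size=5000) ** -0.2 - 1.0) / 0.2   # exact GPD(0.2, 1) losses

alpha_bar, p, n = 0.10, 0.01, 5000
u_n = np.sort(X)[n - int(n * alpha_bar)]           # order-statistic threshold
exc = X[X > u_n] - u_n                             # exceedances over u_n
alpha_hat = exc.size / n                           # sample exceeding probability
gamma_hat, sigma_hat = fit_gpd(exc)
var_hat = u_n + sigma_hat / gamma_hat * ((alpha_hat / p) ** gamma_hat - 1.0)
# The true VaR(0.99) for GPD(0.2, 1) is (0.01**-0.2 - 1)/0.2, about 7.56.
```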

Theorem 2 (Universal inference for high quantile). Under the conditions of Theorem 1, for every p ∈ (0, α0),

$$\frac{\sqrt{n\bar\alpha}}{\sigma_p}\left(\widehat{\mathrm{VaR}}_X(1-p)-\mathrm{VaR}_X(1-p)\right)\xrightarrow{d}N\!\left(0,\ q(\bar\alpha/p)^T I^{-1} q(\bar\alpha/p)+1-\bar\alpha\right),$$

where σp = σᾱ(ᾱ/p)^{γ0} and, for γ0 ≠ 0, the vector function

$$q(t)=\left(\int_1^t\left(\frac{s}{t}\right)^{\gamma_0}\frac{\log s}{s}\,ds,\ \frac{1-t^{-\gamma_0}}{\gamma_0}\right)^T,\qquad t>0,$$

which should be interpreted by continuity as (½(log t)², log t)^T when γ0 = 0.
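For implementation it helps to note (a routine calculation, spelled out here as a sketch) that for γ0 ≠ 0 the integral in the first component of q(t) has the closed form log t/γ0 − (1 − t^{−γ0})/γ0², which is consistent with the stated γ0 = 0 limit (log t)²/2. A quick quadrature check:

```python
import numpy as np

def q_vec(t, gamma):
    """q(t) from Theorem 2 in closed form (gamma != 0):
    q1 = log(t)/gamma - (1 - t**-gamma)/gamma**2, q2 = (1 - t**-gamma)/gamma."""
    return np.array([np.log(t) / gamma - (1.0 - t ** -gamma) / gamma**2,
                     (1.0 - t ** -gamma) / gamma])

# Verify the closed form of the first component against the integral
# representation int_1^t (s/t)**gamma * log(s)/s ds by trapezoidal quadrature.
t, gamma = 10.0, 0.25
s = np.linspace(1.0, t, 200001)
y = (s / t) ** gamma * np.log(s) / s
integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(s))
assert abs(integral - q_vec(t, gamma)[0]) < 1e-6
```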

Remark 4. Ignoring all common factors, one may search for the best threshold as un = X_{n−[λnp]:n}, with λ minimizing the asymptotic variance

$$\frac{1}{\lambda}\left(\hat q(\lambda)^T\hat I^{-1}\hat q(\lambda)+1\right),$$

where Î^{−1} and q̂ may be constructed using a preliminary estimate γ̂ of the extreme value index γ0 as given below. If necessary, one may update γ̂ with the new choice of λ until convergence. On the other hand, it is important to develop a distribution-free goodness-of-fit test for fitting a generalized Pareto distribution to exceedances over a threshold. It is challenging to extend the existing parametric testing methods in, e.g., Koul and Ling (2006) to our semi-parametric models, which will be our future research.

It is straightforward to quantify the uncertainty of the VaR estimator based on the normal approximation. More specifically, we estimate ᾱ by α̂n (if ᾱ is unknown), the scale parameter σp by

$$\hat\sigma_p=\hat\sigma_n\left(\frac{\hat\alpha_n}{p}\right)^{\hat\gamma_n},$$

and the limiting variance by

$$\hat\tau_n^2=\hat q(\hat\alpha_n/p)^T\,\hat I^{-1}\,\hat q(\hat\alpha_n/p)+1-\hat\alpha_n,$$
where q̂ denotes q evaluated with γ̂n in place of γ0,

with

$$\hat I^{-1}=\begin{bmatrix}(1+\hat\gamma_n)^2&-(1+\hat\gamma_n)\\-(1+\hat\gamma_n)&2(1+\hat\gamma_n)\end{bmatrix}.$$

Hence, a normal-approximation confidence interval for VaR_X(1 − p) with level a is

$$I_{NA}(a)=\left[\widehat{\mathrm{VaR}}_X(1-p)-\frac{z_{(1+a)/2}}{\sqrt{n\hat\alpha_n}}\hat\sigma_n\left(\frac{\hat\alpha_n}{p}\right)^{\hat\gamma_n}\hat\tau_n,\ \widehat{\mathrm{VaR}}_X(1-p)+\frac{z_{(1+a)/2}}{\sqrt{n\hat\alpha_n}}\hat\sigma_n\left(\frac{\hat\alpha_n}{p}\right)^{\hat\gamma_n}\hat\tau_n\right],$$

where z_{(1+a)/2} is the (1 + a)/2-quantile of the standard normal distribution. Unfortunately, our simulation study below shows that this interval has poor coverage in small samples, which calls for more efficient methods.
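Putting the pieces together, the interval I_NA(a) can be computed as follows (a sketch for γ̂n ≠ 0; the numeric inputs are illustrative placeholders rather than estimates from the paper):

```python
import numpy as np
from statistics import NormalDist

def var_ci_normal(var_hat, gamma_hat, sigma_hat, alpha_hat, n, p, a=0.95):
    """Normal-approximation interval I_NA(a) for VaR_X(1-p)."""
    g, t = gamma_hat, alpha_hat / p
    # q(alpha_hat/p) evaluated at the estimated shape (closed form)
    q = np.array([np.log(t) / g - (1.0 - t ** -g) / g**2,
                  (1.0 - t ** -g) / g])
    I_inv = np.array([[(1 + g) ** 2, -(1 + g)],
                      [-(1 + g), 2 * (1 + g)]])
    tau2 = q @ I_inv @ q + 1.0 - alpha_hat
    z = NormalDist().inv_cdf((1.0 + a) / 2.0)        # standard normal quantile
    half = z * sigma_hat * t ** g * np.sqrt(tau2 / (n * alpha_hat))
    return var_hat - half, var_hat + half

# Placeholder inputs: VaR estimate 7.5, gamma 0.2, sigma 1.6, alpha 0.10.
lo, hi = var_ci_normal(7.5, 0.2, 1.6, 0.10, 5000, 0.01)
```

Note that Î^{−1} is positive definite for γ̂n > −1/2, so τ̂n² is always positive.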

To improve the finite-sample coverage, we propose a resampling method, the so-called random weighted bootstrap procedure, which is much less computationally intensive than the naive bootstrap when risk is estimated under a time series model (see the next section). Zhu (2016, 2019) recently applied this method to conduct a portmanteau test and to infer autoregressive models.

  • Step B1) Draw a random sample of size n, say ξ1(b), ⋯, ξn(b), from a distribution with mean one and variance one, such as the standard exponential distribution.

  • Step B2) Choose a threshold statistic un(b), possibly dependent on ξ1(b), …, ξn(b). Solve the following random weighted score equations for ᾱ, γ, and log σ:
$$\sum_{i=1}^n\xi_i^{(b)}\left(\delta(X_i-u_n^{(b)})-\bar\alpha\right)=0,\qquad(2.9)$$
$$\sum_{i=1}^n\xi_i^{(b)}\delta(X_i-u_n^{(b)})\,s_1(v\mid X_i-u_n^{(b)})=0,\qquad(2.10)$$
$$\sum_{i=1}^n\xi_i^{(b)}\delta(X_i-u_n^{(b)})\,s_2(v\mid X_i-u_n^{(b)})=0.\qquad(2.11)$$
    Denoting these estimators by α̂n(b), γ̂n(b), and σ̂n(b), we set
$$\widehat{\mathrm{VaR}}_X^{(b)}(1-p)=u_n^{(b)}+\frac{\hat\sigma_n^{(b)}}{\hat\gamma_n^{(b)}}\left(\left(\frac{\hat\alpha_n^{(b)}}{p}\right)^{\hat\gamma_n^{(b)}}-1\right).$$
  • Step B3) Repeat the above two steps B times to obtain {VaR̂_X^{(b)}(1 − p)}_{b=1}^B. Let D̄_{1:B} ≤ ⋯ ≤ D̄_{B:B} denote the order statistics of
$$\log\widehat{\mathrm{VaR}}_X^{(b)}(1-p)-\log\widehat{\mathrm{VaR}}_X(1-p),\qquad b=1,\dots,B,$$
    and let D̄_{(1)} ≤ ⋯ ≤ D̄_{(B)} denote the order statistics of the absolute differences
$$\left|\log\widehat{\mathrm{VaR}}_X^{(b)}(1-p)-\log\widehat{\mathrm{VaR}}_X(1-p)\right|,\qquad b=1,\dots,B.$$
    Hence, the confidence intervals with level a for log(VaR_X(1 − p)) are
$$I_{RWB,1}(a)=\left[\log\widehat{\mathrm{VaR}}_X(1-p)-\bar D_{[B(1+a)/2]:B},\ \log\widehat{\mathrm{VaR}}_X(1-p)-\bar D_{[B(1-a)/2]:B}\right]$$
    and
$$I_{RWB,2}(a)=\left[\log\widehat{\mathrm{VaR}}_X(1-p)-\bar D_{([Ba])},\ \log\widehat{\mathrm{VaR}}_X(1-p)+\bar D_{([Ba])}\right].$$
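Steps B1)-B3) can be sketched as follows for independent data. Two illustrative simplifications are assumed here, not prescribed by the paper: the bootstrap threshold is kept at un(b) = un (which satisfies un(b) = un + oP(1)), and the weighted score equations (2.10)-(2.11) are solved by a profile-likelihood grid. The interval returned is I_{RWB,2} mapped back from the log scale:

```python
import numpy as np

def weighted_gpd_fit(exc, w):
    """Maximize the xi-weighted GPD log-likelihood, eqs. (2.10)-(2.11),
    by a profile grid over b = gamma/sigma."""
    b_grid = np.geomspace(1e-3, 1e3, 400) / np.average(exc, weights=w)
    gam = np.log1p(b_grid[:, None] * exc).dot(w) / w.sum()
    prof = -(np.log(gam / b_grid) + gam + 1.0)     # per-unit-weight profile
    k = np.argmax(prof)
    return gam[k], gam[k] / b_grid[k]

def rwb_var_interval(X, alpha_bar, p, B=200, a=0.95, seed=None):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    n = X.size
    u = np.sort(X)[n - int(n * alpha_bar)]          # threshold X_{n-[n*abar]:n}
    exc = X[X > u] - u
    g, s = weighted_gpd_fit(exc, np.ones(exc.size))  # unweighted MLE
    ah = exc.size / n
    var_hat = u + s / g * ((ah / p) ** g - 1.0)
    D = np.empty(B)
    for b in range(B):
        xi = rng.exponential(size=n)                # Step B1: Exp(1) weights
        ab = xi[X > u].sum() / xi.sum()             # Step B2, eq. (2.9)
        gb, sb = weighted_gpd_fit(exc, xi[X > u])   # eqs. (2.10)-(2.11)
        var_b = u + sb / gb * ((ab / p) ** gb - 1.0)
        D[b] = abs(np.log(var_b) - np.log(var_hat))  # Step B3: |log difference|
    d = np.sort(D)[int(B * a) - 1]                  # the [Ba]-th order statistic
    return np.exp(np.log(var_hat) - d), np.exp(np.log(var_hat) + d)

rng = np.random.default_rng(1)
X = (rng.uniform(size=2000) ** -0.2 - 1.0) / 0.2    # GPD(0.2, 1) losses
lo, hi = rwb_var_interval(X, alpha_bar=0.10, p=0.01, B=200, a=0.95, seed=2)
```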

The following theorem establishes the validity of our random weighted bootstrap method.

Theorem 3 (Random weighted bootstrap). Suppose the conditions of Theorem 1 hold. Consider an arbitrary bootstrap threshold statistic un(b) = un + oP(1) and let αn(b) = 1 − F(un(b)).

  1. With probability tending to one, there exists a random weighted maximum likelihood estimator θ̂n(b) = (γ̂n(b), log σ̂n(b), log α̂n(b))^T solving the score equations (2.9)-(2.11) simultaneously in the local parameter space
$$\Theta_\varepsilon^{(b)}=\left\{\theta\in\mathbb{R}^3:\|\theta-\theta_0^{(b)}\|<n^{-1/2+\varepsilon}\right\},\qquad(2.12)$$

    for any ε ∈ (0, min{γ0 + 1/2, 1/2}), where θ0(b) = (γ0, log σ_{αn(b)}, log αn(b))^T denotes the adaptive true values.

  2. For each probability level a ∈ (0,1),
$$P\!\left(\left|\widehat{\mathrm{VaR}}_X(1-p)-\mathrm{VaR}_X(1-p)\right|\le c_n(a)\right)\to 1-a,$$
    where
$$c_n(a)=\inf\left\{x:P\!\left(\left|\widehat{\mathrm{VaR}}_X^{(b)}(1-p)-\widehat{\mathrm{VaR}}_X(1-p)\right|\le x\,\Big|\,X_1,\dots,X_n\right)>1-a\right\}.$$

    The result remains true if VaR_X(1 − p), VaR̂_X(1 − p), and VaR̂_X^{(b)}(1 − p) are replaced by their logarithms, respectively, provided VaR_X(1 − p) > 0.

Remark 5. The random weighted bootstrap intervals for the extreme value index γ0 are also valid. For each probability level a ∈ (0,1), P(|γ̂n − γ0| ≤ c_{n,γ}(a)) → 1 − a, where

$$c_{n,\gamma}(a)=\inf\left\{x:P\!\left(\left|\hat\gamma_n^{(b)}-\hat\gamma_n\right|\le x\,\Big|\,X_1,\dots,X_n\right)>1-a\right\}.$$

The result remains true if we substitute γ0, γ̂n, and γ̂n(b) by their logarithms, respectively, when γ0 > 0. The random weighted bootstrap intervals for the adaptive scale parameter σαn and the adaptive exceeding probability αn are asymptotically correct, provided the difference between the bootstrap threshold and the original threshold is asymptotically negligible in the sense that un(b) = un + oP((nᾱ)^{−1/2}).

Remark 6 (Naive bootstrap). In the supplement, we show that Theorem 3 and Remark 5 remain true if replacing the random weighted bootstrap statistics with the naive bootstrap statistics. In simulations, we observe comparable performance between these two methods for independent data.

3. Methodologies and Asymptotic Results for ARMA-GARCH Models

Since the seminal work of Engle (1982) and Bollerslev (1986), it has become a common practice to model a financial time series by an ARMA-GARCH model given by

$$\begin{cases}Y_t=\mu+\sum_{i=1}^{q_1}\phi_i Y_{t-i}+\sum_{j=1}^{q_2}\psi_j\varepsilon_{t-j}+\varepsilon_t,\\[1ex]\varepsilon_t=\sqrt{\bar h_t}\,\eta_t,\qquad\bar h_t=\bar\omega+\sum_{i=1}^{r}\bar a_i\varepsilon_{t-i}^2+\sum_{j=1}^{s}\bar b_j\bar h_{t-j},\end{cases}\qquad(3.1)$$

where ω̄ > 0, āi ≥ 0, b̄j ≥ 0, and {ηt} is a sequence of i.i.d. random variables with zero mean and unit variance. In this case, the so-called one-step-ahead conditional VaR is more useful in forecasting risk; it is defined as the conditional quantile of Y_{n+1} given the past information F_n = σ(…, Y_{n−1}, Y_n) up to time n. Hence, the one-step-ahead conditional VaR is

$$\mathrm{VaR}_{Y,n}(1-p)=\mu+\sum_{i=1}^{q_1}\phi_i Y_{n+1-i}+\sum_{j=1}^{q_2}\psi_j\varepsilon_{n+1-j}+\sqrt{\bar h_{n+1}}\,\mathrm{VaR}_\eta(1-p),\qquad(3.2)$$

and note that h̄_{n+1} is F_n-measurable.

We remark that McNeil and Frey (2000) consider the model above, and Martins-Filho et al. (2018) study the nonparametric regression, which covers AR-ARCH models but not ARMA-GARCH models. Both papers consider only the case of a vanishing risk level, i.e., p = p(n) → 0 as n → ∞. In this case, the estimation of the ARMA-GARCH model in McNeil and Frey (2000) and the kernel smoothing estimation of the conditional mean and conditional standard deviation in Martins-Filho et al. (2018) play no role in the asymptotic variance of the VaR estimation. Unlike these two papers, we aim to allow both fixed and vanishing risk levels and to account for the uncertainties in fitting both the ARMA-GARCH model, under weaker moment conditions, and the GPD to the residuals.

As mentioned above, regulators often set 1 − p close to one, making it useful to model ηt over a high threshold by a GPD parametrically. To infer the conditional VaR, we need to estimate the unknown parameters in (3.1) and (2.1). An obvious inference method for model (3.1) is the so-called quasi-maximum likelihood estimation. The asymptotic normality of the quasi-Gaussian maximum likelihood estimator is available in Francq and Zakoïan (2004), which requires finite fourth moments of both εt and ηt. However, in practice, Σ_{i=1}^{r} āi + Σ_{j=1}^{s} b̄j is often close to one, making it problematic to assume E εt⁴ < ∞. When E ηt⁴ < ∞, Ling (2007) proposes a self-weighted quasi-maximum likelihood estimator that has a normal limiting distribution and allows E εt⁴ = ∞. However, the asymptotic normality of these estimators may be lost when E ηt⁴ = ∞; see, e.g., Hall and Yao (2003). To further allow both E ηt⁴ = ∞ and E εt⁴ = ∞, Zhu and Ling (2011) propose a self-weighted exponential likelihood estimator, which has a normal limiting distribution but requires E|ηt| = 1 and a zero median of ηt. Changing E ηt² = 1 to E|ηt| = 1 requires only a scale transformation of ht, which does not affect the inference of the conditional VaR; however, changing a zero mean of ηt to a zero median involves a shift transformation, which makes the inference of the conditional VaR infeasible.

Here, we instead propose a three-step inference of the conditional VaR (3.2) under model (3.1), which allows both E εt⁴ = ∞ and E ηt⁴ = ∞. This is important in estimating VaR_{Y,n}(1 − p) when p is treated as a fixed number rather than a number converging to zero as n → ∞.

We assume that E|ηt| = d > 0 is unknown, and put Xt = ηt/d, ht = d²h̄t, ω = ω̄d², ai = āid², and bj = b̄j. Then model (3.1) is equivalent to

$$\begin{cases}Y_t=\mu+\sum_{i=1}^{q_1}\phi_i Y_{t-i}+\sum_{j=1}^{q_2}\psi_j\varepsilon_{t-j}+\varepsilon_t,\\[1ex]\varepsilon_t=\sqrt{h_t}\,X_t,\qquad h_t=\omega+\sum_{i=1}^{r}a_i\varepsilon_{t-i}^2+\sum_{j=1}^{s}b_j h_{t-j},\end{cases}\qquad(3.3)$$

where E|Xt| = E|ηt|/d = 1. The ARMA coefficients remain the same for Yt, and we can rewrite (3.2) as

$$\mathrm{VaR}_{Y,n}(1-p)=\mu+\sum_{i=1}^{q_1}\phi_i Y_{n+1-i}+\sum_{j=1}^{q_2}\psi_j\varepsilon_{n+1-j}+\sqrt{h_{n+1}}\,\mathrm{VaR}_X(1-p).$$

Model (3.3) has been studied in Zhu and Ling (2011), but here we maintain the zero mean condition on Xt as required by the original model (3.1).
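Given fitted parameters, the one-step-ahead forecast combines the ARMA mean, the GARCH recursion, and the residual quantile. A minimal AR(1)-GARCH(1,1) special case of the formula above (all numbers are hypothetical placeholders, not estimates from the paper):

```python
import numpy as np

def cond_var_forecast(y_last, eps_last, h_last, mu, phi1, omega, a1, b1, var_x):
    """One-step-ahead conditional VaR for an AR(1)-GARCH(1,1) special case of
    (3.3): mu + phi1 * Y_n + sqrt(h_{n+1}) * VaR_X(1-p)."""
    h_next = omega + a1 * eps_last ** 2 + b1 * h_last   # GARCH(1,1) recursion
    return mu + phi1 * y_last + np.sqrt(h_next) * var_x

# Hypothetical values: mu=0.01, phi1=0.3, (omega, a1, b1)=(0.05, 0.1, 0.85),
# last observation 2.0, last innovation 0.5, last volatility 1.0, VaR_X=3.2.
v = cond_var_forecast(2.0, 0.5, 1.0, 0.01, 0.3, 0.05, 0.1, 0.85, 3.2)
```

Here h_{n+1} is computable from the information up to time n, mirroring the F_n-measurability of h̄_{n+1} noted after (3.2).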

Let ψ = (φ^T, φh^T)^T denote the parameters in (3.3), with φ = (μ, φ1, …, φ_{q1}, ψ1, …, ψ_{q2})^T and φh = (ω, a1, …, ar, b1, …, bs)^T. Before moving on to the quantile inference, we first develop a two-step estimator of ψ that is asymptotically normal without requiring any fourth-moment condition.

Given the observations Y1, …, Yn and the initial values Ȳ0 = {Yt : t ≤ 0} generated by model (3.1), we can write the parametric model (3.3) as

$$\varepsilon_t(\phi)=Y_t-\mu-\sum_{i=1}^{q_1}\phi_i Y_{t-i}-\sum_{j=1}^{q_2}\psi_j\varepsilon_{t-j}(\phi),\qquad h_t(\psi)=\omega+\sum_{i=1}^{r}a_i\varepsilon_{t-i}^2(\phi)+\sum_{j=1}^{s}b_j h_{t-j}(\psi),\qquad X_t(\psi)=\frac{\varepsilon_t(\phi)}{\sqrt{h_t(\psi)}}.$$

Obviously, εt = εt(φ0), ht = ht(ψ0), and Xt = Xt(ψ0), where φ0 and ψ0 = (φ0^T, φ_{h0}^T)^T denote the true values of the parameters. In practice, however, we do not observe the initial values Ȳ0 = {Yt : t ≤ 0}, which makes the calculation of εt(φ), ht(ψ), and Xt(ψ) infeasible. To make the estimation feasible, in what follows we replace Ȳ0 by zeros, as in Ling (2007) and Zhu and Ling (2011), and define the feasible versions ε̃t(φ), h̃t(ψ), and X̃t(ψ) based on these initial values.

First, we estimate φ by the self-weighted least-squares estimator

$$\hat\phi=\arg\min_\phi\sum_{t=1}^n\tilde w_t^2\,\tilde\varepsilon_t^2(\phi),\qquad(3.4)$$

where {w̃t} are proper weights designed to reduce the moment effect of {ht}, and ε̃t(φ) is the feasible version defined above. The weights w̃t downweight Σ_{i=1}^{∞} ρ^i |Y_{t−i}| for some ρ ∈ (0,1) so as to control the asymptotic order of the gradient of the least-squares error functions; the constant ρ depends on the true ARMA-GARCH model. As in He et al. (forthcoming), we use the feasible weight

$$\tilde w_t=\left\{\max\left(1,\ \sum_{i=1}^{t-1}e^{-\log^2(i+1)}|Y_{t-i}|\right)\right\}^{-1},$$

which is a truncated version of the oracle weight

$$w_t=\left\{\max\left(1,\ \sum_{i=1}^{\infty}e^{-\log^2(i+1)}|Y_{t-i}|\right)\right\}^{-1}.$$
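The truncated feasible weight admits a direct implementation (a sketch; the coefficients e^{−log²(i+1)} are summable, so the oracle weight is well defined):

```python
import numpy as np

def feasible_weights(Y):
    """Feasible self-weights: w_t = 1/max(1, sum_{i=1}^{t-1}
    exp(-log^2(i+1)) * |Y_{t-i}|), downweighting time points whose
    recent past observations are large."""
    Y = np.asarray(Y, float)
    n = len(Y)
    w = np.ones(n)
    for j in range(1, n):                       # j is t-1 in 0-based indexing
        i = np.arange(1, j + 1)                 # lags i = 1, ..., t-1
        coef = np.exp(-np.log(i + 1.0) ** 2)
        w[j] = 1.0 / max(1.0, np.dot(coef, np.abs(Y[j - i])))
    return w

w = feasible_weights([0.0, 10.0, 0.0])
# w[0] = w[1] = 1 (empty or zero past), while w[2] < 1 because |Y_2| = 10.
```

By construction every weight lies in (0, 1], and a burst of large past observations drives the corresponding weight toward zero.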

Second, we define the self-weighted estimator φ̂h of φh as the minimizer of the self-weighted negative quasi-exponential log-likelihood

$$\sum_{t=1}^n\tilde w_t^4\,\tilde l_t(\phi_h\mid\hat\phi)\qquad\text{with}\qquad\tilde l_t(\phi_h\mid\phi)=\log\sqrt{\tilde h_t(\phi,\phi_h)}+\frac{|\tilde\varepsilon_t(\phi)|}{\sqrt{\tilde h_t(\phi,\phi_h)}},\qquad(3.5)$$

where φ̂ and w̃t are the least-squares estimator and the self-weights from the first step, respectively, and h̃t(φ, φh) is the feasible version defined above.

To establish the joint asymptotic normality of ψ̂ = (φ̂^T, φ̂h^T)^T, we need the following additional regularity conditions.

A1. Let Θψ = Θφ × Θφh ⊂ R^{q1+q2+1} × [0, ∞)^{r+s+1} denote the parameter space for ψ = (φ^T, φh^T)^T. Assume that Θψ is compact and that the true value ψ0 is an interior point.

A2. For each φ ∈ Θφ, 1 − Σ_{i=1}^{q1} φi z^i ≠ 0 and 1 + Σ_{j=1}^{q2} ψj z^j ≠ 0 when |z| ≤ 1, and the equations 1 − Σ_{i=1}^{q1} φi z^i = 0 and 1 + Σ_{j=1}^{q2} ψj z^j = 0 have no common root, with φ_{q1} ≠ 0 and ψ_{q2} ≠ 0.

A3. For each φh ∈ Θφh, the equations Σ_{i=1}^{r} ai z^i = 0 and 1 − Σ_{j=1}^{s} bj z^j = 0 have no common root. Further, Σ_{i=1}^{r} ai ≠ 0, ar + bs ≠ 0, and Σ_{j=1}^{s} bj < 1.

A4. E εt² < ∞.

A5. Xt = ηt/E|ηt|, and {ηt}_{t=1}^{n} is a sequence of independent and identically distributed random variables with mean zero, variance one, and a continuous density function f such that f(0) > 0 and sup_{x∈R} f(x) < ∞.

Conditions A1-A3 are standard stationarity, invertibility, and identification conditions for the ARMA-GARCH model (3.3), as in, e.g., Ling (2007). Condition A4 is equivalent to requiring that there is no unit root in the underlying GARCH process (3.1), that is,

$$\sum_{i=1}^{r}\bar a_i+\sum_{j=1}^{s}\bar b_j<1.$$

By carefully checking our proofs, it can be seen that this condition may be further relaxed to a first-moment condition, that is, E|εt| < ∞. This means that our results extend to the IGARCH model with Σ_{i=1}^{r} āi + Σ_{j=1}^{s} b̄j = 1 under suitable conditions; see, e.g., part (iii) of Theorem 2.1 in Ling (2007). The second-moment condition simplifies our subsequent inference for the generalized Pareto model, and we therefore keep it throughout for simplicity. Condition A5 is similar to Assumption 2.6 in Zhu and Ling (2011), but we maintain the necessary condition that Xt has a zero mean rather than a zero median.

Theorem 4. Assume conditions A1–A5 hold.

  1. The self-weighted estimator is consistent, that is, ψ̂ = (φ̂^T, φ̂h^T)^T →P ψ0.

  2. The self-weighted estimator is asymptotically normal, in such a way that
    $$\sqrt{n}\,(\hat\psi-\psi_0)\stackrel{d}{\longrightarrow}N\!\big(0,\ \Sigma^{-1}\Omega(\Sigma^{-1})^T\big),$$
    with
    $$\Sigma=\begin{bmatrix}\Sigma_1&0\\ \Sigma_{21}&\Sigma_{22}\end{bmatrix},\qquad \Omega=\begin{bmatrix}\Omega_{11}&\Omega_{21}^T\\ \Omega_{21}&\Omega_{22}\end{bmatrix},$$
    where the sub-matrices are
    $$\Sigma_1=E\!\left(w_t^2\frac{\partial\varepsilon_t}{\partial\phi}\frac{\partial\varepsilon_t}{\partial\phi^T}\right),\qquad \Sigma_{21}=\frac18E\!\left(\frac{w_t^4}{h_t^2}\frac{\partial h_t}{\partial\phi_h}\frac{\partial h_t}{\partial\phi^T}\right),\qquad \Sigma_{22}=\frac18E\!\left(\frac{w_t^4}{h_t^2}\frac{\partial h_t}{\partial\phi_h}\frac{\partial h_t}{\partial\phi_h^T}\right),$$
    $$\Omega_{11}=EX_t^2\,E\!\left(w_t^4 h_t\frac{\partial\varepsilon_t}{\partial\phi}\frac{\partial\varepsilon_t}{\partial\phi^T}\right),\qquad \Omega_{22}=\frac{EX_t^4-1}{4}\,E\!\left[\frac{w_t^8}{h_t^2}\frac{\partial h_t}{\partial\phi_h}\frac{\partial h_t}{\partial\phi_h^T}\right],$$
    $$\text{and}\qquad \Omega_{21}=E\!\big[X_t^2\big(1(X_t>0)-1(X_t<0)\big)\big]\,E\!\left[\frac{w_t^6}{2h_t}\frac{\partial h_t}{\partial\phi_h}\frac{\partial\varepsilon_t}{\partial\phi^T}\right].$$

Next, we estimate the high quantile of Xt under the generalized Pareto model with the additional assumption:

A6. $X_t\equiv\eta_t-E\eta_t$ satisfies Assumption 1 with $\gamma_0\in(0,\tfrac12)$ and scale parameter $\sigma_{\alpha_0}>0$.

Note that $\gamma_0<1/2$ above ensures that $E\eta_t^2<\infty$. Let $\hat X_{1:n}\le\cdots\le\hat X_{n:n}$ denote the order statistics of the residuals $\{\hat X_t\equiv\tilde X_t(\hat\psi):t=1,\ldots,n\}$, where $\tilde X_t(\cdot)$ is the feasible parametric model as defined above. We then choose a threshold statistic such as

$$u_n=\hat X_{n-[n\bar\alpha]:n}, \tag{3.6}$$

corresponding to an adaptive tail probability level αn = 1 − F(un), where F denotes the distribution function of Xt. Under the conditions of Theorem 4, we show that the threshold estimator (3.6) is consistent, that is,

$$u_n\stackrel{P}{\longrightarrow}Q(1-\bar\alpha),\quad\text{and equivalently}\quad \alpha_n\stackrel{P}{\longrightarrow}\bar\alpha, \tag{3.7}$$

where $Q(\cdot)=F^{-1}(\cdot)$ denotes the quantile function of $X_t$. In general, our theory allows an arbitrary threshold statistic $u_n$ that satisfies our Assumption 2 above. With a general threshold statistic, we estimate the adaptive exceedance probability $\alpha_n$, the shape parameter $\gamma_0$, and the scale parameter $\sigma_{\alpha_n}$ jointly by solving equations (2.2)-(2.4) with the $X_i$ therein replaced by the residuals $\hat X_t$. Denote the estimators by $\hat\alpha$, $\hat\gamma$, and $\hat\sigma$, respectively, which give the quantile estimator

$$\widehat{\mathrm{VaR}}_X(1-p)=u_n+\frac{\hat\sigma}{\hat\gamma}\left(\left(\frac{\hat\alpha}{p}\right)^{\hat\gamma}-1\right).$$

Thus the estimator of $\mathrm{VaR}_{Y,n}(1-p)$ is given by

$$\widehat{\mathrm{VaR}}_{Y,n}(1-p)=\hat\mu+\sum_{i=1}^{q_1}\hat\phi_i Y_{n+1-i}+\sum_{j=1}^{q_2}\hat\psi_j\tilde\varepsilon_{n+1-j}(\hat\phi)+\sqrt{\tilde h_{n+1}(\hat\phi,\hat\phi_h)}\;\widehat{\mathrm{VaR}}_X(1-p). \tag{3.8}$$

Note that $u_n=u_n(\hat\psi)$, $\hat\gamma=\hat\gamma(\hat\psi)$, and $\hat\sigma=\hat\sigma(\hat\psi)$ all depend on the self-weighted estimator $\hat\psi$, whose effect does not fade away for any fixed $p\in(0,1)$.
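The peaks-over-threshold quantile estimator above can be sketched for independent data. This is a minimal illustration, not the paper's implementation: the function name `gpd_var` is ours, and a crude profile grid search stands in for solving the score equations (2.2)-(2.4).

```python
import math
import random

def gpd_var(x, alpha_bar, p):
    """Peaks-over-threshold VaR sketch: threshold at the empirical
    (1 - alpha_bar) quantile, GPD likelihood fit to the excesses, then
    the quantile formula VaR(1-p) = u + sigma/gamma*((alpha/p)**gamma - 1)."""
    xs = sorted(x)
    n = len(xs)
    u = xs[n - int(n * alpha_bar) - 1]        # threshold u_n
    exc = [xi - u for xi in xs if xi > u]     # excesses over u_n
    alpha_hat = len(exc) / n                  # estimated exceedance probability
    # crude grid search over (gamma, sigma) in place of the ML score equations
    best_ll, gamma, sigma = -float("inf"), None, None
    for g in (i / 100 for i in range(5, 100)):
        for s in (j / 50 for j in range(5, 200)):
            ll = sum(-math.log(s) - (1 / g + 1) * math.log(1 + g * e / s)
                     for e in exc)
            if ll > best_ll:
                best_ll, gamma, sigma = ll, g, s
    return u + sigma / gamma * ((alpha_hat / p) ** gamma - 1)
```

For data drawn exactly from a GPD with $\gamma=1/3$, $\sigma=1$, the estimate should land near the true quantile $\sigma/\gamma\,(p^{-\gamma}-1)\approx 10.9$ at $p=0.01$.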

Theorem 5. Assume conditions A1-A6 hold.

  1. With probability tending to one, there exists a maximum likelihood estimator $\hat\theta=(\hat\gamma,\log\hat\sigma,\log\hat\alpha)^T$ solving the score equations (2.2)-(2.4) simultaneously for $\{\hat X_t\}$ in the local parameter space
    $$\bar\Theta_n^{\varepsilon}=\left\{\theta\in\mathbb R^3:\ \|\theta-\theta_0^{(n)}\|<n^{-\frac12+\varepsilon}\right\},$$

    for any $\varepsilon\in(0,\min\{\gamma_0+1/2,\,1/2\})$, where $\theta_0^{(n)}=(\gamma_0,\log\sigma_{\alpha_n},\log\alpha_n)^T$ and $\alpha_n=1-F(u_n)$ denote the adaptive true values.

  2. Any maximum likelihood estimator sequence from part (i) is jointly asymptotically normal, in such a way that
    $$\sqrt{n\bar\alpha}\begin{bmatrix}\hat\psi-\psi_0\\ \hat\theta-\theta_0^{(n)}\end{bmatrix}\stackrel{d}{\longrightarrow}N\!\big(0,\ \tilde\Sigma^{-1}\tilde\Omega(\tilde\Sigma^{-1})^T\big),$$
    where
    $$\tilde\Sigma=\begin{bmatrix}\Sigma&0&0\\ \Gamma_1^T\Sigma&I&0\\ \frac{1}{1-\bar\alpha}\Gamma_2^T\Sigma&0&\frac{1}{1-\bar\alpha}\end{bmatrix},\qquad \tilde\Omega=\begin{bmatrix}\bar\alpha\,\Omega&\bar\alpha\sigma_{\bar\alpha}\Gamma_3&\frac{\bar\alpha\sigma_{\bar\alpha}}{1-\bar\alpha}\Gamma_4\\ \bar\alpha\sigma_{\bar\alpha}\Gamma_3^T&I&0\\ \frac{\bar\alpha\sigma_{\bar\alpha}}{1-\bar\alpha}\Gamma_4^T&0&\frac{1}{1-\bar\alpha}\end{bmatrix}$$
    with
    $$\Gamma_1=E\!\left[\frac{1}{2h_t}\frac{\partial h_t}{\partial\psi}\right]\left[\frac{1}{(1+\gamma_0)(1+2\gamma_0)},\ \frac{1}{1+2\gamma_0}\right]^T+\Gamma_2\left[\frac{\gamma_0}{(1+\gamma_0)(1+2\gamma_0)},\ \frac{1+\gamma_0}{1+2\gamma_0}\right]^T,$$
    $$\Gamma_2=\frac{1}{\sigma_{\bar\alpha}}\left\{Q(1-\bar\alpha)\,E\!\left[\frac{1}{2h_t}\frac{\partial h_t}{\partial\psi}\right]-E\!\left[\frac{1}{\sqrt{h_t}}\frac{\partial\varepsilon_t}{\partial\psi}\right]\right\},$$
    $$\Gamma_3=\begin{bmatrix}E\!\left[\frac{w_t^2}{\sqrt{h_t}}\frac{\partial\varepsilon_t}{\partial\phi}\right]\\[4pt] E\!\left[\frac{w_t^4}{2h_t}\frac{\partial h_t}{\partial\phi_h}\right]\end{bmatrix}\left[\frac{1}{(1-\gamma_0)^2},\ \frac{1}{1-\gamma_0}\right]^T,\qquad \Gamma_4=\begin{bmatrix}E\!\left[\frac{w_t^2}{\sqrt{h_t}}\frac{\partial\varepsilon_t}{\partial\phi}\right]\left(\frac{Q(1-\bar\alpha)}{\sigma_{\bar\alpha}}+\frac{1}{1-\gamma_0}\right)\\[4pt] E\!\left[\frac{w_t^4}{2h_t}\frac{\partial h_t}{\partial\phi_h}\right]\left(\frac{Q(1-\bar\alpha)}{\sigma_{\bar\alpha}}+\frac{1}{1-\gamma_0}-\frac{1}{\sigma_{\bar\alpha}}\right)\end{bmatrix},$$

    and $I$ is defined in Theorem 1.

Remark 7. Again, our inference is asymptotically correct regardless of a finite or divergent threshold, be it deterministic or random; see Remarks 1 and 2. By fixing $\bar\alpha$, we can effectively quantify the influence of the ARMA-GARCH model estimation errors on our generalized Pareto parameter inference based on the estimated residuals $\{\hat X_t\}$ instead of the true errors $\{X_t\}$. When $\bar\alpha=\bar\alpha_n\to 0$ is an intermediate sequence such that $u_n/Q(1-\bar\alpha_n)\stackrel{P}{\to}1$ and $n\bar\alpha_n\asymp n^{\kappa}$ for some $\kappa>0$, as in, e.g., McNeil and Frey (2000), Martins-Filho et al. (2018), and Hoga (2019), we deduce in the supplement that the estimation error from the ARMA-GARCH model indeed becomes asymptotically negligible as

$$\sqrt{n\bar\alpha_n}\,\big(\hat\theta-\theta_0^{(n)}\big)\stackrel{d}{\longrightarrow}N\!\left(0,\begin{bmatrix}I^{-1}&0\\0&1\end{bmatrix}\right),$$

where the asymptotic variance is the same as using the true errors {Xt} rather than the estimated residuals {X^t}, and coincides with that in Theorem 5 by setting α¯ to its limit 0 in the asymptotic variance. In other words, our approach unifies the inference for both non-divergent and divergent thresholds. Following Remark 2, it is natural to expect that our methods remain asymptotically correct when the true errors are array data that could be sufficiently well modeled by the generalized Pareto distribution.

From Theorem 5, we can quantify the impact of the ARMA-GARCH model estimation errors on our inference of the generalized Pareto parameters when using the estimated residuals rather than the true errors. In particular, observe that

$$\sqrt{n\bar\alpha}\,\big(\hat\theta-\theta_0^{(n)}\big)\stackrel{d}{\longrightarrow}N\!\big(0,\ I_{\bar\alpha}^{-1}+\bar\alpha V_{\bar\alpha}\big),$$

where $I_{\bar\alpha}^{-1}=\begin{bmatrix}I^{-1}&0\\0&1-\bar\alpha\end{bmatrix}$ is the asymptotic covariance matrix in Theorem 1, and we have an additional variance term depending on the ARMA-GARCH model given by

$$V_{\bar\alpha}=I_{\bar\alpha}^{-1}\big(A\Omega A^T+vA^T+Av^T\big)I_{\bar\alpha}^{-1},\qquad A=\begin{bmatrix}\Gamma_1^T\\ \frac{1}{1-\bar\alpha}\Gamma_2^T\end{bmatrix},\qquad v=\begin{bmatrix}\sigma_{\bar\alpha}\Gamma_3^T\\ \frac{\sigma_{\bar\alpha}}{1-\bar\alpha}\Gamma_4^T\end{bmatrix}. \tag{3.9}$$

Now recall the quantile formula (2.8). The following quantile inference theorem then follows from the continuous mapping theorem.

Theorem 6. Under the conditions of Theorem 5, for any $p\in(0,\alpha_0)$,

$$\frac{\sqrt{n\bar\alpha}}{\sigma_p}\Big(\widehat{\mathrm{VaR}}_X(1-p)-\mathrm{VaR}_X(1-p)\Big)\stackrel{d}{\longrightarrow}N\!\big(0,\ \tau^2(\bar\alpha,p)\big),$$

where the variance is

$$\tau^2(\bar\alpha,p)=q\!\left(\frac{\bar\alpha}{p}\right)^T I^{-1} q\!\left(\frac{\bar\alpha}{p}\right)+1-\bar\alpha+\bar\alpha\left[q\!\left(\frac{\bar\alpha}{p}\right)^T,\ 1\right]V_{\bar\alpha}\left[q\!\left(\frac{\bar\alpha}{p}\right)^T,\ 1\right]^T,$$

with $I^{-1}$ defined in Theorem 1 and the additional variance term $V_{\bar\alpha}$ given in (3.9).

We omit the proof, as it is completely analogous to that of Theorem 2 but uses Theorem 5 instead of Theorem 1. Now, with $\hat\sigma_p=\hat\sigma_n(\hat\alpha/p)^{\hat\gamma_n}$ and a consistent estimator $\hat\tau^2(\bar\alpha,p)$ (e.g., replacing the moments by their sample versions, $\bar\alpha$ by $\hat\alpha$, $\gamma_0$ by $\hat\gamma$, $\sigma_{\bar\alpha}$ by $\hat\sigma_n$, and $Q(1-\bar\alpha)$ by $u_n$), a confidence interval with level $a$ for $\mathrm{VaR}_X(1-p)$ is given by

$$\left[\widehat{\mathrm{VaR}}_X(1-p)-\frac{z_{(1+a)/2}}{\sqrt{n\bar\alpha}}\,\hat\sigma_n\!\left(\frac{\hat\alpha}{p}\right)^{\hat\gamma_n}\hat\tau(\bar\alpha,p),\ \ \widehat{\mathrm{VaR}}_X(1-p)+\frac{z_{(1+a)/2}}{\sqrt{n\bar\alpha}}\,\hat\sigma_n\!\left(\frac{\hat\alpha}{p}\right)^{\hat\gamma_n}\hat\tau(\bar\alpha,p)\right].$$

Substituting $\mathrm{VaR}_X(1-p)$ in (3.8) by the endpoints of the above interval, we can construct a prediction interval for $\mathrm{VaR}_{Y,n}(1-p)$. As in the case of independent data, such an interval has poor coverage probability in small samples, and a residual-based bootstrap method is computationally intensive. To bypass the daunting task of estimating the asymptotic variance of the quantile estimator, we suggest the following random weighted bootstrap procedure.

  • Step C1) Draw a random sample of size $n$, say $\xi_1^{(b)},\cdots,\xi_n^{(b)}$, from a distribution function with mean one and variance one.

  • Step C2) First, we estimate $\phi$ by
    $$\hat\phi^{(b)}=\arg\min_{\phi}\sum_{t=1}^{n}\xi_t^{(b)}\tilde w_t^2\tilde\varepsilon_t^2(\phi).$$
    Second, we estimate $\phi_h$ by maximizing
    $$\sum_{t=1}^{n}\xi_t^{(b)}\tilde w_t^4\tilde l_t\big(\phi_h\mid\hat\phi^{(b)}\big)$$
    and denote the estimator by $\hat\phi_h^{(b)}$. Define $\hat X_t^{(b)}=\tilde\varepsilon_t(\hat\phi^{(b)})\big/\sqrt{\tilde h_t(\hat\phi^{(b)},\hat\phi_h^{(b)})}$ for $t=1,\cdots,n$ and $\hat u_n^{(b)}=\hat X_{n-[n\bar\alpha]:n}^{(b)}$, and estimate $\gamma_0$ and $\sigma_{\alpha_n}$ by solving
    $$\sum_{t=1}^{n}\xi_t^{(b)}\delta\big(\hat X_t^{(b)}-\hat u_n^{(b)}\big)s_1\big(v\mid\hat X_t^{(b)}-\hat u_n^{(b)}\big)=0,\qquad \sum_{t=1}^{n}\xi_t^{(b)}\delta\big(\hat X_t^{(b)}-\hat u_n^{(b)}\big)s_2\big(v\mid\hat X_t^{(b)}-\hat u_n^{(b)}\big)=0.$$
    Denote the estimators by $\hat\gamma^{(b)}$ and $\hat\sigma^{(b)}$, which give
    $$\widehat{\mathrm{VaR}}_X^{(b)}(1-p)=\hat u_n^{(b)}+\frac{\hat\sigma^{(b)}}{\hat\gamma^{(b)}}\left(\left(\frac{\bar\alpha}{p}\right)^{\hat\gamma^{(b)}}-1\right).$$
  • Step C3) Repeat the above two steps $B$ times to obtain $\{\widehat{\mathrm{VaR}}_X^{(b)}(1-p)\}_{b=1}^{B}$. Let $\tilde D_{1:B}\le\cdots\le\tilde D_{B:B}$ denote the order statistics of
    $$\log\frac{\widehat{\mathrm{VaR}}_X^{(b)}(1-p)}{\widehat{\mathrm{VaR}}_X(1-p)},\quad b=1,\cdots,B,$$
    and let $\tilde D_{(1)}\le\cdots\le\tilde D_{(B)}$ denote the order statistics of
    $$\left|\log\frac{\widehat{\mathrm{VaR}}_X^{(b)}(1-p)}{\widehat{\mathrm{VaR}}_X(1-p)}\right|,\quad b=1,\cdots,B.$$
    Hence, the confidence intervals with level $a$ for $\log\mathrm{VaR}_X(1-p)$ are
    $$\tilde I_{RWB,1}(a)=\left[\log\widehat{\mathrm{VaR}}_X(1-p)-\tilde D_{[B(1+a)/2]:B},\ \log\widehat{\mathrm{VaR}}_X(1-p)-\tilde D_{[B(1-a)/2]:B}\right]$$
    and
    $$\tilde I_{RWB,2}(a)=\left[\log\widehat{\mathrm{VaR}}_X(1-p)-\tilde D_{([Ba])},\ \log\widehat{\mathrm{VaR}}_X(1-p)+\tilde D_{([Ba])}\right].$$

Again, substituting $\mathrm{VaR}_X(1-p)$ in (3.8) by the endpoints of each interval above, we can construct the corresponding prediction interval for $\mathrm{VaR}_{Y,n}(1-p)$. The simulation study below shows that this procedure has good finite-sample coverage. The asymptotic theory for the random weighted bootstrap method can be derived through rather tedious calculations and is therefore omitted.
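Steps C1)-C3) can be sketched for the simpler independent-data case. This is only an illustration under stated assumptions: the function names are ours, a weighted method-of-moments GPD fit replaces the weighted maximum likelihood score equations to keep the sketch fast, and standard-exponential multipliers supply the mean-one, variance-one weights.

```python
import math
import random

def weighted_gpd_var(x, alpha_bar, p, xi=None):
    """One weighted VaR evaluation: weighted method-of-moments GPD fit
    on the excesses over the empirical (1 - alpha_bar) quantile."""
    n = len(x)
    xi = xi if xi is not None else [1.0] * n
    u = sorted(x)[n - int(n * alpha_bar) - 1]
    exc = [(e - u, w) for e, w in zip(x, xi) if e > u]
    sw = sum(w for _, w in exc)
    m = sum(w * e for e, w in exc) / sw                 # weighted mean excess
    v = sum(w * (e - m) ** 2 for e, w in exc) / sw      # weighted variance of excesses
    gamma = 0.5 * (1.0 - m * m / v)                     # GPD moment estimator
    sigma = m * (1.0 - gamma)
    alpha_hat = sw / sum(xi)                            # weighted exceedance probability
    return u + sigma / gamma * ((alpha_hat / p) ** gamma - 1)

def rwb_interval(x, alpha_bar, p, B=200, level=0.90, seed=0):
    """Random weighted bootstrap interval in the I_RWB,2 style: the
    critical value is the [B*level] order statistic of
    |log(VaR^(b) / VaR_hat)| over B weighted replications."""
    rng = random.Random(seed)
    v_hat = weighted_gpd_var(x, alpha_bar, p)
    diffs = sorted(
        abs(math.log(weighted_gpd_var(x, alpha_bar, p,
                                      [rng.expovariate(1.0) for _ in x]) / v_hat))
        for _ in range(B))
    d = diffs[min(B - 1, int(B * level))]
    return v_hat * math.exp(-d), v_hat * math.exp(d)
```

By construction the interval is symmetric around the point estimate on the log scale, mirroring the absolute-difference critical values used in $\tilde I_{RWB,2}(a)$.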

4. Simulation Study

4.1. Independent data

This subsection carries out a simulation study to evaluate the finite-sample behavior of the proposed method for estimating VaR based on independent observations.

We draw 10000 random samples of size n = 500, 1200, or 2500 from (1.2) with γ = 3 or 1/3, σ = 1, G being the standard normal distribution, and θ = 0.9. We use α¯=0.05, p = 0.01 or 0.001, and B = 10000 for both the naive bootstrap method and the random weighted bootstrap method. The details of the naive bootstrap method are available in Section A of the supplement. In parallel with the random weighted bootstrap confidence intervals IRWB,1(a) and IRWB,2(a) given in Section 2, we construct two types of naive bootstrap intervals IBoot,1(a) and IBoot,2(a), based on the nominal and absolute differences of the estimators, respectively. We use the nlm function in the R statistical software to minimize the negative log-likelihood with the following initial values for γ and σα¯.

Let $Y_i=X_{(n-i+1):n}-X_{(n-[n\bar\alpha]):n}$ for $i=1,\ldots,m$ with $m=[n\bar\alpha]$. As we consider a positive index $\gamma$, we use the initial values

$$\gamma_{ini}=\frac{1}{\log 2}\,\log\frac{Y_{[m(1-3/16)]:m}-Y_{[m(1-3/8)]:m}}{Y_{[m(1-3/8)]:m}-Y_{[m(1-3/4)]:m}}\qquad\text{and}\qquad\sigma_{\bar\alpha}^{ini}=\frac{Y_{[m(1-3/8)]:m}\,\gamma_{ini}}{(3/8)^{-\gamma_{ini}}-1}.$$

Here γini is the Pickands (1975) tail index estimator.
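The initial values can be computed directly from the ordered excesses. This sketch (the function name `pickands_initial` is ours) assumes ascending order statistics $Y_{1:m}\le\cdots\le Y_{m:m}$ and the 3/16, 3/8, 3/4 fractions from the text:

```python
import math
import random

def pickands_initial(excesses):
    """Pickands (1975)-type tail-index starting value plus the matching
    GPD scale initializer from the formulas above."""
    y = sorted(excesses)
    m = len(y)
    q = lambda c: y[int(m * (1 - c)) - 1]   # order statistic Y_{[m(1-c)]:m}
    gamma = (1.0 / math.log(2.0)) * math.log(
        (q(3 / 16) - q(3 / 8)) / (q(3 / 8) - q(3 / 4)))
    sigma = q(3 / 8) * gamma / ((3 / 8) ** (-gamma) - 1.0)
    return gamma, sigma
```

For excesses drawn exactly from a GPD with $\gamma=1/3$, $\sigma=1$, plugging the population quantiles into the formula returns $\gamma$ and $\sigma$ exactly, so the sample version should land close to (1/3, 1) for large m.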

The coverage probabilities of the proposed intervals with levels a = 90% and 95% are reported in Tables 1 and 2, which show that: i) the normal approximation method performs worst, and ii) it is much better to use the naive bootstrap method and the random weighted bootstrap method with critical values computed from the empirical distribution of the absolute differences between the bootstrapped estimators and the maximum likelihood VaR estimator. Further, the normal Q-Q plots in Figures B.1 and B.2 of the supplement show that the distribution of the VaR estimator is far from normal, especially when p is very small. Hence we prefer IBoot,2(a) and IRWB,2(a) over IBoot,1(a) and IRWB,1(a) in risk analysis.

Table 1.

Confidence intervals with level a = 90%. Empirical coverage probabilities are reported for the normal approximation confidence interval INA(a), the naive bootstrap intervals IBoot,1(a) and IBoot,2(a), and the random weighted bootstrap intervals IRWB,1(a) and IRWB,2(a). We take γ = 3 or 1/3, σ = 1, G ~ N(0,1), and θ = 0.9 in (1.2).

(n, p, γ) INA(0.90) IBoot,1(0.90) IBoot,2(0.90) IRWB,1(0.90) IRWB,2(0.90)
(500,0.01,3) 0.7671 0.8447 0.9042 0.8516 0.9009
(500,0.001,3) 0.6634 0.8267 0.9012 0.8089 0.8936
(1200,0.01,3) 0.8247 0.8723 0.9005 0.8697 0.9012
(1200,0.001,3) 0.7392 0.8607 0.8971 0.8535 0.8984
(2500,0.01,3) 0.8573 0.8901 0.8987 0.8869 0.9007
(2500,0.001,3) 0.7837 0.8812 0.8957 0.8706 0.8972
(500,0.01,1/3) 0.8569 0.8571 0.8990 0.8591 0.8966
(500,0.001,1/3) 0.7453 0.7053 0.9318 0.6791 0.9210
(1200,0.01,1/3) 0.8803 0.8815 0.9034 0.8799 0.9036
(1200,0.001,1/3) 0.8027 0.7840 0.9145 0.7635 0.9136
(2500,0.01,1/3) 0.8928 0.8898 0.8985 0.8893 0.8975
(2500,0.001,1/3) 0.8494 0.8446 0.9029 0.8205 0.9048

Table 2.

Confidence intervals with level a = 95%. Empirical coverage probabilities are reported for the normal approximation confidence interval INA(a), the naive bootstrap intervals IBoot,1(a) and IBoot,2(a), and the random weighted bootstrap intervals IRWB,1(a) and IRWB,2(a). We take γ = 3 or 1/3, σ = 1, G ~ N(0,1), and θ = 0.9 in (1.2).

(n, p, γ) INA(0.95) IBoot,1(0.95) IBoot,2(0.95) IRWB,1(0.95) IRWB,2(0.95)
(500,0.01,3) 0.7931 0.8928 0.9523 0.8806 0.9508
(500,0.001,3) 0.6817 0.8718 0.9512 0.8393 0.9479
(1200,0.01,3) 0.8538 0.9184 0.9491 0.9130 0.9489
(1200,0.001,3) 0.7626 0.9075 0.9480 0.8932 0.9486
(2500,0.01,3) 0.8908 0.9359 0.9480 0.9306 0.9486
(2500,0.001,3) 0.8102 0.9316 0.9478 0.9173 0.9483
(500,0.01,1/3) 0.9008 0.9102 0.9490 0.9126 0.9483
(500,0.001,1/3) 0.7852 0.7631 0.9655 0.7162 0.9607
(1200,0.01,1/3) 0.9235 0.9301 0.9520 0.9274 0.9524
(1200,0.001,1/3) 0.8396 0.8253 0.9572 0.7986 0.9567
(2500,0.01,1/3) 0.9375 0.9408 0.9485 0.9399 0.9491
(2500,0.001,1/3) 0.8838 0.8843 0.9519 0.8608 0.9532

4.2. ARMA-GARCH sequence

This subsection carries out a simulation study to evaluate the finite-sample behavior of the proposed method for estimating VaR based on an AR-GARCH sequence.

Due to the computational burden of the random weighted bootstrap method, we draw 1000 random samples of size n = 1200 and 2500 from the following AR(1)-GARCH(1,1) model:

$$Y_t=0.0337-0.0620\,Y_{t-1}+\varepsilon_t,\qquad \varepsilon_t=\sqrt{h_t}\,X_t,\qquad h_t=0.0123+0.0883\,\varepsilon_{t-1}^2+0.8310\,h_{t-1},$$

where $X_t=(e_t-Ee_t)/\sqrt{\operatorname{Var}(e_t)}$, $e_t=\delta e_{t,1}-(1-\delta)e_{t,2}$, and $e_{t,1}$ and $e_{t,2}$ are independent GPD random variables with CDF $F(x)=1-(1+\gamma x)^{-1/\gamma}$. The parameters are calibrated from the daily returns on the S&P500 index between 2012 and 2016. We consider γ = 1/3 and 1/6 to ensure $EX_t^2<\infty$. We take δ = 0.5 and use the random weighted bootstrap method with B = 10000. The coverage probabilities of the proposed intervals with levels a = 90% and 95% are reported in Table 3, which shows that $\tilde I_{RWB,2}(a)$ is again better than $\tilde I_{RWB,1}(a)$ and performs well except in the case (n, p) = (1200, 0.001), where over-coverage is observed.
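The simulation design above can be coded as follows. This is a sketch under stated assumptions: the function name is ours, the volatility recursion is taken to use the squared innovation, and $X_t$ standardizes $e_t$ to mean zero and unit variance.

```python
import math
import random

def simulate_ar_garch(n, gamma=1 / 3, delta=0.5, seed=0):
    """Simulate the Section 4.2 design: AR(1)-GARCH(1,1) with
    standardized two-sided GPD-mixture innovations (coefficients are
    the values calibrated from 2012-2016 S&P500 daily returns)."""
    rng = random.Random(seed)

    def gpd_draw():  # GPD(gamma, 1) draw via inverse CDF
        return ((1.0 - rng.random()) ** (-gamma) - 1.0) / gamma

    mean_e = (2.0 * delta - 1.0) / (1.0 - gamma)                 # E e_t
    var_gpd = 1.0 / ((1.0 - gamma) ** 2 * (1.0 - 2.0 * gamma))   # Var of GPD(gamma, 1)
    var_e = (delta ** 2 + (1.0 - delta) ** 2) * var_gpd
    y_prev, eps_prev = 0.0, 0.0
    h_prev = 0.0123 / (1.0 - 0.0883 - 0.8310)   # start at the unconditional variance
    ys = []
    for _ in range(n):
        e = delta * gpd_draw() - (1.0 - delta) * gpd_draw()
        x = (e - mean_e) / math.sqrt(var_e)      # mean zero, variance one
        h = 0.0123 + 0.0883 * eps_prev ** 2 + 0.8310 * h_prev
        eps = math.sqrt(h) * x
        y = 0.0337 - 0.0620 * y_prev + eps
        ys.append(y)
        y_prev, eps_prev, h_prev = y, eps, h
    return ys
```

With δ = 0.5 the innovation mixture is symmetric, so the simulated returns fluctuate around the stationary AR mean 0.0337/1.062.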

Table 3. Confidence intervals for AR-GARCH models.

Empirical coverage probabilities are reported for the random weighted bootstrap confidence intervals I~RWB,1(a) and I~RWB,2(a) with a = 0.90 and 0.95.

(n, p, γ) I~RWB,1(0.90) I~RWB,2(0.90) I~RWB,1(0.95) I~RWB,2(0.95)
(1200,0.01,1/3) 0.843 0.908 0.901 0.961
(1200,0.001,1/3) 0.629 0.940 0.689 0.968
(2500,0.01,1/3) 0.878 0.894 0.928 0.949
(2500,0.001,1/3) 0.700 0.906 0.763 0.952
(1200,0.01,1/6) 0.857 0.898 0.918 0.951
(1200,0.001,1/6) 0.703 0.936 0.764 0.970
(2500,0.01,1/6) 0.868 0.903 0.930 0.943
(2500,0.001,1/6) 0.757 0.896 0.811 0.943

In summary, for quantifying the inference uncertainty of VaR estimation, we prefer the random weighted bootstrap method with critical values computed from the empirical distribution of the absolute differences between the bootstrapped risk estimators and the risk estimator, which works well for independent data and dependent data.

5. Data Analysis

5.1. Danish fire insurance losses

This subsection analyzes the Danish fire insurance data1 in McNeil (1997) using the proposed semi-parametric GPD model in (1.2) by treating the fire losses as independent data. The dataset consists of 2167 large fire insurance claims (i.e., losses) in Denmark from January 1980 until December 1990.

We first check the validity of Assumption 1 via the probability-probability plot (P-P plot). Specifically, denote the data by $\{X_i\}_{i=1}^{2167}$ and its empirical distribution by $\hat F(\cdot)$. For a given threshold level $\bar\alpha$ and the corresponding threshold $u=\hat F^{-1}(1-\bar\alpha)$, we estimate a GPD distribution $\hat G(\cdot;\hat\sigma_{\bar\alpha},\hat\gamma_{\bar\alpha})$ based on $\{X_i: X_i>u\}$ and plot it against $\tilde G(\cdot)=\big(\hat F(\cdot+u)-\hat F(u)\big)\big/\big(1-\hat F(u)\big)$. Figure 1 gives the P-P plots for $\bar\alpha=0.1$ and 0.05. As can be seen, both P-P plots are roughly linear, supporting the validity of Assumption 1.
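The P-P coordinates can be generated as follows. In this sketch (the function name `pp_points` is ours) the GPD parameters are passed in — in the paper they are fitted to the exceedances — and plotting positions $i/(k+1)$ stand in for the empirical conditional tail CDF $\tilde G$.

```python
def pp_points(x, alpha_bar, gamma, sigma):
    """P-P plot coordinates: fitted GPD CDF of each excess against the
    empirical conditional tail CDF of the exceedances."""
    xs = sorted(x)
    n = len(xs)
    u = xs[n - int(n * alpha_bar) - 1]
    exc = sorted(e - u for e in xs if e > u)
    k = len(exc)
    return [(1.0 - (1.0 + gamma * e / sigma) ** (-1.0 / gamma),  # fitted GPD CDF
             i / (k + 1.0))                                      # empirical position
            for i, e in enumerate(exc, start=1)]
```

A roughly 45-degree scatter of these pairs corresponds to the "roughly linear" P-P plots reported in Figure 1.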

Fig. 1.

Fig. 1

P-P plots for the Danish data with threshold levels α¯=0.1 and 0.05.

We then perform a sensitivity analysis of the proposed method with respect to the choice of threshold level α¯. Specifically, we calculate the (1 − p)×100% VaR at level p = 0.01, 0.005, 0.001 using varying threshold levels α¯=0.05, 0.1. The confidence interval (C.I.) for VaR is calculated at the a = 90% and 95% levels. To construct the C.I., we implement the normal approximation (NA), the naive bootstrap method (Boot1 and Boot2), and the random weighted bootstrap method (RWB1 and RWB2). For comparison, we further conduct a naive nonparametric bootstrap (Naive), where we simply bootstrap the Danish fire insurance data and use the sample quantile to estimate the VaR and its C.I.

The results are given in Figure 2. The performance of the semi-parametric GPD is fairly stable with respect to α¯ for p = 0.01, 0.005 and shows some variation for p = 0.001. Note that the naive nonparametric bootstrap (Naive) gives a very wide (and thus non-informative) C.I. for the extreme VaR (p = 0.001), which highlights the value and necessity of the proposed semi-parametric C.I. construction approach. We also report the Q-Q plots of $\log\big(\widehat{\mathrm{VaR}}_X^{(b)}(1-p)\big/\widehat{\mathrm{VaR}}_X(1-p)\big)$, b = 1, 2, ⋯, 10000, for Boot and RWB, respectively, in Figures B.3 and B.4 of the supplement. These figures show that the distribution is generally skewed, especially for p = 0.005 and 0.001.

Fig. 2.

Fig. 2

Sensitivity analysis of constructed confidence intervals (C.I.) for the (1 − p)×100% VaR at level p = 0.01, 0.005, 0.001. The y-axis is in units of 1 million Danish Krone. EmpQ stands for VaR estimated by the sample quantile, and EstQ stands for VaR estimated by the semi-parametric GPD. The result of the naive approach (Naive) does not depend on α¯ and is thus only plotted under α¯=0.05 to avoid confusion.

We further conduct a leave-one-out validation (LOOV) for the proposed semi-parametric GPD model and compare it with the naive nonparametric sample quantile approach (Naive). Specifically, for each observation $X_i$ with $i=1,2,\cdots,2167$, we use the leave-one-out sample $X_{-i}$ to estimate the (1 − p)×100% VaR by either the semi-parametric GPD or the sample quantile (Naive). For evaluation, we use the empirical coverage rate, defined as the proportion of experiments where the left-out loss $X_i$ is covered by (i.e., lower than) the estimated VaR based on the leave-one-out sample $X_{-i}$, for $i=1,\cdots,2167$.

Table 4 reports the empirical coverage rate of the estimated VaR by the two approaches across the 2167 experiments and further gives the corresponding p-values from the binomial tests2 for the null hypothesis that the coverage probability of the estimated (1 − p)×100% VaR is indeed the target level 1 − p. As can be seen, both approaches give satisfactory results, with GPD providing near-perfect performance. Moreover, note that the performance of the GPD approach is insensitive to the threshold level α¯, indicating the statistical stability of the proposed approach.
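The binomial backtest behind the reported p-values can be sketched as an exact two-sided test. This is an assumption-laden illustration: the function name is ours, and summing the probabilities of all outcomes no more likely than the observed count follows the minimum-likelihood convention of R's binom.test.

```python
import math

def binom_backtest_pvalue(n_cover, n, p_level):
    """Exact two-sided binomial test: under H0 the number of covered
    losses is Binomial(n, 1 - p_level); the p-value sums the
    probabilities of all outcomes no more likely than the observed one."""
    q = 1.0 - p_level
    log_q, log_p = math.log(q), math.log(1.0 - q)

    def pmf(k):  # binomial pmf computed in log space to avoid overflow at large n
        return math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                        - math.lgamma(n - k + 1) + k * log_q + (n - k) * log_p)

    probs = [pmf(k) for k in range(n + 1)]
    p_obs = probs[n_cover]
    return min(1.0, sum(pr for pr in probs if pr <= p_obs * (1.0 + 1e-9)))
```

For example, 3 exceptions out of 2167 experiments at p = 0.001 (coverage 2164/2167 ≈ 0.999) yields a p-value close to the 0.483 reported in Table 4.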

Table 4.

The empirical coverage rate (Emp. rate) of the estimated (1 − p)×100% VaR across the 2167 experiments at level p = 0.01, 0.005, 0.001. The result of the naive approach does not depend on α¯, thus is only reported under α¯=0.05 to avoid confusion.

p α¯ Emp. rate(GPD) Emp. rate(Naive) p-value(GPD) p-value(Naive)
0.010 0.050 0.990 0.989 1.000 0.745
0.010 0.100 0.990 - 1.000 -
0.005 0.050 0.995 0.994 1.000 0.647
0.005 0.100 0.995 - 1.000 -
0.001 0.050 0.999 0.999 0.483 0.483
0.001 0.100 0.999 - 0.483 -

5.2. Losses in the S&P500 index

This subsection analyzes the daily negative log-returns (i.e., losses) in the S&P500 index using the proposed semi-parametric GPD method with an ARMA-GARCH model. Precisely, on each day t, based on the past 2500 historical observations (yt−2499, yt−2498, ⋯, yt), we fit an AR(1)-GARCH(1,1) model using the proposed two-step self-weighted estimation method. We then calculate the one-day ahead (1 − p)×100% conditional VaR by the semi-parametric GPD method with a threshold α¯ and construct the corresponding 90% or 95% C.I. by RWB.

For comparison, we also conduct the analysis using a traditional nonparametric approach (Trad). That is, we fit an AR(1)-GARCH(1,1) model by MLE and use the sample quantile of the fitted residuals to calculate the one-day ahead conditional VaR and bootstrap the residuals to construct the corresponding C.I. of VaR.

We let t start from 11/01/2007, roughly the start of the financial crisis, and set the end date to 10/20/2011, which roughly marks the end of the crisis. In other words, we aim to test the ability of the proposed GPD method to monitor a financial system under stress.

There are 1000 predictions of one-day ahead conditional VaR given by the semi-parametric GPD approach and the traditional nonparametric approach (Trad). We vary p = 0.01, 0.005, 0.001 and set the confidence level of the C.I. to 90% or 95%. For the semi-parametric GPD approach, we further vary α¯=0.05, 0.1. Table 5 reports the empirical coverage rate of the estimated VaR by the two approaches across the 1000 predictions, defined as the proportion of predictions where the observed loss is lower than the estimated one-day ahead conditional VaR. Table 5 also gives the corresponding p-values from the binomial test. As can be seen, the traditional nonparametric approach tends to underestimate the true conditional VaR and thus imposes serious under-reserve risk. In contrast, the semi-parametric GPD gives satisfactory prediction performance and passes all the binomial tests. Moreover, the performance of the GPD approach is again insensitive to the threshold level α¯, indicating the statistical stability of the proposed approach.

Table 5.

The empirical coverage rate (Emp. rate) of the estimated (1 − p)×100% VaR across the 1000 predictions at level p = 0.01, 0.005, 0.001. The result of the traditional approach does not depend on α¯, thus is only reported under α¯=0.05 to avoid confusion.

p α¯ Emp. rate(GPD) Emp. rate(Trad) p-value(GPD) p-value(Trad)
0.010 0.050 0.985 0.979 0.111 0.002
0.010 0.100 0.985 - 0.111 -
0.005 0.050 0.995 0.989 1.000 0.020
0.005 0.100 0.995 - 1.000 -
0.001 0.050 1.000 0.998 0.632 0.264
0.001 0.100 1.000 - 0.632 -

For illustration, Figure 3 plots the estimated VaR and the corresponding C.I. given by RWB2 and the traditional nonparametric approach. For the plots, we set α¯=0.05 and the confidence level of the C.I. a = 90%, and vary p = 0.01, 0.005, 0.001. The results for RWB1, for α¯=0.1, and for the confidence level a = 95% are similar and thus are omitted. Note that, compared to RWB2, the C.I. given by the traditional nonparametric method is narrower for p = 0.01, possibly because the traditional nonparametric C.I. does not incorporate the estimation uncertainty of the AR(1)-GARCH(1,1) model. On the other hand, the nonparametric approach gives a much wider C.I. for the extreme quantiles p = 0.005, 0.001, indicating that a naive nonparametric bootstrap cannot construct an informative C.I. for extreme quantiles. The same phenomenon is observed in the Danish insurance data analysis. In Appendix C, we further show that our proposed method outperforms the filtered historical simulation methods in Kuester et al. (2006); the conclusions are qualitatively the same, and the details are available in the supplement.

Fig. 3.

Fig. 3

Estimated one-day ahead conditional VaR (red line) and its 90% C.I. (blue dashed lines) by the random weight bootstrap (RWB2) and the traditional nonparametric approach (Trad). The black line denotes the negative daily log returns of the S&P500 index.

We conclude the analysis by providing a validity check of Assumption A6. Specifically, Figure 4 gives the P-P plots for α¯=0.1 and 0.05 based on the entire dataset from 11/24/1997 to 10/20/2011 (2500 + 1000 observations). The P-P plot is generated in the same fashion as in Section 5.1, except that it is now based on the residuals of the AR(1)-GARCH(1,1) model. Both P-P plots are roughly linear and support Assumption A6.

Fig. 4.

Fig. 4

P-P plots for the S&P500 index with threshold levels α¯=0.1 and 0.05.

6. Conclusions

Given that regulators often set a high VaR level in risk management, fitting the distribution in the tail is essential. This paper studies a semi-parametric model that only fits the exceedances over a non-divergent threshold by the generalized Pareto distribution. Asymptotic results for parameter and VaR estimation are first derived for independent data. For financial data modeled by an ARMA-GARCH process, a three-step weighted estimation procedure is proposed to ensure a normal limit for the estimated parameters and conditional VaR with heavy-tailed observations. To quantify the uncertainty of risk forecasts efficiently, a random weighted bootstrap method is proposed and shown to be consistent. A simulation study and real data analysis confirm the advantages of the proposed methodologies. Developing a distribution-free goodness-of-fit test and the asymptotic theory for dynamic modeling of the generalized Pareto distribution is an important direction, which we leave for future research.

Supplementary Material

Supp 1

ACKNOWLEDGMENTS

We thank the editor, Professor Jianqing Fan, an associate editor, and two reviewers for their useful comments that led to this improved version of the manuscript. Peng’s research was partly supported by the Simons Foundation and the NSF grant, DMS2012448. Zhang’s research was partially supported by the National Cancer Institute and the National Institute of General Medical Sciences of the National Institutes of Health under award numbers R03CA235363 and R01GM131491, respectively.

Footnotes

SUPPLEMENTARY MATERIAL

Supplement to “Risk Analysis via Generalized Pareto Distributions” In this supplement, we provide the asymptotic theory for the naive bootstrap method for independent data (Remark 6), report some additional Q-Q plots discussed in Sections 4.1 and 5.1, discuss some additional empirical results for the S&P 500 index, prove Theorems 1–5, and deduce the results for divergent thresholds from Remarks 2 and 7 in detail.

1

The dataset is publicly available via R package evir.

2

Under the null hypothesis, the number of times the estimated (1 − p)×100% VaR covers the loss across n experiments should follow a binomial distribution with parameters (n, 1 − p). See Kratz et al. (2018) for more details on the binomial test.

Contributor Information

YI HE, Amsterdam School of Economics, University of Amsterdam, Amsterdam 1001 NJ, The Netherlands.

LIANG PENG, Department of Risk Management and Insurance, Georgia State University, Atlanta, GA.

DABAO ZHANG, Department of Statistics, Purdue University, West Lafayette, IN.

ZIFENG ZHAO, Mendoza College of Business, University of Notre Dame, Notre Dame, IN.

References

  1. Allen L, Bali TG, and Tang Y (2012), “Does Systemic Risk in the Financial Sector Predict Future Economic Downturns?,” The Review of Financial Studies, 25, 3000–3036.
  2. Balkema A, and de Haan L (1974), “Residual Life Time at Great Age,” The Annals of Probability, 2, 792–804.
  3. Barro RJ, and Jin T (2011), “On the Size Distribution of Macroeconomic Disasters,” Econometrica, 79, 1567–1589.
  4. Bollerslev T (1986), “Generalized Autoregressive Conditional Heteroskedasticity,” Journal of Econometrics, 31, 307–327.
  5. Bollerslev T, and Todorov V (2011), “Estimation of Jump Tails,” Econometrica, 79, 1727–1783.
  6. Brodin E, and Rootzén H (2009), “Univariate and Bivariate GPD Methods for Predicting Extreme Wind Storm Losses,” Insurance: Mathematics and Economics, 44, 345–356.
  7. Bücher A, and Segers J (2017), “On the Maximum Likelihood Estimator for the Generalized Extreme-Value Distribution,” Extremes, 20, 839–872.
  8. Chavez-Demoulin V, Davison AC, and McNeil AJ (2005), “Estimating Value-at-Risk: a Point Process Approach,” Quantitative Finance, 5, 227–234.
  9. Chavez-Demoulin V, and Embrechts P (2004), “Smooth Extremal Models in Finance and Insurance,” The Journal of Risk and Insurance, 71, 183–199.
  10. Chavez-Demoulin V, Embrechts P, and Sardy S (2014), “Extreme-quantile Tracking for Financial Time Series,” Journal of Econometrics, 181, 44–52.
  11. Cont R (2001), “Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues,” Quantitative Finance, 1, 223–236.
  12. Cramér H (1946), Mathematical Methods of Statistics, Princeton University Press.
  13. Davison AC, and Smith RL (1990), “Models for Exceedances over High Thresholds,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 52, 393–442.
  14. de Haan L, and Ferreira A (2006), Extreme Value Theory: an Introduction, New York: Springer.
  15. Dombry C (2015), “Existence and Consistency of the Maximum Likelihood Estimators for the Extreme Value Index within the Block Maxima Framework,” Bernoulli, 21, 420–436.
  16. Drees H, Ferreira A, and de Haan L (2004), “On Maximum Likelihood Estimation of the Extreme Value Index,” The Annals of Applied Probability, 14, 1179–1201.
  17. Duffie D, and Pan J (1997), “An Overview of Value at Risk,” The Journal of Derivatives, 4, 7–49.
  18. Embrechts P, Klüppelberg C, and Mikosch T (1997), Modelling Extremal Events for Insurance and Finance, Berlin: Springer.
  19. Engle RF (1982), “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of U.K. Inflation,” Econometrica, 50, 987–1008.
  20. Francq C, and Zakoïan J (2004), “Maximum Likelihood Estimation of Pure GARCH and ARMA-GARCH Processes,” Bernoulli, 10, 605–637.
  21. Hall P, and Tajvidi N (2000), “Nonparametric Analysis of Temporal Trend when Fitting Parametric Models to Extreme-Value Data,” Statistical Science, 15, 153–167.
  22. Hall P, and Yao Q (2003), “Inference in ARCH and GARCH Models with Heavy Tailed Errors,” Econometrica, 71, 285–317.
  23. He Y, Hou Y, Peng L, and Shen H (forthcoming), “Inference for Conditional Value-at-Risk of a Predictive Regression,” The Annals of Statistics.
  24. Hoga Y (2019), “Confidence Intervals for Conditional Tail Risk Measures in ARMA-GARCH Models,” Journal of Business & Economic Statistics, 37, 613–624.
  25. Hull JC (2018), Risk Management and Financial Institutions, Fifth ed., Hoboken: Wiley.
  26. Jorion P (2006), Value at Risk: the New Benchmark for Measuring Financial Risk, Third ed., New York: McGraw-Hill.
  27. Kelly B, and Jiang H (2014), “Tail Risk and Asset Prices,” The Review of Financial Studies, 27, 2841–2871.
  28. Koul HL, and Ling S (2006), “Fitting an Error Distribution in Some Heteroscedastic Time Series Models,” The Annals of Statistics, 34, 994–1012.
  29. Kratz M, Lok YH, and McNeil A (2018), “Multinomial VaR Backtests: A Simple Implicit Approach to Backtesting Expected Shortfall,” Journal of Banking & Finance, 88, 393–407.
  30. Kuester K, Mittnik S, and Paolella MS (2006), “Value-at-risk Prediction: A Comparison of Alternative Strategies,” Journal of Financial Econometrics, 4, 53–89.
  31. Ling S (2007), “Self-Weighted and Local Quasi-Maximum Likelihood Estimators for ARMA-GARCH/IGARCH Models,” Journal of Econometrics, 140, 849–873.
  32. Martins-Filho C, Yao F, and Torero M (2018), “Nonparametric Estimation of Conditional Value-at-risk and Expected Shortfall based on Extreme Value Theory,” Econometric Theory, 34, 23–67.
  33. Massacci D (2017), “Tail Risk Dynamics in Stock Returns: Links to the Macroeconomy and Global Markets Connectedness,” Management Science, 63, 3072–3089.
  34. McNeil AJ (1997), “Estimating the Tails of Loss Severity Distributions Using Extreme Value Theory,” ASTIN Bulletin, 27, 117–137.
  35. McNeil AJ, and Frey R (2000), “Estimation of Tail-Related Risk Measures for Heteroscedastic Financial Time Series: an Extreme Value Approach,” Journal of Empirical Finance, 7, 271–300.
  36. Owen A (2001), Empirical Likelihood, New York: Chapman & Hall.
  37. Peng L, and Qi Y (2009), “Maximum Likelihood Estimation of Extreme Value Index for Irregular Cases,” Journal of Statistical Planning and Inference, 139, 3361–3376.
  38. Pickands J (1975), “Statistical Inference Using Extreme Order Statistics,” The Annals of Statistics, 3, 119–131.
  39. Resnick SI (1987), Extreme Values, Regular Variation and Point Processes, New York: Springer.
  40. Rootzén H, and Tajvidi N (1997), “Extreme Value Statistics and Wind Storm Losses: a Case Study,” Scandinavian Actuarial Journal, 1, 70–94.
  41. Smith RL (1985), “Maximum Likelihood Estimation in a Class of Nonregular Cases,” Biometrika, 72, 67–90.
  42. Smith RL (1987), “Estimating Tails of Probability Distributions,” The Annals of Statistics, 15, 1174–1207.
  43. Zhao Z (2020), “Dynamic Bivariate Peak over Threshold Model for Joint Tail Risk Dynamics of Financial Markets,” Journal of Business & Economic Statistics.
  44. Zhou C (2009), “Existence and Consistency of the Maximum Likelihood Estimator for the Extreme Value Index,” Journal of Multivariate Analysis, 100, 794–815.
  45. Zhou C (2010), “The Extent of the Maximum Likelihood Estimator for the Extreme Value Index,” Journal of Multivariate Analysis, 101, 971–983.
  46. Zhu K, and Ling S (2011), “Global Self-Weighted and Local Quasi-Maximum Exponential Likelihood Estimators for ARMA-GARCH/IGARCH Models,” The Annals of Statistics, 39, 2131–2163.
  47. Zhu K (2016), “Bootstrapping the Portmanteau Tests in Weak Auto-Regressive Moving Average Models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78, 463–485.
  48. Zhu K (2019), “Statistical Inference for Autoregressive Models Under Heteroscedasticity of Unknown Form,” The Annals of Statistics, 47, 3185–3215.
