Incorporating spatial dependence in regional frequency analysis

Zhuo Wang; Jun Yan; Xuebin Zhang

doi:10.1002/2013WR014849

Water Resour Res. 2014 Dec 23;50(12):9570–9585. doi: 10.1002/2013WR014849


PMCID: PMC4328148 PMID: 25745273

Abstract

The efficiency of regional frequency analysis (RFA) is undermined by intersite dependence, which is usually ignored in parameter estimation. We propose a spatial index flood model where marginal generalized extreme value distributions are joined by an extreme-value copula characterized by a max-stable process for the spatial dependence. The parameters are estimated with a pairwise likelihood constructed from bivariate marginal generalized extreme value distributions. The estimators of model parameters and return levels can be more efficient than those from the traditional index flood model when the max-stable process fits the intersite dependence well. Through simulation, we compared the pairwise likelihood method with an L-moment method and an independence likelihood method under various spatial dependence models and dependence levels. The pairwise likelihood method was found to be the most efficient in mean squared error if the dependence model was correctly specified. When the dependence model was misspecified within the max-stable models, the pairwise likelihood method was still competitive relative to the other two methods. When the dependence model was not a max-stable model, the pairwise likelihood method led to serious bias in estimating the shape parameter and return levels, especially when the dependence was strong. In an illustration with annual maximum precipitation data from Switzerland, the pairwise likelihood method yielded remarkable reduction in the standard errors of return level estimates in comparison to the L-moment method.

Keywords: extreme analysis, max-stable process

Introduction

Natural extremes, such as extreme rainfall or extreme temperature, have a profound impact on both the environment and society. Regional frequency analysis (RFA) is widely used to characterize the frequency of extreme events. It is a technique based on the regionalization concept, which trades space for time to obtain adequate estimates of model parameters from a low-density network with short record lengths [see, e.g., Ouarda, 2013, for a recent review]. It uses data from a number of sites that are identified to be in a homogeneous region in a certain sense (see Section 2 for the specific definition) to estimate the quantiles of the variables of interest at each site in the region. That is, short records from different sites within a homogeneous region are pooled to improve the estimation efficiency. Widely used in water resources research, the RFA approach has various models and has been extended to accommodate temporal nonstationarity [e.g., Ouarda et al., 2006; Cunderlik and Ouarda, 2006; Leclerc and Ouarda, 2007] and multivariate analysis [e.g., Ouarda et al., 2000; Javelle et al., 2002; Chebana and Ouarda, 2009]. Nevertheless, our focus is the stationary index flood model with marginal generalized extreme value (GEV) distributions. We improve the efficiency of this type of RFA by incorporating spatial dependence and compare its performance with competing methods to better understand its advantages and limitations.

In an index flood model, the marginal distributions are identical apart from a site-specific scaling factor. Two popular methods are available for parameter estimation, neither of which needs to specify the intersite dependence. The L-moment method [e.g., Hosking and Wallis, 1997] estimates the parameters by solving the equations that match the sample L-moments with the population L-moments. It first estimates the site-specific scaling factor with at-site data, and then uses it to scale the data at each site. The scaled data are then pooled to estimate the parameters of the shared scale-free distribution by matching the L-moments. Properties of the L-moments were studied by Hosking [1990]. L-moments are more robust to sampling variability than conventional moments, and their existence only requires existence of the mean. The second method is the independence likelihood method, which adds up the marginal log-likelihoods from all sites, ignoring the intersite dependence, and then maximizes the sum (R. L. Smith, Regional estimation from spatially dependent data, University of Chapel Hill, unpublished data, 1990b). The independence likelihood method in RFA gives the most efficient estimator for larger samples, and it can incorporate covariates into model parameters [e.g., Buishand, 1991; Northrop, 2004], which is necessary in many cases where temporal or spatial nonstationarity is present. Both the L-moment method and the independence likelihood method are robust to intersite dependence at the cost of low efficiency when the dependence is strong.

The impact of intersite dependence on many aspects of RFA is a fundamental issue that has been actively investigated. Intersite correlation does not introduce bias but increases the variance in predicting regional means or moments [e.g., Matalas and Langbein, 1962; Stedinger, 1983]. For index flood models, Hosking and Wallis [1988] reported similar findings in predicting flood quantiles. Intersite dependence was found in general to increase the variance of the estimator in other contexts, such as estimation of the regional exceedance probability of a flood level [Troutman and Karlinger, 2003] or a regional envelope curve [Castellarin et al., 2005]. Intersite correlation is part of the model in probabilistic regional envelope curves [Castellarin, 2007; Viglione et al., 2012]. It has been used in regression analysis with generalized least squares to estimate the parameters of a model of the target quantity as a function of basin characteristics [Griffis and Stedinger, 2007]. For testing regional homogeneity with the heterogeneity measures of Hosking and Wallis [1993], intersite dependence reduces the power of the tests [Castellarin et al., 2008]. Most of the existing simulation studies generated data from meta-Gaussian models, which essentially use the normal copula for the dependence structure [e.g., Hosking and Wallis, 1988]. For extreme observations, however, the Pearson correlation coefficient may not be a good dependence measure [Embrechts et al., 2002] and the normal copula may not be a good dependence model [Genest and Favre, 2007; Gudendorf and Segers, 2010]. Smith (unpublished data, 1990b) reported a study where the intersite dependence was modeled with an extreme-value copula, but it was an exchangeable Gumbel copula, which does not allow the dependence to weaken as the distance between two sites increases.

Spatial extreme modeling has made progress recently in the statistics literature; see Davison et al. [2012] for a recent review. Max-stable processes extend the multivariate extreme value distribution to the infinite-dimensional setting [de Haan, 1984], with the marginal distribution of any dimension being multivariate extreme-value. These models provide a natural modeling framework for spatial extremes. A pairwise likelihood approach has been used in parameter estimation because the joint multivariate density function is unavailable [e.g., Padoan et al., 2010; Davison and Gholamrezaee, 2012]. The pairwise likelihood approach is robust in that its validity in inference only requires correct specification of the bivariate joint density of all the pairs, instead of the full joint density. A spatial index flood model retains the marginal GEV distributions and uses a max-stable process model for the dependence structure. The pairwise likelihood approach with pairwise bivariate generalized extreme value distributions can potentially increase the efficiency of marginal GEV parameter and return level estimation. A similar efficiency improvement with a max-stable process model has recently been reported in the detection of nonstationarity in precipitation extremes [Westra and Sisson, 2011].

The rest of the article is organized as follows. The index flood model with GEV distributions is reviewed in Section 2, along with two existing estimation methods: L-moment and independence likelihood. Spatial extreme models, their application in RFA with the index flood model, and parameter estimation with a pairwise likelihood method are introduced in Section 3. A large-scale simulation study that compares the performance of the proposed method with the L-moment method and the independence likelihood method is reported in Section 4. All three methods are illustrated in an example of Swiss annual maximum daily precipitation in Section 5. A discussion concludes in Section 6.

Index Flood Model With GEV Distribution

Model

The index flood model is a widely used RFA model whose homogeneity assumption is that all sites have an identical distribution up to a site-specific scaling factor known as the index variable. It originated from applications to flood data in hydrology, but the method can be used with any kind of data [Hosking and Wallis, 1997, p. 6]. Examples of application to precipitation are Kysely and Picek [2007] and Ngongondo et al. [2011]. Let $Q_s(p)$ be the quantile function of the distribution at site $s$; i.e., $Q_s(p) = F_s^{-1}(p)$, where $F_s$ is the distribution function at site $s$. The index flood procedure assumes that for all sites $s$ in a homogeneous region, $Q_s(p) = c_s q(p)$, where $c_s$ is a site-specific index variable and $q(p)$ is called the regional growth curve, the scale-free quantile function shared by all sites. Within the region, the $T$-year return level at any site $s$, which is the upper $1/T$-quantile, is proportional to the return level of the scale-free distribution: $Q_s(1 - 1/T) = c_s q(1 - 1/T)$.

The GEV distribution is often used to model the regional growth curve. It can be obtained as the limit distribution of the properly normalized maximum of a sequence of independent and identically distributed random variables. The cumulative distribution function of a GEV distribution is

$$F(z; \mu, \sigma, \xi) = \exp\left\{-\left[1 + \xi\,\frac{z - \mu}{\sigma}\right]_+^{-1/\xi}\right\}, \tag{1}$$

where $[a]_+ = \max(a, 0)$, and $\mu$, $\sigma > 0$, and $\xi$ are the location, scale, and shape parameters, respectively. Let GEV$(\mu, \sigma, \xi)$ denote this distribution. The shape parameter $\xi$ controls the tail behavior of the distribution. The distribution is known as the Gumbel distribution when $\xi = 0$, interpreted as the limit $\xi \to 0$. The case $\xi > 0$, which has a heavy tail, is of most interest since real data of extreme events often exhibit heavy tails. The quantile function of GEV$(\mu, \sigma, \xi)$ is the inverse function of $F$ in (1):

$$Q(p; \mu, \sigma, \xi) = \mu - \frac{\sigma}{\xi}\left\{1 - [-\log p]^{-\xi}\right\}, \qquad 0 < p < 1. \tag{2}$$

The $T$-year return level is then $Q(1 - 1/T; \mu, \sigma, \xi)$.
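As a minimal sketch of the GEV distribution function, its quantile function, and the resulting $T$-year return level, the following Python functions may help. The function names and the parameter values in the usage note are illustrative assumptions, not taken from the article, and the shape $\xi = 0$ is handled through the Gumbel limit.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function F(z; mu, sigma, xi)."""
    if abs(xi) < 1e-12:                      # Gumbel limit as xi -> 0
        return math.exp(-math.exp(-(z - mu) / sigma))
    u = 1.0 + xi * (z - mu) / sigma
    if u <= 0.0:                             # z lies outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-u ** (-1.0 / xi))

def gev_quantile(p, mu, sigma, xi):
    """Inverse of gev_cdf for 0 < p < 1."""
    if abs(xi) < 1e-12:                      # Gumbel limit as xi -> 0
        return mu - sigma * math.log(-math.log(p))
    return mu - sigma / xi * (1.0 - (-math.log(p)) ** (-xi))

def return_level(T, mu, sigma, xi):
    """T-year return level: the upper 1/T quantile."""
    return gev_quantile(1.0 - 1.0 / T, mu, sigma, xi)
```

For example, `return_level(100, 40.0, 10.0, 0.1)` gives the value exceeded with probability 0.01 in any given year under a hypothetical GEV(40, 10, 0.1) distribution.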

When the GEV distribution is used in an index flood model, the location parameter can be used as the index variable. In particular, let $c_s = \mu_s$ and let $Z$ be a GEV$(1, \gamma, \xi)$ variable. It is straightforward to show that the distribution of $\mu_s Z$ is GEV$(\mu_s, \gamma\mu_s, \xi)$. Therefore, the homogeneity for this index flood model means that the ratio of the scale parameter to the location parameter of the GEV distribution is a constant ($\gamma$), and that the shape parameters at all sites are the same [e.g., Buishand, 1991; Hanel et al., 2009]. Note that the index flood model only specifies the marginal GEV distributions; no spatial dependence is specified.
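The scaling property can be checked directly from the GEV distribution function. The short derivation below is a sketch that assumes $\mu_s > 0$ and a scale-free variable $Z$ following a GEV distribution with location 1, scale $\gamma$, and shape $\xi$:

```latex
\begin{aligned}
\Pr(\mu_s Z \le y)
  &= \Pr(Z \le y/\mu_s) \\
  &= \exp\left\{-\left[1 + \xi\,\frac{y/\mu_s - 1}{\gamma}\right]_+^{-1/\xi}\right\} \\
  &= \exp\left\{-\left[1 + \xi\,\frac{y - \mu_s}{\gamma\mu_s}\right]_+^{-1/\xi}\right\}.
\end{aligned}
```

The last line is the GEV distribution function with location $\mu_s$, scale $\gamma\mu_s$, and shape $\xi$, so the scale-to-location ratio equals $\gamma$ at every site.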

Existing Estimation Methods

Suppose that we observe annual maxima of a variable of interest at $m$ sites over $n$ years. Let $Y_{st}$, $s = 1, \ldots, m$ and $t = 1, \ldots, n$, be the record in year $t$ from site $s$. The data from year to year are assumed to be independent, but within the same year, spatial dependence exists across the sites. For ease of presentation, the notation is for balanced data where all sites have the same record length, but the methods can be easily adapted to varying record lengths. Let $\mu_s$, $\sigma_s$, and $\xi_s$ be the location, scale, and shape parameters, respectively, of the GEV distribution at site $s$; under the index flood model, $\sigma_s = \gamma\mu_s$ and $\xi_s = \xi$. The parameters to be estimated are $\beta = (\mu_1, \ldots, \mu_m, \gamma, \xi)$. Two existing estimation methods are the L-moment method [Hosking and Wallis, 1988] and the independence likelihood method [Hanel et al., 2009; Smith, unpublished data, 1990b], neither of which requires the specification of the spatial dependence. As both methods target small samples, the asymptotic variance estimator of the parameter estimator is not expected to work well, which is observed in our simulation studies in Section 4. A parametric bootstrap procedure with preserved spatial dependence is used to assess the uncertainty of the estimator.

L-Moment

The L-moment method proceeds as follows. First, for each site $s$, estimate the GEV parameters $(\mu_s, \sigma_s, \xi_s)$ using data from this site with the L-moment method, and let $\hat\mu_s$ be the estimate of $\mu_s$. Then use $\hat\mu_s$ to scale the data at each site $s$ by letting $\tilde Y_{st} = Y_{st}/\hat\mu_s$. Apply the L-moment method to the pooled, scaled data, $\tilde Y_{st}$, $s = 1, \ldots, m$, $t = 1, \ldots, n$, to fit a GEV distribution with location 1, scale $\gamma$, and shape $\xi$. The only extra difficulty in the last step is that the location parameter of the GEV distribution is restricted to be 1. With two unknown parameters $(\gamma, \xi)$, the estimating equations match the first two sample L-moments $(l_1, l_2)$ with their population counterparts:

$$l_1 = 1 - \frac{\gamma}{\xi}\left\{1 - \Gamma(1 - \xi)\right\}, \qquad l_2 = -\frac{\gamma}{\xi}\left(1 - 2^{\xi}\right)\Gamma(1 - \xi),$$

where $\Gamma(\cdot)$ is the gamma function. The solutions to the equations are the L-moment estimates. The implicit restriction $\xi < 1$ is required for the existence of the L-moments (finite mean), which makes the L-moment method more efficient than the likelihood method for small samples when $\xi < 1$ is true [e.g., Coles and Dixon, 1999]. When the homogeneity assumption is not valid, bias may be introduced, but RFA may still be more accurate than single-site analysis [Lettenmaier et al., 1987; Hosking and Wallis, 1997].
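The last step of the L-moment procedure can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the sample L-moments are computed from probability-weighted moments, the population L-moments are those of a GEV distribution with location 1, scale $\gamma$, and shape $\xi$ (namely $l_1 = 1 + \gamma\{\Gamma(1-\xi) - 1\}/\xi$ and $l_2 = \gamma(2^{\xi} - 1)\Gamma(1-\xi)/\xi$), and $\xi$ is found by bisection on the ratio $(l_1 - 1)/l_2$, which does not depend on $\gamma$. The search bracket $[-0.5, 0.5]$ is an assumption chosen because the ratio appears to be increasing there; all function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329
LN2 = math.log(2.0)

def sample_l_moments(x):
    """First two sample L-moments (l1, l2) via probability-weighted moments."""
    xs = sorted(x)
    n = len(xs)
    b0 = sum(xs) / n
    b1 = sum(i * v for i, v in enumerate(xs)) / (n * (n - 1))
    return b0, 2.0 * b1 - b0

def _ratio(xi):
    """(l1 - 1) / l2 as a function of xi for a GEV(1, gamma, xi) variable."""
    if abs(xi) < 1e-8:                       # limit as xi -> 0
        return EULER_GAMMA / LN2
    return (1.0 - 1.0 / math.gamma(1.0 - xi)) / math.expm1(xi * LN2)

def fit_growth_curve(l1, l2, lo=-0.5, hi=0.5, tol=1e-12):
    """Solve the two L-moment equations for (gamma, xi) by bisection on xi."""
    target = (l1 - 1.0) / l2
    if not (_ratio(lo) <= target <= _ratio(hi)):
        raise ValueError("no solution for xi inside the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if _ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)
    if abs(xi) < 1e-8:                       # Gumbel limit for the scale
        gamma = l2 / LN2
    else:
        gamma = l2 * xi / (math.expm1(xi * LN2) * math.gamma(1.0 - xi))
    return gamma, xi
```

In use, one would pass the pooled, scaled records to `sample_l_moments` and feed the result to `fit_growth_curve`. A production implementation would need a wider search range and a check for multiple roots.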

Independence Likelihood

Denote by $f(Y_{st}; \mu_s, \gamma\mu_s, \xi)$ the probability density function of the GEV distribution at site $s$ in year $t$. The independence likelihood method estimates $\beta$ by maximizing the log-likelihood function pretending that the sites are independent. That is, the estimator $\hat\beta$ is the maximizer of

$$\ell_I(\beta) = \sum_{t=1}^{n} \sum_{s=1}^{m} \log f(Y_{st}; \mu_s, \gamma\mu_s, \xi). \tag{3}$$

Similar to the L-moment method, this method only assumes correct specification of the marginal GEV distribution at each site. No spatial dependence is taken into account in the point estimation. For large samples, the variance of the estimator has a sandwich form under certain regularity conditions and can be consistently estimated by a sandwich estimator (Smith, unpublished data, 1990b). The sandwich variance adjusts for the unspecified spatial dependence. For small samples, however, a bootstrap procedure that preserves the spatial dependence can be used [Heffernan and Tawn, 2004]. The likelihood method has been modified to improve its small-sample performance by adding a penalty on the shape parameter [Coles and Dixon, 1999] or, equivalently, imposing a prior distribution on it [Martins and Stedinger, 2000]. We do not consider these modifications here because they introduce the complexity of penalty form selection or prior specification.
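A minimal Python sketch of the independence log-likelihood follows, assuming the index flood parameterization in which site $s$ has location $\mu_s$, scale $\gamma\mu_s$, and shape $\xi$. The function names and the nested-list data layout are illustrative assumptions. In practice the returned value would be passed to a numerical optimizer.

```python
import math

def gev_logpdf(y, mu, sigma, xi):
    """Log density of GEV(mu, sigma, xi); -inf outside the support."""
    if sigma <= 0.0:
        return -math.inf
    s = (y - mu) / sigma
    if abs(xi) < 1e-12:                      # Gumbel limit as xi -> 0
        return -math.log(sigma) - s - math.exp(-s)
    u = 1.0 + xi * s
    if u <= 0.0:
        return -math.inf
    return -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(u) - u ** (-1.0 / xi)

def independence_loglik(mus, gamma, xi, data):
    """Sum of marginal GEV log densities over sites and years.

    data[s][t] is the record at site s in year t; site s has
    location mus[s], scale gamma * mus[s], and shape xi.
    Record lengths may differ across sites.
    """
    total = 0.0
    for mu_s, records in zip(mus, data):
        for y in records:
            total += gev_logpdf(y, mu_s, gamma * mu_s, xi)
    return total
```

Because each site contributes only its own marginal terms, unbalanced records need no special handling in this sketch.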

Spatial Index Flood Model

Spatial extreme models have received much attention in the statistics literature. Can one exploit spatial extreme modeling in RFA to improve efficiency? If so, the spatial dependence can be put to positive use for better efficiency rather than being treated as a nuisance that reduces efficiency.

Max-Stable Process

By Sklar's theorem, the distribution function $H$ of a $p$-dimensional continuous random vector $(X_1, \ldots, X_p)$ with marginal distributions $F_1, \ldots, F_p$, respectively, can be uniquely represented as

$$H(x_1, \ldots, x_p) = C\{F_1(x_1), \ldots, F_p(x_p)\},$$

where $C$, called a copula, is a $p$-dimensional distribution function with standard uniform marginals [Sklar, 1959]. When $H$ is a multivariate extreme value distribution, the corresponding copula $C$ must be an extreme-value copula, which satisfies a max-stable property [Gudendorf and Segers, 2010]. If all the margins are transformed to the unit Fréchet distribution with distribution function $G(z) = \exp(-1/z)$, $z > 0$, the max-stable property means that the resulting joint distribution function, still denoted by $H$, satisfies

$$H^k(k z_1, \ldots, k z_p) = H(z_1, \ldots, z_p), \qquad k > 0.$$

Max-stability is a defining property for max-stable processes, whose marginal copula in any dimension is an extreme-value copula.

In a recent review, Davison et al. [2012] gave a spectral characterization of max-stable processes that unifies the characterizations in de Haan [1984] and Schlather [2002]. Consider a spatial domain $\mathcal{X} \subset \mathbb{R}^d$. Let $W(x)$ be a nonnegative stationary stochastic process on $\mathcal{X}$ with $E\{W(x)\} = 1$, and let $W_i(x)$, $i \ge 1$, be independent copies of $W$. Let $\{S_i\}$ be the points of a Poisson process on $(0, \infty)$ with intensity $s^{-2}\,ds$. Then,

$$Z(x) = \max_{i \ge 1} S_i W_i(x), \qquad x \in \mathcal{X}, \tag{4}$$

is a stationary max-stable process on $\mathcal{X}$ with unit Fréchet marginal distributions. Different forms of $W(x)$ lead to different parametric max-stable models.

It is often desirable to measure the extremal dependence of $m$ sites. The extremal coefficient is such a measure. Consider a max-stable process $Z$ defined in (4) at sites $x_1, \ldots, x_m$. The extremal coefficient $\theta_m$ of the $m$ sites is defined through

$$\Pr\{Z(x_1) \le z, \ldots, Z(x_m) \le z\} = \exp(-\theta_m / z), \qquad z > 0. \tag{5}$$

It can be interpreted as the effective number of independent variables among the $m$ variables. The upper bound of $\theta_m$ is $m$, meaning complete independence, while the lower bound is 1, meaning complete dependence. Specifically, for two sites $x_1$ and $x_2$, a bivariate extremal coefficient function $\theta(h)$ can be defined as

$$\Pr\{Z(x_1) \le z, Z(x_2) \le z\} = \exp\{-\theta(h) / z\}, \tag{6}$$

where $h = x_1 - x_2$ [Schlather and Tawn, 2003]. In a spatial context, the bivariate extremal coefficient of two sites is often modeled to increase from 1 to 2 as the distance between the two sites increases from zero to infinity. When $\theta(h) = \theta(\|h\|)$, that is, when the dependence measure depends only on distance and not on direction, the model is isotropic.

Parametric Max-Stable Models

We consider three isotropic models that are used in the simulation study. The first model is obtained by taking $\mathcal{X} = \mathbb{R}^2$ and $W_i(x) = g(x - X_i)$, where $g$ is a bivariate density function and $\{X_i\}$ are the points of a homogeneous Poisson process with unit rate on $\mathbb{R}^2$. The special case where $g$ is the normal density with mean zero and covariance matrix $\Sigma$ is known as the Smith model (R. L. Smith, Max-stable processes and spatial extremes, University of Surrey, unpublished data, 1990a). The bivariate marginal distribution function at two sites $x_i$ and $x_j$ is

$$\Pr\{Z(x_i) \le z_i, Z(x_j) \le z_j\} = \exp\left\{-\frac{1}{z_i}\Phi\left(\frac{a}{2} + \frac{1}{a}\log\frac{z_j}{z_i}\right) - \frac{1}{z_j}\Phi\left(\frac{a}{2} + \frac{1}{a}\log\frac{z_i}{z_j}\right)\right\}, \tag{7}$$

where $\Phi$ is the cumulative distribution function of the standard normal variable, $a^2 = h^\top \Sigma^{-1} h$, and $h = x_i - x_j$. The bivariate density function can be obtained by differentiating the distribution function [e.g., Padoan et al., 2010]. The bivariate extremal coefficient function is $\theta(h) = 2\Phi(a/2)$, with range from 1 to 2, providing the full range of dependence levels. A limitation of the Smith model is that the storms generated from it have shapes that are too regular to be realistic.

The second model we consider is the Schlather model [Schlather, 2002]. It is obtained by taking $W_i(x) = \sqrt{2\pi}\max\{0, \varepsilon_i(x)\}$, where $\varepsilon(x)$ is a stationary Gaussian process with unit variance and correlation function $\rho$, a function of the Euclidean distance between two sites $x_i$ and $x_j$. The bivariate marginal distribution function is

$$\Pr\{Z(x_i) \le z_i, Z(x_j) \le z_j\} = \exp\left\{-\frac{1}{2}\left(\frac{1}{z_i} + \frac{1}{z_j}\right)\left[1 + \sqrt{1 - \frac{2\{\rho(h) + 1\} z_i z_j}{(z_i + z_j)^2}}\right]\right\}, \tag{8}$$

where $h = \|x_i - x_j\|$. The bivariate extremal coefficient is $\theta(h) = 1 + \sqrt{\{1 - \rho(h)\}/2}$, with a range from 1 to $1 + \sqrt{1/2}$, or about 1.707. Therefore, the Schlather model does not provide the full range of dependence. It cannot be used to model sites that are completely independent. Models of the correlation function are standard in spatial statistics [e.g., Banerjee et al., 2004, Table 2.1], and contain parameters that characterize the strength of the spatial dependence.

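The third model we consider is a geometric Gaussian process obtained by taking $W_i(x) = \exp\{\sigma\varepsilon_i(x) - \sigma^2/2\}$, where $\varepsilon(x)$ is again a stationary Gaussian process with unit variance and correlation function $\rho$, and $\sigma^2$ is the variance of $W(x)$ on the log scale [Davison et al., 2012, p. 172]. The bivariate marginal distribution is the same as (7) for the Smith model, except that $a^2 = 2\sigma^2\{1 - \rho(h)\}$. The bivariate extremal coefficient function is $\theta(h) = 2\Phi\left(\sqrt{\sigma^2\{1 - \rho(h)\}/2}\right)$. The range of $\theta(h)$ is from 1 to $2\Phi(\sigma/\sqrt{2})$. The upper bound is about 1.96, quite close to 2, for a moderately large $\sigma^2$, and as $\sigma^2 \to \infty$, $\theta(h)$ approaches 2 for any $\rho$.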
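To compare the dependence ranges of the three models, the bivariate extremal coefficient functions can be evaluated numerically. The sketch below makes several assumptions that are not part of the model definitions: an isotropic Smith covariance $\Sigma = \tau I_2$ (so that $a = h/\sqrt{\tau}$), a Gaussian correlation function $\rho(h) = \exp\{-(h/\phi)^2\}$ for the other two models, and hypothetical function names.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_corr(h, phi):
    """Gaussian correlation function exp{-(h / phi)^2} (assumed form)."""
    return math.exp(-(h / phi) ** 2)

def theta_smith(h, tau):
    """Smith model with covariance tau * I: theta = 2 * Phi(a / 2), a = h / sqrt(tau)."""
    return 2.0 * std_normal_cdf(0.5 * h / math.sqrt(tau))

def theta_schlather(h, phi):
    """Schlather model: theta = 1 + sqrt((1 - rho(h)) / 2)."""
    return 1.0 + math.sqrt(0.5 * (1.0 - gaussian_corr(h, phi)))

def theta_geometric(h, phi, sigma2):
    """Geometric Gaussian model: theta = 2 * Phi(a / 2), a^2 = 2 * sigma2 * (1 - rho(h))."""
    a = math.sqrt(2.0 * sigma2 * (1.0 - gaussian_corr(h, phi)))
    return 2.0 * std_normal_cdf(0.5 * a)
```

Evaluating these functions at a very large distance shows the different upper bounds: the Smith model reaches 2, the Schlather model stops at $1 + \sqrt{1/2}$, and the geometric Gaussian model stops at $2\Phi(\sigma/\sqrt{2})$.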

Application to RFA

Spatial Index Flood Model

To incorporate the spatial dependence in the index flood model, we assume that the dependence structure among the sites is an extreme-value copula described by a max-stable process model. This assumption is in addition to the homogeneity assumption for the index flood model with GEV margins in Section 2. We keep using $Y_{st}$, $s = 1, \ldots, m$ and $t = 1, \ldots, n$, as the observed data at site $s$ in year $t$. The spatial index flood model completely specifies the joint distribution of $(Y_{1t}, \ldots, Y_{mt})$ for each $t$. As seen from Sklar's theorem, it is sufficient to specify the marginal models and the spatial dependence structure. Under the setup of the index flood model, the marginal distribution at each site $s$ is still GEV$(\mu_s, \gamma\mu_s, \xi)$. The extreme-value copula is specified by a parametric max-stable process with dependence parameter $\alpha$. Although the model is fully specified and easy to understand, the joint density is unavailable except for lower dimensions ($m$ = 2, 3) for certain parametric models. The bivariate marginal density of two sites depends on $\alpha$ in addition to the marginal GEV parameters. Let $F_s$ be the cumulative distribution function of the GEV distribution at site $s$, let $G$ be the cumulative distribution function of the unit Fréchet distribution, and let $G^{-1}$ be the inverse function of $G$. The bivariate density of sites $i$ and $j$, $f_{ij}(y_i, y_j; \beta, \alpha)$, is

$$f_{ij}(y_i, y_j; \beta, \alpha) = g_{ij}(z_i, z_j; \alpha)\, J_i(y_i)\, J_j(y_j),$$

where $g_{ij}$ is the bivariate marginal density of the max-stable process model with unit Fréchet margins, $z_s = G^{-1}\{F_s(y_s)\}$ for $s \in \{i, j\}$, and

$$J_s(y) = \frac{d}{dy}\, G^{-1}\{F_s(y)\} = \frac{1}{\gamma\mu_s}\left[1 + \xi\,\frac{y - \mu_s}{\gamma\mu_s}\right]_+^{1/\xi - 1}$$

is the Jacobian of the transformation to the unit Fréchet scale.

Pairwise Likelihood Estimation

Inferences about max-stable process models have been mostly based on the composite likelihood approach [Padoan et al., 2010; Davison and Gholamrezaee, 2012]. The composite likelihood approach constructs an objective function, known as the composite likelihood, by putting together pieces of tractable likelihood, such as lower-dimensional marginal densities [Lindsay, 1988]. The composite likelihood is maximized as if it were a likelihood to give the maximum composite likelihood estimator (MCLE). Under mild conditions, correct specification of the pieces in the composite likelihood leads to consistency and asymptotic normality of the MCLE. It has wide applications where the full joint distribution is unavailable or intractable but lower-order marginal or conditional distributions are known [e.g., Varin, 2008; Varin et al., 2011]. When the pieces in the composite likelihood are pairwise bivariate densities, the composite likelihood is also called pairwise likelihood. The independence likelihood in Smith (unpublished data, 1990b) is also a composite likelihood constructed from the univariate marginal GEV distributions.

The dependence parameter $\alpha$ and marginal parameter $\beta$ are estimated jointly in the pairwise likelihood method. The pairwise log-likelihood is constructed with the bivariate density $f_{ij}$ of all the site pairs $(i, j)$ within the same years:

$$\ell_P(\beta, \alpha) = \sum_{t=1}^{n} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \log f_{ij}(Y_{it}, Y_{jt}; \beta, \alpha).$$

When the record lengths are different across sites, it can be constructed from all the available pairs within each year. Let $(\hat\beta, \hat\alpha)$ be the maximizer of the pairwise log-likelihood. Under certain regularity conditions, $(\hat\beta, \hat\alpha)$ is consistent for the true parameter vector and is asymptotically normally distributed [e.g., Padoan et al., 2010]. The variance of $(\hat\beta, \hat\alpha)$ can be estimated by a sandwich estimator, which can only give valid inference when the sample size $n$ is large. For small to moderate sample sizes, as is often the case with RFA, a bootstrap variance estimator is preferred. Heffernan and Tawn [2004] proposed a bootstrap procedure that preserves the dependence structure for multivariate extremes. This procedure has been applied in a nonstationary index flood model [Hanel et al., 2009], and is used here.

Simulation Study

A simulation study was conducted to compare the performance of the three estimation methods for the index flood model: L-moment, independence likelihood, and pairwise likelihood. Unlike the other two methods, which do not need to specify spatial dependence, the pairwise likelihood method incorporates spatial dependence through the extra specification of an extreme-value dependence model. It has the potential of being more efficient when the dependence model is correctly specified, but risks severe bias otherwise. The L-moment method has been found to be unbiased regardless of the spatial dependence [Hosking and Wallis, 1988]. Nevertheless, existing studies all used normal copulas, which provide no extremal dependence, in generating data. The performance of the L-moment method for data with extremal dependence as generated from max-stable processes has not previously been assessed. Our simulation design reflects these needs.

Design

We considered data from $m$ sites over $n$ years in a study region $[0, 10]^2$. Data from different years were independent, but within the same year, data from different sites were generated with spatial dependence. The center point (5, 5) was included so that parameter estimates and return level estimates could be compared at this point across scenarios. The additional $m - 1$ sites were randomly generated in the region. The marginal distribution at site $s$ was GEV$(\mu_s, \gamma\mu_s, \xi)$, with $\gamma$ and $\xi$ common to all sites, a fixed $\mu_s$ at the center site, and $\mu_s$ for the other sites randomly generated from a normal distribution and rounded to an integer. The parameters of this normal distribution were the sample mean and sample variance of the L-moment estimates of the $\mu_s$'s from extreme rainfall data in Southern Ontario analyzed in Wang et al. [2014]. The values of $\mu_s$ ranged from 33 to 51.

Four factors were considered in the experimental design: the spatial dependence model, the spatial dependence level, the number of sites $m$, and the record length $n$. Four spatial dependence models were used to generate data, including three extreme-value models and one nonextreme-value model. The three parametric isotropic extreme-value models were the Smith model, the Schlather model, and the geometric Gaussian model, abbreviated as SM, SC, and GG, respectively. The nonextreme-value model was a Gaussian copula, also known as the meta-Gaussian model, abbreviated as GA. For each model, three levels of dependence were used: weak, moderate, and strong, abbreviated as W, M, and S, respectively.

The SM model had $\Sigma = \tau I_2$, where $I_2$ is the identity matrix of dimension 2, with $\tau$ chosen to be 4, 16, and 64 for the W, M, and S dependence, respectively. Observations at two sites with distance over $2\sqrt{\tau}$ would be close to independent. In our study region, these choices correspond to the cases where two sites are almost independent if their distance exceeds 4, 8, and 16, respectively. The SC model had a Gaussian correlation function $\rho(h) = \exp\{-(h/\phi)^2\}$ with range parameter $\phi$ chosen such that the resulting bivariate extremal coefficient function matches that from the SM model as closely as possible. It is a special case of the power exponential correlation family with the smooth parameter fixed at 2, as in the R package SpatialExtremes [Ribatet and Singleton, 2013]. Through nonlinear least squares, the values of $\phi$ were tuned to be 2.942, 5.910, and 13.153 for W, M, and S dependence, respectively. For the GG model, a bigger $\sigma^2$ offers a fuller range of dependence levels for two sites, but the data generating function for this model in the R package SpatialExtremes works well only when $\sigma^2$ is not too large. As a compromise, $\sigma^2$ was fixed at 8. The GG model also had a Gaussian correlation structure, and similarly through nonlinear least squares, the range parameter was set to be 7.134, 14.780, and 31.149 for W, M, and S dependence, respectively.

For the GA model, an exponential correlation function $\rho(h) = \exp(-h/\phi)$ was used with range parameter $\phi$ set to 6, 12, and 20, which were chosen so that the fitted exponential correlation curves of the empirical correlation of the score functions of $\mu$ and $\xi$ are close to those of the SM model. Two levels of $m$ were considered, $m \in \{10, 20\}$. When $m = 20$, 10 additional sites were generated and added to those sites used in the case of $m = 10$. Finally, two levels of $n$, $n \in \{10, 25\}$, were considered. This design led to 48 scenarios.
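The factorial design can be enumerated in a few lines of Python, which also confirms the scenario count. The dictionary layout and variable names are illustrative assumptions. The numeric values are the dependence parameters quoted in the design description ($\tau$ for SM and the correlation range for SC, GG, and GA), with $n$ taking the values 10 and 25 used in the results.

```python
from itertools import product

models = ["SM", "SC", "GG", "GA"]          # dependence models used to generate data
levels = ["W", "M", "S"]                   # weak, moderate, strong dependence
n_sites = [10, 20]                         # number of sites m
n_years = [10, 25]                         # record length n

# Dependence parameter for each model and level
# (tau for SM; correlation range for SC, GG, and GA).
dependence_param = {
    "SM": {"W": 4.0, "M": 16.0, "S": 64.0},
    "SC": {"W": 2.942, "M": 5.910, "S": 13.153},
    "GG": {"W": 7.134, "M": 14.780, "S": 31.149},
    "GA": {"W": 6.0, "M": 12.0, "S": 20.0},
}

scenarios = [
    {"model": mod, "level": lev, "m": m, "n": n,
     "param": dependence_param[mod][lev]}
    for mod, lev, m, n in product(models, levels, n_sites, n_years)
]
```

The list has 4 × 3 × 2 × 2 = 48 entries, one per scenario.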

For each scenario, we generated 1000 data sets. For each data set, we estimated the GEV parameters and the $T$-year return level $Q_T$ for several return periods $T$. The three methods, L-moment, independence likelihood, and pairwise likelihood, are abbreviated as LM, IL, and PL, respectively. In optimization, the IL estimator used the LM estimators as starting values, and the PL estimator used the IL estimators as starting values. Given the large number of parameters in the model, we maximized the likelihood with an iterative procedure that maximizes the objective function with respect to one parameter at a time while the other parameters are held constant. The procedure is iterated over all parameters until convergence. It was reported to give better estimation when the number of parameters is large [Blanchet and Davison, 2011]. In contrast to the LM and IL methods, the PL method needed to specify a dependence model, which may be correct or incorrect. We studied its performance under correct specification and under misspecification of the dependence structure, both within the spatial extreme models and with a nonextreme-value copula (GA model).
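The one-parameter-at-a-time maximization can be sketched as cyclic coordinate ascent. The version below is only an illustration of the idea: it assumes box bounds for every parameter and a unimodal objective along each coordinate, and it uses a golden-section line search. Neither the bounds nor the line search is stated in the article, and the function names are hypothetical.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal one-dimensional function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) > f(d):
            hi = d                           # maximum lies in [lo, d]
        else:
            lo = c                           # maximum lies in [c, hi]
    return 0.5 * (lo + hi)

def coordinate_ascent(objective, start, bounds, max_sweeps=200, tol=1e-8):
    """Cycle through the parameters, maximizing over one at a time."""
    x = list(start)
    best = objective(x)
    for _ in range(max_sweeps):
        for k, (lo, hi) in enumerate(bounds):
            def along_k(v, k=k):
                # Objective as a function of parameter k, others held fixed.
                return objective(x[:k] + [v] + x[k + 1:])
            x[k] = golden_section_max(along_k, lo, hi)
        value = objective(x)
        if value - best < tol:               # stop when a full sweep no longer helps
            break
        best = value
    return x, objective(x)
```

With a log-likelihood as `objective`, the starting values would come from a simpler estimator, mirroring the use of LM estimates to start IL and IL estimates to start PL.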

Results

The center site (5, 5), which was present in all replicates, was used for comparison across the various scenarios. As the results from the three spatial extreme models were similar, we use the GG model to represent the three extreme-value models. Since the sample size n = 10 was too small for the likelihood methods to be numerically reliable, the results for n = 10 were based on trimmed data where 2% from each tail were excluded in the summary. The results for n = 25 were stable and included all 1000 replicates. The PL method had correct specification of the dependence model for data generated from the extreme-value models (SM, SC, and GG). For data generated from the GA model, the PL estimate was obtained under the specification of a GG model with Gaussian correlation structure. For each method, we report the relative bias and the relative root mean squared error (RMSE) for the GEV parameters and return levels at the center point.
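The summary measures can be computed as in the following sketch. The exact definitions are assumptions, since the text does not spell them out: relative bias is taken as the mean error divided by the true value, relative RMSE as the root mean squared error divided by the true value, both in percent, and trimming removes a fixed fraction of the sorted estimates from each tail.

```python
import math

def trim_tails(estimates, fraction=0.02):
    """Drop the given fraction of the sorted estimates from each tail."""
    xs = sorted(estimates)
    k = int(round(fraction * len(xs)))
    return xs[k:len(xs) - k]

def relative_bias(estimates, truth):
    """Mean error as a percentage of the true value."""
    mean_est = sum(estimates) / len(estimates)
    return 100.0 * (mean_est - truth) / truth

def relative_rmse(estimates, truth):
    """Root mean squared error as a percentage of the true value."""
    mse = sum((e - truth) ** 2 for e in estimates) / len(estimates)
    return 100.0 * math.sqrt(mse) / abs(truth)
```

For 1000 replicates, `trim_tails` with the default fraction keeps the middle 960 estimates before the two measures are computed.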

When PL Is Correctly Specified

We first look at the results for data generated from extreme-value dependence models (Figure 1 for the GG model). The bias decreases for all methods as n goes from 10 to 25. At n = 25, the bias of the PL method is quite small, with the largest relative magnitude of 7.8%. The bias of the LM method, however, remains high, and it grows with the dependence level. More sites did not help, especially for strong dependence. The relative bias is 17.5% for ξ and 20.6% for Q₅₀₀ under strong dependence and m = 20. This behavior of the LM method is in contrast to the existing result that intersite dependence does not introduce bias in RFA [Hosking and Wallis, 1988]. This may be explained by the fact that the simulation here was done with data generated from max-stable processes, which ensures that all marginal copulas are extreme-value copulas. In existing studies, however, data were generated mostly with normal copulas, which are not extreme-value copulas.

Figure 1. Relative bias (%) and relative RMSE (%) for the three methods with data from the GG model.

The RMSE for all methods decreases as n increases, as expected. Between m = 10 and m = 20, little difference was observed in RMSE. We also tried m = 5 (not reported here) and found that the relative RMSEs did decrease when m increased from 5 to 10. This indicates that increasing the number of sites improves efficiency for smaller m, but only up to a certain point, an observation consistent with the findings in Hosking and Wallis [1988]. As the dependence gets stronger, the RMSE increases for all methods, but the magnitude of the change is the smallest for the PL method. This is because, under correct specification, the PL method incorporates spatial dependence in the estimation while the other two methods do not. For both sample sizes, the PL method is the clear winner among the three. The comparison between the LM method and the IL method is mixed. For n = 25, the LM method is less efficient in parameter estimates but more efficient in some return level estimates than the IL method. This is possible because the return levels are nonlinear transformations of the parameters; see equation (2). For n = 10, the LM method performs better than the IL method in most of the return level estimates, even though it is less efficient in estimating γ and comparable in estimating μ and ξ. Further investigation revealed that the variance of the LM estimator is much smaller than that of the IL estimator, which compensates for the larger bias (especially in ξ) of the LM estimator. This makes sense because the L-moment estimator is known to carry the restriction ξ < 1, which makes it more efficient than the likelihood method for small samples when the restriction holds. In an earlier version under a slightly different simulation design, we considered larger sample sizes (not reported here) and found that, for large sample sizes, the efficiency order was PL, IL, and LM from the highest to the lowest.
Among the three GEV parameters, the efficiency gain of the PL method relative to the IL method, measured by the relative efficiency (RE) in mean squared error, was always the greatest for the shape parameter ξ (in one case the RE was as large as 2.76); the REs for μ and γ were close to 1. That is, the efficiency gain of the PL method is mostly realized in ξ, which controls the tail behavior of the GEV distribution. This leads to the efficiency gain of the PL method in estimating the return levels, especially for longer return periods such as 500 years.
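The two summary measures used in this comparison can be sketched as follows. This is a minimal illustration, assuming that relative bias and relative RMSE are the bias and the root mean squared error of the Monte Carlo estimates divided by the true value and expressed in percent; the function name and the toy replicates are hypothetical and are not the simulation results of the paper.

```python
import numpy as np

def relative_bias_rmse(estimates, truth):
    """Return (relative bias %, relative RMSE %) of Monte Carlo estimates.

    Assumed definitions: bias and RMSE divided by |truth|, times 100.
    """
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - truth
    rmse = np.sqrt(np.mean((estimates - truth) ** 2))
    return 100.0 * bias / abs(truth), 100.0 * rmse / abs(truth)

# Toy replicates of a shape-parameter estimator (made-up numbers)
rng = np.random.default_rng(0)
true_xi = 0.2
xi_hat = true_xi + 0.03 + 0.05 * rng.standard_normal(1000)
rel_bias, rel_rmse = relative_bias_rmse(xi_hat, true_xi)
print(f"relative bias = {rel_bias:.1f}%, relative RMSE = {rel_rmse:.1f}%")
```

Because the squared RMSE is the sum of the squared bias and the variance, the relative RMSE can never be smaller than the absolute relative bias, which is why an estimator with larger bias can still win in RMSE when its variance is small enough.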

Misspecification of PL Within Spatial Extreme Models

The efficiency gain of the PL method relative to the IL method comes with a cost: one needs to specify the spatial dependence model. We first look at the results for cases where misspecification is within the class of spatial extreme models; that is, one max-stable model is misspecified as another max-stable model. Figure 2 summarizes the relative efficiency (RE) in mean squared error of the PL method in estimating the GEV parameters and return levels, using the IL method as the reference (RE is the ratio of the MSE of IL over the MSE of PL), under both correct specification and misspecification within spatial extreme models for n = 25 and m = 10. When an SM model was misspecified as a GG model, or vice versa, the resulting estimator was almost as efficient as that under correct specification, and was much more efficient than the IL estimator. This is because the SM model and the GG model are very similar, as is evident from their similar bivariate distributions given in Section 3.2. When an SM or GG model was misspecified as an SC model, the resulting PL estimator was comparable with the IL estimator for weak dependence, but more efficient than the IL estimator for moderate or strong dependence. This is as expected because, in contrast to the other two models, the SC model does not provide a full range of dependence and, therefore, cannot accommodate weak dependence or near independence. When an SC model was misspecified as an SM or GG model, the resulting PL estimator remained competitive compared to the IL estimator, especially for the cases with stronger spatial dependence. Among the three parametric models, both the SM model and the GG model offer a full range of dependence levels and high efficiency under misspecification, but since the SM model produces shapes of extreme events that are too regular to be observed in practice [e.g., Schlather, ²⁰⁰²], we recommend using the GG model.
This is also why we presented the results for data generated from the GG model only in the main text.
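The relative efficiency described above (MSE of the IL estimator over MSE of the PL estimator) can be computed from simulation replicates as in the following sketch. The function names are hypothetical and the inputs are placeholders for replicate estimates of any one parameter or return level.

```python
import numpy as np

def mse(estimates, truth):
    """Mean squared error of replicate estimates around the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean((estimates - truth) ** 2))

def relative_efficiency(est_reference, est_candidate, truth):
    """MSE of the reference (e.g., IL) over MSE of the candidate (e.g., PL).

    Values above 1 favor the candidate; values below 1 favor the reference.
    """
    return mse(est_reference, truth) / mse(est_candidate, truth)
```

For example, `relative_efficiency(il_estimates, pl_estimates, true_value)` larger than 1 would indicate that the PL estimator has the smaller mean squared error in that scenario.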

Figure 2. Relative efficiency (RE) of the PL method (with the IL method as reference) under correct specification and misspecification within the class of extreme-value dependence models with n = 25 and m = 10. The grouped variable is the model that generated the data, and the line in each panel represents the corresponding fitted model.

Misspecification of PL Under Nonextreme-Value Model

What if the true dependence model is a nonextreme-value copula but we fit an extreme-value dependence model? Figure 3 summarizes the results for data generated from the GA model, with the PL method specified under a GG model. The misspecified PL estimator has small bias in μ and γ, but large bias in ξ (as high as 50% for m = 20 and n = 25), which is much larger than that of the LM estimator or the IL estimator. The bias increases as the dependence level gets stronger, and having more sites does not help. This large bias played a major part in the RMSE of the PL estimator; the REs for ξ and the return levels are as small as 0.51. The LM estimator is better than the IL estimator for all return levels. The performance of the LM method is similar to what was reported in Hosking and Wallis [1988]: its relative bias becomes appreciable (9.8% for Q₅₀₀) only under the strong dependence level. This is reasonable because this scenario is the closest to the data generation scheme of Hosking and Wallis [1988], where a relative bias of up to 5% for Q₁₀₀₀ was reported. Therefore, spatial dependence does not affect the LM estimator under a normal copula as much as it does under extreme-value copulas. For instance, the bias of the LM estimator in estimating Q₅₀₀ with data from the GG model is about twice as large as that with data from the GA model: 33.6% versus 17.5% when n = 10 and 20.6% versus 9.8% when n = 25.
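To make the nonextreme-value scenario concrete, the sketch below simulates site maxima with GEV margins joined by a normal copula. It is an illustration under stated assumptions, not the simulation design of the paper: the exponential correlation function, its range, the site coordinates, and the parametrization scale = γ × location are all assumptions made here, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm, genextreme

def simulate_gaussian_copula_gev(coords, n_years, mu, gamma, xi, corr_range, rng):
    """Simulate an (n_years x n_sites) array: GEV margins, normal copula."""
    coords = np.asarray(coords, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Pairwise distances and an exponential correlation (assumed form)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    corr = np.exp(-dist / corr_range)
    # Correlated standard normals, one row per year
    z = rng.multivariate_normal(np.zeros(len(coords)), corr, size=n_years)
    u = norm.cdf(z)          # uniform margins with normal-copula dependence
    sigma = gamma * mu       # assumed index flood scaling of the GEV scale
    # scipy's genextreme uses the shape convention c = -xi
    return genextreme.ppf(u, c=-xi, loc=mu, scale=sigma)

rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(10, 2))   # hypothetical site locations
mu = np.linspace(20, 30, 10)                 # hypothetical site locations of the GEV
x = simulate_gaussian_copula_gev(coords, 25, mu, 0.35, 0.2, 50.0, rng)
```

Data generated this way have the correct GEV margins at every site, but their pairwise dependence is not of extreme-value type, which is the kind of misspecification discussed in this section.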

Figure 3. Relative bias (%) and relative RMSE (%) for three methods with data from the GA model. The PL method uses a GG model specification.

In summary, incorporating spatial dependence in the index flood model through max-stable processes may improve the efficiency of RFA, but at the cost of having to specify the dependence model. The LM method and the IL method do not need to do so, which makes them attractive when no evidence supports extreme-value dependence. Misspecification of the PL method can lead to undesirably large bias. If, however, extreme-value copulas are known to correctly specify the dependence structure, or are found to fit the data adequately by a goodness-of-fit test, then the PL method may be preferred because it exploits the dependence structure to give a more efficient RFA. Essentially, it is still a story of bias-variance trade-off. To reap the potential efficiency gain in practice, one must check the goodness-of-fit of max-stable processes, which has been studied recently [Kojadinovic et al., ²⁰¹⁴], in addition to the goodness-of-fit tests for the marginal GEV models and the homogeneity assumption on the GEV parameters.
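The bias-variance trade-off mentioned here can be written out explicitly. The display below is the standard decomposition of the mean squared error of a generic estimator, together with the relative efficiency as defined earlier (MSE of IL over MSE of PL); the symbol θ for a generic parameter or return level is notation introduced here for illustration.

```latex
\mathrm{MSE}(\hat\theta)
  = E\{(\hat\theta - \theta)^2\}
  = \{E(\hat\theta) - \theta\}^2 + \mathrm{Var}(\hat\theta),
\qquad
\mathrm{RE}
  = \frac{\mathrm{MSE}(\hat\theta_{\mathrm{IL}})}{\mathrm{MSE}(\hat\theta_{\mathrm{PL}})}.
```

A correctly specified dependence model reduces the variance term, whereas a misspecified one can inflate the squared-bias term enough to push RE below 1.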

Illustration

For illustration, we applied the index flood model with all three estimation methods to the Swiss rainfall data, which have been analyzed by many authors in modeling spatial extremes [e.g., Davison et al., ²⁰¹²]. The data consist of summer maximum daily precipitation (mm) for 51 stations over the years 1962–2008 in the Plateau region of Switzerland; they are available in the R package SpatialExtremes [Ribatet and Singleton, ²⁰¹³]. To make a more realistic RFA, we used only the last 25 years of data, from 1984 to 2008 (n = 25). We further filtered the sites in an attempt to improve the chance that the PL method gives reliable inferences. As we pointed out in the discussion of the simulation study, the higher efficiency of the PL method is only achievable when the bivariate marginal distributions are correctly specified; otherwise, the PL method could lead to serious bias. The index flood model we considered assumes that all the marginal distributions are GEV distributions with the same shape parameter and the same ratio of location parameter and scale parameter. These assumptions are shared by all three methods, but the PL method assumes additionally that the dependence structure can be captured by a max-stable (e.g., a geometric Gaussian) process. The goodness-of-fit of the geometric Gaussian process on these data has been checked graphically [Davison et al., ²⁰¹²]. A formal goodness-of-fit test for max-stable process models did not reject the model for these data [Kojadinovic et al., ²⁰¹⁴]. The test, however, was applied to the whole region globally, and may have low power in detecting local lack-of-fit. Therefore, we further applied the test for bivariate extreme-value dependence [Kojadinovic et al., ²⁰¹¹] to all the pairs.

We filtered the sites by three tests on the model assumptions: (1) the goodness-of-fit of the univariate GEV distribution was not rejected at any single site by a Kolmogorov–Smirnov test; (2) the homogeneity hypothesis (the ratio γ_s of the scale parameter to the location parameter and the shape parameter ξ_s are both constant across sites) was not rejected by a nonparametric bootstrap test procedure that preserves the spatial dependence [Heffernan and Tawn, ²⁰⁰⁴]; and (3) the hypothesis of bivariate extreme-value dependence was not rejected for any pair of the sites by a nonparametric test proposed by Kojadinovic et al. [2011]. Note that the first two are needed by all three methods, but the third is only needed by the PL method. The third test was an additional check on model specification, given that the geometric Gaussian process is already known to fit these data well [Davison et al., ²⁰¹²; Kojadinovic et al., ²⁰¹⁴]. For a different data set, it will also be necessary to run model diagnostics and a global goodness-of-fit test. This process ended up with 11 sites; see the map in Figure 4.
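The first of the three filters can be sketched as below: fit a GEV at each site and keep the sites at which a Kolmogorov–Smirnov test does not reject. This is only an illustration with hypothetical function names; it fits by maximum likelihood with SciPy, and because the parameters are estimated from the same data, the reported p-values are approximate (the test tends to be conservative). The homogeneity and pairwise extreme-value dependence tests are not reproduced here.

```python
import numpy as np
from scipy.stats import genextreme, kstest

def gev_ks_pvalues(maxima):
    """KS p-value of a fitted GEV at each site (columns of `maxima`)."""
    maxima = np.asarray(maxima, dtype=float)
    pvals = []
    for s in range(maxima.shape[1]):
        x = maxima[:, s]
        c, loc, scale = genextreme.fit(x)        # ML fit; note c = -xi in SciPy
        fitted = genextreme(c, loc=loc, scale=scale)
        pvals.append(kstest(x, fitted.cdf).pvalue)
    return np.array(pvals)

def sites_passing(maxima, alpha=0.05):
    """Indices of sites where the GEV fit is not rejected at level alpha."""
    return np.flatnonzero(gev_ks_pvalues(maxima) >= alpha)
```

With a years-by-sites array of annual (or seasonal) maxima, `sites_passing(maxima)` returns the column indices that survive this first filter.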

Figure 4. Elevation map of Switzerland with the 11 stations that were used in the Swiss rainfall analysis. The 11 stations are marked by triangles, and the dots represent cities in Switzerland.

We fitted the index flood model with marginal GEV distribution to the 25 year data of the 11 sites with all three methods. The PL method was carried out under the same GG model that was used in the simulation study; that is, it had a Gaussian correlation function with a single range parameter and Inline graphic was fixed. The standard errors of all the parameter estimates were obtained with a spatial-dependence-preserving bootstrap procedure [Heffernan and Tawn, ²⁰⁰⁴] with 1000 bootstrap samples.
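One simple way to keep the intersite dependence intact in a bootstrap is to resample whole years, so that the values observed at all sites in the same year always stay together. The sketch below does this and returns bootstrap standard errors and percentile confidence bounds. It is an assumption made for illustration and may differ in detail from the cited procedure; `estimator` is a hypothetical user-supplied function that maps a years-by-sites array to a scalar or a vector of estimates.

```python
import numpy as np

def year_bootstrap(maxima, estimator, n_boot=1000, level=0.95, seed=0):
    """Bootstrap by resampling years (rows) jointly across all sites.

    Returns (standard error, lower bound, upper bound), where the bounds
    are percentiles of the bootstrap estimates.
    """
    maxima = np.asarray(maxima, dtype=float)
    rng = np.random.default_rng(seed)
    n_years = maxima.shape[0]
    boot = []
    for _ in range(n_boot):
        rows = rng.integers(0, n_years, size=n_years)  # years, with replacement
        boot.append(estimator(maxima[rows]))
    boot = np.asarray(boot)
    alpha = 1.0 - level
    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2], axis=0)
    return boot.std(axis=0, ddof=1), lower, upper
```

With `level=0.95` the bounds are the 2.5% and 97.5% percentiles of the bootstrap estimates, so the resulting interval need not be symmetric around the point estimate.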

Figure 5 summarizes the point estimate and 95% confidence interval (CI) for each parameter and three site-specific return levels (50, 100, and 500 years) from the three methods. The bounds of the 95% CI were the 2.5% and 97.5% percentiles of the 1000 bootstrap estimates, respectively. The point estimates from the three methods are similar for the μ_s's, but quite different for γ and ξ. For γ, the estimates are 0.331 (s.e. 0.021), 0.392 (s.e. 0.018), and 0.389 (s.e. 0.018) for LM, IL, and PL, respectively. For ξ, the estimates are 0.345 (s.e. 0.063), 0.212 (s.e. 0.053), and 0.148 (s.e. 0.054) for LM, IL, and PL, respectively. The shape parameter is estimated to be significantly different from zero regardless of the method, suggesting that the tails of the GEV distributions are heavier than the tail of the Gumbel distribution. The differences have a drastic effect on return level estimates. Consider, for example, the 11th site. The estimates of Q₅₀₀ for this site are 228.39 (s.e. 40.17), 171.28 (s.e. 30.44), and 139.27 (s.e. 24.52), respectively, from the LM method, the IL method, and the PL method. The reduction in the standard error from the PL method is remarkable: the standard error of Q₅₀₀ from the PL method is about 40% smaller than that from the LM method, and 20% smaller than that from the IL method. The bounds of the 95% bootstrap CIs are asymmetric around the point estimates, most notably for γ and ξ. The asymmetry appears to be in opposite directions for the LM method and the PL method, making the overlaps of the CIs larger than those of the symmetric CIs constructed from the bootstrap standard errors. The CIs for the return levels are the shortest for the PL method, followed by the IL method and then the LM method. We emphasize that the short CIs from the PL method do come at a cost: we had to specify the dependence model as a geometric Gaussian process with a Gaussian correlation structure.
The estimate of the range parameter in the dependence model is Inline graphic (s.e. 92.31). We went through extra steps in filtering the sites to be included in the analysis, beyond the model checking already reported in the literature for these data, which turned out to be worthwhile.

Figure 5. Estimated parameters and return levels (in mm) along with their 95% confidence intervals from the bootstrap procedure for the Swiss rainfall data. The PL method used a geometric Gaussian model with a Gaussian correlation function for the spatial dependence.

The systematically lower return level estimates from the PL method compared with those from the LM method may be explained from two perspectives. First, the return level is a function of all three parameters of the GEV distribution but is most sensitive to the shape parameter. The location parameter estimates are similar across the methods. The scale parameter estimates and the shape parameter estimates, however, tend to compensate for each other: a higher scale parameter estimate is accompanied by a lower shape parameter estimate. The LM method has a lower estimate of γ, hence a lower scale estimate, and a higher estimate of ξ, which led to the higher return level estimates. Second, the shape parameter seems to have the most room for efficiency improvement, as seen in the simulation (Figure 2). The efficiency gain of the PL method, assuming that the PL method is correctly specified, holds only on average over replicates. For a single data set, the truth is unknown, and a lower point estimate of the shape parameter from the PL method than from the IL method is quite plausible, noting that the 95% bootstrap confidence intervals from the two methods overlap substantially.
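The sensitivity of the return level to the shape parameter can be checked numerically with the standard GEV quantile formula. The sketch below assumes the parametrization scale = γ × location (an assumption made here, which approximately reproduces the return levels listed in Table 1 from the tabulated parameter estimates); the function name is hypothetical and the inputs are the rounded full-data estimates for the first site in Table 1.

```python
import math

def gev_return_level(T, mu, gamma, xi):
    """T-year return level of a GEV with location mu, scale gamma * mu, shape xi != 0."""
    sigma = gamma * mu                     # assumed index flood scaling
    y = -math.log(1.0 - 1.0 / T)           # -log of the non-exceedance probability
    return mu + sigma * (y ** (-xi) - 1.0) / xi

# Full-data PL estimates, first site in Table 1 (tabulated Q50: 61.6 mm)
q50_pl = gev_return_level(50, mu=20.5, gamma=0.355, xi=0.178)

# Full-data LM estimates, same site (tabulated Q500: 133.9 mm)
q500_lm = gev_return_level(500, mu=20.5, gamma=0.338, xi=0.274)

# Same location, PL scale and shape: a lower shape gives a much lower Q500
q500_pl = gev_return_level(500, mu=20.5, gamma=0.355, xi=0.178)
print(round(q50_pl, 1), round(q500_lm, 1), round(q500_pl, 1))
```

Small differences from the tabulated values are expected because the inputs are rounded. The comparison of the two 500-year levels shows how a higher shape estimate dominates a lower scale estimate at long return periods.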

The analysis so far is based on the last 25 years of data, and the availability of the whole 47 years of data enabled us to compare the performance of the three methods more thoroughly in other ways. We randomly selected 100 subsets of 25 years of data, and for each subset we ran the same analysis as we did for the last 25 years. The same analysis was also repeated on the full 47 years of data. Table 1 summarizes the results from these analyses. Site-specific estimates are only presented for the two sites with the smallest and the largest estimates of μ_s from the full data analysis. Point estimates and bootstrap standard errors are reported for the full data analysis. For the 100 subset analyses, we report the average of the point estimates, the average of the bootstrap standard errors, and the standard deviation of the point estimates. It is reassuring that the point estimates from the full data analysis are very close to the averages of those from the 100 subset analyses for all parameters and all three methods. For the estimates of the shape parameter and the return levels, the PL method always has the smallest standard error while the LM method always has the largest standard error, in both the full data analysis and the average of the subset analyses. This is consistent with the results from the last 25 years of data. The standard deviation of the 100 subset point estimates of the shape parameter and the return levels shows the same pattern, but the difference in magnitude is even more obvious. For instance, the standard deviation of the 100 subset estimates of Q₅₀₀ is 68.0, 49.3, and 31.9 for LM, IL, and PL, respectively.
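The subset comparison can be organized as in the sketch below, which draws random subsets of years, applies an estimator to each, and summarizes the resulting point estimates by their average and standard deviation. It assumes that each subset consists of 25 distinct years drawn at random without replacement (the text does not say whether the years were contiguous); `estimator` and the function name are hypothetical.

```python
import numpy as np

def subset_analysis(maxima, estimator, n_subsets=100, subset_size=25, seed=0):
    """Average and standard deviation of point estimates over random year subsets."""
    maxima = np.asarray(maxima, dtype=float)
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_subsets):
        # Distinct years, drawn without replacement (an assumption)
        years = rng.choice(maxima.shape[0], size=subset_size, replace=False)
        estimates.append(estimator(maxima[np.sort(years)]))
    estimates = np.asarray(estimates)
    return estimates.mean(axis=0), estimates.std(axis=0, ddof=1)
```

Running this once per estimation method on the same seed gives subset-to-subset variability figures that can be compared across methods on identical subsets.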

Table 1.

Point Estimate and Bootstrap Standard Error for the Full 47 Years Data Analysis (Abbreviated as Full), the Average Point Estimate and Average Bootstrap Standard Error Based on 100 Subsets of 25 Years (Abbreviated as Ave), and the Standard Deviation of the 100 Point Estimates

Parameter	Site	Est. LM (Full)	Est. LM (Ave)	Est. IL (Full)	Est. IL (Ave)	Est. PL (Full)	Est. PL (Ave)	SE LM (Full)	SE LM (Ave)	SE IL (Full)	SE IL (Ave)	SE PL (Full)	SE PL (Ave)	SD LM	SD IL	SD PL
μ (mm)	smallest μ	20.5	20.7	20.4	20.5	20.5	20.6	0.43	0.55	0.42	0.55	0.41	0.55	1.31	1.28	1.26
μ (mm)	largest μ	26.5	26.6	26.5	26.6	26.4	26.5	0.49	0.65	0.46	0.61	0.45	0.60	1.70	1.61	1.53
Q₅₀ (mm)	smallest μ	68.9	67.8	66.0	65.4	61.6	61.3	4.4	6.1	4.0	5.7	3.6	5.3	12.4	12.2	9.2
Q₅₀ (mm)	largest μ	88.9	87.3	85.7	84.8	79.2	79.0	5.8	8.1	5.4	7.7	4.9	7.1	16.8	15.4	11.6
Q₁₀₀ (mm)	smallest μ	84.4	83.7	79.3	78.7	72.5	72.2	6.9	10.1	6.0	8.8	5.4	8.0	19.7	17.6	12.6
Q₁₀₀ (mm)	largest μ	108.9	107.8	102.8	102.1	93.2	93.0	9.2	13.2	8.1	11.8	7.2	10.6	26.3	22.5	16.1
Q₅₀₀ (mm)	smallest μ	133.9	137.6	119.0	119.8	103.3	103.8	17.8	29.2	14.1	21.6	12.1	18.7	51.5	38.4	24.8
Q₅₀₀ (mm)	largest μ	172.8	177.5	154.4	155.3	132.9	133.7	23.5	38.3	18.8	28.6	16.0	24.7	68.0	49.3	31.9
γ	all sites	0.338	0.327	0.358	0.353	0.355	0.351	0.014	0.021	0.012	0.017	0.012	0.017	0.030	0.026	0.026
ξ	all sites	0.274	0.256	0.223	0.205	0.178	0.168	0.045	0.062	0.037	0.053	0.038	0.053	0.133	0.097	0.071

Est. = point estimate; SE = bootstrap standard error; SD = standard deviation of the 100 subset point estimates.


Another interesting finding concerns the ratio of the standard errors from the full data analysis to those from the subset analyses. More data are associated with smaller standard errors in theory. For most parameter estimates and all methods, the ratios of the standard error from the full 47 years of data to the average standard error from the 25-year subsets are close to √(25/47) ≈ 0.73. This is expected for large samples, which suggests that the large sample results approximate the finite sample results quite well even for a sample size of 25 in this data analysis. We also studied the ratios of the 95% bootstrap CI lengths, most of which were higher than 0.73 due to the asymmetry in the CIs.
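The benchmark for this ratio follows from the usual large-sample scaling of a standard error with the inverse square root of the record length, which gives √(25/47) when comparing 47 years with 25 years. The short check below computes that benchmark and a few observed ratios; the standard errors are read from the PL columns of Table 1, and the variable names are illustrative.

```python
import math

# Large-sample benchmark: SE proportional to 1 / sqrt(record length)
expected_ratio = math.sqrt(25 / 47)

# (full-data SE, average subset SE) for the PL method, from Table 1
pl_se = {
    "xi": (0.038, 0.053),
    "gamma": (0.012, 0.017),
    "Q50, first site": (3.6, 5.3),
}
ratios = {name: full / ave for name, (full, ave) in pl_se.items()}

print(f"benchmark: {expected_ratio:.3f}")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

The tabulated standard errors are rounded, so the observed ratios scatter around the benchmark rather than matching it exactly.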

Discussion

This paper explores the idea of incorporating intersite dependence in RFA with index flood models to improve the efficiency of estimation. The efficiency gain comes at the cost of having to specify the dependence model in addition to the usual specifications such as the marginal distributions and the regional homogeneity assumption. When the dependence model is correctly specified, smaller standard errors and narrower confidence intervals can be obtained for model parameters and return levels. Misspecification of the dependence model, however, may result in serious bias, especially when the true dependence model is not of extreme-value type and the dependence is strong. This makes it important to check the goodness-of-fit of the dependence structure, in addition to the usual checks for marginal goodness-of-fit and regional homogeneity, to reap the efficiency gain. The L-moment method and the independence likelihood method may sometimes be preferable because they have no need for dependence model specification. The L-moment method implicitly constrains the shape parameter to be less than 1 for the existence of the L-moments, which gives an efficient estimator for small samples when the constraint does hold. Because an extreme-value copula can be very different from a nonextreme-value copula (e.g., the normal copula), it has more effect on the bias of the L-moment estimator than does the normal copula for typical record lengths in RFA, a result that has not previously been reported. The independence likelihood method may give unreasonable estimates in small samples unless it is modified to impose a similar constraint, but for large samples, it is more efficient than the L-moment method.

Spatial dependence in RFA is often a nuisance because the goal of an RFA is usually to estimate marginal return levels. With marginal GEV distributions, it is desirable to improve the efficiency without specifying a spatial dependence model. Specification and selection of a working dependence model can be avoided by a combined estimating equation approach based on data contrasts for clustered data [Stoner and Leroux, ²⁰⁰²]. For spatial data, estimating equation approaches have also been applied to marginal models with no need to correctly specify the dependence structure [Yasui and Lele, ¹⁹⁹⁷; Clayton and Lin, ²⁰⁰⁵; Lin, ²⁰⁰⁸, ²⁰¹⁰]. The marginal score equations can be combined in a certain way to improve the efficiency [Nikoloulopoulos et al., ²⁰¹¹]. Further research in this direction may benefit not just RFA but general spatial extreme modeling.

Acknowledgments

Z. Wang's graduate assistantship was partially supported by a contract with Environment Canada. The Swiss rainfall data are publicly available from the open source R package SpatialExtremes (http://CRAN.R-project.org/package=SpatialExtremes).

Key Points

Exploiting spatial dependence improves the efficiency of RFA
Larger gain in efficiency is possible when the spatial dependence is stronger
Estimation could be biased if the dependence model is severely misspecified

References

Banerjee S, Carlin BP, Gelfand AE. Hierarchical Modeling and Analysis for Spatial Data. New York: Chapman and Hall; 2004. 452 pp.
Blanchet J, Davison AC. Spatial modeling of extreme snow depth. Ann. Appl. Stat. 2011;5(3):1699–1725.
Buishand TA. Extreme rainfall estimation by combining data from several sites. Hydrol. Sci. J. 1991;36:345–365.
Castellarin A. Probabilistic envelope curves for design flood estimation at ungauged sites. Water Resour. Res. 2007;43:W04406. doi: 10.1029/2005WR004384.
Castellarin A, Vogel RM, Matalas NC. Probabilistic behavior of a regional envelope curve. Water Resour. Res. 2005;41:W06018. doi: 10.1029/2004WR003042.
Castellarin A, Burn D, Brath A. Homogeneity testing: How homogeneous do heterogeneous cross-correlated regions seem? J. Hydrol. 2008;360(1–4):67–76.
Chebana F, Ouarda T. Index flood–based multivariate regional frequency analysis. Water Resour. Res. 2009;45:W10435. doi: 10.1029/2008WR007490.
Clayton MK, Lin P-S. Analysis of binary spatial data by quasi-likelihood estimating equations. Ann. Stat. 2005;33(2):542–555.
Coles SG, Dixon MJ. Likelihood-based inference for extreme value models. Extremes. 1999;2:5–23.
Cunderlik JM, Ouarda TB. Regional flood-duration–frequency modeling in the changing environment. J. Hydrol. 2006;318(1):276–291.
Davison AC, Gholamrezaee MM. Geostatistics of extremes. Proc. R. Soc. A. 2012;468:581–608.
Davison AC, Padoan SA, Ribatet M. Statistical modeling of spatial extremes. Stat. Sci. 2012;27(2):161–186.
de Haan L. A spectral representation for max-stable processes. Ann. Probab. 1984;12:1194–1204.
Embrechts P, McNeil A, Straumann D. Correlation and dependence in risk management: Properties and pitfalls. In: Dempster MAH, editor. Risk Management: Value at Risk and Beyond. Cambridge, U. K.: Cambridge Univ. Press; 2002. pp. 176–223.
Genest C, Favre A-C. Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng. 2007;12:347–368.
Griffis V, Stedinger J. The use of GLS regression in regional hydrologic analyses. J. Hydrol. 2007;344(1–2):82–95. doi: 10.1016/j.jhydrol.2007.06.023.
Gudendorf G, Segers J. Extreme-value copulas. In: Jaworski P, editor. Copula Theory and Its Applications. Springer Berlin Heidelberg; 2010. pp. 127–145.
Hanel M, Buishand TA, Ferro C. A nonstationary index flood model for precipitation extremes in transient regional climate model simulations. J. Geophys. Res. 2009;114:D15107. doi: 10.1029/2009JD011712.
Heffernan JE, Tawn JA. A conditional approach for multivariate extreme values (with discussion). J. R. Stat. Soc., Ser. B. 2004;66(3):497–546.
Hosking JRM. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. J. R. Stat. Soc., Ser. B. 1990;52:105–124.
Hosking JRM, Wallis JR. The effect of intersite dependence on regional flood frequency analysis. Water Resour. Res. 1988;24:588–600.
Hosking JRM, Wallis JR. Some statistics useful in regional frequency analysis. Water Resour. Res. 1993;29:271–281.
Hosking JRM, Wallis JR. Regional Frequency Analysis: An Approach Based on L-moments. Cambridge, U. K.: Cambridge Univ. Press; 1997. 224 pp.
Javelle P, Ouarda TB, Lang M, Bobée B, Galéa G, Grésillon J-M. Development of regional flood-duration–frequency curves based on the index-flood method. J. Hydrol. 2002;258(1):249–259.
Kojadinovic I, Segers J, Yan J. Large-sample tests of extreme-value dependence for multivariate copulas. Can. J. Stat. 2011;39(4):703–720.
Kojadinovic I, Shang H, Yan J. A class of goodness-of-fit tests for spatial extremes models based on max-stable processes. Stat. Interface. 2014; in press.
Kysely J, Picek J. Regional growth curves and improved design value estimates of extreme precipitation events in the Czech Republic. Clim. Res. 2007;33(3):243–255.
Leclerc M, Ouarda TB. Non-stationary regional flood frequency analysis at ungauged sites. J. Hydrol. 2007;343(3):254–265.
Lettenmaier DP, Wallis J, Wood E. Effect of regional heterogeneity on flood frequency estimation. Water Resour. Res. 1987;23:313–323.
Lin P-S. Estimating equations for spatially correlated data in multi-dimensional space. Biometrika. 2008;95(4):847–858.
Lin P-S. A working estimating equation for spatial count data. J. Stat. Plann. Inference. 2010;140(9):2470–2477.
Lindsay BG. Composite likelihood methods. In: Prabhu NU, editor. Statistical Inference from Stochastic Processes. Providence, R. I.: Am. Math. Soc.; 1988. pp. 221–239.
Martins ES, Stedinger JR. Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resour. Res. 2000;36:737–744. doi: 10.1029/1999WR900330.
Matalas NC, Langbein WB. Information content of the mean. J. Geophys. Res. 1962;67(9):3441–3448.
Ngongondo CS, Xu C-Y, Tallaksen LM, Alemaw B, Chirwa T. Regional frequency analysis of rainfall extremes in Southern Malawi using the index rainfall and L-moments approaches. Stochastic Environ. Res. Risk Assess. 2011;25(7):939–955.
Nikoloulopoulos AK, Joe H, Chaganty NR. Weighted scores method for regression models with dependent data. Biostatistics. 2011;12:653–665. doi: 10.1093/biostatistics/kxr005.
Northrop P. Likelihood-based approaches to flood frequency estimation. J. Hydrol. 2004;292:96–113.
Ouarda T, Cunderlik J, St-Hilaire A, Barbet M, Bruneau P, Bobée B. Data-based comparison of seasonality-based regional flood frequency methods. J. Hydrol. 2006;330(1):329–339.
Ouarda TB, Haché M, Bruneau P, Bobée B. Regional flood peak and volume estimation in Northern Canadian Basin. J. Cold Reg. Eng. 2000;14(4):176–191.
Ouarda TBMJ. Hydrological frequency analysis, regional. In: El-Shaarawi A, Piegorsch W, editors. Encyclopedia of Environmetrics. 2nd ed. Chichester, U. K.: John Wiley; 2013. doi: 10.1002/9780470057339.vnn043.
Padoan SA, Ribatet M, Sisson SA. Likelihood-based inference for max-stable processes. J. Am. Stat. Assoc. 2010;105(489):263–277.
Ribatet M, Singleton R. SpatialExtremes: Modelling Spatial Extremes. R package version 1.9-1. 2013. [Available at http://CRAN.R-project.org/package=SpatialExtremes.]
Schlather M. Models for stationary max-stable random fields. Extremes. 2002;5(1):33–44.
Schlather M, Tawn JA. A dependence measure for multivariate and spatial extreme values: Properties and inference. Biometrika. 2003;90(1):139–156.
Sklar AW. Fonctions de répartition à n dimension et leurs marges. Publ. Inst. Stat. Univ. Paris. 1959;8:229–231.
Stedinger JR. Estimating a regional flood frequency distribution. Water Resour. Res. 1983;19:503–510. doi: 10.1029/WR019i002p00503.
Stoner JA, Leroux BG. Analysis of clustered data: A combined estimating equations approach. Biometrika. 2002;89(3):567–578.
Troutman BM, Karlinger MR. Regional flood probabilities. Water Resour. Res. 2003;39(4):1095. doi: 10.1029/2001WR001140.
Varin C. On composite marginal likelihoods. Adv. Stat. Anal. 2008;92(1):1–28.
Varin C, Reid N, Firth D. An overview of composite likelihood methods. Stat. Sinica. 2011;21(1):5–42.
Viglione A, Castellarin A, Rogger M, Merz R, Blöschl G. Extreme rainstorms: Comparing regional envelope curves to stochastically generated events. Water Resour. Res. 2012;48:W01509. doi: 10.1029/2011WR010515.
Wang Z, Yan J, Zhang X. Incorporating spatial dependence in regional frequency analysis. Tech. Rep. 9, Dep. of Stat., Univ. of Conn., Storrs, Conn.; 2014.
Westra S, Sisson SA. Detection of non-stationarity in precipitation extremes using a max-stable process model. J. Hydrol. 2011;406(1):119–128.
Yasui Y, Lele S. A regression method for spatial disease rates: An estimating function approach. J. Am. Stat. Assoc. 1997;92:21–32.

[b1] Banerjee S, Carlin BP. Gelfand AE. Hierarchical Modeling and Analysis for Spatial Data. N. Y: Chapman and Hall; 2004. p. 452. and pp. [Google Scholar]

[b2] Blanchet J. Davison AC. Spatial modeling of extreme snow depth. Ann. Appl. Stat. 2011;5(3):1699–1725. [Google Scholar]

[b3] Buishand TA. Extreme rainfall estimation by combining data from several sites. Hydrol. Sci. J. 1991;36:345–365. [Google Scholar]

[b4] Castellarin A. Probabilistic envelope curves for design flood estimation at ungauged sites. Water Resour. Res. 2007;43:W04406. doi: 10.1029/2005WR004384. [Google Scholar]

[b5] Castellarin A, Vogel RM. Matalas NC. Probabilistic behavior of a regional envelope curve. Water Resour. Res. 2005;41:W06018. and, doi: 10.1029/2004WR003042. [Google Scholar]

[b6] Castellarin A, Burn D. Brath A. Homogeneity testing: How homogeneous do heterogeneous cross-correlated regions seem? J. Hydrol. 2008;360(1–4):67–76. [Google Scholar]

[b7] Chebana F. Ouarda T. Index flood–based multivariate regional frequency analysis. Water Resour. Res. 2009;45:W10435. and, doi: 10.1029/2008WR007490. [Google Scholar]

[b8] Clayton MK. Lin P-S. Analysis of binary spatial data by quasi-likelihood estimating equations. Ann. Stat. 2005;33(2):542–555. [Google Scholar]

[b9] Coles SG. Dixon MJ. Likelihood-based inference for extreme value models. Extremes. 1999;2:5–23. [Google Scholar]

[b10] Cunderlik JM. Ouarda TB. Regional flood-duration–frequency modeling in the changing environment. J. Hydrol. 2006;318(1):276–291. [Google Scholar]

[b11] Davison AC. Gholamrezaee MM. Geostatistics of extremes. Proc. R. Soc. A. 2012;468:581–608. [Google Scholar]

[b12] Davison AC, Padoan SA. Ribatet M. Statistical modeling of spatial extremes. Stat. Sci. 2012;27(2):161–186. [Google Scholar]

[b13] de Haan L. A spectral representation for max-stable processes. Ann. Probab. 1984;12:1194–1204. [Google Scholar]

[b14] Embrechts P, McNeil A. Straumann D. Correlation and dependence in risk management: Properties and pitfalls. In: Dempster MAH, editor; Risk Management: Value at Risk and Beyond. Cambridge, U. K: Cambridge Univ. Press; 2002. pp. 176–223. [Google Scholar]

[b15] Genest C. Favre A-C. Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng. 2007;12:347–368. [Google Scholar]

[b16] Griffis V. Stedinger J. The use of GLS regression in regional hydrologic analyses. J. Hydrol. 2007;344(1–2):82–95. and, doi: 10.1016/j.jhydrol.2007.06.023. [Google Scholar]

[b17] Gudendorf G. Segers J. Extreme-value copulas. In: Jaworski P, editor; Copula Theory and Its Applications. Springer Berlin Heidelberg; 2010. pp. 127–145. [Google Scholar]

[b18] Hanel M, Buishand TA. Ferro C. A nonstationary index flood model for precipitation extremes in transient regional climate model simulations. J. Geophys. Res. 2009;114:D15107. and, doi: 10.1029/2009JD011712. [Google Scholar]

[b19] Heffernan JE. Tawn JA. A conditional approach for multivariate extreme values (with discussion) J. R. Stat. Soc., Ser. B. 2004;66(3):497–546. [Google Scholar]

[b20] Hosking JRM. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. J. R. Stat. Soc., Ser. B. 1990;52:105–124. [Google Scholar]

[b21] Hosking JRM. Wallis JR. The effect of intersite dependence on regional flood frequency analysis. Water Resour. Res. 1988;24:588–600. [Google Scholar]

[b22] Hosking JRM. Wallis JR. Some statistics useful in regional frequency analysis. Water Resour. Res. 1993;29:271–281. [Google Scholar]

[b23] Hosking JRM. Wallis JR. Regional Frequency Analysis: An Approach Based on L-moments. Cambridge, U. K: Cambridge Univ. Press; 1997. p. 224. and pp. [Google Scholar]

[b24] Javelle P, Ouarda TB, Lang M, Bobée B, Galéa G. Grésillon J-M. Development of regional flood-duration–frequency curves based on the index-flood method. J. Hydrol. 2002;258(1):249–259. [Google Scholar]

[b25] Kojadinovic I, Segers J. Yan J. Large-sample tests of extreme-value dependence for multivariate copulas. Can. J. Stat. 2011;39(4):703–720. [Google Scholar]

[b26] Kojadinovic I, Shang H. Yan J. A class of goodness-of-fit tests for spatial extremes models based on max-stable processes. Stat. Interface. 2014 and, in press. [Google Scholar]

[b27] Kysely J, Picek J. Regional growth curves and improved design value estimates of extreme precipitation events in the Czech Republic. Clim. Res. 2007;33(3):243–255.

[b28] Leclerc M, Ouarda TB. Non-stationary regional flood frequency analysis at ungauged sites. J. Hydrol. 2007;343(3):254–265.

[b29] Lettenmaier DP, Wallis J, Wood E. Effect of regional heterogeneity on flood frequency estimation. Water Resour. Res. 1987;23:313–323.

[b30] Lin P-S. Estimating equations for spatially correlated data in multi-dimensional space. Biometrika. 2008;95(4):847–858.

[b31] Lin P-S. A working estimating equation for spatial count data. J. Stat. Plann. Inference. 2010;140(9):2470–2477.

[b32] Lindsay BG. Composite likelihood methods. In: Prabhu NU, editor. Statistical Inference from Stochastic Processes. Providence, R. I.: Am. Math. Soc.; 1988. pp. 221–239.

[b33] Martins ES, Stedinger JR. Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resour. Res. 2000;36:737–744. doi: 10.1029/1999WR900330.

[b34] Matalas NC, Langbein WB. Information content of the mean. J. Geophys. Res. 1962;67(9):3441–3448.

[b35] Ngongondo CS, Xu C-Y, Tallaksen LM, Alemaw B, Chirwa T. Regional frequency analysis of rainfall extremes in Southern Malawi using the index rainfall and L-moments approaches. Stochastic Environ. Res. Risk Assess. 2011;25(7):939–955.

[b36] Nikoloulopoulos AK, Joe H, Chaganty NR. Weighted scores method for regression models with dependent data. Biostatistics. 2011;12:653–665. doi: 10.1093/biostatistics/kxr005.

[b37] Northrop P. Likelihood-based approaches to flood frequency estimation. J. Hydrol. 2004;292:96–113.

[b38] Ouarda T, Cunderlik J, St-Hilaire A, Barbet M, Bruneau P, Bobée B. Data-based comparison of seasonality-based regional flood frequency methods. J. Hydrol. 2006;330(1):329–339.

[b39] Ouarda TB, Haché M, Bruneau P, Bobée B. Regional flood peak and volume estimation in Northern Canadian Basin. J. Cold Reg. Eng. 2000;14(4):176–191.

[b40] Ouarda TBMJ. Hydrological frequency analysis, regional. In: El-Shaarawi A, Piegorsch W, editors. Encyclopedia of Environmetrics. 2nd ed. Chichester, U. K.: John Wiley; 2013. doi: 10.1002/9780470057339.vnn043.

[b41] Padoan SA, Ribatet M, Sisson SA. Likelihood-based inference for max-stable processes. J. Am. Stat. Assoc. 2010;105(489):263–277.

[b42] Ribatet M, Singleton R. SpatialExtremes: Modelling Spatial Extremes. R package version 1.9-1. 2013. [Available at http://CRAN.R-project.org/package=SpatialExtremes.]

[b43] Schlather M. Models for stationary max-stable random fields. Extremes. 2002;5(1):33–44.

[b44] Schlather M, Tawn JA. A dependence measure for multivariate and spatial extreme values: Properties and inference. Biometrika. 2003;90(1):139–156.

[b45] Sklar AW. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris. 1959;8:229–231.

[b46] Stedinger JR. Estimating a regional flood frequency distribution. Water Resour. Res. 1983;19:503–510. doi: 10.1029/WR019i002p00503.

[b47] Stoner JA, Leroux BG. Analysis of clustered data: A combined estimating equations approach. Biometrika. 2002;89(3):567–578.

[b48] Troutman BM, Karlinger MR. Regional flood probabilities. Water Resour. Res. 2003;39(4):1095. doi: 10.1029/2001WR001140.

[b49] Varin C. On composite marginal likelihoods. Adv. Stat. Anal. 2008;92(1):1–28.

[b50] Varin C, Reid N, Firth D. An overview of composite likelihood methods. Stat. Sinica. 2011;21(1):5–42.

[b51] Viglione A, Castellarin A, Rogger M, Merz R, Blöschl G. Extreme rainstorms: Comparing regional envelope curves to stochastically generated events. Water Resour. Res. 2012;48:W01509. doi: 10.1029/2011WR010515.

[b52] Wang Z, Yan J, Zhang X. Incorporating spatial dependence in regional frequency analysis. Tech. Rep. 9, Dep. of Stat., Univ. of Conn., Storrs, Conn.; 2014.

[b53] Westra S, Sisson SA. Detection of non-stationarity in precipitation extremes using a max-stable process model. J. Hydrol. 2011;406(1):119–128.

[b54] Yasui Y, Lele S. A regression method for spatial disease rates: An estimating function approach. J. Am. Stat. Assoc. 1997;92:21–32.
