Journal of Applied Statistics. 2021 Apr 1;49(9):2349–2369. doi: 10.1080/02664763.2021.1904385

Adjusting statistical benchmark risk analysis to account for non-spatial autocorrelation, with application to natural hazard risk assessment

Jingyu Liu a, Walter W Piegorsch a,b,c, A Grant Schissler d, Rachel R McCaster e,f, Susan L Cutter e,f
PMCID: PMC9225316  PMID: 35755089

ABSTRACT

We develop and study a quantitative, interdisciplinary strategy for conducting statistical risk analyses within the ‘benchmark risk’ paradigm of contemporary risk assessment when potential autocorrelation exists among sample units. We use the methodology to explore information on vulnerability to natural hazards across 3108 counties in the conterminous 48 US states, applying a place-based resilience index to an existing knowledgebase of hazardous incidents and related human casualties. An extension of a centered autologistic regression model is applied to relate local, county-level vulnerability to hazardous outcomes. Adjustments for autocorrelation embedded in the resiliency information are applied via a novel, non-spatial neighborhood structure. Statistical risk-benchmarking techniques are then incorporated into the modeling framework, wherein levels of high and low vulnerability to hazards are identified.

Keywords: Benchmark dose, centered autologistic model, maximum pseudo-likelihood, quantitative risk assessment, natural hazard vulnerability, non-spatial autocorrelation

1. Introduction

1.1. Background: Place-based risk analytics

Quantifying vulnerability to hazardous events is a critical component in contemporary environmental risk assessment. Driven by concerns over climate change and other consequences of natural or human-induced activities, data scientists, risk assessors, and policy makers are deeply involved in the effort to identify and characterize such localized vulnerabilities. According to the US National Oceanic and Atmospheric Administration (NOAA), calendar year 2020 was the most impactful year on record for weather-related disasters, in terms of losses exceeding one billion US dollars (https://www.ncdc.noaa.gov/billions/). The potential for future catastrophic events of similar ilk is an issue of clear concern, as scientists warn that climate change could make extreme weather events more damaging [17]. This recognition, and the debate associated therewith, motivates the approach we proffer here to quantify place-based vulnerabilities to natural and human-induced hazards.

We appeal to a mechanism from statistical risk analysis: estimation of minimum exposure levels, called Benchmark Doses (BMDs), that induce a pre-specified Benchmark Response (BMR) in a target population [8]. Established inferential approaches for BMD analysis typically involve one-sided confidence limits, leading in practice to what are called Benchmark Dose Lower Limits [BMDLs; see 9]. Within this context, we previously extended the BMD from its roots in environmental toxicology to characterize socio-geographic vulnerability among 132 US urban centers (‘cities’) to terrorist events, applying a quantitative, place-based vulnerability index (PVI) to a database of terrorism incidents and related human casualties [27]. We employed a centered autologistic regression model that related urban vulnerability to terrorist outcomes and also adjusted for autocorrelation in the geospatial data. Benchmark ‘doses’ were then estimated within this modeling framework, wherein levels of greater and lesser urban vulnerability to terrorism were identified. This new, translational adaptation of the risk-benchmark approach, including its ability to account for geospatial autocorrelation, was seen to operate quite flexibly in that socio-geographic setting. Herein, we extend and evolve those results by allowing for a broader definition of geographic autocorrelation, and by employing the expanded methodology to mine a much larger geographic database – described in the next section – of natural hazard outcomes.

1.2. Hazard assessment for 3108 US counties

Expanding on our previous work, we apply our autocorrelated benchmark risk approach to a larger, more-complex knowledgebase of geospatial environmental responses to natural or human-induced hazards: the Spatial Hazard Events and Losses Database for the United States (SHELDUS™). SHELDUS is a county-level geo-referenced knowledgebase of economic and human losses from 18 different types of natural hazards for the continental US, Alaska, and Hawaii (http://www.sheldus.org). It covers the period from 1960 to the present and includes data fields on location (FIPS code), date of event, and direct losses (property losses, crop losses, injuries, deaths). The original input loss data are from US government agencies – e.g. the US Geological Survey, the US Department of Agriculture, NOAA, and the Federal Emergency Management Agency – updated for locational specificity to counties. SHELDUS represents the only comprehensive natural hazards loss database in the US and is widely employed by researchers, businesses, and governmental agencies.

We centered our attention on 3108 counties in the 48 conterminous US states (excluding Broomfield, CO, for which data were not available). For the outcome variable, we explored how disasters led to cumulative property or crop losses of one billion US dollars or more. The billion-dollar threshold is used by NOAA to highlight singular, extreme weather events (https://www.ncdc.noaa.gov/billions/), but of course significant losses such as these can be a consequence of much more than weather-related impacts. At the inception of the billion-dollar loss threshold in 1980, an event causing more than one billion dollars of losses was felt to define an extreme weather event in terms of adverse economic consequences. As a result, we adopted billion-dollar cumulative losses as our indicator of disaster-related, per-county outcomes: from SHELDUS, we created a binary response variable, $Y_i$, indicating whether or not the cumulative loss of property and crop damage in the $i$th US county exceeded one billion dollars, as reported in calendar year 2015 ($i = 1, \ldots, 3108$). Of the 3108 counties in our database, $\sum_i Y_i = 79$, or 2.5%, reported a cumulative loss exceeding this one billion dollar threshold.

Associated with these adverse outcomes, we set the target predictor variable, $x_i$, as the number of hazards that caused a fatality or loss in a county during the previous 26 years (1990–2015). This number-of-hazards predictor distinguishes between common everyday events that add up over time to significant losses and losses based on a few events whose impacts were very large.

Across the 3108 counties, $x$ ranged from 6 to 1247 with arithmetic mean equal to 135.7 and standard deviation equal to 108.0. Figure 1 displays a histogram (with overlaid kernel density estimate) for the target predictor, indicating a unimodal and clearly right-skewed pattern.

Figure 1.

Histogram and overlaid kernel density estimate (dark curve) of the number of hazards per county (x) that caused a fatality or loss in the 1990–2015 period, for 3108 counties in the 48 conterminous US states. Histogram employs Scott's rule [35] for bin selection.

Figure 2 gives a further graphical summary for $x$ by mapping the geographic distribution of the predictor over the 3108 US counties. A clearly non-homogeneous pattern is indicated, with higher numbers of hazards observed in the southwest desert, Iowa, along the eastern Great Lakes and most of New York, and otherwise a scattering of higher numbers throughout the central and eastern US.

Figure 2.

Geographic distribution of the number of hazards per county that caused a fatality or loss in the 1990–2015 period, for 3108 counties in the 48 conterminous US states. Darker shading indicates higher numbers. © 2019 HVRI. Used by permission. All rights reserved.

Our interest with these data is the modeled probability, $\pi_i$, that the $i$th county exhibits a cumulative one billion dollar loss in 2015 as predicted by its hazardous fatality/loss history. As with our previous analysis of urban terrorism [27] we propose here that beyond a relationship to numbers of hazardous fatalities/losses, $\pi_i$ would also be affected in some fashion by other, possibly autocorrelated features relating to place-based vulnerability. In effect, we define a neighborhood or clique for the $i$th county comprised of those other US counties exhibiting similar features, inducing autocorrelation in the response $Y_i$. Rather than rely on simple geospatial adjacency, however, we postulate that the definition of a location's neighborhood need not be limited to map-based coordinates: it is not unlikely that two counties' outcomes may correlate in ways defined other than by spatial proximity. For instance, consider two counties on the US Chesapeake Bay such as Anne Arundel county and another physically adjacent county farther to the south from the nearby city of Baltimore, say, Calvert county, MD. Clearly, due to their locations on the Chesapeake Bay the two counties share much in common. Anne Arundel's regional vulnerability to losses from a hazardous incident might not be as strongly correlated, however, with its physical neighbor to the south (Calvert) as with another more-distant locality exhibiting similar environmental, geophysical, and socio-economic characteristics – say, Dearborn county, IN, which is near to the metropolitan city of Cincinnati, OH (comparable to Baltimore) – and vice versa. Indeed, by expanding the neighborhood definition beyond simple spatial proximity, our vulnerability analysis may be able to ‘borrow strength’ from across the ensemble and consequently improve risk-analytic precision. This may be especially true for counties in the more-dispersed central and western US states, where large urban centers are more isolated geospatially.

As is true in many areas of scientific endeavor, by studying familiar processes in unfamiliar settings a better grasp can be achieved of both the familiar processes and the new settings. With this in mind, we consider an alternative definition for the cliques that allows for physically non-adjacent localities to be included in each neighborhood, as long as these also conform to the underlying statistical requirements for an appropriate neighborhood structure. That is, the indices defining the elements (counties) within each neighborhood and their associated responses must satisfy a series of conditional probability and response requirements given, e.g. by Cressie [10, §6.4.1]. In all the operations we present below, these various requirements are satisfied. We focus our attention on an autocorrelated logistic model's ability to extend into non-spatial settings, and show how the model can apply successfully in the presence of non-spatial autocorrelation. While the context changes from our previous article [27], the concept retains surprising flexibility. The effort in effect resembles a data mining/knowledge discovery exercise [18]: query the large 3108-counties database to uncover potential characteristics for defining a neighborhood structure that may prove more informative than quantities based on simple, naïve spatial propinquity.

1.2.1. Baseline resilience indicators for communities (BRIC)

To quantify and explore non-geographical definitions for the neighborhood structure when studying county-level relationships between numbers of hazards and cumulative billion-dollar losses, we employ a metric describing Baseline Resilience Indicators for Communities [BRIC; see 12] for each county. We quantify these extended neighborhood characteristics by focusing attention on a single county's resilience to hazardous impacts. Holling [20] first used the term resilience, from an ecological perspective, to describe a ‘measure of the persistency of systems and their ability to absorb change and disturbance and still maintain the same relationships between populations or state variables’ [20, p. 14]. This notion of resilience has gained diverse popularity in the past few decades [14,24,28]. In terms of natural hazards, resilience is defined as the ability of a system – here, a county – to survive, respond to, and cope with a hazardous natural or human-induced event [5,11].

To measure the inherent resilience of counties in the United States, Cutter et al. [12] quantified six different types of resilience identified in the extant literature. An initial set of 61 variables was collected from 30 different sources to represent the six forms of resilience: social, economic, community, institutional, housing/infrastructure, and environmental (see that 2014 article for full details). The authors then converted the original count variables into percentages, rates, differences, or averages in order to reliably compare counties of different sizes and characteristics. The data then underwent min-max normalization [36] so that all values had comparable reference points on a scale of 0 to 1. Cutter et al. also adjusted the orientation of each variable to make larger values correspond theoretically to higher resilience. They then discarded conceptually extraneous variables and also applied Cronbach's alpha coefficient [13] to assess the internal consistency among the remaining variables in the dataset when building the composite scores [29,30]. Their final set contained 49 indicators (Cronbach alpha = 0.66) which were further distilled into their six resilience measures, each of whose potential scores range from 0 to 1. Note that we treat these six-dimensional BRIC scores as constants derived from each county's underlying (i) social, (ii) economic, (iii) community, (iv) institutional, (v) housing/infrastructure, and (vi) environmental characteristics, in similar fashion to the less-extensive PVI from our earlier 132-cities dataset [27].
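The rescaling and re-orientation steps described above can be sketched as follows. This is only a minimal illustration of min-max normalization, not the BRIC construction itself; the function names and the four input rates are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to rescale, so return zeros by convention.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def orient_toward_resilience(scaled, larger_is_better):
    """Flip a [0, 1] variable so that larger values mean higher resilience."""
    return list(scaled) if larger_is_better else [1.0 - s for s in scaled]


# Hypothetical county-level rates for one indicator (not BRIC data).
raw_rates = [12.0, 30.0, 21.0, 48.0]
scaled = min_max_normalize(raw_rates)
flipped = orient_toward_resilience(scaled, larger_is_better=False)
print(scaled)
print(flipped)
```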

1.2.2. Non-spatial autocorrelation

The six BRIC components measure qualities of a community that may enhance or detract from its ability to prepare for, respond to, recover from, and mitigate environmental hazards in the six targeted domains: social, economic, housing & infrastructure, institutional, community, and environmental. Therefore, for each county in our 3108-counties database, the BRIC components as a whole represent highly informative and comprehensive markers for natural-hazard risk assessment. It is then straightforward to employ them for defining a non-spatial neighborhood structure. Denote $b_i = [b_{i1} \cdots b_{i6}]$ and $b_j = [b_{j1} \cdots b_{j6}]$ as the vectors of six-dimensional BRIC scores of county $i$ and county $j$, respectively. The Euclidean distance can then be calculated between the two six-dimensional vectors for any pair of counties in the conterminous US:

$$d_{ij} = \sqrt{\sum_{t=1}^{6} (b_{it} - b_{jt})^2}. \tag{1}$$

This produces a simple, well-defined ‘proximity’ measure for each pair of counties, from which neighborhoods may be derived. Figure 3 presents a histogram for the univariate distances between each pairing of the 3108 counties. The pattern displays a clear right skew.
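As a concrete illustration of the distance in Equation (1), the following minimal sketch computes it for two six-dimensional score vectors. The function name and the two score vectors are hypothetical and are not taken from the BRIC data.

```python
import math


def bric_distance(b_i, b_j):
    """Euclidean distance between two six-dimensional BRIC score vectors."""
    if len(b_i) != 6 or len(b_j) != 6:
        raise ValueError("each BRIC vector must have six components")
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(b_i, b_j)))


# Hypothetical scores, ordered as (social, economic, community,
# institutional, housing/infrastructure, environmental).
county_a = [0.60, 0.45, 0.30, 0.40, 0.25, 0.55]
county_b = [0.60, 0.45, 0.30, 0.40, 0.28, 0.51]

d_ab = bric_distance(county_a, county_b)
print(round(d_ab, 4))
```

Because every component lies between 0 and 1, no distance can exceed the square root of 6, which is a convenient sanity check on real inputs.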

Figure 3.

Histogram and overlaid kernel density estimate (dark curve) of the distribution for the Euclidean distances between six-dimensional BRIC scores for each pair of counties with the 3108-counties data. Histogram employs Scott's rule [35] for bin selection.

We next set some threshold or breakpoint, D, below which the ith and jth counties are viewed as ‘neighbors’; i.e. they possess similar six-dimensional disaster resilience. Then, the corresponding neighborhood structure can be quantified via an adjacency matrix, A, whose elements are simply

$$a_{ij} = \begin{cases} 1, & \text{if } 0 < d_{ij} \le D, \\ 0, & \text{otherwise.} \end{cases}$$

For exploratory purposes, we used percentiles of the empirical distribution of the $d_{ij}$s to define $D$. For mining SHELDUS we studied a range of small percentiles: 0.25th, 0.5th, 1st, 2nd, 3rd, …, 9th, 10th. Larger percentiles were considered as well, but these gave such highly dispersed ‘neighborhood’ patterns that no significant relationships were observed between the variables. We therefore limited ourselves to the 10th percentile and below.
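The percentile-based construction of the adjacency matrix can be sketched as below. This is an illustrative sketch under stated assumptions: the nearest-rank percentile rule is our choice (the text does not specify one), only pairs of distinct counties are considered, and the four toy score vectors are hypothetical. With so few toy pairs a 10th-percentile cut keeps only the single closest pair.

```python
import math
from itertools import combinations


def euclid(u, v):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(u, v)))


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (an assumed convention, for illustration)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def adjacency_matrix(scores, pct):
    """Mark distinct counties i, j as neighbors when d_ij is at most D."""
    n = len(scores)
    dist = {(i, j): euclid(scores[i], scores[j])
            for i, j in combinations(range(n), 2)}
    threshold = nearest_rank_percentile(list(dist.values()), pct)
    A = [[0] * n for _ in range(n)]
    for (i, j), d in dist.items():
        if d <= threshold:
            A[i][j] = A[j][i] = 1  # symmetric, zero diagonal
    return A, threshold


# Hypothetical six-dimensional scores for four toy counties.
toy_scores = [
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.6],
    [0.9, 0.9, 0.9, 0.9, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
]
A_toy, D_toy = adjacency_matrix(toy_scores, pct=10)
for row in A_toy:
    print(row)
```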

In addition, we also included naïve spatial autocorrelation, by constructing an adjacency matrix with a simple rook-adjacency neighborhood structure, as employed in [27]. For our SHELDUS data, approximately 0.2% of the $\binom{3108}{2}$ pairs are spatially adjacent ‘neighbors’ to each other. Our goal herein is to employ these various metrics, explore vulnerability, and characterize risk to hazardous events while adjusting for possible autocorrelated community-level resilience in the SHELDUS 3108-counties database.

In what follows, Section 2 reviews the autologistic benchmark framework we employed [27], along with details on how the risk-benchmark calculations can be incorporated into a centered autologistic regression model. In Section 3 we return to the 3108-counties database from above and apply the risk-analytic benchmark approach to illustrate its use with our novel, non-spatial autocorrelated model structure. Section 4 ends with a brief discussion. All calculations we present here were performed in the R programming environment [34].

2. Autologistic model development

2.1. Centered autologistic model

We appeal to the well-known logistic regression model [21], which is widely applied with binary data such as we have with our 3108-counties database. A simple and direct way to account for spatial autocorrelation within a logistic model allows the response probability, $\pi_i$, at the $i$th observation – here, the $i$th county – to depend upon the other observed binary responses in some pre-defined, possibly non-spatial, neighborhood, $N_i$, for that county. Besag [2] first proposed such a formulation to incorporate neighboring autocorrelation, the conditional autologistic model

$$\pi_i = P[Y_i = 1 \mid x_i, y_j, j \ne i] = \frac{1}{1 + \exp\{-\beta_0 - \beta_1 x_i - \beta_2 \sum_{j \in N_i} a_{ij} y_j\}}. \tag{2}$$

In our notation, $y_i$ is the outcome indicator, $x_i$ is the ‘dose’ predictor, the $a_{ij}$s stipulate each $y_i$'s BRIC-based neighborhood $N_i$ (see Section 1.2), and $\sum_{j \in N_i} a_{ij} y_j$ is an autocovariate constructed from the information within $N_i$. We refer to $\beta_2$ as the autocorrelation parameter. When $\beta_2 = 0$, no autocorrelation exists in the data, and the autologistic model then reduces to a standard logistic regression (which we call the ‘independence model’).

Caragea and Kaiser [7] offered a correction to (2) that extends the conditional model into a centered autologistic form, by redefining the spatial autocovariate (essentially, by centering it):

$$\pi_i = P[Y_i = 1 \mid x_i, y_j, j \ne i] = \frac{1}{1 + \exp\{-\beta_0 - \beta_1 x_i - \beta_2 \sum_{j \in N_i} a_{ij} (y_j - \mu_j)\}}, \tag{3}$$

where the new quantity $\mu_j = E[Y_j \mid \beta_2 = 0] = \{1 + \exp(-\beta_0 - \beta_1 x_j)\}^{-1}$ is the expected value of $Y_j$ under an independence model with no autocorrelation. Centering makes the estimates more stable and interpretable, and as such the centered autologistic construction in (3) provides a direct and simple way to build autocorrelated dependency into a binary regression ([23]; [25, §8.3]). We therefore turn to (3) for constructing quantitative risk/vulnerability assessments with our data. More details on the importance of autologistic centering are available in [7], [23], and our own previous discussion [27].
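To make the centered construction in (3) concrete, the following minimal sketch evaluates the conditional response probability for one county. The function names, the three-county toy data, and the coefficient values are illustrative assumptions (the coefficients are merely of a plausible magnitude), and the sketch is not the ngspatial implementation.

```python
import math


def logistic(eta):
    """Inverse logit: 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def centered_autologistic_prob(i, x, y, A, beta):
    """Conditional P[Y_i = 1 | neighbors] under the centered model (3)."""
    b0, b1, b2 = beta
    # mu_j: expected response under the independence model (beta_2 = 0).
    mu = [logistic(b0 + b1 * xj) for xj in x]
    # Centered autocovariate: sum over neighbors j of a_ij * (y_j - mu_j).
    autocov = sum(A[i][j] * (y[j] - mu[j]) for j in range(len(x)) if j != i)
    return logistic(b0 + b1 * x[i] + b2 * autocov)


# Toy data: counties 0 and 1 are neighbors; county 2 has no neighbors.
x = [100.0, 150.0, 400.0]
y = [0, 1, 1]
A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
beta = (-4.2, 0.006, 0.67)  # illustrative values only

p_indep_0 = logistic(beta[0] + beta[1] * x[0])
p_auto_0 = centered_autologistic_prob(0, x, y, A, beta)
p_auto_2 = centered_autologistic_prob(2, x, y, A, beta)
print(p_indep_0, p_auto_0, p_auto_2)
```

In this toy case county 0 has a neighbor with an observed loss, so with a positive autocorrelation coefficient its conditional probability rises above the independence value, while the isolated county 2 is unaffected.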

We note in passing that our non-spatial adjacency model and a model employing only naïve adjacency can both be accommodated under the centered autologistic construction in Equation (3), via appropriate definition of the neighborhoods, $N_i$.

2.2. Maximum pseudo-likelihood estimation

Allowing for autocorrelation induces dependencies among the observations, complicating standard likelihood analysis with our centered autologistic model. Instead, we estimate the unknown β-parameters in (3) via maximization of the pseudo-likelihood function, as proposed by Besag [3]. Besag found that by multiplying together the conditional probability distributions of the Yis given their neighbors, the resulting pseudo-likelihood

$$\prod_{i=1}^{n} P[Y_i = y_i \mid x_i, y_j, j \ne i] = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}$$

possesses many of the same features as the usual likelihood function, despite any existing dependencies among the observations. In particular, maximum pseudo-likelihood estimates (MPLEs) of the β-parameters are consistent and asymptotically normal under typical regularity conditions [1].
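The pseudo-likelihood is straightforward to evaluate once the conditional probabilities from (3) are available. The sketch below computes its logarithm for a given coefficient vector; the function names and four-county toy data are hypothetical, and in practice a numerical optimizer (or the ngspatial package) would maximize this quantity over the coefficients.

```python
import math


def logistic(eta):
    """Inverse logit: 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_pseudo_likelihood(beta, x, y, A):
    """Sum over i of y_i*log(pi_i) + (1 - y_i)*log(1 - pi_i), pi_i from (3)."""
    b0, b1, b2 = beta
    n = len(x)
    mu = [logistic(b0 + b1 * xj) for xj in x]  # independence-model means
    total = 0.0
    for i in range(n):
        autocov = sum(A[i][j] * (y[j] - mu[j]) for j in range(n) if j != i)
        p = logistic(b0 + b1 * x[i] + b2 * autocov)
        total += y[i] * math.log(p) + (1 - y[i]) * math.log(1.0 - p)
    return total


# Toy data: counties (0, 1) and (2, 3) form two neighbor pairs.
x = [100.0, 150.0, 400.0, 120.0]
y = [0, 1, 1, 0]
A = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

lpl_centered = log_pseudo_likelihood((-4.2, 0.006, 0.67), x, y, A)
lpl_b2_zero = log_pseudo_likelihood((-4.2, 0.006, 0.0), x, y, A)

# With beta_2 = 0 the pseudo-likelihood is the ordinary logistic likelihood.
loglik_independence = sum(
    yi * math.log(logistic(-4.2 + 0.006 * xi))
    + (1 - yi) * math.log(1.0 - logistic(-4.2 + 0.006 * xi))
    for xi, yi in zip(x, y)
)
print(lpl_centered, lpl_b2_zero)
```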

2.3. Autologistic benchmark risk analysis

With binary data, a traditional benchmark-risk analysis relates a pre-specified level of outcome response – the benchmark response, BMR – to the response probability $\pi(x)$ viewed as a function of the ‘dose’ variable $x$. The goal is to find the smallest positive $x$ at which a background-adjusted function of $\pi(x)$, $R(x) = \{\pi(x) - \pi(0)\}/\{1 - \pi(0)\}$, equals the BMR. The solution is the benchmark dose (BMD), past which a location's risk to some hazardous event is elevated to unacceptable levels at that BMR. The ‘background’ adjustment is included to account for extemporaneous factors out of the risk assessor's control; a prototypical example is correction for spontaneous (zero-exposure) tumor incidence when assessing the risk of exposure to a carcinogen [32, §4.2.1].

An added complexity with the centered autologistic model in (3) is, however, the presence of the centered autocovariate $\sum_{j \in N_i} a_{ij} (y_j - \mu_j)$. To overcome this, in [27] we exploited a clever definition for the benchmark response proposed by Budtz-Jørgensen et al. [6]: view BMR as a specified proportional increase over zero-level background in the odds of an adverse event, evaluated at the same constellation of secondary covariates. At least for logistic-type models, the functional dependency on secondary covariates then cancels out in the various ratio operations [26]. Budtz-Jørgensen et al. provided no guidance on selection of the BMR here, so we mimic our previous approach from [27]: we set BMR = 10 or 25, as we found that an increase of these magnitudes in the odds could be useful markers in practice. It will require future, more-extensive benchmark risk analyses with autocorrelated data to determine if selection of BMR = 10 or BMR = 25 performs as suitably as we have seen with our data.

Employed with the centered autologistic regression model in (3), we therefore define the BMD as the dose resulting in a pre-specified increase (of BMR multiples) in the odds for an abnormal response. That is, solve for x in

$$\mathrm{BMR} = \frac{\pi(x \mid \beta)/\{1 - \pi(x \mid \beta)\}}{\pi(0 \mid \beta)/\{1 - \pi(0 \mid \beta)\}}, \tag{4}$$

where $\pi(\cdot \mid \beta)$ is the centered autologistic response probability based on (3) and $\beta = [\beta_0\ \beta_1\ \beta_2]^{\mathrm{T}}$ is the vector of unknown autologistic coefficients. Note that for adverse outcomes we anticipate $\beta_1 > 0$, representing an underlying increase in the probability of an adverse event as $x$ increases. This typically defines a proportional rise in odds, thus we operate implicitly with $\mathrm{BMR} > 1$.

As illustrated in [27], the odds ratio in (4) simplifies to

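$$\frac{\pi(x \mid \beta)/\{1 - \pi(x \mid \beta)\}}{\pi(0 \mid \beta)/\{1 - \pi(0 \mid \beta)\}} = \frac{\exp\{\beta_0 + \beta_1 x + \beta_2 \sum_{j \in N_i} a_{ij} (y_j - \mu_j)\}}{\exp\{\beta_0 + \beta_2 \sum_{j \in N_i} a_{ij} (y_j - \mu_j)\}} = \exp\{\beta_1 x\}.$$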

Setting this odds ratio equal to the BMR and solving for x then yields the unique BMD at the given BMR:

$$\mathrm{BMD}_{\mathrm{BMR}} = \frac{\log(\mathrm{BMR})}{\beta_1}. \tag{5}$$

This is the risk/vulnerability value resulting in a (BMR − 1)-fold increase in odds of an adverse event, relative to a zero background value.
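A short numerical check can confirm that the benchmark dose in (5) does not depend on the value of the centered autocovariate. The sketch below is illustrative only: the function names, coefficient values, and the two autocovariate values are assumptions chosen for the example.

```python
import math


def benchmark_dose(bmr, beta1):
    """BMD = log(BMR) / beta_1, as in Equation (5)."""
    if bmr <= 1.0:
        raise ValueError("BMR must exceed 1 for a proportional rise in odds")
    if beta1 <= 0.0:
        raise ValueError("beta_1 must be positive")
    return math.log(bmr) / beta1


def odds(x, beta, autocov):
    """Odds pi/(1 - pi) under model (3) at dose x and a fixed autocovariate."""
    b0, b1, b2 = beta
    return math.exp(b0 + b1 * x + b2 * autocov)


beta = (-4.2, 0.006, 0.67)  # illustrative coefficients
bmd_10 = benchmark_dose(10.0, beta[1])

# The odds ratio at the BMD equals the BMR whatever the autocovariate is.
ratio_a = odds(bmd_10, beta, autocov=0.3) / odds(0.0, beta, autocov=0.3)
ratio_b = odds(bmd_10, beta, autocov=-1.2) / odds(0.0, beta, autocov=-1.2)
print(bmd_10, ratio_a, ratio_b)
```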

2.4. Benchmark estimation and inference

From the data pairs $(x_i, Y_i)$ for each $i$th location ($i = 1, \ldots, n$), we fit the centered autologistic regression model from (3) via maximum pseudo-likelihood, as in Section 2.2. The result is a vector of MPLEs $\hat\beta = [\hat\beta_0\ \hat\beta_1\ \hat\beta_2]^{\mathrm{T}}$. To estimate these we employ the R package ngspatial [22]; see https://CRAN.R-project.org/package=ngspatial. Given the MPLE $\hat\beta_1$ (and a predetermined level for the BMR), from (5) the corresponding point estimate for the BMD is then simply

$$\widehat{\mathrm{BMD}}_{\mathrm{BMR}} = \frac{\log(\mathrm{BMR})}{\hat\beta_1}. \tag{6}$$

We further desire a lower $1 - \alpha$ confidence limit on the BMD, denoted as $\mathrm{BMDL}_{\mathrm{BMR}}$. The straightforward form for the BMD in (5) allows for a particularly simple construction, at least conceptually: if $b_{1U}$ is an upper $1 - \alpha$ confidence limit on $\beta_1$ such that $P[\beta_1 < b_{1U}] = 1 - \alpha$ at least approximately in large samples, then clearly

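$$P\left[\frac{\log(\mathrm{BMR})}{b_{1U}} < \frac{\log(\mathrm{BMR})}{\beta_1}\right] = P\left[\frac{\log(\mathrm{BMR})}{b_{1U}} < \mathrm{BMD}_{\mathrm{BMR}}\right] = 1 - \alpha \tag{7}$$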

if $\mathrm{BMR} > 1$. This defines a lower $1 - \alpha$ confidence limit on the BMD, thus we take $\mathrm{BMDL}_{\mathrm{BMR}} = \log(\mathrm{BMR})/b_{1U}$. We note in passing that Equation (7) can equivalently be written as $P[b_{1U}^{-1} \log\{\mathrm{BMR}\} < \mathrm{BMD}_{\mathrm{BMR}},\ \forall\, \mathrm{BMR} > 1] = 1 - \alpha$, which provides a simultaneous lower confidence statement on the BMD, in the sense of Nitcheva et al. [31].

Actually finding such a $b_{1U}$ confidence limit is a more difficult task, however, since it is inappropriate under the centered autologistic model to imitate standard practice and estimate the standard error of $\hat\beta_1$ by inverting the MPL information matrix [37]. Instead, we follow a suggestion by Hughes [22], who calls for a computer-intensive, parallel, parametric bootstrap approach. The method can be implemented in the ngspatial package [22], and it returns a bootstrap distribution of $B > 0$ resampled $\beta_1$ values, based on the original data. An approximate upper $1 - \alpha$ confidence limit can then be taken as the empirical $(1 - \alpha)$ quantile of this bootstrap distribution, i.e. its $(1 - \alpha)B$th ordered value, denoted as $b_{1B}$. Following [27], we operate with $B = 5000$ bootstrap resamples. From this, a bootstrap-based lower confidence limit for the BMD becomes

$$\mathrm{BMDL}_{\mathrm{BMR}} = \frac{\log(\mathrm{BMR})}{b_{1B}}. \tag{8}$$
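The step from a bootstrap distribution to the lower limit in (8) can be sketched as follows. Note the assumptions: the 5000 draws below are simulated from a normal distribution purely as a stand-in for the parametric bootstrap output (they are not produced by ngspatial or by refitting the model), the function names are hypothetical, and the upper limit is taken as an order statistic of the draws.

```python
import math
import random


def upper_bootstrap_limit(draws, alpha=0.05):
    """Upper 1 - alpha limit: the ceil((1 - alpha) * B)-th ordered draw."""
    ordered = sorted(draws)
    rank = math.ceil((1.0 - alpha) * len(ordered))
    rank = min(len(ordered), max(1, rank))  # keep the rank inside 1..B
    return ordered[rank - 1]


def bmdl(bmr, upper_beta1):
    """Lower limit on the BMD: log(BMR) / (upper limit on beta_1)."""
    if bmr <= 1.0 or upper_beta1 <= 0.0:
        raise ValueError("need BMR > 1 and a positive upper limit on beta_1")
    return math.log(bmr) / upper_beta1


# Stand-in bootstrap draws for beta_1 (simulated, for illustration only).
rng = random.Random(2021)
boot_beta1 = [rng.gauss(0.006, 0.001) for _ in range(5000)]

b_1B = upper_bootstrap_limit(boot_beta1, alpha=0.05)
bmdl_10 = bmdl(10.0, b_1B)
print(b_1B, bmdl_10)
```

Because the upper limit on the slope sits in the denominator, a larger upper limit gives a smaller, more conservative lower limit on the benchmark dose.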

3. Benchmark analysis for the 3108-counties data

3.1. Maximum pseudo-likelihood estimates

Returning to the 3108-counties database from Section 1.2, we regressed $Y$ on $x$ = number of hazards causing a county fatality or loss in the 1990–2015 period, via the centered autologistic model in Equation (3) employing Euclidean distance from (1). This produced MPLEs $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ for the unknown parameters. For each percentile and its corresponding adjacency neighborhood matrix $A$ introduced in Section 1.2.2, Table 1 provides the MPLEs, pointwise 95% bootstrap confidence intervals for $\beta_1$ and $\beta_2$, and an upper 95% bootstrap confidence limit on $\beta_1$, denoted as $b_{1B}$ from Section 2.4. Notice that values of $\hat\beta_0$ are negative and values of $\hat\beta_1$ and $\hat\beta_2$ are positive in all cases. Further, the MPLEs for $\beta_0$ and $\beta_1$ are quite close under each percentile-defined adjacency matrix $A$. The autocorrelation parameter estimate, $\hat\beta_2$, generally decreases as the percentile used to define $A$ increases. When the neighborhood structure is described via simple adjacency, the three coefficient estimates are evidently larger in absolute value compared with those obtained under the non-spatial, BRIC percentile-defined $A$ matrices.

Table 1.

The MPLEs with pointwise 95% bootstrap confidence intervals for β1 and β2 for the 3108-counties database, given different BRIC-distance percentile neighborhood matrices A.

| $A$ | $\hat\beta_0$ | $\hat\beta_1$ | $\hat\beta_2$ | $b_{1B}$ | $\beta_1$ lower | $\beta_1$ upper | $\beta_2$ lower | $\beta_2$ upper |
|---|---|---|---|---|---|---|---|---|
| Naïve adjacency | −6.1665 | 0.0106 | 1.6745 | 0.0123 | 0.0068 | 0.0127 | 1.5064 | 1.8019 |
| 0.25th percentile | −4.4029 | 0.0062 | 1.1753 | 0.0074 | 0.0047 | 0.0077 | 1.0941 | 1.2787 |
| 0.5th percentile | −4.2758 | 0.0059 | 1.0198 | 0.0074 | 0.0041 | 0.0077 | 0.9404 | 1.1200 |
| 1st percentile | −4.1957 | 0.0060 | 0.6714 | 0.0076 | 0.0042 | 0.0079 | 0.6180 | 0.7399 |
| 2nd percentile | −4.1876 | 0.0057 | 0.4808 | 0.0075 | 0.0037 | 0.0079 | 0.4395 | 0.5354 |
| 3rd percentile | −4.1832 | 0.0053 | 0.3684 | 0.0072 | 0.0033 | 0.0076 | 0.3348 | 0.4130 |
| 4th percentile | −4.3289 | 0.0048 | 0.4306 | 0.0069 | 0.0021 | 0.0074 | 0.3543 | 0.4605 |
| 5th percentile | −4.3394 | 0.0044 | 0.3971 | 0.0069 | 0.0022 | 0.0074 | 0.3549 | 0.4608 |
| 6th percentile | −4.4292 | 0.0042 | 0.4198 | 0.0073 | 0.0014 | 0.0079 | 0.3721 | 0.4939 |
| 7th percentile | −4.4258 | 0.0040 | 0.3842 | 0.0071 | 0.0011 | 0.0078 | 0.3380 | 0.4554 |
| 8th percentile | −4.4516 | 0.0040 | 0.3484 | 0.0073 | 0.0006 | 0.0080 | 0.3049 | 0.4156 |
| 9th percentile | −4.4873 | 0.0041 | 0.3173 | 0.0078 | 0.0008 | 0.0087 | 0.2783 | 0.3817 |
| 10th percentile | −4.5124 | 0.0041 | 0.3035 | 0.0081 | 0.0004 | 0.0089 | 0.2654 | 0.3669 |

Notes: $b_{1B}$ represents an upper 95% bootstrap confidence limit for $\beta_1$. All confidence statements are based on 5000 bootstrap resamples. Euclidean distance as in Equation (1) is employed to define the adjacency metric (apart from naïve adjacency).

The ngspatial package was used to report pointwise 95% bootstrap confidence intervals for the non-intercept parameters in Table 1, where we see every pointwise interval for $\beta_1$ fails to contain $\beta_1 = 0$. Thus it can be reasonably concluded that the number of hazards significantly affects the response probability $\pi$; from the strictly positive values in every interval it appears to do so via an increasing relationship. Similarly, as the pointwise 95% intervals for $\beta_2$ fail to contain $\beta_2 = 0$, we conclude that the autocovariate is important for describing features of $\pi$. And, as the pointwise $\beta_2$ intervals present only positive values, the autocorrelation appears positive. As a result, our analysis suggests that when a US county exhibits a cumulative billion-dollar-loss event, another county with ‘neighboring’ BRIC patterns would be expected to experience such a result as well, and vice versa.

Figure 4 presents histograms and overlaid kernel density estimates of the corresponding bootstrap distributions for $\hat\beta_1$ and $\hat\beta_2$, based on 5000 bootstrap values from ngspatial, when the neighborhood structure is defined using the 1st percentile (see Section 3.2) of the six-dimensional BRIC Euclidean distances between each pair of counties. Both appear roughly bell-shaped, suggesting that the large-sample features of the MPLEs are close to if not fully realized with the 3108 counties in this database.

Figure 4.

Histograms and overlaid kernel density estimates (dark curves) of the bootstrap distributions for $\hat\beta_1$ (top panel) and $\hat\beta_2$ (lower panel), based on 5000 bootstrap replicates from the centered autologistic fit of the 3108-counties data. The neighborhood structure is defined via the 1st percentile of the BRIC distances. Histograms employ Scott's rule [35] for bin selection.

3.2. Benchmark risk assessment

Moving to the risk-benchmark calculations, we apply Equation (6) for a point estimate of $\mathrm{BMD}_{\mathrm{BMR}}$. In our application, the benchmark ‘dose’ variable is the number of hazards per county that caused a fatality or loss in the 1990–2015 period, therefore we refer to the BMD for clarity here as a Benchmark Number, or BMN. As mentioned above, the choice for the BMR here is an open issue; to our knowledge, the benchmark paradigm has been applied with autocorrelated models of this sort only in our previous work [27]. Thus a certain portion of this exercise must be viewed as exploratory in nature. To begin, BMR = 10 was taken as an initial choice for the relative increase in odds over background when dealing with the 3108-counties data. In addition, and similar to our earlier analysis with the US 132-cities data [27], for comparison purposes a higher benchmark response at BMR = 25 was also considered, anticipating that an increase of this magnitude in the odds could be an informative marker for practical use. Notice that the relationship between $\widehat{\mathrm{BMN}}$ and BMR specified by Equation (6) still holds. At BMR = 10, this leads to $\widehat{\mathrm{BMN}}_{10} = \log(10)/\hat\beta_1$. Given a one-sided, 95% bootstrap confidence bound of the form $b_{1B} > \beta_1$ from the MPL fit (above), the 95% lower bootstrap limit is $\mathrm{BMNL}_{10} = \log(10)/b_{1B}$. Similarly, from $\widehat{\mathrm{BMN}}_{25} = \log(25)/\hat\beta_1$ the 95% lower bootstrap limit is $\mathrm{BMNL}_{25} = \log(25)/b_{1B}$. Specific values for our data are presented in Table 2.

Table 2.

Estimated Benchmark Number $\widehat{\mathrm{BMN}}_{\mathrm{BMR}}$ and pointwise 95% confidence limits $\mathrm{BMNL}_{\mathrm{BMR}}$ at BMR = 10, 25, along with safety predictivity rate ςp at each pth percentile, for each neighborhood structure matrix $A$ with the 3108-counties data, using the same conditions as in Table 1.

| $A$ | $\hat\beta_1$ | $b_{1B}$ | $\widehat{\mathrm{BMN}}_{10}$ | $\mathrm{BMNL}_{10}$ | $\widehat{\mathrm{BMN}}_{25}$ | $\mathrm{BMNL}_{25}$ | ςp |
|---|---|---|---|---|---|---|---|
| Naïve adjacency | 0.0106 | 0.0123 | 218.2177 | 187.5223 | 305.0553 | 262.1449 | n/a |
| 0.25th percentile | 0.0062 | 0.0074 | 374.0632 | 309.7607 | 522.9180 | 433.0269 | 0.977190 |
| 0.5th percentile | 0.0059 | 0.0074 | 392.6402 | 310.9472 | 548.8875 | 434.6856 | 0.977198 |
| 1st percentile | 0.0060 | 0.0076 | 383.8631 | 303.2917 | 536.6176 | 423.9836 | 0.977778 |
| 2nd percentile | 0.0057 | 0.0075 | 403.7716 | 305.8832 | 564.4485 | 427.6064 | 0.977469 |
| 3rd percentile | 0.0053 | 0.0072 | 431.1988 | 320.1287 | 602.7900 | 447.5207 | 0.976905 |
| 4th percentile | 0.0048 | 0.0069 | 477.7865 | 328.7617 | 667.9168 | 459.5892 | 0.976300 |
| 5th percentile | 0.0044 | 0.0069 | 522.5073 | 334.7674 | 730.4338 | 467.9847 | 0.976339 |
| 6th percentile | 0.0042 | 0.0073 | 547.7995 | 317.0368 | 765.7908 | 443.1985 | 0.977213 |
| 7th percentile | 0.0040 | 0.0071 | 572.7168 | 323.7617 | 800.6237 | 452.5994 | 0.976913 |
| 8th percentile | 0.0040 | 0.0073 | 575.6973 | 315.0101 | 804.7903 | 440.3652 | 0.977213 |
| 9th percentile | 0.0041 | 0.0078 | 565.2493 | 296.4169 | 790.1846 | 414.3730 | 0.977733 |
| 10th percentile | 0.0041 | 0.0081 | 563.3387 | 283.2238 | 787.5137 | 395.9299 | 0.977607 |

Note: n/a, not applicable.

Comparing the range between $\mathrm{BMNL}_{\mathrm{BMR}}$ and $\widehat{\mathrm{BMN}}_{\mathrm{BMR}}$ in Table 2, values generally become larger as the percentile increases, although the quantities taper a bit for higher percentiles. This is not altogether surprising: more counties are considered to be neighbors to each other when a larger percentile is used. As a result, more variation is introduced to the model fit and this appears to increase both the point estimates and the uncertainty associated with the estimation process.

We found that when the 0.25th percentile is used as a breakpoint to define the adjacency matrix A, 0.25% of the $\binom{3108}{2}$ pairs of counties were defined as neighbors to each other. Coincidentally, this percentage is very close to the 0.2% of neighbors under naïve spatial adjacency; nonetheless, the gap between $\mathrm{BMNL}_{\mathrm{BMR}}$ and $\widehat{\mathrm{BMN}}_{\mathrm{BMR}}$ almost doubles. Moving to higher percentiles increases the separation.

In order to identify an appropriate percentile from Table 1 for use in defining the final adjacency matrix A, we appealed to methods from prediction analytics. For each percentile in Table 1 we found the corresponding $\mathrm{BMNL}_{25}$. (We operated with $\mathrm{BMNL}_{25}$ instead of $\mathrm{BMNL}_{10}$ due to its more stringent level of benchmark response.) Next, we determined how often a county would be predicted to remain ‘safe’ from a billion-dollar loss based on that $\mathrm{BMNL}_{25}$, i.e. when the $\mathrm{BMNL}_{25}$ rose above the county's number-of-hazards x-predictor. We then determined the overall ‘safety predictivity’ at the given percentile, $p$, as $\varsigma_p$ = (number of counties observed to avoid billion-dollar losses)/(number of counties predicted to avoid billion-dollar losses). Higher values of $\varsigma_p$ indicate greater safety predictivity.
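The safety predictivity ratio can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numerator is read here as the count of counties observed to avoid a loss among those predicted to be safe (so the ratio is a proportion), the function name `safety_predictivity` is hypothetical, and the six-county data set is invented for demonstration.

```python
def safety_predictivity(hazards, had_loss, bmnl):
    """Proportion of counties predicted 'safe' (x < BMNL) that were observed to avoid a loss.

    Assumption: the numerator counts only counties that were also predicted safe.
    """
    # Keep the observed loss indicator for each county predicted to be safe.
    predicted_safe = [loss for x, loss in zip(hazards, had_loss) if x < bmnl]
    if not predicted_safe:
        raise ValueError("no county falls below the benchmark limit")
    observed_safe = sum(1 for loss in predicted_safe if not loss)
    return observed_safe / len(predicted_safe)


# Hypothetical toy data: hazard counts and billion-dollar-loss indicators.
hazards = [120, 310, 450, 80, 500, 200]
had_loss = [False, False, True, False, True, True]

# The benchmark limit is the 1st-percentile BMNL at BMR = 25 from Table 2.
sigma = safety_predictivity(hazards, had_loss, bmnl=423.9836)
print(sigma)
```

In this toy case four counties fall below the limit and three of them avoided a loss, so the ratio is 3/4.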

We do acknowledge that other possible association metrics could be applied here. Examples include concordance, sensitivity, specificity, etc. We feel our predictivity measure targets the sort of indicator a risk manager would pursue in this setting but, of course, eventual choice among such metrics is up to the predilections of each individual analyst.

Table 2 reports the $\varsigma_p$ values in its final column. As seen there, we find highest safety predictivity at $p = 1\%$, although the values are all encouragingly large and rather close to each other. Thus in what follows we operate with the 1st percentile as the threshold to determine our BRIC-based neighborhood structure. Applied to the data, we find $\widehat{\mathrm{BMN}}_{10} = \log(10)/\hat{\beta}_1 = 383.8631$ with 95% lower bootstrap limit $\mathrm{BMNL}_{10} = \log(10)/b_{1B} = 303.2917$. Also, $\widehat{\mathrm{BMN}}_{25} = \log(25)/\hat{\beta}_1 = 536.6176$ with 95% lower bootstrap limit $\mathrm{BMNL}_{25} = \log(25)/b_{1B} = 423.9836$. Counties experiencing numbers of hazards larger than these values are viewed as exhibiting excess risk to cumulative billion-dollar losses, based on our benchmark analysis.

To illustrate the benchmark delineations geospatially, Figure 5 maps the 3108 counties, distinguishing those whose numbers of hazards exceed our two benchmarks above: counties whose number of hazards exceeds $\mathrm{BMNL}_{25}$ are marked in darker shade, while those whose number of hazards exceeds only $\mathrm{BMNL}_{10}$ are marked in lighter shade. (A colorized version is available in the online version.) Counties with number of hazards below both benchmarks are marked in an intermediate tone. For comparison, Figure 6 maps the same relationship under the simpler, naïve spatial adjacency neighborhood structure.
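The three-way coding used for the maps can be expressed as a simple threshold rule. The sketch below is illustrative only: the function name `risk_class`, the class labels, and the toy hazard counts are assumptions, and counts exactly equal to a benchmark are assigned to the higher class.

```python
from collections import Counter


def risk_class(x, bmnl10, bmnl25):
    """Classify a county by its hazard count x against the two lower limits."""
    if bmnl10 >= bmnl25:
        raise ValueError("expected BMNL10 < BMNL25")
    if x >= bmnl25:
        return "high"      # darker shade on the maps
    if x >= bmnl10:
        return "moderate"  # lighter shade
    return "lesser"        # intermediate tone


# 1st-percentile limits from Table 2; the hazard counts are hypothetical.
BMNL10, BMNL25 = 303.2917, 423.9836
toy_hazards = [50, 310, 424, 299, 800, 400]
counts = Counter(risk_class(x, BMNL10, BMNL25) for x in toy_hazards)
print(dict(counts))
```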

Figure 5.

Map of 3108 counties, coded to indicate risk status from the centered autologistic analysis: intermediate shade indicates a county with $x < \mathrm{BMNL}_{10}$, lighter shade indicates $\mathrm{BMNL}_{10} \le x < \mathrm{BMNL}_{25}$, while darker shade indicates $x \ge \mathrm{BMNL}_{25}$. (Neighborhood structure matrix is defined using the 1st percentile of the Euclidean distances of the six-dimensional BRIC scores between counties.) © 2019 HVRI. Used by permission. All rights reserved.

Figure 6.

Map of 3108 counties, coded to indicate risk status from the centered autologistic analysis: intermediate shade indicates a county with $x < \mathrm{BMNL}_{10}$, lighter shade indicates $\mathrm{BMNL}_{10} \le x < \mathrm{BMNL}_{25}$, while darker shade indicates $x \ge \mathrm{BMNL}_{25}$. (Neighborhood structure matrix is defined via naïve adjacency.) © 2019 HVRI. Used by permission. All rights reserved.

Figure 5 indicates that 93 counties exhibit high relative benchmark risk to cumulative billion-dollar losses; these are situated primarily in the southwest desert, Iowa, and along the eastern Great Lakes. The 136 counties at moderate benchmark risk are generally concentrated around those counties at high risk. No such high/moderate-risk counties appear in the northwest and the central US. In total, there are 2879 counties at lesser risk. The pattern has strong similarities to the geographic distribution of the hazards predictor seen in Figure 2, which is not surprising. Indeed, this helps corroborate the indication from the confidence intervals in Table 1 that $x_i$ is a significant positive predictor of $\pi_i$.

Many more counties are considered to be at risk higher than seen in Figure 5 when autocorrelation is defined via naïve adjacency in Figure 6. There, the 306 high-risk and 321 moderate-risk counties are generally concentrated along the eastern Great Lakes, Iowa, various waterways such as the Connecticut and St. Lawrence Rivers, the interior southwest desert, and scattered throughout the US southeast, often centered at or near large cities. A few central counties also exceed BMNL10, but not to the extent seen in the eastern US. There are also more counties in Iowa identified at high relative benchmark risk to cumulative billion-dollar losses than seen in Figure 5. In both Figures 5 and 6 (and Figure 2), however, the larger patterns are retained – in effect, only the numbers change to reflect the choice of BMR. From a methodology perspective, this provides some level of assurance that the statistical operations identify underlying features in a reliable pattern. Also, the BMN lower limits estimated under naïve spatial adjacency appear to be more conservative than those under the non-spatial BRIC-based definition of neighboring counties. In fact, this extends to all the percentile cut-offs that we report in Table 2. We feel that the consistent pattern in our non-spatial neighborhood definition perhaps cuts through the ‘noise’ in our large database and helps focus attention on the resilience-adjusted ‘signal.’

3.3. Coverage properties of the $\widehat{\mathrm{BMN}}$ and BMNL

To explore the operating characteristics of $\widehat{\mathrm{BMN}}$ based on (6) and the BMNL based on (8), we conducted a short Monte Carlo simulation study. Various numerical features were taken from our 3108-counties data analysis, above, to construct the simulation design. Specifically, we randomly sampled, without replacement, $n$ counties from the actual 3108-counties data set, extracting each of their six-dimensional BRIC scores and their target benchmark predictor $x$ = number of hazards. We selected two values for $n$: 900 and 3025. The neighborhood structure matrix A for the $n$ sampled counties was based on the non-spatial autocorrelation approach described in Section 1.2.2: from the six-dimensional BRIC Euclidean distances in (1) between each pair of counties we set $a_{ij} = 1$ if $0 \le d_{ij} \le D$, and $a_{ij} = 0$ otherwise, where the breakpoint $D$ was set to be the 1st percentile of the distances, based on our experience with it for the 3108-counties data.
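The construction of the neighborhood matrix from a distance percentile can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function name `adjacency_from_percentile` is hypothetical, the nearest-rank percentile rule is an assumption (other quantile conventions would give slightly different breakpoints), the diagonal is assumed to be zero, and the five two-dimensional points stand in for six-dimensional BRIC vectors.

```python
import math
from itertools import combinations


def adjacency_from_percentile(scores, pct):
    """Build a symmetric 0/1 neighborhood matrix from pairwise Euclidean distances.

    scores: list of equal-length numeric vectors (e.g. six BRIC sub-scores).
    pct: percentile (0 < pct <= 100) of the pairwise distances used as breakpoint D.
    """
    n = len(scores)
    dists = {(i, j): math.dist(scores[i], scores[j])
             for i, j in combinations(range(n), 2)}
    ordered = sorted(dists.values())
    # Nearest-rank percentile (an assumption; not stated in the source).
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    d_break = ordered[rank - 1]
    a = [[0] * n for _ in range(n)]  # diagonal left at zero by assumption
    for (i, j), d in dists.items():
        if d <= d_break:
            a[i][j] = a[j][i] = 1
    return a, d_break


# Hypothetical toy scores; a 20th-percentile breakpoint keeps the 2 closest of 10 pairs.
toy = [(0, 0), (0, 1), (5, 5), (5, 7), (20, 20)]
A, D = adjacency_from_percentile(toy, pct=20)
print(D, A)
```

With real data one would pass the six-dimensional score vectors and `pct=1`; the matrix stays symmetric because each pair is visited once and both entries are set together.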

Given values for $x_i$ and $A_i$, $i = 1, \ldots, n$, binary responses $Y_i$ with response probabilities defined via the centered autologistic model in (3) were generated via the computer, using the perfect sampler given by Hughes [22]. For the three unknown parameters in $\beta$, we chose two combinations: $\beta = [-4, 0.005, 1]^T$ and $\beta = [-6, 0.01, 1]^T$, whose components are roughly the values of the MPLEs we found from fitting the centered autologistic model to the 3108-counties data under different definitions of neighborhood structure; see Sec. 3.1, above. In particular, the value of $\beta_2 = 1$ was considered to cover positive autocorrelation. For completeness, we also included the independence case with $\beta_2 = 0$. (The case of negative autocorrelation was not considered, as no significant negative autocorrelations were identified in the 3108-counties analysis above.) Coupled with the two different sample sizes $n$ = 900, 3025, this produced a total of eight different design/parameter configurations for study; see Table 3.

Table 3.

Empirical coverage rates for centered autologistic benchmark number lower confidence limit (BMNL) at any BMR > 1 (see text) based on 2000 simulated data sets, along with empirical rates of convergence failure for the MPL algorithm, each across $10^7$ fitting attempts (5000 bootstrap resamples from each of 2000 simulated data sets), stratified by true autologistic regression parameter configuration (left column), and sample size $n$.

| Coefficients $[\beta_0, \beta_1, \beta_2]$ | Coverage rate, $n$ = 900 | Coverage rate, $n$ = 3025 | Failure rate, $n$ = 900 | Failure rate, $n$ = 3025 |
|---|---|---|---|---|
| $[-4, 0.005, 0]$ | 0.956 | 0.954 | – | – |
| $[-4, 0.005, 1]$ | 0.963 | 0.946 | – | – |
| $[-6, 0.01, 0]$ | 0.968 | 0.964 | $9.20 \times 10^{-6}$ | – |
| $[-6, 0.01, 1]$ | 0.968 | 0.959 | – | – |
| Average | 0.964 | 0.956 | | |

Notes: Nominal coverage level is set to 95%. Dashes indicate no convergence failures.

For each of the eight simulation configurations, 2000 simulated data sets were generated. In each simulated data set, the consequent MPLEs were calculated and from these, the corresponding $\widehat{\mathrm{BMN}}$ and bootstrap-based BMNL were determined using the approach described in Section 2.4 via the ngspatial R package [22]. Then, the number of times these BMNLs correctly covered, i.e. remained below, the true value of the BMN for that parameter configuration was recorded. Dividing this by 2000 gave an empirical Monte Carlo estimate of the method's actual confidence level. Nominal confidence was set to $1 - \alpha = 0.95$; therefore with 2000 simulations per configuration, the approximate standard error of our empirical coverage rates at the nominal 95% level is $\sqrt{(0.05)(0.95)/2000} \approx 0.005$, and it never exceeds $\sqrt{(0.5)(0.5)/2000} \approx 0.011$.
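The standard-error arithmetic above is easy to verify numerically. The short Python sketch below recomputes the two binomial standard errors and, as an added illustration, expresses the lowest coverage rate in Table 3 (0.946) as a distance from the nominal level in standard-error units; the helper name `coverage_se` is hypothetical.

```python
import math


def coverage_se(p, n_sim):
    """Binomial standard error of an empirical coverage rate over n_sim simulations."""
    return math.sqrt(p * (1.0 - p) / n_sim)


N_SIM = 2000
se_nominal = coverage_se(0.95, N_SIM)  # at the nominal 95% level
se_worst = coverage_se(0.50, N_SIM)    # largest possible value, attained at p = 0.5

# Lowest entry in Table 3, compared with the nominal level in SE units.
z_lowest = (0.946 - 0.95) / se_nominal

print(f"{se_nominal:.3f} {se_worst:.3f} {z_lowest:.2f}")
```

A deviation of less than one standard error is well within ordinary Monte Carlo noise.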

Notice that under the construction based on Equation (7), the BMNL will correctly cover the true BMN from below if and only if the corresponding upper bootstrap limit $b_{1B}$ correctly covers $\beta_1$ from above. As a result, the operation is independent of the BMR, so these coverage assessments hold for any choice of BMR > 1.
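The equivalence can be written out in one line. The sketch below assumes the forms used in Section 3.2, namely $\mathrm{BMN}_{\mathrm{BMR}} = \log(\mathrm{BMR})/\beta_1$ and $\mathrm{BMNL}_{\mathrm{BMR}} = \log(\mathrm{BMR})/b_{1B}$, together with $\beta_1 > 0$, $b_{1B} > 0$, and BMR > 1 so that $\log(\mathrm{BMR}) > 0$:

```latex
\mathrm{BMNL}_{\mathrm{BMR}} \le \mathrm{BMN}_{\mathrm{BMR}}
\iff \frac{\log(\mathrm{BMR})}{b_{1B}} \le \frac{\log(\mathrm{BMR})}{\beta_1}
\iff \frac{1}{b_{1B}} \le \frac{1}{\beta_1}
\iff b_{1B} \ge \beta_1 .
```

The positive factor $\log(\mathrm{BMR})$ cancels in the second step, which is why the coverage event does not depend on the particular BMR.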

The simulation results appear in Table 3. These represent empirical coverage rates of our bootstrap-based lower 95% confidence limit on the true BMN under the centered autologistic model. (Indeed, they also represent empirical coverage rates of the bootstrap-based 95% upper confidence limit on $\beta_1$.) As can be seen, all values rest above the nominal 95% confidence coefficient except in the case of $\beta = [-4, 0.005, 1]^T$ and $n$ = 3025, and this value does not differ significantly from the nominal level. As sample size increases, the rates drop towards the nominal level, on average. They also show rough agreement at these sample sizes with a larger simulation study we conducted in our previous investigation of geospatial risk benchmarking [27]. Taken together, we find that these bootstrap-based confidence limits appear to operate in a reasonable, if slightly conservative, manner.

We did discover one minor instability with the MPL fitting algorithm. Apparently, cases can occur where the algorithm fails to converge in some bootstrap resamples, so that ngspatial reports only ‘NA’ for $\hat{\beta}_1$. The resulting bootstrap distribution therefore contains fewer than the desired $B$ = 5000 resampled $\hat{\beta}_1$ values within the simulation run. Table 3 also displays rates of how often this occurred in our simulations; each value is the number of reported NAs out of $10^7$ fitting attempts (i.e. 5000 bootstrap samples × 2000 simulated data sets) at each simulation configuration. As can be seen, failures only occur for the case of $\beta = [-6, 0.01, 0]^T$ at $n$ = 900, and the rate is quite low. In fact, this phenomenon was also observed in our previous study [27]: failure rates there were comparable to the larger sample sizes we consider herein. As in that study, we view this as a slight inconvenience and a tolerable consequence of employing such a complex model/fitting procedure.
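One way to handle such failed resamples is to drop them before forming the bootstrap limit and to record how often they occur. The Python sketch below is a hedged illustration only: it assumes a percentile-type upper limit with a nearest-rank rule (the exact bootstrap construction is described in Section 2.4, not here), represents a failed fit by `None` in place of R's NA, and uses an invented set of replicates; the function name `upper_bootstrap_limit` is hypothetical.

```python
def upper_bootstrap_limit(replicates, level_pct=95):
    """Return (upper limit, failure rate) from bootstrap slope replicates.

    Failed fits are represented by None and excluded from the limit.
    """
    valid = sorted(r for r in replicates if r is not None)
    if not valid:
        raise ValueError("every bootstrap fit failed")
    n_failed = len(replicates) - len(valid)
    # Integer ceiling of level_pct * m / 100 (nearest-rank percentile; an assumption).
    rank = (level_pct * len(valid) + 99) // 100
    return valid[rank - 1], n_failed / len(replicates)


# Hypothetical replicates: 100 successful fits plus 2 convergence failures.
toy_reps = [i / 10000 for i in range(41, 141)] + [None, None]
limit, fail_rate = upper_bootstrap_limit(toy_reps)
print(limit, fail_rate)
```

When failures are as rare as those reported in Table 3, dropping them changes the effective number of resamples only negligibly.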

4. Discussion

In our previous work [27], we showed how to incorporate spatial autocorrelation for environmental risk assessment with autocorrelated geospatial data via a centered autologistic model. In fact, those methods are sufficiently extensible to apply in a variety of data scenarios where spatial autocorrelation may challenge the more-simplistic models in common use. This paper separates from naïve spatial adjacency and further extends the autologistic benchmark model to novel, non-spatial, autocorrelated settings. We corroborate the flexibility of the centered autologistic framework established in our previous article; however, we also find that for a binary outcome and a single ‘dose’ predictor, one can employ a variety of different definitions for neighborhood structure to quantify autocorrelation. We are led to encourage data scientists to explore non-spatial relationships between locations – using metrics such as the BRIC scores – to provide informative and comparable measures of non-spatial autocorrelation with these sorts of environmental and geographic hazard data. Indeed, the use of BRIC scores to represent the broad features of resilience allows for a richer definition of counties without propinquity – i.e. counties that share resilience characteristics but that are not spatially adjacent [15,38].

Of course, some caveats and qualifications are in order. Our approach has focused on a logistic model for fitting and predicting hazardous risks from the 3108-counties data. Obviously, however, many different forms for modeling $\pi_i$ in Equation (3) could be applied. Doing so would affect a number of features in our construction, including the specific form of the risk function $R(x)$ and potentially the consequent form of the BMD in (5) and all quantities developed from it. Indeed, when considering these many models for $\pi_i$, or for that matter when selecting the percentile to define A, some advanced form of model averaging [33] might be applied to fine-tune the risk analysis. Clearly, extension to other model formulations under our spatially-adjusted paradigm is an area of open, future research.

Further, our choice of a simple Euclidean metric for the distances $d_{ij}$ in (1) was adopted as much for convenience and familiarity as for any other reason. Other distance metrics may be equally or more propitious if appropriate motivation for their use were available. For example, the well-known Manhattan distance (also called ‘taxicab’ or ‘city-block’ distance)

$$d_{ij} = \sum_{t=1}^{6} |b_{it} - b_{jt}| \tag{9}$$

could be applied in place of the Euclidean distance from (1) to define the proximity measure between two six-dimensional BRIC vectors, $b_i$ and $b_j$. One would then mimic the approach in Section 1.2.2 and set some threshold $D$ below which the counties are viewed as ‘neighbors’. The corresponding neighborhood structure can again be quantified via an adjacency matrix, A, as in Section 1.2.2.
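For a concrete sense of how the two metrics differ, the following Python sketch computes both distances for a pair of six-dimensional vectors. The vectors are hypothetical values chosen for illustration, not actual BRIC scores, and the helper name `manhattan` is an assumption.

```python
import math


def manhattan(u, v):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical six-dimensional score vectors for two counties.
b_i = (0.60, 0.55, 0.40, 0.70, 0.35, 0.50)
b_j = (0.50, 0.65, 0.40, 0.40, 0.45, 0.50)

d_manhattan = manhattan(b_i, b_j)   # sum of |b_it - b_jt| over the six components
d_euclidean = math.dist(b_i, b_j)   # square root of the summed squared differences
print(round(d_manhattan, 4), round(d_euclidean, 4))
```

The Manhattan distance is never smaller than the Euclidean distance for the same pair, so a given percentile breakpoint $D$ will sit at a different numerical value under each metric even when the resulting neighbor sets are similar.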

To explore this, we applied the Manhattan distance (9) in place of Euclidean distance to the 3108-counties data from Section 3, otherwise using the same settings and software as in that section. The MPL point estimates and various confidence limits appear in Table 4, while the consequent $\widehat{\mathrm{BMN}}$s and BMNLs appear in Table 5. Comparing the results to those in Table 1 and Table 2, respectively, we see that, on balance, the magnitudes of the various point estimates and confidence limits are of roughly similar value. There is a hint of slightly higher MPLEs and correspondingly lower benchmark points with the Manhattan metric, but with no strongly consistent pattern.

Table 4.

The MPLEs with pointwise 95% bootstrap confidence intervals for $\beta_1$ and $\beta_2$ for the 3108-counties database, given different BRIC-distance percentile neighborhood matrices A.

| A | $\hat{\beta}_0$ | $\hat{\beta}_1$ | $\hat{\beta}_2$ | $b_{1B}$ | $\beta_1$ lower | $\beta_1$ upper | $\beta_2$ lower | $\beta_2$ upper |
|---|---|---|---|---|---|---|---|---|
| 0.25th percentile | −4.3473 | 0.0058 | 1.1043 | 0.0070 | 0.0044 | 0.0073 | 1.0254 | 1.2025 |
| 0.5th percentile | −4.2174 | 0.0057 | 0.9905 | 0.0073 | 0.0041 | 0.0075 | 0.9160 | 1.0843 |
| 1st percentile | −4.1542 | 0.0062 | 0.6453 | 0.0078 | 0.0044 | 0.0082 | 0.5942 | 0.7099 |
| 2nd percentile | −4.1240 | 0.0058 | 0.4293 | 0.0076 | 0.0039 | 0.0080 | 0.3934 | 0.4768 |
| 3rd percentile | −4.1909 | 0.0056 | 0.3747 | 0.0075 | 0.0035 | 0.0078 | 0.3410 | 0.4212 |
| 4th percentile | −4.3138 | 0.0050 | 0.4128 | 0.0073 | 0.0028 | 0.0077 | 0.3707 | 0.4754 |
| 5th percentile | −4.3733 | 0.0045 | 0.4206 | 0.0071 | 0.0021 | 0.0076 | 0.3735 | 0.4857 |
| 6th percentile | −4.4033 | 0.0044 | 0.3928 | 0.0072 | 0.0018 | 0.0077 | 0.3473 | 0.4609 |
| 7th percentile | −4.4328 | 0.0041 | 0.3975 | 0.0074 | 0.0011 | 0.0081 | 0.3487 | 0.4730 |
| 8th percentile | −4.4552 | 0.0041 | 0.3579 | 0.0075 | 0.0008 | 0.0083 | 0.3139 | 0.4304 |
| 9th percentile | −4.4559 | 0.0039 | 0.3480 | 0.0080 | 0.0002 | 0.0088 | 0.3027 | 0.4230 |
| 10th percentile | −4.4741 | 0.0039 | 0.3315 | 0.0082 | 0.0003 | 0.0091 | 0.2880 | 0.4100 |

Notes: $b_{1B}$ represents an upper 95% bootstrap confidence limit for $\beta_1$ based on 5000 bootstrap resamples. Manhattan distance is employed to define the adjacency metric.

Table 5.

$\widehat{\mathrm{BMN}}_{\mathrm{BMR}}$ and pointwise 95% $\mathrm{BMNL}_{\mathrm{BMR}}$ at BMR = 10, 25 for each neighborhood structure matrix A with the 3108-counties data, using the same conditions as in Table 4.

| A | $\hat{\beta}_1$ | $b_{1B}$ | $\widehat{\mathrm{BMN}}_{10}$ | $\mathrm{BMNL}_{10}$ | $\widehat{\mathrm{BMN}}_{25}$ | $\mathrm{BMNL}_{25}$ |
|---|---|---|---|---|---|---|
| 0.25th percentile | 0.0058 | 0.0070 | 395.9773 | 327.4111 | 553.5524 | 457.7011 |
| 0.5th percentile | 0.0057 | 0.0073 | 401.8472 | 316.7504 | 561.7582 | 442.7980 |
| 1st percentile | 0.0062 | 0.0078 | 373.5223 | 293.8861 | 522.1617 | 410.8352 |
| 2nd percentile | 0.0058 | 0.0076 | 395.1405 | 301.9651 | 552.3828 | 422.1291 |
| 3rd percentile | 0.0056 | 0.0075 | 412.6542 | 308.6793 | 576.8658 | 431.5151 |
| 4th percentile | 0.0050 | 0.0073 | 457.5444 | 315.6176 | 639.6196 | 441.2145 |
| 5th percentile | 0.0045 | 0.0071 | 507.7418 | 323.5023 | 709.7926 | 452.2368 |
| 6th percentile | 0.0044 | 0.0072 | 527.7336 | 321.3651 | 737.7400 | 449.2491 |
| 7th percentile | 0.0041 | 0.0074 | 564.0084 | 310.5300 | 788.4499 | 434.1023 |
| 8th percentile | 0.0041 | 0.0075 | 560.7612 | 307.4593 | 783.9104 | 429.8096 |
| 9th percentile | 0.0039 | 0.0080 | 587.9782 | 288.9053 | 821.9582 | 403.8723 |
| 10th percentile | 0.0039 | 0.0082 | 592.5658 | 282.4419 | 828.3714 | 394.8369 |

Of course, one need not classify counties as strict ‘neighbors’ (or not) to their adjacent – in a non-spatial sense – counties. If it were known, e.g. that differential, a priori weights exist quantifying how much the neighborhood status depends on the distances, those weights could be incorporated into Equation (3). Or, for that matter, our MPL fit for the centered autologistic model could be replaced by some form of Bayesian fit [19,39] if sufficient hierarchical prior information were available to apply the Bayesian paradigm. Indeed, the concept of adjusting for autocorrelation in a logistic regression is obviously not new: besides Besag's original paper [2] and the centered extension in [7], applications include use of auxiliary variables to account for severity of adverse events [4], variational methods to capture spatial dependence via Gaussian processes [16], and extensions to (sparse) generalized linear mixed models [22], among many others. In all these cases, further development for implementation in our risk-analytic context, and how to account for the necessary benchmark components, would be required.

Acknowledgments

Thanks are due to Dr Stephan R. Sain for his seminal suggestions on developing non-spatial measures of autocorrelation, to Dr John Hughes for discussions on the centered autologistic model, and to an anonymous referee for quite helpful suggestions on how to improve the manuscript. This material represents a portion of the first author's PhD dissertation from the University of Arizona Graduate Interdisciplinary Program in Statistics.

Funding Statement

The research was supported in part by #ES027394 from the U.S. National Institutes of Health.

Data availability

The full 3108-counties database was generated from the SHELDUS™ knowledgebase (http://www.sheldus.org). Derived data employed in the calculations herein are available from the corresponding author [WWP] on request.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Arnold B.C. and Strauss D.J., Pseudolikelihood estimation: Some examples, Sank. Ser. B 53 (1991), pp. 233–243.
  • 2. Besag J.E., Nearest-neighbour systems and the auto-logistic model for binary data, J. R. Stat. Soc. Ser. B 34 (1972), pp. 75–83.
  • 3. Besag J.E., Statistical analysis of non-lattice data, J. R. Stat. Soc. Ser. D 24 (1975), pp. 179–195.
  • 4. Bee M., Benedetti R., and Espa G., Spatial models for flood risk assessment, Environmetrics 19 (2008), pp. 725–741.
  • 5. Berke P.R. and Campanella T.J., Planning for postdisaster resiliency, Ann. Am. Acad. Pol. Soc. Sci. 604 (2006), pp. 192–207.
  • 6. Budtz-Jørgensen E., Keiding N., and Grandjean P., Benchmark dose calculation from epidemiological data, Biometrics 57 (2001), pp. 698–706.
  • 7. Caragea P.C. and Kaiser M.S., Autologistic models with interpretable parameters, J. Agric. Biol. Environ. Stat. 14 (2009), pp. 281–300.
  • 8. Crump K.S., A new method for determining allowable daily intakes, Toxicol. Sci. 4 (1984), pp. 854–871.
  • 9. Crump K.S., Calculation of benchmark doses from continuous data, Risk. Anal. 15 (1995), pp. 79–89.
  • 10. Cressie N.A.C., Statistics for Spatial Data, John Wiley & Sons, New York, 1993.
  • 11. Cutter S.L., Barnes L., Berry M., Burton C., Evans E., Tate E., and Webb J., A place-based model for understanding community resilience to natural disasters, Glob. Environ. Change. 18 (2008), pp. 598–606.
  • 12. Cutter S.L., Ash K.D., and Emrich C.T., The geographies of community disaster resilience, Glob. Environ. Change. 29 (2014), pp. 65–77.
  • 13. Dukes K.A., Cronbach's alpha, in Encyclopedia of Biostatistics 2, P. Armitage and T. Colton, eds., John Wiley & Sons, Chichester, 1998, pp. 1026–1028.
  • 14. Folke C., Carpenter S., Elmqvist T., Gunderson L., Holling C.S., and Walker B., Resilience and sustainable development: Building adaptive capacity in a world of transformations, AMBIO: A J. Human Environ. 31 (2002), pp. 437–440.
  • 15. Gurney G.G., Blythe J., Adams H., Adger W.N., Curnock M., Faulkner L., James T., and Marshall N.A., Redefining community based on place attachment in a connected world, Proc. Natl. Acad. Sci. USA 114 (2017), pp. 10077–10082.
  • 16. Hardouin C., A variational method for parameter estimation in a logistic spatial regression, Spat. Stat. 31 (2019). Article No. 100365 (14 pp.).
  • 17. Harvey C., Extreme weather events could worsen climate change, Scientific American E&E News (24 January 2019). Available at https://www.scientificamerican.com/article/extreme-weather-events-could-worsen-climate-change/.
  • 18. Hand D.J., Blunt G., Kelly M.G., and Adams N.M., Data mining for fun and profit, Stat. Sci. 15 (2000), pp. 111–131.
  • 19. Hoeting J.A., Leecaster M., and Bowden D., An improved model for spatially correlated binary responses, J. Agric. Biol. Environ. Stat. 5 (2000), pp. 102–114.
  • 20. Holling C.S., Resilience and stability of ecological systems, Annu. Rev. Ecol. Syst. 4 (1973), pp. 1–23.
  • 21. Hosmer D.W., Lemeshow S., and Sturdivant R.X., Applied Logistic Regression, 3rd ed., John Wiley & Sons, New York, 2013.
  • 22. Hughes J., ngspatial: A package for fitting the centered autologistic and sparse spatial generalized linear mixed models for areal data, R. J. 6 (2014), pp. 81–95.
  • 23. Hughes J., Haran M., and Caragea P.C., Autologistic models for binary data on a lattice, Environmetrics 22 (2011), pp. 857–871.
  • 24. Klein R.J.T., Nicholls R.J., and Thomalla F., Resilience to natural hazards: How useful is this concept? Global Environ. Change Part B: Environ. Haz. 5 (2003), pp. 35–45.
  • 25. Kolaczyk E.D. and Csárdi G., Statistical Analysis of Network Data with R, Springer, New York, 2014.
  • 26. Liu J., Autologistic modeling in benchmark risk analysis, Ph.D. thesis, Interdisciplinary Program in Statistics, University of Arizona, Tucson, AZ, 2017.
  • 27. Liu J., Piegorsch W.W., Schissler A.G., and Cutter S.L., Autologistic models for benchmark risk or vulnerability assessment of urban terrorism outcomes, J. R. Stat. Soc. Ser. A 181 (2018), pp. 803–823.
  • 28. Manyena S.B., The concept of resilience revisited, Disasters 30 (2006), pp. 434–450.
  • 29. Martin C.R. and Savage-McGlynn E., A ‘good practice’ guide for the reporting of design and analysis for psychometric evaluation, J. Reprod. Infant Psychol. 31 (2013), pp. 449–455.
  • 30. Nardo M., Saisana M., Saltelli A., and Tarantola S., Handbook on Constructing Composite Indicators: Methodology and User Guide, Organisation For Economic Co-Operation and Development Publishing, Paris, 2008.
  • 31. Nitcheva D.K., Piegorsch W.W., West R.W., and Kodell R.L., Multiplicity-adjusted inferences in risk assessment: Benchmark analysis with quantal response data, Biometrics 61 (2005), pp. 277–286.
  • 32. Piegorsch W.W. and Bailer A.J., Analyzing Environmental Data, John Wiley & Sons, Chichester, 2005.
  • 33. Piegorsch W.W., An L., Wickens A., West W., Peña E.A., and Wu W., Information-theoretic model-averaged benchmark dose analysis in environmental risk assessment, Environmetrics 24 (2013), pp. 143–157.
  • 34. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2019. Available at http://www.R-project.org/.
  • 35. Scott D.W., Multivariate Density Estimation. Theory, Practice, and Visualization, John Wiley & Sons, New York, 1992.
  • 36. Tarabusi E.C. and Guarini G., An unbalance adjustment method for development indicators, Soc. Indic. Res. 112 (2013), pp. 19–45.
  • 37. Varin C., Reid N., and Firth D., An overview of composite likelihood methods, Stat. Sin. 21 (2011), pp. 5–42.
  • 38. Webber M.M., Order in diversity: Community without propinquity, in Cities and Space, L. Wingo, ed., Johns Hopkins University Press, Baltimore, 1963, pp. 23–56.
  • 39. Zheng Y. and Zhu J., Markov chain Monte Carlo for a spatial-temporal autologistic regression model, J. Comput. Graph. Stat. 17 (2008), pp. 123–137.


Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
