Adjusting statistical benchmark risk analysis to account for non-spatial autocorrelation, with application to natural hazard risk assessment

Jingyu Liu; Walter W Piegorsch; A Grant Schissler; Rachel R McCaster; Susan L Cutter

doi:10.1080/02664763.2021.1904385

2021 Apr 1;49(9):2349–2369. doi: 10.1080/02664763.2021.1904385

PMCID: PMC9225316 PMID: 35755089

ABSTRACT

We develop and study a quantitative, interdisciplinary strategy for conducting statistical risk analyses within the ‘benchmark risk’ paradigm of contemporary risk assessment when potential autocorrelation exists among sample units. We use the methodology to explore information on vulnerability to natural hazards across 3108 counties in the conterminous 48 US states, applying a place-based resilience index to an existing knowledgebase of hazardous incidents and related human casualties. An extension of a centered autologistic regression model is applied to relate local, county-level vulnerability to hazardous outcomes. Adjustments for autocorrelation embedded in the resiliency information are applied via a novel, non-spatial neighborhood structure. Statistical risk-benchmarking techniques are then incorporated into the modeling framework, wherein levels of high and low vulnerability to hazards are identified.

Keywords: Benchmark dose, centered autologistic model, maximum pseudo-likelihood, quantitative risk assessment, natural hazard vulnerability, non-spatial autocorrelation

1. Introduction

1.1. Background: Place-based risk analytics

Quantifying vulnerability to hazardous events is a critical component in contemporary environmental risk assessment. Driven by concerns over climate change and other consequences of natural or human-induced activities, data scientists, risk assessors, and policy makers are deeply involved in the effort to identify and characterize such localized vulnerabilities. According to the US National Oceanic and Atmospheric Administration (NOAA), calendar year 2020 was the most impactful year on record for weather-related disasters, in terms of losses exceeding one billion US dollars (https://www.ncdc.noaa.gov/billions/). The potential for future catastrophic events of similar ilk is an issue of clear concern, as scientists warn that climate change could make extreme weather events more damaging [17]. This recognition, and the debate associated therewith, motivates the approach we proffer here to quantify place-based vulnerabilities to natural and human-induced hazards.

We appeal to a mechanism from statistical risk analysis: estimation of minimum exposure levels, called Benchmark Doses (BMDs), that induce a pre-specified Benchmark Response (BMR) in a target population [8]. Established inferential approaches for BMD analysis typically involve one-sided confidence limits, leading in practice to what are called Benchmark Dose Lower Limits [BMDLs; see 9]. Within this context, we previously extended the BMD from its roots in environmental toxicology to characterize socio-geographic vulnerability among 132 US urban centers (‘cities’) to terrorist events, applying a quantitative, place-based vulnerability index (PVI) to a database of terrorism incidents and related human casualties [27]. We employed a centered autologistic regression model that related urban vulnerability to terrorist outcomes and also adjusted for autocorrelation in the geospatial data. Benchmark ‘doses’ were then estimated within this modeling framework, wherein levels of greater and lesser urban vulnerability to terrorism were identified. This new, translational adaptation of the risk-benchmark approach, including its ability to account for geospatial autocorrelation, was seen to operate quite flexibly in that socio-geographic setting. Herein, we extend and evolve those results by allowing for a broader definition of geographic autocorrelation, and by employing the expanded methodology to mine a much larger geographic database – described in the next section – of natural hazard outcomes.

1.2. Hazard assessment for 3108 US counties

Expanding on our previous work, we apply our autocorrelated benchmark risk approach to a larger, more-complex knowledgebase of geospatial environmental responses to natural or human-induced hazards: the Spatial Hazard Events and Losses Database for the United States (SHELDUS™). SHELDUS is a county-level geo-referenced knowledgebase of economic and human losses from 18 different types of natural hazards for the continental US, Alaska, and Hawaii (http://www.sheldus.org). It covers the period from 1960 to the present and includes data fields on location (FIPS code), date of event, and direct losses (property losses, crop losses, injuries, deaths). The original input loss data are from US government agencies – e.g. the US Geological Survey, the US Department of Agriculture, NOAA, and the Federal Emergency Management Agency – updated for locational specificity to counties. SHELDUS represents the only comprehensive natural hazards loss database in the US and is widely employed by researchers, businesses, and governmental agencies.

We centered our attention on 3108 counties in the 48 conterminous US states (excluding Broomfield, CO, for which data were not available). For the outcome variable, we explored how disasters led to cumulative property or crop losses of one billion US dollars or more. The billion-dollar threshold is used by NOAA to highlight singular, extreme weather events (https://www.ncdc.noaa.gov/billions/), but of course significant losses such as these can be a consequence of much more than weather-related impacts. At the inception of the billion-dollar loss threshold in 1980, an event causing more than one billion dollars of losses was felt to define an extreme weather event in terms of adverse economic consequences. As a result, we adopted billion-dollar cumulative losses as our indicator of disaster-related, per-county outcomes: from SHELDUS, we created a binary response variable, $Y_{i}$ , indicating whether or not the cumulative loss of property and crop damage in the ith US county exceeded one billion dollars, as reported in calendar year 2015 $(i = 1, \dots, 3108)$ . Of the 3108 counties in our database, $\sum Y_{i} = 79$ , or 2.5%, reported a cumulative loss exceeding this one billion dollar threshold.

Associated with these adverse outcomes, we set the target predictor variable, $x_{i}$ , as the number of hazards that caused a fatality or loss in a county during the previous 26 years (1990–2015). This number-of-hazards predictor distinguishes between common everyday events that add up over time to significant losses and losses based on a few events whose impacts were very large.

Across the 3108 counties, x ranged from 6 to 1247 with arithmetic mean equal to 135.7 and standard deviation equal to 108.0. Figure 1 displays a histogram (with overlaid kernel density estimate) for the target predictor, indicating a unimodal and clearly right-skewed pattern.

Figure 2 gives a further graphical summary for x by mapping the geographic distribution of the predictor over the 3108 US counties. A clearly non-homogeneous pattern is indicated, with higher numbers of hazards observed in the southwest desert, Iowa, along the eastern Great Lakes and most of New York, and otherwise a scattering of higher numbers throughout the central and eastern US.

Our interest with these data is the modeled probability, $π_{i}$ , that the ith county exhibits a cumulative one billion dollar loss in 2015 as predicted by its hazardous fatality/loss history. As with our previous analysis of urban terrorism [27] we propose here that beyond a relationship to numbers of hazardous fatalities/losses, $π_{i}$ would also be affected in some fashion by other, possibly autocorrelated features relating to place-based vulnerability. In effect, we define a neighborhood or clique for the ith county comprised of those other US counties exhibiting similar features, inducing autocorrelation in the response $Y_{i}$ . Rather than rely on simple geospatial adjacency, however, we postulate that the definition of a location's neighborhood need not be limited to map-based coordinates: it is not unlikely that two counties' outcomes may correlate in ways defined other than by spatial proximity. For instance, consider two counties on the US Chesapeake Bay such as Anne Arundel county and another physically adjacent county farther to the south from the nearby city of Baltimore, say, Calvert county, MD. Clearly, due to their locations on the Chesapeake Bay the two counties share much in common. Anne Arundel's regional vulnerability to losses from a hazardous incident might not be as strongly correlated, however, with its physical neighbor to the south (Calvert) as much as with another more-distant locality exhibiting similar environmental, geophysical, and socio-economic characteristics – say, Dearborn county, IN, which is near to the metropolitan city of Cincinnati, OH (comparable to Baltimore) – and vice versa. Indeed, by expanding the neighborhood definition beyond simple spatial proximity, our vulnerability analysis may be able to ‘borrow strength’ from across the ensemble and consequently improve risk-analytic precision. This may be especially true for counties in the more-dispersed central and western US states, where large urban centers are more isolated geospatially.

As is true in many areas of scientific endeavor, by studying familiar processes in unfamiliar settings a better grasp can be achieved of both the familiar processes and the new settings. With this in mind, we consider an alternative definition for the cliques that allows for physically non-adjacent localities to be included in each neighborhood, as long as these also conform to the underlying statistical requirements for an appropriate neighborhood structure. That is, the indices defining the elements (counties) within each neighborhood and their associated responses must satisfy a series of conditional probability and response requirements given, e.g. by Cressie [10, §6.4.1]. In all the operations we present below, these various requirements are satisfied. We focus our attention on an autocorrelated logistic model's ability to extend into non-spatial settings, and show how the model can apply successfully in the presence of non-spatial autocorrelation. While the context changes from our previous article [27], the concept retains surprising flexibility. The effort in effect resembles a data mining/knowledge discovery exercise [18]: query the large 3108-counties database to uncover potential characteristics for defining a neighborhood structure that may prove more informative than quantities based on simple, naïve spatial propinquity.

1.2.1. Baseline resilience indicators for communities (BRIC)

To quantify and explore non-geographical definitions for the neighborhood structure when studying county-level relationships between numbers of hazards and cumulative billion-dollar losses, we employ a metric describing Baseline Resilience Indicators for Communities [BRIC; see 12] for each county. We quantify these extended neighborhood characteristics by focusing attention on a single county's resilience to hazardous impacts. Holling [20] first used the term resilience, from an ecological perspective, to describe a ‘measure of the persistency of systems and their ability to absorb change and disturbance and still maintain the same relationships between populations or state variables’ [20, p. 14]. This notion of resilience has gained diverse popularity in the past few decades [14,24,28]. In terms of natural hazards, resilience is defined as the ability of a system – here, a county – to survive, respond to, and cope with a hazardous natural or human-induced event [5,11].

To measure the inherent resilience of counties in the United States, Cutter et al. [12] quantified six different types of resilience identified in the extant literature. An initial set of 61 variables was collected from 30 different sources to represent the six forms of resilience: social, economic, community, institutional, housing/infrastructure, and environmental (see that 2014 article for full details). The authors then converted the original count variables into percentages, rates, differences, or averages in order to reliably compare counties of different sizes and characteristics. The data then underwent min-max normalization [36] so that all values had comparable reference points on a scale of 0 to 1. Cutter et al. also adjusted the orientation of each variable to make larger values correspond theoretically to higher resilience. They then discarded conceptually extraneous variables and also applied Cronbach's alpha coefficient [13] to assess the internal consistency among the remaining variables in the dataset when building the composite scores [29,30]. Their final set contained 49 indicators (Cronbach alpha = 0.66) which were further distilled into their six resilience measures, each of whose potential scores range from 0 to 1. Note that we treat these six-dimensional BRIC scores as constants derived from each county's underlying (i) social, (ii) economic, (iii) community, (iv) institutional, (v) housing/infrastructure, and (vi) environmental characteristics, in similar fashion to the less-extensive PVI from our earlier 132-cities dataset [27].

1.2.2. Non-spatial autocorrelation

The six BRIC components measure qualities of a community that may enhance or detract from its ability to prepare for, respond to, recover from, and mitigate environmental hazards in the six targeted domains: social, economic, housing & infrastructure, institutional, community, and environmental. Therefore, for each county in our 3108-counties database, the BRIC components as a whole represent highly informative and comprehensive markers for natural-hazard risk assessment. It is then straightforward to employ them for defining a non-spatial neighborhood structure. Denote $b_{i} = [b_{i 1} \dots b_{i 6}]$ and $b_{j} = [b_{j 1} \dots b_{j 6}]$ as the vectors of six-dimensional BRIC scores of county i and county j, respectively. The Euclidean distance can then be calculated between the two six-dimensional vectors for any pair of counties in the conterminous US:

d_{i j} = \sqrt{\sum_{t = 1}^{6} (b_{i t} - b_{j t})^{2}} .

(1)

This produces a simple, well-defined ‘proximity’ measure for each pair of counties, from which neighborhoods may be derived. Figure 3 presents a histogram for the univariate distances between each pairing of the 3108 counties. The pattern displays a clear right skew.

We next set some threshold or breakpoint, D, at or below which the ith and jth counties are viewed as ‘neighbors’; i.e. they possess similar six-dimensional disaster resilience. Then, the corresponding neighborhood structure can be quantified via an adjacency matrix, $A$ , whose elements are simply

a_{i j} = \begin{cases} 1, & \text{if } 0 \leq d_{i j} \leq D, \\ 0, & \text{otherwise} . \end{cases}

For exploratory purposes, we used percentiles of the empirical distributions of the $d_{i j}$ s to define D. For mining SHELDUS we studied a range of small percentiles: 0.25th, 0.5th, 1st, 2nd, 3rd,…, 9th, 10th. Larger percentiles were considered as well, but these gave such highly dispersed ‘neighborhood’ patterns that no significant relationships were observed between the variables. This had us limit ourselves to the 10th percentile and below.
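
The construction of $A$ from Equation (1) and a percentile breakpoint can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the analysis (which was carried out in R): the function names, the nearest-rank percentile rule, the choice to set the diagonal of the matrix to zero, and the toy score vectors are all assumptions made here for illustration.

```python
import math
from itertools import combinations

def euclidean(b_i, b_j):
    """Distance d_ij from Equation (1) between two six-dimensional score vectors."""
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(b_i, b_j)))

def percentile_threshold(distances, pct):
    """Breakpoint D as the pct-th percentile of the pairwise distances.

    Assumption: a nearest-rank percentile rule; other percentile
    definitions would give slightly different breakpoints.
    """
    ordered = sorted(distances)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def adjacency_matrix(scores, pct):
    """Return (A, D) with a_ij = 1 when d_ij <= D and 0 otherwise.

    Assumption: the diagonal is left at 0 so that a county is not
    treated as its own neighbor.
    """
    n = len(scores)
    dists = {(i, j): euclidean(scores[i], scores[j])
             for i, j in combinations(range(n), 2)}
    D = percentile_threshold(list(dists.values()), pct)
    A = [[0] * n for _ in range(n)]
    for (i, j), d in dists.items():
        if d <= D:
            A[i][j] = A[j][i] = 1
    return A, D

# Toy scores in [0, 1] for four hypothetical counties (not actual BRIC data).
toy_scores = [
    [0.50, 0.40, 0.30, 0.60, 0.20, 0.70],
    [0.52, 0.41, 0.29, 0.61, 0.22, 0.69],
    [0.10, 0.90, 0.80, 0.20, 0.75, 0.15],
    [0.55, 0.35, 0.33, 0.58, 0.25, 0.72],
]
A, D = adjacency_matrix(toy_scores, pct=20)
for row in A:
    print(row)
```

In this toy setting the third county has a very different score profile and ends up with no neighbors, which mirrors how a small percentile keeps only the most similar pairs.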

We also included naïve spatial autocorrelation, by constructing an adjacency matrix with a simple rook-adjacency neighborhood structure, as employed in [27]. For our SHELDUS data, approximately 0.2% of the $\binom{3108}{2}$ pairs are spatially adjacent ‘neighbors’ to each other. Our goal herein is to employ these various metrics, explore vulnerability, and characterize risk to hazardous events while adjusting for possible autocorrelated community-level resilience in the SHELDUS 3108-counties database.

In what follows, Section 2 reviews the autologistic benchmark framework we employed in [27], along with details on how the risk-benchmark calculations can be incorporated into a centered autologistic regression model. In Section 3 we return to the 3108-counties database from above and apply the risk-analytic benchmark approach to illustrate its use with our novel, non-spatial autocorrelated model structure. Section 4 ends with a brief discussion. All calculations we present here were performed in the R programming environment [34].

2. Autologistic model development

2.1. Centered autologistic model

We appeal to the well-known logistic regression model [21], which is widely applied with binary data such as we have with our 3108-counties database. A simple and direct way to account for spatial autocorrelation within a logistic model allows the response probability, $π_{i}$ , at the ith observation – here, the ith county – to depend upon the other observed binary responses in some pre-defined, possibly non-spatial, neighborhood, $N_{i}$ , for that county. Besag [2] first proposed such a formulation to incorporate neighboring autocorrelation, the conditional autologistic model

π_{i} = P [Y_{i} = 1 | x_{i}, y_{j}, j \neq i] = \frac{1}{1 + \exp \{- β_{0} - β_{1} x_{i} - β_{2} \sum_{j \in N_{i}} a_{i j} y_{j}\}} .

(2)

In our notation, $y_{i}$ is the outcome indicator, $x_{i}$ is the ‘dose’ predictor, the $a_{i j}$ s stipulate each $y_{i}$ 's BRIC-based neighborhood $N_{i}$ (see Section 1.2), and $\sum_{j \in N_{i}} a_{i j} y_{j}$ is an autocovariate constructed from the information within $N_{i}$ . We refer to $β_{2}$ as the autocorrelation parameter. When $β_{2} = 0$ , no autocorrelation exists in the data, and the autologistic model then reduces to a standard logistic regression (which we call the ‘independence model’).

Caragea and Kaiser [7] offered a correction to (2) that extends the conditional model into a centered autologistic form, by redefining the spatial autocovariate (essentially, by centering it):

π_{i} = P [Y_{i} = 1 | x_{i}, y_{j}, j \neq i] = \frac{1}{1 + \exp \{- β_{0} - β_{1} x_{i} - β_{2} \sum_{j \in N_{i}} a_{i j} (y_{j} - μ_{j})\}} ,

(3)

where the new quantity $μ_{j} = E [Y_{j} | β_{2} = 0] = \{1 + \exp (- β_{0} - β_{1} x_{j})\}^{- 1}$ is the expected value of $Y_{j}$ under an independence model with no autocorrelation. Centering makes the estimates more stable and interpretable, and as such the centered autologistic construction in (3) provides a direct and simple way to build autocorrelated dependency into a binary regression ([23]; [25, §8.3]). We therefore turn to (3) for constructing quantitative risk/vulnerability assessments with our data. More details on the importance of autologistic centering are available in [7], [23], and our own previous discussion [27].
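
To make the centering in (3) concrete, the following Python sketch evaluates $π_{i}$ for a single unit. It is an illustration only: the function names, the three-unit toy data, and the coefficient values are hypothetical and are not taken from the fitted models reported later.

```python
import math

def independence_mean(beta0, beta1, x_j):
    """mu_j: response probability under the independence model (beta2 = 0)."""
    return 1.0 / (1.0 + math.exp(-beta0 - beta1 * x_j))

def centered_autologistic_prob(i, beta, x, y, A):
    """pi_i from Equation (3), conditional on the other responses y_j."""
    beta0, beta1, beta2 = beta
    # Centered autocovariate: sum over neighbors of a_ij * (y_j - mu_j).
    autocov = sum(
        A[i][j] * (y[j] - independence_mean(beta0, beta1, x[j]))
        for j in range(len(y)) if j != i
    )
    eta = beta0 + beta1 * x[i] + beta2 * autocov
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical toy data: three units on a chain 0 - 1 - 2.
x = [50.0, 120.0, 300.0]
y = [0, 1, 1]
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
beta = (-4.2, 0.006, 0.67)  # illustrative values only

p_centered = centered_autologistic_prob(0, beta, x, y, A)
p_independent = independence_mean(beta[0], beta[1], x[0])
print(p_centered, p_independent)
```

Because the single neighbor of unit 0 has an observed response above its independence-model mean and the illustrative $β_{2}$ is positive, the centered probability is larger than the independence-model probability.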

We note in passing that our non-spatial adjacency model and a model employing only naïve adjacency can both be accommodated under the centered autologistic construction in Equation (3), via appropriate definition of the neighborhoods, $N_{i}$ .

2.2. Maximum pseudo-likelihood estimation

Allowing for autocorrelation induces dependencies among the observations, complicating standard likelihood analysis with our centered autologistic model. Instead, we estimate the unknown β-parameters in (3) via maximization of the pseudo-likelihood function, as proposed by Besag [3]. Besag found that by multiplying together the conditional probability distributions of the $Y_{i}$ s given their neighbors, the resulting pseudo-likelihood

\prod_{i = 1}^{n} P [Y_{i} = y_{i} | x_{i}, y_{j}, j \neq i] = \prod_{i = 1}^{n} π_{i}^{y_{i}} (1 - π_{i})^{1 - y_{i}}

possesses many of the same features as the usual likelihood function, despite any existing dependencies among the observations. In particular, maximum pseudo-likelihood estimates (MPLEs) of the β-parameters are consistent and asymptotically normal under typical regularity conditions [1].
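
As a rough sketch of the objective being maximized, the log of the pseudo-likelihood can be evaluated directly from (3). The function name and toy data below are hypothetical, and this is not the ngspatial implementation; an optimizer would still be needed to locate the MPLEs.

```python
import math

def log_pseudo_likelihood(beta, x, y, A):
    """Log of the pseudo-likelihood under the centered model in Equation (3)."""
    beta0, beta1, beta2 = beta
    n = len(y)
    # Independence-model means mu_j used for centering.
    mu = [1.0 / (1.0 + math.exp(-beta0 - beta1 * xj)) for xj in x]
    total = 0.0
    for i in range(n):
        autocov = sum(A[i][j] * (y[j] - mu[j]) for j in range(n) if j != i)
        eta = beta0 + beta1 * x[i] + beta2 * autocov
        # log{pi_i^y_i (1 - pi_i)^(1 - y_i)} = y_i * eta - log(1 + exp(eta))
        total += y[i] * eta - math.log1p(math.exp(eta))
    return total

# Hypothetical toy data: three units on a chain 0 - 1 - 2.
x = [50.0, 120.0, 300.0]
y = [0, 1, 1]
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

lpl = log_pseudo_likelihood((-4.2, 0.006, 0.67), x, y, A)
print(lpl)
```

With $β_{2} = 0$ the autocovariate drops out and the function returns the ordinary logistic log-likelihood, which gives a simple sanity check on any implementation.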

2.3. Autologistic benchmark risk analysis

With binary data, a traditional benchmark-risk analysis relates a pre-specified level of outcome response – the benchmark response, BMR – to the response probability $π (x)$ viewed as a function of the ‘dose’ variable x. The goal is to find the smallest positive x at which a background-adjusted function of $π (x)$ , $R (x) = \{π (x) - π (0)\} / \{1 - π (0)\}$ , equals the BMR. The solution is the benchmark dose (BMD), past which a location's risk to some hazardous event is elevated to unacceptable levels at that BMR. The ‘background’ adjustment is included to account for extraneous factors out of the risk assessor's control; a prototypical example is correction for spontaneous (zero-exposure) tumor incidence when assessing the risk of exposure to a carcinogen [32, §4.2.1].

An added complexity with the centered autologistic model in (3) is, however, presence of the centered autocovariate $\sum_{j \in N_{i}} a_{i j} (y_{j} - μ_{j})$ . To overcome this, in [27] we exploited a clever definition for the benchmark response proposed by Budtz-Jørgensen et al. [6]: view BMR as a specified proportional increase over zero-level background in the odds of an adverse event, evaluated at the same constellation of secondary covariates. At least for logistic-type models, the functional dependency on secondary covariates then cancels out in the various ratio operations [26]. Budtz-Jørgensen et al. provided no guidance on selection of the BMR here, so we mimic our previous approach from [27]: we set BMR = 10 or 25, as we found that an increase of these magnitudes in the odds could be useful markers in practice. It will require future, more-extensive benchmark risk analyses with autocorrelated data to determine if selection of BMR = 10 or BMR = 25 performs as suitably as we have seen with our data.

Employed with the centered autologistic regression model in (3), we therefore define the BMD as the dose resulting in a pre-specified increase (of BMR multiples) in the odds for an abnormal response. That is, solve for x in

BMR = \frac{π (x | β) / \{1 - π (x | β)\}}{π (0 | β) / \{1 - π (0 | β)\}},

(4)

where $π (\cdot | β)$ is the centered autologistic response probability based on (3) and $β = [β_{0} β_{1} β_{2}]^{T}$ is the vector of unknown autologistic coefficients. Note that for adverse outcomes we anticipate $β_{1} > 0$ , representing an underlying increase in the probability of an adverse event as x increases. This typically defines a proportional rise in odds, thus we operate implicitly with $BMR > 1$ .

As illustrated in [27], the odds ratio in (4) simplifies to

\frac{π (x | β) / \{1 - π (x | β)\}}{π (0 | β) / \{1 - π (0 | β)\}} = \frac{\exp (β_{0} + β_{1} x + β_{2} \sum_{j \in N_{i}} a_{i j} (y_{j} - μ_{j}))}{\exp (β_{0} + β_{2} \sum_{j \in N_{i}} a_{i j} (y_{j} - μ_{j}))} = \exp (β_{1} x) .

Setting this odds ratio equal to the BMR and solving for x then yields the unique BMD at the given BMR:

{BMD}_{BMR} = \frac{\log (BMR)}{β_{1}} .

(5)

This is the risk/vulnerability value at which the odds of an adverse event equal BMR times the odds at a zero background value.
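
The cancellation of the autocovariate in the odds ratio, and the resulting closed form in (5), can be checked numerically. The short Python sketch below uses hypothetical coefficient values and function names; it simply confirms that the odds ratio at the BMD equals the BMR whatever fixed value the centered autocovariate takes.

```python
import math

def odds(beta, x, autocov):
    """Odds pi / (1 - pi) under Equation (3) for a given autocovariate value."""
    beta0, beta1, beta2 = beta
    return math.exp(beta0 + beta1 * x + beta2 * autocov)

def benchmark_dose(bmr, beta1):
    """BMD from Equation (5); requires BMR > 1 and beta1 > 0."""
    if bmr <= 1 or beta1 <= 0:
        raise ValueError("Equation (5) assumes BMR > 1 and beta1 > 0")
    return math.log(bmr) / beta1

beta = (-4.0, 0.005, 0.5)  # hypothetical coefficients for illustration
bmd10 = benchmark_dose(10, beta[1])

# The odds ratio at the BMD equals the BMR for any fixed autocovariate value.
for autocov in (-0.5, 0.0, 2.0):
    ratio = odds(beta, bmd10, autocov) / odds(beta, 0.0, autocov)
    print(autocov, ratio)
```

The same function applies to any BMR greater than one, so moving from BMR = 10 to BMR = 25 only rescales the BMD by $\log (25) / \log (10)$.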

2.4. Benchmark estimation and inference

From the data pairs $(x_{i}, Y_{i})$ for each ith location $(i = 1, \dots, n)$ , we fit the centered autologistic regression model from (3) via maximum pseudo-likelihood, as in Section 2.2. The result is a vector of MPLEs $\hat{β} = [{\hat{β}}_{0} {\hat{β}}_{1} {\hat{β}}_{2}]^{T}$ . To estimate these we employ the R package ngspatial [22]; see https://CRAN.R-project.org/package=ngspatial. Given the MPLE ${\hat{β}}_{1}$ (and a predetermined level for the BMR), from (5) the corresponding point estimate for the BMD is then simply

{\hat{BMD}}_{BMR} = \frac{\log (BMR)}{{\hat{β}}_{1}} .

(6)

We further desire a lower $1 - α$ confidence limit on the BMD, denoted as BMDL $_{BMR}$ . The straightforward form for the BMD in (5) allows for a particularly simple construction, at least conceptually: if $b_{1 U}$ is an upper $1 - α$ confidence limit on $β_{1}$ such that $P [β_{1} < b_{1 U}] = 1 - α$ at least approximately in large samples, then clearly

P [\frac{\log (BMR)}{b_{1 U}} < \frac{\log (BMR)}{β_{1}}] = P [\frac{\log (BMR)}{b_{1 U}} < {BMD}_{BMR}] = 1 - α

(7)

if $BMR > 1$ . This defines a lower $1 - α$ confidence limit on the BMD, thus we take BMDL $_{BMR} = \log (BMR) / b_{1 U}$ . We note in passing that Equation (7) can equivalently be written as $P [b_{1 U}^{- 1} \log (BMR) < {BMD}_{BMR}, \forall BMR > 1] = 1 - α$ , which provides a simultaneous lower confidence statement on the BMD, in the sense of Nitcheva et al. [31].

Actually finding such a $b_{1 U}$ confidence limit is a more difficult task, however, since it is inappropriate under the centered autologistic model to imitate standard practice and estimate the standard error of ${\hat{β}}_{1}$ by inverting the MPL information matrix [37]. Instead, we follow a suggestion by Hughes [22], who calls for a computer-intensive, parallel, parametric bootstrap approach. The method can be implemented in the ngspatial package [22], and it returns a bootstrap distribution of $B > 0$ resampled $β_{1}$ values, based on the original data. An approximate upper $1 - α$ confidence limit can then be taken as the $(1 - α)$ quantile of this bootstrap distribution, i.e. the $(1 - α) B$th ordered value, denoted as $b_{1 B}$ . Following [27], we operate with B = 5000 bootstrap resamples. From this, a bootstrap-based lower confidence limit for the BMD becomes

{BMDL}_{BMR} = \frac{\log (BMR)}{b_{1 B}} .

(8)
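
A minimal Python sketch of the step from a bootstrap distribution to the limit in (8) follows. The bootstrap values here are synthetic normal draws standing in for resampled $β_{1}$ estimates; they are not output from ngspatial, and the function names and the nearest-rank quantile rule are assumptions made for illustration.

```python
import math
import random

def upper_limit(boot_values, alpha=0.05):
    """Upper 1 - alpha limit: the ceil((1 - alpha) * B)-th ordered bootstrap value.

    Assumption: a nearest-rank quantile rule.
    """
    ordered = sorted(boot_values)
    k = math.ceil((1 - alpha) * len(ordered))
    return ordered[k - 1]

def bmdl(bmr, boot_values, alpha=0.05):
    """Bootstrap-based lower limit on the BMD, as in Equation (8)."""
    return math.log(bmr) / upper_limit(boot_values, alpha)

# Synthetic stand-in for B = 5000 bootstrap resamples of the slope estimate.
rng = random.Random(0)
boot = [rng.gauss(0.006, 0.001) for _ in range(5000)]

b1B = upper_limit(boot, alpha=0.05)
print(b1B, bmdl(10, boot), bmdl(25, boot))
```

Since the upper limit on $β_{1}$ sits in the denominator, a wider bootstrap distribution pushes the BMDL further below the point estimate of the BMD.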

3. Benchmark analysis for the 3108-counties data

3.1. Maximum pseudo-likelihood estimates

Returning to the 3108-counties database from Section 1.2, we regressed Y on $x =$ number of hazards causing a county fatality or loss in the 1990–2015 period, via the centered autologistic model in Equation (3) employing Euclidean distance from (1). This produced MPLEs ${\hat{β}}_{0}$ , ${\hat{β}}_{1}$ , and ${\hat{β}}_{2}$ for the unknown parameters. For each percentile and its corresponding adjacency neighborhood matrix $A$ introduced in Section 1.2.2, Table 1 provides the MPLEs, pointwise 95% bootstrap confidence intervals for $β_{1}$ and $β_{2}$ , and an upper 95% bootstrap confidence limit on $β_{1}$ , denoted as $b_{1 B}$ from Section 2.4. Notice that values of ${\hat{β}}_{0}$ are negative and values of ${\hat{β}}_{1}$ and ${\hat{β}}_{2}$ are positive in all cases. Further, the MPLEs for $β_{0}$ and $β_{1}$ are quite close under each percentile-defined adjacency matrix $A$ . The autocorrelation parameter estimate, ${\hat{β}}_{2}$ , generally decreases as the percentile used to define $A$ increases. When the neighborhood structure is described via simple adjacency, the three coefficient estimates are evidently larger in absolute value compared with those obtained under the non-spatial, BRIC percentile-defined $A$ matrices.

Table 1.

The MPLEs with pointwise 95% bootstrap confidence intervals for $β_{1}$ and $β_{2}$ for the 3108-counties database, given different BRIC-distance percentile neighborhood matrices $A$ .

$A$	${\hat{β}}_{0}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$	$b_{1 B}$	$β_{1}$ lower	$β_{1}$ upper	$β_{2}$ lower	$β_{2}$ upper
Naïve adjacency	−6.1665	0.0106	1.6745	0.0123	0.0068	0.0127	1.5064	1.8019
0.25th percentile	−4.4029	0.0062	1.1753	0.0074	0.0047	0.0077	1.0941	1.2787
0.5th percentile	−4.2758	0.0059	1.0198	0.0074	0.0041	0.0077	0.9404	1.1200
1st percentile	−4.1957	0.0060	0.6714	0.0076	0.0042	0.0079	0.6180	0.7399
2nd percentile	−4.1876	0.0057	0.4808	0.0075	0.0037	0.0079	0.4395	0.5354
3rd percentile	−4.1832	0.0053	0.3684	0.0072	0.0033	0.0076	0.3348	0.4130
4th percentile	−4.3289	0.0048	0.4306	0.0069	0.0021	0.0074	0.3543	0.4605
5th percentile	−4.3394	0.0044	0.3971	0.0069	0.0022	0.0074	0.3549	0.4608
6th percentile	−4.4292	0.0042	0.4198	0.0073	0.0014	0.0079	0.3721	0.4939
7th percentile	−4.4258	0.0040	0.3842	0.0071	0.0011	0.0078	0.3380	0.4554
8th percentile	−4.4516	0.0040	0.3484	0.0073	0.0006	0.0080	0.3049	0.4156
9th percentile	−4.4873	0.0041	0.3173	0.0078	0.0008	0.0087	0.2783	0.3817
10th percentile	−4.5124	0.0041	0.3035	0.0081	0.0004	0.0089	0.2654	0.3669


Notes: $b_{1 B}$ represents an upper 95% bootstrap confidence limit for $β_{1}$ . All confidence statements are based on 5000 bootstrap resamples. Euclidean distance as in Equation (1) is employed to define the adjacency metric (apart from naïve adjacency).

The ngspatial package was used to report pointwise 95% bootstrap confidence intervals for the non-intercept parameters in Table 1, where we see every pointwise interval for $β_{1}$ fails to contain $β_{1} = 0$ . Thus it can be reasonably concluded that the number of hazards significantly affects the response probability π; from the strictly positive values in every interval it appears to do so via an increasing relationship. Similarly, as the pointwise 95% intervals for $β_{2}$ fail to contain $β_{2} = 0$ , we conclude that the autocovariate is important for describing features of π. And, as the pointwise $β_{2}$ intervals present only positive values, the autocorrelation appears positive. As a result, our analysis suggests that when a US county exhibits a cumulative billion-dollar-loss event, another county with ‘neighboring’ BRIC patterns would be expected to experience such a result as well, and vice versa.

Figure 4 presents histograms and overlaid kernel density estimates of the corresponding bootstrap distributions for ${\hat{β}}_{1}$ and ${\hat{β}}_{2}$ , based on 5000 bootstrap values from ngspatial, when the neighborhood structure is defined using the 1st percentile (see Section 3.2) of the six-dimensional BRIC Euclidean distances between each pair of counties. Both appear roughly bell-shaped, suggesting that the large-sample features of the MPLEs are close to if not fully realized with the 3108 counties in this database.

3.2. Benchmark risk assessment

Moving to the risk-benchmark calculations, we apply Equation (6) for a point estimate of ${BMD}_{BMR}$ . In our application, the benchmark ‘dose’ variable is the number of hazards per county that caused a fatality or loss in the 1990–2015 period; therefore, for clarity, we refer to the BMD here as a Benchmark Number, or BMN. As mentioned above, the choice for the BMR here is an open issue; to our knowledge, the benchmark paradigm has been applied with autocorrelated models of this sort only in our previous work [27]. Thus a certain portion of this exercise must be viewed as exploratory in nature. To begin, BMR = 10 was taken as an initial choice for the relative increase in odds over background when dealing with the 3108-counties data. In addition, and similar to our earlier analysis with the US 132-cities data [27], for comparison purposes a higher benchmark response at BMR = 25 was also considered, anticipating that an increase of this magnitude in the odds could be an informative marker for practical use. Notice that the relationship between $\hat{BMN}$ and BMR specified by Equation (6) still holds. At BMR = 10, this leads to ${\hat{BMN}}_{10} = \log (10) / {\hat{β}}_{1}$ . Given a one-sided, 95% bootstrap confidence bound of the form $b_{1 B} > β_{1}$ from the MPL fit (above), the 95% lower bootstrap limit is ${BMNL}_{10} = \log (10) / b_{1 B}$ . Similarly, from ${\hat{BMN}}_{25} = \log (25) / {\hat{β}}_{1}$ the 95% lower bootstrap limit is ${BMNL}_{25} = \log (25) / b_{1 B}$ . Specific values for our data are presented in Table 2.

Table 2.

Estimated Benchmark Number ${\hat{BMN}}_{BMR}$ and pointwise 95% confidence limits BMNL $_{BMR}$ at BMR=10, 25, along with safety predictivity rate $ς_{p}$ at each pth percentile, for each neighborhood structure matrix $A$ with the 3108-counties data, using the same conditions as in Table 1.

$A$	${\hat{β}}_{1}$	$b_{1 B}$	${\hat{BMN}}_{10}$	${BMNL}_{10}$	${\hat{BMN}}_{25}$	${BMNL}_{25}$	$ς_{p}$
Naïve adjacency	0.0106	0.0123	218.2177	187.5223	305.0553	262.1449	n/a
0.25th percentile	0.0062	0.0074	374.0632	309.7607	522.9180	433.0269	0.977190
0.5th percentile	0.0059	0.0074	392.6402	310.9472	548.8875	434.6856	0.977198
1st percentile	0.0060	0.0076	383.8631	303.2917	536.6176	423.9836	0.977778
2nd percentile	0.0057	0.0075	403.7716	305.8832	564.4485	427.6064	0.977469
3rd percentile	0.0053	0.0072	431.1988	320.1287	602.7900	447.5207	0.976905
4th percentile	0.0048	0.0069	477.7865	328.7617	667.9168	459.5892	0.976300
5th percentile	0.0044	0.0069	522.5073	334.7674	730.4338	467.9847	0.976339
6th percentile	0.0042	0.0073	547.7995	317.0368	765.7908	443.1985	0.977213
7th percentile	0.0040	0.0071	572.7168	323.7617	800.6237	452.5994	0.976913
8th percentile	0.0040	0.0073	575.6973	315.0101	804.7903	440.3652	0.977213
9th percentile	0.0041	0.0078	565.2493	296.4169	790.1846	414.3730	0.977733
10th percentile	0.0041	0.0081	563.3387	283.2238	787.5137	395.9299	0.977607

Note: n/a, not applicable.

Comparing the range between ${BMNL}_{BMR}$ and ${\hat{BMN}}_{BMR}$ in Table 2, values generally become larger as the percentile increases, although the quantities taper a bit for higher percentiles. This is not altogether surprising: more counties are considered to be neighbors to each other when a larger percentile is used. As a result, more variation is introduced to the model fit and this appears to increase both the point estimates and the uncertainty associated with the estimation process.

We found that when the 0.25th percentile is used as a breakpoint to define the adjacency matrix $A$ , 0.25% of the $\binom{3108}{2}$ pairs of counties were defined as neighbors to each other. Coincidentally, this percentage is very close to the 0.2% of neighbors under naïve spatial adjacency; nonetheless, the gap between ${BMNL}_{BMR}$ and ${\hat{BMN}}_{BMR}$ almost doubles. Moving to higher percentiles increases the separation.

In order to identify an appropriate percentile from Table 1 for use in defining the final adjacency matrix $A$ , we appealed to methods from prediction analytics. For each percentile in Table 1 we found the corresponding ${BMNL}_{25}$ . (We operated with ${BMNL}_{25}$ instead of ${BMNL}_{10}$ due to its more stringent level of benchmark response.) Next, we determined how often a county would be predicted to remain ‘safe’ from a billion-dollar loss based on that ${BMNL}_{25}$ , i.e. when the ${BMNL}_{25}$ rose above the county's number-of-hazards x-predictor. We then determined the overall ‘safety predictivity’ at the given percentile, p, as $ς_{p}$ = (number of counties observed to avoid billion-dollar losses)/(number of counties predicted to avoid billion-dollar losses). Higher values of $ς_{p}$ indicate greater safety predictivity.
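The calculation of $ς_{p}$ can be sketched in a few lines of Python. We read the ratio above as the proportion of counties predicted to be safe (hazard count below ${BMNL}_{25}$) that were in fact observed to avoid a billion-dollar loss; this reading, the function name `safety_predictivity`, and the toy data are assumptions made for illustration and are not drawn from the 3108-counties database.

```python
def safety_predictivity(hazard_counts, had_loss, bmnl):
    """Proportion of counties predicted 'safe' that were observed to be safe.

    A county is predicted safe when its hazard count lies below the BMNL.
    """
    predicted_safe = [loss for x, loss in zip(hazard_counts, had_loss) if x < bmnl]
    if not predicted_safe:
        raise ValueError("no county falls below the benchmark")
    observed_safe = sum(1 for loss in predicted_safe if not loss)
    return observed_safe / len(predicted_safe)

# Toy data, invented for illustration only.
hazards = [120, 300, 450, 80, 410, 500]
losses = [False, False, True, False, True, False]
sigma_p = safety_predictivity(hazards, losses, bmnl=423.9836)
print(sigma_p)
```

Here four toy counties fall below the benchmark and three of them avoided a loss, so the sketch returns 0.75.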

We do acknowledge that other possible association metrics could be applied here. Examples include concordance, sensitivity, specificity, etc. We feel our predictivity measure targets the sort of indicator a risk manager would pursue in this setting but, of course, eventual choice among such metrics is up to the predilections of each individual analyst.

Table 2 reports the $ς_{p}$ values in its final column. As seen there, we find highest safety predictivity at $p = 1 %$ , although the values are all encouragingly large and rather close to each other. Thus in what follows we operate with the 1st percentile as the threshold to determine our BRIC-based neighborhood structure. Applied to the data, we find ${\hat{BMN}}_{10} = \log (10) / {\hat{β}}_{1} = 383.8631$ with 95% lower bootstrap limit ${BMNL}_{10} = \log (10) / b_{1 B} = 303.2917$ . Also, ${\hat{BMN}}_{25} = \log (25) / {\hat{β}}_{1} = 536.6176$ with 95% lower bootstrap limit ${BMNL}_{25} = \log (25) / b_{1 B} = 423.9836$ . Counties experiencing numbers of hazards larger than these values are viewed as exhibiting excess risk to cumulative billion-dollar losses, based on our benchmark analysis.

To illustrate the benchmark delineations geospatially, Figure 5 maps the 3108 counties, distinguishing those whose numbers of hazards exceed our two benchmarks above: counties whose number of hazards exceeds ${BMNL}_{25}$ are marked in darker shade, while those whose number of hazards exceeds only ${BMNL}_{10}$ are marked in lighter shade. (A colorized version is available in the online version.) Counties with numbers of hazards below both benchmarks are marked in an intermediate tone. For comparison, Figure 6 maps the same relationship under the simpler, naïve spatial adjacency neighborhood structure.

Figure 5 indicates that 93 counties exhibit high relative benchmark risk to cumulative billion-dollar losses; these are situated primarily in the southwest desert, Iowa, and along the eastern Great Lakes. The 136 counties at moderate benchmark risk are generally concentrated around those counties at high risk. No such high/moderate-risk counties appear in the northwest and the central US. In total, there are 2879 counties at lesser risk. The pattern has strong similarities to the geographic distribution of the hazards predictor seen in Figure 2, which is not surprising. Indeed, this helps corroborate the indication from the confidence intervals in Table 1 that $x_{i}$ is a significant positive predictor of $π_{i}$ .
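The three-way delineation behind the maps amounts to comparing each county's hazard count with the two lower limits. A hedged Python sketch follows; the function name `risk_tier` and the sample counts are hypothetical, while the thresholds are the 1st-percentile ${BMNL}_{10}$ and ${BMNL}_{25}$ values reported above.

```python
def risk_tier(hazard_count, bmnl10, bmnl25):
    """Assign a benchmark-risk tier from a county's number of hazards."""
    if hazard_count > bmnl25:
        return "high"        # exceeds both benchmarks
    if hazard_count > bmnl10:
        return "moderate"    # exceeds only the BMR = 10 benchmark
    return "lesser"          # below both benchmarks

BMNL10, BMNL25 = 303.2917, 423.9836   # 1st-percentile values from Table 2

# Hypothetical hazard counts, for illustration only.
tiers = [risk_tier(x, BMNL10, BMNL25) for x in (150, 350, 500)]
print(tiers)
```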

Many more counties are considered to be at risk higher than seen in Figure 5 when autocorrelation is defined via naïve adjacency in Figure 6. There, the 306 high-risk and 321 moderate-risk counties are generally concentrated along the eastern Great Lakes, Iowa, various waterways such as the Connecticut and St. Lawrence Rivers, the interior southwest desert, and scattered throughout the US southeast, often centered at or near large cities. A few central counties also exceed ${BMNL}_{10}$ , but not to the extent seen in the eastern US. There are also more counties in Iowa identified at high relative benchmark risk to cumulative billion-dollar losses than seen in Figure 5. In both Figures 5 and 6 (and Figure 2), however, the larger patterns are retained – in effect, only the numbers change to reflect the choice of BMR. From a methodology perspective, this provides some level of assurance that the statistical operations identify underlying features in a reliable pattern. Also, the BMN lower limits estimated under naïve spatial adjacency appear to be more conservative than those under the non-spatial BRIC-based definition of neighboring counties. In fact, this extends to all the percentile cut-offs that we report in Table 2. We feel that the consistent pattern in our non-spatial neighborhood definition perhaps cuts through the ‘noise’ in our large database and helps focus attention on the resilience-adjusted ‘signal.’

3.3. Coverage properties of the $\hat{BMN}$ and BMNL

To explore the operating characteristics of $\hat{BMN}$ based on (6) and the BMDL based on (8), we conducted a short Monte Carlo simulation study. Various numerical features were taken from our 3108-counties data analysis, above, to construct the simulation design. Specifically, we randomly sampled, without replacement, n counties from the actual 3108-counties data set, extracting each of their six-dimensional BRIC scores and their target benchmark predictor x = number of hazards. We selected two values for n: 900 and 3025. The neighborhood structure matrix $A$ for the n sampled counties was based on the non-spatial autocorrelation approach described in Section 1.2.2: from the six-dimensional BRIC Euclidean distances in (1) between each pair of counties we set $a_{i j} = 1$ if $0 \leq d_{i j} \leq D$ , and $a_{i j} = 0$ otherwise, where the breakpoint D was set to be the 1st percentile of the distances, based on our experience with it for the 3108-counties data.
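The construction of $A$ from a distance breakpoint can be sketched as follows. This is only an illustration under stated assumptions: the function name `neighborhood_matrix` is hypothetical, the percentile is taken by a simple nearest-rank rule (other percentile conventions would shift D slightly), the diagonal is set to zero, and the four toy score vectors are invented.

```python
import math
from itertools import combinations

def neighborhood_matrix(scores, percentile):
    """Build a 0/1 adjacency matrix from pairwise Euclidean distances.

    scores: list of equal-length numeric vectors, one per county.
    percentile: breakpoint in percent, e.g. 1 for the 1st percentile.
    """
    n = len(scores)
    dists = {(i, j): math.dist(scores[i], scores[j])
             for i, j in combinations(range(n), 2)}
    ordered = sorted(dists.values())
    # Nearest-rank percentile of the pairwise distances (an assumption).
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    threshold = ordered[rank - 1]
    A = [[0] * n for _ in range(n)]      # diagonal left at 0 (an assumption)
    for (i, j), d in dists.items():
        if d <= threshold:
            A[i][j] = A[j][i] = 1        # symmetric neighbor relation
    return A, threshold

# Four toy six-dimensional score vectors, invented for illustration.
toy = [(0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0),
       (5, 5, 5, 5, 5, 5), (5, 5, 5, 5, 5, 6)]
A, D = neighborhood_matrix(toy, percentile=30)
print(D, A)
```

With these toy vectors the breakpoint is D = 1, so only the first two and the last two vectors are declared neighbors.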

Given values for $x_{i}$ and $A_{i}, i = 1, \dots, n$ , binary responses $Y_{i}$ with response probabilities defined via the centered autologistic model in (3) were generated via the computer, using the perfect sampler given by Hughes [22]. For the three unknown parameters in $β$ , we chose two combinations: $β = [- 4, 0.005, 1]^{T}$ and $β = [- 6, 0.01, 1]^{T}$ , whose components are roughly the values of the MPLEs we found from fitting the centered autologistic model to the 3108-counties data under different definitions of neighborhood structure; see Sec. 3.1, above. In particular, the value of $β_{2} = 1$ was considered to cover positive autocorrelation. For completeness, we also included the independence case with $β_{2} = 0$ . (The case of negative autocorrelation was not considered, as no significant negative autocorrelations were identified in the 3108-counties analysis above.) Coupled with the two different sample sizes n = 900, 3025, this produced a total of eight different design/parameter configurations for study; see Table 3.
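To make the data-generating mechanism more concrete, the sketch below codes a conditional response probability in the usual centered autologistic form, $\text{logit}(π_{i}) = β_{0} + β_{1} x_{i} + β_{2} \sum_{j} a_{i j} (y_{j} - μ_{j})$ with $μ_{j}$ the independence-model probability. We assume this matches the form of (3). The single Gibbs sweep shown is a simpler stand-in and is not the perfect sampler of Hughes [22]; all function names and toy inputs are hypothetical.

```python
import math
import random

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def conditional_prob(i, y, x, A, beta):
    """P(Y_i = 1 | neighbors) under an assumed centered autologistic form."""
    b0, b1, b2 = beta
    # Neighbor responses are centered at their independence-model means.
    centered_sum = sum(A[i][j] * (y[j] - expit(b0 + b1 * x[j]))
                       for j in range(len(y)) if j != i)
    return expit(b0 + b1 * x[i] + b2 * centered_sum)

def gibbs_sweep(y, x, A, beta, rng):
    """One in-place Gibbs update of every response (not a perfect sampler)."""
    for i in range(len(y)):
        y[i] = 1 if rng.random() < conditional_prob(i, y, x, A, beta) else 0
    return y

# Toy inputs, invented for illustration.
x = [100, 200, 300]
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
rng = random.Random(2021)
y = gibbs_sweep([0, 0, 0], x, A, (-4.0, 0.005, 1.0), rng)
print(y)
```

Setting $β_{2} = 0$ removes the neighbor term, so the conditional probability reduces to the ordinary logistic probability, which corresponds to the independence case in the simulation design.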

Table 3.

Empirical coverage rates for centered autologistic benchmark number lower confidence limit (BMNL) at any $BMR > 1$ (see text) based on 2000 simulated data sets, along with empirical rates of convergence failure for the MPL algorithm, each across $10^{7}$ fitting attempts (5000 bootstrap resamples from 2000 simulated data sets), stratified by true autologistic regression parameter configuration (left column), and sample size n.

Coefficients:	Coverage rate		Failure rate
$[β_{0}, β_{1}, β_{2}]$	n = 900	n = 3025	n = 900	n = 3025
$[- 4, 0.005, 0]$	0.956	0.954	–	–
$[- 4, 0.005, 1]$	0.963	0.946	–	–
$[- 6, 0.01, 0]$	0.968	0.964	$9.20 \times 10^{- 6}$	–
$[- 6, 0.01, 1]$	0.968	0.959	–	–
Average	0.964	0.956

Notes: Nominal coverage level is set to 95%. Dashes indicate no convergence failures.

For each of the eight simulation configurations, 2000 simulated data sets were generated. In each simulated data set, the consequent MPLEs were calculated and from these, the corresponding $\hat{BMN}$ and bootstrap-based BMNL were determined using the approach described in Section 2.4 via the ngspatial R package [22]. Then, the number of times these BMNLs correctly covered – i.e. remained below – the true value of the BMN for that parameter configuration was recorded. Dividing this by 2000 gave an empirical Monte Carlo estimate of the method's actual confidence level. Nominal confidence was set to $1 - α = 0.95$ ; therefore with 2000 simulations per configuration, the approximate standard error of our empirical coverage rates at the nominal 95% level is $\sqrt{(0.05) (0.95) / 2000} = 0.005$ and it never exceeds $\sqrt{(0.5) (0.5) / 2000} = 0.011$ .
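The two standard errors quoted above follow from the binomial formula $\sqrt{p (1 - p) / N}$ and can be checked with a short Python sketch. The function name `coverage_se` is hypothetical; the final lines illustrate how an observed rate from Table 3 (here 0.946) could be compared informally with the nominal level.

```python
import math

def coverage_se(p, n_sims):
    """Binomial standard error of an empirical coverage rate."""
    return math.sqrt(p * (1 - p) / n_sims)

se_nominal = coverage_se(0.95, 2000)   # at the nominal 95% level
se_worst = coverage_se(0.50, 2000)     # largest possible value, at p = 0.5

# Informal z-score for one observed coverage rate from Table 3.
z = (0.946 - 0.95) / se_nominal

print(round(se_nominal, 3), round(se_worst, 3), round(z, 2))
```

The rounded standard errors are 0.005 and 0.011, and the observed rate of 0.946 lies less than one standard error below the nominal 0.95.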

Notice that under the construction based on Equation (7), the BMNL will correctly cover the true BMN from below if and only if the corresponding upper bootstrap limit $b_{1 U}$ correctly covers $β_{1}$ from above. As a result, the operation is independent of the BMR, so these coverage assessments hold for any choice of $BMR > 1$ .
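The equivalence can be written out in one line. Assuming $β_{1} > 0$ and $b_{1 U} > 0$, and using the forms ${BMN}_{BMR} = \log (BMR) / β_{1}$ and ${BMNL}_{BMR} = \log (BMR) / b_{1 U}$ from Section 3.2, the factor $\log (BMR)$ is positive whenever $BMR > 1$ and cancels from both sides:

```latex
\mathrm{BMNL}_{\mathrm{BMR}} \le \mathrm{BMN}_{\mathrm{BMR}}
\iff \frac{\log(\mathrm{BMR})}{b_{1U}} \le \frac{\log(\mathrm{BMR})}{\beta_{1}}
\iff \frac{1}{b_{1U}} \le \frac{1}{\beta_{1}}
\iff b_{1U} \ge \beta_{1}.
```

The final condition does not involve the BMR, which is why a single coverage rate per configuration suffices.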

The simulation results appear in Table 3. These represent empirical coverage rates of our bootstrap-based lower 95% confidence limit on the true BMN under the centered autologistic model. (Indeed, they also represent empirical coverage rates of the bootstrap-based 95% upper confidence limit on $β_{1}$ .) As can be seen, all values rest above the nominal 95% confidence coefficient except in the case of $β = [- 4, 0.005, 1]^{T}$ and n = 3025, and this value does not differ significantly from the nominal level. As sample size increases, the rates drop towards the nominal level, on average. They also show rough agreement at these sample sizes with a larger simulation study we conducted in our previous investigation of geospatial risk benchmarking [27]. Taken together, we find that these bootstrap-based confidence limits appear to operate in a reasonable, if slightly conservative manner.

We did discover one minor instability with the MPL fitting algorithm. Apparently, cases can occur where the algorithm fails to converge in some bootstrap resamples, so that ngspatial reports only ‘NA’ for ${\hat{β}}_{1}$ . The resulting bootstrap distribution therefore contains fewer than the desired B = 5000 resampled ${\hat{β}}_{1}$ values within the simulation run. Table 3 also displays rates of how often this occurred in our simulations; each value is the number of reported NAs out of $10^{7}$ fitting attempts (i.e. 5000 bootstrap samples $\times 2000$ simulated data sets) at each simulation configuration. As can be seen, failures only occur for the case of $β = [- 6, 0.01, 0]^{T}$ at n = 900, and the rate is quite low. In fact, this phenomenon was also observed in our previous study [27]: failure rates there were comparable to the larger sample sizes we consider herein. As in that study, we view this as a slight inconvenience and a tolerable consequence of employing such a complex model/fitting procedure.
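One simple way to handle such non-converged resamples is to drop the missing values before taking the bootstrap percentile, and to record how many were dropped. The Python sketch below illustrates the idea. It assumes a plain percentile-type upper limit with a nearest-rank rule, which may differ from the construction in Section 2.4; the function name and toy values are hypothetical.

```python
import math

def upper_bootstrap_limit(estimates, level=0.95):
    """Upper percentile limit from bootstrap estimates, skipping failed fits.

    Failed fits are represented by None or NaN. Returns (limit, failure_rate).
    """
    usable = sorted(v for v in estimates
                    if v is not None and not math.isnan(v))
    if not usable:
        raise ValueError("every bootstrap fit failed")
    # Nearest-rank percentile of the usable estimates (an assumption).
    rank = max(1, math.ceil(level * len(usable)))
    failure_rate = 1 - len(usable) / len(estimates)
    return usable[rank - 1], failure_rate

# Toy bootstrap output with two failed fits, invented for illustration.
boot = [0.0050, 0.0062, float("nan"), 0.0071, 0.0058,
        None, 0.0066, 0.0049, 0.0074, 0.0060]
limit, fail = upper_bootstrap_limit(boot)
print(limit, round(fail, 2))
```

In this toy run, eight of the ten resamples are usable, so the limit is taken from a slightly smaller bootstrap distribution, mirroring the situation described above.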

4. Discussion

In our previous work [27], we showed how to incorporate spatial autocorrelation for environmental risk assessment with autocorrelated geospatial data via a centered autologistic model. In fact, those methods are sufficiently extensible to apply in a variety of data scenarios where spatial autocorrelation may challenge the more-simplistic models in common use. This paper separates from naïve spatial adjacency and further extends the autologistic benchmark model to novel, non-spatial, autocorrelated settings. We corroborate the flexibility of the centered autologistic framework established in our previous article; however, we also find that for a binary outcome and a single ‘dose’ predictor, one can employ a variety of different definitions for neighborhood structure to quantify autocorrelation. We are led to encourage data scientists to explore non-spatial relationships between locations – using metrics such as the BRIC scores – to provide informative and comparable measures of non-spatial autocorrelation with these sorts of environmental and geographic hazard data. Indeed, the use of BRIC scores to represent the broad features of resilience allows for a richer definition of counties without propinquity – i.e. counties that share resilience characteristics but that are not spatially adjacent [15,38].

Of course, some caveats and qualifications are in order. Our approach has focused on a logistic model for fitting and predicting hazardous risks from the 3108-counties data. Obviously, however, many different forms for modeling $π_{i}$ in Equation (3) could be applied. Doing so would affect a number of features in our construction, including the specific form of the risk function $R (x)$ and potentially the consequent form of the BMD in (5) and all quantities developed from it. Indeed, when considering these many models for $π_{i}$ , or for that matter when selecting the percentile to define $A$ , some advanced form of model averaging [33] might be applied to fine-tune the risk analysis. Clearly, extensions to other model formulations under our spatially-adjusted paradigm are an area of open, future research.

Further, our choice of a simple Euclidean metric for the distances $d_{i j}$ in (1) was adopted as much for convenience and familiarity as for any other reason. Other distance metrics may be equally or more propitious if appropriate motivation for their use were available. For example, the well-known Manhattan distance (also called ‘taxicab’ or ‘city-block’ distance)

$d_{i j} = \sum_{t = 1}^{6} | b_{i t} - b_{j t} |$    (9)

could be applied in place of the Euclidean distance from (1) to define the proximity measure between two six-dimensional BRIC vectors, $b_{i}$ and $b_{j}$ . One would then mimic the approach in Section 1.2.2 and set some threshold D below which the counties are viewed as ‘neighbors’. The corresponding neighborhood structure can again be quantified via an adjacency matrix, $A$ , as in Section 1.2.2.
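For concreteness, the two proximity measures can be compared on a pair of six-dimensional vectors. The Python sketch below is illustrative only: the function names and the two toy score vectors are invented and do not correspond to any county in the database.

```python
import math

def manhattan(u, v):
    """Sum of absolute coordinate differences, as in (9)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Straight-line distance, as in (1)."""
    return math.dist(u, v)

# Two toy six-dimensional score vectors, invented for illustration.
b_i = (0.5, 0.4, 0.6, 0.3, 0.7, 0.5)
b_j = (0.4, 0.4, 0.5, 0.5, 0.6, 0.5)

d_man = manhattan(b_i, b_j)
d_euc = euclidean(b_i, b_j)
print(round(d_man, 4), round(d_euc, 4))
```

The Manhattan distance is never smaller than the Euclidean distance for the same pair, so the two metrics live on different scales; a percentile-based breakpoint D is computed within each metric and is therefore unaffected by that difference in scale.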

To explore this, we applied the Manhattan distance (9) in place of Euclidean distance to the 3108-counties data from Section 3, otherwise using the same settings and software as in that section. The MPL point estimates and various confidence limits appear in Table 4, while the consequent $\hat{BMN}$ s and BMNLs appear in Table 5. Comparing the results to those in Table 1 and Table 2, respectively, we see that, on balance, the magnitudes of the various point estimates and confidence limits are of roughly similar value. There is a hint of slightly higher MPLEs and correspondingly lower benchmark points with the Euclidean metric, but with no strongly consistent pattern.

Table 4.

The MPLEs with pointwise 95% bootstrap confidence intervals for $β_{1}$ and $β_{2}$ for the 3108-counties database, given different BRIC-distance percentile neighborhood matrices $A$ .

$A$	${\hat{β}}_{0}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$	$b_{1 B}$	$β_{1}$ lower	$β_{1}$ upper	$β_{2}$ lower	$β_{2}$ upper
0.25th percentile	−4.3473	0.0058	1.1043	0.0070	0.0044	0.0073	1.0254	1.2025
0.5th percentile	−4.2174	0.0057	0.9905	0.0073	0.0041	0.0075	0.9160	1.0843
1st percentile	−4.1542	0.0062	0.6453	0.0078	0.0044	0.0082	0.5942	0.7099
2nd percentile	−4.1240	0.0058	0.4293	0.0076	0.0039	0.0080	0.3934	0.4768
3rd percentile	−4.1909	0.0056	0.3747	0.0075	0.0035	0.0078	0.3410	0.4212
4th percentile	−4.3138	0.0050	0.4128	0.0073	0.0028	0.0077	0.3707	0.4754
5th percentile	−4.3733	0.0045	0.4206	0.0071	0.0021	0.0076	0.3735	0.4857
6th percentile	−4.4033	0.0044	0.3928	0.0072	0.0018	0.0077	0.3473	0.4609
7th percentile	−4.4328	0.0041	0.3975	0.0074	0.0011	0.0081	0.3487	0.4730
8th percentile	−4.4552	0.0041	0.3579	0.0075	0.0008	0.0083	0.3139	0.4304
9th percentile	−4.4559	0.0039	0.3480	0.0080	0.0002	0.0088	0.3027	0.4230
10th percentile	−4.4741	0.0039	0.3315	0.0082	0.0003	0.0091	0.2880	0.4100

Notes: $b_{1 B}$ represents an upper 95% bootstrap confidence limit for $β_{1}$ based on 5000 bootstrap resamples. Manhattan distance is employed to define the adjacency metric.

Table 5.

${\hat{BMN}}_{BMR}$ and pointwise 95% BMNL $_{BMR}$ at BMR=10, 25 for each neighborhood structure matrix $A$ with the 3108-counties data, using the same conditions as in Table 4.

$A$	${\hat{β}}_{1}$	$b_{1 B}$	${\hat{BMN}}_{10}$	${BMNL}_{10}$	${\hat{BMN}}_{25}$	${BMNL}_{25}$
0.25th percentile	0.0058	0.0070	395.9773	327.4111	553.5524	457.7011
0.5th percentile	0.0057	0.0073	401.8472	316.7504	561.7582	442.7980
1st percentile	0.0062	0.0078	373.5223	293.8861	522.1617	410.8352
2nd percentile	0.0059	0.0076	395.1405	301.9651	552.3828	422.1291
3rd percentile	0.0056	0.0075	412.6542	308.6793	576.8658	431.5151
4th percentile	0.0050	0.0073	457.5444	315.6176	639.6196	441.2145
5th percentile	0.0045	0.0072	507.7418	323.5023	709.7926	452.2368
6th percentile	0.0044	0.0072	527.7336	321.3651	737.7400	449.2491
7th percentile	0.0041	0.0074	564.0084	310.5300	788.4499	434.1023
8th percentile	0.0041	0.0075	560.7612	307.4593	783.9104	429.8096
9th percentile	0.0039	0.0080	587.9782	288.9053	821.9582	403.8723
10th percentile	0.0039	0.0082	592.5658	282.4419	828.3714	394.8369

Of course, one need not classify counties as strict ‘neighbors’ (or not) to their adjacent – in a non-spatial sense – counties. If it were known, e.g. that differential, a priori weights exist quantifying how much the neighborhood status depends on the distances, those weights could be incorporated into Equation (3). Or, for that matter, our MPL fit for the centered autologistic model could be replaced by some form of Bayesian fit [19,39] if sufficient hierarchical prior information were available to apply the Bayesian paradigm. Indeed, the concept of adjusting for autocorrelation in a logistic regression is obviously not new: besides Besag's original paper [2] and the centered extension in [7], applications include use of auxiliary variables to account for severity of adverse events [4], variational methods to capture spatial dependence via Gaussian processes [16], and extensions to (sparse) generalized linear mixed models [22], among many others. In all these cases, further development for implementation in our risk-analytic context, and how to account for the necessary benchmark components, would be required.

Acknowledgments

Thanks are due to Dr Stephan R. Sain for his seminal suggestions on developing non-spatial measures of autocorrelation, to Dr John Hughes for discussions on the centered autologistic model, and to an anonymous referee for quite helpful suggestions on how to improve the manuscript. This material represents a portion of the first author's PhD dissertation from the University of Arizona Graduate Interdisciplinary Program in Statistics.

Funding Statement

The research was supported in part by #ES027394 from the U.S. National Institutes of Health.

Data availability

The full 3108-counties database was generated from the SHELDUS^TM knowledgebase (http://www.sheldus.org). Derived data employed in the calculations herein are available from the corresponding author [WWP] on request.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Arnold B.C. and Strauss D.J., Pseudolikelihood estimation: Some examples, Sank. Ser. B 53 (1991), pp. 233–243.
2. Besag J.E., Nearest-neighbour systems and the auto-logistic model for binary data, J. R. Stat. Soc. Ser. B 34 (1972), pp. 75–83.
3. Besag J.E., Statistical analysis of non-lattice data, J. R. Stat. Soc. Ser. D 24 (1975), pp. 179–195.
4. Bee M., Benedetti R., and Espa G., Spatial models for flood risk assessment, Environmetrics 19 (2008), pp. 725–741.
5. Berke P.R. and Campanella T.J., Planning for postdisaster resiliency, Ann. Am. Acad. Pol. Soc. Sci. 604 (2006), pp. 192–207.
6. Budtz-Jørgensen E., Keiding N., and Grandjean P., Benchmark dose calculation from epidemiological data, Biometrics 57 (2001), pp. 698–706.
7. Caragea P.C. and Kaiser M.S., Autologistic models with interpretable parameters, J. Agric. Biol. Environ. Stat. 14 (2009), pp. 281–300.
8. Crump K.S., A new method for determining allowable daily intakes, Toxicol. Sci. 4 (1984), pp. 854–871.
9. Crump K.S., Calculation of benchmark doses from continuous data, Risk. Anal. 15 (1995), pp. 79–89.
10. Cressie N.A.C., Statistics for Spatial Data, John Wiley & Sons, New York, 1993.
11. Cutter S.L., Barnes L., Berry M., Burton C., Evans E., Tate E., and Webb J., A place-based model for understanding community resilience to natural disasters, Glob. Environ. Change. 18 (2008), pp. 598–606.
12. Cutter S.L., Ash K.D., and Emrich C.T., The geographies of community disaster resilience, Glob. Environ. Change. 29 (2014), pp. 65–77.
13. Dukes K.A., Cronbach's alpha, in Encyclopedia of Biostatistics 2, P. Armitage and T. Colton, eds., John Wiley & Sons, Chichester, 1998, pp. 1026–1028.
14. Folke C., Carpenter S., Elmqvist T., Gunderson L., Holling C.S., and Walker B., Resilience and sustainable development: Building adaptive capacity in a world of transformations, AMBIO: A J. Human Environ. 31 (2002), pp. 437–440.
15. Gurney G.G., Blythe J., Adams H., Adger W.N., Curnock M., Faulkner L., James T., and Marshall N.A., Redefining community based on place attachment in a connected world, Proc. Natl. Acad. Sci. USA 114 (2017), pp. 10077–10082.
16. Hardouin C., A variational method for parameter estimation in a logistic spatial regression, Spat. Stat. 31 (2019). Article No. 100365 (14 pp.).
17. Harvey C., Extreme weather events could worsen climate change, Scientific American E&E News (24 January 2019). Available at https://www.scientificamerican.com/article/extreme-weather-events-could-worsen-climate-change/.
18. Hand D.J., Blunt G., Kelly M.G., and Adams N.M., Data mining for fun and profit, Stat. Sci. 15 (2000), pp. 111–131.
19. Hoeting J.A., Leecaster M., and Bowden D., An improved model for spatially correlated binary responses, J. Agric. Biol. Environ. Stat. 5 (2000), pp. 102–114.
20. Holling C.S., Resilience and stability of ecological systems, Annu. Rev. Ecol. Syst. 4 (1973), pp. 1–23.
21. Hosmer D.W., Lemeshow S., and Sturdivant R.X., Applied Logistic Regression, 3rd ed., John Wiley & Sons, New York, 2013.
22. Hughes J., ngspatial: A package for fitting the centered autologistic and sparse spatial generalized linear mixed models for areal data, R. J. 6 (2014), pp. 81–95.
23. Hughes J., Haran M., and Caragea P.C., Autologistic models for binary data on a lattice, Environmetrics 22 (2011), pp. 857–871.
24. Klein R.J.T., Nicholls R.J., and Thomalla F., Resilience to natural hazards: How useful is this concept? Global Environ. Change Part B: Environ. Haz. 5 (2003), pp. 35–45.
25. Kolaczyk E.D. and Csárdi G., Statistical Analysis of Network Data with R, Springer, New York, 2014.
26. Liu J., Autologistic modeling in benchmark risk analysis, Ph.D. thesis, Interdisciplinary Program in Statistics, University of Arizona, Tucson, AZ, 2017.
27. Liu J., Piegorsch W.W., Schissler A.G., and Cutter S.L., Autologistic models for benchmark risk or vulnerability assessment of urban terrorism outcomes, J. R. Stat. Soc. Ser. A 181 (2018), pp. 803–823.
28. Manyena S.B., The concept of resilience revisited, Disasters 30 (2006), pp. 434–450.
29. Martin C.R. and Savage-McGlynn E., A ‘good practice’ guide for the reporting of design and analysis for psychometric evaluation, J. Reprod. Infant Psychol. 31 (2013), pp. 449–455.
30. Nardo M., Saisana M., Saltelli A., and Tarantola S., Handbook on Constructing Composite Indicators: Methodology and User Guide, Organisation For Economic Co-Operation and Development Publishing, Paris, 2008.
31. Nitcheva D.K., Piegorsch W.W., West R.W., and Kodell R.L., Multiplicity-adjusted inferences in risk assessment: Benchmark analysis with quantal response data, Biometrics 61 (2005), pp. 277–286.
32. Piegorsch W.W. and Bailer A.J., Analyzing Environmental Data, John Wiley & Sons, Chichester, 2005.
33. Piegorsch W.W., An L., Wickens A., West W., Peña E.A., and Wu W., Information-theoretic model-averaged benchmark dose analysis in environmental risk assessment, Environmetrics 24 (2013), pp. 143–157.
34. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2019. Available at http://www.R-project.org/.
35. Scott D.W., Multivariate Density Estimation. Theory, Practice, and Visualization, John Wiley & Sons, New York, 1992.
36. Tarabusi E.C. and Guarini G., An unbalance adjustment method for development indicators, Soc. Indic. Res. 112 (2013), pp. 19–45.
37. Varin C., Reid N., and Firth D., An overview of composite likelihood methods, Stat. Sin. 21 (2011), pp. 5–42.
38. Webber M.M., Order in diversity: Community without propinquity, in Cities and Space, L. Wirigo, ed., Johns Hopkins University Press, Baltimore, 1983, pp. 23–56.
39. Zheng Y. and Zhu J., Markov chain Monte Carlo for a spatial-temporal autologistic regression model, J. Comput. Graph. Stat. 17 (2008), pp. 123–137.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

[CIT0001] 1.Arnold B.C. and Strauss D.J., Pseudolikelihood estimation: Some examples, Sank. Ser. B 53 (1991), pp. 233–243. [Google Scholar]

[CIT0002] 2.Besag J.E., Nearest-neighbour systems and the auto-logistic model for binary data, J. R. Stat. Soc. Ser. B 34 (1972), pp. 75–83. [Google Scholar]

[CIT0003] 3.Besag J.E., Statistical analysis of non-lattice data, J. R. Stat. Soc. Ser. D 24 (1975), pp. 179–195. [Google Scholar]

[CIT0004] 4.Bee M., Benedetti R., and Espa G., Spatial models for flood risk assessment, Environmetrics 19 (2008), pp. 725–741. [Google Scholar]

[CIT0005] 5.Berke P.R. and Campanella T.J., Planning for postdisaster resiliency, Ann. Am. Acad. Pol. Soc. Sci. 604 (2006), pp. 192–207. [Google Scholar]

[CIT0006] 6.Budtz-Jørgensen E., Keiding N., and Grandjean P., Benchmark dose calculation from epidemiological data, Biometrics 57 (2001), pp. 698–706. [DOI] [PubMed] [Google Scholar]

[CIT0007] 7.Caragea P.C. and Kaiser M.S., Autologistic models with interpretable parameters, J. Agric. Biol. Environ. Stat. 14 (2009), pp. 281–300. [Google Scholar]

[CIT0008] 8.Crump K.S., A new method for determining allowable daily intakes, Toxicol. Sci. 4 (1984), pp. 854–871. [DOI] [PubMed] [Google Scholar]

[CIT0009] 9.Crump K.S., Calculation of benchmark doses from continuous data, Risk. Anal. 15 (1995), pp. 79–89. [Google Scholar]

[CIT0010] 10.Cressie N.A.C., Statistics for Spatial Data, John Wiley & Sons, New York, 1993. [Google Scholar]

[CIT0011] 11.Cutter S.L., Barnes L., Berry M., Burton C., Evans E., Tate E., and Webb J., A place-based model for understanding community resilience to natural disasters, Glob. Environ. Change. 18 (2008), pp. 598–606. [Google Scholar]

[CIT0012] 12.Cutter S.L., Ash K.D., and Emrich C.T., The geographies of community disaster resilience, Glob. Environ. Change. 29 (2014), pp. 65–77. [Google Scholar]

[CIT0013] 13.Dukes K.A., Cronbach's alpha, in Encyclopedia of Biostatistics 2, P. Armitage and T. Colton, eds., John Wiley & Sons, Chichester, 1998, pp. 1026–1028.

[CIT0014] 14.Folke C., Carpenter S., Elmqvist T., Gunderson L., Holling C.S., and Walker B., Resilience and sustainable development: Building adaptive capacity in a world of transformations, AMBIO: A J. Human Environ. 31 (2002), pp. 437–440. [DOI] [PubMed] [Google Scholar]

[CIT0015] 15.Gurney G.G., Blythe J., Adams H., Adger W.N., Curnock M., Faulkner L., James T., and Marshall N.A., Redefining community based on place attachment in a connected world, Proc. Natl. Acad. Sci. USA 114 (2017), pp. 10077–10082. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0016] 16.Hardouin C., A variational method for parameter estimation in a logistic spatial regression, Spat. Stat. 31 (2019). Article No. 100365 (14 pp.). [Google Scholar]

[CIT0017] 17.Harvey C., Extreme weather events could worsen climate change, Scientific American E&E News (24 January 2019). Available at https://www.scientificamerican.com/article/extreme-weather-events-could-worsen-climate-change/.

[CIT0018] 18.Hand D.J., Blunt G., Kelly M.G., and Adams N.M., Data mining for fun and profit, Stat. Sci. 15 (2000), pp. 111–131. [Google Scholar]

[CIT0019] 19.Hoeting J.A., Leecaster M., and Bowden D., An improved model for spatially correlated binary responses, J. Agric. Biol. Environ. Stat. 5 (2000), pp. 102–114. [Google Scholar]

[CIT0020] 20.Holling C.S., Resilience and stability of ecological systems, Annu. Rev. Ecol. Syst. 4 (1973), pp. 1–23. [Google Scholar]

[CIT0021] 21.Hosmer D.W., Lemeshow S., and Sturdivant R.X., Applied Logistic Regression, 3rd ed., John Wiley & Sons, New York, 2013. [Google Scholar]

[CIT0022] 22.Hughes J., ngspatial: A package for fitting the centered autologistic and sparse spatial generalized linear mixed models for areal data, R. J. 6 (2014), pp. 81–95. [Google Scholar]

[CIT0023] 23.Hughes J., Haran M., and Caragea P.C., Autologistic models for binary data on a lattice, Environmetrics 22 (2011), pp. 857–871. [Google Scholar]

[CIT0024] 24.Klein R.J.T., Nicholls R.J., and Thomalla F., Resilience to natural hazards: How useful is this concept? Global Environ. Change Part B: Environ. Haz. 5 (2003), pp. 35–45. [Google Scholar]

[CIT0025] 25.Kolaczyk E.D. and Csárdi G., Statistical Analysis of Network Data with R, Springer, New York, 2014. [Google Scholar]

26. Liu J., Autologistic modeling in benchmark risk analysis, Ph.D. thesis, Interdisciplinary Program in Statistics, University of Arizona, Tucson, AZ, 2017.

27. Liu J., Piegorsch W.W., Schissler A.G., and Cutter S.L., Autologistic models for benchmark risk or vulnerability assessment of urban terrorism outcomes, J. R. Stat. Soc. Ser. A 181 (2018), pp. 803–823.

28. Manyena S.B., The concept of resilience revisited, Disasters 30 (2006), pp. 434–450.

29. Martin C.R. and Savage-McGlynn E., A ‘good practice’ guide for the reporting of design and analysis for psychometric evaluation, J. Reprod. Infant Psychol. 31 (2013), pp. 449–455.

30. Nardo M., Saisana M., Saltelli A., and Tarantola S., Handbook on Constructing Composite Indicators: Methodology and User Guide, Organisation for Economic Co-operation and Development Publishing, Paris, 2008.

31. Nitcheva D.K., Piegorsch W.W., West R.W., and Kodell R.L., Multiplicity-adjusted inferences in risk assessment: Benchmark analysis with quantal response data, Biometrics 61 (2005), pp. 277–286.

32. Piegorsch W.W. and Bailer A.J., Analyzing Environmental Data, John Wiley & Sons, Chichester, 2005.

33. Piegorsch W.W., An L., Wickens A., West W., Peña E.A., and Wu W., Information-theoretic model-averaged benchmark dose analysis in environmental risk assessment, Environmetrics 24 (2013), pp. 143–157.

34. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2019. Available at http://www.R-project.org/.

35. Scott D.W., Multivariate Density Estimation: Theory, Practice, and Visualization, John Wiley & Sons, New York, 1992.

36. Tarabusi E.C. and Guarini G., An unbalance adjustment method for development indicators, Soc. Indic. Res. 112 (2013), pp. 19–45.

37. Varin C., Reid N., and Firth D., An overview of composite likelihood methods, Stat. Sin. 21 (2011), pp. 5–42.

38. Webber M.M., Order in diversity: Community without propinquity, in Cities and Space, L. Wingo, ed., Johns Hopkins University Press, Baltimore, 1983, pp. 23–56.

39. Zheng Y. and Zhu J., Markov chain Monte Carlo for a spatial-temporal autologistic regression model, J. Comput. Graph. Stat. 17 (2008), pp. 123–137.
