When we hear the word “disaster”, we often think of events like hurricanes, heatwaves, pandemics, or terrorist attacks. Rarely do we stop to ask, how vulnerable is my location to such hazards? Some of us might ponder the question, but chances are good that any answer we come up with will be somewhat limited in usefulness. Humankind is notoriously poor at judging low-probability/high-consequence events such as pandemics and terrorist attacks, especially when they pertain to adverse, detrimental, “risky” outcomes. Collectively, we are even less aware of any pre-existing vulnerabilities in the places in which we live and work that can either amplify or attenuate the risk of “disaster”.
This leads us to a different question: can statistical quantification of the risks and vulnerabilities we face every day become more useful to, and useable by, local residents and decision makers to understand the dangers and the range of responses communities can muster to address them?
In the absence of quantifiable criteria for assessing a locality’s vulnerability to hazardous impacts, risk managers possess only public reactions to subjective, often overly-hyped inputs regarding the dangers of potential hazards. Data science can, however, explicitly quantify place-based risk and vulnerability to hazardous impacts.
Quantifying place-based urban vulnerability
In general terms, the vulnerability of places is a function of the social characteristics of the people who live there (social vulnerability) and their susceptibility to harm. Place-based vulnerability is also a function of a community’s exposure to damage and loss of function (which we call built-environment vulnerability). Place vulnerability also includes exposure related to physical processes that produce hazardous events such as flooding, hurricanes, or earthquakes, along with their frequency and impact (physical-hazard vulnerability).1
For example, markers of social vulnerability might include a locality’s per capita income and its percentage of population below the poverty level. Higher values of the former and lower values of the latter afford each household greater potential to prepare for and cope with hazardous events that require increases in household spending; these affect building/structure protection and repair, emergency medical costs, and so forth. Another example is a local government’s debt-to-revenue ratio. Higher ratios hinder a government’s ability to respond to unexpected or sudden hazards: increased debt servicing requirements lower the resources potentially available for response(s) to negative consequences of hazardous events.
For built-environment vulnerability, markers might include a locality’s median age of housing units, and its number of mobile homes. Older buildings, if not maintained, suffer greater damage during hazardous events, while large numbers of mobile homes are susceptible to damage in high-wind events such as tornados and hurricanes. The number of hospital beds and modernity of the medical infrastructure is another marker of vulnerability. Smaller and/or older medical institutions cannot respond to or cope with rapid, numerous casualties during a hazardous event. This increases community vulnerability, a relevant issue for contemporary biomedical hazards such as pandemics.
Finally, physical-hazard vulnerability is affected by past hazardous events. A locality’s experiences with many past events, especially if they were of diverse forms, indicates the need for more complex protection and mitigation systems. Cumulative experience with hazards – such as frequent flooding – generally drains a community’s resources and lowers its resilience to future events, increasing its vulnerability status. The locality’s geophysical features – such as peninsulas and islands, extent of shoreline(s), weather/wind patterns, etc. – can hinder or prevent rapid evacuation in the period leading up to or immediately following a hazardous event. Also, locations with greater risks of weather-related hazards, such as hurricanes or tornados, carry obvious, increased susceptibility to adverse impacts.
By carefully curating and analyzing the information these various markers provide, a single new metric can quantify the three broad aspects of place-based vulnerability, focusing on specific, undesirable outcomes or fundamental vulnerabilities with which each locality must contend. This single metric combines three different indices:
The Social Vulnerability Index2 (SoVI), first developed in the early 2000s, is designed to summarize socioeconomic and demographic characteristics that interact and influence a community’s differential susceptibility to hazardous impacts, along with its overall capacity to prepare for, respond to, and ultimately recover from the event. It is a statistically derived, unit-less measure that provides quantitative, comparative value across geographic locations. Larger SoVI scores indicate greater social vulnerability, but these scores have no inherent meaning unless compared to other places – generally depicted as a map to visually highlight the comparisons (see https://sovius.org).
In contrast to SoVI’s socioeconomic focus, the Hazard Vulnerability Index1 (HazVI) focuses on geophysical structures that underlie a locality’s vulnerability and past hazard experiences; it acts as a surrogate for exposures to and locality-specific involvement with natural events that result in losses within the community. For example, localities in the US state of Nebraska have far lower earthquake risk than those in California, but Nebraska’s localities are more prone to tornadoes than California’s. HazVI quantifies such geophysical features. It also provides a proxy for potential risk from natural hazards based on the frequency of previous events and the diversity of event types. The latter is important for planning and preparedness purposes: it is much easier to plan for fewer event types and infrequent hazards in both preparedness and response than the reverse.
Expanding on the HazVI metric, the Built-Environment Vulnerability Index (BEVI)1 captures localized vulnerability due to the diversity and type of built-environment infrastructure, such as water and transportation, property values, age of housing, power grid distributions, and support services such as hospitals and fire stations. For instance, large numbers of vulnerable features such as oil and gas lines at risk of leakage or spillage, or bridges vulnerable during earthquakes and flooding are factors that increase BEVI.
To summarize a locality’s overall vulnerability burden, we combine the SoVI, HazVI, and BEVI indices into a single, place-based vulnerability index, called PVI. Because the indices exhibit different patterns of geographic variability, we construct PVI as a weighted average based on the observed variance of each of the three components: lower weight is given to an index if its variance is large.3 Higher variability corresponds to lower precision, so this weighted PVI decreases the contribution of a high-variability/low-precision index. We have found the weighted PVI metric to be more effective than a simple, unweighted average for summarizing place-based vulnerability for large urban centers.3,4 The following examples illustrate its application with two different types of hazard events: urban terrorism and flood damage.
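The variance-based weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each index gets a weight proportional to the reciprocal of its sample variance across cities, with the weights normalized to sum to one. The function name `inverse_variance_pvi` and the four-city scores are hypothetical, and any rescaling of the indices before combination is not shown because the text does not describe it.

```python
import numpy as np

def inverse_variance_pvi(sovi, hazvi, bevi):
    """Combine three index vectors into one weighted score per city.

    Assumption: weight_k is proportional to 1 / var_k, where var_k is the
    sample variance of index k across cities, and the weights sum to 1.
    """
    indices = np.column_stack([sovi, hazvi, bevi]).astype(float)
    variances = indices.var(axis=0, ddof=1)  # one variance per index
    weights = 1.0 / variances                # high variance -> low weight
    weights /= weights.sum()                 # normalize to sum to 1
    return indices @ weights, weights

# Hypothetical scores for four cities (illustrative only, not study data).
sovi = [1.0, 2.0, 3.0, 4.0]
hazvi = [2.0, 2.5, 3.0, 3.5]
bevi = [0.0, 3.0, 6.0, 9.0]

pvi, weights = inverse_variance_pvi(sovi, hazvi, bevi)
print("weights (SoVI, HazVI, BEVI):", np.round(weights, 3))
print("PVI per city:", np.round(pvi, 3))
```

In this toy input the third index varies most across cities, so it contributes least to the combined score.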
Urban terrorism vulnerability
In a 2007 paper, we focused on whether or not an urban center experienced any human casualties (injuries or deaths) from terrorist-related events during a 35-year study period, 1970–2004, for 132 cities in the 50 US states and the District of Columbia.3 We connected the PVI with data from the Global Terrorism Database (GTD; https://start.umd.edu/gtd), where a terrorist “event” was quantified via a binary indicator of whether or not any human casualties were recorded in any terrorism episode during the study period. We did not differentiate the nature, motive, or severity of the terrorist event, because to do so would reduce the number of places with no events. We found that 36 of the 132 cities, or 27.3%, reported such terrorist-related casualty events during that 35-year time frame.
Figure 1 maps the full collection of 132 metropolitan areas, color-coding each locality according to its predicted probability of a terrorist casualty based on its underlying vulnerability. A city manager studying the map could thus read off their city’s predicted probability of a future terrorist attack leading to human casualties, provided the city continues to record the conditions that produce its input PVI. We viewed a predicted probability above 50% as indicative of extreme urban vulnerability (red shading in the figure). Cities with probabilities less than 25% are shaded blue, while those between 25% and 50% are shaded orange. Intriguingly, all cities that exhibit extreme-probability vulnerability are located on or east of the Mississippi River.
Figure 1.
Map of 132 large US metropolitan areas (cities),3 coded by their predicted probability of terrorism-related casualties.4 Blue cities: < 0.25; orange cities: 0.25 to 0.50; red cities: > 0.50.
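The shading rule in the Figure 1 caption can be written as a small lookup. This is only a sketch: the function name `shade_for_probability` is hypothetical, and the boundary values 0.25 and 0.50 are assigned to the orange band, following the caption's "0.25 to 0.50" wording.

```python
def shade_for_probability(p):
    """Map a predicted probability to the map shading used in the caption.

    Blue: p < 0.25; orange: 0.25 <= p <= 0.50; red: p > 0.50.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    if p < 0.25:
        return "blue"
    if p <= 0.50:
        return "orange"
    return "red"

# Two predicted probabilities taken from Table 1.
for city, prob in [("Washington, DC", 0.766), ("Boston, MA", 0.514)]:
    print(city, shade_for_probability(prob))
```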
In a subsequent article in 2018,4 we focused on how the PVI predictor was affected by spatial proximity to other cities and found a negative spatial correlation: when a city experiences a terrorist casualty event, an adjacent city would expect not to encounter such an event, and vice versa. The vulnerability hazardscape depicted in Figure 1 is derived from our 2018 analysis (see Modeling spatial autocorrelation).
Modeling spatial autocorrelation.
To incorporate spatial autocorrelation into an analysis of the binary outcome data in our 132-cities database, we chose a construct based on the logistic regression model, called a centered autologistic model.8 Pertinent to our application, spatial autocorrelation was expressly included as a quantitative predictor in the model’s construction.4 For both the urban terrorism vulnerability data and the flood damage data, we calculated the inverse-variance-weighted, place-based vulnerability index, PVI, for each of the 132 US urban centers and employed it as a single predictor variable, x, in the model. From the consequent model fit, we estimated the autologistic probabilities, π(x), for each city, in effect ranking them according to their predicted probability of terrorism-related casualties. Table 1 lists the ordered, top 10 cities for the terrorism data according to this arrangement; coincidentally, these also correspond to all those instances where π(x) > 0.50 for this outcome.
We similarly applied the centered autologistic model to our flood vulnerability data. We calculated the model’s predicted probabilities of above-median flood-damage claims, π(x), as a function of x = PVI for each of the 132 cities. Table 2 lists the top 10 cities according to this new arrangement. In comparing Tables 1 and 2 we see that five urban centers – Washington, DC, New Orleans, Philadelphia, Norfolk, and Charleston – reside on both lists, exhibiting the greatest probability of adverse outcomes based on both terrorism casualties and flood damage.
Notice also in the figure that a number of urban areas outside the highly populous northeast quadrant appear somewhat isolated: especially in the less-populated central and western states, large urban centers are not always adjacent to each other. This is simply a function of our study’s focus on only 132 of the largest, most vulnerable, urban centers in the US and does not hinder the inferences available from the data. (A complete description of the adjacency patterns among these 132 cities appears as supplemental material to our 2018 paper.4)
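The general form of a centered autologistic model can be sketched as follows. This is an illustration of the model class described by Caragea and Kaiser,8 not the fitted model from our analyses: the parameter values (`beta0`, `beta1`, `eta`), the three-city adjacency matrix, and the PVI values are all hypothetical. Each city's log-odds combine a regression term in x = PVI with a spatial term that sums, over adjacent cities, the difference between the neighbor's observed outcome and its expected value under the regression term alone.

```python
import numpy as np

def expit(z):
    """Inverse logit: map log-odds to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def centered_autologistic_prob(x, y, adjacency, beta0, beta1, eta):
    """Conditional probability of an adverse outcome for each city.

    logit(p_i) = beta0 + beta1 * x_i + eta * sum_{j adjacent to i} (y_j - kappa_j),
    where kappa_j = expit(beta0 + beta1 * x_j) is the centering term.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)
    linear = beta0 + beta1 * x
    kappa = expit(linear)                  # probability ignoring neighbors
    neighbor_term = adjacency @ (y - kappa)  # centered neighbor outcomes
    return expit(linear + eta * neighbor_term)

# Hypothetical example: three cities in a row, the middle one adjacent to both.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
x = [4.0, 5.0, 6.0]   # illustrative PVI values
y = [0, 1, 0]         # illustrative observed outcomes
probs = centered_autologistic_prob(x, y, adjacency,
                                   beta0=-5.0, beta1=1.0, eta=-0.5)
print(np.round(probs, 3))
```

With a negative `eta`, as in this toy setting, an adverse event in a neighboring city pulls a city's own probability below what its PVI alone would give, which mirrors the negative spatial correlation reported for the terrorism data.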
Flood damage vulnerability
Flooding is one of the most common and most damaging forms of natural hazard. It causes obvious damage to goods and property, but its toll in death and destruction is less immediately visible than that of headline-grabbing disasters such as earthquakes and tornados. Yet floods often follow on from hurricanes and severe storms, and they can lead to considerable adverse consequences: damage from these three events accounts for as much as 75% of all US hazard losses in the 50-year period from 1960–2009.5
An effective data-analytic strategy to quantify flood damage identifies how often insurance claims are submitted by homeowners and businesses affected by severe flood events. The US National Flood Insurance Program (NFIP), established by the US Congress in 1968 in response to devastating flood losses from Hurricane Betsy in 1965, provides flood insurance coverage for homeowners and businesses. Since the NFIP’s inception, more than 2 million flood insurance claims have been recorded, producing a substantial source of data on flood events. In 2019, the NFIP released a comprehensive claims data set (https://www.fema.gov/openfema-data-page/fima-nfip-redacted-claims). To study how urban vulnerability describes and predicts flood losses, we connected claims information spanning the years 1977–2019 in this NFIP data set with our composite place-based PVI vulnerability index. We employed a binary outcome variable indicating whether a city’s number of flood insurance claims was at or above the median number of claims over the entire time period. The resulting analysis provides another opportunity to illustrate place-based patterns of vulnerability: Figure 2 maps the full set of 132 cities, again color-coded according to their predicted probabilities of flood damage/claims. As above, a city manager could refer to the map and read off their city’s predicted probability of future flood damage, provided the city continues to register its particular input PVI.
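The median-based binary outcome can be sketched as below. This is a minimal illustration under stated assumptions: the median is taken across cities, the function name `at_or_above_median` is hypothetical, and the six claim counts are invented for the example rather than drawn from the NFIP data.

```python
import numpy as np

def at_or_above_median(claim_counts):
    """Return 1 for cities whose claim count is at or above the median, else 0."""
    counts = np.asarray(claim_counts, dtype=float)
    return (counts >= np.median(counts)).astype(int)

# Hypothetical total claim counts for six cities (not NFIP data).
claims = [120, 45, 300, 45, 980, 10]
outcome = at_or_above_median(claims)
print(outcome.tolist())
```

A median split of this kind divides the cities into two groups of roughly equal size, which is consistent with the 66/66 column totals in Table 5.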
Figure 2.
Map of 132 large US metropolitan areas (cities),3 coded by their predicted probability of flood damage/claims. Blue cities: < 0.25; orange cities: 0.25 to 0.50; red cities: > 0.50.
In Figure 2, 70 out of 132 cities now reside in the extreme-vulnerability category with respect to flooding (shaded in red), and the spatial correlation in the flood data is now positive. In addition, and perhaps not surprisingly, the 10 cities with the highest probabilities of excess flood insurance claims (Table 2) now involve localities situated on rivers or shorelines, including New Orleans and Baton Rouge in Louisiana, and Norfolk, Virginia.
Comparing place vulnerability among different hazards
Comparing Figures 1 and 2, we see that the geographic patterns of predicted probabilities look quite different, with more of the high-flood-vulnerability cities appearing in the central and western US. In particular, the predicted flood-damage probabilities are much higher than those in the earlier terrorism-based analysis. In fact, all the top 10 flood-based values are above 90% (Table 2), compared to none of those in the terrorism case: in the latter instance, the highest is Washington, DC at 76.6% (Table 1). (Here again, these are all values city managers can employ to report their predicted probabilities of future adverse events – flood damage or terrorist casualties – as long as their city registers that particular input PVI.) This suggests that urban vulnerability to flood damage is more extensive across the US than vulnerability to terrorist casualties. In both cases, however, we find that the model’s predictive capabilities are quite good (see Predictive analytics). Overall, the differential patterns in Figures 1 and 2 help illustrate – literally and computationally – how different hazardous outcomes can produce substantively distinct vulnerability hazardscapes.
Table 2.
Top 10 large US metropolitan areas (cities) with highest autocorrelation-adjusted predicted probabilities of median-exceeding flood insurance claims (far-right column). Also included is each city’s place-based vulnerability index, PVI, from which the predicted probabilities are calculated. Compare to the ordering and values in Table 1.
| Metropolitan area (‘city’) | PVI | Predicted probability, π(PVI) |
|---|---|---|
| New Orleans, LA | 6.838 | 0.988 |
| Baton Rouge, LA | 6.735 | 0.986 |
| Norfolk-Chesapeake-Newport News-Virginia Beach, VA | 6.045 | 0.978 |
| Charleston, SC | 6.262 | 0.973 |
| New York, NY-Newark, NJ | 5.873 | 0.968 |
| Washington, DC | 5.697 | 0.953 |
| Philadelphia, PA | 5.456 | 0.944 |
| Richmond, VA | 5.655 | 0.937 |
| Houston, TX | 5.563 | 0.933 |
| Boise, ID | 5.415 | 0.918 |
Table 1.
Top 10 large US metropolitan areas (cities) with highest autocorrelation-adjusted predicted probabilities of terrorism-related casualties (far-right column). Also included is each city’s place-based vulnerability index, PVI, from which the predicted probabilities are calculated.
| Metropolitan area (‘city’) | PVI | Predicted probability, π(PVI) |
|---|---|---|
| Washington, DC | 5.697 | 0.766 |
| New Orleans, LA | 6.838 | 0.732 |
| Philadelphia, PA | 5.456 | 0.683 |
| Norfolk-Chesapeake-Newport News-Virginia Beach, VA | 6.045 | 0.635 |
| Columbia, SC | 4.856 | 0.587 |
| Tampa-St. Petersburg, FL | 4.869 | 0.579 |
| Greensboro-Winston Salem, NC | 4.533 | 0.562 |
| Charleston, SC | 6.262 | 0.532 |
| Detroit-Warren, MI | 3.907 | 0.521 |
| Boston, MA | 4.323 | 0.514 |
Predictive analytics.
We can illustrate the predictive capability of the centered autologistic model (see Modeling spatial autocorrelation) by applying statistical (“machine”) learning techniques to the predictive outcomes. For instance, define a positive prediction for terrorist events as a predicted probability in excess of 50%, i.e., π(PVI) > 0.50: these highly vulnerable cities have the greatest potential to experience a terrorism casualty. Values at or below this threshold represent negative predictions, or simply less risk.
We applied these predictions to the terrorism casualty events observed during the study period (1970–2004) to assess how well the predictions matched actual occurrences. In effect, we trained the predictive model to classify cities as to their potential terrorism status; see Table 3.
A pertinent summary statistic from this 2×2 training table is the accuracy, i.e., the correct classification rate, also known as the concordance. This is the proportion of correct positive and negative predictions: the sum of the two main diagonal counts in the table divided by the total. Here, we find training accuracy = 100/132 = 75.76%, which is above the uninformative, coin-flip baseline of 50%, and is indicative of good predictive power.
For this risk-analytic setting, another pertinent summary statistic is the precision – also known as the positive predictive value – in the 2×2 table, i.e., the correct proportion of positive predictions. The positive predictive value of adverse events for cities concerned about their vulnerability to terrorism casualties is of greater importance than the alternative, negative predictive value of avoiding terrorism casualties. Here, the centered autologistic model’s precision equals an encouraging 70%.
The terrorism data span 1970–2004, so we can re-access the Global Terrorism Database (GTD) and query whether any of these 132 cities experienced terrorism casualties in later years. We compare the predictions from the 1970–2004 training data set with the most recent data from 2005–2018 (called the test data set in statistical learning). This produces Table 4. Now we find test accuracy = 86/132 = 65.15%, dropping below that from the training data but nonetheless still above 50%. It is not uncommon to see drops in accuracy when a trained classification rule is tested on new data. Promisingly, precision in the table remains at 70%.
We can also apply a statistical learning analysis to the flood-damage outcomes. The approach is essentially identical: classify a city as positive if its predicted autologistic probability exceeds 50%. Then, compare the predicted classifications with those actually observed. This produces Table 5. Here training accuracy = 94/132 = 71.21%, while precision in the table once again reports as exactly 70%. Both values suggest strong predictive power. Note that more recent data, which would allow us to construct a test classification analysis, do not currently exist for these flood outcomes.
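The accuracy and precision values quoted in this box can be checked directly from the counts in Tables 3, 4, and 5. The short sketch below does that arithmetic; the helper name `accuracy_and_precision` is ours, and the four counts per table are read as (predicted positive and observed positive, predicted positive and observed negative, predicted negative and observed positive, predicted negative and observed negative).

```python
def accuracy_and_precision(tp, fp, fn, tn):
    """Accuracy and precision from the four cells of a 2x2 classification table."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total   # share of correct predictions
    precision = tp / (tp + fp)     # share of positive predictions that were correct
    return accuracy, precision

# Cell counts copied from Tables 3, 4, and 5: (tp, fp, fn, tn).
tables = {
    "Table 3 (terrorism, training)": (7, 3, 29, 93),
    "Table 4 (terrorism, test)": (7, 3, 43, 79),
    "Table 5 (flood, training)": (49, 21, 17, 45),
}

for name, counts in tables.items():
    acc, prec = accuracy_and_precision(*counts)
    print(f"{name}: accuracy = {acc:.2%}, precision = {prec:.2%}")
```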
Responding to risk
These examples illustrate how data-scientific strategies can quantify a location’s vulnerability to hazardous events. Our applications to US data on urban vulnerability allow for real knowledge discovery: e.g., significant negative spatial correlation was observed for terrorism-based casualties in the database. This may seem counterintuitive at first, but upon reflection it does appear plausible. Perhaps the occurrence of terrorist events in one city tends to increase emergency preparedness and response planning in adjacent cities, leading to fewer terrorism (or at least lowered casualty) events. On the other hand, perhaps putative terrorists chose to ignore nearby cities in order to maximize their desired impact across a wider geographic space. Many other possibilities exist, and understanding the underlying processes that drive terrorist attacks is an open, ongoing research question.6
From a larger perspective, the message is simple: it is not difficult to quantify and compare place-based susceptibilities to natural and other hazards; however, to do so, one must think outside the proverbial box and integrate modern place-based vulnerability metrics into the analysis. The effort provides effective, data-based guidance on how urban environments must prepare for and respond to a range of hazard types. Indeed, these calculations should be viewed as a foundation from which place-based statistical risk analyses may evolve, as more advanced measures of urban vulnerability – and resilience7 – are added to the body of work in quantitative risk assessment.
Table 3.
Terrorism casualty training set analysis.
| Prediction (1970–2004) | Observed (1970–2004): positive adverse event | Observed (1970–2004): negative adverse event | Row totals |
|---|---|---|---|
| Positive adverse event | 7 | 3 | 10 |
| Negative adverse event | 29 | 93 | 122 |
| Column totals | 36 | 96 | 132 |
Table 4.
Terrorism casualty test set analysis.
| Prediction (1970–2004) | Observed (2005–2018): positive adverse event | Observed (2005–2018): negative adverse event | Row totals |
|---|---|---|---|
| Positive adverse event | 7 | 3 | 10 |
| Negative adverse event | 43 | 79 | 122 |
| Column totals | 50 | 82 | 132 |
Table 5.
Flood damage training set analysis.
| Prediction (1977–2019) | Observed (1977–2019): positive adverse event | Observed (1977–2019): negative adverse event | Row totals |
|---|---|---|---|
| Positive adverse event | 49 | 21 | 70 |
| Negative adverse event | 17 | 45 | 62 |
| Column totals | 66 | 66 | 132 |
Notes and acknowledgement
This research was supported in part by grant #ES027394 from the US National Institute of Environmental Health Sciences. Sincere thanks are due to Dr. Jingyu Liu for background on some of the computational details, and to two anonymous reviewers for constructive inputs on the material.
Contributor Information
Walter W. Piegorsch, University of Arizona
Rachel R. McCaster, University of South Carolina
Susan L. Cutter, University of South Carolina
References
- 1. Borden K, Schmidtlein MC, Emrich CT, Piegorsch WW, and Cutter SL (2007). Natural hazards vulnerability in U.S. cities. J. Homeland Secur. Emerg. Manage. 4(2), Article No. 5 (22 pp.).
- 2. Cutter SL, Boruff BJ, and Shirley WL (2003). Social vulnerability to environmental hazards. Soc. Sci. Quart. 84(2), 242–261.
- 3. Piegorsch WW, Cutter SL, and Hardisty F (2007). Benchmark analysis for quantifying urban vulnerability to terrorist incidents. Risk Analy. 27(6), 1411–1425.
- 4. Liu J, Piegorsch WW, Schissler AG, and Cutter SL (2018). Autologistic models for benchmark risk or vulnerability assessment of urban terrorism outcomes. J. Roy. Statist. Soc., ser. A (Statist. Soc.) 181(3), 803–823.
- 5. Gall M, Borden KA, Emrich CT, and Cutter SL (2011). The unsustainable trend of natural hazard losses in the United States. Sustainability 3(11), 2157–2181.
- 6. Python A, Illian JB, Jones-Todd CM, and Blangiardo M (2019). The deadly facets of terrorism. Significance 16(4), 28–31.
- 7. Cutter SL, Ash KD, and Emrich CT (2014). The geographies of community disaster resilience. Glob. Environ. Change 29, 65–77.
- 8. Caragea PC and Kaiser MS (2009). Autologistic models with interpretable parameters. J. Agric. Biol. Environ. Statist. 14(3), 281–300.


