Abstract
Area-level measures of the social exposome provide powerful tools for understanding how context contributes to health disparities. Because of the modifiable areal unit problem (MAUP), a well-known geographic phenomenon, the geographic level at which an index is constructed can threaten its utility. Previous work indicates that using smaller geographic levels increases measurement precision, which may allow closer alignment with policies that directly address health disparities.
To illustrate this phenomenon, we use Medicare 100% Fee-for-Service hospitalization claims data to evaluate the association between area-level disadvantage and 30-day readmissions when the Area Deprivation Index (ADI) is constructed at different geographic levels. When area-level disadvantage was summarized at the “neighborhood” census block group, the study’s smallest geographic level, those living in the 20% most disadvantaged neighborhoods nationwide had 20% higher odds of readmission than those living in the remaining 80% of neighborhoods. Yet no evidence of an association with readmissions was found when neighborhood disadvantage was summarized at larger geographic levels.
Smaller geographic levels appear best suited to capture these effects. For publicly available data to be truly publicly usable, greater attention to providing small-area health data is needed.
Keywords: Scale, Index of Disadvantage, ADI, MAUP
1. INTRODUCTION
Understanding adverse health outcomes is commonly treated as the domain of the doctor-patient relationship. However, research on the social determinants of health has established that the social exposome (all the social exposures a person experiences over a lifetime and how they affect health) plays an outsized role in individual health (Wild, 2012). While this understanding is not new, it drives a need to better conceptualize the area-based measures used to quantify the social exposome.
Under fundamental cause theory (Phelan et al., 2010), area-level measures of the social exposome reflect the availability of key contextual factors and resources that drive health, and they serve as powerful tools for quantifying the contextual factors behind geographic health disparities around the world. Examining the exposome requires measures that allow precise, valid, and (sometimes) longitudinal assessment of contextual exposure. Researchers can align these measures toward action and use the findings to guide strategies that promote health equity. To accomplish this goal, however, there is a critical need to understand the extent to which the geographic level of measurement (e.g., county, census tract, census block group) influences a measure's utility.
Area-level measures are dynamic, reflecting a specific context over a certain period of time. They often include socioeconomic factors such as income, education, housing, and employment at a precise geographic level (Buckingham et al., 2021). Some area-level indices, like the census block group-derived ADI, also reflect patterns of structural disadvantage and systemic racism. For example, redlining, the practice in the US of systematically limiting a population's access to financial resources on the basis of geography and race (as well as other characteristics), can be reflected in some area-level indices. However, the geographic level and precision of these indices must be chosen to optimally reflect such patterns of structural inequity and systemic racism; choosing a geographic level that is too large will mask the inequity. Figure 1 illustrates this for a detail of Indianapolis, IN: the block group-level ADI (center) reflects neighborhood disadvantage strongly linked to historic redlining (Lynch et al., 2021) (left), whereas the county-level ADI masks these nuanced neighborhood-level details (right).
Figure 1.

Home Owners' Loan Corporation (HOLC) redlining (left), census block group ADI (center), and county ADI (right). Red in the left map denotes areas where lending was restricted; red in the center map denotes highly disadvantaged areas.
Building on the existing understanding of scale effects on these measures, this study examines the association between the Area Deprivation Index (Kind & Buckingham, 2018; Singh, 2003) and 30-day Medicare rehospitalization risk when the ADI is calculated at different geographic levels, to determine how scale affects the estimated association. Medicare claims data provide a rich source for evaluating the performance of area-level measures of the social exposome summarized at different geographic levels.
1.1. Indices and Scale: Bigger is Not Typically Better
Researchers use many different geographic levels in practice, ranging from the more granular (US Census block group, census tract) to broader categorizations (5-digit ZIP code, county, and state). Determining the appropriate geographic level for an area-based study is critical to align with the research and policy question of interest, because not all geographic levels are appropriate for all situations.
A key consideration when determining the optimal scale for a study is minimizing the associated sources of bias. Of these, the modifiable areal unit problem (MAUP) is perhaps the most serious threat to the performance of area-level measures. The MAUP is a statistical bias that arises when point data are aggregated to areal units. It was characterized by Openshaw (Openshaw, 1983), who argued that “the areal units used in geographical studies are arbitrary, modifiable and subject to the whims of whoever is doing the aggregating”. The MAUP occurs when results change simply because the geographic level of aggregation (the scale effect) or the placement of unit boundaries (the zoning effect) is altered (Flowerdew, 2011; Fotheringham & Wong, 1991; Haynes et al., 2007; Marceau, 1999; Wong & Amrhein, 1996). An example of the MAUP that has gained popular attention is gerrymandering, in which the way voting-district boundaries are drawn can determine the outcome of an election. From a health perspective, the choice of geographic level can have unintended consequences for assessments of the association between contextual exposure and health, so it must be considered carefully in index development, and the scale of the resulting index must be selected deliberately.
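The zoning effect can be seen in a toy example. The Python sketch below uses a made-up 3×3 grid of block groups, each flagged as highly disadvantaged (1) or not (0), and groups the same nine block groups into three hypothetical "counties" in two different ways: by rows and by columns. The majority rule used to label a county as disadvantaged is an assumption for illustration only; it is not how the ADI is computed.

```python
# Toy data (hypothetical): 1 = block group flagged as highly disadvantaged.
GRID = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
]


def label_zones(zones, grid):
    """Label each zone disadvantaged if a strict majority of its cells are flagged.

    The majority rule is an illustrative assumption, not part of the ADI.
    """
    labels = []
    for cells in zones:
        flagged = sum(grid[r][c] for r, c in cells)
        labels.append(2 * flagged > len(cells))
    return labels


# Two zonings of the same nine block groups into three "counties".
by_rows = [[(r, c) for c in range(3)] for r in range(3)]
by_cols = [[(r, c) for r in range(3)] for c in range(3)]

if __name__ == "__main__":
    print("Disadvantaged counties (rows):", sum(label_zones(by_rows, GRID)))
    print("Disadvantaged counties (columns):", sum(label_zones(by_cols, GRID)))
```

Zoning by rows yields one disadvantaged county and zoning by columns yields two, even though the underlying block groups never change.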
Certain source data commonly used for measure construction, like data from the United States Census, are very susceptible to the MAUP (Duckham et al., 2001). One approach to limiting the potential bias of the MAUP is to develop measures at smaller geographic levels, an approach that has been used in index development in countries outside the US (Schuurman et al., 2007).
Work by Haynes and colleagues shows that the MAUP is particularly problematic when geographic units are heterogeneous, such as when distinct economic, educational, or population groups are placed in the same unit (Haynes et al., 2007). From a census perspective, this reinforces the case for smaller levels of geography, as these tend to be more homogeneous than larger levels. As the example in Figure 1 shows, a county-level measure may not capture nuances that would be obvious at a smaller geographic level. Flowerdew examined census data and found that the MAUP can have a substantial effect on census data groupings, making it important to ask which data scale most closely coincides with the geographical process under study (Flowerdew, 2011; Flowerdew et al., 2008).
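A minimal sketch of how within-unit heterogeneity can mask an association, using entirely made-up counts for two hypothetical counties, X and Y, each containing two block groups. As a simplification (an assumption, not the study's method, which rebuilt the ADI from ACS data at each level), the county value here is the discharge-weighted mean of its block groups' ADI ranks, and "high" means a rank above 80.

```python
# Toy block groups (all values hypothetical): (county, ADI rank, discharges, readmissions)
BLOCK_GROUPS = [
    ("X", 90, 100, 30),
    ("X", 40, 300, 45),
    ("Y", 95, 200, 50),
    ("Y", 70, 200, 30),
]
CUT = 80  # "high" disadvantage means an ADI rank above 80


def county_mean_adi(groups):
    """Discharge-weighted mean ADI rank per county (a simplification, see lead-in)."""
    totals = {}
    for county, adi, n, _ in groups:
        weighted_sum, weight = totals.get(county, (0.0, 0))
        totals[county] = (weighted_sum + adi * n, weight + n)
    return {county: s / w for county, (s, w) in totals.items()}


def odds_ratio(groups, is_high):
    """Unadjusted OR of readmission for discharges in 'high' vs 'low' units."""
    a = b = c = d = 0  # a/b: high readmitted/not, c/d: low readmitted/not
    for group in groups:
        n, readmits = group[2], group[3]
        if is_high(group):
            a, b = a + readmits, b + n - readmits
        else:
            c, d = c + readmits, d + n - readmits
    return (a * d) / (b * c)


county_adi = county_mean_adi(BLOCK_GROUPS)
or_block_group = odds_ratio(BLOCK_GROUPS, lambda g: g[1] > CUT)
or_county = odds_ratio(BLOCK_GROUPS, lambda g: county_adi[g[0]] > CUT)

if __name__ == "__main__":
    print("County mean ADI:", county_adi)
    print(f"Block group OR: {or_block_group:.2f}")
    print(f"County OR: {or_county:.2f}")
```

With these toy numbers the block group OR is roughly 2.06 while the county OR is roughly 1.08: county X's highly disadvantaged block group is averaged together with a much larger affluent one, so its residents are classified as low disadvantage.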
Despite these concerns, only a few studies have evaluated how well area-level measures predict health outcomes when the geographic level of measurement is varied. Using state mortality and cancer incidence data, Krieger et al. (2002) showed that efforts to monitor socioeconomic inequalities in health were best served at the block group and tract levels rather than the ZIP code level. Likewise, during the development of the VANDIX (Vancouver Area Index of Disadvantage) (Schuurman et al., 2007), Schuurman and her team found that smaller levels of geography (equivalent to US census block groups and tracts) had higher predictive accuracy for self-reported health quality than larger administrative levels such as counties (Bell, Schuurman, & Hayes, 2007; Bell, Schuurman, Oliver, et al., 2007; Oliver & Hayes, 2007; Schuurman et al., 2007). In work on the English Indices of Multiple Deprivation, Noble and colleagues show that quantifying area disadvantage at levels smaller than the city is critical for policy to effectively target resources to address disadvantage (Noble et al., 2006). Finally, the precise boundaries that define an area unit may not be essential to identifying trends in disadvantage: Stafford and colleagues found that alternative definitions of neighborhood boundaries had no substantive effect on estimates of inequalities (Stafford et al., 2008). This finding gives confidence that administrative boundaries are appropriate for defining neighborhoods, provided they are at small levels of geography, such as the census block group or census tract.
2. METHODS
This study set out to extend the work of Krieger and Schuurman by evaluating the effects of geographic scale on the predictive power of an index of disadvantage. We used ArcGIS Pro and the R statistical language to conduct a retrospective study of a hospitalized cohort using all inpatient fee-for-service claims data for Medicare's nationwide population (100% data) from 2013 to 2015. Consistent with the criteria of prior analyses of Centers for Medicare & Medicaid Services (CMS) quality measures, we included stays in which the beneficiary was admitted to a short-term acute care hospital from January 1, 2013, through November 30, 2015; was 18 years or older; had continuous Part A and Part B coverage in the 12 months before admission; and had neither Railroad Retirement benefits nor health maintenance organization enrollment. The outcome was 30-day all-cause readmission, constructed from the measure specifications that CMS uses to inform hospital care-related policy (Centers for Medicare & Medicaid Services, 2019). Thirty-day readmission was chosen as the outcome because it is often used as a quality indicator in CMS hospital policy and is both widely studied and available nationwide.
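The inclusion criteria above can be expressed as a single filter. The Python sketch below is a hypothetical illustration: the `Stay` fields are invented names standing in for the corresponding claims variables, and HMO enrollment is simplified to one boolean. It is not CMS's measure code.

```python
from dataclasses import dataclass
from datetime import date

STUDY_START = date(2013, 1, 1)
STUDY_END = date(2015, 11, 30)


@dataclass
class Stay:
    """One hospital stay; field names are hypothetical stand-ins for claims variables."""
    admit_date: date
    age: int
    short_term_acute: bool          # admitted to a short-term acute care hospital
    continuous_ab_prior_12mo: bool  # continuous Part A and B in the prior 12 months
    railroad_retirement: bool
    hmo_enrolled: bool


def is_eligible(stay: Stay) -> bool:
    """Apply the cohort inclusion criteria described in the text."""
    return (
        stay.short_term_acute
        and STUDY_START <= stay.admit_date <= STUDY_END
        and stay.age >= 18
        and stay.continuous_ab_prior_12mo
        and not stay.railroad_retirement
        and not stay.hmo_enrolled
    )


if __name__ == "__main__":
    stays = [
        Stay(date(2014, 6, 1), 72, True, True, False, False),
        Stay(date(2015, 12, 5), 72, True, True, False, False),  # after Nov 30, 2015
    ]
    print([is_eligible(s) for s in stays])  # [True, False]
```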
Once the data were obtained from CMS, each patient's nine-digit ZIP code (ZIP+4) was used to geolocate the individual. Geographic crosswalks were constructed to link each record to its census block group, county, and ZIP code. The 30-day rehospitalization indicator was coded as a binary variable indicating the presence or absence of a readmission event. The geographic codes identifying the three areas of interest were appended to each record, after which the identifying geographic information and all other identifying data were dropped from the record.
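The linkage and de-identification steps might look like the following minimal sketch. The crosswalk entries, the field names (`zip9`, `bene_id`, `readmit_30d`), and the choice to drop unmatched records are all hypothetical; this is not the study's code.

```python
from typing import Optional

# Hypothetical crosswalk from ZIP+4 to the three geographic codes (values are made up).
CROSSWALK = {
    "000011234": {"block_group": "990010001001", "county": "99001", "zcta": "00001"},
    "000025678": {"block_group": "990030002002", "county": "99003", "zcta": "00002"},
}


def link_and_deidentify(record: dict, crosswalk: dict) -> Optional[dict]:
    """Append geographic codes, then keep only the outcome and those codes."""
    geo = crosswalk.get(record["zip9"])
    if geo is None:
        return None  # Unmatched ZIP+4: dropped here (a design assumption).
    # Binary 30-day readmission indicator (1 = readmitted, 0 = not).
    linked = {"readmit_30d": 1 if record["readmit_30d"] else 0}
    linked.update(geo)
    # Identifiers (e.g., bene_id, zip9) are never copied, so they are dropped.
    return linked


if __name__ == "__main__":
    raw = {"bene_id": "B001", "zip9": "000011234", "readmit_30d": True}
    print(link_and_deidentify(raw, CROSSWALK))
```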
Using the geographic codes appended to the records, neighborhood disadvantage was measured at three commonly used US geographic levels: the county, the ZIP Code Tabulation Area (ZCTA), and the census block group. Disadvantage was measured using the ADI, which was constructed independently at each of the three geographic scales from time-concordant 2011–2015 American Community Survey (ACS) five-year estimates. The ADI was originally developed by the Health Resources and Services Administration to evaluate chronic conditions and their relationship to area disadvantage at the county level (Singh, 2003). The ADI comprises 17 items measured in the ACS, covering income, education, housing, and employment. Rankings range from 1 to 100, with higher ranks denoting greater area-level disadvantage. Geographic data at the appropriate scales were obtained from the National Historical Geographic Information System (NHGIS). Medicare beneficiaries were linked via the ZIP+4 code in the Medicare data to each geographic-level version of the ADI. The block group ADI has been updated annually and validated (Kind & Buckingham, 2018); for this study, the ADI was recreated independently for the period of interest at each geographic level.
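To make the ranking step concrete, here is a simplified Python sketch of an ADI-style composite: each indicator is standardized across units, the standardized values are combined with weights, and the resulting scores are ranked into percentiles from 1 to 100. The indicator names, toy values, use of z-scores, and weights are all placeholders; they are not Singh's published factor-score coefficients or the 17 ADI items, and ties are broken arbitrarily.

```python
import math
from statistics import mean, pstdev


def zscores(values):
    """Standardize values across units (population standard deviation)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def composite_scores(indicators, weights):
    """Weighted sum of standardized indicators, one score per unit.

    indicators: {name: [value per unit]}; weights: {name: weight}. Negative
    weights flip protective indicators (e.g., income) so higher = more disadvantage.
    """
    z = {name: zscores(vals) for name, vals in indicators.items()}
    n_units = len(next(iter(indicators.values())))
    return [sum(weights[name] * z[name][i] for name in z) for i in range(n_units)]


def national_percentiles(scores):
    """Rank scores into percentiles 1-100; higher percentile = more disadvantage."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    percentiles = [0] * n
    for rank, i in enumerate(order, start=1):
        percentiles[i] = math.ceil(100 * rank / n)
    return percentiles


if __name__ == "__main__":
    # Hypothetical indicators and placeholder weights for four units.
    indicators = {
        "pct_poverty": [5, 30, 15, 50],
        "median_income": [90_000, 30_000, 60_000, 20_000],
    }
    weights = {"pct_poverty": 1.0, "median_income": -1.0}
    print(national_percentiles(composite_scores(indicators, weights)))  # [25, 75, 50, 100]
```

Because the ranking counts units, the same function produces a "national percentile" for block groups, ZCTAs, or counties alike; what differs across levels is how many people each ranked unit represents.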
Note that the choice of geographies created a fundamental difference between the units of analysis. Census block groups contain roughly 1,500 people, whereas counties and ZIP codes are not population-based units but administrative, governmental, and postal units that vary widely in population size. A county in a densely populated (i.e., urban) area will contain more block groups than a county in a sparsely populated rural area. Thus, while the block group ADI ranks units of roughly comparable population and is in that sense population normalized, the county and ZIP code ADIs rank units of very different population sizes and are not population normalized.
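The consequence of unequal unit sizes can be checked with a short calculation. The hypothetical Python helper below ranks units into percentiles by counting units, as for any geography, and reports what share of the total population lives in the top 20% of units. The scores and populations are invented for illustration.

```python
import math


def top_quintile_population_share(scores, populations):
    """Share of total population living in units ranked in the top 20% (percentile > 80).

    Units are ranked by score into percentiles 1-100, counting units rather than people.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    top_units = [i for rank, i in enumerate(order, start=1) if math.ceil(100 * rank / n) > 80]
    return sum(populations[i] for i in top_units) / sum(populations)


if __name__ == "__main__":
    scores = [10, 20, 30, 40, 90]  # hypothetical ADI-style scores
    unequal = [50_000, 40_000, 30_000, 20_000, 1_000]  # hypothetical county populations
    equal = [1_500] * 5  # roughly equal, block-group-like populations
    print(f"Unequal units: {top_quintile_population_share(scores, unequal):.4f}")
    print(f"Equal units:   {top_quintile_population_share(scores, equal):.4f}")
```

With five equally populated units, the top quintile holds 20% of the population; with the unequal toy counties it holds under 1%.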
To assess predictive validity at each geographic level, we used logistic regression, dichotomizing the ADI into high vs. low neighborhood-level socioeconomic disadvantage with the nationwide 80th percentile ADI ranking as the cut point. Previous health services and policy research supports this threshold (Kind et al., 2014), suggesting increased risk for those living in areas with greater area-level socioeconomic disadvantage. All analyses were conducted at the index-discharge level (so a person could contribute multiple hospitalizations) to be consistent with current policy metric construction methods. Logistic regression was fit with a generalized estimating equations (GEE) approach to account for clustering within hospitals. Predictive validity was assessed by comparing the unadjusted odds ratios (ORs) and 95% confidence intervals (CIs) estimated at each geographic level.
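For intuition about the quantity reported in Table 1, the sketch below dichotomizes a hypothetical ADI rank at the 80th percentile (ranks above 80 are "high", an assumed reading of the cut point) and computes an unadjusted odds ratio with a Wald 95% CI from the resulting 2×2 table. This is a deliberately simpler, swapped-in technique: it ignores hospital-level clustering, which the study handled with GEE, so its intervals may understate uncertainty when discharges within a hospital are correlated.

```python
import math


def odds_ratio_wald(records, cut=80, z=1.96):
    """Unadjusted OR and Wald CI for readmission, high (ADI rank > cut) vs low.

    records: iterable of (adi_rank, readmitted) pairs. Ignores clustering by hospital.
    Raises ZeroDivisionError if any cell of the 2x2 table is empty.
    """
    a = b = c = d = 0  # a/b: high readmitted/not, c/d: low readmitted/not
    for adi_rank, readmitted in records:
        if adi_rank > cut:
            if readmitted:
                a += 1
            else:
                b += 1
        elif readmitted:
            c += 1
        else:
            d += 1
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical discharges: (ADI rank, readmitted within 30 days)
    data = [(90, True)] * 80 + [(90, False)] * 220 + [(40, True)] * 75 + [(40, False)] * 425
    print("OR = {:.2f} (95% CI {:.2f}-{:.2f})".format(*odds_ratio_wald(data)))
```

A CI whose lower bound exceeds 1 corresponds to the "did not include 1" criterion used in the results.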
3. RESULTS
The estimated odds of 30-day readmission varied by the geographic level at which the ADI was calculated, as shown in Table 1. The final logistic regression model evaluated the effect of high disadvantage (coded as 1) on the odds of 30-day readmission. Because the ADI is itself a ranked measure, it was stratified into a high/low indicator and treated as a categorical exposure; logistic regression was used because the outcome, readmission, is binary.
Table 1.
Predictive power of the ADI for 30-day readmissions by geographic level. OR is the odds ratio of readmission for those living in high-ADI neighborhoods relative to low-ADI neighborhoods.
| Geographic Level | OR | 95% CI |
|---|---|---|
| Census Block Group | 1.20 | (1.12 – 1.29) |
| County | 1.03 | (0.96 – 1.09) |
| ZCTA (Zip Code) | 1.03 | (0.95 – 1.13) |
At the census block group level, the odds ratio for rehospitalization among those living in the most disadvantaged block groups nationwide, relative to all other block groups, was 1.20 (95% CI 1.12–1.29). This result was statistically significant, as the confidence interval did not include 1. It can be interpreted as follows: living in a highly disadvantaged block group was associated with 20 percent higher odds of rehospitalization than living in an area that was not highly disadvantaged. In contrast, the county- and ZCTA-level ADIs failed to detect an association between high vs. low disadvantage status and the risk of rehospitalization: both levels yielded an odds ratio of 1.03, with 95% confidence intervals that included 1.
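Because readmission is not a rare outcome, an odds ratio of 1.20 is not the same as a 20 percent higher risk. For an unadjusted odds ratio, given the readmission risk p0 in the low-disadvantage group, the risk ratio is RR = OR / (1 - p0 + p0 × OR). The Python sketch below applies this conversion; the baseline risk of 0.15 is a hypothetical value, not a figure from this study.

```python
def odds_ratio_to_risk_ratio(or_: float, p0: float) -> float:
    """Convert an odds ratio to a risk ratio given baseline risk p0 in the reference group."""
    if not 0.0 <= p0 < 1.0:
        raise ValueError("p0 must be in [0, 1)")
    return or_ / (1.0 - p0 + p0 * or_)


if __name__ == "__main__":
    # Hypothetical baseline readmission risk in low-ADI block groups.
    print(round(odds_ratio_to_risk_ratio(1.20, 0.15), 3))
```

Under that assumed baseline, an OR of 1.20 corresponds to a risk ratio of about 1.165, i.e., roughly 16.5 percent higher risk; the gap between odds and risk widens as the baseline risk grows.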
4. DISCUSSION
This study illustrates that the choice of spatial scale is a critical consideration when working with spatially located data, specifically geographic health data and indices of disadvantage. In particular, working at the smallest spatial scale appears to yield the strongest connection to outcomes. The geographic level is a critical component in the design and implementation of an index of disadvantage: our results suggest that larger geographic levels, including the county and ZCTA, do not predict rehospitalization as well as the block group-level ADI. Together with the practical limitations in resource targeting introduced by larger geographic levels, this supports the assertion that block group-level metrics are better aligned to inform area-based interventions and policy efforts. As expected, and as has been demonstrated previously, using large areas such as ZIP codes and counties reduces predictive validity. As the unit becomes larger, the phenomenon being captured shifts away from the small-area influences that exist at the neighborhood level in lived experience, making results harder to interpret and less tractable to address from an intervention standpoint.
This work supports the conclusion put forth by Schuurman and colleagues (Schuurman et al., 2007) that while the MAUP is a challenge, it can be mitigated by working at the smallest available levels of geography. In practice, county and ZCTA units are more readily available in health data. However, our findings suggest they lack the detail necessary for rigorous and meaningful analysis of place-based contextual disadvantage.
In our opinion, the MAUP is the most serious source of bias, because simple reaggregation can substantially change the resulting comparisons. It is more severe than other biases, such as missing data and sampling bias, because it is a hidden source of error that is difficult to quantify. The best way to handle the MAUP is to work at the lowest geographic level possible, while still acknowledging the limitations that the MAUP places on the data.
While this study demonstrates the importance of working at small units of geography, it has limitations. The geographic units chosen are functionally different. The county is a governmental unit, while the ZIP code is a unit designed for mail delivery that has been adapted for marketing purposes. The census block group, by contrast, is a unit of geography specific to the United States Census and, unlike the other two units, is population controlled. These differences in construction change how data are aggregated: counties and ZIP codes offer less within-unit homogeneity than the population-controlled units delineated by the Census. Further work should extend the comparison to other census units, such as the census tract, to determine the viability of another population-controlled small unit of geography. Additionally, this study did not examine spatial autocorrelation within each unit's index; evaluating the spatial relationships within each index would offer further insight into the strengths and weaknesses of each.
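Spatial autocorrelation was not examined in this study. As a pointer for the future work described above, the Python sketch below computes global Moran's I, a standard measure of spatial autocorrelation, I = (n / S0) · Σi Σj wij (xi - x̄)(xj - x̄) / Σi (xi - x̄)², using binary contiguity weights (wij = 1 for neighbors, S0 = the sum of all weights). The four-unit chain of neighbors and the index values are invented for illustration.

```python
from statistics import mean


def morans_i(values, neighbors):
    """Global Moran's I with binary weights: w_ij = 1 if j is listed as a neighbor of i."""
    n = len(values)
    xbar = mean(values)
    dev = [x - xbar for x in values]
    s0 = sum(len(nbrs) for nbrs in neighbors.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


if __name__ == "__main__":
    # Four hypothetical units in a line (0-1-2-3), each adjacent to its immediate neighbors.
    line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(morans_i([1, 2, 3, 4], line))  # smoothly increasing values: positive I
    print(morans_i([1, 4, 1, 4], line))  # alternating values: negative I
```

Values near +1 indicate that similar index values cluster in space, while negative values indicate that neighboring units tend to differ.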
Although this work focused on the construction and application of indices of disadvantage, it is applicable across research domains and spatial locations. Working at the smallest possible geographic level, often the neighborhood level best approximated by the census block group, is the most effective strategy to mitigate the effects of the MAUP. Further, incorporating geographic statistical approaches into the analysis, such as geographically weighted principal components analysis (Harris et al., 2011), offers additional means of controlling the MAUP by building the geographic structure into the modeling effort. The MAUP is a challenge that cannot be completely removed and therefore cannot be ignored. However, prudent choice of geographic scale can reduce its effect so that results retain geographic and statistical validity.
From an applied standpoint, this call for small-area indices is also a call for small-area data, specifically small-area health data. Publicly available data are not always publicly usable at the appropriate geographic level. The choice of geographic level used to calculate a measure is often dictated by limitations in publicly available source data, such as the County Health Rankings, Centers for Disease Control and Prevention mortality measures, or governmental and marketing data. Such limits can make meaningful interpretation for the population of interest difficult, particularly when data are summarized at larger geographic levels (e.g., state, county) (Flowerdew et al., 2008; Oliver & Hayes, 2007; Schuurman et al., 2007). Given that the United States Census produces data at the census block group level with suppression rules in place, we suggest it is time to consider whether more data, especially health research data, could follow the Census's lead. Such resources could also more universally incorporate block group-level metrics of the social exposome, such as the ADI, to facilitate easier and more widely accessible evaluation of these factors.
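One way small-area health data could be released with safeguards is primary small-cell suppression, sketched below in Python. The minimum cell size of 11 and the choice to leave zero counts unsuppressed are assumptions for illustration rather than a specific agency's rule, and the GEOID-style keys are made up. A real release would also need complementary suppression so that suppressed cells cannot be recovered from totals.

```python
from typing import Dict, Optional


def suppress_small_cells(counts: Dict[str, int], min_cell: int = 11) -> Dict[str, Optional[int]]:
    """Replace counts in 1..min_cell-1 with None (primary suppression only)."""
    return {area: (None if 0 < n < min_cell else n) for area, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical readmission counts keyed by made-up block group GEOIDs.
    counts = {"990010001001": 4, "990010001002": 0, "990010001003": 27}
    print(suppress_small_cells(counts))
```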
Following the work of Krieger (2002) and Schuurman (2007), which was conducted in different domains and in both US and non-US contexts, this work reiterates the effect that geographic scale has on understanding the wider outcomes of interest. While this study focused on the development of an index of disadvantage, the findings are broadly applicable to any geographic study that involves a choice of scale, in any region or country.
These findings underscore the importance of aligning research so that it can inform intervention and policy. Further, aligning policy-based measures and publicly available data to the most appropriate small levels of geography will enhance the ability to link outcomes and policy across regions and across the many domains that use geographic data.
Acknowledgements
Thanks to Luke Chamberlain and Nyla Thursday for producing the map in Figure 1.
Funding
This work was supported by the National Institute on Aging (R01AG070883). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This material is the result of work supported with the resources and the use of facilities at the University of Wisconsin School of Medicine and Public Health Center for Health Disparities Research.
Footnotes
Disclosure statement
No potential conflict of interest was reported by the authors.
References
- Bell N, Schuurman N, & Hayes MV (2007). Using GIS-based methods of multicriteria analysis to construct socio-economic deprivation indices. International Journal of Health Geographics, 6(17).
- Bell N, Schuurman N, Oliver L, & Hayes MV (2007). Towards the construction of place-specific measure of deprivation: a case study from the Vancouver metropolitan area. The Canadian Geographer, 51(4), 444–461.
- Buckingham WR, Bishop L, Hooper-Lane C, Anderson B, Wolfson J, Shelton S, & Kind AJ (2021). A systematic review of geographic indices of disadvantage with implications for older adults. JCI Insight, 6(20).
- Centers for Medicare & Medicaid Services (2019). CMS Measures Inventory Tool: claims-based hospital-wide all-cause unplanned readmission measure. https://cmit.cms.gov/CMIT_public/ListMeasures?q=Claims-Based%20Hospital-Wide%20All-Cause%20Unplanned%20Readmission%20Measure
- Duckham M, Mason K, Stell J, & Worboys M (2001). A formal approach to imperfection in geographic information. Computers, Environment and Urban Systems, 25, 89–103.
- Flowerdew R (2011). How serious is the Modifiable Areal Unit Problem for analysis of English census data? Population Trends, 145, 106–118.
- Flowerdew R, Manley DJ, & Sabel CE (2008). Neighbourhood effects on health: Does it matter where you draw the boundaries? Social Science and Medicine, 66, 1241–1255.
- Fotheringham AS, & Wong DWS (1991). The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A, 23, 1025–1044.
- Harris P, Brunsdon C, & Charlton M (2011). Geographically weighted principal components analysis. International Journal of Geographical Information Science, 25(10), 1717–1736.
- Haynes R, Daras K, Reading R, & Jones A (2007). Modifiable neighbourhood units, zone design and residents’ perceptions. Health and Place, 13, 812–825.
- Kind AJ, & Buckingham WR (2018). Making neighborhood-disadvantage metrics accessible—the neighborhood atlas. The New England Journal of Medicine, 378(26), 2456.
- Kind AJ, Jencks S, Brock J, Yu M, Bartels C, Ehlenbach W, Greenberg C, & Smith M (2014). Neighborhood socioeconomic disadvantage and 30-day rehospitalization: a retrospective cohort study. Annals of Internal Medicine, 161(11), 765–774.
- Krieger N, Chen JT, Waterman PD, Soobader M-J, Subramanian SV, & Carson R (2002). Geocoding and Monitoring of US Socioeconomic Inequalities in Mortality and Cancer Incidence: Does the Choice of Area-based Measure and Geographic Level Matter?: The Public Health Disparities Geocoding Project. American Journal of Epidemiology, 156(5), 471–482. 10.1093/aje/kwf068
- Lynch E, Malcoe L, Laurent S, Richardson J, Mitchell B, & Meier H (2021). The legacy of structural racism: associations between historic redlining, current mortgage lending, and health. SSM Population Health, 14.
- Marceau DJ (1999). The Scale Issue in the Social and Natural Sciences. Canadian Journal of Remote Sensing, 25(4), 347–356.
- Noble M, Wright G, Smith G, & Dibben C (2006). Measuring multiple deprivation at the small-area level. Environment and Planning A, 38, 169–185.
- Oliver LN, & Hayes MV (2007). Does Choice of Spatial Unit Matter for Estimating Small-area Disparities in Health and Place Effects in the Vancouver Census Metropolitan Area? Canadian Journal of Public Health, 98(July-August), S27–S34.
- Openshaw S (1983). The Modifiable Areal Unit Problem. In N. University (Ed.), CATMOG (Vol. 34).
- Phelan JC, Link BG, & Tehranifar P (2010). Social conditions as fundamental causes of health inequalities: theory, evidence, and policy implications. Journal of Health and Social Behavior, 51(1_suppl), S28–S40.
- Schuurman N, Bell N, Dunn JR, & Oliver L (2007). Deprivation Indices, Population Health and Geography: An Evaluation of the Spatial Effectiveness of Indices at Multiple Scales. Journal of Urban Health: Bulletin of the New York Academy of Medicine, 84(4), 591–603.
- Singh GK (2003). Area deprivation and widening inequalities in US mortality, 1969–1998. American Journal of Public Health, 93(7), 1137–1143.
- Stafford M, Duke-Williams O, & Shelton N (2008). Small area inequalities in health: Are we underestimating them? Social Science and Medicine, 67, 891–899.
- Wild CP (2012). The exposome: from concept to utility. International Journal of Epidemiology, 41(1), 24–32. 10.1093/ije/dyr236
- Wong D, & Amrhein C (1996). Research on the MAUP: Old Wine in a New Bottle or Real Breakthrough? Geographical Systems, 3, 73–76.
