Abstract
The modifiable areal unit problem, MAUP, is ever-present although not always appreciated. Through real examples, this article outlines the basic causes of MAUP, namely changes in the size, shape, and/or orientation of spatial categories/polygons used to map areal data. The visual effects of changes to mapped data are obvious; less obvious, though profound, are the impacts on our understanding of the world. The article concludes with a discussion of technical and broader strategic approaches for confronting the effects of MAUP on our treatment and interpretation of areal data.
Keywords: Aggregated, Areal, Ecological data, Ecological fallacy, Modifiable areal unit problem, Small numbers problem, Spatial autocorrelation or dependence
Glossary
- Areal/aggregated/ecological data
Data gathered together into spatial units, typically rendered in choropleth maps.
- Ecological fallacy
Mistaken assumptions and inferences about one spatial scale of analysis based on another, whether from larger to smaller scales or vice versa.
- Small numbers problem
The statistical instability or uncertainty that results from input of a small number of observations whose variability may be subject to the influence of outliers.
- Spatial autocorrelation/dependence
The correlation of observations made in geographical space where their characteristics are explained in part by their geographical proximity and not other explanatory factors.
Necessary Grit: MAUP Defined
Geographers, it can be said, can often be found “MAUP-ing” in despair. One of our profession's most recognizable data and map types, the choropleth, always comes with some uncertainty. The P in MAUP is a problem in the academic and ordinary senses. As outlined in more detail below, MAUP refers to the cartographic representation of data whose attributes are significantly influenced by the spatial scale used. If we change the scale, we alter what we see. There is no easy solution to the scientific problem, nor are its impacts appreciated more generally. Economists, for example, grapple with a similar problem: Simpson's paradox arises when splitting or merging the categories of a data set changes the distributions within them and thus the associations between variables. Dividing or collapsing classes can generate very different means, ranges, and thereby correlations.
Fundamentally, MAUP causes similar consternation for geographers whose spatial units (i.e., polygons) in choropleth maps constitute the data classes in question. We may define MAUP as the problem that occurs with aggregation of spatial data by modifying the size, shape (or zone), and/or orientation of spatial categories/polygons in geographical data. Such alteration of spatial categories can theoretically regroup observations among polygons into infinitely new and different arrangements and thereby recast the data.
Imagine a simple map with, say, 1000 observation points. These observations might be measured environmental variables such as groundwater quality or the points identifying households and their reported annual income. If we overlaid that mapped region with a set of 36 polygons or areal units, such as a raster grid typical in remote sensing, we may group our 1000 observations into those 36 areal units. In fact we often confront data situations like this in which we do not have access to the original individual-level observations, only the areal data of our choropleth map. This is where the presence and effect of MAUP are most pervasive. Thus, rather than 1000 observations, we are left with 36 observations, as each of those areal units acts as a container to group our original data. Aggregated or areal data like this are common, and for good reasons, such as when physical geography renders natural boundaries. City neighborhoods are often drawn by taking into account physiographic and transportation features (e.g., a neighborhood naturally bounded by a river and major roadway). Aggregation also helps to mask individual identity and maintain anonymity in social surveys like censuses.
The concern is that, whether the units form a regular grid or otherwise, a change in polygon size, shape, or orientation will reassign individual observations to new groups. Some previously grouped observations will become separated and other previously separate points will be aggregated together. This process in turn generates new group characteristics such as average household income among the new and different neighborhoods. Because of MAUP, this regrouping can occur simply because our grid of 36 areal units is pivoted a few degrees this way or that. Note that the number of spatial units here remains the same and yet the grid impacts the distribution of the underlying data. Alternatively, the polygons might be deemed too large or small. A change in the size of units across the mapped region will of course change their number. The remote sensing scientist will recognize this effect in hierarchically ordered raster data units. If our spatial units are not the regular grid of raster data but the vector/line data typical of census boundaries, spatial units might be resized to account for population growth or decline. Or, different again, areal units might be reshaped (rezoned in the formal parlance of MAUP studies) to conform to natural boundaries or a new criterion or constraint in geographic space. Indeed, we may alter all three characteristics of areal units (orientation, size, and shape, the last termed the zone effect) and probably dramatically regroup the data measured at our observation points.
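The pivoting effect can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1000 points, their incomes, and the helper names (`assign_cells`, `cell_means`, `share_still_together`) are all hypothetical, and the points are drawn inside a circle so that a pivoted 6 × 6 grid still covers every one of them with the same 36 cells.

```python
import math
import random
from collections import defaultdict
from itertools import combinations

def assign_cells(points, n_side, angle_deg):
    """Label each (x, y) in the unit square with its cell in a grid pivoted about the centre."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    labels = []
    for x, y in points:
        dx, dy = x - 0.5, y - 0.5
        # Express the point in the pivoted grid's own coordinates.
        u = cos_t * dx + sin_t * dy + 0.5
        v = -sin_t * dx + cos_t * dy + 0.5
        col = min(n_side - 1, max(0, math.floor(u * n_side)))
        row = min(n_side - 1, max(0, math.floor(v * n_side)))
        labels.append((row, col))
    return labels

def cell_means(labels, values):
    """Average the values of all points that share a cell label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for label, value in zip(labels, values):
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}

def share_still_together(labels_a, labels_b):
    """Fraction of point pairs grouped together under A that remain together under B."""
    together_a = together_both = 0
    for i, j in combinations(range(len(labels_a)), 2):
        if labels_a[i] == labels_a[j]:
            together_a += 1
            together_both += labels_b[i] == labels_b[j]
    return together_both / together_a

rng = random.Random(42)
points = []
while len(points) < 1000:  # hypothetical households inside the inscribed circle
    x, y = rng.random(), rng.random()
    if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
        points.append((x, y))
# Hypothetical incomes with a west-to-east gradient plus noise.
incomes = [40000 + 30000 * x + rng.gauss(0, 8000) for x, _ in points]

labels_0 = assign_cells(points, 6, 0.0)
labels_10 = assign_cells(points, 6, 10.0)
means_0 = cell_means(labels_0, incomes)
means_10 = cell_means(labels_10, incomes)
kept = share_still_together(labels_0, labels_10)

print(f"pairs still grouped together after a 10-degree pivot: {kept:.0%}")
print(f"cell-mean income, 0 degrees:  {min(means_0.values()):.0f} to {max(means_0.values()):.0f}")
print(f"cell-mean income, 10 degrees: {min(means_10.values()):.0f} to {max(means_10.values()):.0f}")
```

The individual observations never change between the two runs; only the containers do, which is the sense in which the areal units are "modifiable."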
By now the reader should appreciate how fundamentally MAUP influences our understanding of the world. A lone variable measured across geographic space can be recast in any number of ways. But the problem only grows from there: for if altering the size, shape, and orientation of a system of areal units can substantially change a single data set, how does potentially infinite reaggregation impact the relationship we see among multiple variables? It is important to appreciate this point, which goes beyond seeing new averages and ranges of neighborhood incomes across a given set of areal units. It goes to how changeable data in turn associate with other changeable data: not one moving target but two. Following our examples above, we might find that low-income households tend to live in areas with poor groundwater quality, a finding that would scarcely surprise the social scientist. But is it possible that MAUP has generated an association that is not borne out by lived experience? Perhaps poor groundwater quality only looks to be concentrated at the lower rungs of the socioeconomic ladder because of how the areal units are drawn. Or what if MAUP has instead masked an association that is even worse than depicted? We consider these questions below when confronting solutions to MAUP.
The examples below are worked from actual census data to illustrate the mechanics and dynamics of MAUP. In Example 1, the percentage share of Chinese- and Indian-origin populations of Vancouver, Canada, is given for 16 contiguous census tracts (loosely, neighborhoods of about 5000 residents each) within the city. The percentage shares produce a strong negative correlation of r = −0.81. This degree of (dis)association says these two ethnic groups live quite apart. Notwithstanding the dynamics of immigration, settlement, and adjustment (and that this is one small segment of Vancouver in 2001), one might interpret this strong negative correlation to mean that Chinese- and Indian-origin residents of Vancouver prefer different neighborhoods and perhaps also prefer to live apart. The analysis could be taken further by deploying the calculation of an index of dissimilarity or segregation. Rearranging the neighborhoods into a new set of areal units illustrates the effects of MAUP. As shown in the example, this is done by merging the original set of 16 neighborhoods into a new set of 4 (a merger process that is by no means simple as we outline further below). If we permit ourselves to look past the so-called small numbers problem of only four observations, we find this reaggregation generates a strong positive correlation of r = 0.80. One might say that Vancouver looks rather more harmonious than it did with our original 16 census tracts. The frustration of MAUP is that it alerts us to the possibility that, somewhat paradoxically, both iterations are borne by the evidence and yet neither is conclusive.
Example 1a and 1b. Number and percentage share of Chinese- and Indian-origin populations in Vancouver, 2001, using alternative grid (16 versus 4 census tract areal units) arrangements. Corresponding scatter plots and correlations illustrate the impact of MAUP.
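The sign reversal in Example 1 can be reproduced with a small constructed data set. The figures below are invented for illustration and are not the Vancouver census values; the sketch also assumes equal tract populations, so that a merged unit's share is the simple mean of its members (real census tracts would need population weights). Within each block of four tracts the two shares move in opposite directions, while the block averages rise together.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical percentage shares of two groups in 16 tracts, arranged as
# 4 blocks of 4 tracts. Each tuple is (block id, share of group A, share of group B).
block_offsets = [0, 2, 4, 6]      # both shares rise slightly from block to block
within_devs = [-12, -4, 4, 12]    # inside a block, A and B move in opposite directions
tracts = [(blk, 25 + off + dev, 25 + off - dev)
          for blk, off in enumerate(block_offsets)
          for dev in within_devs]

share_a = [a for _, a, _ in tracts]
share_b = [b for _, _, b in tracts]
r16 = pearson(share_a, share_b)   # about -0.88 across the 16 tracts

# Merge each block of four tracts into one unit (equal populations assumed).
merged_a, merged_b = [], []
for block in range(len(block_offsets)):
    members = [(a, b) for blk, a, b in tracts if blk == block]
    merged_a.append(sum(a for a, _ in members) / len(members))
    merged_b.append(sum(b for _, b in members) / len(members))
r4 = pearson(merged_a, merged_b)  # 1.00 across the 4 merged units

print(f"16 units: r = {r16:.2f}")
print(f" 4 units: r = {r4:.2f}")
```

The merger averages away the within-block contrast and leaves only the between-block trend, so the same numbers support opposite readings at the two scales.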
Example 2 uses the same census database but a slightly different and more workable set of 24 neighborhoods in the west end of Vancouver. In this case we are looking at what our students hope and expect is a positive relationship between years of education and household income, both measured as averages at the neighborhood scale. A correlation of r = 0.24 across all 24 census tracts/neighborhoods offers a glimmer of hope that students are on the right track by furthering their studies (even as the instructor duly notes the influence or ‘leverage,’ in statistical terms, of the highly educated observations to the far right of the x-axis). But rather more hope is offered by a reaggregated set of six neighborhoods, taking successive contiguous polygons in this tiny quarter of Vancouver for a correlation of r = 0.68. “Stay in school,” students might argue, if wantonly seeking affirmation for choices made; a mistake made worse if one is aware of the vagaries of MAUP.
Example 2. The relationship between income and education using Vancouver census tract data, 2001. Recombining the data from the original set of 24 tracts to 6 tracts again illustrates the impact of MAUP.
Coping With “MAUP-ing”
Being able to manage MAUP is critical to achieving an appropriately nuanced, if not definitive, understanding. MAUP guards against absolute certainty, but that should not devalue geographical insights, even if (as we shall see) the impacts reach beyond the data portrayed in choropleth maps.
One question that arises in consideration of MAUP is: does size matter? In searching for certainty in the face of MAUP, many jump to the conclusion that size of spatial unit matters. It is tempting but incorrect to assume that smaller spatial units are closer to the truth because they deliver a smaller number of included observations and perhaps also because they get us closer to the individual unit of observation. Provided our individual observations are themselves diverse, smaller units tend to be more variable. Conversely, a set of larger spatial units will tend to bring summary characteristics closer to the regional average (i.e., smoothing in statistical terms). While this scale-related effect on correlations is typical, it still does not dictate which scale is “better” or “more correct.” In our examples of Vancouver, area size and shape were altered to generate new effects; aggregation up or down can generate different but equally valid results. After all, the spatial units used may be meaningful. A watershed, one may rightly say, is generative; its boundaries not random or reasonably changeable. The robust literature on multilevel modeling (hierarchical linear models, HLMs), for example, relies on the use of data at alternative scales to search and account for the influence of geographical context (termed neighborhood effects in spatial analysis) on outcomes of interest. Thus the first means of coping with MAUP is to understand that a sensible rationale for the given areal units used (that there is a long history of their use in successive censuses [even if only loosely supported at the start], that an area matters in some important way, that boundaries are constitutive of lived experience or physical processes within and without, etc.) is far more important than throwing away insights on pure statistical technicality.
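The smoothing effect of larger units can be seen in a minimal sketch. It assumes 1024 hypothetical unit-level scores that are statistically independent of one another and merges them along a one-dimensional strip, a stand-in for merging adjacent polygons; with spatially dependent data the spread would typically shrink more slowly.

```python
import random
import statistics

def aggregate(values, block):
    """Average consecutive runs of `block` values, i.e. merge adjacent units."""
    return [statistics.fmean(values[i:i + block])
            for i in range(0, len(values), block)]

rng = random.Random(7)
# Hypothetical scores for 1024 small units (independent draws, an assumption).
values = [rng.gauss(50, 15) for _ in range(1024)]

# Spread of the unit summaries as the units grow from 1 to 4 to 16 members.
spread = {block: statistics.pstdev(aggregate(values, block))
          for block in (1, 4, 16)}

for block, sd in spread.items():
    print(f"units of {block:>2} members: standard deviation of unit means = {sd:.2f}")
```

The falling standard deviation shows summaries converging on the regional average as units grow. It says nothing about which scale is the right one for a given question.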
A second means of coping with MAUP is guided by this question: in addition to the substantive rationale above, are there technical approaches to reducing uncertainty? The simple answer is yes. The details are rather more involved and as much craft as science. As in the Vancouver examples, use of alternative scales and simulations of areal units by modifying their three fundamental characteristics (size, shape, and orientation) may be invoked to examine the stability of our data. Perhaps more importantly, simulations may reveal the (in)stability of the correlations we see. This kind of approach is at the core of the foundational work of Openshaw and Taylor, in which they explore the impacts of MAUP and rationales for aggregation schemes based on repeated correlation analysis. One may, for instance, merge the most correlated proximate spatial units. This is a sound statistical rationale if one accepts that it is based on the input of MAUP-dependent data and may ignore other rationales on the ground in lived reality.
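A simulation in the spirit of repeated correlation analysis might look like the following. It is a simplified sketch, not Openshaw and Taylor's procedure: 24 hypothetical units are laid out along a chain, each simulated zoning cuts the chain into six contiguous zones at random, and the correlation of zone means is recorded every time. The education and income values and the helper names are assumptions made for illustration.

```python
import math
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def random_zoning(n_units, n_zones, rng):
    """Cut a chain of units at random into n_zones contiguous, non-empty zones."""
    cuts = sorted(rng.sample(range(1, n_units), n_zones - 1))
    bounds = [0] + cuts + [n_units]
    return [range(bounds[k], bounds[k + 1]) for k in range(n_zones)]

def zone_means(values, zones):
    """Mean of the unit values inside each zone (equal unit weights assumed)."""
    return [statistics.fmean([values[i] for i in zone]) for zone in zones]

rng = random.Random(2001)
# Hypothetical unit-level averages: years of education and household income (thousands).
education = [rng.gauss(14, 2) for _ in range(24)]
income = [30 + 2 * e + rng.gauss(0, 12) for e in education]

r_units = pearson(education, income)

sims = []
for _ in range(500):
    zones = random_zoning(24, 6, rng)
    sims.append(pearson(zone_means(education, zones), zone_means(income, zones)))

print(f"24 original units: r = {r_units:.2f}")
print(f"500 random 6-zone schemes: min r = {min(sims):.2f}, "
      f"median r = {statistics.median(sims):.2f}, max r = {max(sims):.2f}")
```

A wide gap between the minimum and maximum indicates that a correlation reported for any single zoning scheme should be read with caution.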
Another approach that may be used in concert with the above or entirely separately is resampling in the spirit of bootstrapping. By successively removing one or several spatial units and rerunning our analyses (strictly speaking, a jackknife or leave-one-out procedure), we may gain a window on any undue influence of any particular area and whether the global picture drifts unacceptably with each new look at the data. This kind of approach can also instil the necessary caution to interpret results sought at the target scale of analysis. A goal worth noting here is that, from a scientific point of view, one of MAUP's uncertainties we wish to reduce is that of the spatial dependence of our observations across geographic space. Recognizing spatial dependence as the correlation of observations across geographic space (i.e., spatial autocorrelation, in this case of data in our areal units) can be helpful to identify clustered or random spatial distributions. But spatial dependence can also reduce the explanatory power of putative factors in statistical modeling. The impacts here range across data types and spatial-analytic models and are perhaps the most insidious impact of MAUP on scientific inquiry within geography. The above approaches and others, such as geographically weighted regression, are attempts fundamentally to confront spatial dependence in our data. The broader point here is that one ought to couple the rationale behind a targeted scale of analysis with an equally robust technical strategy for its use and justification.
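Two of the diagnostics mentioned in this paragraph can be sketched briefly. The first reruns a correlation with each areal unit dropped in turn, which is usually called a leave-one-out or jackknife check. The second computes Moran's I, a standard measure of spatial autocorrelation, here assuming a regular grid in which cells sharing an edge are neighbors (rook contiguity) with equal weights. All data values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def leave_one_out(xs, ys):
    """Correlation recomputed with each unit removed in turn."""
    return [pearson(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
            for i in range(len(xs))]

def morans_i(grid):
    """Moran's I for a rectangular grid with rook-contiguity weights of 1."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[value - mean for value in row] for row in grid]
    cross, weight_total = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    weight_total += 1
    total_sq = sum(d * d for row in dev for d in row)
    return (n / weight_total) * (cross / total_sq)

# Hypothetical zone averages; the last zone is an influential outlier.
education = [10, 11, 12, 13, 14, 30]
income = [52, 48, 51, 47, 50, 80]
r_all = pearson(education, income)
loo = leave_one_out(education, income)
print(f"all six zones: r = {r_all:.2f}")
print("one zone dropped at a time:", [round(r, 2) for r in loo])

# Two hypothetical 4 x 4 surfaces: a smooth west-to-east gradient and a checkerboard.
gradient = [[c for c in range(4)] for _ in range(4)]
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
i_gradient = morans_i(gradient)
i_checker = morans_i(checker)
print(f"Moran's I, gradient: {i_gradient:.2f}; checkerboard: {i_checker:.2f}")
```

In the first diagnostic the strong positive correlation depends on a single zone and turns negative once that zone is dropped. In the second, the gradient yields positive autocorrelation while the checkerboard yields the strongest possible negative value under these weights.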
We conclude this article with one last point on coping with MAUP, a point some might read as heresy if they view MAUP as a fundamentally technical issue. The above strategies are technical, of course, but what we are after is understanding. Thus coping with MAUP is ultimately about interpretive skills. Let us deal with this in two parts. First, we started the article by acknowledging that MAUP causes not only consternation among professional geographers but also misunderstanding for the lay public. Awareness is key, and we see this most clearly perhaps in the old and established literature on social areas in cities. The notion of “ecological data” (meaning the same as areal or aggregated or polygon data in choropleth maps) grew out of the Chicago School of Urban Ecology (from the 1920s), which was concerned with population composition in urban neighborhoods. The Chicago School's social area analysis, including of residential segregation, for example, came with growing awareness of the core problems identified in this article. One outcome of this tradition, perhaps the most emblematic in the human geographer's understanding of MAUP, is the concept of ecological fallacy: the mistaken assumption of social composition at one spatial scale of analysis based on another scale. Thus a subset of individuals or households in a neighborhood may not resemble the neighborhood as a whole. Conversely, knowing that a province or state tends to vote a particular way does not predict how its constituent cities vote, let alone their respective neighborhoods and the households within them. Use of aggregated data ought to be ever sensitive to the given scale of analysis and express interpretations at that scale.
Second, if the choropleth map is indeed the most recognizable cartographic tool in the geographer's kit, it is because the spatial units reflect a fundamentally sound rationale or a sense of place or are generative in some meaningful way. What we often see in social reality depicted in the choropleth map in fact reaches well beyond the impacts of MAUP on areal data. If, for example, average household income correlates with neighborhood air pollution, social programming (public education, remediation, abatement, etc.) could be channelled there. Those aware of MAUP might wonder whether resources are well spent. If MAUP is lurking too ominously, we may have instead perversely exacerbated the double jeopardy of high air pollution among the poorest households and neighborhoods that are somehow hidden from view. Having adopted analysis at a given spatial scale, remediation and abatement programs might be built upon air pollution monitoring where it is believed to do the most good. Yet the health system may continue to treat citizens where MAUP and our map of ideal interventions suggested the greatest chances of success. The underlying point here is that, although MAUP is a statistical and cartographic problem, it has real-world influences.
Biography
Michael Buzzelli, BA (Hons), MA, PhD, MEd, is Associate Professor at the University of Western Ontario. After completing graduate work at McMaster University, Michael held academic appointments at UBC and Queen's and has been a visiting scholar at the University of Glasgow and the University of Bologna. He has led several national and international research projects on a range of issues as well as applied graduate policy training and consulting work. His recent work focuses on higher education system policy and planning as well as research on teaching and learning in higher education.
The severe acute respiratory syndrome (SARS) epidemic of 2003 made the world acutely aware of the modifiable areal unit problem, MAUP. Public discourse did not formally engage cartographic science, of course, but the public learned that a few isolated cases of SARS in Toronto and Vancouver led the World Health Organization to issue a travel advisory against visiting the entire country of Canada. To the disbelief of political leaders and other observers in that country, the advisory came with a map identifying the globe's second largest country (by land mass) in bright red. Even as gateway cities, how could two points locating Toronto and Vancouver on a map of this scale lead to a “do not visit” advisory?
In another example of public consciousness-raising of MAUP, the CNN news network in the United States began using digital map data in its coverage of the 2004 US federal election. In the buildup and election night coverage of President George W. Bush's victory, Americans saw multiscale analysis of polling stations, districts, and states. This was real-time geographical analysis: fueled by the controversy of Florida's ballot counting machines, Americans were reacquainted with gerrymandering and other elements of MAUP.
But MAUP is not always controversial, nor as provocative as professional geographers would like it to be. Perhaps because these examples occurred when digital living was in ascendance, public discourse reflected and became more informed about the influence of MAUP on the power of place. But while MAUP is necessary grit in the geographical curriculum, thanks to the foundational work of scholars such as Openshaw and Taylor, its impacts on our understanding, both scientific and lay, are often subtle and insidious. This article explores the scope and nature of MAUP and discusses how we may smooth its harder edges of data uncertainty.
Further Reading
- Fotheringham A.S., Wong D.W.S. The modifiable areal unit problem in multivariate statistical analysis. Environ. Plan. A. 1991;23:1025–1044.
- Jones T.P., McEvoy D. Race and space in cloud-cuckoo land. Area. 1978;10(3):162–166.
- King G. A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton University Press; Princeton, NJ: 1997.
- Monmonier M. How to Lie with Maps. third ed. University of Chicago Press; Chicago: 1996.
- Openshaw S., Taylor P.J. A million or so correlation coefficients: three experiments on the modifiable areal unit problem. In: Wrigley N., editor. Statistical Applications in the Spatial Sciences. Pion; London: 1979. pp. 127–144.
- Tate N.J., Atkinson P.M., editors. Modelling Scale in Geographical Information Sciences. Wiley and Sons; London: 2001.
- Wong D.W. Aggregation effects in geo-referenced data. In: Arlinghaus S.L., Griffith D.A., editors. Practical Handbook of Spatial Statistics. CRC Press; Boca Raton, FL: 1995.
- Wong D.W. Ecological fallacy. In: Warf B., editor. Encyclopedia of Human Geography. Sage Publications; Thousand Oaks, CA: 2006. pp. 117–118.
- Wong D.W. Modifiable areal unit problem. In: Kitchin R., Thrift N., editors. International Encyclopedia of Human Geography. Elsevier Science; New York: 2009. pp. 169–174.