Author manuscript; available in PMC: 2012 Mar 1.
Published in final edited form as: Spat Spatiotemporal Epidemiol. 2011 Mar 1;2(1):35–47. doi: 10.1016/j.sste.2010.09.009

Spatial Accessibility and Availability Measures and Statistical Properties in the Food Environment

E Van Meter a, AB Lawson a, N Colabianchi b, M Nichols b, J Hibbert b, D Porter c, AD Liese b
PMCID: PMC3076953  NIHMSID: NIHMS241336  PMID: 21499528

Abstract

Spatial accessibility is of increasing interest in the health sciences. This paper addresses the statistical use of spatial accessibility and availability indices. These measures are evaluated via an extensive simulation based on cluster models for local food outlet density. We derived Monte Carlo critical values for several statistical tests based on the indices. In particular we are interested in the ability to make inferential comparisons between different study areas where indices of accessibility and availability are to be calculated. We derive tests of mean difference as well as tests for differences in Moran's I for spatial correlation for each of the accessibility and availability indices. We also apply these new statistical tests to a data example based on two counties in South Carolina for various accessibility and availability measures calculated for food outlets, stores, and restaurants.

Keywords: accessibility, availability, indices, Cp, CI, test, Moran's I, simulation, clustering, Monte Carlo tests

Introduction

Spatial accessibility and availability indices are now frequently used in the analysis of various environments. Original work on these indices was carried out in the 1970s to examine traffic flows and commuter trips for urban planning (Wilson, 1971), but more recently the indices have been applied in nutrition and physical activity studies to assess access to food or exercise resources (Ball et al., 2009; Feng et al., 2010; Galvez et al., 2009; Macdonald et al., 2009; Smoyer-Tomic et al., 2008; Spence et al., 2009). The use of indices for comparative inference about different areas has increased; however, the statistical properties of such indices have never been fully evaluated. Analyses of these measures often resort to low-powered non-parametric tests, which do not exploit the special nature of the indices studied.

In this paper we examine a range of measures of both spatial availability and spatial accessibility. Commonly used availability measures in epidemiologic studies of the food environment include the number of food outlets, stores, or restaurants in a given location or within a fixed-distance 'buffer' of a location. Commonly used accessibility measures are distance-based, on the assumption that increased distance acts as a deterrent and reduces how often the resource is used. We explore various statistical properties of these measures, including the correlation between indices. After an extensive simulation study, we derived Monte Carlo critical values for use in statistical analyses. The resulting tests identify differences in the accessibility and availability attributes of different study areas: they can test for a difference in the average value of a measure as well as in its spatial correlation (Moran's I).

Background to Measures

In our study we evaluated a range of measures, chosen from those commonly found in the literature on the food environment and the accessibility of food resources (Edmonds et al., 2001; Guy, 1983; Inagami et al., 2006; Jeffery et al., 2006; Morland et al., 2002a; Morland et al., 2006; Morland et al., 2002b; Sturm and Datar, 2005). Each measure is calculated at multiple spatial locations within a study area. We denote an individual location by s, which represents the Cartesian coordinates of the location. A study area contains many such s location points, defined either by population-specific locations or by a uniformly distributed grid over the entire study area.

1. Availability Measures (CI and √CI)

The simplest availability measure that we examined is the cumulative index (CI): the count of outlets at a location (or within a pre-defined area around a location, such as a distance buffer, a Census tract, or a block group). For a spatial location s it is defined as $CI(s) = n(s)$; if we index the location as the ith site, then $CI_i = n_i$. This measure of availability is frequently used (Edmonds et al., 2001; Guy, 1983; Inagami et al., 2006; Jeffery et al., 2006; Morland et al., 2002a; Morland et al., 2006; Morland et al., 2002b; Sturm and Datar, 2005). Simple derivatives of this index include density measures, relative either to population (Cummins et al., 2005; Maddock, 2004; Reidpath et al., 2002; Sturm and Datar, 2005; Zenk et al., 2006) or to area (Block et al., 2004; Maddock, 2004). In addition, it is often useful to consider the variance-stabilized form of this count, i.e. $\sqrt{CI_i}$. This transformation is commonly applied to counts to regularize their variability, and could be helpful in an analysis that requires a linear transform of the count data (Rawlings et al., 1998).
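As an illustration of these definitions, the following is a minimal Python/NumPy sketch of $CI_i = n_i$ and $\sqrt{CI_i}$, assuming a circular Euclidean distance buffer around each location. The function name `cumulative_index`, the radius, and the toy coordinates are hypothetical and chosen only for illustration.

```python
import numpy as np

def cumulative_index(locations, outlets, radius):
    """CI_i = n_i: number of outlets within a Euclidean buffer of `radius` around location i."""
    locations = np.asarray(locations, dtype=float)
    outlets = np.asarray(outlets, dtype=float)
    # Pairwise distances d_ij, shape (n_locations, n_outlets)
    d = np.linalg.norm(locations[:, None, :] - outlets[None, :, :], axis=-1)
    return (d <= radius).sum(axis=1)

# Hypothetical toy data: two locations and three outlets in the unit square.
locations = np.array([[0.5, 0.5], [0.1, 0.1]])
outlets = np.array([[0.52, 0.5], [0.45, 0.55], [0.9, 0.9]])

ci = cumulative_index(locations, outlets, radius=0.1)
sqrt_ci = np.sqrt(ci)  # variance-stabilized form of the count
print(ci, sqrt_ci)
```

Here the first location has two outlets inside its buffer and the second has none, so CI = [2, 0]. Replacing the distance test with a tract or block-group membership test would give the area-based variant.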

2. Accessibility indices (Cp, distance to the nearest outlet)

Distance-based measures are often used to express the idea that potential access to resources diminishes with distance. The distance could be road distance or some other relevant metric (e.g. network or Euclidean distance). The Cumulative Opportunity index (Cp) is defined in general as $C_p(s) = \sum_{A} 1/d(s)$, where A is a predefined area within which distances are measured and d(s) is the distance from s to each outlet within A.

For an indexed location i it is defined as $C_{p_i} = \sum_{j \in A} 1/d_{ij}$ or, alternatively, as $C_{p_i} = \sum_{j=1}^{n_A} d_{ij}^{-1}$, where $n_A$ is the number of outlets within the area A.

Some special cases are:

  • Cp(total): $C_{p_i}^{T} = \sum_{j=1}^{n_A} d_{ij}^{-1}$, where A is the whole study region

  • Cp(buffer): $C_{p_i}^{b} = \sum_{j=1}^{n_A} d_{ij}^{-1}$, where A is a distance buffer around the ith location

  • Cp(nearest): $C_{p_i}^{n} = d_i^{-1}$, where $d_i$ is the distance to the nearest outlet

Another measure that is sometimes favored is simply the distance to the nearest outlet, i.e. $D_i = d_i$. Both Cp(nearest) and the distance to the nearest outlet ($D_i$) can be extended to a variety of closeness ('distance to') metrics: nearest, second nearest, third nearest, or the sum of distances to these. For example, we could specify the cumulative distance to the 3 nearest outlets, or calculate the cumulative opportunity index for the 2 closest outlets to a location.
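All of the Cp variants and nearest-outlet distances above can be computed from a single location-to-outlet distance matrix. The following minimal NumPy sketch assumes Euclidean distances and that no outlet coincides exactly with a location ($d_{ij} > 0$); the helper names are hypothetical.

```python
import numpy as np

def pairwise_distances(locations, outlets):
    """Euclidean distance d_ij from each location i to each outlet j."""
    locations = np.asarray(locations, dtype=float)
    outlets = np.asarray(outlets, dtype=float)
    return np.linalg.norm(locations[:, None, :] - outlets[None, :, :], axis=-1)

def cp_total(d):
    """Cp(total): sum of 1/d_ij over all outlets in the study region (assumes d_ij > 0)."""
    return (1.0 / d).sum(axis=1)

def cp_buffer(d, radius):
    """Cp(buffer): sum of 1/d_ij over outlets within `radius` of location i."""
    inside = d <= radius
    return np.where(inside, 1.0 / d, 0.0).sum(axis=1)

def cp_k_nearest(d, k):
    """Cp over the k nearest outlets; k = 1 gives Cp(nearest) = 1/d_i."""
    return (1.0 / np.sort(d, axis=1)[:, :k]).sum(axis=1)

def dist_k_nearest(d, k):
    """Cumulative distance to the k nearest outlets; k = 1 gives D_i = d_i."""
    return np.sort(d, axis=1)[:, :k].sum(axis=1)

# Hypothetical toy example: one location at the origin and three outlets.
d = pairwise_distances([[0.0, 0.0]], [[0.5, 0.0], [0.0, 0.25], [1.0, 0.0]])
print(cp_total(d), cp_buffer(d, 0.6), cp_k_nearest(d, 1), dist_k_nearest(d, 2))
```

With distances 0.5, 0.25, and 1.0, Cp(total) is 2 + 4 + 1 = 7, Cp with a 0.6 buffer is 6, Cp(nearest) is 4, and the cumulative distance to the 2 nearest outlets is 0.75. Sorting each row once is a simple way to support any k; for very large outlet sets a k-d tree query would be the usual alternative.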

Clearly, with Cp measures, the smaller the area A, the more local the measure. One disadvantage of Cp is that, for larger buffers, accessibility is aggregated over areas that may be distant from the location, and this aggregation can reduce spatial differentiation. Hence smaller distance buffers are likely to be more informative in any real study of food access.

Simulation Study Design

Our aim was to provide statistical criteria for inference about differences in accessibility and availability measures calculated in two spatial environments. To this end we conducted a simulation study that addresses the nature of the spatial variation of these measures. This study was motivated by, and is part of, a larger effort to characterize the built food environment in an eight-county region of South Carolina (Liese et al., 2009), so our choice of features in the simulation design is partly informed by real environmental features. Here we define 'outlet' to mean either a food retail store (convenience store, supermarket, gas station) or a restaurant (limited-service or full-service). Our simulation design is based on outlet density, but its results can be applied across a wide range of food retail enterprises.

As is common in the evaluation of distance-dependent spatial processes (Diggle, 2001), we first defined a unit square study area within which we compute the measures. This choice allows the evaluation to be carried out without distance scaling, so it is non-dimensional; the effects of distance scaling will be addressed at a later stage. A grid is placed over the unit square to divide the study area into equally sized grid cells, and uniformly distributed points are placed over the total study area to represent the s location points. Each s location point is assigned to the grid cell within which it lies, and is used to collect the availability and accessibility measures in each simulation; CI and √CI at each location point are given by the number of outlets in the grid cell containing that point. Figure 1 shows this basic grid setup: the 225 grid cells and the 400 predefined s location points on the unit square study area used for all simulations.
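The grid setup can be sketched as follows. We read the "20×20 uniformly distributed s location points" as a regular lattice placed at the centres of a 20×20 partition of the unit square; that reading, the row-major cell numbering, the placeholder uniform outlet pattern, and all names are our assumptions. CI at each s point is the count of outlets in the 15×15 grid cell containing that point.

```python
import numpy as np

N_CELLS = 15  # 15 x 15 = 225 grid cells
N_SIDE = 20   # 20 x 20 = 400 s location points

def location_points(n_side=N_SIDE):
    """Regular lattice of n_side**2 points at cell centres of the unit square (assumption)."""
    g = (np.arange(n_side) + 0.5) / n_side
    xx, yy = np.meshgrid(g, g)
    return np.column_stack([xx.ravel(), yy.ravel()])

def cell_index(points, n_cells=N_CELLS):
    """Row-major index of the grid cell that contains each point."""
    ij = np.minimum((np.asarray(points, dtype=float) * n_cells).astype(int), n_cells - 1)
    return ij[:, 1] * n_cells + ij[:, 0]

def ci_by_cell(s_points, outlets, n_cells=N_CELLS):
    """CI for each s point: number of outlets in the same grid cell."""
    counts = np.bincount(cell_index(outlets, n_cells), minlength=n_cells ** 2)
    return counts[cell_index(s_points, n_cells)]

s = location_points()
outlets = np.random.default_rng(1).uniform(size=(300, 2))  # placeholder outlet pattern
ci = ci_by_cell(s, outlets)
sqrt_ci = np.sqrt(ci)
print(s.shape, len(np.unique(cell_index(s))), ci.max())
```

Because the lattice spacing (0.05) is smaller than the cell width (1/15), every one of the 225 cells contains at least one s point under this reading.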

Figure 1. Unit square with 15×15 grid cells (225 total) and 20×20 (400 total) uniformly distributed s location points. This grid is the setting for all simulations conducted during this study.

Model Assumptions

The simulation design is based partly on characteristics of the local food environment and partly on more general considerations of applicability to a variety of food environment scenarios. To this end we examined food outlet densities in an eight-county urban and rural area of South Carolina (Liese et al., 2009). While no very large cities are represented in that study, it does capture the average outlet density and its variation between rural and urban areas. In initial simulations of overall food outlets (total stores and restaurants), we assumed an outlet density with mean 14.8 and standard deviation 13.5 per census tract. These summary values correspond to the South Carolina study, which identified 2219 food outlets in an eight-county area covering 150 census tracts.

To set up a simulation area, we assumed that the study area was divided into a fine grid of equally sized tracts, over which we uniformly distributed 400 s location points. Accessibility and availability measures were calculated at each of the 400 s location points with respect to the outlets in the tracts. The outlet densities in our study area (Liese et al., 2009) suggest overdispersion relative to a Poisson distribution, so we initially examined simulations in which outlet counts in small areas followed a negative binomial distribution. This proved too simplistic, however, and did not reflect the clustered nature of the outlet distribution. Outlets are often found in clustered arrangements in the food environment, so the simulation is more appropriate if spatial clustering is included in the design.
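The overdispersion claim can be checked from the per-tract summary given above (mean 14.8, standard deviation 13.5). A Poisson count has variance equal to its mean, so the variance-to-mean ratio measures the departure from Poisson. The method-of-moments negative binomial fit below is our illustration only, not necessarily how the initial negative binomial simulations were parameterized.

```python
def dispersion_summary(mean, sd):
    """Return the variance-to-mean ratio (1 for a Poisson count) and the
    method-of-moments negative binomial size r, using Var = m + m**2 / r."""
    var = sd ** 2
    ratio = var / mean
    size = mean ** 2 / (var - mean) if var > mean else float("inf")
    return ratio, size

# Per-tract outlet summary from the South Carolina study: mean 14.8, SD 13.5.
ratio, size = dispersion_summary(14.8, 13.5)
print(f"variance/mean = {ratio:.2f}, negative binomial size = {size:.2f}")
```

The variance (182.25) is about 12.3 times the mean, and the implied negative binomial size parameter is about 1.31, i.e. strongly overdispersed counts.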

To this end we designed cluster simulations in which a fixed number of cluster centers is assumed. Clustering around these centers is controlled by the parameter ϕ, which describes how tightly the outlets are clustered around the centers. To simulate outlets with this clustering process, we first simulated potential outlet locations s* from a uniform distribution. We then calculated $\lambda(s^*) = \sum_j h(|s^* - x_j|, \phi)$, where h is a clustering function with a Gaussian-like form, $h(|s - x_j|, \phi) = \frac{1}{2\pi\phi} \exp\left(-\frac{|s - x_j|^2}{2\phi}\right)$, and $|s - x_j|$ is the Euclidean distance between location s and cluster center $x_j$. We accepted s* as an outlet location when $R = \lambda(s^*)/\lambda_{\max} > U$, with $U \sim \mathrm{Uniform}(0,1)$. Here λ(s) is calculated in the same manner as λ(s*) for all predefined s location points on the grid, and $\lambda_{\max}$ is the maximum of λ(s) over those points.
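A minimal NumPy sketch of this rejection scheme follows. It assumes that the cluster centres are drawn once uniformly on the unit square, and that candidates are proposed in batches until the requested number of outlets has been accepted (the text fixes the total number of outlets per scenario but does not say how that total was reached). It reuses the 20×20 lattice reading of the s location points. Because $\lambda_{\max}$ is taken over the predefined s points only, a candidate with $\lambda(s^*) > \lambda_{\max}$ is always accepted, as the rule above implies. All function names are hypothetical.

```python
import numpy as np

def intensity(points, centers, phi):
    """lambda(s) = sum_j h(|s - x_j|, phi), h = exp(-|s - x_j|^2 / (2 phi)) / (2 pi phi)."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return (np.exp(-d2 / (2.0 * phi)) / (2.0 * np.pi * phi)).sum(axis=1)

def simulate_outlets(n_outlets, centers, phi, s_points, rng):
    """Accept a uniform candidate s* when lambda(s*) / lambda_max > U(0, 1)."""
    lam_max = intensity(s_points, centers, phi).max()  # maximum over predefined s points
    kept = []
    while len(kept) < n_outlets:
        cand = rng.uniform(size=(n_outlets, 2))          # batch of candidate s*
        ratio = intensity(cand, centers, phi) / lam_max  # R = lambda(s*) / lambda_max
        kept.extend(cand[ratio > rng.uniform(size=n_outlets)])
    return np.array(kept[:n_outlets])

rng = np.random.default_rng(2010)
g = (np.arange(20) + 0.5) / 20
s_points = np.column_stack([a.ravel() for a in np.meshgrid(g, g)])

# Urban: 15 centres, phi = 0.01, 2000 outlets; non-urban: 5 centres, phi = 0.005, 300 outlets.
urban = simulate_outlets(2000, rng.uniform(size=(15, 2)), 0.01, s_points, rng)
rural = simulate_outlets(300, rng.uniform(size=(5, 2)), 0.005, s_points, rng)
print(urban.shape, rural.shape)
```

The acceptance test is vectorized over each batch; any surplus accepted points in the final batch are discarded so that every scenario has exactly the stated number of outlets.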

Note that these forms are closely related to spatial cluster processes (Lawson, 2010). The cluster centers are fixed in the simulation, and outlets are simulated around the centers to mimic the aggregation of outlets. While in some real cases clusters of outlets clearly occur as linear features related to road systems, it is considerably more difficult to obtain generalizable simulation results from linear features. We believe that clustering modeled around centers can act as an adequate approximation to the aggregation found in practice. We use different parameters in the clustering process to distinguish between urban and non-urban areas. We assume there are generally more outlets in urban areas than in non-urban areas, and we expect more cluster centers in urban areas, with outlets less tightly bound around each center. The cluster centers could represent a large urban development or shopping area, but we would also expect some outlets to be located in the general urban area and not just around the large developments. In contrast, we expect fewer cluster centers in non-urban areas, representing "small" or "large" towns, with outlets more tightly clustered around these centers and very few outlets outside them. Therefore, we specify a smaller ϕ = 0.005 (tighter clustering) and fewer total outlets for the non-urban areas, compared with ϕ = 0.01 and more outlets for the urban simulations. Figure 2 displays examples of both an urban and a non-urban simulation of outlets using clustering.

Figure 2. Examples of simulated outlets using clustering. Left: a non-urban area with 300 outlets, 5 cluster centers, and clustering parameter ϕ = 0.005. Right: an urban area with 2000 outlets, 15 cluster centers, and clustering parameter ϕ = 0.01. Solid black dots represent cluster centers and open dots represent outlet locations.

Statistical Description and Hypothesis Testing

All the measures that we evaluate are available at any s location point on the grid within a spatial domain (study area). Hence the resulting measure is in fact a surface. At any single location on the selected grid we are making a measurement on what is a continuous surface. For counts of outlets this is of course an approximation but the use of the square root transform of the counts will help to normalize the surface (Cressie, 1993).

Geostatistical methods such as variogram estimation and Kriging are often used to characterize such surfaces. However, in the context of food environment studies, we felt it important to evaluate simple summary statistics that would be easy to use and more likely to gain wide acceptance in such studies. To this end we examined descriptive measures designed for inference about differences between study areas (i.e. comparative inference), and we used Monte Carlo (MC) testing (Diggle, 2001) to evaluate these comparisons. MC testing is often adopted where complex spatial distributions are present; it relies on simulation under the null hypothesis, which is usually feasible in such studies. Ultimately, we derived Monte Carlo critical values for the test statistics so that differences can be assessed by table look-up.

We considered simulation schemes that represented urban-only and non-urban-only areas. We examined a range of numbers of cluster centers and decided to focus on two settings that typify the different scenarios: urban areas were simulated with 15 cluster centers and a clustering parameter of 0.01, whereas non-urban areas were simulated with only 5 cluster centers and a tighter clustering parameter of 0.005. We then ran these scenarios for different total numbers of outlets in each area, specifically 100, 300, 600, and 2000. Each simulation was run twice, for 500 datasets each, using R version 2.10.1 (R, 2009), so that we had a replication of the null hypothesis with which to examine differences between areas. We then calculated comparison statistics between these two simulations to characterize the null distribution for comparing two areas. Seven spatial measures were considered in these simulations: the availability measures CI and √CI, and the accessibility measures CP total (over the entire area), CP to the nearest 1, 2, and 3 outlets, and the distance to the nearest outlet. We also initially considered distance-buffered CP measures; however, these have variable numbers of outlets and were found to be less reliable in comparisons. We then constructed comparison tests to make inferences between two different areas for these seven spatial measures. To do this, we computed the statistics and their distributions under the null hypothesis and obtained critical values from these Monte Carlo distributions. In the tests described below, CV denotes the critical value for a particular spatial measure. Critical values for differences between the average spatial measure in two study areas are listed in Table 1, and critical values for differences in Moran's I for spatial correlation are listed in Table 2.
We included Moran's I in this simulation study because the degree to which an accessibility or availability measure is spatially correlated and clustered over the study area is often a useful feature of that measure, and may be of interest to an investigator.
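The Monte Carlo step can be sketched as follows: simulate two areas independently from the same null model, compute the comparison statistic, repeat, and take empirical quantiles at the alpha levels used in Tables 1 and 2. In this sketch the null model `toy_null_measure` (square-root Poisson counts at 400 locations) is a hypothetical stand-in for a full clustered simulation, and the use of sample variances (ddof=1) in x* is our assumption; the statistic otherwise follows the definition of x* given with the tests below.

```python
import numpy as np

ALPHAS = [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.9, 0.925, 0.95, 0.975, 0.99, 0.995]

def mean_diff_stat(x_a, x_b):
    """x* = (mean(x_A) - mean(x_B)) / sqrt(var(x_A) + var(x_B))."""
    return (np.mean(x_a) - np.mean(x_b)) / np.sqrt(np.var(x_a, ddof=1) + np.var(x_b, ddof=1))

def mc_critical_values(simulate_measure, n_reps, rng, alphas=ALPHAS):
    """Empirical null quantiles of x* when both areas are simulated from the same model."""
    stats = np.array([mean_diff_stat(simulate_measure(rng), simulate_measure(rng))
                      for _ in range(n_reps)])
    return dict(zip(alphas, np.quantile(stats, alphas)))

def toy_null_measure(rng):
    """Hypothetical stand-in: sqrt-transformed counts at 400 locations."""
    return np.sqrt(rng.poisson(5.0, size=400))

cv = mc_critical_values(toy_null_measure, n_reps=500, rng=np.random.default_rng(0))
print({a: round(float(v), 4) for a, v in cv.items()})
```

With real simulations, `toy_null_measure` would be replaced by the clustered simulation followed by the measure of interest; running the same loop with I* in place of x* would give Table 2-style critical values.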

Table 1. Critical Values for various spatial measures looking at differences in averages between 2 spatial areas. This table corresponds to the calculated average test value, x*.

Alpha Level
0.005 0.01 0.025 0.05 0.075 0.1 0.9 0.925 0.95 0.975 0.99 0.995
CI -0.0637 -0.0567 -0.0444 -0.0353 -0.0295 -0.0257 0.0253 0.0291 0.0345 0.0448 0.0566 0.0634
√CI -0.0773 -0.0697 -0.0552 -0.0439 -0.0368 -0.0321 0.0330 0.0374 0.0434 0.0544 0.0638 0.0753
CP Total -0.0863 -0.0742 -0.0587 -0.0469 -0.0387 -0.0324 0.0318 0.0372 0.0460 0.0561 0.0734 0.0870
CP Nearest 1 -0.1057 -0.0953 -0.0817 -0.0683 -0.0616 -0.0559 0.0578 0.0629 0.0701 0.0817 0.0921 0.1002
CP Nearest 2 -0.0984 -0.0897 -0.0761 -0.0643 -0.0576 -0.0516 0.0529 0.0589 0.0655 0.0755 0.0867 0.0932
CP Nearest 3 -0.0948 -0.0852 -0.0717 -0.0617 -0.0546 -0.0486 0.0495 0.0556 0.0624 0.0710 0.0827 0.0895
Distance to Nearest Outlet -0.1616 -0.1441 -0.1203 -0.0968 -0.0838 -0.0742 0.0752 0.0843 0.0968 0.1155 0.1442 0.1621

Table 2. Critical Values for various spatial measures looking at differences in Moran's I between 2 spatial areas. This table corresponds to the calculated Moran's I test value, I*.

Alpha Level
0.005 0.01 0.025 0.05 0.075 0.1 0.9 0.925 0.95 0.975 0.99 0.995
CI -0.0108 -0.0098 -0.0077 -0.0061 -0.0051 -0.0044 0.0041 0.0048 0.0057 0.0074 0.0088 0.0103
√CI -0.0107 -0.0090 -0.0072 -0.0060 -0.0050 -0.0042 0.0041 0.0047 0.0056 0.0068 0.0083 0.0096
CP Total -0.0420 -0.0362 -0.0272 -0.0217 -0.0172 -0.0144 0.0133 0.0157 0.0194 0.0257 0.0335 0.0392
CP Nearest 1 -0.0075 -0.0071 -0.0057 -0.0047 -0.0041 -0.0036 0.0035 0.0040 0.0046 0.0056 0.0065 0.0074
CP Nearest 2 -0.0101 -0.0084 -0.0071 -0.0058 -0.0051 -0.0045 0.0043 0.0049 0.0056 0.0067 0.0081 0.0087
CP Nearest 3 -0.0114 -0.0096 -0.0079 -0.0065 -0.0058 -0.0050 0.0047 0.0055 0.0064 0.0075 0.0091 0.0104
Distance to Nearest Outlet -0.0223 -0.0189 -0.0153 -0.0121 -0.0102 -0.0089 0.0089 0.0103 0.0123 0.0155 0.0196 0.0227
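  1. To test H0: $\bar{x}_A = \bar{x}_B$ versus H1: $\bar{x}_A \neq \bar{x}_B$ for mean differences in spatial measure x between areas A and B (critical values are listed in Table 1):

    $x^* = \dfrac{\bar{x}_A - \bar{x}_B}{\sqrt{\operatorname{var}(x_A) + \operatorname{var}(x_B)}}$; reject H0 if $x^* < CV(\alpha/2)$ or $x^* > CV(1 - \alpha/2)$.

  2. To test H0: $I_A = I_B$ versus H1: $I_A \neq I_B$ for differences in Moran's I for spatial autocorrelation between areas A and B (critical values are listed in Table 2):

    $I^* = I_A - I_B$; reject H0 if $I^* < CV(\alpha/2)$ or $I^* > CV(1 - \alpha/2)$,

    where Moran's I for spatial autocorrelation is defined as

    $I = \dfrac{N}{\sum_i \sum_j w_{ij}} \cdot \dfrac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$

    for $w_{ij} = \exp(-d_{ij})$, where $d_{ij}$ is the Euclidean distance from location i to location j.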
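A minimal NumPy sketch of the two test statistics and the two-sided decision rule follows. The text does not say whether the diagonal weights $w_{ii}$ are included; setting them to zero, as is conventional for Moran's I, is our assumption, as is the use of sample variances in x*. The two areas here are hypothetical square-root Poisson measures on a 20×20 lattice, and the critical values are the α = 0.05 entries for √CI from Tables 1 and 2.

```python
import numpy as np

def mean_diff_stat(x_a, x_b):
    """x* = (mean(x_A) - mean(x_B)) / sqrt(var(x_A) + var(x_B)), sample variances."""
    return (np.mean(x_a) - np.mean(x_b)) / np.sqrt(np.var(x_a, ddof=1) + np.var(x_b, ddof=1))

def morans_i(x, coords):
    """Moran's I with w_ij = exp(-d_ij) and w_ii = 0 (assumption)."""
    x = np.asarray(x, dtype=float)
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.exp(-d)
    np.fill_diagonal(w, 0.0)
    z = x - x.mean()
    return len(x) / w.sum() * (z @ w @ z) / (z @ z)

def moran_diff_stat(x_a, coords_a, x_b, coords_b):
    """I* = I_A - I_B."""
    return morans_i(x_a, coords_a) - morans_i(x_b, coords_b)

def reject_two_sided(stat, cv_lower, cv_upper):
    """Reject H0 if stat < CV(alpha/2) or stat > CV(1 - alpha/2)."""
    return bool(stat < cv_lower or stat > cv_upper)

rng = np.random.default_rng(0)
g = (np.arange(20) + 0.5) / 20
coords = np.column_stack([a.ravel() for a in np.meshgrid(g, g)])  # 400 locations
x_a = np.sqrt(rng.poisson(5.0, size=400))  # hypothetical measure in area A
x_b = np.sqrt(rng.poisson(5.0, size=400))  # hypothetical measure in area B
x_star = mean_diff_stat(x_a, x_b)
i_star = moran_diff_stat(x_a, coords, x_b, coords)
# Two-sided alpha = 0.05 critical values for sqrt(CI): Table 1 (x*) and Table 2 (I*).
print(x_star, reject_two_sided(x_star, -0.0552, 0.0544))
print(i_star, reject_two_sided(i_star, -0.0072, 0.0068))
```

Both statistics are unchanged by a linear rescaling of the measure (x* up to sign for negative slopes), which is one way to see why they can be compared across study areas measured in different units.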

Although the simulations were conducted on a unit square grid, these two comparison tests are scale-invariant and do not depend on the scale of each test area; they can therefore be applied to two areas of unequal size. To assess the reliability of these tests, we constructed histograms, quantile plots, and Kolmogorov-Smirnov tests for equality of distributions, to check whether the comparison statistics have similar null distributions under a variety of conditions. Specifically, we wanted to assess whether these comparison statistics depended on the spatial environment (urban/non-urban) and on the total number of outlets. Figure 3 displays example histograms of the test statistics for comparing 2 study areas under the null hypothesis, for CP total in an urban scenario with 2000 outlets and in a non-urban scenario with 300 outlets. The histograms look very similar regardless of the simulation setup, and each test statistic appears to be approximately normally distributed with mean zero. Figure 4 displays quantile plots of both test statistics for CP total, comparing the urban and non-urban scenarios. We found that the null distributions of both x* and I* were consistent regardless of the simulation criteria. We therefore merged the simulation results for all scenarios considered to obtain a large sample of over 4000 datasets.
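The Kolmogorov-Smirnov comparison of null distributions across scenarios can be sketched with NumPy alone; in practice `scipy.stats.ks_2samp` performs the same two-sample test with exact or asymptotic p-values. This sketch uses only the large-sample 5% critical value $1.358\sqrt{(n+m)/(nm)}$, and the two input samples (standing in for null draws of x* from an urban and a non-urban scenario) are hypothetical.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_a(t) - F_b(t)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    t = np.concatenate([a, b])  # the maximum is attained at one of the data points
    f_a = np.searchsorted(a, t, side="right") / a.size
    f_b = np.searchsorted(b, t, side="right") / b.size
    return float(np.abs(f_a - f_b).max())

def ks_differs_5pct(a, b):
    """Large-sample rule: distributions differ at the 5% level if D > 1.358 * sqrt((n + m) / (n m))."""
    n, m = len(a), len(b)
    return ks_statistic(a, b) > 1.358 * np.sqrt((n + m) / (n * m))

rng = np.random.default_rng(3)
null_urban = rng.normal(0.0, 0.03, size=500)  # hypothetical null draws of x*
null_rural = rng.normal(0.0, 0.03, size=500)
print(ks_statistic(null_urban, null_rural), ks_differs_5pct(null_urban, null_rural))
```

Failing to reject for every pair of scenarios is what would justify pooling the null samples into a single set of critical values.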

Figure 3. Histograms of each test statistic under the null distribution for two simulation scenarios. Clockwise from top left: the null distribution of the test statistic comparing averages for CP total in an urban simulation with 2000 outlets; the null distribution of the test statistic comparing Moran's I for CP total in an urban simulation with 2000 outlets; the null distribution of the test statistic comparing averages for CP total in a non-urban simulation with 300 outlets; and the null distribution of the test statistic comparing Moran's I for CP total in a non-urban simulation with 300 outlets.

Figure 4. Quantile plots of each test statistic for an urban simulation with 2000 outlets versus a non-urban simulation with 300 outlets. Left: the statistic for differences in average CP total between areas. Right: the statistic for differences in Moran's I between areas.

Simulation-based results: Hypothesis Testing

Using these combined simulations, we identified Monte Carlo critical values to test for significant differences between two areas in the spatial measures and in their Moran's I spatial autocorrelation. Table 1 displays the Monte Carlo critical values at various alpha levels for the comparison tests between average spatial measures described above. Similarly, Table 2 displays the critical values for comparison tests between Moran's I for each of the seven spatial measures. These tests are not only scale-invariant but, under the null hypothesis, also do not depend on the number of outlets in each area or on the pattern in which the outlets are clustered. In Table 1, for example, a value of the mean difference statistic for CI greater than 0.0448 (or less than -0.0444) would be evidence for rejecting the null hypothesis in a two-sided test at the 0.05 level.

Simulation-based results: Correlation

Another statistical property assessed in this simulation study was the correlation between the various accessibility/availability spatial measures. These correlations are not meant for inference; they are presented simply to describe some of the patterns we find between these measures. They are also not consistent across simulation scenarios (urban versus non-urban, and different total numbers of outlets). For a given simulation scheme, correlations were calculated for each dataset (500 in total) over all s location points. Tables 3 and 4 display the median correlations over the 500 datasets for two particular scenarios: Table 3 shows a non-urban simulation with 300 total outlets, and Table 4 an urban simulation with 2000 total outlets. The median correlations for the urban and non-urban simulations are similar in many respects. As expected, CP to the nearest outlet, CP to the nearest 2 outlets, and CP to the nearest 3 outlets are highly correlated. Interestingly, the correlations between the availability measure CI and the accessibility measures CP to the nearest 1, 2, and 3 outlets are smaller in the urban simulation than in the non-urban simulation.
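The median correlation summaries can be sketched as follows: compute a Pearson correlation matrix across the s location points for each simulated dataset, then take the elementwise median across datasets. The input layout (one array per dataset, rows = locations, columns = measures) and the toy measures are our assumptions; a measure that is constant within a dataset would yield NaN correlations and would need separate handling.

```python
import numpy as np

def median_correlation(datasets):
    """Elementwise median of per-dataset Pearson correlation matrices.

    Each dataset is an (n_locations, n_measures) array; columns are measures."""
    mats = [np.corrcoef(np.asarray(x, dtype=float), rowvar=False) for x in datasets]
    return np.median(np.stack(mats), axis=0)

rng = np.random.default_rng(4)
datasets = []
for _ in range(5):  # stand-in for the 500 simulated datasets
    ci = rng.poisson(5.0, size=400).astype(float)  # hypothetical CI values
    datasets.append(np.column_stack([ci, np.sqrt(ci), -ci]))
print(np.round(median_correlation(datasets), 4))
```

Taking the median rather than the mean keeps the summary robust to occasional datasets with unusual clustering.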

Table 3. Median Correlations for 500 simulations over entire non-urban area with 300 total outlets.

CI √CI CP Total CP Nearest 1 CP Nearest 2 CP Nearest 3 Distance to Nearest Outlet
CI 1.0000 0.9821 0.4055 0.3586 0.4017 0.4285 -0.4333
√CI 1.0000 0.3952 0.3589 0.3960 0.4216 -0.4511
CP Total 1.0000 0.6792 0.7218 0.7452 -0.5371
CP Nearest 1 1.0000 0.9807 0.9634 -0.6560
CP Nearest 2 1.0000 0.9948 -0.7039
CP Nearest 3 1.0000 -0.7242
Distance to Nearest Outlet 1.0000

Table 4. Median Correlations for 500 simulations over entire urban area with 2000 outlets.

CI √CI CP Total CP Nearest 1 CP Nearest 2 CP Nearest 3 Distance to Nearest Outlet
CI 1.0000 0.9913 0.4482 0.1387 0.1819 0.2107 -0.2860
√CI 1.0000 0.4352 0.1421 0.1830 0.2081 -0.2920
CP Total 1.0000 0.7513 0.7937 0.8159 -0.5829
CP Nearest 1 1.0000 0.9852 0.9715 -0.5644
CP Nearest 2 1.0000 0.9962 -0.6192
CP Nearest 3 1.0000 -0.6422
Distance to Nearest Outlet 1.0000

Mapped results

Our simulation study also allowed us to observe the spatial patterns of these availability/accessibility measures. Again, these patterns depend heavily on the simulation design, the cluster centers, and the total number of outlets in an area, but contour plots show how the measures vary within an individual area. Figure 5 displays four contour plots for a single dataset simulated for an urban environment with 15 cluster centers, a clustering parameter of 0.01, and 2000 total outlets. Figure 6 displays four contour plots for a single dataset simulated for a non-urban environment with 5 cluster centers, a clustering parameter of 0.005, and 300 total outlets. Both figures show that CI is much larger around the cluster centers, while CP total gives a more general picture of accessibility over the entire area. CP to the nearest outlet is again largest around particular cluster centers, and the distance to the nearest outlet is simply the reciprocal of CP to the nearest outlet. These contour plots provide an informative view of availability and accessibility in a particular environment.

Figure 5. Contour plots (from top left) of CI, CP total, CP to the nearest outlet, and distance to the nearest outlet for one simulation of an urban area with 2000 outlets.

Figure 6. Contour plots (from top left) of CI, CP total, CP to the nearest outlet, and distance to the nearest outlet for one simulation of a non-urban area with 300 outlets.

Data Example

We provide a real data example of how these new statistical inference tests can be applied using previously collected data from urban and rural areas of South Carolina (Liese et al., 2009). Table 5 provides basic demographic characteristics for this study area. We decided to focus on rural Kershaw County and urban Richland County to assess differences in accessibility and availability measures between two study areas. Figure 7 gives a graphical representation of both Kershaw and Richland counties in South Carolina along with the density of food outlets in each region.

Table 5. Demographic characteristics of the data example separated by urban Richland County and rural Kershaw County.

Kershaw Richland
Census Tracts Total 11 78

Area (km²) Total 1915.813 1997.173
Mean 174.165 25.605
Stdev 95.628 64.254

Population Total 52647 320677
Mean 4786.091 4111.244
Stdev 2955.347 2450.587
Population/km² 27.480 160.565

Outlets Total 177 1147
Mean 16.090 14.705
Stdev 17.358 14.668
Outlets/km² 0.092 0.574

Stores Total 77 380
Mean 7.000 4.872
Stdev 6.496 4.456
Stores/km² 0.040 0.190

Restaurants Total 100 767
Mean 9.090 9.833
Stdev 11.158 11.864
Restaurants/km² 0.052 0.384

Figure 7. Kershaw and Richland Counties in central South Carolina, shown with outlet densities per square kilometer.

We used census tract centroids (Kershaw, 11 tracts; Richland, 78 tracts) as the location points at which the measures were collected. We then calculated the accessibility and availability measures CI, the square root of CI, CP total, CP to the nearest 1, 2, and 3 food outlets, and the distance to the nearest outlet for both Richland and Kershaw counties, using only the outlets located inside each county. Subsequently, we tested for significant differences between the two counties in the average indices and in Moran's I for spatial autocorrelation. Although Richland and Kershaw counties do not have equal areas, the tests are scale-invariant and require no correction for the size of the test area. For any spatial accessibility or availability measure x, we can test the following hypotheses about differences between Kershaw and Richland counties, using the test statistics described below and the Monte Carlo critical values (CV) from the simulation study listed in Tables 1 and 2:

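  1. H0: $\bar{x}_{KERSHAW} = \bar{x}_{RICHLAND}$ versus H1: $\bar{x}_{KERSHAW} \neq \bar{x}_{RICHLAND}$

    $x^* = \dfrac{\bar{x}_{KERSHAW} - \bar{x}_{RICHLAND}}{\sqrt{\operatorname{var}(x_{KERSHAW}) + \operatorname{var}(x_{RICHLAND})}}$; reject H0 if $x^* < CV(\alpha/2)$ or $x^* > CV(1 - \alpha/2)$.

    Corresponding CV values for x* are listed in Table 1.

  2. H0: $I_{KERSHAW} = I_{RICHLAND}$ versus H1: $I_{KERSHAW} \neq I_{RICHLAND}$

    $I^* = I_{KERSHAW} - I_{RICHLAND}$; reject H0 if $I^* < CV(\alpha/2)$ or $I^* > CV(1 - \alpha/2)$,

    where Moran's I for spatial autocorrelation is defined as

    $I = \dfrac{N}{\sum_i \sum_j w_{ij}} \cdot \dfrac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$

    for $w_{ij} = \exp(-d_{ij})$, where $d_{ij}$ is the Euclidean distance from location i to location j.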

Corresponding CV values for I* are listed in Table 2.

We also performed the same analysis for food stores and for restaurants in each county. Tables 6, 7, and 8 display the calculated measures for each county, together with the corresponding test statistics, for all food outlets, food stores, and restaurants, respectively. Each table also indicates where there are significant differences between the two counties based on the Monte Carlo critical values derived from the simulation study. For comparison, we also list the Mann-Whitney U non-parametric test, which is commonly used in the statistical analysis of these measures, with its p-value. The aim of this paper is not to compare the new tests with existing methods, but we felt it was important to report the traditional Mann-Whitney U test alongside them. We did not provide an alternative test for Moran's I, as such tests are less commonly used.

Table 6. Analysis of data example to test for significant differences between various accessibility/availability spatial measures for food outlets in Kershaw and Richland Counties in South Carolina.

Kershaw Richland New Test Statistic Reject H0 Mann-Whitney Test Statistic Mann-Whitney test p-value
CI Average 16.091 14.705 0.061 Yes* 403 0.750
Moran's I 0.025 0.002 0.023 Yes*

√CI Average 3.332 3.416 -0.029 No 403 0.750
Moran's I 0.020 0.002 0.017 Yes*

CP total (1/km) Average 1759.250 18755.793 -1.540 Yes* 2 0.000*
Moran's I 0.029 0.014 0.015 No

CP Nearest 1 Outlet (1/km) Average 78.848 413.311 -0.339 Yes* 178 0.002*
Moran's I 0.018 0.004 0.014 Yes*

CP Nearest 2 Outlets (1/km) Average 144.797 694.282 -0.352 Yes* 157 0.001*
Moran's I 0.019 0.004 0.015 Yes*

CP Nearest 3 Outlets (1/km) Average 192.242 916.469 -0.367 Yes* 153 0.001*
Moran's I 0.017 0.004 0.013 Yes*

Distance to the Nearest Outlet (km) Average 0.028 0.010 0.850 Yes* 680 0.002*
Moran's I 0.014 0.015 -0.001 No
*

Statistically Significant for a 2-sided test and α = 0.05

Table 7. Analysis of data example to test for significant differences between various accessibility/availability spatial measures for food stores in Kershaw and Richland Counties in South Carolina.

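| Measure | Statistic | Kershaw | Richland | New test statistic | Reject H0 | Mann-Whitney U | Mann-Whitney p-value |
|---|---|---|---|---|---|---|---|
| CI | Average | 7.000 | 4.872 | 0.270 | Yes* | 475.5 | 0.564 |
| CI | Moran's I | 0.025 | 0.004 | 0.021 | Yes* | | |
| √CI | Average | 2.275 | 1.978 | 0.172 | Yes* | 475.5 | 0.564 |
| √CI | Moran's I | 0.018 | 0.004 | 0.014 | Yes* | | |
| CP total (1/km) | Average | 747.105 | 5583.854 | -2.000 | Yes* | 5.0 | 0.000* |
| CP total (1/km) | Moran's I | 0.025 | 0.017 | 0.007 | No | | |
| CP nearest 1 store (1/km) | Average | 56.845 | 232.692 | -0.395 | Yes* | 139.0 | 0.000* |
| CP nearest 1 store (1/km) | Moran's I | 0.013 | 0.004 | 0.009 | Yes* | | |
| CP nearest 2 stores (1/km) | Average | 99.783 | 388.239 | -0.436 | Yes* | 123.0 | 0.000* |
| CP nearest 2 stores (1/km) | Moran's I | 0.013 | 0.004 | 0.009 | Yes* | | |
| CP nearest 3 stores (1/km) | Average | 140.257 | 500.995 | -0.478 | Yes* | 121.0 | 0.000* |
| CP nearest 3 stores (1/km) | Moran's I | 0.013 | 0.004 | 0.009 | Yes* | | |
| Distance to the nearest store (km) | Average | 0.034 | 0.011 | 0.925 | Yes* | 719.0 | 0.000* |
| Distance to the nearest store (km) | Moran's I | 0.011 | 0.014 | -0.003 | No | | |

Note: * statistically significant for a two-sided test at α = 0.05.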

Table 8. Analysis of data example to test for significant differences between various accessibility/availability spatial measures for restaurants in Kershaw and Richland Counties in South Carolina.

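| Measure | Statistic | Kershaw | Richland | New test statistic | Reject H0 | Mann-Whitney U | Mann-Whitney p-value |
|---|---|---|---|---|---|---|---|
| CI | Average | 9.091 | 9.833 | -0.046 | Yes* | 384.5 | 0.582 |
| CI | Moran's I | 0.024 | 0.002 | 0.023 | Yes* | | |
| √CI | Average | 2.375 | 2.615 | -0.092 | Yes* | 384.5 | 0.582 |
| √CI | Moran's I | 0.021 | 0.002 | 0.019 | Yes* | | |
| CP total (1/km) | Average | 1012.145 | 13171.94 | -1.343 | Yes* | 1.0 | 0.000* |
| CP total (1/km) | Moran's I | 0.032 | 0.012 | 0.019 | No | | |
| CP nearest 1 restaurant (1/km) | Average | 71.203 | 344.877 | -0.291 | Yes* | 177.0 | 0.002* |
| CP nearest 1 restaurant (1/km) | Moran's I | 0.019 | 0.003 | 0.016 | Yes* | | |
| CP nearest 2 restaurants (1/km) | Average | 134.434 | 596.872 | -0.304 | Yes* | 174.0 | 0.002* |
| CP nearest 2 restaurants (1/km) | Moran's I | 0.020 | 0.003 | 0.016 | Yes* | | |
| CP nearest 3 restaurants (1/km) | Average | 172.318 | 786.883 | -0.320 | Yes* | 172.0 | 0.001* |
| CP nearest 3 restaurants (1/km) | Moran's I | 0.019 | 0.003 | 0.015 | Yes* | | |
| Distance to the nearest restaurant (km) | Average | 0.042 | 0.012 | 0.890 | Yes* | 681.0 | 0.002* |
| Distance to the nearest restaurant (km) | Moran's I | 0.018 | 0.014 | 0.004 | No | | |

Note: * statistically significant for a two-sided test at α = 0.05.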

With the exception of the square root of CI, the average values of all spatial availability and accessibility indices considered in this real data example differed significantly between Kershaw and Richland counties under the test and corresponding Monte Carlo critical values derived earlier in this paper. That is, we find significant differences in accessibility and availability of outlets over the entire study area between Kershaw County and Richland County. We expected such differences, as Kershaw represents a more rural environment, whereas Richland County is urban. Interestingly, our derived test reached the same conclusion as the Mann-Whitney test for every measure except CI. This discrepancy for the availability measure CI may arise because the Mann-Whitney test uses the ranks of CI, which is only a count of total outlets, and so the traditional test fails to reject the null hypothesis. As shown in Table 6, we also found significant differences in Moran's I for all indices except CP total and distance to the nearest outlet.
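The decision rule behind the "Reject H0" column can be sketched as follows. The tables suggest that the statistic for Moran's I is simply the difference between the two counties (for CI in Table 6, 0.025 - 0.002 = 0.023); the statistic for averages is defined earlier in the paper and is not reproduced here. The sketch assumes a user-supplied function that simulates the statistic under the null hypothesis of no difference (in this paper that role is played by the cluster-model simulation study) and takes empirical 2.5% and 97.5% quantiles as two-sided critical values. The toy null simulator and all function names are hypothetical.

```python
import math
import random
from typing import Callable, Tuple


def mc_critical_values(
    simulate_null_stat: Callable[[random.Random], float],
    n_sim: int = 999,
    alpha: float = 0.05,
    seed: int = 1,
) -> Tuple[float, float]:
    """Two-sided Monte Carlo critical values from simulated null statistics."""
    rng = random.Random(seed)
    stats = sorted(simulate_null_stat(rng) for _ in range(n_sim))
    # Simple order-statistic quantiles at alpha/2 and 1 - alpha/2.
    lower = stats[math.floor((alpha / 2) * (n_sim - 1))]
    upper = stats[math.ceil((1 - alpha / 2) * (n_sim - 1))]
    return lower, upper


def reject_h0(observed: float, critical: Tuple[float, float]) -> bool:
    """Reject when the observed statistic falls outside the critical interval."""
    lower, upper = critical
    return observed < lower or observed > upper


def toy_null(rng: random.Random, n: int = 30) -> float:
    """Hypothetical null: difference in means of two samples from one distribution."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(a) / n - sum(b) / n


cv = mc_critical_values(toy_null)
print(cv)
print(reject_h0(1.0, cv), reject_h0(0.0, cv))
```

In an application, `toy_null` would be replaced by a simulator that regenerates outlet patterns, recomputes the index in both areas, and returns the test statistic.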

As displayed in Table 7, for spatial availability and accessibility of stores in Kershaw and Richland counties, all average indices differed significantly between the two counties under our derived test and the corresponding Monte Carlo critical values. For stores, our derived test reached the same conclusion as the Mann-Whitney test except for CI and the square root of CI. Once again, we expect this is because both CI and the square root of CI are based on simple counts of the total stores in an area. As shown in Table 7, we also found significant differences in Moran's I for all indices except CP total and distance to the nearest store, similar to the results for outlets in Table 6.

As displayed in Table 8, for restaurants in Kershaw and Richland counties, all average spatial availability and accessibility indices differed significantly under our derived test and the corresponding Monte Carlo critical values. This is consistent with the results for outlets and stores, and with the major differences in area and population density between the two counties. Once again, our derived test gives the same results as the Mann-Whitney test except for CI and the square root of CI. As shown in Table 8, we also found significant differences in Moran's I for all indices except CP total and distance to the nearest restaurant, similar to the results for food outlets in Table 6 and stores in Table 7.

Discussion and Conclusions

This simulation study provides considerably more information about the statistical properties of various accessibility and availability measures commonly found in the literature. We also constructed two tests to assess differences in average values and in Moran's I for spatial autocorrelation. These tests are scale-invariant and can be applied to study areas of different sizes. They also do not depend on the spatial properties of the outlets, stores, or restaurants in a location; that is, they work for urban and rural areas and for different numbers of outlets. In practice, this means that different areas do not need to be rescaled before comparison, so our approach is dimensionless. This matters because raw measures depend on the size of the area: for example, CP total will always differ in larger areas because the distances are greater. In contrast, conventional comparison tests (such as the non-parametric Mann-Whitney test) reduce the measures to ranks and so lose power.
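The scale point can be illustrated with a small sketch. The exact definitions of the indices are given earlier in the paper; here we assume, for illustration only, that CI counts outlets within a hypothetical radius of a location, that CP total sums inverse distances 1/d to all outlets, that CP nearest k sums 1/d over the k nearest outlets (consistent with the 1/km units in Tables 6 to 8), and that the last index is the distance to the nearest outlet. Under these assumptions, stretching all distances by a factor s divides every CP value by s, so raw CP values from areas of different size are not directly comparable.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def access_indices(loc: Point, outlets: Sequence[Point], k: int = 3,
                   radius: float = 1.0) -> Dict[str, float]:
    """Illustrative availability/accessibility indices for one location.

    Assumed (hypothetical) definitions:
      CI           - number of outlets within `radius` of loc
      CP_total     - sum of 1/d over all outlets
      CP_nearest   - sum of 1/d over the k nearest outlets
      dist_nearest - distance to the nearest outlet
    """
    d = sorted(math.dist(loc, o) for o in outlets)
    if not d or d[0] == 0:
        raise ValueError("need at least one outlet not located at loc")
    ci = sum(1 for x in d if x <= radius)
    return {
        "CI": ci,
        "sqrt_CI": math.sqrt(ci),
        "CP_total": sum(1 / x for x in d),
        "CP_nearest": sum(1 / x for x in d[:k]),
        "dist_nearest": d[0],
    }


outlets = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, 4.0)]
small = access_indices((0.0, 0.0), outlets, k=2, radius=2.5)
# Same configuration stretched by a factor of 2: every CP value halves.
big = access_indices((0.0, 0.0), [(2 * x, 2 * y) for x, y in outlets], k=2, radius=2.5)
print(small["CP_total"], big["CP_total"])
```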

The real data example demonstrated how the tests and their Monte Carlo critical values can be used to test for significant differences in these accessibility and availability measures between regions. Whether we considered outlets, stores, or restaurants, we found significant differences between Kershaw and Richland counties for all of the accessibility indices, i.e. the distance-based spatial indices. In all cases, our test also rejected the null hypothesis for the simplest availability index, CI, whereas the traditional Mann-Whitney test failed to reject it. The same was true of the between-county tests of the square root of CI for stores and restaurants. These discrepancies are most likely due to the ranking of counts in the Mann-Whitney test; our newly developed tests do not require such a transformation and hence should have higher power to detect differences between regions. Finally, in all cases, the differences in Moran's I for CP total and for distance to the nearest outlet, store, or restaurant were not statistically significant between Kershaw and Richland counties.

Another feature of this study was the examination of correlation between measures. We have established that accessibility measures vary in their use of distances and that CP measures of different kinds are highly correlated in both urban and rural settings. Availability measures are less correlated with CPs but this correlation varies depending on the urban or rural context.
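As a rough illustration of how such correlations can be examined, the sketch below computes CP nearest 1 and CP nearest 3 at a set of evaluation locations, using the same assumed inverse-distance definition (hypothetical here), and then their Pearson correlation. The uniform random outlet pattern is a placeholder, not the cluster models used in the simulation study, so the printed correlation says nothing about the results reported in this paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cp_nearest(loc: Point, outlets: Sequence[Point], k: int) -> float:
    """Assumed CP definition: sum of 1/d over the k nearest outlets."""
    d = sorted(math.dist(loc, o) for o in outlets)
    return sum(1 / x for x in d[:k])


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
outlets = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(50)]
locations = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
cp1: List[float] = [cp_nearest(p, outlets, 1) for p in locations]
cp3: List[float] = [cp_nearest(p, outlets, 3) for p in locations]
print(round(pearson(cp1, cp3), 3))
```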

The major strength of this study lies in using accessibility and availability measures directly in comparisons and in providing critical values for these comparisons. This yields a dimensionless approach that does not resort to the non-parametric rank-based methods typically found in the literature.

A few limitations of our study are apparent. First, our simulation scenarios are based on modeled clustering and so provide an idealized test bed for evaluating the procedure. While many real aggregations of outlets appear in more arbitrary configurations, we believe that the robustness of the derived MC critical values allows the approach to be applied in a range of real situations. Second, we did not have time to evaluate gravity measures, which combine accessibility with availability. These composite measures use distance friction modified by a measure of attraction, such as sales volume or floor space of the outlet (Guy, 1983). They are usually defined as a ratio of the form $g/d$, where $g$ is the measure of attraction of the outlet and $d$ is the distance to the outlet. Evaluating these measures is beyond the scope of this study; we hope to examine them in a later study.
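A gravity-type score following the ratio $g/d$ might be sketched as below. Summing $g_j/d_{ij}$ over all outlets $j$ is our assumption, and the attraction values are hypothetical; since gravity measures were not evaluated in this study, this is only a pointer for future work.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def gravity_score(loc: Point, outlets: Sequence[Tuple[Point, float]]) -> float:
    """Sum of g / d over outlets, where g is an attraction measure (assumed aggregation)."""
    total = 0.0
    for xy, attraction in outlets:
        d = math.dist(loc, xy)
        if d == 0:
            raise ValueError("location coincides with an outlet; g/d is undefined")
        total += attraction / d
    return total


# Hypothetical attraction values, e.g. floor space in hundreds of square metres.
outlets = [((2.0, 0.0), 100.0), ((0.0, 4.0), 200.0)]
print(gravity_score((0.0, 0.0), outlets))  # 100/2 + 200/4 = 100.0
```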

In addition, we opted to examine simple statistical tools for comparing measures. Alternatively, we could have examined geostatistical models for the measures, in which spatial correlation is built into a variogram and Kriging analysis (Cressie, 1993). However, we believe that the simpler approach adopted here is likely to achieve greater acceptance within nutritional epidemiology.
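For readers unfamiliar with the geostatistical alternative, the classical (Matheron) empirical semivariogram, $\hat\gamma(h) = \frac{1}{2|N(h)|}\sum_{(i,j)\in N(h)}(z_i - z_j)^2$, summarizes spatial correlation by averaging squared differences between pairs of values whose separation falls in a distance bin. The sketch below computes it with equal-width bins; the bin width is a hypothetical tuning choice, and fitting a variogram model and Kriging, as described by Cressie (1993), would be further steps not shown.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def empirical_semivariogram(values: Sequence[float], coords: Sequence[Point],
                            bin_width: float) -> Dict[float, float]:
    """Classical estimator: gamma(h) = sum (z_i - z_j)^2 / (2 * pairs), per distance bin.

    Keys are bin midpoints; each unordered pair (i, j) is counted once.
    """
    sq_sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            b = int(math.dist(coords[i], coords[j]) // bin_width)
            sq_sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return {(b + 0.5) * bin_width: sq_sums[b] / (2 * counts[b]) for b in sorted(counts)}


# Values that increase along a line: semivariance grows with separation.
print(empirical_semivariogram([0.0, 1.0, 2.0], [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], 1.5))
# {0.75: 0.5, 2.25: 2.0}
```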

We believe that our study has addressed an important issue in the use of spatial nutritional environment measures, namely how to compare measures across regions in terms of both average effects and spatial correlation. Because we use direct measures rather than ranks, we have a more sensitive analytic tool for comparing nutritional environments.

Acknowledgments

This project was supported by R21CA132133 from the National Cancer Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Ball K, Timperio A, Crawford D. Neighbourhood socioeconomic inequalities in food access and affordability. Health Place. 2009;15:578–85. doi: 10.1016/j.healthplace.2008.09.010.
  2. Block JP, Scribner RA, DeSalvo KB. Fast food, race/ethnicity, and income: a geographic analysis. Am J Prev Med. 2004;27:211–7. doi: 10.1016/j.amepre.2004.06.007.
  3. Cressie NAC. Statistics for Spatial Data. John Wiley & Sons; 1993.
  4. Cummins SC, McKay L, MacIntyre S. McDonald's restaurants and neighborhood deprivation in Scotland and England. Am J Prev Med. 2005;29:308–10. doi: 10.1016/j.amepre.2005.06.011.
  5. Diggle PJ. Statistical Analysis of Spatial Point Patterns. 2nd ed. Hodder Arnold; 2001.
  6. Edmonds J, Baranowski T, Baranowski J, Cullen KW, Myres D. Ecological and socioeconomic correlates of fruit, juice, and vegetable consumption among African-American boys. Prev Med. 2001;32:476–81. doi: 10.1006/pmed.2001.0831.
  7. Feng J, Glass TA, Curriero FC, Stewart WF, Schwartz BS. The built environment and obesity: a systematic review of the epidemiologic evidence. Health Place. 2010;16:175–90.
  8. Galvez MP, Hong L, Choi E, Liao L, Godbold J, Brenner B. Childhood obesity and neighborhood food-store availability in an inner-city community. Acad Pediatr. 2009;9:339–43.
  9. Guy CM. The assessment of access to local shopping opportunities: a comparison of accessibility measures. Environment and Planning B: Planning and Design. 1983;10:219–238.
  10. Inagami S, Cohen DA, Finch BK, Asch SM. You are where you shop: grocery store locations, weight, and neighborhoods. Am J Prev Med. 2006;31:10–7. doi: 10.1016/j.amepre.2006.03.019.
  11. Jeffery RW, Baxter J, McGuire M, Linde J. Are fast food restaurants an environmental risk factor for obesity? Int J Behav Nutr Phys Act. 2006;3:2. doi: 10.1186/1479-5868-3-2.
  12. Lawson A. Hotspot detection and clustering: ways and means. Environmental and Ecological Statistics. 2010. doi: 10.1007/s10651-010-0142-z.
  13. Liese AD, Hibbert J, Barnes T, Porter D, Lawson A. Validation of Three Food Outlet Databases: Completeness and Geospatial Accuracy in Rural and Urban Food Environments. Epidemiology; International Society for Environmental Epidemiology Conference; August 25, 2009. p. S141.
  14. Macdonald L, Ellaway A, Macintyre S. The food retail environment and area deprivation in Glasgow City, UK. Int J Behav Nutr Phys Act. 2009;6:52. doi: 10.1186/1479-5868-6-52.
  15. Maddock J. The relationship between obesity and the prevalence of fast food restaurants: state-level analysis. Am J Health Promot. 2004;19:137–43. doi: 10.4278/0890-1171-19.2.137.
  16. Morland K, Wing S, Diez Roux A. The contextual effect of the local food environment on residents' diets: the atherosclerosis risk in communities study. Am J Public Health. 2002a;92:1761–7. doi: 10.2105/ajph.92.11.1761.
  17. Morland K, Diez Roux AV, Wing S. Supermarkets, other food stores, and obesity: the atherosclerosis risk in communities study. Am J Prev Med. 2006;30:333–9. doi: 10.1016/j.amepre.2005.11.003.
  18. Morland K, Wing S, Diez Roux A, Poole C. Neighborhood characteristics associated with the location of food stores and food service places. Am J Prev Med. 2002b;22:23–9. doi: 10.1016/s0749-3797(01)00403-2.
  19. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2009. http://www.R-project.org.
  20. Rawlings JO, Pantula SG, Dickey DA. Applied Regression Analysis: A Research Tool. 2nd ed. Springer; 1998. (Springer Texts in Statistics).
  21. Reidpath DD, Burns C, Garrard J, Mahoney M, Townsend M. An ecological study of the relationship between social and environmental determinants of obesity. Health Place. 2002;8:141–5. doi: 10.1016/s1353-8292(01)00028-4.
  22. Smoyer-Tomic KE, Spence JC, Raine KD, Amrhein C, Cameron N, Yasenovskiy V, et al. The association between neighborhood socioeconomic status and exposure to supermarkets and fast food outlets. Health Place. 2008;14:740–54. doi: 10.1016/j.healthplace.2007.12.001.
  23. Spence JC, Cutumisu N, Edwards J, Raine KD, Smoyer-Tomic K. Relation between local food environments and obesity among adults. BMC Public Health. 2009;9:192. doi: 10.1186/1471-2458-9-192.
  24. Sturm R, Datar A. Body mass index in elementary school children, metropolitan area food prices and food outlet density. Public Health. 2005;119:1059–68. doi: 10.1016/j.puhe.2005.05.007.
  25. Wilson AG. A family of spatial interaction models, and associated developments. Environment and Planning. 1971;3:1–32.
  26. Zenk SN, Schulz AJ, Israel BA, James SA, Bao S, Wilson ML. Fruit and vegetable access differs by community racial composition and socioeconomic position in Detroit, Michigan. Ethn Dis. 2006;16:275–80.
