Abstract
Studying the relation between the residential environment and health requires valid, reliable, and cost-effective methods to collect data on residential environments. This 2002 study compared the level of agreement between measures of the presence of neighborhood businesses drawn from 2 common sources of data used for research on the built environment and health: listings of businesses from commercial databases and direct observations of city blocks by raters. Kappa statistics were calculated for 6 types of businesses—drugstores, liquor stores, bars, convenience stores, restaurants, and grocers—located on 1,663 city blocks in Chicago, Illinois. Logistic regressions estimated whether disagreement between measurement methods was systematically correlated with the socioeconomic and demographic characteristics of neighborhoods. Levels of agreement between the 2 sources were relatively high, with significant (P < 0.001) kappa statistics for each business type ranging from 0.32 to 0.70. Most business types were more likely to be reported by direct observations than in the commercial database listings. Disagreement between the 2 sources was not significantly correlated with the socioeconomic and demographic characteristics of neighborhoods. Results suggest that researchers should have reasonable confidence using whichever method (or combination of methods) is most cost-effective and theoretically appropriate for their research design.
Keywords: Chicago, geographic information systems, reproducibility of results, residence characteristics, social environment
The association between neighborhood context and health is well documented. For example, residents living in socioeconomically disadvantaged neighborhoods have an increased risk of chronic health conditions such as obesity (1, 2), respiratory problems (3, 4), and coronary heart disease (5, 6), independent of individual risks. Additionally, residents in these neighborhoods have higher rates of smoking (7), poor nutrition (8, 9), and physical inactivity (7, 10–12). However, to provide better guidance on how policy interventions can address neighborhood health disparities, research must focus on more specific aspects of neighborhood environments that lead to increased health risks and unhealthy behaviors.
Until recently, researchers have relied mainly on census-based measures to assess associations between neighborhood characteristics and health (13–15), but they have increasingly turned to other methods to measure aspects of neighborhoods that are more theoretically linked to specific health outcomes. Two common sources of data on such neighborhood mechanisms are 1) spatially referenced secondary sources of data collected for other purposes (e.g., administrative data or market research data) and 2) direct observations of the physical and social environment made through systematic social observation (SSO) (16).
Secondary data sources, such as market research databases and municipal registries, can be very useful for measuring aspects of the environment related to the availability of food, exercise facilities, and health care (17–20). One advantage to using secondary data sources is that they usually provide complete geographic coverage of the area under study (e.g., all the stores that sell food in a city or metropolitan area), sometimes for a very limited cost. However, many health-relevant aspects of neighborhoods are not measured in secondary data sources—such as the level of disorder, the condition of buildings and recreational areas, and the type and price of food sold in local stores—so researchers often turn to other sources of contextual data.
Direct SSO of the neighborhood environment allows researchers to measure constructs not commonly found in secondary data sources by using a field instrument tailored to their needs (21–23). This method is usually more costly because it requires training raters and paying for their time, but it can be performed by survey interviewers who are already in the field, at a relatively low marginal cost. An important limitation to using SSO is that geographic coverage is typically more limited than it is with secondary data because researchers must pay for rater training and transportation to the sites being observed.
Both of these options represent promising alternatives to census data, but there are important unanswered questions concerning the “ecometric” properties of neighborhood-level measures constructed from such data sources, including issues of reliability, validity, and whether measures drawn from either source are more or less reliable or valid in certain types of neighborhoods (13, 24, 25). Although there is no “gold standard” for measuring neighborhood characteristics against which to compare measures from secondary data or SSO, we can begin to address these questions by assessing the level of agreement between equivalent measures derived from each method.
Another measurement concern that can be addressed empirically is whether the level of agreement between the 2 sources varies systematically across neighborhoods (26). For example, research has shown that people living in disadvantaged areas with few grocery stores are at higher risk of obesity (27, 28). If, however, a study relies on secondary data that fail to capture smaller businesses such as “mom-and-pop stores,” and disadvantaged neighborhoods are more likely to have these smaller businesses (28, 29), then the observed relation between lack of grocers and obesity may be an artifact of undercounting small stores in areas where obesity rates are highest. Systematic measurement error could also result from the potential failure of market research databases to count businesses in the “informal economy” or to remove businesses that have recently closed, both of which could be more common in poorer neighborhoods. Likewise, SSO raters may make errors when translating signs from other languages or may fail to identify multiple types of businesses at a single location.
In this study, we used 2 different data sources on the presence of stores that sell food or beverages in Chicago, Illinois, neighborhoods: a major market research database and an SSO covering the same time period. In this paper, we first assess the level of agreement between these 2 data sources at the block level, focusing on stores that sell food or beverages because proximity to these establishments has been suggested as one mechanism linking the residential environment to individual diet and health (18, 19, 27, 28, 30, 31). We also assess whether the level of disagreement regarding the presence of food stores between the 2 data sources is systematically related to the sociodemographic composition of neighborhoods. To the extent that the disagreement between the 2 sources is not correlated with key ecologic characteristics (e.g., those conventionally used in neighborhood-effects research), researchers can be more confident in using either method to explore the mechanisms that could help explain why neighborhoods matter for health.
MATERIALS AND METHODS
SSO data
Our observational data came from an SSO conducted as part of the Chicago Community Adult Health Study, a multistage area probability sample of 3,105 adults living in the city of Chicago (32). Each block on which one or more sampled residents lived was rated by trained observers. Raters were asked to walk around the perimeter of each block twice—once on the interior perimeter, including the block face on which the sampled resident lived, and once on the exterior perimeter, including the block face across the street from where the sampled resident lived—and to record their observations on a rating form. The rating form asked the raters to indicate only whether a given business type was present; raters were not asked to count the number of businesses, houses, or street features that they observed.
Observations were made between May 2001 and March 2003 on a total of 1,663 blocks in the city of Chicago sampled to be representative of all blocks in the city that contain residential housing (32). Each block is located in one of 343 neighborhood clusters, defined in previous studies and consisting of one or more geographically contiguous census tracts that follow major ecologic boundaries with relatively homogeneous populations and comprise the entire area of the city of Chicago (33). Two observers rated 80 of the blocks to determine the interrater reliability for each item in the instrument; kappa statistics for interrater reliability of items measuring the presence of food and beverage stores ranged from κ = 0.36 (supermarkets) to κ = 1.00 (drugstores). We evaluated whether any of the following types of business establishments were present on the block: drugstores, convenience stores, liquor stores, bars, fast-food restaurants, other restaurants, greengrocers/delicatessens, and grocery stores/supermarkets. The definitions that raters used to code these establishments are presented in Table 1, along with interrater kappa statistics based on the 80 double-rated blocks for each business type. If a business could be categorized into multiple business categories (e.g., a bar and a restaurant), raters were instructed to code both business types as present on the block.
Table 1.
Business Type | SSO Variable Name: Description^a | NAICS Code: Description^b |
Drugstores | Drugstores/pharmacy (interrater κ = 1.00): all drugstores, including large drugstore chains (e.g., Walgreens, Osco) that sell a wide variety of other merchandise. Count any store that provides a pharmacy for prescription medication. | 446110: Pharmacies and drugstores engaged in retailing prescription or nonprescription drugs and medicines. |
Drugstores on sampled blocks: 89 | ||
Liquor stores | Liquor store (interrater κ = 0.36): includes any store that has alcohol as its primary merchandise. They may be of supermarket size or very small. Stores that sell alcohol with a range of foodstuffs would be considered supermarkets or convenience stores. | 445310: Stores retailing packaged alcoholic beverages, such as ale, beer, wine, and liquor. |
Liquor stores on sampled blocks: 93 | ||
Bars | Bar/cocktail lounge (interrater κ = 0.77): includes places where the alcohol that is sold is consumed on the premises, and this is its main purpose, even if food is also provided. A “wine bar” would be included under this heading although they often offer a variety of meals. To qualify as a bar, it must be possible to obtain alcohol without also purchasing food. | 722410: Bars, taverns, nightclubs, or drinking places primarily engaged in preparing and serving alcoholic beverages for immediate consumption. These establishments may also provide limited food services. |
Bars on sampled blocks: 181 | ||
Convenience stores | 7-Eleven/convenience store (interrater κ = 0.61): includes small supermarkets open 18 to 24 hours selling a wide range of products including newspapers, food, small household items, toys, and stationery. Such stores may be part of a chain (e.g., White Hen Pantry, 7-Eleven) or privately owned. They may also be part of a gasoline service station if they sell a range of goods beyond just cigarettes, candy, and soft drinks. | 445120: Convenience stores or food marts (except those with fuel pumps) primarily engaged in retailing a limited line of goods that generally includes milk, bread, soda, and snacks. |
447110: Gasoline stations in combination with convenience store or food marts. | ||
Convenience stores on sampled blocks: 137 | ||
Restaurants | Fast-food/take-out place (interrater κ = 0.64): includes chains (e.g., McDonalds, Burger King, Wendy's, Taco Bell) and restaurants offering only a limited range of “fast-food” items, such as pizza houses. This category includes sandwich bars, small coffee shops with limited seating, establishments that only have a take-out trade (e.g., some Chinese food, kebab houses), and any other eating place where no more than 2 or 3 people could linger to eat. | 722211: Establishments primarily engaged in providing food services (except snack and nonalcoholic beverage bars) where patrons generally order or select items and pay before eating. Food and drink may be consumed on premises, taken out, or delivered to customers’ location. Some establishments in this industry may provide these food services in combination with selling alcoholic beverages, providing take-out services, or presenting live nontheatrical entertainment. |
Restaurants (fast-food) on sampled blocks: 941 | ||
Other eating place/restaurant (interrater κ = 0.52): includes restaurants that have both take-out and eat-in services and restaurants where the space available for eating on the premises is greater than the space for customers to line up and wait for “take-out” items, except for fast-food chains (e.g., McDonalds), which are in the previous category. | 722110: Establishments primarily engaged in providing food services to patrons who order and are served while seated (i.e., waiter/waitress service) and pay after eating. These establishments may provide this type of food service to patrons in combination with selling alcoholic beverages. | |
To be considered an eating establishment, food should be the principal offering, or nonalcoholic beverages (e.g., coffee shops that provide seating and tables and also some food such as doughnuts, coffee cake, pastries, etc.) | 722213: Snack and beverage: Establishments primarily engaged in 1) preparing and/or serving a specialty snack, such as ice cream, frozen yogurt, cookies, or popcorn or 2) serving nonalcoholic beverages, such as coffee, juices, or sodas for consumption on or near the premises. These establishments may carry and sell a combination of snack, nonalcoholic beverage, and other related products (e.g., coffee beans, mugs, coffee makers) but generally promote and sell a unique snack or nonalcoholic beverage. | |
311811: Bakeries: Primarily engaged in retailing bread and other bakery products not for immediate consumption. | ||
Restaurants (“other eating place”) on sampled blocks: 200 | ||
Restaurants (any) on sampled blocks: 1,052^c | ||
Supermarkets or grocers | Supermarket/grocery store (interrater κ = 0.36): includes stores that sell predominantly foodstuffs and small household items. The size of supermarkets may vary. Include large supermarkets (e.g., Jewel, Dominics) and also smaller food supermarkets. Some of these will be open 24 hours a day but they are distinguished from “convenience stores” by the fact that they provide a wider range of items. | 445110: Establishments generally known as supermarkets and grocery stores primarily engaged in retailing a general line of food, such as canned and frozen foods; fresh fruits and vegetables; and fresh and prepared meats, fish, and poultry. Included in this industry are delicatessen-type establishments. |
Greengrocer/delicatessen (interrater κ = 0.52): includes “mom-and-pop” small neighborhood groceries and small specialized food stores, such as those selling food from one part of the world or one ethnic background, or health foods/vegetarian food shops. The store should be predominantly for the purchase of foods that need to be prepared off the premises, NOT those offering mainly ready prepared food to “carry out” (such as sandwich bars). | 445210: Establishments primarily engaged in retailing fresh, frozen, or cured meats and poultry. Delicatessen-type establishments primarily engaged in retailing fresh meat are included in this industry. | |
445220: Establishments primarily engaged in retailing fresh, frozen, or cured fish and seafood products. | ||
445230: Establishments primarily engaged in retailing fresh fruits and vegetables. | ||
Supermarkets or grocers on sampled blocks: 437 |
Abbreviations: NAICS, North American Industry Classification System; SSO, systematic social observation.
^a The variable descriptions are the instructions given to the SSO observers.
^b The definitions for the NAICS codes are published by the US Bureau of the Census (http://www.census.gov/epcd/ec97/industry/).
^c The total from any restaurant is less than the sum of fast-food restaurants and “other eating places” because they are not mutually exclusive categories.
Proprietary commercial establishment data
The proprietary data were purchased from InfoUSA (Omaha, Nebraska), a commercial vendor that tracks businesses and provides business listings based on the North American Industry Classification System (NAICS) codes, a standard created and used by government agencies to categorize businesses. This proprietary data set lists 23,868 businesses that had 1) addresses in the city of Chicago during November 2002, a month chosen because 84% of the SSOs were collected within 3 months on either side of that month; and 2) NAICS codes that identified them as drugstores, convenience stores, liquor stores, bars, fast-food restaurants, other restaurants, greengrocers/delicatessens, or grocery stores/supermarkets. Definitions of these business categories and their corresponding NAICS codes are listed in Table 1, along with the number of such establishments in the proprietary data across all sampled blocks. Each establishment could identify itself with up to 4 NAICS codes. We classified a business as belonging to a specific food/beverage store category if at least one of the reported NAICS codes was in the food/beverage store category (establishments with multiple NAICS codes were classified into multiple business categories). Although the data proprietor provides a new list of businesses each month, it does not provide documentation regarding the frequency with which individual listings are verified; the lack of such information is a major disadvantage in using these data.
The data set included addresses and geographic coordinates for each business. After inspecting the data, we found small discrepancies between the coordinates provided by the data proprietor and the coordinates obtained by geocoding the addresses ourselves. Since the proprietor did not document the specific procedures used to obtain geographic coordinates for businesses, we geocoded the addresses of all of the 23,868 businesses using US Census Bureau TIGER/Line files (http://www.census.gov/geo/www/tiger/) and ArcGIS version 9.2 software (ESRI, Redlands, California). We were unable to obtain geocoded coordinates for 1,019 businesses. We conducted the analysis 1) without these observations (which we report in the tables) and 2) with the proprietor-provided coordinates for these observations, and we found no significant variation in our results.
Creating comparability between data sets
We used a multistep process to construct comparable measures across the 2 data sources. The first step created comparable business categories by matching the instructions used by the SSO raters to the NAICS codes used to classify businesses in the proprietary data. This step required collapsing some categories in both data sources. Table 1 provides the business categories used in the analysis and how they are defined in each data source.
Since the availability of fast-food restaurants has been considered particularly important to the health of residents (19, 30, 34), we also created 2 subcategories of restaurants: fast-food restaurants and “other eating places.” The “fast-food” category in the SSO data was compared with the “limited-service restaurants” category (NAICS code 722211) in the proprietary data (refer to Table 1 for more detail), whereas the “other eating places” category from the SSO (which captured any non-fast-food restaurant in the SSO) was compared with a roughly analogous category in the proprietary data that we created by counting the presence of any of the following: a “full-service restaurant” (NAICS code 722110), a “snack and beverage” establishment (NAICS code 722213), or a bakery (NAICS code 311811).
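The category matching described in this section can be sketched as a lookup from NAICS codes to study categories. The sketch below is illustrative only: the category labels and the function name `classify` are hypothetical, and the code assignments follow the definitions listed in Table 1 rather than any documented procedure of the data vendor.

```python
# Hypothetical category labels; NAICS codes follow the Table 1 definitions.
NAICS_TO_CATEGORY = {
    "446110": "drugstore",
    "445310": "liquor_store",
    "722410": "bar",
    "445120": "convenience_store",
    "447110": "convenience_store",   # gasoline stations with convenience stores
    "722211": "fast_food",           # limited-service restaurants
    "722110": "other_eating_place",  # full-service restaurants
    "722213": "other_eating_place",  # snack and nonalcoholic beverage bars
    "311811": "other_eating_place",  # retail bakeries
    "445110": "grocer",
    "445210": "grocer",
    "445220": "grocer",
    "445230": "grocer",
}
RESTAURANT_SUBCATEGORIES = {"fast_food", "other_eating_place"}


def classify(naics_codes):
    """Return every study category implied by an establishment's NAICS codes."""
    categories = {
        NAICS_TO_CATEGORY[code] for code in naics_codes if code in NAICS_TO_CATEGORY
    }
    # "Any restaurant" is the union of the two restaurant subcategories.
    if categories & RESTAURANT_SUBCATEGORIES:
        categories.add("restaurant")
    return categories


# An establishment that lists both a bar code and a full-service restaurant
# code counts toward each category.
example = classify(["722410", "722110"])
```

Because an establishment can report up to 4 codes, a set-valued result keeps the multiple-category rule explicit and avoids forcing each business into a single class.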
The next step involved creating geographically comparable units of analysis. The SSO was conducted on only 1,663 of the approximately 24,000 blocks in Chicago, and we had to ensure that we were using the same blocks with the proprietary data. We used geographic information systems software to create polygons that represented the blocks observed by SSO raters and matched a business from the proprietary data to a polygon if the geographic coordinates were inside the polygon. A business category was coded as being present on the block if at least one business in that category was inside the block boundary; otherwise, it was coded as not present. In the final step, the block-level measures from the proprietary database were merged with the SSO database, resulting in a combined database that contained measures from both data sources of whether each business category was present on each block.
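The block-matching rule can be illustrated with a small point-in-polygon sketch. The study performed this step in geographic information systems software; the ray-casting test below is a stand-in for that software, and it assumes planar coordinates, simple (non-self-intersecting) block polygons, and hypothetical block identifiers and business records.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon's vertex ring."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def presence_by_block(blocks, businesses, categories):
    """Code each category as present on a block if any business falls inside it."""
    presence = {block_id: {c: False for c in categories} for block_id in blocks}
    for category, x, y in businesses:
        for block_id, polygon in blocks.items():
            if point_in_polygon(x, y, polygon):
                presence[block_id][category] = True
    return presence


# Hypothetical data: one square block and two geocoded businesses.
blocks = {"block_1": [(0, 0), (100, 0), (100, 100), (0, 100)]}
businesses = [("bar", 50, 50), ("drugstore", 150, 50)]
result = presence_by_block(blocks, businesses, ("bar", "drugstore"))
```

In this example the bar falls inside the block and the drugstore falls outside it, so only the bar category is coded as present.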
Analytic strategy
To measure the correspondence between the 2 data sources, kappa statistics were calculated for the 6 business types and the 2 restaurant subcategories. The kappa statistic is a measure of intersource reliability, defined as the difference between the observed agreement and the expected agreement (i.e., the level of agreement that could be expected by chance, based on the marginal frequencies in both data sources), divided by the maximum possible agreement beyond chance (1 minus the expected agreement) (35). We then assessed whether disagreement between the 2 data sources is systematically associated with the sociodemographic characteristics of neighborhoods by fitting logistic regression models that estimate the log-odds of finding disagreement (a dichotomous variable) about the presence of a given business type on a block.
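As a worked illustration of the kappa statistic, the sketch below computes chance-expected agreement from the marginal proportions of 2 binary sources and then kappa from observed and expected agreement. The function names are hypothetical, and the example input uses the rounded drugstore percentages reported in Table 2, so the result matches the published value only to rounding.

```python
def expected_agreement(p_a, p_b):
    """Chance agreement for 2 binary sources with 'present' proportions p_a and p_b."""
    return p_a * p_b + (1 - p_a) * (1 - p_b)


def kappa(p_observed, p_expected):
    """Agreement beyond chance, as a share of the maximum possible beyond chance."""
    return (p_observed - p_expected) / (1 - p_expected)


# Drugstores in Table 2: observed agreement 95.8%, expected agreement 92.4%.
drugstore_kappa = kappa(0.958, 0.924)
print(round(drugstore_kappa, 2))
```

The example shows why a category with very high raw agreement can still have a moderate kappa: when a business type is rare, most of the agreement is already expected by chance.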
Independent variables measuring the sociodemographic characteristics of neighborhoods were constructed from Summary File 3 of the 2000 US Census (http://factfinder.census.gov/). We constructed a disadvantage scale (α = 0.94) by taking the mean of the z-score values of the following variables: percentage of households with annual incomes of less than $15,000, percentage of households with annual incomes of $50,000 or less, percentage of families living in poverty, percentage of households receiving public assistance, percentage unemployed, percentage of female-headed households, percentage of never-married persons, and percentage of owner-occupied households (reverse coded). The variables used to construct this scale were selected based on a factor analysis of census tract data conducted for previous research (32).
Two variables were included to capture the influence of racial/ethnic composition on disagreement: the neighborhood percentage non-Hispanic white and a Hispanic/foreign born scale, which is the mean of the z-score values of the percentage of Hispanics and percentage foreign born (α = 0.86). We also included the percentage of residents who lived in the same house for 5 years and the logged population per square kilometer to measure residential stability and population density, respectively.
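The scale construction described in this section (the mean of z-scored items, with reverse coding where noted, summarized by Cronbach's α) can be sketched as follows. The data and variable names are hypothetical, and the use of population rather than sample standard deviations is an assumption; the source does not state which was used.

```python
from statistics import mean, pstdev, pvariance


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def standardized_items(columns, reverse=()):
    """Z-score each column; flip the sign of columns named in `reverse`."""
    items = []
    for name, values in columns.items():
        z = z_scores(values)
        items.append([-v for v in z] if name in reverse else z)
    return items


def scale_scores(items):
    """Scale value per area: the mean of that area's standardized items."""
    return [mean(row) for row in zip(*items)]


def cronbach_alpha(items):
    """Cronbach's alpha from item variances and the variance of the item sum."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_variance = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


# Hypothetical percentages for 4 areas, ordered from least to most disadvantaged.
columns = {
    "pct_poverty": [5, 10, 20, 40],
    "pct_unemployed": [3, 6, 9, 15],
    "pct_owner_occupied": [70, 60, 40, 20],
}
items = standardized_items(columns, reverse=("pct_owner_occupied",))
scores = scale_scores(items)
alpha = cronbach_alpha(items)
```

Reverse coding owner occupancy makes all items point in the same direction, which is what allows a simple mean to serve as the scale value.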
To control for the geographic location of blocks in the city of Chicago, which is likely to influence business patterns, we included a measure of distance from the Loop—Chicago's central business district—operationalized as a block's distance in kilometers from the Sears Tower. To account for differences in when blocks were rated in the SSO, we controlled for the difference (in months) between the date the block was observed and November 2002, the month for which the proprietary data were obtained. All analyses were conducted by using Stata version 9.2 software (Stata Corporation, College Station, Texas).
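The disagreement models can be illustrated with a minimal logistic regression fitted by Newton-Raphson. This is a sketch under simplifying assumptions, not the authors' Stata specification: it uses a single hypothetical binary predictor and made-up block counts, whereas the models in this study include several neighborhood covariates.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g0, g1) and information matrix (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system for the coefficient update.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: 100 blocks with x = 0 (10 disagreements)
# and 100 blocks with x = 1 (20 disagreements).
x = [0] * 100 + [1] * 100
y = [1] * 10 + [0] * 90 + [1] * 20 + [0] * 80
b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)
```

With a single binary predictor, the fitted odds ratio equals the cross-product ratio of the 2 × 2 table, here (20/80)/(10/90) = 2.25, which gives a simple check on the fit.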
RESULTS
The results of the correspondence analysis are reported in Table 2. The rate of agreement between the SSO and proprietary data was quite high, ranging from 82% (for fast-food restaurants and supermarkets or grocers) to 96% (for drugstores). Restaurants had the highest intersource reliability (κ = 0.70). Convenience stores, liquor stores, and “other” (non-fast-food) eating places had the lowest (κ = 0.32–0.38). The remaining categories (drugstores, bars, fast-food restaurants, and grocery stores) fell in between (κ = 0.44–0.49), above the lowest group but well below restaurants as a whole.
Table 2.
Business Type | Observed Agreement, % | Expected Agreement, % | Kappa | Agree: Not Present, % | Agree: Present, % | Disagree: Present in SSO, % | Disagree: Present in Prop., % |
Drugstores | 95.8 | 92.4 | 0.45* | 93.9 | 1.9 | 1.8 | 2.4 |
Liquor stores | 90.9 | 85.2 | 0.38* | 87.5 | 3.4 | 7.3 | 1.8 |
Bars | 88.6 | 79.5 | 0.45* | 82.8 | 5.8 | 8.6 | 2.8 |
Convenience stores | 88.3 | 82.8 | 0.32* | 84.7 | 3.6 | 8.1 | 3.6 |
Restaurants | 87.4 | 58.7 | 0.70* | 64.7 | 22.7 | 8.4 | 4.2 |
Fast-food restaurants | 81.8 | 64.2 | 0.49* | 67.7 | 14.1 | 6.3 | 12.0 |
Other eating place | 83.8 | 74.7 | 0.36* | 77.4 | 6.4 | 13.3 | 2.9 |
Supermarkets or grocers | 82.5 | 68.8 | 0.44* | 71.9 | 10.6 | 8.3 | 9.3 |
Abbreviations: Prop., proprietary data source; SSO, systematic social observation.
* P < 0.001.
N = 1,663.
To better understand the nature of disagreement between the 2 data sources, we cross-tabulated measurements from each source and report, in Table 2, the frequency of blocks on which 1) both data sets agreed that the business type is present, 2) both agreed that it is not present, 3) only the SSO data source indicated that the business type is present, and 4) only the proprietary data source indicated that the business type is present. The results show that, where disagreement occurred, it was more often the case that a business type was recorded in the SSO but not the proprietary data, especially in the case of liquor and convenience stores. Exceptions included drugstores, grocery stores, and especially fast-food restaurants (where the NAICS definition is more inclusive), which were reported slightly to moderately more frequently in the proprietary data than in the SSO.
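The 4-way breakdown of agreement and disagreement can be sketched as a cross-tabulation of paired block-level indicators. The function name, cell labels, and the 10 example blocks below are hypothetical.

```python
def cross_tabulate(sso, prop):
    """Percentage of blocks in each agreement/disagreement cell."""
    cells = {"agree_not_present": 0, "agree_present": 0, "sso_only": 0, "prop_only": 0}
    for in_sso, in_prop in zip(sso, prop):
        if in_sso and in_prop:
            cells["agree_present"] += 1
        elif in_sso:
            cells["sso_only"] += 1
        elif in_prop:
            cells["prop_only"] += 1
        else:
            cells["agree_not_present"] += 1
    n = len(sso)
    return {cell: 100.0 * count / n for cell, count in cells.items()}


# Hypothetical presence indicators (1 = present) for 10 blocks.
sso = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
prop = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
table = cross_tabulate(sso, prop)
```

Comparing the `sso_only` and `prop_only` cells shows the direction of disagreement; in this made-up example the business type is reported more often in the SSO than in the proprietary data.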
Odds ratios estimated from the logistic regressions of block disagreement are reported in Table 3. The major finding was the lack of a consistent relation between the sociodemographic characteristics of neighborhoods and the odds of the 2 data sources disagreeing. Of the 40 socioeconomic and demographic coefficients that we estimated, only 9 were statistically significant. Three of the outcomes (all restaurants, fast-food restaurants, and supermarkets) had no significant sociodemographic predictors, and another 3 outcomes (drugstores, liquor stores, and bars) had only one significant sociodemographic predictor. The sociodemographic variables associated with disagreement were 1) proportion white (associated with more disagreement on non-fast-food restaurants), 2) the Hispanic/foreign-born scale (associated with more disagreement on bars and non-fast-food restaurants), 3) the disadvantage scale (associated with more disagreement on liquor stores), 4) the residential stability scale (associated with less disagreement on drugstores, convenience stores, and non-fast-food restaurants), and 5) population density (associated with more disagreement on convenience stores and non-fast-food restaurants).
Table 3.
Characteristic | Drugstores | Liquor Stores | Bars | Convenience Stores | Any Restaurant^c | Fast-Food Restaurant^c | Other Eating Place^c | Supermarket or Greengrocer |
Proportion white | 1.001 (0.008) | 1.007 (0.006) | 1.007 (0.006) | 0.993 (0.005) | 1.001 (0.005) | 1.004 (0.004) | 1.014 (0.005)** | 0.994 (0.004) |
Hispanic/foreign-born scale | 0.878 (0.151) | 1.173 (0.136) | 1.660 (0.183)*** | 0.909 (0.093) | 1.150 (0.111) | 1.132 (0.097) | 1.401 (0.135)*** | 1.122 (0.097) |
Disadvantage scale | 0.946 (0.282) | 1.737 (0.316)** | 0.947 (0.203) | 0.931 (0.167) | 1.346 (0.212) | 0.940 (0.145) | 1.027 (0.188) | 1.318 (0.190) |
Residential stability | 0.971 (0.014)* | 0.990 (0.010) | 0.996 (0.009) | 0.975 (0.009)** | 0.997 (0.009) | 0.996 (0.008) | 0.985 (0.008)* | 0.990 (0.008) |
Population density | 1.244 (0.311) | 1.084 (0.180) | 0.879 (0.129) | 1.655 (0.276)** | 0.999 (0.140) | 1.222 (0.159) | 1.354 (0.190)* | 1.217 (0.159) |
Months from November 2002 | 1.013 (0.027) | 0.993 (0.022) | 1.005 (0.018) | 1.011 (0.018) | 1.016 (0.018) | 1.022 (0.015) | 0.980 (0.016) | 1.032 (0.015)* |
Distance to the Loop | 0.925 (0.030)* | 0.957 (0.021)* | 0.910 (0.019)*** | 0.973 (0.019) | 0.980 (0.018) | 0.931 (0.016)*** | 0.969 (0.017) | 0.939 (0.016)*** |
Constant | 0.034 (0.006)*** | 0.091 (0.010)*** | 0.101 (0.011)*** | 0.114 (0.011)*** | 0.134 (0.012)*** | 0.195 (0.015)*** | 0.169 (0.015)*** | 0.175 (0.014)*** |
* P < 0.05; **P < 0.01; ***P < 0.001.
Refer to the “Analytic strategy” part of the Materials and Methods section of the text for an explanation of these characteristics.
N = 1,663.
^c “Fast-food” and “other eating place” are subcategories of “any restaurant” and, combined, constitute the “any restaurant” category. Refer to the text for details.
The nature and direction of the observed sociodemographic effects for each type of establishment seem reasonable, although the number of significant effects is only marginally greater than might occur by chance. There are no sociodemographic predictors for large and inclusive categories (all restaurants, all supermarkets/grocery stores) or quite identifiable ones (fast-food). Findings drawing on the results shown in Tables 2 and 3 showed the following: 1) drugstores are identified less often via SSOs, with more disagreement in areas of low residential stability (where NAICS-identified drugstores may be “hidden” in larger stores); 2) liquor stores are identified less often in the proprietary data, with more disagreement in disadvantaged areas (where the proprietary data response rates may be lower for all stores or especially this type, which may be subject to heightened law enforcement observation if reported); 3) bars are more often reported via SSOs, with more disagreement in Hispanic areas, where there may again be underreporting via NAICS; 4) convenience stores are more often reported via SSO, with more disagreement in dense areas and less disagreement in residentially stable areas (where they may be underreported in the proprietary data or more easily visible via SSO); and 5) other eating places are more often reported via SSO, with more disagreement in white or Hispanic areas and denser areas and less disagreement in stable areas (where the somewhat less inclusive NAICS categories may generally miss some types of eating establishments unless the area is very stable).
Distance to the Loop was also associated with less disagreement for 5 outcomes: drugstores, liquor stores, bars, fast-food restaurants, and supermarkets, likely reflecting lower commercial density further away from downtown Chicago (and conversely more purely residential blocks) and thus fewer opportunities for disagreement to occur (because there are fewer businesses). We tested this hypothesis by repeating the models reported in Table 3 using subsamples that contained only those blocks with commercial activity. We did not have official data on land-use patterns to measure commercial activity, so we identified commercial blocks in 2 ways. First, we constructed a subsample of 1,037 blocks with commercial land use by eliminating blocks where the SSO rater did not observe any commercial land use (regardless of whether the proprietary data recorded a food-related business there). We also constructed a second subsample of 656 blocks where any of the 6 business types were reported present on the block in the proprietary data, even if the SSO raters failed to observe a food-related business there. In both subsamples, we found nonsignificant associations between distance to the Loop and disagreement for all businesses except bars, but the associations between the sociodemographic characteristics and the outcomes were similar to those shown in Table 3.
DISCUSSION
The results of this analysis are promising for researchers investigating the role of the residential environment on individual outcomes. Although neither of these 2 data sources can be taken as a “gold standard,” the moderate to high levels of agreement between the 2 sources suggest that the presence of commercial establishments in residential neighborhoods is comparably measured across the 2 methods of data collection. The intersource reliability between the SSO and proprietary measures is comparable to interrater reliabilities obtained for SSO observations (21, 36) and is higher than the rates of agreement between respondent reports and proprietary databases reported in a prior study (17).
Across the 6 business types we investigated, we did not find a consistent pattern of association between sociodemographic characteristics and disagreement that would lead us to conclude that one data source consistently differs from the other in certain types of neighborhoods. The only consistent predictor of disagreement was distance to the Loop: disagreement was less likely on blocks farther from the center of the city. Supplementary analysis showed that this association became nonsignificant for every business type except bars when the sample was limited to blocks with commercial activity. This finding supports our interpretation that distance to the Loop is a proxy for commercial density, which creates more opportunity for disagreement simply because there are more businesses to rate or list.
Most types of businesses were more likely to be reported on a block by SSO observers than by proprietary business listings. We suspect that this is partly the result of the definitions used to classify the different businesses, because the SSO rater instructions generally had more lenient definitions than the NAICS codes. It is also possible that SSO reports were more accurate than the proprietary business listings because raters, by visiting businesses, were able to obtain additional information about them, such as the merchandise or services that they sell or produce. Businesses classify themselves in the proprietary data by using NAICS categories, but there is no assurance that a business will list every category that applies to it (each can list no more than 4), and we know nothing about response rates.
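The direction of disagreement described above can be tabulated by counting discordant blocks separately for each source. The sketch below assumes binary presence indicators per block; the function name and the six toy ratings are hypothetical and serve only to illustrate the tabulation.

```python
def disagreement_direction(sso, proprietary):
    """Count blocks where only one source reports the business type.

    Returns (sso_only, proprietary_only) for two equal-length sequences
    of 0/1 presence ratings.
    """
    if len(sso) != len(proprietary):
        raise ValueError("ratings must be of equal length")
    sso_only = sum(1 for a, b in zip(sso, proprietary) if a and not b)
    proprietary_only = sum(1 for a, b in zip(sso, proprietary) if b and not a)
    return sso_only, proprietary_only


# Toy ratings for 6 blocks (illustration only): 1 = business type present.
sso_ratings = [1, 1, 1, 0, 0, 1]
proprietary_ratings = [1, 0, 0, 1, 0, 1]
sso_only, proprietary_only = disagreement_direction(sso_ratings, proprietary_ratings)
print(sso_only, proprietary_only)  # 2 1
```

A larger `sso_only` count than `proprietary_only` count corresponds to the pattern reported here, in which raters record a business type that the listings omit more often than the reverse.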
Limitations
Although our results suggest that these data sources provide reliable estimates of the residential environment, several limitations should be noted. First, the SSO instrument was not designed to count the number of business establishments. Although we were able to compare how well these data sources capture the presence of businesses, not having data on the number of businesses meant that we could not compare measures of business density in neighborhoods. Future studies should compare both presence and density. Second, as we mentioned above, the definitions of the 6 business types differed slightly across the 2 data sources. This difference likely inflated the amount of disagreement on each block, resulting in conservative assessments of intersource reliability. It also suggests that future studies using SSO methodology to measure business establishments should use standard definitions, such as NAICS codes, to allow for maximum comparability between studies.
Finally, this study examined agreement across only the 2 data sources for 6 business types and examined these types in only a single city at a single point in time. Although Chicago has a wide diversity of neighborhoods, these analyses should be extended to other cities and types of residential environments including suburban and rural communities. Furthermore, we considered only those blocks with some residential housing units and hence cannot generalize our results to blocks with no residential land use. Thus, although this study provides evidence that the 2 data sources yield consistent measures of the residential environment, it represents only a first step in assessing the reliability between these data sources, and further investigations that include more items and more geographic locations are warranted. This analysis presents a straightforward strategy that can be used to assess the reliability of other measurements across different geographic contexts.
Conclusions
The levels of reliability found between these 2 methods of data collection suggest that researchers can use measures of the residential environment derived from either data source with roughly equal confidence. For studies aiming to characterize the neighborhood food environment, the data sources provide fairly comparable measures of the types of establishments present in the neighborhood; therefore, selection of one data source over the other should depend on the needs of the researcher and the costs and benefits unique to each method of data collection. For instance, the proprietary data provide complete geographic coverage, whereas the SSO relies on a sample of city blocks; thus, researchers interested in the total availability or density of food or retail establishments would probably find the proprietary data more cost-effective. On the other hand, the SSO provides researchers with a tool to collect more nuanced measures of the residential environment, such as physical and social disorder, and can probably make finer distinctions among establishments because of the opportunity for direct observation. Where such distinctions are necessary, these results suggest that SSO might be the more appropriate method. Therefore, both of these methods of data collection should be included in the toolkit researchers use to investigate the role of the residential environment in health, and they can be used as alternatives or complements, depending on the research objectives and financial resources of investigators.
Acknowledgments
Author affiliations: Robert Wood Johnson Foundation Health & Society Scholars Program, University of Pennsylvania, Philadelphia, Pennsylvania (Michael D. M. Bader); Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pennsylvania (Michael D. M. Bader); Multidisciplinary Research Training in Gerontology Program, Andrus Gerontology Center, University of Southern California, Los Angeles, California (Jennifer A. Ailshire); Department of Sociology, University of Michigan, Ann Arbor, Michigan (Jeffrey D. Morenoff, James S. House); Population Studies Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan (Jeffrey D. Morenoff); Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan (Jeffrey D. Morenoff, James S. House); and Ford School of Public Policy, University of Michigan, Ann Arbor, Michigan (James S. House).
This research was supported by the University of Michigan's Robert Wood Johnson Foundation Health & Society Scholars Program (small grant 045823) and the National Institute of Child Health and Human Development (grants HD38986 and HD050467).
The authors thank Dr. Ana Diez Roux for her contributions to this study and Robert Melendez for his assistance in creating the geographically comparable databases.
Conflict of interest: none declared.
Glossary
Abbreviations
- NAICS: North American Industry Classification System
- SSO: systematic social observation
References
- 1. Robert SA, Reither EN. A multilevel analysis of race, community disadvantage, and body mass index among adults in the US. Soc Sci Med. 2004;59(12):2421–2434. doi: 10.1016/j.socscimed.2004.03.034.
- 2. Ellaway A, Anderson A, Macintyre S. Does area of residence affect body size and shape? Int J Obes Relat Metab Disord. 1997;21(4):304–308. doi: 10.1038/sj.ijo.0800405.
- 3. Maantay J. Asthma and air pollution in the Bronx: methodological and data considerations in using GIS for environmental justice and health research. Health Place. 2007;13(1):32–56. doi: 10.1016/j.healthplace.2005.09.009.
- 4. Lovasi GS, Quinn JW, Neckerman KM, et al. Children living in areas with more street trees have lower prevalence of asthma. J Epidemiol Community Health. 2008;62(7):647–649. doi: 10.1136/jech.2007.071894.
- 5. Diez Roux AV. Residential environments and cardiovascular risk. J Urban Health. 2003;80(4):569–589. doi: 10.1093/jurban/jtg065.
- 6. Diez Roux AV, Merkin SS, Arnett D, et al. Neighborhood of residence and incidence of coronary heart disease. N Engl J Med. 2001;345(2):99–106. doi: 10.1056/NEJM200107123450205.
- 7. Ross CE. Walking, exercising, and smoking: does neighborhood matter? Soc Sci Med. 2000;51(2):265–274. doi: 10.1016/s0277-9536(99)00451-7.
- 8. Diez-Roux AV, Nieto FJ, Caulfield L, et al. Neighbourhood differences in diet: the Atherosclerosis Risk in Communities (ARIC) Study. J Epidemiol Community Health. 1999;53(1):55–63. doi: 10.1136/jech.53.1.55.
- 9. Troutt DD. Thin Red Line: How the Poor Still Pay More. San Francisco, CA: Consumers Union of the U.S., Inc; 1993.
- 10. Giles-Corti B, Donovan RJ. Socioeconomic status differences in recreational physical activity levels and real and perceived access to a supportive physical environment. Prev Med. 2002;35(6):601–611. doi: 10.1006/pmed.2002.1115.
- 11. Saelens BE, Sallis JF, Frank LD. Environmental correlates of walking and cycling: findings from the transportation, urban design, and planning literatures. Ann Behav Med. 2003;25(2):80–91. doi: 10.1207/S15324796ABM2502_03.
- 12. Humpel N, Owen N, Leslie E. Environmental factors associated with adults’ participation in physical activity: a review. Am J Prev Med. 2002;22(3):188–199. doi: 10.1016/s0749-3797(01)00426-3.
- 13. Entwisle B. Putting people into place. Demography. 2007;44(4):687–703. doi: 10.1353/dem.2007.0045.
- 14. Diez Roux AV. Invited commentary: places, people, and health. Am J Epidemiol. 2002;155(6):516–519. doi: 10.1093/aje/155.6.516.
- 15. Sampson RJ, Morenoff JD, Gannon-Rowley T. Assessing “neighborhood effects”: social processes and new directions in research. Annu Rev Sociol. 2002;28:443–478.
- 16. Reiss AJ. Systematic observation of natural social phenomena. Sociol Methodol. 1971;3:3–33.
- 17. Kirtland KA, Porter DE, Addy CL, et al. Environmental measures of physical activity supports: perception versus reality. Am J Prev Med. 2003;24(4):323–331. doi: 10.1016/s0749-3797(03)00021-7.
- 18. Morland K, Wing S, Diez Roux A. The contextual effect of the local food environment on residents’ diets: the Atherosclerosis Risk in Communities Study. Am J Public Health. 2002;92(11):1761–1768. doi: 10.2105/ajph.92.11.1761.
- 19. Moore LV, Diez Roux AV. Associations of neighborhood characteristics with the location and type of food stores. Am J Public Health. 2006;96(2):325–331. doi: 10.2105/AJPH.2004.058040.
- 20. Cummins S, Macintyre S, Davidson S, et al. Measuring neighbourhood social and material context: generation and interpretation of ecological data from routine and non-routine sources. Health Place. 2005;11(3):249–260. doi: 10.1016/j.healthplace.2004.05.003.
- 21. Pikora TJ, Bull FC, Jamrozik K, et al. Developing a reliable audit instrument to measure the physical environment for physical activity. Am J Prev Med. 2002;23(3):187–194. doi: 10.1016/s0749-3797(02)00498-1.
- 22. Pikora T, Giles-Corti B, Bull F, et al. Developing a framework for assessment of the environmental determinants of walking and cycling. Soc Sci Med. 2003;56(8):1693–1703. doi: 10.1016/s0277-9536(02)00163-6.
- 23. Sampson RJ, Raudenbush SW. Systematic social observation of public spaces: a new look at disorder in urban neighborhoods. Am J Sociol. 1999;105:603–651.
- 24. Mujahid MS, Diez Roux AV, Morenoff JD, et al. Assessing the measurement properties of neighborhood scales: from psychometrics to ecometrics. Am J Epidemiol. 2007;165(8):858–867. doi: 10.1093/aje/kwm040.
- 25. Raudenbush SW, Sampson RJ. Ecometrics: toward a science of assessing ecological settings, with application to the systematic social observation of neighborhoods. Sociol Methodol. 1999;29:1–41.
- 26. Zandbergen PA. Comments on Boone et al. “Validation of a GIS facilities database: quantification and implications of error”. Ann Epidemiol. 2008;18(10):823–824. doi: 10.1016/j.annepidem.2008.06.003.
- 27. Morland K, Diez Roux AV, Wing S. Supermarkets, other food stores, and obesity: the Atherosclerosis Risk in Communities Study. Am J Prev Med. 2006;30(4):333–339. doi: 10.1016/j.amepre.2005.11.003.
- 28. Powell LM, Slater S, Mirtcheva D, et al. Food store availability and neighborhood characteristics in the United States. Prev Med. 2007;44(3):189–195. doi: 10.1016/j.ypmed.2006.08.008.
- 29. Small ML, McDermott M. The presence of organizational resources in poor urban neighborhoods: an analysis of average and contextual effects. Soc Forces. 2006;84:1697–1724.
- 30. Reidpath DD, Burns C, Garrard J, et al. An ecological study of the relationship between social and environmental determinants of obesity. Health Place. 2002;8(2):141–145. doi: 10.1016/s1353-8292(01)00028-4.
- 31. Block JP, Scribner RA, DeSalvo KB. Fast food, race/ethnicity, and income: a geographic analysis. Am J Prev Med. 2004;27(3):211–217. doi: 10.1016/j.amepre.2004.06.007.
- 32. Morenoff JD, House JS, Hansen BB, et al. Understanding social disparities in hypertension prevalence, awareness, treatment, and control: the role of neighborhood context. Soc Sci Med. 2007;65(9):1853–1866. doi: 10.1016/j.socscimed.2007.05.038.
- 33. Sampson RJ, Raudenbush SW, Earls F. Neighborhoods and violent crime: a multilevel study of collective efficacy. Science. 1997;277(5328):918–924. doi: 10.1126/science.277.5328.918.
- 34. Pearce J, Blakely T, Witten K, et al. Neighborhood deprivation and access to fast-food retailing: a national study. Am J Prev Med. 2007;32(5):375–382. doi: 10.1016/j.amepre.2007.01.009.
- 35. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
- 36. Boone JE, Gordon-Larsen P, Stewart JD, et al. Validation of a GIS facilities database: quantification and implications of error. Ann Epidemiol. 2008;18(5):371–377. doi: 10.1016/j.annepidem.2007.11.008.