Author manuscript; available in PMC: 2014 Oct 1.
Published in final edited form as: Am J Prev Med. 2013 Oct;45(4):462–473. doi: 10.1016/j.amepre.2013.06.009

Validity of Secondary Retail Food Outlet Data

A Systematic Review

Sheila E Fleischhacker 1, Kelly R Evenson 1, Joseph Sharkey 1, Stephanie BJ Pitts 1, Daniel A Rodriguez 1
PMCID: PMC3779346  NIHMSID: NIHMS506551  PMID: 24050423

Abstract

Context

Improving access to healthy foods is a promising strategy to prevent nutrition-related chronic diseases. To characterize retail food environments and identify areas with limited retail access, researchers, government programs, and community advocates have primarily used secondary retail food outlet data sources (e.g., InfoUSA or government food registries). To advance the state of the science on measuring retail food environments, this systematic review examined the evidence for validity reported for secondary retail food outlet data sources for characterizing retail food environments.

Evidence acquisition

A literature search was conducted through December 31, 2012 to identify peer-reviewed published literature that compared secondary retail food outlet data sources to primary data sources (i.e., field observations) for accuracy of identifying the type and location of retail food outlets. Data were analyzed in 2013.

Evidence synthesis

Nineteen studies met the inclusion criteria. The evidence for validity reported varied by secondary data sources examined, primary data–gathering approaches, retail food outlets examined, and geographic and sociodemographic characteristics. More than half of the studies (53%) did not report evidence for validity by type of food outlet examined and by a particular secondary data source.

Conclusions

Researchers should strive to gather primary data. When relying on secondary data sources, InfoUSA and government food registries showed higher levels of agreement than other secondary data sources and may provide sufficient accuracy for exploring associations between retail food environments and chronic diseases in large study areas.

Introduction

Promising approaches to reducing nutrition-related chronic diseases include environmental and policy strategies such as land-use regulations that permit farmers’ markets and public–private financing programs that incentivize the building of retail food outlets in underserved communities.1,2 These approaches have been informed by research indicating that limited access to nutritious food is associated with a higher risk for chronic diseases.3–5 However, studies examining the relationship between retail food environments and chronic diseases have generated mixed results.6–8

A plausible explanation for these differences may be the lack of consistency and rigor in measuring retail food environments.7–10 The majority of research and tools available to characterize retail food environments and identify areas with limited retail food access use secondary data sources (i.e., data collected by someone else).8,11 Secondary retail food outlet data sources include government sources (e.g., food inspection registries); commercial sources (e.g., InfoUSA); local directories (e.g., Yellow Pages); and omnidirectional imagery (i.e., sources that simultaneously collect images in multiple directions from a single location producing a panoramic view, such as Google Street View).

Increasingly, primary retail food outlet data sources (i.e., data collected through field observations by the team conducting the research) represent the gold standard in characterizing retail food environments, given that secondary data sources have been found to under- and overestimate food access when compared to primary data sources.9,12–14 To advance the state of the science on measuring retail food environments, the current systematic review examined the evidence for validity reported for secondary data sources for characterizing retail food environments. This review focused on criterion-related validity, defined as the accuracy with which secondary data sources identified the type and location of retail food outlets, using primary data to represent the gold standard.15

Evidence Acquisition

A systematic review was conducted through December 31, 2012 to identify peer-reviewed published literature that compared secondary data sources to primary data sources for accuracy of identifying the type and location of retail food outlets (Appendix A, available online at www.ajpmonline.org). Table 1 provides operational definitions used throughout the coding process. Levels of agreement for evidence of validity reported were interpreted using the Landis and Koch criteria (<0.00 poor, 0.00–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect).16 Data were analyzed in 2013.
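As a minimal sketch of how the Landis and Koch bands quoted above could be applied to a reported agreement value, the following Python function maps a value to its label. The function name `landis_koch_label` is hypothetical, and assigning values that fall between the two-decimal band edges (e.g., 0.205) to the higher band is an assumption; the review does not specify how such values were handled.

```python
def landis_koch_label(value: float) -> str:
    """Map an agreement value to the Landis and Koch label used in this review.

    Bands: <0.00 poor, 0.00-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate,
    0.61-0.80 substantial, 0.81-1.00 almost perfect.
    Assumption: a value between two band edges goes to the higher band.
    """
    if not -1.0 <= value <= 1.0:
        raise ValueError("agreement value must lie between -1 and 1")
    if value < 0.0:
        return "poor"
    # Upper edge of each band, in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if value <= upper:
            return label
    return "almost perfect"  # not reached: value <= 1.0 is guaranteed above


if __name__ == "__main__":
    for v in (0.24, 0.43, 0.65, 0.85):
        print(v, landis_koch_label(v))
```

Applied to values of the kind reported in Table 5, a sensitivity of 0.65 falls in the "substantial" band and a kappa of 0.24 in the "fair" band.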

Table 1.

Key terms and operational definitions

| Term | Operational Definition |
| --- | --- |
| Cohen’s Kappa coefficient | The agreement between primary and secondary retail food outlet data sources that takes into account the agreement occurring by chance |
| Concordance | The proportion of the retail food outlets both observed during primary data collection and listed by the secondary retail food outlet data sources among all the outlets ascertained via primary and secondary retail food outlet data sources |
| GPS-assisted | GPS or other forms of remote sensing technologies capture precise locational data (i.e., the latitude and longitude of a retail food outlet) |
| Ground-truthed | Primary data on retail food outlet type and location, gathered by trained observers not guided in the field by a list and/or map of retail food outlets identified through secondary data sources. A systematic canvass of the targeted study area is conducted, with or without the use of GPS or other remote sensing technologies. |
| Intra- or inter-rater reliability | Evidence of intra-rater reliability included comparisons of retail food outlet data entered by the same rater. Evidence of inter-rater reliability included comparisons of raters’ decisions about whether to identify a retail food outlet as a convenience store or fast-food restaurant or both, as well as how to distinguish a small grocery store from a supermarket from a convenience store. |
| Omnidirectional Observations | Uses omnidirectional imagery (i.e., sources that simultaneously collect images in multiple directions from a single location producing a panoramic view such as Google Street View) to visually tour a targeted study area, not guided by a list of predetermined retail food outlets in the study area from primary or secondary data sources |
| On-Site Verification | Primary data on retail food outlet type and location, gathered by trained observers guided in the field by a list and/or map of food outlets identified through secondary data sources that could occur with or without a systematic canvass of the targeted study area and with or without the use of GPS or other remote sensing technologies |
| Percentage Agreement | The percentage of the primary retail food outlet data that matched the secondary retail food outlet data |
| Positive Predictive Value | The proportion of the retail food outlets listed by the secondary retail food outlet data sources that were observed during primary data collection |
| Primary Retail Food Data | Data collected through direct field observations by the team conducting the research to characterize the local retail food environment. Primary data are considered the gold standard to characterize retail food environments given that secondary retail food outlet data sources have been found to under- and over-estimate food access, when compared to primary data. |
| Retail Food Outlet | Retail or commercial outlet in the business of selling food to the public; does not include household availability or institutional food service such as child care centers, schools, hospitals, correctional facilities, or municipal |
| Secondary Retail Food Data | Data collected by someone else; for example, government sources, such as local food inspection registries; commercial sources, such as InfoUSA and Dun and Bradstreet; online directories, such as Yellow Pages; and omnidirectional sources, such as Google Street View and Google Earth. These sources have been shown to under- and over-count the number of retail food outlets in comparison to primary data. |
| Sensitivity | The number of retail food outlets ascertained via primary data that matched outlets ascertained via secondary data source(s), divided by that number plus the number of outlets ascertained via primary data that did not match outlets ascertained via secondary data source(s) |
| Specificity | The proportion of negatives (i.e., nonretail food outlets) that are correctly identified as not being retail food outlets |
| Systematic Canvass | Thorough and detailed primary data examination of a defined geographic setting using defined geographic parameters. Evidence of a systematic canvass includes a detailed description or discussion of study maps marking areas to include and exclude during primary data collection, with canvassing not limited to the areas where secondary data sources indicated the presence of a retail food outlet. Ground-truthed studies by definition include systematic canvasses, while onsite verification studies could occur with or without a systematic canvass. |
| Targeted Observational Field Data | Primary data gathered by trained observers that targets a specific study area such as a study participant’s residential block or selected street block segments. These observations do not systematically canvass beyond the targeted field areas. These observations may or may not use GPS or other remote sensing technologies. These studies do not include a list of predetermined resources in the study area to target the field observations, but the observational area is limited or guided by a participant’s residential address or based on study selection criteria such as high-walkability block segments in New York City. |
| Validity | This review focused on criterion-related validity, defined as the accuracy with which secondary data sources identified the type and location of retail food outlets, using primary data to represent the gold standard. |
| Virtual Verification | Uses omnidirectional imagery such as Google Street View or Google Earth to visually verify the existence of a retail food outlet identified through primary or secondary data sources |
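To make the Table 1 definitions concrete, the following Python sketch computes sensitivity, positive predictive value, and concordance from two sets of outlet identifiers, and Cohen's kappa from a 2x2 table of counts. It is an illustration under stated assumptions, not the procedure of any reviewed study: the function names, the outlet identifiers, and the example counts are hypothetical, and outlets are assumed to have already been matched between the primary and secondary sources. Kappa additionally needs a count of candidate locations where neither source found an outlet, which many study designs do not produce.

```python
def agreement_metrics(primary: set[str], secondary: set[str]) -> dict[str, float]:
    """Sensitivity, PPV, and concordance as defined in Table 1.

    primary   -- outlet IDs observed in the field (treated as the gold standard)
    secondary -- outlet IDs listed by the secondary data source
    """
    matched = len(primary & secondary)         # observed and listed
    primary_only = len(primary - secondary)    # observed but not listed
    secondary_only = len(secondary - primary)  # listed but not observed
    if matched + primary_only == 0 or matched + secondary_only == 0:
        raise ValueError("both sources must contain at least one outlet")
    return {
        "sensitivity": matched / (matched + primary_only),
        "ppv": matched / (matched + secondary_only),
        "concordance": matched / (matched + primary_only + secondary_only),
    }


def cohens_kappa(both: int, primary_only: int, secondary_only: int, neither: int) -> float:
    """Chance-corrected agreement from a 2x2 table of counts."""
    n = both + primary_only + secondary_only + neither
    if n == 0:
        raise ValueError("the table must contain at least one observation")
    observed = (both + neither) / n
    # Chance agreement: product of the marginal proportions, summed over yes/no.
    p_yes = ((both + primary_only) / n) * ((both + secondary_only) / n)
    p_no = ((secondary_only + neither) / n) * ((primary_only + neither) / n)
    expected = p_yes + p_no
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical outlet IDs, for illustration only.
    field = {"A", "B", "C", "D"}
    listed = {"B", "C", "D", "E", "F"}
    print(agreement_metrics(field, listed))
    print(cohens_kappa(both=3, primary_only=1, secondary_only=2, neither=4))
```

In this made-up example, three of the four observed outlets are listed (sensitivity 0.75), three of the five listed outlets exist (positive predictive value 0.60), and three of the six outlets found by either source are found by both (concordance 0.50).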

Evidence Synthesis

Nineteen studies met the inclusion criteria12–14,17–32; relevant information on four of these studies was published elsewhere.33–36 The following summarizes the methods used and the evidence for validity reported by secondary data sources examined, primary data–gathering approaches, retail food outlets examined, and geographic and sociodemographic characteristics.

Methods Used

Secondary data sources examined

Four types of secondary data sources were used to identify retail food outlets (Table 2). InfoUSA or ReferenceUSA were examined most frequently (32%).12,13,18,20,28,30 Government food registries were examined in 11 studies (58%), but there was wide variability in jurisdiction and the nature of what was monitored (e.g., authorized retailers of U.S. food and nutrition assistance programs versus state-authorized lottery ticket retailers). All but one study examining local directories specified the name of the directory utilized.14 The only type of omnidirectional imagery examined came from Google, but the approaches varied. To illustrate, two studies18,24 utilized Google Street View to virtually verify the retail food outlets identified by other secondary data sources, whereas Rundle et al.32 gathered targeted observational field data (i.e., canvassed a limited and specific area such as a participant’s residential block).

Table 2.

Secondary retail food outlet data sources examined (n=19)a

| Secondary Retail Food Outlet Data Source (note b) | n (% of Total) | References |
| --- | --- | --- |
| Commercial Sources | | |
| Dun & Bradstreet (U.S.) | 3 (16) | 12,13,18 |
| InfoUSA or ReferenceUSA | 6 (32) | 12,13,18,20,28,30 |
| InfoCanada | 1 (5) | 25 |
| Krak Denmark (Web-based search engine) | 1 (5) | 29 |
| Stockman Company (Denmark retail food chains) | 1 (5) | 29 |
| Tamec Inc. (Canada) | 1 (5) | 26 |
| Government Sources | | |
| City Health Department | 4 (21) | 27 (Scotland); 21,22 (United Kingdom); 24 (U.S.) |
| County Health Department (U.S.) | 1 (5) | 18 |
| State Department of Agriculture (U.S.) | 5 (26) | 14,18,20,23,30 |
| State Department of Health–authorized U.S. Department of Agriculture’s Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) retailers (U.S.) | 1 (5) | 23 |
| State Department of Taxation and Finance (U.S.) | 1 (5) | 23 |
| State Department of Health (U.S.) | 1 (5) | 13 |
| State Liquor Authority (U.S.) | 1 (5) | 23 |
| State-authorized lottery ticket retailers (U.S.) | 1 (5) | 23 |
| U.S. Department of Agriculture–authorized Supplemental Nutrition Assistance Program (SNAP) retailers | 1 (5) | 23 |
| Country Food Administration (Denmark) | 1 (5) | 17 |
| National Tax Registry (Denmark) | 1 (5) | 29 |
| Local Directories Sources | | |
| Online | 6 (32) | 14,18,19,22,25,26 |
| Online: Canada411 (Canada) | 1 (5) | 26 |
| Online: Google (Canada) | 1 (5) | 26 |
| Online: Montrealplus (Canada) | 1 (5) | 26 |
| Online: Pagesjaunes (Canada) | 1 (5) | 26 |
| Online: Toutmontreal.com (Canada) | 1 (5) | 26 |
| Online: Unidentified Internet Telephone Directories (U.S.) | 1 (5) | 14 |
| Online: Yahoo! Yellow Pages (U.S.) | 1 (5) | 19 |
| Online: Yellow Pages | 3 (16) | 25 (Canada); 22 (United Kingdom); 18 (U.S.) |
| Telephone Book(s) | 3 (16) | 14,22,29 |
| Telephone Book(s): Teledanmark (Denmark Telephone Company) | 1 (5) | 29 |
| Telephone Book(s): Unidentified Local/Area Telephone Directories (U.S.) | 1 (5) | 14 |
| Telephone Book(s): Yellow Pages (United Kingdom) | 1 (5) | 22 |
| Omnidirectional Sources | | |
| Google Earth (U.S.) | 2 (10) | 19,31 |
| Google Street View (U.S.) | 3 (16) | 18,24,32 |
| Google Maps Denmark | 1 (5) | 29 |
a. Thirteen of the 19 studies reviewed (68%) examined more than one secondary data source (in descending order of number of sources examined): six sources18,23,26 (Dun & Bradstreet, ReferenceUSA, County Health Department, State Department of Agriculture, Online Yellow Pages, and Google Street View18; State Department of Agriculture, State Department of Health–authorized WIC retailers, State Department of Taxation and Finance, State Liquor Authority, state-authorized lottery ticket retailers, and USDA–authorized SNAP retailers23; and Tamec Inc., Canada411, Google, Montrealplus, Pagesjaunes, and Toutmontreal.com26); five sources29 (Krak Denmark, Stockman, National Tax Registry, Teledanmark, and Google Maps Denmark); three sources13,14,22 (Dun & Bradstreet, InfoUSA, and State Department of Health13; State Department of Agriculture, Unidentified Internet Telephone Directories, and Unidentified Local/Area Telephone Directories14; and City Health Department, Online Yellow Pages, and Yellow Pages22); two sources12,19,20,24,25,30 (Dun & Bradstreet and InfoUSA12; Yahoo! Yellow Pages and Google Earth19; ReferenceUSA and State Department of Agriculture20; City Health Department and Google Street View24; InfoCanada and Online Yellow Pages25; and InfoUSA and State Department of Agriculture30); and one source17,21,27,28,31,32 (Denmark Food Administration17; City Health Department21,27; InfoUSA28; Google Earth31; and Google Street View32).

b. The country where data were gathered is given in parentheses if not explicit in the source’s title.

USDA, U.S. Department of Agriculture; WIC, U.S. Department of Agriculture’s Special Supplemental Nutrition Program for Women, Infants, and Children

Secondary data were typically gathered in the same year as primary data; the exception was a 5-year time lag.31 Few studies elaborated on how secondary data were entered and edited or how duplicate or possible duplicate retail food outlets were eliminated or combined.13,18,24 Only one study did not use GIS software or other digital mapping systems to map the location of retail food outlets ascertained from at least one of the secondary data sources examined.27 Nevertheless, only three studies reported the number of retail food outlets ascertained via secondary data sources which were successfully geocoded (i.e., located the associated latitude and longitude).12,18,28

Primary data–gathering approaches

Great variability was found in primary data–gathering approaches (Table 3), with only five studies (26%) reporting inter-rater reliability for their protocol.18,19,28,30,31 Four studies (21%) were ground-truthed and GPS-assisted (i.e., conducted a systematic canvass without a map or list of retail food outlets guiding the observations).12,14,17,18 Another three studies (16%) gathered targeted observational field data.28,31,32 The remaining 12 studies (63%) conducted on-site verification (i.e., conducted a canvass guided by a list and/or map of retail food outlets ascertained via secondary data sources).13,19–27,29,30 Use of a systematic canvass varied across on-site verification studies, although verifying whether such a canvass occurred was difficult.

Table 3.

Primary retail food outlet data-gathering approaches (n=19)

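References are listed by location of study area.

| Primary Data–Gathering Approach | n (% of Total) | Canada | Denmark | Scotland | United Kingdom | U.S. |
| --- | --- | --- | --- | --- | --- | --- |
| Ground-Truthed, GPS-assisted | 4 (21) | | 17 | | | 12,14,18 |
| On-Site Verification: With Systematic Canvass | 2 (10) | | | | 21,22 | |
| On-Site Verification: With Systematic Canvass, GPS-assisted | 1 (5) | | | | | 19 |
| On-Site Verification: Without Systematic Canvass | 4 (21) | 26 | 29 | 27 | | 24 |
| On-Site Verification: Without Systematic Canvass, GPS-assisted | 2 (10) | | | | | 13,23 |
| On-Site Verification: Targeted Observation, GPS-assisted | 3 (16) | 25 | | | | 20,30 |
| Targeted Observational Field Data | 3 (16) | | | | | 28,31,32 |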

Regardless of the primary data approach used, studies varied in the detail provided on what type of GPS unit was used, where exactly GPS data were gathered, whether a camera-based GPS was used, and how GPS data were downloaded and analyzed.12–14,17–20,23,25,30 A few studies gathered primary data for only a portion of their larger study area,17,21 whereas others24,27 restricted the portion of retail food outlets ascertained via secondary data sources that they verified during primary data collection. Piloting or pretesting of primary data collection instruments was explicitly mentioned in five studies (26%).14,18,19,31,34 Re-canvassing the study area to look for retail food outlets identified via secondary data sources that did not match the outlets identified via primary data was noted in three studies (16%).13,14,20 Only one study18 provided the estimated cost of primary data collection, although another study20 mentioned that the cost was minimal.

Retail food outlets examined

More than half of the studies (53%) included a range of retail food outlets such as grocery stores and restaurants (Table 4).12,13,18–22,28,29,31 Several studies excluded restaurants,14,23,26,27 and two additional studies excluded full-service restaurants.24,30 Farmers’ markets were specifically examined by four predominantly rural studies.18–20,30 To define and classify the retail food outlets examined, most studies created their own definitions or classification schemes17,20–24,27,31,32 or used the North American Industry Classification System (NAICS)12–14,18,19,25,26,28,30 (Table 4 and Appendix B, available online at www.ajpmonline.org). Only three studies18,19,35 reported percentage agreement between independent coders for classifying food outlets.

Table 4.

Retail food outlets examined (n=19)

References are listed by location of study area.

| | n (% of Total) | Canada | Denmark | Scotland | United Kingdom | U.S. |
| --- | --- | --- | --- | --- | --- | --- |
| Inclusions and Exclusions | | | | | | |
| Examined a Range of Food Outlets | 10 (53) | | 29 | | 21,22 | 12,13,18–20,28,31 |
| Excluded All Restaurants | 4 (21) | 26 | | 27 | | 14,23 |
| Excluded Full-Service Restaurants | 2 (10) | | | | | 24,30 |
| Examined Convenience Stores and Restaurants | 1 (5) | 25 | | | | |
| Examined Fast-food Restaurants Only | 1 (5) | | 17 | | | |
| Examined Licensed Food Carts Only | 1 (5) | | | | | 32 |
| Definitions and Classifications | | | | | | |
| Own Definition(s) or Classification System (note a) | 9 (47) | | 17 | 27 | 21,22 | 20,23,24,31,32 |
| NAICS or SIC system, including modifications | 9 (47) | | | | | |
| NAICS or SIC: Specified NAICS or SIC Codes | | | | | | 12,13,18,28,30 |
| NAICS or SIC: Did Not Specify NAICS or SIC Codes | | 25,26 | | | | 14,19 |
| European Business Codes or NACE codes | 1 (5) | | 29 | | | |
| NEMS, including modifications (note b) | 1 (5) | | | | | 18,c |
a

See Appendix B (available online at www.ajpmonline.org) for specific definitions and classification schemes used

b

Fleischhacker et al. used NAICS codes for two commercial retail food outlet data sources in addition to a modified NEMS approach for classifying retail food outlets gathered from both secondary and primary data sources (see Appendix B, available online at www.ajpmonline.org, for specific definitions and classification schemes used)

NACE, Nomenclature des Activités Économiques; NAICS, North American Industry Classification System; NEMS, Nutrition Environment Measures Survey; SIC, Standard Industrial Classification

Geographic and sociodemographic characteristics

Most studies (63%) were conducted in the U.S. (Appendix C, available online at www.ajpmonline.org).12–14,18–20,23,24,28,30–32 All but three14,19,30 of the studies included urban settings. More recent studies examined various levels of urbanization.12,13,17,18,20,21,25 Although a variety of approaches were used to describe a study area’s urbanization, population density12 and the U.S. Department of Agriculture’s Rural–Urban Commuting Areas13,18 were most often used. Similarly, various geographic units of analyses defined the study areas, ranging in size from a city27 to block segments.32 In the 11 studies (58%)12,14,20,21,23,24,26–28,30,32 identifying the SES of the study area, as well as in the seven studies (37%)12,14,18,23,24,26,28 describing the racial/ethnic minority composition of the study area, the purpose for and approaches to differentiating study areas by sociodemographic characteristics varied (Appendix D, available online at www.ajpmonline.org).

Evidence for Validity Reported

By secondary data sources examined

Agreement varied between and among the four types of secondary data sources examined (Appendixes D and E, available online at www.ajpmonline.org). The levels of agreement reported by the nine studies examining commercial sources ranged from slight to almost perfect.12,13,18,20,25,26,28–30 The most evidence for validity reported on a particular secondary data source was for InfoUSA. Indeed, six studies12,13,18,20,28,30 examined InfoUSA; four of these six studies12,13,18,30 reported advanced statistical analyses specific to InfoUSA by type of retail food outlet examined.

Although 12 studies examined government sources, there was great variability in the types of sources examined and in the evidence for validity reported. Evidence for validity was particularly difficult to assess for local directories because three studies did not report findings by the individual local directory source examined.14,19,26 Few advanced statistical analyses were reported on the evidence for validity of omnidirectional sources by type of food outlet examined, and little information came from rural settings. Results on geospatial positional error between secondary and primary data sources were limited, and the studies that reported them mostly found significant positional errors for secondary data sources.13,14,17,24,25
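The review does not describe how the included studies computed positional error. One plausible approach, sketched here under that assumption, is the great-circle (haversine) distance between the coordinates geocoded from a secondary source and the GPS coordinates recorded in the field for the same outlet. The function name `haversine_m`, the example coordinates, and the mean Earth radius of 6,371 km are illustrative choices.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed value)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Hypothetical coordinates for one outlet: geocoded address vs. field GPS reading.
    geocoded = (35.9132, -79.0558)
    observed = (35.9140, -79.0549)
    print(f"positional error: {haversine_m(*geocoded, *observed):.1f} m")
```

A per-outlet distance of this kind could then be summarized (for example, as a median by outlet type or data source) to compare positional accuracy across secondary sources.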

By primary data–gathering approach

Although comparing the four ground-truthed studies with the 12 onsite verification studies is statistically challenging, the levels of agreement tended to be lower for the ground-truthed studies than for the onsite verification studies (Table 5). Only two of the four ground-truthed studies examined the exact same data sources. That is, Powell et al.12 and Fleischhacker et al.18 examined Dun & Bradstreet and InfoUSA, and both reported mixed results across these two sources and across the types of food outlets examined.12,18 Both reported that secondary data sources had limited food outlet classification accuracy. For onsite verification studies,13,19–27,29,30 the levels of agreement by secondary data source varied across the studies, but sensitivity tended to be higher for commercial sources (e.g., 0.60 [ref. 13] to 0.96 [ref. 30]) than for government sources (e.g., 0.46 [ref. 23] to 0.85 [ref. 24]) and local directories (e.g., 0.52 [ref. 22] to 0.74 [ref. 29]).

Table 5.

By primary data–gathering approach, evidence for validity of secondary retail food data reported (n=19)a

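| Measure | Commercial Sources (note b) | Government Sources (note c) | Local Directories (note d) | Omnidirectional Sources (note e) |
| --- | --- | --- | --- | --- |
| Ground-truthed, GPS-Assisted (n=4; refs. 12,14,17,18) | | | | |
| Percentage Agreement (note f) | Almost perfect: 0.85 (ref. 18, note g); 0.90 (ref. 12, note g) | Substantial to almost perfect: 0.64 (ref. 14, note h); 0.76 (ref. 17); 0.82 (ref. 18, note g) | Substantial: 0.64 (ref. 14, note h); 0.77 (ref. 18) | |
| Sensitivity | Moderate to substantial: 0.59 (ref. 12, note g); 0.65 (ref. 18, note g) | Moderate to almost perfect: 0.42 (ref. 18, note g); 0.82 (ref. 17) | Moderate: 0.55 (ref. 18) | |
| Positive Predictive Value | Moderate to substantial: 0.49 (ref. 18, note g); 0.62 (ref. 12, note g) | Fair to almost perfect: 0.31 (ref. 18, note g); 0.92 (ref. 17) | Moderate: 0.41 (ref. 18) | |
| Cohen’s Kappa Coefficient | Moderate: 0.43 (ref. 18, note g) | Fair: 0.24 (ref. 18, note g) | Fair: 0.24 (ref. 18) | |
| Concordance | Moderate: 0.42 (ref. 18, note g); 0.44 (ref. 12, note g) | Fair: 0.26 (ref. 18, note g) | Fair: 0.35 (ref. 18) | |
| On-Site Verification (n=12; refs. 13,19–27,29,30) | | | | |
| Percentage Agreement (note f) | Substantial to almost perfect: 0.65 (ref. 20, note i); 0.72 (ref. 30); 0.73 (ref. 13, note g); 0.77 (refs. 25,26); 0.86 (ref. 29, note g) | Fair to almost perfect: 0.34 (ref. 22); 0.50 (ref. 30); 0.64 (ref. 29); 0.70 (ref. 21); 0.77 (ref. 13); 0.80 (ref. 20); 0.85 (ref. 24); 0.86 (ref. 23, note g); 0.88 (ref. 27) | Fair to almost perfect: 0.37 (ref. 19, note h); 0.54 (ref. 22, note g); 0.65 (ref. 26, note g); 0.71 (ref. 29); 0.88 (ref. 25) | Fair to substantial: 0.37 (ref. 19, note h); 0.69 (ref. 24); 0.78 (ref. 29) |
| Sensitivity | Moderate to almost perfect: 0.60 (ref. 13, note g); 0.84 (ref. 26); 0.90 (ref. 29, note g); 0.96 (ref. 30) | Moderate to almost perfect: 0.46 (ref. 23, note g); 0.50 (ref. 30); 0.68 (ref. 13); 0.75 (ref. 29); 0.84 (refs. 21,22); 0.85 (ref. 24) | Moderate to substantial: 0.52 (ref. 22, note g); 0.66 (ref. 26, note g); 0.74 (ref. 29) | Almost perfect: 0.81 (ref. 29) |
| Positive Predictive Value | Substantial to almost perfect: 0.70 (ref. 30); 0.82 (ref. 13, note g); 0.90 (ref. 26); 0.94 (ref. 29, note g) | Almost perfect: 0.81 (ref. 29); 0.82 (ref. 21); 0.89 (ref. 13); 0.89 (ref. 23, note g); 0.92 (ref. 22); 1.00 (ref. 30) | Almost perfect: 0.81 (ref. 22, note g); 0.95 (ref. 29); 0.98 (ref. 26, note g) | Almost perfect: 0.95 (ref. 29) |
| Concordance | Almost perfect: 0.94 (ref. 29, note g) | Fair: 0.23 (ref. 29) | Fair: 0.27 (ref. 29) | Almost perfect: 0.87 (ref. 29) |
| Targeted Observational Field Data (n=3; refs. 28,31,32) | | | | |
| Percentage Agreement (note f) | Almost perfect: 0.88 (ref. 28) | | | Fair to almost perfect: 0.36 (ref. 32); 0.92 (ref. 31) |
| Cohen’s Kappa Coefficient | Moderate: 0.48 (ref. 28) | | | Fair: 0.21 (ref. 31) |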
a

Levels of agreement for all evidence for validity findings reported were interpreted using the Landis and Koch criteria (<0.00 poor, 0.00–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect).

b

Averages findings reported on Dun & Bradstreet (U.S.); InfoUSA or ReferenceUSA; InfoCanada; Krak Denmark (Web-based search engine); Stockman Company (chain food addresses); and Tamec Inc.

c

Averages findings reported on City Health Department (United Kingdom and U.S.); County Health Department (U.S.); State Department of Agriculture (U.S.); State Department of Health–authorized WIC retailers (U.S.); State Department of Taxation and Finance (U.S.); State Department of Health (U.S.); State Liquor Authority (U.S.); state-authorized lottery ticket retailers (U.S.); USDA-authorized Supplemental Nutrition Assistance Program (SNAP) retailers; Country Food Administration (Denmark); and National Tax Registry.

d

Averages findings reported on the variety of online and local telephone directories examined.

e

Averages findings reported on Google Earth (U.S.); Google Street View (U.S.); and Google Maps Denmark.

f

Frequencies or dispositions percentages, when necessary, were used to calculate a percentage agreement.

g

Average findings reported across a combination of data sources (e.g., ReferenceUSA and Dun & Bradstreet or multiple government sources)

h

Not all studies reported evidence for validity by specific data source (e.g., Sharkey14 grouped local/area telephone directories, Internet telephone directories, and a list of Current Food Establishment Group Firms from the Texas Department of Agriculture) so the total evidence reported was used for each data source examined.

i

Comparisons were made between results generated using primary versus secondary data for fast-food density and proximity, convenience store proximity, and food deserts.

USDA, U.S. Department of Agriculture; WIC, U.S. Department of Agriculture’s Special Supplemental Nutrition Program for Women, Infants, and Children

By retail food outlets examined

More than half of the studies (53%) did not report evidence for validity by type of retail food outlet for a particular secondary data source examined (e.g., evidence for validity for grocery stores for Dun & Bradstreet; Table 5 and Appendixes D and E).14,19–24,26,27,29 For convenience stores, a number of studies noted that these outlets were one of the more challenging categories to match and were more often missed by secondary data sources in comparison to other retail categories.14,18,19,26 Evidence for validity reported was also mixed for general merchandise stores, often defined differently across studies. All but three studies17,25,32 examined grocery stores with varying sensitivity (0.46 [ref. 23] to 0.99 [ref. 30]) and positive predictive value (0.59 [ref. 12] to 0.98 [ref. 26]).

Supermarkets or supercenters were analyzed separately from grocery stores in six studies.12,14,19,23,27,30 The retail food outlet type studied the least was specialty markets and shops, and the operational definition for this category varied. For instance, farmers’ markets or produce stands were included in specialty stores in one study,18 whereas others19,20,30 examined these outlets as their own category.

Most studies (68%) examined at least one type of restaurant.12,13,18–22,24,25,28–31 Government sources that specifically maintained food registries for inspection purposes tended to have greater evidence for validity for restaurants than other secondary data sources examined (e.g., Liese13). Several studies compared full-service to fast-food/take out or analyzed restaurants with more specificity such as franchised limited-service, sandwich shop, or pizzeria.12,13,19,22,28,31 One study noted similar sensitivities for government data for three types of eating establishments (full-service, franchised limited-service, and nonfranchised limited-service), but sensitivities varied for commercial data (0.45–0.63, Dun & Bradstreet; 0.61–0.80, InfoUSA).13 Some studies noted relatively similar findings across eating establishment types,19,28,31 but differences were noted.12,22

By geographic and sociodemographic characteristics

Slightly higher levels of evidence for validity of secondary data sources were found in urban versus rural areas (Appendix E, available online at www.ajpmonline.org).12,13,17,18,20,21,25 For example, in an eight-county study in South Carolina, no marked differences in evidence for validity were found across levels of urbanization, except Dun & Bradstreet showed greater sensitivity in urban than rural tracts.13 Another study conducted in Illinois found higher levels of agreement for Dun & Bradstreet in suburban versus urban tracts.12 But for InfoUSA, no differences in sensitivity for most retail outlet types across levels of urbanization were found; however, in rural compared to urban tracts, convenience stores and fast-food restaurants had lower levels of agreement.

A study area’s SES12,21,24,27,28 or race/ethnicity composition12,24,26,28 had little effect on the evidence for validity reported. For example, although agreement for supermarkets and grocery stores did not differ across census tracts of varying income levels in Illinois for InfoUSA, agreement was higher in low- compared to middle-income tracts for Dun & Bradstreet.12 On the other hand, agreement for convenience stores and fast-food restaurants did not differ across tracts for Dun & Bradstreet, but for InfoUSA, agreement was lower in low- versus middle-income tracts for convenience stores and was also lower in low-income compared to high-income tracts for nonchain fast-food restaurants.

No differences were reported for supermarkets and grocery stores between predominately white versus black tracts. Yet, agreement was higher in mixed-race tracts compared to black tracts. In Hispanic versus non-Hispanic tracts, agreement was higher for supermarkets and grocery stores. For convenience stores, agreement was higher in white versus black and mixed-race tracts for both commercial databases, but in Dun & Bradstreet, agreement was higher for black compared to mixed-race tracts and was lower in Hispanic versus non-Hispanic tracts. For InfoUSA, agreement was also lower for convenience stores in Hispanic versus non-Hispanic tracts. For nonchain fast-food restaurants, agreement was higher in white compared to black and mixed-race tracts and higher in non-Hispanic versus Hispanic tracts for Dun & Bradstreet; for InfoUSA, agreement did not differ except that it was lower in mixed-race versus white tracts.

Discussion

The evidence for validity reported from 19 studies demonstrates differences in accuracy among and between commercial, government, local directory, and omnidirectional data sources for characterizing retail food environments. Much work remains to identify which secondary data source or combination of secondary data sources is best for characterizing retail food environments. Future work can help improve consistency in the gathering, editing, geocoding, and analyzing of secondary data sources.11 Although commonly used to examine associations between retail food environments and chronic diseases,7,8,37,38 Dun & Bradstreet had lower validity than InfoUSA.12,13,18,20,28,30

Certain government sources showed promise, especially if used for identifying the specific types of food outlets regulated by the agency maintaining the registry. Still, the accuracy and accessibility of a registry depend on the agency creating and maintaining the registry; therefore, future research and practice should examine the evidence for validity of the particular registry before relying on the data for characterizing retail food environments. As for local directories, low-resource projects or community food assessments may find these free and relatively accessible data sources worthwhile, but local directories tended to have lower validity than commercial or government data sources. Omnidirectional sources depend on the quality and timing of the visual data capture and might have limited utility in rural settings and in areas with limited or restricted Google coverage (www.google.com/streetview). For certain urban areas, nonetheless, Google Street View and Google Earth may be low-resource options.31,32,39

No studies, to our knowledge, have compared various primary data approaches (e.g., ground-truthed versus on-site verification) to help determine which approach is optimal for assessing the validity of secondary data sources. Only four studies12,14,17,18 ground-truthed, according to the operational definition listed in Table 1, although other studies used the term, illustrating the importance of agreeing on nomenclature and definition of terms. Therefore, further work is needed that compares evidence for validity of secondary retail food outlet data sources reported across various primary data–gathering strategies in diverse geographic and sociodemographic settings. If this work accounts for similarities and differences in methodology, analytic approaches, accuracy, time, and resources, then the findings can guide researchers interested in understanding the evidence for validity of secondary data sources used for large study areas, where gathering primary data would be time-consuming, expensive, and impractical.

One possible approach might be examining smaller subsets of the larger study area to garner insights on evidence for validity for secondary data sources used in the larger study. Even for smaller areas, more work, particularly pertaining to cost–benefit analysis, is needed to guide researchers and practitioners on whether to gather primary data or use multiple data sources (which still demands time to gather, edit, merge, and analyze)13,18 or a combination of primary and secondary data sources. Research examining validity over time would also advance the field’s understanding of how to update primary or secondary data sources to accurately capture store closures and renovations (e.g., updated infrastructure at a corner store to offer more fresh produce or changes in store offerings as in the case of a Target® converting to a Super Target®).27 Besides data accuracy, there is a limited understanding of whether or not there is added value in using primary data collection strategies that engage key stakeholders through the first-hand observation of their community’s obstacles and potential solutions for accessing healthy, affordable foods.20,40

A need continues for consistent use of common measures and methods to classify retail food outlets and analyze evidence for validity for characterizing retail food outlets.41,42 The National Collaborative on Childhood Obesity Research (NCCOR) Measures Registry has been working on addressing this need and has the potential to facilitate more transdisciplinary dialogue about the similarities and differences between methods used and evidence for validity reported for secondary data sources for characterizing a variety of other health-related resources such as physical activity facilities (www.nccor.org/projects/measures/index.php).43–45 More research can also help determine where outlet specificity is needed to better gauge evidence for validity of a secondary data source or best satisfy a particular study’s purpose; for example, supermarkets versus grocery stores, chain convenience stores versus independent or single-unit stores, full-service versus fast-food restaurants, and farmers’ markets versus mobile markets.

Another challenge to reviewing the evidence for validity of secondary retail food data sources was that more than half of the studies did not report evidence for validity by type of food outlet for a particular secondary data source examined. Certain secondary data sources may perform better or worse for specific types of food outlets. As one example, government food registries usually only gather data on regulated retail food outlets; studies, however, usually do not elaborate on the specific jurisdictional rules, regulations, or monitoring and enforcement practices.13,18 Future examinations using food registries might provide more explicit detail on the outlets reached (and not), and may exclude from the analysis those retail food outlets not captured by the source to better understand the source’s validity for the outlets it regulates.
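As a hedged illustration of why scope matters, the minimal Python sketch below computes sensitivity and positive predictive value (PPV) for a registry twice: once against all field-verified outlets, and once restricted to the outlet types the registry regulates. The outlet records, identifiers, and set of regulated types are all hypothetical, and matching on a shared identifier is a simplifying assumption; in practice, matching would typically rely on outlet name and location.

```python
from typing import NamedTuple


class Outlet(NamedTuple):
    outlet_id: str    # hypothetical identifier shared by both lists
    outlet_type: str  # e.g., "supermarket", "convenience"


def validity(field, source):
    """Return (sensitivity, PPV) of a secondary source against field data."""
    field_ids = {o.outlet_id for o in field}
    source_ids = {o.outlet_id for o in source}
    matched = field_ids & source_ids
    sensitivity = len(matched) / len(field_ids) if field_ids else float("nan")
    ppv = len(matched) / len(source_ids) if source_ids else float("nan")
    return sensitivity, ppv


def restrict(outlets, types):
    """Keep only outlets whose type falls within the given set."""
    return [o for o in outlets if o.outlet_type in types]


# Hypothetical field observations (treated as the reference standard).
FIELD = [
    Outlet("s1", "supermarket"),
    Outlet("s2", "supermarket"),
    Outlet("c1", "convenience"),
    Outlet("f1", "farmers_market"),  # not regulated by this registry
]

# Hypothetical registry listing; "c9" was not found in the field.
REGISTRY = [
    Outlet("s1", "supermarket"),
    Outlet("s2", "supermarket"),
    Outlet("c1", "convenience"),
    Outlet("c9", "convenience"),
]

REGULATED = {"supermarket", "convenience"}  # assumed registry scope

overall = validity(FIELD, REGISTRY)
in_scope = validity(restrict(FIELD, REGULATED), restrict(REGISTRY, REGULATED))

print("all outlets:      sensitivity=%.2f PPV=%.2f" % overall)
print("regulated only:   sensitivity=%.2f PPV=%.2f" % in_scope)
```

In this toy example, the registry misses only the farmers' market, which lies outside its scope, so its sensitivity rises from 0.75 to 1.00 once out-of-scope outlets are excluded, while PPV is unchanged. Reporting both figures separates what a registry fails to cover by design from what it gets wrong within its remit.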

Much work remains in understanding the effects of area definition and sociodemographic characteristics on evidence for validity of secondary retail food data sources; specifically, establishing how best to define a study area(s).7,8,11,17,46–50 Further work is also needed to understand the effect that a particular geographic area (e.g., county versus census block group) may have on the evidence for validity reported and the ways in which the unit of interest may vary depending on levels of urbanization. Data sources examined at aggregate geographic units (e.g., counties) are likely to exhibit higher agreement than those examined at more disaggregate geographic units (e.g., block groups).51
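The aggregation effect can be sketched with a small, entirely hypothetical example in Python. Here a secondary source assigns one outlet to the wrong block group within the same county; the outlet counts disagree at the block-group level but agree once counts are summed to counties. The unit names, counts, and the use of exact-count agreement as the measure are assumptions made for illustration, not the statistics used in the reviewed studies.

```python
from collections import Counter


def exact_agreement(field_counts, source_counts):
    """Share of geographic units where both sources report the same count."""
    units = set(field_counts) | set(source_counts)
    same = sum(
        1 for u in units if field_counts.get(u, 0) == source_counts.get(u, 0)
    )
    return same / len(units)


def aggregate(counts, parent_of):
    """Sum unit-level counts up to their parent unit (e.g., county)."""
    totals = Counter()
    for unit, n in counts.items():
        totals[parent_of[unit]] += n
    return dict(totals)


# Hypothetical block groups nested in two hypothetical counties, A and B.
PARENT = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

# Outlet counts per block group from field observation and a secondary source.
field_bg = {"A1": 2, "A2": 1, "B1": 3, "B2": 0}
source_bg = {"A1": 1, "A2": 2, "B1": 3, "B2": 0}  # one outlet misplaced in A

bg_agreement = exact_agreement(field_bg, source_bg)
county_agreement = exact_agreement(
    aggregate(field_bg, PARENT), aggregate(source_bg, PARENT)
)

print(f"block-group agreement: {bg_agreement:.2f}")
print(f"county agreement:      {county_agreement:.2f}")
```

Because location errors that stay within a larger unit cancel when counts are summed, agreement in this example rises from 0.50 at the block-group level to 1.00 at the county level. Errors that cross county lines, or outlets missing from the source altogether, would not cancel in this way.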

The geographic units of analysis used across the 19 studies varied, making it difficult to determine which unit or approach was best. Even though a study area’s SES or race/ethnicity composition had little effect on the evidence for validity reported in this review, future research should continue to strengthen our understanding of if and how these variables affect validity, given that several studies document disparities in access to retail food outlets, especially among low-income, ethnic minority and rural communities.7,8 Likewise, as more data emerge within and across countries, future research should examine if and how differences in local, state, tribal, or national policies or food registry infrastructure influence evidence for validity.

Conclusion

Researchers should strive to gather primary data, but if relying on secondary data sources, InfoUSA and government food registries had higher levels of agreement than that reported by other secondary data sources and may provide sufficient accuracy for exploring these associations in large study areas. Whether using primary and/or secondary data sources, researchers should strive to use common methods and measures for data acquisition and analyses, including strategies for establishing the boundaries and geographic unit(s) of analysis for the study area(s). In addition, future work should conduct psychometric analyses, such as those for classifying retail food outlet type.

Supplementary Material

01

Acknowledgments

Support for preliminary work on this project was provided by the Health-e NC, an initiative of the University Cancer Research Fund at the University of North Carolina—Chapel Hill and by Healthy Eating Research, a national program of the Robert Wood Johnson Foundation (RWJF), ID #66958. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH, Health-e NC, or RWJF. Heather D’Angelo, MHS, Robin McKinnon, PhD, MPA, Margaret McDowell, PhD, MPH, RD, and Van Hubbard, MD, PhD, as well as members of the CDC’s Nutrition and Obesity Policy Research and Evaluation Network Rural Food Access Working Group provided feedback on the paper.

Footnotes

No financial disclosures were reported by the authors of this paper.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • 1. Story M, Kaphingst K, Robinson-O’Brien R, Glanz K. Creating healthy food and eating environments: Policy and environmental approaches. Annu Rev Public Health. 2008;29:253–72. doi: 10.1146/annurev.publhealth.29.020907.090926.
  • 2. Glanz K, Sallis J, Saelens B, Frank L. Healthy nutrition environments: Concepts and measures. Am J Health Promot. 2005;19(5):330–3. doi: 10.4278/0890-1171-19.5.330.
  • 3. Morland K, Evenson K. Obesity prevalence and the local food environment. Health Place. 2009;15(2):491–5. doi: 10.1016/j.healthplace.2008.09.004.
  • 4. Alter D, Eny K. The relationship between the supply of fast-food chains and cardiovascular outcomes. Can J Public Health. 2005;96(3):173–7. doi: 10.1007/BF03403684.
  • 5. Auchincloss A, Diez Roux A, Mujahid M, Shen M, Bertoni A, Carnethon M. Neighborhood resources for physical activity and healthy foods and incidence of Type 2 diabetes mellitus: The multi-ethnic study of atherosclerosis. Arch Intern Med. 2009;169(18):1698–704. doi: 10.1001/archinternmed.2009.302.
  • 6. An R, Sturm R. School and residential neighborhood food environment and dietary intake among California children and adolescents. Am J Prev Med. 2012;42(2):129–35. doi: 10.1016/j.amepre.2011.10.012.
  • 7. Fleischhacker S, Evenson K, Rodriguez D, Ammerman A. A systematic review of fast food access studies. Obes Rev. 2010;12(5):e460–e71. doi: 10.1111/j.1467-789X.2010.00715.x.
  • 8. Larson N, Story M, Nelson M. Neighborhood environments: Disparities in access to healthy foods in the U.S. Am J Prev Med. 2009;36(1):74–81. doi: 10.1016/j.amepre.2008.09.025.
  • 9. Sharkey J. Measuring potential access to food stores and food-service places in rural areas in the U.S. Am J Prev Med. 2009;36(4S):S151–S5. doi: 10.1016/j.amepre.2009.01.004.
  • 10. McKinnon R, Reedy J, Morrissette M, Lytle L, Yaroch A. Measures of the food environment: A compilation of the literature, 1990–2007. Am J Prev Med. 2009;36(4S):S124–S33. doi: 10.1016/j.amepre.2009.01.012.
  • 11. Forsyth A, Lytle L, Riper D. Issues and challenges in using geographic information systems to measure food access. J Transp Land Use. 2010;3(1):43–65. doi: 10.5198/jtlu.v3i1.105.
  • 12. Powell L, Han E, Zenk S, et al. Field validation of secondary commercial data sources on the retail food environment in the U.S. Health Place. 2011;17(5):1122–31. doi: 10.1016/j.healthplace.2011.05.010.
  • 13. Liese A, Colabianchi N, Lamichhane A, et al. Validation of 3 food outlet databases: Completeness and geospatial accuracy in rural and urban food environments. Am J Epidemiol. 2010;172(11):1324–33. doi: 10.1093/aje/kwq292.
  • 14. Sharkey J, Horel S. Neighborhood socioeconomic deprivation and minority composition are associated with better potential spatial access to the food environment in a large rural area. J Nutr. 2008;138:620–7. doi: 10.1093/jn/138.3.620.
  • 15. Higgins P, Straub A. Understanding the error of our ways: Mapping the concepts of validity and reliability. Nurs Outlook. 2006;54:23–9. doi: 10.1016/j.outlook.2004.12.004.
  • 16. Landis J, Koch G. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.
  • 17. Toft U, Erbs-Maibing P, Glumer C. Identifying fast-food restaurants using a central register as a measure of the food environment. Scand J Public Health. 2011;39(8):864–9. doi: 10.1177/1403494811423431.
  • 18. Fleischhacker S, Rodriguez D, Evenson K, et al. Evidence for validity of five secondary data sources for enumerating retail food outlets in seven American Indian communities in North Carolina. Int J Behav Nutr Phys Act. 2012;9:137. doi: 10.1186/1479-5868-9-137.
  • 19. Longacre M, Primack B, Owens P, et al. Public directory data sources do not accurately characterize the food environment in two predominantly rural states. J Am Diet Assoc. 2011;111:577–82. doi: 10.1016/j.jada.2011.01.008.
  • 20. McGuirt J, Jilcott S, Vu M, Keyserling M. Conducting community audits to evaluate community resources for healthful lifestyle behaviors: An illustration from rural Eastern North Carolina. Prev Chronic Dis. 2011;8(6):A149.
  • 21. Lake A, Burgoine T, Stamp E, Grieve R. The foodscape: Classification and field validation of secondary data sources across urban/rural and socio-economic classifications in England. Int J Behav Nutr Phys Act. 2012;9:37. doi: 10.1186/1479-5868-9-37.
  • 22. Lake A, Burgoine T, Greenhalgh F, Stamp E, Tyrrell R. The foodscape: Classification and field validation of secondary data sources. Health Place. 2010;16:666–73. doi: 10.1016/j.healthplace.2010.02.004.
  • 23. Hosler A, Dharssi A. Identifying retail food stores to evaluate the food environment. Am J Prev Med. 2010;39(1):41–4. doi: 10.1016/j.amepre.2010.03.006.
  • 24. Rossen L, Pollack K, Curriero F. Verification of retail food outlet location data from a local health department using ground-truthing and remote-sensing technology: Assessing differences by neighborhood characteristics. Health Place. 2012;18(5):956–62. doi: 10.1016/j.healthplace.2012.06.012.
  • 25. Seliske L, Pickett W, Bates R, Janssen I. Field validation of food service listings: A comparison of commercial and online geographic information system databases. Int J Environ Res Public Health. 2012;9:2601–7. doi: 10.3390/ijerph9082601.
  • 26. Paquet C, Daniel M, Kestens Y, Leger K, Gauvin L. Field validation of listings of food stores and commercial physical activity establishments from secondary data. Int J Behav Nutr Phys Act. 2008;5:58. doi: 10.1186/1479-5868-5-58.
  • 27. Cummins S, Macintyre S. Are secondary data sources on the neighbourhood food environment accurate? Case-study in Glasgow, UK. Prev Med. 2009;49:527–8. doi: 10.1016/j.ypmed.2009.10.007.
  • 28. Bader M, Ailshire J, Morenoff J, House J. Measurement of the local food environment: A comparison of existing data sources. Am J Epidemiol. 2010;171(5):609–17. doi: 10.1093/aje/kwp419.
  • 29. Svastisalee C, Holstein B, Due P. Validation of presence of supermarkets and fast-food outlets in Copenhagen: Case study comparison of multiple sources of secondary data. Public Health Nutr. 2012;15(7):1228–31. doi: 10.1017/S1368980012000845.
  • 30. Gustafson A, Lewis S, Wilson C, Pitts S. Validation of food store environment secondary data source and the role of neighborhood deprivation in Appalachia, Kentucky. BMC Public Health. 2012;12:688. doi: 10.1186/1471-2458-12-688.
  • 31. Clarke P, Ailshire J, Melendez R, Bader M, Morenoff J. Using Google Earth to conduct a neighborhood audit: Reliability of a virtual audit instrument. Health Place. 2010;16:1224–9. doi: 10.1016/j.healthplace.2010.08.007.
  • 32. Rundle A, Bader M, Richards C, Neckerman K, Teitler J. Using Google Street View to audit neighborhood environments. Am J Prev Med. 2011;40(1):94–100. doi: 10.1016/j.amepre.2010.09.034.
  • 33. Cummins S, Macintyre S. A systematic study of an urban foodscape: the price and availability of food in greater Glasgow. Urban Studies. 2002;39(11):2115–30.
  • 34. Neckerman K, Lovasi G, Davies S, et al. Disparities in urban neighborhood conditions: Evidence from GIS measures and field observation in New York City. J Public Health Policy. 2009;30:S264–S85. doi: 10.1057/jphp.2008.47.
  • 35. Han E, Powell L, Zenk S, Rimkus L, Ohri-Vachaspati P, Chaloupka FJ. Classification bias in commercial business lists for retail food stores in the U.S. Int J Behav Nutr Phys Act. 2012;9:46. doi: 10.1186/1479-5868-9-46.
  • 36. Morenoff J, House J, Hansen B, Williams D, Kaplan G, Hunte H. Understanding social disparities in hypertension prevalence, awareness, treatment, and control: the role of neighborhood context. Soc Sci Med. 2009;65(9):1853–66. doi: 10.1016/j.socscimed.2007.05.038.
  • 37. Boone-Heinonen J, Gordon-Larsen P, Kiefe C, Shikany J, Lewis C, Popkin B. Fast food restaurants and food stores: Longitudinal associations with diet in young to middle-aged adults: The CARDIA Study. Arch Intern Med. 2011;171(13):1162–70. doi: 10.1001/archinternmed.2011.283.
  • 38. Block J, Christakis N, O’Malley A, Subramanian S. Proximity to food establishments and body mass index in the Framingham Heart Study offspring cohort over 30 years. Am J Epidemiol. 2011;174(10):1108–14. doi: 10.1093/aje/kwr244.
  • 39. Lefer T, Anderson M, Fornari A, Lambert A, Fletcher J, Baguero M. Using Google Earth as an innovative tool for community mapping. Public Health Rep. 2008;123(4):474–80. doi: 10.1177/003335490812300408.
  • 40. Lewis L, Sloane D, Nascimento L, et al. African Americans’ access to healthy food options in south Los Angeles restaurants. Am J Public Health. 2005;95:668–73. doi: 10.2105/AJPH.2004.050260.
  • 41. Ohri-Vachaspati P, Leviton L. Measuring food environments: A guide to available instruments. Am J Health Promot. 2010;24(6):410–26. doi: 10.4278/ajhp.080909-LIT-190.
  • 42. McKinnon R, Reedy J, Handy S, Rodgers A. Measuring the food and physical activity environments: Shaping the research agenda. Am J Prev Med. 2009;36(4):S81–S5. doi: 10.1016/j.amepre.2009.01.003.
  • 43. Boone J, Gordon-Larsen P, Stewart J, Popkin B. Validation of a GIS facilities database: Quantification and implications of error. Ann Epidemiol. 2008;18:371–7. doi: 10.1016/j.annepidem.2007.11.008.
  • 44. Han E, Powell L, Slater S, Quinn C. Validation of secondary commercial data sources for physical activity facilities in urban and nonurban settings. J Phys Act Health. 2012;9(8):1080–8. doi: 10.1123/jpah.9.8.1080.
  • 45. Hoehner C, Schootman M. Concordance of commercial data sources for neighborhood-effects studies. J Urban Health. 2010;87(4):713–25. doi: 10.1007/s11524-010-9458-0.
  • 46. De Marco A, De Marco M. Conceptualization and measurement of the neighborhood in rural settings: A systematic review of the literature. J Community Psychol. 2010;38(1):99–114.
  • 47. Gittelsohn J, Sharma S. Physical, consumer, and social aspects of measuring the food environment among diverse low-income populations. Am J Prev Med. 2009;36(4 Suppl):S161–S5. doi: 10.1016/j.amepre.2009.01.007.
  • 48. Mikkelsen B. Images of foodscapes: Introduction to foodscape studies and their application in the study of healthy eating out-of-home environments. Perspect Public Health. 2011;131(5):209–16. doi: 10.1177/1757913911415150.
  • 49. Schaefer-McDaniel N, Caughy M, O’Campo P, Gearey W. Examining methodological details of neighbourhood observations and the relationship to health: A literature review. Soc Sci Med. 2010;70:277–92. doi: 10.1016/j.socscimed.2009.10.018.
  • 50. Salze P, Banos A, Oppert J, et al. Estimating spatial accessibility to facilities on the regional scale: An extended commuting-based interaction potential model. Int J Health Geogr. 2011;10:2. doi: 10.1186/1476-072X-10-2.
  • 51. Cressie N. Change of support and the modifiable areal unit problem. Geographical Systems. 1996;3:159–80.
