Introduction
The Institute for Geographic Information Science (IGISc) at San Francisco State University (SFSU) is collaborating with Dr. Emma Sanchez-Vaznaugh in SFSU’s Department of Public Health on a project to map the fast food environment surrounding public schools in California. The project investigates the links between the rate of change in the fast food environment of a school and socioeconomic characteristics of the surrounding neighborhood to understand population-level childhood obesity disparities over time. The geospatial component of this research focuses on using commercial data to characterize food environments near schools (the availability of food stores and their proximity to schools) to provide the spatiotemporal detail needed to examine associations of interest. In this article, we highlight the logic of the geospatial methods used to derive a more accurate spatial representation of where all public schools and fast food outlets in the state of California are located.
Previous research has shown that a disconnect between US Postal Service 5-digit ZIP codes and their spatial mapping to zip code tabulation areas (ZCTAs), which started with the 2000 decennial Census, can have spatiotemporal implications such that addresses could be “correctly” geocoded to a zip code that did not exist before the decennial Census and “zip code-level analyses have yielded socioeconomic gradients contrary to those reported in the literature” (Krieger et al. 2002). Research by Kaufman et al. (2015) found that “re-geocoding data may improve spatial precision, particularly in early years” and “though geoprocessing [the National Establishment Time Series (NETS)] is a large investment, the accuracy of business establishment locations is central to valid aggregate measures of commercial business access.” Thus, to capture the most spatially precise location information possible, we demonstrate a method to “recover” low accuracy NETS addresses.
Methods and Data
Schools and Service Areas
School address data for each year from 2000 to 2018 were downloaded from the California Department of Education (CDE). Schools from the most recent school year (2017–2018) were excluded if they opened after June of 2018, were completely virtual (had no actual campus populated by students), or served only preschool through fourth grade or only adults. All California school addresses were geocoded using the 2013 Esri US Street Address Locator (Esri 2013a, 2013b) or the ESRI World Geocoder (ESRI 2019c), with the 1984 World Geodetic System (WGS 1984) as the geographic coordinate reference system. Schools and fast food locations were projected to a California Teale Albers projection, which is optimized for area calculations (CDFW, 2018). Comparisons were made between the CDE’s original latitude/longitude and the newly geocoded coordinates. When there was agreement or near agreement, the CDE coordinates were kept. “Near agreement” was defined as the CDE latitude and longitude matching the geocoded latitude and longitude to 3 decimal places (roughly 100 m). If coordinate information was missing from the CDE or the coordinates were not a near-match, the geocoded coordinates were used.
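The coordinate selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project’s actual code: the function names are hypothetical, and it assumes that “matching to 3 decimal places” means the values are equal after rounding to 3 decimals.

```python
from typing import Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def near_agreement(cde: Coord, geocoded: Coord, places: int = 3) -> bool:
    """True when latitude and longitude both match after rounding to `places` decimals.

    Assumption: rounding is used as the matching rule; the article does not
    state whether rounding or truncation was applied.
    """
    return (round(cde[0], places) == round(geocoded[0], places)
            and round(cde[1], places) == round(geocoded[1], places))


def choose_coordinates(cde: Optional[Coord], geocoded: Coord) -> Coord:
    """Keep the CDE pair on (near) agreement; otherwise use the geocoded pair."""
    if cde is not None and near_agreement(cde, geocoded):
        return cde
    return geocoded
```

A rounding-based rule can separate two points that sit very close together but on opposite sides of a rounding boundary, so an absolute-difference tolerance would be a reasonable alternative.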
Service areas were created in 0–0.25 mile (0.40 km), 0.25–0.5 mile (0.80 km), 0.5–0.75 mile (1.21 km), and 0.75–1.0 mile (1.61 km) rings around each school using ESRI’s Network Analyst extension with the 2013 Esri US Streets Network Dataset (Esri 2013a, 2013b). A service area includes all the area that can be reached by travelling along the road network for the specified distance, starting from the school location. Service areas represent a more realistic depiction of how pedestrians and motorists traverse the landscape than a radial buffer, though the accuracy of service areas is limited by the accuracy of available road data.
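To illustrate why a network-based service area differs from a radial buffer, the sketch below computes along-network distances on a toy road graph and assigns each node to one of the four rings. It is not Esri’s Network Analyst and does not reproduce its algorithm; the graph, node names, and function names are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbour, edge length in miles)]
RING_BOUNDS_MILES = [0.25, 0.5, 0.75, 1.0]  # upper bounds of the four rings


def network_distances(graph: Graph, start: str) -> Dict[str, float]:
    """Shortest along-network distance from `start` to each reachable node (Dijkstra)."""
    dist = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbour, length in graph.get(node, []):
            candidate = d + length
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


def ring_for(distance_miles: float) -> Optional[Tuple[float, float]]:
    """Return the (lower, upper) ring containing the distance, or None beyond 1 mile."""
    lower = 0.0
    for upper in RING_BOUNDS_MILES:
        if distance_miles <= upper:
            return (lower, upper)
        lower = upper
    return None


# Hypothetical toy network: edges are listed in both directions (undirected roads).
TOY_NETWORK: Graph = {
    "school": [("a", 0.2), ("c", 1.2)],
    "a": [("school", 0.2), ("b", 0.2)],
    "b": [("a", 0.2), ("c", 0.4)],
    "c": [("b", 0.4), ("school", 1.2)],
}
```

Calling `network_distances(TOY_NETWORK, "school")` and passing each distance to `ring_for` places every node in a ring according to the shortest route along the roads rather than the straight-line distance.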
Fast Food
Fast food location data for businesses (2000–2012) were purchased from the National Establishment Time Series (NETS; Walls & Associates, 2012), a unique historical database providing location information about commercial resources to an accuracy of block face (highest accuracy), street segment, block group, census tract centroid, or ZIP code (lowest accuracy). A larger share of locations had only ZIP code-level accuracy in earlier years, as the accuracy of location data improved substantially over time. An assumption of this analysis is that ESRI’s 2019 geocoding service would provide a more up-to-date street network dataset than the geocoding services that Dun & Bradstreet and Walls & Associates originally used to compile the NETS location information. Thus, this analysis focused on “re-geocoding” all business locations provided by NETS from 2000–2012 that were less accurate than block face (14,528 businesses). The ESRI World Geocoding Service (ESRI 2019c) was used to “re-geocode” businesses to ‘point address-level’ or ‘sub-address-level’ accuracy, which were considered comparable to the ‘block face-level’ accuracy provided by NETS. A match score of at least 75% was required to accept an address location (ESRI 2019c).
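The acceptance rule for re-geocoded results could be expressed as a simple filter. The sketch below assumes each geocoder result is available as a dictionary with `Addr_type` and `Score` keys and that the accepted address types are labelled `PointAddress` and `Subaddress`; these names are assumptions made for illustration and should be checked against the actual geocoder output.

```python
# Assumed labels for 'point address-level' and 'sub-address-level' matches.
ACCEPTED_TYPES = {"PointAddress", "Subaddress"}
MIN_SCORE = 75.0  # minimum match score stated in the text


def is_acceptable(result: dict) -> bool:
    """True when a geocoding result meets both the address-type and score criteria."""
    return (result.get("Addr_type") in ACCEPTED_TYPES
            and result.get("Score", 0.0) >= MIN_SCORE)


def filter_acceptable(results: list) -> list:
    """Keep only the results that pass `is_acceptable`."""
    return [r for r in results if is_acceptable(r)]
```

Results that fail either criterion would keep their original NETS coordinates.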
To assess the accuracy of using a 2019 ESRI Geocoder to locate addresses from 2000–2012, approximately 30 randomly selected re-geocoded addresses (ESRI 2019c) were crosschecked against Google Maps (Google 2019) to determine a distance threshold for accepting re-geocoded addresses as valid. Distances between the original NETS coordinates and the regeocoded coordinates were calculated with Python (v. 3.4) for quality control and assurance. Figure 2 describes this process.
Figure 2:
Description of the process used in the methodology.
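The distance between an original NETS coordinate and its regeocoded counterpart could be computed as below. The article does not state which distance formula was used, so this sketch assumes a great-circle (haversine) distance on WGS 1984 latitude/longitude; the function name and radius constant are illustrative.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

For distances of a few miles the difference between this spherical approximation and a projected planar distance is small relative to a multi-mile threshold.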
Results & Discussion
Of the 219,253 total businesses provided in the 2000–2012 NETS dataset, 14,528 businesses (6.63%) were open after 2000 and classified as a lower accuracy than block face. Eleven of the 31 randomly selected regeocoded establishments were confirmed as valid after cross-referencing with Google Maps and were, on average, 14 miles (23 km) from the original NETS-provided locations, while the remaining 20 establishments were confirmed to be invalid and were, on average, 127 miles (204 km) away. Therefore, an arbitrary though conservative 10-mile (16 km) threshold was chosen to filter out regeocoded addresses that were likely to be anomalous. In other words, re-geocoded locations more than 10 miles (16 km) from the NETS locations were considered implausible and therefore excluded (approx. 1.4% of the 14,528 regeocoded business addresses), and the original NETS coordinates for these locations were retained. Figure 3 illustrates one business location’s accuracy improvement with the regeocoding.
Figure 3:
Example of improvement from an establishment between the NETS coordinates and the ESRI World Geocoder coordinates (distance of 5 miles). In this case, the improvement was from Zip code-level to Point address-level accuracy.
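The 10-mile rule can be sketched as a small decision function: a regeocoded location is accepted only when it lies within the threshold of the NETS location, and otherwise the original NETS coordinates are retained. The function and parameter names are hypothetical.

```python
from typing import Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees
THRESHOLD_MILES = 10.0  # arbitrary though conservative cut-off from the text


def resolve_location(nets: Coord, regeocoded: Coord, distance_miles: float,
                     threshold: float = THRESHOLD_MILES) -> Tuple[Coord, bool]:
    """Return (coordinates to use, whether the regeocoded location was accepted)."""
    if distance_miles > threshold:
        # Implausibly far from the NETS location: keep the original coordinates.
        return nets, False
    return regeocoded, True
```

A location exactly 10 miles away is accepted here, since the text excludes only distances greater than the threshold.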
Of the 219,253 NETS business addresses from 2000 to 2012, we recovered 3.18%. Of the 14,528 food establishments with lower than block face accuracy that were regeocoded, we recovered approximately 48% using the 10-mile (16 km) validity threshold.
Recovering 3.18% of all NETS business addresses from 2000–2012, and 48.0% of those that were regeocoded, increases the sample size of the establishment data and, importantly, the accuracy of the locations, enabling a more robust spatial analysis. In 2000, 8.28% of the NETS locations had only zip code-level accuracy; with re-geocoding, we reduced this to 3.33%. For 2010, we reduced this share from 1.18% to 0.98%. Kaufman et al. (2015) attempted a similar analysis of NETS New York City business addresses: by regeocoding with a combination of geocoders, they reduced the share of locations at zip code-level accuracy from 16% to 2% for 2000 and from 2% to 1% for 2010. Consistent with their analysis, we also saw the most improvement in recovering addresses for the earlier years of the dataset. These regeocoded values should improve the accuracy of further spatial analysis research evaluating the impact of the food environment near schools on children’s health.
Figure 1:
Example of open grocery stores in 2000 and 2012 in a school’s service areas in Los Angeles, CA
Table 1:
Percentage of NETS business locations (total = 14,528) open after 2000 with improved accuracy using the ESRI 2019 World Geocoder, by cut-off distance for accepting regeocoded addresses as spatially valid
| Cut off Distance (miles) | Addresses recovered | Number of Addresses excluded | Percentage of Addresses recovered of the total post 2000 not D (14,528) | Percentage of Addresses recovered of the total post 2000 (219,253) |
|---|---|---|---|---|
| 0 | 7,170 | 0 | 49.35% | 3.27% |
| 10 (arbitrary) | 6,968 | 202 | 47.96% | 3.18% |
| 15 (arbitrary) | 7,039 | 131 | 48.45% | 3.21% |
| 100 (Census Tract) | 7,124 | 46 | 49.04% | 3.24% |
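The derived columns of Table 1 follow from simple arithmetic on the counts. The sketch below recomputes them for a given number of recovered addresses; the totals are taken from the text, and the function name is hypothetical.

```python
REGEOCODED_TOTAL = 14528   # businesses less accurate than block face
NETS_TOTAL = 219253        # all NETS businesses, 2000-2012
MATCHED_NO_CUTOFF = 7170   # addresses recovered with no distance cut-off (first row)


def recovery_summary(recovered: int) -> dict:
    """Excluded count and recovery percentages for one cut-off distance."""
    return {
        "excluded": MATCHED_NO_CUTOFF - recovered,
        "pct_of_regeocoded": round(100 * recovered / REGEOCODED_TOTAL, 2),
        "pct_of_total": round(100 * recovered / NETS_TOTAL, 2),
    }
```

For example, `recovery_summary(6968)` corresponds to the 10-mile row of the table.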
Contributor Information
Anna Studwell, Institute for Geographic Information Science, SFSU.
Ana Pelegrini, Institute for Geographic Information Science, SFSU.
Maria Acosta, Department of Public Health, SFSU.
Karina Fastovsky, Institute for Geographic Information Science.
Mika Matsuzaki, Department of Public Health, SFSU.
Aiko Weverka, Institute for Geographic Information Science, SFSU.
Emma V Sanchez-Vaznaugh, Department of Public Health, SFSU.
References:
- CDFW 2018. CDFW Projections and Datum Guidelines: https://nrm.dfg.ca.gov/FileHandler.ashx?DocumentID=109326&inline
- Esri 2013a. Data & Maps for ArcGIS. Redlands, CA: Environmental Systems Research Institute.
- Esri 2013b. ArcGIS Desktop: Release 10.4.1. Redlands, CA: Environmental Systems Research Institute.
- ESRI 2019c. ArcGIS Pro 2.3.3, ArcGIS World Geocoding Service September, 2019 (Address Locator)
- Google 2019. Map data © 2019 https://www.google.com/intl/en_us/help/terms_maps/, visited September 2019
- Kaufman, 2015. Measuring health-relevant businesses over 21 years: refining the National Establishment Time-Series (NETS), a dynamic longitudinal data set. BMC Res Notes 8: 507.
- Krieger N. 2002. “Zip code caveat. Bias due to spatiotemporal mismatches between zip codes and US Census-defined geographic areas—The public health disparities geocoding project.” Research and Practice. 92: 7.
- Walls & Associates, 2012. Denver CO: National Establishment Time Series (NETS) Database: 2012 Database Description.



