Journal of Urban Health : Bulletin of the New York Academy of Medicine. 2020 Oct 1;98(2):271–284. doi: 10.1007/s11524-020-00482-2

Business Data Categorization and Refinement for Application in Longitudinal Neighborhood Health Research: a Methodology

Jana A Hirsch 1,2, Kari A Moore 2, Jesse Cahill 3, James Quinn 3, Yuzhe Zhao 2, Felicia J Bayer 2, Andrew Rundle 3, Gina S Lovasi 1,2
PMCID: PMC8079597  PMID: 33005987

Abstract

Retail environments, such as healthcare locations, food stores, and recreation facilities, may be relevant to many health behaviors and outcomes. However, minimal guidance on how to collect, process, aggregate, and link these data results in inconsistent or incomplete measurement that can introduce misclassification bias and limit replication of existing research. We describe the following steps to leverage business data for longitudinal neighborhood health research: re-geolocating establishment addresses, preliminary classification using standard industrial codes, systematic checks to refine classifications, incorporation and integration of complementary data sources, documentation of a flexible hierarchical classification system and variable naming conventions, and linking to neighborhoods and participant residences. We show results of this classification from a dataset of locations (over 77 million establishment locations) across the contiguous U.S. from 1990 to 2014. By incorporating complementary data sources, through manual spot checks in Google StreetView and word and name searches, we enhanced a basic classification using only standard industrial codes. Ultimately, providing these enhanced longitudinal data and supplying detailed methods for researchers to replicate our work promotes consistency, replicability, and new opportunities in neighborhood health research.

Electronic supplementary material

The online version of this article (10.1007/s11524-020-00482-2) contains supplementary material, which is available to authorized users.

Keywords: Businesses, Commercial, Classification, Food environment, Physical activity destinations, Cohort studies, Place and health, Geography, GIS or geographic information systems

Background

Contextualizing human behavior and health within environments is important to both understanding and intervening on population health [1–3]. Ecological models that consider influences across many levels (from intra-personal up to neighborhood or policy context) have long histories in many disciplines (e.g., public health, sociology, geography, biology, and psychology) [4]. Specifically, robust theories of place effects from sociology and geography guide examinations of health with the physical, social, and economic environment [5–8]. While these frameworks highlight the value of multi-level data on environmental influences, direction on collecting, processing, aggregating, and linking such data to participants is often provided only in brief form within methods. This compounds measurement error concerns, creates inconsistent or incomparable results across studies, and limits exploration of new environment-health relationships [9–11].

Retail and commercial environments play key roles in residents’ lives and enhance important social connections by providing spaces to regularly visit and commune with friends, neighbors, coworkers, and strangers [12–17]. As such, locations of businesses and amenities are of broad relevance to many health domains [18]. Specifically, physical activity facilities, healthcare locations, food stores, restaurants, social destinations, and common walking destinations can be extracted from retail data for use in health studies. By combining these data with individual-level health behavior data from cohort studies, previous work has linked residential neighborhood recreation facilities to physical activity [19–21], food stores and restaurants to nutrition [22–25], social destinations to depressive symptoms and sleep [26–28], and common walking destinations to active travel [19, 29–31]. Similar work has shown associations between neighborhood destinations and obesity [32–36], diabetes [37–39], hypertension [40–42], and cardiovascular disease [42–45]. These types of locations have also been recognized as potentially supportive of mobility, healthy aging, and aging-in-place among older adults [46–48]. Beyond linkages of retail and commercial destinations to individual-level participant data, these data hold additional importance for understanding dynamic neighborhood changes. Destination data have been used to examine equity in distribution of changes in health-promoting resources [49–51], shifts in economic centers [52, 53], or characterizations of types of neighborhood change [54]. As point-based data, these destinations can be used in many combinations with different definitions of neighborhoods, categories of locations, and metrics [36, 55, 56]. While select research has aimed to establish and disseminate methods for calculating neighborhood retail environments [55–58], work has often been limited temporally to only one time or geographically to only one place [56, 59–61]. Accordingly, meta-analyses and systematic reviews have pointed to a lack of consistency in environmental measures as limiting advancement of research connecting neighborhood context to health [10, 11, 62–66].

To enhance technical understanding and advance methods for exploring environmental influences on health, this paper articulates the detailed steps in collecting, processing, and linking a longitudinal retail and commercial database to both neighborhood- and individual-level data on health. We designed a multi-step process that increases the utility of business data for research purposes (Fig. 1), using the National Establishment Time Series (NETS) Database (1990–2014) as an example. This process includes (1) initial classification using previous work and theory, (2) systematic checks of industry codes of uncertain relevance to a given category, (3) refinement of classifications using additional data sources, and (4) linking to participants or administrative units. These create a final system of flexible, hierarchical categories that can be combined (in “main” categories and “auxiliary” categories) to construct custom measures for sensitivity analyses. We conclude by discussing ways our work addresses challenges facing research teams when using destination data for longitudinal and national health research.

Fig. 1

Multi-step process to clean, categorize, enhance, and compile Dun and Bradstreet (D&B) business data from the National Establishment Time Series (NETS) into neighborhoods for research in the Retail Environment and Cardiovascular Disease (RECVD) study

Methods

Longitudinal Business Destination Data Source

Dun and Bradstreet (D&B, Short Hills, NJ) is a company that provides commercial data, often for economic analysis. These data, originally self-reported for credit purposes, provide business names, addresses, sales, employment, and more. The company’s database contains more than 265 million business records worldwide [67], representing information that can be tied to varied health research. To better characterize the business dynamics of neighborhoods, we used grant funding to license, for academic purposes, the NETS longitudinal database on local business establishments from Walls & Associates (Denver, CO). These data are not a survey or statistical sample; instead, they represent a continual annual census of American business, government, and non-profits. Methods on their collection can be found elsewhere [68, ]. Briefly, Walls & Associates use annual snapshots of D&B data taken every January since 1990 to create annual time series of establishments and their characteristics that vary over time. In this way, Walls & Associates creates time series information on over 300 variables including business name, company name, years when a business was active, relocation history, sales volume, number of employees, and the industrial classification of that business using Standard Industrial Classification (SIC). Detailed information on business classification using SIC appears in the subsequent sections. An important data quality concern of both D&B and Walls & Associates is duplication of records. D&B uses proprietary techniques to eliminate duplicate records, but differences such as two reported SICs, address discrepancies, or different abbreviations of business names hinder the elimination of duplicates. In general, Walls & Associates reports that D&B’s duplicate processing eliminates most standalone duplicates.

Re-Geolocating Business Establishment Data

The geolocations of business data provided by the vendor may contain accuracy issues. If these inaccuracies were differential across place and time, they could introduce bias into future analyses. Specifically, approximately 19% of the business establishment locations were geocoded with only ZIP code level accuracy. Although for some large-footprint businesses, such as medical centers, the ZIP code may in fact be unique to that business, these ZIP code level geocodes (more common in early years) were the main cause for concern over address accuracy. Further, the geocoding methods were known to change over time, yet we were not provided with detailed documentation of these changes. Thus, we re-geolocated NETS businesses’ addresses using batch geocoding in ESRI Business Analyst software (ESRI, Redlands, CA) and a composite locator, which utilized Navteq 2014 (Q3) reference data (i.e., street ranges, address points).

Initial Classification Based on SIC System

We devised an initial classification system using the primary SIC of each business, with guidance from previous research [29, 56, 69]. The SIC system was originally established in the 1930s to classify establishments by the type of activity in which they are primarily engaged. While the North American Industry Classification System (NAICS) was established in 1997 to replace the no-longer-sufficient SIC system, SIC codes are still used in databases like NETS to allow longitudinal analyses across time (pre- and post-1997). Although a NAICS-SIC crosswalk existed [70], it used only four-digit SIC codes (rather than eight-digit), coarsening categories and potentially introducing misclassification.

Guided by theories [6, 7] and expert knowledge on the health impacts of the retail environment, we classified businesses under six broad domains: food and restaurants (places to obtain both prepared and unprepared food), alcohol (locations to purchase or consume alcohol), social (destinations where you can interact with others), physical activity (places to be active), walkable destinations (locations which are part of vibrant neighborhoods and can facilitate activities of daily living without a car), and healthcare facilities (hospitals, offices, and other resources for maintaining health or obtaining health care including pharmacies and physical therapy). These domains were chosen for their potential impacts on health behaviors (e.g., diet, substance use, social interaction, activity levels, active transport, preventive medicine, mental health) and subsequent impacts of these behaviors on chronic health outcomes (e.g., obesity, metabolic function, cardiovascular disease, cancer, mortality). Where possible, we started with existing lists of SIC codes from other studies, including the Multi-Ethnic Study of Atherosclerosis [29, 57]; a study in New York City, NY [56, 59]; and a study from St. Louis, MO [69]. For each domain, we consulted with internal or external researchers with in-depth subject matter expertise. First, these individuals outlined important themes or divisions within categories. For example, within unhealthy food sources, they recommended distinguishing unhealthy food stores from restaurants, distinguishing select restaurant types of interest (e.g., fast food), and adding categories to accommodate shifts in the food system (e.g., convenience stores, which now sell many unhealthy options). Next, experts examined a list of all 18,000+ possible eight-digit SIC codes and corresponding SIC descriptions to classify each eight-digit SIC code into an auxiliary category (within each domain). At least one other researcher adjudicated classification.

Systematic Checks of Ambiguous SIC Codes

Based on SIC description, it may be unclear whether to include individual SIC codes in various health-related neighborhood amenity categories. This may occur because of the catch-all categories or simply the complexity of various retail establishments. In these cases, we systematically checked businesses using 2014 NETS data and online search tools including Google StreetView (Google, CA).

To account for geographic differences that may exist in SIC coding, we conducted checks for each ambiguous SIC code on up to 150 randomly sampled retail establishments across 30 locations in the continental U.S. We chose the 30 locations to span the ten standard federal regions. Within each region, we selected the largest city, one mid-size city, and one rural county (list of locations in Supplemental Table S1). To select the largest city, we took the largest from the list of the 300 largest cities in the U.S. We then took all cities with a population between 100,000 and 300,000 and took the second-largest city in this mid-size city grouping. For rural counties, we created quintiles of population density for all counties in the U.S. Then, we randomly selected counties in the second quintile, as this would provide adequate businesses to check. Finally, we checked that no cities with a population of more than 10,000 existed within the county and that it did not border a metropolitan area.

Within these 30 selected locations, we randomly selected five records in 2014 for each ambiguous SIC code. If there were fewer than five establishments with that SIC code in that area, we chose all records. In situations where an ambiguous SIC code had 1500 or fewer establishments nationally, we randomly selected 150 from the entire list regardless of region and size block. For ambiguous SIC codes with 150 or fewer establishments, we chose all records for spot-checking.
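The selection rules above can be sketched in code. This is a minimal illustration and not the authors' program: the function name, the `(record_id, location)` tuple layout, and the parameter names are all hypothetical, and the sketch handles the records of a single ambiguous SIC code.

```python
import random

def select_for_audit(records, locations, rng,
                     per_location=5, cap=150, small_code_limit=1500):
    """Pick establishments to spot-check for ONE ambiguous SIC code.

    records   -- list of (record_id, location) tuples (hypothetical layout)
    locations -- the 30 sampled cities and counties
    rng       -- a random.Random instance, so the draw is reproducible
    """
    n = len(records)
    if n <= cap:
        # 150 or fewer establishments nationally: check every record.
        return list(records)
    if n <= small_code_limit:
        # 1500 or fewer nationally: sample 150 regardless of location.
        return rng.sample(records, cap)
    chosen = []
    for loc in locations:
        in_loc = [r for r in records if r[1] == loc]
        # Five per location, or all of them when fewer than five exist.
        chosen.extend(rng.sample(in_loc, min(per_location, len(in_loc))))
    return chosen
```

With 30 locations and five records each, the location-based branch yields at most 150 establishments, which matches the cap used in the other two branches.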

Once we selected up to 150 establishments, we audited each business using Google StreetView in order to assess additional information that could inform whether a given ambiguous SIC code would be used to identify businesses as falling in a category of interest. The information collected differed based on SIC code and broad category. For example, we checked establishments with SIC code “55419903” (Truck stops), potentially representing gas stations, for the presence of convenience stores and fast food chains and found 64% to have a convenience store and 32% to have fast food. In a final step, our statistical team summarized the results of these audits, which we then discussed during bi-monthly meetings to reach collective agreement on final classification.

Refinement of Classification Using Additional Data Sources

Since each individual business indicates its own SIC code, classification using only this system can lead to measurement error [56, 57, 71, 72]. As a result, for some categories, we implemented additional word searches or name searches to capture businesses that may be misclassified by SIC code alone.

We could identify some types of establishments using searches for key words commonly appearing in their names. These businesses included pizza, liquor stores, spas, and select multi-use physical activity facilities (e.g., Young Men’s Christian Association (YMCA), Jewish Community Center (JCC)). For example, while we could classify pizza restaurants by SIC codes “58120600,” “58120601,” and “58120602,” we also enhanced identification using a word search for “pizza” or “pizzeria” within SIC codes “58120000–58129999” (eating places) and “54110000–54999999” (food stores).
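As a rough illustration of the pizza example, the sketch below combines the three pizza SIC codes with a word search restricted to the two SIC ranges named in the text. The function and constant names are hypothetical, and the case-insensitive substring match is an assumption; the matching rules the authors actually used are not described at this level of detail.

```python
import re

# SIC codes and ranges are taken from the text; everything else is illustrative.
PIZZA_SIC = {"58120600", "58120601", "58120602"}
SEARCH_RANGES = [(58120000, 58129999),   # eating places
                 (54110000, 54999999)]   # food stores
PIZZA_WORDS = re.compile(r"pizza|pizzeria", re.IGNORECASE)  # assumed match rule

def is_pizza(sic8, name):
    """Return True if a record is pizza by SIC code or by name search."""
    if sic8 in PIZZA_SIC:
        return True
    in_range = any(lo <= int(sic8) <= hi for lo, hi in SEARCH_RANGES)
    # The word search only applies inside the eating place / food store ranges.
    return in_range and bool(PIZZA_WORDS.search(name))
```

Restricting the word search to these ranges keeps a business with "pizza" in its name but an unrelated SIC code out of the category.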

Relying on researcher-created lists of current chains would risk overlooking chains that were important in geographies and time periods less familiar to our team. Instead, we obtained lists of chain names from Technomic/Restaurants and Institutions (R&I) (restaurants) and TDLinx® (supermarkets, convenience stores, mass merchandise stores, wholesale stores, gas stations, and pharmacies).

Searching through text fields required a detailed and systematic approach. First, NETS included fields in the dataset for Company (ownership) and Trade Name. Since companies could own a large number of subsidiaries which may fall under different subcategories (e.g., Company may be a franchise firm but Trade Name may be the actual business name), we preferred Trade Name as the more reliable identifier of which store was actually at that location. Accordingly, in chain name searches (for food stores, pharmacies, discount department stores, and restaurants), we classified records by Trade Name, using Company Name only if the Trade Name was not included. For example, a record may contain Company = “DUNKIN DONUTS-BASKIN ROBBINS” and Trade Name = “BASKIN ROBBINS.” We categorized this record under “sweets” since the Trade Name identified a chain name on the list for sweets. Additionally, since Company and Trade Names provided in NETS had notable variation in how a given chain name was represented, chain names often had multiple entry and spelling variations within the dataset and did not always match those R&I and Technomic provided. Rather than search the entire NETS dataset for potential name/spelling variants, we used select SIC codes (available from the research team upon request) to identify name/spelling variants for application to the broader NETS dataset. We defined this reduced set as the SIC codes in which we expected approximately 80% of the businesses within the corresponding category to be found, based on previous work [57]. We then reviewed initial sets of records identified as being on or off the chain name list to identify mismatches or additional name variants. Finally, we implemented the programming code created from this smaller subset on the broader NETS dataset.
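The Trade Name preference can be sketched as follows. This is an illustrative simplification: the lookup table, the `normalize` helper, and the exact-match rule after normalization are assumptions, whereas the authors describe a manual review of name and spelling variants. Only the Baskin Robbins example comes from the text.

```python
import re

def normalize(name):
    """Uppercase and collapse punctuation so simple spelling variants match."""
    return re.sub(r"[^A-Z0-9]+", " ", name.upper()).strip()

# Hypothetical one-entry chain list; real lists came from licensed sources.
CHAIN_TO_CATEGORY = {normalize("Baskin Robbins"): "sweets"}

def classify_by_chain(company, trade_name):
    """Look up Trade Name first; fall back to Company only if it is missing."""
    name = trade_name if trade_name and trade_name.strip() else company
    return CHAIN_TO_CATEGORY.get(normalize(name or ""))

# Example from the text: the Trade Name decides the category.
print(classify_by_chain("DUNKIN DONUTS-BASKIN ROBBINS", "BASKIN ROBBINS"))
```

In this sketch a record with no Trade Name and the combined Company string would stay unclassified, which is the kind of mismatch the review of on-list and off-list records was meant to surface.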

Creation of a Final Classification System

Ultimately, we combined the above into a final system for existing and future research projects. This system has smaller, “auxiliary” categories nested within larger, “main” categories (all within the six broad domains) to allow for both analyses of these specific categories and broader combinations. We defined auxiliary categories in relatively small subsets that could be combined into tailored versions for specific outcomes or allow for sensitivity analyses by including or excluding individual auxiliary categories. Some auxiliary categories overlap. Additionally, some auxiliary categories were not expected to be meaningful when used on their own (e.g., chain name bakeries vs total sweets) but were retained to allow for sensitivity analyses. We created main categories as groupings of auxiliary categories that were most likely to be useful across multiple projects or analyses, limiting the need to make customizations and increasing consistency. For most analyses, the main categories will be more useful than auxiliary categories. Inclusion was based on the most common use case. For example, while nut stores often sell healthy options (e.g., unsweetened or unsalted nuts), most people go to buy unhealthy options (e.g., candy). If there was no clear consensus among the research team or literature about the inclusion of an auxiliary category in a main category, we made two versions: restrictive (with only the auxiliary categories entirely supported by literature) and unrestrictive (with a broader set of auxiliary categories considered by the literature).

Establishment of a Flexible Hierarchical System for External Researchers

Chain name and word searches produced the potential for double counting if a researcher wanted to create a customized grouping of categories and summed existing measures. For example, an establishment that we counted as fast food using SIC code but also showed up in pizza using the word search would count in both. The categories of fast food and pizza are thus not mutually exclusive, and a naïve approach of summing counts to create a measure of unhealthy food sources will result in double counting some establishments. The creation of mutually exclusive categories requires prioritization (e.g., categories informed by a chain name search take priority over those informed only by SIC code), which we have operationalized through our hierarchical system. Without such a system, customized categories would need to be constructed from the business-level dataset, ultimately restricting flexibility of the dataset for future research to only researchers with access to the business-level dataset. The hierarchical classification we created to avoid double counting may also have advantages where name or chain search led to more appropriate classification. The hierarchical classification system removes these records from the “incorrect” category, leaving them only in the presumably “correct” category. The hierarchy system classifies each business record into the researcher-defined “best” category for that record such that each record is only classified into a unique category. The hierarchy only uses the auxiliary categories as hierarchy categories. These can be grouped together using the definitions in the main categories to produce the main categories, as needed. The hierarchy system addresses two main issues: (1) double counting of a business when adding categories not originally combined during main category creation and (2) removing records from their “incorrect” categories based on their word search.
For example, the hierarchy system may move a specific business classified as a convenience store using chain names into the convenience store category (and remove it from the category it incorrectly fell in using SIC code alone) while also removing a specific business classified as a convenience store using SIC code but actually representing a gas station using chain names.
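A small sketch may help show how prioritization removes double counting. The source states only that chain name matches take priority over SIC-only matches; the position of word-search matches in the ordering, the category labels, and the three example records are assumptions made for illustration.

```python
from collections import Counter

# Lower number wins. Placing "word" between "chain" and "sic" is an assumption.
SOURCE_PRIORITY = {"chain": 0, "word": 1, "sic": 2}

def best_category(matches):
    """matches: list of (auxiliary_category, source) pairs for one business.
    Returns the single category from the highest-priority source, or None."""
    if not matches:
        return None
    return min(matches, key=lambda m: SOURCE_PRIORITY[m[1]])[0]

# Hypothetical records with every category each one matched.
records = {
    "A": [("fast_food", "sic"), ("pizza", "word")],
    "B": [("convenience_store", "sic"), ("gas_station", "chain")],
    "C": [("fast_food", "sic")],
}

naive = Counter(cat for m in records.values() for cat, _ in m)
exclusive = Counter(best_category(m) for m in records.values())

# Summing overlapping categories counts business A twice.
print(naive["fast_food"] + naive["pizza"])          # 3 for only 2 businesses
print(exclusive["fast_food"] + exclusive["pizza"])  # 2
```

Because each business ends up in exactly one auxiliary category, aggregated counts can be summed into custom groupings without returning to the business-level records.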

Spatial Linking to Neighborhoods

Once we classified retail establishments, we could link them spatially to individual cohort participants’ residential histories and administrative neighborhoods using GIS. We calculated counts, densities per population, and densities per area for census tracts, ZIP code tabulation areas (ZCTAs), and radial (Euclidean) buffers around participants’ homes. We used buffer sizes of 1 and 5 km to allow for modifications in relevant spatial scale for different research questions or populations. For example, the distance an older adult may be willing to travel for healthy, affordable food may differ from the distance this same individual or another may be willing to travel for necessary routine medical care. Counts and densities for census tracts and ZCTAs allowed for ease of linkage with other cohorts as new collaborations arose. Buffer scale and administrative areas were informed by previous literature [73–75] and resource constraints.
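For the radial buffers, counts and area densities can be sketched with plain geometry. This is not the GIS workflow used in the study: it assumes coordinates already projected to meters, uses hypothetical points, and omits the per-population density, which would need a population estimate for each buffer.

```python
import math

def buffer_metrics(home, establishments, radius_m):
    """Count establishments within a Euclidean buffer and compute density.

    home           -- (x, y) of the residence, in projected meters (assumed)
    establishments -- iterable of (x, y) points in the same projection
    radius_m       -- buffer radius in meters, e.g. 1000 or 5000
    """
    hx, hy = home
    count = sum(1 for x, y in establishments
                if math.hypot(x - hx, y - hy) <= radius_m)
    area_km2 = math.pi * (radius_m / 1000) ** 2
    return {"count": count, "density_per_km2": count / area_km2}

# Hypothetical points around a home at the origin.
points = [(500, 0), (0, 900), (3000, 3000), (6000, 0)]
one_km = buffer_metrics((0, 0), points, 1000)
five_km = buffer_metrics((0, 0), points, 5000)
print(one_km["count"], five_km["count"])  # 2 3
```

Running the same function at both radii gives a quick way to see how sensitive a measure is to the choice of spatial scale.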

Results

Systematic Checks and Refinement Using Additional Data

In total, we checked 4586 individual establishments within 81 ambiguous SIC codes. Decisions made after systematic checks can be obtained from the authors upon request. These advanced our classification beyond the initial classification using SIC codes alone. Similarly, additional word searches or name searches identified more records than SIC code classification alone (Fig. 2; Supplemental Table S2). The size of these gains varied across categories, ranging from less than 5% for pharmacies and drug stores to over 20% for fast food and convenience stores.

Fig. 2

Additional records identified and classified using word and/or chain name searches within different categories. For counts and details, see supplemental data (Table S2). *We did more than one set of word and/or chain name searches within this category

Final Classification System

The final classification system included 44 main categories and 89 auxiliary categories (Supplemental Table S3). The final classification system was complex and intertwined with some auxiliary categories falling in more than one broader main category (See online, interactive diagram: https://bit.ly/2RpGoGR). Very few categories had remaining disagreements that resulted in restrictive and unrestrictive versions. This was most common in food environment establishments where there is a wide variety of evidence for different dietary components and a wide variety of food items (some healthy and some unhealthy) within establishments. For example, healthy food could include grocery stores, supermarkets, and fruit and vegetable markets (restrictive), or it may include additional SIC codes such as fish stores, natural food stores, and more (unrestrictive). Similarly, some research groups may be particularly interested in fast food establishments and could subset this from the broader unhealthy food category.
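The restrictive and unrestrictive versions of a main category can be expressed as two sets of auxiliary categories. The sketch below follows the healthy food example in the text, but the category labels and the tract counts are hypothetical, and it assumes the auxiliary counts are mutually exclusive so that summing them does not double count.

```python
# Auxiliary category labels are illustrative, not the study's variable names.
RESTRICTIVE = {"grocery_store", "supermarket", "fruit_vegetable_market"}
MAIN_CATEGORIES = {
    "healthy_food_restrictive": RESTRICTIVE,
    "healthy_food_unrestrictive": RESTRICTIVE | {"fish_store", "natural_food_store"},
}

def main_category_count(aux_counts, main_name):
    """Sum mutually exclusive auxiliary counts into one main category."""
    return sum(aux_counts.get(aux, 0) for aux in MAIN_CATEGORIES[main_name])

# Hypothetical auxiliary counts for one census tract.
tract_counts = {"grocery_store": 3, "supermarket": 1, "fish_store": 2, "nut_store": 1}

restrictive = main_category_count(tract_counts, "healthy_food_restrictive")
unrestrictive = main_category_count(tract_counts, "healthy_food_unrestrictive")
print(restrictive, unrestrictive)  # 4 6
```

Comparing the two totals for the same area is one simple form of the sensitivity analysis that the two versions are meant to support.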

Classification of Categories over Time

From the original 1990–2014 NETS dataset, our cleaned, classified version included 77,726,626 businesses across the contiguous U.S. The number of businesses has steadily increased since 1990, when NETS was initiated, with a small fluctuation after the most recent economic recession (Fig. 3; Supplemental Table S4). In general, less than 50% of businesses in any given year were classified as health-related businesses, and this percentage decreased between 1990 and 2014 (Fig. 3; Supplemental Table S4). Among all businesses, the largest proportion was classified as walkable destinations, followed by social. Comparatively few establishments were classified as alcohol or physical activity destinations.

Fig. 3

Count of total business establishments in the National Establishment Time Series Dataset across the contiguous U.S. from 1990 to 2014 overlaid with the percentage classified within each domain across time. For counts and details see supplemental data (Table S4)

Discussion

Accurately identifying and quantifying retail environments provides critical information on the contexts and constraints in which people live, work, socialize, and make decisions relevant to their health. In light of the rising emphasis on neighborhood context as a determinant of health [18], we have charted an advanced and novel method for collection, processing, and linking a longitudinal retail and commercial database to both neighborhood- and individual-level data on health. Specifically, our method enhances business data with multiple datasets (including TDLinx® and Google StreetView) and a flexible, hierarchical classification system in order to identify retail destinations important for human health. These methods innovate on existing work by developing a flexible hierarchical system to address both double counting and misclassification. Simultaneously, our use of supplemental data further reduces misclassification.

Previous work that classified business data for use in health research often relied on more spatially or temporally limited datasets [56, 59–61]. Distinct from these, our work addresses several challenges unique to classifying destination data for longitudinal and national health research: (1) consistency of classification across space and temporal patterns, (2) the multiple roles a type of destination might play in the health processes, and (3) encompassing enough flexibility for existing and future research needs.

We created a classification system that aims for consistency across space and temporal patterns. However, this posed additional challenges. Establishments may be region-, state-, city-, or temporally specific; by using TDLinx® data to obtain chain name lists, we eliminate bias from differential knowledge of place and time by the research team. Alternatively, the on-the-ground establishments of a destination type may vary by place and time. For example, destinations classified as fish stores may sell fresh fish in one location and fried fish in another. By creating restrictive and unrestrictive healthy food categories, our classification system allows flexibility for locations where these stores may sell fresh fish. Similarly, there may be temporal shifts in retail patterns and markets. For example, many gas stations had limited food products in the late twentieth century but substantial food retail by the beginning of the twenty-first century. Again, our name search process and hierarchical classification system give researchers flexibility to include or exclude these destinations from a definition of convenience stores if needed. These features enhance previous efforts to study trajectories over time in different business categories [51, 54, 76, 77]. In turn, longitudinal datasets like these may be key to leveraging social, political, or economic events that may impact health for natural experiments. For example, research could examine the potential health consequences of business closures [78] that result from economic recessions or disasters (e.g., extreme weather events, pandemics/COVID-19) [79–81].

Classifying all, rather than a subset of, businesses required flexibility for different types of destinations based on how they may intersect with health processes. For example, a researcher interested in walkable destinations and physical activity might be concerned with the actual walking gained from having these nearby, necessitating a smaller buffer size or the inclusion of only categories someone might feasibly walk to for daily errands. Alternatively, a different investigator working on walkable destinations and mental health might be more interested in a wider set of places that are socially relevant to an individual, may only be visited infrequently, and may be farther from an individual’s home. This necessitates a larger buffer size and the inclusion of different destinations. Our hierarchical system, combined with calculations linked to different buffer sizes and different administrative units, accommodates both of these use cases as well as some exploration of sensitivity and robustness to scale and definition that can inform future data collection and linkage efforts. However, our method only used radial (Euclidean) buffers and administrative units; other researchers may choose to link business data to street (Network) buffers or activity spaces to address uncertainty in geographic context or the modifiable areal unit problem [73, 82, 83].

Our system strives to accommodate flexibility for new avenues of research or modification as they arise. For example, should a new research team hope to examine a single type of business, they could use the auxiliary categories to identify and work with that destination. Similarly, it allows for shifts in the field that may lead to reclassification within other systems. By maintaining auxiliary categories, one could easily remove a destination type if new research emerges that indicates its role in health may be different. For example, past classifications of commercial data included nut stores in the health food, natural food, and vitamin category and treated them as healthy with the assumption that these destinations provided healthy options of raw nuts. However, other classifications placed these same stores into a bakery, candy, and ice cream category and treated them as unhealthy, since many of these stores tend to display candy and sugary nuts more prominently. Due to these debates, our system classified nut stores as a separate category that could be included as desired for analysis purposes depending on the research question, setting, and investigator assumptions.

This method is not confined to the NETS dataset as other similar datasets of local business establishments exist [65]. Indeed, purchasing NETS data may be prohibitively expensive for many health-related research projects, necessitating the use of alternate data sources. A recent review of food environment studies found that the most common data source for food outlet locations was commercial lists (provided for a fee by private companies such as InfoUSA or Dun and Bradstreet), followed by lists provided by government agencies and combinations of these and other sources [11]. InfoUSA is based primarily on the yellow pages and includes a geo-referenced listing maintained by ESRI and updated annually (InfoUSA, Business Analyst, ESRI). It has been used in health studies of depressive symptoms [84], obesity [85], and more. TDLinx®, a division of The Nielsen Company, another US source of data, only includes larger stores [58] but can be used for verification and refinement of other datasets. OneSource Global Business Browser (Avention) continually compiles business lists from over 2500 sources and has also been used in research characterizing built environments for health [86]. Alternatively, ReferenceUSA is a geo-referenced listing that uses Infogroup data and updates continually. Ultimately, no one database is complete, with comparisons showing that concordance varies by outlet type, urbanicity, and neighborhood socioeconomic status [69, 87]. Two separate studies have found agreement between 20 and 65% [69, 87] with both showing higher agreement in urban areas with more population [69, 87]. Regardless of the source of original business data, our method of incorporating complementary data sources through manual spot checks in Google StreetView and word and name searches could be used to enhance a basic classification using only standardized numeric coding systems such as SIC or NAICS.

While our method creates high-quality, flexible metrics of business establishments, we do not prescriptively advise researchers on analysis choices. Specifically, we make no recommendations on the preferred form of these variables when used as an outcome, predictor, or covariate. The preferred form may differ based on the distribution of the destination. For example, many categories (e.g., supermarkets) represent businesses that are relatively rare and have many zero values for both count and density in the buffer, census tract, or ZCTA. These destinations might require researchers to condense across time or to simplify data to presence or absence of a destination type, rather than count or density. Similarly, a research question may be geared toward changes over a set period of time and consider only differences between two time points, rather than using yearly estimates. Our system allows researchers to address these analysis choices on a case-by-case basis. Relatedly, our method of combining data sources reduced misclassification (boosting category membership by as much as 20%), but additional work is needed to quantify the impact of this improvement for specific pairings of business categories and health outcomes; we expect that the results of correcting these biases will differ across research questions.
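The alternative variable forms mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions rather than a recommendation: the function names, the yearly counts, and the tract area are hypothetical, and density is taken here to mean establishments per square kilometer.

```python
def density(count, area_km2):
    """Establishments per square kilometer for one areal unit."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    return count / area_km2


def presence(count):
    """Collapse a count to presence (1) or absence (0)."""
    return 1 if count > 0 else 0


def ever_present(yearly_counts, years):
    """Condense across time: 1 if the destination existed in any listed year."""
    return 1 if any(yearly_counts.get(year, 0) > 0 for year in years) else 0


def change_between(yearly_counts, start_year, end_year):
    """Difference in count between two time points (end minus start)."""
    return yearly_counts[end_year] - yearly_counts[start_year]


# Hypothetical supermarket counts for one census tract of 2.5 square km.
supermarkets = {1990: 0, 2000: 0, 2010: 1, 2014: 2}
tract_area_km2 = 2.5

density_2014 = density(supermarkets[2014], tract_area_km2)
present_1990 = presence(supermarkets[1990])
any_supermarket = ever_present(supermarkets, [1990, 2000, 2010, 2014])
change_1990_2014 = change_between(supermarkets, 1990, 2014)
```

For a rare destination such as this one, the yearly counts are mostly zero, which is the situation in which the presence, condensed, or two-time-point forms may be easier to model than yearly counts or densities.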

Despite the advancements we made by providing a method that is consistent over space and time, accounts for the various roles of destinations in health processes, and has flexibility for existing and future research needs, several limitations remain. First, previous work has highlighted a lack of validation for various commercial lists of destination data, including food outlets [11]. Validation historically has been done via in-person mapping (“ground-truthing”) or through phone and internet validation. These types of validation are not logistically feasible for a national dataset with over 80 million records. Further, even if field teams were not resource-constrained, we would have no means to ground-truth or spot-check historic business data. Indeed, our own systematic checks of ambiguous SIC codes used Google StreetView data solely from the most recent year. Second, despite our efforts to build in flexibility, we still created categories based on the constructs we wished to capture using available literature in the field. Therefore, to the extent that previous cross-sectional or single-location research may not be generalizable to the broader U.S. over multiple decades, our categories may be insufficient. GIS-based commercial measures provide an incomplete picture of the opportunities available to residents. Analyses of understudied regions (including rural areas) may require more qualitative work or a careful interpretation of these measures. Third, the process of classifying across such a broad geographic and temporal scope may be complemented by smaller-scale studies with more tailored measurements. Indeed, our project prioritizes coverage and consistency in measures using national data. Local datasets may be more accurate or valid for some research questions. For example, our method classifies alcohol outlets using both SIC codes and word search. However, city and state regulatory agencies (e.g., the Pennsylvania Liquor Control Board) may provide more detailed information on alcohol outlets for smaller-scale research. Finally, researchers are starting to use machine learning and natural language processing to classify big data like these [88, 89]; it was outside the scope of this project to use these analytic techniques or compare them with our own. Future work could leverage these hand-coded data to train machine learning algorithms and develop new tools for the field of neighborhood health research.

Conclusion

We chronicled methods for collecting and processing a longitudinal, commercial retail database for multiple decades across the entire U.S. Our goal was to obtain objective data, which were reasonably complete and consistent, to link derived measures with national cohort studies. We largely achieved these goals by incorporating multiple data sources, creating a flexible and hierarchical classification system, and creating a national, longitudinal dataset characterizing census tracts, ZCTAs, and buffers.

Ultimately, by creating and sharing these methods, this work promotes new avenues of longitudinal and national research that were previously unattainable. While administrative boundaries are not optimal definitions of neighborhoods for research on individual neighborhoods and health [90–92], the accessibility of this derived dataset promotes dissemination of this type of data and linkage to other cohorts. Similarly, linkage of these data to census tracts and ZCTAs facilitates research on spatial patterns of businesses, including longitudinal assessments of neighborhood change dynamics. Finally, by expanding previous work to a broader context, we greatly increase the field’s capacity to identify and analyze patterns of place and health across space and time.

Electronic Supplementary Material

ESM 1 (81.9KB, docx)

(DOCX 81 kb)

Acknowledgments

We are especially grateful to Dornsife School of Public Health, Drexel University (Janene Brown, Dustin Fry, Sharon Dei-Tumi), LeBow School of Business, Drexel University (Erik Dolson), and Columbia University (Brennan Rhodes Bratton) for their outstanding research assistance in auditing the ambiguous SIC codes. This work was enhanced by expert input from the Columbia University Built Environment Health Group (Tanya K. Kaufman, Nicolas Berger), who categorized the healthcare- and food-relevant locations in the New York-New Jersey-Pennsylvania metropolitan area. We are also grateful to the University of Alabama-Birmingham (Suzanne E. Judd) and Drexel University (Amy Auchincloss) for their input on the food categories; the New York Academy of Medicine (David S. Siscovick) and Rutgers Cancer Institute of New Jersey (Jennifer Tsui) for their input on the healthcare categories; University of Pittsburgh (Christina Mair) for her input on social and depression-related categories; and the investigators and staff of the “Communities Designed to Support Cardiovascular Health” team [National Institute on Aging (1R01AG049970, 3R01AG049970-04S1)] for their valuable contributions. This work was supported by the National Institute on Aging (1R01AG049970, 3R01AG049970-04S1); National Heart, Lung, and Blood Institute (grant R01HL131610); the Pennsylvania Department of Health (SAP #4100072543); the Urban Health Collaborative at Drexel University; and the generous gift from Dana and David Dornsife to the Drexel University Dornsife School of Public Health, whose funding made this study possible.

Abbreviations

D & B: Dun and Bradstreet

GIS: Geographic Information System

JCC: Jewish Community Centers

NAICS: North American Industry Classification System

NETS: National Establishment Time Series

R & I: Restaurants & Institutions

SIC: Standard Industrial Classification

YMCA: Young Men’s Christian Associations

ZCTA: Zip Code Tabulation Area

Authors’ Contributions

AR and GS conceived this project and obtained funding to execute it. JC, JQ, JH, KM, YZ, and FB helped classify, process, and analyze the dataset for this manuscript including statistical and geospatial elements. JH, KM, FB, and GS interpreted the data and were major contributors in writing the manuscript. All the authors read, edited, and approved the final manuscript.

Data Availability

The detailed methods, corresponding code or syntax, and reports summarizing aggregated data developed during the current study are available from the corresponding author on reasonable request, subject to limits specified in data use or licensing agreements.

Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no competing interests.

Ethics Approval and Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Jana A. Hirsch, Email: jah474@drexel.edu

Kari A. Moore, Email: kam642@drexel.edu

Jesse Cahill, Email: jcahill225@gmail.com

James Quinn, Email: jq2145@columbia.edu

Yuzhe Zhao, Email: yz833@drexel.edu

Felicia J. Bayer, Email: fjb47@drexel.edu

Andrew Rundle, Email: agr3@cumc.columbia.edu

Gina S. Lovasi, Email: gsl45@drexel.edu

References

1. Arcaya MC, Tucker-Seeley RD, Kim R, Schnake-Mahl A, So M, Subramanian S. Research on neighborhood effects on health in the United States: a systematic review of study characteristics. Soc Sci Med. 2016;168:16–29. doi: 10.1016/j.socscimed.2016.08.047.
2. Roux AVD, Mair C. Neighborhoods and health. Ann N Y Acad Sci. 2010;1186(1):125–145. doi: 10.1111/j.1749-6632.2009.05333.x.
3. Kawachi I, Berkman LF. Neighborhoods and health. Oxford: Oxford University Press; 2003.
4. Richard L, Gauvin L, Raine K. Ecological models revisited: their uses and evolution in health promotion over two decades. Annu Rev Public Health. 2011;32:307–326. doi: 10.1146/annurev-publhealth-031210-101141.
5. Sarkar C, Webster C. Healthy cities of tomorrow: the case for large scale built environment-health studies. J Urban Health. 2017;94(1):4–19. doi: 10.1007/s11524-016-0122-1.
6. Schulz A, Northridge ME. Social determinants of health: implications for environmental health promotion. Health Educ Behav. 2004;31(4):455–471. doi: 10.1177/1090198104265598.
7. Northridge ME, Sclar ED, Biswas P. Sorting out the connections between the built environment and health: a conceptual framework for navigating pathways and planning healthy cities. J Urban Health. 2003;80(4):556–568. doi: 10.1093/jurban/jtg064.
8. McMichael AJ. Prisoners of the proximate: loosening the constraints on epidemiology in an age of change. Am J Epidemiol. 1999;149(10):887–897. doi: 10.1093/oxfordjournals.aje.a009732.
9. Macintyre S, Ellaway A, Cummins S. Place effects on health: how can we conceptualise, operationalise and measure them? Soc Sci Med (1982) 2002;55(1):125–139. doi: 10.1016/s0277-9536(01)00214-3.
10. Charreire H, Casey R, Salze P, Simon C, Chaix B, Banos A, Badariotti D, Weber C, Oppert JM. Measuring the food environment using geographical information systems: a methodological review. Public Health Nutr. 2010;13(11):1773–1785. doi: 10.1017/S1368980010000753.
11. Cobb LK, Appel LJ, Franco M, Jones-Smith JC, Nur A, Anderson CA. The relationship of the local food environment with obesity: a systematic review of methods, study quality, and results. Obesity. 2015;23(7):1331–1344. doi: 10.1002/oby.21118.
12. Oldenburg RJ. Our vanishing third places. The Planning Commissioners Journal. 1997;25(4):6–10.
13. Mehta V, Bosson JK. Third places and the social life of streets. Environ Behav. 2009;42(6):779–805.
14. Oldenburg R. The great good place: cafes, coffee shops, bookstores, bars, hair salons, and other hangouts at the heart of a community. Cambridge, MA: Da Capo Press; 1999.
15. Oldenburg R. Celebrating the third place: inspiring stories about the great good places at the heart of our communities. Cambridge, MA: Da Capo Press; 2001.
16. Klinenberg E. Heat wave: a social autopsy of disaster in Chicago. Chicago, IL: University of Chicago Press; 2015.
17. Klinenberg E. Palaces for the people: how social infrastructure can help fight inequality, polarization, and the decline of civic life. New York City, NY: Broadway Books; 2018.
18. Gullón P, Lovasi GS. Designing healthier built environments. Oxford, UK: Neighborhoods and Health 2018:219.
19. Saelens BE, Handy SL. Built environment correlates of walking: a review. Med Sci Sports Exerc. 2008;40(7 Suppl):S550–S566. doi: 10.1249/MSS.0b013e31817c67a4.
20. Berchuck SI, Warren JL, Herring AH, Evenson KR, Moore KAB, Ranchod YK, Diez-Roux AV. Spatially modelling the association between access to recreational facilities and exercise: the ‘Multi-Ethnic Study of Atherosclerosis’. J R Stat Soc: Series A (Statistics in Society) 2016;179(1):293–310. doi: 10.1111/rssa.12119.
21. Kaufman TK, Rundle A, Neckerman KM, Sheehan DM, Lovasi GS, Hirsch JA. Neighborhood recreation facilities and facility membership are jointly associated with objectively measured physical activity. J Urban Health 2019:1–13.
22. Park Y, Neckerman K, Quinn J, Weiss C, Jacobson J, Rundle A. Neighbourhood immigrant acculturation and diet among Hispanic female residents of New York City. Public Health Nutr. 2011;14(9):1593–1600. doi: 10.1017/S136898001100019X.
23. Rummo PE, Meyer KA, Boone-Heinonen J, Jacobs DR, Jr, Kiefe CI, Lewis CE, Steffen LM, Gordon-Larsen P. Neighborhood availability of convenience stores and diet quality: findings from 20 years of follow-up in the coronary artery risk development in young adults study. Am J Public Health. 2015;105(5):e65–e73. doi: 10.2105/AJPH.2014.302435.
24. Fleischhacker SE, Evenson KR, Rodriguez DA, Ammerman AS. A systematic review of fast food access studies. Obes Rev. 2011;12(5):e460–e471. doi: 10.1111/j.1467-789X.2010.00715.x.
25. Caspi CE, Sorensen G, Subramanian SV, Kawachi I. The local food environment and diet: a systematic review. Health Place. 2012;18(5):1172–1187. doi: 10.1016/j.healthplace.2012.05.006.
26. Moore KA, Hirsch JA, August C, Mair C, Sanchez BN, Roux AVD. Neighborhood social resources and depressive symptoms: longitudinal results from the Multi-Ethnic Study of Atherosclerosis. J Urban Health. 2016;93(3):572–588. doi: 10.1007/s11524-016-0042-0.
27. Johnson DA, Hirsch JA, Moore KA, Redline S, Diez Roux AV. Associations between the built environment and objective measures of sleep: the Multi-Ethnic Study of Atherosclerosis. Am J Epidemiol. 2018;187(5):941–950. doi: 10.1093/aje/kwx302.
28. Mair C, Roux AD, Golden SH, Rapp S, Seeman T, Shea S. Change in neighborhood environments and depressive symptoms in New York City: the Multi-Ethnic Study of Atherosclerosis. Health Place. 2015;32:93–98. doi: 10.1016/j.healthplace.2015.01.003.
29. Hirsch JA, Moore KA, Clarke PJ, Rodriguez DA, Evenson KR, Brines SJ, Zagorski MA, Diez Roux AV. Changes in the built environment and changes in the amount of walking over time: longitudinal results from the Multi-Ethnic Study of Atherosclerosis. Am J Epidemiol. 2014;180(8):799–809. doi: 10.1093/aje/kwu218.
30. Cerin E, Nathan A, Van Cauwenberg J, Barnett DW, Barnett A. The neighbourhood physical environment and active travel in older adults: a systematic review and meta-analysis. Int J Behav Nutr Phys Act. 2017;14(1):15. doi: 10.1186/s12966-017-0471-5.
31. Owen N, Humpel N, Leslie E, Bauman A, Sallis JF. Understanding environmental influences on walking: review and research agenda. Am J Prev Med. 2004;27(1):67–76. doi: 10.1016/j.amepre.2004.03.006.
32. Meyer KA, Boone-Heinonen J, Duffey KJ, Rodriguez DA, Kiefe CI, Lewis CE, Gordon-Larsen P. Combined measure of neighborhood food and physical activity environments and weight-related outcomes: the CARDIA study. Health Place. 2015;33:9–18. doi: 10.1016/j.healthplace.2015.01.004.
33. Hirsch JA, Moore KA, Barrientos-Gutierrez T, Brines SJ, Zagorski MA, Rodriguez DA, Diez Roux AV. Built environment change and change in BMI and waist circumference: Multi-Ethnic Study of Atherosclerosis. Obesity. 2014;22(11):2450–2457. doi: 10.1002/oby.20873.
34. Lee H. The role of local food availability in explaining obesity risk among young school-aged children. Soc Sci Med. 2012;74(8):1193–1203. doi: 10.1016/j.socscimed.2011.12.036.
35. Zick CD, Smith KR, Fan JX, Brown BB, Yamada I, Kowaleski-Jones L. Running to the store? The relationship between neighborhood environments and the risk of obesity. Soc Sci Med. 2009;69(10):1493–1500. doi: 10.1016/j.socscimed.2009.08.032.
36. Thornton LE, Pearce JR, Kavanagh AM. Using geographic information systems (GIS) to assess the role of the built environment in influencing obesity: a glossary. Int J Behav Nutr Phys Act. 2011;8(1):71. doi: 10.1186/1479-5868-8-71.
37. Auchincloss AH, Roux AVD, Brown DG, Erdmann CA, Bertoni AG. Neighborhood resources for physical activity and healthy foods and their association with insulin resistance. Epidemiology. 2008;19:146–157. doi: 10.1097/EDE.0b013e31815c480.
38. Auchincloss AH, Roux AVD, Mujahid MS, Shen M, Bertoni AG, Carnethon MR. Neighborhood resources for physical activity and healthy foods and incidence of type 2 diabetes mellitus: the Multi-Ethnic study of Atherosclerosis. Arch Intern Med. 2009;169(18):1698–1704. doi: 10.1001/archinternmed.2009.302.
39. Christine PJ, Auchincloss AH, Bertoni AG, Carnethon MR, Sánchez BN, Moore K, Adar SD, Horwich TB, Watson KE, Diez Roux AV. Longitudinal associations between neighborhood physical and social environments and incident type 2 diabetes mellitus: the Multi-Ethnic Study of Atherosclerosis (MESA). JAMA Intern Med. 2015;175(8):1311–1320. doi: 10.1001/jamainternmed.2015.2691.
40. Dubowitz T, Ghosh-Dastidar M, Eibner C, Slaughter ME, Fernandes M, Whitsel EA, Bird CE, Jewell A, Margolis KL, Li W, Michael YL, Shih RA, Manson JAE, Escarce JJ. The Women’s Health Initiative: the food environment, neighborhood socioeconomic status, BMI, and blood pressure. Obesity. 2012;20(4):862–871. doi: 10.1038/oby.2011.141.
41. Kaiser P, Diez Roux AV, Mujahid M, Carnethon M, Bertoni A, Adar SD, Shea S, McClelland R, Lisabeth L. Neighborhood environments and incident hypertension in the Multi-Ethnic Study of Atherosclerosis. Am J Epidemiol. 2016;183(11):988–997. doi: 10.1093/aje/kwv296.
42. Chandrabose M, Rachele J, Gunn L, et al. Built environment and cardio-metabolic health: systematic review and meta-analysis of longitudinal studies. Obes Rev. 2019;20(1):41–54. doi: 10.1111/obr.12759.
43. Roux AVD, Mujahid MS, Hirsch JA, Moore K, Moore LV. The impact of neighborhoods on CV risk. Glob Heart. 2016;11(3):353–363. doi: 10.1016/j.gheart.2016.08.002.
44. Braun LM, Rodríguez DA, Evenson KR, Hirsch JA, Moore KA, Roux AVD. Walkability and cardiometabolic risk factors: cross-sectional and longitudinal associations from the Multi-Ethnic Study of Atherosclerosis. Health Place. 2016;39:9–17. doi: 10.1016/j.healthplace.2016.02.006.
45. Goh CE, Mooney SJ, Siscovick DS, Lemaitre RN, Hurvitz P, Sotoodehnia N, Kaufman TK, Zulaika G, Lovasi GS. Medical facilities in the neighborhood and incidence of sudden cardiac arrest. Resuscitation. 2018;130:118–123. doi: 10.1016/j.resuscitation.2018.07.005.
46. Rosso AL, Grubesic TH, Auchincloss AH, Tabb LP, Michael YL. Neighborhood amenities and mobility in older adults. Am J Epidemiol. 2013;178(5):761–769. doi: 10.1093/aje/kwt032.
47. Yen IH, Michael YL, Perdue L. Neighborhood environment in studies of health of older adults: a systematic review. Am J Prev Med. 2009;37(5):455–463. doi: 10.1016/j.amepre.2009.06.022.
48. Chaudhury H, Campo M, Michael Y, Mahmood A. Neighbourhood environment and physical activity in older adults. Soc Sci Med. 2016;149:104–113. doi: 10.1016/j.socscimed.2015.12.011.
49. Lamichhane AP, Warren JL, Peterson M, Rummo P, Gordon-Larsen P. Spatial-temporal modeling of neighborhood sociodemographic characteristics and food stores. Am J Epidemiol. 2014;181(2):137–150. doi: 10.1093/aje/kwu250.
50. Rummo PE, Guilkey DK, Ng SW, Popkin BM, Evenson KR, Gordon-Larsen P. Beyond supermarkets: food outlet location selection in four US cities over time. Am J Prev Med. 2017;52(3):300–310. doi: 10.1016/j.amepre.2016.08.042.
51. Hirsch JA, Green GF, Peterson M, Rodriguez DA, Gordon-Larsen P. Neighborhood sociodemographics and change in built infrastructure. J Urban: International Research on Placemaking and Urban Sustainability. 2017;10(2):181–197. doi: 10.1080/17549175.2016.1212914.
52. Neumark D, Zhang J, Wall B. Employment dynamics and business relocation: new evidence from the National Establishment Time Series. In: Aspects of worker well-being. Bingley: Emerald Group Publishing Limited; 2007:39–83.
53. Neumark D, Zhang J, Wall B. Where the jobs are. Acad Manag Perspect. 2006;20(4):79–94.
54. Rummo PE, Hirsch JA, Howard AG, Gordon-Larsen P. In which neighborhoods are older adult populations expanding? Sociodemographic and built environment characteristics across neighborhood trajectory classes of older adult populations in four U.S. cities over 30 years. Gerontol Geriatr Med. 2016;2:2333721416655966. doi: 10.1177/2333721416655966.
55. Wilkins EL, Morris MA, Radley D, Griffiths C. Using geographic information systems to measure retail food environments: discussion of methodological considerations and a proposed reporting checklist (geo-FERN). Health Place. 2017;44:110–117. doi: 10.1016/j.healthplace.2017.01.008.
56. Kaufman TK, Sheehan DM, Rundle A, Neckerman KM, Bader MDM, Jack D, Lovasi GS. Measuring health-relevant businesses over 21 years: refining the National Establishment Time-Series (NETS), a dynamic longitudinal data set. BMC Research Notes. 2015;8(1):507. doi: 10.1186/s13104-015-1482-4.
57. Auchincloss AH, Moore KAB, Moore LV, Diez Roux AV. Improving retrospective characterization of the food environment for a large region in the United States during a historic time period. Health Place. 2012;18(6):1341–1347. doi: 10.1016/j.healthplace.2012.06.016.
58. Wang MC, Gonzalez AA, Ritchie LD, Winkleby MA. The neighborhood food environment: sources of historical data on retail food stores. Int J Behav Nutr Phys Act. 2006;3(1):15. doi: 10.1186/1479-5868-3-15.
59. Rundle AG, Chen Y, Quinn JW, et al. Development of a neighborhood walkability index for studying neighborhood physical activity contexts in communities across the US over the past three decades. J Urban Health 2019:1–8.
60. Fleischhacker SE, Evenson KR, Sharkey J, Pitts SBJ, Rodriguez DA. Validity of secondary retail food outlet data: a systematic review. Am J Prev Med. 2013;45(4):462–473. doi: 10.1016/j.amepre.2013.06.009.
61. Bader MD, Ailshire JA, Morenoff JD, House JS. Measurement of the local food environment: a comparison of existing data sources. Am J Epidemiol. 2010;171(5):609–617. doi: 10.1093/aje/kwp419.
62. McCormack GR, Shiell A. In search of causality: a systematic review of the relationship between the built environment and physical activity among adults. Int J Behav Nutr Phys Act. 2011;8(1):125. doi: 10.1186/1479-5868-8-125.
63. Ding D, Gebel K. Built environment, physical activity, and obesity: what have we learned from reviewing the literature? Health Place. 2012;18(1):100–105. doi: 10.1016/j.healthplace.2011.08.021.
64. Lovasi GS, Grady S, Rundle A. Steps forward: review and recommendations for research on walkability, physical activity and cardiovascular health. Public Health Rev. 2011;33(2):484–506. doi: 10.1007/BF03391647.
65. Forsyth A, Lytle L, Riper DV. Finding food: issues and challenges in using geographic information systems to measure food access. J Transp Land Use. 2010;3(1):43–65. doi: 10.5198/jtlu.v3i1.105.
66. Ni Mhurchu C, Vandevijvere S, Waterlander W, Thornton LE, Kelly B, Cameron AJ, Snowdon W, Swinburn B, INFORMAS. Monitoring the availability of healthy and unhealthy foods and non-alcoholic beverages in community and consumer retail food environments globally. Obes Rev. 2013;14(S1):108–119. doi: 10.1111/obr.12080.
67. Dun & Bradstreet Corp. Dun & Bradstreet Corp/NW 2016 Annual Report Form (10-K). 2017; https://www.sec.gov/Archives/edgar/data/1115222/000111522217000007/a201610-k.htm. Accessed September 13, 2017.
68. Walls D. National establishment time-series (NETS) database: 2013 database description. 2015. Denver, CO.
69. Hoehner CM, Schootman M. Concordance of commercial data sources for neighborhood-effects studies. J Urban Health. 2010;87(4):713–725. doi: 10.1007/s11524-010-9458-0.
70. NAICS Association. NAICS to SIC crosswalk. 2020; https://www.naics.com/naics-to-sic-crosswalk-2/. Accessed July 17, 2020.
71. Jones KK, Zenk SN, Tarlov E, Powell LM, Matthews SA, Horoi I. A step-by-step approach to improve data quality when using commercial business lists to characterize retail food environments. BMC Res Notes. 2017;10(1):35. doi: 10.1186/s13104-016-2355-1.
72. Rundle A, Neckerman Kathryn M, Freeman L, et al. Neighborhood food environment and walkability predict obesity in New York City. Environ Health Perspect. 2009;117(3):442–447. doi: 10.1289/ehp.11590.
73. James P, Berrigan D, Hart JE, Aaron Hipp J, Hoehner CM, Kerr J, Major JM, Oka M, Laden F. Effects of buffer size and shape on associations between the built environment and energy balance. Health Place. 2014;27:162–170. doi: 10.1016/j.healthplace.2014.02.003.
74. Feng J, Glass TA, Curriero FC, Stewart WF, Schwartz BS. The built environment and obesity: a systematic review of the epidemiologic evidence. Health Place. 2010;16(2):175–190. doi: 10.1016/j.healthplace.2009.09.008.
75. Richardson AS, Meyer KA, Howard AG, Boone-Heinonen J, Popkin BM, Evenson KR, Shikany JM, Lewis CE, Gordon-Larsen P. Multiple pathways from the neighborhood food environment to increased body mass index through dietary behaviors: a structural equation-based analysis in the CARDIA study. Health Place. 2015;36:74–87. doi: 10.1016/j.healthplace.2015.09.003.
76. Hirsch JA, Grengs J, Schulz A, Adar SD, Rodriguez DA, Brines SJ, Diez Roux AV. How much are built environments changing, and where?: patterns of change by neighborhood sociodemographic characteristics across seven US metropolitan areas. Soc Sci Med. 2016;169:97–105. doi: 10.1016/j.socscimed.2016.09.032.
77. Berger N, Kaufman TK, Bader MD, et al. Disparities in trajectories of changes in the unhealthy food environment in New York city: a latent class growth analysis, 1990–2010. Soc Sci Med. 2019;2019:112362. doi: 10.1016/j.socscimed.2019.112362.
78. Finlay J, Esposito M, Kim MH, Gomez-Lopez I, Clarke P. Closure of ‘third places’? Exploring potential consequences for collective health and wellbeing. Health Place. 2019;60:102225. doi: 10.1016/j.healthplace.2019.102225.
79. Bezruchka S. The effect of economic recession on population health. Can Med Assoc J. 2009;181(5):281–285. doi: 10.1503/cmaj.090553.
80. Katikireddi SV, Niedzwiedz CL, Popham F. Trends in population mental health before and after the 2008 recession: a repeat cross-sectional analysis of the 1991–2010 Health Surveys of England. BMJ Open. 2012;2(5):e001790. doi: 10.1136/bmjopen-2012-001790.
81. Nandi A, Charters TJ, Strumpf EC, Heymann J, Harper S. Economic conditions and health behaviours during the ‘Great Recession’. J Epidemiol Community Health. 2013;67(12):1038–1046. doi: 10.1136/jech-2012-202260.
82. Hirsch JA, Winters M, Clarke P, McKay H. Generating GPS activity spaces that shed light upon the mobility habits of older adults: a descriptive analysis. Int J Health Geogr. 2014;13(1):51. doi: 10.1186/1476-072X-13-51.
83. Chen X, Kwan M-P. Contextual uncertainties, human mobility, and perceived food environment: the uncertain geographic context problem in food access research. Am J Public Health. 2015;105(9):1734–1737. doi: 10.2105/AJPH.2015.302792.
84. Duncan DT, Piras G, Dunn EC, Johnson RM, Melly SJ, Molnar BE. The built environment and depressive symptoms among urban youth: a spatial regression study. Spatial Spatio-Temporal Epidemiol. 2013;5:11–25. doi: 10.1016/j.sste.2013.03.001.
85. Black JL, Macinko J. The changing distribution and determinants of obesity in the neighborhoods of New York City, 2003–2007. Am J Epidemiol. 2010;171(7):765–775. doi: 10.1093/aje/kwp458.
86. Stewart OT, Carlos HA, Lee C, Berke EM, Hurvitz PM, Li L, Moudon AV, Doescher MP. Secondary GIS built environment data for health research: guidance for data development. J Transp Health. 2016;3(4):529–539. doi: 10.1016/j.jth.2015.12.003.
87. Burgoine T, Harrison F. Comparing the accuracy of two secondary food environment data sources in the UK across socio-economic and urban/rural divides. Int J Health Geogr. 2013;12(1):2. doi: 10.1186/1476-072X-12-2.
88. Chen X, Lin X. Big data deep learning: challenges and perspectives. IEEE Access. 2014;2:514–525.
89. Jin X, Wah BW, Cheng X, Wang Y. Significance and challenges of big data research. Big Data Res. 2015;2(2):59–64.
90. Coulton C. Defining neighborhoods for research and policy. Cityscape. 2012:231–6.
91. Duncan DT, Kawachi I, Subramanian S, Aldstadt J, Melly SJ, Williams DR. Examination of how neighborhood definition influences measurements of youths’ access to tobacco retailers: a methodological note on spatial misclassification. Am J Epidemiol. 2013;179(3):373–381. doi: 10.1093/aje/kwt251.
92. Tatalovich Z, Wilson JP, Milam JE, Jerrett M, McConnell R. Competing definitions of contextual environments. Int J Health Geogr. 2006;5(1):55. doi: 10.1186/1476-072X-5-55.


Articles from Journal of Urban Health : Bulletin of the New York Academy of Medicine are provided here courtesy of New York Academy of Medicine
