International Journal of Epidemiology. 2017 Jun 6;46(5):1456–1464. doi: 10.1093/ije/dyx070

Does unmeasured confounding influence associations between the retail food environment and body mass index over time? The Coronary Artery Risk Development in Young Adults (CARDIA) study

Pasquale E Rummo 1,2, David K Guilkey 2,3, Shu Wen Ng 1,2, Katie A Meyer 1,2, Barry M Popkin 1,2, Jared P Reis 4, James M Shikany 5, Penny Gordon-Larsen 1,2,*
PMCID: PMC5837451  PMID: 28586464

Abstract

Background

Findings in the observational retail food environment and obesity literature are inconsistent, potentially due to a lack of adjustment for residual confounding.

Methods

Using data from the CARDIA study (n = 12 174 person-observations; 6 examinations; 1985–2011) across four US cities (Birmingham, AL; Chicago, IL; Minneapolis, MN; Oakland, CA), we used instrumental-variables (IV) regression to obtain causal estimates of the longitudinal associations between the percentage of neighbourhood food stores or restaurants (per total food outlets within 1 km network distance of respondent residence) with body mass index (BMI), adjusting for individual-level socio-demographics, health behaviours, city, year, total food outlets and market-level prices. To determine the presence and extent of bias, we compared the magnitude and direction of results with ordinary least squares (OLS) and random effects (RE) regression, which do not control for residual confounding, and with fixed effects (FE) regression, which does not control for time-varying residual confounding.

Results

Relative to neighbourhood supermarkets (which tend to be larger and have healthier options than grocery stores), a higher percentage of grocery stores [mean = 53.4%; standard deviation (SD) = 31.8%] was positively associated with BMI [β = 0.05; 95% confidence interval (CI) = 0.01, 0.10] using IV regression. However, associations were negligible or null using OLS (β = −0.001; 95% CI = −0.01, 0.01), RE (β = −0.003; 95% CI = −0.01, 0.0001) and FE (β = −0.003; 95% CI = −0.01, 0.0002) regression. Neighbourhood convenience stores and fast-food restaurants were not associated with BMI in any model.

Conclusions

Longitudinal associations between neighbourhood food outlets and BMI were greater in magnitude using a causal model, suggesting that weak findings in the literature may be due to residual confounding.

Keywords: Instrumental-variables regression, neighbourhoods, retail food environment, obesity, weight, endogeneity

Background

In response to inequities in access to healthy food choices, policy makers have sought to modify the retail food environment in low-income areas.1,2 Theoretically, such efforts would influence where residents shop, what they consume and ultimately weight status. However, such experiments have not been successful in reducing obesity,1,3–7 despite some mixed supporting evidence from observational research. For example, findings from observational studies suggest a positive association between density of fast-food restaurants, convenience stores and grocery stores with body mass index (BMI),8–13 and a negative association between full-service restaurants and supermarkets with BMI.8–10,13 Yet Cobb et al., in a systematic review, reported that associations between the retail food environment and obesity are predominantly null.10

Inconsistencies in observational research may be due to a lack of adjustment for unmeasured confounding such as: unmeasured preferences for residing near certain food outlet types; placement of food outlets in areas with higher demand;14 reverse causality; or differential measurement error. Non-causal methods (i.e. any model that ignores time-invariant and time-varying residual confounding), such as ordinary least squares (OLS) and random effects (RE) regression, implicitly assume that omitted variables (e.g. residential preferences) are independent of explanatory variables, and thus may produce biased estimates in the presence of residual confounding.15 Fixed effects (FE) regression controls for observed and unobserved time-invariant characteristics15 but ignores unobserved time-varying characteristics. In contrast, instrumental-variables (IV) regression is a causal approach that corrects for time-varying and time-invariant residual confounding by using proxies for exposures and eliminating the correlation between exposures and unmeasured characteristics.16,17 A few cross-sectional studies on fast-food restaurant availability and BMI have used IV regression, finding estimates of greater magnitude relative to OLS regression,18–20 but these studies did not address possible substitution effects (e.g. higher relative availability of full-service versus fast-food restaurants).

To address these gaps, we used 25 years of data from the Coronary Artery Risk Development in Young Adults (CARDIA) study and IV regression to quantify associations between different types of neighbourhood food outlets and BMI over time, while accounting for correlation between measured exposures and unmeasured characteristics. We compared the magnitude and direction of estimates from a causal approach (IV regression) with estimates derived from non-causal models (OLS, RE and FE regression) to assess the extent of bias. Based on previous work,18–20 we hypothesized that non-causal models would underestimate the impact of the retail food environment on obesity, possibly due to a lack of adjustment for unobserved bias.

Methods

Study sample

CARDIA is a prospective study of the development and risk factors of cardiometabolic disease in Black and White young adults. In 1985–86, 5115 CARDIA participants were recruited from four US metropolitan areas (Birmingham, AL; Chicago, IL; Minneapolis, MN; Oakland, CA); enrolment was balanced by age (18–24 years or 25–30 years), race (White or Black), gender and education (≤ high school or > high school). Follow-up examinations were conducted in 1987–88 (Year 2), 1990–91 (Year 5), 1992–93 (Year 7), 1995–96 (Year 10), 2000–01 (Year 15), 2005–06 (Year 20) and 2010–11 (Year 25), with retention of 91%, 86%, 81%, 79%, 74%, 72% and 72% of participants, respectively.

Individual-level data

Self-reported socio-demographics were collected at each examination, using a standardized questionnaire, including age, gender, race (Black, White), current educational attainment (years), marital status and number of children. Total family income (categorical responses) was collected starting with Year 5, so we used income values from Year 5 as a proxy for baseline values (no other Year 5 data were used).

Self-reported physical activity (PA) was assessed at each examination using the CARDIA PA History questionnaire21 which captures frequency of participation in 13 categories of exercise in the previous 12 months. Alcohol consumption in the past year was assessed using a self-reported questionnaire at each examination.

Outcome variables

Height and weight were measured to the nearest 0.5 cm and 0.1 kg, respectively, by trained study staff and used to calculate BMI (kg/m²). Waist circumference (WC) was measured in duplicate at the minimum abdominal girth.
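As a minimal sketch of the BMI calculation described above, the function below converts height from centimetres to metres and divides weight by height squared. The function name and the input checks are illustrative assumptions, not part of the CARDIA protocol.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index (kg/m^2) from weight in kg and height in cm."""
    if weight_kg <= 0 or height_cm <= 0:
        # Hypothetical guard: the source does not describe input validation.
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


# Illustrative values only (not CARDIA data).
example_bmi = bmi(70.0, 175.0)
```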

Neighbourhood-level data

Using Dun & Bradstreet (D&B) Duns Market Identifiers File (Dun & Bradstreet, Inc., Short Hills, NJ),22 a commercial dataset of US businesses with fair reliability and validity,23–25 we obtained the counts of PA facilities and food outlets at each examination year. We classified food outlets according to 8-digit Standard Industrial Classification (SIC) codes in Years 7, 10, 15, 20 and 25 (Appendix 1, available as Supplementary data at IJE online). Only 4-digit codes were available in 1986, so we used matched business names and a prediction model to supplement classification at baseline (Appendix 2, available as Supplementary data at IJE online).

We also used data from several commercial sources to calculate measures related to neighbourhood socio-demographics, employment density, street connectivity and consumer prices (Appendix 2). Using a geographic information system (GIS), we matched neighbourhood-level measures to CARDIA respondents’ residential addresses at baseline and Years 7, 10, 15, 20 and 25.

Analytical sample

Participants who resided in one of the four baseline cities in each examination year were eligible for the current study (n = 4316, 2462, 1728, 1481, 1202 and 1119 at baseline and Years 7, 10, 15, 20 and 25, respectively). We excluded one participant who withdrew from the study and two participants who changed gender. We also excluded women who were pregnant at the time of examination (n = 6, 33, 9, 4, 3 and 1 at baseline and Years 7, 10, 15, 20 and 25, respectively) and those with missing BMI data (n = 13, 23, 15, 5, 10 and 3 at baseline and Years 7, 10, 15, 20 and 25, respectively). Our final sample sizes were 4294, 2404, 1702, 1470, 1189 and 1115 individuals at baseline and Years 7, 10, 15, 20 and 25, respectively (n = 12 174 person-observations).
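The person-observation total can be checked directly from the per-examination sample sizes reported above. This is only an arithmetic sanity check; the dictionary keys are hypothetical labels.

```python
# Final analytical sample sizes per examination, as reported in the text.
final_n = {
    "baseline": 4294,
    "year_7": 2404,
    "year_10": 1702,
    "year_15": 1470,
    "year_20": 1189,
    "year_25": 1115,
}

# Each participant contributes one observation per examination attended,
# so the person-observation count is the sum across examinations.
total_person_observations = sum(final_n.values())
```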

Using multilevel mixed effects linear regression (-mixed- in Stata 14.0) with baseline study centre, gender, race, age and year, we imputed missing values for individual-level income (n = 755, 55, 25, 26, 34 and 31 at baseline and Years 7, 10, 15, 20 and 25, respectively), marital status (n = 6 at baseline), alcohol intake (n = 2, 12, 18, 4, 21 and 11 at baseline and Years 7, 10, 15, 20 and 25, respectively) and PA (n = 1, 47, 23, 6, 12 and 312 at baseline and Years 7, 10, 15, 20 and 25, respectively). Using the mean of non-missing values across all years, we also imputed missing values for census-derived sociodemographics (n = 4), food outlets (n = 4) and road connectivity (n = 5) at baseline and Years 7, 10 and 15.
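The simpler of the two imputation steps, filling a missing neighbourhood value with the mean of the non-missing values across years, might look like the sketch below. It assumes the mean is taken within one participant's own series of examination years, which the text does not state explicitly; the function name and data layout are hypothetical. The mixed-model imputation is not reproduced here.

```python
from statistics import mean
from typing import Dict, Optional


def impute_with_series_mean(
    values_by_year: Dict[int, Optional[float]]
) -> Dict[int, float]:
    """Replace missing (None) values with the mean of the observed values."""
    observed = [v for v in values_by_year.values() if v is not None]
    if not observed:
        # Nothing to average; the source does not say how this case was handled.
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return {
        year: (fill if value is None else value)
        for year, value in values_by_year.items()
    }


# Illustrative series (not CARDIA data): the Year 7 value is missing.
imputed = impute_with_series_mean({0: 4.0, 7: None, 10: 6.0})
```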

To account for potential selection bias due to out-migration over time, we used gender, race and baseline study centre to predict the probability of being in the sample at the end of follow-up. We used the inverse of the probability to weight all models (-pweight-).
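A rough illustration of inverse-probability weighting follows. Because the three predictors are categorical, this sketch estimates the retention probability as the observed proportion retained within each gender, race and centre stratum. That is an assumption made for simplicity; the authors' actual prediction model is not described in enough detail to reproduce, and all names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Stratum = Tuple[str, str, str]  # (gender, race, baseline study centre)


def retention_weights(records: List[Tuple[Stratum, bool]]) -> Dict[Stratum, float]:
    """Inverse of the within-stratum proportion retained at end of follow-up."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [retained, total]
    for stratum, retained in records:
        counts[stratum][1] += 1
        if retained:
            counts[stratum][0] += 1
    weights = {}
    for stratum, (kept, total) in counts.items():
        if kept == 0:
            continue  # no retained members, so there is nobody to weight
        weights[stratum] = total / kept  # 1 / (kept / total)
    return weights


# Illustrative records only (not CARDIA data).
records = (
    [(("F", "Black", "Chicago"), True)] * 2
    + [(("F", "Black", "Chicago"), False)] * 2
    + [(("M", "White", "Oakland"), True)] * 4
)
weights = retention_weights(records)
```

Strata with lower retention receive larger weights, so retained participants stand in for similar participants who left the sample.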

Statistical analysis

Exposure specification

To create our explanatory variables (Y vector in equations below), we used the count of each food outlet type within a 1-km street network distance from respondents’ residences, which captures walking distance to food outlets.26 We calculated the percentage of convenience stores, grocery stores and supermarkets out of total food stores (sum of convenience stores, grocery stores and supermarkets). We also calculated the percentage of fast-food restaurants and full-service restaurants out of total restaurants (sum of fast-food and full-service restaurants). Thus, modelling a 10% increase in one type of food store (or restaurant) equals a 10% decrease in the other food stores (or restaurants). We also modelled the total count of food outlets as endogenous variables, to account for variation in the denominator of our central exposure variables (i.e. having fewer or more alternatives might influence choice of food outlet). Endogenous variables (including exposures) are related to and determined by other variables in the model.27
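The exposure construction for food stores can be sketched as follows. The function name is hypothetical, and returning None when there are no stores within the buffer is an assumption, since the text does not say how a zero denominator was handled.

```python
from typing import Dict, Optional


def store_shares(
    convenience: int, grocery: int, supermarket: int
) -> Dict[str, Optional[float]]:
    """Percentage of each store type out of total food stores in the buffer."""
    total = convenience + grocery + supermarket
    if total == 0:
        # Assumption: shares are undefined when no food stores are present.
        return {"convenience": None, "grocery": None,
                "supermarket": None, "total": 0}
    return {
        "convenience": 100.0 * convenience / total,
        "grocery": 100.0 * grocery / total,
        "supermarket": 100.0 * supermarket / total,
        "total": total,
    }


# Illustrative counts only (not CARDIA data).
shares = store_shares(convenience=3, grocery=5, supermarket=2)
```

Because the three shares sum to 100, raising one share necessarily lowers the others, which is why supermarkets are omitted as the referent in the models.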

Covariates

We adjusted for several exogenous variables (X vectors in equations below), including age and age-squared (continuous), race (White, Black), gender, educational attainment (< high school, ≥ high school), income (≤ $42 500, > $42 500), baseline study centre, year and market-level cigarette and fast-food prices (Appendix 2). Exogenous variables are theoretically and statistically associated with endogenous variables, and not determined by other variables in the model.

Based on previously established methods,28 we calculated total PA intensity scores (exercise units) using a summary of the frequency and intensity of participants’ moderate and vigorous activities. We treated total PA, alcohol intake (yes/no), marital status (yes/no) and number of children as endogenous (W vectors in equations below).

Instrumental variables

Valid instruments (Z vectors in equations below) should be theoretically and statistically associated with endogenous variables, and have no direct associations with the outcome (outside their influence on endogenous variables) nor with error terms in regression equations. Our set of instruments included: population density; percentage neighbourhood White population; percentage neighbourhood population ≤ 18 years; distance to nearest employment subcentre; count of public and fee-based PA facilities; market-level wine and beer prices; and street connectivity (Appendix 2). We theorized that this set of variables was directly associated with neighbourhood food outlets and other endogenous variables, but not directly associated with BMI or error terms in the model.

Empirical model

The general specification for the IV model (Supplementary Figure 1, available as Supplementary data at IJE online) is shown below:

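$$W_{it} = \alpha_1 Z_{it} + \beta_1 X_{it} + \mu_{1i} + \varepsilon_{1it} \qquad (1)$$
$$Y_{it} = \alpha_2 Z_{it} + \beta_2 X_{it} + \gamma_1 W_{it} + \mu_{2i} + \varepsilon_{2it} \qquad (2)$$
$$B_{it} = \delta_1 Y_{it} + \beta_3 X_{it} + \gamma_2 W_{it} + \mu_{3i} + \varepsilon_{3it} \qquad (3)$$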

In equation 1, $W_{it}$ represents a vector of endogenous variables, which influence BMI and retail food environment variables, and are also influenced by exogenous variables; $Z_{it}$ represents a vector of exogenous instrumental variables; and $X_{it}$ represents a vector of non-instrument exogenous variables. In equation 2, $Y_{it}$ represents a vector of endogenous retail food environment variables. In equation 3, $B_{it}$ is BMI at each examination. Across equations, $i$ equals 1,…, $N$ participants; $t$ equals 1,…, $T_i$ years; and $\mu_i$ and $\varepsilon_{it}$ represent unobserved time-invariant and time-varying error components, respectively. The equations capture both the direct and the indirect effects of vectors on endogenous variables (e.g. $\alpha_2$ represents the direct effect of $Z_{it}$ on $Y_{it}$, and $\alpha_1$ represents the indirect effect of $Z_{it}$ on $Y_{it}$ via $W_{it}$).

Estimators and empirical tests of IV assumptions

We used a generalized method of moments (GMM) estimator for IV regression, which is a single-equation estimation approach based on a two-stage least-squares estimator.29 The GMM estimator allows for a cluster-corrected weighting matrix, which is more efficient than other IV estimators. We used -ivregress- with the ‘gmm’ option in Stata (version 14.0).
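To illustrate why an IV estimator can differ from OLS, the sketch below simulates one exposure, one instrument and one unmeasured confounder. It uses the simple ratio estimator cov(z, y) / cov(z, x), which coincides with two-stage least squares when there is exactly one instrument and one endogenous regressor. This is a deliberately reduced stand-in, not the authors' multi-equation GMM model, and all variable names and parameter values are assumptions chosen for illustration.

```python
import random


def _centered_cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def ols_slope(x, y):
    """Slope from regressing y on x with an intercept."""
    return _centered_cross(x, y) / _centered_cross(x, x)


def iv_slope(z, x, y):
    """Just-identified IV estimate: cov(z, y) / cov(z, x)."""
    return _centered_cross(z, y) / _centered_cross(z, x)


rng = random.Random(0)
n = 20000
true_beta = 0.5  # assumed true effect of exposure x on outcome y

z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
u = [rng.gauss(0, 1) for _ in range(n)]  # unmeasured confounder
e = [rng.gauss(0, 1) for _ in range(n)]  # outcome noise
x = [zi + ui for zi, ui in zip(z, u)]    # exposure depends on z and u
# u raises x but lowers y, so the error term is negatively correlated with x.
y = [true_beta * xi - ui + ei for xi, ui, ei in zip(x, u, e)]

ols_estimate = ols_slope(x, y)   # attenuated towards zero in this design
iv_estimate = iv_slope(z, x, y)  # close to true_beta in this design
```

In this construction the confounder pushes the OLS slope towards zero while the instrument, which is independent of the confounder, recovers a value near the assumed true effect.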

We used the Sargan-Hansen J test of over-identifying restrictions to test the assumption that our IVs were exogenous (i.e. not related to or determined by other variables also in the model). Failure to reject the null hypothesis (P > 0.05) indicates that our IVs were exogenous and that it was valid to exclude them as predictors of BMI. We used the Durbin-Wu-Hausman test to evaluate whether our theoretically endogenous variables were in fact endogenous (i.e. related to and determined by other variables in the model). Rejecting the null hypothesis (P < 0.10) implies that our assumption about endogeneity was correct. We obtained goodness-of-fit statistics to evaluate the explanatory power of our IVs. An F statistic greater than 10 indicates that our IVs were strong predictors of endogenous variables.30 We used the -estat- post-estimation command for all empirical tests.
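The instrument-strength check can be sketched for the simplest case of a single instrument, where the first-stage F statistic equals R² / (1 − R²) × (n − 2). The paper's F statistics come from joint tests over several instruments with covariates, so this is only an illustration of the rule of thumb; the simulated data and names are assumptions.

```python
import random


def first_stage_f(z, x):
    """F statistic for regressing x on one instrument z (with intercept)."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    r2 = szx * szx / (szz * sxx)       # squared correlation of z and x
    return r2 * (n - 2) / (1.0 - r2)   # one numerator degree of freedom


rng = random.Random(1)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
# Assumed strong first stage: the exposure loads heavily on the instrument.
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]

f_stat = first_stage_f(z, x)
is_strong = f_stat > 10  # conventional rule-of-thumb threshold
```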

We then compared IV estimates with non-causal estimators, including: OLS regression (with robust variance) and RE regression, which do not account for endogeneity (i.e. unmeasured confounding, reverse causality and differential measurement error);15 and FE regression, which controls for time-invariant endogeneity only15 (Supplementary Table 1, available as Supplementary data at IJE online). We adjusted for all covariates in each model. We did not include food purchasing and consumption measures because these constructs are on the causal pathway and adjustment would theoretically attenuate estimated effects.
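The difference between pooled OLS and FE regression can be shown with a small noise-free example. Each person has a fixed, unobserved level that is correlated with the exposure, and the within (demeaning) estimator removes it. The data below are constructed for illustration with an assumed true slope of 2; they are not CARDIA data, and the function names are hypothetical.

```python
from collections import defaultdict


def pooled_slope(rows):
    """OLS slope of y on x, ignoring who each observation belongs to."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_slope(rows):
    """Fixed effects (within) slope: demean x and y inside each person."""
    by_person = defaultdict(list)
    for pid, x, y in rows:
        by_person[pid].append((x, y))
    sxy = sxx = 0.0
    for obs in by_person.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx


# y = 2 * x + person effect; person A has effect 10, person B has effect 0.
rows = [
    ("A", 1.0, 12.0), ("A", 2.0, 14.0), ("A", 3.0, 16.0),
    ("B", 4.0, 8.0), ("B", 5.0, 10.0), ("B", 6.0, 12.0),
]
pooled = pooled_slope(rows)  # distorted by the time-invariant person effect
within = within_slope(rows)  # recovers the within-person slope
```

The within estimator only removes confounders that are constant over time; a confounder that changes between examinations would still bias it, which is the gap IV regression is meant to address.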

We considered comparing IV estimates with Heckman selection models, but we were unable to identify an exclusion restriction (i.e. a variable that predicts the probability of being obese, but not linear BMI); as well as propensity score-matching methods, but this approach does not account for unobserved bias.31–33

Sensitivity analyses

To determine whether estimates from the central analysis were robust to our measure of obesity, we replicated all analyses with WC as the outcome. We considered using lagged IVs and endogenous variables, but decided that loss of explanatory power and uneven intervals between examinations justified using contemporaneous exposure and outcome variables.

Results

Mean BMI was 24.5 kg/m² (SD = 5.1) and 31.0 kg/m² (SD = 8.0) at baseline and Year 25, respectively, with a mean of 27.3 kg/m² (SD = 6.9) across follow-up (Table 1). Over time, the percentage of neighbourhood full-service restaurants, convenience stores and supermarkets increased, the percentage of fast-food restaurants and grocery stores decreased and total food outlet counts increased.

Table 1.

Descriptive statistics for participants over the study period: CARDIA baseline and Years 7–25 (1985/86–2010/11)

| Characteristic | Baseline | Year 7 | Year 10 | Year 15 | Year 20 | Year 25 | Baseline to Year 25 (average) |
|---|---|---|---|---|---|---|---|
| N (person-observations) | 4294 | 2404 | 1702 | 1470 | 1189 | 1115 | 12 174 |
| Individual-level socio-demographics [% or mean (SD)] | | | | | | | |
| White | 44.0 | 41.2 | 32.2 | 32.3 | 31.1 | 31.0 | 37.9 |
| Female | 53.8 | 54.5 | 56.3 | 56.5 | 59.7 | 58.1 | 55.6 |
| Education ≤ high school | 58.1 | 58.7 | 64.6 | 70.5 | 73.2 | 75.6 | 63.7 |
| Income ≤ $12 000 | 33.8 | 38.4 | 37.5 | 25.3 | 25.4 | 28.5 | 32.3 |
| Marital status (yes) | 21.7 | 37.8 | 36.4 | 38.8 | 39.1 | 37.3 | 32.2 |
| Children (yes/no) | 34.0 | 57.9 | 64.0 | 69.7 | 73.1 | 74.7 | 54.8 |
| Alcohol intake (yes) | 59.9 | 55.3 | 51.0 | 48.8 | 50.2 | 49.7 | 54.5 |
| Total physical activity (exercise units)ᵃ | 418 (305) | 332 (272) | 326 (285) | 328 (279) | 304 (266) | 307 (269) | 356 (290) |
| Age, years | 24.8 (3.7) | 32.0 (3.7) | 35.0 (3.8) | 40.0 (3.8) | 45.2 (3.7) | 50.1 (3.8) | 33.8 (9.2) |
| BMI (kg/m²) | 24.5 (5.1) | 27.1 (6.5) | 28.3 (7.1) | 29.5 (7.5) | 30.3 (7.2) | 31.0 (8.0) | 27.3 (6.9) |
| Neighbourhood-level food outlets within 1 km [mean (SD)] | | | | | | | |
| Fast-food restaurants, % per total restaurants | 65.0 (46.0) | 45.5 (34.8) | 43.3 (36.9) | 41.9 (35.3) | 41.8 (30.9) | 40.1 (29.6) | 50.8 (40.1) |
| Full-service restaurants, % per total restaurants | 2.6 (9.9) | 39.5 (33.5) | 32.6 (33.2) | 34.4 (32.7) | 41.6 (30.9) | 43.2 (30.5) | 25.5 (31.8) |
| Convenience stores, % per total food stores | 29.6 (28.3) | 37.8 (25.9) | 38.7 (29.4) | 36.8 (28.9) | 37.7 (27.8) | 37.0 (29.6) | 34.8 (28.4) |
| Grocery stores, % per total food stores | 60.5 (33.1) | 53.7 (28.1) | 49.2 (31.8) | 49.4 (31.6) | 46.0 (29.6) | 45.4 (31.2) | 53.4 (31.8) |
| Supermarkets, % per total food stores | 1.5 (6.3) | 3.4 (8.6) | 3.6 (9.5) | 3.8 (9.8) | 5.6 (11.9) | 6.6 (11.6) | 3.3 (9.0) |
| Total restaurants, countᵇ | 2.8 (4.5) | 10.3 (17.5) | 6.0 (11.4) | 6.7 (13.0) | 12.0 (25.6) | 11.8 (22.3) | 6.9 (15.1) |
| Total food stores, countᵇ | 5.4 (4.3) | 11.2 (8.4) | 7.0 (6.2) | 6.5 (5.9) | 8.2 (8.7) | 7.6 (7.6) | 7.4 (6.9) |

ᵃ We calculated total PA intensity scores (exercise units) using a summary of the frequency and intensity of participants’ moderate and vigorous activities.

ᵇ Values represent the mean for all CARDIA participants per year and thus do not equal 100% for total restaurants or total food stores.

We failed to reject the null hypothesis of the test of over-identifying restrictions (P = 0.667), and rejected the null hypothesis of the Durbin-Wu-Hausman test (P = 0.001). Taken together, these results suggest our model was appropriately specified. The F statistic value for each endogenous variable was greater than 10 (Table 2), suggesting that our combined IVs strongly identified endogenous variables.30

Table 2.

Goodness-of-fit statistics for evaluating strength of identification of endogenous variables with body mass index: CARDIA baseline and Years 7–25 (1985/86–2010/11)

| Endogenous variable | F statistic | P-valueᵃ |
|---|---|---|
| Convenience stores, % per total food stores | 106.0 | < 0.001 |
| Grocery stores, % per total food stores | 87.9 | < 0.001 |
| Fast-food restaurants, % per total restaurants | 78.9 | < 0.001 |
| Total food stores, count | 272.3 | < 0.001 |
| Total restaurants, count | 183.9 | < 0.001 |
| Marital status (yes, no) | 12.6 | < 0.001 |
| Number of children | 17.3 | < 0.001 |
| Alcohol intake (yes, no) | 29.1 | < 0.001 |
| Physical activity (exercise units) | 19.7 | < 0.001 |

ᵃ Rejecting the F test indicates that our set of instruments provides good identification for that endogenous variable.

Estimates of retail food environment exposures in relation to BMI were approximately 10–20 times smaller in magnitude using non-causal (versus causal) models (Table 3). For example, a 10% increase in the percentage of grocery stores (relative to supermarkets) was associated with a 0.50 kg/m² (95% CI: 0.10, 1.00; P = 0.026) increase in BMI over time using IV regression (assuming a linear relationship). On the other hand, a 10% increase in the percentage of grocery stores was associated with a negligible decrease in BMI using RE regression (β = −0.03; 95% CI: −0.10, −0.001; P = 0.037) and FE regression (β = −0.03; 95% CI: −0.10, −0.002; P = 0.031).
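The figures in this paragraph follow from scaling the per-unit coefficients in Table 3 by 10, under the stated linearity assumption. A worked version for the grocery store coefficients, using only values reported in Table 3:

```latex
\begin{align*}
\text{IV:}\quad & 10 \times 0.05 = 0.50\ \mathrm{kg/m^2},
  \qquad 10 \times (0.01,\ 0.10) = (0.10,\ 1.00) \\
\text{RE and FE:}\quad & 10 \times (-0.003) = -0.03\ \mathrm{kg/m^2} \\
\text{Ratio of magnitudes:}\quad & \frac{0.05}{0.003} \approx 16.7
\end{align*}
```

The ratio of roughly 17 for the grocery store coefficient sits within the 10–20-fold range quoted for the RE and FE comparisons.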

Table 3.

Beta coefficients (95% confidence intervals)ᵃ for the associations between each type of food store or restaurantᵇ and body mass index,ᶜ using instrumental-variables, ordinary least squares, random effects and fixed effects regression: CARDIA baseline and Years 7–25 (1985/86–2010/11)

| Exposure | IV regressionᵈ β (95% CI) | P-value | OLS regressionᵉ β (95% CI) | P-value | RE regressionᶠ β (95% CI) | P-value | FE regressionᵍ β (95% CI) | P-value |
|---|---|---|---|---|---|---|---|---|
| N (person-observations) | 12 174 | | 12 174 | | 12 174 | | 12 174 | |
| Full-service restaurants, % per total restaurantsʰ | 0.00 | | 0.00 | | 0.00 | | 0.00 | |
| Fast-food restaurants, % per total restaurants | −0.01 (−0.06, 0.04) | 0.700 | −0.001 (−0.005, 0.003) | 0.710 | −0.001 (−0.003, 0.001) | 0.333 | −0.001 (−0.003, 0.001) | 0.391 |
| Supermarkets, % per total food storesʰ | 0.00 | | 0.00 | | 0.00 | | 0.00 | |
| Convenience stores, % per total food stores | 0.02 (−0.03, 0.06) | 0.457 | −0.002 (−0.01, 0.01) | 0.854 | −0.003 (−0.01, 0.0003) | 0.117 | −0.002 (−0.01, 0.001) | 0.123 |
| Grocery stores, % per total food stores | 0.05 (0.01, 0.10)* | 0.026 | −0.001 (−0.01, 0.01) | 0.752 | −0.003 (−0.01, −0.0001) | 0.037 | −0.003 (−0.01, −0.0002) | 0.031 |
| Total restaurants, count | 0.02 (−0.02, 0.06) | 0.350 | −0.02 (−0.03, −0.004)* | 0.010 | −0.01 (−0.02, −0.004)* | <0.001 | −0.01 (−0.01, −0.002)* | 0.008 |
| Total food stores, count | 0.01 (−0.02, 0.05) | 0.414 | 0.001 (−0.01, 0.02) | 0.481 | 0.003 (−0.01, 0.02) | 0.721 | −0.10 (−0.29, 0.08) | 0.280 |

ᵃ Multivariable-adjusted models were adjusted for individual-level age, gender, race, educational attainment, income, children, marital status, examination year and market-level food prices.

ᵇ Calculated within a 1-km network buffer of participants’ residences.

ᶜ Body mass index, mean (SD): 27.3 (6.9) kg/m².

ᵈ Instrumental-variables regression using Stata’s -ivregress- command with the ‘gmm’ option.

ᵉ Ordinary least squares regression using Stata’s -reg- command with robust variance.

ᶠ Repeated measures random effects regression using Stata’s -xtreg- command with the ‘re’ option.

ᵍ Repeated measures fixed effects regression using Stata’s -xtreg- command with the ‘fe’ option.

ʰ Omitted from the model (referent).

* Indicates the estimate is statistically significant at the P < 0.05 level.

The percentages of convenience stores (relative to supermarkets) and fast-food restaurants (relative to full-service restaurants) were not associated with BMI in any model, but the magnitude of coefficients was also larger using IV regression.

Sensitivity analyses

The magnitude and direction of estimates derived from models with WC were similar to those obtained in BMI analyses (Supplementary Table 2, available as Supplementary data at IJE online). Goodness-of-fit statistics (Supplementary Table 3, available as Supplementary data at IJE online) and empirical tests of overidentifying restrictions (P = 0.646) and endogeneity (P = 0.001) were also similar to BMI analyses.

Discussion

With clinic-based, anthropometric measures and detailed neighbourhood environment data, we used IV regression to estimate causal effects of the retail food environment on BMI over time. We also compared the magnitude and direction of causal IV estimates with non-causal models, including OLS and RE regression, which do not account for residual confounding, and with FE regression, which only corrects for unmeasured time-invariant characteristics. Controlling for unmeasured characteristics with causal models in neighbourhood environment studies is important because omitted variables (e.g. unmeasured preferences) may bias relationships between environmental variables and health outcomes.

Although selection bias usually biases OLS estimates upwards,35 we found that longitudinal associations between food outlets and BMI were attenuated using non-causal (versus causal) models. The smaller magnitude of non-causal model estimates also suggests that the error terms corresponding to retail food environment exposures and BMI were negatively correlated, possibly due to a mismatch between unmeasured preferences and environment (e.g. individuals with a preference for locating near supermarkets might locate in areas with few supermarkets for reasons unrelated to the retail food environment). Furthermore, the observed differences between FE and IV regression suggest that bias may be time-varying, such as unmeasured preferences for larger residences over time.35 Overall, our findings are consistent with previous studies showing that using IV regression resulted in stronger associations between environment variables and health outcomes than did OLS regression.18–20 Given the rich empirical literature comparing causal and non-causal methodologies across several other disciplines,36,37 we argue that our findings would consistently apply to future studies and are not a unique feature of the CARDIA study.

Our causal model results suggest that the percentage of grocery stores (relative to supermarkets) was positively, albeit weakly, associated with BMI over time. Others suggest that grocery stores have a lower ratio of healthy to unhealthy shelf space than do supermarkets (which are larger and have higher sales38).39 Therefore, it is hypothetically possible that decreasing the number of smaller grocery stores while simultaneously increasing the number of supermarkets, possibly via changes to zoning ordinances,2 may contribute to reducing population-level BMI (though we acknowledge that such efforts are not trivial). On the other hand, natural intervention studies suggest that modifying the retail food environment may not meaningfully reduce obesity, whereas price interventions to improve healthy eating have been more successful.40 Although we posit that changes to BMI would operate through changes in food consumption, in an earlier study we did not find an association between the availability of grocery stores and diet outcomes (unpublished); however, it is possible that a shorter follow-up period and a smaller sample size undermined our ability to detect statistically significant associations.

Although our instruments strongly identified the endogenous variables, we acknowledge that there are many challenges with causal models, including availability of longitudinal data, lack of temporal variation in retail food environment exposures and difficulties in identifying valid and robust IVs. The latter can be partially addressed with full-information IV regression, which is preferable in the presence of weak instruments.41 We also acknowledge that the retail food environment is only one risk factor for weight gain, and additional risk factors should be considered in future research, including factors related to food availability and prices in school, professional and recreational environments (i.e. not retail food outlets). We also lacked data related to zoning ordinances and land use policies, which may restrict the placement of healthy food outlets (e.g. supermarkets) in neighbourhoods,42 especially in low-income areas,43 though we controlled for neighbourhood income. Although we observed missing values and classification errors in D&B, we used a prediction model and matched business names to mitigate inaccuracies at baseline (Appendix 2).

Our findings suggest that residing in a neighbourhood with a greater availability of grocery stores (relative to supermarkets) associates with higher BMI over time, after accounting for residual confounding. Our observation of attenuated estimates from non-causal (versus causal) models suggests that the more widely used non-causal models may underestimate associations between environmental exposures and health outcomes. Thus, it is important to recognize that null or weak findings in the predominantly non-causal literature may have been subject to residual confounding.

Supplementary Data

Supplementary data are available at IJE online.

Funding

This work was supported by the National Heart, Lung, and Blood Institute [grant numbers R01HL104580 and R01HL114091]. The Coronary Artery Risk Development in Young Adults study (CARDIA) is supported by contracts from the National Heart, Lung, and Blood Institute, the Intramural Research Program of the National Institute on Aging and an intra-agency agreement between National Institute on Aging and National Heart, Lung, and Blood Institute [grant numbers HHSN268201300025C, HHSN268201300026C, HHSN268201300027C, HHSN268201300028C, HHSN268201300029C, HHSN268200900041C and AG0005]. The authors are grateful to: the Carolina Population Center, University of North Carolina at Chapel Hill, for general support from the Eunice Kennedy Shriver National Institute of Child Health and Human Development [grant number P2C HD050924]; the Nutrition Obesity Research Center, University of North Carolina, for support from the National Institute for Diabetes and Digestive and Kidney Diseases [grant number P30DK56350]; and to the Center for Environmental Health Sciences, University of North Carolina, for support from the National Institute for Environmental Health Sciences [grant number P30ES010126].

Key Messages

  • Findings in the observational retail food environment and obesity literature are inconsistent, potentially due to a lack of adjustment for residual confounding.

  • We sought to assess the presence and extent of bias from residual confounding by comparing estimates derived from causal models (instrumental-variables regression) with estimates derived from non-causal methods, including ordinary least squares and random effects regression, which do not account for residual confounding at all; and fixed effects regression, which only corrects for time-invariant residual confounding.

  • Overall, estimates derived from non-causal models were attenuated relative to a causal modelling strategy, which suggests that non-causal models may underestimate the effect of the neighbourhood retail food environment on weight status.

  • Using causal model strategies in future studies is important for informing efforts to modify neighbourhood retail food environments to improve health outcomes.

Acknowledgments

The authors would like to acknowledge CARDIA Chief Reviewer Janne Boone-Heinonen, PhD, whose thoughtful suggestions improved the paper, as well as Marc Peterson, of the University of North Carolina, Carolina Population Center (CPC) and the CPC Spatial Analysis Unit, for creation of the environmental variables. P.E.R. had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of data analysis.

Conflict of interest: None declared. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S. Department of Health and Human Services.

Articles from International Journal of Epidemiology are provided here courtesy of Oxford University Press
