Abstract
Background
Neighborhood circumstances have an influence on multiple health outcomes, but the association between neighborhood conditions and lung cancer incidence has not been studied in sufficient detail. The goal of this study was to understand whether neighborhood conditions are independently associated with lung cancer incidence in ever-smokers after adjusting for individual smoking exposure and other risk factors.
Methods
A cohort of ever-smokers aged ≥ 55 years was assembled from 19 years of electronic health record data from our academic community health-care system. Patient demographic characteristics and other measures known to be associated with lung cancer were ascertained. Patient addresses at their index visit were geocoded to the census block group level to determine the area deprivation index (ADI), drawn from 5-year estimates from the American Community Survey. A multivariate Cox proportional hazards model was fit to assess the association between ADI and time to lung cancer diagnosis. Tests of statistical significance were two-sided.
Results
The study included 19,867 male subjects and 21,748 female subjects. Fifty-three percent of the patients were white, 38% were black, and 5% were Hispanic. Of these, 1,149 developed lung cancer. After adjusting for known risk factors, patients residing in the most disadvantaged areas had a significantly increased incidence of lung cancer compared with those in the least disadvantaged areas (hazard ratio, 1.29; 95% CI 1.07-1.55).
Conclusions
Census-derived estimates of neighborhood conditions have a powerful association with lung cancer incidence, even when adjusting for individual variables. Further research investigating the mechanisms that link neighborhood conditions to lung cancer is warranted.
Key Words: area deprivation, lung cancer, neighborhood deprivation, socioeconomic position
Abbreviations: ADI, area deprivation index; EHR, electronic health record; SEP, socioeconomic position
Lung cancer is the most common malignancy worldwide and the leading cause of cancer-related death in the United States.1 Racial and ethnic disparities in lung cancer risk are well established and not fully explained by differences in smoking exposure.2,3 There is growing evidence supporting an association between area-based measures of socioeconomic position (SEP) and lung cancer incidence in cohorts from a number of European countries,4, 5, 6, 7, 8, 9 Canada,10,11 and the United States.12,13 These findings implicate underlying social determinants of health as potential mediators of racial and ethnic disparities in lung cancer risk. However, the conclusions of most of the aforementioned studies are limited by the absence of adjustment for smoking exposure, which is the single largest risk factor for developing lung cancer.14
Only two studies assessing the association of lung cancer incidence with area-based SEP have been conducted in the United States to date.12,13 The first was a study by Hastert et al13 that revealed progressively increasing incidence of lung cancer with worsening area-based deprivation but did not account for smoking exposure. Sanderson et al12 conducted a nested case-control study that found a trend between lung cancer incidence and area-based SEP in current and former recent term smokers after adjusting for categorical representations of smoking exposure. However, the authors acknowledged that their analysis was unable to rule out residual confounding due to smoking as a result of their reliance on categorical representations of smoking exposure.
We therefore sought to determine the extent to which an established neighborhood-level SEP measure, the area deprivation index (ADI), accounts for lung cancer incidence in ever-smokers after adjusting for a more accurate representation of smoking exposure, along with other well-established predictors, in a large, racially and socioeconomically diverse cohort of ever-smokers at a community health-care system.
Materials and Methods
Study Design
This study was a retrospective chart review using 19 years of electronic health record (EHR) data (1999-2018) from our urban safety net health-care system that includes a large tertiary care hospital and 27 community health centers. Since 1999, the system has served > 1.5 million individual patients, with approximately 1.2 million outpatient encounters per year. From 1999 to 2018, providers cared for > 150,000 patients with at least two different in-person encounters after the age of 55 years. Demographically, the population consists of approximately 59% of patients identifying as white, 29% as black, and 5.1% as Hispanic. The median follow-up time for these patients is 2.9 years, with a mean follow-up time of 4.4 years and a range of 0.0028 to 19.9 years. For this analysis, we included ever-smokers who had adequate smoking history documentation, had at least two clinical encounters a minimum of 1 day apart after the age of 55 years, and had no previous diagnosis of lung cancer in our EHR. Patients whose address of residence could not be geocoded to a census block group were excluded.
Measures
Patient demographic characteristics, social histories, medical histories, diagnosis codes, and family histories relevant to the study were ascertained from the EHR. The index encounter was defined as the first visit after the age of 55 years when elements of the smoking history activity, inclusive of status, intensity, and duration, were documented. We retrieved the age, sex, race/ethnicity, BMI, smoking status, smoking intensity, smoking duration, quit time, insurance type, diagnosis of COPD, personal diagnosis of cancer, and the number of first-degree relatives with lung cancer of each patient as documented at their index encounter. Race and ethnicity were included as a single variable that was categorized as “white,” “black,” “Hispanic,” or “other.” BMI values < 14 kg/m² and > 60 kg/m² were assumed to be erroneous and were recoded as 14 kg/m² and 60 kg/m², respectively. Smoking status was categorized as “former” or “current.” Smoking intensity values greater than five packs per day were considered improbable and potentially erroneous and were recoded as five packs per day. Smoking intensity was subjected to a square root transformation to reflect its nonlinear relationship with cancer risk.15 Insurance was stratified into four categories: “commercial,” “Medicaid,” “Medicare,” and “self-pay.” Historical encounter and billing-associated diagnostic codes, as classified using the International Classification of Diseases, Tenth Revision, were used to determine history of cancer (C00-D49, excluding C44.11*) and COPD (J41-J44).16 Structured data from the family history activity in the EHR were reviewed to ascertain the presence of a family history of lung cancer; the absence of documentation to that effect was considered to indicate a negative family history of lung cancer.
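The recoding rules described above for BMI and smoking intensity can be illustrated with a minimal Python sketch. The study analyses were performed in STATA; the function names `clean_bmi` and `transform_intensity` are hypothetical and only mirror the thresholds stated in the text (14 and 60 kg/m² for BMI, five packs per day for intensity).

```python
import math

# Bounds taken from the text; the names are illustrative assumptions.
BMI_MIN, BMI_MAX = 14.0, 60.0   # kg/m^2
MAX_PACKS_PER_DAY = 5.0


def clean_bmi(bmi: float) -> float:
    """Recode implausible BMI values to the nearest stated bound."""
    return min(max(bmi, BMI_MIN), BMI_MAX)


def transform_intensity(packs_per_day: float) -> float:
    """Cap intensity at five packs per day, then take the square root."""
    capped = min(packs_per_day, MAX_PACKS_PER_DAY)
    return math.sqrt(capped)


if __name__ == "__main__":
    for bmi in (12.0, 29.7, 72.5):
        print("BMI", bmi, "->", clean_bmi(bmi))
    for ppd in (0.25, 1.0, 8.0):
        print("packs/day", ppd, "->", round(transform_intensity(ppd), 3))
```

The cap is applied before the square root so that an implausible entry such as eight packs per day contributes the same transformed value as five packs per day.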
Primary Exposure Variable
The composite area-based deprivation index leveraged in the current study is a construct that approximates the living conditions and SEP of any given neighborhood.17 When it was first developed, it used 17 indicators, drawn from census tract-level data, including population aged ≥ 25 years with < 9 years of education, population aged ≥ 25 years with at least a high school diploma, employed persons aged ≥ 16 years in white collar occupations, median family income, income disparity, median home value, median gross rent, median monthly mortgage, owner-occupied housing units, civilian labor force population aged ≥ 16 years unemployed, percentage of families below the poverty level, percentage of the population < 150% of the poverty threshold, percentage of single-parent households with children aged < 18 years, percentage of households without a motor vehicle, percentage of households without a telephone, percentage of occupied housing units without complete plumbing, and percentage of households with more than one person per room.18,19 We implemented the same methods but only applied 15 of the aforementioned indicators. Specifically, we removed the number of households without a telephone and the number of occupied housing units without complete plumbing, as they had little variation in Northeast Ohio.
Each patient’s address at his or her index encounter was mapped and geocoded to 2010 US Census block groups using ArcGIS (version 10.6.1). The ADI at the block group level was calculated for each patient by using the R “Sociome” package.20 Following other published approaches to estimating ADI, Sociome creates a composite index derived from the observed factor weights of each of the census indicators.17,18 The values were drawn from the American Community Survey 5-year estimates, 2013 edition, using the state of Ohio as the reference population. Previous research has not established the stability of the ADI over time, but in general, US Census-based estimates are stable at the neighborhood level; we conducted a test of this by examining the bivariate association between the 2009 and 2017 ADI for Ohio (e-Fig 1) and found that the two ADI estimates were nearly co-linear (r = 0.942; 95% CI, 0.937-0.946). In light of our earlier research that illustrated a nonlinear association between ADI and cardiovascular outcomes in the patient population from our study,21 we ranked subjects according to the ADI percentile distribution of their census block groups and categorized the variable as follows: the least deprived 50% of census blocks were grouped into the “lowest 1%-50%,” followed by the “51%-75%” group, the “76%-90%” group, and the top 10% of the most deprived neighborhoods were grouped as the “highest 91%-100%.”
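The four-level categorization described above can be sketched in Python as follows. This is an illustration only: the ADI itself was computed with the R Sociome package, and the helper names `percentile_ranks` and `adi_stratum`, the percentile definition (share of block groups with an ADI at or below the given value), and the handling of values falling exactly on a cut point are assumptions rather than details reported in the study.

```python
def percentile_ranks(adi_values):
    """Percentile rank (0-100] of each ADI value among all supplied values.

    Assumed definition: percentage of block groups with ADI <= this value.
    """
    n = len(adi_values)
    ranks = []
    for value in adi_values:
        at_or_below = sum(1 for other in adi_values if other <= value)
        ranks.append(100.0 * at_or_below / n)
    return ranks


def adi_stratum(percentile: float) -> str:
    """Map an ADI percentile rank (higher = more deprived) to a study stratum."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    if percentile <= 50:
        return "lowest 1%-50%"
    if percentile <= 75:
        return "51%-75%"
    if percentile <= 90:
        return "76%-90%"
    return "highest 91%-100%"


if __name__ == "__main__":
    # Toy ADI values for ten hypothetical block groups (not study data).
    toy_adi = [70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
    for adi, rank in zip(toy_adi, percentile_ranks(toy_adi)):
        print(adi, rank, adi_stratum(rank))
```

Unequal stratum widths (50%, 25%, 15%, and 10% of block groups) concentrate resolution at the deprived end of the distribution, which is where a nonlinear association would be expected to show.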
Outcome Variable
Lung cancer incidence, as ascertained by using encounter-level billing diagnoses, was the primary outcome analyzed in the current study. For a sensitivity analysis, diagnoses of lung cancer were reviewed in the EHR and adjudicated to be both new and accurate based on either tissue sampling, treatment, or, at the least, documentation in pulmonary tumor board or specialist notes indicating a high index of suspicion. Chart review was completed by two of the authors (D. J. K. and then Y. T.) for another independent analysis that preceded this study. Of the 1,149 cancer diagnoses in this study, only 64 (5.6%) were excluded for the sensitivity analysis. Reasons for exclusion are noted in e-Appendix 1.
Statistical Approach
Only patients with complete data were included in the analysis. Descriptive statistics were computed to examine population characteristics. A Wilcoxon test for trend across ordered groups22 was used to determine the relationship between ADI strata (lowest 1%-50%, 51%-75%, 76%-90%, and highest 91%-100%) and all categorical predictor variables. The k-sample equality-of-medians test was used to compare medians of continuous predictor variables among the four ADI groups, given their nonnormal distribution as suggested by the Shapiro-Wilk test of normality.
Unadjusted and adjusted Cox proportional hazards models were used to assess the relation between ADI and lung cancer incidence. Patients who did not develop lung cancer were right-censored at the date of their last in-person visit. The proportional hazards assumption for both models was examined by using both statistical and graphical approaches (e-Appendix 2, e-Table 1, e-Figs 2-4). Overall model discrimination performance was evaluated with Harrell’s concordance (C) statistic.23
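The censoring rule and the concordance statistic described above can be sketched in Python. This is not the authors' implementation (the analyses were run in STATA). The function names `follow_up` and `harrell_c`, the 365.25-day year, and the decision to skip pairs with tied follow-up times are simplifying assumptions made for illustration.

```python
from datetime import date
from itertools import combinations


def follow_up(index_date, last_visit, diagnosis_date=None):
    """Return (years of follow-up, event flag) for one patient.

    Patients without a lung cancer diagnosis are right-censored at
    their last in-person visit (event flag 0).
    """
    end = diagnosis_date if diagnosis_date is not None else last_visit
    years = (end - index_date).days / 365.25
    return years, int(diagnosis_date is not None)


def harrell_c(times, events, risk_scores):
    """Harrell's concordance statistic for right-censored data.

    A pair is comparable when the patient with the shorter follow-up
    had the event. It is concordant when that patient also has the
    higher predicted risk; tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter follow-up is censored: not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    print(follow_up(date(2000, 1, 1), date(2010, 1, 1)))
    print(harrell_c([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1]))
```

A value of 0.5 corresponds to discrimination no better than chance and 1.0 to perfect ranking of patients by time to diagnosis.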
All statistical analyses were performed by using STATA 15.0 (StataCorp). All significance tests were two-sided. Our study was approved by the MetroHealth System institutional review board (IRB17-00495).
Results
There was a total of 73,591 ever-smokers aged > 55 years with at least two clinical encounters in our system; characteristics are presented in e-Table 2. A total of 43,360 unique patients met inclusion criteria for the study. Of these, 1,745 (4.0%) were excluded due to missing BMI values. Fifty patients had a BMI < 14 kg/m2 and 382 had a BMI > 60 kg/m2. No other variable had missing values. Accordingly, 41,615 patients were included in the analysis. Patient characteristics of the final study cohort, as stratified according to ADI, are shown in Table 1. All of the continuous risk factors analyzed were statistically significantly different among the four groups of the (locally derived) area deprivation measure. Similarly, all of the categorical risk factors examined had a statistically significant trend across the ADI strata. A total of 1,149 individuals in this cohort developed lung cancer. Median study time, stratified according to ADI, is reported in e-Table 3.
Table 1.
Variable | Overall | Area deprivation,ᵃ lowest 1%-50% (median ADI, 85.9; n = 11,162) | Area deprivation, 51%-75% (median ADI, 105.1; n = 9,996) | Area deprivation, 76%-90% (median ADI, 124.3; n = 10,388) | Area deprivation, highest 91%-100% (median ADI, 141.4; n = 10,069) | P Value |
---|---|---|---|---|---|---|
Smoking status | ||||||
Former | 24,288 (58.4) | 5,267 (47.2) | 5,765 (57.7) | 6,555 (63.1) | 6,701 (66.5) | < .001 |
Current | 17,327 (41.6) | 5,895 (52.8) | 4,231 (42.3) | 3,833 (36.9) | 3,368 (33.5) | |
Smoking intensity, packs per day | 0.94 ± 0.65 | 1.00 ± 0.65 | 0.95 ± 0.64 | 0.91 ± 0.64 | 0.89 ± 0.66 | < .001 |
Smoking duration, y | 29.58 ± 13.55 | 28.36 ± 13.96 | 29.59 ± 13.51 | 30.02 ± 13.51 | 30.48 ± 13.09 | < .001 |
Quit time, y | 3.83 ± 9.10 | 5.72 ± 11.23 | 3.78 ± 9.02 | 3.06 ± 7.89 | 2.56 ± 7.18 | < .001 |
Age, y | 61.45 ± 7.55 | 62.89 ± 8.38 | 61.45 ± 7.57 | 60.79 ± 7.03 | 60.53 ± 6.82 | < .001 |
BMI, kg/m² | 29.69 ± 7.69 | 29.34 ± 7.21 | 29.86 ± 7.63 | 29.71 ± 7.84 | 29.89 ± 8.07 | < .001 |
Sex | ||||||
Female | 21,748 (52.3) | 5,571 (49.9) | 5,339 (53.4) | 5,502 (53.0) | 5,336 (53.0) | < .001 |
Male | 19,867 (47.7) | 5,591 (50.1) | 4,657 (46.6) | 4,886 (47.0) | 4,733 (47.0) | |
Race/ethnicity | ||||||
White | 22,151 (53.2) | 9,024 (80.9) | 6,262 (62.7) | 3,926 (37.8) | 2,939 (29.2) | < .001 |
Black | 15,696 (37.7) | 1,309 (11.7) | 2,945 (29.5) | 5,337 (51.4) | 6,105 (60.6) | |
Hispanicᵇ | 1,955 (4.7) | 175 (1.6) | 389 (3.9) | 729 (7.0) | 662 (6.6) | |
Other | 1,813 (4.4) | 654 (5.9) | 400 (4.0) | 396 (3.8) | 363 (3.6) | |
Insurance | ||||||
Commercial | 10,208 (24.5) | 4,279 (38.3) | 2,639 (26.4) | 1,939 (18.7) | 1,351 (13.4) | < .001 |
Medicaid | 9,592 (23.1) | 1,340 (12.0) | 1,991 (19.9) | 2,851 (27.5) | 3,410 (33.9) | |
Medicare | 11,554 (27.8) | 3,094 (27.7) | 2,823 (28.2) | 2,878 (27.7) | 2,759 (27.4) | |
Self-pay | 10,261 (24.7) | 2,449 (21.9) | 2,543 (25.4) | 2,720 (26.2) | 2,549 (25.3) | |
Family history of lung cancerᶜ | ||||||
0 | 39,357 (94.6) | 10,550 (94.5) | 9,400 (94.0) | 9,840 (94.7) | 9,567 (95.0) | .04 |
1 | 2,107 (5.1) | 574 (5.1) | 564 (5.6) | 509 (4.9) | 460 (4.6) | |
≥ 2 | 151 (0.4) | 38 (0.3) | 32 (0.3) | 39 (0.4) | 42 (0.4) | |
Diagnosis of COPDᵈ | ||||||
No | 34,108 (82.0) | 9,668 (86.6) | 8,183 (81.9) | 8,245 (79.4) | 8,012 (79.6) | < .001 |
Yes | 7,507 (18.0) | 1,494 (13.4) | 1,813 (18.1) | 2,143 (20.6) | 2,057 (20.4) | |
Personal history of cancerᵈ | ||||||
No | 30,129 (72.4) | 8,001 (71.7) | 7,170 (71.7) | 7,598 (73.1) | 7,360 (73.1) | .004 |
Yes | 11,486 (27.6) | 3,161 (28.3) | 2,826 (28.3) | 2,790 (26.9) | 2,709 (26.9) | |
For all continuous variables, the mean values ± SD are listed, and the k-sample equality-of-medians test was used to ascertain the P values. For all categorical variables, the No. (%) are listed, and a Wilcoxon test for trend was used to ascertain the P values. ADI = area deprivation index.
ᵃ ADI was categorized into percentiles of census block group disadvantage; the lowest group is the most affluent 50% of census block groups, and the highest group is the most disadvantaged 10% of census block groups.
ᵇ Hispanic ethnicity was combined with race as one variable due to the negligibly small number of individuals who identify as non-white Hispanic.
ᶜ Family history of lung cancer was modeled on an ordinal scale: 0, 1, and ≥ 2.
ᵈ Diagnosis of COPD (yes vs no) and personal history of cancer (yes vs no).
In total, our cohort resided in 2,166 census block groups within 45 counties in Ohio. The minimum ADI value was 7.45, and the maximum value was 174.5. After assigning a rank according to the ADI percentile distribution of the census block groups within the study cohort, the total number of patients grouped in the least deprived 50% of census block groups was 11,162, with a median ADI of 85.9. The 51%-75% stratum had a total of 9,996 patients with a median ADI of 105.1. The 76%-90% stratum had a total of 10,388 patients with a median ADI of 124.3. Finally, a total of 10,069 patients lived in the most deprived 10% of neighborhoods, with a median ADI of 141.4. Over time, 34,197 patients remained in the same ADI stratum as defined by our study, whereas 4,312 moved to an address in a more deprived neighborhood and 3,106 moved to a less deprived one.
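As a simple consistency check, the stratum sizes and the residential-change counts reported above should each sum to the analytic cohort of 41,615 patients. The short Python sketch below performs that check and converts counts to percentages; the variable and function names are illustrative, and the counts are copied from the text.

```python
# Counts as reported in the Results; names are illustrative only.
COHORT_SIZE = 41615

STRATUM_SIZES = {
    "lowest 1%-50%": 11162,
    "51%-75%": 9996,
    "76%-90%": 10388,
    "highest 91%-100%": 10069,
}

RESIDENTIAL_CHANGE = {
    "same stratum": 34197,
    "moved to more deprived": 4312,
    "moved to less deprived": 3106,
}


def shares(counts):
    """Convert a dict of counts to percentages of their total."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    assert sum(STRATUM_SIZES.values()) == COHORT_SIZE
    assert sum(RESIDENTIAL_CHANGE.values()) == COHORT_SIZE
    for label, pct in shares(RESIDENTIAL_CHANGE).items():
        print(f"{label}: {pct:.1f}%")
```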
In the unadjusted Cox proportional hazards regression analysis, area deprivation was significantly associated with lung cancer (Table 2). The corresponding cumulative incidence curves are shown in Figure 1. Compared with patients residing in the least deprived 50% of census block groups, individuals residing in the 51%-75% percentile group of deprivation had a 19% higher hazard of lung cancer incidence, although the CI included 1 (95% CI, 0.99-1.43). Subjects living in the 76%-90% percentile group had a 32% higher hazard (95% CI, 1.11-1.56), and those living in the top 10% of the most deprived neighborhoods had a 37% higher hazard (95% CI, 1.16-1.62). Harrell’s C statistic for this unadjusted model was 0.53.
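The way these hazard ratios translate into percentage changes, and the way a 95% CI indicates statistical significance at the two-sided 5% level, can be sketched in Python. The helper `summarize_hr` is hypothetical. The standard error calculation assumes a Wald-type interval that is symmetric on the log scale, which is conventional for Cox models but is not stated explicitly in the text.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def summarize_hr(hr, ci_low, ci_high):
    """Summarize a hazard ratio and its 95% CI.

    Returns the percentage change in hazard relative to the reference
    group, whether the CI excludes 1, and the standard error of log(HR)
    implied by the CI (assuming a Wald interval on the log scale).
    """
    return {
        "percent_change": (hr - 1.0) * 100.0,
        "ci_excludes_one": ci_low > 1.0 or ci_high < 1.0,
        "se_log_hr": (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95),
    }


if __name__ == "__main__":
    # Unadjusted estimates reported in the text, relative to the lowest stratum.
    reported = {
        "51%-75%": (1.19, 0.99, 1.43),
        "76%-90%": (1.32, 1.11, 1.56),
        "highest 91%-100%": (1.37, 1.16, 1.62),
    }
    for stratum, (hr, low, high) in reported.items():
        print(stratum, summarize_hr(hr, low, high))
```

Because the interval for the 51%-75% stratum spans 1, that contrast alone does not reach statistical significance, whereas the two more deprived strata do.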
Table 2.
Variable | Unadjusted Hazard Ratio (95% CI) | Adjusted Hazard Ratio (95% CI) |
---|---|---|
Area deprivationᵃ | | |
Lowest 1%-50% (median ADI, 85.9; n = 11,162) | Reference | Reference |
51%-75% (median ADI, 105.1; n = 9,996) | 1.19 (0.99-1.43) | 1.11 (0.92-1.33) |
76%-90% (median ADI, 124.3; n = 10,388) | 1.32 (1.11-1.56) | 1.23 (1.02-1.47) |
Highest 91%-100% (median ADI, 141.4; n = 10,069) | 1.37 (1.16-1.62) | 1.29 (1.07-1.55) |
Smoking status | … | |
Former | | Reference |
Current | | 1.44 (1.24-1.68) |
Smoking intensity, packs per dayᵇ | … | 2.02 (1.67-2.45) |
Smoking duration, per 10-year increase | … | 1.22 (1.16-1.29) |
Quit time, per 1-year increase | … | 0.98 (0.97-0.99) |
Age, per 10-year increase | … | 1.40 (1.26-1.54) |
BMI, per 5 kg/m² increase | … | 0.85 (0.81-0.89) |
Sex | … | |
Female | | Reference |
Male | | 1.01 (0.89-1.14) |
Race/ethnicity | … | |
White | | Reference |
Black | | 1.02 (0.89-1.18) |
Hispanicᶜ | | 0.53 (0.37-0.77) |
Other | | 0.46 (0.30-0.70) |
Insurance | … | |
Commercial | | Reference |
Medicaid | | 0.95 (0.78-1.17) |
Medicare | | 1.16 (0.97-1.40) |
Self-pay | | 1.13 (0.94-1.35) |
Family history of lung cancerᵈ | … | 1.23 (1.00-1.52) |
Diagnosis of COPDᵉ | … | 1.97 (1.73-2.24) |
Personal history of cancerᵉ | … | 1.43 (1.26-1.62) |
See Table 1 legend for expansion of abbreviation.
ᵃ ADI was categorized into percentiles of census block group disadvantage: the lowest group is the most affluent 50% of census block groups, and the highest group is the most disadvantaged 10% of census block groups.
ᵇ Smoking intensity was transformed to its square root.
ᶜ Hispanic ethnicity was combined with race as one variable due to the negligibly small number of individuals who identify as non-white Hispanic.
ᵈ Family history of lung cancer was modeled on an ordinal scale: 0, 1, and ≥ 2.
ᵉ Diagnosis of COPD (yes vs no) and personal history of cancer (yes vs no).
The association between area deprivation and lung cancer incidence was only slightly attenuated in the multivariate model compared with the unadjusted model (Table 2). Smoking status, smoking intensity, smoking duration, quit time, age, BMI, race/ethnicity, diagnosis of COPD, and personal history of cancer were statistically significantly associated with lung cancer incidence. There were no statistically significant interactions between area deprivation and sex, race/ethnicity, and insurance (P values > .05). The full multivariate model had a C statistic of 0.74.
For the sensitivity analysis, we replicated the primary analysis excluding the 64 patients for whom the lung cancer diagnosis was less certain. The magnitude and direction of hazard ratios were not meaningfully different in this supplemental model. In addition, we conducted a post hoc analysis of the impact of geographic clustering by running an additional mixed effects exponential proportional hazards regression clustered on census tracts (n = 980). The results of our analysis were grossly unchanged (not shown).
Discussion
Although previous research has implicated variations in smoking intake and carcinogen clearance as possible mediators of racial and ethnic lung cancer disparities, differences in lung cancer incidence are not sufficiently explained by self-reported or biomarker-measured smoking intensity alone.3 This suggests the presence of additional risk factors or mediators of lung cancer risk. Evidence that area-based measures of SEP are associated with lung cancer incidence, even when individual SEP is accounted for, provides a potential explanation for racial and ethnic disparities in risk beyond smoking exposure.6,13 Whether this association represents confounding with unmeasured or poorly measured risk factors, however, has been contested.10 Previous studies that did not adjust for smoking exposure specifically have attributed the association of lung cancer incidence with area-based deprivation to known increases in smoking exposure in more deprived areas.10,24 Alternately, the two studies to date that did quantitate and control for smoking history did not fully adjust for all measures of smoking exposure.10,12 Namely, Nkosi et al10 leveraged cumulative pack-year equivalents in addition to categorical representations of time since cessation to quantitate exposure, whereas Sanderson et al12 used broad categorical representations of smoking status. Continuous measures of smoking intensity, duration, and time since cessation are independently required to better capture the impact of smoking exposure as it pertains to lung cancer risk.15,25 As a result, our study should be less prone to potential residual confounding effects that might have affected previous research.
The current study used a composite area-based socioeconomic index to analyze the extent to which area deprivation is associated with lung cancer incidence, while adjusting for smoking exposure and other well-established individual-level predictors of lung cancer. Specifically, we aimed to better model the impact of smoking exposure by mirroring the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial Model (PLCOM2012) produced by Tammemägi et al15 for the prediction of lung cancer incidence. Tammemägi et al modeled smoking exposure risk more completely by accounting for smoking intensity and duration independently. We found that even after controlling for these and other variables, individuals residing in disadvantaged neighborhoods had a significantly higher risk of lung cancer incidence. To our knowledge, the current study is the first to show such a strong association between lung cancer incidence and neighborhood deprivation while also accounting for smoking exposure in the United States.
Our results build on and extend those of Sanderson et al,12 who conducted the only other US-based study to ascertain the impact of area-based SEP measures on lung cancer incidence with some adjustment for smoking history. Whereas our study was based on a regional retrospective cohort, the analysis by Sanderson et al relied on a nested case-control design from a multistate cohort that consisted mostly of low-income subjects. As acknowledged by Sanderson et al in their own discussion, the incomplete adjustment for smoking exposure and the homogeneity of their cohort in area-based SEP might explain why they could not show a conclusive association between the most deprived neighborhoods and lung cancer incidence in their analysis.
The association between area deprivation and lung cancer incidence we uncovered could still represent confounding by other unmeasured variables that differ across area-based SEP. Possible factors include poorer dietary choices, lower levels of physical activity, and greater alcohol consumption in more socioeconomically deprived neighborhoods.26 A greater likelihood of exposure to home and work-related environmental carcinogens is a consideration as well.11 Differences in second-hand smoke exposure could also explain a higher risk for lung cancer in more deprived neighborhoods, although the additive effect of second-hand smoke exposure among ever-smokers is likely to be marginal.27 Although we did not adjust for environmental or occupational exposures, other studies that did so found mostly no significant contribution when multivariate models were used,10, 11, 12 or only a relative attenuation in the resulting association of area-level SEP to lung cancer incidence.13 Finally, SEP-associated biologic stress, or “allostatic load,” is an alluring potential mechanism for the development of lung cancer, but this hypothesis has not been clinically validated to date.28,29 Further studies combining biomarkers and social and neighborhood indicators are certainly warranted.
Our study had several strengths. In leveraging a retrospective cohort design, we were able to potentially limit selection biases that may hinder the conclusions of the aforementioned case-control studies.30 We also had a relatively large sample size of 41,615 patients, of whom 1,149 individuals developed lung cancer. Our sample size and incidence rate are comparable to those of previous studies that validated lung cancer risk prediction models. The Bach model developed from the Carotene and Retinol Efficacy Trial used 36,286 individuals, of whom 1,070 developed cancer.31 In comparison, the PLCOM2012 and Lung Cancer Risk Assessment Tool (LCRAT) models were developed from the control arm of the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial, which had a total of 36,286 individuals, of whom 630 developed lung cancer.15,32
The largest limitations in our approach relate to our reliance on secondary analysis of EHR data. EHR data fields were not primarily designed for research and are often incomplete or inaccurate as a result.33 The family history of lung cancer, for instance, is a field that defaults to a negative value. Accordingly, the absence of an affirmative response in this field does not confirm a negative family history of lung cancer. Because we were unable to include spirometry in our analysis, our reliance on an EHR-based diagnosis of COPD can be equally problematic. The presence of a COPD diagnosis does not rule out the real possibility of an overdiagnosis, and its absence could still represent a missed diagnosis. Because completeness of EHR data in general is biased toward sicker patients,34 the documentation of a positive family history of lung cancer or a diagnosis of COPD is potentially more likely to be elicited in patients who are considered by their providers to be at risk for lung cancer. Our results are therefore less likely to be applicable to individuals who are relatively healthy and/or averse to health care. Beyond data completeness, smoking exposure documentation in the EHR is frequently inaccurate and often underestimates true exposure.35
Although we have previously shown that measures of smoking intensity and duration vary over time in our own EHR system,36 the impact of this is likely attenuated by the finding that the duration of smoking exposure, the largest contributor to lung cancer risk,25 was the more stable of the two in our EHR environment. In addition, although we controlled for all known confounders retrievable from the EHR, we were unable to control for individual SEP, radon, and air pollution exposures.37,38 Lastly, the generalizability of our findings is limited, given our reliance on patients from a single academic safety net health-care system.
Our results reveal a concerning gradient of SEP-based disparity in the incidence of lung cancer in a diverse, regional US cohort. This trend is likely driven by interrelated and complex social determinants that were not uncovered in our retrospective study. A more complete accounting and assessment of individual and area-based factors that might reflect or account for these disparities is warranted in future analyses. Our study would also be strengthened by expanding the analysis to include never-smokers and patients from other health-care systems to provide better representation from all strata of neighborhood SEP in our region. On a broader level, place should be considered in the future development of lung cancer risk models, alongside race, ethnicity, and education. Our work is consistent with findings by LaVeist et al,39 who suggest that efforts focused narrowly on health behavior changes and biological differences between groups without focusing on neighborhood conditions will have only limited success in addressing health disparities. Similarly, neighborhood SEP should be considered in the development and targeting of health promotion and screening programs to individuals residing in deprived neighborhoods, although the magnitude of the benefit of the latter is less clear given the underrepresentation of minorities in lung cancer screening studies.40
Conclusions
We report that among a regional cohort of 41,615 ever-smokers aged > 55 years, patients residing in the most disadvantaged areas had a significantly increased incidence of lung cancer diagnoses compared with those in the least disadvantaged areas (hazard ratio, 1.29; 95% CI, 1.07-1.55), even following adjustment for race and multiple other individual-level factors. To our knowledge, ours is the first study in the United States to report an association of this strength after adjusting for smoking exposure, and it differs from other studies in its more nuanced approach to modeling and adjusting for smoking exposure-associated lung cancer risk. Further study is indicated to more precisely elucidate the mechanisms by which area-based deprivation adversely affects individual health as it pertains to lung cancer risk.
Acknowledgments
Author contributions: All the authors listed have made substantial contributions to the conception and design of this study, reviewed and revised the manuscript, and provided final approval.
Financial/nonfinancial disclosures: None declared.
Role of sponsors: The sponsor had no role in the design of the study, the collection and analysis of the data, or the preparation of the manuscript.
Additional information: The e-Tables, e-Figures, and e-Appendix can be found in the Supplemental Materials section of the online article.
Footnotes
Part of this article has been presented in abstract form (Adie Y, Tlimat A, Kats D, Perzynski A, Tarabichi Y. Neighborhood Disadvantage and Lung Cancer Incidence in an Electronic Health Record Cohort [abstract]. In: Annual Meeting of the American Thoracic Society; May 17-22, 2019; Dallas, TX. Am J Respir Crit Care Med. 2019;199:A7287).
FUNDING/SUPPORT: Research reported in this publication was partially supported by The National Institute on Aging of the National Institutes of Health under award number R01AG055480. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
References
1. Siegel R.L., Miller K.D., Jemal A. Cancer statistics, 2018. CA Cancer J Clin. 2018;68:7–30. doi: 10.3322/caac.21442.
2. Ward E., Jemal A., Cokkinides V. Cancer disparities by race/ethnicity and socioeconomic status. CA Cancer J Clin. 2004;54:78–93. doi: 10.3322/canjclin.54.2.78.
3. Stram D.O., Park S.L., Haiman C.A. Racial/ethnic differences in lung cancer incidence in the Multiethnic Cohort Study: an update. J Natl Cancer Inst. 2019;111(8):811–819. doi: 10.1093/jnci/djy206.
4. Sharpe K.H., McMahon A.D., McClements P., Watling C., Brewster D.H., Conway D.I. Socioeconomic inequalities in incidence of lung and upper aero-digestive tract cancer by age, tumour subtype and sex: a population-based study in Scotland (2000-2007). Cancer Epidemiol. 2012;36:e164–e170. doi: 10.1016/j.canep.2012.01.007.
5. Riaz S.P., Horton M., Kang J., Mak V., Luchtenborg M., Moller H. Lung cancer incidence and survival in England: an analysis by socioeconomic deprivation and urbanization. J Thorac Oncol. 2011;6:2005–2010. doi: 10.1097/JTO.0b013e31822b02db.
6. Meijer M., Bloomfield K., Engholm G. Neighbourhoods matter too: the association between neighbourhood socioeconomic position, population density and breast, prostate and lung cancer incidence in Denmark between 2004 and 2008. J Epidemiol Community Health. 2013;67:6–13. doi: 10.1136/jech-2011-200192.
7. Li X., Sundquist J., Zoller B., Sundquist K. Neighborhood deprivation and lung cancer incidence and mortality: a multilevel analysis from Sweden. J Thorac Oncol. 2015;10:256–263. doi: 10.1097/JTO.0000000000000417.
8. Hoebel J., Kroll L.E., Fiebig J. Socioeconomic inequalities in total and site-specific cancer incidence in Germany: a population-based registry study. Front Oncol. 2018;8:402. doi: 10.3389/fonc.2018.00402.
9. Bryere J., Dejardin O., Launay L. Socioeconomic status and site-specific cancer incidence, a Bayesian approach in a French Cancer Registries Network study. Eur J Cancer Prev. 2018;27:391–398. doi: 10.1097/CEJ.0000000000000326.
10. Nkosi T.M., Parent M.E., Siemiatycki J., Rousseau M.C. Socioeconomic position and lung cancer risk: how important is the modeling of smoking? Epidemiology. 2012;23:377–385. doi: 10.1097/EDE.0b013e31824d0548.
11. Hystad P., Carpiano R.M., Demers P.A., Johnson K.C., Brauer M. Neighbourhood socioeconomic status and individual lung cancer risk: evaluating long-term exposure measures and mediating mechanisms. Soc Sci Med. 2013;97:95–103. doi: 10.1016/j.socscimed.2013.08.005.
12. Sanderson M., Aldrich M.C., Levine R.S., Kilbourne B., Cai Q., Blot W.J. Neighbourhood deprivation and lung cancer risk: a nested case-control study in the USA. BMJ Open. 2018;8 doi: 10.1136/bmjopen-2017-021059.
13. Hastert T.A., Beresford S.A., Sheppard L., White E. Disparities in cancer incidence and mortality by area-level socioeconomic status: a multilevel analysis. J Epidemiol Community Health. 2015;69:168–176. doi: 10.1136/jech-2014-204417.
14. Alberg A.J., Samet J.M. Epidemiology of lung cancer. Chest. 2003;123(suppl 1):21S–49S. doi: 10.1378/chest.123.1_suppl.21s.
15. Tammemägi M.C., Katki H.A., Hocking W.G. Selection criteria for lung-cancer screening. N Engl J Med. 2013;368:728–736. doi: 10.1056/NEJMoa1211776.
16. ICD-10: international statistical classification of diseases and related health problems: tenth revision, 2nd ed. http://www.who.int/iris/handle/10665/42980 Accessed November 27, 2018.
17. Singh G.K. Area deprivation and widening inequalities in US mortality, 1969-1998. Am J Public Health. 2003;93:1137–1143. doi: 10.2105/ajph.93.7.1137.
18. Kind A.J., Jencks S., Brock J. Neighborhood socioeconomic disadvantage and 30-day rehospitalization: a retrospective cohort study. Ann Intern Med. 2014;161:765–774. doi: 10.7326/M13-2946.
19. Singh G.K., Miller B.A., Hankey B.F. Changing area socioeconomic patterns in US cancer mortality, 1950-1998: Part II—lung and colorectal cancers. J Natl Cancer Inst. 2002;94:916–925. doi: 10.1093/jnci/94.12.916.
20. Sociome: Operationalizing Social Determinants of Health Data for Researchers. Version: 0.3.4. https://cran.r-project.org/web/packages/sociome/index.html
21. Dalton J.E., Perzynski A.T., Zidar D.A. Accuracy of cardiovascular risk prediction varies by neighborhood socioeconomic position: a retrospective cohort study. Ann Intern Med. 2017;167:456–464. doi: 10.7326/M16-2543.
22. Cuzick J. A Wilcoxon-type test for trend. Stat Med. 1985;4:87–90. doi: 10.1002/sim.4780040112.
23. Newson R.B. Comparing the predictive powers of survival models using Harrell's C or Somers' D. Stata J. 2010;10:339–358.
24. Cohen S.S., Sonderman J.S., Mumma M.T., Signorello L.B., Blot W.J. Individual and neighborhood-level socioeconomic characteristics in relation to smoking prevalence among black and white adults in the Southeastern United States: a cross-sectional study. BMC Public Health. 2011;11:877. doi: 10.1186/1471-2458-11-877.
25. Peto J. That the effects of smoking should be measured in pack-years: misconceptions 4. Br J Cancer. 2012;107:406–407. doi: 10.1038/bjc.2012.97.
26. Hastert T.A., Ruterbusch J.J., Beresford S.A., Sheppard L., White E. Contribution of health behaviors to the association between area-level socioeconomic status and cancer mortality. Soc Sci Med. 2016;148:52–58. doi: 10.1016/j.socscimed.2015.11.023.
27. Edwards R., Hasselholdt C.P., Hargreaves K. Levels of second hand smoke in pubs and bars by deprivation and food-serving status: a cross-sectional study from North West England. BMC Public Health. 2006;6:42. doi: 10.1186/1471-2458-6-42.
28. Bird C.E., Seeman T., Escarce J.J. Neighbourhood socioeconomic status and biological 'wear and tear' in a nationally representative sample of US adults. J Epidemiol Community Health. 2010;64:860–865. doi: 10.1136/jech.2008.084814.
29. Reiche E.M., Nunes S.O., Morimoto H.K. Stress, depression, the immune system, and cancer. Lancet Oncol. 2004;5:617–625. doi: 10.1016/S1470-2045(04)01597-9.
30. Aigner A., Grittner U., Becher H. Bias due to differential participation in case-control studies and review of available approaches for adjustment. PLoS One. 2018;13 doi: 10.1371/journal.pone.0191327.
31. Bach P.B., Kattan M.W., Thornquist M.D. Variations in lung cancer risk among smokers. J Natl Cancer Inst. 2003;95:470–478. doi: 10.1093/jnci/95.6.470.
32. Katki H.A., Kovalchik S.A., Berg C.D., Cheung L.C., Chaturvedi A.K. Development and validation of risk models to select ever-smokers for CT lung cancer screening. JAMA. 2016;315:2300–2311. doi: 10.1001/jama.2016.6255.
33. Hersh W.R., Weiner M.G., Embi P.J. Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care. 2013;51:S30–S37. doi: 10.1097/MLR.0b013e31829b1dbd.
34. Rusanov A., Weiskopf N.G., Wang S., Weng C. Hidden in plain sight: bias towards sick patients when sampling patients with sufficient electronic health record data for research. BMC Med Inform Decis Mak. 2014;14:51. doi: 10.1186/1472-6947-14-51.
35. Modin H.E., Fathi J.T., Gilbert C.R. Pack-year cigarette smoking history for determination of lung cancer screening eligibility: comparison of the electronic medical record versus a shared decision making conversation. Ann Am Thorac Soc. 2017;14(8):1320–1325. doi: 10.1513/AnnalsATS.201612-984OC.
36. Tarabichi Y., Kats D.J., Kaelber D.C., Thornton J.D. The impact of fluctuations in pack-year smoking history in the electronic health record on lung cancer screening practices. Chest. 2018;153(2):575–578. doi: 10.1016/j.chest.2017.10.040.
37. Darby S., Hill D., Auvinen A. Radon in homes and risk of lung cancer: collaborative analysis of individual data from 13 European case-control studies. BMJ. 2005;330:223. doi: 10.1136/bmj.38308.477650.63.
38. Raaschou-Nielsen O., Andersen Z.J., Beelen R. Air pollution and lung cancer incidence in 17 European cohorts: prospective analyses from the European Study of Cohorts for Air Pollution Effects (ESCAPE). Lancet Oncol. 2013;14:813–822. doi: 10.1016/S1470-2045(13)70279-1.
39. LaVeist T., Pollack K., Thorpe R., Jr., Fesahazion R., Gaskin D. Place, not race: disparities dissipate in southwest Baltimore when blacks and whites live under similar conditions. Health Aff (Millwood). 2011;30:1880–1887. doi: 10.1377/hlthaff.2011.0640.
40. Hestbech M.S., Siersma V., Dirksen A., Pedersen J.H., Brodersen J. Participation bias in a randomised trial of screening for lung cancer. Lung Cancer. 2011;73:325–331. doi: 10.1016/j.lungcan.2010.12.018.