Author manuscript; available in PMC: 2013 Jul 1.
Published in final edited form as: Risk Anal. 2012 Jul;32(Suppl 1):S179–S189. doi: 10.1111/j.1539-6924.2011.01734.x

COMPARING THE ADEQUACY OF CARCINOGENESIS MODELS IN ESTIMATING US POPULATION RATES FOR LUNG CANCER MORTALITY

Theodore R Holford 1, David Levy 2
PMCID: PMC3478769  NIHMSID: NIHMS408209  PMID: 22882888

Abstract

The relationship between smoking and lung cancer is well established and cohort studies provide estimates of risk for individual cohorts. While population trends are qualitatively consistent with smoking trends, the rates do not agree well with results from analytical studies. Four carcinogenesis models for the effect of smoking on lung cancer mortality were used to estimate lung cancer mortality rates for US males: two stage clonal expansion and multistage models using parameters estimated from two Cancer Prevention Studies (CPS I and CPS II). Calibration was essential to adjust for both shift and temporal trend. The age-period-cohort model was used for calibration. Overall, models using parameters derived from CPS I performed best, and the corresponding two stage clonal expansion model was best overall. However, temporal calibration did significantly improve agreement with the population rates, especially the effect of age and cohort.

Keywords: age-period-cohort model, carcinogenesis models, cigarette smoking, lung cancer mortality, population rates

1. INTRODUCTION

The relationship of smoking to lung cancer has been established by biological and epidemiological studies (1), and it has been strongly linked to dose, both in terms of cigarettes smoked per day and number of years smoked (2–7). These studies are primarily large cohort studies, such as the Cancer Prevention Study (CPS)-I and CPS-II.

In other papers in this supplement, the CPS-I and II and other longitudinal data (e.g., the Nurses Health Study) have been used to develop models to predict lung cancer deaths from smoking rates. All but one of these models used the two stage clonal expansion (TSCE) model to map smoking prevalence, duration and intensity on lung cancer death rates. Nevertheless, these models have had to be calibrated, in order that the estimates better fit the actual levels and trends in lung cancer deaths. To better understand smoking and its changing role in lung cancer, this paper considers how well four separate models of the relationship between smoking and lung cancer explain historical trends in lung cancer risks. We cannot be certain as to which model provides the correct quantitative description of the effect of smoking on lung cancer risk. However, these alternative models have been proposed to provide quantitative summaries of the effect of smoking on lung cancer risk. In this paper, we consider these alternatives in order to determine the extent to which results are affected by model choice.

Like other papers in this supplement, this study examines the role of smoking in explaining trends in lung cancer deaths using the Base Case data, a unique set of data developed through CISNET. The data contain information on smoking status, smoking intensity and smoking duration by cohort from 1975 through 2000. However, we consider a broad array of models applied to the CPS-I and CPS-II data. We confine the analysis to models using the CPS-I and II data, because these data are most widely used in developing estimates of relative smoking risks and smoking-attributable deaths (8–10).

We consider four carcinogenesis models: the two stage clonal expansion (TSCE) model as applied to CPS-I and CPS-II data (7) and the multi-stage cancer model applied by Flanders et al. (3) to CPS-II and by Knoke et al. (5) to CPS-I. An age-period-cohort (APC) calibration model is then applied to examine remaining biases not captured by each smoking model, and the results of the four models are compared. Finally, the number of lung cancer deaths under actual tobacco control (ATC) that occurred in the US is compared to that under two alternative scenarios: 1) no tobacco control (NTC), which could have resulted from ignoring scientific evidence on health effects of cigarette smoking and continuing behavior that existed earlier, and 2) complete tobacco control (CTC), resulting in cessation of all smoking following publication of the Surgeon General’s Report (11) in 1964.

2. METHODS

The model developed here for the impact of exposure to cigarette smoking on lung cancer makes use of (a) distribution of smoking history summaries, (b) quantitative description of the functional relationship between smoking history and the mortality rate based on a carcinogenesis model, R, and (c) equations that consider deviation of predictions using equations in (b) from observed population rates. Deviations between carcinogenesis model rates and population rates for lung cancer mortality are calibrated using a multiplicative factor that may be an estimated constant, or a function that depends on time, yielding an estimated population rate, R* = θR, where θ represents the calibration factor.

The models are applied to 1975–2000 mortality rates for US males by single year of age and calendar period, starting at 30 and continuing through 84.

2.1. Smoking models

To model the effect of smoking on lung cancer mortality, the population was divided into never, current and former smokers with prevalences $p_{NS}$, $p_{S}$ and $p_{FS}$ and rates $R_{NS}$, $R_{S}$ and $R_{FS}$, respectively, yielding an overall rate

$$R = p_{NS}R_{NS} + p_{S}R_{S} + p_{FS}R_{FS}. \tag{1}$$

Among never smokers, the mortality rate, $R_{NS}(\mathrm{AGE})$, reflects only the effect of the aging process on lung cancer risk. Mortality among current smokers depends not only on age, but also on dose, measured by the number of cigarettes smoked per day (CPD), and on smoking duration (DUR).

For former smokers, mortality not only depends on age, dose and duration, but also on duration of quitting (QT_DUR),

$$R_{FS}(\mathrm{AGE}, \mathrm{CPD}, \mathrm{DUR}, \mathrm{QT\_AGE}, \mathrm{QT\_DUR})$$

The categories of duration of quitting were 1–2, 3–5, 6–10, 11–15 and 16+ years. In duration category j, mean dose (CPDj), mean age of initiation ( AGE-DURj), mean age quit (AGE-QT_AGE j) and proportion of former smokers in category (qj) were determined, yielding the mortality rate among former smokers

$$R_{FS} = \sum_{j} q_{j}\, R_{FS,j}(\mathrm{AGE}, \mathrm{CPD}_{j}, \mathrm{DUR}_{j}, \mathrm{QT\_DUR}_{j}) \tag{2}$$
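As a minimal sketch of how equations (1) and (2) combine, the snippet below mixes category-specific rates into an overall population rate. The function names and all numeric inputs are hypothetical and chosen only for illustration; they are not values from the Base Case data.

```python
def former_smoker_rate(weights, rates):
    """Equation (2): mean former-smoker rate, weighted by the proportion
    q_j of former smokers in each quit-duration category j."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("quit-duration proportions q_j must sum to 1")
    return sum(q * r for q, r in zip(weights, rates))


def population_rate(p_ns, p_s, p_fs, r_ns, r_s, r_fs):
    """Equation (1): overall rate as a prevalence-weighted mixture of the
    never, current and former smoker rates."""
    if abs(p_ns + p_s + p_fs - 1.0) > 1e-9:
        raise ValueError("prevalences must sum to 1")
    return p_ns * r_ns + p_s * r_s + p_fs * r_fs


# Hypothetical inputs for one age-period cell (rates per person-year).
q = [0.10, 0.15, 0.20, 0.20, 0.35]          # categories 1-2, 3-5, 6-10, 11-15, 16+ years
r_fs_by_category = [300e-5, 250e-5, 180e-5, 120e-5, 60e-5]
r_fs = former_smoker_rate(q, r_fs_by_category)
r = population_rate(0.40, 0.30, 0.30, 15e-5, 320e-5, r_fs)
```

In practice each category rate would come from one of the carcinogenesis models evaluated at the category means, rather than from fixed numbers as here.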

Data on smoking prevalence, smoking intensity and smoking duration were originally obtained from the National Health Interview Survey (NHIS). Data were developed using smoothing methods by cohort and year, which were converted to age and year, as described in Chapter 2 (Burns et al. in this supplement).

We consider three sets of models (2–7), all of which relate lung cancer mortality risk to age, duration and intensity of smoking, employ either a multi-stage or TSCE carcinogenesis model, and employ either CPS-I or CPS-II cohort data.

Knoke et al (5) estimated separate equations for absolute lung cancer death risk of never smokers, smokers and ex-smokers using the white, male population of CPS-I. Lung cancer death rates were estimated as excess risk, where mean absolute risk for smokers and former smokers is added to never smokers.

Flanders et al. (3) used CPS-II data for all races to estimate separate lung cancer death rates for smokers by 10 year age groups. Because Flanders et al. (3) did not estimate equations for ex-smokers, we apply the ex-smoker equation from Knoke et al. (5) to smoker death rates from Flanders to obtain declining death rates of ex-smokers by years quit. Since death rates for never smokers have been found to be the same for CPS-I and CPS-II (12), we used the Knoke et al. (5) model for never smokers.

Hazelton et al. (7) separately applied the two-stage clonal expansion (TSCE) model to CPS-I and CPS-II, thus providing a common model applied to both data sets. They fitted a stochastic model derived as a function of the rates of initiation, cell division, apoptosis of initiated cells, and malignant conversion of initiated cells, each a function of smoking intensity and duration. Data were not available for former smokers, but the model provides estimates of risk for that group.

2.2. Model evaluation

We consider how well each model predicts lung cancer mortality over time. Estimated parameters in each model use data from CPS-I or CPS-II, which differ from the US population and thus may produce biased rate estimates. In addition, cohort studies have limited follow-up, while smoking risk may vary temporally due to changes in manufacturing or non-smoking causes of lung cancer, which can also bias rate estimates. We consider how potential biases may vary by age (AGE), period (PER) or birth cohort (COH).

Let the vector t = (AGE,PER,COH) represent temporal elements. A carcinogenesis model estimates the mortality rate using smoking exposure in the population, which varies by age and by period and cohort changes in smoking behavior. We calibrate the model using a multiplicative factor that depends on the temporal vector,

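$$R^{*} = \theta(\mathbf{t})\,R \tag{3}$$

The calibration function is a log-linear age-period-cohort (APC) model,

$$\theta(\mathrm{AGE},\mathrm{PER},\mathrm{COH}) = \exp\{\mu + \alpha_{\mathrm{AGE}} + \pi_{\mathrm{PER}} + \gamma_{\mathrm{COH}}\} \tag{4}$$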

The intercept, μ, scales model estimated rates to correspond overall with observed US rates. Temporal elements for age (αAGE ), period (πPER ) and cohort (γCOH) provide corresponding temporal calibration for the model to provide better agreement with observed rates. If temporal effects are all 0, modeled rates agree well with observed rates, and parallelism with the abscissa indicates adequacy of a model’s characterization of temporal trend during that time. Poor agreement may result from model inaccuracy or limitations in population exposure estimates.

We apply usual constraints to parameters in equation (4), i.e., ΣαAGE = ΣπPER = ΣγCOH = 0. Cohort is the difference between period and age, resulting in the identifiability problem (13–17). One can express estimable components that are easily interpreted by partitioning each temporal effect into overall slope or direction of the trend and curvature or deviation from linear trend (14, 18). To calibrate a smoking model, the model in equations (3) and (4) is fitted to the observed lung cancer rates yielding the calibration function θ(·). We assume the number of lung cancer deaths, Y, is Poisson and the denominator, D, is known. Maximum likelihood estimates of parameters are found by using the observed rate, Y/D, as the response and including a scale weight, D (19–21). This is accomplished using PROC GENMOD in SAS®, by using the link function

$$\eta = \log\theta = \log(\lambda^{*}/\lambda) = \mu + \alpha_{\mathrm{AGE}} + \pi_{\mathrm{PER}} + \gamma_{\mathrm{COH}}$$

where λ* is the expected population rate and λ the rate from the carcinogenesis model. An F-test based on a quasi-likelihood in which the variance is proportional to the mean was used to assess statistical significance. Calibrated rates for smoking exposure, Z, scale the model estimate, R, using θ̂(AGE,PER,COH)R.
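The identifiability problem mentioned above can be illustrated with a short sketch. Because cohort equals period minus age, adding a linear trend of slope k to the age and cohort effects while subtracting it from the period effects leaves every linear predictor unchanged (k·AGE − k·PER + k·(PER − AGE) = 0). The effect values and the helper names below are hypothetical.

```python
import math


def linear_predictor(mu, alpha, pi, gamma, age, per):
    """Log calibration factor for one age-period cell; cohort = period - age."""
    return mu + alpha[age] + pi[per] + gamma[per - age]


def tilt(alpha, pi, gamma, k):
    """Add a linear trend of slope k to age and cohort, and remove it from period."""
    return (
        {a: v + k * a for a, v in alpha.items()},
        {p: v - k * p for p, v in pi.items()},
        {c: v + k * c for c, v in gamma.items()},
    )


# Hypothetical effects on a tiny 3 x 3 age-period grid (illustrative values only).
ages = [30, 31, 32]
periods = [1975, 1976, 1977]
alpha = {30: -0.2, 31: 0.0, 32: 0.2}
pi = {1975: 0.1, 1976: 0.0, 1977: -0.1}
gamma = {c: 0.05 * (c - 1945) for c in range(1943, 1948)}

alpha2, pi2, gamma2 = tilt(alpha, pi, gamma, k=0.3)

# Both parameter sets give the same fitted log calibration factor in every cell.
same = all(
    math.isclose(
        linear_predictor(0.5, alpha, pi, gamma, a, p),
        linear_predictor(0.5, alpha2, pi2, gamma2, a, p),
        abs_tol=1e-9,
    )
    for a in ages
    for p in periods
)
```

Since the data cannot distinguish the two parameter sets, only quantities that are unaffected by such a tilt, such as curvature, are estimable without a further constraint.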

2.3. Model Evaluation

Carcinogenesis models are compared to historical rates by interpreting APC calibration parameters and by evaluating agreement between observed and fitted rates. The shift factor, μ, indicates systematic deviation from actual rates or bias. We further evaluate separate shifts by each temporal element alone or in combination with age to consider adequacy in estimating temporal bias in the prediction model. Finally, we apply calibration to obtain optimal estimates of rates for the population.
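For the simplest case, constant (shift) calibration with no temporal terms, the Poisson maximum likelihood estimate has a closed form: the calibration factor is the total observed deaths divided by the total deaths expected under the carcinogenesis model. The sketch below assumes that simplified setting; the function name and the inputs are hypothetical, and the full APC calibration described above needs an iterative fit instead.

```python
import math


def shift_calibration(deaths, person_years, model_rates):
    """Closed-form Poisson MLE for a constant calibration factor.

    Assumes deaths Y ~ Poisson(D * theta * R) in every cell, with D the
    person-years denominator and R the carcinogenesis model rate.
    Returns (mu, theta) with theta = exp(mu).
    """
    expected = sum(d * r for d, r in zip(person_years, model_rates))
    theta = sum(deaths) / expected
    return math.log(theta), theta


# Hypothetical two-cell example.
mu, theta = shift_calibration(
    deaths=[30, 50],
    person_years=[1000, 2000],
    model_rates=[0.01, 0.02],
)
```

A value of theta above 1 corresponds to a model that underestimates observed mortality overall.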

We compare four models: two multi-stage models and two TSCE carcinogenesis models which employ parameters derived from either CPS-I or CPS-II data. Predictability is gauged by R-square, i.e., the proportion of deviance explained by the model. Systematic bias was gauged by patterns in the APC effects, their statistical significance and graphical display of fitted age-adjusted and age-specific rates.
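The deviance-based R-square can be sketched as follows for Poisson counts. The choice of baseline is an assumption here (the text does not specify it); any reference fit, such as a single overall rate, can be passed in. Function names and data are hypothetical.

```python
import math


def poisson_deviance(observed, fitted):
    """Poisson deviance: 2 * sum[y * log(y / m) - (y - m)], with y*log(y/m) = 0 when y = 0."""
    total = 0.0
    for y, m in zip(observed, fitted):
        term = -(y - m)
        if y > 0:
            term += y * math.log(y / m)
        total += 2.0 * term
    return total


def proportion_deviance_explained(observed, fitted, baseline):
    """1 - D(model) / D(baseline), an R-square analogue for Poisson counts."""
    return 1.0 - poisson_deviance(observed, fitted) / poisson_deviance(observed, baseline)


# Hypothetical counts: the baseline assigns the mean count to every cell.
observed = [5.0, 10.0, 30.0]
baseline = [15.0, 15.0, 15.0]
fitted = [6.0, 11.0, 28.0]
r_square = proportion_deviance_explained(observed, fitted, baseline)
```

A perfect fit gives a proportion of 1, and a fit no better than the baseline gives 0.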

2.4. Lung Cancer Deaths under the No, Actual and Complete Tobacco Control Scenarios

For each of the 4 models (TSCE CPS-I and TSCE CPS-II, Knoke et al. and Flanders et al.) we compared predicted lung cancer deaths under the NTC, ATC and CTC scenarios. We consider uncalibrated and calibrated models. To calibrate the counterfactual NTC and CTC predictions we applied the best-fitting calibration equations from the ATC model (for the corresponding gender and CPS model) to the predicted lung cancer rates for NTC and CTC.

The number of deaths avoided as a result of tobacco control policies was calculated as the difference in deaths between the ATC and NTC scenarios. The number of deaths that could have been avoided if smoking was entirely eliminated in 1965 was calculated as the difference in deaths between the ATC and CTC scenarios.

3. RESULTS

Figure 1 shows age-adjusted rates predicted for each smoking model, along with observed rates. Predicted rates from models are consistently lower than observed rates, indicating that shift calibration is essential for estimates of similar magnitude to population rates or number of lung cancer deaths expected. Further, predictions from TSCE I are lower than TSCE II. The shapes of temporal trends differ considerably from each other and from the observed. TSCE I agrees best with the observed shape, but reaches a peak more than five years earlier than observed. These results indicate that not only shift calibration is required to achieve good agreement with observed rates, but temporal calibration, as well.

Figure 1.


Observed and estimated age-adjusted rates using alternative models for smoking effects.

Table I summarizes fit of the four models along with temporal APC elements. All models include shift calibration, which aligns average estimated and observed lung cancer rates for ages 30–84 during 1975–2000, and these account for over 94% of deviance, the best being TSCE I and Knoke models using the CPS-I, each accounting for over 98% of deviance.

Table I.

Summary of the percent of scaled deviance explained by the models and the corresponding percent explained by calibration for age, period and cohort with the corresponding significance tests.

| | df | TSCE I | TSCE II | Knoke | Flanders |
|---|---|---|---|---|---|
| Deviance explained by the carcinogenesis model (%) | | 98.62 | 96.53 | 98.13 | 94.34 |
| Remaining deviance from calibration (%): age, period, cohort | | 96.21 | 98.28 | 97.01 | 98.87 |
| Remaining deviance from calibration (%): period, cohort | | 74.30 | 59.68 | 93.52 | 59.45 |
| Remaining deviance from calibration (%): age, cohort | | 94.56 | 97.77 | 95.85 | 98.33 |
| Remaining deviance from calibration (%): age, period | | 83.92 | 91.68 | 83.29 | 90.01 |
| Age: deviance explained by calibration (%) | | 90.38 | 60.18 | 97.94 | 38.38 |
| Age: F* | 53 | 130.42 | 539.72 | 27.94 | 835.29 |
| Period: deviance explained by calibration (%) | | 61.47 | 68.11 | 58.67 | 48.98 |
| Period: F* | 24 | 19.14 | 15.84 | 20.53 | 25.34 |
| Cohort: deviance explained by calibration (%) | | 69.48 | 61.20 | 53.78 | 21.16 |
| Cohort: F* | 78 | 49.37 | 62.77 | 74.77 | 127.55 |

* Denominator degrees of freedom (df) for F are 1272, and all have P-values less than .0001.

To evaluate the extent to which temporal calibration improves fit, we show the percent of remaining variation (e.g., above and beyond the 98.6% variation explained by TSCE I or the 94.3% explained by Flanders). With those additional effects, 96–99% can be attributed to a combination of age, period and cohort effects. Age, period and cohort effects each provide extra explanatory power, as indicated by significant F-statistics. Among the three two-factor subset models, AC calibration does nearly as well as APC, accounting for 95–98% of unexplained deviance. This suggests that period calibration accounts for a relatively small fraction of unexplained deviance, almost all of which is attributable to age and cohort. Also, age correction plays a more important role in the two CPS-II models, i.e., TSCE II and Flanders.

Estimates of temporal calibration effects for age, period and cohort are shown in Figure 2, along with the corresponding effect from an age-period-cohort model that makes no allowance for smoking. The common vertical scale allows one to compare the relative contribution of each temporal effect on calibration. If all the effects are 0, then no temporal calibration is required. Clearly, calibration for period is tiny in comparison to age and cohort, showing an increasing effect through about 1990 and then decreasing slightly. In addition, both age and cohort accounted for some remaining temporal effects, but not all. TSCE I overall appears to do best for age 40 or older, but it overestimates rates for younger age groups where there are many fewer deaths. Knoke describes younger ages 40 to 50 better, but not the older ages. Neither TSCE II nor Flanders, the models based on CPS-II, does as well at describing age effects. The saw tooth pattern for Flanders results from the use of a step function instead of a smooth function for age. While all models account for some of the cohort effects, the TSCE I model does best, i.e., closest to the origin for all three temporal effects. Cohort effects increase through those born in 1930 and then decline. Smoking models pick up most of the birth cohort decline since the 1930 cohort when compared to the no smoking model, with TSCE I doing best especially in early cohorts. Interestingly, both the period and cohort effects are relatively consistent across models, showing increasing followed by decreasing effects over time.

Figure 2.


Estimated age, period and cohort effects for no smoking carcinogenesis model and alternative smoking models.

To assess overall ability of these calibrated models to describe population rates, we show age-adjusted rates in Figure 3. Figure 3(a) shows outstanding overall agreement between observed and estimated age-adjusted rates for APC calibration in all four models. Alternative temporal calibration for the TSCE I model is shown in Figure 3(b) which shows excellent agreement with observed rates whenever period calibration is included. The normal equations solved when fitting a model that includes period force observed and expected totals for each period to be equal, which likewise tends to result in good agreement with period trends, especially when the carcinogenesis model provides a good description of age. A similar result was observed for the Knoke model, but PC calibration did not do well for TSCE II and Flanders because age is not well described by these CPS-II based models (not shown). Results from AC calibration are similar to observed for the TSCE I model, due to the consistently small contribution of period calibration seen in Figure 2. However, calibration for period does have some effect in shifting location of the peak and changing the overall shape of the curve compared to observed.

Figure 3.


Observed and estimated age-adjusted lung cancer mortality rates using (a) APC, PC, AC and AP calibration with the TSCE I model; and, (b) APC calibration for each of the smoking models.

Next, we evaluated the success of the models with various calibrations in predicting age-specific rates. Figure 4 provides estimates in selected, representative five-year age groups (40–44, 60–64, 80–84) along with observed values. Figure 4(a) shows excellent agreement for the APC calibrated estimates, and the AC calibrated values in Figure 4(c) are nearly as good because of the small calibration provided by period. Results in Figure 4(b) do not provide additional calibration for age other than the contribution included in the model itself. TSCE I and Knoke do well in older ages because the model provides a good description of the effect. TSCE II and Flanders do not perform nearly as well. We also see limitations in the AP calibration depending on adequacy of the model, as assessed by the missing temporal parameter, C, i.e., TSCE I performs best, as implied by its relatively small cohort calibration. At young ages relative differences in rates are apparent when the log scale is employed (not shown), but these rates are very low, thus contributing little to the overall rates and number of lung cancer deaths.

Figure 4.


Observed and estimated age-specific lung cancer mortality rates by age groups (40–44, 60–64, and 80–84) using (a) APC, (b) PC, (c) AC and (d) AP calibration.

Finally, for each of the four models, we considered the number of lives saved (comparing ATC to NTC) and the potential number of lives that could have been saved (comparing ATC to CTC) under the different calibrations. As shown in Table II, the uncalibrated models yield very different results in terms of the number of deaths under actual tobacco control, with the TSCE models, especially the one based on CPS-II, indicating a substantially higher number of deaths. While the increase under no calibration yields similar proportional increases in the number of deaths with no tobacco control (NTC), the relative reduction in deaths differs substantially with complete tobacco control. The number of deaths is approximately halved with the TSCE models, but reduced more than 75% under the Knoke et al. and Flanders et al. models. This appears due to the relatively greater reduction in risk for ex-smokers in the latter two models. Nevertheless, the percent of deaths not lost (last column) remains between 30 and 40%.

Table II.

The Number of Lung Cancer Deaths under Actual, No and Complete Tobacco Control over the years 1975–2000 using the TSCE I, TSCE II, Knoke et al. and Flanders et al. Models

| Calibration | Model | Actual Tobacco Control | No Tobacco Control | Complete Tobacco Control | Lives Lost | Lives Not Lost | Percent Not Lost |
|---|---|---|---|---|---|---|---|
| APC | TSCE I | 2,067,775 | 2,461,572 | 1,124,994 | 1,336,578 | 393,797 | 29.5% |
| APC | TSCE II | 2,067,775 | 2,573,161 | 875,419 | 1,697,742 | 505,386 | 29.8% |
| APC | Knoke et al. | 2,067,775 | 2,890,181 | 491,193 | 2,398,987 | 822,406 | 34.3% |
| APC | Flanders et al. | 2,067,775 | 3,156,175 | 416,733 | 2,739,441 | 1,088,400 | 39.7% |
| PC | TSCE I | 2,067,775 | 2,460,750 | 1,127,233 | 1,333,516 | 392,975 | 29.5% |
| PC | TSCE II | 2,067,775 | 2,589,028 | 855,438 | 1,733,590 | 521,253 | 30.1% |
| PC | Knoke et al. | 2,067,775 | 2,888,962 | 492,176 | 2,396,785 | 821,187 | 34.3% |
| PC | Flanders et al. | 2,067,776 | 3,180,845 | 395,492 | 2,785,353 | 1,113,069 | 40.0% |
| Constant | TSCE I | 2,067,775 | 2,453,915 | 1,151,776 | 1,302,139 | 386,140 | 29.7% |
| Constant | TSCE II | 2,067,775 | 2,551,012 | 917,585 | 1,633,427 | 483,237 | 29.6% |
| Constant | Knoke et al. | 2,067,775 | 2,861,500 | 504,148 | 2,357,353 | 793,725 | 33.7% |
| Constant | Flanders et al. | 2,067,776 | 3,135,499 | 382,442 | 2,753,058 | 1,067,723 | 38.8% |
| None | TSCE I | 1,238,974 | 1,470,342 | 690,124 | 780,219 | 231,368 | 29.7% |
| None | TSCE II | 1,752,444 | 2,161,988 | 777,655 | 1,384,332 | 409,544 | 29.6% |
| None | Knoke et al. | 800,498 | 1,107,773 | 195,171 | 912,603 | 307,275 | 33.7% |
| None | Flanders et al. | 889,076 | 1,348,163 | 164,437 | 1,183,725 | 459,086 | 38.8% |
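The derived columns of Table II can be reproduced from the three scenario totals. The column definitions below are inferred from the printed values rather than stated in the text: "Lives Lost" matches NTC minus CTC, "Lives Not Lost" matches NTC minus ATC, and "Percent Not Lost" is their ratio. The function name is hypothetical.

```python
def tobacco_control_summary(atc, ntc, ctc):
    """Derive the last three Table II columns from scenario death totals.

    atc, ntc, ctc: lung cancer deaths under actual, no and complete
    tobacco control, respectively.
    """
    lives_lost = ntc - ctc        # deaths in excess of the complete-control scenario
    lives_not_lost = ntc - atc    # deaths avoided by the tobacco control that occurred
    percent_not_lost = 100.0 * lives_not_lost / lives_lost
    return lives_lost, lives_not_lost, percent_not_lost


# APC-calibrated rows for TSCE I and TSCE II, taken from Table II.
tsce1 = tobacco_control_summary(atc=2_067_775, ntc=2_461_572, ctc=1_124_994)
tsce2 = tobacco_control_summary(atc=2_067_775, ntc=2_573_161, ctc=875_419)
```

For some rows the printed differences are off by one death, which is consistent with rounding of non-integer expected counts before tabulation.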

4. DISCUSSION

Data from CPS-I and, even more often, CPS-II are used to calculate relative risks associated with smoking and smoking-attributable deaths, and are even applied to other countries. We consider four models: the TSCE I and Knoke models use CPS-I data, and the TSCE II and Flanders models use CPS-II data. Our results show that these models underestimate lung cancer mortality risk and bias temporal trends, even when adjusted for smoking characteristics, including duration and intensity.

CPS-I models predict the level of lung cancer deaths better than CPS-II models (as evidenced by comparing the TSCE I and II models), but both underestimate lung cancer death rates each year from 1975 through 2000. This underprediction may be due to understatement of the number of true smokers in US surveys; to problems in the CPS surveys, such as over-representation of the relatively low risk middle class, married, White, and more educated population; or to misclassification bias, due to former or current smokers classifying themselves as never smokers, exposure of true never smokers to second hand smoke, and undetected lung cancer. Lung cancer mortality rates may vary between populations (22), even after controlling for smoking histories (9). However, since between 80 and 90% of lung cancer deaths are generally attributed to smoking, our results would suggest that prior estimates of smoking attributable death using the CPS I or CPS II are severely biased downward.

In comparing models, those using CPS-I data, especially TSCE I, provide better estimates of trend, as evidenced by the better predictability when compared to the two CPS-II models with shift factors. However, all four models still contain detectable biases over time, especially for age and cohort effects. Relative departures from observed rates are especially apparent at younger ages, but these contribute little to overall numbers because risk is low.

For all these models, we find a similar intertemporal bias, indicating underprediction through about 1990 and overprediction subsequently. Cohort effects also indicate upward followed by downward bias. Since most lung cancer deaths are generally attributed to smoking and previous studies indicate that risks to never smokers have stayed relatively constant (12), these results might suggest that the risk from cigarettes has been increasing over time in early cohorts, but may have been decreasing in more recent years. In addition, the lack of clear temporal trend suggests changes in risks due to the composition of cigarettes are not dramatic. Further research is warranted on how risks to smokers and non-smokers have changed over time.

None of the models described here fully captures the temporal trends in lung cancer mortality. The APC calibration suggests that age is not well characterized, especially in young ages. While the rate model could be revised to improve this, the rates are very low in the early age groups so this would have little effect on estimates of the number of lung cancer deaths. The period calibration was small in comparison to cohort. The need to calibrate for cohort could be due to (a) limitations in the carcinogenesis model, (b) changes in the potency of cigarettes or (c) inaccurate estimates of exposure in the population. If manufacturing changes uniformly affected lethality of all cigarettes then one might expect to see an induced period effect. On the other hand, brand differences in temporal trend for lung cancer risk could result in an unexplained cohort effect should there be generational differences in brand preference or willingness to switch brands. Our results suggest the models are leaving more of the cohort effects unexplained than period effects but the reasons for this are uncertain. Existing data are not yet sufficient to identify with confidence the reasons for the discrepancy between population rates and the models for rates. Smoking and lung cancer data are currently being collected for additional years, which may help in identifying reasons for the discrepancies.

5. LIMITATIONS

We have focused on overall trends in lung cancer mortality rates, since these can be directly compared to available lung cancer mortality data showing that both shift and temporal calibration are required to bring them into conformity with observed rates. However, these data do not distinguish by smoking status. After calibration for age, period and cohort, all four models agree well with observed rates, but this by itself is not a good indication of a good model. Sizeable differences among these models can and do remain even though calibration provides equally good fit to population rates. The effects for smokers and ex-smokers on population rates are highly collinear, making it difficult to separate their effects using only population rates. These effects can only be determined from analytical studies, and the adequacy with which these apply to the US population is often difficult to judge. While the TSCE I and Knoke models exhibit very similar estimates of overall rates, the separate contributions by smoking category are quite different, e.g., Knoke model indicates that former smokers’ risk returns much more quickly to never smokers than TSCE I.

TSCE smoking models require individual exposure information, which we attempted to capture as best we could by assuming homogeneity of quantities like quit rates by intensity at different ages. However, homogeneity may not exist, e.g., there may be variation in quit rates by both duration and intensity among those of a particular age, which would require more detailed data on the joint distributions of duration and intensity than was available to us.

Further refinement of this implementation of these models could be accomplished by using a finer breakdown among the smoking categories by age at initiation, dose and age of cessation, and taking into account any correlation between age and these three characteristics. The results shown here used means for many of these characteristics, but the smoking history generator that was used to generate smoking histories could provide further detail on the underlying distribution. For age at initiation and mean dose within each quintile of exposure additional detail would probably have little effect. However, the effect could be somewhat larger for duration of cessation especially because the older ages with the highest rates would be most affected. Haldorsen and Grimsrud (23), for example, use a somewhat finer breakdown of longer duration among former smokers. Alternatively, one could include second order components to the distribution of these characteristics into the model, as was used by Holford et al (24). These refinements are not likely to have much effect on the comparisons among models because the same first order approximation was used in each case. One can also see the modest magnitude of the effect this might have by comparing the TSCE model which used an identical approximation in Chapter 12 and the corresponding micro-simulation model in Chapter 8 which would have fully captured detail on the distribution of these parameters.

The models that we apply do not distinguish racial-ethnic differences. Some previous studies indicate important differences in lung cancer rates and the role of smoking by race (2). The TSCE equations were estimated only for Whites and therefore may be biased when applied to other populations. We have also not examined lung cancer rates for females in this paper, but preliminary analysis indicates similar and perhaps greater biases.

We have considered three carcinogenesis models with parameters derived from two different cohorts selected from the US population. In Chapter 12 of this supplement, Holford et al. present results for the TSCE model using the Health Professionals Follow-up Study for males and the Nurses Health Study for females. Results from the model for males are similar to what we see in TSCE I and Knoke model results, i.e., weak agreement with age for younger individuals, excellent agreement for period and very good agreement with cohort trend. The APC calibrated results estimate 603,122 lives were not lost due to control that took place compared to 1,712,017 that might not have been lost under the ideal scenario, or 35.2%. These results are in the middle of the range of estimates shown in this Chapter, and the numbers of deaths tend to be closer to those resulting from the TSCE and Knoke models presented in Table II.

In sum, the availability of a unique set of data on smoking prevalence has enabled us to consider the effects of smoking intensity, duration and cessation over time more systematically than previous studies limited to longitudinal data over a limited number of years and a non-representative population. The analyses conducted in this paper provide a first step in considering the robustness of results from studies using the CPS-I and CPS-II data. In particular, we consider biases in predictions stemming from the TSCE, Knoke and Flanders models and how these biases may be changing over time. Our analysis suggests some important biases both with respect to level and trend in the effects of smoking related factors that may have changed over time.

Acknowledgments

This work was conducted in collaboration with the Cancer Intervention and Surveillance Network (CISNET) and we are grateful for their insights and assistance with obtaining data on population rates and smoking behavior over time. Funding was generously provided by National Cancer Institute grants CA97432 and CA97450.

APPENDIX

Knoke et al. (5) equations for the absolute lung cancer death risk of never smokers, smokers and ex-smokers were fitted to data for white males in CPS-I. Lung cancer death rates were estimated in terms of excess risk (ER): the mean absolute risk (R) for smokers (S) and former smokers (FS) is the risk for never smokers (NS) plus the corresponding excess risk, or:

$$
R_{S} = R_{NS} + ER_{S}, \qquad R_{FS} = R_{NS} + ER_{FS}
$$

Knoke et al. modeled the absolute risk of death due to lung cancer in nonsmokers using a two-parameter Poisson regression model on attained age, in years:

$$
R_{NS} = 9.21 \times 10^{-13}\,(\mathrm{AGE}^{2.38}).
$$

Excess risk of death due to lung cancer in continuing smokers was modeled using a Poisson regression model with a modified offset, an extension of the Doll and Peto model (25) based on the multistage theory of carcinogenesis:

$$
ER_{S} = 1.51 \times 10^{-13}\,(\mathrm{AGE}^{2.38})(\mathrm{CPD}^{0.867})(\mathrm{DUR}^{2.87}).
$$

For former smokers, the lung cancer death rate is a function of years since quitting (QT_YRS) and age at quitting (QT_AGE):

$$
R_{FS} = R_{NS} + f(\mathrm{QT\_YRS}, \mathrm{QT\_AGE}) \cdot ER_{S}.
$$

Here the excess smoker risk, ERS, still denotes the excess risk as if the individual had continued to smoke, at the same intensity and duration as when that individual quit. Having found no decline in risk for the first two years after quitting and no significant effect of CPD, Knoke et al. obtained

$$
f(\mathrm{QT\_YRS}, \mathrm{QT\_AGE}) = \exp\{-(0.274 - 0.00279 \times \mathrm{QT\_AGE})(\mathrm{QT\_YRS} - 2)\}.
$$
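
The Knoke equations above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the coefficients are copied from the equations as printed in this appendix, and the units of the resulting rates are not restated here. Two assumptions are made explicit: the quit factor is held at 1 for the first two years after quitting (one reading of "no decline in risk for the first two years"), and the value of DUR used for a former smoker is left to the caller, because the text does not spell out how it is fixed.

```python
import math


def never_smoker_rate(age):
    """R_NS: absolute lung cancer death rate for never smokers (age in years)."""
    return 9.21e-13 * age ** 2.38


def excess_risk_smoker(age, cpd, dur):
    """ER_S: excess rate for a continuing smoker.

    cpd = cigarettes per day, dur = years smoked.
    """
    return 1.51e-13 * age ** 2.38 * cpd ** 0.867 * dur ** 2.87


def quit_factor(qt_yrs, qt_age):
    """f(QT_YRS, QT_AGE): fraction of the excess risk remaining after quitting.

    Assumption: the factor stays at 1 during the first two years after quitting.
    """
    lag = max(qt_yrs - 2.0, 0.0)
    return math.exp(-(0.274 - 0.00279 * qt_age) * lag)


def current_smoker_rate(age, cpd, dur):
    """R_S = R_NS + ER_S."""
    return never_smoker_rate(age) + excess_risk_smoker(age, cpd, dur)


def former_smoker_rate(age, cpd, dur, qt_yrs, qt_age):
    """R_FS = R_NS + f(QT_YRS, QT_AGE) * ER_S.

    The caller chooses dur (the duration entering ER_S); that choice is an
    assumption of this sketch, not something fixed by the appendix.
    """
    er_s = excess_risk_smoker(age, cpd, dur)
    return never_smoker_rate(age) + quit_factor(qt_yrs, qt_age) * er_s


if __name__ == "__main__":
    # Hypothetical individual: age 60, 20 cigarettes/day, 30 years smoked,
    # quit 10 years ago at age 50.
    print(never_smoker_rate(60))
    print(current_smoker_rate(60, 20, 30))
    print(former_smoker_rate(60, 20, 30, qt_yrs=10, qt_age=50))
```

Keeping the quit factor as a separate function makes it easy to see that the former-smoker rate always lies between the never-smoker rate and the continuing-smoker rate whenever the factor is between 0 and 1.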

Flanders et al. (3) used CPS-II data for all races to estimate separate lung cancer death rate equations for male and female smokers by 10-year age groups using the multistage model of Doll and Peto:

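$$
\begin{aligned}
\text{ages 40–49:}\quad R &= e^{-17.9} \times \mathrm{DUR}^{1.9} \times \mathrm{CPD}^{0.95} \\
\text{ages 50–59:}\quad R &= e^{-17.4} \times \mathrm{DUR}^{2.6} \times \mathrm{CPD}^{0.52} \\
\text{ages 60–69:}\quad R &= e^{-15.7} \times \mathrm{DUR}^{2.4} \times \mathrm{CPD}^{0.37} \\
\text{ages 70–79:}\quad R &= e^{-13.0} \times \mathrm{DUR}^{1.8} \times \mathrm{CPD}^{0.39}.
\end{aligned}
$$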

The estimates for ages 40–49 were applied to ages 30–39, and the estimates for ages 70–79 were applied to ages 80–84.
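
The age-group lookup described above, including the extension to ages 30–39 and 80–84, might be coded as follows. This is a hedged sketch: the table and function names are hypothetical, the coefficients are transcribed from the equations in this appendix, and the duration exponent for ages 40–49 is taken to be 1.9, which is an assumption about the printed value. The appendix does not restate which sex these coefficients refer to, so the table is left unlabeled in that respect.

```python
import math

# Lower bound of each 10-year age group ->
# (log-scale intercept, DUR exponent, CPD exponent).
# The 1.9 exponent for ages 40-49 is an assumed reading of the printed value.
FLANDERS_PARAMS = {
    40: (-17.9, 1.9, 0.95),
    50: (-17.4, 2.6, 0.52),
    60: (-15.7, 2.4, 0.37),
    70: (-13.0, 1.8, 0.39),
}


def flanders_age_group(age):
    """Map an age in 30-84 to the age group whose coefficients are used."""
    if not 30 <= age <= 84:
        raise ValueError("coefficients are only applied to ages 30-84")
    if age < 40:
        return 40  # ages 30-39 borrow the 40-49 estimates
    if age >= 80:
        return 70  # ages 80-84 borrow the 70-79 estimates
    return int(age // 10) * 10


def flanders_rate(age, cpd, dur):
    """R = exp(b0) * DUR**b_dur * CPD**b_cpd for the smoker's age group."""
    b0, b_dur, b_cpd = FLANDERS_PARAMS[flanders_age_group(age)]
    return math.exp(b0) * dur ** b_dur * cpd ** b_cpd


if __name__ == "__main__":
    # Hypothetical smoker: age 65, 20 cigarettes/day for 45 years.
    print(flanders_rate(65, cpd=20, dur=45))
```

Because age enters only through the choice of coefficient set, the modeled rate can jump at the boundaries between age groups; the sketch reproduces that behaviour rather than smoothing it.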

References

  • 1. US Department of Health and Human Services. The health consequences of smoking: A report of the Surgeon General. Washington, DC: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health; 2004.
  • 2. Doll R, Peto R, Wheatley K, et al. Mortality in relation to smoking: 40 years’ observations on male British doctors. British Medical Journal. 1994;309(6959):901–11. doi: 10.1136/bmj.309.6959.901.
  • 3. Flanders WD, Lally CA, Zhu B-P, et al. Lung cancer mortality in relation to age, duration of smoking, and daily cigarette consumption: Results from Cancer Prevention Study II. Cancer Research. 2003;63:6556–62.
  • 4. Knoke JD, Shanks TG, Vaughn JW, et al. Lung cancer mortality is related to age in addition to duration and intensity of cigarette smoking: An analysis of CPS-I data. Cancer Epidemiology, Biomarkers and Prevention. 2004;13(6):949–57.
  • 5. Knoke JD, Burns DM, Thun MJ. The change in excess risk of lung cancer attributable to smoking following smoking cessation: An examination of different analytic approaches using CPS-I data. Cancer Causes and Control. 2008;19(2):207–19. doi: 10.1007/s10552-007-9086-5.
  • 6. Meza R, Hazelton WD, Colditz GA, et al. Analysis of lung cancer incidence in the Nurses’ Health and the Health Professionals’ Follow-up Studies using a multistage carcinogenesis model. Cancer Causes & Control. 2008;19(3):317–28. doi: 10.1007/s10552-007-9094-5.
  • 7. Hazelton WD, Clements MS, Moolgavkar SH. Multistage carcinogenesis and lung cancer mortality in three cohorts. Cancer Epidemiology, Biomarkers & Prevention. 2005;14(5):1171–81. doi: 10.1158/1055-9965.EPI-04-0756.
  • 8. Thun M, Heath CW. Changes in mortality from smoking in two American Cancer Society prospective studies since 1959. Preventive Medicine. 1997;26(4):422–6. doi: 10.1006/pmed.1997.0182.
  • 9. Thun MJ, Day-Lally C, Myers DG, et al. Trends in tobacco smoking and mortality from cigarette use in Cancer Prevention Studies I (1959 through 1965) and II (1982 through 1988). In: Shopland DR, Burns DM, Garfinkel L, et al., editors. Changes in cigarette-related disease risks and their implication for prevention and control, monograph 8. Bethesda, MD: U.S. Department of Health and Human Services, Public Health Service, National Institutes of Health, National Cancer Institute; 1997. pp. 305–82.
  • 10. US Department of Health and Human Services. The health consequences of smoking: A report of the Surgeon General. Washington: Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health;
  • 11. United States Surgeon General’s Advisory Committee on Smoking and Health. Smoking and health: Report of the advisory committee to the Surgeon General of the Public Health Service. Washington: U.S. Department of Health, Education, and Welfare, Public Health Service; U.S. Government Printing Office; 1964.
  • 12. Thun MJ, Henley SJ, Burns D, et al. Lung cancer death rates in lifelong nonsmokers. Journal of the National Cancer Institute. 2006;98(10):691–9. doi: 10.1093/jnci/djj187.
  • 13. Holford TR. Age-period-cohort analysis. In: Armitage P, Colton T, editors. Encyclopedia of biostatistics. Chichester: John Wiley & Sons; 1998. pp. 82–99.
  • 14. Holford TR. The estimation of age, period and cohort effects for vital rates. Biometrics. 1983;39:311–24.
  • 15. Fienberg SE, Mason WM. Identification and estimation of age-period-cohort models in the analysis of discrete archival data. In: Schuessler KF, editor. Sociological methodology. Vol. 1978. San Francisco: Jossey-Bass, Inc; 1979. pp. 1–67.
  • 16. Kupper LL, Janis JM, Salama IA, et al. Age-period-cohort analysis: An illustration of the problems in assessing interaction in one observation per cell data. Communications in Statistics - Theory and Methods. 1983;12:2779–807.
  • 17. Kupper LL, Janis JM, Karmous A, et al. Statistical age-period-cohort analysis: A review and critique. Journal of Chronic Diseases. 1985;38:811–30. doi: 10.1016/0021-9681(85)90105-5.
  • 18. Rodgers WL. Estimable functions of age, period, and cohort effects. American Sociological Review. 1982;47:774–96.
  • 19. Holford TR. In: Multivariate methods in epidemiology. Kelsey JL, Marmot MG, Stolley PD, et al., editors. New York: Oxford University Press; 2002. Monographs in epidemiology and biostatistics.
  • 20. Aranda-Ordaz FJ. On two families of transformations to additivity for binary response data. Biometrika. 1981;68:357–63.
  • 21. McCullagh P, Nelder JA. Generalized linear models. 2. London: Chapman and Hall; 1989.
  • 22. Curado MP, Edwards B, Shin HR, et al. Cancer incidence in five continents, volume IX. Lyon: International Agency for Research on Cancer; 2007.
  • 23. Haldorsen T, Grimsrud TK. Cohort analysis of cigarette smoking and lung cancer incidence among Norwegian women. International Journal of Epidemiology. 1999;28:1032–6. doi: 10.1093/ije/28.6.1032.
  • 24. Holford TR, Zhang Z, Zheng T, et al. A model for the effect of cigarette smoking on lung cancer incidence in Connecticut. Statistics in Medicine. 1996;15:565–80. doi: 10.1002/(SICI)1097-0258(19960330)15:6<565::AID-SIM185>3.0.CO;2-T.
  • 25. Doll R, Peto R. Cigarette smoking and bronchial carcinoma: Dose and time relationships among regular smokers and lifelong non-smokers. Journal of Epidemiology and Community Health. 1978;32:303–13. doi: 10.1136/jech.32.4.303.
