Cancer Informatics. 2011 Feb 23;10:31–44. doi: 10.4137/CIN.S6770

Extension of Cox Proportional Hazard Model for Estimation of Interrelated Age-Period-Cohort Effects on Cancer Survival

Tengiz Mdzinarishvili, Michael X Gleason, Leo Kinarsky, Simon Sherman
PMCID: PMC3085422  PMID: 21552491

Abstract

Within the framework of the Cox proportional hazard (PH) model, a novel two-step procedure for estimating age-period-cohort (APC) effects on the hazard function of death from cancer was developed. In the first step, the procedure estimates the influence of joint APC effects on the hazard function, using Cox PH regression procedures from a standard software package. In the second step, the coefficients for age at diagnosis, time period and birth cohort effects are estimated. To solve the identifiability problem that arises in estimating these coefficients, the assumption that neighboring birth cohorts affect the hazard function almost equally was used. Using an anchoring technique, simple procedures for obtaining estimates of the interrelated age at diagnosis, time period and birth cohort effect coefficients were developed.

As a proof of concept, these procedures were used to analyze survival data, collected in the SEER database, on white men and women diagnosed with lung cancer (LC) in 1975–1999, and the age at diagnosis, time period and birth cohort effect coefficients were estimated. The PH assumption was evaluated graphically using log-log plots. Analysis of the trends in these coefficients suggests that the hazard of death from LC at a given time after cancer diagnosis: (i) decreases between 1975 and 1999; (ii) increases with age at diagnosis; and (iii) depends on birth cohort effects.

The proposed computing procedure can be used for estimating joint APC effects, as well as interrelated age at diagnosis, time period and birth cohort effects in survival analysis of different types of cancer.

Keywords: cancer survival, age, time period, cohort, hazard function, lung cancer

Introduction

In cancer epidemiology, survival and hazard functions are valuable characteristics of severity for a given type of cancer. By analyzing temporal trends of these functions, clinicians can evaluate their achievements in cancer diagnosis and treatment. This analysis can also help researchers develop novel approaches and strategies for fighting cancer.

The survival function, S(τ), is the probability for a cancer patient to stay alive longer than a specified time, τ, after cancer diagnosis. This function is related to the hazard function, h(τ), that determines the instantaneous risk (hazard) of death from the cancer at time, τ, given that the patient has survived up to this time:

$$S(\tau) = e^{-H(\tau)}, \quad (1)$$

where $H(\tau) = \int_0^\tau h(z)\,dz$ is the so-called cumulative hazard function.1,2
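To make relation (1) concrete, the sketch below integrates a hazard function numerically and converts the cumulative hazard into a survival probability. The function names and the constant hazard of 0.05 per month are hypothetical, chosen only for illustration; the trapezoidal rule is our choice of quadrature, not part of the method described here.

```python
import math

def cumulative_hazard(h, tau, steps=1000):
    """Approximate H(tau) = integral of h(z) dz from 0 to tau (trapezoidal rule)."""
    if tau == 0:
        return 0.0
    dz = tau / steps
    total = 0.5 * (h(0.0) + h(tau))
    for k in range(1, steps):
        total += h(k * dz)
    return total * dz

def survival(h, tau, steps=1000):
    """S(tau) = exp(-H(tau)), equation (1)."""
    return math.exp(-cumulative_hazard(h, tau, steps))

# Hypothetical constant hazard of 0.05 per month: H(12) = 0.6, S(12) = exp(-0.6).
print(round(survival(lambda z: 0.05, 12.0), 4))  # 0.5488
```

A larger hazard at every time point gives a larger H(τ) and therefore a smaller S(τ), which is the link used later to translate hazard coefficients into statements about survival.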

For each cancer type, these functions depend not only on common risk factors such as gender, race and geographic area of residence, but also on age at diagnosis (the ages at which patients were diagnosed with cancer), time period (the calendar years in which patients were diagnosed) and birth cohort (the calendar years in which patients were born) effects.

Traditionally, survival functions have been evaluated from cohort-based follow-up observations by monitoring cancer patient survival in clinical-based registries. To analyze such survival data, the single-variable Kaplan–Meier method has been widely used.3 The survival functions obtained from these observations adequately describe survival of cancer cases diagnosed many years ago, whereas data collected more recently have less impact on the estimated survival functions.

To overcome this shortcoming, the period analysis approach and its modification (called the period analysis modeling technique) were introduced.4–8 The latter technique assumes a linear trend in the conditional survival estimates within the 5-year periods used for modeling. A period of five calendar years was chosen to balance how up-to-date and how precise the estimated cancer survival function is. Compared with traditional cohort-based approaches, the period analysis modeling technique yields more up-to-date and more precise estimates of the survival function of cancer patients. However, the period analysis approach does not consider birth cohort effects.

Within the framework of the proportional hazard (PH) model, a multivariate Cox regression approach was used to assess the comparative risks, or hazard functions, of death from cancer.9 The PH model assumes that the hazard function depends proportionally on the risk factors. A graphical approach using log-log plots was used to evaluate the PH assumption, and multivariate Cox regression was applied to estimate differences in hazards among histological types of pancreatic cancer. Along with other variables, such as gender, race, histological type, surgery status and cancer stage, age at diagnosis and time period effects were considered, whereas cohort effects were ignored. As we show below, within the PH model, age at diagnosis, time period and birth cohort effects are interrelated. To date, there is no numerical method for the simultaneous estimation of these interrelated effects.

In this paper we propose to extend the Cox PH model and apply it to estimating the interrelated age-period-cohort effects on cancer survival. Note that this model can be used only if the log-log survival curves are parallel. In contrast to the single-variable Kaplan–Meier approach, which accounts only for time-to-event (survival) data, a multivariate Cox regression approach accounts for many confounding variables as well as for censored data. In cancer research, the Cox PH model has been widely used to analyze data from nested case-control, case-cohort and cohort studies, as well as from clinical trials. However, to the best of our knowledge, this approach has not been used to analyze data from population-based studies to estimate the interrelated age-period-cohort (APC) effects on cancer survival. The main reason is the identifiability problem with multiple estimators that arises in estimating these effects. In this paper we introduce a simple, computationally efficient method to solve this identifiability problem. The proposed solution is analogous to the one we recently used to account for APC effects on cancer incidence rates.10,11

As a proof-of-concept, the proposed approach was utilized to analyze the SEER data on lung cancer (LC) survival in white men and women. The validity of the PH assumption for analyzing this data was initially checked by assessing the parallelism of log-log survival curves. The proposed approach allowed us to estimate numerically the interrelated age at diagnosis, time period and birth cohort effects on survival and hazard functions of LC in white men and women.

Methodology

Generally, APC analysis refers to a family of statistical techniques for understanding temporal trends of an outcome under consideration (such as cancer incidence or mortality rates, hazard function of death from cancer, etc.). The purpose of this analysis is to determine separate contributions of age, time period of observations, and birth cohort to this outcome.12 This kind of analysis, along with other data, can also be performed with the use of cancer follow-up survival data collected over a long period of time from large population-based cancer registries (such as, for instance, the Surveillance, Epidemiology, and End Results (SEER) Program).13

Statement of the problem

Let us assume that, for each patient with a particular type of cancer, there is information on age at diagnosis, date of diagnosis and date of birth, as well as follow-up data on the survival time τ from diagnosis to death from the cancer, with right censoring indicated by a dichotomous value (0 or 1). We can group the data into categorical intervals labeled by the indexes i, j and l, where i (i = 1, 2, …, n) denotes successive age at diagnosis intervals, j (j = 1, 2, …, m) denotes successive time period intervals, and l (l = 1, 2, …, k) denotes successive birth cohort intervals. Let $h_{i,j,l}(\tau)$ denote the hazard function for cancer patients in group (i, j, l). Besides τ, this function depends on the indexes i, j and l, which are related by the following linear relationship:

$$l = j - i + n. \quad (2)$$

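This relationship follows directly from the fact that if an event occurs to an individual of age a in year p, then the individual belongs to the birth cohort c = p − a.12 With equally sized intervals, the cohort index therefore equals the period index minus the age index, and the constant n shifts it so that the smallest cohort index (i = n, j = 1) equals 1.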
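A minimal sketch of relationship (2), assuming 1-based indexes as in the text; the function names are hypothetical. It also confirms that n age intervals and m period intervals produce k = n + m − 1 distinct cohorts.

```python
def cohort_index(i: int, j: int, n: int) -> int:
    """Birth cohort index l = j - i + n from equation (2); all indexes are 1-based."""
    return j - i + n

def all_cohorts(n: int, m: int) -> list:
    """Distinct cohort indexes that occur when i = 1..n and j = 1..m."""
    return sorted({cohort_index(i, j, n) for i in range(1, n + 1)
                   for j in range(1, m + 1)})

# With n = 9 age groups and m = 5 periods (the sizes used in the Example section):
print(cohort_index(9, 1, 9), cohort_index(1, 5, 9))  # 1 13
print(len(all_cohorts(9, 5)))                         # 13
```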

To determine the separate contributions of age, period and cohort effects to $h_{i,j,l}(\tau)$, let us use the PH model, which is widely used in cancer survival analysis.1,2 Within this model, $h_{i,j,l}(\tau)$ depends proportionally on the age at diagnosis ($w_i$), time period ($v_j$) and birth cohort ($u_l$) effect coefficients, as well as on the baseline hazard function $h_0(\tau)$, as follows:

$$h_{i,j,l}(\tau) = w_i v_j u_l\, h_0(\tau). \quad (3)$$

The APC analysis problem is now to estimate the coefficients $w_i$, $v_j$ and $u_l$ and the function $h_0(\tau)$ from the patients' survival times τ, grouped by the indexes i, j and l; these survival time data also include right-censoring indicators (0 or 1). Because the indexes i, j and l are connected by the linear relationship (2), the values of these coefficients are interrelated, and their estimation is an identifiability problem with multiple estimators.14 That is, many solutions fit the observed survival time data equally well, so the problem must be reformulated as one that has a unique solution. This is the main difficulty in solving it. To the best of our knowledge, the identifiability problem of APC analysis stated in this way has not yet been solved in survival studies.
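The identifiability problem can be checked numerically. In the sketch below (our illustration, with hypothetical coefficient values and function names), multiplying $w_i$ by $c^i$, $v_j$ by $c^{-j}$ and $u_l$ by $c^{l-n}$ leaves every product $w_i v_j u_l$ unchanged, because the exponents sum to $i - j + (l - n) = 0$ whenever $l = j - i + n$. Survival data therefore cannot distinguish the two coefficient sets.

```python
import math

def joint_coefficients(w, v, u, n, m):
    """a_{i,j} = w_i * v_j * u_l with l = j - i + n (1-based math, 0-based lists)."""
    return [[w[i - 1] * v[j - 1] * u[j - i + n - 1] for j in range(1, m + 1)]
            for i in range(1, n + 1)]

def tilt(w, v, u, n, c):
    """Return another (w, v, u) triple that gives exactly the same a_{i,j}.

    w_i -> w_i * c**i, v_j -> v_j * c**(-j), u_l -> u_l * c**(l - n).
    """
    w2 = [x * c ** i for i, x in enumerate(w, start=1)]
    v2 = [x * c ** (-j) for j, x in enumerate(v, start=1)]
    u2 = [x * c ** (l - n) for l, x in enumerate(u, start=1)]
    return w2, v2, u2

# Hypothetical coefficients: n = 3 age groups, m = 2 periods, k = 4 cohorts.
n, m = 3, 2
w, v, u = [0.8, 1.0, 1.3], [1.1, 1.0], [1.0, 1.0, 1.0, 1.0]
a_original = joint_coefficients(w, v, u, n, m)
a_tilted = joint_coefficients(*tilt(w, v, u, n, 1.2), n, m)
same = all(math.isclose(x, y) for r1, r2 in zip(a_original, a_tilted)
           for x, y in zip(r1, r2))
print(same)  # True
```

In the tilted solution the cohort coefficients grow by a factor of 1.2 from one cohort to the next, while the original ones are flat. An extra assumption about the cohorts, such as the one introduced below, is what favors one of these equally good fits.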

Computational procedure for solving the problem

Below, we introduce a simple, computationally efficient two-step procedure to solve the aforementioned identifiability problem. In the first step, it estimates the influence of joint APC effects on the hazard function using a Cox PH approach. In the second step, the coefficients for age at diagnosis, time period and birth cohort effects are estimated. To solve the identifiability problem in estimating these coefficients, the additional assumption that neighboring birth cohorts affect the hazard function almost equally is used. We have used the same assumption effectively to account for APC effects on cancer incidence rates.10,11 Using an anchoring technique, simple algorithms for estimating the interrelated age at diagnosis, time period and birth cohort effect coefficients were developed and coded into a computer program. The two steps are explained in detail below.

Step 1. Determination of joint age-period-cohort effect coefficients

Let us present (3) in the following way:

$$h_{i,j,l}(\tau) = a_{i,j,l}\, h_0(\tau), \quad i = 1, 2, \dots, n;\ j = 1, 2, \dots, m;\ l = j - i + n, \quad (4)$$

where $a_{i,j,l} = w_i v_j u_l$ is the joint APC effect coefficient. Since l = j − i + n, grouping by the three indexes i, j and l reduces to grouping by the two indexes i and j, and system (4) can be written as:

$$h_{i,j}(\tau) = a_{i,j}\, h_0(\tau), \quad i = 1, 2, \dots, n;\ j = 1, 2, \dots, m. \quad (5)$$

Using system (5) and the observed survival data, one can estimate each $a_{i,j}$ and its standard error (SE), as well as $h_0(\tau)$. For this purpose, the Cox PH regression approach, which uses maximum likelihood estimation, can be applied. The estimates $a^*_{i,j}$ (here and below, asterisks denote estimates) need to be anchored to one of the coefficients being estimated; this is possible because any constant factor common to all $a_{i,j}$ can be absorbed into $h_0(\tau)$. The anchored coefficient, say $a^*_{i_0,j_0}$, is set equal to 1 and its SE to 0 (i.e., $a^*_{i_0,j_0} = 1$ and $SE(a^*_{i_0,j_0}) = 0$, where $i_0$ and $j_0$ are the indexes of the anchored coefficient).
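The text does not spell out how the (i, j) cells enter the Cox regression. One common way, assumed here purely for illustration, is to give each cell its own 0/1 indicator covariate and omit the indicator of the anchor cell, so that exp(β) for every other cell estimates $a_{i,j}/a_{i_0,j_0}$. The function name `cell_indicators` is hypothetical. The resulting matrix could serve as the predictor matrix of a Cox PH routine; for example, MATLAB's `coxphfit` takes a predictor matrix `X` and survival times `T`, with censoring passed through its `'Censoring'` option.

```python
def cell_indicators(cells, n, m, anchor):
    """Code each patient's (i, j) cell as 0/1 covariates, omitting the anchor cell.

    cells  : list of (i, j) pairs, one per patient (1-based indexes)
    n, m   : numbers of age at diagnosis and time period intervals
    anchor : the (i0, j0) cell; with its column omitted, exp(beta) for every
             other cell estimates a_{i,j} relative to a_{i0,j0} = 1
    Returns (labels, rows): the column labels and one 0/1 row per patient.
    """
    labels = [(i, j) for i in range(1, n + 1) for j in range(1, m + 1)
              if (i, j) != anchor]
    column = {cell: c for c, cell in enumerate(labels)}
    rows = []
    for cell in cells:
        row = [0] * len(labels)
        if cell != anchor:
            row[column[cell]] = 1   # a KeyError here means an out-of-range cell
        rows.append(row)
    return labels, rows

# Hypothetical toy example: 2 age groups x 2 periods, anchor at (1, 1).
labels, rows = cell_indicators([(1, 1), (2, 2), (1, 2)], n=2, m=2, anchor=(1, 1))
print(labels)  # [(1, 2), (2, 1), (2, 2)]
print(rows)    # [[0, 0, 0], [0, 0, 1], [1, 0, 0]]
```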

Note: The Cox PH model, which is a particular case of the PH model, is usually written in exponential form:

$$h_{i,j}(\tau) = h_0(\tau)\, e^{\ln a_{i,j}}, \quad (6)$$

where the parameters to be estimated are $\ln a_{i,j}$. This exponential form of expression (6) guarantees nonnegative estimates of $a_{i,j}$.1

Step 2. Determination of coefficients for interrelated age at diagnosis, time period and birth cohort effects

The estimates $a^*_{i,j}$ obtained in the previous step can be used to estimate the coefficients $w_i$, $v_j$ and $u_l$. For this purpose, three sets of estimates can be obtained from the system of n × m conditional equations

$$a^*_{i,j} = w_i v_j u_l, \quad i = 1, 2, \dots, n;\ j = 1, 2, \dots, m;\ l = j - i + n. \quad (7)$$

These sets are: (i) estimates of the age at diagnosis coefficients ($w^*_i$); (ii) estimates of the time period coefficients ($v^*_j$); and (iii) estimates of the birth cohort effect coefficients ($u^*_l$). However, because of the linear relationship (2) between the indexes i, j and l, these three effects are interrelated. As a result, the identifiability problem with multiple estimators arises in system (7): different combinations of the effect coefficients fit the observed cancer survival data equally well. This problem is analogous to that of accounting for age, period and cohort effects on cancer incidence rates.12,15–18

Solving the identifiability problem in APC analysis requires additional assumptions.12,15–18 As such an assumption, we hypothesize that neighboring birth cohorts affect the cancer survival data almost equally. The rationale is that, in practice, adjacent cohorts overlap in time, so the values of the corresponding cohort effects should be close.14 Based on this assumption, we propose a novel computing procedure for the numerical estimation of the interrelated age at diagnosis, time period and birth cohort coefficients of cancer survival and hazard functions.

Estimation of age at diagnosis effect coefficients

Consider the n × m matrix with elements $a^*_{i,j}$ given by system (7). Because l = j − i + n, moving from row i to row i + 1 within the same column j lowers the cohort index from l to l − 1. By dividing the corresponding elements of neighboring rows (with indexes i and i + 1, or i + 1 and i) of this matrix, one obtains two systems of equations in which the coefficients $v_j$ cancel out:

$$\frac{a^*_{i,j}}{a^*_{i+1,j}} = \frac{w_i}{w_{i+1}} \cdot \frac{u_l}{u_{l-1}}, \quad i = 1, \dots, n-1;\ j = 1, \dots, m;\ l = j - i + n \quad (8)$$

and

$$\frac{a^*_{i+1,j}}{a^*_{i,j}} = \frac{w_{i+1}}{w_i} \cdot \frac{u_{l-1}}{u_l}, \quad i = 1, \dots, n-1;\ j = 1, \dots, m;\ l = j - i + n. \quad (9)$$

Note: System (8) provides (n − 1) × m conditional equations for assessing n − 1 ratios of the age at diagnosis coefficients ($w_i/w_{i+1}$, i = 1, …, n − 1) and m − 1 + n − 1 ratios of the cohort effect coefficients ($u_l/u_{l-1}$, l = 2, …, m − 1 + n). Analogously, system (9) provides (n − 1) × m conditional equations for assessing n − 1 ratios of the age at diagnosis coefficients ($w_{i+1}/w_i$, i = 1, …, n − 1) and m − 1 + n − 1 ratios of the cohort effect coefficients ($u_{l-1}/u_l$, l = 2, …, m − 1 + n).

Assuming that the cohort effect coefficients of any pair of neighboring cohorts have a ratio close to 1, one obtains the following pair of systems:

$$\frac{a^*_{i,j}}{a^*_{i+1,j}} = \frac{w_i}{w_{i+1}}, \quad i = 1, \dots, n-1;\ j = 1, \dots, m \quad (10)$$

and

$$\frac{a^*_{i+1,j}}{a^*_{i,j}} = \frac{w_{i+1}}{w_i}, \quad i = 1, \dots, n-1;\ j = 1, \dots, m. \quad (11)$$

When the coefficients of variation of the estimates $a^*_{i,j}$ are small, the SEs of the ratios $a^*_{i,j}/a^*_{i+1,j}$ and $a^*_{i+1,j}/a^*_{i,j}$ can be calculated by the standard rules of error propagation.19 To estimate $w_i/w_{i+1}$ and $w_{i+1}/w_i$, a least squares method can be applied; the most efficient estimates of these ratios are the weighted means of the values $a^*_{i,j}/a^*_{i+1,j}$ and $a^*_{i+1,j}/a^*_{i,j}$, respectively, averaged over index j (the weights are the reciprocals of the squared SEs). The SEs of the estimates $(w_i/w_{i+1})^*$ and $(w_{i+1}/w_i)^*$ can be calculated in the standard way. Finally, after anchoring the age at diagnosis coefficient at index $i_0$, i.e., setting $w_{i_0} = 1$ and $SE(w_{i_0}) = 0$, one can obtain the following recurrent estimates of $w^*_i$:

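$$w^*_{i_0+1} = \left(\frac{w_{i_0+1}}{w_{i_0}}\right)^*,\quad w^*_{i_0+2} = \left(\frac{w_{i_0+2}}{w_{i_0+1}}\right)^* w^*_{i_0+1},\quad \dots,\quad w^*_n = \left(\frac{w_n}{w_{n-1}}\right)^* w^*_{n-1} \quad (12)$$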

and

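$$w^*_{i_0-1} = \left(\frac{w_{i_0-1}}{w_{i_0}}\right)^*,\quad w^*_{i_0-2} = \left(\frac{w_{i_0-2}}{w_{i_0-1}}\right)^* w^*_{i_0-1},\quad \dots,\quad w^*_1 = \left(\frac{w_1}{w_2}\right)^* w^*_2. \quad (13)$$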

Note 1: Index $i_0$ is given by the corresponding index of the anchored coefficient $a^*_{i_0,j_0} = 1$. The SE of $w^*_i$ can be calculated by the standard rules of error propagation from the estimates $(w_i/w_{i+1})^*$ and $(w_{i+1}/w_i)^*$ and their SEs.

Note 2: As in our previous work on the APC analysis of cancer incidence rates,10,11 one can show that the errors of the estimates $w^*_i$ (as well as those of $v^*_j$ and $u^*_l$) are approximately normally distributed. This property was used to test the null hypotheses ($w_i = w_{i_0}$, $v_j = v_{j_0}$ and $u_l = u_{l_0}$) with the standard z-test.
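A minimal two-sided z-test of the kind described in Note 2, assuming normally distributed errors and comparing an estimate with the anchored value 1 (whose SE is 0). The function name is hypothetical; the example input is the first row of Table 3 for white men.

```python
import math

def z_test(estimate, se, null_value=1.0):
    """Two-sided z-test of H0: coefficient == null_value, assuming normal errors."""
    z = (estimate - null_value) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # equals 2 * (1 - Phi(|z|))
    return z, p

# Age interval 1 for white men in Table 3: 0.77 ± 0.03 against the anchor value 1.
z, p = z_test(0.77, 0.03)
print(round(z, 2), p < 0.0001)  # -7.67 True
```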

Estimation of time period effect coefficients

Because l = j − i + n, moving from column j to column j + 1 within the same row i raises the cohort index from l to l + 1. By dividing the corresponding elements of neighboring columns (with indexes j and j + 1, or j + 1 and j) of the matrix of $a^*_{i,j}$, one obtains the following two systems of equations, in which the coefficients $w_i$ cancel out:

$$\frac{a^*_{i,j}}{a^*_{i,j+1}} = \frac{v_j}{v_{j+1}} \cdot \frac{u_l}{u_{l+1}}, \quad i = 1, 2, \dots, n;\ j = 1, 2, \dots, m-1;\ l = j - i + n \quad (14)$$

and

$$\frac{a^*_{i,j+1}}{a^*_{i,j}} = \frac{v_{j+1}}{v_j} \cdot \frac{u_{l+1}}{u_l}, \quad i = 1, 2, \dots, n;\ j = 1, 2, \dots, m-1;\ l = j - i + n. \quad (15)$$

Note: System (14) provides n × (m − 1) conditional equations for assessing m − 1 ratios of the time period coefficients ($v_j/v_{j+1}$, j = 1, …, m − 1) and m − 1 + n − 1 ratios of the cohort effect coefficients ($u_l/u_{l+1}$, l = 1, …, m − 1 + n − 1). Analogously, system (15) provides n × (m − 1) conditional equations for assessing m − 1 ratios of the time period coefficients ($v_{j+1}/v_j$, j = 1, …, m − 1) and m − 1 + n − 1 ratios of the cohort effect coefficients ($u_{l+1}/u_l$, l = 1, …, m − 1 + n − 1). Assuming that the ratio of the cohort effect coefficients of any pair of neighboring cohorts is close to 1, one obtains from (14) and (15) the pair of systems:

$$\frac{a^*_{i,j}}{a^*_{i,j+1}} = \frac{v_j}{v_{j+1}}, \quad i = 1, 2, \dots, n;\ j = 1, 2, \dots, m-1 \quad (16)$$

and

$$\frac{a^*_{i,j+1}}{a^*_{i,j}} = \frac{v_{j+1}}{v_j}, \quad i = 1, 2, \dots, n;\ j = 1, 2, \dots, m-1. \quad (17)$$

When the coefficients of variation of the estimates $a^*_{i,j}$ are small, the SEs of the ratios $a^*_{i,j}/a^*_{i,j+1}$ and $a^*_{i,j+1}/a^*_{i,j}$ can be calculated by the standard rules of error propagation.19 To estimate $v_j/v_{j+1}$ and $v_{j+1}/v_j$, a least squares method can be applied; the most efficient estimates of these ratios are the weighted means of the values $a^*_{i,j}/a^*_{i,j+1}$ and $a^*_{i,j+1}/a^*_{i,j}$, respectively, averaged over index i (the weights are the reciprocals of the squared SEs). The SEs of the estimates $(v_j/v_{j+1})^*$ and $(v_{j+1}/v_j)^*$ can be calculated in the standard way.

After anchoring the time period coefficient at index $j_0$, i.e., setting $v_{j_0} = 1$ and $SE(v_{j_0}) = 0$, one can obtain the following recurrent estimates of $v^*_j$:

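$$v^*_{j_0+1} = \left(\frac{v_{j_0+1}}{v_{j_0}}\right)^*,\quad v^*_{j_0+2} = \left(\frac{v_{j_0+2}}{v_{j_0+1}}\right)^* v^*_{j_0+1},\quad \dots,\quad v^*_m = \left(\frac{v_m}{v_{m-1}}\right)^* v^*_{m-1} \quad (18)$$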

and

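$$v^*_{j_0-1} = \left(\frac{v_{j_0-1}}{v_{j_0}}\right)^*,\quad v^*_{j_0-2} = \left(\frac{v_{j_0-2}}{v_{j_0-1}}\right)^* v^*_{j_0-1},\quad \dots,\quad v^*_1 = \left(\frac{v_1}{v_2}\right)^* v^*_2. \quad (19)$$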

The SE of $v^*_j$ can be calculated by the standard rules of error propagation from the estimates $(v_j/v_{j+1})^*$ and $(v_{j+1}/v_j)^*$ and their SEs.

Note 1: Index $j_0$ is given by the anchored coefficient $a^*_{i_0,j_0} = 1$.

Note 2: The preceding method for estimating the time period effect coefficients is similar to the method for estimating the age at diagnosis effect coefficients. For the age at diagnosis coefficients, the conditional equations are derived by dividing corresponding elements of neighboring rows of the matrix of $a^*_{i,j}$; for the time period coefficients, they are derived by dividing corresponding elements of neighboring columns.

Estimation of birth cohort effect coefficients

One way to estimate $u_l$ is as follows. After evaluating the time period effect coefficients $v^*_j$, one can correct the coefficients $a^*_{i,j}$ for time period effects by dividing them by $v^*_j$. From (7) and (14), the following two systems of conditional equations can be derived:

$$\frac{a^*_{i,j}/v^*_j}{a^*_{i,j+1}/v^*_{j+1}} = \frac{u_l}{u_{l+1}}, \quad i = 1, \dots, n;\ j = 1, \dots, m-1;\ l = j - i + n \quad (20)$$

and

$$\frac{a^*_{i,j+1}/v^*_{j+1}}{a^*_{i,j}/v^*_j} = \frac{u_{l+1}}{u_l}, \quad i = 1, \dots, n;\ j = 1, \dots, m-1;\ l = j - i + n. \quad (21)$$

By the standard rules of error propagation, the SEs of the ratios of the corrected coefficients $a^*_{i,j}/v^*_j$ and $a^*_{i,j+1}/v^*_{j+1}$ can be obtained from the SEs of $a^*_{i,j}$, $v^*_j$, $a^*_{i,j+1}$ and $v^*_{j+1}$. As with the time period coefficients, the ratios $u_l/u_{l+1}$ and $u_{l+1}/u_l$ can be estimated by the weighted means of the left-hand sides of systems (20) and (21), respectively, taken over all (i, j) pairs that share the same cohort index l. The weights are the reciprocals of the squared SEs of these ratios. The index of the cohort coefficient to be anchored follows directly from relationship (2) between the indexes i, j and l.
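The cohort step differs from the age and period steps in one respect: the ratios of system (20) are pooled along diagonals, i.e., over all cells with the same l = j − i + n. The sketch below implements that pooling for $(u_l/u_{l+1})^*$, assuming independent errors in all four quantities entering each ratio; the function name and toy values are hypothetical. The reverse ratios of system (21) would be pooled the same way.

```python
import math
from collections import defaultdict

def cohort_ratios(a, se, v, v_se):
    """Estimate (u_l / u_{l+1})* from period-corrected a*_{i,j} / v*_j, system (20).

    a, se   : n x m nested lists of a*_{i,j} and their SEs
    v, v_se : length-m lists of v*_j and their SEs
    Returns {l: (ratio, SE)} for l = 1 .. n + m - 2 (1-based cohort indexes).
    """
    n, m = len(a), len(a[0])
    groups = defaultdict(list)
    for i in range(1, n + 1):
        for j in range(1, m):
            l = j - i + n
            x = a[i - 1][j - 1] / v[j - 1]      # corrected coefficient = w_i * u_l
            y = a[i - 1][j] / v[j]              # corrected coefficient = w_i * u_{l+1}
            rel = math.sqrt((se[i - 1][j - 1] / a[i - 1][j - 1]) ** 2
                            + (v_se[j - 1] / v[j - 1]) ** 2
                            + (se[i - 1][j] / a[i - 1][j]) ** 2
                            + (v_se[j] / v[j]) ** 2)
            groups[l].append((x / y, (x / y) * rel))
    result = {}
    for l, vals in sorted(groups.items()):      # inverse-variance pooling per diagonal
        wts = [1.0 / s ** 2 for _, s in vals]
        mean = sum(wt * r for wt, (r, _) in zip(wts, vals)) / sum(wts)
        result[l] = (mean, math.sqrt(1.0 / sum(wts)))
    return result

# Hypothetical values built from w = (1.0, 1.5), v = (1.0, 2.0), u = (1.0, 2.0, 4.0):
a = [[2.0, 8.0], [1.5, 6.0]]
se = [[0.1, 0.1], [0.1, 0.1]]
res = cohort_ratios(a, se, v=[1.0, 2.0], v_se=[0.0, 0.05])
print({l: round(r, 3) for l, (r, _) in res.items()})  # {1: 0.5, 2: 0.5}
```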

By setting $u_{l_0} = 1$ and $SE(u_{l_0}) = 0$, all cohort coefficients and their SEs can be estimated by a procedure analogous to the one used for the time period coefficients:

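$$u^*_{l_0+1} = \left(\frac{u_{l_0+1}}{u_{l_0}}\right)^*,\quad u^*_{l_0+2} = \left(\frac{u_{l_0+2}}{u_{l_0+1}}\right)^* u^*_{l_0+1},\quad \dots,\quad u^*_k = \left(\frac{u_k}{u_{k-1}}\right)^* u^*_{k-1};\quad k = m - 1 + n \quad (22)$$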

and

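$$u^*_{l_0-1} = \left(\frac{u_{l_0-1}}{u_{l_0}}\right)^*,\quad u^*_{l_0-2} = \left(\frac{u_{l_0-2}}{u_{l_0-1}}\right)^* u^*_{l_0-1},\quad \dots,\quad u^*_1 = \left(\frac{u_1}{u_2}\right)^* u^*_2. \quad (23)$$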

Note: The index $l_0$ of the anchored birth cohort coefficient is simply $l_0 = j_0 - i_0 + n$. The SE of $u^*_l$ can be calculated by the standard rules of error propagation from the estimates $(u_l/u_{l+1})^*$ and $(u_{l+1}/u_l)^*$ and their SEs.

Additional details of the proposed procedure are discussed below on the example of analysis of lung cancer (LC) survival data collected in the SEER database.

Potential limitations

The proposed extension of the Cox PH model has several potential limitations. First, the model can be used only if the log-log survival curves are parallel. However, visual evaluation of parallelism by graphical approaches raises the question “how parallel is parallel?”, and for a given data set this decision can be quite subjective. We therefore followed the conservative strategy recommended by Kleinbaum and Klein,1 according to which the PH assumption is considered satisfied unless there is strong evidence that the log-log survival curves are not parallel.

Second, to solve the identifiability problem, the proposed approach assumes that neighboring birth cohorts affect the cancer survival data almost equally. Therefore, after the birth cohort coefficients and their SEs are estimated, the validity of this assumption needs to be verified. If the estimates of some neighboring birth cohort coefficients are statistically different (i.e., the results of the calculations do not support this assumption), the results cannot be fully justified.

It could also be argued that the requirement to categorize age at diagnosis, time period and birth cohort into equally sized time intervals limits the applicability of the proposed procedure. While acknowledging this limitation, we suggest that in practice the quantitative estimation of the age at diagnosis, time period and birth cohort effect coefficients depends mainly on the amount and quality of the collected data rather than on the use of equally sized intervals. Indeed, in common cancer epidemiology practice, age at diagnosis, time period and birth cohort are grouped into 5-year intervals to smooth out random fluctuations in cancer incidence rates.18 When the amount of analyzed data is large enough, these intervals can be narrowed to, say, 3 or 4 years, which improves the accuracy of the estimated coefficients. Conversely, when the amount of collected data is relatively small, the intervals can be widened to as much as 10 years,12 at the price of lower accuracy in the calculated coefficients.

In principle, the approach proposed in this work can be further extended for cases when the age at diagnosis and time period intervals have different durations. For this purpose, the technique proposed in the literature20 can be utilized. However, it poses additional identifiability problems24 and the use of this technique requires the development of a more complicated computational procedure, while benefits of its use are questionable. Therefore, we decided to keep such an extension out of the scope of this work.

Example

Estimation of APC effects in lung cancer survival analysis

The proposed procedure was used to estimate the APC effects in LC survival of white men and women. The selection of LC cases and data preparation, as well as the implementation of the proposed procedure and the analysis of the results, are presented below.

Selection of LC cases and data preparation

In this work, we used the SEER database that contains cancer follow-up survival data collected 1973 through 2004 in the SEER 9 Registries13 (Connecticut, Detroit, Hawaii, Iowa, New Mexico, San Francisco-Oakland, Utah, Atlanta (1975–2004), and Seattle-Puget Sound (1975–2004)). From this database, we selected cases for white men and white women aged 40–84 and diagnosed with LC in the 1975–1999 time period, for a total of 272,604 cases. By using the same data processing methodology as described in the SEER Survival Monograph21 and our previous study,22 we excluded 38,463 cases that were not first primary cancers; from the obtained subset, we excluded 5,006 cases that were diagnosed via death certificate or at autopsy only; then, we excluded 16,413 cases that were not microscopically confirmed by a pathologist, yielding a total of 212,722 cases (134,360 male and 78,362 female). Choosing the 1975–1999 time period allowed us to analyze the survival with a minimum of five years of follow-up data for LC patients diagnosed in 1999 or earlier. For the selected cases, the survival time was measured in months from the date of diagnosis until the date of death. Cases lost to follow-up were right-censored at the time of the last known follow-up, and patients alive at the end of our study period (December 31, 1999) were right-censored at this date.

The ages of LC patients at diagnosis were divided into nine age at diagnosis intervals, denoted by index i: i = 1 for 40–44; i = 2 for 45–49; …; i = 8 for 75–79; and i = 9 for 80–84. To obtain a sufficiently large sample for statistical analysis, data for patients aged 40 years and over were used (in this case, the number of patients in each age at diagnosis group exceeds 300). The 25-year observation range (1975–1999) was divided into five 5-year time period intervals denoted by index j: j = 1 for 1975–1979; j = 2 for 1980–1984; j = 3 for 1985–1989; j = 4 for 1990–1994; and j = 5 for 1995–1999. In addition, the 13 birth cohorts corresponding to these age at diagnosis and time period intervals were divided into 5-year intervals denoted by index l (l = j − i + 9): (l = 1) 1895–99; (l = 2) 1900–04; …; (l = 12) 1950–54; and (l = 13) 1955–59.
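The grouping above can be expressed as a few integer divisions. This is a sketch under the stated 5-year scheme; the function names are hypothetical, and the cohort labels are the nominal ones listed in the text (adjacent cohorts overlap in real birth years, so an individual's birth year need not fall inside the label).

```python
def age_index(age: int) -> int:
    """i = 1 for ages 40-44, ..., i = 9 for ages 80-84."""
    if not 40 <= age <= 84:
        raise ValueError("age at diagnosis outside 40-84")
    return (age - 40) // 5 + 1

def period_index(year: int) -> int:
    """j = 1 for 1975-1979, ..., j = 5 for 1995-1999."""
    if not 1975 <= year <= 1999:
        raise ValueError("year of diagnosis outside 1975-1999")
    return (year - 1975) // 5 + 1

def cohort_label(l: int) -> str:
    """Nominal birth cohort interval for l = j - i + 9 (l = 1 -> 1895-99)."""
    start = 1895 + 5 * (l - 1)
    return f"{start}-{str(start + 4)[-2:]}"

# A hypothetical patient aged 63 at diagnosis in 1987:
i, j = age_index(63), period_index(1987)
l = j - i + 9
print(i, j, l, cohort_label(l))  # 5 3 7 1925-29
```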

For the PH model, the hazard function of LC was written as $h_{i,j,l}(\tau) = w_i v_j u_l h_0(\tau)$, a function of the age at diagnosis ($w_i$), time period ($v_j$) and birth cohort ($u_l$) coefficients, as well as of the baseline hazard function $h_0(\tau)$. For convenience, this model is presented in Table 1. In this table, the hazard functions $h_{i,j,l}(\tau)$ are arranged as follows: for a given age at diagnosis interval (i), along the row; for a given time period interval (j), along the column; and for a given birth cohort interval (l), along the diagonal. The table makes clear that the indexes i, j and l are interrelated: any two of them define the third (for instance, the row and the column define the diagonal).

Table 1.

Presentation of the hazard function $h(\tau, w_i, v_j, u_l)$ by age at diagnosis ($w_i$), time period ($v_j$) and birth cohort ($u_l$) effect coefficients, and the baseline hazard function $h_0(\tau)$.


Notes: The abbreviation, “mp, ti,” indicates the midpoint of the i-th age at diagnosis interval. Arrows show directions (along diagonals) of changing hazard functions of death from cancer for patients born in the given intervals of calendar years (birth cohort intervals).

Validation of the PH model

To test the validity of the PH model given by formula (3), we used a graphical approach based on log-log plots.1 Following this approach, for each (i, j) cell of Table 1 we plotted the survival curve S*, estimated by the Kaplan–Meier method, as a function of time τ, and then considered the ln(−ln S*) curve.1 The parallelism of the log-log survival plots for different (i, j) cells provides a graphical way to assess the PH assumption. Indeed, from (1) and (3) it follows that:

$$\ln S(\tau, w_i, v_j, u_l) = -w_i v_j u_l \int_0^\tau h_0(z)\,dz \quad (24)$$

and

$$\ln\left[-\ln S(\tau, w_i, v_j, u_l)\right] = \ln(w_i v_j u_l) + \ln\left[\int_0^\tau h_0(z)\,dz\right]. \quad (25)$$

When the PH assumption is valid, formulas (24) and (25) show that $\ln[-\ln S(\tau, w_i, v_j, u_l)]$ is the logarithm of the cumulative baseline hazard function of death from cancer, $\ln H_0(\tau) = \ln\left[\int_0^\tau h_0(z)\,dz\right]$, shifted along the ordinate axis by $\ln(w_i v_j u_l)$. After inspecting the log-log survival plots for each cell of Table 1, we accepted the PH models of LC for both men and women (data not shown).
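The graphical check can be sketched with a minimal product-limit (Kaplan–Meier) estimator and the log-log transform of equation (25). The function names and the five-patient toy cell are hypothetical, and plotting is omitted: in practice one would overlay the `log_log` points of different (i, j) cells and look for curves that differ by a roughly constant vertical shift.

```python
import math

def kaplan_meier(times, events):
    """Product-limit estimate S*(t) at each distinct time with at least one death.

    times  : survival times (e.g. months from diagnosis)
    events : 1 if death was observed, 0 if the time is right-censored
    Returns a list of (t, S*(t)) pairs.
    """
    data = sorted(zip(times, events))
    at_risk, s, curve, k = len(data), 1.0, [], 0
    while k < len(data):
        t, deaths, leaving = data[k][0], 0, 0
        while k < len(data) and data[k][0] == t:   # group tied times
            deaths += data[k][1]
            leaving += 1
            k += 1
        if deaths:
            s *= 1.0 - deaths / at_risk
            curve.append((t, s))
        at_risk -= leaving                          # censored cases leave after the deaths
    return curve

def log_log(curve):
    """Points (t, ln(-ln S*)) for 0 < S* < 1, as plotted to check the PH assumption."""
    return [(t, math.log(-math.log(s))) for t, s in curve if 0.0 < s < 1.0]

# Hypothetical toy cell: five patients, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Under (25), the vertical gap between two cells' log-log curves estimates the difference of their $\ln(w_i v_j u_l)$ values; for example, a cell whose hazard is exactly twice the baseline sits ln 2 above it at every τ.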

Results and Discussion

To estimate the joint APC effects within the Cox PH model, we used Table 2, in which $a_{i,j} = w_i v_j u_l$ for each cell (see Step 1 above).

Table 2.

Presentation of the hazard function $h(\tau, w_i, v_j, u_l)$ as a function of the joint age-period-cohort effect coefficients $a_{i,j}$ ($a_{i,j} = w_i v_j u_l$) and the baseline hazard function $h_0(\tau)$.

Period of observation

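| Age group, i | mp, t_i | 1975–79 (j = 1) | 1980–84 (j = 2) | 1985–89 (j = 3) | 1990–94 (j = 4) | 1995–99 (j = 5) |
|---|---|---|---|---|---|---|
| 1 | 42.5 | $a_{1,1}h_0(\tau)$ | $a_{1,2}h_0(\tau)$ | $a_{1,3}h_0(\tau)$ | $a_{1,4}h_0(\tau)$ | $a_{1,5}h_0(\tau)$ |
| 2 | 47.5 | $a_{2,1}h_0(\tau)$ | $a_{2,2}h_0(\tau)$ | $a_{2,3}h_0(\tau)$ | $a_{2,4}h_0(\tau)$ | $a_{2,5}h_0(\tau)$ |
| 3 | 52.5 | $a_{3,1}h_0(\tau)$ | $a_{3,2}h_0(\tau)$ | $a_{3,3}h_0(\tau)$ | $a_{3,4}h_0(\tau)$ | $a_{3,5}h_0(\tau)$ |
| 4 | 57.5 | $a_{4,1}h_0(\tau)$ | $a_{4,2}h_0(\tau)$ | $a_{4,3}h_0(\tau)$ | $a_{4,4}h_0(\tau)$ | $a_{4,5}h_0(\tau)$ |
| 5 | 62.5 | $a_{5,1}h_0(\tau)$ | $a_{5,2}h_0(\tau)$ | $a_{5,3}h_0(\tau)$ | $a_{5,4}h_0(\tau)$ | $a_{5,5}h_0(\tau)$ |
| 6 | 67.5 | $a_{6,1}h_0(\tau)$ | $a_{6,2}h_0(\tau)$ | $a_{6,3}h_0(\tau)$ | $a_{6,4}h_0(\tau)$ | $a_{6,5}h_0(\tau)$ |
| 7 | 72.5 | $a_{7,1}h_0(\tau)$ | $a_{7,2}h_0(\tau)$ | $a_{7,3}h_0(\tau)$ | $a_{7,4}h_0(\tau)$ | $a_{7,5}h_0(\tau)$ |
| 8 | 77.5 | $a_{8,1}h_0(\tau)$ | $a_{8,2}h_0(\tau)$ | $a_{8,3}h_0(\tau)$ | $a_{8,4}h_0(\tau)$ | $a_{8,5}h_0(\tau)$ |
| 9 | 82.5 | $a_{9,1}h_0(\tau)$ | $a_{9,2}h_0(\tau)$ | $a_{9,3}h_0(\tau)$ | $a_{9,4}h_0(\tau)$ | $a_{9,5}h_0(\tau)$ |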

Note: The abbreviation, “mp, ti”, indicates the midpoint of the i-th age at diagnosis interval.

To estimate the $a_{i,j}$ coefficients, we used the Cox PH model written in the exponential form (6) and the MATLAB function “coxphfit”. (Other programs for Cox PH regression analysis can also be used for this purpose.) For each (i, j) cell (i = 1, 2, …, 9; j = 1, 2, …, 5), two files were used as input to this function. The first file contained the survival times $\tau_{i,j,\rho} = \tau_{i,j,l,\rho}$, where l = j − i + 9 and ρ is the patient's identification index. The second file contained the dichotomous censoring status (0 or 1) of each patient. As output, coxphfit provided the estimates $(\ln a_{i,j})^*$ and $SE[(\ln a_{i,j})^*]$, as well as the estimates of the cumulative baseline hazard $H^*_0(\tau)$ for $\tau = \tau_{i,j,\rho}$.

We obtained the estimates $a^*_{i,j}$ and $SE(a^*_{i,j})$ from the following formulas (the second is the first-order, or delta-method, approximation):

$$a^*_{i,j} = e^{(\ln a_{i,j})^*} \quad (26)$$

and

$$SE(a^*_{i,j}) = a^*_{i,j}\, SE\left[(\ln a_{i,j})^*\right]. \quad (27)$$
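Formulas (26) and (27) amount to exponentiating a log-scale coefficient and scaling its SE by the same factor, since the derivative of $e^x$ is $e^x$. A minimal sketch, with a hypothetical function name and input values:

```python
import math

def joint_coefficient(log_a, se_log_a):
    """Equations (26)-(27): a* = exp((ln a)*), SE(a*) ~= a* * SE[(ln a)*] (delta method)."""
    a = math.exp(log_a)
    return a, a * se_log_a

# Hypothetical Cox output: (ln a)* = ln 1.2 with SE 0.05.
a, se = joint_coefficient(math.log(1.2), 0.05)
print(round(a, 3), round(se, 3))  # 1.2 0.06
```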

To obtain estimates of the age at diagnosis, time period and birth cohort effect coefficients ($w^*_i$, $v^*_j$ and $u^*_l$, respectively), we used our newly developed MATLAB program, the “apcsur” function. This program implements the algorithms described above (see Step 2) and takes as input the estimates $a^*_{i,j}$ and $SE(a^*_{i,j})$, as well as the indexes of the age at diagnosis, time period and birth cohort intervals to be anchored. The coefficients of the anchored intervals were set to 1 and their SEs to 0; all other coefficients were estimated relative to the anchored ones. The “apcsur” function returns the estimates $w^*_i$, $v^*_j$ and $u^*_l$.

In this work, the age at diagnosis, time period and birth cohort effect coefficients with the median indexes i = 5, j = 3 and l = 7 (note that l = j − i + 9 = 7) were chosen as anchors; these coefficients were set to $w_5 = 1$, $v_3 = 1$ and $u_7 = 1$, with SEs equal to 0. The anchors were chosen on the basis of numerical experiments, which showed (data not presented) that with this choice the SEs of most of the estimated coefficients were smaller than with any other combination.

Table 3 presents the estimates of the age at diagnosis effect coefficients, $w_i^*$, and their SEs for white men and women with LC. Each coefficient was compared with the coefficient for the anchored age interval, 60–64, whose value was set equal to 1. The statistical significance of each difference was assessed by P-values from the standard z-test. These P-values are shown in Table 3. P-values for coefficients statistically distinguishable from 1 (at the 0.05 significance level) are shown in italics. Figure 1 shows the trends of the age at diagnosis effect coefficients in white men (A) and women (B). For both men and women, the estimated coefficients increase with age at diagnosis, with the two oldest groups having nearly equal values. That is, the hazard of death from LC increases with age. Formula (1) implies that, for a given $\tau$, a higher hazard corresponds to a lower survival function. We therefore conclude that LC survival rates decrease with age at diagnosis. This agrees with the conclusion of Bassily et al.23 that cancer survival rates decrease with age.
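The z-test comparison with the anchor and the 95% confidence intervals drawn as error bars in the figures can be sketched as follows. This sketch assumes a two-sided test and a normal approximation (±1.96 SE). The text does not state either detail, and the function names are hypothetical. Because the tabulated values are rounded to two decimals, P-values recomputed from them are only approximate.

```python
import math

def z_test_vs_anchor(coef, se, anchor=1.0):
    """Two-sided z-test of H0: coef == anchor, treating coef as normal with the given SE."""
    z = (coef - anchor) / se
    # 2 * (1 - Phi(|z|)), written via the complementary error function
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

def ci95(coef, se):
    """Normal-approximation 95% confidence interval."""
    half_width = 1.96 * se
    return coef - half_width, coef + half_width

# White men, age interval index 1 in Table 3: 0.77 +/- 0.03
z, p = z_test_vs_anchor(0.77, 0.03)
print(round(z, 2), p < 0.0001)  # -7.67 True
print(ci95(0.77, 0.03))         # approximately (0.7112, 0.8288)
```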

Table 3.

Estimated values of the age at diagnosis coefficients wi*, their standard errors (SE), and P-values characterizing the statistical difference between the estimated coefficient and the anchored coefficient.

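| Age interval index | White men: $w_i^*$ ± SE | White men: P-value | White women: $w_i^*$ ± SE | White women: P-value |
|---|---|---|---|---|
| 1 | 0.77 ± 0.03 | *<0.0001* | 0.78 ± 0.04 | *<0.0001* |
| 2 | 0.85 ± 0.02 | *<0.0001* | 0.84 ± 0.03 | *<0.0001* |
| 3 | 0.88 ± 0.02 | *<0.0001* | 0.86 ± 0.02 | *<0.0001* |
| 4 | 0.95 ± 0.01 | *<0.0001* | 0.94 ± 0.02 | *<0.0001* |
| 5 (anchor) | 1.00 | | 1.00 | |
| 6 | 1.08 ± 0.02 | *<0.0001* | 1.06 ± 0.02 | *<0.0001* |
| 7 | 1.20 ± 0.02 | *<0.0001* | 1.19 ± 0.02 | *<0.0001* |
| 8 | 1.34 ± 0.03 | *<0.0001* | 1.36 ± 0.04 | *<0.0001* |
| 9 | 1.33 ± 0.04 | *<0.0001* | 1.37 ± 0.05 | *<0.0001* |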

Notes: The coefficient for age interval 5 is the anchored coefficient and is defined as 1.0. Italicized P-values denote coefficients statistically distinguishable from 1.0 (with significance level 0.05).

Figure 1.

Variation of age at diagnosis coefficients, anchored at index 5, with age at diagnosis index (i), in white men (A) and white women (B).

Notes: Error bars indicate 95% confidence intervals. Open circles indicate coefficients significantly different from 1.0; closed circles indicate coefficients not significantly different from 1.0; “x” indicates anchor point.

Table 4 presents the estimates of the time period effect coefficients $v_j^*$ and their SEs for white men and women with LC. It also gives z-test P-values comparing each of the four non-anchored coefficients with 1, the value of the anchored coefficient for the 1985–89 time period. Figure 2 shows that these coefficients decrease slightly with time in both white men (A) and women (B). That is, the hazard of death from LC decreased somewhat over 1975–1999. (A more detailed analysis of the improvement in LC survival over time is given below; see Table 6.) This conclusion differs from that of Bassily et al.,23 who reported that LC survival had not improved over three decades. One possible explanation for this discrepancy is the difference in analytical approach. Bassily et al.23 used the univariate Kaplan–Meier approach, which accounts only for time-to-event (survival) data. We used a modified multivariate Cox regression approach that additionally accounts for interrelated APC effects on cancer survival.

Table 4.

Estimated values of the time period coefficients vj*, their standard errors (SE), and P-values characterizing the statistical difference between the estimated coefficient and the anchored coefficient.

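| Time period index | White men: $v_j^*$ ± SE | White men: P-value | White women: $v_j^*$ ± SE | White women: P-value |
|---|---|---|---|---|
| 1 | 1.06 ± 0.02 | *0.0003* | 1.04 ± 0.02 | 0.06 |
| 2 | 1.00 ± 0.01 | 0.62 | 1.02 ± 0.02 | 0.18 |
| 3 (anchor) | 1.00 | | 1.00 | |
| 4 | 0.92 ± 0.01 | *<0.0001* | 0.95 ± 0.01 | *0.0003* |
| 5 | 0.91 ± 0.01 | *<0.0001* | 0.93 ± 0.02 | *<0.0001* |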

Notes: The coefficient for time period index 3 is the anchored coefficient and is defined as 1.0. Italicized P-values denote coefficients statistically distinguishable from 1.0 (with significance level 0.05).

Figure 2.

Variation of time period coefficients, anchored at index 3, with time period index (j), in white men (A) and white women (B).

Notes: Error bars indicate 95% confidence intervals. Open circles indicate coefficients significantly different from 1.0; closed circles indicate coefficients not significantly different from 1.0; “x” indicates anchor point.

Table 6.

Estimated values (in %) of the LC survival function for white men and white women in the 60–64 age at diagnosis group.

| Group | 12 months, 1975–1979 | 12 months, 1995–1999 | 24 months, 1975–1979 | 24 months, 1995–1999 | 36 months, 1975–1979 | 36 months, 1995–1999 |
|---|---|---|---|---|---|---|
| Men | 37.2 ± 0.8 | 42.3 ± 0.8 | 16.2 ± 0.6 | 20.5 ± 0.7 | 11.4 ± 0.5 | 15.1 ± 0.6 |
| Women | 44.6 ± 1.1 | 48.6 ± 1.0 | 22.5 ± 1.0 | 26.4 ± 1.0 | 17.0 ± 0.9 | 20.5 ± 0.9 |

Table 5 presents the estimates of the birth cohort effect coefficients $u_l^*$ and their SEs for white men and women with LC. It also gives z-test P-values comparing each of the twelve non-anchored coefficients with 1, the value of the anchored coefficient for the 1925–29 birth cohort. Figure 3 shows the trends of these coefficients in white men (A) and women (B). For men, three of the twelve coefficients (those for the 1895–99, 1945–49 and 1950–54 birth cohorts) are statistically distinguishable from the coefficient of the anchored 1925–29 cohort. For women, only the coefficient for the 1895–99 birth cohort is statistically distinguishable from the coefficient 1 of the anchored 1925–29 cohort. These data suggest that the influence of birth cohort effects on the hazard of death from LC should not be ignored in LC survival analysis.

Table 5.

Estimated values of the birth cohort coefficients, ul*, their standard errors (SE), and P-values characterizing the statistical difference between the estimated coefficient and the anchored coefficient.

| Birth cohort index | White men: $u_l^*$ ± SE | White men: P-value | White women: $u_l^*$ ± SE | White women: P-value |
|---|---|---|---|---|
| 1 | 1.20 ± 0.08 | *0.006* | 1.25 ± 0.12 | *0.04* |
| 2 | 0.94 ± 0.04 | 0.15 | 0.97 ± 0.06 | 0.56 |
| 3 | 1.04 ± 0.03 | 0.23 | 0.98 ± 0.04 | 0.72 |
| 4 | 0.99 ± 0.02 | 0.24 | 1.00 ± 0.04 | 1.00 |
| 5 | 1.04 ± 0.02 | 0.08 | 1.01 ± 0.03 | 0.73 |
| 6 | 1.01 ± 0.01 | 0.48 | 1.01 ± 0.02 | 0.61 |
| 7 (anchor) | 1.00 | | 1.00 | |
| 8 | 1.02 ± 0.02 | 0.24 | 1.01 ± 0.02 | 0.71 |
| 9 | 1.04 ± 0.02 | 0.12 | 1.00 ± 0.03 | 0.93 |
| 10 | 1.06 ± 0.04 | 0.07 | 1.01 ± 0.04 | 0.76 |
| 11 | 1.13 ± 0.05 | *0.009* | 1.06 ± 0.06 | 0.33 |
| 12 | 1.19 ± 0.07 | *0.01* | 0.99 ± 0.07 | 0.85 |
| 13 | 1.14 ± 0.11 | 0.23 | 0.99 ± 0.16 | 0.95 |

Notes: The coefficient for birth cohort index 7 is the anchored coefficient and is defined as 1.0. Italicized P-values denote coefficients statistically distinguishable from 1.0 (with significance level 0.05).

Figure 3.

Variation of birth cohort coefficients, anchored at index 7, with birth cohort index (l), in white men (A) and white women (B).

Notes: Error bars indicate 95% confidence intervals. Open circles indicate coefficients significantly different from 1.0; closed circles indicate coefficients not significantly different from 1.0; “x” indicates anchor point.

Overall, our analysis suggests that for both white men and women diagnosed with LC during 1975–1999, the hazard of death from LC depends not only on the age at diagnosis ($w_i$) and time period ($v_j$) coefficients but also on the birth cohort ($u_l$) coefficients.

The estimates of the joint (age at diagnosis, time period and birth cohort) effect coefficients, $a_{i,j}^*$, and of the cumulative baseline hazard, $H_0^*(\tau)$, were used to estimate the survival functions $S^*(\tau, w_i, v_j, u_l)$. Within the PH model, these estimates are obtained from the formula:

$$S^{*}(\tau, w_i, v_j, u_l) = e^{-a_{i,j}^{*} H_0^{*}(\tau)}, \qquad i = 1, \dots, 9;\ j = 1, \dots, 5. \tag{28}$$
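A minimal sketch of formula (28) follows. It assumes that $H_0^*(\tau)$ is available as a right-continuous step function on an increasing grid of times, for example a two-column time/cumulative-hazard output of a Cox fit. The baseline values below are hypothetical and are not SEER estimates. The function names are illustrative.

```python
import bisect
import math

def cumhaz_at(tau, times, cumhaz):
    """Step-function lookup: H0*(tau) at the last grid time <= tau, 0 before the first."""
    idx = bisect.bisect_right(times, tau) - 1
    return 0.0 if idx < 0 else cumhaz[idx]

def survival(tau, a_ij, times, cumhaz):
    """Eq. (28): S*(tau) = exp(-a*_{i,j} * H0*(tau))."""
    return math.exp(-a_ij * cumhaz_at(tau, times, cumhaz))

# Hypothetical baseline (months since diagnosis, cumulative hazard)
times = [6, 12, 24, 36]
cumhaz = [0.4, 0.9, 1.6, 1.9]
print(round(survival(12, 1.0, times, cumhaz), 4))  # 0.4066, i.e. exp(-0.9)
print(round(survival(12, 1.2, times, cumhaz), 4))  # 0.3396, i.e. exp(-1.08)
```

A cell with a larger joint coefficient $a_{i,j}^*$ has a uniformly lower survival curve, which is the link between hazard and survival used in the discussion of Table 3.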

As an example, Table 6 presents the estimated probabilities (in %) of 12-, 36- and 60-month LC survival ($\tau = 12$, $\tau = 36$ and $\tau = 60$, respectively) for white men and women diagnosed at ages 60–64. The comparison is between diagnosis in 1975–1979 and in 1995–1999. For men diagnosed with LC at ages 60–64, the 12-month survival probability increased by about 5 percentage points, and the 36- and 60-month survival probabilities increased by about 4 percentage points each. For women diagnosed at ages 60–64, the 12-, 36- and 60-month survival probabilities increased by about 4, 4 and 3.5 percentage points, respectively. Similar improvements in LC survival over 1975–1999 were found for most of the age-at-diagnosis groups considered in white men and women.

Survival function estimates for the observed data, indexed by $(i,j)$, can also be obtained by the Kaplan–Meier method.3 Our calculations showed that the estimates obtained by the proposed approach were close to the Kaplan–Meier estimates (data not shown). However, the univariate Kaplan–Meier approach accounts only for time-to-event (survival) data. Our approach, in contrast, models the survival functions. Within the PH model, it also estimates both the joint and the separate influences of interrelated age at diagnosis, time period and birth cohort effects on the survival and hazard functions.
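For comparison, the standard Kaplan–Meier (product-limit) estimator for a single $(i,j)$ cell can be sketched as follows. It follows the usual convention that patients censored at time $t$ remain in the risk set for deaths at $t$. The function name and the toy data are hypothetical.

```python
from itertools import groupby

def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct death time.

    times  -- follow-up times (e.g. months since diagnosis)
    events -- 1 if death was observed, 0 if the record is censored
    """
    records = sorted(zip(times, events))
    at_risk = len(records)
    surv = 1.0
    curve = []
    for t, group in groupby(records, key=lambda rec: rec[0]):
        group = list(group)
        deaths = sum(event for _, event in group)
        if deaths:
            # patients censored at t are still counted in the risk set at t
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(group)  # everyone with follow-up time t leaves the risk set afterwards
    return curve

print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
# approximately [(1, 0.8), (2, 0.6), (3, 0.3)]
```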

Conclusion

A novel, computationally efficient two-step procedure for estimating APC effects on cancer survival within the PH model was developed. This procedure allows one to estimate joint APC effect coefficients, as well as interrelated age at diagnosis, time period and birth cohort effect coefficients.

A standard software package for Cox PH regression analysis was used to estimate the joint APC effect coefficients. To obtain estimates of the interrelated age at diagnosis, time period and birth cohort effect coefficients, we assumed that neighboring birth cohorts affect the hazard of death from cancer almost equally. This assumption is milder than those used by other authors in APC analyses of cancer incidence rates. Such assumptions include, for example, that cohort effects are absent18 or that trends of cohort effects can be represented by smooth functions.14 Our assumption allows one to solve the identifiability problem in estimating these coefficients. Using an anchoring technique, we developed simple algorithms to obtain estimates of the age at diagnosis, time period and birth cohort effect coefficients. These algorithms were coded in our newly developed MATLAB program, the “apcsur” function.
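An assumption about neighboring birth cohorts can be turned into a constraint in several ways. One common way in APC modelling is to set the log-effects of two adjacent cohorts, $m$ and $m+1$, exactly equal. This sketch shows that generic constraint for illustration only, and it is not necessarily what “apcsur” implements. It assumes $a_{i,j} = w_i v_j u_l$ with $l = j - i + 9$, and it writes $\alpha_i = \ln w_i$, $\beta_j = \ln v_j$ and $\gamma_l = \ln u_l$. Consider three changes made together: add $k(i-5)$ to $\alpha_i$, add $k(l-7)$ to $\gamma_l$, and subtract $k(j-3)$ from $\beta_j$. Together they leave every $\ln a_{i,j}$ and the anchors unchanged. The condition $\gamma_m + k(m-7) = \gamma_{m+1} + k(m-6)$ then fixes $k = \gamma_m - \gamma_{m+1}$. The choice $m = 1$, the simulated values and the function names are hypothetical.

```python
import random

# Illustration only: generic equal-adjacent-cohort constraint on log effects
# alpha_i = ln w_i, beta_j = ln v_j, gamma_l = ln u_l (not the "apcsur" code).
AGE, PERIOD, COHORT = range(1, 10), range(1, 6), range(1, 14)

def tilt_logs(alpha, beta, gamma, k):
    """Move along the unidentifiable linear direction; anchors i=5, j=3, l=7 are unchanged."""
    return ({i: alpha[i] + k * (i - 5) for i in AGE},
            {j: beta[j] - k * (j - 3) for j in PERIOD},
            {l: gamma[l] + k * (l - 7) for l in COHORT})

def equate_neighbours(alpha, beta, gamma, m):
    """Choose k so that cohorts m and m + 1 get equal log effects, then apply it."""
    # gamma[m] + k*(m - 7) == gamma[m + 1] + k*(m - 6)  =>  k = gamma[m] - gamma[m + 1]
    k = gamma[m] - gamma[m + 1]
    return tilt_logs(alpha, beta, gamma, k)

random.seed(2)
alpha = {i: 0.0 if i == 5 else random.gauss(0, 0.2) for i in AGE}
beta = {j: 0.0 if j == 3 else random.gauss(0, 0.05) for j in PERIOD}
gamma = {l: 0.0 if l == 7 else random.gauss(0, 0.1) for l in COHORT}
gamma[2] = gamma[1]  # simulated "truth": cohorts 1 and 2 have equal effects

# A tilted solution reproduces every ln a[i, j] equally well...
alpha_t, beta_t, gamma_t = tilt_logs(alpha, beta, gamma, k=0.08)
# ...but the equality constraint maps it back to the simulated truth.
alpha_r, beta_r, gamma_r = equate_neighbours(alpha_t, beta_t, gamma_t, m=1)
print(max(abs(gamma_r[l] - gamma[l]) for l in COHORT) < 1e-12)  # True
print(max(abs(alpha_r[i] - alpha[i]) for i in AGE) < 1e-12)      # True
```

With real data, neighboring cohorts are only approximately equal, so the estimates obtained this way depend on which adjacent pair is constrained.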

As a proof of concept, the proposed approach was used to analyze SEER data on LC survival for white men and women. The data were observed within five successive 5-year time periods: 1975–1979, 1980–1984, 1985–1989, 1990–1994 and 1995–1999. A graphical approach using log-log plots was applied to evaluate the PH assumption. Estimates of the age at diagnosis, time period and birth cohort effect coefficients were obtained. Analysis of the trends of these estimates suggests that the hazard of death from LC for a given time since cancer diagnosis: (i) decreased between 1975 and 1999; (ii) increases with age at diagnosis; and (iii) depends on birth cohort effects. Our analysis, performed within the PH model, indicates a small but statistically significant improvement in LC survival over 1975–1999. The biological and clinical implications of these results require further analysis, which is beyond the scope of this methodologically oriented work.

Overall, we suggest that the proposed computing procedure could also be used for estimating APC effects in survival analysis of different types of cancer.

Acknowledgments

This work was partially supported by NIH grant R01 CA140940-01 A1 (PI: SS).

Footnotes

Disclosures

This manuscript has been read and approved by all authors. This paper is unique and is not under consideration by any other publication and has not been published elsewhere. The authors and peer reviewers of this paper report no conflicts of interest. The authors confirm that they have permission to reproduce any copyrighted material.

References

  • 1. Kleinbaum DG, Klein M. Survival Analysis: A Self-Learning Text. 2nd ed. Springer Science+Business Media, Inc; 2005. p. 590.
  • 2. Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data. 2nd ed. Springer Science+Business Media, Inc; 2003. p. 536.
  • 3. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Amer Statist Assn. 1958;53:457–81.
  • 4. Brenner H. Long-term survival rates of cancer patients achieved by the end of the 20th century: a period analysis. The Lancet. 2002;360:1131–5. doi: 10.1016/S0140-6736(02)11199-8.
  • 5. Brenner H, Gefeller O, Hakulinen T. A computer program for period analysis of survival. European Journal of Cancer. 2002;38:690–5. doi: 10.1016/s0959-8049(02)00003-5.
  • 6. Brenner H, Arndt V, Gefeller O, Hakulinen T. An alternative approach to age adjustment of cancer survival rates. European Journal of Cancer. 2004;40:2317–22. doi: 10.1016/j.ejca.2004.07.007.
  • 7. Brenner H, Gondos A, Arndt V. Recent major progress in long-term cancer patient survival disclosed by modeled period analysis. Journal of Clinical Oncology. 2007;25:3274–80. doi: 10.1200/JCO.2007.11.3431.
  • 8. Brenner H, Gondos A, Pulte D. Survival expectations of patients diagnosed with Hodgkin’s lymphoma in 2006–2010. The Oncologist. 2009;14:806–13. doi: 10.1634/theoncologist.2008-0285.
  • 9. Fesinmeyer MD, Austin MA, Li CI, De Roos AJ, Bowen DJ. Differences in survival by histologic type of pancreatic cancer. Cancer Epidemiology, Biomarkers and Prevention. 2005;14:1766–73. doi: 10.1158/1055-9965.EPI-05-0120.
  • 10. Mdzinarishvili T, Gleason MX, Sherman S. A novel approach for analysis of the log-linear age-period-cohort model: application to lung cancer incidence. Cancer Inform. 2009;7:271–80. doi: 10.4137/cin.s3572.
  • 11. Mdzinarishvili T, Gleason MX, Sherman S. Estimation of hazard functions in the log-linear age-period-cohort model: application to lung cancer risk associated with geographical area. Cancer Inform. 2010;9:67–78. doi: 10.4137/cin.s4522.
  • 12. Holford TR. Age-period-cohort analysis. In: Armitage P, Colton T, editors. Encyclopedia of Biostatistics. 2nd ed. John Wiley & Sons, Ltd; 2005. pp. 17–35.
  • 13. Registry Groupings for Analyses. Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov), National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch. Available at: http://seer.cancer.gov/registries/terms.html. Accessed December 3, 2010.
  • 14. Fu WJ. A smoothing cohort model in age-period-cohort analysis with applications to homicide arrest rates and lung cancer mortality rates. Sociol Method Res. 2008;36:327–61.
  • 15. Clayton D, Schifflers E. Models for temporal variation in cancer rates. I: age-period and age-cohort models. Statistics in Medicine. 1987;6:449–67. doi: 10.1002/sim.4780060405.
  • 16. Clayton D, Schifflers E. Models for temporal variation in cancer rates. II: age-period-cohort models. Statistics in Medicine. 1987;6:469–81. doi: 10.1002/sim.4780060406.
  • 17. Holford TR. Understanding the effects of age, period, and cohort on incidence and mortality rates. Annu Rev Public Health. 1991;12:425–57. doi: 10.1146/annurev.pu.12.050191.002233.
  • 18. Moolgavkar SH, Lee JAH, Stevens RG. Analysis of vital statistical data. In: Rothman K, Greenland S, editors. Modern Epidemiology. 2nd ed. Lippincott-Raven, PA: 1998. pp. 482–97.
  • 19.Weisstein EW. Error Propagation. MathWorld—A Wolfram Web Resource. [last updated Nov 29 2010]. Available from http://mathworld.wolfram.com/ErrorPropagation.html.
  • 20. Holford TR. Approaches to fitting age-period-cohort models with unequal intervals. Statistics in Medicine. 2006:977–93. doi: 10.1002/sim.2253.
  • 21. Ries LAG, Young JL, Keel GE, et al., editors. SEER Survival Monograph: Cancer Survival Among Adults: US SEER Program, 1988–2001, Patient and Tumor Characteristics. Bethesda, MD: National Cancer Institute; 2007. Available at: http://seer.cancer.gov/publications/survival/seer_survival_mono_highres.pdf. Accessed September 29, 2010.
  • 22. Mdzinarishvili T, Gleason MX, Kinarsky L, Sherman S. A generalized beta model for age distribution of cancers: application to pancreatic and kidney cancer. Cancer Inform. 2009;7:183–97. doi: 10.4137/cin.s3050.
  • 23. Bassily MN, Wilson R, Pompei F, Burmistrov D. Cancer survival as a function of age at diagnosis: a study of the Surveillance, Epidemiology and End Results database. Cancer Epidemiology. 2010;34:667–81. doi: 10.1016/j.canep.2010.04.013.
  • 24. Holford TR. An alternative approach to statistical age-period-cohort analysis. J Chron Dis. 1985;38:831–836. doi: 10.1016/0021-9681(85)90106-7.

Articles from Cancer Informatics are provided here courtesy of SAGE Publications
