International Journal of Preventive Medicine. 2012 Sep;3(9):644–651.

Breast Cancer Survival Analysis: Applying the Generalized Gamma Distribution under Different Conditions of the Proportional Hazards and Accelerated Failure Time Assumptions

Alireza Abadi 1, Farzaneh Amanpour 1, Chris Bajdik 2, Parvin Yavari 3
PMCID: PMC3445281  PMID: 23024854

Abstract

Background:

The goal of this study is to extend the applications of parametric survival models to cases in which the accelerated failure time (AFT) assumption is not satisfied, and to examine parametric and semiparametric models under different proportional hazards (PH) and AFT assumptions.

Methods:

The data for 12,531 women diagnosed with breast cancer in British Columbia, Canada, during 1990–1999 were divided into eight groups according to patients’ ages and stage of disease, and each group was assumed to have different AFT and PH assumptions. For parametric models, we fitted the saturated generalized gamma (GG) distribution, and compared this with the conventional AFT model. Using a likelihood ratio statistic, both models were compared to the simpler forms including the Weibull and lognormal. For semiparametric models, either Cox's PH model or the stratified Cox model was fitted according to the PH assumption, which was tested using Schoenfeld residuals. The GG family was compared to the log-logistic model using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).

Results:

When the PH and AFT assumptions were satisfied, semiparametric and parametric models both provided valid descriptions of breast cancer patient survival. When the PH assumption was not satisfied but the AFT condition held, the parametric models performed better than the stratified Cox model. When neither the PH nor the AFT assumptions were met, the lognormal distribution provided a reasonable fit.

Conclusions:

When both the PH and AFT assumptions are satisfied, the parametric and semiparametric models provide complementary information. When the PH assumption is not satisfied, the parametric models should be considered, whether the AFT assumption is met or not.

Keywords: Breast cancer, generalized gamma distribution, parametric regression, stratified Cox model, survival analysis

INTRODUCTION

The Cox proportional hazards (PH) model is popular for analyzing survival data. The utility of this model stems from the fact that few assumptions are needed to determine hazard ratios based on the coefficients. The coefficients are easily interpreted and clinically meaningful.[1] The “stratified Cox (SC) model” is a modification of the Cox PH model, which allows for control by stratification of a predictor that does not satisfy the PH assumption.[2]

Parametric models are used only occasionally in analyzing clinical studies of survival despite offering some advantages over semiparametric models. Parametric regression analysis is an attractive alternative to the widely used Cox model when hazard functions themselves are of primary interest, or when relative survival times are the primary measure of association.[3] When empirical information is available, parametric models can provide insight into the shape of the baseline hazard and baseline survival. However, fully parametric models involve stronger assumptions than semiparametric options. Furthermore, difficulty choosing the appropriate family of distributions leads many researchers to prefer the Cox model.[4] Some parametric models are accelerated failure time (AFT) models which assume that the relationship between the logarithm of survival time and covariates is linear.[5] Violation of the AFT assumption makes the parametric models more complicated.

One approach to addressing these difficulties is to fit the generalized gamma (GG) distribution. This extensive family contains nearly all commonly used distributions, including the exponential, Weibull, and lognormal, making it particularly useful for estimating individual hazard functions as well as both relative hazards and relative survival times.[3]

In this analysis, we applied both semiparametric and parametric models under different conditions of the PH and AFT assumptions and compared their results.

MATERIALS AND METHODS

Study design

The data in this study describe 12,531 women diagnosed with breast cancer in British Columbia during 1990–1999 and followed until 2010. All women were identified from the population-based BC Cancer Registry.

Patients’ treatments included hormone therapy, chemotherapy, surgery, and radiotherapy. These were coded using binary variables equal to one if the subject received the treatment and zero otherwise.

We defined survival time as the period between the diagnosis of disease and death or the end of the patient's follow-up.

A binary censoring variable was used to indicate whether a patient died of breast cancer.

Statistical analysis

Parametric model

To choose the appropriate parametric model, we started by fitting the saturated GG distribution. This distribution is a three-parameter family with location (β), scale (σ > 0), and shape (λ) parameters, in which all three parameters depend on covariates. It should be noted that the conventional AFT model holds only when covariate effects are modeled through the β parameter. If we extend the analysis to covariates having effects through the σ and/or λ parameters, these are no longer conventional AFT models.[3]
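To make the three parameters concrete, the following is a minimal sketch of the GG density, assuming the (β, σ, λ) parameterization commonly attributed to reference [3]; the function names are hypothetical and the code is illustrative, not the software used in the study. Under this assumed form, setting λ = 1 gives a Weibull density with shape 1/σ and scale exp(β), which the sketch checks numerically.

```python
import math


def gg_pdf(t, beta, sigma, lam):
    """Generalized gamma density at t > 0 (assumed beta/sigma/lambda parameterization)."""
    if lam == 0:
        # Limiting case: lognormal with log-mean beta and log-sd sigma.
        z = (math.log(t) - beta) / sigma
        return math.exp(-0.5 * z * z) / (sigma * t * math.sqrt(2.0 * math.pi))
    k = lam ** -2
    u = k * (math.exp(-beta) * t) ** (lam / sigma)
    # Work on the log scale so that small |lam| (large k) does not overflow.
    log_pdf = (math.log(abs(lam)) - math.log(sigma * t)
               - math.lgamma(k) + k * math.log(u) - u)
    return math.exp(log_pdf)


def weibull_pdf(t, shape, scale):
    """Standard Weibull density, used only to check the lam = 1 special case."""
    return (shape / scale) * (t / scale) ** (shape - 1) * math.exp(-((t / scale) ** shape))


# Illustrative parameter values only (not estimates from the study).
beta, sigma, t = 0.5, 0.8, 2.0
gap = abs(gg_pdf(t, beta, sigma, 1.0) - weibull_pdf(t, 1.0 / sigma, math.exp(beta)))
print(gap < 1e-12)
```

In a saturated model, β, σ, and λ would each be replaced by a function of the covariates; in a conventional AFT model only β would vary with covariates.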

The GG family contains nearly all of the most commonly used distributions in survival analysis, including the exponential (λ = σ = 1), Weibull (λ = 1), and log normal (λ = 0). If a general parametric distribution includes other distributions as special cases, the general distribution is called a nesting (larger) family of the specific distributions.[5] The GG distribution includes three specific distributions, and thus represents a nesting family of them, allowing us to evaluate the appropriateness of the specific distributions relative to each other. Testing the appropriateness of a family of distributions is equivalent to testing whether a subset of parameters in its nesting distribution are equal to specific values, and can be performed using a likelihood ratio test.[5]
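The nested comparison described above can be sketched as a likelihood ratio test. This is a minimal illustration under stated assumptions: the log-likelihood values are made up for the example, the helper names are hypothetical, and the chi-square tail is coded only for 1 or 2 degrees of freedom (the closed forms needed when one or two parameters are fixed).

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if x < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch handles df = 1 or 2 only")


def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Return (statistic, p-value) for a reduced model nested in a full model."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods: GG (full) versus Weibull (lambda fixed at 1, so df = 1).
stat, p_value = likelihood_ratio_test(-1250.3, -1253.9, df=1)
print(round(stat, 3), p_value < 0.05)
```

Fixing λ = 1 (Weibull) or λ = 0 (lognormal) constrains one parameter, whereas the exponential fixes two (λ = σ = 1), so it would use df = 2 in this sketch. In a saturated model where several parameters depend on covariates, the degrees of freedom would be larger.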

The log-logistic distribution is commonly used in survival analysis but is not nested in the GG family. To compare the selected parametric distribution with the log-logistic distribution, we used simple procedures based on the Bayesian information criterion (BIC, r = [formula not recoverable]; Schwarz, 1978) and the Akaike information criterion (AIC; Akaike, 1969), in which r is defined as [formula not recoverable]. In our comparisons, the candidate distribution with the largest r value was considered the best fit.[5]
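For non-nested candidates, an information-criterion comparison can be sketched as follows. The r statistic used in the paper is not reproduced here; this sketch instead assumes the widely used textbook forms AIC = 2k − 2 log L and BIC = k log(n) − 2 log L, for which smaller values indicate a better fit (the opposite direction to the "largest r" rule in the text). All numbers are hypothetical.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion (textbook form; smaller is better)."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (textbook form; smaller is better)."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# Hypothetical fits: (maximized log-likelihood, number of estimated parameters).
candidates = {"lognormal": (-412.6, 5), "log-logistic": (-413.1, 5)}
n_obs = 300  # hypothetical sample size

for name, (loglik, k) in candidates.items():
    print(name, round(aic(loglik, k), 2), round(bic(loglik, k, n_obs), 2))

best = min(candidates, key=lambda name: aic(*candidates[name]))
print(best)
```

When two candidates have the same number of parameters, as in this toy case, both criteria reduce to ranking by log-likelihood, which may be why a residual plot can be useful as a tie-breaker.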

For AFT models, we can estimate relative survival time by exponentiating the coefficient of a variable. For other models, we can calculate the relative time, RT(p), using appropriate formulae. The relative times are defined for 0 < p < 1 as the ratio of the corresponding quantile functions, RT(p) = t1(p)/t0(p). The interpretation of RT(p) is that the time required for proportion p of individuals in the exposed or treated population to experience the event of interest is RT(p)-fold the time for the same proportion of events to occur in the reference population. Links between quantiles of the gamma and GG distributions facilitate the use of software to obtain the percentiles of the GG.[3]
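The relative time RT(p) = t1(p)/t0(p) can be illustrated with the lognormal member of the family, whose quantile function is t(p) = exp(β + σ z_p), where z_p is the standard normal quantile. This is a minimal sketch with hypothetical parameter values; it shows that RT(p) is constant in p when only β differs between groups (a conventional AFT model) and varies with p when σ also differs.

```python
import math
from statistics import NormalDist


def lognormal_quantile(p, beta, sigma):
    """Time by which proportion p has experienced the event under a lognormal model."""
    return math.exp(beta + sigma * NormalDist().inv_cdf(p))


def relative_time(p, beta1, sigma1, beta0, sigma0):
    """RT(p): ratio of the exposed-group quantile to the reference-group quantile."""
    return lognormal_quantile(p, beta1, sigma1) / lognormal_quantile(p, beta0, sigma0)


# Hypothetical values: the exposed group has beta = 2.3, the reference group beta = 2.0.
for p in (0.25, 0.50, 0.75):
    conventional = relative_time(p, 2.3, 0.9, 2.0, 0.9)  # same sigma in both groups
    saturated = relative_time(p, 2.3, 1.2, 2.0, 0.9)     # sigma differs by group
    print(p, round(conventional, 3), round(saturated, 3))
```

In the conventional case the ratio equals exp(2.3 − 2.0) at every p, matching the statement that the relative time is obtained by exponentiating the coefficient. In the saturated case a single number no longer summarizes the effect, so RT(p) is reported at selected quantiles.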

Semiparametric model

We can also describe the distribution of survival time by specifying the hazard function. The advantage of this approach is that we directly address the aging process. Cox was the first to propose the model, specifying the hazard function as a function of time and the covariates:[1]

h(t, x, b) = h0(t) exp(xb).

With this parameterization, the hazard ratio is:

HR (t, x1, x0) = exp(b(x1 - x0)).
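The cancellation of the baseline hazard in the ratio can be checked numerically. The following is a minimal sketch for a single covariate; the baseline hazard function and the coefficient value are arbitrary assumptions chosen for illustration.

```python
import math


def baseline_hazard(t):
    """Arbitrary illustrative baseline hazard h0(t); any positive function would do."""
    return 0.02 * math.sqrt(t)


def cox_hazard(t, x, b):
    """Hazard under the Cox model: h0(t) * exp(x * b)."""
    return baseline_hazard(t) * math.exp(x * b)


def hazard_ratio(x1, x0, b):
    """Hazard ratio exp(b * (x1 - x0)); it does not depend on t."""
    return math.exp(b * (x1 - x0))


b = -0.4  # hypothetical coefficient for a binary treatment indicator
for t in (1.0, 5.0, 10.0):
    ratio = cox_hazard(t, 1, b) / cox_hazard(t, 0, b)
    print(t, round(ratio, 4), round(hazard_ratio(1, 0, b), 4))
```

The ratio printed at each time is the same, which is the PH assumption stated in computational form. With several covariates, x and b become vectors and xb their inner product.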

The Cox PH model assumes that the hazard ratio for any two specifications of predictors is constant over time, and Schoenfeld residuals can be used to assess the PH assumption.[2] The “SC model” is a modification of the Cox PH model, which allows for control by “stratification” of a predictor that does not satisfy the PH assumption. The predictor that does not satisfy the PH assumption is adjusted for by stratification, whereas a predictor that satisfies the PH assumption is adjusted for by its inclusion in the model. The hazard ratio for the effect of the variables included in each stratum can be estimated, but the hazard ratio for the effect of the stratified variable cannot. Furthermore, we applied the likelihood ratio (LR) test to check for interaction between the stratified variable and the variables in each stratum.[2]
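The way stratification works can be sketched through the partial log-likelihood: each stratum keeps its own risk sets (and hence its own baseline hazard), while the coefficient b is shared. This is a minimal single-covariate illustration that assumes no tied event times; the data are made up and the function names are hypothetical.

```python
import math


def partial_loglik(times, events, x, b):
    """Cox partial log-likelihood for one covariate, assuming no tied event times."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        risk = sum(math.exp(b * x[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += b * x[i] - math.log(risk)
    return total


def stratified_partial_loglik(times, events, x, strata, b):
    """Sum of within-stratum partial log-likelihoods with a common coefficient b."""
    total = 0.0
    for s in sorted(set(strata)):
        idx = [i for i, g in enumerate(strata) if g == s]
        total += partial_loglik([times[i] for i in idx],
                                [events[i] for i in idx],
                                [x[i] for i in idx], b)
    return total


# Toy data: six subjects in two strata (e.g. a binary variable violating PH).
times = [5, 8, 12, 3, 9, 14]
events = [1, 0, 1, 1, 1, 0]
x = [1, 0, 0, 1, 1, 0]
strata = [0, 0, 0, 1, 1, 1]
print(stratified_partial_loglik(times, events, x, strata, b=0.0))
print(partial_loglik(times, events, x, b=0.0))
```

Because the stratifying variable defines the risk sets rather than entering the linear predictor, no coefficient is attached to it, which is why its hazard ratio cannot be estimated from the stratified model.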

A standard treatment protocol for breast cancer is determined mainly by the patient's age, stage of cancer, and tumor sensitivity to certain hormones. Breast cancer stage has an important role in choosing the treatment, and a patient's response to treatment depends on her age, so we divided the dataset according to the age of patient at diagnosis of disease (age < 50, age ≥ 50) and the stage of cancer (I, II, III, IV). This produced eight combinations of age and stage with different conditions of PH and AFT assumptions, and we could compare the treatment effect on patient survival with parametric and semi-parametric models.

The presence of hormone receptors has been shown to affect patient survival time, and was included in all models.

For each combination of age and stage, we only included variables for which more than 10 patients received and did not receive the treatment.
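The grouping and the inclusion rule above can be sketched as follows. This is an illustration under assumptions: the record layout, field names, and data are hypothetical, and the threshold is read as "more than 10 treated and more than 10 untreated patients" within a group.

```python
from collections import Counter


def age_stage_group(age, stage):
    """Label one of the eight age-by-stage groups (age < 50 or >= 50; stage I to IV)."""
    age_band = "<50" if age < 50 else ">=50"
    return (age_band, stage)


def eligible_treatments(records, treatments, min_count=10):
    """Keep treatments with more than min_count patients both treated (1) and untreated (0)."""
    keep = []
    for name in treatments:
        counts = Counter(r[name] for r in records)
        if counts[1] > min_count and counts[0] > min_count:
            keep.append(name)
    return keep


# Hypothetical group of 25 patients: nearly all had surgery, about half had chemotherapy.
records = [{"surgery": 1 if i < 24 else 0, "chemotherapy": 1 if i < 12 else 0}
           for i in range(25)]
print(age_stage_group(49, "IV"), age_stage_group(50, "I"))
print(eligible_treatments(records, ["surgery", "chemotherapy"]))
```

In this toy group, surgery would be dropped because only one patient did not receive it, so its effect could not be estimated reliably within the group.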

RESULTS

The frequency of patients receiving treatment by stage at diagnosis is shown in Table 1. The comparison between parametric regression models is summarized in Table 2. The results of fitting the parametric models and the Cox model are shown in Tables 3 and 4, respectively.

Table 1. Number of patients receiving treatment by stage at diagnosis (table available only as an image: IJPVM-3-644-g003.jpg)

Table 2. Comparison between parametric regression models in each category of age and stage (table available only as an image: IJPVM-3-644-g004.jpg)

Table 3. Relative time estimated in parametric models (table available only as an image: IJPVM-3-644-g005.jpg)

Table 4. Relative hazard estimated in Cox model (table available only as an image: IJPVM-3-644-g006.jpg)

Situations when both PH and AFT assumptions are satisfied

For patients under age 50 years in disease stages I and IV, the PH and AFT assumptions were satisfied.

In patients under age 50 years with stage I cancer, the best-fitting parametric model used a conventional GG distribution in which chemotherapy (P < 0.001) and erposneg, the hormone receptor status variable (P = 0.002), were significant. For the Cox model, radiotherapy (P = 0.001), chemotherapy (P < 0.001), and hormone therapy (P = 0.01) were significant. In both models, no interactions were significant.

In patients under age 50 years with stage IV cancer, a conventional lognormal model was the best candidate in the GG family. The AIC and BIC criteria were the same for the conventional lognormal and conventional log-logistic models, but the Cox-Snell residual plot indicated a better fit for the lognormal. Radiotherapy (P = 0.01), hormone therapy (P = 0.01), and erposneg (P = 0.01) were significant in the lognormal model. For the Cox model, radiotherapy (P = 0.02) and erposneg (P = 0.001) were significant, and no interactions were significant in any of the models. Cox-Snell residual plots for the lognormal, log-logistic, and Cox PH models are given in Figures 1–3.

Figure 1. Cox-Snell residual plot for fitted conventional lognormal model for patients under age 50 years with stage IV cancer

Figure 2. Cox-Snell residual plot for fitted conventional log-logistic model for patients under age 50 years with stage IV cancer

Figure 3. Cox-Snell residual plot for proportional Cox model for patients under age 50 years with stage IV cancer
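A Cox-Snell residual is the fitted cumulative hazard at the observed time, r = −log S(t). If the model fits, the residuals should behave like a (possibly censored) sample from a unit exponential distribution, so their estimated cumulative hazard plotted against the residuals should lie near a 45-degree line. The sketch below shows the calculation for a lognormal model; the parameter values and data are hypothetical, ties are assumed absent, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def cox_snell_lognormal(t, mu, sigma):
    """Cox-Snell residual -log S(t) under a lognormal model with log-mean mu, log-sd sigma."""
    surv = 1.0 - NormalDist().cdf((math.log(t) - mu) / sigma)
    return -math.log(surv)


def nelson_aalen(residuals, events):
    """Cumulative hazard of the residuals at each event, assuming no ties."""
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])
    n = len(residuals)
    cum, points = 0.0, []
    for rank, i in enumerate(order):
        if events[i]:
            cum += 1.0 / (n - rank)  # n - rank subjects are still at risk
            points.append((residuals[i], cum))
    return points


# Hypothetical data: observed times, event indicators, and a fitted mu per patient.
times = [4.0, 9.5, 15.0, 22.0]
events = [1, 1, 0, 1]
mus = [2.0, 2.3, 2.3, 2.6]
residuals = [cox_snell_lognormal(t, m, 0.7) for t, m in zip(times, mus)]
for r, h in nelson_aalen(residuals, events):
    print(round(r, 3), round(h, 3))  # points to plot against the line y = x
```

Here mu stands for the patient's linear predictor from the fitted model; the same construction applies to other distributions by swapping in the corresponding survival function.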

Situations when only AFT assumptions hold

For patients aged 50 years or more with stage I cancer, the PH assumption was not satisfied, but the AFT assumptions held. Among the parametric models, the conventional GG had the best fit, and radiotherapy (P = 0.014), chemotherapy (P < 0.001), hormone therapy (P < 0.001), and erposneg (P < 0.001) were significant. In the semiparametric model, the covariates radiotherapy (P = 0.003), chemotherapy (P < 0.001), hormone therapy (P < 0.001), and erposneg (P < 0.001) were significant. Because erposneg did not satisfy the PH assumption, the model was stratified by this variable: its own effect was lost to stratification, and the effects of the other covariates were estimated from the stratified model. No interactions were significant in either model.

Situations when neither the PH nor the AFT assumptions are satisfied

For patients aged 50 years or more with stage IV cancer, neither the PH nor the AFT assumptions held. Among the parametric models, the GG family fitted better than the log-logistic according to the AIC and BIC criteria. Within the GG family, the saturated lognormal distribution was the best model. In this model, surgery (P = 0.001), radiotherapy (P = 0.004), hormone therapy (P < 0.001), and erposneg (P < 0.001) were significant, and the interaction of hormone therapy and erposneg (P = 0.021) was significant. Since the saturated lognormal is not an AFT model, the relative survival time was calculated for the 25th, 50th, and 75th quantiles. The results of fitting are shown in Table 5. In the Cox model, surgery (P = 0.001), radiotherapy (P = 0.03), hormone therapy (P < 0.001), and erposneg (P = 0.001) were significant. Since the PH assumption held only for surgery, the effect of this variable was estimated by stratifying on erposneg. No interaction was significant in either model.

Table 5. Relative times for stage IV, age ≥ 50 (table available only as an image: IJPVM-3-644-g010.jpg)

For all patients with stage II cancer, and patients aged 50 and older with stage III cancer, neither the PH nor the AFT assumptions were satisfied. In semiparametric models, most variables did not meet the PH assumption and their effect could not be estimated; in parametric models, the saturated GG fitted better than other distributions.

DISCUSSION

When the PH and AFT assumptions were satisfied, both the parametric and semiparametric models were appropriate. The models indicate different significant variables, but parametric models have some advantages over semiparametric models in general. With small sample sizes, relative efficiencies may further change in favor of parametric models.[4] When the PH assumption is satisfied, some studies indicate that parametric PH models are a better approach than the Cox model.[6,7] Further, some studies have shown the robustness of parametric AFT models to misspecification because of their log-linear form.[8] Finally, one advantage of a parametric model compared to a Cox model is that the parametric likelihood accommodates right-, left-, or interval-censored data. The Cox likelihood, by contrast, handles right-censored data but does not directly accommodate left- or interval-censored data.[2]
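The point about censoring can be made concrete: in a parametric likelihood, each observation contributes a term built from the density or the survival function, depending on how it was observed. The following is a minimal sketch using the Weibull distribution as an example member of the GG family; the function names and parameter values are hypothetical.

```python
import math


def weibull_surv(t, shape, scale):
    """Weibull survival function S(t)."""
    return math.exp(-((t / scale) ** shape))


def weibull_logpdf(t, shape, scale):
    """Log of the Weibull density at t > 0."""
    return (math.log(shape / scale) + (shape - 1) * math.log(t / scale)
            - (t / scale) ** shape)


def log_contribution(kind, shape, scale, t=None, lower=None, upper=None):
    """Log-likelihood contribution of one observation under each censoring type."""
    if kind == "exact":      # event observed at t: log f(t)
        return weibull_logpdf(t, shape, scale)
    if kind == "right":      # event after t: log S(t)
        return math.log(weibull_surv(t, shape, scale))
    if kind == "left":       # event before t: log (1 - S(t))
        return math.log(1.0 - weibull_surv(t, shape, scale))
    if kind == "interval":   # event between lower and upper: log (S(lower) - S(upper))
        return math.log(weibull_surv(lower, shape, scale) - weibull_surv(upper, shape, scale))
    raise ValueError("unknown censoring type: " + kind)


# Hypothetical observations under shape = 1.5 and scale = 10.
loglik = (log_contribution("exact", 1.5, 10.0, t=4.0)
          + log_contribution("right", 1.5, 10.0, t=12.0)
          + log_contribution("left", 1.5, 10.0, t=3.0)
          + log_contribution("interval", 1.5, 10.0, lower=5.0, upper=8.0))
print(round(loglik, 4))
```

The Cox partial likelihood has no analogous terms for left- or interval-censored observations, because it is built only from the ordering of observed event times within risk sets.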

When the PH assumption is not satisfied but the AFT assumptions hold, the parametric model can be used as a substitute for the Cox model. Other studies have reached the same conclusion.[9,10]

When neither the PH nor the AFT assumptions hold, a member of the GG family, the saturated lognormal, can be used to calculate relative survival times at different quantiles. The lognormal distribution has a long history in cancer survival analysis.[10,11] In many settings where the proportionality assumption does not hold, including breast cancer analysis, the lognormal model has been shown to be appropriate.[12–14] A meta-analysis has shown that saturated parametric models provide better results than conventional models for comparing treatments.[15]

When the PH and AFT assumptions do not hold and the saturated GG distribution fits better than other distributions, further analyses may need to be considered. Through its three parameters, the GG family contains many different distributions such as the inverse Weibull and inverse lognormal.[3] Accordingly, the best fit can be found by trying different parameters. Our analysis applied the most commonly used parametric distributions in survival analysis, but we could not determine the best fit.

For one category of age and stage in our study, according to AIC and BIC criteria, the log-logistic distribution gave the same fit as the lognormal model within the GG family. The log-logistic distribution belongs to the generalized F distribution, which includes the GG distribution. When a member of the GG family does not fit satisfactorily, the best distribution can be found through the generalized F distribution family.[16,17]

If the scales of the parameters in Cox's model and the parametric models differ, neither parameter estimates nor their estimated variances are suitable for comparisons. If the Cox model is compared to parametric PH models, the efficiency of parameter estimates can be compared by Wald-type statistics.[4]

For patients with stage IV cancer, semiparametric and parametric models showed lower hazards and longer survival times for patients receiving treatments than for those not receiving treatments. For patients with stage I cancer in both the under 50-year and over 50-year age groups, semiparametric and parametric models showed higher hazards and shorter survival times for patients receiving treatments, which might reflect other covariates that caused the patients to receive a certain treatment. For example, studies have shown that the ethnicity of patients affects their use of treatments that are more common in lower stages of cancer.[18]

In both parametric and semiparametric models applied to different combinations of age and cancer stage, the expression of hormone receptors was associated with a longer survival time and lower hazard, which has been confirmed in many other studies.[19]

A major strength of this study is that we fitted models and performed comparisons using a large set of real-life data from a population-based cancer registry. The major limitation is that our findings describe associations with survival, and not causes of survival. In particular, breast cancer patients often receive treatments because of disease characteristics. A patient's survival does not necessarily depend on the treatment they receive; rather, the treatment that a patient receives often depends on disease characteristics that predict survival.

Subsequent research should examine whether our findings hold for other diseases and other populations. In particular, it would be of interest to determine whether the findings are sustained in more-recent patient cohorts – where newer treatments have been used. However, there are some recent studies that describe the factors associated with survival time of patients in developing countries.[20,21]

CONCLUSION

When the PH and AFT assumptions were satisfied, semiparametric and parametric models provided two different valid approaches for exploring breast cancer patients’ survival, and the models can be seen as complementary. When the PH assumption was not satisfied but the AFT conditions held, the parametric model should be used instead of the Cox model. When neither the PH nor the AFT assumptions were met, the lognormal distribution, a member of the GG family, provided an alternative approach to the semiparametric model. More generally, when PH assumptions are not satisfied, parametric models should be considered, whether or not AFT assumptions are met.

ACKNOWLEDGMENTS

We thank the BC Cancer Registry for providing data for our study, and the Breast Cancer Outcomes Unit (BCOU) at the BC Cancer Agency for informing our interpretations of cancer and treatment. CB is a Senior Scholar with the Michael Smith Foundation for Health Research (MSFHR).

Footnotes

Source of Support: Nil

Conflict of Interest: None declared

REFERENCES

1. Hosmer D, Lemeshow S. Applied Survival Analysis. New York: Wiley; 1989.
2. Kleinbaum D, Klein M. Survival Analysis. New York: Springer; 2005.
3. Cox C, Chu H, Schneider MF, Muñoz A. Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Stat Med. 2007;26:4352–74. doi: 10.1002/sim.2836.
4. Nardi A, Schemper M. Comparing Cox and parametric models in clinical studies. Stat Med. 2003;22:3597–610. doi: 10.1002/sim.1592.
5. Lee E, Wang J. Statistical Methods for Survival Data Analysis. New Jersey: Wiley; 2003.
6. Royston P, Parmar M. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modeling and estimation of treatment effects. Stat Med. 2002;21:2175–97. doi: 10.1002/sim.1203.
7. Nelson CP, Lambert PC, Squire IB, Jones DR. Flexible parametric models for relative survival, with application in coronary heart disease. Stat Med. 2007;26:5486–98. doi: 10.1002/sim.3064.
8. Hutton JL, Monaghan PF. Choice of parametric accelerated life and proportional hazards models for survival data: Asymptotic results. Lifetime Data Anal. 2002;8:375–39. doi: 10.1023/a:1020570922072.
9. Frankel P, Longmate J. Parametric models for accelerated and long-term survival: A comment on proportional hazards. Stat Med. 2002;21:3279–89. doi: 10.1002/sim.1273.
10. Wang SJ, Kalpathy-Cramer J, Kim JS, Fuller D, Thomas CR. Parametric survival models for predicting the benefit of adjuvant chemoradiotherapy in gallbladder cancer. AMIA Annu Symp Proc. 2010:847–51.
11. Boag JW. Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J R Stat Soc B. 1949;11:15–44.
12. Royston P. The lognormal distribution as a model for survival time in cancer, with an emphasis on prognostic factors. Statistica Neerlandica. 2001;55:89–104.
13. Gamel JW, Vogel RL, Valagussa P, Bonadonna G. Parametric survival analysis of adjuvant therapy for stage II breast cancer. Cancer. 1994;74:2483–90. doi: 10.1002/1097-0142(19941101)74:9<2483::aid-cncr2820740915>3.0.co;2-3.
14. Chapman JA, Lickley HL, Trudeau ME, Hanna WM, Kahn HJ, Murray D, et al. Ascertaining prognosis for breast cancer in node-negative patients with innovative survival analysis. Breast J. 2006;12:37–47. doi: 10.1111/j.1075-122X.2006.00183.x.
15. Ouwens M, Philips Z, Jansen J. Network meta-analysis of parametric survival curves. Wiley online library. 2011. doi: 10.1002/jrsm.25.
16. Kalbfleisch J, Prentice R. The Statistical Analysis of Failure Time Data. New York: Wiley; 1980.
17. Cox C. The generalized F distribution: An umbrella for parametric survival analysis. Stat Med. 2008;27:4301–12. doi: 10.1002/sim.3292.
18. Yavari P, Barroetavena MC, Hislop TG, Bajdik CD. Breast cancer treatment and ethnicity in British Columbia, Canada. BMC Cancer. 2010;10:154. doi: 10.1186/1471-2407-10-154.
19. Pam S. Hormone Receptor Status and Diagnosis - Estrogen and Progesterone. 2011. [Last accessed on 2012 Aug 28]. Available from: http://www.breastcancer.about.com/od/diagnosis/p/hormone_status.htm
20. Seedhom AE, Kamal NN. Factors affecting survival of women diagnosed with breast cancer in El-Minia Governorate, Egypt. Int J Prev Med. 2011;2:131–8.
21. Rasaf MR, Ramezani R, Mehrazma M, Rasaf MRR, Asadi-Lari M. Inequalities in cancer distribution in Tehran; a disaggregated estimation of 2007 incidence by 22 districts. Int J Prev Med. 2012;3:483–92.

Articles from International Journal of Preventive Medicine are provided here courtesy of Wolters Kluwer -- Medknow Publications
