Abstract
Background
Cox regression is the most widely used survival model in oncology. Parametric survival models are an alternative to the Cox regression model. In this study, we illustrate the application of the semiparametric Cox model and various parametric (Weibull, exponential, log-normal, and log-logistic) models to lung cancer data using R software.
Aims
The aim of this study is to identify factors associated with survival in lung cancer and to compare the Cox regression model with parametric models.
Methods
Data from 66 African American (AA) lung cancer patients (available online at http://clincancerres.aacrjournals.org) were used. Stage, sex, age, smoking, and tumor grade were considered as potential predictors of overall survival. Both semiparametric and parametric models were fitted, and the performance of the parametric models was compared using the Akaike information criterion (AIC). The “survival” package in R was used for the analysis. Posterior densities of the model parameters were obtained with a Bayesian approach using WinBUGS.
Results
Model-fitting issues encountered in the analysis are documented. Parametric models were fitted with stage as the predictor, adjusted for age. Among the parametric models, the log-logistic model had the lowest AIC (462.4087) and was the best fit for the AA lung cancer data under study.
Conclusion
Parametric survival models are challenging to explore in the daily practice of cancer research, possibly for many reasons, including the popularity of Cox regression and a lack of knowledge about how to fit parametric models. This paper illustrates the application of parametric survival models using the freely available R software. We expect that this work will help researchers apply parametric survival models.
Keywords: Cox proportional hazard model, log‐logistic model, posterior density, R software
1. INTRODUCTION
Survival analysis is a collection of statistical methods for analyzing time-to-event data, where the event may be death, recurrence, or any other outcome of interest. The Cox proportional hazards (CPH) model is widely used for survival data because of its simplicity: it makes no assumption about the distribution of survival times. The CPH model yields hazard ratios from its regression coefficients, which are easy to interpret and clinically meaningful.1 Parametric survival models, in contrast, assume that survival time follows a known distribution such as the Weibull, exponential, log-normal, or log-logistic distribution. Parametric models can be formulated as accelerated failure time (AFT) models or as proportional hazards models; AFT models are useful for comparing survival times, whereas proportional hazards models compare hazards.2 Parametric models can also have higher relative efficiency than the CPH model, particularly with small samples.3 Another approach to computation and modeling is Bayesian survival analysis (BSA), which incorporates prior information; in our setting, this is the previous history of patients.4, 5 A Weibull parametric model has been used to compare the survival times of patients treated with two different methods,6 and the performance of CPH and BSA has been compared under different sample sizes using Markov chain Monte Carlo (MCMC) simulation for cancer patients.7 Many challenges arise in the analysis of such data. Broadly, these are (1) how to choose a statistical model for estimating the parameters and (2) how to relate biological information to the statistical model. Addressing these challenges requires efficient methods and computer software.
In the freely available R software, the “survival” package implements these methods, providing functions for the CPH model, the Kaplan-Meier (KM) method, and parametric survival models. The survival function, that is, the probability that a patient survives beyond time point t, is defined as
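$$S(t) = P(T > t) = 1 - F(t) \tag{1}$$
where T denotes the survival time and F(t) = P(T ≤ t) is its cumulative distribution function.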
Clinical data on AA lung cancer patients collected from 1998 to 2014, covering both the mRNA and miRNA cohorts and available online (http://clincancerres.aacrjournals.org), were used for illustration.8 We evaluated the roles of treatment, stage, sex, age, smoking, and tumor grade in the survival of lung cancer patients.
2. MODELS AND ALGORITHMS
Survival analysis includes nonparametric (KM method), semiparametric (CPH model), and parametric methods, all of which are implemented in the freely available R package survival. The survival function is defined in Equation (1).
The CPH model, the KM method, and parametric models (Weibull, exponential, log-normal, and log-logistic) were used to analyze survival. The Weibull distribution was introduced by Waloddi Weibull in 1951; it is widely used because its hazard rate can be decreasing, increasing, or constant over time.9 The exponential distribution is one of the most important lifetime distributions and has a constant hazard (failure) rate.10 A positive random variable follows a log-normal distribution if its logarithm is normally distributed. Skewed quantities such as the lengths of latent periods of infectious diseases and the distribution of mineral resources in the Earth's crust often closely fit the log-normal distribution.11, 12, 13, 14 The log-logistic distribution is another parametric model for positive survival times; it is used when the hazard rate increases initially and decreases thereafter, and it has been used to describe mortality in cancer follow-up studies.15
The Akaike information criterion (AIC), developed by Hirotugu Akaike in 1974, was used to identify the best-fitting model for the clinical data; it measures the goodness of fit of an estimated statistical model while penalizing the number of parameters.16 The AIC is calculated as
$$\mathrm{AIC} = 2k - 2\ln(L) \tag{2}$$
where k is the number of estimated parameters and L is the maximum value of the likelihood function for the model.
The model with the minimum AIC value is considered the best fit.
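As a check on Equation (2), the AIC values in Table 3 can be recomputed from the reported (rounded) log-likelihoods. The Python sketch below is an illustration, not the authors' R code; the parameter counts are an assumption: intercept, age, and two stage indicators for every model, plus one scale parameter for all models except the exponential, whose scale is fixed.

```python
def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion, Equation (2): AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

# Rounded log-likelihoods from Table 3 (covariates: age + stage 2, stage 3).
# Assumed k: intercept + age + 2 stage dummies (+ scale, except exponential).
fits = {
    "Weibull": (-312.3, 5),
    "Exponential": (-241.7, 4),
    "Log-normal": (-228.2, 5),
    "Log-logistic": (-226.2, 5),
}

scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)  # smallest AIC = preferred model
for name, value in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:13s} AIC = {value:.1f}")
print("Best fit:", best)
```

With these assumed parameter counts, the recomputed values agree with Table 3 to one decimal place, and the log-logistic model again has the smallest AIC.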
2.1. Cox proportional hazard model
The CPH model is a regression model commonly used in biomedical research to investigate the association between patients' survival time and one or more predictor variables.17 It evaluates the effect of several factors on survival time simultaneously and shows how these factors influence the rate at which a particular event occurs at a given time. Mathematically, the model is
$$h(t) = h_0(t)\exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p) \tag{3}$$
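where h(t) is the hazard function, $h_0(t)$ is the baseline hazard function, $x_1, x_2, \ldots, x_p$ are the predictor variables, and $\beta_1, \beta_2, \ldots, \beta_p$ are the regression coefficients.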
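Under this model, a one-unit increase in $x_j$ multiplies the hazard by $\exp(\beta_j)$, the hazard ratio. The Python sketch below (an illustration, not the survival package's code) recomputes the hazard ratio, Wald z statistic, two-sided P value, and a 95% confidence interval for stage 3 from the coefficient and standard error in Table 2. It assumes a Wald test with a standard normal reference distribution, which appears to be how the z statistics and P values in Tables 1 and 2 were obtained.

```python
import math

def wald_summary(beta, se, z_crit=1.959964):
    """Hazard ratio, Wald z, two-sided P value and 95% CI for one Cox coefficient."""
    hr = math.exp(beta)                       # hazard ratio exp(beta), from Equation (3)
    z = beta / se                             # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided P value under N(0, 1)
    ci = (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se))
    return hr, z, p, ci

# Stage 3 relative to the reference stage category, multivariable model (Table 2).
hr, z, p, (lo, hi) = wald_summary(0.99428, 0.55349)
print(f"HR = {hr:.3f}, z = {z:.2f}, P = {p:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The confidence interval includes 1, which is consistent with the nonsignificant P value reported for stage 3.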
2.2. Kaplan‐Meier estimation method
The KM method is a popular nonparametric method for estimating the survival probability at a given time.18 Suppose there are D distinct event times $t_1 < t_2 < \cdots < t_D$, with $d_i$ deaths (events) occurring at time $t_i$, and let $Y_i$ be the number of individuals at risk at $t_i$. The KM survival probability corresponding to each ordered failure time is shown in Figure 2. The KM estimator is also called the product-limit estimator, because the survival probability is estimated as the product of the conditional probabilities of surviving each failure time. It is calculated as
$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{Y_i}\right) \tag{4}$$
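Equivalently, the KM probability of surviving past failure time $t_{(j)}$ is the probability of surviving past the previous failure time $t_{(j-1)}$ multiplied by the conditional probability of surviving past $t_{(j)}$ given survival to at least $t_{(j)}$: $\hat{S}(t_{(j)}) = \hat{S}(t_{(j-1)}) \times \hat{P}(T > t_{(j)} \mid T \ge t_{(j)})$.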
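The product-limit formula can be sketched in a few lines of Python. This is an illustration, not the implementation behind the survival package's survfit(); the data are hypothetical toy values (follow-up in days, 1 = death, 0 = censored), and, following the usual convention, a patient censored at an event time is counted as still at risk at that time.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate S(t) = prod over t_i <= t of (1 - d_i / Y_i).

    times  -- follow-up times
    events -- 1 if the event (death) was observed, 0 if censored
    Returns a list of (t_i, S(t_i)) at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)  # d_i per time
    survival, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)   # Y_i: still under observation at t_i
        survival *= 1 - deaths[t] / at_risk         # multiply by conditional survival
        curve.append((t, survival))
    return curve

# Hypothetical toy data (not the study data).
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:3d}  S(t) = {s:.3f}")
```

Each step multiplies the previous estimate by the conditional probability of surviving the current failure time, exactly as in the recursive description above.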
Figure 2. Survival curve
2.3. CPH and parametric models
The CPH method is used to estimate the effect of different variables on the time to a specific event. It is not based on any particular distribution of survival times, but it assumes that the effect of each predictor variable on survival is constant over time. A normal distribution is generally not suitable for survival times, which are positive and usually right-skewed. The parametric Weibull, exponential, log-normal, and log-logistic models can work better than the CPH model if the distribution is chosen correctly. The equations of these parametric models are given below.
The Weibull distribution model is given as
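$$f(t) = \frac{\gamma}{\sigma}\left(\frac{t}{\sigma}\right)^{\gamma - 1} \exp\left[-\left(\frac{t}{\sigma}\right)^{\gamma}\right], \quad t > 0 \tag{5}$$
where σ > 0 is the scale parameter and γ > 0 is the shape parameter; γ = 1 gives the exponential model.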
The exponential model, also known as the one-parameter exponential distribution with mean life σ, has the form
$$f(t) = \frac{1}{\sigma}\exp\left(-\frac{t}{\sigma}\right), \quad t > 0 \tag{6}$$
The log‐normal distribution model is given by
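$$f(t) = \frac{1}{t\,\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln t - \mu)^2}{2\sigma^2}\right], \quad t > 0 \tag{7}$$
where μ and σ are the mean and standard deviation of ln T.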
The log‐logistic distribution model is given by
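$$f(t) = \frac{(\gamma/\sigma)(t/\sigma)^{\gamma - 1}}{\left[1 + (t/\sigma)^{\gamma}\right]^{2}}, \quad t > 0 \tag{8}$$
where σ > 0 is the scale parameter and γ > 0 is the shape parameter.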
The models used in this article are defined by Equations (5) to (8). These models are available directly in R, and the estimates are computed with built-in functions.
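To make the hazard shapes described for these distributions concrete, the Python sketch below evaluates each density f(t) and survival function S(t), and the hazard h(t) = f(t)/S(t). It assumes a common textbook parameterization (scale σ and shape γ for the Weibull and log-logistic; mean μ and standard deviation s of log T for the log-normal), which may differ from the parameterization reported by R's survreg(). The parameter values are hypothetical and chosen only to show the shapes.

```python
import math

# Each function returns (f(t), S(t)) for t > 0.
def weibull(t, sigma, gamma):
    z = (t / sigma) ** gamma
    return (gamma / t) * z * math.exp(-z), math.exp(-z)

def exponential(t, sigma):                 # sigma = mean life
    return math.exp(-t / sigma) / sigma, math.exp(-t / sigma)

def log_normal(t, mu, s):                  # mu, s = mean and SD of ln T
    z = (math.log(t) - mu) / s
    pdf = math.exp(-0.5 * z * z) / (t * s * math.sqrt(2 * math.pi))
    return pdf, 0.5 * math.erfc(z / math.sqrt(2))

def log_logistic(t, sigma, gamma):
    z = (t / sigma) ** gamma
    return (gamma / t) * z / (1 + z) ** 2, 1 / (1 + z)

def hazard(model, t, *params):
    """h(t) = f(t) / S(t)."""
    pdf, surv = model(t, *params)
    return pdf / surv

# Hypothetical parameters: Weibull shape 2 (increasing hazard), exponential
# (constant), log-normal and log-logistic shape 2 (rise, then fall).
for t in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(f"t={t:3.1f}  "
          f"Weibull={hazard(weibull, t, 1.0, 2.0):.3f}  "
          f"exponential={hazard(exponential, t, 1.0):.3f}  "
          f"log-normal={hazard(log_normal, t, 0.0, 1.0):.3f}  "
          f"log-logistic={hazard(log_logistic, t, 1.0, 2.0):.3f}")
```

Setting γ = 1 in the Weibull functions reproduces the exponential hazard 1/σ, which is a quick consistency check between the two models.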
2.4. Algorithm's flow chart
This section shows the algorithm used for survival analysis with the survival package in R software. The algorithm and the corresponding R code are shown in Figure 1.
Figure 1. Algorithm's flow chart: the survival analysis workflow using the R package survival
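In this workflow, parametric models are fitted by maximum likelihood (in R, for example, with survreg() from the survival package, which does this numerically). For the exponential model of Equation (6), the censored-data maximum likelihood estimate has a closed form, which makes the idea easy to see: observed deaths contribute log f(t), censored patients contribute log S(t), and the estimated mean life is the total follow-up time divided by the number of deaths. A minimal Python sketch on hypothetical toy data:

```python
import math

def fit_exponential(times, events):
    """Censored-data MLE for the exponential model with mean life sigma.

    log L(sigma) = -d * ln(sigma) - (sum of all follow-up times) / sigma,
    maximized at sigma_hat = total follow-up time / number of deaths.
    """
    d = sum(events)                        # number of observed deaths
    total_time = sum(times)                # follow-up of deaths and censored alike
    sigma_hat = total_time / d             # MLE of the mean life
    log_lik = -d * math.log(sigma_hat) - total_time / sigma_hat
    aic = 2 * 1 - 2 * log_lik              # one estimated parameter, Equation (2)
    return sigma_hat, log_lik, aic

# Hypothetical toy data: days of follow-up, 1 = death, 0 = censored.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
sigma_hat, log_lik, aic = fit_exponential(times, events)
print(f"sigma_hat = {sigma_hat:.1f} days, log L = {log_lik:.3f}, AIC = {aic:.3f}")
```

Models without a closed-form estimate (Weibull, log-normal, log-logistic) require numerical optimization of the same censored log-likelihood, which is what dedicated software handles.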
2.5. Bayesian analysis
Bayesian analysis is an inferential procedure that interprets observed outcomes in light of prior information. Inferences about population parameters are drawn from sample data. It can be more useful than the classical approach for clinical data analysis and is a suitable technique for clinical researchers.19 Bayesian techniques can also reduce the computational complexity of fitting survival models.
2.5.1. Procedure in Bayesian approach
In most practical problems, the information about a parameter is contained in the sample, and posterior inference also depends on prior information. The Bayesian approach is based on (1) a prior distribution, (2) the likelihood function, which is combined with the prior to generate the posterior, and (3) the resulting posterior distribution. The procedure is as follows. First, available information related to the outcome is expressed as the prior distribution. Next, the prior is multiplied by the likelihood. After normalization, the result is the posterior distribution of the parameter, from which posterior estimates are obtained.
The Bayesian approach is formulated mathematically through Bayes' theorem:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{9}$$
Let $f(x \mid \theta)$ be the probability density function (pdf) of a random variable X, where θ is the parameter.
In the Bayesian framework, θ is treated as a random variable, and the observed data X are treated as fixed.
$$\pi(\theta \mid x) \propto f(x \mid \theta)\, g(\theta) \tag{10}$$
Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ be a random sample from the density $f(x \mid \theta)$.
Then the posterior distribution of the parameter θ given the data $\mathbf{x}$ is
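$$\pi(\theta \mid \mathbf{x}) = \frac{g(\theta) \prod_{i=1}^{n} f(x_i \mid \theta)}{\int_{\Omega} g(\theta) \prod_{i=1}^{n} f(x_i \mid \theta)\, d\theta} \tag{11}$$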
where Ω is the parameter space of θ, g(θ) is the prior distribution, and $\pi(\theta \mid \mathbf{x})$ is the posterior distribution of θ.
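The integral in the denominator of the posterior usually has no closed form, which is one reason MCMC software such as WinBUGS is used. For a single parameter, however, the posterior can be approximated directly on a grid: multiply prior by likelihood and divide by a numerical approximation of the integral over Ω. The Python sketch below assumes an exponential survival model with rate λ = 1/σ, right-censored toy data, and a hypothetical Gamma(1, 10) prior; none of these come from the study. With this prior the exact posterior is Gamma(5, 78), with mean 5/78 ≈ 0.0641, so the grid result can be checked.

```python
import math

def posterior_on_grid(prior, log_likelihood, grid):
    """Discretized posterior: likelihood * prior / (Riemann sum over the grid)."""
    step = grid[1] - grid[0]
    log_unnorm = [log_likelihood(th) + math.log(prior(th)) for th in grid]
    peak = max(log_unnorm)                      # subtract max to avoid underflow
    unnorm = [math.exp(v - peak) for v in log_unnorm]
    evidence = sum(unnorm) * step               # approximates the integral over Omega
    return [u / evidence for u in unnorm]

# Hypothetical toy data: exponential model with rate lam, right censoring.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
d, total = sum(events), sum(times)

def log_lik(lam):
    return d * math.log(lam) - lam * total      # deaths add log f, censored add log S

def gamma_prior(lam, a=1.0, b=10.0):
    # Hypothetical Gamma(a, b) prior on the rate (shape a, rate b).
    return b ** a * lam ** (a - 1) * math.exp(-b * lam) / math.gamma(a)

grid = [i * 0.0005 for i in range(1, 2001)]     # rates from 0.0005 to 1.0
post = posterior_on_grid(gamma_prior, log_lik, grid)
step = grid[1] - grid[0]
post_mean = sum(th * p for th, p in zip(grid, post)) * step
print(f"posterior mean rate (grid) = {post_mean:.4f}")
```

Grid approximation becomes impractical with several parameters, which is where MCMC sampling, as used for the models in this study, takes over.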
2.5.2. Prior information
In the Bayesian technique, prior information is typically obtained from previously performed studies. Priors can be informative or noninformative, and there are different methods for constructing data-based priors. Commonly used priors include the uniform prior, noninformative prior, Jeffreys prior, natural conjugate prior, minimal information prior, asymptotically locally invariant prior, and Dirichlet prior.
2.5.3. Application of Bayesian models in clinical data
Freely available software such as R and WinBUGS is useful for Bayesian data modeling. Areas where Bayesian methods are applicable include sequential designs, predictive probability, adaptive study designs, and meta-analysis.
2.5.4. Limitations of the Bayesian approach
Selection bias and an incorrectly chosen prior can lead to misleading results. The prior should not unduly dominate the posterior distribution. A conjugate prior gives good results and also yields a posterior from the same family of distributions as the prior.
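As an example of conjugacy, for an exponential survival model a gamma prior on the rate λ = 1/σ is conjugate: with d deaths and total follow-up time T, a Gamma(a, b) prior (shape a, rate b) becomes a Gamma(a + d, b + T) posterior, so no numerical integration is needed. A minimal Python sketch with a hypothetical prior and toy data (not the study's model, which used normal priors in WinBUGS):

```python
def gamma_exponential_update(a, b, times, events):
    """Conjugate update for an exponential rate with a Gamma(shape a, rate b) prior.

    Observed deaths add to the shape; every patient's follow-up time,
    censored or not, adds to the rate.
    """
    d = sum(events)
    total_time = sum(times)
    return a + d, b + total_time

# Hypothetical prior Gamma(1, 10) and toy data (1 = death, 0 = censored).
a_post, b_post = gamma_exponential_update(
    1.0, 10.0, [5, 8, 8, 12, 15, 20], [1, 1, 0, 1, 0, 1])
print(f"posterior: Gamma({a_post:g}, {b_post:g}), mean rate = {a_post / b_post:.4f}")
```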
3. RESULTS
This study includes 66 AA patients (22 in the mRNA cohort and 44 in the miRNA cohort).8 Overall survival (time in days) of AA lung cancer patients was estimated and displayed graphically as a KM curve with upper and lower limits for the survival estimates (Figure 2). Cox regression was applied to identify predictors of overall survival among AA lung cancer patients. None of the variables under study was significantly associated with survival (Table 1). However, stage is a well-established and widely used determinant of survival in oncology. In our data, survival of stage I patients is almost the same as that of stage II patients, whereas survival of stage III patients is lower than that of stage I patients (Figure 3). Given the importance of stage as a predictor, we built Cox and parametric models including stage and age. In the multivariable Cox analysis, neither stage nor age was significant (P > 0.05) (Table 2). Parametric survival models (exponential, Weibull, log-normal, and log-logistic) were then fitted to the data and compared by their AIC values. The log-logistic model gave the best fit, with the lowest AIC value (462.4087) (Table 3).
Table 1.
Univariate analysis using Cox proportional hazard model
| Study Variable | Coeff. (β) | Exp (β) | SE (β) | Z statistics | P Value | AIC Value |
|---|---|---|---|---|---|---|
| Age | 0.00812 | 1.00515 | 0.01734 | 0.47 | 0.64 | 127.431 |
| Sex 2 | 0.536 | 1.709 | 0.440 | 1.22 | 0.22 | 126.0972 |
| Smoking | −0.00535 | 0.99467 | 0.00694 | −0.77 | 0.44 | 126.9987 |
| Stage 2 | −0.141 | 0.868 | 0.543 | −0.26 | 0.795 | 126.3213 |
| Stage 3 | 0.961 | 2.615 | 0.540 | 1.78 | 0.075 | 126.3213 |
| Tumor 2 | −0.141 | 0.868 | 0.394 | −0.36 | 0.72 | 127.524 |
Figure 3. Kaplan-Meier curve
Table 2.
Independent predictors of overall survival by using Cox proportional hazard model
| Study Variable | Coeff. (β) | Exp (β) | SE (β) | Z statistics | P Value | AIC Value |
|---|---|---|---|---|---|---|
| Age | 0.00803 | 1.00807 | 0.02706 | 0.30 | 0.767 | 128.2339 |
| Stage 2 | 0.03293 | 1.03348 | 0.79703 | 0.04 | 0.967 | 128.2339 |
| Stage 3 | 0.99428 | 2.70277 | 0.55349 | 1.80 | 0.072 | 128.2339 |
Table 3.
Performance of different parametric models
| Covariates (X) | Model | P Value | Log-Likelihood | AIC Value |
|---|---|---|---|---|
| Age + stage (2, 3) | Weibull | 1 | −312.3 | 634.608 |
| Age + stage (2, 3) | Exponential | 0.74 | −241.7 | 491.3968 |
| Age + stage (2, 3) | Log-normal | 0.91 | −228.2 | 466.3095 |
| Age + stage (2, 3) | Log-logistic | 0.75 | −226.2 | 462.4087 |
Bayesian survival analysis has gained popularity in recent years. In this article, we illustrate the application of Bayesian survival analysis to compare survival probabilities for lung cancer, using a survival function estimated from the log-logistic distribution. Table 4 presents posterior estimates and 95% credible intervals obtained with normal priors. One way to assess the accuracy of the posterior estimates is to calculate the Monte Carlo (MC) error for each parameter; as a rule of thumb, the simulation should be run until the MC error for each parameter of interest is less than about 5% of its sample standard deviation (SD). The 95% credible intervals for all covariate effects (age, gender, and smoking) contain zero, so none of these covariates had a significant effect on the survival time of lung cancer patients.
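The MC error rule of thumb can be checked with the batch-means method, one common way to estimate MC error from a single chain: split the chain into batches, and divide the standard deviation of the batch means by the square root of the number of batches. The Python sketch below uses a synthetic autocorrelated chain (an AR(1) process) in place of real WinBUGS output; the chain, its length, and the choice of 50 batches are assumptions of the sketch.

```python
import random
import statistics

def mc_error_batch_means(chain, n_batches=50):
    """Batch-means estimate of the Monte Carlo standard error of the chain mean."""
    m = len(chain) // n_batches                      # batch size
    means = [statistics.fmean(chain[i * m:(i + 1) * m]) for i in range(n_batches)]
    return statistics.stdev(means) / n_batches ** 0.5

# Synthetic AR(1) chain standing in for posterior draws of one parameter.
rng = random.Random(2020)
rho, x, chain = 0.5, 0.0, []
for _ in range(20000):
    x = rho * x + rng.gauss(0.0, 1.0)
    chain.append(x)

sd = statistics.stdev(chain)
mc_err = mc_error_batch_means(chain)
print(f"SD = {sd:.3f}, MC error = {mc_err:.4f}, "
      f"ratio = {mc_err / sd:.2%}, rule met: {mc_err < 0.05 * sd}")
```

Stronger autocorrelation inflates the MC error for a given chain length, so a slowly mixing parameter may need a longer run before it meets the 5% rule.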
Table 4.
Posterior density for different regression estimates using Bayesian survival analysis (BSA)
| Parameter | Mean (SD) | 95% Credible Interval | DIC Value | Posterior Density |
|---|---|---|---|---|
| Intercept | 7.67 (0.48) | 6.72, 8.64 | 41.33 | 2.93 |
| Age | 0.00 (0.00) | −0.01, 0.02 | | |
| Scale | 0.47 (0.06) | 0.35, 0.61 | | |
| Intercept | 8.43 (0.30) | 7.84, 9.01 | 38.49 | 2.92 |
| Gender | −0.29 (0.17) | −0.64, 0.04 | | |
| Scale | 0.44 (0.06) | 0.33, 0.57 | | |
| Intercept | 7.77 (0.12) | 7.54, 8.02 | 37.39 | 2.93 |
| Smoking | 0.00 (0.00) | 0.00, 0.01 | | |
| Scale | 0.44 (0.06) | 0.33, 0.56 | | |
4. DISCUSSION
CPH and KM are two methods frequently used by researchers, especially in clinical settings.20, 21, 22 The popularity of the Cox model for estimation and inference may be due to the fact that it does not require any assumption about the distribution of lifetimes. However, it does require proportional hazards, an assumption that the data do not always satisfy. If the proportional hazards assumption does not hold, parametric survival models may perform better.23
In this study, we evaluated the performance of various parametric models in the survival analysis of patients with lung cancer. Parametric models provide interpretation based on a specified distribution of the time to event. The main objective of this study was to illustrate survival analysis using R software and to demonstrate the application of parametric models. To this end, we applied four widely used parametric models to lung cancer data. As in previous studies,24, 25, 26 we used the AIC to compare the parametric models, and the model with the minimum AIC was considered the best model for these lung cancer patients. The log-logistic model had the minimum AIC and thus fitted our lung cancer data better than the other models.27 The log-logistic distribution has also been reported to describe mortality in cancer follow-up studies.15
We also applied a Bayesian log-logistic survival model to the lung cancer data, using the deviance information criterion (DIC) and posterior density to assess the models. Bayesian methods have previously been used by many authors in survival analysis.4, 5, 6, 7 In our data, posterior densities were calculated for age, gender, and smoking, and none of these factors had a significant effect on the survival of lung cancer patients.
5. CONCLUSIONS
Parametric survival models are challenging to explore in the daily practice of cancer research, possibly because of the popularity of Cox regression and a lack of knowledge about how to fit parametric models. This paper illustrates the application of parametric survival models using the freely available R software. We expect that this work will help researchers apply parametric survival models.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
AUTHOR CONTRIBUTIONS
All authors had full access to the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Conceptualization, M.K.; Formal Analysis, R.K.S., P.K.S., A.S.; Resources, A.B.; Writing - Original Draft, A.J.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.
ACKNOWLEDGMENT
The authors would like to thank the referees and editor for their helpful and valuable suggestions and comments.
Kumar M, Sonker PK, Saroj A, Jain A, Bhattacharjee A, Saroj RK. Parametric survival analysis using R: Illustration with lung cancer data. Cancer Reports. 2020;3:e1210. 10.1002/cnr2.1210
[Correction added on 19 August 2019, after first online publication: The authors' affiliation has been updated.]
REFERENCES
- 1. Hosmer D, Lemeshow S. Applied Survival Analysis. New York: Wiley; 1989.
- 2. Kleinbaum DG, Klein M. Survival Analysis: A Self-Learning Text. USA: Springer; 1996.
- 3. Nardi A, Schemper M. Comparing Cox and parametric models in clinical studies. Stat Med. 2003;22(23):3597-3610.
- 4. Ibrahim JG, Chen MH, Sinha D. Bayesian Survival Analysis. New York: Springer-Verlag; 2001.
- 5. Avci E. Bayesian survival analysis: comparison of survival probability of hormone receptor status for breast cancer data. Int J Data Anal Tech Strateg. 2017;9(1):63.
- 6. Khan Y, Khan AA. Bayesian survival analysis of regression model using “Weibull”. Int J Innov Res Sci Eng Technol. 2013;2(12):7199-7204.
- 7. Omurlu IK, Ozdamar K, Ture M. Comparison of Bayesian survival analysis and Cox regression analysis in simulated and breast cancer data sets. Expert Syst Appl. 2009;36(23):11341-11346.
- 8. Mitchell KA, Zingone A, Toulabi L, Boeckelman J, Ryan BM. Comparative transcriptome profiling reveals coding and noncoding RNA differences in NSCLC from African Americans and European Americans. Clin Cancer Res. 2017;23(23):7412-7425.
- 9. Weibull W. A statistical distribution function of wide applicability. J Appl Mech. 1951;18:293.
- 10. Epstein B. Exponential Distribution and Its Role in Life Testing. Wayne State and Stanford Universities; 1958:1-21.
- 11. Aitchison J, Brown JAC. The Log-Normal Distribution. Cambridge, UK: Cambridge University Press; 1957.
- 12. Crow EL, Shimizu K. Log-Normal Distributions: Theory and Application. New York: Dekker; 1988.
- 13. Lee ET. Statistical Methods for Survival Data Analysis. New York: Wiley; 1992.
- 14. Johnson NL, Kotz S, Balakrishnan N. Continuous Univariate Distributions. New York: Wiley; 1994.
- 15. Collett D. Modelling Survival Data in Medical Research. 2nd ed. CRC Press; 2003.
- 16. Akaike H. A new look at the statistical model identification. IEEE Trans Autom Control. 1974;19(6):716-723.
- 17. Cox DR, Oakes D. Analysis of Survival Data. USA: Chapman & Hall; 1998.
- 18. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53(282):457-481.
- 19. Bhattacharjee A. Application of Bayesian approach in cancer clinical trial. World J Oncol. 2014;5(3):109-112.
- 20. Lunn M, McNeil D. Applying Cox regression to competing risks. Biometrics. 1995;51(2):524-532.
- 21. Fisher LD, Lin DY. Time-dependent covariates in Cox proportional hazard regression model. Annu Rev Public Health. 1999;20(1):145-157.
- 22. Yi N, Tang Z, Zhang X, Guo B. BhGLM: Bayesian hierarchical GLMs and survival models, with applications to genomics and epidemiology. Bioinformatics. 2018;35:1-3.
- 23. Gardiner JC. Survival analysis: overview of parametric, nonparametric and semi-parametric approaches and new developments. Stat Data Anal Paper. 2010;252.
- 24. Dehkordi BM, Safaee A, Pourhoseingholi MA, Fatemi R, Tabeie Z, Zali MR. Statistical comparison of survival model for analysis of cancer data. Asian Pac J Cancer Prev. 2008;9:417-420.
- 25. Pourhoseingholi MA, Moghimi-Dehkordi B, Safaee A, Hajizadeh E, Solhpour A, Zali MR. Prognostic factors in gastric cancer using log normal censored regression. Indian J Med Res. 2009;129:262-267.
- 26. Pourhoseingholi MA, Hajizadeh E, Moghimi Dehkordi B, Safaee A, Abadi A, Zali MR. Comparing Cox regression and parametric model for survival of patient with gastric carcinoma. Asian Pac J Cancer Prev. 2007;8(3):412-416.
- 27. Gupta RC, Akman O, Lvin S. A study of log-logistic model in survival analysis. Biom J. 1999;41(4):431-443.
