Abstract
Background
Although the Cox time-varying coefficient (TVC) model has been developed to address non-proportional hazards (non-PH), its use remains underexplored. Instead, the restricted mean survival time (RMST) has been widely used in non-PH settings to quantify treatment effects using the life expectancy ratio (LER) and life expectancy difference (LED).
Methods
This study explores a novel extension of the Cox TVC model under non-PH that generates LER and LED, enabling direct comparison with RMST estimated from a flexible parametric survival model (FPM). An intensive simulation study was conducted to compare the performance of FPM with that of the Cox TVC model under PH and non-PH. Survival times were assumed to follow a piecewise exponential distribution, with various censoring patterns generated from a uniform distribution. Both methods were also applied to data from a randomised clinical trial in nasopharyngeal cancer exhibiting an increasing treatment benefit.
Results
Intensive simulations showed that Cox TVC outperformed FPM under non-PH in terms of bias and coverage, with generally higher power in scenarios of crossing or diverging curves under low censoring. In the real-world data, FPM produced slightly larger LER and LED estimates than Cox TVC. Cox TVC has the added advantage of assessing the treatment effect at different milestones and of detecting earlier differences when the effect is expressed as a hazard ratio (HR).
Conclusion
Overall, Cox TVC is a viable option for summarising treatment effects using LER and LED under non-PH. Reporting could usefully be complemented by HR estimates at specific milestones to detect early differences.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12874-025-02608-z.
Keywords: Restricted mean survival time, Time-varying coefficient Cox model, Non-proportional hazards, Flexible parametric model, Time-to-event outcomes
Introduction
In randomised clinical trials with time-to-event outcomes, the Cox proportional hazards (PH) model is frequently employed to assess the effect of treatment through estimation of the hazard ratio (HR) [1, 2]. However, recent studies have highlighted that the PH assumption may not hold in clinical trials of new immuno-oncology medicines owing to their mechanism of action [3, 4]: survival curves commonly diverge or cross because of delayed therapeutic response or waning treatment effectiveness. Similarly, in Phase 2/3 randomised controlled trials (RCTs) comparing induction chemotherapy plus chemo-radiotherapy with chemo-radiotherapy alone in patients with locally advanced nasopharyngeal carcinoma (NPC), the PH assumption was reported to be untenable for progression-free survival, disease-free survival and overall survival (OS) [5–7]. A single HR estimate from the conventional Cox PH model may therefore be an inappropriate summary when the treatment effect is not constant over time, because it does not accurately capture the survival experience of patients over the entire follow-up.
More complex methodologies for handling non-PH have been proposed, including adaptations of the conventional Cox model that incorporate a time-varying coefficient (TVC) [8, 9]. This adaptation allows the HR to be quantified at multiple time points, capturing the time-varying treatment effect within the Cox TVC model. Despite its development for non-PH settings, the Cox TVC model remains underexplored in clinical trials in which the treatment effect changes over time.
Various methods have been proposed to quantify the effect of treatment when the PH assumption is violated. These include the average hazard ratio [10], the area between two survival curves [11], differences in median survival time and at milestone times, and the generalised hazard difference [12]. Among these, the restricted mean survival time (RMST) has been proposed as an alternative key summary measure of treatment effect and has gained traction in recent years [13–15]. It is defined as the area under the survival curve between time 0 and a pre-determined truncation time $t^*$. The RMST is useful regardless of whether the PH assumption holds, and it offers a straightforward interpretation without requiring extensive modelling expertise. The treatment effect can be estimated as a difference or ratio between the RMSTs of the experimental and control arms.
Despite the wide application of RMST, a vital issue is the choice of $t^*$, which is limited by the duration of follow-up and the censoring distribution of the study. Suggestions in the literature include the minimum, across groups, of the largest observed event or censoring time, or any clinically meaningful milestone time. To avoid biased results, $t^*$ should be determined based on clinical rather than purely statistical considerations, with attention to the maturity of the data [13, 16–18]. For example, in our motivating data from a randomised clinical trial (SQNP01) of patients with nasopharyngeal cancer, data maturity was observed between Years 4 and 6, and significant treatment effects were observed during this period, whereas selecting an earlier $t^*$ led to a non-significant finding [19]. Further details of this analysis are provided in subsequent sections.
When the PH assumption is not valid, life expectancy ratios (LER) and life expectancy differences (LED) have become widely accepted summaries of treatment effect, as demonstrated in numerous studies using both real-world and simulated data [17, 20–22]. Notably, a previous study compared the Cox PH model with Kaplan–Meier-based RMST by contrasting the reciprocal HR (control vs. experimental) with the RMST ratio (experimental vs. control), or vice versa [17]. In addition, the HR from the Cox PH model has been compared with RMST under non-PH [20, 21]. These studies consistently showed that the HR from the Cox PH model tends to give larger effect estimates than the RMST ratio. While both measures are clinically interpretable, they are not directly comparable. This discrepancy motivates the search for a common metric with which to compare Cox and RMST-based models appropriately.
RMST estimation has not traditionally been incorporated into the Cox PH model, despite the flexibility of RMST for capturing dynamic treatment effects under non-PH [13]. In this paper, we propose a novel extension of the Cox TVC model that incorporates RMST-based estimands (LER and LED) to quantify the treatment effect under non-PH. By allowing time-varying effects within the RMST framework, our approach captures a time-varying treatment effect without relying on PH, thereby providing more robust estimates. Furthermore, metrics such as LER and LED allow a direct and consistent comparison between the Cox TVC model and RMST-based methods. This unified approach addresses a gap in the literature, where previous studies have predominantly compared the conventional Cox PH model with RMST-based methods, often using inconsistent metrics to evaluate treatment effects [17].
We evaluate the performance of the Cox TVC model and of RMST based on a flexible parametric survival model (FPM) in terms of LER and LED under various PH and non-PH scenarios with different levels of censoring. Data from a randomised clinical trial (SQNP01) of patients diagnosed with stage III or IV (non-metastatic) nasopharyngeal cancer (NPC), comparing conventional radiotherapy (RT) with concurrent chemo-radiotherapy followed by adjuvant chemotherapy (CRT), are used for illustration [19]. We demonstrate that Cox TVC is a viable alternative to FPM under non-PH, and we offer guidance on the selection of metrics and approaches in the form of a decision-making table.
Methodology
Restricted mean survival time (RMST)
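Let $X = \min(T, t^*)$ be the right-truncated survival time, where $t^*$ is a pre-specified time. The RMST, $\mu(t^*)$, of the random time-to-event variable $T$ is defined as the area under the survival curve $S(t)$ from $0$ to $t^*$:

$$\mu(t^*) = E[\min(T, t^*)] = \int_0^{t^*} S(t)\,dt \qquad (1)$$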
The specification of $t^*$ was discussed by Royston and Parmar [13, 16]. Usually, the last observed event or censoring time, or a clinically meaningful milestone, may be used as the cutoff time $t^*$. Several approaches may be used to estimate RMST. These include computing the area under the Kaplan–Meier curve; fitting a Cox PH model and performing numerical integration up to $t^*$; or fitting a flexible parametric survival model (FPM) and integrating the estimated survival curve up to $t^*$, using the jackknife method [13].
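As an illustration of the first approach, the sketch below computes the Kaplan–Meier step function and integrates it from 0 to $t^*$. It is a minimal, standard-library Python sketch under our own assumptions: the function names are hypothetical, and subjects censored at an event time are treated as still at risk at that time (the usual KM convention). It is not the software used in this study.

```python
from typing import List, Sequence, Tuple


def km_steps(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (event time, S just after that time) pairs."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all subjects with the same time; censored ones are still at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= leaving
    return steps


def rmst_km(times: Sequence[float], events: Sequence[int], t_star: float) -> float:
    """Area under the KM step function from 0 to t_star (Eq. 1 with S replaced by KM)."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in km_steps(times, events):
        if t >= t_star:
            break
        area += prev_s * (t - prev_t)  # rectangle under the current step
        prev_t, prev_s = t, s
    return area + prev_s * (t_star - prev_t)


# Without censoring, the KM RMST equals the sample mean of min(T, t_star).
print(rmst_km([1, 2, 3, 4], [1, 1, 1, 1], t_star=2.5))  # 2.0
```

Because the KM curve is a step function, the integral is an exact sum of rectangles; this is also why it becomes unstable when few subjects remain at risk near $t^*$.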
Flexible parametric model (FPM)
Royston and Parmar [16, 23] proposed fitting a flexible parametric survival model to estimate the RMST because of its flexibility in estimating such quantities and the ease with which covariates and time-dependent effects can be added. Because the Kaplan–Meier estimate is a step function, it can become unstable for RMST estimation when the risk set is small, leading to unreliable results [13]. FPM addresses this issue by using restricted cubic splines to model the survival function smoothly, enhancing the stability and accuracy of RMST estimation. For a binary treatment variable $z$ whose effect varies over time, the time-varying effect can be modelled by interacting $z$ with a restricted cubic spline function of log time [24]. Denoting by $s(\ln t; \boldsymbol{\gamma}, \mathbf{k}_0)$ a restricted cubic spline function of $\ln t$ with knots $\mathbf{k}_0$ and parameters $\boldsymbol{\gamma}$, the log cumulative hazard can be modelled as

$$\ln H(t \mid z) = s(\ln t; \boldsymbol{\gamma}, \mathbf{k}_0) + \beta z + s(\ln t; \boldsymbol{\delta}, \mathbf{k}_1)\, z \qquad (2)$$

where $\beta$ is the log hazard ratio under PH and $s(\ln t; \boldsymbol{\delta}, \mathbf{k}_1)$, with knots $\mathbf{k}_1$ and parameters $\boldsymbol{\delta}$, captures the time-varying effect of $z$. A restricted cubic spline is a cubic polynomial between adjacent knots but is constrained to be linear before the first and after the last knot [23]. Given a suitable number of knots, these cubic pieces can capture complex shapes in the data. More generally, a spline is a piecewise polynomial whose pieces are joined at a pre-defined number of knots; its first and second derivatives are constrained to be continuous so that the resulting curve is smooth [25].
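A restricted cubic spline in $x = \ln t$ with boundary knots $k_{\min}, k_{\max}$ and interior knots $k_1, \ldots, k_m$ can be expressed as

$$s(x; \boldsymbol{\gamma}, \mathbf{k}) = \gamma_0 + \gamma_1 x + \gamma_2 v_1(x) + \cdots + \gamma_{m+1} v_m(x) \qquad (3)$$

with basis functions $v_j(x) = (x - k_j)_+^3 - \lambda_j (x - k_{\min})_+^3 - (1 - \lambda_j)(x - k_{\max})_+^3$, where $(u)_+ = \max(u, 0)$ and $\lambda_j = \frac{k_{\max} - k_j}{k_{\max} - k_{\min}}$, for $j = 1, \ldots, m$. This form allows baseline log cumulative hazard functions with complex shapes to be fitted. There is a trade-off between increasing the flexibility of the fitted function and risking overfitting. The flexible parametric survival model with 3 knots and 1 df (FPM(3,1)) has therefore been recommended for estimating RMST, where the 1 df refers to the time-dependent effect, which is constrained to be a linear function of log follow-up time ($x = \ln t$) [16, 26].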
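The survival function can then be expressed as:

$$S(t \mid z) = \exp\{-H(t \mid z)\} = \exp\left\{-\exp\left[s(\ln t; \boldsymbol{\gamma}, \mathbf{k}_0) + \beta z + s(\ln t; \boldsymbol{\delta}, \mathbf{k}_1)\, z\right]\right\} \qquad (4)$$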
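To make the spline, log cumulative hazard and survival equations of the FPM concrete, the sketch below evaluates the restricted cubic spline basis, the resulting survival function, and its numerical integral up to $t^*$. It assumes, as in FPM(3,1), that the 1-df time-varying term is $\delta_1 z \ln t$. The coefficient values and knot positions in the usage lines are hypothetical; in practice they would come from a fitted FPM, and the function names are ours.

```python
import math
from typing import List, Sequence


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Royston-Parmar restricted cubic spline basis [x, v_1(x), ..., v_m(x)].

    knots = [k_min, k_1, ..., k_m, k_max] on the log-time scale.
    """
    k_min, k_max = knots[0], knots[-1]

    def pos3(u: float) -> float:
        return max(u, 0.0) ** 3  # truncated power (u)_+^3

    basis = [x]
    for k_j in knots[1:-1]:
        lam = (k_max - k_j) / (k_max - k_min)
        basis.append(pos3(x - k_j) - lam * pos3(x - k_min) - (1.0 - lam) * pos3(x - k_max))
    return basis


def fpm_survival(t, z, gamma, beta, delta1, knots):
    """S(t|z) = exp(-exp(ln H)), ln H = s(ln t; gamma) + beta*z + delta1*z*ln t."""
    x = math.log(t)
    spline = gamma[0] + sum(g * b for g, b in zip(gamma[1:], rcs_basis(x, knots)))
    return math.exp(-math.exp(spline + beta * z + delta1 * z * x))


def fpm_rmst(t_star, z, gamma, beta, delta1, knots, n_steps=20_000):
    """Midpoint-rule integral of S(t|z) over (0, t_star]; never evaluates ln(0)."""
    h = t_star / n_steps
    return h * sum(fpm_survival((i + 0.5) * h, z, gamma, beta, delta1, knots)
                   for i in range(n_steps))


# Hypothetical model: gamma = [0, 1, 0] gives ln H = ln t for z = 0, i.e. Exp(1).
knots = [math.log(0.5), math.log(2.0), math.log(5.0)]
mu_c = fpm_rmst(3.0, 0, [0.0, 1.0, 0.0], beta=-0.5, delta1=-0.2, knots=knots)
mu_e = fpm_rmst(3.0, 1, [0.0, 1.0, 0.0], beta=-0.5, delta1=-0.2, knots=knots)
print(round(mu_c, 4), round(mu_e / mu_c, 3))
```

The midpoint rule is used only because it avoids $t = 0$, where $\ln t$ is undefined; any accurate quadrature would do, since the FPM survival curve is smooth.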
Cox time-varying coefficient (TVC) model
Let $h(t \mid z)$ and $h_0(t)$ be the hazard and baseline hazard of an event at time $t$. The standard Cox model may be extended to handle non-PH by including a time-varying coefficient (TVC) as follows:

$$h(t \mid z) = h_0(t)\exp(\beta z + \gamma z t) \qquad (5)$$

where $\beta$ is the regression coefficient of $z$ under the PH assumption; combined with $\gamma$, it captures the time-varying effect of treatment $z$, so that the log HR at time $t$ is $\beta + \gamma t$. The survival function can be estimated as follows:
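$$S(t \mid z) = \exp\left\{-\int_0^t h_0(u)\exp(\beta z + \gamma z u)\,du\right\} \qquad (6)$$

where the baseline hazard contribution at each observed event time is estimated following Kalbfleisch and Prentice [27].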
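The RMST extension of the Cox TVC model amounts to integrating the estimated step survival function for each arm up to $t^*$. The sketch below assumes the Kalbfleisch–Prentice product form $\hat S(t \mid z) = \prod_{t_j \le t} \hat\alpha_j^{\exp(\hat\beta z + \hat\gamma z g(t_j))}$, where $\hat\alpha_j$ is the baseline conditional survival probability at event time $t_j$ and $g(t) = t$ (or $\ln t$ for Cox TVCln(t)). The $\hat\alpha_j$ and coefficient values are hypothetical inputs, not estimates from this study, and the function names are ours.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cox_tvc_steps(event_times: Sequence[float], alpha: Sequence[float], z: int,
                  beta: float, gamma: float,
                  g: Callable[[float], float] = lambda t: t) -> List[Tuple[float, float]]:
    """Step survival S(t|z) at sorted event times; log HR at t_j is beta + gamma*g(t_j)."""
    surv, steps = 1.0, []
    for t_j, a_j in zip(event_times, alpha):
        surv *= a_j ** math.exp(beta * z + gamma * z * g(t_j))
        steps.append((t_j, surv))
    return steps


def rmst_steps(steps: Sequence[Tuple[float, float]], t_star: float) -> float:
    """Exact area under a right-continuous step survival curve from 0 to t_star."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= t_star:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (t_star - prev_t)


def cox_tvc_ler_led(event_times, alpha, beta, gamma, t_star, g=lambda t: t):
    """LER and LED from the two arm-specific RMSTs (z = 1 experimental, z = 0 control)."""
    mu_e = rmst_steps(cox_tvc_steps(event_times, alpha, 1, beta, gamma, g), t_star)
    mu_c = rmst_steps(cox_tvc_steps(event_times, alpha, 0, beta, gamma, g), t_star)
    return mu_e / mu_c, mu_e - mu_c


# Hypothetical inputs: event times in years and baseline conditional survival probabilities.
times = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
alpha = [0.97, 0.95, 0.95, 0.94, 0.93, 0.93]
print(cox_tvc_ler_led(times, alpha, beta=0.1, gamma=-0.15, t_star=5.0))             # g(t) = t
print(cox_tvc_ler_led(times, alpha, beta=0.1, gamma=-0.3, t_star=5.0, g=math.log))  # g(t) = ln t
```

Because the estimated survival curve is a step function, the RMST integral is an exact sum of rectangles; switching between Cox TVC and Cox TVCln(t) only changes the function `g`.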
Although Eqs. (5) and (6) are expressed in terms of $t$, a common and preferred function of time in the Cox model [28], alternative functions of $t$, such as $\ln(t)$, are also commonly used in the Cox TVC model [8, 9]. We therefore considered $\ln(t)$ in a sensitivity analysis to capture the time-varying effect, and denote this model Cox TVCln(t) [9, 29].
Estimands: Life expectancy ratio (LER) and life expectancy difference (LED)
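The treatment effect based on RMST at $t^*$ may be quantified in terms of the life expectancy ratio (LER) or the life expectancy difference (LED). The LER is a relative effect measure and is estimated by the following formula:

$$\mathrm{LER}(t^*) = \frac{\mu_E(t^*)}{\mu_C(t^*)} \qquad (7)$$

where $\mu_E(t^*)$ and $\mu_C(t^*)$ are the RMSTs at $t^*$ of the experimental (E) and control (C) arms.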
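An LER greater than 1 favours the experimental arm, indicating that the experimental (E) arm has a longer restricted mean survival time up to $t^*$ than the control (C) arm. The standard error (SE) of LER is obtained using a first-order Taylor expansion (delta method) as follows [30]:

$$\mathrm{SE}(\widehat{\mathrm{LER}}) \approx \widehat{\mathrm{LER}}\sqrt{\frac{\mathrm{SE}(\hat\mu_E)^2}{\hat\mu_E^2} + \frac{\mathrm{SE}(\hat\mu_C)^2}{\hat\mu_C^2}} \qquad (8)$$

where $\mathrm{SE}(\hat\mu_E)$ and $\mathrm{SE}(\hat\mu_C)$ are the SEs of the estimated RMSTs at $t^*$ (arguments omitted for brevity). These may be obtained via Monte Carlo simulation for the Cox TVC model, and via bootstrapping using the Stata package strmst for FPM [26].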
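Given RMST estimates and their SEs for the two arms (however they were obtained), the delta-method SE of the LER and a Wald-type interval can be computed as in the sketch below. The normal-approximation interval and the test of LER = 1 on the ratio scale are our assumptions for illustration; a log-scale interval is a common alternative, and the software used in this study may construct intervals differently. All numbers are hypothetical.

```python
import math
from statistics import NormalDist


def ler_inference(mu_e: float, se_e: float, mu_c: float, se_c: float, level: float = 0.95):
    """LER with delta-method SE, Wald CI and two-sided p-value for H0: LER = 1."""
    ler = mu_e / mu_c
    se = ler * math.sqrt((se_e / mu_e) ** 2 + (se_c / mu_c) ** 2)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(ler - 1.0) / se))
    return ler, se, (ler - z_crit * se, ler + z_crit * se), p_value


# Hypothetical RMSTs (years) and SEs at t* = 5.
ler, se, ci, p = ler_inference(mu_e=4.1, se_e=0.08, mu_c=3.6, se_c=0.10)
print(f"LER = {ler:.3f}, SE = {se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), p = {p:.4f}")
```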
The LED is an absolute measure of the effect on life expectancy, expressed in units of time [13, 30]. It may be expressed as:

$$\mathrm{LED}(t^*) = \mu_E(t^*) - \mu_C(t^*) \qquad (9)$$

where $\mu_E(t^*)$ and $\mu_C(t^*)$ are the RMSTs at time $t^*$ for the E and C arms, respectively. For the Cox TVC model, $\mu_E(t^*)$ and $\mu_C(t^*)$ can be estimated by integrating the survival function in Eq. (6) from 0 to $t^*$.
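The standard error (SE) of LED may be estimated using the following formula [30]:

$$\mathrm{SE}(\widehat{\mathrm{LED}}) = \sqrt{\mathrm{SE}(\hat\mu_E)^2 + \mathrm{SE}(\hat\mu_C)^2} \qquad (10)$$

where $\mathrm{SE}(\hat\mu_E)$ and $\mathrm{SE}(\hat\mu_C)$ are the SEs of the RMST in arms E and C, respectively.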
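The LED and its SE follow directly from the two arm-specific RMSTs; the formula has no covariance term, which presumes independent estimates in the two arms. The sketch below adds a Wald interval and a test of LED = 0, which are our illustrative assumptions; the inputs are hypothetical.

```python
import math
from statistics import NormalDist


def led_inference(mu_e: float, se_e: float, mu_c: float, se_c: float, level: float = 0.95):
    """LED with SE for independent arms, Wald CI and two-sided p-value for H0: LED = 0."""
    led = mu_e - mu_c
    se = math.sqrt(se_e ** 2 + se_c ** 2)  # no covariance term: arms assumed independent
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(led) / se))
    return led, se, (led - z_crit * se, led + z_crit * se), p_value


led, se, ci, p = led_inference(mu_e=4.1, se_e=0.08, mu_c=3.6, se_c=0.10)
print(f"LED = {led:.3f} years, SE = {se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), p = {p:.4f}")
```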
Simulation study
Simulation study design
We considered seven scenarios in which the survival time $t$ follows a piecewise exponential distribution, which is flexible enough to accommodate a wide range of survival patterns [16, 22]. These scenarios include survival curves under PH, survival curves that overlap initially and diverge thereafter, and survival curves that cross. Defining change points $0 = \tau_0 < \tau_1 < \cdots < \tau_m = \infty$, the hazard function of the piecewise exponential distribution is:

$$h_g(t) = \sum_{l=1}^{m} \lambda_{gl}\, I(\tau_{l-1} \le t < \tau_l) \qquad (11)$$

where $g = E, C$ indicates the experimental and control treatment, respectively, $\lambda_{gl}$ is the constant hazard of arm $g$ in the $l$th interval, and $I(\tau_{l-1} \le t < \tau_l)$ is an indicator function equal to 1 if $t$ lies in the interval $[\tau_{l-1}, \tau_l)$ and 0 otherwise. The survival function at time $t$ is:

$$S_g(t) = \exp\Big\{-\sum_{l=1}^{m} \lambda_{gl}\,\big[\min(t, \tau_l) - \tau_{l-1}\big]_+\Big\} \qquad (12)$$

where $[u]_+ = \max(u, 0)$.
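For a truncation time $t^*$ with $\tau_{k-1} \le t^* < \tau_k$, the true RMST for each group can be calculated as:

$$\mu_g(t^*) = \int_0^{t^*} S_g(t)\,dt = \sum_{l=1}^{k} \frac{S_g(\tau_{l-1})}{\lambda_{gl}}\Big[1 - \exp\big\{-\lambda_{gl}\big(\min(\tau_l, t^*) - \tau_{l-1}\big)\big\}\Big] \qquad (13)$$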
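The piecewise exponential survival function and its closed-form RMST translate directly into code, which is handy for computing the true LER and LED used as benchmarks. In the sketch below the interior change points are passed as `cuts` = [τ_1, ..., τ_{m-1}] (with τ_0 = 0 and the last interval open-ended). The control hazards in the usage line are hypothetical and are not the values used to generate Tables 1 to 3; the function names are ours.

```python
import math
from typing import Sequence


def pwe_survival(t: float, cuts: Sequence[float], rates: Sequence[float]) -> float:
    """Survival of a piecewise exponential with interior change points `cuts`."""
    bounds = [0.0, *cuts, math.inf]
    cum_hazard = 0.0
    for l, lam in enumerate(rates):
        lo, hi = bounds[l], bounds[l + 1]
        if t <= lo:
            break
        cum_hazard += lam * (min(t, hi) - lo)
    return math.exp(-cum_hazard)


def pwe_rmst(t_star: float, cuts: Sequence[float], rates: Sequence[float]) -> float:
    """Closed-form RMST up to t_star, summed interval by interval; rates must be positive."""
    bounds = [0.0, *cuts, math.inf]
    total = 0.0
    for l, lam in enumerate(rates):
        lo, hi = bounds[l], bounds[l + 1]
        if t_star <= lo:
            break
        width = min(hi, t_star) - lo
        total += pwe_survival(lo, cuts, rates) / lam * (1.0 - math.exp(-lam * width))
    return total


def true_ler_led(t_star, cuts, base_rates, hrs):
    """True LER and LED when the experimental hazard is HR_l times the control hazard."""
    exp_rates = [hr * lam for hr, lam in zip(hrs, base_rates)]
    mu_e, mu_c = pwe_rmst(t_star, cuts, exp_rates), pwe_rmst(t_star, cuts, base_rates)
    return mu_e / mu_c, mu_e - mu_c


# Scenario-5-like pattern (HR = 1, 0.5, 0.3) with HYPOTHETICAL control hazards.
print(true_ler_led(5.0, cuts=[1.0, 3.0], base_rates=[0.2, 0.3, 0.3], hrs=[1.0, 0.5, 0.3]))
```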
True LER and LED can then be calculated from Eqs. (7) and (9) using the true RMST of each arm. In our simulation, the censoring time $C$ was assumed to follow a uniform distribution $U(0, c_{\max})$, where $c_{\max}$ was chosen to achieve the desired censoring proportion, and the observed time was $\min(t, C)$. By tuning $c_{\max}$, censoring rates of approximately 26%, 47% and 60% were obtained, representing low, moderate and heavy censoring, respectively. To simulate a trial with equal allocation to the experimental and control treatments, treatment assignment was generated from a Binomial(n, 0.5) distribution.
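The data-generating steps above can be sketched as follows. Survival times are drawn by inverting the piecewise exponential cumulative hazard, allocation is a fair coin per patient (so the size of the experimental arm is Binomial(n, 0.5)), and censoring is Uniform(0, c_max). How c_max was tuned is not described here, so the bisection search is our assumption, as are the hazard values and function names.

```python
import math
import random
from typing import List, Sequence, Tuple


def draw_pwe_time(rng: random.Random, cuts: Sequence[float], rates: Sequence[float]) -> float:
    """Inverse-transform draw: solve H(t) = E with E ~ Exp(1); rates must be positive."""
    remaining = -math.log(1.0 - rng.random())  # 1 - U lies in (0, 1], so log is safe
    bounds = [0.0, *cuts, math.inf]
    for l, lam in enumerate(rates):
        lo, hi = bounds[l], bounds[l + 1]
        seg = lam * (hi - lo)  # cumulative hazard over the whole interval (inf for the last)
        if remaining < seg:
            return lo + remaining / lam
        remaining -= seg
    raise ValueError("rates must be positive")


def simulate_trial(n, cuts, rates_c, rates_e, c_max, seed=2025) -> List[Tuple[float, int, int]]:
    """Rows of (observed time, event indicator, arm) with Uniform(0, c_max) censoring."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)  # 1 = experimental; arm total ~ Binomial(n, 0.5)
        t = draw_pwe_time(rng, cuts, rates_e if z else rates_c)
        c = rng.uniform(0.0, c_max)
        rows.append((min(t, c), int(t <= c), z))
    return rows


def censoring_rate(rows) -> float:
    return 1.0 - sum(event for _, event, _ in rows) / len(rows)


def tune_c_max(target, n, cuts, rates_c, rates_e, lo=0.01, hi=100.0, iters=40, seed=2025):
    """Bisection on c_max; with a fixed seed, censoring falls monotonically as c_max grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if censoring_rate(simulate_trial(n, cuts, rates_c, rates_e, mid, seed)) > target:
            lo = mid  # too much censoring: lengthen follow-up
        else:
            hi = mid
    return 0.5 * (lo + hi)


cuts, rates_c = [1.0, 3.0], [0.2, 0.3, 0.3]  # hypothetical control hazards
rates_e = [hr * lam for hr, lam in zip([1.0, 0.5, 0.3], rates_c)]  # Scenario-5-like HRs
c_max = tune_c_max(0.47, 2000, cuts, rates_c, rates_e)
rows = simulate_trial(500, cuts, rates_c, rates_e, c_max, seed=7)
print(round(c_max, 2), round(censoring_rate(rows), 3))
```

Fixing the seed while tuning keeps the uniform draws identical across candidate values of c_max, so the censoring proportion is a monotone step function of c_max and bisection is well defined.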
Simulation settings assuming Piecewise exponential distribution
The following seven simulation scenarios were constructed under both PH and non-PH assumptions.
- i. Proportional hazards (no change point):
  - Scenario 1: HR = 0.5, depicting a beneficial effect of treatment.
  - Scenario 2: HR = 1, depicting no effect of treatment.
  - Scenario 3: HR = 1.5, depicting a detrimental effect of treatment.
- ii. Survival curves diverge (change points at t = 1 and 3):
  - Scenario 4: HR = 1, 1.9 and 3 for the three time intervals [0, 1), [1, 3) and [3, ∞), indicating a detrimental effect of treatment as the survival curves diverge.
  - Scenario 5: HR = 1, 0.5 and 0.3 for the three time intervals, indicating an increasing beneficial effect of treatment in the later time period.
- iii. Survival curves cross (change points at t = 1 and 3):
  - Scenario 6: HR = 0.7, 1.5 and 3.5 for the three time intervals, indicating a beneficial effect of treatment at the beginning and a detrimental effect thereafter.
  - Scenario 7: HR = 1.5, 0.5 and 0.29 for the three time intervals, indicating a detrimental effect of treatment at the beginning and a beneficial effect thereafter.
Small and moderate sample sizes of 200 and 500 were considered with 1000 replications generated for each scenario.
Performance measure
For each approach and scenario, the treatment effect was quantified at pre-specified truncation times $t^*$ = 3 and 5 years, corresponding to mid- and late-term treatment effects in oncology trials. The LER and LED estimates from the FPM(3,1) and Cox TVC models were evaluated in terms of bias, 95% coverage probability, power and Type I error.
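The performance measures listed above can be computed from the replication-level estimates and SEs as in the sketch below. It assumes Wald 95% intervals (estimate ± 1.96 SE) for both coverage and testing, with null values of 1 for LER and 0 for LED; this is our illustrative choice and may not match exactly how intervals were formed for each method in the study. The function name and inputs are hypothetical.

```python
from statistics import NormalDist, mean
from typing import Sequence


def performance(estimates: Sequence[float], ses: Sequence[float], true_value: float,
                null_value: float, level: float = 0.95) -> dict:
    """Bias, Wald-interval coverage and rejection rate across simulation replications.

    The rejection rate is the empirical power when true_value != null_value,
    and the empirical Type I error when they coincide.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    pairs = list(zip(estimates, ses))
    bias = mean(estimates) - true_value
    coverage = mean(float(abs(est - true_value) <= z_crit * se) for est, se in pairs)
    rejection = mean(float(abs(est - null_value) > z_crit * se) for est, se in pairs)
    return {"bias": bias, "coverage": coverage, "rejection_rate": rejection}


# Hypothetical LED replications (null value 0) for a scenario with true LED = 0.2.
print(performance([0.25, 0.18, 0.05, 0.31], [0.08, 0.07, 0.09, 0.08], true_value=0.2, null_value=0.0))
```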
Simulation results
Performance under PH assumption
Figure 1 shows that the magnitude of bias of the LER estimated using Cox TVC was consistently lower than that of FPM(3,1), regardless of censoring rate, for n = 500. When HR = 0.5 under PH (Scenario 1), both methods achieved the nominal 95% coverage probability for LER and LED estimates and showed comparable power (Fig. 1, Table 1). In Scenario 2 (HR = 1), both models showed similar coverage probabilities, whereas Cox TVC had a lower Type I error for both n = 200 and 500. Similar patterns of coverage probability and power were observed for the two models when HR = 1.5 for both sample sizes (Scenario 3).
Fig. 1.
Bias, 95% coverage probability, power and Type I error of FPM(3,1) and Cox TVC estimates of LER under PH assumption (Scenarios 1 to 3) for HR = 0.5, 1 and 1.5, with low, moderate and high censoring, n = 500 and t* = 3 and 5
Table 1.
Bias, 95% coverage probability and power of FPM(3,1) and Cox TVC estimates of LED under PH assumption (Scenarios 1 to 3) for HR = 0.5, 1 and 1.5, with low, moderate and high censoring, n = 200 and 500, and t* = 3 and 5
S1, S2 and S3 denote Scenario 1 (HR = 0.5), Scenario 2 (HR = 1) and Scenario 3 (HR = 1.5).

| n | Censoring | Method | Time (Year) | S1 True LED | S1 Bias | S1 Coverage | S1 Power | S2 True LED | S2 Bias | S2 Coverage | S2 Type I Error | S3 True LED | S3 Bias | S3 Coverage | S3 Power |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | Low | FPM | 3 | 0.391 | -0.004 | 0.948 | 0.999 | 0 | 0.007 | 0.947 | 0.053 | -0.254 | 0.007 | 0.945 | 0.409 |
| | | | 5 | 0.864 | -0.003 | 0.950 | 1.000 | 0 | 0.012 | 0.948 | 0.052 | -0.542 | 0.012 | 0.950 | 0.543 |
| | | Cox TVC | 3 | 0.391 | -0.004 | 0.966 | 0.999 | 0 | 0.007 | 0.966 | 0.034 | -0.254 | 0.007 | 0.956 | 0.380 |
| | | | 5 | 0.864 | -0.002 | 0.958 | 1.000 | 0 | 0.012 | 0.954 | 0.046 | -0.542 | 0.010 | 0.954 | 0.499 |
| | Moderate | FPM | 3 | 0.391 | -0.001 | 0.950 | 1.000 | 0 | 0.006 | 0.932 | 0.068 | -0.254 | 0.008 | 0.944 | 0.399 |
| | | | 5 | 0.864 | -0.001 | 0.953 | 1.000 | 0 | 0.010 | 0.945 | 0.055 | -0.542 | 0.011 | 0.946 | 0.514 |
| | | Cox TVC | 3 | 0.391 | -0.002 | 0.961 | 0.997 | 0 | 0.008 | 0.944 | 0.056 | -0.254 | 0.007 | 0.949 | 0.365 |
| | | | 5 | 0.864 | -0.001 | 0.959 | 1.000 | 0 | 0.011 | 0.946 | 0.054 | -0.542 | 0.010 | 0.954 | 0.483 |
| | High | FPM | 3 | 0.391 | -0.002 | 0.950 | 0.994 | 0 | 0.008 | 0.939 | 0.061 | -0.254 | 0.008 | 0.940 | 0.374 |
| | | | 5 | 0.864 | -0.004 | 0.949 | 0.998 | 0 | 0.017 | 0.940 | 0.060 | -0.542 | 0.016 | 0.942 | 0.438 |
| | | Cox TVC | 3 | 0.391 | -0.004 | 0.953 | 0.987 | 0 | 0.009 | 0.941 | 0.059 | -0.254 | 0.007 | 0.946 | 0.327 |
| | | | 5 | 0.864 | -0.003 | 0.956 | 0.998 | 0 | 0.017 | 0.948 | 0.052 | -0.542 | 0.013 | 0.945 | 0.417 |
| 500 | Low | FPM | 3 | 0.391 | -0.003 | 0.948 | 0.999 | 0 | -0.004 | 0.948 | 0.052 | -0.254 | -0.004 | 0.949 | 0.823 |
| | | | 5 | 0.864 | -0.003 | 0.950 | 1.000 | 0 | -0.006 | 0.949 | 0.051 | -0.542 | -0.007 | 0.948 | 0.932 |
| | | Cox TVC | 3 | 0.391 | -0.004 | 0.966 | 0.999 | 0 | -0.004 | 0.962 | 0.038 | -0.254 | -0.001 | 0.970 | 0.828 |
| | | | 5 | 0.864 | -0.003 | 0.958 | 1.000 | 0 | -0.007 | 0.959 | 0.041 | -0.542 | -0.001 | 0.959 | 0.911 |
| | Moderate | FPM | 3 | 0.391 | -0.003 | 0.950 | 1.000 | 0 | -0.004 | 0.946 | 0.054 | -0.254 | -0.004 | 0.947 | 0.799 |
| | | | 5 | 0.864 | -0.004 | 0.957 | 1.000 | 0 | -0.009 | 0.939 | 0.061 | -0.542 | -0.007 | 0.940 | 0.907 |
| | | Cox TVC | 3 | 0.391 | -0.003 | 0.961 | 0.997 | 0 | -0.004 | 0.953 | 0.047 | -0.254 | -0.002 | 0.948 | 0.763 |
| | | | 5 | 0.864 | -0.006 | 0.959 | 1.000 | 0 | -0.008 | 0.954 | 0.046 | -0.542 | -0.001 | 0.950 | 0.883 |
| | High | FPM | 3 | 0.391 | -0.003 | 0.950 | 0.994 | 0 | -0.002 | 0.952 | 0.048 | -0.254 | -0.004 | 0.949 | 0.761 |
| | | | 5 | 0.864 | -0.004 | 0.949 | 0.998 | 0 | -0.002 | 0.944 | 0.056 | -0.542 | -0.005 | 0.952 | 0.831 |
| | | Cox TVC | 3 | 0.391 | -0.003 | 0.952 | 0.987 | 0 | -0.001 | 0.959 | 0.041 | -0.254 | -0.001 | 0.950 | 0.728 |
| | | | 5 | 0.864 | -0.003 | 0.954 | 0.998 | 0 | -0.002 | 0.951 | 0.049 | -0.542 | 0.001 | 0.957 | 0.825 |
Performance when survival curves diverge
Figure 2, Table 2 and Supplementary Table 2 show the operating characteristics of both approaches when the survival curves diverge. In Scenario 4 (HR = 1.9 for t ∈ [1, 3) and HR = 3 for t ≥ 3), the absolute bias was slightly smaller for Cox TVC than for FPM(3,1) for n = 200 and 500. The coverage of FPM(3,1) was lower than 95% in this scenario, whereas that of Cox TVC exceeded 95% under moderate and high censoring. In Scenario 5 (HR = 0.5 for t ∈ [1, 3) and HR = 0.3 for t ≥ 3), the bias of LER and LED based on Cox TVC was generally lower (Fig. 2, Table 2). As in Scenario 4, Cox TVC maintained 95% coverage for both estimates under moderate and high censoring. Notably, both methods appeared to perform better under high censoring than under low or moderate censoring with respect to coverage and bias for n = 200 and 500. In Scenarios 4 and 5, Cox TVC generally exhibited lower power, except under low censoring when n = 500.
Fig. 2.
Bias, 95% coverage probability and power of FPM(3,1) and Cox TVC estimates of LER when survival curves diverge, assuming HR = 1.9 for t ∈ [1, 3), HR = 3 for t ≥ 3 (Scenario 4), and HR = 0.5 for t ∈ [1, 3), HR = 0.3 for t ≥ 3 (Scenario 5), with low, moderate and high censoring, n = 500
Table 2.
Bias, 95% coverage probability and power of FPM(3,1) and Cox TVC estimates of LED when survival curves diverge, assuming HR = 1.9 for t ∈ [1, 3), HR = 3 for t ≥ 3 (Scenario 4), and HR = 0.5 for t ∈ [1, 3), HR = 0.3 for t ≥ 3 (Scenario 5), with low, moderate and high censoring, n = 200 and 500
S4: Scenario 4 (HR = 1.9 for t ∈ [1, 3), HR = 3 for t ≥ 3); S5: Scenario 5 (HR = 0.5 for t ∈ [1, 3), HR = 0.3 for t ≥ 3).

| Sample size | Censoring rate | Method | Time (Year) | S4 True LED | S4 Bias | S4 Coverage probability | S4 Power | S5 True LED | S5 Bias | S5 Coverage probability | S5 Power |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | Low | FPM | 3 | -0.179 | -0.033 | 0.950 | 0.363 | 0.179 | 0.035 | 0.933 | 0.363 |
| | | | 5 | -0.659 | -0.036 | 0.957 | 0.834 | 0.659 | 0.040 | 0.937 | 0.837 |
| | | Cox TVC | 3 | -0.179 | -0.033 | 0.964 | 0.330 | 0.179 | 0.022 | 0.961 | 0.314 |
| | | | 5 | -0.659 | 0.020 | 0.968 | 0.788 | 0.659 | -0.018 | 0.959 | 0.755 |
| | Moderate | FPM | 3 | -0.179 | -0.036 | 0.948 | 0.354 | 0.179 | 0.036 | 0.931 | 0.350 |
| | | | 5 | -0.659 | -0.032 | 0.955 | 0.775 | 0.659 | 0.032 | 0.934 | 0.765 |
| | | Cox TVC | 3 | -0.179 | -0.005 | 0.965 | 0.262 | 0.179 | 0.007 | 0.968 | 0.267 |
| | | | 5 | -0.659 | 0.011 | 0.961 | 0.692 | 0.659 | -0.015 | 0.954 | 0.682 |
| | High | FPM | 3 | -0.179 | -0.027 | 0.946 | 0.319 | 0.179 | 0.032 | 0.946 | 0.327 |
| | | | 5 | -0.659 | 0.017 | 0.953 | 0.601 | 0.659 | 0.005 | 0.945 | 0.640 |
| | | Cox TVC | 3 | -0.179 | 0.008 | 0.964 | 0.275 | 0.179 | -0.004 | 0.961 | 0.281 |
| | | | 5 | -0.659 | 0.014 | 0.957 | 0.588 | 0.659 | -0.004 | 0.958 | 0.614 |
| 500 | Low | FPM | 3 | -0.179 | -0.036 | 0.931 | 0.748 | 0.179 | 0.037 | 0.933 | 0.731 |
| | | | 5 | -0.659 | -0.035 | 0.945 | 0.985 | 0.659 | 0.039 | 0.950 | 0.992 |
| | | Cox TVC | 3 | -0.179 | -0.030 | 0.939 | 0.780 | 0.179 | 0.025 | 0.938 | 0.770 |
| | | | 5 | -0.659 | 0.003 | 0.957 | 0.989 | 0.659 | -0.001 | 0.958 | 0.993 |
| | Moderate | FPM | 3 | -0.179 | -0.038 | 0.935 | 0.722 | 0.179 | 0.039 | 0.928 | 0.712 |
| | | | 5 | -0.659 | -0.028 | 0.947 | 0.990 | 0.659 | 0.035 | 0.955 | 0.995 |
| | | Cox TVC | 3 | -0.179 | -0.010 | 0.955 | 0.655 | 0.179 | 0.012 | 0.949 | 0.635 |
| | | | 5 | -0.659 | 0.008 | 0.958 | 0.980 | 0.659 | -0.022 | 0.960 | 0.982 |
| | High | FPM | 3 | -0.179 | -0.030 | 0.945 | 0.641 | 0.179 | 0.033 | 0.945 | 0.655 |
| | | | 5 | -0.659 | 0.014 | 0.950 | 0.952 | 0.659 | -0.004 | 0.941 | 0.959 |
| | | Cox TVC | 3 | -0.179 | 0.004 | 0.951 | 0.580 | 0.179 | -0.003 | 0.956 | 0.565 |
| | | | 5 | -0.659 | 0.009 | 0.965 | 0.955 | 0.659 | -0.003 | 0.951 | 0.961 |
Performance when survival curves cross
For Scenarios 6 and 7, in which the survival curves cross, the absolute bias of both LER and LED estimates was slightly higher for FPM(3,1) than for Cox TVC under moderate and high censoring for n = 200 and 500 (Fig. 3, Table 3, Supplementary Table 3). In Scenario 6 (HR = 1.5 for t ∈ [1, 3) and HR = 3.5 for t ≥ 3), coverage of LER and LED estimated using FPM(3,1) was generally lower under moderate and high censoring. In Scenario 7 (HR = 0.5 for t ∈ [1, 3) and HR = 0.29 for t ≥ 3), the two methods showed comparable coverage probabilities regardless of censoring. Cox TVC had higher power in Scenarios 6 and 7 under low censoring when n = 500 (Fig. 3, Table 3).
Fig. 3.
Bias, 95% coverage probability and power of FPM(3,1) and Cox TVC estimates of LER when survival curves cross, assuming HR = 1.5 for t ∈ [1, 3), HR = 3.5 for t ≥ 3 (Scenario 6), and HR = 0.5 for t ∈ [1, 3), HR = 0.29 for t ≥ 3 (Scenario 7), with low, moderate and high censoring, n = 500
Table 3.
Bias, 95% coverage probability and power of FPM(3,1) and Cox TVC estimates of LED when survival curves cross, assuming HR = 1.5 for t ∈ [1, 3), HR = 3.5 for t ≥ 3 (Scenario 6), and HR = 0.5 for t ∈ [1, 3), HR = 0.29 for t ≥ 3 (Scenario 7), with low, moderate and high censoring, n = 200 and 500
S6: Scenario 6 (HR = 1.5 for t ∈ [1, 3), HR = 3.5 for t ≥ 3); S7: Scenario 7 (HR = 0.5 for t ∈ [1, 3), HR = 0.29 for t ≥ 3).

| Sample size | Censoring rate | Method | Time (Year) | S6 True LED | S6 Bias | S6 Coverage probability | S6 Power | S7 True LED | S7 Bias | S7 Coverage probability | S7 Power |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | Low | FPM | 3 | -0.029 | -0.049 | 0.904 | 0.141 | 0.125 | 0.026 | 0.926 | 0.251 |
| | | | 5 | -0.502 | -0.040 | 0.930 | 0.702 | 0.663 | 0.015 | 0.944 | 0.855 |
| | | Cox TVC | 3 | -0.029 | -0.045 | 0.947 | 0.108 | 0.125 | 0.028 | 0.945 | 0.287 |
| | | | 5 | -0.502 | -0.007 | 0.957 | 0.626 | 0.663 | -0.008 | 0.951 | 0.832 |
| | Moderate | FPM | 3 | -0.029 | -0.052 | 0.910 | 0.136 | 0.125 | 0.026 | 0.925 | 0.241 |
| | | | 5 | -0.502 | -0.054 | 0.932 | 0.666 | 0.663 | 0.022 | 0.935 | 0.811 |
| | | Cox TVC | 3 | -0.029 | -0.028 | 0.957 | 0.081 | 0.125 | 0.025 | 0.955 | 0.202 |
| | | | 5 | -0.502 | 0.029 | 0.954 | 0.588 | 0.663 | -0.021 | 0.947 | 0.761 |
| | High | FPM | 3 | -0.029 | -0.053 | 0.911 | 0.122 | 0.125 | 0.025 | 0.934 | 0.208 |
| | | | 5 | -0.502 | -0.027 | 0.945 | 0.574 | 0.663 | 0.013 | 0.942 | 0.716 |
| | | Cox TVC | 3 | -0.029 | -0.002 | 0.961 | 0.075 | 0.125 | -0.003 | 0.961 | 0.159 |
| | | | 5 | -0.502 | 0.023 | 0.956 | 0.496 | 0.663 | -0.012 | 0.954 | 0.667 |
| 500 | Low | FPM | 3 | -0.029 | -0.054 | 0.873 | 0.244 | 0.125 | 0.027 | 0.939 | 0.509 |
| | | | 5 | -0.502 | -0.041 | 0.954 | 0.974 | 0.663 | 0.010 | 0.960 | 0.997 |
| | | Cox TVC | 3 | -0.029 | -0.052 | 0.894 | 0.332 | 0.125 | 0.029 | 0.945 | 0.752 |
| | | | 5 | -0.502 | -0.030 | 0.959 | 0.977 | 0.663 | 0.008 | 0.968 | 0.998 |
| | Moderate | FPM | 3 | -0.029 | -0.056 | 0.862 | 0.236 | 0.125 | 0.027 | 0.942 | 0.481 |
| | | | 5 | -0.502 | -0.051 | 0.933 | 0.962 | 0.663 | 0.018 | 0.961 | 0.997 |
| | | Cox TVC | 3 | -0.029 | -0.036 | 0.934 | 0.158 | 0.125 | 0.026 | 0.945 | 0.501 |
| | | | 5 | -0.502 | 0.020 | 0.951 | 0.950 | 0.663 | -0.017 | 0.968 | 0.994 |
| | High | FPM | 3 | -0.029 | -0.059 | 0.866 | 0.233 | 0.125 | 0.025 | 0.941 | 0.444 |
| | | | 5 | -0.502 | -0.029 | 0.952 | 0.924 | 0.663 | 0.009 | 0.952 | 0.982 |
| | | Cox TVC | 3 | -0.029 | -0.009 | 0.957 | 0.133 | 0.125 | -0.003 | 0.954 | 0.396 |
| | | | 5 | -0.502 | 0.024 | 0.959 | 0.893 | 0.663 | -0.007 | 0.958 | 0.981 |
Sensitivity analysis
A sensitivity analysis was conducted by specifying the time-varying coefficient as a function of ln(t), denoted as Cox TVCln(t) (Supplementary Tables 4 and 5). Under the PH assumption, Cox TVCln(t) exhibited higher 95% coverage and lower Type I error than FPM(3,1).
In scenarios where the survival curves diverge (Scenarios 4 and 5), Cox TVCln(t) generally achieved 95% coverage, and the two methods showed comparable power under low censoring when n = 500. The bias of Cox TVCln(t) was generally lower than that of FPM(3,1). In scenarios where the survival curves cross (Scenarios 6 and 7), coverage of LER and LED was generally lower when estimated using FPM(3,1), and the absolute bias of both estimates was slightly higher for FPM(3,1) than for Cox TVCln(t). Comparable power was observed for both methods under low censoring when n = 500. Overall, modelling the TVC as a function of ln(t) did not alter the primary conclusions or the decision-making table.
Illustration using SQNP01 study
The SQNP01 trial was conducted between September 1997 and May 2003 and aimed to confirm the OS benefit of adding chemotherapy to radiotherapy and to assess its applicability to patients with endemic nasopharyngeal cancer [19]. As shown in Fig. 4, the Kaplan–Meier (KM) OS curves diverged over time, suggesting an increasing beneficial effect of treatment. The Schoenfeld residuals test indicated non-proportional hazards for the treatment variable (p = 0.011), and the pattern of non-proportionality in this data set resembled that of Scenario 5. The survival probabilities estimated by Cox TVC and Cox TVCln(t) were both comparable to KM (Fig. 4). Survival probabilities estimated by FPM(3,1) for the RT arm were slightly lower than KM between 2.5 and 5.3 years, and higher thereafter. All three models showed LER estimates increasing over time (Table 4). Consistent with Fig. 4, LER estimates from FPM(3,1) were slightly larger than those from Cox TVC, especially beyond 3 years, with correspondingly smaller p-values. LED estimates showed a similar increase over time, with generally smaller LED estimates for Cox TVC than for FPM(3,1). As in the simulation studies, the SQNP01 application showed that the Cox TVC method was robust to whether the function of time was t or ln(t).
Figure 5 displays the HR as a function of time, with 95% confidence intervals, for t ranging from 0 to 6 years. Of note, significance was detected from Year 2 onwards when the treatment effect was quantified as an HR (HR = 0.48, 95% CI: 0.28–0.81, p = 0.006).
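Curves such as the HR-over-time plot follow from the Cox TVC specification: the log HR at time t is β + γt (or β + γ ln t for Cox TVCln(t)), and its variance combines Var(β̂), Var(γ̂) and Cov(β̂, γ̂). The sketch below assumes a Wald interval on the log-HR scale; the function name and all inputs are hypothetical and do not reproduce the SQNP01 estimates, and with `g=math.log` the function must not be called at t = 0.

```python
import math
from statistics import NormalDist


def hr_at(t, beta, gamma, var_beta, var_gamma, cov_bg, g=lambda u: u, level=0.95):
    """HR(t) = exp(beta + gamma*g(t)) with a Wald CI and p-value on the log-HR scale."""
    gt = g(t)
    log_hr = beta + gamma * gt
    # Var(beta + gamma*g) = Var(beta) + g^2 Var(gamma) + 2 g Cov(beta, gamma)
    se = math.sqrt(var_beta + gt ** 2 * var_gamma + 2 * gt * cov_bg)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(log_hr) / se))
    return math.exp(log_hr), math.exp(log_hr - z_crit * se), math.exp(log_hr + z_crit * se), p_value


# Hypothetical coefficients and (co)variances, not the SQNP01 estimates.
for year in range(7):
    hr, lo, hi, p = hr_at(year, beta=0.2, gamma=-0.25, var_beta=0.06, var_gamma=0.01, cov_bg=-0.02)
    print(f"t = {year}: HR = {hr:.2f} ({lo:.2f}-{hi:.2f}), p = {p:.3f}")
```

The covariance term matters: because β̂ and γ̂ are usually correlated, ignoring Cov(β̂, γ̂) would misstate the width of the interval at most time points.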
Fig. 4.
Overall survival curves of Kaplan–Meier, Cox TVC, Cox TVCln(t) and FPM(3,1) models (data from Wee, et al. [19])
Table 4.
LER and LED estimates of FPM(3,1), Cox TVC and Cox TVCln(t) models (data from Wee, et al. [19])
| Time (Year) | LER, FPM(3,1): Estimate (95% CI) | p | LER, Cox TVC: Estimate (95% CI) | p | LER, Cox TVCln(t): Estimate (95% CI) | p | LED, FPM(3,1): Estimate (95% CI) | p | LED, Cox TVC: Estimate (95% CI) | p | LED, Cox TVCln(t): Estimate (95% CI) | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.99 (0.95–1.02) | 0.443 | 1.00 (0.95–1.05) | 0.877 | 0.99 (0.93–1.05) | 0.686 | -0.01 (-0.05–0.02) | 0.443 | -0.004 (-0.05–0.04) | 0.877 | -0.01 (-0.07–0.05) | 0.686 |
| 2 | 1.00 (0.95–1.06) | 0.886 | 1.01 (0.94–1.08) | 0.828 | 1.01 (0.94–1.08) | 0.823 | 0.01 (-0.10–0.11) | 0.886 | 0.01 (-0.11–0.13) | 0.828 | 0.01 (-0.11–0.14) | 0.823 |
| 3 | 1.04 (0.97–1.13) | 0.285 | 1.04 (0.96–1.12) | 0.348 | 1.05 (0.97–1.13) | 0.251 | 0.11 (-0.09–0.31) | 0.285 | 0.10 (-0.11–0.30) | 0.348 | 0.12 (-0.08–0.32) | 0.251 |
| 4 | 1.10 (0.99–1.21) | 0.055 | 1.09 (0.98–1.20) | 0.098 | 1.10 (0.996–1.21) | 0.060 | 0.31 (-0.01–0.62) | 0.055 | 0.27 (-0.05–0.59) | 0.098 | 0.30 (-0.01–0.62) | 0.060 |
| 5 | 1.16 (1.04–1.30) | 0.008 | 1.15 (1.03–1.29) | 0.016 | 1.16 (1.03–1.30) | 0.011 | 0.58 (0.15–1.01) | 0.008 | 0.54 (0.10–0.98) | 0.015 | 0.56 (0.13–0.996) | 0.011 |
| 6 | 1.24 (1.08–1.41) | 0.002 | 1.23 (1.08–1.41) | 0.003 | 1.23 (1.07–1.40) | 0.003 | 0.93 (0.36–1.50) | 0.001 | 0.91 (0.32–1.50) | 0.002 | 0.88 (0.30–1.46) | 0.003 |
Fig. 5.
Estimation of the HR over time, with the corresponding 95% confidence interval and p-value (data from Wee, et al. [19])
Discussion
In this paper, we rigorously examined the application of two statistical approaches, the FPM(3,1) and Cox TVC models, for estimating LER and LED under non-PH. While FPM(3,1) is more commonly used, we highlighted the practical utility of the Cox TVC model as a robust alternative. Comprehensive simulations were conducted for both PH and non-PH scenarios. Under non-PH, the Cox TVC model performed better than FPM(3,1) in terms of magnitude of bias and 95% coverage probability, and it also had a lower Type I error in the null scenario. FPM(3,1) generally exhibited marginally higher power, except when the survival curves crossed or diverged under low censoring.
The RMST-based method has been recommended for analysing time-to-event outcomes with non-PH treatment effects because, when estimated nonparametrically from the KM curve, it does not require any model assumptions. However, the choice of t* in situations where survival curves cross can lead to a loss of power and increased bias [20, 31]. Lin et al. [31] showed that RMST estimated by the KM method may perform poorly, owing to a loss of power, in non-PH situations with crossing hazards. This is consistent with Scenario 6 of our simulation, in which the coverage probabilities of LER and LED from FPM(3,1) were lower, and their absolute bias higher, than those of Cox TVC.
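For reference, the nonparametric RMST mentioned above is the area under the KM curve from 0 to t*, and LER and LED are the ratio and difference of the two arms' RMSTs. The Python sketch below illustrates that definition only, assuming NumPy and right-censored data coded as (time, event); it is not the FPM(3,1)-based estimator used in this study, and the function names `km_curve`, `rmst` and `ler_led` are hypothetical.

```python
import numpy as np

def km_curve(time, event):
    """Kaplan-Meier estimate: distinct event times and S(t) just after each of them."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for u in event_times:
        n_at_risk = np.sum(time >= u)                   # still under observation at u
        n_events = np.sum((time == u) & (event == 1))   # events occurring at u
        s *= 1.0 - n_events / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)

def rmst(time, event, tstar):
    """RMST up to tstar: area under the KM step function on [0, tstar]."""
    t, s = km_curve(time, event)
    before = t < tstar
    grid = np.concatenate(([0.0], t[before], [tstar]))  # interval boundaries
    level = np.concatenate(([1.0], s[before]))           # S(t) on each interval
    return float(np.sum(level * np.diff(grid)))

def ler_led(arm1, arm0, tstar):
    """LER = RMST1 / RMST0 and LED = RMST1 - RMST0; each arm is a (time, event) pair."""
    r1, r0 = rmst(*arm1, tstar), rmst(*arm0, tstar)
    return r1 / r0, r1 - r0
```

With no censoring, RMST(t*) reduces to the mean of min(T, t*), which is a convenient sanity check on any implementation.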
Huang and Kuan [22] showed that the performance of the RMST-based method in non-PH settings is highly dependent on the choice of t*. This was reflected in our simulation: when the curves crossed (Scenario 6), the coverage probabilities increased from t* = 3 to t* = 5 under moderate and high censoring (Fig. 3 and Table 3). In the separation scenario investigated by Huang and Kuan [22], the RMST-based method performed better when t* was closer to the tail end of the curves. This is consistent with our findings for FPM(3,1) under high censoring, where t* = 5 generally yielded lower bias and higher coverage and power than t* = 3 (Fig. 2 and Table 2).
We also observed that scenarios with higher censoring yielded lower bias than those with lower censoring. For instance, in Scenario 4 (HR = 1.9 for t ∈ [1, 3) and HR = 3 for t ≥ 3) with a sample size of 500, the high censoring setting (60%) yielded a total of 202 events, of which 197 occurred by t* = 5, corresponding to an information fraction (i.e. the ratio of events by t* to the total number of events) of 97.5%. Conversely, in the low censoring setting (26%), although the total number of events increased to 375, only 294 occurred by t* = 5, giving a lower information fraction of 78.4%. This lower information fraction was associated with increased bias: for LER at t* = 5, the absolute bias increased from 0.005 under high censoring to 0.01 under low censoring for FPM(3,1), and from 0.005 to 0.007 for Cox TVC. Nevertheless, the larger total number of events under low censoring still improved statistical power in that setting. Finally, the survival function may be summarised in a more clinically relevant manner by describing the RMST as a curve over an interval of t* values rather than at a single time point, as suggested by Zhao et al. [32].
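The information fractions quoted for Scenario 4 follow directly from the event counts reported in the text. A minimal Python sketch of that arithmetic (the helper names are illustrative and are not taken from the study's Stata code):

```python
def information_fraction(events_by_tstar, total_events):
    """Proportion of all observed events that occur by the truncation time t*."""
    return events_by_tstar / total_events

def information_fraction_from_times(event_times, tstar):
    """Same quantity computed from a list of observed (uncensored) event times."""
    return sum(t <= tstar for t in event_times) / len(event_times)

# Scenario 4, n = 500, t* = 5 (event counts as reported above)
high = information_fraction(197, 202)  # 60% censoring
low = information_fraction(294, 375)   # 26% censoring
print(f"high censoring: {high:.1%}, low censoring: {low:.1%}")
# prints: high censoring: 97.5%, low censoring: 78.4%
```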
We used the flexible parametric model to estimate RMST in this study. The complexity of the spline function in the model is determined by the number of knots, which corresponds to the number of degrees of freedom (d.f.) used to represent the (log cumulative) hazard function; the more internal knots, the more flexible the fitted function. Royston and Parmar [13] recommended RMST based on FPM(3,1) as a good choice for pre-specification in a trial protocol, and we therefore adopted the same setting for our RMST model. They also discussed the placement of knots. Durrleman and Simon recommended placing the boundary knots $k_{\min}$ and $k_{\max}$ at the extreme observed time values and the internal knots at centiles of the observed times [25]. The hazard and survival functions obtained from these models have been shown to be insensitive to the number of knots provided the number is adequate ( d.f) [33]. However, the spline function is constrained to be linear in ln(t) beyond the two boundary knots, which may lead to information loss, reduced predictive capacity and poor fit.
In contrast, the Cox TVC model allows the treatment effect to be estimated at different time points within a single model. This provides meaningful insight into how the effect of treatment changes over time. The early and mid-term effects, as well as the earliest time at which treatment shows benefit, are all invaluable information for clinicians in the management of patients. With the RMST-based method, however, the effect of treatment is summarised by a single estimate, namely LED or LER, at the specified time point.
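The knot placement and boundary linearity discussed above can be illustrated with a restricted cubic spline basis in ln(t) of the Durrleman and Simon form used in Royston–Parmar models. The Python sketch below assumes NumPy, places the boundary knots at the extremes of the log event times and the internal knots at equally spaced centiles (e.g. the 33rd and 67th for 3 d.f.); this centile rule is our assumption about a common default, not necessarily the exact setting used in this study, and `rcs_basis` and `default_knots` are hypothetical names.

```python
import numpy as np

def _pos_cube(u):
    """Truncated power function (u)_+^3."""
    return np.clip(u, 0.0, None) ** 3

def rcs_basis(x, knots):
    """Restricted cubic spline basis: one linear term plus one term per internal knot."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(knots, dtype=float)
    kmin, kmax = k[0], k[-1]
    cols = [x]
    for kj in k[1:-1]:
        lam = (kmax - kj) / (kmax - kmin)
        # Weighting by lam cancels the cubic and quadratic terms beyond kmax,
        # so each column is linear outside [kmin, kmax].
        cols.append(_pos_cube(x - kj) - lam * _pos_cube(x - kmin)
                    - (1 - lam) * _pos_cube(x - kmax))
    return np.column_stack(cols)

def default_knots(event_times, df):
    """df + 1 knots on the ln(t) scale: extremes plus equally spaced centiles."""
    log_t = np.log(np.asarray(event_times, dtype=float))
    return np.percentile(log_t, np.linspace(0, 100, df + 1))

# Example with simulated (uncensored) event times and 3 d.f.
rng = np.random.default_rng(2025)
times = rng.exponential(scale=3.0, size=300)
knots = default_knots(times, df=3)       # 2 boundary + 2 internal knots
basis = rcs_basis(np.log(times), knots)  # shape (300, 3): one column per d.f.

# Linearity beyond the upper boundary knot: second differences vanish
grid = np.linspace(knots[-1], knots[-1] + 2.0, 9)
print(np.allclose(np.diff(rcs_basis(grid, knots), n=2, axis=0), 0.0, atol=1e-6))
```

The number of basis columns equals the d.f., which is why adding internal knots directly increases the flexibility of the fitted function.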
Moreover, the coverage probability of FPM(3,1) was lower in most of the simulation settings examined. This may indicate overconfidence in the estimates, resulting in false positives or a larger Type I error than intended. In the illustrative data set, the p-values at 4 years from FPM(3,1) and Cox TVC were 0.055 and 0.098, respectively. This suggests that FPM(3,1) may reach significance sooner than Cox TVC, consistent with the simulation findings in which FPM(3,1) demonstrated higher power under high censoring. We therefore suggest characterising the time-varying treatment effect using HRs at clinically meaningful time points to supplement the single summary measure based on RMST. Of note, the LER estimates were close to the null value of 1 for t = 1 to 3 years, although the KM curves showed separation from 2 years onwards. As such, the treatment effect might have been missed at earlier time points based on the LER and LED estimates, similar to the observation of Huang and Kuan [22], who reported better performance of RMST for t* closer to the tail end of the curves.
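As a sketch of how HRs at clinically meaningful milestones might be obtained from a single model, suppose the treatment effect in a Cox TVC model is specified as β + γ·ln(t), i.e. a treatment × ln(t) interaction (an assumption for illustration). Then HR(t) = exp(β + γ ln t), and its 95% CI follows from the variance of that linear combination, Var = V_ββ + (ln t)² V_γγ + 2 ln(t) V_βγ. The Python below assumes the coefficients and their covariance matrix have already been estimated by other software; `hr_at` is a hypothetical name and the numbers in the usage loop are made up, not SQNP01 results.

```python
import math
from statistics import NormalDist

def hr_at(t, beta, gamma, cov, level=0.95):
    """HR(t) = exp(beta + gamma * ln t) with a Wald CI and p-value for the linear combination.

    cov is the 2x2 covariance matrix of (beta, gamma).
    """
    lt = math.log(t)
    log_hr = beta + gamma * lt
    var = cov[0][0] + lt**2 * cov[1][1] + 2.0 * lt * cov[0][1]
    se = math.sqrt(var)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)      # 1.96 for a 95% CI
    p = 2.0 * (1.0 - NormalDist().cdf(abs(log_hr) / se))  # two-sided Wald test of HR(t) = 1
    return math.exp(log_hr), math.exp(log_hr - z_crit * se), math.exp(log_hr + z_crit * se), p

# Hypothetical estimates for illustration only (not SQNP01 results)
beta, gamma = 0.05, -0.30
cov = [[0.020, -0.008], [-0.008, 0.006]]
for year in range(1, 7):
    hr, lo, hi, p = hr_at(year, beta, gamma, cov)
    print(f"year {year}: HR {hr:.2f} ({lo:.2f}-{hi:.2f}), p = {p:.3f}")
```

Because the interval is built on the log scale and then exponentiated, it stays positive and is asymmetric around HR(t), as in the HR estimates shown in Fig. 5.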
O’Quigley [34] suggested that, prior to evaluating treatment differences, a decision has to be made regarding how the effect should be measured. He further argued that, on the basis of power considerations, the effect should be measured by the HR. This is consistent with our analysis of the SQNP01 data, which showed a significant early treatment effect from Year 2 onwards when quantified in terms of HR. In contrast, quantification via LER or LED detected a significant treatment effect only from Year 5 onwards. Thus, when reporting the findings of randomised clinical trials involving time-to-event outcomes, it would be informative to also provide additional summary measures, such as median survival time or survival probability at relevant time points by treatment group, to allow a more meaningful assessment of treatment effect.
Advances in computing and modelling technology have generated growing interest in the Bayesian approach, which provides a flexible alternative to frequentist methods [35]. It allows prior knowledge to be incorporated into the statistical model by assigning a prior distribution to each parameter; the posterior distribution of the parameters is then determined by combining the likelihood of the observed data with the prior distribution [36]. Thus, a time-varying covariate effect could be modelled by placing priors on the non-PH parameters that favour PH, while still allowing departures from PH as the number of events permits. This, however, is beyond the scope of this study and may be considered in future research.
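One simple way to see how a prior can favour PH while yielding to the data as events accumulate is a normal approximation: treat the estimated non-PH coefficient γ̂ (for example, a treatment × ln(t) term) as normally distributed with standard error s, and give γ a N(0, τ²) prior centred on PH (γ = 0). The posterior is then normal with mean γ̂·τ²/(τ² + s²) and variance 1/(1/τ² + 1/s²). This Python sketch is an illustrative approximation under those stated assumptions, not a full Bayesian Cox model; `posterior_nonph` and the numbers used are hypothetical.

```python
import math

def posterior_nonph(gamma_hat, se, prior_sd):
    """Normal-normal update for a non-PH coefficient with a N(0, prior_sd^2) prior (0 = PH)."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / se**2
    post_var = 1.0 / (prior_prec + data_prec)
    # Prior mean is 0, so only the data term contributes to the posterior mean.
    post_mean = post_var * data_prec * gamma_hat
    return post_mean, math.sqrt(post_var)

# Same point estimate with few events (large SE) versus many events (small SE)
for se in (0.40, 0.10):
    m, sd = posterior_nonph(gamma_hat=0.30, se=se, prior_sd=0.10)
    print(f"se={se:.2f}: posterior mean {m:.3f} (sd {sd:.3f})")
```

With a large standard error the posterior stays close to PH (γ near 0); as the standard error shrinks, the posterior moves towards the observed non-PH estimate.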
Conclusion
We have demonstrated that the Cox TVC model is a viable approach for quantifying time-varying treatment effects in terms of LER and LED in the presence of non-PH. Given the comparative performance of the two methods and the importance of using appropriate statistical methods in clinical trial analysis, it would be beneficial to also provide HR estimates at specific milestones to detect early treatment differences. A decision-making table summarising the findings of this study is presented in Table 5; it serves as a practical guide for choosing between the competing methods under various patterns of non-PH. Cox TVC is recommended for non-PH scenarios involving diverging survival curves under low censoring and crossing curves under low or moderate censoring, since it demonstrated lower bias, higher coverage and generally higher power, especially for n = 500. Where power is the primary consideration, FPM is recommended under high censoring. This study thus provides researchers and practitioners with a structured framework for choosing the most suitable model, helping to ensure more robust and reliable analyses in future applications.
Table 5.
Decision-making table for selection between Cox TVC and FPM(3,1) models based on bias, coverage and power for low, moderate and high censoring scenarios assuming non-PH
| Lower Bias | Higher Coverage | Higher Power | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Censoring rate | Censoring rate | Censoring rate | |||||||||
| Estimand | Non-PH | Method | Low | Moderate | High | Low | Moderate | High | Low | Moderate | High |
| LER | Diverging | Cox TVC | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| FPM(3,1) | ✓ | ✓ | ✓ | ||||||||
| Crossing | Cox TVC | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| FPM(3,1) | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
| LED | Diverging | Cox TVC | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| FPM(3,1) | ✓ | ✓ | ✓ | ||||||||
| Crossing | Cox TVC | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| FPM(3,1) | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
Supplementary Information
Acknowledgements
Not applicable.
Abbreviations
- FPM: Flexible Parametric Model
- HR: Hazard Ratio
- KM: Kaplan-Meier
- LED: Life Expectancy Difference
- LER: Life Expectancy Ratio
- OS: Overall Survival
- PH: Proportional Hazards
- RCT: Randomised Controlled Trial
- RMST: Restricted Mean Survival Time
- SE: Standard Error
- TVC: Time-Varying Coefficient
Authors’ contributions
TYG wrote the Stata code for the simulation study, conducted the real-data analyses and drafted the manuscript. BCT contributed to the conception and design of the study, assisted with the data analysis and provided critical feedback during manuscript revision. ZJC participated in the conception of the study and assisted with manuscript revision. YYS offered valuable input during the manuscript revision process. JW provided the data for the SQNP01 trial. All authors reviewed and approved the final version of the manuscript.
Funding
This research is partially supported by the National University Health System, Singapore, Internal Grant Funding (NUHSRO/2020/144/RO5+6/Seed-Sep/04).
Data availability
The SQNP01 data will not be shared; however, the code for the simulation study is available upon reasonable request.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. 2nd ed. Hoboken (NJ): Wiley-Interscience; 2002.
2. Saad ED, Zalcberg JR, Péron J, Coart E, Burzykowski T, Buyse M. Understanding and Communicating Measures of Treatment Effect on Survival: Can We Do Better? J Natl Cancer Inst. 2017;110(3):232–40.
3. Blagoev KB, Wilkerson J, Fojo T. Hazard ratios in cancer clinical trials–a primer. Nat Rev Clin Oncol. 2012;9(3):178–83.
4. Ananthakrishnan R, Green S, Previtali A, Liu R, Li D, LaValley M. Critical review of oncology clinical trial design under non-proportional hazards. Crit Rev Oncol Hematol. 2021;162:103350.
5. Hui EP, Ma BB, Leung SF, King AD, Mo F, Kam MK, Yu BK, Chiu SK, Kwan WH, Ho R, et al. Randomized phase II trial of concurrent cisplatin-radiotherapy with or without neoadjuvant docetaxel and cisplatin in advanced nasopharyngeal carcinoma. J Clin Oncol. 2009;27(2):242–9.
6. Fountzilas G, Ciuleanu E, Bobos M, Kalogera-Fountzila A, Eleftheraki AG, Karayannopoulou G, Zaramboukas T, Nikolaou A, Markou K, Resiga L, et al. Induction chemotherapy followed by concomitant radiotherapy and weekly cisplatin versus the same concomitant chemoradiotherapy in patients with nasopharyngeal carcinoma: a randomized phase II study conducted by the Hellenic Cooperative Oncology Group (HeCOG) with biomarker evaluation. Ann Oncol. 2012;23(2):427–35.
7. Tan T, Lim WT, Fong KW, Cheah SL, Soong YL, Ang MK, Ng QS, Tan D, Ong WS, Tan SH, et al. Concurrent chemo-radiation with or without induction gemcitabine, Carboplatin, and Paclitaxel: a randomized, phase 2/3 trial in locally advanced nasopharyngeal carcinoma. Int J Radiat Oncol Biol Phys. 2015;91(5):952–60.
8. Zhang Z, Reinikainen J, Adeleke KA, Pieterse ME, Groothuis-Oudshoorn CGM. Time-varying covariates and coefficients in Cox regression models. Ann Transl Med. 2018;6(7):121.
9. Thomas L, Reyes EM. Tutorial: survival estimation for Cox regression models with time-varying coefficients using SAS and R. J Stat Softw. 2014;61:1–23.
10. Kalbfleisch JD, Prentice RL. Estimation of the average hazard ratio. Biometrika. 1981;68(1):105–12.
11. Lin X, Xu Q. A new method for the comparison of survival distributions. Pharm Stat. 2010;9(1):67–76.
12. Snapinn S, Jiang Q, Ke C. Treatment effect measures under nonproportional hazards. Pharm Stat. 2023;22(1):181–93.
13. Royston P, Parmar MK. The use of restricted mean survival time to estimate the treatment effect in randomized clinical trials when the proportional hazards assumption is in doubt. Stat Med. 2011;30(19):2409–21.
14. Clamp AR, James EC, McNeish IA, Dean A, Kim JW, O’Donnell DM, Hook J, Coyle C, Blagden S, Brenton JD, et al. Weekly dose-dense chemotherapy in first-line epithelial ovarian, fallopian tube, or primary peritoneal carcinoma treatment (ICON8): primary progression free survival analysis results from a GCIG phase 3 randomised controlled trial. Lancet. 2019;394(10214):2084–95.
15. Rahmadian AP, Delos Santos S, Parshad S, Everest L, Cheung MC, Chan KK. Quantifying the Survival Benefits of Oncology Drugs With a Focus on Immunotherapy Using Restricted Mean Survival Time. J Natl Compr Canc Netw. 2020;18(3):278–85.
16. Royston P, Parmar MK. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Med Res Methodol. 2013;13:152.
17. Trinquart L, Jacot J, Conner SC, Porcher R. Comparison of Treatment Effects Measured by the Hazard Ratio and by the Ratio of Restricted Mean Survival Times in Oncology Randomized Controlled Trials. J Clin Oncol. 2016;34(15):1813–9.
18. Phinyo P, Patumanond J, Pongudom S. Time-dependent treatment effects of metronomic chemotherapy in unfit AML patients: a secondary analysis of a randomised controlled trial. BMC Res Notes. 2021;14(1):3.
19. Wee J, Tan EH, Tai BC, Wong HB, Leong SS, Tan T, Chua ET, Yang E, Lee KM, Fong KW, et al. Randomized trial of radiotherapy versus concurrent chemoradiotherapy followed by adjuvant chemotherapy in patients with American Joint Committee on Cancer/International Union against cancer stage III and IV nasopharyngeal cancer of the endemic variety. J Clin Oncol. 2005;23(27):6730–8.
20. Wei Y, Royston P, Tierney JF, Parmar MK. Meta-analysis of time-to-event outcomes from randomized trials using restricted mean survival time: application to individual participant data. Stat Med. 2015;34(21):2881–98.
21. Uno H, Claggett B, Tian L, Inoue E, Gallo P, Miyata T, Schrag D, Takeuchi M, Uyama Y, Zhao L, et al. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. J Clin Oncol. 2014;32(22):2380–5.
22. Huang B, Kuan PF. Comparison of the restricted mean survival time with the hazard ratio in superiority trials with a time-to-event end point. Pharm Stat. 2018;17(3):202–13.
23. Royston P, Parmar MK. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med. 2002;21(15):2175–97.
24. Lambert PC, Royston P. Further Development of Flexible Parametric Models for Survival Analysis. Stata J. 2009;9(2):265–90.
25. Durrleman S, Simon R. Flexible regression models with cubic splines. Stat Med. 1989;8(5):551–61.
26. Royston P. Estimating the treatment effect in a clinical trial using difference in restricted mean survival time. Stata J. 2015;15(4):1098–117.
27. Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. 2nd ed. Hoboken (NJ): Wiley; 2011.
28. Bellera CA, MacGrogan G, Debled M, de Lara CT, Brouste V, Mathoulin-Pélissier S. Variables with time-varying effects and the Cox model: Some statistical concepts illustrated with a prognostic factor study in breast cancer. BMC Med Res Methodol. 2010;10(1):20.
29. Ruhe C. Estimating Survival Functions after Stcox with Time-varying Coefficients. Stata J. 2016;16(4):867–79.
30. Dehbi HM, Royston P, Hackshaw A. Life expectancy difference and life expectancy ratio: two measures of treatment effects in randomised trials with non-proportional hazards. BMJ. 2017;357:j2250.
31. Lin RS, Lin J, Roychoudhury S, Anderson KM, Hu T, Huang B, Leon LF, Liao JJZ, Liu R, Luo X, et al. Alternative Analysis Methods for Time to Event Endpoints Under Nonproportional Hazards: A Comparative Analysis. Stat Biopharm Res. 2020;12(2):187–98.
32. Zhao L, Claggett B, Tian L, Uno H, Pfeffer MA, Solomon SD, Trippa L, Wei LJ. On the restricted mean survival time curve in survival analysis. Biometrics. 2016;72(1):215–21.
33. Rutherford MJ, Crowther MJ, Lambert PC. The use of restricted cubic splines to approximate complex hazard functions in the analysis of time-to-event data: a simulation study. J Stat Comput Simul. 2015;85(4):777–93.
34. O’Quigley J. Testing for Differences in Survival When Treatment Effects Are Persistent, Decaying, or Delayed. J Clin Oncol. 2022;40(30):JCO.21.01811.
35. Ibrahim JG, Chen MH, Sinha D. Bayesian survival analysis. 1st ed. New York (NY): Springer; 2001.
36. Omurlu IK, Ozdamar K, Ture M. Comparison of Bayesian survival analysis and Cox regression analysis in simulated and breast cancer data sets. Expert Syst Appl. 2009;36(8):11341–6.