Abstract
The hazard ratio provides a natural target for assessing a treatment effect with survival data, with the Cox proportional hazards model providing a widely used special case. In general, the hazard ratio is a function of time and provides a visual display of the temporal pattern of the treatment effect. A variety of nonproportional hazards models have been proposed in the literature. However, available methods for flexibly estimating a possibly time-dependent hazard ratio are limited. Here, we investigate a semiparametric model that allows a wide range of time-varying hazard ratio shapes. Point estimates as well as pointwise confidence intervals and simultaneous confidence bands of the hazard ratio function are established under this model. The average hazard ratio function is also studied to assess the cumulative treatment effect. We illustrate corresponding inference procedures using coronary heart disease data from the Women's Health Initiative estrogen plus progestin clinical trial.
Keywords: Clinical trial, Empirical process, Gaussian process, Hazard ratio, Simultaneous inference, Survival analysis, Treatment–time interaction
1. INTRODUCTION
Consider the comparison of failure times between a treated and control group under independent censorship. The hazard ratio provides a natural target of estimation in many applications since it permits a focus on relative failure rates across the study follow-up period, without the need to model absolute failure rates, which may be sensitive to study eligibility criteria and other factors. The proportional hazards special case of the Cox (1972) regression model is widely used for hazard ratio estimation. The maximum partial likelihood procedure (Cox, 1975) provides a convenient and robust means of estimating a constant hazard ratio and yields a log-rank procedure for testing equality of hazards between the 2 groups.
In general, the hazard ratio may be a function of time, and estimation of the hazard ratio function may provide useful insights into temporal aspects of treatment effects. For example, Gilbert and others (2002) develop a nonparametric estimation procedure for the log-hazard ratio function with simultaneous confidence bands, for use as an exploratory data analytic tool. Naturally, confidence bands may be wide with such a nonparametric estimator, particularly at longer follow-up times where data may be sparse. See also Gray (1992), Kooperberg and others (1995), Cai and Sun (2003), Tian and others (2005), Abrahamowicz and Mackenzie (2007), and Peng and Huang (2007), and references therein, for additional related work.
Parametric or semiparametric hazard ratio models have potential to contribute valuably to treatment effect assessment. Hazard ratio models having parameters of useful interpretation, and that embrace a range of hazard ratio shapes, may be particularly valuable. The Cox model allows time-varying covariates to be defined that can, for example, allow separate hazard ratios for the elements of a partition of the time axis or allow the hazard ratio to be a parametric function of follow-up time more generally. Various other semiparametric regression models have been proposed for failure time data analyses, including accelerated failure time models, proportional odds models, and linear transformation models, many of which are embraced by the broad class of models for which Zeng and Lin (2007) develop maximum likelihood estimation procedures. Some more semiparametric models can be found in Vaupel and others (1979), Hsieh (1996), Chen and Wang (2000), Tsodikov (2002), Yang and Prentice (2005), and Chen and Cheng (2006). Many of these models induce a semiparametric class of models for the hazard ratio function that includes proportional hazards as a special case. Hazard ratio estimators under such semiparametric models can avoid the instability that may attend nonparametric hazard ratio function estimators.
One of these, proposed by Yang and Prentice (2005), involves short-term and long-term hazard ratio parameters, and a hazard ratio function that depends also on the control group survivor function. Assume absolutely continuous failure times and label the 2 groups control and treatment, with hazard functions λC(t) and λT(t), respectively. Let h(t) = λT(t)/λC(t) be the hazard ratio function and SC(t) the survivor function of the control group. The model postulates that
| (1.1) |
where β1 and β2 are scalar parameters and
| (1.2) |
This model includes the proportional hazards model and the proportional odds model as special cases. It has a monotone h(t) with a variety of patterns, including proportional hazards, no initial effect, disappearing effect, and crossing hazards, among others. Thus, the model presumably entails sufficient flexibility for many applications. It has also been studied for current status data in Tong and others (2007).
In comparison, for many commonly used special cases of the accelerated failure time model either limt↓0h(t) = 1 or limt↑τ0h(t)∈{0,1,∞} and the hazard ratio stays above or below one when λC is increasing. This is less flexible than desired. For the class of linear transformation models, with the logarithmic transformation, the hazard ratio also inherits some of these restrictions at many common baseline distributions. Similar properties hold as well for many other semiparametric models.
Under model (1.1), estimation procedures to date have focused on the finite-dimensional parameters, as has mostly been the case also for estimation under other semiparametric models. Here, we extend the estimation to pointwise and simultaneous inference on the hazard ratio function itself. First, consistency and asymptotic normality of the estimate at a fixed time point are established. Then procedures for constructing pointwise confidence intervals and simultaneous confidence bands for the hazard ratio function are developed, and some modifications are implemented to improve moderate sample size performance. For additional display of the treatment effect, simultaneous confidence bands are also obtained for the average hazard ratio function over a time interval. The average hazard ratio gives a summary measure of treatment comparison and provides a picture of the cumulative treatment effect to augment display of the temporal pattern of the hazard ratio. These hazard ratio estimation procedures are applied to data from the Women's Health Initiative (WHI) estrogen plus progestin clinical trial (Writing Group For the Women's Health Initiative Investigators, 2002; Manson and others, 2003), which yielded a hazard ratio function for the primary coronary heart disease outcome that was decidedly nonproportional. Understanding the hazard ratio function shape in this setting was important to integrating the clinical trial data with a large body of preceding observational literature that had failed to identify an early hazard ratio increase (e.g. Manson and others, 2003; Prentice and others, 2005).
We organize the article as follows: In Section 2, the short-term and long-term hazard ratio model and the hazard ratio estimate are described. Pointwise confidence intervals of the hazard ratio are established. Simultaneous confidence bands for the hazard ratio and the average hazard ratio are provided in Section 3. Simulation results are presented in Section 4. Application to data from the WHI trial is given in Section 5. Some concluding remarks are given in Section 6. Proofs of the asymptotic results are contained in the Supplementary Material available at Biostatistics online.
2. HAZARD RATIO FUNCTION ESTIMATION
Let T1,…,Tn be the pooled lifetimes of the 2 groups, with T1,…,Tn1,n1 < n, constituting the control group having the survivor function SC. Let C1,…,Cn be the censoring variables, and Zi = I(i > n1),i = 1,…,n, where I(·) is the indicator function. The available data consist of the independent triplets (Xi,δi,Zi), i = 1,⋯,n, where Xi = min(Ti,Ci) and δi = I(Ti ≤ Ci). We assume that Ti and Ci are independent given Zi. The censoring variables (Ci's) need not be identically distributed, and in particular, the 2 groups may have different censoring patterns. For t < τ0 with τ0 defined in (1.2), let R(t) be the the odds function 1/SC(t) − 1 of the control group. The model of Yang and Prentice (2005) can be expressed as
| (2.1) |
where λi(t) is the hazard function for Ti given Zi. Under the model, the hazard ratio is
To estimate h(t), we need to estimate the parameter β = (β1,β2)T and the baseline function R(t), where “T ” denotes transpose. Let us first introduce the estimators from Yang and Prentice (2005).
Define
where b = (b1,b2)T. Let τ < τ0 be such that
| (2.2) |
with probability 1. For t ≤ τ, let
![]() |
where ΔH2(s;b) denotes the jump of H2(s;b) in s and
denotes the left continuous (in s) version of
Define the martingale residuals
Yang and Prentice (2005) proposed a pseudo maximum likelihood estimator
of β, which is the zero of Q(b), where
| (2.3) |
with fi = (f1i,f2i)T, where
Once
is obtained, R(t) can be estimated by
, and the hazard ratio h(t) can be estimated by
In Appendix A of the Supplementary Material available at Biostatistics online, we show that
is strongly consistent for h(t) under model (2.1).
To study the distributional properties of
, let
For the asymptotic distribution of
, define
![]() |
From Theorem A2 of Yang and Prentice (2005) and some algebra,
![]() |
where
![]() |
(2.4) |
Now for
, from Lemma A3 in Yang and Prentice (2005) and some algebra,
![]() |
(2.5) |
where
Let
![]() |
For t ≤ τ, define the process
![]() |
(2.6) |
With the representations for Q(β) and
, in Appendix B of the Supplementary Material available at Biostatistics online, we show that Wn is asymptotically equivalent to
which converges weakly to a zero-mean Gaussian process W*. The weak convergence of Wn thus follows. The limiting covariance function σ(s,t) of W* involves the derivative D(t;β) and the derivative matrix in U. Although analytic forms of these derivatives are available, they are quite complicated and cumbersome. Here, we approximate them by numerical derivatives. For the functions B(t),C(t), μ1(t), μ2(t), ν1(t), and ν2(t), define corresponding
by replacing β with
, R(t) with
and D(t;β) with the numerical derivatives. Similarly, let
be the numerical approximation of U. Simulation studies show that the results are fairly stable with respect to the choice of the jump size in the numerical derivatives, and that the choice of n − 1/2 works well. With these approximations, we can estimate σ(s,t), s ≤ t ≤ τ, by
![]() |
(2.7) |
Now for a fixed t0 ≤ τ, from the above results, confidence intervals for h(t0) can be obtained from the asymptotic normality of
and the estimated variance
The usual logarithm transformation results in the asymptotic 100(1 − α)% confidence interval
where zα/2 is the 100(1 − α/2)% percentile of the standard normal distribution.
3. SIMULTANEOUS CONFIDENCE BANDS
To make simultaneous inference on h(t) over a time interval I = [a,b]⊂[0,τ], consider
where s(t) converges in probability, uniformly in t over I, to a bounded function s*(t) > 0. From the weak convergence of Wn to W* and the functional delta method, we have the weak convergence of Vn to W*/s*. Thus, if cα is the upper αth percentile of supt∈I|W*/s*|, an asymptotic 100(1 − α)% simultaneous confidence band for h(t),t∈I, can be obtained as
![]() |
It is difficult to obtain cα analytically. One obvious alternative would be the bootstrapping method. However, it is very time-consuming and results in lower than nominal coverage probabilities in some simulation studies. Lin and others (1993) used a normal resampling approximation to simulate the asymptotic distribution of sums of martingale residuals for checking the Cox regression model. The normal resampling approach reduces computing time significantly and has become a standard method. It has been used in many works, including Lin and others (1994), Cheng and others (1997), Gilbert and others (2002), Tian and others (2005), and Peng and Huang (2007). We will modify this approach for our problem here.
For t ≤ τ, define the process
![]() |
(3.1) |
where ϵi,i = 1,…,n, are independent variables that are also independent from the data. Furthermore, these variables have mean zero and variance converging to one as n→∞. In the normal resampling approach mentioned above, the ϵi's are the standard normal variables. However, the standard normal variables often result in lower coverage probabilities in various simulation studies. Thus, with moderate sized samples, we need to make some adjustment.
Conditional on (Xi,δi,Zi),i = 1,…,n,
is a sum of n independent variables at each time point. In Appendix B of the Supplementary Material available at Biostatistics online, we show that
given the data converges weakly to W*. It follows that
given the data converges in distribution to
. Therefore, cα can be estimated empirically from a large number of realizations of the conditional distribution of
given the data.
Several choices of the weight s arise from recommendations in the literature for confidence bands of the survivor function and the cumulative hazard function in the one sample case. The choice
results in equal precision bands (Nair, 1984), which differ from pointwise confidence intervals in that cα replaces zα/2. The choice
results in the Hall–Wellner type bands recommended by Bie and others (1987), which often have narrower widths in the middle of data range and wider widths near the extremes of data range (Lin and others, 1994). One could also choose
This choice does not involve
and hence is easier to implement. It may be adequate when
only varies mildly over time.
Let a∈(0,τ) and define the average hazard ratio, over [a,t],
Note that the average hazard ratio involves an integral of the hazard ratio rather than a ratio of integrated hazards. It provides a measure for the cumulative treatment effect over a time interval to augment the temporal effect display from the hazard ratio estimates. It can be estimated by
To obtain simultaneous confidence bands for the average hazard ratio, let
In Appendix B of the Supplementary Material available at Biostatistics online, we show that
converges weakly to the zero-mean Gaussian process
. Also,
behaves more stably than
and its covariance function is comparatively insensitive to the choice of weight function. Hence, for simplicity, we consider only the process
From the functional delta method, it follows that
converges weakly to
. Thus, if
is the upper αth percentile of
an asymptotic 100(1 − α)% simultaneous confidence band for
can be obtained as
To approximate the critical value
, again we use a resampling approximation. In Appendix B of the Supplementary Material available at Biostatistics online, the process
given the data is shown to converge weakly to
. From this and strong consistency of
,
can be approximated empirically from a large number of realizations of the conditional distribution of
given the data.
4. SIMULATION STUDIES
Without any finite-sample modifications, it was found that the empirical coverage probabilities of the proposed confidence bands for the hazard ratio were often lower than the nominal levels for small samples, especially with substantial censoring. In a series of simulation studies, we have gone through an extensive trial and error process to evaluate various modifications. In the end, we recommend that the left continuous versions of the integrands in (2.3) be used. Also, instead of
, we will use the asymptotically equivalent form
In addition, it is best to restrict to the time range [infκ,supκ], where κ is the set of observations at which the weight function s(t) is less than or equal to the 90%th percentile of s(ti),i = 1,…,n, with ti's being the uncensored observations. This restriction is similar in spirit to the recommendations of Nair (1984) and Bie and others (1987), except we measure the extremeness of data by s(ti). For the hazard ratio and small to moderate n, we choose the ϵi's in (3.1) to be a multiple of the standard normal variables. We will use an ad hoc multiplier of
based on various simulations. For n equal to 400 or larger, the standard normal variables can be used. For the average hazard ratio, no such multiplier adjustment is necessary.
Next, we report the results from some representative simulation studies. Here and for the real data application in Section 5 later, τ was set to exclude the last-order statistic. All numerical computations were done in “Matlab.“ First, under the model of Yang and Prentice (2005), lifetime variables were generated with R(t) chosen to yield the standard exponential distribution for the control group. The values of β were (log(0.9),log(1.2)) and (log(1.2),log(0.8)), representing 1/3 increase or decrease over time from the initial hazard ratio, respectively. The censoring variables were independent and identically distributed with the log-normal distribution, where the normal distribution had mean c and standard deviation 0.5, with c chosen to achieve various censoring rates. The empirical coverage probabilities were obtained from 1000 repetitions, and for each repetition, the critical values cα and
were calculated empirically from 1000 realizations of relevant conditional distributions. The results of these simulations are summarized in Table 1, where the equal precision bands, Hall–Wellner type bands and unweighted bands for the hazard ratio are denoted by EP, HW, and UW, respectively. Results for simultaneous confidence bands of the average hazard ratio are also included with the column header
. From Table 1, the empirical coverage probabilities for the hazard ratio were mostly close to the nominal level. The empirical coverage probabilities for the average hazard ratio were mostly conservative. The conservative results were partially due to the finite-sample modifications intended for the hazard ratio. Those modifications improved the performance of the hazard ratio estimation procedure under some scenarios, while yielding conservatism in others, particularly for the more stable average hazard ratio estimator. The coverage probabilities for the equal precision bands overall were closer to the nominal level than other types of bands.
To check the robustness of the proposed procedures, we carried out various simulation studies with monotone hazard ratio not satisfying the model of Yang and Prentice (2005). For Table 2, the control group lifetime variables were standard exponential. The hazard ratio was linear from 0 to the 99th percentile of the standard exponential and continuous and constant afterward. The initial and end hazard ratios again were (0.9,1.2) and (1.2,0.8), respectively, and the censoring variables were the same as before. From Table 2, the results are similar to those in Table 1, with slight undercoverage under some scenarios.
Table 2.
Empirical coverage probabilities of the simultaneous confidence bands, for the hazard ratio (EP, HW, and UW) and the average hazard ratio (
), under a monotone hazard ratio model not satisfying the model of Yang and Prentice (2005), based on 1000 repetitions
| Hazard ratio | Censoring rate (%) | n1 = n2 | EP | HW | UW | |
| 0.9 ↑ 1.2 | 10 | 40 | 0.955 | 0.957 | 0.954 | 0.973 |
| 30 | 0.965 | 0.952 | 0.964 | 0.976 | ||
| 50 | 0.945 | 0.941 | 0.960 | 0.962 | ||
| 75 | 0.971 | 0.972 | 0.9754 | 0.970 | ||
| 10 | 80 | 0.959 | 0.963 | 0.958 | 0.983 | |
| 30 | 0.935 | 0.943 | 0.940 | 0.968 | ||
| 50 | 0.938 | 0.943 | 0.937 | 0.956 | ||
| 75 | 0.956 | 0.958 | 0.965 | 0.955 | ||
| 10 | 160 | 0.963 | 0.964 | 0.950 | 0.974 | |
| 30 | 0.952 | 0.949 | 0.937 | 0.966 | ||
| 50 | 0.940 | 0.935 | 0.920 | 0.960 | ||
| 75 | 0.957 | 0.969 | 0.976 | 0.971 | ||
| 1.2 ↓ 0.8 | 10 | 40 | 0.976 | 0.969 | 0.975 | 0.982 |
| 30 | 0.952 | 0.956 | 0.967 | 0.970 | ||
| 50 | 0.955 | 0.963 | 0.966 | 0.961 | ||
| 75 | 0.966 | 0.970 | 0.975 | 0.967 | ||
| 10 | 80 | 0.965 | 0.967 | 0.969 | 0.975 | |
| 30 | 0.954 | 0.967 | 0.969 | 0.972 | ||
| 50 | 0.941 | 0.948 | 0.960 | 0.968 | ||
| 75 | 0.965 | 0.965 | 0.971 | 0.973 | ||
| 10 | 160 | 0.969 | 0.977 | 0.960 | 0.976 | |
| 30 | 0.963 | 0.967 | 0.969 | 0.967 | ||
| 50 | 0.937 | 0.943 | 0.941 | 0.963 | ||
| 75 | 0.955 | 0.963 | 0.970 | 0.970 |
Table 1.
Empirical coverage probabilities of the simultaneous confidence bands, for the hazard ratio (EP, HW, and UW) and the average hazard ratio (
), under the model of Yang and Prentice (2005), based on 1000 repetitions
| Hazard ratio | Censoring rate (%) | n1 = n2 | EP | HW | UW | |
| 0.9 ↑ 1.2 | 10 | 40 | 0.954 | 0.946 | 0.963 | 0.973 |
| 30 | 0.952 | 0.946 | 0.961 | 0.970 | ||
| 50 | 0.971 | 0.960 | 0.976 | 0.977 | ||
| 75 | 0.967 | 0.966 | 0.977 | 0.964 | ||
| 10 | 80 | 0.955 | 0.957 | 0.959 | 0.963 | |
| 30 | 0.947 | 0.940 | 0.955 | 0.962 | ||
| 50 | 0.955 | 0.943 | 0.956 | 0.965 | ||
| 75 | 0.967 | 0.979 | 0.979 | 0.976 | ||
| 10 | 160 | 0.960 | 0.966 | 0.966 | 0.977 | |
| 30 | 0.954 | 0.950 | 0.951 | 0.969 | ||
| 50 | 0.941 | 0.937 | 0.940 | 0.964 | ||
| 75 | 0.960 | 0.970 | 0.971 | 0.967 | ||
| 1.2 ↓ 0.8 | 10 | 40 | 0.966 | 0.980 | 0.983 | 0.976 |
| 30 | 0.936 | 0.948 | 0.967 | 0.980 | ||
| 50 | 0.943 | 0.948 | 0.954 | 0.967 | ||
| 75 | 0.956 | 0.959 | 0.964 | 0.966 | ||
| 10 | 80 | 0.959 | 0.974 | 0.975 | 0.971 | |
| 30 | 0.926 | 0.946 | 0.945 | 0.964 | ||
| 50 | 0.930 | 0.946 | 0.939 | 0.953 | ||
| 75 | 0.959 | 0.966 | 0.968 | 0.965 | ||
| 10 | 160 | 0.957 | 0.973 | 0.963 | 0.967 | |
| 30 | 0.949 | 0.965 | 0.945 | 0.968 | ||
| 50 | 0.944 | 0.962 | 0.947 | 0.970 | ||
| 75 | 0.951 | 0.957 | 0.954 | 0.961 |
5. APPLICATION
Let us illustrate the proposed methods with data from the WHI randomized controlled trial of combined (estrogen plus progestin) postmenopausal hormone therapy, which reported an elevated coronary heart disease risk and overall unfavorable health benefits versus risks over a 5.6-year study period (Writing Group For the Women's Health Initiative Investigators, 2002; Manson and others, 2003). Few research reports have stimulated as much public response, since preceding observational research literature suggested a 40–50% reduction in coronary heart disease incidence among women taking postmenopausal hormone therapy. Analysis of the WHI observational study shows a similar discrepancy with the WHI clinical trial for each of coronary heart disease, stroke, and venous thromboembolism. The discrepancy is partially explained by confounding in the observational study. A remaining source of discrepancy between the clinical trial and the observational study is elucidated by recognizing a dependence of the hazard ratio on the therapy duration (e.g. Prentice and others, 2005). Here, we look at the time to coronary heart disease in the WHI clinical trial, which included 16 608 postmenopausal women initially in the age range of 50–79 with uterus (n1 = 8102). There were 188 and 147 events observed in the treatment and control group, respectively, implying about 98% censoring, primarily by the trial stopping time. Fitting model (2.1) to this data set, we get
=(0.65,−3.63)T. Due to heavy censoring, the value 0.03 of exp(
2) cannot be interpreted as the estimated long-term hazard ratio in the range of study follow-up times. The estimated hazard ratio function is needed for a more complete and accurate assessment of the treatment effect.
To examine model adequacy, we can use a residual plot that is similar to the method for the Cox regression model (Cox and Snell, 1968). Let ΛC and ΛT be the cumulative hazard functions of the 2 groups, respectively. Then ΛC(Ti),i ≤ n1,ΛT(Ti),i > n1 are i.i.d. from the standard exponential distribution. Let
C and
T be the model-based estimator of ΛC and ΛT, respectively, and define the residuals
If model (2.1) is correct, the residuals should behave like a censored sample from the standard exponential distribution. Thus, the Aalen–Nelson cumulative hazard estimator based on them should be close to the identity function. If there is noticeable deviation, then model (2.1) is questionable. Similarly, the residual plot can be obtained for the piecewise constant hazards ratio model used in Prentice and others (2005). Both residual plots, not shown here, suggest that the 2 models fit the data adequately, with similar residual behaviors.
The 95% pointwise confidence intervals and simultaneous confidence bands for the hazard ratio function are given in Figure 1. For comparison, the 95% confidence intervals for 0–2, 2–5, and > 5 years from Prentice and others (2005) are included, over the median of uncensored data in each time interval. Compared with the piecewise constant hazards ratio model, the confidence bands do not depend on partitioning of the data range and provide more continuously changing display of the treatment effect. The confidence bands are generally in agreement with the results from Prentice and others (2005). The UW band is wider than the other 2 bands most of the time. The HW band is the narrowest in the middle section but is quite wide at the beginning. Both the EP band and the HW band give narrower intervals for the middle portion of the data range than the piecewise Cox model. Near the end of the data range, all 3 bands have about the same width as the confidence interval from Prentice and others (2005). Overall the EP band matches most closely with the results for the piecewise constant hazards ratio model. The width of the EP band is less than or equal to the piecewise model–based confidence intervals for most of the data range, except at the beginning. Note that the constant function 1 is not excluded in the HW and UW bands. In comparison, the EP band stays above 1 for about the first 600 days. From Prentice and others (2005), the confidence interval for 0 − 2yr excludes 1, indicating an elevation in coronary heart disease risk for the treatment early on. For this data set, the standard error of the estimated hazard ratio begins at 0.43, quickly comes down to below 0.20 at 600 days and stays below 0.20 for the rest of data range. Since the UW band does not take the variance into account and the HW band emphasizes the middle range, the elevated standard error at early follow-up times likely explains the discrepancy among the results. Compared with the original analysis that showed an overall difference between the 2 groups, the results here and those from Prentice and others (2005) give more detailed analysis on the dependence of the hazard ratio on time and help explaining the discrepancy between the results of the WHI clinical trial and preceding observational research, much of which involved cohorts where women could be enrolled some years after initiating hormone therapy.
Fig. 1.
The 95% pointwise confidence intervals and simultaneous confidence bands of the hazard ratio function for the WHI data: Solid lines—equal precision confidence band; dashed lines— Hall–Wellner type confidence band; dash dotted lines— unweighted confidence band; outside dotted lines—pointwise confidence limits; and central dotted line—the estimated hazard ratio function; vertical segments—95% confidence intervals for the hazard ratios in 0–2, 2–5, and > 5 years intervals from Prentice and others (2005), over the median of uncensored data in each time interval, with “*” indicating the point estimates.
For the average hazard ratio function, the estimator and the 95% simultaneous confidence band are given in Figure 2. The standard error of the estimated average hazard ratio varies more mildly over time, and both the estimated average hazard ratio and the confidence band are changing much more smoothly compared with the results for the hazard ratio in Figure 1. Note that the confidence band stays above 1 for t < 700 days. This is in agreement with the results of Prentice and others (2005).
Fig. 2.
The 95% simultaneous confidence band of the average hazard ratio function for the WHI data: dotted line—estimated average hazard ratio; and solid lines—95% simultaneous confidence band.
To compare with the nonparametric approach, Figure 3 gives the estimated hazard ratio, the 95% pointwise confidence intervals and simultaneous confidence band of Gilbert and others (2002), based on the R programs from the author's site. The same scale as that in Figure 1 is used for comparison and results in truncation of some portion of the plot. The estimated hazard ratio suggests that the hazard ratio is reasonably monotonic. The nonparametric hazard ratio estimate is somewhat lower than the hazard ratio estimates in Figure 1 under either model (2.1) or the piecewise constant hazards ratio model. The confidence band is wider than those in Figure 1 for the beginning and later parts of the data range, reflecting the difficulty in making nonparametric inference on the hazard functions, especially with heavy censoring and in the tail region.
Fig. 3.
Nonparametric 95% pointwise confidence intervals and simultaneous confidence band of the hazard ratio function for the WHI data: dotted line—estimated hazard ratio; solid lines—simultaneous confidence band; and dashed lines—pointwise confidence intervals.
From the results here and additional numerical studies and real data applications, we find that for the hazard ratio, the EP bands are preferable if the interest is in the largest possible data range; if the interest is in part of the middle portion, then the HW bands are usually better. For the average hazard ratio, the simple confidence band proposed here works adequately, although could possibly be improved if more elaborate weights are used.
6. DISCUSSION
We have focused on the model of Yang and Prentice (2005) in deriving inference procedures for the hazard ratio function. Under this model, the hazard ratio involves the baseline survivor function, but not the baseline density function, a property shared by some other semiparametric models. Thus, inference on the hazard ratio may be easier and more reliable than approaches involving densities, such as those under the accelerated failure time model or the nonparametric approaches.
To assess the cumulative treatment effect, we have worked with the average hazard ratio function here, partly due to its close connection with the hazard ratio and its corresponding ready interpretation. Alternatively, the ratios ST(t)/SC(t) and (1 − ST(t))/(1 − SC(t)) or the difference ST(t) − SC(t), could be considered.
We expect that the model of Yang and Prentice (2005) can provide an adequate approximation for a wide range of applications. More rigorous model checking procedures would be useful to address model fit and robustness issues. Note that the form of this model is not closed under a relabeling of treatment and control groups, so its use may be more natural if there is a “no treatment ” or “standard treatment” control group. It would be possible to study hazard ratio function estimation for larger classes of semiparametric models to incorporate an even wider range of time dependence of the hazard ratio, though there is a trade off between the model fit and increasing variance, as well as analysis cost. Also, while we have focused on the 2-sample comparison here, adjustment for covariates may be considered. These and other problems are worthy of further exploration.
SUPPLEMENTARY MATERIAL
Supplementary material is available at http://biostatistics.oxfordjournals.org.
FUNDING
National Institutes of Health (CA 53996 to Ross L. Prentice).
Acknowledgments
We thank Co-Editor Anastasios Tsiatis, a referee, and an associate editor for helpful comments and suggestions. Conflict of Interest: None declared.
References
- Abrahamowicz M, Mackenzie TA. Joint estimation of time-dependent and non-linear effects of continuous covariates on survival. Statistics in Medicine. 2007;26:392–408. doi: 10.1002/sim.2519. [DOI] [PubMed] [Google Scholar]
- Bie O, Borgan ø, Liestøl K. Confidence intervals and confidence bands for the cumulative hazard rate function and their small-sample properties. Scandinavian Journal of Statistics. 1987;14:221–233. [Google Scholar]
- Cai Z, Sun Y. Local linear estimation for time-dependent coefficients in Cox's regression models. Scandinavian Journal of Statistics. 2003;30:93–111. [Google Scholar]
- Chen YQ, Cheng S. Linear life expectancy regression with censored data. Biometrika. 2006;93:303–313. [Google Scholar]
- Chen YQ, Wang MC. Analysis of accelerated hazards models. Journal of the American Statistical Association. 2000;95:608–618. [Google Scholar]
- Cheng SC, Wei LJ, Ying Z. Predicting survival probabilities with semiparametric transformation models. Journal of the American Statistical Association. 1997;92:227–235. [Google Scholar]
- Cox DR. Regression models and life-tables (with discussion) Journal of the Royal Statistical Society, Series B. 1972;34:187–220. [Google Scholar]
- Cox DR. Partial likelihood. Biometrika. 1975;62:269–276. [Google Scholar]
- Cox DR, Snell EJ. A general definition of residuals (with discussion) Journal of the Royal Statistical Society, Series B. 1968;30:248–275. [Google Scholar]
- Gilbert PB, Wei LJ, Kosorok MR, Clemens JD. Simultaneous inferences on the contrast of two hazard functions with censored observations. Biometrics. 2002;58:773–780. doi: 10.1111/j.0006-341x.2002.00773.x. [DOI] [PubMed] [Google Scholar]
- Gray RJ. Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. Journal of the American Statistical Association. 1992;87:942–951. [Google Scholar]
- Hsieh F. A transformation model for two survival curves: an empirical process approach. Biometrika. 1996;83:519–528. [Google Scholar]
- Kooperberg C, Stone CJ, Truong YK. Hazard regression. Journal of the American Statistical Association. 1995;90:78–94. [Google Scholar]
- Lin DY, Fleming TR, Wei LJ. Confidence bands for survival curves under the proportional hazards model. Biometrika. 1994;81:73–81. [Google Scholar]
- Lin DY, Wei LJ, Ying Z. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika. 1993;80:557–572. [Google Scholar]
- Manson JE, Hsia J, Johnson KC, Rossouw JE, Assaf AR, Lasser NL, Trevisan M, Black HR, Heckbert SR, Detrano R and others. Estrogen plus progestin and the risk of coronary heart disease. The New England Journal of Medicine. 2003;349:523–534. doi: 10.1056/NEJMoa030808. [DOI] [PubMed] [Google Scholar]
- Nair VN. Confidence bands for survival functions with censored data: a comparative study. Technometrics. 1984;26:265–275. [Google Scholar]
- Peng L, Huang Y. Survival analysis with temporal covariate effects. Biometrika. 2007;94:719–733. [Google Scholar]
- Prentice RL, Langer R, Stefanick ML, Howard BV, Pettinger M, Anderson G, Barad D, Curb JD, Kotchen J, Kuller L and others. Combined postmenopausal hormone therapy and cardiovascular disease: toward resolving the discrepancy between observational studies and the women's health initiative clinical trial. American Journal of Epidemiology. 2005;162:404–414. doi: 10.1093/aje/kwi223. [DOI] [PubMed] [Google Scholar]
- Tian L, Zucker D, Wei LJ. On the Cox model with time-varying regression coefficients. Journal of the American statistical Association. 2005;100:172–183. [Google Scholar]
- Tong X, Zhu C, Sun J. Semiparametric regression analysis of two-sample current status data, with applications to tumorigenicity experiments. The Canadian Journal of Statistics. 2007;35:575–584. [Google Scholar]
- Tsodikov A. Semi-parametric models of long- and short-term survival: an application to the analysis of breast cancer survival in Utah by age and state. Statistics in Medicine. 2002;21:895–920. doi: 10.1002/sim.1054. [DOI] [PubMed] [Google Scholar]
- Vaupel JW, Manton KG, Stallard E. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography. 1979;16:439–454. [PubMed] [Google Scholar]
- Writing Group For the Women's Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the women's health initiative randomized controlled trial. Journal of the American Medical Association. 2002;288:321–333. doi: 10.1001/jama.288.3.321. [DOI] [PubMed] [Google Scholar]
- Yang S, Prentice RL. Semiparametric analysis of short-term and long-term hazard ratios with two-sample survival data. Biometrika. 2005;92:1–17. [Google Scholar]
- Zeng D, Lin DY. Maximum likelihood estimation in semiparametric regression models with censored data. Journal of the Royal Statistical Society, Series B. 2007;69:507–564. [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.













