Comparison of Multiple Hazard Rate Functions

Zhongxue Chen; Hanwen Huang; Peihua Qiu

doi:10.1111/biom.12412

Author manuscript; available in PMC: 2018 Apr 23.

Published in final edited form as: Biometrics. 2015 Sep 22;72(1):39–45. doi: 10.1111/biom.12412


PMCID: PMC5912921 NIHMSID: NIHMS959137 PMID: 26393315

SUMMARY

Many robust tests have been proposed in the literature to compare two hazard rate functions; however, very few of them can be used when multiple hazard rate functions are to be compared. In this paper, we propose an approach for detecting differences among multiple hazard rate functions. Through a simulation study and a real-data application, we show that the new method is robust and powerful in many situations, compared with some commonly used tests.

Keywords: Asymptotically independent, counting process, crossing, survival data

1. Introduction

In survival data analysis, one is often interested in comparing the hazard rate functions or survival curves of different treatment groups. A large number of tests have been proposed in the literature to compare two hazard rate functions. See, for example, the so-called $G^{ρ,γ}$ test statistics introduced by Fleming and Harrington (1991), which include the commonly used log-rank test due to Mantel and Haenszel (1959) and the Gehan-Wilcoxon test due to Gehan (1965) as special cases. The $G^{ρ,γ}$ tests are weighted log-rank tests based on the weighted sum of the differences between the observed and expected numbers of events in one group at each failure time point. In the $G^{ρ,γ}$ tests, the weights are usually non-negative. It has been shown that, in different situations, the $G^{ρ,γ}$ tests can be improved by choosing appropriate weights (Buyske, Fagerstrom and Ying, 2000; Yang and Prentice, 2010).

In practice, the two hazard rate functions may cross each other. If this is the case, then the $G^{ρ,γ}$ tests are not optimal and may even have little or no power at all. To circumvent this limitation, negative weights have been considered and assigned to some failure time points (Moreau et al., 1992; Qiu and Sheng, 2008). Qiu and Sheng (2008) proposed a two-stage approach to compare two hazard rate functions. In the first stage, they used the log-rank test, which has been shown to be the most powerful test when the hazard rates are proportional (Fleming and Harrington, 1991); in the second stage, a weighted log-rank test with possibly negative weights was designed to detect the difference between two hazard rate functions when they cross each other. An overall p-value was then calculated based on the two test statistics. This two-stage test was shown to be robust and to have good power to detect the difference between two hazard rate functions, regardless of whether they cross each other or not. Other methods designed specifically for detecting the difference between two crossing hazard rate functions have also been proposed, based either on other functions (e.g., absolute or squared) of the differences between the two hazard rates or on modeling techniques (Lin and Wang, 2004; Liu, Qiu and Sheng, 2007; Park and Qiu, 2014).

When there are multiple treatment groups, the log-rank test and the Gehan-Wilcoxon test can be extended and applied, but they may have little or no power if the hazard rate functions cross each other. Robust methods with properties as good as those of the two-stage approach are highly desirable. However, there is no simple extension of the aforementioned two-stage approach for comparing multiple hazard rate functions. In this paper, we propose a new approach based on a series of asymptotically independent tests. An overall p-value for each test is calculated using a robust p-value combining method.

The rest of the paper is organized as follows. In Section 2, we describe the proposed method; in Section 3, we study the numerical performance of the new test through a simulation study; in Section 4, we illustrate the new test using a real-data application; Section 5 concludes the paper with some remarks.

2. Proposed Method

Suppose we have K treatment groups; for group k (k = 1, 2, …, K), the survival function of the survival time, T_ik (i = 1, 2, …, n_k), is S_k(t) = 1 − F_k(t), and the survival function of the censoring time, C_ik, is L_k(t) = 1 − G_k(t), where F_k(t) and G_k(t) are the corresponding cumulative distribution functions. Let {t₁, t₂, ⋯, t_D} be the set of D distinct ordered event times in the pooled sample. Define d_ik as the observed number of events out of Y_ik individuals at risk in the k^th group at time t_i (i = 1, 2, ⋯, D). Denote $d_{i} = (d_{i1}, d_{i2}, \dots, d_{iK})^{T}$ and $Y_{i} = (Y_{i1}, Y_{i2}, \dots, Y_{iK})^{T}$. Let X_ik = min(T_ik, C_ik), $δ_{ik} = I(T_{ik} < C_{ik})$, and π_k(t) = P(X_ik > t) = S_k(t)L_k(t) (k = 1, 2, …, K), where I(·) is the indicator function. In this paper, we make the conventional assumption that the survival times T_ik and the censoring times C_ik are independent.

To test for the homogeneity of the hazard rate functions among the K groups, we consider the following hypotheses:

H₀:S₁(t) = S₂(t) = ⋯ = S_K(t), and
H₁: at least one S_k differs from the others at some time t.

In the two-sample situation (i.e., K = 2), the un-weighted log-rank test is constructed as follows:

U = h \sum_{i = 1}^{D} w_{1 i} (d_{i 1} - Y_{i 1} \frac{d_{i 1} + d_{i 2}}{Y_{i 1} + Y_{i 2}}) / \sqrt{\sum_{i = 1}^{D} w_{1 i}^{2} \frac{Y_{i 1}}{Y_{i 1} + Y_{i 2}} \frac{Y_{i 2}}{Y_{i 1} + Y_{i 2}} \frac{Y_{i 1} + Y_{i 2} - (d_{i 1} + d_{i 2})}{(Y_{i 1} + Y_{i 2}) - 1} (d_{i 1} + d_{i 2})},

(1)

where $h = \left(\frac{n_{1} + n_{2}}{n_{1} n_{2}}\right)^{1/2}$, and $w_{1i} = 1$ for all i = 1, 2, …, D. Under the null hypothesis, U in (1) has an asymptotic standard normal distribution.
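As an illustration, the sum and the variance term in (1) can be computed directly from the observed times and event indicators. The following Python sketch is not the authors' implementation: the function name `logrank_statistic` and its arguments are hypothetical, the at-risk set at t_i is taken to be all subjects with X ≥ t_i, and the constant h is left out on the assumption that it cancels once the statistic is standardized (as it does in (A.5) of the Appendix).

```python
from math import sqrt


def logrank_statistic(times1, events1, times2, events2, weights=None):
    """Standardized two-sample (weighted) log-rank statistic, following (1).

    times1, times2  : observed times X = min(T, C) in groups 1 and 2
    events1, events2: 1 if the event was observed, 0 if censored
    weights         : optional list of w_i, one per distinct event time;
                      None means w_i = 1 (the un-weighted test)
    Assumption: the constant h of (1) is omitted from this sketch.
    """
    # Distinct ordered event times t_1 < ... < t_D in the pooled sample
    event_times = sorted(
        {t for t, e in zip(times1, events1) if e}
        | {t for t, e in zip(times2, events2) if e}
    )
    numerator = 0.0
    variance = 0.0
    for i, t in enumerate(event_times):
        y1 = sum(1 for x in times1 if x >= t)  # at risk in group 1
        y2 = sum(1 for x in times2 if x >= t)  # at risk in group 2
        d1 = sum(1 for x, e in zip(times1, events1) if e and x == t)
        d2 = sum(1 for x, e in zip(times2, events2) if e and x == t)
        y, d = y1 + y2, d1 + d2
        w = 1.0 if weights is None else weights[i]
        # observed minus expected number of events in group 1
        numerator += w * (d1 - y1 * d / y)
        if y > 1:  # the variance term is undefined with a single subject at risk
            variance += w * w * (y1 / y) * (y2 / y) * ((y - d) / (y - 1)) * d
    return numerator / sqrt(variance)
```

Passing a list of possibly negative weights gives a weighted statistic of the same form; swapping the two groups changes only the sign of the result.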

The weighted log-rank test statistic considered by Qiu and Sheng (2008) is defined as follows.

V_{m} = h \sum_{i = 1}^{D} w_{2 i}^{(m)} (d_{i 1} - Y_{i 1} \frac{d_{i 1} + d_{i 2}}{Y_{i 1} + Y_{i 2}}) / \sqrt{\sum_{i = 1}^{D} {(w_{2 i}^{(m)})}^{2} \frac{Y_{i 1}}{Y_{i 1} + Y_{i 2}} \frac{Y_{i 2}}{Y_{i 1} + Y_{i 2}} \frac{Y_{i 1} + Y_{i 2} - (d_{i 1} + d_{i 2})}{(Y_{i 1} + Y_{i 2}) - 1} (d_{i 1} + d_{i 2})},

(2)

where $w_{2i}^{(m)} = \begin{cases} -1 & \text{if } i = 1, 2, \dots, m, \\ \hat{c}_{m} & \text{otherwise}, \end{cases}$ m = [Dr] is the integer part of Dr for any r ∈ [ε, 1 − ε], 0 < ε < 0.5, and $\hat{c}_{m} = \sum_{i=1}^{m} \frac{\hat{L}_{1}(t_{i}) \hat{L}_{2}(t_{i})}{(n_{1}/n) \hat{L}_{1}(t_{i}) + (n_{2}/n) \hat{L}_{2}(t_{i})} Δ\hat{S}(t_{i}) \Big/ \sum_{i=m+1}^{D} \frac{\hat{L}_{1}(t_{i}) \hat{L}_{2}(t_{i})}{(n_{1}/n) \hat{L}_{1}(t_{i}) + (n_{2}/n) \hat{L}_{2}(t_{i})} Δ\hat{S}(t_{i})$, which is an estimate of the quantity

c_{r} = \int_{0}^{F^{- 1} (r)} \frac{L_{1} (s) L_{2} (s)}{p_{1} L_{1} (s) + p_{2} L_{2} (s)} d F (s) / \int_{F^{- 1} (r)}^{u} \frac{L_{1} (s) L_{2} (s)}{p_{1} L_{1} (s) + p_{2} L_{2} (s)} d F (s),

(3)

where $\hat{L}_{l}(t)$ is the estimate of $L_{l}(t)$ (l = 1, 2), $\hat{S}(t)$ is the estimate of the survival function S(t) using the pooled sample, F(t) = 1 − S(t), n = n₁ + n₂, $p_{1} = \lim_{n \to \infty} \frac{n_{1}}{n_{1} + n_{2}}$, $p_{2} = \lim_{n \to \infty} \frac{n_{2}}{n_{1} + n_{2}}$, m = [Dr], and u = inf[s : min{π₁(s), π₂(s)} = 0].

Under the null hypothesis, each V_m defined in (2) has an asymptotic standard normal distribution (Qiu and Sheng, 2008). Furthermore, let

V = \sup_{D_{ε} \leq m \leq D - D_{ε}} V_{m},

(3)

The statistic V is designed specifically for detecting the difference between two crossing hazard rate functions. For U and V, we have the following result (Qiu and Sheng, 2008):

Lemma 1

Under the null hypothesis, U and V_m for each m, and therefore, U and V, are asymptotically independent.

The sampling distribution of V in (3) under the null hypothesis is hard to derive explicitly because the V_m's are correlated. However, its p-value can be estimated using the bootstrap (Qiu and Sheng, 2008).

Next, we generalize the statistics U and V to cases with multiple hazard rate functions (i.e., K > 2). To this end, we obtain K−1 pairs of U and V by comparing two hazard rate functions K−1 times. Specifically, for each k (k = 1, 2, …, K−1), we compare two groups, one formed by pooling the original treatment groups 1, 2, …, k and the other being the original group k+1, and obtain the two respective statistics U_k and V_k. For the statistics U_k and V_k, we have the following result.
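The sequence of K−1 two-sample comparisons described above can be sketched as a small generator. This is only an illustration: the function name `sequential_comparisons` is hypothetical, and each group is assumed to be stored as a list of observations in the order 1, 2, …, K.

```python
def sequential_comparisons(groups):
    """Yield (k, pooled, new) for k = 1, ..., K-1.

    pooled: all observations from the original groups 1, ..., k
    new   : the observations of the original group k+1
    """
    for k in range(1, len(groups)):
        pooled = [obs for group in groups[:k] for obs in group]
        yield k, pooled, groups[k]


# Toy example with K = 3 groups of (time, event) pairs
groups = [[(1.0, 1), (2.5, 0)], [(0.7, 1)], [(1.2, 1), (3.0, 0)]]
for k, pooled, new in sequential_comparisons(groups):
    print(k, len(pooled), len(new))
```

Each yielded pair of samples would then be passed to the two-sample procedures that produce U_k and V_k.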

Theorem 1

Under the null hypothesis, the 2(K−1) statistics, U_k, V_k (k=1, 2, …, K−1) are asymptotically independent.

The proof of Theorem 1 is given in the Appendix. It should be pointed out that when K=2, the two statistics U₁ and V₁ are the same statistics as in (1) and (3) obtained by the Qiu and Sheng method. Based on the properties of those statistics U_k and V_k, we define some test statistics.

2.1 The U test

Define the test statistic

U_{O} = \sum_{i = 1}^{K - 1} {(χ_{1}^{2})}^{- 1} (1 - P_{U_{i}}),

(4)

where $P_{U_{i}}$ is the p-value from the test U_i (i = 1, 2, ⋯, K − 1), and $(χ_{1}^{2})^{-1}(\cdot)$ is the inverse of the cumulative distribution function of the chi-square distribution with 1 degree of freedom (df).

We will call the test U_O in (4) the U test. Under the null hypothesis, the $P_{U_{i}}$ are asymptotically independent and uniformly distributed between 0 and 1. Therefore, under the null hypothesis, U_O has an asymptotic chi-square distribution with K − 1 df. Its p-value can be calculated as $P_{U} = pr[χ_{K-1}^{2} > U_{O}]$. The U test is an extension of the two-sample log-rank test, and it performs similarly to the multiple-sample log-rank test. When the hazard rate functions are parallel, this test should be powerful.
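The combination in (4) can be sketched with the Python standard library alone. The function names below are hypothetical. The sketch uses two standard facts: the chi-square quantile with 1 df is the square of a standard normal quantile, and the chi-square upper tail probability has a closed-form finite series for integer df. Each input p-value is assumed to lie strictly between 0 and 1.

```python
from math import erfc, exp, gamma, sqrt
from statistics import NormalDist


def chi2_1_quantile(q):
    """Inverse CDF of the chi-square distribution with 1 df at q (0 < q < 1)."""
    z = NormalDist().inv_cdf((1.0 + q) / 2.0)
    return z * z


def chi2_sf(x, df):
    """pr[chi-square with integer df > x], via the closed-form finite series."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        return exp(-half) * sum(half ** j / gamma(j + 1) for j in range(df // 2))
    tail = sum(half ** (j - 0.5) / gamma(j + 0.5)
               for j in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(half)) + exp(-half) * tail


def combine_pvalues(pvalues):
    """Return (statistic, overall p-value) for K-1 independent p-values, as in (4)."""
    stat = sum(chi2_1_quantile(1.0 - p) for p in pvalues)
    return stat, chi2_sf(stat, len(pvalues))


# Hypothetical p-values from U_1 and U_2 when K = 3
stat, p_overall = combine_pvalues([0.03, 0.40])
print(stat, p_overall)
```

With a single p-value the function returns that p-value unchanged, which is a convenient sanity check for the K = 2 case.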

2.2 The V test

Define the test statistic

V_{O} = \sum_{i = 1}^{K - 1} {(χ_{1}^{2})}^{- 1} (1 - P_{V_{i}}),

(13)

where $P_{V_{i}}$ is the p-value from the test V_i (i = 1,2, ⋯, K − 1).

We will call the test V_O in (13) the V test. From Theorem 1, under the null hypothesis the V_i are asymptotically independent. Therefore, under the null hypothesis, V_O has an asymptotic chi-square distribution with K − 1 df. Its p-value can be calculated as $P_{V} = pr[χ_{K-1}^{2} > V_{O}]$. The V test is an extension of the weighted log-rank test, which is powerful when the hazard rate functions cross.

2.3 The UV test

From Theorem 1, under the null hypothesis, U_k and V_k′ (k, k′ = 1, 2, ⋯, K − 1) are asymptotically independent. Therefore, we have the following result.

Theorem 2

Under the null hypothesis, the U test and the V test statistics are asymptotically independent.

The proof of Theorem 2 is given in the Appendix.

Based on this result, a test statistic can be constructed as U_O + V_O. We will call this test the UV test. Under the null hypothesis, the UV test statistic has an asymptotic chi-square distribution with 2(K−1) df. Therefore, its p-value can be calculated as

P_{UV} = pr\{χ_{2(K-1)}^{2} > U_{O} + V_{O}\} = pr\left\{χ_{2(K-1)}^{2} > \sum_{i=1}^{K-1} (χ_{1}^{2})^{-1}(1 - P_{U_{i}}) + \sum_{i=1}^{K-1} (χ_{1}^{2})^{-1}(1 - P_{V_{i}})\right\}.

It should be pointed out that when K=2, the above UV test is constructed based on the two statistics U and V as obtained by the Qiu and Sheng method. However, the Qiu and Sheng approach obtains the overall p-value from U and V in a different way. Our simulation study (data not shown) indicates that our proposed UV test is usually more powerful than the Qiu and Sheng test when K=2.
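For K = 2 the UV p-value takes a particularly simple form, because the upper tail probability of a chi-square distribution with 2 df at x is exp(−x/2). The sketch below uses only the Python standard library; the function name `uv_pvalue_two_groups` is hypothetical, and both inputs are assumed to lie strictly between 0 and 1.

```python
from math import exp
from statistics import NormalDist


def uv_pvalue_two_groups(p_u, p_v):
    """UV p-value for K = 2, combining the p-values of the U and V tests."""

    def chi2_1_quantile_upper(p):
        # (chi2_1)^{-1}(1 - p) equals the squared normal quantile at 1 - p/2
        z = NormalDist().inv_cdf(1.0 - p / 2.0)
        return z * z

    stat = chi2_1_quantile_upper(p_u) + chi2_1_quantile_upper(p_v)
    # chi-square with 2(K-1) = 2 df: pr[chi2_2 > x] = exp(-x/2)
    return exp(-stat / 2.0)


# Rounded U and V p-values reported in rows (a), (c) and (d) of Table 2
for p_u, p_v in [(0.061, 0.040), (0.019, 0.026), (0.66, 0.69)]:
    print(p_u, p_v, uv_pvalue_two_groups(p_u, p_v))
```

With the rounded inputs from Table 2, the sketch returns approximately 0.021, 0.0054 and 0.84, in line with the UV column for the two-group comparisons.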

3. A Simulation Study

In this section, we conduct a simulation study to demonstrate the performance of the U test, the V test and the UV test. We compare these methods with the commonly used log-rank (LR) test and the Gehan-Wilcoxon (GW) test for multiple samples. In this simulation study, we assume there are three treatment groups with hazard rate functions h₁(t), h₂(t), and h₃(t), respectively. We assume the censoring times are uniformly distributed between 0 and 2. Two different sample sizes are considered: 50 and 100 for each group. The empirical type I error rate and power are estimated based on 1000 replicates using a significance level of 0.05. When the null hypothesis is true, we set h₁(t) = h₂(t) = h₃(t) = 1 to estimate the type I error rate. To estimate the power, we consider the following four cases, as shown in Figure 1: (a) h₁(t) = 1, h₂(t) = 1.2, h₃(t) = 1.5; (b) h₁(t) = 1, h₂(t) = 1.5, h₃(t) = 1 + 0.5t; (c) h₁(t) = 1, h₂(t) = 1.2, h₃(t) = 2t; and (d) h₁(t) = 1, h₂(t) = 0.5 + t, h₃(t) = 2t. Note that, in (a), the three hazard rate functions are parallel; in (b), the three hazard rate functions cross once (beyond time 0); in (c), the three hazard rate functions cross twice; and in (d), the three hazard rate functions cross three times. From (a) to (d), the degree of crossing among the three hazard rate functions is in increasing order.

Figure 1. Four sets of hazard rate functions used in the simulation study for power estimation.
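All hazard rate functions in this design are linear, h(t) = a + bt, so survival times can be generated by inverting the cumulative hazard at + bt²/2 at a unit exponential variate. The following Python sketch shows one way the data could be simulated. It is an assumption about the mechanics, not the authors' simulation code; the names `CASES`, `invert_cumulative_hazard` and `simulate_group` and the seed are hypothetical.

```python
import math
import random

# Hazards h(t) = a + b*t, stored as (a, b) for groups 1, 2 and 3
CASES = {
    "null": [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)],
    "a": [(1.0, 0.0), (1.2, 0.0), (1.5, 0.0)],
    "b": [(1.0, 0.0), (1.5, 0.0), (1.0, 0.5)],
    "c": [(1.0, 0.0), (1.2, 0.0), (0.0, 2.0)],
    "d": [(1.0, 0.0), (0.5, 1.0), (0.0, 2.0)],
}


def invert_cumulative_hazard(a, b, e):
    """Solve a*t + b*t**2/2 = e for t >= 0."""
    if b == 0.0:
        return e / a
    return (-a + math.sqrt(a * a + 2.0 * b * e)) / b


def simulate_group(a, b, n, rng, censor_max=2.0):
    """Return n pairs (X, delta) with X = min(T, C) and delta = 1 if T < C."""
    sample = []
    for _ in range(n):
        e = -math.log(1.0 - rng.random())      # unit exponential variate
        t = invert_cumulative_hazard(a, b, e)  # survival time with hazard a + b*t
        c = rng.uniform(0.0, censor_max)       # censoring time, Uniform(0, 2)
        sample.append((min(t, c), 1 if t < c else 0))
    return sample


rng = random.Random(2015)  # arbitrary seed, for reproducibility only
data = [simulate_group(a, b, 50, rng) for a, b in CASES["c"]]
print([sum(delta for _, delta in group) for group in data])  # events per group
```

The inversion works because the cumulative hazard evaluated at the survival time follows a unit exponential distribution.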

Table 1 reports the empirical type I error rate and power for each method considered in the simulation. From Table 1, we can see that the U test, the V test and the proposed UV test all control the type I error rate quite well. We also observe that the U test performs similarly to the LR test, because both are extensions of the two-sample LR test, obtained in different ways. When there are no or very few crossings among the hazard rate functions (e.g., cases (a) and (b)), the V test has little or no power; therefore, the proposed UV test has slightly lower power than the U test. However, when there are more crossings (e.g., cases (c) and (d)), the V test can be very powerful while the LR and GW tests lose power dramatically. From this simulation example, it can be seen that the U, LR and GW tests are powerful when there are no or few crossings, the V test is powerful when there are more crossings, and the UV test is always close to the best test in all cases considered. Therefore, the UV test is robust to different patterns of the hazard rate functions. It should be pointed out that in the simulation study we simply use the number of crossings to indicate the degree of crossing. However, the performance of the V test and the UV test may also depend on when and how the crossings take place.

Table 1.

Empirical type I error rate and power of each method obtained from 1000 replicates with nominal significance level of 0.05.

Sample size	Test	Type I error rate	Power (a)	Power (b)	Power (c)	Power (d)
n₁ = n₂ = n₃ = 50	U	0.048	0.281	0.289	0.132	0.044
n₁ = n₂ = n₃ = 50	V	0.043	0.036	0.079	0.845	0.721
n₁ = n₂ = n₃ = 50	UV	0.047	0.187	0.244	0.832	0.628
n₁ = n₂ = n₃ = 50	LR	0.048	0.273	0.310	0.185	0.066
n₁ = n₂ = n₃ = 50	GW	0.052	0.235	0.268	0.449	0.215
n₁ = n₂ = n₃ = 100	U	0.047	0.545	0.495	0.301	0.072
n₁ = n₂ = n₃ = 100	V	0.036	0.049	0.137	0.995	0.965
n₁ = n₂ = n₃ = 100	UV	0.054	0.431	0.464	0.995	0.935
n₁ = n₂ = n₃ = 100	LR	0.051	0.533	0.521	0.349	0.092
n₁ = n₂ = n₃ = 100	GW	0.053	0.469	0.479	0.777	0.356


4. A Real Data Example

In this section, we apply the proposed test to a real data set from the randomized, double-blinded Digoxin Intervention Trial (The Digitalis Investigation Group, 1997). In the trial, patients with left ventricular ejection fractions of 0.45 or less were randomly assigned to the digoxin (3397 patients) or placebo (3403 patients) group. A primary outcome was mortality due to worsening heart failure. Figure 2 (a) plots the Kaplan-Meier curves for the two treatment groups.

Figure 2. Kaplan-Meier curves for the two treatment groups (plot (a)) and the four treatment by gender groups (plot (b)).

In the original study, the authors used the LR test and obtained a p-value of 0.06, indicating that the evidence for the effectiveness of digoxin, in terms of reducing mortality due to worsening heart failure, is at most marginal. However, based on the Kaplan-Meier curves, the proportional hazards assumption underlying the LR test may not be valid, and tests other than the LR test should be considered. The p-values obtained from the other methods are shown in Table 2 (a). The p-value from the UV test is 0.021, which provides relatively stronger evidence for the effectiveness of the drug.

Table 2.

P-values from various tests when comparing (a) two groups (placebo, drug); (b) four groups (male placebo, male drug, female placebo, and female drug); (c) two groups (male placebo, male drug); and (d) two groups (female placebo, female drug).

Data used	U	V	UV	LR	GW
(a) P vs. D	0.061	0.040	0.021	0.061	0.050
(b) M-P, M-D, F-P, F-D	0.11	0.0083	0.0070	0.11	0.092
(c) M-P vs. M-D	0.019	0.026	0.0054	0.019	0.014
(d) F-P vs. F-D	0.66	0.69	0.84	0.66	0.66


We then consider the possible interaction between treatment and gender. Table 3 summarizes the data for the four groups. Figure 2 (b) plots the Kaplan-Meier curves for the four groups, which clearly show that the LR test is not optimal because the proportional hazards assumption is violated. Table 2 (b) reports the p-values from the different tests. Both the V test and the UV test give very small p-values. It is then of interest to compare the two treatments among male patients: M-P vs. M-D. Table 2 (c) gives the p-values from the various tests. The p-value from the UV test is 0.0054, so there is strong evidence that the drug is effective among male patients. Finally, we compare the treatments among female patients; the p-values from all the methods considered are shown in Table 2 (d). All tests give large p-values, so the data provide no evidence of the effectiveness of the drug for female patients, although it should be noted that there were far fewer female patients in the study.

Table 3.

Death due to worsening heart failure by treatment and gender.

Group	Yes (%)	No (%)	Total
male-placebo (M-P)	358 (13.6)	2281 (86.4)	2639
male-drug (M-D)	300 (11.4)	2342 (88.6)	2642
female-placebo (F-P)	91 (11.9)	673 (88.1)	764
female-drug (F-D)	94 (12.5)	661 (87.5)	755


5. Discussion and Conclusion

In this paper, we extend the two-sample LR test to multiple samples and call this extension the U test. The U test performs similarly to the traditional LR test and is therefore powerful when the hazard rate functions are parallel. We also propose another test, called the V test, which is designed specifically for situations in which the hazard rate functions cross each other and the U test fails. The proposed UV test is a robust approach that combines information from the U test and the V test in an efficient way. If we have information about the hazard rate functions prior to seeing the data, we can use only the U test or only the V test, as appropriate. For example, if the alternative hypothesis is directional, S₁(t) ≥ S₂(t) ≥ ⋯ ≥ S_K(t) (or S₁(t) ≤ S₂(t) ≤ ⋯ ≤ S_K(t)), the one-sided p-values based on the test statistics U_k (k = 1, 2, ⋯, K − 1) can be used to obtain an overall p-value. However, if this kind of information is unavailable, the proposed UV test would be a good choice.

In each of the U, V and UV tests, we choose to combine the related p-values using the chi-square distribution with 1 df because this method is robust (Chen, 2011; Chen and Nadarajah, 2014). Other robust methods of combining p-values, such as the Fisher test (Fisher, 1932), are also possible; however, the weighted z tests are not recommended here since they are not robust and may lose power dramatically in some situations. In addition, for the UV test, we can combine the two p-values from the U and V tests, namely P_U and P_V, with different df values df_U and df_V, i.e., $P_{UV} = pr\{χ_{df_{U} + df_{V}}^{2} > (χ_{df_{U}}^{2})^{-1}(1 - P_{U}) + (χ_{df_{V}}^{2})^{-1}(1 - P_{V})\}$. If we know that the hazard rate functions are mainly parallel, we can assign a large number to df_U and a relatively small number to df_V. On the other hand, if the hazard rate functions are dominated by crossing, we can assign a small number to df_U and a large number to df_V. When df_U = df_V = K−1, this reduces to the proposed UV test.
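The weighted combination of P_U and P_V described above can be sketched as follows, using only the Python standard library. This is a sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, df_U and df_V are restricted to positive integers so that the chi-square tail probability has a closed form, the p-values are assumed to lie strictly between 0 and 1, and the chi-square quantile is found by bisection.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """pr[chi-square with integer df > x], via the closed-form finite series."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        return exp(-half) * sum(half ** j / gamma(j + 1) for j in range(df // 2))
    tail = sum(half ** (j - 0.5) / gamma(j + 0.5)
               for j in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(half)) + exp(-half) * tail


def chi2_quantile_upper(p, df):
    """Return x such that pr[chi-square with df > x] = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > p:  # expand the bracket until it contains x
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def uv_pvalue(p_u, p_v, df_u, df_v):
    """Combine P_U and P_V with user-chosen integer df values df_U and df_V."""
    stat = chi2_quantile_upper(p_u, df_u) + chi2_quantile_upper(p_v, df_v)
    return chi2_sf(stat, df_u + df_v)


# Row (b) of Table 2 (K = 4), rounded inputs, with df_U = df_V = K - 1 = 3
print(uv_pvalue(0.11, 0.0083, 3, 3))
# Hypothetical choice that gives the U test more influence than the V test
print(uv_pvalue(0.11, 0.0083, 5, 1))
```

With df_U = df_V = 3 and the rounded inputs from row (b) of Table 2, the sketch gives a value of about 0.007, close to the UV entry reported there.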

In summary, the UV test proposed in this paper is a flexible and robust approach, as confirmed by a simulation study and a real-data application. More specifically, we have shown that this approach performs well in terms of both type I error rate control and power. In some situations, the gain in power is substantial compared with other commonly used methods, such as the LR test and the GW test.

Acknowledgments

The authors are grateful to the editor, Professor Yi-Hau Chen, for his helpful comments and suggestions, which led to improvement of the paper. The first author also acknowledges the support from the internal research funds awarded by Indiana University School of Public Health-Bloomington.

APPENDIX

For the proofs, first, we define the following statistics:

Z_{1k}\{w_{k:1}^{(1)}\} = Z_{1k}(w_{k:1}) = h_{k} \sum_{i=1}^{D_{k}} w_{k:1i} \left\{ d_{i(k+1)} - Y_{i(k+1)} \sum_{j=1}^{k+1} d_{ij} \Big/ \sum_{j=1}^{k+1} Y_{ij} \right\},

(A.1)

Z_{2k}\{w_{k:2}^{(r_{k})}\} = h_{k} \sum_{i=1}^{D_{k}} w_{k:2i}^{(r_{k})} \left\{ d_{i(k+1)} - Y_{i(k+1)} \sum_{j=1}^{k+1} d_{ij} \Big/ \sum_{j=1}^{k+1} Y_{ij} \right\},

(A.2)

where k = 1, 2, …, K − 1, $h_{k} = \sqrt{\sum_{j=1}^{k+1} n_{j} \Big/ \left\{ n_{k+1} \left( \sum_{j=1}^{k} n_{j} \right) \right\}}$, D_k is the total number of distinct failure times from groups 1, 2, …, k + 1, and $w_{k:l}^{(r)} = \{w_{k:l1}^{(r)}, w_{k:l2}^{(r)}, \dots, w_{k:lD_{k}}^{(r)}\}^{T}$ (l = 1, 2) are suitable weight functions defined as follows.

$w_{k:1i}^{(1)} = 1$, for i = 1, 2, …, D_k, and

w_{k:2i}^{(r_{k})} = \begin{cases} -1 & \text{if } i = 1, 2, \dots, m_{k}, \\ c_{r_{k}} & \text{otherwise}, \end{cases}

(A.3)

where

c_{r_{k}} ≜ \int_{0}^{F_{k +}^{- 1} (r_{k})} \frac{L_{k 1} (s) L_{k 2} (s)}{p_{k 1} L_{k 1} (s) + p_{k 2} L_{k 2} (s)} d F_{k +} (s) / \int_{F_{k +}^{- 1} (r_{k})}^{u_{k}} \frac{L_{k 1} (s) L_{k 2} (s)}{p_{k 1} L_{k 1} (s) + p_{k 2} L_{k 2} (s)} d F_{k +} (s),

(A.4)

$F_{k+}(t) = 1 - S_{k+}(t)$ is the cumulative distribution function of the survival time using the pooled data from groups 1, 2, …, k + 1; $p_{k1} = \lim_{n \to \infty} \frac{n_{k1}}{n_{k+}}$, $p_{k2} = \lim_{n \to \infty} \frac{n_{k2}}{n_{k+}}$, $m_{k} = [D_{k} r_{k}]$, and $u_{k} = \inf[s : \min\{π_{k1}(s), π_{k2}(s)\} = 0]$. Note that the weights $w_{k:2i}^{(r_{k})}$ defined in (A.3) and (A.4) are analogous to those defined in (2) and (3).

For the covariance between $Z_{lk}\{w_{k:l}^{((r_{k})^{l-1})}\}$ and $Z_{l'k'}\{w_{k':l'}^{((r_{k'})^{l'-1})}\}$ (k ≠ k′), we have the following key result.

Lemma 2

If k ≠ k′, then $cov[Z_{lk}\{w_{k:l}^{((r_{k})^{l-1})}\}, Z_{l'k'}\{w_{k':l'}^{((r_{k'})^{l'-1})}\}] = 0$ for l, l′ = 1, 2 and k, k′ = 1, 2, …, K − 1.

Proof of Lemma 2

First we define the following counting processes:

\begin{aligned}
Y_{ij}(t) &= I(X_{ij} > t), & i = 1, 2, \dots, K,\ j = 1, 2, \dots, n_{i}, \\
N_{ij}(t) &= I(X_{ij} < t, δ_{ij} = 1), & i = 1, 2, \dots, K,\ j = 1, 2, \dots, n_{i}, \\
\bar{Y}_{k1}(t) &= \sum_{j=1}^{n_{k+1}} Y_{(k+1)j}(t), & k = 1, 2, \dots, K - 1, \\
\bar{Y}_{k2}(t) &= \sum_{i=1}^{k} \sum_{j=1}^{n_{i}} Y_{ij}(t), & k = 1, 2, \dots, K - 1, \\
\bar{Y}_{k}(t) &= \bar{Y}_{k1}(t) + \bar{Y}_{k2}(t) = \sum_{i=1}^{k+1} \sum_{j=1}^{n_{i}} Y_{ij}(t), & k = 1, 2, \dots, K - 1.
\end{aligned}

To simplify the notation, for given r_k, we use $w_{k:l}$ to denote $w_{k:l}^{((r_{k})^{l-1})}$. For the statistic $Z_{lk}(w_{k:l}; t) = \sum_{i=1}^{D_{k}} w_{k:li} \left\{ d_{i(k+1)} - Y_{i(k+1)} \sum_{j=1}^{k+1} d_{ij} \Big/ \sum_{j=1}^{k+1} Y_{ij} \right\}$, where l = 1, 2 and k = 1, 2, …, K − 1, it can be shown that

Z_{l k} (w_{k : l}; t) = \sum_{i = 1}^{k + 1} \sum_{j = 1}^{n_{i}} \int_{0}^{t} H_{ijlk} (s) d M_{i j} (s),

where $H_{ijlk}(t) = (-1)^{I\{i \in (1, 2, \dots, k)\}} w_{k:l}(t) \bar{Y}_{k1}(t) \bar{Y}_{k2}(t) \left[ \bar{Y}_{k}(t) \{\bar{Y}_{k2}(t)\}^{I\{i \in (1, 2, \dots, k)\}} \{\bar{Y}_{k1}(t)\}^{I(i = k+1)} \right]^{-1}$, and the martingale $M_{ij}(t) = N_{ij}(t) - \int_{0}^{t} I\{X_{ij} \geq u\} dΛ(u)$. Similarly, $Z_{l'k'}(w_{k':l'}; t) = \sum_{i=1}^{k'+1} \sum_{j=1}^{n_{i}} \int_{0}^{t} H_{ijl'k'}(s) dM_{ij}(s)$, where $H_{ijl'k'}(t) = (-1)^{I\{i \in (1, 2, \dots, k')\}} w_{k':l'}(t) \bar{Y}_{k'1}(t) \bar{Y}_{k'2}(t) \left[ \bar{Y}_{k'}(t) \{\bar{Y}_{k'2}(t)\}^{I\{i \in (1, 2, \dots, k')\}} \{\bar{Y}_{k'1}(t)\}^{I(i = k'+1)} \right]^{-1}$. With these definitions, each individual in the pooled groups 1, 2, …, k receives the factor $-w_{k:l}(t) \bar{Y}_{k1}(t) / \bar{Y}_{k}(t)$ and each individual in group k + 1 receives the factor $w_{k:l}(t) \bar{Y}_{k2}(t) / \bar{Y}_{k}(t)$.

Without loss of generality, in this proof we assume k < k′. With the weight functions $w_{k:l}(t)$ and $w_{k':l'}(t)$ defined as in (A.3), it is easy to check that the required conditions in Theorem 2.6.2 of Fleming and Harrington (1991) are met; therefore, based on that theorem, the covariance between $Z_{lk}(w_{k:l}; t)$ and $Z_{l'k'}(w_{k':l'}; t)$ can be expressed as:

cov\{Z_{lk}(w_{k:l}; t), Z_{l'k'}(w_{k':l'}; t)\} = \sum_{i=1}^{k+1} \sum_{j=1}^{n_{i}} \int_{0}^{t} E\left[ H_{ijlk}(s) H_{ijl'k'}(s) I\{X_{ij} \geq s\} \right] \{1 - Λ(s)\} dΛ(s).

Therefore, $cov\{Z_{lk}(w_{k:l}; t), Z_{l'k'}(w_{k':l'}; t)\} = \int_{0}^{t} E\left[ \sum_{i=1}^{k+1} \sum_{j=1}^{n_{i}} H_{ijlk}(s) H_{ijl'k'}(s) I\{X_{ij} \geq s\} \right] \{1 - Λ(s)\} dΛ(s)$.

But

\begin{aligned}
\sum_{i=1}^{k+1} \sum_{j=1}^{n_{i}} H_{ijlk}(s) H_{ijl'k'}(s) I\{X_{ij} \geq s\}
&= \sum_{i=1}^{k} \sum_{j=1}^{n_{i}} H_{ijlk}(s) H_{ijl'k'}(s) I\{X_{ij} \geq s\} + \sum_{j=1}^{n_{k+1}} H_{(k+1)jlk}(s) H_{(k+1)jl'k'}(s) I\{X_{(k+1)j} \geq s\} \\
&= w_{k:l}(s) w_{k':l'}(s) \left[ \sum_{i=1}^{k} \sum_{j=1}^{n_{i}} I\{X_{ij} \geq s\} \frac{\bar{Y}_{k1}(s)}{\bar{Y}_{k}(s)} \frac{\bar{Y}_{k'1}(s)}{\bar{Y}_{k'}(s)} - \sum_{j=1}^{n_{k+1}} I\{X_{(k+1)j} \geq s\} \frac{\bar{Y}_{k2}(s)}{\bar{Y}_{k}(s)} \frac{\bar{Y}_{k'1}(s)}{\bar{Y}_{k'}(s)} \right] \\
&= c \left\{ \sum_{i=1}^{k} \sum_{j=1}^{n_{i}} Y_{ij}(s) \sum_{j=1}^{n_{k+1}} Y_{(k+1)j}(s) - \sum_{j=1}^{n_{k+1}} Y_{(k+1)j}(s) \sum_{i=1}^{k} \sum_{j=1}^{n_{i}} Y_{ij}(s) \right\} = 0,
\end{aligned}

where $c = w_{k:l}(s) w_{k':l'}(s) \frac{1}{\bar{Y}_{k}(s)} \frac{\bar{Y}_{k'1}(s)}{\bar{Y}_{k'}(s)}$. Here we use the fact that, since k < k′, every individual in groups 1, 2, …, k + 1 belongs to the pooled sample of the k′-th comparison.

Finally, let $u_{k,k'} = \inf[s : \min\{π_{k1}(s), π_{k2}(s), π_{k'1}(s), π_{k'2}(s)\} = 0]$. Then the covariance $cov(Z_{lk}, Z_{l'k'}) = cov\{Z_{lk}(u_{k,k'}), Z_{l'k'}(u_{k,k'})\} = 0$ when k ≠ k′.

Proof of Theorem 1

Note that

U_{k} = \frac{Z_{1k}(w_{k:1})}{h_{k} \hat{σ}\{Z_{1k}(w_{k:1})\}} \quad (k = 1, 2, \dots, K - 1),

(A.5)

where $\hat{σ}\{Z_{1k}(w_{k:1})\} = \sqrt{\sum_{i=1}^{D_{k}} w_{k:1i}^{2} \frac{Y_{k:i1}}{Y_{k:i}} \frac{Y_{k:i2}}{Y_{k:i}} \frac{Y_{k:i} - d_{k:i}}{Y_{k:i} - 1} d_{k:i}}$, $Y_{k:i1} = Y_{i(k+1)}$, $Y_{k:i2} = \sum_{j=1}^{k} Y_{ij}$, $Y_{k:i} = Y_{k:i1} + Y_{k:i2}$, and $d_{k:i} = \sum_{j=1}^{k+1} d_{ij}$.

And

V_{k} = \sup_{D_{k_{ε}} \leq m_{k} \leq D_{k} - D_{k_{ε}}} V_{k:m_{k}},

(A.6)

where $V_{k:m_{k}} = \frac{Z_{2k}\{\hat{w}_{k:2}^{(m_{k})}\}}{h_{k} \hat{σ}[Z_{2k}\{\hat{w}_{k:2}^{(m_{k})}\}]}$,

\begin{aligned}
\hat{σ}[Z_{2k}\{\hat{w}_{k:2}^{(m_{k})}\}] &= \sqrt{\sum_{i=1}^{D_{k}} \{\hat{w}_{k:2i}^{(m_{k})}\}^{2} \frac{Y_{k:i1}}{Y_{k:i}} \frac{Y_{k:i2}}{Y_{k:i}} \frac{Y_{k:i} - d_{k:i}}{Y_{k:i} - 1} d_{k:i}}, \\
\hat{w}_{k:2}^{(m_{k})} &= \{\hat{w}_{k:21}^{(m_{k})}, \hat{w}_{k:22}^{(m_{k})}, \dots, \hat{w}_{k:2D_{k}}^{(m_{k})}\}^{T}, \\
\hat{w}_{k:2i}^{(m_{k})} &= \begin{cases} -1 & \text{if } i = 1, 2, \dots, m_{k}, \\ \hat{c}_{m_{k}} & \text{otherwise}, \end{cases} \\
\hat{c}_{m_{k}} &= \sum_{i=1}^{m_{k}} \frac{\hat{L}_{k1}(t_{i}) \hat{L}_{k2}(t_{i})}{(n_{k1}/n_{k+}) \hat{L}_{k1}(t_{i}) + (n_{k2}/n_{k+}) \hat{L}_{k2}(t_{i})} Δ\hat{S}_{k}(t_{i}) \Big/ \sum_{i=m_{k}+1}^{D_{k}} \frac{\hat{L}_{k1}(t_{i}) \hat{L}_{k2}(t_{i})}{(n_{k1}/n_{k+}) \hat{L}_{k1}(t_{i}) + (n_{k2}/n_{k+}) \hat{L}_{k2}(t_{i})} Δ\hat{S}_{k}(t_{i}), \\
n_{k1} &= n_{k+1}, \quad n_{k2} = \sum_{i=1}^{k} n_{i}, \quad n_{k+} = n_{k1} + n_{k2} = \sum_{i=1}^{k+1} n_{i},
\end{aligned}

(A.7)

and ${\hat{L}}_{k 1} (t)$ is the K-M estimate of the survival function for censoring time using the (k+1)^th sample, and ${\hat{L}}_{k 2} (t)$ is the K-M estimate of the survival function for censoring time using pooled sample from groups 1, 2, …, k and ${\hat{S}}_{k} (t)$ is the K-M estimate of the survival function for survival time using pooled sample from groups 1, 2, …, k + 1.

We have the following result for $\hat{c}_{m_{k}}$ in (A.7) (Qiu and Sheng, 2008): it converges in probability to

c_{r_{k}} = \int_{0}^{F_{k +}^{- 1} (r_{k})} \frac{L_{k 1} (s) L_{k 2} (s)}{p_{k 1} L_{k 1} (s) + p_{k 2} L_{k 2} (s)} d F_{k +} (s) / \int_{F_{k +}^{- 1} (r_{k})}^{u_{k}} \frac{L_{k 1} (s) L_{k 2} (s)}{p_{k 1} L_{k 1} (s) + p_{k 2} L_{k 2} (s)} d F_{k +} (s) .

(A.8)

The quantity defined in (A.8) is analogous to that in (3).

(i) U_k and V_k for the same k: this part can be proved in the same way as the proof of Lemma 1.

(ii) U_k and U_k′ (k ≠ k′): from Corollary 7.2.1 in Fleming and Harrington (1991), see also Appendix A in Qiu and Sheng (2008), we know that under the null hypothesis $Z_{1k}(w_{k:1}) / \hat{σ}\{Z_{1k}(w_{k:1})\}$ has an asymptotic standard normal distribution, and $σ^{2}\{Z_{1k}(w_{k:1})\} / \hat{σ}^{2}\{Z_{1k}(w_{k:1})\}$ converges to 1 in probability, where $σ^{2}\{Z_{1k}(w_{k:1})\} = \int_{0}^{u_{k}} w_{k:1}^{2}(s) \frac{L_{k1}(s) L_{k2}(s)}{p_{k1} L_{k1}(s) + p_{k2} L_{k2}(s)} dF_{k+}(s)$. From Lemma 2, we also have $cov(Z_{1k}, Z_{1k'}) = 0$ for k ≠ k′. Therefore $cov(U_{k}, U_{k'}) \to \frac{1}{σ\{Z_{1k}(w_{k:1})\} σ\{Z_{1k'}(w_{k':1})\}} cov(Z_{1k}, Z_{1k'}) = 0$, and U_k and U_k′ (k, k′ = 1, 2, …, K − 1, k ≠ k′) are asymptotically independent.

(iii) V_k and V_k′ (k ≠ k′): as in (ii), it can be shown that under the null hypothesis $V_{k:m_{k}}$ and $V_{k':m_{k'}}$ are asymptotically independent. Notice that $V_{k} = \sup_{D_{k_{ε}} \leq m_{k} \leq D_{k} - D_{k_{ε}}} V_{k:m_{k}}$; based on Theorems 18.10 and 18.11 in van der Vaart (1998), under the null hypothesis V_k and each $V_{k':m_{k'}}$ are asymptotically independent. Similarly, under the null hypothesis V_k′ and each $V_{k:m_{k}}$ are asymptotically independent. Therefore, under the null hypothesis V_k and V_k′ are asymptotically independent.

(iv) U_k and V_k′ (k ≠ k′): as in (ii), it can be shown that under the null hypothesis U_k and each $V_{k':m_{k'}}$ are asymptotically independent. Therefore, again by Theorems 18.10 and 18.11 in van der Vaart (1998), under the null hypothesis U_k and V_k′ are asymptotically independent.

Proof of Theorem 2

Under the null hypothesis, the U test statistic is the sum of K − 1 asymptotically independent chi-square random variables, each with 1 df, and the V test statistic is the sum of another K − 1 asymptotically independent chi-square random variables, each with 1 df. The two sets of chi-square random variables are mutually asymptotically independent; therefore, under the null hypothesis, the U test and the V test statistics are asymptotically independent.

References

Buyske S, Fagerstrom R, Ying Z. A class of weighted log-rank tests for survival data when the event is rare. Journal of the American Statistical Association. 2000;95:249–258.
Chen Z. Is the weighted z-test the best method for combining probabilities from independent tests? Journal of Evolutionary Biology. 2011;24:926–930. doi: 10.1111/j.1420-9101.2010.02226.x.
Chen Z, Nadarajah S. On the optimally weighted z-test for combining probabilities from independent studies. Computational Statistics & Data Analysis. 2014;70:387–394.
Fisher RA, editor. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd; 1932.
Fleming TR, Harrington DP. Counting Processes and Survival Analysis. New York: Wiley; 1991.
Gehan EA. A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika. 1965;52:203–223.
Lin X, Wang H. A new testing approach for comparing the overall homogeneity of survival curves. Biometrical Journal. 2004;46:489–496.
Liu K, Qiu P, Sheng J. Comparing two crossing hazard rates by Cox proportional hazards modelling. Statistics in Medicine. 2007;26:375–391. doi: 10.1002/sim.2544.
Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies. J Natl Cancer Inst. 1959;22:719–748.
Moreau T, Maccario J, Lellouch J, Huber C. Weighted log rank statistics for comparing two distributions. Biometrika. 1992;79:195–198.
Park KY, Qiu P. Model selection and diagnostics for joint modeling of survival and longitudinal data with crossing hazard rate functions. Statistics in Medicine. 2014;33:4532–4546. doi: 10.1002/sim.6259.
Qiu P, Sheng J. A two-stage procedure for comparing hazard rate functions. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2008;70:191–208.
The Digitalis Investigation Group. The effect of digoxin on mortality and morbidity in patients with heart failure. N Engl J Med. 1997;336:525–533. doi: 10.1056/NEJM199702203360801.
Van der Vaart AW. Asymptotic Statistics. Cambridge University Press; 1998.
Yang S, Prentice R. Improved logrank-type tests for survival data using adaptive weights. Biometrics. 2010;66:30–38. doi: 10.1111/j.1541-0420.2009.01243.x.

