Omnibus test for restricted mean survival time based on influence function

Jiaqi Gu; Yiwei Fan; Guosheng Yin

doi:10.1177/09622802231158735

2023 Apr 4;32(6):1082–1099. doi: 10.1177/09622802231158735


PMCID: PMC10331519 PMID: 37015346

Abstract

The restricted mean survival time (RMST), which evaluates the expected survival time up to a pre-specified time point $τ$ , has been widely used to summarize the survival distribution due to its robustness and straightforward interpretation. In comparative studies with time-to-event data, the RMST-based test has been utilized as an alternative to the classic log-rank test because the power of the log-rank test deteriorates when the proportional hazards assumption is violated. To overcome the challenge of selecting an appropriate time point $τ$ , we develop an RMST-based omnibus Wald test to detect the survival difference between two groups throughout the study follow-up period. Treating a vector of RMSTs at multiple quantile-based time points as a statistical functional, we construct a Wald $χ^{2}$ test statistic and derive its asymptotic distribution using the influence function. We further propose a new procedure based on the influence function to estimate the asymptotic covariance matrix in contrast to the usual bootstrap method. Simulations under different scenarios validate the size of our RMST-based omnibus test and demonstrate its advantage over the existing tests in power, especially when the true survival functions cross within the study follow-up period. For illustration, the proposed test is applied to two real datasets, which demonstrate its power and applicability in various situations.

Keywords: Influence function, Kaplan–Meier estimator, perturbation procedure, survival analysis, Wald test

1. Introduction

In statistical analysis of clinical studies with time-to-event data, one important task is to test the equality of two survival functions of treatment and control groups:

H_{0} : S_{1} (t) = S_{2} (t) versus H_{1} : S_{1} (t) \neq S_{2} (t) for \, some t

(1)

where $S_{k} (t)$ is the true survival function of group $k$ ( $k = 1, 2$ ). In the past several decades, the log-rank test¹ has been predominantly used, and it possesses the highest statistical power under the proportional hazards (PH) assumption, that is, when the hazard functions of the two groups are proportional to each other.^2,3 However, if the PH assumption is violated, which is frequently encountered in practice, the log-rank test is no longer optimal and other more powerful tests can be used to distinguish $S_{1} (t)$ and $S_{2} (t)$ , for example, a family of weighted log-rank tests. Unlike the log-rank test, which weighs all events equally, the weighted log-rank tests assign different weights to different events, for example, putting more weight on early survival differences or late survival differences. With the weight function constructed on the basis of the estimated survival function of combined data⁴ or the proportion of at-risk observations,^5,6 one can place more weight on events within a particular time period in order to detect the difference in survival functions with higher power.

Due to censoring, the mean survival time may not be estimable and, as a result, the restricted mean survival time (RMST) has been developed as a summary statistic for the survival distribution.^7–9 The $τ$ -RMST with respect to the survival function $S (t)$ is defined as the expected value of survival time $T$ up to a time point $τ$ ,

μ (τ; S) = E (min {T, τ}) = \int_{0}^{τ} S (t) dt .

(2)

In contrast to the typical mean survival time which is sensitive to outliers, the RMST is robust due to the incorporation of $τ$ in its calculation. By plugging in the Kaplan–Meier¹⁰ (KM) estimator $\hat{S} (t)$ , a non-parametric maximum likelihood estimator (NPMLE) for the survival function,¹¹ $μ (τ; S)$ can be easily estimated as the area under $\hat{S} (t)$ from $0$ to $τ$ , denoted as $μ (τ; \hat{S})$ . To compare the survival distributions of two groups, inferences based on RMST have been extensively studied.^12–17 In recent works,^18–20 the RMST-based test has been shown to be more powerful than the log-rank test when the PH assumption is violated and the power is comparable when the PH assumption holds.
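As a minimal sketch of the plug-in estimate described above, the following pure-Python code builds the KM step function and integrates it from $0$ to $τ$. The function names `km_survival` and `rmst` and the toy data are illustrative assumptions, not taken from the original analysis.

```python
def km_survival(times, events):
    """Kaplan-Meier estimate as a list of (distinct observed time, S_hat at that time)."""
    steps, s = [], 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        s *= 1.0 - deaths / at_risk
        steps.append((t, s))
    return steps


def rmst(steps, tau):
    """Area under the right-continuous KM step function from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


# Toy data (hypothetical): the second subject is censored at time 2.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 1]
curve = km_survival(times, events)
print(rmst(curve, 3.5))  # 2.6875
```

The sketch recomputes the risk set at every distinct time, which is quadratic in the sample size; it is meant to mirror the definition rather than to be efficient.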

A crucial issue in the RMST-based test is the selection of the time point $τ$ , which can be specified at the design stage or selected in a data-driven way after data collection.²¹ Because the power of the RMST-based test highly depends on the pattern of differences between the two survival functions, an improperly chosen $τ$ would lead to poor performance in identifying the survival differences. For example, the survival functions $S_{1} (t)$ and $S_{2} (t)$ in Figure 1 cross within the interval $[0, 2]$ and the $τ$ -RMSTs of the two groups evaluated at $τ = 2$ happen to be equal, so that the difference $μ (2; S_{1}) - μ (2; S_{2})$ cannot distinguish the survival functions. To overcome the aforementioned challenge, several non-parametric RMST-based tests have been developed to adapt to the pattern of differences between the survival functions with high robustness and flexibility. Horiguchi et al.²² proposed the RMST-based versatile test by constructing the test statistic as the maximum of $Z$ -statistics corresponding to RMST differences $μ (τ_{1}; S_{1}) - μ (τ_{1}; S_{2}), \dots, μ (τ_{d}; S_{1}) - μ (τ_{d}; S_{2})$ , while Wolski et al.²³ used the absolute difference of RMST for comparing net survival distributions. Without analytical derivation of the null distribution, all the existing methods unfortunately rely on numerical procedures, such as wild bootstrap and permutation, to make inferences and are thus computationally intensive.

Figure 1. The true survival functions of two treatment groups whose $τ$ -restricted mean survival times ( $τ$ -RMSTs) evaluated at $τ = 2$ are equal.

Inspired by the concept of influence function²⁴ and its relationship with the asymptotic distribution of statistical functional,²⁵ we propose an omnibus Wald test for the hypothesis in (1) based on RMST and the KM estimator. Via a perturbation procedure, we can estimate the asymptotic covariance matrix of estimated RMSTs at multiple time points $τ_{1}, \dots, τ_{d}$ for each group with higher computational efficiency than the existing method.²⁶ We construct a Wald test statistic for a vector of RMST differences $μ (τ_{1}; S_{1}) - μ (τ_{1}; S_{2}), \dots, μ (τ_{d}; S_{1}) - μ (τ_{d}; S_{2})$ , which follows a $χ_{d}^{2}$ distribution asymptotically. To our knowledge, our work is the first one to use an influence function for asymptotic covariance matrix estimation and omnibus hypothesis testing in survival analysis, which nicely circumvents intensive resampling-based computation. Extensive numerical studies are conducted to assess the size and power of the proposed Wald test in comparison with several existing tests. It is shown that the performance of our Wald test is stable across various cases with power comparable to the state-of-the-art tests. Particularly, when the true survival functions $S_{1} (t)$ and $S_{2} (t)$ cross, our method outperforms all the existing tests dramatically.

The remainder of this paper is organized as follows. In Section 2, we introduce the RMST as a statistical functional and derive the asymptotic distribution in terms of influence function. A perturbation procedure is further developed to compute the RMST-based Wald test statistic, along with recommendations on the selection of time points. We conduct extensive simulations in Section 3 to demonstrate the effectiveness of estimation of the asymptotic covariance matrix, validate our RMST-based Wald test and compare it with existing methods under different scenarios. In Section 4, we apply the proposed test to analyze two real datasets and illustrate its broad applicability to different cases. Section 5 concludes with a brief discussion.

2. Methodology

2.1. RMST as statistical functional

Let $T_{1}, \dots, T_{n}$ be independent and identically distributed (i.i.d.) event times with the survival function $S (t)$ . Let $C_{1}, \dots, C_{n}$ be i.i.d. censoring times with the survival function $G (c)$ , which are assumed to be independent of event times. Let $(X_{1}, Δ_{1}), \dots, (X_{n}, Δ_{n})$ denote the observed time-to-event data, where $X_{i} = min (T_{i}, C_{i})$ and $Δ_{i} = I (T_{i} \leq C_{i})$ are, respectively, the observed time and the censoring indicator for subject $i$ . Denote the joint survival distribution of the event time and censoring time as $H (t, c) = S (t) G (c),$ which can be evaluated on its entire support. According to the definition in (2), the vector of RMSTs evaluated at $τ_{1}, \dots, τ_{d}$ :

\begin{aligned} ψ (H) & = {(\begin{matrix} μ (τ_{1}; S), & \dots, & μ (τ_{d}; S) \end{matrix})}^{T} \\ & = {(\begin{matrix} \int_{t = 0}^{τ_{1}} \left\{ - \int_{c = 0}^{\infty} d H (t, c) \right\} d t, & \dots, & \int_{t = 0}^{τ_{d}} \left\{ - \int_{c = 0}^{\infty} d H (t, c) \right\} d t \end{matrix})}^{T} \end{aligned}

can be treated as a statistical functional of $H (t, c)$ ,²⁷ where $d H (t, c)$ is the differential of the joint survival function $H (t, c)$ . Even though RMST in (2) appears to be irrelevant to the survival function $G (c)$ of the censoring time, we still involve $G (c)$ in the statistical functional expression of RMST throughout this article for reasons detailed in Remark 1.

Let $\# \{ i : \text{condition} \}$ denote the number of observations that meet the specified condition after the colon and let $X_{(1)} < \dots < X_{(M)}$ be the $M$ distinct observed times. Under the convention that $Δ_{i} = I (T_{i} \leq C_{i})$ , the KM estimators of $S (t)$ and $G (c)$ are, respectively, given by

\hat{S} (t) = \prod_{m : X_{(m)} \leq t} \left\{ 1 - \frac{\# \{ i : X_{i} = X_{(m)}; Δ_{i} = 1 \}}{\# \{ i : X_{i} \geq X_{(m)} \}} \right\}, \quad t \leq X_{(M)}

(3)

\hat{G} (c) = \prod_{m : X_{(m)} \leq c} \left\{ 1 - \frac{\# \{ i : X_{i} = X_{(m)}; Δ_{i} = 0 \}}{\# \{ i : X_{i} > X_{(m)} \ \text{or} \ (X_{i}, Δ_{i}) = (X_{(m)}, 0) \}} \right\}, \quad c \leq X_{(M)}

(4)

which provide the NPMLEs of the true survival functions $S (t)$ and $G (c)$ .²⁸ Based on the multiple time points $τ_{1}, \dots, τ_{d}$ , we can obtain a vector of estimated RMSTs:

ψ (\hat{H}) = (μ (τ_{1}; \hat{S}), \dots, μ (τ_{d}; \hat{S}))^{T}

(5)

where $min {τ_{1}, \dots, τ_{d}} \geq X_{(1)}$ , $max {τ_{1}, \dots, τ_{d}} \leq X_{(M)}$ and $\hat{H} (t, c) = \hat{S} (t) \hat{G} (c)$ .²¹ To estimate the asymptotic covariance matrix of the estimated RMSTs in (5), we utilize the influence function for the asymptotic analysis as detailed in Section 2.2.
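The two KM estimators in (3) and (4) can be sketched together as follows. This is an illustrative pure-Python implementation under the tie convention stated above (events at a given time leave the risk set before censored observations at the same time); the name `km_pair` and the toy data are assumptions. A convenient sanity check, which follows from multiplying the two product formulas, is that $\hat{S} (t) \hat{G} (t)$ reproduces the empirical survival function of the observed times.

```python
def km_pair(times, events):
    """Return (grid, S_hat, G_hat) evaluated at each distinct observed time."""
    grid = sorted(set(times))
    s_vals, g_vals = [], []
    s = g = 1.0
    for t in grid:
        n_event = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        n_cens = sum(1 for x, e in zip(times, events) if x == t and e == 0)
        at_risk = sum(1 for x in times if x >= t)
        s *= 1.0 - n_event / at_risk
        # Risk set for censoring: X_i > t, or censored exactly at t.
        at_risk_cens = at_risk - n_event
        if n_cens > 0:
            g *= 1.0 - n_cens / at_risk_cens
        s_vals.append(s)
        g_vals.append(g)
    return grid, s_vals, g_vals


# Toy data (hypothetical) with a tied event and censoring at time 2.
times = [1, 2, 2, 3, 5]
events = [1, 1, 0, 0, 1]
grid, s_hat, g_hat = km_pair(times, events)
for t, s, g in zip(grid, s_hat, g_hat):
    empirical = sum(1 for x in times if x > t) / len(times)
    print(t, round(s * g, 6), round(empirical, 6))
```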

Remark 1

Although the vector of RMSTs can also be treated as a statistical functional of the survival function $S (t)$ alone, using the influence function corresponding to $S (t)$ would underestimate the asymptotic covariance matrix of RMSTs. The reason is that the KM estimator $\hat{S} (t)$ is determined by the i.i.d. observations $(X_{1}, Δ_{1}), \dots, (X_{n}, Δ_{n})$ from the probabilistic measure:

$P r (X > x, Δ) = Δ \int_{x \leq t \leq c} d H (t, c) + (1 - Δ) \int_{x \leq c < t} d H (t, c)$

for $x \in [0, \infty), Δ \in {0, 1}$ . This implies that the distribution of estimated RMSTs is determined by $H (t, c)$ and would vary as $G (c)$ changes even under the same $S (t)$ . This can also be seen in Example IV.3.8. of Andersen et al.²⁹ that the asymptotic variance of the estimated RMST $μ (τ; \hat{S})$ ( $τ > 0$ ) depends on the survival function of the observed time

$Y (u) = Pr (X > u) = S (u) G (u), u > 0$

If the influence function of the statistical functional of $S (t)$ is used, we can only estimate the asymptotic covariance matrix of estimated RMSTs conditional on $\hat{G} (c)$ , which is smaller than the true asymptotic covariance matrix as shown in Section 3.1. As a result, the type I error rate of the RMST-based Wald test proposed in Section 2.3 would be inflated.

2.2. Influence function

Considering a general parameter vector $ψ (H)$ expressed as a statistical functional of the joint survival function $H$ , the influence function of $ψ$ at the point $(t, c)$ is defined as

φ (t, c; ψ, H) = lim_{ϵ \to 0} \frac{ψ ((1 - ϵ) H + ϵ δ_{t, c}) - ψ (H)}{ϵ}

(6)

where $δ_{t, c}$ is the survival function of the Dirac delta distribution that places probability one at the point $(t, c)$ . As long as the statistical functional $ψ (H)$ is differentiable with respect to the joint survival function $H$ , the influence function $φ (t, c; ψ, H)$ measures the rate and the direction at which $ψ (H)$ changes when $H$ is slightly perturbed by the Dirac delta distribution $δ_{t, c}$ or by a contaminated observation at $(t, c)$ .³⁰ Thus, the NPMLE of $ψ (H)$ with a bounded influence function is robust because an outlier can only shift the estimator $ψ (\hat{H})$ in a controlled range.

Given the NPMLE $\hat{H} (t, c)$ , we can apply the Taylor expansion to $ψ (\hat{H})$ as follows:

\begin{aligned} ψ (\hat{H}) - ψ (H) = & \int φ (t, c; ψ, H) {d \hat{H} (t, c) - d H (t, c)} + R_{n} \\ = & \int \int φ (t, c; ψ, H) d \hat{S} (t) d \hat{G} (c) - \int \int φ (t, c; ψ, H) d S (t) d G (c) + R_{n} \end{aligned}

where $R_{n}$ is the remainder. Provided that

E {φ (t, c; ψ, H)} = \int \int φ (t, c; ψ, H) d S (t) d G (c) = 0

(7)

and

\sqrt{n} R_{n} \to 0 in probability

(8)

it is well known that

\sqrt{n} {ψ (\hat{H}) - ψ (H)} = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} φ (T_{i}, C_{i}; ψ, H) + o_{p} (1)

If the covariance matrix

\begin{aligned} V_{ψ} = & E {φ (t, c; ψ, H) φ (t, c; ψ, H)^{T}} \\ = & \int \int φ (t, c; ψ, H) φ (t, c; ψ, H)^{T} d S (t) d G (c) \end{aligned}

(9)

is finite, the central limit theorem implies that

\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} φ (T_{i}, C_{i}; ψ, H) \overset{D}{⟶} MVN (0, V_{ψ})

where $\overset{D}{⟶}$ stands for convergence in distribution, and further we have

\sqrt{n} {ψ (\hat{H}) - ψ (H)} \overset{D}{⟶} MVN (0, V_{ψ})

(10)

In practice, given the observed data $(X_{1}, Δ_{1}), \dots, (X_{n}, Δ_{n})$ and the KM estimators (3)–(4), the asymptotic covariance matrix $V_{ψ}$ can be estimated by

\begin{aligned} {\hat{V}}_{ψ} & = \int \int φ (t, c; ψ, \hat{H}) φ (t, c; ψ, \hat{H})^{T} d \hat{S} (t) d \hat{G} (c) \\ & = \sum_{j = 1}^{M + 1} \sum_{l = 1}^{M + 1} \left[ φ (X_{(j)}, X_{(l)}; ψ, \hat{H}) φ (X_{(j)}, X_{(l)}; ψ, \hat{H})^{T} \{ \hat{S} (X_{(j - 1)}) - \hat{S} (X_{(j)}) \} \{ \hat{G} (X_{(l - 1)}) - \hat{G} (X_{(l)}) \} \right] \end{aligned}

where $X_{(0)} = 0$ , $X_{(M + 1)} = \infty$ , $\hat{S} (X_{(M + 1)}) = \hat{G} (X_{(M + 1)}) = 0$ and $φ (t, c; ψ, \hat{H})$ is the empirical influence function of $ψ$ at the point $(t, c)$ . Based on the definition in (6), the influence function $φ (t, c; ψ, H)$ equals the ratio between the change of $ψ$ and the probability weight $ϵ$ of distributional perturbation on $H$ at $(t, c)$ for small enough $ϵ$ . As a result, we propose a perturbation procedure to estimate the empirical influence function $φ (X_{(j)}, X_{(l)}; ψ, \hat{H})$ ( $j, l = 1, \dots, M + 1$ ) with details given below.

(i)
Generate a pseudo observation $(X_{(m^{*})}, Δ_{(m^{*})})$ with $X_{(m^{*})} = min (X_{(j)}, X_{(l)})$ , $Δ_{(m^{*})} = I (X_{(j)} \leq X_{(l)})$ . The pseudo observation characterizes the distributional perturbation on $\hat{H}$ at $(X_{(j)}, X_{(l)})$ .

(ii)
Compute the KM estimators of the survival functions perturbed by the pseudo observation with weight $ϵ$ :

\begin{aligned} {\hat{S}}_{ϵ} (t) & = \prod_{m : X_{(m)} \leq t} \left\{ 1 - \frac{\# \{ i : X_{i} = X_{(m)}; Δ_{i} = 1 \} + \frac{n ϵ}{1 - ϵ} I \{ X_{(m^{*})} = X_{(m)}; Δ_{(m^{*})} = 1 \}}{\# \{ i : X_{i} \geq X_{(m)} \} + \frac{n ϵ}{1 - ϵ} I \{ X_{(m^{*})} \geq X_{(m)} \}} \right\} \\ {\hat{G}}_{ϵ} (c) & = \prod_{m : X_{(m)} \leq c} \left\{ 1 - \frac{\# \{ i : X_{i} = X_{(m)}; Δ_{i} = 0 \} + \frac{n ϵ}{1 - ϵ} I \{ X_{(m^{*})} = X_{(m)}; Δ_{(m^{*})} = 0 \}}{\# \{ i : X_{i} > X_{(m)} \ \text{or} \ (X_{i}, Δ_{i}) = (X_{(m)}, 0) \} + \frac{n ϵ}{1 - ϵ} I \{ X_{(m^{*})} > X_{(m)} \ \text{or} \ (X_{(m^{*})}, Δ_{(m^{*})}) = (X_{(m)}, 0) \}} \right\} \end{aligned}

(11)

where $t \leq X_{(M)}$ , $c \leq X_{(M)}$ , and the perturbation parameter $ϵ > 0$ takes a small value (e.g. $ϵ = 0.001$ ).

(iii)
Estimate the empirical influence function at point $(X_{(j)}, X_{(l)})$ by

\hat{φ} (X_{(j)}, X_{(l)}; ψ, \hat{H}) = \frac{ψ ({\hat{H}}_{ϵ}) - ψ (\hat{H})}{ϵ}

(12)

where ${\hat{H}}_{ϵ} (t, c) = {\hat{S}}_{ϵ} (t) {\hat{G}}_{ϵ} (c)$ and $ψ ({\hat{H}}_{ϵ}) - ψ (\hat{H})$ characterizes the change of $ψ$ due to the distributional perturbation of weight $ϵ$ on $\hat{H}$ at $(X_{(j)}, X_{(l)})$ .
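Steps (i) to (iii) can be sketched in a few lines of pure Python. This is an illustrative reading of the procedure, not the authors' code: the names `km_steps`, `rmst` and `empirical_influence` are assumptions, and because the RMST vector depends on the joint estimate only through its event-time marginal, the sketch perturbs $\hat{S}$ only and does not compute ${\hat{G}}_{ϵ}$ .

```python
def km_steps(times, events, pseudo=None, eps=0.0):
    """KM estimate of S; optionally perturbed by a pseudo observation (x*, delta*)."""
    n = len(times)
    w = n * eps / (1.0 - eps)  # mass added for the pseudo observation, as in (11)
    steps, s = [], 1.0
    for t in sorted(set(times)):
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        at_risk = sum(1 for x in times if x >= t)
        if pseudo is not None:
            px, pd = pseudo
            if px == t and pd == 1:
                deaths += w
            if px >= t:
                at_risk += w
        s *= 1.0 - deaths / at_risk
        steps.append((t, s))
    return steps


def rmst(steps, tau):
    """Area under the KM step function from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


def empirical_influence(times, events, x_j, x_l, taus, eps=1e-3):
    """Steps (i)-(iii): pseudo observation, perturbed KM, finite difference (12)."""
    pseudo = (min(x_j, x_l), 1 if x_j <= x_l else 0)   # step (i)
    base = km_steps(times, events)
    pert = km_steps(times, events, pseudo, eps)        # step (ii)
    return [(rmst(pert, tau) - rmst(base, tau)) / eps for tau in taus]  # step (iii)


# Hypothetical uncensored toy sample: here the KM estimator is the empirical
# distribution, so the result should be min(2, 3.5) minus the mean of min(T, 3.5).
print(empirical_influence([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1], 2.0, 3.0, [3.5]))
```

In this uncensored toy case the printed value is close to $2 - 2.375 = - 0.375$ ; passing `float("inf")` for `x_j` or `x_l` covers the boundary index $M + 1$ .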

By (11), the value $\hat{φ} (X_{(j)}, X_{(l)}; ψ, \hat{H})$ remains the same as long as the values of $X_{(m^{*})} = min (X_{(j)}, X_{(l)})$ and $Δ_{(m^{*})} = I (X_{(j)} \leq X_{(l)})$ are unchanged with respect to different pairs of $j$ and $l$ . For convenience, we define ${\hat{φ}}_{X_{(m^{*})}, Δ_{(m^{*})}} = \hat{φ} (X_{(j)}, X_{(l)}; ψ, \hat{H})$ for each unique pair $(X_{(m^{*})}, Δ_{(m^{*})})$ . Given the KM estimators in (3) to (4), we have

\begin{aligned} {\hat{V}}_{ψ} = & \sum_{m^{*} = 1}^{M + 1} \sum_{l = m^{*}}^{M + 1} {\hat{φ}}_{X_{(m^{*})}, 1} {\hat{φ}}_{X_{(m^{*})}, 1}^{T} \{ \hat{S} (X_{(m^{*} - 1)}) - \hat{S} (X_{(m^{*})}) \} \{ \hat{G} (X_{(l - 1)}) - \hat{G} (X_{(l)}) \} \\ & + \sum_{m^{*} = 1}^{M} \sum_{j = m^{*} + 1}^{M + 1} {\hat{φ}}_{X_{(m^{*})}, 0} {\hat{φ}}_{X_{(m^{*})}, 0}^{T} \{ \hat{S} (X_{(j - 1)}) - \hat{S} (X_{(j)}) \} \{ \hat{G} (X_{(m^{*} - 1)}) - \hat{G} (X_{(m^{*})}) \} \\ = & \sum_{m^{*} = 1}^{M + 1} {\hat{φ}}_{X_{(m^{*})}, 1} {\hat{φ}}_{X_{(m^{*})}, 1}^{T} \{ \hat{S} (X_{(m^{*} - 1)}) - \hat{S} (X_{(m^{*})}) \} \hat{G} (X_{(m^{*} - 1)}) \\ & + \sum_{m^{*} = 1}^{M + 1} {\hat{φ}}_{X_{(m^{*})}, 0} {\hat{φ}}_{X_{(m^{*})}, 0}^{T} \hat{S} (X_{(m^{*})}) \{ \hat{G} (X_{(m^{*} - 1)}) - \hat{G} (X_{(m^{*})}) \} \end{aligned}

(13)

Compared with the existing formula,²⁶ our perturbation procedure can estimate the asymptotic covariance matrix at a much lower cost. In the work of Murray and Tsiatis,²⁶ the asymptotic covariance of $μ (τ_{a}; \hat{S})$ and $μ (τ_{b}; \hat{S})$ ( $1 \leq a \leq b \leq d$ ) is estimated as

\begin{aligned} \sum_{m : X_{(m)} < τ_{a}} \frac{{μ (τ_{a}; \hat{S}) - μ (X_{(m)}; \hat{S})} {μ (τ_{b}; \hat{S}) - μ (X_{(m)}; \hat{S})}}{\hat{G} (X_{(m - 1)}) \hat{S} (X_{(m - 1)})} \frac{# {i : X_{i} = X_{(m)}; Δ_{i} = 1}}{# {i : X_{i} \geq X_{(m)}}} \end{aligned}

with computational complexity $O (n^{3})$ and as a result, at least $O (n^{3} d^{2})$ operations are needed to obtain ${\hat{V}}_{ψ}$ . If a (wild) bootstrap procedure with $B$ bootstrap replicates is adopted, the computational cost can be reduced to $O (B n d)$ . In contrast, the total computational complexity of the perturbation procedure and equation (13) is $O (n^{2} d)$ . Hence, if the number of bootstrap replicates $B$ is set at a value larger than the sample size $n$ , our method would be more computationally efficient than the bootstrap.

Remark 2

If all the observed times $X_{1}, \dots, X_{n}$ are different (i.e. $M = n$ ), we have

$\begin{aligned} {\hat{S} (X_{(m^{*} - 1)}) - \hat{S} (X_{(m^{*})})} \hat{G} (X_{(m^{*} - 1)}) & = \frac{Δ_{(m^{*})}}{n} \\ \hat{S} (X_{(m^{*})}) {\hat{G} (X_{(m^{*} - 1)}) - \hat{G} (X_{(m^{*})})} & = \frac{1 - Δ_{(m^{*})}}{n} \end{aligned}$

for $m^{*} = 1, \dots, n$ . As a result, the estimated asymptotic covariance matrix reduces to

$\begin{aligned} {\hat{V}}_{ψ} = & \frac{1}{n} \sum_{m^{*} = 1}^{n} {\hat{φ}}_{X_{(m^{*})}, Δ_{(m^{*})}} {\hat{φ}}_{X_{(m^{*})}, Δ_{(m^{*})}}^{T} \end{aligned}$

which equals the Jackknife estimator of the covariance matrix of $\hat{ψ}$ ³¹ multiplied by the sample size $n$ if $ϵ = - 1 / (n - 1)$ is used to compute the empirical influence function.
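For the no-ties case of Remark 2, the covariance estimate is simply the average outer product of the empirical influence vectors, one per observation. The following sketch assumes distinct observed times; the names `km_steps`, `rmst` and `cov_no_ties` are illustrative, and the naive KM recomputation inside the loop is chosen for readability, so it does not attain the $O (n^{2} d)$ cost quoted above.

```python
def km_steps(times, events, pseudo=None, eps=0.0):
    """KM estimate of S, optionally perturbed by a pseudo observation (x*, delta*)."""
    n = len(times)
    w = n * eps / (1.0 - eps)
    steps, s = [], 1.0
    for t in sorted(set(times)):
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        at_risk = sum(1 for x in times if x >= t)
        if pseudo is not None:
            px, pd = pseudo
            if px == t and pd == 1:
                deaths += w
            if px >= t:
                at_risk += w
        s *= 1.0 - deaths / at_risk
        steps.append((t, s))
    return steps


def rmst(steps, tau):
    """Area under the KM step function from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


def cov_no_ties(times, events, taus, eps=1e-3):
    """V_hat = (1/n) * sum_i phi_i phi_i^T, valid when all observed times differ."""
    n, d = len(times), len(taus)
    base_steps = km_steps(times, events)
    base = [rmst(base_steps, tau) for tau in taus]
    v = [[0.0] * d for _ in range(d)]
    for x, e in zip(times, events):
        pert_steps = km_steps(times, events, (x, e), eps)
        phi = [(rmst(pert_steps, tau) - b0) / eps for tau, b0 in zip(taus, base)]
        for a in range(d):
            for c in range(d):
                v[a][c] += phi[a] * phi[c] / n
    return v


# Hypothetical uncensored toy sample with time points 2 and 3.5.
for row in cov_no_ties([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1], [2.0, 3.5]):
    print([round(val, 6) for val in row])
```

Without censoring the result coincides with the divide-by-$n$ sample covariance of $(min {T_{i}, 2}, min {T_{i}, 3.5})$ , which gives a quick plausibility check.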

2.3. RMST-Based Wald test

Now we consider a vector of RMSTs at the pre-specified time points $τ_{1}, \dots, τ_{d}$ , $ψ (H) = (μ (τ_{1}; S), \dots, μ (τ_{d}; S))^{T}$ . By the linearity of RMSTs with respect to the joint survival function, we have

\begin{aligned} φ (t, c; ψ, H) = & lim_{ϵ \to 0} \frac{ψ ((1 - ϵ) H + ϵ δ_{t, c}) - ψ (H)}{ϵ} \\ = & lim_{ϵ \to 0} \frac{ϵ ψ (δ_{t, c}) - ϵ ψ (H)}{ϵ} \\ = & ψ (δ_{t, c}) - ψ (H) \\ = & (min {t, τ_{1}}, \dots, min {t, τ_{d}})^{T} - ψ (H) \end{aligned}

which falls into a compact subset of $R^{d}$ and thus equations (7) and (9) hold. The boundedness of $φ (t, c; ψ, H)$ suggests the robustness of the estimated RMST to outliers, in contrast to the usual belief that the mean is sensitive to outliers. In addition, we have that

\begin{aligned} R_{n} & = ψ (\hat{H}) - ψ (H) - \int \int φ (t, c; ψ, H) d \hat{H} (t, c) \\ & = ψ (\hat{H}) - ψ (H) - \int \int (min \{ t, τ_{1} \}, \dots, min \{ t, τ_{d} \})^{T} d \hat{S} (t) d \hat{G} (c) + ψ (H) \\ & = 0 \end{aligned}

implying that equation (8) holds. As a result, the asymptotic distribution in (10) holds for the vector of estimated RMSTs in (5).

Consider the i.i.d. observations $(X_{k 1}, Δ_{k 1}), \dots, (X_{k n_{k}}, Δ_{k n_{k}})$ for group $k$ ( $k = 1, 2$ ) with sample size $n_{k}$ . Let $S_{k} (t)$ and $G_{k} (c)$ denote the true survival functions for the event time and censoring time of group $k$ , respectively, and let $H_{k} (t, c) = S_{k} (t) G_{k} (c)$ . If $min {S_{1} (τ_{a}), S_{2} (τ_{a})} < 1$ and $max {H_{1} (τ_{a}, τ_{a}), H_{2} (τ_{a}, τ_{a})} > 0$ for all $a = 1, \dots, d$ , it is straightforward to show from (10) that as $min {n_{1}, n_{2}} \to \infty$ with $n_{1} / (n_{1} + n_{2}) \to κ \in (0, 1)$ :

{ψ ({\hat{H}}_{1}) - ψ ({\hat{H}}_{2})} - {ψ (H_{1}) - ψ (H_{2})} \overset{D}{⟶} MVN (0, \frac{V_{1, ψ}}{n_{1}} + \frac{V_{2, ψ}}{n_{2}})

(14)

where ${\hat{H}}_{k} (t, c) = {\hat{S}}_{k} (t) {\hat{G}}_{k} (c)$ for $k = 1, 2$ . As a result, we can construct an RMST-based Wald test statistic for the hypothesis in (1),

T_{ψ} = (ψ ({\hat{H}}_{1}) - ψ ({\hat{H}}_{2}))^{T} (\frac{{\hat{V}}_{1, ψ}}{n_{1}} + \frac{{\hat{V}}_{2, ψ}}{n_{2}})^{- 1} (ψ ({\hat{H}}_{1}) - ψ ({\hat{H}}_{2}))

(15)

where ${\hat{V}}_{k, ψ}$ is obtained by implementing the perturbation procedure and (13) in Section 2.2 on observations of group $k$ ( $k = 1, 2$ ). Based on (14), the Wald test statistic $T_{ψ}$ follows the $χ_{d}^{2}$ distribution asymptotically when the null hypothesis in (1) is true.
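Given the two RMST vectors and the two estimated covariance matrices, the statistic in (15) and its asymptotic $χ_{d}^{2}$ $p$ -value can be computed as in the sketch below. The helper names (`solve`, `chi2_sf`, `wald_test`) and the numeric inputs are illustrative assumptions; the $χ^{2}$ tail is evaluated with a plain series for the regularized incomplete gamma function to keep the example dependency-free, whereas a statistical library routine would normally be preferred.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def chi2_sf(x, df):
    """P(chi2_df > x); series form, so very small tail values lose precision."""
    if x <= 0.0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= y / (a + k)
        total += term
    lower = total * math.exp(-y + a * math.log(y) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def wald_test(psi1, psi2, v1, v2, n1, n2):
    """Return (T_psi, p-value) for the RMST difference vector, as in (15)."""
    d = len(psi1)
    diff = [p - q for p, q in zip(psi1, psi2)]
    sigma = [[v1[a][b] / n1 + v2[a][b] / n2 for b in range(d)] for a in range(d)]
    z = solve(sigma, diff)
    stat = sum(di * zi for di, zi in zip(diff, z))
    return stat, chi2_sf(stat, d)


# Hypothetical inputs with d = 2 time points.
v = [[1.0, 0.0], [0.0, 4.0]]
print(wald_test([3.0, 5.0], [2.0, 3.0], v, v, 2, 2))
```

Solving the linear system rather than forming the explicit inverse is a numerical convenience only; both give the same quadratic form.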

Remark 3

Although the RMST-based Wald test is developed under the simple random censorship model $H (t, c) = S (t) G (c)$ , it is also valid for the event-driven censoring model with staggered entry,³² where censoring times $C_{1}, \dots, C_{n}$ are shown to carry no information on the i.i.d. event times $T_{1}, \dots, T_{n}$ and thus on the RMSTs ${(\begin{matrix} μ (τ_{1}; S), & \dots, & μ (τ_{d}; S) \end{matrix})}^{T}$ . As a result, using KM estimators (3) and (4) for inference on the study time scale can be regarded as the composite marginal likelihood method³³ under the event-driven censoring model with staggered entry, where the perturbation procedure and (13) in Section 2.2 can accurately estimate the asymptotic covariance matrix of RMSTs and our RMST-based Wald test is thus valid.

2.4. Selection of time points

When the null hypothesis in (1) is not true, the RMST-based Wald test with a larger number of time points ( $d$ ) is asymptotically more powerful than that with a smaller $d$ . However, in practice where the number of observations is finite, a suitable $d$ should be chosen because the power of the RMST-based Wald test would increase but eventually decrease as $d$ continues to increase. More severely, the probability of the type I error would be inflated if $d$ is so large that the asymptotic null distribution $T_{ψ} \sim χ_{d}^{2}$ does not hold. This can be seen from Figure 2, where the size and power of the RMST-based Wald test are exhibited with different values of $d$ . Here, we fix the sample size $n = 200$ for both groups, the perturbation parameter $ϵ = 0.001$ , and the significance level $α = 0.05$ . We use the rejection proportions in $1000$ simulated datasets under the null case where the survival functions of two groups are the same and the non-null case in Figure 1 to empirically evaluate the size and power of the RMST-based Wald test, respectively. Generally speaking, the number of time points ( $d$ ) should be chosen based on clinical knowledge, the total sample size, censoring rate, available computational resources, etc. We recommend $d$ to be between $4$ and $6$ as a tradeoff between statistical power and computational cost.

Figure 2. The rejection proportions of the restricted mean survival time (RMST)-based Wald test under different numbers of time points ( $d = 2$ to 32) with sample size $n = 200$ for both groups and perturbation parameter $ϵ = 0.001$ . We consider the null case $H_{0}$ where the survival functions of two groups are the same and the non-null case $H_{1}$ as given in Figure 1.

In comparative studies, it is suggested that the time points ( $τ_{1}, \dots, τ_{d}$ ) should be pre-specified at the design stage based on prior clinical information.^19,21 If the time points $τ_{1}, \dots, τ_{d}$ must be selected in a data-driven way, which is usually the case in observational studies, we suggest using the equally spaced ${1 / d, 2 / d, \dots, 1}$ quantiles of all the observed event times ${X_{k i} : Δ_{k i} = 1, k = 1, 2}$ as the time points $τ_{1}, \dots, τ_{d}$ . The reason is that the estimated correlation between $μ (τ_{a}; \hat{S})$ and $μ (τ_{b}; \hat{S})$ ( $1 \leq a < b \leq d$ ) would be large if there are too few events observed in the time interval $[τ_{a}, τ_{b}]$ . Such high correlation would lead to strong collinearity in the estimated asymptotic covariance matrices ${\hat{V}}_{k, ψ}$ ( $k = 1, 2$ ) and thus incur power loss for the RMST-based Wald test.
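A data-driven choice of time points along these lines might look as follows. The sketch pools the observed event times of both groups and takes the $q / d$ empirical quantiles ( $q = 1, \dots, d$ ); the function name `quantile_time_points` and the inverse-empirical-CDF quantile rule are assumptions, since the text does not specify a particular quantile definition.

```python
def quantile_time_points(times1, events1, times2, events2, d):
    """Equally spaced quantiles (1/d, ..., 1) of the pooled observed event times."""
    pooled = sorted(
        x
        for x, e in zip(list(times1) + list(times2), list(events1) + list(events2))
        if e == 1
    )
    m = len(pooled)
    # Inverse empirical CDF: the ceil(q * m / d)-th smallest event time (1-based).
    return [pooled[(q * m + d - 1) // d - 1] for q in range(1, d + 1)]


# Hypothetical toy data: six events in total across the two groups.
taus = quantile_time_points(
    [1.0, 2.0, 3.0, 4.0], [1, 1, 0, 1],
    [5.0, 6.0, 7.0, 8.0], [1, 0, 1, 1],
    d=3,
)
print(taus)  # [2.0, 5.0, 8.0]
```

By construction the largest time point is the last observed event time, so it never exceeds the largest observed time.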

3. Simulations

3.1. Estimation accuracy of ${\hat{V}}_{ψ}$

As the estimated asymptotic covariance matrix ${\hat{V}}_{ψ}$ computed by the perturbation procedure and (13) plays a crucial role in the Wald test statistic $T_{ψ}$ , we check whether it is a valid estimator of the true asymptotic covariance matrix $V_{ψ}$ on $1000$ simulated datasets under different sample sizes $n = 100, 200, 500$ and $1000$ . In each dataset, the i.i.d. observations $(X_{1}, Δ_{1}), \dots, (X_{n}, Δ_{n})$ are generated, where $X_{i} = min (T_{i}, C_{i}, C_{0})$ and $Δ_{i} = I (T_{i} \leq min {C_{i}, C_{0}})$ with the event time $T_{i} \sim S (t) = \exp (- t / 2)$ , the individual censoring time $C_{i} \sim G (c) = \exp (- c / 4)$ and the administrative censoring time $C_{0} = 2$ ( $i = 1, \dots, n$ ). Let $ψ ({\hat{H}}^{(b)})$ and ${\hat{V}}_{ψ}^{(b)}$ denote the vector of estimated RMSTs and the estimated asymptotic covariance matrix in the $b$ th dataset ( $b = 1, \dots, 1000$ ). We measure the estimation accuracy of ${\hat{V}}_{ψ}$ by the average relative Frobenius norm ( $ARFN$ ) of ${\hat{V}}_{ψ}^{(1)}, \dots, {\hat{V}}_{ψ}^{(1000)}$ as follows:

ARFN = \frac{1}{1000} \sum_{b = 1}^{1000} \frac{‖ {\hat{V}}_{ψ}^{(b)} - {\bar{V}}_{ψ} ‖_{F}}{‖ {\bar{V}}_{ψ} ‖_{F}}

where $‖ \cdot ‖_{F}$ is the Frobenius norm of matrices, and

\begin{aligned} {\bar{V}}_{ψ} & = \frac{1}{1000} \sum_{b = 1}^{1000} n {ψ ({\hat{H}}^{(b)}) - \bar{ψ} (\hat{H})} {ψ ({\hat{H}}^{(b)}) - \bar{ψ} (\hat{H})}^{T} \\ \bar{ψ} (\hat{H}) & = \frac{1}{1000} \sum_{b = 1}^{1000} ψ ({\hat{H}}^{(b)}) \end{aligned}

(16)

Table 1 presents the $ARFN$ of ${\hat{V}}_{ψ}^{(1)}, \dots, {\hat{V}}_{ψ}^{(1000)}$ under different sample sizes ( $n = 100, 200, 500$ , and $1000$ ), different perturbation parameters ( $ϵ = 0.0001, 0.001$ , and $0.01$ ), three sets of time points ( ${2}$ , ${1, 2}$ and ${0.5, 1, 1.5, 2}$ ). It is clear that the choice of $ϵ$ does not have a substantial impact on the difference between ${\hat{V}}_{ψ}$ and the true covariance matrix. The estimation accuracy of ${\hat{V}}_{ψ}$ improves as the sample size increases, implying the effectiveness of the proposed procedure in estimating the asymptotic covariance matrix.
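The accuracy measure can be sketched directly from its definition and (16). In the code below, `v_hats` holds the per-dataset estimates ${\hat{V}}_{ψ}^{(b)}$ and `psi_hats` the per-dataset RMST vectors; these names, the function `arfn`, and the tiny inputs are illustrative assumptions, and the number of replications is taken from the input length rather than fixed at $1000$ .

```python
import math


def frobenius(mat):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(val * val for row in mat for val in row))


def arfn(v_hats, psi_hats, n):
    """Average relative Frobenius norm of v_hats around the Monte Carlo V_bar in (16)."""
    reps, d = len(psi_hats), len(psi_hats[0])
    psi_bar = [sum(p[a] for p in psi_hats) / reps for a in range(d)]
    v_bar = [
        [
            n * sum((p[a] - psi_bar[a]) * (p[b] - psi_bar[b]) for p in psi_hats) / reps
            for b in range(d)
        ]
        for a in range(d)
    ]
    norm_bar = frobenius(v_bar)
    total = 0.0
    for v in v_hats:
        diff = [[v[a][b] - v_bar[a][b] for b in range(d)] for a in range(d)]
        total += frobenius(diff) / norm_bar
    return total / reps


# Hypothetical two-replication example with a single time point (d = 1).
print(arfn([[[1.5]], [[0.5]]], [[0.0], [2.0]], n=1))  # 0.5
```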

Table 1.

Estimation accuracy (measured in ARFN for which the smaller the better) of the estimated asymptotic covariance matrix under different sample sizes ( $n$ ), different perturbation parameters ( $ϵ$ ) and different sets of time points.

	Time points	$n = 100$	$n = 200$	$n = 500$	$n = 1000$
Treating RMST as statistical functional of $H (t, c)$
$ϵ = 0.0001$	${2}$	0.0623	0.0488	0.0331	0.0206
	${1, 2}$	0.0740	0.0521	0.0380	0.0246
	${0.5, 1, 1.5, 2}$	0.0781	0.0566	0.0393	0.0256
$ϵ = 0.001$	${2}$	0.0621	0.0509	0.0341	0.0201
	${1, 2}$	0.0761	0.0546	0.0385	0.0258
	${0.5, 1, 1.5, 2}$	0.0808	0.0574	0.0394	0.0276
$ϵ = 0.01$	${2}$	0.0632	0.0515	0.0300	0.0199
	${1, 2}$	0.0767	0.0588	0.0354	0.0263
	${0.5, 1, 1.5, 2}$	0.0811	0.0597	0.0367	0.0278
Treating RMST as statistical functional of $S (t)$
$ϵ = 0.001$	${2}$	0.4750	0.4547	0.4618	0.4558
	${1, 2}$	0.4466	0.4260	0.4349	0.4268
	${0.5, 1, 1.5, 2}$	0.4159	0.3928	0.4056	0.3953


ARFN: average relative Frobenius norm; RMST: restricted mean survival time.

For comparison, we also evaluate the accuracy of the estimated asymptotic covariance matrix when the vector of RMSTs is treated as a statistical functional of $S (t)$ alone. From Tables 1 and 2, it is clear that treating the vector of RMSTs as a statistical functional of the survival function $S (t)$ would underestimate the asymptotic covariance matrix of the estimated RMSTs. We also examine whether the same pattern exists on $1000$ simulated datasets under the event-driven censorship with staggered entry and sample size $n = 200$ . For each dataset, the i.i.d. event times $T_{1}, \dots, T_{200} \sim S (t) = \exp (- t / 2)$ and entry times $R_{1}, \dots, R_{200} \sim Unif (0, 2)$ are generated. With a fixed censoring rate of $50 \%$ , we let the event-driven censoring time $C^{*}$ (on the calendar scale) be the $100$ th smallest value in ${T_{i} + R_{i}; i = 1, \dots, 200}$ and then observations $(X_{1}, Δ_{1}), \dots, (X_{200}, Δ_{200})$ are generated in the way that $X_{i} = min (T_{i}, C^{*} - R_{i})$ and $Δ_{i} = I (T_{i} \leq C^{*} - R_{i})$ ( $i = 1, \dots, 200$ ). From Table 2, it is clear that under the event-driven censorship with staggered entry, our perturbation procedure can still accurately estimate the asymptotic covariance matrix of the estimated RMSTs as stated in Remark 3, while treating the vector of RMSTs as a statistical functional of the survival function $S (t)$ alone would underestimate the asymptotic covariance matrix.
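The event-driven data-generating mechanism described above can be sketched as follows. The function name `simulate_event_driven`, the seed handling, and the use of Python's `random` module are assumptions made for illustration. The event indicator is computed on the calendar scale ( $T_{i} + R_{i} \leq C^{*}$ ), which is algebraically the same condition as $T_{i} \leq C^{*} - R_{i}$ but avoids a floating-point mismatch for the subject whose event defines $C^{*}$ .

```python
import random


def simulate_event_driven(n=200, n_events=100, seed=0):
    """One dataset under event-driven censoring with staggered entry."""
    rng = random.Random(seed)
    t = [rng.expovariate(0.5) for _ in range(n)]   # S(t) = exp(-t / 2), i.e. rate 1/2
    r = [rng.uniform(0.0, 2.0) for _ in range(n)]  # entry times on the calendar scale
    calendar = [ti + ri for ti, ri in zip(t, r)]   # calendar time of each event
    c_star = sorted(calendar)[n_events - 1]        # study stops at the n_events-th event
    # Follow-up on the study time scale; the sketch does not treat subjects
    # entering after c_star specially (their follow-up would be negative).
    x = [min(ti, c_star - ri) for ti, ri in zip(t, r)]
    delta = [1 if cal <= c_star else 0 for cal in calendar]
    return x, delta, c_star


x, delta, c_star = simulate_event_driven()
print(len(x), sum(delta))  # 200 observations, 100 of them events
```

Because the event times are continuous, exactly `n_events` subjects have their event on or before $C^{*}$ , which fixes the censoring rate at $50 \%$ for the default arguments.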

Table 2.

Average value of the estimated asymptotic covariance matrix under the random censorship or event-driven censorship with staggered entry, sample size $n = 200$ , the perturbation parameter $ϵ = 0.001$ and time points ${0.5, 1, 1.5, 2}$ when the vector of RMSTs is treated as a statistical functional of $H (t, c)$ or $S (t)$ .

	Random censorship	Event-driven censorship with staggered entry
${\bar{V}}_{ψ}$ in (16)	$(\begin{matrix} 0.018 & 0.039 & 0.055 & 0.067 \\ 0.039 & 0.112 & 0.179 & 0.230 \\ 0.055 & 0.179 & 0.327 & 0.448 \\ 0.067 & 0.230 & 0.448 & 0.665 \end{matrix})$	$(\begin{matrix} 0.017 & 0.038 & 0.055 & 0.067 \\ 0.038 & 0.110 & 0.171 & 0.219 \\ 0.055 & 0.171 & 0.300 & 0.406 \\ 0.067 & 0.219 & 0.406 & 0.598 \end{matrix})$
$\frac{1}{1000} \sum_{b = 1}^{1000} {\hat{V}}_{ψ}^{(b)}$ when RMST treated as statistical functional of $H (t, c)$	$(\begin{matrix} 0.017 & 0.039 & 0.056 & 0.069 \\ 0.039 & 0.116 & 0.184 & 0.236 \\ 0.056 & 0.184 & 0.330 & 0.452 \\ 0.069 & 0.236 & 0.452 & 0.666 \end{matrix})$	$(\begin{matrix} 0.016 & 0.036 & 0.052 & 0.063 \\ 0.036 & 0.104 & 0.163 & 0.209 \\ 0.052 & 0.163 & 0.288 & 0.392 \\ 0.063 & 0.209 & 0.392 & 0.581 \end{matrix})$
$\frac{1}{1000} \sum_{b = 1}^{1000} {\hat{V}}_{ψ}^{(b)}$ when RMST treated as statistical functional of $S (t)$	$(\begin{matrix} 0.015 & 0.031 & 0.042 & 0.048 \\ 0.031 & 0.085 & 0.124 & 0.146 \\ 0.042 & 0.124 & 0.200 & 0.245 \\ 0.048 & 0.146 & 0.245 & 0.315 \end{matrix})$	$(\begin{matrix} 0.015 & 0.032 & 0.044 & 0.051 \\ 0.032 & 0.089 & 0.132 & 0.158 \\ 0.044 & 0.132 & 0.215 & 0.271 \\ 0.051 & 0.158 & 0.271 & 0.358 \end{matrix})$


3.2. Power Comparisons

We conduct simulations to evaluate the performance of our RMST-based Wald test in various cases and compare it with the log-rank test and the RMST-based versatile test,²² which is more similar to the confidence band test. Although the versatile test also tests the null hypothesis in (1) based on RMSTs, Horiguchi et al.²² used a wild bootstrap procedure to approximate the null distribution of the test statistic

\max_{k \in \{1, \dots, d\}} \left| \frac{\sqrt{n_{1} + n_{0}} \{μ (τ_{k}; {\hat{S}}_{1}) - μ (τ_{k}; {\hat{S}}_{0})\}}{{\hat{σ}}_{k}} \right|

where ${\hat{σ}}_{k}$ is the estimated standard error of the numerator, and then computed the $p$ -value.

We first validate the size of our RMST-based Wald test in three null cases where the survival functions for two groups are the same. We then consider six different types of non-null cases to compare our method with the existing ones in distinguishing various types of differences between survival functions. The true survival functions $S_{1} (t)$ and $S_{2} (t)$ of nine cases are listed as follows.

- Null cases:
  - (i) $S_{1} (t) = S_{2} (t) = \exp (- t / 2)$ .
  - (ii) $S_{1} (t) = S_{2} (t)$ both from $Weibull (2, 0.5)$ .
  - (iii) $S_{1} (t) = S_{2} (t)$ both from $Weibull (2, 2)$ .
- Non-null cases:
  - (a) Proportional hazards: $S_{1} (t) = \exp (- t / 2)$ and $S_{2} (t) = \exp \{- (1 + λ) t / 2\}$ , where $λ = 0.1, \dots, 1$ .
  - (b) Crossing hazards: $S_{1} (t) = \exp (- t / 2)$ and $S_{2} (t) = \exp \{- (1 + λ) t / 2 + λ \max (t - 1, 0)\}$ , where $λ = 0.1, \dots, 1$ .
  - (c) Crossing survival functions: $S_{1} (t) = \exp (- t / 2)$ and $S_{2} (t) = λ (1 - t / e) + (1 - λ) \exp (- t / 2)$ , where $λ = 0.1, \dots, 1$ .
  - (d) Early difference: $S_{1} (t) = (1 + t)^{- 1}$ and $S_{2} (t) = 2 (1 + t)^{- 1} (2 + \min \{t, t_{0}\})^{- 1}$ , where $t_{0} = 0.1, \dots, 1$ .
  - (e) Middle difference: $S_{1} (t) = (1 + t)^{- 1}$ and $S_{2} (t) = 2 (1 + t)^{- 1} \{2 + \min (t, 1 + t_{0} / 2) - \min (t, 1 - t_{0} / 2)\}^{- 1}$ , where $t_{0} = 0.1, \dots, 1$ .
  - (f) Late difference: $S_{1} (t) = (1 + t)^{- 1}$ and $S_{2} (t) = 2 (1 + t)^{- 1} (2 + \max \{t + t_{0} - 2, 0\})^{- 1}$ , where $t_{0} = 0.1, \dots, 1$ .
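As a numerical check on the crossing-survival case (c), the RMST of each group can be obtained by integrating the true survival function over $[0, τ]$. The sketch below is illustrative only: the helper names `s1`, `s2_crossing`, and `rmst` are hypothetical, and the trapezoidal rule is an assumption chosen for simplicity rather than anything prescribed by the paper.

```python
import math


def s1(t):
    """Reference survival function S_1(t) = exp(-t / 2)."""
    return math.exp(-t / 2)


def s2_crossing(t, lam):
    """Case (c): S_2(t) = lam * (1 - t / e) + (1 - lam) * exp(-t / 2)."""
    return lam * (1 - t / math.e) + (1 - lam) * math.exp(-t / 2)


def rmst(surv, tau, steps=20000):
    """RMST up to tau: integral of surv over [0, tau] by the trapezoidal rule."""
    h = tau / steps
    total = 0.5 * (surv(0.0) + surv(tau))
    for i in range(1, steps):
        total += surv(i * h)
    return total * h


def rmst_gap(tau, lam):
    """Difference mu(tau; S_2) - mu(tau; S_1) for case (c)."""
    return rmst(lambda t: s2_crossing(t, lam), tau) - rmst(s1, tau)
```

With $λ = 1$, both RMSTs at $τ = 2$ equal $2 - 2 / e \approx 1.264$, so `rmst_gap(2.0, 1.0)` is zero up to integration error, whereas at $τ = 1$ the two RMSTs are $1 - 1 / (2 e) \approx 0.816$ and $2 (1 - e^{- 1 / 2}) \approx 0.787$. A test using only $τ = 2$ therefore has nothing to detect in this case, while a vector of time points does.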

The survival functions $S_{1} (t)$ and $S_{2} (t)$ with $λ = 0.5, 1$ under non-null cases (a) to (c) are shown in the top panel of Figure 3 and those with $t_{0} = 0.5, 1$ under non-null cases (d) to (f) are shown in the bottom panel. In non-null cases (a) to (c), the parameter $λ$ controls the size of the gap between $S_{1} (t)$ and $S_{2} (t)$ , where a larger value of $λ$ leads to a larger gap, and $S_{1} (t)$ equals $S_{2} (t)$ when $λ = 0$ . In non-null cases (d) to (f), the parameter $t_{0}$ is the length of the time period in $[0, 2]$ where the hazard functions of two groups are different, and $S_{1} (t)$ equals $S_{2} (t)$ when $t_{0} = 0$ .

For each null case, we simulate 1000 datasets to validate the size of hypothesis testing methods under different sample sizes $n = 100$ , $200$ , $500$ , and $1000$ for both groups. For each dataset, we simulate the i.i.d. observations $(X_{k 1}, Δ_{k 1}), \dots, (X_{k n}, Δ_{k n})$ for group $k$ ( $k = 1, 2$ ), where $X_{k i} = min (T_{k i}, C_{k i}, C_{0})$ and $Δ_{k i} = I (T_{k i} \leq min {C_{k i}, C_{0}})$ with the event time $T_{k i} \sim S_{k} (t)$ , the individual censoring time $C_{k i} \sim G (c) = \exp {- c / 4}$ and the administrative censoring time $C_{0} = 2$ for $i = 1, \dots, n$ . We apply the RMST-based Wald test (15) with three sets of time points ${2}$ , ${1, 2}$ and ${0.5, 1, 1.5, 2}$ , the perturbation parameter $ϵ = 0.001$ and the significance level $α = 0.05$ . We also conduct the log-rank test and the RMST-based versatile test²² with the same sets of time points for comparison. The rejection proportions of our test and the existing approaches under different null cases and sample sizes are presented in Table 3. The rejection proportions of our RMST-based Wald test (15) are all around the nominal level $5 %$ , implying that the size of our RMST-based Wald test can be maintained at the significance level when $H_{0}$ is true, regardless of how many and which time points are chosen.
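The simulation of one group and the plug-in RMST estimate can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `simulate_group` and `km_rmst` are hypothetical names, the RMST is computed as the area under the Kaplan-Meier step function on $[0, τ]$, and the curve is held flat after the last observed event.

```python
import random


def simulate_group(n, event_sampler, rng, cens_mean=4.0, admin=2.0):
    """Draw (X, Delta) pairs with X = min(T, C, C0) and Delta = I(T <= min(C, C0))."""
    data = []
    for _ in range(n):
        t = event_sampler(rng)                # event time T ~ S_k
        c = rng.expovariate(1.0 / cens_mean)  # individual censoring, G(c) = exp(-c / 4)
        bound = min(c, admin)                 # administrative censoring at C0 = 2
        data.append((min(t, bound), 1 if t <= bound else 0))
    return data


def km_rmst(data, tau):
    """Area under the Kaplan-Meier curve on [0, tau]."""
    event_times = sorted({x for x, d in data if d == 1 and x <= tau})
    surv, prev, area = 1.0, 0.0, 0.0
    for u in event_times:
        area += surv * (u - prev)             # rectangle up to the next jump
        at_risk = sum(1 for x, _ in data if x >= u)
        deaths = sum(1 for x, d in data if x == u and d == 1)
        surv *= 1.0 - deaths / at_risk        # Kaplan-Meier product-limit update
        prev = u
    return area + surv * (tau - prev)         # last rectangle up to tau
```

For null case (i), a group could be drawn with `simulate_group(200, lambda r: r.expovariate(0.5), random.Random(1))` and summarized at each time point with `km_rmst(data, tau)`. The quadratic-time risk-set count keeps the sketch short; a sorted single pass would be preferable for large $n$.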

Table 3.

The rejection proportions ( $%$ ) of the log-rank test, the RMST-based versatile test, and our RMST-based Wald test with different sets of time points, different null cases, and sample sizes ( $n$ ) over 1000 simulated datasets.

	Null case (i)				Null case (ii)				Null case (iii)
$n$	$100$	$200$	$500$	$1000$	$100$	$200$	$500$	$1000$	$100$	$200$	$500$	$1000$
Log-rank	5.1	4.4	4.5	5.5	5.2	4.2	2.9	5.2	4.9	3.4	3.7	5.1
${Versatile}_{{2}}$	5.7	5.1	4.3	5.0	5.5	5.0	3.1	5.7	4.7	3.9	3.9	5.5
${Versatile}_{{1, 2}}$	5.2	5.2	4.0	5.5	4.8	4.8	3.9	5.5	5.7	4.2	4.4	5.1
${Versatile}_{{0.5, 1, 1.5, 2}}$	5.5	5.0	4.4	5.8	4.7	4.2	3.9	5.5	5.3	4.8	5.0	4.6
${Wald}_{{2}}$	5.6	5.4	4.6	5.3	5.5	4.8	3.4	5.7	4.3	3.9	4.1	5.3
${Wald}_{{1, 2}}$	4.9	5.1	4.1	5.6	4.6	5.0	5.2	6.0	6.2	5.0	4.7	5.2
${Wald}_{{0.5, 1, 1.5, 2}}$	4.3	4.1	4.8	5.4	6.4	6.3	5.6	4.4	7.1	5.1	5.7	5.1


We perform the proposed RMST-based Wald test (15) on 1000 simulated datasets with a fixed value of $λ$ for non-null cases (a) to (c) and a fixed value of $t_{0}$ for non-null cases (d) to (f) under different sample sizes $n$ for both groups to examine how the power of our test varies as the sample size increases. The parameter $λ$ is set as $0.5$ under non-null cases (a) and (b) and $1$ under non-null case (c), while $t_{0}$ is fixed as $0.5$ under non-null cases (d) to (f). The rejection proportions of our omnibus Wald test, the log-rank test, and the versatile test under different non-null cases and $n$ are exhibited in Figure 4. The true RMSTs of both groups at time point 2 are the same for all values of $λ$ in the case (c) of crossing survival functions, making the rejection proportions (or the type I error rates) of our RMST-based Wald test and the RMST-based versatile test with a single time point $2$ equal to the significance level $0.05$ . Except for this scenario, the power of all the tests increases toward $1$ as sample size $n$ increases.

Figure 4. — The rejection proportions of the log-rank test, the restricted mean survival time (RMST)-based versatile test, and our RMST-based Wald test versus sample size $n$ under different non-null cases. (a) Proportional hazards $λ = 0.5$ ; (b) crossing hazards $λ = 0.5$ ; (c) crossing survival $λ = 1$ ; (d) early difference $t_{0} = 0.5$ ; (e) middle difference $t_{0} = 0.5$ ; and (f) late difference $t_{0} = 0.5$ .

To evaluate how the power of the RMST-based Wald test (15) varies with respect to different sizes of gaps between survival functions $S_{1} (t)$ and $S_{2} (t)$ , we test $H_{0} : S_{1} (t) = S_{2} (t)$ on 1000 simulated datasets with a fixed sample size $n$ for both groups under different values of $λ$ for non-null cases (a) to (c). The sample size $n$ is $200$ under non-null cases (a) and (b) and $1000$ under the non-null case (c). The rejection proportions of our test and the existing approaches under different non-null cases and $λ$ are presented in Figure 5(a) to (c). Under the case of PHs, it is clear that the log-rank test possesses the highest power, which is consistent with the fact that the log-rank test is the most powerful test under the PH assumption. However, both the power values of our RMST-based Wald test (15) and the RMST-based versatile test are close to that of the log-rank test, especially when we use a single time point $2$ . On the contrary, in the cases of crossing hazards and crossing survival functions where the PH assumption does not hold, our test dominates the log-rank test and the RMST-based versatile test significantly, especially when we use multiple time points. In addition, the performance of our RMST-based Wald test with multiple time points is stable across various cases, suggesting its robustness when multiple time points are included in the omnibus test.

Figure 5. — The rejection proportions of the log-rank test, the restricted mean survival time (RMST)-based versatile test, and our RMST-based Wald test versus $λ$ or $t_{0}$ under different non-null cases. (a) Proportional hazards $n = 200$ ; (b) crossing hazards $n = 200$ ; (c) crossing survival $n = 1000$ ; (d) early difference $n = 200$ ; (e) middle difference $n = 200$ ; and (f) late difference $n = 200$ .

We also study the power of the RMST-based Wald test in detecting early, middle, or late difference by testing $H_{0} : S_{1} (t) = S_{2} (t)$ on 1000 simulated datasets with a fixed sample size of $n = 200$ for both groups under different values of $t_{0}$ for non-null cases (d) to (f). The rejection proportions of our test and the existing methods under different cases and different values of $t_{0}$ are presented in Figure 5(d) to (f). Under the case of early difference, it is clear that the RMST-based versatile test possesses the highest power while the log-rank test yields the worst performance. Although our RMST-based Wald test is not the most powerful one, its power is comparable to the RMST-based versatile test. Under the cases of middle and late differences, our RMST-based Wald test with multiple time points is the most powerful one, suggesting its sensitivity to the difference between $S_{1} (t)$ and $S_{2} (t)$ regardless of the period where the difference exists. In contrast, the power of the RMST-based versatile test is even lower than that of the log-rank test under the cases of middle and late differences, suggesting that the RMST-based versatile test weighs more on early events. Therefore, our method is not inferior to the existing ones for detecting early survival differences and outperforms them in detecting middle and late survival differences. In summary, our RMST-based Wald test is much more stable and powerful across all cases at a minor cost of efficiency under some particular scenarios.

3.3. Robustness to censoring patterns

We also change the censoring distribution $G (c)$ to investigate how the RMST-based Wald test performs under different censoring patterns. We consider the null case (i) and non-null cases (a) and (c) with a fixed sample size $n = 200$ for both groups. The value of $λ$ is set to be $0.5$ and $1$ in non-null cases (a) and (c), respectively. We use three censoring distributions $G (c)$ to generate individual censoring times to obtain different censoring rates as follows:

- $G (c) = 1 - c / 3$ with $c < 3$ , to yield a censoring rate of $55 %$ ;
- $G (c) = \exp (- c / 4)$ to yield a censoring rate of $45 %$ ;
- $G (c)$ from $Weibull (2, 4)$ to yield a censoring rate of $35 %$ .

Table 4 summarizes the rejection proportions of our test at the significance level $α = 0.05$ with different sets of time points under different (null and non-null) cases and censoring rates. On the one hand, the size of the RMST-based Wald test is stable and centered around the significance level, implying that our test remains effective as the censoring rate increases. On the other hand, when $H_{0}$ is not true, the power of the RMST-based Wald test increases as the censoring rate decreases. In the PH situation, the power of the RMST-based Wald test with a single time point is the highest. When the PH assumption does not hold, it is better to use multiple time points in our RMST-based Wald test to capture the complex patterns of survival differences.

Table 4.

The rejection proportions ( $%$ ) of the restricted mean survival time (RMST)-based Wald test with different sets of time points in 1000 simulated datasets under different cases and censoring rates.

		Censoring rate
	Time points	$55 %$	$45 %$	$35 %$
Null case (i)	${2}$	5.4	5.4	5.3
	${1, 2}$	5.3	5.1	4.6
	${0.5, 1, 1.5, 2}$	5.2	4.1	3.5
Non-null case (a) (Proportional hazards)	${2}$	82.8	84.7	87.9
	${1, 2}$	74.1	78.1	81.7
	${0.5, 1, 1.5, 2}$	65.7	69.3	75.9
Non-null case (c) (Crossing survival functions)	${2}^{†}$	5.2	5.1	4.9
	${1, 2}$	29.7	34.8	34.1
	${0.5, 1, 1.5, 2}$	32.0	41.6	47.1


$†$ : In non-null case (c), the true RMSTs of both groups at time point 2 are the same, making the power (or the type I error rate) of the RMST-based Wald test with a single time point $2$ equal to the significance level $0.05$ .

4. Real data analysis

To illustrate the empirical performance of our RMST-based Wald test under real situations, we consider two data examples described as follows:

- Gastric cancer dataset^34,35: The gastric cancer dataset contains the records of $4069$ patients with advanced/recurrent gastric cancer from $20$ randomized trials, which studied whether chemotherapy could improve patients’ overall survival and progression-free survival. In our analysis, we used the data of the 12th trial with 279 observations (174 in the treatment group and 105 in the control group) to test whether the progression-free survival functions of the treatment group and the control group differed or not.
- Bone marrow transplant dataset³⁶: The bone marrow transplant dataset includes the time-to-event data of 101 patients with acute myelogenous leukemia in a study of bone marrow transplantation. In this dataset, $50$ patients received allogeneic bone marrow transplantation while the other $51$ patients received autologous transplantation. The event of interest in this study was death or relapse of leukemia, and the target of the study was to investigate the impact of transplant types on patients’ leukemia-free survival.

Figure 6 displays the KM estimators of the survival functions of two groups in two real data examples. In the gastric cancer dataset, it is clear that the estimated survival function of the treatment group stays higher than that of the control group for a long period of time till about 500 days, suggesting that chemotherapy is possibly effective in improving patients’ progression-free survival. However, due to the existence of several patients in the control group with extremely long progression-free survival, the estimated survival functions of the two groups cross at around $500$ days and the estimated survival rate of the control group dominates the treatment group afterward. The same phenomenon also appears in the bone marrow transplant dataset, where compared with the allogeneic group, the estimated survival rate of the autologous group is higher in the early period of follow-up but drops lower after around $15$ months.

In the two real data examples, we apply both our RMST-based Wald test (with the perturbation parameter $ϵ = 0.001$ ) and the existing RMST-based versatile test with the same set of $d = 6$ time points to compare their performances in detecting the differences between two survival functions. Here, the time points $τ_{1}, \dots, τ_{6}$ are selected as $1 / 6, 2 / 6, \dots, 1$ quantiles of all the observed event times ${X_{k i} : Δ_{k i} = 1, k = 1, 2}$ in each real dataset. We also apply the log-rank test as a baseline for comparison and the significance level is set at $0.05$ . For the gastric cancer dataset, both the Wald test ( $P$ -value = $0.045$ ) and the versatile test ( $P$ -value = $0.003$ ) results are significant while the log-rank test does not show statistical significance ( $P$ -value = $0.055$ ). The reason may be that the PH assumption is violated in the gastric cancer dataset with a crossing in the estimated survival functions. However, in the bone marrow transplant dataset where the estimated survival functions of the two groups also cross, our RMST-based Wald test reports significant differences in the survival functions ( $P$ -value = $0.042$ ), while both existing tests fail to show statistical significance ( $P$ -values: log-rank $0.537$ ; versatile test $0.211$ ). One possible reason for the significant result of our RMST-based Wald test is that our method is more sensitive to the late difference of survival functions as shown in Section 3.2. The differences in the inferential results for the gastric cancer dataset and the bone marrow transplant dataset demonstrate the statistical efficiency of our RMST-based Wald test over the existing RMST-based versatile test and log-rank test when the survival functions of two groups cross, which is consistent with the numerical results in Section 3.2.
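The quantile-based choice of time points can be sketched as follows. The function name `quantile_time_points` is hypothetical, and the quantile convention (the smallest observed event time with at least a fraction $j / d$ of the event times at or below it) is an assumption, since the text does not state which empirical quantile definition was used.

```python
def quantile_time_points(times, events, d=6):
    """Return tau_1, ..., tau_d as the j/d empirical quantiles of observed event times.

    times  : follow-up times X pooled over both groups
    events : indicators Delta (1 = event observed, 0 = censored)
    """
    observed = sorted(x for x, e in zip(times, events) if e == 1)
    m = len(observed)
    if m == 0:
        raise ValueError("no observed events")
    points = []
    for j in range(1, d + 1):
        # Integer ceiling of j * m / d, so no floating-point rounding is involved.
        rank = (j * m + d - 1) // d
        points.append(observed[rank - 1])
    return points
```

Under this convention the last time point $τ_{d}$ is always the largest observed event time, so the test covers the whole follow-up period in which events occur.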

5. Conclusions

In comparative clinical trials with time-to-event data, the log-rank test is commonly used to examine whether two survival functions are equal or not. Equivalent to the score test for the hypothesis that the hazard ratio is $1$ , the log-rank test is the most powerful test under the PH assumption, which, however, may not hold in practice. To develop a robust omnibus test for the null hypothesis in (1) with comparable power no matter whether the PH assumption is violated or not, we test the null in (1) by comparing a vector of RMSTs between different groups. Interpreting the vector of RMSTs at multiple time points as a statistical functional of the joint survival function of event and censoring times, we define the influence function of RMSTs. Based on the relationship between the influence function and the asymptotic distribution, we estimate the asymptotic covariance matrix of estimated RMSTs via a perturbation procedure and thus construct an RMST-based Wald test for (1), where the statistical inference can be made without intensive numerical resampling schemes (e.g. bootstrap or permutation methods). Extensive numerical studies and real data analysis are conducted to validate the size of the proposed RMST-based Wald test and compare its power with existing methods under different scenarios. It is shown that the proposed RMST-based Wald test with multiple time points possesses comparable power when the PH assumption holds and dominates the log-rank test if the PH assumption is violated. Compared with the RMST-based versatile test²² which is more similar to the confidence band test, the power of our RMST-based Wald test is more stable across various scenarios, suggesting its statistical robustness.

The RMST-based Wald test can also be extended to the null

H_{0} : S_{1} = \dots = S_{K}

(17)

of $K > 2$ groups. As the test statistic $T_{ψ}$ in (15) is based on the asymptotic distribution (10), under (17) we have for any $w = (w_{1}, \dots, w_{K})^{T}$ satisfying $\sum_{k = 1}^{K} w_{k} = 0$ and $\sum_{k = 1}^{K} w_{k}^{2} = 1$ ,

\sum_{k = 1}^{K} w_{k} ψ ({\hat{H}}_{k}) \overset{D}{⟶} MVN (0, \sum_{k = 1}^{K} \frac{w_{k}^{2} V_{ψ, k}}{n_{k}})

as $\min \{n_{1}, \dots, n_{K}\} \to \infty$ and $n_{k} / (n_{1} + \dots + n_{K}) \to κ_{k} \in (0, 1)$ for all $k \in \{1, \dots, K\}$ . As a result, it is possible to extend the RMST-based Wald test with the test statistic

T_{ψ, w} = \{\sum_{k = 1}^{K} w_{k} ψ ({\hat{H}}_{k})\}^{T} (\sum_{k = 1}^{K} \frac{w_{k}^{2} V_{ψ, k}}{n_{k}})^{- 1} \{\sum_{k = 1}^{K} w_{k} ψ ({\hat{H}}_{k})\}

which follows the $χ_{d}^{2}$ distribution asymptotically when (17) is true. Here, the projection vector $w$ controls the power of test with $T_{ψ, w}$ and the optimal $w$ can be estimated following the work of Liu et al.³⁷
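Given estimated RMST vectors, covariance matrices, and group sizes, the projected statistic $T_{ψ, w}$ is a single quadratic form. The sketch below is illustrative only: `solve` and `projected_wald` are hypothetical names, the inputs are assumed to have been estimated elsewhere, and a small Gaussian elimination routine stands in for a linear algebra library so that the example stays self-contained.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes a is square and non-singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def projected_wald(psi_hats, v_hats, sizes, w):
    """T = c^T (sum_k w_k^2 V_k / n_k)^{-1} c with c = sum_k w_k psi_k.

    psi_hats : K estimated RMST vectors, each of length d
    v_hats   : K estimated d x d asymptotic covariance matrices
    sizes    : K group sizes n_k
    w        : projection vector with sum(w) = 0 and sum(w_k^2) = 1
    """
    d = len(psi_hats[0])
    contrast = [sum(wk * psi[j] for wk, psi in zip(w, psi_hats))
                for j in range(d)]
    cov = [[sum(wk ** 2 * v[i][j] / nk
                for wk, v, nk in zip(w, v_hats, sizes))
            for j in range(d)] for i in range(d)]
    return sum(c * s for c, s in zip(contrast, solve(cov, contrast)))
```

For $K = 2$ and $w = (1 / \sqrt{2}, - 1 / \sqrt{2})^{T}$ the factors of $1 / 2$ cancel, and the statistic reduces to a two-sample Wald quadratic form in the difference of the two RMST vectors. The returned value would then be compared with a $χ_{d}^{2}$ quantile.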

By the definition of RMST in (2), it is clear that there exists a one-to-one correspondence between the survival function $S (t)$ and the RMST $μ (τ; S)$ such that

\begin{aligned} S (t) = \frac{\partial μ (τ; S)}{\partial τ} |_{τ = t} \end{aligned}

As a result, the RMST $μ (\cdot; S)$ can also be treated as an infinite-dimensional statistical functional of the joint survival function. As presented in Zhao et al.,¹⁸ the process

Z (τ) = \sqrt{n} \{μ (τ; \hat{S}) - μ (τ; S)\}, \quad τ \in [0, τ_{\max}]

converges weakly to a zero-mean Gaussian process. It is of interest to develop a test based on the process $Z (τ)$ as an infinite-dimensional extension of our RMST-based Wald test. However, there are several challenges.

- Given that the covariance function of the process $Z (τ)$ is an integral of the covariance function (2.2) in the work of Hall and Wellner,³⁹ the connection between $Z (τ)$ and standard stochastic processes (Brownian motion or Brownian bridge) is unclear. Existing asymptotic results of standard stochastic processes cannot be directly utilized.
- As illustrated in Section 2.4, the asymptotic $χ_{d}^{2}$ distribution of $T_{ψ}$ under $H_{0}$ requires the number of time points ( $d$ ) to be fixed and finite. Thus, our proposed inference procedure based on $χ_{d}^{2}$ is not directly applicable to developing a test based on $Z (τ)$ with infinite dimensions.
- Although resampling approaches are able to provide the exact type I error rate control, directly applying them to develop a test based on $Z (τ)$ may suffer substantial power loss. For example, similar to the construction of the confidence band for the KM estimator,^38–41 we construct a naive band-based test statistic $T_{ψ}$ based on the procedure detailed in Sections 2.2–2.3, where $τ_{1}, \dots, τ_{d}$ are the unique values of all the observed event times $\{X_{k i} : Δ_{k i} = 1, k = 1, 2\}$ . We apply a permutation test (by randomly reassigning observations into two groups) on this band-based test statistic under the settings of Section 2.4 (sample size $n = 200$ for both groups, the perturbation parameter $ϵ = 0.001$ , significance level $α = 0.05$ ). The average value of $d$ over 1000 repetitions is around 220. With the type I error rate ( $5.5 %$ ) well controlled in the null case (i), the power of the band-based test ( $13.0 %$ ) is significantly lower than the power of our Wald test ( $38.1 %$ ) with a fixed number of time points ( $d = 4$ ) under the non-null case (c) with $λ = 1$ .
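The permutation scheme mentioned in the last item (randomly reassigning observations to the two groups) can be sketched generically. The function `permutation_p_value` is hypothetical, and `statistic` is a placeholder for any two-sample statistic for which larger values indicate a larger difference, such as the band-based $T_{ψ}$; the number of permutations and the add-one $p$-value convention are assumptions, not details taken from the paper.

```python
import random


def permutation_p_value(group1, group2, statistic, n_perm=999, seed=0):
    """Permutation p-value for a two-sample statistic (larger = more extreme).

    group1, group2 : lists of observations, e.g. (X, Delta) pairs
    statistic      : callable taking two lists and returning a number
    """
    rng = random.Random(seed)
    observed = statistic(group1, group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # random reassignment of group labels
        if statistic(pooled[:n1], pooled[n1:]) >= observed:
            exceed += 1
    # Counting the observed split itself keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Each permutation recomputes the full statistic, so with around 220 time points per dataset the cost of the band-based version is dominated by repeated covariance estimation.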

As a result, the proposed construction mechanism of the Wald test statistic cannot be directly utilized for the infinite-dimensional process ${μ (τ; \hat{S}) - μ (τ; S)}$ and the construction of a powerful band-based test based on the RMST function $μ (\cdot; S)$ warrants further investigation.

Supplemental Material


Supplemental material, sj-zip-1-smm-10.1177_09622802231158735 for Omnibus test for restricted mean survival time based on influence function by Jiaqi Gu, Yiwei Fan and Guosheng Yin in Statistical Methods in Medical Research

Footnotes

Author’s Note: Guosheng Yin is also affiliated with Department of Mathematics, Imperial College London, London, UK.

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: This work was partially supported by the Research Grants Council of Hong Kong (17308321).

ORCID iDs: Jiaqi Gu https://orcid.org/0000-0002-8773-9558

Guosheng Yin https://orcid.org/0000-0003-3276-1392

Supplemental material: Supplementary material for this paper is available online.

References

1. Mantel N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep 1966; 50: 163–170.
2. Schoenfeld D. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 1981; 68: 316–319.
3. Fleming TR, Harrington DP. Counting processes and survival analysis. Hoboken, NJ: John Wiley & Sons, Inc., 2005.
4. Harrington DP, Fleming TR. A class of rank test procedures for censored survival data. Biometrika 1982; 69: 553–566.
5. Gehan EA. A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 1965; 52: 203.
6. Tarone RE, Ware J. On distribution-free tests for equality of survival distributions. Biometrika 1977; 64: 156–160.
7. Irwin JO. The standard error of an estimate of expectation of life, with special reference to expectation of tumourless life in experiments with mice. J Hyg 1949; 47: 188–189.
8. Karrison T. Restricted mean life with adjustment for covariates. J Am Stat Assoc 1987; 82: 1169–1176.
9. Zucker DM. Restricted mean life with covariates: modification and extension of a useful survival analysis method. J Am Stat Assoc 1998; 93: 702–709.
10. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958; 53: 457–481.
11. Johansen S. The product limit estimator as maximum likelihood estimator. Scand J Stat 1978; 5: 195–199.
12. Royston P, Parmar MKB. The use of restricted mean survival time to estimate the treatment effect in randomized clinical trials when the proportional hazards assumption is in doubt. Stat Med 2011; 30: 2409–2421.
13. Zhao L, Tian L, Uno H, et al. Utilizing the integrated difference of two survival functions to quantify the treatment contrast for designing, monitoring, and analyzing a comparative clinical study. Clin Trials 2012; 9: 570–577.
14. Uno H, Claggett B, Tian L, et al. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. J Clin Oncol 2014; 32: 2380–2385.
15. Uno H, Wittes J, Fu H, et al. Alternatives to hazard ratios for comparing the efficacy or safety of therapies in noninferiority studies. Ann Intern Med 2015; 163: 127.
16. Uno H, Tian L, Claggett B, et al. A versatile test for equality of two survival functions based on weighted differences of Kaplan–Meier curves. Stat Med 2015; 34: 3680–3695.
17. Pak K, Uno H, Kim DH, et al. Interpretability of cancer clinical trial results using restricted mean survival time as an alternative to the hazard ratio. JAMA Oncol 2017; 3: 1692.
18. Zhao L, Claggett B, Tian L, et al. On the restricted mean survival time curve in survival analysis. Biometrics 2015; 72: 215–221.
19. Tian L, Fu H, Ruberg SJ, et al. Efficiency of two sample tests via the restricted mean survival time for analyzing event time observations. Biometrics 2017; 74: 694–702.
20. Huang B, Kuan PF. Comparison of the restricted mean survival time with the hazard ratio in superiority trials with a time-to-event end point. Pharm Stat 2017; 17: 202–213.
21. Tian L, Jin H, Uno H, et al. On the empirical choice of the time window for restricted mean survival time. Biometrics 2020; 76: 1157–1166.
22. Horiguchi M, Cronin AM, Takeuchi M, et al. A flexible and coherent test/estimation procedure based on restricted mean survival times for censored time-to-event data in randomized clinical trials. Stat Med 2018; 37: 2307–2320.
23. Wolski A, Grafféo N, Giorgi R. A permutation test based on the restricted mean survival time for comparison of net survival distributions in non-proportional excess hazard settings. Stat Methods Med Res 2019; 29: 1612–1623.
24. Hampel FR. The influence curve and its role in robust estimation. J Am Stat Assoc 1974; 69: 383–393.
25. Huber PJ. Robust statistical procedures. Philadelphia: Society for Industrial and Applied Mathematics, 1977.
26. Murray S, Tsiatis AA. Sequential methods for comparing years of life saved in the two-sample censored data problem. Biometrics 1999; 55: 1085–1092.
27. Reid N. Influence functions for censored data. Ann Stat 1981; 9: 78–92.
28. Gill RD. Censoring and stochastic integrals. Stat Neerl 1980; 34: 124.
29. Andersen PK, Borgan Ø, Gill RD, et al. Statistical models based on counting processes. USA: Springer, 1993.
30. Von Mises R. On the asymptotic distribution of differentiable statistical functions. Ann Math Stat 1947; 18: 309–348.
31. Efron B, Stein C. The jackknife estimate of variance. Ann Stat 1981; 9: 586–596.
32. Rühl J, Beyersmann J, Friedrich S. General independent censoring in event-driven trials with staggered entry. Biometrics 2022; 1–12. DOI: 10.1111/biom.13710.
33. Chandler RE, Bate S. Inference for clustered data using the independence loglikelihood. Biometrika 2007; 94: 167–183.
34. Paoletti X, Oba K, Bang YJ, et al. Progression-free survival as a surrogate for overall survival in advanced/recurrent gastric cancer trials: A meta-analysis. J Natl Cancer Inst 2013; 105: 1667–1670.
35. Buyse M, Molenberghs G, Paoletti X, et al. Statistical evaluation of surrogate endpoints with examples from cancer clinical trials. Biom J 2015; 58: 104–132.
36. Klein JP, Moeschberger ML. Survival analysis: Techniques for censored and truncated data. New York: Springer, 2003.
37. Liu W, Yu X, Zhong W, et al. Projection test for mean vector in high dimensions. J Am Stat Assoc 2022: 1–13. DOI: 10.1080/01621459.2022.2142592.
38. Breslow N, Crowley J. A large sample study of the life table and product limit estimates under random censorship. Ann Stat 1974; 2: 437–453.
39. Hall WJ, Wellner JA. Confidence bands for a survival curve from censored data. Biometrika 1980; 67: 133–143.
40. Nair VN. Confidence bands for survival functions with censored data: A comparative study. Technometrics 1984; 26: 265–275.
41. Lin D, Fleming TR, Wei LJ. Confidence bands for survival curves under the proportional hazards model. Biometrika 1994; 81: 73–81.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

sj-zip-1-smm-10.1177_09622802231158735 - Supplemental material for Omnibus test for restricted mean survival time based on influence function

Click here for additional data file.^{(40KB, zip)}

