Sizing clinical trials when comparing bivariate time-to-event outcomes

Tomoyuki Sugimoto; Toshimitsu Hamasaki; Scott Evans; Takashi Sozu

doi:10.1002/sim.7225

. Author manuscript; available in PMC: 2018 Apr 30.

Published in final edited form as: Stat Med. 2017 Jan 24;36(9):1363–1382. doi: 10.1002/sim.7225

Sizing clinical trials when comparing bivariate time-to-event outcomes

Tomoyuki Sugimoto ^a, Toshimitsu Hamasaki ^b,^*,^†, Scott Evans ^c, Takashi Sozu ^d

PMCID: PMC5533151 NIHMSID: NIHMS877489 PMID: 28120524

Abstract

Clinical trials with multiple primary time-to-event outcomes are common. Use of multiple endpoints creates challenges in the evaluation of power and the calculation of sample size during trial design particularly for time-to-event outcomes. We present methods for calculating the power and sample size for randomized superiority clinical trials with two correlated time-to-event outcomes. We do this for independent and dependent censoring for three censoring scenarios: (i) the two events are non-fatal; (ii) when one event is fatal (semi-competing risk); and (iii) when both are fatal (competing risk). We derive the bivariate log-rank test in all three censoring scenarios and investigate the behavior of power and the required sample sizes. Separate evaluations are conducted for two inferential goals, evaluation of whether the test intervention is superior to the control on (1) all of the endpoints (multiple co-primary) or (2) at least one endpoint (multiple primary).

Keywords: dependent censoring, log-rank test, multiple endpoints, semi-competing risk, time-dependent association

1. Introduction

The use of two time-to-event outcomes as primary endpoints has become common in clinical trials evaluating interventions in many disease areas. For example, co-infection/comorbidity trials may utilize primary endpoints to evaluate multiple comorbities, for example, a trial evaluating therapies to treat Kaposi’s sarcoma in HIV-infected individuals may have the time to Kaposi’s sarcoma progression and the time to HIV virologic failure, as primary endpoints. In new anti-cancer drug trials, the most commonly used primary endpoint is overall survival (OS) defined as the time from randomization until death from any cause. However, OS in general requires long follow-up periods after disease progression leading to long and expensive studies. Therefore, in addition to OS, many clinical trials include progression-free survival (PFS) as a primary endpoint, defined as the time from randomization to the first of tumor progression or death. Meanwhile in trials aimed at evaluating treatments to reduce a specific type of mortality, the two endpoints of time to disease-specific mortality and all-cause mortality are primary endpoints. In the first example of co-infection/comorbidity trials, both events are non-fatal (the ‘both non-fatal case’ where neither event-time is censored by the other event). However, in the oncology example, one event is fatal (death) potentially censoring the other non-fatal event (the ‘one fatal case’). Here, we use the term ‘fatal’ to describe an event that censors future events of interest. This is referred as to ‘semi-competing risk problem’, first introduced by [1]. In the last example, both events are fatal as each event-time may be censored by the other event (the ‘both fatal case’).

We developed methods for sizing clinical trials with two primary time-to-event outcomes under a time-dependent correlation structure of three bivariate exponential distributions, where (i) both events are non-fatal [2,3]. In this paper, we discuss the log-rank test-based method using the normal approximation for power and sample size calculations in clinical trials with two time-to-event outcomes, to accommodate two additional situations: (ii) when one event is fatal and (iii) when both are fatal. We also evaluate composite endpoints as a strategy. Hence, six scenarios are evaluated as classified by the three censoring schemes (both non-fatal, one fatal, and both fatal) for the two time-to-event outcomes and whether a composite endpoint is used. Table I illustrates examples of the six scenarios.

Table I.

Classification of event types by the composite and non-composite examples (six scenarios)

Type

Non-composite examples

Composite examples

Both non-fatal events

HIV trial

Time to virologic failure
Time to discontinuation of treatment due to toxicity

Cardiovascular trial:

Time to revascularization
Time to major bleeding

HIV trial

Time to virologic failure or to discontinuation of treatment due to toxicity

One fatal event (one non-fatal event)

Oncology trial

Time to progression (TTP)
Time to all cause death (OS: overall survival)

Cardiovascular trial

Time to cardiovascular hospital admissions
Time to all cause death (OS)

Oncology trial

Time to progression or all cause death (PFS: progression free survival)
Time to all cause death (OS)

Cardiovascular trial

Time to myocardial infarction
Time to major adverse cardiac events (MACE) (acute myocardial infarction, ischaemic stroke, coronary arterial occlusion, death)

Both fatal events

Cardiovascular trial

Time to cardiac-specific death
Time to the other cause death

Cardiovascular trial

Time to cardiac-specific death
Time to all cause death (OS)

Open in a new tab

We consider two inferential goals for clinical trials with multiple endpoints: (1) ‘multiple co-primary endpoints’ (where the trial is designed to evaluate if the intervention is superior to the control on all of the endpoints) and (2) ‘multiple primary endpoints’ (or ‘alternative primary endpoints’, where the trial is designed to evaluate if the intervention is superior to the control on at least one endpoint [4–6]). When considering two or more endpoints as co-primary, no adjustment is needed to control the Type I error rate if the hypothesis associated with each endpoint is evaluated at the same significance level as that required for all of the objectives. However, the Type II error rate increases as the number of endpoints being evaluated increases. Thus, design adjustments are needed to maintain the overall power. In contrast, when designing the trial to evaluate an effect on at least one of the endpoints, an adjustment is needed to control the Type I error rate because the Type I error rate increases as the number of endpoints being evaluated increases.

The paper is structured as follows: in Section 2, we describe the dependence measure and censoring schemes, and then discuss the correlation structure of the bivariate log-rank statistic. In Sections 3 and 4, we provide the power for comparing two groups with respect to two time-to-event outcomes as co-primary or multiple primary and describe methods for calculating the sample size. We also investigate the behaviors of power and the required sample sizes with a real example. In Section 5, we summarize our findings.

2. Censoring schemes, dependency, and correlation

2.1. Notation and framework

Consider a randomized clinical trial designed to compare two interventions with total N participants being recruited and randomized. Suppose that r⁽²⁾N participants are assigned to the test intervention group and r⁽¹⁾N participants to the control intervention group (r⁽¹⁾ + r⁽²⁾ = 1). Patients are then followed to evaluate bivariate survival times for two endpoints. Let $T_{i k}^{*}$ and C_ik be the underlying continuous survival time and potential censoring time of the kth primary endpoint for the ith participant (k = 1, 2; i = 1, … , N). Assume C_i = C_i₁ = C_i₂, because C_i₁ and C_i₂ are usually the same time. Hence, we observe the bivariate time-to-event data of ${(T_{i 1}, T_{i 2}, Δ_{i 1}, Δ_{i 2}, g_{i})}_{i = 1}^{N}$ , where T_ik and Δ_ik are the ith observable survival time and right-censoring indicator for the kth primary endpoint, respectively, and g_i is the group index j (j = 2 if the ith participant belongs to the test, and j = 1 otherwise). For example, typically we see $T_{i k} = min (T_{i k}^{*}, C_{i k})$ and $Δ_{i k} = 𝟙 (T_{i k}^{*} \leq C_{i})$ (𝟙 (·) is the index function). The information of (T_ik, Δ_ik) is represented by the counting process 𝒩_ik(t) = 𝟙 (T_ik ≤ t, Δ_ik = 1) and the at-risk process 𝒴_ik(t) = 𝟙 (T_ik ≥ t).

Denote the marginal hazard function and its cumulative function for $T_{i j}^{*}$ in the group j by

λ_{k}^{(j)} (t) = lim_{d t \to 0} \frac{1}{d t} Pr (t \leq T_{i k}^{*} < t + d t ∣ t \leq T_{i k}^{*}, g_{i} = j) and Λ_{k}^{(j)} = \int_{0}^{t} λ_{k}^{(j)} (s) d s .

Let ψ_k(t) be the hazard ratio (HR) $λ_{k}^{(2)} (t) / λ_{k}^{(1)} (t)$ between the two intervention groups. To test the single hypothesis ‘H₀_k : ψ_k(t) = 1 for all t’ restricted to the kth endpoint (k = 1, 2), the standardized log-rank statistic

Z_{k}^{(0)} = U_{k} (τ) / \sqrt{{\hat{V}}_{k k}^{(0)} (τ)}

(1)

can be applied to the univariate data set of ${(T_{i k}, Δ_{i k}, g_{i})}_{i = 1}^{N}$ , where τ is the maximum observed follow-up time, U_k(t) is the log-rank process

U_{k} (t) = \sqrt{N} \int_{0}^{t} {\hat{H}}_{k} (s) {d {\hat{Λ}}_{k}^{(2)} (s) - d {\hat{Λ}}_{k}^{(1)} (s)}, k = 1, 2,

${\hat{Λ}}_{k}^{(j)} (t) = \int_{0}^{t} d N_{k}^{(j)} (s) / Y_{k}^{(j)} (s)$ is the Nelson–Aalen estimator of $Λ_{k}^{(j)} (t), {\hat{V}}_{k k}^{(0)} (t)$ is the conditional variance of U_k(t) under the null hypothesis H₀_k,

{\hat{V}}_{k k}^{(0)} (t) = N \int_{0}^{t} {\hat{H}}_{k} {(s)}^{2} {1 - \frac{d N_{k} (s) - 1}{Y_{k} (s) - 1}} {\frac{d {\hat{Λ}}_{k}^{(1)} (s)}{Y_{k}^{(2)} (s)} + \frac{d {\hat{Λ}}_{k}^{(2)} (s)}{Y_{k}^{(1)} (s)}},

where ${\hat{H}}_{k} (t) = N^{- 1} Y_{k}^{(1)} (t) Y_{k}^{(2)} (t) / Y_{k} (t), Y_{k} (t) = Y_{k}^{(1)} (t) + Y_{k}^{(2)} (t), N_{k} (t) = N_{k}^{(1)} (t) + N_{k}^{(2)} (t), Y_{k}^{(j)} (t) = \sum_{i = 1}^{N} 𝟙 (g_{i} = j) Y_{i k} (t)$ and $N_{k}^{(j)} (t) = \sum_{i = 1}^{N} 𝟙 (g_{i} = j) N_{i k} (t)$ are the at-risk and counting processes for the kth endpoint of individuals belonging to the group j.

2.2. Censoring schemes and dependence measures for two time-to-event outcomes

In the trial where the bivariate time-to-event data of ${(T_{i 1}, T_{i 2}, Δ_{i 1}, Δ_{i 2}, g_{i})}_{i = 1}^{N}$ is observed, various situations are considered. First, consider the censoring scheme. [3] discuss the simplest case where both events are non-fatal. We consider the two other situations, that is, where one event is fatal but another is non-fatal (one fatal), and when both events are fatal (both fatal). We also briefly describe the measure of dependence between the two time-to-event outcomes.

Examples of the three censoring scenarios

Both non-fatal outcomes: In HIV clinical trials, if $T_{i 1}^{*}$ is the time to infant HIV infection and $T_{i 2}^{*}$ is the time to infant Hepatitis B infection, neither event-time is censored by the other event. If the subjects do not experience both events, then the events are censored at the same time (e.g., by the end of the study or patient drop-out) in the end of follow-up period. So that we have $T_{i 1} = min (T_{i 1}^{*}, C_{i})$ and $T_{i 2} = min (T_{i 2}^{*}, C_{i})$ .
One fatal outcome: In oncology trials, if $T_{i 1}^{*}$ is the time-to-progression (TTP: defined as the time from randomization until objective tumor progression and does not include deaths) and $T_{i 2}^{*}$ is the OS (time to all cause death),one endpoint (TTP) is censored by the other endpoint (OS) being completely observed. So that, we have $T_{i 1} = min (T_{i 1}^{*}, T_{i 2}^{*}, C_{i})$ and $T_{i 2} = min (T_{i 2}^{*}, C_{i})$ . As T_i₂ is a competing risk for T_i₁ but T_i₁ is not for T_i₂ (T_i₁ ≤ T_i₂), this situation describes a semi-competing risk discussed by [1] (see [7] for further discussions). If there is a non-zero correlation between the endpoints (i.e., dependent censoring), the situation requires modifying the standard log-rank test to account for the dependent censoring. We may be able to avoid this problems by creating a composite endpoint.
Both fatal outcomes: In trials aimed at reducing a specific type of mortality, if $T_{i 1}^{*}$ is the time to disease-specific mortality and $T_{i 2}^{*}$ is the time to other-cause mortalities, each event may be censored by the other event. This situation is called a competing risk. Here, we have $T_{i 1} = min (T_{i 1}^{*}, T_{i 2}^{*}, C_{i})$ , and $T_{i 2} = min (T_{i 1}^{*}, T_{i 2}^{*}, C_{i})$ .

Definition for the composite endpoint and handling censoring

We provide the definition for the composite endpoint in the scenarios with two time-to-event outcomes. See Table I for examples of the composite endpoints, such as PFS composing (combining) the TTP and OS or major adverse cardiac events, where the difference between composing and not composing the endpoints is based on handling rule for censoring in each endpoint. That is, we can consider handling censoring in two ways in our time-to-event context. The first way is as usual to use $Δ_{i k} = 𝟙 (T_{i k} = T_{i k}^{*})$ , k = 1, 2, which is the non-composite setting in the context of this paper. The second one is to handle the censoring indicators by $Δ_{i 1} = 𝟙 (T_{i 1} = min (T_{i 1}^{*}, T_{i 2}^{*}))$ and $Δ_{i 2} = 𝟙 (T_{i 2} = T_{i 2}^{*})$ , which is the definition of the composite setting in this paper. For an example, consider the situation where $T_{i 1}^{*}$ and $T_{i 2}^{*}$ are the TTP and OS endpoints, respectively. Then, (T_i₁, Δ_i₁) is the ith observation for the TTP endpoint in the former handling censoring (the TTP outcome), while (T_i₁, Δ_i₁) is that for the PFS endpoint in the latter (the PFS outcome). Note that the difference between the TTP and PFS outcomes is brought by handling of censoring in this paper, but T_i₁ is the observable time with the same length and notation in both TTP and PFS endpoints. Alternatively, because one usually define the PFS endpoint as ${\tilde{T}}_{i 1}^{*} = min (T_{i 1}^{*}, T_{i 2}^{*})$ , the former handling of censoring for the composite endpoint ${\tilde{T}}_{i 1}^{*}$ leads to the PFS outcome. However, to avoid confusion in derivations, we discuss the case of composite setting based on the handling rule for censoring without notations to define the composite endpoint such as ${\tilde{T}}_{i 1}^{*}$ . Also, in this paper, the observations for the 1st endpoint, (T_i₁, Δ_i₁), are used in either non-composite or composite setting. On the other hand, those for the second endpoint, (T_i₂, Δ_i₂), are consistently considered as the non-composite setting.

Dependence measure

Two times $T_{i 1}^{*}$ and $T_{i 2}^{*}$ may be correlated. We consider a correlation structure between $T_{i 1}^{*}$ and $T_{i 2}^{*}$ . Let $S^{(j)} (t, s) = Pr (t < T_{i 1}^{*}, s < T_{i 2}^{*} ∣ g_{i} = j)$ and $S_{k}^{(j)} (t) = Pr (t < T_{i k}^{*} ∣ g_{i} = j) (k = 1, 2)$ (k = 1, 2) be the joint survival and marginal survival functions for the bivariate survival data ${(T_{i 1}^{*}, T_{i 2}^{*})}_{i = 1}^{n}$ in the group j, respectively. We consider the correlation between the two cumulative hazard variates [8] defined by

ρ^{(j)} = corr [Λ_{1}^{(j)} (T_{i 1}^{*}), Λ_{2}^{(j)} (T_{i 2}^{*})] = \int_{0}^{\infty} \int_{0}^{\infty} S^{(j)} (t, s) d Λ_{1}^{(j)} (t) d Λ_{2}^{(j)} (s) - 1.

If the marginal of the bivariate survival data are exponential, ρ⁽^j⁾ is the same as the correlation coefficient of the raw data ${(T_{i 1}^{*}, T_{i 2}^{*})}_{i = 1}^{n}$ [3]. In order to generate the joint survival function S⁽^j⁾(t, s) from the $S_{1}^{(j)} (t)$ and $S_{2}^{(j)} (s)$ , we prepare a copula function 𝒞 (·), which gives

S^{(j)} (t, s) = C (S_{1}^{(j)} (t), S_{2}^{(j)} (s); θ^{(j)}),

where the association parameter θ⁽^j⁾ included in 𝒞 (·) is a one-to-one function of ρ⁽^j⁾.

2.3. Bivariate structure of two log-rank test statistics

Consider applying the log-rank statistic (1) to the data ${(T_{i k}, Δ_{i k}, g_{i})}_{i = 1}^{N}$ for each endpoint (k = 1, 2) extracted from the bivariate time-to-event data ${(T_{i 1}, T_{i 2}, Δ_{i 1}, Δ_{i 2}, g_{i})}_{i = 1}^{N}$ obtained under the aforementioned situation. Then, the pair ${(Z_{1}^{(0)}, Z_{2}^{(0)})}^{T}$ of the standardized log-rank statistics is approximately bivariate normally distributed with mean vector $\sqrt{N} δ$ and variance–covariance matrix Σ when N is sufficiently large (see [3]), where

δ = (\begin{matrix} δ_{1} \\ δ_{2} \end{matrix}) = (\begin{matrix} \frac{μ_{1} (τ)}{\sqrt{V_{11}^{(0)} (τ)}} \\ \frac{μ_{2} (τ)}{\sqrt{V_{22}^{(0)} (τ)}} \end{matrix}) and \sum = (\begin{matrix} \frac{V_{11} (τ)}{V_{11}^{(0)} (τ)} & \frac{V_{12} (τ)}{\sqrt{V_{11}^{(0)} (τ) V_{22}^{(0)} (τ)}} \\ \frac{V_{12} (τ)}{\sqrt{V_{11}^{(0)} (τ) V_{22}^{(0)} (τ)}} & \frac{V_{22} (τ)}{V_{22}^{(0)} (τ)} \end{matrix}),

μ_k(t) and $V_{k k}^{(0)} (t)$ are asymptotic forms of N^−1/2U_k(t) and ${\hat{V}}_{k k}^{(0)} (t)$ , respectively, and V_kk(t), k = 1, 2 and V₁₂(t) are the asymptotic variances and covariance of U₁(t) and U₂(t), respectively. These elements can be written as, in all censoring scenarios with/without the composite setting,

\begin{array}{l} μ_{k} (τ) & = \int_{0}^{τ} H_{k} (t) {d {\tilde{Λ}}_{k}^{(2)} (t) - d {\tilde{Λ}}_{k}^{(1)} (t)}, k = 1, 2 \\ V_{k k}^{(0)} (τ) & = \int_{0}^{τ} H_{k} {(t)}^{2} {d {\tilde{Λ}}_{k}^{(1)} (t) / r^{(2)} y_{k}^{(2)} (t) + d {\tilde{Λ}}_{k}^{(2)} (t) / r^{(1)} y_{k}^{(1)} (t)}, k = 1, 2 \\ V_{k k} (τ) & = \int_{0}^{τ} H_{k} {(t)}^{2} {d {\tilde{Λ}}_{k}^{(1)} (t) / r^{(1)} y_{k}^{(1)} (t) + d {\tilde{Λ}}_{k}^{(2)} (t) / r^{(2)} y_{k}^{(2)} (t)}, k = 1, 2 \\ V_{12} (τ) & = \int_{0}^{τ} \int_{0}^{τ} H_{1} (t) H_{2} (s) G (t \lor s) {\frac{d A^{(1)} (t, s)}{r^{(1)} y_{1}^{(1)} (t) y_{2}^{(1)} (s)} + \frac{d A^{(2)} (t, s)}{r^{(2)} y_{1}^{(2)} (t) y_{2}^{(2)} (s)}}, \end{array}

(2)

where H_k(t) is the asymptotic form of Ĥ_k(t), $y_{k}^{(j)} (t) = E [Y_{i k} (t) ∣ g_{i} = j], d {\tilde{Λ}}_{k}^{(j)} (t) = E [d N_{i k} (t) ∣ Y_{i k} (t) = 1, g_{i} = j]$ ,

G (t \lor s) d A^{(j)} (t, s) = E [d M_{i 1} (t) d M_{i 2} (s) ∣ g_{i} = j], j = 1, 2

is a covariance function for a martingale variation of $d M_{i k} (t) = d N_{i k} (t) - Y_{i k} (t) d {\tilde{Λ}}_{k}^{(j)} (t)$ , G(t) is the survival function of censoring time C_i, and t ∨ s represents max(t, s). However, note that the forms of H_k(t), $y_{k}^{(j)} (t), d {\tilde{Λ}}_{k}^{(j)} (t)$ , and dA⁽^j⁾(t, s) are different among the censoring scenarios as provided thereafter. Some details of the derivations are in Appendix A. In advance, let Λ⁽^j⁾(t, s) = −log S⁽^j⁾(t, s) be the joint cumulative hazard function in group j. Using this notation, note that we can write $d Λ_{1}^{(j)} (t) = Λ^{(j)} (d t, 0)$ and $d Λ_{2}^{(j)} (s) = Λ^{(j)} (0, d s)$ .

(i)
Both non-fatal outcomes: In this situation, the setting of the censoring indicators is consistently $Δ_{i k} = 𝟙 (T_{i k}^{*} = T_{i k}) = 𝟙 (T_{i k}^{*} \leq C_{i})$ . The forms of H_k(t), $y_{k}^{(j)} (t), d {\tilde{Λ}}_{k}^{(j)} (t)$ and dA⁽^j⁾(t,s) are given by [3]. See Table II for the details. In particular, ${\tilde{Λ}}_{k}^{(j)} (t) = Λ_{k}^{(j)} (t)$ and
$d A^{(j)} (t, s) = S^{(j)} (d t, d s) + S^{(j)} (t, d s) d Λ_{1}^{(j)} (t) + S^{(j)} (d t, s) d Λ_{2}^{(j)} (s) + S^{(j)} (t, s) d Λ_{1}^{(j)} (t) d Λ_{2}^{(j)} (s) .$
(ii)
One fatal outcome: Assume that $T_{i 1}^{*}$ is the time to non-fatal event while $T_{i 2}^{*}$ is the time to fatal event, so that $T_{i 1} = min (T_{i 1}^{*}, T_{i 2}^{*}, C_{i})$ and $T_{i 2} = min (T_{i 2}^{*}, C_{i})$ . For example, in oncology trials, the TTP and OS endpoints are defined by non-fatal event time $T_{i 1}^{*}$ and fatal one $T_{i 2}^{*}$ , respectively, while the PFS endpoint is defined by $min (T_{i 1}^{*}, T_{i 2}^{*})$ as the composite. The log-rank test is applied to survival outcomes obtained using their endpoints. The applications to two sets of bivariate data defined by the TTP and OS endpoints and by the PFS and OS endpoints are discussed in the following non-composite and composite settings, respectively. See Appendix A for the details provided below.

Table II.

Elements of U_k(τ), $V_{k k}^{(0)} (τ)$ , V_kk(τ), and V₁₂(τ) for the three censoring scenarios under the non-composite setting. t ∨ s = max(t, s).

Elements

Both non-fatal case

One fatal case

Both fatal case

d {\tilde{Λ}}_{k}^{(j)} (t)

Λ⁽^j⁾(dt, t)

d {\tilde{Λ}}_{k}^{(j)} (t)

Λ ⁽^j⁾(t, dt)

y_{k}^{(j)} (t)

S⁽^j⁾(t, t)G(t)

S_{k}^{(j)} (t) G (t)

H_k(t)

G (t) \frac{r^{(1)} r^{(2)} S^{(1)} (t, t) S^{(2)} (t, t)}{r^{(1)} S^{(1)} (t, t) + r^{(2)} S^{(2)} (t, t)}

G (t) \frac{r^{(1)} r^{(2)} S_{k}^{(1)} (t) S_{k}^{(2)} (t)}{r^{(1)} S_{k}^{(1)} (t) + r^{(2)} S_{k}^{(2)} (t)}

dA⁽^j⁾(t, s)

\begin{array}{l} S^{(j)} (d t, d s) \\ + S^{(j)} (t, d s) d Λ_{1}^{(j)} (t) \\ + S^{(j)} (d t, s) d Λ_{2}^{(j)} (s) \\ + S^{(j)} (t, s) d Λ_{1}^{(j)} (t) d Λ_{2}^{(j)} (s) \end{array}

\begin{array}{l} 𝟙 (t \leq s) S^{(j)} (d t, d s) \\ + 𝟙 (t \leq s) S^{(j)} (t, d s) Λ^{(j)} (d t, t) \\ + S^{(j)} (d t, t \lor s) d Λ_{2}^{(j)} (s) \\ + S^{(j)} (t, t \lor s) Λ^{(j)} (d t, t) d Λ_{2}^{(j)} (s) \end{array}

Open in a new tab

Non-composite case

Consider the non-composite setting, that is, set $Δ_{i k} = 𝟙 (T_{i k}^{*} = T_{i k})$ , k = 1, 2 for the censoring indicators, where non-fatal $T_{i 1}^{*}$ is censored by observing fatal $T_{i 2}^{*}$ earlier than $T_{i 1}^{*}$ . The forms of H_k(t), $y_{k}^{(j)} (t)$ , and $d {\tilde{Λ}}_{k}^{(j)} (t)$ , j = 1, 2 are that, for the non-fatal endpoint,

H_{1} (t) = G (t) \frac{r^{(1)} r^{(2)} S^{(1)} (t, t) S^{(2)} (t, t)}{r^{(1)} S^{(1)} (t, t) + r^{(2)} S^{(2)} (t, t)}, y_{1}^{(j)} (t) = G (t) S^{(j)} (t, t) and d {\tilde{Λ}}_{1}^{(j)} (t) = Λ^{(j)} (d t, t),

depending on the correlation between $T_{i 1}^{*}$ and $T_{i 2}^{*}$ , while, for the fatal endpoint,

H_{2} (t) = G (t) \frac{r^{(1)} r^{(2)} S_{2}^{(1)} (t) S_{2}^{(2)} (t)}{r^{(1)} S_{2}^{(1)} (t) + r^{(2)} S_{2}^{(2)} (t)}, y_{2}^{(j)} (t) = G (t) S_{2}^{(j)} (t) and d {\tilde{Λ}}_{2}^{(j)} (t) = d Λ_{2}^{(j)} (t)

being the same as those of situation (i). Also, dA⁽^j⁾(t, s) related to the covariance of the martingale variation is

\begin{array}{l} d A^{(j)} (t, s) = 𝟙 (t \leq s) S^{(j)} (d t, d s) + 𝟙 (t \leq s) S^{(j)} (t, d s) Λ^{(j)} (d t, t) \\ + S^{(j)} (d t, t \lor s) d Λ_{2}^{(j)} (s) + S^{(j)} (t, t \lor s) Λ^{(j)} (d t, t) d Λ_{2}^{(j)} (s) . \end{array}

(3a)

Composite case

In the composite setting, set $Δ_{i 1} = 𝟙 (T_{i 1} = min (T_{i 1}^{*}, T_{i 2}^{*}))$ and $Δ_{i 2} = 𝟙 (T_{i 2} = T_{i 2}^{*})$ for the censoring indicators. The forms of H_k(t), $y_{k}^{(j)} (t)$ and $d {\tilde{Λ}}_{2}^{(j)} (t)$ are the same as those in the non-composite case, while $d {\tilde{Λ}}_{1}^{(j)} (t)$ and dA⁽^j⁾(t, s) are changed from the non-composite case, where the intensity information on the second endpoint (fatal event) is added into them (k = 1, 2, j = 1, 2). That is, in this case, we have

d {\tilde{Λ}}_{1}^{(j)} (t) = Λ^{(j)} (d t, t) + Λ^{(j)} (t, d t)

and

\begin{array}{l} d A^{(j)} (t, s) = 𝟙 (t \leq s) S^{(j)} (d t, d s) - 𝟙 (t = s) S^{(j)} (t, d s) \\ + 𝟙 (t \leq s) S^{(j)} (t, d s) {Λ^{(j)} (d t, t) + Λ^{(j)} (t, d t)} \\ + {S^{(j)} (d t, t \lor s) + 𝟙 (t \leq s) S^{(j)} (d t, t)} d Λ_{2}^{(j)} (s) \\ + S^{(j)} (t, t \lor s) {Λ^{(j)} (d t, t) + Λ^{(j)} (t, d t)} d Λ_{2}^{(j)} (s) . \end{array}

(3b)

(iii)
Both fatal outcomes: Both $T_{i 1}^{*}$ and $T_{i 2}^{*}$ are the times to fatal events, so that we observe $T_{i 1} = min (T_{i 1}^{*}, T_{i 2}^{*}, C_{i})$ and $T_{i 2} = min (T_{i 1}^{*}, T_{i 2}^{*}, C_{i})$ . See Appendix A for the details provided in the subsequent section.

Non-composite case

Assume the non-composite setting, that is, $Δ_{i k} = 𝟙 (T_{i k}^{*} = T_{i k})$ , k = 1, 2. The forms of H_k(t), $y_{k}^{(j)} (t)$ , and $d {\tilde{Λ}}_{k}^{(j)} (t)$ , k = 1, 2, j = 1, 2 are similar to those for the 2nd endpoint in situation (ii), because, if either of T_i₁ or T_i₂ is completely observed, the other is always censored. Inthis case, we can derive

\begin{array}{l} H_{k} (t) & = G (t) \frac{r^{(1)} r^{(2)} S^{(1)} (t, t) S^{(2)} (t, t)}{r^{(1)} S^{(1)} (t, t) + r^{(2)} S^{(2)} (t, t)}, y_{k}^{(j)} (t) = G (t) S^{(j)} (t, t), k = 1, 2 \\ d {\tilde{Λ}}_{1}^{(j)} (t) & = Λ^{(j)} (d t, t), d {\tilde{Λ}}_{2}^{(j)} (t) = Λ^{(j)} (t, d t) \end{array}

and

d A^{(j)} (t, s) = 0.

(4a)

Composite case

Here, set $Δ_{i 1} = 𝟙 (T_{i 1} = min (T_{i 1}^{*}, T_{i 2}^{*}))$ and $Δ_{i 2} = 𝟙 (T_{i 2} = T_{i 2}^{*})$ for the composite setting. In this case, only $d {\tilde{Λ}}_{1}^{(j)} (t)$ and dA⁽^j⁾(t, s), j = 1, 2 are changed from the non-composite version. Hence, we have

d {\tilde{Λ}}_{1}^{(j)} (t) = Λ^{(j)} (d t, t) + Λ^{(j)} (t, d t)

and

\begin{array}{l} d A^{(j)} (t, s) = - 𝟙 (t = s) S^{(j)} (t \lor s, d s) + 𝟙 (t \leq s) S^{(j)} (s, d s) Λ^{(j)} (t, d t) \\ + 𝟙 (t \geq s) S^{(j)} (t, d t) Λ^{(j)} (s, d s) + S^{(j)} (t \lor s, t \lor s) Λ^{(j)} (t, d t) Λ^{(j)} (s, d s) . \end{array}

(4b)

The forms of H_k(t), $y_{k}^{(j)} (t)$ , and $d {\tilde{Λ}}_{2}^{(j)} (t)$ , k, j = 1, 2 are the same as those in the non-composite setting.

Table II summarizes the statistics regarding $y_{k}^{(j)} (t), d {\tilde{Λ}}_{k}^{(j)} (t)$ , H_k(t), and dA⁽^j⁾(t, s) among the three censoring scenarios under the non-composite setting. The table clearly shows how these statistics change among the three scenarios. For all scenarios, the power is given by

1 - β = Pr [(Z_{1}^{(0)}, Z_{2}^{(0)}) \in Z^{(0)}] \approx \int \int_{(z_{1}, z_{2}) \in Z^{(0)}} ϕ (z_{1}, z_{2}; \sqrt{N} δ, \sum) d z_{1} d z_{2}

(5)

because the hypothesis is rejected if the bivariate statistic ${(Z_{1}^{(0)}, Z_{2}^{(0)})}^{T}$ takes the realized values on the area 𝒵⁽⁰⁾, where $ϕ (z_{1}, z_{2}; \sqrt{N} δ, \sum)$ is the bivariate normal density with mean vector $\sqrt{N} δ$ and variance–covariance matrix Σ.

3. Co-primary endpoints

3.1. Hypothesis testing, power, and sample sizes

We are interested in testing the hypotheses on the HRs to evaluate a joint reduction of occurrence of events over time on both outcomes, that is, $H_{0}^{cp} = {H_{01} : ψ_{1} (t) = 1 or H_{02} : ψ_{2} (t) = 1 for all t}$ versus $H_{1}^{cp} = {H_{11} : ψ_{1} (t) < 1 and H_{12} : ψ_{2} (t) < 1 at some t}$ . In all of the censoring scenarios, using the two log-rank statistics, $Z_{k}^{(0)} = U_{k} (τ) / \sqrt{{\hat{V}}_{k k}^{(0)} (τ)}$ , k = 1, 2 for this hypothesis testing, we are able to

reject the null hypothesis H_{0}^{cp} if and only if Z_{1}^{(0)} < - z_{α} and Z_{2}^{(0)} < - z_{α}

(6)

at the prespecified significance level of α, where z_α is the 100(1 − α) percentile of N(0, 1). The overall Type I error associated with the null hypothesis $H_{0}^{cp}$ is controlled by the maximum of the marginal Type I errors [5]. This means that, to investigate whether the overall Type I error is larger than the nominal level, it is enough to investigate whether the marginal ones are larger than the nominal level [2]. The behaviors of the Type I error for the univariate log-rank test are well known [9–11]. The one-sided test procedure (6) based on the asymptotic normality may inflate under some situations such as small sample size and/or unbalanced design (in particular, with r⁽¹⁾ > 0.5). We can improve the precision by correcting the critical value based on the sample size N and the allocation rate r⁽¹⁾ to control the marginal Type I errors. However, the overall Type I error on $H_{0}^{cp}$ is, for example, the product of marginal ones for independent endpoints, and usually quite smaller than the maximum of marginal ones as long as ρ⁽^j⁾ is not so high, so that the inflation problem seen in one-sided log-rank test may be moderately reduced in multiple co-primary problem.

In the procedure (6), the rejection region of $H_{0}^{cp}$ is { $(Z_{1}^{(0)}, Z_{2}^{(0)}) : Z_{1}^{(0)} < - z_{α}$ and $Z_{2}^{(0)} < - z_{α}$ }. In all of the scenarios, therefore, we have the power function for the joint reduction in both time-to-event outcomes given as

1 - β = Pr [\cap_{k = 1}^{2} {Z_{k}^{(0)} < - z_{α}} ∣ H_{1}^{cp}]

This overall power is referred to as ‘complete power’ [12] or ‘conjunctive power’ [13], which is simply calculated using the cumulative distribution function of the bivariate normal distribution. The power can be approximately calculated (under large samples) by

1 - β \approx \int_{- \infty}^{z_{α}} \int_{- \infty}^{- z_{α}} ϕ (z_{1}, z_{2}; \sqrt{N} δ, \sum) d z_{1} d z_{2}

(7)

Let N_cp be the minimum of total sample size N required for testing $H_{0}^{cp}$ against $H_{1}^{cp}$ . Thus, N_cp is the smallest integer not less than satisfying the power (7) and given by

N_{cp} = ⌈ {(K_{β} + z_{α} σ_{2}^{(0)} / σ_{2})}^{2} / δ_{2}^{2} ⌉

(8)

where ⌈x⌉ is the smallest integer not less than x, $σ_{k}^{(0)}$ and σ_k(k = 1, 2) are the standard deviations $\sqrt{V_{k k}^{(0)} (τ)}$ and $\sqrt{V_{k k} (τ)}$ , respectively, K_β is the solution of the integral equation

1 - β = \int_{- \infty}^{\frac{δ_{1}}{δ_{2}} K_{β} + z_{α} {\frac{δ_{1}}{δ_{2}} \frac{σ_{2}^{(0)}}{σ_{2}} - \frac{σ_{1}^{(0)}}{σ_{1}}}} \int_{- \infty}^{K_{β}} ϕ (z_{1}, z_{2}; 0, R) d z_{1} d z_{2}

and R is the following correlation matrix

R = (\begin{matrix} 1 & V_{12} (τ) / \sqrt{V_{11} (τ) V_{22} (τ)} \\ V_{12} (τ) / \sqrt{V_{11} (τ) V_{22} (τ)} & 1 \end{matrix}) .

A grid search to find a value of N_cp often requires considerable computing time. The search proceeds by gradually increasing N_cp until the power (7) exceeds the desired power. Alternative methods to reduce the computational time are the Newton–Raphson algorithm for finding K_β in [14] or the basic linear interpolation algorithm in [2].

One may expect to calculate the required number of participants from that of events such as the Reference [15] procedure in univariate data. However, we encounter difficulty in finding such a procedure in the bivariate case [3]. One of the causes is that the relationship between numbers of events and participants is more complicated than the univariate case, because the patterns of observations (censored or not) increase twofold. Another is that it is difficult to let the difference between the correlation models (such as early or late dependency) on a restricted time-interval reflect using both numbers of events without censoring, because both numbers of events are usually considered under the uncensored model (or after long enough follow-up). Even so, the required numbers of events are useful to monitor a trial and they can be obtained using D_𝚤𝚥 = N_cp × P_𝚤𝚥, where P_𝚤𝚥 is the proportion of each observation pattern, $P_{𝚤 𝚥} = Pr (Δ_{i}^{(1)} = 𝚤, Δ_{i}^{(2)} = 𝚥)$ , D_𝚤𝚥 is the expected number of events in each case where the first endpoint is observed (𝚤 = 1) or censored (𝚤 = 0) and the second is observed (𝚥 = 1) or not (𝚥 = 0). See Appendix B for details for calculation of P_𝚤𝚥 under the three censoring schemes.

3.2. Behavior of the sample size

We investigate the behavior of the sample size and power for detecting the joint reduction $H_{1}^{cp}$ in bivariate time-to-event data under the three censoring schemes with the two time-dependent association structures, where the time-dependent association structures are asymmetric late (tail) dependency generated by the Clayton copula [16] and early (tail) dependency by the Gumbel copula [17], which have been widely used in practice. We generate bivariate time-to-event data by supposing that the marginals of $T_{i 1}^{*}$ and $T_{i 2}^{*}$ are exponential, and that C_i = U(0, τ_a) + τ_f (hence, τ = τ_a + τ_f), where τ_a and τ_f are the lengths of the entry period to the trial and follow-up period, respectively, and U(0, τ_a) denotes a uniform random number on (0, τ_a). The target power 1 − β = 0.8, the significance level α = 0.025, τ_a = 2, and τ_f = 3 are used, and all empirical powers are computed by Monte Carlo trials with 100,000 replications throughout.

Let N_sim be the simulation-based sample size required for testing $H_{0}^{cp}$ , and let N_k denote the minimum of total sample size required to test the single hypothesis H₀_k for the kth endpoint. To avoid confusion in N_cp and N₁, we write them as $N_{cp}^{(c)}$ and $N_{1}^{(c)}$ if the first endpoint is replaced by the composite of the first and the second ones (composite setting), and as $N_{cp}^{(nc)}$ and $N_{1}^{(nc)}$ otherwise (non-composite setting). Let P̃_cp denote the empirical power (%) for detecting $H_{1}^{cp}$ when the total number of participants is N_cp designed using the formula (8). See Section B.1 of Supporting Information for results other than ones provided later.

Formula performance and sample size behavior: one fatal case

Focusing on the one fatal case scenario, suppose that $T_{i 1}^{*}$ and $T_{i 2}^{*}$ are the TTP and OS endpoints, respectively. We evaluate the performance of the sample size formula (8) for practicality by comparing it with alternative solutions sizing based on univariate versions (N_TTP, N_PFS, and N_OS) and simulation (N_sim) under the common α and β, where $N_{TTP} (= N_{1}^{(nc)}), N_{PFS} (= N_{1}^{(c)})$ , and N_OS (= N₂) are N_k when the kth endpoint is the TTP, PFS, and OS. In the one fatal case, the bivariate survival data (T_i₁, Δ_i₁) and (T_i₂, Δ_i₂) are of the TTP and OS outcomes under the non-composite setting, but they are of the PFS and OS outcomes under the composite setting.

Table III displays the required total sample sizes N_cp, N_sim, N₁, and N₂ with the empirical power P̃_cp under $ψ_{TTP}^{- 1} = 1.7, S_{TTP}^{(1)} (τ) = 0.3$ and $S_{OS}^{(1)} (τ) = 0.4$ , when $ψ_{OS}^{- 1}$ and ρ⁽^k⁾ vary for the combinations of $ψ_{OS}^{- 1} = 1.3$ , 1.5, 1.7 and ρ⁽¹⁾ = ρ⁽²⁾ = 0, 0.3, 0.5 and 0.8. Note that ψ_TTP (= ψ₁) and ψ_OS (=ψ₂) are the HRs for the TTP and OS, respectively. Similarly, $S_{TTP}^{(1)} (τ) (= S_{1}^{(1)} (τ))$ and $S_{OS}^{(1)} (τ) (= S_{2}^{(1)} (τ))$ are the τ-time survival rates for the TTP and OS.

Table III.

The case of one fatal outcome (ii) (semi-competing risk): total numbers of participants N_cp calculated from (8), the corresponding empirical powers P̃_cp (%) and alternative sizing solutions N_sim, N_TTP and N_OS under $ψ_{TTP}^{- 1} = 1.7, S_{TTP}^{(1)} (τ) = 0.3$ and $S_{OS}^{(1)} (τ) = 0.4$ .

Dependence

Composite setting

Non-composite setting

Structure

ψ_{OS}^{- 1}

ρ⁽^k⁾

N_sim

N_{cp}^{(c)}

P̃_cp

N_PFS

N_OS

N_sim

N_{cp}^{(nc)}

P̃_cp

N_TTP

N_OS

Late (Clayton copula)

1.3

0.0

968

972

80.1

262

972

966

972

80.1

288

972

1.3

0.3

968

972

80.2

312

972

974

80.2

314

972

1.3

0.5

968

972

80.2

348

972

970

976

80.3

328

972

1.3

0.8

968

972

79.8

418

972

966

972

80.3

324

972

1.5

0.0

430

436

80.5

196

432

474

480

80.9

284

432

1.5

0.3

436

440

80.6

232

432

492

496

80.9

320

432

1.5

0.5

440

444

80.6

260

432

500

506

80.8

342

432

1.5

0.8

450

456

80.5

316

432

504

510

80.9

362

432

1.7

0.0

272

278

81.2

160

266

354

360

81.3

280

266

1.7

0.3

280

286

80.9

190

266

380

388

81.0

324

266

1.7

0.5

288

294

81.0

214

266

402

408

81.1

354

266

1.7

0.8

308

314

80.9

262

266

426

432

80.9

394

266

Early (Gumbel copula)

1.3

0.0

968

972

80.2

262

972

968

972

80.1

288

972

1.3

0.3

968

972

80.3

282

972

966

972

80.2

262

972

1.3

0.5

968

972

80.3

294

972

970

972

80.1

232

972

1.3

0.8

970

972

80.1

306

972

968

972

80.2

158

972

1.5

0.0

432

436

80.4

196

432

474

480

80.9

284

432

1.5

0.3

430

434

80.5

212

432

464

472

80.7

278

432

1.5

0.5

430

434

80.5

224

432

456

462

80.5

268

432

1.5

0.8

428

432

80.3

238

432

438

80.8

220

432

1.7

0.0

272

278

81.0

160

266

352

360

81.2

280

266

1.7

0.3

274

278

80.9

176

266

356

364

81.2

294

266

1.7

0.5

274

278

80.8

186

266

358

362

81.0

300

266

1.7

0.8

274

278

80.8

206

266

336

342

80.9

290

266

Open in a new tab

The sample sizes N_cp calculated from the formula (8) usually provide slightly conservative results compared with N_sim, and the corresponding empirical powers P̃_cp are preferable, that is, P̃_cp are slightly larger than the target powers (although P̃_cp tends to be further away from the target as the HR $ψ_{2}^{- 1}$ increases larger than 1). Times to compute N_sim are usually much longer than ones for N_cp, which becomes greater as the effect size is smaller. Hence, the formula (8) reduces the cost greatly, regardless of the effect size, and it is also useful as an initial value to search N_sim. Also, $N_{cp}^{(c)}$ (N_cp under the composite setting) is smaller than $N_{cp}^{(nc)}$ (N_cp under the non-composite setting) in all cases. As the absolute value of $log ψ_{OS}^{- 1}$ decreases below $∣ log ψ_{TTP}^{- 1} ∣$ , the value of N_cp approaches N_OS and diverges from N_PFS and N_TTP. If both HRs are approximately equal, $ψ_{TTP}^{- 1} \approx ψ_{OS}^{- 1}$ , then N_cp is slightly larger than the max(N₁, N₂) (the ratios of N_cp/max(N₁, N₂) are at most about 1.29 in Table III). Further, when comparing late dependency (Clayton copula) and early one (Gumbel copula), if $ψ_{TTP}^{- 1}$ is relatively close to $ψ_{OS}^{- 1}, N_{cp}^{(nc)}$ increases proportionally to ρ⁽^j⁾ in the late dependency, while $N_{cp}^{(nc)}$ decreases or does not change as ρ⁽^j⁾ varies in the early dependency. Similar tendencies but more moderate variation is observed for $N_{cp}^{(c)}$ . That is, a high value of ρ⁽^j⁾ makes the two log-rank statistics correlated, but higher ρ⁽^j⁾ increases the censoring rate, so that the higher ρ⁽^j⁾ does not contribute very much to the reduction of N_cp compared with the sample size when ρ⁽^j⁾ = 0. Hence, N_cp increases in the late dependency, because many observations are censored before the dependent effect works.

Sample size behavior under correlated both fatal outcomes

Consider the both fatal outcomes scenario such that $T_{i 1}^{*}$ is the time to disease-specific mortality and $T_{i 2}^{*}$ is the time to other-cause mortalities. If there are such fatal endpoints, it is often assumed that their endpoints are uncorrelated for purposes of ease. But the assumption of no correlation is often unjustified scientifically. We investigate how the required sample sizes $N_{cp}^{(c)}, N_{cp}^{(nc)}, N_{1}^{(c)}, N_{1}^{(nc)}$ , and N₂ behave when the fatal endpoints are correlated. We compute the sample sizes using formula (8) and provide results in Table IV, displaying N_cp, N₁, and N₂ with the empirical power P̃_cp under $ψ_{2}^{- 1} = 1.7, S_{1}^{(1)} (τ) = S_{2}^{(1)} (τ) = 0.5$ , when $ψ_{2}^{- 1}$ and ρ⁽^k⁾ vary for the combinations of $ψ_{2}^{- 1} = 1.5$ , 1.7 and ρ⁽¹⁾ = ρ⁽²⁾ = 0, 0.3, 0.5, and 0.8.

Table IV.

The case of both fatal outcomes (iii) (Competing risk): total numbers of participants N_cp calculated from (8), the corresponding empirical powers P̃_cp (%) and alternative sizing solutions N_sim, N₁ and N₂ under $ψ_{2}^{- 1} = 1.7$ and $S_{1}^{(1)} (τ) = S_{2}^{(1)} (τ) = 0.5$ .

Dependence

Composite setting

Non-composite setting

Structure

ψ_{1}^{- 1}

ρ⁽^k⁾

N_{cp}^{(c)}

P̃_cp

N_{1}^{(c)}

N₂

N_{cp}^{(nc)}

P̃_cp

N_{1}^{(nc)}

N₂

Late (Clayton copula)

1.5

418

80.7

254

404

710

80.7

648

404

1.5

0.3

474

80.8

294

456

834

80.6

774

456

1.5

0.5

518

80.4

328

498

944

80.5

886

498

1.5

0.8

626

80.7

414

592

1258

80.2

1211

592

1.7

404

80.8

200

400

528

81.0

400

1.7

0.3

466

80.8

232

462

608

81.0

462

1.7

0.5

518

80.6

258

514

676

80.8

514

1.7

0.8

652

80.7

324

646

850

80.7

646

Early (Gumbel copula)

1.5

418

80.8

254

404

710

80.8

648

404

1.5

0.3

444

81.0

288

422

854

80.7

814

422

1.5

0.5

452

80.9

312

422

1010

80.5

992

422

1.5

0.8

436

81.0

360

364

1840

80.6

1840

364

1.7

404

80.7

200

400

528

81.1

400

1.7

0.3

458

80.8

228

454

598

81.3

454

1.7

0.5

496

80.8

246

492

648

81.1

492

1.7

0.8

566

81.0

282

562

740

81.2

562

Open in a new tab

The sample sizes N_cp from (8) usually provide slightly more conservative results from the perspective that the corresponding empirical powers P̃_cp are slightly larger than the target powers, similarly to the one fatal case. The value of $N_{cp}^{(c)}$ is smaller than $N_{cp}^{(nc)}$ in any case, and $N_{cp}^{(c)}, N_{cp}^{(nc)}, N_{1}^{(c)}, N_{1}^{(nc)}$ and N₂ usually increase proportionally to ρ⁽^j⁾ in both of late and early dependencies, but $N_{cp}^{(c)}$ and N₂ decrease inversely only when $(ψ_{1}^{- 1}, ψ_{2}^{- 1}) = (1.5, 1.7)$ , ρ⁽^j⁾ = 0.8 and the Gumbel copula. That is, the stronger dependency makes the censoring rate increase more than the one fatal case, and hence, the reduction effect of N_cp based on correlated log-rank statistics are not obtained. On the other hand, the composite endpoint strategy using $N_{1}^{(c)}$ is the most reasonable in terms of smaller sample sizing. In particular, if $ψ_{1}^{- 1} \approx ψ_{2}^{- 1}$ , we have $N_{cp}^{(c)} \approx N_{2} \approx 2 N_{1}^{(c)}$ .

3.3. Illustration: the ICON7 study

We illustrate the sample size methods with an example. Consider ‘A Randomized, Two-Arm, Multi-Centre Gynaecologic Cancer Inter Group Trial of Adding Bevacizumab to Standard Chemotherapy (Carboplatin and Paclitaxel) in Patients With Epithelial Ovarian Cancer’ (ICON7) [18]. The study was designed to investigate the addition of bevacizumab to standard chemotherapy for first-line treatment of woman with ovarian cancer. The primary endpoints of interest were the PFS and OS. The protocol stated that the number of PFS events, 684 was needed to detect a 28% change in the PFS from a median value of 18 months in the control group to 23 months in the bevacizumab group (i.e., $ψ_{PFS}^{- 1} ≐ 1.28$ ), with the power of 90% at the significance level of 5% (a two-sided log-rank test), while the number of OS events, 715 was required to detect a 23% improvement in the OS from a median value of 43 months in the control group to 53 months in the bevacizumab group (i.e., ψ_OS = 1.23), with the power of 80% at a significance level of 5% (two-sided test). The protocol sample size of 1520 patients was determined for the reasons that the required numbers of PFS and OS events were expected to occur until 36 and 60 months after the first randomized, respectively, assuming constant recruitment over 24 months, and some elements of uncertainty are considered.

We set 60 months PFS and OS rates in the control group as $S_{PFS}^{(1)} (60) = 0.1$ and $S_{OS}^{(1)} (60) = 0.4$ based on the aforementioned information, which is taking account of hazard rates $λ_{PFS}^{(1)} = - \frac{1}{18} log (0.5)$ and $λ_{OS}^{(1)} = - \frac{1}{43} log (0.5)$ from the protocol and the exponential assumptions, and its uncertainty. Also, we derive the HR for TTP, because our calculation is based on the TTP and OS rather than the PFS and OS. That is, the HR and survival rate for TTP can be computed as $ψ_{TTP}^{- 1} = λ_{TTP}^{(1)} / λ_{TTP}^{(2)} ≐ 1.31$ and $S_{TTP}^{(1)} (60) ≐ 0.25$ from the information on medians based on

e^{- λ_{TTP}^{(j)} t} = S_{TTP}^{(j)} (t) = S_{PFS}^{(j)} (t) / S_{OS}^{(j)} (t) = e^{- λ_{PFS}^{(j)} t} / e^{- λ_{OS}^{(j)} t}, j = 1, 2

from the independence and exponential assumptions. These values of the HR and $S_{TTP}^{(1)}$ are used for an illustration although the independence assumption is suspect.

Table V shows the total sample size N_cp, and the empirical powers P̃_cp, P̃_TTP, P̃_PFS, and P̃_OS (%) to evaluate the joint and single hypotheses ( $H_{1}^{cp}$ and H₁_j) of TTP, PFS, and OS given N_cp, respectively, under $ψ_{TTP}^{- 1} = 1.31$ and $ψ_{OS}^{- 1} = 1.23$ . Alternative sizing of NTTP, NPFS, and NOS gives the sample sizes required to test the single hypotheses of the TTP, PFS, and OS, respectively. The sample sizes were calculated to evaluate the joint reduction in both time-to-event outcomes of the TTP and OS or the PFS and OS with the target power of 80% at the significance level of 2.5%, based on the assumptions of the ICON7 study, assuming τ_a = 24 months accrual duration and additional τ_f = 36 months follow-up. The empirical power P̃_PFS is larger than the target power of 90% to test the PFS written in the protocol, because the analysis times in the protocol design (τ_f = 12) and ours (τ_f = 36) are different. Similarly, note that N_PFS is calculated under the target power of 80% smaller than the power 90% in the protocol. Also, the expected numbers of bivariate events are listed using the notation corresponding to D_𝚤𝚥 = N_cp × P_𝚤𝚥 in which N_cp is replaced by $N_{cp}^{(c)}$ or $N_{cp}^{(nc)}$ , and P_𝚤𝚥 is the probability ( $P_{𝚤 𝚥}^{TTP}$ : see Appendix B) calculated under the non-composite setting. Note that, even in the composite setting with the PFS and OS, D_𝚤𝚥 represents the number that either or both of the TTP and OS are observed. In addition, the correlation is assumed to be common between the two groups, that is, ρ⁽¹⁾ = ρ⁽²⁾.

Table V.

Total sample size N_cp and the empirical power for detecting the joint reduction in the TTP or PFS and the OS, where N_cp is designed with the target power 80% at α = 0.025, based on the assumption of ICON7 study, $ψ_{TTP}^{- 1} = 1.31, ψ_{OS}^{- 1} = 1.23$ , τ_a = 24 and τ_f = 36 with $S_{TTP}^{(1)} (60) = 0.25$ and $S_{OS}^{(1)} (60) = 0.4$ .

Dependence

Composite setting

Expected number of (TTP,OS)

Structure

ρ⁽^k⁾

N_{cp}^{(c)}

P̃_cp (P̃_PFS, P̃_OS)

N_PFS

N_OS

D₁₁

D₁₀

D₀₁

D₀₀

Late (Clayton copula)

0.0

1510

80.1 (99.0,80.3)

658

1498

240

476

487

306

0.3

1528

80.0 (97.4,80.7)

789

1498

275

429

461

363

0.5

1544

80.0 (96.0,81.1)

881

1498

302

395

441

405

0.8

1564

80.0 (93.4,81.7)

1024

1498

370

316

383

495

Early (Gumbel copula)

0.0

1510

80.3 (99.0,80.6)

658

1498

240

476

487

306

0.3

1508

80.1 (98.5,80.3)

700

1498

315

384

411

398

0.5

1506

80.2 (98.2,80.4)

726

1498

374

327

351

454

0.8

1498

80.0 (97.9,80.1)

744

1498

510

243

211

533

Dependence

Non-composite setting

Expected number of (TTP,OS)

Structure

ρ ⁽^k⁾

N_{cp}^{(n c)}

P̃_cp (P̃_TTP, P̃_OS)

N_TTP

N_OS

D₁₁

D₁₀

D₀₁

D₀₀

Late (Clayton copula)

0.0

1628

80.3 (96.3,83.3)

913

1498

259

514

525

330

0.3

1674

80.2 (94.9,84.2)

1023

1498

301

470

505

397

0.5

1658

80.2 (94.1,84.7)

1076

1498

332

434

484

445

0.8

1658

80.3 (94.1,84.1)

1056

1498

392

335

406

525

Early (Gumbel copula)

0.0

1628

80.2 (96.4,83.3)

913

1498

259

514

525

330

0.3

1594

80.1 (96.7,82.3)

878

1498

333

406

435

420

0.5

1562

80.2 (97.0,81.8)

830

1498

388

339

364

471

0.8

1510

80.0 (98.6,80.2)

695

1498

514

245

213

538

Open in a new tab

P̃_TTP, P̃_PFS, and P̃_OS are 100 times empirical powers (%) when single hypotheses on the TTP, PFS and OS are tested, given total sample sizes N_cp.

In calculating the sample size, we assume that one event is fatal because the TTP may be censored by the OS of fatal event. If the association between the TTP and OS is late-time dependent, then the total sample sizes $N_{cp}^{(c)}$ required to test the PFS and OS jointly with common correlation between the two groups, ρ⁽^k⁾ = 0.0, 0.3, 0.5, and 0.8 are 1510, 1528, 1544, and 1564, respectively. The sample size increases monotonically to ρ⁽^k⁾ = 0.8 from 0: the difference between the smallest and largest sample sizes may seem relatively large, although the ratio is only about 1.04. If the association is early-time dependent, then the total sample sizes $N_{cp}^{(c)}$ required under ρ⁽^k⁾ = 0.0, 0.3, 0.5 and 0.8 are 1510, 1508, 1506, and 1498, respectively. The sample size decreases with increasing correlation, but the reduction rate from the largest sample size given by ρ⁽^k⁾ = 0.0 is quite small. Also, comparing the composite and non-composite endpoints, $N_{cp}^{(c)}$ is smaller than the $N_{cp}^{(nc)}$ required to test the TTP and OS jointly, but the difference is slight. The expected numbers of bivariate events, D_𝚤𝚥 provide useful information in monitoring process of the trial.

4. Multiple primary endpoints

4.1. Hypothesis testing, power, and sample sizes

We discuss calculating the required sample size for trials with multiple primary endpoints. For simplicity, we will consider the problem on the power and sample size calculation using the simplest procedures, the (weighted) Bonferroni procedure as well known and widely used. The other procedures for controlling the Type I error rate are available [6, 19, 20].

The weighted Bonferroni procedure allocates the Type I error rate α between the endpoints with weight ω, that is, α₁ = ωα for the first endpoint and α₂ = (1−ω)α for the second endpoint. We are then interested in testing $H_{0}^{mp} = {H_{01} : ψ_{1} (t) = 1 and H_{02} : ψ_{2} (t) = 1} for all t}$ versus $H_{1}^{mp} = {H_{11} : ψ_{1} (t) < 1 or H_{12} : ψ_{2} (t) < 1 at some t}$ at the (overall) significance level of 3, based on the log-rank test statistics $Z_{1}^{(0)}$ and $Z_{2}^{(0)}$ given in Section 2. The testing procedure is to

reject the null hypothesis H_{0}^{mp} if and only if Z_{1}^{(0)} < - z_{α_{1}} or Z_{2}^{(0)} < - z_{α_{2}},

(9)

where z_α₁ and z_α₂ are the 100(1 − α₁) and 100(1 − α₂) percentiles of N(0, 1), respectively. Therefore, because the rejection region of $H_{0}^{mp}$ is ${(Z_{1}^{(0)}, Z_{2}^{(0)}) : Z_{1}^{(0)} < - z_{α_{1}} or Z_{2}^{(0)} < - z_{α_{2}}}$ , we have the power function for a reduction in either of the time-to-event outcomes with the weighted Bonferroni procedure,

1 - β = Pr [\cup_{k = 1}^{2} {Z_{k}^{(0)} < - z_{α_{k}}} ∣ H_{1}^{mp}] = 1 - Pr [\cap_{k = 1}^{2} {Z_{k}^{(0)} \geq - z_{α_{k}}} ∣ H_{1}^{mp}]

This overall power is referred to as ‘minimal power’[12] or ‘disjunctive power’[13]. Similarly to Section 3.1, for all of the three censoring scenarios, the power based on the function (5) can be approximated by

1 - β \approx 1 - \int_{- z_{α_{1}}}^{\infty} \int_{- z_{α_{2}}}^{\infty} ϕ (z_{1}, z_{2}; \sqrt{N} δ, \sum) d z_{1} d z_{2} .

(10)

Letting N_mp be the minimum of total sample size N required for testing $H_{0}^{mp}$ against $H_{1}^{mp}$ , the formula for N_mp is given by

N_{mp} = ⌈ {(L_{β} - z_{α_{2}} σ_{2}^{(0)} / σ_{2})}^{2} / δ_{2}^{2} ⌉,

(11)

where L_β is the solution of the integral equation

β = \int_{- \infty}^{\frac{δ_{1}}{δ_{2}} L_{β} + {z_{α_{1}} \frac{σ_{1}^{(0)}}{σ_{1}} - z_{α_{2}} \frac{δ_{1}}{δ_{2}} \frac{σ_{2}^{(0)}}{σ_{2}}}} \int_{- \infty}^{L_{β}} ϕ (z_{1}, z_{2}; 0, R) d z_{1} d z_{2}

and $σ_{k}^{(0)}$ , σ_k, and R are the same as the definitions given in Section 3.1. Similarly to the methods to obtain N_cp, we can use a search method of L_β such as the Newton–Raphson algorithm [14] or the basic linear interpolation algorithm [2] in order to compute N_mp, which generally take shorter computing time than direct search that let N_mp increase sequentially until (10) exceeds the desired power. Also, similarly to Section 3.1, the required numbers of events that are useful to monitor a trial can be obtained by D_𝚤𝚥 = N_mp × P_𝚤𝚥 (see Appendix B for details regarding calculation of P_𝚤𝚥 under the three censoring scenarios).

The overall Type I error associated with the null $H_{0}^{mp}$ is controlled by the sum of marginal Type I errors. Similarly to Section 3, the inflation problem of the overall Type I error in the procedure (6) based on the asymptotic normal approximation is parallel to that of the marginal ones. The overall significance level α₁ + α₂ on $H_{0}^{mp}$ is quite close to the level α₁ + α₂ − α₁α₂ considered when two endpoints are independent, so that the influence on marginal inflations may be practically larger in multiple primary problem than in multiple co-primary one. However, we can control the errors by correcting the critical values based on the sample size N and the allocation rate r⁽¹⁾, using simulation or theoretical result well known in the univariate log-rank statistic [9–11]. The practical need for sample size formula is a balanced weighting of the precision and cost, and then the use of the formula based on the asymptotic normal approximation is left in each stage of practice.

4.2. Illustration: one fatal case

We illustrate the behavior of the sample size and power for detecting at least one reduction $H_{1}^{mp}$ in bivariate time-to-event data focusing on the one fatal outcome scenario with the two time-dependent association structures. Similarly to Section 3.2, we generate bivariate time-to-event data and perform Monte Carlo trials. The target power 1 − β = 0.8, the significance level α = 0.025, τ_a = 2, and τ_f = 3 are used. The notations of N_EP, ψ_EP, and $S_{EP}^{(j)} (t)$ for EP = TTP, PFS, and OS are the same as those in Section 3. Also, N_mp is written as $N_{mp}^{(nc)}$ when the first endpoint is the TTP, or as $N_{mp}^{(c)}$ when the first endpoint is the PFS. Let P̃_mp denote the empirical power (%) for detecting $H_{1}^{mp}$ under N_mp participants designed using the formula (11). Further, we select ω = ω₀ as an optimal value of ω to consider a variable weighting strategy for the weighted Bonferroni procedure, that is, ω₀ is ω that gives a minimum of N_mp over ω = 0, 0.05, …, 0.95, 1 (ω₀ = argmin_ω_∈{0_,_0.05_,_…_,_0.95_,_1}N_mp). One may consider the other testing procedures such as the fixed-sequence procedure when ω₀ = 0 or ω₀ = 1 is suggested.

Table VI displays the required total sample sizes N_mp (given ω = ω₀), the alternative sizing solutions N_PFS, N_TTP, and N_OS, the selected Bonferroni weight ω₀ and the empirical power P̃_mp under $ψ_{TTP}^{- 1} = 1.7, S_{TTP}^{(1)} (τ) = 0.3$ , and $S_{OS}^{(1)} (τ) = 0.5$ when $ψ_{OS}^{- 1}$ and ρ⁽^k⁾ vary in the combinations of $ψ_{OS}^{- 1} = 1.3$ , 1.5, 1.7 and ρ⁽¹⁾ = ρ⁽²⁾ = 0, 0.3, 0.5, 0.8.

Table VI.

The case of one fatal outcome (ii) (semi-competing risk): total numbers of participants N_mp calculated from (11), the corresponding empirical powers P̃_mp (%) and alternative sizing solutions N_PFS, N_TTP, and N_OS under $ψ_{TTP}^{- 1} = 1.7, S_{TTP}^{(1)} (τ) = 0.3$ and $S_{OS}^{(1)} (τ) = 0.5$ .

Dependence

Composite setting

Non-composite setting

structure

ψ_{OS}^{- 1}

ρ⁽^k⁾

ω ₀

N_{mp}^{(c)}

P̃_mp

N_PFS

N_OS

ω₀

N_{mp}^{(nc)}

P̃_mp

N_TTP

N_OS

Late (Clayton copula)

1.3

0.0

1.00

254

80.7

254

1194

0.95

268

80.8

270

1194

1.3

0.3

1.00

284

80.5

284

1194

0.95

288

81.0

288

1194

1.3

0.5

1.00

322

80.9

322

1194

0.95

296

80.5

298

1194

1.3

0.8

1.00

366

80.8

366

1194

1.00

288

80.7

288

1194

1.5

0.0

1.00

200

80.9

200

532

0.80

242

80.2

266

532

1.5

0.3

1.00

232

80.7

232

532

0.80

266

80.4

292

532

1.5

0.5

1.00

256

80.5

256

532

0.80

282

80.5

306

532

1.5

0.8

1.00

300

80.9

300

532

0.80

296

80.5

312

532

1.7

0.0

1.00

168

81.1

168

328

0.55

208

80.4

264

328

1.7

0.3

1.00

196

80.9

196

328

0.55

228

80.7

296

328

1.7

0.5

0.95

218

80.6

218

328

0.55

242

80.7

314

328

1.7

0.8

0.95

260

80.8

262

328

0.50

264

80.7

332

328

Early (Gumbel copula)

1.3

0.0

1.00

254

80.7

254

1194

0.95

268

80.9

270

1194

1.3

0.3

1.00

266

80.6

266

1194

1.00

246

81.0

246

1194

1.3

0.5

1.00

272

80.6

272

1194

1.00

222

80.8

222

1194

1.3

0.8

1.00

258

80.3

258

1194

1.00

174

81.0

174

1194

1.5

0.0

1.00

200

80.7

200

532

0.80

242

80.5

266

532

1.5

0.3

1.00

216

81.0

216

532

0.90

250

80.5

260

532

1.5

0.5

1.00

224

80.7

224

532

0.95

246

80.6

248

532

1.5

0.8

1.00

230

80.9

230

532

1.00

214

81.0

214

532

1.7

0.0

1.00

168

81.3

168

328

0.55

208

80.4

264

328

1.7

0.3

1.00

186

81.0

186

328

0.60

228

80.3

272

328

1.7

0.5

1.00

196

80.7

196

328

0.65

240

80.3

280

328

1.7

0.8

1.00

214

80.7

214

328

1.00

248

81.0

248

328

Open in a new tab

The sample sizes N_mp obtained from the formula (11) consistently provide conservative results because the empirical powers P̃_mp are slightly larger than the target powers. Although $N_{mp}^{(c)} < N_{mp}^{(nc)}$ in many cases, we observe $N_{mp}^{(c)} > N_{mp}^{(nc)}$ in some cases where N_TTP is relatively close to N_PFS or smaller than N_PFS, which occurs when ρ⁽^j⁾ is relatively large and the effect size of OS (absolute value of $log ψ_{OS}^{- 1}$ ) is smaller than that of TTP. Also, if the effect size of OS is smaller than that of TTP, then N_TTP ≪ N_OS or N_PFS ≪ N_OS occurs, so that we have $N_{mp}^{(nc)} \approx N_{TTP}$ or $N_{mp}^{(c)} \approx N_{PFS}$ accompanied with the value of ω₀ close to 1. When N_TTP (or N_PFS) is closer to N_OS, ω₀ takes a value between 0 and 1.

Further, we add a summary about the tendency how ω₀-value ranges from 0 to 1 in Section B.2 of Supporting Information, based on the underlying results by varying S_TTP(τ) and ρ⁽^k⁾ leaving the other factors fixed, whose parts are provided in Figures B.4, B.5, B.6, and B.7. We provide a guideline based on the summary about ω₀-value: generally, the correlation ρ⁽^k⁾ and the dependence structure are unknown, so that these should be examined using meta analysis and/or data from a pilot study, similarly to consideration for the HRs of effect sizes. Although it may be desirable to consider a maximum sample size considering such unknown factors, it is necessary to balance on a degree of uncertainty and the cost. When the TTP and OS are used as two primary endpoints, ω₀ ranges from 0 to 1, but ω₀ is approximately 0.5 when N_TTP/N_OS is close to 1. When the PFS and OS are used as two primary endpoints, the ω₀-value is usually close to 1 under early dependency and under a late dependency with not high ρ⁽^k⁾, if $ψ_{PFS}^{- 1} \geq ψ_{OS}^{- 1}$ . The value of ω₀ may be away from 0 and 1 otherwise. These considerations about ω₀-value suggest that the use of PFS endpoint is one of the reasonable strategies in the multiple primary problem for the TTP and OS if $ψ_{TTP}^{- 1} \geq ψ_{OS}^{- 1}$ , although the PFS endpoint may be used without consideration in practice. Hence, it is useful to incorporate ω₀ in the sample size calculation proposed for multiple primary endpoints.

4.3. Behavior of the sample size as a function of the correlation

We extend the study to the both non-fatal and both fatal cases. We use the same setting as in Section 4.2 (1 − β = 0.8, α = 0.025, τ_a = 2, and τ_f = 3), while the notations of N_k, ψ_k, and $S_{k}^{(j)} (t)$ for the kth endpoint are used because of the consistent expression of the results from the three situations. Similarly to Section 4.2, we select a minimum of N_mp over ω = 0, 0.05, …, 0.95, 1 via a strategy for the weighted Bonferroni procedure, $N_{mp}^{(c)}$ is N_mp calculated under the composite setting, and $N_{mp}^{(nc)}$ is N_mp calculated under the non-composite setting.

Figure 1 shows the required total sample sizes N_mp as a function of ρ⁽¹⁾ = ρ⁽²⁾ = 0, 0.05, …, 0.95 when $ψ_{2}^{- 1}$ varies from 1.3, 1.5 and 1.7 under $ψ_{1}^{- 1} = 1.7, S_{1}^{(1)} (τ) = 0.3$ and the early dependency (Gumbel copula). The 12 plots are arranged from the left for $S_{2}^{(1)} (τ) = 0.4$ , 0.5 and 0.6, and from the top for the four scenarios: the both non-fatal case without the composite setting ( $N_{cp}^{(nc)}$ ), the one fatal case without and with the composite setting ( $N_{mp}^{(nc)}$ and $N_{mp}^{(c)}$ ), and the both fatal case with the composite setting ( $N_{mp}^{(c)}$ ). The plots for the late dependency case are provided in Figure B.3 in Section B.1 of Supporting Information (generated under the Clayton copula (late dependency) using the same conditions as Figure 1 except the copula model).

Behavior of the total sample sizes N_mp as a function of the correlation ρ⁽^j⁾ for $ψ_{2}^{- 1} = 1.3$ , 1.5 and 1.7, arranged from the left for $S_{2}^{(1)} (τ) = 0.4$ , 0.5, 0.6 and from the top following to $N_{mp}^{(nc)}$ from the both non-fatal case, $N_{mp}^{(nc)}$ and $N_{mp}^{(c)}$ of one fatal case and $N_{mp}^{(c)}$ of both fatal case, given 1 − β = 0.8, α = 0.025, τ_a = 2, τ_f = 3, $ψ_{1}^{- 1} = 1.7, S_{1}^{(1)} (τ) = 0.3$ and early dependency

As ρ⁽^j⁾ increases, the behavior of $N_{mp}^{(nc)}$ is monotone increasing up to a constant value (univariate sample size) in the both non-fatal case but is more complicated in the one fatal case. In the one fatal case, there is a complicated interaction between the effect size and the correlation, where a smaller sample size is required with increasing effect size and decreasing correlation, while the censoring rate for non-fatal events increases with increasing correlation. Consider the case of ρ⁽^j⁾ = 1 as a standard, because the sample size N_mp required in bivariate primary endpoints reduces to that of the single endpoint in the optimal weighting Bonferroni strategy when the correlations ρ⁽^j⁾ are 1. In the one fatal case with the composite setting, the complicated behavior is moderated, and $N_{mp}^{(c)}$ tends to decrease as the correlation decreases, but $N_{mp}^{(c)}$ is sometimes larger than $N_{mp}^{(nc)}$ . The behavior of $N_{mp}^{(c)}$ in the both fatal case with the composite setting is similar to that of the one fatal case.

5. Summary

Utilizing multiple endpoints in clinical trials may provide the opportunity for characterizing intervention’s multidimensional effects but also creates challenges in design and analysis of clinical trials. Specifically controlling the Types I and II error rates is non-trivial when the multiple primary endpoints are potentially correlated. When designing the trial to detect effects for all of the endpoints, no adjustment is needed to control the Type I error. However, the Type II error increases as the number of endpoints being evaluated increases. In contrast, when designing the trial to detect an effect for at least one of the endpoints, then an adjustment is needed to control the Type I error.

We describe an approach to the evaluation of power and sample size for comparing the effect of two interventions in superiority clinical trials with two time-to-event outcomes for multiple co-primary and multiple primary cases. Designing clinical trials with multiple time-to-event outcomes is more complex compared with endpoints with the other scales, requiring censoring scheme challenges and time-dependent associations among outcomes. We consider the three censoring scenarios based on the types of outcomes: (i) both outcomes are non-fatal; (ii) one outcome is fatal; and (iii) both outcomes are fatal, and we evaluate their composite and non-composite settings. We discuss the two time-dependent association structures: the asymmetric late time-dependency generated by the Clayton copula and the early time-dependency generated by the Gumbel copula. Our findings are summarized as follows.

In the co-primary endpoint situation, if the two time-to-event outcomes are non-fatal, then the required sample size $N_{cp}^{(nc)}$ decreases with increasing correlations ρ⁽^j⁾, in both of the late and early time-dependencies, except for the case where one HR is larger than the other. When the correlations are zero, $N_{cp}^{(nc)}$ is the largest. However, when one or both outcomes are non-fatal, the behaviors of required sample sizes $N_{cp}^{(nc)}$ and $N_{cp}^{(c)}$ have complicated shapes owing to an interaction between the correlations and effect sizes based on the HRs. Zero correlation does not provide the largest required sample size. Thus, careful consideration is required in practice.
In the multiple primary endpoint situation based on the optimal weighting Bonferroni strategy, a standard situation occurs when the correlations ρ⁽^j⁾ are one, corresponding to the single endpoint case. Considering the two non-fatal case, the sample size $N_{mp}^{(nc)}$ does not vary or increases monotonously up to a constant (univariate sample size) as a function of the correlations, in both of the late and early time-dependencies. However, when one or both outcomes are non-fatal, the behaviors of the required sample sizes $N_{mp}^{(nc)}$ and $N_{mp}^{(c)}$ are complex with an interaction between the correlations and the effect size ratios. Thus, the proposed formula is useful for determining the sample size in practice, noting, when the sample size is small, for example, < 100, one may have to modify a critical value based on the normal approximation.
Unlike the both non-fatal case, larger correlations do not increase the statistical power in the one fatal or both fatal cases. The reason for this is that higher correlation leads to increased censoring. Hence, the standard log-rank statistic must be corrected by incorporating information from informative censoring. We focus on providing a foundation for designing clinical trials with two time-to-event outcomes. Informative censoring is a topic for future work.

When designing a clinical trial with the proposed methods, one needs parameter estimates using the available methods ([7, 21–25, 27, 28]). But specification for the joint distribution is challenging as data to base selection is often limited during trial design. One conservatively alternative is to select the largest sample size of all of the correlation and joint distribution combinations, and stop the clinical trial when the appropriate number of events required for each outcome is observed. Another option is to use group-sequential designs. This may lead to potentially fewer patients than the fixed-sample designs when evidence is overwhelming and thus offers efficiency but introduces the other challenges. Information for the endpoints may not accrue at the same rate and require different information times.

As discussed in [29] and [26], the Type I error rate of the log-rank test for each endpoint may be inflated in small sample sizes or with unequally sized intervention groups. When this occurs in the co-primary endpoint situation, our simulation studies suggest that the overall Type I error associated with the null hypothesis $H_{0}^{cp}$ is not larger than the target significance level, except when the correlation is very high (i.e., close to one), even though the marginal Type I error rate is inflated. On the other hand with multiple primary endpoints, the overall Type I error associated with the null hypothesis $H_{0}^{mp}$ is larger than the target significance level, particularly when the correlation is small. In these cases, we may consider more direct ways of calculating sample size without using a normal approximation such as the methods in [29] and [26].

Supplementary Material

SUPPLEMENTARY MATERIAL

NIHMS877489-supplement-SUPPLEMENTARY_MATERIAL.pdf^{(280.5KB, pdf)}

Acknowledgments

We thank the two anonymous referees for their helpful suggestions and constructive comments that improved the content and presentation. We also thank Dr. Lu Tian, Dr. H.M. James Hung and Dr. Sue-Jane Wang in encouraging us with their valuable comments for this research. This work was supported by JSPS KAKENHI Grant Number 26330032.

APPEENDIX A. Asymptotic forms in the bivariate log-rank statistic

We provide the details of asymptotic forms of the bivariate log-rank statistic discussed in Section 2.3. See [3] for the case when both endpoints are non-fatal. We obtain the covariance process between two incremental differences $d M_{i k} (t) = d N_{i k} (t) - Y_{i k} (t) d {\tilde{Λ}}_{k}^{(g_{i})} (t)$ , k = 1, 2, where M_i₁(t) and M_i₂(t) are martingale processes relative to the filtration generated by the history prior to time t under the null hypothesis. This corresponds to calculating the expectation of

\begin{array}{l} d M_{i 1} (t) d M_{i 2} (s) = d N_{i 1} (t) d N_{i 2} (s) - Y_{i 1} (t) d {\tilde{Λ}}_{1}^{(g_{i})} (t) d N_{i 2} (s) \\ - Y_{i 2} (s) d {\tilde{Λ}}_{2}^{(g_{i})} (s) d N_{i 1} (t) + Y_{i 1} (t) Y_{i 2} (s) d {\tilde{Λ}}_{1}^{(g_{i})} (t) d {\tilde{Λ}}_{2}^{(g_{i})} (s) . \end{array}

(A.1)

One fatal case

Consider the non-composite setting with the censoring indicator $Δ_{i k} = 𝟙 (T_{i k} = T_{i k}^{*})$ , k = 1, 2. Because $T_{i 1} = min (T_{i 1}^{*}, T_{i 2}^{*}, C_{i})$ and $T_{i 2} = min (T_{i 2}^{*}, C_{i})$ , the expectations of the ith at-risk processes given g_i = j, that is, $y_{k}^{(j)} (t)$ , k = 1, 2 are

y_{1}^{(j)} (t) = E [Y_{i 1} (t) ∣ g_{i} = j] = Pr (T_{i 1}^{*} \geq t, T_{i 2}^{*} \geq t, C_{i} \geq t ∣ g_{i} = j) = S^{(j)} (t, t) G (t)

for the non-fatal endpoint and

y_{2}^{(j)} (t) = E [Y_{i 2} (t) ∣ g_{i} = j] = Pr (T_{i 2}^{*} \geq t, C_{i} \geq t ∣ g_{i} = j) = S_{2}^{(j)} (t) G (t)

for the fatal endpoint (the latter form is identical to that of both non-fatal case). So each element in Ĥ_k(t) converges to the corresponding expectation almost surely by Glivenko–Cantelli’s theorem under some regular conditions, so that we have the asymptotic forms of Ĥ_k(t), k = 1, 2 as

H_{k} (t) = N^{- 1} \frac{E [Y_{k}^{(1)} (t)] E [Y_{k}^{(2)} (t)]}{E [Y_{k}^{(1)} (t)] + E [Y_{k}^{(2)} (t)]} = {\begin{cases} G (t) \frac{r^{(1)} r^{(2)} S^{(1)} (t, t) S^{(2)} (t, t)}{r^{(1)} S^{(1)} (t, t) + r^{(2)} S^{(2)} (t, t)}, & k = 1, \\ G (t) \frac{r^{(1)} r^{(2)} S_{2}^{(1)} (t) S_{2}^{(2)} (t)}{r^{(1)} S_{2}^{(1)} (t) + r^{(2)} S_{2}^{(2)} (t)}, & k = 2. \end{cases}

Similarly, the conditional expectations of the ith counting processes given 𝒴_ik(t) = 1 are

\begin{array}{l} d {\tilde{Λ}}_{1}^{(g_{i})} (t) = E [d N_{i 1} (t) ∣ Y_{i 1} (t) = 1. g_{i}] \\ = \frac{Pr (t \leq T_{i 1}^{*} < t + d t, t \leq T_{i 2}^{*}, t \leq C_{i} ∣ g_{i})}{Pr (Y_{i 1} (t) = 1 ∣ g_{i})} = - \frac{S^{(g_{i})} (d t, t)}{S^{(g_{i})} (t, t)} = Λ^{(g_{i})} (d t, t) \end{array}

for the non-fatal endpoint and $d {\tilde{Λ}}_{2}^{(g_{i})} (t) = E [d N_{i 2} (t) ∣ Y_{i 2} (t) = 1, g_{i}] = d Λ_{2}^{(g_{i})} (t)$ for the fatal endpoint. Hence, the expectation of (A.1) is

\begin{array}{l} E [d M_{i 1} (t) d M_{i 2} (s) ∣ g_{i}] = Pr (Y_{i 1} (t) Y_{i 2} (s) = 1 ∣ g_{i}) E [d M_{i 1} (t) d M_{i 2} (s) ∣ Y_{i 1} (t) Y_{i 2} (s) = 1, g_{i}] \\ = G (t \lor s) {𝟙 (t \leq s) S^{(g_{i})} (d t, d s) + 𝟙 (t \leq s) S^{(g_{i})} (t, d s) Λ^{(g_{i})} (d t, t) + S^{(g_{i})} (d t, t \lor s) d Λ_{2}^{(g_{i})} (s) + S^{(g_{i})} (t, t \lor s) Λ^{(g_{i})} (d t, t) d Λ_{2}^{(g_{i})} (s)}, \end{array}

(A.2)

because, noting that T_i₁ ≤ T_i₂ always holds, we have

\begin{array}{l} Pr (Y_{i 1} (t) Y_{i 2} (s) = 1 ∣ g_{i}) & = Pr (t \leq T_{i 1}^{*}, t \lor s \leq T_{i 2}^{*}, t \lor s \leq C_{i} ∣ g_{i}) = G (t \lor s) S^{(g_{i})} (t, t \lor s), \\ E [d N_{i 1} (t) d N_{i 2} (s) ∣ Y_{i 1} (t) Y_{i 2} (s) = 1, g_{i}] & = \frac{Pr (t \leq T_{i 1}^{*} < t + d t, s \leq T_{i 2}^{*} < s + d s ∣ g_{i})}{Pr (t \leq T_{i 1}^{*}, t \lor s \leq T_{i 2}^{*} ∣ g_{i})} = 𝟙 (t \leq s) \frac{S^{(g_{i})} (d t, d s)}{S^{(g_{i})} (t, t \lor s)}, \\ E [Y_{i 1} (t) d N_{i 2} (s) ∣ Y_{i 1} (t) Y_{i 2} (s) = 1, g_{i}] & = \frac{Pr (t \leq T_{i 1}^{*}, s \leq T_{i 2}^{*} < s + d s, t \lor s \leq T_{i 2}^{*} ∣ g_{i})}{Pr (t \leq T_{i 1}^{*}, t \lor s \leq T_{i 2}^{*} ∣ g_{i})} = - 𝟙 (t \leq s) \frac{S^{(g_{i})} (t, d s)}{S^{(g_{i})} (t, t, \lor s)} and \\ E [d N_{i 1} (t) Y_{i 2} (s) ∣ Y_{i 1} (t) Y_{i 2} (s) = 1, g_{i}] & = \frac{Pr (t \leq T_{i 1}^{*} < t + d t, t \lor s \leq T_{i 2}^{*} ∣ g_{i})}{Pr (t \leq T_{i 1}^{*}, t \lor s \leq T_{i 2}^{*} ∣ g_{i})} = - \frac{S^{(g_{i})} (d t, t \lor s)}{S^{(g_{i})} (t, t \lor s)} . \end{array}

Therefore, we have the form of dA⁽^j⁾(t, s) = G(t ∨ s)⁻¹E[dM_i₁(t)dM_i₂(s) | g_i = j] given in (3a) of Section 2.3.

Next, assume the composite setting $Δ_{i 1} = 𝟙 (T_{i 1} = min (T_{i 1}^{*}, T_{i 2}^{*}))$ and $Δ_{i 2} = 𝟙 (T_{i 2} = T_{i 2}^{*})$ . That is, the definition of d𝒩_i₁(t) is only changed from the non-composite version. Because observing the 2nd endpoint is composed in the ith counting processes 𝒩_i₁(t) for the first endpoint, the conditional expectations of d𝒩_i₁(t) given 𝒴_i₁(t) = 1 is

\begin{array}{l} d {\tilde{Λ}}_{1}^{(g_{i})} (t) = E [d N_{i 1} (t) ∣ Y_{i 1} (t) = 1, g_{i}] = \frac{\sum_{k = 1, 2} Pr (t \leq T_{i k}^{*} < t + d t, t \leq T_{i k^{'}}^{*}, t \leq C_{i} ∣ g_{i})}{Pr (Y_{i 1} = 1 ∣ g_{i})} \\ = Λ^{(g_{i})} (d t, t) + Λ^{(g_{i})} (t, d t), \end{array}

where k′ = 3−k. So, the terms of $d {\tilde{Λ}}_{1}^{(g_{i})} = Λ^{(g_{i})} (d t, t)$ in (A.2) are replaced by Λ^(g_i)(dt, t)+Λ^(g_i)(t, dt) in this composite setting. Similarly, the other corrections in (A.2) are of terms of the conditional expectations including d𝒩_i₁(t), and derived as

\begin{array}{l} E [d N_{i 1} (t) d N_{i 2} (s) ∣ Y_{i 1} (t) Y_{i 2} (s) = 1, g_{i}] & = \frac{𝟙 (t \leq s) S^{(g_{i})} (d t, d s) - 𝟙 (t = s) S^{(g_{i})} (t, d s)}{S^{(g_{i})} (t, t \lor s)} and \\ E [d N_{i 1} (t) Y_{i 2} (s) ∣ Y_{i 1} (t) Y_{i 2} (s) = 1, g_{i}] & = - \frac{S^{(g_{i})} (d t, t \lor s) + 𝟙 (t \leq s) S^{(g_{i})} (d t, t \lor s)}{S^{(g_{i})} (t, t \lor s)} . \end{array}

We achieve (3b) applying these results into the expectation of (A.1). In this composite setting, we can see that the intensity information on the 2nd endpoint (death event) is added into $d {\tilde{Λ}}_{1}^{(j)} (t)$ and dA⁽^j⁾(t, s) compared with the non-composite version.

Both fatal case

We consider the non-composite setting, $Δ_{i k} = 𝟙 (T_{i k} = T_{i k}^{*})$ , k = 1, 2 and omit the composite setting for simplicity. Following $T_{i 1} = min (T_{i 1}^{*}, T_{i 2}^{*}, C_{i})$ and $T_{i 2} = min (T_{i 1}^{*}, T_{i 2}^{*}, C_{i})$ , the expectations of the ith at-risk processes given g_i = j are the same forms as

y_{k}^{(j)} (t) = E [Y_{i k} (t) ∣ g_{i} = j] = Pr (t \leq T_{i 1}^{*}, t \leq T_{i 2}^{*}, t \leq C_{i} ∣ g_{i}) = S^{(j)} (t, t) G (t),

for both endpoints, identical to that obtained for the one fatal case. Hence, by Glivenko–Cantelli’s theorem under some regular conditions, the asymptotic forms of Ĥ_k(t), k = 1, 2 are

H_{k} (t) = N^{- 1} \frac{E [Y_{k}^{(1)} (t)] E [Y_{k}^{(2)} (t)]}{E [Y_{k}^{(1)} (t)] + E [Y_{k}^{(2)} (t)]} = G (t) \frac{r^{(1)} r^{(2)} S^{(1)} (t, t) S^{(2)} (t, t)}{r^{(1)} S^{(1)} (t, t) + r^{(2)} S^{(2)} (t, t)} .

The conditional expectations of the ith counting processes given 𝒴_ik(t) = 1 are

\begin{array}{l} d {\tilde{Λ}}_{k}^{(g_{i})} (t) = E [d N_{i k} (t) ∣ Y_{i k} (t) = 1, g_{i}] = \frac{Pr (t \leq T_{i k}^{*} < t + d, t \leq T_{i k^{'}}^{*}, t \leq C_{i} ∣ g_{i})}{Pr (Y_{i k} (t) = 1 ∣ g_{i})} \\ = {\begin{cases} \frac{S^{(g_{i})} (d t, t)}{S^{(g_{i})} (t, t)} & = Λ^{(g_{i})} (d t, t), k = 1 \\ \frac{S^{(g_{i})} (t, d t)}{S^{(g_{i})} (t, t)} & = Λ^{(g_{i})} (t, d t), k = 2 \end{cases} \end{array}

similar to the result for the non-fatal endpoint in the one fatal case, where k′ = 3 − k. Hence, the expectation of (A.1) can be derived as

\begin{array}{l} E [d M_{i 1} (t) d M_{i 2} (s) ∣ g_{i}] = G (t \lor s) {𝟙 (t = s) S^{(g_{i})} (d t, d s) + 𝟙 (t \leq s) S^{(g_{i})} (t \lor s, d s) Λ^{(g_{i})} (d t, t) \\ + 𝟙 (t \geq s) S^{(g_{i})} (d t, t \lor s) d Λ_{2}^{(g_{i})} (s) + S^{(g_{i})} (t \lor s, t \lor s) Λ^{(g_{i})} (d t, t) Λ^{(g_{i})} (s, d s)}, \end{array}

(A.3)

because

\begin{array}{l} Pr (Y_{i 1} (t) Y_{i 2} (s) = 1 ∣ g_{i}) & = Pr (t \lor s \leq T_{i 1}^{*}, t \lor s \leq T_{i 2}^{*}, t \lor s \leq C_{i} ∣ g_{i}) = G (t \lor s) S^{(g_{i})} (t \lor s, t \lor s), \\ E [d N_{i 1} (t) d N_{i 2} (s) ∣ Y_{i 1} (t) Y_{i 2} (s) = 1, g_{i}] & = \frac{Pr (t \leq T_{i 1}^{*} < t + d t, s \leq T_{i 2}^{*} < s + d s, t \lor s \leq T_{i 1}^{*}, t \lor s \leq T_{i 2}^{*} ∣ g_{i})}{Pr (t \lor s \leq T_{i 1}^{*}, t \lor s \leq T_{i 2}^{*} ∣ g_{i})} \\ = 𝟙 (t = s) \frac{S^{(g_{i})} (d t, d s)}{S^{(g_{i})} (t \lor s, t \lor s)}, \\ E [Y_{i 1} (t) d N_{i 2} (s) ∣ Y_{i 1} (t) Y_{i 2} (s) = 1, g_{i}] & = \frac{Pr (t \leq T_{i 1}^{*}, s \leq T_{i 2}^{*} < s + d s, t \lor s \leq T_{i 1}^{*}, t \lor s \leq T_{i 2}^{*} ∣ g_{i})}{Pr (t \lor s \leq T_{i 1}^{*}, t \lor s \leq T_{i 2}^{*} ∣ g_{i})} \\ = - 𝟙 (t \leq s) \frac{S^{(g_{i})} (t \lor s, d s)}{S^{(g_{i})} (t \lor s, t \lor s)}, \\ E [d N_{i 1} (t) Y_{i 2} (s) ∣ Y_{i 1} (t) Y_{i 2} (s) = 1, g_{i}] & = \frac{Pr (t \leq T_{i 1}^{*} < t + d t, s \leq T_{i 2}^{*}, t \lor s \leq T_{i 1}^{*}, t \lor s \leq T_{i 2}^{*} ∣ g_{i})}{Pr (t \lor s \leq T_{i 1}^{*}, t \lor s \leq T_{i 2}^{*} ∣ g_{i})} \\ = - 𝟙 (t \geq s) \frac{S^{(g_{i})} (d t, t \lor s)}{S^{(g_{i})} (t \lor s, t \lor s)} . \end{array}

Now, let us add an assumption natural for continuous time-to-event data that we cannot observe $T_{i 1}^{*}$ and $T_{i 2}^{*}$ simultaneously in both fatal event times. Then, the factors that occur at only t = s, such as 𝟙(t = s) S⁽^j⁾(dt, ds), do not contribute to the integral of the form $\int_{0}^{τ} \int_{0}^{τ} (\cdot) d A^{(j)} (t, s)$ , so that we can ignore 𝟙(t = s)S⁽^j⁾(dt, ds) in (A.3) as the zero term. We can also apply the relations S⁽^j⁾(s, ds) = −S⁽^j⁾(s, s)Λ⁽^j⁾(s, ds) and S⁽^j⁾(dt, t) = −S⁽^j⁾(t, t)Λ⁽^j⁾(dt, t) into (A.3) further. Hence, (A.3) becomes to zero under continuous time-to-event data. As a result, the form of dA⁽^j⁾(t, s) = G(t ∨ s)⁻¹E[dM_i₁(t)dM_i₂(s) | g_i = j] is obtained as (4a) in Section 2.3.

APPENDIX B. Probability formula for the observed two endpoints

We provide the formula of $P_{a b} = Pr (Δ_{i}^{(1)} = a, Δ_{i}^{(2)} = b)$ for the observation ( $Δ_{i}^{(1)}, Δ_{i}^{(2)}$ ) of the four patterns. We may monitor a trial based on the number of events obtained from the required sample size using the probability formula. Let $P_{a b}^{(j)} = Pr (Δ_{i}^{(1)} = a, Δ_{i}^{(2)} = b ∣ g_{i} = j)$ , a, b = 0, 1, j = 1, 2. Using this notation, we can write

P_{a b} = r^{(1)} P_{a b}^{(1)} + r^{(2)} P_{a b}^{(2)}

and then, for example, in the one fatal case, we can obtain $P_{11}^{(j)} = Q_{1}^{(j)}, P_{10}^{(j)} = R_{2}^{(j)} - Q_{1}^{(j)}, P_{01}^{(j)} = R_{1}^{(j)}$ , and $P_{00}^{(j)} = 1 - R_{1}^{(j)} - R_{2}^{(j)}$ , j = 1, 2, where

\begin{array}{l} Q_{1}^{(j)} & = \int_{0}^{\infty} (\int_{0}^{s} f^{(j)} (t, s) d t) G (s) d s, Q_{2}^{(j)} = \int_{0}^{\infty} (\int_{0}^{t} f^{(j)} (t, s) d s) G (t) d t, \\ R_{1}^{(j)} & = \int_{0}^{\infty} (\int_{s}^{\infty} f^{(j)} (t, s) d t) G (s) d s, R_{2}^{(j)} = \int_{0}^{\infty} (\int_{t}^{\infty} f^{(j)} (t, s) d s) G (t) d t, \end{array}

$f^{(j)} (x, y) (= \frac{d^{2}}{dxdy} S^{(j)} (x, y))$ is the density function of $(T_{i 1}^{*}, T_{i 2}^{*})$ for the ith participant assigned as g_i = j. The changes of $P_{a b}^{(j)}$ in the other censoring scenarios under the non-composite setting are summarized as Table BI including the one fatal case. For an example with the composite setting, let $P_{a b}^{TTP}$ and $P_{a b}^{PFS}$ be P_ab when the first endpoint corresponds to the TTP and the PFS, respectively, and the second endpoint is consistently of the OS. Then we have $P_{11}^{PFS} = P_{11}^{TTP} + P_{01}^{TTP}, P_{10}^{PFS} = P_{10}^{TTP}, P_{00}^{PFS} = P_{00}^{TTP}$ , and $P_{01}^{PFS} = 0$ It would be enough for us to consider P_ab under the non-composite setting as the probability formula, because it includes more information than P_ab under the composite setting.

Table BI.

The probability formula on observing ( $Δ_{i}^{(1)}, Δ_{i}^{(2)}$ ) in the three censoring scenarios under the non-composite setting

P_{a b}^{(j)}

Both non-fatal case (Non-competing model)

One fatal case (Semi-competing model)

Both fatal case (Full-competing model)

P_{11}^{(j)}

Q_{1}^{(j)} + Q_{2}^{(j)}

Q_{1}^{(j)}

P_{10}^{(j)}

R_{2}^{(j)} - Q_{1}^{(j)}

R_{2}^{(j)} - Q_{1}^{(j)}

R_{2}^{(j)}

P_{01}^{(j)}

R_{1}^{(j)} - Q_{2}^{(j)}

R_{1}^{(j)}

R_{1}^{(j)}

P_{00}^{(j)}

1 - R_{1}^{(j)} - R_{2}^{(j)}

1 - R_{1}^{(j)} - R_{2}^{(j)}

1 - R_{1}^{(j)} - R_{2}^{(j)}

Open in a new tab

Footnotes

Supporting information

Additional supporting information may be found in the online version of this article at the publisher’s web site.

References

1.Fine JP, Jiang H, Chappell R. On semi-competing risks data. Biometrika. 2001;88:907–919. doi: 10.1093/biomet/88.4.907. [DOI] [Google Scholar]
2.Hamasaki T, Sugimoto T, Evans SR, Sozu T. Sample size determination for clinical trials with co-primary outcomes: exponential event times. Pharmaceutical Statistics. 2013;12:28–34. doi: 10.1002/pst.1545. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Sugimoto T, Sozu T, Hamasaki T, Evans SR. A logrank test-based method for sizing clinical trials with two co-primary time-to-event endpoints. Biostatistics. 2013;14:409–421. doi: 10.1093/biostatistics/kxs057. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Offen W, Chuang-Stein C, Dmitrienko A, Littman G, Maca J, Meyerson L, Muirhead R, Stryszak P, Boddy A, Chen K, Copley-Merriman K, Dere W, Givens S, Hall D, Henry D, Jackson JD, Krishen A, Liu T, Ryder S, Sankoh AJ, Wang J, Yeh CH. Multiple co-primary endpoints: medical and statistical solutions. Drug Information Journal. 2007;41:31–46. doi: 10.1177/009286150704100105. [DOI] [Google Scholar]
5.Hung HMJ, Wang SJ. Some controversial multiple testing problems in regulatory applications. Journal of Biopharmaceutical Statistics. 2009;19:1–11. doi: 10.1080/10543400802541693. [DOI] [PubMed] [Google Scholar]
6.Dmitrienko A, Tamhane AC, Bretz F. Multiple Testing Problems in Pharmaceutical Statistics. Chapman and Hall; Boca Raton, FL: 2010. [Google Scholar]
7.Wang W. Estimating the association parameter for copula models under dependent censoring. Journal of the Royal Statistical Society, Series B. 2003;65:257–273. doi: 10.1111/1467-9868.00385. [DOI] [Google Scholar]
8.Hsu L, Prentice RL. On assessing the strength of dependency between failure time variates. Biometrika. 1996;83:491–506. doi: 10.1093/biomet/83.3.491. [DOI] [Google Scholar]
9.Kellerer AM, Chmelevsky D. Small-sample properties of censored-data rank tests. Biometrics. 1983;39:675–682. doi: 10.2307/2531095. [DOI] [Google Scholar]
10.Hsieh FY. Comparing sample size formulae for trials with unbalanced allocation using the logrank test. Statistics in Medicine. 1992;11:1091–1098. doi: 10.1002/sim.4780110810. [DOI] [PubMed] [Google Scholar]
11.Strawderman RL. An asymptotic analysis of the logrank test. Lifetime Data Analysis. 1997;3:225–249. doi: 10.1023/A:1009648914586. [DOI] [PubMed] [Google Scholar]
12.Westfall PH, Tobias RD, Rom D, Wolfinger RD, Hochberg Y. Multiple Comparisons and Multiple Tests Using the SAS System. SAS; Cary NC: 2011. [Google Scholar]
13.Senn S, Bretz F. Power and sample size when multiple endpoints are considered. Pharmaceutical Statistics. 2007;6:161–170. doi: 10.1002/pst.301. [DOI] [PubMed] [Google Scholar]
14.Sugimoto T, Sozu T, Hamasaki T. A convenient formula for calculating sample size of clinical trials with multiple co-primary continuous endpoints. Pharmaceutical Statistics. 2012;11:118–128. doi: 10.1002/pst.505. [DOI] [PubMed] [Google Scholar]
15.Freedman LS. Table of the number of patients required in clinical trials using the logrank test. Statistics in Medicine. 1982;1:121–129. doi: 10.1002/sim.4780010204. [DOI] [PubMed] [Google Scholar]
16.Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease. Biometrika. 1978;65:141–151. doi: 10.1093/biomet/65.1.141. [DOI] [Google Scholar]
17.Hougaard P. A class of multivariate failure time distribution. Biometrika. 1986;73:671–678. doi: 10.1093/biomet/71.1.75. [DOI] [Google Scholar]
18.Perren TJ, Swart AM, Pfisterer J, Ledermann JA, Pujade-Lauraine E, Kristensen G, Carey MS, Beale P, Cervantes A, Kurzeder C, du Bois A, Sehouli J, Kimmig R, Stähle A, Collinson F, Essapen S, Gourley C, Lortholary A, Selle F, Mirza MR, Leminen A, Plante M, Stark D, Qian W, Parmar MK, Oza AM. A phase 3 trial of bevacizumab in ovarian cancer. New England Journal of Medicine. 2011;365:2484–2496. doi: 10.1056/NEJMoa1103799. [DOI] [PubMed] [Google Scholar]
19.Wiens B, Dmitrienko A. On selecting a multiple comparison procedure for analysis of a clinical trial: fallback, fixed-sequence and related procedures. Statistics in Biopharmaceutical Research. 2010;2:22–32. doi: 10.1198/sbr.2010.08035. [DOI] [Google Scholar]
20.Bretz F, Hothorn T, Westfall P. Multiple Comparisons Using R. Boca Raton, FL: Chapman and Hall; 2011. [Google Scholar]
21.Lagakos SW. A stochastic model for censored-survival data in the presence of an auxiliary variable. Biometrics. 1976;32:551–559. [PubMed] [Google Scholar]
22.Lagakos SW. Using auxiliary variables for improved estimates of survival time. Biometrics. 1977;33:399–404. [PubMed] [Google Scholar]
23.Lin DY, Robins JM, Wei LJ. Comparing two failure time distributions in the presence of dependent censoring. Biometrika. 1996;83:381–393. doi: 10.1093/biomet/83.2.381. [DOI] [Google Scholar]
24.Shih JH. A goodness-of-fit for association in a bivariate survival model. Biometrika. 1998;85:189–200. doi: 10.1093/biomet/85.1.189. [DOI] [Google Scholar]
25.Chang SH. A two-sample comparison for multiple ordered event data. Biometrics. 2000;56:183–189. doi: 10.1111/j.0006-341x.2000.00183.x. [DOI] [PubMed] [Google Scholar]
26.Wang R, Lagakos SW, Gray RJ. Testing and interval estimation for two-sample survival comparisons with small sample sizes and unequal censoring. Biostatistics. 2010;11:676–692. doi: 10.1093/biostatistics/kxq021. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Siannis F, Farewell VT, Head J. A multi-state model for joint modelling of terminal and non-terminal events with application to Whitehall II. Statistics in Medicine. 2007;26:426–442. doi: 10.1002/sim.2342. [DOI] [PubMed] [Google Scholar]
28.Parast L, Tian L, Cai T. Landmark estimation of survival and treatment effect in a randomized clinical trial. Journal of the American Statistical Association. 2014;109:384–394. doi: 10.1080/01621459.2013.842488. [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Heinze G, Gnant M, Schemper M. Exact log-rank test for unequal follow-up. Biometrics. 2003;59:1151–1157. doi: 10.1111/j.0006-341X.2003.00132.x. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

SUPPLEMENTARY MATERIAL

NIHMS877489-supplement-SUPPLEMENTARY_MATERIAL.pdf^{(280.5KB, pdf)}

[R1] 1.Fine JP, Jiang H, Chappell R. On semi-competing risks data. Biometrika. 2001;88:907–919. doi: 10.1093/biomet/88.4.907. [DOI] [Google Scholar]

[R2] 2.Hamasaki T, Sugimoto T, Evans SR, Sozu T. Sample size determination for clinical trials with co-primary outcomes: exponential event times. Pharmaceutical Statistics. 2013;12:28–34. doi: 10.1002/pst.1545. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] 3.Sugimoto T, Sozu T, Hamasaki T, Evans SR. A logrank test-based method for sizing clinical trials with two co-primary time-to-event endpoints. Biostatistics. 2013;14:409–421. doi: 10.1093/biostatistics/kxs057. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] 4.Offen W, Chuang-Stein C, Dmitrienko A, Littman G, Maca J, Meyerson L, Muirhead R, Stryszak P, Boddy A, Chen K, Copley-Merriman K, Dere W, Givens S, Hall D, Henry D, Jackson JD, Krishen A, Liu T, Ryder S, Sankoh AJ, Wang J, Yeh CH. Multiple co-primary endpoints: medical and statistical solutions. Drug Information Journal. 2007;41:31–46. doi: 10.1177/009286150704100105. [DOI] [Google Scholar]

[R5] 5.Hung HMJ, Wang SJ. Some controversial multiple testing problems in regulatory applications. Journal of Biopharmaceutical Statistics. 2009;19:1–11. doi: 10.1080/10543400802541693. [DOI] [PubMed] [Google Scholar]

[R6] 6.Dmitrienko A, Tamhane AC, Bretz F. Multiple Testing Problems in Pharmaceutical Statistics. Chapman and Hall; Boca Raton, FL: 2010. [Google Scholar]

[R7] 7.Wang W. Estimating the association parameter for copula models under dependent censoring. Journal of the Royal Statistical Society, Series B. 2003;65:257–273. doi: 10.1111/1467-9868.00385. [DOI] [Google Scholar]

[R8] 8.Hsu L, Prentice RL. On assessing the strength of dependency between failure time variates. Biometrika. 1996;83:491–506. doi: 10.1093/biomet/83.3.491. [DOI] [Google Scholar]

[R9] 9.Kellerer AM, Chmelevsky D. Small-sample properties of censored-data rank tests. Biometrics. 1983;39:675–682. doi: 10.2307/2531095. [DOI] [Google Scholar]

[R10] 10.Hsieh FY. Comparing sample size formulae for trials with unbalanced allocation using the logrank test. Statistics in Medicine. 1992;11:1091–1098. doi: 10.1002/sim.4780110810. [DOI] [PubMed] [Google Scholar]

[R11] 11.Strawderman RL. An asymptotic analysis of the logrank test. Lifetime Data Analysis. 1997;3:225–249. doi: 10.1023/A:1009648914586. [DOI] [PubMed] [Google Scholar]

[R12] 12.Westfall PH, Tobias RD, Rom D, Wolfinger RD, Hochberg Y. Multiple Comparisons and Multiple Tests Using the SAS System. SAS; Cary NC: 2011. [Google Scholar]

[R13] 13.Senn S, Bretz F. Power and sample size when multiple endpoints are considered. Pharmaceutical Statistics. 2007;6:161–170. doi: 10.1002/pst.301. [DOI] [PubMed] [Google Scholar]

[R14] 14.Sugimoto T, Sozu T, Hamasaki T. A convenient formula for calculating sample size of clinical trials with multiple co-primary continuous endpoints. Pharmaceutical Statistics. 2012;11:118–128. doi: 10.1002/pst.505. [DOI] [PubMed] [Google Scholar]

[R15] 15.Freedman LS. Table of the number of patients required in clinical trials using the logrank test. Statistics in Medicine. 1982;1:121–129. doi: 10.1002/sim.4780010204. [DOI] [PubMed] [Google Scholar]

[R16] 16.Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease. Biometrika. 1978;65:141–151. doi: 10.1093/biomet/65.1.141. [DOI] [Google Scholar]

[R17] 17.Hougaard P. A class of multivariate failure time distribution. Biometrika. 1986;73:671–678. doi: 10.1093/biomet/71.1.75. [DOI] [Google Scholar]

[R18] 18.Perren TJ, Swart AM, Pfisterer J, Ledermann JA, Pujade-Lauraine E, Kristensen G, Carey MS, Beale P, Cervantes A, Kurzeder C, du Bois A, Sehouli J, Kimmig R, Stähle A, Collinson F, Essapen S, Gourley C, Lortholary A, Selle F, Mirza MR, Leminen A, Plante M, Stark D, Qian W, Parmar MK, Oza AM. A phase 3 trial of bevacizumab in ovarian cancer. New England Journal of Medicine. 2011;365:2484–2496. doi: 10.1056/NEJMoa1103799. [DOI] [PubMed] [Google Scholar]

[R19] 19.Wiens B, Dmitrienko A. On selecting a multiple comparison procedure for analysis of a clinical trial: fallback, fixed-sequence and related procedures. Statistics in Biopharmaceutical Research. 2010;2:22–32. doi: 10.1198/sbr.2010.08035. [DOI] [Google Scholar]

[R20] 20.Bretz F, Hothorn T, Westfall P. Multiple Comparisons Using R. Boca Raton, FL: Chapman and Hall; 2011. [Google Scholar]

[R21] 21.Lagakos SW. A stochastic model for censored-survival data in the presence of an auxiliary variable. Biometrics. 1976;32:551–559. [PubMed] [Google Scholar]

[R22] 22.Lagakos SW. Using auxiliary variables for improved estimates of survival time. Biometrics. 1977;33:399–404. [PubMed] [Google Scholar]

[R23] 23.Lin DY, Robins JM, Wei LJ. Comparing two failure time distributions in the presence of dependent censoring. Biometrika. 1996;83:381–393. doi: 10.1093/biomet/83.2.381. [DOI] [Google Scholar]

[R24] 24.Shih JH. A goodness-of-fit for association in a bivariate survival model. Biometrika. 1998;85:189–200. doi: 10.1093/biomet/85.1.189. [DOI] [Google Scholar]

[R25] 25.Chang SH. A two-sample comparison for multiple ordered event data. Biometrics. 2000;56:183–189. doi: 10.1111/j.0006-341x.2000.00183.x. [DOI] [PubMed] [Google Scholar]

[R26] 26.Wang R, Lagakos SW, Gray RJ. Testing and interval estimation for two-sample survival comparisons with small sample sizes and unequal censoring. Biostatistics. 2010;11:676–692. doi: 10.1093/biostatistics/kxq021. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R27] 27.Siannis F, Farewell VT, Head J. A multi-state model for joint modelling of terminal and non-terminal events with application to Whitehall II. Statistics in Medicine. 2007;26:426–442. doi: 10.1002/sim.2342. [DOI] [PubMed] [Google Scholar]

[R28] 28.Parast L, Tian L, Cai T. Landmark estimation of survival and treatment effect in a randomized clinical trial. Journal of the American Statistical Association. 2014;109:384–394. doi: 10.1080/01621459.2013.842488. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] 29.Heinze G, Gnant M, Schemper M. Exact log-rank test for unequal follow-up. Biometrics. 2003;59:1151–1157. doi: 10.1111/j.0006-341X.2003.00132.x. [DOI] [PubMed] [Google Scholar]

PERMALINK

Sizing clinical trials when comparing bivariate time-to-event outcomes

Tomoyuki Sugimoto

Toshimitsu Hamasaki

Scott Evans

Takashi Sozu

Abstract

1. Introduction

Table I.

2. Censoring schemes, dependency, and correlation

2.1. Notation and framework

2.2. Censoring schemes and dependence measures for two time-to-event outcomes

Examples of the three censoring scenarios

Definition for the composite endpoint and handling censoring

Dependence measure

2.3. Bivariate structure of two log-rank test statistics

Table II.

Non-composite case

Composite case

Non-composite case

Composite case

3. Co-primary endpoints

3.1. Hypothesis testing, power, and sample sizes

3.2. Behavior of the sample size

Formula performance and sample size behavior: one fatal case

Table III.

Sample size behavior under correlated both fatal outcomes

Table IV.

3.3. Illustration: the ICON7 study

Table V.

4. Multiple primary endpoints

4.1. Hypothesis testing, power, and sample sizes

4.2. Illustration: one fatal case

Table VI.

4.3. Behavior of the sample size as a function of the correlation

Figure 1.

5. Summary

Supplementary Material

Acknowledgments

APPEENDIX A. Asymptotic forms in the bivariate log-rank statistic

One fatal case

Both fatal case

APPENDIX B. Probability formula for the observed two endpoints

Table BI.

Footnotes

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases