Quantifying the Feasibility of Shortening Clinical Trial Duration Using Surrogate Markers

Xuan Wang; Tianxi Cai; Lu Tian; Florence Bourgeois; Layla Parast

doi:10.1002/sim.9185

Author manuscript; available in PMC: 2022 Dec 10.

Published in final edited form as: Stat Med. 2021 Sep 2;40(28):6321–6343. doi: 10.1002/sim.9185

PMCID: PMC8595715 NIHMSID: NIHMS1752140 PMID: 34474500

Summary

The potential benefit of using a surrogate marker in place of a long-term primary outcome is very attractive in terms of the impact on study length and cost. Many available methods for quantifying the effectiveness of a surrogate endpoint either rely on strict parametric modeling assumptions or require that the primary outcome and surrogate marker be fully observed, i.e., not subject to censoring. Moreover, available methods for quantifying surrogacy typically provide a proportion of treatment effect explained (PTE) measure and do not directly address the important questions of whether and how the trial can be ended earlier using the surrogate marker. In this paper, we specifically address these important questions by proposing a PTE measure to quantify the feasibility of ending trials early based on endpoint information collected at an earlier landmark point t₀ in a time-to-event outcome setting. We provide a framework for deriving an optimally predicted outcome for individual patients at t₀ based on a combination of surrogate marker and event time information in the presence of censoring. We propose a non-parametric estimator for the PTE measure and derive the asymptotic properties of our estimators. Finite sample performance of our estimators is illustrated via extensive simulation studies and a real data application examining the potential of hemoglobin A1c and fasting plasma glucose to predict treatment effects on long-term diabetes risk based on the Diabetes Prevention Program study.

Keywords: surrogate marker, proportion of treatment effect explained by the surrogate, nonparametric estimation, censored data

1 |. INTRODUCTION

Primary endpoints for assessing the effectiveness of a new treatment in randomized clinical trials (RCTs) often require long-term follow-up, especially when the outcome is time to the occurrence of a clinical event. The long duration of RCTs is a challenge in that it both hinders timely access to needed treatment and increases study costs.^1,2,3 Efficient trial designs that shorten trial duration could both reduce trial costs and accelerate decision-making regarding treatment effectiveness and distribution. One potential strategy to shorten trial duration is to use surrogate markers collected at an earlier time point during the course of the trial in place of the long-term outcome. For example, the surrogate marker hemoglobin A1c has been previously considered by the U.S. Food and Drug Administration as the basis for drug approval in patients with Type 2 diabetes.⁴ Towards this end, the statistical, epidemiological, and clinical research communities have made substantial progress over the past 30 years by proposing, evaluating, and using methods to assess the value of potential surrogate markers.^{5,6,7,8,9,10,11}

Existing methods for quantifying surrogacy can be largely categorized as model based and model free. Model based approaches typically impose regression models relating the outcome to the treatment alone and to both the treatment and the surrogate marker.^12,13,14,15 The proportion of the treatment effect on the primary outcome explained by the surrogate marker (PTE) can then be computed as a ratio of the regression coefficients for treatment. However, when the model assumptions fail, resulting PTE estimates are often biased and invalid.^13,16 To avoid reliance on model specification, model free non-parametric approaches have recently been proposed as useful alternatives for quantifying PTE.^16,17

When the long-term outcome is a time-to-event outcome measured at time t, quantifying the PTE of a surrogate marker measured at an earlier time, denoted by t₀ < t, is further complicated by the fact that the surrogate marker may not be observable for those who drop out of the study or who experience the clinical event by t₀. Most existing methods require that the surrogate marker be observed for all subjects. However, recently, Parast et al. (2017)¹⁸ proposed a non-parametric framework for estimating the PTE of a surrogate marker in the presence of such complications from censoring. While useful, limitations of this method include requiring monotonicity assumptions about the relationship between the surrogate marker and the outcome as well as the support of the surrogate marker being the same among the two treatment groups.

In this paper, we propose an alternative model free PTE measure to quantify the feasibility of ending trials early based on surrogate information collected at an earlier time t₀ in a time-to-event outcome setting. We provide a model free framework for deriving an optimally predicted outcome for individual patients based on a combination of surrogate marker and event time information collected by t₀. This framework is similar to those previously proposed in Price et al. (2018)¹¹ and Wang et al. (2020)¹⁷ in that an optimal transformation of the surrogate marker is identified to best approximate the long term outcome. However, the present setting is substantially more challenging due to censoring and the possibility that the surrogate marker may not be available among those who have experienced the primary event or are censored prior to t₀. Furthermore, our framework allows us to relax assumptions required by Parast et al. (2017).¹⁸

The remainder of the paper is organized as follows. We first derive the optimal transformation function of the surrogate information, g_opt, define the model free PTE measure and then provide non-parametric estimation procedures for g_opt and PTE. We derive the asymptotic properties of our estimators, demonstrate good finite sample performance of our estimators using a simulation study, and illustrate our approach by examining two potential surrogate markers for diabetes, hemoglobin A1c and fasting plasma glucose, using data from the Diabetes Prevention Program (DPP) study.

2 |. METHODOLOGY

2.1 |. An Optimal Prediction Function g and Model Free Definitions of PTE

Let T denote the primary event time outcome and S be the surrogate marker measured at time t₀ < t which can be either discrete or continuous. We treat S as continuous for conciseness of presentation but the proposed methods can be easily modified to accommodate discrete S. Due to censoring for T one can only observe X = min(T, C) and δ = I(T ≤ C), where C is the censoring time. In addition, we allow S to be not observed for those with X < t₀. Under the standard causal inference framework, let T^(a), C^(a), S^(a) denote the respective potential event time, censoring time, and surrogate under treatment A = a ∈ {0, 1}. In practice, {T⁽¹⁾, C⁽¹⁾, S⁽¹⁾} and {T⁽⁰⁾, C⁽⁰⁾, S⁽⁰⁾} cannot both be observed for the same subject. We assume that treatment assignment is random, P(A = a) = 0.5. The observable data for analysis consist of n sets of independent and identically distributed random vectors $D = {D_{i} = {(X_{i}, δ_{i}, S_{i} I (X_{i} \geq t_{0}), A_{i})}^{⊤}, i = 1, \dots, n}$ , where $T_{i} = T_{i}^{(1)} A_{i} + T_{i}^{(0)} (1 - A_{i})$ , $C_{i} = C_{i}^{(1)} A_{i} + C_{i}^{(0)} (1 - A_{i})$ , $S_{i} = S_{i}^{(1)} A_{i} + S_{i}^{(0)} (1 - A_{i})$ is only observed for those with X_i ≥ t₀, and $C_{i}^{(a)}$ is assumed to be independent of ( $T_{i}^{(a)}$ , $S_{i}^{(a)}$ ) with $P (C_{i}^{(a)} > t) > 0$ for a = 0, 1.

We define the treatment effect, Δ(t), as the risk difference at time t:

Δ (t) = μ_{1} (t) - μ_{0} (t), where μ_{a} (t) = E (Y_{t}^{(a)}) and Y_{t}^{(a)} = I (T^{(a)} > t) .

Our goal is to evaluate to what extent surrogate information on T and S collected by t₀, defined as $Q_{t_{0}} = {(Y_{t_{0}}, Y_{t_{0}} S)}^{⊤}$ , can be used to approximate the long term treatment effect Δ(t). We consider $Q_{t_{0}}$ instead of only S at t₀ as in Parast et al.¹⁸ since for those with T < t₀, $Y_{t_{0}}$ is observed while S is not observable. Let $g (Q_{t_{0}}) = (1 - Y_{t_{0}}) g_{1} + Y_{t_{0}} g_{2} (S) \in ℝ$ be the prediction function that maps $Q_{t_{0}} \in {0, 1} \times ℝ$ to a predicted outcome and define the treatment effect on the predicted outcome $g (Q_{t_{0}})$ as

Δ_{g} (t_{0}) = μ_{g, 1} (t_{0}) - μ_{g, 0} (t_{0}), where μ_{g, a} (t_{0}) = E {g (Q_{t_{0}}^{(a)})} .

We aim to study to what extent the estimated treatment effect Δ_g(t₀) can be used to approximate the target treatment effect Δ(t) and how to choose g.

It is essential to identify an optimal prediction function, $g_{opt} (Q_{t_{0}}) = (1 - Y_{t_{0}}) g_{1, opt} + Y_{t_{0}} g_{2, opt} (S)$ , such that $g_{opt} (Q_{t_{0}})$ maximally predicts Y. Throughout, we suppress t₀ from notations for g_opt although g_opt and its estimator inherently depend on t₀. To this end, we follow the strategy of Wang et al.¹⁷ and identify an optimal g that minimizes a mean squared error

L_{oracle} (g) = E [{(Y_{t}^{(1)} - Y_{t}^{(0)}) - {g (Q_{t_{0}}^{(1)}) - g (Q_{t_{0}}^{(0)})}}^{2}] .

That is, we aim to find an optimal prediction function g_opt(·), such that $g_{opt} (Q_{t_{0}}^{(1)}) - g_{opt} (Q_{t_{0}}^{(0)})$ can best approximate $Y_{t}^{(1)} - Y_{t}^{(0)}$ . However, both ( $Y_{t}^{(1)}$ , S⁽¹⁾) and ( $Y_{t}^{(0)}$ , S⁽⁰⁾) are not observable simultaneously for one individual in practice and thus, the correlations between ( $Y_{t}^{(1)}$ , $Q_{t_{0}}^{(1)}$ ) and ( $Y_{t}^{(0)}$ , $Q_{t_{0}}^{(0)}$ ) are not identifiable. Therefore, we instead aim to minimize $L_{oracle} (g)$ under the working independence assumption:

(A1) (Y_{t}^{(1)}, S^{(1)} Y_{t_{0}}^{(1)}) ⊥ (Y_{t}^{(0)}, S^{(0)} Y_{t_{0}}^{(0)}) .

Importantly, we use (A1) only to construct g_opt(·); all proposed inference procedures do not require this working assumption to hold (see Section 5 for more discussion). Under (A1), $L_{oracle} (g)$ is reduced to

L (g) = E [{Y_{t}^{(1)} - g (Q_{t_{0}}^{(1)})}^{2} + {Y_{t}^{(0)} - g (Q_{t_{0}}^{(0)})}^{2}] - 2 E {Y_{t}^{(1)} - g (Q_{t_{0}}^{(1)})} E {Y_{t}^{(0)} - g (Q_{t_{0}}^{(0)})} .
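The step from $L_{oracle} (g)$ to L(g) is an expansion of the square. The short derivation below is a sketch; the shorthand U_a is introduced here only for illustration and is not notation used elsewhere in the paper.

```latex
\begin{align*}
U_a &= Y_t^{(a)} - g\bigl(Q_{t_0}^{(a)}\bigr), \qquad a = 0, 1, \\
L_{oracle}(g) &= E\bigl\{(U_1 - U_0)^2\bigr\}
               = E(U_1^2) + E(U_0^2) - 2\,E(U_1 U_0), \\
E(U_1 U_0) &= E(U_1)\,E(U_0)
  \quad \text{when } U_1 \text{ and } U_0 \text{ are independent, as under (A1)}, \\
L(g) &= E\bigl(U_1^2 + U_0^2\bigr) - 2\,E(U_1)\,E(U_0).
\end{align*}
```

Only the cross term E(U₁U₀) involves the joint distribution of the two potential outcomes, which is why the working independence assumption is enough to make the criterion identifiable.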

Since g_opt(·) is only identifiable up to a constant shift in $L_{oracle} (g)$ , we define g_opt(·) as the following constrained minimizer to make g_opt(·) location identifiable as well as simplify the minimization problem:

min_{g \in F} E [{Y_{t} - g (Q_{t_{0}})}^{2}] under the constraint E [{Y_{t} - g (Q_{t_{0}})} | A = 0] = 0,

(1)

where $F$ is the class of measurable functions. Using the fact that the primary outcome can be rewritten as Y_t = I(T > t) = I(T > t₀)I(T > t) = I(T > t₀)Y_t, it can be shown that

E [{Y_{t} - g (Q_{t_{0}})}] = E [I (T \leq t_{0}) (0 - g_{1}) + I (T > t_{0}) {I (T > t) - g_{2} (S)}],

E [{Y_{t} - g (Q_{t_{0}})}^{2}] = E [I (T \leq t_{0}) g_{1}^{2} + I (T > t_{0}) {I (T > t) - g_{2} (S)}^{2}] .

So (1) is equivalent to the following optimization problem for g₁ and g₂(·),

min_{g_{1}, g_{2}} E [(1 - Y_{t_{0}}) g_{1}^{2} + Y_{t_{0}} {Y_{t} - g_{2} (S)}^{2}] under the constraint E [(1 - Y_{t_{0}}) (- g_{1}) + Y_{t_{0}} {Y_{t} - g_{2} (S)} | A = 0] = 0.

In Appendix A, we show that the solution to the above optimization problem is

g_{1, opt} = λ \frac{P (T^{(0)} \leq t_{0})}{P (T \leq t_{0})} = λ \frac{1 - μ_{0} (t_{0})}{1 - μ (t_{0})}, g_{2, opt} (s) = m (s | t_{0}) + λ \frac{f_{0} (s | t_{0}) μ_{0} (t_{0})}{f (s | t_{0}) μ (t_{0})},

(2)

where $μ (t) = E (Y_{t}) = P (T > t)$ , $μ_{a} (t) = E (Y_{t} | A = a) = P (T > t | A = a)$ , $m (s | t_{0}) = E (Y_{t} | S = s, T > t_{0})$ , f_a(s|t₀) and f(s|t₀) are the respective conditional density functions of S | (T > t₀, A = a) and S | T > t₀,

λ = c {\int \frac{f_{0} {(s | t_{0})}^{2} μ_{0} {(t_{0})}^{2}}{f (s | t_{0}) μ (t_{0})} d s + \frac{{1 - μ_{0} (t_{0})}^{2}}{1 - μ (t_{0})}}^{- 1} with c = E [Y_{t_{0}} {Y_{t} - m (S | t_{0})} | A = 0] = \frac{μ_{1} (t_{0}) μ_{0} (t_{0})}{μ_{1} (t_{0}) + μ_{0} (t_{0})} \int \frac{{m_{0} (s | t_{0}) - m_{1} (s | t_{0})} f_{1} (s | t_{0}) f_{0} (s | t_{0})}{f (s | t_{0})} d s

and $m_{a} (s | t_{0}) = E (Y_{t} | A = a, S = s, T > t_{0})$ , a = 0, 1. The magnitude of λ tends to be relatively small since for a good surrogate marker S, m₀(s|t₀) − m₁(s|t₀) is small (note that when m₀(s|t₀) − m₁(s|t₀) is small, c is small, and thus, λ is small, resulting in a small g_1,opt). Thus, g_1,opt is usually very small and g_2,opt is often dominated by m(s|t₀).
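For readers who want to see where the form of (2) comes from, the following is an informal Lagrangian sketch consistent with the expressions above; the full argument is in Appendix A. The functional J and the scaling of the multiplier by 2 are conventions chosen here for illustration, and arguments (t₀) and (s | t₀) are suppressed in the last lines.

```latex
\begin{align*}
\mathcal{J}(g_1, g_2, \lambda)
 &= \{1 - \mu(t_0)\}\, g_1^2
  + \mu(t_0) \int E\bigl[\{Y_t - g_2(s)\}^2 \mid S = s, T > t_0\bigr] f(s \mid t_0)\, ds \\
 &\quad + 2\lambda \Bigl[ -\{1 - \mu_0(t_0)\}\, g_1
  + \mu_0(t_0) \int \{m_0(s \mid t_0) - g_2(s)\}\, f_0(s \mid t_0)\, ds \Bigr], \\
\frac{\partial \mathcal{J}}{\partial g_1} = 0
 &\;\Rightarrow\; g_1 = \lambda\, \frac{1 - \mu_0}{1 - \mu}, \\
\frac{\partial \mathcal{J}}{\partial g_2(s)} = 0
 &\;\Rightarrow\; g_2(s) = m + \lambda\, \frac{f_0\, \mu_0}{f\, \mu}, \\
\text{constraint}
 &\;\Rightarrow\; c - \lambda \Bigl\{ \int \frac{f_0^2\, \mu_0^2}{f\, \mu}\, ds
   + \frac{(1 - \mu_0)^2}{1 - \mu} \Bigr\} = 0,
 \qquad c = \mu_0 \int (m_0 - m)\, f_0\, ds .
\end{align*}
```

Solving the last line for λ gives the expression displayed above, and c here equals E[Y_{t₀}{Y_t − m(S | t₀)} | A = 0].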

With g_opt(·), we may approximate the treatment effect on Y_t by the treatment effect on $g_{opt} (Q_{t_{0}}) = I (T \leq t_{0}) g_{1, opt} + I (T > t_{0}) g_{2, opt} (S)$ ,

Δ_{g_{opt}} (t_{0}) = E {g_{opt} (Q_{t_{0}}^{(1)}) - g_{opt} (Q_{t_{0}}^{(0)})} .

We then define the PTE of $Q_{t_{0}}$ as

{PTE}_{g_{opt}} (t_{0}) = Δ_{g_{opt}} (t_{0}) / Δ (t);

(3)

that is, the proportion of the treatment effect on the long-term outcome that is captured by the treatment effect on surrogate transformation. As defined, the measure ${PTE}_{g_{opt}} (t_{0})$ is expected to be close to 1 if $g_{opt} (Q_{t_{0}})$ is a perfect surrogate and to be close to 0 if $g_{opt} (Q_{t_{0}})$ is a useless surrogate.

Remark 1.

Note that we define the prediction transformation function of the surrogate information $Q_{t_{0}}$ to be $g (Q_{t_{0}}) = (1 - Y_{t_{0}}) g_{1} + Y_{t_{0}} g_{2} (S)$ . Another natural prediction function is $g (Q_{t_{0}}) = (1 - Y_{t_{0}}) Y_{t_{0}} + Y_{t_{0}} g_{2} (S) = (1 - Y_{t_{0}}) 0 + Y_{t_{0}} g_{2} (S) = Y_{t_{0}} g_{2} (S)$ as for those with T < t₀, $Y_{t_{0}} = 0$ is observed, and g₁ in the first transformation function can be regarded as a function of 0, or g₁ ≡ g₁(0). Since the optimal g_1,opt is usually very small, these two definitions of transformation functions of the surrogate information are very close. While we focus on the first transformation function in the paper, in Appendix B, we also derive the form of the optimal function of the surrogate and similar PTE to (3) with this optimal function i.e. with g₁ ≡ 0, and provide simulation results comparing the two approaches.

As mentioned previously, Parast et al. (2017)¹⁸ also offer a PTE measure in a time-to-event outcome setting, which we denote by PTE_L, and which is indexed by a reference distribution of the surrogate marker. We show in Appendix B that there is a correspondence between our proposed PTE with the second transformation and the PTE definition of Parast et al. (2017)¹⁸ with a particular reference distribution in their definition of PTE. The correspondence will approximately hold for the PTE in Section 2. In addition, the proposed new definition of PTE, for both transformations, also relaxes the strong assumptions required by Parast et al. (2017).¹⁸ Specifically, to ensure that PTE is between 0 and 1, they require that P(S⁽¹⁾ > s, T⁽¹⁾ > t₀) ≥ P(S⁽⁰⁾ > s, T⁽⁰⁾ > t₀) for all s, P(T⁽¹⁾ > t|S⁽¹⁾ = s, T⁽¹⁾ > t₀) ≥ P(T⁽⁰⁾ > t|S⁽⁰⁾ = s, T⁽⁰⁾ > t₀) for all s, P(T⁽¹⁾ > t|S⁽¹⁾ = s, T⁽¹⁾ > t₀) is a monotone increasing function of s, and S⁽¹⁾ and S⁽⁰⁾ have the same support. We show in Appendix B that our proposed PTE only needs the following two relaxed conditions to guarantee it is between 0 and 1:

(C1) P (U^{(1)} > u, T^{(1)} > t_{0}) > P (U^{(0)} > u, T^{(0)} > t_{0}) for all u;

(C2) P (T^{(1)} > t | U^{(1)} = u, T^{(1)} > t_{0}) > P (T^{(0)} > t | U^{(0)} = u, T^{(0)} > t_{0}) for all u,

where U^(a) ≈ g_2,opt(S^(a)) ≈ m(S^(a)|t₀). Condition (C1) implies that the distribution of U in group 1 is stochastically higher than the distribution of U in group 0 given $Y_{t_{0}} = 1$ ; condition (C2) implies that the conditional mean of Y_t given U and $Y_{t_{0}} = 1$ in group 1 is larger than the conditional mean of Y_t given U and $Y_{t_{0}} = 1$ in group 0. These two conditions are more likely to be satisfied with the transformed surrogate U than the original S as required by Wang and Taylor (2002)¹⁴ and Parast et al. (2017).¹⁸ In addition, there is no requirement on the shared support of the surrogate marker between the two treatment arms. When examining a surrogate marker that has some prior evidence supporting its potential utility as a surrogate, it is likely reasonable to expect these conditions to hold. If g_2,opt(s) is monotone increasing in s, we may replace U^(a) by S^(a) in these two aforementioned conditions. Lastly, these two conditions can be examined empirically in practice. We compare our proposed approach to that of Parast et al. (2017)¹⁸ in our numerical studies.

2.2 |. Non-parametric estimation of g_opt and ${PTE}_{g_{opt}}$

In practice, g_1,opt and g_2,opt(s) are unknown and need to be estimated. To this end, we propose to use kernel smoothing to non-parametrically estimate components of g_1,opt and g_2,opt(s) with inverse probability weights (IPW) to account for censoring, noting our earlier assumption that $C_{i}^{(a)}$ is independent of ( $T_{i}^{(a)}$ , $S_{i}^{(a)}$ ) and $P (C_{i}^{(a)} > t) > 0$ for a = 0, 1. Specifically, we estimate f_a(s|t₀), f(s|t₀), μ_a(t₀), μ(t₀), m(s|t₀), $c = E {Y_{t} - m (S | t_{0}) | A = 0, T > t_{0}} μ_{0} (t_{0})$ respectively as

{\hat{f}}_{a} (s | t_{0}) = \frac{\sum_{A_{i} = a} K_{h} (S_{i} - s) I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}}{\sum_{A_{i} = a} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}},

\hat{f} (s | t_{0}) = \frac{\sum_{i = 1}^{n} K_{h} (S_{i} - s) I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}}{\sum_{i = 1}^{n} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}},

{\hat{μ}}_{a} (t_{0}) = \frac{\sum_{A_{i} = a} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}}{\sum_{A_{i} = a} {\hat{ω}}_{t_{0}, i}},

\hat{μ} (t_{0}) = \frac{\sum_{i = 1}^{n} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}}{\sum_{i = 1}^{n} {\hat{ω}}_{t_{0}, i}},

\hat{m} (s | t_{0}) = \frac{\sum_{i = 1}^{n} K_{h} (S_{i} - s) I (X_{i} > t) {\hat{ω}}_{t, i} / \sum_{i = 1}^{n} {\hat{ω}}_{t, i}}{\sum_{i = 1}^{n} K_{h} (S_{i} - s) I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i} / \sum_{i = 1}^{n} {\hat{ω}}_{t_{0}, i}},

\hat{c} = \frac{\sum_{A_{i} = 0} I (X_{i} > t) {\hat{ω}}_{t, i}}{\sum_{A_{i} = 0} {\hat{ω}}_{t, i}} - \frac{\sum_{A_{i} = 0} \hat{m} (S_{i} | t_{0}) I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}}{\sum_{A_{i} = 0} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}},

and \hat{λ} = \hat{c} {\int \frac{{\hat{f}}_{0} {(s | t_{0})}^{2} {\hat{μ}}_{0} {(t_{0})}^{2}}{\hat{f} (s | t_{0}) \hat{μ} (t_{0})} d s + \frac{{1 - {\hat{μ}}_{0} (t_{0})}^{2}}{1 - \hat{μ} (t_{0})}}^{- 1},

where ${\hat{ω}}_{t, i} = {I (X_{i} \leq t) δ_{i} + I (X_{i} > t)} / {\hat{G}}_{A_{i}} (X_{i} \land t)$ is the weight accounting for censoring, ${\hat{G}}_{a} (t)$ is the Kaplan-Meier estimator of G_a(t) = P(C > t | A = a) and K_h(·) = K(·/h)/h is a symmetric kernel function with bandwidth h. As is often the case with nonparametric functional estimation procedures, the choice of the bandwidth h is critical, especially in smaller sample sizes, and care is needed in its selection. To eliminate the impact of the bias of the estimated conditional functions on the resulting estimator, we require the standard undersmoothing assumption of h = O(n^−v), v ∈ (1/5, 1/2). We choose the bandwidth $h = h_{o p t} n^{- c_{0}}$ to ensure the needed undersmoothing, where h_opt is obtained using the procedure of Scott (1992)¹⁹, and in all numerical examples we choose c₀ = 0.11. Alternative choices for c₀ may be used as a sensitivity analysis, as well as alternative approaches to select the bandwidth h_opt, such as cross-validation, as long as the needed undersmoothing rate is achieved.
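The censoring weights and the undersmoothed bandwidth can be sketched in a few lines of Python. This is a minimal illustration, not the OSsurvival implementation: the function names are hypothetical, the Kaplan-Meier step function is taken to be right-continuous and ties between event and censoring times are ignored, and h_opt is assumed to be the normal-reference rule 1.06 σ̂ n^(−1/5), which is one common reading of the procedure of Scott (1992).

```python
import numpy as np

def km_censoring_survival(x, delta):
    """Kaplan-Meier estimate of G(u) = P(C > u); censoring (delta == 0) is the 'event'."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    times = np.unique(x[delta == 0])                      # distinct censoring times
    at_risk = np.array([(x >= u).sum() for u in times], dtype=float)
    n_cens = np.array([((x == u) & (delta == 0)).sum() for u in times], dtype=float)
    surv = np.concatenate(([1.0], np.cumprod(1.0 - n_cens / at_risk)))

    def G(u):
        # Right-continuous step function: product over censoring times <= u.
        return surv[np.searchsorted(times, np.asarray(u, dtype=float), side="right")]

    return G

def ipw_weights(x, delta, a, t):
    """omega_{t,i} = {I(X_i <= t) delta_i + I(X_i > t)} / G_hat_{A_i}(min(X_i, t))."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    a = np.asarray(a, dtype=int)
    w = np.zeros(len(x))
    for arm in (0, 1):                                    # arm-specific censoring curves
        idx = np.flatnonzero(a == arm)
        G = km_censoring_survival(x[idx], delta[idx])
        keep = ((x[idx] <= t) & (delta[idx] == 1)) | (x[idx] > t)
        sel = idx[keep]                                   # subjects with non-zero weight
        w[sel] = 1.0 / G(np.minimum(x[sel], t))
    return w

def undersmoothed_bandwidth(s, c0=0.11):
    """h = h_opt * n^(-c0), with h_opt ASSUMED to be 1.06 * sd * n^(-1/5)."""
    s = np.asarray(s, dtype=float)
    s = s[~np.isnan(s)]                                   # S is unobserved when X < t0
    n = len(s)
    h_opt = 1.06 * np.std(s, ddof=1) * n ** (-1 / 5)
    return h_opt * n ** (-c0)
```

With c₀ = 0.11 the resulting rate is n^(−0.31), which lies inside the interval (1/5, 1/2) required above.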

Based on these quantities, we may construct a plug-in estimate for (g_1,opt, g_2,opt(s))^⊤, denoted by ${({\hat{g}}_{1}, {\hat{g}}_{2} (s))}^{⊤}$ .

{\hat{g}}_{1} = \hat{λ} \frac{1 - {\hat{μ}}_{0} (t_{0})}{1 - \hat{μ} (t_{0})}, {\hat{g}}_{2} (s) = \hat{m} (s | t_{0}) + \hat{λ} \frac{{\hat{f}}_{0} (s | t_{0}) {\hat{μ}}_{0} (t_{0})}{\hat{f} (s | t_{0}) \hat{μ} (t_{0})} .

(4)
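The plug-in construction in (4) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the function name `fit_g_opt` and its arguments are hypothetical, a Gaussian kernel is used, the integral in λ̂ is approximated by the trapezoid rule on a user-supplied grid, the censoring weights are passed in precomputed, and ĉ is computed directly from the definition c = E[Y_{t₀}{Y_t − m(S | t₀)} | A = 0].

```python
import numpy as np

def gauss_kernel(u, h):
    """K_h(u) = K(u / h) / h with K the standard normal density."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def fit_g_opt(x, a, s, w_t, w_t0, t, t0, h, grid):
    """Return (g1_hat, g2_hat) where g2_hat is a function of s."""
    x, a, w_t, w_t0 = (np.asarray(v, dtype=float) for v in (x, a, w_t, w_t0))
    y0 = (x > t0).astype(float)                     # I(X_i > t0)
    yt = (x > t).astype(float)                      # I(X_i > t)
    s0 = np.where(y0 == 1, np.asarray(s, dtype=float), 0.0)  # S is unused when X_i <= t0
    arm0 = (a == 0).astype(float)
    everyone = np.ones_like(arm0)

    mu = np.sum(y0 * w_t0) / np.sum(w_t0)                       # mu_hat(t0)
    mu0 = np.sum(arm0 * y0 * w_t0) / np.sum(arm0 * w_t0)        # mu0_hat(t0)

    def kmat(pts):
        return gauss_kernel(s0[None, :] - np.asarray(pts, dtype=float)[:, None], h)

    def dens(pts, sel):
        wt = sel * y0 * w_t0                        # weighted kernel density given T > t0
        return kmat(pts) @ wt / np.sum(wt)

    def m_hat(pts):
        k = kmat(pts)
        num = k @ (yt * w_t) / np.sum(w_t)
        den = k @ (y0 * w_t0) / np.sum(w_t0)
        return num / den

    # c_hat from its definition: E[Y_t | A=0] - E[Y_t0 * m(S) | A=0]
    obs = y0 == 1
    mu0_t = np.sum(arm0 * yt * w_t) / np.sum(arm0 * w_t)
    c = mu0_t - np.sum(arm0[obs] * m_hat(s0[obs]) * w_t0[obs]) / np.sum(arm0 * w_t0)

    # Trapezoid approximation of the integral in lambda_hat
    grid = np.asarray(grid, dtype=float)
    f0g, fg = dens(grid, arm0), dens(grid, everyone)
    ratio = np.divide(f0g ** 2, fg, out=np.zeros_like(fg), where=fg > 0)
    integrand = ratio * mu0 ** 2 / mu
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid))
    lam = c / (integral + (1.0 - mu0) ** 2 / (1.0 - mu))

    g1 = lam * (1.0 - mu0) / (1.0 - mu)

    def g2(pts):
        return m_hat(pts) + lam * dens(pts, arm0) * mu0 / (dens(pts, everyone) * mu)

    return g1, g2
```

The grid should cover the observed range of S among subjects with X > t₀; evaluating ĝ₂ far outside that range is unreliable because the kernel density estimates approach zero there.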

In Appendix C, we show that $n^{\frac{1}{2}} {{\hat{g}}_{1} - g_{1, opt}}$ and ${(n h)}^{\frac{1}{2}} {{\hat{g}}_{2} (s) - g_{2, opt} (s)}$ converge jointly in distribution to a multivariate normal distribution with mean 0 and variance-covariance Σ²(s).

To estimate ${PTE}_{g_{opt}} (t_{0})$ , we also adopt the IPW strategy and estimate Δ(t) and $Δ_{g_{opt}} (t_{0})$ and then ${PTE}_{g_{opt}} (t_{0})$ respectively, by

\hat{Δ} (t) = {\hat{μ}}_{1} (t) - {\hat{μ}}_{0} (t), {\hat{Δ}}_{g} (t_{0}) = {\hat{μ}}_{g, 1} (t_{0}) - {\hat{μ}}_{g, 0} (t_{0}), and {\hat{PTE}}_{\hat{g}} (t_{0}) = {\hat{Δ}}_{\hat{g}} (t_{0}) / \hat{Δ} (t),

where {\hat{μ}}_{a} (t) = \frac{\sum_{i = 1}^{n} {\hat{ω}}_{t, i} I (A_{i} = a) Y_{t, i}}{\sum_{i = 1}^{n} {\hat{ω}}_{t, i} I (A_{i} = a)}, {\hat{μ}}_{g, a} (t_{0}) = \frac{\sum_{i = 1}^{n} {\hat{ω}}_{t_{0}, i} I (A_{i} = a) g (Q_{t_{0}, i})}{\sum_{i = 1}^{n} {\hat{ω}}_{t_{0}, i} I (A_{i} = a)},

and Y_t,i = I(T_i > t). Both Y_t,i and $Q_{t_{0}, i} = {(Y_{t_{0}, i}, Y_{t_{0}, i} S_{i})}^{⊤}$ are observed when the weights are non-zero. In Appendix D, we show that $\sqrt{n} {\hat{Δ} (t) - Δ (t)}$ and $\sqrt{n} {{\hat{Δ}}_{g} (t_{0}) - Δ_{g} (t_{0})}$ , for a given g, converge respectively in distribution to N{0, σ²(t)} and N{0, σ_g(t₀)²}. Inference for ${\hat{PTE}}_{\hat{g}} (t_{0})$ can be derived based on $\hat{Δ} (t)$ , ${\hat{Δ}}_{g} (t_{0})$ and ${({\hat{g}}_{1}, {\hat{g}}_{2} (s))}^{⊤}$ along with their asymptotic distributions.
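For a given prediction function g, the three estimators above are weighted means and can be sketched as below. The function name `pte_hat` and its signature are hypothetical; the censoring weights are assumed to be supplied, and both treatment-arm means are normalized by the sum of weights within the corresponding arm.

```python
import numpy as np

def pte_hat(x, a, s, w_t, w_t0, t, t0, g1, g2):
    """Return (Delta_hat(t), Delta_g_hat(t0), PTE_hat) for g = (g1, g2)."""
    x, a, s, w_t, w_t0 = (np.asarray(v, dtype=float) for v in (x, a, s, w_t, w_t0))
    y0 = x > t0                              # surrogate information available at t0
    yt = (x > t).astype(float)               # Y_{t,i}, observed whenever w_t > 0

    # g(Q_{t0,i}) = g1 if the event or censoring occurred by t0, else g2(S_i)
    gq = np.full(len(x), float(g1))
    gq[y0] = g2(s[y0])

    def wmean(v, w, sel):
        return np.sum(w * sel * v) / np.sum(w * sel)

    delta = wmean(yt, w_t, a == 1) - wmean(yt, w_t, a == 0)
    delta_g = wmean(gq, w_t0, a == 1) - wmean(gq, w_t0, a == 0)
    return delta, delta_g, delta_g / delta

# Toy data without censoring (all weights equal to 1), constructed so that
# g(Q) reproduces Y_t exactly and the PTE is therefore 1.
x = np.array([0.2, 2.0, 2.0, 0.2, 0.6, 2.0])
a = np.array([1, 1, 1, 0, 0, 0])
s = np.array([np.nan, 1.0, 1.0, np.nan, 0.0, 1.0])
ones = np.ones(6)
delta, delta_g, pte = pte_hat(x, a, s, ones, ones, t=1.0, t0=0.5,
                              g1=0.0, g2=lambda v: v)
```

Subjects with a zero weight at t₀ contribute nothing to the sums, so the value assigned to g(Q) for them is immaterial.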

Using the same dataset to estimate both g_opt and its corresponding PTE may lead to overfitting bias as in standard prediction settings. We thus employ a cross-validation (CV) strategy wherein we estimate g_opt in a training set and the associated PTE in an independent test set. Specifically, let $I_{k}$ , k = 1, …, K, be a random partition of the index set {1, …, n} into K subsets of equal size, let $I_{- k} = {1,\dots, n} \ I_{k}$ , and let $D_{I} = {D_{i}, i \in I}$ . Let ${\hat{g}}_{I}$ denote g_opt estimated based on $D_{I}$ . Given ${\hat{g}}_{I_{k}}$ , ${PTE}_{g_{opt}} (t_{0})$ is estimated using data in $D_{I_{- k}}$ , and denoted by ${\hat{PTE}}_{{\hat{g}}_{I_{k}}}^{(- k)} (t_{0})$ . We then define the CV-based estimator of ${PTE}_{g_{opt}} (t_{0})$ as

{\hat{PTE}}_{cv} (t_{0}) = K^{- 1} \sum_{k = 1}^{K} {\hat{PTE}}_{{\hat{g}}_{I_{k}}}^{(- k)} (t_{0}) .
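The sample-splitting logic can be sketched independently of how g_opt and the PTE are estimated. In the sketch below, `cv_pte`, `fit_g` and `eval_pte` are hypothetical names: `fit_g` receives the indices in I_k and returns an estimated prediction function, and `eval_pte` receives that function together with the complementary indices I_{−k} and returns a PTE estimate.

```python
import numpy as np

def cv_pte(n, K, fit_g, eval_pte, seed=0):
    """Average over k of the PTE evaluated on I_{-k} with g estimated on I_k."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)   # random partition I_1, ..., I_K
    estimates = []
    for k in range(K):
        idx_fit = folds[k]                                           # I_k
        idx_eval = np.concatenate([folds[j] for j in range(K) if j != k])  # I_{-k}
        g_hat = fit_g(idx_fit)
        estimates.append(eval_pte(g_hat, idx_eval))
    return float(np.mean(estimates))
```

When n is not divisible by K, `np.array_split` produces folds whose sizes differ by at most one, a small departure from the equal-size partition described above.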

The consistency of ${\hat{g}}_{I_{k}}$ to g_opt and that of ${\hat{PTE}}_{g}^{(- k)} (t_{0})$ to PTE_g(t₀) guarantee the consistency of ${\hat{PTE}}_{cv} (t_{0})$ to ${PTE}_{g_{opt}} (t_{0})$ . The asymptotic distribution of ${\hat{PTE}}_{cv} (t_{0}) - {PTE}_{g_{opt}} (t_{0})$ can be obtained from the asymptotic expansions of ${\hat{g}}_{I_{k}} - g_{opt}$ , ${\hat{PTE}}_{g}^{(- k)} (t_{0}) - {PTE}_{g} (t_{0})$ . Specifically, when h = O(n^−v) with v ∈ (1/4, 1/2),

n^{\frac{1}{2}} {{\hat{PTE}}_{cv} (t_{0}) - {PTE}_{g_{opt}} (t_{0})} = n^{- \frac{1}{2}} \sum_{i = 1}^{n} ψ_{{PTE}_{g_{opt}}, i} (t_{0}) + o_{p} (1),

which converges in distribution to a normal with mean 0 and variance $τ_{{PTE}_{g_{opt}}}^{2} (t_{0}) = E {ψ_{{PTE}_{g_{opt}}, i} {(t_{0})}^{2}}$ . Given the complexity in constructing explicit estimation of $τ_{{PTE}_{g_{opt}}}^{2} (t_{0})$ , we instead employ resampling methods to estimate this quantity; see Appendix E for details.

3 |. SIMULATION STUDIES

We conducted simulation studies to evaluate the finite sample performance of the proposed methods. Throughout, we let n = 1000, t = 1, use a normal density as the kernel K(·), and consider t₀ = 0.3, 0.5 and 0.7. All results are summarized based on 1000 replications for each configuration. For each setting, we investigate estimates of g_opt(·) and ${PTE}_{g_{opt}} (t_{0})$ , at each t₀. For comparison of PTEs, we also include results for the estimation of (a) PTE_L(t₀) from Parast et al. (2017)¹⁸ which accounts for censoring but does not use a transformation of the surrogate marker (see Appendix B) and (b) PTE_cox(t₀) as defined in Lin et al. (1997)¹³ which utilizes Cox proportional hazards models and restricts estimation to patients with X > t₀.

We generated censoring C^(a) ~ exponential(0.5),

S^{(a)} ~ Gamma(shape = α_{1 a}, scale = α_{2 a}), and log T^{(a)} = ϵ^{(a)} - log ξ_{a} (S^{(a)}),

where ϵ^(a) ~ an extreme value distribution and the hyper-parameters α_a = (α_1a, α_2a) and ξ_a(·) are chosen to reflect different levels of surrogacy. In setting 1, we let α₁ = (2, 2), α₀ = (9, 0.5), ξ₁(s) = 0.2s, and ξ₀(s) = 0.2 + 0.22s to represent a strong surrogate marker setting where ${PTE}_{g_{opt}} (t_{0})$ ranges from 0.68 to 0.83. In setting 2, we let α₁ = (9, 0.8), α₀ = (2, 2), ξ₁(s) = 0.08s, and ξ₀(s) = 0.3 + 0.22s to represent a moderate surrogate marker setting where ${PTE}_{g_{opt}} (t_{0})$ ranges from 0.45 to 0.78. In setting 3, we consider the situation that S⁽¹⁾ and S⁽⁰⁾ are correlated, so the working independence assumption (A1) does not hold. We let S⁽¹⁾ ~ Gamma(shape = 2, scale = 2), S⁽⁰⁾ = S⁽¹⁾ + U, where U ~ Uniform(0, 1), ξ₁(s) = 0.2s, and ξ₀(s) = 0.2 + 0.22s.
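A data-generating sketch for setting 1 is given below. Two details are assumptions made here for illustration: ϵ is taken to be the standard minimum extreme value distribution (the log of a unit exponential), so that T | S is exponential with rate ξ_a(S), and "exponential(0.5)" for censoring is read as rate 0.5. The function name `simulate_setting1` is hypothetical.

```python
import numpy as np

def simulate_setting1(n, t0, seed=0):
    """Generate (X, delta, S observed only if X >= t0, A) under setting 1."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, size=n)                       # P(A = 1) = 0.5
    s1 = rng.gamma(shape=2.0, scale=2.0, size=n)         # alpha_1 = (2, 2)
    s0 = rng.gamma(shape=9.0, scale=0.5, size=n)         # alpha_0 = (9, 0.5)
    s = np.where(a == 1, s1, s0)
    xi = np.where(a == 1, 0.2 * s, 0.2 + 0.22 * s)       # xi_1(s), xi_0(s)

    eps = np.log(rng.exponential(1.0, size=n))           # ASSUMED extreme value error
    t_event = np.exp(eps - np.log(xi))                   # log T = eps - log xi(S)
    c = rng.exponential(scale=1.0 / 0.5, size=n)         # ASSUMED rate 0.5 (mean 2)

    x = np.minimum(t_event, c)
    delta = (t_event <= c).astype(int)
    s_obs = np.where(x >= t0, s, np.nan)                 # S unobserved when X < t0
    return {"x": x, "delta": delta, "s": s_obs, "a": a, "t_latent": t_event}

data = simulate_setting1(n=1000, t0=0.5)
```

The latent event time is returned only so that simulated survival probabilities can be compared with their theoretical values; it would not be available in a real trial.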

As shown in Table 1 and Figures 1–2, the estimated prediction functions $\hat{g} (\cdot) = {({\hat{g}}_{1}, {\hat{g}}_{2} (\cdot))}^{⊤}$ across different values of s generally had negligible biases and the estimated standard errors were close to their empirical standard errors.

TABLE 1.

Estimates (Est) of g₁ along with their empirical standard errors (ESE). We also present the average of the estimated standard errors (ASE) along with the empirical coverage probabilities (CP, reported as proportions) of the 95% confidence intervals.

Setting 1
t₀	True g₁	Est	ESE	ASE	CP
0.3	−0.034	−0.037	0.013	0.013	0.954
0.5	−0.022	−0.027	0.012	0.012	0.929
0.7	−0.022	−0.019	0.009	0.009	0.918
Setting 2
t₀	True g₁	Est	ESE	ASE	CP
0.3	−0.077	−0.077	0.013	0.013	0.938
0.5	−0.051	−0.051	0.012	0.011	0.921
0.7	−0.028	−0.030	0.009	0.009	0.934
Setting 3
t₀	True g₁	Est	ESE	ASE	CP
0.3	−0.044	−0.045	0.016	0.015	0.931
0.5	−0.031	−0.033	0.014	0.013	0.927
0.7	−0.018	−0.021	0.010	0.010	0.941

The empirical bias, empirical standard error (ESE) versus average of the estimated standard error (ASE), coverage probabilities (CP) of the 95% confidence intervals for ${\hat{g}}_{2} (s)$ under Setting 1.

Simulation results for ${PTE}_{g_{opt}} (t_{0})$ at t₀ ∈ {0.3, 0.5, 0.7} are shown in Table 2. In setting 1, ${PTE}_{g_{opt}} (t_{0})$ was 0.68, 0.77 and 0.83 for t₀ = 0.3, 0.5 and 0.7, increasing with t₀. In setting 2, the PTE of $g_{opt} (Q_{t_{0}})$ ranged from 0.45 to 0.78. In setting 3, the PTE of $g_{opt} (Q_{t_{0}})$ ranged from 0.52 to 0.80. Our proposed estimators generally had small biases, and estimated standard errors were close to their empirical standard errors, with confidence intervals attaining appropriate empirical coverage. The results from setting 3 also suggest that the proposed method is reasonably robust to violation of the working independence assumption (A1).

TABLE 2.

Estimates (Est) of ${PTE}_{g_{opt}} (t_{0})$ (where PTE denotes the proposed estimate), PTE_cox(t₀) and PTE_L(t₀) at t₀ = 0.3, 0.5 and 0.7 along with their empirical standard errors (ESE, shown as subscripts for PTE_L and PTE_cox). Shown also are the average of the estimated standard errors (ASE) along with the empirical coverage probabilities (CP, reported as proportions) of the 95% confidence intervals for our proposed estimator ${\hat{PTE}}_{cv} (t_{0})$ .

Setting 1
t ₀	PTE_L	PTE_cox	PTE.true	PTE	ESE	ASE	CP
0.3	0.564_0.119	0.121_0.130	0.684	0.685	0.093	0.097	0.955
0.5	0.680_0.114	0.103_0.180	0.767	0.761	0.087	0.094	0.965
0.7	0.781_0.095	0.086_0.229	0.827	0.817	0.080	0.088	0.968
Setting 2
t ₀	PTE_L	PTE_cox	PTE.true	PTE	ESE	ASE	CP
0.3	−0.084_0.281	−0.994_0.259	0.448	0.450	0.099	0.101	0.939
0.5	0.304_0.230	−1.009_0.299	0.626	0.619	0.089	0.097	0.965
0.7	0.607_0.145	−1.017_0.330	0.778	0.762	0.076	0.087	0.977
Setting 3
t ₀	PTE_L	PTE_cox	PTE.true	PTE	ESE	ASE	CP
0.3	0.476_0.121	0.098_0.172	0.517	0.510	0.128	0.122	0.927
0.5	0.618_0.112	0.042_0.208	0.659	0.633	0.120	0.121	0.932
0.7	0.751_0.094	−0.009_0.269	0.800	0.745	0.101	0.115	0.959

Existing PTE estimators, including PTE_L(t₀) and PTE_cox(t₀), are also shown in Table 2. Since the Cox model did not hold for T | A, the model based estimate PTE_cox(t₀) gave a very low PTE estimate in setting 1 and even negative results in setting 2. Our proposed estimates of ${PTE}_{g_{opt}}$ were of similar magnitude to those given by PTE_L in setting 1. However, since PTE_L is not guaranteed to be valid when the supports of S in the two treatment groups are not the same, which was the case in setting 2, estimates of PTE_L were substantially flawed in this setting, particularly when t₀ = 0.3 or 0.5. In Appendix F we also present (1) PTE results without cross-validation for comparison and (2) additional simulation results examining sensitivity to the working independence assumption (A1).

4 |. APPLICATION TO THE DIABETES PREVENTION PROGRAM STUDY

The Diabetes Prevention Program (DPP) RCT investigated the effect of several prevention strategies for reducing the risk of type 2 diabetes (T2D) among high risk individuals with pre-diabetes.^20,21 DPP data are publicly available through the National Institute of Diabetes and Digestive and Kidney Diseases Central Repository.²² The participants were randomized to one of four treatment groups: placebo, lifestyle intervention, metformin and troglitazone. The primary endpoint of the trial was time to T2D onset and the participants were followed up to 5 years with an average follow up of 2.8 years. Previous study results have shown that both lifestyle and metformin significantly reduced T2D risk.^20,21 For illustration, we focused on the comparison of lifestyle intervention (n₁ = 1024) to placebo (n₀ = 1030) with respect to diabetes risk at t = 4 years; the estimated treatment effect was $\hat{Δ} (4) = 0.157$ . Our goal was to investigate to what extent surrogate information on hemoglobin A1c (HbA1c) or fasting plasma glucose collected at t₀ ∈ {0.5, 1, 2, 3} can be used together with diabetes event information by t₀ to predict the treatment effect on diabetes risk at t = 4 years. We evaluate the surrogacy potential of these markers (change from baseline to t₀) based on the proposed PTE measure.

The PTE estimates of each marker as well as g_1,opt estimates for the proposed method are shown in Table 3, and estimates of g_2,opt(s) are in Figure 4. Estimates of g₁ are all close to 0 and estimates of g_2,opt(s) are substantially different from s itself, though at later times ${\hat{g}}_{2} (s)$ is flatter, implying similar contributions of ${\hat{g}}_{2} (s)$ to the treatment effect at different s for these later times. The observation that g_2,opt(s) is different from s itself is important because many of the available methods for evaluating a surrogate marker focus on s itself, not any kind of transformation of s, let alone an optimal transformation. If, in contrast, g_2,opt(s) were essentially the same as s itself, this would imply that a transformation of s is not necessary. The PTE estimates were moderate at t₀ = 0.5 and 1 (roughly 0.3 to 0.6) but increased to about 0.8 at t₀ = 2 and almost 1 at t₀ = 3. These results show that the treatment effect on later year surrogate information explained a larger proportion of the treatment effect on 4-year survival, as would be expected. Comparing the two potential surrogates, at earlier times, glucose appeared to have a slightly higher PTE compared to that of HbA1c, with the difference being more pronounced at t₀ = 1. At later times, the PTEs of the two potential surrogates were comparable and close to 1. Results from PTE_L were qualitatively similar but generally smaller compared to our proposed ${PTE}_{g_{opt}}$ , while the Cox model based estimates were above 1 at t₀ = 3, likely reflecting model mis-specification.

TABLE 3.

PTE estimates for HbA1c and fasting plasma glucose.

HbA1c
	t₀ = 0.5		t₀ = 1		t₀ = 2		t₀ = 3
	Est	SE	Est	SE	Est	SE	Est	SE
g ₁	−0.097	0.016	−0.062	0.012	−0.034	0.009	−0.001	0.004
PTE	0.317	0.064	0.520	0.071	0.816	0.072	1.008	0.034
PTE_cox	0.207	0.057	0.486	0.080	0.798	0.078	1.313	0.070
PTE_L	0.146	0.067	0.468	0.129	0.741	0.189	1.017	0.282
Fasting plasma glucose
	t₀ = 0.5		t₀ = 1		t₀ = 2		t₀ = 3
	Est	SE	Est	SE	Est	SE	Est	SE
g ₁	−0.086	0.016	−0.044	0.012	−0.021	0.008	0.001	0.004
PTE	0.349	0.065	0.641	0.076	0.826	0.068	0.966	0.043
PTE_cox	0.355	0.061	0.636	0.082	0.890	0.075	1.318	0.066
PTE_L	0.235	0.090	0.566	0.150	0.753	0.196	0.995	0.286


FIGURE 4.

The estimator ${\hat{g}}_{2} (s)$ and confidence intervals for the two surrogates; "up" indicates the upper bound of the confidence interval and "low" indicates the lower bound; the top panel is HbA1c and the bottom panel is fasting plasma glucose; results for t₀ = 0.5, 1, 2, 3 are shown from left to right.

Because we observe higher PTE estimates as t₀ increases, one may wish to examine the PTE of the primary outcome information alone up to t₀, in this case, diabetes incidence up to t₀. This quantity can provide valuable information regarding the incremental value of the surrogate marker with respect to the PTE.¹⁸ In this example, the PTE of diabetes incidence alone up to t₀ is 0.119 when t₀ = 0.5, 0.405 when t₀ = 1, 0.716 when t₀ = 2, and 0.986 when t₀ = 3. Comparing these to the estimates of the PTE for the surrogate information using our proposed approach demonstrates that, for example, at t₀ = 2, while diabetes incidence at 2 years explains 71.6% of the treatment effect, fasting plasma glucose at 2 years explains an additional 11%.
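The incremental-value comparison above is simple arithmetic on the reported estimates. The following minimal sketch repeats it for both markers and all four landmark times, using only the point estimates from Table 3 and from the preceding paragraph; the variable and function names are illustrative and are not part of any package.

```python
# Point estimates copied from Table 3 (proposed PTE) and from the text
# (PTE of diabetes incidence alone up to t0). Names are illustrative.
landmarks = [0.5, 1, 2, 3]
pte_primary_only = {0.5: 0.119, 1: 0.405, 2: 0.716, 3: 0.986}
pte_with_marker = {
    "HbA1c": {0.5: 0.317, 1: 0.520, 2: 0.816, 3: 1.008},
    "Fasting plasma glucose": {0.5: 0.349, 1: 0.641, 2: 0.826, 3: 0.966},
}


def incremental_pte(marker):
    """Difference between the PTE with the marker and the PTE of the
    primary outcome information alone, at each landmark time t0."""
    return {
        t0: round(pte_with_marker[marker][t0] - pte_primary_only[t0], 3)
        for t0 in landmarks
    }


for marker in pte_with_marker:
    print(marker, incremental_pte(marker))
```

These differences are computed from point estimates only and do not carry standard errors.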

5 |. DISCUSSION

We propose a proportion of treatment effect explained measure to quantify the surrogacy of a marker measured earlier in time in a time-to-event outcome setting. Building on our previously developed optimal transformation function in Wang et al. (2020),¹⁷ we propose a prediction function that optimally combines information on the primary endpoint up to t₀ and the surrogate marker information at t₀ for those who have not yet experienced the primary outcome. Our proposed model-free method has distinct advantages over existing methods. First, while the framework is generally similar to Wang et al. (2020),¹⁷ our extension to the time-to-event outcome setting is not trivial and is necessary to allow for use of this framework in practice, where primary outcomes are very often censored. Second, this optimal transformation framework allows us to relax strict assumptions that are required by, for example, Parast et al. (2017),¹⁸ and we explicitly demonstrate in our simulation study the superior performance of our proposed method when such assumptions are violated. An R package implementing the methods proposed in this article, named OSsurvival, is available at https://celehs.github.io/OSsurvival/.

As in the non-censored case of Wang et al. (2020)¹⁷, the working assumption (A1) is only used to facilitate the derivation of g_opt(·). Even when (A1) fails, g_opt(·) remains a sensible prediction function that enables us to effectively use both $Y_{t_{0}}$ and observed S to predict the outcome Y_t i.e., to recover information about the difference in Y_t between two independent patients assigned to treatment vs. control. Furthermore, the interpretation of the proposed PTE measure as well as the proposed inference procedures remain valid regardless of the adequacy of (A1). These properties were illustrated in the simulation results examining the sensitivity to violations of this assumption.

Similar to Wang et al. (2020)¹⁷, we assumed that p₁ = P(A = 1) = 0.5 both in the population loss function and in the observed data. In practice, even if the observed trial has a randomization ratio different from 1 : 1, the proposed loss function is still an appropriate choice as it reflects a future population with p₁ = 0.5. In that case, the population g_opt(·) remains the same and our proposed estimators can be easily modified to include inverse probability of treatment assignment weights to yield consistent estimators of g_opt(·) and PTE. More generally, if there is a pre-conceived p₁ ≠ 0.5, one may modify both the objective function and the estimators with weights to allow for different treatment assignment probabilities.

In practice, randomized clinical trials often have baseline covariate information available; given randomization, such information can be used to gain efficiency and power.^23,24,25 One approach to incorporate baseline covariates into our proposed method would be through the use of augmentation.¹⁸ Specifically, one could construct augmented versions of ${\hat{Δ}}_{\hat{g}} (t_{0})$ and $\hat{Δ} (t)$ as:

\begin{pmatrix} \hat{Δ} {(t)}^{A} \\ {\hat{Δ}}_{\hat{g}} {(t_{0})}^{A} \end{pmatrix} = \begin{pmatrix} \hat{Δ} (t) \\ {\hat{Δ}}_{\hat{g}} (t_{0}) \end{pmatrix} + A \left\{ n_{1}^{- 1} \sum_{A_{i} = 1} h (Z_{i}) - n_{0}^{- 1} \sum_{A_{i} = 0} h (Z_{i}) \right\}

(5)

where Z_i are i.i.d. random vectors of baseline covariates and h(·) is a pre-specified basis transformation. Given treatment randomization, the augmented estimators converge to the same limit as the non-augmented estimators since $n_{1}^{- 1} \sum_{A_{i} = 1} h (Z_{i}) - n_{0}^{- 1} \sum_{A_{i} = 0} h (Z_{i})$ converges to zero in probability as the sample size goes to infinity. One may then select $A$ such that the variance of ${(\hat{Δ} {(t)}^{A}, {\hat{Δ}}_{\hat{g}} {(t_{0})}^{A})}^{'}$ is minimized, and obtain a final augmented version of PTE as ${\hat{PTE}}_{\hat{g}} {(t_{0})}^{A} = {\hat{Δ}}_{\hat{g}} {(t_{0})}^{A} / \hat{Δ} {(t)}^{A}$ . We would expect increased efficiency with the augmented estimate if the covariates Z_i are associated with the primary outcome and the surrogate marker. When the data come from an observational study or another real-world data (RWD) source rather than a randomized trial, we recently proposed estimating the optimal transformation function of the surrogate using inverse probability weighted (IPW) and doubly robust (DR) methods that incorporate the baseline covariates; see Han et al. (2021).²⁶
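To illustrate the augmentation idea in its simplest form, the sketch below applies it to a plain difference in means with a fully observed outcome and a scalar h(Z). This is a deliberate simplification and an assumption of the example: the estimators in this paper involve censoring weights and a two-component vector, and the variance-minimizing coefficient would then be obtained from their estimated joint variance. The function names are hypothetical.

```python
from statistics import mean


def _cov(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def augmented_difference(y1, z1, y0, z0):
    """Augmented difference in means for fully observed outcomes.

    y1, z1: outcomes and h(Z) values in the treated arm
    y0, z0: outcomes and h(Z) values in the control arm
    Returns (plain estimate, augmented estimate, chosen coefficient).
    """
    n1, n0 = len(y1), len(y0)
    delta_hat = mean(y1) - mean(y0)
    # Covariate imbalance; converges to zero under randomization.
    imbalance = mean(z1) - mean(z0)
    # Variance of the imbalance and its covariance with delta_hat,
    # using independence of the two arms.
    var_d = _cov(z1, z1) / n1 + _cov(z0, z0) / n0
    cov_yd = _cov(y1, z1) / n1 + _cov(y0, z0) / n0
    # Coefficient minimizing Var(delta_hat + coef * imbalance).
    coef = -cov_yd / var_d
    return delta_hat, delta_hat + coef * imbalance, coef
```

In the extreme case where the outcome equals the covariate in both arms, the whole observed difference is covariate imbalance, the coefficient is −1, and the augmented estimate is zero.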

Importantly, the setup considered in this paper is one in which all individuals are randomized at the same timepoint such that the baseline time is the same for all individuals, or equivalently, the surrogate information is collected after the same follow-up period after enrollment for all individuals. An extension of this method, and surrogate marker evaluation methods more generally, to settings with staggered entry would be a valuable future contribution to this field. In a staggered entry study, the time t₀ after randomization differs for each individual and thus if one wishes to evaluate the surrogate at a particular timepoint after study initiation, some individuals may have more information observed after t₀ that could be utilized. Additionally, an extension of this proposed method that would be useful in practice would be to allow for the evaluation of multiple surrogate markers. A fully non-parametric approach would likely not be feasible given our use of kernel smoothing, but the incorporation of machine learning methods, particularly if the surrogate information is high-dimensional, could allow for great flexibility within this same framework. In addition, while we were able to relax assumptions required by others, further work is needed more generally in the development of statistical methods to assess and use surrogate markers, particularly regarding sensitivity and robustness to assumptions, to ensure avoidance of a surrogate paradox situation.^27,28

APPENDIX

A. DERIVATIONS OF THE OPTIMAL TRANSFORMATION FUNCTION g_opt

In this section, we derive the specific form for the optimal transformation function of the surrogate information, g_opt(·). We aim to solve the following problem for g₁ and g₂(·):

min_{g_{1}, g_{2}} \frac{1}{2} E [I (T \leq t_{0}) g_{1}^{2} + I (T > t_{0}) {Y_{t} - g_{2} (Q_{t_{0}})}^{2}] under the constraint E [(1 - Y_{t_{0}}) (- g_{1}) + Y_{t_{0}} {Y_{t} - g_{2} (Q_{t_{0}})} | A = 0] = 0.

Let ${\tilde{g}}_{2} (s) = g_{2} (s) - m (s | t_{0})$ , where $m (s | t_{0}) = E [Y_{t} | S = s, T > t_{0}]$ . Then,

E [Y_{t_{0}} \{Y_{t} - g_{2} (S)\}^{2}] = E [I (T > t_{0}) \{Y_{t} - m (S | t_{0}) - {\tilde{g}}_{2} (S)\}^{2}] = E [I (T > t_{0}) \{Y_{t} - m (S | t_{0})\}^{2}] - 2 E [I (T > t_{0}) \{Y_{t} - m (S | t_{0})\} {\tilde{g}}_{2} (S)] + E [I (T > t_{0}) {\tilde{g}}_{2}^{2} (S)] = E [I (T > t_{0}) \{Y_{t} - m (S | t_{0})\}^{2}] + E [I (T > t_{0}) {\tilde{g}}_{2}^{2} (S)],

where the cross term vanishes because $E [Y_{t} - m (S | t_{0}) | S, T > t_{0}] = 0$ by the definition of $m (s | t_{0})$ .

Hence the problem is equivalent to

min_{g_{1}, {\tilde{g}}_{2}} \frac{1}{2} E [I (T \leq t_{0}) g_{1}^{2} + I (T > t_{0}) {\tilde{g}}_{2}^{2} (S)] given E [I (T \leq t_{0}) g_{1} + I (T > t_{0}) {\tilde{g}}_{2} (S) | A = 0] = c,

where

c ≔ E [Y_{t_{0}} {Y_{t} - m (S | t_{0})} | A = 0] = μ_{0} (t_{0}) E [Y_{t}^{(0)} - m (S^{(0)} | t_{0}) | T^{(0)} > t_{0}] .

(A1)

This problem is equivalent to

min_{g_{1}, {\tilde{g}}_{2}} P (T \leq t_{0}) g_{1}^{2} + F_{C} ({\tilde{g}}_{2}), given that G_{C} ({\tilde{g}}_{2}) + P (T^{(0)} \leq t_{0}) g_{1} = c,

where $F_{C} ({\tilde{g}}_{2}) = \int {\tilde{g}}_{2}^{2} (s) f (s, t_{0}) d s$ , $G_{C} ({\tilde{g}}_{2}) = \int {\tilde{g}}_{2} (s) f_{0} (s, t_{0}) d s$ , f(s, t₀) = f(s|t₀)μ(t₀) and f₀(s, t₀) = f₀(s|t₀)μ₀(t₀). Using Lagrange multipliers and taking the derivative with respect to g₁, we have

\frac{d}{d g_{1}} [P (T \leq t_{0}) g_{1}^{2} + F_{C} ({\tilde{g}}_{2}) - 2 λ {G_{C} ({\tilde{g}}_{2}) + P (T^{(0)} \leq t_{0}) g_{1}}] = 2 g_{1} P (T \leq t_{0}) - 2 λ P (T^{(0)} \leq t_{0}) = 0,

so that

g_{1, opt} = λ \frac{P (T^{(0)} \leq t_{0})}{P (T \leq t_{0})} .

Taking the Fréchet derivatives of the functionals we have that for all measurable h such that ∫ h²(s)f(s, t₀)ds < ∞,

\frac{d}{d {\tilde{g}}_{2}} [P (T \leq t_{0}) g_{1}^{2} + F_{C} ({\tilde{g}}_{2}) - 2 λ {G_{C} ({\tilde{g}}_{2}) + P (T^{(0)} \leq t_{0}) g_{1}}] = 2 \int {\tilde{g}}_{2} (s) h (s) f (s, t_{0}) d s - 2 λ \int h (s) f_{0} (s, t_{0}) d s = 0.

This then implies (setting h = δ(s))

{\tilde{g}}_{2} (s) = \frac{λ f_{0} (s, t_{0})}{f (s, t_{0})}, for all s .

Hence, by the constraint and (A1), we have

λ = \frac{c}{\int \frac{f_{0}^{2} (s, t_{0})}{f (s, t_{0})} d s + \frac{P^{2} (T^{(0)} \leq t_{0})}{P (T \leq t_{0})}} = \frac{μ_{0} (t_{0}) E [Y_{t}^{(0)} - m (S^{(0)} | t_{0}) | T^{(0)} > t_{0}]}{\int \frac{f_{0}^{2} (s, t_{0})}{f (s, t_{0})} d s + \frac{P^{2} (T^{(0)} \leq t_{0})}{P (T \leq t_{0})}} = \frac{E {Y_{t} - m (S | t_{0}) | A = 0, T > t_{0}} μ_{0} (t_{0})}{\int \frac{f_{0} {(s | t_{0})}^{2} μ_{0} {(t_{0})}^{2}}{f (s | t_{0}) μ (t_{0})} d s + \frac{{1 - μ_{0} (t_{0})}^{2}}{1 - μ (t_{0})}},

(A2)

which also gives us the following solution:

g_{1, opt} = \frac{P (T^{(0)} \leq t_{0})}{P (T \leq t_{0})} \frac{μ_{0} (t_{0}) E [Y_{t}^{(0)} - m (S^{(0)} | t_{0}) | T^{(0)} > t_{0}]}{\int \frac{f_{0}^{2} (s, t_{0})}{f (s, t_{0})} d s + \frac{P^{2} (T^{(0)} \leq t_{0})}{P (T \leq t_{0})}}, and g_{2, opt} (s) = m (s | t_{0}) + {\tilde{g}}_{2} (s) = m (s | t_{0}) + \frac{f_{0} (s, t_{0})}{f (s, t_{0})} \frac{μ_{0} (t_{0}) E [Y_{t}^{(0)} - m (S^{(0)} | t_{0}) | T^{(0)} > t_{0}]}{\int \frac{f_{0}^{2} (s, t_{0})}{f (s, t_{0})} d s + \frac{P^{2} (T^{(0)} \leq t_{0})}{P (T \leq t_{0})}} .

Furthermore, the numerator of λ equals

E {Y_{t} - m (S | t_{0}) | A = 0, T > t_{0}} μ_{0} (t_{0}) = μ_{0} (t_{0}) [\int E {Y_{t} | A = 0, T > t_{0}, S = s} f_{0} (s | t_{0}) d s - \int m (s | t_{0}) f_{0} (s | t_{0}) d s] = μ_{0} (t_{0}) [\int m_{0} (s | t_{0}) f_{0} (s | t_{0}) d s - \int m (s | t_{0}) f_{0} (s | t_{0}) d s] = μ_{0} (t_{0}) [\int {m_{0} (s | t_{0}) - m_{1} (s | t_{0})} \frac{f_{1} (s | t_{0}) μ_{1} (t_{0})}{f (s | t_{0}) {μ_{1} (t_{0}) + μ_{0} (t_{0})}} f_{0} (s | t_{0}) d s] = \frac{μ_{1} (t_{0}) μ_{0} (t_{0})}{μ_{1} (t_{0}) + μ_{0} (t_{0})} \int \frac{{m_{0} (s | t_{0}) - m_{1} (s | t_{0})} f_{1} (s | t_{0}) f_{0} (s | t_{0})}{f (s | t_{0})} d s,

because

m (s | t_{0}) = m_{1} (s | t_{0}) P (A = 1 | S = s, T > t_{0}) + m_{0} (s | t_{0}) P (A = 0 | S = s, T > t_{0}) = m_{1} (s | t_{0}) \frac{1 / 2 f_{1} (s | t_{0}) μ_{1} (t_{0})}{1 / 2 f_{1} (s | t_{0}) μ_{1} (t_{0}) + 1 / 2 f_{0} (s | t_{0}) μ_{0} (t_{0})} + m_{0} (s | t_{0}) \frac{1 / 2 f_{0} (s | t_{0}) μ_{0} (t_{0})}{1 / 2 f_{1} (s | t_{0}) μ_{1} (t_{0}) + 1 / 2 f_{0} (s | t_{0}) μ_{0} (t_{0})} = m_{1} (s | t_{0}) \frac{1 / 2 f_{1} (s | t_{0}) μ_{1} (t_{0})}{f (s | t_{0}) μ (t_{0})} + m_{0} (s | t_{0}) \frac{1 / 2 f_{0} (s | t_{0}) μ_{0} (t_{0})}{f (s | t_{0}) μ (t_{0})} = m_{1} (s | t_{0}) \frac{f_{1} (s | t_{0}) μ_{1} (t_{0})}{f (s | t_{0}) {μ_{1} (t_{0}) + μ_{0} (t_{0})}} + m_{0} (s | t_{0}) \frac{f_{0} (s | t_{0}) μ_{0} (t_{0})}{f (s | t_{0}) {μ_{1} (t_{0}) + μ_{0} (t_{0})}} .

Therefore, λ can be rewritten as

λ = \frac{μ_{1} (t_{0}) μ_{0} (t_{0})}{μ_{1} (t_{0}) + μ_{0} (t_{0})} \int \frac{{m_{0} (s | t_{0}) - m_{1} (s | t_{0})} f_{1} (s | t_{0}) f_{0} (s | t_{0})}{f (s | t_{0})} d s {\int \frac{f_{0} {(s | t_{0})}^{2} μ_{0} {(t_{0})}^{2}}{f (s | t_{0}) μ (t_{0})} d s + \frac{{1 - μ_{0} (t_{0})}^{2}}{1 - μ (t_{0})}}^{- 1} .

(A3)
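As a numerical sanity check of the closed-form solution, the following minimal sketch evaluates (A1), (A2), and the resulting g_1,opt and g_2,opt(s) for a surrogate taking three values, so that integrals become sums. All input numbers are hypothetical and chosen only for illustration; the check confirms that the solution satisfies the constraint and that the numerator of λ agrees with the form used in (A3).

```python
# Hypothetical inputs: S takes three values among subjects with T > t0.
f1 = [0.2, 0.3, 0.5]   # P(S = s | T > t0, A = 1)
f0 = [0.5, 0.3, 0.2]   # P(S = s | T > t0, A = 0)
m1 = [0.9, 0.8, 0.7]   # P(T > t | S = s, T > t0, A = 1)
m0 = [0.8, 0.6, 0.5]   # P(T > t | S = s, T > t0, A = 0)
mu1, mu0 = 0.9, 0.8    # P(T > t0 | A = a)

mu = 0.5 * (mu1 + mu0)  # P(T > t0) when p1 = 0.5
f_joint = [0.5 * (a * mu1 + b * mu0) for a, b in zip(f1, f0)]  # f(s, t0)
f0_joint = [b * mu0 for b in f0]                               # f0(s, t0)
r1 = [0.5 * a * mu1 / fj for a, fj in zip(f1, f_joint)]  # P(A=1 | S=s, T>t0)
m = [a * r + b * (1 - r) for a, b, r in zip(m1, m0, r1)]  # m(s | t0)

p0, p = 1 - mu0, 1 - mu  # P(T^(0) <= t0) and P(T <= t0)

# Equation (A1): c = mu0 * E[Y_t^(0) - m(S^(0) | t0) | T^(0) > t0].
c = mu0 * sum((b - mm) * w for b, mm, w in zip(m0, m, f0))

# Equation (A2): the Lagrange multiplier.
denom = sum(a * a / b for a, b in zip(f0_joint, f_joint)) + p0 ** 2 / p
lam = c / denom

g1_opt = lam * p0 / p
g2_tilde = [lam * a / b for a, b in zip(f0_joint, f_joint)]
g2_opt = [mm + g for mm, g in zip(m, g2_tilde)]

# Constraint: P(T^(0) <= t0) * g1 + sum_s g2_tilde(s) f0(s, t0) equals c.
constraint_lhs = p0 * g1_opt + sum(g * a for g, a in zip(g2_tilde, f0_joint))

# Numerator of lambda in the form appearing in (A3).
f_cond = [fj / mu for fj in f_joint]  # f(s | t0)
c_alt = mu1 * mu0 / (mu1 + mu0) * sum(
    (b0 - b1) * a1 * a0 / fc
    for b0, b1, a1, a0, fc in zip(m0, m1, f1, f0, f_cond)
)

print(g1_opt, g2_opt, constraint_lhs - c, c_alt - c)
```

Both printed differences should be zero up to floating-point error for any valid choice of inputs.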

B. APPROXIMATE RELATIONSHIP BETWEEN PTE AND PTE_L

Letting g₁ ≡ 0, we will show that there is a correspondence between the proposed PTE and the PTE definition of Parast et al.¹⁸. Since g₁ in Section 2 is very close to 0, the correspondence will approximately hold for the PTE in Section 2. Recall that Y_t = I(T > t) = I(T > t₀)I(T > t). Assuming S > 0 almost surely, with a slight abuse of notation, $Q_{t_{0}} = I (T \leq t_{0}) 0 + I (T > t_{0}) S = I (T > t_{0}) S$ and $g (Q_{t_{0}}) = I (T > t_{0}) g (S)$ .

We have the following problem:

min_{g \in F} \frac{1}{2} E [{Y_{t} - g (Q_{t_{0}})}^{2}] under the constraint E [Y_{t} - g (Q_{t_{0}}) | A = 0] = 0,

where $F$ is the class of measurable functions. The problem is equivalent to the following problem for g(·),

min_{g \in F} \frac{1}{2} E [I (T > t_{0}) {Y_{t} - g (S)}^{2}] under the constraint E [Y_{t_{0}} {Y_{t} - g (S)} | A = 0] = 0.

Let $\tilde{g} (s) = g (s) - m (s | t_{0})$ , where $m (s | t_{0}) = E [Y_{t} | S = s, T > t_{0}]$ . Then,

E [Y_{t_{0}} {Y_{t} - g (S)}^{2}] = E [I (T > t_{0}) {Y - m (S | t_{0}) - \tilde{g} (S)}^{2}] = E [I (T > t_{0}) {Y - m (S | t_{0})}^{2}] - 2 E [I (T > t_{0}) {Y - m (S | t_{0})} \tilde{g} (S)] + E [I (T > t_{0}) {\tilde{g}}^{2} (S)] .

Hence the problem is reduced to

min_{\tilde{g}} \frac{1}{2} E [I (T > t_{0}) {\tilde{g}}^{2} (S)] given E [I (T > t_{0}) \tilde{g} (S) | A = 0] = c

where $c ≔ E [Y_{t_{0}} {Y_{t} - m (S | t_{0})} | A = 0]$ here. This problem is equivalent to

min_{\tilde{g}} F_{C} (\tilde{g}), given that G_{C} (\tilde{g}) = c,

where $F_{C} (\tilde{g}) = \int {\tilde{g}}^{2} (s) f (s, t_{0}) d s$ , and $G_{C} (\tilde{g}) = \int \tilde{g} (s) f_{0} (s, t_{0}) d s$ . Taking the Fréchet derivatives of the functionals we have that for all measurable h such that ∫ h²(s)f(s, t₀)ds < ∞,

\frac{d}{d \tilde{g}} [F_{C} (\tilde{g}) - 2 λ G_{C} (\tilde{g})] = 2 \int \tilde{g} (s) h (s) f (s, t_{0}) d s - 2 λ \int h (s) f_{0} (s, t_{0}) d s = 0.

This of course implies (setting h = δ(s)) similarly to before

\tilde{g} (s) = \frac{λ f_{0} (s, t_{0})}{f (s, t_{0})}, for all s .

Hence by the constraint we have

λ = \frac{c}{\int \frac{f_{0}^{2} (s, t_{0})}{f (s, t_{0})} d s},

which also gives us values of

g_{opt} (s) = m (s | t_{0}) + \tilde{g} (s) = m (s | t_{0}) + \frac{f_{0} (s | t_{0})}{f (s | t_{0})} \frac{c}{μ_{0} (t_{0}) \int \frac{f_{0}^{2} (s | t_{0})}{f (s | t_{0})} d s} .

From Parast et al.¹⁸, we have that their residual treatment effect is

Δ_{s} (t, t_{0}) = E [I (S_{t_{0}} > 0) {m_{1} (s | t_{0}) - m_{0} (s | t_{0})}] = μ (t_{0}) E {m_{1} (s | t_{0}) - m_{0} (s | t_{0}) | T > t_{0}},

where m_a(s|t₀) = P(T^(a) > t|S^(a) = s, T^(a) > t₀), a = 0, 1. We know that

Δ (t) = E (Y_{t}^{(1)}) - E (Y_{t}^{(0)}) = μ_{1} (t_{0}) \int m_{1} (s | t_{0}) d F_{1} (s | t_{0}) - μ_{0} (t_{0}) \int m_{0} (s | t_{0}) d F_{0} (s | t_{0}),

where μ_a(t₀) = P(T^(a) > t₀) and F_a(s|t₀) = P(S^(a) ≤ s|T^(a) > t₀) for a = 0, 1. So

Δ_{L} = Δ (t) - Δ_{s} (t, t_{0}) = \int m_{1} (s | t_{0}) {μ_{1} (t_{0}) f_{1} (s | t_{0}) - μ (t_{0}) \tilde{f} (s | t_{0})} d s - \int m_{0} (s | t_{0}) {μ_{0} (t_{0}) f_{0} (s | t_{0}) - μ (t_{0}) \tilde{f} (s | t_{0})} d s,

where $\tilde{f} (s | t_{0})$ is the density function of a reference distribution for S, and f_a(s|t₀) is the density function of S^(a)|T^(a) > t₀.

Plugging in the formula of g_opt(s),

g_{opt} (s) = m (s | t_{0}) + \frac{f_{0} (s | t_{0})}{f (s | t_{0})} \frac{c}{μ_{0} (t_{0}) \int \frac{f_{0}^{2} (s | t_{0})}{f (s | t_{0})} d s} with

m (s | t_{0}) = E (Y_{t} | S = s, T > t_{0}) = E (Y_{t} | S = s, T > t_{0}, A = 1) P (A = 1 | S = s, T > t_{0}) + E (Y_{t} | S = s, T > t_{0}, A = 0) P (A = 0 | S = s, T > t_{0}) = m_{1} (s | t_{0}) r_{1} (s, t_{0}) + m_{0} (s | t_{0}) r_{0} (s, t_{0}),

r_{a} (s, t_{0}) = P (A = a | S = s, T > t_{0}) = f_{a} (s | t_{0}) μ_{a} (t_{0}) / {f_{1} (s | t_{0}) μ_{1} (t_{0}) + f_{0} (s | t_{0}) μ_{0} (t_{0})}, c = E [Y_{t_{0}} {Y_{t} - m (S | t_{0})} | A = 0] = E {Y_{t} | A = 0} - E {I (T > t_{0}) m (S | t_{0}) | A = 0} = μ_{0} (t_{0}) \int m_{0} (s | t_{0}) f_{0} (s | t_{0}) d s - μ_{0} (t_{0}) \int {m_{1} (s | t_{0}) r_{1} (s, t_{0}) + m_{0} (s | t_{0}) r_{0} (s, t_{0})} f_{0} (s | t_{0}) d s = μ_{0} (t_{0}) \int {- m_{1} (s | t_{0}) + m_{0} (s | t_{0})} r_{1} (s, t_{0}) f_{0} (s | t_{0}) d s,

we have

Δ_{g_{opt}} (t_{0}) = E {g_{opt} (Q^{(1)}) - g_{opt} (Q^{(0)})} = E {I (T > t_{0}) g_{opt} (S) | A = 1} - E {I (T > t_{0}) g_{opt} (S) | A = 0} = \int g_{opt} (s) {μ_{1} (t_{0}) f_{1} (s | t_{0}) d s - μ_{0} (t_{0}) f_{0} (s | t_{0}) d s} = \int m (s | t_{0}) {μ_{1} (t_{0}) f_{1} (s | t_{0}) d s - μ_{0} (t_{0}) f_{0} (s | t_{0}) d s} + \int \frac{f_{0} (s | t_{0})}{f (s | t_{0})} \frac{c}{μ_{0} (t_{0}) \int \frac{f_{0}^{2} (s | t_{0})}{f (s | t_{0})} d s} {μ_{1} (t_{0}) f_{1} (s | t_{0}) d s - μ_{0} (t_{0}) f_{0} (s | t_{0}) d s} ≔ \int m_{1} (s | t_{0}) {μ_{1} (t_{0}) f_{1} (s | t_{0}) - μ_{1} (t_{0}) w (s | t_{0})} d s - \int m_{0} (s | t_{0}) {μ_{0} (t_{0}) f_{0} (s | t_{0}) - μ_{1} (t_{0}) w (s | t_{0})} d s,

where w (s | t_{0}) = r_{0} (s, t_{0}) f_{1} (s | t_{0}) + r_{1} (s, t_{0}) f_{0} (s | t_{0}) \frac{\int \frac{f_{0} (s | t_{0})}{f (s | t_{0})} f_{1} (s | t_{0}) d s}{\int \frac{f_{0}^{2} (s | t_{0})}{f (s | t_{0})} d s} = \frac{f_{0} (s | t_{0}) f_{1} (s | t_{0}) / f (s | t_{0})}{\int f_{0} {(s | t_{0})}^{2} / f (s | t_{0}) d s} .

If we let the density of the reference distribution in Δ_L

\tilde{f} (s | t_{0}) = \frac{μ_{1} (t_{0})}{μ (t_{0})} w (s | t_{0}),

then $Δ_{g_{opt}} = Δ_{L}$ . So $PTE = Δ_{g_{opt}} / Δ = {PTE}_{L}$ under this special case. This relationship holds approximately for the proposed PTE in Section 2.

We now derive the assumptions needed to ensure that PTE is between 0 and 1. The assumptions approximately suffice to make the proposed PTE in Section 2 between 0 and 1.

Letting U = g_opt(S), we have

Δ (t) = μ_{1} (t_{0}) \int m_{1} (u | t_{0}) f_{1} (u | t_{0}) d u - μ_{0} (t_{0}) \int m_{0} (u | t_{0}) f_{0} (u | t_{0}) d u,

Δ_{g_{opt}} (t_{0}) = \int m_{1} (u | t_{0}) {μ_{1} (t_{0}) f_{1} (u | t_{0}) - μ_{1} (t_{0}) w (u | t_{0})} d u - \int m_{0} (u | t_{0}) {μ_{0} (t_{0}) f_{0} (u | t_{0}) - μ_{1} (t_{0}) w (u | t_{0})} d u,

Δ (t) - Δ_{g_{opt}} (t_{0}) = μ_{1} (t_{0}) \int {m_{1} (u | t_{0}) - m_{0} (u | t_{0})} w (u | t_{0}) d u,

(B4)

Direct calculations show that

Δ_{g_{opt}} (t_{0}) = E \{g_{opt} (Q^{(1)}) - g_{opt} (Q^{(0)})\} = E \{I (T > t_{0}) g_{opt} (S) | A = 1\} - E \{I (T > t_{0}) g_{opt} (S) | A = 0\} = \int u f_{1} (u | t_{0}) μ_{1} (t_{0}) d u - \int u f_{0} (u | t_{0}) μ_{0} (t_{0}) d u = μ_{1} (t_{0}) [u_{u} - \int F_{1} (u | t_{0}) d u] - μ_{0} (t_{0}) [u_{u} - \int F_{0} (u | t_{0}) d u] = u_{u} \{μ_{1} (t_{0}) - μ_{0} (t_{0})\} + \int \{P (T^{(1)} > t_{0}, U^{(1)} > u) - P (T^{(0)} > t_{0}, U^{(0)} > u)\} d u,

(B5)

where F_a(u|t₀) is the cumulative distribution function corresponding to f_a(u|t₀), a = 0, 1, and u_u is the upper bound of U. It follows from (B4) and (B5) that a set of sufficient conditions for $Δ (t) > Δ_{g_{opt}} (t_{0}) > 0$ is

(C1) . P (U^{(1)} > u, T^{(1)} > t_{0}) > P (U^{(0)} > u, T^{(0)} > t_{0}) for all u;

(C2) . m_{1} (u | t_{0}) > m_{0} (u | t_{0}) for all u .

In addition, we compare the performances of these two surrogate transformations, $g_{opt} (Q_{t_{0}}) = (1 - Y_{t_{0}}) g_{1, opt} + Y_{t_{0}} g_{2, opt} (S)$ in the main paper and $g (Q_{t_{0}}) = I (T > t_{0}) g (S)$ in this section, using the simulation settings described in the main paper. The PTE estimators are denoted as PTE.tran1 and PTE.tran2, respectively. From Table B1 we see that although the differences are very small, the PTE estimates using $g_{opt} (Q_{t_{0}})$ are generally a little larger for all the considered settings. This is because the transformation $g_{opt} (Q_{t_{0}})$ is more general and flexible, which includes the latter transformation as a special case when the true value of g₁ is 0. If m₁(s|t₀) and m₀(s|t₀) are not close, where $m_{a} (s | t_{0}) = E (Y_{t} | A = a, S = s, T > t_{0})$ , a = 0, 1, g₁ would not be expected to be very close to 0 and a version which sets g₁ = 0 would omit this information and not perform as well with respect to capturing the treatment effect.

TABLE B1.

PTE estimates for two kinds of transformations, with empirical standard errors in subscripts, under Setting 1-Setting 3.

	Setting 1		Setting 2		Setting 3
t ₀	PTE.tran1	PTE.tran2	PTE.tran1	PTE.tran2	PTE.tran1	PTE.tran2
0.3	0.693_0.090	0.677_0.093	0.454_0.089	0.408_0.091	0.522_0.116	0.499_0.118
0.5	0.770_0.083	0.752_0.088	0.625_0.079	0.583_0.083	0.651_0.107	0.627_0.111
0.7	0.836_0.072	0.821_0.077	0.773_0.066	0.745_0.071	0.771_0.093	0.753_0.098


C. ASYMPTOTIC PROPERTIES FOR $\hat{g} (\cdot)$

Throughout, we assume that m(s|t₀) is continuously differentiable. In addition, we assume that f(s, t₀) = f(s|t₀)μ(t₀), f_a(s, t₀) = f_a(s|t₀)μ_a(t₀), a = 0, 1 are continuously differentiable with finite support. For inference, we also require undersmoothing with h = o_p(n^−1∕5) for interval estimation of g_opt and h = o_p(n^−1∕4) for the interval estimation of PTE. Since $\hat{m} (s)$ , $\hat{f} (s, t_{0})$ and $\hat{f_{a}} (s, t_{0})$ , a = 0, 1 are standard kernel estimators, we have that they are consistent w.r.t their true values with rate ${(log n)}^{\frac{1}{2}} {(n h)}^{- \frac{1}{2}} + h^{2}$ . It follows immediately that $| {\hat{g}}_{1} - g_{1, opt} | = O_{p} {{(log n)}^{\frac{1}{2}} {(n h)}^{- \frac{1}{2}} + h^{2}}$ and ${sup}_{s} | {\hat{g}}_{2} (s) - g_{2, opt} (s) | = O_{p} {{(log n)}^{\frac{1}{2}} {(n h)}^{- \frac{1}{2}} + h^{2}}$ .
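The undersmoothing requirement can be met by shrinking a standard bandwidth by an extra power of n. The sketch below is one way to do so and rests on assumptions that are not taken from this paper: the normal-reference constant 1.06, the sample standard deviation as the scale, and the default extra rate of 0.1 are illustrative choices only.

```python
from statistics import stdev


def undersmoothed_bandwidth(values, extra_rate=0.1):
    """Rule-of-thumb bandwidth shrunk by n**(-extra_rate).

    The base rate n**(-1/5) is the usual optimal order for a second-order
    kernel. Multiplying by n**(-extra_rate) gives a bandwidth of order
    n**(-1/5 - extra_rate): any extra_rate > 0 yields h = o(n**(-1/5)),
    and any extra_rate > 0.05 yields h = o(n**(-1/4)).
    """
    n = len(values)
    base = 1.06 * stdev(values) * n ** (-1 / 5)  # normal-reference rule
    return base * n ** (-extra_rate)


# Hypothetical surrogate values, for illustration only.
example_values = [i / 10 for i in range(100)]
print(undersmoothed_bandwidth(example_values))
```

With the default extra rate the bandwidth is of order n^(−0.3), which satisfies both rate conditions stated above.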

We first derive the influence functions for each estimator in Section 2.2.

{\hat{f}}_{0} (s, t_{0}) - f_{0} (s, t_{0}) = \frac{n^{- 1} \sum_{A_{i} = 0} K_{h} (S_{i} - s) I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}}{n^{- 1} \sum_{A_{i} = 0} {\hat{ω}}_{t_{0}, i}} - f_{0} (s, t_{0}) = n^{- 1} \sum_{i = 1}^{n} 2 I (A_{i} = 0) [\frac{K_{h} (S_{i} - s) I (X_{i} > t_{0})}{{\hat{G}}_{0} (t_{0})} - \frac{K_{h} (S_{i} - s) I (X_{i} > t_{0})}{G_{0} (t_{0})} + \frac{K_{h} (S_{i} - s) I (X_{i} > t_{0})}{G_{0} (t_{0})} - f_{0} (s, t_{0}) ω_{t_{0}, i} - f_{0} (s, t_{0}) ({\hat{ω}}_{t_{0}, i} - ω_{t_{0}, i})] + o_{p} ({(n h)}^{- 1 / 2}) = n^{- 1} \sum_{i = 1}^{n} 2 I (A_{i} = 0) [\frac{K_{h} (S_{i} - s) I (X_{i} > t_{0})}{G_{0} (t_{0})} \frac{G_{0} (t_{0}) - {\hat{G}}_{0} (t_{0})}{G_{0} (t_{0})} + \frac{K_{h} (S_{i} - s) I (X_{i} > t_{0})}{G_{0} (t_{0})} - f_{0} (s, t_{0}) ω_{t_{0}, i} - f_{0} (s, t_{0}) {\frac{δ_{i} I (X_{i} \leq t_{0})}{G_{0} (X_{i})} \frac{- {\hat{G}}_{0} (X_{i}) + G_{0} (X_{i})}{G_{0} (X_{i})} + \frac{I (X_{i} > t_{0})}{G_{0} (t_{0})} \frac{- {\hat{G}}_{0} (t_{0}) + G_{0} (t_{0})}{G_{0} (t_{0})}}] + o_{p} {{(n h)}^{- 1 / 2}} = 2 n^{- 1} \sum_{j = 1}^{n} \int n^{- 1} \sum_{A_{i} = a} [\frac{I (X_{i} > t_{0})}{G_{a} (t_{0})} {K_{h} (S_{i} - s) - f_{0} (s, t_{0})} I (x \leq t_{0}) - f_{0} (s, t_{0}) \frac{δ_{i} I (X_{i} \leq t_{0})}{G_{a} (X_{i})} I (x \leq X_{i})] \frac{I (A_{j} = a) d M_{j}^{c} (x)}{n^{- 1} \sum_{l = 1}^{n} I (A_{l} = a) I (X_{l} \geq x)} + n^{- 1} \sum_{i = 1}^{n} 2 I (A_{i} = 0) [\frac{K_{h} (S_{i} - s) I (X_{i} > t_{0})}{G_{0} (t_{0})} - f_{0} (s, t_{0}) ω_{t_{0}, i}] + o_{p} {{(n h)}^{- 1 / 2}} ≔ {(n h)}^{- 1} \sum_{i = 1}^{n} ϕ_{0, i} (s, t_{0}) + o_{p} {{(n h)}^{- 1 / 2}} .

Similar derivations can be used to obtain

\hat{f} (s, t_{0}) - f (s, t_{0}) ≔ {(n h)}^{- 1} \sum_{i = 1}^{n} ϕ_{i} (s, t_{0}) + o_{p} {{(n h)}^{- 1 / 2}}, \hat{m} (s | t_{0}) - m (s | t_{0}) = \frac{\sum_{i = 1}^{n} K_{h} (S_{i} - s) I (X_{i} > t) {\hat{ω}}_{t, i} / \sum_{i = 1}^{n} {\hat{ω}}_{t, i}}{\sum_{i = 1}^{n} K_{h} (S_{i} - s) I (t_{0} < X_{i}) {\hat{ω}}_{t_{0}, i} / \sum_{i = 1}^{n} {\hat{ω}}_{t_{0}, i}} - E [Y | S = s, T > t_{0}] = \frac{1}{f (s, t_{0})} {\sum_{i = 1}^{n} K_{h} (S_{i} - s) I (X_{i} > t) {\hat{ω}}_{t, i} / \sum_{i = 1}^{n} {\hat{ω}}_{t, i} - f (s, t)} - \frac{f (s, t)}{f^{2} (s, t_{0})} {\sum_{i = 1}^{n} K_{h} (S_{i} - s) I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i} / \sum_{i = 1}^{n} {\hat{ω}}_{t_{0}, i} - f (s, t_{0})} + o_{p} {{(n h)}^{- 1 / 2}} ≔ {(n h)}^{- 1} \sum_{i = 1}^{n} ϕ_{m, i} (s) + o_{p} {{(n h)}^{- 1 / 2}}, \hat{c} - c = \frac{\sum_{A_{i} = 0} I (X_{i} > t) {\hat{ω}}_{t, i}}{\sum_{A_{i} = 0} {\hat{ω}}_{t, i}} - \frac{\sum_{A_{i} = 0} \hat{m} (S_{i} | t_{0}) I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}}{\sum_{A_{i} = 0} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}} - c = \frac{\sum_{A_{i} = 0} I (X_{i} > t) {\hat{ω}}_{t, i}}{\sum_{A_{i} = 0} {\hat{ω}}_{t, i}} - \frac{\sum_{A_{i} = 0} m (S_{i} | t_{0}) I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}}{\sum_{A_{i} = 0} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}} - c - \frac{\sum_{A_{i} = 0} {\hat{m} (S_{i} | t_{0}) - m (S_{i} | t_{0})} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}}{\sum_{A_{i} = 0} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}}, ≔ n^{- 1} \sum_{i = 1}^{n} ϕ_{c, i} + o_{p} (n^{- 1 / 2})

by incorporating the influence function of $n^{1 / 2} \{\hat{m} (s | t_{0}) - m (s | t_{0})\}$ . Furthermore,

\int \frac{{\hat{f}}_{0}^{2} (s, t_{0})}{\hat{f} (s, t_{0})} d s - \int \frac{f_{0}^{2} (s, t_{0})}{f (s, t_{0})} d s ≔ n^{- 1} \sum_{i = 1}^{n} ϕ_{f, i} + o_{p} (n^{- 1 / 2}), {\hat{p}}_{0} (t_{0}) - p_{0} (t_{0}) ≔ n^{- 1} \sum_{i = 1}^{n} ϕ_{p, 0, i} (t_{0}) + o_{p} (n^{- 1 / 2}), \hat{p} (t_{0}) - p (t_{0}) ≔ n^{- 1} \sum_{i = 1}^{n} ϕ_{p, i} (t_{0}) + o_{p} (n^{- 1 / 2}),

where p_a(t₀) = 1 – μ_a(t₀), p(t₀) = 1 – μ(t₀), ${\hat{p}}_{a} (t_{0}) = 1 - {\hat{μ}}_{a} (t_{0})$ , and $\hat{p} (t_{0}) = 1 - \hat{μ} (t_{0})$ .

Using the above results, we can obtain the influence functions for the optimal transformation function estimators by coupling the delta method with the fact that

g_{1, opt} = {\tilde{G}}_{1} (p_{0} (t_{0}), p (t_{0}), c, \int \frac{f_{0}^{2} (s, t_{0})}{f (s, t_{0})} d s) = \frac{p_{0} (t_{0})}{p (t_{0})} \frac{c}{\int \frac{f_{0}^{2} (s, t_{0})}{f (s, t_{0})} d s + \frac{p_{0}^{2} (t_{0})}{p (t_{0})}}

g_{2, opt} (s) = {\tilde{G}}_{2} (m (s | t_{0}), f_{0} (s, t_{0}), f (s, t_{0}), c, p_{0} (t_{0}), p (t_{0}), \int \frac{f_{0}^{2} (s, t_{0})}{f (s, t_{0})} d s) = m (s | t_{0}) + \frac{f_{0} (s, t_{0})}{f (s, t_{0})} \frac{c}{\int \frac{f_{0}^{2} (s, t_{0})}{f (s, t_{0})} d s + \frac{p_{0}^{2} (t_{0})}{p (t_{0})}},

{\hat{g}}_{1} = {\tilde{G}}_{1} ({\hat{p}}_{0} (t_{0}), \hat{p} (t_{0}), \hat{c}, \int \frac{{\hat{f}}_{0}^{2} (s, t_{0})}{\hat{f} (s, t_{0})} d s),

and

{\hat{g}}_{2} (s) = {\tilde{G}}_{2} (\hat{m} (s | t_{0}), {\hat{f}}_{0} (s, t_{0}), \hat{f} (s, t_{0}), \hat{c}, {\hat{p}}_{0} (t_{0}), \hat{p} (t_{0}), \int \frac{{\hat{f}}_{0}^{2} (s, t_{0})}{\hat{f} (s, t_{0})} d s) .

Specifically, we can show that

{\hat{g}}_{1} - g_{1, opt} = n^{- 1} \sum_{i = 1}^{n} ϕ_{g, 1, i} + o_{p} (n^{- 1 / 2}),

and

{\hat{g}}_{2} (s) - g_{2, opt} (s) = {(n h)}^{- 1} \sum_{i = 1}^{n} ϕ_{g, 2, i} (s) + o_{p} {{(n h)}^{- 1 / 2}},

where $E (ϕ_{g, 1, i}^{2}) < \infty$ and E{ϕ_g,2,i(s)²} < ∞.

D. INFLUENCE FUNCTIONS OF $\hat{Δ}$ AND ${\hat{Δ}}_{g}$

We first give some existing results needed in the proof. Following Gill²⁹ and Robins and Rotnitzky³⁰, we have

\frac{{\hat{G}}_{a} (t) - G_{a} (t)}{G_{a} (t)} = - \sum_{A_{j} = a} \int_{0}^{t} \frac{d M_{j}^{c} (u)}{\sum_{A_{l} = a} I (X_{l} \geq u)} + o_{p} (n^{- 1 / 2}),

(D6)

\frac{δ_{i}}{G_{A_{i}} (X_{i})} = 1 - \int_{0}^{\infty} \frac{d M_{i}^{c} (u)}{G_{A_{i}} (u)}

(D7)

where G_a(t) = P(C^(a) > t), $M^{c} (t) = N^{c} (t) - \int_{0}^{t} {A λ_{1}^{c} (u) + (1 - A) λ_{0}^{c} (u)} I (X \geq u) d u$ , N^c(t) = I(X ≤ t, δ = 0), and $λ_{a}^{c} (t)$ is the hazard function of C^(a).
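The censoring distribution G_a enters the estimators through inverse probability of censoring weights of the form δ_i I(X_i ≤ t)/G_a(X_i) + I(X_i > t)/G_a(t), as seen in the expansions in this appendix. The following minimal sketch computes a Kaplan-Meier estimate of G within one arm and the resulting weighted estimate of P(T > t). It is an illustration under simplifying assumptions (a single arm, no ties between event and censoring times, and a positive estimate of G wherever it is inverted), not the implementation in the OSsurvival package; the function names are hypothetical.

```python
def censoring_km(times, events):
    """Kaplan-Meier estimate of G(u) = P(C > u), treating censoring
    (event indicator 0) as the event of interest. Returns a function."""
    censor_times = sorted({x for x, d in zip(times, events) if d == 0})
    steps = []  # (time, value of the estimate at and after that time)
    g = 1.0
    for c in censor_times:
        at_risk = sum(1 for x in times if x >= c)
        n_cens = sum(1 for x, d in zip(times, events) if x == c and d == 0)
        g *= 1.0 - n_cens / at_risk
        steps.append((c, g))

    def g_hat(u):
        value = 1.0
        for c, g_c in steps:
            if c <= u:
                value = g_c
        return value

    return g_hat


def ipcw_survival(times, events, t):
    """Weighted estimate of P(T > t) within one arm."""
    g_hat = censoring_km(times, events)
    weights = []
    for x, d in zip(times, events):
        if x > t:
            weights.append(1.0 / g_hat(t))   # still at risk at t
        elif d == 1:
            weights.append(1.0 / g_hat(x))   # event observed by t
        else:
            weights.append(0.0)              # censored before t
    numerator = sum(w for w, x in zip(weights, times) if x > t)
    return numerator / sum(weights)
```

Without censoring, every weight equals one and the estimate reduces to the empirical proportion of subjects with X_i > t.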

We next derive asymptotic distributions for $\hat{Δ} (t)$ and ${\hat{Δ}}_{g} (t_{0})$ for a fixed g function. To this end, we first write

{\hat{μ}}_{a} (t) - μ_{a} (t) = \frac{\sum_{i = 1}^{n} I (A_{i} = a) {\hat{ω}}_{t, i} I (X_{i} > t)}{\sum_{i = 1}^{n} I (A_{i} = a) {\hat{ω}}_{t, i}} - μ_{a} = \frac{\sum_{i = 1}^{n} I (A_{i} = a) {\frac{I (X_{i} > t)}{{\hat{G}}_{a} (t)} - {\hat{ω}}_{t, i} μ_{a}}}{\sum_{i = 1}^{n} I (A_{i} = a) {\hat{ω}}_{t, i}} = 2 n^{- 1} \sum_{A_{i} = a} [\frac{I (X_{i} > t)}{{\hat{G}}_{a} (t)} - \frac{I (X_{i} > t)}{G_{a} (t)} + \frac{I (X_{i} > t)}{G_{a} (t)} - ω_{t, i} μ_{a} - μ_{a} {{\hat{ω}}_{t, i} - ω_{t, i}}] + o_{p} (n^{- 1 / 2}) = 2 n^{- 1} \sum_{A_{i} = a} [\frac{I (X_{i} > t)}{G_{a} (t)} \frac{- {\hat{G}}_{a} (t) + G_{a} (t)}{G_{a} (t)} + \frac{I (X_{i} > t)}{G_{a} (t)} - ω_{t, i} μ_{a}] - 2 n^{- 1} \sum_{A_{i} = a} μ_{a} [\frac{δ_{i} I (X_{i} \leq t)}{G_{a} (X_{i})} \frac{- {\hat{G}}_{a} (X_{i}) + G_{a} (X_{i})}{G_{a} (X_{i})} + \frac{I (X_{i} > t)}{G_{a} (t)} \frac{- {\hat{G}}_{a} (t) + G_{a} (t)}{G_{a} (t)}] + o_{p} (n^{- 1 / 2}) = 2 n^{- 1} \sum_{j = 1}^{n} \int n^{- 1} \sum_{A_{i} = a} [\frac{I (X_{i} > t)}{G_{a} (t)} (1 - μ_{a}) I (x \leq t) - μ_{a} \frac{δ_{i} I (X_{i} \leq t)}{G_{a} (X_{i})} I (x \leq X_{i})] \frac{I (A_{j} = a) d M_{j}^{c} (x)}{n^{- 1} \sum_{l = 1}^{n} I (A_{l} = a) I (X_{l} \geq x)} + 2 n^{- 1} \sum_{A_{i} = a} [\frac{I (X_{i} > t)}{G_{a} (t)} - ω_{t, i} μ_{a}] + o_{p} (n^{- 1 / 2}) ≔ n^{- 1} \sum_{j = 1}^{n} ψ_{a 1, j} (t) + n^{- 1} \sum_{j = 1}^{n} ψ_{a 2, j} (t) + o_{p} (n^{- 1 / 2}) ≔ n^{- 1} \sum_{j = 1}^{n} ψ_{a, j} (t) + o_{p} (n^{- 1 / 2}) .

It follows that

\sqrt{n} {\hat{Δ} (t) - Δ (t)} = n^{- 1 / 2} \sum_{i = 1}^{n} {ψ_{1, i} (t) - ψ_{0, i} (t)} + o_{p} (1) ≔ n^{- 1 / 2} \sum_{i = 1}^{n} ψ_{i} (t) + o_{p} (1) .

By the central limit theorem we have that $\sqrt{n} \{\hat{Δ} (t) - Δ (t)\}$ converges in distribution to a normal distribution N(0, σ²(t)) with $σ^{2} (t) = E [ψ_{i}^{2} (t)]$ . Similarly, we have

{\hat{μ}}_{g, a} (t_{0}) - μ_{g, a} (t_{0}) = \frac{\sum_{A_{i} = a} {\hat{ω}}_{t_{0}, i} g (Q_{t_{0}, i})}{\sum_{A_{i} = a} {\hat{ω}}_{t_{0}, i}} - μ_{g, a} (t_{0}) = 2 n^{- 1} \sum_{A_{i} = a} [g_{1} \frac{δ_{i} I (X_{i} \leq t_{0})}{{\hat{G}}_{a} (X_{i})} + g_{2} (S_{i}) \frac{I (X_{i} > t_{0})}{{\hat{G}}_{a} (t_{0})} - {\hat{ω}}_{t_{0}, i} μ_{g, a} (t_{0})] + o_{p} (n^{- 1 / 2}) = 2 n^{- 1} \sum_{A_{i} = a} [g_{1} \frac{δ_{i} I (X_{i} \leq t_{0})}{G_{a} (X_{i})} \frac{- {\hat{G}}_{a} (X_{i}) + G_{a} (X_{i})}{G_{a} (X_{i})} + g_{2} (S_{i}) \frac{I (X_{i} > t_{0})}{G_{a} (t_{0})} \frac{- {\hat{G}}_{a} (t_{0}) + G_{a} (t_{0})}{G_{a} (t_{0})}] + 2 n^{- 1} \sum_{A_{i} = a} [g_{1} \frac{δ_{i} I (X_{i} \leq t_{0})}{G_{a} (X_{i})} + g_{2} (S_{i}) \frac{I (X_{i} > t_{0})}{G_{a} (t_{0})} - ω_{t_{0}, i} μ_{g, a} (t_{0})] - 2 n^{- 1} \sum_{A_{i} = a} μ_{g, a} (t_{0}) [g_{1} \frac{δ_{i} I (X_{i} \leq t_{0})}{G_{a} (X_{i})} \frac{- {\hat{G}}_{a} (X_{i}) + G_{a} (X_{i})}{G_{a} (X_{i})} + g_{2} (S_{i}) \frac{I (X_{i} > t_{0})}{G_{a} (t_{0})} \frac{- {\hat{G}}_{a} (t_{0}) + G_{a} (t_{0})}{G_{a} (t_{0})}] + o_{p} (n^{- 1 / 2}) = 2 n^{- 1} \sum_{j = 1}^{n} \int n^{- 1} \sum_{A_{i} = a} (1 - μ_{g, a}) [g_{1} \frac{δ_{i} I (X_{i} \leq t_{0})}{{\hat{G}}_{a} (X_{i})} I (x \leq X_{i}) + g_{2} (S_{i}) \frac{I (X_{i} > t_{0})}{{\hat{G}}_{a} (t_{0})} I (x \leq t_{0})] \frac{I (A_{j} = a) d M_{j}^{c} (x)}{n^{- 1} \sum_{l = 1}^{n} I (A_{l} = a) I (X_{l} \geq x)} + 2 n^{- 1} \sum_{A_{i} = a} [g_{1} \frac{δ_{i} I (X_{i} \leq t_{0})}{G_{a} (X_{i})} + g_{2} (S_{i}) \frac{I (X_{i} > t_{0})}{G_{a} (t_{0})} - ω_{t_{0}, i} μ_{g, a} (t_{0})] + o_{p} (n^{- 1 / 2}) ≔ n^{- 1} \sum_{j = 1}^{n} ψ_{g, a 1, j} (t_{0}) + n^{- 1} \sum_{j = 1}^{n} ψ_{g, a 2, j} (t_{0}) + o_{p} (n^{- 1 / 2}) ≔ n^{- 1} \sum_{j = 1}^{n} ψ_{g, a, j} (t_{0}) + o_{p} (n^{- 1 / 2}) .

It follows that

\sqrt{n} \{{\hat{Δ}}_{g} (t_{0}) - Δ_{g} (t_{0})\} = n^{- 1 / 2} \sum_{i = 1}^{n} \{ψ_{g, 1, i} (t_{0}) - ψ_{g, 0, i} (t_{0})\} + o_{p} (1) ≔ n^{- 1 / 2} \sum_{i = 1}^{n} ψ_{g, i} (t_{0}) + o_{p} (1) .

By the central limit theorem, $\sqrt{n} \{{\hat{Δ}}_{g} (t_{0}) - Δ_{g} (t_{0})\}$ converges in distribution to a normal distribution $N (0, σ_{g}^{2} (t_{0}))$ with $σ_{g}^{2} (t_{0}) = E \{ψ_{g, i}^{2} (t_{0})\}$ .

Given the above influence functions for $\hat{Δ} (t)$ and ${\hat{Δ}}_{g} (t_{0})$ , their variances can be estimated by the perturbation resampling method.
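As a rough illustration of how the limiting distribution could be used, suppose ${\hat{σ}}_{g} (t_{0})$ is a consistent estimate of $σ_{g} (t_{0})$ and $z_{1 - α / 2}$ is the $(1 - α / 2)$ quantile of the standard normal distribution; both symbols are notation assumed here for illustration. An approximate $100 (1 - α) \%$ Wald-type confidence interval for $Δ_{g} (t_{0})$ would then take the form

{\hat{Δ}}_{g} (t_{0}) \pm z_{1 - α / 2} \, {\hat{σ}}_{g} (t_{0}) / \sqrt{n} ,

with the analogous interval for $Δ (t)$ obtained by replacing ${\hat{Δ}}_{g} (t_{0})$ and ${\hat{σ}}_{g} (t_{0})$ by $\hat{Δ} (t)$ and an estimate of $σ (t)$.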

E. PERTURBATION RESAMPLING

In practice, we may estimate the asymptotic variance of each estimator by a perturbation resampling procedure similar to that employed in Parast et al. (2016).¹⁶ Specifically, we generate V = (V₁, …, V_n), a vector of independent and identically distributed non-negative random variables with mean 1 and variance 1, for example from the unit exponential distribution. For each set of V, we let

{\hat{f}}_{a}^{*} (s | t_{0}) = \frac{\sum_{A_{i} = a} V_{i} K_{h} (S_{i} - s) I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}^{*}}{\sum_{A_{i} = a} V_{i} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}^{*}},
{\hat{f}}^{*} (s | t_{0}) = \frac{\sum_{i = 1}^{n} V_{i} K_{h} (S_{i} - s) I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}^{*}}{\sum_{i = 1}^{n} V_{i} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}^{*}},
{\hat{μ}}_{a}^{*} (t_{0}) = \frac{\sum_{A_{i} = a} V_{i} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}^{*}}{\sum_{A_{i} = a} V_{i} {\hat{ω}}_{t_{0}, i}^{*}},
{\hat{μ}}^{*} (t_{0}) = \frac{\sum_{i = 1}^{n} V_{i} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}^{*}}{\sum_{i = 1}^{n} V_{i} {\hat{ω}}_{t_{0}, i}^{*}},
{\hat{m}}^{*} (s | t_{0}) = \frac{\sum_{i = 1}^{n} V_{i} K_{h} (S_{i} - s) I (X_{i} > t) {\hat{ω}}_{t, i}^{*} / \sum_{i = 1}^{n} V_{i} {\hat{ω}}_{t, i}^{*}}{\sum_{i = 1}^{n} V_{i} K_{h} (S_{i} - s) I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}^{*} / \sum_{i = 1}^{n} V_{i} {\hat{ω}}_{t_{0}, i}^{*}},
{\hat{c}}^{*} = \frac{\sum_{A_{i} = 0} V_{i} I (X_{i} > t) {\hat{ω}}_{t, i}^{*}}{\sum_{A_{i} = 0} V_{i} {\hat{ω}}_{t, i}^{*}} - \frac{\sum_{A_{i} = 0} V_{i} {\hat{m}}^{*} (S_{i} | t_{0}) I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}^{*}}{\sum_{A_{i} = 0} V_{i} I (X_{i} > t_{0}) {\hat{ω}}_{t_{0}, i}^{*}},
and {\hat{λ}}^{*} = {\hat{c}}^{*} \{\int \frac{{\hat{f}}_{0}^{*} {(s | t_{0})}^{2} {\hat{μ}}_{0}^{*} {(t_{0})}^{2}}{{\hat{f}}^{*} (s | t_{0}) {\hat{μ}}^{*} (t_{0})} d s + \frac{\{1 - {\hat{μ}}_{0}^{*} (t_{0})\}^{2}}{1 - {\hat{μ}}^{*} (t_{0})}\}^{- 1},

where ${\hat{ω}}_{t, i}^{*} = \{I (X_{i} \leq t) δ_{i} + I (X_{i} > t)\} / {\hat{G}}_{A_{i}}^{*} (X_{i} \land t)$ is the perturbed weight and ${\hat{G}}_{A_{i}}^{*} (\cdot)$ is the perturbed Kaplan-Meier estimator. We may then obtain the perturbed counterparts of $\hat{g} (s)$ , $\hat{Δ}$ , ${\hat{Δ}}_{\hat{g}}$ , and ${\hat{PTE}}_{cv} (t_{0})$ as

{\hat{g}}_{1}^{*} = {\hat{λ}}^{*} \frac{1 - {\hat{μ}}_{0}^{*} (t_{0})}{1 - {\hat{μ}}^{*} (t_{0})}, {\hat{g}}_{2}^{*} (s) = {\hat{m}}^{*} (s | t_{0}) + {\hat{λ}}^{*} \frac{{\hat{f}}_{0}^{*} (s | t_{0}) {\hat{μ}}_{0}^{*} (t_{0})}{{\hat{f}}^{*} (s | t_{0}) {\hat{μ}}^{*} (t_{0})},

{\hat{Δ}}^{*} (t) = {\hat{μ}}_{1}^{*} (t) - {\hat{μ}}_{0}^{*} (t),
{\hat{Δ}}_{{\hat{g}}^{*}}^{*} (t_{0}) = {\hat{μ}}_{{\hat{g}}^{*}, 1}^{*} (t_{0}) - {\hat{μ}}_{{\hat{g}}^{*}, 0}^{*} (t_{0}),
{\hat{PTE}}_{{\hat{g}}^{*}}^{*} (t_{0}) = {\hat{Δ}}_{{\hat{g}}^{*}}^{*} (t_{0}) / {\hat{Δ}}^{*} (t),
with {\hat{μ}}_{a}^{*} (t) = \frac{\sum_{i = 1}^{n} V_{i} {\hat{ω}}_{t, i}^{*} I (A_{i} = a) Y_{t, i}}{\sum_{i = 1}^{n} V_{i} {\hat{ω}}_{t, i}^{*} I (A_{i} = a)},
{\hat{μ}}_{g, a}^{*} (t_{0}) = \frac{\sum_{i = 1}^{n} V_{i} {\hat{ω}}_{t_{0}, i}^{*} I (A_{i} = a) g (Q_{t_{0}, i})}{\sum_{i = 1}^{n} V_{i} {\hat{ω}}_{t_{0}, i}^{*} I (A_{i} = a)},
and {\hat{PTE}}_{cv}^{*} (t_{0}) = K^{- 1} \sum_{k = 1}^{K} {({\hat{PTE}}_{{\hat{g}}_{I_{k}}^{*}}^{(- k)} (t_{0}))}^{*} .

In practice, we generate a large number, say B, of realizations of V and obtain the corresponding B realizations of ${\hat{g}}^{*} (s)$ , ${\hat{Δ}}^{*}$ , ${\hat{Δ}}_{{\hat{g}}^{*}}^{*}$ , and ${\hat{PTE}}_{cv}^{*} (t_{0})$ . Variance estimates and confidence intervals (CIs) can then be constructed from the empirical variances and quantiles of these realizations.
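The resampling scheme can be sketched in code for the simplest quantity, the perturbed treatment effect on survival at time t. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names (`km_censoring`, `arm_survival`, `perturb_delta`), the simulated data, and the choice B = 500 are all hypothetical, event and censoring times are assumed to have no ties, and follow-up is assumed to extend beyond t in both arms. The perturbed quantities for the surrogate-based estimator would follow the same pattern with the kernel-smoothed terms added.

```python
import numpy as np

def km_censoring(x, delta, v, at):
    """Weighted Kaplan-Meier estimate of G(u) = P(C > u), evaluated at `at`.

    Censoring (delta == 0) is treated as the event; assumes no tied times.
    """
    order = np.argsort(x)
    xs, cens, vs = x[order], 1 - delta[order], v[order]
    at_risk = np.cumsum(vs[::-1])[::-1]            # weighted count with X_j >= xs[k]
    surv = np.cumprod(1.0 - vs * cens / at_risk)   # product-limit estimator
    idx = np.searchsorted(xs, at, side="right") - 1
    return np.where(idx >= 0, surv[np.maximum(idx, 0)], 1.0)

def arm_survival(x, delta, t, v):
    """Perturbed inverse-probability-weighted estimate of P(T > t) in one arm."""
    g = km_censoring(x, delta, v, np.minimum(x, t))
    w = ((x <= t) * delta + (x > t)) / g           # perturbed weight omega*_{t,i}
    return np.sum(v * w * (x > t)) / np.sum(v * w)

def perturb_delta(x, delta, a, t, n_perturb=500, seed=0):
    """Point estimate, perturbation SE, and 95% percentile CI for Delta(t)."""
    rng = np.random.default_rng(seed)

    def delta_hat(v):
        return (arm_survival(x[a == 1], delta[a == 1], t, v[a == 1])
                - arm_survival(x[a == 0], delta[a == 0], t, v[a == 0]))

    est = delta_hat(np.ones(len(x)))               # unit weights: original estimator
    draws = np.array([delta_hat(rng.exponential(1.0, len(x)))
                      for _ in range(n_perturb)])  # V_i ~ unit exponential
    return est, draws.std(ddof=1), np.quantile(draws, [0.025, 0.975])

# Hypothetical simulated trial: longer event times in the treated arm.
rng = np.random.default_rng(1)
n = 400
a = rng.integers(0, 2, n)
t_event = rng.exponential(1.0 + 0.5 * a)
c = rng.exponential(2.0, n)
x, delta = np.minimum(t_event, c), (t_event <= c).astype(int)
est, se, ci = perturb_delta(x, delta, a, t=1.0)
print(est, se, ci)
```

Setting all perturbation weights to 1 recovers the unperturbed estimator, so the same functions serve for the point estimate and for each resampled realization.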

F. ADDITIONAL SIMULATION RESULTS

Using the same dataset to estimate both g_opt and its corresponding PTE may lead to overfitting bias, as in standard prediction settings: the estimated loss may underestimate the true loss, $L (g)$ in our context, and the PTE may be overestimated as a result. When the sample size is large, as in our simulation study, the overfitting bias is usually negligible. Here, we compare simulation results with and without cross-validation. As seen from Table F2, the estimates without cross-validation (PTE.nocv) and with cross-validation (PTE.cv) are comparable, with PTE.nocv being slightly larger and having a slightly smaller empirical standard error.

We further examined sensitivity to assumption (A1) by comparing our proposed g_opt to the true g_oracle in setting 3. Importantly, the oracle transformation g_oracle that optimizes the oracle loss function is not identifiable in real data, since no observable information is available to estimate the correlation between (T⁽¹⁾, S⁽¹⁾) and (T⁽⁰⁾, S⁽⁰⁾). Furthermore, the explicit functional form of g_oracle is not tractable in general. To approximate g_oracle numerically in a simulation setting, where the joint distribution of (S⁽¹⁾, S⁽⁰⁾, T⁽¹⁾, T⁽⁰⁾) is known, we use a basis expansion β^⊤Ψ(S), where Ψ(S) is a K-dimensional spline basis expansion of S and β is an unknown K-dimensional parameter. With a sufficiently rich basis Ψ(S), g_oracle can be approximated well via the Monte Carlo method: restricting g_oracle = β^⊤Ψ(S), we estimate the optimal β from a large number of simulated realizations of (S⁽¹⁾, S⁽⁰⁾, T⁽¹⁾, T⁽⁰⁾). As shown in Figure F1, g_opt is very close to g_oracle at all three time points t₀ = 0.3, 0.5 and 0.7 under setting 3. We also calculated the relative difference of PTE_opt to PTE_oracle,

{RD}_{PTE} \equiv ({PTE}_{opt} - {PTE}_{oracle}) / {PTE}_{oracle} .

At the three time points t₀ = 0.3, 0.5, and 0.7, RD_PTE = −0.086, −0.018, and 0.005, respectively, suggesting that the PTE estimates may not be sensitive to these departures from the working independence assumption.

TABLE F2.

The proposed CV-based PTE estimates (PTE.cv) and the PTE estimates without CV (PTE.nocv) for Settings 1–3, with empirical standard errors given as subscripts (shown here after the underscore)

	Setting 1		Setting 2		Setting 3
t₀	PTE.nocv	PTE.cv	PTE.nocv	PTE.cv	PTE.nocv	PTE.cv
0.3	0.693_0.090	0.685_0.093	0.454_0.089	0.450_0.099	0.522_0.116	0.510_0.128
0.5	0.770_0.083	0.761_0.087	0.625_0.079	0.619_0.089	0.651_0.107	0.633_0.120
0.7	0.836_0.072	0.817_0.080	0.773_0.066	0.762_0.076	0.771_0.093	0.745_0.101
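The comparison between the two sets of estimates can be checked directly from the entries of Table F2. The short script below is only an arithmetic check on the tabulated values; the dictionary layout and variable names are illustrative choices and not part of the original analysis.

```python
# Table F2 entries as (estimate, empirical SE), keyed by (setting, t0).
table_f2 = {
    (1, 0.3): {"nocv": (0.693, 0.090), "cv": (0.685, 0.093)},
    (1, 0.5): {"nocv": (0.770, 0.083), "cv": (0.761, 0.087)},
    (1, 0.7): {"nocv": (0.836, 0.072), "cv": (0.817, 0.080)},
    (2, 0.3): {"nocv": (0.454, 0.089), "cv": (0.450, 0.099)},
    (2, 0.5): {"nocv": (0.625, 0.079), "cv": (0.619, 0.089)},
    (2, 0.7): {"nocv": (0.773, 0.066), "cv": (0.762, 0.076)},
    (3, 0.3): {"nocv": (0.522, 0.116), "cv": (0.510, 0.128)},
    (3, 0.5): {"nocv": (0.651, 0.107), "cv": (0.633, 0.120)},
    (3, 0.7): {"nocv": (0.771, 0.093), "cv": (0.745, 0.101)},
}

# Difference (no CV minus CV) in the PTE estimate and in its empirical SE.
diffs = {
    key: (round(row["nocv"][0] - row["cv"][0], 3),
          round(row["nocv"][1] - row["cv"][1], 3))
    for key, row in table_f2.items()
}

for (setting, t0), (d_est, d_se) in diffs.items():
    print(f"Setting {setting}, t0 = {t0}: estimate gap {d_est:+.3f}, SE gap {d_se:+.3f}")

max_est_gap = max(d_est for d_est, _ in diffs.values())
max_se_gap = max(abs(d_se) for _, d_se in diffs.values())
```

In every cell the estimate without cross-validation is the larger of the two and its empirical standard error is the smaller; the largest gap in the estimates is 0.026 (Setting 3, t₀ = 0.7) and the largest gap in the standard errors is 0.013.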


FIGURE F1 — Plots of g(s) for setting 3 with t₀ = 0.3, 0.5, 0.7, respectively.

References

1. Mendelsohn J, Moses HL, Nass SJ, et al. A national cancer clinical trials system for the 21st century: Reinvigorating the NCI Cooperative Group Program. National Academies Press. 2010.
2. Zakeri K, Panjwani N, Carmona R, et al. Generalized competing event models can reduce cost and duration of cancer clinical trials. JCO clinical cancer informatics 2018; 2: 1–12.
3. Tay-Teo K, Ilbawi A, Hill SR. Comparison of sales income and research and development costs for FDA-approved cancer drugs sold by Originator drug companies. JAMA network open 2019; 2(1): e186875–e186875.
4. FDA. Table of Surrogate Endpoints That Were the Basis of Drug Approval or Licensure. https://www.fda.gov/drugs/development-resources/table-surrogate-endpoints-were-basis-drug-approval-or-licensure; 2020.
5. Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 1989; 8(4): 431–440. doi: 10.1002/sim.4780080407
6. Huang Y, Gilbert PB. Comparing biomarkers as principal surrogate endpoints. Biometrics 2011; 67(4): 1442–1451. doi: 10.1111/j.1541-0420.2011.01603.x
7. Gilbert PB, Hudgens MG. Evaluating candidate principal surrogate endpoints. Biometrics 2008; 64(4): 1146–1154. doi: 10.1111/j.1541-0420.2008.01014.x
8. Alonso A, Geys H, Molenberghs G, Kenward MG, Vangeneugden T. Validation of surrogate markers in multiple randomized clinical trials with repeated measurements: canonical correlation approach. Biometrics 2004; 60(4): 845–853.
9. Molenberghs G, Buyse M, Geys H, Renard D, Burzykowski T, Alonso A. Statistical challenges in the evaluation of surrogate endpoints in randomized trials. Controlled Clinical Trials 2002; 23(6): 607–625.
10. Burzykowski T, Molenberghs G, Buyse M. The evaluation of surrogate endpoints. Springer. 2005.
11. Price BL, Gilbert PB, van der Laan MJ. Estimation of the optimal surrogate based on a randomized trial. Biometrics 2018.
12. Freedman LS, Graubard BI, Schatzkin A. Statistical validation of intermediate endpoints for chronic diseases. Statistics in Medicine 1992; 11(2): 167–178. doi: 10.1002/sim.4780110204
13. Lin D, Fleming T, De Gruttola V, et al. Estimating the proportion of treatment effect explained by a surrogate marker. Statistics in medicine 1997; 16(13): 1515–1527.
14. Wang Y, Taylor JM. A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics 2002; 58(4): 803–812. doi: 10.1111/j.0006-341X.2002.00803.x
15. Conlon AS, Taylor JM, Elliott MR. Surrogacy assessment using principal stratification when surrogate and outcome measures are multivariate normal. Biostatistics 2014; 15(2): 266–283. doi: 10.1093/biostatistics/kxt051
16. Parast L, McDermott MM, Tian L. Robust estimation of the proportion of treatment effect explained by surrogate marker information. Statistics in medicine 2016; 35(10): 1637–1653.
17. Wang X, Parast L, Tian L, Cai T. Model-free approach to quantifying the proportion of treatment effect explained by a surrogate marker. Biometrika 2020; 107(1): 107–122.
18. Parast L, Cai T, Tian L. Evaluating surrogate marker information using censored data. Statistics in medicine 2017; 36(11): 1767–1782.
19. Scott D. Multivariate Density Estimation. Wiley, New York; 1992.
20. Diabetes Prevention Program Group. The Diabetes Prevention Program: design and methods for a clinical trial in the prevention of Type 2 diabetes. Diabetes Care 1999; 22(4): 623–634.
21. Diabetes Prevention Program Group. Reduction in the Incidence of Type 2 Diabetes with Lifestyle Intervention or Metformin. New England Journal of Medicine 2002; 346(6): 393–403. doi: 10.1056/NEJMoa012512
22. DPP. Diabetes Prevention Program. https://repository.niddk.nih.gov/studies/dpp/; 2013.
23. Tian L, Cai T, Zhao L, Wei LJ. On the covariate-adjusted estimation for an overall treatment difference with data from a randomized comparative clinical trial. Biostatistics 2012; 13(2): 256–273.
24. Garcia TP, Ma Y, Yin G. Efficiency improvement in a class of survival models through model-free covariate incorporation. Lifetime data analysis 2011; 17(4): 552–565.
25. Zhang M, Tsiatis AA, Davidian M. Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics 2008; 64(3): 707–715.
26. Han L, Wang X, Cai T. On the Evaluation of Surrogate Markers in Real World Data Settings. arXiv preprint arXiv:2104.05513 2021.
27. VanderWeele TJ. Surrogate measures and consistent surrogates. Biometrics 2013; 69(3): 561–565.
28. Elliott MR, Conlon AS, Li Y, Kaciroti N, Taylor JM. Surrogacy marker paradox measures in meta-analytic settings. Biostatistics 2015; 16(2): 400–412.
29. Gill RD. Censoring and stochastic integrals. Statistica Neerlandica 1980; 34(2): 124–124.
30. Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: Springer. 1992. (pp. 297–331).

