Author manuscript; available in PMC: 2022 Dec 10.
Published in final edited form as: Stat Med. 2021 Sep 2;40(28):6321–6343. doi: 10.1002/sim.9185

Quantifying the Feasibility of Shortening Clinical Trial Duration Using Surrogate Markers

Xuan Wang 1, Tianxi Cai 1,2, Lu Tian 3, Florence Bourgeois 4, Layla Parast 5
PMCID: PMC8595715  NIHMSID: NIHMS1752140  PMID: 34474500

Summary

The potential benefit of using a surrogate marker in place of a long-term primary outcome is attractive because of its impact on study length and cost. Many available methods for quantifying the effectiveness of a surrogate endpoint either rely on strict parametric modeling assumptions or require that the primary outcome and surrogate marker are fully observed, i.e., not subject to censoring. Moreover, available methods for quantifying surrogacy typically provide a proportion of treatment effect explained (PTE) measure and do not directly address the important questions of whether and how the trial can be ended earlier using the surrogate marker. In this paper, we specifically address these questions by proposing a PTE measure to quantify the feasibility of ending trials early based on endpoint information collected at an earlier landmark point t0 in a time-to-event outcome setting. We provide a framework for deriving an optimally predicted outcome for individual patients at t0 based on a combination of surrogate marker and event time information in the presence of censoring. We propose a non-parametric estimator for the PTE measure and derive the asymptotic properties of our estimators. The finite sample performance of our estimators is illustrated via extensive simulation studies and a real data application examining the potential of hemoglobin A1c and fasting plasma glucose to predict treatment effects on long-term diabetes risk based on the Diabetes Prevention Program study.

Keywords: surrogate marker, proportion of treatment effect explained by the surrogate, nonparametric estimation, censored data

1 | INTRODUCTION

Primary endpoints for assessing the effectiveness of a new treatment in randomized clinical trials (RCTs) often require long-term follow-up, especially when the outcome is time to the occurrence of a clinical event. The long duration of RCTs is a challenge in that it hinders timely access to needed treatment as well as increases study costs.1,2,3 Efficient trial designs that shorten trial duration could both reduce trial costs and accelerate decision-making regarding treatment effectiveness and distribution. One potential strategy to shorten trial duration is to use surrogate markers collected at an earlier time point during the course of the trial in place of the long-term outcome. For example, the surrogate marker hemoglobin A1c has been previously considered by the U.S. Food and Drug Administration as the basis for drug approval in patients with Type 2 diabetes.4 Towards this end, the statistical, epidemiological, and clinical research communities have made substantial progress over the past 30 years by proposing, evaluating, and using methods to assess the value of potential surrogate markers.5,6,7,8,9,10,11

Existing methods for quantifying surrogacy can be largely categorized as model based and model free. Model based approaches typically impose regression models relating the outcome to the treatment alone and to both the treatment and the surrogate marker.12,13,14,15 The proportion of the treatment effect on the primary outcome explained by the surrogate marker (PTE) can then be computed as a ratio of the regression coefficients for treatment. However, when the model assumptions fail, resulting PTE estimates are often biased and invalid.13,16 To avoid reliance on model specification, model free non-parametric approaches have recently been proposed as useful alternatives for quantifying PTE.16,17

When the long-term outcome is a time-to-event outcome measured at time t, quantifying the PTE of a surrogate marker measured at an earlier time, denoted by t0 < t, is further complicated by the fact that the surrogate marker may not be observable for those who drop out of the study or who experience the clinical event by t0. Most existing methods require that the surrogate marker be observed for all subjects. However, recently, Parast et al. (2017)18 proposed a non-parametric framework for estimating the PTE of a surrogate marker in the presence of such complications from censoring. While useful, limitations of this method include requiring monotonicity assumptions about the relationship between the surrogate marker and the outcome as well as the support of the surrogate marker being the same among the two treatment groups.

In this paper, we propose an alternative model free PTE measure to quantify the feasibility of ending trials early based on surrogate information collected at an earlier time t0 in a time-to-event outcome setting. We provide a model free framework for deriving an optimally predicted outcome for individual patients based on a combination of surrogate marker and event time information collected by t0. This framework is similar to those previously proposed in Price et al. (2018)11 and Wang et al. (2020)17 in that an optimal transformation of the surrogate marker is identified to best approximate the long term outcome. However, the present setting is substantially more challenging due to censoring and the possibility that the surrogate marker may not be available among those who have experienced the primary event or are censored prior to t0. Furthermore, our framework allows us to relax assumptions required by Parast et al. (2017).18

The remainder of the paper is organized as follows. We first derive the optimal transformation function of the surrogate information, gopt, define the model free PTE measure and then provide non-parametric estimation procedures for gopt and PTE. We derive the asymptotic properties of our estimators, demonstrate good finite sample performance of our estimators using a simulation study, and illustrate our approach by examining two potential surrogate markers for diabetes, hemoglobin A1c and fasting plasma glucose, using data from the Diabetes Prevention Program (DPP) study.

2 | METHODOLOGY

2.1 | An Optimal Prediction Function g and Model Free Definitions of PTE

Let $T$ denote the primary event time outcome and $S$ be the surrogate marker measured at time $t_0 < t$, which can be either discrete or continuous. We treat $S$ as continuous for conciseness of presentation, but the proposed methods can be easily modified to accommodate discrete $S$. Due to censoring, for $T$ one can only observe $X = \min(T, C)$ and $\delta = I(T \le C)$, where $C$ is the censoring time. In addition, we allow $S$ to be unobserved for those with $X < t_0$. Under the standard causal inference framework, let $T^{(a)}$, $C^{(a)}$, $S^{(a)}$ denote the respective potential event time, censoring time, and surrogate under treatment $A = a \in \{0, 1\}$. In practice, $\{T^{(1)}, C^{(1)}, S^{(1)}\}$ and $\{T^{(0)}, C^{(0)}, S^{(0)}\}$ cannot both be observed for the same subject. We assume that treatment assignment is random with $P(A = a) = 0.5$. The observable data for analysis consist of $n$ independent and identically distributed random vectors $\mathcal{D} = \{D_i = (X_i, \delta_i, S_i I(X_i \ge t_0), A_i), i = 1, \ldots, n\}$, where $T_i = T_i^{(1)} A_i + T_i^{(0)}(1 - A_i)$, $C_i = C_i^{(1)} A_i + C_i^{(0)}(1 - A_i)$, $S_i = S_i^{(1)} A_i + S_i^{(0)}(1 - A_i)$ is only observed for those with $X_i \ge t_0$, and $C_i^{(a)}$ is assumed to be independent of $(T_i^{(a)}, S_i^{(a)})$ with $P(C_i^{(a)} > t) > 0$ for $a = 0, 1$.

We define the treatment effect, Δ(t), as the risk difference at time t:

$$\Delta(t) = \mu_1(t) - \mu_0(t), \quad \text{where } \mu_a(t) = E(Y_t^{(a)}) \text{ and } Y_t^{(a)} = I(T^{(a)} > t).$$

Our goal is to evaluate to what extent surrogate information on $T$ and $S$ collected by $t_0$, defined as $Q_{t_0} = (Y_{t_0}, Y_{t_0} S)$, can be used to approximate the long term treatment effect $\Delta(t)$. We consider $Q_{t_0}$ instead of only $S$ at $t_0$ as in Parast et al.18 since for those with $T < t_0$, $Y_{t_0}$ is observed while $S$ is not observable. Let $g(Q_{t_0}) = (1 - Y_{t_0}) g_1 + Y_{t_0} g_2(S)$ be the prediction function that maps $Q_{t_0} \in \{0, 1\} \times \mathbb{R}$ to a predicted outcome, and define the treatment effect on the predicted outcome $g(Q_{t_0})$ as

$$\Delta_g(t_0) = \mu_{g,1}(t_0) - \mu_{g,0}(t_0), \quad \text{where } \mu_{g,a}(t_0) = E\{g(Q_{t_0}^{(a)})\}.$$

We aim to study to what extent the estimated treatment effect Δg(t0) can be used to approximate the target treatment effect Δ(t) and how to choose g.

It is essential to identify an optimal prediction function, $g_{\mathrm{opt}}(Q_{t_0}) = (1 - Y_{t_0}) g_{1,\mathrm{opt}} + Y_{t_0} g_{2,\mathrm{opt}}(S)$, such that $g_{\mathrm{opt}}(Q_{t_0})$ maximally predicts $Y_t$. Throughout, we suppress $t_0$ from the notation for $g_{\mathrm{opt}}$, although $g_{\mathrm{opt}}$ and its estimator inherently depend on $t_0$. To this end, we follow the strategy of Wang et al.17 and identify an optimal $g$ that minimizes a mean squared error

$$L_{\mathrm{oracle}}(g) = E\left[(Y_t^{(1)} - Y_t^{(0)}) - \{g(Q_{t_0}^{(1)}) - g(Q_{t_0}^{(0)})\}\right]^2.$$

That is, we aim to find an optimal prediction function $g_{\mathrm{opt}}(\cdot)$ such that $g_{\mathrm{opt}}(Q_{t_0}^{(1)}) - g_{\mathrm{opt}}(Q_{t_0}^{(0)})$ best approximates $Y_t^{(1)} - Y_t^{(0)}$. However, $(Y_t^{(1)}, S^{(1)})$ and $(Y_t^{(0)}, S^{(0)})$ cannot be observed simultaneously for one individual in practice and thus the correlations between $(Y_t^{(1)}, Q_{t_0}^{(1)})$ and $(Y_t^{(0)}, Q_{t_0}^{(0)})$ are not identifiable. Therefore, we instead aim to minimize $L_{\mathrm{oracle}}(g)$ under the working independence assumption:

(A1) $(Y_t^{(1)}, S^{(1)} Y_{t_0}^{(1)}) \perp (Y_t^{(0)}, S^{(0)} Y_{t_0}^{(0)})$.

Importantly, we use (A1) only to construct gopt(·); all proposed inference procedures do not require this working assumption to hold (see Section 5 for more discussion). Under (A1), Loracle(g) is reduced to

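$$L(g) = E\left[\{Y_t^{(1)} - g(Q_{t_0}^{(1)})\}^2 + \{Y_t^{(0)} - g(Q_{t_0}^{(0)})\}^2\right] - 2 E\{Y_t^{(1)} - g(Q_{t_0}^{(1)})\} E\{Y_t^{(0)} - g(Q_{t_0}^{(0)})\}.$$

Since $g_{\mathrm{opt}}(\cdot)$ is only identifiable up to a constant shift in $L_{\mathrm{oracle}}(g)$, we define $g_{\mathrm{opt}}(\cdot)$ as the following constrained minimizer to make $g_{\mathrm{opt}}(\cdot)$ location identifiable as well as to simplify the minimization problem:

$$\min_{g \in \mathcal{F}} E\left[\{Y_t - g(Q_{t_0})\}^2\right] \quad \text{under the constraint} \quad E\left[\{Y_t - g(Q_{t_0})\} \mid A = 0\right] = 0, \tag{1}$$

where $\mathcal{F}$ is the class of measurable functions. Using the fact that the primary outcome can be rewritten as $Y_t = I(T > t) = I(T > t_0) I(T > t) = I(T > t_0) Y_t$, it can be shown that

$$E\left[\{Y_t - g(Q_{t_0})\}\right] = E\left[I(T \le t_0)(0 - g_1) + I(T > t_0)\{I(T > t) - g_2(S)\}\right],$$
$$E\left[\{Y_t - g(Q_{t_0})\}^2\right] = E\left[I(T \le t_0) g_1^2 + I(T > t_0)\{I(T > t) - g_2(S)\}^2\right].$$

So (1) is equivalent to the following optimization problem for $g_1$ and $g_2(\cdot)$,

$$\min_{g_1, g_2} E\left[(1 - Y_{t_0}) g_1^2 + Y_{t_0}\{Y_t - g_2(S)\}^2\right] \quad \text{under the constraint} \quad E\left[(1 - Y_{t_0})(-g_1) + Y_{t_0}\{Y_t - g_2(S)\} \mid A = 0\right] = 0.$$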

In Appendix A, we show that the solution to the above optimization problem is

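$$g_{1,\mathrm{opt}} = \lambda \frac{P(T^{(0)} \le t_0)}{P(T \le t_0)} = \lambda \frac{1 - \mu_0(t_0)}{1 - \mu(t_0)}, \qquad g_{2,\mathrm{opt}}(s) = m(s \mid t_0) + \lambda \frac{f_0(s \mid t_0)\, \mu_0(t_0)}{f(s \mid t_0)\, \mu(t_0)}, \tag{2}$$

where $\mu(t) = E(Y_t) = P(T > t)$, $\mu_a(t) = E(Y_t \mid A = a) = P(T > t \mid A = a)$, $m(s \mid t_0) = E(Y_t \mid S = s, T > t_0)$, $f_a(s \mid t_0)$ and $f(s \mid t_0)$ are the respective conditional density functions of $S \mid (T > t_0, A = a)$ and $S \mid T > t_0$,

$$\lambda = c \left\{ \int \frac{f_0(s \mid t_0)^2 \mu_0(t_0)^2}{f(s \mid t_0)\, \mu(t_0)}\, ds + \frac{\{1 - \mu_0(t_0)\}^2}{1 - \mu(t_0)} \right\}^{-1}$$

with

$$c = E\left[Y_{t_0}\{Y_t - m(S \mid t_0)\} \mid A = 0\right] = \frac{\mu_1(t_0)\, \mu_0(t_0)}{\mu_1(t_0) + \mu_0(t_0)} \int \{m_0(s \mid t_0) - m_1(s \mid t_0)\} \frac{f_1(s \mid t_0)\, f_0(s \mid t_0)}{f(s \mid t_0)}\, ds$$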

and ma(s|t0)=E(Yt|A=a,S=s,T>t0), a = 0, 1. The magnitude of λ tends to be relatively small since for a good surrogate marker S, m0(s|t0) − m1(s|t0) is small (note that when m0(s|t0) − m1(s|t0) is small, c is small, and thus, λ is small, resulting in a small g1,opt). Thus, g1,opt is usually very small and g2,opt is often dominated by m(s|t0).

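With $g_{\mathrm{opt}}(\cdot)$, we may approximate the treatment effect on $Y_t$ by the treatment effect on $g_{\mathrm{opt}}(Q_{t_0}) = I(T \le t_0)\, g_{1,\mathrm{opt}} + I(T > t_0)\, g_{2,\mathrm{opt}}(S)$,

$$\Delta_{g_{\mathrm{opt}}}(t_0) = E\{g_{\mathrm{opt}}(Q_{t_0}^{(1)}) - g_{\mathrm{opt}}(Q_{t_0}^{(0)})\}.$$

We then define the PTE of $Q_{t_0}$ as

$$\mathrm{PTE}_{g_{\mathrm{opt}}}(t_0) = \Delta_{g_{\mathrm{opt}}}(t_0) / \Delta(t); \tag{3}$$

that is, the proportion of the treatment effect on the long-term outcome that is captured by the treatment effect on the transformed surrogate information. As defined, the measure $\mathrm{PTE}_{g_{\mathrm{opt}}}(t_0)$ is expected to be close to 1 if $g_{\mathrm{opt}}(Q_{t_0})$ is a perfect surrogate and to be close to 0 if $g_{\mathrm{opt}}(Q_{t_0})$ is a useless surrogate.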
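As a purely illustrative check of the closed form (2) and the definition (3), the sketch below evaluates them for a hypothetical three-level discrete surrogate, replacing densities by probability mass functions and the integral by a sum, and assuming P(A = 1) = 0.5. All numeric inputs and function names are invented for illustration and do not come from the paper or its R package.

```python
import numpy as np

def g_opt_discrete(mu1_t0, mu0_t0, f1, f0, m1, m0):
    """Evaluate (2) for a discrete surrogate, assuming P(A = 1) = 0.5.

    mu{a}_t0: P(T > t0 | A = a); f{a}: pmf of S given T > t0, A = a;
    m{a}: P(T > t | S = s, T > t0, A = a). All inputs are hypothetical.
    """
    f1, f0, m1, m0 = (np.asarray(v, dtype=float) for v in (f1, f0, m1, m0))
    mu_t0 = 0.5 * (mu1_t0 + mu0_t0)                      # pooled P(T > t0)
    f = (mu1_t0 * f1 + mu0_t0 * f0) / (mu1_t0 + mu0_t0)  # pooled pmf of S | T > t0
    m = (mu1_t0 * f1 * m1 + mu0_t0 * f0 * m0) / (mu1_t0 * f1 + mu0_t0 * f0)
    c = mu0_t0 * np.sum((m0 - m) * f0)                   # E[Y_t0 {Y_t - m(S)} | A = 0]
    lam = c / (np.sum(f0**2 * mu0_t0**2 / (f * mu_t0))
               + (1 - mu0_t0)**2 / (1 - mu_t0))
    g1 = lam * (1 - mu0_t0) / (1 - mu_t0)
    g2 = m + lam * f0 * mu0_t0 / (f * mu_t0)
    return g1, g2

def pte_discrete(mu1_t0, mu0_t0, f1, f0, m1, m0):
    """Population PTE in (3) for the same discrete toy model."""
    g1, g2 = g_opt_discrete(mu1_t0, mu0_t0, f1, f0, m1, m0)
    f1, f0, m1, m0 = (np.asarray(v, dtype=float) for v in (f1, f0, m1, m0))
    mu_g1 = (1 - mu1_t0) * g1 + mu1_t0 * np.sum(f1 * g2)
    mu_g0 = (1 - mu0_t0) * g1 + mu0_t0 * np.sum(f0 * g2)
    delta = mu1_t0 * np.sum(f1 * m1) - mu0_t0 * np.sum(f0 * m0)
    return (mu_g1 - mu_g0) / delta

# Hypothetical inputs: treated patients survive longer and have lower S-risk levels.
toy = dict(mu1_t0=0.9, mu0_t0=0.8,
           f1=[0.5, 0.3, 0.2], f0=[0.2, 0.3, 0.5],
           m1=[0.9, 0.8, 0.6], m0=[0.85, 0.7, 0.5])
g1_toy, g2_toy = g_opt_discrete(**toy)
pte_toy = pte_discrete(**toy)
```

In this toy setting the constraint in (1) can be verified directly: the control-arm mean of the predicted outcome equals the control-arm survival probability at t.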

Remark 1.

Note that we define the prediction transformation function of the surrogate information $Q_{t_0}$ to be $g(Q_{t_0}) = (1 - Y_{t_0}) g_1 + Y_{t_0} g_2(S)$. Another natural prediction function is $g(Q_{t_0}) = (1 - Y_{t_0}) Y_{t_0} + Y_{t_0} g_2(S) = (1 - Y_{t_0}) \cdot 0 + Y_{t_0} g_2(S) = Y_{t_0} g_2(S)$, since for those with $T < t_0$, $Y_{t_0} = 0$ is observed, and $g_1$ in the first transformation function can be regarded as a function of 0, or $g_1 \equiv g_1(0)$. Since the optimal $g_{1,\mathrm{opt}}$ is usually very small, these two definitions of transformation functions of the surrogate information are very close. While we focus on the first transformation function in the paper, in Appendix B we also derive the form of the optimal function of the surrogate and a PTE similar to (3) with this optimal function, i.e., with $g_1 \equiv 0$, and provide simulation results comparing the two approaches.

As mentioned previously, Parast et al. (2017)18 also offer a PTE measure in a time-to-event outcome setting, which we denote by PTEL, and which is indexed by a reference distribution of the surrogate marker. We show in Appendix B that there is a correspondence between our proposed PTE with the second transformation and the PTE definition of Parast et al. (2017)18 with a particular reference distribution in their definition of PTE. The correspondence will approximately hold for the PTE in Section 2. In addition, the proposed new definition of PTE, for both transformations, also relaxes strong assumptions required by Parast et al. (2017).18 Specifically, to ensure that PTE is between 0 and 1, they require that P(S(1) > s, T(1) > t0) ≥ P(S(0) > s, T(0) > t0) for all s, P(T(1) > t|S(1) = s, T(1) > t0) ≥ P(T(0) > t|S(0) = s, T(0) > t0) for all s, P(T(1) > t|S(1) = s, T(1) > t0) is a monotone increasing function of s, and S(1) and S(0) have the same support. We show in Appendix B that our proposed PTE only needs the following two relaxed conditions to guarantee that it is between 0 and 1:

(C1) $P(U^{(1)} > u, T^{(1)} > t_0) > P(U^{(0)} > u, T^{(0)} > t_0)$ for all $u$;
(C2) $P(T^{(1)} > t \mid U^{(1)} = u, T^{(1)} > t_0) > P(T^{(0)} > t \mid U^{(0)} = u, T^{(0)} > t_0)$ for all $u$,

where $U^{(a)} \equiv g_{2,\mathrm{opt}}(S^{(a)}) \approx m(S^{(a)} \mid t_0)$. Condition (C1) implies that the distribution of $U$ in group 1 is stochastically higher than the distribution of $U$ in group 0 given $Y_{t_0} = 1$; condition (C2) implies that the conditional mean of $Y_t$ given $U$ and $Y_{t_0} = 1$ in group 1 is larger than the conditional mean of $Y_t$ given $U$ and $Y_{t_0} = 1$ in group 0. These two conditions are more likely to be satisfied with the transformed surrogate $U$ than with the original $S$ as required by Wang and Taylor (2002)14 and Parast et al. (2017).18 In addition, there is no requirement on the shared support of the surrogate marker between the two treatment arms. When examining a surrogate marker that has some prior evidence supporting its potential utility as a surrogate, it is likely reasonable to expect these conditions to hold. If $g_{2,\mathrm{opt}}(s)$ is monotone increasing in $s$, we may replace $U^{(a)}$ by $S^{(a)}$ in these two conditions. Lastly, these two conditions can be examined empirically in practice. We compare our proposed approach to that of Parast et al. (2017)18 in our numerical studies.

2.2 | Non-parametric estimation of gopt and PTEgopt

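In practice, $g_{1,\mathrm{opt}}$ and $g_{2,\mathrm{opt}}(s)$ are unknown and need to be estimated. To this end, we propose to use kernel smoothing to non-parametrically estimate the components of $g_{1,\mathrm{opt}}$ and $g_{2,\mathrm{opt}}(s)$, with inverse probability weights (IPW) to account for censoring, noting our earlier assumption that $C_i^{(a)}$ is independent of $(T_i^{(a)}, S_i^{(a)})$ and $P(C_i^{(a)} > t) > 0$ for $a = 0, 1$. Specifically, we estimate $f_a(s \mid t_0)$, $f(s \mid t_0)$, $\mu_a(t_0)$, $\mu(t_0)$, $m(s \mid t_0)$, and $c = E\{Y_t - m(S \mid t_0) \mid A = 0, T > t_0\}\, \mu_0(t_0)$ respectively as

$$\hat f_a(s \mid t_0) = \frac{\sum_{A_i = a} K_h(S_i - s)\, I(X_i > t_0)\, \hat\omega_{t_0,i}}{\sum_{A_i = a} I(X_i > t_0)\, \hat\omega_{t_0,i}}, \qquad \hat f(s \mid t_0) = \frac{\sum_{i=1}^n K_h(S_i - s)\, I(X_i > t_0)\, \hat\omega_{t_0,i}}{\sum_{i=1}^n I(X_i > t_0)\, \hat\omega_{t_0,i}},$$

$$\hat\mu_a(t_0) = \frac{\sum_{A_i = a} I(X_i > t_0)\, \hat\omega_{t_0,i}}{\sum_{A_i = a} \hat\omega_{t_0,i}}, \qquad \hat\mu(t_0) = \frac{\sum_{i=1}^n I(X_i > t_0)\, \hat\omega_{t_0,i}}{\sum_{i=1}^n \hat\omega_{t_0,i}},$$

$$\hat m(s \mid t_0) = \frac{\sum_{i=1}^n K_h(S_i - s)\, I(X_i > t)\, \hat\omega_{t,i} \big/ \sum_{i=1}^n \hat\omega_{t,i}}{\sum_{i=1}^n K_h(S_i - s)\, I(X_i > t_0)\, \hat\omega_{t_0,i} \big/ \sum_{i=1}^n \hat\omega_{t_0,i}},$$

$$\hat c = \frac{\sum_{A_i = 0} I(X_i > t)\, \hat\omega_{t,i}}{\sum_{A_i = 0} \hat\omega_{t,i}} - \frac{\sum_{A_i = 0} \hat m(S_i \mid t_0)\, I(X_i > t_0)\, \hat\omega_{t_0,i}}{\sum_{A_i = 0} \hat\omega_{t_0,i}},$$

and

$$\hat\lambda = \hat c \left\{ \int \frac{\hat f_0(s \mid t_0)^2\, \hat\mu_0(t_0)^2}{\hat f(s \mid t_0)\, \hat\mu(t_0)}\, ds + \frac{\{1 - \hat\mu_0(t_0)\}^2}{1 - \hat\mu(t_0)} \right\}^{-1},$$

where $\hat\omega_{t,i} = \{I(X_i \le t)\, \delta_i + I(X_i > t)\} / \hat G_{A_i}(X_i \wedge t)$ is the weight accounting for censoring, $\hat G_a(t)$ is the Kaplan-Meier estimator of $G_a(t) = P(C > t \mid A = a)$, and $K_h(\cdot) = K(\cdot / h)/h$, with $K(\cdot)$ a symmetric kernel function and $h$ the bandwidth. As is often the case with nonparametric functional estimation procedures, the choice of the bandwidth $h$ is critical. In order to eliminate the impact of the bias of the estimated conditional functions on the resulting estimator, we require the standard undersmoothing assumption of $h = O(n^{-\nu})$, $\nu \in (1/5, 1/2)$. The choice of $h$ can have a great impact on the resulting estimation procedure, especially in smaller sample sizes, and care is needed in its selection. We choose the bandwidth $h = h_{\mathrm{opt}}\, n^{-c_0}$ to ensure the needed undersmoothing, where $h_{\mathrm{opt}}$ is obtained using the procedure of Scott (1992)19, and in all numerical examples we choose $c_0 = 0.11$. Alternative choices for $c_0$ may be used as a sensitivity analysis, as may alternative approaches to select the bandwidth $h_{\mathrm{opt}}$, such as cross-validation, as long as the needed undersmoothing rate is achieved.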
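The censoring weights and the undersmoothed bandwidth described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the Kaplan-Meier step assumes no tied observation times and is meant to be applied within one treatment arm, and the normal reference rule 1.06 · sd · n^(−1/5) is used here as an assumed stand-in for the bandwidth procedure of Scott (1992).

```python
import numpy as np

def km_censoring(x, delta):
    """Kaplan-Meier estimate of G(u) = P(C > u) within one arm.

    Censoring (delta == 0) is treated as the event of interest.
    Assumes no tied observation times.
    """
    order = np.argsort(x)
    xs = x[order]
    censored = 1.0 - delta[order]
    at_risk = len(xs) - np.arange(len(xs))
    surv = np.cumprod(1.0 - censored / at_risk)

    def G(u):
        idx = np.searchsorted(xs, u, side="right") - 1  # last index with xs <= u
        return np.where(idx >= 0, surv[np.clip(idx, 0, None)], 1.0)

    return G

def ipw_weights(x, delta, t, G):
    """w_{t,i} = {I(X_i <= t) delta_i + I(X_i > t)} / G(min(X_i, t)).

    Requires t to be smaller than the largest observed time so that G > 0.
    """
    known = ((x <= t) & (delta == 1)) | (x > t)
    return known / G(np.minimum(x, t))

def undersmoothed_bandwidth(s, c0=0.11):
    """h = h_opt * n^(-c0), with h_opt from an assumed normal reference rule."""
    n = len(s)
    h_opt = 1.06 * np.std(s, ddof=1) * n ** (-1 / 5)
    return h_opt * n ** (-c0)
```

With these weights, a survival probability such as μ_a(t) is estimated by the weighted proportion of subjects with X_i > t in arm a.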

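Based on these quantities, we may construct a plug-in estimate for $(g_{1,\mathrm{opt}}, g_{2,\mathrm{opt}}(s))$, denoted by $(\hat g_1, \hat g_2(s))$:

$$\hat g_1 = \hat\lambda\, \frac{1 - \hat\mu_0(t_0)}{1 - \hat\mu(t_0)}, \qquad \hat g_2(s) = \hat m(s \mid t_0) + \hat\lambda\, \frac{\hat f_0(s \mid t_0)\, \hat\mu_0(t_0)}{\hat f(s \mid t_0)\, \hat\mu(t_0)}. \tag{4}$$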
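The plug-in construction in (4) can be sketched as below under several stated assumptions: a Gaussian kernel, the integral approximated by the trapezoidal rule on a user-supplied grid, and censoring weights passed in as arrays (all ones when there is no censoring). The function name `fit_g_opt` and its signature are hypothetical and are not the interface of the OSsurvival package.

```python
import numpy as np

def gauss_kernel(u, h):
    """Gaussian kernel K_h(u) = K(u / h) / h."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def trapezoid(y, grid):
    """Trapezoidal approximation of the integral of y over grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid)))

def fit_g_opt(x, s, a, w_t, w_t0, t, t0, h, grid):
    """Plug-in estimates (g1_hat, g2_hat evaluated on grid), following (4).

    w_t, w_t0: IPW censoring weights at t and t0. Entries of s for subjects
    with x <= t0 may be NaN; they receive zero weight.
    """
    alive0, alive, ctrl = x > t0, x > t, a == 0
    s = np.where(alive0, s, 0.0)   # placeholder where S is unobserved
    v_t0 = alive0 * w_t0           # I(X_i > t0) * w_{t0,i}
    v_t = alive * w_t              # I(X_i > t)  * w_{t,i}

    def smooth(points, weights):
        # sum_i K_h(S_i - point) * weights_i, for each evaluation point
        return gauss_kernel(s[None, :] - np.asarray(points)[:, None], h) @ weights

    def m_hat(points):
        return (smooth(points, v_t) / w_t.sum()) / (smooth(points, v_t0) / w_t0.sum())

    mu0 = v_t0[ctrl].sum() / w_t0[ctrl].sum()
    mu = v_t0.sum() / w_t0.sum()
    f0 = smooth(grid, v_t0 * ctrl) / v_t0[ctrl].sum()
    f = smooth(grid, v_t0) / v_t0.sum()

    keep = ctrl & alive0
    # c_hat estimates E[Y_t - Y_t0 * m(S | t0) | A = 0]
    c = (v_t[ctrl].sum() / w_t[ctrl].sum()
         - np.sum(m_hat(s[keep]) * w_t0[keep]) / w_t0[ctrl].sum())
    lam = c / (trapezoid(f0**2 * mu0**2 / (f * mu), grid) + (1 - mu0)**2 / (1 - mu))
    g1 = lam * (1 - mu0) / (1 - mu)
    g2 = m_hat(grid) + lam * f0 * mu0 / (f * mu)
    return g1, g2
```

The grid should stay within the range of the observed surrogate values, since the ratio of kernel density estimates becomes unstable where both are close to zero.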

In Appendix C, we show that $n^{1/2}\{\hat g_1 - g_{1,\mathrm{opt}}\}$ and $(nh)^{1/2}\{\hat g_2(s) - g_{2,\mathrm{opt}}(s)\}$ converge jointly in distribution to a multivariate normal distribution with mean 0 and variance-covariance Σ2(s).

To estimate PTEgopt(t0), we also adopt the IPW strategy and estimate Δ(t) and Δgopt(t0) and then PTEgopt(t0) respectively, by

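$$\hat\Delta(t) = \hat\mu_1(t) - \hat\mu_0(t), \qquad \hat\Delta_g(t_0) = \hat\mu_{g,1}(t_0) - \hat\mu_{g,0}(t_0), \qquad \text{and} \qquad \widehat{\mathrm{PTE}}_{\hat g}(t_0) = \hat\Delta_{\hat g}(t_0) / \hat\Delta(t),$$

where

$$\hat\mu_a(t) = \frac{\sum_{i=1}^n \hat\omega_{t,i}\, I(A_i = a)\, Y_{t,i}}{\sum_{i=1}^n \hat\omega_{t,i}\, I(A_i = a)}, \qquad \hat\mu_{g,a}(t_0) = \frac{\sum_{i=1}^n \hat\omega_{t_0,i}\, I(A_i = a)\, g(Q_{t_0,i})}{\sum_{i=1}^n \hat\omega_{t_0,i}\, I(A_i = a)},$$

and $Y_{t,i} = I(T_i > t)$. Both $Y_{t,i}$ and $Q_{t_0,i} = (Y_{t_0,i}, Y_{t_0,i} S_i)$ are observed when the weights are non-zero. In Appendix D, we show that $n^{1/2}\{\hat\Delta(t) - \Delta(t)\}$ and $n^{1/2}\{\hat\Delta_g(t_0) - \Delta_g(t_0)\}$, for a given $g$, converge respectively in distribution to $N\{0, \sigma^2(t)\}$ and $N\{0, \sigma_g(t_0)^2\}$. Inference for $\widehat{\mathrm{PTE}}_{\hat g}(t_0)$ can be derived based on $\hat\Delta(t)$, $\hat\Delta_g(t_0)$ and $(\hat g_1, \hat g_2(s))$ along with their asymptotic distributions.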
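For a given prediction function g, the weighted estimators above reduce to arm-specific weighted means. The sketch below illustrates this; `pte_hat` is a hypothetical helper, the censoring weights are supplied by the caller (all ones if nothing is censored), and `g2` is any vectorized function of the surrogate.

```python
import numpy as np

def pte_hat(x, s, a, w_t, w_t0, t, t0, g1, g2):
    """Return (Delta_hat(t), Delta_hat_g(t0), PTE_hat) for a given g = (g1, g2).

    x: observed times; s: surrogate (may be NaN where x <= t0); a: arm (0/1);
    w_t, w_t0: IPW censoring weights at t and t0.
    """
    y_t = (x > t).astype(float)
    alive0 = x > t0
    # g(Q_t0) = (1 - Y_t0) * g1 + Y_t0 * g2(S); a placeholder replaces unobserved S
    g_q = np.where(alive0, g2(np.where(alive0, s, 0.0)), g1)

    def wmean(values, weights, arm):
        mask = a == arm
        return np.sum(weights[mask] * values[mask]) / np.sum(weights[mask])

    delta = wmean(y_t, w_t, 1) - wmean(y_t, w_t, 0)
    delta_g = wmean(g_q, w_t0, 1) - wmean(g_q, w_t0, 0)
    return delta, delta_g, delta_g / delta
```

Subjects censored before t carry zero weight in the first difference, so the value assigned to their unobserved outcome does not affect the estimate.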

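Using the same dataset to estimate both $g_{\mathrm{opt}}$ and its corresponding PTE may lead to overfitting bias, as in standard prediction settings. We thus employ a cross-validation (CV) strategy wherein we estimate $g_{\mathrm{opt}}$ in a training set and the associated PTE in an independent test set. Specifically, let $\{\mathcal{I}_k, k = 1, \ldots, K\}$ be a random partition of the index set $\{1, \ldots, n\}$ into $K$ subsets of equal size, let $\mathcal{I}_{-k} = \{1, \ldots, n\} \setminus \mathcal{I}_k$, and let $\mathcal{D}_{\mathcal{I}} = \{D_i, i \in \mathcal{I}\}$. Let $\hat g_{\mathcal{I}}$ denote $g_{\mathrm{opt}}$ estimated based on $\mathcal{D}_{\mathcal{I}}$. Given $\hat g_{\mathcal{I}_{-k}}$, $\mathrm{PTE}_{g_{\mathrm{opt}}}(t_0)$ is estimated using data in $\mathcal{D}_{\mathcal{I}_k}$, and denoted by $\widehat{\mathrm{PTE}}^{(k)}_{\hat g_{\mathcal{I}_{-k}}}(t_0)$. We then define the CV-based estimator of $\mathrm{PTE}_{g_{\mathrm{opt}}}(t_0)$ as

$$\widehat{\mathrm{PTE}}_{\mathrm{CV}}(t_0) = K^{-1} \sum_{k=1}^{K} \widehat{\mathrm{PTE}}^{(k)}_{\hat g_{\mathcal{I}_{-k}}}(t_0).$$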
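The cross-validation scheme can be sketched generically as follows. The callables `fit_g` (estimate the prediction function on a training index set) and `pte_given_g` (evaluate the PTE on a test index set for a fixed g) are hypothetical placeholders to be supplied by the user; the sketch only shows how folds are formed and averaged, and it assumes that g is fitted on the complement of each test fold.

```python
import numpy as np

def cv_pte(n, K, fit_g, pte_given_g, seed=0):
    """K-fold cross-validated PTE: fit g off-fold, evaluate PTE on-fold, average."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)   # random partition of {0, ..., n-1}
    estimates = []
    for k in range(K):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        g_hat = fit_g(train_idx)                    # g estimated without fold k
        estimates.append(pte_given_g(g_hat, test_idx))
    return float(np.mean(estimates))
```

Because each fold's PTE is computed with a g that never saw that fold, the averaged estimate avoids reusing the same observations for fitting and evaluation.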

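The consistency of $\hat g_{\mathcal{I}_{-k}}$ for $g_{\mathrm{opt}}$ and that of $\widehat{\mathrm{PTE}}^{(k)}_g(t_0)$ for $\mathrm{PTE}_g(t_0)$ guarantee the consistency of $\widehat{\mathrm{PTE}}_{\mathrm{CV}}(t_0)$ for $\mathrm{PTE}_{g_{\mathrm{opt}}}(t_0)$. The asymptotic distribution of $\widehat{\mathrm{PTE}}_{\mathrm{CV}}(t_0) - \mathrm{PTE}_{g_{\mathrm{opt}}}(t_0)$ can be obtained from the asymptotic expansions of $\hat g_{\mathcal{I}_{-k}} - g_{\mathrm{opt}}$ and $\widehat{\mathrm{PTE}}^{(k)}_g(t_0) - \mathrm{PTE}_g(t_0)$. Specifically, when $h = O(n^{-\nu})$ with $\nu \in (1/4, 1/2)$,

$$n^{1/2}\{\widehat{\mathrm{PTE}}_{\mathrm{CV}}(t_0) - \mathrm{PTE}_{g_{\mathrm{opt}}}(t_0)\} = n^{-1/2} \sum_{i=1}^{n} \psi_{\mathrm{PTE}_{g_{\mathrm{opt}}}, i}(t_0) + o_p(1),$$

which converges in distribution to a normal distribution with mean 0 and variance $\tau^2_{\mathrm{PTE}_{g_{\mathrm{opt}}}}(t_0) = E\{\psi_{\mathrm{PTE}_{g_{\mathrm{opt}}}, i}(t_0)^2\}$. Given the complexity of constructing an explicit estimator of $\tau^2_{\mathrm{PTE}_{g_{\mathrm{opt}}}}(t_0)$, we instead employ resampling methods to estimate this quantity; see Appendix E for details.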

3 | SIMULATION STUDIES

We conducted simulation studies to evaluate the finite sample performance of the proposed methods. Throughout, we let n = 1000, t = 1, use a normal density as the kernel function, and consider t0 = 0.3, 0.5 and 0.7. All results are summarized based on 1000 replications for each configuration. For each setting, we investigate estimates of gopt(·) and PTEgopt(t0), at each t0. For comparison of PTEs, we also include results for the estimation of (a) PTEL(t0) from Parast et al. (2017)18, which accounts for censoring but does not use a transformation of the surrogate marker (see Appendix B), and (b) PTEcox(t0) as defined in Lin et al. (1997)13, which utilizes Cox proportional hazards models and restricts estimation to patients with X > t0.

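We generated censoring times $C^{(a)} \sim \text{exponential}(0.5)$,

$$S^{(a)} \sim \text{Gamma}(\text{shape} = \alpha_{1a}, \text{scale} = \alpha_{2a}), \quad \text{and} \quad \log T^{(a)} = \epsilon^{(a)} - \log \xi_a(S^{(a)}),$$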

where ϵ(a) ~ an extreme value distribution and the hyper-parameters αa = (α1a, α2a) and ξa(·) are chosen to reflect different levels of surrogacy. In setting 1, we let α1 = (2, 2), α0 = (9, 0.5), ξ1(s) = 0.2s, and ξ0(s) = 0.2 + 0.22s to represent a strong surrogate marker setting where PTEgopt(t0) ranges from 0.68 to 0.83. In setting 2, we let α1 = (9, 0.8), α0 = (2, 2), ξ1(s) = 0.08s, and ξ0(s) = 0.3 + 0.22s to represent a moderate surrogate marker setting where PTEgopt(t0) ranges from 0.45 to 0.78. In setting 3, we consider the situation that S(1) and S(0) are correlated, so the working independence assumption (A1) does not hold. We let S(1) ~ Gamma(shape = 2, scale = 2), S(0) = S(1) + U, where U ~ Uniform(0, 1), ξ1(s) = 0.2s, and ξ0(s) = 0.2 + 0.22s.
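A data generator for setting 1 might look like the sketch below. Two readings of the text are assumptions made here and flagged in the comments: the error term is taken to be a standard (minimum) extreme value variable, so that the event time given the surrogate is exponential with rate ξ_a(S), and "exponential(0.5)" is read as a censoring rate of 0.5. The function name is hypothetical.

```python
import numpy as np

def simulate_setting1(n, t0, rng):
    """Simulate (X, delta, S observed only if X >= t0, A) under setting 1."""
    a = rng.integers(0, 2, n)                       # 1:1 randomization
    shape = np.where(a == 1, 2.0, 9.0)              # alpha_1 = (2, 2), alpha_0 = (9, 0.5)
    scale = np.where(a == 1, 2.0, 0.5)
    s = rng.gamma(shape, scale)
    xi = np.where(a == 1, 0.2 * s, 0.2 + 0.22 * s)  # xi_1(s), xi_0(s)
    # Assumption: eps = log(E) with E ~ Exp(1), so T = exp(eps - log xi) = E / xi
    t_event = rng.exponential(1.0, n) / xi
    # Assumption: exponential(0.5) denotes rate 0.5, i.e. mean censoring time 2
    c = rng.exponential(1.0 / 0.5, n)
    x = np.minimum(t_event, c)
    delta = (t_event <= c).astype(int)
    s_obs = np.where(x >= t0, s, np.nan)            # S unobserved when X < t0
    return x, delta, s_obs, a
```

If the scale parameterization of the censoring distribution is intended instead, only the line generating `c` changes.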

As shown in Table 1 and Figures 1 and 2, the estimated prediction functions $\hat g(\cdot) = (\hat g_1, \hat g_2(\cdot))$ across different values of $s$ generally had negligible biases and the estimated standard errors were close to their empirical standard errors.

TABLE 1.

Estimates (Est) of g1 along with their empirical standard errors (ESE). We also present the average of the estimated standard errors (ASE) along with the empirical coverage probabilities (CP) of the 95% confidence intervals.

Setting 1
t0     g1.true   Est      ESE     ASE     CP
0.3    −0.034    −0.037   0.013   0.013   0.954
0.5    −0.022    −0.027   0.012   0.012   0.929
0.7    −0.022    −0.019   0.009   0.009   0.918

Setting 2
t0     g1.true   Est      ESE     ASE     CP
0.3    −0.077    −0.077   0.013   0.013   0.938
0.5    −0.051    −0.051   0.012   0.011   0.921
0.7    −0.028    −0.030   0.009   0.009   0.934

Setting 3
t0     g1.true   Est      ESE     ASE     CP
0.3    −0.044    −0.045   0.016   0.015   0.931
0.5    −0.031    −0.033   0.014   0.013   0.927
0.7    −0.018    −0.021   0.010   0.010   0.941

FIGURE 1.

The empirical bias, empirical standard error (ESE) versus average of the estimated standard error (ASE), coverage probabilities (CP) of the 95% confidence intervals for g^2(s) under Setting 1.

FIGURE 2.

The empirical bias, empirical standard error (ESE) versus average of the estimated standard error (ASE), coverage probabilities (CP) of the 95% confidence intervals for g^2(s) under Setting 2.

Simulation results for PTEgopt(t0) at t0 ∈ {0.3, 0.5, 0.7} are shown in Table 2. In setting 1, PTEgopt(t0) was 0.68, 0.77 and 0.83 for t0 = 0.3, 0.5 and 0.7 with an increasing trend over t0. In setting 2, the PTE of gopt(Qt0) ranged from 0.45 to 0.78. In setting 3, the PTE of gopt(Qt0) ranged from 0.52 to 0.80. Our proposed estimators had negligible biases and estimated standard errors were close to their empirical standard errors with confidence intervals attaining appropriate empirical coverages. The results also indicate that the proposed method is very robust, whether the working independence assumption holds or not.

TABLE 2.

Estimates (Est) of PTEgopt(t0) (where PTE denotes the proposed estimate), PTEcox(t0) and PTEL(t0) at t0 = 0.3, 0.5 and 0.7 along with their empirical standard errors (ESE; shown in parentheses for PTEL and PTEcox). Shown also are the average of the estimated standard errors (ASE) along with the empirical coverage probabilities (CP) of the 95% confidence intervals for our proposed estimator PTE^cv(t0).

Setting 1
t0     PTEL (ESE)        PTEcox (ESE)      PTE.true   PTE     ESE     ASE     CP
0.3    0.564 (0.119)     0.121 (0.130)     0.684      0.685   0.093   0.097   0.955
0.5    0.680 (0.114)     0.103 (0.180)     0.767      0.761   0.087   0.094   0.965
0.7    0.781 (0.095)     0.086 (0.229)     0.827      0.817   0.080   0.088   0.968

Setting 2
t0     PTEL (ESE)        PTEcox (ESE)      PTE.true   PTE     ESE     ASE     CP
0.3    −0.084 (0.281)    −0.994 (0.259)    0.448      0.450   0.099   0.101   0.939
0.5    0.304 (0.230)     −1.009 (0.299)    0.626      0.619   0.089   0.097   0.965
0.7    0.607 (0.145)     −1.017 (0.330)    0.778      0.762   0.076   0.087   0.977

Setting 3
t0     PTEL (ESE)        PTEcox (ESE)      PTE.true   PTE     ESE     ASE     CP
0.3    0.476 (0.121)     0.098 (0.172)     0.517      0.510   0.128   0.122   0.927
0.5    0.618 (0.112)     0.042 (0.208)     0.659      0.633   0.120   0.121   0.932
0.7    0.751 (0.094)     −0.009 (0.269)    0.800      0.745   0.101   0.115   0.959

Existing PTE estimators, including PTEL(t0) and PTEcox(t0), are also shown in Table 2. Since the Cox model did not hold for T | A, the model based estimate PTEcox(t0) gave a very low PTE estimate in setting 1 and even negative results in setting 2. Our proposed estimates of PTEgopt were of similar magnitude as those given by PTEL in setting 1. However, since PTEL is not guaranteed to be valid when the supports of S in the two treatment groups are not the same which was the case in setting 2, estimates of PTEL were substantially flawed in this setting, particularly when t0 = 0.3 or 0.5. In Appendix F we also present (1) PTE results without cross-validation for comparison and (2) additional simulation results examining sensitivity to the working independence assumption (A1).

4 | APPLICATION TO THE DIABETES PREVENTION PROGRAM STUDY

The Diabetes Prevention Program (DPP) RCT investigated the effect of several prevention strategies for reducing the risk of type 2 diabetes (T2D) among high risk individuals with pre-diabetes.20,21 DPP data are publicly available through the National Institute of Diabetes and Digestive and Kidney Diseases Central Repository.22 The participants were randomized to one of four treatment groups: placebo, lifestyle intervention, metformin and troglitazone. The primary endpoint of the trial was time to T2D onset and the participants were followed up to 5 years with an average follow up of 2.8 years. Previous study results have shown that both lifestyle and metformin significantly reduced T2D risk.20,21 For illustration, we focused on the comparison of lifestyle intervention (n1 = 1024) to placebo (n0 = 1030) with respect to diabetes risk at t = 4 years; the estimated treatment effect was $\hat\Delta(4) = 0.157$. Our goal was to investigate to what extent surrogate information on hemoglobin A1c (HbA1c) or fasting glucose collected at t0 ∈ {0.5, 1, 2, 3} can be used together with diabetes event information by t0 to predict the treatment effect on diabetes risk at t = 4 years. We evaluate the surrogacy potential of these markers (change from baseline to t0) based on the proposed PTE measure.

The PTE estimates of each marker as well as g1,opt estimates for the proposed method are shown in Table 3, and estimates of g2,opt(s) are in Figure 4. Estimates of g1 are all close to 0 and estimates of g2,opt(s) are substantially different from s itself, though at later times the estimated g2(s) is flatter, implying similar contributions to the treatment effect at different s for these later times. The observation that g2,opt(s) is different from s itself is important because many of the available methods for evaluating a surrogate marker focus on s itself, not any kind of transformation of s, let alone an optimal transformation. If we saw that, in contrast, g2,opt(s) was essentially the same as s itself, this would imply that a transformation of s is not necessary. The PTE estimates were moderate at t0 = 0.5 and 1 (approximately 0.3 to 0.6) but increased to 0.8 at t0 = 2 and almost 1 at t0 = 3. These results show that the treatment effect on later year surrogate information explained a larger proportion of the treatment effect on 4-year survival, as would be expected. Comparing the two potential surrogates, at earlier times, glucose appeared to have a slightly higher PTE compared to that of HbA1c, with the difference being more pronounced at t0 = 1. At later times, the PTEs of the two potential surrogates were comparable and close to 1. Results from PTEL were qualitatively similar but smaller compared to our proposed PTEgopt, while the Cox model based estimates were above 1 at t0 = 3, likely reflecting model mis-specification.

TABLE 3.

PTE estimates for HbA1c and fasting plasma glucose.

HbA1c
          t0 = 0.5          t0 = 1            t0 = 2            t0 = 3
          Est      SE       Est      SE       Est      SE       Est      SE
g1        −0.097   0.016    −0.062   0.012    −0.034   0.009    −0.001   0.004
PTE       0.317    0.064    0.520    0.071    0.816    0.072    1.008    0.034
PTEcox    0.207    0.057    0.486    0.080    0.798    0.078    1.313    0.070
PTEL      0.146    0.067    0.468    0.129    0.741    0.189    1.017    0.282

Fasting plasma glucose
          t0 = 0.5          t0 = 1            t0 = 2            t0 = 3
          Est      SE       Est      SE       Est      SE       Est      SE
g1        −0.086   0.016    −0.044   0.012    −0.021   0.008    0.001    0.004
PTE       0.349    0.065    0.641    0.076    0.826    0.068    0.966    0.043
PTEcox    0.355    0.061    0.636    0.082    0.890    0.075    1.318    0.066
PTEL      0.235    0.090    0.566    0.150    0.753    0.196    0.995    0.286

FIGURE 4.

The estimator g^2(s) and confidence intervals for the two surrogates respectively; up indicates upper bound of confidence interval, low indicates lower bound of confidence interval; top panel is HbA1c and bottom panel is fasting plasma glucose; results for t0 = 0.5, 1, 2, 3 are shown from left to right, respectively

Because we observe higher PTE estimates as t0 increases, one may wish to examine the PTE of the primary outcome information alone up to t0, in this case, diabetes incidence up to t0. This quantity can provide valuable information regarding the incremental value of the surrogate marker with respect to the PTE.18 In this example, the PTE of diabetes incidence alone up to t0 is 0.119 when t0 = 0.5, 0.405 when t0 = 1, 0.716 when t0 = 2, and 0.986 when t0 = 3. Comparing these to the estimates of the PTE for the surrogate information using our proposed approach demonstrates that, for example, at t0 = 2, while diabetes incidence at 2 years explains 71.6% of the treatment effect, fasting plasma glucose at 2 years explains an additional 11%.

5 | DISCUSSION

We propose a proportion of treatment effect explained measure to quantify the surrogacy of a marker measured earlier in time in a time-to-event outcome setting. Building on our previously developed optimal transformation function in Wang et al. (2020),17 we propose a prediction function that optimally combines information on the primary endpoint up to t0 and the surrogate marker information at t0 for those who have not yet experienced the primary outcome. Our proposed model free method has distinct advantages over existing methods. First, while the framework is generally similar to Wang et al. (2020),17 our extension to the time-to-event outcome setting is not trivial and is necessary to allow for use of this framework in practice, where primary outcomes are very often censored. Second, this optimal transformation framework allows us to relax strict assumptions that are required by, for example, Parast et al. (2017),18 and we explicitly demonstrate in our simulation study the superior performance of our proposed method when such assumptions are violated. An R package implementing the methods proposed in this article, named OSsurvival, is available at https://celehs.github.io/OSsurvival/.

As in the non-censored case of Wang et al. (2020)17, the working assumption (A1) is only used to facilitate the derivation of gopt(·). Even when (A1) fails, gopt(·) remains a sensible prediction function that enables us to effectively use both Yt0 and observed S to predict the outcome Yt i.e., to recover information about the difference in Yt between two independent patients assigned to treatment vs. control. Furthermore, the interpretation of the proposed PTE measure as well as the proposed inference procedures remain valid regardless of the adequacy of (A1). These properties were illustrated in the simulation results examining the sensitivity to violations of this assumption.

Similar to Wang et al. (2020)17, we assumed that p1 = P(A = 1) = 0.5 both in the population loss function and in the observed data. In practice, even if the observed trial has a randomization ratio different from 1 : 1, the proposed loss function is still an appropriate choice as it reflects a future population with p1 = 0.5. In that case, the population gopt(·) remains the same and our proposed estimators can be easily modified to include inverse probability of treatment assignment weights to yield consistent estimators of gopt(·) and PTE. More generally, if there is a pre-conceived p1 ≠ 0.5, one may modify both the objective function and the estimators with weights to allow for different treatment assignment probabilities.

In practice, randomized clinical trials often have baseline covariate information available; given randomization, such information can be used to gain efficiency and power.23,24,25 One approach to incorporate baseline covariates into our proposed method would be through the use of augmentation.18 Specifically, one could construct augmented versions of $\hat\Delta_{\hat g}(t_0)$ and $\hat\Delta(t)$ as:

$$
\begin{pmatrix} \hat\Delta(t)^{A} \\ \hat\Delta_{\hat g}(t_0)^{A} \end{pmatrix}
= \begin{pmatrix} \hat\Delta(t) \\ \hat\Delta_{\hat g}(t_0) \end{pmatrix}
+ A\left\{ n_1^{-1}\sum_{A_i=1} h(Z_i) - n_0^{-1}\sum_{A_i=0} h(Z_i) \right\} \qquad (5)
$$

where Zi are i.i.d. random vectors of baseline covariates and h(·) is a pre-specified basis transformation. Given treatment randomization, the augmented estimators converge to the same limit as the non-augmented estimators since $n_1^{-1}\sum_{A_i=1} h(Z_i) - n_0^{-1}\sum_{A_i=0} h(Z_i)$ converges to zero in probability as the sample size goes to infinity. One may then select A such that the variance of $(\hat\Delta(t)^{A}, \hat\Delta_{\hat g}(t_0)^{A})$ is minimized, and obtain a final augmented version of PTE as $\widehat{\mathrm{PTE}}_{\hat g}(t_0)^{A} = \hat\Delta_{\hat g}(t_0)^{A} / \hat\Delta(t)^{A}$. We would expect increased efficiency with the augmented estimate if the covariates Zi are associated with the primary outcome and the surrogate marker. When the data do not come from a randomized trial, for example in an observational study or a real-world dataset (RWD), we recently proposed estimating the optimal transformation function of the surrogate using inverse probability weighted (IPW) and doubly robust (DR) methods that incorporate the baseline covariates; see Han et al. (2021).26
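The augmentation idea can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the implementation in OSsurvival: it treats a single scalar estimate (a difference in arm means of a fully observed toy outcome) rather than the two-dimensional vector in (5), takes h(Z) = Z, and chooses the scalar coefficient from perturbation replicates with unit exponential weights. All function names (`weighted_arm_diff`, `sample_cov`, `augment`) and the simulated data are hypothetical.

```python
import random

def weighted_arm_diff(values, arm, weights):
    """Weighted mean of `values` in arm 1 minus weighted mean in arm 0."""
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for v, a, w in zip(values, arm, weights):
        num[a] += w * v
        den[a] += w
    return num[1] / den[1] - num[0] / den[0]

def sample_cov(x, y):
    """Unbiased sample covariance of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def augment(outcome, h_z, arm, n_rep=500, seed=0):
    """Return (augmented estimate, replicate variance before, replicate variance after)."""
    rng = random.Random(seed)
    n = len(arm)
    ones = [1.0] * n
    est = weighted_arm_diff(outcome, arm, ones)   # unaugmented estimate
    aug = weighted_arm_diff(h_z, arm, ones)       # augmentation term, near 0 under randomization
    est_reps, aug_reps = [], []
    for _ in range(n_rep):
        v = [rng.expovariate(1.0) for _ in range(n)]  # perturbation weights: mean 1, variance 1
        est_reps.append(weighted_arm_diff(outcome, arm, v))
        aug_reps.append(weighted_arm_diff(h_z, arm, v))
    # Scalar coefficient minimizing the replicate variance of est + coef * aug
    coef = -sample_cov(est_reps, aug_reps) / sample_cov(aug_reps, aug_reps)
    combined = [e + coef * d for e, d in zip(est_reps, aug_reps)]
    return est + coef * aug, sample_cov(est_reps, est_reps), sample_cov(combined, combined)

# Toy randomized data: the covariate z is prognostic for the outcome y.
data_rng = random.Random(1)
arm = [i % 2 for i in range(400)]
z = [data_rng.gauss(0.0, 1.0) for _ in arm]
y = [0.5 * a + zi + data_rng.gauss(0.0, 1.0) for a, zi in zip(arm, z)]
est_aug, var_plain, var_aug = augment(y, z, arm)
```

By construction the replicate variance after augmentation cannot exceed the variance before it, and the reduction is larger when h(Z) is more strongly associated with the outcome. In the setting of (5), A would be a matrix chosen analogously for the pair of estimators.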

Importantly, the setup considered in this paper is one in which all individuals are randomized at the same timepoint such that the baseline time is the same for all individuals, or equivalently, the surrogate information is collected after the same follow-up period after enrollment for all individuals. An extension of this method, and surrogate marker evaluation methods more generally, to settings with staggered entry would be a valuable future contribution to this field. In a staggered entry study, the time t0 after randomization differs for each individual and thus if one wishes to evaluate the surrogate at a particular timepoint after study initiation, some individuals may have more information observed after t0 that could be utilized. Additionally, an extension of this proposed method that would be useful in practice would be to allow for the evaluation of multiple surrogate markers. A fully non-parametric approach would likely not be feasible given our use of kernel smoothing, but the incorporation of machine learning methods, particularly if the surrogate information is high-dimensional, could allow for great flexibility within this same framework. In addition, while we were able to relax assumptions required by others, further work is needed more generally in the development of statistical methods to assess and use surrogate markers, particularly regarding sensitivity and robustness to assumptions, to ensure avoidance of a surrogate paradox situation.27,28

FIGURE 3.


The empirical bias, empirical standard error (ESE) versus average of the estimated standard error (ASE), coverage probabilities (CP) of the 95% confidence intervals for g^2(s) under Setting 3.

APPENDIX

A. DERIVATIONS OF THE OPTIMAL TRANSFORMATION FUNCTION gopt

In this section, we derive the specific form for the optimal transformation function of the surrogate information, gopt(·). We aim to solve the following problem for g1 and g2(·):

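$$
\min_{g_1, g_2}\ \frac{1}{2} E\left[ I(T \le t_0)\, g_1^2 + I(T > t_0)\{Y_t - g_2(Q_{t_0})\}^2 \right]
\quad \text{under the constraint} \quad
E\left[ (1 - Y_{t_0})(-g_1) + Y_{t_0}\{Y_t - g_2(Q_{t_0})\} \mid A = 0 \right] = 0.
$$

Let $\tilde g_2(s) = g_2(s) - m(s \mid t_0)$, where $m(s \mid t_0) = E[Y_t \mid S = s, T > t_0]$. Then,

$$
\begin{aligned}
E\left[ Y_{t_0}\{Y_t - g_2(S)\}^2 \right]
&= E\left[ I(T > t_0)\{Y_t - m(S \mid t_0) - \tilde g_2(S)\}^2 \right] \\
&= E\left[ I(T > t_0)\{Y_t - m(S \mid t_0)\}^2 \right] - 2E\left[ I(T > t_0)\{Y_t - m(S \mid t_0)\}\tilde g_2(S) \right] + E\left[ I(T > t_0)\tilde g_2^2(S) \right] \\
&= E\left[ I(T > t_0)\{Y_t - m(S \mid t_0)\}^2 \right] + E\left[ I(T > t_0)\tilde g_2^2(S) \right].
\end{aligned}
$$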

Hence the problem is equivalent to

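$$
\min_{g_1, \tilde g_2}\ \frac{1}{2} E\left[ I(T \le t_0)\, g_1^2 + I(T > t_0)\,\tilde g_2^2(S) \right]
\quad \text{given} \quad
E\left[ I(T \le t_0)\, g_1 + I(T > t_0)\,\tilde g_2(S) \mid A = 0 \right] = c,
$$

where

$$
c \equiv E\left[ Y_{t_0}\{Y_t - m(S \mid t_0)\} \mid A = 0 \right]
= \mu_0(t_0)\, E\left[ Y_t^{(0)} - m(S^{(0)} \mid t_0) \mid T^{(0)} > t_0 \right]. \qquad \text{(A1)}
$$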

This problem is equivalent to

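$$
\min_{g_1, \tilde g_2}\ P(T \le t_0)\, g_1^2 + F_C(\tilde g_2), \quad \text{given that} \quad G_C(\tilde g_2) + P(T^{(0)} \le t_0)\, g_1 = c,
$$

where $F_C(\tilde g_2) = \int \tilde g_2^2(s) f(s, t_0)\,ds$, $G_C(\tilde g_2) = \int \tilde g_2(s) f_0(s, t_0)\,ds$, $f(s, t_0) = f(s \mid t_0)\mu(t_0)$ and $f_0(s, t_0) = f_0(s \mid t_0)\mu_0(t_0)$. Using Lagrange multipliers and taking the derivative with respect to $g_1$, we have

$$
\frac{d}{dg_1}\left[ P(T \le t_0)\, g_1^2 + F_C(\tilde g_2) - 2\lambda\{ G_C(\tilde g_2) + P(T^{(0)} \le t_0)\, g_1 \} \right]
= 2 g_1 P(T \le t_0) - 2\lambda P(T^{(0)} \le t_0) = 0,
$$

so that

$$
g_{1,opt} = \lambda\, \frac{P(T^{(0)} \le t_0)}{P(T \le t_0)}.
$$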

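Taking the Fréchet derivatives of the functionals, we have that for all measurable h such that $\int h^2(s) f(s, t_0)\,ds < \infty$,

$$
\frac{d}{d\tilde g_2}\left[ P(T \le t_0)\, g_1^2 + F_C(\tilde g_2) - 2\lambda\{ G_C(\tilde g_2) + P(T^{(0)} \le t_0)\, g_1 \} \right]
= 2\int \tilde g_2(s) h(s) f(s, t_0)\,ds - 2\lambda \int h(s) f_0(s, t_0)\,ds = 0.
$$

This then implies (setting h = δ(s))

$$
\tilde g_2(s) = \lambda\, \frac{f_0(s, t_0)}{f(s, t_0)}, \quad \text{for all } s.
$$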

Hence, by the constraint and (A1), we have

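$$
\begin{aligned}
\lambda &= \frac{c}{\int \frac{f_0^2(s, t_0)}{f(s, t_0)}\,ds + \frac{P^2(T^{(0)} \le t_0)}{P(T \le t_0)}}
= \frac{\mu_0(t_0)\, E\left[ Y_t^{(0)} - m(S^{(0)} \mid t_0) \mid T^{(0)} > t_0 \right]}{\int \frac{f_0^2(s, t_0)}{f(s, t_0)}\,ds + \frac{P^2(T^{(0)} \le t_0)}{P(T \le t_0)}} \\
&= \frac{E\{ Y_t - m(S \mid t_0) \mid A = 0, T > t_0 \}\, \mu_0(t_0)}{\int \frac{f_0(s \mid t_0)^2 \mu_0(t_0)^2}{f(s \mid t_0)\mu(t_0)}\,ds + \frac{\{1 - \mu_0(t_0)\}^2}{1 - \mu(t_0)}}, \qquad \text{(A2)}
\end{aligned}
$$

which also gives us the following solution:

$$
g_{1,opt} = \frac{P(T^{(0)} \le t_0)}{P(T \le t_0)} \cdot \frac{\mu_0(t_0)\, E\left[ Y_t^{(0)} - m(S^{(0)} \mid t_0) \mid T^{(0)} > t_0 \right]}{\int \frac{f_0^2(s, t_0)}{f(s, t_0)}\,ds + \frac{P^2(T^{(0)} \le t_0)}{P(T \le t_0)}},
$$

and

$$
g_{2,opt}(s) = m(s \mid t_0) + \tilde g_2(s) = m(s \mid t_0) + \frac{f_0(s, t_0)}{f(s, t_0)} \cdot \frac{\mu_0(t_0)\, E\left[ Y_t^{(0)} - m(S^{(0)} \mid t_0) \mid T^{(0)} > t_0 \right]}{\int \frac{f_0^2(s, t_0)}{f(s, t_0)}\,ds + \frac{P^2(T^{(0)} \le t_0)}{P(T \le t_0)}}.
$$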

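Furthermore, the numerator of λ equals

$$
\begin{aligned}
E\{ Y_t - m(S \mid t_0) \mid A = 0, T > t_0 \}\, \mu_0(t_0)
&= \mu_0(t_0)\left[ \int E\{ Y_t \mid A = 0, T > t_0, S = s \} f_0(s \mid t_0)\,ds - \int m(s \mid t_0) f_0(s \mid t_0)\,ds \right] \\
&= \mu_0(t_0)\left[ \int m_0(s \mid t_0) f_0(s \mid t_0)\,ds - \int m(s \mid t_0) f_0(s \mid t_0)\,ds \right] \\
&= \mu_0(t_0)\left[ \int \{ m_0(s \mid t_0) - m_1(s \mid t_0) \} \frac{f_1(s \mid t_0)\mu_1(t_0)}{f(s \mid t_0)\{\mu_1(t_0) + \mu_0(t_0)\}} f_0(s \mid t_0)\,ds \right] \\
&= \frac{\mu_1(t_0)\mu_0(t_0)}{\mu_1(t_0) + \mu_0(t_0)} \int \{ m_0(s \mid t_0) - m_1(s \mid t_0) \} \frac{f_1(s \mid t_0) f_0(s \mid t_0)}{f(s \mid t_0)}\,ds,
\end{aligned}
$$

because

$$
\begin{aligned}
m(s \mid t_0) &= m_1(s \mid t_0) P(A = 1 \mid S = s, T > t_0) + m_0(s \mid t_0) P(A = 0 \mid S = s, T > t_0) \\
&= m_1(s \mid t_0) \frac{\tfrac12 f_1(s \mid t_0)\mu_1(t_0)}{\tfrac12 f_1(s \mid t_0)\mu_1(t_0) + \tfrac12 f_0(s \mid t_0)\mu_0(t_0)} + m_0(s \mid t_0) \frac{\tfrac12 f_0(s \mid t_0)\mu_0(t_0)}{\tfrac12 f_1(s \mid t_0)\mu_1(t_0) + \tfrac12 f_0(s \mid t_0)\mu_0(t_0)} \\
&= m_1(s \mid t_0) \frac{\tfrac12 f_1(s \mid t_0)\mu_1(t_0)}{f(s \mid t_0)\mu(t_0)} + m_0(s \mid t_0) \frac{\tfrac12 f_0(s \mid t_0)\mu_0(t_0)}{f(s \mid t_0)\mu(t_0)} \\
&= m_1(s \mid t_0) \frac{f_1(s \mid t_0)\mu_1(t_0)}{f(s \mid t_0)\{\mu_1(t_0) + \mu_0(t_0)\}} + m_0(s \mid t_0) \frac{f_0(s \mid t_0)\mu_0(t_0)}{f(s \mid t_0)\{\mu_1(t_0) + \mu_0(t_0)\}}.
\end{aligned}
$$

Therefore, λ can be rewritten as

$$
\lambda = \frac{\mu_1(t_0)\mu_0(t_0)}{\mu_1(t_0) + \mu_0(t_0)} \int \{ m_0(s \mid t_0) - m_1(s \mid t_0) \} \frac{f_1(s \mid t_0) f_0(s \mid t_0)}{f(s \mid t_0)}\,ds
\left\{ \int \frac{f_0(s \mid t_0)^2 \mu_0(t_0)^2}{f(s \mid t_0)\mu(t_0)}\,ds + \frac{\{1 - \mu_0(t_0)\}^2}{1 - \mu(t_0)} \right\}^{-1}. \qquad \text{(A3)}
$$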
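The closed-form solution above can be sanity-checked numerically. The following Python sketch discretizes s on a grid and verifies that the Lagrange solution satisfies the constraint and that a feasible competitor has a larger objective. It is only an illustration under assumptions that are not in the paper: normal densities for S in each arm, the values μ0(t0) = 0.6, μ1(t0) = 0.8 and c = 0.05, 1:1 randomization, and the hypothetical function names `lagrange_solution`, `objective` and `constraint`.

```python
import math

def lagrange_solution(f0_cond, f1_cond, mu0, mu1, c, ds):
    """Discretized lambda, g_{1,opt} and g~_2 following (A2), assuming 1:1 randomization."""
    mu = 0.5 * (mu0 + mu1)                              # mu(t0) = P(T > t0)
    f0_joint = [mu0 * b for b in f0_cond]               # f0(s, t0) = f0(s|t0) mu0(t0)
    f_joint = [0.5 * (mu1 * a + mu0 * b)                # f(s, t0) = f(s|t0) mu(t0)
               for a, b in zip(f1_cond, f0_cond)]
    denom = (sum(b * b / f for b, f in zip(f0_joint, f_joint)) * ds
             + (1.0 - mu0) ** 2 / (1.0 - mu))
    lam = c / denom
    g1 = lam * (1.0 - mu0) / (1.0 - mu)                 # g_{1,opt}
    g2_tilde = [lam * b / f for b, f in zip(f0_joint, f_joint)]
    return lam, g1, g2_tilde, f0_joint, f_joint

def objective(g1, g2_tilde, f_joint, mu, ds):
    """P(T <= t0) g1^2 + F_C(g~_2), evaluated on the grid."""
    return (1.0 - mu) * g1 ** 2 + sum(g * g * f for g, f in zip(g2_tilde, f_joint)) * ds

def constraint(g1, g2_tilde, f0_joint, mu0, ds):
    """G_C(g~_2) + P(T^(0) <= t0) g1, evaluated on the grid."""
    return sum(g * b for g, b in zip(g2_tilde, f0_joint)) * ds + (1.0 - mu0) * g1

def normal_pdf(s, mean):
    return math.exp(-0.5 * (s - mean) ** 2) / math.sqrt(2.0 * math.pi)

# Assumed toy inputs (not from the paper).
ds = 0.01
grid = [-8.0 + ds * k for k in range(1601)]
f0 = [normal_pdf(s, 0.0) for s in grid]
f1 = [normal_pdf(s, 0.5) for s in grid]
mu0, mu1, c = 0.6, 0.8, 0.05
mu = 0.5 * (mu0 + mu1)
lam, g1, g2t, f0j, fj = lagrange_solution(f0, f1, mu0, mu1, c, ds)

# A feasible competitor: move g~_2 along a direction h with G_C(h) = 0.
raw = [math.sin(s) for s in grid]
shift = sum(r * b for r, b in zip(raw, f0j)) / sum(f0j)
h = [r - shift for r in raw]
g2_alt = [g + 0.1 * hi for g, hi in zip(g2t, h)]

value_opt = objective(g1, g2t, fj, mu, ds)
value_alt = objective(g1, g2_alt, fj, mu, ds)
```

Because the cross term between the optimal $\tilde g_2$ and any direction h with $G_C(h) = 0$ vanishes, the competitor's objective exceeds the optimum by a term proportional to $\int h^2(s) f(s, t_0)\,ds$, which is what the comparison of `value_alt` and `value_opt` reflects.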

B. APPROXIMATE RELATIONSHIP BETWEEN PTE AND PTEL

Letting g1 ≡ 0, we will show that there is a correspondence between the proposed PTE and the PTE definition of Parast et al.18 Since g1 in Section 2 is very close to 0, the correspondence will approximately hold for the PTE in Section 2. Recall that Yt = I(T > t) = I(T > t0)I(T > t). Assuming S > 0 almost surely, with a slight abuse of notation, $Q_{t_0} = I(T \le t_0) \cdot 0 + I(T > t_0) S = I(T > t_0) S$ and $g(Q_{t_0}) = I(T > t_0)\, g(S)$.

We have the following problem:

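$$
\min_{g \in \mathcal{F}}\ \frac{1}{2} E\left[ \{Y_t - g(Q_{t_0})\}^2 \right] \quad \text{under the constraint} \quad E\left[ Y_t - g(Q_{t_0}) \mid A = 0 \right] = 0,
$$

where $\mathcal{F}$ is the class of measurable functions. The problem is equivalent to the following problem for g(·),

$$
\min_{g \in \mathcal{F}}\ \frac{1}{2} E\left[ I(T > t_0)\{Y_t - g(S)\}^2 \right] \quad \text{under the constraint} \quad E\left[ Y_{t_0}\{Y_t - g(S)\} \mid A = 0 \right] = 0.
$$

Let $\tilde g(S) = g(S) - m(S \mid t_0)$, where $m(s \mid t_0) = E[Y_t \mid S = s, T > t_0]$. Then,

$$
\begin{aligned}
E\left[ Y_{t_0}\{Y_t - g(S)\}^2 \right]
&= E\left[ I(T > t_0)\{Y_t - m(S \mid t_0) - \tilde g(S)\}^2 \right] \\
&= E\left[ I(T > t_0)\{Y_t - m(S \mid t_0)\}^2 \right] - 2E\left[ I(T > t_0)\{Y_t - m(S \mid t_0)\}\tilde g(S) \right] + E\left[ I(T > t_0)\tilde g^2(S) \right].
\end{aligned}
$$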

Hence the problem is reduced to

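$$
\min_{\tilde g}\ \frac{1}{2} E\left[ I(T > t_0)\,\tilde g^2(S) \right] \quad \text{given} \quad E\left[ I(T > t_0)\,\tilde g(S) \mid A = 0 \right] = c,
$$

where $c \equiv E\left[ Y_{t_0}\{Y_t - m(S \mid t_0)\} \mid A = 0 \right]$ here. This problem is equivalent to

$$
\min_{\tilde g}\ F_C(\tilde g), \quad \text{given that} \quad G_C(\tilde g) = c,
$$

where $F_C(\tilde g) = \int \tilde g^2(s) f(s, t_0)\,ds$ and $G_C(\tilde g) = \int \tilde g(s) f_0(s, t_0)\,ds$. Taking the Fréchet derivatives of the functionals, we have that for all measurable h such that $\int h^2(s) f(s, t_0)\,ds < \infty$,

$$
\frac{d}{d\tilde g}\left[ F_C(\tilde g) - 2\lambda G_C(\tilde g) \right] = 2\int \tilde g(s) h(s) f(s, t_0)\,ds - 2\lambda \int h(s) f_0(s, t_0)\,ds = 0.
$$

As before, this implies (setting h = δ(s))

$$
\tilde g(s) = \lambda\, \frac{f_0(s, t_0)}{f(s, t_0)}, \quad \text{for all } s.
$$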

Hence by the constraint we have

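$$
\lambda = \frac{c}{\int \frac{f_0^2(s, t_0)}{f(s, t_0)}\,ds},
$$

which gives

$$
g_{opt}(s) = m(s \mid t_0) + \tilde g(s) = m(s \mid t_0) + \frac{f_0(s \mid t_0)}{f(s \mid t_0)} \cdot \frac{c}{\mu_0(t_0) \int \frac{f_0^2(s \mid t_0)}{f(s \mid t_0)}\,ds}.
$$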

From Parast et al.18, we have that their residual treatment effect is

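$$
\Delta_S(t, t_0) = E\left[ I(S_{t_0} > 0)\{ m_1(s \mid t_0) - m_0(s \mid t_0) \} \right] = \mu(t_0)\, E\{ m_1(s \mid t_0) - m_0(s \mid t_0) \mid T > t_0 \},
$$

where $m_a(s \mid t_0) = P(T^{(a)} > t \mid S^{(a)} = s, T^{(a)} > t_0)$, a = 0, 1. We know that

$$
\Delta(t) = E(Y_t^{(1)}) - E(Y_t^{(0)}) = \mu_1(t_0) \int m_1(s \mid t_0)\,dF_1(s \mid t_0) - \mu_0(t_0) \int m_0(s \mid t_0)\,dF_0(s \mid t_0),
$$

where $\mu_a(t_0) = P(T^{(a)} > t_0)$ and $F_a(s \mid t_0) = P(S^{(a)} \le s \mid T^{(a)} > t_0)$ for a = 0, 1. So

$$
\Delta_L = \Delta(t) - \Delta_S(t, t_0) = \int m_1(s \mid t_0)\{ \mu_1(t_0) f_1(s \mid t_0) - \mu(t_0)\tilde f(s \mid t_0) \}\,ds - \int m_0(s \mid t_0)\{ \mu_0(t_0) f_0(s \mid t_0) - \mu(t_0)\tilde f(s \mid t_0) \}\,ds,
$$

where $\tilde f(s \mid t_0)$ is the density function of a reference distribution for S, and $f_a(s \mid t_0)$ is the density function of $S^{(a)} \mid T^{(a)} > t_0$.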

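Plugging in the formula of gopt(s),

$$
g_{opt}(s) = m(s \mid t_0) + \frac{f_0(s \mid t_0)}{f(s \mid t_0)} \cdot \frac{c}{\mu_0(t_0) \int \frac{f_0^2(s \mid t_0)}{f(s \mid t_0)}\,ds}
$$

with

$$
\begin{aligned}
m(s \mid t_0) &= E(Y_t \mid S = s, T > t_0) \\
&= E(Y_t \mid S = s, T > t_0, A = 1) P(A = 1 \mid S = s, T > t_0) + E(Y_t \mid S = s, T > t_0, A = 0) P(A = 0 \mid S = s, T > t_0) \\
&= m_1(s \mid t_0) r_1(s, t_0) + m_0(s \mid t_0) r_0(s, t_0), \\
r_a(s, t_0) &= P(A = a \mid S = s, T > t_0) = \frac{f_a(s \mid t_0)\mu_a(t_0)}{f_1(s \mid t_0)\mu_1(t_0) + f_0(s \mid t_0)\mu_0(t_0)}, \\
c &= E\left[ Y_{t_0}\{Y_t - m(S \mid t_0)\} \mid A = 0 \right] = E\{ Y_t \mid A = 0 \} - E\{ I(T > t_0)\, m(S \mid t_0) \mid A = 0 \} \\
&= \mu_0(t_0) \int m_0(s \mid t_0) f_0(s \mid t_0)\,ds - \mu_0(t_0) \int \{ m_1(s \mid t_0) r_1(s, t_0) + m_0(s \mid t_0) r_0(s, t_0) \} f_0(s \mid t_0)\,ds \\
&= \mu_0(t_0) \int \{ -m_1(s \mid t_0) + m_0(s \mid t_0) \}\, r_1(s, t_0) f_0(s \mid t_0)\,ds,
\end{aligned}
$$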

we have

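$$
\begin{aligned}
\Delta_{g_{opt}}(t_0) &= E\{ g_{opt}(Q^{(1)}) - g_{opt}(Q^{(0)}) \} = E\{ I(T > t_0)\, g_{opt}(S) \mid A = 1 \} - E\{ I(T > t_0)\, g_{opt}(S) \mid A = 0 \} \\
&= \int g_{opt}(s)\{ \mu_1(t_0) f_1(s \mid t_0) - \mu_0(t_0) f_0(s \mid t_0) \}\,ds \\
&= \int m(s \mid t_0)\{ \mu_1(t_0) f_1(s \mid t_0) - \mu_0(t_0) f_0(s \mid t_0) \}\,ds
 + \frac{c}{\mu_0(t_0) \int \frac{f_0^2(s \mid t_0)}{f(s \mid t_0)}\,ds} \int \frac{f_0(s \mid t_0)}{f(s \mid t_0)}\{ \mu_1(t_0) f_1(s \mid t_0) - \mu_0(t_0) f_0(s \mid t_0) \}\,ds \\
&= \int m_1(s \mid t_0)\{ \mu_1(t_0) f_1(s \mid t_0) - \mu_1(t_0) w(s \mid t_0) \}\,ds - \int m_0(s \mid t_0)\{ \mu_0(t_0) f_0(s \mid t_0) - \mu_1(t_0) w(s \mid t_0) \}\,ds,
\end{aligned}
$$

where

$$
w(s \mid t_0) = r_0(s, t_0) f_1(s \mid t_0) + r_1(s, t_0) f_0(s \mid t_0)\, \frac{\int \frac{f_0(s \mid t_0)}{f(s \mid t_0)} f_1(s \mid t_0)\,ds}{\int \frac{f_0^2(s \mid t_0)}{f(s \mid t_0)}\,ds}
= \frac{f_0(s \mid t_0) f_1(s \mid t_0) / f(s \mid t_0)}{\int f_0(s \mid t_0)^2 / f(s \mid t_0)\,ds}.
$$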
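The second equality in the expression for w(s|t0) is stated without proof. A short verification, written as a sketch that uses only the definitions of ra(s, t0) above and the 1:1 randomization identity f(s|t0){μ1(t0) + μ0(t0)} = f1(s|t0)μ1(t0) + f0(s|t0)μ0(t0), with arguments suppressed for readability:

```latex
\begin{aligned}
r_0 f_1 &= \frac{\mu_0 f_0 f_1}{(\mu_1 + \mu_0) f}, \qquad
r_1 f_0 = \frac{\mu_1 f_0 f_1}{(\mu_1 + \mu_0) f}, \\
\mu_0 \int \frac{f_0^2}{f}\,ds + \mu_1 \int \frac{f_0 f_1}{f}\,ds
&= \int f_0\, \frac{\mu_0 f_0 + \mu_1 f_1}{f}\,ds
 = (\mu_1 + \mu_0) \int f_0\,ds = \mu_1 + \mu_0, \\
w &= \frac{f_0 f_1}{f} \cdot \frac{1}{\mu_1 + \mu_0}
   \left\{ \mu_0 + \mu_1 \frac{\int f_0 f_1 / f\,ds}{\int f_0^2 / f\,ds} \right\}
 = \frac{f_0 f_1}{f} \cdot \frac{\mu_0 \int f_0^2 / f\,ds + \mu_1 \int f_0 f_1 / f\,ds}{(\mu_1 + \mu_0) \int f_0^2 / f\,ds}
 = \frac{f_0 f_1 / f}{\int f_0^2 / f\,ds}.
\end{aligned}
```

The same identity shows that w(s|t0) is a weighted product of the two arm-specific densities, normalized by a constant that does not depend on s.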

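If we let the density of the reference distribution in $\Delta_L$ be

$$
\tilde f(s \mid t_0) = \frac{\mu_1(t_0)}{\mu(t_0)}\, w(s \mid t_0),
$$

then $\Delta_{g_{opt}} = \Delta_L$. So $\mathrm{PTE} = \Delta_{g_{opt}} / \Delta = \mathrm{PTE}_L$ under this special case. This relationship holds approximately for the proposed PTE in Section 2.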

We now derive the assumptions needed to ensure that PTE is between 0 and 1. The assumptions approximately suffice to make the proposed PTE in Section 2 between 0 and 1.

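Letting U = gopt(S), we have

$$
\begin{aligned}
\Delta(t) &= \mu_1(t_0) \int m_1(u \mid t_0) f_1(u \mid t_0)\,du - \mu_0(t_0) \int m_0(u \mid t_0) f_0(u \mid t_0)\,du, \\
\Delta_{g_{opt}}(t_0) &= \int m_1(u \mid t_0)\{ \mu_1(t_0) f_1(u \mid t_0) - \mu_1(t_0) w(u \mid t_0) \}\,du - \int m_0(u \mid t_0)\{ \mu_0(t_0) f_0(u \mid t_0) - \mu_1(t_0) w(u \mid t_0) \}\,du, \\
\Delta(t) - \Delta_{g_{opt}}(t_0) &= \mu_1(t_0) \int \{ m_1(u \mid t_0) - m_0(u \mid t_0) \}\, w(u \mid t_0)\,du, \qquad \text{(B4)}
\end{aligned}
$$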

where ma(u|t0), fa(u|t0) and w(u|t0) are defined similarly to ma(s|t0), fa(s|t0) and w(s|t0) with S replaced by U respectively, a = 0, 1.

Direct calculations show that

Δgopt(t0)=E{gopt(Q(1))gopt(Q(0))}=E{I(T>t0)gopt(S)|A=1}E{I(T>t0)gopt(S)|A=0}=uf1(u|t0)μ1(t0)duuf0(u|t0)μ0(t0)du,=μ1(t0)[uuF1(u|t0)du]μ0(t0)[uuF0(u|t0)du]=uu{μ1(t0)μ0(t0)}+{P(T(1)>t0,U(1)>u)P(T(0)>t0,U(0)>u)}du, (B5)

where Fa(u|t0) is the cumulative distribution function corresponding to fa(u|t0), a = 0, 1, and $u_u$ is the upper bound of U. It follows from (B4) and (B5) that a set of sufficient conditions for $\Delta(t) > \Delta_{g_{opt}}(t_0) > 0$ is

(C1). $P(U^{(1)} > u, T^{(1)} > t_0) > P(U^{(0)} > u, T^{(0)} > t_0)$ for all u;
(C2). $m_1(u \mid t_0) > m_0(u \mid t_0)$ for all u.

In addition, we compare the performance of the two surrogate transformations, $g_{opt}(Q_{t_0}) = (1 - Y_{t_0})\, g_{1,opt} + Y_{t_0}\, g_{2,opt}(S)$ in the main paper and $g(Q_{t_0}) = I(T > t_0)\, g(S)$ in this section, using the simulation settings described in the main paper. The PTE estimators are denoted as PTE.tran1 and PTE.tran2, respectively. From Table B1 we see that although the differences are very small, the PTE estimates using $g_{opt}(Q_{t_0})$ are slightly larger in all of the settings considered. This is because the transformation $g_{opt}(Q_{t_0})$ is more general and flexible, and includes the latter transformation as a special case when the true value of g1 is 0. If m1(s|t0) and m0(s|t0) are not close, where $m_a(s \mid t_0) = E(Y_t \mid A = a, S = s, T > t_0)$, a = 0, 1, g1 would not be expected to be very close to 0, and a version which sets g1 = 0 would omit this information and not perform as well with respect to capturing the treatment effect.

TABLE B1.

PTE estimates for two kinds of transformations, with empirical standard errors in parentheses, under Setting 1-Setting 3.

| t0 | Setting 1: PTE.tran1 | Setting 1: PTE.tran2 | Setting 2: PTE.tran1 | Setting 2: PTE.tran2 | Setting 3: PTE.tran1 | Setting 3: PTE.tran2 |
|---|---|---|---|---|---|---|
| 0.3 | 0.693 (0.090) | 0.677 (0.093) | 0.454 (0.089) | 0.408 (0.091) | 0.522 (0.116) | 0.499 (0.118) |
| 0.5 | 0.770 (0.083) | 0.752 (0.088) | 0.625 (0.079) | 0.583 (0.083) | 0.651 (0.107) | 0.627 (0.111) |
| 0.7 | 0.836 (0.072) | 0.821 (0.077) | 0.773 (0.066) | 0.745 (0.071) | 0.771 (0.093) | 0.753 (0.098) |

C. ASYMPTOTIC PROPERTIES FOR $\hat g(\cdot)$

Throughout, we assume that m(s|t0) is continuously differentiable. In addition, we assume that f(s, t0) = f(s|t0)μ(t0) and fa(s, t0) = fa(s|t0)μa(t0), a = 0, 1, are continuously differentiable with finite support. For inference, we also require undersmoothing with $h = o_p(n^{-1/5})$ for interval estimation of gopt and $h = o_p(n^{-1/4})$ for interval estimation of PTE. Since $\hat m(s \mid t_0)$, $\hat f(s, t_0)$ and $\hat f_a(s, t_0)$, a = 0, 1, are standard kernel estimators, they are consistent for their true values with rate $(\log n)^{1/2}(nh)^{-1/2} + h^2$. It follows immediately that $|\hat g_1 - g_{1,opt}| = O_p\{(\log n)^{1/2}(nh)^{-1/2} + h^2\}$ and $\sup_s |\hat g_2(s) - g_{2,opt}(s)| = O_p\{(\log n)^{1/2}(nh)^{-1/2} + h^2\}$.

We first derive the influence functions for each of the estimators in Section 2.2.

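$$
\begin{aligned}
\hat f_0(s, t_0) - f_0(s, t_0)
&= \frac{n^{-1}\sum_{A_i=0} K_h(S_i - s) I(X_i > t_0)\,\hat\omega_{t_0,i}}{n^{-1}\sum_{A_i=0} \hat\omega_{t_0,i}} - f_0(s, t_0) \\
&= n^{-1}\sum_{i=1}^n 2I(A_i = 0)\Big[ \frac{K_h(S_i - s) I(X_i > t_0)}{\hat G_0(t_0)} - \frac{K_h(S_i - s) I(X_i > t_0)}{G_0(t_0)} + \frac{K_h(S_i - s) I(X_i > t_0)}{G_0(t_0)} \\
&\qquad - f_0(s, t_0)\,\omega_{t_0,i} - f_0(s, t_0)(\hat\omega_{t_0,i} - \omega_{t_0,i}) \Big] + o_p\{(nh)^{-1/2}\} \\
&= n^{-1}\sum_{i=1}^n 2I(A_i = 0)\Big[ \frac{K_h(S_i - s) I(X_i > t_0)}{G_0(t_0)} \cdot \frac{G_0(t_0) - \hat G_0(t_0)}{G_0(t_0)} + \frac{K_h(S_i - s) I(X_i > t_0)}{G_0(t_0)} - f_0(s, t_0)\,\omega_{t_0,i} \\
&\qquad - f_0(s, t_0)\Big\{ \frac{\delta_i I(X_i \le t_0)}{G_0(X_i)} \cdot \frac{-\hat G_0(X_i) + G_0(X_i)}{G_0(X_i)} + \frac{I(X_i > t_0)}{G_0(t_0)} \cdot \frac{-\hat G_0(t_0) + G_0(t_0)}{G_0(t_0)} \Big\} \Big] + o_p\{(nh)^{-1/2}\} \\
&= 2n^{-1}\sum_{j=1}^n \int n^{-1}\sum_{A_i=0}\Big[ \frac{I(X_i > t_0)}{G_0(t_0)}\{ K_h(S_i - s) - f_0(s, t_0) \} I(x \le t_0) - f_0(s, t_0) \frac{\delta_i I(X_i \le t_0)}{G_0(X_i)} I(x \le X_i) \Big] \frac{I(A_j = 0)\,dM_j^c(x)}{n^{-1}\sum_{l=1}^n I(A_l = 0) I(X_l \ge x)} \\
&\qquad + n^{-1}\sum_{i=1}^n 2I(A_i = 0)\Big[ \frac{K_h(S_i - s) I(X_i > t_0)}{G_0(t_0)} - f_0(s, t_0)\,\omega_{t_0,i} \Big] + o_p\{(nh)^{-1/2}\} \\
&\equiv (nh)^{-1}\sum_{i=1}^n \phi_{0,i}(s, t_0) + o_p\{(nh)^{-1/2}\}.
\end{aligned}
$$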

Similar derivations can be used to obtain

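$$
\begin{aligned}
\hat f(s, t_0) - f(s, t_0) &\equiv (nh)^{-1}\sum_{i=1}^n \phi_i(s, t_0) + o_p\{(nh)^{-1/2}\}, \\
\hat m(s \mid t_0) - m(s \mid t_0)
&= \frac{\sum_{i=1}^n K_h(S_i - s) I(X_i > t)\,\hat\omega_{t,i} \big/ \sum_{i=1}^n \hat\omega_{t,i}}{\sum_{i=1}^n K_h(S_i - s) I(t_0 < X_i)\,\hat\omega_{t_0,i} \big/ \sum_{i=1}^n \hat\omega_{t_0,i}} - E[Y_t \mid S = s, T > t_0] \\
&= \frac{1}{f(s, t_0)}\Big\{ \sum_{i=1}^n K_h(S_i - s) I(X_i > t)\,\hat\omega_{t,i} \Big/ \sum_{i=1}^n \hat\omega_{t,i} - f(s, t) \Big\} \\
&\qquad - \frac{f(s, t)}{f^2(s, t_0)}\Big\{ \sum_{i=1}^n K_h(S_i - s) I(X_i > t_0)\,\hat\omega_{t_0,i} \Big/ \sum_{i=1}^n \hat\omega_{t_0,i} - f(s, t_0) \Big\} + o_p\{(nh)^{-1/2}\} \\
&\equiv (nh)^{-1}\sum_{i=1}^n \phi_{m,i}(s) + o_p\{(nh)^{-1/2}\}, \\
\hat c - c
&= \frac{\sum_{A_i=0} I(X_i > t)\,\hat\omega_{t,i}}{\sum_{A_i=0} \hat\omega_{t,i}} - \frac{\sum_{A_i=0} \hat m(S_i \mid t_0) I(X_i > t_0)\,\hat\omega_{t_0,i}}{\sum_{A_i=0} I(X_i > t_0)\,\hat\omega_{t_0,i}} - c \\
&= \frac{\sum_{A_i=0} I(X_i > t)\,\hat\omega_{t,i}}{\sum_{A_i=0} \hat\omega_{t,i}} - \frac{\sum_{A_i=0} m(S_i \mid t_0) I(X_i > t_0)\,\hat\omega_{t_0,i}}{\sum_{A_i=0} I(X_i > t_0)\,\hat\omega_{t_0,i}} - c - \frac{\sum_{A_i=0} \{ \hat m(S_i \mid t_0) - m(S_i \mid t_0) \} I(X_i > t_0)\,\hat\omega_{t_0,i}}{\sum_{A_i=0} I(X_i > t_0)\,\hat\omega_{t_0,i}} \\
&\equiv n^{-1}\sum_{i=1}^n \phi_{c,i} + o_p(n^{-1/2})
\end{aligned}
$$

by incorporating the influence function of $n^{1/2}\{ \hat m(s \mid t_0) - m(s \mid t_0) \}$. Furthermore,

$$
\begin{aligned}
\int \frac{\hat f_0^2(s, t_0)}{\hat f(s, t_0)}\,ds - \int \frac{f_0^2(s, t_0)}{f(s, t_0)}\,ds &\equiv n^{-1}\sum_{i=1}^n \phi_{f,i} + o_p(n^{-1/2}), \\
\hat p_0(t_0) - p_0(t_0) &\equiv n^{-1}\sum_{i=1}^n \phi_{p,0,i}(t_0) + o_p(n^{-1/2}), \\
\hat p(t_0) - p(t_0) &\equiv n^{-1}\sum_{i=1}^n \phi_{p,i}(t_0) + o_p(n^{-1/2}),
\end{aligned}
$$

where $p_a(t_0) = 1 - \mu_a(t_0)$, $p(t_0) = 1 - \mu(t_0)$, $\hat p_a(t_0) = 1 - \hat\mu_a(t_0)$, and $\hat p(t_0) = 1 - \hat\mu(t_0)$.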

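Using the above results, we can obtain the influence functions for the optimal transformation function estimators by coupling the delta method with the fact that

$$
\begin{aligned}
g_{1,opt} &= \tilde G_1\Big( p_0(t_0), p(t_0), c, \int \frac{f_0^2(s, t_0)}{f(s, t_0)}\,ds \Big) = \frac{p_0(t_0)}{p(t_0)} \cdot \frac{c}{\int \frac{f_0^2(s, t_0)}{f(s, t_0)}\,ds + \frac{p_0^2(t_0)}{p(t_0)}}, \\
g_{2,opt}(s) &= \tilde G_2\Big( m(s \mid t_0), f_0(s, t_0), f(s, t_0), c, p_0(t_0), p(t_0), \int \frac{f_0^2(s, t_0)}{f(s, t_0)}\,ds \Big) = m(s \mid t_0) + \frac{f_0(s, t_0)}{f(s, t_0)} \cdot \frac{c}{\int \frac{f_0^2(s, t_0)}{f(s, t_0)}\,ds + \frac{p_0^2(t_0)}{p(t_0)}}, \\
\hat g_1 &= \tilde G_1\Big( \hat p_0(t_0), \hat p(t_0), \hat c, \int \frac{\hat f_0^2(s, t_0)}{\hat f(s, t_0)}\,ds \Big),
\end{aligned}
$$

and

$$
\hat g_2(s) = \tilde G_2\Big( \hat m(s \mid t_0), \hat f_0(s, t_0), \hat f(s, t_0), \hat c, \hat p_0(t_0), \hat p(t_0), \int \frac{\hat f_0^2(s, t_0)}{\hat f(s, t_0)}\,ds \Big).
$$

Specifically, we can show that

$$
\hat g_1 - g_{1,opt} = n^{-1}\sum_{i=1}^n \phi_{g,1,i} + o_p(n^{-1/2}),
$$

and

$$
\hat g_2(s) - g_{2,opt}(s) = (nh)^{-1}\sum_{i=1}^n \phi_{g,2,i}(s) + o_p\{(nh)^{-1/2}\},
$$

where $E(\phi_{g,1,i}^2) < \infty$ and $E\{\phi_{g,2,i}(s)^2\} < \infty$.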

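D. INFLUENCE FUNCTIONS OF $\hat\Delta$ AND $\hat\Delta_g$

We first give some existing results needed in the proof. Following Gill29 and Robins and Rotnitzky30, we have

$$
\frac{\hat G_a(t) - G_a(t)}{G_a(t)} = -\sum_{A_j = a} \int_0^t \frac{dM_j^c(u)}{\sum_{A_l = a} I(X_l \ge u)} + o_p(n^{-1/2}), \qquad \text{(D6)}
$$

$$
\frac{\delta_i}{G_{A_i}(X_i)} = 1 - \int_0^\infty \frac{dM_i^c(u)}{G_{A_i}(u)}, \qquad \text{(D7)}
$$

where $G_a(t) = P(C^{(a)} > t)$, $M^c(t) = N^c(t) - \int_0^t \{ A\lambda_1^c(u) + (1 - A)\lambda_0^c(u) \} I(X \ge u)\,du$, $N^c(t) = I(X \le t, \delta = 0)$, and $\lambda_a^c(t)$ is the hazard function of $C^{(a)}$.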

We next derive asymptotic distributions for Δ^(t) and Δ^g(t0) for a fixed g function. To this end, we first write

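$$
\begin{aligned}
\hat\mu_a(t) - \mu_a(t)
&= \frac{\sum_{i=1}^n I(A_i = a)\,\hat\omega_{t,i} I(X_i > t)}{\sum_{i=1}^n I(A_i = a)\,\hat\omega_{t,i}} - \mu_a
 = \frac{\sum_{i=1}^n I(A_i = a)\Big\{ \frac{I(X_i > t)}{\hat G_a(t)} - \hat\omega_{t,i}\mu_a \Big\}}{\sum_{i=1}^n I(A_i = a)\,\hat\omega_{t,i}} \\
&= 2n^{-1}\sum_{A_i = a}\Big[ \frac{I(X_i > t)}{\hat G_a(t)} - \frac{I(X_i > t)}{G_a(t)} + \frac{I(X_i > t)}{G_a(t)} - \omega_{t,i}\mu_a - \mu_a\{ \hat\omega_{t,i} - \omega_{t,i} \} \Big] + o_p(n^{-1/2}) \\
&= 2n^{-1}\sum_{A_i = a}\Big[ \frac{I(X_i > t)}{G_a(t)} \cdot \frac{-\hat G_a(t) + G_a(t)}{G_a(t)} + \frac{I(X_i > t)}{G_a(t)} - \omega_{t,i}\mu_a \Big] \\
&\qquad - 2n^{-1}\sum_{A_i = a} \mu_a\Big[ \frac{\delta_i I(X_i \le t)}{G_a(X_i)} \cdot \frac{-\hat G_a(X_i) + G_a(X_i)}{G_a(X_i)} + \frac{I(X_i > t)}{G_a(t)} \cdot \frac{-\hat G_a(t) + G_a(t)}{G_a(t)} \Big] + o_p(n^{-1/2}) \\
&= 2n^{-1}\sum_{j=1}^n \int n^{-1}\sum_{A_i = a}\Big[ \frac{I(X_i > t)}{G_a(t)}(1 - \mu_a) I(x \le t) - \mu_a \frac{\delta_i I(X_i \le t)}{G_a(X_i)} I(x \le X_i) \Big] \frac{I(A_j = a)\,dM_j^c(x)}{n^{-1}\sum_{l=1}^n I(A_l = a) I(X_l \ge x)} \\
&\qquad + 2n^{-1}\sum_{A_i = a}\Big[ \frac{I(X_i > t)}{G_a(t)} - \omega_{t,i}\mu_a \Big] + o_p(n^{-1/2}) \\
&\equiv n^{-1}\sum_{j=1}^n \psi_{a1,j}(t) + n^{-1}\sum_{j=1}^n \psi_{a2,j}(t) + o_p(n^{-1/2})
 \equiv n^{-1}\sum_{j=1}^n \psi_{a,j}(t) + o_p(n^{-1/2}).
\end{aligned}
$$

It follows that

$$
\sqrt{n}\{ \hat\Delta(t) - \Delta(t) \} = n^{-1/2}\sum_{i=1}^n \{ \psi_{1,i}(t) - \psi_{0,i}(t) \} + o_p(1) \equiv n^{-1/2}\sum_{i=1}^n \psi_i(t) + o_p(1).
$$

By the central limit theorem, $\sqrt{n}\{ \hat\Delta(t) - \Delta(t) \}$ converges in distribution to a normal distribution $N(0, \sigma^2(t))$ with $\sigma^2(t) = E[\psi_i^2(t)]$. Similarly, we have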

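$$
\begin{aligned}
\hat\mu_{g,a}(t_0) - \mu_{g,a}(t_0)
&= \frac{\sum_{A_i = a} \hat\omega_{t_0,i}\, g(Q_{t_0,i})}{\sum_{A_i = a} \hat\omega_{t_0,i}} - \mu_{g,a}(t_0) \\
&= 2n^{-1}\sum_{A_i = a}\Big[ g_1 \frac{\delta_i I(X_i \le t_0)}{\hat G_a(X_i)} + g_2(S_i) \frac{I(X_i > t_0)}{\hat G_a(t_0)} - \hat\omega_{t_0,i}\,\mu_{g,a}(t_0) \Big] + o_p(n^{-1/2}) \\
&= 2n^{-1}\sum_{A_i = a}\Big[ g_1 \frac{\delta_i I(X_i \le t_0)}{G_a(X_i)} \cdot \frac{-\hat G_a(X_i) + G_a(X_i)}{G_a(X_i)} + g_2(S_i) \frac{I(X_i > t_0)}{G_a(t_0)} \cdot \frac{-\hat G_a(t_0) + G_a(t_0)}{G_a(t_0)} \Big] \\
&\qquad + 2n^{-1}\sum_{A_i = a}\Big[ g_1 \frac{\delta_i I(X_i \le t_0)}{G_a(X_i)} + g_2(S_i) \frac{I(X_i > t_0)}{G_a(t_0)} - \omega_{t_0,i}\,\mu_{g,a}(t_0) \Big] \\
&\qquad - 2n^{-1}\sum_{A_i = a} \mu_{g,a}(t_0)\Big[ g_1 \frac{\delta_i I(X_i \le t_0)}{G_a(X_i)} \cdot \frac{-\hat G_a(X_i) + G_a(X_i)}{G_a(X_i)} + g_2(S_i) \frac{I(X_i > t_0)}{G_a(t_0)} \cdot \frac{-\hat G_a(t_0) + G_a(t_0)}{G_a(t_0)} \Big] + o_p(n^{-1/2}) \\
&= 2n^{-1}\sum_{j=1}^n \int n^{-1}\sum_{A_i = a} (1 - \mu_{g,a})\Big[ g_1 \frac{\delta_i I(X_i \le t_0)}{\hat G_a(X_i)} I(x \le X_i) + g_2(S_i) \frac{I(X_i > t_0)}{\hat G_a(t_0)} I(x \le t_0) \Big] \frac{I(A_j = a)\,dM_j^c(x)}{n^{-1}\sum_{l=1}^n I(A_l = a) I(X_l \ge x)} \\
&\qquad + 2n^{-1}\sum_{A_i = a}\Big[ g_1 \frac{\delta_i I(X_i \le t_0)}{G_a(X_i)} + g_2(S_i) \frac{I(X_i > t_0)}{G_a(t_0)} - \omega_{t_0,i}\,\mu_{g,a}(t_0) \Big] + o_p(n^{-1/2}) \\
&\equiv n^{-1}\sum_{j=1}^n \psi_{g,a1,j}(t_0) + n^{-1}\sum_{j=1}^n \psi_{g,a2,j}(t_0) + o_p(n^{-1/2})
 \equiv n^{-1}\sum_{j=1}^n \psi_{g,a,j}(t_0) + o_p(n^{-1/2}).
\end{aligned}
$$

It follows that

$$
\sqrt{n}\{ \hat\Delta_g(t_0) - \Delta_g(t_0) \} = n^{-1/2}\sum_{i=1}^n \{ \psi_{g,1,i}(t_0) - \psi_{g,0,i}(t_0) \} + o_p(1) \equiv n^{-1/2}\sum_{i=1}^n \psi_{g,i}(t_0) + o_p(1).
$$

By the central limit theorem, $\sqrt{n}\{ \hat\Delta_g(t_0) - \Delta_g(t_0) \}$ converges in distribution to a normal distribution $N(0, \sigma_g^2(t_0))$ with $\sigma_g^2(t_0) = E\{\psi_{g,i}^2(t_0)\}$.

With the above influence functions for $\hat\Delta(t)$ and $\hat\Delta_g(t_0)$, their variance estimates can be obtained by the perturbation resampling method.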
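For readers who want to see the inverse probability of censoring weighted (IPCW) form of $\hat\mu_a(t)$ and $\hat\Delta(t)$ concretely, the following is a minimal pure-Python sketch. It assumes the weights $\hat\omega_{t,i} = \{\delta_i I(X_i \le t) + I(X_i > t)\} / \hat G_a(X_i \wedge t)$ with $\hat G_a$ a Kaplan-Meier estimate of the censoring survival function within arm a, and it assumes no ties between event and censoring times. The function names `km_censoring`, `ipcw_survival` and `delta_hat` and the toy data are hypothetical; this is not the OSsurvival implementation.

```python
def km_censoring(times, events):
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censoring (event == 0) as the 'event'.

    Returns a step function g_hat(t)."""
    censor_times = sorted(set(x for x, d in zip(times, events) if d == 0))
    steps = []
    surv = 1.0
    for u in censor_times:
        at_risk = sum(1 for x in times if x >= u)
        n_cens = sum(1 for x, d in zip(times, events) if x == u and d == 0)
        surv *= 1.0 - n_cens / at_risk
        steps.append((u, surv))

    def g_hat(t):
        value = 1.0
        for u, s in steps:
            if u <= t:
                value = s
            else:
                break
        return value

    return g_hat

def ipcw_survival(times, events, t):
    """IPCW estimate of P(T > t) from observed times X_i and event indicators delta_i."""
    g_hat = km_censoring(times, events)
    num = den = 0.0
    for x, d in zip(times, events):
        if x > t:
            w = 1.0 / g_hat(t)      # still at risk at t: weight by 1 / G(t)
        elif d == 1:
            w = 1.0 / g_hat(x)      # event observed before t: weight by 1 / G(X_i)
        else:
            w = 0.0                 # censored before t: contributes no weight
        den += w
        if x > t:
            num += w
    return num / den

def delta_hat(times, events, arm, t):
    """Difference in IPCW survival at t between arm 1 and arm 0."""
    est = {}
    for a in (0, 1):
        xs = [x for x, g in zip(times, arm) if g == a]
        ds = [d for d, g in zip(events, arm) if g == a]
        est[a] = ipcw_survival(xs, ds, t)
    return est[1] - est[0]

# Toy check: with one censored observation the IPCW estimate matches Kaplan-Meier.
toy_surv = ipcw_survival([1, 2, 3, 4], [1, 0, 1, 1], 2.5)
toy_delta = delta_hat([1, 2, 3, 4, 1, 1.5, 2, 4], [1] * 8, [1, 1, 1, 1, 0, 0, 0, 0], 2.5)
```

In the toy example, the second subject is censored at time 2, so the two subjects still at risk at t = 2.5 are each up-weighted by 1/(2/3), giving an estimate of 0.75, which coincides with the Kaplan-Meier estimate of survival at 2.5.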

E. PERTURBATION RESAMPLING

In practice, we may estimate the asymptotic variance of each estimator by perturbation resampling similar to those employed in Parast et al. (2016).16 Specifically, we may generate V = (V1, …, Vn) from independent and identically distributed non-negative random variables with mean 1 and variance 1 such as the unit exponential distribution. For each set of V, we let

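$$
\begin{aligned}
\hat f_a^*(s \mid t_0) &= \frac{\sum_{A_i = a} V_i K_h(S_i - s) I(X_i > t_0)\,\hat\omega_{t_0,i}^*}{\sum_{A_i = a} V_i I(X_i > t_0)\,\hat\omega_{t_0,i}^*}, \qquad
\hat f^*(s \mid t_0) = \frac{\sum_{i=1}^n V_i K_h(S_i - s) I(X_i > t_0)\,\hat\omega_{t_0,i}^*}{\sum_{i=1}^n V_i I(X_i > t_0)\,\hat\omega_{t_0,i}^*}, \\
\hat\mu_a^*(t_0) &= \frac{\sum_{A_i = a} V_i I(X_i > t_0)\,\hat\omega_{t_0,i}^*}{\sum_{A_i = a} V_i\,\hat\omega_{t_0,i}^*}, \qquad
\hat\mu^*(t_0) = \frac{\sum_{i=1}^n V_i I(X_i > t_0)\,\hat\omega_{t_0,i}^*}{\sum_{i=1}^n V_i\,\hat\omega_{t_0,i}^*}, \\
\hat m^*(s \mid t_0) &= \frac{\sum_{i=1}^n V_i K_h(S_i - s) I(X_i > t)\,\hat\omega_{t,i}^* \big/ \sum_{i=1}^n V_i\,\hat\omega_{t,i}^*}{\sum_{i=1}^n V_i K_h(S_i - s) I(X_i > t_0)\,\hat\omega_{t_0,i}^* \big/ \sum_{i=1}^n V_i\,\hat\omega_{t_0,i}^*}, \\
\hat c^* &= \frac{\sum_{A_i = 0} V_i I(X_i > t)\,\hat\omega_{t,i}^*}{\sum_{A_i = 0} V_i\,\hat\omega_{t,i}^*} - \frac{\sum_{A_i = 0} V_i\,\hat m^*(S_i \mid t_0) I(X_i > t_0)\,\hat\omega_{t_0,i}^*}{\sum_{A_i = 0} V_i I(X_i > t_0)\,\hat\omega_{t_0,i}^*}, \quad \text{and} \\
\hat\lambda^* &= \hat c^* \left\{ \int \frac{\hat f_0^*(s \mid t_0)^2\, \hat\mu_0^*(t_0)^2}{\hat f^*(s \mid t_0)\, \hat\mu^*(t_0)}\,ds + \frac{\{1 - \hat\mu_0^*(t_0)\}^2}{1 - \hat\mu^*(t_0)} \right\}^{-1},
\end{aligned}
$$

where $\hat\omega_{t,i}^* = \{ I(X_i \le t)\delta_i + I(X_i > t) \} / \hat G_{A_i}^*(X_i \wedge t)$ is the perturbed weight with the perturbed Kaplan-Meier estimator $\hat G_{A_i}^*(\cdot)$. Then we may obtain the perturbed counterparts of $\hat g(s)$, $\hat\Delta$, $\hat\Delta_{\hat g}$, and $\widehat{\mathrm{PTE}}_{cv}(t_0)$ as

$$
\hat g_1^* = \hat\lambda^*\, \frac{1 - \hat\mu_0^*(t_0)}{1 - \hat\mu^*(t_0)}, \qquad
\hat g_2^*(s) = \hat m^*(s \mid t_0) + \hat\lambda^*\, \frac{\hat f_0^*(s \mid t_0)\, \hat\mu_0^*(t_0)}{\hat f^*(s \mid t_0)\, \hat\mu^*(t_0)},
$$

$$
\hat\Delta^*(t) = \hat\mu_1^*(t) - \hat\mu_0^*(t), \qquad
\hat\Delta_{\hat g^*}^*(t_0) = \hat\mu_{\hat g^*,1}^*(t_0) - \hat\mu_{\hat g^*,0}^*(t_0), \qquad
\widehat{\mathrm{PTE}}_{\hat g^*}^*(t_0) = \hat\Delta_{\hat g^*}^*(t_0) / \hat\Delta^*(t)
$$

with

$$
\hat\mu_a^*(t) = \frac{\sum_{i=1}^n V_i\,\hat\omega_{t,i}^* I(A_i = a) Y_{t,i}}{\sum_{i=1}^n V_i\,\hat\omega_{t,i}^* I(A_i = a)}, \qquad
\hat\mu_{g,a}^*(t_0) = \frac{\sum_{i=1}^n V_i\,\hat\omega_{t_0,i}^* I(A_i = a)\, g(Q_{t_0,i})}{\sum_{i=1}^n V_i\,\hat\omega_{t_0,i}^* I(A_i = a)},
$$

and

$$
\widehat{\mathrm{PTE}}_{cv}^*(t_0) = K^{-1}\sum_{k=1}^K \Big( \widehat{\mathrm{PTE}}_{\hat g_{I_k}^*}^{(k)}(t_0) \Big)^*.
$$

In practice, we may generate a large number, say B, of realizations of V, and then obtain B realizations of $\hat g^*(s)$, $\hat\Delta^*$, $\hat\Delta_{\hat g^*}^*$, and $\widehat{\mathrm{PTE}}_{cv}^*(t_0)$. Variance estimates and confidence intervals (CIs) can then be constructed from the empirical variances and quantiles of these realizations.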
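The resampling loop itself is simple, and a generic Python sketch may help make it concrete. The sketch below is an illustration under assumptions rather than the paper's procedure: it perturbs a toy weighted mean instead of the full chain of kernel, Kaplan-Meier and PTE estimators, uses unit exponential weights as suggested above, and the names `perturbation_se_ci` and `weighted_mean`, the choice B = 2000 and the simulated data are all hypothetical.

```python
import math
import random

def perturbation_se_ci(estimator, n, n_rep=2000, level=0.95, seed=0):
    """Standard error and percentile CI of `estimator(weights)` by perturbation resampling.

    `estimator` maps a list of n non-negative weights to a scalar estimate;
    calling it with all weights equal to 1 should reproduce the point estimate."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_rep):
        v = [rng.expovariate(1.0) for _ in range(n)]  # i.i.d. weights with mean 1, variance 1
        reps.append(estimator(v))
    mean = sum(reps) / n_rep
    se = math.sqrt(sum((r - mean) ** 2 for r in reps) / (n_rep - 1))
    reps.sort()
    lo_idx = int(math.floor((1.0 - level) / 2.0 * (n_rep - 1)))
    hi_idx = int(math.ceil((1.0 + level) / 2.0 * (n_rep - 1)))
    return se, (reps[lo_idx], reps[hi_idx])

# Toy data and a toy weighted estimator (a weighted mean).
data_rng = random.Random(42)
x = [data_rng.gauss(0.0, 1.0) for _ in range(200)]

def weighted_mean(v):
    return sum(vi * xi for vi, xi in zip(v, x)) / sum(v)

point = weighted_mean([1.0] * len(x))
se, (lower, upper) = perturbation_se_ci(weighted_mean, len(x))
```

For the estimators in this appendix, `estimator` would recompute every weighted quantity (including the perturbed Kaplan-Meier estimator and the perturbed transformation) with the same realization of V, so that the variability from estimating gopt is propagated to the PTE.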

F. ADDITIONAL SIMULATION RESULTS

It is known that using the same dataset to estimate both gopt and its corresponding PTE may lead to overfitting bias as in standard prediction settings. The loss function may underestimate the true loss, the error loss L(g) in our context, and may overestimate PTE as a result. When the sample size is large as in our simulation study, the overfitting issue is usually ignorable. Here, we compare simulation results with and without cross-validation. As seen from Table F2, the PTE.nocv (no cross-validation) and PTE.cv (cross-validation) are comparable with the PTE.nocv estimate being slightly larger and with a slightly smaller empirical standard error.

We further examined sensitivity to assumption (A1) by comparing our proposed gopt to the true goracle in setting 3. Importantly, the oracle transformation goracle optimizing the oracle loss function is not identifiable in real data since no observable information is available to estimate the correlation between (T(1), S(1)) and (T(0), S(0)). Furthermore, the explicit functional form of goracle is also not tractable in general. To numerically approximate goracle in a simulation setting, where the joint distribution of (S(1), S(0), T(1), T(0)) is known, we use a basis expansion to approximate goracle as $\beta^{\top}\Psi(S)$, where Ψ(S) is a K-dimensional spline basis expansion of S and β is an unknown K-dimensional parameter. With a sufficiently rich set of {Ψ(S)}, we can approximate goracle well via the Monte Carlo method. When restricting goracle = $\beta^{\top}\Psi(S)$, we may easily estimate goracle by estimating the optimal β based on a large number of simulated realizations of (S(1), S(0), T(1), T(0)). As shown in Figure F1, gopt is very close to goracle across all three time points t0 = 0.3, 0.5 and 0.7 under setting 3. We also calculated the relative difference of PTEopt to PTEoracle,

$$
\mathrm{RD}_{\mathrm{PTE}} \equiv \{ \mathrm{PTE}_{opt} - \mathrm{PTE}_{oracle} \} / \mathrm{PTE}_{oracle}.
$$

Across all three time points t0 = 0.3, 0.5, 0.7, RDPTE = −0.086, −0.018, and 0.005, respectively, suggesting that the PTE estimates may not be sensitive to these departures from the working independence assumption.

TABLE F2.

The proposed CV-based PTE estimates (PTE.cv) and the PTE estimates without CV (PTE.nocv) for Setting 1-Setting 3, with empirical standard errors in parentheses.

| t0 | Setting 1: PTE.nocv | Setting 1: PTE.cv | Setting 2: PTE.nocv | Setting 2: PTE.cv | Setting 3: PTE.nocv | Setting 3: PTE.cv |
|---|---|---|---|---|---|---|
| 0.3 | 0.693 (0.090) | 0.685 (0.093) | 0.454 (0.089) | 0.450 (0.099) | 0.522 (0.116) | 0.510 (0.128) |
| 0.5 | 0.770 (0.083) | 0.761 (0.087) | 0.625 (0.079) | 0.619 (0.089) | 0.651 (0.107) | 0.633 (0.120) |
| 0.7 | 0.836 (0.072) | 0.817 (0.080) | 0.773 (0.066) | 0.762 (0.076) | 0.771 (0.093) | 0.745 (0.101) |

FIGURE F1.


Plots of g(s) for setting 3 with t0 = 0.3, 0.5, 0.7, respectively.

References

  • 1. Mendelsohn J, Moses HL, Nass SJ, et al. A national cancer clinical trials system for the 21st century: Reinvigorating the NCI Cooperative Group Program. National Academies Press. 2010.
  • 2. Zakeri K, Panjwani N, Carmona R, et al. Generalized competing event models can reduce cost and duration of cancer clinical trials. JCO Clinical Cancer Informatics 2018; 2: 1–12.
  • 3. Tay-Teo K, Ilbawi A, Hill SR. Comparison of sales income and research and development costs for FDA-approved cancer drugs sold by Originator drug companies. JAMA Network Open 2019; 2(1): e186875–e186875.
  • 4. FDA. Table of Surrogate Endpoints That Were the Basis of Drug Approval or Licensure. https://www.fda.gov/drugs/development-resources/table-surrogate-endpoints-were-basis-drug-approval-or-licensure; 2020.
  • 5. Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 1989; 8(4): 431–440. doi: 10.1002/sim.4780080407
  • 6. Huang Y, Gilbert PB. Comparing biomarkers as principal surrogate endpoints. Biometrics 2011; 67(4): 1442–1451. doi: 10.1111/j.1541-0420.2011.01603.x
  • 7. Gilbert PB, Hudgens MG. Evaluating candidate principal surrogate endpoints. Biometrics 2008; 64(4): 1146–1154. doi: 10.1111/j.1541-0420.2008.01014.x
  • 8. Alonso A, Geys H, Molenberghs G, Kenward MG, Vangeneugden T. Validation of surrogate markers in multiple randomized clinical trials with repeated measurements: canonical correlation approach. Biometrics 2004; 60(4): 845–853.
  • 9. Molenberghs G, Buyse M, Geys H, Renard D, Burzykowski T, Alonso A. Statistical challenges in the evaluation of surrogate endpoints in randomized trials. Controlled Clinical Trials 2002; 23(6): 607–625.
  • 10. Burzykowski T, Molenberghs G, Buyse M. The evaluation of surrogate endpoints. Springer. 2005.
  • 11. Price BL, Gilbert PB, van der Laan MJ. Estimation of the optimal surrogate based on a randomized trial. Biometrics 2018.
  • 12. Freedman LS, Graubard BI, Schatzkin A. Statistical validation of intermediate endpoints for chronic diseases. Statistics in Medicine 1992; 11(2): 167–178. doi: 10.1002/sim.4780110204
  • 13. Lin D, Fleming T, De Gruttola V, et al. Estimating the proportion of treatment effect explained by a surrogate marker. Statistics in Medicine 1997; 16(13): 1515–1527.
  • 14. Wang Y, Taylor JM. A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics 2002; 58(4): 803–812. doi: 10.1111/j.0006-341X.2002.00803.x
  • 15. Conlon AS, Taylor JM, Elliott MR. Surrogacy assessment using principal stratification when surrogate and outcome measures are multivariate normal. Biostatistics 2014; 15(2): 266–283. doi: 10.1093/biostatistics/kxt051
  • 16. Parast L, McDermott MM, Tian L. Robust estimation of the proportion of treatment effect explained by surrogate marker information. Statistics in Medicine 2016; 35(10): 1637–1653.
  • 17. Wang X, Parast L, Tian L, Cai T. Model-free approach to quantifying the proportion of treatment effect explained by a surrogate marker. Biometrika 2020; 107(1): 107–122.
  • 18. Parast L, Cai T, Tian L. Evaluating surrogate marker information using censored data. Statistics in Medicine 2017; 36(11): 1767–1782.
  • 19. Scott D. Multivariate density estimation. Wiley, New York; 1992.
  • 20. Diabetes Prevention Program Group. The Diabetes Prevention Program: design and methods for a clinical trial in the prevention of Type 2 diabetes. Diabetes Care 1999; 22(4): 623–634.
  • 21. Diabetes Prevention Program Group. Reduction in the Incidence of Type 2 Diabetes with Lifestyle Intervention or Metformin. New England Journal of Medicine 2002; 346(6): 393–403. doi: 10.1056/NEJMoa012512
  • 22. DPP. Diabetes Prevention Program. https://repository.niddk.nih.gov/studies/dpp/; 2013.
  • 23. Tian L, Cai T, Zhao L, Wei LJ. On the covariate-adjusted estimation for an overall treatment difference with data from a randomized comparative clinical trial. Biostatistics 2012; 13(2): 256–273.
  • 24. Garcia TP, Ma Y, Yin G. Efficiency improvement in a class of survival models through model-free covariate incorporation. Lifetime Data Analysis 2011; 17(4): 552–565.
  • 25. Zhang M, Tsiatis AA, Davidian M. Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics 2008; 64(3): 707–715.
  • 26. Han L, Wang X, Cai T. On the Evaluation of Surrogate Markers in Real World Data Settings. arXiv preprint arXiv:2104.05513 2021.
  • 27. VanderWeele TJ. Surrogate measures and consistent surrogates. Biometrics 2013; 69(3): 561–565.
  • 28. Elliott MR, Conlon AS, Li Y, Kaciroti N, Taylor JM. Surrogacy marker paradox measures in meta-analytic settings. Biostatistics 2015; 16(2): 400–412.
  • 29. Gill RD. Censoring and stochastic integrals. Statistica Neerlandica 1980; 34(2): 124–124.
  • 30. Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: Springer. 1992. (pp. 297–331).
