Author manuscript; available in PMC 2024 Nov 7.
Published in final edited form as: Mathematics (Basel). 2022 Feb 13;10(4):584. doi: 10.3390/math10040584

A Surrogate Measure for Time-Varying Biomarkers in Randomized Clinical Trials

Rui Zhuang 1, Fan Xia 1, Yixin Wang 2, Ying-Qing Chen 2,*
PMCID: PMC11542621  NIHMSID: NIHMS2031678  PMID: 39512426

Abstract

Clinical trials with rare or distant outcomes are usually designed to be large and long-term. Their demands on resources and time limit the feasibility and efficiency of the studies. There is thus motivation to replace rare or distal clinical endpoints with reliable surrogate markers that can be measured earlier and more easily. However, statistical challenges remain in evaluating and ranking potential surrogate markers. In this paper, we define a generalized proportion of treatment effect for survival settings. The measure's definition and estimation do not rely on any model assumption. It is equipped with a consistent and asymptotically normal non-parametric estimator. Under proper conditions, the measure reflects the proportion of the average treatment effect mediated by the surrogate marker among the group that would survive to the marker measurement time under both the intervention and control arms.

Keywords: surrogate measure, survival settings, time-varying internal markers

1. Introduction

The HPTN (HIV Prevention Trials Network) 052 study is an HIV prevention trial conducted across several continents. Its primary clinical endpoint, HIV infection, occurs at an estimated rate of only 3–5% in modern clinical trial settings. In addition, the time from viral exposure to infection is long, with a median of more than one year. It is therefore desirable to replace the clinically meaningful endpoint with an earlier and more easily accessible alternative endpoint.

A surrogate marker in clinical trials is considered to be "a laboratory measurement or physical sign used as a substitute for a clinically meaningful endpoint that measures directly how a patient feels, functions, or survives and that is expected to predict the effect of the therapy" [1]. It is considered valid if one could correctly conclude the treatment effect on the clinical endpoint by using the marker [2,3]. In this context, how to validate surrogate markers for a clinically meaningful endpoint is especially important. Zhuang and Chen [4] review surrogate measures in clinical research, including their strengths and limitations, in detail.

A surrogate marker is considered valid if one could correctly conclude the treatment effect on the clinical endpoint by using that marker. In the language of hypothesis testing, departure from the null hypothesis P(T | Z) = P(T) is captured by departure from the null hypothesis P(S | Z) = P(S), where Z, S, and T represent the intervention, marker, and clinical endpoint, respectively. Prentice [5] operationalized this idea as a test of P(T | S, Z) = P(T | S). The conditional independence of T and Z represents the ideal situation in which the marker fully mediates the treatment effect. However, many candidate markers capture only part of the treatment effect, so the extent to which a marker captures the treatment effect is of great practical importance. Freedman et al. [6] extended Prentice's criterion and evaluated the strength of surrogate markers by comparing the treatment effect with and without adjusting for the marker. For a binary endpoint T, consider the logistic models:

logit P(T = 1 | Z) = μ_1 + βZ,
logit P(T = 1 | Z, S) = μ_2 + β_S Z + φS,

where the proportion of treatment effect explained (PTE) is defined as PTE = 1 − β_S/β. However, the adjusted and unadjusted models do not hold simultaneously in general model classes [7–9], and the assumption of no interaction in the adjusted model is not necessarily true. To avoid model dependence, Wang and Taylor [9] proposed the F-measure in a general setting as F = (AA − AB)/(AA − BB), where AA = ∫ h(g_A(s)) dP_A(s), AB = ∫ h(g_A(s)) dP_B(s), and BB = ∫ h(g_B(s)) dP_B(s). Here, P_A(s) and P_B(s) are the distributions of the surrogate marker S in the treatment group A and the control group B, respectively. The functions g_A(s) and g_B(s) are functions of the conditional distribution of the primary endpoint given S in the two groups. The functions h(·), g_A(·), and g_B(·) are chosen such that AA − BB is the desired measure of treatment effect on the primary endpoint. The F-measure framework is flexible while preserving the flavor of comparing the marginal and adjusted treatment effects.
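To make the construction concrete, for a discrete surrogate the three F-measure building blocks reduce to simple weighted averages. The following minimal sketch uses h(u) = u and entirely made-up probabilities (all numeric values below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Hypothetical discrete surrogate S taking values 0..3 (assumed numbers).
gA = np.array([0.10, 0.30, 0.55, 0.80])  # P(response | S=s) in arm A
gB = np.array([0.05, 0.20, 0.40, 0.60])  # P(response | S=s) in arm B
pA = np.array([0.10, 0.20, 0.30, 0.40])  # distribution of S in arm A
pB = np.array([0.40, 0.30, 0.20, 0.10])  # distribution of S in arm B

# With h(u) = u, the Wang-Taylor building blocks are plain mixtures:
AA = np.sum(gA * pA)  # arm-A response rate under the arm-A marker law
AB = np.sum(gA * pB)  # arm-A response curve mixed over the arm-B marker law
BB = np.sum(gB * pB)  # arm-B response rate under the arm-B marker law

F = (AA - AB) / (AA - BB)
print(F)
```

The quantity AB replaces the treatment-arm marker distribution with the control-arm one, so AA − AB is the part of the treatment effect that disappears once the marker distributions are equalized.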

In this paper, we bring in the time dimension and explicitly define a generalized F-measure for time-to-event outcomes and time-varying internal surrogate markers. In Section 2, we introduce the time-varying F-measure and show that it can be estimated by a non-parametric estimator that is consistent and asymptotically normal. In Section 3, we give examples visualizing how the F-measure changes with time and conduct Monte Carlo simulation studies to evaluate the proposed non-parametric estimation and inference. In Section 4, we apply the time-varying F-measure to an HIV prevention trial for illustration. Finally, we conclude the paper with a discussion in Section 5 and a conclusion in Section 6.

2. The Time-Varying F-Measure

2.1. Definition

We introduce the time-varying F-measure in this section. The new measure does not rely on any model assumption. In addition, it reflects Prentice’s criterion and describes the degree to which a marker captures the treatment effect on the clinical endpoints.

We consider intervention groups Z = 1 and Z = 0. Let T represent the time-to-event outcome and X_t the value of a candidate marker measured at time point t (after randomization). The time-varying F-measure is formulated to evaluate the marker when survival status at time point c (c > t) is of primary interest. We choose h(u) = u, g_z(x) = P(T ≥ c | X_t = x, T ≥ t, Z = z), and P_z(x) = P(X_t = x | T ≥ t, Z = z). Then, the F-measure for a time-to-event outcome T is:

F(c,t) = (AA − AB) / (AA − BB),

where

AA = P(T ≥ c | T ≥ t, Z = 1),
BB = P(T ≥ c | T ≥ t, Z = 0),
AB = Σ_x P(T ≥ c | X_t = x, T ≥ t, Z = 1) P(X_t = x | T ≥ t, Z = 0).

It is a function of the time point c at which survival status is of primary interest and the time point t at which the surrogate marker is measured. The definition and estimation do not rely on any model assumption and are therefore free of model misspecification.

The time-varying F-measure reflects Prentice's criterion. Namely, the scenarios of perfect markers, in which the marker mediates all of the treatment effect, lead to F(c,t) = 1; the scenarios of useless markers, in which the marker does not mediate any treatment effect or is independent of the intervention in the group of interest, lead to F(c,t) = 0. In addition, when the treatment effect mediated by the marker acts in the same direction as the direct treatment effect, the F-measure for a partial marker is guaranteed to be bounded within (0,1). A value outside this ideal bound indicates that treatment effects via different pathways are not in the same direction, so that the marker is not an appropriate surrogate. (Theoretical results are deferred to Section 2.3.)

In summary, the time-varying F-measure evaluates the relative position of the survival probability adjusted by eliminating the treatment effect on a biomarker. It serves as a model-free metric for assessing the proportion of treatment effect explained by the marker.

2.2. Estimation and Inference

In the time-varying F-measure, the survival probabilities can be estimated by the non-parametric Kaplan–Meier estimator [10]. Under the assumption of random censoring, the conditional probability p_x0 ≡ P(X_t = x | T ≥ t, Z = 0) can be estimated by the empirical distribution. Naturally, we propose a plug-in estimator for the time-varying F-measure:

F̂ = (ŝ_1 − Σ_x ŝ_1x p̂_x0) / (ŝ_1 − ŝ_0), (1)

where ŝ_z (z = 0, 1) and ŝ_1x are the Kaplan–Meier estimators of P(T ≥ c | T ≥ t, Z = z) and P(T ≥ c | X_t = x, T ≥ t, Z = 1), respectively. Let u_z1 < u_z2 < … be the ordered, distinct event times observed on arm z; n_z(τ) be the number of subjects in the risk set at time τ on arm z; and d_z(τ) be the number of events at time τ on arm z. The Kaplan–Meier estimator of the survival probabilities reads:

ŝ_z = Π_{t < u_zk ≤ c} (1 − d_z(u_zk)/n_z(u_zk)).

Similarly, s_1x ≡ P(T ≥ c | X_t = x, T ≥ t, Z = 1) can be estimated by the Kaplan–Meier estimator in the stratum defined by Z = 1 and X_t = x as:

ŝ_1x = Π_{t < u_1xk ≤ c} (1 − d_1x(u_1xk)/n_1x(u_1xk)).

Under the assumption of random censoring, p_x0 ≡ P(X_t = x | T ≥ t, Z = 0) can be estimated by the empirical distribution as:

p̂_x0 = n_0x(t)/n_0(t),

where n_0x(t) = Σ_{i=1}^n I(X_ti = x, T_i ≥ t, Z_i = 0).
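The plug-in estimator (1) is straightforward to implement. Below is a minimal sketch with a hand-rolled conditional Kaplan–Meier routine for a discrete marker; the function names and data layout are our own, not from the paper:

```python
import numpy as np

def km_conditional(obs, delta, t, c):
    """Kaplan-Meier estimate of P(T >= c | T >= t) from observed times
    obs = min(T, U) and event indicators delta = I(T <= U), via the
    product-limit over distinct event times in (t, c]."""
    keep = obs >= t                      # subjects still at risk at t
    obs, delta = obs[keep], delta[keep]
    surv = 1.0
    for u in np.unique(obs[(delta == 1) & (obs <= c)]):
        n_risk = np.sum(obs >= u)        # risk set just before u
        d = np.sum((obs == u) & (delta == 1))
        surv *= 1.0 - d / n_risk
    return surv

def f_hat(obs, delta, z, x_t, t, c):
    """Plug-in time-varying F-measure for a discrete marker x_t measured
    at time t. Strata empty on the treated arm silently get a survival
    estimate of 1, so this is a sketch, not production code."""
    risk = obs >= t
    s1 = km_conditional(obs[z == 1], delta[z == 1], t, c)
    s0 = km_conditional(obs[z == 0], delta[z == 0], t, c)
    ctrl = risk & (z == 0)
    adjusted = 0.0
    for x in np.unique(x_t[risk]):
        p_x0 = np.mean(x_t[ctrl] == x)   # empirical marker law, control arm
        stratum = (z == 1) & (x_t == x)
        adjusted += km_conditional(obs[stratum], delta[stratum], t, c) * p_x0
    return (s1 - adjusted) / (s1 - s0)
```

As a quick sanity check, a constant marker makes the adjusted term collapse to ŝ_1, giving F̂ = 0, consistent with the useless-marker case.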

We show that the proposed estimator converges weakly to a Gaussian process under the following regularity assumptions (the proof of the theorem is deferred to Appendix A).

Assumption 1. The time c lies in a range (t, τ) for some constants t > 0 and 0 < τ < ∞ such that s_1(τ)s_0(τ) > 0 and 1 − (1 − H(τ−))(1 − G(τ−)) < 1, where H is the distribution function of the time-to-event T and G is the distribution function of the censoring time U.

Assumption 2. The survival probabilities satisfy s_1(·) ≠ s_0(·) on (0, τ).

Assumption 3. Random censoring: The censoring time U is independent of both the failure time T and time-varying covariates Xt on (0,τ).

Theorem 1. Under regularity Assumptions 1–3, given a time t, √n_t (F̂(c) − F(c)) converges weakly to a zero-mean Gaussian process with covariance function E[ζ(c)ζ(c′)] between time points c and c′, where:

ζ(c) = [(s_1 − Σ_x p_x0 s_1x)/(s_1 − s_0)²] η_0 + [(Σ_x p_x0 s_1x − s_0)/(s_1 − s_0)²] η_1 + [1/(s_0 − s_1)] Σ_x p_x0 η_1x + [1/(s_0 − s_1)] Σ_x s_1x η_0x^p

and

η_0 = −s_0(c) ∫_t^c I(Z = 0, T ≥ t)[dN(u) − Y(u) dΛ_0(u)] / E[I(Z = 0, T ≥ t) Y(u)],
η_1 = −s_1(c) ∫_t^c I(Z = 1, T ≥ t)[dN(u) − Y(u) dΛ_1(u)] / E[I(Z = 1, T ≥ t) Y(u)],
η_1x = −s_1x(c) ∫_t^c I(Z = 1, X_t = x, T ≥ t)[dN(u) − Y(u) dΛ_1x(u)] / E[I(Z = 1, X_t = x, T ≥ t) Y(u)],
η_0x^p = (1/p_0)[I(X_t = x, Z = 0, T ≥ t) − p_0x] − (p_0x/p_0²)[I(Z = 0, T ≥ t) − p_0].

In the above equations, N(u) ≡ I(T ≤ U, T ≤ u) denotes the observed counting process, Y(u) ≡ I(T ≥ u, U ≥ u) the at-risk process, p_0 ≡ P(Z = 0 | T ≥ t), and p_0x ≡ P(X_t = x, Z = 0 | T ≥ t). The covariance function E[ζ(c)ζ(c′)] can be consistently estimated by (1/n_t) Σ_{i=1}^{n_t} ζ̂_i(c)ζ̂_i(c′), where ζ̂_i(·) is the sample version of ζ(·).

2.3. Ranges of F-Measure

2.3.1. Perfect Marker

When the marker mediates the entire treatment effect, we have P(T ≥ c | X_t = x, T ≥ t, Z = 1) = P(T ≥ c | X_t = x, T ≥ t, Z = 0). This implies Σ_x P(T ≥ c | X_t = x, T ≥ t, Z = 1) P(X_t = x | T ≥ t, Z = 0) = Σ_x P(T ≥ c | X_t = x, T ≥ t, Z = 0) P(X_t = x | T ≥ t, Z = 0), that is, AB = BB, and furthermore, F(c,t) = 1.

2.3.2. Useless Marker

When the marker does not mediate any treatment effect, we have P(T ≥ c | X_t = x_1, T ≥ t, Z = 1) = P(T ≥ c | X_t = x_2, T ≥ t, Z = 1) for any x_1, x_2; when the intervention is independent of X_t in the risk set at time point t, we have P(X_t = x | T ≥ t, Z = 1) = P(X_t = x | T ≥ t, Z = 0). Either of these useless-marker conditions leads to Σ_x P(T ≥ c | X_t = x, T ≥ t, Z = 1) P(X_t = x | T ≥ t, Z = 1) = Σ_x P(T ≥ c | X_t = x, T ≥ t, Z = 1) P(X_t = x | T ≥ t, Z = 0), that is, AA = AB, and furthermore, F(c,t) = 0.

2.3.3. Partial Marker

Without loss of generality, we consider the case AA − BB > 0. Theorem 2 stated below and its proof in Appendix B extend naturally to the counterpart case AA − BB < 0. For interpretability and links to common instances in clinical trials, we impose three mild assumptions:

Assumption 4. X_t in the treatment group and X_t in the control group are stochastically ordered: P(X_t ≤ x | T ≥ t, Z = 1) ≥ P(X_t ≤ x | T ≥ t, Z = 0) for all x, or P(X_t ≤ x | T ≥ t, Z = 1) ≤ P(X_t ≤ x | T ≥ t, Z = 0) for all x.

Assumption 5. P(T ≥ c | X_t = x, T ≥ t, Z = z) is monotone in x in the same direction for any given z.

Assumption 6. P(T ≥ c | X_t = x, T ≥ t, Z = z) is monotone in z in the same direction for any given x.

In addition, we formulate three conditions:

  • C1. P(T ≥ c | X_t = x, T ≥ t, Z = 1) − P(T ≥ c | X_t = x, T ≥ t, Z = 0) > 0 for all x.

  • C2. P(X_t ≤ x | T ≥ t, Z = 1) ≤ P(X_t ≤ x | T ≥ t, Z = 0) for all x, and P(T ≥ c | X_t = x, T ≥ t, Z = 1) is increasing in x.

  • C3. P(X_t ≤ x | T ≥ t, Z = 1) ≥ P(X_t ≤ x | T ≥ t, Z = 0) for all x, and P(T ≥ c | X_t = x, T ≥ t, Z = 1) is decreasing in x.

Theorem 2. With Assumptions 4–6, if Condition C1 is satisfied, then F<1; if either Condition C2 or C3 is satisfied, then F>0.
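The partial-marker bounds of Theorem 2 can be checked numerically. The following small sketch uses an assumed three-level marker whose distributions and conditional survival probabilities satisfy Assumptions 4–6 and Conditions C1–C2 (all probabilities below are illustrative):

```python
import numpy as np

# Hypothetical discrete marker x in {0, 1, 2} among subjects with T >= t.
# p1, p0: P(X_t = x | T >= t, Z = z); under Z=1 the marker is
# stochastically greater (its CDF [0.2, 0.5, 1.0] lies below [0.5, 0.8, 1.0]).
p1 = np.array([0.2, 0.3, 0.5])
p0 = np.array([0.5, 0.3, 0.2])
# s1x, s0x: P(T >= c | X_t = x, T >= t, Z = z), increasing in x
# (Assumption 5) and uniformly larger under treatment (Condition C1).
s1x = np.array([0.50, 0.65, 0.80])
s0x = np.array([0.40, 0.55, 0.70])

AA = np.sum(s1x * p1)
AB = np.sum(s1x * p0)
BB = np.sum(s0x * p0)
F = (AA - AB) / (AA - BB)
print(F)  # should fall strictly inside (0, 1) by Theorem 2
```

Perturbing the inputs so that C1 fails (e.g., making the direct effect and the marker-mediated effect point in opposite directions) pushes F outside (0, 1), illustrating the diagnostic use of the bound.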

2.4. Causal Interpretation

The F-measure is closely related to the concept of the natural indirect effect, which is defined in the counterfactual framework [11,12]. We formulate Theorem 3 revealing this link, with a detailed proof in Appendix C.

Assumption 7. {T(Z = 1, X_t = x), X_t(Z = 1)} ⊥ {T(Z = 0, X_t = x), X_t(Z = 0)}.

Assumption 8. Z ⊥ {T(Z = z, X_t = x), X_t(Z = z)}.

Assumption 9. X_t(Z = 1) ⊥ T(Z = 1, X_t = x) | T ≥ t, Z = 1.

Theorem 3. Under Assumptions 7–9, it holds that:

F = [P(T(1) ≥ c | T(1) ≥ t, T(0) ≥ t) − P(T(1, X_t = X_t(0)) ≥ c | T(1) ≥ t, T(0) ≥ t)] / [P(T(1) ≥ c | T(1) ≥ t, T(0) ≥ t) − P(T(0) ≥ c | T(1) ≥ t, T(0) ≥ t)].

For the subgroup with T(1) ≥ t and T(0) ≥ t, the F-measure's numerator describes the natural indirect effect mediated by the surrogate marker, while the denominator describes the average treatment effect. The ratio reflects the proportion of the average treatment effect mediated by the surrogate marker (in the sense of the natural indirect effect) for the subgroup that would survive to the marker measurement time under either arm. However, we also note that the causal interpretation does not apply in general [13].

3. Numerical Studies

To assess the proposed surrogate measure, we conduct numerical studies motivated by the HIV Prevention Trials Network. The plasma HIV-1 viral load represents the degree of viral burden and is believed to play a crucial role in mediating the benefit of antiretroviral therapy (ART) on HIV-related disease progression and transmission. We consider a viral load measurement dichotomized at a threshold of 1000 copies per milliliter as the biomarker of interest. In a hypothetical scenario, participants have some HIV-1 exposure at enrollment. The viral load may increase quickly during follow-up, while an effective intervention could delay the viral proliferation and thereby postpone the failure time. We express this scenario in the following mathematical models. The dichotomous viral load level at time t is modeled as X_t = I(t ≥ t_s), where t_s denotes the time when one's viral load shifts from level 0 to 1 after enrollment. We assume t_s follows an exponential distribution with mean μ_z in intervention group z, and a time-varying Cox–Weibull model:

h(t | X_t, Z) = h_0(t) exp(b_1 Z + b_2 X_t), (2)
h_0(t) = λvt^{v−1}, (3)

where Z is Bernoulli with success probability of 0.5.

3.1. Numerical Examples

In Figure 1, we explore the numerical behavior of the time-varying F-measure under the motivating scenario described above. In particular, we assume Model (2) with a constant baseline hazard h_0(t) = 0.2. Without loss of generality, we assume b_1 ≤ 0, b_2 ≥ 0, and c = 5. Under this model assumption, the F-measure has a closed-form formula, with details in Appendix D. Figure 1a fixes b_1 = −1, μ_0 = 0.5, μ_1 = 2; varying b_2 visualizes F-measure curves from a useless marker with b_2 = 0 to partial markers with b_2 > 0. Figure 1b fixes b_2 = 1, μ_0 = 0.5, μ_1 = 2; varying b_1 gives F-measure curves from a perfect marker with b_1 = 0 to partial markers with b_1 < 0.

Figure 1. F-measure curves describing the surrogacy level for survival status at Year 5.

3.2. Monte Carlo Simulation

In this section, we describe our Monte Carlo simulation to evaluate the proposed non-parametric estimator. We generate failure times for each z-group based on a closed-form approach described in Austin [14]. First, a random value u is generated from the Uniform(0, 1) distribution and the subject-specific shift time t_s is generated from an exponential distribution with mean μ_z. Second, if −log u < λ exp(b_1 z) t_s^v, we let the failure time be T = [−log(u) / (λ exp(b_1 z))]^{1/v}; otherwise, T = {[−log u − λ exp(b_1 z) t_s^v + λ exp(b_2) exp(b_1 z) t_s^v] / [λ exp(b_2) exp(b_1 z)]}^{1/v}. In addition, we generate the censoring times from Uniform(0, τ), in which τ is chosen to give a censoring rate of 20%. The censoring is independent of the failure time T and the covariates Z and X_t.
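The two-branch inversion above can be sketched as follows; the helper sim_failure and its argument layout are our own, and the demonstrated parameter values mirror the partial-marker configuration described below (with means in years):

```python
import numpy as np

def sim_failure(z, lam, v, b1, b2, mu, rng):
    """Draw one failure time under h(t | Xt, Z) = lam*v*t**(v-1) *
    exp(b1*z + b2*Xt), with Xt = I(t >= ts) and ts ~ Exponential(mean mu[z]),
    by inverting the cumulative hazard at -log(u), u ~ Uniform(0, 1)."""
    u = rng.uniform()
    ts = rng.exponential(mu[z])
    target = -np.log(u)                     # solve H(T) = -log(u)
    h_shift = lam * np.exp(b1 * z) * ts**v  # cumulative hazard at ts
    if target < h_shift:                    # failure before the marker shifts
        return (target / (lam * np.exp(b1 * z))) ** (1.0 / v)
    # failure after the shift: hazard is scaled by exp(b2) from ts onward
    num = target - h_shift + lam * np.exp(b2) * np.exp(b1 * z) * ts**v
    return (num / (lam * np.exp(b2) * np.exp(b1 * z))) ** (1.0 / v)

rng = np.random.default_rng(2022)
draws = [sim_failure(1, 0.2, 1.0, -0.5, 0.5, [0.25, 2.5], rng)
         for _ in range(5)]
print([round(d, 2) for d in draws])
```

With b_2 = 0 the two branches coincide and T reduces to a plain Weibull draw with rate λ exp(b_1 z), which provides a quick check of the implementation.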

Table 1 summarizes the simulation results. We consider the shape parameter v to be 0.8, 1, and 1.2, representing hazards that are decreasing, constant, and increasing with time, respectively. We are interested in the surrogacy level of the t-th year marker measurement for the treatment effect on the c-th year survival probability. For each setting of v, we choose c = 5 (years) and t = 0.25, 0.5, 1, 2 (years). We show typical scenarios in which a surrogate marker is perfect, useless, or partial. A perfect surrogate marker explains all of the treatment effect on the clinical endpoint, i.e., b_1 = 0; a useless marker is conditionally independent of the failure time given Z, i.e., b_2 = 0; a partial marker, the most common scenario in practice, lies between these extremes. Without loss of generality, we consider a treatment that delays the failure time both by directly affecting the clinical endpoint and by suppressing a harmful marker. That is, b_1 ≤ 0, b_2 ≥ 0, and μ_0 ≤ μ_1. Specifically, the configurations for the three scenarios in Table 1 are: (1) a perfect marker: λ = 0.02, b_1 = 0, b_2 = 3, μ_0 = 3 months, μ_1 = 30 months; (2) a useless marker: λ = 0.3, b_1 = −1, b_2 = 0, μ_0 = 3 months, μ_1 = 30 months; and (3) a partial marker: λ = 0.2, b_1 = −0.5, b_2 = 0.5, μ_0 = 3 months, μ_1 = 30 months. We replicate 1000 times with 20,000 subjects. Under this large sample size, the estimator is unbiased; its variance accurately reflects the sampling variation; and the coverage of 95% Wald-type confidence intervals is close to the nominal probability. One limitation of the non-parametric estimator is its lack of efficiency, which is the price of avoiding model misspecification.

Table 1.

Simulation results under the Cox–Weibull model. The sample size is 20,000 subjects and the coverage probability is obtained from 1000 replicates.

c = 5, v = 0.8
Scenario  Marker time^a  True value  Bias  Sampling SE  Mean of SE  Coverage
Perfect  0.25  0.747  0.003  0.052  0.051  0.941
Perfect  0.5   0.932  0.002  0.054  0.055  0.956
Perfect  1     0.995  0.003  0.059  0.059  0.953
Perfect  2     1.000  0.007  0.081  0.080  0.948
Useless  0.25  0.000  0.001  0.034  0.034  0.948
Useless  0.5   0.000  0.001  0.031  0.030  0.951
Useless  1     0.000  0.001  0.023  0.023  0.949
Useless  2     0.000  0.001  0.017  0.016  0.944
Partial  0.25  0.197  0.003  0.051  0.051  0.955
Partial  0.5   0.229  0.001  0.047  0.047  0.953
Partial  1     0.213  0.002  0.038  0.037  0.952
Partial  2     0.167  0.003  0.030  0.030  0.957

c = 5, v = 1
Scenario  Marker time^a  True value  Bias  Sampling SE  Mean of SE  Coverage
Perfect  0.25  0.743  0.002  0.043  0.044  0.951
Perfect  0.5   0.931  0.002  0.044  0.046  0.954
Perfect  1     0.995  0.002  0.047  0.048  0.949
Perfect  2     1.000  0.004  0.062  0.063  0.945
Useless  0.25  0.000  0.001  0.033  0.034  0.964
Useless  0.5   0.000  0.000  0.030  0.030  0.958
Useless  1     0.000  0.000  0.022  0.022  0.954
Useless  2     0.000  0.001  0.015  0.016  0.954
Partial  0.25  0.204  0.002  0.051  0.051  0.951
Partial  0.5   0.241  0.000  0.046  0.046  0.953
Partial  1     0.228  0.000  0.036  0.036  0.960
Partial  2     0.181  0.001  0.028  0.029  0.951

c = 5, v = 1.2
Scenario  Marker time^a  True value  Bias  Sampling SE  Mean of SE  Coverage
Perfect  0.25  0.742  0.002  0.036  0.036  0.949
Perfect  0.5   0.930  0.002  0.036  0.037  0.956
Perfect  1     0.995  0.001  0.037  0.038  0.952
Perfect  2     1.000  0.003  0.049  0.048  0.940
Useless  0.25  0.000  0.001  0.037  0.037  0.953
Useless  0.5   0.000  0.000  0.032  0.032  0.955
Useless  1     0.000  0.000  0.023  0.023  0.943
Useless  2     0.000  0.000  0.016  0.016  0.953
Partial  0.25  0.219  0.003  0.055  0.054  0.948
Partial  0.5   0.262  0.000  0.049  0.049  0.956
Partial  1     0.252  0.000  0.037  0.038  0.947
Partial  2     0.203  0.001  0.030  0.030  0.952
^a The time point (year) at which the marker is measured.

4. Data Analysis

We apply the proposed time-varying F-measure to the HIV Prevention Trials Network (HPTN) 052 study [15]. The study enrolled 1763 serodiscordant couples in which one partner was HIV-positive and the other was HIV-negative. The HIV-positive patients were randomly assigned to receive either immediate or delayed ART in a 1:1 ratio. Patients on the delayed arm started ART when two consecutive CD4+ cell count measurements fell below 250 per cubic millimeter or an illness indicative of AIDS developed. The study monitored, as a key endpoint, the earlier occurrence of a severe clinical outcome in the HIV-positive patient or HIV transmission to the HIV-negative partner. Plasma viral load is believed to mediate the effect of ART on HIV-associated disease progression and transmission [16].

In this application, we consider the plasma viral load as a candidate marker and evaluate its surrogacy level for the composite monitoring endpoint over a 3-year follow-up. To keep the illustration simple, we dichotomize the viral load at a threshold of 1000 copies per milliliter; specifically, we set the marker value to 1 for a viral load greater than 1000. We estimate the time-varying F-measure for the viral load measured at each of the 2nd to 7th quarters after randomization. Table 2 shows the results of the application. Comparing the prevalence of a high viral load between the two arms reveals that ART was very effective in suppressing viral proliferation. In addition, a low viral load significantly decreases the hazard of the composite endpoint on the immediate arm before the treatment effect kicks in on the delayed arm. The time-varying F-measure gradually increases until reaching its maximum at the 6th quarter. This temporal pattern reflects the fact that the surrogacy level combines the treatment effect on the marker and the marker effect on the clinical endpoint. On the one hand, it takes time for the effect of viral load suppression to materialize. On the other hand, as an increasing number of patients on the delayed arm began ART, the difference in the marker distribution between the two arms became smaller. The time-varying F-measure correctly reflects the temporal pattern and the biological mechanism of ART.

Table 2.

Application to an HIV prevention trial HPTN 052. The proposed time-varying F-measure captures the proportion of treatment effect explained by the plasma HIV-1 viral load.

Marker Time^a  Delayed Arm Prevalence (VL ≥ 1000)  Delayed Arm Hazard Ratio^b  Immediate Arm Prevalence (VL ≥ 1000)  Immediate Arm Hazard Ratio^b  F-Measure Estimate  95% CI
2  0.88  1.39   0.08  2.20   0.18  (−0.03, 0.39)
3  0.88  0.94   0.08  3.21*  0.41  (0.13, 0.70)
4  0.87  1.00   0.09  4.49*  0.52  (0.09, 0.95)
5  0.85  1.59   0.08  5.59*  0.72  (0.10, 1.34)
6  0.81  2.51*  0.07  4.49*  1.12  (−0.42, 2.67)
7  0.75  2.11   0.08  6.55*  0.81  (−0.92, 2.54)
^a The time point (quarter) when plasma HIV-1 viral load was measured.

^b Hazard ratio between groups with a viral load higher versus lower than 1000 copies per milliliter.

Significant results at the level of 0.05 are marked with *.

5. Discussion

In this paper, we consider the definition of the time-varying F-measure from three aspects. First, does the comparison it makes have a sound interpretation? Second, do the typical marker types, such as perfect or useless markers, correspond to reasonable values? Third, is the defined F-measure model-free and equipped with a non-parametric estimation? Guided by these three questions, we define the time-varying F-measure in Section 2. In addition, we explore two alternative definitions; neither conducts an appropriate comparison. With g_z(x) = P(T ≥ c | X_t = x, Z = z), the F-measure can be defined as:

F(c,t) = [P(T ≥ c | Z = 1) − Σ_x P(T ≥ c | X_t = x, Z = 1) P(X_t = x | Z = 0)] / [P(T ≥ c | Z = 1) − P(T ≥ c | Z = 0)].

When the availability of an internal marker depends on the failure time (e.g., the event is death-related), X_t should include "not applicable" as a possible value for subjects with T < t. In this case, P(X_t = x | Z = 1) − P(X_t = x | Z = 0) is determined by both the treatment effect on the marker and that on the primary endpoint. Compared to P(T ≥ c | Z = 1), the adjusted survival probability actually removes a portion of the direct treatment effect, so this definition does not reflect the proportion of treatment effect explained by the surrogate marker in general. With g_z(x) = h(c | X_t = x, Z = z), the F-measure can be defined as:

F(c,t) = [h(c | Z = 1) − Σ_x h(c | X_t = x, Z = 1) P(X_t = x | T ≥ c, Z = 0)] / [h(c | Z = 1) − h(c | Z = 0)].

A closer look at the marker distribution P(X_t = x | T ≥ c, Z = z) reveals that:

P(X_t = x | T ≥ c, Z = z) = (1 + Σ_{y≠x} [P(T ≥ c | X_t = y, T ≥ t, Z = z) P(X_t = y | T ≥ t, Z = z)] / [P(T ≥ c | X_t = x, T ≥ t, Z = z) P(X_t = x | T ≥ t, Z = z)])^{−1}.

If there is no interaction between the marker and the intervention, the independence of X_t and Z in the risk set at time point t translates to independence at time point c. In other words, only if P(T ≥ c | X_t = y, T ≥ t, Z = 1) / P(T ≥ c | X_t = x, T ≥ t, Z = 1) = P(T ≥ c | X_t = y, T ≥ t, Z = 0) / P(T ≥ c | X_t = x, T ≥ t, Z = 0) is P(X_t = x | T ≥ c, Z = 1) = P(X_t = x | T ≥ c, Z = 0) equivalent to the ratio P(X_t = y | T ≥ t, Z = z) / P(X_t = x | T ≥ t, Z = z) being constant in z. The assumption of no interaction is, unfortunately, necessary for the appropriateness of the definition based on hazard functions. In contrast, the time-varying F definition introduced in Section 2 has a sound interpretation, reasonable ranges, and a model-free definition and estimation. Moreover, the numerical studies and practical data analysis verify the measure's numerical behavior.

The time-varying F-measure is a generalization of the PTE [6] and the F-measure [9]. All three are quantitative measures based on the qualitative Prentice criterion [5]. While the Prentice criterion tests P(T | S, Z) = P(T | S) and requires a surrogate marker to capture the treatment effect fully, the three quantitative measures compare the treatment effects unadjusted and adjusted by the marker distribution on the treatment arm. Beyond the similarities, the PTE is defined for binary endpoints and relies on logistic regressions for definition and estimation; the F-measure is a model-free version of the PTE, but it does not address how to assess surrogate markers for time-to-event outcomes. The time-varying F-measure brings in the time dimension and extends the measure to time-to-event outcomes in survival settings.

6. Conclusions

This paper introduces a generalized proportion of treatment effect for survival settings, called the time-varying F-measure. Without relying on any model assumption, the measure reflects the proportion of the average treatment effect mediated by the surrogate marker. In addition, the paper introduces a non-parametric estimator to maximize the measure's model-free characteristics. One limitation of the current estimation method is its lack of efficiency, which can be a future research direction. We applied the generalized F-measure to assess viral load as a surrogate marker for HIV progression and transmission in the HPTN 052 study. The time-varying F-measure increased from 0.18 in the 2nd quarter after randomization to 1.12 in the 6th quarter. It correctly captured the temporal pattern and biological mechanism by which ART regulates HIV progression and transmission through suppressing viral replication.

Funding:

This research was partly supported by the grants from NIH/NICHD R01 HD094682 and NIH/NIAID R56 AI140953.

Appendix A. Proof of Theorem 1

Proof. The proof consists of three steps. First, we decompose nt(F^-F) as multiple empirical processes. Second, the convergence of each empirical process is derived. Third, we combine the asymptotic results and conclude the proof.

Step 1: We first write nt(F^-F) as:

√n_t (F̂ − F) = √n_t (F̂(p̂_x0) − F(p̂_x0)) + √n_t (F(p̂_x0) − F(p_x0)),

and tackle the parts one by one. For the first part, we plug in the estimator (1) and obtain:

√n_t (F̂(p̂_x0) − F(p̂_x0)) = √n_t ( (ŝ_1 − Σ_x ŝ_1x p̂_x0)/(ŝ_1 − ŝ_0) − (s_1 − Σ_x s_1x p̂_x0)/(s_1 − s_0) ).

Rearranging the terms yields:

√n_t (F̂(p̂_x0) − F(p̂_x0)) = √n_t / [(ŝ_1 − ŝ_0)(s_1 − s_0)] × [ s_1(ŝ_0 − s_0) − s_0(ŝ_1 − s_1) − Σ_x p̂_x0 (s_1 − s_0)(ŝ_1x − s_1x) − Σ_x p̂_x0 s_1x (ŝ_0 − s_0) + Σ_x p̂_x0 s_1x (ŝ_1 − s_1) ].

Then we collect the terms and write the equation in terms of √n_t(ŝ_0 − s_0), √n_t(ŝ_1 − s_1), and √n_t(ŝ_1x − s_1x) as:

√n_t (F̂(p̂_x0) − F(p̂_x0)) = [(s_1 − Σ_x p̂_x0 s_1x)/((ŝ_1 − ŝ_0)(s_1 − s_0))] √n_t(ŝ_0 − s_0) + [(Σ_x p̂_x0 s_1x − s_0)/((ŝ_1 − ŝ_0)(s_1 − s_0))] √n_t(ŝ_1 − s_1) + Σ_x [p̂_x0 (s_0 − s_1)/((ŝ_1 − ŝ_0)(s_1 − s_0))] √n_t(ŝ_1x − s_1x).

Similarly, we write the second part as:

√n_t (F(p̂_x0) − F(p_x0)) = √n_t ( (s_1 − Σ_x s_1x p̂_x0)/(s_1 − s_0) − (s_1 − Σ_x s_1x p_x0)/(s_1 − s_0) ) = [1/(s_0 − s_1)] Σ_x s_1x √n_t (p̂_x0 − p_x0).
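The algebraic decomposition in Step 1 can be spot-checked with fixed numbers. The following small sketch uses two marker levels and arbitrary assumed values for the population quantities and their estimates (the "h" suffix marks the estimated versions):

```python
import numpy as np

# fixed stand-ins for the population quantities and their estimates
s1, s0 = 0.8, 0.6
s1h, s0h = 0.78, 0.63
s1x = np.array([0.9, 0.7]); s1xh = np.array([0.88, 0.72])
p = np.array([0.4, 0.6]);   ph = np.array([0.45, 0.55])

def F(a1, a0, ax, q):
    # F-measure as a function of (s1, s0, {s1x}, {p_x0})
    return (a1 - np.sum(ax * q)) / (a1 - a0)

# first part: F-hat(p-hat) - F(p-hat) against its collected form
lhs1 = F(s1h, s0h, s1xh, ph) - F(s1, s0, s1x, ph)
D = (s1h - s0h) * (s1 - s0)
rhs1 = ((s1 - np.sum(ph * s1x)) / D * (s0h - s0)
        + (np.sum(ph * s1x) - s0) / D * (s1h - s1)
        + np.sum(ph * (s0 - s1) / D * (s1xh - s1x)))

# second part: F(p-hat) - F(p) against its collected form
lhs2 = F(s1, s0, s1x, ph) - F(s1, s0, s1x, p)
rhs2 = np.sum(s1x * (ph - p)) / (s0 - s1)

print(abs(lhs1 - rhs1) < 1e-12, abs(lhs2 - rhs2) < 1e-12)
```

Both identities hold exactly, without any asymptotic approximation; the o_p(1) terms only enter when the empirical processes replace these fixed deviations.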

Step 2: In this step, we derive the convergence of each empirical process. Under the assumption of random censoring, the Kaplan–Meier estimator Ŝ(c) satisfies [17]:

√n_t (Ŝ(c) − S(c)) = √n_t (𝒫_n − 𝒫)( −S(c) ∫_0^c dM(u)/E[Y(u)] ) + o_p(1),

where M(u) ≡ N(u) − ∫_0^u Y(a) dΛ(a) represents the counting process martingale. Therefore, the Kaplan–Meier estimators ŝ_0, ŝ_1, and ŝ_1x satisfy:

√n_t (ŝ_0(c) − s_0(c)) =_d √n_t (𝒫_n − 𝒫)( −s_0(c) ∫_t^c I(Z = 0, T ≥ t)[dN(u) − Y(u) dΛ_0(u)] / E[I(Z = 0, T ≥ t) Y(u)] ),
√n_t (ŝ_1(c) − s_1(c)) =_d √n_t (𝒫_n − 𝒫)( −s_1(c) ∫_t^c I(Z = 1, T ≥ t)[dN(u) − Y(u) dΛ_1(u)] / E[I(Z = 1, T ≥ t) Y(u)] ),
√n_t (ŝ_1x(c) − s_1x(c)) =_d √n_t (𝒫_n − 𝒫)( −s_1x(c) ∫_t^c I(Z = 1, X_t = x, T ≥ t)[dN(u) − Y(u) dΛ_1x(u)] / E[I(Z = 1, X_t = x, T ≥ t) Y(u)] ),

where I(·) is the indicator of the subject's group membership.

Next, we write √n_t (p̂_x0 − p_x0) in the form of:

√n_t (p̂_x0 − p_x0) = √n_t ( P̂(X_t = x | T ≥ t, Z = 0) − P(X_t = x | T ≥ t, Z = 0) ) = √n_t ( P̂(X_t = x, Z = 0 | T ≥ t)/P̂(Z = 0 | T ≥ t) − P(X_t = x, Z = 0 | T ≥ t)/P(Z = 0 | T ≥ t) ).

It can be readily shown that:

√n_t (p̂_x0 − p_x0) = [1/P̂(Z = 0 | T ≥ t)] √n_t ( P̂(X_t = x, Z = 0 | T ≥ t) − P(X_t = x, Z = 0 | T ≥ t) ) − [P(X_t = x, Z = 0 | T ≥ t) / (P̂(Z = 0 | T ≥ t) P(Z = 0 | T ≥ t))] √n_t ( P̂(Z = 0 | T ≥ t) − P(Z = 0 | T ≥ t) ).

Let p_0 ≡ P(Z = 0 | T ≥ t) and p_0x ≡ P(X_t = x, Z = 0 | T ≥ t). Since p̂_0 is a consistent estimator of p_0, we obtain:

√n_t (p̂_x0 − p_x0) =_d √n_t (𝒫_n − 𝒫)( (1/p_0)[I(X_t = x, Z = 0, T ≥ t) − p_0x] − (p_0x/p_0²)[I(Z = 0, T ≥ t) − p_0] )

as a consequence of Slutsky’s theorem.

Step 3: In this step, we combine the above results and conclude the convergence of nt(F^-F). We first introduce some notation,

η_0 = −s_0(c) ∫_t^c I(Z = 0, T ≥ t)[dN(u) − Y(u) dΛ_0(u)] / E[I(Z = 0, T ≥ t) Y(u)], (A1)
η_1 = −s_1(c) ∫_t^c I(Z = 1, T ≥ t)[dN(u) − Y(u) dΛ_1(u)] / E[I(Z = 1, T ≥ t) Y(u)], (A2)
η_1x = −s_1x(c) ∫_t^c I(Z = 1, X_t = x, T ≥ t)[dN(u) − Y(u) dΛ_1x(u)] / E[I(Z = 1, X_t = x, T ≥ t) Y(u)], (A3)
η_0x^p = (1/p_0)[I(X_t = x, Z = 0, T ≥ t) − p_0x] − (p_0x/p_0²)[I(Z = 0, T ≥ t) − p_0]. (A4)

Combining the above results and applying Slutsky's theorem, we can write:

√n_t (F̂(c) − F(c)) = √n_t (𝒫_n − 𝒫)( [(s_1 − Σ_x p_x0 s_1x)/(s_1 − s_0)²] η_0 + [(Σ_x p_x0 s_1x − s_0)/(s_1 − s_0)²] η_1 + [1/(s_0 − s_1)] Σ_x p_x0 η_1x + [1/(s_0 − s_1)] Σ_x s_1x η_0x^p ) + o_p(1).

It follows that √n_t (F̂(c) − F(c)) converges weakly to a zero-mean Gaussian process with covariance function E[ζ(c)ζ(c′)] between time points c and c′, where:

ζ(c) = [(s_1 − Σ_x p_x0 s_1x)/(s_1 − s_0)²] η_0 + [(Σ_x p_x0 s_1x − s_0)/(s_1 − s_0)²] η_1 + [1/(s_0 − s_1)] Σ_x p_x0 η_1x + [1/(s_0 − s_1)] Σ_x s_1x η_0x^p.

The covariance function E[ζ(c)ζ(c′)] can be consistently estimated by (1/n_t) Σ_{i=1}^{n_t} ζ̂_i(c) ζ̂_i(c′) with:

ζ̂_i(c) = [(ŝ_1 − Σ_x p̂_x0 ŝ_1x)/(ŝ_1 − ŝ_0)²] η_0i + [(Σ_x p̂_x0 ŝ_1x − ŝ_0)/(ŝ_1 − ŝ_0)²] η_1i + [1/(ŝ_0 − ŝ_1)] Σ_x p̂_x0 η_1xi + [1/(ŝ_0 − ŝ_1)] Σ_x ŝ_1x η_0xi^p,

where η_0i, η_1i, η_1xi, and η_0xi^p are subject i's realizations of (A1)–(A4), respectively. The specific forms are:

η_0i = −s_0(c) ∫_t^c I(Z_i = 0, T_i ≥ t)[dN_i(u) − Y_i(u) dΛ_0(u)] / E[I(Z = 0, T ≥ t) Y(u)],
η_1i = −s_1(c) ∫_t^c I(Z_i = 1, T_i ≥ t)[dN_i(u) − Y_i(u) dΛ_1(u)] / E[I(Z = 1, T ≥ t) Y(u)],
η_1xi = −s_1x(c) ∫_t^c I(Z_i = 1, X_ti = x, T_i ≥ t)[dN_i(u) − Y_i(u) dΛ_1x(u)] / E[I(Z = 1, X_t = x, T ≥ t) Y(u)],
η_0xi^p = (1/p_0)[I(X_ti = x, Z_i = 0, T_i ≥ t) − p_0x] − (p_0x/p_0²)[I(Z_i = 0, T_i ≥ t) − p_0].

Appendix B. Proof of Theorem 2

Proof. The proof consists of two steps. First, we show the sufficiency of Condition C1 for F<1. Second, we show the sufficiency of either Condition C2 or C3 for F>0.

Step 1: We first expand AB − BB as Σ_x [P(T ≥ c | X_t = x, T ≥ t, Z = 1) − P(T ≥ c | X_t = x, T ≥ t, Z = 0)] P(X_t = x | T ≥ t, Z = 0). If Condition C1 is satisfied, we have AB − BB > 0. Simple algebra reveals (AA − AB) − (AA − BB) < 0. Given AA − BB > 0, we conclude:

F = (AA − AB)/(AA − BB) < 1.

Step 2: If X_t in the treatment group is stochastically greater than that in the control group, then E_A[f(X_t)] ≥ E_B[f(X_t)] for any bounded, increasing function f. When P(T ≥ c | X_t = x, T ≥ t, Z = 1) is increasing in x, we have Σ_x P(T ≥ c | X_t = x, T ≥ t, Z = 1) P(X_t = x | T ≥ t, Z = 1) > Σ_x P(T ≥ c | X_t = x, T ≥ t, Z = 1) P(X_t = x | T ≥ t, Z = 0), that is, AA > AB. By the same argument, if X_t in the control group is stochastically greater than that in the treatment group and P(T ≥ c | X_t = x, T ≥ t, Z = 1) is decreasing in x, we have −Σ_x P(T ≥ c | X_t = x, T ≥ t, Z = 1) P(X_t = x | T ≥ t, Z = 0) > −Σ_x P(T ≥ c | X_t = x, T ≥ t, Z = 1) P(X_t = x | T ≥ t, Z = 1), that is, −AB > −AA. Given AA − BB > 0, we conclude:

F = (AA − AB)/(AA − BB) > 0.

In summary, if Conditions C1 and C2 (or C1 and C3) are satisfied, the F-measure is bounded within (0, 1). □

Appendix C. Proof of Theorem 3

Proof. Step 1: In this step, we show P(T(1) ≥ c | T(1) ≥ t, T(0) ≥ t) = P(T ≥ c | T ≥ t, Z = 1). We first write:

P(T(1) ≥ c | T(1) ≥ t, T(0) ≥ t) = P(T(1) ≥ c, T(1) ≥ t | T(0) ≥ t) / P(T(1) ≥ t | T(0) ≥ t).

By Assumption 7, we have:

P(T(1) ≥ c | T(1) ≥ t, T(0) ≥ t) = P(T(1) ≥ c, T(1) ≥ t) / P(T(1) ≥ t).

Further, Assumption 8 yields:

P(T(1) ≥ c | T(1) ≥ t, T(0) ≥ t) = P(T(1) ≥ c, T(1) ≥ t | Z = 1) / P(T(1) ≥ t | Z = 1) = P(T ≥ c | Z = 1) / P(T ≥ t | Z = 1) = P(T ≥ c | T ≥ t, Z = 1).

In a similar way, we can show P(T(0) ≥ c | T(1) ≥ t, T(0) ≥ t) = P(T ≥ c | T ≥ t, Z = 0).

Step 2: In this step, we show P(T(1, X_t = X_t(0)) ≥ c | T(1) ≥ t, T(0) ≥ t) = Σ_x P(T ≥ c | X_t = x, T ≥ t, Z = 1) P(X_t = x | T ≥ t, Z = 0). By definition,

P(T(1, X_t = X_t(0)) ≥ c | T(1) ≥ t, T(0) ≥ t) = Σ_x P(T(1, X_t = x) ≥ c, X_t(0) = x | T(1) ≥ t, T(0) ≥ t) = Σ_x P(T(1, X_t = x) ≥ c | X_t(0) = x, T(1) ≥ t, T(0) ≥ t) P(X_t(0) = x | T(1) ≥ t, T(0) ≥ t).

In the following, we work on the two probability components one by one.

P(X_t(0) = x | T(1) ≥ t, T(0) ≥ t) = P(X_t(0) = x, T(0) ≥ t | T(1) ≥ t) / P(T(0) ≥ t | T(1) ≥ t).

The cross-world independence described in Assumption 7 leads to:

P(X_t(0) = x | T(1) ≥ t, T(0) ≥ t) = P(X_t(0) = x, T(0) ≥ t) / P(T(0) ≥ t).

Furthermore, Assumption 8 yields:

P(X_t(0) = x | T(1) ≥ t, T(0) ≥ t) = P(X_t(0) = x, T(0) ≥ t | Z = 0) / P(T(0) ≥ t | Z = 0) = P(X_t = x | T ≥ t, Z = 0).

The remaining probability component satisfies:

P(T(1, X_t = x) ≥ c | X_t(0) = x, T(1) ≥ t, T(0) ≥ t) = P(T(1, X_t = x) ≥ c, T(1) ≥ t | X_t(0) = x, T(0) ≥ t) / P(T(1) ≥ t | X_t(0) = x, T(0) ≥ t) = P(T(1, X_t = x) ≥ c, T(1) ≥ t) / P(T(1) ≥ t) (by Assumption 7) = P(T(1, X_t = x) ≥ c, T(1) ≥ t | Z = 1) / P(T(1) ≥ t | Z = 1) (by Assumption 8).

Then we have:

P(T(1, X_t = x) ≥ c | X_t(0) = x, T(1) ≥ t, T(0) ≥ t) = P(T(1, X_t = x) ≥ c | T ≥ t, Z = 1).

Assumption 9 gives X_t(1) ⊥ I(T(1, X_t = x) ≥ c) | T ≥ t, Z = 1, so that:

P(T(1, X_t = x) ≥ c | X_t(0) = x, T(1) ≥ t, T(0) ≥ t) = P(T(1, X_t = x) ≥ c | X_t(1) = x, T ≥ t, Z = 1) = P(T ≥ c | X_t = x, T ≥ t, Z = 1).

Collecting the equations for the two probability components, we show:

P(T(1, X_t = X_t(0)) ≥ c | T(1) ≥ t, T(0) ≥ t) = Σ_x P(T ≥ c | X_t = x, T ≥ t, Z = 1) P(X_t = x | T ≥ t, Z = 0). □

Appendix D. F-Measure under the Time-Varying Cox-Weibull Model

To facilitate the exploration and understanding of the F-measure, we calculate its true value under a time-varying Cox–Weibull model as an illustrative example. We follow the notation described in the main paper: Z denotes treatment assignment (control = 0; treatment = 1); X_t denotes the value of the marker at time t; T denotes the failure time; and c denotes the pre-specified time of interest for survival.

We consider the time-varying Cox–Weibull model:

h(t) = h_0(t) \exp(b_1 Z + b_2 X_t),
h_0(t) = \lambda v t^{v-1},

where the marker value satisfies X_t = I(t \ge t_s), and t_s follows an exponential distribution with mean \mu_z. The definition of the F-measure reads:

F(c, t) = \frac{P(T \ge c \mid T \ge t, Z = 1) - \sum_x P(T \ge c \mid X_t = x, T \ge t, Z = 1)\, P(X_t = x \mid T \ge t, Z = 0)}{P(T \ge c \mid T \ge t, Z = 1) - P(T \ge c \mid T \ge t, Z = 0)}.

By the Bayes rule, the conditional survival probability can be written as:

P(T \ge c \mid T \ge t, Z = z) = \frac{P(T \ge c \mid Z = z)}{P(T \ge t \mid Z = z)}.

In general, for \tau \in (0, \infty),

P(T \ge \tau \mid Z = z) = P(T \ge \tau, X_\tau = 1 \mid Z = z) + P(T \ge \tau, X_\tau = 0 \mid Z = z). (A5)

The first term of (A5) satisfies:

P(T \ge \tau, X_\tau = 1 \mid Z = z) = \int_0^\tau P(T \ge \tau \mid t_s, Z = z)\, f(t_s \mid Z = z)\, dt_s = \int_0^\tau \exp\left( -\int_0^{t_s} \exp(b_1 z)\, h_0(u)\, du - \int_{t_s}^{\tau} \exp(b_1 z + b_2)\, h_0(u)\, du \right) f(t_s \mid Z = z)\, dt_s.

Plugging in \int_a^b h_0(u)\, du = \lambda (b^v - a^v) and the density f(t_s \mid Z = z) = (1/\mu_z) \exp(-t_s/\mu_z) leads to:

P(T \ge \tau, X_\tau = 1 \mid Z = z) = \int_0^\tau \exp\left( -e^{b_1 z} \lambda t_s^v - e^{b_1 z + b_2} \lambda (\tau^v - t_s^v) \right) \frac{1}{\mu_z} \exp\left( -\frac{t_s}{\mu_z} \right) dt_s. (A6)

For a general Cox–Weibull model with v > 0, there is no closed-form formula, so we resort to numerical evaluation. Similarly, the second term of (A5) satisfies:

P(T \ge \tau, X_\tau = 0 \mid Z = z) = \int_\tau^\infty P(T \ge \tau \mid t_s, Z = z)\, f(t_s \mid Z = z)\, dt_s = \int_\tau^\infty \exp\left( -\int_0^\tau \exp(b_1 z)\, h_0(u)\, du \right) f(t_s \mid Z = z)\, dt_s = \int_\tau^\infty \exp\left( -e^{b_1 z} \lambda \tau^v \right) \frac{1}{\mu_z} \exp\left( -\frac{t_s}{\mu_z} \right) dt_s = \exp\left( -\tau^v \lambda e^{b_1 z} - \frac{\tau}{\mu_z} \right). (A7)

Thus far, with Equations (A6) and (A7), we can calculate P(T \ge c \mid T \ge t, Z = 1) and P(T \ge c \mid T \ge t, Z = 0).
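Since (A6) has no closed form for general v, it must be evaluated numerically. A minimal sketch using scipy.integrate.quad (the function names and parameter values below are our own illustrative choices, not from the paper):

```python
import numpy as np
from scipy.integrate import quad

def A(tau, z, b1, b2, lam, v, mu_z):
    """P(T >= tau, X_tau = 1 | Z = z), i.e., Equation (A6), by numerical integration."""
    def integrand(ts):
        surv = np.exp(-np.exp(b1 * z) * lam * ts**v
                      - np.exp(b1 * z + b2) * lam * (tau**v - ts**v))
        dens = np.exp(-ts / mu_z) / mu_z          # exponential density of t_s
        return surv * dens
    val, _ = quad(integrand, 0.0, tau)
    return val

def B(tau, z, b1, b2, lam, v, mu_z):
    """P(T >= tau, X_tau = 0 | Z = z), i.e., Equation (A7), in closed form."""
    return np.exp(-tau**v * lam * np.exp(b1 * z) - tau / mu_z)

# Marginal survival P(T >= tau | Z = z) = A + B, per Equation (A5).
pars = dict(b1=-0.5, b2=0.7, lam=0.8, v=1.5, mu_z=1.2)
S = lambda tau, z: A(tau, z, **pars) + B(tau, z, **pars)
print(S(0.5, 1), S(1.0, 1))  # survival decreases in tau
```

A useful check on the decomposition (A5) is that the survival function it produces equals 1 as tau approaches 0 and is decreasing in tau.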

Next, we turn to the adjusted probability in the numerator of the F-measure, \sum_x P(T \ge c \mid X_t = x, T \ge t, Z = 1)\, P(X_t = x \mid T \ge t, Z = 0). We use the Bayes rule and obtain:

P(T \ge c \mid X_t = 1, T \ge t, Z = 1) = \frac{P(T \ge c, X_t = 1 \mid Z = 1)}{P(T \ge t, X_t = 1 \mid Z = 1)}. (A8)

The denominator of Equation (A8) is shown in (A6). The numerator of Equation (A8) satisfies:

P(T \ge c, X_t = 1 \mid Z = 1) = P(T \ge c, t_s \le t \mid Z = 1) = \int_0^t P(T \ge c \mid t_s, Z = 1)\, f(t_s \mid Z = 1)\, dt_s = \int_0^t \exp\left( -\int_0^{t_s} \exp(b_1)\, h_0(u)\, du - \int_{t_s}^{c} \exp(b_1 + b_2)\, h_0(u)\, du \right) f(t_s \mid Z = 1)\, dt_s.

Plugging in \int_a^b h_0(u)\, du = \lambda (b^v - a^v) and the density f(t_s \mid Z = z) = (1/\mu_z) \exp(-t_s/\mu_z) leads to:

P(T \ge c, X_t = 1 \mid Z = 1) = \int_0^t \exp\left( -e^{b_1} \lambda t_s^v - e^{b_1 + b_2} \lambda (c^v - t_s^v) \right) \frac{1}{\mu_1} \exp\left( -\frac{t_s}{\mu_1} \right) dt_s. (A9)

Following the same lines, we work on:

P(T \ge c \mid X_t = 0, T \ge t, Z = 1) = \frac{P(T \ge c, X_t = 0 \mid Z = 1)}{P(T \ge t, X_t = 0 \mid Z = 1)}. (A10)

The numerator of Equation (A10) reads:

P(T \ge c, X_t = 0 \mid Z = 1) = P(T \ge c \mid Z = 1) - P(T \ge c, X_t = 1 \mid Z = 1), (A11)

and the denominator reads:

P(T \ge t, X_t = 0 \mid Z = 1) = P(T \ge t \mid Z = 1) - P(T \ge t, X_t = 1 \mid Z = 1). (A12)

To simplify the math, we introduce the following notation:

A(\tau, z) := P(T \ge \tau, X_\tau = 1 \mid Z = z) = \int_0^\tau \exp\left( -e^{b_1 z} \lambda t_s^v - e^{b_1 z + b_2} \lambda (\tau^v - t_s^v) \right) \frac{1}{\mu_z} \exp\left( -\frac{t_s}{\mu_z} \right) dt_s,
B(\tau, z) := P(T \ge \tau, X_\tau = 0 \mid Z = z) = \exp\left( -\tau^v \lambda e^{b_1 z} - \frac{\tau}{\mu_z} \right),
C := P(T \ge c, X_t = 1 \mid Z = 1) = \int_0^t \exp\left( -e^{b_1} \lambda t_s^v - e^{b_1 + b_2} \lambda (c^v - t_s^v) \right) \frac{1}{\mu_1} \exp\left( -\frac{t_s}{\mu_1} \right) dt_s.

With the above notation, Equation (A5) becomes:

P(T \ge \tau \mid Z = z) = A(\tau, z) + B(\tau, z),

Equation (A8) becomes:

P(T \ge c \mid X_t = 1, T \ge t, Z = 1) = \frac{C}{A(t, 1)},

and Equation (A10) becomes:

P(T \ge c \mid X_t = 0, T \ge t, Z = 1) = \frac{A(c, 1) + B(c, 1) - C}{A(t, 1) + B(t, 1) - A(t, 1)} = \frac{A(c, 1) + B(c, 1) - C}{B(t, 1)}.

The remaining parts in the F-measure definition are:

P(X_t = 1 \mid T \ge t, Z = 0) = \frac{P(X_t = 1, T \ge t \mid Z = 0)}{P(T \ge t \mid Z = 0)} = \frac{A(t, 0)}{A(t, 0) + B(t, 0)},
P(X_t = 0 \mid T \ge t, Z = 0) = 1 - P(X_t = 1 \mid T \ge t, Z = 0) = \frac{B(t, 0)}{A(t, 0) + B(t, 0)}.

Gathering all the pieces together, we have:

P(T \ge c \mid T \ge t, Z = 1) = \frac{A(c, 1) + B(c, 1)}{A(t, 1) + B(t, 1)},
P(T \ge c \mid T \ge t, Z = 0) = \frac{A(c, 0) + B(c, 0)}{A(t, 0) + B(t, 0)},
P(T \ge c \mid X_t = 1, T \ge t, Z = 1)\, P(X_t = 1 \mid T \ge t, Z = 0) = \frac{C \cdot A(t, 0)}{A(t, 1)\, (A(t, 0) + B(t, 0))},
P(T \ge c \mid X_t = 0, T \ge t, Z = 1)\, P(X_t = 0 \mid T \ge t, Z = 0) = \frac{(A(c, 1) + B(c, 1) - C)\, B(t, 0)}{B(t, 1)\, (A(t, 0) + B(t, 0))}.
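The gathered formulas translate directly into a routine for the true F-measure. The sketch below is our own code, not from the paper, and the parameter values are arbitrary illustrative choices: it evaluates A, B, and C by numerical integration and assembles F(c, t). A useful sanity check: with b2 = 0 the marker does not affect the hazard, so the adjusted and unadjusted treatment-arm probabilities coincide and F equals 0.

```python
import numpy as np
from scipy.integrate import quad

def f_measure(c, t, b1, b2, lam, v, mu0, mu1):
    """True F(c, t) under the time-varying Cox-Weibull model."""
    def A(tau, z):
        # P(T >= tau, X_tau = 1 | Z = z), Equation (A6).
        mu = mu1 if z == 1 else mu0
        f = lambda ts: (np.exp(-np.exp(b1*z)*lam*ts**v
                               - np.exp(b1*z + b2)*lam*(tau**v - ts**v))
                        * np.exp(-ts/mu)/mu)
        return quad(f, 0.0, tau)[0]

    def B(tau, z):
        # P(T >= tau, X_tau = 0 | Z = z), Equation (A7).
        mu = mu1 if z == 1 else mu0
        return np.exp(-tau**v*lam*np.exp(b1*z) - tau/mu)

    # C = P(T >= c, X_t = 1 | Z = 1): t_s over [0, t], survival up to c, Equation (A9).
    fC = lambda ts: (np.exp(-np.exp(b1)*lam*ts**v
                            - np.exp(b1 + b2)*lam*(c**v - ts**v))
                     * np.exp(-ts/mu1)/mu1)
    C = quad(fC, 0.0, t)[0]

    p1 = (A(c, 1) + B(c, 1)) / (A(t, 1) + B(t, 1))   # P(T>=c | T>=t, Z=1)
    p0 = (A(c, 0) + B(c, 0)) / (A(t, 0) + B(t, 0))   # P(T>=c | T>=t, Z=0)
    denom0 = A(t, 0) + B(t, 0)
    adjusted = (C * A(t, 0) / (A(t, 1) * denom0)
                + (A(c, 1) + B(c, 1) - C) * B(t, 0) / (B(t, 1) * denom0))
    return (p1 - adjusted) / (p1 - p0)

print(f_measure(c=2.0, t=1.0, b1=-0.8, b2=0.9, lam=0.5, v=1.5, mu0=1.0, mu1=2.0))
```

This routine is what we use to tabulate the true F-measure across parameter configurations; only the quadrature for A and C carries numerical error.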

When v = 1 (i.e., the failure time follows an exponential distribution), the terms A and C also admit closed-form formulas:

A(\tau, z) = \frac{\exp\left( -\tau \lambda e^{b_1 z + b_2} \right) - \exp\left( -\tau (\lambda e^{b_1 z} + 1/\mu_z) \right)}{1 + \lambda \mu_z e^{b_1 z} (1 - e^{b_2})},
B(\tau, z) = \exp\left( -\tau (\lambda e^{b_1 z} + 1/\mu_z) \right),
C = \frac{\exp\left( -c \lambda e^{b_1 + b_2} \right) \left( 1 - \exp\left( -(t/\mu_1 + \lambda t e^{b_1} (1 - e^{b_2})) \right) \right)}{1 + \lambda \mu_1 e^{b_1} (1 - e^{b_2})}.
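The closed-form v = 1 expression for A can be cross-checked against direct numerical integration of (A6). The check below is our own illustrative code (parameter values are arbitrary; mu plays the role of \mu_z for a fixed arm); the two evaluations agree to quadrature precision.

```python
import numpy as np
from scipy.integrate import quad

b1, b2, lam, mu = -0.5, -0.3, 0.8, 1.2   # arbitrary illustrative parameters
# v = 1: exponential failure time, so the closed form applies.

def A_closed(tau, z):
    """Closed-form A(tau, z) for v = 1."""
    num = np.exp(-tau*lam*np.exp(b1*z + b2)) - np.exp(-tau*(lam*np.exp(b1*z) + 1/mu))
    return num / (1 + lam*mu*np.exp(b1*z)*(1 - np.exp(b2)))

def A_numeric(tau, z):
    """Direct numerical integration of (A6) with v = 1."""
    f = lambda ts: (np.exp(-np.exp(b1*z)*lam*ts
                           - np.exp(b1*z + b2)*lam*(tau - ts))
                    * np.exp(-ts/mu)/mu)
    return quad(f, 0.0, tau)[0]

for tau in (0.3, 1.0, 2.5):
    for z in (0, 1):
        assert abs(A_closed(tau, z) - A_numeric(tau, z)) < 1e-8
print("closed form matches numerical integration")
```

The same kind of check applies to C by replacing the integration limit with t and the survival horizon with c.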

Footnotes

Conflicts of Interest: The authors declare no conflict of interest.

References

1. FDA. New drug, antibiotic and biological drug product regulations: accelerated approval. Fed. Regist. 1992, 57, 13234–13242.
2. Baker, S.G.; Kramer, B.S. A perfect correlate does not a surrogate make. BMC Med. Res. Methodol. 2003, 3, 16.
3. Fleming, T.R.; Powers, J.H. Biomarkers and surrogate endpoints in clinical trials. Stat. Med. 2012, 31, 2973–2984.
4. Zhuang, R.; Chen, Y.Q. Measuring surrogacy in clinical research. Stat. Biosci. 2020, 12, 295–323.
5. Prentice, R.L. Surrogate endpoints in clinical trials: definition and operational criteria. Stat. Med. 1989, 8, 431–440.
6. Freedman, L.S.; Graubard, B.I.; Schatzkin, A. Statistical validation of intermediate endpoints for chronic diseases. Stat. Med. 1992, 11, 167–178.
7. Lin, D.Y.; Fleming, T.R.; De Gruttola, V. Estimating the proportion of treatment effect explained by a surrogate marker. Stat. Med. 1997, 16, 1515–1527.
8. Bycott, P.W.; Taylor, J.M. An evaluation of a measure of the proportion of the treatment effect explained by a surrogate marker. Control. Clin. Trials 1998, 19, 555–568.
9. Wang, Y.; Taylor, J.M. A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics 2002, 58, 803–812.
10. Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 1958, 53, 457–481.
11. Robins, J.M.; Greenland, S. Identifiability and exchangeability for direct and indirect effects. Epidemiology 1992, 3, 143–155.
12. Pearl, J. Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, Seattle, WA, USA, 2–5 August 2001; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2001; pp. 411–420.
13. Taylor, J.M.; Wang, Y.; Thiébaut, R. Counterfactual links to the proportion of treatment effect explained by a surrogate marker. Biometrics 2005, 61, 1102–1111.
14. Austin, P.C. Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Stat. Med. 2012, 31, 3946–3958.
15. Cohen, M.S.; Chen, Y.Q.; McCauley, M.; Gamble, T.; Hosseinipour, M.C.; Kumarasamy, N.; Hakim, J.G.; Kumwenda, J.; Grinsztejn, B.; Pilotto, J.H.; et al. Antiretroviral therapy for the prevention of HIV-1 transmission. N. Engl. J. Med. 2016, 375, 830–839.
16. Murray, J.S.; Elashoff, M.R.; Iacono-Connors, L.C.; Cvetkovich, T.A.; Struble, K.A. The use of plasma HIV RNA as a study endpoint in efficacy trials of antiretroviral drugs. AIDS 1999, 13, 797–804.
17. Fleming, T.R.; Harrington, D.P. Counting Processes and Survival Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2011; Volume 169.
