Biometrics. Author manuscript; available in PMC: 2023 Jul 14.
Published in final edited form as: Biometrics. 2022 May 17;79(2):788–798. doi: 10.1111/biom.13677

Robust approach to combining multiple markers to improve surrogacy

Xuan Wang 1, Layla Parast 2, Larry Han 1, Lu Tian 3, Tianxi Cai 1,4
PMCID: PMC10347081  NIHMSID: NIHMS1910650  PMID: 35426444

Abstract

Identifying effective and valid surrogate markers to make inference about a treatment effect on long-term outcomes is an important step in improving the efficiency of clinical trials. Replacing a long-term outcome with short-term and/or cheaper surrogate markers can potentially shorten study duration and reduce trial costs. There is sizable statistical literature on methods to quantify the effectiveness of a single surrogate marker. Both parametric and nonparametric approaches have been well developed for different outcome types. However, when there are multiple markers available, methods for combining markers to construct a composite marker with improved surrogacy remain limited. In this paper, building on top of the optimal transformation framework of Wang et al. (2020), we propose a novel calibrated model fusion approach to optimally combine multiple markers to improve surrogacy. Specifically, we obtain two initial estimates of optimal composite scores of the markers based on two sets of models with one set approximating the underlying data distribution and the other directly approximating the optimal transformation function. We then estimate an optimal calibrated combination of the two estimated scores which ensures both validity of the final combined score and optimality with respect to the proportion of treatment effect explained by the final combined score. This approach is unique in that it identifies an optimal combination of the multiple surrogates without strictly relying on parametric assumptions while borrowing modeling strategies to avoid fully nonparametric estimation which is subject to the curse of dimensionality. Our identified optimal transformation can also be used to directly quantify the surrogacy of this identified combined score. Theoretical properties of the proposed estimators are derived, and the finite sample performance of the proposed method is evaluated through simulation studies. 
We further illustrate the proposed method using data from the Diabetes Prevention Program study.

Keywords: multiple surrogate markers, nonparametric estimation, proportion of treatment effect explained

1 |. INTRODUCTION

The primary outcomes of randomized clinical trials often require long-term follow-up and/or involve expensive or invasive measurement procedures. Leveraging short-term or less costly surrogate markers to draw valid inference for the treatment effect on the long-term outcome can potentially reduce trial duration, improve cost-effectiveness, and reduce enrollment requirements. For example, in HIV research, biomarkers such as CD4 cell counts and viral load have been used as surrogate outcomes for long-term primary outcomes, such as time to mortality or diagnosis of AIDS (Brookmeyer et al., 1994). Furthermore, in resource-limited settings, total lymphocyte count has been studied as a surrogate marker for CD4 cell count (Wondimeneh et al., 2012; Chen et al., 2013).

There is sizable statistical literature on methods for quantifying the effectiveness of a single surrogate marker in predicting a treatment effect on a long-term outcome. Following the criteria initially proposed by Prentice (1989) for a valid surrogate marker, both model-based (Freedman et al., 1992; Wang and Taylor, 2002) and nonparametric (Parast et al., 2016; Wang et al., 2020) statistical methods for evaluating the validity of surrogate markers have been proposed. For example, Freedman et al. (1992) proposed a measure of the proportion of the treatment effect on the primary outcome that is explained by the treatment effect on the surrogate (PTE) by examining the change in treatment effect size when the surrogate is added to a specified regression model. Wang and Taylor (2002) proposed to quantify the PTE by examining what the treatment effect would be if the surrogate marker in the treatment group had the same distribution as the surrogate in the control group, and obtained a model-based estimate for this PTE measure. However, these model-based approaches may yield an invalid PTE estimate under model misspecification (Lin et al., 1997). To overcome such limitations, Parast et al. (2016) proposed a fully nonparametric, model-free estimation procedure for the PTE measure defined by Wang and Taylor (2002). Recently, Wang et al. (2020) proposed an alternative model-free approach to quantifying the PTE by identifying an optimal transformation of the surrogate marker that best predicts the treatment effect and then quantifying the PTE based on the treatment effect on the transformed marker.

The aforementioned methods are generally limited to a single surrogate setting. When there are multiple surrogate markers, denoted by $S = (S_1, \ldots, S_p)^\top$, it would be valuable to derive a composite marker, $g(S)$, which ideally would have higher surrogacy than any $S_j$ alone. For example, in prostate cancer, there are multiple promising surrogate markers, including prostate-specific antigen, Gleason score, and circulating tumor cells, that have been examined for managing progression and treatment response (Doyen et al., 2012). In diabetes prevention studies, one of which we examine in this paper, there are also often multiple potential surrogate markers, including fasting plasma glucose, hemoglobin A1c (HbA1c), and early incidence of diabetes. The ability to examine the surrogacy of glucose, HbA1c, and early incidence jointly, and to compare it with the surrogacy of each of these markers alone, can provide valuable information for the design and conduct of future studies that may use these potential surrogates. Existing methods that can be used to assess the surrogacy of multiple markers are largely model-based (see, for instance, Freedman et al., 1992; Huang and Gilbert, 2011; Xu and Zeger, 2001; Parast et al., 2016; Van der Elst et al., 2019; Price et al., 2018). For example, Xu and Zeger (2001) proposed a latent variable model for the joint distribution of the outcome $Y$ and markers $S$ given treatment to determine whether combining multiple surrogates can improve surrogacy. As in the single-marker scenario, these model-based methods can lead to invalid inference under model misspecification (Parast et al., 2021).

More robust approaches to optimally combining $S$ have been proposed recently. For example, Price et al. (2018) proposed a novel flexible prediction framework to derive an optimal combination of multiple markers within each treatment group, which can subsequently be used to predict the treatment effect on $Y$. However, their approach does not offer a measure of surrogate strength that can be used to make decisions about the utility of the surrogates in a future study, and it cannot be used to assess the surrogacy of the markers, since the combined markers are assumed to attain perfect surrogacy (see Web Appendix A). Athey et al. (2019) proposed a model-based approach in which they estimate the conditional expectation of the primary outcome given multiple surrogates and the conditional probability of having received treatment given the multiple surrogates, and then combine these models to estimate a treatment effect based on the surrogates. Although they demonstrate that using the surrogates via their proposed method can be more efficient than using the primary outcome itself (under certain assumptions), their proposed surrogate index based on $E(Y \mid S)$ is not guaranteed to be a valid surrogate marker or to possess any optimality property in predicting the treatment effect on $Y$, especially under possible model misspecification. Parast et al. (2021) proposed a robust approach to quantify the PTE of multiple surrogates in a time-to-event outcome setting, whereby a parametric working model is used to define $g: S \mapsto g(S)$ and a nonparametric method is then used to estimate the PTE of $g(S)$. While useful, the construction of $g$ in Parast et al. (2021) does not target any optimization criterion, nor do the authors claim that $g$ is optimal in any way; therefore, there may be an alternative function that captures more of the surrogacy of the multiple markers. In summary, there does not appear to be any available method to identify and evaluate the surrogacy of an optimal combination of $S$.
In this paper, we fill the gap by proposing a novel calibrated model fusion (CMF) approach to optimally combine multiple markers.

The two-step CMF approach leverages flexible modeling strategies to construct two composite scores of S and overcomes model misspecification via an additional calibration step. Specifically, we first obtain two initial estimates of optimal composite scores, gpar(S) and gbas(S), based on two sets of working models with one targeting the underlying data distribution and the other targeting an optimal transformation function. We then estimate an optimal calibrated fusion of the two scores by maximizing the PTE. This flexible modeling approach allows us to approximate the optimal transformation of S without relying on fully nonparametric estimation which is subject to the curse of dimensionality. We demonstrate that the final CMF composite score, gCMF(S), is valid with PTE between 0 and 1 under mild regularity conditions and is optimal in a certain sense with respect to PTE. This approach is unique in that it identifies an optimal combination of S without strictly relying on parametric assumptions, and the treatment effect on gCMF(S) can be directly used to make inference regarding the PTE and approximate the treatment effect on Y in future clinical trials if PTE is high.

The remainder of the paper is organized as follows. In Section 2, we detail the CMF approach to combining the surrogate markers and propose a PTE definition based on the final composite marker. In Section 3, we evaluate the finite sample performance of the proposed method through simulation studies (Section 3.1) and illustrate it using data from the Diabetes Prevention Program (DPP) study (Section 3.2). Concluding remarks are made in Section 4. Proofs of asymptotic results are provided in Web Appendix B.

2 |. METHODOLOGY

2.1 |. Notation and approach

Let $Y$ be the primary outcome of interest and $S_{p \times 1}$ be the vector of surrogate markers. The outcome $Y$ may be either continuous or discrete. Let $A$ be the treatment indicator, with $A = 1$ denoting treatment and $A = 0$ denoting control. We assume that patients are randomly assigned to the treatment and control groups and, without loss of generality, that $p_1 = P(A = 1) = 0.5$. Denote by $Y^{(a)}$ and $S^{(a)}$ the potential outcome and potential surrogate vector, respectively, for $a = 0, 1$. The observed data consist of $n$ independent and identically distributed random vectors $\mathcal{D} = \{D_i = (A_i, Y_i, S_i^\top)^\top, i = 1, \ldots, n\}$, where $Y_i = A_i Y_i^{(1)} + (1 - A_i) Y_i^{(0)}$ and $S_i = A_i S_i^{(1)} + (1 - A_i) S_i^{(0)}$, ordered such that $A_i = 1$ for $i = 1, \ldots, n_1$ and $A_i = 0$ for $i = n_1 + 1, \ldots, n_1 + n_0 = n$.

Our goals are (i) to identify an optimal $g(\cdot)$ such that

$$\Delta_g \equiv E\{g(S^{(1)})\} - E\{g(S^{(0)})\} \tag{1}$$

can optimally predict the treatment effect on $Y$, $\Delta \equiv E(Y^{(1)}) - E(Y^{(0)})$, and (ii) to estimate the PTE of $g(S)$, where $\mathrm{PTE}_g \equiv \Delta_g / \Delta$, that is, the ratio of the treatment effect on the transformed surrogate to the treatment effect on $Y$. Building on the optimal transformation framework proposed in Wang et al. (2020), we propose a CMF approach to deriving such an optimal $g(\cdot)$. We first describe the approach of Wang et al. (2020) before detailing the key steps of the CMF procedure.

For any given set of surrogate markers $S$, Wang et al. (2020) proposed to quantify the surrogacy of $S$ by first identifying an optimal transformation $g_{\mathrm{opt}}(\cdot)$ such that

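$$g_{\mathrm{opt}}(\cdot) = \operatorname*{argmin}_{g:\, E\{Y^{(0)} - g(S^{(0)})\} = 0} E\left[\{Y^{(1)} - g(S^{(1)})\}^2 + \{Y^{(0)} - g(S^{(0)})\}^2\right]. \tag{2}$$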

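The constraint $E\{Y^{(0)} - g(S^{(0)})\} = 0$ is imposed because a constant shift in $g(\cdot)$ would yield the same $\Delta_g$. This is a natural normalization, and it is easy to incorporate into the minimization. While we do not require $(Y^{(1)}, S^{(1)}) \perp (Y^{(0)}, S^{(0)})$, it is interesting to note that, when this independence does hold, $g_{\mathrm{opt}}(S)$ as defined above also optimizes the individual-level surrogacy measure $E\big([(Y^{(1)} - Y^{(0)}) - \{g(S^{(1)}) - g(S^{(0)})\}]^2\big)$: after expanding the square, the cross term factors as $E\{Y^{(1)} - g(S^{(1)})\}\,E\{Y^{(0)} - g(S^{(0)})\}$, which is zero under the constraint, so the individual-level measure reduces to the objective in (2). It was shown in Wang et al. (2020) that $g_{\mathrm{opt}}(\cdot)$ takes the functional form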

$$g_{\mathrm{opt}}(S) = m(S) + \lambda \frac{\dot F_0(S)}{\dot F_0(S) + \dot F_1(S)} = m(S) + \lambda \mathcal{P}_0(S), \tag{3}$$

where $\dot F_a(\cdot)$ and $F_a(\cdot)$ are the density and cumulative distribution functions of $S^{(a)}$, respectively,

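$$m(s) = E(Y \mid S = s) = m_1(s)\mathcal{P}_1(s) + m_0(s)\mathcal{P}_0(s), \qquad m_a(s) = E(Y^{(a)} \mid S^{(a)} = s),$$
$$\mathcal{P}_a(s) = \frac{\dot F_a(s)}{\dot F_0(s) + \dot F_1(s)} = P(A = a \mid S = s), \qquad \lambda = \frac{\mu_0 - \int m(s)\, dF_0(s)}{\int \mathcal{P}_0(s)\, dF_0(s)}, \tag{4}$$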

and $\mu_0 = E(Y^{(0)})$. Furthermore, the quantity $\mathrm{PTE}_{g_{\mathrm{opt}}} \equiv \Delta_{g_{\mathrm{opt}}} / \Delta$ was shown to lie between 0 and 1 under the following mild regularity conditions:

(c1) $S_1(u) \ge S_0(u)$ for all $u$;

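(c2) $M_1(u) \ge M_0(u)$ for all $u$ in the common support of $g_{\mathrm{opt}}(S^{(1)})$ and $g_{\mathrm{opt}}(S^{(0)})$, where $S_a(u) = P\{g_{\mathrm{opt}}(S^{(a)}) \ge u\}$ and $M_a(u) = E\{Y^{(a)} \mid g_{\mathrm{opt}}(S^{(a)}) = u\}$, for $a = 0, 1$. These conditions ensure that $g_{\mathrm{opt}}(S)$ is a valid surrogate marker and enable us to quantify the surrogacy of $g_{\mathrm{opt}}(S)$ directly based on the treatment effect on $g_{\mathrm{opt}}(S)$.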
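To make (3) and (4) concrete, the Python sketch below evaluates $g_{\mathrm{opt}}$ in a hypothetical toy setting in which $S$ takes only three values, so that $\dot F_a$ are probability mass functions and the integrals in (4) become sums. The pmfs and conditional means are invented for illustration and do not come from the paper; with $p_1 = 0.5$, $\mathcal{P}_0(s) = \dot F_0(s)/\{\dot F_0(s) + \dot F_1(s)\}$. The script computes $\lambda$, $g_{\mathrm{opt}}$, and $\mathrm{PTE}_{g_{\mathrm{opt}}}$, and the check confirms that the constraint $E\{Y^{(0)} - g_{\mathrm{opt}}(S^{(0)})\} = 0$ holds.

```python
import numpy as np

# Hypothetical toy setting: S takes values {0, 1, 2} in both arms.
f0 = np.array([0.5, 0.3, 0.2])   # pmf of S^(0) (role of the density F0-dot)
f1 = np.array([0.2, 0.3, 0.5])   # pmf of S^(1) (role of the density F1-dot)
m0 = np.array([0.0, 1.0, 2.0])   # m_0(s) = E(Y^(0) | S^(0) = s)
m1 = np.array([1.0, 2.0, 3.0])   # m_1(s) = E(Y^(1) | S^(1) = s)


def g_opt(f0, f1, m0, m1):
    """Evaluate g_opt = m + lambda * P0 from (3)-(4) on a discrete support."""
    p0 = f0 / (f0 + f1)          # P(A = 0 | S = s) when P(A = 1) = 0.5
    p1 = 1.0 - p0
    m = m1 * p1 + m0 * p0        # pooled regression E(Y | S = s)
    mu0 = np.sum(f0 * m0)        # E(Y^(0))
    lam = (mu0 - np.sum(f0 * m)) / np.sum(f0 * p0)
    return m + lam * p0, lam


g, lam = g_opt(f0, f1, m0, m1)
mu0, mu1 = np.sum(f0 * m0), np.sum(f1 * m1)
delta = mu1 - mu0                            # treatment effect on Y
delta_g = np.sum(f1 * g) - np.sum(f0 * g)    # treatment effect on g_opt(S)
print(np.round(g, 4), round(float(lam), 4), round(float(delta_g / delta), 4))
```

In this toy example $\lambda \approx -0.772$ and $\mathrm{PTE}_{g_{\mathrm{opt}}} \approx 0.517$, which lies in $[0, 1]$ as the conditions above would suggest.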

When $S$ is univariate, one may make inference about $g_{\mathrm{opt}}$ and $\mathrm{PTE}_{g_{\mathrm{opt}}}$ nonparametrically. The approach of Wang et al. (2020) handles the univariate setting but cannot accommodate multiple markers: fully nonparametric inference is not feasible when $S$ is multidimensional because of the curse of dimensionality. Instead, $g_{\mathrm{opt}}(S)$ can be approximated either by imposing distributional assumptions on the data or by restricting $g(\cdot)$ in (2) to a smaller functional space. However, each of these options alone may not perform well under model misspecification, as we demonstrate in our simulation study in Section 3.1. To overcome this, we propose a two-step robust CMF approach. In step I, we derive two model-based estimates of $g_{\mathrm{opt}}(\cdot)$, namely $g_{\mathrm{par}}(\cdot)$ and $g_{\mathrm{bas}}(\cdot)$, where $g_{\mathrm{par}}(\cdot)$ is obtained by imposing parametric assumptions on $m_a(\cdot)$ and $\mathcal{P}_a(\cdot)$, and $g_{\mathrm{bas}}(\cdot)$ is obtained by directly minimizing (2) with $g(S)$ restricted to the form $\alpha^\top \Psi$, where $\Psi$ is a prespecified $K$-dimensional basis expansion of $S$ and $\alpha$ is an unknown $K$-dimensional parameter. Since $g_{\mathrm{opt}}(S)$ is a complex functional of $S$, $g_{\mathrm{par}}(S)$ and/or $g_{\mathrm{bas}}(S)$ may fail to approximate $g_{\mathrm{opt}}(S)$ accurately. In step II, we guard against potential model misspecification by constructing a final calibrated optimal composite marker as

$$g_{\mathrm{CMF}}(S) = \mathcal{G}\{\omega g_{\mathrm{par}}(S) + (1 - \omega) g_{\mathrm{bas}}(S)\}, \tag{5}$$

where $\mathcal{G}(\cdot)$ and $\omega$ are estimated to optimize the PTE of $g_{\mathrm{CMF}}(S)$. Notably, “optimality” as we use it here has two layers. The first layer is the approximation of the optimal transformation function $g_{\mathrm{opt}}(S)$ defined in Wang et al. (2020) via two methods, a parametric method and a spline basis approximation method, resulting in $g_{\mathrm{par}}(S)$ and $g_{\mathrm{bas}}(S)$, respectively. The second layer is the choice, in step II, of the linear combination of $g_{\mathrm{par}}(S)$ and $g_{\mathrm{bas}}(S)$ that maximizes the PTE, which is the primary measure of interest. Intuitively, this model fusion approach to combining surrogate markers is similar to model averaging approaches in the literature: we use several candidate models to approximate $g_{\mathrm{opt}}$ and then average them to maximize the PTE within a smaller class of transformations of $S$.

We next detail estimation and inference procedures for each step of the CMF approach.

2.2 |. Step I: Model fusion

We first describe the estimation of $g_{\mathrm{par}}$ and $g_{\mathrm{bas}}$ under two sets of models. First, we impose generalized linear working models for $m_a(S)$ and $\mathcal{P}_1(S)$:

$$m_a(S) = \kappa(\beta_a^\top \Phi), \qquad \mathcal{P}_1(S) = \operatorname{expit}(\gamma^\top \Phi), \tag{6}$$

where $\Phi$ is a basis expansion of $S$ that includes an intercept, $\operatorname{expit}(x) = e^x/(1 + e^x)$, and $\kappa(\cdot)$ is a known inverse link function, such as $\kappa(x) = x$ for continuous outcomes and $\kappa(x) = \operatorname{expit}(x)$ for binary outcomes. Then $\beta_a$ and $\gamma$ can be estimated by standard maximum likelihood or M-estimation methods, with the estimators denoted as $\hat\beta_a$ and $\hat\gamma$, respectively, $a = 0, 1$. Subsequently, we construct

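$$\hat g_{\mathrm{par}}(S) = \hat m(S) + \hat\lambda \hat{\mathcal{P}}_0(S) = \hat m_1(S)\hat{\mathcal{P}}_1(S) + \hat m_0(S)\hat{\mathcal{P}}_0(S) + \hat\lambda \hat{\mathcal{P}}_0(S), \tag{7}$$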

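where $\hat m_a(S) = \kappa(\hat\beta_a^\top \Phi)$, $\hat{\mathcal{P}}_1(S) = \operatorname{expit}(\hat\gamma^\top \Phi)$, $\hat{\mathcal{P}}_0(S) = 1 - \hat{\mathcal{P}}_1(S)$, and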

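$$\hat\lambda = \frac{\hat\mu_0 - n_0^{-1}\sum_{i=n_1+1}^{n} \hat m(S_i)}{n_0^{-1}\sum_{i=n_1+1}^{n} \hat{\mathcal{P}}_0(S_i)} \quad \text{and} \quad \hat\mu_a = n_a^{-1}\sum_{i=1}^{n} Y_i I(A_i = a). \tag{8}$$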

Standard asymptotic theory can be used to show that, under mild regularity conditions, $\hat g_{\mathrm{par}}(s)$ converges in probability to a deterministic function, denoted by $g_{\mathrm{par}}(s)$, and $n^{1/2}\{\hat g_{\mathrm{par}}(s) - g_{\mathrm{par}}(s)\} = n^{-1/2}\sum_{i=1}^{n} \mathcal{V}_{\mathrm{par}}(s; D_i) + o_p(1)$, which converges in distribution to a mean zero Gaussian process in $s$, where $\mathcal{V}_{\mathrm{par}}(\cdot)$ is an influence function.
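As an illustration of (6) to (8), the Python sketch below builds $\hat g_{\mathrm{par}}$ for a continuous outcome with $\kappa(x) = x$. It assumes, for illustration only, that $\Phi = (1, S^\top)^\top$ (intercept plus linear terms; the simulations in Section 3.1 also add two-way interactions), fits $m_a$ by least squares within each arm, and fits $\mathcal{P}_1$ by logistic regression using a few Newton-Raphson steps written out in NumPy. The function names `design`, `fit_logistic`, and `g_par_hat` are hypothetical. By construction of $\hat\lambda$ in (8), the control-arm mean of $\hat g_{\mathrm{par}}(S_i)$ equals $\hat\mu_0$, which the final line lets one verify.

```python
import numpy as np


def design(S):
    """Basis Phi = (1, S): intercept plus linear terms (an illustrative choice)."""
    return np.column_stack([np.ones(len(S)), S])


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns the coefficient vector gamma."""
    gamma = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ gamma))
        W = p * (1.0 - p)
        # Newton step: (X' W X)^{-1} X' (y - p)
        gamma += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return gamma


def g_par_hat(S, Y, A):
    """Plug-in estimate (7)-(8) of g_par at the observed S_i, with kappa(x) = x."""
    X = design(S)
    beta = {a: np.linalg.lstsq(X[A == a], Y[A == a], rcond=None)[0] for a in (0, 1)}
    gamma = fit_logistic(X, A.astype(float))
    p1 = 1.0 / (1.0 + np.exp(-X @ gamma))
    p0 = 1.0 - p1
    m_hat = (X @ beta[1]) * p1 + (X @ beta[0]) * p0    # m-hat = m1-hat P1 + m0-hat P0
    ctrl = A == 0
    lam = (Y[ctrl].mean() - m_hat[ctrl].mean()) / p0[ctrl].mean()  # equation (8)
    return m_hat + lam * p0


# Usage on simulated data (arbitrary illustrative model)
rng = np.random.default_rng(1)
n = 400
A = np.repeat([1, 0], n // 2)
S = rng.normal(loc=A[:, None] * 1.0, scale=1.0, size=(n, 2))
Y = S.sum(axis=1) + 0.5 * A + rng.normal(scale=0.3, size=n)
g = g_par_hat(S, Y, A)
print(g[A == 0].mean(), Y[A == 0].mean())   # equal up to floating-point rounding
```

A binary outcome would replace the least-squares fits with logistic fits for $m_a$; the structure of (7) and (8) is unchanged.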

We next derive an alternative approximation to $g_{\mathrm{opt}}(s)$ by minimizing (2) over the class of functions of the form $c + \alpha^\top \Psi$. Specifically, we obtain $c$ and $\alpha$ as the minimizer of

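$$E\left[\{Y^{(1)} - c - \alpha^\top \Psi^{(1)}\}^2 + \{Y^{(0)} - c - \alpha^\top \Psi^{(0)}\}^2\right] \quad \text{subject to} \quad E\{Y^{(0)} - c - \alpha^\top \Psi^{(0)}\} = 0, \tag{9}$$
where $\Psi^{(a)}$ denotes the basis expansion $\Psi$ evaluated at $S^{(a)}$.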

Direct calculations show that the minimizer of (9) has the form $c = \mu_0 - \alpha_{\mathrm{opt}}^\top C_{0,1}$ and

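$$\alpha_{\mathrm{opt}} = \left(C_{1,2} + C_{0,2} - C_{0,1}C_{1,1}^\top - C_{1,1}C_{0,1}^\top\right)^{-1} \left(\rho_1 + \rho_0 - \mu_1 C_{0,1} - \mu_0 C_{1,1}\right), \tag{10}$$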

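where $C_{a,k} = E(\Psi^{\otimes k} \mid A = a)$, $\rho_a = E(Y\Psi \mid A = a)$, and, for any vector $v$, $v^{\otimes 1} = v$ and $v^{\otimes 2} = vv^\top$. As with the parametric distributional modeling, if the class $\{c + \alpha^\top \Psi\}$ is correctly specified, that is, if it contains $g_{\mathrm{opt}}(S)$, we expect $c + \alpha_{\mathrm{opt}}^\top \Psi$ to recover $g_{\mathrm{opt}}(S)$. A simple plug-in estimator for $\alpha$ and $c$ can be constructed as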

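$$\hat\alpha = \left(\hat C_{1,2} + \hat C_{0,2} - \hat C_{0,1}\hat C_{1,1}^\top - \hat C_{1,1}\hat C_{0,1}^\top\right)^{-1}\left(\hat\rho_1 + \hat\rho_0 - \hat\mu_1 \hat C_{0,1} - \hat\mu_0 \hat C_{1,1}\right), \qquad \hat c = \hat\mu_0 - \hat\alpha^\top \hat C_{0,1}, \tag{11}$$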

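where $\hat C_{a,k} = n_a^{-1}\sum_{i=1}^{n} I(A_i = a)\Psi_i^{\otimes k}$ and $\hat\rho_a = n_a^{-1}\sum_{i=1}^{n} I(A_i = a) Y_i \Psi_i$. Correspondingly, the estimator of $g_{\mathrm{bas}}(S) = \alpha_{\mathrm{opt}}^\top \Psi$ can be obtained as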

$$\hat g_{\mathrm{bas}}(S) = \hat\alpha^\top \Psi. \tag{12}$$

Here and in the next step, for simplicity, we drop the intercept from $g_{\mathrm{bas}}$ and $\hat g_{\mathrm{bas}}$, since it contributes to neither the PTE nor the treatment effect estimate. Using standard asymptotic theory, we show in Web Appendix B that, under mild regularity conditions, $\hat g_{\mathrm{bas}}(s)$ converges in probability to $g_{\mathrm{bas}}(s) \equiv \alpha_{\mathrm{opt}}^\top \Psi(s)$ regardless of whether $g_{\mathrm{opt}}(S)$ takes this functional form, and $n^{1/2}\{\hat g_{\mathrm{bas}}(s) - g_{\mathrm{bas}}(s)\} = n^{-1/2}\sum_{i=1}^{n} \mathcal{V}_{\mathrm{bas}}(D_i)^\top \Psi(s) + o_p(1)$ converges in distribution to a mean zero Gaussian process in $s$, where $\mathcal{V}_{\mathrm{bas}}(\cdot)$ is the influence function for $n^{1/2}(\hat\alpha - \alpha_{\mathrm{opt}})$.
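The plug-in estimator (11) reduces to a few matrix operations. The Python sketch below computes $\hat\alpha$ and $\hat c$ from a basis matrix whose rows are $\Psi_i^\top$; it assumes $\Psi$ contains no constant column, since the intercept is handled by $c$ and a constant column would make the matrix in (11) singular. The cubic polynomial basis in the usage lines is a hypothetical stand-in for the natural cubic spline basis used in Section 3.1, and `alpha_hat` is an illustrative name. As a sanity check, if $Y_i = b^\top \Psi_i$ exactly in both arms, the objective in (9) attains zero at $\alpha = b$, $c = 0$, so $\hat\alpha$ should recover $b$.

```python
import numpy as np


def alpha_hat(Psi, Y, A):
    """Plug-in estimator (11): returns (alpha_hat, c_hat).

    Psi: (n, K) basis matrix without a constant column; Y: (n,); A: (n,) in {0, 1}.
    """
    C1, C2, rho, mu = {}, {}, {}, {}
    for a in (0, 1):
        P, y = Psi[A == a], Y[A == a]
        C1[a] = P.mean(axis=0)          # C_{a,1} = E(Psi | A = a)
        C2[a] = P.T @ P / len(P)        # C_{a,2} = E(Psi Psi^T | A = a)
        rho[a] = P.T @ y / len(P)       # rho_a = E(Y Psi | A = a)
        mu[a] = y.mean()                # mu_a = E(Y | A = a)
    M = C2[1] + C2[0] - np.outer(C1[0], C1[1]) - np.outer(C1[1], C1[0])
    r = rho[1] + rho[0] - mu[1] * C1[0] - mu[0] * C1[1]
    alpha = np.linalg.solve(M, r)
    c = mu[0] - alpha @ C1[0]
    return alpha, c


# Usage: noiseless outcome lying exactly in the span of the basis
rng = np.random.default_rng(2)
n = 500
A = np.repeat([1, 0], n // 2)
s = rng.normal(loc=A, scale=1.0)                 # a single marker
Psi = np.column_stack([s, s**2, s**3])           # hypothetical cubic basis
b = np.array([1.0, -0.5, 0.2])
Y = Psi @ b
alpha, c = alpha_hat(Psi, Y, A)
print(np.round(alpha, 6), round(float(c), 6))    # recovers b and c = 0
```

Solving the linear system with `np.linalg.solve` rather than forming the explicit inverse in (11) is a standard numerical choice and gives the same estimator.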

2.3 |. Step II: Calibrated optimal combination of $g_{\mathrm{par}}(S)$ and $g_{\mathrm{bas}}(S)$

To guard against model misspecification, we create a calibrated optimal combination of gpar(S) and gbas(S) to form the final CMF composite score,

$$g_{\mathrm{CMF}}(S) \equiv \mathcal{G}\{\omega g_{\mathrm{par}}(S) + (1 - \omega) g_{\mathrm{bas}}(S)\}, \tag{13}$$

where $\omega$ and the calibration function $\mathcal{G}(\cdot)$ are chosen to optimize the PTE of $g_{\mathrm{CMF}}(S)$. When $\omega$ is given, the optimal calibration function, denoted $\mathcal{G}_\omega$, can be obtained as

$$\mathcal{G}_\omega(s) = m_\omega(s) + \lambda_\omega \mathcal{P}_{0,\omega}(s), \tag{14}$$

where

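$$m_\omega(s) = E(Y \mid \mathcal{S}_\omega = s), \qquad \mathcal{P}_{a,\omega}(s) = P(A = a \mid \mathcal{S}_\omega = s),$$
$$\lambda_\omega = \frac{\mu_0 - \int m_\omega(s)\, dF_{0,\omega}(s)}{\int \mathcal{P}_{0,\omega}(s)\, dF_{0,\omega}(s)}, \tag{15}$$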

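$\mathcal{S}_\omega = \omega g_{\mathrm{par}}(S) + (1 - \omega) g_{\mathrm{bas}}(S)$, and $F_{a,\omega}(s) = P(\mathcal{S}_\omega \le s \mid A = a)$. The calibration function $\mathcal{G}_\omega(\cdot)$ enables us to directly estimate the PTE of $\mathcal{S}_\omega$, $\mathrm{PTE}_\omega$, based on the treatment effect on $\mathcal{G}_\omega(\mathcal{S}_\omega)$ for any $\omega$: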

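$$\mathrm{PTE}_\omega = \Delta_\omega / \Delta, \quad \text{where} \quad \Delta_\omega = E\{\mathcal{G}_\omega(\mathcal{S}_\omega^{(1)})\} - E\{\mathcal{G}_\omega(\mathcal{S}_\omega^{(0)})\}, \tag{16}$$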

and $\mathcal{S}_\omega^{(a)} = \omega g_{\mathrm{par}}(S^{(a)}) + (1 - \omega) g_{\mathrm{bas}}(S^{(a)})$. We then identify the optimal weight, denoted as $\bar\omega$, as the maximizer of $\mathrm{PTE}_\omega$. The final CMF composite marker is then

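$$g_{\mathrm{CMF}}(S) = \mathcal{G}_{\bar\omega}\{\bar\omega g_{\mathrm{par}}(S) + (1 - \bar\omega) g_{\mathrm{bas}}(S)\}, \quad \text{with} \quad \bar\omega = \operatorname*{argmax}_\omega \mathrm{PTE}_\omega. \tag{17}$$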

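By the construction of $\bar\omega$, the PTE of $g_{\mathrm{CMF}}(S)$ is at least as high as the PTEs of $\mathcal{G}_0\{g_{\mathrm{bas}}(S)\}$ and $\mathcal{G}_1\{g_{\mathrm{par}}(S)\}$, denoted by $\mathrm{PTE}_{g_{\mathrm{bas}}} \equiv \mathrm{PTE}_{\omega=0}$ and $\mathrm{PTE}_{g_{\mathrm{par}}} \equiv \mathrm{PTE}_{\omega=1}$, respectively. Following arguments similar to those given in Wang et al. (2020), it can be shown that $\mathrm{PTE}_\omega \in [0, 1]$ if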

(C1) $S_{1,\omega}(v) \ge S_{0,\omega}(v)$ for all $v$;

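(C2) $M_{1,\omega}(v) \ge M_{0,\omega}(v)$ for all $v$ in the common support of $\mathcal{G}_\omega(\mathcal{S}_\omega^{(1)})$ and $\mathcal{G}_\omega(\mathcal{S}_\omega^{(0)})$, where $S_{a,\omega}(v) = P\{\mathcal{G}_\omega(\mathcal{S}_\omega) \ge v \mid A = a\}$ and $M_{a,\omega}(v) = E\{Y^{(a)} \mid \mathcal{G}_\omega(\mathcal{S}_\omega^{(a)}) = v\}$, for $a = 0, 1$. These two conditions are similar to, but slightly weaker than, those required in Parast et al. (2016) and Wang and Taylor (2002), and they ensure that we are not in a surrogate paradox situation (VanderWeele, 2013).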

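To estimate $g_{\mathrm{CMF}}(\cdot)$, we nonparametrically estimate $\mathcal{G}_\omega(\cdot)$ for any given $\omega$ as $\hat{\mathcal{G}}_\omega(s) = \hat m_\omega(s) + \hat\lambda_\omega \hat{\mathcal{P}}_{0,\omega}(s)$, as in Wang et al. (2020), with $\hat{\mathcal{S}}_\omega = \omega \hat g_{\mathrm{par}}(S) + (1 - \omega)\hat g_{\mathrm{bas}}(S)$ as the surrogate marker, where $\hat m_\omega(s) = \{\sum_{i=1}^{n} Y_i K_h(\hat{\mathcal{S}}_{\omega,i} - s)\}/\{\sum_{i=1}^{n} K_h(\hat{\mathcal{S}}_{\omega,i} - s)\}$,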

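$$\hat\lambda_\omega = \frac{\hat\mu_0 - n_0^{-1}\sum_{i=n_1+1}^{n} \hat m_\omega(\hat{\mathcal{S}}_{\omega,i})}{n_0^{-1}\sum_{i=n_1+1}^{n} \hat{\mathcal{P}}_{0,\omega}(\hat{\mathcal{S}}_{\omega,i})}, \quad \text{and} \quad \hat{\mathcal{P}}_{a,\omega}(s) = \frac{\sum_{i=1}^{n} I(A_i = a) K_h(\hat{\mathcal{S}}_{\omega,i} - s)}{\sum_{i=1}^{n} K_h(\hat{\mathcal{S}}_{\omega,i} - s)}, \tag{18}$$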

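$K_h(x) = K(x/h)/h$, and $K(\cdot)$ is a symmetric kernel function with bandwidth $h$. We then estimate $\bar\omega$ as $\hat\omega = \operatorname{argmax}_\omega \widehat{\mathrm{PTE}}_\omega$, where $\widehat{\mathrm{PTE}}_\omega = \hat\Delta_\omega / \hat\Delta$, $\hat\Delta_\omega = \hat\mu_{1,\hat{\mathcal{G}}_\omega} - \hat\mu_{0,\hat{\mathcal{G}}_\omega}$, $\hat\Delta = \hat\mu_1 - \hat\mu_0$, and $\hat\mu_{a,\mathcal{G}} = n_a^{-1}\sum_{i=1}^{n} I(A_i = a)\mathcal{G}(\hat{\mathcal{S}}_{\omega,i})$. The PTE of $g_{\mathrm{CMF}}(S)$, $\mathrm{PTE}_{\bar\omega}$, can then be estimated as $\widehat{\mathrm{PTE}}_{\hat\omega}$, also denoted as $\widehat{\mathrm{PTE}}$.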
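Step II can be sketched in a few lines of NumPy. The code below evaluates $\hat{\mathcal{G}}_\omega$ at the observed scores with a Gaussian kernel following (18), computes $\widehat{\mathrm{PTE}}_\omega$, and maximizes it over a grid of $\omega$ values in $[-c, c]$ with $c = 5$, the value given in Section 4. Several choices are assumptions made for illustration, not details taken from the paper: the grid search itself (the paper does not state how the maximization is carried out), the bandwidth rule $h = \hat\sigma n^{-1/5} n^{-c_0}$ (a Scott-type rule of thumb combined with the undersmoothing factor $n^{-c_0}$, $c_0 = 0.11$, from Section 3.1), and the function names `calibrate`, `pte_hat`, and `cmf_weight`. The kernel's normalizing constant cancels in the ratios in (18), so it is omitted.

```python
import numpy as np


def calibrate(s, y, a, h):
    """Kernel estimate G-hat_omega(s_i) = m-hat(s_i) + lambda-hat * P0-hat(s_i), as in (18)."""
    z = (s[:, None] - s[None, :]) / h
    k = np.exp(-0.5 * z**2)                      # Gaussian kernel; constants cancel
    denom = k.sum(axis=1)
    m_hat = k @ y / denom                        # Nadaraya-Watson E(Y | S_omega = s_i)
    p0_hat = k @ (a == 0).astype(float) / denom  # estimate of P(A = 0 | S_omega = s_i)
    ctrl = a == 0
    lam = (y[ctrl].mean() - m_hat[ctrl].mean()) / p0_hat[ctrl].mean()
    return m_hat + lam * p0_hat


def pte_hat(s, y, a, c0=0.11):
    """PTE-hat_omega = Delta-hat_omega / Delta-hat for a given combined score s."""
    n = len(s)
    if s.std() == 0:                             # a constant score carries no effect
        return 0.0
    h = s.std() * n ** (-0.2) * n ** (-c0)       # assumed bandwidth rule (see lead-in)
    g = calibrate(s, y, a, h)
    delta_g = g[a == 1].mean() - g[a == 0].mean()
    delta = y[a == 1].mean() - y[a == 0].mean()
    return delta_g / delta


def cmf_weight(g_par, g_bas, y, a, c=5.0, n_grid=101):
    """Grid search for omega-hat = argmax of PTE-hat_omega over [-c, c]."""
    grid = np.linspace(-c, c, n_grid)
    ptes = np.array([pte_hat(w * g_par + (1 - w) * g_bas, y, a) for w in grid])
    best = int(np.argmax(ptes))
    return grid[best], ptes[best]


# Usage with hypothetical stand-ins for the two step I scores
rng = np.random.default_rng(3)
n = 300
a = np.repeat([1, 0], n // 2)
s1, s2 = rng.normal(a, 1.0), rng.normal(a, 1.0)
y = s1 + s2 + rng.normal(0.0, 0.3, n)
g_par, g_bas = s1 + s2, s1
w_hat, pte = cmf_weight(g_par, g_bas, y, a, n_grid=41)
print(w_hat, round(float(pte), 3))
```

Because the grid contains $\omega = 0$ and $\omega = 1$, the selected value is never worse, on the estimated PTE scale, than either step I score calibrated on its own, mirroring the argument given after (17).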

2.4 |. Use with censored outcomes

Our proposed modeling framework can also handle binary outcomes defined by censored event times. Specifically, when $Y = I(T \le t)$ is defined as the $t$-year event status for an event time $T$, $Y$ is not directly observable, since $T$ is typically observed only through $(X, \delta)$, where $X = \min(T, C)$, $\delta = I(T \le C)$, and $C$ is the censoring time, assumed to be independent of $S$ and $T$ given $A$. For such a $t$-year event status outcome, one may choose $\kappa(\cdot)$ as a cumulative distribution function such as $\operatorname{expit}(\cdot)$ and overcome the censoring via inverse probability weighting (IPW) with weights $\hat w_i(t) = I(X_i \le t)\delta_i / \hat G_{A_i}(X_i) + I(X_i > t)/\hat G_{A_i}(t)$, as in Parast et al. (2021), where $\hat G_a(t)$ is the Kaplan-Meier estimate of $G_a(t) = P(C \ge t \mid A = a)$ based on data from patients with $A_i = a$. IPW can then be used to weight the likelihood for estimating $\beta_a$ and $\gamma$, and to obtain $\hat\alpha$ by using weight $\hat w_i(t)$ for the $i$th observation in constructing $\hat C_{a,k}$ and $\hat\rho_a$. In the final estimation of $\omega$ and $\mathcal{G}(\cdot)$, similar IPW strategies may be adopted to account for censoring. Alternatively, one could consider an approach using pseudo-observations, particularly if there is interest in defining the treatment effect in terms of the difference in residual mean survival time (Klein and Andersen, 2005; Andersen and Pohar Perme, 2010).
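The IPW weights $\hat w_i(t)$ can be computed with a hand-written Kaplan-Meier estimator of the censoring distribution within each arm. The Python sketch below is a minimal illustration under stated assumptions: $\hat G_a(u)$ is taken as the product-limit estimate over censoring times strictly before $u$ (an estimate of $P(C \ge u \mid A = a)$), ties between events and censorings receive no special handling, and the names `censoring_km` and `ipw_weights` are hypothetical. Subjects censored before $t$ receive weight zero, and their mass is redistributed to subjects still under observation.

```python
import numpy as np


def censoring_km(x, delta):
    """Return u -> estimate of P(C >= u): product-limit over censoring times < u."""
    cens_times = np.unique(x[delta == 0])
    at_risk = np.array([(x >= c).sum() for c in cens_times])
    n_cens = np.array([((x == c) & (delta == 0)).sum() for c in cens_times])
    factors = 1.0 - n_cens / at_risk

    def G(u):
        return float(np.prod(factors[cens_times < u]))

    return G


def ipw_weights(x, delta, a, t):
    """w_i(t) = I(X_i <= t) delta_i / G_{A_i}(X_i) + I(X_i > t) / G_{A_i}(t)."""
    w = np.zeros(len(x))
    for arm in np.unique(a):
        idx = np.flatnonzero(a == arm)
        G = censoring_km(x[idx], delta[idx])     # arm-specific censoring distribution
        for i in idx:
            if x[i] <= t and delta[i] == 1:
                w[i] = 1.0 / G(x[i])             # event observed by time t
            elif x[i] > t:
                w[i] = 1.0 / G(t)                # still under observation at t
    return w                                     # censored before t: weight stays 0


# Usage: one arm, one subject censored at time 2, horizon t = 2.5
x = np.array([1.0, 2.0, 3.0, 4.0])
delta = np.array([1, 0, 1, 1])
a = np.zeros(4, dtype=int)
print(ipw_weights(x, delta, a, t=2.5))           # weights 1, 0, 1.5, 1.5
```

In the small example the weights sum to the sample size, since the censored subject's contribution is shifted to the two subjects known to be event-free at $t$.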

2.5 |. Inference

In Web Appendix B, we outline the justifications for the convergence of $\hat g_{\mathrm{par}}$, $\hat g_{\mathrm{bas}}$, $\hat\omega$, and $\widehat{\mathrm{PTE}}_{\hat\omega}$ to their respective population values, as well as the asymptotic normality of $n^{1/2}(\widehat{\mathrm{PTE}}_{\hat\omega} - \mathrm{PTE}_{\bar\omega})$. We find that the variability of $\hat\omega$ does not contribute to the asymptotic variance of $n^{1/2}(\widehat{\mathrm{PTE}}_{\hat\omega} - \mathrm{PTE}_{\bar\omega})$, because $\hat\omega$ maximizes $\widehat{\mathrm{PTE}}_\omega$. In practice, this asymptotic variance can be estimated using resampling procedures similar to those employed by Parast et al. (2016) and Wang et al. (2020); details are provided in Web Appendix C.
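For intuition about resampling-based standard errors, the sketch below applies a generic perturbation-resampling scheme: each subject receives an independent Exp(1) weight, the treatment effects on a fixed score and on $Y$ are recomputed as weighted means, and the standard deviation of the resulting PTE ratios is reported. This is an illustrative assumption, not the procedure of Web Appendix C. In particular, it holds the composite score fixed and therefore ignores the variability from estimating $\hat g_{\mathrm{par}}$, $\hat g_{\mathrm{bas}}$, and $\hat{\mathcal{G}}_\omega$, which a full procedure would typically re-estimate within each replicate. The names `weighted_pte` and `perturbation_se` are hypothetical.

```python
import numpy as np


def weighted_pte(score, y, a, v):
    """Ratio of weighted treatment effects on a fixed score and on Y."""
    def wmean(x, mask):
        return np.sum(v[mask] * x[mask]) / np.sum(v[mask])

    d_score = wmean(score, a == 1) - wmean(score, a == 0)
    d_y = wmean(y, a == 1) - wmean(y, a == 0)
    return d_score / d_y


def perturbation_se(score, y, a, n_rep=1000, seed=0):
    """Standard deviation of PTE estimates under i.i.d. Exp(1) perturbation weights."""
    rng = np.random.default_rng(seed)
    reps = [weighted_pte(score, y, a, rng.exponential(1.0, len(y))) for _ in range(n_rep)]
    return float(np.std(reps, ddof=1))


# Usage on simulated data (arbitrary illustrative model)
rng = np.random.default_rng(4)
n = 400
a = np.repeat([1, 0], n // 2)
score = rng.normal(a, 1.0)
y = score + rng.normal(0.0, 1.0, n)
print(weighted_pte(score, y, a, np.ones(n)), perturbation_se(score, y, a))
```

With unit weights, `weighted_pte` reduces to the unweighted ratio of mean differences; the perturbed replicates only change how much each subject counts.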

3 |. NUMERICAL STUDIES

3.1 |. Simulations

We conducted simulation studies to evaluate the finite sample performance of the proposed CMF method and to compare it with existing methods. We focus primarily on how well different strategies for combining markers attain surrogacy in the resulting composite score. In addition to CMF, we include the individual markers $S_1, \ldots, S_p$ (single scores) and the composite scores $\hat g_{\mathrm{bas}}(S)$, $\hat g_{\mathrm{par}}(S)$, and $\hat\alpha_0^\top S$, with $\hat\alpha_0$ obtained by fitting a generalized linear model $Y^{(0)} \sim G(\alpha_0^\top S^{(0)})$ to the control group data, where the link function $G(\cdot)$ is set as the identity for the continuous outcome and the anti-logit for a binary outcome. The score $\hat\alpha_0^\top S$ is analogous to the multiple surrogate markers approach proposed in Parast et al. (2021), which was originally designed for time-to-event outcomes.

To compare the performance of these scores objectively, we evaluate the PTE of each score nonparametrically based on the definition of Wang et al. (2020), which has been shown to be equivalent to the model-free PTE measure studied in Parast et al. (2016) with a specific choice of reference distribution. We also compare the PTE estimates with the oracle PTE, that is, the PTE of the oracle score $g_{\mathrm{opt}}(S)$ in (3), which is not attainable under model misspecification. Under potential model misspecification, we would expect all methods to fail to capture fully the complex relationship between $S^{(a)}$ and $Y^{(a)}$ and hence to produce composite markers with lower surrogacy than $g_{\mathrm{opt}}(S)$. As an additional benchmark, we also obtained the PTE estimator of Freedman et al. (1992), which fits two regressions, one with the treatment indicator only and one with the treatment indicator and the surrogates, since this method is the most widely used in practice. The corresponding estimator is denoted as $\mathrm{PTE}_F$. Across all configurations, we let $n = 500$ and summarize results based on 500 simulated datasets. Variances were estimated using resampling. We used a natural cubic spline basis with three knots for $\Psi$ in estimating $g_{\mathrm{bas}}$. We chose $K(\cdot)$ as a Gaussian kernel with bandwidth $h = h_{\mathrm{opt}} n^{-c_0}$, $c_0 = 0.11$, where $h_{\mathrm{opt}}$ was obtained using the method of Scott (1992). For $g_{\mathrm{par}}$, we let $\kappa(x) = x$ and included linear terms and two-way interactions in $\Phi$ for estimating both $m_a(S)$ and $\mathcal{P}_1(S)$.
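For reference, the Freedman et al. (1992) benchmark can be computed from the two regressions described above as one minus the ratio of the surrogate-adjusted treatment coefficient to the unadjusted one. The Python sketch below assumes a continuous outcome and ordinary least squares with linear surrogate terms (a logistic model would replace least squares for a binary outcome); the function name `pte_freedman` and the simulated model in the usage lines are hypothetical.

```python
import numpy as np


def pte_freedman(y, a, S):
    """Freedman-type PTE: 1 - (treatment coefficient adjusted for S) / (unadjusted)."""
    X0 = np.column_stack([np.ones(len(y)), a])     # Y ~ 1 + A
    X1 = np.column_stack([X0, S])                  # Y ~ 1 + A + S
    beta = np.linalg.lstsq(X0, y, rcond=None)[0][1]
    beta_s = np.linalg.lstsq(X1, y, rcond=None)[0][1]
    return 1.0 - beta_s / beta


# Usage: treatment acts on Y only through a single linear surrogate
rng = np.random.default_rng(5)
n = 2000
a = np.repeat([1, 0], n // 2)
S = rng.normal(a, 1.0)
y = 2.0 * S + rng.normal(0.0, 0.1, n)
print(round(float(pte_freedman(y, a, S)), 3))     # close to 1
```

Because this estimate depends entirely on the two regression models being adequate, it can fall well below the nonparametric PTE when the surrogate-outcome relationship is nonlinear, which is the pattern reported for settings (2) and (3).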

We considered three settings. In setting (1), we generated $S^{(a)}_{4 \times 1} \sim N(a\mathbf{1}, I)$, where $\mathbf{1}$ denotes the vector of ones and $I$ denotes the identity matrix. We then generated $Y^{(a)}$ from

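$$Y^{(a)} = \mathbf{1}^\top S^{(a)} + a\{S_1^{(a)}S_2^{(a)} + S_2^{(a)}S_3^{(a)} + 2S_3^{(a)}S_4^{(a)} + 8\} - 5 + \epsilon_a, \quad \text{where} \quad \epsilon_a \sim N(0, 0.3^2). \tag{19}$$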

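We considered a more nonlinear setting (2) by first generating $\tilde S^{(a)}_{4 \times 1} = (\tilde S_1^{(a)}, \ldots, \tilde S_4^{(a)})^\top \sim N\{0, 4^a(0.5 I + 0.5\,\mathbf{1}\mathbf{1}^\top)\}$ and then letting $S_j^{(a)} = (3 + a)/[1 + \exp\{-5^{1-a}\tilde S_j^{(a)}\}]$. Subsequently, we generated $Y^{(a)}$ from the following model: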

$$Y^{(a)} = \exp\{0.1(S_1^{(a)} + S_2^{(a)} + S_3^{(a)} + 5 \times (-1)^{a+1} S_4^{(a)}) + \epsilon_a\}, \quad \text{where} \quad \epsilon_a \sim N(0, 0.3^2). \tag{20}$$

Here $m_a(S)$ is log-linear in $S$, and $S^{(a)}$ has a complex distribution. Finally, in setting (3), we generated $S^{(a)}$ as $S_j^{(a)} = 3/\{1 + \exp(-\tilde S_j^{(a)})\}$ but generated $Y^{(a)}$ from

$$Y^{(a)} = \exp\{0.1(S_1^{(a)} + S_2^{(a)} + S_3^{(a)} - 10^{1-a} S_2^{(a)}S_3^{(a)} + (-1)^a S_4^{(a)}) + \epsilon_a\}, \tag{21}$$

where $\epsilon_a \sim N(0, 0.3^2)$. Under this setting, the interaction term $S_2^{(a)}S_3^{(a)}$ is highly influential, and thus we would expect $\hat g_{\mathrm{bas}}(S)$ not to perform well. Figures illustrating these settings are provided in Web Appendix D.

In Table 1, we compare the nonparametric estimates of the PTE for the aforementioned composite scores with the oracle PTE of $g_{\mathrm{opt}}(S)$, $\mathrm{PTE}_{g_{\mathrm{opt}}}$. In the table, $\mathrm{PTE}_{\bar\omega}$ is the limiting value of the estimated PTE of $g_{\mathrm{CMF}}(S)$; that is, $\mathrm{PTE}_{g_{\mathrm{opt}}}$ and $\mathrm{PTE}_{\bar\omega}$ are the true PTE values for $g_{\mathrm{opt}}(S)$ and $g_{\mathrm{CMF}}(S)$, respectively, while the other columns are estimates. Across all settings, the PTE estimates based on the proposed $g_{\mathrm{CMF}}(S)$ were the highest (or tied for the highest) among all the PTE estimates. In setting (1), where the effects are relatively linear, the composite scores derived from the different approaches attain comparable PTEs, which are also comparable to $\mathrm{PTE}_{g_{\mathrm{opt}}}$. In settings (2) and (3), the distribution of $S^{(a)}$ is relatively complex, and the linear regression model for the conditional mean function $m_a(S)$ has a misspecified link. The proposed PTE estimate is lower than the oracle $\mathrm{PTE}_{g_{\mathrm{opt}}}$ in setting (2) (0.84 versus 0.94) but comparable in setting (3) (0.68 versus 0.68). The PTE estimates based on the two heavily model-dependent scores, $\hat g_{\mathrm{par}}(S)$ and $\hat\alpha_0^\top S$, are much lower than the proposed estimate in setting (2); in setting (3), the $\hat g_{\mathrm{par}}(S)$-based estimate is comparable, while the $\hat\alpha_0^\top S$-based estimate remains lower. The composite score $\hat g_{\mathrm{bas}}(S)$ is more robust, attaining a PTE nearly identical to that of $g_{\mathrm{CMF}}(S)$ in setting (2), but a lower PTE in setting (3).

TABLE 1.

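Nonparametric estimates of the PTE (as defined in Wang et al. (2020)) for $g_{\mathrm{CMF}}(S)$, $g_{\mathrm{par}}(S)$, $g_{\mathrm{bas}}(S)$, $\alpha_0^\top S$, $S_1$, $S_2$, $S_3$, and $S_4$, along with their empirical standard errors (ESE), under settings (1), (2), and (3) with $n = 500$. For comparison, we also include the oracle PTE of $g_{\mathrm{opt}}(S)$ ($\mathrm{PTE}_{g_{\mathrm{opt}}}$) and the model-based PTE estimate of Freedman et al. (1992) ($\mathrm{PTE}_F$). For the proposed CMF composite score, we also present the average of the estimated standard errors (ASE, shown in parentheses after the ESE) and the empirical coverage probabilities (CP, ×100) of the 95% confidence intervals.

Combined markers

| Setting | $\mathrm{PTE}_{g_{\mathrm{opt}}}$ | $\mathrm{PTE}_{\bar\omega}$ | $g_{\mathrm{CMF}}(S)$ Est | $g_{\mathrm{CMF}}(S)$ ESE (ASE) | $g_{\mathrm{CMF}}(S)$ CP | $g_{\mathrm{par}}(S)$ Est | $g_{\mathrm{par}}(S)$ ESE | $g_{\mathrm{bas}}(S)$ Est | $g_{\mathrm{bas}}(S)$ ESE | $\alpha_0^\top S$ Est | $\alpha_0^\top S$ ESE | $\mathrm{PTE}_F$ Est | $\mathrm{PTE}_F$ ESE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (1) | 0.85 | 0.86 | 0.85 | 0.019 (0.018) | 92.7 | 0.85 | 0.019 | 0.84 | 0.019 | 0.83 | 0.020 | 0.50 | 0.027 |
| (2) | 0.94 | 0.85 | 0.84 | 0.023 (0.022) | 94.2 | 0.54 | 0.046 | 0.83 | 0.022 | 0.54 | 0.041 | 0.21 | 0.035 |
| (3) | 0.68 | 0.67 | 0.68 | 0.036 (0.037) | 94.0 | 0.67 | 0.036 | 0.55 | 0.038 | 0.24 | 0.042 | 0.00 | 0.010 |

Individual markers

| Setting | $S_1$ Est | $S_1$ ESE | $S_2$ Est | $S_2$ ESE | $S_3$ Est | $S_3$ ESE | $S_4$ Est | $S_4$ ESE |
|---|---|---|---|---|---|---|---|---|
| (1) | 0.40 | 0.045 | 0.42 | 0.045 | 0.44 | 0.046 | 0.42 | 0.046 |
| (2) | 0.46 | 0.039 | 0.47 | 0.040 | 0.47 | 0.041 | 0.52 | 0.040 |
| (3) | 0.26 | 0.041 | 0.26 | 0.039 | 0.26 | 0.041 | 0.26 | 0.040 |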

The composite scores $g_{\mathrm{CMF}}(S)$, $g_{\mathrm{par}}(S)$, $g_{\mathrm{bas}}(S)$, and $\hat\alpha_0^\top S$ all attain higher PTE estimates than the individual markers alone. These results highlight the value of combining multiple markers to improve surrogacy and the importance of robust approaches for combining them. The fully model-based PTE estimate of Freedman et al. (1992) is substantially lower in all three settings and is even smaller than the estimates for the individual markers in settings (2) and (3).

Table 1 also presents the performance of the resampling-based standard error and interval estimation for the proposed $\widehat{\mathrm{PTE}}_{\hat\omega}$. In general, $\widehat{\mathrm{PTE}}_{\hat\omega}$ has negligible bias for $\mathrm{PTE}_{\bar\omega}$, and the average estimated standard errors are close to the corresponding empirical standard errors. The empirical coverage probabilities of the confidence intervals are close to the nominal level of 95%.

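In Table 2, we show the Spearman rank correlation between each estimated score and $g_{\mathrm{opt}}(S)$. The correlation for $\hat g_{\mathrm{CMF}}(S)$ is the highest in setting (2) and within 0.001 of the highest (that of $\hat g_{\mathrm{par}}(S)$) in settings (1) and (3), although it need not be close to 1, as setting (2) shows. This pattern is consistent with the magnitudes of the PTE estimates.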

TABLE 2.

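The Spearman rank correlation between $g_{\mathrm{opt}}(S)$ and each of $\hat g_{\mathrm{CMF}}(S)$, $\hat g_{\mathrm{par}}(S)$, $\hat g_{\mathrm{bas}}(S)$, $\hat\alpha_0^\top S$, $\hat g_{\mathrm{opt}}(S_1)$, $\hat g_{\mathrm{opt}}(S_2)$, $\hat g_{\mathrm{opt}}(S_3)$, and $\hat g_{\mathrm{opt}}(S_4)$

| Setting | $\hat g_{\mathrm{CMF}}(S)$ | $\hat g_{\mathrm{par}}(S)$ | $\hat g_{\mathrm{bas}}(S)$ | $\hat\alpha_0^\top S$ | $\hat g_{\mathrm{opt}}(S_1)$ | $\hat g_{\mathrm{opt}}(S_2)$ | $\hat g_{\mathrm{opt}}(S_3)$ | $\hat g_{\mathrm{opt}}(S_4)$ |
|---|---|---|---|---|---|---|---|---|
| (1) | 0.999 | 1.000 | 0.984 | 0.986 | 0.608 | 0.610 | 0.646 | 0.638 |
| (2) | 0.710 | 0.459 | 0.675 | 0.405 | 0.614 | 0.602 | 0.600 | 0.560 |
| (3) | 0.970 | 0.971 | 0.846 | 0.387 | 0.472 | 0.521 | 0.504 | 0.504 |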

3.2 |. Application to the Diabetes Prevention Program study

The DPP was a randomized clinical trial designed to evaluate the effect of several prevention strategies for reducing the risk of type 2 diabetes (T2D) among high-risk individuals with prediabetes (Diabetes Prevention Program Group, 1999, 2002). The participants were randomized to one of four treatment groups: placebo, lifestyle intervention, metformin, and troglitazone. The primary endpoint of the trial was time to T2D onset, denoted as $T$, and participants were followed up for 5 years, with an average follow-up of 2.8 years. Previous study results showed that both lifestyle intervention and metformin significantly reduced the risk of T2D. For illustration, we focus on the comparison of the lifestyle intervention group ($n_1 = 1010$) and the placebo group ($n_0 = 1007$) with respect to diabetes risk at year $t$, $Y_t = I(T > t)$, with $t = 1, 2, 3$, and 4. Our goal is to investigate the extent to which three surrogates, namely $S_1$, HbA1c at $t_0 = 0.5$ years ($\mathrm{A1C}_{0.5}$); $S_2$, fasting glucose at $t_0 = 0.5$ years; and $S_3$, diabetes incidence up to time $t_0 = 0.5$ years, can be used in combination to predict the treatment effect on $Y_t$. Since HbA1c and fasting glucose are also measured at baseline, one may also consider the change from baseline in these two markers as a surrogate. We therefore also include HbA1c at baseline ($S_{10}$) and fasting glucose at baseline ($S_{20}$) as part of $S$ in the construction of the composite score. To account for censoring, we use the IPW strategies described in Section 2.4 in the estimation of the model parameters and the final PTE. We use the proposed resampling strategy to estimate the standard errors and construct confidence intervals (CIs).

The PTE estimates for $\hat g_{\mathrm{CMF}}(S)$, $\hat g_{\mathrm{par}}(S)$, $\hat g_{\mathrm{bas}}(S)$, and $\hat\alpha_0^\top S$, along with $\mathrm{PTE}_F$ and the PTEs of the individual markers, are shown in Table 3. The PTEs of the composite scores derived from $S$ are generally higher than those of the individual markers, and $\hat g_{\mathrm{CMF}}(S)$ yields the highest (or tied for the highest) PTE at every time point, for example, a PTE of 0.67 (95% CI: [0.51, 0.83]) for 1-year T2D risk and 0.58 (95% CI: [0.44, 0.72]) for 4-year T2D risk. Thus, in this example, it appears beneficial to use these surrogates jointly, rather than any single surrogate alone, to infer or test the treatment effect on the primary outcome. While the PTE of $\hat g_{\mathrm{CMF}}(S)$ is not substantially higher than those of the other composite markers, it is consistently high across all time points.

TABLE 3.

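Nonparametric estimates of the PTE for the proposed $\hat g_{\mathrm{CMF}}(S)$ as well as $\hat g_{\mathrm{par}}(S)$, $\hat g_{\mathrm{bas}}(S)$, $\hat\alpha_0^\top S$, and the individual markers $\mathrm{HbA1c}_{0.5}$, $\mathrm{Glucose}_{0.5}$, and $I(T > 0.5)$. The standard errors of the estimated PTE for $\hat g_{\mathrm{CMF}}(S)$ are shown in parentheses. The long-term outcome is defined as $Y_t = I(T > t)$ and is illustrated with $t = 1, 2, 3$, and 4. For comparison, the model-based PTE estimate of Freedman et al. (1992), $\mathrm{PTE}_F$, is also shown

Combined markers: $\hat g_{\mathrm{CMF}}(S)$, $\hat g_{\mathrm{par}}(S)$, $\hat g_{\mathrm{bas}}(S)$, $\hat\alpha_0^\top S$. Individual markers: $\mathrm{HbA1c}_{0.5}$, $\mathrm{Glucose}_{0.5}$, $I(T > 0.5)$.

| Outcome $Y_t = I(T > t)$ | $\hat g_{\mathrm{CMF}}(S)$ | $\hat g_{\mathrm{par}}(S)$ | $\hat g_{\mathrm{bas}}(S)$ | $\hat\alpha_0^\top S$ | $\mathrm{HbA1c}_{0.5}$ | $\mathrm{Glucose}_{0.5}$ | $I(T > 0.5)$ | $\mathrm{PTE}_F$ |
|---|---|---|---|---|---|---|---|---|
| $t = 1$ | 0.67 (0.080) | 0.61 | 0.64 | 0.63 | 0.26 | 0.53 | 0.29 | 0.67 |
| $t = 2$ | 0.66 (0.072) | 0.63 | 0.64 | 0.59 | 0.28 | 0.56 | 0.15 | 0.57 |
| $t = 3$ | 0.59 (0.063) | 0.59 | 0.58 | 0.51 | 0.27 | 0.53 | 0.10 | 0.42 |
| $t = 4$ | 0.58 (0.069) | 0.57 | 0.58 | 0.53 | 0.27 | 0.53 | 0.10 | 0.44 |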

4 | DISCUSSION

We have proposed a robust CMF method to optimally combine multiple surrogate markers and evaluate their surrogacy based on the PTE of the composite score. Our CMF method offers several unique contributions compared to currently available methods. First, the approach is flexible in that it does not rely solely on the correct specification of a parametric model, but rather combines a parametric component and a nonparametric component in an optimal way. Second, it identifies and evaluates the optimal combination of the surrogates, rather than an arbitrary or model-based combination. Third, the determination of the optimal combination and the definition of the PTE here directly reflect a prediction perspective: we aim to find a combination whereby the treatment effect on this combination can predict the treatment effect on the primary outcome. Price et al. (2018) similarly sought to identify the optimal transformation of a surrogate in a prediction-based framework. While their approach is distinct from the one proposed here, the growing use of such a framework highlights the benefits of a prediction perspective, which is directly in line with the ultimate goal of surrogate marker research: to enable one to replace an expensive or long-term outcome with a surrogate in a future trial. This work provides a useful and novel contribution to the surrogate marker evaluation field. Though we build on the optimal transformation framework of Wang et al. (2020), our two-step CMF approach, which combines two model-based estimates of the optimal transformation, is unique to this paper.

Importantly, while we state that $\omega$ is estimated to optimize the PTE of $g_{\mathrm{CMF}}(S)$, we technically require that $-c \le \omega \le c$, where $c$ is a large, finite, positive constant. The choice of $c$ therefore deserves some discussion. Although $\omega$ will generally lie in $[0, 1]$, restricting $0 \le \omega \le 1$ may result in a boundary problem if the true value of $\omega$ equals 0 or 1. When the true value lies on the boundary, the asymptotic properties of the estimator cannot be established in the standard way, and the resampling method used for inference may not be valid. Therefore, we specify $-c \le \omega \le c$, where $c$ is chosen to be relatively large, such as 5 in our numerical studies.
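The boundary issue can be seen in a toy example. The sketch below is illustrative only: `toy_pte` is a made-up concave criterion standing in for the estimated PTE of $g_{\mathrm{CMF}}(S)$ as a function of $\omega$, with its maximizer placed just outside $[0, 1]$, and `argmax_on_interval` is a hypothetical grid-search helper; neither is the paper's estimator. Restricting $\omega$ to $[0, 1]$ pins the estimate to the boundary value 1, whereas the wider interval $[-c, c]$ with $c = 5$ recovers the interior maximizer.

```python
import numpy as np


def toy_pte(omega):
    """Hypothetical concave PTE curve in omega, maximized at omega = 1.2."""
    return 0.6 - 0.05 * (omega - 1.2) ** 2


def argmax_on_interval(criterion, lower, upper, step=0.01):
    """Grid-search maximizer of `criterion` over [lower, upper]."""
    n_points = int(round((upper - lower) / step)) + 1
    grid = np.linspace(lower, upper, n_points)
    return grid[np.argmax(criterion(grid))]


c = 5.0
omega_unit = argmax_on_interval(toy_pte, 0.0, 1.0)  # stuck at the boundary 1.0
omega_wide = argmax_on_interval(toy_pte, -c, c)     # interior value near 1.2
print(omega_unit, omega_wide)
```

A grid is used here only for transparency; any bounded optimizer would show the same behavior. The point is that with the wide interval, a true value of 0 or 1 sits in the interior of the parameter space.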

An important question is what one would actually do with this optimal combination and estimated PTE in practice. We expect that the primary use of these estimates would be to inform a decision about whether this combination is a “good” surrogate. Some previous work has suggested considering a surrogate “good” if the lower bound of the 95% CI of the PTE estimate is above some threshold, such as 0.50 (Lin et al., 1997). Although the focus of this paper is evaluating surrogacy, once there is agreement that the combined surrogate is “good,” there would likely be interest in using it to make inference about the treatment effect on the primary outcome in future studies in which the primary outcome is not available. For example, one may use $\Delta_g$ in a future study to test for a treatment effect on the primary outcome, thereby avoiding measurement of $Y$ in that study. Furthermore, the estimated PTE can also inform power calculations for a future surrogate-based study designed to estimate and test $\Delta_g$.
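As an illustration of the lower-bound rule, the sketch below applies the 0.50 threshold to the $\hat g_{\mathrm{CMF}}(S)$ estimates and standard errors reported in Table 3. It assumes a normal-approximation interval, estimate $\pm 1.96 \times$ SE. This is an assumption: the intervals in the text come from resampling, although for $t = 1$ and $t = 4$ this approximation reproduces the reported intervals to two decimals. The helper name `lower_bound_rule` is hypothetical.

```python
from statistics import NormalDist

# PTE estimate and standard error for g_CMF(S) from Table 3, keyed by year t.
TABLE3_CMF = {1: (0.67, 0.080), 2: (0.66, 0.072), 3: (0.59, 0.063), 4: (0.58, 0.069)}


def lower_bound_rule(estimate, se, threshold=0.50, level=0.95):
    """Return (lower CI bound, whether it exceeds `threshold`).

    Uses a normal-approximation interval: estimate - z * se.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    lower = estimate - z * se
    return lower, lower > threshold


for t, (est, se) in TABLE3_CMF.items():
    lower, above = lower_bound_rule(est, se)
    print(f"t={t}: lower bound {lower:.3f}, above 0.50: {above}")
```

With these inputs the approximate lower bounds are roughly 0.51, 0.52, 0.47, and 0.44 for $t = 1, \dots, 4$, so under this approximation the rule's verdict depends on the horizon $t$.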

Another question that may arise is how the proposed approach can be used to determine whether the optimal combination of surrogates offers a significant improvement, with respect to PTE, over a single surrogate, or over the combination of a subset of surrogates. To determine the incremental value of a set of surrogates compared to one or a smaller subgroup of surrogates, one could use our estimation approach to obtain point estimates for the difference in the PTE and additionally use a similar resampling procedure to obtain corresponding variance estimates, allowing for the construction of the CI for the difference in PTE.
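A minimal sketch of that comparison, assuming one already has point estimates of the two PTEs and, for each resampling replicate (for example, a perturbation or bootstrap replicate), both PTEs recomputed on the same replicate. Pairing the replicates keeps the correlation between the two estimates in the variance of their difference. The function name `pte_difference_ci` and the normal-approximation interval are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from statistics import NormalDist


def pte_difference_ci(pte_full, pte_subset, reps_full, reps_subset, level=0.95):
    """Point estimate, SE, and normal-approximation CI for PTE_full - PTE_subset.

    reps_full[b] and reps_subset[b] must come from the same resampling
    replicate b, so the SE reflects the correlation between the two PTEs.
    """
    reps_full = np.asarray(reps_full, dtype=float)
    reps_subset = np.asarray(reps_subset, dtype=float)
    diff = pte_full - pte_subset
    se = np.std(reps_full - reps_subset, ddof=1)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, se, (diff - z * se, diff + z * se)


# Made-up numbers purely to show the call; not results from the DPP analysis.
print(pte_difference_ci(0.67, 0.53, [0.60, 0.70, 0.80], [0.50, 0.55, 0.60]))
```

In practice the number of replicates would be far larger than three; the tiny arrays only show the expected shapes.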

The methods proposed here do not use any baseline covariate information; however, one could use such information to improve efficiency in the estimation of $\Delta_\omega$ and $\Delta$ via augmentation, similar to Tian et al. (2012), Garcia et al. (2011), Zhang et al. (2008), and Parast et al. (2017). Given treatment randomization, the augmented versions of these estimators would converge to the same limits as the nonaugmented estimators, and the augmentation component can be selected to minimize the variances of the augmented estimators. Increased efficiency would be expected as long as the covariates are associated with the primary outcome and the surrogate markers. In addition, if the study is not randomized, baseline covariates can and should be used to correct for confounding bias, for example, via the method recently proposed by Han et al. (2021).
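To make the augmentation idea concrete, the sketch below uses one simple instance of it, an ANCOVA-type adjustment of a difference in means, $\hat\Delta_{\mathrm{aug}} = (\bar Y_1 - \bar Y_0) - \hat\beta^{\top}(\bar X_1 - \bar X_0)$, where $\hat\beta$ is the pooled within-arm least-squares slope of the outcome on baseline covariates $X$. This is a hypothetical illustration of the general principle, not the estimators of the cited papers and not an augmented $\Delta_\omega$. Under randomization $\bar X_1 - \bar X_0$ has mean zero, so the adjustment leaves the target unchanged while removing covariate-explained noise; the simulation settings are arbitrary.

```python
import numpy as np


def diff_in_means(y, a):
    """Unadjusted treatment effect estimate: mean(y | a=1) - mean(y | a=0)."""
    return y[a == 1].mean() - y[a == 0].mean()


def augmented_diff_in_means(y, a, x):
    """ANCOVA-type augmented estimate adjusting for baseline covariates x.

    Fits y ~ 1 + a + x by least squares; the coefficient on `a` equals
    (ybar1 - ybar0) - beta' (xbar1 - xbar0), with beta the pooled within-arm slope.
    """
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(y)), a, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def simulate_once(rng, n=200, effect=0.5):
    """One hypothetical 1:1 randomized trial where a covariate predicts y."""
    a = rng.permutation(np.repeat([0, 1], n // 2))
    x = rng.normal(size=n)
    y = effect * a + 1.5 * x + rng.normal(size=n)
    return y, a, x


rng = np.random.default_rng(1)
draws = np.array([
    (diff_in_means(y, a), augmented_diff_in_means(y, a, x))
    for y, a, x in (simulate_once(rng) for _ in range(500))
])
print("means:", draws.mean(axis=0), "variances:", draws.var(axis=0))
```

Both estimators center on the same effect, while the augmented one has noticeably smaller Monte Carlo variance because the covariate is strongly associated with the outcome.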

We have focused on a single study setting, that is, where we only have individual-level data from a single study that can be used to evaluate the surrogates. In settings where there are multiple studies available for surrogate evaluation, a meta-analytic framework should be considered (Joffe and Greene, 2009; Buyse et al., 2000; Burzykowski et al., 2005).

Our proposed approach has some limitations. First, the resulting combination of surrogate markers, $\hat g_{\mathrm{CMF}}(S)$, may be complex, making it harder for patients and clinicians to interpret than, for example, a single surrogate marker. The trade-off between interpretability and complexity should be carefully considered in each application. Any method that combines multiple surrogates robustly can be expected to involve more complexity than the single-marker setting. In practice, one could explore interpretability by viewing $\hat g_{\mathrm{CMF}}(s)$ as a function of $s$ and examining how its value varies with each individual surrogate marker while the others are fixed at constant levels of interest. In addition, one could examine the incremental value of the combined surrogates over a single surrogate or a subset of surrogates with respect to the PTE, as described above. One may also quantify the importance of an individual marker using feature importance measures such as those used in random forests; similar techniques have been developed in machine learning research to characterize a complex functional output from an algorithm (Altmann et al., 2010; Strobl et al., 2008). Second, our estimation approach requires both a relatively large sample size, given the nonparametric components, and the selection of multiple tuning parameters within the kernel smoothing and basis expansion estimation. For a given dataset without prior knowledge, it is generally difficult to determine which choice of basis functions for $g_{\mathrm{bas}}(S)$ yields the best approximation to $g_{\mathrm{opt}}(S)$. Ideally, one should choose basis functions according to prior knowledge, and can rely on commonly used basis functions such as B-splines and natural splines when no prior knowledge is available to guide the choice. Third, we focus exclusively on the PTE as our measure of surrogacy.
Certainly, PTE is not the only measure of surrogacy that may be of interest, and limitations of this quantity have been discussed previously (Lin et al., 1997; VanderWeele, 2013). While there is no agreement in the literature on which quantity is the “best” measure of surrogacy in a single-trial setting, we focus on PTE because it is widely used in practice, is appealing to clinicians and applied researchers, and is considered easy to interpret, which we believe will increase the likelihood that our method will be used and contribute to future robust evaluation of surrogate markers (Agyemang et al., 2018; Inker et al., 2016; Sprenger et al., 2020). Finally, our procedure follows the framework of Wang et al. (2020), with

$$g_{\mathrm{opt}} = \operatorname*{argmin}_{g:\,E\{Y^{(0)} - g(S^{(0)})\} = 0} E\left[\{Y^{(1)} - g(S^{(1)})\}^2 + \{Y^{(0)} - g(S^{(0)})\}^2\right]$$

derived to approximate

$$g_{\mathrm{oracle}} = \operatorname*{argmin}_{g:\,E\{Y^{(0)} - g(S^{(0)})\} = 0} E\left(\left[\{Y^{(1)} - g(S^{(1)})\} - \{Y^{(0)} - g(S^{(0)})\}\right]^2\right).$$

Clearly, $g_{\mathrm{opt}} = g_{\mathrm{oracle}}$ if $(Y^{(1)}, S^{(1)}) \perp (Y^{(0)}, S^{(0)})$. Simulation studies in Wang et al. (2020) suggest that $g_{\mathrm{opt}}$ approximates $g_{\mathrm{oracle}}$ reasonably well when the correlation between $(Y^{(1)}, S^{(1)})$ and $(Y^{(0)}, S^{(0)})$ is not too strong. Nevertheless, both the surrogate index $g_{\mathrm{opt}}(S)$ and $g_{\mathrm{CMF}}(S)$ are valid surrogate markers under relatively mild assumptions that can be verified using observed data.
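The statement that the two targets coincide under independence follows from expanding the oracle criterion. Writing $Y^{(z)}$ and $S^{(z)}$ for the primary outcome and surrogates under treatment $z \in \{0, 1\}$, and setting $a_g = Y^{(1)} - g(S^{(1)})$ and $b_g = Y^{(0)} - g(S^{(0)})$, for any $g$ satisfying the constraint $E(b_g) = 0$:

```latex
\begin{aligned}
E\{(a_g - b_g)^2\} &= E(a_g^2) + E(b_g^2) - 2E(a_g b_g) \\
&= E(a_g^2) + E(b_g^2) - 2E(a_g)\,E(b_g)
   && \text{if } (Y^{(1)}, S^{(1)}) \perp (Y^{(0)}, S^{(0)}) \\
&= E(a_g^2) + E(b_g^2)
   && \text{since } E(b_g) = 0 .
\end{aligned}
```

Under independence, the oracle criterion therefore equals the criterion defining $g_{\mathrm{opt}}$ for every feasible $g$, so the two share the same minimizer. Without independence, the cross term $-2E(a_g b_g)$ generally depends on $g$, and the two minimizers can differ.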

Supplementary Material

appendix

Funding information

National Institute of Diabetes and Digestive and Kidney Diseases, Grant/Award Number: R01 DK118354; The University of Texas at Austin

Footnotes

OPEN RESEARCH BADGES

This article has earned an Open Materials badge for making publicly available the components of the research methodology needed to reproduce the reported procedure and analysis. All materials are available at xx.

SUPPORTING INFORMATION

Web Appendices A, B, C, and D, referenced in Sections 1, 2, and 3, are available with this paper at the Biometrics website on Wiley Online Library. An R package implementing our proposed approach, named CMFsurrogate, is available at https://github.com/laylaparast/CMFsurrogate, including code and example data. This package is also available at the Biometrics website on Wiley Online Library.

DATA AVAILABILITY STATEMENT

The data that support the findings in this paper in Section 3.2 are available from the corresponding author upon reasonable request.

REFERENCES

  1. Agyemang E, Magaret AS, Selke S, Johnston C, Corey L and Wald A (2018) Herpes simplex virus shedding rate: surrogate outcome for genital herpes recurrence frequency and lesion rates, and phase 2 clinical trials end point for evaluating efficacy of antivirals. The Journal of Infectious Diseases, 218, 1691–1699.
  2. Altmann A, Toloşi L, Sander O and Lengauer T (2010) Permutation importance: a corrected feature importance measure. Bioinformatics, 26, 1340–1347.
  3. Andersen PK and Pohar Perme M (2010) Pseudo-observations in survival analysis. Statistical Methods in Medical Research, 19, 71–99.
  4. Athey S, Chetty R, Imbens GW and Kang H (2019) The surrogate index: combining short-term proxies to estimate long-term treatment effects more rapidly and precisely. Technical report, National Bureau of Economic Research.
  5. Brookmeyer R, Gail MH, et al. (1994) AIDS Epidemiology: A Quantitative Approach. Monographs in Epidemiology and Biostatistics, volume 22. Oxford, UK: Oxford University Press.
  6. Burzykowski T, Molenberghs G and Buyse M (2005) The Evaluation of Surrogate Endpoints. Berlin: Springer.
  7. Buyse M, Molenberghs G, Burzykowski T, Renard D and Geys H (2000) The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics, 1, 49–67.
  8. Chen J, Li W, Huang X, Guo C, Zou R, Yang Q, Zhang H, Zhang T, Chen H and Wu H (2013) Evaluating total lymphocyte count as a surrogate marker for CD4 cell count in the management of HIV-infected patients in resource-limited settings: a study from China. PLoS ONE, 8, e69704.
  9. Diabetes Prevention Program Group (1999) The Diabetes Prevention Program: design and methods for a clinical trial in the prevention of Type 2 diabetes. Diabetes Care, 22, 623–634.
  10. Diabetes Prevention Program Group (2002) Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. New England Journal of Medicine, 346, 393–403.
  11. Doyen J, Alix-Panabières C, Hofman P, Parks SK, Chamorey E, Naman H and Hannoun-Lévi J-M (2012) Circulating tumor cells in prostate cancer: a potential surrogate marker of survival. Critical Reviews in Oncology/Hematology, 81, 241–256.
  12. Freedman LS, Graubard BI and Schatzkin A (1992) Statistical validation of intermediate endpoints for chronic diseases. Statistics in Medicine, 11, 167–178.
  13. Garcia TP, Ma Y and Yin G (2011) Efficiency improvement in a class of survival models through model-free covariate incorporation. Lifetime Data Analysis, 17, 552–565.
  14. Han L, Wang X and Cai T (2021) On the evaluation of surrogate markers in real world data settings. arXiv preprint arXiv:2104.05513.
  15. Huang Y and Gilbert PB (2011) Comparing biomarkers as principal surrogate endpoints. Biometrics, 67, 1442–1451.
  16. Inker LA, Mondal H, Greene T, Masaschi T, Locatelli F, Schena FP, Katafuchi R, Appel GB, Maes BD, Li PK, et al. (2016) Early change in urine protein as a surrogate end point in studies of IgA nephropathy: an individual-patient meta-analysis. American Journal of Kidney Diseases, 68, 392–401.
  17. Joffe MM and Greene T (2009) Related causal frameworks for surrogate outcomes. Biometrics, 65, 530–538.
  18. Klein JP and Andersen PK (2005) Regression modeling of competing risks data based on pseudovalues of the cumulative incidence function. Biometrics, 61, 223–229.
  19. Lin D, Fleming T, De Gruttola V, et al. (1997) Estimating the proportion of treatment effect explained by a surrogate marker. Statistics in Medicine, 16, 1515–1527.
  20. Parast L, Cai T and Tian L (2017) Evaluating surrogate marker information using censored data. Statistics in Medicine, 36, 1767–1782.
  21. Parast L, Cai T and Tian L (2021) Evaluating multiple surrogate markers with censored data. Biometrics, 77, 1315–1327.
  22. Parast L, McDermott MM and Tian L (2016) Robust estimation of the proportion of treatment effect explained by surrogate marker information. Statistics in Medicine, 35, 1637–1653.
  23. Prentice RL (1989) Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine, 8, 431–440.
  24. Price BL, Gilbert PB and van der Laan MJ (2018) Estimation of the optimal surrogate based on a randomized trial. Biometrics, 74, 1271–1281.
  25. Scott D (1992) Multivariate Density Estimation: Theory, Practice, and Visualization. New York: Wiley.
  26. Sprenger T, Kappos L, Radue E-W, Gaetano L, Mueller-Lenke N, Wuerfel J, Poole EM and Cavalier S (2020) Association of brain volume loss and long-term disability outcomes in patients with multiple sclerosis treated with teriflunomide. Multiple Sclerosis Journal, 26, 1207–1216.
  27. Strobl C, Boulesteix A-L, Kneib T, Augustin T and Zeileis A (2008) Conditional variable importance for random forests. BMC Bioinformatics, 9, 307.
  28. Tian L, Cai T, Zhao L and Wei L-J (2012) On the covariate-adjusted estimation for an overall treatment difference with data from a randomized comparative clinical trial. Biostatistics, 13, 256–273.
  29. Van der Elst W, Alonso AA, Geys H, Meyvisch P, Bijnens L, Sengupta R and Molenberghs G (2019) Univariate versus multivariate surrogates in the single-trial setting. Statistics in Biopharmaceutical Research, 11, 301–310.
  30. VanderWeele TJ (2013) Surrogate measures and consistent surrogates. Biometrics, 69, 561–565.
  31. Wang X, Parast L, Tian L and Cai T (2020) Model-free approach to quantifying the proportion of treatment effect explained by a surrogate marker. Biometrika, 107, 107–122.
  32. Wang Y and Taylor JM (2002) A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics, 58, 803–812.
  33. Wondimeneh Y, Ferede G, Yismaw G and Muluye D (2012) Total lymphocyte count as surrogate marker for CD4 cell count in HIV-infected individuals in Gondar University Hospital, Northwest Ethiopia. AIDS Research and Therapy, 9, 1–4.
  34. Xu J and Zeger SL (2001) The evaluation of multiple surrogate endpoints. Biometrics, 57, 81–87.
  35. Zhang M, Tsiatis AA and Davidian M (2008) Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics, 64, 707–715.
