Abstract
Identifying effective and valid surrogate markers to make inference about a treatment effect on long-term outcomes is an important step in improving the efficiency of clinical trials. Replacing a long-term outcome with short-term and/or cheaper surrogate markers can potentially shorten study duration and reduce trial costs. There is a sizable statistical literature on methods to quantify the effectiveness of a single surrogate marker. Both parametric and nonparametric approaches have been well developed for different outcome types. However, when there are multiple markers available, methods for combining markers to construct a composite marker with improved surrogacy remain limited. In this paper, building on the optimal transformation framework of Wang et al. (2020), we propose a novel calibrated model fusion approach to optimally combine multiple markers to improve surrogacy. Specifically, we obtain two initial estimates of optimal composite scores of the markers based on two sets of models, with one set approximating the underlying data distribution and the other directly approximating the optimal transformation function. We then estimate an optimal calibrated combination of the two estimated scores, which ensures both validity of the final combined score and optimality with respect to the proportion of treatment effect explained by the final combined score. This approach is unique in that it identifies an optimal combination of the multiple surrogates without strictly relying on parametric assumptions, while borrowing modeling strategies to avoid fully nonparametric estimation, which is subject to the curse of dimensionality. Our identified optimal transformation can also be used to directly quantify the surrogacy of this identified combined score. Theoretical properties of the proposed estimators are derived, and the finite sample performance of the proposed method is evaluated through simulation studies. We further illustrate the proposed method using data from the Diabetes Prevention Program study.
Keywords: multiple surrogate markers, nonparametric estimation, proportion of treatment effect explained
1. INTRODUCTION
The primary outcomes of randomized clinical trials often require long-term follow-up and/or involve expensive or invasive measurement procedures. Leveraging short-term or less costly surrogate markers to draw valid inference for the treatment effect on the long-term outcome can potentially reduce trial duration, improve cost-effectiveness, and reduce enrollment requirements. For example, in HIV research, biomarkers such as CD4 cell counts and viral load have been used as surrogate outcomes for long-term primary outcomes, such as time to mortality or diagnosis of AIDS (Brookmeyer et al., 1994). Furthermore, in resource-limited settings, total lymphocyte count has been studied as a surrogate marker for CD4 cell count (Wondimeneh et al., 2012; Chen et al., 2013).
There is a sizable statistical literature on methods for quantifying the effectiveness of a single surrogate marker in predicting a treatment effect on a long-term outcome. Following the criteria initially proposed by Prentice (1989) for a valid surrogate marker, both model-based (Freedman et al., 1992; Wang and Taylor, 2002) and nonparametric (Parast et al., 2016; Wang et al., 2020) statistical methods for evaluating the validity of surrogate markers have been proposed. For example, Freedman et al. (1992) proposed a measure of the proportion of the treatment effect on the primary outcome that is explained by the treatment effect on the surrogate (PTE) by examining the change in treatment effect size when the surrogate is added to a specified regression model. Wang and Taylor (2002) proposed to quantify the PTE by examining what the treatment effect would be if the surrogate marker in the treatment group had the same distribution as the surrogate in the control group and obtained a model-based estimate for this PTE measure. However, these model-based approaches may yield an invalid PTE estimate under model misspecification (Lin et al., 1997). To overcome such limitations, Parast et al. (2016) proposed a fully nonparametric model-free estimation procedure for the PTE measure defined by Wang and Taylor (2002). Recently, Wang et al. (2020) proposed an alternative model-free approach to quantifying PTE by identifying an optimal transformation of the surrogate marker that best predicts treatment effect and then quantifying the PTE based on the treatment effect on the transformed marker.
The aforementioned methods are generally limited to a single surrogate setting. When there are multiple surrogate markers, denoted by , it would be valuable to derive a composite marker, , which ideally would have higher surrogacy compared to any alone. For example, in prostate cancer, there are multiple promising surrogate markers including prostate-specific antigen, Gleason score, and circulating tumor cells that have been examined for managing progression and treatment response (Doyen et al., 2012). In diabetes prevention studies, one of which we examine in this paper, there are also often multiple potential surrogate markers including fasting plasma glucose, hemoglobin A1c (HbA1c), and early incidence of diabetes. The ability to examine the surrogacy of glucose, HbA1c, and early incidence together as well as compared to the surrogacy of each of these alone can provide valuable information for the design and conduct of future studies that may use these potential surrogates. Existing methods that can be used for assessing the surrogacy of multiple markers are largely model-based (Freedman et al., 1992; Huang and Gilbert, 2011; Xu and Zeger, 2001; Parast et al., 2016; Van der Elst et al., 2019; Price et al., 2018, for instance). For example, Xu and Zeger (2001) proposed a latent variable model on the joint distribution of the outcome and markers given treatment to determine whether combining multiple surrogates can improve surrogacy. Similar to the single marker scenario, these model-based methods can lead to invalid inference under model misspecification (Parast et al., 2021).
More robust approaches to optimally combine have been proposed recently. For example, Price et al. (2018) proposed a novel flexible prediction framework to derive an optimal combination of multiple markers within each treatment group which can subsequently be used to predict the treatment effect on . However, their approach does not offer a measure of surrogate strength that can be used to make decisions about the utility of the surrogates in a future study and cannot be used to assess the surrogacy of the markers since the combined markers are assumed to attain perfect surrogacy (see Web Appendix A). Athey et al. (2019) proposed a model-based approach in which they estimate the conditional expectation of the primary outcome given multiple surrogates and the conditional probability of having received treatment given the multiple surrogates and combine these models to estimate a treatment effect based on the surrogates. Though they demonstrate that the use of the surrogates via their proposed method can be more efficient than using the primary outcome itself (under certain assumptions), their proposed surrogate index based on is not guaranteed to be a valid surrogate marker or possess any optimal property in predicting the treatment effect on , especially under possible model misspecifications. Parast et al. (2021) proposed a robust approach to quantify the PTE of multiple surrogates in a time-to-event outcome setting whereby a parametric working model is used to define and then a nonparametric method is used to estimate the PTE of . While useful, in obtaining g, Parast et al. (2021) do not target any optimization function nor do they claim that is optimal in any way; therefore, it is always possible that there is an alternative function that captures more surrogacy of multiple markers. In summary, there does not appear to be any available method to identify and evaluate the surrogacy of an optimal combination of . 
In this paper, we fill the gap by proposing a novel calibrated model fusion (CMF) approach to optimally combine multiple markers.
The two-step CMF approach leverages flexible modeling strategies to construct two composite scores of and overcomes model misspecification via an additional calibration step. Specifically, we first obtain two initial estimates of optimal composite scores, and , based on two sets of working models with one targeting the underlying data distribution and the other targeting an optimal transformation function. We then estimate an optimal calibrated fusion of the two scores by maximizing the PTE. This flexible modeling approach allows us to approximate the optimal transformation of without relying on fully nonparametric estimation which is subject to the curse of dimensionality. We demonstrate that the final CMF composite score, , is valid with PTE between 0 and 1 under mild regularity conditions and is optimal in a certain sense with respect to PTE. This approach is unique in that it identifies an optimal combination of without strictly relying on parametric assumptions, and the treatment effect on can be directly used to make inference regarding the PTE and approximate the treatment effect on in future clinical trials if PTE is high.
The remainder of the paper is organized as follows. In Section 2, we detail the CMF approach to combining the surrogate markers and propose a PTE definition based on the final composite marker. In Section 3.1, simulation studies are conducted to evaluate the finite sample performance of the proposed method. We also illustrate the proposed method using data from the Diabetes Prevention Program (DPP) study. Concluding remarks are made in Section 4. Proofs of asymptotic results are provided in Web Appendix B.
2. METHODOLOGY
2.1. Notation and approach
Let be the primary outcome of interest and be the vector of surrogate markers. The outcome may be either continuous or discrete. Let be the treatment indicator, with A = 1 denoting the treatment and A = 0 denoting control. We assume that patients are randomly assigned to the treatment and control groups, and without loss of generality we assume that . Denote as the potential outcome and potential surrogate vector, respectively, . The observed data consist of independent and identically distributed random vectors , where and , such that for and for .
Our goals are (i) to identify an optimal such that
| (1) | 
can optimally predict the treatment effect on , and (ii) estimate the PTE of where , that is, the ratio of the treatment effect on the transformed surrogate and the treatment effect on . Building on top of the optimal transformation framework proposed in Wang et al. (2020), we propose a CMF approach to deriving such an optimal . We first describe the Wang et al. (2020) approach before detailing key steps of the CMF procedure.
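As a rough illustration of goal (ii), the ratio form of the PTE can be sketched as an empirical plug-in under randomization. This is a minimal sketch and not the estimator developed in the following sections: `g_values` is assumed to hold precomputed values of some candidate transformation of the surrogates on the scale of the outcome, and the function names and toy data are hypothetical.

```python
from statistics import fmean


def treatment_effect(values, arm):
    """Difference in sample means between treated (arm == 1) and control (arm == 0)."""
    treated = [v for v, a in zip(values, arm) if a == 1]
    control = [v for v, a in zip(values, arm) if a == 0]
    return fmean(treated) - fmean(control)


def pte_ratio(g_values, y, arm):
    """Plug-in ratio: effect on the transformed surrogate over effect on the outcome."""
    delta = treatment_effect(y, arm)
    if delta == 0:
        raise ValueError("treatment effect on the primary outcome is zero")
    return treatment_effect(g_values, arm) / delta


# Toy data, made up for illustration only.
arm = [0, 0, 1, 1]
y = [1.0, 2.0, 3.0, 4.0]
g_values = [1.0, 1.5, 2.0, 2.5]
toy_pte = pte_ratio(g_values, y, arm)
print(toy_pte)
```

For an arbitrary transformation this ratio need not lie between 0 and 1; the conditions and calibration discussed in the text are what make it interpretable as a proportion.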
For any given set of surrogate markers , Wang et al. (2020) proposed to quantify the surrogacy of by first identifying an optimal transformation such that
| (2) | 
The constraint of is imposed since a constant shift in would yield the same . This constraint is a natural choice and would allow us to incorporate the constraint more easily in the minimization. While we do not require , when this does hold, it is interesting to note that as we have defined it also optimizes the individual level surrogacy measure . It was shown in Wang et al. (2020) that takes the functional form
| (3) | 
where are the respective density and cumulative distribution functions of ,
| (4) | 
and . Furthermore, the quantity was shown to be between 0 and 1 under mild regularity conditions.
(c1) for all ;
(c2) for all u in the common support of and , where and , for . This ensures that is a valid surrogate marker and enables us to quantify the surrogacy of directly based on the treatment effect on .
When is univariate, one may make inference about and nonparametrically. Wang et al. (2020) handle a univariate setting, but cannot accommodate multiple . Fully nonparametric inference is not feasible when is multidimensional due to the curse of dimensionality. However, can be approximated by either imposing distributional assumptions on the data or by restricting in (2) to a smaller functional space. However, each of these options alone may not perform well under model misspecification, which we will demonstrate in our simulation study in Section 3.1. To overcome this, we propose a two-step robust CMF approach where in step (I), we derive two model-based estimates of , and , where is obtained by imposing parametric assumptions on and ; and is obtained by directly minimizing (2) restricting to where is a prespecified -dimensional basis expansion of and is an unknown -dimensional parameter. Since is a complex functional of , and/or may fail to accurately approximate . In step II, we overcome potential model misspecifications by obtaining a final calibrated optimal composite marker as
| (5) | 
where and are estimated to optimize the PTE of . Notably, the “optimality” as we refer to it here essentially has two layers; the first layer is to approximate the optimal transformation function, , defined in Wang et al. (2020) via two methods, a parametric method and a spline basis approximation method, resulting in and , respectively. The second layer of “optimality” is with respect to finding the optimal linear combination of and in step II that maximizes the PTE, which is the primary measure of interest. Intuitively, this model fusion approach to combining surrogate markers is similar to the model averaging approaches in the literature in that we use several potential models to approximate and subsequently average them to maximize PTE within a smaller class of transformations of .
We next detail estimation and inference procedures for each step of the CMF approach.
2.2. Step (I): Model fusion
We first describe the estimation of and under two sets of models. First, we impose generalized linear working models for and :
| (6) | 
where is a basis expansion of that includes an intercept, expit , and is a known link function such as for continuous outcomes and for binary outcomes. Then and could be estimated by standard maximum likelihood or M-estimation methods with the estimators denoted as and , respectively, . Subsequently, we construct
| (7) | 
where , and
| (8) | 
Standard asymptotic theory can be used to demonstrate that under mild regularity conditions, converges in probability to a deterministic function, denoted by , and which converges in distribution to a mean zero Gaussian process in , where is an influence function.
We next derive an alternative approximation to by minimizing (2) with the class of that takes the form . Specifically, we obtain and as the minimizer of
| (9) | 
Direct calculations show that the minimizer of (9) has the form and
| (10) | 
where , for any vector and . Similar to the parametric distributional modeling, if is correctly specified for , we expect to correctly recover . A simple plug in estimator for and can be constructed as
| (11) | 
where and . Correspondingly, the estimator of can be obtained as
| (12) | 
Here and in the next step, for simplicity, we remove intercepts from and since the intercept term does not contribute to the PTE or treatment effect estimation. Using standard asymptotic theory, we show in Web Appendix B that under mild regularity conditions, converges in probability to regardless of whether takes such a functional form, and converges in distribution to a mean zero Gaussian process in , where is the influence function for .
2.3. Step II: Calibrated optimal combination of
To guard against model misspecification, we create a calibrated optimal combination of and to form the final CMF composite score,
| (13) | 
where and the calibration function are chosen to optimize the PTE of . When is given, the optimal calibration function can be obtained as
| (14) | 
where
| (15) | 
, and . The calibration function enables us to directly estimate the PTE of based on treatment effects on for any ω:
| (16) | 
and . Then, we identify the optimal weight , denoted as , as the maximizer of . The final CMF composite marker is then
| (17) | 
By the construction of , the PTE of is at least as high as the PTE of or , respectively, denoted by and . Following similar arguments as given in Wang et al. (2020), it can be shown that , if
(C1) for all ;
(C2) for all in the common support of and , where and , for . These two conditions are similar to but slightly weaker than those required in Parast et al. (2016) and Wang and Taylor (2002) and ensure that we are not in a surrogate paradox situation (VanderWeele, 2013).
To estimate , we nonparametrically estimate for any given as as in Wang et al. (2020) with being the surrogate marker, where ,
| (18) | 
, and is a symmetric kernel function with bandwidth . We then estimate as , where , and . The PTE of , denoted as , can be estimated as , also denoted as .
2.4. Use with censored outcomes
Our proposed modeling framework can also handle binary outcomes defined by censored event times. Specifically, when is defined as -year event status for an event time is not directly observable since is typically only observed up to , where , , and is the censoring time assumed to be independent of and given . For such a -year event status outcome, one may choose as a cumulative distribution function such as and overcome the censoring via inverse probability weighting (IPW) with weights as in Parast et al. (2021), where is the Kaplan-Meier estimate of based on data from patients with . Then, IPW can be used to weight the likelihood for estimating , and to obtain with weight for the th observation in constructing and . In the final estimation of and , similar IPW strategies may be adopted to account for censoring. Alternatively, one could consider an approach using pseudo-observations, particularly if there may be interest in defining the treatment effect in terms of the difference in residual mean survival time (Klein and Andersen, 2005; Andersen and Pohar Perme, 2010).
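The IPW idea for a t-year event status can be sketched as follows. This is a minimal sketch under stated assumptions, not the weighting used in the paper: it assumes the commonly used form in which an observed event by time t is weighted by the inverse of the estimated censoring survival at the event time, a subject still under observation after t is weighted by the inverse of the censoring survival at t, and a subject censored before t receives weight zero. It also assumes a single treatment arm and no ties between event and censoring times. All function names and the toy data are hypothetical.

```python
def km_censoring_survival(times, events):
    """Kaplan-Meier estimate of G(u) = P(C > u), treating censoring (event == 0) as the event.

    Assumes no ties between event times and censoring times.
    """
    censor_times = sorted({x for x, d in zip(times, events) if d == 0})
    steps = []
    surv = 1.0
    for c in censor_times:
        at_risk = sum(1 for x in times if x >= c)
        n_censored = sum(1 for x, d in zip(times, events) if x == c and d == 0)
        surv *= 1.0 - n_censored / at_risk
        steps.append((c, surv))

    def survival(u):
        # Right-continuous step function: value after the last censoring time <= u.
        value = 1.0
        for c, s in steps:
            if c > u:
                break
            value = s
        return value

    return survival


def ipw_weights(times, events, t):
    """Assumed IPW weights for the t-year event status (see the lead-in for the form)."""
    g = km_censoring_survival(times, events)
    weights = []
    for x, d in zip(times, events):
        if x > t:
            weights.append(1.0 / g(t))   # status at t known: event-free
        elif d == 1:
            weights.append(1.0 / g(x))   # status at t known: event occurred
        else:
            weights.append(0.0)          # censored before t: status unknown
    return weights


# Toy data, made up for illustration only (one arm).
times = [1.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 0, 1, 1, 1]
weights = ipw_weights(times, events, t=3.5)
status = [1.0 if x <= 3.5 and d == 1 else 0.0 for x, d in zip(times, events)]
weighted_risk = sum(w * s for w, s in zip(weights, status)) / sum(weights)
print(weights, weighted_risk)
```

In a two-arm analysis the censoring distribution would be estimated separately within each arm, as the text indicates, and the resulting weights would enter the likelihoods and kernel estimators.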
2.5. Inference
In Web Appendix B, we outline the justifications for the convergence of , , and to their respective population values as well as the asymptotic normality of . We find that the variation of does not contribute to the asymptotic variance of as it maximizes . In practice, the asymptotic variance of can be estimated using resampling similar to those employed by Parast et al. (2016) and Wang et al. (2020); details are provided in Web Appendix C.
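One generic way to carry out resampling-based variance estimation is perturbation resampling, sketched below. This is an illustration under assumptions rather than the procedure of Web Appendix C: the perturbation weights are taken to be independent unit-mean exponential draws, and the statistic being perturbed is a simple weighted plug-in ratio of treatment effects instead of the full CMF estimator, which would need to be re-fitted (working models, calibration, and weight selection) within each resample. All names and data are hypothetical.

```python
import random
from statistics import stdev


def weighted_effect(values, arm, weights):
    """Weighted difference in means between treated (arm == 1) and control (arm == 0)."""
    def wmean(group):
        num = sum(w * v for v, a, w in zip(values, arm, weights) if a == group)
        den = sum(w for a, w in zip(arm, weights) if a == group)
        return num / den
    return wmean(1) - wmean(0)


def weighted_pte(g_values, y, arm, weights):
    """Weighted plug-in ratio of the effect on the score to the effect on the outcome."""
    return weighted_effect(g_values, arm, weights) / weighted_effect(y, arm, weights)


def perturbation_se(g_values, y, arm, n_resamples=500, seed=0):
    """Standard deviation of the statistic over random exponential(1) perturbations."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_resamples):
        weights = [rng.expovariate(1.0) for _ in arm]
        replicates.append(weighted_pte(g_values, y, arm, weights))
    return stdev(replicates)


# Toy data, made up for illustration only: 20 controls followed by 20 treated.
arm = [0] * 20 + [1] * 20
y = [0.1 * i for i in range(20)] + [2.0 + 0.1 * i for i in range(20)]
g_values = [0.5 * v + 0.1 * ((i % 3) - 1) for i, v in enumerate(y)]
se_toy = perturbation_se(g_values, y, arm)
print(se_toy)
```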
3. NUMERICAL STUDIES
3.1. Simulations
Simulation studies were conducted to evaluate the finite sample performance of the proposed CMF method and to compare it with existing methods. We focus primarily on how different strategies for combining markers perform with respect to the surrogacy attained by the resulting composite score. In addition to CMF, we include the individual markers , or single scores, and composite scores , , with obtained by fitting a generalized linear model of using the control group data, where the link function is set as the identity for the continuous outcome and anti-logit for a binary outcome. The score is analogous to the multiple surrogate markers approach proposed in Parast et al. (2021), which was originally designed for time-to-event outcomes.
To objectively compare the performance of these scores, we evaluate the PTE of the scores nonparametrically based on the Wang et al. (2020) definition, which has also been shown to be equivalent to the model-free PTE measure studied in Parast et al. (2016) with a specific choice of reference distribution. We also compare the PTE estimate to the oracle PTE, that is, the PTE of the oracle score in (3), which is not attainable under model misspecification. Under potential model misspecification, we would expect that all methods would likely fail to capture the complex relationship between and and hence produce composite markers with lower surrogacy than . As an additional benchmark, we also obtained the PTE estimator using the method of Freedman et al. (1992), which simply fits two regressions, one with a treatment indicator only and one with the treatment indicator and the surrogates, as this method is most widely used in practice. The corresponding estimator is denoted as . Across all configurations, we let and summarize results based on 500 simulated datasets. Variances were estimated using resampling. We used a natural cubic spline basis with three knots for in estimating . We chose as a Gaussian kernel with bandwidth , where was found using the method of Scott (1992). For , we let and included linear and two-way interactions in for estimating both and .
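The Freedman et al. (1992) benchmark described above, which fits one regression with the treatment indicator only and one with the treatment indicator and the surrogates, can be sketched for a continuous outcome with linear regression. This is a minimal sketch under that assumption; the helper names and the toy data are hypothetical, and the PTE is taken as one minus the ratio of the adjusted to the unadjusted treatment coefficient.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(design, y):
    """Least-squares coefficients via the normal equations."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def freedman_pte(y, arm, surrogates):
    """1 - (treatment coefficient adjusted for surrogates) / (unadjusted coefficient)."""
    unadjusted = ols([[1.0, float(a)] for a in arm], y)[1]
    adjusted = ols([[1.0, float(a)] + list(s) for a, s in zip(arm, surrogates)], y)[1]
    return 1.0 - adjusted / unadjusted


# Toy data, made up for illustration only: the outcome is exactly twice the surrogate,
# so the surrogate accounts for the whole treatment effect.
arm = [0, 0, 0, 1, 1, 1]
surrogates = [[0.0], [1.0], [2.0], [1.0], [2.0], [3.0]]
y = [2.0 * s[0] for s in surrogates]
pte_full = freedman_pte(y, arm, surrogates)
print(pte_full)
```

Because this estimate depends entirely on the two regression models being correct, it can behave poorly when the relationship between the outcome and the surrogates is nonlinear, which is the situation examined in settings (2) and (3).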
We considered three settings. In setting (1), we generated , where denotes the vector of 1 and denotes the identity matrix. We then generated from
| (19) | 
We considered a more nonlinear setting (2) by first generating and then letting . Subsequently, we generated from the following model:
| (20) | 
Here is log-linear in , and has a complex distribution. Finally in setting (3), we generated as but generated from
| (21) | 
where . Under this setting, the interaction term is highly influential and, thus, we would expect that may not perform well. In Web Appendix D, we provide figures illustrating these settings.
In Table 1, we compare the nonparametric estimates of the PTE for the aforementioned composite scores as well as the oracle PTE of , . In the table, is the limiting value of the estimated PTE of . That is, and are true PTE values for and , respectively, while others are estimates of PTE. Across all the settings, the PTE estimates based on the proposed were the highest among all the PTE estimates. In setting (1), where the effects are relatively linear, the composite scores derived from the different approaches attain comparable PTEs and their resulting PTEs are also comparable to . In settings (2) and (3), the distribution of is relatively complex. In addition, the linear regression model for the conditional mean function is misspecified in the link in settings (2) and (3). The proposed PTE attains a little lower value than the oracle in setting (2) but a comparable value in setting (3). For the PTE estimates based on the two heavily model-dependent scores, and , compared with the proposed PTE estimate, they are much lower in setting (2), while the -based PTE estimate is comparable and the -based PTE is still lower in setting (3). The composite score gives more robust performance attaining near-identical PTE as that of in setting (2), but a lower PTE in setting (3).
TABLE 1.
Nonparametric estimates of the PTE (the definition in Wang et al. (2020)) for , , , , , , , and along with their empirical standard errors (ESE) under settings (1), (2), and (3) with . For comparison, we also include the oracle PTE of as well as the model-based PTE estimate based on Freedman et al. (1992) . For the proposed CMF composite score, we also present the average of the estimated standard errors (ASE) along with the empirical coverage probabilities (CP) of the 95% confidence intervals (×100)

Combined markers

| Setting | Oracle PTE | Limiting PTE (CMF) | CMF Est | CMF ESE | CMF ASE | CMF CP | Parametric score Est | Parametric score ESE | Basis score Est | Basis score ESE | GLM score Est | GLM score ESE | Freedman et al. (1992) Est | Freedman et al. (1992) ESE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (1) | 0.85 | 0.86 | 0.85 | 0.019 | 0.018 | 92.7 | 0.85 | 0.019 | 0.84 | 0.019 | 0.83 | 0.020 | 0.50 | 0.027 |
| (2) | 0.94 | 0.85 | 0.84 | 0.023 | 0.022 | 94.2 | 0.54 | 0.046 | 0.83 | 0.022 | 0.54 | 0.041 | 0.21 | 0.035 |
| (3) | 0.68 | 0.67 | 0.68 | 0.036 | 0.037 | 94.0 | 0.67 | 0.036 | 0.55 | 0.038 | 0.24 | 0.042 | 0.00 | 0.010 |

Individual markers

| Setting | Marker 1 Est | Marker 1 ESE | Marker 2 Est | Marker 2 ESE | Marker 3 Est | Marker 3 ESE | Marker 4 Est | Marker 4 ESE |
|---|---|---|---|---|---|---|---|---|
| (1) | 0.40 | 0.045 | 0.42 | 0.045 | 0.44 | 0.046 | 0.42 | 0.046 |
| (2) | 0.46 | 0.039 | 0.47 | 0.040 | 0.47 | 0.041 | 0.52 | 0.040 |
| (3) | 0.26 | 0.041 | 0.26 | 0.039 | 0.26 | 0.041 | 0.26 | 0.040 |
The composite scores, , , , and , all attain a higher PTE estimate than the individual markers alone. These results highlight the value of combining multiple markers to improve surrogacy and the importance of robust approaches to combine markers. The fully model-based PTE estimate of Freedman et al. (1992) gives significantly lower PTE estimates in all three settings and even smaller PTE estimates than the individual markers in settings (2) and (3).
In Table 1, we also present the performance of the standard error and interval estimation based on resampling for the proposed . In general, the PTE estimates based on present negligible biases for estimating and the average standard errors are close to the corresponding empirical standard errors. The empirical coverage probabilities of the confidence intervals are close to the nominal level of 95%.
In Table 2, we show the Spearman rank correlations between each function and , with the correlation for being highest across all three settings, though may not be close to 1 as in setting (2). This is consistent with magnitudes of PTE estimates.
TABLE 2.
The Spearman rank correlation between and , and , respectively
| Setting | CMF score | Parametric score | Basis score | GLM score | Marker 1 | Marker 2 | Marker 3 | Marker 4 |
|---|---|---|---|---|---|---|---|---|
| (1) | 0.999 | 1.000 | 0.984 | 0.986 | 0.608 | 0.610 | 0.646 | 0.638 |
| (2) | 0.710 | 0.459 | 0.675 | 0.405 | 0.614 | 0.602 | 0.600 | 0.560 |
| (3) | 0.970 | 0.971 | 0.846 | 0.387 | 0.472 | 0.521 | 0.504 | 0.504 |
3.2. Application to the Diabetes Prevention Program study
The DPP was a randomized clinical trial designed to evaluate the effect of several prevention strategies for reducing the risk of type 2 diabetes (T2D) among high-risk individuals with prediabetes (Diabetes Prevention Program Group, 1999, 2002). The participants were randomized to one of four treatment groups: placebo, lifestyle intervention, metformin, and troglitazone. The primary endpoint of the trial was time to T2D onset, denoted as , and the participants were followed up for 5 years with an average follow-up of 2.8 years. Previous study results showed that both lifestyle and metformin significantly reduced the risk of T2D. For illustration purposes, we focus on the comparison of the lifestyle intervention group and the placebo group with respect to diabetes risk at year , with , and 4. Our goal is to investigate to what extent three surrogates: HbA1c at year fasting glucose at year, and diabetes incidence up to time , can be used in combination to predict the treatment effect on . Since HbA1c and fasting glucose are also measured at baseline, one may also consider the change from baseline in these two markers as surrogates. As such, we also include HbA1c at baseline and fasting glucose at baseline as part of in the construction of the composite score. To account for censoring, we use the IPW strategies in the estimation of the model parameters and the final PTE as discussed in Section 2.4. We use the proposed resampling strategy to estimate the standard errors and construct confidence intervals (CIs).
The PTE estimates for , , , , and along with the PTE of the individual markers are shown in Table 3. The PTEs of the composite scores derived from are generally higher than those of the individual markers, and resulted in the highest PTE for all years, for example, a PTE of 0.67 (95% CI: [0.51, 0.83]) for 1-year T2D risk and 0.58 (95% CI: [0.44, 0.72]) for 4-year T2D risk. Thus, in this example, it appears beneficial to use these surrogates jointly to infer or test the treatment effect on the primary outcome, compared to a single surrogate individually. While the PTE of is not substantially higher than the other choices of composite markers, it is consistently high across all time points.
TABLE 3.
Nonparametric estimates of the PTE for the proposed as well as , , , and the individual markers, HBA1c0.5, Glucose0.5, , along with the standard errors of the estimated PTE for , are shown as a subscript. Note that the long-term outcome is defined as and is illustrated with and 4. For comparison, the model-based PTE estimate based on Freedman et al. (1992) is also shown
| Combined markers | Individual markers | |||||||
|---|---|---|---|---|---|---|---|---|
| 
 | 
 | 
|||||||
| Outcome | 
HBA1c0.5 | Glucose0.5 | ||||||
| 0.670.080 | 0.61 | 0.64 | 0.63 | 0.26 | 0.53 | 0.29 | 0.67 | |
| 0.660.072 | 0.63 | 0.64 | 0.59 | 0.28 | 0.56 | 0.15 | 0.57 | |
| 0.590.063 | 0.59 | 0.58 | 0.51 | 0.27 | 0.53 | 0.10 | 0.42 | |
| 0.580.069 | 0.57 | 0.58 | 0.53 | 0.27 | 0.53 | 0.10 | 0.44 | |
4. DISCUSSION
We have proposed a robust CMF method to optimally combine multiple surrogate markers and evaluate their surrogacy based on the PTE of the composite score. Our CMF method offers several unique contributions compared to currently available methods. First, this approach is flexible in that it does not solely rely on the correct specification of a parametric model but rather combines a parametric component and a nonparametric component in an optimal way. Second, this approach identifies and evaluates the optimal combination of the surrogates, rather than an arbitrary or model-based combination. Third, the determination of the optimal combination and the definition of the PTE here directly reflect a prediction perspective in that we aim to find a combination whereby the treatment effect on this combination can predict the treatment effect on the primary outcome. Price et al. (2018) similarly sought to identify the optimal transformation of a surrogate in a prediction-based framework. While their proposed approach is distinct from that proposed here, the growing consideration of such a framework highlights the benefits of a prediction perspective in that it is directly in line with the ultimate goal of surrogate marker research, which is to enable one to replace an expensive or long-term outcome with a surrogate in a future trial. This work provides a useful and novel contribution to the surrogate marker evaluation field. Though we build from the optimal transformation framework of Wang et al. (2020), our two-step CMF approach, which combines two model-based estimates of the optimal transformation, is unique to this paper.
Importantly, while we state that is estimated to optimize the PTE of , we technically require that , where is a large finite positive constant. Certainly, discussion regarding the choice of is needed. Although will generally lie in [0, 1], restricting may result in a boundary problem if the true value of equals 0 or 1. When the true value lies on the boundary, the asymptotic properties of the estimator cannot be established in the standard way. Furthermore, the data resampling method may not be valid. Therefore, we specify , where is chosen to be a relatively large value, such as 5 in our numerical studies.
An important question is what one would actually do with this optimal combination and estimated PTE in practice. We expect that the primary use of these estimates would be to inform a decision around whether this combination is a “good” surrogate. Some previous work has suggested considering a surrogate “good” if the lower bound of the 95% CI of the PTE estimator is above some threshold such as 0.50 (Lin et al., 1997). Although the focus of this paper is to evaluate the surrogacy, once there is agreement that the combined surrogacy is “good,” then there would likely be interest in using the surrogate to make inference regarding an effect on the primary outcome in future studies when the primary outcome is not available. For example, one may use in a future study to test for a treatment effect on the primary outcome, such that we can avoid measuring in that future study. Furthermore, the estimated value of PTE can also be used to inform surrogate marker-based future study design focusing on estimating and testing in terms of power calculation.
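The threshold rule mentioned above can be illustrated with the DPP estimates reported in Section 3.2. The sketch below assumes a normal-approximation (Wald) interval built from the point estimate and its standard error; this is an assumption for illustration, although it reproduces the intervals quoted in the text for the 1-year and 4-year outcomes. The function names are hypothetical.

```python
from statistics import NormalDist


def wald_interval(estimate, se, level=0.95):
    """Normal-approximation confidence interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se


def lower_bound_exceeds(estimate, se, threshold=0.50, level=0.95):
    """True if the lower confidence bound is above the chosen threshold."""
    lower, _ = wald_interval(estimate, se, level)
    return lower > threshold


# PTE estimates and standard errors of the CMF score reported for the DPP data.
ci_year1 = wald_interval(0.67, 0.080)   # 1-year T2D risk
ci_year4 = wald_interval(0.58, 0.069)   # 4-year T2D risk
print([round(v, 2) for v in ci_year1], lower_bound_exceeds(0.67, 0.080))
print([round(v, 2) for v in ci_year4], lower_bound_exceeds(0.58, 0.069))
```

Under a 0.50 threshold, the 1-year combination would narrowly qualify while the 4-year combination would not, which shows how sensitive such a rule is to the width of the interval.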
Another question that may arise is how the proposed approach can be used to determine whether the optimal combination of surrogates offers a significant improvement, with respect to PTE, over a single surrogate or over a combination of a subset of the surrogates. To determine the incremental value of a set of surrogates compared to a single surrogate or a smaller subset, one could use our estimation approach to obtain a point estimate of the difference in PTE and use a similar resampling procedure to obtain the corresponding variance estimate, allowing for the construction of a CI for the difference in PTE.
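The resampling idea for the difference in PTE can be sketched as follows. This is an illustration under stated assumptions, not the CMF estimator: `toy_pte` is a simple Freedman-style stand-in (one minus the ratio of the surrogate-adjusted to the unadjusted treatment effect), the resampling is a nonparametric bootstrap within treatment arm, and the data are simulated. The point it shows is that both PTE estimates are recomputed on the same resample, so that the correlation between them is carried into the CI for their difference.

```python
import random
from statistics import mean


def residuals(v, s):
    """Residuals from a simple linear regression of v on the score s."""
    mv, ms = mean(v), mean(s)
    sxx = sum((si - ms) ** 2 for si in s)
    slope = sum((si - ms) * (vi - mv) for si, vi in zip(s, v)) / sxx
    return [vi - mv - slope * (si - ms) for si, vi in zip(s, v)]


def toy_pte(y, a, s):
    """Stand-in PTE: 1 - (effect of a adjusted for s) / (unadjusted effect)."""
    treated = [yi for yi, ai in zip(y, a) if ai == 1]
    control = [yi for yi, ai in zip(y, a) if ai == 0]
    unadjusted = mean(treated) - mean(control)
    # Adjusted coefficient of a via residual-on-residual regression.
    ra, ry = residuals(a, s), residuals(y, s)
    adjusted = sum(u * v for u, v in zip(ra, ry)) / sum(u * u for u in ra)
    return 1.0 - adjusted / unadjusted


def bootstrap_pte_difference(y, a, s_comb, s_single, n_boot=200, seed=0):
    """Point estimate and approximate 95% percentile CI for PTE(comb) - PTE(single)."""
    rng = random.Random(seed)
    arms = [[i for i, ai in enumerate(a) if ai == g] for g in (0, 1)]

    def diff(idx):
        yy = [y[i] for i in idx]
        aa = [a[i] for i in idx]
        return (toy_pte(yy, aa, [s_comb[i] for i in idx])
                - toy_pte(yy, aa, [s_single[i] for i in idx]))

    point = diff(list(range(len(y))))
    # Resample subjects within each arm; reuse the same indices for both PTEs.
    draws = sorted(
        diff([i for arm in arms for i in rng.choices(arm, k=len(arm))])
        for _ in range(n_boot)
    )
    return point, (draws[int(0.025 * n_boot)], draws[int(0.975 * n_boot) - 1])


# Simulated data (hypothetical): two markers, each carrying part of the effect.
sim = random.Random(1)
n = 400
a = [i % 2 for i in range(n)]
s1 = [ai + sim.gauss(0, 1) for ai in a]
s2 = [ai + sim.gauss(0, 1) for ai in a]
y = [u + v + 0.5 * ai + sim.gauss(0, 1) for u, v, ai in zip(s1, s2, a)]
s_comb = [u + v for u, v in zip(s1, s2)]

point, (lower, upper) = bootstrap_pte_difference(y, a, s_comb, s1)
print(round(point, 3), round(lower, 3), round(upper, 3))
```

A CI for the difference that excludes zero would suggest that the combined score explains more of the treatment effect than the single marker in this toy setting.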
The methods proposed here do not utilize any baseline covariate information; however, one could consider using such information to improve efficiency in the estimation of the quantities of interest via augmentation similar to Tian et al. (2012), Garcia et al. (2011), Zhang et al. (2008), and Parast et al. (2017). Given treatment randomization, the augmented versions of these estimators would converge to the same limit as the nonaugmented estimators, and the augmentation component can be selected such that the variances of the augmented estimators are minimized. Increased efficiency with the augmented estimators would be expected as long as the covariates are associated with the primary outcome and the surrogate markers. In addition, baseline covariates can and should be utilized to correct for confounding bias if the study is not randomized, for example, via a method recently proposed in Han et al. (2021).
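To make the augmentation idea concrete, the sketch below applies it to the simplest possible target, a difference in means in a randomized trial with one baseline covariate. This is an assumption-laden illustration and not the augmented version of the estimators in this paper: the function name `augmented_difference` and the simulated data are hypothetical. Under randomization the covariate means in the two arms have the same expectation, so subtracting a multiple of their observed gap leaves the limit unchanged, and the multiplier is chosen to minimize the variance.

```python
import random
from statistics import mean


def cov(u, v):
    """Sample covariance (sample variance when u is v)."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)


def augmented_difference(y, a, x):
    """Plain and covariate-augmented difference in means, with variance estimates."""
    y1 = [yi for yi, ai in zip(y, a) if ai == 1]
    y0 = [yi for yi, ai in zip(y, a) if ai == 0]
    x1 = [xi for xi, ai in zip(x, a) if ai == 1]
    x0 = [xi for xi, ai in zip(x, a) if ai == 0]
    n1, n0 = len(y1), len(y0)

    delta = mean(y1) - mean(y0)   # nonaugmented estimator
    x_gap = mean(x1) - mean(x0)   # has mean zero under randomization

    var_delta = cov(y1, y1) / n1 + cov(y0, y0) / n0
    var_gap = cov(x1, x1) / n1 + cov(x0, x0) / n0
    cov_delta_gap = cov(y1, x1) / n1 + cov(y0, x0) / n0

    b = cov_delta_gap / var_gap   # variance-minimizing multiplier
    return {
        "plain": delta,
        "augmented": delta - b * x_gap,
        "var_plain": var_delta,
        # First-order variance; ignores the variability from estimating b.
        "var_augmented": var_delta - cov_delta_gap ** 2 / var_gap,
    }


# Simulated randomized trial (hypothetical): true effect 1, prognostic covariate x.
sim = random.Random(2)
n = 200
a = [i % 2 for i in range(n)]
x = [sim.gauss(0, 1) for _ in range(n)]
y = [1.0 * ai + 2.0 * xi + sim.gauss(0, 1) for ai, xi in zip(a, x)]

result = augmented_difference(y, a, x)
print(result)
```

The variance reduction grows with the strength of the association between the covariate and the outcome, which mirrors the condition stated above for an efficiency gain.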
We have focused on a single study setting, that is, where we only have individual-level data from a single study that can be used to evaluate the surrogates. In settings where there are multiple studies available for surrogate evaluation, a meta-analytic framework should be considered (Joffe and Greene, 2009; Buyse et al., 2000; Burzykowski et al., 2005).
Our proposed approach has some limitations. First, the resulting combination of surrogate markers may be complex, rendering its interpretation more difficult for patients and clinicians than, for example, that of a single surrogate marker. The trade-off between interpretability and complexity should be carefully considered for each individual application. It would be expected that any method that attempts to combine multiple surrogates using a robust approach will involve some complexity beyond that of a single marker setting. In practice, one could explore interpretability by viewing the combined score as a function of the individual markers and examining how its value varies with each individual surrogate marker while fixing the others at constant levels of interest. In addition, one could examine the incremental value of the combined surrogates over a single surrogate or a subset of surrogates with respect to the PTE, as described above. One may also consider quantifying the importance of an individual marker using the feature importance measures used in random forests; similar techniques have been developed in machine learning research to characterize a complex functional output from an algorithm (Altmann et al., 2010; Strobl et al., 2008). Second, our estimation approach requires both a relatively large sample size, given the nonparametric components, and the selection of multiple tuning parameters within the kernel smoothing estimation and the basis expansion estimation. For a given dataset without prior knowledge, it is generally difficult to determine which choice of basis functions yields the best approximation to the optimal transformation function. Ideally, one should choose basis functions according to prior knowledge, and one can rely on commonly used basis functions such as B-splines and natural splines if no prior knowledge is available to guide the choice. Third, we focus exclusively on the PTE as our measure of surrogacy. 
Certainly, PTE is not the only measure of surrogacy that may be of interest and limitations of this quantity have been previously discussed (Lin et al., 1997; VanderWeele, 2013). While there is no agreement in the literature on what quantity is the “best” measure of surrogacy in a single trial setting, we focus on PTE because it is used widely in practice, appealing to clinicians and applied researchers, and considered easy to interpret, which we believe will increase the probability that our method will be used and contribute to future robust evaluation of surrogate markers (Agyemang et al., 2018; Inker et al., 2016; Sprenger et al., 2020). Finally, our procedure follows the framework of Wang et al. (2020) with derived to approximate . Obviously, if . Simulation studies given in Wang et al. (2020) suggest approximates reasonably well when the correlation between and is not too strong. Nevertheless, both the surrogate index and are valid surrogate markers under relatively mild assumptions that can be verified using observed data.
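The interpretability check mentioned among the limitations, in which the combined score is viewed as a function of one marker while the others are held at fixed levels, can be sketched in a few lines. The score function `hypothetical_score`, the reference levels, and the grid are all assumptions made for illustration; in practice the fitted combined score would be plugged in.

```python
def marker_profile(score_fn, reference, marker_index, grid):
    """Evaluate score_fn along a grid for one marker, others fixed at reference."""
    profile = []
    for value in grid:
        point = list(reference)        # copy so the reference levels stay intact
        point[marker_index] = value
        profile.append((value, score_fn(point)))
    return profile


def hypothetical_score(s):
    """Stand-in for a fitted combined score of two markers (not from the paper)."""
    return 0.5 * s[0] + s[1] ** 2


# Vary the second marker over a grid while fixing the first marker at 1.0.
profile = marker_profile(hypothetical_score, reference=[1.0, 2.0],
                         marker_index=1, grid=[0.0, 1.0, 2.0])
print(profile)  # [(0.0, 0.5), (1.0, 1.5), (2.0, 4.5)]
```

Plotting such profiles for each marker, at several reference levels of the others, gives a rough picture of how each marker drives the combined score.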
Funding information
National Institute of Diabetes and Digestive and Kidney Diseases, Grant/Award Number: R01 DK118354; The University of Texas at Austin
Footnotes
OPEN RESEARCH BADGES
This article has earned an Open Materials badge for making publicly available the components of the research methodology needed to reproduce the reported procedure and analysis. All materials are available at xx.
SUPPORTING INFORMATION
Web Appendices A, B, C, and D, referenced in Sections 1, 2, and 3, are available with this paper at the Biometrics website on Wiley Online Library. An R package implementing our proposed approach, named CMFsurrogate, is available at https://github.com/laylaparast/CMFsurrogate, including code and example data. This package is also available at the Biometrics website on Wiley Online Library.
DATA AVAILABILITY STATEMENT
The data that support the findings in this paper in Section 3.2 are available from the corresponding author upon reasonable request.
REFERENCES
- Agyemang E, Magaret AS, Selke S, Johnston C, Corey L and Wald A (2018) Herpes simplex virus shedding rate: surrogate outcome for genital herpes recurrence frequency and lesion rates, and phase 2 clinical trials end point for evaluating efficacy of antivirals. The Journal of Infectious Diseases, 218, 1691–1699.
- Altmann A, Toloşi L, Sander O and Lengauer T (2010) Permutation importance: a corrected feature importance measure. Bioinformatics, 26, 1340–1347.
- Andersen PK and Pohar Perme M (2010) Pseudo-observations in survival analysis. Statistical Methods in Medical Research, 19, 71–99.
- Athey S, Chetty R, Imbens GW and Kang H (2019) The surrogate index: combining short-term proxies to estimate long-term treatment effects more rapidly and precisely. Technical report, National Bureau of Economic Research.
- Brookmeyer R, Gail MH, et al. (1994) AIDS Epidemiology: A Quantitative Approach, Monographs in Epidemiology and Biostatistics, volume 22. Oxford, UK: Oxford University Press.
- Burzykowski T, Molenberghs G and Buyse M (2005) The Evaluation of Surrogate Endpoints. Berlin: Springer.
- Buyse M, Molenberghs G, Burzykowski T, Renard D and Geys H (2000) The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics, 1, 49–67.
- Chen J, Li W, Huang X, Guo C, Zou R, Yang Q, Zhang H, Zhang T, Chen H and Wu H (2013) Evaluating total lymphocyte count as a surrogate marker for CD4 cell count in the management of HIV-infected patients in resource-limited settings: a study from China. PLoS ONE, 8, e69704.
- Diabetes Prevention Program Group (1999) The Diabetes Prevention Program: design and methods for a clinical trial in the prevention of Type 2 diabetes. Diabetes Care, 22, 623–634.
- Diabetes Prevention Program Group (2002) Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. New England Journal of Medicine, 346, 393–403.
- Doyen J, Alix-Panabières C, Hofman P, Parks SK, Chamorey E, Naman H and Hannoun-Lévi J-M (2012) Circulating tumor cells in prostate cancer: a potential surrogate marker of survival. Critical Reviews in Oncology/Hematology, 81, 241–256.
- Freedman LS, Graubard BI and Schatzkin A (1992) Statistical validation of intermediate endpoints for chronic diseases. Statistics in Medicine, 11, 167–178.
- Garcia TP, Ma Y and Yin G (2011) Efficiency improvement in a class of survival models through model-free covariate incorporation. Lifetime Data Analysis, 17, 552–565.
- Han L, Wang X and Cai T (2021) On the evaluation of surrogate markers in real world data settings. arXiv preprint. arXiv:2104.05513.
- Huang Y and Gilbert PB (2011) Comparing biomarkers as principal surrogate endpoints. Biometrics, 67, 1442–1451.
- Inker LA, Mondal H, Greene T, Masaschi T, Locatelli F, Schena FP, Katafuchi R, Appel GB, Maes BD, Li PK, et al. (2016) Early change in urine protein as a surrogate end point in studies of IgA nephropathy: an individual-patient meta-analysis. American Journal of Kidney Diseases, 68, 392–401.
- Joffe MM and Greene T (2009) Related causal frameworks for surrogate outcomes. Biometrics, 65, 530–538.
- Klein JP and Andersen PK (2005) Regression modeling of competing risks data based on pseudovalues of the cumulative incidence function. Biometrics, 61, 223–229.
- Lin D, Fleming T, De Gruttola V, et al. (1997) Estimating the proportion of treatment effect explained by a surrogate marker. Statistics in Medicine, 16, 1515–1527.
- Parast L, Cai T and Tian L (2017) Evaluating surrogate marker information using censored data. Statistics in Medicine, 36, 1767–1782.
- Parast L, Cai T and Tian L (2021) Evaluating multiple surrogate markers with censored data. Biometrics, 77, 1315–1327.
- Parast L, McDermott MM and Tian L (2016) Robust estimation of the proportion of treatment effect explained by surrogate marker information. Statistics in Medicine, 35, 1637–1653.
- Prentice RL (1989) Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine, 8, 431–440.
- Price BL, Gilbert PB and van der Laan MJ (2018) Estimation of the optimal surrogate based on a randomized trial. Biometrics, 74, 1271–1281.
- Scott D (1992) Multivariate Density Estimation. New York: Wiley.
- Sprenger T, Kappos L, Radue E-W, Gaetano L, Mueller-Lenke N, Wuerfel J, Poole EM and Cavalier S (2020) Association of brain volume loss and long-term disability outcomes in patients with multiple sclerosis treated with teriflunomide. Multiple Sclerosis Journal, 26, 1207–1216.
- Strobl C, Boulesteix A-L, Kneib T, Augustin T and Zeileis A (2008) Conditional variable importance for random forests. BMC Bioinformatics, 9, 307.
- Tian L, Cai T, Zhao L and Wei L-J (2012) On the covariate-adjusted estimation for an overall treatment difference with data from a randomized comparative clinical trial. Biostatistics, 13, 256–273.
- Van der Elst W, Alonso AA, Geys H, Meyvisch P, Bijnens L, Sengupta R and Molenberghs G (2019) Univariate versus multivariate surrogates in the single-trial setting. Statistics in Biopharmaceutical Research, 11, 301–310.
- VanderWeele TJ (2013) Surrogate measures and consistent surrogates. Biometrics, 69, 561–565.
- Wang X, Parast L, Tian L and Cai T (2020) Model-free approach to quantifying the proportion of treatment effect explained by a surrogate marker. Biometrika, 107, 107–122.
- Wang Y and Taylor JM (2002) A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics, 58, 803–812.
- Wondimeneh Y, Ferede G, Yismaw G and Muluye D (2012) Total lymphocyte count as surrogate marker for CD4 cell count in HIV-infected individuals in Gondar University Hospital, Northwest Ethiopia. AIDS Research and Therapy, 9, 1–4.
- Xu J and Zeger SL (2001) The evaluation of multiple surrogate endpoints. Biometrics, 57, 81–87.
- Zhang M, Tsiatis AA and Davidian M (2008) Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics, 64, 707–715.
 