Biostatistics (Oxford, England). 2022 Jul 6;24(4):985–999. doi: 10.1093/biostatistics/kxac020

Doubly robust evaluation of high-dimensional surrogate markers

Denis Agniel 1, Boris P Hejblum 2, Rodolphe Thiébaut 3, Layla Parast 4
PMCID: PMC10801117  PMID: 35791753

Summary

When evaluating the effectiveness of a treatment, policy, or intervention, the desired measure of efficacy may be expensive to collect, not routinely available, or may take a long time to occur. In these cases, it is sometimes possible to identify a surrogate outcome that can more easily, quickly, or cheaply capture the effect of interest. Theory and methods for evaluating the strength of surrogate markers have been well studied in the context of a single surrogate marker measured in the course of a randomized clinical study. However, methods are lacking for quantifying the utility of surrogate markers when the dimension of the surrogate grows. We propose a robust and efficient method for evaluating a set of surrogate markers that may be high-dimensional. Our method does not require treatment to be randomized and may be used in observational studies. Our approach draws on a connection between quantifying the utility of a surrogate marker and the most fundamental tools of causal inference, namely, methods for robust estimation of the average treatment effect. This connection facilitates the use of modern methods for estimating treatment effects, using machine learning to estimate nuisance functions and relaxing the dependence on model specification. We show that our proposed approach performs well, establish connections between our approach and certain mediation effects, and illustrate it by evaluating whether gene expression can be used as a surrogate for immune activation in an Ebola study.

Keywords: Average treatment effect estimation, High-dimensional data, Surrogate marker evaluation

1. Introduction

When evaluating the effectiveness of a treatment, policy, or intervention, the desired measure of effectiveness may be expensive to collect, not routinely available, or may take a long time to occur. In these cases, it is sometimes possible to identify a surrogate outcome that can more easily, quickly, or cheaply capture the effect of interest. For example, when evaluating an intervention designed to delay dementia onset, the time required to observe enough dementia diagnoses is often very long. Surrogates that have been considered for intervention evaluation include mild cognitive impairment, adiponectin levels, neuroimages, amyloid plaques, and neurofibrillary tangles (Small, 2006; Teixeira and others, 2013). Similarly, in studies evaluating treatments to prevent diabetes, surrogate measures for diabetes onset have included changes in body weight, fasting plasma glucose, and hemoglobin A1c (Caveney and Cohen, 2011; Choi and others, 2011). Likewise, a surrogate marker for SARS-CoV-2 vaccine efficacy is needed to develop second-generation vaccines adapted to novel variants of concern (Karim, 2021).

Theory and methods for evaluating the strength of surrogate markers have been well studied in the context of a single surrogate marker measured in the course of one or more randomized clinical studies; see the detailed review in Joffe and Greene (2009). In particular, model-based approaches have been proposed for continuous (Alonso and others, 2004) and binary surrogates (Alonso and others, 2016) in meta-analysis and in the principal stratification framework (Gilbert and Hudgens, 2008). Robust nonparametric methods have been proposed in Parast and others (2020) and Parast and others (2016). However, fully nonparametric methods are not available or not reliable when the number of markers is more than one or two. In these cases, a parametric approach to evaluating a high-dimensional surrogate may be considered, as proposed in Zhou and others (2022); this approach requires the correct specification of two high-dimensional linear models. Alternatively, an initial model may be used to reduce the dimensionality of the surrogate (Agniel and Parast, 2021; Parast and others, 2016). However, misspecified parametric models or an inappropriate initial model may produce severely biased estimates of the utility of the surrogate.

In this work, we propose a robust and efficient method for evaluating multiple surrogate markers, with particular attention to the possibility that surrogates may be high-dimensional. Our approach revives the spirit of the estimator in Freedman and others (1992), itself based on the pioneering work of Prentice (1989). It draws on a connection between quantifying the utility of a surrogate marker and the most fundamental tools of causal inference: namely, methods for estimating the average treatment effect. This connection does not appear to have been made before in the surrogate marker literature, despite the extensive connections made between mediation and surrogacy (Taylor and others, 2005; Joffe and Greene, 2009). Making this connection facilitates the use of all of the machinery now available to robustly and efficiently estimate average treatment effects. Specifically, we take advantage of state-of-the-art methods for incorporating flexible machine learning and/or sparse high-dimensional models into the estimation of treatment effects. These methods base estimation on the efficient influence function of key quantities and use sample splitting to ensure convenient asymptotic distributions and inference. At the same time, they place minimal restrictions on the types of estimators that may be used. Furthermore, we show that our proposed approach has connections to certain mediation estimands when the assumptions underlying mediation analysis are satisfied.

One key benefit of our approach is that it does not require randomization of treatment, which previous nonparametric approaches to evaluating surrogate markers have required (Agniel and Parast, 2021; Parast and others, 2016). Assessments of surrogate markers are not limited to studies of randomized treatment (Obirikorang and others, 2012). In fact, surrogate outcomes based on complex observational data are increasingly used across the sciences (Wang and others, 2020). In Section 6, we give an example where researchers are interested in studying whether gene expression can be used as a surrogate for immune activation in an observational setting. Our approach exploits insights from observational causal inference to estimate all relevant quantities and thus is equally applicable in observational and randomized settings.

The rest of the article is structured as follows:

- In Section 2, we lay out the notation and the setting in which we are working, and we motivate the importance of evaluating surrogate markers in light of recent advances in the surrogate marker literature.
- In Section 3, we detail the assumptions necessary for identifying and interpreting parameters of interest.
- In Section 4, we discuss identification, estimation, and the asymptotic behavior of our estimator, as well as variance estimation and inference.
- In Section 5, we evaluate the performance of the proposed approach using a simulation study.
- In Section 6, we investigate whether gene expression could be used as a surrogate for immune response to Ebola infection.
- In Section 7, we give final remarks and draw connections to other methods for evaluating surrogate markers.

2. Evaluation of surrogate markers

2.1. Notation

Let $A \in \{0, 1\}$ denote a binary treatment, and let the primary outcome of the study be $Y$. Let $\mathbf{S}$ be a vector of potential surrogate information, and let $\mathbf{X}$ be a vector of pretreatment covariates. The primary quantity of interest is the treatment effect on the outcome, $\Delta = E(Y^{(1)} - Y^{(0)})$. Here $Y^{(a)}$ is the potential or counterfactual outcome that would have been observed if treatment were $A = a$, possibly contrary to fact. Similarly, let $\mathbf{S}^{(a)}$ be the potential or counterfactual value the vector of surrogates would take if $A = a$. Let the data observed in the current study be $(Y_i, A_i, \mathbf{S}_i, \mathbf{X}_i)$, $i = 1, \dots, n$, iid realizations of $(Y, A, \mathbf{S}, \mathbf{X})$.

2.2. Importance of quantifying surrogate strength

Understanding the strength of a potential surrogate is important for many reasons. In practice, information about surrogate strength will inform two decisions: whether to measure the surrogate in a future study (especially if it is costly or invasive to measure), and whether to use the surrogate to assess treatment effectiveness in a future study. In addition, a number of novel statistical methods for using surrogates in future studies can only be applied when the surrogate is strong. For example, Parast and others (2019) propose a robust nonparametric procedure to test for a treatment effect in a time-to-event outcome setting, using surrogate marker information measured prior to the end of the study. However, they rely on the assumption that the surrogate is sufficiently strong. As another example, Price and others (2018) propose constructing a function $g$ of the surrogates that has certain optimality properties, but it can be shown that this method should only be used with a strong surrogate.

Specifically, they construct an optimal transformation of the surrogates, $g(\mathbf{S})$, in the sense of minimizing the following mean squared error

graphic file with name Equation1.gif (2.1)

with Inline graphic, under the constraint that it satisfies the so-called Prentice definition:

graphic file with name Equation2.gif

The optimal transformations in this case are shown to be $g_a(\mathbf{s}) = E(Y \mid A = a, \mathbf{S} = \mathbf{s})$ for $a = 0, 1$. In addition to this appealing optimality property, this proposal also resolves the so-called “surrogate paradox.” The paradox states that the treatment could have a positive causal effect on a univariate surrogate $S$ that is positively correlated with the outcome, and yet the treatment could have a negative effect on the outcome. The Price surrogate resolves the surrogate paradox by definition. The treatment effect on the transformed surrogate is equivalent to the treatment effect on the outcome: $E\{g_1(\mathbf{S}^{(1)}) - g_0(\mathbf{S}^{(0)})\} = E(Y^{(1)} - Y^{(0)})$. This holds because, with randomized treatment, iterated expectations give $E\{g_a(\mathbf{S}^{(a)})\} = E\{E(Y \mid A = a, \mathbf{S}) \mid A = a\} = E(Y \mid A = a) = E(Y^{(a)})$.

However, the Price surrogate’s resolution of the surrogate paradox is in some sense too good: the paradox is resolved for every potential set of surrogates, good and bad. Even if the surrogates $\mathbf{S}$ are completely unrelated to the outcome $Y$, $E\{g_1(\mathbf{S}^{(1)}) - g_0(\mathbf{S}^{(0)})\} = \Delta$. In fact, if the Prentice definition does not hold, the power to detect a treatment effect in a future study using $g_A(\mathbf{S})$ actually increases as the surrogate becomes weaker (explaining less of the treatment effect).

Consider the toy example depicted in Figure 1 (see Appendix A of the Supplementary material available at Biostatistics online for details of the data generation). As the strength of the surrogate increases, $g_A(\mathbf{S})$ becomes more like the true outcome $Y$. As the surrogate becomes weaker, $g_a(\mathbf{S})$ becomes more like $E(Y \mid A = a)$, the mean outcome in treatment group $a$ from the first study. The distribution of $g_a(\mathbf{S})$ will then cluster closely around the estimate of $E(Y \mid A = a)$ from the first study (see, specifically, the fourth panel of Figure 1). A weak surrogate therefore ensures that the treatment effect in the second study will be identical to the treatment effect in the first study. In this scenario, the second study provides no new information about the treatment. Thus, the Price surrogate should only be used with a strong $\mathbf{S}$. Its use with a weak, or possibly even moderately strong, set of surrogates may not be advisable.
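The weak-surrogate argument can be made concrete with a deliberately simple, hypothetical randomized design. This is not the design of Appendix A. We take $S^{(a)} = a + e$ with $e \sim N(0,1)$ and assume $E(Y \mid A = a, S = s) = \rho s + (1 - \rho)a$. Under these assumptions, $\Delta = 1$ for every $\rho$, and $\rho$ plays the role of surrogate strength (it equals the PTE in this model). For simplicity, the true $g_a$ is used rather than an estimate from a first study. The function name `price_surrogate_demo` and all parameter values are ours.

```python
import math
import random
import statistics


def price_surrogate_demo(rho, n_per_arm=5000, seed=2024):
    """Toy randomized study (hypothetical; not the paper's Appendix A design).

    S^(a) = a + e with e ~ N(0, 1), and E(Y | A = a, S = s) = rho * s + (1 - rho) * a.
    The total effect is rho * 1 + (1 - rho) * 1 = 1 for every rho, and rho is the PTE.
    g_a(s) is taken to be the true conditional mean rather than an estimate.
    """
    rng = random.Random(seed)

    def g(a, s):
        return rho * s + (1 - rho) * a

    g1 = [g(1, 1 + rng.gauss(0, 1)) for _ in range(n_per_arm)]  # g_1(S^(1))
    g0 = [g(0, rng.gauss(0, 1)) for _ in range(n_per_arm)]      # g_0(S^(0))
    diff = statistics.fmean(g1) - statistics.fmean(g0)
    se = math.sqrt((statistics.variance(g1) + statistics.variance(g0)) / n_per_arm)
    return diff, statistics.stdev(g1), diff / se


for rho in (0.9, 0.5, 0.1):
    diff, sd, z = price_surrogate_demo(rho)
    print(f"rho={rho:.1f}  mean difference={diff:.3f}  sd of g_1(S)={sd:.3f}  z={z:.1f}")
```

Across $\rho$, the mean difference stays near 1, while the spread of $g_1(S)$ shrinks in proportion to $\rho$. As a result, the z statistic grows as the surrogate weakens, which is the behavior the paragraph above warns about.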

Fig. 1.

Distribution of outcomes $Y$ in an initial study (Study 1), and of surrogate functions approximating the outcome in a future study (Study 2), based on simulated data. The surrogate functions are $g_1(\mathbf{S})$ (surrogate in the treatment group) and $g_0(\mathbf{S})$ (surrogate in the control group). Panels are arranged according to the strength of the surrogate marker, i.e., the capacity of the surrogate to explain the treatment effect. The wider distribution on the left corresponds to the control group in Study 1, and the wider distribution on the right to the treated group in Study 1. The narrower distribution on the left corresponds to $g_0(\mathbf{S})$ in Study 2, and the narrower distribution on the right to $g_1(\mathbf{S})$ in Study 2. Vertical dotted lines correspond to the means of the treatment groups in the first study. See Appendix A of the Supplementary material available at Biostatistics online for simulation details. Surrogate functions $g_a$ were estimated via a correctly specified linear regression in the first study and then applied to the second study population. When the surrogate was strong, the distribution of the surrogate in the second study was quite close to the distribution of the outcome in the first study. However, when the surrogate was weak, the distribution of the surrogate in the second study clustered around the group mean from the first study.

Therefore, both for decision-making purposes and to use statistical methods that take advantage of strong surrogates, methods are needed to rigorously quantify the strength of a proposed set of surrogates before using (functions of) the surrogates in practice. In the following section, we propose an efficient method for estimating the strength of a possibly high-dimensional set of surrogates in observational studies. We intend this method to complement existing work, such as that of Parast and others (2019) and Price and others (2018). That work relies on the availability of a strong surrogate but currently lacks tools to assess surrogate strength, particularly in a high-dimensional surrogate setting.

2.3. Quantity to evaluate surrogate strength

In this section, we present a general method for evaluating the usefulness of a set of surrogates. It can be used to evaluate the suitability of any set of surrogates or surrogate transformations and is appropriate for $\mathbf{S}$ of any dimension. To evaluate the surrogates’ usefulness, we use the proportion of treatment effect explained by $\mathbf{S}$ (PTE), defined as $R = 1 - \Delta_R/\Delta$. Here $\Delta_R$ is the residual treatment effect, i.e., the treatment effect that remains after controlling for the surrogate information. This measure has been used frequently in the evaluation of surrogate markers, as far back as the pioneering works of Prentice (1989) and Freedman and others (1992), and more recently in Parast and others (2020; 2016) and Agniel and Parast (2021). If $Y^{(a)}$ is independent of $\mathbf{S}^{(a)}$ conditional on $\mathbf{X}$, then $\Delta_R = \Delta$ and $R = 0$. In contrast, if all of the treatment effect can be attributed to $\mathbf{S}$, then $\Delta_R = 0$ and $R = 1$.

In the more recent works using this PTE, $\Delta_R$ is defined in terms of a particular reference distribution, e.g., taking $\Delta_R$ to be the residual treatment effect among the treated group. This choice of reference distribution is often arbitrary and was not required by the older (model-based) approaches. We wish to avoid this choice and to take advantage of the extensive development in average treatment effect estimation (and the associated intuition and machinery built up around it). We therefore define $\Delta_R$ as the average treatment effect that would be obtained if the distributions of $\mathbf{S}^{(1)}$ and $\mathbf{S}^{(0)}$ were both equal to the distribution of $\mathbf{S}$ (all conditional on $\mathbf{X}$). Other choices of reference distribution could be used when there are substantive reasons to prefer them. Suppose there are no interactions with treatment, i.e., $E(Y^{(1)} \mid \mathbf{S}^{(1)} = \mathbf{s}, \mathbf{X}) - E(Y^{(0)} \mid \mathbf{S}^{(0)} = \mathbf{s}, \mathbf{X})$ does not depend on $\mathbf{s}$. Then the choice of reference distribution does not change $\Delta_R$. See Section 7 for further connections in this vein to Agniel and Parast (2021) and similar recent methods.
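A small discrete example, with hypothetical numbers and no covariates, shows how the choice of reference distribution enters $\Delta_R$. It also shows why that choice drops out when there is no treatment interaction. Here $\Delta_R$ is the average of $\mu_1(s) - \mu_0(s)$, where $\mu_a(s) = E(Y^{(a)} \mid S^{(a)} = s)$. The average is taken over three candidate reference distributions:

- that of $S^{(0)}$;
- that of $S^{(1)}$;
- the mixture that gives the distribution of the observed $S$.

```python
def expect(dist, f):
    """E{f(S)} for a discrete distribution {value: probability}."""
    return sum(p * f(s) for s, p in dist.items())


def residual_effect(mu1, mu0, ref):
    """Average of mu1(s) - mu0(s) over a reference distribution of S."""
    return expect(ref, lambda s: mu1[s] - mu0[s])


def total_effect(mu1, mu0, p1, p0):
    """Delta = E{mu1(S^(1))} - E{mu0(S^(0))}."""
    return expect(p1, lambda s: mu1[s]) - expect(p0, lambda s: mu0[s])


# Hypothetical single-stratum example (no covariates), with S taking values 0, 1, 2.
p0 = {0: 0.5, 1: 0.3, 2: 0.2}  # distribution of S^(0)
p1 = {0: 0.2, 1: 0.3, 2: 0.5}  # distribution of S^(1)
p_treat = 0.5                   # P(A = 1)
mixture = {s: p_treat * p1[s] + (1 - p_treat) * p0[s] for s in p0}  # distribution of S

mu0 = {0: 0.0, 1: 1.0, 2: 2.0}
mu1_additive = {s: mu0[s] + 0.5 for s in mu0}            # no treatment interaction
mu1_interact = {s: mu0[s] + 0.5 + 0.5 * s for s in mu0}  # effect grows with s

for label, mu1 in (("additive", mu1_additive), ("interaction", mu1_interact)):
    delta = total_effect(mu1, mu0, p1, p0)
    refs = {"S(0)": p0, "S(1)": p1, "mixture": mixture}
    delta_r = {name: round(residual_effect(mu1, mu0, ref), 3) for name, ref in refs.items()}
    pte = 1 - residual_effect(mu1, mu0, mixture) / delta
    print(label, "Delta =", round(delta, 3), "Delta_R =", delta_r, "R =", round(pte, 3))
```

With the additive $\mu_1$, all three reference distributions give $\Delta_R = 0.5$. With the interaction, they give 0.85, 1.15, and 1.0, respectively, so the reported PTE depends on which reference is chosen.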

This formulation of PTE also has deep connections to quantities important in mediation analysis, without requiring some of the restrictive assumptions common in the mediation literature. In particular, we show in Appendix B of the Supplementary material available at Biostatistics online that $\Delta_R$ is a function of conditional natural direct effects. We also show that $\Delta - \Delta_R$ is a function of conditional natural indirect effects (Joffe and Greene, 2009; VanderWeele, 2013), when the assumptions for identifying those effects are met. Many previous reviews of surrogate methods identify the causal effects framework for surrogate evaluation solely with mediation (Joffe and Greene, 2009; Conlon and others, 2017). In contrast, we use causal tools without requiring all of the assumptions necessary for mediation. Importantly, we do not require that all confounders of the surrogate–outcome relationship are measured and included in the study, because the aims of mediation and surrogate marker evaluation are different (VanderWeele, 2013). The aim of identifying a mediator is to determine whether the effect of treatment operates through the mediator itself, e.g., through some biological pathway. A good surrogate marker is often similarly conceptualized as a variable through which the treatment operates, but this is not necessarily required. A variable can be a good surrogate if it captures the treatment effect on the outcome, even if the treatment effect does not operate through the variable itself (sometimes called a nonmechanistic correlate of protection; Plotkin and Gilbert, 2012).

3. Assumptions

3.1. Identifying assumptions

We first require $\Delta \neq 0$, without which the goals of identifying surrogate markers are neither practically nor theoretically meaningful. We further make the three typical assumptions of treatment effect estimation: consistency, positivity, and no unmeasured confounding.

For consistency, we assume that the observed values of $Y$ and $\mathbf{S}$ when $A = a$ are identical to the counterfactuals $Y^{(a)}$ and $\mathbf{S}^{(a)}$, such that $Y = AY^{(1)} + (1 - A)Y^{(0)}$ and $\mathbf{S} = A\mathbf{S}^{(1)} + (1 - A)\mathbf{S}^{(0)}$.

For no unmeasured confounding, we assume that $\mathbf{X}$ contains all confounders of the effects of $A$ on the surrogates and the outcome. The treatment $A$ is then as good as randomized conditional on the covariates $\mathbf{X}$: (A.1) $\{Y^{(a)}, \mathbf{S}^{(a)}\} \perp\!\!\!\perp A \mid \mathbf{X}$ for $a = 0, 1$.

In addition, we assume two forms of positivity, which ensure that individuals in the two study arms are not too different from one another. The first is the usual positive probability of receiving either treatment: for some $\epsilon > 0$, $\epsilon < \pi(\mathbf{X}) < 1 - \epsilon$, where $\pi(\mathbf{x}) = P(A = 1 \mid \mathbf{X} = \mathbf{x})$. The second is a related assumption that further conditions on the surrogates: $\epsilon < \pi(\mathbf{S}, \mathbf{X}) < 1 - \epsilon$ for some $\epsilon > 0$, where $\pi(\mathbf{s}, \mathbf{x}) = P(A = 1 \mid \mathbf{S} = \mathbf{s}, \mathbf{X} = \mathbf{x})$.

Let $f_{\mathbf{W}}(\mathbf{w})$ be the density of the random variable $\mathbf{W}$ evaluated at $\mathbf{w}$. Notably, we have $\pi(\mathbf{s},\mathbf{x})/\{1 - \pi(\mathbf{s},\mathbf{x})\} = [\pi(\mathbf{x})/\{1 - \pi(\mathbf{x})\}]\, f_{\mathbf{S}^{(1)} \mid \mathbf{X} = \mathbf{x}}(\mathbf{s}) / f_{\mathbf{S}^{(0)} \mid \mathbf{X} = \mathbf{x}}(\mathbf{s})$. Because of this identity, the two positivity conditions ensure that the conditional distributions of the counterfactual surrogates under treatment and control cannot be too different from one another, i.e., they ensure overlap. When the treatment has a large effect on $\mathbf{S}$, this additional overlap requirement may be suspect. For example, there may be some values of $\mathbf{s}$ such that $f_{\mathbf{S}^{(1)} \mid \mathbf{X} = \mathbf{x}}(\mathbf{s})$ approaches 0, and thus $\pi(\mathbf{s},\mathbf{x})$ also approaches 0. We discuss a generalization of our approach that removes this overlap condition in Appendix C of the Supplementary material available at Biostatistics online.
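The overlap concern can be checked numerically in a hypothetical Gaussian model of our own choosing. We assume $\mathrm{logit}\,\pi(x) = 0.5x$ and $S^{(a)} \mid X = x \sim N(\delta a + 0.5x, 1)$, where $\delta$ is the treatment effect on a scalar surrogate. The sketch computes the surrogate score $\pi(s, x)$ twice: once by Bayes' rule, and once through the odds identity relating it to $\pi(x)$ and the counterfactual surrogate densities. It then evaluates the score at a typical control-arm value of $s$. All function names are illustrative.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def normal_pdf(v, mean):
    return math.exp(-0.5 * (v - mean) ** 2) / math.sqrt(2 * math.pi)


# Hypothetical model: logit pi(x) = 0.5 x and S^(a) | X = x ~ N(delta * a + 0.5 x, 1).
def surrogate_score_bayes(s, x, delta):
    """pi(s, x) = P(A = 1 | S = s, X = x) by Bayes' rule."""
    treated = expit(0.5 * x) * normal_pdf(s, delta + 0.5 * x)
    control = (1 - expit(0.5 * x)) * normal_pdf(s, 0.5 * x)
    return treated / (treated + control)


def surrogate_score_odds(s, x, delta):
    """Same quantity via the odds identity: logit pi(s, x) = logit pi(x) + log density ratio."""
    log_density_ratio = delta * (s - 0.5 * x) - delta ** 2 / 2
    return expit(0.5 * x + log_density_ratio)


# At a typical control-arm value (s = 0.5 x), the surrogate score drifts toward 0 as delta grows.
for delta in (0.5, 1.0, 3.0):
    print(delta, round(surrogate_score_odds(0.0, 0.0, delta), 4))
```

For $\delta = 3$, the score at $s = x = 0$ is about 0.011. A treated unit whose surrogate looks like a typical control value would therefore receive an inverse weight near 90.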

3.2. Required assumptions to ensure interpretation of $R$ as a proportion

The interpretation of $R$ as the PTE depends on it actually being a proportion, lying between 0 and 1. We adapt conditions in Wang and Taylor (2002) and Agniel and Parast (2021) that ensure $0 \le R \le 1$. These conditions are as follows (assuming without loss of generality that $\Delta > 0$): (A.2) Inline graphic; and (A.3) Inline graphic.

Condition (A.3) ensures that $\Delta_R \ge 0$, i.e., that the residual treatment effect is in the same direction as the overall treatment effect $\Delta$.

Condition (A.2) ensures that $\Delta_R \le \Delta$. Roughly speaking, it requires a condition on a propensity-weighted mixture of the two conditional mean functions $\mu_0(\mathbf{s},\mathbf{x})$ and $\mu_1(\mathbf{s},\mathbf{x})$, where $\mu_a(\mathbf{s},\mathbf{x}) = E(Y \mid A = a, \mathbf{S} = \mathbf{s}, \mathbf{X} = \mathbf{x})$. The mixture must be larger when $\mathbf{S}$ takes values from the distribution of the counterfactual surrogates under treatment than when it takes values from the distribution under control.

These two conditions are analogs of conditions (A6) and (A7) in Agniel and Parast (2021). See Appendix D of the Supplementary material available at Biostatistics online for an alternative approach that may be considered if these conditions are not met or are unlikely to be met. However, the alternative approach lacks the interpretability of the proposed approach and its connections to previous literature.
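The following short derivation, our own and not reproduced from the paper's appendix, shows how conditions of this kind bound $R$. It assumes (A.1), consistency, and positivity. It writes $\mu_a(\mathbf{s},\mathbf{x}) = E(Y \mid A = a, \mathbf{S} = \mathbf{s}, \mathbf{X} = \mathbf{x})$, $\pi(\mathbf{x}) = P(A = 1 \mid \mathbf{X} = \mathbf{x})$, and $\Delta_R = E\{\mu_1(\mathbf{S},\mathbf{X}) - \mu_0(\mathbf{S},\mathbf{X})\}$ (residual effect with the distribution of $\mathbf{S}$ given $\mathbf{X}$ as reference). The function $m$ is introduced here only for illustration.

```latex
\begin{aligned}
% (A.1) and consistency give mu_a(s, x) = E(Y^{(a)} | S^{(a)} = s, X = x), hence
\Delta &= E\{\mu_1(\mathbf{S}^{(1)},\mathbf{X}) - \mu_0(\mathbf{S}^{(0)},\mathbf{X})\},
\qquad
\Delta_R = E\{\mu_1(\mathbf{S},\mathbf{X}) - \mu_0(\mathbf{S},\mathbf{X})\}. \\
% S | X is a pi(X)-mixture of S^{(1)} | X and S^{(0)} | X, so for any function h:
E\{h(\mathbf{S},\mathbf{X}) \mid \mathbf{X}\}
  &= \pi(\mathbf{X})\,E\{h(\mathbf{S}^{(1)},\mathbf{X}) \mid \mathbf{X}\}
   + \{1-\pi(\mathbf{X})\}\,E\{h(\mathbf{S}^{(0)},\mathbf{X}) \mid \mathbf{X}\}. \\
% Applying this with h = mu_1 and h = mu_0 and subtracting:
\Delta - \Delta_R
  &= E\{m(\mathbf{S}^{(1)},\mathbf{X}) - m(\mathbf{S}^{(0)},\mathbf{X})\},
\qquad
m(\mathbf{s},\mathbf{x}) = \{1-\pi(\mathbf{x})\}\,\mu_1(\mathbf{s},\mathbf{x}) + \pi(\mathbf{x})\,\mu_0(\mathbf{s},\mathbf{x}).
\end{aligned}
```

From the last line, $\Delta_R \le \Delta$ exactly when the propensity-weighted mixture $m$ has a mean at least as large under the treated surrogate distribution as under the control one. This matches the verbal description of (A.2) above. Separately, $\mu_1 \ge \mu_0$ everywhere gives $\Delta_R \ge 0$. With $\Delta > 0$, the two together give $0 \le R \le 1$.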

4. Identification, estimation, and inference

4.1. Identifiability

In this section, we show that the effects of interest are identifiable, propose a robust and efficient estimation procedure, and describe the asymptotic properties of our proposed estimators. The effects of interest may be identified as average treatment effects identifiable from the data. One of these effects conditions on the surrogates ($\Delta_R$), and the other does not ($\Delta$). In particular, the residual treatment effect may be identified as

$$\Delta_R = E\{\mu_1(\mathbf{S},\mathbf{X}) - \mu_0(\mathbf{S},\mathbf{X})\} = E\left\{\frac{AY}{\pi(\mathbf{S},\mathbf{X})} - \frac{(1-A)Y}{1-\pi(\mathbf{S},\mathbf{X})}\right\} \tag{4.2}$$

$$\Delta_R = E\left[\mu_1(\mathbf{S},\mathbf{X}) - \mu_0(\mathbf{S},\mathbf{X}) + \frac{A\{Y - \mu_1(\mathbf{S},\mathbf{X})\}}{\pi(\mathbf{S},\mathbf{X})} - \frac{(1-A)\{Y - \mu_0(\mathbf{S},\mathbf{X})\}}{1-\pi(\mathbf{S},\mathbf{X})}\right] \tag{4.3}$$

where $\mu_a(\mathbf{s},\mathbf{x}) = E(Y \mid A = a, \mathbf{S} = \mathbf{s}, \mathbf{X} = \mathbf{x})$ and $\pi(\mathbf{s},\mathbf{x}) = P(A = 1 \mid \mathbf{S} = \mathbf{s}, \mathbf{X} = \mathbf{x})$.

The first equality in (4.2) follows from the definition of $\Delta_R$ and the fact that $E(Y^{(a)} \mid \mathbf{S}^{(a)} = \mathbf{s}, \mathbf{X} = \mathbf{x}) = \mu_a(\mathbf{s},\mathbf{x})$ because of (A.1). The other results follow shortly thereafter, along the familiar paths used to identify the average treatment effect in other contexts. The second equality in (4.2) shows that the residual treatment effect may be identified without knowledge of the outcome mean function. Equation (4.3) gives an augmented inverse probability weighting (Bang and Robins, 2005) version of the estimand. $\Delta$ may be identified similarly, following standard arguments and using the same functionals with two substitutions: $\mu_a(\mathbf{S},\mathbf{X})$ is replaced by $\mu_a(\mathbf{X}) = E(Y \mid A = a, \mathbf{X})$, and $\pi(\mathbf{S},\mathbf{X})$ is replaced by $\pi(\mathbf{X})$.
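As a sanity check on the identification argument, the sketch below simulates from a hypothetical model in which the nuisance functions are known in closed form. This model is our choice, not one of the paper's simulation designs:

- $X \sim N(0,1)$;
- $A \mid X \sim \mathrm{Bernoulli}\{\mathrm{expit}(0.5X)\}$;
- $S = A + 0.5X + N(0,1)$;
- $Y = S + 0.5A + 0.5AS + X + N(0,1)$.

Bayes' rule then gives $\pi(s,x) = \mathrm{expit}(s - 0.5)$. In addition, $\Delta_R = E(0.5 + 0.5S) = 0.75$, because $E(S) = E(A) = 0.5$ by symmetry. The plug-in, inverse-weighted, and augmented expressions for the residual treatment effect are evaluated with the true $\mu_a$ and $\pi(s,x)$, and all three should land near 0.75.

```python
import math
import random
import statistics


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def mu(a, s, x):
    """True mu_a(s, x) = E(Y | A = a, S = s, X = x) in the hypothetical model."""
    return s + 0.5 * a + 0.5 * a * s + x


def simulate(n, seed=7):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        a = 1 if rng.random() < expit(0.5 * x) else 0
        s = a + 0.5 * x + rng.gauss(0, 1)
        y = mu(a, s, x) + rng.gauss(0, 1)
        rows.append((x, a, s, y))
    return rows


def identify_residual_effect(rows):
    """Plug-in, IPW, and AIPW sample averages for Delta_R using the true nuisances."""
    plug_in, ipw, aipw = [], [], []
    for x, a, s, y in rows:
        p = expit(s - 0.5)  # true surrogate score pi(s, x) in this model
        m1, m0 = mu(1, s, x), mu(0, s, x)
        plug_in.append(m1 - m0)
        ipw.append(a * y / p - (1 - a) * y / (1 - p))
        aipw.append(m1 - m0 + a * (y - m1) / p - (1 - a) * (y - m0) / (1 - p))
    return tuple(statistics.fmean(v) for v in (plug_in, ipw, aipw))


plug_in_est, ipw_est, aipw_est = identify_residual_effect(simulate(200_000))
print(plug_in_est, ipw_est, aipw_est)  # each should be close to Delta_R = 0.75
```

The inverse-weighted average is visibly noisier than the other two, which is one practical reason to prefer the augmented form.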

4.2. Proposed estimation

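This identification of $\Delta$ and $\Delta_R$ in terms of average treatment effects allows us to take advantage of the rich literature on robust estimation of these quantities. We propose to estimate $\Delta$ and $\Delta_R$ as

$$\hat{\Delta} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{\mu}_1(\mathbf{X}_i) - \hat{\mu}_0(\mathbf{X}_i) + \frac{A_i\{Y_i - \hat{\mu}_1(\mathbf{X}_i)\}}{\hat{\pi}(\mathbf{X}_i)} - \frac{(1-A_i)\{Y_i - \hat{\mu}_0(\mathbf{X}_i)\}}{1-\hat{\pi}(\mathbf{X}_i)}\right],$$

$$\hat{\Delta}_R = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{\mu}_1(\mathbf{S}_i,\mathbf{X}_i) - \hat{\mu}_0(\mathbf{S}_i,\mathbf{X}_i) + \frac{A_i\{Y_i - \hat{\mu}_1(\mathbf{S}_i,\mathbf{X}_i)\}}{\hat{\pi}(\mathbf{S}_i,\mathbf{X}_i)} - \frac{(1-A_i)\{Y_i - \hat{\mu}_0(\mathbf{S}_i,\mathbf{X}_i)\}}{1-\hat{\pi}(\mathbf{S}_i,\mathbf{X}_i)}\right],$$

and thus estimate $R$ as $\hat{R} = 1 - \hat{\Delta}_R/\hat{\Delta}$. We leave estimation of the components $\hat{\mu}_0(\mathbf{X})$, $\hat{\mu}_1(\mathbf{X})$, $\hat{\pi}(\mathbf{X})$, $\hat{\mu}_0(\mathbf{S},\mathbf{X})$, $\hat{\mu}_1(\mathbf{S},\mathbf{X})$, and $\hat{\pi}(\mathbf{S},\mathbf{X})$ quite general. We propose and evaluate two different versions of this estimator:

- “DR-SL,” which estimates these components using the Super Learner (Van der Laan and others, 2007), a method that finds an optimal combination of a set of candidate models or learners;
- “DR-lasso,” which estimates them using the relaxed lasso (Meinshausen, 2007).

In addition, our proposed estimators use a sample-splitting scheme (Chernozhukov and others, 2017) to avoid placing restrictive conditions on the estimation of the nuisance functions. We describe this scheme in detail in Appendix C of the Supplementary material available at Biostatistics online.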
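A compact sketch of the cross-fitted estimator follows. It is not the DR-SL or DR-lasso implementation; it rests on several assumptions made for illustration:

- ordinary least squares and unpenalized logistic regression (fit by Newton-Raphson in NumPy) stand in for the Super Learner or relaxed lasso;
- two folds are used;
- propensity and surrogate scores are clipped at 0.01, a threshold we chose;
- the data come from a hypothetical model in which every nuisance function is linear or logistic, with $\Delta = 2$, $\Delta_R = 0.75$, and $R = 0.625$.

The helper names `fit_linear`, `fit_logistic`, and `aipw_scores` are ours.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def design(Z):
    return np.column_stack([np.ones(len(Z)), Z])


def fit_linear(Z, y):
    """Ordinary least squares; a stand-in for a flexible outcome regression."""
    coef, *_ = np.linalg.lstsq(design(Z), y, rcond=None)
    return lambda Znew: design(Znew) @ coef


def fit_logistic(Z, t, n_iter=25):
    """Unpenalized logistic regression of t on Z by Newton-Raphson."""
    D = design(Z)
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = expit(D @ beta)
        hessian = D.T @ (D * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, D.T @ (t - p))
    return lambda Znew: expit(design(Znew) @ beta)


def aipw_scores(Z, A, Y, n_folds=2, seed=0, eps=0.01):
    """Cross-fitted AIPW pseudo-outcomes for E{mu_1(Z) - mu_0(Z)}."""
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    phi = np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Nuisances are fit on the other fold(s) and evaluated on the held-out fold.
        mu1 = fit_linear(Z[train & (A == 1)], Y[train & (A == 1)])
        mu0 = fit_linear(Z[train & (A == 0)], Y[train & (A == 0)])
        prop = fit_logistic(Z[train], A[train])
        p = np.clip(prop(Z[test]), eps, 1 - eps)  # truncated score estimates
        m1, m0 = mu1(Z[test]), mu0(Z[test])
        a, y = A[test], Y[test]
        phi[test] = m1 - m0 + a * (y - m1) / p - (1 - a) * (y - m0) / (1 - p)
    return phi


# Hypothetical data: Delta = 2, Delta_R = 0.75, R = 0.625.
rng = np.random.default_rng(1)
n = 20_000
X = rng.normal(size=n)
A = rng.binomial(1, expit(0.5 * X)).astype(float)
S = A + 0.5 * X + rng.normal(size=n)
Y = S + 0.5 * A + 0.5 * A * S + X + rng.normal(size=n)

phi_total = aipw_scores(X[:, None], A, Y)               # Delta: adjust for X only
phi_resid = aipw_scores(np.column_stack([S, X]), A, Y)  # Delta_R: adjust for (S, X)
delta_hat, delta_r_hat = phi_total.mean(), phi_resid.mean()
r_hat = 1 - delta_r_hat / delta_hat
print(round(delta_hat, 3), round(delta_r_hat, 3), round(r_hat, 3))
```

Both calls use the same fold assignment (the same `seed`), so $\hat{\Delta}$ and $\hat{\Delta}_R$ share sample splits. Swapping in more flexible learners only requires replacing `fit_linear` and `fit_logistic`.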

Remark 1.

$\Delta_R$ depends on the counterfactual quantities $\mathbf{S}^{(1)}$ and $\mathbf{S}^{(0)}$, so it is not the same target treatment effect estimated by Chernozhukov and others (2017). However, it is identified by the same observed-data quantities in (4.3). We can therefore use the same machinery as Chernozhukov and others (2017), though we require the stronger assumption (A.1).

4.3. Inference

Under very general conditions, including the standard causal assumptions identified in Section 3.1, Inline graphic will converge at the parametric $\sqrt{n}$ rate and be asymptotically normal (Chernozhukov and others, 2017; Farrell, 2015) so long as

graphic file with name Equation6.gif (4.4)

with similar results holding for Inline graphic if

graphic file with name Equation7.gif (4.5)
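Conditions (4.4) and (4.5) are the paper's own. For orientation, the standard sufficient conditions for cross-fitted AIPW estimators in the double machine learning literature (e.g., Chernozhukov and others, 2017) take the product-rate form below. We state them generically, with $\mathbf{W}$ a placeholder of our own ($\mathbf{W} = \mathbf{X}$ for $\hat{\Delta}$ and $\mathbf{W} = (\mathbf{S}, \mathbf{X})$ for $\hat{\Delta}_R$). They are not a transcription of (4.4) and (4.5). In particular, the first line holds whenever each nuisance estimator converges faster than $n^{-1/4}$.

```latex
% Generic cross-fitting conditions; norms are L2(P) norms over the distribution of W.
\begin{aligned}
&\|\hat{\pi}(\mathbf{W}) - \pi(\mathbf{W})\|_{2}\;
  \max_{a \in \{0,1\}} \|\hat{\mu}_a(\mathbf{W}) - \mu_a(\mathbf{W})\|_{2} = o_P(n^{-1/2}),\\
&\|\hat{\pi}(\mathbf{W}) - \pi(\mathbf{W})\|_{2} = o_P(1), \qquad
  \max_{a \in \{0,1\}} \|\hat{\mu}_a(\mathbf{W}) - \mu_a(\mathbf{W})\|_{2} = o_P(1),\\
&\epsilon \le \hat{\pi}(\mathbf{W}) \le 1 - \epsilon \quad \text{for some } \epsilon > 0 .
\end{aligned}
```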

As these estimators are built on the efficient influence functions, they are also semiparametrically efficient. Furthermore, as we show in Appendix C of the Supplementary material available at Biostatistics online, suppose the sample splits are the same for $\hat{\Delta}$ and $\hat{\Delta}_R$. Then $\hat{R}$ is also $\sqrt{n}$-consistent and asymptotically normal if both (4.4) and (4.5) hold. Specifically, we have that

graphic file with name Equation8.gif (4.6)
graphic file with name Equation9.gif (4.7)

where

graphic file with name Equation10.gif
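Suppose per-observation pseudo-outcomes for $\Delta$ and $\Delta_R$ have been computed on shared sample splits. A Wald interval for $\hat{R}$ then follows from the delta method applied to $R = 1 - \Delta_R/\Delta$, whose gradient is $(-1/\Delta)$ in $\Delta_R$ and $(\Delta_R/\Delta^2)$ in $\Delta$. The sketch below uses that standard construction. It illustrates how an interval of the form $\hat{R} \pm 1.96\,\widehat{\mathrm{se}}$ can be computed and is not a transcription of (4.6) and (4.7). The function name `pte_with_ci` is hypothetical.

```python
import math
from statistics import fmean


def pte_with_ci(phi_total, phi_resid, z=1.96):
    """R-hat = 1 - mean(phi_resid) / mean(phi_total) with a delta-method Wald interval.

    phi_total and phi_resid are per-observation (cross-fitted) AIPW pseudo-outcomes for
    Delta and Delta_R, computed on the same sample splits so that they can be paired.
    """
    n = len(phi_total)
    d, dr = fmean(phi_total), fmean(phi_resid)
    r = 1 - dr / d
    # Influence values of R-hat: gradient of 1 - Delta_R / Delta times centered pseudo-outcomes.
    infl = [-(pr - dr) / d + dr * (pt - d) / d ** 2 for pt, pr in zip(phi_total, phi_resid)]
    se = math.sqrt(sum(v * v for v in infl) / n / n)
    return r, se, (r - z * se, r + z * se)


r, se, ci = pte_with_ci([1.0, 3.0], [0.0, 1.0])
print(r, se, ci)
```

In practice, the two input lists would be the pseudo-outcome vectors from a cross-fitted estimator.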
Remark 2.

When the dimension of Inline graphic is not high, (4.4) and (4.5) are not in general restrictive, as many parametric and nonparametric methods are able to achieve the slow rates required of the estimators under certain conditions, including the lasso (Tibshirani, 1996), random forests (Wager and Walther, 2015), and deep neural networks (Farrell and others, 2021), when used with a sample-splitting scheme. Ensemble methods such as the Super Learner (Van der Laan and others, 2007) may also be used to combine these and other methods to obtain good performance if any of the methods achieves the required rates of consistency. As described in the prior section, our proposed approach combines both ensemble estimation of nuisance functions and sample splitting; we demonstrate the good performance of this approach in finite samples in Section 5.

In higher dimensions, the convergence in (4.4) and (4.5) is more difficult to ensure without further restrictions. If only Inline graphic is high-dimensional, then a typical approach is to assume that Inline graphic may be specified as a sparse linear model Inline graphic and Inline graphic may be specified as a sparse logistic regression model Inline graphic for Inline graphic, and with Inline graphic sparse enough to ensure (4.4). Because Inline graphic is related to Inline graphic, restricting Inline graphic to this class of models is unproblematic so long as Inline graphic is low-dimensional and Inline graphic may be estimated nonparametrically (i.e., without being restricted to the class of sparse linear models). However, if Inline graphic is also high-dimensional, sparse logistic models for Inline graphic and Inline graphic may not in general be compatible with one another because of the well-known noncollapsibility of logistic regression (Guo and Geng, 1995). The exceptions are when Inline graphic (which would imply Inline graphic) or Inline graphic (which would imply that Inline graphic does not confound the relationship between Inline graphic and Inline graphic). In these cases, the approach of Tan (2020) may be used to estimate the nuisance functions; it requires only one of Inline graphic and Inline graphic to be a correctly specified logistic regression. However, this approach did not appear to perform well in our simulations in Section 5. Despite these theoretical considerations, our simulations suggest that an ensemble approach using the Super Learner may outperform approaches based on sparse linear models, such as Tan (2020) or the version of our estimator that uses only lasso regression. More theoretical development in this area may be warranted.
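The noncollapsibility point can be seen numerically. In the hypothetical setup below, $\pi(s, x)$ is exactly logistic, $\mathrm{logit}\,\pi(s,x) = ax + bs$. The implied propensity score $\pi(x) = E\{\mathrm{expit}(ax + bS)\}$ is then computed by numerical integration. This setup is ours, and taking $S \sim N(0,1)$ independent of $X$ is an artificial simplification that keeps the integral one-dimensional. If $\mathrm{logit}\,\pi(x)$ were linear in $x$, its second difference at equally spaced points would be zero. The function names are illustrative.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def implied_propensity(x, a_coef, b_coef, grid=4001, lim=8.0):
    """pi(x) = E{expit(a*x + b*S)} with S ~ N(0, 1), by a Riemann sum on [-lim, lim].

    Hypothetical setup: logit pi(s, x) = a*x + b*s is exactly logistic in (s, x),
    and S is independent of X only to keep the integral one-dimensional.
    """
    h = 2 * lim / (grid - 1)
    total = 0.0
    for k in range(grid):
        s = -lim + k * h
        total += expit(a_coef * x + b_coef * s) * math.exp(-0.5 * s * s)
    return total * h / math.sqrt(2 * math.pi)


def curvature_of_logit(b_coef, a_coef=1.0, xs=(0.0, 2.0, 4.0)):
    """Second difference of logit pi(x); zero iff logit pi(x) is linear at these points."""
    l0, l1, l2 = (logit(implied_propensity(x, a_coef, b_coef)) for x in xs)
    return l2 - 2 * l1 + l0


for b in (0.0, 1.0, 2.0):
    print(b, round(curvature_of_logit(b), 4))
```

With $b = 0$, the second difference is zero up to rounding. With $b = 2$, it is clearly positive, so no model that is logistic-linear in $x$ alone reproduces the implied propensity score exactly.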

5. Simulations

5.1. Simulation overview

Our proposed estimator based on the Super Learner (“DR-SL”) is implemented using the SuperLearner package (Polley and others, 2021) in R. For the outcome regressions, the candidate learners were the lasso, ridge regression, ordinary least squares, support vector machines, and random forests. For the propensity score $\pi(\mathbf{X})$ and the surrogate score $\pi(\mathbf{S},\mathbf{X})$, they were the lasso, logistic regression, linear discriminant analysis, quadratic discriminant analysis, support vector machines, and random forests. Our second estimator, using the relaxed lasso (“DR-lasso”), is implemented using the glmnet package (Friedman and others, 2010) to estimate all needed functions. For all simulations, we truncated estimates of the propensity and surrogate scores so that Inline graphic for all Inline graphic. For both the Super Learner and the DR-lasso estimator, we used the default cross-validation procedures to select tuning parameters. Performance could possibly be improved by further tuning of the candidate learners.

To our knowledge, no methods are currently available to robustly estimate the PTE of a high-dimensional surrogate. However, methods are available to measure the strength of high-dimensional mediators, so we compare our proposed approach to these. The goals of mediation analysis and surrogate marker analysis differ, as do the necessary assumptions. Nevertheless, this offers a reasonable comparison with currently available methods rather than no comparison at all. To make the comparison fair, we set up the majority of our simulations such that $\mathbf{S}$ is a mediator, in that it lies on the causal pathway between $A$ and $Y$. Mediation methods target a quantity distinct from $R$. However, in three of the four sets of simulations that follow, the “proportion mediated” (the quantity used in mediation) and $R$ coincide.

In our simulations, we compare our proposed approach to three methods:

- high-dimensional mediation analysis (HIMA, Zhang and others, 2016), as implemented in the HIMA package (Zheng and others, 2018);
- Bayesian mediation analysis (BAMA, Song and others, 2020), as implemented in the bama package (Rix and Song, 2021);
- high-dimensional linear mediation analysis (Zhou and others, 2020), as implemented in the freebird package.

All three of these methods posit $p + 1$ linear models. There are $p$ models for the surrogates/mediators, $S_j = \alpha_{0j} + \alpha_j A + \boldsymbol{\gamma}_j^\top \mathbf{X} + \varepsilon_j$ for $j = 1, \dots, p$, and one outcome model, $Y = \beta_0 + \beta_A A + \sum_{j=1}^{p} \theta_j S_j + \boldsymbol{\beta}_X^\top \mathbf{X} + \varepsilon$. The freebird implementation, however, does not allow for the inclusion of covariates $\mathbf{X}$. Using these models, the overall treatment effect can be identified as $\Delta = \beta_A + \sum_{j=1}^{p} \alpha_j \theta_j$. As above, the PTE (or proportion mediated) may then be identified as $R = 1 - \beta_A/\Delta$.

We computed 95% confidence intervals (CIs) for $R$ as $\hat{R} \pm 1.96\,\hat{\sigma}/\sqrt{n}$, where $\hat{\sigma}^2$ is an estimate of the asymptotic variance in (4.6). We computed 95% credible intervals for the BAMA estimator from the 2.5% and 97.5% quantiles of the posterior distribution of $R$. CIs were not computed for HIMA and freebird because their implementations do not provide them.

5.2. Simulation setup

We constructed four sets of simulations for a total of 18 simulation settings to assess the performance of our proposed approach in both low- and high-dimensional settings.

For the first set of simulations, the data-generating mechanism for Inline graphic was linear in Inline graphic, the data-generating mechanism for Inline graphic was linear in Inline graphic and Inline graphic, and the propensity score was linear on the log-odds scale. Given this data-generating mechanism, we would expect that all methods (proposed and comparisons) should perform reasonably well. We set the dimension of Inline graphic (Inline graphic) and of Inline graphic (Inline graphic) to be 100. We let Inline graphic, where Inline graphic. The surrogates were generated as Inline graphic, where Inline graphic. And only five of the covariates were important for determining the surrogate: Inline graphic. The outcome counterfactuals were given by Inline graphic, suggesting that only the first two surrogates and 25 of the covariates were important for determining the outcome. The errors Inline graphic and Inline graphic were Inline graphic. We let the sample size vary between Inline graphic and Inline graphic and the level of noise between Inline graphic and Inline graphic—for a total of four settings—and we set Inline graphic

In the second set of simulations (six more settings), the data-generating mechanism was less linear, including interactions between covariates in both the propensity score and the model for Inline graphic, though the outcome model was still a simple linear combination of Inline graphic and Inline graphic. In the third set of simulations (two more settings), the dimension of the surrogates (Inline graphic) was much larger than the sample size (Inline graphic), and the correctly specified outcome functions also included nonlinear terms. These simulations are intended to mimic what may happen in practice, since all models are typically subject to some degree of mis-specification. We expected the nonlinearity to induce bias in the competing methods, which require linear models to hold. In the fourth set of simulations (six more settings), we specified Inline graphic so that it was not a mediator but was instead downstream of a true mediator. We performed 1000 replications for each simulation setting. We give full details and results for these simulations in Appendix E of the Supplementary material available at Biostatistics online.

5.3. Results

As expected, the first data-generating mechanism was well approximated by linear models. When the sample size was large (Inline graphic) and Inline graphic, all methods performed well, with relatively small bias (see Figure 2). The median Inline graphic estimates were 0.46 (BAMA), 0.48 (freebird), 0.49 (HIMA), 0.48 (DR-SL), and 0.49 (DR-lasso), all close to the true Inline graphic value of 0.5. The distribution of estimates for the proposed approaches was somewhat tighter than for the competing methods: the empirical 2.5% and 97.5% quantiles were 0.32 and 0.60 for DR-SL, 0.35 and 0.59 for DR-lasso, 0.15 and 0.67 for HIMA, 0.11 and 0.63 for BAMA, and −0.85 and 0.71 for freebird. Results were similar for the same sample size with less noise (Inline graphic), except that freebird began to estimate the PTE to be exactly 1 in almost all simulations: the empirical 2.5% and 97.5% quantiles of its Inline graphic distribution were both exactly 1. At the lower sample size (Inline graphic), HIMA did not run, BAMA estimates tended to be near 0, and freebird estimates were again clustered tightly around 1. Our proposed approaches had median Inline graphic values of 0.45 (DR-SL) and 0.56 (DR-lasso), though with more variability than when Inline graphic. The median absolute deviation (Inline graphic) was smaller for both DR-SL and DR-lasso than for any of the competing approaches in all settings.
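The summaries reported above (medians, empirical 2.5% and 97.5% quantiles, and the median absolute deviation) can be computed from the replicate estimates as sketched below. We assume the median absolute deviation is taken around the true value, that is, the median of |estimate − truth| across replications. The paper's exact definition was lost in extraction, and the helper name is hypothetical.

```python
from statistics import median, quantiles


def summarize_replications(estimates, truth):
    """Summaries of simulation estimates of the PTE across replications."""
    cuts = quantiles(estimates, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return {
        "median": median(estimates),
        "q025": cuts[0],
        "q975": cuts[-1],
        # Assumed definition: median absolute deviation from the true value.
        "mad_truth": median(abs(e - truth) for e in estimates),
    }
```

With 1000 replications per setting, as in the simulations above, the empirical quantiles are interpolated between neighboring order statistics; with very few replications the default method extrapolates beyond the observed range.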

Fig. 2.

Distribution of estimates of Inline graphic in the first data-generating mechanism. Lighter shaded regions represent the distribution of the proposed estimators (“DR-SL” and “DR-lasso”); darker shaded regions represent the distribution of the comparison estimators (“BAMA,” “freebird,” and “HIMA”). The true value of Inline graphic is given as a vertical dotted line at 0.5. At the lower sample size (Inline graphic), HIMA did not run and so is not shown.

CI coverage for DR-SL was close to nominal in all settings: 95% (Inline graphic), 96% (Inline graphic), 94% (Inline graphic), and 97% (Inline graphic). Coverage for the DR-lasso estimator was generally worse: 88% (Inline graphic), 82% (Inline graphic), 89% (Inline graphic), and 96% (Inline graphic).
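Empirical coverage is the fraction of replications whose interval contains the true value, and interval width is conveniently summarized by the average half-length. A minimal sketch, with hypothetical helper names and made-up intervals for illustration:

```python
def coverage(intervals, truth):
    """Fraction of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lo, hi in intervals if lo <= truth <= hi)
    return hits / len(intervals)


def mean_half_length(intervals):
    """Average half-width (upper - lower) / 2 across intervals."""
    return sum((hi - lo) / 2 for lo, hi in intervals) / len(intervals)


intervals = [(0.3, 0.7), (0.1, 0.4), (0.45, 0.9), (0.5, 0.6)]
print(coverage(intervals, truth=0.5))  # 0.75
```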

BAMA CIs also did not uniformly attain nominal coverage: 96% (Inline graphic), 99% (Inline graphic), 92% (Inline graphic), and 85% (Inline graphic). Even when BAMA CIs had nominal coverage, they were much wider than the CIs for the proposed estimators: for example, BAMA CI half-lengths were 1.40 (Inline graphic) and 1.61 (Inline graphic), while the corresponding half-lengths were 0.25 and 0.30 for DR-SL and 0.28 and 0.44 for DR-lasso. That is, BAMA half-lengths were more than five times those of DR-SL and more than three times those of DR-lasso.
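The width comparison can be checked directly from the half-lengths quoted above by dividing each BAMA half-length by the corresponding half-length for each proposed estimator:

```python
# Half-lengths quoted in the text for the two settings compared.
bama = [1.40, 1.61]
proposed = {"DR-SL": [0.25, 0.30], "DR-lasso": [0.28, 0.44]}

# Ratio of BAMA half-length to each proposed estimator's half-length, per setting.
ratios = {name: [b / p for b, p in zip(bama, halves)] for name, halves in proposed.items()}
for name, r in ratios.items():
    print(name, [round(v, 2) for v in r])
# DR-SL [5.6, 5.37]
# DR-lasso [5.0, 3.66]
```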

The proposed estimators also outperformed the competing methods in the additional simulation settings, including those in which models were mis-specified or the dimension of the surrogates was much larger than the sample size. See Appendix E of the Supplementary material available at Biostatistics online for complete results.

6. Ebola immune response application

The concentration of binding antibodies is often used as the primary outcome in studies of Ebola vaccine efficacy, and is itself a surrogate for vaccine efficacy as measured by the effect on the incidence of infection (Roozendaal and others, 2020). Because gene expression is the process by which DNA is transcribed into RNA and ultimately translated into proteins, it is closely associated with cellular function. Thus, the establishment of the humoral immune response may be captured by changes in gene expression, as suggested by early work in systems vaccinology (Li and others, 2014). Furthermore, changes in gene expression may occur days or even weeks before changes in traditional measures of immune function (Rechtien and others, 2017). If gene expression can act as a surrogate for immune response, then it could be used to shorten vaccine trials or to quickly measure the effect of vaccination in a population. Finally, genome-wide expression data offer the opportunity to examine the many different pathways through which an intervention may influence the response, each of which constitutes a potential surrogate marker. In this study, we sought to use observational data on long-term Ebola survivors and healthy controls to assess whether gene expression could serve as a surrogate for the antibody response to Ebola virus. This aim is inspired by the study of potential surrogates of protection among Ebola disease survivors (Sullivan and others, 2009).

In total, 26 survivors of the 2013–2016 Ebola outbreak in West Africa were recruited from the Postebogui cohort (Etard and others, 2017), along with 33 healthy donors as described in Wiedemann and others (2020). For each participant, expression of 29 624 genes was quantified from whole-blood RNA-seq (freely available from the Gene Expression Omnibus repository under accession code GSE143549), and the concentration of Ebola-specific binding antibodies was measured. This is clearly a setting in which the number of potential surrogate markers (the genes) is substantially larger than the sample size. Given the superior performance of DR-SL in our simulation study and the high-dimensional setting, we focus only on the DR-SL estimator to quantify how much of the overall difference in humoral immune response between survivors and healthy donors could be captured by the measured gene expression data. We used the same candidate learners as specified in Section 5. Propensity and surrogate scores (Inline graphic and Inline graphic) were truncated at 0.05 and 0.95 to prevent instability due to extreme weights, and the covariates Inline graphic included age and sex.
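Truncation bounds the inverse weights: a score clipped to [0.05, 0.95] yields weights 1/p and 1/(1 − p) no larger than 20. The sketch below, with a hypothetical helper name, clips a list of estimated scores and also reports the fraction that had to be clipped. That fraction is a rough diagnostic for near-violations of positivity.

```python
def truncate_scores(scores, lower=0.05, upper=0.95):
    """Clip estimated propensity/surrogate scores into [lower, upper].

    Returns the clipped scores and the fraction of scores that were clipped.
    """
    clipped = [min(max(p, lower), upper) for p in scores]
    n_truncated = sum(1 for p in scores if p < lower or p > upper)
    return clipped, n_truncated / len(scores)


clipped, frac = truncate_scores([0.01, 0.20, 0.50, 0.97, 0.999, 0.60])
print(clipped, round(frac, 2))  # [0.05, 0.2, 0.5, 0.95, 0.95, 0.6] 0.5
```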

Ebola survivors were estimated to have a much higher abundance of Ebola-specific antibodies than healthy donors (Inline graphic, SE = 851.5). Using DR-SL, the residual treatment effect, Inline graphic, was estimated as Inline graphic (SE = 727.8), and the proportion of the difference explained by gene expression was estimated as Inline graphic (SE = 0.07923). Thus, a large part of the difference in humoral immune response is not explained by differences in gene expression. Of note, we assumed no unmeasured confounding, although this cannot be guaranteed in such a real-life context, where survivors and healthy volunteers are two selected populations. If unmeasured confounding inflated both Inline graphic and Inline graphic by roughly the same amount, this could artificially deflate Inline graphic. Another explanation for the low estimated Inline graphic could be that, while gene expression measured shortly after infection may be a good surrogate, gene expression measured long after infection (as in this study) may not capture the treatment effect as well. Measurement error in Inline graphic could also deflate the PTE. Importantly, another potential violation of our assumptions is that about one-third of the observations have truncated surrogate scores (Inline graphic or Inline graphic), which suggests that positivity might be (nearly) violated. We discuss this further, along with a potential solution, in Appendix C of the Supplementary material available at Biostatistics online.
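The point that equal additive confounding bias deflates the PTE can be checked with a short calculation. This assumes the PTE is defined as 1 − Δ_S/Δ, where Δ is the overall effect and Δ_S the residual effect. If both are inflated by the same amount c > 0 and 0 < Δ_S < Δ, then (Δ_S + c)/(Δ + c) − Δ_S/Δ = c(Δ − Δ_S)/{Δ(Δ + c)} > 0, so the PTE shrinks. The numbers below are hypothetical, not the Ebola estimates.

```python
def pte(delta, delta_s):
    """Proportion of treatment effect explained: 1 - residual effect / overall effect."""
    return 1.0 - delta_s / delta


# Hypothetical values, not the Ebola estimates.
delta, delta_s = 10.0, 6.0
bias = 5.0  # the same additive bias applied to both effects
print(round(pte(delta, delta_s), 3))                # 0.4
print(round(pte(delta + bias, delta_s + bias), 3))  # 0.267
```

By contrast, scaling both effects by the same factor would leave their ratio, and hence the PTE, unchanged.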

7. Discussion

We have proposed a general approach to evaluating surrogate markers that can be applied in randomized experiments or in observational studies, regardless of the dimensionality of the surrogates. Our approach is robust in that we have defined the PTE of the surrogates without reference to any models, and we have shown how machine learning approaches such as Super Learner may be used to estimate nuisance functions flexibly. Our simulation results suggest that our Super Learner estimator (DR-SL) outperforms competing methods even when the underlying data-generating mechanism is linear, and that it still gives reasonable results when high-dimensional linear models are mis-specified.
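To illustrate how flexibly estimated nuisance functions enter a doubly robust estimator with sample splitting, the sketch below implements a generic cross-fitted augmented inverse-probability-weighted (AIPW) estimator of the average treatment effect. It is not the paper's estimator of the residual effect. It rests on simplifying assumptions: a single discrete covariate, cell means standing in for Super Learner fits, folds assigned by row index, and every (covariate, treatment) cell present in each training fold. The function names are hypothetical.

```python
from collections import defaultdict


def cell_means(rows, key, value):
    """Mean of value(row) within each group defined by key(row)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[key(r)] += value(r)
        counts[key(r)] += 1
    return {k: sums[k] / counts[k] for k in sums}


def crossfit_aipw(rows, n_folds=2, bounds=(0.05, 0.95)):
    """Cross-fitted AIPW estimate of the average treatment effect E[Y(1) - Y(0)].

    rows: list of (x, a, y) tuples with a discrete covariate x and binary a.
    Nuisances are fit on the other folds and evaluated on the held-out fold;
    every (x, a) cell must appear in each training fold, or a KeyError results.
    """
    scores = []
    for k in range(n_folds):
        train = [r for i, r in enumerate(rows) if i % n_folds != k]
        held_out = [r for i, r in enumerate(rows) if i % n_folds == k]
        # Propensity e(x) = P(A = 1 | X = x) and outcome regressions mu_a(x),
        # estimated by cell means (stand-ins for flexible learners).
        e_hat = cell_means(train, key=lambda r: r[0], value=lambda r: r[1])
        mu_hat = cell_means(train, key=lambda r: (r[0], r[1]), value=lambda r: r[2])
        for x, a, y in held_out:
            e = min(max(e_hat[x], bounds[0]), bounds[1])  # truncated propensity score
            mu1, mu0 = mu_hat[(x, 1)], mu_hat[(x, 0)]
            # AIPW score: outcome-model contrast plus inverse-weighted residuals.
            scores.append(mu1 - mu0 + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1 - e))
    return sum(scores) / len(scores)


rows = [(x, a, 2 * a + x) for x in (0, 1) for a in (0, 1) for _ in range(4)]
print(crossfit_aipw(rows))  # 2.0
```

Fitting the nuisances on one part of the data and evaluating them on another is what allows flexible learners to be plugged in without requiring strong complexity conditions on the nuisance estimates.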

Our approach is closely related to previous approaches for evaluating surrogate markers. The approaches in Agniel and Parast (2021) and in Parast and others (2016) are similar to a version of our approach in which the average treatment effect among controls is estimated under additional assumptions. Agniel and Parast (2021) additionally require strict randomization of treatment and assume that Inline graphic is a realization of a smooth continuous function. Parast and others (2016) make similar assumptions but take Inline graphic to be a scalar surrogate. They estimate a version of (4.2) among controls, Inline graphic, using kernel smoothing and taking advantage of the fact that treatment is randomized. Their estimates have the form Inline graphic, where Inline graphic and Inline graphic is estimated via kernel smoothing (possibly after dimension reduction). Our approach could easily be adapted to estimate a similar quantity using methods for doubly robust estimation of the average treatment effect on the treated (Shu and Tan, 2018; Moodie and others, 2018; Chernozhukov and others, 2017), for example, Inline graphic, where Inline graphic may be estimated using a model similar to the one used for Inline graphic. Furthermore, our approach could be used to facilitate the use of machine learning methods in the estimation of Inline graphic, to include covariates to control for confounding, or to simplify or strengthen asymptotic results. For example, results for the kernel estimator used in Agniel and Parast (2021) obtain rates of convergence of Inline graphic only under restrictive technical conditions, whereas with sample splitting, parametric rates of convergence could be obtained under much more general conditions.
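The kernel-smoothing form used by Parast and others (2016) for a scalar surrogate in a randomized study can be sketched as follows. The overall effect is the difference in group means. The residual effect averages a Nadaraya–Watson regression of treated outcomes on the surrogate over the control group's surrogate values, minus the control mean. This is a minimal sketch assuming a Gaussian kernel and a user-supplied bandwidth. It omits bandwidth selection, dimension reduction, and variance estimation, and the function name is hypothetical.

```python
import math


def kernel_residual_effect(s1, y1, s0, y0, bandwidth):
    """Kernel-smoothing estimates of the overall effect, residual effect, and PTE.

    s1, y1: surrogate and outcome values in the treated group.
    s0, y0: surrogate and outcome values in the control group.
    """
    def mu1(s):
        # Nadaraya-Watson regression of treated outcomes on the surrogate,
        # evaluated at s with a Gaussian kernel (fails if all weights underflow).
        w = [math.exp(-0.5 * ((si - s) / bandwidth) ** 2) for si in s1]
        return sum(wi * yi for wi, yi in zip(w, y1)) / sum(w)

    mean_y0 = sum(y0) / len(y0)
    delta = sum(y1) / len(y1) - mean_y0
    # Residual effect: treated-group regression averaged over control surrogate values.
    delta_s = sum(mu1(s) for s in s0) / len(s0) - mean_y0
    return delta, delta_s, 1.0 - delta_s / delta
```

If treated outcomes do not depend on the surrogate at all, the smoothed regression is flat, the residual effect equals the overall effect, and the estimated PTE is zero.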

Supplementary Material

kxac020_Supplementary_Data

Acknowledgments

We thank Yves Lévy and all of the investigators at the Vaccine Research Institute for providing the Ebola survivor data. We thank the Postebogui team for their daily work and the survivors. This work was supported by grant R01 DK118354 from the National Institute of Diabetes and Digestive and Kidney Diseases as well as by the Investissements d’Avenir program managed by the ANR under reference ANR-10-LABX-77-01.

Conflict of Interest: None declared.

Contributor Information

Denis Agniel, RAND Corporation, 1776 Main St. Santa Monica, CA, 90401, USA.

Boris P Hejblum, Univ. Bordeaux, INSERM, INRIA, BPH, U1219, SISTM, F-33000 Bordeaux, France and Vaccine Research Institute, F-94000 Créteil, France.

Rodolphe Thiébaut, Univ. Bordeaux, INSERM, INRIA, BPH, U1219, SISTM, F-33000 Bordeaux, France, CHU de Bordeaux, Service d’Information médicale, F-33000 Bordeaux, France and Vaccine Research Institute, F-94000 Créteil, France.

Layla Parast, University of Texas at Austin, Department of Statistics and Data Sciences, 3925 West Braker Lane, Austin, TX 78759, USA.

8. Software

We provide software implementing our proposed methods in the R package crossurr, available at github.com/denisagniel/crossurr.

Supplementary material

Supplementary material is available online at http://biostatistics.oxfordjournals.org.

References

  1. Agniel, D. and Parast, L. (2021). Evaluation of longitudinal surrogate markers. Biometrics 77, 477–489.
  2. Alonso, A., Molenberghs, G. and others (2004). Prentice’s approach and the meta-analytic paradigm: a reflection on the role of statistics in the evaluation of surrogate endpoints. Biometrics 60, 724–728.
  3. Alonso, A., Van der Elst, W., Molenberghs, G., Buyse, M. and Burzykowski, T. (2016). An information-theoretic approach for the evaluation of surrogate endpoints based on causal inference. Biometrics 72, 669–677.
  4. Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–973.
  5. Caveney, E. J. and Cohen, O. J. (2011). Diabetes and biomarkers. Journal of Diabetes Science and Technology 5, 192–197.
  6. Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C. and Newey, W. (2017). Double/debiased/Neyman machine learning of treatment effects. American Economic Review 107, 261–265.
  7. Choi, S. H., Kim, T. H., Lim, S., Park, K. S., Jang, H. C. and Cho, N. H. (2011). Hemoglobin A1c as a diagnostic tool for diabetes screening and new-onset diabetes prediction: a 6-year community-based prospective study. Diabetes Care 34, 944–949.
  8. Conlon, A., Taylor, J., Li, Y., Diaz-Ordaz, K. and Elliott, M. (2017). Links between causal effects and causal association for surrogacy evaluation in a Gaussian setting. Statistics in Medicine 36, 4243–4265.
  9. Etard, J.-F., Sow, M. S., Leroy, S., Touré, A., Taverne, B., Keita, A. K., Msellati, P., Baize, S., Raoul, H., Izard, S. and Kpamou, C. (2017). Multidisciplinary assessment of post-Ebola sequelae in Guinea (Postebogui): an observational cohort study. The Lancet Infectious Diseases 17, 545–552.
  10. Farrell, M. H. (2015). Robust inference on average treatment effects with possibly more covariates than observations. Journal of Econometrics 189, 1–23.
  11. Farrell, M. H., Liang, T. and Misra, S. (2021). Deep neural networks for estimation and inference. Econometrica 89, 181–213.
  12. Freedman, L. S., Graubard, B. I. and Schatzkin, A. (1992). Statistical validation of intermediate endpoints for chronic diseases. Statistics in Medicine 11, 167–178.
  13. Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1–22.
  14. Gilbert, P. B. and Hudgens, M. G. (2008). Evaluating candidate principal surrogate endpoints. Biometrics 64, 1146–1154.
  15. Guo, J. and Geng, Z. (1995). Collapsibility of logistic regression coefficients. Journal of the Royal Statistical Society: Series B (Methodological) 57, 263–267.
  16. Joffe, M. M. and Greene, T. (2009). Related causal frameworks for surrogate outcomes. Biometrics 65, 530–538.
  17. Karim, S. S. A. (2021). Vaccines and SARS-CoV-2 variants: the urgent need for a correlate of protection. The Lancet 397, 1263–1264.
  18. Li, S., Rouphael, N. and others (2014). Molecular signatures of antibody responses derived from a systems biology study of five human vaccines. Nature Immunology 15, 195–204.
  19. Meinshausen, N. (2007). Relaxed lasso. Computational Statistics & Data Analysis 52, 374–393.
  20. Moodie, E. E. M., Saarela, O. and Stephens, D. A. (2018). A doubly robust weighting estimator of the average treatment effect on the treated. Stat 7, e205.
  21. Obirikorang, C., Quaye, L. and Acheampong, I. (2012). Total lymphocyte count as a surrogate marker for CD4 count in resource-limited settings. BMC Infectious Diseases 12, 1–5.
  22. Parast, L., Cai, T. and Tian, L. (2019). Using a surrogate marker for early testing of a treatment effect. Biometrics 75, 1253–1263.
  23. Parast, L., McDermott, M. M. and Tian, L. (2016). Robust estimation of the proportion of treatment effect explained by surrogate marker information. Statistics in Medicine 35, 1637–1653.
  24. Parast, L., Tian, L. and Cai, T. (2020). Assessing the value of a censored surrogate outcome. Lifetime Data Analysis 26, 245–265.
  25. Plotkin, S. A. and Gilbert, P. B. (2012). Nomenclature for immune correlates of protection after vaccination. Clinical Infectious Diseases 54, 1615–1617.
  26. Polley, E., LeDell, E., Kennedy, C., Lendle, S. and van der Laan, M. (2021). SuperLearner: Super Learner Prediction. R package version 2.0-28. https://CRAN.R-project.org/package=SuperLearner.
  27. Prentice, R. L. (1989). Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 8, 431–440.
  28. Price, B. L., Gilbert, P. B. and van der Laan, M. J. (2018). Estimation of the optimal surrogate based on a randomized trial. Biometrics 74, 1271–1281.
  29. Rechtien, A., Richert, L. and others (2017). Systems vaccinology identifies an early innate immune signature as a correlate of antibody responses to the Ebola vaccine rVSV-ZEBOV. Cell Reports 20, 2251–2261.
  30. Rix, A., Kleinsasser, M. and Song, Y. (2021). BAMA: High Dimensional Bayesian Mediation Analysis. R package version 1.2. https://CRAN.R-project.org/package=bama.
  31. Roozendaal, R., Hendriks, J. and others (2020). Nonhuman primate to human immunobridging to infer the protective effect of an Ebola virus vaccine candidate. NPJ Vaccines 5, 1–11.
  32. Shu, H. and Tan, Z. (2018). Improved estimation of average treatment effects on the treated: local efficiency, double robustness, and beyond. arXiv preprint arXiv:1808.01408.
  33. Small, G. W. (2006). Diagnostic issues in dementia: neuroimaging as a surrogate marker of disease. Journal of Geriatric Psychiatry and Neurology 19, 180–185.
  34. Song, Y., Zhou, X., Zhang, M., Zhao, W., Liu, Y., Kardia, S. L., Roux, A. V. D., Needham, B. L., Smith, J. A. and Mukherjee, B. (2020). Bayesian shrinkage estimation of high dimensional causal mediation effects in omics studies. Biometrics 76, 700–710.
  35. Sullivan, N. J., Martin, J. E., Graham, B. S. and Nabel, G. J. (2009). Correlates of protective immunity for Ebola vaccines: implications for regulatory approval by the animal rule. Nature Reviews Microbiology 7, 393–400.
  36. Tan, Z. (2020). Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data. The Annals of Statistics 48, 811–837.
  37. Taylor, J. M. G., Wang, Y. and Thiébaut, R. (2005). Counterfactual links to the proportion of treatment effect explained by a surrogate marker. Biometrics 61, 1102–1111.
  38. Teixeira, A. L., Diniz, B. S., Campos, A. C., Miranda, A. S., Rocha, N. P., Talib, L. L., Gattaz, W. F. and Forlenza, O. V. (2013). Decreased levels of circulating adiponectin in mild cognitive impairment and Alzheimer’s disease. Neuromolecular Medicine 15, 115–121.
  39. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 267–288.
  40. van der Laan, M. J., Polley, E. C. and Hubbard, A. E. (2007). Super Learner. Statistical Applications in Genetics and Molecular Biology 6, 1–23.
  41. VanderWeele, T. J. (2013). Surrogate measures and consistent surrogates. Biometrics 69, 561–565.
  42. Wager, S. and Walther, G. (2015). Adaptive concentration of regression trees, with application to random forests. arXiv preprint arXiv:1503.06388.
  43. Wang, S., McCormick, T. H. and Leek, J. T. (2020). Methods for correcting inference based on outcomes predicted by machine learning. Proceedings of the National Academy of Sciences 117, 30266–30275.
  44. Wang, Y. and Taylor, J. M. G. (2002). A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics 58, 803–812.
  45. Wiedemann, A., Foucat, E. and others (2020). Long-lasting severe immune dysfunction in Ebola virus disease survivors. Nature Communications 11, 1–11.
  46. Zhang, H., Zheng, Y. and others (2016). Estimating and testing high-dimensional mediation effects in epigenetic studies. Bioinformatics 32, 3150–3154.
  47. Zheng, Y., Zhang, H., Hou, L. and Liu, L. (2018). HIMA: High-Dimensional Mediation Analysis. R package version 1.0.7.
  48. Zhou, R. R., Wang, L. and Zhao, S. D. (2020). Estimation and inference for the indirect effect in high-dimensional linear mediation models. Biometrika 107, 573–589.
  49. Zhou, R. R., Zhao, S. D. and Parast, L. (2022). Estimation of the proportion of treatment effect explained by a high-dimensional surrogate. Statistics in Medicine 41, 2227–2246.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
