Biostatistics (Oxford, England). 2014 Sep 17;16(2):400–412. doi: 10.1093/biostatistics/kxu043

Surrogacy marker paradox measures in meta-analytic settings

Michael R Elliott 1,*, Anna SC Conlon 1, Yun Li 1, Nico Kaciroti 1, Jeremy MG Taylor 1
PMCID: PMC4366594  PMID: 25236906

Abstract

Because of the time and expense required to obtain clinical outcomes of interest, such as functional limitations or death, clinical trials often focus on the effects of treatment on earlier and more easily obtained surrogate markers. Preliminary work to define surrogates focused on the fraction of a treatment effect “explained” by a marker in a regression model, but as notions of causality have been formalized in the statistical setting, formal definitions of high-quality surrogate markers have been developed in the causal inference framework, using either the “causal effect” or “causal association” settings. In the causal effect setting, high-quality surrogate markers have a large fraction of the total treatment effect explained by the effect of the treatment on the marker net of the treatment on the outcome. In the causal association setting, high-quality surrogate markers have large treatment effects on the outcome when there are large treatment effects on the marker, and small effects on the outcome when there are small effects on the marker. A particularly important feature of a surrogate marker is that the direction of a treatment effect be the same for both the marker and the outcome. Settings in which the marker and outcome are positively associated but the marker and outcome have beneficial and harmful or harmful and beneficial treatment effects, respectively, have been referred to as “surrogate paradoxes”. If this outcome always occurs, it is not problematic; however, as correlations among the outcome, marker, and their treatment effects weaken, it may occur for some trials and not for others, leading to potentially incorrect conclusions, and real-life examples that shortened thousands of lives are unfortunately available. 
We propose measures for assessing the risk of the surrogate paradox using the meta-analytic causal association framework, which allows us to focus on the probability that a given treatment will yield treatment effects in different directions for the marker and the outcome, and to determine the size of a beneficial effect of the treatment on the marker required to minimize the risk of a harmful effect of the treatment on the outcome. We provide simulations and consider two applications.

Keywords: Causal inference, Surrogate marker, Surrogate paradox

1. Introduction

In order to reduce costs and speed clinical research, determining good surrogate markers for clinical outcomes is of increasing interest. “Good” surrogate markers essentially allow one to understand how a given treatment would impact an outcome, but in a less expensive and more timely fashion. Examples include CD4 counts in blood as a marker for AIDS morbidity and mortality, blood pressure as a marker for risk of heart attack, and tumor progression as a marker for survival in cancer. Prentice (1989) provided one of the first formalized approaches. He suggested that good surrogates would have the property of being highly related to the outcome, and, when put into a regression model with treatment, would eliminate the coefficient associated with the treatment. When formulated in a causal inference setting, a shortcoming of this approach is that it implicitly assumes no confounders between the marker and the outcome, which cannot be guaranteed in most settings even if treatment is randomized, since the marker is observed post-treatment. Hence, two major approaches have been developed to assess surrogates in a causal framework, termed by Joffe and Greene (2009) “causal effects” (Lauritzen, 2004) and “causal association” (Frangakis and Rubin, 2002). In the causal effects paradigm, the treatment and surrogate marker are considered to be separately manipulable, allowing the total effect of the treatment to be decomposed into direct effects (holding the marker constant and changing treatment) and indirect effects (holding the treatment constant and changing the marker). In this setting, good surrogates leave little direct effect of the treatment. The causal association approach assumes only a manipulable treatment, finessing the issue of post-randomization of the marker by conditioning on the joint counterfactual distribution of the marker under both treatment and control. 
Here a good surrogate shows little effect of the treatment on the outcome when the marker is not affected by the treatment, with the treatment effect on the outcome closely tracking the treatment effect on the marker away from 0. Both the causal effects and causal association paradigms require assumptions to make progress, since their general models are unidentified. An alternative causal association approach, termed “meta-analytic” by Buyse and others (2000) and applicable only when multiple trials are available, assesses surrogacy in a fashion similar to the causal association framework, but can be identified from observable data only in randomized trials.

There are examples where use of a surrogate marker to assess treatment effect in advance of an outcome has been deleterious to health. Fleming and DeMets (1996) discuss the development of drugs to reduce ventricular arrhythmias, which were associated with cardiac-related deaths. Post-approval randomized trials showed these drugs, while effectively reducing premature ventricular beats, were nonetheless associated with substantially increased risk of death. This concept was formalized as the “surrogacy paradox” by Chen and others (2007a, b, c), who pointed out that it is possible for situations to exist in which the treatment has a beneficial effect on the surrogate, the surrogate is positively associated with the outcome, yet the overall effect of the treatment on the outcome may be harmful. Chen and others pointed out that surrogates that met criteria for surrogacy under either the causal association paradigm or under the causal effects paradigm might still suffer from the surrogate paradox. Chen and others (2007a, b, c) and Ju and Geng (2010) also discussed under what conditions the surrogacy paradox could be avoided, focusing on the causal effect paradigm. VanderWeele (2013) discussed these manuscripts and concluded that the surrogate paradox can occur when one or more of the following three conditions occur: (i) a direct effect between the treatment and the outcome running in the opposite direction of the indirect effect between the treatment and outcome through the surrogate; (ii) confounding between the surrogate and outcome; and (iii) what he terms “lack of transitivity”, in which the treatment does not change the surrogate for all the same persons for whom the surrogate changes the outcome. 
VanderWeele notes that the “meta-analytic” approach to assess surrogacy may provide the most effective means of assessing the surrogacy paradox, as it requires fewer assumptions than the other causal analytic methods, and it focuses on quantities that more directly assess the surrogate paradox, namely “transportability” across different treatment settings. Because there now truly is repetition at the level of the trial, the nature of the association can be used to gauge the possibility of paradoxes.

We consider the issue of the surrogate paradox in the setting in which we have multiple clinical trials available where both the outcome and the surrogate marker of interest have been measured. This allows us to consider surrogacy assessment using the “meta-analytic” approach, first described in Buyse and others (2000) and developed further, which views surrogacy assessment primarily from a predictive perspective, namely that of predicting a trial-level treatment effect on an outcome given a trial-level treatment effect on a surrogate. The quality of a surrogate is assessed by the degree of correlation between the trial-level treatment effects on markers and outcomes; however, high correlations do not preclude the possibility of a surrogate paradox. Thus, our goal here is somewhat different, namely to assess probabilities that future trials could produce qualitative interactions between treatment effects on the marker and the outcome, either unconditionally or conditional on having observed the treatment effects on the marker. Specifically, we develop measures for four scenarios:

  1. Predicting the probability that the true treatment effects for the outcome and marker will be in opposite directions in a new (Inline graphic) trial “drawn” from the population of trials from which the Inline graphic trials that make up the meta-analysis have also been “sampled”.

  2. Predicting the probability that the true treatment effects for the outcome will be harmful given that the true treatment effect on the marker is beneficial, again in a new (Inline graphic) trial “drawn” from the population of trials from which the Inline graphic trials that make up the meta-analysis have also been “sampled”.

  3. Predicting the probability that the true treatment effect for the outcome will be in the opposite direction of the true treatment effect for the marker when some data (typically marker, but also possibly including outcome) have already been collected in the new (Inline graphic) trial.

  4. Estimating the size of the observed treatment effect on the marker required to preclude the possibility of a harmful true treatment effect on the outcome at a given level of probability (e.g. 0.95), either for a future (Inline graphic) trial without data, or for a trial in which some data for the marker under both treatment arms has been collected.

Scenario (1) assesses the probability that the (causal) treatment effect of the marker and the outcome will be in different directions in the population of trials that informs the meta-analysis, while scenario (2) assesses the probability of the specifically higher risk situation that the marker would suggest the treatment is beneficial when in fact it is harmful with respect to the true outcome of interest. Scenario (3) focuses on the situation in which a specific trial is of interest, where typically a substantial amount of marker data and no or very little outcome data have been collected. If no trial data have been collected, Scenario (4) is similar to the first two scenarios in that a population measure is of interest, but one that focuses on assessing the level at which a beneficial observed marker effect might essentially put to rest concerns that a harmful treatment effect on the outcome is plausible; if trial data have been collected, it is more similar to Scenario (3) in which marker data only have been collected, but one that again focuses on assessing the degree to which harmful treatment effect on the outcome can be excluded.

Implicit in this development is the concept of a “population” of trials, with well-defined treatment and control conditions for each trial and the same specific outcome and marker for all trials, from which the “sample” of trials that make up the meta-analysis is drawn. In some cases, this population might be restricted to a narrow range of treatment and control conditions; in other cases it might be a broad variety of different treatments with a common outcome and marker. Inference resulting from the methods developed below needs to be interpreted in light of these implied population structures. For this manuscript, we assume that the outcome and the marker have a hierarchical bivariate normal (BVN) distribution; extensions are considered in the discussion.

Few, if any, measures have been proposed to assess the risk of the surrogate paradox (Elliott and others, in discussion of VanderWeele, 2013). We believe our discussion below proposes sensible measures that are generalizable using the meta-analytic framework.

2. Estimating the probability that an outcome and marker will have opposite treatment effects in a new trial

First we summarize the development of the meta-analytic approach to surrogate marker assessment as described in Buyse and others (2000). Let Inline graphic be the surrogate marker measure for the Inline graphicth subject, Inline graphic, in the Inline graphicth trial, Inline graphic, Inline graphic be the equivalent outcome measure, and Inline graphic be the indicator for treatment arm (1 for treatment, 0 for control):

2.

where the residual errors follow a BVN distribution:

2.

and the random trial-level effects follow a 4D multivariate normal (MVN) distribution:

2.

The causal effect of the treatment on the surrogate marker is given by the contrast of the value of the marker that would be observed under treatment, Inline graphic, with the value that would be observed under control, Inline graphic. Hence, the expected effect of the treatment on the surrogate marker within a given trial Inline graphic is given by Inline graphic, where the second equality derives from randomization, which ensures Inline graphic. Similarly, the expected effect of the treatment on the outcome is Inline graphic. Thus, we can consider Inline graphic and Inline graphic to be random effects, with a (marginal) BVN distribution:

2. (2.1)
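The displayed equations of this section did not survive extraction. As a reading aid, the standard two-stage model of Buyse and others (2000), which the surrounding text describes, can be written as below. The symbols here ($S_{ij}$, $T_{ij}$, $Z_{ij}$, $\mu_S$, $\mu_T$, $\alpha$, $\beta$, $m_{Si}$, $m_{Ti}$, $a_i$, $b_i$, $\Sigma$, $D$, $d_{aa}$, $d_{ab}$, $d_{bb}$) are assumed notation and may differ from the authors' own.

```latex
\begin{align*}
S_{ij} &= \mu_S + m_{Si} + (\alpha + a_i) Z_{ij} + \varepsilon_{Sij},\\
T_{ij} &= \mu_T + m_{Ti} + (\beta + b_i) Z_{ij} + \varepsilon_{Tij},\\
(\varepsilon_{Sij}, \varepsilon_{Tij})^\top &\sim N_2(0, \Sigma),\qquad
(m_{Si}, m_{Ti}, a_i, b_i)^\top \sim N_4(0, D),\\
\begin{pmatrix} \alpha + a_i \\ \beta + b_i \end{pmatrix}
&\sim N_2\!\left(
\begin{pmatrix} \alpha \\ \beta \end{pmatrix},
\begin{pmatrix} d_{aa} & d_{ab} \\ d_{ab} & d_{bb} \end{pmatrix}
\right).
\end{align*}
```

Under this notation the last line corresponds to the marginal BVN distribution of the trial-level treatment effects, with the 2×2 matrix being the lower right corner of $D$.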

Buyse and others suggest a “trial-level” measure of surrogacy given by the proportion of variance in the total effect explained by the trial-level random effects associated with the surrogate:

2.

This formulation provides a convenient platform to measure surrogate paradox risk in several ways. First, assuming without loss of generality that the qualitative effects of the treatment (“beneficial” versus “harmful”) on the marker and outcome are in the same direction, with positive effects beneficial and negative effects harmful, we consider the probability that the Inline graphicth trial will yield treatment effects on the marker and outcome in the same direction: Inline graphic. This probability is given by

2. (2.2)

where the subscript 13 in Inline graphic refers to the first and third quadrants of the Cartesian plane, Inline graphic is the cumulative distribution function of a Inline graphic-variate normal distribution with mean Inline graphic and variance Inline graphic evaluated at Inline graphic, Inline graphic, and Inline graphic.
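The same-direction probability described above is the BVN mass in the first and third quadrants. A minimal sketch of that calculation follows, using only the Python standard library and one-dimensional Simpson integration rather than a bivariate CDF routine. The function names and the parameter names (`mu_a`, `mu_b` for the mean treatment effects on marker and outcome, `d_aa`, `d_bb`, `d_ab` for their variances and covariance) are hypothetical labels, not the authors' notation.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal

def upper_orthant(mu1, mu2, var1, var2, cov, steps=4000):
    """P(X > 0, Y > 0) for a bivariate normal (X, Y), via Simpson's rule.

    Integrates f_X(x) * P(Y > 0 | X = x) over x > 0. Requires |corr| < 1
    and an even number of steps.
    """
    s1, s2 = sqrt(var1), sqrt(var2)
    rho = cov / (s1 * s2)
    cond_sd = s2 * sqrt(1.0 - rho * rho)        # sd of Y given X
    lo, hi = max(0.0, mu1 - 10.0 * s1), mu1 + 10.0 * s1
    if hi <= lo:                                 # essentially no mass on x > 0
        return 0.0
    x_dist = NormalDist(mu1, s1)

    def integrand(x):
        cond_mean = mu2 + rho * s2 * (x - mu1) / s1
        return x_dist.pdf(x) * STD.cdf(cond_mean / cond_sd)

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3.0

def p_same_direction(mu_a, mu_b, d_aa, d_bb, d_ab):
    """Probability that marker and outcome effects share a sign (quadrants 1 and 3)."""
    both_positive = upper_orthant(mu_a, mu_b, d_aa, d_bb, d_ab)
    # Reflecting (X, Y) -> (-X, -Y) flips the means and leaves the covariance unchanged.
    both_negative = upper_orthant(-mu_a, -mu_b, d_aa, d_bb, d_ab)
    return both_positive + both_negative
```

With zero means the result depends only on the correlation, which gives a quick sanity check: uncorrelated effects agree in sign half the time.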

2.1. Maximum likelihood estimation

We can obtain inference about Inline graphic using mixed-model best linear unbiased estimators and predictors. Let Inline graphic constitute the surrogate marker and outcome for each subject, Inline graphic be the fixed effect matrix associated with the parameters Inline graphic, and similarly the random effects matrix associated with Inline graphic. We then obtain Inline graphic and Inline graphic as the third and fourth elements of the maximum likelihood (ML) or restricted ML (REML) estimator of Inline graphic and similarly Inline graphic, Inline graphic, Inline graphic as the lower right corner of the ML or REML estimator of Inline graphic in the model

2.1.

where Inline graphic consists of the stacked elements of Inline graphic, Inline graphic, and Inline graphic, and Inline graphic is the Kronecker product operator on two matrices. We then obtain

2.1.

The variance of Inline graphic can be obtained via the Delta Method; details are provided in Section 1 of supplementary material available at Biostatistics online (http://www.biostatistics.oxfordjournals.org).

2.2. Bayesian estimation

Alternatively, a fully Bayesian approach can be employed that does not rely on asymptotic approximations for the variance by placing priors on Inline graphic, Inline graphic, and Inline graphic, obtaining draws via Markov chain Monte Carlo (MCMC), and transforming them via (2.2) to obtain the posterior distribution of Inline graphic, Inline graphic. Specifically, we assume an MVN prior for the fixed effects such that Inline graphic, and Wishart priors for the covariance matrices such that Inline graphic and Inline graphic. A Gibbs sampling routine is then used to obtain draws from the posterior distributions of the parameters; conditional posterior distributions are described in Section 2 of supplementary material available at Biostatistics online (http://www.biostatistics.oxfordjournals.org).

3. Estimating the probability that the treatment effects for the outcome will be harmful given that the treatment effect on the marker is beneficial

A second measure of surrogacy paradox risk targets the particularly dangerous situation where the surrogate marker suggests that the treatment will be beneficial, but in reality it is harmful: situations corresponding to Inline graphic, Inline graphic. The probability that the Inline graphicth trial will lie outside this quadrant, i.e. that Inline graphic, Inline graphic or Inline graphic is given by

3. (3.1)

Either a frequentist or a Bayesian approach can be utilized as described for inference about Inline graphic.
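The quantity in this section can also be approximated by simulation, which mirrors the idea of a new trial being "drawn" from the population of trials. The sketch below is a Monte Carlo approximation, not the closed-form expression (3.1): it draws true trial-level effects from the BVN distribution and counts how often they avoid the dangerous quadrant (beneficial marker effect, harmful outcome effect). Function and parameter names are hypothetical.

```python
import random
from math import sqrt

def p_not_dangerous(mu_a, mu_b, d_aa, d_bb, d_ab, n_draws=200_000, seed=1):
    """Monte Carlo estimate of 1 - P(marker effect > 0 and outcome effect < 0).

    mu_a, mu_b: mean treatment effects on marker and outcome.
    d_aa, d_bb, d_ab: trial-level variances and covariance of those effects.
    Positive effects are taken to be beneficial, as in the text.
    """
    rng = random.Random(seed)
    sd_a = sqrt(d_aa)
    slope = d_ab / d_aa                          # regression of outcome effect on marker effect
    resid_sd = sqrt(d_bb - d_ab ** 2 / d_aa)     # conditional sd of outcome effect
    dangerous = 0
    for _ in range(n_draws):
        marker_effect = rng.gauss(mu_a, sd_a)
        outcome_effect = rng.gauss(mu_b + slope * (marker_effect - mu_a), resid_sd)
        if marker_effect > 0 and outcome_effect < 0:
            dangerous += 1
    return 1.0 - dangerous / n_draws
```

Because only one of the two discordant quadrants is excluded, this probability is never smaller than the same-direction probability of Section 2 for the same parameters.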

To get a sense of these measures, Figure 1 shows contour plots and associated measures of Inline graphic and Inline graphic for four different scenarios. Scenario (a) corresponds to a reasonably strong correlation between surrogate marker and outcome effects, with some modest risk of marker/outcome treatment effect reversal because the treatment effects on the surrogate tend to be stronger than the treatment effects on the outcome. An example of this might be a setting in which the surrogate marker is taken as the outcome of interest measured early in the trial. Scenario (b) is similar to (a), but with a stronger treatment effect for the outcome, reducing the risk of the more dangerous reversal (beneficial marker effect but harmful outcome effect). Such an example might occur when part of the treatment effect on the outcome follows a pathway not through the marker, as in the case where we have a specific surrogate biomarker, but the final effect of treatment on the outcome (e.g. survival) involves pathways not measured by this biomarker. Scenario (c) shows a very strong marker/outcome correlation, but because there is a small treatment effect size, the paradox risk is not eliminated. Such an example might often be reflected in studies of cardiovascular disease, where treatment effects tend to be small but highly correlated surrogate markers are often available. Finally, (d) shows a much weaker marker/outcome correlation, but the large effect size essentially eliminates paradox risk. These types of settings might be seen in trials of some solid tumors with targeted therapies, where large treatment effects might occur, but only relatively weak surrogates are available.

Fig. 1.


(a) Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic; (b) Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic; (c) Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic; (d) Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic. Contour lines represent equal bivariate PDFs at levels of 0.02, except for (c), where contour lines represent an equal bivariate PDF at levels of 0.05.

4. Estimating the probability that an outcome and marker will have opposite treatment effects in a new trial when partial data have been collected

The above discussion has focused on a population-level quantity that can also be viewed as providing inference about a future trial about which we have no collected data. If instead we have such data for the Inline graphicth trial and wish to condition on the observed Inline graphic, we obtain the following for the random effect Inline graphic:

4. (4.1)

where Inline graphic for Inline graphic and Inline graphic (This follows from considering the joint distribution of Inline graphic, Inline graphic, and Inline graphic.) Our trial-level distribution of interest is obtained as

4.

where Inline graphic for Inline graphic corresponding to the third element of the ML/REML estimate of Inline graphic and Inline graphic corresponding to the third element of the ML/REML estimate of Inline graphic, Inline graphic for Inline graphic corresponding to the fourth element of the ML/REML estimate of Inline graphic and Inline graphic corresponding to the fourth element of the ML/REML estimate of Inline graphic, Inline graphic corresponding to the Inline graphic element of the ML/REML estimator of Inline graphic, and Inline graphic and similarly for Inline graphic. Typically, the focus will be on Inline graphic, where limited information will be available for Inline graphic. However, this should not pose a barrier to estimation, since the mixed model can be estimated by replacing Inline graphic with Inline graphic and Inline graphic, with row vector Inline graphic for elements lacking Inline graphic.

5. Estimating the size of the beneficial treatment effect on the marker required to preclude a harmful treatment effect on the outcome

The goal of our fourth proposed surrogacy paradox measure is to determine the minimum observed beneficial treatment effect on the marker that reduces the probability of a harmful true treatment effect on the outcome to a given level. Let Inline graphic be the difference between the observed surrogate marker means under treatment and control. The joint distribution of true treatment effect on the outcome and observed treatment effect on the surrogate marker is given by

5. (5.1)

where Inline graphic, Inline graphic, and Inline graphic. Thus, the distribution of true treatment effect on the outcome Inline graphic conditional on a given observed treatment effect on the surrogate marker Inline graphic is distributed as

5. (5.2)

Thus,

5. (5.3)

At this point, the analysis can go one of two ways. If we have actually collected data to determine Inline graphic and want to compute the probability Inline graphic that the true effect in the outcome for the trial will be non-negative, we can compute directly from (5.3), replacing the values in (5.3) with their estimates Inline graphic, Inline graphic, Inline graphic, Inline graphic, and Inline graphic to determine Inline graphic. Alternatively, we might want to determine the value of Inline graphic that will ensure that the probability that Inline graphic is negative is less than or equal to Inline graphic, or Inline graphic. A bit of algebra shows this requires

5. (5.4)

(We assume that Inline graphic and Inline graphic are known values for the study design). Again, replacing the values in (5.4) with their estimates and the greater than or equal to sign with equality yields Inline graphic, the value of Inline graphic that would have to be observed to ensure that the true treatment effect in the outcome will be non-negative with at least probability Inline graphic. We note that Inline graphic is related to the surrogacy threshold measure of Burzykowski and Buyse (2006), with the difference being that the threshold Inline graphic is computed conditional on the observed surrogate data only, with the goal being to estimate a specific observed treatment effect on a surrogate that is associated with a given risk of treatment reversal in a specific trial, whereas the surrogacy threshold measure conditions on latent trial-level effects to obtain a population-level measure of surrogate value. We use the Delta Method to obtain confidence intervals for Inline graphic and Inline graphic. Section 3 of supplementary material available at Biostatistics online (http://www.biostatistics.oxfordjournals.org) provides details.
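The two uses of the conditional distribution described in this section, computing the probability of a non-negative true outcome effect for a given observed marker effect and inverting it to obtain a threshold, can be sketched as follows. This is an illustration under stated assumptions, not the authors' formulas (5.3) and (5.4), which were lost in extraction: it assumes the observed marker difference has sampling variance `sigma_ss * (1/n1 + 1/n0)` around the true trial-level marker effect and that the covariance `d_ab` is positive. All names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def _conditional_moments(s_obs, alpha, beta, d_aa, d_bb, d_ab, sigma_ss, n1, n0):
    """Mean and sd of the true outcome effect given the observed marker effect."""
    # Assumed: marginal variance of the observed marker difference is the
    # between-trial variance plus the sampling variance of a difference in means.
    var_obs = d_aa + sigma_ss * (1.0 / n1 + 1.0 / n0)
    cond_mean = beta + d_ab / var_obs * (s_obs - alpha)
    cond_sd = sqrt(d_bb - d_ab ** 2 / var_obs)
    return cond_mean, cond_sd, var_obs

def prob_outcome_nonnegative(s_obs, alpha, beta, d_aa, d_bb, d_ab, sigma_ss, n1, n0):
    """P(true outcome effect >= 0 | observed marker effect = s_obs)."""
    cond_mean, cond_sd, _ = _conditional_moments(
        s_obs, alpha, beta, d_aa, d_bb, d_ab, sigma_ss, n1, n0)
    return NormalDist().cdf(cond_mean / cond_sd)

def marker_threshold(gamma, alpha, beta, d_aa, d_bb, d_ab, sigma_ss, n1, n0):
    """Smallest observed marker effect giving P(outcome effect >= 0) >= gamma.

    Solves cond_mean / cond_sd = z_gamma for s_obs; valid only when d_ab > 0.
    """
    if d_ab <= 0:
        raise ValueError("threshold derivation assumes a positive covariance d_ab")
    _, cond_sd, var_obs = _conditional_moments(
        0.0, alpha, beta, d_aa, d_bb, d_ab, sigma_ss, n1, n0)
    z = NormalDist().inv_cdf(gamma)
    return alpha + var_obs / d_ab * (z * cond_sd - beta)
```

For example, `marker_threshold(0.95, ...)` returns the observed marker effect at which `prob_outcome_nonnegative` equals 0.95. The threshold can be negative when the mean outcome effect is large relative to its spread, as in scenarios (b) and (d) of Table 1.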

6. Simulations

We perform simulations under the four surrogacy scenarios depicted in Figure 1 to examine the bias, efficiency, and coverage rates of the proposed estimators. All of the scenarios assume Inline graphic, Inline graphic, Inline graphic, and Inline graphic. The resulting Inline graphic values are given in Figure 1.

For each of the four scenarios, we generate 500 data sets from a meta-analysis with either 10, 40, or 100 trials, each trial with 150 subjects randomized to each treatment arm. The MIXED procedure in SAS is used to obtain point estimates and standard errors for Inline graphic, Inline graphic, and Inline graphic for each data set. Data sets with estimates of Inline graphic that were less than 0.1 were discarded in order to obtain more stable estimates of the quantities of interest; this situation was uncommon in all settings except Scenario (d), where the relatively weak surrogate marker led to 35% of the simulations being rejected with 10 trials and 19% with 40 trials. Table 1 contains the point estimates, empirical standard errors and coverage rates of Inline graphic, Inline graphic, and the value of Inline graphic such that Inline graphic. There is some bias in the estimation of these quantities using only 10 trials, and the coverage rates are below nominal, similar to results observed in small meta-analytic sample settings in Li and Taylor (2010). In particular, Scenario (a), a “typical” setting for a moderately effective surrogate, underestimated Inline graphic to a modest degree; proportionately, the estimate of Inline graphic was downwardly biased more severely in Scenario (d), the low-correlation/large treatment effect setting, although as a practical matter it was still correctly estimated in excess of 0.99. Estimation of Inline graphic was more substantially downwardly biased as well, in all scenarios except (d), which was estimated very poorly. Coverage was substantially below the nominal 95% for all estimators. When the number of trials is increased to 40, there is little bias in the estimates, although coverage remained somewhat below nominal. The bias was effectively zero when 100 trials are used, with nearly all of the estimates obtaining nominal coverage.
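The data-generating step of such a simulation can be sketched as below. This is a simplified illustration, not the authors' simulation code: trial-level random intercepts are set to zero, the per-trial summaries are raw differences in arm means rather than mixed-model estimates, and all names and parameter values are hypothetical because the scenario parameters were lost in extraction.

```python
import random
from math import sqrt
from statistics import mean

def simulate_meta_analysis(n_trials, n_per_arm, alpha, beta, d_aa, d_bb, d_ab,
                           sigma_ss, sigma_tt, sigma_st, seed=0):
    """Simulate one meta-analysis under a hierarchical bivariate normal model.

    Returns one tuple per trial:
    (observed marker effect, observed outcome effect,
     true marker effect, true outcome effect).
    """
    rng = random.Random(seed)

    def draw_pair(m1, m2, v1, v2, cov):
        # Bivariate normal draw: first coordinate, then second given the first.
        x = rng.gauss(m1, sqrt(v1))
        y = rng.gauss(m2 + cov / v1 * (x - m1), sqrt(v2 - cov ** 2 / v1))
        return x, y

    trials = []
    for _ in range(n_trials):
        # True trial-level treatment effects on marker and outcome.
        true_s, true_t = draw_pair(alpha, beta, d_aa, d_bb, d_ab)
        arm_means = {}
        for z in (0, 1):  # 0 = control, 1 = treatment
            subjects = [draw_pair(true_s * z, true_t * z, sigma_ss, sigma_tt, sigma_st)
                        for _ in range(n_per_arm)]
            arm_means[z] = (mean(s for s, _ in subjects), mean(t for _, t in subjects))
        obs_s = arm_means[1][0] - arm_means[0][0]
        obs_t = arm_means[1][1] - arm_means[0][1]
        trials.append((obs_s, obs_t, true_s, true_t))
    return trials
```

Calling `simulate_meta_analysis(40, 150, ...)` corresponds to one data set in the 40-trial configuration with 150 subjects per arm. A full replication would repeat this 500 times per scenario and fit the mixed model to each data set.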

Table 1. Simulation results

| Scenario | Quantity | True value | 10 trials: Estimate (SE) | 10 trials: Coverage | 40 trials: Estimate (SE) | 40 trials: Coverage | 100 trials: Estimate (SE) | 100 trials: Coverage |
|---|---|---|---|---|---|---|---|---|
| a | Inline graphic | 0.918 | 0.909 (0.05) | 0.91 | 0.917 (0.03) | 0.93 | 0.918 (0.02) | 0.93 |
| a | Inline graphic | 0.931 | 0.926 (0.05) | 0.86 | 0.931 (0.03) | 0.90 | 0.931 (0.02) | 0.93 |
| a | Inline graphic | 1.97 | 1.73 (0.93) | 0.94 | 1.90 (0.46) | 0.96 | 1.95 (0.28) | 0.95 |
| b | Inline graphic | 0.858 | 0.867 (0.08) | 0.84 | 0.859 (0.04) | 0.92 | 0.859 (0.02) | 0.95 |
| b | Inline graphic | 0.997 | 0.996 (0.007) | 0.69 | 0.997 (0.003) | 0.80 | 0.997 (0.002) | 0.91 |
| b | Inline graphic | -0.22 | -0.52 (0.86) | 0.95 | -0.29 (0.32) | 0.95 | -0.23 (0.18) | 0.95 |
| c | Inline graphic | 0.874 | 0.876 (0.05) | 0.91 | 0.875 (0.02) | 0.93 | 0.874 (0.01) | 0.94 |
| c | Inline graphic | 0.937 | 0.938 (0.04) | 0.84 | 0.938 (0.02) | 0.93 | 0.937 (0.01) | 0.96 |
| c | Inline graphic | 0.77 | 0.69 (0.27) | 0.86 | 0.74 (0.12) | 0.94 | 0.76 (0.08) | 0.96 |
| d | Inline graphic | 0.997 | 0.993 (0.01) | 0.86 | 0.996 (0.004) | 0.92 | 0.997 (0.002) | 0.92 |
| d | Inline graphic | 0.999 | 0.996 (0.007) | 0.76 | 0.998 (0.002) | 0.90 | 0.998 (0.001) | 0.92 |
| d | Inline graphic | -2.70 | -2.20 (4.37) | 0.76 | -2.61 (2.78) | 0.86 | -3.18 (2.29) | 0.93 |

Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic.

Inline graphicInline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic

Inline graphicInline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic.

Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic

Thus, in small samples, there may be some underestimation of the probability that the true treatment effects for the outcome and marker will be in the same direction—a conservative result—while the size of the beneficial treatment effect on the marker required to preclude the possibility of a harmful treatment effect on the outcome may be underestimated as well, which would be anti-conservative. In moderate to large meta-analyses, however, the proposed measures of the surrogacy paradox will be well-estimated when the normal approximations hold.

7. Applications

7.1. Early measures of intraocular pressure as a surrogate in glaucoma treatment

We illustrate the use of the proposed metrics in detecting the presence of the surrogate paradox using data from the Collaborative Initial Glaucoma Treatment Study (CIGTS) (Musch and others, 1999). This study was a multicenter randomized clinical trial comparing the effects of surgery versus a drug on the reduction of intraocular pressure (IOP) in glaucoma patients. A total of 607 patients were enrolled in the study, with 307 randomized to the drug arm. IOP was measured at 3-month intervals. We consider the measurement of IOP (recorded in mmHg) at 18 months after beginning treatment as a surrogate marker for IOP at 96 months. We take the trials to correspond to the 14 centers at which the study was conducted. Missing data (Inline graphic in marker [14 under control and 32 under treatment] and Inline graphic in outcome [178 under control and 197 under treatment]) are imputed via a linear mixed model with a random effect for trial, a quadratic trend for time, an effect for treatment, and an interaction between time and treatment. (Because the examples are for illustration, single imputation is used; a more complete analysis would multiply impute the data to correctly account for the uncertainty in the imputed data.) The Inline graphic measure of surrogacy is 0.491, indicating that 18-month IOP is a marker of moderate quality for 96-month IOP.

Figure 2 provides a plot of the treatment effect on reduction in IOP at 12 months and the treatment effect on reduction in IOP at 96 months for the 14 trials considered, with the size of each circle scaled to represent the size of the trial. The plot shows that in 13 of the 14 trials, the treatment caused a reduction in IOP in both the surrogate and final outcome; in the other trial, the treatment has a harmful effect (increase in IOP) on both the marker and the outcome, although the effect is not statistically significant at the trial level. Using an REML approach, the estimates and 95% confidence intervals of quantities of interest to evaluate the presence of the surrogate paradox are Inline graphic and Inline graphic, indicating a small but non-trivial probability of the presence of the surrogate paradox when using early treatment IOP as a surrogate for later treatment IOP. For a new randomized trial with 150 subjects on each treatment arm, the smallest values of the treatment effect on Inline graphic in which the probability of a beneficial treatment effect on Inline graphic (i.e. the surrogate paradox is present) is Inline graphic or Inline graphic are 0.91 and 5.28 mmHg, respectively, which encompass the range of treatment effects on the surrogate marker observed across the 14 trials.

Fig. 2.


(a) Observed treatment effect on reduction in IOP at 12 months (surrogate endpoint) and treatment effect on reduction in IOP at 96 months (true endpoint) for 14 trials from the CIGTS study. (b) Inline graphic and Inline graphic for 14 trials from the CIGTS study.

We also consider a Bayesian approach, assuming an MVN prior for the fixed effects such that Inline graphic, and Wishart priors for the inverse of the covariance matrices either of the form Inline graphic where Inline graphic is the length of the associated random vector (2 for Inline graphic, 4 for Inline graphic) (Li and Taylor, 2010), or Inline graphic, where Inline graphic is the OLS estimator of Inline graphic and Inline graphic is chosen to make this prior somewhat “flatter” relative to Li and Taylor (Kass and Natarajan, 2006). Use of the Li and Taylor prior resulted in a posterior mean for Inline graphic with a 95% credible interval (CI) of Inline graphic, a posterior mean for Inline graphic with a 95% CI of Inline graphic, and estimates of Inline graphic from Inline graphic and Inline graphic of 2.04 and 3.55 mmHg, respectively. Use of the Kass and Natarajan prior yielded a posterior mean for Inline graphic; a posterior mean for Inline graphic; and estimates of Inline graphic from Inline graphic and Inline graphic of 0.59 and 1.77 mmHg.

7.2. Blood pressure as a surrogate for hypertension

We provide a second illustration of the use of the proposed metrics in detecting the presence of the surrogate paradox using data from the Trial of Preventing Hypertension (TROPHY) (Julius and others, 2006). This study was a multicenter randomized trial that compared the effects of 2 years of treatment with candesartan versus standard of care on the incidence of hypertension in patients with prehypertension. Information on blood pressure and hypertension status was obtained at baseline, 1 and 3 months post-randomization, and then every 3 months until month 24. The primary endpoint of the original study was development of hypertension. Here we consider the continuous measure of the average of systolic pressure and diastolic pressure at 12 months, and evaluate the average of systolic pressure and diastolic pressure at 1 month as a surrogate marker. Due to the development of hypertension (at which point patients changed their treatment regimen), there were some missing data in the 1- and 12-month blood pressure measurements (approximately 2% and 20% missing, respectively). Missing values were singly imputed using SAS PROC MI (SAS Institute, Inc., Cary, NC, USA). The imputation model included baseline measurements of age, race, weight, body-mass index, systolic and diastolic blood pressures, total cholesterol, high density lipoprotein cholesterol (HDL), low density lipoprotein cholesterol (LDL), HDL:LDL ratio, triglycerides, fasting glucose, total insulin, and creatine and was stratified by treatment and gender. The imputation model for the missing 12-month values also included all blood pressure measurements up to the 12-month time point.

A total of 772 patients from 69 centers were included in the original analysis. Prior to our analysis, seven centers with patients in only one treatment arm were removed. The remaining data consisted of 764 patients from 62 centers, with 389 randomized to the treatment group. The centers range in size from 2 patients to 46 patients. Figure 3(a) provides a plot of the treatment effect on blood pressure reduction at 1 month and the treatment effect on blood pressure reduction at 12 months for the 62 centers considered, with the size of each circle scaled to represent the size of the center. The plot shows that in all but four of the trials, there was a beneficial effect of the treatment on the surrogate and in all but seven trials there was a beneficial effect of the treatment on the final outcome. There was a significant treatment effect on Inline graphic in 18 trials and a significant treatment effect on Inline graphic in 12 trials. As the estimate of the between-trial covariance matrix, Inline graphic, is non-positive definite when estimated using REML estimation, only the Bayesian estimation procedure is applied. The same LT and KN priors as described in the IOP analysis were used (Li and Taylor, 2010; Kass and Natarajan, 2006). Figure 3(b) provides a plot of the fixed effect estimates plus the random effect estimates for the treatment effect on Inline graphic and Inline graphic (Inline graphic and Inline graphic, respectively) for the 62 trials. Using the LT priors, the posterior means and 95% CI of these quantities are Inline graphic (95% CI: 0.995, 1) and Inline graphic (95% CI: 0.998, 1), indicating very little probability of the presence of the surrogate paradox in this scenario. For a trial with 300 subjects, the smallest values of the treatment effect on Inline graphic in which the probability of a harmful treatment effect on Inline graphic is Inline graphic0.05 or Inline graphic0.01 are Inline graphic3.7 and Inline graphic1.3, respectively. Use of the KN priors yields posterior means and 95% CIs of 0.975 Inline graphic for Inline graphic, 0Inline graphic for Inline graphic, and values of Inline graphic of 1.77 and 5.06 for Inline graphic and 0.01, respectively.

Fig. 3.

(a) Observed treatment effect on average of systolic and diastolic blood pressure at 1 month (surrogate endpoint) and treatment effect on average of systolic and diastolic blood pressure at 12 months (true endpoint) for 62 trials from the TROPHY study. (b) Inline graphic and Inline graphic for 62 trials from the TROPHY study.
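The data preparation behind Figure 3(a), in which centers with patients in only one treatment arm are dropped and an observed treatment effect is computed per center, can be sketched as follows. This is an illustrative outline only: the record layout, the function name, and the toy values are assumptions, and the effect is taken to be the control mean minus the treated mean so that a positive value corresponds to a reduction in blood pressure.

```python
from collections import defaultdict
from statistics import mean


def center_effects(records):
    """Observed per-center treatment effects on the surrogate and the outcome.

    records: iterable of (center, arm, surrogate, outcome) tuples, where
    arm is 1 for treatment and 0 for control (hypothetical layout).
    Centers lacking patients in either arm are excluded.
    Returns {center: (effect_on_surrogate, effect_on_outcome, n_patients)}.
    """
    by_center = defaultdict(lambda: {0: [], 1: []})
    for center, arm, surrogate, outcome in records:
        by_center[center][arm].append((surrogate, outcome))

    effects = {}
    for center, arms in by_center.items():
        if not arms[0] or not arms[1]:
            continue  # only one arm observed: no within-center contrast
        # Control mean minus treated mean, so positive = reduction.
        d_surrogate = mean([s for s, _ in arms[0]]) - mean([s for s, _ in arms[1]])
        d_outcome = mean([t for _, t in arms[0]]) - mean([t for _, t in arms[1]])
        effects[center] = (d_surrogate, d_outcome, len(arms[0]) + len(arms[1]))
    return effects


# Toy records (made-up values): center "C" has no control patients.
toy = [
    ("A", 1, 100.0, 102.0), ("A", 0, 108.0, 109.0),
    ("B", 1, 104.0, 105.0), ("B", 0, 106.0, 104.0),
    ("C", 1, 101.0, 103.0),
]
print(center_effects(toy))
```

The patient count is returned alongside the effects because the circles in the figure are scaled by center size.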

8. Discussion

For reasons of cost and time, finding good surrogate markers for outcomes is of great interest in clinical research. However, as Baker and Kramer (2003) note, use of surrogates is made difficult in settings where multiple complex pathways run from treatment to outcome. Surrogate markers that lie on only one of these pathways, and treatments that may affect multiple pathways, can lead to situations where markers can be misleading when considered in new settings. Methods to assess the risk of surrogate paradox can be an important tool to protect against failures that put health at risk. To our knowledge, this is one of the first manuscripts to attempt to quantify this risk using sensible measures that can easily be fit using existing mixed-model methodology. We consider both frequentist and Bayesian methods. Asymptotically, the two approaches yield similar results, although with small numbers of trials, there is some modest sensitivity to the use of priors for covariance matrices with diagonal matrix scales, as illustrated by our first (IOP) example. While such priors are useful for stabilizing covariance matrix estimates and allowing estimation when ML methods fail to converge, they will usually increase the surrogate paradox measure risks, as they weaken the correlation between the treatment effects on the marker and the treatment effects on the outcome. They will also tend to stabilize the surrogate paradox threshold measures for a given trial. The practitioner will have to decide if the tradeoffs in stability or ability to fit the model are worth the use of priors, and may wish to consider several priors to understand the sensitivity of the results to their choice, as we have done.
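The claim that a weaker correlation between the two treatment effects raises the paradox risk can be checked numerically in a simplified setting. The sketch below assumes the true trial-level effects on the marker and on the outcome are bivariate normal and computes the probability that they have opposite signs by one-dimensional numerical integration. It is not the paradox measure defined in this paper, and the function name and parameter values are hypothetical.

```python
from statistics import NormalDist


def opposite_sign_probability(mu_a, mu_b, sd_a, sd_b, rho, n_steps=4000):
    """P(marker effect and outcome effect have opposite signs).

    Integrates, over the marker effect a, its marginal density times the
    conditional probability that the outcome effect has the other sign.
    Uses a midpoint rule on mu_a +/- 8 SDs; requires -1 < rho < 1.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    marginal_a = NormalDist(mu_a, sd_a)
    std_normal = NormalDist()
    slope = rho * sd_b / sd_a
    cond_sd = sd_b * (1.0 - rho ** 2) ** 0.5
    lower = mu_a - 8.0 * sd_a
    step = 16.0 * sd_a / n_steps
    total = 0.0
    for i in range(n_steps):
        a = lower + (i + 0.5) * step
        cond_mean = mu_b + slope * (a - mu_a)
        p_outcome_negative = std_normal.cdf(-cond_mean / cond_sd)
        opposite = p_outcome_negative if a > 0 else 1.0 - p_outcome_negative
        total += marginal_a.pdf(a) * opposite * step
    return total


# Hypothetical means and SDs; only the correlation varies.
for rho in (0.8, 0.5, 0.2):
    print(rho, round(opposite_sign_probability(3.0, 2.5, 1.5, 1.5, rho), 4))
```

With both mean effects positive, the opposite-sign probability grows as the correlation shrinks, which is the mechanism by which a prior that pulls the covariance matrix toward a diagonal form inflates the estimated risk.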

Several extensions are possible for this work. One option, following Burzykowski and others (2004), replaces the BVN assumption in the first stage of the model with a copula model, allowing for a larger range of marginal distributions to be considered. Another alternative replaces the first level of the hierarchical model with subdomains such as age, baseline biomarkers, or other baseline measures to assess whether there are subsets of the population for whom the surrogate marker effects may be misleading for the outcome of interest. As noted by a reviewer, such use of baseline covariates to identify treatment by covariate interactions assists not only in the identification of subsets of the population at risk for misleading conclusions when relying on a surrogate marker, but may provide evidence of covariate-specific surrogates more generally.

Supplementary material

Supplementary Material is available at http://biostatistics.oxfordjournals.org.

Funding

This research was supported by the National Cancer Institute Grant R01CA-129102.

Acknowledgement

The authors thank two anonymous reviewers whose suggestions improved the manuscript. Conflict of Interest: None declared.

References

  1. Baker S. G., Kramer B. S. (2003). A perfect correlate does not a surrogate make. BMC Medical Research Methodology 3, 16–21.
  2. Burzykowski T., Buyse M. (2006). Surrogate threshold effect: an alternative measure for meta-analytic surrogate endpoint validation. Pharmaceutical Statistics 5, 173–186.
  3. Burzykowski T., Molenberghs G., Buyse M. (2004). The validation of surrogate end points by using data from randomized clinical trials: a case-study in advanced colorectal cancer. Journal of the Royal Statistical Society A 167, 103–124.
  4. Buyse M., Molenberghs G., Burzykowski T., Renard D., Geys H. (2000). The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 1, 49–67.
  5. Chen H., Geng Z., Jia J. (2007). Criteria for surrogate end points. Journal of the Royal Statistical Society B 69, 919–932.
  6. Fleming T. R., DeMets D. L. (1996). Surrogate end points in clinical trials: are we being misled? Annals of Internal Medicine 125, 605–613.
  7. Frangakis C. E., Rubin D. B. (2002). Principal stratification in causal inference. Biometrics 58, 21–29.
  8. Joffe M. M., Greene T. (2009). Related causal frameworks for surrogate outcomes. Biometrics 65, 530–538.
  9. Ju C., Geng Z. (2010). Criteria for surrogate end points based on causal distributions. Journal of the Royal Statistical Society B 72, 129–142.
  10. Julius S., Nesbitt S. D., Egan B. M., Weber M. A., Michelson E. L., Kaciroti N., Black H. R., Grimm R. H., Messerli F. H., Oparil S., Schork M. A. (2006). Feasibility of treating prehypertension with an angiotensin-receptor blocker. New England Journal of Medicine 354, 1685–1697.
  11. Kass R. E., Natarajan R. (2006). A default conjugate prior for variance components in generalized linear mixed models (comment on article by Browne and Draper). Bayesian Analysis 1, 535–542.
  12. Lauritzen S. L. (2004). Discussion of causality. Scandinavian Journal of Statistics 31, 189–193.
  13. Li Y., Taylor J. M. G. (2010). Predicting treatment effects using biomarker data in a meta-analysis of clinical trials. Statistics in Medicine 29, 1875–1889.
  14. Musch D. C., Lichter P. R., Guire K. E., Standardi C. L., CIGTS Investigators. (1999). The collaborative initial glaucoma treatment study (CIGTS): study design, methods and baseline characteristics of enrolled patients. Ophthalmology 106, 653–662.
  15. Prentice R. L. (1989). Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 8, 431–440.
  16. VanderWeele T. J. (2013). Surrogate measures and consistent surrogates. Biometrics 69, 561–565.
