Summary
A surrogate marker (S) is a variable that can be measured earlier and often easier than the true endpoint (T) in a clinical trial. Most previous research has been devoted to developing surrogacy measures to quantify how well S can replace T or examining the use of S in predicting the effect of a treatment (Z). However, the research often requires one to fit models for the distribution of T given S and Z. It is well known that such models do not have causal interpretations because the models condition on a post-randomization variable S. In this paper, we directly model the relationship among T, S and Z using a potential outcomes framework introduced by Frangakis and Rubin (2002). We propose a Bayesian estimation method to evaluate the causal probabilities associated with the cross-classification of the potential outcomes of S and T when S and T are both binary. We use a log-linear model to directly model the association between the potential outcomes of S and T through the odds ratios. The quantities derived from this approach always have causal interpretations. However, this causal model is not identifiable from the data without additional assumptions. To reduce the non-identifiability problem and increase the precision of statistical inferences, we assume monotonicity and incorporate prior belief that is plausible in the surrogate context by using prior distributions. We also explore the relationship among the surrogacy measures based on traditional models and this counterfactual model. The method is applied to the data from a glaucoma treatment study.
Keywords: Bayesian Estimation, Counterfactual Model, Randomized Trial, Surrogate Marker
1. Introduction
A good surrogate marker S usually has a strong association with the true endpoint T. When T is rare, late-occurring or costly to obtain, one could use an effective surrogate marker to reliably extract information on the effect of the treatment (Z) on T before T is completely observed. A typical surrogate marker is a laboratory measurement used as a substitute for a clinical endpoint, such as CD4 counts for HIV infection and prostate-specific antigen for prostate cancer, but an earlier laboratory measurement has also been considered as a surrogate marker for a later measurement. By facilitating early prediction of the treatment effect on the later measurement, the earlier measurement can have enormous potential benefits in reducing trial duration, size and expense. Examples of using an earlier measurement as a surrogate for a later one include interim height for adult height in girls with Turner Syndrome (Venkatraman and Begg, 1999) and an earlier vision test result for a later result in a study of patients with age-related macular degeneration (Buyse and Molenberghs, 1998). In the data example to which we will apply our method, we consider the intraocular pressure (IOP) at the 12th month as S and the IOP at the 96th month as T among glaucoma patients.
Prentice (1989) proposed a formal definition of perfect surrogacy which requires that S fully captures the effect of the treatment on T. To measure less than perfect surrogacy, the proportion of the treatment effect explained by S was proposed by Freedman, Graubard and Schatzkin (1992) and further extended by Wang and Taylor (2002). However, these measures often require one to utilize models for the distribution of T given S and Z. They often do not have causal interpretations because the models used condition on the post randomization variable S (Rosenbaum, 1984). Other surrogacy measures include the trial-level and individual-level correlations between S and T in a multiple-trial setting (Buyse et al, 2000) and those based on entropy (Alonso and Molenberghs, 2003).
To allow for causal interpretation, we directly measure the associations among S, T and Z in a causal modeling framework through the principal stratification approach introduced by Frangakis and Rubin (2002) (FR). This framework hypothesizes a setting wherein each individual has two potential outcomes, corresponding to the two possible treatment regimes (e.g., Z = 1 for treatment and Z = 0 for placebo). Here, we use the terms causal, counterfactual and potential outcomes models interchangeably. When both S and T are binary, the potential outcomes for S and T under treatment Z are denoted by S(Z) and T(Z), each taking the value 0 or 1. The approach by FR is to examine the distribution of the potential outcomes of T with respect to Z within each principal stratum, which is defined by each pair of possible realizations of the potential outcomes of S. Since membership in a principal stratum cannot be changed by treatment, it can be adjusted for in the same way as a pre-randomization variable. As such, the association measures and the quantities derived have causal interpretations.
However, there has been little work on estimation methods using this framework. A review paper by Weir and Walley (2006) advocates the need for further research. An exception to this is the paper of Gilbert and Hudgens (2008), where they proposed the use of causal effect predictiveness to assess surrogacy and their context of an HIV vaccine trial allowed them to assume that S(0) = 0. In this paper, we relax this assumption and propose a Bayesian estimation method to evaluate the counterfactual probabilities associated with the combinations of different sequences of potential outcomes of S and T for each individual. We incorporate the prior knowledge by imposing appropriate prior distributions and placing some reasonable constraints on the model parameters which allows one to reduce the non-identifiability problem and possibly increase the precision of the statistical inference.
In Section 2, we describe the glaucoma data example. In Section 3, we introduce the causal model, assumptions and surrogacy measures. In Section 4, we propose a Bayesian estimation method. In Section 5, we apply the method to the glaucoma data and examine the sensitivity of the priors. In Section 6, we examine the properties of our estimates through simulations. In Section 7, we explore the connections among the surrogacy measures based on conventional models and the counterfactual model. Finally, we provide discussion.
2. Glaucoma Treatment Study
We will apply the method to the data from the Collaborative Initial Glaucoma Treatment Study (CIGTS) (Musch et al, 1999). Glaucoma is a group of diseases that cause vision loss and is a leading cause of blindness. Elevated IOP is a major risk factor for glaucoma. The Advanced Glaucoma Intervention Study (AGIS) demonstrated that when IOP reduction from baseline is substantial, progression of visual field loss can be prevented (Musch et al, 2009). The CIGTS is a randomized trial designed to compare the effects of surgery (Z = 1) and medicine (Z = 0) on reducing IOP. Patients were enrolled between 1993 and 1997. The IOP level (recorded in mmHg) was measured at different time points following randomization. Our purpose is to examine the properties of the IOP measurement at the 12th month as S for the IOP at the 96th month as T. Both S and T are defined as 1 if IOP is less than 18 mmHg and 0 otherwise. It has been found that eyes with IOP of less than 18 mmHg at every time point during at least six years of follow-up had essentially no further visual field loss (AGIS, 2000). There were 607 patients enrolled at baseline. Due to dropout, 345 were measured only at month 12 and 228 had IOP measured at both months 12 and 96.
3. The Setup
3.1 Potential Outcomes Model
For each subject i, we have two potential outcomes for each of Si and Ti, denoted by Si(Zi) and Ti(Zi) with respect to Zi. The possible realizations of (Si(0), Si(1)) are (0, 0), (0, 1), (1, 1) and (1, 0) and similarly for (Ti(0), Ti(1)). There are 16 counterfactual probabilities that are associated with the combinations of different sequences of potential outcomes for Si and Ti. These probabilities sum to 1 as the 16 cells are the partitions of a population. Collectively they completely describe the causal associations among Ti, Si and Zi.
3.2 Assumptions and Identifiability
Since only one of the two potential outcomes is observed for each subject, the counterfactual model is over-parameterized. We make assumptions to assist in identifiability. In addition to the two standard assumptions, ignorability of treatment assignment (Rubin, 1978) and the stable unit treatment value assumption (Rubin, 1980), we also assume monotonicity. Under this assumption, a patient who receives Z = 1 is never worse off than the same patient would be under Z = 0. Assume S = 1 and T = 1 represent better outcomes than S = 0 and T = 0, respectively. The monotonicity assumption requires that Si(1) ≥ Si(0) and Ti(1) ≥ Ti(0) for all i; hence, neither (Si(0) = 1, Si(1) = 0) nor (Ti(0) = 1, Ti(1) = 0) can occur. This leaves nine cells, so the number of free parameters is reduced from 15 to 8 (Table 1). Our data can support only six parameters: there are eight observed probabilities P(T = t, S = s|Z = z), but those within each treatment group add up to 1. Hence, only some of the probabilities or certain combinations of them are estimable.
Table 1.
Probabilities from the Counterfactual Model with Monotonicity Assumption
| (S(0), S(1)) | (T(0), T(1)) = (0, 0) | (T(0), T(1)) = (0, 1) | (T(0), T(1)) = (1, 1) |
|---|---|---|---|
| (0, 0) | p11 | p12 | p13 |
| (0, 1) | p21 | p22 | p23 |
| (1, 1) | p31 | p32 | p33 |
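The gap between the eight free parameters of Table 1 and the six that the data can support can be illustrated numerically. The sketch below is only an illustration: the function name `observed_probs`, the uniform starting table and the shift values `a` and `b` are hypothetical choices, not quantities from the study. It maps a table of counterfactual probabilities to the observed probabilities P(S = s, T = t | Z = z) under monotonicity and shows two different tables that produce identical observed probabilities.

```python
import numpy as np

def observed_probs(p):
    """Map a 3x3 table p[j, k] (rows S*, columns T*) to P(S=s, T=t | Z=z).

    Rows and columns follow Table 1: index 0 -> (0,0), 1 -> (0,1), 2 -> (1,1).
    Returns a dict keyed by (z, s, t).
    """
    p = np.asarray(p, dtype=float)
    out = {}
    for z in (0, 1):
        # Under Z = 0 only stratum (1,1) shows an observed 1;
        # under Z = 1 strata (0,1) and (1,1) both do.
        ones = [2] if z == 0 else [1, 2]
        zeros = [0, 1] if z == 0 else [0]
        for s in (0, 1):
            rows = ones if s == 1 else zeros
            for t in (0, 1):
                cols = ones if t == 1 else zeros
                out[(z, s, t)] = float(p[np.ix_(rows, cols)].sum())
    return out

# Hypothetical starting table (all cells equal) and a shift that leaves
# every observed probability unchanged.
p_a = np.full((3, 3), 1.0 / 9.0)
a, b = 0.02, -0.01
shift = np.array([
    [0.0, a, -a],
    [b, -(a + b), a],
    [-b, b, 0.0],
])
p_b = p_a + shift

obs_a = observed_probs(p_a)
obs_b = observed_probs(p_b)
max_gap = max(abs(obs_a[key] - obs_b[key]) for key in obs_a)
print("largest difference in observed probabilities:", max_gap)
```

The shift has two free directions (`a` and `b`), which matches the difference between eight free parameters and six estimable quantities. Note that p11 and p33 are untouched by the shift, since each is itself an observed probability in one treatment group.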
3.3 Surrogacy Measures
In a traditional model framework, Freedman et al (1992) proposed the proportion of the treatment effect explained to measure surrogacy in a model which assumes no interaction between S and Z. A measure free of this assumption was proposed by Wang and Taylor (2002) as FWT = δγa/τ, where δ = P(S = 1|Z = 1) − P(S = 1|Z = 0), τ = P(T = 1|Z = 1) − P(T = 1|Z = 0) and γa = P(T = 1|Z = 0, S = 1) − P(T = 1|Z = 0, S = 0). The quantities δ and τ denote the treatment effects on S and T, respectively, and γa measures the strength of the association between S and T. Given the effect on T, the larger the effect on S or the stronger the association between S and T, the higher FWT will be. The odds ratios (OR) ORg0 and ORg1 measure the associations between S and T in the Z = 0 and Z = 1 groups, respectively.
In a counterfactual framework, the expressions of causal effects (CE) are based on comparisons between two potential outcomes. In a randomized trial, the CEs on both S and T are directly estimable. FR proposed the concepts of associative and dissociative effects. If the CE on Ti is accompanied by a change in Si, the effect is associative. Otherwise, the effect is dissociative. We describe the sequences of the values of the potential outcomes (0, 0), (0, 1) and (1, 1) as “non-responsive”, “responsive”, and “always responsive”, respectively. Under the monotonicity assumption, the overall CE on T (CET) is p+2 = p12 + p22 + p32, which measures the fraction of the patients whose Ti's are responsive to the treatment. The associative effect is p22 and the dissociative effect is p12 + p32: p22 is the fraction of the patients whose Si's and Ti's are both responsive to the treatment, and p12 + p32 is the fraction of the patients whose Ti's are responsive to the treatment but whose Si's are not. To evaluate the degree of surrogacy, Taylor, Wang and Thiébaut (2005) defined the associative proportion (AP) as $AP = p_{22}/p_{+2}$. The dissociative proportion (DP) is $DP = (p_{12} + p_{32})/p_{+2} = 1 - AP$. Two other measures are also quantities of interest: the surrogate associative proportion $SAP = p_{22}/p_{2+}$ and the surrogate dissociative proportion $SDP = (p_{12} + p_{32})/(p_{1+} + p_{3+})$, where $p_{j+} = p_{j1} + p_{j2} + p_{j3}$.
Prentice (1989) defined perfect surrogacy in a traditional model setup, which we refer to as perfect statistical surrogacy. FR suggested a definition of perfect principal surrogacy which requires that a CE on T can exist only when a CE on S exists; i.e., p12 = p32 = 0. When S and T are binary, we argue that the further restrictions p21 = p23 = 0 ensure that, for every patient, Ti is responsive if and only if Si is responsive. In this context, we suggest a new measure, the common associative proportion $CAP = p_{22}/(p_{12} + p_{21} + p_{22} + p_{23} + p_{32})$, to assess the degree of principal surrogacy. When p12 = p21 = p23 = p32 = 0, S satisfies perfect principal surrogacy and we have CAP = 1; when p22 = 0, no CE on Ti is captured by that on Si for any individual and we have CAP = 0. When CAP = 1, we have SAP = 1, AP = 1, SDP = 0 and DP = 0. Because its denominator contains the denominators of both AP and SAP, CAP is never larger than AP or SAP. Unlike FWT, the measures CAP, AP, SAP, SDP and DP always fall in the range [0, 1].
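As a worked illustration of the counterfactual surrogacy measures, the following sketch computes CET, AP, DP, SAP, SDP and CAP from a 3 × 3 table of counterfactual probabilities. The function name `surrogacy_measures` is hypothetical, and the formulas are the ones implied by the definitions in this section; they reproduce the values reported for Example 1 of Table 4 (AP = 1/3, SAP = 0.20, SDP = 0.20, CAP = 0.14), whose cell probabilities are used as input.

```python
def surrogacy_measures(p):
    """Counterfactual surrogacy measures from a 3x3 table.

    p is a nested list with rows (S(0), S(1)) and columns (T(0), T(1)),
    each ordered (0,0), (0,1), (1,1) as in Table 1.
    """
    (p11, p12, p13), (p21, p22, p23), (p31, p32, p33) = p
    cet = p12 + p22 + p32            # causal effect on T, p_{+2}
    ces = p21 + p22 + p23            # causal effect on S, p_{2+}
    dissociative = p12 + p32         # T responsive, S not responsive
    not_responsive_s = (p11 + p12 + p13) + (p31 + p32 + p33)
    return {
        "CET": cet,
        "AP": p22 / cet,
        "DP": dissociative / cet,
        "SAP": p22 / ces,
        "SDP": dissociative / not_responsive_s,
        "CAP": p22 / (p12 + p21 + p22 + p23 + p32),
    }

# Cell probabilities of Example 1 in Table 4.
example_1 = [
    [0.267, 0.066, 0.001],
    [0.133, 0.066, 0.133],
    [0.001, 0.066, 0.267],
]
measures = surrogacy_measures(example_1)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Because the denominator of CAP contains the denominators of both AP and SAP, CAP can never exceed either of them for any valid table.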
4. The Methods
4.1 Observed Data, Complete Data and Likelihood
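Let rz denote the number of patients in the Z = z group (z = 0, 1) and r = r0 + r1. Let rzst denote the number of patients with Z = z, S = s and T = t. The observed-data likelihood function can be expressed in terms of the counterfactual probabilities as

$$L_{obs} \propto \prod_{z=0}^{1}\prod_{s=0}^{1}\prod_{t=0}^{1} \pi_{zst}^{\,r_{zst}}, \qquad \pi_{zst} = P(S = s, T = t \mid Z = z),$$

where, under monotonicity, $\pi_{000} = p_{11} + p_{12} + p_{21} + p_{22}$, $\pi_{001} = p_{13} + p_{23}$, $\pi_{010} = p_{31} + p_{32}$, $\pi_{011} = p_{33}$, $\pi_{100} = p_{11}$, $\pi_{101} = p_{12} + p_{13}$, $\pi_{110} = p_{21} + p_{31}$ and $\pi_{111} = p_{22} + p_{23} + p_{32} + p_{33}$.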
The complete data consist of all potential outcomes. Let njk denote the cell count corresponding to the counterfactual probability in cell (j, k), the jth row and the kth column of Table 1, for all patients, and $n_{jk}^{z}$ the corresponding count for treatment group z, where j, k = 1, 2, 3. In terms of the counterfactual probabilities, the complete-data likelihood is

$$L_{com} \propto \prod_{j=1}^{3}\prod_{k=1}^{3} p_{jk}^{\,n_{jk}}.$$
There is a one-to-one or many-to-one correspondence between njk's and rzst's.
4.2 The Model
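Let S* = 1, 2, 3 denote the ordered categories of (S(0), S(1)): (0, 0), (0, 1) and (1, 1), and let T* = 1, 2, 3 denote the ordered categories of (T(0), T(1)). For convenience, we reparametrize the pjk's and use a log-linear model for the expected complete-data cell counts $\mu_{jk}^{z} = E(n_{jk}^{z})$, where $n_{jk}^{z}$ is the complete-data count in cell (j, k) of Table 1 for treatment group z. For simplicity, we assume equal allocation, i.e., $\mu_{jk}^{0} = \mu_{jk}^{1} = \mu_{jk}$. The model is specified as

$$\log \mu_{jk} = \lambda + \lambda_j^S + \lambda_k^T + \lambda_{jk}, \qquad j, k = 1, 2, 3, \tag{1}$$

where λjS and λkT denote the row and column effects, respectively, and λjk denotes their interaction. For identifiability of the log-linear model, we use the constraints λ2S = λ2T = λj2 = λ2k = 0, which lead to simple expressions for the log odds ratios in the four 2 × 2 subtables in the four corners of Table 1:

$$\log\frac{p_{11}\,p_{22}}{p_{12}\,p_{21}} = \lambda_{11}, \qquad \log\frac{p_{13}\,p_{22}}{p_{12}\,p_{23}} = \lambda_{13}, \qquad \log\frac{p_{31}\,p_{22}}{p_{21}\,p_{32}} = \lambda_{31}, \qquad \log\frac{p_{33}\,p_{22}}{p_{23}\,p_{32}} = \lambda_{33}.$$

The parametrization allows us to exploit the associations between the ordered variables S* and T*. A positive association between them implies that λ11 and λ33 are positive and λ13 and λ31 are negative. Conditional on the total counts, we can express the counterfactual probabilities using the parameters in (1) as:

$$p_{jk} = \frac{\exp(\lambda_j^S + \lambda_k^T + \lambda_{jk})}{\sum_{j'=1}^{3}\sum_{k'=1}^{3}\exp(\lambda_{j'}^S + \lambda_{k'}^T + \lambda_{j'k'})}. \tag{2}$$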
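The mapping from the log-linear parameters to the counterfactual probabilities can be sketched in a few lines. This is a minimal illustration, assuming the constraints λ2S = λ2T = λj2 = λ2k = 0 and assuming that each treatment group has expected size exp(λ) times the sum of the exponentiated linear predictors; the function name `cell_probabilities` is hypothetical. The parameter values are those of the simulation study in Section 6, and under these assumptions the output agrees with the true values and the expected sample size E(r) = 565 reported with Table 3.

```python
import numpy as np

def cell_probabilities(lam_S, lam_T, lam_ST):
    """Return the 3x3 table p[j, k] implied by the log-linear parameters.

    lam_S, lam_T: length-3 arrays whose middle entry is fixed at 0.
    lam_ST: 3x3 array whose middle row and middle column are fixed at 0.
    """
    eta = np.add.outer(lam_S, lam_T) + lam_ST   # linear predictor without lambda
    weights = np.exp(eta)
    return weights / weights.sum()

# Parameter values of the simulation study (Section 6).
lam = 3.5
lam_S = np.array([0.15, 0.0, 0.3])
lam_T = np.array([-0.3, 0.0, -0.7])
lam_ST = np.array([
    [0.5, 0.0, -0.8],
    [0.0, 0.0, 0.0],
    [-0.5, 0.0, 0.8],
])

p = cell_probabilities(lam_S, lam_T, lam_ST)

# Assumption: two equally sized arms, each with mean exp(lam) * sum of weights.
expected_r = 2.0 * np.exp(lam) * np.exp(np.add.outer(lam_S, lam_T) + lam_ST).sum()

# Corner log odds ratio, which should equal lambda_11.
log_or_11 = np.log(p[0, 0] * p[1, 1] / (p[0, 1] * p[1, 0]))

print(np.round(p, 3))
print("expected total sample size:", round(expected_r))
```

Since λ cancels in the normalization, it affects only the expected sample size and not the probabilities, which is why it can be used to control the sample size in simulations.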
To estimate the parameters, we adopt a Bayesian approach. We treat the unobserved potential outcomes as missing data and apply imputation techniques.
4.3 Prior Specifications
In clinical trials, the selection of the variable to use as S will be based on prior scientific knowledge. The surrogate marker S is often closely related to T, possibly because the marker is in the causal pathway leading to T. Hence, we assume (Si(0), Si(1)) is more likely to agree with (Ti(0), Ti(1)) than not; and S* and T* are ordered. That is, when S is non-responsive (responsive), T is also more likely to be non-responsive (responsive). Similarly it is unlikely that a person will be non-responsive in Si and always responsive in Ti.
The parameters, λ1S, λ3S, λ1T, and λ3T, are identifiable but the others are less so. We have chosen N(u, v2) as the prior distributions for λ1S, λ3S, λ1T, λ3T, λ11, λ13, λ31 and λ33. The prior for exp(λ) is G(a, b) where G denotes the gamma distribution and G(a, b) is parameterized such that the expected value is ab and the variance is ab2. We choose non-informative values of a = 0.001 and b = 1000 and let v2 = 9/4. To incorporate our prior belief and encourage but not force the ordering restriction, we choose u = 0.7 for λ11 and λ33, and u = −0.7 for λ13 and λ31, which would suggest moderate positive associations between the potential outcomes of S and T. When the ordering restriction and positive association are not considered, we let u=0. The characteristics of our prior choices on the log-linear model parameters are similar to what Garrett and Zeger (2000) discovered in their work for the logistic regression. The distributions of the probabilities induced by these priors are relatively flat, not overly skewed and appropriate for our study setting with wide 95% credible intervals. On the other hand, if vague priors such as normal priors with zero means and very large variances are placed on the log-linear model parameters, they would induce priors on the probabilities whose distributions have point masses concentrated at either zeros or ones (King and Brooks, 2001). When there are non-identifiable quantities, vague priors can give the posterior distributions undesirable properties and push them towards being overly skewed and non-normal as observed by Green and Park (2003).
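To see what priors on the log-linear parameters imply for the counterfactual probabilities, one can simulate from the priors and push the draws through the model. The sketch below is illustrative only. It uses N(0.7, 9/4) for λ11 and λ33 and N(−0.7, 9/4) for λ13 and λ31 as in the text, but it assumes a prior mean of 0 for the four main effects, which is an assumption made here for illustration. The number of draws and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 20000
sd = 1.5  # prior standard deviation, since v^2 = 9/4

# Main effects (lambda_1S, lambda_3S, lambda_1T, lambda_3T).
# Assumption for this sketch: prior mean 0.
main = rng.normal(0.0, sd, size=(n_draws, 4))

# Interactions (lambda_11, lambda_13, lambda_31, lambda_33) with means +/- 0.7.
inter_means = np.array([0.7, -0.7, -0.7, 0.7])
inter = rng.normal(inter_means, sd, size=(n_draws, 4))

# Build the linear predictor for every draw; middle row/column stay at 0.
eta = np.zeros((n_draws, 3, 3))
eta[:, 0, :] += main[:, [0]]   # lambda_1S added to row 1
eta[:, 2, :] += main[:, [1]]   # lambda_3S added to row 3
eta[:, :, 0] += main[:, [2]]   # lambda_1T added to column 1
eta[:, :, 2] += main[:, [3]]   # lambda_3T added to column 3
eta[:, 0, 0] += inter[:, 0]
eta[:, 0, 2] += inter[:, 1]
eta[:, 2, 0] += inter[:, 2]
eta[:, 2, 2] += inter[:, 3]

weights = np.exp(eta)
p = weights / weights.sum(axis=(1, 2), keepdims=True)

# Induced prior quantiles (2.5%, 50%, 97.5%) for each cell of Table 1.
quantiles = np.quantile(p, [0.025, 0.5, 0.975], axis=0)
print(np.round(quantiles, 3))
```

Repeating the exercise with a much larger `sd` shows how vague normal priors on the log-linear scale pile the induced prior mass on the probabilities near 0 and 1.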
4.4 Estimation Procedure
We use data augmentation (Little and Rubin, 2002) to estimate the parameters. Let robs = {r000, r001, r010, r011, r100, r101, r110, r111} and θ = (λ, λjS, λkT, λjk). The complete data cell counts are denoted by ncom = {n11, n12, n13, n21, n22, n23, n31, n32, n33}. To implement this procedure, we iterate the following I-step and P-step:
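I-step: this step consists of distributing the observed counts into the cells in Table 1. Given θl−1 and robs, each observed count rzst is split among the cells of Table 1 that are compatible with (z, s, t) by a multinomial draw with probabilities proportional to the corresponding pjk evaluated at θl−1, where θl−1 denotes all the parameter estimates from the (l − 1)th iteration. Specifically, r000 is allocated to cells (1, 1), (1, 2), (2, 1) and (2, 2); r001 to (1, 3) and (2, 3); r010 to (3, 1) and (3, 2); r101 to (1, 2) and (1, 3); r110 to (2, 1) and (3, 1); and r111 to (2, 2), (2, 3), (3, 2) and (3, 3). The counts r011 and r100 contribute entirely to cells (3, 3) and (1, 1), respectively. We write $n_{jk}^{zst,(l)}$ for the draw, at the lth iteration, of the count that rzst contributes to cell (j, k); for example, $n_{11}^{000,(l)}$ is the draw of the count that contributes to n11 from r000, $n_{12}^{101,(l)}$ is the draw of the count that contributes to n12 from r101, and so on. Let $n_{jk}^{z,(l)} = \sum_{s,t} n_{jk}^{zst,(l)}$ and $n_{jk}^{(l)} = n_{jk}^{0,(l)} + n_{jk}^{1,(l)}$.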
P-step: generate θl from the posterior distribution $f(\theta \mid n_{com}^{l})$, where $n_{com}^{l}$ includes the counts of the complete data obtained in the I-step from the lth iteration. The parameters are drawn in turn from their full conditional distributions, such as $f(\lambda \mid \cdot)$ and $f(\lambda_1^S \mid \cdot)$, where · represents all the rest of the parameters together with the complete data, and so on.

For exp(λ), the conditional draws can be made directly from the gamma distribution using the Gibbs sampler. For λ1S, λ3S, λ1T, λ3T, λ11, λ13, λ31 and λ33, we use the Metropolis-Hastings algorithm with a normal proposal distribution centered at the current value and a variance tuned to give an acceptance rate of approximately 40%. The mixing of the MCMC sampler for a non-identifiable model can be slow and poor (Gelfand and Sahu, 1999). In our case the Markov chain does not move quickly and the sample autocorrelation is high. For the CIGTS study, a burn-in period of 200,000 iterations is needed for the MCMC samples to stabilize. After burn-in, we retain every 100th MCMC iteration to reduce the autocorrelation and obtain samples with good mixing properties. The sensitivity to the initial values is evaluated by comparing parameter estimates from five chains, from which we obtain the Gelman-Rubin statistic ($\hat{R}$) (Gelman et al, 2004). Generally, a value of $\hat{R}$ close to 1 is considered sufficient. For all estimates in the CIGTS data, the minimum and maximum values of $\hat{R}$ satisfy this criterion.
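The I-step of the procedure above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that each observed count is allocated to its compatible cells of Table 1 by a multinomial draw with probabilities proportional to the current pjk, and the names `compatible_cells` and `i_step` are hypothetical. The observed counts are those reported for the CIGTS in Section 5.1, and the posterior means of Table 2 are used merely as a plausible current value of the probabilities.

```python
import numpy as np

def compatible_cells(z, s, t):
    """Cells (row, column; 0-based) of Table 1 compatible with observed (z, s, t)."""
    ones = [2] if z == 0 else [1, 2]    # strata showing a 1 under arm z
    zeros = [0, 1] if z == 0 else [0]   # strata showing a 0 under arm z
    rows = ones if s == 1 else zeros
    cols = ones if t == 1 else zeros
    return [(j, k) for j in rows for k in cols]

def i_step(r_obs, p, rng):
    """Impute complete-data counts n[z, j, k] given observed counts and current p."""
    n = np.zeros((2, 3, 3), dtype=int)
    for (z, s, t), count in r_obs.items():
        cells = compatible_cells(z, s, t)
        probs = np.array([p[j, k] for j, k in cells])
        draw = rng.multinomial(count, probs / probs.sum())
        for (j, k), c in zip(cells, draw):
            n[z, j, k] += c
    return n

# Observed CIGTS counts r_zst, keyed by (z, s, t).
r_obs = {
    (0, 0, 0): 28, (0, 0, 1): 29, (0, 1, 0): 14, (0, 1, 1): 55,
    (1, 0, 0): 11, (1, 0, 1): 8, (1, 1, 0): 10, (1, 1, 1): 73,
}

# Current value of the probabilities (posterior means from Table 2).
p_current = np.array([
    [0.101, 0.033, 0.051],
    [0.051, 0.047, 0.170],
    [0.048, 0.063, 0.437],
])

rng = np.random.default_rng(2024)
n_complete = i_step(r_obs, p_current, rng)
print(n_complete[0])   # imputed table for the medicine group
print(n_complete[1])   # imputed table for the surgery group
```

Cells that correspond to a single observed count, such as (3, 3) in the Z = 0 group and (1, 1) in the Z = 1 group, are filled deterministically, which is why p11 and p33 are well identified.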
5. Application To Glaucoma Data
5.1 The Results
We apply the estimation method to the glaucoma data on the 228 patients in the CIGTS for whom S, T and Z are completely observed. The observed counts are: r000 = 28, r001 = 29, r010 = 14, r011 = 55, r100 = 11, r101 = 8, r110 = 10, and r111 = 73. In Table 2, we report the means, medians and 95% credible intervals (CI) from the posterior distributions of the counterfactual probabilities and surrogacy measures. We choose a = 0.001, b = 1000, u = 0.7, v2 = 9/4. The posterior means and medians are similar. The estimated CET has a posterior mean (95% CI) of 0.14 (0.05, 0.24). Without the counterfactual model, we estimate CET directly from the observed data as $\hat{\tau} = 81/102 - 84/126 \approx 0.13$, with 95% confidence interval (0.014, 0.25). The similarity between the two CET estimates suggests that the counterfactual model fits the data well; the slight difference may result from the prior assumptions. The mean (95% CI) for AP is 0.328 (0.133, 0.550). It shows that about one third of the causal effect on T is reflected by that on S; however, the wide CI implies that AP is estimated imprecisely. SAP is estimated as 0.174 (0.051, 0.334), indicating that among the patients whose Si's are responsive to Zi, only about 17% have Ti's that are also responsive. As expected, CAP is smaller than either AP or SAP; it is estimated as 0.126 (0.041, 0.239), showing that S is far from satisfying perfect principal surrogacy. In a conventional model setup, the estimated proportion of treatment effect explained, $\hat{F}_{WT}$, has a bootstrap mean of 0.732 and median of 0.588, with a 95% bootstrap confidence interval of (0.17, 2.51). The correlation coefficients between S and T in the medicine and surgery groups are 0.304 and 0.441, respectively. The estimated OR (95% confidence interval) between S and T is ORg0 = 3.79 (1.73, 8.30) in the medicine group and ORg1 = 10.04 (3.26, 30.93) in the surgery group.

These results indicate that the IOP at the 12th month is a good surrogate for that at the 96th month in a conventional model setting, although the association between the causal effect on S and that on T is small.
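The conventional quantities quoted above can be checked directly from the eight observed counts. The sketch below is a plain plug-in calculation under the definitions of δ, τ, γa, FWT and the odds ratios given in Section 3.3; the helper name `frac` is hypothetical. It computes point estimates only, so it does not reproduce the bootstrap summaries or confidence intervals reported in the text.

```python
# Observed CIGTS counts r_zst, keyed by (z, s, t).
r = {
    (0, 0, 0): 28, (0, 0, 1): 29, (0, 1, 0): 14, (0, 1, 1): 55,
    (1, 0, 0): 11, (1, 0, 1): 8, (1, 1, 0): 10, (1, 1, 1): 73,
}

def frac(z, event, given=lambda s, t: True):
    """Empirical P(event | given, Z = z) from the observed counts."""
    den = sum(c for (zz, s, t), c in r.items() if zz == z and given(s, t))
    num = sum(c for (zz, s, t), c in r.items()
              if zz == z and given(s, t) and event(s, t))
    return num / den

# Treatment effects on T and S.
tau = frac(1, lambda s, t: t == 1) - frac(0, lambda s, t: t == 1)
delta = frac(1, lambda s, t: s == 1) - frac(0, lambda s, t: s == 1)

# Association between S and T in the Z = 0 group.
gamma_a = (frac(0, lambda s, t: t == 1, lambda s, t: s == 1)
           - frac(0, lambda s, t: t == 1, lambda s, t: s == 0))

f_wt = delta * gamma_a / tau

# Odds ratios between S and T within each treatment group.
or_g0 = r[0, 0, 0] * r[0, 1, 1] / (r[0, 0, 1] * r[0, 1, 0])
or_g1 = r[1, 0, 0] * r[1, 1, 1] / (r[1, 0, 1] * r[1, 1, 0])

print(f"tau = {tau:.3f}, delta = {delta:.3f}, gamma_a = {gamma_a:.3f}")
print(f"F_WT = {f_wt:.3f}, OR_g0 = {or_g0:.2f}, OR_g1 = {or_g1:.2f}")
```

The plug-in FWT divides by a small treatment effect τ, which helps explain why its bootstrap distribution is skewed and its interval extends well above 1.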
Table 2.
Bayesian Estimates for the Counterfactual Model for Glaucoma Data. PSD: posterior standard deviation; AP: associative proportion; DP: dissociative proportion; SAP: surrogate associative proportion; SDP: surrogate dissociative proportion; CAP: common associative proportion; CET: causal effect on T. Prior specifications: a = 0.001, b = 1000, u = 0.7, and v2 = 9/4
| Parameter | Mean | Median | PSD | 95% CI |
|---|---|---|---|---|
| p11 | 0.101 | 0.099 | 0.028 | (0.054, 0.160) |
| p12 | 0.033 | 0.030 | 0.019 | (0.006, 0.077) |
| p13 | 0.051 | 0.048 | 0.028 | (0.007, 0.112) |
| p21 | 0.051 | 0.048 | 0.024 | (0.012, 0.101) |
| p22 | 0.047 | 0.044 | 0.024 | (0.011, 0.101) |
| p23 | 0.170 | 0.170 | 0.042 | (0.085, 0.252) |
| p31 | 0.048 | 0.045 | 0.026 | (0.006, 0.100) |
| p32 | 0.063 | 0.061 | 0.030 | (0.011, 0.128) |
| p33 | 0.437 | 0.437 | 0.043 | (0.356, 0.524) |
| AP | 0.328 | 0.321 | 0.108 | (0.133, 0.550) |
| DP | 0.672 | 0.679 | 0.108 | (0.447, 0.866) |
| SAP | 0.174 | 0.168 | 0.073 | (0.051, 0.334) |
| SDP | 0.132 | 0.130 | 0.051 | (0.039, 0.240) |
| CAP | 0.126 | 0.119 | 0.057 | (0.041, 0.239) |
| CET | 0.143 | 0.144 | 0.050 | (0.048, 0.238) |
5.2 Sensitivity of Priors
In Figure 1, we evaluate identifiability by plotting the prior and posterior distributions against each other (Garrett and Zeger, 2000), where u = 0.7 and v2 = 9/4. Generally, the greater the overlap and similarity between the prior and posterior distributions, the less identifiable the parameter is likely to be. We find that p11, p33 and the causal effects are more identifiable than p13, p21, p31 and p32. The counterfactual surrogacy measures are moderately identifiable. OR2 and OR3 appear to be the least identifiable.
Figure 1.
Prior and posterior distributions on selected quantities of interest. AP: associative proportion; DP: dissociative proportion; SAP: surrogate associative proportion; SDP: surrogate dissociative proportion; CAP: common associative proportion; CET: causal effect on T. Dash lines for the prior distributions and solid lines for the posterior distributions.
To further assess the extent of the impact of the priors on the posterior distributions on the counterfactual probabilities and surrogacy measures, we vary u of the prior N(u, v2) and fix v2. Then, we vary v2 but fix u. The results are listed in the web appendix. When we change u, we observe bigger changes in the posterior means than the posterior standard deviations (PSD). Relative to those when u = 0.7, with u = 0 or u = 1.4, the extent of the changes in the posterior means is less than 6% for most of the probabilities and surrogacy measures except for p31 and p13. When we change v2, we observe more changes in PSDs than in the posterior means. Compared with those when v2 = 9/4, with v2 = 1 or v2 = 4, the changes in PSDs are generally less than 15%. Overall, the quantities of interest are not overly sensitive to the prior specifications.
6. Simulation Study
We conduct a simulation study to examine the frequentist properties of the estimates. We simulate 100 data sets under the parameter specification: λ1S = 0.15, λ1T = −0.3, λ3S = 0.3, λ3T = −0.7, λ11 = 0.5, λ13 = −0.8, λ31 = −0.5 and λ33 = 0.8. We vary u, v2 and λ: (u = true, v2 = 9/4), (u = 0, v2 = 1/64), and λ = 2, 3.5, 7, where “true” refers to the true parameter value and λ controls the sample size. The simulation results from (u = true, v2 = 9/4, λ = 3.5) are listed in Table 3 and the others are in the web appendix. The quantity SD(Est) refers to the standard deviation of the Bayesian estimates across simulations and $\overline{\mathrm{PSD}}$ is the mean of the PSDs. Both posterior means and medians have very little bias. For the less identifiable parameters, SD(Est) is usually smaller than $\overline{\mathrm{PSD}}$. When the 5th to 95th percentile ranges of the priors include all the true values, we consistently observe over-coverage regardless of the sample size. However, when these ranges do not include the true values, we may observe extreme under-coverage or over-coverage. As the sample size increases, the performance typically improves because the influence of the priors becomes smaller, but we do not usually attain nominal large-sample coverage rates. On the other hand, regardless of the priors, the coverage rates for the identifiable parameters usually approach the nominal levels as the sample size increases. These findings differ from the situation for identifiable models, where Bayesian CIs can usually match frequentist coverage asymptotically; however, they are consistent with the literature on non-identifiable models (Gustafson, 2005; McCandless, Gustafson and Levy, 2007).
Table 3.
Bias, standard deviation (SD) of posterior estimates, mean of posterior standard deviations ($\overline{\mathrm{PSD}}$) and coverage rates (%) from 100 simulations. Posterior estimates (Est) are either posterior medians or posterior means. PSD: posterior standard deviation; AP: associative proportion; DP: dissociative proportion; SAP: surrogate associative proportion; SDP: surrogate dissociative proportion; CAP: common associative proportion; CET: causal effect on T. Prior specifications: G(0.001, 1000) for exp(λ) and N(true, 9/4) for all other parameters, where “true” refers to the true parameter values. The parameter specification: λ1S = 0.15, λ1T = −0.3, λ3S = 0.3, λ3T = −0.7, λ11 = 0.5, λ13 = −0.8, λ31 = −0.5, λ33 = 0.8 and λ = 3.5. E(r) = 565
| Parameter | True | Prior 2.5% | Prior 97.5% | Est: Median, Bias | Est: Median, SD | Est: Mean, Bias | Est: Mean, SD | Mean PSD | 95% Coverage |
|---|---|---|---|---|---|---|---|---|---|
| p11 | 0.166 | 0.001 | 0.815 | 0.003 | 0.022 | 0.003 | 0.022 | 0.022 | 95 |
| p12 | 0.136 | 0.001 | 0.476 | −0.001 | 0.021 | −0.001 | 0.022 | 0.027 | 98 |
| p13 | 0.030 | 0.000 | 0.520 | −0.001 | 0.009 | 0.002 | 0.008 | 0.020 | 100 |
| p21 | 0.087 | 0.002 | 0.235 | −0.001 | 0.014 | 0.000 | 0.013 | 0.032 | 100 |
| p22 | 0.117 | 0.002 | 0.309 | −0.007 | 0.020 | −0.005 | 0.020 | 0.033 | 100 |
| p23 | 0.058 | 0.001 | 0.301 | −0.003 | 0.012 | 0.002 | 0.011 | 0.021 | 99 |
| p31 | 0.071 | 0.000 | 0.618 | −0.001 | 0.019 | 0.000 | 0.017 | 0.032 | 100 |
| p32 | 0.158 | 0.002 | 0.495 | −0.002 | 0.022 | −0.002 | 0.021 | 0.035 | 100 |
| p33 | 0.175 | 0.001 | 0.892 | 0.004 | 0.022 | 0.005 | 0.022 | 0.023 | 96 |
| AP | 0.285 | 0.024 | 0.711 | −0.011 | 0.037 | −0.008 | 0.036 | 0.074 | 100 |
| DP | 0.715 | 0.288 | 0.975 | 0.011 | 0.037 | 0.008 | 0.036 | 0.074 | 100 |
| SAP | 0.447 | 0.090 | 0.611 | −0.011 | 0.041 | −0.009 | 0.040 | 0.104 | 100 |
| SDP | 0.399 | 0.020 | 0.777 | −0.006 | 0.042 | −0.006 | 0.041 | 0.051 | 100 |
| CAP | 0.211 | 0.022 | 0.401 | −0.010 | 0.024 | −0.005 | 0.024 | 0.060 | 100 |
| CET | 0.412 | 0.025 | 0.711 | −0.007 | 0.041 | −0.007 | 0.041 | 0.038 | 95 |
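The data-generating step of such a simulation can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes that the complete-data counts in each arm are independent Poisson draws with means exp(λ + λjS + λkT + λjk), which is consistent with the reported E(r) = 565 at λ = 3.5, and it then collapses them to the eight observed counts under monotonicity. The function name `simulate_observed` and the seed are hypothetical.

```python
import numpy as np

def simulate_observed(lam, lam_S, lam_T, lam_ST, rng):
    """Simulate observed counts r[(z, s, t)] from the log-linear model.

    Assumption: complete-data counts are independent Poisson variables
    with the same mean table in both treatment arms.
    """
    mu = np.exp(lam + np.add.outer(lam_S, lam_T) + lam_ST)
    r = {}
    for z in (0, 1):
        n = rng.poisson(mu)                  # complete-data 3x3 table for arm z
        ones = [2] if z == 0 else [1, 2]     # strata showing a 1 under arm z
        zeros = [0, 1] if z == 0 else [0]
        for s in (0, 1):
            rows = ones if s == 1 else zeros
            for t in (0, 1):
                cols = ones if t == 1 else zeros
                r[(z, s, t)] = int(n[np.ix_(rows, cols)].sum())
    return r

# Parameter values used for Table 3.
lam = 3.5
lam_S = np.array([0.15, 0.0, 0.3])
lam_T = np.array([-0.3, 0.0, -0.7])
lam_ST = np.array([
    [0.5, 0.0, -0.8],
    [0.0, 0.0, 0.0],
    [-0.5, 0.0, 0.8],
])

rng = np.random.default_rng(1)
datasets = [simulate_observed(lam, lam_S, lam_T, lam_ST, rng) for _ in range(100)]
totals = [sum(d.values()) for d in datasets]
print("first simulated data set:", datasets[0])
print("average total sample size:", np.mean(totals))
```

Each simulated data set would then be passed to the estimation procedure of Section 4.4, and bias and coverage would be summarized across the 100 replications.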
7. Surrogacy Measures in the Counterfactual and Conventional Models
7.1 Perfect Statistical Surrogacy and Perfect Principal Surrogacy
Perfect statistical surrogacy requires that T and Z are conditionally independent given S. In the causal framework, when CAP = 1, S satisfies perfect principal surrogacy. For S to be meaningful, we require that p22 > 0. When CAP = 1, we have p12 = p21 = p23 = p32 = 0, and thus the only cells that can carry probability are p11, p13, p22, p31 and p33. We consider two scenarios when CAP = 1. Scenario 1: when p13 = p31 = 0, (S(0), S(1)) = (T(0), T(1)) and S(Z) = T(Z); that is, S and T are identical. In this trivial scenario, S satisfies both perfect principal surrogacy and perfect statistical surrogacy. Scenario 2: when p13 and p31 are nonzero, T and Z are not conditionally independent given S; as such, S does not satisfy perfect statistical surrogacy. However, this situation seems less plausible. A marker tends to be chosen as S because there is strong biological or mechanistic evidence that it is linked to T. A likely positive association between (S(0), S(1)) and (T(0), T(1)) implies that p31 and p13 are likely smaller than the other probabilities. Hence, p13 and p31 would be zero or close to zero when p12 = p21 = p23 = p32 = 0. Perfect principal surrogacy therefore precludes perfect statistical surrogacy only in this implausible scenario.
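The two scenarios can be checked numerically by computing P(T = 1 | S = s, Z = z) from a table of counterfactual probabilities. The sketch below is illustrative: the two tables are hypothetical values chosen here to satisfy CAP = 1, and the function name `p_t1_given_s` is an assumption. Perfect statistical surrogacy holds when the conditional probability does not depend on z.

```python
def p_t1_given_s(p, s, z):
    """P(T = 1 | S = s, Z = z) under monotonicity.

    p is a 3x3 nested list with rows (S(0), S(1)) and columns (T(0), T(1)),
    each ordered (0,0), (0,1), (1,1) as in Table 1.
    """
    ones = [2] if z == 0 else [1, 2]    # strata showing a 1 under arm z
    zeros = [0, 1] if z == 0 else [0]   # strata showing a 0 under arm z
    rows = ones if s == 1 else zeros
    num = sum(p[j][k] for j in rows for k in ones)
    den = sum(p[j][k] for j in rows for k in range(3))
    return num / den

# Hypothetical tables with p12 = p21 = p23 = p32 = 0, so CAP = 1 in both.
scenario_1 = [[0.4, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.4]]  # p13 = p31 = 0
scenario_2 = [[0.3, 0.0, 0.1], [0.0, 0.2, 0.0], [0.1, 0.0, 0.3]]  # p13, p31 > 0

for label, table in (("Scenario 1", scenario_1), ("Scenario 2", scenario_2)):
    for s in (0, 1):
        q0 = p_t1_given_s(table, s, 0)
        q1 = p_t1_given_s(table, s, 1)
        print(f"{label}: P(T=1|S={s},Z=0) = {q0:.3f}, P(T=1|S={s},Z=1) = {q1:.3f}")
```

In the first table the conditional probabilities agree across treatment groups, whereas in the second they differ, even though both tables satisfy perfect principal surrogacy.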
7.2 Surrogacy Measures Under Two Hypothetical Examples
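In a conventional model setup, under the monotonicity assumption, we can express the elements of FWT using the counterfactual probabilities as follows: δ = p21 + p22 + p23, τ = p12 + p22 + p32 and

$$\gamma_a = \frac{p_{33}}{p_{31} + p_{32} + p_{33}} - \frac{p_{13} + p_{23}}{p_{11} + p_{12} + p_{13} + p_{21} + p_{22} + p_{23}}.$$

Similarly, for the odds ratios, we have

$$OR_{g0} = \frac{(p_{11} + p_{12} + p_{21} + p_{22})\,p_{33}}{(p_{13} + p_{23})(p_{31} + p_{32})} \quad\text{and}\quad OR_{g1} = \frac{p_{11}\,(p_{22} + p_{23} + p_{32} + p_{33})}{(p_{12} + p_{13})(p_{21} + p_{31})}.$$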
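These relationships can be verified with a short calculation. The sketch below evaluates δ, τ, γa, FWT, ORg0 and ORg1 from a table of counterfactual probabilities, using the expressions that follow from the definitions in Section 3.3 under monotonicity; the function name `conventional_measures` is hypothetical. Applied to the two hypothetical examples of Table 4, it reproduces the reported values (FWT = 1.00 and ORs of 16 for Example 1; FWT = 0.80, ORg0 = 90 and ORg1 = 157 for Example 2).

```python
def conventional_measures(p):
    """Conventional surrogacy measures implied by a 3x3 counterfactual table.

    p is a nested list with rows (S(0), S(1)) and columns (T(0), T(1)),
    each ordered (0,0), (0,1), (1,1) as in Table 1.
    """
    (p11, p12, p13), (p21, p22, p23), (p31, p32, p33) = p
    delta = p21 + p22 + p23                      # treatment effect on S
    tau = p12 + p22 + p32                        # treatment effect on T
    # P(T=1 | Z=0, S=1) - P(T=1 | Z=0, S=0)
    gamma_a = (p33 / (p31 + p32 + p33)
               - (p13 + p23) / (p11 + p12 + p13 + p21 + p22 + p23))
    f_wt = delta * gamma_a / tau
    or_g0 = (p11 + p12 + p21 + p22) * p33 / ((p13 + p23) * (p31 + p32))
    or_g1 = p11 * (p22 + p23 + p32 + p33) / ((p12 + p13) * (p21 + p31))
    return {"delta": delta, "tau": tau, "gamma_a": gamma_a,
            "F_WT": f_wt, "OR_g0": or_g0, "OR_g1": or_g1}

# Cell probabilities of the two hypothetical examples in Table 4.
example_1 = [[0.267, 0.066, 0.001], [0.133, 0.066, 0.133], [0.001, 0.066, 0.267]]
example_2 = [[0.31, 0.03, 0.005], [0.03, 0.24, 0.04], [0.005, 0.04, 0.30]]

res_1 = conventional_measures(example_1)
res_2 = conventional_measures(example_2)
for label, res in (("Example 1", res_1), ("Example 2", res_2)):
    print(label, {name: round(value, 3) for name, value in res.items()})
```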
To better understand the surrogacy measures in both traditional and counterfactual model settings with respect to the underlying causal associations, we calculate the surrogacy measures in two hypothetical examples (Table 4). In Example 1, when the CE on T is the same across three principal strata, CAP, SAP and AP are relatively small indicating a small causal association between S and T; however, the large values in FWT, ORg0 and ORg1 show that S is closely related to T in a conventional model setup.
Table 4.
Two Hypothetical Numerical Examples. AP: associative proportion; DP: dissociative proportion; SAP: surrogate associative proportion; SDP: surrogate dissociative proportion; CAP: common associative proportion; CET: causal effect on T. Example 1: AP = 1/3, SAP = 0.20, DP = 2/3, SDP = 0.20, CAP = 0.14, FWT = 1.00, ORg0 = 16, ORg1 = 16; Example 2: AP = 0.77, SAP = 0.77, DP = 0.23, SDP = 0.10, CAP = 0.63, FWT = 0.80, ORg0 = 90, ORg1 = 157
| (S(0), S(1)) | (T(0), T(1)) = (0, 0) | (T(0), T(1)) = (0, 1) | (T(0), T(1)) = (1, 1) | Marginal |
|---|---|---|---|---|
| Example 1 | | | | |
| (0, 0) | 0.267 | 0.066 | 0.001 | 0.334 |
| (0, 1) | 0.133 | 0.066 | 0.133 | 0.332 |
| (1, 1) | 0.001 | 0.066 | 0.267 | 0.334 |
| Example 2 | | | | |
| (0, 0) | 0.31 | 0.03 | 0.005 | 0.345 |
| (0, 1) | 0.03 | 0.24 | 0.04 | 0.31 |
| (1, 1) | 0.005 | 0.04 | 0.30 | 0.345 |
In Example 2, all surrogacy measures indicate a close relationship between S and T in both the traditional and counterfactual model frameworks. In general, when p11 and p33 are relatively large compared with the off-diagonal probabilities in the same rows and columns, S is highly associated with T in a traditional model setup. When p22 is relatively large compared with the off-diagonal probabilities in the same row and column, S is closely associated with T in a counterfactual framework. A thorough investigation of the critical values and the variability of the counterfactual surrogacy measures and their connections with FWT and the ORs is beyond the scope of this manuscript, but it would be very useful as future research.
8. Discussion
This manuscript examines the association between the effect of Z on S and that on T, as if we had observed both outcomes of S and T corresponding to the two treatment options for every patient. Unlike associations based on the traditional models, the associations between (S(0), S(1)) and (T(0), T(1)) cannot be changed by the treatment assignment and always have causal interpretations. The traditional models also ignore the fact that the effect of Z on T may occur in patients who are inherently never-responsive or always-responsive in S regardless of the treatment received; the counterfactual model, in contrast, teases out the effect of Z on T in each subgroup of subjects defined by their responsiveness in S to the treatment received. The causal framework used here is similar in spirit to that used in the compliance literature (Balke and Pearl, 1997; Imbens and Rubin, 1997), where the main interest is to estimate the causal effect of a treatment for the compliers.
We use a log-linear model to directly model the association between the potential outcomes of S and T through the odds ratios of (S(0), S(1)) and (T(0), T(1)). We believe that there is an ordering in the sequence of the potential outcomes (0, 0), (0, 1) and (1, 1). With our model setup, scientific assumptions can be conveniently incorporated through the prior distributions for the odds ratios, about which the observed data carry little information. The proposed estimation method can be readily extended to settings where T is partially missing or where there are multiple trials. Besides the log-linear model, we also fit a multinomial model with Dirichlet priors. Although it is computationally easier, that model is less flexible, and the impact of the priors on estimable quantities such as the treatment effect on T is much larger than under the log-linear model. As with multinomial models or logistic regressions for contingency tables, the probabilities based on the log-linear model are required to be positive, and as such we cannot test whether S is a perfect principal surrogate. Nonetheless, in practice, it is almost certain that no surrogate satisfies either perfect principal surrogacy or perfect statistical surrogacy.
We adopt the framework proposed by FR. Robins and Greenland (1992) (RG) proposed another counterfactual framework, which allows one to manipulate S. It requires additional probabilities to describe how T changes when S is changed. This framework has been used by Chen, Geng and Jia (2007) and Taylor et al. (2005) to study surrogacy consistency. RG defined direct and indirect effects, where the indirect effect is the part of the effect of Z on T that operates through S and the direct effect is the part that does not operate through this pathway. The relationships between the direct/indirect effects proposed by RG and the associative/dissociative effects proposed by FR are explored in depth by VanderWeele (2008) and Joffe and Greene (2008). While elaborating on these relationships is beyond the scope of this paper, we know that if S is in the causal pathway between Z and T, p22 is large. On the other hand, a very high p22 shows only that the causal effect of Z on S is highly associated with that on T; it does not necessarily imply that Z affects T by affecting S.
One of the key assumptions is monotonicity, which is needed to reduce the number of parameters and thereby make the counterfactual model more identifiable. If this assumption holds, we expect our estimates to be more efficient and less biased than those based on the conventional model. However, the assumption requires that every single patient would do at least as well when receiving Z = 1 as when receiving Z = 0. This is perhaps true for most patients but is not usually satisfied for all patients. For example, in the CIGTS study, it is conceivable that some patients would do better if they received medicine instead of surgery, even though the average effect of surgery is consistently better. Assessing the impact of violations of the monotonicity assumption would be an important extension.
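To make the parameter reduction concrete, the sketch below enumerates the joint potential-outcome cells for binary S and T with and without monotonicity. It assumes that monotonicity means S(1) ≥ S(0) and T(1) ≥ T(0), which excludes the pair (1, 0) and leaves the ordered pairs (0, 0), (0, 1) and (1, 1); the helper names `potential_outcome_pairs` and `joint_cells` are hypothetical and introduced only for this illustration.

```python
from itertools import product


def potential_outcome_pairs(monotone):
    """List the possible (Y(0), Y(1)) pairs for a binary outcome Y.

    With monotone=True, pairs with Y(1) < Y(0), i.e. (1, 0), are excluded.
    """
    pairs = list(product([0, 1], repeat=2))
    if monotone:
        pairs = [(y0, y1) for (y0, y1) in pairs if y1 >= y0]
    return pairs


def joint_cells(monotone):
    """Cross-classify the (S(0), S(1)) pairs with the (T(0), T(1)) pairs."""
    pairs = potential_outcome_pairs(monotone)
    return list(product(pairs, pairs))


cells_unrestricted = joint_cells(monotone=False)
cells_monotone = joint_cells(monotone=True)

# Cell probabilities sum to one, so the free parameters are (cells - 1).
print("cells without monotonicity:", len(cells_unrestricted))
print("cells with monotonicity:   ", len(cells_monotone))
```

Under this reading, the cross-classification shrinks from 16 cells to the 3 × 3 layout of 9 cells used in the example tables.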
We assumed that missingness is ignorable, and it would be useful to conduct a sensitivity analysis to investigate this assumption. It would also be useful to calculate non-parametric bounds that are free of the prior assumptions and to quantify the ranges of the counterfactual probabilities in our context (Balke and Pearl, 1997). Extensions to other data types are possible. Some work has been done by Gilbert and Hudgens (2008), whose proposal of the causal effect predictiveness surface as a surrogacy measure can be applied to different types of outcomes, and by Gallop et al. (2009), who considered a normally distributed outcome with a binary mediator, an approach that can be easily adapted to the surrogacy setting. Nonetheless, the majority of the literature has focused on binary endpoints. We advocate research on more complex data structures, for example two failure-time endpoints in oncology, which arise more commonly and can be more important.
Acknowledgements
The authors are grateful for the helpful comments made by the reviewers. They would also like to thank Dr. Brenda Gillespie for providing the CIGTS data. This research was supported by National Institutes of Health Grants MH078016 and CA129102.
Footnotes
Supplementary Materials: Web Tables referenced in Sections 5.2 and 6 are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.
References
- AGIS Investigators. The Advanced Glaucoma Intervention Study (AGIS): 7. The relationship between control of intraocular pressure and visual field deterioration. American Journal of Ophthalmology. 2000;130:429–440. doi: 10.1016/s0002-9394(00)00538-9.
- Alonso A, Molenberghs G. Surrogate marker evaluation from an information theory perspective. Biometrics. 2007;63:180–186. doi: 10.1111/j.1541-0420.2006.00634.x.
- Balke A, Pearl J. Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association. 1997;92:1171–1176.
- Buyse M, Molenberghs G. Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics. 1998;54:1014–1029.
- Buyse M, Molenberghs G, Burzykowski T, Renard D, Geys H. The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics. 2000;1:49–68. doi: 10.1093/biostatistics/1.1.49.
- Chen H, Geng Z, Jia J. Criteria for surrogate endpoints. Journal of the Royal Statistical Society, Series B. 2007;69:919–932.
- Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58:21–29. doi: 10.1111/j.0006-341x.2002.00021.x.
- Freedman LS, Graubard BI, Schatzkin A. Statistical validation of intermediate endpoints for chronic disease. Statistics in Medicine. 1992;11:167–178. doi: 10.1002/sim.4780110204.
- Gallop R, Small D, Lin J, Elliott ER, Joffe MM, Ten Have T. Mediation analysis with principal stratification. Statistics in Medicine. 2009;28:1108–1130. doi: 10.1002/sim.3533.
- Garrett ES, Zeger SL. Latent class model diagnosis. Biometrics. 2000;56:1055–1067. doi: 10.1111/j.0006-341x.2000.01055.x.
- Gelfand AE, Sahu SK. Identifiability, improper priors, and Gibbs sampling for generalized linear models. Journal of the American Statistical Association. 1999;94:247–253.
- Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. Chapman and Hall; New York: 2004.
- Gilbert PB, Hudgens MG. Evaluating causal effect predictiveness of candidate surrogate endpoints. Biometrics. 2008. doi: 10.1111/j.1541-0420.2008.01014.x.
- Green PE, Park T. A Bayesian hierarchical model for categorical data with nonignorable nonresponse. Biometrics. 2003;59:886–896. doi: 10.1111/j.0006-341x.2003.00103.x.
- Gustafson P. On model expansion, model contraction, identifiability, and prior information: two illustrative scenarios involving mismeasured variables. Statistical Science. 2005;20:111–140.
- Imbens GW, Rubin DB. Bayesian inference for causal effects in randomized experiments with noncompliance. The Annals of Statistics. 1997;25:305–327.
- King R, Brooks SP. Prior induction in log-linear models for general contingency table analysis. The Annals of Statistics. 2001;29:715–747.
- Joffe MM, Greene T. Related causal frameworks for surrogate outcomes. Biometrics. doi: 10.1111/j.1541-0420.2008.01106.x.
- Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd Edition. Wiley; New York: 2002.
- McCandless LC, Gustafson P, Levy A. Bayesian sensitivity analysis for unmeasured confounding in observational studies. Statistics in Medicine. 2007;26:2331–2347. doi: 10.1002/sim.2711.
- Musch DC, Gillespie BW, Lichter PR, Niziol LM, Janz NK, CIGTS Study Investigators. Visual field progression in the Collaborative Initial Glaucoma Treatment Study: the impact of treatment and other baseline factors. Ophthalmology. 2009;116:200–207. doi: 10.1016/j.ophtha.2008.08.051.
- Musch DC, Lichter PR, Guire KE, Standardi CL, CIGTS Investigators. The Collaborative Initial Glaucoma Treatment Study (CIGTS): Study design, methods, and baseline characteristics of enrolled patients. Ophthalmology. 1999;106:653–662. doi: 10.1016/s0161-6420(99)90147-1.
- Prentice RL. Surrogate endpoints in clinical trials, definition and operational criteria. Statistics in Medicine. 1989;8:431–440. doi: 10.1002/sim.4780080407.
- Robins JM, Greenland S. Identifiability and exchangeability of direct and indirect effects. Epidemiology. 1992;3:143–155. doi: 10.1097/00001648-199203000-00013.
- Rosenbaum PR. The consequences of adjustment for a concomitant variable that has been affected by the treatment. Journal of the Royal Statistical Society, Series A. 1984;147:656–666.
- Rubin DB. Bayesian inference for causal effects: the role of randomization. The Annals of Statistics. 1978;6:34–58.
- Rubin DB. Randomization analysis of experimental data: the Fisher randomization test: comment. Journal of the American Statistical Association. 1980;75:591–593.
- Taylor JMG, Wang Y, Thiébaut R. Counterfactual links to the proportion of treatment effect explained by a surrogate marker. Biometrics. 2005;61:1102–1111. doi: 10.1111/j.1541-0420.2005.00380.x.
- Wang Y, Taylor JMG. A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics. 2002;58:803–812. doi: 10.1111/j.0006-341x.2002.00803.x.
- VanderWeele T. Simple relations between principal stratification and direct and indirect effects. Statistics and Probability Letters. doi: 10.1016/j.spl.2008.05.029.
- Venkatraman ES, Begg CB. Properties of a nonparametric test for early comparison of treatments in clinical trials in the presence of surrogate endpoints. Biometrics. 1999;55:1171–1176. doi: 10.1111/j.0006-341x.1999.01171.x.
- Weir CJ, Walley RJ. Statistical evaluation of biomarkers as surrogate endpoints: a literature review. Statistics in Medicine. 2006;25:183–203. doi: 10.1002/sim.2319.