Med Care. Author manuscript; available in PMC: 2025 Feb 1.
Published in final edited form as: Med Care. 2023 Dec 11;62(2):102–108. doi: 10.1097/MLR.0000000000001956

Statistical Methods to Evaluate Surrogate Markers

Layla Parast a, Lu Tian b, Tianxi Cai c,d, Latha Palaniappan e
PMCID: PMC10842261  NIHMSID: NIHMS1944992  PMID: 38079232

Abstract

Background

There is tremendous interest in evaluating surrogate markers given their potential to decrease study time, costs, and patient burden.

Objectives

The purpose of this statistical workshop article is to describe and illustrate how to evaluate a surrogate marker of interest using the proportion of treatment effect explained (PTE) as a measure of the quality of the surrogate marker for (1) a setting with a general fully observed primary outcome (e.g., biopsy score) and (2) a setting with a time-to-event primary outcome which may be censored due to study termination or early dropout (e.g., time to diabetes).

Methods

The methods are motivated by two randomized trials, one among children with nonalcoholic fatty liver disease where the primary outcome was change in biopsy score (general outcome) and another study among adults at high risk for Type 2 diabetes where the primary outcome was time to diabetes (time-to-event outcome). The methods are illustrated using the Rsurrogate package with detailed R code provided.

Results

In the biopsy score outcome setting, the estimated PTE of the examined surrogate marker was 0.182 (95% confidence interval [CI]: 0.121, 0.240), i.e., the surrogate explained only 18.2% of the treatment effect on the biopsy score. In the diabetes setting, the estimated PTE of the surrogate marker was 0.596 (95% CI: 0.404, 0.760), i.e., the surrogate explained 59.6% of the treatment effect on diabetes incidence.

Conclusions

This statistical workshop provides tools that will support future researchers in the evaluation of surrogate markers.

Keywords: surrogate marker, treatment effect, clinical trial, biostatistics

Introduction

The primary aim of many clinical studies is to evaluate the effect of a treatment or intervention with respect to a patient-level clinical outcome. Depending on the outcome of interest, these studies often need long-term follow-up of participants to measure the outcome.1,2 A surrogate marker is a patient-level outcome that can be used in place of the primary outcome to test for a treatment effect earlier.3,4 A surrogate marker is generally a measurement that can be obtained earlier than the primary outcome, or that is less costly or less invasive for the patient than the primary outcome. There is tremendous interest in identifying surrogate markers given their potential to decrease study time, costs, and patient burden by allowing decisions about a treatment’s effectiveness to be made sooner or at lower cost.

The U.S. Food and Drug Administration’s (FDA) Accelerated Approval Program allows for drugs to be approved based on demonstrated effectiveness on a surrogate marker.5 While the FDA describes a surrogate marker as an intermediate endpoint that measures “a therapeutic effect that is considered reasonably likely to predict the clinical benefit of a drug, such as an effect on irreversible morbidity and mortality,” the statistical implications of that description are unclear.5,6 In the statistical literature, there is currently no agreement on a single optimal way to validate a surrogate.7,8 However, a widely used measure of surrogate validity in practice is the proportion of the treatment effect on the primary outcome that is explained by the surrogate marker (PTE).4,9,10 For example, in a study of patients with relapsing multiple sclerosis, the PTE was used to quantify how much of the effect of a once-daily oral immunomodulator on confirmed disability worsening was explained by the treatment’s effect on brain volume loss (PTE estimate = 51.3%).11 In a study of antiviral agents for genital herpes simplex virus (HSV), the PTE was used to investigate whether HSV shedding was a valid surrogate marker for the primary clinical outcome of genital herpes lesions (PTE estimates = 40–82% for different agents).12

Other useful statistical measures have been proposed to evaluate surrogates, such as average causal necessity, average causal sufficiency, and the causal effect predictiveness surface in a principal stratification framework,13–15 and the relative effect in the meta-analytic framework.16,17 Notably, all available methods to evaluate a surrogate assume that data are available on both the primary outcome and the surrogate marker for the purpose of evaluation. The single-number summary of the PTE, as well as its intuitive interpretation, have led to broad use of the PTE as a measure of surrogate strength in clinical studies. For example, when using the PTE to quantify surrogate strength, a value near 1.0 (or 100%) reflects a surrogate that can explain almost all of the treatment effect on the primary outcome, while a value near 0 reflects a surrogate that essentially cannot explain any of the treatment effect. In theory, a high value of the PTE would ensure that using the surrogate marker to evaluate the effect of a treatment is likely to provide a reasonable approximation of the treatment effect on the primary outcome. However, there is variability in what is considered a high enough PTE value. Previous statistical work has suggested examining whether the lower bound of the 95% confidence interval for the PTE is above some threshold such as 0.50 or 0.75.18

Within certain estimation frameworks, the PTE is criticized because its definition, estimation, and statistical inference rely on correct parametric model specification (e.g., linear regressions with or without interaction terms).9,10,18–20 These models are unlikely to accurately capture the complex relationship between the treatment, surrogate marker, and primary outcome. As an alternative, there exist nonparametric, model-free approaches to define and estimate the PTE that do not involve model specification and have been shown to perform better than the model-based approaches when the true relationship between the treatment, surrogate marker, and primary outcome is complex.19–21

The purpose of this statistical workshop article is to describe and illustrate how to evaluate a surrogate marker of interest using the PTE for (1) a setting with a general fully observed primary outcome (e.g., biopsy score) and (2) a setting with a time-to-event primary outcome where individuals may be censored/drop out of the study (e.g., time to diabetes). While we focus on the nonparametric approaches, the parametric model-based approaches are also included for completeness. For both settings, we provide the methodological details, R code (see supplementary material), and numeric results; analyses were done in R using the Rsurrogate22 package which is publicly available on CRAN.23

Methods

Motivating Data

We first describe two motivating datasets. The first is a randomized clinical trial among children with nonalcoholic fatty liver disease (NAFLD), the most common cause of chronic liver disease in children in the United States, which can ultimately progress to advanced fibrosis, cirrhosis, and hepatocellular carcinoma.24 This study, the Treatment of Nonalcoholic Fatty Liver Disease in Children (TONIC) trial,25 randomized children with NAFLD to vitamin E, metformin, or placebo and followed participants for 2 years. For our purpose, the primary outcome of interest was the NAFLD activity score, which is a measure of improvement in liver function and is ascertained via liver biopsy, a procedure that requires hospitalization and general anesthesia. This NAFLD score combines information on steatosis (build-up of fat in the liver), hepatocyte ballooning (a form of liver cell degeneration associated with cell swelling and enlargement), and lobular inflammation (two or more inflammatory cells within the lobule), such that higher numbers indicate worse liver function. The surrogate marker of interest was change in alanine aminotransferase (ALT), an enzyme concentrated primarily in the liver whose levels increase when there is liver damage. Unlike the NAFLD score, ALT can be easily measured with a simple outpatient venous blood draw. Here, both the primary outcome and the surrogate marker are measured at the same time (2 years), but the advantage of potentially using ALT instead of the NAFLD activity score is a reduction in patient burden, i.e., avoiding the need for a repeat invasive biopsy.

The second motivating dataset, the Diabetes Prevention Program (DPP),26–28 is from a randomized, double-blind, placebo-controlled clinical trial among adults with impaired glucose tolerance. Participants were randomly assigned to placebo, lifestyle intervention, metformin, or troglitazone and followed for 5 years. For our purpose, the primary outcome of interest was time to diabetes diagnosis (with follow-up to 5 years; censored outcome), and the surrogate marker of interest was change in fasting plasma glucose at 2 years post-randomization. Fasting plasma glucose is a reasonable potential surrogate because a diabetes diagnosis in the DPP was defined based on this measurement, such that values greater than or equal to 140 mg/dL resulted in a diabetes diagnosis. Here, using the surrogate marker in place of the primary outcome would have the advantage of reducing the required follow-up time. Both datasets are publicly available through the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Data Repository via a data use agreement (DUA).29 Although we have access to these datasets, we cannot share them with others who do not have a DUA, so our illustrations in this article use simulated data available in the R package Rsurrogate22 (described below). These simulated datasets mirror the motivating datasets in terms of surrogate strength: in the first dataset, the surrogate marker is weak, with a true PTE of 0.20, and in the second dataset, the surrogate marker is moderately strong, with a true PTE of 0.60. One can reproduce the analysis results reported in this paper using these simulated datasets.

Statistical Methods: General fully observed primary outcome (e.g., biopsy score)

We first introduce notation to define the quantities of interest; Table 1 summarizes our notation and definitions. Let $G$ be the binary randomized treatment indicator, with $G = 1$ for treatment and $G = 0$ for control (or placebo). Let $Y$ and $S$ denote the primary outcome and surrogate marker, respectively, observed for all subjects $i = 1, \ldots, n$. This setting is depicted in the top portion of Figure 1. For example, in the TONIC trial, $Y$ is the NAFLD activity score and $S$ is ALT, both measured 2 years after randomization. The goal is to measure the surrogacy of $S$ by estimating the proportion of the treatment effect on $Y$ that is explained by the treatment effect on $S$ (PTE). Throughout, we assume that $S$ is a continuous measure; this assumption holds in our motivating examples and, moreover, allows us to use nonparametric kernel smoothing over $S$, described below. Binary or categorical surrogate markers would make the statistical inference more straightforward. Importantly, $Y$ and $S$ may be on very different scales, as is the case in both motivating examples. We use potential outcomes notation to define this quantity, such that $Y^{(g)}$ and $S^{(g)}$ denote the primary outcome and surrogate marker under treatment $G = g$. That is, $Y^{(1)}$, $Y^{(0)}$, $S^{(1)}$, and $S^{(0)}$ denote the primary outcome under treatment, the primary outcome under control, the surrogate marker under treatment, and the surrogate marker under control, respectively. In practice, we observe only the pair for the treatment group to which the individual was assigned: $(Y, S) = (Y^{(1)}, S^{(1)})$ or $(Y^{(0)}, S^{(0)})$. The observed data consist of $n_1$ independent and identically distributed (i.i.d.) copies of $(Y^{(1)}, S^{(1)})$, denoted $(Y_{1i}, S_{1i})$, $i = 1, \ldots, n_1$, from the treatment group and $n_0$ i.i.d. copies of $(Y^{(0)}, S^{(0)})$, denoted $(Y_{0i}, S_{0i})$, $i = 1, \ldots, n_0$, from the control group.

Table 1.

Notation and Definitions

| Name | Notation, General Outcome | Notation, Time-to-event Outcome |
|---|---|---|
| Primary outcome | $Y$ | $T$ |
| Surrogate marker | $S$ | $S$ |
| Treatment | $G$ | $G$ |
| Treatment effect | $\Delta$ | $\Delta(t)$ |
| Residual treatment effect | $\Delta_S$ | $\Delta_S(t,t_0)$ |
| Proportion of treatment effect explained (PTE) | $R$ | $R(t,t_0)$ |
| Estimated PTE using Freedman approach | $\hat{R}_F$ | N/A |
| Estimated PTE using flexible model-based approach | $\hat{R}_M$ | N/A |
| Estimated PTE using nonparametric approach | $\hat{R}_S$ | $\hat{R}(t,t_0)$ |

Figure 1.


Illustration of a general outcome setting (i), where the primary outcome (e.g., biopsy score) and surrogate measurement (blood biomarker) are both measured for all individuals and may or may not be measured at the same time; and a time-to-event outcome setting (ii), where the occurrence of the primary outcome (e.g., diabetes diagnosis or myocardial infarction) is measured at time $t$ and the surrogate marker (blood biomarker) is measured earlier, at time $t_0 < t$, and both are potentially missing due to censoring

The most common approach in the applied literature for defining and estimating the PTE is a simple regression model approach. We do not recommend this approach because, as described below, when the models do not hold, the resulting estimates can be severely biased. However, we describe this approach here for completeness and to justify our recommendation against its use. Freedman9 proposed defining the PTE of a surrogate marker by specifying two regression models: one in which the only predictor is the treatment indicator, and a second in which the predictors are the treatment indicator and the surrogate marker:

$$\text{Model 1: } Y = \beta_0 + \beta_1 G$$
$$\text{Model 2: } Y = \beta_0^* + \beta_1^* G + \beta_2^* S$$

The PTE is defined as $R = 1 - \beta_1^*/\beta_1$ and is estimated by plugging in the estimated regression coefficients; we denote the resulting estimate as $\hat{R}_F$. It is clear why this approach is often used: it is intuitive and easy to implement. If one were to fit Model 1 and see a large and significant treatment effect ($\beta_1$) and then fit Model 2 and see a much smaller treatment coefficient ($\beta_1^*$), then $R$ would be close to 1. This appears reasonable, as the surrogate seems to capture a large proportion of the overall treatment effect. The challenge with this definition and procedure is that it is valid only if these two models actually hold in practice. When these models do not hold, it is no longer clear what $R$ is quantifying, and previous work has shown substantial bias in the evaluation of the surrogacy of $S$. For example, an interaction between $G$ and $S$ would mean that Model 2 does not hold; this would happen if the associations of the primary outcome with the treatment and with the surrogate were not simply additive but were instead complex and intertwined.
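For readers who want to see the mechanics, the Freedman estimate can be computed without a regression library: with a binary $G$, the least-squares coefficient on $S$ in Model 2 equals the pooled within-arm slope, and the Model 2 treatment coefficient equals the difference in arm means of $Y$ minus that slope times the difference in arm means of $S$. The Python sketch below uses that identity; the function name `freedman_pte` and the simulated data (in which Model 2 holds exactly, so the true PTE is $2/5 = 0.4$) are hypothetical and are not taken from the article or from Rsurrogate.

```python
import random
from statistics import mean


def freedman_pte(y1, s1, y0, s0):
    """R_F = 1 - beta1_star / beta1 from the two least-squares models.

    Model 1 (Y ~ G): beta1 is the difference in arm means of Y.
    Model 2 (Y ~ G + S): with a binary G, the S coefficient is the pooled
    within-arm slope, and beta1_star = beta1 - beta2_star * (mean S1 - mean S0).
    """
    beta1 = mean(y1) - mean(y0)
    sxy = sxx = 0.0
    for y, s in ((y1, s1), (y0, s0)):
        ybar, sbar = mean(y), mean(s)
        # within-arm cross-products and sums of squares, pooled over arms
        sxy += sum((si - sbar) * (yi - ybar) for si, yi in zip(s, y))
        sxx += sum((si - sbar) ** 2 for si in s)
    beta2_star = sxy / sxx
    beta1_star = beta1 - beta2_star * (mean(s1) - mean(s0))
    return 1 - beta1_star / beta1


# Hypothetical data in which Model 2 holds: Y = 2S (+3 if treated) + noise,
# S0 ~ N(0, 1), S1 ~ N(1, 1); Delta = 3 + 2 = 5, so the true PTE is 0.4.
rng = random.Random(1)
s0 = [rng.gauss(0, 1) for _ in range(500)]
s1 = [rng.gauss(1, 1) for _ in range(500)]
y0 = [2 * s + rng.gauss(0, 1) for s in s0]
y1 = [3 + 2 * s + rng.gauss(0, 1) for s in s1]
print(round(freedman_pte(y1, s1, y0, s0), 3))
```

Because the estimate is a ratio of regression coefficients, it is only interpretable as a PTE when both models are correct; with a treatment-by-surrogate interaction, the same code still returns a number, but it no longer targets the quantity defined below.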

We now describe a definition of the PTE that is model-free and formally uses the potential outcomes notation.10 First, the treatment effect on the primary outcome is defined as

$$\Delta = E\{Y^{(1)} - Y^{(0)}\} = E\{Y^{(1)}\} - E\{Y^{(0)}\}$$

where E denotes the expected value of a random variable. Second, the residual treatment effect is defined as

$$\Delta_S = \int E\{Y^{(1)} - Y^{(0)} \mid S^{(1)} = S^{(0)} = s\}\, dF_0(s)$$

where $F_0(s)$ is the cumulative distribution function of $S$ in the control group. This residual treatment effect is essentially the treatment effect on $Y$ that is left over after removing the treatment effect on $S$: this form of $\Delta_S$ reflects the treatment effect that would remain if the surrogate marker distribution in the treated group looked like that of the control group, i.e., if there were no treatment effect on the surrogate marker. The difference $\Delta - \Delta_S$ reflects the portion of the treatment effect on $Y$ explained by $S$, and thus the PTE is defined as

$$R = \frac{\Delta - \Delta_S}{\Delta} = 1 - \frac{\Delta_S}{\Delta}.$$

Note that no model specification of any kind is used in this definition.
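As a worked illustration of these definitions, consider a hypothetical setting (not one of the motivating studies) with $E\{Y^{(0)} \mid S^{(0)} = s\} = s$, $E\{Y^{(1)} \mid S^{(1)} = s\} = 2 + 1.5s$, $S^{(0)} \sim N(0, 1)$, and $S^{(1)} \sim N(0.5, 1)$, and assume that the conditional mean of $Y^{(g)}$ depends only on $S^{(g)}$. Then:

$$\Delta = E\{2 + 1.5\,S^{(1)}\} - E\{S^{(0)}\} = 2 + 1.5(0.5) - 0 = 2.75$$
$$\Delta_S = \int \{(2 + 1.5s) - s\}\, dF_0(s) = 2 + 0.5\,E\{S^{(0)}\} = 2$$
$$R = 1 - \frac{2}{2.75} = \frac{0.75}{2.75} \approx 0.27$$

In this illustration, only the part of the treatment effect that runs through the shift in $S$ (0.75 of the total 2.75) counts as explained, so about 27% of the treatment effect on $Y$ is explained by $S$.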

In terms of estimation, Δ can simply be estimated using the empirical mean within each treatment group:

$$\hat{\Delta} = \frac{1}{n_1}\sum_{i=1}^{n_1} Y_{1i} - \frac{1}{n_0}\sum_{i=1}^{n_0} Y_{0i}$$

while estimation of $\Delta_S$ is more complex and requires a decision regarding the form of $E(Y^{(1)} \mid S^{(1)})$ and $E(Y^{(0)} \mid S^{(0)})$. If one were to use simple regression models, e.g.,

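$$E\{Y^{(0)} \mid S^{(0)}, S^{(1)}\} = \alpha_0 + \alpha_1 S^{(0)} \quad \text{and} \quad E\{Y^{(1)} \mid S^{(1)}, S^{(0)}\} = (\alpha_0 + \alpha_2) + \alpha_1 S^{(1)},$$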

and assume that they were correctly specified, then this would be equivalent to the Freedman approach, $\hat{R}_F$, described above. However, one could specify more flexible models here, such as:

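$$E\{Y^{(0)} \mid S^{(0)}, S^{(1)}\} = \alpha_0 + \alpha_1 S^{(0)} \quad \text{and} \quad E\{Y^{(1)} \mid S^{(1)}, S^{(0)}\} = (\alpha_0 + \alpha_2) + (\alpha_1 + \alpha_3) S^{(1)},$$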

which allows for an interaction between the surrogate and the treatment. Under these models, the PTE is

$$R = 1 - \frac{\alpha_2 + \alpha_3 E\{S^{(0)}\}}{\alpha_2 + \alpha_3 E\{S^{(1)}\} + \alpha_1 E\{S^{(1)} - S^{(0)}\}},$$

and we denote the resulting estimate as $\hat{R}_M$, which is obtained by fitting the specified flexible regression models and plugging in the corresponding coefficients. Like the Freedman approach, the validity of this estimator relies on these models being correctly specified.
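The expression for $R$ under the flexible models follows directly from the model-free definitions of $\Delta$ and $\Delta_S$; the short derivation below is added here as a check and is not part of the original text:

$$\Delta_S = \int \big[\{(\alpha_0 + \alpha_2) + (\alpha_1 + \alpha_3)s\} - \{\alpha_0 + \alpha_1 s\}\big]\, dF_0(s) = \alpha_2 + \alpha_3 E\{S^{(0)}\}$$
$$\Delta = \big[\alpha_0 + \alpha_2 + (\alpha_1 + \alpha_3) E\{S^{(1)}\}\big] - \big[\alpha_0 + \alpha_1 E\{S^{(0)}\}\big] = \alpha_2 + \alpha_3 E\{S^{(1)}\} + \alpha_1 E\{S^{(1)} - S^{(0)}\}$$

Dividing and subtracting from 1 gives the formula for $R$. Setting $\alpha_3 = 0$ (no interaction) gives $\Delta_S = \alpha_2$ and $R = \alpha_1 E\{S^{(1)} - S^{(0)}\} / [\alpha_2 + \alpha_1 E\{S^{(1)} - S^{(0)}\}]$, which matches the Freedman quantity $1 - \beta_1^*/\beta_1$ when Model 2 holds with $\beta_1^* = \alpha_2$ and $\beta_2^* = \alpha_1$.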

A more robust alternative is nonparametric kernel-based estimation, which does not rely at all on the parametric models specified above. Specifically, $\mu_1(s) = E(Y^{(1)} \mid S^{(1)} = s)$ can be estimated using:

$$\hat{\mu}_1(s) = \frac{\sum_{i=1}^{n_1} K_h(S_{1i} - s)\, Y_{1i}}{\sum_{i=1}^{n_1} K_h(S_{1i} - s)}$$

where $K_h(x) = h^{-1} K(h^{-1} x)$, $K(\cdot)$ is an appropriate kernel function, and $h$ is a data-dependent bandwidth. The nonparametric estimate of $\Delta_S$ can be shown to be:

$$\hat{\Delta}_S = \frac{1}{n_0}\sum_{i=1}^{n_0} \hat{\mu}_1(S_{0i}) - \frac{1}{n_0}\sum_{i=1}^{n_0} Y_{0i}$$

and we denote the resulting estimate of $R$ as $\hat{R}_S = 1 - \hat{\Delta}_S/\hat{\Delta}$. As with all statistical methods, there are trade-offs when using one approach versus another. Here, the nonparametric estimate is completely model-free and thus robust to departures from the simplified regression models specified above. However, the kernel smoothing component requires a relatively large sample size (e.g., 200+ per treatment group) and similar supports for $S$ in the two treatment groups. Note that other nonparametric methods, such as splines, can also be used to estimate the conditional expectation $\mu_1(s)$. In this paper, we illustrate and compare the three estimation approaches described: the Freedman estimate ($\hat{R}_F$), the flexible model-based estimate ($\hat{R}_M$), and the nonparametric estimate ($\hat{R}_S$). Variance estimates and 95% confidence intervals for $\Delta$, $\Delta_S$, and $R$ are obtained using bootstrap resampling.
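The three steps of the nonparametric approach (empirical $\hat{\Delta}$, kernel estimate $\hat{\mu}_1$, and plug-in $\hat{\Delta}_S$) can be sketched in Python as follows. This is a minimal illustration, not the Rsurrogate implementation used for the analyses in this article (R.s.estimate): the Gaussian kernel, the rule-of-thumb bandwidth (which may differ from Rsurrogate's default), the helper names, and the simulated data are all assumptions. A simple support check is included because $\hat{\mu}_1$ is only reliable where treated surrogate values cover the control values.

```python
import math
import random
from statistics import mean, stdev


def gaussian_kernel(x):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def nw_mean(s_train, y_train, s, h):
    """Nadaraya-Watson estimate of mu_1(s) = E(Y | S = s) in the treated arm."""
    weights = [gaussian_kernel((si - s) / h) / h for si in s_train]
    total = sum(weights)
    if total == 0.0:  # s lies far outside the treated-arm support
        raise ValueError("no treated observations near s; check overlap")
    return sum(w * y for w, y in zip(weights, y_train)) / total


def outside_support_fraction(s1, s0):
    """Share of control surrogate values outside the observed treated range."""
    lo, hi = min(s1), max(s1)
    return sum(not (lo <= s <= hi) for s in s0) / len(s0)


def nonparametric_pte(y1, s1, y0, s0, h=None):
    """Return (R_S, Delta, Delta_S) using kernel smoothing over S."""
    if h is None:
        # Hypothetical choice: Silverman-type rule of thumb on the treated S.
        h = 1.06 * stdev(s1) * len(s1) ** (-1 / 5)
    delta = mean(y1) - mean(y0)
    # Average the treated-arm regression over the control surrogate values.
    delta_s = mean(nw_mean(s1, y1, s, h) for s in s0) - mean(y0)
    return 1 - delta_s / delta, delta, delta_s


# Hypothetical data with a treatment-by-surrogate interaction:
# E(Y0 | S0 = s) = s, E(Y1 | S1 = s) = 2 + 1.5 s, S0 ~ N(0, 1), S1 ~ N(0.5, 1),
# so Delta = 2.75, Delta_S = 2 and the true PTE is 0.75 / 2.75 (about 0.27).
rng = random.Random(42)
s0 = [rng.gauss(0, 1) for _ in range(500)]
s1 = [rng.gauss(0.5, 1) for _ in range(500)]
y0 = [s + rng.gauss(0, 0.5) for s in s0]
y1 = [2 + 1.5 * s + rng.gauss(0, 0.5) for s in s1]

r, delta, delta_s = nonparametric_pte(y1, s1, y0, s0)
print(round(r, 3), round(delta, 3), round(delta_s, 3))
print("control S outside treated range:", outside_support_fraction(s1, s0))
```

The estimate should land near the true value of about 0.27, typically slightly below it, because Nadaraya-Watson smoothing introduces some bias where the treated surrogate density changes. Swapping in a spline or local-linear smoother for `nw_mean` would be a drop-in change.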

Statistical Methods: Time-to-event primary outcome where individuals may be censored/drop out of the study (e.g., time to diabetes).

The time-to-event outcome setting is depicted in the lower portion of Figure 1. Here, the primary outcome is a time-to-event outcome that is subject to censoring and is denoted by $T$. The surrogate marker $S$ is measured at some time point $t_0$, where $t_0 < t$ and $t$ is the largest follow-up time in the study. For example, in the DPP study, $T$ is the time from treatment randomization to diabetes diagnosis and $S$ is the change in fasting plasma glucose from baseline to 2 years post-randomization. Again using potential outcomes notation, $T^{(1)}$, $T^{(0)}$, $S^{(1)}$, and $S^{(0)}$ denote the time of the primary outcome under treatment, the time of the primary outcome under control, the surrogate marker under treatment, and the surrogate marker under control, respectively. In this setting, the treatment effect on the primary outcome can be quantified as the difference in survival (or cumulative incidence) to time $t$:

$$\Delta(t) = P(T^{(1)} > t) - P(T^{(0)} > t).$$

The residual treatment effect, $\Delta_S(t,t_0)$, is intuitively similar to that in the general outcome setting but more complex, in that it reflects the treatment effect left over after taking into account the treatment effect on both the surrogate marker and the primary outcome incidence up to $t_0$.20 The PTE is then defined as

$$R(t,t_0) = \frac{\Delta(t) - \Delta_S(t,t_0)}{\Delta(t)} = 1 - \frac{\Delta_S(t,t_0)}{\Delta(t)}.$$

Due to censoring, we do not observe the event time for all patients. Instead, we observe $n_1$ observations, $(X_{1i}, \delta_{1i}, S_{1i})$, $i = 1, \ldots, n_1$, from the treatment group and $n_0$ observations, $(X_{0i}, \delta_{0i}, S_{0i})$, $i = 1, \ldots, n_0$, from the control group, where $X_{gi} = \min(T_{gi}, C_{gi})$, $\delta_{gi} = I(T_{gi} < C_{gi})$, $C_{gi}$ is the censoring time, and $I(A)$ is the indicator function, which is equal to 1 if $A$ is true and 0 otherwise. The treatment effect, $\Delta(t)$, can be estimated using inverse probability of censoring weighting:

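$$\hat{\Delta}(t) = \frac{1}{n_1}\sum_{i=1}^{n_1} \frac{I(X_{1i} > t)}{\hat{W}_1^C(t)} - \frac{1}{n_0}\sum_{i=1}^{n_0} \frac{I(X_{0i} > t)}{\hat{W}_0^C(t)}$$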

where $\hat{W}_g^C(t)$ is the Kaplan-Meier estimator of the survival probability of the censoring time $C_{gi}$ at time $t$ in group $g$. The residual treatment effect, $\Delta_S(t,t_0)$, can be estimated nonparametrically as:

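$$\hat{\Delta}_S(t,t_0) = \frac{1}{n_0}\sum_{i=1}^{n_0} \frac{\hat{\Psi}_1(t \mid S_{0i}, t_0)\, I(X_{0i} > t_0)}{\hat{W}_0^C(t_0)} - \frac{1}{n_0}\sum_{i=1}^{n_0} \frac{I(X_{0i} > t)}{\hat{W}_0^C(t)}$$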

where $\hat{\Psi}_1(t \mid s, t_0)$ is the kernel-based nonparametric estimate30,31 of $P(T^{(1)} > t \mid S^{(1)} = s, T^{(1)} > t_0)$. Finally, the PTE is estimated as $\hat{R}(t,t_0) = 1 - \hat{\Delta}_S(t,t_0)/\hat{\Delta}(t)$. Variance estimates and 95% confidence intervals are obtained using bootstrap resampling.
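The censored-data estimators above can be sketched in Python as follows; again, this is an illustration and not the Rsurrogate implementation (R.s.surv.estimate). Assumptions, all hypothetical: continuous event and censoring times with no ties; $\hat{\Psi}_1$ computed with a Beran-type kernel-weighted Kaplan-Meier estimator, which is one standard kernel-based conditional survival estimator; a Gaussian kernel with a rule-of-thumb bandwidth; and simulated data in which treatment acts on $T$ only through $S$, so that $\Delta_S(t,t_0) = 0$ and the true PTE is 1.

```python
import math
import random
from statistics import stdev


def km_censoring_survival(x, delta):
    """Return t -> Kaplan-Meier estimate of P(C > t) for one arm, treating
    censored observations (delta == 0) as the events. Assumes no tied times."""
    steps, surv, at_risk = [], 1.0, len(x)
    for i in sorted(range(len(x)), key=x.__getitem__):
        if delta[i] == 0:
            surv *= 1 - 1 / at_risk
        steps.append((x[i], surv))
        at_risk -= 1

    def w_hat(t):
        value = 1.0
        for time, s in steps:
            if time > t:
                break
            value = s
        return value

    return w_hat


def beran_survival(x, delta, s_obs, s, t0, t, h):
    """Kernel-weighted (Beran-type) Kaplan-Meier estimate of
    P(T > t | S = s, T > t0) from one arm, using only subjects with X > t0
    (those whose surrogate was measured). Gaussian kernel; no tied times."""
    idx = sorted((i for i in range(len(x)) if x[i] > t0), key=x.__getitem__)
    w = [math.exp(-0.5 * ((s_obs[i] - s) / h) ** 2) for i in idx]
    risk = [0.0] * (len(idx) + 1)  # risk[k]: kernel weight still at risk at k-th time
    for k in range(len(idx) - 1, -1, -1):
        risk[k] = risk[k + 1] + w[k]
    surv = 1.0
    for k, i in enumerate(idx):
        if x[i] > t:
            break
        if delta[i] == 1 and risk[k] > 0:
            surv *= 1 - w[k] / risk[k]
    return surv


def pte_censored(x1, d1, s1, x0, d0, s0, t, t0, h=None):
    """Return (R(t, t0), Delta(t), Delta_S(t, t0)) estimates."""
    n1, n0 = len(x1), len(x0)
    w1 = km_censoring_survival(x1, d1)
    w0 = km_censoring_survival(x0, d0)
    if h is None:
        # Hypothetical rule-of-thumb bandwidth on treated S among those at risk at t0
        s_risk = [s1[i] for i in range(n1) if x1[i] > t0]
        h = 1.06 * stdev(s_risk) * len(s_risk) ** (-1 / 5)
    # IPCW estimates of P(T > t) in each arm
    surv1 = sum(xi > t for xi in x1) / (n1 * w1(t))
    surv0 = sum(xi > t for xi in x0) / (n0 * w0(t))
    delta_t = surv1 - surv0
    # Treated conditional survival averaged over control surrogates at t0
    psi_sum = sum(beran_survival(x1, d1, s1, s0[i], t0, t, h)
                  for i in range(n0) if x0[i] > t0)
    delta_s = psi_sum / (n0 * w0(t0)) - surv0
    return 1 - delta_s / delta_t, delta_t, delta_s


def simulate_arm(n, shift, rng):
    """Hypothetical arm: S ~ N(shift, 1), hazard 0.3 * exp(S), censoring ~ Exp(0.1)."""
    s = [rng.gauss(shift, 1) for _ in range(n)]
    event = [rng.expovariate(0.3 * math.exp(si)) for si in s]
    cens = [rng.expovariate(0.1) for _ in range(n)]
    x = [min(a, b) for a, b in zip(event, cens)]
    d = [int(a < b) for a, b in zip(event, cens)]
    return x, d, s


rng = random.Random(2023)
x1, d1, s1 = simulate_arm(500, -1.0, rng)  # treatment lowers S, hence the hazard
x0, d0, s0 = simulate_arm(500, 0.0, rng)
r, delta_t, delta_s = pte_censored(x1, d1, s1, x0, d0, s0, t=3, t0=1)
print(round(delta_t, 3), round(delta_s, 3), round(r, 3))
```

In this setup the estimate of $\Delta_S(t,t_0)$ should be close to 0 and the PTE estimate close to, though usually somewhat below, 1; the shortfall reflects kernel smoothing bias and sampling noise rather than a genuine residual effect.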

A reasonable question would be whether there are equivalents to the Freedman or flexible model-based approaches that can be used in the censored outcome setting. Technically, one could fit two Cox proportional hazards models, one with only the treatment indicator and a second that adds the surrogate marker, and this would parallel the Freedman approach. However, in this case it is nearly impossible for these two models to hold simultaneously, and many have shown that this approach can lead to substantial bias.18,20 Thus, we only illustrate the nonparametric approach in the time-to-event outcome setting.

Software

These methods are illustrated using the Rsurrogate22 package available on CRAN23 and implemented in R Version 4.2.1. Specifically, the two main functions used are R.s.estimate and R.s.surv.estimate.

Results

Full R code to replicate these results is available in the appendix. The dataset for the general outcome setting included 500 subjects in the treated group and 500 subjects in the control group. The mean of the primary outcome, $Y$, was 22.07 (standard deviation [SD] = 6.01) in the treated group and 14.34 (SD = 4.28) in the control group, resulting in an estimated treatment effect on the primary outcome of $\hat{\Delta} = 7.73$. Estimates of the PTE using the Freedman estimate ($\hat{R}_F$), the flexible model-based estimate ($\hat{R}_M$), and the nonparametric estimate ($\hat{R}_S$) are shown in Table 2. The PTE estimates are similar, ranging from 0.182 to 0.216. Focusing on the nonparametric estimation approach, the estimated PTE was 0.182 with a 95% confidence interval of (0.121, 0.240). That is, we estimate that 18.2% of the treatment effect on the primary outcome is explained by the surrogate marker, reflecting weak surrogacy in this simulated data.

Table 2.

Results for the general outcome setting motivated by the Treatment of Nonalcoholic Fatty Liver Disease in Children (TONIC) trial, where the primary outcome was change in a liver biopsy score and the surrogate marker was alanine aminotransferase (ALT); estimates, standard error estimates, and associated 95% confidence intervals are shown for the Freedman estimate ($\hat{R}_F$), the flexible model-based estimate ($\hat{R}_M$), and the nonparametric estimate ($\hat{R}_S$)

| | $\hat{R}_F$ | $\hat{R}_M$ | $\hat{R}_S$ |
|---|---|---|---|
| Estimate | 0.216 | 0.215 | 0.182 |
| Standard Error | 0.023 | 0.024 | 0.032 |
| 95% Confidence Interval | (0.169, 0.259) | (0.166, 0.257) | (0.121, 0.240) |

The dataset for the time-to-event outcome setting included 500 subjects in the treated group and 500 subjects in the control group. Using $t_0 = 1$ and $t = 3$ for illustration, the estimated treatment effect on the primary outcome, measured as the difference in survival at $t = 3$ between the treatment and control groups, was $\hat{\Delta}(t) = 0.191$ (Table 3). Using the nonparametric approach, the estimated PTE was 0.596 (95% CI: 0.404, 0.760). Although the point estimate of 0.596 (i.e., the surrogate marker explained 59.6% of the treatment effect on survival to time $t$) indicates moderately strong surrogacy, the lower bound of the CI is below 0.50, so a reasonable conclusion would be that there is not sufficient evidence that this is a strong surrogate marker in this simulated data.

Table 3.

Results for the time-to-event outcome setting motivated by the Diabetes Prevention Program (DPP) clinical trial, where the primary outcome was time to diabetes diagnosis (with follow-up to 5 years; censored outcome) and the surrogate marker was change in fasting plasma glucose at 2 years post-randomization; estimates, standard error estimates, and associated 95% confidence intervals are shown for the treatment effect, $\hat{\Delta}(t)$, the residual treatment effect, $\hat{\Delta}_S(t,t_0)$, and the proportion of treatment effect explained by the surrogate, $\hat{R}(t,t_0)$, computed on the simulated data with $t_0 = 1$ and $t = 3$

| | $\hat{\Delta}(t)$ | $\hat{\Delta}_S(t,t_0)$ | $\hat{R}(t,t_0)$ |
|---|---|---|---|
| Estimate | 0.191 | 0.077 | 0.596 |
| Standard Error | 0.033 | 0.022 | 0.097 |
| 95% Confidence Interval | (0.125, 0.257) | (0.038, 0.123) | (0.404, 0.760) |
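As a quick arithmetic check on Table 3, the point estimates are internally consistent with the definition $R(t,t_0) = 1 - \Delta_S(t,t_0)/\Delta(t)$:

$$\hat{R}(t,t_0) = 1 - \frac{0.077}{0.191} \approx 1 - 0.403 = 0.597,$$

which agrees with the reported 0.596 up to rounding of the tabulated inputs. The confidence interval for $R(t,t_0)$, by contrast, is not a simple function of the other two intervals, because $\hat{\Delta}(t)$ and $\hat{\Delta}_S(t,t_0)$ are correlated; it comes from bootstrap resampling, as described in the Methods.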

Discussion

Using both parametric and nonparametric approaches, we have described and illustrated how to evaluate a surrogate marker of interest with the PTE in (1) a setting with a general fully observed primary outcome and (2) a setting with a time-to-event primary outcome where individuals may be censored or drop out of the study, and we have provided detailed R code to replicate all results.

Recommendations

  • If the primary outcome in the analysis is not subject to censoring, either a parametric or a nonparametric approach could be used. If there is substantial evidence (e.g., biological evidence supporting the model, or model diagnostics indicating excellent fit) that the models used in the parametric approach are correct, then one should use model-based parametric estimation, as it will be more efficient (smaller standard errors and narrower confidence intervals, as in Table 2) and less computationally burdensome than the nonparametric approach. However, if there is no evidence that these models hold, the nonparametric approach should be used.

  • If the primary outcome in the analysis is subject to censoring (or a nonlinear model is needed to model the association between the primary outcome and surrogate marker and treatment), the nonparametric approach should be used.

  • Bootstrap or similar resampling methods should be used to estimate the standard error for the PTE estimate and confidence intervals, as is implemented in the provided code.

  • If the sample size is very small, e.g., less than 50 per treatment group, the nonparametric approaches may not work well. Statistical work is ongoing to develop robust methods for small sample size settings.

  • If the observed distributions of the surrogate marker in the treated and control groups do not overlap (see Figure 2), the nonparametric approaches may not work well, since they smooth over the distribution in the treated group and apply this smoothed estimate to the control group. In this case, other methods, such as the surrogate transformation approach of Wang et al. (2020),21 should be considered.

  • It is important to consider the likelihood of being in a surrogate paradox situation. The surrogate paradox occurs when the surrogate and primary outcome are positively associated and the treatment has a positive effect on the surrogate, but a negative effect on the primary outcome. There are numerous examples of the surrogate paradox in clinical studies, and the consequences can be dire.32–36 It has been rigorously shown that certain causal assumptions can protect against this paradox.37 Essentially, these assumptions are that (1) the expected value of the primary outcome given the surrogate is monotone increasing; (2) there is a non-negative treatment effect on the surrogate marker for all potential values of the surrogate; and (3) conditional on the surrogate, there is a non-negative treatment effect on the primary outcome for all potential values of the surrogate. While these assumptions are stated in nearly every statistical paper on surrogate marker evaluation, they are not testable with observed data because they are stated in terms of the unknown true distributions of the random variables and unknown potential outcomes. Recent and ongoing work has focused on developing methods to assess sensitivity to violations of these assumptions.38,39 If these assumptions are unreasonable in one’s particular setting, results regarding surrogate strength should be interpreted with extreme caution.

  • Depending on the setting, one should consider the possibility of heterogeneity in the utility of the surrogate, meaning that the surrogate may be a valid replacement of the primary outcome for only a subgroup of individuals. Methods and software have been developed to examine and formally test for potential heterogeneity with respect to a baseline individual characteristic (e.g., age).40–42

  • After a strong surrogate marker is empirically confirmed, the ultimate use of the surrogate marker is to replace the primary outcome to examine the treatment effect in similar new studies. The transportability of the surrogacy between studies is not guaranteed and can be empirically examined via meta-analysis, when relevant data are available.
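The bootstrap recommendation above can be sketched as follows: resample subjects with replacement within each treatment arm, re-estimate the PTE on each resample, and summarize the replicates by their standard deviation (the standard error) and their 2.5th and 97.5th percentiles. As a stand-in estimator, the sketch uses a plug-in version of the flexible model-based estimate $\hat{R}_M$, computed from separate least-squares lines in each arm (the same fit as a model with a treatment-by-surrogate interaction). Resampling within arm, the percentile interval, the number of replicates, the helper names, and the simulated data are assumptions for illustration; Rsurrogate may construct its intervals differently.

```python
import random
from statistics import quantiles, stdev


def avg(v):
    return sum(v) / len(v)


def ols_line(s, y):
    """Least-squares intercept and slope of y on s."""
    sbar, ybar = avg(s), avg(y)
    slope = (sum((a - sbar) * (b - ybar) for a, b in zip(s, y))
             / sum((a - sbar) ** 2 for a in s))
    return ybar - slope * sbar, slope


def model_based_pte(y1, s1, y0, s0):
    """Plug-in R_M: control line gives alpha_0, alpha_1; treated line gives
    alpha_0 + alpha_2 and alpha_1 + alpha_3. The model-based denominator equals
    the difference in arm means because each OLS line passes through its means."""
    a0, b0 = ols_line(s0, y0)
    a1, b1 = ols_line(s1, y1)
    delta = avg(y1) - avg(y0)
    delta_s = (a1 - a0) + (b1 - b0) * avg(s0)  # alpha_2 + alpha_3 * mean(S0)
    return 1 - delta_s / delta


def bootstrap_pte(y1, s1, y0, s0, estimator, n_boot=500, seed=0):
    """Return (point estimate, bootstrap SE, 95% percentile interval)."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        i1 = [rng.randrange(len(y1)) for _ in y1]  # resample within treated arm
        i0 = [rng.randrange(len(y0)) for _ in y0]  # resample within control arm
        reps.append(estimator([y1[i] for i in i1], [s1[i] for i in i1],
                              [y0[i] for i in i0], [s0[i] for i in i0]))
    cuts = quantiles(reps, n=40)  # cut points every 2.5%
    return estimator(y1, s1, y0, s0), stdev(reps), (cuts[0], cuts[-1])


# Hypothetical data with an interaction; the true PTE is 0.75 / 2.75 (about 0.27).
rng = random.Random(7)
s0 = [rng.gauss(0, 1) for _ in range(500)]
s1 = [rng.gauss(0.5, 1) for _ in range(500)]
y0 = [s + rng.gauss(0, 0.5) for s in s0]
y1 = [2 + 1.5 * s + rng.gauss(0, 0.5) for s in s1]
est, se, ci = bootstrap_pte(y1, s1, y0, s0, model_based_pte)
print(round(est, 3), round(se, 3), [round(c, 3) for c in ci])
```

Any of the estimators discussed in the Methods could be passed as `estimator`; the kernel-based estimator is considerably slower per replicate, which is one practical reason to keep the number of bootstrap replicates modest when prototyping.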

Figure 2.


Illustration of overlapping surrogate distributions (a) vs. lack of overlap in surrogate distributions (b)

We hope that this workshop will be useful for future researchers in their evaluation of potential surrogate markers and increase the use of the nonparametric approaches given their lack of reliance on strict model assumptions.

Supplementary Material

Supplemental Data File (.doc, .tif, pdf, etc.)

Funding:

Support for this research was provided by National Institutes of Health grants R01DK118354 and K24HL150476.

Footnotes

Conflicts of interest: The authors have no conflicts of interest to disclose.

References

  • 1.Lindström J, Ilanne-Parikka P, Peltonen M, et al. Sustained reduction in the incidence of type 2 diabetes by lifestyle intervention: follow-up of the Finnish Diabetes Prevention Study. The Lancet. 2006;368(9548):1673–1679. [DOI] [PubMed] [Google Scholar]
  • 2.Li G, Zhang P, Wang J, et al. The long-term effect of lifestyle interventions to prevent diabetes in the China Da Qing Diabetes Prevention Study: a 20-year follow-up study. The Lancet. 2008;371(9626):1783–1789. [DOI] [PubMed] [Google Scholar]
  • 3.Temple R Are surrogate markers adequate to assess cardiovascular disease drugs? Jama. 1999;282(8):790–795. [DOI] [PubMed] [Google Scholar]
  • 4.Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in medicine. 1989;8(4):431–440. [DOI] [PubMed] [Google Scholar]
  • 5.FDA: Accelerated Approval. https://www.fda.gov/patients/fast-track-breakthrough-therapy-accelerated-approval-priority-review/accelerated-approval Accessed April 20, 2023.
  • 6.Mahase E FDA allows drugs without proven clinical benefit to languish for years on accelerated pathway. British Medical Journal Publishing Group; 2021. [DOI] [PubMed] [Google Scholar]
  • 7.Parast L, Gilbert P, Wu L. Statistical Challenges in the Identification, Validation, and Use of Surrogate Markers. https://www.birs.ca/cmo-workshops/2022/22w5184/report22w5184.pdf Banff International Research Station for Mathematical Innovation and Discovery 2022; [Google Scholar]
  • 8.Elliott MR. Surrogate Endpoints in Clinical Trials. Annual Review of Statistics and its Application. 2023;10:75–96. [Google Scholar]
  • 9.Freedman LS, Graubard BI, Schatzkin A. Statistical validation of intermediate endpoints for chronic diseases. Statistics in medicine. 1992;11(2):167–178. [DOI] [PubMed] [Google Scholar]
  • 10. Wang Y, Taylor JM. A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics. 2002;58(4):803–812.
  • 11. Sprenger T, Kappos L, Radue E-W, et al. Association of brain volume loss and long-term disability outcomes in patients with multiple sclerosis treated with teriflunomide. Multiple Sclerosis Journal. 2020;26(10):1207–1216.
  • 12. Agyemang E, Magaret AS, Selke S, Johnston C, Corey L, Wald A. Herpes simplex virus shedding rate: surrogate outcome for genital herpes recurrence frequency and lesion rates, and phase 2 clinical trials end point for evaluating efficacy of antivirals. The Journal of Infectious Diseases. 2018;218(11):1691–1699.
  • 13. Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58(1):21–29.
  • 14. Huang Y, Gilbert PB. Comparing biomarkers as principal surrogate endpoints. Biometrics. 2011;67(4):1442–1451.
  • 15. Gilbert PB, Hudgens MG. Evaluating candidate principal surrogate endpoints. Biometrics. 2008;64(4):1146–1154.
  • 16. Buyse M, Molenberghs G. Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics. 1998:1014–1029.
  • 17. Buyse M, Molenberghs G, Burzykowski T, Renard D, Geys H. The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics. 2000;1(1):49–67.
  • 18. Lin D, Fleming T, De Gruttola V. Estimating the proportion of treatment effect explained by a surrogate marker. Statistics in Medicine. 1997;16(13):1515–1527.
  • 19. Parast L, McDermott MM, Tian L. Robust estimation of the proportion of treatment effect explained by surrogate marker information. Statistics in Medicine. 2016;35(10):1637–1653.
  • 20. Parast L, Cai T, Tian L. Evaluating surrogate marker information using censored data. Statistics in Medicine. 2017;36(11):1767–1782.
  • 21. Wang X, Parast L, Tian L, Cai T. Model-free approach to quantifying the proportion of treatment effect explained by a surrogate marker. Biometrika. 2020;107(1):107–122.
  • 22. Rsurrogate: Robust Estimation of the Proportion of Treatment Effect Explained by Surrogate Marker Information. https://cran.r-project.org/web/packages/Rsurrogate/ Accessed May 5, 2023.
  • 23. CRAN. The Comprehensive R Archive Network. https://cran.r-project.org
  • 24. Nobili V, Alisi A, Valenti L, Miele L, Feldstein AE, Alkhouri N. NAFLD in children: new genes, new diagnostic modalities and new drugs. Nature Reviews Gastroenterology & Hepatology. 2019;16(9):517–530.
  • 25. Lavine JE, Schwimmer JB, Molleston JP, et al. Treatment of nonalcoholic fatty liver disease in children: TONIC trial design. Contemporary Clinical Trials. 2010;31(1):62–70.
  • 26. Diabetes Prevention Program Research Group. The Diabetes Prevention Program: baseline characteristics of the randomized cohort. Diabetes Care. 2000;23(11):1619.
  • 27. Diabetes Prevention Program Research Group. The Diabetes Prevention Program: design and methods for a clinical trial in the prevention of type 2 diabetes. Diabetes Care. 1999;22(4):623.
  • 28. Diabetes Prevention Program Research Group. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. New England Journal of Medicine. 2002;346(6):393–403.
  • 29. National Institute of Diabetes and Digestive and Kidney Diseases Central Repository: NIDDK-CR Resources for Research. https://repository.niddk.nih.gov/home/
  • 30. Dabrowska DM. Uniform consistency of the kernel conditional Kaplan-Meier estimate. The Annals of Statistics. 1989:1157–1167.
  • 31. Dabrowska DM. Non-parametric regression with censored survival time data. Scandinavian Journal of Statistics. 1987:181–197.
  • 32. Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Annals of Internal Medicine. 1996;125(7):605–613.
  • 33. Cardiac Arrhythmia Suppression Trial II Investigators. Effect of the antiarrhythmic agent moricizine on survival after myocardial infarction. New England Journal of Medicine. 1992;327(4):227–233.
  • 34. Packer M, Carver JR, Rodeheffer RJ, et al. Effect of oral milrinone on mortality in severe chronic heart failure. New England Journal of Medicine. 1991;325(21):1468–1475.
  • 35. Grimes DA, Schulz KF. Surrogate end points in clinical research: hazardous to your health. Obstetrics & Gynecology. 2005;105(5 Part 1):1114–1118.
  • 36. Writing Group for the Women’s Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women’s Health Initiative randomized controlled trial. JAMA. 2002;288(3):321–333.
  • 37. VanderWeele TJ. Surrogate measures and consistent surrogates. Biometrics. 2013;69(3):561–565.
  • 38. Elliott MR, Conlon AS, Li Y, Kaciroti N, Taylor JM. Surrogacy marker paradox measures in meta-analytic settings. Biostatistics. 2015;16(2):400–412.
  • 39. Shafie Khorassani F, Taylor JM, Kaciroti N, Elliott MR. Incorporating Covariates into Measures of Surrogate Paradox Risk. Stats. 2023;6(1):322–344.
  • 40. Parast L, Cai T, Tian L. Testing for heterogeneity in the utility of a surrogate marker. Biometrics. 2021.
  • 41. Parast L, Cai T, Tian L. Using a surrogate with heterogeneous utility to test for a treatment effect. Statistics in Medicine. 2023;42(1):68–88.
  • 42. Roberts EK, Elliott MR, Taylor JM. Incorporating baseline covariates to validate surrogate endpoints with a constant biomarker under control arm. Statistics in Medicine. 2021;40(29):6605–6618.

Supplementary Materials

Supplemental Data File (.doc, .tif, pdf, etc.)