Author manuscript; available in PMC: 2024 Nov 20.
Published in final edited form as: Stat Med. 2023 Sep 3;42(26):4738–4762. doi: 10.1002/sim.9886

Handling Missing Within-Study Correlations in the Evaluation of Surrogate Endpoints

Willem Collier 1,2, Benjamin Haaland 2, Lesley Inker 3, Tom Greene 2
PMCID: PMC10704210  NIHMSID: NIHMS1942015  PMID: 37845797

Summary

Rigorous evaluation of surrogate endpoints is performed in a trial-level analysis in which the strength of the association between treatment effects on the clinical and surrogate endpoints is quantified across a collection of previously conducted trials. To reduce bias in measures of the performance of the surrogate, the statistical model must account for the sampling error in each trial’s estimated treatment effects and their potential correlation. Unfortunately, these within-study correlations can be difficult to obtain, especially for meta-analysis of published trial results where individual patient data is not available. As such, these terms are frequently partially or completely missing in the analysis. We show that improper handling of these missing terms can meaningfully alter the perceived quality of the surrogate and we introduce novel strategies to handle the missingness.

Keywords: surrogate endpoint, meta-regression, Bayesian hierarchical modeling, missing-data

1 |. INTRODUCTION

Validated surrogate endpoints facilitate the implementation of randomized controlled trials (RCTs) for research in disease areas where use of an established clinical endpoint creates logistical hurdles for study size or duration.1,2,3,4,5,6,7,8 A surrogate endpoint is commonly a measure of disease progression that can be observed earlier, measured using less invasive means, or occurs more often than an established clinical endpoint. Surrogate endpoints are widely used in Phase 2 clinical trials, and in certain cases are approved by regulatory agencies for use as a primary endpoint in Phase 3 trials when it can be demonstrated that treatment effects on the surrogate endpoint accurately predict clinical efficacy, futility, or harm.1

A surrogate endpoint is often evaluated in what is referred to as a “trial-level analysis,” where the quality of the surrogate is inferred through the strength of the association between the treatment effects on the clinical endpoint and the treatment effects on the surrogate endpoint across a relevant collection of previously conducted trials.2,3,4,5,6,7,8,9,10,11 Because the input used for the evaluation includes estimated treatment effects, each trial’s sampling error in the estimated effects on both clinical and surrogate endpoints must be accounted for in the model used for the analysis. The model must also take into account the correlation between the estimated effects conditional upon the true treatment effects on the clinical and surrogate endpoint, which we refer to as within-study correlations. The estimated treatment effects, standard errors (SEs), and the within-study correlations combine to form the data input in a hierarchical meta-regression approach that is often used in practice.2,3,4,10,5,6

The treatment effect estimates for both the clinical and surrogate endpoints are commonly reported in publications of trial results.2,3,4,9,10,11,12,13,14,15 The corresponding SEs are also relatively easily obtainable using aggregate study-level data (e.g. through back-calculations based on confidence intervals). However, the within-study correlations are rarely reported and therefore require access to individual patient data (IPD) to estimate. As such, these terms are often fully or partially missing in the trial-level data used for the analysis. In a seminal paper describing a Bayesian hierarchical modeling approach to surrogate evaluation, Daniels and Hughes argued that within-study correlations are likely to be similar and small.2 For missing terms, these authors recommend imputing a common value that is less than 0.2 in magnitude for a primary analysis and suggest the possible utility of a sensitivity analysis covering alternative values for imputation. Korn et al. similarly argue that the within-study correlations can be fixed to zero with little consequence to inference on certain meta-regression parameters when using similar frequentist methods.11 Authors using Bayesian methods also discuss placing a weakly informative or non-informative uniform prior independently on each missing within-study correlation.2,3 It is also common for investigators to impute the mean within-study correlation from trials where these data are available.2,4,5,6 Bujkiewicz et al. impute the within-study correlation obtained from just one of the studies used for their surrogate evaluation for all remaining studies.9 Some authors have also demonstrated that a direct correlation between the clinical and surrogate outcomes (often computed on an observational dataset) can be transformed and imputed as a common within-study correlation for all trials.10,12 Finally, many surrogate evaluations ignore within-study correlations entirely.16,17,18 Importantly, the strategies recommended for completely missing within-study correlations will often result in use of within-study correlations that are weaker in magnitude than the true values because they assume these terms are all zero, close to zero, or are from a distribution centered at zero. Additionally, because only limited data is often used in practice to inform partially missing within-study correlations, these terms remain prone to misspecification.

We examine the impacts of the chosen approach to handling missing within-study correlations on estimates of the meta-regression parameters that influence the interpretation of the surrogate and propose novel strategies for handling such missingness. We perform our analyses using Bayesian meta-regression, which is widely used in practice.2,19,5,10,3,4 To our knowledge, there has been no work specifically addressing this issue in the surrogate evaluation literature, but two papers discussing frequentist bivariate meta-analysis partially motivated this work. Riley demonstrated that the within-study correlations exert influence on the strength of pooling of treatment effects on two separate endpoints.13 Ishak et al. found in simulations that inference on parameters related to the between-study variance-covariance of treatment effects on clinical and surrogate endpoints across trials can be influenced by handling of within-study correlations.15 We will demonstrate the potential to obtain biased posteriors as a result of improper handling of within-study correlations through an analytic argument and with simulation and real data analyses. We will also highlight data scenarios (e.g., the sizes of the trials used) that lead to more or less severe bias. For our proposed missing data handling strategies, we first discuss what we refer to as a conservative prior for complete missingness. We then introduce novel Bayesian data adaptive priors for partial missingness. Our analyses are motivated by evaluation of surrogate endpoints for chronic kidney disease (CKD) clinical trials.5,6

The organization of our paper proceeds as follows. In section 2, we introduce meta-regression models used for surrogate evaluation, the priors and software we use for our analyses, and surrogate endpoints in CKD. We also describe how missing within-study correlations are frequently handled in practice, and we introduce our proposed strategies. We provide the motivating analytic discussion in section 3. In section 4, we introduce and provide results from a simulation study. In section 5, we perform analyses on a collection of real CKD clinical trials. Finally, in section 6, we provide a concluding discussion.

2 |. MODELS AND CONTEXT

2.1 |. Models and Notation for the Two-Stage Approach to Surrogate Endpoint Evaluation

As discussed in the introduction, a two-stage analysis is commonly used for the trial-level approach to surrogate endpoint evaluation, where treatment effects on both the surrogate and clinical endpoint are estimated separately for each trial in the first stage and these pairs of estimated treatment effects are meta-analyzed in the second stage.2,3,4,9,10,5,6 For $i = 1, \ldots, N$ trials used for the analysis, we let $\hat{\theta}_{ci}$ denote the $i$th trial's suitably scaled estimated treatment effect on the clinical endpoint, and $\hat{\theta}_{si}$ the $i$th trial's suitably scaled estimated treatment effect on the surrogate endpoint. We denote each trial's within-study variance-covariance matrix as $\Sigma_i$. The elements of this matrix are ordinarily assumed fixed and used as input data for the meta-analysis.2,3,4,9,10,5,6,13,14,20,15 Specifically, $\Sigma_i[1,1] = SE(\hat{\theta}_{ci})^2$ is the squared standard error of the clinical effect estimate and $\Sigma_i[2,2] = SE(\hat{\theta}_{si})^2$ is the squared standard error of the surrogate effect estimate. Finally, the off-diagonal elements represent the within-study covariances, which we represent as $\Sigma_i[1,2] = \Sigma_i[2,1] = \hat{r}_i SE(\hat{\theta}_{ci}) SE(\hat{\theta}_{si})$, where $\hat{r}_i$ is the within-study correlation estimated for study $i$. The pair of estimated treatment effects in each trial is assumed to follow a bivariate normal distribution conditional on the true treatment effects due to asymptotic normality of the effect estimates. We denote the pair of true effects as $(\theta_{ci}, \theta_{si})$ and represent the within-study model with the following.

$(\hat{\theta}_{ci}, \hat{\theta}_{si}) \sim N\big((\theta_{ci}, \theta_{si}), \Sigma_i\big)$ (1)

The relationship between the true treatment effects on the clinical and surrogate endpoints across trials is a key element in defining the quality of the surrogate. It has been common to assume bivariate normality of the pair of true effects for surrogate evaluations in CKD and in other settings, which we display below.3,5,6 Let $\mu_c$ and $\mu_s$ denote the between-study true mean treatment effects on the clinical and surrogate endpoints, respectively, and $\Omega$ the between-study variance-covariance matrix. For example, we refer to $\Omega[1,1] = \sigma_c^2$ and $\Omega[2,2] = \sigma_s^2$ as the between-study variances of the true treatment effects on the clinical and surrogate endpoints, respectively, and to $\sigma_c$ and $\sigma_s$ as the corresponding between-study standard deviations (SDs). Then, $\Omega[1,2] = \Omega[2,1] = \rho_b \sigma_c \sigma_s$ is the between-study covariance between the true treatment effects on the clinical and surrogate endpoints.

$(\theta_{ci}, \theta_{si}) \sim N\big((\mu_c, \mu_s), \Omega\big)$ (2)

The bivariate normal second-stage model induces a linear relationship between the true treatment effects on the clinical and surrogate endpoints. Many authors estimate parameters of a reparameterized, but equivalent, model to the bivariate normal between-trial model, which is represented in lines (3) and (4) below. We refer to this reparameterization as the hierarchical random effects model.3,5,6 For the meta-regression parameters, we refer to α as the intercept, β as the slope, and σe2 as the error-variance (or to the residual SD, σe, as the "error-SD"), and σs2 remains the same as above. As described shortly, for the simulation and application analysis sections of this paper, we fit and estimate parameters for the hierarchical random effects model represented by lines (1), (3), and (4), but simulate data using the equivalent, but alternative, parameterization represented by lines (1) and (2).

$(\hat{\theta}_{ci}, \hat{\theta}_{si}) \sim N\big((\theta_{ci}, \theta_{si}), \Sigma_i\big),$ (3)
$\theta_{si} \sim N(\mu_s, \sigma_s^2), \quad \text{and} \quad \theta_{ci} \mid \theta_{si} \sim N(\alpha + \beta\theta_{si}, \sigma_e^2).$ (4)

The interpretation of the quality of the surrogate being evaluated under the hierarchical random effects model largely depends on posteriors obtained for α, β, and σe.2,3,5,6 For a high-quality surrogate, the true intercept (α) should be close to zero to indicate that when the suitably scaled true treatment effect on the surrogate is the null effect, the mean true treatment effect on the clinical endpoint is also the null effect. A true non-zero meta-regression slope (β) of the appropriate sign indicates that a beneficial true effect on the surrogate corresponds to an expected beneficial true effect on the clinical endpoint. Finally, the error-SD σe should be small, indicating that a given true effect on the surrogate precisely predicts a corresponding true clinical effect.

A number of alternative modeling approaches are used in practice (e.g. Daniels and Hughes assume fixed true treatment effects on the surrogate).2 A weighted least squares (WLS) approach is also frequently used, where the $\hat{\theta}_{ci}$ are regressed directly on the $\hat{\theta}_{si}$ and each squared residual is weighted inversely proportional to the squared standard error of the estimated effect on the clinical endpoint.16,17,18 The WLS method accounts only for sampling error of estimated effects on the clinical endpoint. For the purposes of this paper, we focus on the hierarchical random effects model because of its frequent use for surrogate evaluations in the CKD setting and because it appropriately accounts for sampling error for estimates of effects on both endpoints.
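As a concrete illustration of the WLS approach just described, the sketch below regresses estimated clinical effects on estimated surrogate effects with weights proportional to the inverse squared SEs of the clinical estimates. The numeric vectors and object names are invented purely for illustration; this is not code or data from the paper.

```r
# Minimal WLS sketch: weight each trial by 1 / SE(theta_hat_c)^2.
theta_hat_c <- c(-0.42, -0.10, -0.35, -0.22, -0.05)  # illustrative clinical effect estimates
theta_hat_s <- c(-0.30, -0.05, -0.28, -0.20, -0.02)  # illustrative surrogate effect estimates
se_c        <- c(0.12, 0.15, 0.10, 0.20, 0.18)       # illustrative SEs of the clinical estimates

wls_fit <- lm(theta_hat_c ~ theta_hat_s, weights = 1 / se_c^2)
summary(wls_fit)  # the intercept and slope play the roles of alpha and beta above
```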

2.2 |. Estimating Within-Study Correlations Using Individual Patient Data

A variety of approaches can be used to estimate within-study correlations when IPD are available. As is detailed by Riley et al., direct estimation is possible in some cases when the clinical and surrogate outcomes are jointly modeled as a function of treatment at the individual level.20 The 2 × 2 within-study residual variance-covariance matrix is estimated from the model fitting procedure and the terms obtained can be transformed into the within-study correlation of interest. Wei and Higgins also detail how IPD on a variety of mixed outcome types can be used to calculate the within-study correlations relevant to the effect estimates using the delta method.12 Both Daniels and Hughes and Bujkiewicz et al. detail bootstrap-based strategies.2,19 These authors generate bootstrapped datasets containing resampled IPD within each trial. The treatment effect estimates are calculated independently within each bootstrapped dataset and these estimates are used to calculate a within-study correlation across the bootstrapped datasets. For our application analysis, we use a direct approach by modeling clinical and surrogate outcomes within each trial and estimating the covariance between effects via a robust sandwich estimator, implemented using the NLMIXED procedure in SAS.21
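A hedged sketch of the bootstrap strategy described above (not the cited authors' code): resample patients within one trial, re-estimate the treatment effect on each endpoint in every resample, and correlate the two estimates across resamples. The data frame `ipd` and its columns (`clinical`, `surrogate`, `treat`) are hypothetical, and simple linear models stand in for whatever first-stage models a given trial would actually use (e.g., a survival model for a time-to-event clinical endpoint).

```r
# Bootstrap-based within-study correlation for a single trial with IPD (sketch).
bootstrap_within_study_correlation <- function(ipd, B = 2000, seed = 1) {
  set.seed(seed)
  est <- replicate(B, {
    boot <- ipd[sample(nrow(ipd), replace = TRUE), ]        # resample patients with replacement
    c(coef(lm(clinical  ~ treat, data = boot))["treat"],    # effect estimate, clinical endpoint
      coef(lm(surrogate ~ treat, data = boot))["treat"])    # effect estimate, surrogate endpoint
  })
  cor(est[1, ], est[2, ])  # correlation of the two estimates across bootstrap resamples
}
```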

2.3 |. Strategies for Handling Missing Within-Study Correlations

As summarized above, there are three commonly used strategies to handle completely missing within-study correlations. One strategy (a) assumes the terms are fixed at zero.2,11 A second (b) assumes that within-study correlations are independent random variables centered at zero, which is typically achieved by assigning a uniform(−1, 1) prior independently to each (treating each term as an unknown parameter in the model, denoted ri).2,3 A third (c) imputes a common non-zero value for each trial, typically motivated by subject expertise or data from a single source.2,3,9,10,12 These strategies are listed here:

  • (a)

    Set $\hat{r}_i = 0$ for $i = 1, \ldots, N$.

  • (b)

    In a Bayesian analysis, assign the prior $r_i \sim \text{uniform}(-1, 1)$ for $i = 1, \ldots, N$.

  • (c)

    For some fixed value $c$, set $\hat{r}_i = c$ for $i = 1, \ldots, N$.

Strategies (a) and (b) may assume, either directly or on average, that within-study correlations are weaker than they actually are. For strategy (b), there is no data used to update the posterior for each trial’s within-study correlation. As such, each correlation used will remain centered at zero. As will be detailed shortly, in a collection of trials used for extensive evaluations of two classes of surrogate endpoints in CKD, the within-study correlations are entirely negative for one endpoint and are almost entirely positive for the other. Moreover, as discussed by Wei and Higgins and demonstrated by Daniels and Hughes in simulations, the within-study correlations of interest are closely related to the direct correlations between endpoints.2,12 As such, for an appropriate choice of surrogate, we would expect the within-study correlations to be related, both in size and direction, to the strength of the direct association between endpoints, and thus be non-zero for a useful surrogate. Although Daniels and Hughes argue that within-study correlations are likely to be less than 0.2, they demonstrate that these terms can be larger depending on, for example, the true underlying probability of the clinical event in the control arm of the study using a binary outcome. For one of the surrogates we consider in the CKD setting, the mean within-study correlation is greater than 0.5 in magnitude for the studies included in the meta-regression. We will soon demonstrate why underestimation of the magnitude of the within-study correlations can lead to over-estimation of the strength of the surrogate.

Using strategy (c) can also be problematic even if the common value is motivated by data. As we see in the trials we use for our application analysis and as is demonstrated by Daniels and Hughes, there can be a wide range of within-study correlations within one set of relevant trials.2 This may naturally occur when the trials used enroll patients with varying disease etiologies and stages. Thus, imputing a single value may result in misspecification of most of the true within-study correlations.

There are two strategies that have been used to handle partial missingness. Let $\hat{r}_i^m$ denote a missing estimated correlation for trial $i$ (and $r_i^m$ a missing correlation treated as a parameter in the model), and let $\hat{r}_j^{nm}$ denote a non-missing estimated correlation for trial $j \ne i$. One approach (d) proposes mean-imputation, where the mean estimated within-study correlation is calculated among the trials for which this term is available.4,5,6 A second approach (e) assumes each within-study correlation is an independent random variable, typically from a uniform$(p_l, p_u)$ distribution with bounds constructed using studies with non-missing estimated terms (a brief code sketch of both strategies follows the list):3

  • (d)

    For the $N_{nm}$ non-missing terms, calculate $m(\hat{r}) = \frac{1}{N_{nm}} \sum_{j=1}^{N_{nm}} \hat{r}_j^{nm}$ and impute $\hat{r}_i^m = m(\hat{r})$ for the $N_m$ missing terms.

  • (e)

    Calculate the 2.5th and 97.5th sample percentiles of the $N_{nm}$ available non-missing estimated correlations, denoted $p_l$ and $p_u$, and assign the prior $r_i^m \sim \text{uniform}(p_l, p_u)$ for the $N_m$ missing terms.
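The sketch below illustrates strategies (d) and (e) on an invented vector of estimated within-study correlations in which NA marks the missing terms; the values are purely illustrative.

```r
# Sketch of strategies (d) and (e) for partially missing estimated correlations.
r_hat <- c(0.45, 0.60, NA, 0.52, NA, 0.38, 0.55)   # illustrative estimates; NA = missing
r_nm  <- r_hat[!is.na(r_hat)]                      # non-missing estimates

# (d) mean imputation of the missing terms
r_imputed <- ifelse(is.na(r_hat), mean(r_nm), r_hat)

# (e) percentile bounds (p_l, p_u) for an independent uniform prior on each missing term
bounds <- quantile(r_nm, probs = c(0.025, 0.975))
```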

Strategies (d) and (e) are intuitively plausible but have theoretical and practical limitations. These strategies assume missing and non-missing within-study correlations have an identical mean or stem from a similar distribution.22 However, there may be reasons to suspect missing and non-missing correlations differ. Again, these terms may relate to a variety of trial-level summary characteristics such as patients' baseline disease stage, follow-up length, or the strength of the clinical benefit of the therapy under evaluation. If studies with and without correlations differ for these or other reasons, the available and unavailable correlations may also differ. For example, in the collection of trials we use for CKD evaluations, more recently conducted trials have increasingly evaluated therapies for patients with healthier baseline kidney function. It is thus important to consider missing at random (MAR) or even missing not at random (MNAR) mechanisms, as unaccounted-for discrepancies between missing and non-missing terms could also bias estimation of the meta-regression parameters.22 Finally, in practice many authors utilize only a small proportion of the overall set of trials to inform the missing terms. Employing strategy (d), Bujkiewicz et al. again utilize only 1 of 50 trials and Papanikos et al. use only 4 of 35 trials.4,9 Use of very little data may result in poor estimation of parameters that define the distribution of missing within-study correlations.

Novel Strategy for Complete Missingness

For completely missing within-study correlations, we evaluate the use of what we refer to as a “conservative prior.” For positive correlations, we use the following:

  • (f)

    In a Bayesian analysis, assign the prior $r_i \sim \text{beta}(\alpha_b, \beta_b)$ for $i = 1, \ldots, N$ (the prior is placed on $-1 \times r_i$ if the correlations are expected to be negative).

It is often particularly problematic to overstate the strength of a weak surrogate, which, as we later show, is a greater risk when the magnitude of the within-study correlations is underestimated. As such, it may be beneficial to assign a prior to the missing terms that would be expected to overestimate them. We hypothesize that overestimation of the magnitude of the within-study correlations can lead to conservative posteriors for the meta-regression parameters, reducing the risk of falsely concluding that a surrogate is of high quality. The terms αb and βb control the extent to which the analysis is expected to be conservative.

Novel Strategies for Partial Missingness

We also propose and evaluate two novel strategies for handling partially missing within-study correlations. The first (g) is what we will refer to as a Bayesian data-adaptive beta prior, which assumes all within-study correlations follow a hierarchical distribution and allows the non-missing estimated correlations to inform the distribution of those missing. The degree to which the non-missing terms share information with the missing terms is flexible and depends on the data and priors. The second strategy (h) involves assigning a conditional data-adaptive prior, which we propose for use when MAR is the suspected mechanism. The conditional data-adaptive prior is also a hierarchical prior in which both missing and non-missing correlations are assumed to follow a common distribution that depends on readily available trial-level summary variables. Let $X$ denote an $N \times (k+1)$ matrix with columns representing $k$ trial-level summary variables for which information is available for all trials (e.g. baseline health), as well as a column of ones to encode the regression intercept. Then, let $g(\hat{r})$ be the vector of appropriately transformed correlations, where $g(\cdot)$ maps the interval (−1, 1) to the real line.

  • (g)

    In a Bayesian analysis, assume $\hat{r}_i \sim \text{beta}(uv, v(1-u))$ for $i = 1, \ldots, N$. Assign priors to $u$ and $v$.

  • (h)

    In a Bayesian analysis, assume $g(\hat{r}) \sim N(X\gamma, \sigma_r^2)$. Assign priors to $\sigma_r$ and to the elements of the length-$(k+1)$ vector $\gamma$.

The beta(uv,v(1-u)) distribution has mean u, and v can be thought to represent a precision parameter or prior sample size. Because we assign a hierarchical distribution to all within-study correlations, posteriors for u and v will be updated in model fitting, unlike previously considered strategies. The larger the proportion of trials for which the within-study correlations are non-missing, the more beta(uv,v(1-u)) will be updated to reflect the distribution of the available terms. The priors for u and v could be uninformative, informed by expert knowledge or external data, or chosen conservatively. For example, the priors for u and v could be chosen such that the beta (uv,v(1-u)) prior is concentrated near 1 if there is concern over under-estimating missing correlations, and this distribution would remain conservative unless an abundance of data suggests otherwise.
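The mean-precision parameterization behind strategy (g) is easy to inspect directly. The sketch below draws from beta(uv, v(1−u)) under hyperprior values mirroring the "conservative" variant described in Section 2.4 (u ~ beta(9, 1), v ~ gamma(1, 1)); the numbers are purely illustrative and not the authors' code.

```r
# beta(u*v, v*(1-u)) has mean u and precision ("prior sample size") v.
rbeta_mean_prec <- function(n, u, v) rbeta(n, shape1 = u * v, shape2 = v * (1 - u))

set.seed(1)
u <- rbeta(1, 9, 1)                   # hyperprior pulls the mean correlation toward 1
v <- rgamma(1, shape = 1, rate = 1)   # weak prior precision, easily updated by the data
summary(rbeta_mean_prec(1e4, u, v))   # implied prior draws for a missing correlation
```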

As discussed above, the within-study correlations may vary partly as a function of a trial population’s baseline health, the therapies under evaluation, etc. As such, this summary information could be used to inform the missing within-study correlations using strategy (h) to improve the accuracy of recovery of the missing terms. The posteriors for γ, σr would be updated in model fitting according to the strength of the association between summary variables chosen and the non-missing correlations.

2.4 |. Priors and Software

When fitting the hierarchical random effects model using a fully Bayesian strategy, we must assign priors to the meta-regression parameters μs, σs2, α, β, and σe2, and to the parameters determining the missing-correlation handling strategy (if relevant: αb and βb for strategy (f); u and v for strategy (g); and γ and σr for strategy (h)). It is typical in the surrogate endpoint literature to assign diffuse priors to the meta-regression parameters μs, σs2, α, β, and σe2 to ensure that the data, and not the choice of priors, dictate the evidence.2,4,5,6 Consistent with previous work in CKD, we used diffuse normal priors (N(0, 10²)) for the mean true effect on the surrogate (μs) and for the meta-regression intercept (α) and slope (β), and we used diffuse inverse-gamma (denoted IG(0.001, 0.001)) priors for the variance terms σs2 and σe2.5 The IG(0.001, 0.001) distribution is an approximation to the Jeffreys prior.

When using the conservative prior (strategy (f)) for handling missing correlations, the choice of αb and βb should depend on how strong the missing correlations are theorized to be and how conservative the inference is intended to be. For the sake of illustration, we chose αb = 1.5 and βb = 1, which induces a prior with mean 0.6. This choice is discussed further in Section 4.1. When using strategy (g) for partial missingness, the choice of priors for u and v should similarly depend on subject expertise and the extent to which the inference is intended to be conservative. We assigned priors u ~ beta(1, 1) and v ~ gamma(100, 1) for one set of analyses and, separately, u ~ beta(9, 1) and v ~ gamma(1, 1) for another, more conservative set of analyses (discussed further in Section 4.2). Finally, when using strategy (h) for partial missingness, we used diffuse priors (the elements of γ were assigned independent N(0, 10²) priors and σr2 an IG(0.001, 0.001) prior). This way, the data, and not the priors, drive the strength of the association between trial-level variables and within-study correlations.
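The authors' RStan code is provided in their supplement; the sketch below is our own minimal illustration of the hierarchical random effects model in lines (1), (3), and (4) with the diffuse priors just described, taking the within-study SEs and (estimated or imputed) correlations as fixed data. All object and parameter names here are ours, not the authors'.

```r
library(rstan)

# A minimal sketch (not the authors' supplement code) of the hierarchical random effects model.
stan_code <- "
data {
  int<lower=1> N;
  vector[N] theta_hat_c;            // estimated effects on the clinical endpoint
  vector[N] theta_hat_s;            // estimated effects on the surrogate endpoint
  vector<lower=0>[N] se_c;          // SEs of the clinical effect estimates
  vector<lower=0>[N] se_s;          // SEs of the surrogate effect estimates
  vector<lower=-1, upper=1>[N] r;   // within-study correlations (estimated or imputed)
}
parameters {
  real alpha;                       // meta-regression intercept
  real beta;                        // meta-regression slope
  real mu_s;                        // mean true effect on the surrogate
  real<lower=0> sigma_s2;           // between-study variance of true surrogate effects
  real<lower=0> sigma_e2;           // error-variance
  vector[N] theta_c;                // true effects on the clinical endpoint
  vector[N] theta_s;                // true effects on the surrogate endpoint
}
model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  mu_s ~ normal(0, 10);
  sigma_s2 ~ inv_gamma(0.001, 0.001);
  sigma_e2 ~ inv_gamma(0.001, 0.001);
  theta_s ~ normal(mu_s, sqrt(sigma_s2));                    // line (4), surrogate effects
  theta_c ~ normal(alpha + beta * theta_s, sqrt(sigma_e2));  // line (4), clinical given surrogate
  for (i in 1:N) {                                           // line (3): within-study model
    matrix[2, 2] Sigma_i;
    Sigma_i[1, 1] = square(se_c[i]);
    Sigma_i[2, 2] = square(se_s[i]);
    Sigma_i[1, 2] = r[i] * se_c[i] * se_s[i];
    Sigma_i[2, 1] = Sigma_i[1, 2];
    [theta_hat_c[i], theta_hat_s[i]]' ~ multi_normal([theta_c[i], theta_s[i]]', Sigma_i);
  }
}
"
# fit <- stan(model_code = stan_code,
#             data = list(N = N, theta_hat_c = theta_hat_c, theta_hat_s = theta_hat_s,
#                         se_c = se_c, se_s = se_s, r = r),
#             chains = 4, iter = 10000)
```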

For our simulation analyses, we used the University of Utah Center for High Performance Computing (CHPC) Linux cluster, where we used R version 4.0.3 for simulating trial-level data and summarizing posteriors. The Markov chain Monte Carlo (mcmc) sampling used for model fitting was implemented using RStan version 2.21.12.23 Example Rstan code is provided in the supplement. The mcmc diagnostic quantities and figures were generated using RStan, R and bayesplot package functions.24 We used the Gelman-Rubin (GR) statistic (or “RHat”) as well as rank plots to assess convergence and mixing of chains.25,26 We also evaluated effective sample size (ESS) to evaluate whether there were sufficient mcmc draws for stable inference.26 For our application analyses, we evaluated convergence and ESS using 3–4 chains for model fitting and ultimately used 10,000–20,000 mcmc draws. While the GR statistic and rank plots indicated adequate convergence and mixing with fewer iterations, ESS for variance parameters was at least 500 across analyses when using 10,000 or more mcmc draws. Similar assessments were used for simulations (example tables and figures used are provided in the supplement).
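A hedged sketch of the convergence checks described above, assuming a fitted stanfit object `fit` (for example, one produced by the commented-out call in the sketch following Section 2.4); the parameter names match that sketch and are otherwise illustrative.

```r
library(rstan)
library(bayesplot)

# Gelman-Rubin statistic (Rhat) and effective sample size (n_eff) for key parameters.
round(summary(fit)$summary[c("alpha", "beta", "sigma_s2", "sigma_e2"),
                           c("Rhat", "n_eff")], 2)

# Rank plots to assess mixing of chains.
mcmc_rank_overlay(as.array(fit), pars = c("alpha", "beta"))
```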

2.5 |. Motivating Examples: Surrogate Endpoint Evaluations in CKD and Other Disease Areas

The established clinical endpoint for CKD trials is typically a composite endpoint reflecting time-to-kidney failure with replacement therapy (KFRT) (initiation of dialysis or receipt of a kidney transplant or a sustained doubling of serum creatinine).5,7 This is a late-stage event for those with CKD, which restricts trials to evaluating therapies for patients who have rapidly progressive or very late stages of disease. Yet, those with earlier stages of CKD may have the most potential to benefit from new therapies that slow CKD progression.27 As a result, there is great interest in use of surrogate endpoints for CKD progression. Changes in albuminuria and decline in the glomerular filtration rate (GFR) are especially of interest as these are the two central biomarkers in CKD.6,7 Both albuminuria and GFR are part of the definition and staging system of CKD, and there exists strong epidemiological evidence that changes in either can increase the risk for progression to kidney failure.7,28,29,30,31,7

The Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) has used a collection of previously conducted CKD trials to evaluate albuminuria and GFR-based endpoints as surrogates for time-to-KFRT.5,6 These analyses have utilized IPD for each trial. Ongoing work in CKD-EPI is intended to further evaluate the quality of these surrogates by incorporating the increasingly large number of recently completed CKD trials. Updated evaluations are important given that recently conducted trials evaluated novel interventions and distinct patient populations.7 However, new analyses are slowed by the need to secure IPD for recently completed studies. An improved understanding of the potential impacts of how partially missing within-study correlations are handled and a rigorous approach to doing so will facilitate streamlining of surrogate evaluations in CKD.

Issues related to missing within-study correlations are also persistent for surrogate evaluations in many other disease areas, where complete missingness is also frequently confronted. For example, publications of cancer clinical trials often report treatment effect estimates and the associated SEs for multiple endpoints (e.g. tumor objective response rate, progression-free survival, and overall survival), but not the within-study correlations.4,9 As a result, evaluations of surrogate endpoints with completely missing within-study correlations are common in the cancer setting.16,17,18

3 |. ANALYTIC DISCUSSION

In this section, we provide an analytic demonstration of the dependence of posterior inference for certain between-study parameters used to interpret the quality of the surrogate on specification of within-study correlations. Our demonstration revolves around the posterior mode of ρb, the between-study correlation between the true treatment effects on the clinical and surrogate endpoints, which is, of course, closely related to β and σe. We focus on the mode to provide general intuition for how the location of the bulk of the probability mass in the posterior for ρb behaves depending on how missing within-study correlations are handled. The simulation and applied analyses provide alternative posterior summary metrics of interest. We first discuss why, generally, the posterior mode in ρb will depend on r^i and then discuss the case where a set of similarly sized trials are used for the analysis.

Again, let $\sigma_c$ represent the total between-study SD of true treatment effects on the clinical endpoint. Recalling model (2) from the previous section, $\Omega[1,2] = \Omega[2,1] = \rho_b \sigma_c \sigma_s$. We evaluate formulae for the derivative of the log of the joint posterior of the between-study parameters expressed in model (2), which we denote $p(\mu, \Omega \mid \hat{\theta})$. Let $\mu = (\mu_c, \mu_s)$ and write the marginal distribution of the within-study treatment effect estimates for trial $i$ as $\hat{\theta}_i \mid \mu, \Omega \sim N(\mu, \Sigma_i + \Omega)$. Then, the joint distribution of effect estimates conditional on the model parameters can be expressed as $\hat{\theta} \mid \mu, \Omega \sim N(X\mu, D_{\Sigma+\Omega})$, where $\hat{\theta}$ is the length-$2N$ vector of estimated treatment effects across all trials concatenated into one vector, $X$ is a $2N \times 2$ design matrix whose first column is the length-$2N$ vector $(1, 0, 1, 0, \ldots, 1, 0)$ and whose second column is the length-$2N$ vector $(0, 1, 0, 1, \ldots, 0, 1)$, and $D_{\Sigma+\Omega}$ is the block-diagonal matrix with $2 \times 2$ diagonal blocks $\Sigma_i + \Omega$. For nearly flat priors on $\Omega$ and $\mu$, the log-posterior of interest can be approximated by the log-likelihood, $\ell(\mu, \Omega \mid \hat{\theta})$, which equals the following up to an additive constant.

$-\frac{1}{2}\log|D_{\Sigma+\Omega}| - \frac{1}{2}(\hat{\theta} - X\mu)' D_{\Sigma+\Omega}^{-1}(\hat{\theta} - X\mu)$

The determinant of the study-specific variance-covariance matrix for study $i$ can be expressed as $D_i = \big(SE(\hat{\theta}_{ci})^2 + \sigma_c^2\big)\big(SE(\hat{\theta}_{si})^2 + \sigma_s^2\big) - \big(\hat{r}_i SE(\hat{\theta}_{ci}) SE(\hat{\theta}_{si}) + \rho_b\sigma_c\sigma_s\big)^2$. Also, let $\hat{\theta}^*_{ci}$ and $\hat{\theta}^*_{si}$ represent centered estimated treatment effects for trial $i$ using the true mean treatment effects ($\hat{\theta}^*_{ci} = \hat{\theta}_{ci} - \mu_c$ and $\hat{\theta}^*_{si} = \hat{\theta}_{si} - \mu_s$). Then, the derivative with respect to $\rho_b$ of the log-posterior approximation can be shown to equal the expression below. Additional detail on this derivation is given in supplement Section 1.

$\left(\prod_{i=1}^{N} D_i^2\right)^{-1}\left(\sum_{i=1}^{N}\left\{\left[\left(\hat{r}_i SE(\hat{\theta}_{ci})SE(\hat{\theta}_{si})\sigma_c\sigma_s + \rho_b\sigma_c^2\sigma_s^2\right)D_i - \left(\hat{r}_i SE(\hat{\theta}_{ci})SE(\hat{\theta}_{si})\sigma_c\sigma_s + \rho_b\sigma_c^2\sigma_s^2\right)\left(\hat{\theta}_{ci}^{*2}\left(SE(\hat{\theta}_{si})^2 + \sigma_s^2\right) + \hat{\theta}_{si}^{*2}\left(SE(\hat{\theta}_{ci})^2 + \sigma_c^2\right)\right) + \hat{\theta}_{ci}^{*}\hat{\theta}_{si}^{*}\sigma_c\sigma_s D_i + 2\hat{\theta}_{ci}^{*}\hat{\theta}_{si}^{*}\left(\hat{r}_i SE(\hat{\theta}_{ci})SE(\hat{\theta}_{si}) + \rho_b\sigma_c\sigma_s\right)\left(\hat{r}_i SE(\hat{\theta}_{ci})SE(\hat{\theta}_{si})\sigma_c\sigma_s + \rho_b\sigma_c^2\sigma_s^2\right)\right]\times\prod_{j\ne i} D_j^2\right\}\right)$ (5)

To find the values of $\rho_b$ for which the expression above equals zero, we can ignore the reciprocal of the product of squared determinants outside the sum, as all elements of the product must be positive under practical assumptions. We thus focus our discussion on the values of $\rho_b$ that allow the sum inside the parentheses in (5) to equal zero. First, for more general intuition, consider that the product $\prod_{j\ne i} D_j^2$ contains the highest-degree term in $\rho_b$, namely $\rho_b^{4N-4}$. In formula (5), the highest-order term in $\rho_b$ inside the square brackets, $[\ldots]$, is of degree 3 (this occurs in the product $\rho_b\sigma_c^2\sigma_s^2 D_i$). Thus, the overall derivative is a polynomial in $\rho_b$ of degree $4N - 4 + 3 = 4N - 1$. This is an odd-degree polynomial with at least one real root in $\rho_b$ (i.e., there is at least one extremum of the posterior). Because the polynomial coefficients corresponding to lower-degree terms in $\rho_b$ contain $\hat{r}_i$, the locations of the zeros, and thus the locations of the potential posterior modes for $\rho_b$, depend on how the $\hat{r}_i$ are specified. For example, in the first term of the sum in square brackets, we see $-\hat{r}_i SE(\hat{\theta}_{ci})SE(\hat{\theta}_{si})\sigma_c^2\sigma_s^2 \times \rho_b^2$.

A. Collection of Similarly Sized Trials

For further intuition, consider the case where all within-study correlations are missing and are fixed at a common value $\hat{r}_i = \hat{r}$ for all $i$ (e.g., a common value such as zero is imputed), and where we additionally suppose that a collection of similarly sized trials is used for the analysis, such that $SE(\hat{\theta}_{ci}) \equiv s_c = m_c\sigma_c$ and $SE(\hat{\theta}_{si}) \equiv s_s = m_s\sigma_s$. Under such assumptions, $D_i \equiv D$ for all $i$. Then, finding the roots in $\rho_b$ of the derivative of the log-posterior reduces to finding the roots of a degree-three polynomial. This can be seen by looking at formula (5), where $\prod_{j\ne i} D_j^2 = D^{2N-2}$. Thus, because $D^{2N-2}$ no longer depends on $i$, it can be moved outside the sum to give the expression displayed below (with further simplifications made as a result of the assumptions stated for this subsection).

$\left(D^{2N}\right)^{-1}\left(D^{2N-2}\right)\left\{\sum_{i=1}^{N}\left[\left(\hat{r} m_c\sigma_c m_s\sigma_s\sigma_c\sigma_s + \rho_b\sigma_c^2\sigma_s^2\right)D - \left(\hat{r} m_c\sigma_c m_s\sigma_s\sigma_c\sigma_s + \rho_b\sigma_c^2\sigma_s^2\right)\left(\hat{\theta}_{ci}^{*2}\left(m_s^2\sigma_s^2 + \sigma_s^2\right) + \hat{\theta}_{si}^{*2}\left(m_c^2\sigma_c^2 + \sigma_c^2\right)\right) + \hat{\theta}_{ci}^{*}\hat{\theta}_{si}^{*}\sigma_c\sigma_s D + 2\hat{\theta}_{ci}^{*}\hat{\theta}_{si}^{*}\left(\hat{r} m_c\sigma_c m_s\sigma_s + \rho_b\sigma_c\sigma_s\right)\left(\hat{r} m_c\sigma_c m_s\sigma_s\sigma_c\sigma_s + \rho_b\sigma_c^2\sigma_s^2\right)\right]\right\}$ (6)

Because $D$ is non-zero for any value of $\rho_b \in (-1, 1)$, and so long as there is variation between studies in the true treatment effects on both endpoints, $\left(D^{2N}\right)^{-1} D^{2N-2}$ is also non-zero under these practical assumptions, and finding the roots of (6) is achieved by finding the values of $\rho_b$ that allow the sum displayed in that expression to equal zero. The sum in expression (6) can then be shown to reduce to the following (additional detail is again provided in supplement Section 1).

$-N\sigma_c^4\sigma_s^4\left(\rho_b + m_c m_s\hat{r}\right)^3 + \left(\sigma_c^3\sigma_s^3\sum_{i=1}^{N}\hat{\theta}_{ci}^{*}\hat{\theta}_{si}^{*}\right)\left(\rho_b + m_c m_s\hat{r}\right)^2 + \left(N\sigma_c^2\sigma_s^2\left(m_c^2\sigma_c^2 + \sigma_c^2\right)\left(m_s^2\sigma_s^2 + \sigma_s^2\right) - \sigma_c^2\sigma_s^2\left(m_s^2\sigma_s^2 + \sigma_s^2\right)\sum_{i=1}^{N}\hat{\theta}_{ci}^{*2} - \sigma_c^2\sigma_s^2\left(m_c^2\sigma_c^2 + \sigma_c^2\right)\sum_{i=1}^{N}\hat{\theta}_{si}^{*2}\right)\left(\rho_b + m_c m_s\hat{r}\right) + \sigma_c\sigma_s\left(m_c^2\sigma_c^2 + \sigma_c^2\right)\left(m_s^2\sigma_s^2 + \sigma_s^2\right)\sum_{i=1}^{N}\hat{\theta}_{ci}^{*}\hat{\theta}_{si}^{*}$ (7)

There are important implications of the formula reducing to the form $a(\rho_b + m_c m_s\hat{r})^3 + b(\rho_b + m_c m_s\hat{r})^2 + c(\rho_b + m_c m_s\hat{r}) + d$. Note that $m_c$ and $m_s$ describe the sizes of the common SEs of the effect estimates relative to their corresponding between-study SDs $\sigma_c$ and $\sigma_s$. Suppose, conditional on the data and for the true within-study correlation $r_T$, the roots (two of which are potentially complex) of the polynomial are $\alpha_1, \alpha_2, \alpha_3$. If instead of $r_T$ we impute a common $\hat{r} = r_T - \hat{r}_0$, then the polynomial roots in $\rho_b$ each shift up by the amount $m_c m_s\hat{r}_0$ (or down if we instead used $\hat{r} = r_T + \hat{r}_0$ for the same $\hat{r}_0$). Importantly, this indicates that if the within-study correlation is under-estimated, the between-study correlation will often be biased upward, suggesting a stronger surrogate than is true in reality. Alternatively, if the within-study correlation is over-estimated, the between-study correlation will be biased downward, suggesting a more conservative interpretation of the surrogate being analyzed. That the degree of bias depends on $m_c$ and $m_s$ tells us that if either SE approaches zero relative to its corresponding between-study SD (e.g., $m_s \to 0$), bias in the posterior mode converges to zero no matter the magnitude of the bias $\hat{r}_0$ in the imputed correlation. If, alternatively, both SEs are meaningfully away from zero (much more likely in practice), the posterior mode of $\rho_b$ will be biased to a degree proportional to the size of the bias $\hat{r}_0$. The magnitude of bias in the posterior mode of $\rho_b$ could even exceed the bias $\hat{r}_0$ in the imputed correlation if $m_c m_s > 1$.
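A small numerical illustration of the root-shift argument above: because the derivative reduces to a cubic in $\phi = \rho_b + m_c m_s\hat{r}$ whose coefficients do not involve $\hat{r}$, imputing $\hat{r} = r_T - \hat{r}_0$ instead of the true $r_T$ shifts every root in $\rho_b$ up by $m_c m_s\hat{r}_0$. The coefficients and values below are arbitrary and purely illustrative.

```r
# Roots in rho_b of a*phi^3 + b*phi^2 + c*phi + d with phi = rho_b + m_c*m_s*r_hat.
roots_in_rho_b <- function(coefs, m_c, m_s, r_hat) {
  phi_roots <- polyroot(rev(coefs))   # polyroot() expects ascending-order coefficients
  phi_roots - m_c * m_s * r_hat       # back out the roots expressed in rho_b
}

coefs <- c(-2, 0.5, 1.2, 0.3)         # illustrative (a, b, c, d)
m_c <- 1.5; m_s <- 0.667; r_T <- 0.6; r_0 <- 0.4
roots_in_rho_b(coefs, m_c, m_s, r_T - r_0) - roots_in_rho_b(coefs, m_c, m_s, r_T)
# each root moves up by about 0.4, i.e. m_c * m_s * r_0, when r_hat is under-imputed by r_0
```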

4 |. SIMULATION STUDY

In this section, we compare different approaches to handling missing within-study correlations via several simulation studies spanning a collection of practically relevant scenarios. We are primarily interested in posterior distributions of the meta-regression slope (β) and error-SD σe under the hierarchical random effects model of Section 2.1 (equations (3), (4)) obtained from model fitting. We provide posterior summaries of the meta-regression intercept (α) in the supplemental materials.

We investigate two broad scenarios: one where within-study correlations are entirely missing and another where they are partially missing. Our simulation setups revolve around a base case motivated by an analysis of the CKD-EPI trials for which IPD is available (based on analyses of the albuminuria-based surrogate). We consider 11 variations of the base case (values for all parameters are displayed in Table 1). Based on our findings in the analytic section, one of our hypotheses was that the strength of the relationship between misspecification of within-study correlations and bias in meta-regression parameters is driven by the SEs of the effect estimates relative to the between-study SDs. As such, we vary the ratio of the average SEs relative to the between-study SDs. We additionally evaluate the influence of the number of trials, the range of the sizes of the trials, and the range of the sizes of true underlying within-study correlations.

TABLE 1.

Parameters Defining the 12 Simulation Setups

Setup 1 (Base Case): N = 50, ρb = 0.5, μc = −0.3, μs = −0.25, σc = 0.2, σs = 0.15, mc = 1.5, ms = 0.667, bc = 0.1, bs = 0.05, v0 = 10

Setups 2–12 (parameters varied relative to the base case):

Setup 2: mc = 1, ms = 1
Setup 3: mc = 0.1, ms = 0.1
Setup 4: mc = 1, ms = 0.1
Setup 5: mc = 1, ms = 1, bc = 0.2, bs = 0.15
Setup 6: mc = 1, ms = 1, bc = 0.05, bs = 0.025
Setup 7: v0 = 5
Setup 8: v0 = 25
Setup 9: ρb = 0
Setup 10: ρb = 0.8
Setup 11: N = 25
Setup 12: N = 100

We simulated trial-level data for each setup as follows. We use uniform distributions to simulate SEs and either beta distributions to simulate within-study correlations or normal distributions to simulate transformed within-study correlations (the process used for transformed correlations is described further below). The uniform distribution generates SEs for the purpose of inducing variability in the sizes of the trials used for the analysis. The beta distribution generates positive correlations between 0 and 1, and we evaluated various shapes for the beta distribution used (by varying $u_0$ and $v_0$). The implications of our analyses generalize to the case where within-study correlations are negative in a straightforward manner. We define the average size of the SEs of the clinical and surrogate effect estimates relative to the corresponding between-study SDs as $E\big(SE(\hat{\theta}_{ci})\big) = m_c\sigma_c$ and $E\big(SE(\hat{\theta}_{si})\big) = m_s\sigma_s$. For each complete-missingness simulation, we simulate true treatment effects on both endpoints using (8), the SEs and within-study correlations using (9), (10), and (11), and the estimated treatment effects using (12). As mentioned in Section 2.1, while we simulated true treatment effects on both endpoints $(\theta_{ci}, \theta_{si})$ using a bivariate normal distribution (2), we are primarily interested in direct inference on the meta-regression parameters of the equivalent but reparameterized model represented in (4). Our model fitting is thus directly parameterized to obtain posteriors under the meta-regression representation, as can be seen in the RStan code presented in the supplement.

$(\theta_{ci}, \theta_{si}) \sim N\big((\mu_c, \mu_s), \Omega\big)$ (8)
$SE(\hat{\theta}_{ci}) \sim \text{unif}\big(m_c\sigma_c - b_c,\; m_c\sigma_c + b_c\big),$ (9)
$SE(\hat{\theta}_{si}) \sim \text{unif}\big(m_s\sigma_s - b_s,\; m_s\sigma_s + b_s\big),$ (10)
$r_i \sim \text{beta}\big(u_0 v_0,\; v_0(1 - u_0)\big)$ (11)
$(\hat{\theta}_{ci}, \hat{\theta}_{si}) \sim N\big((\theta_{ci}, \theta_{si}), \Sigma_i\big)$ (12)
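A hedged sketch of this data-generating process, using the base-case values from Table 1 (with $u_0 = 0.4$ chosen for illustration); this is not the authors' simulation code.

```r
library(MASS)  # for mvrnorm()
set.seed(1)

# Base-case parameters (Table 1); u0 chosen for illustration.
N <- 50; rho_b <- 0.5; mu_c <- -0.3; mu_s <- -0.25
sigma_c <- 0.2; sigma_s <- 0.15; m_c <- 1.5; m_s <- 0.667
b_c <- 0.1; b_s <- 0.05; u0 <- 0.4; v0 <- 10

Omega <- matrix(c(sigma_c^2, rho_b * sigma_c * sigma_s,
                  rho_b * sigma_c * sigma_s, sigma_s^2), 2, 2)
theta <- mvrnorm(N, mu = c(mu_c, mu_s), Sigma = Omega)        # (8): true effects
se_c  <- runif(N, m_c * sigma_c - b_c, m_c * sigma_c + b_c)   # (9): clinical SEs
se_s  <- runif(N, m_s * sigma_s - b_s, m_s * sigma_s + b_s)   # (10): surrogate SEs
r     <- rbeta(N, u0 * v0, v0 * (1 - u0))                     # (11): within-study correlations
theta_hat <- t(sapply(seq_len(N), function(i) {               # (12): estimated effects
  Sigma_i <- matrix(c(se_c[i]^2, r[i] * se_c[i] * se_s[i],
                      r[i] * se_c[i] * se_s[i], se_s[i]^2), 2, 2)
  mvrnorm(1, mu = theta[i, ], Sigma = Sigma_i)
}))
```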

4.1 |. Within-Study Correlations Completely Missing

The results of our simulation analyses for the cases where within-study correlations are completely missing are summarized in Table 2 and in Figure 1 (intercept posteriors are summarized in Supplement Figure 1 and Table 2). Within each setup we simulate data for the average true within-study correlation u0 being equal to 0.2, 0.4, 0.6, and 0.8. We utilize the range of values to demonstrate a potential trend in increasingly biased posteriors associated with increasingly under-estimated mean within-study correlations. We use the complete-missingness data methods described in Section 2 ((a),(b),(c),(f)). For method (c), we use c=0.2 and c=0.6 (results displayed in Supplement Tables 1 and 3, respectively). For our “conservative prior”, we use beta (1.5, 1). The beta (1.5, 1) prior has a mean of 0.6 and places the bulk of the probability mass on correlations closer to 1. We used this prior to represent the scenario where we would like to safeguard against incorrectly concluding that we have a strong surrogate. The prior could be chosen to have a higher (lower) mean if more (less) conservative inference is desired. We summarize bias in the meta-regression slope and error-SD, and use the root mean squared error (RMSE) and two forms of coverage for the true slope. Coverage (CVG) is the proportion of times the 95% credible interval contained the true slope. The additional coverage (CVG*) provides the proportion of times the 95% credible interval strictly exceeded the true slope as we were especially concerned with the scenario where the analysis would lead to over-interpretation of the strength of the surrogate. Bias and RMSE use the average posterior median across simulations. We focused on the posterior median for parity in summary metrics across parameters given that the median is most appropriate for summarizing variance posteriors.2,4,5,6

TABLE 2.

Posterior Summaries: Simulation Analyses Evaluating Handling of Completely Missing Within-Study Correlations

Impute 0 (a) Uniform(-1,1) Prior (b) Conservative Prior (f)

Setup 1 β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias

u0 = 0.2 0.28 0.55 0.88 0.87 −0.05 0.23 0.52 0.94 0.93 −0.05 −0.63 0.79 1.00 0.66 −0.01
u0 = 0.4 0.48 0.63 0.74 0.74 −0.07 0.43 0.59 0.79 0.79 −0.06 −0.36 0.57 1.00 0.81 −0.01
u0 = 0.6 0.67 0.76 0.55 0.55 −0.09 0.60 0.70 0.62 0.62 −0.08 −0.09 0.42 0.99 0.96 −0.03
u0 = 0.8 0.83 0.88 0.32 0.32 −0.10 0.76 0.82 0.39 0.39 −0.10 0.17 0.44 0.88 0.88 −0.05

Setup 2 β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias

u0 = 0.2 0.40 0.60 0.83 0.83 −0.06 0.34 0.56 0.86 0.86 −0.05 −0.69 0.92 1.00 0.68 0.00
u0 = 0.4 0.60 0.75 0.67 0.67 −0.08 0.55 0.71 0.72 0.72 −0.07 −0.43 0.80 1.00 0.82 0.00
u0 = 0.6 0.64 0.71 0.42 0.42 −0.10 0.59 0.68 0.52 0.52 −0.09 −0.06 0.58 0.97 0.93 −0.02
u0 = 0.8 0.70 0.75 0.21 0.21 −0.11 0.67 0.72 0.30 0.30 −0.10 0.25 0.50 0.86 0.85 −0.04

Setup 3 β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias

u0 = 0.2 0.01 0.19 0.95 0.93 0.00 0.02 0.19 0.96 0.94 0.00 0.00 0.19 0.96 0.93 0.00
u0 = 0.4 0.01 0.19 0.97 0.94 0.00 0.01 0.18 0.97 0.94 0.00 0.00 0.19 0.97 0.93 0.00
u0 = 0.6 0.02 0.19 0.97 0.94 0.00 0.01 0.19 0.97 0.94 0.00 0.00 0.19 0.98 0.94 0.00
u0 = 0.8 0.04 0.20 0.93 0.91 0.00 0.03 0.20 0.94 0.91 0.00 0.01 0.19 0.97 0.94 0.00

Setup 4 β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias

u0 = 0.2 0.03 0.26 0.97 0.95 −0.01 0.03 0.26 0.97 0.95 −0.01 −0.08 0.27 0.98 0.92 0.00
u0 = 0.4 0.07 0.28 0.91 0.89 −0.02 0.07 0.28 0.91 0.89 −0.02 −0.05 0.27 0.98 0.94 −0.01
u0 = 0.6 0.09 0.29 0.92 0.91 −0.02 0.09 0.29 0.92 0.91 −0.02 −0.01 0.27 0.97 0.91 −0.01
u0 = 0.8 0.13 0.30 0.88 0.87 −0.02 0.13 0.29 0.88 0.87 −0.02 0.03 0.26 0.97 0.94 −0.01

Setup 5 β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias

u0 = 0.2 0.23 0.48 0.87 0.85 −0.03 0.20 0.47 0.90 0.88 −0.02 −0.30 0.53 0.99 0.83 0.01
u0 = 0.4 0.31 0.46 0.84 0.84 −0.03 0.28 0.44 0.86 0.86 −0.03 −0.14 0.37 0.99 0.94 0.01
u0 = 0.6 0.38 0.50 0.72 0.72 −0.04 0.35 0.48 0.76 0.76 −0.04 0.03 0.38 0.94 0.93 −0.01
u0 = 0.8 0.43 0.57 0.67 0.67 −0.06 0.41 0.55 0.68 0.68 −0.05 0.16 0.42 0.90 0.88 −0.03

Setup 6 β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias

u0 = 0.2 0.46 0.68 0.83 0.83 −0.07 0.38 0.65 0.89 0.89 −0.05 −0.71 0.92 1.00 0.63 0.00
u0 = 0.4 0.62 0.74 0.66 0.66 0.55 0.69 0.72 0.72 −0.47 0.81 1.00 0.83 −0.01
u0 = 0.6 0.70 0.78 0.42 0.42 −0.10 0.66 0.77 0.50 0.50 −0.09 −0.08 0.64 0.98 0.93 −0.02
u0 = 0.8 0.69 0.73 0.20 0.20 −0.12 0.66 0.73 0.25 0.25 −0.11 0.28 0.53 0.86 0.86 −0.05

Setup 7 β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias

u0 = 0.2 0.29 0.50 0.91 0.90 −0.05 0.24 0.47 0.94 0.93 −0.04 −0.60 0.73 1.00 0.73 0.00
u0 = 0.4 0.47 0.63 0.77 0.77 −0.07 0.42 0.59 0.79 0.79 −0.06 −0.34 0.56 0.99 0.84 −0.01
u0 = 0.6 0.67 0.77 0.57 0.57 −0.09 0.62 0.73 0.60 0.60 −0.08 −0.05 0.41 0.99 0.97 −0.03
u0 = 0.8 0.82 0.89 0.31 0.31 −0.10 0.76 0.82 0.37 0.37 −0.10 0.18 0.43 0.92 0.91 −0.05

Setup 8 β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias

u0 = 0.2 0.28 0.54 0.89 0.88 −0.05 0.25 0.51 0.89 0.89 −0.05 −0.60 0.75 1.00 0.69 0.00
u0 = 0.4 0.47 0.65 0.75 0.73 −0.07 0.41 0.61 0.79 0.77 −0.06 −0.38 0.59 1.00 0.85 −0.01
u0 = 0.6 0.66 0.76 0.54 0.54 −0.09 0.59 0.70 0.61 0.61 −0.08 −0.08 0.43 0.98 0.94 −0.03
u0 = 0.8 0.80 0.87 0.34 0.34 −0.10 0.73 0.81 0.39 0.39 −0.10 0.18 0.45 0.89 0.88 −0.05

Setup 9 β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias

u0 = 0.2 0.31 0.54 0.90 0.90 −0.04 0.29 0.52 0.94 0.94 −0.03 −0.60 0.75 1.00 0.74 −0.05
u0 = 0.4 0.50 0.69 0.77 0.77 −0.05 0.48 0.66 0.81 0.81 −0.04 −0.39 0.61 1.00 0.88 −0.04
u0 = 0.6 0.76 0.88 0.57 0.57 −0.06 0.72 0.85 0.60 0.60 −0.06 −0.14 0.52 0.96 0.92 −0.04
u0 = 0.8 1.01 1.11 0.32 0.32 −0.08 0.97 1.07 0.35 0.35 −0.08 0.17 0.51 0.90 0.89 −0.04

Setup 10 β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias

u0 = 0.2 0.22 0.44 0.92 0.92 −0.03 0.16 0.42 0.94 0.94 −0.03 −0.57 0.72 1.00 0.65 0.03
u0 = 0.4 0.37 0.50 0.81 0.81 −0.05 0.31 0.46 0.86 0.86 −0.04 −0.30 0.50 1.00 0.83 0.01
u0 = 0.6 0.50 0.58 0.67 0.67 −0.06 0.43 0.52 0.74 0.74 −0.05 −0.05 0.35 0.98 0.95 −0.02
u0 = 0.8 0.63 0.68 0.52 0.52 −0.06 0.56 0.62 0.57 0.57 −0.06 0.20 0.36 0.92 0.92 −0.04

Setup 11 β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias

u0 = 0.2 0.23 0.71 0.97 0.97 −0.05 0.20 0.69 0.98 0.98 −0.04 −0.78 1.05 1.00 0.77 −0.02
u0 = 0.4 0.42 0.71 0.91 0.90 −0.06 0.36 0.68 0.92 0.92 −0.06 −0.48 0.83 1.00 0.91 −0.02
u0 = 0.6 0.67 0.85 0.81 0.81 −0.08 0.60 0.79 0.84 0.84 −0.08 −0.10 0.53 0.99 0.97 −0.04
u0 = 0.8 0.80 0.91 0.72 0.72 −0.09 0.73 0.84 0.76 0.76 −0.09 0.09 0.56 0.95 0.95 −0.05

Setup 12 β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias β Bias RMSE CVG* CVG σe Bias

u0 = 0.2 0.28 0.40 0.83 0.83 −0.05 0.24 0.36 0.89 0.88 −0.04 −0.54 0.61 1.00 0.52 0.01
u0 = 0.4 0.47 0.55 0.55 0.55 −0.07 0.43 0.51 0.62 0.62 −0.06 −0.31 0.44 1.00 0.80 0.00
u0 = 0.6 0.67 0.71 0.22 0.22 −0.10 0.61 0.66 0.34 0.34 −0.09 −0.03 0.27 1.00 0.97 −0.01
u0 = 0.8 0.82 0.85 0.06 0.06 −0.11 0.75 0.79 0.09 0.09 −0.10 0.20 0.33 0.86 0.86 −0.04

u0 is the true mean within-study correlation used for simulation. CVG* is the proportion of times the posterior 2.5th percentile exceeded the true slope. Letters (e.g. (a)) displayed in the top row correspond to those used to define missing data handling strategies introduced in-text.

FIGURE 1.

Displayed are curves describing bias in the posterior median for the meta-regression slope as a function of the ratio of the average size of the SEs of the estimated treatment effects on the clinical and surrogate endpoints relative to the between-study SDs for the true treatment effects. We display curves obtained from four different missing data handling methods. Each grid used a specific average true within-study correlation. Letters in the legend correspond to the missing data handling strategies introduced in-text.

With the exception of the approach based on the conservative prior, the meta-regression slope posterior median became increasingly biased upward, and the 95% credible interval excluded the true slope an increasing proportion of times, as the simulated within-study correlations were increased and thus were increasingly underestimated. The error-SD also became increasingly biased downward, giving the impression of a higher-quality surrogate than is true. For each setup, use of the uniform(−1, 1) prior as opposed to zero-imputation did not meaningfully change the results. As can be seen in Table 1 of the supplement, imputing 0.2 as opposed to 0 reduced bias in both the slope and error-SD posteriors, but still resulted in meaningful bias in most scenarios. Bias in the meta-regression intercept was also observed when mishandling missing within-study correlations (Supplement Figure 1). Bias in the meta-regression intercept, slope, and error-SD was most severe when the average size of the SEs was largest relative to the between-study SDs among the setups considered (Setup 2), and bias was usually negligible when the SEs were, on average, smallest relative to the between-study SDs (Setup 3). Figure 1 (and Supplement Figure 1) provides a further look at the influence of the sizes of the trials used. For this plot, we repeatedly simulated data using the base case but varied mc and ms. The impact of differing strategies for handling missingness also depended on other data characteristics, such as the number of studies, the degree of variability between studies in terms of size and strength of within-study correlations, and the true size of the between-study correlation. On the other hand, use of the conservative prior often led to an overestimation of the true average within-study correlation (unless the mean was 0.8) and thus led to conservative inference (e.g., downward bias in the meta-regression slope posterior except in Setup 3 or when the mean within-study correlation was truly 0.8). Note again that the conservative prior can be modified. Imputing a single value of 0.6 for all missing correlations produced somewhat similar results when compared to using a conservative prior with a mean of 0.6 (Supplement Table 3). However, in the majority of the cases considered, the use of the common imputed value rather than the prior resulted in decreased coverage of the true slope by the 95% credible interval.

4.2 |. Within-Study Correlations Partially Missing

To evaluate methods for handling partially missing within-study correlations, we utilized three simulation strategies. First, we simulated both missing and non-missing terms from identical beta distributions to reflect MCAR data.22 The second strategy for generating missing terms reflected MAR data.22 Under the MAR mechanism, we simulated transformed true correlations as a function of a simulated continuous trial-level summary variable and assigned missingness with probability depending on that same variable (detailed further below). For a third strategy, we simulated missing within-study correlations from a beta distribution shifted to reflect higher probabilities of larger within-study correlations relative to the beta distribution used to simulate non-missing terms, which represents MNAR.22 For the MAR and MNAR scenarios, we simulated data to achieve an average discrepancy between the true missing and non-missing correlations of 0.2 (in magnitude). This value was motivated by evaluations of the CKD-EPI collection of trials (trials with mean baseline GFR above or below 30 ml/min/1.73 m², a threshold that defines "severe" CKD, have mean estimated within-study correlations that differ by 0.2 for GFR slope evaluations).

Simulation Setups and Summaries for MCAR and MNAR Data

For the MCAR and MNAR data simulation strategies, we use Setups 1 through 10 displayed in Table 1. The beta distributions we use to generate data for the MCAR scenarios are displayed in equation (13), and the beta distributions we use for the MNAR scenarios are displayed in equation (14) (again letting $r_j^m$ and $r_i^{nm}$ denote true correlations for studies missing and not missing those terms, respectively). For (13), we used $u_0 = 0.4$. For (14), we used $u_0 = 0.4$ and $u_1 = 0.6$.

$r_j^m \sim \text{beta}\big(u_0 v_0,\; v_0(1 - u_0)\big), \quad r_i^{nm} \sim \text{beta}\big(u_0 v_0,\; v_0(1 - u_0)\big)$ (13)
$r_j^m \sim \text{beta}\big(u_1 v_0,\; v_0(1 - u_1)\big), \quad r_i^{nm} \sim \text{beta}\big(u_0 v_0,\; v_0(1 - u_0)\big)$ (14)
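The sketch below illustrates the two correlation-generating schemes in (13) and (14): under MCAR, missing and non-missing correlations share one beta distribution, while under MNAR the missing correlations come from a beta distribution with a larger mean ($u_1 > u_0$). The proportion missing (0.5) is illustrative.

```r
set.seed(1)
N <- 50; v0 <- 10; u0 <- 0.4; u1 <- 0.6
miss <- rbinom(N, 1, 0.5) == 1                           # which trials lack a correlation
r_mcar <- rbeta(N, u0 * v0, v0 * (1 - u0))               # (13): one distribution for all trials
r_mnar <- ifelse(miss,
                 rbeta(N, u1 * v0, v0 * (1 - u1)),       # (14): missing terms, mean u1
                 rbeta(N, u0 * v0, v0 * (1 - u0)))       #       non-missing terms, mean u0
```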

We consider four missing data handling strategies for our MCAR and MNAR scenarios. The first two strategies are (d) and (e) described in Section 2. We also consider two versions of strategy (g). For the first data-adaptive beta prior (“uniform”), we assign u~beta(1,1) and v~gamma(100,1). For this combination of priors, the beta (uv,v(1-u)) distribution is approximately uniform over the interval (0, 1). We also apply a more “conservative” variation, where we assign u~beta(9,1) and v~gamma(1,1), creating a beta (uv,v(1-u)) distribution with probability mass concentrated near 1.

Results from analyses under the MCAR data generating model are displayed in Table 3 (intercept posteriors summarized in Supplement Table 5). We provide the results from simulation Setups 2 and 3 in the main manuscript, which highlight the range in bias observed as a function of the size of the trials used in the analysis. Results from the remaining setups and the “uniform” variant of strategy (g) are summarized in the supplement. The summary metrics obtained from analyses using mean imputation, the informative uniform prior, and the uniform data adaptive beta prior were nearly identical. There was slight upward bias, on average, in the posterior median for the meta-regression slope obtained from use of these three methods. As the proportion of trials missing within-study correlations increased, bias and coverage did not change. Use of the conservative data adaptive beta prior resulted in a slight improvement in bias of the slope posterior if the proportion of trials missing within-study correlations was less than 0.8. If the proportion missing was 0.8, there was downward bias in the meta-regression slope and bias away from zero for the meta-regression intercept when using the conservative beta prior under most scenarios, which reflected the tendency for this approach to produce conservative inference.

TABLE 3.

Simulation Posterior Summaries for Partially-Missing Within-Study Correlations: MCAR/MNAR Missingness

Missing Data Mechanism: MCAR

Missing, Non-Missing Average rj Discrepancy = 0

Mean Impute (d) Informative Uniform Prior (e) Conservative Hierarchical Beta Prior (g)

β Bias CVG* CVG σe Bias β Bias CVG* CVG σe Bias β Bias CVG* CVG σe Bias

Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3

PM=0.2 0.06 0.01 0.97 0.98 0.93 0.95 −0.02 0.00 0.05 0.01 0.98 0.99 0.94 0.95 −0.02 0.00 0.04 0.01 0.98 0.98 0.95 0.95 −0.02 0.00
PM=0.5 0.06 0.01 0.97 0.98 0.95 0.95 −0.02 0.00 0.06 0.01 0.98 0.99 0.95 0.96 −0.02 0.00 0.03 0.01 0.98 0.99 0.95 0.95 −0.02 0.00
PM=0.8 0.06 0.01 0.97 0.98 0.95 0.95 −0.02 0.00 0.06 0.02 0.98 0.98 0.95 0.95 −0.02 0.00 −0.13 0.01 0.98 0.98 0.94 0.95 −0.01 0.00

Missing Data Mechanism: MNAR

Missing, Non-Missing Average rj Discrepancy = 0.2

Mean Impute (d) Informative Uniform Prior (e) Conservative Hierarchical Beta Prior (g)

β Bias CVG* CVG σe Bias β Bias CVG* CVG σe Bias β Bias CVG* CVG σe Bias

Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3 Setup 2 Setup 3

PM=0.2 0.13 0.01 0.96 0.99 0.95 0.96 −0.02 0.00 0.13 0.02 0.96 0.99 0.95 0.96 −0.02 0.00 0.12 0.01 0.97 0.99 0.95 0.95 −0.02 0.00
PM=0.5 0.21 0.01 0.92 0.99 0.92 0.96 −0.03 0.00 0.21 0.02 0.93 0.98 0.93 0.95 −0.03 0.00 0.17 0.01 0.93 0.99 0.93 0.96 −0.03 0.00
PM=0.8 0.30 0.01 0.86 0.99 0.86 0.96 −0.05 0.00 0.29 0.02 0.87 0.99 0.87 0.96 −0.05 0.00 0.14 0.01 0.95 0.99 0.93 0.95 −0.03 0.00

PM: Proportion Missing; Setup 2: Average SEs equivalent to between-study SDs. Setup 3: Average SEs one-tenth the size of between-study SDs. Letters (e.g. (d)) displayed correspond to those used to define missing data handling strategies introduced in-text.

When missing within-study correlations were simulated to be MNAR, use of mean-imputation, the informative uniform prior, and the uniform data adaptive beta prior resulted in nearly identical upward bias in the meta-regression slope posterior, which grew more severe as the proportion missing increased. Moreover, when using any of these three methods, as the proportion of missingness increased, coverage for the true slope decreased and bias for the error-SD increased unless the SEs on the average effect estimates were small relative to between-study SDs (Setup 3). When the conservative data adaptive beta prior was used, there was a reduction in bias of meta-regression slope posterior, the error-SD posterior, and the intercept posterior (intercept summarized in Supplement Table 7) relative to that which was observed using the other three strategies. The reduction in bias corresponded to improved coverage metrics across all proportions missing.

Simulation Setups and Summaries for MAR Data

As discussed previously, in the CKD-EPI collection of clinical trials, the data show associations between the estimated within-study correlations and trial-level summary variables. In particular, Fisher’s Z-transformed estimated within-study correlations appear to be approximately linearly related to the mean baseline GFR, as shown in Figure 2 of the supplement.32 For the purposes of the simulation study, we model Fisher’s Z-transformed correlations as conditionally normal given a single continuous simulated summary variable.

We repeated simulation Setups 1–6, 9, and 10, but again altered our strategy for simulating correlations and assigning missingness. We simulated an observed standardized Gaussian trial-level summary variable, denoted xi for trial i, and assigned missing status to the ith trial with probability depending on xi so as to achieve, on average, the same proportions missing considered in the MCAR and MNAR analyses. Missing status Mi for trial i was drawn from a Bernoulli(pi) distribution, with pi given in equation (17) (a and b were altered to vary the average proportion missing). For each simulation, we first drew xi and then drew the Fisher’s Z-transformed correlation zi from the distributions in (15). Inverting the Z-transformation gives the true within-study correlation ri, shown in (16). The true γ0, γ1, and σz used to generate data were chosen to reflect the relationship between baseline mean GFR and the Z-transformed within-study correlations in the CKD-EPI data. All other parameters were simulated using the distributions described in equations (8)–(12). We repeated all eight simulation setups twice, letting γ1 = 0.15 for the scenario where missing within-study correlations are stronger than those non-missing and γ1 = −0.15 for the scenario where missing terms are weaker.

$x_i \sim N(0, 1), \qquad z_i \mid x_i \sim N(\gamma_0 + \gamma_1 x_i,\, \sigma_z^2)$ (15)
$r_i = \frac{\exp(2 z_i) - 1}{\exp(2 z_i) + 1}$ (16)
$p_i = \frac{\exp(a + b x_i)}{\exp(a + b x_i) + 1}$ (17)
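
To make the generative steps in (15)–(17) concrete, the following R sketch simulates the trial-level summary variable, the Fisher’s Z-transformed correlations, the implied within-study correlations, and the MAR missingness indicators for a hypothetical set of trials. The parameter values below are placeholders chosen for illustration only; they are not the values used in our simulation study.

# Illustrative R sketch of the MAR data-generating steps in (15)-(17).
# Parameter values are placeholders, not those used in the study.
set.seed(1)
n_trials <- 30
gamma0  <- -0.6    # intercept for Z-transformed correlations
gamma1  <-  0.15   # slope on the trial-level summary variable
sigma_z <-  0.10   # SD of Z-transformed correlations
a <- -1.4; b <- 1.0  # logistic coefficients controlling proportion missing

x <- rnorm(n_trials, mean = 0, sd = 1)                          # equation (15), first line
z <- rnorm(n_trials, mean = gamma0 + gamma1 * x, sd = sigma_z)  # equation (15), second line
r <- (exp(2 * z) - 1) / (exp(2 * z) + 1)                        # equation (16): inverse Fisher Z
p_miss <- exp(a + b * x) / (exp(a + b * x) + 1)                 # equation (17): P(missing | x)
M <- rbinom(n_trials, size = 1, prob = p_miss)                  # missingness indicators

# Trials with M == 1 have their within-study correlation treated as missing
r_observed <- ifelse(M == 1, NA, r)
mean(M)  # realized proportion missing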

For data simulated to reflect correlations MAR, we used missing data handling strategies (d), (e), and (h). Table 4 displays results from simulation Setups 2 and 3 for the scenario where missing within-study correlations are, on average, stronger than those non-missing. The results from the remaining setups are provided in Supplement Tables 10–13 (intercept posterior summaries are also included). Use of mean imputation or the informative uniform prior again resulted in increasingly biased posteriors for the meta-regression slope, error-SD, and intercept as the proportion of trials missing within-study correlations increased, except in the scenario with the smallest SEs (Setup 3). When missing within-study correlations were stronger, there was upward bias in the strength of the meta-regression slope. As can be seen in the supplement, when missing within-study correlations were weaker than those non-missing, there was most often a larger degree of downward bias in the slope posterior. When the conditional prior was used, bias in the slope posterior remained comparatively low relative to that obtained using the other two methods, even as the proportion missing increased. Moreover, both forms of coverage remained higher, and bias in the intercept and error-SD posteriors often remained constant across proportions missing instead of increasing in magnitude.

TABLE 4.

Simulation Posterior Summaries for Partially-Missing Within-Study Correlations: MAR Missingness

Entries are shown as Setup 2 / Setup 3 values.

Missing Data Mechanism: MAR Taking Into Account Observed Trial-Level Variables (Missing vs. Non-Missing Average rj Discrepancy = 0.2)

Mean Impute (d)
         β Bias        CVG*          CVG           σe Bias
PM=0.2   0.10 / 0.01   0.98 / 0.98   0.96 / 0.95   −0.02 / 0.00
PM=0.5   0.23 / 0.00   0.93 / 0.99   0.93 / 0.96   −0.03 / 0.00
PM=0.8   0.30 / 0.01   0.93 / 0.98   0.91 / 0.96   −0.04 / 0.00

Informative Uniform Prior (e)
         β Bias        CVG*          CVG           σe Bias
PM=0.2   0.10 / 0.01   0.98 / 0.98   0.96 / 0.95   −0.02 / 0.00
PM=0.5   0.23 / 0.00   0.92 / 0.99   0.92 / 0.96   −0.03 / 0.00
PM=0.8   0.31 / 0.01   0.92 / 0.98   0.91 / 0.96   −0.04 / 0.00

Hierarchical Conditional Prior (h)
         β Bias        CVG*          CVG           σe Bias
PM=0.2   0.04 / 0.01   0.98 / 0.98   0.96 / 0.95   −0.02 / 0.00
PM=0.5   0.09 / 0.01   0.97 / 0.98   0.93 / 0.96   −0.02 / 0.00
PM=0.8   0.08 / 0.01   0.97 / 0.98   0.96 / 0.96   −0.02 / 0.00

PM: Proportion Missing; Setup 2: Average SEs equivalent to between-study SDs; Setup 3: Average SEs one-tenth the size of between-study SDs. Reminder: the Hierarchical Conditional Prior conditions on one continuous simulated trial-level summary variable. Letters (e.g. (d)) correspond to the missing data handling strategies introduced in-text.

Our simulation study for partially missing within-study correlations showed that unless only a small proportion of trials are missing these terms (e.g., 20% or less), earlier approaches to handling missing terms can lead to meaningfully biased posteriors for the meta-regression slope, error-SD, and intercept. If data are MAR, the trial-level summary information associated with within-study correlations can be used to inform the distribution of the missing terms, potentially improving the accuracy of inference on the meta-regression parameters. The missing data mechanism will not be known in practice, but subject matter expertise and available data can be used to decide on the practicality of this approach.

Computational Performance of MCMC Algorithms

We provide brief summaries of the performance of the MCMC algorithms under the different analyses from our simulation study in Section 9 of the supplement. We focus the discussion on the base case; similar observations apply to the other setups. As Supplemental Table 17 indicates, when correlations were completely missing, use of the conservative prior resulted in qualitatively similar, and in some cases improved, ESS (e.g. for σe), averaged across simulations, and identical Rhat summaries relative to previously used strategies. Moreover, the conservative prior reduced the average maximum run time for warmup and sampling relative to the vague uniform(−1, 1) prior. As evidenced by the results in Supplemental Table 18, for partially missing within-study correlations, use of the data-adaptive beta prior or the conditional imputation strategy provided qualitatively similar ESS summaries and identical Rhat summaries relative to imputing the sample mean, and only slightly increased computation time (about 1–4 minutes longer, on average). These results highlight the practicality of employing these novel strategies.
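
As an aside, the convergence summaries referenced above can be extracted directly from an rstan fit object. A minimal sketch is shown below, assuming a fitted stanfit object named fit with a parameter labeled beta; the object and parameter names are illustrative only and not taken from our analysis code.

# Minimal sketch: extracting ESS and Rhat diagnostics from an rstan fit.
# The fit object and parameter name are illustrative assumptions.
library(rstan)

fit_summary <- summary(fit)$summary       # matrix with n_eff and Rhat columns
fit_summary["beta", c("n_eff", "Rhat")]   # ESS and Rhat for the slope parameter

# Per-chain warmup and sampling times, useful for comparing run times
get_elapsed_time(fit)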

5 |. APPLICATION ANALYSIS

5.1 |. CKD-EPI Data Description

We also illustrate the approaches described above with the CKD-EPI collection of trials for which IPD is available. We utilized a collection of 47 trials used by Inker et al. for evaluations of a GFR slope endpoint and 44 of those trials for evaluations of an albuminuria endpoint (the subset with data for the albuminuria analysis).5,6 We note that the results presented differ slightly from those presented by Inker et al. and Heerspink et al. because we used more diffuse priors for the meta-regression parameters and because, for our albuminuria-based analysis, we used three additional trials.5,6

For all analyses, the treatment effects on the clinical endpoint were expressed as log hazard ratios (HRs), which were estimated using proportional hazards models. The time-to-event clinical endpoint reflected whether a patient had experienced doubling of serum creatinine or KFRT. To estimate treatment effects on GFR, serum creatinine measurements obtained serially over the follow-up period of each trial were used to estimate GFR with the 2009 CKD-EPI creatinine equation (expressed in ml/min/1.73 m2).33 To account for informative censoring of the GFR measurements by KFRT or death, treatment effects on GFR slope were estimated under a joint mixed effects model for both the time-dependent GFR trajectory and the time to cessation of GFR measurements due to KFRT or death. In modeling GFR as a function of time, we used a linear spline model to allow for “acute” and “chronic” GFR slopes in each treatment arm. As detailed by Inker et al., for patients with CKD, many therapies cause an acute change in GFR (from time zero to about 3 months of follow-up) that differs from the long-term response to therapy.5 The GFR slope that starts at the end of the acute phase is referred to as the chronic slope. In this manuscript, we represent treatment effects on GFR by the difference in the mean treatment and control group chronic GFR slopes (expressed in ml/min/1.73 m2 per year). Additional detail on these methods can be found in Vonesh et al.5,34 The albuminuria-based effect was captured through an evaluation of the 6-month change in the albumin-to-creatinine ratio (ACR).6 For each patient, we computed the log-transformed ACR at baseline and at the time closest to 6 months of follow-up. The log-transformed 6-month ACR measurements were regressed on treatment and baseline measurements, and the effect is represented by a geometric mean ratio.6
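
As an informal illustration of the ACR-based effect computation, the sketch below fits a covariate-adjusted regression of log-transformed 6-month ACR on treatment and log baseline ACR for a single trial, and exponentiates the treatment coefficient to obtain a geometric mean ratio. The data frame and variable names are hypothetical, and treatment is assumed to be coded 0/1.

# Hypothetical sketch of the ACR-based treatment effect for one trial:
# regress log 6-month ACR on treatment (coded 0/1) and log baseline ACR,
# then exponentiate the treatment coefficient to get a geometric mean ratio.
fit_acr <- lm(log(acr_6mo) ~ trt + log(acr_base), data = trial_data)

est <- coef(summary(fit_acr))["trt", ]
gmr    <- exp(est["Estimate"])                                   # geometric mean ratio
gmr_ci <- exp(est["Estimate"] + c(-1.96, 1.96) * est["Std. Error"])
log_effect <- est["Estimate"]                                    # effect on the log scale
se_effect  <- est["Std. Error"]                                  # SE fed into the trial-level model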

Table 5 summarizes key features of the CKD-EPI trials used for these analyses. We note that the median estimated within-study correlation between the clinical endpoint effects and the GFR slope endpoint effects was −0.53, while the median estimated within-study correlation between the clinical effects and the ACR effects was 0.15.

TABLE 5.

Summary of CKD-EPI Trials by Surrogate

                              Trials for GFR Analysis     Trials for ACR Analysis
N (Number of Trials)          47                          44
Number of Patients            392 (137, 646)              335 (122, 584)
SEs, Clinical Effect          0.25 (0.17, 0.42)           0.31 (0.18, 0.46)
SEs, Surrogate Effect         0.48 (0.28, 0.78)           0.09 (0.06, 0.16)
Within-Study Correlation      −0.53 (−0.57, −0.37)        0.15 (0.10, 0.23)
Between-Study SD, σc*         0.22 (0.17, 0.27)           0.19 (0.15, 0.23)
Between-Study SD, σs*         0.54 (0.48, 0.61)           0.15 (0.14, 0.17)

Data summarized via sample median (25th, 75th percentiles).
* Posteriors estimated from the data are summarized.

5.2 |. Application Analysis Results and Discussion

Completely Missing Within-Study Correlations

We fit the hierarchical random effects model for evaluations of both endpoints described above. For the GFR-slope-based endpoint, because the within-study correlations are negative, we imputed −0.2 rather than 0.2 when using strategy (c), and the conservative beta prior was made negative with a mean of −0.6. Otherwise, the priors used for the application section matched those used for the simulation study. Figures 2 and 3 display posterior density curves, posterior medians, and 95% credible intervals for the meta-regression R2 and slope obtained from all such analyses (Figures 3 and 4 of the Supplement summarize the intercept posteriors).

FIGURE 2.

Shown are posterior density curves for the meta-regression slope and R2 obtained from fitting the hierarchical random effects model under a variety of approaches to handling completely missing within-study correlations. The CKD-EPI collection of trials was used for the evaluation of GFR chronic slope as the surrogate. The legend summaries include the posterior median and 95% credible interval. Letters in legend correspond to missing data handling strategies introduced in-text.

FIGURE 3.

Shown are posterior density curves for the meta-regression slope and R2 obtained from fitting the hierarchical random effects model under a variety of approaches to handling completely missing within-study correlations. The CKD-EPI collection of trials was used for the evaluation of 6-month ACR change as the surrogate. The legend summaries include the posterior median and 95% credible interval. Letters in legend correspond to missing data handling strategies introduced in-text.

For the evaluations of GFR chronic slope, most within-study correlations were substantially underestimated (in magnitude) when imputing 0 or −0.2, or when assigning the uniform(−1, 1) prior. The conservative prior led to slight overestimation of the average magnitude. There was a small bias in the meta-regression slope posteriors obtained from the methods mishandling completely missing within-study correlations relative to the posterior obtained using the available estimates. The R2 posteriors appeared more sensitive to the strategy used to handle the missing terms. For all methods except the conservative prior, the R2 posteriors were shifted considerably closer to one, which meaningfully inflates the perceived strength of the surrogate. There were also clear shifts in the intercept posterior under all methods when correlations were missing.

For the ACR evaluations, the estimated within-study correlations were weaker. As such, only imputation of 0 or assignment of a uniform(−1, 1) prior underestimated the missing terms. When using these two methods, the R2 posteriors were again biased towards 1 relative to those obtained using the available correlations, and the meta-regression slope posteriors were meaningfully shifted above 0. Whereas use of the available within-study correlations provided a 95% credible interval for the slope that included 0, the intervals obtained using 0-imputation or the uniform prior were shifted up to exclude 0. Alternatively, imputing 0.2 resulted in slightly conservative posteriors for the R2, slope, and intercept, and use of the beta prior produced even more conservative posteriors.

Partially Missing Within-Study Correlations

To evaluate methods for partially missing within-study correlations, we used two strategies to assign missing status. As previously discussed, the CKD-EPI trials were conducted over a span of many years and thus IPD has become available chronologically. We first assigned missing status based on the calendar year the IPD was made available, using two cutoff dates to create scenarios in which the most recent 20% or 80% of trials were treated as missing IPD. For our second strategy, we assigned missing status randomly to reflect the MAR scenario. Missing status was again assigned for each trial using a Bernoulli(pi) distribution. The probability pi that a trial was missing IPD depended on the mean baseline GFR x1i, the log-transformed mean baseline ACR x2i, and two binary variables indicating the treatment class (renin-angiotensin system blockers x3i or blood pressure control x4i, compared with a referent category “other”). Equation (18) was used to calculate pi for each trial, where a, b1, b2, b3, and b4 were chosen to produce, on average, 20% or 80% missingness and were altered to obtain stronger or weaker missing within-study correlations. Because missing status was assigned randomly, we repeated the process 10 times and averaged the results across those repetitions. When fitting the trial-level model and using strategy (h) to handle missing within-study correlations, we modeled correlations as a linear function of x1i, x2i, x3i, and x4i.

$p_i = \frac{\exp(a + b_1 x_{1i} + b_2 x_{2i} + b_3 x_{3i} + b_4 x_{4i})}{\exp(a + b_1 x_{1i} + b_2 x_{2i} + b_3 x_{3i} + b_4 x_{4i}) + 1}$ (18)
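
For concreteness, the R sketch below illustrates one repetition of the missingness assignment in equation (18); in our analysis this assignment was repeated 10 times and the results were averaged. The coefficient values and the column names of the hypothetical data frame trials are placeholders for illustration only.

# One repetition of the MAR missingness assignment in equation (18).
# Coefficients and column names are placeholders for illustration.
a <- -1.5; b1 <- 0.03; b2 <- 0.4; b3 <- 0.5; b4 <- -0.3

lin_pred <- with(trials,
                 a + b1 * gfr_base + b2 * log_acr_base +
                   b3 * ras_blocker + b4 * bp_control)
p_miss <- exp(lin_pred) / (exp(lin_pred) + 1)      # equation (18)
miss   <- rbinom(nrow(trials), size = 1, prob = p_miss)

mean(miss)  # realized proportion missing (targeting 0.2 or 0.8 on average)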

First, consider Table 6. For the evaluations of ACR, there was increased bias in both the meta-regression slope posterior and the R2 posterior as the proportion of trials missing within-study correlations increased when using the informative uniform prior or the “uniform” data-adaptive beta prior. A few of the earliest CKD-EPI trials had correlations near zero, which could have strongly influenced the inference. Mean imputation resulted in minimal bias in either posterior considered. When the analysis produced biased posteriors, the bias in both parameters was in the direction corresponding to the interpretation of a higher-quality surrogate when most trials were missing within-study correlations. The conservative hierarchical beta prior again produced conservative posteriors. For the analysis of the GFR chronic slope, the bias in the meta-regression slope was very small for either proportion missing and for any method used. However, when the proportion missing was 0.8, the R2 posterior exhibited a meaningful bias towards 1 for all methods except the conservative hierarchical beta prior. Table 15 of the Supplemental Materials shows that the meta-regression intercept posterior also tended to be similar to that from analyses using IPD, or was conservative under the hierarchical beta prior when the proportion missing was 0.8.

TABLE 6.

Posterior Summaries for Time-Dependent Partially-Missing Within-Study Correlations on the CKD-EPI Data

Each entry shows the β summary followed by the R2 summary.

Analyses Evaluating ACR-Based Effects

Analyses using all available within-study correlations: β = 0.77 (−0.01, 1.57); R2 = 0.35 (0.01, 0.85)

Analyses with partially missing within-study correlations:

Mean Impute (d)
  PM=0.2: 0.79 (−0.04, 1.64); 0.36 (0.00, 0.85)
  PM=0.8: 0.77 (−0.02, 1.61); 0.35 (0.00, 0.83)
Informative Uniform Prior (e)
  PM=0.2: 0.79 (−0.02, 1.63); 0.35 (0.01, 0.86)
  PM=0.8: 0.85 (0.03, 1.69); 0.40 (0.01, 0.90)
Hierarchical Beta Prior (Uniform) (g)
  PM=0.2: 0.76 (−0.02, 1.62); 0.33 (0.00, 0.81)
  PM=0.8: 0.90 (0.12, 1.74); 0.44 (0.01, 0.90)
Hierarchical Beta Prior (Conservative) (g)
  PM=0.2: 0.62 (−0.25, 1.44); 0.22 (0.00, 0.74)
  PM=0.8: 0.61 (−0.19, 1.46); 0.24 (0.00, 0.80)

Analyses Evaluating GFR Chronic Slope-Based Effects

Analyses using all available within-study correlations: β = −0.45 (−0.63, −0.28); R2 = 0.90 (0.51, 0.99)

Analyses with partially missing within-study correlations:

Mean Impute (d)
  PM=0.2: −0.45 (−0.60, −0.29); 0.91 (0.54, 0.99)
  PM=0.8: −0.46 (−0.62, −0.30); 0.95 (0.63, 0.99)
Informative Uniform Prior (e)
  PM=0.2: −0.45 (−0.61, −0.29); 0.92 (0.56, 0.99)
  PM=0.8: −0.46 (−0.63, −0.30); 0.94 (0.63, 0.99)
Hierarchical Beta Prior (Uniform) (g)
  PM=0.2: −0.44 (−0.61, −0.28); 0.91 (0.52, 0.99)
  PM=0.8: −0.46 (−0.63, −0.31); 0.95 (0.67, 0.99)
Hierarchical Beta Prior (Conservative) (g)
  PM=0.2: −0.44 (−0.60, −0.28); 0.91 (0.52, 0.99)
  PM=0.8: −0.44 (−0.60, −0.28); 0.92 (0.55, 0.99)

PM: Proportion Missing; Summaries are the posterior median and, in parentheses, the 95% credible interval. Letters (e.g. (d)) correspond to the missing data handling strategies introduced in-text.

Finally, consider Table 7, which displays results from the MAR analyses for the scenario where missing status was assigned with probability dependent on trial-level summary variables such that, on average, the missing within-study correlations were stronger (in magnitude) than those non-missing. Results from the opposite case are displayed in Supplemental Table 14. For either the analysis of ACR or GFR chronic slope, when only 20% of trials were missing within-study correlations under the MAR mechanism, there was limited bias in the posteriors considered regardless of the method for handling missingness. However, as the proportion of trials missing within-study correlations increased to 0.8, there was again upward bias in the strength of the meta-regression slope and R2 posteriors. Bias was again minimized when the Bayesian conditional imputation strategy was used.

TABLE 7.

Posterior Summaries for Partially-Missing Within-Study Correlations on the CKD-EPI Data: MAR Missingness

Each entry shows the β summary followed by the R2 summary.

Analyses Evaluating GFR Chronic Slope-Based Effects

Analyses using all available within-study correlations: β = −0.45 (−0.63, −0.28); R2 = 0.90 (0.51, 0.99)

Analyses with partially missing within-study correlations:

Mean Impute (d)
  PM=0.2: −0.45 (−0.63, −0.28); 0.90 (0.51, 0.99)
  PM=0.8: −0.47 (−0.64, −0.31); 0.95 (0.65, 0.99)
Informative Uniform Prior (e)
  PM=0.2: −0.45 (−0.62, −0.28); 0.90 (0.51, 0.99)
  PM=0.8: −0.47 (−0.64, −0.31); 0.95 (0.65, 0.99)
Hierarchical Conditional Prior (h)
  PM=0.2: −0.45 (−0.63, −0.28); 0.90 (0.51, 0.99)
  PM=0.8: −0.47 (−0.64, −0.29); 0.92 (0.49, 0.99)

PM: Proportion Missing; Summaries are the posterior median and, in parentheses, the 95% credible interval. Letters (e.g. (d)) correspond to the missing data handling strategies introduced in-text.

Application Analysis Discussion

Our applied analyses indicated that imputing a within-study correlation of 0, or using a prior centered at 0, would have suggested even better performance of ACR and GFR slope as surrogate endpoints than was supported by the analyses incorporating IPD. The analyses considered also showed that if the majority of trials used for model fitting contributed IPD (20% missing), the potential for meaningful bias in the meta-regression posteriors was reduced no matter the method used to handle missingness. Alternatively, if IPD was not available for the majority of trials (80% missing), certain methods for handling the missing within-study correlations exhibited greater potential for meaningfully biased posteriors. Use of the conditional normal prior demonstrated potential to mitigate bias in the meta-regression posteriors even when a majority of trials were missing within-study correlations.

6 |. DISCUSSION AND CONCLUSIONS

In this paper, we demonstrated that the methods used to handle missing within-study correlations can influence the conclusions of a commonly used two-stage approach to surrogate endpoint evaluation. In the current literature, within-study correlations have often been treated as if they exert little-to-no influence on posterior inference for key parameters in surrogate evaluation whether partially or completely missing. We showed in an analytic evaluation and through simulation and applied data analyses that the methods commonly used to handle completely missing within-study correlations tended to underestimate the magnitude of missing terms, resulting in biased posteriors in a direction that inflated the perceived strength of the surrogate. We also showed that common methods used to handle partially missing within-study correlations can lead to biased inference that may influence the perception of the quality of the surrogate. Finally, we proposed and demonstrated the potential benefits of three novel strategies for handling missing within-study correlations.

Our analyses highlighted data scenarios where the handling of missing within-study correlations could be more or less consequential. For example, our analyses indicated the ideal scenario would be to have IPD, and correspondingly to be able to calculate within-study correlations, for as large a proportion of the overall set of trials as possible. There appeared to be limited consequence even to mishandling missing correlations if only a small proportion of trials had these terms missing (e.g., <20% missing). If correlations are mostly or completely missing, our analytic discussion section and simulation studies showed that the sizes of the studies missing IPD strongly impacted the potential for missing data handling methods to influence inference. In particular, our results suggest that if the data used contain only large trials (manifested as small SEs), then missing correlations could potentially be handled using any method considered. Specifically, our simulations suggested that within-study correlations could potentially be assigned arbitrary values if the average size of the SEs on both endpoints was one-tenth the size of the corresponding between-study SDs. Alternatively, if the average size of the SEs was closer to being equivalent to the corresponding between-study SDs, biases induced into the within-study correlations could be entirely absorbed into certain meta-regression posteriors, necessitating rigorous handling of the missing terms. This is important to note because the average size of the SE on either endpoint can be compared to the posterior median between-study SD via univariable random-effects meta-analysis without IPD, which could be used to decide whether the effort to collect IPD is warranted.
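
As one way to operationalize this check without IPD, the sketch below fits a univariable random-effects meta-analysis for a single endpoint and compares the average SE to the estimated between-study SD. It uses the metafor package, which is one convenient option rather than the Bayesian models used in our own analyses, and the input vectors are hypothetical.

# Rough check of average SE versus between-study SD for one endpoint,
# using aggregate data only. metafor is one option; our analyses used
# Bayesian hierarchical models. Input vectors are hypothetical.
library(metafor)

effects <- trial_summary$log_hr      # estimated treatment effects, one per trial
ses     <- trial_summary$se_log_hr   # their standard errors

re_fit <- rma(yi = effects, sei = ses, method = "REML")
between_sd <- sqrt(re_fit$tau2)      # estimated between-study SD

mean(ses) / between_sd   # near 1: rigorous handling of missing correlations is needed
                         # near 0.1 or smaller: the handling method matters much less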

A point of emphasis in this paper was to discuss how the handling of missing within-study correlations could lead to an over-interpretation of the strength of a surrogate. Our analyses indicated that a key scenario of concern would be a weak surrogate evaluated on trials with high within-study correlations. Our application analyses reflected cases where weaker (stronger) within-study correlations corresponded to a moderate (stronger) surrogate, but there are many settings where poorly performing surrogates are evaluated using data in which within-study correlations are likely to be stronger. Consider progression-free survival (PFS) as a surrogate for overall survival (OS) in metastatic breast cancer (MBC). In a review paper, Gyawali, Hey, and Kesselheim found that PFS was a poor trial-level surrogate for OS in MBC trials, and noted that therapies have nonetheless been approved for MBC because of an effect on PFS without a demonstrated OS benefit.35 However, Burzykowski et al. and Michiels et al. found moderate correlations both directly between the endpoints (PFS and OS) and for the within-study correlations when using IPD.36,37

The new approaches for handling missing within-study correlations introduced in this paper are novel in the surrogate space and demonstrated the potential to reduce biases that would have occurred under previously used methods. The results from both the simulation and application analyses also indicated these new methods are practical and computationally feasible. One potential challenge to the use of the new methods is that they require the specification of additional priors. For completely missing within-study correlations, we recommend use of domain-specific reasoning to assign a conservative prior to all missing terms. The mean of the prior should be chosen to reasonably exceed the expected mean of the within-study correlations had they been available, depending on subject-specific considerations. Imputing a common value equal to the mean of the conservative prior for all missing terms could produce similar results, as was the case in our simulation analyses. However, because the conservative prior carried no additional computational burden relative to a conservative constant, because coverage for the true meta-regression slope was typically better under the conservative prior than under the constant value, and because it is more natural to consider a distribution than to assume a common fixed value for all missing terms, we advocate that a conservative prior be considered before imputing a common fixed value for all missing correlations. For partially missing correlations, when using the data-adaptive beta prior (g), the priors used should depend on one of two considerations. If there are no known key differences between trials with and without IPD in terms of the patient populations or therapies under evaluation, it would be reasonable to start by assigning a beta(1,1) prior to the mean and a gamma(100,1) prior to the beta “variance”, which induces an effectively uniform(0,1) prior on the missing within-study correlations. On the other hand, if there are clear differences between studies with and without IPD, it would be reasonable to use a more conservative set of priors, such as those described in Section 2.4. These priors place higher prior probability on stronger within-study correlations (e.g. a prior probability of 0.9 that the within-study correlations are greater than 0.5) to facilitate more conservative inference. Of course, if trials with and without IPD are meaningfully different, then trial-level summary information can be used to model within-study correlations under our hierarchical conditional prior (strategy (h)). In that case, we suggest the use of diffuse priors to allow the data to determine the strength of the association between study-level information and within-study correlations. In all cases, however, subject matter expertise should be used to guide the choice of priors, and sensitivity analyses should be considered unless only a small proportion of studies are missing within-study correlations or the trials missing within-study correlations are large (using the criteria discussed above). In any case, we recommend against simply imputing 0 or using the uniform(−1,1) prior. Within-study correlations are likely to lie somewhere between 0 and 1 (or −1 and 0), and assuming these terms are all zero should be considered nearly as extreme as assuming they are all 1.
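
To illustrate how the prior induced on missing within-study correlations can be sanity-checked before fitting, the short Monte Carlo sketch below draws from a hierarchical beta prior under an assumed mean and precision parameterization of the beta distribution; our model’s exact parameterization is described in Section 2.4, so this sketch is illustrative only. It summarizes the implied marginal distribution of a missing correlation, including the prior probability that the correlation exceeds 0.5.

# Monte Carlo check of the prior induced on missing within-study correlations.
# Assumes a mean-precision parameterization of the beta distribution
# (shape1 = mu * phi, shape2 = (1 - mu) * phi); illustrative only.
set.seed(1)
n_draws <- 1e5

mu  <- rbeta(n_draws, 1, 1)       # beta(1,1) prior on the mean
phi <- rgamma(n_draws, 100, 1)    # gamma(100,1) prior on the precision-like term
r   <- rbeta(n_draws, mu * phi, (1 - mu) * phi)

hist(r, breaks = 50)   # approximately uniform(0,1) under these hyperparameters
mean(r > 0.5)          # prior probability of a correlation stronger than 0.5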

There are limitations and potential extensions to our work. For example, there are many conceivable strategies to model within-study correlations as a function of summary data. We chose to model Fisher’s Z-transformed within-study correlations as conditionally normal, but different combinations of transformations and distributional assumptions may further improve inference. Fitting trial-level meta-regression models with Bayesian methods was computationally demanding under any given strategy for handling within-study correlations, which limited the scope of the simulation study. Future analyses might evaluate more combinations of within- and between-study variance terms to further detail how best to handle missing within-study correlations in other specific scenarios of concern. This work could also be extended by evaluating different missing data handling methods on real trial data in other disease settings, such as for surrogates in cancer. Finally, consistent with most work in the surrogate endpoint space, this paper involved analyses that rely on the assumption that the true within-study correlations can be directly calculated using IPD and serve as fixed data input in model fitting. Future work might evaluate additional strategies that account for estimation error in these terms.

The evaluation of surrogate endpoints should be as rigorous as possible. For a treatment effect on a surrogate to accurately predict the effect on an endpoint that is meaningful to patients, the quality of the surrogate must be assessed using as much high-quality data as possible. For many scenarios, collecting IPD for the trials used for the analysis may be necessary to avoid a biased perception of the quality of the surrogate. If it is impractical to collect IPD for some of the trials used, missing within-study correlations should be treated with rigor to avoid biased meta-regression posteriors. Missing data mechanisms and the treatment of missing terms are rigorously considered in statistical analyses across many other areas of application and, as we have shown in this paper, they should also be considered in the trial-level evaluation of surrogate endpoints. We proposed new strategies for handling missing within-study correlations and demonstrated their practicality and their benefits for inference. The proposed strategies, and similar ones, should be considered in future work that employs the trial-level approach to surrogate endpoint evaluation when within-study correlations are missing.

Supplementary Material

Supinfo

ACKNOWLEDGMENTS

The study was funded by the National Kidney Foundation (NKF). NKF has received consortium support from the following companies: AstraZeneca, Bayer, Cerium, Chinook, Boehringer Ingelheim, CSL Behring, Novartis and Travere. This work received support from the Utah Study Design and Biostatistics Center, with funding in part from the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR002538. The support and resources from the Center for High Performance Computing at the University of Utah are also gratefully acknowledged. We thank all investigators, study teams, and participants of the studies included in the application meta-analysis section. Specific details for the same studies used in our analyses have been detailed in previous work by CKD-EPI.6,5

Abbreviations:

IPD: Individual patient data
GFR: Glomerular filtration rate
ACR: Albumin to creatinine ratio
MCAR: Missing completely at random
MNAR: Missing not at random
MAR: Missing at random

DATA AVAILABILITY STATEMENT

Restrictions apply to the availability of the data used for the application analysis, which was used under license for this manuscript. The data are not publicly available due to privacy or ethical restrictions. Code used to generate data used for the purposes of the simulation study is provided in the supplemental materials.

References

1. U.S. Food and Drug Administration. Guidance for industry: Expedited programs for serious conditions - drugs and biologics. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/expedited-programs-serious-conditions-drugs-and-biologics; 2014. Accessed January 1, 2022.
2. Daniels MJ, Hughes MD. Meta-analysis for the evaluation of potential surrogate markers. Stat Med 1997; 16(17): 1965–1982.
3. Bujkiewicz S, Thompson JR, Spata E, Abrams KR. Uncertainty in the Bayesian meta-analysis of normally distributed surrogate endpoints. Stat Methods Med Res 2017; 26(5): 2287–2318.
4. Papanikos T, Thompson JR, Abrams KR, et al. Bayesian hierarchical meta-analytic methods for modeling surrogate relationships that vary across treatment classes using aggregate data. Stat Med 2020; 39(8): 1103–1124.
5. Inker LA, Heerspink HJL, Tighiouart H, et al. GFR Slope as a Surrogate End Point for Kidney Disease Progression in Clinical Trials: A Meta-Analysis of Treatment Effects of Randomized Controlled Trials. J Am Soc Nephrol 2019; 30(9): 1735–1745.
6. Heerspink HJL, Greene T, Tighiouart H, et al. Change in albuminuria as a surrogate endpoint for progression of kidney disease: a meta-analysis of treatment effects in randomised clinical trials. Lancet Diabetes Endocrinol 2019; 7(2): 128–139.
7. Levey AS, Gansevoort RT, Coresh J, et al. Change in Albuminuria and GFR as End Points for Clinical Trials in Early Stages of CKD: A Scientific Workshop Sponsored by the National Kidney Foundation in Collaboration With the US Food and Drug Administration and European Medicines Agency. Am J Kidney Dis 2020; 75(1): 84–104.
8. Buyse M, Molenberghs G, Paoletti X, et al. Statistical evaluation of surrogate endpoints with examples from cancer clinical trials. Biom J 2016; 58(1): 104–132.
9. Bujkiewicz S, Jackson D, Thompson JR, et al. Bivariate network meta-analysis for surrogate endpoint evaluation. Stat Med 2019; 38(18): 3322–3341.
10. Bujkiewicz S, Thompson JR, Riley RD, Abrams KR. Bayesian meta-analytical methods to incorporate multiple surrogate endpoints in drug development process. Stat Med 2016; 35(7): 1063–89.
11. Korn EL, Albert PS, McShane LM. Assessing surrogates as trial endpoints using mixed models [published correction appears in Stat Med. 2008 May 10;27(10):1797]. Stat Med 2005; 24(2): 163–82.
12. Wei Y, Higgins JPT. Estimating within-study covariances in multivariate meta-analysis with multiple outcomes. Stat Med 2013; 32(7): 1191–1205.
13. Riley RD. Multivariate meta-analysis: the effect of ignoring within-study correlation. J R Stat Soc Series A Stat Soc 2009; 172(4): 789–811.
14. Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Med Res Methodol 2007; 7(3): 1471–2288.
15. Ishak JK, Platt RW, Joseph L, Hanley JA. Impact of approximating or ignoring within-study covariances in multivariate meta-analyses. Stat Med 2008; 27(5): 670–86.
16. Kataoka K, Nakamura K, Mizusawa J, et al. Surrogacy of progression-free survival (PFS) for overall survival (OS) in esophageal cancer trials with preoperative therapy: Literature-based meta-analysis. Eur J Surg Oncol 2017; 43(10): 1956–1961.
17. Chen YP, Sun Y, Chen L, et al. Surrogate endpoints for overall survival in combined chemotherapy and radiotherapy trials in nasopharyngeal carcinoma: Meta-analysis of randomised controlled trials. Radiother Oncol 2015; 116(2): 157–66.
18. Gharzai LA, Jiang R, Wallington D, et al. Intermediate clinical endpoints for surrogacy in localised prostate cancer: an aggregate meta-analysis. Lancet Oncol 2021; 22(3): 402–410.
19. Bujkiewicz S, Thompson JR, Sutton AJ, et al. Multivariate meta-analysis of mixed outcomes: a Bayesian approach. Stat Med 2013; 30(22): 3926–3943.
20. Riley RD, Price MJ, Jackson D, et al. Multivariate meta-analysis using individual participant data. Res Synth Methods 2015; 6(2): 157–74.
21. The SAS Institute. The NLMIXED procedure. 2015.
22. Pigott TD. A review of methods for missing data. Ed Res Eval 2001; 7(4): 353–383.
23. RStan Development Team. RStan: the R interface to Stan. 2020.
24. Gabry J. bayesplot R package. 2022.
25. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. New York, NY: Chapman and Hall; 1995.
26. Vehtari A, Gelman A, Simpson D, Carpenter B, Burkner PC. Rank-normalization, folding, and localization: An improved Rhat for assessing convergence of MCMC (with discussion). Bayesian Analysis 2021; 16.
27. Schievink B, Kropelin T, Mulder S, et al. Early renin-angiotensin system intervention is more beneficial than late intervention in delaying end-stage renal disease in patients with type 2 diabetes. Diabetes Obes Metab 2013; 18(1): 64–71.
28. National Kidney Foundation. How to classify CKD. https://www.kidney.org/professionals/explore-your-knowledge/how-to-classify-ckd; 2022. Accessed February 1, 2022.
29. Burton C, Harris KP. The role of proteinuria in the progression of chronic renal failure. Am J Kidney Dis 1996; 27(6): 765–775.
30. Abbate M, Zoja C, Remuzzi G. How does proteinuria cause progressive renal damage? J Am Soc Nephrol 2006; 17(11): 2974–2984.
31. Tryggvason K, Pettersson E. Causes and consequences of proteinuria: the kidney filtration barrier and progressive renal failure. J Intern Med 2003; 254(3): 216–224.
32. Fisher RA. On the “Probable Error” of a Coefficient of Correlation Deduced from a Small Sample. 1921.
33. Levey AS, Stevens LA, Schmid CH, et al. A New Equation to Estimate Glomerular Filtration Rate [published correction appears in Ann Intern Med. 2011 Sep 20;155(6):408]. Ann Intern Med 2009; 150(9): 604–612.
34. Vonesh E, Tighiouart H, Ying J, et al. Mixed-effects models for slope-based endpoints in clinical trials of chronic kidney disease. Stat Med 2019; 38(22): 4218–4239.
35. Gyawali B, Hey SP, Kesselheim AS. Evaluating the evidence behind the surrogate measures included in the FDA’s table of surrogate endpoints as supporting approval of cancer drugs. EClinicalMedicine 2020; 21.
36. Burzykowski T, Buyse M, Piccart-Gebhart MJ, et al. Evaluation of tumor response, disease control, progression-free survival, and time to progression as potential surrogate end points in metastatic breast cancer. J Clin Oncol 2008; 26(12): 1987–1992.
37. Michiels S, Pugliano L, Marguet S, et al. Progression-free survival as surrogate end point for overall survival in clinical trials of HER2-targeted agents in HER2-positive metastatic breast cancer. Ann Oncol 2016; 27(6): 1029–1034.
