. 2020 Jan 28;15(1):e0228098. doi: 10.1371/journal.pone.0228098

How to use frailtypack for validating failure-time surrogate endpoints using individual patient data from meta-analyses of randomized controlled trials

Casimir Ledoux Sofeu 1,2,*, Virginie Rondeau 1,2
Editor: Alan D Hutson
PMCID: PMC6986733  PMID: 31990928

Abstract

Background and Objective

The use of valid surrogate endpoints can accelerate the development of phase III trials. Numerous validation methods have been proposed, the most popular of which are used in a meta-analytic context and based on a two-step analysis strategy. For two failure-time endpoints, two association measures are usually considered: Kendall's τ at the individual level and the adjusted coefficient of determination (adjusted R²trial) at the trial level. However, the adjusted R²trial is not always available, mainly due to model estimation constraints. More recently, we proposed a one-step validation method based on a joint frailty model, with the aim of reducing estimation issues and estimation bias on the surrogacy evaluation criteria. The model was quite robust, with satisfactory results in simulation studies. This study seeks to popularize this new surrogate endpoint validation approach by making the method available in a user-friendly R package.

Methods

We provide numerous tools in the frailtypack R package, including more flexible functions, for the validation of candidate surrogate endpoints using data from multiple randomized clinical trials.

Results

We implemented the surrogate threshold effect, which is used in combination with R²trial to make decisions about the validity of surrogate endpoints. With frailtypack it is also possible to predict the treatment effect on the true endpoint in a new trial from the treatment effect observed on the surrogate endpoint. Leave-one-out cross-validation is available for assessing the accuracy of predictions from the joint surrogate model. Other tools include data generation, simulation studies and graphical representations. We illustrate the use of the new functions with both real and simulated data.

Conclusion

This article provides attractive, well-developed tools for validating failure-time surrogate endpoints.

Introduction

The choice of endpoint for assessing the efficacy of a new treatment is a key step in setting up clinical trials. The use of the true endpoint increases the cost and duration of trials, and usually induces an alteration of the treatment effects over time [1, 2]. For example, in oncology, overall survival is a common clinical endpoint used during phase III trials to evaluate the clinical benefit of new treatments. However, its use requires a sufficiently long follow-up and a sufficiently large sample size to show a significant difference in the treatment effect. To overcome this problem, there has been much interest over the last three decades in the use of alternative criteria, or surrogate endpoints, to reduce the cost and shorten the duration of phase III trials [1–4]. A good surrogate endpoint should predict the effect of treatment on the primary endpoint [3].

Prentice (1989) [5] enumerated four criteria to be fulfilled by a putative surrogate endpoint. The fourth criterion, often called Prentice's criterion, stipulates that a surrogate endpoint must capture the full treatment effect upon the true endpoint. Validating Prentice's criterion from a single clinical trial was quite difficult, mainly due to a lack of power and the difficulty of verifying an assumption about the relation between the treatment effects on the true and surrogate endpoints. Therefore, to verify this assumption and obtain a sufficient sample size, Buyse et al. (2000) [6], like other authors [7], suggested basing validation on meta-analytic (or multicenter) data. An important point when dealing with meta-analytic data is to take heterogeneity between trials into account, for the purpose of prediction outside the scope of the observed trials. Thus, a surrogate endpoint validated on meta-analytic data can be used to predict the treatment effect upon the true endpoint in a new trial.

In the meta-analysis framework, when both the surrogate and the true endpoints are failure times, the current consensus is to base validation on the two-stage analysis strategy proposed by Burzykowski et al. [8]. In the first stage, the association between the surrogate and true endpoints is evaluated using a bivariate copula model, after taking the trial-specific treatment effects into account. In the second stage, the prediction of the treatment effect on the true endpoint from the observed treatment effect on the surrogate endpoint is assessed using the adjusted coefficient of determination (adjusted R²trial). The adjusted R²trial is obtained from the regression model on the estimates of the trial-specific treatment effects on both the surrogate and the true endpoints, after adjusting for the estimation errors obtained in the first-stage model. The programs that implement this method are available in the R package surrosurv [9] and the SAS macro %COPULA [10]. However, the practical use of the two-stage copula model is often difficult, mainly due to convergence issues or difficulties estimating the model with the adjustment for the estimation errors [11–13]. This drawback has led, since Burzykowski et al. [8], to the development of alternative approaches [11, 13–17].

Most of the novel methods, except those of Sofeu et al. [17] and Rotolo et al. [13], are based on a two-stage validation strategy. Alonso and Molenberghs [14] proposed an information theory approach, with a new definition and quantification of surrogacy at the individual level and the trial level. The drawback of this method was the difficulty of providing a hard cut-off value for the information-theoretic measure to discriminate between good and bad surrogates. Buyse et al. [15] suggested a two-stage validation approach in which individual-level surrogacy was evaluated through the association between the trial-specific Kaplan-Meier estimates of the true endpoint and those of the surrogate endpoint at a fixed time point. It is also possible to base validation at the individual level on a bivariate copula model. In the trial-level evaluation, a weighted linear regression of the treatment effects on the true endpoint against those on the surrogate endpoint was fitted, and the coefficient of determination (R²) was used to quantify the proportion of variance explained by the regression. The available programs also make it possible to account for variability between trials using the robust sandwich estimator of Lin and Wei [18].

For the approaches described in the previous paragraph, the R package surrogate [19], the SAS macros %TWOSTAGECOX and %TWOSTAGEKM, and the SAS programs available in Alonso et al. [10] were provided to carry out the evaluation exercise. Rotolo et al. [13] proposed a one-step validation approach based on auxiliary mixed Poisson models, which employs a bivariate survival model with an individual random effect shared between the two endpoints and correlated treatment-by-trial interactions. Simulation results described by the authors showed estimation biases in the surrogacy assessment measures, especially in the event of a high association and when heterogeneity of the baseline risk is taken into account. The associated program was implemented in the R package surrosurv [9]. Renfro et al. [11] suggested estimating the second-stage model in a Bayesian framework, the estimate of the adjusted R²trial then being based on the posterior distribution of the parameters of the adjusted model. The corresponding trial-level surrogacy can be evaluated by adapting the WinBUGS and R programs described in Bujkiewicz et al. [20]. This approach showed a decrease in the estimation performance of the adjusted R²trial when the data characteristics are close to reality (for example, a small trial size or number of trials).

More recently, we proposed a one-step validation approach based on a joint frailty model [17] to reduce convergence issues and estimation biases on the surrogacy evaluation criteria. In this novel method, we used a flexible form of the baseline hazard functions, approximated by splines to obtain smooth risk functions, which can represent incidence in epidemiology. Several integration strategies were considered to compute the integrals over the random effects present in the marginal log-likelihood. The proposed joint surrogate model showed satisfactory results compared to the existing two-step copula and one-step Poisson approaches.

We aim in this paper to popularize this new surrogate endpoint validation approach by making the method available in the user-friendly R package frailtypack. We have developed a prediction tool for the treatment effect on the true endpoint based on the observed treatment effect on the surrogate endpoint. Interpretation of R²trial and decision-making about the validity of a candidate surrogate endpoint are possible thanks to the classification suggested by the Institute for Quality and Efficiency in Health Care (IQWiG) [21] and the surrogate threshold effect (STE) introduced by Burzykowski and Buyse [22]. Other tools serve to display the baseline risk and survival functions, to assess the model, and to generate data based on the joint surrogate model. A further goal of this article is to provide a tool to perform simulation studies.

frailtypack is an R package that fits a variety of frailty models containing one or more shared random effects. It includes a shared frailty model, a joint frailty model for recurrent events and a terminal event, other forms of advanced joint frailty models [23], and now a joint frailty model for evaluating surrogate endpoints in meta-analyses of randomized controlled trials with failure-time endpoints. In this paper we focus on the particular subset of features applicable to evaluating surrogate endpoints.

The rest of this paper is organized as follows. In the next section, we summarize the joint surrogate model with the estimation methods and the surrogacy evaluation criteria, and end with the definition of the STE. In the third section, we introduce the functions developed in the R package frailtypack to estimate the parameters of the joint surrogate model, as well as the new functions related to the surrogacy evaluation. In the fourth section, we illustrate the new functions using generated data and individual patient data from the Ovarian Cancer Meta-Analysis Project [24]. Finally, we present a concluding discussion.

Methodology

In this section, we present the one-step joint surrogate model for evaluating a candidate surrogate endpoint [17]. The model estimation and the surrogacy evaluation criteria are also discussed here.

Model and estimation

Joint surrogate model definition

Let us consider data from a meta-analysis (or a multi-center study); let Sij and Tij be two time-to-event endpoints associated respectively with the surrogate endpoint and the true endpoint, such that Sij < Tij, or Sij = Tij in the event of right censoring. We denote by Zij1 the treatment indicator. Sij can be the progression-free survival time (defined as the time from randomization to clinical progression of the disease or death) in patients treated for cancer and Tij the overall survival (defined as the time from randomization to death from any cause). For the jth subject (j = 1, …, ni) of the ith trial (i = 1, …, G), the joint surrogate model is defined as follows [17]:

λS,ij(t | ωij, ui, vSi, Zij1) = λ0S(t) exp(ωij + ui + vSi Zij1 + βS Zij1)
λT,ij(t | ωij, ui, vTi, Zij1) = λ0T(t) exp(ζ ωij + α ui + vTi Zij1 + βT Zij1)     (1)

where,

ωij ~ N(0, θ),  ui ~ N(0, γ),  ωij ⊥ ui,  ui ⊥ vSi,  ui ⊥ vTi

and

(vSi, vTi)′ ~ MVN(0, Σv),  with  Σv = ( σ²vS  σvST
                                        σvST  σ²vT )

In this model, λ0S(t) is the baseline hazard function associated with the surrogate endpoint and βS the corresponding fixed treatment effect (or log-hazard ratio); λ0T(t) is the baseline hazard function associated with the true endpoint and βT the corresponding fixed treatment effect. ωij is a shared individual-level frailty that accounts for heterogeneity in the data at the individual level, due to unobserved covariates; ui is a shared frailty associated with the baseline hazard functions that accounts for between-trial heterogeneity of the baseline hazard, arising because several trials are included in this meta-analytic design. The coefficients ζ and α distinguish the individual-level and trial-level heterogeneities between the surrogate and the true endpoint. vSi and vTi are two correlated random treatment-by-trial interaction effects.

Estimation

Marginal log-likelihood. Let δij and δij* denote the progression and death indicators, respectively. Sofeu et al. [17] showed that the marginal log-likelihood from model (1) involves two levels of integration and is defined as follows:

l(Φ) = log { ∏(i=1..G) ∫U [ ∏(j=1..ni) ∫ωij λS,ij^δij · S(Sij) · λT,ij^δij* · S(Tij) f(ωij) dωij ] f(vSi, vTi) f(ui) dU }     (2)

where Φ = (σ²vS, σ²vT, σvST, θ, γ, λ0T(·), λ0S(·), βS, βT) is the vector of model parameters and U = (ui, vSi, vTi) is the vector of trial-level random effects. λ0S(·) and λ0T(·) are the baseline hazard functions associated with the surrogate endpoint and the true endpoint.

Parameter estimation. The model parameters Φ were estimated by a semi-parametric approach, maximizing the penalized likelihood. We used the robust Marquardt algorithm [25], which is a combination of the Newton-Raphson and steepest-descent algorithms. For more details on the penalized likelihood, see S1A Appendix in S1 Appendix or [26]. To compute the integrals in (2), different numerical integration strategies were considered, including a mixture of Monte-Carlo integration with pseudo-adaptive or classical Gauss-Hermite quadrature.

Surrogacy evaluation criteria and interpretation

We previously proposed new definitions of Kendall's τ and of the coefficient of determination as individual-level and trial-level association measures for evaluating a candidate surrogate endpoint [17]. The formulation of these association measures is recalled in S1B and S1C Appendix in S1 Appendix.

Prediction and surrogate threshold effect (STE)

Gail et al. [27] underlined some issues in using R²trial for assessing a candidate surrogate endpoint. The first problem is the difficulty of interpreting R²trial. For perfect prediction of the treatment effect on the true endpoint, R²trial must be equal to 1. However, such a situation is impossible in practice. Therefore, for R²trial ≠ 1, it is not clear what threshold would be sufficient for a valid surrogate endpoint. Another problem raised by Gail et al. [27] is that, unless R²trial = 1, the variance of the prediction of the treatment effect on the true endpoint in a new trial cannot be reduced to 0, even in the absence of any estimation error in the trial. Furthermore, if this effect is estimated directly from data on the true endpoint, the estimation error can theoretically be made arbitrarily close to 0 by increasing the trial's sample size. To address these issues, Burzykowski and Buyse [22] proposed a new concept, the surrogate threshold effect. One of the most interesting features of the STE is its natural interpretation from a clinical point of view. The STE represents the minimum treatment effect on the surrogate necessary to predict a non-zero (significant) effect on the true endpoint. We show in S1D Appendix in S1 Appendix that the STE, based on model (1), can be obtained by solving one of the following equations:

E(βT + vT0 | βS0, ϑ) − z1−(γ/2) √Var(βT + vT0 | βS0, ϑ) = 0     (3)

for the lower prediction limit function of the treatment effect on the true endpoint based on the observed treatment effect on the surrogate endpoint, or

E(βT + vT0 | βS0, ϑ) + z1−(γ/2) √Var(βT + vT0 | βS0, ϑ) = 0     (4)

for the upper prediction limit function. Elements in Eqs (3) and (4) are defined in S1D Appendix in S1 Appendix.
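For intuition, the model implies that the pair of trial-level treatment effects is bivariate normal. Ignoring the estimation error on the fixed effects and variance parameters (a deliberate simplification; the exact expressions, which account for this uncertainty, are derived in S1D Appendix in S1 Appendix), the terms of Eqs (3) and (4) reduce to the usual conditional-normal form:

```latex
E(\beta_T + v_{T0} \mid \beta_{S0}, \vartheta)
   \approx \beta_T + \frac{\sigma_{vST}}{\sigma_{vS}^{2}}\left(\beta_{S0} - \beta_S\right),
\qquad
\operatorname{Var}(\beta_T + v_{T0} \mid \beta_{S0}, \vartheta)
   \approx \sigma_{vT}^{2} - \frac{\sigma_{vST}^{2}}{\sigma_{vS}^{2}}.
```

The STE is then the smallest treatment effect on the surrogate, βS0, at which the selected prediction limit crosses zero.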

Readers can refer to S1E Appendix in S1 Appendix for the interpretation of the STE in combination with R²trial, and for decision-making as suggested by the German Institute for Quality and Efficiency in Health Care [21].

Available functions in the frailtypack R package for surrogacy evaluation

In this section, we introduce the new R functions, used to estimate model (1). Functions for data generation and simulation studies are also described.

Estimation of joint surrogate model and surrogacy evaluation

The jointSurroPenal() function

Model (1) can be fitted using the jointSurroPenal() function defined as follows:

jointSurroPenal(data, maxit = 40, indicator.zeta = 1, indicator.alpha = 1,
    frail.base = 1, n.knots = 6, LIMlogl = 0.001, LIMparam = 0.001,
    LIMderiv = 0.001, nb.mc = 300, nb.gh = 32, nb.gh2 = 20, adaptatif = 0,
    int.method = 2, nb.iterPGH = 5, nb.MC.kendall = 10000,
    nboot.kendall = 1000, true.init.val = 0, theta.init = 1,
    sigma.ss.init = 0.5, scale = 1, sigma.tt.init = 0.5, sigma.st.init = 0.48,
    gamma.init = 0.5, alpha.init = 1, zeta.init = 1, betas.init = 0.5,
    betat.init = 0.5, random.generator = 1, kappa.use = 4, random = 0,
    seed = 0, random.nb.sim = 0, init.kappa = NULL, nb.decimal = 4,
    print.times = TRUE, print.iter = FALSE)

The mandatory argument of this function is data, the dataset to use for the estimations. Argument data refers to a data frame including at least 7 variables: patientID, trialID, timeS, statusS, timeT, statusT and trt. The description of these variables, like the other arguments of the function, can be found in S2A Appendix in S2 Appendix, or via the R command help(jointSurroPenal). The remaining arguments can be left at their default values. In addition, details on the required arguments/values are given in the illustration section.
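As an illustration, a minimal data frame with this layout can be assembled from plain vectors (all values below are purely illustrative and not from any real trial):

```r
# Skeleton of the dataset expected by jointSurroPenal(): one row per patient,
# with times and censoring indicators for the surrogate (S) and true (T)
# endpoints, and the treatment arm.
toy.data <- data.frame(
  patientID = 1:4,                      # patient identifier
  trialID   = c(1, 1, 2, 2),            # trial (or center) identifier
  timeS     = c(0.9, 1.7, 0.4, 2.1),    # time to the surrogate endpoint
  statusS   = c(1, 0, 1, 1),            # surrogate event indicator
  timeT     = c(1.4, 1.7, 0.6, 2.8),    # time to the true endpoint
  statusT   = c(1, 0, 1, 0),            # true-endpoint event indicator
  trt       = c(0, 1, 0, 1)             # treatment arm (0/1)
)
str(toy.data)
```

Note that Sij ≤ Tij holds row-wise, with equality only for the right-censored subject, as required by the model.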

The jointSurroPenal object

The function jointSurroPenal() returns an object of class ‘jointSurroPenal’ if the joint surrogate model has been successfully estimated. We describe in S2A Appendix in S2 Appendix some of the relevant returned values, as well as the functions that can be applied to this object. A full description can be found by displaying the help on the function jointSurroPenal().

Data generation using the R function jointSurrSimul()

For data generation purposes, we implemented the algorithm described in Sofeu et al. [17] in the R function jointSurrSimul(). The generation procedure is based on model (1). A variant of this algorithm bases generation on a model that includes only a shared frailty term at the individual level, as described by Rondeau et al. [28]. This function is defined as follows:

jointSurrSimul(n.obs = 600, n.trial = 30, cens.adm = 549.24, alpha = 1.5,
    theta = 3.5, gamma = 2.5, zeta = 1, sigma.s = 0.7, sigma.t = 0.7,
    rsqrt = 0.8, betas = -1.25, betat = -1.25, frailt.base = 1,
    lambda.S = 1.8, nu.S = 0.0045, lambda.T = 3, nu.T = 0.0025, ver = 1,
    typeOf = 1, equi.subj.trial = 1, equi.subj.trt = 1,
    prop.subj.trial = NULL, full.data = 0, prop.subj.trt = NULL,
    random.generator = 1, random = 0, random.nb.sim = 0, seed = 0,
    nb.reject.data = 0)

Arguments of the jointSurrSimul() function are accessible using the R command help(jointSurrSimul). An exhaustive description is presented in S2B Appendix in S2 Appendix.

Simulation studies based on the joint surrogate model

It is possible to perform simulation studies based on model (1), using the function jointSurroPenalSimul(), defined as follows:

jointSurroPenalSimul(nb.dataset = 1, nbSubSimul = 1000, ntrialSimul = 30,
    equi.subj.trial = 1, prop.subj.trial = NULL, equi.subj.trt = 1,
    prop.subj.trt = NULL, theta2 = 3.5, zeta = 1, gamma.ui = 2.5,
    alpha.ui = 1, sigma.s = 0.7, sigma.t = 0.7, R2 = 0.81, betas = -1.25,
    betat = -1.25, lambdas = 1.8, nus = 0.0045, lambdat = 3, nut = 0.0025,
    time.cens = 549, indicator.zeta = 1, indicator.alpha = 1, frail.base = 1,
    init.kappa = NULL, n.knots = 6, maxit = 40, LIMparam = 0.001,
    LIMlogl = 0.001, LIMderiv = 0.001, int.method = 2, adaptatif = 0,
    nb.iterPGH = 5, nb.mc = 300, nb.gh = 32, nb.gh2 = 20,
    nb.MC.kendall = 10000, nboot.kendall = 1000, true.init.val = 0,
    theta.init = 1, zeta.init = 1, gamma.init = 0.5, alpha.init = 1,
    sigma.ss.init = 0.5, sigma.tt.init = 0.5, sigma.st.init = 0.48,
    betas.init = 0.5, betat.init = 0.5, kappa.use = 4,
    random.generator = 1, random = 0, random.nb.sim = 0, seed = 0,
    nb.decimal = 4, print.times = TRUE, print.iter = FALSE)

Depending on the simulation design, most of the arguments of this function must be set by the user. S2B Appendix in S2 Appendix describes all the arguments, as well as the elements of the ‘jointSurroPenalSimul’ object.
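As a sketch, a small simulation study could be launched as below; the number of datasets is kept deliberately low here for illustration (in practice several hundred would typically be used), and the argument names follow the signature above:

```r
library(frailtypack)

# Simulation study based on the joint surrogate model: 10 generated
# meta-analyses of 600 subjects in 30 trials, with trial-level R2 fixed
# at 0.81 and protective treatment effects on both endpoints.
joint.simul <- jointSurroPenalSimul(nb.dataset = 10, nbSubSimul = 600,
    ntrialSimul = 30, R2 = 0.81, betas = -1.25, betat = -1.25,
    seed = 0, print.iter = FALSE)
```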

Kendall’s τ estimation using the function jointSurroTKendall

The function jointSurroTKendall() is used to estimate Kendall's τ as described in S1B Appendix in S1 Appendix, based on estimates from model (1). It is possible to perform the numerical integration with the Monte-Carlo or the Gauss-Hermite quadrature method. The jointSurroTKendall() function is defined as shown below, with arguments described in S2D Appendix in S2 Appendix. This function returns the estimated value of Kendall's τ.

jointSurroTKendall(object = NULL, theta, gamma, alpha = 1, zeta = 1,
    int.method = 0, sigma.v = matrix(rep(0, 4), 2, 2), nb.gh = 32,
    nb.MC.kendall = 10000, random.generator = 1, random.nb.sim = 0,
    random = 0, seed = 0, ui = 1)

Illustrations

Computational details and package installation

Estimations in the proposed functions are based on Fortran programs, with parallel computing using OpenMP to speed up calculations. R is thus used as an interface between the user and the compiled Fortran code. The stable version of frailtypack is available on the Comprehensive R Archive Network (CRAN) [29]. Furthermore, the development version can be found on GitHub at https://github.com/socale/frailtypack. A list of other models implemented in frailtypack [23] can be found in S1 Fig. The results in this paper were obtained using R version 3.5.2 and frailtypack version 3.0.3, on an Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz server with 40 cores and 378 GB of random access memory (RAM). A standard laptop or desktop PC running a recent version of R can also be used to fit the model; the results will be the same, but with a longer computing time. For example, in the application, the fit took around 1 hour on a standard desktop PC compared to 9 min on the 40-core, 378 GB server.

The frailtypack package can be installed in any R session using the install.packages command as follows:

install.packages("frailtypack", dependencies = TRUE, type = "source",
    repos = "https://cloud.r-project.org")

Installation via GitHub is possible thanks to the devtools package. All dependencies required by frailtypack must be installed first. The installation commands are:

install.packages(c("survC1", "doBy", "statmod"),
    repos = "https://cloud.r-project.org")

devtools::install_github("socale/frailtypack", ref = "surrogacy_submetted_3-0-3")

Finally, frailtypack must be loaded using the command:

library(frailtypack)

Data source

We illustrate the use of the developed functions with the individual patient data of the Ovarian Cancer Meta-Analysis Project [24] and a generated dataset based on model (1). We also describe the simulation studies at the end of this section.

Description of dataOvarian dataset

The dataOvarian dataset combines data that were collected in four double-blind randomized clinical trials in advanced ovarian cancer. In the first two trials, data were available on the centers in which patients were treated, and each of these two trials was considered a homogeneous group according to the investigators. Thus, the statistical unit was the center in the first two trials and the trial in the last two, for a total of 50 units available for surrogacy evaluation. The objective of these studies was to examine the efficacy of cyclophosphamide plus cisplatin (CP) versus cyclophosphamide plus adriamycin plus cisplatin (CAP) to treat advanced ovarian cancer. The candidate surrogate endpoint S was progression-free survival (PFS), defined as the time (in years) from randomization to clinical progression of the disease or death. The true endpoint T was survival time, defined as the time (in years) from randomization to death from any cause. The dataset includes 1192 subjects, with 82% of PFS-related events at a median time of 78.7 days [interquartile range (IQR): 36.6–202.5] and 79.8% of deaths at a median survival time of 111.4 days [IQR: 56.0–275.9]. Data can be loaded as follows:

data("dataOvarian", package = "frailtypack")

By displaying the structure of this dataset, we find the same structure as required by the jointSurroPenal() function, with 7 variables. The column trialID here refers to the analysis unit.

str(dataOvarian)

'data.frame': 1192 obs. of 7 variables:
 $ patientID: int 1 2 3 4 5 6 7 8 9 10 ...
 $ trialID  : num 2 2 2 2 2 2 2 2 2 2 ...
 $ trt      : int 0 0 0 1 0 1 0 0 1 1 ...
 $ timeS    : num 0.1052 0.8952 0.079 1.7393 0.0913 ...
 $ statusS  : int 1 1 1 0 1 1 1 1 1 1 ...
 $ timeT    : num 0.186 1.409 0.126 1.739 0.127 ...
 $ statusT  : int 1 1 1 0 1 1 1 1 1 1 ...

Generated dataset

In the example below, we generate a meta-analysis including 600 subjects in 30 trials. The arguments α, θ, ζ and γ are fixed so as to obtain a Kendall's τ of 0.61, which is computed using the jointSurroTKendall() function as follows:

jointSurroTKendall(theta = 3.5, gamma = 2.5, alpha = 1.5, zeta = 1)

[1] 0.6062975

In addition, the trial-level surrogacy R²trial is fixed at 0.8. This corresponds to a simulation design with both high trial-level and high individual-level surrogacy. The treatment effects βS and βT are set to -1.25 to consider protective effects on both the surrogate endpoint and the true endpoint. The code below generates the dataset using the jointSurrSimul() function introduced previously, and displays its head.

data.sim <- jointSurrSimul(n.obs = 600, n.trial = 30, alpha = 1.5,
    theta = 3.5, gamma = 2.5, zeta = 1, sigma.s = 0.7, sigma.t = 0.7,
    rsqrt = 0.8, betas = -1.25, betat = -1.25, random.generator = 1,
    seed = 0, nb.reject.data = 0)

head(data.sim)

  patientID trialID trt      timeS statusS     timeT statusT
1         1       1   0   8.243721       1  38.41068       1
2         2       1   1 446.169009       0 446.16901       1
3         3       1   1 110.418853       0 110.41885       1
4         4       1   1  70.262075       0  70.26207       1
5         5       1   1 382.973632       1 549.24000       0
6         6       1   0  61.148254       1 230.24486       1

Surrogacy evaluation

In this section, we use the datasets previously described to illustrate the evaluation of surrogate endpoints based on the one-step joint surrogate model (1). Different arguments of the associated functions are explored, as well as the returned values.

Model estimation based on the advanced ovarian cancer meta-analysis dataset

From a practical point of view, the most important arguments of the jointSurroPenal() function, beyond the standard argument (data), concern the following: the parametrization of the model (arguments indicator.zeta and indicator.alpha), the integration method and its associated arguments (int.method, n.knots, nb.mc, nb.gh, nb.gh2, adaptatif), the smoothing parameters (init.kappa and kappa.use), and the scale of the survival times (scale). Although optional, all these arguments can be used to manage convergence issues. The choice of values for these arguments can be guided by the convergence of the model. Once convergence issues are resolved, users can apply the likelihood cross-validation criterion to evaluate the goodness of fit of different models, as shown later in this section. As a first step, users can try the model with the default values.

In the event of convergence issues, we recommend the following strategy: change the number of samples for Monte-Carlo integration (nb.mc), choosing a value between 100 and 300; vary the number of nodes for Gauss-Hermite quadrature (nb.gh and nb.gh2) among 15, 20 and 32; vary the number of spline knots (n.knots) between 6 and 10; or provide new values for the smoothing parameters (init.kappa). Users can also fix the coefficients ζ or α at 1 (indicator.zeta = 0 or indicator.alpha = 0) to avoid estimating these parameters. We also recommend changing the integration method with the arguments int.method and adaptatif. For example, by setting adaptatif = 1 for the integration over the random effects at the individual level, one uses pseudo-adaptive Gauss-Hermite quadrature instead of the classical Gauss-Hermite method. Changing the scale of the survival times (argument scale), for instance considering years instead of days, can also solve some numerical issues.
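This tuning strategy can be automated, for instance by refitting over a small grid of settings until convergence. The sketch below assumes, as a guard, that a failed fit either raises an error (caught by try()) or returns NULL; the grid values simply follow the recommendations above:

```r
library(frailtypack)
data("dataOvarian", package = "frailtypack")

# Refit the joint surrogate model over a grid of Monte-Carlo sample sizes
# and spline-knot counts, stopping at the first converged fit.
fit <- NULL
for (m in c(100, 200, 300)) {   # Monte-Carlo samples (nb.mc)
  for (k in 6:10) {             # spline knots (n.knots)
    cand <- try(jointSurroPenal(data = dataOvarian, nb.mc = m,
                                n.knots = k, scale = 1/365), silent = TRUE)
    if (!inherits(cand, "try-error") && !is.null(cand)) fit <- cand
    if (!is.null(fit)) break
  }
  if (!is.null(fit)) break
}
```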

Using the default values on the advanced ovarian cancer dataset, the model did not converge. By changing the values of some arguments, we obtained the following set of arguments/values which allowed convergence:

joint.surro.ovar <- jointSurroPenal(data = dataOvarian, n.knots = 8,
    indicator.alpha = 0, nb.mc = 200, scale = 1/365)

In this model, we fix the coefficient α at 1 and thereby do not estimate it. We consider 8 spline knots for the baseline hazards. By default, we use fixed initial values and obtain the smoothing parameters by cross-validation on reduced models. We approximate the integrals over the random effects using a combination of Monte-Carlo integration with 200 samples and classical Gauss-Hermite quadrature with 32 nodes. To solve numerical problems during estimation, we rescale the survival times by converting days to years. This parametrization of the model provided the results described in the next section.

Summary of results

By applying the function summary() on the object joint.surro.ovar, the following results are displayed in the event of convergence:

summary(joint.surro.ovar)

Estimates for variance parameters of random effects
           Estimate  Std Error       z         P
theta         6.848     0.3786  18.086    < e-10 ***
zeta          1.792     0.0714  25.095    < e-10 ***
gamma         0.045     0.0774   0.576    0.5645
sigma2_S      0.610     0.3733   1.633    0.1025
sigma2_T      1.830     1.0202   1.794    0.07287 .
sigma_ST      1.056     0.6067   1.741    0.0817  .

Estimates for fixed treatment effects
           Estimate  Std Error       z         P
beta_S       -0.596     0.2298  -2.595    0.009463 **
beta_T       -0.841     0.3936  -2.136    0.03264  *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

hazard ratios (HR) and confidence intervals for fixed treatment effects
         exp(coef)  Inf.95.CI  Sup.95.CI
beta_S       0.551      0.351      0.864
beta_T       0.431      0.199      0.933

Surrogacy evaluation criterion
              Level  Estimate  Std Error  Inf.95.CI  Sup.95.CI  Strength
Ktau     Individual     0.683         --      0.664      0.696
R2trial       Trial     1.000      0.001      0.998      1.002      High
R2.boot       Trial     0.982         --      0.896      1.000      High
---
Association strength: <= 0.49 'Low'; ]0.49-0.72[ 'Medium'; >= 0.72 'High'
---
Surrogate threshold effect (STE): -0.273 (HR = 0.761)

Convergence parameters
Penalized marginal log-likelihood = -10892.611
Number of iterations = 29
LCV = the approximate likelihood cross-validation criterion
      in the semi-parametric case = 9.162
Convergence criteria:
  parameters = 9.573e-06  likelihood = 8.426e-08  gradient = 4.507e-08

The results are organized in five parts. We first present estimates of the variance parameters of the random effects and the coefficients ζ and α (if applicable), with standard errors, z-statistics and p-values of the Wald test. The results suggest a strong individual-level heterogeneity of the endpoints (θ = 6.848, compared to 0), more pronounced on the true endpoint (ζ = 1.792, compared to 1). The estimated value of γ suggests homogeneous baseline hazards across trials (γ = 0.045, p > 0.5), both on the surrogate endpoint and on the true endpoint. This could explain the identification problem encountered when the coefficient α was included in the model. The parameters σ²S, σ²T and σST suggest the presence of trial-level heterogeneity interacting with the treatment (p ≤ 0.10). The next two parts of the results show estimates of the fixed treatment effects βS given the random effects (ui, vSi) and βT given (ui, vTi), with the associated hazard ratios and confidence intervals. These parameters can be interpreted as usual, taking the adjustment for the random effects into account. We observed significant protective effects of the treatment on both the surrogate endpoint and the true endpoint (p < 0.05).

The fourth part of the results describes the surrogacy evaluation criteria. Kendall’s τ, Rtrial2 and Rtrial,boot2 (obtained by parametric bootstrap) are displayed with the associated confidence intervals, as is the standard error of Rtrial2 obtained by the Delta method [30]. The arguments int.method.kt and nb.gh of the summary() function can be used to choose between Monte Carlo integration and Gauss-Hermite quadrature for estimating Kendall’s τ, and to set the number of quadrature nodes when appropriate. With at least 500 samples for the Monte Carlo integration and at least 15 quadrature nodes, the two integration methods generally yield the same results for Kendall’s τ.

These results suggest a high association at the individual level (Kendall’s τ = 0.68 [0.66–0.70]) and a high correlation at the trial level (Rtrial,boot2=0.98 [0.90–1.00]) between the surrogate endpoint and the true endpoint, according to the classification of the surrogacy criteria proposed by the Institute for Quality and Efficiency in Health Care [31, 32]. Given that Kendall’s τ is adjusted on the random effects at the individual level [17], it is quite difficult to observe a value > 0.7, compared to the unadjusted values from the two-step copula approach of Burzykowski et al. [8]. A very high value would require extreme values of the parameters α, ζ, θ or γ, which are difficult to observe in practice. Therefore, a value around 0.65 can be considered sufficient for validating surrogacy at the individual level.
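The labels in the Strength column follow the fixed cut-offs recalled under each summary table (<= 0.49 Low, strictly between 0.49 and 0.72 Medium, >= 0.72 High). A minimal sketch of this classification rule (the helper name is ours, not a frailtypack function):

```r
# Association-strength labelling, as recalled under the summary tables
# (assoc_strength is an illustrative helper, not part of frailtypack)
assoc_strength <- function(r2) {
  if (r2 <= 0.49) "Low" else if (r2 < 0.72) "Medium" else "High"
}
assoc_strength(0.982)  # "High", the label reported for R2.boot above
```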

We also compute and display the surrogate threshold effect with the associated hazard ratio. We obtain an acceptable value of the STE (-0.273, HR = 0.761), which supports the validity of the surrogate. As noted by Burzykowski and Buyse [22], unrealistically large/small values of the STE (e.g., corresponding to a HR of less than 0.5) would indicate prediction limits that are too wide and, consequently, poor validity of the surrogate. Therefore, as observed previously [8], PFS can be considered a valid surrogate endpoint for OS when evaluating new treatments for advanced ovarian cancer.
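The STE is reported on the log-hazard-ratio scale, so the accompanying hazard ratio is simply its exponential. For the value above:

```r
# The reported HR is exp(STE), since the STE lives on the log-HR scale
ste <- -0.273
round(exp(ste), 3)  # 0.761, matching the summary output
```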

The last part of the results describes the convergence parameters.

Model estimation based on generated dataset

Here, we estimate two joint surrogate models for model comparison, based on the generated dataset data.sim. Integrals are approximated using a combination of Monte Carlo and classical Gauss-Hermite integration in the first model, and a combination of Monte Carlo and pseudo-adaptive Gauss-Hermite integration in the second. The code for the two models is as follows:

joint.surro.sim.MCGH <- jointSurroPenal(data = data.sim, int.method = 2,

  nb.mc = 300, nb.gh = 20)

joint.surro.sim.MCPGH <- jointSurroPenal(data = data.sim, int.method = 2,

  nb.mc = 300, nb.gh = 20, adaptatif = 1)

A relevant question in this case is how to compare different models, or how to choose the optimal number of knots for the splines, the number of quadrature nodes, the number of Monte Carlo samples, or the integration method. In this package we propose to base the comparison on the approximate likelihood cross-validation criterion (LCV): the lower its value, the better the associated model.

Choice of model based on LCV

The LCV values for the models joint.surro.sim.MCGH and joint.surro.sim.MCPGH are, respectively:

joint.surro.sim.MCGH$LCV

[1] 8.29982

joint.surro.sim.MCPGH$LCV

[1] 8.31713

As expected [17], the two observed LCV values are quite similar. The summary() function applied to the previous objects gives the results shown below. Comparing the two models, the estimates of most coefficients and their standard errors show some differences. However, this does not alter the conclusions on surrogacy validity captured by Kendall’s τ and Rtrial2.
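Since the model with the smaller LCV is preferred, the comparison can be scripted directly; a small sketch using the two values above:

```r
# Choose the model with the smallest LCV (lower is better)
lcv <- c(MCGH = 8.29982, MCPGH = 8.31713)
names(which.min(lcv))  # "MCGH"
```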

summary(joint.surro.sim.MCGH)

Estimates for variance parameters of random effects

         Estimate Std Error     z         P
theta       3.450    0.4928 7.001    < e-10 ***
zeta        1.506    0.2364 6.369 1.899e-10 ***
gamma       1.881    0.5602 3.358 0.0007853 ***
alpha       0.916    0.1443 6.348 2.183e-10 ***
sigma2_S    0.703    0.4289 1.640    0.1011
sigma2_T    1.096    0.6147 1.783   0.07451 .
sigma_ST    0.442    0.3974 1.113    0.2657

Estimates for fixed treatment effects

        Estimate Std Error      z        P
beta_S    -2.046    0.2667 -7.673   < e-10 ***
beta_T    -1.844    0.3562 -5.177 2.25e-07 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

hazard ratios (HR) and confidence intervals for fixed treatment effects

         exp(coef) Inf.95.CI Sup.95.CI
beta_S       0.129     0.077     0.218
beta_T       0.158     0.079     0.318

Surrogacy evaluation criterion

             Level Estimate Std Error Inf.95.CI Sup.95.CI Strength
Ktau    Individual    0.596        --     0.542     0.625
R2trial      Trial    0.254     0.276    -0.288     0.796      Low
R2.boot      Trial    0.290        --     0.002     0.767      Low

---

Association strength: <= 0.49 ‘Low’; ]0.49-0.72[ ‘Medium’; >= 0.72 ‘High’

---

Surrogate threshold effect (STE): -8.523 (HR = 0)

Convergence parameters

Penalized marginal log-likelihood = -4957.842

Number of iterations = 14

LCV = approximate likelihood cross-validation criterion

    in the semi-parametrical case = 8.3

Convergence criteria:

  parameters = 3.833e-05 likelihood = 0.0002426 gradient = 1.137e-06

summary(joint.surro.sim.MCPGH)

Estimates for variance parameters of random effects

         Estimate Std Error     z         P
theta       2.640    0.4295 6.148 7.854e-10 ***
zeta        2.277    0.4010 5.679 1.356e-08 ***
gamma       1.355    0.4174 3.246   0.00117 **
alpha       1.135    0.2285 4.965 6.855e-07 ***
sigma2_S    0.593    0.3471 1.709    0.0875 .
sigma2_T    0.664    0.5771 1.151    0.2498
sigma_ST    0.380    0.3219 1.181    0.2376

Estimates for fixed treatment effects

        Estimate Std Error      z         P
beta_S    -1.643    0.2277 -7.216    < e-10 ***
beta_T    -1.640    0.3573 -4.589 4.463e-06 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

hazard ratios (HR) and confidence intervals for fixed treatment effects

         exp(coef) Inf.95.CI Sup.95.CI
beta_S       0.193     0.124     0.302
beta_T       0.194     0.096     0.391

Surrogacy evaluation criterion

             Level Estimate Std Error Inf.95.CI Sup.95.CI Strength
Ktau    Individual    0.577        --     0.522     0.607
R2trial      Trial    0.367     0.358    -0.334     1.068      Low
R2.boot      Trial    0.407        --     0.007     0.964      Low

---

Association strength: <= 0.49 ‘Low’; ]0.49-0.72[ ‘Medium’; >= 0.72 ‘High’

---

Surrogate threshold effect (STE): -4.922 (HR = 0.007)

Convergence parameters

Penalized marginal log-likelihood = -4968.465

Number of iterations = 20

LCV = the approximate likelihood cross-validation criterion

    in the semi-parametrical case = 8.317

Convergence criteria:

  parameters = 5.962e-05 likelihood = 0.0004484 gradient = 2.465e-06

Graphical representation of baseline hazard and survival functions

By using the generic function plot(), it is possible to plot the baseline hazard and survival functions for both surrogate and true endpoints. The definition of this function is shown below, and the associated arguments are described in S2E Appendix in S2 Appendix.

plot(x, endpoint = 2, scale = 1, type.plot = "Hazard", xmin = 0,
     conf.bands = TRUE, xmax = NULL, ylim = c(0, 1), Xlab = "Time",
     pos.legend = "topright", main, cex.legend = 0.7,
     Ylab = "Baseline hazard function")

Fig 1 shows the baseline survival and hazard functions of the model for both the surrogate and the true endpoints, using the advanced ovarian cancer meta-analysis dataset. We truncate survival times at 8 months since, beyond this threshold, the estimated survival probabilities are almost 0. The code below produces the plots given in Fig 1.

Fig 1. Baseline hazard and survival functions for surrogate endpoint and true endpoint truncated at 8 months using the advanced ovarian cancer meta-analysis dataset.


par(mfrow = c(2, 1))
plot(joint.surro.ovar, type.plot = "Su", xmax = 8, Xlab = "Time (in months)",
     scale = 12)
plot(joint.surro.ovar, xmax = 8, ylim = c(0, 0.2), Xlab = "Time (in months)",
     scale = 12, pos.legend = "topleft")

Fig 2 shows another representation of the baseline survival and hazard functions for the surrogate and the true endpoints. For this purpose we use the object joint.surro.sim.MCPGH, which is based on the generated data.

Fig 2. Baseline hazard and survival functions for surrogate endpoint and true endpoint, using simulated meta-analysis of 600 subjects and 30 trials.


The following code produces Fig 2:

par(mfrow = c(2, 2))
plot(joint.surro.sim.MCPGH, type.plot = "Su", endpoint = 0, scale = 1/365,
     Xlab = "Time (in years)")
plot(joint.surro.sim.MCPGH, type.plot = "Su", endpoint = 1, scale = 1/365,
     pos.legend = "bottomleft", Xlab = "Time (in years)")
plot(joint.surro.sim.MCPGH, type.plot = "Ha", endpoint = 0, scale = 1/365,
     ylim = c(0, 0.08), Xlab = "Time (in years)")
plot(joint.surro.sim.MCPGH, type.plot = "Ha", endpoint = 1, scale = 1/365,
     ylim = c(0, 0.08), Xlab = "Time (in years)")

Model evaluation and prediction

To assess the accuracy of the predictions obtained with estimates from model (1), the leave-one-out cross-validation criterion (loocv) described in S2F Appendix in S2 Appendix can be computed as follows:

dloocv <- loocv(object = joint.surro.sim.MCGH, unusedtrial = 26,
                var.used = "error.estim")

We found the following result:

dloocv$result

   trialID ntrial  beta.S   beta.T beta.T.i Inf.95.CI Sup.95.CI
1        1     20  -2.145   -0.582   -2.038    -2.663    -1.412
2        2     20  -1.480   -0.799   -1.464    -2.135    -0.793 *
3        3     20  -0.285   -0.422   -0.195    -1.801     1.411 *
4        4     20   0.307    0.487   -0.248    -2.347     1.852 *
5        5     20  -1.087   -1.230   -0.983    -2.007     0.041 *
6        6     20 -21.305   -1.496  -13.951   -32.636     4.733 *
7        7     20  -0.796   -1.943   -0.687    -1.889     0.515
8        8     20  -1.578   -1.302   -1.545    -2.167    -0.923 *
9        9     20  -1.909   -1.402   -1.736    -2.241    -1.230 *
10      10     20  -1.752   -0.053   -1.505    -2.174    -0.836
11      11     20 -21.304   -0.342  -16.325   -35.269     2.619 *
12      12     20  -2.766  -20.920   -2.236    -3.201    -1.271
13      13     20  -0.474   -1.025   -0.835    -2.289     0.618 *
14      14     20   0.056   -0.148   -0.561    -2.603     1.481 *
15      15     20  -1.337   -1.154   -1.250    -2.218    -0.282 *
16      16     20  -0.191   -0.291   -0.125    -1.833     1.582 *
17      17     20   0.264    0.161   -0.540    -3.006     1.926 *
18      18     20  -2.589   -0.657   -2.197    -2.968    -1.426
19      19     20  -1.795   -1.654   -1.562    -2.263    -0.861 *
20      20     20  -0.630    1.599   -1.128    -2.451     0.195
21      21     20  -0.593   -0.510   -0.602    -1.988     0.785 *
22      22     20  -0.682   -1.645   -0.555    -1.827     0.716 *
23      23     20  -0.787   -0.179   -0.850    -2.061     0.362 *
24      24     20  -3.019   -2.735   -2.227    -3.504    -0.949 *
25      25     20  -2.393   -1.577   -2.099    -2.879    -1.319 *
26      27     20  -1.640   -1.063   -1.630    -2.248    -1.012 *
27      28     20  -1.386   -1.672   -1.220    -2.057    -0.383 *
28      29     20  -0.207   -0.722   -0.535    -2.220     1.150 *
29      30     20   0.299    0.185   -0.377    -3.215     2.461 *

The returned object, of class jointSurroPenalloocv, includes for each trial the number of included subjects (ntrial), the observed treatment effects on the surrogate endpoint (beta.S) and on the true endpoint (beta.T), and the predicted treatment effect on the true endpoint (beta.T.i) with the associated prediction interval (Inf.95.CI, Sup.95.CI). If the observed treatment effect on the true endpoint lies within the prediction interval, the last column contains “*”, indicating a good prediction.
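The star column can be recomputed from the returned data frame by checking whether the observed effect lies within its prediction interval; a sketch using values copied from the first two rows of the table above (column names as in dloocv$result):

```r
# Recompute the "good prediction" flag from the loocv output
# (values copied from the first two rows of the table above)
res <- data.frame(beta.T    = c(-0.582, -0.799),
                  Inf.95.CI = c(-2.663, -2.135),
                  Sup.95.CI = c(-1.412, -0.793))
inside <- with(res, beta.T >= Inf.95.CI & beta.T <= Sup.95.CI)
inside  # FALSE TRUE: only the second trial's beta.T falls in its interval
```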

Simulation studies

In this section, we show an example of simulation studies in the frailtypack package, based on model (1).

Estimations

Using the function jointSurroPenalSimul(), simulation studies can be performed as follows:

joint.simul10 <- jointSurroPenalSimul(nb.dataset = 10, nbSubSimul = 600,

  ntrialSimul = 30, LIMparam = 0.001, LIMlogl = 0.001, LIMderiv = 0.001,

  nb.mc = 200, nb.gh = 20, nb.gh2 = 32, true.init.val = 1, print.iter = F)

This call performs a simulation study with 10 meta-analyses, each including 600 subjects and 30 trials. By default, each generated meta-analysis includes the same proportion of subjects per trial and the same proportion of treated subjects per trial. In the event of an identification problem, the model is re-estimated using 32 quadrature nodes (nb.gh2). All simulation parameters not set in the call keep their default values, as documented for the function jointSurroPenalSimul(). Using the default values, we expect 0.81 for Rtrial2 and 0.595 for Kendall’s τ.

Simulation results

Simulation results can be displayed using the S3 method summary(). Its argument R2boot specifies whether the confidence interval of Rtrial2 is computed using parametric bootstrap (1) or the Delta method (0).

summary(joint.simul10, R2boot = 0)

Simulation and estimation parameters

nb.subject = 600

nb.trials = 30

nb.simul = 10

int.method = 2

nb.gh = 20

nb.gh2 = 32

nb.mc = 200

kappa.use = 4

n.knots = 6

n.iter = 14

Simulation results

   Parameters True value   Mean Empirical SE Mean SE CP(%)
2       theta        3.5  3.451        0.711   0.545    80
3        zeta          1  1.049        0.220   0.177    70
4       gamma        2.5  2.642        0.957   0.711    80
5       alpha          1  1.009        0.135   0.138    90
6     sigma.S        0.7  0.608        0.361   0.426    90
7     sigma.T        0.7  0.627        0.347   0.459    80
8    sigma.ST       0.63  0.555        0.314   0.389    90
9      beta.S      -1.25 -1.368        0.233   0.251    90
10     beta.T      -1.25 -1.397        0.238   0.269   100
11    R2trial       0.81  0.820        0.181   0.521    80
12      K.tau      0.595  0.592        0.032      --    80

Rejected datasets: n(%) = 0(0)

The first part of the results gives a brief summary of the simulation and estimation parameters, including the average number of iterations needed to reach convergence (n.iter = 14).

The next part presents a table of simulation results. Each row corresponds to a model parameter. The first column is the name of the parameter, followed by the value assigned to it during data generation. The next three columns contain the average of the estimates over all generated datasets, the empirical standard error and the mean of the estimated standard errors. The last column is the coverage probability (CP), the proportion (%) of 95% confidence intervals that include the true value of the parameter. We considered only 10 meta-analyses here, although simulation studies more often require around 500 generated datasets.
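The coverage probability in the last column corresponds, for each parameter, to the share of Wald-type 95% confidence intervals that contain the true simulated value. A minimal sketch (the cp helper is ours; frailtypack computes this internally):

```r
# Coverage probability: percentage of 95% Wald CIs containing the true value
# (cp is an illustrative helper, not a frailtypack function)
cp <- function(est, se, true) {
  100 * mean(true >= est - 1.96 * se & true <= est + 1.96 * se)
}
cp(est = c(3.2, 3.6, 4.9), se = c(0.5, 0.4, 0.6), true = 3.5)
# about 66.7: two of the three intervals cover the true value
```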

The last row of the results indicates the number of rejected datasets due to convergence issues.

Discussion

This paper presents new tools for validating candidate surrogate endpoints using data from multiple randomized clinical trials with failure-time endpoints. Since version 3.0.1, the frailtypack R package implements the joint surrogate model, a more attractive approach than two-step methods because surrogacy is evaluated in a one-step analysis strategy. The joint surrogate model demonstrated better performance than the two-step copula model or the one-step Poisson approach [17]. Furthermore, the new model showed stable results even with a moderate trial size or number of trials, as commonly encountered in practice, whereas the adjusted model estimated within the Bayesian framework showed unstable results [11].

Depending on the values given to the arguments of the jointSurroPenal function, convergence of the model is not always guaranteed. It is therefore important, in the event of convergence issues, to know how to adjust these arguments, as shown in the previous section. Users can choose the integration method, the initial values, the number of knots for the splines and the smoothing parameters, the number of Gauss-Hermite quadrature nodes and the number of Monte Carlo samples when applicable, the random number generator, and other arguments. It is also possible to fix some model parameters in the event of identifiability issues. This underlines the flexibility of the frailtypack package in managing convergence issues, a flexibility quite different from that of the surrosurv package [9] or the SAS macros [10] for evaluating surrogate endpoints with the two-step copula model or the one-step Poisson model. Other advantages of our model compared to existing approaches [8, 13] are the reduction of convergence and numerical issues, the robustness to model misspecification, and a surrogacy evaluation based on a one-step strategy, hence the estimation of Rtrial2 without the need for adjustment for estimation errors. In addition, as underlined in the illustration section, the interpretation of Kendall’s τ differs from that in the two-step copula approach.

Our previous paper [17] demonstrated the robustness of the joint surrogate model to model misspecification, numerical integration and variations in data characteristics with respect to the surrogacy evaluation criteria (Rtrial2 and Kendall’s τ). The model is also robust to variations in the values of the arguments: in the event of convergence, changes in the argument values mostly produce similar results. For example, when we reduced the number of samples for Monte Carlo integration to 100 (nb.mc = 100) in the application based on the advanced ovarian cancer meta-analysis dataset, we observed R2trial = 1.000 [95%CI: 0.998–1.002], R2boot = 0.981 [95%CI: 0.891–1.000], Kendall’s τ = 0.683 [95%CI: 0.664–0.695], STE = -0.291 (HR = 0.747) and LCV = 9.161. These results are quite similar to those obtained with nb.mc = 200 (see the illustration section). In addition, if we integrate over the random effect at the individual level using the pseudo-adaptive Gauss-Hermite quadrature (argument adaptatif = 1) instead of the classical Gauss-Hermite quadrature, the results are similar, with R2trial = 1.000 [95%CI: 0.998–1.002], R2boot = 0.982 [95%CI: 0.897–1.000], Kendall’s τ = 0.683 [95%CI: 0.664–0.696], STE = -0.272 (HR = 0.762) and LCV = 9.162. These examples confirm the robustness of the model previously discussed by Sofeu et al. (2019) using simulation studies.

Moreover, thanks to the jointSurroPenalSimul() function, it is possible to perform simulation studies in order to plan a new trial and define the optimal number of clusters when evaluating surrogate endpoints with the joint surrogate model. For example, if a given meta-analysis includes few trials, simulation studies may help establish the minimum number of centers needed to obtain good estimates of the surrogacy evaluation criteria. Jurgen et al. [33] suggested using clinical trial simulations to optimize adaptive trial designs. As they explained, the typical goal of a clinical trial simulation is to identify a design that has a high probability of success under the most likely conditions, but that can also perform well, or at least acceptably, under more extreme conditions if necessary. Simulation studies can also help when the recommended argument values do not allow convergence, or lead to long computation times when fitting the joint surrogate model. Given the data characteristics, they can guide the choice of optimal values for some arguments (the number of quadrature nodes, the number of Monte Carlo samples and the number of knots for the splines) and anticipate their impact on the estimation of the model parameters. The management of convergence issues by the program itself is described in S2G Appendix in S2 Appendix.

Numerous tools have been presented in this paper for evaluating surrogacy: the surrogate threshold effect, used in combination with Rtrial2 to assess the validity of a potential surrogate endpoint; the predict() function, used to predict the treatment effect on the true endpoint in a new trial from the observed treatment effect on the surrogate endpoint; and the leave-one-out cross-validation, which assesses the accuracy of the predictions based on model (1). Furthermore, the baseline hazard and survival functions can be plotted with the plot() function.

The jointSurroPenal() function can also be used in interim analyses to estimate the fixed treatment effect on the surrogate endpoint, taking into account the competing risk of death and the heterogeneity in the data, at the individual level and at the trial level in interaction with treatment. This is an alternative to the joint frailty-copula model between tumor progression and death for meta-analyses proposed in [34].

We now plan to extend model (1) and the jointSurroPenal() function to take interval censoring into account for endpoints whose exact event times are unknown. This extension will also make it possible to model the baseline hazard functions parametrically, using a Weibull distribution. To make frailtypack more accessible, an associated interactive web application could be developed using the R package shiny, available at https://CRAN.R-project.org/package=shiny.

Supporting information

S1 Fig. Package characteristics (version 3.0.3.1).

A blue cross indicates an option available for a given type of model in the CRAN version of the package; an orange cross indicates an option included in the package but not yet released on CRAN. Empty cells mean that the option is not available for that type of model. RE = Recurrent Event. TE = Terminal Event. LO = Longitudinal Outcome. STE = Surrogate Threshold Effect. ODE = Ordinary Differential Equation.

(TIF)

S1 Appendix. Extension of the methodology.

(PDF)

S2 Appendix. Description of the arguments and return values for the functions.

(PDF)

Acknowledgments

The authors thank the Ovarian Cancer Meta-Analysis Project for sharing the data used to illustrate the programs. This work was supported by the Association pour la Recherche sur le Cancer, Grant/Award Number: PJA20161205147; Institut National du Cancer, Grant/Award Number: 2017-125; Institut national de la santé et de la recherche médicale; Région Aquitaine. We also thank Antoine Barbieri, INSERM U1219, for his support in programming the Bayesian approach. We gratefully acknowledge the very helpful and constructive comments and suggestions from the academic editor and the three anonymous referees, which led to significant improvements of this manuscript.

Data Availability

All data files are available from the frailtypack package, which can be downloaded from the Comprehensive R Archive Network (CRAN). URL: https://cran.r-project.org/web/packages/frailtypack/index.html.

Funding Statement

This work was supported by the Association pour la Recherche sur le Cancer, Grant/Award Number: PJA20161205147; Institut National du Cancer, Grant/Award Number: 2017-125; and Institut national de la santé et de la recherche médicale; Région Aquitaine. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Fleming TR, DeMets DL. Surrogate End Points in Clinical Trials: Are We being Misled? Annals of Internal Medicine. 1996;125(7):605–613. 10.7326/0003-4819-125-7-199610010-00011 [DOI] [PubMed] [Google Scholar]
  • 2. Matulonis UA, Oza AM, Ho TW, Ledermann JA. Intermediate Clinical Endpoints: A Bridge Between Progression-Free Survival and Overall Survival in Ovarian Cancer Trials. Cancer. 2015;121(11):1737–1746. 10.1002/cncr.29082 [DOI] [PubMed] [Google Scholar]
  • 3. Ellenberg SS, Hamilton JM. Surrogate Endpoints in Clinical Trials: Cancer. Statistics in Medicine. 1989;8(4):405–413. 10.1002/sim.4780080404 [DOI] [PubMed] [Google Scholar]
  • 4. Booth CM, Eisenhauer EA. Progression-Free Survival: Meaningful or Simply Measurable? Journal of Clinical Oncology. 2012;30(10):1030–1033. 10.1200/JCO.2011.38.7571 [DOI] [PubMed] [Google Scholar]
  • 5. Prentice RL. Surrogate Endpoints in Clinical Trials: Definition and operational criteria. Statistics in Medicine. 1989;8(4):431–440. 10.1002/sim.4780080407 [DOI] [PubMed] [Google Scholar]
  • 6. Buyse M, Molenberghs G, Burzykowski T, Renard D, Geys H. The Validation of Surrogate Endpoints in Meta-Analyses of Randomized Experiments. Biostatistics. 2000;1(1):49–67. 10.1093/biostatistics/1.1.49 [DOI] [PubMed] [Google Scholar]
  • 7. Burzykowski T, Molenberghs G, Buyse M, Geys H. The Evaluation of Surrogate Endpoints. Springer-Verlag, New York, NY; 2005. [Google Scholar]
  • 8. Burzykowski T, Molenberghs G, Buyse M, Geys H, Renard D. Validation of Surrogate End Points in Multiple Randomized Clinical Trials with Failure Time End Points. Journal of the Royal Statistical Society C (Applied Statistics). 2001;50(4):405–422. 10.1111/1467-9876.00244 [DOI] [Google Scholar]
  • 9.Rotolo F. surrosurv: Evaluation of Failure Time Surrogate Endpoints in Individual Patient Data Meta-Analyses; 2017. Available from: https://CRAN.R-project.org/package=surrosurv. [DOI] [PubMed]
  • 10. Alonso A, Bigirumurame T, Burzykowski T, Buyse M, Molenberghs G, Muchene L, et al. Applied Surrogate Endpoint Evaluation Methods with SAS and R. Chapman and Hall/CRC; 2017. [Google Scholar]
  • 11. Renfro LA, Shi Q, Sargent DJ, Carlin BP. Bayesian Adjusted R2 for the Meta-Analytic Evaluation of Surrogate Time-To-Event Endpoints in Clinical Trials. Statistics in Medicine. 2012;31(8):743–761. 10.1002/sim.4416 [DOI] [PubMed] [Google Scholar]
  • 12. Shi Q, Renfro LA, Bot BM, Burzykowski T, Buyse M, Sargent DJ. Comparative Assessment of Trial-Level Surrogacy Measures for Candidate Time-to-Event Surrogate Endpoints in Clinical Trials. Computational Statistics & Data Analysis. 2011;55(9):2748–2757. 10.1016/j.csda.2011.03.014. [DOI] [Google Scholar]
  • 13. Rotolo F, Paoletti X, Burzykowski T, Buyse M, Michiels S. A Poisson Approach to the Validation of Failure Time Surrogate Endpoints in Individual Patient Data Meta-Analyses. Statistical Methods in Medical Research. 2019;28(1):170–183. 10.1177/0962280217718582 [DOI] [PubMed] [Google Scholar]
  • 14. Alonso A, Molenberghs G. Surrogate Marker Evaluation from an Information Theory Perspective. Biometrics. 2007;63(1):180–186. 10.1111/j.1541-0420.2006.00634.x [DOI] [PubMed] [Google Scholar]
  • 15. Buyse M, Michiels S, Squifflet P, Lucchesi KJ, Hellstrand K, Brune ML, et al. Leukemia-free Survival as a Surrogate End Point for Overall Survival in the Evaluation of Maintenance Therapy for Patients with Acute Myeloid Leukemia in Complete Remission. Haematologica. 2011;96(8):1106–1112. 10.3324/haematol.2010.039131 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Buyse M, Molenberghs G, Paoletti X, Oba K, Alonso A, der Elst WV, et al. Statistical Evaluation of Surrogate Endpoints with Examples from Cancer Clinical Trials. Biometrical Journal. 2016;58(1):104–132. 10.1002/bimj.201400049 [DOI] [PubMed] [Google Scholar]
  • 17. Sofeu CL, Emura T, Rondeau V. One-step validation method for surrogate endpoints using data from multiple randomized cancer clinical trials with failure-time endpoints. Statistics in Medicine. 2019;38(16):2928–2942. 10.1002/sim.8162 [DOI] [PubMed] [Google Scholar]
  • 18. Lin DY, Wei LJ. The Robust Inference for the Cox Proportional Hazards Model. Journal of the American Statistical Association. 1989;84(408):1074–1078. 10.1080/01621459.1989.10478874 [DOI] [Google Scholar]
  • 19.Van der Elst W, Meyvisch P, Alonso A, Ensor HM, Molenberghs CJWG. Surrogate: Evaluation of Surrogate Endpoints in Clinical Trials; 2018. Available from: https://CRAN.R-project.org/package=Surrogate.
  • 20. Bujkiewicz S, Thompson JR, Riley RD, Abrams KR. Bayesian Meta-Analytical Methods to Incorporate Multiple Surrogate Endpoints in Drug Development Process. In: Statistics in medicine; 2016. 10.1002/sim.6776 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Institute for Quality and Efficiency in Health Care. Validity of Surrogate Endpoints in Oncology: Executive Summary; 2011. Available from: www.iqwig.de/download/A10-05_Executive_Summary_v1-1_Surrogate_endpoints_in_oncology.pdf. [PubMed]
  • 22. Burzykowski T, Buyse M. Surrogate Threshold Effect: An Alternative Measure for Meta-Analytic Surrogate Endpoint validation. Pharmaceutical Statistics. 2006;5(3):173–186. 10.1002/pst.207 [DOI] [PubMed] [Google Scholar]
  • 23. Król A, Mauguen A, Mazroui Y, Laurent A, Michiels S, Rondeau V. Tutorial in Joint Modeling and Prediction: A Statistical Software for Correlated Longitudinal Outcomes, Recurrent Events and a Terminal Event. Journal of Statistical Software, Articles. 2017;81(3):1–52. [Google Scholar]
  • 24. Ovarian Cancer Meta-Analysis Project. Cyclophosphamide Plus Cisplatin Plus Adriamycin Versus Cyclophosphamide, Doxorubicin, and Cisplatin Chemotherapy of Ovarian Carcinoma: A Meta-Analysis. Classic Papers and Current Comments. 1991;3:237–234. [Google Scholar]
  • 25. Marquardt DW. An Algorithm for Least-Squares Estimation of Nonlinear Parameters. Journal of the Society for Industrial and Applied Mathematics. 1963;11(2):431–441. 10.1137/0111030 [DOI] [Google Scholar]
  • 26. Joly P, Commenges D, Letenneur L. A Penalized Likelihood Approach for Arbitrarily Censored and Truncated Data: Application to Age-Specific Incidence of Dementia. Biometrics. 1998;54(1):185–194. 10.2307/2534006 [DOI] [PubMed] [Google Scholar]
  • 27. Gail MH, Pfeiffer R, van Houwelingen HC, Carroll RJ. On Meta-Analytic Assessment of Surrogate Outcomes. Biostatistics. 2000;1(3):231–246. 10.1093/biostatistics/1.3.231 [DOI] [PubMed] [Google Scholar]
  • 28. Rondeau V, Mathoulin-Pelissier S, Jacqmin-Gadda H, Brouste V, Soubeyran P. Joint Frailty Models for Recurring Events and Death Using Maximum Penalized Likelihood Estimation: Application on Cancer Events. Biostatistics. 2007;8(4):708–721. 10.1093/biostatistics/kxl043 [DOI] [PubMed] [Google Scholar]
  • 29.Rondeau V, Gonzalez JR, Mazroui Y, Mauguen A, Diakite A, Laurent A, et al. frailtypack: General Frailty Models: Shared, Joint and Nested Frailty Models with Prediction; Evaluation of Failure-Time Surrogate Endpoints; 2019. Available from: https://CRAN.R-project.org/package=frailtypack.
  • 30. Dowd BE, Greene WH, Norton EC. Computation of Standard Errors. Health Services Research. 2014;49(2):731–750. 10.1111/1475-6773.12122 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Prasad V, Kim C, Burotto M, Vandross A. The Strength of Association Between Surrogate End Points and Survival in Oncology: A Systematic Review of Trial-Level Meta-Analyses. JAMA Internal Medicine. 2015;175(8):1389–1398. 10.1001/jamainternmed.2015.2829 [DOI] [PubMed] [Google Scholar]
  • 32. Baker SG. Five Criteria for Using a Surrogate Endpoint to Predict Treatment Effect Based on Data from Multiple Previous Trials. Statistics in Medicine. 2018;37(4):507–518. 10.1002/sim.7561 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Jurgen H, Song W, John K. Using simulation to optimize adaptive trial designs: applications in learning and confirmatory phase trials. Clinical Investigation. 2015;5(4):401–413. [Google Scholar]
  • 34. Emura T, Nakatochi M, Murotani K, Rondeau V. A Joint Frailty-Copula Model Between Tumour Progression and Death for Meta-Analysis. Statistical Methods in Medical Research. 2017;26(6):2649–2666. 10.1177/0962280215604510 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Alan D Hutson

11 Sep 2019

PONE-D-19-20769

Frailtypack: an R-package for the validation of failure-time surrogate endpoints using individual patient data from meta-analysis of randomized controlled trials

PLOS ONE

Dear Mr. SOFEU,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Oct 26 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Alan D Hutson

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service.  

Whilst you may use any professional scientific editing service of your choice, PLOS has partnered with both American Journal Experts (AJE) and Editage to provide discounted services to PLOS authors. Both organizations have experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. To take advantage of our partnership with AJE, visit the AJE website (http://learn.aje.com/plos/) for a 15% discount off AJE services. To take advantage of our partnership with Editage, visit the Editage website (www.editage.com) and enter referral code PLOSEDIT for a 15% discount off Editage services.  If the PLOS editorial team finds any language issues in text that either AJE or Editage has edited, the service provider will re-edit the text for free.

Upon resubmission, please provide the following:

  • The name of the colleague or the details of the professional service that edited your manuscript

  • A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a *supporting information* file)

  • A clean copy of the edited manuscript (uploaded as the new *manuscript* file)

Additional Editor Comments (if provided):

There were three well thought-out reviews for this submission. Please address the important points put forward by each reviewer.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript by Sofeu and Rondeau covers an interesting topic and a relevant research question, and I think it warrants publication. However, I think some changes are needed to the manuscript.

For instance, I think it would be a much stronger paper if the application of the joint frailty model described in this paper was expanded and described in more detail, to motivate the use of the newly developed method; I would also suggest removing the simulated data and focussing on the ovarian cancer dataset.

The submitted manuscript is quite technical, focussing on describing the software package rather than the practical importance of it. I think it would appeal to a wider audience if contextualised more.

Regarding the applied examples, I think that the choice of arguments (e.g. numerical integration method, number of quadrature nodes, etc.) should be discussed in more detail, as it can significantly affect the results (and not all users are aware of these issues). I would also recommend discussing the robustness of the method to model misspecification, and the consequences of varying the estimation arguments mentioned above.

Finally, the authors mention several alternative methods to assess surrogacy; their comparison with the joint frailty model should be discussed in more detail (and maybe it would be interesting to include them in the application section as well, for comparison purposes).

More comments are included below.

Introduction:

I think that the baseline hazard function that uses splines should be referred to as “parametric” (or “flexible parametric”), rather than “non-parametric”. Survival models with flexible, spline-based baseline hazards are commonly referred to as "flexible parametric models".

Parameters estimation:

Given that several numerical integration methods have been considered, what has been chosen and why? How does the choice of integration method affect the validity of the method?

STE:

What is IQWIG?

Computational details:

Not all researchers can afford a Xeon with 40 cores and 300+ Gb of RAM. Could you elaborate on the computational requirements of this method? Would it be possible to fit any model on e.g. a standard laptop or desktop PC?

You are using R 3.4.3, which was released almost 2 years ago. How does the software run with newer versions of R?

You mention that dependencies must be installed. Could you describe them, to make the readers aware?

Data:

Minor comment: data(“dataOvarian”) only works if frailtypack is loaded first; you could add the “package” argument to make the requirement explicit: data(“dataOvarian”, package = “frailtypack”)

Surrogacy evaluation:

The arguments of “jointSurroPenal” that were set when fitting the joint frailty model with the ovarian cancer dataset are described but not motivated. Why were those specific values chosen? Does this affect the results of the joint model? If so, how? Model misspecification is a serious issue that can lead to biased estimates of the treatment effect, also in frailty models.

Choice of the model based on LCV:

To me, the two models don’t give similar results. For instance, the estimated fixed treatments effects are quite different (e.g. -2 vs -1.6 for beta_S). This ties with some of my previous comments on model misspecification and the choice of the estimation arguments when fitting the joint frailty model.

Simulation studies:

I don’t understand the utility of this section. Could you elaborate a bit more on that? If you simulate data from a joint frailty model and then fit a joint frailty model with the same model formulation to the simulated data, then good performance is expected. I am probably missing the point here (sorry for that), but it would be good to describe in more detail why and when this is useful.

Discussion:

The possibility of choosing estimation arguments that affect convergence of the algorithm should be discussed more, including comparisons (e.g. numerical integration methods), drawbacks, and problems that may arise.

A comparison with other established methods to assess surrogacy should be discussed as well, motivating the use of joint frailty models. Furthermore, robustness of the software introduced with this manuscript should be discussed.

Appendices:

There is a lot of material in the appendices - is it all necessary? I believe some of it is already included in the paper introducing the methodology (Sofeu et al., 2019, in Statistics in Medicine), and maybe readers could be referred to that instead.

Finally, the paper is at times hard to read, with some typos and several sentences that could be edited to improve clarity (remember that PLOS ONE does not copyedit accepted manuscripts). I spotted some typos, which are included below:

Title: “frailtypack” should start with a lowercase letter since it is the name of the package;

Keywords: “fraity” instead of “frailty”; “surrogte” instead of “surrogate”; “envent” instead of “event”

Abstract: The first sentence is hard to read; “quiet” instead of “quite” in the background and objective section (line 9);

Line 89: replace “discuss” with “discussed”

Line 131: replace “measurements” with “measure”

Line 134: replace “interpreting” with “interpretation”

Line 136: replace “equals” with “equal”

Line 225: replace “describe” with “described”

Line 237: GitHub should have capitals G and H (it’s the name of the company/service)

Line 241: did you mean Gb instead of Go?

Line 342: replace “respectievely” with “respectively”

Line 365: spell out “loocv”

Line 447: replace “censorship” with “censoring”

Line 448: replace “which” with “where”

Reviewer #2: Sofeu et al. present a manuscript that is an R implementation of their method published in “One-step validation method for surrogate endpoints using data from multiple randomized cancer clinical trials with failure-time endpoints. Statistics in Medicine. 2019;1–15”, with some enhanced functions for the validation of candidate surrogate endpoints. The manuscript presents a potentially useful alternative method for validating surrogate endpoints, based on the method published in the associated Statistics in Medicine paper. This package could help researchers choose a more appropriate statistical method when validating surrogate endpoints. However, there is a lack of demonstration of the stated advantages in comparison to other methods, and a detailed discussion of the potential benefit of using the stated method is missing. Only a general comparison was presented in the “Discussion” section, without much depth or detail. As such, the stated advantages were not well established in this manuscript. Overall, the manuscript mostly demonstrates the technical side of the package. Therefore, this reviewer encourages the authors to consider enhancing the manuscript so that a researcher (instead of a package user) can benefit from reading it and decide how to realize its potential statistical advantages when used appropriately.

Major comments:

1. The numerous English grammar errors in the manuscript are a real problem; some of them are listed below in the minor comments. Many sentences are nearly incomprehensible. An English proofreader is needed.

2. In the introduction section, the authors briefly discussed the pros and cons of existing methods and their own method for surrogate endpoint validation. Yet, the manuscript (and package) focuses on meta-analysis of such data. It might be obvious to the authors why such data are well suited and/or needed, but it should be made clear a) whether the method can be used for single study/center data, and b) the rationale for using meta-analysis data in the demonstration in the manuscript. This is vital for readers to understand the applicability and limitations of the package/methods.

3. In the associated R package, the vignettes do not provide enough details for even the minimal example. On the other hand, this manuscript provides only marginally more information than a package vignette. Some statistical insights and details would make this manuscript more useful for a researcher instead of an R package user; for example, practical considerations in model estimation, interpretation of the results (like the summary presented on pages 12-13), and any potential issues when choosing different parameters. In addition, any practical advantages (if any) derived from these results using the method for the ovarian cancer data presented here would be very useful in demonstrating the benefit of this method over others. If not yet available, at least discuss potential gains from using this method instead of the others.

4. It is not clear how the results in the “Simulation studies” section can be used in the context of the ovarian cancer data analysis, other than a brief sentence in the Discussion section. The authors may want to explain there how such results are useful for the stated purpose (planning future trials).

Minor comments:

1. In the Abstract, the sentence “We have especially the surrogate threshold effect…” needs revision to make it a valid English sentence, as does “Other tools concerned data generation, studies simulation and graphic representations…”.

2. Many minor grammar issues in main text.

Reviewer #3: This paper addresses an interesting problem, one which is numerically challenging to solve. I have comments concerning the presentation, the statistical problem, and the software. The last of these three is separated out and should be viewed as comments on possible future evolution of the package, but is not relevant to any editorial decisions about the current manuscript, which deals with the present offering of the package.

1. I found the Introduction confusing. The title of the paper leads one to believe that this is a description of a particular package; it needs an early lead-in to the fact that this is NOT what the paper is about. Two or three sentences would do, e.g., "frailtypack is an R package that fits a variety of survival models containing one or more random effects, or frailties...." "it includes for instance simple shared frailty, correlated...." "...for this paper we will focus on a particular subset of features applicable to the evaluation of surrogate endpoints ..." We need just enough to orient the user so that they don't feel like they were dropped into chapter 2 of a novel.

2. The description of the competing methods, starting at about page 2 line 14, was quite difficult to follow. The paragraphs need "roadmap" sentences to help readers who have not immersed themselves in the field know what the goal of the journey (paragraph) will be. It is hard to know what is background and what is essential. (This reviewer for instance -- I know some of the overarching questions but it is not a personal research area.) Without that the information is both too much and too little.

(page 3, lines 78-85 are nicely done.)

3. Equation 1 is surprising -- where are the covariates? (See line 17, which introduced a need for them.) I realize that in practical work there will be at most 1 or 2 prognostic factors that are widely enough recognized such that all studies will have gathered them. Can you sidestep the issue by creating extra strata?

When Z is a 0/1 covariate, as is often the case, then $v_S$ and $v_T$ are not identifiable for the control subjects. Does this cause issues wrt estimation of the variance of $v$?

4. The function on lines 163 and following needs some explanation. First there is no formula. Only later when reading the examples, was this reviewer able to guess that you decided that the user should use a certain set of predetermined variable names. Please add some text.

(Software suggestion: the list of options is terribly long; how is one to know what is necessary and what is optional? The pattern found in glm.control and many other packages would be a good way to separate out the secondary ones.)

5. The double hash signs on the left of the printout are a distraction and should be removed. (When there is lots of code, little output, and the goal is to allow users to easily cut/paste, then this peculiar design choice is defensible. None of those 3 is true here.)

6. The discussion of the results on pages 10-11 is an important part of the paper. The authors have several good choices in the material just prior to this section wrt moving significant portions of it to the appendix. However, several details of the discussion are very terse. For example, the statements at lines 298-302: why is a value of 6.8 "strong" and a value of 1.79 "more pronounced"?

7. A first question of an estimation method is whether it works when the data set exactly fits the proposed model (if that fails then an approach is nearly useless). The far more important one is how the method works when the model is not quite true, i.e., the case for any real data set. The package's simulation modules are designed to address only the first of these, not the second, which is a serious limitation. More serious is the lack of any evaluation of the approach outside of the ideal case. Some methods are robust to such changes and some are fragile.

8. Figure 1 is not very helpful, and is in fact confusing. Arrows normally go from parent to child. Do you really mean to imply that the plot() function generates data that is used by the estimation functions? Why is there an arrow out the side of summary()?

Figures 2 and 3 are at odds. The paper states that they are two "representations" of a fit to the same data, yet Figure 2 has median values of 2.5 and 4 months, while Figure 3 shows values close to 0.5 and 1 year. If these are simply two different solution paths, it would argue that the software is very unstable. No discussion is provided to guide the user with respect to these discrepant results.

Minor comments (no response required)

Abstract sentence 1: " ... for accelerated effectively the phase 3 trial." I've read this 3 times and still do not know what the words mean.

Abstract: "This model was quiet robust..." I think you mean 'quite robust'.

$\\alpha$ and $\\zeta$ might make a little more sense attached to S rather than T. After all, death is death, but different institutions can and will have different standards for "progression".

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Jan 28;15(1):e0228098. doi: 10.1371/journal.pone.0228098.r002

Author response to Decision Letter 0


29 Oct 2019

We thank the editor and the reviewers for their great interest in our manuscript. Following the editor's comments about PLOS ONE's style requirements, we numbered the body of the manuscript from the author list onwards and used PLOS ONE's LaTeX template to write the manuscript.

According to the editor and the reviewers’ comments, the manuscript has been copyedited by a professional scientific copyeditor:

RAYMOND COOKE

20 Rue Louis MONDAUT

33150 CENON

France

Please find below our replies to the reviewers’ comments. We have also revised the manuscript in accordance with the comments when necessary.

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript by Sofeu and Rondeau covers an interesting topic and a relevant research question, and I think it warrants publication. However, I think some changes are needed to the manuscript.

1- For instance, I think it would be a much stronger paper if the application of the joint frailty model described in this paper was expanded and described in more detail, to motivate the use of the newly developed method; I would also suggest removing the simulated data and focussing on the ovarian cancer dataset.

The submitted manuscript is quite technical, focussing on describing the software package rather than the practical importance of it. I think it would appeal to a wider audience if contextualised more.

We thank reviewer #1 for this remark. The paper describes the use of a new software package for evaluating surrogate endpoints, rather than the methodological aspects already published in Statistics in Medicine (Sofeu et al., 2019). We take your point that we could emphasize the practical importance of it. To this end, we now discuss the choice of arguments with the associated values, the management of convergence issues and the interpretation of outputs, rather than just illustrate the use of the functions.

To improve the illustration of the joint surrogate model and then guide the choice of the values for the arguments of the jointSurroPenal() function, we illustrate the call of this function by adding the following paragraphs in the manuscript (see the illustration section):

“From a practical point of view, the most important arguments for using the jointSurroPenal() function, beyond the standard argument (data), concern the following: the parametrization of the model (arguments indicator.zeta and indicator.alpha), the integration method and its associated arguments (int.method, n.knots, nb.mc, nb.gh, nb.gh2, adaptatif), the smoothing parameters (init.kappa and kappa.use) and the scale of the survival times (scale). Although optional, all these arguments can be used to manage convergence issues. The choice of the values to assign to them can be based on the convergence of the model. Once the convergence issues are fixed, users can apply the likelihood cross-validation criterion to evaluate the goodness of fit of different models, as shown later in this section. In the first step, users can try the model with the default values.

In the event of convergence issues, we recommend the following strategy: changing the number of samples for Monte-Carlo integration (nb.mc) by choosing a numerical value between 100 and 300; varying the number of nodes for Gaussian-Hermite quadrature integration (nb.gh and nb.gh2) by choosing among 15, 20 and 32; varying the number of knots for the splines (n.knots) by choosing a numerical value between 6 and 10; or providing new values for the smoothing parameters (init.kappa). Users can also set the arguments zeta or alpha to 1 (indicator.zeta = 1 or indicator.alpha = 1) to avoid estimating these parameters. We also recommend changing the integration method with the arguments int.method and adaptatif. For example, by using adaptatif = 1 for integration over the random effects at the individual level, one can use pseudo-adaptive Gaussian-Hermite quadrature instead of classical Gaussian-Hermite quadrature. By changing the scale of the survival times (argument scale) and considering years instead of days, it is possible to solve some numerical issues.

Using the default values on the advanced ovarian cancer dataset, the model did not converge. By changing the values of some arguments, we obtained the following set of arguments/values which allowed it to converge:”

We agree that the core of the application should concentrate on the use of the joint surrogate model with real data. This is why we emphasized the choice of arguments/values and the interpretation of the output from the jointSurroPenal() function on real data. However, simulated data were used for the description of the jointSurrSimul() function for generating a meta-analysis dataset, and then to illustrate the simple use of the different functions and their arguments for surrogacy evaluation.
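The tuning strategy quoted above can be sketched as an R call. The argument names are those listed in the quoted paragraphs; the specific values chosen and the use of the dataOvarian dataset are illustrative assumptions from the recommended ranges, not prescriptions:

```r
# Hedged sketch of the convergence-tuning strategy described above.
# Argument names follow the jointSurroPenal() documentation; the values
# below are illustrative picks from the ranges quoted in the text.
library(frailtypack)

data("dataOvarian", package = "frailtypack")

# Step 1: try the model with the default values
fit <- jointSurroPenal(data = dataOvarian)

# Step 2 (if convergence fails): tune the estimation arguments
fit <- jointSurroPenal(
  data      = dataOvarian,
  nb.mc     = 200, # Monte-Carlo samples: try values between 100 and 300
  nb.gh     = 20,  # Gaussian-Hermite nodes: try 15, 20 or 32
  n.knots   = 8,   # spline knots: try values between 6 and 10
  adaptatif = 1    # pseudo-adaptive Gaussian-Hermite quadrature
)

summary(fit)       # surrogacy evaluation criteria of the retained model
```

In the event of convergence, the paper reports that such changes in argument values mostly produce similar surrogacy criteria, so the LCV can then be used to choose among the converged fits.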

2- Regarding the applied examples, I think that the choice of arguments (e.g. numerical integration method, number of quadrature nodes, etc.) should be discussed in more detail, as it can significantly affect the results (and not all users are aware of these issues). I would also recommend discussing the robustness of the method to model misspecification, and the consequences of varying the estimation arguments mentioned above.

We thank the reviewer for this comment. The choice of arguments/values and management of arguments in the event of convergence issues are now discussed in detail. We now include a comment on this point (see response comment #1).

Concerning the robustness of the method, it was detailed in the previous methodological paper (Sofeu et al. 2019) based on simulation studies. The authors showed that the joint surrogate model is quite robust to model misspecification, numerical integration and variations in data characteristics regarding the surrogacy evaluation criteria (R2trial and Kendall’s Tau). Therefore, in the event of convergence of the model, the parameters are generally good estimates. We have added the following comment in the discussion section:

“Our previous paper (Sofeu et al., 2019) demonstrated the robustness of the joint surrogate model to model misspecification, numerical integration and variations in data characteristics regarding the surrogacy evaluation criteria (R2trial and Kendall’s Tau).”

In the following, we discuss the robustness of the model regarding variations in the values of some arguments based on the advanced ovarian cancer meta-analysis dataset. We then include the following paragraph in the revised discussion section:

“In addition, we demonstrate the robustness of the model to variations in the values of the arguments regarding the surrogacy evaluation criteria. Thus, in the event of convergence, changes in argument/value mostly produced similar results. For example, when we reduced the number of samples for Monte-Carlo integration to 100 (nb.mc = 100) in the application based on the advanced ovarian cancer meta-analysis dataset, we observed R2trial = 1.000 [95%CI: 0.998 – 1.002], R2boot = 0.981 [95%CI: 0.891 – 1.000], Kendall’s Tau = 0.683 [95%CI: 0.664 – 0.695], STE = -0.291 (HR = 0.747) and LCV = 9.161. These results are quite similar to those using nb.mc = 200 (see illustration section in manuscript). In addition, if we integrate over the random effect at the individual level using the pseudo-adaptive Gaussian-Hermite quadrature (argument adaptatif = 1) instead of the classical Gaussian-Hermite quadrature, the results are similar with R2trial = 1.000 [95%CI: 0.998 – 1.002], R2boot = 0.982 [95%CI: 0.897 – 1.000], Kendall’s Tau = 0.683 [95%CI: 0.664 – 0.696], STE = -0.272 (HR = 0.762) and LCV = 9.162. These examples confirm the robustness of the model previously discussed by Sofeu et al. (2019) using simulation studies.”

3- Finally, the authors mention several alternative methods to assess surrogacy; their comparison with the joint frailty model should be discussed in more detail (and maybe it would be interesting to include them in the application section as well, for comparison purposes).

In the methodological article published in Statistics in Medicine (Sofeu et al., 2019), we already compared the joint surrogate model with the two-step copula approach of Burzykowski et al. (2001) and with the one-step Poisson approach of Rotolo et al. (2017). The joint surrogate model was quite robust to the misspecification of data and variations in data characteristics compared to existing approaches. In addition, convergence and identifiability issues were attenuated with the new model. These findings were detailed by the authors. Therefore, to avoid dual publication and given that the ongoing manuscript aims to reach out to a wide community of users, we mainly focus on the description and usage of the package, with additional tools for evaluating surrogacy.

More comments are included below.

Introduction:

4- I think that the baseline hazard function that uses splines should be referred to as “parametric” (or “flexible parametric”), rather than “non-parametric”. Survival models with flexible, spline-based baseline hazards are commonly referred to as "flexible parametric models".

We thank the reviewer for this remark. We have corrected the mistake by replacing the expression “non-parametric” by “flexible” in the manuscript for the baseline hazard function. However, the model parameters and baseline hazard functions were estimated using a semi-parametric penalized likelihood approach.

Parameters estimation:

5- Given that several numerical integration methods have been considered, what has been chosen and why? How does the choice of integration method affect the validity of the method?

The choice of the numerical integration method is governed by the convergence of the model and the computational time. As a first step, we encourage users to choose a combination of Monte-Carlo to integrate over the random effect at the trial level and a Gaussian-Hermite quadrature to integrate over the random effects at the individual level. This integration method is less time-consuming compared to full Monte Carlo and full Gaussian-Hermite quadrature integration methods. However, users can change the method of integration in the event of convergence issues. This has already been discussed in the response to comment #1. As previously shown in the methodological paper, the method is quite robust to integration regarding the surrogacy evaluation criteria. Therefore, in the event of model convergence, we expect close results regarding the validation of surrogate endpoints.

STE:

6- What is IQWIG?

IQWIG stands for the German Institute for Quality and Efficiency in Health Care (Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen). It is an independent health technology assessment agency that assesses the benefits and harms of drug and non-drug technologies on behalf of the German Federal Joint Committee and the Federal Ministry of Health. It has issued recommendations for the evaluation of surrogate endpoints.

For the sake of clarity, we have replaced “IQWIG” in the manuscript by “German Institute for Quality and Efficiency in Health Care (IQWiG)”.

Computational details:

7- Not all researchers can afford a Xeon with 40 cores and 300+ Gb of RAM. Could you elaborate on the computational requirements of this method? Would it be possible to fit any model on e.g. a standard laptop or desktop PC?

We also tested the application on real data on a standard laptop and a desktop PC under recent versions of R and obtained exactly the same results, but with longer computing time. We have added the following to the section “Computational details and package installation”:

“A standard laptop or a desktop PC under recent versions of R can be used to fit the model. The results will be the same, but with an increase in computing time. For example, using a standard desktop PC in the application, the fit took around 1 hour, compared to 9 min with a server including 40 cores and 378 Gb of RAM.”

8- You are using R 3.4.3, which was released almost 2 years ago. How does the software run with newer versions of R?

We are now using R 3.5.2 and the results are the same. We have updated the R version in the manuscript to “3.5.2” instead of “3.4.3”.

9- You mention that dependencies must be installed. Could you describe them, to make the readers aware?

We have rephrased the package installation process as follows:

“The frailtypack package can be installed in any R session using the install.packages command as follows:

install.packages("frailtypack", dependencies = TRUE, type = "source", repos = "https://cloud.r-project.org")

Installation via GitHub is possible thanks to the devtools package. All dependencies required by frailtypack must be installed first. The installation commands are:

install.packages(c("survC1","doBy","statmod"), repos = "https://cloud.r-project.org")

devtools::install_github("socale/frailtypack", ref = "surrogacy_submetted_3-0-3")

Finally, frailtypack must be loaded using the command:

library(frailtypack)”

Data:

10- Minor comment: data(“dataOvarian”) only works if frailtypack is loaded first; you could add the “package” argument to make the requirement explicit: data(“dataOvarian”, package = “frailtypack”)

You are right. We have rephrased the command as follows:

“data(“dataOvarian”, package = “frailtypack”)”

Surrogacy evaluation:

11- The arguments of “jointSurroPenal” that were set when fitting the joint frailty model with the ovarian cancer dataset are described but not motivated. Why were those specific values chosen? Does this affect the results of the joint model? If so, how? Model misspecification is a serious issue that can lead to biased estimates of the treatment effect, also in frailty models.

We provide a complete answer to this comment in the responses to comments #1 and #2. In addition, we have updated the manuscript accordingly. We observed in simulation that model misspecification did not really affect the fixed treatment effects (Sofeu et al., 2019). However, given the research question, we studied in depth the effect of misspecification on the surrogacy evaluation criteria (Kendall’s τ and R2trial).

Choice of the model based on LCV:

12- To me, the two models don’t give similar results. For instance, the estimated fixed treatment effects are quite different (e.g. -2 vs -1.6 for beta_S). This ties in with some of my previous comments on model misspecification and the choice of the estimation arguments when fitting the joint frailty model.

We thank the reviewer for this remark and hope that the responses above have helped to dispel any doubts on model misspecification and on the choice of the arguments/values for the jointSurroPenal() function. We observed in simulation that the model was quite robust asymptotically, including for the estimation of the fixed treatment effects. For a given dataset, a slight difference can be observed between some point estimates of two distinct models. However, the 95% confidence intervals for these estimates overlapped. This calls for the goodness of fit to be studied in order to choose the best model, using the LCV criterion.

Simulation studies:

13- I don’t understand the utility of this section. Could you elaborate a bit more on that? If you simulate data from a joint frailty model and then fit a joint frailty model with the same model formulation to the simulated data, then good performance is expected. I am probably missing the point here (sorry for that), but it would be good to describe in more detail why and when this is useful.

The aim of the simulation studies section is to illustrate the jointSurroPenalSimul() function. As described in the discussion section, this function can help in planning a new trial and in defining the optimal number of clusters when evaluating surrogate endpoints with the joint surrogate model. For example, if a given meta-analysis includes few trials, simulation studies may guide the choice of the minimum number of centers to be considered for a better estimation of the surrogacy evaluation criteria. Jurgen et al. (2015) suggested using clinical trial simulations to optimize adaptive trial designs. As they explained, the typical goal of a clinical trial simulation is to identify a design that has a high probability of success based on the most likely conditions, but which can also perform well, or at least acceptably, under more extreme conditions if necessary. In addition, if the recommended values for the arguments do not make it possible to reach convergence, or involve a longer computing time when fitting the joint surrogate model, simulation studies can help, given the data characteristics, to choose optimal values for the number of quadrature nodes, the number of samples for the Monte-Carlo integration, or the number of knots for the splines. For the sake of clarity, we have reworded the paragraph about the usefulness of the jointSurroPenalSimul() function in the discussion as follows:

“Moreover, thanks to the jointSurroPenalSimul() function, it is possible to perform simulation studies in order to plan a new trial and define the optimal number of clusters when evaluating surrogate endpoints with the joint surrogate model. For example, if a given meta-analysis includes few trials, simulation studies may help in establishing the minimum number of centers to obtain the best estimate of the surrogacy evaluation criteria. Jurgen et al. (2015) suggested using clinical trial simulations to optimize adaptive trial designs. As they explained, the typical goal of a clinical trial simulation is to identify a design that has a high probability of success based on the most likely conditions, but which can also perform well, or at least acceptably, under more extreme conditions if necessary. Simulation studies can help if the recommended values for the arguments do not make it possible to reach convergence or involve a longer computing time when fitting the joint surrogate model. Given the data characteristics, they can help in choosing optimal values for some arguments (the number of quadrature nodes, the number of samples for the Monte-Carlo integration and the number of knots for the splines) and in anticipating their impact on estimating the model parameters.”

Ref

Jurgen H, Song W, John K. Using simulation to optimize adaptive trial designs: applications in learning and confirmatory phase trials. Clinical Investigation. 2015;5(4):401–413. doi:10.4155/CLI.15.14.
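The planning use case described above can be sketched with a small simulation study; the argument names (nb.dataset, nbSubSimul, ntrialSimul) are recalled from the package documentation and should be verified via help(jointSurroPenalSimul):

```r
library(frailtypack)

# Generate and analyze 10 meta-analytic datasets of 600 subjects spread
# over 30 trials, to assess how precisely the surrogacy evaluation
# criteria (Kendall's tau and R2trial) would be estimated with this design.
sim.study <- jointSurroPenalSimul(nb.dataset = 10, nbSubSimul = 600,
                                  ntrialSimul = 30, nb.mc = 300, nb.gh = 32)
summary(sim.study)
```

Repeating the exercise with, say, 20 or 40 trials then indicates how many centers are needed before the estimates of the criteria stabilize.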

Discussion:

14- The possibility of choosing estimation arguments that affect convergence of the algorithm should be discussed more, including comparisons (e.g. numerical integration methods), drawbacks, and problems that may arise.

These points are now discussed in the manuscript, as responses to comments #1, #2, #3 and #7.

15- A comparison with other established methods to assess surrogacy should be discussed as well, motivating the use of joint frailty models. Furthermore, robustness of the software introduced with this manuscript should be discussed.

We have responded to this comment by responding to comment number #3.

Appendices:

16- There is a lot of material in the appendices - is it all necessary? I believe some of it is already included in the paper introducing the methodology (Sofeu et al., 2019, in Statistics in Medicine), and maybe readers could be referred to that instead.

We thank the reviewer for this remark. In appendices S1A, S1B and S1C, we simply recall the formulation of the penalized marginal log-likelihood, Kendall’s τ and R2trial. Full information on these points can be found in the methodological paper. We believe that this brief reminder in the appendix is necessary to understand the output of the jointSurroPenal() function. The rest of the material in the appendix is new and concerns the derivation of the surrogate threshold effect (STE) and the help on the parameters of all the functions described in the manuscript.

17- Finally, the paper is at times hard to read, with some typos and several sentences that could be edited to improve clarity (remember that PLOS ONE does not copyedit accepted manuscripts). I spotted some typos, which are included below:

As a result of your comment, the manuscript has been copyedited by a professional scientific copyeditor.

Title: “frailtypack” should start with a lowercase letter since it is the name of the package;

Keywords: “fraity” instead of “frailty”; “surrogte” instead of “surrogate”; “envent” instead of “event”

We have edited the keywords in the submission process.

Abstract: The first sentence is hard to read; “quiet” instead of “quite” in the background and objective section (line 9);

We corrected “quite” and rephrased the first sentence as follows:

“The use of valid surrogate endpoints can accelerate the development of phase III trials.”

Line 89: replace “discuss” with “discussed”

Done

Line 131: replace “measurements” with “measure”

Done

Line 134: replace “interpreting” with “interpretation”

Done

Line 136: replace “equals” with “equal”

Done

Line 225: replace “describe” with “described”

Done

Line 237: GitHub should have capitals G and H (it’s the name of the company/service)

Done

Line 241: did you mean Gb instead of Go?

Exactly

Line 342: replace “respectievely” with “respectively”

Done

Line 365: spell out “loocv”

We replaced “loocv” by “leave-one-out cross-validation criteria (loocv)”.

Line 447: replace “censorship” with “censoring”

Done

Line 448: replace “which” with “where”

Done

Reviewer #2: Sofeu et al. presented a manuscript that is an R implementation of their method published in “One-step validation method for surrogate endpoints using data from multiple randomized cancer clinical trials with failure-time endpoints. Statistics in Medicine. 2019;1–15”, with some enhanced functions for the validation of candidate surrogate endpoints. The manuscript presents a potentially useful alternative method for validating surrogate endpoints, based on the method published in the associated Statistics in Medicine paper. This package could potentially help researchers choose a more appropriate statistical method when validating surrogate endpoints. However, there is a lack of demonstration of the stated advantages in comparison to other methods, and a detailed discussion of the potential benefit of using the stated method is missing. Only a general comparison was presented in the “Discussion” section, without much depth or detail. As such, the stated advantages were not well established in this manuscript. Overall, the manuscript mostly demonstrates the technical side of the package. Therefore, this reviewer encourages the authors to consider enhancing the manuscript so that a researcher (instead of a package user) can benefit from reading it and decide how to realize its potential statistical advantages when used appropriately.

Major comments:

1. The numerous English grammar errors in the manuscript are a real problem; some of them are listed later in the minor comments. Many sentences are nearly incomprehensible. An English proofreader is needed.

As a result of your comment, the manuscript has been copyedited by a professional scientific copyeditor.

2. In the introduction section, the authors briefly discussed the pros and cons of existing methods and of their method for surrogate endpoint validation. Yet, the manuscript (and package) focuses on meta-analyses of such data. It might be obvious to the authors why such data are well suited and/or needed, but it should be made clear a) whether the method can be used for single study/center data, and b) what the rationale is for using meta-analysis data in the demonstration of the manuscript. This is vital for the readers to understand the applicability and limitations of the package/methods.

We thank the reviewer for this comment. The need for meta-analysis data when validating surrogate endpoints has been discussed by several authors (Buyse and Molenberghs 1998; Burzykowski et al. 2005; Buyse et al. 2015; Paoletti et al. 2016). It stems from some practical problems encountered when the validation approach is based on a single trial, and from the need to take heterogeneity between trials into account for the purpose of prediction outside the scope of the trial. The one-step validation approach of Sofeu et al. (2019) is based on meta-analytic (or multicenter) data. For this reason, we have included the following paragraph in the introduction of the manuscript for the sake of clarity.

“Prentice (1989) enumerated four criteria to be fulfilled by a putative surrogate endpoint. The fourth criterion, often called Prentice’s criterion, stipulates that a surrogate endpoint must capture the full treatment effect upon the true endpoint. The validation of Prentice’s criterion based on a single clinical trial was quite difficult, mainly due to a lack of power and the difficulty of verifying an assumption about the relationship between the treatment effects upon the true and the surrogate endpoints. Therefore, to verify this assumption and obtain a consistent sample size, Buyse et al. (2000), like other authors, suggested basing validation on meta-analytic (or multicenter) data. An important point when dealing with meta-analytic data is to take heterogeneity between trials into account, for the purpose of prediction outside the scope of the trial. Thus, a surrogate endpoint validated from meta-analytic data can be used to predict the treatment effect upon the true endpoint in any trial.”

3. In the associated R package, the Vignettes did not provide enough details for even the minimal example. On the other hand, this manuscript provided only marginally more information than a package Vignettes. Some statistical insights and details would make this manuscript more useful for a researcher instead of an R package user, for example, practical considerations in model estimation, interpretation of the results (like the summary presented in page 12-13), and any potential issues when choose different parameters. In addition, any practical advantages (if any) derived from these results using the method for the Ovarian Cancer data presented here would be very useful in demonstrate the benefit of this method over others. If not yet available, at least discuss potential gains by using this method instead of the others.

To provide added value to this manuscript compared to a package vignette, we now discuss the choice of the arguments/values for the main function, the management of convergence issues by varying the values of the arguments, and the robustness of the model (see the responses to comments #1, #2, #5, #9, #11 and #12 of reviewer #1). We also emphasize the benefit of proposing a function for simulation studies (see the response to comment #13 of reviewer #1). The manuscript has been reworded accordingly. The outputs of the jointSurroPenal() function are documented in the section “Summary of results”, after the illustration on the advanced ovarian cancer dataset. We have reworded this description for the sake of clarity.

Regarding the package vignette, it just describes the available models within the package with the corresponding options. However, we provide different demos for basic examples. If published, this manuscript will be referenced in the package to facilitate its usage.

Regarding the benefits of the method over the others, they were addressed in the methodological paper (Sofeu et al. 2019). Compared to the two-step copula approach, the main advantages of our model are the reduction of convergence and numerical issues, the robustness to model misspecification, the flexibility of the model, the validation in one step, and the estimation of R2trial without the need for an adjustment for estimation errors. However, we have now reworded the second paragraph of the discussion section of the manuscript in order to underline the main advantage of the package compared to the others, as follows:

“By varying the values of the arguments of the jointSurroPenal() function, convergence of the model is not always guaranteed. Therefore, in the event of convergence issues, it is important to know how to play with the argument/value pairs, as shown in the previous section. Thus, users can choose the method of integration, the initial values, the number of knots for the splines and the smoothing parameters, the number of nodes to use for the Gauss-Hermite quadrature and the number of samples to consider for the Monte-Carlo integration when applicable, the random number generator, and other necessary arguments. It is also possible to fix some parameters of the model in the event of identifiability issues. This underlines the flexibility of the frailtypack package in managing convergence issues. This flexibility is quite different from that obtained with the surrosurv package or the SAS macros for evaluating surrogate endpoints using the two-step copula model or the one-step Poisson model. Other advantages of our model compared to existing approaches are the reduction of convergence and numerical issues, the robustness to model misspecification, the surrogacy evaluation based on a one-step approach and therefore the estimation of R2trial without the need for an adjustment for estimation errors. In addition, as underlined in the illustration section, the interpretation of Kendall’s τ is different from that obtained with the two-step copula approach.”

We also note in the discussion that the package can be used beyond the scope of evaluating surrogate endpoints to estimate the fixed treatments effects, taking competing risks and the heterogeneities in the data into account.

4. It is not clear what the results in the “Simulation studies” section can be used in the context of the Ovarian Cancer data analysis, other than a brief sentence in the Discussion section. The author may want to explain how such results are useful for the stated purpose (plan future trial) there.

Although simulation studies can be used to choose optimal values for some arguments of the jointSurroPenal() function when fitting the model on the advanced ovarian cancer meta-analysis dataset, their use in the manuscript is not tied to this particular dataset. Simulation studies can be performed prior to the analysis of any meta-analytic dataset. A consistent response to this comment can be found in the response to comment #13 of reviewer #1. We have updated the manuscript accordingly.

Minor comments:

1. In Abstract the sentence “We have especially the surrogate threshold effect…” need revision to make a valid English sentence. Also “Other tools concerned data generation, studies simulation and graphic representations…”.

We have reworded these sentences.

2. Many minor grammar issues in main text.

The manuscript has now been copyedited by a professional scientific copyeditor.

Reviewer #3: This paper addresses an interesting problem, one which is numerically challenging to solve. I have comments concerning the presentation, the statistical problem, and the software. The last of these three is separated out and should be viewed as comments on possible future evolution of the package, but is not relevant to any editorial decisions about the current manuscript, which deals with the present offering of the package.

1. I found the Introduction confusing. The title of the paper leads one to believe that this is a description of a particular package, it needs an early lead in to the fact that this is NOT what the paper is about. Two or three sentences would do, e.g., "frailtypack is an R package that fits a variety of survival models containing one or more random effects, or frailties...." "it includes for instance simple shared frailty, correlated...." "...for this paper we will focus on a particular subset of features applicable to the evaluation of surrogate endpoints ..." We need just enough to orient the user so that they don't feel like they were dropped into chapter 2 of a novel.

We thank reviewer #3 for this comment. For the sake of clarity, we have modified the title of the manuscript. The new one is “How to use frailtypack for validating failure-time surrogate endpoints using individual patient data from meta-analyses of randomized controlled trials”. Also, we have added the following paragraph in the introduction section of the manuscript in order to guide the user to the part of the package concerned:

“frailtypack is an R package that fits a variety of frailty models containing one or more random effects, or shared frailties. For instance, it includes a shared frailty model, a joint frailty model for recurrent events and a terminal event, other forms of advanced joint frailty models (Krol et al. 2017), and now a joint frailty model for evaluating surrogate endpoints in meta-analyses of randomized controlled trials with failure-time endpoints. In this paper, we focus on the particular subset of features applicable to the evaluation of surrogate endpoints.”

2. The description of the competing methods, starting at about page 2 line 14, was quite difficult to follow. The paragraphs need "roadmap" sentences to help readers who have not immersed themselves in the field know what the goal of the journey (paragraph) will be. It is hard to know what is background and what is essential. (This reviewer for instance -- I know some of the overarching questions but it is not a personal research area.) Without that the information is both too much and too little.

We now include a paragraph before the description of the competing methods to explain the problem with a single trial and the rationale for the meta-analytic validation approach (see the response to comment #2 of reviewer #2). The manuscript has now been copyedited by a professional scientific copyeditor.

(page 3, lines 78-85 are nicely done.)

Thank you.

3. Equation 1 is surprising -- where are the covariates? (See line 17, which introduced a need for them.) I realize that in practical work there will be at most 1 or 2 prognostic factors that are widely enough recognized such that all studies will have gathered them. Can you sidestep the issue by creating extra strata?

In Equation 1, we include one covariate: the treatment indicator (Z_ij1), with a fixed effect and a random treatment-by-trial interaction effect to deal with the heterogeneity at the trial level in interaction with the treatment. This is different from the trial-specific treatment effects considered in the first stage of the two-step copula model of Burzykowski et al. (2001). Another advantage of our approach compared to the two-step approach is that potential prognostic factors not observed in the dataset are taken into account through the individual-level random effects. For the purpose of validating surrogate endpoints, it is therefore not necessary to adjust the model for additional prognostic factors. That said, we take your point and plan to include in the package the possibility to adjust for more prognostic factors.
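For readers without the methodological paper to hand, the hazard formulation referred to in this response can be sketched as follows (our notation, recalled from Sofeu et al., 2019; the exact symbols should be checked against Equation 1 of the manuscript):

```latex
\lambda_{S,ij}(t \mid \omega_{ij}, u_i, v_{S_i}) = \lambda_{0S}(t)\,
    \exp\left(\omega_{ij} + u_i + v_{S_i} Z_{ij1} + \beta_S Z_{ij1}\right)
\lambda_{T,ij}(t \mid \omega_{ij}, u_i, v_{T_i}) = \lambda_{0T}(t)\,
    \exp\left(\zeta \omega_{ij} + \alpha u_i + v_{T_i} Z_{ij1} + \beta_T Z_{ij1}\right)
```

where Z_ij1 is the treatment indicator for subject j in trial i, omega_ij is the shared individual-level random effect, u_i the shared trial-level random effect, and (v_Si, v_Ti) the correlated random treatment-by-trial interactions with covariance matrix Sigma_v; zeta and alpha scale the shared frailties on the true-endpoint hazard.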

When Z is a 0/1 covariate, as is often the case, then $v_S$ and $v_T$ are not identifiable for the control subjects. Does this cause issues wrt estimation of the variance of $v$?

We did not experience any identifiability problems when estimating the variance-covariance matrix (sigma_v) of the random treatment-by-trial interaction effects, given that a meta-analysis of randomized controlled trials, as considered here, always includes both untreated and treated subjects. Therefore, there is always information for estimating sigma_v.

4. The function on lines 163 and following needs some explanation. First there is no formula. Only later when reading the examples, was this reviewer able to guess that you decided that the user should use a certain set of predetermined variable names. Please add some text.

We refer users to the Appendix for details on the jointSurroPenal() function. This function does not need a formula, given that the formula is implicit when the variables in the dataset are named as we recommend. In the illustration section of the manuscript, we have added comments to guide users on the choice of the arguments for this function, in response to comment #1 of reviewer #1. Moreover, we have reworded the description of this function as follows:

“The mandatory argument of this function is data, the dataset to use for the estimations. The data argument refers to a dataframe including at least 7 variables: patientID, trialID, timeS, statusS, timeT, statusT and trt. The description of these variables, like that of the other arguments of the function, can be found in S2A Appendix in S2 Appendix, or via the R command help(jointSurroPenal). The rest of the arguments can be set to their default values. In addition, details on the required arguments/values are given in the illustration section.”

(Software suggestion: The list of options is terribly long; how is one to know what is necessary and what is optional? The pattern found in glm.control and many other packages would be a good way to separate out the secondary ones.)

We now include comments about the necessary and optional arguments in the illustration section (see response to comment #1 of the reviewer #1).

5. The double hash signs on the left of the printout are a distraction and should be removed. (When there is lots of code, little output, and the goal is to allow users to easily cut/paste, then this peculiar design choice is defensible. None of those 3 is true here.)

Done

6. The discussion of the results on pages 10-11 is an important part of the paper. The authors have several good choices in the material just prior to this section wrt moving significant portions of it to the appendix. However, several details of the discussion are very terse. For example, the statements at lines 298-302: why is a value of 6.8 “strong” and a value of 1.79 “more pronounced”?

We thank the reviewer for the comment. We now include more details for the sake of clarity. Therefore, the new comments should improve the understanding of this important part of the manuscript.

7. A first question of an estimation method is whether it works when the data set exactly fits the proposed model (if that fails then an approach is nearly useless). The far more important one is how the method works when the model is not quite true, i.e., the case for any real data set. The package's simulation modules are designed to address only the first of these, not the second, which is a serious limitation. More serious is the lack of any evaluation of the approach outside of the ideal case. Some methods are robust to such changes and some are fragile.

The robustness of the method to model misspecification was studied through simulations and detailed in the methodological paper published in Statistics in Medicine (Sofeu et al. 2019), with satisfactory results. The current manuscript aims at a wider outreach for the R package that implements this model. However, we plan to include an option in the jointSurroPenalSimul() function for simulation studies in which survival times are generated from other models (e.g. a copula model or a Poisson model). We now include some discussion in the manuscript regarding this comment, as part of the response to comment #2 of reviewer #1.

8. Figure 1 is not very helpful, and is in fact confusing. Arrows normally go from parent to child. Do you really mean to imply that the plot() function generates data that is used by the estimation functions? Why is there an arrow out the side of summary()?

As noted in the legend of Figure 1, the direction of an arrow indicates that the object from the parent function is used by the child function. Therefore, the plot() function uses the object from the jointSurroPenal() function. However, you are right to question the usefulness of this figure, since all the functions have already been defined. We have therefore removed it from the manuscript.

Figures 2 and 3 are at odds. The paper states that they are two "representations" of a fit to the same data, yet Figure 2 has median values of 2.5 and 4 months, while figure 3 shows values close to .5 and 1 year. If these are simply two different solution paths, it would argue that the software is very unstable. No discussion is provided to guide the user with respect to these discrepant results.

As indicated in the legends, Fig 2 is based on the advanced ovarian cancer meta-analysis, while Fig 3 is based on the generated data. In addition, the following comment had already been included in the manuscript in order to avoid confusion:

“Fig 3 shows another representation of the baseline survival and hazard functions for the surrogate and the true endpoints. We use the object joint.surro.sim.MCPGH for this purpose.”

We have added the expression “which is based on the generated data” at the end of the previous comment.

Minor comments (no response required)

Abstract sentence 1: " ... for accelerated effectively the phase 3 trial." I've read this 3 times and still do not know what the words mean.

The sentence has been reworded as follows:

“The use of valid surrogate endpoints can accelerate the development of phase III trials”

Abstract: "This model was quiet robust..." I think you mean 'quite robust'.

Exactly

$\alpha$ and $\zeta$ might make a little more sense attached to S rather than T. After all, death is death, but different

Attachment

Submitted filename: Response to Reviewers.pdf

Decision Letter 1

Alan D Hutson

12 Dec 2019

PONE-D-19-20769R1

How to use frailtypack for validating failure-time surrogate endpoints using individual patient data from meta-analyses of randomized controlled trials

PLOS ONE

Dear Mr. SOFEU,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Jan 26 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as a separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as a separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as a separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Joshua Jones

Academic Editor

PLOS ONE

Journal Requirements:

Additional Editor Comments (if provided):

Please attend to the minor comments of the reviewers.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I would like to thank the authors for their thorough responses to reviewers’ comments - I think the manuscript has greatly improved, and my previous comments have been mostly addressed. Well done.

However, I still have some (minor) comments:

IQWiG is first introduced in line 111, but the acronym is introduced only in line 199 - I would move the acronym to line 111 instead.

There is some style inconsistency: some R output is coloured, some is not. Please make consistent throughout the manuscript, I am not sure if this will be copy-edited later on.

Line 287: it is a bit confusing what figure are you referring to. Please reword the sentence.

In computational details and package installation: it is not necessary to specify the repos argument when installing a package. Furthermore, the {devtools} functionality to install packages from GitHub seems to have been moved to the {remotes} package (and only imported in {devtools}). I would consider referring to {remotes} rather than {devtools}, but this is completely up to you.

I just noticed that the column with the patient ID is assumed to be “patienID”. Should it be “patientID” instead? This would be more coherent with the other names, where the whole word is used.

In the section on choosing model via the LCV: I think it is still a bit difficult to compare the estimates of the two competing models. I think it might be helpful to include a separate table that compares them side by side. Besides that, I still think that the coefficients are not so similar - including the confidence intervals that show large overlap would be great too, and would sell the point better (in my opinion).

Some sentences are a bit confusing at the moment, please reword:

Lines 365-366;

Line 382: “observation on...” could be replaced by “the estimated value of…”;

Line 389: reword after the comma;

Lines 508-510.

Finally, I found some more typos:

Line 103, replace “random effect” with “random effects”

Lines 380-381, replace “compare” with “compared”

Line 552, replace “neccesary” with “necessary”

Reviewer #2: Much more detail has been provided in the revision, and the involvement of a professional typesetter improved the readability noticeably. The revision addressed all my concerns.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Jan 28;15(1):e0228098. doi: 10.1371/journal.pone.0228098.r004

Author response to Decision Letter 1


6 Jan 2020

PONE-D-19-20769R1

How to use frailtypack for validating failure-time surrogate endpoints using individual patient data from meta-analyses of randomized controlled trials

We thank the editor and the reviewers for their great interest in our manuscript. Please find below our replies to the reviewers’ comments. We have also revised the manuscript in accordance with the comments where necessary.

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I would like to thank the authors for their thorough responses to reviewers’ comments - I think the manuscript has greatly improved, and my previous comments have been mostly addressed. Well done.

However, I still have some (minor) comments:

IQWiG is first introduced in line 111, but the acronym is introduced only in line 199 - I would move the acronym to line 111 instead.

Done

There is some style inconsistency: some R output is coloured, some is not. Please make consistent throughout the manuscript, I am not sure if this will be copy-edited later on.

We thank Reviewer #1 for this comment. In fact, only the R code used to produce the outputs is colored. The uncolored code corresponds to the definition (or description) of the functions. This is the case for all functions in the section “Available functions in the frailtypack R package for surrogacy evaluation” and for the S3 method plot() in the section “Illustration”. We chose not to color the outputs.

Line 287: it is a bit confusing what figure are you referring to. Please reword the sentence.

We reworded the sentence as:

“A list of other models implemented in frailtypack [23] can be found in S1 Fig”

In computational details and package installation: it is not necessary to specify the repos argument when installing a package. Furthermore, the {devtools} functionality to install packages from GitHub seems to have been moved to the {remotes} package (and only imported in {devtools}). I would consider referring to {remotes} rather than {devtools}, but this is completely up to you.

You are right on both points. However, the repos argument indicates the repository we currently use to install the package and avoids, on some platforms (such as a Linux terminal), having to select a repository from a proposed list. Moreover, using this argument does not negatively impact the functionality of the command. Regarding the use of the “remotes” package rather than “devtools”, we have no experience with the former and, as the reviewer notes, its functionality is imported into “devtools”. We therefore chose the “devtools” package, since it provides several tools necessary for compiling the package.
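As a minimal sketch of the two installation routes discussed above (the GitHub repository path “user/frailtypack” is a placeholder for illustration, not the package’s actual location):

```r
# Install the released version from CRAN. Supplying the repos argument
# pins the mirror, so no interactive mirror selection is needed
# (useful, e.g., when running R in a Linux terminal).
install.packages("frailtypack", repos = "https://cloud.r-project.org")

# Install a development version from GitHub via devtools (the remotes
# package provides the same install_github() function);
# "user/frailtypack" is a hypothetical repository path.
# install.packages("devtools")
devtools::install_github("user/frailtypack")
```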

I just noticed that the column with the patient ID is assumed to be “patienID”. Should it be “patientID” instead? This would be more coherent with the other names, where the whole word is used.

We renamed the patienID column of the data argument to patientID, and updated the package and the manuscript accordingly.

In the section on choosing model via the LCV: I think it is still a bit difficult to compare the estimates of the two competing models. I think it might be helpful to include a separate table that compares them side by side. Besides that, I still think that the coefficients are not so similar - including the confidence intervals that show large overlap would be great too, and would sell the point better (in my opinion).

We thank Reviewer #1 for this comment. Our aim in this section is to illustrate how to use the LCV for model comparison; it is not to compare the estimates of the two competing models, given that these can differ. Using the LCV, one can choose the model best supported by the data. We rephrased the interpretation in the corresponding section as follows:

“As expected [17], the two observed values of LCV are quite similar. The summary() function applied to the previous objects gives the results shown below. When comparing the two models, the estimates of most coefficients and standard errors show some differences. However, this observation does not alter the conclusions on surrogacy validity captured by Kendall's tau and R2trial.”

Some sentences are a bit confusing at the moment, please reword:

Lines 365-366;

We rephrased the sentence as:

“By changing the value of some arguments, we obtained the following set of arguments/values which allowed convergence:”

Line 382: “observation on...” could be replaced by “the estimated value of…”;

Done

Line 389: reword after the comma;

We rephrased the sentence as:

“These parameters can be interpreted as usual, but taking adjustment on the random effects into account.”

Lines 508-510.

We rephrased the sentence as:

“By varying the values of the arguments in the jointSurroPenal function, convergence of the model is not always guaranteed. It is therefore important, in the event of convergence issues, to know how to adjust the argument/value combinations as shown in the previous section.”

Finally, I found some more typos:

Line 103, replace “random effect” with “random effects”

Done

Lines 380-381, replace “compare” with “compared”

Done

Line 552, replace “neccesary” with “necessary”

Done

Reviewer #2: Much more detail has been provided in the revision, and the involvement of a professional typesetter improved the readability noticeably. The revision addressed all my concerns.

Thank you

Attachment

Submitted filename: Response to Reviewers.pdf

Decision Letter 2

Alan D Hutson

8 Jan 2020

How to use frailtypack for validating failure-time surrogate endpoints using individual patient data from meta-analyses of randomized controlled trials

PONE-D-19-20769R2

Dear Dr. SOFEU,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Alan D Hutson

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Alan D Hutson

13 Jan 2020

PONE-D-19-20769R2

How to use frailtypack for validating failure-time surrogate endpoints using individual patient data from meta-analyses of randomized controlled trials

Dear Dr. Sofeu:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Alan D Hutson

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Package characteristics (version 3.0.3.1).

    Blue crosses indicate options available for a given type of model in the CRAN version of the package; orange crosses indicate options included in the package but not yet on CRAN. Empty cells mean that an option is not available for a given type of model. RE = Recurrent Event. TE = Terminal Event. LO = Longitudinal Outcome. STE = Surrogate Threshold Effect. ODE = Ordinary Differential Equation.

    (TIF)

    S1 Appendix. Extension of the methodology.

    (PDF)

    S2 Appendix. Description of the arguments and return values for the functions.

    (PDF)

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Data Availability Statement

    All data files are available from the frailtypack package, which can be downloaded from the Comprehensive R Archive Network (CRAN). URL: https://cran.r-project.org/web/packages/frailtypack/index.html.

