Author manuscript; available in PMC: 2025 Jan 1.
Published in final edited form as: Am Stat. 2023 Jun 26;78(1):76–87. doi: 10.1080/00031305.2023.2216253

Sensitivity Analyses of Clinical Trial Designs: Selecting Scenarios and Summarizing Operating Characteristics

Larry Han a,*, Andrea Arfè b, Lorenzo Trippa a,c
PMCID: PMC11052542  NIHMSID: NIHMS1917980  PMID: 38680760

Abstract

The use of simulation-based sensitivity analyses is fundamental for evaluating and comparing candidate designs of future clinical trials. In this context, sensitivity analyses are especially useful to assess the dependence of important design operating characteristics with respect to various unknown parameters. Typical examples of operating characteristics include the likelihood of detecting treatment effects and the average study duration, which depend on parameters that are unknown until after the onset of the clinical study, such as the distributions of the primary outcomes and patient profiles. Two crucial components of sensitivity analyses are (i) the choice of a set of plausible simulation scenarios and (ii) the list of operating characteristics of interest. We propose a new approach for choosing the set of scenarios to be included in a sensitivity analysis. We maximize a utility criterion that formalizes whether a specific set of sensitivity scenarios is adequate to summarize how the operating characteristics of the trial design vary across plausible values of the unknown parameters. Then, we use optimization techniques to select the best set of simulation scenarios (according to the criteria specified by the investigator) to exemplify the operating characteristics of the trial design. We illustrate our proposal in three trial designs.

Keywords: Clinical trial design, operating characteristics, sensitivity analysis, function approximation, simulated annealing

1. Introduction

Clinical trial designs are becoming increasingly complex to meet the multifaceted needs and goals of precision medicine. Examples of complex designs include adaptive seamless phase I/II designs for evaluating, early in the treatment development process, the dosing, safety, and activity of new drugs (Hobbs et al., 2019). Also, adaptive randomized trials with frequent interim looks at the data can evaluate one or more therapies simultaneously while attempting to minimize trial duration and resources (Thorlund et al., 2018; Berry et al., 2010). Additional examples of complex designs have been implemented in biomarker-stratified trials to evaluate the efficacy of a therapy and possible variations of treatment effects across patient subgroups (Mehta et al., 2019).

When planning a new trial, it is necessary to predict and evaluate several operating characteristics. Relevant operating characteristics can include the likelihood of selecting an effective dose with low toxicity in a phase I/II study, the probability of detecting treatment effects in a randomized study, the expected trial duration, costs, and other metrics to evaluate designs that often enroll patients from different subgroups. Multiple operating characteristics typically need to be examined jointly in order to evaluate the relevant trade-offs achieved by candidate designs, such as balancing the accuracy in estimating treatment effects and the expected study duration.

The obvious challenge for evaluating a candidate design is that the vector of operating characteristics of the study design is not known and it is difficult to estimate before the onset of the trial. Indeed, the operating characteristics are usually a function of a vector of unknown parameters that identify the distribution of all relevant variables to be captured during the trial. For example, unknown parameters can include the enrollment and drop-out rates, the magnitude of treatment effects, and the prevalence of predictive biomarkers in the trial population. Uncertainty on these parameters makes it non-trivial to evaluate whether a candidate design is appropriate for implementing the new study.

Sensitivity analyses are commonly used to account for uncertainty on unknown parameters and operating characteristics when evaluating a candidate design. They typically proceed in three steps. First, a set of plausible scenarios, i.e., specific values of the vector of unknown parameters, is selected. Next, the corresponding operating characteristics are computed using trial simulations or analytic results. Finally, based on the computed operating characteristics and their variations across the set of scenarios, the investigators evaluate whether the candidate design is appropriate to achieve the aims of the study. Throughout the manuscript, we use the terms sensitivity analysis or simulation report to indicate a set of scenarios and the associated operating characteristics which are computed to illustrate how the operating characteristics vary across plausible values of unknown parameters.

Producing a simulation report to effectively evaluate a study design has been recommended as one of the key supporting documents for interacting with the FDA (Mayer et al., 2019; Food et al., 2020). However, it can be difficult to select the set of unknown parameters, especially if the dimension of the vector of unknown parameters is moderate to high (say ≥ 5). For the investigators, it might be unclear if the selected scenarios are adequate to illustrate the variations of the operating characteristics across potential values of the unknown parameters. Similarly, for regulators, there may be skepticism as to whether the selected scenarios are chosen to highlight positive aspects of the trial design without pointing at its limitations and negative aspects (Razavi et al., 2021). Another subtle challenge is the choice of the number of scenarios. Indeed, a large number of scenarios (say 100) may simplify the task of representing how the operating characteristics vary across potential values of the unknown parameters, but a simulation report that contains too many scenarios makes it difficult to interpret and communicate the included results.

We propose a method to choose an optimal set of scenarios for a simulation report that will provide relevant operating characteristics. This decision is based on a utility criterion, which formalizes the ability of any set of scenarios to represent the map between the unknown parameters and the operating characteristics. In some cases, we will consider a restriction of the parameter space to focus only on regions of plausible values of the unknown parameters. The utility criterion assigns high (low) utility to a set of scenarios if the table of potential unknown parameters and operating characteristics is an accurate (inaccurate) summary of how the design's operating characteristics vary across the considered parameter space. We call the set of scenarios that maximizes the utility criterion the Representative and Optimal Sensitivity Analysis (ROSA) scenarios. To select the ROSA scenarios, we introduce a computational procedure that leverages (i) flexible regression methods like neural networks (NNs) (Goodfellow et al., 2016) and (ii) optimization algorithms like simulated annealing (Bélisle, 1992). Our approach is applicable to any trial design, regardless of the number of unknown parameters and the number of operating characteristics.

In summary, we propose ROSA as a computational tool that allows one to examine any clinical trial design by selecting a parsimonious set of simulation scenarios with the goal of representing the variations of the operating characteristics across plausible values of unknown parameters. To illustrate this approach, we conduct sensitivity analyses for three trial designs. The first is a two-arm randomized design that aims to test and estimate the effects of an experimental treatment compared to the standard of care (SOC). The second is a multi-stage randomized trial that leverages an auxiliary/surrogate outcome $S$ measured shortly after randomization for interim decisions and a primary outcome $Y$ with a longer ascertainment time (Niewczas et al., 2019). The third is a biomarker-adaptive enrichment design similar to the design of the TAPPAS trial (Mehta et al., 2019), a randomized phase III trial comparing TRC105 and pazopanib versus pazopanib alone in patients with advanced angiosarcoma (Jenkins et al., 2011; Jones et al., 2017). In the first design, we consider a single unknown parameter and a single operating characteristic, whereas for the latter two designs we consider multiple unknown parameters and multiple operating characteristics.

2. Selecting sensitivity scenarios

2.1. Notation and problem set-up

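We introduce our procedure to select $K$ sensitivity scenarios $\theta_1, \ldots, \theta_K \in \Theta$, where $\Theta$ is the set of potential values of the unknown parameters $\theta$. We assume that $\Theta$ is a bounded subset of $\mathbb{R}^d$ and use the notation $\|\cdot\|_2$ to indicate the Euclidean norm on $\mathbb{R}^d$. We will restrict $\Theta$ to a subset $\Theta' \subset \Theta$ when there is sufficient prior information from completed studies or clinical experience. We identify ROSA scenarios $\theta_1^*, \ldots, \theta_K^*$ as the scenarios that maximize a utility criterion $U$,

$$(\theta_1^*, \ldots, \theta_K^*) = \arg\max_{\theta_1, \ldots, \theta_K} U(\theta_1, \ldots, \theta_K), \tag{1}$$

where

$$U(\theta_1, \ldots, \theta_K) = -\max_{\theta \in \Theta} \min_{k=1,\ldots,K} D\big(f(\theta), f(\theta_k)\big). \tag{2}$$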

We can symmetrically define the corresponding loss function $L = -U$ by inverting the sign in equation (2). Here, $D(f(\theta), f(\theta_k))$ is a metric between the operating characteristics $f(\theta) = (f_1(\theta), \ldots, f_R(\theta))$ and $f(\theta_k) = (f_1(\theta_k), \ldots, f_R(\theta_k))$. We will consider metrics of the form

$$D\big(f(\theta), f(\theta_k)\big) = \sum_{r=1}^{R} w_r \, \| f_r(\theta) - f_r(\theta_k) \|_2,$$

where $w_1, \ldots, w_R$ are non-negative weights that sum to one. The weights can be user-specified to calibrate the relative importance of different operating characteristics. Setting the weights to $1/R$ results in equal weighting for each operating characteristic.
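As a minimal sketch of the weighted metric, assuming each operating characteristic $f_r$ is a scalar (so the norm reduces to an absolute value), the distance between two vectors of operating characteristics could be computed as follows. The function name `weighted_distance` and the example values are hypothetical and not taken from the paper.

```python
import numpy as np

def weighted_distance(f_theta, f_theta_k, weights=None):
    """Weighted distance D between two R-vectors of operating characteristics.

    Assumes each operating characteristic is a scalar, so the norm of each
    component difference is simply its absolute value.
    """
    f_theta = np.asarray(f_theta, dtype=float)
    f_theta_k = np.asarray(f_theta_k, dtype=float)
    if weights is None:
        # Equal weighting: w_r = 1/R for every operating characteristic
        weights = np.full(f_theta.shape, 1.0 / f_theta.size)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    return float(np.sum(weights * np.abs(f_theta - f_theta_k)))

# Illustrative values only: (power, rescaled expected sample size)
example_distance = weighted_distance([0.80, 0.50], [0.70, 0.60])
```

With equal weights, the two component differences of 0.10 each give a distance of about 0.10. Operating characteristics on very different scales would typically need rescaling before being combined this way.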

We can now provide an explicit interpretation of the utility function $U$ in equation (2). Consider a set of scenarios $\theta_1, \ldots, \theta_K$ (the order of the entries is not relevant) and an arbitrary scenario $\theta$ in $\Theta$. For $1 \le k \le K$, the metric $D(f(\theta), f(\theta_k))$ is a summary of the differences between the operating characteristics at $\theta$ and the same operating characteristics when we consider the $k$-th scenario $\theta_k$. Therefore, $\min_{k=1,\ldots,K} D(f(\theta), f(\theta_k))$ can be viewed as an approximation error between $f(\theta)$ and a similar vector of operating characteristics selected among our $K$ options $f(\theta_1), \ldots, f(\theta_K)$. Expression (2) identifies through the maximization operator the worst case (with highest approximation error) that we can obtain by varying $\theta$ in $\Theta$. We maximize the utility function $U$ and use $\theta_1^*, \ldots, \theta_K^*$ to indicate the ROSA scenarios. Alternative utility criteria and loss functions are described later in the manuscript. A table of notation used throughout the paper is provided in Table 1 below.

Table 1.

Notation

| Symbol | Description |
| --- | --- |
| $\Theta$ | Unknown parameter space in $\mathbb{R}^d$ |
| $\Theta'$ | Restricted unknown parameter subspace by prior knowledge in $\mathbb{R}^d$ |
| $\Theta_{re}$ | Restricted unknown parameter subspace by prior knowledge and fixing certain dimensions in $\mathbb{R}^d$ |
| $\Theta_F$ | Diffuse and finite unknown parameter subspace in $\mathbb{R}^d$ |
| $\theta = (\theta_1, \ldots, \theta_d)$ | $d$-dimensional vector of unknown parameters |
| $\theta^t = (\theta_1^t, \ldots, \theta_d^t)$ | $d$-dimensional training vector of unknown parameters |
| $\theta^v = (\theta_1^v, \ldots, \theta_d^v)$ | $d$-dimensional validation vector of unknown parameters |
| $\theta_1, \ldots, \theta_K$ | A set of $K$ sensitivity scenarios |
| $S = \{\theta_1^*, \ldots, \theta_K^*\}$ | The ROSA set of $K$ sensitivity scenarios optimizing loss $L$ |
| $S_r = \{\theta_{1,r}^*, \ldots, \theta_{K,r}^*\}$ | The ROSA set of $K$ sensitivity scenarios optimizing marginal loss $L_r$ |
| $f(\theta)$ | $R$-vector of operating characteristics for unknown parameters $\theta$ |
| $\hat{f}(\theta)$ | Estimated $R$-vector of operating characteristics for unknown parameters $\theta$ |
| $\bar{f}(\theta)$ | Average across $M$ simulations of the $R$-vector of operating characteristics for unknown parameters $\theta$ |
| $\varphi(Z_{j,m}, \theta_j)$ | Generic function to capture if a null hypothesis has been rejected, where $Z_{j,m}$ is the $m$th trial under the $j$th scenario, $\theta_j$ |
| $L(\theta_1, \ldots, \theta_K)$ | Loss function |
| $U(\theta_1, \ldots, \theta_K)$ | Utility criterion |
| $w_1, \ldots, w_R$ | Fixed non-negative weights for operating characteristics $f_1, \ldots, f_R$ |
| $\omega_1, \omega_2$ | Weights for stage 1 and 2 p-values |
| $D(\cdot, \cdot)$ | Pre-specified distance metric |
| $z_1^i, \ldots, z_K^i$ | Gaussian noise in iteration $i$ of simulated annealing |
| $\rho_i$ | Acceptance probability in iteration $i$ of simulated annealing |
| $T_0, T_1, \ldots, T_I$ | Decreasing sequence of positive numbers (cooling schedule of simulated annealing) |
| $r$ | Multiplicative reduction factor for simulated annealing in $(0, 1)$ |
| $U_i$ | Random variable distributed Uniform(0,1) for simulated annealing |
| $e$ | Enrollment rate in $(0, \infty)$ |
| $N_a$ | Planned number of patients on arm $a = 0, 1$ at the final analysis |
| $n_a$ | Planned number of patients on arm $a = 0, 1$ at the interim analysis |
| $S$ | Binary auxiliary outcome |
| $Y$ | Primary outcome |
| $p_a$ | Response probability $P(Y = 1 \mid A = a)$ |
| $\Delta = p_1 - p_0$ | Treatment effect on $Y$ |
| $q_a$ | Response probability $P(S = 1 \mid A = a)$ |
| $\rho_a$ | Correlation between $Y$ and $S$ in $A = a$ |

2.2. An example with a geometric interpretation

To provide a geometric interpretation of the utility criterion $U$, we illustrate how one set of $K$ scenarios can be preferable to a different set of $K$ scenarios (Figure 1). Specifically, suppose we aim to design a single-arm trial with an interim analysis that allows for early stopping for futility. The goal of the trial is to compare the response rate of an experimental drug, $\theta_1$, with that of the SOC, $\theta_0$, at the end of the study. However, because study patients only receive the experimental drug, the response rate under the SOC, $\theta_0$, is estimated by $\hat{\theta}_0$ before the onset of the study, for example using data from a previous trial. At the interim analysis, the trial may stop for futility if the preliminary evidence of positive treatment effects, $\Delta_{\text{interim}}$, is insufficient to continue the study. During the final analysis, the null hypothesis $H_0: \theta_1 \le \hat{\theta}_0$ (the experimental therapy is not superior to the historical control) is tested against the alternative hypothesis $H_1: \theta_1 > \hat{\theta}_0$ (the experimental therapy is superior to the historical control). In this design, $\theta = (\theta_0, \theta_1)$ are the unknown parameters, and $\Theta = [0, 1]^2$. Suppose that there are two operating characteristics of interest: (i) $f_1$, the probability of a positive result ($H_0$ is rejected) and (ii) $f_2$, the expected sample size.

Fig. 1.

Geometric representation of an arbitrary scenario $\theta$ and two proposed sets of scenarios. (Left) Parameter space $\Theta = [0, 1]^2$ with arbitrary scenario $\theta$ (orange triangle) and two sets of proposed scenarios $\theta_1^1, \ldots, \theta_6^1$ (blue points) and $\theta_1^2, \ldots, \theta_6^2$ (red points). (Right) The set of operating characteristics $f(\Theta)$ coincides with the irregular shape. The operating characteristics of $\theta$ and two sets of scenarios are illustrated. The radius of the dotted circles (with blue points as centers) is the value of the loss $L$ associated with the blue points. ROSA scenarios minimize the loss $L$, which in turn is equal to the radius of the dotted circles that cover the operating characteristic surface $f(\Theta)$.

The left panel of Figure 1 is a representation of $\Theta$. We are interested in the two operating characteristics of the single-arm design. Two sets of $K = 6$ scenarios are proposed. The first set of scenarios $\theta_1^1, \ldots, \theta_6^1$ (blue points) is chosen by varying both unknown parameters at the same time, while the second set $\theta_1^2, \ldots, \theta_6^2$ (red points) is chosen by varying only $\theta_0$ while fixing the value of $\theta_1$. The two sets of scenarios, the corresponding operating characteristics, and associated loss $L = -U$ are represented in the right panel of Figure 1. The first set of scenarios (blue points) is preferred over the second set (red points) because it is more representative of the variation of the operating characteristics over $\Theta$. Geometrically, the loss $L(\theta_1^1, \ldots, \theta_6^1)$ associated with the blue points is identical to the minimum radius of the circles with centers $f(\theta_1^1), \ldots, f(\theta_6^1)$ (see Figure 1) necessary to cover the operating characteristics surface $f(\Theta)$.

2.3. Estimating the operating characteristics

We describe an algorithm to numerically approximate the operating characteristics $f(\theta)$ for every $\theta \in \Theta$. This is necessary to solve the optimization problem in equation (2). Indeed, in most cases the function $f(\theta)$ cannot be computed in closed form.

We briefly outline our four-step procedure. In the first step, we choose a large number $J$ (say $J = 1000$) of training scenarios $\theta_1^t, \ldots, \theta_J^t$. In the second step, we use Monte Carlo simulations to obtain estimates $\bar{f}(\theta_1^t), \ldots, \bar{f}(\theta_J^t)$ of $f(\theta_1^t), \ldots, f(\theta_J^t)$. In the third step, we train a flexible regression model (we use NNs in our implementation) based on the data points $(\theta_1^t, \bar{f}(\theta_1^t)), \ldots, (\theta_J^t, \bar{f}(\theta_J^t))$. The output of this step is a regression function $\hat{f}(\theta)$ that is easy to compute at any $\theta \in \Theta$ and that approximates $f(\theta)$. In the fourth step, we validate the regression model based on $J'$ (say $J' = 200$) independent simulations $(\theta_1^v, \bar{f}(\theta_1^v)), \ldots, (\theta_{J'}^v, \bar{f}(\theta_{J'}^v))$. Steps 1–3 of this procedure are summarized in Algorithm 1. Step 4 is described in Algorithm 2.

In more detail, in step 1, to select the training scenarios $\theta_1^t, \ldots, \theta_J^t$, we randomly select $J$ scenarios in $\Theta$ using Latin hypercube sampling (LHS) (McKay et al., 2000). LHS generates $J$ scenarios by first partitioning the range of each of the $d$ unknown parameters into $J$ non-overlapping intervals and selecting one value from each interval at random. The $J$ values obtained for the first unknown parameter $\theta_1$ are randomly paired with the $J$ values obtained for the second, $\theta_2$, and so on for all $d$ unknown parameters, to form $J$ $d$-tuples, which constitute the training scenarios $\theta_1^t, \ldots, \theta_J^t$.
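The LHS construction described above can be sketched in a few lines of NumPy, assuming a rectangular parameter space and equal-width intervals in each dimension. The function name `latin_hypercube` and the example bounds are hypothetical and chosen only for illustration.

```python
import numpy as np

def latin_hypercube(J, bounds, rng):
    """Draw J scenarios from a d-dimensional box by Latin hypercube sampling.

    bounds: array-like of shape (d, 2) with (lower, upper) per parameter.
    """
    bounds = np.asarray(bounds, dtype=float)
    d = bounds.shape[0]
    # One uniform draw inside each of the J equal-width intervals, per dimension
    unit = (np.arange(J)[:, None] + rng.random((J, d))) / J
    # Shuffle each column independently so values are randomly paired across dimensions
    for col in range(d):
        unit[:, col] = rng.permutation(unit[:, col])
    # Rescale from the unit cube to the requested box
    return bounds[:, 0] + unit * (bounds[:, 1] - bounds[:, 0])

rng = np.random.default_rng(2023)
training_scenarios = latin_hypercube(10, [(0.0, 1.0), (-5.0, 25.0)], rng)
```

Each column of `training_scenarios` then contains exactly one value in each of the 10 intervals of the corresponding parameter range.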

In step 2, we estimate the operating characteristics of the trial design. In the paper, we focus on operating characteristics that can be defined as expected values, which are often of great interest, e.g., bias, power, duration of the trial, etc. One approach to handling unbounded operating characteristics (e.g. median squared error) is to apply simple transformations in such a way that these operating characteristics can be expressed as expected values, i.e., so that we can write down

$$f(\theta) = E_\theta\big[\varphi(Z, \theta)\big]$$

for some function $\varphi$, where the random vector $Z$ represents the data generated during the trial (including the collection of treatment assignment indicators and realized patient outcomes) under scenario $\theta$. In practice, to estimate $f(\theta)$, we proceed as follows. First, for each of the training scenarios $\theta_j^t$, $1 \le j \le J$, we simulate $M$ (say $M = 200$) clinical trials following the trial design. We then use the $M$ scenario-specific simulated trials to compute the estimate

$$\bar{f}(\theta_j^t) = M^{-1} \sum_{m=1}^{M} \varphi(Z_{j,m}, \theta_j^t), \quad 1 \le j \le J,$$

where $Z_{j,m}$ is the $m$th trial dataset simulated under the $j$th training scenario $\theta_j^t$. Throughout the manuscript $\varphi(Z, \theta)$ will take values in a compact set. For example, $\varphi$ can be the indicator that captures if a null hypothesis of interest has been rejected at the end of the study, or the duration of the simulated trial. One possibility for handling unbounded operating characteristics (e.g. median squared error) is to apply monotone transformations, from the real line to the unit interval, that rescale the operating characteristics. In this case, the selection of representative scenarios would be influenced by the specific monotone map used to express the operating characteristic.
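The Monte Carlo estimate $\bar{f}(\theta)$ can be illustrated with a toy example. The sketch below assumes a hypothetical two-arm trial with normal outcomes and a one-sided z-test with known standard deviation; the design constants (`n_per_arm`, `sigma`, `z_crit`) and all function names are illustrative assumptions, not the designs analyzed in the paper.

```python
import numpy as np

def simulate_trial(theta, rng, n_per_arm=15, sigma=30.0):
    """Simulate one hypothetical two-arm trial with treatment effect theta."""
    control = rng.normal(0.0, sigma, n_per_arm)
    treated = rng.normal(theta, sigma, n_per_arm)
    return control, treated

def phi_reject(data, sigma=30.0, z_crit=1.645):
    """phi(Z, theta): 1.0 if the one-sided z-test rejects H0, else 0.0."""
    control, treated = data
    se = sigma * np.sqrt(1.0 / len(control) + 1.0 / len(treated))
    z = (treated.mean() - control.mean()) / se
    return float(z > z_crit)

def estimate_oc(theta, M, rng):
    """Average of phi over M simulated trials under scenario theta."""
    return float(np.mean([phi_reject(simulate_trial(theta, rng)) for _ in range(M)]))

rng = np.random.default_rng(0)
power_estimates = {theta: estimate_oc(theta, M=200, rng=rng) for theta in (0.0, 15.0, 30.0)}
```

Because `phi_reject` takes values in {0, 1}, the estimate is a rejection frequency, and its Monte Carlo error shrinks at the usual rate of one over the square root of `M`.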

In step 3, we have only two inputs, the scenarios $\theta_j^t$ and the estimates $\bar{f}(\theta_j^t)$, $1 \le j \le J$, to fit a function $\hat{f}(\theta)$. For example, one could use NNs, splines (Bookstein, 1989), or Gaussian processes (Rasmussen, 2003). We use NN regression functions in our applications because these are easy to compute using widely available software and have been demonstrated to have good performance (Leshno et al., 1993; Hornik, 1991; Goodfellow et al., 2016).

Algorithm 1: Obtaining a function $\hat{f}$ that approximates the operating characteristic function $f$
Input: Trial design, parameter space $\Theta$, $J$, $M$
Step 1: Select $J$ scenarios $\theta_1^t, \ldots, \theta_J^t \in \Theta$
Step 2: for $j = 1$ to $J$ do
    Simulate $M$ trials
    Obtain approximate operating characteristics $\bar{f}(\theta_j^t) = M^{-1} \sum_{m=1}^{M} \varphi(Z_{j,m}, \theta_j^t)$, where $\varphi$ is a function of $Z_{j,m}$, the $m$th trial dataset simulated under the $j$th scenario
end
Step 3: Obtain an approximation of the operating characteristics $f$ by training a regression algorithm, for example a NN model, using the data points $(\theta_j^t, \bar{f}(\theta_j^t))$, $1 \le j \le J$
Output: Function $\hat{f}(\theta)$

In step 4 (Algorithm 2), we investigate the differences between $f$ and $\hat{f}$. Specifically, we first select at random $J'$ validation scenarios $\theta_1^v, \ldots, \theta_{J'}^v$ independently of the previous computations (steps 1–3) and simulate $M'$ trials (say $M' = 500$) for each $\theta_j^v$, $1 \le j \le J'$. Based on the results of the simulated trials, for each $j$, we then compute Monte Carlo estimates $\bar{f}(\theta_j^v) = M'^{-1} \sum_{m=1}^{M'} \varphi(Z_{j,m}, \theta_j^v)$ of the operating characteristics $f(\theta_j^v)$. For several important operating characteristics (e.g., average sample size, expected duration, power, type 1 error), the estimator $\bar{f}(\theta_j^v)$ is unbiased. Finally, we compare the estimates $\bar{f}(\theta_j^v)$ and the independent estimates $\hat{f}(\theta_j^v)$. We use summary statistics and graphs to evaluate the differences $\hat{f}(\theta_j^v) - \bar{f}(\theta_j^v)$. If the approximation $\hat{f}(\theta_j^v)$ is not adequate, we can use a different regression methodology, increase the numbers $M$, $M'$ of trials, or increase the number $J$ of training scenarios in Algorithm 1.

Algorithm 2: Validating the approximation of the operating characteristics
Input: Approximation of the operating characteristics $\hat{f}$, trial design, $J'$
Randomly select $J'$ scenarios $\theta_1^v, \ldots, \theta_{J'}^v \in \Theta$ independently from previous computations (Algorithm 1)
for $j = 1$ to $J'$ do
    Simulate $M'$ trials $Z_{j,m}$
    Compute $\bar{f}(\theta_j^v) = M'^{-1} \sum_{m=1}^{M'} \varphi(Z_{j,m}, \theta_j^v)$
    Compute $\hat{f}(\theta_j^v)$
end
Output: Set of differences $\hat{f}(\theta_j^v) - \bar{f}(\theta_j^v)$ and scatterplots to jointly visualize the operating characteristic estimates $\bar{f}(\theta_j^v)$ and the independent estimates $\hat{f}(\theta_j^v)$, $1 \le j \le J'$. Compute summaries of the differences (e.g., median, range, or other descriptive statistics).
Interpretation: Differences between $\bar{f}(\theta_j^v)$ and the independent estimates $\hat{f}(\theta_j^v)$, $1 \le j \le J'$, consistently close to zero provide evidence that $\hat{f}$ is an accurate approximation of $f$.
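The summaries produced at the end of Algorithm 2 could be computed along the following lines. This is only a sketch: the function name `validation_summary` and the choice of summaries (median, range, largest absolute difference) are assumptions, and the input values are made up for illustration.

```python
import numpy as np

def validation_summary(f_hat_values, f_bar_values):
    """Descriptive statistics of the differences f_hat - f_bar over validation scenarios."""
    diffs = np.asarray(f_hat_values, dtype=float) - np.asarray(f_bar_values, dtype=float)
    return {
        "median": float(np.median(diffs)),
        "min": float(diffs.min()),
        "max": float(diffs.max()),
        "max_abs": float(np.abs(diffs).max()),
    }

# Illustrative values only: regression predictions vs. independent Monte Carlo estimates
summary = validation_summary([0.10, 0.50, 0.90], [0.12, 0.49, 0.93])
```

Since each Monte Carlo estimate carries its own simulation error, differences of the order of that error do not by themselves indicate a poor approximation.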

2.4. Approximating the loss function

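After computing $\hat{f}$ (Algorithm 1) and validating its accuracy (Algorithm 2), we use it to approximate the loss function $L(\theta_1, \ldots, \theta_K)$. To proceed, we choose a diffuse and finite subset of the parameter space $\Theta_F \subset \Theta$. For example, $\Theta_F$ can include 100,000 random points from a distribution with support $\Theta$. When $\Theta_F$ contains a large number of random points that are distributed over $\Theta$, under minimal assumptions (e.g., compact $\Theta$ and operating characteristics with bounded range),

$$L(\theta_1, \ldots, \theta_K) = \max_{\theta \in \Theta} \min_{k=1,\ldots,K} D\big(f(\theta), f(\theta_k)\big) \approx \max_{\theta \in \Theta_F} \min_{k=1,\ldots,K} D\big(\hat{f}(\theta), \hat{f}(\theta_k)\big) = \hat{L}(\theta_1, \ldots, \theta_K).$$

To summarize, we can approximate the loss function $L(\theta_1, \ldots, \theta_K)$ over the entire parameter space $\Theta$ by $\hat{L}(\theta_1, \ldots, \theta_K)$ using a diffuse and finite subset $\Theta_F$.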
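A minimal sketch of this approximation is given below, assuming scalar operating characteristics and a hypothetical surrogate `toy_surrogate` (a logistic curve) in place of a fitted regression function. The scenario values and the uniform sampling of $\Theta_F$ are illustrative choices.

```python
import numpy as np

def toy_surrogate(theta):
    """Hypothetical stand-in for a fitted f-hat: maps (n, 1) scenarios to (n, 1) values."""
    return 1.0 / (1.0 + np.exp(-(theta - 10.0) / 3.0))

def approx_loss(scenarios, theta_F, f_hat, weights):
    """Approximate loss: max over theta_F of the distance to the nearest scenario."""
    oc_grid = f_hat(theta_F)    # shape (n_F, R)
    oc_scen = f_hat(scenarios)  # shape (K, R)
    # Weighted distance D for every (grid point, scenario) pair, shape (n_F, K)
    dist = np.abs(oc_grid[:, None, :] - oc_scen[None, :, :]) @ weights
    return float(dist.min(axis=1).max())

rng = np.random.default_rng(1)
theta_F = rng.uniform(-5.0, 25.0, size=(100_000, 1))  # diffuse finite subset of Theta
scenarios = np.array([[5.17], [10.0], [14.83]])       # K = 3 candidate scenarios
loss_value = approx_loss(scenarios, theta_F, toy_surrogate, np.array([1.0]))
```

For larger $K$ or more operating characteristics the pairwise array grows as $|\Theta_F| \times K \times R$, so the points in `theta_F` may need to be processed in batches.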

2.5. Optimization by simulated annealing

We now aim to approximately minimize the loss function $L$. To illustrate the need for approximate solutions, consider the setting of a single unknown parameter ($d = 1$), a finite $\Theta$, and an easy-to-compute loss function $L$. Even in this simple setting, identifying $\theta_1^*, \ldots, \theta_K^* \in \Theta$ can be challenging. For example, to select $K = 10$ representative scenarios $\theta_1^*, \ldots, \theta_K^*$ from 1000 points $\{\theta_j; 1 \le j \le 1000\} = \Theta$, the loss function $L$ would need to be calculated for $2.63 \times 10^{23}$ different possible sets $\{\theta_1, \ldots, \theta_K\}$. In what follows, we describe the use of simulated annealing (Algorithm 3), a simple strategy to reduce the outlined computational burden, regardless of whether $\Theta$ is finite or not (Kirkpatrick et al., 1983; Bélisle, 1992; Spall, 2005).
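The count quoted above is the number of unordered subsets of size 10 drawn from 1000 points, which can be checked directly with the standard library:

```python
import math

# Number of distinct sets of K = 10 scenarios chosen from 1000 candidate points
n_sets = math.comb(1000, 10)
print(f"{n_sets:.2e}")  # about 2.63e+23
```

Exhaustive evaluation of the loss over all of these sets is clearly infeasible, which motivates a stochastic search.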

The simulated annealing algorithm proceeds as follows. First, initial scenarios $\theta_1^1, \ldots, \theta_K^1$ are proposed, for example by sampling $\theta_1^1, \ldots, \theta_K^1$ from a probability distribution with support $\Theta$. Then, iteratively for $1 \le i \le I$, the current scenarios $\theta_1^i, \ldots, \theta_K^i$ are perturbed by adding to them zero-mean noise variables $z_1^i, \ldots, z_K^i$, thus obtaining new proposed scenarios $\theta_1', \ldots, \theta_K'$ (this step is represented by the "Perturb" operator in Algorithm 3). At each iteration, the proposed scenarios $\theta_1', \ldots, \theta_K'$ can either be accepted (i.e., $(\theta_1^{i+1}, \ldots, \theta_K^{i+1}) = (\theta_1', \ldots, \theta_K')$) or rejected (i.e., $(\theta_1^{i+1}, \ldots, \theta_K^{i+1}) = (\theta_1^i, \ldots, \theta_K^i)$). The acceptance or rejection of the proposed scenarios is stochastic, with probability $\rho_i$ (defined below), which is a function of $L(\theta_1', \ldots, \theta_K')$ and $L(\theta_1^i, \ldots, \theta_K^i)$.

The acceptance probability $\rho_i$ is equal to 1 when $L(\theta_1', \ldots, \theta_K') < L(\theta_1^i, \ldots, \theta_K^i)$. That is, if the proposed scenarios decrease the current loss value, then the proposed scenarios are accepted. If instead $L(\theta_1', \ldots, \theta_K') \ge L(\theta_1^i, \ldots, \theta_K^i)$, then $\rho_i$ is

$$\rho_i = \exp\left\{ \frac{L(\theta_1^i, \ldots, \theta_K^i) - L(\theta_1', \ldots, \theta_K')}{T_i} \right\},$$

where $T_i$, $0 \le i \le I$, is a decreasing sequence of positive real numbers often called the "cooling schedule" of the algorithm. A common cooling schedule is $T_i = T_0 r^{i-1}$, where $T_0$ is a constant and $r \in (0, 1)$ is a multiplicative contraction, but other forms are possible (Spall, 2005). In our applications, we use a piecewise-constant cooling schedule (Husmann et al., 2017).

After simulating the outlined Markov chain for a fixed number $I$ of iterations, the final set of scenarios $\theta_1^{I+1}, \ldots, \theta_K^{I+1}$ approximately minimizes the loss function $L$ (Bélisle, 1992). In our ROSA implementation, we use multiple independent replicates of Algorithm 3, with different initial scenarios $\theta_1^1, \ldots, \theta_K^1$, to investigate convergence of the random trajectory $\{(\theta_1^i, \ldots, \theta_K^i); i \ge 1\}$. Intuitively, we evaluate whether the replicated trajectories, with different starting values, terminate with nearly identical final vectors $\theta_1^{I+1}, \ldots, \theta_K^{I+1}$ and negligible differences in the $L(\theta_1^{I+1}, \ldots, \theta_K^{I+1})$ values.

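Algorithm 3: Pseudocode for simulated annealing to obtain ROSA scenarios
Initialize the values of $\theta_1^1, \ldots, \theta_K^1$, e.g., by sampling from a distribution over $\Theta$
for $i = 1$ to $I$ do
    New proposal $(\theta_1', \ldots, \theta_K') \leftarrow \text{Perturb}(\theta_1^i, \ldots, \theta_K^i)$
    if $L(\theta_1', \ldots, \theta_K') < L(\theta_1^i, \ldots, \theta_K^i)$ then
        Define $\theta_j^{i+1} = \theta_j'$ for every $j = 1, \ldots, K$
    else
        Compute the acceptance probability $\rho_i = \exp\{[L(\theta_1^i, \ldots, \theta_K^i) - L(\theta_1', \ldots, \theta_K')] / T_i\}$
        Sample $U_i \sim \text{Uniform}(0, 1)$
        If $U_i \le \rho_i$, define $\theta_j^{i+1} = \theta_j'$ for every $j = 1, \ldots, K$
        Otherwise $\theta_j^{i+1} = \theta_j^i$ for every $j = 1, \ldots, K$
    end
end
Output: $\theta_1^{I+1}, \ldots, \theta_K^{I+1}$, $L(\theta_1^{I+1}, \ldots, \theta_K^{I+1})$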
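Algorithm 3 can be sketched in Python for a one-dimensional toy problem. The sketch assumes a hypothetical operating characteristic `toy_oc` (a logistic curve), Gaussian perturbations clipped to the parameter bounds, and the geometric cooling schedule $T_i = T_0 r^{i-1}$ rather than the piecewise-constant schedule used in the paper's applications. All function names and tuning values are illustrative.

```python
import numpy as np

def toy_oc(theta):
    """Hypothetical stand-in for the fitted surrogate: a smooth monotone curve."""
    return 1.0 / (1.0 + np.exp(-(theta - 10.0) / 3.0))

def loss(scenarios, grid_values):
    """Worst-case distance from any grid value to its nearest scenario value."""
    scen_values = toy_oc(scenarios)
    return np.abs(grid_values[:, None] - scen_values[None, :]).min(axis=1).max()

def simulated_annealing(K, bounds, n_iter=5000, T0=1.0, r=0.998, step_sd=1.0, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    grid_values = toy_oc(np.linspace(lo, hi, 400))  # finite subset of the parameter space
    current = rng.uniform(lo, hi, K)                # initial scenarios
    current_loss = loss(current, grid_values)
    for i in range(1, n_iter + 1):
        T_i = T0 * r ** (i - 1)                     # geometric cooling schedule
        # "Perturb": add zero-mean Gaussian noise, then clip to the bounds
        proposal = np.clip(current + rng.normal(0.0, step_sd, K), lo, hi)
        proposal_loss = loss(proposal, grid_values)
        if proposal_loss < current_loss:
            accept = True                           # improvements are always accepted
        else:
            rho = np.exp((current_loss - proposal_loss) / T_i)
            accept = rng.random() <= rho            # worse proposals accepted with prob. rho
        if accept:
            current, current_loss = proposal, proposal_loss
    return np.sort(current), float(current_loss)

rosa_scenarios, rosa_loss = simulated_annealing(K=3, bounds=(-5.0, 25.0))
```

Running the function with several values of `seed` mirrors the replicate-based convergence check described in the text: the final scenarios and loss values should be close across replicates.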

3. Applications: Sensitivity analyses of trial designs

We illustrate the ROSA approach by performing sensitivity analyses for three designs of different complexity levels. In each example, we describe the design of the trial, the unknown parameters, and the operating characteristics of interest. By illustrating the ROSA methodology in three trial designs, we show its flexibility with potential applications to evaluate nearly any clinical trial design. Indeed, ROSA only requires the possibility of simulating the trials under potential unknown parameters $\theta \in \Theta$ and the definition of the operating characteristics of interest.

3.1. Application 1: Two-arm RCT

In the first example, we will only consider a single unknown parameter (i.e., $\theta \in \mathbb{R}$) and a single operating characteristic $f(\theta)$ that can be computed analytically. In this case, the optimal set of scenarios $\theta_1^*, \ldots, \theta_K^*$ can be computed exactly, without resorting to approximation methods. This simple and stylized setting is useful to highlight the similarity of the approximations and selected scenarios computed by ROSA with their exact counterparts.

3.1.1. Trial design

We consider the design of a two-arm randomized trial (1:1 randomization ratio) with a sample of $n = 30$ patients. For each $i = 1, \ldots, n$, we let $A_i = 0$ or $1$ if the $i$-th study patient is assigned to the control or experimental arm. The outcomes of the $n$ study patients are $Y_1, \ldots, Y_n$, which we assume to be independent and normally distributed. If $A_i = a$ then $Y_i$ has mean $\mu_a$ and standard deviation $\sigma$ equal to 30. In the analysis of the study, a z-statistic will be used to test the null hypothesis $H_0: \mu_1 - \mu_0 \le 0$ against the alternative $H_1: \mu_1 - \mu_0 > 0$ at the 5% significance level.

3.1.2. Aim of the sensitivity analysis

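The goal of the sensitivity analysis is to summarize the variation of the probability of rejecting $H_0$, a function $f(\theta)$ of the unknown treatment effect $\theta = \mu_1 - \mu_0 \in \Theta = \mathbb{R}$. For example, if we knew that $\theta = 13.5$, then $f(\theta) = 0.80$, but in general $\theta$ is an unknown value. Suppose we aim to identify $K = 3$ scenarios $\theta_1^*, \theta_2^*, \theta_3^*$ that maximize the utility $U$, i.e.,

$$(\theta_1^*, \theta_2^*, \theta_3^*) = \arg\max_{\theta_1, \theta_2, \theta_3 \in \Theta} U(\theta_1, \theta_2, \theta_3), \tag{3}$$

where $U(\theta_1, \theta_2, \theta_3) = -\max_{\theta \in \Theta} \min_{k=1,2,3} |f(\theta) - f(\theta_k)|$.

In this trial, we have a single unknown parameter with $\Theta = \mathbb{R}$, and the operating characteristic of interest is monotone, continuous, invertible, and ranges from 0 to 1. Therefore, it is straightforward to see that the optimal scenarios $\theta_1^*, \theta_2^*, \theta_3^*$ correspond to the operating characteristic values that evenly divide the interval $(0, 1)$. To be precise, $(f(\theta_1^*), f(\theta_2^*), f(\theta_3^*)) = (1/6, 3/6, 5/6)$; these are the three values of a regular grid on the interval $(0, 1)$. Figure 2A illustrates the optimal set of scenarios when $K = 3, 5, 10$. Since $f(\theta)$ can be calculated exactly, the optimal scenarios $\theta_1^*, \theta_2^*, \theta_3^*$ can be obtained by computing the inverse function $f^{-1}$ at the values 1/6, 3/6, and 5/6. Specifically,

$$(\theta_1^*, \theta_2^*, \theta_3^*) = \left( \frac{\sigma \big(z_{f(\theta_1^*)} + z_{1-\alpha/2}\big)}{\sqrt{n}}, \; \frac{\sigma \big(z_{f(\theta_2^*)} + z_{1-\alpha/2}\big)}{\sqrt{n}}, \; \frac{\sigma \big(z_{f(\theta_3^*)} + z_{1-\alpha/2}\big)}{\sqrt{n}} \right),$$

where $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution. The corresponding optimal scenarios are illustrated as red asterisks in Figure 2B.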
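The regular-grid argument generalizes to any $K$: the target values are $(2k - 1)/(2K)$ for $k = 1, \ldots, K$, and the resulting worst-case loss is $1/(2K)$. The sketch below computes these targets and inverts a monotone operating characteristic numerically by bisection. The curve `toy_power` is a hypothetical stand-in for $f$, not the power function of the trial described above, and the function names are illustrative.

```python
import math

def exact_targets(K):
    """Operating-characteristic values (2k-1)/(2K), k = 1..K, and the loss 1/(2K)."""
    targets = [(2 * k - 1) / (2 * K) for k in range(1, K + 1)]
    return targets, 1.0 / (2 * K)

def invert_monotone(f, target, lo, hi, tol=1e-10):
    """Solve f(theta) = target by bisection, assuming f is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def toy_power(theta):
    """Hypothetical increasing curve with range (0, 1), used only for illustration."""
    return 1.0 / (1.0 + math.exp(-(theta - 10.0) / 3.0))

targets, min_loss = exact_targets(3)
scenarios = [invert_monotone(toy_power, t, -50.0, 50.0) for t in targets]
```

For $K = 5$, $10$, and $30$, the formula $1/(2K)$ gives 0.100, 0.050, and about 0.0167, in line with the minimum-loss column of Table 2.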

Fig. 2.

Sensitivity analysis of an RCT (operating characteristic: probability of rejecting $H_0$). Panel A: Exact solutions when $K = 3, 5, 10$. Panel B: Comparison of $K = 3$ scenarios selected through exact calculation (red asterisks) and by 20 ROSA implementations with different initial proposals (blue points). Panel C: Graphical tool to choose the number $K$ of sensitivity scenarios.

3.1.3. Implementing and benchmarking ROSA

The exact computation of the optimal set of scenarios provides a solid benchmark for an initial evaluation of ROSA (Algorithms 1–3). We can compare the exact solution with the results from ROSA, which has the advantage of being applicable to other designs and operating characteristics that are not available in closed form.

We implement our ROSA approach to identify $K = 3$ scenarios. We randomly select $J = 1000$ scenarios $\theta_1^t, \ldots, \theta_{1000}^t$ with independent samples from the Uniform(−5, 25) distribution. Note that $f(-5) \approx 0$ and $f(25) \approx 1$. For each $\theta_j^t$, $1 \le j \le 1000$, we simulate $M = 200$ trials to compute the estimate $\bar{f}(\theta_j^t) = 200^{-1} \sum_{m=1}^{200} \varphi(Z_{j,m}, \theta_j^t)$, where $\varphi(Z_{j,m}, \theta_j^t) \in \{0, 1\}$ indicates whether $H_0: \theta_j^t \le 0$ is rejected for trial $m$ and scenario $j$. Then, we compute a continuous function $\hat{f}(\theta)$ using the independent estimates $\bar{f}(\theta_j^t)$ and a NN with 3 hidden layers (8, 64, and 64 neurons, respectively) and ReLU activation functions. Finally, to select three sensitivity scenarios, we use a simulated annealing algorithm based on an initial parameterization $T_1 = 1000$ and final parameterization $T_{\min} = 0.1$ (cf. Algorithm 3). We repeat these three steps (selection of scenarios, use of the NN, and optimization with simulated annealing) 20 times, each time initializing $\theta_1$, $\theta_2$, $\theta_3$ with independent random draws from the Uniform(−5, 25) distribution. The results of the exact approach (red asterisks) compared with ROSA (blue points) are shown in Figure 2B. The scenarios $\theta_1^*$, $\theta_2^*$, $\theta_3^*$ selected by simulated annealing (blue points) are close to the exact solution (red asterisks).

3.1.4. Choice of number K of scenarios

In practice, the decision regarding the number $K$ of scenarios to report is left to the analyst. This choice can be supported by a graph like Figure 2C, which allows the investigator to determine the minimum number $K$ of scenarios needed to guarantee a loss $L(\theta_1^*, \ldots, \theta_K^*)$ no larger than a targeted threshold. For example, to guarantee a loss no larger than 0.050 in this example, we need to select at least 10 scenarios for the simulation report.

We ran ROSA with $K$ = 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, or 30, and compared the loss $L$ in the resulting set of scenarios with that of the exact solution. The difference in the loss $L$ of the exact and approximate optima was less than 1% across all $K$ values that we considered (Figure 2C). Table 2 indicates that the computation time of the simulated annealing algorithm scales well as $K$ increases and that, as expected, the loss $L$ decreases as $K$ increases. All analyses were run on a Windows laptop with an Intel(R) Core(TM) i7-7700HQ 2.80 GHz processor, 16GB RAM, and 6MB of cache memory.

Table 2.

ROSA computation time, ROSA loss L, minimum (exact) loss L, and relative difference in loss of ROSA scenarios compared to the exact solutions.

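| Number $K$ of scenarios | Time (seconds) | ROSA loss $L$ | Min. loss $L$ | Rel. diff. |
| --- | --- | --- | --- | --- |
| 5 | 8.8 | 0.101 | 0.100 | 1.0% |
| 6 | 8.8 | 0.084 | 0.083 | 0.7% |
| 7 | 9.1 | 0.072 | 0.071 | 0.8% |
| 8 | 9.2 | 0.062 | 0.0625 | 0.7% |
| 9 | 9.1 | 0.056 | 0.056 | 0.6% |
| 10 | 9.1 | 0.050 | 0.050 | 0.2% |
| 20 | 10.1 | 0.025 | 0.025 | 0.5% |
| 30 | 10.2 | 0.017 | 0.0167 | 0.8% |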

3.2. Application 2: Interim decisions based on auxiliary outcomes

In the second example, we consider sensitivity analyses with multiple unknown parameters and two operating characteristics. We illustrate the use of our computational procedures, including the operating characteristics approximation procedure (Algorithm 1), the validation procedure (Algorithm 2), and the simulated annealing optimization procedure (Algorithm 3). We investigate whether it is appropriate to fix the value of some of the unknown parameters across all sensitivity scenarios. Identical values for a subset of the unknown parameters can simplify the interpretation of the sensitivity analysis but can also introduce severe limitations in faithfully representing how the operating characteristics vary across plausible values of the unknown parameters.

3.2.1. Trial design

We consider a two-arm, two-stage randomized trial with a binary primary outcome Y and a binary auxiliary outcome S (Niewczas et al., 2019). The primary outcome Y is available $T_Y$ months after randomization, while the auxiliary outcome S is available after $T_S < T_Y$ months. For example, in glioblastoma trials, 12-month progression-free survival (PFS) and 24-month overall survival (OS) have been used as auxiliary and primary outcomes, respectively (Han et al., 2014). The approach that we illustrate is applicable for any value of $T_Y$ and $T_S < T_Y$.

We let $N_a$ be the planned number of patients for arms $a = 0, 1$ (i.e., control and experimental arms) and indicate with $p_a$ the response probability $P(Y = 1 \mid A = a)$. Similarly, let $n_a$ be the planned number of patients assigned to arm $a$ before the interim analysis, and let $q_a$ indicate the response probability $P(S = 1 \mid A = a)$. The difference $\Delta = p_1 - p_0$ is the treatment effect on Y. The primary aim of the trial is to test $H_0: \Delta \le 0$ versus $H_1: \Delta > 0$, at level $\alpha$. The final analysis of the study involves only the primary outcome Y, and the trial will use a standard Z-test,

$$Z_Y = \frac{\hat{p}_1 - \hat{p}_0}{\sqrt{\bar{p}(1 - \bar{p})(N_1^{-1} + N_0^{-1})}},$$

where $\hat{p}_a$ is the estimate of $p_a$ and $\bar{p}$ is a weighted average of $\hat{p}_1$ and $\hat{p}_0$.

An interim analysis is conducted after the auxiliary outcomes S become available for $n_a$ patients for arms $a = 0$ and $1$ (i.e., $T_S$ months after the enrollment of $n_a$ patients on arms $a = 0$ and $1$), with early stopping for futility or continuation based on a summary of the auxiliary outcomes S. In several clinical settings, the treatment effect on S tends to be more pronounced than the treatment effect on Y. The interim analysis is based on the summary

$$Z_S = \frac{\hat{q}_1 - \hat{q}_0}{\sqrt{\bar{q}(1 - \bar{q})(n_1^{-1} + n_0^{-1})}},$$

where $\hat{q}_a$ is the estimate of $q_a$ and $\bar{q}$ is a weighted average of $\hat{q}_1$ and $\hat{q}_0$. We replicate the design of Niewczas et al. (2019), which calculates at the interim analysis the conditional power (CP) using the auxiliary outcome S to determine whether to stop the trial for futility or not. Specifically, the CP is calculated based on $Z_S$ and the information fraction $t_S = (N_1^{-1} + N_0^{-1}) / (n_1^{-1} + n_0^{-1})$ as

$$CP(t_S) = 1 - \Phi\left(\frac{z_{1-\alpha} - Z_S \, t_S^{-1/2}}{\sqrt{1 - t_S}}\right),$$

where $z_{1-\alpha}$ is the $(1 - \alpha)$ quantile of the standard normal distribution and $\Phi$ is the cumulative distribution function of the standard normal distribution. Here, we set the cut-off point to be 0.5 so that the trial continues when $CP(t_S) \ge 0.5$.
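
The interim rule can be illustrated with a short Python sketch that uses only the standard library. It is a hedged illustration, not the authors' code: the function names are hypothetical, the pooled rate is assumed to weight each arm by its sample size, and the level `alpha=0.025` and the interim counts are made-up values.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # cdf is Phi, inv_cdf returns quantiles


def pooled_z(x1, n1, x0, n0):
    """Z statistic comparing two response proportions with a pooled variance.

    Assumption: the pooled rate weights each arm by its sample size.
    """
    r1, r0 = x1 / n1, x0 / n0
    pooled = (x1 + x0) / (n1 + n0)
    return (r1 - r0) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))


def information_fraction(n1, n0, N1, N0):
    """t_S = (1/N1 + 1/N0) / (1/n1 + 1/n0)."""
    return (1 / N1 + 1 / N0) / (1 / n1 + 1 / n0)


def conditional_power(z_s, t_s, alpha=0.025):
    """CP(t_S) = 1 - Phi((z_{1-alpha} - Z_S / sqrt(t_S)) / sqrt(1 - t_S))."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf((z_crit - z_s / sqrt(t_s)) / sqrt(1 - t_s))


def continue_trial(z_s, t_s, alpha=0.025, cutoff=0.5):
    """The trial continues when the conditional power reaches the cut-off."""
    return conditional_power(z_s, t_s, alpha) >= cutoff


# Hypothetical interim data: 50 of 100 planned patients per arm have S observed.
z_s = pooled_z(x1=22, n1=50, x0=15, n0=50)
t_s = information_fraction(50, 50, 100, 100)
print(round(conditional_power(z_s, t_s), 3), continue_trial(z_s, t_s))
```

With a cut-off of 0.5, the rule is equivalent to continuing when $Z_S / \sqrt{t_S}$ is at least $z_{1-\alpha}$, since the conditional power equals 0.5 exactly when the argument of $\Phi$ is zero.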

3.2.2. Aim of the sensitivity analysis

The complexity of the simulation report increases with K (the number of scenarios), d (the number of entries of the unknown parameters $\theta$), and R (the number of operating characteristics $f(\theta)$). Here the full set of unknown parameters $\Theta \subset \mathbb{R}^7$ includes the enrollment rate $e \in (0, \infty)$, the response rates $p_a \in [0, 1]$ for Y in $A = a$, the response rates $q_a \in [0, 1]$ for S in $A = a$, and the correlation between Y and S in $A = a$, $\rho_a \in [-1, 1]$.

Controlling the complexity of the simulation report is important to ensure high interpretability of the report, which will be discussed by several stakeholders. There are a few potential strategies to reduce the complexity of the simulation report. First, it is often possible to consider only a subset $\Theta$ of the full parameter space based on prior knowledge of plausible values of the unknown parameters. For example, previous clinical studies can indicate a plausible range for the enrollment rate $e$, the response rate $p_0$ under the SOC, and other parameters that are expected to have minimal variations across trials. In addition, we can also consider fixing multiple entries of the K vectors $\theta_1, \ldots, \theta_K$ to some reference values. In this case the space from which we select scenarios $\theta_1, \ldots, \theta_K$ is further reduced to $\Theta_{re} \subset \Theta$. For example, if the operating characteristics have low sensitivity with respect to the correlation parameters $\rho_a$ or the enrollment rate $e$ of the study, then we can fix these unknown parameters to common values (i.e., estimates) across all K scenarios.

ROSA allows us to evaluate whether it is appropriate to assign the same value to one or more unknown parameters (e.g., $\rho_0$ and $\rho_1$) across all K scenarios. In other words, we evaluate a simulation report with all scenarios in a restricted subset $\Theta_{re} \subset \Theta$. A simulation report with scenarios in $\Theta_{re}$ can be easier to interpret than a report in which all d entries of $\theta$ vary across scenarios, because it reduces the number of dimensions d of the unknown parameters and points to the most relevant unknown parameters when discussing the variations of the operating characteristics across $\Theta$. We can select scenarios from the restriction $\Theta_{re} \subset \Theta$ only if the capability of the simulation report to represent the variations of the operating characteristics across $\Theta$ is preserved. Our case study investigates this aspect. The operating characteristics of interest $f$ in our case study are the probability of rejecting the null hypothesis of no treatment effect on Y at the end of the study and the average sample size.

3.2.3. Implementing and benchmarking ROSA

Using our ROSA procedure, we randomly select J=1000 training scenarios using LHS and conduct M=500 Monte Carlo simulations for each of the J training scenarios to obtain estimates of the operating characteristics across $\Theta$. Here $\Theta$ is a product space with the enrollment rate $e \in [0.2, 1]$, the response rates $p_a \in [0.2, 0.4]$ for Y in $A = a$, the response rates $q_a \in [0.2, 0.4]$ for S in $A = a$, and the correlation between Y and S in $A = a$, $\rho_a \in [0, 0.6]$. For $\Theta_{re}$, we fix the enrollment rate $e = 0.5$ and the response rates $p_0 = q_0 = 0.3$ in the control groups.
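
As a hedged illustration of the sampling step, the following standard-library Python sketch draws a Latin hypercube sample of J=1000 training scenarios over the product space described above. The parameter names and the function `latin_hypercube` are hypothetical, and this plain LHS variant is an assumption; the authors' implementation may differ.

```python
import random

# Plausible ranges of the seven unknown parameters in this application.
BOUNDS = {
    "e": (0.2, 1.0),
    "p0": (0.2, 0.4), "p1": (0.2, 0.4),
    "q0": (0.2, 0.4), "q1": (0.2, 0.4),
    "rho0": (0.0, 0.6), "rho1": (0.0, 0.6),
}


def latin_hypercube(bounds, n_points, rng):
    """Draw n_points scenarios so that, in every dimension, each of the
    n_points equal-width strata contains exactly one point."""
    columns = {}
    for name, (low, high) in bounds.items():
        strata = list(range(n_points))
        rng.shuffle(strata)  # independent random pairing across dimensions
        columns[name] = [
            low + (high - low) * (s + rng.random()) / n_points for s in strata
        ]
    return [{name: columns[name][j] for name in bounds} for j in range(n_points)]


rng = random.Random(0)
training = latin_hypercube(BOUNDS, n_points=1000, rng=rng)
print(len(training), training[0])
```

Each training scenario would then be passed to the trial simulator M times to estimate the operating characteristics at that point.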

We use a NN to obtain an interpolation of the operating characteristics. As described in Algorithm 4, to evaluate if the estimates of the operating characteristics are accurate, we compare them to independent Monte Carlo estimates of size M=100,000 on a set of J=200 uniformly distributed validation points spanning the plausible parameter space $\Theta$. The coefficients of determination $R^2$ in this comparison are above 0.96. This suggests that the NN accurately estimates the operating characteristics.
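
The validation check reduces to a coefficient of determination between the reference Monte Carlo estimates and the surrogate predictions. A minimal Python sketch follows; the five numbers are invented for illustration and are not results from the paper.

```python
def r_squared(reference, predicted):
    """Coefficient of determination of predictions against reference values."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((y - yhat) ** 2 for y, yhat in zip(reference, predicted))
    ss_tot = sum((y - mean_ref) ** 2 for y in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical values: Monte Carlo power estimates at five validation points
# and the corresponding surrogate-model predictions.
monte_carlo = [0.10, 0.35, 0.55, 0.80, 0.92]
surrogate = [0.12, 0.33, 0.57, 0.78, 0.93]
print(round(r_squared(monte_carlo, surrogate), 4))
```

In practice one such value would be computed per operating characteristic, over all validation points.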

We compare two simulation reports, and our goal is to provide stakeholders with the simplified version if it accurately describes the operating characteristics. The first one includes scenarios from $\Theta \subset \mathbb{R}^7$, restricted by prior knowledge from completed studies and clinical experience, and the second includes scenarios from $\Theta_{re} \subset \Theta$, further restricted by fixing the value of some entries of $\theta$ as described above. We use simulated annealing to identify two sets of scenarios in $\Theta_{re}$ and $\Theta$, respectively. In both cases we minimize the same loss function L defined over K-tuples of points in $\Theta$. We also calculate the loss L associated with these two optimal sets of scenarios from $\Theta$ and $\Theta_{re}$. In Figure 3, we illustrate the difference in loss L between these two optimal sets; as expected, the loss L decreases as K increases. We observe in Figure 3 that for any value of K, the loss L associated with the optimal set of scenarios restricted to $\Theta_{re}$ is larger than that of the optimal scenarios in $\Theta$. However, the difference is modest, and the gain in interpretability of a sensitivity analysis report with fewer unknown parameters may be worth the slightly larger loss. For example, if an investigator requires the loss to be under a threshold of L=0.2, then it is sufficient to consider K=10 scenarios, regardless of whether we consider scenarios selected from $\Theta$ or $\Theta_{re}$.
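
To make the optimization step concrete, here is a toy simulated annealing sketch in Python. It is not the authors' Algorithm 3: it assumes a single scalar parameter in [0, 1], a single operating characteristic, a max-min absolute-difference loss evaluated on a finite grid, and an arbitrary cooling schedule. All function names and tuning constants are hypothetical.

```python
import math
import random


def minimax_loss(scenarios, grid, f):
    """Largest distance from any grid point to its best-matching scenario,
    measured on the operating characteristic f."""
    values = [f(s) for s in scenarios]
    return max(min(abs(f(t) - v) for v in values) for t in grid)


def anneal(grid, f, k, n_iter=3000, temp0=0.1, step=0.1, seed=0):
    """Simulated annealing over K-tuples of scenarios in [0, 1]."""
    rng = random.Random(seed)
    current = [rng.random() for _ in range(k)]
    current_loss = minimax_loss(current, grid, f)
    best, best_loss = list(current), current_loss
    for i in range(n_iter):
        temp = temp0 / (1 + i)  # simple cooling schedule (an assumption)
        proposal = list(current)
        j = rng.randrange(k)  # perturb one scenario at a time
        proposal[j] = min(1.0, max(0.0, proposal[j] + rng.gauss(0.0, step)))
        loss = minimax_loss(proposal, grid, f)
        # Always accept improvements; accept a worse proposal with
        # probability exp(-(increase in loss) / temperature).
        if loss <= current_loss or rng.random() < math.exp((current_loss - loss) / temp):
            current, current_loss = proposal, loss
            if loss < best_loss:
                best, best_loss = list(proposal), loss
    return sorted(best), best_loss


grid = [i / 100 for i in range(101)]
identity = lambda t: t
scenarios, loss = anneal(grid, identity, k=5)
print(scenarios, round(loss, 3))
```

Running the same routine once over a full space and once over a restricted space, with the loss always evaluated on the full grid, would give the kind of comparison summarized in Figure 3.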

Fig. 3. Clinical trial design with an interim analysis and an auxiliary endpoint. A graphical representation to choose the number of sensitivity scenarios $K \in \{2, 5, 10, 15\}$. We compare optimal sets of scenarios selected from $\Theta \subset \mathbb{R}^7$ and from the lower-dimensional restriction $\Theta_{re} \subset \Theta$.

3.3. Application 3: Biomarker-driven adaptive enrichment

In the third example, we discuss sensitivity analyses dedicated to an adaptive trial with subpopulations defined by biomarkers, considering multiple unknown parameters and multiple operating characteristics of interest. As a motivating example, in several oncology trials, a major decision is whether to restrict patient enrollment to a targeted subgroup of patients (e.g., biomarker-positive subgroup) or to enroll a broader patient population. Enrolling only a biomarker-positive subgroup may deny a substantial number of patients access to an effective therapy, whereas enrolling a larger population may compromise the power to detect positive treatment effects. Several trial designs discussed in the literature attempt to address the outlined problem through interim looks at the data.

3.3.1. Trial design

We consider an adaptive two-stage enrichment trial design with one-to-one randomization (Jenkins et al., 2011; Jones et al., 2017; Mehta et al., 2019). The design is applicable in the setting where a biomarker-positive subgroup of patients is hypothesized to benefit more from the experimental treatment than the rest of the study population. The design includes a single interim analysis, and it uses progression-free survival (PFS) for interim decision-making, while overall survival (OS) is the endpoint for the final analysis, which occurs when a pre-specified number of events is reached. The interim analysis uses the estimated PFS hazard ratio (HR) to capture potential early signals of treatment effects. In the implementation of Jenkins et al. (2011), which we replicate, the HR is estimated for both the overall population, $\hat{\theta}_{HR}$, and the biomarker-positive subgroup, $\hat{\theta}_{HR}^{+}$. An interim decision determines which group is enrolled and tested during the second stage of the trial:

A. Promising results in the biomarker-positive population.

If the HR estimate $\hat{\theta}_{HR}^{+} < 0.6$ but $\hat{\theta}_{HR} \ge 0.8$, then the trial will continue enrolling only biomarker-positive patients and the final analysis will test $H_0^{+}$. Here $H_0^{+}$ is the null hypothesis of no differences in OS between treatment and control groups in the biomarker-positive population. The null hypothesis is rejected if $\omega_1 \Phi^{-1}(1 - p_1^{+}) + \omega_2 \Phi^{-1}(1 - p_2^{+}) > 1.96$, where $p_1^{+}$ ($p_2^{+}$) is a log-rank p-value computed using only OS data from patients randomized during the first (second) stage of the trial. The weights $\omega_1, \omega_2$ and the standard normal cumulative distribution function $\Phi$ are used to summarize evidence of treatment effects from the two stages of the trial. We refer to Jenkins et al. (2011) for details on the choice of $\omega_1, \omega_2$ and other aspects of the final analysis.

B. Promising results in the overall population only.

If $\hat{\theta}_{HR}^{+} \ge 0.6$ but $\hat{\theta}_{HR} < 0.8$, then the trial will continue enrolling all patients and the final analysis will only test $H_0^{O}$, the null hypothesis of no differences in OS in the overall population. In this case the null hypothesis is tested using stage-specific OS log-rank p-values $p_1^{O}, p_2^{O}$ and combining evidence from the two stages of the trial.

C. Unpromising results.

If $\hat{\theta}_{HR}^{+} \ge 0.6$ and $\hat{\theta}_{HR} \ge 0.8$, then the trial stops early for futility.

D. Promising early results for both populations.

Lastly, if the estimated HR in the biomarker-positive subgroup $\hat{\theta}_{HR}^{+} < 0.6$ and the estimated HR in the overall population $\hat{\theta}_{HR} < 0.8$, then the trial will continue enrolling all patients and testing efficacy both in the overall population and in the biomarker-positive subgroup.
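
The four interim outcomes A to D partition the possible pairs of HR estimates, which a small Python function makes explicit. The function name and argument names are hypothetical; the thresholds 0.6 and 0.8 are those stated above.

```python
def interim_decision(hr_pos, hr_all, cut_pos=0.6, cut_all=0.8):
    """Map the interim PFS hazard-ratio estimates to decisions A-D.

    hr_pos: estimate in the biomarker-positive subgroup.
    hr_all: estimate in the overall population.
    """
    promising_pos = hr_pos < cut_pos
    promising_all = hr_all < cut_all
    if promising_pos and promising_all:
        return "D"  # continue with all patients, test both populations
    if promising_pos:
        return "A"  # enrich: enroll biomarker-positive patients only
    if promising_all:
        return "B"  # continue with all patients, test the overall population
    return "C"      # stop early for futility


for pair in [(0.5, 0.9), (0.7, 0.7), (0.7, 0.9), (0.5, 0.7)]:
    print(pair, interim_decision(*pair))
```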

The potential conclusions at the final analysis are (i) to recommend the new treatment for biomarker-positive patients, (ii) to recommend the new treatment for both biomarker-positive and biomarker-negative patients, or (iii) not to recommend the experimental treatment for future patients.
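
The two-stage combination rule used in the final analysis of option A can be sketched as follows. This is an illustration under stated assumptions: it follows the usual inverse-normal convention in which small p-values give large positive scores, so the null hypothesis is rejected when the weighted sum exceeds the critical value. The equal weights with $\omega_1^2 + \omega_2^2 = 1$ and the p-values are hypothetical; Jenkins et al. (2011) describe the actual choice of weights.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # inv_cdf is the standard normal quantile function


def inverse_normal_statistic(p1, p2, w1, w2):
    """Weighted inverse-normal combination of two stage-wise p-values."""
    return w1 * STD_NORMAL.inv_cdf(1 - p1) + w2 * STD_NORMAL.inv_cdf(1 - p2)


def reject_null(p1, p2, w1, w2, critical=1.96):
    """Reject when the combined statistic exceeds the critical value."""
    return inverse_normal_statistic(p1, p2, w1, w2) > critical


w = sqrt(0.5)  # hypothetical equal weights satisfying w1**2 + w2**2 == 1
print(reject_null(0.04, 0.03, w, w))
```

Because the stage-wise p-values are combined with weights fixed in advance, the rule stays valid regardless of which interim decision was taken.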

3.3.2. Aims of the sensitivity analysis

We focus on the following three operating characteristics: (i) $f_1$, the probability of enrolling only biomarker-positive patients in the second stage, (ii) $f_2$, the probability of enrolling both biomarker-positive and biomarker-negative patients in the second stage, and (iii) $f_3$, the probability of no evidence of positive treatment effects, which is equal to the probability of not rejecting the null hypotheses.

We choose plausible intervals for the unknown parameters based on prior literature. Specifically, we consider the recruitment rate $\theta_1 \in [0.5, 1]$ per week, the prevalence of the biomarker-positive subgroup $\theta_2 \in [0.15, 0.25]$, the PFS HR comparing the treatment and control groups in the biomarker-positive subgroup $\theta_3 \in [0.5, 1.2]$, the PFS HR comparing treatment and control in the biomarker-negative subgroup $\theta_4 \in [0.6, 1.2]$, the OS HR comparing treatment and control in the biomarker-positive subgroup $\theta_5 \in [0.7, 1.2]$, the OS HR comparing treatment and control groups in the biomarker-negative subgroup $\theta_6 \in [0.8, 1.2]$, the correlation between OS and PFS in the biomarker-positive subgroup $\theta_7 \in [0.3, 0.6]$, and the correlation between OS and PFS in the biomarker-negative subgroup $\theta_8 \in [0.2, 0.7]$. Marginal exponential distributions and latent frailty terms were used for simulating correlated OS and PFS times (Michael and Schucany, 2002). More flexible models such as the Weibull distribution can be considered.

3.3.3. Implementing and benchmarking ROSA

For the outlined two-stage trial with biomarker populations, our ROSA pipeline can be used to compute multiple simulation reports, varying both the list of operating characteristics f and the definition of Θ. For example, one can fix the OS HRs in the biomarker-positive and negative populations to focus on the design sensitivity to other parameters, such as the PFS HRs. Similarly, the set of unknown parameters Θ can be restricted to θ values with positive effects only for the biomarker-positive population. Importantly, one set of training simulations can be re-utilized to compute multiple sensitivity tables where the definitions of f and Θ vary.

We examine the difference in the marginal losses

$$L_r(\theta_1, \ldots, \theta_K) = \max_{\theta \in \Theta} \min_{k = 1, \ldots, K} \lVert f_r(\theta) - f_r(\theta_k) \rVert_2, \quad 1 \le r \le R, \tag{4}$$

when the set of scenarios is chosen by optimizing different loss functions. For example, let $S_r$ be the set of scenarios that minimizes the marginal loss $L_r$ in (4). Similarly, let $S$ be the set of scenarios that minimizes the joint loss $L = -U$ in (2). Then it is intuitive that $L_r(S_r) \le L_r(S)$, $1 \le r \le R$. In other words, the marginal losses $L_r$ tend to be smaller when the set of scenarios is chosen to minimize $L_r$ compared to a set of scenarios that minimizes $L$ with the aim of representing multiple operating characteristics. If the discrepancy between $L_r(S_r)$ and $L_r(S)$, $1 \le r \le R$, is relatively small for all R operating characteristics, then this indicates that it is reasonable to select a single set of scenarios $S$ to illustrate how the R operating characteristics vary jointly across $\Theta$. We describe the difference between the marginal losses $L_r$, $r = 1, 2, 3$, when scenarios $\theta_1, \ldots, \theta_K$ in $\Theta$ are chosen by optimizing $L_r$ in (4), with optimum $S_r = \{\theta_{1,r}^*, \ldots, \theta_{K,r}^*\}$, or by optimizing $L$ as in (2), with optimum $S = \{\theta_1^*, \ldots, \theta_K^*\}$. Recall that $S$ is computed with the goal of illustrating how multiple operating characteristics vary across $\Theta$ while $S_r$ optimizes the representation of a single operating characteristic $f_r$. The weights in (2) are $w_1 = w_2 = w_3 = 1/3$. In Figure 4 panel 1, we plot $L_1(S_1)$ in red and $L_1(S)$ in blue. Similarly, in panel 2 we compare $L_2(S_2)$ and $L_2(S)$, and in panel 3 we compare $L_3(S_3)$ and $L_3(S)$. Our results indicate that for all three operating characteristics, $L_r(S) > L_r(S_r)$, $r = 1, 2, 3$; as expected, there is an increase of the marginal losses $L_r$ when the set of scenarios is selected to illustrate jointly the variations of multiple operating characteristics across $\Theta$. However, this difference is small (<10%) for all $K \in \{2, 5, 10, 15\}$. Furthermore, for each $K \in \{2, 5, 10, 15\}$, the relative difference is similar across the three operating characteristics $f_1, f_2, f_3$ (Figure 4). This result supports the use of identical weights and of a single sensitivity table, with the same set of scenarios $S$ to illustrate jointly all three operating characteristics.
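
The comparison between marginal and joint optima can be reproduced on a toy problem where exhaustive search is feasible. The Python sketch below is illustrative only: the three operating characteristics are invented functions of a scalar parameter, candidate scenarios are restricted to a small grid, and the joint criterion is assumed to be a weighted sum of absolute differences, which may differ from the form of (2) in the paper.

```python
from itertools import combinations


def marginal_loss(f_r, scenarios, grid):
    """Marginal loss of one operating characteristic on a finite grid."""
    return max(min(abs(f_r(t) - f_r(s)) for s in scenarios) for t in grid)


def joint_loss(fs, weights, scenarios, grid):
    """Assumed joint criterion: weighted sum of absolute differences."""
    return max(
        min(sum(w * abs(f(t) - f(s)) for f, w in zip(fs, weights)) for s in scenarios)
        for t in grid
    )


def best_set(loss, candidates, k):
    """Exhaustive search over all K-subsets of the candidate scenarios."""
    return min(combinations(candidates, k), key=loss)


# Hypothetical operating characteristics of a one-dimensional parameter.
fs = [lambda t: t, lambda t: t ** 2, lambda t: 1 - t ** 0.5]
weights = [1 / 3, 1 / 3, 1 / 3]
grid = [i / 20 for i in range(21)]
K = 3

S = best_set(lambda s: joint_loss(fs, weights, s, grid), grid, K)
for r, f_r in enumerate(fs, start=1):
    S_r = best_set(lambda s: marginal_loss(f_r, s, grid), grid, K)
    print(r, marginal_loss(f_r, S_r, grid), marginal_loss(f_r, S, grid))
```

Because each marginal optimum is found by exhaustive search over the same candidate sets, the first printed loss in each row can never exceed the second; the size of the gap is what indicates whether a single shared set of scenarios is adequate.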

Fig. 4. Marginal losses $L_r$, $r = 1, 2, 3$, of different sets of scenarios $S_r$ (red) and $S$ (blue).

4. Discussion

The evaluation of complex designs such as dose-finding studies (Iasonos et al., 2015), factorial trials (Green et al., 2002), and response-adaptive trials (Pallmann et al., 2018) focuses on multiple operating characteristics, such as the level of toxicities, the probability of selecting the correct treatment arm, or frequentist operating characteristics, including power and false positive probabilities. During the design stage of a complex clinical trial, simulation reports are typically produced to discuss sample size, interim analyses, and other major decisions with various stakeholders. The simulation report consists of one or a few tables dedicated to showcasing how major operating characteristics $f(\theta)$ vary across potential values of unknown parameters in $\Theta$. In most cases, the analyst focuses on subsets of plausible parameters within $\Theta$, for example, values concordant with previous studies, or subsets of potential $\theta$ values of particular interest, for example with positive and clinically relevant treatment effects.

Simulations are fundamental in the design of complex trials since operating characteristics can rarely be obtained analytically, and they are crucial in the assessment of study designs for regulators, pharmaceutical companies and other stakeholders (Food and Drug Administration, 2020). However, a limited number of scenarios or poorly chosen scenarios could be inadequate to highlight variations of the operating characteristics across plausible unknown parameters and can result in sub-optimal decisions. We propose ROSA as a useful tool that can support investigators at this design stage when selecting which and how many scenarios to include in these simulation reports.

We focus on choosing an informative number K of scenarios $\theta_1, \ldots, \theta_K$ among the plausible unknown parameters to summarize the variations of key operating characteristics. Our approach minimizes an explicit loss function and uses established techniques for functional approximation (NNs) and numerical optimization (simulated annealing). We showcase our approach in three trials. Importantly, our approach is general and can be applied to nearly any clinical trial design. It only requires simulations to mimic the clinical trial under hypothetical scenarios.

Although our approach is general, we focused on loss functions L of a specific form (2). It is possible to consider different loss functions. For example, one could consider the loss function $L(\theta_1, \ldots, \theta_K) = E_{\theta \sim g}\left[\min_{k = 1, \ldots, K} D(f(\theta), f(\theta_k))\right]$, where $g$ is a probability distribution on $\Theta$ (e.g., a posterior distribution obtained from previous data). The distribution $g$ could be used to incorporate prior information about the unknown parameters in the selection of sensitivity scenarios. Moreover, the metric $D: \Theta^2 \to \mathbb{R}$ can be extended to capture both differences between operating characteristics at plausible values $\theta, \theta' \in \Theta$ and other aspects, such as the difference between expected values of the outcomes Y at $\theta$ and $\theta'$.
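
The expected-loss alternative lends itself to a simple Monte Carlo estimate: draw parameter values from $g$ and average the distance to the closest scenario. The Python sketch below assumes a scalar parameter, an identity operating characteristic, an absolute-difference metric, and a Beta(2, 5) distribution for $g$; all of these are illustrative choices rather than anything specified in the paper.

```python
import random


def expected_loss(scenarios, samples, f, dist):
    """Monte Carlo estimate of E_{theta ~ g}[min_k D(f(theta), f(theta_k))].

    `samples` are draws of theta from the distribution g.
    """
    total = sum(min(dist(f(t), f(s)) for s in scenarios) for t in samples)
    return total / len(samples)


def abs_diff(a, b):
    return abs(a - b)


rng = random.Random(1)
# Hypothetical g: a Beta(2, 5) distribution on a scalar parameter in [0, 1].
draws = [rng.betavariate(2, 5) for _ in range(10_000)]
print(expected_loss([0.2, 0.6], draws, f=lambda t: t, dist=abs_diff))
```

Unlike the max-min criterion, this average places scenarios where $g$ concentrates its mass, so regions of the parameter space with little prior support contribute little to the loss.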

One major challenge in the presentation of simulation reports is the need for simplicity and interpretability of the results. To this end, we considered fixing one or more unknown parameters to identical values across the K scenarios, which may be reasonable when there is a priori knowledge of certain unknown parameters. There are other ways to simplify a simulation report, such as removing operating characteristics that do not vary across plausible unknown parameters, or reporting only the range of the operating characteristics across Θ instead of presenting the operating characteristics for each representative scenario.

Variations of the ROSA approach may also consider optimization algorithms other than simulated annealing and regression methods alternative to NN for approximating the operating characteristics across Θ. The methodology that we proposed here can be used to handle other relevant problems, such as missing data. Indeed, during the design of the trial, there is often uncertainty on whether the analyses will involve missing data or not and the potential consequences of the missingness pattern. Probability models that include pre-treatment variables, outcomes, and missing data patterns are useful to explore the robustness of the design. In this case, ROSA can support the selection of scenarios with different missing data patterns.

Supplementary Material

Supp 1

Acknowledgements

The authors thank Cyrus Mehta and Christina Howe for helpful conversations and feedback that greatly enhanced the paper. LH was supported by the Clinical Orthopedic and Musculoskeletal Education and Training (COMET) Program, NIAMS grant T32 AR055885. LT was supported by the NIH grant R01LM013352.

Footnotes

Conflicts of interest

The authors report there are no competing interests to declare.

References

  1. Bélisle CJ (1992). Convergence theorems for a class of simulated annealing algorithms on R^d. Journal of Applied Probability, pages 885–895.
  2. Berry SM, Carlin BP, Lee JJ, and Muller P (2010). Bayesian adaptive methods for clinical trials. CRC Press.
  3. Bookstein FL (1989). Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11, 567–585.
  4. Food and Drug Administration (2020). Interacting with the FDA on complex innovative trial designs for drugs and biological products. Updated December.
  5. Goodfellow I, Bengio Y, and Courville A (2016). Deep learning. MIT Press.
  6. Green S, Liu P-Y, and O'Sullivan J (2002). Factorial design considerations. Journal of Clinical Oncology 20, 3424–3430.
  7. Han K, Ren M, Wick W, Abrey L, Das A, Jin J, and Reardon DA (2014). Progression-free survival as a surrogate endpoint for overall survival in glioblastoma: a literature-based meta-analysis from 91 trials. Neuro-oncology 16, 696–706.
  8. Hobbs BP, Barata PC, Kanjanapan Y, Paller CJ, Perlmutter J, Pond GR, Prowell TM, Rubin EH, Seymour LK, Wages NA, et al. (2019). Seamless designs: current practice and considerations for early-phase drug development in oncology. JNCI: Journal of the National Cancer Institute 111, 118–128.
  9. Hornik K (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks 4, 251–257.
  10. Husmann K, Lange A, and Spiegel E (2017). The R package optimization: Flexible global optimization with simulated-annealing.
  11. Iasonos A, Gönen M, and Bosl GJ (2015). Scientific review of phase I protocols with novel dose-escalation designs: how much information is needed? Journal of Clinical Oncology 33, 2221.
  12. Jenkins M, Stone A, and Jennison C (2011). An adaptive seamless phase II/III design for oncology trials with subpopulation selection using correlated survival endpoints. Pharmaceutical Statistics 10, 347–356.
  13. Jones RL, Attia S, Mehta CR, Liu L, Sankhala KK, Robinson SI, Ravi V, Penel N, Stacchiotti S, Tap WD, et al. (2017). TAPPAS: An adaptive enrichment phase 3 trial of TRC105 and pazopanib versus pazopanib alone in patients with advanced angiosarcoma (AAS). J. Clin. Oncol 35, TPS11081.
  14. Kirkpatrick S, Gelatt CD, and Vecchi MP (1983). Optimization by simulated annealing. Science 220, 671–680.
  15. Leshno M, Lin VY, Pinkus A, and Schocken S (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks 6, 861–867.
  16. Mayer C, Perevozskaya I, Leonov S, Dragalin V, Pritchett Y, Bedding A, Hartford A, Fardipour P, and Cicconetti G (2019). Simulation practices for adaptive trial designs in drug and device development. Statistics in Biopharmaceutical Research 11, 325–335.
  17. McKay MD, Beckman RJ, and Conover WJ (2000). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 42, 55–61.
  18. Mehta C, Liu L, and Theuer C (2019). An adaptive population enrichment phase III trial of TRC105 and pazopanib versus pazopanib alone in patients with advanced angiosarcoma (TAPPAS trial). Annals of Oncology 30, 103–108.
  19. Michael J and Schucany W (2002). The mixture approach for simulating new families of bivariate distributions with specified correlations. The American Statistician 56, 48–54.
  20. Niewczas J, Kunz CU, and König F (2019). Interim analysis incorporating short- and long-term binary endpoints. Biometrical Journal 61, 665–687.
  21. Pallmann P, Bedding AW, Choodari-Oskooei B, Dimairo M, Flight L, Hampson LV, Holmes J, Mander AP, Odondi L, Sydes MR, et al. (2018). Adaptive designs in clinical trials: why use them, and how to run and report them. BMC Medicine 16, 1–15.
  22. Rasmussen CE (2003). Gaussian processes in machine learning. In Summer school on machine learning, pages 63–71. Springer.
  23. Razavi S, Jakeman A, Saltelli A, Prieur C, Iooss B, Borgonovo E, Plischke E, Piano SL, Iwanaga T, Becker W, et al. (2021). The future of sensitivity analysis: an essential discipline for systems modeling and policy support. Environmental Modelling & Software 137, 104954.
  24. Spall JC (2005). Introduction to stochastic search and optimization: estimation, simulation, and control. John Wiley & Sons.
  25. Thorlund K, Haggstrom J, Park JJ, and Mills EJ (2018). Key design considerations for adaptive clinical trials: a primer for clinicians. BMJ 360, k698.
