Abstract
Tuning and calibration are processes for improving the representativeness of a computer simulation code to a physical phenomenon. This article introduces a statistical methodology for simultaneously determining tuning and calibration parameters in settings where data are available from a computer code and the associated physical experiment. Tuning parameters are set by minimizing a discrepancy measure, while the distribution of the calibration parameters is determined based on a hierarchical Bayesian model. The proposed Bayesian model views the output as a realization of a Gaussian stochastic process with hyperpriors. Draws from the resulting posterior distribution are obtained by Markov chain Monte Carlo simulation. Our methodology is compared with an alternative approach in examples and is illustrated in a biomechanical engineering application. Supplemental materials, including the software and a user manual, are available online and can be requested from the first author.
Keywords: Gaussian stochastic process model, Hierarchical Bayesian model, Kriging
1. INTRODUCTION
Complex computer simulation codes (computer experiments) are used extensively in scientific and engineering studies (see Santner, Williams, and Notz 2003; Fang, Li, and Sudjianto 2005; and Forrester, Sobester, and Keane 2008). Many computer codes were developed for applications where the results of physical experiments are also available to describe the true input/output relationship of interest. In this article we are interested in settings where both types of data are available, but the number of computer code runs is limited because they have lengthy run times. In such a case, statistical models are needed to extend the set of inputs at which output is available by predicting the response from the computer and physical experiments based on a limited number of training runs.
Such statistical models must incorporate different types of input variables. This article focuses on computer experiments having three types of inputs: control variables, tuning parameters, and calibration parameters. Control variables are (design) inputs to both the simulation code and the corresponding physical experiment (Santner, Williams, and Notz 2003, p. 15). A primary goal of the physical experiment is to study the relationship between the response and control variables.
Tuning parameters are inputs that are present only in the simulation code. Tuning parameters are typically numerical quantities that control the solution of a numerical algorithm implemented by a computer experiment. Thus, tuning parameters have no meaning in the physical experiment. For example, Rawlinson et al. (2006) described a finite element analysis (FEA) computer code simulating the forces and movements of a knee prosthesis under a given loading regimen. The two tuning parameters for their model were the mesh density of the tibial tray and the amount of discretization in the curve used to describe the knee loading. Tuning is the process of determining the values of the tuning parameters so that a computer simulation best represents a physical experiment.
Calibration parameters are also present only in the simulation code. Unlike tuning parameters, calibration parameters have meaning in the physical experiment, but their values are either unknown or unmeasured during the running of the physical experiment. For example, Kennedy and O’Hagan (2001) described a computer code for simulating the deposition of ruthenium 106 in the Tomsk-7 chemical plant, which experienced an accident in 1993. In their code, one calibration parameter was the deposition velocity. Calibration is the process of determining plausible values of the calibration parameters to match the results of the physical experiment.
Some computer experiments have both tuning and calibration parameters. For example, in the FEA code of Rawlinson et al. (2006) two calibration parameters were position of the femur relative to the tibia in the initial gait cycle and friction between the bone and the prosthesis.
Although considerable research has been dedicated to setting either tuning or calibration parameters, there has been little study on the problem of setting both simultaneously. In this article, we introduce a methodology for the simultaneous determination of tuning and calibration parameters. Section 2 reviews current methods for tuning and calibration and discusses their limitations. Section 3 introduces a hierarchical Bayesian model describing the response from a physical experiment together with the output from the computer simulation code having both tuning and calibration parameters. Section 4 proposes a methodology for simultaneously setting these parameters. Section 5 compares the proposed methodology with an approach that treats all parameters as calibration parameters.
2. CURRENT METHODS
Previous proposals for calibration have been primarily Bayesian. Their goal was to simulate the posterior distributions of the calibration parameters and to use this posterior distribution to better predict unknown responses and assess the uncertainty in the prediction. Craig et al. (1996, 2001) developed a Bayesian linear forecasting method, which can be seen as a pragmatic approximation to a fully Bayesian analysis. Kennedy and O’Hagan (2001) described a Bayesian calibration framework based on a model having a bias function and a (modular) Bayesian analysis. Higdon et al. (2004) proposed a fully Bayesian implementation for the model in Kennedy and O’Hagan (2001). Higdon et al. (2008) developed a methodology for calibration when the outputs are multivariate. Gattiker (2005) implemented the models in Higdon et al. (2004, 2008) in MATLAB. Loeppky, Bingham, and Welch (2006) proved that the previous procedures would lead to correct estimation of the calibration parameters asymptotically, provided the true values of the calibration parameters could reduce the bias of the computer simulation to zero. We discuss a lemma in Loeppky, Bingham, and Welch (2006) in Section 5.
Classical methods used to set tuning parameters, such as Park (1991) and Cox, Park, and Singer (1996), select these inputs to make the computer code output fit the physical observations as closely as possible in an integrated prediction error sense. These proposals for selecting tuning parameters do not provide estimated uncertainties associated with the selected values. However, several recent proposals for “validating” computer experiments are similar to the methodology for Bayesian calibration. For example, Bayarri et al. (2007b) presented a framework with a Bayesian model for validation and Bayarri et al. (2007a) used wavelet decomposition based on the same Bayesian model to validate computer experiments having multivariate outputs.
Because the methods above focus on either tuning or calibration, they may suffer limitations in applications where both tuning and calibration parameters are present. Specifically, if one applies classical tuning methodology to a problem involving both tuning and calibration parameters, one cannot quantify the uncertainties of the predicted calibration parameters. If one applies calibration methodology to such a problem and treats all parameters as calibration parameters, one can obtain biased parameter predictions and/or undesirable estimated uncertainties of the tuning and calibration parameters as we will show below. These misleading predictions and estimated uncertainties can lead to large prediction errors.
To demonstrate the limitations alluded to above, we used the Bayesian calibration program (gpmsa) described in Gattiker (2005) to set tuning and calibration parameters for an application in Rawlinson et al. (2006). Originally, the goal of this project was to compare the damage in two knee implants [the Insall-Burstein (IB) manufactured by Zimmer, Inc. and the Optetrak produced by Exactech, Inc.] under certain loading regimens.
Here, we consider only the IB knee implant and a physical experiment in which the measurements consist of kinematics and kinetics of knees tested in an Instron-Stanmore KC1 testing device, a “knee simulator.” The loading (magnitude, angle, and rate), joint stiffness, and knee design were modeled in a finite element analysis (FEA) computer code whose output included the kinematics, kinetics, and stresses experienced by the knee component. The output we considered was the anterior–posterior displacement (APD) of the femoral component with respect to the tibial tray, which is an important kinematic output of fore-aft motion during knee function. The APD was measured in both the knee simulator and the computer simulation; it is roughly proportional to the anterior–posterior force that, coupled with the vertical joint load, contributed to damage of the prosthesis. Thus this computer experiment had one control variable (the percentile in gait cycle), two tuning parameters (finite element mesh density and load discretization), and two calibration parameters (friction and initial position).
The designs of the computer and the physical experiments were roughly maximin Latin hypercube designs (McKay, Beckman, and Conover 1979). A pilot study of the APD for this design acquired 439 observations from the computer code and 36 observations from the physical experiment for the purpose of tuning and calibration. Figure 1 depicts the training data. A previous sensitivity analysis (Saltelli, Chan, and Scott 2000) found that APD was sensitive to neither friction nor mesh density. Therefore, we study load discretization (a tuning parameter) and initial position (a calibration parameter), treating the percentile of the gait cycle as a control input to both the physical and computer experiments.
Figure 1.
Scatterplot of the measured APD against percentile in the gait cycle, from the knee simulator (triangles) and the FEA computer code (dots).
Using gpmsa (Gattiker 2005), we treated both load discretization and initial position as calibration parameters whose prior distributions were nearly uniform. We ran gpmsa with 8000 burn-in iterations and 2000 production runs, and based our inferences on 100 equally spaced values of each parameter taken from the production runs. Figure 2 shows the simulated posterior distributions of these two parameters. The posterior distribution of load discretization is bimodal, and that of initial position has its mode at a low value of its range but shows large variation. Thus, plausible values of the tuning and calibration parameters remain unclear. We demonstrate in Section 5.2 that our methodology can significantly improve the selection of the tuning and calibration parameters in this application.
Figure 2.
Simulated posterior distributions of load discretization (the left panel) and initial position (the right panel).
3. A HIERARCHICAL BAYESIAN MODEL FOR TUNING AND CALIBRATION
We let x denote (the vector of) control inputs, t denote the vector of tuning parameters, and t★ denote the “best” values of the tuning parameters, where t★ will be defined in Section 4. We let c denote the vector of calibration parameters and c★ denote the unknown true values of the calibration parameters. We assume x ∈ [0, 1]^{p_x}, t ∈ [0, 1]^{p_t}, and c ∈ [0, 1]^{p_c}, or that they can be so scaled. Information about the value of the unknown c★ will be contained in its (marginal) prior, which will be denoted by [c★].
We let {(x_i^s, c_i^s, t_i^s, ys(x_i^s, c_i^s, t_i^s)) : i = 1, …, ns} denote the training data from the computer experiment and {(x_j^p, yp(x_j^p)) : j = 1, …, np} denote the training data from the physical experiment. Here ns and np are the numbers of runs for each type of data. A more complete notation for the physical experiment response is yp(x, c★); throughout this article, yp(x) and yp(x, c★) are equivalent. We let ys be the ns × 1 vector with ith element ys(x_i^s, c_i^s, t_i^s) and yp be the np × 1 vector with jth element yp(x_j^p).
The model proposed here is similar in spirit to the ones in Kennedy and O’Hagan (2001) and Higdon et al. (2005) except that it explicitly incorporates both the tuning and calibration parameters. In other words, the response from the physical experiment is the sum of the “true response” and a random noise; the output from the computer experiment is the difference between the true response and the code bias.
Formally, we assume that the code output at (x, c, t), ys(x, c, t), is a realization of the Gaussian stochastic process
Ys(x, c, t) = fZ⊤(x, c, t) βZ + Z(x, c, t),   (1)
where fZ(x, c, t) is a vector of known regression functions and βZ is a vector of unknown regression parameters. If Ys(•) is assumed to have a constant mean, then fZ⊤(x, c, t)βZ = βZ,0. In Equation (1), Z(•) is a stationary Gaussian stochastic process with mean 0, variance σZ², and product Gaussian correlation
RZ((x₁, c₁, t₁), (x₂, c₂, t₂)) = ∏_{k=1}^{p_x} ρ_{Z,x,k}^{4(x_{1,k} − x_{2,k})²} ∏_{k=1}^{p_c} ρ_{Z,c,k}^{4(c_{1,k} − c_{2,k})²} ∏_{k=1}^{p_t} ρ_{Z,t,k}^{4(t_{1,k} − t_{2,k})²},   (2)
with correlation parameters ρZ = (ρZ,x,1, …, ρZ,x,px, ρZ,c,1, …, ρZ,c,pc, ρZ,t,1, …, ρZ,t,pt)⊤. Thus ys is a realization of the random vector Ys whose ith element is Ys(x_i^s, c_i^s, t_i^s).
We regard the output of the physical experiment at x, yp(x), as a realization of
Yp(x) = η(x, c★) + ε(x),   (3)
where η(x, c★) = E(Yp(x)) is the expected response at x and ε is a white noise (Gaussian) process with mean 0 and variance σε². Thus yp is a realization of the random vector Yp whose jth element is Yp(x_j^p).
We define
δ(x, c, t) = η(x, c) − ys(x, c, t)   (4)
to be the code bias at (x, c, t). Our model assumes δ(•) is a draw from the Gaussian stochastic process
Δ(x, c, t) = fD⊤(x, c, t) βD + D(x, c, t),   (5)
where fD(x, c, t) is a vector of known regression functions and βD is a vector of unknown regression parameters; if Δ(•) is assumed to have a constant mean, then fD⊤(x, c, t)βD = βD,0. Similar to Z(•), D(•) is defined to be a stationary Gaussian stochastic process with mean 0, variance σD², and correlation function
RD((x₁, c₁, t₁), (x₂, c₂, t₂)) = ∏_{k=1}^{p_x} ρ_{D,x,k}^{4(x_{1,k} − x_{2,k})²} ∏_{k=1}^{p_c} ρ_{D,c,k}^{4(c_{1,k} − c_{2,k})²} ∏_{k=1}^{p_t} ρ_{D,t,k}^{4(t_{1,k} − t_{2,k})²},   (6)
where ρD = (ρD,x,1, …, ρD,x,px, ρD,c,1, …, ρD,c,pc, ρD,t,1, …, ρD,t,pt)⊤. Finally we assume that, given their parameters, Z(•) and D(•) [hence Ys(•) and Δ(•)] are mutually independent processes. Some authors call δ(•) the “code discrepancy” or “code inadequacy.” Throughout this article, δ(•) is referred to as code bias and the word “discrepancy” is used in Section 4.1 to designate an objective function used in the determination of the tuning parameters.
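To make the covariance specification concrete, the following sketch evaluates a product Gaussian correlation of the form in Equations (2) and (6). It assumes the ρ^{4(distance)²} parameterization suggested by the ξ = −4 log(ρ) transformation in Appendix A, with the control, calibration, and tuning coordinates stacked into a single input vector; the function names are ours, not part of the supplemental software.

```python
import numpy as np

def product_gauss_corr(w1, w2, rho):
    """Correlation between stacked inputs w1 = (x1, c1, t1) and
    w2 = (x2, c2, t2): R = prod_k rho_k^(4 * (w1_k - w2_k)^2),
    one correlation parameter rho_k in (0, 1) per coordinate."""
    w1, w2, rho = map(np.asarray, (w1, w2, rho))
    return float(np.prod(rho ** (4.0 * (w1 - w2) ** 2)))

def corr_matrix(W1, W2, rho):
    """Cross-correlation matrix between the rows of W1 (n1 x p) and W2 (n2 x p)."""
    return np.array([[product_gauss_corr(a, b, rho) for b in W2] for a in W1])
```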
In conclusion, conditional on the parameters (βZ, σZ², ρZ), our model assumes that for given inputs (x, c, t), the simulator output ys(x, c, t) is a draw from
Ys(x, c, t) = fZ⊤(x, c, t) βZ + Z(x, c, t).   (7)
Similarly conditional on unknown parameters, the physical experiment output at control input x, yp(x), can be viewed as a draw from
Yp(x) = Ys(x, c★, t) + Δ(x, c★, t) + ε(x)   (8)
for any given vector t of tuning parameters. The unknown parameters conditioned on in Equation (8) are those of Ys(x, c★, t), Δ(x, c★, t), and ε, which are (βZ, σZ², ρZ), (βD, σD², ρD), and σε², respectively. Thus all conditional statements given Ys = ys and Yp = yp are computed with respect to Equations (7) and (8). Appendix B lists the variance–covariance matrices for the vectors Ys and Yp and their cross-covariance matrix.
Next we introduce our prior model for the unknown model parameters and derive the posteriors given the tuning parameters t. The proposed priors are similar to the ones in Higdon et al. (2004, 2005), and Gattiker (2005). For a generic parameter (or a vector of parameters) v, we let [v] denote the (joint) prior distribution.
We assume that the priors for the calibration parameters, c★, the process means, β = (βZ⊤, βD⊤)⊤, the process variances, σ² = (σZ², σD², σε²), and the process correlations, ρ = (ρZ⊤, ρD⊤)⊤, are independent, so that
[c★, β, σ², ρ] = [c★] [β] [σ²] [ρ].   (9)
If expert knowledge about the priors is available, it should be used. For example, if the researcher knows (roughly) the magnitude of the noise term, that knowledge should be used to construct the prior for σε². Lacking such knowledge, our methodology is based on the following priors.
The examples and software take the constant means fZ⊤(x, c, t)βZ = βZ,0 and fD⊤(x, c, t)βD = βD,0. Our methodology for selecting (βZ,0, βD,0) is modular in the spirit of Bayarri et al. (2007a) in that these mean parameters are determined by non-Bayesian arguments in Section 4.2. All other parameters are determined from draws of the posterior.
Because the calibration parameters are scaled to [0, 1]pc, all the elements of c★ are taken to have independent and identical truncated normal distributions with mean 0.5, standard deviation 2, and support [0, 1]. This prior is close to a uniform distribution on [0, 1]pc.
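As a check that this prior is indeed nearly flat, one can evaluate it numerically; the sketch below, using scipy's truncnorm (our choice of tool, not the supplemental package), draws from TN(0.5, 2²) restricted to [0, 1] and compares the density at the boundary with the density at the center.

```python
from scipy.stats import truncnorm

mu, sd = 0.5, 2.0                          # TN(0.5, 2^2) on [0, 1]
a, b = (0.0 - mu) / sd, (1.0 - mu) / sd    # standardized truncation bounds
prior_c = truncnorm(a, b, loc=mu, scale=sd)

draws = prior_c.rvs(size=1000)                   # nearly uniform draws on [0, 1]
flatness = prior_c.pdf(1.0) / prior_c.pdf(0.5)   # about 0.97: close to uniform
```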
The variance parameters are taken to have independent inverse Gamma prior distributions. Letting IG(α, γ) denote the inverse Gamma distribution with mean 1/[γ(α – 1)] and variance 1/[γ²(α – 1)²(α – 2)], we take σZ², σD², and σε² to have IG priors whose parameters are set data-adaptively, as follows. We first compute the sample variances of ys and yp, which we denote by s_s² and s_p², respectively. We take σZ² to have a (mildly) informative prior whose scale is set from s_s². We select the σε² prior to have a small magnitude and variation as the default. The supplemental software package allows the user to specify a multiplier for the σε² prior so that the program can suit physical experiments with random noise of different magnitudes. We use s_p² to determine the prior for σD²: depending on the magnitude of s_p², σD² is set either to have a mildly informative prior or to have a prior with small magnitude and variation.
Finally, we take the priors of the correlation parameters to be independently and identically distributed Beta(1, 0.5) distributions. This Beta distribution has mean 2/3, mode close to 1, and a rather diffuse density over (0, 1); it encourages dependence of the output on each input, but the broad support allows the data a strong voice in determining the posteriors (Linkletter et al. 2006; Higdon et al. 2008).
Let ϕ = (β, σ, ρ) [or ϕ = (σ, ρ) when β is fixed as below] and [c★, ϕ] denote the joint prior of c★ and ϕ. For a fixed t and given the data (ys, yp), the joint posterior distribution can be written as
[c★, ϕ|ys, yp, t] ∝ [ys, yp|c★, ϕ, t] [c★, ϕ],   (10)
[c★|ys, yp, t] = ∫ [c★, ϕ|ys, yp, t] dϕ.   (11)
(See Appendix A for details of the posterior distribution.) We simulate draws from [c★, ϕ|ys, yp, t] by implementing a Metropolis–Hastings (MH) algorithm. Appendix A sketches this algorithm (see Higdon et al. 2004, 2005 for details about related MH algorithms). Using this model and the relationship in Equations (10) and (11), draws can be made from the joint posterior distribution [c★, ϕ|ys, yp, t] which forms the basis for our proposed methodology for simultaneous tuning and calibration in Section 4.
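For readers who want the flavor of the sampler, a minimal single-site random-walk MH sketch follows. It mirrors the description in Appendix A (uniform proposals centered at the current value, with per-parameter widths) but is a generic illustration, not the supplemental MATLAB implementation; log_post is an assumed function returning the unnormalized log posterior, with −∞ outside the support.

```python
import numpy as np

def mh_sampler(log_post, theta0, widths, n_iter, rng=None):
    """Single-site random-walk Metropolis-Hastings.  Each component is
    proposed from a uniform distribution centered at its current value;
    the proposal is symmetric, so the acceptance probability reduces to
    the ratio of posterior densities."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    lp = log_post(theta)
    draws = np.empty((n_iter, theta.size))
    for it in range(n_iter):
        for k in range(theta.size):
            prop = theta.copy()
            prop[k] += widths[k] * (rng.random() - 0.5)  # uniform step
            lp_prop = log_post(prop)
            if np.log(rng.random()) < lp_prop - lp:      # accept/reject
                theta, lp = prop, lp_prop
        draws[it] = theta
    return draws
```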
4. METHODOLOGY FOR SIMULTANEOUS TUNING AND CALIBRATION
4.1 The Discrepancy Function
Our definition of the ideal tuning parameters is based on a function that measures the discrepancy between the physical experiment response and the computer code output. The discrepancy is a function of (c, t), but can be viewed as a function of t for given c = c★. Three of the many possible discrepancy functions are L2 discrepancy, ∫(η(x, c★) – ys(x, c, t))2 dx, L1 discrepancy, ∫ |η(x, c★) – ys(x, c, t)|dx, and L∞ discrepancy, maxx |η(x, c★) – ys(x, c, t)|; here (and below) the integrals are over [0, 1] raised to an appropriate power. In this article, we use L2 discrepancy, but our methodology can be applied with any other discrepancy function.
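For a single control variable, the three discrepancies can be approximated on a grid; the sketch below assumes hypothetical callables eta(x, c_star) and ys(x, c, t) standing in for the true response and the code output.

```python
import numpy as np

def discrepancies(eta, ys, c_star, t, n_grid=101):
    """L2, L1, and L-infinity discrepancies between eta(x, c_star) and
    ys(x, c_star, t), approximated over an equally spaced grid on [0, 1]."""
    x = np.linspace(0.0, 1.0, n_grid)
    gap = eta(x, c_star) - ys(x, c_star, t)
    L2 = float(np.mean(gap ** 2))      # grid approximation of the integral
    L1 = float(np.mean(np.abs(gap)))   # grid approximation of the integral
    Linf = float(np.abs(gap).max())    # worst-case difference on the grid
    return L2, L1, Linf
```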
If c★ is known, the L2 discrepancy is defined by
∫ (η(x, c★) − ys(x, c★, t))² dx = ∫ δ²(x, c★, t) dx,   (12)
regarded as a function of t. If c★ is unknown, but has a posterior distribution [c★|ys, yp, t], the L2 discrepancy becomes
Our objective in tuning is to estimate
t★ = argmin_{t ∈ [0,1]^{p_t}} ∫ E(Δ²(x, c★, t)|ys, yp, t) dx.   (13)
Our objective in calibration is to compute the posterior distribution [c★|ys, yp, t★]. We achieve the two objectives by minimizing an estimated squared discrepancy (defined in Section 4.2) and simulating the posterior distribution of c★ given the minimizer.
4.2 Simultaneous Tuning and Calibration
To review, the main idea is to choose the most plausible values of the tuning parameters so that the computer code can best represent the physical experiment and then to use these tuning parameters in the inferences about the calibration parameters. Next, we describe our method for estimating δ²(x, c★, t), t★, and [c★|ys, yp, t★], which we call the STaC (simultaneous tuning and calibration) procedure. Then we summarize the methodology as a three-step procedure.
Our first of the three steps is to estimate δ²(x, c★, t) by the posterior mean E(Δ²(x, c★, t)|ys, yp, t), which is the Bayes predictor of δ²(x, c★, t) under squared error loss (Santner, Williams, and Notz 2003). Notice that E(Δ²(x, c★, t)|c★, ϕ, ys, yp, t) is equal to
[E(Δ(x, c★, t)|c★, ϕ, ys, yp, t)]² + Var(Δ(x, c★, t)|c★, ϕ, ys, yp, t).   (14)
Closed form expressions for both E(Δ(x, c★, t)|c★, ϕ, ys, yp, t) and Var(Δ(x, c★, t)|c★, ϕ, ys, yp, t) can be derived analytically based on the probability density of the multivariate normal distribution (see Appendix B).
The two terms in Equation (14) can be interpreted intuitively, as follows. Large values of [E(Δ(x, c★, t)|c★, ϕ, ys, yp, t)]2 imply that there is a large discrepancy between the computer and the physical experiments; thus [E(Δ(x, c★, t)|c★, ϕ, ys, yp, t)]2 can be interpreted as squared prediction bias. The term Var(Δ(x, c★, t)|c★, ϕ, ys, yp, t) is a measure of the uncertainty of the predicted bias. Thus Equation (14) favors a t that minimizes the sum of the squared prediction bias and its predicted uncertainty.
The right-hand side of
E(Δ²(x, c★, t)|ys, yp, t) = E[E(Δ²(x, c★, t)|c★, ϕ, ys, yp, t) | ys, yp, t]
can be approximated using Equation (14) and draws from [c★, ϕ|ys, yp, t], which can be obtained using the MH algorithm. The law of large numbers guarantees that
(1/NMC) ∑_{i=1}^{NMC} E(Δ²(x, c★_i, t)|c★_i, ϕ_i, ys, yp, t) ≈ E(Δ²(x, c★, t)|ys, yp, t)   (15)
for a sufficiently large NMC, where (c★_i, ϕ_i), i = 1, …, NMC, denote the draws from [c★, ϕ|ys, yp, t]. Thus, we can approximate E(Δ²(x, c★, t)|ys, yp, t) asymptotically.
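The following sketch shows how Equations (14) and (15) combine in practice: for each posterior draw (c★_i, ϕ_i), the conditional mean and variance of Δ are computed in closed form and averaged. The helpers cond_mean and cond_var are assumed stand-ins for the Appendix B formulas.

```python
import numpy as np

def est_sq_discrepancy(x, t, draws, cond_mean, cond_var):
    """Monte Carlo estimate of E(Delta^2(x, c*, t) | ys, yp, t):
    average of squared conditional bias plus conditional variance
    (Equation (14)) over posterior draws (Equation (15))."""
    vals = [cond_mean(x, c, phi, t) ** 2 + cond_var(x, c, phi, t)
            for (c, phi) in draws]
    return float(np.mean(vals))
```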
The estimation of δ²(x, c★, t) using Equation (15) requires knowledge of (βZ,0, βD,0); these parameters are set as follows. The parameter βD,0 can be interpreted as the difference between the average of the physical experiment responses and the average of the computer experiment outputs when the calibration and tuning parameters are equal to (c★, t). The value of βD,0 can impact both calibration and tuning and thus must be carefully chosen. Assuming expert knowledge about the difference βD,0 is not available, we suggest setting βD,0 based on the assumption that, for any given t, the code inadequacy decreases as c gets closer to c★, so that the mean difference is smallest when c = c★. Hence we take βD,0 to be the minimized difference, over c values, between the sample mean of the np physical experiment responses and the sample mean of np predicted code outputs whose corresponding inputs are (x_j^p, c, t), j = 1, …, np, where x_1^p, …, x_np^p are the inputs to the physical experiment and t is given. The c minimization must be over the calibration parameter input space. Operationally, this is accomplished by using a space-filling maximin Latin hypercube design to select a set of ns c values over which the minimization is carried out. For each choice of c, we predict the np outputs based on the training data from the computer experiment using a MATLAB version of the parametric empirical kriging (PErK) program in Santner, Williams, and Notz (2003) and Appendix C. We then compute the difference between the sample mean of the np physical experiment responses and the sample mean of the predicted code outputs. Repeating this operation for each choice of c yields ns differences. We set βD,0 to the smallest difference and then set βZ,0 so that βZ,0 + βD,0 is the average of the np physical experiment responses.
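A sketch of this (βZ,0, βD,0) selection is given below; predict_code is an assumed stand-in for the PErK-style kriging predictor fit to the computer-experiment training data, and we take "smallest difference" to mean smallest in absolute value.

```python
import numpy as np

def choose_means(yp, xp, t, predict_code, c_candidates):
    """For each candidate c from a space-filling design, predict the code
    output at the physical inputs (xp_j, c, t) and difference the sample
    means; keep the smallest difference as beta_D0 and set beta_Z0 so
    that beta_Z0 + beta_D0 equals the mean physical response."""
    diffs = []
    for c in c_candidates:
        X = np.array([np.concatenate(([x], np.atleast_1d(c), np.atleast_1d(t)))
                      for x in xp])
        diffs.append(np.mean(yp) - np.mean(predict_code(X)))
    beta_d0 = min(diffs, key=abs)
    beta_z0 = np.mean(yp) - beta_d0
    return beta_z0, beta_d0
```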
In the second step, we estimate t★ by
t̂★ = argmin_t ∫ Ê(Δ²(x, c★, t)|ys, yp, t) dx,   (16)
where Ê(Δ²(x, c★, t)|ys, yp, t) denotes the Monte Carlo approximation on the left-hand side of Equation (15).
The integral over x in Equation (16) can be approximated by a further Monte Carlo integration that averages Equation (15) over a grid of x inputs. In our examples, we took
t̂★ = argmin_t (1/Nx) ∑_{i=1}^{Nx} (1/NMC) ∑_{j=1}^{NMC} E(Δ²(x_i, c★_j, t)|c★_j, ϕ_j, ys, yp, t),   (17)
with Nx = 101 and an equally spaced grid of values of x. By the law of large numbers, t̂★ converges to t★ almost surely as Nx and NMC go to infinity. We regard t̂★ in Equation (17) as the estimator of t★ and the minimized average as the estimated squared discrepancy.
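A sketch of this grid search over t follows; draws_for_t is an assumed routine producing MH draws of (c★, ϕ) for a given t, and est_sq_disc is the estimator sketched above.

```python
import numpy as np

def estimate_t_star(t_grid, x_grid, draws_for_t, est_sq_disc):
    """Implements Equation (17): for each t on a grid, average the Monte
    Carlo estimate of E(Delta^2(x, c*, t) | ys, yp, t) over the x grid,
    then return the minimizing t and its estimated squared discrepancy."""
    objective = []
    for t in t_grid:
        draws = draws_for_t(t)                   # MH draws given this t
        objective.append(np.mean([est_sq_disc(x, t, draws) for x in x_grid]))
    i_min = int(np.argmin(objective))
    return t_grid[i_min], objective[i_min]
```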
In the third step, we estimate the posterior distribution [c★|ys, yp, t★]. Because t̂★ converges to t★ almost surely, [c★|ys, yp, t̂★] converges to [c★|ys, yp, t★] in distribution. We therefore estimate the distribution [c★|ys, yp, t★] using the draws of c★ from [c★, ϕ|ys, yp, t̂★].
In summary, we conduct simultaneous tuning and calibration as follows:
Step 1. For each possible t in a grid of the tuning parameter vectors, set the values of (βZ,0, βD,0), make draws from [c★, ϕ|ys, yp, t], and compute the estimated squared discrepancy at a set of inputs {x_i : i = 1, …, Nx}.
Step 2. Compute t̂★ using Equation (17).
Step 3. Estimate [c★|ys, yp, t★] using the draws of c★ from [c★, ϕ|ys, yp, t̂★].
4.3 Prediction
In addition to tuning and calibration, we can use t̂★ and the draws from [c★, ϕ|ys, yp, t̂★] to predict ys(•) and η(•) at arbitrary inputs and to construct predictive intervals for these quantities. Given (x, c, t), the best linear unbiased prediction (BLUP) of ys(x, c, t) is E(Ys(x, c, t)|ys, yp, t̂★). Thus, by the law of large numbers,
ŷs(x, c, t) = (1/NMC) ∑_{i=1}^{NMC} E(Ys(x, c, t)|c★_i, ϕ_i, ys, yp, t̂★)   (18)
is an approximation to the BLUP of ys(x, c, t) for large NMC. Similarly, given x,
η̂(x, c★) = (1/NMC) ∑_{i=1}^{NMC} [E(Ys(x, c★_i, t̂★)|c★_i, ϕ_i, ys, yp, t̂★) + E(Δ(x, c★_i, t̂★)|c★_i, ϕ_i, ys, yp, t̂★)]   (19)
is an approximation to the BLUP of η(x, c★) by Equations (3) through (5).
We quantify the uncertainty in the ŷs(x, c, t) predictor using
Var(Ys(x, c, t)|ys, yp, t̂★),   (20)
which can be approximated using draws from the MH algorithm. We provide details in Appendix C. Similarly, we quantify the uncertainty in the η̂(x, c★) predictor using Var(Ys(x, c★, t̂★) + Δ(x, c★, t̂★)|ys, yp, t̂★). Details of this computation are also given in Appendix C.
Based on Equations (18), (19), (20), and the results in Appendix C, we can compute point-wise 100(1 – α)% prediction bands for ys(x, c, t) and η(x, c★) as
ŷs(x, c, t) ± z_{α/2} [Var(Ys(x, c, t)|ys, yp, t̂★)]^{1/2}
and
η̂(x, c★) ± z_{α/2} [Var(Ys(x, c★, t̂★) + Δ(x, c★, t̂★)|ys, yp, t̂★)]^{1/2},
where z_{α/2} is the upper α/2 critical point of the standard normal distribution.
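Given the point predictions and approximated variances, the bands are a routine normal-theory computation; a minimal sketch:

```python
import numpy as np
from scipy.stats import norm

def prediction_band(pred, var, alpha=0.01):
    """Pointwise 100(1 - alpha)% band: prediction +/- z_{alpha/2} * sd;
    alpha = 0.01 gives the 99% bands used in Section 5."""
    z = norm.ppf(1.0 - alpha / 2.0)
    half = z * np.sqrt(np.asarray(var))
    return pred - half, pred + half
```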
5. EXAMPLES AND COMPARISON
5.1 Introduction
In this section, we compare Bayesian calibration as implemented by the gpmsa software with STaC. Recall that the former performs a fully Bayesian calibration for the tuning as well as the calibration parameters, whereas STaC sets the tuning parameters to minimize a predicted L2 discrepancy between the computer and the physical experiment outputs. Lemma 1 in Loeppky, Bingham, and Welch (2006) shows, roughly, that for their Gaussian stochastic process model, under the conditions that (a) the discrepancy between the computer and the physical experiments can be driven to zero and (b) infinitely many runs from the computer simulation and the physical experiment are available, the maximum likelihood estimator (MLE) of the discrepancy is zero and the MLE of the calibration parameters minimizes the discrepancy asymptotically. Based on this lemma, Bayesian calibration and STaC can give comparable estimates of t★ in our model if (A) the discrepancy δ(•) can be made sufficiently close to zero, (B) the number of observations from the computer experiment is reasonably large, and (C) the posterior mode of every calibration parameter is close to the MLE of that parameter.
However, (A), (B), and (C) can be hard or impossible to verify or to achieve in many applications, for at least three reasons. It may not be possible to drive the discrepancy δ(•) to zero because of the simplified physics or biology used to construct the simulation code. The numbers of computer and physical experiment runs are typically limited. The posterior mode of a calibration parameter can differ from the MLE if either the prior for c★ or that for Δ(•) is informative. Therefore, the Bayesian calibration approach may fail to minimize the difference between the computer and the physical experiments. Next we compare STaC with gpmsa in a numerical example where the bias can be reduced to zero, yet the training data are scarce.
5.2 An Illustrative Example
This example has three real-valued inputs to a computer code: one control variable x, one tuning parameter t, and one calibration parameter c. All three inputs have the support [0, 1]. The output from the computer code was generated as
and the response from the physical experiment as
where ε denotes noise generated from a normal distribution having mean zero and standard deviation 0.01. In this example the bias term can be driven to zero by setting c = 0.1 and t = 0.8. Thus any approach that suggests t★ = 0.8 and c★ = 0.1 is performing well.
We generated 30 inputs for the computer experiment and 5 for the physical experiment, with both designs being maximin Latin hypercube designs. The Bayesian calibration program gpmsa was run treating c and t as calibration parameters. Following 8000 burn-in iterations and 2000 production ones, we regarded 100 equally spaced samples drawn from the 2000 production runs as draws from the joint posterior distribution of (t★, c★). The estimated posterior distributions of t★ and c★ are shown in Figure 3. The estimated [t★|yp, ys] is not able to pinpoint that t★ = 0.8, and the estimated [c★|yp, ys] fails to suggest values close to 0.1. We next used gpmsa to predict η(x, c★) over the grid, x = 0, 0.02, …, 1, of equally spaced inputs. Figure 4 depicts the training data, the true response curve, and the predictions. We see that the predictions are biased when x is near 0. The root mean squared prediction error (RMSPE) was computed as a measure of the predictive accuracy; the RMSPE of a generic predictor η̂(•) with respect to the x grid above is defined to be [∑_i (η̂(x_i) − η(x_i, c★))²/51]^{1/2}. The RMSPE of the predictor obtained by gpmsa is 0.0693.
Figure 3.
Simulated posterior distributions of t★ (the left panel) and c★ (the right panel) using the Bayesian calibration program.
Figure 4.
The training data (solid circles), the true response curve (the solid line), and the predictions (pluses) obtained by gpmsa.
We applied STaC to this example with NMC = 100 and Nx = 101. The number of iterations in the burn-in period, the number of production runs, the sampling spacing, and the prediction inputs are the same as those used for gpmsa. The left panel in Figure 5 plots the estimated L2 discrepancy against t, for t = 0, 0.1, …, 0.9, 1; thus, STaC picks t̂★ = 0.8. The right panel in Figure 5 shows the histogram of the simulated posterior distribution of c★ when t̂★ = 0.8. The posterior distribution has a center near 0.1. By running the program multiple times, we found that the smallest discrepancy occurred when t was between 0.7 and 0.9 and that the posterior distribution of c★ was centered at 0.1.
Figure 5.
Results of tuning and calibration from the STaC program: The left panel plots the estimated L2 discrepancy against t for t = 0, 0.1, …, 0.9, 1; the right panel is a histogram of the draws from the posterior distribution of c★.
Figure 6 plots the predictions and the 99% two-sided prediction bands for the true input–output function η(x, c★) using STaC. We see that the predictions are close to the true responses and the 99% prediction band contains the true curve for all x ∈ [0, 1]. The RMSPE of the predictor obtained by the STaC program is 0.0382. The relative improvement of STaC compared with the gpmsa-based predictor is 44.8% [= (0.0693 – 0.0382)/0.0693 × 100%].
Figure 6.
The training data (solid circles), true response curve (the solid line), predictions (pluses), and 99% prediction bands (dashes) using the STaC program.
Thus, for this example, the inferences about the tuning and calibration parameters and the predictions with STaC are more informative and more accurate than using a fully Bayesian procedure.
5.3 A Biomechanics Example
We apply STaC to the biomechanics application described in Section 2. The numbers of burn-in and production runs are identical to those used in our first example. The left panel of Figure 7 plots the estimated squared discrepancies for the tuning parameter equal to 15, 16, …, 30 (with NMC = 100 and Nx = 101); this figure identifies the load discretization t̂★ at which the computer simulation best matches the knee simulator. The right panel shows the histogram of the simulated posterior distribution [c★|ys, yp, t̂★]. The simulated posterior distribution of the initial position has mode −1.7 with substantially smaller uncertainty than does the Bayesian calibration in Figure 2 (right panel). Thus STaC indicates that the true initial position is at or below the lower bound of the current design space. These results can be used for future runs of the FEA code to study the anterior–posterior displacement.
Figure 7.
Scatterplot of the estimated squared discrepancy versus the value of the tuning parameter (left panel); histogram of the simulated posterior distribution of the calibration parameter when t = t̂★ (right panel).
Figure 8 depicts the predictions and 99% prediction bands for the anterior–posterior displacement in the physical experiment using the current data. Once the initial position and load discretization are set using STaC, the predictions closely match the measurements provided by the knee simulator.
Figure 8.
The training data (triangles), predictions (pluses), and 99% prediction bands (dashes) using the STaC program.
6. SUMMARY AND DISCUSSION
This article introduces a modular Bayesian methodology for simultaneous tuning and calibration (STaC). It demonstrates, with examples, that STaC performs well for tuning, calibration, and prediction of the true input–output relationship. The examples also show that treating tuning and calibration parameters differently is beneficial whether or not the code bias can be reduced to zero.
There are several areas of continuing research interest concerning STaC. First, determining the tuning parameters in the first step of STaC can be time consuming; additional work on speeding up the computation is needed. Second, a measure of the uncertainty in the estimated tuning parameter t̂★, perhaps based on the curvature of the estimated squared discrepancy near t̂★, is needed. Third, it is of interest to extend STaC to other settings, for example, to cases where outputs are multivariate and to applications where both the computer code and the physical experiment have mixed quantitative and qualitative inputs. Fourth, there are other estimates of δ that can produce superior results for tuning and thus deserve further investigation. For example, another estimate of δ(x, c★, t) is E(Yp(x)|c★, ϕ, ys, yp, t) – E(Ys(x, c★, t)|c★, ϕ, ys, t), where the second expectation does not condition on yp. If some of the tuning parameters are continuous, another extension of the current approach is to model the response surface of the estimated squared discrepancy and identify t̂★ using that response surface.
Lastly, we sketch two research areas related to calibration and tuning. One is to determine better experimental designs for the inputs to the computer and physical experiments when STaC is the objective, that is, for the inputs at which the computer code and the physical experiment are run. We used two separate Latin hypercube designs for the physical and computer experiments in our examples. We believe nested Latin hypercube designs, for example, the orthogonal array based nested Latin hypercube designs proposed by Qian, Ai, and Wu (2009), Qian (2009), and Qian and Wu (2009), may better determine tuning and calibration parameters. The second research area is to address the potential issue of nonidentifiability. Note that the tuning parameters, the calibration parameters, and the code bias δ may be confounded. For example, if yp(x) = x² + 1 + ε and ys(x, c, t) = x² + c + t, then c, t, and δ can be any three real values whose sum is 1. The detection and elimination of such nonidentifiability will be important for calibration, tuning, and validation of complex computer models.
ACKNOWLEDGMENTS
This research was sponsored, in part, by the National Science Foundation under Agreements DMS-0112069 (SAMSI/Duke University) and DMS-0406026 (The Ohio State University). Research support for the development of the biomechanics code and knee simulator used to produce the data for the example in Sections 2 and 5.3 was provided by NIH grant AR049793, Exactech, Inc., and the Clark and Kirby Foundations. The authors thank the editor, associate editor, and two reviewers for their constructive suggestions and valuable comments.
APPENDIX A: IMPLEMENTATION OF THE BAYESIAN MODEL
The posterior density is proportional to the product of the joint prior density [c★, ϕ|t] and the likelihood [ys, yp|(c★, ϕ, t)]. We can determine [c★, ϕ|t] by computing the prior densities [c★], [β], [σ²], and [ρ] and applying Equation (9). The density [ys, yp|(c★, ϕ, t)] is analytically derivable because ys and yp are viewed as observations from Gaussian stochastic processes. Specifically, our model is Ys_i = Ys(x_i^s, c_i^s, t_i^s) and Yp_j = Ys(x_j^p, c★, t) + Δ(x_j^p, c★, t) + ε_j for all i = 1, …, ns and j = 1, …, np. The mean values of Ys_i and Yp_j are βZ,0 and βZ,0 + βD,0, respectively. The covariance matrix between Ys and Yp given (c★, ϕ, t) is an ns × np matrix with (i, j)th element
σZ² RZ((x_i^s, c_i^s, t_i^s), (x_j^p, c★, t)),
which can be computed using Equation (2).
With the joint prior density and the likelihood, the Metropolis–Hastings algorithm updates each of the parameters using the specifications listed in Table 1. For all the parameters, the proposal distributions are uniform distributions with centers set to be the previous draws and widths specified by the last column of Table 1. However, for all correlation parameters ρ, the proposal draws are made on ξ = −4 log(ρ).
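A sketch of the transformed proposal for a single correlation parameter is below; proposed values that map outside (0, 1) receive zero prior density and are rejected in the MH step. This illustrates the ξ = −4 log(ρ) device and is not the supplemental code itself.

```python
import numpy as np

def propose_rho(rho, width, rng):
    """Propose a new correlation parameter via a uniform step on the
    transformed scale xi = -4 * log(rho), then map back; rho values
    outside (0, 1) (i.e., xi <= 0) should be rejected via the prior."""
    xi = -4.0 * np.log(rho)
    xi_new = xi + width * (rng.random() - 0.5)
    return float(np.exp(-xi_new / 4.0))
```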
Table 1.
Specification of the Metropolis–Hastings algorithm for each parameter. Columns 2–5, from left to right, correspond to the prior distribution, the lower and upper bounds of the parameter as the program iterates, the initial value of the parameter, and the width of the uniform proposal distribution. The notation TN(µ, σ²) on [a, b] denotes the truncated normal distribution with mean µ, variance σ², and support [a, b]; Γ(α, β) denotes the Gamma distribution having mean αβ and variance αβ²

| Model parameter | Prior distribution | Support | Initial value | Width |
|---|---|---|---|---|
| All elements of c★ | TN(0.5, 2²) on [0, 1] | [0, 1] | 0.5 | 0.1 |
| All elements of ρZ and ρD | Beta(1, 0.5) | (0, 1) | 2/3 | 0.3 |
| σZ² | inverse Gamma (see Section 3) | (0, +∞) | | |
| σD² | inverse Gamma (see Section 3) | (0, +∞) | | |
| σε² | inverse Gamma (see Section 3) | (0, +∞) | | |
APPENDIX B: DERIVATION OF FORMULA (14)
Using Equations (5) and (6) we compute
E(Δ(x, c★, t)|c★, ϕ, ys, yp, t) = βD,0 + ΣDZ⊤ ΣZZ⁻¹ Dt
and
Var(Δ(x, c★, t)|c★, ϕ, ys, yp, t) = σD² − ΣDZ⊤ ΣZZ⁻¹ ΣDZ,
where ΣDZ is the (np + ns) × 1 vector of covariances between Δ(x, c★, t) and (Yp⊤, Ys⊤)⊤, ΣZZ is the (np + ns) × (np + ns) variance–covariance matrix of (Yp⊤, Ys⊤)⊤, and Dt is the (np + ns) × 1 vector of differences between the observed data and their expectations; i.e., Dt = (yp⊤, ys⊤)⊤ – (E(Yp|t)⊤, E(Ys|t)⊤)⊤. We list the elements of ΣDZ, ΣZZ, and Dt below.

(B.1) For i = 1, 2, …, np, the ith element of ΣDZ is σD² RD((x, c★, t), (x_i^p, c★, t)). For i = np + 1, np + 2, …, np + ns, the ith element of ΣDZ is 0, because D(•) and Z(•) are mutually independent processes.

(B.2)
ΣZZ = [ ΣPP ΣPS ; ΣPS⊤ ΣSS ],
where
- ΣPP is an np × np matrix whose (i1, i2)th entry is σZ² RZ((x_{i1}^p, c★, t), (x_{i2}^p, c★, t)) + σD² RD((x_{i1}^p, c★, t), (x_{i2}^p, c★, t)) + σε² 1{i1 = i2}, for i1, i2 = 1, 2, …, np;
- ΣPS is an np × ns matrix whose (i, j)th entry is σZ² RZ((x_i^p, c★, t), (x_j^s, c_j^s, t_j^s)), for i = 1, 2, …, np and j = 1, 2, …, ns;
- ΣSS is an ns × ns matrix whose (j1, j2)th entry is σZ² RZ((x_{j1}^s, c_{j1}^s, t_{j1}^s), (x_{j2}^s, c_{j2}^s, t_{j2}^s)), for j1, j2 = 1, 2, …, ns.

(B.3) For i = 1, 2, …, np, the ith element of Dt is yp(x_i^p) − (βZ,0 + βD,0). For i = np + 1, np + 2, …, np + ns, the ith element of Dt is ys(x_{i−np}^s, c_{i−np}^s, t_{i−np}^s) − βZ,0.
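With ΣDZ, ΣZZ, and Dt assembled as above, the conditional moments of Δ follow from standard Gaussian conditioning; a minimal numerical sketch (our notation, assuming the constant means of Section 3):

```python
import numpy as np

def delta_moments(beta_d0, sigma2_d, Sigma_dz, Sigma_zz, D_t):
    """Conditional mean and variance of Delta(x, c*, t) given (yp, ys):
    mean = beta_D0 + Sigma_DZ' Sigma_ZZ^{-1} D_t,
    var  = sigma_D^2 - Sigma_DZ' Sigma_ZZ^{-1} Sigma_DZ."""
    w = np.linalg.solve(Sigma_zz, D_t)
    mean = beta_d0 + Sigma_dz @ w
    var = sigma2_d - Sigma_dz @ np.linalg.solve(Sigma_zz, Sigma_dz)
    return float(mean), float(var)
```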
APPENDIX C: DETAILS OF THE PREDICTION INTERVAL
By conditioning on the posterior draws (c★_i, ϕ_i) and using the law of large numbers,

Var(Ys(x, c, t)|ys, yp, t̂★) ≈ (1/NMC) ∑_{i=1}^{NMC} {Var(Ys(x, c, t)|c★_i, ϕ_i, ys, yp, t̂★) + [E(Ys(x, c, t)|c★_i, ϕ_i, ys, yp, t̂★) − ŷs(x, c, t)]²}   (C.1)

and

Var(Ys(x, c★, t̂★) + Δ(x, c★, t̂★)|ys, yp, t̂★) ≈ (1/NMC) ∑_{i=1}^{NMC} {Var(Ys(x, c★_i, t̂★) + Δ(x, c★_i, t̂★)|c★_i, ϕ_i, ys, yp, t̂★) + [E(Ys(x, c★_i, t̂★) + Δ(x, c★_i, t̂★)|c★_i, ϕ_i, ys, yp, t̂★) − η̂(x, c★)]²}   (C.2)

for large NMC. Using Equations (1), (2), (4), (5), (6), and the following formulas for the conditional means and variances of Ys(x, c, t) and η(x, c★), one can compute Equations (18), (19), (C.1), and (C.2) by taking (ĉ★, ϕ̂) to be each posterior draw (c★_i, ϕ_i) in turn.

- The conditional expectation E(Ys(x, c, t)|ĉ★, ϕ̂, ys, yp, t̂★) = βZ,0 + Σ̂0Z⊤ Σ̂ZZ⁻¹ D̂t, where Σ̂0Z is the (np + ns) × 1 vector of covariances between Z(x, c, t) and the entries of (Yp⊤, Ys⊤)⊤ given (ĉ★, ϕ̂, t̂★). For i = 1, …, np, the ith element of Σ̂0Z is σ̂Z² RZ((x, c, t), (x_i^p, ĉ★, t̂★)), while for i = np + 1, …, np + ns the ith element is σ̂Z² RZ((x, c, t), (x_{i−np}^s, c_{i−np}^s, t_{i−np}^s)). Similarly, Σ̂ZZ is given by (B.2) and D̂t by (B.3), both conditional on (ĉ★, ϕ̂, t̂★).

- The conditional variance Var(Ys(x, c, t)|ĉ★, ϕ̂, ys, yp, t̂★) = σ̂Z² − Σ̂0Z⊤ Σ̂ZZ⁻¹ Σ̂0Z, where Σ̂0Z and Σ̂ZZ are defined above and σ̂Z² is taken from ϕ̂.

- The conditional mean of η(x, c★) is E(Ys(x, ĉ★, t̂★)|ĉ★, ϕ̂, ys, yp, t̂★) + E(Δ(x, ĉ★, t̂★)|ĉ★, ϕ̂, ys, yp, t̂★), whose two terms are derived above and in Appendix B, respectively.

- The conditional variance Var(Ys(x, ĉ★, t̂★) + Δ(x, ĉ★, t̂★)|ĉ★, ϕ̂, ys, yp, t̂★) = σ̂Z² + σ̂D² − (Σ̂0Z + Σ̂DZ)⊤ Σ̂ZZ⁻¹ (Σ̂0Z + Σ̂DZ), where σ̂Z² and σ̂D² are taken from ϕ̂, Σ̂0Z and Σ̂ZZ are given above, and Σ̂DZ is given by (B.1), all conditional on (ĉ★, ϕ̂, t̂★).
SUPPLEMENTAL MATERIALS
A MATLAB Routine: The supplemental materials include a user guide to this MATLAB package (STaCguide.pdf), the software package (directory STaCprogram), and an example subdirectory having the driver file for running the example in Section 5.3. (STaC_software_version1.0.zip, zip archive)
Contributor Information
Gang Han, H. Lee Moffitt Cancer Center & Research Institute, MRC/BIOSTAT, 12902 Magnolia Drive, Tampa, FL 33612, (gang.han@moffitt.org).
Thomas J. Santner, Department of Statistics, The Ohio State University, 415 Cockins Hall, 1958 Neil Avenue, Columbus, OH 43210, (tjs@stat.osu.edu)
Jeremy J. Rawlinson, C2-527 Clinical Programs Center, Department of Clinical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY 14853, (jjr4@cornell.edu)
REFERENCES
- Bayarri, M. J., Berger, J. O., Cafeo, J., Garcia-Donato, G., Liu, F., Palomo, J., Parthasarathy, R. J., Paulo, R., Sacks, J., and Walsh, D. (2007a), "Computer Model Validation With Functional Output," The Annals of Statistics, 35(5), 1874–1906.
- Bayarri, M. J., Berger, J. O., Paulo, R., Sacks, J., Cafeo, J. A., Cavendish, J., Lin, C.-H., and Tu, J. (2007b), "A Framework for Validation of Computer Models," Technometrics, 49(2), 138–154.
- Cox, D. D., Park, J. S., and Singer, C. E. (1996), "A Statistical Method for Tuning a Computer Code to a Database," Technical Report 96-3, Rice University, Dept. of Statistics.
- Craig, P. S., Goldstein, M., Rougier, J. C., and Seheult, A. H. (2001), "Bayesian Forecasting for Complex Systems Using Computer Simulators," Journal of the American Statistical Association, 96, 717–729.
- Craig, P. S., Goldstein, M., Seheult, A. H., and Smith, J. A. (1996), "Bayes Linear Strategies for History Matching of Hydrocarbon Reservoirs," in Bayesian Statistics, Vol. 5, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford: Oxford University Press, pp. 69–95.
- Fang, K. T., Li, R., and Sudjianto, A. (2005), Design and Modeling for Computer Experiments, London: Chapman & Hall.
- Forrester, A., Sobester, A., and Keane, A. (2008), Engineering Design via Surrogate Modelling: A Practical Guide, Chichester, U.K.: Wiley.
- Gattiker, J. (2005), "Using the Gaussian Process Model for Simulation Analysis (GPM/SA) Code," technical report, Los Alamos National Laboratory.
- Higdon, D., Gattiker, J., Williams, B., and Rightley, M. (2008), "Computer Model Calibration Using High Dimensional Output," Journal of the American Statistical Association, 103, 570–583.
- Higdon, D., Kennedy, M., Cavendish, J., Cafeo, J., and Ryne, R. (2004), "Combining Field Data and Computer Simulations for Calibration and Prediction," SIAM Journal of Scientific Computing, 26, 448–466.
- Higdon, D., Williams, B., Moore, L., McKay, M., and Keller-McNulty, S. (2005), "Uncertainty Quantification for Combining Experimental Data and Computer Simulations," Technical Report LA-UR 05-4951, Los Alamos National Laboratory.
- Kennedy, M. C., and O'Hagan, A. (2001), "Bayesian Calibration of Computer Models" (with discussion), Journal of the Royal Statistical Society, Ser. B, 63, 425–464.
- Linkletter, C., Bingham, D., Hengartner, N., Higdon, D., and Ye, K. Q. (2006), "Variable Selection for Gaussian Process Models in Computer Experiments," Technometrics, 48, 478–490.
- Loeppky, J., Bingham, D., and Welch, W. (2006), "Computer Model Calibration or Tuning in Practice," technical report, University of British Columbia.
- McKay, M. D., Beckman, R. J., and Conover, W. J. (1979), "A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code," Technometrics, 21, 239–245.
- Park, J. S. (1991), "Tuning Complex Computer Codes to Data and Optimal Designs," Ph.D. thesis, University of Illinois, Champaign/Urbana, IL.
- Qian, P. Z. G. (2009), "Nested Latin Hypercube Designs," Biometrika, to appear, doi: 10.1093/biomet/asp045.
- Qian, P. Z. G., and Wu, C. F. J. (2009), "Sliced Space-Filling Designs," Biometrika, to appear, doi: 10.1093/biomet/asp044.
- Qian, P. Z. G., Ai, M., and Wu, C. F. J. (2009), "Construction of Nested Space-Filling Designs," The Annals of Statistics, 37, 3616–3643.
- Rawlinson, J. J., Furman, B. D., Li, S., Wright, T. M., and Bartel, D. L. (2006), "Retrieval, Experimental, and Computational Assessment of the Performance of Total Knee Replacements," Journal of Orthopaedic Research, 24(7), 1384–1394, doi: 10.1002/jor.20181.
- Saltelli, A., Chan, K., and Scott, E. (2000), Sensitivity Analysis, Chichester: Wiley.
- Santner, T. J., Williams, B. J., and Notz, W. I. (2003), The Design and Analysis of Computer Experiments, New York: Springer Verlag.