Abstract
Background.
Calibration is the process of estimating parameters of a mathematical model by matching model outputs to calibration targets. In the presence of nonidentifiability, multiple parameter sets solve the calibration problem, which may have important implications for decision making. We evaluate the implications of nonidentifiability on the optimal strategy and provide methods to check for nonidentifiability.
Methods.
We illustrated nonidentifiability by calibrating a three-state Markov model of cancer relative survival (RS). We performed two different calibration exercises: (1) including only RS as a calibration target and (2) adding the ratio between the two non-death states over time as an additional target. We used the Nelder-Mead (NM) algorithm to identify parameter sets that best matched the calibration targets, and we used collinearity and likelihood profile analyses to check for nonidentifiability. We then estimated the benefit of a hypothetical treatment in terms of life expectancy gains using different, but equally good-fitting, parameter sets. We also applied collinearity analysis to a realistic model of the natural history of colorectal cancer.
Results.
When only RS was used as the calibration target, two different parameter sets yielded similar maximum likelihood values. The high collinearity index and the bimodal likelihood profile on both parameters demonstrated the presence of nonidentifiability. These different, equally good-fitting parameter sets produced different estimates of treatment effectiveness (0.67 vs. 0.31 years), which could influence the optimal decision. By incorporating the additional target, the model became identifiable, with a collinearity index of 3.5 and a unimodal likelihood profile.
Conclusions.
In the presence of nonidentifiability, equally likely parameter estimates might yield different conclusions. Checking for the existence of nonidentifiability and assessing its implications should be incorporated into standard model calibration procedures.
Keywords: calibration, estimation, likelihood function, nonidentifiability, decision-analytic models
BACKGROUND
Model calibration is the process of estimating values for unknown or uncertain parameters of a mathematical model by matching model outputs to observed clinical, epidemiological, or any other type of data arising from a physical or biological system (known as calibration targets) (1). The goal is to identify parameter values that maximize the fit between model outputs and the calibration targets (2). Previous literature has shown how model calibration can be described as a statistical estimation problem (1,3,4). A desirable property for a statistical model is that of identifiability, which requires that different sets of parameter estimates cannot lead to the same probability distribution of the data (5,6). In the context of model calibration, this means that there exists a unique set of model parameter values that yields the best fit to the chosen calibration targets (7,8).
With the increasing complexity of mathematical models parameterized with a large number of unknown inputs, concerns have been raised about calibrating to an insufficient number of targets relative to the number of parameters to be estimated, making the model nonidentifiable (9). However, a model need not be highly complex for its parameters to be nonidentifiable. Identifiability in the context of model calibration in medical decision making has been briefly discussed (10,11) but has not been formally described. In this article, we explicitly define nonidentifiability in the context of model calibration and demonstrate that even a simple disease simulation model can have nonidentifiable parameters. We start by defining the problem of nonidentifiability in a calibration framework. We then describe a two-parameter simulation model of cancer recurrence and calibrate it to relative survival. We show that for this simple case, calibration does not yield a unique solution, exhibiting problems of nonidentifiability. We demonstrate that this problem could potentially be addressed by incorporating additional information, and we assess the implications of nonidentifiability in the evaluation of the comparative effectiveness of a hypothetical treatment. In addition, we describe methods to detect nonidentifiability and apply them to our illustrative example and to a realistic model of the natural history of colorectal cancer. We conclude with suggestions on how to potentially achieve identifiability and on methods to estimate parameters whenever nonidentifiability is unavoidable.
METHODS
Calibration as an estimation problem
Let M denote a mathematical model that takes a set of parameters θ as input and produces a set of outputs denoted as ϕ, that is, ϕ = M(θ). The last equality holds true if M is a deterministic model; for stochastic models, ϕ would be defined as the expected value of model outputs, ϕ = E[M(θ)]. In the context of medical decision making, typical model types include Markov models, microsimulations, discrete event simulations, and dynamic transmission models (12,13). Input parameters θ might include transition probabilities or rates, while model outputs ϕ might be quantities such as prevalence, incidence, or survival curves (14). There is typically a subset of parameters θ that are unobserved due to financial, practical, or ethical reasons (15). Thus, θ = (θu, θk), where θu denotes unknown parameters that need to be estimated via calibration (16,17) and θk denotes parameters that are either known a priori or that can be estimated directly from available data without the use of the mathematical model M (18). θu are the parameters of interest in the calibration problem, often called calibrated input parameters (19–21), whereas the parameters θk are held fixed for the purposes of calibration. In medical decision making, unknown parameters might be disease progression or regression rates or probabilities of symptom-based detection in natural history models of chronic diseases (10,11,22,23), or the transmission probabilities in infectious disease dynamic models (24), among others.
Let Y denote the clinical or epidemiological phenomenon in the population of interest. The empirical data, y, referred to as the calibration targets, are a realization of Y. The process of model calibration is an estimation problem where we seek to estimate θu by using a summary measure of the discrepancy between the corresponding model output, ϕ, and y (4). In this paper, we use the likelihood function as the summary measure; however, other possible summary measures include the sum of squared differences (SSD), a weighted sum of squared differences (WSSD), absolute differences, and so on (19,25–27). To define a likelihood, a probability distribution of the calibration targets, f, is specified as a function of model parameters, θ (10). In the context of model calibration, Y ~ f(y; ϕ), where f denotes a probability distribution of the calibration targets that is conditional on the model outputs, ϕ, which in turn depend on θ. Given that Y = y is observed and M is a deterministic model, the likelihood function can be defined as L(θ) ≡ L(θ; y) = f(y; ϕ) = f(y; θ, M). A schematic diagram of the relationship between input parameters θ and calibration targets y used to create a likelihood function L(θ) for model calibration in medical decision making is provided in Figure 1. A more detailed conceptualization of calibration as an estimation process has been proposed previously to account for both model inadequacy and observation error (1,3). However, modeling these discrepancies falls outside the scope of this article.
Figure 1.
Schematic diagram of the relation between input parameters θ and calibration targets y to create a likelihood function L(θ) for model calibration in medical decision making.
Nonidentifiability
Identifiability refers to whether the specified model and the chosen calibration targets are sufficient to yield a unique set of values for the calibrated input parameters (28–32). From a statistical perspective, a set of parameters θ is said to be identifiable if different values of θ correspond to different probability distributions f(y; ϕ): if θ ≠ θ′ then f(y; ϕ) ≠ f(y; ϕ′) (33). Therefore, the parameters θ are identifiable if the mapping from θ to f(y; ϕ) is one-to-one for all θ ∈ ϴ (25).
Nonidentifiability can arise at different stages of the model calibration process represented in Figure 1. The most conventional sense of nonidentifiability occurs when, given different parameter sets θ ≠ θ′, the model M produces the same output values corresponding to the calibration targets, ϕ = ϕ′, which in turn could yield multiple peaks of similar magnitude in the fit function. Furthermore, it might not be possible to estimate the parameters of a nonidentifiable model even with an infinite amount of data (11,31), where "data" refers to the sample size informing the selected calibration targets. Note that the model outputs corresponding to the calibration targets, ϕ, are a subset of the complete model output (e.g., the distribution of the population across all health states at every time point), meaning that nonidentifiability can arise when matching the calibration targets even if the mathematical model itself is well-defined. To construct an identifiable calibration problem in this case, one might need to include additional calibration targets (if data became available) (11) or further restrict the search space of possible values for the unknown parameters based on biological plausibility or other a priori knowledge of the disease or population of interest.
Even if M(θ) = ϕ is a one-to-one mapping for all θ ∈ ϴ, eliminating the possibility of conventional nonidentifiability, calibration nonidentifiability can still occur in the mapping of model outputs to a goodness-of-fit value, L(ϕ; y). If the goodness-of-fit function maps different model outputs ϕ ≠ ϕ′ to the same value, L(ϕ; y) = L(ϕ′; y), then there again may exist multiple peaks of similar magnitude in the fit function. A simple example would be calibration to two (scalar) targets using the sum of squared differences between the model outputs and target values as the goodness-of-fit measure. If one set of parameter values fits the first target well and the second target poorly, while another set of parameter values fits the second target well and the first target poorly to the same extent, then both sets of parameters would result in the same value of fit. If these were also the best fits that could be achieved (i.e., there were no other sets of parameter values that fit both targets well), then the fit function would exhibit multiple peaks. In this case, nonidentifiability could potentially be alleviated by reconsidering the measure of goodness of fit. While in theory the goodness-of-fit function should reflect the true preferences of the analyst in constructing the model, these decisions are often arbitrary or not carefully thought out. Therefore, it is important that the analyst understands the assumptions behind different goodness-of-fit measures. For example, the sum of squared differences assigns equal weight to all targets, whereas the weighted sum of squared differences allows different weights for different targets, which could, for example, be based on the variance of each target. Alternatively, approaches that eliminate the need to produce a single summary measure could be employed, such as the Pareto frontier approach (21). However, in this study we are interested in nonidentifiability issues at the goodness-of-fit level. If the goodness-of-fit function is carefully constructed and the problem of nonidentifiability remains, the issue could still be addressed by including additional calibration targets or restricting the unknown parameter search space, as in conventional nonidentifiability.
Identifying nonidentifiability
There are a number of different methods for detecting nonidentifiability (34–36). One such method, collinearity analysis (37), involves computing a collinearity index γK that reflects the degree of near-linear dependence of the summary measures on a subset of parameters while fixing all other parameters to certain values. The summary measures could simply be the model outcomes or, in the context of identifiability analysis, the likelihood, SSD, or WSSD. The collinearity index γK of a subset K of parameters is defined as:

γK = 1 / √( min EV[ S̃Kᵀ S̃K ] ),

where EV[·] refers to the operator calculating the set of eigenvalues of its argument, and S̃K denotes the subset of the matrix S̃ containing the columns corresponding to the parameters in K. S̃ is a normalized sensitivity matrix whose j-th column is defined as S̃j = Sj/‖Sj‖, where Sj represents a column of derivatives of the summary measures with respect to the j-th parameter. Thus, S̃ is a normalized measure of the importance of individual parameters on the summary measures (38). To compute S̃, the model needs to be evaluated once per parameter to approximate these derivatives numerically. The collinearity index takes values from 1 to infinity, where higher values are associated with nonidentifiability problems (39). For example, a collinearity index of 1 means that the columns of S̃K are orthogonal and the parameter set is identifiable. A high value of γK implies that the parameter set K is poorly identifiable; the higher the value, the less identifiable the parameter set. In practical terms, it is suggested that parameter sets with collinearity indices lower than 10 are identifiable, between 10 and 15 are poorly identifiable, and greater than 15 are nonidentifiable (38).
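To make this computation concrete, the following minimal R sketch computes γK from a generic sensitivity matrix; the toy matrix S and the helper collinearity_index() are our illustrative assumptions, not code from the original analysis.

```r
# Minimal sketch (R): collinearity index gamma_K from a sensitivity matrix S.
# S is an (n_measures x n_parameters) matrix of derivatives of the summary
# measures with respect to each parameter; here it is filled with toy values.
set.seed(42)
S <- matrix(rnorm(60 * 2), nrow = 60, ncol = 2)  # 60 summary measures, 2 parameters

collinearity_index <- function(S, K = seq_len(ncol(S))) {
  S_tilde <- apply(S, 2, function(s) s / sqrt(sum(s^2)))  # normalize each column
  S_K <- S_tilde[, K, drop = FALSE]                       # columns for parameter subset K
  lambda_min <- min(eigen(t(S_K) %*% S_K, symmetric = TRUE)$values)
  1 / sqrt(lambda_min)                                    # gamma_K
}

collinearity_index(S)  # near 1 for near-orthogonal columns; large values flag nonidentifiability
```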
Likelihood profiling is another approach for identifying issues of nonidentifiability. The profile likelihood, pL, is a one-dimensional representation of the likelihood function showing how the likelihood varies over a parameter subset of interest while controlling for the influence of the remaining parameters (40). Profiling the likelihood involves evaluating the log-likelihood at each of k values of one parameter θi while the remaining parameters θj, j ≠ i, are re-optimized at each of the k values (41). In general, if θI is a set of parameters of interest and θC is a set of complementary (i.e., nuisance) parameters, the profile likelihood for θI is pL(θI) = maxθC L(θI, θC). In practical terms, profiling the likelihood of p different parameters at k values each requires p × k optimization runs. Having more than one minimum in the negative likelihood profile is an indicator of nonidentifiability.
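As an illustration of the re-optimization step, the sketch below profiles a two-parameter negative log-likelihood over its first parameter; the objective nll() is assumed to exist (e.g., the function minimized during calibration), and the grid and bounds are arbitrary choices for this example.

```r
# Minimal sketch (R): profile the negative log-likelihood over theta1,
# re-optimizing the nuisance parameter theta2 at each fixed grid value.
profile_nll <- function(nll, grid_theta1, lower = 1e-6, upper = 1 - 1e-6) {
  sapply(grid_theta1, function(th1) {
    # one-dimensional re-optimization over theta2 with theta1 held fixed
    optim(par = 0.5, fn = function(th2) nll(c(th1, th2)),
          method = "Brent", lower = lower, upper = upper)$value
  })
}
# grid <- seq(0.01, 0.30, by = 0.01)
# plot(grid, profile_nll(nll, grid), type = "l")  # >1 local minimum flags nonidentifiability
```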
Simple model of cancer relative survival
We illustrate the nonidentifiability problem by calibrating the transition probabilities of a simple deterministic Markov model to observed relative survival (RS) as reported by the Surveillance, Epidemiology, and End Results (SEER) Program. RS represents "cancer survival in the absence of other causes of death" (42). Therefore, relative survival at time t can be interpreted as the probability of not dying from cancer by time t in a hypothetical setting where individuals are not at risk of dying from non-cancer causes. Most cancer-related deaths are attributed to metastatic recurrence (43,44), which is not directly observed in SEER. Because the cause-of-death information in cancer registries can be misclassified, SEER computes relative survival "as the ratio of the proportion of observed survivors in a cohort of cancer patients to the proportion of expected survivors in a comparable set of cancer-free individuals" (42,45). To model the mechanism of cancer death, Markov models of cancer RS from an early-stage diagnosis typically include a distant metastasis (Mets) state, which is not directly observed in cancer registries (46). Accordingly, we developed a three-state, two-parameter Markov model of relative survival. In the Markov model, a simulated cohort of individuals starts in a "no evidence of disease" (NED) state, from which individuals face a monthly risk pMets of being diagnosed with distant metastasis (Mets). Individuals who develop distant metastasis face a monthly risk pDieMets of dying from cancer. To be consistent with the dynamics of RS, we assume that individuals can only die of cancer once they have developed Mets. The state-transition diagram of the relative survival model is shown in Figure 2. To translate this calibration exercise into the terminology defined in the previous sections, y is RS, M represents the Markov model of cancer recurrence and RS, θu = {pMets, pDieMets}, and θk = {Ø} (i.e., there are no fixed input parameters). To compute RS from the Markov model over time, ϕt, we sum the proportion of the cohort in the NED and Mets states at each time t.
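A minimal R implementation of this three-state model, under the structure just described, could look as follows; the function name rs_model() and its interface are our own conventions for this sketch.

```r
# Minimal sketch (R): three-state Markov model (NED, Mets, Dead) of relative survival.
# Returns the RS curve (proportion in NED + Mets) over n_t monthly cycles.
rs_model <- function(theta, n_t = 60) {
  pMets    <- theta[1]  # monthly probability of NED -> Mets
  pDieMets <- theta[2]  # monthly probability of Mets -> Dead (cancer death)
  P <- matrix(c(1 - pMets, pMets,        0,
                0,         1 - pDieMets, pDieMets,
                0,         0,            1),
              nrow = 3, byrow = TRUE)
  m <- matrix(0, nrow = n_t + 1, ncol = 3,
              dimnames = list(NULL, c("NED", "Mets", "Dead")))
  m[1, ] <- c(1, 0, 0)                         # cohort starts in NED
  for (t in 1:n_t) m[t + 1, ] <- m[t, ] %*% P  # one-month Markov update
  rowSums(m[, c("NED", "Mets")])[-1]           # RS at months 1, ..., n_t
}

phi <- rs_model(c(0.10, 0.05))  # RS under the true parameter values
```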
Figure 2.
State-transition diagram of a three-state Markov model of cancer relative survival.
Calibration of simple cancer relative survival model
For illustration purposes and to avoid issues of model misspecification, we generated the target data, y, by running the Markov model as a microsimulation with known parameter values (pMets = 0.10 and pDieMets = 0.05, that is, θtrue = (0.10, 0.05)) and stochastically simulating 200 independent individuals over 60 months (47). Using the information from these simulated individuals, we estimated a RS curve, yt, with its corresponding standard error σt in monthly intervals. The likelihood function was constructed by assuming that the targets yt were Normal deviations from the model outputs ϕt with standard deviation σt at each time t. That is,

yt ~ Normal(ϕt, σt²), t = 1, …, 60.

Therefore, the likelihood contribution at each time t is given by

L(θ; yt) = f(yt | θ) = (1/√(2πσt²)) exp(−(yt − ϕt)² / (2σt²)),

where f(yt|θ) denotes the Normal density function for the target yt.

We assumed independence across targets to compute an aggregated likelihood function by multiplying the likelihood values at each time t as follows:

L(θ; y) = ∏t f(yt | θ), where the product runs over t = 1, …, 60.
If there are reasons to believe that targets are not independent, other likelihood functions, such as a multivariate normal or multinomial distribution, can be specified. However, if the choice is a multivariate normal distribution, the analyst is required to define a correlation matrix across the targets. We used the Nelder-Mead (NM) algorithm initialized at 100 random starting values to identify the cancer RS model parameter values that minimize the negative log-likelihood (48).
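A hedged sketch of this step, reusing rs_model() from the sketch above, is shown below; the vectors y and se are assumed to hold the simulated RS targets and their standard errors.

```r
# Minimal sketch (R): negative log-likelihood and multi-start Nelder-Mead calibration.
nll <- function(theta, y, se) {
  if (any(theta <= 0) || any(theta >= 1)) return(1e10)  # keep probabilities in (0, 1)
  phi <- rs_model(theta, n_t = length(y))
  -sum(dnorm(y, mean = phi, sd = se, log = TRUE))       # product of Normal densities, logged
}

set.seed(1)
fits <- lapply(1:100, function(i) {  # 100 random starting values
  optim(par = runif(2, 0.01, 0.30), fn = nll, y = y, se = se, method = "Nelder-Mead")
})
# Inspect the distinct solutions and their minima:
# t(sapply(fits, function(f) c(f$par, nll = f$value)))
```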
RESULTS
Calibration
The NM algorithm converged to two parameter sets with similar log-likelihood values: θ̂₁ = (pMets = 0.05, pDieMets = 0.11) and θ̂₂ = (pMets = 0.11, pDieMets = 0.05). Figure 3 shows the contour of the likelihood and the two regions with similar values. For illustration purposes, in Figure 3 we plot only two different optimization paths; however, these are representative of the 100 different runs of the NM algorithm that we performed. The RS curve as the only calibration target was not sufficient to ensure identifiability.
Figure 3.
Contour plot of the negative log-likelihood function for the calibration of the cancer relative survival Markov model with two different search paths.
For illustration purposes, we also simulated the ratio between NED and Mets over time and added it as a second calibration target to demonstrate that parameters can become identifiable with an additional type of target that provides information on the specific dynamics of the unobserved state Mets. This information is not provided by SEER but could be obtained from alternative sources, such as a clinical trial. To compute the likelihood of this additional target, we assume that the logarithm of the ratio between NED and Mets follows a normal distribution. We assume independence between types of targets to compute an aggregated measure of the overall likelihood.
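Under the independence assumption, the overall log-likelihood is simply the sum of the target-specific log-likelihoods. A minimal sketch follows, where phi_logratio(), y_lr, and se_lr are hypothetical counterparts of the RS quantities for the second target.

```r
# Minimal sketch (R): joint negative log-likelihood with two independent target types.
nll_joint <- function(theta, y_rs, se_rs, y_lr, se_lr) {
  phi_rs <- rs_model(theta, n_t = length(y_rs))      # RS model output
  phi_lr <- phi_logratio(theta, n_t = length(y_lr))  # hypothetical log(NED/Mets) output
  -sum(dnorm(y_rs, mean = phi_rs, sd = se_rs, log = TRUE)) -
    sum(dnorm(y_lr, mean = phi_lr, sd = se_lr, log = TRUE))
}
```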
When the ratio between NED and Mets over time is incorporated as an additional target, the most likely region of the parameter space becomes constrained, making the likelihood function unimodal and exhibiting a unique solution (Figure 4), which suggests that the model is now identifiable. In this case, regardless of the starting value, NM consistently recovers the same set of parameter estimates, θ̂I = (pMets = 0.09, pDieMets = 0.06).
Figure 4.
Contour plot of the negative log-likelihood function for the calibration of the cancer relative survival Markov model including the ratio of the simulated cohort in the NED and Mets states as an additional target, with two different search paths.
Policy implications
We estimated the benefit of a hypothetical treatment that reduces pMets by 30%, implemented as a relative risk (RR) of 0.7, using the three-state Markov model. We ran the model with the two different sets of calibrated parameter values and the true values. Benefits were quantified in terms of life expectancy (LE) gains under the intervention, calculated as a percentage of the LE gains predicted by the model using the true parameters (see Table 1). In the absence of treatment, both parameter sets from the nonidentifiable problem, θ̂₁ and θ̂₂, result in the same LE. With the intervention, the benefit of treatment is overestimated by 88.6% when using θ̂₁ and underestimated by 10.4% when using θ̂₂. However, when using the parameter set from the identifiable problem, θ̂I, the benefit of treatment is only slightly overestimated (by 6.6%), much closer to the truth.
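A sketch of this calculation, under the simplifying assumption that LE can be approximated as the area under the model's survival curve over a long horizon, is shown below; the 600-month horizon and the helper names are our own choices, so the results approximate rather than reproduce Table 1.

```r
# Minimal sketch (R): LE gains from a treatment that multiplies pMets by RR = 0.7.
life_exp <- function(theta, rr = 1, n_t = 600) {
  # area under the monthly survival curve, converted to years
  sum(rs_model(c(theta[1] * rr, theta[2]), n_t = n_t)) / 12
}
le_gain <- function(theta, rr = 0.7) life_exp(theta, rr) - life_exp(theta)

le_gain(c(0.10, 0.05))  # benefit under the true parameters (approx. 0.35 years)
```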
Table 1.
Life expectancy (LE), LE gains, and bias in LE gains from a hypothetical treatment intervention that reduces the risk of developing Mets by 30% (i.e., RR = 0.7) using the true parameters θtrue, the two sets of nonidentifiable parameter estimates θ̂₁ and θ̂₂, and the identifiable set of parameter estimates θ̂I.
| Scenario | pMets | pDieMets | LE without treatment (years) | LE with treatment (years) | LE gains (years) | % difference |
|---|---|---|---|---|---|---|
| θtrue | 0.10 | 0.05 | 2.33 | 2.68 | 0.35 | 0.0% |
| θ̂₁ | 0.05 | 0.11 | 2.21 | 2.88 | 0.67 | 88.6% |
| θ̂₂ | 0.11 | 0.05 | 2.21 | 2.52 | 0.31 | −10.4% |
| θ̂I | 0.09 | 0.06 | 2.19 | 2.56 | 0.37 | 6.6% |
Identifying nonidentifiability
We computed the collinearity index for the parameters of the simple model using the R package FME (49). When the RS curve is used as the only target to calibrate the model, the collinearity index on both parameters tends to infinity, indicating nonidentifiability (Table 2). Similarly, when only the ratio between NED and Mets is included, the calibration problem is poorly identifiable, with a collinearity index of 10. Once both targets are considered for calibration, the parameters become identifiable with a collinearity index of 3.5, suggesting that calibration of both parameters should yield a unique solution, consistent with our findings (Figure 4). If instead it were possible to reduce the calibration to a single unknown parameter (the other being fixed to some known value), the calibration would become identifiable with a collinearity index of 1 for any subset of targets (Table 2). This might occur if new information became available that allowed one of the unknown parameters to be estimated directly (provided that this information pertains to the population of interest and is unbiased), obviating the need to estimate it through calibration.
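For reference, a sketch of this analysis with FME is shown below; sensFun() and collin() are the FME functions for local sensitivity and collinearity analysis, while the wrapper model_out() and the helper logratio_model() are our illustrative assumptions.

```r
# Minimal sketch (R): collinearity analysis with the FME package.
library(FME)

model_out <- function(pars) {
  # FME expects a function of a named parameter vector returning model outputs;
  # logratio_model() is a hypothetical analogue of rs_model() for log(NED/Mets).
  data.frame(time     = 1:60,
             rs       = rs_model(c(pars["pMets"], pars["pDieMets"])),
             logratio = logratio_model(c(pars["pMets"], pars["pDieMets"])))
}

pars <- c(pMets = 0.10, pDieMets = 0.05)
sF <- sensFun(func = model_out, parms = pars)  # local sensitivity functions
collin(sF)                                     # indices for all parameter subsets
# collin(sF, which = "rs")                     # using the RS target only
```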
Table 2.
Collinearity indices for different combinations of calibration targets and subsets of parameters to be estimated.
| Subset of parameters | Calibration targets | Collinearity index |
|---|---|---|
| pMets & pDieMets | Survival | → ∞ |
| pMets & pDieMets | Log-ratio NED/Mets | 10.0 |
| pMets & pDieMets | Survival and log-ratio NED/Mets | 3.5 |
| pMets or pDieMets | Survival | 1.0 |
| pMets or pDieMets | Log-ratio NED/Mets | 1.0 |
| pMets or pDieMets | Survival and log-ratio NED/Mets | 1.0 |
The likelihood profile of the parameters of our simple model using only the survival curve as a target shows two local minima in the negative log-likelihood (see Figure 5), which correspond to the two parameter sets identified by NM in Figure 3 and confirm the presence of nonidentifiability.
Figure 5.
Negative-likelihood profile (blue solid line), confidence levels (blue dashed-line) and minimum value (red dashed-line) of parameters pMets and pDieMets.
To illustrate the potential issue of nonidentifiability and the application of collinearity analysis in a more realistic modeling set-up, we developed a state-transition model (STM) of the natural history of colorectal cancer (CRC), implemented in discrete annual cycles and based on a model structure originally proposed by Wu et al., 2006 (50). Briefly, this model has 9 different health states that include absence of the disease, precancerous lesions (i.e., adenomatous polyps), and preclinical and clinical cancer states by stage. The model has 11 parameters in total, of which we assume 9 are unknown and need to be calibrated, reflecting the typical unknown parameters in this type of model. Similar to the simple model of cancer relative survival, we generated four different age-specific targets by running independent individuals through the model as a microsimulation (47). The targets are the prevalence of adenomas, the proportion of small adenomas, and CRC incidence for early and late cancer stages. A more detailed description of the model and the generation of the calibration targets is presented in the Supplementary Appendix.
We computed the collinearity index, γK, for all possible combinations of the 9 calibrated parameters under different combinations of targets, which required evaluating the model only 9 times in total (once per parameter). The subset of calibration targets included in the calibration influenced the number of model parameters that could be identified (i.e., having γK ≤ 15) (38,39), as summarized in the Supplementary Appendix Figures 4–6. When all four targets were included and only two parameters were unknown, the calibration was almost always identifiable (35 out of 36 possible combinations; Figure 6). However, if 8 parameters were to be estimated through calibration, only one combination was identifiable using all four calibration targets. Even with all the targets, it is not possible to define an identifiable calibration problem for the full set of model parameters, which has a combined γK of 110, shown in the first row of Table 3 and in Figure 6.
Figure 6.
Collinearity index (γK) for all possible combinations of parameters of the natural history model of CRC using all four targets (red solid vertical line indicates a collinearity index of 15).
Table 3.
Collinearity index (γK) for different combinations and numbers of parameters (N) of the natural history model of colorectal cancer using all four targets. A cell equals 1 if the parameter is included in the subset and 0 otherwise. The parameters are described in detail in the Supplementary Appendix.
| l | g | λ2 | λ3 | λ4 | λ5 | λ6 | padenoma | psmall | N | γK |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 | 110 |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 8 | 14 |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 8 | 14 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 8 | 92 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 8 | 81 |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 8 | 109 |
| 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 8 | 108 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 8 | 108 |
| 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 8 | 109 |
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 8 | 102 |
DISCUSSION
In the context of model calibration, much has been written on issues of model specification, structural uncertainty, and search algorithms for performing the estimation (14,51–55). However, the issue of nonidentifiability is rarely discussed. Identifiability is an important element in model calibration. If a model is nonidentifiable, multiple parameter sets could produce exact or similar best-fitting model outputs. This is of particular relevance in the context of medical decision making when using mathematical models that need to be calibrated. Different, but equally good-fitting, sets of parameters might produce different estimates of the effectiveness of interventions that could potentially influence the optimal decision (21,56).
The calibration of the simple model using only the RS curve is nonidentifiable because the two transition probabilities, pMets and pDieMets, compensate for each other to yield the same RS. In one instance, survival reflects faster progression to Mets combined with lower mortality from Mets, while in the other, progression to Mets is slower but mortality is higher. While different sets of values of pMets and pDieMets produce the same RS, the prevalence of metastases in the population over time differs; specifically, it is much higher when the progression to metastasis, pMets, is higher and the mortality from metastases, pDieMets, is lower. Adding the ratio between the proportions of the population in the NED and Mets states as a calibration target penalizes sets of parameter values that do not match the true prevalence of metastases in the population over time. This collapsed the likelihood function to a single peak at parameter values where both RS and the NED:Mets ratio are well matched over time (Figure 4). To compute the overall likelihood for the simple model of RS, we assumed independence across targets. It is common to assume that targets are independent when calculating an overall goodness-of-fit measure. In practice, this might not be true and might be a non-verifiable assumption; however, while this is a concern for estimating the uncertainty in calibrated parameters, in the case of a nonidentifiable calibrated model the problem will persist regardless of the independence assumption. The main implication of erroneously assuming independence is an underestimate of the uncertainty of the calibrated parameters.
Scholars have argued that the best approach when nonidentifiability is known or suspected is to acknowledge the problem and, if possible, gather more information in order to estimate model parameters (33). Although identifiability might not always be guaranteed in model calibration, incorporating more information or constraining the parameter space based on prior knowledge of the disease of interest could help make the model identifiable (57). In our example, using an additional target for calibration solved the problem of nonidentifiability, but it was not obvious ex ante that this would resolve the issue. In general, having targets that inform different aspects of the model will help improve and potentially achieve identifiability. If there are targets that are currently not available but could be obtained at some cost, the analyst could conduct an exploratory collinearity analysis to determine whether these additional targets would improve the identifiability of the calibration problem. As a word of caution, seeking out additional data sources to resolve nonidentifiability might introduce bias in the calibrated parameters if the additional calibration targets are derived from a population different from the one being modeled (often called population bias (58)). Therefore, it is important that the target represents the population of interest or, if this is not feasible, that the target be modified to represent this population. If there is information on how the two populations differ, the target could be modified accordingly through different techniques, such as meta-analysis or bias analysis, with the latter being specifically designed to account for transferability (59).
Alternative solutions include constraining the parameter space using expert information or fixing a subset of the parameters at meaningful values to reduce the number of parameters estimated through calibration. Constraining the parameter space can be done either by adding constraints on the parameter values to the optimization in a maximum likelihood estimation set-up or by specifying bounds on the parameters through informative constrained priors (e.g., uniform priors with user-defined lower and upper bounds) in a Bayesian set-up (37). To verify whether incorporating additional targets or constraining the parameter space improves identifiability, the analyst could recompute the collinearity index with this updated information.
To numerically compute the collinearity index or profile the likelihood, the model needs to be re-evaluated multiple times at different parameter values. For problems with a relatively large number of parameters, the computational burden of these analyses increases, but this need not be prohibitive. For example, the computational burden of conducting collinearity analysis in a realistic calibration set-up, such as the calibration of the natural history model of CRC described above, is negligible compared with the number of evaluations needed for calibration. However, as the number of parameters increases, profiling the likelihood quickly becomes computationally intensive (60), and inference stemming from the likelihood profile might be misleading (61). In such cases, more efficient methods for exploring the parameter space can be employed. For example, one can use a direct search method, making sure to initialize it at different starting values to look for different convergence points, or employ a Bayesian approach that recovers the whole posterior distribution (e.g., Markov chain Monte Carlo methods) (5). This might not be feasible for simulation models that are computationally time consuming, such as microsimulation or discrete-event simulation models. However, it is possible to construct statistical emulators (often called metamodels) of the original simulation model that can be evaluated in a fraction of the time (62). Emulators can be, and have been, used to calibrate the parameters of the original simulation model (1,63). Thus, the different methods to check for nonidentifiability could be applied to the emulator.
Checking for the existence of nonidentifiability should be an important step in model calibration. In the presence of nonidentifiability, it is important to try to make the parameters identifiable by constraining the parameter space, either through imposing optimization constraints or through constrained priors in a Bayesian set-up (33), or by incorporating additional calibration targets (11). If these approaches are not possible or not sufficient to eliminate nonidentifiability, it is important to report the ranges of the equally good-fitting parameter sets (64) and an estimate of the uncertainty of the calibrated parameters (e.g., variance), which will tend to be high for nonidentifiable parameters (33). Furthermore, a sensitivity analysis should be conducted on the policy implications of the different best-fitting solutions, as we did in Table 1.
In this article, we showed that nonidentifiability is not only a potential problem in complex simulation models but can also occur in simple ones. If nonidentifiability is present, its effects on the recommendations derived from the mathematical model should be assessed.
Supplementary Material
Acknowledgments
We thank Hawre Jalal, Eric F. Lock and Bryan Dowd for their discussion on this topic with the authors.
Financial support for this study was provided in part by a grant from the National Council of Science and Technology of Mexico (CONACYT) and a Doctoral Dissertation Fellowship from the Graduate School of the University of Minnesota as part of Dr. Alarid-Escudero’s doctoral program. Dr. Enns was supported by a grant from the National Institute of Allergy and Infectious Disease of the National Institutes of Health under award no. K25AI118476. Drs. Kuntz and Alarid-Escudero were supported by a grant from the National Cancer Institute (U01-CA-199335) as part of the Cancer Intervention and Surveillance Modeling Network (CISNET). The funding agencies had no role in the design of the study, interpretation of results, or writing of the manuscript. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report.
Footnotes
The preliminary findings from this analysis were presented at the 39th annual meeting of the Society for Medical Decision Making in October 2017.
References
- 1. Kennedy MC, O'Hagan A. Bayesian calibration of computer models. J R Stat Soc Ser B (Statistical Methodology). 2001;63(3):425–64. Available from: http://wrap.warwick.ac.uk/11886/
- 2. AHRQ. Decision and Simulation Modeling: Review of Existing Guidance, Future Research Needs, and Validity Assessment. Rockville, MD; 2014. Available from: http://effectivehealthcare.ahrq.gov/ehc/products/598/1965/modeling-review-draft-140912.pdf
- 3. Higdon D, Kennedy M, Cavendish JC, Cafeo JA, Ryne RD. Combining field data and computer simulations for calibration and prediction. SIAM J Sci Comput. 2004;26(2):448–66.
- 4. Campbell K. Statistical calibration of computer simulations. Reliab Eng Syst Saf. 2006;91(10):1358–63.
- 5. Gustafson P. Bayesian Inference for Partially Identified Models: Exploring the Limits of Limited Data. Boca Raton, FL: CRC Press; 2015.
- 6. Wang W. Identifiability of linear mixed effects models. Electron J Stat. 2013;7(1):244–63.
- 7. Arendt PD, Apley DW, Chen W. Quantification of model uncertainty: calibration, model discrepancy, and identifiability. J Mech Des. 2012;134(10):100908.
- 8. Arendt PD, Apley DW, Chen W, Lamb D, Gorsich D. Improving identifiability in model calibration using multiple responses. J Mech Des. 2012;134(10):100909.
- 9. Basu S, Galvani AP. Re: "Multiparameter calibration of a natural history model of cervical cancer." Am J Epidemiol. 2007;166(8):983.
- 10. Rutter CM, Miglioretti DL, Savarino JE. Bayesian calibration of microsimulation models. J Am Stat Assoc. 2009;104(488):1338–50.
- 11. Rutter CM, Zaslavsky AM, Feuer EJ. Dynamic microsimulation models for health outcomes: a review. Med Decis Making. 2011;31(1):10–8.
- 12. Brennan A, Chick SE, Davies R. A taxonomy of model structures for economic evaluation of health technologies. Health Econ. 2006;15(12):1295–310.
- 13. Caro JJ, Briggs AH, Siebert U, Kuntz KM. Modeling good research practices--overview: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-1. Med Decis Making. 2012;32(5):667–77.
- 14. Briggs AH, Weinstein MC, Fenwick EAL, Karnon J, Sculpher MJ, Paltiel AD. Model parameter estimation and uncertainty analysis: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force Working Group-6. Med Decis Making. 2012;32(5):722–32.
- 15. Garnett GP, Kim JJ, French K, Goldie SJ. Chapter 21: Modelling the impact of HPV vaccines on cervical cancer and screening programmes. Vaccine. 2006;24(Suppl 3):178–86.
- 16. Erenay FS, Alagoz O, Banerjee R, Cima RR. Estimating the unknown parameters of the natural history of metachronous colorectal cancer using discrete-event simulation. Med Decis Making. 2011;31(4):611–24.
- 17. Russell LB. Exploring the unknown and the unknowable with simulation models. Med Decis Making. 2011;31(4):521–3.
- 18. Bilcke J, Chapman R, Atchison C, et al. Quantifying parameter and structural uncertainty of dynamic disease transmission models using MCMC: an application to rotavirus vaccination in England and Wales. Med Decis Making. 2015;35(5):633–47.
- 19. Vanni T, Karnon J, Madan J, et al. Calibrating models in economic evaluation: a seven-step approach. Pharmacoeconomics. 2011;29(1):35–49.
- 20. Karnon J, Vanni T. Calibrating models in economic evaluation: a comparison of alternative measures of goodness of fit, parameter search strategies and convergence criteria. Pharmacoeconomics. 2011;29(1):51–62.
- 21. Enns EA, Cipriano LE, Simons CT, Kong CY. Identifying best-fitting inputs in health-economic model calibration: a Pareto frontier approach. Med Decis Making. 2015;35(2):170–82.
- 22. Welton NJ, Ades AE. Estimation of Markov chain transition probabilities and rates from fully and partially observed data: uncertainty propagation, evidence synthesis, and model calibration. Med Decis Making. 2005;25(6):633–45.
- 23. Karnon J, Goyder E, Tappenden P, et al. A review and critique of modelling in prioritising and designing screening programmes. Health Technol Assess. 2007;11(52).
- 24. Enns EA, Brandeau ML, Igeme TK, Bendavid E. Assessing effectiveness and cost-effectiveness of concurrency reduction for HIV prevention. Int J STD AIDS. 2011;22(10):558–67.
- 25. Lehmann EL, Casella G. Theory of Point Estimation. 2nd ed. New York: Springer; 1998.
- 26. van der Steen A, van Rosmalen J, Kroep S, et al. Calibrating parameters for microsimulation disease models: a review and comparison of different goodness-of-fit criteria. Med Decis Making. 2016:1–14.
- 27. Stout NK, Knudsen AB, Kong CY, McMahon PM, Gazelle GS. Calibration methods used in cancer simulation models and suggested reporting guidelines. Pharmacoeconomics. 2009;27(7):533–45.
- 28. Jacquez JA. The inverse problem for compartmental systems. Math Comput Simul. 1982;24(6):452–9.
- 29. Jacquez JA, Greif P. Numerical parameter identifiability and estimability: integrating identifiability, estimability, and optimal sampling design. Math Biosci. 1985;77(1–2):201–27.
- 30. Bellman R, Åström KJ. On structural identifiability. Math Biosci. 1970;7(3–4):329–39.
- 31. Bickel PJ, Doksum KA. Mathematical Statistics: Basic Ideas and Selected Topics. 2nd ed. Upper Saddle River, NJ: Prentice-Hall; 2001.
- 32. Casella G, Berger R. Statistical Inference. 2nd ed. Pacific Grove, CA: Duxbury; 2002.
- 33. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. 3rd ed. Chapman & Hall/CRC Texts in Statistical Science. CRC Press; 2014.
- 34. Raue A, Kreutz C, Maiwald T, et al. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics. 2009;25(15):1923–9.
- 35. Raue A, Kreutz C, Theis FJ, Timmer J. Joining forces of Bayesian and frequentist methodology: a study for inference in the presence of non-identifiability. Philos Trans R Soc A Math Phys Eng Sci. 2012;371(1984):20110544.
- 36. Fröhlich F, Theis FJ, Hasenauer J. Uncertainty analysis for non-identifiable dynamical systems: profile likelihoods, bootstrapping and more. In: Computational Methods in Systems Biology. 2014. p. 61–72. Available from: http://link.springer.com/chapter/10.1007/978-3-319-12982-2_5
- 37. Belsley DA. Conditioning Diagnostics: Collinearity and Weak Data in Regression. New York: Wiley; 1991.
- 38. Brun R, Reichert P, Künsch HR. Practical identifiability analysis of large environmental simulation models. Water Resour Res. 2001;37(4):1015–30.
- 39. Omlin M, Brun R, Reichert P. Biogeochemical model of Lake Zurich: sensitivity, identifiability and uncertainty analysis. Ecol Modell. 2001;141(1–3):105–23.
- 40. Berger JO, Liseo B, Wolpert RL. Integrated likelihood methods for eliminating nuisance parameters. Stat Sci. 1999;14(1):1–28.
- 41. Kreutz C, Raue A, Kaschek D, Timmer J. Profile likelihood in systems biology. FEBS J. 2013;280(11):2564–71.
- 42. SEER. Relative Survival. 2017 [cited 2017 Jul 9]. Available from: https://seer.cancer.gov/seerstat/WebHelp/Relative_Survival.htm
- 43. Martin TA, Ye L, Sanders AJ, Lane J, Jiang WG. Cancer invasion and metastasis: molecular and cellular perspective. In: Madame Curie Bioscience Database. Austin, TX: Landes Bioscience; 2013. Available from: https://www.ncbi.nlm.nih.gov/books/NBK164700/
- 44. Chaffer CL, Weinberg RA. A perspective on cancer cell metastasis. Science. 2011;331(6024):1559–64.
- 45. Ederer F, Axtell LM, Cutler SJ. The relative survival rate: a statistical methodology. Natl Cancer Inst Monogr. 1961;6:101–21.
- 46. Huszti E, Abrahamowicz M, Alioum A, Binquet C, Quantin C. Relative survival multistate Markov model. Stat Med. 2012;31(3):269–86.
- 47. Krijkamp E, Alarid-Escudero F, Enns EA, Jalal H, Hunink MGM, Pechlivanoglou P. Microsimulation modeling for health decision sciences using R: a tutorial. Med Decis Making. 2018;38(3):400–22.
- 48. Nelder JA, Mead R. A simplex method for function minimization. Comput J. 1965;7(4):308–13.
- 49. Soetaert K, Petzoldt T. Inverse modelling, sensitivity and Monte Carlo analysis in R using package FME. J Stat Softw. 2010;33(3). Available from: http://www.jstatsoft.org/article/view/v033i03
- 50. Wu GH-M, Wang Y-M, Yen AM-F, et al. Cost-effectiveness analysis of colorectal cancer screening with stool DNA testing in intermediate-incidence countries. BMC Cancer. 2006;6:136.
- 51. Roberts M, Russell LB, Paltiel AD, Chambers M, McEwan P, Krahn M. Conceptualizing a model: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-2. Med Decis Making. 2012;32(5):678–89.
- 52. Siebert U, Alagoz O, Bayoumi AM, et al. State-transition modeling: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-3. Med Decis Making. 2012;32(5):690–700.
- 53. Karnon J, Stahl J, Brennan A, Caro JJ, Mar J, Möller J. Modeling using discrete event simulation: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-4. Med Decis Making. 2012;32(5):701–11.
- 54. Pitman R, Fisman D, Zaric GS, et al. Dynamic transmission modeling: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force Working Group-5. Med Decis Making. 2012;32(5):712–21.
- 55. Eddy DM, Hollingworth W, Caro JJ, Tsevat J, McDonald KM, Wong JB. Model transparency and validation: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-7. Med Decis Making. 2012;32(5):733–43.
- 56. Taylor DCA, Pawar V, Kruzikas DT, Gilmore KE, Sanon M, Weinstein MC. Incorporating calibrated model parameters into sensitivity analyses: deterministic and probabilistic approaches. Pharmacoeconomics. 2012;30(2):119–26.
- 57. Greene WH. Econometric Analysis. 7th ed. Pearson; 2012.
- 58. Turner RM, Spiegelhalter DJ, Smith GCS, Thompson SG. Bias modelling in evidence synthesis. J R Stat Soc Ser A. 2009;172(1):21–47.
- 59. Lash TL, Fox MP, MacLehose RF, Maldonado G, McCandless LC, Greenland S. Good practices for quantitative bias analysis. Int J Epidemiol. 2014;43(6):1969–85.
- 60. Raue A, Kreutz C, Maiwald T, et al. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Supplement. Bioinformatics. 2009;25(15):1923–9.
- 61. McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. Chapman and Hall/CRC; 1989.
- 62. Kleijnen JPC. Regression and Kriging metamodels with their experimental designs in simulation: a review. CentER Discussion Paper Series; 2015. Report No.: 2015-035.
- 63. Farah M, Birrell P, Conti S, De Angelis D. Bayesian emulation and calibration of a dynamic epidemic model for A/H1N1 influenza. J Am Stat Assoc. 2014;109(508):1398–411.
- 64. Ballnus B, Hug S, Hatz K, Görlitz L, Hasenauer J, Theis FJ. Comprehensive benchmarking of Markov chain Monte Carlo methods for dynamical systems. BMC Syst Biol. 2017;11:63. Available from: https://bmcsystbiol.biomedcentral.com/track/pdf/10.1186/s12918-017-0433-1