Abstract
Continuous time Markov chain (CTMC) models are frequently employed in medical research to study disease progression, but are rarely applied to the transtheoretical model (TTM), a psychosocial model widely used in studies of health-related outcomes. The TTM often includes more than three states and conceptually allows for all possible instantaneous transitions (referred to as a general CTMC). This complicates the likelihood function because it involves calculating a matrix exponential that may not simplify for general CTMC models. We undertook a Bayesian approach wherein we numerically evaluated the likelihood using ordinary differential equation solvers available from the GNU Scientific Library. We compared our Bayesian approach with the maximum likelihood (ML) method implemented with the R package MSM. Our simulation study showed that the Bayesian approach provided more accurate point and interval estimates than the ML method, especially in complex CTMC models with five states. When we applied both approaches to data from a four-state TTM collected from a nutrition intervention study in the Next Step Trial, we observed results consistent with those of the simulation study. Specifically, the two approaches provided comparable point estimates and standard errors for most parameters, but the ML method offered substantially smaller standard errors for some parameters. Comparable estimates of the standard errors are obtainable from the bootstrap approach in the MSM package, which works only when the model estimation algorithm converges.
Keywords: Bayesian data analysis, Longitudinal categorical data, Markov chain models, Metropolis Hastings algorithm, Transtheoretical model
1 Introduction
Continuous time Markov chain (CTMC) models (also known as multi-state Markov models) have a wide range of applications in medical research. They are very useful when the research interest is the progression or control of a chronic disease, for instance, HIV infection1, breast cancer2, or asthma3. These models have also been applied in health promotion, including cancer screening4 and nutrition interventions5. However, CTMC modelling has rarely been applied to the transtheoretical model (TTM), a psychosocial model widely used to study health-related outcomes6. The TTM often has more than three states and conceptually allows for all possible instantaneous transitions (referred to as a general CTMC). Throughout this article, we use the term “instantaneous transition” to denote a direct one-step transition from one state to another without requiring an intermediate transition. In other words, an individual may make “spontaneous” jumps from the current state to any other state without experiencing the intervening states7,8. Conventionally, data from longitudinal TTMs have been analyzed using discrete time Markov chain models9 or generalized multinomial logit models5. These approaches require a balanced data structure (in terms of the time points at which the measurements were taken). When the data are not balanced, neither of these approaches is suitable, and complex statistical models might be necessary10. There is a large body of literature that supports the TTM; however, mathematical approaches to quantify the transitions between its states have not been sufficiently studied9,11.
In CTMC models, the transition probabilities are calculated as P(t) = exp(Qt), where Q is the intensity (infinitesimal) matrix and t is the time between two observed states. The likelihood function of a general CTMC model is complex, and the calculation of P(t) involves evaluating a matrix exponential, which can be mathematically difficult12. The exact analytical forms of the likelihood functions are available for two-state and three-state general CTMC models13,14, where the transition probabilities are obtained by solving ordinary differential equations. Moreover, eigensystem decomposition techniques are often employed to obtain an analytical expression of the likelihood1,15,16. In principle, this approach is applicable to any general CTMC model, but it fails when repeated eigenvalues exist. In contrast, the analytical form of the likelihood function is not required for Bayesian approaches, and the likelihood can be numerically calculated, for example, by solving ordinary differential equations11,17,18,19. Though both the maximum likelihood (ML) method and the Bayesian approach have been widely applied to address different biological questions, their relative performances for general CTMC models have not been empirically examined.
We undertook a Bayesian approach to analyse longitudinally measured categorical data from a TTM of health behavioural change, for which the transitions of the outcome variables over time for each individual were assumed to follow a CTMC model. In this approach, the likelihood is numerically evaluated using ordinary differential equation solvers available from the GNU Scientific Library (http://www.gnu.org/software/gsl/), and posterior samples are generated with the Metropolis-Hastings (MH) algorithm. Welton and Ades17 described how to implement Bayesian CTMC models by solving P(t) = exp(Qt) with the WinBUGS Differential Interface (WBDiff). Though this approach can be used for general CTMC models in principle, their focus was on models with some restrictions (e.g., forward 4-state models with state 4 being an absorbing state). An R package, MSM, has been developed to handle CTMC models using standard optimization algorithms within the ML framework16. The analytical expression of the likelihood, if it exists, is obtained from symbolic algebraic software; otherwise, eigendecomposition or, when eigenvalues are repeated, Padé approximants are used. We conducted simulation studies to evaluate the validity of the proposed method, as well as to compare it with the ML method implemented with the R package MSM.
We organise this article as follows: In Section 2, we introduce and formulate the CTMC model and discuss Bayesian implementation of the model and its goodness of fit. In Section 3, we report the results of a simulation study conducted for general four-state and five-state Markov chain models. In Section 4, we apply our method to the Next Step Trial data. In Section 5, we discuss our conclusions, and in Section 6, we report limitations of this study.
2 Methods
In medical research, models that incorporate death as the absorbing state are normally referred to as “illness-death” models or “forward” models if backward transitions are not allowed. In contrast, for a general (or recurrent) model, which is of interest in this study, a subject can move from one state to any other state without restriction. In this study, we treat the TTM model as a four-state recurrent model and represent its four states/stages by precontemplation (P), contemplation (C), action (A) and maintenance (M). (We use state and stage synonymously in this article.) An example of a forward model is one in which state M represents death, in which case transitions from state M to any other state would not be biologically possible. Though the proposed method can be applied to such models that include restrictions, in this study, we focus on models with unrestricted movement among multiple states.
2.1 The Likelihood
Consider a longitudinal study in which individuals can move among S stages. Assume that the kth subject is measured repeatedly at times tk,1, tk,2, …, tk,nk, with outcomes denoted by yk(tk,1), …, yk(tk,nk) and recorded as 1, 2, …, S, where k = 1, 2, …, m, m is the number of subjects in the study, and nk is the number of observations on subject k. Let P(t) denote the S × S transition probability matrix, with entries pij(t) = Pr[yk(s + t) = j | yk(s) = i] for i, j = 1, 2, …, S. The stochastic process can be fully described by the infinitesimal transition matrix Q = (qij), such that qij ≥ 0 for i ≠ j and −qii = Σj≠i qij for i = 1, 2, …, S. Under these assumptions, the time that a subject spends in state i is exponentially distributed with a mean of −1/qii; and when that subject’s state is about to change in the next instant, he or she will move from state i to state j with a probability of −qij/qii for i ≠ j21. Let θ denote the S(S − 1)-dimensional parameter vector, which consists of all entries qij in Q for i ≠ j. The transition probability matrix P(t) is determined by the infinitesimal matrix through the differential equation dP(t)/dt = P(t)Q with P(0) = I20. The likelihood function is given as
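$$L(\theta \mid y) = \prod_{k=1}^{m} \prod_{l=2}^{n_k} p_{y_k(t_{k,l-1}),\, y_k(t_{k,l})}\left(t_{k,l} - t_{k,l-1}\right) \qquad (1)$$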
where y = (y1, y2, …, ym). Note that equation (1) depends on the parameters only through the transition probabilities and is therefore a function of θ, the off-diagonal entries of Q. If there are restrictions on the instantaneous transitions, we can specify the model by setting the corresponding rate qij = 0. For example, for a model in which state S is absorbing, we set qSi = 0 for i = 1, 2, …, S. In addition, after estimating the model parameters, we can calculate the transition probabilities within a given time interval.
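As an illustration of how such a likelihood can be evaluated numerically, the following is a minimal sketch in plain Python rather than the C/GSL code used in this study. It integrates dP/dt = PQ with a classical fourth-order Runge-Kutta scheme and sums log transition probabilities over consecutive observations. The function names, the fixed step count, the 0-based state coding, and the two-state example values are illustrative assumptions, not part of the original analysis.

```python
import math

def _axpy(A, B, s):
    """Return A + s * B for square matrices stored as lists of rows."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def _matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(Q, t, steps=200):
    """Approximate P(t) by integrating dP/dt = P Q, P(0) = I, with classical RK4."""
    n = len(Q)
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    h = t / steps
    for _ in range(steps):
        k1 = _matmul(P, Q)
        k2 = _matmul(_axpy(P, k1, h / 2), Q)
        k3 = _matmul(_axpy(P, k2, h / 2), Q)
        k4 = _matmul(_axpy(P, k3, h), Q)
        incr = _axpy(_axpy(k1, k2, 2.0), _axpy(k4, k3, 2.0), 1.0)  # k1 + 2k2 + 2k3 + k4
        P = _axpy(P, incr, h / 6)
    return P

def log_likelihood(Q, subjects):
    """Sum log p_ij(dt) over consecutive observations of every subject.

    subjects: one list per subject of (time, state) pairs, states coded 0..S-1
    (an assumed data layout for this sketch).
    """
    cache = {}
    total = 0.0
    for obs in subjects:
        for (t0, i), (t1, j) in zip(obs, obs[1:]):
            dt = round(t1 - t0, 10)
            if dt not in cache:  # reuse P(dt) when intervals repeat
                cache[dt] = transition_matrix(Q, dt)
            total += math.log(cache[dt][i][j])
    return total

# Hypothetical two-state example, for which P(t) has a known closed form.
a, b = 0.5, 0.2
Q_demo = [[-a, a], [b, -b]]
ll_demo = log_likelihood(Q_demo, [[(0.0, 0), (1.0, 1), (2.0, 1)]])
```

For the two-state chain the result can be checked against the closed-form probabilities, which is a convenient sanity test before moving to four or five states, where no simple closed form is available.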
2.2 Priors and Posterior Distributions
In general CTMC models, the parameters in θ = {qij : i, j = 1, 2, …, S and i ≠ j} are restricted to be non-negative; thus, independent gamma distributions, whose support matches this restriction, are chosen as priors. Let
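$$\pi(\theta) = \prod_{i \neq j} \frac{1}{\Gamma(a_{ij})\, b_{ij}^{a_{ij}}}\, q_{ij}^{a_{ij}-1} \exp\left(-\frac{q_{ij}}{b_{ij}}\right) \qquad (2)$$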
denote the joint prior distribution for qij, where i, j = 1, 2, …, S and i ≠ j. The hyper-parameters aij = 0.001 and bij = 100 were chosen so that each prior in equation (2) has a mean of aij bij = 0.1 and a variance of aij bij² = 10 and can be considered flat. For restricted parameters, log transformations are often recommended to improve the performance of the sampling22. In this research, we adopt the log transformation on all parameters, i.e., λij = log(qij), λij ∈ (−∞, ∞) for i, j = 1, 2, …, S and i ≠ j. The logarithm of the transformed posterior distribution is given by
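$$\log p(\Lambda \mid y) = \text{const} + \log L(\theta \mid y)\Big|_{q_{ij} = e^{\lambda_{ij}}} + \sum_{i \neq j} \left[ a_{ij} \lambda_{ij} - \frac{e^{\lambda_{ij}}}{b_{ij}} \right] \qquad (3)$$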
where Λ is the parameter vector of λij for i, j = 1, 2, …, S and i ≠ j. The parameter θ is then transformed back as qij = exp(λij) in the sampling process during the implementation of the Bayesian estimation procedures.
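The effect of the log transformation on the prior can be sketched as follows. This is a hedged illustration that assumes the shape-scale gamma parameterization implied by the stated mean (0.1) and variance (10); the function names and the user-supplied `log_lik` callable are hypothetical. The change of variables from qij to λij adds the log-Jacobian term λij to the log density.

```python
import math

def log_gamma_prior(q, a=0.001, b=100.0):
    """Log density of a Gamma(shape=a, scale=b) distribution at q > 0."""
    return (a - 1.0) * math.log(q) - q / b - math.lgamma(a) - a * math.log(b)

def log_prior_lambda(lam, a=0.001, b=100.0):
    """Log density of lambda = log(q): gamma log density plus log-Jacobian (= lambda)."""
    return log_gamma_prior(math.exp(lam), a, b) + lam

def log_posterior(lams, log_lik, a=0.001, b=100.0):
    """Unnormalised log posterior on the log scale.

    log_lik is a hypothetical callable that maps a list of rates q_ij
    to the log-likelihood of the data.
    """
    rates = [math.exp(lam) for lam in lams]
    return log_lik(rates) + sum(log_prior_lambda(lam, a, b) for lam in lams)

# Moments implied by the hyper-parameters: mean = a*b, variance = a*b^2.
prior_mean = 0.001 * 100.0
prior_var = 0.001 * 100.0 ** 2
```

Working on the λ scale means the sampler can propose any real value without ever producing a negative rate.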
In the Bayesian framework, previous information/knowledge, which may be derived from data in similar historical studies, can be integrated as informative priors to improve model estimation. For example, we can use priors of gamma distributions for which the means are around the point estimates of the model parameters. These point estimates can be approximately estimated14. Sensitivity studies are often conducted by comparing posterior inferences with different magnitudes of the variance, e.g., strong informative (small variance) or weak informative (large variance)23,24. In this article, we employ flat priors in both the simulation and case studies. These priors perform satisfactorily; hence, we do not explore other priors.
2.3 Bayesian Implementations
This research takes advantage of the speed of the C programming language and the effectiveness of the fourth-order Runge-Kutta method for solving differential equations, as implemented in the GNU Scientific Library (GSL; http://www.gnu.org/software/gsl/). The combination of these approaches helps to overcome the difficulty of solving the equation for P(t), which is essential when implementing the Bayesian method. In addition, we use the generic MH algorithm to sample from the posterior distribution with the proposal density N(Λζ−1, cΣ). This multivariate normal distribution is centered at the current sample Λζ−1, where Σ is the variance-covariance matrix. The constant c is adjustable so as to maintain an optimal acceptance rate around 23%, or around 20% when there are no standard forms for the conditional distributions23. Note that the acceptance rate is defined as the percentage of proposed samples that are accepted by the MH algorithm.
A multivariate normal generator is not immediately available in the C language; thus, we draw independent standard normal variables and use the Cholesky decomposition technique to generate the desired multivariate normal distributions. Let X be a vector of independent identically distributed standard normal variables, and Σ = LL^T, where L and L^T are the lower and upper triangular matrices, respectively. Then we have Var(LX) = L Var(X) L^T = Σ. The multivariate normal distribution can be obtained with this algorithm using the random sample X and the pre-specified matrix L. Note that if L is a diagonal matrix, the components of LX are independent. In all simulation studies and in the analysis of real data, we apply independent proposal densities. Given the initial values Λ0, the sampling procedure goes as follows:
1. Draw a sample Λ* from the proposal density N(Λζ−1, cΣ).
2. Compute the value of ρ = log{p(Λ*)} − log{p(Λζ−1)}.
3. Draw a random uniform variable u ∈ (0,1), and calculate n = log(u).
4. If n ≤ ρ, set Λζ = Λ*; otherwise, set Λζ = Λζ−1.
5. Repeat steps 1–4 until the desired number of samples is obtained.
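The sampling steps above, together with the Cholesky-based proposal, can be sketched in a few lines. This is an illustrative Python version, not the C implementation used in this study. The function names, the toy target (two independent standard normals), and the values of c, the seed, and the chain length are assumptions chosen only to make the example self-contained.

```python
import math
import random

def cholesky(S):
    """Return lower-triangular L with L L^T = S for a symmetric positive-definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def mh_sample(log_post, init, L, c, n_iter, seed=1):
    """Random-walk MH with proposal N(current, c * L L^T)."""
    rng = random.Random(seed)
    scale = math.sqrt(c)
    d = len(init)
    current = list(init)
    current_lp = log_post(current)
    draws, accepted = [], 0
    for _ in range(n_iter):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Step 1: proposal = current + sqrt(c) * L x
        proposal = [current[i] + scale * sum(L[i][k] * x[k] for k in range(i + 1))
                    for i in range(d)]
        proposal_lp = log_post(proposal)
        rho = proposal_lp - current_lp        # Step 2
        n = math.log(1.0 - rng.random())      # Step 3: log of a uniform draw on (0, 1]
        if n <= rho:                          # Step 4
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws.append(list(current))
    return draws, accepted / n_iter

def toy_log_post(v):
    """Toy target for the demonstration: two independent standard normals."""
    return -0.5 * (v[0] ** 2 + v[1] ** 2)

L_demo = cholesky([[1.0, 0.0], [0.0, 1.0]])  # diagonal Sigma: independent proposals
draws, rate = mh_sample(toy_log_post, [3.0, -3.0], L_demo, c=2.0, n_iter=20000)
kept = draws[len(draws) // 2:]  # discard the first half as burn-in
```

In practice `toy_log_post` would be replaced by the log posterior of the CTMC model on the λ scale, and c would be tuned until the acceptance rate is near the target.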
After generating the desired number of samples from the MH algorithm, we transform the parameters back as qij = exp(λij), for i, j = 1, 2, …, S and i ≠ j. We run three parallel chains with over-dispersed initial values for both the case and simulation studies, and calculate the Brooks-Gelman statistic. The results we report in this article are calculated using samples for which this statistic met the convergence criterion, which are considered to be converged23. For Bayesian inference, we compute the sample means as the point estimates and use the 2.5% and 97.5% quantiles to construct the 95% credible intervals.
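A convergence check and the posterior summaries can be sketched as below. The code computes the basic potential scale reduction factor from parallel chains; whether this matches the exact variant of the Brooks-Gelman statistic used in this study is an assumption, and the function names and the linear-interpolation quantile rule are illustrative choices.

```python
import math

def potential_scale_reduction(chains):
    """Basic potential scale reduction factor for equal-length chains of one parameter."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
                 for c, mu in zip(chains, means)) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

def posterior_summary(draws):
    """Return the sample mean and the 2.5% and 97.5% sample quantiles."""
    xs = sorted(draws)

    def quantile(p):
        pos = p * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])  # linear interpolation

    return sum(xs) / len(xs), quantile(0.025), quantile(0.975)

# Chains that disagree in location give a factor well above 1.
r_far = potential_scale_reduction([[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]])
```

Values close to 1 indicate that the between-chain and within-chain variability agree; the summary function is applied to the pooled post-burn-in draws of each qij.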
2.4 AIC, DIC, and Goodness-of-fit tests
In a frequentist framework, Akaike’s information criterion (AIC) has been widely applied for model selection. The deviance information criterion (DIC) is a hierarchical modelling generalization of the AIC23. A smaller value of AIC or DIC indicates a better fitting model. Since instantaneous transitions are generally not observable in CTMC models, it is important to check how well the model fits the data1,15,16,25. A Pearson-type goodness-of-fit statistic has been proposed; however, it does not have a χ2 distribution, and the bootstrap technique was used to describe the whole distribution26. In the Bayesian framework, model checking is often conducted by using posterior predictive values, which is analogous to the aforementioned bootstrap technique. Letting T(y, θ) be the test quantity (i.e., number of transitions in this study) that measures the discrepancy between the fitted model and the data, the Bayesian predictive p-value is then defined as PB = Pr{T(yrep, θ) ≥ T(y, θ)}, where yrep represents the predicted values of the data23. Extreme values of PB greater than 0.95 or less than 0.05 are normally considered to indicate a significant discrepancy between the data and the model. We calculate AIC/DIC and perform a goodness-of-fit test only for the analysis of real data.
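The predictive p-value can be estimated from posterior draws once replicated data sets have been simulated. The sketch below assumes that the test quantity is the count of consecutive observation pairs in which the state changes, which is one possible reading of "number of transitions"; the function names and the state-sequence data layout are hypothetical.

```python
def count_transitions(subjects):
    """Count consecutive observation pairs whose states differ.

    subjects: one list of observed states per subject (assumed layout).
    """
    return sum(a != b for obs in subjects for a, b in zip(obs, obs[1:]))

def predictive_p_value(t_rep, t_obs):
    """Estimate Pr{T(y_rep) >= T(y)} from replicated test quantities."""
    return sum(t >= t_obs for t in t_rep) / len(t_rep)

# Hypothetical observed data and replicated test quantities.
t_obs_demo = count_transitions([[0, 0, 1], [2, 2, 2]])
p_demo = predictive_p_value([3, 5, 7, 9], 6)
```

Each element of `t_rep` would come from one data set simulated under one posterior draw of Q, using the same observation times as the real data.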
3 Simulation Studies
In this section, we describe the simulation studies we conducted to examine the proposed methods for analysing general recurrent four-state and five-state CTMC models. For each setting, we first selected a Q matrix, and then simulated data sets under the Markov assumption. That is, the sojourn time of the process in state i has an exponential distribution with rate −qii, the negated diagonal entry of Q. When the process is about to change its state, the probability of moving from the current state i to another state j is −qij/qii, for i, j = 1, 2, …, S and j ≠ i. Readers are referred to27 for more details of how to simulate CTMC data.
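The simulation scheme just described can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the rates are the true four-state values listed in Table 2, and the starting state, the seed, and the 0-based state coding are assumptions. The sketch also assumes that no state is absorbing.

```python
import random

def make_generator(off_diag):
    """Fill in the diagonal so that every row of Q sums to zero."""
    n = len(off_diag)
    Q = [row[:] for row in off_diag]
    for i in range(n):
        Q[i][i] = -sum(off_diag[i][j] for j in range(n) if j != i)
    return Q

def simulate_panel(Q, start, obs_times, rng):
    """Simulate one CTMC path from time 0 and record the state at each observation time.

    obs_times must be non-negative and increasing.
    """
    state = start
    next_jump = rng.expovariate(-Q[state][state])  # sojourn time with rate -q_ii
    observed = []
    for t in obs_times:
        while next_jump <= t:
            others = [j for j in range(len(Q)) if j != state]
            weights = [Q[state][j] for j in others]  # proportional to -q_ij / q_ii
            state = rng.choices(others, weights=weights)[0]
            next_jump += rng.expovariate(-Q[state][state])
        observed.append(state)
    return observed

# True rates of the four-state model (Table 2), states coded 0..3.
Q4 = make_generator([
    [0.00, 0.50, 0.35, 0.15],
    [0.40, 0.00, 0.80, 0.20],
    [0.08, 0.25, 0.00, 0.30],
    [0.06, 0.12, 0.30, 0.00],
])
rng = random.Random(0)
path = simulate_panel(Q4, start=0, obs_times=[float(v) for v in range(13)], rng=rng)
```

Only the states at the visit times are kept, so jumps that occur and reverse between two visits are not observed, which is what makes the likelihood depend on P(t) rather than on the full path.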
3.1 Interval Time of 1
In all settings, we simulated 1000 replicate data sets and set the observation time intervals equal to one for all subjects. For each replicate data set, we generated 400 subjects and associated each subject with 13 visits to measure the outcomes. To fully examine the proposed method, we report the percentage of bias (PB) for each parameter, along with its bias, standard deviation (SD), square root of the mean of the estimated variance, or standard error (SE), mean of the squared error (MSE), and empirical coverage probability (CP). We calculated the percentage of bias as the bias divided by the true value times 100. All results are based only on data sets that met the convergence criteria; data sets that failed to meet them were excluded. For the Bayesian method, the criterion is based on the Brooks-Gelman statistic described in Section 2.3, while for the ML method, the criterion is that the Hessian approximation of the log-likelihood at the reported solution is positive definite.
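The performance measures defined above can be computed from the per-data-set results with a short routine such as the following sketch. The function name, the argument layout, and the four made-up replicate values are assumptions for illustration only and are not results from this study.

```python
import math

def performance_summary(estimates, std_errors, intervals, truth):
    """Summarise replicate estimates of one parameter against its true value."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    pb = 100.0 * (mean_est - truth) / truth                       # percentage of bias
    sd = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (n - 1))
    se = math.sqrt(sum(s ** 2 for s in std_errors) / n)           # root mean estimated variance
    mse = sum((e - truth) ** 2 for e in estimates) / n
    cp = sum(lo <= truth <= hi for lo, hi in intervals) / n       # empirical coverage
    return {"PB": pb, "SD": sd, "SE": se, "MSE": mse, "CP": cp}

# Made-up replicate results for a parameter whose true value is 1.0.
est = [0.9, 1.1, 1.0, 1.2]
ses = [0.1, 0.1, 0.1, 0.1]
ivs = [(e - 0.196, e + 0.196) for e in est]
summary = performance_summary(est, ses, ivs, truth=1.0)
```

Comparing SD with SE in such a summary shows whether the reported standard errors match the actual sampling variability, which is the comparison emphasised in the tables.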
Table 1 presents the results for a recurrent five-stage model and is based on 851 and 715 data sets (out of 1000) for the Bayesian approach and the ML method, respectively, with the ML method implemented in the MSM package with the default settings (version 1.1.4). We generated 100 000 samples for each chain and dropped the first half, leaving the second half for inferences. The acceptance rate was about 16%. For the Bayesian approach, the point estimates were all accurate, the SDs and SEs were very close, and the CPs were all approximately 95%. The performance of the ML method was poor in comparison. We observed large biases (e.g., 21.1% in magnitude for q32) and low coverage probabilities (16 of 20 were less than 90%, 6 of 20 were less than 80%, and the lowest one was merely about 27%). Many parameters had noticeably smaller estimated SEs compared to the corresponding SDs (e.g., for q13, the SD was 0.04, and the SE was 0.029); and some had dramatically underestimated SEs (e.g., for q53, the SD was 0.041, and the SE was 0.006). To further examine the performances of these methods, we conducted simulation studies for the five-stage models with two extra sets of parameters. We found consistent performances for the proposed Bayesian approach and the ML method implemented in the MSM package (Tables 7 and 8, Appendix A).
Table 1.
Comparison of the Bayesian approach and the ML method implemented with the MSM package for a five-state model when observation intervals are equal to 1. The table lists the percentage of bias (PB), standard deviation (SD), square root of the mean of the estimated variance (SE), mean of the squared error (MSE), and coverage probability (CP). Large PB, poor SE and CP are highlighted in boldface.
| Parameter | True value | Bayesian PB | Bayesian SD | Bayesian SE | Bayesian MSE | Bayesian CP | MSM PB | MSM SD | MSM SE | MSM MSE | MSM CP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| q12 | 0.23 | 1.147 | 0.027 | 0.026 | 0.001 | 0.935 | 0.435 | 0.032 | 0.025 | 0.001 | 0.857 |
| q13 | 0.26 | 1.596 | 0.031 | 0.031 | 0.001 | 0.953 | −0.769 | 0.040 | 0.029 | 0.002 | 0.838 |
| q14 | 0.13 | 0.847 | 0.026 | 0.026 | 0.001 | 0.952 | −3.077 | 0.036 | 0.027 | 0.001 | 0.898 |
| q15 | 0.19 | −0.110 | 0.030 | 0.031 | 0.001 | 0.954 | −4.737 | 0.039 | 0.032 | 0.002 | 0.923 |
| q21 | 0.08 | 0.143 | 0.013 | 0.013 | 0.000 | 0.960 | −5.000 | 0.016 | 0.013 | 0.000 | 0.933 |
| q23 | 0.10 | 0.488 | 0.013 | 0.013 | 0.000 | 0.947 | 1.000 | 0.017 | 0.013 | 0.000 | 0.863 |
| q24 | 0.05 | −0.039 | 0.011 | 0.012 | 0.000 | 0.958 | −6.000 | 0.014 | 0.013 | 0.000 | 0.947 |
| q25 | 0.15 | 0.794 | 0.016 | 0.016 | 0.000 | 0.957 | −2.000 | 0.019 | 0.017 | 0.000 | 0.915 |
| q31 | 0.29 | 1.260 | 0.037 | 0.037 | 0.001 | 0.949 | −2.759 | 0.046 | 0.036 | 0.002 | 0.880 |
| q32 | 0.09 | −1.619 | 0.025 | 0.024 | 0.001 | 0.949 | −21.111 | 0.038 | 0.009 | 0.002 | 0.380 |
| q34 | 0.12 | 0.017 | 0.030 | 0.030 | 0.001 | 0.935 | −0.833 | 0.044 | 0.034 | 0.002 | 0.874 |
| q35 | 0.32 | 2.035 | 0.041 | 0.040 | 0.002 | 0.944 | 1.563 | 0.054 | 0.041 | 0.003 | 0.855 |
| q41 | 0.21 | 1.849 | 0.034 | 0.033 | 0.001 | 0.939 | 2.381 | 0.043 | 0.031 | 0.002 | 0.856 |
| q42 | 0.26 | 1.762 | 0.031 | 0.031 | 0.001 | 0.952 | −1.154 | 0.037 | 0.016 | 0.001 | 0.621 |
| q43 | 0.11 | −1.475 | 0.027 | 0.027 | 0.001 | 0.952 | −8.182 | 0.046 | 0.008 | 0.002 | 0.303 |
| q45 | 0.28 | 2.057 | 0.037 | 0.038 | 0.001 | 0.951 | −3.214 | 0.045 | 0.035 | 0.002 | 0.888 |
| q51 | 0.22 | 0.799 | 0.033 | 0.032 | 0.001 | 0.941 | −1.364 | 0.045 | 0.030 | 0.002 | 0.821 |
| q52 | 0.22 | 0.700 | 0.029 | 0.029 | 0.001 | 0.947 | 0.455 | 0.037 | 0.014 | 0.001 | 0.533 |
| q53 | 0.09 | 0.953 | 0.024 | 0.024 | 0.001 | 0.962 | −15.556 | 0.041 | 0.006 | 0.002 | 0.267 |
| q54 | 0.33 | 2.295 | 0.037 | 0.037 | 0.001 | 0.948 | −0.303 | 0.049 | 0.011 | 0.002 | 0.320 |
Table 7.
Comparison of the Bayesian approach and the ML method implemented with the MSM package for a five-state model when observation intervals are equal to 1. The table lists the percentage of bias (PB), standard deviation (SD), square root of the mean of the estimated variance (SE), mean of the squared error (MSE), and coverage probability (CP). Large PB, poor SE and CP are highlighted in boldface. The results are based on 951 and 665 data sets (out of 1000) for the Bayesian approach and the ML method, respectively.
| Parameter | True value | Bayesian PB | Bayesian SD | Bayesian SE | Bayesian MSE | Bayesian CP | MSM PB | MSM SD | MSM SE | MSM MSE | MSM CP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| q12 | 0.29 | 1.430 | 0.033 | 0.033 | 0.001 | 0.940 | −2.069 | 0.041 | 0.034 | 0.002 | 0.862 |
| q13 | 0.21 | 1.109 | 0.032 | 0.032 | 0.001 | 0.941 | −2.381 | 0.038 | 0.033 | 0.001 | 0.902 |
| q14 | 0.13 | −0.068 | 0.026 | 0.026 | 0.001 | 0.943 | 0.769 | 0.034 | 0.031 | 0.001 | 0.911 |
| q15 | 0.07 | −1.980 | 0.019 | 0.019 | 0.000 | 0.944 | −8.571 | 0.023 | 0.038 | 0.001 | 0.940 |
| q21 | 0.19 | −1.299 | 0.022 | 0.022 | 0.000 | 0.959 | −3.158 | 0.027 | 0.022 | 0.001 | 0.896 |
| q23 | 0.26 | 0.882 | 0.028 | 0.028 | 0.001 | 0.951 | −1.923 | 0.035 | 0.027 | 0.001 | 0.878 |
| q24 | 0.15 | −0.268 | 0.023 | 0.022 | 0.001 | 0.944 | −6.667 | 0.029 | 0.023 | 0.001 | 0.890 |
| q25 | 0.08 | 0.738 | 0.018 | 0.017 | 0.000 | 0.937 | 0.000 | 0.022 | 0.020 | 0.000 | 0.904 |
| q31 | 0.11 | −0.635 | 0.020 | 0.020 | 0.000 | 0.954 | 0.000 | 0.026 | 0.020 | 0.001 | 0.892 |
| q32 | 0.25 | 1.533 | 0.028 | 0.029 | 0.001 | 0.960 | −4.800 | 0.036 | 0.014 | 0.001 | 0.514 |
| q34 | 0.33 | 1.664 | 0.031 | 0.031 | 0.001 | 0.955 | −1.818 | 0.042 | 0.030 | 0.002 | 0.842 |
| q35 | 0.13 | 0.372 | 0.022 | 0.022 | 0.000 | 0.948 | −1.538 | 0.028 | 0.023 | 0.000 | 0.907 |
| q41 | 0.07 | −0.026 | 0.014 | 0.014 | 0.000 | 0.952 | −1.429 | 0.019 | 0.014 | 0.001 | 0.901 |
| q42 | 0.12 | 0.309 | 0.018 | 0.018 | 0.000 | 0.950 | −1.667 | 0.023 | 0.008 | 0.001 | 0.537 |
| q43 | 0.19 | 1.610 | 0.022 | 0.022 | 0.001 | 0.941 | −4.211 | 0.028 | 0.008 | 0.001 | 0.424 |
| q45 | 0.23 | 0.981 | 0.021 | 0.021 | 0.000 | 0.947 | −3.913 | 0.026 | 0.020 | 0.001 | 0.859 |
| q51 | 0.06 | 0.506 | 0.016 | 0.016 | 0.000 | 0.948 | −13.333 | 0.021 | 0.022 | 0.001 | 0.926 |
| q52 | 0.11 | −0.681 | 0.020 | 0.021 | 0.000 | 0.952 | −1.818 | 0.028 | 0.010 | 0.001 | 0.555 |
| q53 | 0.15 | 1.308 | 0.025 | 0.024 | 0.001 | 0.938 | 0.667 | 0.031 | 0.009 | 0.001 | 0.438 |
| q54 | 0.22 | 1.275 | 0.027 | 0.026 | 0.001 | 0.937 | −2.727 | 0.031 | 0.009 | 0.001 | 0.412 |
Table 8.
Comparison of the Bayesian approach and the ML method implemented with the MSM package for a five-state model when observation intervals are equal to 1. The table lists the percentage of bias (PB), standard deviation (SD), square root of the mean of the estimated variance (SE), mean of the squared error (MSE), and coverage probability (CP). Large PB, poor SE and CP are highlighted in boldface. The results are based on 768 and 825 data sets (out of 1000) for the Bayesian approach and the ML method, respectively.
| Parameter | True value | Bayesian PB | Bayesian SD | Bayesian SE | Bayesian MSE | Bayesian CP | MSM PB | MSM SD | MSM SE | MSM MSE | MSM CP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| q12 | 0.21 | 1.201 | 0.028 | 0.027 | 0.001 | 0.944 | −6.667 | 0.034 | 0.026 | 0.001 | 0.864 |
| q13 | 0.05 | 0.732 | 0.016 | 0.016 | 0.000 | 0.964 | 2.000 | 0.018 | 0.017 | 0.000 | 0.933 |
| q14 | 0.26 | 0.828 | 0.031 | 0.031 | 0.001 | 0.952 | −7.308 | 0.040 | 0.030 | 0.002 | 0.946 |
| q15 | 0.13 | −0.844 | 0.023 | 0.022 | 0.001 | 0.943 | 0.000 | 0.025 | 0.022 | 0.001 | 0.926 |
| q21 | 0.17 | 0.919 | 0.022 | 0.021 | 0.001 | 0.934 | −7.647 | 0.028 | 0.020 | 0.001 | 0.819 |
| q23 | 0.08 | 2.215 | 0.017 | 0.017 | 0.000 | 0.952 | −7.500 | 0.020 | 0.016 | 0.000 | 0.902 |
| q24 | 0.29 | 1.043 | 0.028 | 0.029 | 0.001 | 0.969 | −1.034 | 0.039 | 0.028 | 0.002 | 0.844 |
| q25 | 0.14 | −0.618 | 0.020 | 0.021 | 0.000 | 0.965 | −6.429 | 0.024 | 0.020 | 0.001 | 0.922 |
| q31 | 0.07 | 5.780 | 0.029 | 0.029 | 0.001 | 0.939 | −15.714 | 0.033 | 0.030 | 0.001 | 0.845 |
| q32 | 0.19 | 0.876 | 0.042 | 0.041 | 0.002 | 0.944 | −15.263 | 0.055 | 0.018 | 0.004 | 0.469 |
| q34 | 0.50 | 1.910 | 0.062 | 0.060 | 0.004 | 0.943 | −2.000 | 0.092 | 0.059 | 0.009 | 0.788 |
| q35 | 0.17 | −1.295 | 0.039 | 0.038 | 0.002 | 0.949 | −10.000 | 0.048 | 0.040 | 0.003 | 0.945 |
| q41 | 0.09 | 0.088 | 0.013 | 0.013 | 0.000 | 0.953 | −8.889 | 0.015 | 0.013 | 0.000 | 0.898 |
| q42 | 0.18 | 0.827 | 0.018 | 0.018 | 0.000 | 0.943 | −1.111 | 0.027 | 0.009 | 0.001 | 0.468 |
| q43 | 0.07 | 1.475 | 0.013 | 0.012 | 0.000 | 0.939 | −10.000 | 0.016 | 0.004 | 0.000 | 0.378 |
| q45 | 0.25 | 1.207 | 0.019 | 0.019 | 0.000 | 0.953 | 1.200 | 0.028 | 0.019 | 0.001 | 0.802 |
| q51 | 0.08 | 0.248 | 0.013 | 0.014 | 0.000 | 0.943 | −1.250 | 0.014 | 0.013 | 0.000 | 0.941 |
| q52 | 0.11 | 0.775 | 0.017 | 0.016 | 0.000 | 0.952 | −4.545 | 0.020 | 0.008 | 0.000 | 0.578 |
| q53 | 0.05 | 1.412 | 0.012 | 0.012 | 0.000 | 0.960 | −4.000 | 0.014 | 0.004 | 0.000 | 0.455 |
| q54 | 0.23 | 0.879 | 0.023 | 0.022 | 0.001 | 0.944 | −1.304 | 0.031 | 0.007 | 0.001 | 0.362 |
Table 2 shows the results for a general, recurrent four-state model, and is based on 966 and 908 data sets for the Bayesian approach and the ML method implemented in the MSM package, respectively. We again generated 100 000 samples for each chain and dropped the first half, leaving the second half for inferences. The acceptance rate was about 20%. The point estimates were all accurate and comparable for the two approaches, while the ML method again underestimated the SEs for some parameters, particularly q32 and q43.
Table 2.
Comparison of the Bayesian approach and the ML method implemented with the MSM package for a four-state model when the observation intervals are equal to 1. The table lists the percentage of bias (PB), standard deviation (SD), square root of the mean of the estimated variance (SE), mean of the squared error (MSE), and coverage probability (CP). Poor SE and CP are highlighted in boldface.
| Parameter | True value | Bayesian PB | Bayesian SD | Bayesian SE | Bayesian MSE | Bayesian CP | MSM PB | MSM SD | MSM SE | MSM MSE | MSM CP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| q12 | 0.50 | 3.182 | 0.074 | 0.071 | 0.006 | 0.937 | −1.800 | 0.079 | 0.071 | 0.006 | 0.909 |
| q13 | 0.35 | −0.364 | 0.058 | 0.058 | 0.003 | 0.949 | −0.857 | 0.069 | 0.059 | 0.005 | 0.919 |
| q14 | 0.15 | 0.414 | 0.035 | 0.036 | 0.001 | 0.955 | 2.667 | 0.041 | 0.040 | 0.002 | 0.916 |
| q21 | 0.40 | 3.755 | 0.061 | 0.059 | 0.004 | 0.937 | −1.750 | 0.067 | 0.055 | 0.004 | 0.894 |
| q23 | 0.80 | 2.122 | 0.076 | 0.078 | 0.006 | 0.957 | 0.500 | 0.093 | 0.077 | 0.009 | 0.871 |
| q24 | 0.20 | −1.675 | 0.043 | 0.045 | 0.002 | 0.948 | −4.000 | 0.052 | 0.055 | 0.003 | 0.933 |
| q31 | 0.08 | −1.872 | 0.016 | 0.016 | 0.000 | 0.960 | −1.250 | 0.020 | 0.015 | 0.000 | 0.909 |
| q32 | 0.25 | 2.504 | 0.027 | 0.027 | 0.001 | 0.951 | 0.400 | 0.033 | 0.011 | 0.001 | 0.494 |
| q34 | 0.30 | 1.129 | 0.020 | 0.021 | 0.000 | 0.951 | 1.000 | 0.023 | 0.021 | 0.001 | 0.905 |
| q41 | 0.06 | −0.820 | 0.013 | 0.013 | 0.000 | 0.942 | 1.667 | 0.015 | 0.012 | 0.000 | 0.904 |
| q42 | 0.12 | 0.586 | 0.021 | 0.020 | 0.000 | 0.941 | −2.500 | 0.024 | 0.008 | 0.001 | 0.469 |
| q43 | 0.30 | 0.578 | 0.022 | 0.023 | 0.000 | 0.951 | 1.333 | 0.027 | 0.009 | 0.001 | 0.498 |
3.2 Increased Interval Time of 1.5
As a final comparison of the proposed Bayesian model and the ML method, we conducted simulation studies for the five-state general CTMC model shown in Table 1, but with the observational time intervals set at 1.5. Our purpose was to investigate the results achieved when the intervals used to monitor transitions between states were increased. We noticed convergence issues for both methods: only 493 and 112 of the 1000 data sets had results that met the convergence criteria for the Bayesian and ML methods, respectively. The mean sojourn times (MSTs) for this five-state model were 1.23, 2.63, 1.22, 1.16, and 1.16 for states 1, 2, 3, 4, and 5, respectively. The observational time intervals (1.5) were longer than four of the five MSTs, which may explain the poor convergence rates (493/1000, 112/1000) for both methods. Moreover, the Bayesian method offered results similar to those obtained when the observational time interval was 1, whereas the ML method performed worse, with noticeably larger biases (Table 9, Appendix A). Similar results were obtained for a general four-state model, except that the convergence rates were 344/1000 and 497/1000 for the Bayesian and ML methods, respectively (Table 10, Appendix A).
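The mean sojourn times quoted above follow directly from the true rates in Table 1, since the MST of state i is the reciprocal of the total rate of leaving it. The short check below is a sketch whose variable names are illustrative; the rates are copied from the true-parameter column of Table 1.

```python
# Off-diagonal true rates of the five-state model (Table 1), grouped by origin state.
rates_out = {
    1: [0.23, 0.26, 0.13, 0.19],
    2: [0.08, 0.10, 0.05, 0.15],
    3: [0.29, 0.09, 0.12, 0.32],
    4: [0.21, 0.26, 0.11, 0.28],
    5: [0.22, 0.22, 0.09, 0.33],
}

# Mean sojourn time of state i = 1 / (sum of rates out of i) = -1 / q_ii.
mst = {state: 1.0 / sum(rates) for state, rates in rates_out.items()}

# Number of states whose mean sojourn time is shorter than the 1.5 observation interval.
shorter_than_interval = sum(value < 1.5 for value in mst.values())
```

With an interval of 1.5, a subject is expected to leave four of the five states before the next visit, so many transitions go unobserved and less information is available per observation.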
Table 9.
Comparison of the Bayesian approach and the ML method implemented with the MSM package for a five-state model when observation intervals are equal to 1.5. The table lists the percentage of bias (PB), standard deviation (SD), square root of the mean of the estimated variance (SE), mean of the squared error (MSE), and coverage probability (CP). The results are based on 497 and 112 data sets (out of 1000) for the Bayesian approach and the ML method, respectively.
| Parameter | True value | Bayesian PB | Bayesian SD | Bayesian SE | Bayesian MSE | Bayesian CP | MSM PB | MSM SD | MSM SE | MSM MSE | MSM CP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| q12 | 0.230 | −0.487 | 0.030 | 0.028 | 0.001 | 0.939 | 1.739 | 0.040 | 0.027 | 0.002 | 0.804 |
| q13 | 0.260 | 3.242 | 0.036 | 0.037 | 0.001 | 0.943 | 5.385 | 0.058 | 0.045 | 0.004 | 0.759 |
| q14 | 0.130 | 0.274 | 0.035 | 0.033 | 0.001 | 0.941 | −9.231 | 0.052 | 0.062 | 0.003 | 0.920 |
| q15 | 0.190 | 2.528 | 0.041 | 0.040 | 0.002 | 0.941 | 0.000 | 0.073 | 0.079 | 0.005 | 0.902 |
| q21 | 0.080 | −0.764 | 0.014 | 0.014 | 0.000 | 0.949 | −8.750 | 0.022 | 0.017 | 0.001 | 0.929 |
| q23 | 0.100 | 1.772 | 0.014 | 0.014 | 0.000 | 0.957 | 10.000 | 0.021 | 0.014 | 0.001 | 0.723 |
| q24 | 0.050 | −1.984 | 0.013 | 0.014 | 0.000 | 0.959 | −10.000 | 0.016 | 0.015 | 0.000 | 0.964 |
| q25 | 0.150 | 1.803 | 0.019 | 0.018 | 0.000 | 0.935 | 0.000 | 0.026 | 0.019 | 0.001 | 0.848 |
| q31 | 0.290 | 2.546 | 0.045 | 0.045 | 0.002 | 0.947 | −1.379 | 0.066 | 0.048 | 0.004 | 0.857 |
| q32 | 0.090 | −2.195 | 0.030 | 0.029 | 0.001 | 0.943 | −34.444 | 0.045 | 0.007 | 0.003 | 0.170 |
| q34 | 0.120 | −1.210 | 0.042 | 0.039 | 0.002 | 0.927 | 7.500 | 0.062 | 0.064 | 0.004 | 0.804 |
| q35 | 0.320 | 2.170 | 0.052 | 0.049 | 0.003 | 0.9 3 | 7.500 | 0.074 | 0.073 | 0.006 | 0.777 |
| q41 | 0.210 | 1.701 | 0.038 | 0.041 | 0.001 | 0.970 | 13.333 | 0.072 | 0.045 | 0.006 | 0.705 |
| q42 | 0.260 | 1.760 | 0.035 | 0.034 | 0.001 | 0.943 | 3.077 | 0.056 | 0.015 | 0.003 | 0.491 |
| q43 | 0.110 | −0.530 | 0.034 | 0.034 | 0.001 | 0.957 | −22.727 | 0.074 | 0.006 | 0.006 | 0.071 |
| q45 | 0.280 | 2.839 | 0.046 | 0.048 | 0.002 | 0.963 | −9.643 | 0.075 | 0.053 | 0.006 | 0.830 |
| q51 | 0.220 | 1.980 | 0.039 | 0.040 | 0.002 | 0.949 | 1.364 | 0.074 | 0.042 | 0.005 | 0.732 |
| q52 | 0.220 | 0.819 | 0.032 | 0.032 | 0.001 | 0.943 | 2.273 | 0.052 | 0.012 | 0.003 | 0.375 |
| q53 | 0.090 | −0.977 | 0.032 | 0.032 | 0.001 | 0.939 | −25.556 | 0.068 | 0.005 | 0.005 | 0.071 |
| q54 | 0.330 | 4.640 | 0.047 | 0.046 | 0.002 | 0.927 | 2.121 | 0.075 | 0.009 | 0.006 | 0.259 |
Table 10.
Comparison of the Bayesian approach and the ML method implemented with the MSM package for a four-state model when observation intervals are equal to 1.5. The table lists the percentage of bias (PB), standard deviation (SD), square root of the mean of the estimated variance (SE), mean of the squared error (MSE), and coverage probability (CP). The results are based on 344 and 497 data sets (out of 1000) for the Bayesian approach and the ML method, respectively.
| Parameter | True value | Bayesian PB | Bayesian SD | Bayesian SE | Bayesian MSE | Bayesian CP | MSM PB | MSM SD | MSM SE | MSM MSE | MSM CP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| q12 | 0.500 | 6.299 | 0.126 | 0.133 | 0.017 | 0.953 | −2.416 | 0.139 | 0.156 | 0.019 | 0.853 |
| q13 | 0.350 | −2.528 | 0.088 | 0.088 | 0.008 | 0.959 | 4.118 | 0.112 | 0.160 | 0.013 | 0.839 |
| q14 | 0.150 | 0.378 | 0.048 | 0.048 | 0.002 | 0.945 | 3.922 | 0.052 | 0.115 | 0.003 | 0.907 |
| q21 | 0.400 | 5.767 | 0.101 | 0.107 | 0.011 | 0.948 | −2.619 | 0.108 | 0.100 | 0.012 | 0.857 |
| q23 | 0.800 | 3.780 | 0.125 | 0.129 | 0.017 | 0.948 | −1.207 | 0.141 | 0.236 | 0.020 | 0.855 |
| q24 | 0.200 | 0.284 | 0.061 | 0.064 | 0.004 | 0.948 | −2.393 | 0.069 | 0.202 | 0.005 | 0.930 |
| q31 | 0.080 | −3.155 | 0.024 | 0.024 | 0.001 | 0.948 | 1.776 | 0.031 | 0.023 | 0.001 | 0.845 |
| q32 | 0.250 | 4.852 | 0.045 | 0.044 | 0.002 | 0.933 | 1.947 | 0.060 | 0.009 | 0.004 | 0.229 |
| q34 | 0.300 | 0.643 | 0.022 | 0.024 | 0.000 | 0.965 | 0.498 | 0.028 | 0.042 | 0.001 | 0.899 |
| q41 | 0.060 | 0.572 | 0.018 | 0.018 | 0.000 | 0.953 | 8.348 | 0.023 | 0.016 | 0.001 | 0.767 |
| q42 | 0.120 | −0.570 | 0.026 | 0.028 | 0.001 | 0.956 | −9.587 | 0.041 | 0.007 | 0.002 | 0.258 |
| q43 | 0.300 | 1.116 | 0.025 | 0.027 | 0.001 | 0.953 | 2.834 | 0.038 | 0.008 | 0.002 | 0.348 |
From these results, we conclude that the proposed Bayesian approach performed better than the ML method implemented with the MSM package, especially for the complex five-state model, which has a total of 20 transition-rate parameters. The SEs from the MSM package were obtained from the Hessian approximation by default and were expected to be underestimated, resulting in lower CPs16,28,29. The SEs can be well calibrated with the bootstrap approach, which is available in the MSM package. We did not implement the bootstrap approach in the simulation studies; however, we did apply it to the case study reported in Section 4.
4 Application to the Next Step Trial Data
4.1 Data Description
The Next Step Trial was a randomized trial of colorectal cancer screening and nutrition interventions in the work place of employees of the automobile industry. Data were collected from participants in the trial at baseline and yearly for two years. The outcome variable for our study was the stage of dietary change in fat consumption, classified into five stages (precontemplation, contemplation, preparation, action, or maintenance) in the TTM model5,30,31. The analyses we report in this article were based on a cohort of 1,758 male employees who completed dietary assessments of the stage of change in fat consumption at all three survey time points. Among these survey participants, the mean age at baseline (±SE) was 58.3 ±10.7 years, the mean number of years of education (±SE) was 13.6 ±2.6, 1,693 (97%) were white, 1,588 were married (90%), and 853 (49%) were retired. The data were not sufficient to estimate transitions from preparation to the other stages; thus, we combined the stages of contemplation and preparation and labeled them as contemplation. The resulting four stages in our study were precontemplation (P), contemplation (C), action (A) and maintenance (M). The proportions of participants in each of these stages at baseline were 8.7%, 18.7%, 40.8% and 31.8% for P, C, A and M, respectively. The average length of follow-up was 1.04 years, and the median was 1.03 years (95% were within 0.78–1.29 years). The numbers of transitions were (127, 66, 75, 23) from stage P to stages P, C, A and M, respectively; (72, 220, 265, 58) from stage C to stages P, C, A and M, respectively; (54, 221, 901, 308) from stage A to stages P, C, A and M, respectively; and (32, 84, 262, 748) from stage M to stages P, C, A and M, respectively.
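As a quick descriptive check on the transition counts listed above, the following Python sketch tabulates them and computes raw row proportions. The variable and function names are illustrative assumptions, not part of the original analysis. These proportions ignore the unequal follow-up times and are not the model-based transition probabilities reported later.

```python
# Observed transition counts from Section 4.1
# (rows: from-stage, columns: to-stage, in the order P, C, A, M).
stages = ["P", "C", "A", "M"]
counts = [
    [127, 66, 75, 23],
    [72, 220, 265, 58],
    [54, 221, 901, 308],
    [32, 84, 262, 748],
]


def row_proportions(table):
    """Divide each row by its total to get empirical transition proportions."""
    return [[n / sum(row) for n in row] for row in table]


total_transitions = sum(sum(row) for row in counts)
proportions = row_proportions(counts)

print("total observed transitions:", total_transitions)
for stage, row in zip(stages, proportions):
    print(stage, [round(p, 3) for p in row])
```

The total equals two follow-up transitions for each of the 1,758 participants, which is consistent with three survey time points per person.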
4.2 Results for Parameters of the Infinitesimal Matrix
At the outset, we ran three parallel chains for the proposed Bayesian method with over-dispersed initial values, as discussed in Section 2.3. We generated 120 000 samples for each chain and dropped the first half, leaving the second half for inferences. The acceptance rate was 23%, with a Brooks-Gelman statistic of for all parameters. We calculated the DIC as 7351.5. For the MSM package, using the recommended initial values, the asymptotic standard errors could not be calculated and the optimization algorithm did not converge. In this situation, users of MSM may have difficulty finding the optimum values. We therefore tried different initial values. The smallest value of deviance among the models we tried was 7328.0, and the AIC was calculated as 7328.0+24=7352.0. The quantities of DIC and AIC were close, which was expected22. Note that the DIC and AIC were used for model selection, and turned out to be helpful for the selection of the optimal estimates for the ML method.
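The AIC arithmetic above can be reproduced with a short sketch. It assumes that the penalty term 24 corresponds to twice the number of free transition rates in a four-state general CTMC, S(S − 1) = 12; the function names are hypothetical.

```python
def n_transition_rates(n_states):
    """Number of free off-diagonal rates when all instantaneous transitions are allowed."""
    return n_states * (n_states - 1)


def aic(deviance, n_params):
    """AIC = deviance (minus twice the log-likelihood) + 2 * number of parameters."""
    return deviance + 2 * n_params


k = n_transition_rates(4)      # four-state general CTMC
aic_msm = aic(7328.0, k)       # smallest deviance reported for the MSM fits
print(k, aic_msm)
```

The same counting rule gives 20 parameters for the five-state general model used in the simulation study.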
In Table 3, the point estimates from the MSM package and the Bayesian approach are relatively close, and the SEs for q32, q42, q43 obtained from the ML method implemented with the MSM package are smaller than those obtained from the Bayesian approach. Further, for these three parameters, the 95% confidence intervals obtained from the MSM package were obviously narrower than the corresponding 95% credible intervals obtained from the Bayesian approach. This was consistent with our simulation results. As the last step, we performed goodness-of-fit tests. For simplicity, we used the numbers of transitions as the test statistics. We used the baseline stages as the initial values and used the observation time intervals to generate predicted data sets. We generated 1000 predicted data sets, and observed no extreme Bayesian predictive p-values (≤0.05 or ≥0.95, Figure 1). For the ML method, we used the bootstrap approach26. The results (not shown) do not suggest a discrepancy between the model and the data.
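The predictive check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-state generator, the baseline states, the observation intervals and the observed count are all hypothetical, states are indexed from 0, and the p-value is taken as the proportion of replicated counts that are at least as large as the observed count.

```python
import random


def simulate_end_state(Q, start, duration, rng):
    """Simulate a CTMC path from `start` and return the state occupied at time `duration`."""
    state, t = start, 0.0
    while True:
        exit_rate = -Q[state][state]
        if exit_rate <= 0.0:
            return state  # absorbing state: no further jumps
        t += rng.expovariate(exit_rate)  # exponential holding time
        if t >= duration:
            return state
        # Jump to another state with probability proportional to its rate.
        others = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in others]
        state = rng.choices(others, weights=weights)[0]


def predictive_p_value(replicated, observed):
    """Proportion of replicated statistics that are >= the observed statistic."""
    return sum(r >= observed for r in replicated) / len(replicated)


rng = random.Random(0)
# Hypothetical "posterior draws" of a two-state generator (the same matrix repeated).
Q_draws = [[[-0.9, 0.9], [0.4, -0.4]]] * 200
baseline = [0] * 50 + [1] * 50      # hypothetical baseline states
intervals = [1.0] * 100             # hypothetical observation intervals
observed_count = 25                 # hypothetical observed number of 0 -> 1 transitions

replicated = []
for Q in Q_draws:
    count = sum(
        1
        for s, d in zip(baseline, intervals)
        if s == 0 and simulate_end_state(Q, s, d, rng) == 1
    )
    replicated.append(count)

p_value = predictive_p_value(replicated, observed_count)
print(p_value)
```

In an actual analysis each replicate would use a different posterior draw of the transition rates, and one p-value would be computed for every transition count.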
Table 3.
Comparison of the Bayesian approach and the ML method implemented with the MSM package using the Next Step Trial data. Stages in this study are precontemplation (P), contemplation (C), action (A) and maintenance (M). Relatively small standard errors (SEs) and relatively narrow 95% confidence intervals are highlighted in boldface.
| Parameter | Bayesian mean | Bayesian SD | Bayesian 95% CI† lower | Bayesian 95% CI† upper | MSM mean | MSM SD | MSM 95% CI‡ lower | MSM 95% CI‡ upper |
|---|---|---|---|---|---|---|---|---|
| q12(P-C) | 0.596 | 0.102 | 0.416 | 0.807 | 0.573 | 0.096 | 0.413 | 0.796 |
| q13(P-A) | 0.259 | 0.083 | 0.100 | 0.427 | 0.253 | 0.079 | 0.137 | 0.468 |
| q14(P-M) | 0.064 | 0.036 | 0.005 | 0.138 | 0.073 | 0.034 | 0.029 | 0.183 |
| q21(C-P) | 0.344 | 0.051 | 0.246 | 0.447 | 0.292 | 0.044 | 0.217 | 0.394 |
| q23(C-A) | 1.024 | 0.086 | 0.866 | 1.203 | 0.995 | 0.075 | 0.858 | 1.153 |
| q24(C-M) | 0.004 | 0.014 | 0.000 | 0.057 | 0.016 | 0.028 | 0.001 | 0.512 |
| q31(A-P) | 0.010 | 0.012 | 0.000 | 0.044 | 0.027 | 0.012 | 0.011 | 0.066 |
| q32(A-C) | 0.356 | 0.035 | 0.289 | 0.428 | 0.328 | 0.013 | 0.303 | 0.355 |
| q34(A-M) | 0.344 | 0.021 | 0.303 | 0.387 | 0.337 | 0.021 | 0.298 | 0.382 |
| q41(M-P) | 0.032 | 0.012 | 0.010 | 0.055 | 0.033 | 0.010 | 0.018 | 0.059 |
| q42(M-C) | 0.089 | 0.022 | 0.047 | 0.135 | 0.090 | 0.008 | 0.076 | 0.107 |
| q43(M-A) | 0.342 | 0.029 | 0.287 | 0.402 | 0.339 | 0.011 | 0.318 | 0.361 |
†: Bayesian credible intervals;
‡: confidence intervals.
Figure 1.

Bayesian predictive p-values with 1000 duplicate data sets. The red lines represent the observed transition counts. Precontemplation, contemplation, action and maintenance are represented by 1, 2, 3, and 4, respectively; TC12, for example, is for transitions from precontemplation to contemplation.
4.3 Results for Bayesian Inferences
In practice, the parameters of the infinitesimal matrix do not have a direct interpretation; thus, we focused on the mean sojourn times (MSTs) and the one-year transition probabilities. We calculated the MST as 1/qii and used the C program we developed to calculate the one-year transition probability, denoted by pij(1) = Pr[yk(s + 1) = j|yk(s) = i] for i, j = 1, 2, …, S. For the Bayesian method, we used the Monte Carlo algorithm to calculate these statistics, using samples generated for the posterior distributions (randomly selecting 50% of the posterior samples). The R package MSM has options to calculate these statistics, except for the SEs of the one-year transition probabilities; therefore, these SEs are not displayed for either method. We noticed that MSM offered narrower confidence intervals for the MSTs than the credible intervals obtained from the proposed Bayesian method, especially for the stage of maintenance (Table 4). We found similar results for the one-year transition probabilities (Table 5): the point estimates from the two approaches were quite close, while the proposed method provided 95% credible intervals that were comparable to or wider than the corresponding 95% confidence intervals obtained from the MSM method. For instance, the confidence interval for the transition probability of M-A was estimated as (0.218–0.240), while the corresponding credible interval was estimated as (0.206–0.255). These differences may arise from the underestimated standard errors in Table 3, since the MSTs and transition probabilities are functions of the transition rates.
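To make the link between the transition rates and these derived quantities concrete, the sketch below plugs the Bayesian posterior means from Table 3 into a generator matrix and computes MSTs and one-year transition probabilities. It is an illustration under assumptions, not the authors' C program: it uses a hand-written fourth-order Runge-Kutta integrator for the Kolmogorov forward equation dP/dt = PQ instead of the GNU scientific library solvers, it stores the diagonal as the negative total exit rate (so the MST is −1/qii in this sign convention), and it evaluates the quantities at the posterior means rather than averaging over posterior samples. All function names are hypothetical.

```python
def build_generator(rates):
    """Copy off-diagonal rates and set each diagonal entry to minus its row's exit rate."""
    n = len(rates)
    Q = [[rates[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    return Q


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_axpy(A, B, c):
    """Return A + c * B element-wise."""
    return [[a + c * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def transition_probabilities(Q, t, steps=200):
    """Integrate dP/dt = P Q with P(0) = I using classical RK4."""
    n = len(Q)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    h = t / steps
    for _ in range(steps):
        k1 = mat_mul(P, Q)
        k2 = mat_mul(mat_axpy(P, k1, h / 2), Q)
        k3 = mat_mul(mat_axpy(P, k2, h / 2), Q)
        k4 = mat_mul(mat_axpy(P, k3, h), Q)
        for i in range(n):
            for j in range(n):
                P[i][j] += h / 6 * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j])
    return P


# Bayesian posterior means from Table 3 (order P, C, A, M); diagonal entries are placeholders.
rates = [
    [0.0, 0.596, 0.259, 0.064],
    [0.344, 0.0, 1.024, 0.004],
    [0.010, 0.356, 0.0, 0.344],
    [0.032, 0.089, 0.342, 0.0],
]
Q = build_generator(rates)
mst = [-1.0 / Q[i][i] for i in range(len(Q))]
P1 = transition_probabilities(Q, 1.0)

print([round(m, 3) for m in mst])
for row in P1:
    print([round(p, 3) for p in row])
```

Because the plug-in values are not posterior averages, the output should be close to, but need not match exactly, the Bayesian columns of Tables 4 and 5.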
Table 4.
The mean sojourn times of being at the stages of precontemplation (P), contemplation (C), action (A), and maintenance (M) for participants in the Next Step Trial.
| Stage | Bayesian mean | Bayesian SD | Bayesian 95% CI† lower | Bayesian 95% CI† upper | MSM mean | MSM SD | MSM 95% CI‡ lower | MSM 95% CI‡ upper |
|---|---|---|---|---|---|---|---|---|
| P | 1.097 | 0.097 | 0.918 | 1.298 | 1.112 | 0.099 | 0.935 | 1.323 |
| C | 0.733 | 0.055 | 0.632 | 0.845 | 0.767 | 0.046 | 0.683 | 0.862 |
| A | 1.413 | 0.076 | 1.266 | 1.566 | 1.443 | 0.056 | 1.339 | 1.557 |
| M | 2.170 | 0.121 | 1.942 | 2.422 | 2.163 | 0.070 | 2.029 | 2.305 |
†: Bayesian credible intervals;
‡: confidence intervals.
Table 5.
One-year transition probabilities for participants in the Next Step Trial. Stages in this study are precontemplation (P), contemplation (C), action (A) and maintenance (M).
| Transition | Bayesian mean | Bayesian 95% CI† lower | Bayesian 95% CI† upper | MSM mean | MSM 95% CI‡ lower | MSM 95% CI‡ upper |
|---|---|---|---|---|---|---|
| P-P | 0.442 | 0.387 | 0.497 | 0.444 | 0.372 | 0.488 |
| P-C | 0.231 | 0.185 | 0.279 | 0.229 | 0.174 | 0.277 |
| P-A | 0.254 | 0.206 | 0.306 | 0.249 | 0.210 | 0.314 |
| P-M | 0.073 | 0.048 | 0.107 | 0.078 | 0.060 | 0.147 |
| C-P | 0.125 | 0.099 | 0.151 | 0.114 | 0.087 | 0.138 |
| C-C | 0.353 | 0.316 | 0.393 | 0.362 | 0.264 | 0.394 |
| C-A | 0.432 | 0.397 | 0.466 | 0.431 | 0.379 | 0.462 |
| C-M | 0.090 | 0.078 | 0.107 | 0.093 | 0.083 | 0.254 |
| A-P | 0.033 | 0.026 | 0.043 | 0.037 | 0.029 | 0.052 |
| A-C | 0.152 | 0.134 | 0.171 | 0.148 | 0.128 | 0.157 |
| A-A | 0.608 | 0.583 | 0.631 | 0.611 | 0.586 | 0.630 |
| A-M | 0.207 | 0.187 | 0.227 | 0.205 | 0.188 | 0.232 |
| M-P | 0.027 | 0.018 | 0.038 | 0.028 | 0.021 | 0.042 |
| M-C | 0.073 | 0.059 | 0.090 | 0.073 | 0.064 | 0.081 |
| M-A | 0.230 | 0.206 | 0.255 | 0.229 | 0.218 | 0.240 |
| M-M | 0.669 | 0.642 | 0.697 | 0.669 | 0.650 | 0.686 |
†: Bayesian credible intervals;
‡: confidence intervals.
We analysed the nutrition intervention data from the Next Step Trial with the proposed Bayesian approach and the ML method implemented with the MSM package (version 1.1.4 with the default settings). We found these results to be consistent with the results obtained in the simulation studies, i.e., the two methods offered similar point estimates, while the MSM package provided underestimated SEs for some parameters. To obtain the calibrated estimates, we further analysed the data with the bootstrap approach, using the options that are available in the MSM package. We obtained comparable results for the transition rates, sojourn times and transition probabilities (not shown). Although the bootstrap method works when the model estimation algorithm converges, this was not the case when we used the recommended initial values for model estimation.
4.4 Transitions Restricted to Adjacent States
To investigate a situation in which instantaneous transitions are limited to adjacent states, we fitted a simplified model to the Next Step Trial data. We allowed instantaneous transitions to occur only between adjacent states in both directions: between P and C, between C and A, and between A and M. We observed extreme Bayesian predictive p-values (<0.05 or >0.95) for this simplified model (Figure 2, Appendix B), which indicates discrepancies between the model and the data. We therefore find general CTMC modelling to be suitable for the TTM of behavioural changes (Figure 1), as individuals may jump from one state to any of the other states7,8.
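The difference between the general and the adjacent-only specification can be illustrated by marking which instantaneous transitions each model allows. The following sketch uses a hypothetical helper and 0-based state indices (P = 0, C = 1, A = 2, M = 3).

```python
def allowed_transitions(n_states, adjacent_only):
    """Return a 0/1 matrix marking which instantaneous transitions i -> j are allowed."""
    return [
        [int(i != j and (not adjacent_only or abs(i - j) == 1)) for j in range(n_states)]
        for i in range(n_states)
    ]


general = allowed_transitions(4, adjacent_only=False)
restricted = allowed_transitions(4, adjacent_only=True)

n_general = sum(map(sum, general))        # free rates in the general model
n_restricted = sum(map(sum, restricted))  # free rates in the adjacent-only model

for row in restricted:
    print(row)
print(n_general, n_restricted)
```

The restricted model halves the number of transition rates, which makes it easier to estimate but, as the predictive p-values indicate, too rigid for these data.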
Figure 2.

Bayesian predictive p-values with 1000 duplicate data sets for a simplified model, where we allowed only adjacent instantaneous transitions in both directions, between P and C, between C and A, and between A and M. The red lines represent the observed transition counts. Precontemplation, contemplation, action and maintenance are represented by 1, 2, 3, and 4, respectively; TC12, for example, is for transitions from precontemplation to contemplation.
5 Discussion
We encountered convergence issues when we applied the MSM package to analyse the nutrition intervention data; numerical problems and convergence failures are not rare when fitting CTMC models. As instantaneous transitions are not directly observed in practice, the observational time intervals may be too long (e.g., longer than the MSTs) and a CTMC model may be misspecified (e.g., assuming an instantaneous transition that does not actually occur). In both scenarios, the model may not be identifiable or some parameters may not be well estimated1,15,16,32. We recommend being aware of these situations and using care when fitting CTMC models.
To have value in applications, a CTMC model must be biologically meaningful. For example, in a model in which the health states are defined by the concentration of CD4 cells in a blood sample, which is an important factor for patients infected with HIV, a patient may instantaneously transition only from the current state to an adjacent state, and may not jump to a different state unless the jump is to the last absorbing state of death1. If all instantaneous transitions (except for the state of death) are allowed, this model will not be identifiable, resulting in a convergence failure for the maximization algorithms16. In our final modelling of the Next Step Trial data, we restricted transitions in a way that did not correlate with reality, which resulted in a model that did not fit the data. Such an over-simplified model may not have a convergence issue, but may fail to adequately describe the data. To avoid scientifically meaningless models, we should employ biological knowledge (e.g., CD4 counts for HIV infection) and check the goodness-of-fit for a selected model.
In addition, the observational time intervals need to be reasonable so that the observed data carry enough information for model estimation. The likelihood is constructed from the observed transitions, which depend on the observational time intervals. When the intervals are too long, it becomes more likely that more than one instantaneous transition will occur within that time period. As a result, the CTMC model may not be identified or some parameters may not be well estimated15. Our final simulation study, which used a lengthy observational time interval, resulted in convergence issues under both methods. Although the two methods may not be comparable because of the low convergence rates, the Bayesian approach outperformed the ML method by providing more accurate estimates. These results are not surprising. When the observed data do not provide sufficient information, the likelihood tends to be flat, which causes convergence failure for the ML methods. Bayesian methods, however, average the posterior samples and thus still provide reasonable estimates. The model estimation could be improved by integrating informative priors; however, we advise caution as “incorrect” priors may generate unstable and misleading results33,34.
In summary, we urge researchers to ensure that the application of a CTMC model is biologically meaningful, and to utilize prior knowledge of the MSTs when determining the observational time intervals for a scientific study. In our experience, the observational time intervals are sufficient if they are shorter than or close to the MSTs. Convergence issues occur when the observational time intervals are not sufficient, i.e., they are substantially longer than the MSTs, even when the models are properly specified; see the results in Section 3.2.
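The rule of thumb above can be checked numerically. The sketch below takes the true rates of the four-state simulation with observation intervals of 1.5 and, for each state, compares the interval with the MST. It also reports the probability of at least one jump during an interval that starts in that state, which follows from the exponential holding time. The layout and names are illustrative assumptions.

```python
import math

# True off-diagonal rates for each state in the four-state simulation (interval = 1.5).
true_rates = {
    1: [0.500, 0.350, 0.150],
    2: [0.400, 0.800, 0.200],
    3: [0.080, 0.250, 0.300],
    4: [0.060, 0.120, 0.300],
}
interval = 1.5

summary = {}
for state, rates in true_rates.items():
    exit_rate = sum(rates)                 # total rate of leaving the state
    mst = 1.0 / exit_rate                  # mean sojourn time
    summary[state] = {
        "mst": mst,
        "ratio": interval / mst,           # > 1 means the interval exceeds the MST
        "p_jump": 1.0 - math.exp(-exit_rate * interval),
    }

for state, row in summary.items():
    print(state, {name: round(value, 3) for name, value in row.items()})
```

Under these rates the interval exceeds the MST for states 1 and 2 and is shorter than the MST for states 3 and 4.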
6 Limitations and Conclusions
In this study, we developed a Bayesian method to fit CTMC models and compared it with the ML method implemented with the R package MSM. The Bayesian approach outperformed the ML method in our simulation studies. When applied to data from the Next Step Trial, the two methods provided comparable point estimates, but the ML method still underestimated some standard errors. We applied the bootstrap method that is available in the MSM package and obtained well calibrated results. The bootstrap method does not work when the estimation algorithm fails to converge, which often occurs when the sample size is small and/or when the observational time intervals are longer than the MSTs. The Bayesian approach is therefore more useful than the ML method in these situations.
We analysed the Next Step Trial data using continuous time Markov chain models. This application bears the same limitations as other methods, i.e., the assumptions that we have a Markov process and that the process is time-homogeneous26,35. Piecewise homogeneous models may provide a better fitting model; however, the exploration of these models is not the focus of this study. Moreover, our assumption that individuals in the Next Step Trial followed a general CTMC model may also have some limitations. Although we did not find discrepancies between the fitted model and the data, further investigations in this regard are needed. Such investigations are beyond the scope of the current study.
Acknowledgments
Junsheng Ma was supported by the NIH grant 2T32GM074902-06 and the Lorne C. Bain Endowment. The Next Step Trial was funded by NCI grant CA52605. The authors thank LeeAnn Chastain for editing assistance.
Appendix A Additional simulation results
To fully investigate the performances of the Bayesian and ML methods, we conducted extra empirical studies with two sets of parameters for two 5-state CTMC models. These model parameters are chosen to reflect real data circumstances36. In Table 7, the true parameters represent a situation in which a subject’s current state is about to change, and he or she is more likely to move from the current state to an adjacent state9. Recall that the subject leaves state i for state j with the probability of qij/qii for i ≠ j (Section 2.1). For example, the true value of parameter q12 is 0.29, which is greater than that for q15 = 0.07. If a subject is in state 1, then he or she will be more likely to move to the adjacent state of 2. Moreover, in Table 8, the parameters reflect a situation in which the subject is less likely to move from his or her current state to state 3 (given that he or she is not in state 3). This scenario mimics the Next Step Trial data, with the observed number of transitions shown in Table 6. Note that there were not sufficient data to estimate the transition rates for the state of preparation for the case study. The parameters for the simulation study are set to reflect reality as well as to assure that the model is still estimable with reasonable observation time intervals.
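The jump probabilities qij/qii mentioned above can be computed directly from a set of rates. The sketch below is an illustration only: it uses the true rates of the four-state simulation with observation intervals of 1.5 instead of the five-state parameters, and it assumes that qii denotes the total rate of leaving state i (the sum of the off-diagonal rates in row i). The function name is hypothetical.

```python
def jump_probabilities(rates):
    """For each state i, return q_ij / (sum of q_ik over k != i); the diagonal is set to 0."""
    n = len(rates)
    probs = []
    for i in range(n):
        exit_rate = sum(rates[i][j] for j in range(n) if j != i)
        probs.append([rates[i][j] / exit_rate if j != i else 0.0 for j in range(n)])
    return probs


# True rates of the four-state simulation (diagonal entries are placeholders).
rates = [
    [0.0, 0.500, 0.350, 0.150],
    [0.400, 0.0, 0.800, 0.200],
    [0.080, 0.250, 0.0, 0.300],
    [0.060, 0.120, 0.300, 0.0],
]
jump_probs = jump_probabilities(rates)

for row in jump_probs:
    print([round(p, 3) for p in row])
```

Each row sums to one, and the largest entry in a row identifies the state that a subject is most likely to enter when leaving the current state.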
Table 6.
The number of all the possible transitions for participants in the Next Step Trial.
| From / To | Precontemplation | Contemplation | Preparation | Action | Maintenance |
|---|---|---|---|---|---|
| Precontemplation | 127 | 66 | 0 | 75 | 23 |
| Contemplation | 72 | 193 | 16 | 232 | 52 |
| Preparation | 0 | 7 | 4 | 33 | 6 |
| Action | 54 | 192 | 23 | 901 | 308 |
| Maintenance | 32 | 70 | 14 | 262 | 748 |
Appendix B Bayesian predictive p-values for a simplified CTMC model
References
- 1. Gentleman RC, Lawless JF, Lindsey JC, Yan P. Multi-state Markov models for analysing incomplete disease history data with illustrations for HIV disease. Stat Med. 1994;13:805–821. doi: 10.1002/sim.4780130803.
- 2. Chen HH, Duffy SW, Tabar L. A Markov chain method to estimate the tumour progression rate from preclinical to clinical phase, sensitivity and positive predictive value for mammography in breast cancer screening. Statistician. 1996;45:307–317.
- 3. Combescure C, Chanez P, Saint-Pierre P, et al. Assessment of variations in control of asthma over time. Eur Respir J. 2003;22:298–304. doi: 10.1183/09031936.03.00081102.
- 4. Uhry Z, Hédelin G, Colonna M, et al. Multi-state Markov models in cancer screening evaluation: a brief review and case study. Stat Methods Med Res. 2010;19:463–486. doi: 10.1177/0962280209359848.
- 5. Kristal AR, Glanz K, Tilley BC, Li S. Mediating factors in dietary change: understanding the impact of a worksite nutrition intervention. Health Educ Behav. 2000;27:112–125. doi: 10.1177/109019810002700110.
- 6. Prochaska JO, Velicer WF. The transtheoretical model of health behavior change. Am J Health Promot. 1997;12:38–48. doi: 10.4278/0890-1171-12.1.38.
- 7. De Nooijer J, Van Assema P, De Vet E, Brug J. How stable are stages of change for nutrition behaviors in the Netherlands? Health Promot Int. 2005;20:27–32. doi: 10.1093/heapro/dah504.
- 8. Larabie LC. To what extent do smokers plan quit attempts? Tob Control. 2005;14:425–428. doi: 10.1136/tc.2005.013615.
- 9. Carbonari JP, DiClemente CC, Sewell KB. Stage transitions and the transtheoretical “stages of change” model of smoking cessation. Swiss J Psychol. 1997;58:134–144.
- 10. Verbeke G, Fieuws S, Molenberghs G. The analysis of multivariate longitudinal data: A review. Stat Methods Med Res. 2014;23:42–59. doi: 10.1177/0962280212445834.
- 11. Ma J, Chan W, Tsai C, Xiong M, Tilley BC. Analysis of transtheoretical model of health behavioral changes in a nutrition intervention study--a continuous time Markov chain model with Bayesian approach. Stat Med. 2015;34:3577–3589. doi: 10.1002/sim.6571.
- 12. Moler C, Van Loan C. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev Soc Ind Appl Math. 2003;45:3–49.
- 13. Jones RH, Xu S, Grunwald GK. Continuous time Markov models for binary longitudinal data. Biom J. 2006;48:411–419. doi: 10.1002/bimj.200510224.
- 14. Li Y, Chan W. Analysis of longitudinal multinomial outcome data. Biom J. 2006;48:319–326. doi: 10.1002/bimj.200510187.
- 15. Kalbfleisch JD, Lawless JF. The analysis of panel data under a Markov assumption. J Am Stat Assoc. 1985;80:863–871.
- 16. Jackson CH. Multi-state models for panel data: the MSM package for R. J Stat Softw. 2011;38:1–29.
- 17. Welton NJ, Ades AE. Estimation of Markov chain transition probabilities and rates from fully and partially observed data: uncertainty propagation, evidence synthesis, and model calibration. Med Decis Making. 2005;25:633–645. doi: 10.1177/0272989X05282637.
- 18. Pan SL, Chen HH. Time-varying Markov regression random-effect model with Bayesian estimation procedures: Application to dynamics of functional recovery in patients with stroke. Math Biosci. 2010;227:72–79. doi: 10.1016/j.mbs.2010.06.003.
- 19. Price MJ, Welton NJ, Ades AE. Parameterization of treatment effects for meta-analysis in multi-state Markov models. Stat Med. 2011;30:140–151. doi: 10.1002/sim.4059.
- 20. Bhat UN, Miller GK. Elements of applied stochastic processes. 3rd. Hoboken, NJ: John Wiley and Sons; 2002.
- 21. Ross SM. Stochastic processes. 2nd. Hoboken, NJ: John Wiley and Sons; 1996.
- 22. Carlin BP, Louis TA. Bayesian methods for data analysis. 3rd. Boca Raton, FL: Chapman and Hall/CRC Press; 2009.
- 23. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. 2nd. Boca Raton, FL: Chapman and Hall/CRC Press; 2003.
- 24. Dmitrienko A, Wang MD. Bayesian predictive approach to interim monitoring in clinical trials. Stat Med. 2006;25:2178–2195. doi: 10.1002/sim.2204.
- 25. Saint-Pierre P, Combescure C, Daures JP, Godard P. The analysis of asthma control under a Markov assumption with use of covariates. Stat Med. 2003;22:3755–3770. doi: 10.1002/sim.1680.
- 26. Aguirre-Hernandez R, Farewell VT. A Pearson-type goodness-of-fit test for stationary and time-continuous Markov regression models. Stat Med. 2002;21:1899–1911. doi: 10.1002/sim.1152.
- 27. Banks HT, Broido A, Canter B, et al. Simulation algorithms for continuous time Markov chain models. Stud Appl Electromag Mech. 2012;37:3–18.
- 28. Efron B. Better bootstrap confidence intervals. J Am Stat Assoc. 1987;82:171–185.
- 29. Christopoulos A, editor. Biomedical applications of computer modeling. Boca Raton, FL: CRC Press; 2000.
- 30. Tilley BC, Vernon SW, Glanz K, et al. Worksite cancer screening and nutrition intervention for high-risk auto workers: Design and baseline findings of the Next Step Trial. Prev Med. 1997;26:227–235. doi: 10.1006/pmed.1996.0132.
- 31. Tilley BC, Vernon SW, Myers R, et al. The Next Step Trial: Impact of a worksite colorectal cancer screening promotion program. Prev Med. 1999;28:276–283. doi: 10.1006/pmed.1998.0427.
- 32. Mhoon KB, Chan W, Del Junco DJ, Vernon SW. A continuous time Markov chain approach analyzing the stages of change construct from a health promotion intervention. JP J Biostat. 2010;4:213.
- 33. Muller CJB. PhD Thesis. Stellenbosch University; Stellenbosch: 2012. Bayesian approaches of Markov models embedded in unbalanced panel data.
- 34. Ventura L, Carreras G, Puliti D, et al. Comparison of multi-state Markov models for cancer progression with different procedures for parameters estimation. Epidemiol, Biostat Public Health. 2014;11:1–10.
- 35. Hubbard RA, Inoue L, Fann J. Modeling nonhomogeneous Markov processes via time transformation. Biometrics. 2008;64:843–850. doi: 10.1111/j.1541-0420.2007.00932.x.
- 36. Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Stat Med. 2006;25:4279–4292. doi: 10.1002/sim.2673.
