Author manuscript; available in PMC: 2018 Jun 27.
Published in final edited form as: Biometrics. 2017 Oct 11;74(2):636–644. doi: 10.1111/biom.12792

Bayesian variable selection for multistate Markov models with interval-censored data in an ecological momentary assessment study of smoking cessation

Matthew D Koslovsky 1,*, Michael D Swartz 1, Wenyaw Chan 1, Luis Leon-Novelo 1, Anna V Wilkinson 2, Darla E Kendzor 3, Michael S Businelle 3
PMCID: PMC5895542  NIHMSID: NIHMS908547  PMID: 29023626

Summary

The application of sophisticated analytical methods to intensive longitudinal data, collected with ecological momentary assessments (EMA), has helped researchers better understand smoking behaviors after a quit attempt. Unfortunately, the wealth of information captured with EMAs is typically underutilized in practice. Thus, novel methods are needed to extract this information in exploratory research studies. One of the main objectives of intensive longitudinal data analysis is identifying relations between risk factors and outcomes of interest. Our goal is to develop and apply expectation maximization variable selection for Bayesian multistate Markov models with interval-censored data to generate new insights into the relation between potential risk factors and transitions between smoking states. Through simulation, we demonstrate the effectiveness of our method in identifying associated risk factors and its ability to outperform the LASSO in a special case. Additionally, we use the expectation conditional-maximization algorithm to simplify estimation, a deterministic annealing variant to reduce the algorithm’s dependence on starting values, and Louis’s method to estimate unknown parameter uncertainty. We then apply our method to intensive longitudinal data collected with EMA to identify risk factors associated with transitions between smoking states after a quit attempt in a cohort of socioeconomically disadvantaged smokers who were interested in quitting.

Keywords: Bayesian multistate models, continuous-time Markov process, ecological momentary assessment, EMVS, tobacco cessation

1. Introduction

Ecological momentary assessment (EMA) is a sampling method that allows researchers to collect a rich stream of repeated assessment data which can help determine the psychological and environmental factors associated with an individual’s behavioral change, in their natural environment (Shiffman et al., 1997). EMAs capture an individual’s experiences close to their occurrence at a high temporal resolution using various assessment tools, such as smart phone apps. Consequently, a larger number of moments are observed than in traditional longitudinal studies, which may provide a more accurate depiction of an individual’s behavior over time. EMA data are referred to as intensive longitudinal data (Walls and Schafer, 2005). One of the main objectives of intensive longitudinal data analysis is to identify or re-affirm complex relations between risk factors and behavioral outcomes over time (Walls and Schafer, 2005).

In both intensive and traditional longitudinal studies, researchers often monitor individuals as they transition through discrete behavioral states, such as smoking status. In practice, assessments rely on compliance. As a result, assessments are sometimes missing, unequally spaced, and the exact time of transition between states is unknown (i.e., transition times are interval-censored). For traditional longitudinal studies, this type of data structure is commonly analyzed using multistate, continuous-time Markov models (MSMs) (Kay, 1986; Kalbfleisch and Lawless, 1985; Jones et al., 2006; Marshall and Jones, 1995; Saint-Pierre et al., 2003; Pan et al., 2007; Ma et al., 2015). MSMs can offer insights into behavioral processes (Saint-Pierre et al., 2003). By including covariates in these models, researchers are able to assess which risk factors are associated with an individual’s transition between behavioral states (Saint-Pierre et al., 2003). For instance, in an exploratory study monitoring an individual’s smoking behaviors after a planned quit date, a two-state Markov model could help identify which risk factors are associated with transitioning from a non-smoking to smoking state or from a smoking to non-smoking state. Even though MSMs are a versatile and convenient approach to analyzing traditional longitudinal data (Farewell and Tom, 2014) and are an available tool for identifying complex relations between potential risk factors and behavioral outcomes over time, they have yet to be applied to intensive longitudinal data.

The main objective of this study is to identify risk factors associated with transitions between discrete smoking states using a MSM for intensive longitudinal data with interval-censoring. Currently, MSMs for interval-censored data lack an efficient, practical approach for selecting risk factors associated with transition rates. For traditional longitudinal data analyses, variable selection in MSMs has been conducted using goodness-of-fit tests and comparison methods (Marshall and Jones, 1995; Saint-Pierre et al., 2003; Pan et al., 2007; Jones et al., 2006; Aguirre-Hernández and Farewell, 2002; Farewell and Tom, 2014). This approach is suitable when the number of potential covariates is relatively small. However, for a process with k possible transitions and p potential risk factors, there are $2^{kp}$ possible models to compare. Additionally, as k and p increase, MSMs’ likelihood functions become complicated to compute and parameter estimates become unstable during estimation (Saint-Pierre et al., 2003). While larger sample sizes associated with intensive longitudinal data help mitigate parameter instability compared with traditional longitudinal data, multiple comparison methods remain impractical for variable selection in large model spaces and inflate type I error. Model spaces are reducible by intuitively constraining regression coefficients (Marshall and Jones, 1995); however, this approach is suggested for confirmatory research settings that test hypotheses about a behavioral process, as opposed to exploratory research settings in which the process being modeled is less understood. Thus, the question remains as to which covariates to select for the model.

While variable selection methods for MSMs with exact transition times are available (Reulen and Kneib, 2016, 2015), no variable selection methods have been developed for MSMs with interval-censored data. Expectation maximization variable selection (EMVS), a deterministic Bayesian variable selection method inspired by stochastic search variable selection (Ročková and George, 2014; George and McCulloch, 1993), is a promising method for MSMs because it is efficient at identifying associated covariates and is capable of accommodating various outcome data structures (Ročková and George, 2014; Koslovsky et al., 2016; Zhao and Lian, 2016; McDermott et al., 2016). Since this method performs selection on all covariates simultaneously, it does not face issues of multiple comparisons, which increases modeling efficiency and controls type I error rates (Gelman et al., 2014). In contrast to stochastic search variable selection, where inference is drawn from the fully sampled posterior distribution using Markov chain Monte Carlo, EMVS simply estimates the posterior modes with the expectation maximization (EM) algorithm (Dempster et al., 1977). It is known that the EM algorithm is sensitive to starting values. By adding a deterministic annealing variant, its dependency on initial values is reduced (Ueda and Nakano, 1998). As a result, EMVS outperforms stochastic search variable selection in a fraction of the time (Ročková and George, 2014). However, efficiency gains come at a price, as EMVS lacks any intrinsically defined procedures to estimate unknown parameter variances. While there are several methods available for estimating variances when applying the EM algorithm (McLachlan and Krishnan, 2007), previous EMVS research has ignored variance estimation, focusing only on its performance at selecting associated covariates. As a result, researchers are unable to measure uncertainty in the final model when using EMVS. This limits the practicality of the method, since unbiased model interpretation relies on accurately accounting for model uncertainty (Chatfield, 2006).

In this paper, we take advantage of EMVS’s validated performance in selecting covariates, its efficiency, and its flexibility to various data structures by developing it for MSMs. Our aim is to identify relations between risk factors and smoking behaviors using interval-censored, intensive longitudinal data collected with EMA during smoking cessation attempts by a cohort of 146 socioeconomically disadvantaged individuals who were interested in quitting (Kendzor et al., 2015). At each EMA, individuals responded to a set of core items regarding their cognitions, affect, behaviors, and environment, as well as their smoking status (non-smoking or smoking) since the last assessment. Thus, we chose a two-state Markov model to analyze transitions between smoking states. Additionally, we provide closed-form expressions for the asymptotic variance estimates of the model’s unknown parameters, which incorporate parameter estimation as well as variable selection uncertainty, to facilitate unbiased interpretation of the model and increase the usefulness of this method in practice. The main focus of our application is to demonstrate how the proposed method could be used to identify which risk factors, from a pool of EMA items and baseline measures, are associated with smoking transition rates after the scheduled quit attempt. This insight may help public health researchers design effective, real-time smoking cessation interventions. By targeting individuals at high-risk moments, these interventions could help decrease smoking lapse and ultimately prevent relapse.

The remaining sections of this paper are organized as follows. In Section 2, we develop EMVS with a deterministic annealing variant for a two-state, continuous-time Markov model and provide closed-form expressions for the asymptotic variance estimates of unknown parameters. In Section 3, we conduct simulation studies to assess the performance of our method. In Section 4, we use EMVS for MSMs to identify risk factors associated with transitioning between smoking states in a cohort of socioeconomically disadvantaged smokers in a smoking cessation trial. In Section 5, we provide a discussion of our method’s development.

2. Methods

2.1 Model Formulation

We demonstrate how EMVS can be developed for a Bayesian MSM with interval-censored data to identify risk factors related to transitions between smoking states. We illustrate our method on a two-state, continuous-time Markov model, which coincides with our application. Let Yi(tij) represent the smoking state of an individual, i = 1, 2, … m, at a given assessment time, tij. At each time point, we observe individuals in one of two discrete states, defined as non-smoking (N) or smoking (S). Let j = 1, …, ni represent the potentially unbalanced number of recurrent assessments for each individual, i. Under the assumption of a homogeneous Markov process, the transition rate matrix, Q, for a two-state model is defined as (Cox and Miller, 1977):

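$$
Q =
\begin{array}{c|cc}
\text{Current State} \backslash \text{Next State} & N & S \\ \hline
N & -\lambda & \lambda \\
S & \mu & -\mu
\end{array}
$$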

where λ and μ are the positive transition rates from N → S and S → N, respectively. The transition probability matrix, $P(\delta_{ij}) = \exp(Q\delta_{ij})$, is defined as:

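$$
P(\delta_{ij}) =
\begin{array}{c|cc}
\text{Current State} \backslash \text{Next State} & N & S \\ \hline
N & P_{NN}(\delta_{ij}) & P_{NS}(\delta_{ij}) \\
S & P_{SN}(\delta_{ij}) & P_{SS}(\delta_{ij})
\end{array}
$$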

where $\delta_{ij} = t_{i,j} - t_{i,j-1}$. This illustrates the transition probability for an increment of time, $\delta_{ij}$, between assessments. The transition probabilities are obtainable in closed form, where

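$$
P_{NS}(\delta_{ij}) = 1 - P_{NN}(\delta_{ij}) = \frac{\lambda}{\lambda + \mu}\left[1 - \exp\left(-(\lambda + \mu)\delta_{ij}\right)\right] \tag{1}
$$

and

$$
P_{SN}(\delta_{ij}) = 1 - P_{SS}(\delta_{ij}) = \frac{\mu}{\lambda + \mu}\left[1 - \exp\left(-(\lambda + \mu)\delta_{ij}\right)\right]. \tag{2}
$$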

Pinsky and Karlin (2010) provide a detailed proof of the transition probabilities’ derivation.
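As an informal numerical check of Eq. 1 and 2, the following sketch evaluates the closed-form transition probabilities and compares them with a truncated power series for exp(Qδ). It is illustrative only: the function names, the example rates, and the time increment are assumptions and do not come from the study.

```python
import math

def transition_probs(lam, mu, delta):
    """Closed-form two-state transition matrix P(delta) from Eq. 1 and 2.

    Rows index the current state (N, S); columns index the next state (N, S).
    """
    decay = 1.0 - math.exp(-(lam + mu) * delta)
    p_ns = lam / (lam + mu) * decay
    p_sn = mu / (lam + mu) * decay
    return [[1.0 - p_ns, p_ns], [p_sn, 1.0 - p_sn]]

def expm_series(lam, mu, delta, terms=60):
    """Truncated power series for exp(Q * delta), used only as a cross-check."""
    a = [[-lam * delta, lam * delta], [mu * delta, -mu * delta]]
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        # term holds (Q * delta)^n / n! after this update
        term = [[sum(term[i][k] * a[k][j] for k in range(2)) / n
                 for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result

# Hypothetical rates (per unit time) and gap between two assessments.
P = transition_probs(lam=0.08, mu=0.32, delta=2.5)
P_check = expm_series(lam=0.08, mu=0.32, delta=2.5)
```

Each row of `P` sums to one, and as δ grows the N → S entry approaches λ/(λ + μ), the long-run probability of occupying the smoking state.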

Our method is focused on identifying the relation between transition rates and a set of risk factors or covariates (e.g., negative affect, cigarette availability). Thus, we introduce individual i’s observed covariates at assessment j, $x_{ij}' = (x_{ij1}, \ldots, x_{ijp})$, into the model by redefining the transition rates λ and μ in Equations (Eq.) 1 and 2 with $\lambda = \lambda_{ij} = \exp(\lambda_0 + x_{ij}'\beta_\lambda)$ and $\mu = \mu_{ij} = \exp(\mu_0 + x_{ij}'\beta_\mu)$, similar to (Jones et al., 2006). Here, $\exp(\lambda_0)$ and $\exp(\mu_0)$ represent baseline hazard rates ($x_{ij} = 0$) for transitioning between smoking states. Each term in the two regression coefficient vectors, $\beta_\lambda' = (\beta_{\lambda,1}, \ldots, \beta_{\lambda,p})$ and $\beta_\mu' = (\beta_{\mu,1}, \ldots, \beta_{\mu,p})$, is interpreted as a log-hazard ratio (Cox, 1972). This formulation allows each covariate, $x_{ijr}$, to uniquely affect both transition rates through $\beta_{\lambda,r}$ and $\beta_{\mu,r}$. Transition rates are parameterized with an exponential form since it provides a likelihood function that has a higher chance for parameter convergence (Pan et al., 2007). We assume that covariate values remain constant between consecutive assessments, but immediately at the jth assessment, the covariate value changes from the value at assessment j − 1 (Jones et al., 2006). If the covariates remain constant over the assessment window, we can use one p × 1 vector of covariates, $x_i$, to calculate each individual’s transition rates. However, we may observe a different set of covariates at each assessment. So for each individual’s $n_i - 1$ observed transitions, $n_i - 1$ different $x_{ij}$ could be used to compute their transition rates.

Since we assume that data from different individuals are independent, the likelihood function for these data is calculated as the product of each of the m individuals’ ni − 1 observed transition probabilities, conditioned on their respective covariates. In this analysis, we are primarily interested in the effect of each covariate on transition rates, so we treat the probability of starting out in any state as constant, similar to (Li and Chan, 2006; Saint-Pierre et al., 2003). The likelihood function is then defined as

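$$
L(\beta_\lambda, \beta_\mu, \lambda_0, \mu_0 \mid y) = \prod_{i=1}^{m} \prod_{j=2}^{n_i} P_{y_i(t_{i,j-1}),\, y_i(t_{i,j})}\left(\delta_{ij} \mid x_{i,j-1}\right). \tag{3}
$$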

To illustrate: say individual i logs an assessment at times $t_{i,1}, t_{i,2}, \ldots, t_{i,n_i}$. At time $t_{i,1}$, we observe him/her in state N, and at time $t_{i,2}$ the individual is observed in state S. Then, the contribution to the likelihood for this individual’s transition from non-smoking at time $t_{i,1}$ to smoking at time $t_{i,2}$ is represented by $P_{NS}(\delta_{i2} \mid x_{i,1})$.
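The inner product of Eq. 3 for a single individual can be sketched as follows. This is a minimal illustration under assumed inputs: the helper names, the toy assessment times, states, covariate values, and parameter values are all hypothetical, and the code uses the log-linear rates and the closed-form probabilities of Eq. 1 and 2.

```python
import math

def rates(x, lam0, beta_lam, mu0, beta_mu):
    """Log-linear transition rates lambda_ij and mu_ij for covariate vector x."""
    lam = math.exp(lam0 + sum(b * v for b, v in zip(beta_lam, x)))
    mu = math.exp(mu0 + sum(b * v for b, v in zip(beta_mu, x)))
    return lam, mu

def transition_prob(prev, curr, delta, lam, mu):
    """P_{prev,curr}(delta) for states 'N' and 'S', using Eq. 1 and 2."""
    decay = 1.0 - math.exp(-(lam + mu) * delta)
    p_ns = lam / (lam + mu) * decay
    p_sn = mu / (lam + mu) * decay
    table = {("N", "N"): 1.0 - p_ns, ("N", "S"): p_ns,
             ("S", "N"): p_sn, ("S", "S"): 1.0 - p_sn}
    return table[(prev, curr)]

def log_likelihood_individual(times, states, covariates, params):
    """Sum of log transition probabilities for one individual.

    Covariates observed at assessment j-1 drive the transition to assessment j.
    """
    total = 0.0
    for j in range(1, len(times)):
        lam, mu = rates(covariates[j - 1], *params)
        delta = times[j] - times[j - 1]
        total += math.log(
            transition_prob(states[j - 1], states[j], delta, lam, mu))
    return total

# Hypothetical individual: three assessments and a single covariate.
times = [0.0, 0.5, 1.75]
states = ["N", "S", "S"]
covariates = [[1.2], [0.3], [-0.4]]
params = (-2.5, [0.4], -1.1, [-0.2])  # (lam0, beta_lam, mu0, beta_mu), made up
loglik = log_likelihood_individual(times, states, covariates, params)
```

Summing this quantity over all m individuals gives the full log-likelihood; the covariates recorded at the final assessment do not enter because no later transition is observed.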

As for any Bayesian model’s formulation, the posterior distribution is proportional to the likelihood contribution of the data multiplied by the unknown parameters’ prior distributions. First, we set the log baseline hazard rates, $\lambda_0$ and $\mu_0$, to follow a normal prior distribution with mean 0 and diffuse variance $v_1$. The prior distributions of the two regression coefficient vectors, $\beta_\lambda' = (\beta_{\lambda,1}, \ldots, \beta_{\lambda,p})$ and $\beta_\mu' = (\beta_{\mu,1}, \ldots, \beta_{\mu,p})$, regulate the variable selection procedure within EMVS. We set

$$
\pi(\beta_\lambda \mid \gamma_\lambda, v_0, v_1) = N_p(0, D_{\gamma_\lambda}),
$$

where 0 is a p-dimensional vector of zeros, and $v_0$ and $v_1$ are pre-set variances of exclusion and inclusion, respectively (Ročková and George, 2014; George and McCulloch, 1993). Setting $v_0$ small drives unassociated covariate regression coefficients to zero and $v_1$ large allows associated covariate regression coefficients to be freely estimated. $D_{\gamma_\lambda}$ is a p × p diagonal matrix with each $D_{rr}$ term equal to $(1 - \gamma_{\lambda,r})v_0 + \gamma_{\lambda,r}v_1$. The prior for $\beta_\mu$ is defined similarly. The 2p-dimensional inclusion parameter vector, $\gamma = (\gamma_\lambda, \gamma_\mu)$, where $\gamma_\lambda' = (\gamma_{\lambda,1}, \ldots, \gamma_{\lambda,p})$, $\gamma_\mu' = (\gamma_{\mu,1}, \ldots, \gamma_{\mu,p})$, and each element of γ takes a value in {0, 1}, is treated as missing and follows the iid Bernoulli distribution,

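$$
\pi(\gamma \mid \theta) = \theta^{\sum_{r=1}^{p}(\gamma_{\lambda,r} + \gamma_{\mu,r})}\,(1 - \theta)^{2p - \sum_{r=1}^{p}(\gamma_{\lambda,r} + \gamma_{\mu,r})},
$$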

where γλ,r (or γμ,r) = 1 indicates the inclusion of covariate xλ,r (or xμ,r) in the model. We set the prior distribution of the sparsity parameter θ ∈ [0, 1] to a weakly informative, conjugate beta(a, b), with a = b = 2, to remove any boundary issues during estimation, as identified in Koslovsky et al. (2016). Note that we parameterize θ as an overall sparsity parameter for both γλ,r and γμ,r. Here, we assume that the covariates’ inclusion is exchangeable, which places no restrictions on the complexity for the two transition rates, λij and μij. Alternative prior specifications that can accommodate structural information regarding the covariates are available (Ročková and George, 2014).

To execute our method, we iteratively determine the conditional expectation of the log posterior distribution, termed the Q-function, with respect to the conditional distribution of the missing $\gamma \mid \beta^{(k)}, \lambda_0^{(k)}, \mu_0^{(k)}, \theta^{(k)}, y$ (E-step), and then maximize with respect to the parameters, $\Phi' = (\beta, \lambda_0, \mu_0, \theta)$, (M-step) until convergence, where $\beta = (\beta_\lambda, \beta_\mu)$.

2.2 E-Step

The Q-function, for iteration k + 1, is defined as

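$$
Q[\Phi \mid \Phi^{(k)}] = E_{\gamma \mid \cdot}\left[\log\left(\pi(\Phi, \gamma \mid y)\right) \mid \Phi^{(k)}, y\right] = \sum_{\gamma} \log\left(\pi(\Phi, \gamma \mid y)\right) \times \pi(\gamma \mid \beta^{(k)}, \theta^{(k)}), \tag{4}
$$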

where $\pi(\Phi, \gamma \mid y)$ is the complete posterior distribution and $E_{\gamma \mid \Phi^{(k)}, y} = E_{\gamma \mid \beta^{(k)}, \theta^{(k)}}$, which we denote as $E_{\gamma \mid \cdot}$. Here, $\pi(\gamma \mid \beta^{(k)}, \theta^{(k)})$ is the posterior probability distribution for inclusion, which is equivalent to the complete posterior divided by the observed posterior. Explicitly, Eq. 4 is defined as

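$$
\begin{aligned}
Q[\Phi \mid \Phi^{(k)}] = {} & C + \sum_{i=1}^{m} \sum_{j=2}^{n_i} \left[\log P_{y_i(t_{i,j-1}),\, y_i(t_{i,j})}\left(\delta_{ij} \mid x_{i,j-1}\right)\right] - \frac{1}{2v_1}\left(\lambda_0^2 + \mu_0^2\right) \\
& + \sum_{r=1}^{p} \Bigg[ -\frac{1}{2}\left\{ \beta_{\lambda,r}^2\, E_{\gamma \mid \cdot}\left[\frac{1}{v_0(1 - \gamma_{\lambda,r}) + v_1\gamma_{\lambda,r}}\right] + \beta_{\mu,r}^2\, E_{\gamma \mid \cdot}\left[\frac{1}{v_0(1 - \gamma_{\mu,r}) + v_1\gamma_{\mu,r}}\right] \right\} \\
& \qquad\quad + E_{\gamma \mid \cdot}\left[\gamma_{\lambda,r} + \gamma_{\mu,r}\right] \log\left(\frac{\theta}{1 - \theta}\right) \Bigg] \\
& + (a - 1)\log\theta + (b + 2p - 1)\log(1 - \theta)
\end{aligned}
$$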

where C is a constant term.

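For the E-step, we evaluate the conditional expectations within the Q-function at the current iteration, k. The conditional expectation of the inclusion parameter, $E_{\gamma \mid \cdot}[\gamma_{\lambda,r}]$, is defined as

$$
E_{\gamma \mid \cdot}[\gamma_{\lambda,r}] = P(\gamma_{\lambda,r} = 1 \mid \beta^{(k)}, \theta^{(k)}) = \frac{\pi(\beta_{\lambda,r}^{(k)} \mid \gamma_{\lambda,r} = 1)\, P(\gamma_{\lambda,r} = 1 \mid \theta^{(k)})}{\pi(\beta_{\lambda,r}^{(k)} \mid \gamma_{\lambda,r} = 0)\, P(\gamma_{\lambda,r} = 0 \mid \theta^{(k)}) + \pi(\beta_{\lambda,r}^{(k)} \mid \gamma_{\lambda,r} = 1)\, P(\gamma_{\lambda,r} = 1 \mid \theta^{(k)})} = p_{\lambda,r}
$$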

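where $P(\gamma_{\lambda,r} = 1 \mid \theta^{(k)}) = \theta^{(k)}$. The other conditional expectation is the average of the precisions, $1/v_0$ and $1/v_1$, weighted by the expected probability of inclusion, $p_{\lambda,r}$,

$$
E_{\gamma \mid \cdot}\left[\frac{1}{v_0(1 - \gamma_{\lambda,r}) + v_1\gamma_{\lambda,r}}\right] = (1 - p_{\lambda,r})\frac{1}{v_0} + p_{\lambda,r}\frac{1}{v_1}.
$$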

The conditional expectations $E_{\gamma \mid \cdot}[\gamma_{\mu,r}]$ and $E_{\gamma \mid \cdot}\left[\frac{1}{v_0(1 - \gamma_{\mu,r}) + v_1\gamma_{\mu,r}}\right]$ are defined similarly.
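The two E-step quantities can be computed directly from the spike-and-slab normal densities. The sketch below is one possible reading of the expressions above, not the authors' implementation; the function names and the coefficient values are hypothetical, while $v_0 = 0.0006$ and $v_1 = 0.5$ are the settings reported for the application in Section 4.

```python
import math

def normal_pdf(x, variance):
    """Density of a N(0, variance) distribution evaluated at x."""
    return math.exp(-0.5 * x * x / variance) / math.sqrt(2.0 * math.pi * variance)

def e_step(beta, theta, v0, v1):
    """Return inclusion probabilities and expected precisions for each coefficient."""
    probs, precisions = [], []
    for b in beta:
        slab = normal_pdf(b, v1) * theta            # gamma = 1 (included)
        spike = normal_pdf(b, v0) * (1.0 - theta)   # gamma = 0 (excluded)
        p = slab / (slab + spike)
        probs.append(p)
        # Precisions 1/v0 and 1/v1 weighted by the inclusion probability.
        precisions.append((1.0 - p) / v0 + p / v1)
    return probs, precisions

# Hypothetical current coefficient values and sparsity parameter.
probs, precisions = e_step(beta=[0.0, 0.45], theta=0.5, v0=0.0006, v1=0.5)
```

A coefficient sitting at zero receives a small inclusion probability and a precision close to $1/v_0$, which shrinks it strongly in the next M-step, while a coefficient far from zero is left nearly unpenalized. For coefficients of very large magnitude both densities can underflow, so a log-scale version would be more robust in practice.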

2.3 M-Step

When applying the EM algorithm, the maximization of the Q-function is often complicated when closed-form solutions do not exist. The expectation conditional-maximization algorithm (ECM) replaces the traditional M-step with multiple conditional maximization steps (CM-steps), conditioned on the subset of parameters being estimated (Meng and Rubin, 1993). This common, alternative approach simplifies and stabilizes maximization, because the Q-function is maximized over a lower dimension of parameters (Meng and Rubin, 1993). Even after conditioning each maximization step, closed-form solutions often are still unobtainable. Thus, researchers rely on iterative procedures, including the Newton-Raphson algorithm. Such is the case for our method. We define the CM-steps as follows:

  • CM-step 1: Obtain $\lambda_0^{(k+1)}$ and $\beta_\lambda^{(k+1)}$ by maximizing Eq. 4, conditioned on $\mu_0^{(k)}$, $\beta_\mu^{(k)}$, and $\theta^{(k)}$, using one step of the Newton-Raphson algorithm.

  • CM-step 2: Obtain $\mu_0^{(k+1)}$ and $\beta_\mu^{(k+1)}$ by maximizing Eq. 4, conditioned on $\lambda_0^{(k+1)}$, $\beta_\lambda^{(k+1)}$, and $\theta^{(k)}$, using one step of the Newton-Raphson algorithm.

  • CM-step 3: Update the estimate of θ with the closed-form solution,
    $$
    \theta^{(k+1)} = \frac{\sum_{r=1}^{p}(p_{\lambda,r} + p_{\mu,r}) + a - 1}{a + b + 2p - 2}.
    $$
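The closed-form update in CM-step 3 follows from the terms of the Q-function that involve θ. As a short sketch, write $S = \sum_{r=1}^{p}(p_{\lambda,r} + p_{\mu,r})$ for the sum of the expected inclusion indicators (the symbol $S$ is introduced here only for brevity). Collecting the θ-dependent terms and setting the derivative to zero gives

$$
\begin{aligned}
Q_\theta &= (S + a - 1)\log\theta + (2p - S + b - 1)\log(1 - \theta), \\
\frac{\partial Q_\theta}{\partial\theta} &= \frac{S + a - 1}{\theta} - \frac{2p - S + b - 1}{1 - \theta} = 0
\quad\Longrightarrow\quad
\theta^{(k+1)} = \frac{S + a - 1}{a + b + 2p - 2}.
\end{aligned}
$$

With $a = b = 2$ the update becomes $(S + 1)/(2p + 2)$, which lies strictly between 0 and 1 because $0 \le S \le 2p$.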

The ECM algorithm stops when the absolute value of the difference between the log-likelihood distribution evaluated at the current and next step of the algorithm falls below a set threshold (Wu, 1983). See Web Appendix A for details regarding the convergence stopping rule. Once the algorithm has converged, the final estimates, Φ̂, maximize Eq. 4. Inclusion is determined if Eγ|Φ̂[γr] ≥ 0.5 (Ročková and George, 2014). In practice, the performance of the EM algorithm is sensitive to initialization, and convergence is not guaranteed at the global mode. Thus, we use a deterministic annealing variant to reduce the algorithm’s dependence on initialization, similar to (Ročková and George, 2014; Koslovsky et al., 2016). See Web Appendix B for details of the deterministic annealing variant’s formulation.

2.4 Variance Estimation

To estimate variances of the unknown parameters, Φ, in our proposed method, we use Louis’s method (Louis, 1982), which relies on the missing information principle (Orchard et al., 1972),

$$
\text{Observed Information} = \text{Complete Information} - \text{Missing Information}.
$$

Louis’s method formulates the observed information matrix in terms of the second derivative of the Q-function and the variance of the first derivative of the posterior distribution with respect to the missing information, γ. Following the Bayesian central limit theorem, the posterior distribution of the unknown parameters can be estimated assuming a normal distribution with mean equal to the posterior mode and variance equal to the inverse observed information matrix (Carlin and Louis, 2008). The estimated variance of the unknown parameters is expressed as

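$$
\widehat{\mathrm{Var}}(\hat{\Phi}) = I_{obs}^{-1}(\hat{\Phi}) = \left[ -\frac{\partial^2 Q[\Phi \mid \hat{\Phi}]}{\partial\Phi\,\partial\Phi'} - \mathrm{var}\left\{ \frac{\partial \log \pi(\Phi, \gamma \mid y)}{\partial\Phi} \,\Big|\, \hat{\Phi}, y \right\} \right]^{-1}. \tag{5}
$$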

Details of this derivation are found in Web Appendix C. To avoid any boundary issues when calculating 95% credible intervals for θ ∈ [0, 1], we apply a logit transformation to the posterior mode and assume it follows a normal distribution with mean $\mathrm{logit}(\hat{\theta})$ and variance $\widehat{\mathrm{Var}}(\hat{\theta})/(\hat{\theta}(1 - \hat{\theta}))^2$.
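The logit-scale interval for θ can be illustrated with a few lines of code. This sketch assumes the interval is formed on the logit scale and then mapped back to the unit interval with the inverse logit; the back-transformation step, the function name, and the numeric inputs are assumptions made for illustration.

```python
import math

def logit_interval(theta_hat, var_theta, z=1.959963984540054):
    """Approximate 95% interval for theta built on the logit scale."""
    center = math.log(theta_hat / (1.0 - theta_hat))
    # Delta method: Var(logit(theta)) ~= Var(theta) / (theta * (1 - theta))**2
    se = math.sqrt(var_theta) / (theta_hat * (1.0 - theta_hat))

    def expit(u):
        return 1.0 / (1.0 + math.exp(-u))

    # Map the endpoints back to (0, 1); assumed, not stated in the text.
    return expit(center - z * se), expit(center + z * se)

# Hypothetical posterior mode and estimated variance for theta.
lower, upper = logit_interval(theta_hat=0.35, var_theta=0.004)
```

Because the endpoints pass through the inverse logit, the interval cannot leave (0, 1), even when the mode is close to a boundary.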

3. Simulation Study

To evaluate the performance of our method, we apply it to multiple simulated data sets in a variety of research scenarios. Details of the data generation, evaluation methods, and results of the simulation study can be found in Web Appendices D – F, respectively. Briefly, we examined the performance of our method in various scenarios, with different sample sizes (m = 100 and m = 150), numbers of equally (randomly) spaced assessment times (ni = 30 or ni = 70), and exchangeable correlation structures between covariates (ρ = 0 and ρ = 0.75). For the special case of equally spaced assessment times, we compared our model to EMVS and the LASSO for logistic regression models (Koslovsky et al., 2016; Tibshirani, 1996).

We evaluated the performance of our method based on the average false positive (FP) and false negative (FN) rates with FPR = FP/(FP + TN) and FNR= FN/(FN+TP), where TP and TN are true positives and true negatives, respectively. Additionally, we assessed the bias (average of the posterior modes minus true values), the Monte Carlo error of the posterior modes (MCE), the square root of the average of the posterior variances estimated with Louis’s method (SE), the coverage probability (CP) of the 95% equal-tail credible intervals, and the average mean squared error of the steady-state probability of transition from a non-smoking to smoking state (MSE).
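For a single simulated data set, the FPR and FNR defined above can be computed from the true and selected inclusion indicators. The sketch below is illustrative; the function name and the toy indicator vectors are hypothetical.

```python
def selection_error_rates(truth, selected):
    """False positive and false negative rates for one simulated data set.

    truth and selected are equal-length sequences of 0/1 inclusion indicators.
    """
    pairs = list(zip(truth, selected))
    tp = sum(1 for t, s in pairs if t == 1 and s == 1)
    tn = sum(1 for t, s in pairs if t == 0 and s == 0)
    fp = sum(1 for t, s in pairs if t == 0 and s == 1)
    fn = sum(1 for t, s in pairs if t == 1 and s == 0)
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr

# Toy example: two truly associated covariates out of six.
truth = [1, 1, 0, 0, 0, 0]
selected = [1, 0, 0, 1, 0, 0]
fpr, fnr = selection_error_rates(truth, selected)
```

Averaging these two rates over the simulated data sets gives the summary measures reported in the simulation study.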

We found the performance of our method with both randomly and equally spaced assessment times improved with larger sample sizes, larger number of assessments observed, and lower correlation structures. With equally spaced assessment times, our method outperformed or showed relatively equivalent performance to the LASSO for FPR and FNR in every setting and comparable performance to EMVS for logistic regression models. Additionally in all scenarios, our method correctly included associated covariates and correctly excluded unassociated covariates in about 99% of the simulations on average. Also, CPs fell around 92% on average. As the sample size and number of assessments increased, we observed the MCE approach the SE. The MSE for the steady-state probability of transition from a non-smoking to a smoking state was around 0.04 for all scenarios. Overall, our method demonstrated encouraging performance across the simulation scenarios, justifying its use for identifying risk factors associated with transitions between smoking states in our application data.

4. Application

Our method was developed to analyze intensive longitudinal data collected with EMA from the PREVAIL study (Kendzor et al., 2015), which demonstrated the effectiveness of a contingency management (CM) treatment to promote smoking cessation. At the beginning of the study, 146 of 222 screened individuals met the eligibility requirements and were randomized into treatment groups. One group received usual smoking cessation care from a Dallas based, safety-net hospital (n = 71), and the other received usual care as well as the contingency management (n = 75), which offered small financial incentives to encourage abstinence. A week before the scheduled quit date, baseline measures were taken and individuals were taught how to complete assessments on a study provided smart phone. Each individual logged his/her smoking behaviors on the smart phone over a 2-week period (1 week prior and 1 week after the scheduled smoking quit date). Individuals were prompted with 4 random assessments per day, which collected information regarding their urge to smoke, affect, social environment, abstinence self-efficacy, cigarette availability, and location. Since assessment times were randomly prompted by the smart phone, they were considered non-informative to the non-smoking/smoking process (Gruger et al., 1991).

As mentioned, our main objective in this analysis is to identify risk factors associated with transitioning between smoking and non-smoking states after the scheduled quit date in this cohort. An individual’s smoking status was deemed to be in a smoking state if they reported smoking since their previous assessment. Potential risk factors consist of both baseline measurements and EMA items (Table 1). Related analyses have summarized positive and negative affect items by taking the average of each set of items (Businelle et al., 2014). Here, we are interested in selecting individual components of positive affect (e.g., happy, calm) and negative affect (e.g., irritable, frustrated/angry, sad, worried, miserable) measures. Four individuals were dropped from the analysis because responses to the set of items in Table 1 were missing. A total of 3091 assessments collected after quit date (on average 41 per individual ranging from 3 to 51) were analyzed.

Table 1. Description of potential risk factors for transitioning between smoking states.

Baseline Measures

| Measure | Scale | Coded As |
|---|---|---|
| Heaviness of Smoking Index (HSI) | 0–6 | Continuous |
| Education level completed | Years | Continuous |
| Age | Years | Continuous |
| Race/Ethnicity | Non-Hispanic White or Black/Other | Binary indicator with “Non-Hispanic White” as reference |
| Contingency Management (CM) | Yes or No | Binary indicator with “No” as reference |

EMA Items

| Item Type | Item | Assessment Scale | Coded As |
|---|---|---|---|
| Urge to smoke | “I have an urge to smoke” | 1–5 Likert | Continuous |
| Affect | “I feel happy” | 1–5 Likert | Continuous |
| Affect | “I feel calm” | 1–5 Likert | Continuous |
| Affect | “I feel irritable” | 1–5 Likert | Continuous |
| Affect | “I feel frustrated/angry” | 1–5 Likert | Continuous |
| Affect | “I feel sad” | 1–5 Likert | Continuous |
| Affect | “I feel worried” | 1–5 Likert | Continuous |
| Affect | “I feel miserable” | 1–5 Likert | Continuous |
| Social Setting | “Is anyone you are interacting with smoking?” (a) | Yes or No | Binary indicator with “No” as reference |
| Abstinence Self-efficacy | “I am confident in my ability to AVOID smoking” | 1–5 Likert | Continuous |
| Cigarette Availability | “Cigarettes are available to me” | 1–5 Likert | Continuous |
| Location | Being outside | Yes or No | Binary indicator with “Not outside” as reference |
| Location | In a car/truck | Yes or No | Binary indicator with “Not in a car or truck” as reference |
| Location | At work | Yes or No | Binary indicator with “Not at work” as reference |

(a) Prompted if individual answered “Yes” to “Are you interacting with people?”

Before performing variable selection, we assessed the feasibility of a MSM for the application data. Since assessments were collected frequently, we determined the MSM’s overall fit by plotting the observed and estimated prevalences, obtained by fitting a full model, of each state over time (Titman and Sharples, 2010). We tested the assumption of time homogeneity for the Markov process by comparing the full model to an alternative model with piecewise constant transition intensities using a likelihood ratio test. We tested the Markov assumption by comparing the full model to an alternative model that included the state occupied two assessments prior as a covariate using a likelihood ratio test. Throughout the observation window, the observed and estimated smoking and non-smoking prevalences fell around 20% and 80%, respectively (Web Appendix Figure 1). Additionally, we failed to reject the null hypotheses that the baseline hazards were constant and the Markov assumption was upheld at the 0.05 α-level with p-values of 0.17 and 0.42, respectively. To perform variable selection on the data, we initialized and parameterized our model similar to the methods found in Web Appendix E. The variance of exclusion and inclusion were set to v0 = 0.0006 and v1 = 0.5, respectively. Continuous covariates were standardized to mean 0 and variance 1 before selection. Covariates were included in the model if the conditional expectation of their respective inclusion indicator was greater than or equal to 0.50. In this analysis, all included covariates had an Eγ|Φ̂[γr] = 1, and all excluded covariates had an Eγ|Φ̂[γr] < 0.06.

We present results based on each risk factor’s inclusion or exclusion via EMVS; however, not all selected risk factors remained influential after accounting for estimation and selection uncertainty, as the 95% CI for the hazard ratio contains 1 for some of them (Table 2). We found results that were consistent with previous research analyzing the relation between risk factors and smoking behaviors after a quit attempt. CM was previously shown to be an effective means of increasing smoking abstinence after a quit attempt in this cohort (Kendzor et al., 2015). In this analysis, we found that CM was associated with a decrease in the transition rate from N → S after the quit date. Addiction level has previously been associated with relapse (Zhou et al., 2009). In this analysis, the Heaviness of Smoking Index (HSI) served as a proxy for addiction level and was found to be associated with a decrease in transition rates from N → S and S → N. Consistent with Zhou et al. (2009), who found that baseline education level was not associated with relapse, this analysis did not find any association between education level and either transition rate. Age has been shown to be associated with a decrease in the odds of relapse (Zhou et al., 2009). We found that age reduced the transition rate from N → S and S → N. Also, environmental factors, such as having cigarettes available and being around someone who is smoking, have been associated with smoking behaviors (Zhou et al., 2009). Here, having cigarettes available increased the transition rate from N → S and decreased the transition rate from S → N. Negative and positive affect as well as urge to smoke are commonly identified as risk factors associated with smoking lapse and relapse after a quit attempt (Piasecki, 2006; Vasilenko et al., 2014; Shiffman et al., 2002; Zhou et al., 2009). In our analysis, we found that being calm (a positive affect item) and being worried (a negative affect item) were associated with a reduction in both transition rates after the quit attempt. While urge to smoke is considered a defining characteristic of addiction (Kassel and Shiffman, 1992; Shiffman et al., 1997), its association with smoking behaviors is often inconsistent (Wray et al., 2013). Here, we found that urge was associated with an increase in the transition rates from N → S and S → N after the quit date. Self-efficacy to abstain is commonly shown to be associated with smoking behaviors around a quit attempt (Smit et al., 2014; Shiffman et al., 2000). This analysis identified self-efficacy as being associated with a decrease in the transition rate from N → S. Additionally, being in a car has been associated with a reduction in the odds of smoking (Shiffman et al., 2002). We found it to be associated with a decrease in both transition rates. During ad-lib smoking, being at work and being outside have been associated with a decrease and increase in smoking, respectively. However, in this study, being at work was not found to be associated with any transition, but being outside was associated with a decrease in the transition rates from N → S and S → N. For two of the risk factors (self-efficacy to abstain and having cigarettes available), our method was able to differentiate between a risk factor’s relation with transitioning from S → N and N → S. These results demonstrate how EMVS for a MSM can reveal intricacies in complex behavioral processes that may elude other methods.

Table 2. Application Results.

Hazard rates and 95% equal-tail credible intervals (CI) of potential risk factors for transitioning between N → S and S → N states after the scheduled quit attempt.

| Risk Factor | After Quit Attempt: N → S (95% CI) | After Quit Attempt: S → N (95% CI) |
|---|---|---|
| Baseline hazard | 0.081 (0.062, 0.107) | 0.321 (0.248, 0.416) |
| HSI | 0.727 (0.587, 0.901)** | 0.796 (0.623, 1.016)* |
| Education level | 0.976 (0.930, 1.023) | 1.010 (0.965, 1.057) |
| Age | 0.673 (0.535, 0.845)** | 0.568 (0.447, 0.723)** |
| Race/Ethnicity | 1.002 (0.954, 1.052) | 0.995 (0.948, 1.045) |
| CM | 0.569 (0.452, 0.717)** | 1.002 (0.954, 1.053) |
| Urge | 1.565 (1.231, 1.988)** | 1.264 (0.997, 1.602)* |
| Happy | 1.000 (0.954, 1.048) | 0.998 (0.952, 1.046) |
| Calm | 0.615 (0.477, 0.791)** | 0.653 (0.517, 0.824)** |
| Irritable | 0.995 (0.949, 1.044) | 1.002 (0.955, 1.050) |
| Frustrated | 0.999 (0.952, 1.047) | 0.993 (0.947, 1.041) |
| Sad | 1.002 (0.956, 1.051) | 0.995 (0.950, 1.043) |
| Worried | 0.503 (0.381, 0.665)** | 0.532 (0.409, 0.692)** |
| Miserable | 1.007 (0.960, 1.056) | 0.987 (0.942, 1.035) |
| Interacting w/ smoker | 1.001 (0.954, 1.052) | 0.998 (0.950, 1.048) |
| Self-efficacy | 0.711 (0.631, 0.802)** | 0.998 (0.950, 1.047) |
| Cigarettes available | 1.446 (1.151, 1.819)** | 0.772 (0.613, 0.972)** |
| Being outside | 0.572 (0.357, 0.917)** | 0.293 (0.180, 0.476)** |
| In a car/truck | 0.780 (0.455, 1.338)* | 0.527 (0.296, 0.938)** |
| At work | 0.999 (0.951, 1.049) | 1.001 (0.953, 1.051) |

** Risk factor selected by EMVS and CI does not contain hazard ratio equal to 1

\* Risk factor selected by EMVS and CI does contain hazard ratio equal to 1
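
As a rough illustration of how the entries in Table 2 might be read, the sketch below scales the baseline N → S hazard by selected hazard ratios and checks whether a 95% CI excludes 1. It assumes a log-linear (proportional intensity) model in which each covariate multiplies the baseline rate by its hazard ratio raised to the covariate value, that the covariates shown are coded so that a value of 1 is meaningful, and that all other covariates sit at their reference level. The function names and the example participant are hypothetical and are not taken from the authors' code.

```python
# Posterior summaries copied from the N -> S column of Table 2:
# (hazard ratio estimate, lower 95% limit, upper 95% limit).
BASELINE_N_TO_S = 0.081
HAZARD_RATIOS_N_TO_S = {
    "CM": (0.569, 0.452, 0.717),
    "Urge": (1.565, 1.231, 1.988),
    "Cigarettes available": (1.446, 1.151, 1.819),
    "In a car/truck": (0.780, 0.455, 1.338),
}


def ci_excludes_one(lower, upper):
    """Return True when the interval lies entirely on one side of a hazard ratio of 1."""
    return lower > 1.0 or upper < 1.0


def adjusted_rate(baseline, hazard_ratios, covariates):
    """Multiply the baseline rate by HR**x for each covariate (assumed log-linear model)."""
    rate = baseline
    for name, value in covariates.items():
        rate *= hazard_ratios[name][0] ** value
    return rate


# Hypothetical participant: receives CM and has cigarettes available.
rate = adjusted_rate(
    BASELINE_N_TO_S,
    HAZARD_RATIOS_N_TO_S,
    {"CM": 1, "Cigarettes available": 1},
)

for name, (estimate, lower, upper) in HAZARD_RATIOS_N_TO_S.items():
    marker = "**" if ci_excludes_one(lower, upper) else "*"
    print(f"{name}: HR = {estimate} ({lower}, {upper}){marker}")
print(f"Adjusted N -> S rate: {rate:.4f}")
```

Under these assumptions, the two covariates pull in opposite directions: CM lowers the N → S rate while cigarette availability raises it, so the adjusted rate lies between the CM-only rate and the baseline.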

5. Discussion

To our knowledge, we developed the first variable selection method for MSMs with interval-censored data. Using EMA data, we demonstrated the usefulness of our method in practice by identifying potential risk factors associated with transitions between discrete smoking states in a cohort of socioeconomically disadvantaged individuals. In future studies, this method could be used to identify multiple predictor variables for lapse in real time that could trigger the delivery of tailored interventions at the critical time after a quit attempt.

In this work, we show the usefulness of a variable selection method on a two-state Markov model, but the method is generalizable to other state spaces. However, a major challenge of modeling MSMs is model estimation when the number of potential transitions in the state space and the number of covariates increase (Saint-Pierre et al., 2003). Extending our method to MSMs with larger state spaces would require adjusting the likelihood function component in the Q-function. For three- and four-state models, we conjecture that estimation times would not increase significantly, since closed-form solutions exist for the transition probabilities (Li and Chan, 2006; Chan, 2017). For larger state spaces, we expect computational cost to depend more on the optimization routine employed. In practice, researchers often ignore interval-censoring and assume that exact transition times are known, which may bias parameter estimates (Sutradhar et al., 2010). Therefore, variable selection methods designed for datasets in which the exact transition times are known are not appropriate for analyzing the data structures found in this study. However, future work could incorporate the attractive features of these methods, including selection of non-linear covariate effects, into the EMVS framework (Reulen and Kneib, 2015, 2016).
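
To make the role of interval-censoring concrete, the following sketch evaluates the well-known closed-form transition probabilities of a two-state continuous-time Markov chain and uses them in a panel-data log-likelihood, where states are observed only at assessment times and exact transition times are unknown. It is a simplified illustration that assumes constant transition rates between assessments; it is not the Q-function used by EMVS. The function names, the state coding (0 = N, 1 = S), and the example assessment record are hypothetical, and the rates are the baseline hazards from Table 2.

```python
import math


def two_state_transition_probs(rate_ns, rate_sn, t):
    """Closed-form transition probability matrix P(t) for states 0 = N and 1 = S.

    rate_ns is the N -> S intensity and rate_sn is the S -> N intensity.
    """
    total = rate_ns + rate_sn
    decay = math.exp(-total * t)
    p_ns = rate_ns / total * (1.0 - decay)
    p_sn = rate_sn / total * (1.0 - decay)
    return [[1.0 - p_ns, p_ns], [p_sn, 1.0 - p_sn]]


def panel_log_likelihood(rate_ns, rate_sn, times, states):
    """Sum log P[s_prev][s_next](elapsed time) over consecutive assessments."""
    loglik = 0.0
    for k in range(1, len(times)):
        elapsed = times[k] - times[k - 1]
        probs = two_state_transition_probs(rate_ns, rate_sn, elapsed)
        loglik += math.log(probs[states[k - 1]][states[k]])
    return loglik


# Hypothetical record: assessment times (arbitrary units) and observed states.
times = [0.0, 1.0, 2.5, 4.0]
states = [0, 0, 1, 0]
loglik = panel_log_likelihood(0.081, 0.321, times, states)
print(f"Log-likelihood: {loglik:.4f}")
```

Because each term depends only on the states at the two endpoints of an interval, any number of unobserved transitions within the interval is accounted for, which is what distinguishes this likelihood from one that treats transition times as exactly observed.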

One of the main objectives of intensive longitudinal data analysis is to identify or re-affirm complex relations between potential risk factors and behavioral outcomes over time (Walls and Schafer, 2005). While EMVS for MSMs with interval-censored data shows promise for identifying these relations, no variable selection method is a panacea. In practice, we suggest using our method coupled with other intensive longitudinal data analysis approaches (Walls and Schafer, 2005; Tan et al., 2012) to provide a deeper perspective on the intricacies of the behavioral process.

Supplementary Material

Supp info

Acknowledgments

This research is supported by the University of Texas Health Science Center at Houston School of Public Health, Cancer Education and Career Development Program National Cancer Institute/NIH Grant R25 CA57712 predoctoral fellowship to Matthew D. Koslovsky; the University of Texas Health Science Center at Houston School of Public Health, Training Program in Biostatistics National Institute of General Medical Sciences/NIH Grant T32GM074902 predoctoral traineeship to Matthew D. Koslovsky; and the Michael & Susan Dell Foundation, Michael & Susan Dell Center for Healthy Living, The University of Texas School of Public Health, Austin Regional Campus. The parent study was primarily supported by the University of Texas Health Science Center, School of Public Health with additional support from American Cancer Society Grants MRSGT-10-104-01-CPHPS to Darla E. Kendzor and MRSGT-12-114-01-CPPB to Michael S. Businelle. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health.

Footnotes

6. Supplementary Materials

Web Appendices A and B, referenced in Section 2.3; Web Appendix C, referenced in Section 2.4; Web Appendices D-F, referenced in Section 3; and the simulation code, as well as a working tutorial for our method, are available with this paper at the Biometrics website on Wiley Online Library.

References

  1. Aguirre-Hernández R, Farewell V. A Pearson-type goodness-of-fit test for stationary and time-continuous Markov regression models. Statistics in Medicine. 2002;21:1899–1911. doi: 10.1002/sim.1152.
  2. Businelle MS, Ma P, Kendzor DE, Reitzel LR, Chen M, Lam CY, Bernstein I, Wetter DW. Predicting quit attempts among homeless smokers seeking cessation treatment: an ecological momentary assessment study. Nicotine & Tobacco Research. 2014;16:1371–1378. doi: 10.1093/ntr/ntu088.
  3. Carlin BP, Louis TA. Bayesian Methods for Data Analysis. CRC Press; Boca Raton, FL: 2008.
  4. Chan W. Appendix: Derivations of transition probabilities for a four-state continuous time Markov chain. 2017 https://drive.google.com/open?id=0BAhiwb6HTQzTXBuWmo3dldKRE0.
  5. Chatfield C. Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society. Series A. 1995;158:419–466.
  6. Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological) 1972;34:187–220.
  7. Cox DR, Miller HD. The Theory of Stochastic Processes. Vol. 134. CRC Press; Boca Raton, FL: 1977.
  8. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 1977;39:1–38.
  9. Farewell VT, Tom BD. The versatility of multi-state models for the analysis of longitudinal data with unobservable features. Lifetime Data Analysis. 2014;20:51–75. doi: 10.1007/s10985-012-9236-2.
  10. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 3rd ed. Taylor & Francis; Boca Raton, FL: 2014.
  11. George EI, McCulloch RE. Variable selection via Gibbs sampling. Journal of the American Statistical Association. 1993;88:881–889.
  12. Gruger J, Kay R, Schumacher M. The validity of inferences based on incomplete observations in disease state models. Biometrics. 1991;47:595–605.
  13. Jones RH, Xu S, Grunwald GK. Continuous time Markov models for binary longitudinal data. Biometrical Journal. 2006;48:411–419. doi: 10.1002/bimj.200510224.
  14. Kalbfleisch J, Lawless JF. The analysis of panel data under a Markov assumption. Journal of the American Statistical Association. 1985;80:863–871.
  15. Kassel JD, Shiffman S. What can hunger teach us about drug craving? a comparative analysis of the two constructs. Advances in Behaviour Research and Therapy. 1992;14:141–167.
  16. Kay R. A Markov model for analysing cancer markers and disease states in survival studies. Biometrics. 1986;14:855–865.
  17. Kendzor DE, Businelle MS, Poonawalla IB, Cuate EL, Kesh A, Rios DM, Ma P, Balis DS. Financial incentives for abstinence among socioeconomically disadvantaged individuals in smoking cessation treatment. American Journal of Public Health. 2015;105:1198–1205. doi: 10.2105/AJPH.2014.302102.
  18. Koslovsky MD, Swartz MD, Leon-Novelo L, Chan W, Wilkinson AV. Using the EM algorithm for Bayesian variable selection in logistic regression models with related covariates. 2016 doi: 10.1080/00949655.2017.1398255. In Revisions.
  19. Li YP, Chan W. Analysis of longitudinal multinomial outcome data. Biometrical Journal. 2006;48:319–326. doi: 10.1002/bimj.200510187.
  20. Louis TA. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 1982;44:226–233.
  21. Ma J, Chan W, Tsai CL, Xiong M, Tilley BC. Analysis of transtheoretical model of health behavioral changes in a nutrition intervention study-a continuous time Markov chain model with Bayesian approach. Statistics in Medicine. 2015;34:3577–3589. doi: 10.1002/sim.6571.
  22. Marshall G, Jones RH. Multi-state models and diabetic retinopathy. Statistics in Medicine. 1995;14:1975–1983. doi: 10.1002/sim.4780141804.
  23. McDermott P, Snyder J, Willison R. Methods for Bayesian variable selection with binary response data using the EM algorithm. 2016 arXiv preprint arXiv:1605.05429.
  24. McLachlan G, Krishnan T. The EM Algorithm and Extensions. 2nd ed. John Wiley & Sons; Hoboken, NJ: 2007.
  25. Meng XL, Rubin DB. Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika. 1993;80:267–278.
  26. Orchard T, Woodbury MA. A missing information principle: theory and applications. Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability; Berkeley, CA: University of California Press; 1972. pp. 697–715.
  27. Pan SL, Wu HM, Yen AMF, Chen THH. A Markov regression random-effects model for remission of functional disability in patients following a first stroke: A Bayesian approach. Statistics in Medicine. 2007;26:5335–5353. doi: 10.1002/sim.2999.
  28. Piasecki TM. Relapse to smoking. Clinical Psychology Review. 2006;26:196–215. doi: 10.1016/j.cpr.2005.11.007.
  29. Pinsky M, Karlin S. An Introduction to Stochastic Modeling. Academic Press; Burlington, MA: 2010.
  30. Reulen H, Kneib T. Structured fusion lasso penalised multi-state models. Technical report. University of Goettingen; 2015.
  31. Reulen H, Kneib T. Boosting multi-state models. Lifetime Data Analysis. 2016;22:241–262. doi: 10.1007/s10985-015-9329-9.
  32. Ročková V, George EI. EMVS: The EM approach to Bayesian variable selection. Journal of the American Statistical Association. 2014;109:828–846.
  33. Saint-Pierre P, Combescure C, Daures J, Godard P. The analysis of asthma control under a Markov assumption with use of covariates. Statistics in Medicine. 2003;22:3755–3770. doi: 10.1002/sim.1680.
  34. Shiffman S, Balabanis MH, Paty JA, Engberg J, Gwaltney CJ, Liu KS, Gnys M, Hickcox M, Paton SM. Dynamic effects of self-efficacy on smoking lapse and relapse. Health Psychology. 2000;19:315. doi: 10.1037//0278-6133.19.4.315.
  35. Shiffman S, Engberg JB, Paty JA, Perz WG, Gnys M, Kassel JD, Hickcox M. A day at a time: predicting smoking lapse from daily urge. Journal of Abnormal Psychology. 1997;106:104. doi: 10.1037//0021-843x.106.1.104.
  36. Shiffman S, Gwaltney CJ, Balabanis MH, Liu KS, Paty JA, Kassel JD, Hickcox M, Gnys M. Immediate antecedents of cigarette smoking: an analysis from ecological momentary assessment. Journal of Abnormal Psychology. 2002;111:531. doi: 10.1037//0021-843x.111.4.531.
  37. Shiffman S, Hufford M, Hickcox M, Paty JA, Gnys M, Kassel JD. Remember that? A comparison of real-time versus retrospective recall of smoking lapses. Journal of Consulting and Clinical Psychology. 1997;65:292. doi: 10.1037/0022-006x.65.2.292.a.
  38. Smit ES, Hoving C, Schelleman-Offermans K, West R, de Vries H. Predictors of successful and unsuccessful quit attempts among smokers motivated to quit. Addictive Behaviors. 2014;39:1318–1324. doi: 10.1016/j.addbeh.2014.04.017.
  39. Sutradhar R, Barbera L, Seow H, Howell D, Husain A, Dudgeon D. Multistate analysis of interval-censored longitudinal data: Application to a cohort study on performance status among patients diagnosed with cancer. American Journal of Epidemiology. 2010;173:384. doi: 10.1093/aje/kwq384.
  40. Tan X, Shiyko MP, Li R, Li Y, Dierker L. A time-varying effect model for intensive longitudinal data. Psychological Methods. 2012;17:61. doi: 10.1037/a0025814.
  41. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 1996;58:267–288.
  42. Titman AC, Sharples LD. Model diagnostics for multi-state models. Statistical Methods in Medical Research. 2010;19:621–651. doi: 10.1177/0962280209105541.
  43. Ueda N, Nakano R. Deterministic annealing EM algorithm. Neural Networks. 1998;11:271–282. doi: 10.1016/s0893-6080(97)00133-0.
  44. Vasilenko SA, Piper ME, Lanza ST, Liu X, Yang J, Li R. Time-varying processes involved in smoking lapse in a randomized trial of smoking cessation therapies. Nicotine & Tobacco Research. 2014;16:S135–S143. doi: 10.1093/ntr/ntt185.
  45. Walls TA, Schafer JL. Models for intensive longitudinal data. Oxford University Press; New York, NY: 2005.
  46. Wray JM, Gass JC, Tiffany ST. A systematic review of the relationships between craving and smoking cessation. Nicotine & Tobacco Research. 2013;15:1167–1182. doi: 10.1093/ntr/nts268.
  47. Wu CJ. On the convergence properties of the EM algorithm. The Annals of Statistics. 1983;11:95–103.
  48. Zhao K, Lian H. The Expectation–Maximization approach for Bayesian quantile regression. Computational Statistics & Data Analysis. 2016;96:1–11.
  49. Zhou X, Nonnemaker J, Sherrill B, Gilsenan AW, Coste F, West R. Attempts to quit smoking and relapse: factors associated with success or failure from the ATTEMPT cohort study. Addictive Behaviors. 2009;34:365–373. doi: 10.1016/j.addbeh.2008.11.013.
