BSMHM2: Bayesian Semiparametric Mixed Hidden Markov Models for Delineating the Pathology of Alzheimer’s Disease

Kai Kang; Jingheng Cai; Xinyuan Song; Hongtu Zhu; for the Alzheimer’s Disease Neuroimaging Initiative

doi:10.1177/0962280217748675

Author manuscript; available in PMC: 2020 Jan 1.

Published in final edited form as: Stat Methods Med Res. 2017 Dec 26;28(7):2112–2124. doi: 10.1177/0962280217748675


PMCID: PMC5984196 NIHMSID: NIHMS967298 PMID: 29278101

Abstract

Alzheimer’s disease (AD) is an incurable and progressive disease. The pathology of AD usually evolves from cognitively normal (CN), to mild cognitive impairment (MCI), to AD. The aim of this paper is to develop a Bayesian semiparametric mixed hidden Markov modeling (BSMHM2) framework to characterize disease pathology, identify hidden states corresponding to the diagnosed stages of cognitive decline, and examine the dynamic changes of potential risk factors associated with the CN-MCI-AD transition. The BSMHM2 framework consists of two major components. The first is a state-dependent semiparametric regression for delineating the complex associations between clinical outcomes of interest and a set of prognostic biomarkers across neurodegenerative states. The second is a parametric transition model that accounts for potential covariate effects on the cross-state transition. Inter-individual and inter-process differences are taken into account via correlated random effects in both components. Based on the Alzheimer’s Disease Neuroimaging Initiative dataset, we identify four states of AD pathology, corresponding to commonly diagnosed stages of cognitive decline (CN, early MCI, late MCI, and AD), and examine the effects of hippocampus, age, gender, and APOE-ε4 on the degeneration of cognitive function across the four cognitive states.

Keywords: Bayesian P-splines, Correlated random effects, Hidden Markov models, MCMC methods, Semiparametric models

1 Introduction

Alzheimer’s disease (AD) is a chronic neurodegenerative disease that usually starts slowly and worsens over time. The most common early symptom of AD is short-term memory loss, also referred to as mild cognitive impairment (MCI). Patients in the MCI state are highly likely to progress to dementia or AD within a few years (Albert et al., 2011). Despite increasing attention to its growing threat to public health, the cause of AD remains poorly understood. Thus, it is of great interest to discover or validate prognostic biomarkers that may identify subjects at great risk for future cognitive decline and to investigate the functional effects of various biomarkers on the conversion from CN to AD.

The Alzheimer’s Disease Neuroimaging Initiative (ADNI) study began in 2004 and collected imaging, genetic, clinical, and cognitive data from cognitively normal (CN) control subjects and subjects with MCI or AD in order to delineate the complex associations among various characteristics of the clinical spectrum of AD. ADNI-1 recruited approximately 800 subjects according to its initial aim and has been extended by three follow-up studies, namely, ADNI-GO, ADNI-2, and ADNI-3. ADNI-1 subjects had the option to refuse follow-up monitoring in subsequent studies. More information on ADNI can be obtained from the official website (www.adni-info.org). The functional assessment questionnaire (FAQ), an assessment of the ability to function independently in daily life, is widely used to monitor the decline of cognitive ability over time. The FAQ scores of each subject were obtained at baseline and then every 6 months over 9 years across multiple study phases. For this longitudinal study, several central questions naturally arise:

(i) How many hidden pathophysiological states exist in the progression of AD?
(ii) Which factors contribute to the neurodegenerative progression from one state (e.g., MCI) to another (e.g., AD)?
(iii) Are the identified risk factors equally good predictors of cognitive decline at each state?

Given these questions, there is a particular need for the development of statistical models that delineate cognitive decline in terms of the pathophysiological states of AD.

Hidden Markov models (HMMs) are well suited to the characterization of longitudinal data in terms of a set of hidden states (Cappé et al., 2005; Maruotti, 2011; Bartolucci et al., 2013). HMMs consist of two components: a transition model to describe the dynamic transition of hidden states and a conditional regression model to examine state-specific covariate effects on responses. Owing to their ability to simultaneously reveal the longitudinal association structure and dynamic heterogeneity of the observed process, HMMs and their variants have attracted significant attention from the medical, behavioral, social, environmental, and psychological sciences (Vermunt et al., 1999; Scott et al., 2005; Schmittmann et al., 2005; Bartolucci and Farcomeni, 2009; Bartolucci et al., 2013; Chow et al., 2013). In particular, HMMs have previously been applied to disease progression to identify latent pathophysiological states. For instance, Albert et al. (1994) used HMMs to analyze multiple sclerosis across relapse and remission states (see also Altman and Petkau, 2005; Altman, 2007). Ip et al. (2013) identified ten disability states on the basis of a 10-year follow-up study of late-life disability in older adults, and examined the patterns and risk factors for transition among these states. Song et al. (2016) revealed the dynamic change of treatment effectiveness in preventing cocaine use across three cocaine addiction states.

Despite the rapid development and wide application of HMMs, the existing literature has mainly focused on parametric HMMs, in which the forms of covariate effects on responses and on transition probabilities are pre-specified. One problem with parametric models is that they may be too restrictive to reflect reality, because the complex relationships among variables are seldom known a priori and a pre-specified parametric form tends to overlook subtle patterns in a function. A more comprehensive analysis can be performed by incorporating nonparametric functions into HMMs so that the functional effects of interest can be discovered. To the best of our knowledge, however, such nonparametric modeling has not been introduced into the HMM framework.

In this study, we propose a Bayesian mixed semiparametric hidden Markov modeling (BMSHM2) framework to analyze the ADNI-1 dataset. Similar to conventional HMMs, the proposed model consists of two major components. The first component is a state-dependent semiparametric regression to investigate the linear and nonlinear effects of covariates, such as hippocampus, age, gender, and APOE-ε4, on the clinical outcome of cognitive decline (e.g., FAQ score). The second component is a mixed continuation-ratio logit transition model to examine various covariate effects on the probabilities of transitioning among neurodegenerative states. We introduce a random effect in each model to account for inter-individual differences and allow the random effects to be dependent by assigning them a joint distribution. Such joint random effects enable the model to accommodate the situation where some omitted factors influence both the observed process and the hidden transition process (Wulfsohn and Tsiatis, 1997; Chi and Ibrahim, 2006). We develop a full Bayesian approach, along with a Bayesian P-splines procedure and Markov chain Monte Carlo (MCMC) methods, for statistical inference. As far as we know, no previous study has been conducted either on the proposed BMSHM2 or on Bayesian HMMs. Also, this paper is the first to investigate the neurodegenerative pathology of AD.

Section 2 defines BMSHM2 and discusses the related identifiability issues. Section 3 introduces the Bayesian inference procedure. Section 4 illustrates the use of BMSHM2 in the analysis of the ADNI dataset. Section 5 demonstrates the empirical performance of the proposed methodology through a simulation study. Section 6 discusses the findings obtained from the analysis of the ADNI dataset. Technical details are provided in the Appendix.

2 Model description

2.1 Questions of Interest for ADNI-1

Data used in this article were obtained from the ADNI-1 database launched in 2003. A total of n = 633 patients observed at baseline, 6 months, 12 months, and 24 months (t = 1, …, 4) were considered in the analysis. We use the FAQ score, denoted by y_it, to characterize the cognitive function of subject i at occasion t. Moreover, we observe an r × 1 vector of discrete covariates, denoted by b_it = (b_it,1, …, b_it,r)^T, and a q × 1 vector of continuous covariates, denoted by x_it = (x_it,1, …, x_it,q)^T. The covariates of interest include gender (1 = male; 0 = female), apolipoprotein E-ε4 (APOE-ε4), hippocampus, and age at baseline. APOE-ε4 is a known genetic risk factor for AD and is coded as 0, 1, and 2, denoting the number of APOE-ε4 alleles. Hippocampal volume is divided by whole brain volume to account for the confounding effect of brain size. Thus, APOE-ε4 (=1, b_it,1), APOE-ε4 (=2, b_it,2), and gender (b_it,3) are discrete, whereas hippocampus (x_it,1) and age at baseline (x_it,2) are continuous.

Several kinds of dependencies/heterogeneities are worthy of investigation. The first is the dynamic heterogeneity across different groups. Figure 1 plots the individual trajectories of FAQ scores for 20 randomly selected subjects from each group initially diagnosed as CN, MCI, and AD, respectively, at baseline. The cognitive decline patterns are clearly distinct across the groups, suggesting that at least three (and probably more) distinct neurodegenerative states underlie the observed FAQ scores. The second is the dependency of the FAQ score on potential covariates, such as hippocampus, age at baseline, APOE-ε4, and gender. The third is the serial dependency of the longitudinal observations, owing to the relative persistence of neurodegenerative states. The last is the heterogeneity caused by omitted clinical or genetic indicators that influence both cognitive decline and its underlying transition. The BMSHM2 described below accommodates all these features.

Figure 1. ADNI-1 data analysis results: individual trajectories of FAQ scores for 20 randomly selected subjects whose baseline states are CN, MCI, and AD, respectively.

2.2 Model Setup

The BMSHM2 consists of two major components, namely a conditional semiparametric regression model and a continuation-ratio logit transition model, as detailed below.

2.2.1 Conditional semiparametric regression model

Let y_it with subject i = 1, …, n at t = 1, …, T be the observation process. The hidden state process, Z_it, which takes values in {1, …, S}, is assumed to follow a first-order Markov chain. Given the hidden state Z_it = s, the conditional semiparametric regression model is defined as follows:

[y_{it} | Z_{it} = s] = μ_{s} + γ_{s}^{T} b_{it} + \sum_{j = 1}^{q} f_{sj} (x_{it, j}) + w_{i 1} + δ_{it},

(1)

where μ_s is a state-specific intercept, γ_s = (γ_s1, …, γ_sr)^T is a state-specific vector of fixed effects of the discrete covariates, the f_sj(·) are state-specific unknown smooth functions, b_it = (b_it,1, …, b_it,r)^T and x_it = (x_it,1, …, x_it,q)^T are the r × 1 vector of discrete covariates and the q × 1 vector of continuous covariates, respectively, w_i1 is a subject-specific random effect, δ_it is a random residual independent of w_i1, and [δ_it|Z_it = s] ~ N[0, ψ_s].

The conditional model defined by (1) extends the parametric regression to allow the additive nonparametric functions of covariates, so that the functional effects of interest can be discovered. Such nonparametric modeling provides great flexibility in fitting nonlinear effects whose forms need not be specified a priori. When used as an exploratory tool, the proposed model is able to help users to visually examine and interpret the functional effects of potential predictors on the response of interest. Moreover, the subject-specific random effect w_i1 permits additional dependencies elicited from other sources and thus avoids a large number of hidden states caused by possible residual correlation among responses.

2.2.2 Continuation-ratio logit transition model

Let p_itus denote the transition probability from state Z_i,t−1 = u at occasion t − 1 to state Z_it = s at occasion t for individual i. Based on the assumption of the first-order Markov chain, we have

p_{itus} = P (Z_{it} = s | Z_{i 1}, Z_{i 2}, \dots, Z_{i, t - 1} = u) = P (Z_{it} = s | Z_{i, t - 1} = u) .

(2)

The initial distribution of Z_i1 is assumed to be a multinomial with probabilities (τ₁, …, τ_S)^T such that τ_s ≥ 0 and $\sum_{s = 1}^{S} τ_{s} = 1$. The distribution of $\{Z_{it}\}_{t = 1}^{T}$ is then fully determined by the transition probabilities and the distribution of the initial state.

In the study of disease progression, the hidden states can often be naturally ranked (e.g., CN, MCI, and AD can be ranked from the best to the worst cognitive condition). Thus, we assume that the hidden states {1, …, S} are ordered and ϑ_itus = P(Z_it = s|Z_it ≥ s, Z_i,t−1 = u). Then, the transition across the ordered states can be described by continuation logits as follows: For t = 2, …, T, s = 1, …, S − 1, and u = 1, …, S,

log (\frac{P (Z_{it} = s | Z_{i, t - 1} = u)}{P (Z_{it} > s | Z_{i, t - 1} = u)}) = log (\frac{p_{itus}}{p_{itu, s + 1} + \dots + p_{ituS}}) = logit (ϑ_{itus}) .

(3)

The parameterization in (3) is intended to facilitate the interpretation of transitioning to state s rather than to a worse one (i.e., Z_it > s under the ordering from the best to the worst condition). To examine the effects of potential predictors on the transition probabilities, we consider a continuation-ratio logit transition model as follows:

logit (ϑ_{itus}) = ζ_{us} + α^{T} d_{it} + w_{i 2},

(4)

where ζ_us is a state-specific intercept, $d_{it} = {(x_{it}^{T}, b_{it}^{T})}^{T}$ is the vector of covariates defined in (1), α is a (q + r) × 1 vector of regression coefficients that can be interpreted as conditional log odds ratios in a logistic regression, w_i2 is a subject-specific random effect that is distinct from but correlated with w_i1, and w_i = (w_i1, w_i2)^T is assumed to follow a multivariate normal distribution N(0, Φ). Similar to the proportional odds assumption in a cumulative logit model, α in (4) is assumed to be independent of u and s in order to maintain the order of the hidden states and avoid a cumbersome transition model in which every transition requires its own set of parameters for all possible states of origination and destination. This restriction, in turn, greatly reduces the complexity and enhances the interpretability of the transition model.

Notably, the random effects w_i1 and w_i2 play different roles in the conditional and transition models. While w_i1 in conditional model (1) relaxes the assumption that observations {y_it; i = 1, …, n, t = 1, …, T} are conditionally independent given the hidden state Z_it = s, w_i2 in transition model (4) relaxes the assumption that the hidden process Z_it is Markovian. Unlike the existing literature, which usually treats w_i1 and w_i2 separately, we accommodate their possible correlation by assigning a joint distribution to w_i = (w_i1, w_i2)^T. Consequently, the possible correlation between the heterogeneities present in the two stochastic processes can be appropriately addressed and examined through the covariance matrix Φ.

2.3 Model identification

Without further constraints, the proposed model is not identifiable because of the following two model indeterminacies. The first is caused by the additive nonparametric functions involved in (1), in which each unknown function is identifiable only up to a constant. To address this problem, we constrain the unknown functions so that their integrals over the ranges of the predictors are zero (Panagiotelis and Smith, 2008; Song and Lu, 2010) as follows:

\int_{𝒳_{j}} f_{sj} (x) dx = 0, for s = 1, \dots, S, j = 1, \dots, q,

(5)

where 𝒳_j is the support of x_j. The other model indeterminacy is the label switching problem caused by the invariance of the likelihood function to any permutation of the state labels, which results in a multimodal posterior under a symmetric prior specification. We follow Frühwirth-Schnatter (2001) and use a permutation sampler to address this issue.

3 Bayesian Inference

3.1 Nonparametric modeling

The first critical issue in the Bayesian analysis of the proposed model is to estimate the nonparametric functions involved in (1). We consider the use of Bayesian P-splines (Berry et al., 2002; Lang and Brezger, 2004; Fahrmeir and Raach, 2007). The basic idea is to estimate the unknown smoothing functions through a sum of B-splines basis functions (De Boor, 2001) given a large number of knots in the domains of predictors. Specifically, f_sj(x_it,j), the functional effect of the jth covariate at state s for subject i at time t, can be approximated as follows:

f_{sj} (x_{it, j}) = \sum_{l = 1}^{L} β_{sj, l} B_{l} (x_{it, j}) = β_{sj}^{T} B (x_{it, j}),

(6)

where L is the number of B-spline basis functions, which is determined by the number of knots, β_sj = (β_sj,1, …, β_sj,L)^T is the vector of unknown parameters, the B_l(·) are cubic B-spline basis functions, and B(x_it,j) = (B₁(x_it,j), …, B_L(x_it,j))^T. Usually, a value of L between 10 and 30 provides sufficient flexibility for fitting most smooth functions.
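The approximation in (6) can be illustrated with a minimal sketch that evaluates cubic B-spline basis functions by the Cox-de Boor recursion and forms β^T B(x). The function names (`equidistant_knots`, `bspline_basis`, `spline_value`), the interval [0, 1], and the choice of 10 knot intervals are illustrative assumptions, not the settings or code used in the paper.

```python
def equidistant_knots(lo, hi, n_intervals, degree=3):
    """Equidistant knots on [lo, hi], extended by `degree` knots on each side."""
    h = (hi - lo) / n_intervals
    return [lo + (i - degree) * h for i in range(n_intervals + 2 * degree + 1)]


def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions of the given degree at x.

    Returns a list of length len(knots) - degree - 1 (the L in equation (6)).
    """
    # Degree-0 functions are indicators of the half-open knot intervals.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    # The Cox-de Boor recursion raises the degree one step at a time.
    for d in range(1, degree + 1):
        new = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_den * basis[i + 1]
                     if right_den > 0 else 0.0)
            new.append(left + right)
        basis = new
    return basis


def spline_value(x, beta, knots, degree=3):
    """Compute f(x) = beta^T B(x) as in equation (6)."""
    return sum(b * v for b, v in zip(beta, bspline_basis(x, knots, degree)))


# Hypothetical setting: 10 intervals on [0, 1] give L = 13 cubic basis functions.
knots = equidistant_knots(0.0, 1.0, 10)
L = len(knots) - 3 - 1
```

Inside [lo, hi] the basis functions are nonnegative and sum to one, so a constant coefficient vector reproduces a constant function; this is why each f_sj needs a centering constraint such as (5) to be separated from the intercept μ_s.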

One problem of applying (6) to approximate an unknown smooth function is the over-fitting caused by the use of a large number of knots. Eilers and Marx (1996) suggested the penalization of the coefficients of adjacent B-splines basis functions to prevent the overfitting. Such penalization can be implemented in the Bayesian framework by applying random walk priors to β_sj (Lang and Brezger, 2004; Fahrmeir and Raach, 2007; Song and Lu, 2010).

3.2 Prior distributions

We assign truncated Gaussian priors to β_sj as follows:

p (β_{sj} | ν_{sj}) = {(\frac{1}{2 π ν_{sj}})}^{L_{sj} / 2} exp {- \frac{1}{2 ν_{sj}} β_{sj}^{T} K_{sj} β_{sj}} I (1_{n_{s}}^{T} B_{sj} β_{sj} = 0),

(7)

where ν_sj is a smoothing parameter for controlling the amount of penalty, K_sj is a penalty matrix derived according to the random walk penalties proposed, L_sj is the rank of K_sj, 1_{n_s} is an n_s × 1 vector with all elements equal to 1, n_s is the sample size at state s, B_sj is the sub-matrix of B_j = [B_l(x_it,j)]_nT×L without the rows where Z_it ≠ s, and the truncation term incorporates the identifiability constraint (5) into the splines approximation (6).
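As a hedged illustration of the penalty matrix, the sketch below builds K from a second-order difference operator, K = D^T D, which is the standard form for second-order random walk penalties in the Bayesian P-splines literature. The helper names (`difference_matrix`, `penalty_matrix`, `quadratic_form`) are hypothetical, and the exact construction used by the authors is not shown in the text.

```python
def difference_matrix(L, order=2):
    """Return the (L - order) x L matrix that takes order-th differences."""
    D = [[1.0 if i == j else 0.0 for j in range(L)] for i in range(L)]
    for _ in range(order):
        # Each pass replaces the rows by differences of consecutive rows.
        D = [[D[i + 1][j] - D[i][j] for j in range(L)]
             for i in range(len(D) - 1)]
    return D


def penalty_matrix(L, order=2):
    """Return K = D^T D, the penalty matrix of an order-th random walk."""
    D = difference_matrix(L, order)
    return [[sum(row[i] * row[j] for row in D) for j in range(L)]
            for i in range(L)]


def quadratic_form(beta, K):
    """Return beta^T K beta, the exponent term in prior (7) up to -1/(2 nu)."""
    n = len(beta)
    return sum(beta[i] * K[i][j] * beta[j] for i in range(n) for j in range(n))


K = penalty_matrix(5)
```

With a second-order penalty, any coefficient vector that is linear in its index has zero penalty, so K has rank L − 2; a larger ν_sj weakens the penalty and permits a rougher curve.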

For the smoothing parameters ν_sj, we assign a highly dispersed but proper inverse gamma prior as follows:

p (ν_{sj}^{- 1}) \overset{D}{=} Gamma [ν_{1}, ν_{2}],

(8)

where ν₁ and ν₂ are hyperparameters whose values are pre-specified. A common choice for these hyperparameters is ν₁ = 1 and ν₂ is small (Fahrmeir and Raach, 2007; Song and Lu, 2010). We set ν₁ = 1 and ν₂ = 0.005 in the present study.

For the parameters involved in conditional model (1), conjugate-type priors are assigned as follows: for s = 1, …, S,

p (μ_{s}) \overset{D}{=} N [μ_{s 0}, σ_{μ s 0}^{2}], p (γ_{s}) \overset{D}{=} N [γ_{s 0}, Σ_{s 0}],

p (Φ^{- 1}) \overset{D}{=} Wishart [R_{0}, ρ_{0}], p (ψ_{s}^{- 1}) \overset{D}{=} Gamma [{\tilde{α}}_{s 0}, {\tilde{β}}_{s 0}],

(9)

where μ_s0, $σ_{μ s 0}^{2}$ , γ_s0, Σ_s0, α̃_s0, β̃_s0, R₀, and ρ₀ are hyperparameters with preassigned values.

Finally, for the parameters involved in transition model (4), we consider the following Gaussian priors:

p (ζ_{us}) \overset{D}{=} N [ζ_{us 0}, σ_{ζ 0}^{2}], p (α) \overset{D}{=} N [α_{0}, H_{α 0}], p (τ_{s}) \overset{D}{=} N [τ_{s 0}, σ_{τ 0}^{2}],

(10)

where ζ_us0, $σ_{ζ 0}^{2}$ , α₀, H_α0, τ_s0, and $σ_{τ 0}^{2}$ are hyperparameters with preassigned values.

3.3 Posterior computation

Let y_i = (y_i1, …, y_iT)^T, Y = (y₁, …, y_n), D = (d₁₁, …, d_nT), Z_i = (Z_i1, …, Z_iT)^T, Z = (Z₁, …, Z_n), W = (w₁, …, w_n), and θ be the vector that includes all the unknown parameters in the proposed model. The complete-data log-likelihood function that is used to derive the posterior distributions and compute the model selection criterion is given by

log p (Y, D, W, Z | θ)
= \sum_{i = 1}^{n} [log p (y_{i} | d_{i}, w_{i}, Z_{i}, θ) + log p (Z_{i} | d_{i}, w_{i}, θ) + log p (w_{i} | θ)]
= \sum_{i = 1}^{n} \sum_{t = 1}^{T} log p (y_{it} | d_{it}, w_{i 1}, Z_{it} = s, θ) + \sum_{i = 1}^{n} \sum_{t = 2}^{T} log p (Z_{it} = s | Z_{i, t - 1} = u, d_{it}, w_{i 2}, θ) + \sum_{i = 1}^{n} log p (Z_{i 1} = s | θ) + \sum_{i = 1}^{n} log p (w_{i} | θ)
= - \frac{1}{2} \sum_{i = 1}^{n} \sum_{t = 1}^{T} [log (2 π ψ_{s}) + {(y_{it} - μ_{s} - γ_{s}^{T} b_{it} - \sum_{j = 1}^{q} β_{sj}^{T} B (x_{it, j}) - w_{i 1})}^{2} / ψ_{s}] + \sum_{i = 1}^{n} \sum_{t = 2}^{T} log (p_{itus}) + \sum_{i = 1}^{n} log (p_{i 10 s}) - \frac{1}{2} \sum_{i = 1}^{n} [log (4 π^{2} | Φ |) + w_{i}^{T} Φ^{- 1} w_{i}],

(11)

where

p_{itu 1} = \frac{exp {a_{itu 1}}}{1 + exp {a_{itu 1}}}, p_{ituS} = \prod_{j = 1}^{S - 1} \frac{1}{1 + exp {a_{ituj}}},

p_{itus} = \frac{exp {a_{itus}}}{1 + exp {a_{itus}}} \prod_{j = 1}^{s - 1} \frac{1}{1 + exp {a_{ituj}}}, s = 2, \dots, S - 1,

(12)

p_{i 10 s} = τ_{s}, s = 1, \dots, S,

with a_itus = ζ_us + α^Td_it + w_i2.
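The mapping in (12) from continuation-ratio logits to a row of transition probabilities can be sketched as follows. The function names and the choice of an all-zero covariate vector with w_i2 = 0 are illustrative assumptions; the intercepts are the Table 2 estimates for transitions out of state 1 and serve only as example inputs.

```python
import math


def continuation_logits(zeta_u, alpha, d, w2):
    """Return a_itus = zeta_us + alpha^T d_it + w_i2 for s = 1, ..., S - 1."""
    linear = sum(a * x for a, x in zip(alpha, d))
    return [z + linear + w2 for z in zeta_u]


def transition_row(a):
    """Map logits a[0..S-2] to probabilities p[0..S-1] following (12)."""
    probs = []
    remaining = 1.0  # P(Z_it >= s | Z_{i,t-1} = u); equals 1 when s = 1
    for a_s in a:
        theta = 1.0 / (1.0 + math.exp(-a_s))  # P(Z_it = s | Z_it >= s, u)
        probs.append(remaining * theta)
        remaining *= 1.0 - theta  # mass left for the worse states
    probs.append(remaining)  # the last state S takes the leftover mass
    return probs


# Example inputs: Table 2 intercepts for origin state u = 1, with covariates
# and random effect set to zero (an assumption made only for illustration).
alpha_hat = [0.500, 0.035, -0.342, -0.728, 0.085]
a = continuation_logits([2.477, 2.540, 1.877], alpha_hat, [0.0] * 5, 0.0)
row = transition_row(a)
```

Because each state receives a fraction of the mass not yet assigned to better states, the row sums to one by construction, and no separate normalization step is required.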

The Bayesian estimate of θ is obtained by drawing samples from p(θ|Y), which is intractable because of the existence of latent states and random effects. We instead work with p(θ, Z, W|Y) and use a Gibbs sampler to implement the posterior simulation. Owing to the nonlinearity of the continuation-ratio logit transition model and the existence of the nonparametric functions in the conditional regression, some full conditional distributions, especially those related to the transition model, have complex forms. MCMC methods, such as the forward filtering and backward sampling algorithm (Cappé et al., 2005) and the Metropolis-Hastings (MH) algorithm (Metropolis et al., 1953; Hastings, 1970), are employed to sample from them. The details are provided in the Appendix.
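To make the forward filtering and backward sampling step concrete, the following is a generic sketch for a single subject, conditional on given initial probabilities, transition matrices, and state-wise likelihood values. It is not the Appendix algorithm of the paper; the function name `ffbs` and the argument layout are assumptions chosen for illustration.

```python
import random


def ffbs(init, trans, lik, rng=random):
    """Draw one hidden path Z_1, ..., Z_T by forward filtering, backward sampling.

    init[s]        : P(Z_1 = s)
    trans[t][u][s] : P(Z_{t+2} = s | Z_{t+1} = u), for t = 0, ..., T - 2
                     (time-varying because it may depend on covariates)
    lik[t][s]      : p(y_{t+1} | Z_{t+1} = s)
    States are labeled 0, ..., S - 1.
    """
    T, S = len(lik), len(init)
    # Forward pass: filtered probabilities P(Z_t = s | y_1, ..., y_t).
    filt = []
    for t in range(T):
        if t == 0:
            pred = list(init)
        else:
            pred = [sum(filt[t - 1][u] * trans[t - 1][u][s] for u in range(S))
                    for s in range(S)]
        unnorm = [pred[s] * lik[t][s] for s in range(S)]
        total = sum(unnorm)
        filt.append([v / total for v in unnorm])
    # Backward pass: sample Z_T, then Z_t given the sampled Z_{t+1}.
    path = [0] * T
    path[T - 1] = rng.choices(range(S), weights=filt[T - 1])[0]
    for t in range(T - 2, -1, -1):
        weights = [filt[t][u] * trans[t][u][path[t + 1]] for u in range(S)]
        path[t] = rng.choices(range(S), weights=weights)[0]
    return path
```

In a Gibbs sampler of the kind described above, such a draw would be repeated for each subject at every iteration, with the transition matrices and likelihood values recomputed from the current values of θ and w_i.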

With the use of posterior samples, the hidden states can be estimated as follows:

{\hat{Z}}_{it} = arg max_{s \in {1, \dots, S}} P (Z_{it} = s | y_{i}, θ) \approx arg max_{s \in {1, \dots, S}} \frac{1}{M} \sum_{m = 1}^{M} I (Z_{it}^{(m)} = s),

(13)

where $Z_{it}^{(m)}$ denotes the latent allocation of y_it at the mth iteration, M is the number of retained MCMC iterations, and $\frac{1}{M} \sum_{m = 1}^{M} I (Z_{it}^{(m)} = s)$ is the proportion of those iterations in which y_it is allocated to state s, which approximates the posterior probability that Z_it = s.
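The estimator in (13) amounts to a majority vote over the sampled allocations. A minimal sketch, assuming the draws for one subject are stored as a list of state paths (the name `estimate_states` and the tie-breaking rule are illustrative choices):

```python
from collections import Counter


def estimate_states(draws):
    """Return the most frequently sampled state at each time point.

    draws[m][t] is the state allocated at occasion t in MCMC iteration m.
    Ties are broken in favor of the smallest state label.
    """
    T = len(draws[0])
    estimates = []
    for t in range(T):
        counts = Counter(path[t] for path in draws)
        # max() keeps the first maximal element, so sorting the labels
        # makes the smallest label win when counts are tied.
        estimates.append(max(sorted(counts), key=lambda s: counts[s]))
    return estimates


# Three hypothetical posterior draws of a path observed at T = 3 occasions.
example = estimate_states([[1, 2, 3], [1, 2, 4], [1, 3, 4]])
```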

3.4 Determination of the number of hidden states

In applications of BMSHM2 to the ADNI dataset, the states of the Markov chain can often naturally be interpreted as proxies for the neurodegenerative states, although a one-to-one correspondence between nominal HMM states and the clinical cognitive stages diagnosed by doctors is not required. In this regard, a relevant question is how to determine the number of hidden states in the analysis of ADNI data. We propose the use of a modified deviance information criterion (DIC) to determine the number of hidden states and choose a plausible model for the ADNI data analysis.

The modified DIC, which was developed by Celeux et al. (2006) for model comparison in the presence of incomplete data, is defined as follows:

DIC = - 4 E_{θ, W, Z} {log p (Y, W, Z | θ) | Y} + 2 E_{W, Z} {log p (Y, W, Z | E_{θ} [θ | Y, W, Z]) | Y},

(14)

where log p(Y, W, Z|θ) is the complete-data log-likelihood function shown in (11). The expectations involved in (14) can be approximated using the posterior samples collected through MCMC methods (Celeux et al., 2006). In model selection, the model with the smallest value of DIC is selected.
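The Monte Carlo approximation of (14) and the selection rule can be sketched as below. The inputs are assumed to be complete-data log-likelihood values already evaluated at each retained draw and at the corresponding plug-in estimate of θ; how the plug-in estimates are obtained is not shown. The function names are hypothetical, and the DIC values in the usage lines are those reported for M₁ to M₅ in Section 4.2.

```python
def modified_dic(loglik_at_draws, loglik_at_plugin):
    """Approximate (14) by averaging over retained MCMC iterations.

    loglik_at_draws[m]  : log p(Y, W^(m), Z^(m) | theta^(m))
    loglik_at_plugin[m] : log p(Y, W^(m), Z^(m) | E[theta | Y, W^(m), Z^(m)])
    """
    mean_draws = sum(loglik_at_draws) / len(loglik_at_draws)
    mean_plugin = sum(loglik_at_plugin) / len(loglik_at_plugin)
    return -4.0 * mean_draws + 2.0 * mean_plugin


def select_model(dic_by_model):
    """Return the model label with the smallest DIC value."""
    return min(dic_by_model, key=dic_by_model.get)


# DIC values reported in Section 4.2 for models with 1 to 5 hidden states.
reported = {1: 20175, 2: 1823, 3: 1001, 4: 950, 5: 1615}
best = select_model(reported)  # the four-state model
```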

4 Alzheimer’s Disease Neuroimaging Initiative Data Analysis

4.1 Data description

The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and nonprofit organizations, as a $60 million, 5-year public private partnership. The initial goal of ADNI was to recruit 800 adults, aged 55 to 90, to participate in the research – approximately 200 cognitively normal older individuals to be followed for 3 years, 400 people with MCI to be followed for 3 years, and 200 people with early AD to be followed for 2 years.

We focused on 633 subjects who were all followed up at baseline, 6 months, 12 months, and 24 months. For each subject, we included his/her clinical, genetic, and imaging variables at the four time points. The clinical characteristics include gender (1 = male; 0 = female), age at baseline, and FAQ score. The FAQ score is an assessment of the ability to function independently in daily life and is widely used to monitor the decline of cognitive ability over time. The genetic variables include the APOE gene because mutations in APOE raise the risk of progression from amnestic MCI to AD (Petersen et al., 2005). The APOE SNPs rs429358 and rs7412 were separately genotyped in ADNI-1. These two SNPs together define a 3-allele haplotype, namely, the ε2, ε3, and ε4 variants. Among these variants, APOE-ε4 has been identified as a risk factor for early onset of AD (e.g., Okuizumi et al., 1994). Thus, we considered the presence of APOE-ε4 as a covariate in this analysis. APOE-ε4 is coded as 0, 1, and 2, denoting the number of APOE-ε4 alleles. Furthermore, the logarithm of the ratio of hippocampal volume to whole brain volume was included as a covariate because published reports (Kesslak, Nalcioglu and Cotman, 1991; Jack et al., 1992; Dickerson and Wolk, 2013) revealed that atrophy of the hippocampal formation is a significant diagnostic marker of clinical dementia. Table 1 summarizes the basic characteristics of the aforementioned variables for the sample under consideration. Males account for about 56.2% of the sample. The mean values of patients’ age (in years), adjusted hippocampal volume, and FAQ score are 73.0, −5.0, and 4.1, respectively. In total, 34.6% of patients carry one APOE-ε4 allele, while only 9.8% carry two APOE-ε4 alleles.

Table 1.

Characteristics of the study samples in the ADNI-1 dataset

Mean Age (in years)	73.0(6.9)
Gender (Male percentage)	56.2%
Mean log(hippocampus/whole brain volume)	−5.0(0.2)
Mean FAQ score	4.1(6.5)
One APOE-ε4 allele carriers	34.6%
Two APOE-ε4 alleles carriers	9.8%


The numbers in parentheses are standard deviations.

4.2 Data analysis

The aims of this ADNI data analysis are (I) to identify the hidden states of the neurodegenerative pathology on the basis of 633 patients enrolled in the ADNI-1, (II) to reveal a set of potential covariates that influence the between-states transition, and (III) to investigate the linear and/or functional covariate effects on cognitive decline across the hidden states of the AD progression.

We fitted BMSHM2 with the FAQ score as the response y_it, the clinical and genetic variables, gender and APOE-ε4, as covariates in b_it, and hippocampus and age at baseline as covariates in x_it. Three continuous variables, FAQ score, hippocampus, and age, were standardized prior to analysis. We first determined the number of hidden states. We considered five competing models M_k, k = 1, …, 5, where M_k represents a BMSHM2 with k states. A total of 24 equidistant knots were used to construct cubic P-splines, and second-order random walk penalties were used for the Bayesian P-splines to estimate the unknown smooth functions. Given the lack of prior information, we assigned the hyperparameters in (9) and (10) to reflect vague prior information as follows: μ_s0 = ζ_us0 = τ_s0 = 0, $σ_{μ s 0}^{2} = σ_{ζ 0}^{2} = σ_{τ 0}^{2} = 1$, α̃_s0 = 9, β̃_s0 = 4, ρ₀ = 7, R₀ = 4I₂, α₀ and γ_s0 are vectors with all elements equal to zero, H_α0 = I₅, and Σ_s0 = I₃, where I_r is an r-dimensional identity matrix. We used the random permutation sampler to search for a suitable identifiability constraint to solve the label switching problem. The MCMC algorithm converged within 2,000 iterations for all competing models. We collected a total of 10,000 posterior samples after discarding 2,000 burn-in iterations to calculate DIC. The DIC values corresponding to M₁ to M₅ were 20,175, 1,823, 1,001, 950, and 1,615, respectively. Thus, the four-state model M₄ was selected.

To examine the necessity of the random effects in the proposed model, we considered another competing model M_N: a four-state BMSHM2 without random effects. The DIC value for M_N is 1,122, which exceeds the value of 950 obtained for M₄ and thus suggests an evident advantage of the proposed mixed effect model in the presence of high dependency/heterogeneity in longitudinal observations. Thus, M₄ was selected for the subsequent analysis. The estimation results obtained under M₄ are reported in Table 2 (parametric part) and Figure 2 (nonparametric part).

Table 2.

ADNI-1 data analysis results: parameter estimation results.

Parameters in conditional regression model

State 1			State 2			State 3			State 4

Par.	Est	SE	Par.	Est	SE	Par.	Est	SE	Par.	Est	SE

μ₁	−0.556	0.013	μ₂	0.029	0.114	μ₃	1.108	0.134	μ₄	2.494	0.127
ψ₁	0.013	0.001	ψ₂	0.135	0.019	ψ₃	0.187	0.025	ψ₄	0.397	0.057
γ₁₁	0.016	0.018	γ₂₁	0.163	0.098	γ₃₁	0.102	0.131	γ₄₁	0.443	0.140
γ₁₂	0.079	0.033	γ₂₂	0.260	0.132	γ₃₂	0.307	0.157	γ₄₂	0.407	0.193
γ₁₃	0.011	0.015	γ₂₃	0.038	0.100	γ₃₃	−0.252	0.142	γ₄₃	−0.557	0.137

Parameters in probability transition model

Par.	Est	SE	Par.	Est	SE	Par.	Est	SE	Par.	Est	SE

τ₁	0.938	0.110	τ₂	−0.303	0.373	τ₃	1.132	0.330	α₁	0.500	0.080
α₂	0.035	0.067	α₃	−0.342	0.141	α₄	−0.728	0.230	α₅	0.085	0.162
ζ₁₁	2.477	0.158	ζ₂₁	−1.714	0.342	ζ₃₁	−3.244	0.508	ζ₄₁	−3.252	0.512
ζ₁₂	2.540	0.493	ζ₂₂	1.198	0.411	ζ₃₂	−1.661	0.479	ζ₄₂	−3.114	0.525
ζ₁₃	1.877	0.779	ζ₂₃	2.618	0.445	ζ₃₃	1.298	0.307	ζ₄₃	−1.845	0.432

Covariance matrix of random effects

Par.	Est	SE	Par.	Est	SE	Par.	Est	SE

φ₁₁	0.024	0.002	φ₂₂	0.223	0.051	φ₁₂	−0.007	0.006


Figure 2. ADNI-1 data analysis results: estimates of the unknown smooth functions. The solid curves represent the pointwise mean curves, and the dashed curves represent the 2.5% and 97.5% pointwise quantiles. The red dot-dash line marks y = 0 in each panel to indicate the range over which the effect of each variable is significant.

We have the following observations. First, μ₁, μ₂, μ₃, and μ₄ are ranked in ascending order, indicating that patients in state 1 had the lowest FAQ scores, whereas those in state 4 had the highest. That is, patients’ ability to function independently in daily life steadily deteriorated from state 1 to state 4. According to the existing literature (Kantarci et al., 2013), state 1 to state 4 can be interpreted as cognitively normal (CN), early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI), and AD, respectively.

Second, the functional effect of hippocampus on FAQ exhibits a descending trend as hippocampal volume grows, regardless of state. Specifically, in the CN state, people with larger hippocampal volume tend to have slightly better memory. This finding is in line with the common understanding that the hippocampus plays an important role in the consolidation of information from short-term memory to long-term memory. In the EMCI and LMCI states, the descending trend of the functional effect of hippocampus on FAQ becomes much more pronounced. This result implies that atrophy of the hippocampus increasingly impairs patients’ cognitive ability. Published reports (e.g., Dickerson and Wolk, 2013; Kesslak, Nalcioglu and Cotman, 1991; Jack et al., 1992) also indicate that the volume loss of the hippocampus is strongly associated with clinical dementia. In the AD state, the effect of hippocampal volume on FAQ is not significant because patients’ cognitive ability and memory have already been damaged by serious hippocampal atrophy.

Third, the effect of age on FAQ is nonsignificant in the CN and EMCI states, implying that for those who have normal cognitive function or have only EMCI, age may not be a decisive factor in the reduction of cognitive ability. In contrast, age exhibits nonlinear effects on FAQ in the LMCI and AD states. The age effect is nonsignificant for relatively young patients but becomes significantly positive for elderly patients (say, over 85 years old). The positive effect increases with age and rises even more sharply in the AD state. Such an age effect was also found in previous studies (e.g., Gao et al., 1998; Lindsay et al., 2002).

Fourth, regarding the fixed effects of the discrete variables, a significantly negative effect of gender on FAQ appears in state 4, which means that male AD patients are in a better condition than female AD patients in terms of independent functioning. The two APOE-ε4 alleles (b_it,1 and b_it,2) have significantly positive effects on FAQ in state 4, and b_it,1 also has a slightly positive effect on FAQ in the other states. This finding agrees with published clinical research reporting that the presence of ε4 alleles in the APOE gene is the only genetic variant broadly accepted as increasing risk for late-onset AD dementia (Albert et al., 2011).

Fifth, in the transition model, hippocampus positively affects the probability of transitioning from a state to a better one, indicating that controlling loss of hippocampal volume would be beneficial to prevent the deterioration of cognitive ability. Similar to previous studies (e.g., Lee et al., 2015), APOE-ε4 alleles have negative effects on the probability of transitioning from a state to a better one, reconfirming that APOE-ε4 alleles are important risk factors for the development of AD.

Sixth, the variances of the two random effects are significant, reconfirming the necessity of the proposed random effects. However, the covariance between the two random effects is nonsignificant, showing that some omitted clinical or genetic indicators influenced either the outcomes of the observation process or the probabilities of the transition process but did not affect the two processes simultaneously.

Moreover, we estimated the hidden states of all patients at four time points based on Equation (13). Around 98% of the posterior transition patterns are from a state to a more severe one, which is in line with the common knowledge that AD is irreversible. Table 3 reports patients’ estimated hidden states and their status as diagnosed by doctors. For CN, LMCI, and AD states, a majority of the estimated states are consistent with those diagnosed by doctors. For the EMCI state, however, 835 (67%) of the patients diagnosed with EMCI by doctors were classified into the CN state by our procedure. Such a vague demarcation between CN and EMCI was also found and discussed in the literature (e.g., Petersen, 2004).

Table 3.

ADNI-1 data analysis results: comparison of estimated hidden states and diagnosis status

Diagnosis	Estimated CN	Estimated EMCI	Estimated LMCI	Estimated AD	Total
CN	840	21	1	0	862
EMCI	835	232	148	28	1243
LMCI	9	21	39	21	90
AD	23	49	106	159	337
Total	1707	323	294	208	2532
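
The percentages quoted in the text can be recomputed from the counts in Table 3. The following is a minimal sketch for illustration only; the variable names are assumptions of this example, and the array layout (rows as diagnosed status, columns as estimated hidden state) follows the 835 EMCI-to-CN count described above.

```python
import numpy as np

# Counts from Table 3: rows = diagnosed status, columns = estimated hidden
# state, both ordered as CN, EMCI, LMCI, AD (names are illustrative).
labels = ["CN", "EMCI", "LMCI", "AD"]
counts = np.array([
    [840, 21, 1, 0],
    [835, 232, 148, 28],
    [9, 21, 39, 21],
    [23, 49, 106, 159],
])

row_totals = counts.sum(axis=1)             # patients per diagnosed status
row_rates = counts / row_totals[:, None]    # share of each estimated state
emci_as_cn = row_rates[1, 0]                # diagnosed EMCI, estimated CN
overall_agreement = np.trace(counts) / counts.sum()

for name, rates in zip(labels, row_rates):
    print(name, np.round(rates, 3))
```

The diagonal of `row_rates` gives the per-diagnosis agreement, and `emci_as_cn` reproduces the share of diagnosed EMCI patients assigned to the CN state.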


5 Simulation Study

We conduct Monte Carlo simulations to assess the empirical performance of the proposed method in estimation of the nonparametric functions and model parameters.

5.1 Model setup

We consider a BSMHM2 with four hidden states (S = 4), a continuous response y_it, three discrete covariates (r = 3), and two continuous covariates (q = 2) to mimic the scenario of the ADNI study. For i = 1, …, 700 and t = 1, …, 9, b_it,1, b_it,2, and b_it,3 are all generated from the Bernoulli distribution with success probability 0.5, and x_it,1 and x_it,2 are generated from U(−1, 1) and N(0, 1), respectively. The conditional model is defined as

[y_{it} | Z_{it} = s] = μ_{s} + γ_{s 1} b_{it, 1} + γ_{s 2} b_{it, 2} + γ_{s 3} b_{it, 3} + f_{s 1} (x_{it, 1}) + f_{s 2} (x_{it, 2}) + w_{i 1} + δ_{it},

(15)

where f₁₁(x_it,1) = −1.305 + exp(x_it,1), f₁₂(x_it,2) = 0.55 + sin(1.5x_it,2) + x_it,2, f₂₁(x_it,1) = 0.06 − log((1 + x_it,1)/(1 − x_it,1)), f₂₂(x_it,2) = 0.125 + (x_it,2)³, f₃₁(x_it,1) = −0.05 − 0.8x_it,1, f₃₂(x_it,2) = −0.275 + cos(2x_it,2) + 0.5x_it,2, f₄₁(x_it,1) = −0.13 − (x_it,1)³, and f₄₂(x_it,2) = −0.85 + 1.5x_it,2.

The transition model is defined as

logit (ϑ_{itus}) = ζ_{us} + α_{1} x_{it, 1} + α_{2} x_{it, 2} + α_{3} b_{it, 1} + α_{4} b_{it, 2} + α_{5} b_{it, 3} + w_{i 2} .

(16)

The true population values of the unknown parameters are set as μ = (μ₁, μ₂, μ₃, μ₄) = (−5, −1, 3, 7), τ = (τ₁, τ₂, τ₃, τ₄) = (0.27, 0.27, 0.23, 0.23), ζ₁₁ = ζ₂₁ = ζ₃₁ = ζ₄₁ = −1, ζ₁₂ = ζ₂₂ = ζ₃₂ = ζ₄₂ = −1/2, ζ₁₃ = ζ₂₃ = ζ₃₃ = ζ₄₃ = 1/2, γ₁ = (γ₁₁, γ₁₂, γ₁₃) = (−1, 0.5, 0.5), γ₂ = (γ₂₁, γ₂₂, γ₂₃) = (1, 1, 0.5), γ₃ = (γ₃₁, γ₃₂, γ₃₃) = (−0.5, −0.5, −0.5), γ₄ = (γ₄₁, γ₄₂, γ₄₃) = (0.5, −1, −1), α = (α₁, α₂, α₃, α₄, α₅)^T = (1, −1, −0.5, 0.5, 1), ψ = (ψ₁, ψ₂, ψ₃, ψ₄) = (1, 0.64, 0.36, 0.25), and Φ is a correlation matrix with off diagonal elements −0.5. Based on the above setup, we simulate 100 datasets for analysis.
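
The conditional model (15) with the true values listed above can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the authors' simulation program: the function name `simulate_responses` is hypothetical, ψ_s is treated as the residual variance of δ_it, and the hidden states are supplied as a placeholder array rather than being generated from the transition model (16), so the dependence of the states on the covariates is not reproduced here.

```python
import numpy as np

# True values from Section 5.1 (states are indexed 0..3 in this sketch).
MU = np.array([-5.0, -1.0, 3.0, 7.0])
GAMMA = np.array([[-1.0, 0.5, 0.5],
                  [1.0, 1.0, 0.5],
                  [-0.5, -0.5, -0.5],
                  [0.5, -1.0, -1.0]])
PSI = np.array([1.0, 0.64, 0.36, 0.25])      # assumed: variances of delta_it
PHI = np.array([[1.0, -0.5], [-0.5, 1.0]])   # correlation matrix of w_i

# State-specific smooth functions f_s1 and f_s2.
F1 = [lambda x: -1.305 + np.exp(x),
      lambda x: 0.06 - np.log((1 + x) / (1 - x)),
      lambda x: -0.05 - 0.8 * x,
      lambda x: -0.13 - x ** 3]
F2 = [lambda x: 0.55 + np.sin(1.5 * x) + x,
      lambda x: 0.125 + x ** 3,
      lambda x: -0.275 + np.cos(2 * x) + 0.5 * x,
      lambda x: -0.85 + 1.5 * x]

def simulate_responses(states, rng):
    """Draw covariates, random effects, and y_it given an (n, T) state array."""
    n, T = states.shape
    b = rng.binomial(1, 0.5, size=(n, T, 3))
    x1 = rng.uniform(-1.0, 1.0, size=(n, T))
    x2 = rng.normal(0.0, 1.0, size=(n, T))
    w = rng.multivariate_normal(np.zeros(2), PHI, size=n)
    w1 = np.repeat(w[:, :1], T, axis=1)      # w_i1 repeated over time
    y = np.empty((n, T))
    for s in range(4):
        m = states == s
        mean = MU[s] + b[m] @ GAMMA[s] + F1[s](x1[m]) + F2[s](x2[m]) + w1[m]
        y[m] = mean + rng.normal(0.0, np.sqrt(PSI[s]), size=int(m.sum()))
    return y, b, x1, x2, w

rng = np.random.default_rng(0)
states = rng.integers(0, 4, size=(700, 9))   # placeholder states, not model (16)
y, b, x1, x2, w = simulate_responses(states, rng)
```

Only the second component of `w` (w_i2) would enter the transition model; it is returned so that a state generator based on (16) could reuse the same random effects.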

5.2 Simulation results

We used a total of 24 equidistant knots to construct the cubic B-splines of the covariates. Again, the second-order random walk penalties were used for the Bayesian P-splines to estimate the unknown smooth functions. The prior inputs in (9) and (10) were assigned as follows: μ_s0 = ζ_us0 = τ_s0 = 0, $σ_{μ s 0} = σ_{ζ 0}^{2} = σ_{τ 0}^{2} = 1$, α̃_s0 = 9, β̃_s0 = 4, ρ₀ = 7, R₀ = 4I₂, α₀ and γ_s0 are vectors with all elements being zero, H_α = I₅ and Σ_s0 = I₃, where I_r is an r × r identity matrix. We conducted a few test runs to decide the number of burn-in iterations required for convergence and found that the MCMC algorithm converged within 2,000 iterations. Therefore, we obtained the Bayesian results using 5,000 observations collected after discarding 2,000 burn-in iterations. The performance of the Bayesian estimates is assessed through the bias (BIAS) and the root mean square error (RMSE) between the Bayesian estimates and the true population values of the parameters.
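
A second-order random walk penalty is commonly encoded as a matrix built from second differences of the spline coefficients (Lang and Brezger, 2004). The sketch below shows that standard construction; the function name `rw2_penalty` and the number of basis functions used in the example are assumptions made for illustration, since the basis dimension implied by 24 knots depends on the knot convention.

```python
import numpy as np

def rw2_penalty(num_basis):
    """Penalty matrix K = D'D, where D takes second differences of coefficients."""
    # Row i of D is e_i - 2 e_{i+1} + e_{i+2}, so D has shape (num_basis - 2, num_basis).
    D = np.diff(np.eye(num_basis), n=2, axis=0)
    return D.T @ D

K = rw2_penalty(8)   # 8 basis functions chosen only for illustration
```

The resulting matrix is rank deficient by two: constant and linear coefficient sequences are not penalized, which is why an identifiability constraint on each smooth function is usually imposed alongside it.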

Table 4 summarizes the parameter estimation results based on the 100 datasets. The BIAS and RMSE for most of the parameters are close to zero, indicating a satisfactory performance of Bayesian estimation for the parametric part. Figure 3 depicts the averages of the pointwise posterior means of the nonparametric functions, along with their 2.5% and 97.5% pointwise quantiles. The posterior means of the nonparametric functions are close to their true curves, and the ranges between the two pointwise quantiles are all relatively small, indicating that the estimated functions can correctly recover the true functional relationships between the response and covariates. In this simulation, the average of the correct classification rates calculated through Equation (13) based on the 100 datasets is 91%. Considering the complexity of the proposed model, this result is satisfactory.
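
For a single parameter, BIAS and RMSE over replications can be computed as in the sketch below. The helper name `bias_and_rmse` and the four example estimates are hypothetical and are not taken from the simulation study.

```python
import numpy as np

def bias_and_rmse(estimates, truth):
    """Bias and root mean square error of replicated estimates of one parameter."""
    errors = np.asarray(estimates, dtype=float) - truth
    return errors.mean(axis=0), np.sqrt((errors ** 2).mean(axis=0))

# Hypothetical estimates of a parameter whose true value is -5.
bias, rmse = bias_and_rmse([-5.1, -4.9, -5.0, -4.8], truth=-5.0)
```

Passing a two-dimensional array with one row per replication and a vector of true values returns the bias and RMSE for every parameter at once.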

Table 4.

Bayesian estimates of the parameters in the simulation study.

Parameters in conditional regression model

State 1			State 2			State 3			State 4

Par.	Bias	RMSE	Par.	Bias	RMSE	Par.	Bias	RMSE	Par.	Bias	RMSE

μ₁	0.010	0.098	μ₂	0.024	0.115	μ₃	−0.046	0.195	μ₄	−0.006	0.106
ψ₁	−0.006	0.042	ψ₂	0.011	0.047	ψ₃	0.003	0.044	ψ₄	0.016	0.028
γ₁₁	−0.002	0.105	γ₂₁	−0.045	0.188	γ₃₁	0.025	0.147	γ₄₁	−0.020	0.112
γ₁₂	0.002	0.091	γ₂₂	−0.022	0.126	γ₃₂	0.030	0.168	γ₄₂	0.018	0.089
γ₁₃	−0.013	0.090	γ₂₃	−0.013	0.098	γ₃₃	0.026	0.126	γ₄₃	0.002	0.093

Parameters in probability transition model

Par.	Bias	RMSE	Par.	Bias	RMSE	Par.	Bias	RMSE	Par.	Bias	RMSE

τ₁	0.008	0.137	τ₂	−0.030	0.187	τ₃	−0.048	0.236	α₁	0.007	0.068
α₂	0.010	0.064	α₃	−0.024	0.103	α₄	−0.011	0.122	α₅	−0.029	0.112
ζ₁₁	0.040	0.125	ζ₂₁	0.042	0.132	ζ₃₁	0.034	0.120	ζ₄₁	0.035	0.131
ζ₁₂	0.050	0.168	ζ₂₂	0.031	0.156	ζ₃₂	0.037	0.162	ζ₄₂	0.034	0.167
ζ₁₃	−0.034	0.197	ζ₂₃	−0.032	0.195	ζ₃₃	−0.027	0.186	ζ₄₃	−0.032	0.197

Covariance matrix of random effects

Par.	Bias	RMSE	Par.	Bias	RMSE	Par.	Bias	RMSE

φ₁₁	0.004	0.066	φ₂₂	0.017	0.161	φ₁₂	0.001	0.067


Estimates of the unknown smooth functions in the simulation study: The solid curves represent the true curves, and the dashed curves represent the estimated posterior means and the 2.5%- and 97.5%- pointwise quantiles based on 100 replications, respectively.

To assess the sensitivity of the Bayesian estimates to the prior inputs, we perturbed the prior inputs as follows: μ_s0 = ζ_us0 = τ_s0 = 2, $σ_{μ s 0} = σ_{ζ 0}^{2} = σ_{τ 0}^{2} = 2$, α̃_s0 = 3, β̃_s0 = 2, ρ₀ = 4, R₀ = 2I₂, α₀ and γ_s0 are vectors with all elements being two, H_α = 2I₅ and Σ_s0 = 2I₃. The obtained results are similar and are not reported.

6 Discussion

The BSMHM2 was developed and successfully applied to the ADNI data analysis. Although HMMs and their variants have already been extensively used for longitudinal data analysis, a majority of applications restrict analysis to a parametric framework. Moreover, examples of using HMMs to classify and characterize the neurodegenerative states of AD pathology are not prevalent, especially in a semiparametric context. In this study, we extended parametric HMMs to accommodate the functional effects of hippocampus and age on cognitive decline across four neurodegenerative states, namely, CN, EMCI, LMCI, and AD. The functional effect of hippocampus on FAQ exhibited a descending trend as hippocampal volume increases, regardless of state. This descending trend was more pronounced for the EMCI and LMCI states than for the CN and AD states, implying that hippocampal atrophy increasingly impaired patients’ cognitive ability, especially during the progression from EMCI to LMCI. In contrast, age affected cognitive function mainly in the LMCI and AD states. Elderly LMCI or AD patients suffered from more severe neurodegeneration than relatively young patients.

Our model incorporates correlated random effects to account for individual and/or contextual differences in the progression of cognitive decline and in between-state transition. Large inter-individual variability is a prominent feature of the ADNI dataset and many other longitudinal datasets. As we demonstrated in the ADNI study, accounting for such differences can dramatically improve model fit, as evidenced by an apparent improvement in DIC value between models with and without random effects. In addition, the correlation between the random effects enables the model to accommodate situations in which some omitted covariates influence both the state-dependent observation process and the hidden-state transition process. Another appealing feature of this study is that it implements a full Bayesian analysis along with efficient MCMC methods. The sampling-based Bayesian approach is not only applicable to the current parameter-rich BSMHM2 but also has the potential to address highly complex problems that pose huge challenges for ML-based procedures.

The present study can be extended in several directions. First, we considered nonparametric modeling only in the conditional model. Generalizing the parametric transition model to a semiparametric or nonparametric one can further enhance model flexibility and analytic power. However, the statistical analysis of such comprehensive models can be challenging because the computational burden and sample size often limit the complexity of candidate models. Thus, the feasibility of this extension requires further investigation. Second, in the application to the ADNI dataset, a more comprehensive characterization of cognitive function could be obtained by grouping the FAQ, Alzheimer’s Disease Assessment Scale, and Mini-Mental State Examination into an integrated latent construct through multivariate techniques such as factor analysis (e.g., Song et al., 2016). Finally, this study did not consider missing data. Given that missingness is very common in longitudinal settings, accommodating missing responses and/or missing covariates in the context of BSMHM2s is of both scientific interest and practical value. These advances require substantial further investigation.

Appendix

Full Conditional Distributions

(I) Full conditional distributions of Z_it

We follow Baum et al. (1970) to adopt a recursive method to sample Z_it from the full conditional distribution efficiently as follows:

Let y_i = {y_i1, …, y_iT } and D_i = {d_i1, …, d_iT}, then we have

p (Z_{it} | \cdot) \propto p (y_{i}, D_{i}, w_{i}, Z_{it} | θ)

(A1)

= p (y_{i 1}, \dots, y_{it}, d_{i 1}, \dots, d_{it}, w_{i}, Z_{it} | θ) \times p (y_{it + 1}, \dots, y_{iT}, d_{it + 1}, \dots, d_{iT} | w_{i}, Z_{it}, θ)

(A2)

≐ q_{it} (y_{i}, D_{i}, w_{i}, Z_{it} | θ) \times {\bar{q}}_{it} (y_{i}, D_{i} | w_{i}, Z_{it}, θ) .

(A3)

By the definition in (A2) and (A3), the forward quantity at t = 1 is q_i1(y_i, D_i, w_i, Z_i1|θ) = p(y_i1, d_i1, w_i, Z_i1|θ), and for t = 2, …, T it is calculated recursively as follows:

q_{it} (y_{i}, D_{i}, w_{i}, Z_{it} | θ) = p (y_{i 1}, \dots, y_{it}, d_{i 1}, \dots, d_{it}, w_{i}, Z_{it} | θ) = \sum_{u = 1}^{S} p (y_{i 1}, \dots, y_{it}, d_{i 1}, \dots, d_{it}, w_{i}, Z_{it}, Z_{i, t - 1} = u | θ) = \sum_{u = 1}^{S} [p (y_{i 1}, \dots, y_{i, t - 1}, d_{i 1}, \dots, d_{i, t - 1}, w_{i}, Z_{i, t - 1} = u | θ) \times p (Z_{it} | Z_{i, t - 1} = u, w_{i 2}, θ) \times p (y_{it}, d_{it} | Z_{it}, w_{i 1}, θ)] = \sum_{u = 1}^{S} [q_{i, t - 1} (y_{i}, D_{i}, w_{i}, Z_{i, t - 1} = u | θ) \times p (Z_{it} | Z_{i, t - 1} = u, w_{i 2}, θ) \times p (y_{it}, d_{it} | Z_{it}, w_{i 1}, θ)],

(A4)

where p(Z_it|Z_i,t−1 = u, w_i2, θ), p(y_it, d_it|Z_it, w_i1, θ) and p(w_i|θ) can be calculated through Equation (11).

Similarly, we initialize q̄_iT (y_i, D_i|w_i, Z_iT, θ) = 1 and calculate q̄_it(y_i, D_i|w_i, Z_it, θ) for t = T − 1, …, 1 as follows:

{\bar{q}}_{it} (y_{i}, D_{i} | w_{i}, Z_{it}, θ) = p (y_{it + 1}, \dots, y_{iT}, d_{it + 1}, \dots, d_{iT} | w_{i}, Z_{it}, θ) = \sum_{u = 1}^{S} p (y_{it + 1}, \dots, y_{iT}, d_{it + 1}, \dots, d_{iT}, Z_{it + 1} = u | w_{i}, Z_{it}, θ)

(A5)

= \sum_{u = 1}^{S} [p (y_{it + 2}, \dots, y_{iT}, d_{it + 2}, \dots, d_{iT} | Z_{it + 1} = u, w_{i}, θ) \times p (Z_{it + 1} = u | Z_{it}, w_{i 2}, θ) \times p (y_{it + 1}, d_{it + 1} | Z_{it + 1} = u, w_{i 1}, θ)]

(A6)

= \sum_{u = 1}^{S} [{\bar{q}}_{i, t + 1} (y_{i}, D_{i} | w_{i}, Z_{it + 1} = u, θ) \times p (Z_{it + 1} = u | Z_{it}, w_{i 2}, θ) \times p (y_{it + 1}, d_{it + 1} | Z_{it + 1} = u, w_{i 1}, θ)] .

Thus, Z_it can be generated directly from (A1) once all q_it(·) and q̄_it(·) defined in (A4) and (A5) have been calculated.
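
The forward and backward recursions in (A4) to (A6) can be sketched for a single subject as follows. This is a generic illustration under stated assumptions rather than the authors' implementation: `emis[t, s]` stands for p(y_it, d_it|Z_it = s, w_i1, θ), `trans[t, u, s]` stands for the probability of moving from state u at time index t to state s at the next time point given w_i2 and θ, and `init` stands for the initial state distribution, all of which would come from Equations (11) and (12). The per-step rescaling is a numerical device added here, and the factor p(w_i|θ) is omitted because it is constant across states and cancels on normalization.

```python
import numpy as np

def forward_backward(init, trans, emis):
    """Return p(Z_t = s | data) for one subject, proportional to q_t * qbar_t.

    init:  (S,)         initial state probabilities
    trans: (T-1, S, S)  trans[t, u, s] = P(Z_{t+1} = s | Z_t = u)
    emis:  (T, S)       emis[t, s] = likelihood of the data at time t in state s
    """
    T, S = emis.shape
    q = np.empty((T, S))
    qbar = np.ones((T, S))                 # backward quantity equals 1 at t = T

    # Forward pass: q_t(s) = sum_u q_{t-1}(u) P(s | u) * emission_t(s).
    q[0] = init * emis[0]
    q[0] /= q[0].sum()                     # rescale to avoid underflow
    for t in range(1, T):
        q[t] = (q[t - 1] @ trans[t - 1]) * emis[t]
        q[t] /= q[t].sum()

    # Backward pass: qbar_t(s) = sum_u P(u | s) * emission_{t+1}(u) * qbar_{t+1}(u).
    for t in range(T - 2, -1, -1):
        qbar[t] = trans[t] @ (qbar[t + 1] * emis[t + 1])
        qbar[t] /= qbar[t].sum()

    post = q * qbar
    return post / post.sum(axis=1, keepdims=True)

# Small random example with S = 3 states and T = 4 time points.
rng = np.random.default_rng(0)
T, S = 4, 3
init = rng.dirichlet(np.ones(S))
trans = rng.dirichlet(np.ones(S), size=(T - 1, S))
emis = rng.uniform(0.1, 1.0, size=(T, S))
post = forward_backward(init, trans, emis)
```

Each row of `post` sums to one, so a state at time t can be drawn with `rng.choice(S, p=post[t])`.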

(II) Full conditional distributions of w_i

p (w_{i} | \cdot) \propto p (y_{i}, D_{i} | w_{i}, Z_{i 1}, \dots, Z_{iT}, θ) \times p (Z_{i 1}, \dots, Z_{iT} | w_{i}, θ) \times p (w_{i} | θ) \propto exp {- \frac{1}{2} \sum_{t = 1}^{T} {(y_{it} - μ_{s} - γ_{s}^{T} b_{it} - \sum_{j = 1}^{q} β_{sj}^{T} B_{it, j} - w_{i 1})}^{2} I (Z_{it} = s) / ψ_{s} + \sum_{t = 2}^{T} log (p_{itus}) I (Z_{i, t - 1} = u, Z_{it} = s) - \frac{1}{2} w_{i}^{T} Φ^{- 1} w_{i}},

(A7)

where p_itu0 and p_itus can be calculated via Equation (12).

(III) Full conditional distributions of μ_s, γ_s, ψ_s, and Φ

\begin{matrix} [μ_{s} | \cdot] ~ N [μ_{s}^{*}, σ_{μ s}^{*}], & [γ_{s} | \cdot] ~ N [γ_{s}^{*}, Σ_{s}^{*}], \\ [ψ_{s}^{- 1} | \cdot] ~ Gamma [{\tilde{α}}_{s}^{*}, {\tilde{β}}_{s}^{*}], & [Φ^{- 1} | \cdot] ~ Wishart [R^{*}, N + ρ_{0}], \end{matrix}

(A8)

where R* = (R₀ + WW^T)⁻¹, $W = {(w_{1}^{T}, \dots, w_{N}^{T})}^{T}$ and

σ_{μ s}^{*} = {(n_{s} ψ_{s}^{- 1} + σ_{μ s 0}^{- 1})}^{- 1}, Σ_{s}^{*} = {(ψ_{s}^{- 1} \sum_{i = 1}^{N} \sum_{t = 1}^{T} I (Z_{it} = s) b_{it} b_{it}^{T} + Σ_{s 0}^{- 1})}^{- 1},

μ_{s}^{*} = σ_{μ s}^{*} [ψ_{s}^{- 1} \sum_{i = 1}^{n} \sum_{t = 1}^{T} I (Z_{it} = s) (y_{it} - γ_{s}^{T} b_{it} - \sum_{j = 1}^{q} β_{sj}^{T} B_{it, j} - w_{i 1}) + σ_{μ s 0}^{- 1} μ_{s 0}],

γ_{s}^{*} = Σ_{s}^{*} [ψ_{s}^{- 1} \sum_{i = 1}^{n} \sum_{t = 1}^{T} I (Z_{it} = s) b_{it} (y_{it} - μ_{s} - \sum_{j = 1}^{q} β_{sj}^{T} B_{it, j} - w_{i 1}) + Σ_{s 0}^{- 1} γ_{s 0}],

{\tilde{α}}_{s}^{*} = n_{s} / 2 + {\tilde{α}}_{s 0}, {\tilde{β}}_{s}^{*} = {\tilde{β}}_{s 0} + \frac{1}{2} [\sum_{i = 1}^{n} \sum_{t = 1}^{T} I (Z_{it} = s) {(y_{it} - μ_{s} - γ_{s}^{T} b_{it} - \sum_{j = 1}^{q} β_{sj}^{T} B_{it, j} - w_{i 1})}^{2}] .
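
As an illustration of the conjugate updates in (A8), the draw of μ_s can be sketched as below. The function name `draw_mu_s` and the toy inputs are assumptions made for this example; `resid_s` denotes the partial residuals y_it − γ_s^T b_it − Σ_j β_sj^T B_it,j − w_i1 over the observations with Z_it = s, and σ*_μs and σ_μs0 are treated as variances, as the formulas above imply.

```python
import numpy as np

def draw_mu_s(resid_s, psi_s, mu_s0, sigma_mu_s0, rng):
    """Draw mu_s from N[mu*, sigma*]; also return the two posterior moments."""
    n_s = resid_s.size
    var_star = 1.0 / (n_s / psi_s + 1.0 / sigma_mu_s0)          # sigma*_{mu s}
    mean_star = var_star * (resid_s.sum() / psi_s + mu_s0 / sigma_mu_s0)
    draw = rng.normal(mean_star, np.sqrt(var_star))
    return draw, mean_star, var_star

# Toy inputs chosen only to make the arithmetic easy to follow.
rng = np.random.default_rng(0)
draw, mean_star, var_star = draw_mu_s(
    np.array([1.0, 2.0, 3.0]), psi_s=1.0, mu_s0=0.0, sigma_mu_s0=1.0, rng=rng
)
```

With three residuals summing to 6, unit residual variance, and a N[0, 1] prior, the posterior variance is 1/(3 + 1) and the posterior mean is 6/4, which shows how the prior shrinks the sample mean of 2 toward zero.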

(IV) Full conditional distributions of β_sj and θ_sj

[β_{sj} | \cdot] ~ N [β_{sj}^{*}, H_{sj}] I (1_{n_{s}}^{T} B_{sj} β_{sj} = 0),

(A9)

where $H_{sj} = {(ψ_{s}^{- 1} B_{sj}^{T} B_{sj} + ν_{sj}^{- 1} K_{sj})}^{- 1}, β_{sj}^{*} = ψ_{s}^{- 1} H_{sj} B_{sj}^{T} y_{s}^{*}$ , and $y_{s}^{*} = {y_{it, s}^{*}}$ is an n_s × 1 vector with

y_{it, s}^{*} = y_{it} - μ_{s} - γ_{s}^{T} b_{it} - \sum_{l \neq j, l = 1}^{q} β_{sl}^{T} B_{it, l} - w_{i 1}, for Z_{it} = s .

According to Panagiotelis and Smith (2008), sampling an observation β_sj from truncated normal (A9) is equivalent to sampling an observation $β_{sj}^{(temp)}$ from $N [β_{sj}^{*}, H_{sj}]$ and then transforming $β_{sj}^{(temp)}$ to β_sj by

β_{sj} = β_{sj}^{(temp)} - H_{sj} Q_{sj}^{T} {(Q_{sj} H_{sj} Q_{sj}^{T})}^{- 1} Q_{sj} β_{sj}^{(temp)},

(A10)

where $Q_{sj} = 1_{n_{s}}^{T} B_{sj}$ . Moreover,

[ν_{sj}^{- 1} | \cdot] ~ Gamma [ν_{1} + \frac{L}{2}, ν_{2} + \frac{1}{2} β_{sj}^{T} K_{sj} β_{sj}] .

(A11)
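
The transformation in (A10) can be sketched as follows: an unconstrained draw from N[β*_sj, H_sj] is adjusted so that the linear constraint Q_sj β_sj = 0 holds exactly. The function name `sample_constrained` and the randomly generated basis matrix and covariance are hypothetical stand-ins for B_sj and H_sj.

```python
import numpy as np

def sample_constrained(beta_star, H, Q, rng):
    """Draw beta from N[beta_star, H], then apply (A10) so that Q @ beta = 0."""
    beta_temp = rng.multivariate_normal(beta_star, H)
    HQt = H @ Q.T                                        # shape (L, 1)
    # Correction term H Q' (Q H Q')^{-1} Q beta_temp from (A10).
    correction = HQt @ np.linalg.solve(Q @ HQt, Q @ beta_temp)
    return beta_temp - correction

rng = np.random.default_rng(0)
B = rng.normal(size=(50, 6))      # hypothetical basis matrix (n_s = 50, L = 6)
A = rng.normal(size=(6, 6))
H = A @ A.T + 6.0 * np.eye(6)     # hypothetical positive definite covariance
Q = np.ones((1, 50)) @ B          # Q = 1' B, shape (1, 6)
beta = sample_constrained(np.zeros(6), H, Q, rng)
```

After the adjustment, the fitted values `B @ beta` sum to zero, which is the centering constraint written as the indicator in (A9).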

(V) Full conditional distributions of τ_s, ζ_us, and α

p (τ_{s} | \cdot) \propto exp {\sum_{u = s}^{S} \sum_{i = 1}^{n} log (p_{i 10 u}) \times I (Z_{i 1} = u) - \frac{{(τ_{s} - τ_{s 0})}^{2}}{2 σ_{τ 0}^{2}}},

p (ζ_{us} | \cdot) \propto exp {\sum_{υ = s}^{S} \sum_{i = 1}^{n} \sum_{t = 2}^{T} log (p_{itu υ}) \times I (Z_{it} = υ, Z_{i, t - 1} = u) - \frac{{(ζ_{us} - ζ_{us 0})}^{2}}{2 σ_{ζ 0}^{2}}},

(A12)

p (α | \cdot) \propto exp {\sum_{i = 1}^{n} \sum_{t = 2}^{T} log (p_{itus}) \times I (Z_{it} = s, Z_{i, t - 1} = u) - \frac{1}{2} {(α - α_{0})}^{T} H_{α 0}^{- 1} (α - α_{0})},

where p_itu0 and p_itus can be calculated via Equation (12).

Contributor Information

Kai Kang, Department of Statistics, Chinese University of Hong Kong, Hong Kong, China.

Jingheng Cai, Department of Statistics, Sun Yat-sen University, Guangzhou, China.

Xinyuan Song, Department of Statistics, Chinese University of Hong Kong, Hong Kong, China.

Hongtu Zhu, MD Anderson Cancer Center, University of Texas, Houston, USA.

References

Albert PS, McFarland HF, Smith ME, Frank JA. Time series for modelling counts from a relapsing-remitting disease: Application to modelling disease activity in multiple sclerosis. Statistics in Medicine. 1994;13:453–466. doi: 10.1002/sim.4780130509.
Albert MS, DeKosky ST, Dickson D, Dubois B, Feldman HH, Fox NC, Gamst A, Holtzman DM, Jagust WJ, Petersen RC, Snyder PJ, Carrillo MC, Thies B, Phelps CH. The diagnosis of mild cognitive impairment due to Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s & Dementia. 2011;7:270–279. doi: 10.1016/j.jalz.2011.03.008.
Altman RM, Petkau AJ. Application of hidden Markov models to multiple sclerosis lesion count data. Statistics in Medicine. 2005;24:2335–2344. doi: 10.1002/sim.2108.
Altman RM. Mixed hidden Markov models. Journal of the American Statistical Association. 2007;102:201–210.
Ansari A, Jedidi K. Bayesian factor analysis for multilevel binary observations. Psychometrika. 2000;65:475–498.
Bartolucci F, Farcomeni A. A multivariate extension of the dynamic logit model for longitudinal data based on a latent Markov heterogeneity structure. Journal of the American Statistical Association. 2009;104:816–831.
Bartolucci F, Farcomeni A, Pennoni F. Latent Markov Models for Longitudinal Data. Florida: Chapman & Hall/CRC, Taylor and Francis Group; 2013.
Baum LE, Petrie T, Soules G, Weiss N. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics. 1970;41:164–171.
Behseta S, Kass RE, Wallstrom GL. Hierarchical models for assessing variability among functions. Biometrika. 2005;92:419–434.
Berry SM, Carroll RJ, Ruppert D. Bayesian smoothing and regression splines for measurement error problems. Journal of the American Statistical Association. 2002;97:160–169.
Biller C, Fahrmeir L. Bayesian varying-coefficient models using adaptive regression splines. Statistical Modelling. 2001;1:195–211.
Cappé O, Moulines E, Rydén T. Inference in Hidden Markov Models. Springer; New York: 2005.
Celeux G, Forbes F, Robert CP, Titterington DM. Deviance information criteria for missing data models. Bayesian Analysis. 2006;1:651–674.
Chi YY, Ibrahim JG. Joint models for multivariate longitudinal and multivariate survival data. Biometrics. 2006;62:432–445.
Chow SM, Grimm KJ, Filteau G, Dolan CV, McArdle JJ. Regime-switching bivariate dual change score model. Multivariate Behavioral Research. 2013;48(4):463–502. doi: 10.1080/00273171.2013.787870.
De Boor C. A Practical Guide to Splines. Revised edition. Springer-Verlag; New York: 2001.
Dickerson BC, Wolk D. Biomarker-based prediction of progression in MCI: comparison of AD-signature and hippocampal volume with spinal fluid amyloid-β and tau. Frontiers in Aging Neuroscience. 2013;5:55. doi: 10.3389/fnagi.2013.00055.
DiMatteo I, Genovese CR, Kass RE. Bayesian curve fitting with free-knot splines. Biometrika. 2001;88:1055–1071.
Dunson DB. Bayesian latent variable models for clustered mixed outcomes. Journal of the Royal Statistical Society: Series B. 2000;62:355–366.
Dunson DB. Dynamic latent trait models for multidimensional longitudinal data. Journal of the American Statistical Association. 2003;98:555–563.
Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science. 1996;11:89–121.
Fahrmeir L, Raach A. A Bayesian semiparametric latent variable model for mixed responses. Psychometrika. 2007;72:327–346.
Frühwirth-Schnatter S. Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. Journal of the American Statistical Association. 2001;96:194–209.
Gao S, Hendrie HC, Hall KS, Hui S. The relationships between age, sex, and the incidence of dementia and Alzheimer disease: a meta-analysis. Archives of General Psychiatry. 1998;55:809–815. doi: 10.1001/archpsyc.55.9.809.
Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57:97–109.
Ip E, Zhang Q, Rejeski J, Harris T, Kritchevsky S. Partially ordered mixed hidden Markov model for the disablement process of older adults. Journal of the American Statistical Association. 2013;108:370–384. doi: 10.1080/01621459.2013.770307.
Jack CR, Petersen RC, O’Brien PC, Tangalos EG. MR-based hippocampal volumetry in the diagnosis of Alzheimer’s disease. Neurology. 1992;42:183–183. doi: 10.1212/wnl.42.1.183.
Kantarci K, Gunter JL, Tosakulwong N, Weigand SD, Senjem MS, Petersen RC, Aisen PS, Jagust WJ, Weiner MW, Jack CR; Alzheimer’s Disease Neuroimaging Initiative. Focal hemosiderin deposits and β-amyloid load in the ADNI cohort. Alzheimer’s & Dementia. 2013;9:S116–S123. doi: 10.1016/j.jalz.2012.10.011.
Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association. 1995;90:773–795.
Kesslak JP, Nalcioglu O, Cotman CW. Quantification of magnetic resonance scans for hippocampal and parahippocampal atrophy in Alzheimer’s disease. Neurology. 1991;41:51–51. doi: 10.1212/wnl.41.1.51.
Lang S, Brezger A. Bayesian P-splines. Journal of Computational and Graphical Statistics. 2004;13:183–212.
Lee E, Zhu H, Kong D, Wang Y, Giovanello KS, Ibrahim JG. BFLCRM: A Bayesian functional linear Cox regression model for predicting time to conversion to Alzheimer’s disease. The Annals of Applied Statistics. 2015;9:2153–2178. doi: 10.1214/15-AOAS879.
Lee SY, Song XY. Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes. Multivariate Behavioral Research. 2004;39:653–686. doi: 10.1207/s15327906mbr3904_4.
Lindsay J, Laurin D, Verreault R, Hébert R, Helliwell B, Hill GB, McDowell I. Risk factors for Alzheimer’s disease: a prospective analysis from the Canadian Study of Health and Aging. American Journal of Epidemiology. 2002;156:445–453. doi: 10.1093/aje/kwf074.
Maruotti A. Mixed hidden Markov models for longitudinal data: an overview. International Statistical Review. 2011;79:427–454.
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. The Journal of Chemical Physics. 1953;21:1087–1091.
Myers SC, Majluf NS. Corporate financing and investment decisions when firms have information that investors do not have. Journal of Financial Economics. 1984;13:187–221.
Okuizumi K, Onodera O, Tanaka H, Kobayashi H, Tsuji S, Takahashi H, Oyanagi K, Seki K, Tanaka M, Naruse S, Miyatake T, Mizusawa H, Kanazawa I. ApoE-ε4 and early-onset Alzheimer’s. Nature Genetics. 1994;7:10–11. doi: 10.1038/ng0594-10b.
Panagiotelis A, Smith M. Bayesian identification, selection and estimation of semiparametric functions in high-dimensional additive models. Journal of Econometrics. 2008;143:291–316.
Petersen RC. Mild cognitive impairment as a diagnostic entity. Journal of Internal Medicine. 2004;256(3):183–194. doi: 10.1111/j.1365-2796.2004.01388.x.
Petersen RC, Thomas RG, Grundman M, Bennett D, Doody R, Ferris S, Galasko D, Jin S, Kaye J, Levey A, Pfeiffer E, Sano M, Dyck CH, Thal LJ. Vitamin E and donepezil for the treatment of mild cognitive impairment. New England Journal of Medicine. 2005;352:2379–2388. doi: 10.1056/NEJMoa050151.
Richardson S, Green PJ. On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society, Series B. 1997;59:731–792.
Schmittmann VD, Dolan CV, van der Maas HL, Neale MC. Discrete latent Markov models for normally distributed response data. Multivariate Behavioral Research. 2005;40:461–488. doi: 10.1207/s15327906mbr4004_4.
Scott SL, James GM, Sugar CA. Hidden Markov models for longitudinal comparisons. Journal of the American Statistical Association. 2005;100:359–369.
Song XY, Lu ZH. Semiparametric latent variable models with Bayesian P-splines. Journal of Computational and Graphical Statistics. 2010;19:590–608.
Song XY, Xia YM, Zhu HT. Hidden Markov latent variable models with multivariate longitudinal data. Biometrics. 2016;73:313–323. doi: 10.1111/biom.12536.
Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B. 2002;64:583–639.
Spiegelhalter DJ, Thomas A, Best NG, Lunn D. WinBUGS User Manual. Version 1.4. MRC Biostatistics Unit; Cambridge, England: 2003.
Tanner MA, Wong WH. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association. 1987;82:528–550.
Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53:330–339.
Vermunt JK, Langeheine R, Böckenholt U. Latent Markov models with time-constant and time-varying covariates. Journal of Educational and Behavioral Statistics. 1999;24:178–205.

[R1] Albert PS, McFarland HF, Smith ME, Frank JA. Time series for modelling counts from a relapsingremitting disease: Application to modelling disease activity in multiple sclerosis. Statistics in Medicine. 1994;13:453–466. doi: 10.1002/sim.4780130509. [DOI] [PubMed] [Google Scholar]

[R2] Albert MS, DeKosky ST, Dickson D, Dubois B, Feldman HH, Fox NC, Gamst A, Holtzman DM, Jagust WJ, Petersen RC, Snyder PJ, Carrillo MC, Thies B, Phelps CH. The diagnosis of mild cognitive impairment due to Alzheimers disease: Recommendations from the National Institute on Aging-Alzheimers Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s & dementia. 2011;7:270–279. doi: 10.1016/j.jalz.2011.03.008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] Altman RM, Petkau AJ. Application of hidden Markov models to multiple sclerosis lesion count data. Statistics in Medicine. 2005;24:2335–2344. doi: 10.1002/sim.2108. [DOI] [PubMed] [Google Scholar]

[R4] Altman RM. Mixed hidden Markov models. Journal of the American Statistical Association. 2007;102:201–210. [Google Scholar]

[R5] Ansari A, Jedidi K. Bayesian factor analysis for multilevel binary observations. Psychometrika. 2000;65:475–498. [Google Scholar]

[R6] Bartolucci F, Farcomeni A. A multivariate extension of the dynamic logit model for longitudinal data based on a latent Markov heterogeneity structure. Journal of the American Statistical Association. 2009;104:816–831. [Google Scholar]

[R7] Bartolucci F, Farcomeni A, Pennoni F. Latent Markov Models for Longitudinal Data. Florida: Chapman & Hall/CRC, Taylor and Francis Group; 2013. [Google Scholar]

[R8] Baum LE, Petrie T, Soules G, Weiss N. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics. 1970;41:164–171. [Google Scholar]

[R9] Behseta S, Kass RE, Wallstrom GL. Hierarchical models for assessing variability among functions. Biometrika. 2005;92:419–434. [Google Scholar]

[R10] Berry SM, Carroll RJ, Ruppert D. Bayesian smoothing and regression splines for measurement error problems. Journal of the American Statistical Association. 2002;97:160–169. [Google Scholar]

[R11] Biller C, Fahrmeir L. Bayesian varying-coefficient models using adaptive regression splines. Statistical Modelling. 2001;1:195–211. [Google Scholar]

[R12] Cappé O, Moulines E, Rydén T. Inference in Hidden Markov Models. Springer; New York: 2005. [Google Scholar]

[R13] Celeux G, Forbes F, Robert CP, Titterington DM. Deviance information criteria for missing data models. Bayesian Analysis. 2006;1:651–674. [Google Scholar]

[R14] Chi YY, Ibrahim JG. Regime-switching bivariate dual change score model. Biometrics. 2006;62:432–445. [Google Scholar]

[R15] Chow SM, Grimm KJ, Filteau G, Dolan CV, McArdle JJ. Regimes-witching bivariate dual change score model. Multivariate Behavioral Research. 2013;48(4):463–502. doi: 10.1080/00273171.2013.787870. [DOI] [PubMed] [Google Scholar]

[R16] De Boor C. A Practical Guide to Splines. Revised edition. Springer-Verlag; New York: 2001.

[R17] Dickerson BC, Wolk D. Biomarker-based prediction of progression in MCI: comparison of AD-signature and hippocampal volume with spinal fluid amyloid-β and tau. Frontiers in Aging Neuroscience. 2013;5:55. doi: 10.3389/fnagi.2013.00055.

[R18] DiMatteo I, Genovese CR, Kass RE. Bayesian curve fitting with free-knot splines. Biometrika. 2001;88:1055–1071.

[R19] Dunson DB. Bayesian latent variable models for clustered mixed outcomes. Journal of the Royal Statistical Society: Series B. 2000;62:355–366.

[R20] Dunson DB. Dynamic latent trait models for multidimensional longitudinal data. Journal of the American Statistical Association. 2003;98:555–563.

[R21] Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science. 1996;11:89–121.

[R22] Fahrmeir L, Raach A. A Bayesian semiparametric latent variable model for mixed responses. Psychometrika. 2007;72:327–346.

[R23] Frühwirth-Schnatter S. Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. Journal of the American Statistical Association. 2001;96:194–209.

[R24] Gao S, Hendrie HC, Hall KS, Hui S. The relationships between age, sex, and the incidence of dementia and Alzheimer disease: a meta-analysis. Archives of General Psychiatry. 1998;55:809–815. doi: 10.1001/archpsyc.55.9.809.

[R25] Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57:97–109.

[R26] Ip E, Zhang Q, Rejeski J, Harris T, Kritchevsky S. Partially ordered mixed hidden Markov model for the disablement process of older adults. Journal of the American Statistical Association. 2013;108:370–384. doi: 10.1080/01621459.2013.770307.

[R27] Jack CR, Petersen RC, O’Brien PC, Tangalos EG. MR-based hippocampal volumetry in the diagnosis of Alzheimer’s disease. Neurology. 1992;42:183–183. doi: 10.1212/wnl.42.1.183.

[R28] Kantarci K, Gunter JL, Tosakulwong N, Weigand SD, Senjem MS, Petersen RC, Aisen PS, Jagust WJ, Weiner MW, Jack CR; Alzheimer’s Disease Neuroimaging Initiative. Focal hemosiderin deposits and β-amyloid load in the ADNI cohort. Alzheimer’s & Dementia. 2013;9:S116–S123. doi: 10.1016/j.jalz.2012.10.011.

[R29] Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association. 1995;90:773–795.

[R30] Kesslak JP, Nalcioglu O, Cotman CW. Quantification of magnetic resonance scans for hippocampal and parahippocampal atrophy in Alzheimer’s disease. Neurology. 1991;41:51–51. doi: 10.1212/wnl.41.1.51.

[R31] Lang S, Brezger A. Bayesian P-splines. Journal of Computational and Graphical Statistics. 2004;13:183–212.

[R32] Lee E, Zhu H, Kong D, Wang Y, Giovanello KS, Ibrahim JG. BFLCRM: A Bayesian functional linear Cox regression model for predicting time to conversion to Alzheimer’s disease. The Annals of Applied Statistics. 2015;9:2153–2178. doi: 10.1214/15-AOAS879.

[R33] Lee SY, Song XY. Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes. Multivariate Behavioral Research. 2004;39:653–686. doi: 10.1207/s15327906mbr3904_4.

[R34] Lindsay J, Laurin D, Verreault R, Hébert R, Helliwell B, Hill GB, McDowell I. Risk factors for Alzheimer’s disease: a prospective analysis from the Canadian Study of Health and Aging. American Journal of Epidemiology. 2002;156:445–453. doi: 10.1093/aje/kwf074.

[R35] Maruotti A. Mixed hidden Markov models for longitudinal data: an overview. International Statistical Review. 2011;79:427–454.

[R36] Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. The Journal of Chemical Physics. 1953;21:1087–1091.

[R37] Myers SC, Majluf NS. Corporate financing and investment decisions when firms have information that investors do not have. Journal of Financial Economics. 1984;13:187–221.

[R38] Okuizumi K, Onodera O, Tanaka H, Kobayashi H, Tsuji S, Takahashi H, Oyanagi K, Seki K, Tanaka M, Naruse S, Miyatake T, Mizusawa H, Kanazawa I. ApoE-ε4 and early-onset Alzheimer’s. Nature Genetics. 1994;7:10–11. doi: 10.1038/ng0594-10b.

[R39] Panagiotelis A, Smith M. Bayesian identification, selection and estimation of semiparametric functions in high-dimensional additive models. Journal of Econometrics. 2008;143:291–316.

[R40] Petersen RC. Mild cognitive impairment as a diagnostic entity. Journal of Internal Medicine. 2004;256(3):183–194. doi: 10.1111/j.1365-2796.2004.01388.x.

[R41] Petersen RC, Thomas RG, Grundman M, Bennett D, Doody R, Ferris S, Galasko D, Jin S, Kaye J, Levey A, Pfeiffer E, Sano M, van Dyck CH, Thal LJ. Vitamin E and donepezil for the treatment of mild cognitive impairment. New England Journal of Medicine. 2005;352:2379–2388. doi: 10.1056/NEJMoa050151.

[R42] Richardson S, Green PJ. On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society, Series B. 1997;59:731–792.

[R43] Schmittmann VD, Dolan CV, van der Maas HL, Neale MC. Discrete latent Markov models for normally distributed response data. Multivariate Behavioral Research. 2005;40:461–488. doi: 10.1207/s15327906mbr4004_4.

[R44] Scott SL, James GM, Sugar CA. Hidden Markov models for longitudinal comparisons. Journal of the American Statistical Association. 2005;100:359–369.

[R45] Song XY, Lu ZH. Semiparametric latent variable models with Bayesian P-splines. Journal of Computational and Graphical Statistics. 2010;19:590–608.

[R46] Song XY, Xia YM, Zhu HT. Hidden Markov latent variable models with multivariate longitudinal data. Biometrics. 2016;73:313–323. doi: 10.1111/biom.12536.

[R47] Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B. 2002;64:583–639.

[R48] Spiegelhalter DJ, Thomas A, Best NG, Lunn D. WinBUGS User Manual. Version 1.4. MRC Biostatistics Unit; Cambridge, England: 2003.

[R49] Tanner MA, Wong WH. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association. 1987;82:528–550.

[R50] Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997;53:330–339.

[R51] Vermunt JK, Langeheine R, Böckenholt U. Latent Markov models with time-constant and time-varying covariates. Journal of Educational and Behavioral Statistics. 1999;24:178–205.
