Bayesian estimation of semiparametric nonlinear dynamic factor analysis models using the Dirichlet process prior

Sy-Miin Chow; Niansheng Tang; Ying Yuan; Xinyuan Song; Hongtu Zhu

doi:10.1348/000711010X497262

Author manuscript; available in PMC: 2011 Oct 23.

Published in final edited form as: Br J Math Stat Psychol. 2011 Feb;64(Pt 1):69–106. doi: 10.1348/000711010X497262

PMCID: PMC3199348 NIHMSID: NIHMS329461 PMID: 21506946

Abstract

Parameters in time series and other dynamic models often show complex range restrictions and their distributions may deviate substantially from multivariate normal or other standard parametric distributions. We use the truncated Dirichlet process (DP) as a non-parametric prior for such dynamic parameters in a novel nonlinear Bayesian dynamic factor analysis model. This is equivalent to specifying the prior distribution to be a mixture distribution composed of an unknown number of discrete point masses (or clusters). The stick-breaking prior and the blocked Gibbs sampler are used to enable efficient simulation of posterior samples. Using a series of empirical and simulation examples, we illustrate the flexibility of the proposed approach in approximating distributions of very diverse shapes.

1. Introduction

The last two decades have evidenced the emergence of a new class of structural equation models (SEMs), termed latent variable models (LVMs). These models relax the traditional linearity and Gaussian assumptions of the structural equation modelling framework (Jöreskog, 1974) and offer new possibilities for fitting more complex models. Such timely advances include recent developments in fitting dynamic LVMs, defined here as longitudinal models for describing change processes that extend over substantially longer time-spans (e.g., T > 35) than those implicated in conventional panel models (typically, T < 10).

Some of the seminal work on dynamic LVMs has been spearheaded in part by progress in fitting dynamic factor analysis (DFA) models (Browne & Nesselroade, 2005; McArdle, 1982; Molenaar, 1985; Nesselroade, McArdle, Aggen, & Meyers, 2002; Zhang & Nesselroade, 2007). A DFA model can be viewed as an extension to Cattell, Cattell, and Rhymer’s (1947) P-technique model¹ in which a change model of choice is combined with the standard factor analytic model to account for lagged relationships among factors and manifest variables. The different variants of DFA models proposed in the past two decades have, however, focused exclusively on linear changes (e.g., McArdle, 1982; Molenaar, 1985; Zhang & Browne, 2009; Zhang & Nesselroade, 2007). This is due largely to the many methodological challenges involved in extending DFA models to a nonlinear framework. First, the difficulties associated with fitting cross-sectional LVMs with nonlinear relationships among factors (as discussed, e.g., by Kenny & Judd, 1984) have remained one of the most widely investigated research issues in the SEM literature for decades (e.g., Klein & Moosbrugger, 2000; Schumacker, 2002). Generalization to nonlinear dynamic LVMs requires methodological adaptations that can lead to further complication. Second, because different measurement occasions for a single variable are typically treated as different manifest variables in standard SEM practice, the input data covariance matrix for fitting DFA models in SEM software is non-positive definite in cases where the number of measurement occasions, T, exceeds the number of participants, N. Even when T < N, numerical problems may still arise due to the need to invert a high-dimensional model-implied covariance matrix at each iteration (Hamaker, Dolan, & Molenaar, 2003).
Alternative approaches have been proposed to circumvent these issues, including using a block-Toeplitz matrix to replace the data covariance or correlation matrix (Browne & Nesselroade, 2005; Hershberger, Corneal, & Molenaar, 1994; Molenaar, 1985), and using raw data maximum likelihood with special parameter constraints to ensure the positive definiteness of the model-implied covariance matrix (Hamaker et al., 2003). Still, adapting these alternative approaches for use with nonlinear dynamic LVMs is not a straightforward endeavour.

Despite the challenges involved, nonlinear DFA models, as we will illustrate using an empirical example, provide a valuable tool for evaluating substantive questions that are otherwise difficult to test within the linear framework (e.g., Frederickson & Losada, 2005; Gottman, Murray, Swanson, Tyson, & Swanson, 2002). Our aims in the present paper are twofold. First, we seek to present a Bayesian approach to estimating nonlinear dynamic LVMs and propose a novel nonlinear DFA model as a special case. Our second aim is to allow parameters that are of substantive interest in the proposed dynamic model to vary over persons and, more importantly, to conform to non-parametric distributional forms through the use of a non-parametric Dirichlet process (DP) prior. We allow the dynamic but not other modelling parameters (e.g., measurement parameters such as factor loadings) to vary over persons because, in most applications of DFA models, individual differences in the dynamic parameters (i.e., parameters that dictate the nature of the change processes) are often the focus of substantive interest (e.g., Chow, Nesselroade, Shifren, & McArdle, 2004; Ferrer & Nesselroade, 2003). Relaxing the parametric assumptions imposed on these parameters is particularly important from a substantive as well as a practical standpoint. As an example, in fitting DFA models where vector autoregressive (VAR) models or other related variations are used to describe the dynamic relationships among factors, the dynamic parameters (specifically, the auto- and cross-regression parameters) often show complex restrictions of range. That is, moving beyond the stationary ranges may, in certain cases, yield systems that show increasing variance over time² (Hamilton, 1994; Wei, 1990) – an unlikely scenario in many empirical applications. As a result, the distributions of the dynamic parameters may be non-normal, and even asymmetric.

Prior selection is an important issue in Bayesian data analysis. In many cases, conjugate priors are used for practical reasons to yield posterior distributions of known analytic forms (see Lee, 2007). The multivariate normal distribution, for instance, is one such candidate. Despite its appeal, using multivariate normal distributions as conjugate priors may lead to incorrect posterior inferences in cases where the desired posterior distributions deviate substantially from normality. More recently, there have been a few psychometric applications that utilize non-parametric priors in fitting Bayesian models. Such applications are restricted, however, to cases involving cross-sectional SEMs and item response models (e.g., Duncan & MacEachern, 2008; Lee, Lu, & Song, 2007; Navarro, Griffiths, Steyvers, & Lee, 2006), as well as linear dynamic LVMs (Ansari & Iyengar, 2006). Our proposed approach thus extends previous work that utilizes the DP as a non-parametric prior in Bayesian models to dynamic LVMs with nonlinear change processes at the factor level. Due to the categorical nature of the data used in our empirical example, our approach also generalizes previous approaches of fitting linear DFA models to categorical data (Zhang & Browne, 2009; Zhang & Nesselroade, 2007) to cases involving nonlinear DFA models.

The rest of the present paper is organized as follows. We first summarize features of the semiparametric Bayesian dynamic modelling framework adopted in the paper. We then introduce a motivating empirical example and a novel nonlinear DFA model formulated to test a specific theory of emotions. This is followed by an outline of the broader modelling framework within which our illustrative model can be conceived as a special case. We then present the Markov chain Monte Carlo procedures for fitting the broader model, and results from empirical model fitting and a simulation study. We close with some concluding comments.

2. The Dirichlet process as a non-parametric prior

Non-parametric and semiparametric Bayesian models provide a flexible platform for evaluating the tenability of parametric assumptions in dynamic LVMs. Here, we use the term ‘semiparametric models’ to refer to models with known modelling functions, but unknown distributions for some of the modelling components. In particular, we allow the distributions of the dynamic parameters in our model to conform to non-parametric forms.

When one or more of the distributions in a model of interest are of an unknown form, one common approach is to approximate such distributions non-parametrically by using a finite mixture of parametric distributions, such as a mixture of normal distributions (Lindsay, 1995; McLachlan & Peel, 2000; Sorensen & Alspach, 1971; Titterington, Smith, & Markov, 1985). In some instances, it is of substantive importance to interpret the mixture components as different clusters or classes of individuals with similar characteristics. Unfortunately, when the distributions of interest violate the normality assumption (e.g., in cases involving skewed and heavy-tailed distributions), spurious classes are often detected even when there are no systematic between-class differences (Bauer & Curran, 2003). Non-parametric Bayesian models relax critical dependence on parametric assumptions and, in this way, they help ‘robustify’ parametric models (Antoniak, 1974; Ferguson, 1973). They also serve as a platform for assessing the appropriateness of the parametric assumptions in a model of interest (Dunson, 2008; Karabotsos & Walker, 2009).

Suppose that in an application, an r × 1 vector of modelling components of interest, b_i, for person i, conforms to an unknown distribution such that b_i ~ 𝒫, where 𝒫 is of an unknown form. In practice, b_i may correspond to a vector of modelling parameters, a vector of latent trait components in an item response model, or a vector of non-Gaussian residuals. One possible non-parametric approach in the Bayesian framework is to specify a DP prior for 𝒫. That is, we let b_i ~ DP(αP₀), where P₀ is a base distribution that serves as a starting-point for constructing the non-parametric distribution. For instance, the (multivariate) normal distribution is a common choice for P₀. The positive quantity α represents the weight a researcher assigns a priori to the base distribution, and it reflects the researcher’s certainty of P₀ as the distribution of b_i.

The practical consequences of specifying a DP prior can be better understood in terms of Sethuraman’s (1994) stick-breaking representation. The stick-breaking representation provides an alternative way of conceiving 𝒫 ~ DP(αP₀) as

$$\mathcal{P}(\cdot) = \sum_{g=1}^{\infty} \pi_g \, \delta_{Z_g}(\cdot), \quad \text{with } Z_g \overset{\text{i.i.d.}}{\sim} P_0, \tag{1}$$

where δ_{Z_g}(.) denotes a unit point mass at Z_g, Z_g is the gth matrix consisting of possible values of b_i and π_g is a random probability weight between 0 and 1. Equation (1) thus posits that 𝒫 consists of a series of point masses – or ‘sticks’ – of various lengths concentrated at different values of Z_g. For empirical estimation purposes, a truncated DP may be used to approximate equation (1) as (Ishwaran & James, 2001; Ishwaran & Zarepour, 2000; Lee et al., 2007)

$$\mathcal{P}(\cdot) \approx \mathcal{P}_G(\cdot) = \sum_{g=1}^{G} \pi_g \, \delta_{Z_g}(\cdot), \quad 1 \leq G < \infty, \tag{2}$$

where π_g is obtained as

$$\begin{aligned} \pi_1 &= \nu_1, \\ \pi_g &= \nu_g \prod_{l<g} (1 - \nu_l), \quad \text{for } g = 2, \dots, G, \end{aligned} \tag{3}$$

with

$$\nu_g \overset{\text{i.i.d.}}{\sim} \text{Beta}(1, \alpha), \quad \text{for } g = 1, \dots, G - 1, \tag{4}$$

and ν_G = 1, so that $\sum_{g=1}^{G} \pi_g = 1$.
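The construction in equations (2)–(4) can be sketched in a few lines of code. The following is only an illustrative sketch, not the authors' implementation; the function name `stick_breaking_weights`, the seed, and the values α = 5 and G = 50 are assumptions chosen for the example.

```python
import random

def stick_breaking_weights(alpha, G, rng):
    """Draw truncated stick-breaking weights pi_1, ..., pi_G (equations (3)-(4))."""
    # nu_g ~ Beta(1, alpha) for g < G; the last break is fixed at 1 so the weights sum to 1.
    nu = [rng.betavariate(1.0, alpha) for _ in range(G - 1)] + [1.0]
    weights = []
    remaining = 1.0  # length of the stick that has not yet been assigned
    for v in nu:
        weights.append(v * remaining)  # pi_g = nu_g * prod_{l<g} (1 - nu_l)
        remaining *= 1.0 - v
    return weights

# Hypothetical settings, used only for illustration.
rng = random.Random(2011)
pi = stick_breaking_weights(alpha=5.0, G=50, rng=rng)
```

Because ν_G is fixed at 1, the final weight absorbs whatever is left of the stick, which is what guarantees that the truncated weights sum to 1.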

To aid interpretation, consider the simple case where G = 2. In this case, ν₂ = 1 and ν₁ is a random weight between 0 and 1 drawn from the beta distribution; π₁, the proportion or length of the first stick, is equal to ν₁ and, by equation (3), π₂ = ν₂(1 − ν₁) = 1 − ν₁. That is, ν₁ is the proportion of a unit probability stick that is broken off and assigned to Z₁, and 1 − ν₁ is the remainder of the stick that is assigned to Z₂. In practice, the value of G is either set to a large, predetermined value (e.g., G ≥ 150) or chosen empirically. For instance, Ishwaran and Zarepour (2000) suggested that the adequacy of the truncation level, G, can be assessed by evaluating moments of the tail probability, $\sum_{g=G}^{\infty} \pi_g$, whose value depends only on G and the hyperparameters that govern the prior distribution of α. A relatively small tail probability is desired, as this indicates that including additional sticks beyond G does not lead to substantial differences in the approximation. Our simulation results showed that a value of G = 300 is more than adequate for the model considered in the present context.
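As a rough illustration of how the tail probability behaves, the sketch below computes its expectation for a fixed value of α and checks it by simulation. Under the untruncated stick-breaking construction, the tail $\sum_{g \geq G} \pi_g$ equals $\prod_{l<G}(1 - \nu_l)$, and each factor has mean α/(1 + α). Conditioning on a fixed α is a simplifying assumption made here; the criterion described above averages over the hyperprior of α. The function names and the values α = 2 and G = 5 are hypothetical.

```python
import random

def expected_tail_mass(alpha, G):
    """E[prod_{l<G} (1 - nu_l)] for nu_l ~ Beta(1, alpha), with alpha held fixed."""
    return (alpha / (1.0 + alpha)) ** (G - 1)

def simulated_tail_mass(alpha, G, draws, rng):
    """Monte Carlo estimate of the same expectation."""
    total = 0.0
    for _ in range(draws):
        remaining = 1.0
        for _ in range(G - 1):
            remaining *= 1.0 - rng.betavariate(1.0, alpha)
        total += remaining
    return total / draws

# Hypothetical settings, used only for illustration.
rng = random.Random(0)
closed_form = expected_tail_mass(alpha=2.0, G=5)
monte_carlo = simulated_tail_mass(alpha=2.0, G=5, draws=20000, rng=rng)
```

For a fixed α the expected tail mass shrinks geometrically in G, but the rate of decay slows as α grows, so larger values of α call for larger truncation levels.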

A variety of different distributional forms can be approximated by the discrete probability measure 𝒫 through different ways of allocating the weights, π_g. The assignment of the weights is, in turn, governed by the random weight, α. Typically, a gamma hyperprior is specified for α and the associated hyperparameters for this gamma prior reflect a researcher’s best guess as to how π_g should be distributed, and consequently, how the point masses in equation (2) should resemble the base distribution, P₀. Such prior information is then combined with information from the empirical data to shape the resultant posterior distribution of interest.

To provide a more concrete illustration of the functional role of α, we generated realizations from the DP with different values of α, G = 300, r = 1 and a univariate N(0,1) base distribution. The associated realizations from 𝒫_G(.) are plotted in Figure 1. The discrete-valued nature of 𝒫_G(.) resulting from the use of the DP prior is portrayed in the plots. In addition, as α increases, samples of ν change from conforming to a roughly uniform distribution to skewed distributions of increasingly restricted ranges dominated by a few weights. Consequently, the distributions of π are characterized by an increasing number of non-zero values of π_g. As a result, progressively more sticks are ‘broken off’ to form more clusters, and subsequently more unique values of Z_g are observed with non-zero probabilities.

Figure 1. Plots of realizations from the DP with different values of α and a N(0,1) base distribution.

One important fact to note is that as α gets very large, realizations from 𝒫_G(.) show increased resemblance to the standard normal base distribution (see Figure 1d). In fact, in the extreme case where each individual in the sample is assigned to his/her own cluster (not shown here), each individual has one unique set of values for b_i and, consequently, b_i would be distributed as P₀ (Dunson, 2008). In our simulations, we show how the DP can be used as a non-parametric prior for representing a variety of different distributional forms for a set of individual-specific dynamic parameters of interest.

On the technical front, sampling from posterior distributions involving a DP prior can be computationally intensive. One standard approach to enable efficient sampling of b_i within a Markov chain Monte Carlo (MCMC) framework is to represent b_i in terms of a latent variable, L_i, which records each b_i’s cluster membership and conveys its values such that b_i = Z_{L_i}. In other words, the classification variable L_i is essentially a set of ‘pointers’ for indicating the values of Z_g associated with person i so that b_i is known when L_i is known. L_i is conditionally independent of Z_g, and it is distributed as

$$L_i \mid \pi \overset{\text{i.i.d.}}{\sim} \sum_{g=1}^{G} \pi_g \, \delta_g(\cdot). \tag{5}$$
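The role of L_i as a pointer can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: r = 1, a N(0,1) base distribution, zero-based cluster labels, and hypothetical values for n, G, and α. The function name `draw_dp_sample` is not from the paper.

```python
import random

def draw_dp_sample(n, alpha, G, rng):
    """Draw labels L_i (equation (5)) and set b_i = Z_{L_i} for n persons."""
    # Truncated stick-breaking weights, equations (3)-(4).
    nu = [rng.betavariate(1.0, alpha) for _ in range(G - 1)] + [1.0]
    pi, remaining = [], 1.0
    for v in nu:
        pi.append(v * remaining)
        remaining *= 1.0 - v
    # Atoms Z_g drawn from the base distribution (here a univariate N(0, 1)).
    Z = [rng.gauss(0.0, 1.0) for _ in range(G)]
    # Labels are zero-based in this sketch: L_i is an index into Z.
    L = rng.choices(range(G), weights=pi, k=n)
    b = [Z[g] for g in L]
    return L, b, Z

# Hypothetical settings: report the number of occupied clusters for several alphas.
rng = random.Random(1)
for alpha in (1.0, 10.0, 100.0):
    L, b, Z = draw_dp_sample(n=174, alpha=alpha, G=300, rng=rng)
    print(alpha, len(set(L)))
```

Persons who share a label share the same value of b_i, so the number of distinct labels is the number of clusters actually occupied in a given draw.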

3. Motivating empirical example

Data used in our illustrative example have been previously published elsewhere (see Chow, Ram, Boker, Fujita, & Clore, 2005; Diener, Fujita, & Smith, 1995; Ram et al., 2005). The sample consisted of 179 college students (98 male and 81 female; average age = 20.24 years, SD = 1.41) who were asked to provide self-report affect ratings daily for 52 days. After excluding participants with excessive missingness and data anomalies, a total of 174 participants were retained in the final analysis. Four ordinal positive emotion (PE) items and four ordinal negative emotion (NE) items measured on a scale from 1 (= ‘none’) to 7 (= ‘always’) were used for model fitting purposes. PE items include joy, contentment, love and affection whereas NE items include unhappiness, anger, depression and anxiety. Individuals’ composite PE and NE scores derived from summing these items are plotted in Figure 2.

Figure 2. (a) PE and (b) NE ratings of the participants over 52 days. To avoid cluttering the figure, we have only plotted the trajectories of 50 randomly sampled individuals.

One linear DFA model that has been used in the past to describe day-to-day changes in PE and NE processes is a process factor analysis model that combines a factor analytic model with a VAR model for describing the relationships among factors (e.g., Chow et al., 2004; Ferrer & Nesselroade, 2003). The dynamics among the latent factors are represented as

$$\begin{aligned} \mathrm{PE}_{it} &= b_{11} \mathrm{PE}_{i,t-1} + b_{12} \mathrm{NE}_{i,t-1} + \zeta_{1it}, \\ \mathrm{NE}_{it} &= b_{22} \mathrm{NE}_{i,t-1} + b_{21} \mathrm{PE}_{i,t-1} + \zeta_{2it}, \\ (\zeta_{1it}, \zeta_{2it})' &\overset{\text{i.i.d.}}{\sim} N(0, \Psi_{\zeta}), \quad \Psi_{\zeta} = \begin{bmatrix} \psi_{\zeta_{11}} & \\ \psi_{\zeta_{12}} & \psi_{\zeta_{22}} \end{bmatrix}. \end{aligned} \tag{6}$$

In this model, PE_it and NE_it are continuous latent variables representing individual i’s underlying PE and NE at time t; ζ_1it and ζ_2it are process noise terms assumed to follow a multivariate normal distribution; b₁₁ and b₂₂ are the first-order (or lag-one) autoregressive (AR(1)) parameters, and b₁₂ and b₂₁ are the lag-one cross-regression parameters. Note that equation (6) only includes auto- and cross-regressive relationships up to the first order because previous inspection of the partial auto- and cross-correlation plots of the PE and NE sum scores from the current data set suggested that the shorter-term dynamics in the data are captured primarily by the lag-one components.³ Higher lag orders and other linear/nonlinear variations of equation (6) can, however, be readily implemented using the broader modelling framework and the associated estimation procedures proposed in the present paper. Thus, higher-order lags can be incorporated as needed in other applications.
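The range restrictions on auto- and cross-regression parameters mentioned in the Introduction can be made concrete for equation (6). A linear VAR(1) process is stationary when all eigenvalues of its coefficient matrix lie inside the unit circle. The sketch below checks this for the 2 × 2 case; the function names and the parameter values are hypothetical and are not estimates from the paper.

```python
import cmath

def var1_spectral_radius(b11, b12, b21, b22):
    """Largest eigenvalue modulus of the coefficient matrix [[b11, b12], [b21, b22]]."""
    trace = b11 + b22
    det = b11 * b22 - b12 * b21
    # Roots of lambda^2 - trace * lambda + det = 0; cmath handles complex roots.
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))

def is_stationary(b11, b12, b21, b22):
    """True when the linear VAR(1) in equation (6) has all eigenvalues inside the unit circle."""
    return var1_spectral_radius(b11, b12, b21, b22) < 1.0

# Hypothetical parameter values, used only for illustration.
stable = is_stationary(0.5, -0.1, -0.1, 0.5)    # eigenvalues 0.6 and 0.4
explosive = is_stationary(0.9, 0.4, 0.4, 0.9)   # eigenvalues 1.3 and 0.5
```

The admissible region is defined jointly over all four coefficients rather than by a separate interval for each one, which is one reason the distribution of person-specific dynamic parameters may be far from normal.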

Whereas the model shown in equation (6) is a common linear choice for capturing the kinds of relatively rapid fluctuations seen in affect data, we constructed a nonlinear extension to the VAR(1) model in equation (6) to test a theoretically driven model of emotions. Specifically, in their study of affect, Zautra, Reich, Davis, Nicolson, and Potter (2000) proposed a dynamic affect model which postulates that the relative separation between PE and NE changes dynamically as a function of stress. That is, elevated stress is associated with ‘shrinkages’ in the affective space and the coalescence of positive and negative emotions into a unipolar dimension. Thus, the dynamic affect model suggests that the linkage between PE and NE strengthens with stress but weakens at lower levels of stress. Because we did not have a time-varying indicator of stress, the dynamic affect model will be tested mathematically as

$$\begin{aligned} \mathrm{PE}_{it} &= b_{1,it} \mathrm{PE}_{i,t-1} + \zeta_{1it} = \left[ b_{11,i} + b_{12,i} \frac{\exp(\mathrm{NE}_{i,t-1})}{1 + \exp(\mathrm{NE}_{i,t-1})} \right] \mathrm{PE}_{i,t-1} + \zeta_{1it}, \\ \mathrm{NE}_{it} &= b_{2,it} \mathrm{NE}_{i,t-1} + \zeta_{2it} = \left[ b_{22,i} + b_{21,i} \frac{\exp(\mathrm{PE}_{i,t-1})}{1 + \exp(\mathrm{PE}_{i,t-1})} \right] \mathrm{NE}_{i,t-1} + \zeta_{2it}, \\ b_i &\sim DP(\alpha P_0), \quad b_i = (b_{11,i}, b_{22,i}, b_{12,i}, b_{21,i})', \end{aligned} \tag{7}$$

where b_11,i and b_22,i are the baseline AR(1) parameters of individual i for PE and NE, which apply when NE_i,t−1 and PE_i,t−1, respectively, take extremely low values (so that the logistic terms in equation (7) approach zero). If an individual’s PE (or NE) was high at time t − 1, the high PE (NE) affects the subsequent dynamics of the individual’s NE (PE) by altering the AR(1) parameter associated with the latter. The parameters b_12,i and b_21,i are the person-specific deviations in AR(1) parameters for PE_it and NE_it, respectively, when the other factor (NE or PE) was at an extremely high value at time t − 1. That is, under high stress, it is natural for an individual to experience a high level of NE. The high NE, in turn, changes the dynamics of PE at the next time point by altering the value of PE’s AR(1) parameter by a magnitude of b_12,i. Whether a reciprocal effect from PE to NE also exists at very high values of PE is reflected in the magnitude of b_21,i.

In sum, the nonlinear VAR(1) model in equation (7) extends the VAR(1) model in a linear DFA model (see equation (6)) in three ways. First, the AR(1) parameter of each factor is now a time-varying function of the other factor from a previous time point, so that the dynamic model becomes nonlinear. Second, we allow four of the dynamic parameters, namely, b_i = (b_11,i, b_22,i, b_12,i, b_21,i)′, to vary over persons. Third, we use a DP prior for b_i to allow these parameters to conform to non-parametric distributional forms.
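To show how the time-varying AR(1) coefficients in equation (7) operate, the sketch below simulates the latent PE and NE series for a single person. It is an illustration only: the two process noise terms are assumed to be independent with a common standard deviation (the model itself allows a full covariance matrix Ψ_ζ), the series start at zero, and the parameter values are hypothetical.

```python
import math
import random

def logistic(x):
    """Numerically stable exp(x) / (1 + exp(x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def simulate_person(b, T, rng, noise_sd=1.0):
    """Simulate latent (PE, NE) for one person under equation (7)."""
    b11, b22, b12, b21 = b  # same ordering as b_i in equation (7)
    pe, ne = 0.0, 0.0       # assumed starting values
    path = []
    for _ in range(T):
        ar_pe = b11 + b12 * logistic(ne)  # AR(1) weight of PE depends on lagged NE
        ar_ne = b22 + b21 * logistic(pe)  # AR(1) weight of NE depends on lagged PE
        # Independent noise terms are a simplifying assumption of this sketch.
        pe, ne = (ar_pe * pe + rng.gauss(0.0, noise_sd),
                  ar_ne * ne + rng.gauss(0.0, noise_sd))
        path.append((pe, ne))
    return path

# Hypothetical parameter values for one person, over 52 occasions.
rng = random.Random(7)
path = simulate_person(b=(0.5, 0.5, -0.1, -0.1), T=52, rng=rng)
```

Both right-hand sides are evaluated from the lagged values before either series is updated, which matches the lag-one structure of the model.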

4. Dynamic latent variable modelling framework

The nonlinear VAR(1) model in equation (7) can be formulated as a special case of a nonlinear state-space model. The state-space representation provides a flexible framework for representing different dynamic processes and subsumes many dynamic and time series models as special cases. Because latent variables are still the focus of our formulation, we refer to our modelling framework as the dynamic latent variable modelling framework. The resultant modelling framework comprises a dynamic model and a measurement model, which will be described next.

4.1. Dynamic model

The general dynamic model on which the proposed estimation procedures are based is expressed as

$$\eta_{it} = f_t(\eta_{i,t-1}, b_i, \theta_{\eta}) + \zeta_{it}, \quad \zeta_{it} \overset{\text{i.i.d.}}{\sim} N(0, \Psi_{\zeta}), \quad b_i \mid \alpha, P_0 \overset{\text{i.i.d.}}{\sim} DP(\alpha P_0), \tag{8}$$

where i is the person index and t is the time index; η_it is a w × 1 vector of latent variables of interest, ζ_it is a vector of process noise components and f_t(.) is a vector of time-varying, differentiable linear or nonlinear functions describing the latent variables at time t in terms of three components: (1) their previous history at time t − 1, (2) b_i, an r × 1 vector of person-specific parameters, and (3) a vector of parameters that is held invariant over time and persons, denoted as θ_η. In our motivating example, η_it = (PE_it, NE_it)′, b_i = (b_11,i, b_22,i, b_12,i, b_21,i)′ and θ_η = (Ψ_ζ). We specify a DP prior for b_i, and set the base distribution, P₀, to be an r-variate normal distribution with mean vector μ_Z and covariance matrix Ψ_Z.

Beyond our illustrative model, a variety of other dynamic functions can be readily specified as special cases of equation (8). Some examples include unobserved components models with cyclic, seasonal, and irregular components (Durbin & Koopman, 2001; Harvey, 2001), non-parametric spline models (De Jong & Mazzi, 2001), and exact discrete time models (Harvey, 2001). VAR models of higher lags and other related vector autoregressive moving average (VARMA) extensions can also be formulated as special cases of equation (8) by expanding the size of η_it to include higher-order lag components.
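The idea of expanding η_it to absorb higher-order lags can be illustrated with a scalar AR(2) process. The sketch below writes the same process twice, once as a direct two-lag recursion and once as a lag-one model on the stacked state (η_t, η_{t−1}), and the two produce identical series. The function names, starting values, and coefficients are hypothetical.

```python
def ar2_direct(phi1, phi2, shocks):
    """Scalar AR(2) recursion started from two zero initial values."""
    prev, prev2, out = 0.0, 0.0, []
    for e in shocks:
        cur = phi1 * prev + phi2 * prev2 + e
        out.append(cur)
        prev, prev2 = cur, prev
    return out

def ar2_as_lag_one(phi1, phi2, shocks):
    """The same process as a lag-one model on the stacked state (eta_t, eta_{t-1})."""
    state, out = (0.0, 0.0), []
    for e in shocks:
        # First element applies the AR coefficients; second carries eta_{t-1} forward.
        state = (phi1 * state[0] + phi2 * state[1] + e, state[0])
        out.append(state[0])
    return out

# Hypothetical coefficients and shocks, used only for illustration.
shocks = [1.0, 0.0, -0.5, 0.25, 0.0]
direct = ar2_direct(0.5, 0.2, shocks)
stacked = ar2_as_lag_one(0.5, 0.2, shocks)
```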

4.2. Measurement model

Motivated by our empirical data of interest, we now consider a measurement model for ordinal data and elaborate briefly, where appropriate, on how the proposed framework can be extended to include mixed responses. In the conventional state-space framework, the measurement model is used to specify the relationships among a set of latent and manifest variables. In cases involving ordinal manifest data, the same measurement model is defined in terms of a vector of underlying, continuous latent variables as

$$y_{it}^{*} = \mu + \Lambda \eta_{it} + \varepsilon_{it}, \quad \varepsilon_{it} \overset{\text{i.i.d.}}{\sim} N(0, \Psi_{\varepsilon}), \quad t = 1, 2, \dots, T; \; i = 1, 2, \dots, n, \tag{9}$$

where $y_{it}^{*}$ is a p × 1 vector of unobserved continuous response variables underlying a p × 1 vector of manifest ordinal data, y_it, and ε_it is a vector of uniquenesses. The vector of time-invariant parameters in the measurement model, denoted by θ_ε, includes elements in the vector of intercepts, μ, the p × w matrix of factor loadings, Λ, and Ψ_ε, the p × p covariance matrix of ε_it.

The unobserved continuous latent vector, $y_{it}^{*}$, is linked to the manifest ordinal data, y_it, by

$$y_{it,k} = s \iff \tau_{k,s-1} < y_{it,k}^{*} \leq \tau_{k,s}, \quad k = 1, \dots, p; \; s = 1, \dots, M, \tag{10}$$

where τ_k,s is a set of threshold values held invariant across persons and time, with

$$-\infty = \tau_{k,0} < \tau_{k,1} < \tau_{k,2} < \dots < \tau_{k,M-1} < \tau_{k,M} = +\infty. \tag{11}$$

That is, for the kth ordinal variable, y_it,k, with M categories, there are M − 1 threshold parameters. The underlying variable approach summarized in equations (9)–(11) has been shown to be equivalent to the generalized latent trait approach often adopted in the item response theory framework (Bartholomew & Knott, 1999; Jöreskog & Moustaki, 2001).⁴ Since only ordinal information is used to identify $y_{it}^{*}$, additional constraints need to be imposed to identify the model. For instance, the lowest and highest threshold values of each item can be fixed, as opposed to being freely estimated (Lee & Zhu, 2000). These are the identification constraints adopted in the present study.
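The mapping in equations (10)–(11) amounts to locating the underlying continuous value among the ordered thresholds. A minimal sketch follows, assuming the M − 1 finite thresholds of one item are stored in increasing order; the function name `to_ordinal` and the threshold values are hypothetical.

```python
from bisect import bisect_left

def to_ordinal(y_star, interior_thresholds):
    """Return category s such that tau_{s-1} < y_star <= tau_s (equation (10)).

    interior_thresholds holds tau_1 < ... < tau_{M-1}; tau_0 = -inf and
    tau_M = +inf are implicit, as in equation (11).
    """
    # bisect_left counts the thresholds strictly below y_star, so a value
    # exactly equal to tau_s is assigned to category s, not s + 1.
    return bisect_left(interior_thresholds, y_star) + 1

# Hypothetical thresholds for an item with M = 7 categories.
thresholds = [-1.5, -0.5, 0.0, 0.5, 1.0, 2.0]
categories = [to_ordinal(v, thresholds) for v in (-3.0, -1.5, 0.2, 2.0, 2.1)]
# categories == [1, 1, 4, 6, 7]
```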

5. Bayesian estimation procedures

Throughout, we define θ = (θ_ε, θ_η), π = (π₁, …, π_G), L = (L₁, …, L_n), H = (η₁, …, η_n) with η_i = (η_i1, …, η_iT)′ as the array of latent variables, τ = (τ_{1,2}, …, τ_{1,M−2}, …, τ_{p,M−2}) as a vector of threshold parameters and b = (b₁, …, b_n) as an array of all person-specific parameters. In addition, we denote Y = (Y₁, Y₂, …, Y_n) ≜ {Y_obs, Y_mis}, where Y_obs is a data array with complete ordinal observations from all persons and time points, Y_mis is a data array that includes all missing observations from all persons and time points, with Y_i = (y_i1, y_i2, …, y_iT)′ ≜ {y_i,obs, y_i,mis} being a data matrix that includes person i’s complete and missing manifest ordinal data up to time T. Their corresponding continuous counterparts are denoted by $Y^{*} = (Y_{1}^{*}, \dots, Y_{n}^{*}) \triangleq \{Y_{obs}^{*}, Y_{mis}^{*}\}$, with $Y_{i}^{*} = (y_{i1}^{*}, \dots, y_{iT}^{*})' \triangleq \{y_{i,obs}^{*}, y_{i,mis}^{*}\}$. Assuming that the data are missing at random with an ignorable missingness mechanism (Little & Rubin, 1987), estimation of all parameters and latent variables will be based only on the observed data set, Y_obs.

The general model summarized in equations (8)–(11) can be rewritten as

$$\begin{aligned} (y_{it}^{*} \mid \theta_{\varepsilon}, \eta_{it}) &\overset{\text{ind}}{\sim} N_p(\mu + \Lambda \eta_{it}, \Psi_{\varepsilon}), \\ (\eta_{it} \mid \eta_{i,t-1}, b_i = Z_{L_i}, \theta_{\eta}) &\overset{\text{ind}}{\sim} N_w[f_t(\eta_{i,t-1}, b_i = Z_{L_i}, \theta_{\eta}), \Psi_{\zeta}], \\ (\nu_g \mid \alpha) &\overset{\text{i.i.d.}}{\sim} \text{Beta}(1, \alpha), \quad \text{for } g = 1, \dots, G - 1 \text{ and } \nu_G = 1, \\ (\pi_g \mid \nu) &\sim SB(\nu), \quad Z_g \overset{\text{i.i.d.}}{\sim} N_r(\mu_Z, \Psi_Z), \quad (L_i \mid \pi) \overset{\text{i.i.d.}}{\sim} \sum_{g=1}^{G} \pi_g \, \delta_g(\cdot), \end{aligned}$$

where SB(.) denotes the stick-breaking process expressed in equation (3).

We specified the prior distributions for our modelling components as

$$\begin{array}{ll} \tau \sim c, & (\mu \mid \mu_0, \Sigma_0) \sim N_p[\mu_0, \Sigma_0], \\ (\Psi_{\zeta} \mid w_0, \Psi_{\zeta_0}^{-1}) \sim IW_w[w_0, \Psi_{\zeta_0}^{-1}], & (\Lambda_k \mid \psi_{\varepsilon_k}) \sim N_w[\Lambda_{0_k}, \psi_{\varepsilon_k} H_{0\Lambda_k}], \\ (\psi_{\varepsilon_k}^{-1}) \overset{\text{i.i.d.}}{\sim} \text{Gamma}[\alpha_{0\varepsilon_k}, \beta_{0\varepsilon_k}], & (\mu_Z \mid \Psi_{\mu_Z}) \sim N_r(\mu_{Z_0}, \Psi_{\mu_Z}), \\ (\Psi_{Z_k}^{-1} \mid c_1, c_2) \overset{\text{i.i.d.}}{\sim} \text{Gamma}(c_1, c_2), & (\alpha \mid a_1, a_2) \sim \text{Gamma}(a_1, a_2), \end{array} \tag{12}$$

where c is a constant used to specify a diffuse prior for the threshold parameters and $\Lambda_{k}'$ denotes the kth row of Λ (for k = 1, …, p). The components $\mu_0, \Sigma_0, w_0, \Psi_{\zeta_0}^{-1}, \Lambda_{0_k}, H_{0\Lambda_k}, \alpha_{0\varepsilon_k}, \beta_{0\varepsilon_k}, \mu_{Z_0}, \Psi_{\mu_Z}, c_1, c_2, a_1$ and $a_2$ are all hyperparameters whose values are assumed to be known. Thus, with the exception of the threshold parameters, which were assigned a non-informative prior, standard conjugate priors were specified for all other parametric components in the model. Such priors were thought to provide a reasonable representation of the characteristics of these components, and the associated hyperparameters can be determined in a relatively straightforward manner based on previous applications. Details of our hyperparameter choices are discussed later as we present the empirical results.

The Gibbs sampler will be used to simulate a sequence of random observations from the joint posterior distribution p(π, Z, L, μ_Z, Ψ_Z, α, H, θ, τ, Y^*, Y_mis|Y_obs), where Z = (Z₁, …, Z_G) is a G × r matrix containing values of b_i. At the first iteration, samples of Z are drawn from the base distribution, P₀. Then, starting from $\pi^{(0)}, Z^{(0)}, L^{(0)}, \mu_{Z}^{(0)}, \Psi_{Z}^{(0)}, \alpha^{(0)}, H^{(0)}, \theta^{(0)}, \tau^{(0)}$ and $Y^{*(0)}$, the Gibbs sampler involves sampling sequentially, for the next iteration, q + 1 (until the last iteration, Q), as follows:

(a) Generate $\mu_{Z}^{(q+1)}$ from $p(\mu_{Z} \mid Z^{(q)}, \Psi_{Z}^{(q)})$.
(b) Generate $\Psi_{Z}^{(q+1)}$ from $p(\Psi_{Z} \mid Z^{(q)}, \mu_{Z}^{(q+1)})$.
(c) Generate $\alpha^{(q+1)}$ from $p(\alpha \mid \pi^{(q)})$.
(d) Generate $(\pi^{(q+1)}, Z^{(q+1)})$ from $p(\pi, Z \mid L^{(q)}, \mu_{Z}^{(q+1)}, \Psi_{Z}^{(q+1)}, \alpha^{(q+1)}, H^{(q)}, \theta^{(q)}, \tau^{(q)}, Y^{*(q)}, Y_{obs})$.
(e) Generate $L^{(q+1)}$ from $p(L \mid \pi^{(q+1)}, Z^{(q+1)}, \tau^{(q)}, Y^{*(q)}, \theta^{(q)}, H^{(q)}, Y_{obs})$.
(f) Generate $H^{(q+1)}$ from $p(H \mid \tau^{(q)}, Y^{*(q)}, \theta^{(q)}, Y_{obs}, b^{(q+1)})$.
(g) Generate $\theta^{(q+1)}$ from $p(\theta \mid \tau^{(q)}, Y^{*(q)}, H^{(q+1)}, Y_{obs})$.
(h) Generate $Y_{mis}^{*(q+1)}$ from $p(Y_{mis}^{*} \mid \theta^{(q+1)}, H^{(q+1)})$.
(i) Generate $(\tau^{(q+1)}, Y_{obs}^{*(q+1)})$ from $p(\tau, Y_{obs}^{*} \mid \theta^{(q+1)}, H^{(q+1)}, Y_{obs})$.

In sum, the Gibbs sampler essentially involves sampling from a series of conditional distributions while each of the modelling components is updated in turn. The conditional distributions needed to implement steps (a)–(i) are summarized in the Appendix. In cases where the associated conditional distributions are of known analytic forms, sampling from these distributions is straightforward (including steps (a)–(c), (e), (g) and (h)). In other cases, Metropolis–Hastings (MH) algorithms are used within the Gibbs sampler (e.g., steps (d), (f) and (i)) to allow sampling from the corresponding conditional distributions. Steps (a)–(e), in particular, are all part of a blocked Gibbs sampling procedure for deriving posterior samples of b_i. Furthermore, no additional step is included to sample the missing ordinal data, Y_mis. Because of the assumption of missingness at random, all that is needed are samples of the missing latent continuous data array, $Y_{mis}^{*}$, to yield posterior samples of $Y^{*} = \{Y_{obs}^{*}, Y_{mis}^{*}\}$, obtained respectively from steps (i) and (h) and subsequently utilized in other sampling steps.
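The exact conditional distributions are given in the Appendix, which is not reproduced here. As a rough illustration of the weight-related updates only, the sketch below uses the standard blocked Gibbs conditionals for a truncated stick-breaking prior described by Ishwaran and James (2001): ν_g given the labels is Beta(1 + n_g, α + Σ_{l>g} n_l), where n_g is the number of persons in cluster g, and α given ν is Gamma with shape a₁ + G − 1 and rate a₂ − Σ_{g<G} log(1 − ν_g). Whether these match the Appendix term by term is an assumption, as are the shape-rate reading of Gamma(a₁, a₂), the zero-based labels, and the function names. The MH update of Z in step (d) is not shown.

```python
import math
import random

def update_weights(labels, alpha, G, rng):
    """Draw (nu, pi) given cluster labels, using the standard blocked Gibbs conditional."""
    counts = [0] * G
    for g in labels:            # labels are zero-based cluster indices
        counts[g] += 1
    nu, pi, remaining = [], [], 1.0
    later = len(labels)         # persons in clusters g, g+1, ..., G-1
    for g in range(G):
        later -= counts[g]      # now: persons in clusters after g
        v = 1.0 if g == G - 1 else rng.betavariate(1.0 + counts[g], alpha + later)
        nu.append(v)
        pi.append(v * remaining)
        remaining *= 1.0 - v
    return nu, pi

def update_alpha(nu, a1, a2, rng):
    """Draw alpha given nu under a Gamma(a1, a2) prior (shape-rate form assumed)."""
    # Guard against log(0) if a break is numerically equal to 1.
    log_sum = sum(math.log(max(1.0 - v, 1e-300)) for v in nu[:-1])
    shape = a1 + len(nu) - 1
    rate = a2 - log_sum
    return rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes (shape, scale)

# Hypothetical labels for 8 persons and a truncation level of G = 5.
rng = random.Random(3)
labels = [0, 0, 1, 3, 3, 3, 2, 0]
nu, pi = update_weights(labels, alpha=2.0, G=5, rng=rng)
new_alpha = update_alpha(nu, a1=2.0, a2=1.0, rng=rng)
```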

6. Empirical results

The nonlinear DFA model (see equations (7) and (9)–(11)) was fitted to the empirical data. Four ordinal items were used to identify PE_it, individual i’s latent positive emotion, and four ordinal items were used to identify NE_it, individual i’s latent negative emotion. We ran three independent Markov chains with different starting values and yielded similar results. We report here the results as aggregated across the three chains.

For identification purposes, we set τ_k,1 to Φ⁻¹ (n_k,1/n) and τ_k,6 to $Φ^{- 1} (n^{- 1} \sum_{h = 1}^{6} n_{k, h})$ , where Φ⁻¹(.) denotes the inverse standard normal cumulative distribution function and n_k,h represents the number of responses endorsing category h on item k across all persons and time points. The hyperparameter values of the prior distributions (see equation (12)) were specified as follows. A diffuse prior was specified for all of the threshold parameters, so c can be set to any arbitrary constant value without affecting the resultant posterior distributions of the threshold parameters. We set μ₀ to an 8 × 1 vector of ones, Σ₀ to I₈, w₀ to 10, and $Ψ_{ζ_{0}}^{- 1}$ to

\begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}.

Further, letting k and j denote the row and column, respectively, of the factor loading matrix, Λ, Λ_{0_kj} = 0.8 for each of the freed elements in the factor loading matrix (j = 1 for k = 2, 3, and 4; j = 2 for k = 6, 7, and 8) and H_{0Λ_kj} = 1 for each of the freed elements in the factor loading matrix. For the conjugate priors of the measurement error variances, we set α_{0ε_k} to 8 and β_{0ε_k} to 10 to yield variance values that were relatively large and diffuse.
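As a small illustration of the identification constraint on the lowest and highest thresholds, the sketch below computes Φ⁻¹(n_k,1/n) and Φ⁻¹(n⁻¹ Σ_{h=1}^{6} n_k,h) from category counts. The counts are made up, the function name `fixed_thresholds` is hypothetical, and n is assumed here to be the total number of responses to item k across seven response categories.

```python
from statistics import NormalDist


def fixed_thresholds(category_counts):
    """Return the fixed lowest and highest thresholds for one ordinal item.

    category_counts holds n_k,1, ..., n_k,7 (seven categories assumed),
    so that six thresholds separate the categories.
    """
    if len(category_counts) != 7:
        raise ValueError("this sketch assumes seven response categories")
    n = sum(category_counts)  # assumption: n is the item's total response count
    inv_cdf = NormalDist().inv_cdf  # inverse standard normal CDF
    lowest = inv_cdf(category_counts[0] / n)
    highest = inv_cdf(sum(category_counts[:6]) / n)
    return lowest, highest


if __name__ == "__main__":
    # Hypothetical counts for one item, not taken from the empirical data.
    print(fixed_thresholds([10, 20, 30, 40, 30, 20, 10]))
```

Because the two thresholds are fixed at empirical quantiles, only the four interior thresholds of each item remain to be estimated, which matches the entries reported in Table 1.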

To ensure that the approximations obtained from posterior samples of the non-parametric components were not biased by the choice of our hyperparameters, we allowed some of the hyperparameters that governed the base distribution to vary randomly across the three independent Markov chains. Specifically, based on previous results from fitting VAR(1) models to sum scores from the present data set with parametric assumptions, we set μ_{Z_0j} to 0.5 for j = 1 and 2 (i.e., corresponding to b_11,i and b_22,i) and to −0.1 for j = 3 and 4 (i.e., corresponding to b_12,i and b_21,i). Elements in Ψ_{μ_Z} were sampled randomly from a Unif (1, 10) distribution for b_11,i and b_22,i and a Unif (1, 12) distribution for b_12,i and b_21,i for each independent Markov chain. Furthermore, we set c₁ to 10, and allowed c₂ to be sampled randomly from a Unif (3, 7) distribution for elements in $Ψ_{Z}^{- 1}$ that corresponded to b_11,i and b_22,i, and from a Unif (0.5, 4) distribution for those that corresponded to b_12,i and b_21,i. Note that the different hyperparameter choices for elements in b_i were necessary because we expected b_12,i and b_21,i to conform to much narrower ranges than the baseline autoregression parameters in stable systems (namely, systems that do not show increasing variance over time). With respect to hyperparameters for the prior distribution of α, we set a₁ to 250 and a₂ to 1 to yield large values of α (and consequently, more unique b_i values) to capture some of the more subtle individual differences in these dynamic parameters.
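The link between large values of α and a larger number of unique b_i values can be seen in a short simulation of the truncated stick-breaking construction. This is a sketch under stated assumptions rather than the authors' code: the truncation level of 2000, the sample of 170 persons (borrowed from the simulation study), and the function names are all hypothetical, and α is simply fixed at a value instead of being sampled from its prior.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Weights of a truncated stick-breaking prior with V_k ~ Beta(1, alpha)."""
    weights, remaining = [], 1.0
    for k in range(truncation):
        # The last stick takes all remaining mass so the weights sum to one.
        v = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def n_unique_clusters(alpha, n_persons=170, truncation=2000, seed=0):
    """Number of distinct cluster labels among n_persons prior draws."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    labels = rng.choices(range(truncation), weights=weights, k=n_persons)
    return len(set(labels))


if __name__ == "__main__":
    for alpha in (1.0, 250.0):
        print("alpha =", alpha, "unique clusters:", n_unique_clusters(alpha))
```

With a small α most persons share a handful of clusters, whereas an α near 250 spreads the mass over many clusters, so most persons receive their own value.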

We computed the estimated potential scale reduction (EPSR; Gelman, 1996) values based on the three independent Markov chains, each initialized with different starting values for the person-invariant parameters, and different hyperparameter values as described above. The EPSR values for all person-invariant parameters became less than 1.2 and the corresponding parameter estimates from different chains stabilized in less than 200 iterations. To allow sufficient burn-in iterations to recover the shapes of the person-specific parameters, we allowed for 18,000 burn-in iterations.
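For readers unfamiliar with the EPSR, the sketch below computes a basic form of the potential scale reduction statistic from several chains of draws for one scalar parameter. The exact variant used in the analyses is not stated in this section, so the formula (within-chain variance W, between-chain variance B, and the ratio of the pooled variance estimate to W) should be read as an assumption; the function name `epsr` is hypothetical.

```python
import math
import random
import statistics


def epsr(chains):
    """Basic potential scale reduction for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean(statistics.variance(chain) for chain in chains)
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


if __name__ == "__main__":
    rng = random.Random(0)
    mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
    stuck = [[rng.gauss(shift, 1.0) for _ in range(2000)] for shift in (0.0, 3.0, 6.0)]
    print("well-mixed chains:", round(epsr(mixed), 3))
    print("poorly mixed chains:", round(epsr(stuck), 3))
```

Values close to 1 indicate that the chains agree, which is the sense in which the threshold of 1.2 is used above.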

The posterior predictive probability (Gelman, Meng, & Stern, 1996; Lee & Zhu, 2000; Meng, 1994) of the fitted model computed using posterior samples of the continuous data array, Y^*, averaged .67, indicating that the model provided a reasonable fit to the data. Estimates of all person- and time-invariant parameters obtained using 4,000 additional iterations after burn-in are summarized in Table 1. These estimates were averages taken over the three independent chains.

Table 1.

Bayesian estimates of time- and person-invariant parameters from fitting the nonlinear DFA model to empirical data using the DP prior

Parameters	Mean	SD	5th percentile	95th percentile
[λ₂₁, λ₃₁, λ₄₁]	[1.08, 1.03, 0.86]	[0.02, 0.02, 0.02]	[1.05, 1.00, 0.84]	[1.11, 1.06, 0.89]
[λ₆₂, λ₇₂, λ₈₂]	[1.03, 0.77, 0.69]	[0.04, 0.03, 0.02]	[0.97, 0.72, 0.65]	[1.09, 0.83, 0.73]
ψ_ε₁	0.41	0.01	0.39	0.43
ψ_ε₂	0.35	0.01	0.34	0.37
ψ_ε₃	0.40	0.01	0.38	0.41
ψ_ε₄	0.55	0.01	0.53	0.57
ψ_ε₅	0.43	0.02	0.39	0.47
ψ_ε₆	0.29	0.01	0.26	0.31
ψ_ε₇	0.54	0.03	0.48	0.60
ψ_ε₈	0.63	0.02	0.60	0.66
μ₁	0.05	0.03	0.01	0.09
μ₂	0.03	0.03	−0.02	0.08
μ₃	0.03	0.03	−0.01	0.08
μ₄	0.01	0.03	−0.03	0.05
μ₅	0.12	0.03	0.08	0.17
μ₆	0.13	0.03	0.09	0.19
μ₇	0.10	0.02	0.06	0.14
μ₈	0.10	0.02	0.05	0.13
[τ₁₂, τ₁₃, τ₁₄, τ₁₅]	[−0.21, 0.39, 0.80, 1.39]	[0.01, 0.02, 0.02, 0.02]	[−0.23, 0.36, 0.77, 1.35]	[−0.19, 0.42, 0.83, 1.42]
[τ₂₂, τ₂₃, τ₂₄, τ₂₅]	[−1.16, −0.47, 0.12, 0.75]	[0.02, 0.02, 0.02, 0.02]	[−1.19, −0.49, 0.09, 0.72]	[−1.13, −0.44, 0.15, 0.78]
[τ₃₂, τ₃₃, τ₃₄, τ₃₅]	[−0.88, −0.31, 0.21, 0.88]	[0.01, 0.01, 0.02, 0.02]	[−0.90, −0.34, 0.18, 0.85]	[−0.86, −0.29, 0.23, 0.91]
[τ₄₂, τ₄₃, τ₄₄, τ₄₅]	[−0.60, 0.20, 0.66, 1.22]	[0.01, 0.02, 0.02, 0.02]	[−0.62, 0.17, 0.62, 1.18]	[−0.58, 0.22, 0.69, 1.25]
[τ₅₂, τ₅₃, τ₅₄, τ₅₅]	[0.91, 1.73, 2.24, 2.71]	[0.03, 0.05, 0.06, 0.06]	[0.86, 1.65, 2.15, 2.61]	[0.95, 1.81, 2.33, 2.80]
[τ₆₂, τ₆₃, τ₆₄, τ₆₅]	[0.68, 1.48, 1.95, 2.45]	[0.02, 0.04, 0.05, 0.05]	[0.64, 1.42, 1.87, 2.35]	[0.72, 1.55, 2.04, 2.54]
[τ₇₂, τ₇₃, τ₇₄, τ₇₅]	[0.80, 1.65, 2.14, 2.66]	[0.03, 0.06, 0.07, 0.08]	[0.74, 1.55, 2.01, 2.52]	[0.85, 1.74, 2.25, 2.79]
[τ₈₂, τ₈₃, τ₈₄, τ₈₅]	[0.20, 0.98, 1.48, 1.95]	[0.02, 0.02, 0.03, 0.03]	[0.18, 0.94, 1.43, 1.90]	[0.23, 1.02, 1.53, 2.00]
ϕ₁₁	0.11	0.00	0.10	0.11
ϕ₁₂	−0.10	0.01	−0.11	−0.09
ϕ₂₂	0.23	0.01	0.21	0.25


Note. The values of the lowest thresholds of the ordinal items, τ₁₁, τ₂₁, τ₃₁, τ₄₁, τ₅₁, τ₆₁, τ₇₁, τ₈₁, were set to −0.71, −1.88, −1.44, −1.34, −0.33, −0.26, −0.13, and −0.67, respectively. The values of the highest thresholds of the ordinal items, τ₁₆, τ₂₆, τ₃₆, τ₄₆, τ₅₆, τ₆₆, τ₇₆, τ₈₆, were set to 2.16, 1.71, 1.72, 1.91, 3.09, 2.95, 3.12, and 2.44, respectively.

Consistent with previously published results (Ram et al., 2005), all the PE and NE indicators showed relatively little differentiation (i.e., distances between thresholds were small) among the middle categories. High levels of NE were relatively rare, resulting in the threshold values of the NE items clustering around relatively high magnitudes. The baseline levels (as indicated by values of μ) were estimated to be close to zero, with the intercepts of the NE continuous data being significantly greater than zero. At the factor level, PE and NE were found to be weakly negatively correlated, as would be expected based on previous findings concerning the structure of emotions.

We obtained estimates of each person’s parameters in b_i, denoted below as b̂_i, by averaging posterior samples from the distribution p(Z_{L_i} |L_i, μ_Z, Ψ_Z, τ, Y^*, θ, H, Y_obs) as

\hat{b}_{i} = Q^{-1} \sum_{q = 1}^{Q} Z_{L_{i}}^{(q)},

(13)

where q = 1 denotes the first iteration after burn-in and Q is the maximum number of iterations. Distributions of these b̂_i estimates across participants (for i = 1, …, n) were the focus of our interest.⁵ These estimates were observed to show very similar distributional forms across the three Markov chains even with different hyperparameter choices. Matrix scatterplots of the b̂_i estimates averaged across the three chains and the corresponding histograms are shown in Figure 3. Based on the plots, distributions of the person-specific autoregressive parameters, b̂_11,i and b̂_22,i, were both highly skewed. Most individuals’ estimated autoregressive parameters lay in the moderate to high range (from around 0.5 to close to 1.0), with a small number of individuals showing near-zero autoregressive estimates. That is, the latter subgroup of individuals tended to show very little stability or continuity in their PE and NE from day to day.
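The averaging in equation (13) amounts to looking up, at each retained iteration, the cluster value currently assigned to person i and then taking the mean over iterations. A minimal sketch is given below; the data layout (a list of `(atoms, labels)` pairs, one per post-burn-in iteration) and the function name `person_estimates` are assumptions made for illustration.

```python
def person_estimates(draws):
    """Average each person's assigned cluster value over retained iterations.

    draws: list of (atoms, labels) pairs, one per iteration after burn-in.
      atoms[k] is the r-dimensional value Z_k of cluster k at that iteration.
      labels[i] is the cluster label L_i of person i at that iteration.
    """
    n_persons = len(draws[0][1])
    r = len(draws[0][0][0])
    sums = [[0.0] * r for _ in range(n_persons)]
    for atoms, labels in draws:
        for i, label in enumerate(labels):
            for j in range(r):
                sums[i][j] += atoms[label][j]
    q_total = len(draws)
    return [[s / q_total for s in row] for row in sums]


if __name__ == "__main__":
    # Two hypothetical iterations, two clusters, two persons, r = 2.
    toy_draws = [
        ([[0.2, 0.0], [0.8, 0.1]], [0, 1]),
        ([[0.4, 0.0], [0.6, 0.3]], [1, 1]),
    ]
    print(person_estimates(toy_draws))
```

Because a person's label can change from one iteration to the next, the averaged estimate need not coincide with any single cluster value, which is why the b̂_i estimates can form smooth distributions across persons.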

Figure 3.

Matrix scatterplots of the b̂_i estimates using the DP prior, as averaged across three Markov chains. A loess line is imposed on each of the scatterplots. The diagonal plots are histograms of these estimates. For b_11,i, M = 0.83, SD = 0.20, 90% CI [0.44, 1.05]; for b_22,i, M = 0.64, SD = 0.21, 90% CI [0.26, 0.91]; for b_12,i, M = −0.03, SD = 0.14, 90% CI [−0.28, 0.20]; and for b_21,i, M = 0.26, SD = 0.12, 90% CI [0.06, 0.46].

The 90% credible interval⁶ associated with b̂_12,i included zero but that for b̂_21,i was barely above zero. Thus, on average, the present sample showed unidirectional coupling from PE to NE when PE was at high values, but no coupling in the reverse direction when NE was high. This finding provided partial support for the dynamic affect model postulated by Zautra et al. (2000). That is, we found that there was an increased linkage between PE and NE at high values of PE, but the association was driven more by the lagged influence of PE on NE.

Allowing the distributions of the person-specific parameters to deviate from normality also helped reveal novel interrelationships among the auto- and cross-regression parameters. First, although most individuals’ b̂_12,i estimates clustered around zero, individuals who showed strong continuity in PE (with b_11,i estimates close to or above 1.0) tended to also show negative lagged influence from NE to PE. That is, high values of NE served to dampen the fluctuations in PE and bring it back towards its baseline.

A slightly different scenario was observed in the individuals’ NE regulation. Almost all individuals showed a positive cross-regression weight from PE at t − 1 to NE at time t. The positive cross-regression weights suggested that if an individual experienced extremely high PE yesterday, the high PE tended to delay the individual’s NE from returning to its baseline. This may, for instance, cause an individual’s NE to continue to wander around more extreme (e.g., extremely low) values for a longer period of time. Compared with individuals with high stability in NE, individuals who showed lower stability in NE (i.e., b̂_22,i estimates were low) showed greater tendency in this regard (i.e., showing higher positive b̂_21,i estimates).

To evaluate whether the non-parametric DP prior yielded any practical differences in the estimation results, we replicated the analysis by using a multivariate normal distribution as a parametric prior for the distribution of b_i. Specifically, we specified the parametric prior as

(b_{i} | μ_{Z_{Norm}}, Ψ_{Z_{Norm}}) ~ N_{r} (μ_{Z_{Norm}}, Ψ_{Z_{Norm}}),

with μ_{Z_Norm} ~ N_r([0.8, 0.6, −0.03, 0.26]′, Ψ_{Z_Norm₀}), based on the semiparametric empirical results. To allow for variability in the hyperparameter values across the MCMC chains, Ψ_{Z_Norm₀} was specified to be a diagonal matrix with the first two elements sampled from a Unif (0.3, 1) distribution and the last two elements sampled from a Unif (0.2, 0.8) distribution for each of the three independent Markov chains. Furthermore, we let $Ψ_{Z_{{Norm}_{k}}}^{- 1} \sim Gamma (d_{1}, d_{2})$ with d₁ set to 2 and d₂ sampled randomly from a Unif (0.1, 0.5) distribution for k = 1 and 2, and a Unif (0.05, 0.3) distribution for k = 3 and 4 for each of the independent Markov chains. As in the non-parametric case, these values were selected based on a prior expectation of the ranges of the parameters in b_i in stable systems. The resultant conditional distribution p(b|μ_{Z_Norm}, Ψ_{Z_Norm}, τ, Y^*, θ, H, Y_obs) was also non-standard and we used an MH step similar to that for deriving posterior samples of p(Z|L, μ_Z, Ψ_Z, τ, Y^*, θ, H, Y_obs) to obtain posterior samples from this conditional distribution (see equation (A7) in the Appendix).

The b̂_i estimates obtained from using the multivariate normal prior are plotted in Figure 4, with density plots of the corresponding non-parametric posterior samples overlaid on the histograms in the diagonal panels. It can be seen that, compared with the non-parametric posterior samples, the parametric posterior samples of b̂_i did not capture the full ranges of the b̂_i estimates – in particular, the heavy-tailed nature of b̂_11,i and the platykurtic but asymmetric nature of b̂_22,i. Evaluation of the summary statistics of the posterior samples further revealed that whereas the means of the posterior samples were generally close under both prior specifications, the standard deviations of the estimates were about twice as large in the non-parametric case as in the parametric case. This has some parallels to asymptotic results within the linear frequentist framework, where violation of distributional assumptions has been shown to affect the standard error estimates, but not so much the corresponding point estimators (Ljung & Caines, 1979). Imposing the normality assumption also led to misleading conclusions concerning the interrelationships among the b̂_i estimates. For instance, whereas b̂_12,i was found to show a negative relationship with b̂_11,i and a quadratic relationship with b̂_22,i with the use of the DP prior (see Figure 3, third panels in rows 1 and 2), no such relationships were observed when the multivariate normal prior was used (see Figure 4, third panels in rows 1 and 2).

Figure 4.

Matrix scatterplots of the person-specific parameter estimates with a multivariate normal prior, as averaged across three Markov chains. A loess line is imposed on each of the scatterplots. The diagonal plots are histograms of these estimates. For b_11,i, M = 0.86, SD = 0.10, 90% CI [0.69, 1.00]; for b_22,i, M = 0.63, SD = 0.14, 90% CI [0.36, 0.82]; for b_12,i, M = −0.07, SD = 0.08, 90% CI [−0.19, 0.06]; and for b_21,i, M = 0.26, SD = 0.13, 90% CI [0.03, 0.47].

Only minor differences were observed in the person- and time-invariant parameter estimates when the multivariate normal prior was used (see Table 2). All the estimates and their associated statistics were largely similar, with the exception of the parameters in μ, the intercepts of the continuous data array. Most of the elements in μ were greatly reduced in magnitude, and most of their 90% credible intervals included zero. Such differences likely reflected the discrepancies in the predicted trends of the individuals, due presumably to the greater restrictions of range in the b̂_i estimates in the parametric than in the non-parametric case.

Table 2.

Bayesian estimates of time- and person-invariant parameters from fitting the nonlinear DFA model to empirical data using the multivariate normal prior

Parameters	Mean	SD	5th percentile	95th percentile
[λ₂₁, λ₃₁, λ₄₁]	[1.08, 1.02, 0.86]	[0.02, 0.02, 0.02]	[1.05, 1.00, 0.83]	[1.11, 1.06, 0.89]
[λ₆₂, λ₇₂, λ₈₂]	[1.03, 0.77, 0.69]	[0.03, 0.03, 0.02]	[0.99, 0.73, 0.66]	[1.08, 0.81, 0.72]
ψ_ε₁	0.41	0.01	0.39	0.43
ψ_ε₂	0.35	0.01	0.33	0.36
ψ_ε₃	0.40	0.01	0.38	0.42
ψ_ε₄	0.55	0.01	0.53	0.57
ψ_ε₅	0.43	0.02	0.40	0.46
ψ_ε₆	0.29	0.01	0.27	0.31
ψ_ε₇	0.54	0.03	0.49	0.59
ψ_ε₈	0.63	0.02	0.60	0.66
μ₁	0.04	0.02	0.01	0.07
μ₂	0.01	0.02	−0.03	0.05
μ₃	0.02	0.02	−0.01	0.05
μ₄	0.00	0.02	−0.03	0.04
μ₅	0.00	0.02	−0.03	0.03
μ₆	0.01	0.02	−0.02	0.04
μ₇	0.00	0.02	−0.03	0.03
μ₈	0.00	0.02	−0.02	0.03
[τ₁₂, τ₁₃, τ₁₄, τ₁₅]	[−0.21, 0.39, 0.80, 1.39]	[0.01, 0.02, 0.02, 0.02]	[−0.23, 0.36, 0.77, 1.35]	[−0.19, 0.41, 0.83, 1.42]
[τ₂₂, τ₂₃, τ₂₄, τ₂₅]	[−1.16, −0.47, 0.12, 0.75]	[0.02, 0.02, 0.02, 0.02]	[−1.19, −0.50, 0.09, 0.72]	[−1.13, −0.44, 0.15, 0.78]
[τ₃₂, τ₃₃, τ₃₄, τ₃₅]	[−0.88, −0.31, 0.21, 0.88]	[0.01, 0.02, 0.02, 0.02]	[−0.90, −0.34, 0.18, 0.85]	[−0.86, −0.29, 0.23, 0.91]
[τ₄₂, τ₄₃, τ₄₄, τ₄₅]	[−0.60, 0.19, 0.65, 1.21]	[0.01, 0.02, 0.02, 0.02]	[−0.63, 0.17, 0.62, 1.18]	[−0.58, 0.22, 0.69, 1.25]
[τ₅₂, τ₅₃, τ₅₄, τ₅₅]	[0.91, 1.74, 2.24, 2.72]	[0.02, 0.04, 0.04, 0.05]	[0.88, 1.68, 2.17, 2.63]	[0.95, 1.80, 2.32, 2.79]
[τ₆₂, τ₆₃, τ₆₄, τ₆₅]	[0.69, 1.50, 1.98, 2.48]	[0.02, 0.03, 0.04, 0.05]	[0.65, 1.45, 1.91, 2.39]	[0.72, 1.56, 2.05, 2.55]
[τ₇₂, τ₇₃, τ₇₄, τ₇₅]	[0.79, 1.64, 2.13, 2.66]	[0.03, 0.05, 0.06, 0.06]	[0.75, 1.57, 2.04, 2.56]	[0.84, 1.73, 2.23, 2.77]
[τ₈₂, τ₈₃, τ₈₄, τ₈₅]	[0.20, 0.98, 1.48, 1.95]	[0.02, 0.02, 0.03, 0.03]	[0.18, 0.94, 1.43, 1.90]	[0.23, 1.02, 1.53, 2.00]
ϕ₁₁	0.11	0.00	0.10	0.12
ϕ₁₂	−0.10	0.00	−0.11	−0.09
ϕ₂₂	0.23	0.01	0.22	0.25


To summarize, by using the DP as a non-parametric prior for parameters in a nonlinear DFA model, we found that the distributions of the two autoregressive parameters, b_11,i and b_22,i, did deviate substantially from normality. Using other parametric (e.g., normal) distributions to approximate these skewed distributions did not reveal the full range of individual differences in these parameters and led to misleading conclusions concerning some of the more complex, and often nonlinear, interrelationships among the b̂_i estimates.

7. Simulation study

To better understand the performance of the proposed procedures under known population conditions, we generated data using the nonlinear DFA model with different distributions for b_i but with approximately the same complete sample size and number of time points as our empirical example, with n = 170 and T = 50. The true parameters in our simulations were chosen to mirror (though not to be completely identical to) the estimates obtained from our empirical example. A total of 100 Monte Carlo replications was conducted for each of the simulation conditions described below.

Parameters in the measurement model (see equations (9)–(11)) were identical in all our simulation models and they were set to the values of

\begin{aligned}
Λ' &= \begin{bmatrix} 1.0 & 0.8 & 0.8 & 0.8 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1.0 & 0.8 & 0.8 & 0.8 \end{bmatrix}, \\
Ψ_{ε} &= 0.8 \times I_{8}, \\
μ &= (0, 0, \dots, 0), \\
τ_{k} &= (-3.0, -2.0, -1.0, 0.0, 0.5, 2.0), \quad \text{for } k = 1, \dots, 4, \\
τ_{k} &= (-1.0, -0.5, 0.0, 1.0, 1.5, 2.0), \quad \text{for } k = 5, \dots, 8.
\end{aligned}

(14)

For identification purposes, each factor’s loading on the first indicator was set to the true value of 1.0 in model fitting. In all conditions, the true process noise covariance matrix was set to

Ψ_{ζ} = \begin{bmatrix} 1 & \\ -0.3 & 1 \end{bmatrix}.

We tested the effectiveness of using the DP prior to approximate three sets of distributional conditions.

Condition 1. Here, we defined the distributions of b_i as

\begin{aligned}
b_{11, i} &\sim Beta(3, 8), & b_{22, i} &\sim Beta(3, 8), \\
b_{12, i} &\sim N(-0.15, 0.001), & b_{21, i} &\sim N(-0.15, 0.001),
\end{aligned}

(15)

where b_11,i, b_22,i, b_12,i, and b_21,i are as defined in equation (7). This condition was designed to generate positively skewed distributions for the baseline autoregressive parameters, b_11,i and b_22,i.

Condition 2. Here, we specified some of the distributions to be bimodal:

\begin{aligned}
b_{11, i} &\sim Unif(0.2, 0.4) \text{ for } i = 1, \dots, n/2, & b_{11, i} &\sim Unif(0.7, 0.8) \text{ for } i = n/2 + 1, \dots, n, \\
b_{22, i} &\sim Unif(0.2, 0.4) \text{ for } i = 1, \dots, n/2, & b_{22, i} &\sim Unif(0.7, 0.8) \text{ for } i = n/2 + 1, \dots, n, \\
b_{12, i} &\sim N(-0.15, 0.001), & b_{21, i} &\sim N(-0.15, 0.001).
\end{aligned}

(16)

Condition 3. Here, we specified the distributions of b_i to be

\begin{aligned}
b_{11, i} &\sim N(0.6, 0.005), & b_{22, i} &\sim N(0.6, 0.005), \\
b_{12, i} &\sim N(-0.15, 0.001), & b_{21, i} &\sim N(-0.15, 0.001).
\end{aligned}

(17)

This condition was used to illustrate that even when the normality assumption holds, the DP prior is still general enough to capture characteristics of the multivariate normal distribution as a special case.
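The three generating conditions in equations (15)–(17) can be sketched as follows. The second argument of each normal distribution is treated as a variance, an assumption that agrees with the true standard deviations of about 0.03 and 0.07 reported in Table 6; the function name `draw_b`, the seed, and the tuple layout are hypothetical.

```python
import math
import random


def draw_b(condition, n=170, seed=0):
    """Draw person-specific (b11, b22, b12, b21) values for one condition."""
    rng = random.Random(seed)
    cross_sd = math.sqrt(0.001)  # assumption: N(mean, variance) notation
    rows = []
    for i in range(n):
        if condition == 1:
            # Positively skewed autoregressive parameters.
            b11, b22 = rng.betavariate(3, 8), rng.betavariate(3, 8)
        elif condition == 2:
            # Bimodal: first half of the sample low, second half high.
            lo, hi = (0.2, 0.4) if i < n // 2 else (0.7, 0.8)
            b11, b22 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        elif condition == 3:
            auto_sd = math.sqrt(0.005)
            b11, b22 = rng.gauss(0.6, auto_sd), rng.gauss(0.6, auto_sd)
        else:
            raise ValueError("condition must be 1, 2, or 3")
        b12, b21 = rng.gauss(-0.15, cross_sd), rng.gauss(-0.15, cross_sd)
        rows.append((b11, b22, b12, b21))
    return rows


if __name__ == "__main__":
    for condition in (1, 2, 3):
        sample = draw_b(condition)
        mean_b11 = sum(row[0] for row in sample) / len(sample)
        print("condition", condition, "mean b11:", round(mean_b11, 2))
```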

The same starting values were used in all conditions. All freed elements in Λ were set to 0.5. Initial Ψ_ε was set to I₈ and μ was set to (0.5, 0.5, …, 0.5). Initial values for τ_k were set to (−3.0, −1.5, −0.5, 0.5, 1.0, 2.0) for k = 1, …, 4 and (−1.0, 0.0, 0.5, 0.8, 1.3, 2.0) for k = 5, …, 8. Starting values for the process noise covariance matrix were set to

Ψ_{ζ} = \begin{bmatrix} 1.2 & \\ 0.5 & 0.8 \end{bmatrix}.

In addition, all values in the latent variable vector, η_it, were set to 0 for all persons and time points, and initial values of b_i were sampled randomly from N_r(μ_Z₀, I_r).

The same prior distributions and hyperparameters used in the empirical application were adopted in the simulation study with some minor adaptations. Specifically, based on characteristics of the distributions of b_i in each simulation condition (see equations (15)–(17)) and the acceptance rates for the MH step for drawing posterior samples from p(Z|L, μ_Z, Ψ_Z, τ, Y^*, θ, H, Y_obs), we specified Ψ_{μ_Z} to be 1.0 and 15.0 and c₂ to be 14.0 and 0.4, respectively, for the first two and last two elements of b_i in the first condition. In the second condition, we changed c₂ to 4.0 and 0.4, respectively, for the first two and last two elements of b_i while keeping the other hyperparameters the same as those in condition 1. Finally, in condition 3, we changed c₂ to 1.0 and 0.3 and Ψ_{μ_Z} to 1.0 and 20.0, respectively, for the first two and last two elements of b_i. In the first two conditions, the tails of the distributions of the b_i parameters extended relatively far away from the modes of the distributions compared with condition 3. Thus, the hyperparameters were modified accordingly to yield better approximation in the tail areas. In the empirical application, these hyperparameters were drawn randomly from ranges that were broad enough to encompass all the hyperparameter values noted here.

We organized our simulation results into three sections to summarize results pertaining to factor score (i.e., state) estimation; estimation of all time- and person-invariant parameters; and estimation of all person-specific parameters. A total of 20,000 burn-in iterations was used in each Monte Carlo replication. We used a relatively high number of burn-in iterations due to the greater difficulties involved in recovering the person-specific distributions for b_12,i and b_21,i, whose impact on the system’s dynamics was only evident at high values of the latent variables. It is thus harder to recover the shapes of these parameters’ distributions given the moderate sample sizes of n = 170 and T = 50.

7.1. Factor score or state estimation

For illustration purposes, the estimated and true values of factor 1 from one randomly selected case during one particular Monte Carlo run are plotted in Figure 5. The estimates plotted in Figure 5 were the means of the posterior samples drawn from the posterior density of the selected individual, p(H_i|τ, Y^*, θ, Y_obs), for each time point after the burn-in iterations.

Figure 5.

True factor scores, estimated factor scores and intervals constructed from twice the standard deviation of the posterior samples of the first latent variable for one randomly selected hypothetical subject in (a) condition 1, (b) condition 2, and (c) condition 3.

It can be seen that the proposed algorithm was able to recover the true factor scores accurately across all the conditions. Although factor score estimation is not an issue of primary interest in our simulation or empirical examples, it is an important component in studies where the primary interest is to obtain longitudinal factor score estimates or estimates of time-varying parameters that are represented as latent variables (see Young, Pedregal, & Tych, 1999).

7.2. Time- and person-invariant parameters

All time- and person-invariant parameter estimates are summarized in Tables 3–5 for conditions 1, 2, and 3, respectively. Included in the tables are the biases (for parameter l, $bias = \sum_{r = 1}^{100} \bar{θ}_{l, r} / 100 - θ_{l}$, where θ_l is the true value for parameter l and θ̄_l,r is the average of the Gibbs samples of parameter l during the rth Monte Carlo run after burn-in), root mean squared errors (RMSE, given by $\sqrt{\sum_{r = 1}^{100} (\bar{θ}_{l, r} - θ_{l})^{2} / 100}$), empirical standard deviations of each parameter across the 100 Monte Carlo runs (denoted by SD), standard deviations of the Gibbs samples of each parameter averaged across Monte Carlo runs (denoted by Est SD) and the 90% coverage rates (percentage of Monte Carlo runs for which the 90% credible intervals for parameter l contained the corresponding true value, θ_l).
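The summary measures just defined can be computed from per-run posterior summaries as in the sketch below. The function name `monte_carlo_summary` and its argument layout are hypothetical, and the empirical SD uses the n − 1 divisor, which is an assumption because the divisor is not stated in the text.

```python
import math
import statistics


def monte_carlo_summary(true_value, post_means, post_sds, lower, upper):
    """Bias, RMSE, SD, Est SD and coverage for one parameter across runs.

    post_means[r], post_sds[r]: mean and SD of the Gibbs samples in run r.
    lower[r], upper[r]: 5th and 95th percentiles of the Gibbs samples in run r.
    """
    n_runs = len(post_means)
    bias = statistics.fmean(post_means) - true_value
    rmse = math.sqrt(statistics.fmean((m - true_value) ** 2 for m in post_means))
    covered = sum(lo <= true_value <= hi for lo, hi in zip(lower, upper))
    return {
        "bias": bias,
        "rmse": rmse,
        "sd": statistics.stdev(post_means),    # spread of estimates across runs
        "est_sd": statistics.fmean(post_sds),  # average posterior SD
        "coverage": 100.0 * covered / n_runs,  # percentage of runs covering truth
    }


if __name__ == "__main__":
    # Four hypothetical runs for a parameter whose true value is 1.0.
    print(monte_carlo_summary(
        1.0,
        post_means=[0.9, 1.1, 1.0, 1.2],
        post_sds=[0.1, 0.1, 0.1, 0.1],
        lower=[0.7, 0.9, 0.8, 1.05],
        upper=[1.1, 1.3, 1.2, 1.35],
    ))
```

Comparing `sd` with `est_sd` indicates whether the posterior standard deviations reflect the actual sampling variability of the estimates, which is the comparison made between the SD and Est SD columns of the tables.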

Table 3.

Bayesian estimates of the person-invariant parameters with beta-distributed baseline autoregressive parameters (condition 1) in the simulation study

Parameters	Est SD	SD	Bias	RMSE	Coverage (%)
[λ₂₁, λ₃₁, λ₄₁]	[0.02, 0.02, 0.02]	[0.02, 0.02, 0.02]	[0.000, −0.006, −0.005]	[0.02, 0.02, 0.02]	[83, 85, 82]
[λ₆₂, λ₇₂, λ₈₂]	[0.02, 0.02, 0.02]	[0.02, 0.02, 0.02]	[−0.002, −0.001, 0.000]	[0.02, 0.02, 0.02]	[87, 89, 92]
ψ_ε₁	0.02	0.02	0.003	0.02	87
ψ_ε₂	0.02	0.02	0.005	0.02	90
ψ_ε₃	0.02	0.02	−0.005	0.02	89
ψ_ε₄	0.02	0.02	0.004	0.02	90
ψ_ε₅	0.02	0.02	0.003	0.03	91
ψ_ε₆	0.02	0.02	0.002	0.02	85
ψ_ε₇	0.02	0.02	−0.001	0.02	86
ψ_ε₈	0.02	0.02	0.005	0.02	87
μ₁	0.03	0.03	0.000	0.03	80
μ₂	0.03	0.03	−0.002	0.03	88
μ₃	0.03	0.03	0.006	0.03	85
μ₄	0.03	0.03	0.004	0.03	89
μ₅	0.02	0.02	0.006	0.02	90
μ₆	0.02	0.02	0.002	0.02	93
μ₇	0.02	0.02	0.002	0.02	90
μ₈	0.02	0.02	0.004	0.02	89
[τ₁₂, τ₁₃, τ₁₄, τ₁₅]	[0.03, 0.03, 0.02, 0.02]	[0.04, 0.04, 0.03, 0.03]	[−0.003, −0.003, −0.003, −0.001]	[0.04, 0.04, 0.03, 0.03]	[84, 78, 79, 76]
[τ₂₂, τ₂₃, τ₂₄, τ₂₅]	[0.04, 0.03, 0.03, 0.02]	[0.05, 0.04, 0.03, 0.03]	[−0.006, −0.007, −0.004, −0.003]	[0.05, 0.04, 0.03, 0.03]	[83, 83, 83, 81]
[τ₃₂, τ₃₃, τ₃₄, τ₃₅]	[0.04, 0.03, 0.03, 0.02]	[0.04, 0.03, 0.03, 0.02]	[0.009, 0.009, 0.005, 0.003]	[0.04, 0.03, 0.03, 0.02]	[89, 85, 87, 91]
[τ₄₂, τ₄₃, τ₄₄, τ₄₅]	[0.04, 0.03, 0.03, 0.03]	[0.04, 0.04, 0.03, 0.03]	[0.009, 0.004, 0.005, 0.005]	[0.04, 0.04, 0.03, 0.03]	[90, 82, 80, 84]
[τ₅₂, τ₅₃, τ₅₄, τ₅₅]	[0.01, 0.02, 0.02, 0.02]	[0.01, 0.02, 0.02, 0.02]	[0.001, 0.001, 0.003, 0.003]	[0.01, 0.02, 0.02, 0.02]	[88, 89, 82, 85]
[τ₆₂, τ₆₃, τ₆₄, τ₆₅]	[0.01, 0.02, 0.02, 0.02]	[0.01, 0.02, 0.02, 0.02]	[0.000, 0.000, −0.001, −0.001]	[0.01, 0.02, 0.02, 0.02]	[86, 85, 82, 85]
[τ₇₂, τ₇₃, τ₇₄, τ₇₅]	[0.01, 0.02, 0.02, 0.02]	[0.01, 0.02, 0.02, 0.02]	[−0.001, −0.002, −0.001, −0.002]	[0.01, 0.02, 0.02, 0.02]	[89, 84, 88, 86]
[τ₈₂, τ₈₃, τ₈₄, τ₈₅]	[0.01, 0.02, 0.02, 0.02]	[0.01, 0.02, 0.02, 0.02]	[−0.001, 0.001, 0.002, 0.002]	[0.01, 0.02, 0.02, 0.02]	[87, 89, 84, 85]
ϕ₁₁	0.03	0.04	0.007	0.04	79
ϕ₁₂	0.02	0.02	0.002	0.02	89
ϕ₂₂	0.03	0.04	0.001	0.04	88


Table 5.

Bayesian estimates of the person-invariant parameters with normally distributed baseline autoregressive parameters (condition 3) in the simulation study

Parameters	Est SD	SD	Bias	RMSE	Coverage (%)
[λ₂₁, λ₃₁, λ₄₁]	[0.02, 0.02, 0.02]	[0.02, 0.02, 0.02]	[−0.001, 0.001, −0.002]	[0.02, 0.02, 0.02]	[86, 90, 80]
[λ₆₂, λ₇₂, λ₈₂]	[0.02, 0.02, 0.02]	[0.01, 0.02, 0.01]	[−0.002, 0.000, −0.001]	[0.01, 0.02, 0.01]	[91, 87, 89]
ψ_ε₁	0.02	0.02	−0.003	0.02	84
ψ_ε₂	0.02	0.02	0.000	0.02	87
ψ_ε₃	0.02	0.02	0.002	0.02	90
ψ_ε₄	0.02	0.02	0.000	0.02	82
ψ_ε₅	0.02	0.02	0.009	0.03	87
ψ_ε₆	0.02	0.02	0.002	0.02	85
ψ_ε₇	0.02	0.02	0.001	0.02	85
ψ_ε₈	0.02	0.02	0.001	0.02	86
μ₁	0.03	0.04	0.004	0.04	84
μ₂	0.03	0.03	0.002	0.03	79
μ₃	0.03	0.03	0.001	0.03	84
μ₄	0.03	0.03	−0.001	0.03	84
μ₅	0.03	0.03	0.000	0.02	76
μ₆	0.02	0.02	−0.001	0.02	83
μ₇	0.02	0.03	−0.001	0.03	82
μ₈	0.02	0.02	−0.001	0.02	81
[τ₁₂, τ₁₃, τ₁₄, τ₁₅]	[0.03, 0.03, 0.02, 0.02]	[0.03, 0.03, 0.02, 0.02]	[0.001, 0.002, 0.003, 0.003]	[0.03, 0.03, 0.02, 0.02]	[82, 84, 87, 85]
[τ₂₂, τ₂₃, τ₂₄, τ₂₅]	[0.04, 0.03, 0.03, 0.02]	[0.04, 0.04, 0.03, 0.03]	[−0.003, −0.001, 0.001, 0.001]	[0.04, 0.04, 0.03, 0.03]	[84, 79, 83, 86]
[τ₃₂, τ₃₃, τ₃₄, τ₃₅]	[0.04, 0.03, 0.03, 0.02]	[0.04, 0.03, 0.03, 0.02]	[−0.003, −0.001, −0.001, −0.003]	[0.04, 0.03, 0.03, 0.02]	[83, 86, 86, 86]
[τ₄₂, τ₄₃, τ₄₄, τ₄₅]	[0.04, 0.03, 0.03, 0.02]	[0.04, 0.03, 0.03, 0.03]	[−0.002, −0.001, −0.001, −0.002]	[0.04, 0.03, 0.03, 0.03]	[92, 85, 89, 89]
[τ₅₂, τ₅₃, τ₅₄, τ₅₅]	[0.01, 0.02, 0.02, 0.02]	[0.01, 0.02, 0.02, 0.02]	[0.000, 0.001, 0.002, 0.001]	[0.01, 0.02, 0.02, 0.02]	[89, 90, 84, 89]
[τ₆₂, τ₆₃, τ₆₄, τ₆₅]	[0.01, 0.02, 0.02, 0.02]	[0.01, 0.02, 0.02, 0.02]	[−0.001, 0.000, −0.001, 0.000]	[0.01, 0.01, 0.02, 0.02]	[94, 92, 85, 89]
[τ₇₂, τ₇₃, τ₇₄, τ₇₅]	[0.01, 0.02, 0.02, 0.02]	[0.01, 0.02, 0.02, 0.02]	[0.000, 0.001, 0.000, −0.001]	[0.01, 0.02, 0.02, 0.02]	[89, 92, 94, 89]
[τ₈₂, τ₈₃, τ₈₄, τ₈₅]	[0.01, 0.02, 0.02, 0.02]	[0.01, 0.02, 0.02, 0.02]	[−0.001, 0.000, −0.001, −0.002]	[0.01, 0.02, 0.02, 0.02]	[93, 92, 87, 98]
ϕ₁₁	0.03	0.03	0.003	0.03	91
ϕ₁₂	0.02	0.02	0.001	0.02	93
ϕ₂₂	0.03	0.03	0.006	0.03	92


All the parameters were recovered accurately across all three conditions (see biases and RMSEs in Tables 3–5). The average standard deviations (Est SD) of the Gibbs samples were also close to the empirical standard deviations of the parameters. The coverage rates computed using the 5th and 95th percentiles of the Gibbs samples were on average close to but slightly below the 90% nominal rate. Average coverage rates were 85.82%, 88.26%, and 86.86%, respectively, for the three conditions.

Condition 1, which was characterized by positively skewed autoregressive parameter distributions, showed comparable biases and RMSEs to other conditions. Slightly greater discrepancies arose in the tail percentiles, thus yielding slightly lower coverage rates for this condition than for the other two conditions. As will be discussed in Section 7.3, this is directly attributable to the biases in estimating the tail areas of the person-specific parameter distributions in condition 1. Such biases in the person-specific parameters also affected the coverage rates of other person-invariant parameters.

The relatively low average coverage rates in the normal condition were attributable largely to the low coverage rates of the parameters in μ. The coverage rates for the parameters in μ were notably lower in condition 3 than in conditions 1 and 2. This may be related to the fact that a small number of the b_i parameters actually lay near the boundary of or within the non-stationary region. By ‘non-stationary region’, we mean ranges of parameters that would propel a system to show trends or, specifically, continual deviations from its baseline levels, as defined by μ (as in Figure 5c, for example). As a result, the system also shows increasing variance over time. Larger biases are typically observed in such cases irrespective of other features of the simulations. Even though we used normal distributions of relatively restricted ranges for this condition to confine most cases to the stationary region, some of the parameters in b_i still crossed or lay near the boundary of the non-stationary region. These cases led to more extreme μ estimates that affected the coverage rates for μ in condition 3 directly.

7.3. Estimation of person-specific parameters, b_i

Our main focus of interest was to compare the distribution of true b_i (for i = 1, …, n) to the distribution of b̂_i obtained from Gibbs sampling. The means and standard deviations of b̂_i (derived using equation (13)) computed across persons are summarized in Table 6. Further details are summarized in Figure 6. Included in the figure are plots of b̂_11,i and b̂_12,i during one particular Monte Carlo run in comparison to the densities of the true person-specific parameters generated using equations (15)–(17) (see panels (a) and (b), respectively), and quantile–quantile plots of the true b_11,i and b_12,i values against the b̂_11,i and b̂_12,i estimates pooled across all Monte Carlo runs.

Table 6.

Means and standard deviations of the b_i estimates

	Condition 1				Condition 2				Condition 3

Parameters	Mean	Est Mean	SD	Est SD	Mean	Est Mean	SD	Est SD	Mean	Est Mean	SD	Est SD
b_11,i	0.27	0.24	0.13	0.12	0.53	0.49	0.23	0.23	0.60	0.60	0.07	0.07
b_22,i	0.27	0.24	0.13	0.11	0.53	0.49	0.23	0.22	0.59	0.59	0.07	0.08
b_12,i	−0.15	−0.11	0.03	0.03	−0.15	−0.13	0.03	0.04	−0.15	−0.19	0.03	0.04
b_21,i	−0.15	−0.11	0.03	0.03	−0.15	−0.12	0.03	0.04	−0.15	−0.17	0.03	0.04


Note. Mean, true empirical mean of the distribution; Est Mean, mean of the posterior samples; SD, true empirical standard deviation of the distribution; Est SD, standard deviation of the posterior samples.

Figure 6.

(a), (e), (i) True density of b_11,i and the approximation density constituted by b̂_i for conditions 1, 2, and 3, respectively. (b), (f), (j) The corresponding plots associated with b_12,i. (c), (g), (k) Quantile–quantile plots comparing the true and estimated b_11,i pooled across all Monte Carlo runs; the straight line provides the reference for y = x. (d), (h), (l) Quantile–quantile plots comparing the true and estimated b_12,i; the straight line provides the reference for y = x.

Generally, the means and standard deviations of the true b_i distributions were recovered very accurately across all conditions (see Table 6). Several additional observations can be noted based on Figure 6. First, the conditional distributions derived from using the DP prior were flexible enough to recover the general shapes of the different distributions of b_i used in all three conditions. Second, the shapes of the distributions were more accurately recovered in condition 3 than in the other two conditions because the associated true densities (i.e., multivariate normal) were of the same form as the base distribution.

Third, in condition 1, discrepancies in the estimation of b_11,i arose primarily in the lower tail region. That is, the lower tail extended too far into the negative region, but the densities in the immediately adjacent regions did not rise rapidly enough to capture some of the more subtle changes in the lower quantiles of the positively skewed true distribution. Fourth, in condition 2, the approximation density generally resembled the bimodal density of the true distribution of b_11,i, but slight discrepancies were observed near the tail areas of the two modes. In particular, the upper tails of the two modes were assigned too much weight whereas the lower tails were too sparsely represented.

Finally, greater biases were observed in the estimates b̂_12,i and b̂_21,i than in b̂_11,i and b̂_22,i across all conditions (see Table 6 and Figures 6d, 6h, and 6l). This is not surprising, since the nonlinear model posited that the impacts due to b_12,i and b_21,i would only be fully manifested at extremely high values of the latent variables. Such instances were relatively rare in the simulated data, with only 50 time points and n = 170. Thus, even though the distributions constituted by b̂_12,i and b̂_21,i generally provided reasonable approximations to the shapes of the distributions of b_12,i and b_21,i, the means of the distributions were slightly offset (i.e., biased).

In sum, the proposed estimation procedures were able to recover all the components in the nonlinear DFA model accurately under diverse distributional assumptions for the parameters in b_i. The sample size considered in the present simulation (with n = 170 and T = 50) yielded reasonable estimates, although larger sample sizes might be needed to improve the accuracy of the b̂_12,i and b̂_21,i estimates. In addition, although one particular nonlinear DFA model was considered in all the analyses, the proposed procedures are general enough to be used with any dynamic model with differentiable linear and nonlinear functions and normally distributed process noises (i.e., in the form of equation (8)). To illustrate the performance of the proposed estimation procedures within a linear modelling framework, we conducted a supplementary simulation study using a variation of the linear DFA model in equation (6). In particular, we allowed the parameters b₁₁, b₁₂, b₂₁, and b₂₂ in equation (6) to vary over persons and used the DP to approximate their corresponding distributions (generated in the same way as in conditions 1, 2, and 3 in the present simulation). Results based on the linear model are comparable to those obtained from the nonlinear model. Further details are not reported here due to space constraints, but they are available as supplementary materials on the first author’s website at http://www.unc.edu/~symiin/Sy-Miin%27s%20website/pub.htm

8. Discussion

In the present paper, we used the DP as a non-parametric prior distribution for selected parameters in a nonlinear DFA model. Using the DP as a prior is equivalent to specifying the prior distribution as a mixture distribution composed of an unknown number of discrete point masses. This approach thus provides the flexibility of a non-parametric mixture approach without the need to define the precise number of (or the range of possible numbers of) mixture components required to approximate an unknown distribution. In addition, we also incorporated several MH procedures within a Gibbs-sampling framework to handle some of the non-standard conditional distributions implicated in the proposed nonlinear DFA model.

A series of empirical and simulation examples was used to illustrate the flexibility of the proposed approach in approximating distributions of various shapes (e.g., normal, bimodal, and skewed). Our empirical example revealed that the baseline autoregressive parameters in our proposed DFA model did in fact show substantial deviations from normality. Using a multivariate normal prior did not reveal the full ranges of the associated parameters and their complex interrelationships. Researchers may thus risk bypassing the true nature of the emotional processes being modelled if parametric assumptions are imposed without any evaluation of their tenability.

Our empirical example provided several new insights into Zautra and colleagues’ dynamic affect model. Using one particular non-linear model, we validated that the linkage between PE and NE did indeed intensify on the more emotional days. A unidirectional relation was found in the direction from PE to NE for most individuals when their PE was at high extreme values. The lack of coupling in the direction from NE to PE might be attributable to the participants’ generally low NE levels. Thus, very few participants actually manifested a similar change in the direction from NE to PE. By allowing the dynamic parameters in the model to assume non-parametric forms, we were able to evaluate such individual differences more thoroughly.

Some comments can be noted concerning the selection of hyperparameters for the base weight, α. In all our examples, we used hyperparameter choices (i.e., α₁ and α₂ in equation (12)) that yielded relatively high values of α to capture clusters or ‘sticks’ that were relatively far away from the means or modes of the distributions. Higher values of α are typically needed to approximate distributions that are of high dimensions. Generally, given the moderate sample size in the present study, recovering higher-order moments of the distributions of interest can be difficult. We were able to recover the first and second moments reasonably accurately, however.
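The link between α and the number of occupied clusters can be illustrated with a standard result for the untruncated DP: among n draws, the expected number of distinct clusters is the sum of α/(α + i − 1) over i = 1, …, n. The sketch below is only an illustration under that assumption; it ignores the truncation level G used in our analyses, and the function name is hypothetical.

```python
def expected_clusters(alpha, n):
    """Expected number of distinct clusters among n draws from an untruncated DP.

    Uses the standard identity E[K] = sum_{i=1}^{n} alpha / (alpha + i - 1).
    This is an approximation for the truncated DP (it ignores the cap at G).
    """
    return sum(alpha / (alpha + i) for i in range(n))


# Larger alpha spreads the n = 170 individuals over more clusters.
for alpha in (0.5, 1.0, 5.0, 20.0):
    print(alpha, round(expected_clusters(alpha, 170), 2))
```

Under this approximation the expected number of clusters grows roughly logarithmically in n for fixed α, which is one way to see why higher values of α are needed to populate clusters far from the centre of the distribution.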

The present paper is one of the first applications of semiparametric nonlinear dynamic LVMs to studying change in psychology using the DP prior. Many other extensions are, of course, possible. For instance, we did not pursue the issue of model comparison in the present paper. When different prior densities are assumed across two models, some of the common model fit indices within the Bayesian framework cannot be utilized directly without some modifications. In particular, computing the Bayes factor via path sampling (Gelman & Meng, 1998) is not a straightforward matter in this case because of the complexity involved in linking the discrepant prior densities from the different models. Other test statistics, such as those developed by Zhu and Zhang (2004) for assessing finite mixture regression models, can potentially be extended to a dynamic LVM framework to provide a more formal assessment of the need to use infinite-order mixture distributions. Another relatively recent model assessment index, termed the L measure (Chen, Dey, & Ibrahim, 2004; Ibrahim, Chen, & Sinha, 2001), has also been advocated as an alternative goodness-of-fit index that works well in situations where proper prior distributions cannot be explicitly derived.

With regard to state (or latent variable score) estimation, we only used first-order linearization to derive the proposal distribution in the MH step. The potential utility of using higher-order linearization schemes or other proposal functions (e.g., Geweke & Tanizaki, 2001) in the state density sampling step could be evaluated in future studies. Hybrid algorithms that combine more computationally efficient particle filtering techniques with MCMC algorithms (Doucet, de Freitas, & Gordon, 2001) could also be developed to aid computation speed. In addition, specification of the person-specific parameters could be reformulated within a mixed effects framework to include both fixed and random effects components. Further investigation of the tenability of assuming a missing at random mechanism for the present data set is also warranted.

Using the DP as a non-parametric prior is not without its limitations. The biggest limitation resides perhaps in the discrete nature of the DP, which dictates that different individuals who are assigned to the same ‘cluster’ (where the number of clusters is less than n) would have exactly the same parameter values. One way to circumvent this issue is to use the mixture DP (Caron, Davy, Doucet, Duflos, & Vanheeghe, 2008; Escobar & West, 1995) as an alternative choice. In this case, the prior distribution essentially consists of a mixture of different DP priors. In doing so, different individuals may be assigned similar but not identical values on the parameters or constructs of interest. A second limitation is that the accuracy of the DP approximation is still constrained by the choice of the base distribution. In cases where the true distribution of interest deviates too much from the base distribution, the accuracy of the approximation would also deteriorate accordingly.

The nonlinear DFA model proposed in the present paper is but one example of the many dynamic models that can be used to describe change processes. A wide array of modelling examples along these lines can be found in the literature on dynamic linear models (West & Harrison, 1997), dynamic generalized linear models (Fahrmeir & Tutz, 1994), regime-switching state-space models (Kim & Nelson, 1999), differential equation models (Molenaar & Newell, 2003; Singer, 2007), and other nonlinear and non-Gaussian dynamic models (Chow, Ferrer, & Hsieh, 2009; Durbin & Koopman, 2001). Whereas formulating dynamic models within a Bayesian framework opens up countless new possibilities for evaluating more complex models, many of the issues inherent to the Bayesian framework also have to be handled with caution. Sensitivity of the modelling results to prior choices and misspecification of other parametric/nonparametric assumptions remains an important issue that deserves more attention from researchers. Parallel to the increase in model complexity are, of course, new challenges for deriving appropriate model fit indices and convergence diagnostics.

To conclude, the DP prior can be used as a flexible non-parametric prior for distributions whose functional forms are unknown. In time series modelling, it is not unusual to encounter parameter distributions that show complex restrictions in range. Very often, the associated distributions are not only non-normal, but also asymmetric. Taking a non-parametric or semiparametric approach allows the assumption of normality to be treated as a testable hypothesis, as opposed to a ‘gold standard’ by which modellers have to abide. We hope to have illustrated the need to relax some of these parametric assumptions in fitting dynamic LVMs.

Table 4.

Bayesian estimates of the person-invariant parameters with bimodal baseline autoregressive parameter distributions (condition 2) in the simulation study

Parameters	Est SD	SD	Bias	RMSE	Coverage (%)
[λ₂₁, λ₃₁, λ₄₁]	[0.02, 0.02, 0.02]	[0.01, 0.02, 0.02]	[0.002, 0.001, 0.002]	[0.01, 0.02, 0.02]	[90, 84, 89]
[λ₆₂, λ₇₂, λ₈₂]	[0.02, 0.02, 0.02]	[0.02, 0.01, 0.01]	[0.000, 0.000, 0.001]	[0.02, 0.01, 0.01]	[87, 91, 93]
ψ_ε₁	0.02	0.03	−0.001	0.02	83
ψ_ε₂	0.02	0.02	0.003	0.02	88
ψ_ε₃	0.02	0.02	−0.000	0.02	90
ψ_ε₄	0.02	0.02	0.004	0.02	88
ψ_ε₅	0.02	0.02	−0.000	0.03	87
ψ_ε₆	0.02	0.02	0.000	0.02	88
ψ_ε₇	0.02	0.02	−0.001	0.02	90
ψ_ε₈	0.02	0.02	0.001	0.02	87
μ₁	0.03	0.03	0.008	0.03	85
μ₂	0.03	0.03	−0.001	0.03	86
μ₃	0.03	0.03	0.005	0.03	89
μ₄	0.03	0.03	0.002	0.03	83
μ₅	0.02	0.02	0.005	0.02	91
μ₆	0.02	0.02	0.005	0.02	92
μ₇	0.02	0.02	0.003	0.02	91
μ₈	0.02	0.02	0.002	0.02	92
[τ₁₂, τ₁₃, τ₁₄, τ₁₅]	[0.03, 0.03, 0.02, 0.02]	[0.04, 0.03, 0.03, 0.03]	[0.007, 0.004, 0.006, 0.004]	[0.04, 0.03, 0.03, 0.03]	[87, 89, 80, 85]
[τ₂₂, τ₂₃, τ₂₄, τ₂₅]	[0.04, 0.03, 0.03, 0.02]	[0.04, 0.03, 0.03, 0.03]	[−0.006, −0.004, −0.007, −0.004]	[0.04, 0.03, 0.03, 0.03]	[85, 89, 84, 83]
[τ₃₂, τ₃₃, τ₃₄, τ₃₅]	[0.04, 0.03, 0.02, 0.02]	[0.04, 0.03, 0.03, 0.03]	[0.006, 0.002, −0.002, −0.001]	[0.04, 0.03, 0.03, 0.03]	[88, 82, 85, 84]
[τ₄₂, τ₄₃, τ₄₄, τ₄₅]	[0.04, 0.03, 0.02, 0.03]	[0.04, 0.03, 0.02, 0.02]	[−0.003, −0.000, −0.003, −0.001]	[0.04, 0.03, 0.02, 0.02]	[84, 87, 88, 93]
[τ₅₂, τ₅₃, τ₅₄, τ₅₅]	[0.01, 0.02, 0.02, 0.02]	[0.01, 0.01, 0.02, 0.01]	[−0.001, 0.001, −0.001, −0.001]	[0.01, 0.01, 0.02, 0.02]	[89, 91, 90, 92]
[τ₆₂, τ₆₃, τ₆₄, τ₆₅]	[0.01, 0.02, 0.02, 0.02]	[0.01, 0.02, 0.02, 0.02]	[0.003, 0.002, −0.001, 0.001]	[0.01, 0.02, 0.02, 0.02]	[91, 87, 89, 89]
[τ₇₂, τ₇₃, τ₇₄, τ₇₅]	[0.01, 0.02, 0.02, 0.02]	[0.01, 0.01, 0.02, 0.02]	[−0.003, −0.003, −0.004, −0.002]	[0.01, 0.01, 0.02, 0.02]	[87, 93, 92, 92]
[τ₈₂, τ₈₃, τ₈₄, τ₈₅]	[0.01, 0.02, 0.02, 0.02]	[0.01, 0.02, 0.02, 0.02]	[−0.002, −0.003, −0.002, −0.002]	[0.01, 0.02, 0.02, 0.02]	[92, 88, 88, 88]
ϕ₁₁	0.03	0.03	−0.001	0.04	89
ϕ₁₂	0.02	0.01	0.006	0.02	93
ϕ₂₂	0.03	0.03	−0.002	0.03	94

Acknowledgements

We would like to thank Frank Fujita for allowing us to use his data for the empirical illustration in this paper. The C++ scripts used for simulations in the present article are available upon request from the first author. Funding for this study was provided by grants from NSF (BCS-0826844), NIH (UL1-RR025747-01, MH086633, P01CA142538-01 and AG033387), and NSFC (10961026).

Appendix

Conditional distributions used in the Gibbs sampling procedures

To estimate the proposed nonlinear LVM, the Gibbs sampler is implemented in which a sequence of sampling steps [steps (a)–(i)] is carried out iteratively. The conditional distributions from which Gibbs samples are obtained are summarized below.

Steps (a)–(e): Conditional distributions related to the non-parametric components

The main idea behind efficient sampling of the non-parametric components is to recast the definition of b_i in terms of the latent variable L_i, i = 1, …, n, which records the cluster membership of b_i such that b_i = Z_{L_i}. The base distribution in the present context was defined to be an r-variate normal distribution with mean vector μ_Z and covariance matrix Ψ_Z. Conjugate prior distributions were specified for μ_Z, Ψ_Z, and α as in equation (12). To explore the posterior in relation to the non-parametric components, we sample (π, Z, L, μ_Z, ψ_Z, α) by means of the blocked Gibbs sampler to encourage mixing of the Markov chain. That is, Gibbs sampling of the non-parametric components was regrouped into five subsidiary steps (or blocks), involving sampling from the conditional distributions p(π, Z|L, μ_Z, ψ_Z, α, τ, Y^*, θ, H, Y_obs), p(L|π, Z, τ, Y^*, θ, H, Y_obs), p(μ_Z|Z, ψ_Z), p(ψ_Z|Z, μ_Z), and p(α|π). These five conditional distributions are summarized below.

Block 1. Posterior samples of [μ_Z|Z, Ψ_Z] can be obtained by sampling from

p (μ_{Z} | Z, Ψ_{Z}) ~ N_{r} (μ_{μ}, Σ_{μ}),

(A1)

where $Σ_{μ} = {(G Ψ_{Z}^{- 1} + Ψ_{μ_{Z}}^{- 1})}^{- 1}$ and $μ_{μ} = Σ_{μ} [Ψ_{μ_{Z}}^{- 1} μ_{Z_{0}} + Ψ_{Z}^{- 1} \sum_{g = 1}^{G} Z_{g}]$.
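As a minimal sketch of the update in (A1), the code below assumes that Ψ_Z is diagonal (as in Block 2) and, as a further simplifying assumption not stated above, that the prior covariance of μ_Z is diagonal as well, so that each element of μ_Z can be updated separately. The function and argument names are hypothetical.

```python
import random


def mu_z_posterior(Z, psi_z, mu_z0, psi_mu):
    """Posterior mean and variance of each element of mu_Z under (A1).

    Z      : list of G cluster locations, each a list of length r
    psi_z  : diagonal of Psi_Z (assumed diagonal)
    mu_z0  : prior mean of mu_Z
    psi_mu : diagonal of the prior covariance of mu_Z (assumed diagonal)
    """
    G = len(Z)
    means, variances = [], []
    for j in range(len(psi_z)):
        # Sigma_mu = (G * Psi_Z^{-1} + Psi_muZ^{-1})^{-1}, element by element
        var_j = 1.0 / (G / psi_z[j] + 1.0 / psi_mu[j])
        col_sum = sum(z[j] for z in Z)
        # mu_mu = Sigma_mu * (Psi_muZ^{-1} mu_Z0 + Psi_Z^{-1} sum_g Z_g)
        means.append(var_j * (mu_z0[j] / psi_mu[j] + col_sum / psi_z[j]))
        variances.append(var_j)
    return means, variances


def sample_mu_z(Z, psi_z, mu_z0, psi_mu, rng=random):
    """Draw mu_Z from its normal full conditional."""
    means, variances = mu_z_posterior(Z, psi_z, mu_z0, psi_mu)
    return [rng.gauss(m, v ** 0.5) for m, v in zip(means, variances)]
```

With full (non-diagonal) covariance matrices the same formulas apply, but the scalar divisions would have to be replaced by matrix inverses.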

Block 2. For j = 1, …, r, each of the diagonal elements of Ψ_Z given Z and μ_Z is distributed as

p (ψ_{Z_{j}}^{- 1} | Z, μ_{Z}) \overset{i . i . d .}{~} Gamma (c_{1} + \frac{G}{2}, c_{2} + \frac{1}{2} \sum_{g = 1}^{G} {(Z_{g_{j}} - μ_{Z_{j}})}^{2}),

(A2)

where Z_{g_j} is the jth element of the values in Z associated with point mass (or cluster) g and μ_{Z_j} is the jth element of μ_Z.
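The gamma update in (A2) can be sketched as follows. The sketch assumes that the second argument of the gamma distribution in (A2) is a rate parameter; because Python's `random.gammavariate` expects a scale, the rate is inverted before sampling. All names are hypothetical.

```python
import random


def precision_gamma_params(Z, mu_z, j, c1, c2):
    """Shape and rate of the gamma full conditional of psi_Zj^{-1} in (A2)."""
    G = len(Z)
    sum_sq = sum((z[j] - mu_z[j]) ** 2 for z in Z)
    return c1 + G / 2.0, c2 + 0.5 * sum_sq


def sample_psi_z(Z, mu_z, c1, c2, rng=random):
    """Draw the diagonal elements of Psi_Z, one precision at a time."""
    variances = []
    for j in range(len(mu_z)):
        shape, rate = precision_gamma_params(Z, mu_z, j, c1, c2)
        precision = rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale
        variances.append(1.0 / precision)
    return variances
```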

Block 3. Following the derivations detailed elsewhere (Ishwaran & James, 2001; Ishwaran & Zarepour, 2000; Lee et al., 2007), the conditional distribution (α|π) can be shown to be

p (α | π) ~ Gamma (a_{1} + G - 1, a_{2} - \sum_{g = 1}^{G - 1} log (1 - ν_{g}^{*})),

(A3)

where $ν_{g}^{*}$ is the random stick-breaking weight drawn from a beta distribution in Block 4.
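A minimal sketch of the update in (A3) is given below, again assuming that the second gamma argument is a rate and that every stick-breaking weight lies strictly below 1 (otherwise the logarithm is undefined). The names are hypothetical.

```python
import math
import random


def alpha_gamma_params(v, a1, a2):
    """Shape and rate of the gamma full conditional of alpha in (A3).

    v holds the G - 1 stick-breaking weights from Block 4, each in (0, 1).
    """
    G = len(v) + 1
    shape = a1 + G - 1
    rate = a2 - sum(math.log1p(-vg) for vg in v)  # log1p(-v) = log(1 - v)
    return shape, rate


def sample_alpha(v, a1, a2, rng=random):
    """Draw alpha; random.gammavariate expects a scale, hence 1 / rate."""
    shape, rate = alpha_gamma_params(v, a1, a2)
    return rng.gammavariate(shape, 1.0 / rate)
```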

Block 4. As π and Z are conditionally independent given L and the remaining quantities, the distribution (π, Z|L, μ_Z, ψ_Z, α, τ, Y^*, θ, H, Y_obs) is proportional to p(π|L, α)p(Z|L, μ_Z, ψ_Z, τ, Y^*, θ, H, Y_obs). Thus, the conditional distribution can be decomposed into two independent components to be derived separately.

Conditional distribution p(π|L, α)

It can be shown that the conditional distribution (π|L, α) conforms to a generalized Dirichlet distribution,

p (π | L, α) ~ g (a_{1}^{*}, b_{1}^{*}, \dots, a_{G - 1}^{*}, b_{G - 1}^{*}),

(A4)

where $a_{g}^{*} = 1 + d_{g}$ and $b_{g}^{*} = α + \sum_{j = g + 1}^{G} d_{j}$, for g = 1, …, G − 1, and d_g is the number of L_i values (and thus individuals) equal to g. Sampling from the conditional distribution (π|L, α) can be accomplished as follows. First, $ν_{g}^{*}$ is drawn from a Beta $(a_{g}^{*}, b_{g}^{*})$ distribution for g = 1, …, G − 1. Subsequently, π_g is obtained for g = 1, …, G as

\begin{matrix} π_{1} = ν_{1}^{*}, \\ π_{g} = ν_{g}^{*} \prod_{j = 1}^{g - 1} (1 - ν_{j}^{*}), \text{ for } g = 2, \dots, G - 1, \\ π_{G} = 1 - \sum_{g = 1}^{G - 1} π_{g} . \end{matrix}

(A5)
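The two-stage construction in (A4) and (A5) can be sketched as follows: beta parameters are computed from the cluster counts, the stick-breaking weights are drawn, and the weights are converted to cluster probabilities. The function names and the 0-based indexing are assumptions made for illustration.

```python
import random


def beta_parameters(counts, alpha):
    """(a*_g, b*_g) for g = 1, ..., G - 1, where counts[g - 1] = d_g."""
    params = []
    remaining = sum(counts)
    for g in range(len(counts) - 1):
        remaining -= counts[g]  # now equals the sum of d_j for j > g
        params.append((1 + counts[g], alpha + remaining))
    return params


def weights_from_sticks(v):
    """Convert G - 1 stick-breaking weights into G probabilities as in (A5)."""
    weights, leftover = [], 1.0
    for vg in v:
        weights.append(leftover * vg)  # v_g times the stick left so far
        leftover *= 1.0 - vg
    weights.append(leftover)  # pi_G = 1 - sum of the other weights
    return weights


def sample_pi(counts, alpha, rng=random):
    """Draw the stick-breaking weights and the implied cluster probabilities."""
    v = [rng.betavariate(a, b) for a, b in beta_parameters(counts, alpha)]
    return v, weights_from_sticks(v)
```

For example, `weights_from_sticks([0.5, 0.5])` returns `[0.5, 0.25, 0.25]`: half of the stick goes to the first cluster, half of the remainder to the second, and whatever is left to the last.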

Conditional distribution p(Z|L, μ_Z, ψ_Z, τ, Y^*, θ, H, Y_obs)

Let $L_{1}^{*}, \dots, L_{d}^{*}$ be the d unique L_i values (i.e., unique number of ‘clusters’), $Z^{L} = (Z_{L_{1}^{*}}, \dots, Z_{L_{d}^{*}})$ , and let Z^[L] be components in Z = (Z₁, …, Z_G) other than Z^L. Then

p (Z | L, μ_{Z}, ψ_{Z}, τ, Y^{*}, θ, H, Y_{obs}) = p (Z^{[L]} | μ_{Z}, ψ_{Z}) p (Z^{L} | L, μ_{Z}, ψ_{Z}, τ, Y^{*}, θ, H, Y_{obs}),

where p(Z^[L]| μ_Z, ψ_Z) is simply the r-variate normal distribution, N_r(μ_Z, ψ_Z) and

p (Z^{L} | L, μ_{Z}, ψ_{Z}, τ, Y^{*}, θ, H, Y_{obs}) = \prod_{g = 1}^{d} p (Z_{L_{g}^{*}} | L, μ_{Z}, ψ_{Z}, τ, Y^{*}, θ, H, Y_{obs}) .

It can be shown that the conditional distribution $p (Z_{L_{g}^{*}} | L, μ_{Z}, Ψ_{Z}, τ, Y^{*}, θ, H, Y_{obs})$ is non-standard and cannot be sampled from directly within the Gibbs sampler. Specifically,

p (Z_{L_{g}^{*}} | L, μ_{Z}, Ψ_{Z}, τ, Y^{*}, θ, H, Y_{obs}) \propto p (Z_{L_{g}^{*}} | μ_{Z}, Ψ_{Z}) \prod_{{i : L_{i} = L_{g}^{*}}} [p (η_{i} | b_{i} = Z_{L_{g}^{*}}, θ_{η}) p (y_{i}^{*} | η_{i}, θ_{ε})]

in which $p (η_{i} | b_{i} = Z_{L_{g}^{*}}, θ_{η})$ is given by

\begin{cases} p (η_{i 0}) \prod_{t = 1}^{T} p (η_{it} | η_{i, t - 1}, b_{i} = Z_{L_{g}^{*}}, θ_{η}), & \text{if } η_{i 0} \text{ is stochastic}, \\ \prod_{t = 1}^{T} p (η_{it} | η_{i, t - 1}, b_{i} = Z_{L_{g}^{*}}, θ_{η}), & \text{otherwise} . \end{cases}

(A6)

From equation (A6), it can be noted that multiplication involving the density p(η_it|η_i,t−1, b_i, θ_η) results in a conditional density that is non-normal and non-standard due to the nonlinearity of f_t(.) and the fact that $Z_{L_{g}^{*}}$ is random, as opposed to fixed, within this sampling step. We therefore adopt an MH step as follows. At the qth iteration with a current value $Z_{L_{g}^{*}}^{(q)}$, a new candidate $Z_{L_{g}^{*}}$ is generated from the normal distribution $N (Z_{L_{g}^{*}}^{(q)}, σ_{b}^{2} Ω_{b})$, where $Ω_{b} = {(Ψ_{Z}^{- 1} + \sum_{i : L_{i} = L_{g}^{*}} \sum_{t = 1}^{T} Δ_{bit}^{'} Ψ_{ζ}^{- 1} Δ_{bit})}^{- 1}$ and $Δ_{bit} = {\partial η_{it} / \partial b_{i}^{'} |}_{b_{i} = Z_{L_{g}^{*}}^{(q)}, η_{it} = η_{i, t - 1}}$. The latter is derived by means of the implicit function theorem, namely, $\partial η_{it} / \partial b_{i}^{'} = {(\partial f_{i, t + 1} / \partial η_{it}^{'})}^{- 1} \partial f_{i, t + 1} / \partial b_{i}^{'}$. The new $Z_{L_{g}^{*}}$ is accepted with probability

min {1, \frac{p (Z_{L_{g}^{*}} | μ_{Z}, ψ_{Z}) \prod_{{i : L_{i} = L_{g}^{*}}} \prod_{t = 1}^{T} p (η_{it} | η_{it - 1}, b_{i} = Z_{L_{g}^{*}}, θ_{η})}{p (Z_{L_{g}^{*}}^{(q)} | μ_{Z}, ψ_{Z}) \prod_{{i : L_{i} = L_{g}^{*}}} \prod_{t = 1}^{T} p (η_{it} | η_{it - 1}, b_{i} = Z_{L_{g}^{*}}^{(q)}, θ_{η})}} .

(A7)

The variance $σ_{b}^{2}$ can be chosen such that the average acceptance rate is approximately 0.25 or more.
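The accept/reject logic in (A7) can be illustrated with a scalar random-walk Metropolis sketch. It assumes a single person-specific parameter, a cluster with one member, a scalar latent variable, and a hypothetical nonlinear function f; none of these choices reproduces the model or the C++ implementation used in the paper. Because the normal proposal is symmetric, only the ratio of target densities enters the acceptance probability.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def make_log_target(eta, mu_z, psi_z, psi_zeta, f):
    """Log of the numerator in (A7) for one cluster containing one person."""
    def log_target(z):
        total = log_normal_pdf(z, mu_z, psi_z)  # base-distribution term
        for t in range(1, len(eta)):
            # transition density p(eta_t | eta_{t-1}, b_i = z)
            total += log_normal_pdf(eta[t], f(eta[t - 1], z), psi_zeta)
        return total
    return log_target


def mh_step(current, log_target, proposal_sd, rng):
    """One random-walk Metropolis update; returns (value, accepted)."""
    candidate = rng.gauss(current, proposal_sd)
    log_ratio = log_target(candidate) - log_target(current)
    # 1 - U lies in (0, 1], so the logarithm is always defined
    if math.log(1.0 - rng.random()) < log_ratio:
        return candidate, True
    return current, False


rng = random.Random(1)
f = lambda prev, b: b * math.tanh(prev)  # hypothetical nonlinear dynamic function
eta = [0.5, 0.3, 0.4, 0.1, 0.2]          # made-up latent scores, t = 0, ..., 4
log_target = make_log_target(eta, mu_z=0.0, psi_z=1.0, psi_zeta=0.25, f=f)

z, n_accept = 0.0, 0
for _ in range(2000):
    z, accepted = mh_step(z, log_target, proposal_sd=0.5, rng=rng)
    n_accept += accepted
print("acceptance rate:", n_accept / 2000)
```

In practice the proposal standard deviation would be adjusted during pilot runs until the printed acceptance rate falls in the desired range.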

Block 5. The conditional distribution (L_i|π, Z, τ, Y^*, θ, H, Y_obs) is given by

(L_{i} | π, Z, τ, Y^{*}, θ, H, Y_{obs}) \overset{i . i . d .}{~} Multinomial (π_{ig}^{*}, g = 1, \dots, G),

(A8)

where $π_{ig}^{*}$ is proportional to $π_{g} p (y_{i}^{*} | η_{i}, θ_{ε}) p (η_{i} | Z_{g}, θ_{η})$ and π_g (g = 1, …, G) are available from Block 4, as summarized in equation (A5). Note that because $p (y_{i}^{*} | η_{i}, θ_{ε})$ does not depend on g once η_i is given, this component can be omitted from the computation.
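A minimal sketch of the membership update in (A8) is shown below. It works on the log scale and subtracts the largest score before exponentiating, a numerical safeguard that is our assumption rather than part of the derivation; it is useful because the log density of η_i over T time points can be strongly negative. Cluster labels are 0-based and all names are hypothetical.

```python
import math
import random


def membership_probs(log_pi, log_lik):
    """Normalized pi*_ig from log pi_g and log p(eta_i | Z_g, theta_eta)."""
    scores = [lp + ll for lp, ll in zip(log_pi, log_lik)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # avoids underflow
    total = sum(weights)
    return [w / total for w in weights]


def sample_membership(log_pi, log_lik, rng=random):
    """Draw L_i (0-based cluster index) from the multinomial in (A8)."""
    probs = membership_probs(log_pi, log_lik)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With equal prior weights and log densities of −1000 and −1001, for instance, the first cluster receives probability 1/(1 + e^{−1}), even though both raw densities would underflow to zero if exponentiated directly.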

Step (f): Conditional distribution for latent variable estimates, p(H|τ, Y^*, θ, Y_obs)

The conditional distribution from which posterior samples of the latent variable estimates are obtained can be derived as

\begin{matrix} p (H | τ, Y^{*}, θ, Y_{obs}) & = \prod_{i = 1}^{n} p (η_{i} | Y_{i}^{*}, θ, b_{i}) \\ & = \prod_{i = 1}^{n} \prod_{t = 1}^{T} p (η_{it} | H_{i, t - 1}, H_{i, t + 1}^{*}, Y_{i}^{*}, θ, b_{i}), \end{matrix}

where H_it = (η_i1, …, η_it) and $H_{i, t + 1}^{*} = (η_{i, t + 1}, \dots, η_{iT})$. According to the Gibbs sampler, random draws of η_i from $p (η_{i} | Y_{i}^{*}, θ, b_{i})$ are based on those of η_it from $p (η_{it} | H_{i, t - 1}, H_{i, t + 1}^{*}, Y_{i}^{*}, θ, b_{i})$ for each time point. That is, for t = 1, …, T:

p (η_{it} | H_{i, t - 1}, H_{i, t + 1}^{*}, Y_{i}^{*}, θ, b_{i}) \propto \begin{cases} p (y_{it}^{*} | η_{it}, θ_{ε}) p (η_{it} | η_{i, t - 1}, b_{i}, θ_{η}) p (η_{i, t + 1} | η_{it}, b_{i}, θ_{η}), & \text{for } t = 1, \dots, T - 1, \\ p (y_{it}^{*} | η_{it}, θ_{ε}) p (η_{it} | η_{i, t - 1}, b_{i}, θ_{η}), & \text{for } t = T . \end{cases}

Note that we could obtain a standard conditional distribution for t = T but not for t < T. Specifically, at t = T, the conditional distribution $p (η_{iT} | H_{i, T - 1}, Y_{i}^{*}, θ, b_{i})$ is given by $η_{iT} ~ N_{w} (b_{iT}^{*}, B_{iT}^{*})$, where $B_{iT}^{*} = {(Ψ_{ζ}^{- 1} + Λ' Ψ_{ε}^{- 1} Λ)}^{- 1}$ and $b_{iT}^{*} = B_{iT}^{*} [Ψ_{ζ}^{- 1} f_{T} (η_{i, T - 1}, b_{i}) + Λ' Ψ_{ε}^{- 1} (y_{iT}^{*} - μ)]$. However, when t < T, multiplication involving the density p(η_i,t+1|η_it, b_i, θ_η) would result in a conditional density that is non-normal and non-standard. This is due directly to the nonlinearity of f_t(.) and the fact that η_it is random, as opposed to fixed, at each t. We adopted the following MH algorithm to sample observations from the posterior density $p (η_{it} | H_{i, t - 1}, H_{i, t + 1}^{*}, Y_{i}^{*}, θ, b_{i})$. At the qth iteration with a current value $η_{it}^{(q)}$, a new candidate η_it is generated from the normal distribution $N (η_{it}^{(q)}, σ_{η}^{2} Ω_{η})$, where $Ω_{η} = {({(B_{iT}^{*})}^{- 1} + Δ_{it}^{'} Ψ_{ζ}^{- 1} Δ_{it})}^{- 1}$ and $Δ_{it} = {\partial f_{t + 1} / \partial η_{it}^{'} |}_{η_{it} = η_{i, t - 1}}$, and it is accepted with probability

min {1, \frac{p (η_{it} | H_{i, t - 1}, H_{i, t + 1}^{*}, Y_{i}^{*}, θ, b_{i})}{p (η_{it}^{(q)} | H_{i, t - 1}, H_{i, t + 1}^{*}, Y_{i}^{*}, θ, b_{i})}} .

The variance $σ_{η}^{2}$ can be chosen such that the average acceptance rate is approximately 0.25 or more.
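For the standard case at t = T, the mean and variance of the normal conditional distribution of η_iT can be sketched for a single latent variable (w = 1) with a diagonal Ψ_ε, in which case the matrix expressions reduce to sums over the p indicators. The restriction to w = 1 and all names are assumptions made for illustration.

```python
import random


def final_state_moments(pred, y, mu, lam, psi_eps, psi_zeta):
    """Mean and variance of eta_iT at t = T for one latent variable.

    pred     : f_T(eta_{i,T-1}, b_i), the one-step-ahead prediction
    y, mu    : latent responses y*_iT and intercepts, lists of length p
    lam      : factor loadings, list of length p
    psi_eps  : measurement error variances (diagonal of Psi_eps)
    psi_zeta : process noise variance
    """
    # B* = (Psi_zeta^{-1} + Lambda' Psi_eps^{-1} Lambda)^{-1}
    precision = 1.0 / psi_zeta + sum(l * l / s for l, s in zip(lam, psi_eps))
    var = 1.0 / precision
    # b* = B* [Psi_zeta^{-1} f_T + Lambda' Psi_eps^{-1} (y* - mu)]
    info = pred / psi_zeta + sum(
        l * (yk - mk) / s for l, yk, mk, s in zip(lam, y, mu, psi_eps)
    )
    return var * info, var


def sample_final_state(pred, y, mu, lam, psi_eps, psi_zeta, rng=random):
    """Draw eta_iT directly from its normal full conditional."""
    mean, var = final_state_moments(pred, y, mu, lam, psi_eps, psi_zeta)
    return rng.gauss(mean, var ** 0.5)
```

With one indicator, unit loading, unit variances, a prediction of 0 and an observed deviation of 2, the conditional mean is 1 and the variance 0.5: the draw is pulled halfway between the dynamic prediction and the measurement.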

Step (g): Conditional distributions for parameters in θ

Assuming that the parameters in b are independent of those contained in θ, and that parameters in θ_η are conditionally independent of those in θ_ε, the conditional distribution p(θ|τ, Y^*, H, Y_obs, b) = p(θ_η|H, b)p(θ_ε|τ, Y^*, H, Y_obs, b) is derived by computing the latter two densities separately for all the person-invariant parameters in the dynamic and measurement models.

Parameters in the dynamic model

At the dynamic level, the only parametric posterior distribution associated with p(θ_η|H, b) is that of $p (Ψ_{ζ} | H, b)$. We used a w-dimensional inverse Wishart distribution as the conjugate prior for the process noise covariance matrix, Ψ_ζ, i.e., $p (Ψ_{ζ}) ~ {IW}_{w} [w_{0}, Ψ_{ζ_{0}}^{- 1}]$, thus yielding

p (Ψ_{ζ} | H, b) ~ {IW}_{w} [nT + w_{0}, R_{η} + Ψ_{ζ_{0}}^{- 1}],

where $R_{η} = \sum_{i = 1}^{n} \sum_{t = 1}^{T} [η_{it} - f (η_{i, t - 1}, b_{i})] [η_{it} - f (η_{i, t - 1}, b_{i})]'$.
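The residual cross-product matrix R_η that enters the inverse Wishart update can be accumulated as sketched below. The data layout (one list of T + 1 state vectors per person, starting at t = 0) and the function names are assumptions; drawing from the inverse Wishart distribution itself is left to a linear algebra library and is not shown.

```python
def residual_scatter(eta, b, f):
    """Accumulate R_eta and the number of residuals (nT for complete data).

    eta : eta[i] is a list of T + 1 state vectors (t = 0, ..., T) for person i
    b   : b[i] holds the person-specific parameters passed on to f
    f   : dynamic function, f(previous_state, b_i) -> predicted state (list)
    """
    w = len(eta[0][0])
    R = [[0.0] * w for _ in range(w)]
    count = 0
    for states, b_i in zip(eta, b):
        for t in range(1, len(states)):
            pred = f(states[t - 1], b_i)
            e = [states[t][k] - pred[k] for k in range(w)]  # process residual
            for r in range(w):
                for c in range(w):
                    R[r][c] += e[r] * e[c]  # outer product e e'
            count += 1
    return R, count
```

The posterior degrees of freedom are then `count + w_0`, and the posterior scale matrix is R_η plus the prior scale matrix, as in the expression above.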

Parameters in the measurement model

Following the work of many others (e.g., Lindley & Smith, 1972; Shi & Lee, 1998; Lee & Zhu, 2000), we specified conjugate priors for the distributions of p(μ), $p (ψ_{ε_{k}}^{- 1})$ and p(Λ_k|Ψ_{ε_k}) as in equation (12) for k = 1, …, p.

To cope with the case of fixed known elements in Λ, let C = (c_kj) be the index matrix such that c_kj = 0 if λ_kj is known and c_kj = 1 if λ_kj is unknown, and $r_{ε_{k}} = \sum_{j = 1}^{w} c_{kj}$ , where r_{ε_k} denotes the number of freed factor loadings in the kth row of Λ. Further, let H_k (r_{ε_k}T × n) be a submatrix of H such that its jth row, for which c_kj = 0, has been deleted, and an n × 1 vector $U_{kt}^{*}$ such that $U_{kt}^{*'} = (U_{k 1 t}^{*}, \dots, U_{knt}^{*})$ has elements

U_{kit}^{*} = y_{kit}^{*} - μ_{k} - \sum_{j = 1}^{w} λ_{kj} η_{jit} (1 - c_{kj}),

(A9)

where $y_{kit}^{*}$ and μ_k denote the kth element of $y_{it}^{*}$ and μ, respectively. Then it can be shown that

\begin{matrix} p (ψ_{ε_{k}}^{- 1} | μ, Y^{*}, H) ~ Gamma (\frac{nT}{2} + α_{0 ε k}, β_{ε_{k}}), \\ p (Λ_{k} | ψ_{ε_{k}}^{- 1}, μ, Y^{*}, H) ~ N [υ_{k}, ψ_{ε_{k}} Ω_{k}], \\ p (μ | Λ, Ψ_{ε}, Y^{*}, H) ~ N [Ω_{μ} \{Σ_{0}^{- 1} μ_{0} + Ψ_{ε}^{- 1} \sum_{i = 1}^{n} \sum_{t = 1}^{T} (y_{it}^{*} - Λ η_{it})\}, Ω_{μ}], \end{matrix}

where

\begin{matrix} β_{ε_{k}} = β_{0 ε k} + \frac{1}{2} (\sum_{t = 1}^{T} U_{kt}^{*'} U_{kt}^{*} - υ_{k}^{'} Ω_{k}^{- 1} υ_{k} + Λ_{0 k}^{'} H_{0 Λ_{k}}^{- 1} Λ_{0 k}), \\ Ω_{k} = {(H_{0 Λ_{k}}^{- 1} + \sum_{t = 1}^{T} H_{kt} H_{kt}^{'})}^{- 1}, \\ υ_{k} = Ω_{k} (\sum_{t = 1}^{T} H_{kt} U_{kt}^{*} + H_{0 Λ_{k}}^{- 1} Λ_{0 k}), \\ Ω_{μ} = {(Σ_{0}^{- 1} + nT Ψ_{ε}^{- 1})}^{- 1} . \end{matrix}

Step (h): Conditional distribution $p (Y_{mis}^{*} | θ, H)$

Since the $y_{it}^{*}$ are mutually independent for i = 1, …, n and t = 1, …, T, the y_it,mis are also independent of each other for i = 1, …, n and t = 1, …, T. In addition, Ψ_ε is assumed to be a diagonal matrix. Thus, y_it,mis is also independent of y_it,obs. Because Y^* are missing at random, we have $p (y_{it, mis}^{*} | θ_{ε}, η_{it}) \overset{i . i . d .}{~} N (μ_{it, mis} + Λ_{it, mis} η_{it}, Ψ_{ε it, mis})$ , where μ_it,mis is a subvector of μ with components corresponding to the missing components in y_it,mis, Λ_it,mis is a submatrix of Λ with rows corresponding to the missing components in y_it,mis, and Ψ_εit,mis is a submatrix of Ψ_ε with rows and columns corresponding to the missing components in $y_{it, mis}^{*}$ .
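Because Ψ_ε is diagonal, the imputation of missing responses can proceed one component at a time. The sketch below marks missing entries with `None`, which is a convention adopted here for illustration only; the function and argument names are likewise hypothetical.

```python
import random


def impute_missing(y_star, eta_it, mu, Lambda, psi_eps, rng=random):
    """Replace None entries of y*_it with draws from their normal conditional.

    y_star  : list of p latent responses, with None marking missing entries
    eta_it  : list of w latent variable scores at occasion t
    mu      : list of p intercepts
    Lambda  : p x w loading matrix as a list of rows
    psi_eps : list of p measurement error variances
    """
    filled = []
    for k, value in enumerate(y_star):
        if value is None:
            mean_k = mu[k] + sum(l * e for l, e in zip(Lambda[k], eta_it))
            value = rng.gauss(mean_k, psi_eps[k] ** 0.5)
        filled.append(value)
    return filled
```

Observed entries are passed through unchanged, so the function can be applied to every person and occasion regardless of the missingness pattern.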

Step (i): Conditional distribution $p (τ, Y_{obs}^{*} | θ, H, Y_{obs})$

To sample τ and $Y_{obs}^{*}$ , we first note that

p (τ_{k}, Y_{k, obs}^{*} | θ, H, Y_{k, obs}) = p (τ_{k} | Y_{k, obs}, θ, H) p (Y_{k, obs}^{*} | τ_{k}, Y_{k, obs}, θ, H),

where

p (τ_{k} | Y_{k, obs}, θ, H) \propto \prod_{i = 1}^{n} \prod_{t = 1}^{T} (Φ (\frac{τ_{k, y_{it, k, obs}} - μ_{k} - Λ_{k}^{'} η_{it}}{ψ_{ε_{k}}^{1 / 2}}) - Φ (\frac{τ_{k, y_{it, k, obs} - 1} - μ_{k} - Λ_{k}^{'} η_{it}}{ψ_{ε_{k}}^{1 / 2}})),

(A10)

p (y_{it, k, obs}^{*} | τ_{k}, Y_{k, obs}, θ, H) = N (μ_{k} + Λ_{k}^{'} η_{it}, ψ_{ε_{k}}) I_{(τ_{k, y_{it, k, obs} - 1}, τ_{k, y_{it, k, obs}})} (y_{it, k, obs}^{*}) .

(A11)

To generate observations from the non-standard and complex joint conditional density of τ_k and $Y_{k, obs}^{*}$ , the following MH step is embedded within the Gibbs sampler. Specifically, a vector of thresholds (τ_k,2, …, τ_{k,M −2}) is first generated from the truncated normal distribution

τ_{k, s} ~ N (τ_{k, s}^{(q)}, σ_{τ_{k}}^{2}) I_{(τ_{k, s - 1}, τ_{k, s + 1}^{(q)})} (τ_{k, s}), for s = 2, \dots, M - 2,

(A12)

where $τ_{k, s}^{(q)}$ denotes the value of τ_k,s at the qth iteration of the Gibbs sampler and $σ_{τ_{k}}^{2}$ is a preassigned constant. As mentioned earlier, the values of the first (s = 1) and last (s = M − 1) thresholds are fixed for identification purposes. Each new draw of τ_k,s is then retained with acceptance probability min(1, R_k), where

R_{k} = \prod_{s = 2}^{M - 2} \frac{Φ [(τ_{k, s + 1}^{(q)} - τ_{k, s}^{(q)}) / σ_{τ_{k}}] - Φ [(τ_{k, s - 1} - τ_{k, s}^{(q)}) / σ_{τ_{k}}]}{Φ [(τ_{k, s + 1} - τ_{k, s}) / σ_{τ_{k}}] - Φ [(τ_{k, s - 1}^{(q)} - τ_{k, s}) / σ_{τ_{k}}]} \times \prod_{i = 1}^{n} \prod_{t = 1}^{T} \frac{Φ [ψ_{ε_{k}}^{- 1 / 2} {τ_{k, y_{it, k, obs}} - μ_{k} - Λ_{k}^{'} η_{it}}] - Φ [ψ_{ε_{k}}^{- 1 / 2} {τ_{k, y_{it, k, obs} - 1} - μ_{k} - Λ_{k}^{'} η_{it}}]}{Φ [ψ_{ε_{k}}^{- 1 / 2} {τ_{k, y_{it, k, obs}}^{(q)} - μ_{k} - Λ_{k}^{'} η_{it}}] - Φ [ψ_{ε_{k}}^{- 1 / 2} {τ_{k, y_{it, k, obs} - 1}^{(q)} - μ_{k} - Λ_{k}^{'} η_{it}}]} .

(A13)

Once the threshold values have been determined, they are then used to generate new draws of $y_{it, k, obs}^{*}$ using equation (A11).
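Drawing a latent response from the truncated normal distribution in (A11) can be sketched with the inverse-CDF method. This is one common technique and is not necessarily the one used in our C++ scripts; it can lose accuracy when the interval lies far in the tails. The function name is hypothetical, and `statistics.NormalDist` requires Python 3.8 or later.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sample_truncated_normal(mean, sd, lower, upper, rng=random):
    """Draw from N(mean, sd^2) restricted to the interval (lower, upper).

    lower and upper play the role of two adjacent thresholds; they may be
    float('-inf') or float('inf') for the outermost categories.
    """
    p_lo = _STD_NORMAL.cdf((lower - mean) / sd)
    p_hi = _STD_NORMAL.cdf((upper - mean) / sd)
    u = p_lo + (p_hi - p_lo) * rng.random()
    # inv_cdf needs an argument strictly inside (0, 1)
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + sd * _STD_NORMAL.inv_cdf(u)
```

Here `mean` corresponds to μ_k + Λ_k′η_it and `sd` to the square root of ψ_{ε_k}, with the two thresholds taken from the current draw of τ_k.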

Footnotes

¹

The P-technique model is a common factor model for extracting systematic intra-person patterns from multivariate time series measured on a single individual over time (Jones & Nesselroade, 1990).

²

More generally, stationarity refers to the invariance of all statistical properties (e.g., means and covariance functions, in the case of weak stationarity) of a system over time.

³

There was also evidence for weekly cycles in the data. This is beyond the focus of the present paper, but interested readers can refer elsewhere (Chow et al., 2005; Chow, Hamaker, & Allaire, 2009; Ram et al., 2005) for modelling options that do account for cyclic dynamics.

⁴

The measurement framework adopted in the present paper can be readily generalized to applications involving data measured on mixed scales. In this case, the measurement vector y_it is further partitioned into portions with p₁ continuous variables, p₂ binary variables and p − p₁ − p₂ ordinal variables. This yields y_it = [y_it,1, …, y_it,p₁, y_it,p₁+1, …, y_{it,p₁ +p₂}, y_{it,p₁ +p₂+1}, …, y_it,p]′. The appropriate density specification can then be defined for each portion of these data to yield a final measurement model (see Lee & Zhu, 2000).

⁵

Of course, there may be instances where researchers are also interested in obtaining other summary statistics (e.g., standard deviation) from the person-specific distribution, namely, p(Z_{L_i} |L_i, μ_Z, Ψ_Z, τ, Y^*, θ, H, Y_obs), to make inferences at the individual level. This is, however, not the focus of the present empirical illustration.

⁶

Note that our choice to report the 90%, as opposed to the 95%, credible intervals was largely arbitrary. The results did not differ, however, when the 95% credible intervals were evaluated.

References

Ansari A, Iyengar R. Semiparametric Thurstonian models for recurrent choices: A Bayesian analysis. Psychometrika. 2006;71(4):631–657. [Google Scholar]
Antoniak CE. Mixtures ofDirichlet processes with applications to Bayesian nonparametric problems. Annals of Statistics. 1974;2(6):1152–1174. [Google Scholar]
Bartholomew DJ, Knott M. Latent variable models and factor analysis. 2nd ed. London: Arnold; 1999. [Google Scholar]
Bauer D, Curran P. Distributional assumptions of growth mixture models: Implications for over-extraction of latent trajectory classes. Psychological Methods. 2003;8:338–363. doi: 10.1037/1082-989X.8.3.338. [DOI] [PubMed] [Google Scholar]
Browne MW, Nesselroade JR. Representing psychological processes with dynamic factor models: Some promising uses and extensions of ARMA time series models. In: Maydeu-Olivares A, McArdle JJ, editors. Contemporary psychometrics: A Festschrift for Roderick P. McDonald. Mahwah, NJ: Erlbaum; 2005. pp. 415–452. [Google Scholar]
Caron F, Davy M, Doucet A, Duflos E, Vanheeghe P. Bayesian inference for linear dynamic models with Dirichlet process mixtures. IEEE Transactions on Signal Processing. 2008;56(1):71–84. [Google Scholar]
Cattell RB, Cattell AKS, Rhymer RM. P-technique demonstrated in determining psychophysical source traits in a normal individual. Psychometrika. 1947;12:267–288. doi: 10.1007/BF02288941. [DOI] [PubMed] [Google Scholar]
Chen M-H, Dey DK, Ibrahim JG. Bayesian criterion based model assessment for categorical data. Biometrika. 2004;91:45–63. [Google Scholar]
Chow S-M, Ferrer E, Hsieh F. Statistical methods for modeling human dynamics: An interdisciplinary dialogue. New York: Routledge; 2009. [Google Scholar]
Chow S-M, Hamaker EJ, Allaire JC. Detecting discrete shifts in dynamics in group-based state-space models. Multivariate Behavioral Research. 2009;44:465–496. doi: 10.1080/00273170903103324. [DOI] [PubMed] [Google Scholar]
Chow S-M, Nesselroade JR, Shifren K, McArdle JJ. Dynamic structure of emotions among individuals with Parkinson’s disease. Structural Equation Modeling. 2004;11(4):560–582.
Chow S-M, Ram N, Boker SM, Fujita F, Clore G. Emotion as thermostat: Representing emotion regulation using a damped oscillator model. Emotion. 2005;5(2):208–225. doi: 10.1037/1528-3542.5.2.208.
De Jong P, Mazzi S. Modeling and smoothing unequally spaced sequence data. Statistical Inference for Stochastic Processes. 2001;4:53–71.
Diener E, Fujita F, Smith H. The personality structure of affect. Journal of Personality and Social Psychology. 1995;69(1):130–141.
Doucet A, de Freitas N, Gordon N. Sequential Monte Carlo methods in practice. New York: Springer; 2001.
Duncan KA, MacEachern SN. Nonparametric Bayesian modelling for item response. Statistical Modeling. 2008;8(1):41–66.
Dunson DB. Nonparametric Bayes statistical modeling. New York: Cambridge University Press; 2008. Nonparametric Bayes applications to biostatistics.
Durbin J, Koopman SJ. Time series analysis by state-space methods. New York: Oxford University Press; 2001.
Escobar MD, West M. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association. 1995;90(430):577–588.
Fahrmeir L, Tutz G. Multivariate statistical modelling based on generalized linear models. Berlin: Springer-Verlag; 1994.
Ferguson TS. A Bayesian analysis of some nonparametric problems. Annals of Statistics. 1973;1(2):209–230.
Ferrer E, Nesselroade JR. Modeling affective processes in dyadic relations via dynamic factor analysis. Emotion. 2003;3(4):344–360. doi: 10.1037/1528-3542.3.4.344.
Fredrickson BL, Losada MF. Positive affect and the complex dynamics of human flourishing. American Psychologist. 2005;60(7):678–686. doi: 10.1037/0003-066X.60.7.678.
Gelman A. Inference and monitoring convergence. In: Gilks WR, Richardson S, Spiegelhalter DJ, editors. Markov chain Monte Carlo in practice. London: Chapman & Hall; 1996. pp. 131–144.
Gelman A, Meng XL. Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science. 1998;13:163–185.
Gelman A, Meng XL, Stern H. Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica. 1996;6:733–807.
Geweke J, Tanizaki H. Bayesian estimation of state-space models using the Metropolis-Hastings algorithm within Gibbs sampling. Computational Statistics and Data Analysis. 2001;37:151–170.
Gottman JM, Murray JD, Swanson CC, Tyson R, Swanson KR, editors. The mathematics of marriage: Dynamic nonlinear models. Cambridge, MA: MIT Press; 2002.
Hamaker EL, Dolan CV, Molenaar PCM. ARMA-based SEM when the number of time points T exceeds the number of cases N: Raw data maximum likelihood. Structural Equation Modeling. 2003;10(3):352–379.
Hamilton JD. Time series analysis. Princeton, NJ: Princeton University Press; 1994.
Harvey AC. Forecasting, structural time series models and the Kalman filter. Cambridge: Cambridge University Press; 2001.
Hershberger SL, Corneal SE, Molenaar PCM. A dynamic factor analysis of the emotional response patterns underlying stepdaughter/stepfather relationships. Journal of Structural Modeling. 1994;2:31–52.
Ibrahim JG, Chen M-H, Sinha D. Criterion-based models for Bayesian model assessment. Statistica Sinica. 2001;11:419–443.
Ishwaran H, James LF. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association. 2001;96(453):161–173.
Ishwaran H, Zarepour M. Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models. Biometrika. 2000;87:371–390.
Jones CJ, Nesselroade JR. Multivariate, replicated, single-subject designs and P-technique factor analysis: A selective review of the literature. Experimental Aging Research. 1990;16:171–183. doi: 10.1080/03610739008253874.
Jöreskog KG. Analyzing psychological data by structural analysis of covariance matrices. In: Krantz DH, Atkinson RC, Luce RD, Suppes P, editors. Contemporary developments in mathematical psychology (Vol. 2): Measurement, psychophysics and neural information processing. San Francisco: W. H. Freeman; 1974. pp. 1–56.
Jöreskog KG, Moustaki I. Factor analysis of ordinal variables: A comparison of three approaches. Multivariate Behavioral Research. 2001;36(3):347–387. doi: 10.1207/S15327906347-387.
Karabatsos G, Walker SG. Coherent psychometric modelling with Bayesian nonparametrics. British Journal of Mathematical and Statistical Psychology. 2009;62:1–20. doi: 10.1348/000711007X246237.
Kenny DA, Judd CM. Estimating the nonlinear and interactive effects of latent variables. Psychological Bulletin. 1984;96:201–210.
Kim C-J, Nelson CR. State-space models with regime switching: Classical and Gibbs-sampling approaches with applications. Cambridge, MA: MIT Press; 1999.
Klein A, Moosbrugger H. Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika. 2000;65:457–474.
Lee S-Y. Structural equation modeling: A Bayesian approach. Chichester: Wiley; 2007.
Lee S-Y, Lu B, Song X-Y. Semiparametric Bayesian analysis of structural equation models with fixed covariates. Statistics in Medicine. 2007;27(13):2341–2360. doi: 10.1002/sim.3098.
Lee S-Y, Zhu H-T. Statistical analysis of nonlinear structural equation models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology. 2000;53:209–232. doi: 10.1348/000711000159303.
Lindley DV, Smith AFM. Bayes estimates for the linear model (with discussion). Journal of the Royal Statistical Society, Series B. 1972;34:1–42.
Lindsay BG. Mixture models: Theory, geometry and applications. Hayward, CA: Institute of Mathematical Statistics; 1995.
Little RJA, Rubin DB. Statistical analysis with missing data. New York: Wiley; 1987.
Ljung L, Caines PE. Asymptotic normality of prediction error estimators for approximate system models. Stochastics. 1979;3:26–49.
McArdle JJ. Structural equation modeling of an individual system: Preliminary results from ‘A case study in episodic alcoholism’. Department of Psychology, University of Denver; 1982. Unpublished manuscript.
McLachlan G, Peel D. Finite mixture models. New York: Wiley; 2000.
Meng X-L. Posterior predictive p-values. Annals of Statistics. 1994;22:1142–1160.
Molenaar PCM. A dynamic factor model for the analysis of multivariate time series. Psychometrika. 1985;50(2):181–202.
Molenaar PCM, Newell KM. Direct fit of a theoretical model of phase transition in oscillatory finger motions. British Journal of Mathematical and Statistical Psychology. 2003;56:199–214. doi: 10.1348/000711003770480002.
Navarro DJ, Griffiths TL, Steyvers M, Lee MD. Modeling individual differences using Dirichlet processes. Journal of Mathematical Psychology. 2006;50:101–122.
Nesselroade JR, McArdle JJ, Aggen SH, Meyers JM. Alternative dynamic factor models for multivariate time-series analyses. In: Moskowitz DM, Hershberger SL, editors. Modeling intraindividual variability with repeated measures data: Advances and techniques. Mahwah, NJ: Erlbaum; 2002. pp. 235–265.
Ram N, Chow S-M, Bowles RP, Wang L, Grimm KJ, Fujita F, et al. Recovering cyclicity in pleasant and unpleasant affect using spectral analysis, the Rating Scale model, and planned incompleteness. Psychometrika: Application Reviews and Case Studies. 2005;70:773–790.
Schumacker RE. Latent variable interaction modeling. Structural Equation Modeling. 2002;9(1):40–54.
Sethuraman J. A constructive definition of Dirichlet priors. Statistica Sinica. 1994;4:639–650.
Shi J-Q, Lee S-Y. Using factor analysis to estimate parameters. British Journal of Mathematical and Statistical Psychology. 1998;51:233–252.
Singer H. Stochastic differential equation models with sampled data. In: van Montfort K, Oud JHL, Satorra A, editors. Longitudinal models in the behavioral and related sciences. Mahwah, NJ: Erlbaum; 2007. pp. 73–106.
Sorenson HW, Alspach DL. Recursive Bayesian estimation using Gaussian sums. Automatica. 1971;7:465–479.
Titterington DM, Smith AFM, Makov UE. The statistical analysis of finite mixture distributions. New York: Wiley; 1985.
Wei WWS. Time series analysis. Redwood City, CA: Addison-Wesley; 1990.
West M, Harrison J. Bayesian forecasting and dynamic models. 2nd ed. New York: Springer-Verlag; 1997.
Young PC, Pedregal DJ, Tych W. Dynamic harmonic regression. Journal of Forecasting. 1999;18(6):369–394.
Zautra AJ, Reich JW, Davis MC, Nicolson NA, Potter PT. The role of stressful events in the relationship between positive and negative affects: Evidence from field and experimental studies. Journal of Personality. 2000;68:927–951. doi: 10.1111/1467-6494.00121.
Zhang G, Browne M. Dynamic factor analysis with ordinal variables. In: Chow S-M, Ferrer E, Hsieh F, editors. Statistical methods for modeling human dynamics: An interdisciplinary dialogue. New York: Routledge; 2009.
Zhang Z, Nesselroade JR. Bayesian estimation of categorical dynamic factor models. Multivariate Behavioral Research. 2007;42:729–756.
Zhu HT, Zhang HP. Hypothesis testing in mixture regression models. Journal of the Royal Statistical Society, Series B. 2004;66:3–16.

