Longitudinal Mixed Membership Trajectory Models for Disability Survey Data

Daniel Manrique-Vallier

doi:10.1214/14-AOAS769

Author manuscript; available in PMC: 2015 Dec 1.

Published in final edited form as: Ann Appl Stat. 2014 Dec 19;8(4):2268–2291. doi: 10.1214/14-AOAS769


PMCID: PMC4548941 NIHMSID: NIHMS688589 PMID: 26322146

Abstract

We develop new methods for analyzing discrete multivariate longitudinal data and apply them to functional disability data on the U.S. elderly population from the National Long Term Care Survey (NLTCS), 1982-2004. Our models build on a mixed membership framework, in which individuals are allowed to belong simultaneously to multiple extreme profiles characterized by time-dependent trajectories of progression into disability. We also develop an extension that incorporates birth-cohort effects in order to assess inter-generational changes. Applying these methods, we find that most individuals follow trajectories that imply a late onset of disability, and that younger cohorts tend to develop disabilities at a later stage in life than their elders.

Keywords: NLTCS, Mixed Membership, Trajectories, Multivariate analysis, MCMC, Cohort analysis

1 Introduction

This paper introduces new models and estimation procedures to analyze discrete multivariate longitudinal data on functional disability, motivated by the analysis of data from the National Long Term Care Survey (NLTCS). The NLTCS is a longitudinal panel survey aimed at assessing chronic disability among the elderly (65+) population in the United States. It enables researchers to answer important questions about the aging process and the prevalence of disability in the U.S.: How many older Americans will live with disabilities? What is the duration of disability episodes? What is the age of onset of disability? Is the nature of disability changing for younger generations? (Connor et al., 2006). Answers to these questions are important for public policy design because, among other reasons, public and private expenditure is higher for disabled people than for their non-disabled peers (Manton et al., 2007).

Many of the relevant public policy questions that the NLTCS can potentially answer concern changes over time: changes during the life of an individual (“how is this individual likely to age?”), or differences across generations (“are people from later generations acquiring disabilities differently than people born 20 years before?”). To answer these questions we therefore need to consider the longitudinal dimension of these data. In addition, since not everyone can be expected to age in exactly the same way, it is reasonable to assume that the elderly American population is heterogeneous. Models for longitudinal disability data need to be capable of accounting for such heterogeneity.

Although the longitudinal nature of the NLTCS data is frequently invoked (e.g. Corder and Manton, 1991; Manton et al., 1997, 2006), efforts to analyze the data using true longitudinal methods have been few and far between. Most researchers have instead analyzed the NLTCS as a series of uncorrelated cross sectional samples (see e.g. Manton et al., 1997, 2006, 2007). Recent attempts to deal with the longitudinal nature of the NLTCS have been undertaken by Stallard (2005), Connor (2006) and White and Erosheva (2013).

The new models and methods that we propose in this paper, which we call Trajectory Grade of Membership models (TGoM), seek to capture both the longitudinal nature of the individual NLTCS data, and the inherent individual heterogeneity of the aging process. These models handle individual heterogeneity using the concept of Mixed Membership (Erosheva et al., 2004; Erosheva and Fienberg, 2005). Mixed Membership models describe a small number of ideal types of individuals (or extreme profiles) and let each individual partially belong to each pure type, to a different degree. At the same time, TGoM models focus on the longitudinal nature of the process by defining the extreme profiles as typical progressions over time. We also introduce an extension to this model aimed at capturing differences across generational cohorts. We do this by allowing individuals’ mixed membership to depend on their dates of birth.

The remainder of this article is organized as follows. In the next section, we present a brief introduction to the National Long Term Care Survey. Next, in Section 3, we describe the basic TGoM model and its extension to handle generational cohorts. Estimation algorithms based on MCMC sampling are introduced in Section 4 and fully described in appendices A and B. In Section 5 we apply the TGoM models to the NLTCS. Finally, in Section 6, we conclude with a discussion of the insights provided by the models, their limitations, and possible extensions.

2 The National Long Term Care Survey

The National Long Term Care Survey (NLTCS) is a longitudinal panel survey designed specifically to assess the state and progression of chronic disability among the United States population aged 65 years or more (Corder and Manton, 1991). It consists of six waves, conducted in 1982, 1984, 1989, 1994, 1999 and 2004. In very rough terms, each wave consists of interviews with approximately 20,000 people, of whom around 15,000 were previously interviewed. Each wave includes a fresh sample of around 5,000 individuals. These refreshment samples serve the double purpose of replacing those who have died since the previous wave and of keeping each wave representative of the current state of the population over 65 (Clark, 1998). A total of around 49,000 people have been screened in the survey between 1982 and 2004.

The NLTCS assesses functional disability by evaluating subjects’ ability to perform two sets of activities. The first, called Activities of Daily Living (ADL), comprises basic self-care activities, such as bathing, eating and dressing. The second, Instrumental Activities of Daily Living (IADL), involves activities necessary for independent living within a community, like preparing meals or managing finances. The NLTCS determines the functional status in these activities through answers to a series of triggering questions, which are then summarized as binary response items that indicate the presence or absence of impairments.

The design of the NLTCS is such that the survey data can be used both as several cross-sectional samples, treating each wave as a separate sample from the target population at that time, and as a longitudinal sample, following individuals across measurement waves.

The NLTCS first screens each sampled individual using a special “screener” questionnaire aimed at quickly detecting whether he or she is chronically disabled. The operational definition of “chronically disabled” in the context of the NLTCS requires that the individual presents an impairment in some ADL or IADL lasting, or expected to last, at least 90 days. If screened out, the individual’s status is recorded and he or she is re-screened in subsequent waves to assess whether the disability status has changed. If the individual is screened in, he or she is then interviewed using a detailed questionnaire. There are different detailed questionnaires for institutionalized individuals and for individuals living in the community. After receiving a detailed questionnaire for the first time, the subject is eligible to receive detailed questionnaires in all subsequent waves of the survey until death (Clark, 1998).

In what follows, we use a subset of the NLTCS consisting of all six binary answers to questions about the individual’s ability to perform ADLs (EAT: Eating; DRS: Dressing; TLT: Toileting; BED: Getting in and out of bed; MOB: Inside mobility; BTH: Bathing), from all six waves of the NLTCS. We obtained ages and dates of birth from linked Medicare data from the Centers for Medicare and Medicaid Services (CMS). We provide further details about our data pre-processing in Section 5.

3 Mixed Membership Trajectory Models

The goal of this analysis is to characterize typical progressions in acquisition of disabilities over time, while taking into consideration and characterizing the heterogeneity of the population. For this we combine two main ideas.

The first idea is clustering based on trajectories. This is the idea behind Latent Trajectory Models (LTMs; Nagin, 1999). Broadly speaking, LTMs are mixture models of the form

$$p(y \mid x) = \sum_{k=1}^{K} \pi_k f_k(y \mid x), \tag{1}$$

where y is a vector containing T longitudinal measurements of a response variable of interest, and x is a vector that contains the corresponding T values of a time-dependent covariate. The joint densities corresponding to each mixture component, f_k(·), are in turn modeled using parametric trajectory functions. Trajectory functions (or simply trajectories) describe typical progressions over time, usually modeling the outcome variables as a function of age. For a given population, LTMs provide estimates of both the trajectories and the individuals’ distribution over them. LTMs therefore perform data-driven clustering based on evolution over time (see Nagin, 1999, for details). Connor (2006) adapted this technique for the analysis of multivariate discrete data and applied it to the NLTCS, with trajectory curves representing the probability of presenting a disability as a function of age. This tool provides a simple mechanism for interpreting typical ways of aging while handling some degree of heterogeneity. However, it assumes that individuals within a class are perfectly homogeneous, and thus attributes all potential within-class variability to random fluctuations. In Connor’s formulation, this assumption essentially says that, within a class, every single individual responds to the exact same underlying aging process. It thus disregards the fact that classes are ideal constructions to which possibly no actual individuals belong (Kreuter and Muthén, 2008).
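LTMs of the form (1) assign each individual to a single class. The Python sketch below makes that hard clustering concrete for binary items: it evaluates the mixture in (1) for one individual and returns the posterior class probabilities. It assumes, for illustration only, that the J items are conditionally independent within a class and that each class trajectory is logit-linear in age; the function `ltm_class_posterior` and all parameter values are hypothetical, and this is not Connor's (2006) implementation.

```python
import numpy as np


def ltm_class_posterior(y, ages, pi, b0, b1):
    """Posterior Pr(class k | y, x) under a latent trajectory mixture, eq. (1).

    y      : (J, T) 0/1 responses of one individual
    ages   : (T,) ages at the T measurement occasions (the covariate x)
    pi     : (K,) mixture weights pi_k
    b0, b1 : (J, K) intercepts and slopes of hypothetical logit-linear trajectories
    """
    # p[j, t, k]: probability of a positive response to item j at occasion t in class k
    p = 1.0 / (1.0 + np.exp(-(b0[:, None, :] + b1[:, None, :] * ages[None, :, None])))
    # log f_k(y | x), treating items and occasions as independent within a class
    log_f = np.where(y[..., None] == 1, np.log(p), np.log1p(-p)).sum(axis=(0, 1))
    log_post = np.log(pi) + log_f
    log_post -= log_post.max()          # guard against underflow before exponentiating
    post = np.exp(log_post)
    return post / post.sum()


# Two hypothetical classes: late onset (k = 0) and early onset (k = 1); ages centred at 80.
b0 = np.array([[-4.0, 0.0], [-4.0, 0.0]])
b1 = np.full((2, 2), 0.1)
ages = np.array([-10.0, -5.0, 5.0])
y = np.ones((2, 3))                     # disabled on both items at every wave
post = ltm_class_posterior(y, ages, np.array([0.8, 0.2]), b0, b1)
```

Even with 80% prior weight on the late-onset class, a respondent who is disabled on both items at every wave is assigned almost entirely to the early-onset class, and an LTM then treats that respondent as a full member of it.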

The second idea, Mixed Membership, provides a powerful and conceptually attractive way of relaxing the within-class homogeneity assumption. Similarly to traditional clustering techniques, such as the Latent Class model (Goodman, 1974) or LTMs, Mixed Membership models still assume the existence of a small number of classes, called ideal types or extreme profiles. However, instead of forcing every individual into one and only one class, they allow individuals to belong simultaneously to more than one class, to different degrees. The Grade of Membership model (GoM; Woodbury et al., 1978; Manton et al., 1991) is an example of a mixed membership model that has been successfully applied to the cross-sectional analysis of the NLTCS (see e.g. Manton et al., 1997, 2006, 2007; Erosheva et al., 2007). Erosheva et al. (2007) developed a fully Bayesian version of the GoM model and applied it to a version of the NLTCS pooled across waves.

The approach we present here combines LTMs with Mixed Membership. It seeks to produce a soft-clustering based on trajectories. Similar to LTM, it assumes that for a given population we can identify a few ways of progressing over time, which we consider ideal extreme cases. At the same time it assumes that individuals in the population do not exactly correspond to these typical profiles, but instead behave somewhere in between them, in quantifiable ways. Note that this approach is conceptually different from previous cross sectional applications of the GoM model to the study of disability. In those applications extreme profiles represented ideal types of disability, whereas in TGoMs they represent ideal types of people. In the same way, it also differs from other previously proposed time-dependent Mixed Membership models, which specify time-evolving individual membership (e.g. Stallard, 2005; Xing et al., 2010). In TGoMs the membership is an immutable characteristic of the individual.

3.1 Basic TGoM Model

We consider a sample composed of N individuals. Following mixed membership ideas, we assume the existence of a number, K, of reference types of individuals called extreme profiles. These extreme profiles represent idealized individuals, so it might be the case that no real individual corresponds exactly to any of them. Instead, we assume that each individual i = 1, …, N has an associated membership vector, g_i = (g_i1, …, g_iK), whose kth component, g_ik, represents their degree of membership in the kth extreme profile. We constrain membership vectors so that their components are non-negative numbers that sum to 1, i.e. they lie on the (K−1)-dimensional unit simplex, Δ_{K−1}. In this way, we identify ideal individuals of the kth type as those whose membership vectors have g_ik = 1 and all other components equal to zero. For instance, we say that an individual with membership vector g_i = (0, 0, 1, 0) belongs exclusively to extreme profile k = 3. Similarly, we can represent more complex membership structures. For example, g_i = (0.1, 0.2, 0.4, 0.3) indicates that individual i has 10% membership in the first extreme profile, 20% in the second, and so on.

We are interested in modeling the progression of disability as time passes. We start by modeling ideal individuals. Let individual i provisionally be a full member of extreme profile k, i.e. g_ik = 1. Let y_ij(τ) be 1 if the individual experiences difficulty performing ADL j at age τ, and 0 otherwise. We model the probability of a positive response to question j, y_ij(τ), as a function of age, λ_jk(τ), so that

$$\lambda_{jk}(\tau) = \Pr\left(y_{ij}(\tau) = 1 \mid g_{ik} = 1, \beta_{jk}, \tau\right). \tag{2}$$

Here β_jk is a generic vector of parameters that indexes λ_jk(·) within a parametric family—e.g. the parameters of a linear logistic curve. We call the functions λ_jk(·) extreme trajectories.

Now moving to actual individuals, we specify the corresponding trajectory of a generic, non-ideal individual i, with membership vector g_i = (g_i1, …, g_iK), as the convex combination

$$\lambda_{j}^{(i)}(\tau) = \Pr\left(y_{ij}(\tau) = 1 \mid g_{i}, \beta_{j}, \tau\right) = \sum_{k=1}^{K} g_{ik}\,\lambda_{jk}(\tau),$$

where β_j = (β_j1, …, β_jK).
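A short Python sketch of this convex combination follows. It assumes the linear-logit form for λ_jk(·) that Section 3.2 adopts later (eq. 8), with τ measured on a centred age scale; the function names and parameter values are hypothetical. It shows that a full member of a profile reproduces that profile's extreme trajectory, while a 50/50 member sits exactly halfway between the two.

```python
import numpy as np


def extreme_trajectories(tau, b0, b1):
    """lambda_jk(tau) for all items j and profiles k, shape (J, K).

    Assumes the linear-logit form of eq. (8); tau is the (centred) age.
    """
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * tau)))


def individual_trajectory(tau, g, b0, b1):
    """lambda_j^{(i)}(tau) = sum_k g_ik * lambda_jk(tau), shape (J,)."""
    g = np.asarray(g, dtype=float)
    if np.any(g < 0) or not np.isclose(g.sum(), 1.0):
        raise ValueError("membership vector must lie on the simplex")
    return extreme_trajectories(tau, b0, b1) @ g


b0 = np.array([[-6.0, -2.0], [-5.0, -1.0]])   # hypothetical values: J = 2 items, K = 2 profiles
b1 = np.full((2, 2), 0.3)
ideal = individual_trajectory(0.0, [1.0, 0.0], b0, b1)   # full member of profile 1
mixed = individual_trajectory(0.0, [0.5, 0.5], b0, b1)   # halfway between the two profiles
```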

Although τ is a continuously-varying quantity, we only have measurements at each of the t = 1, …, T = 6 occasions, corresponding to the waves of the survey. Thus we define y_ijt = y_ij (Age_it), where Age_it is the age of individual i at measurement time t = 1, …, T. We group these numbers into individual vectors Age_i = (Age_i1, …, Age_iT). Then we have that

$$p(y_{ijt} \mid g_{i}, \beta_{j}, \mathrm{Age}_{i}) = \mathrm{Bern}\left(y_{ijt} \mid \lambda_{j}^{(i)}(\mathrm{Age}_{it})\right) = \sum_{k=1}^{K} g_{ik}\,\mathrm{Bern}\left(y_{ijt} \mid \lambda_{jk}(\mathrm{Age}_{it})\right),$$

where $\mathrm{Bern}(y \mid p) = p^{y}(1-p)^{1-y}$, for y ∈ {0, 1} and 0 < p < 1.
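The second equality above holds because, for each fixed y ∈ {0, 1}, the Bernoulli pmf is linear in p. The Python snippet below checks it numerically, using the membership vector from the example in Section 3.1 and hypothetical values of λ_jk(Age_it).

```python
import numpy as np


def bern(y, p):
    """Bernoulli pmf p^y (1 - p)^(1 - y) for y in {0, 1}."""
    return p**y * (1.0 - p) ** (1 - y)


g = np.array([0.1, 0.2, 0.4, 0.3])      # membership vector from the example in Section 3.1
lam = np.array([0.05, 0.2, 0.6, 0.9])   # hypothetical lambda_jk(Age_it), k = 1..4

for y in (0, 1):
    mixture_of_bernoullis = np.sum(g * bern(y, lam))   # right-hand side
    bernoulli_of_mixture = bern(y, g @ lam)            # middle expression
    print(y, mixture_of_bernoullis, bernoulli_of_mixture)
```

Both columns agree, up to floating-point rounding: about 0.445 for y = 0 and 0.555 for y = 1.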

Next, we assume that, for a single individual, the J responses at each of the T measurement times are conditionally independent of one another, given their membership vector, g_i, and covariate vector Age_i. Under this assumption we effectively use the membership vector and the covariates to decouple the dependence structure present in the components of the response. Then we have

$$p(Y_{i} \mid g_{i}, \beta, \mathrm{Age}_{i}) = \prod_{j=1}^{J} \prod_{t=1}^{T} \sum_{k=1}^{K} g_{ik}\,\mathrm{Bern}\left(y_{ijt} \mid \lambda_{jk}(\mathrm{Age}_{it})\right), \tag{3}$$

where Y_i = (y_ijt)_{j=1…J, t=1…T} and β = (β₁, …, β_J). By assuming that each individual has been randomly sampled from the population, we finally obtain the joint model of Y = (Y_i), conditional on g = (g_i) and Age = (Age_i),

$$p(Y \mid g, \beta, \mathrm{Age}) = \prod_{i=1}^{N} \prod_{j=1}^{J} \prod_{t=1}^{T} \sum_{k=1}^{K} g_{ik}\,\mathrm{Bern}\left(y_{ijt} \mid \lambda_{jk}(\mathrm{Age}_{it})\right). \tag{4}$$
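In practice, (4) is evaluated on the log scale, where the triple product becomes a sum of logs of the inner mixtures. The vectorized Python sketch below does this under two stated assumptions: complete data (every individual observed at every wave, which is not true of the NLTCS), and the linear-logit trajectories of eq. (8). The function name `tgom_loglik` and the toy data are hypothetical.

```python
import numpy as np


def tgom_loglik(Y, G, Age, b0, b1):
    """Log of eq. (4): sum over i, j, t of log sum_k g_ik Bern(y_ijt | lambda_jk(Age_it)).

    Y      : (N, J, T) binary responses (complete data assumed: no missing waves)
    G      : (N, K) membership vectors, each row on the simplex
    Age    : (N, T) ages, already centred (e.g. at 80)
    b0, b1 : (J, K) linear-logit trajectory parameters, as in eq. (8)
    """
    # lam[i, j, t, k] = lambda_jk(Age_it)
    lam = 1.0 / (1.0 + np.exp(-(b0[None, :, None, :] + b1[None, :, None, :] * Age[:, None, :, None])))
    bern = np.where(Y[..., None] == 1, lam, 1.0 - lam)   # Bern(y_ijt | lambda_jk(Age_it))
    mix = np.einsum("ik,ijtk->ijt", G, bern)              # sum_k g_ik * Bern(...)
    return np.log(mix).sum()                              # log of the triple product


# Toy data (all values hypothetical): N = 3 people, J = 2 items, T = 4 waves, K = 2 profiles.
rng = np.random.default_rng(0)
N, J, T, K = 3, 2, 4, 2
Y = rng.integers(0, 2, size=(N, J, T))
G = rng.dirichlet(np.ones(K), size=N)
Age = rng.normal(0.0, 5.0, size=(N, T))
b0 = rng.normal(size=(J, K))
b1 = rng.normal(scale=0.1, size=(J, K))
ll = tgom_loglik(Y, G, Age, b0, b1)
```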

We assume that membership vectors are i.i.d. samples from a common distribution G_α, with support on the simplex Δ_K−1. This yields the unconditional (on g) model for the sample Y,

$$p(Y \mid \beta, \mathrm{Age}) = \prod_{i=1}^{N} \int_{\Delta_{K-1}} \prod_{j=1}^{J} \prod_{t=1}^{T} \sum_{k=1}^{K} \omega_{k}\,\mathrm{Bern}\left(y_{ijt} \mid \lambda_{jk}(\mathrm{Age}_{it})\right) G_{\alpha}(d\omega), \tag{5}$$

where ω = (ω₁, …, ω_K) ∈ Δ_{K−1}. Figure 1 shows a graphical representation of the structure of this model.

Figure 1. Graphical probabilistic representation of the basic TGoM model. Observed variable Age_it is the age of individual i at survey wave t. Gray nodes represent observed quantities; white nodes represent parameters to estimate.
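To make the integral in (5) concrete, the Python sketch below approximates one individual's factor by naive Monte Carlo: it draws ω from G_α, assumed here to be the Dirichlet distribution of eq. (9), and averages the conditional likelihood (3) over the draws. The function name `marginal_lik_mc`, the linear-logit trajectories and all parameter values are assumptions for illustration.

```python
import numpy as np


def marginal_lik_mc(Yi, Age_i, b0, b1, alpha, n_draws=20_000, seed=0):
    """Naive Monte Carlo estimate of one individual's factor in eq. (5).

    Draws omega ~ Dirichlet(alpha) (assuming G_alpha is Dirichlet, as in eq. (9))
    and averages the conditional likelihood of eq. (3) over the draws.
    Yi: (J, T) responses, Age_i: (T,) centred ages, b0, b1: (J, K), alpha: (K,).
    """
    rng = np.random.default_rng(seed)
    omega = rng.dirichlet(alpha, size=n_draws)                       # (S, K) draws from G_alpha
    lam = 1.0 / (1.0 + np.exp(-(b0[:, None, :] + b1[:, None, :] * Age_i[None, :, None])))  # (J, T, K)
    bern = np.where(Yi[..., None] == 1, lam, 1.0 - lam)             # (J, T, K)
    cond = np.einsum("sk,jtk->sjt", omega, bern).prod(axis=(1, 2))  # eq. (3) at each draw
    return cond.mean()


Yi = np.array([[0, 1], [1, 1]])                          # hypothetical J = 2 items, T = 2 waves
Age_i = np.array([-3.0, 2.0])
b0 = np.array([[-3.0, 0.0, 1.0], [-2.0, -0.5, 1.5]])     # hypothetical, K = 3 profiles
b1 = np.full((2, 3), 0.2)
est = marginal_lik_mc(Yi, Age_i, b0, b1, alpha=np.array([0.5, 0.3, 0.2]))
```

This estimator has high variance when J × T is large and is shown only to make the integral concrete; Section 4 instead relies on MCMC sampling over the augmented representation introduced next.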

As Erosheva et al. (2007) discuss for the Grade of Membership model, the model in (3) admits the augmented data representation,

$$f^{\mathrm{AUG}}(Y_{i}, Z_{i} \mid \mathrm{Age}_{i}, \beta, g_{i}) = \prod_{j=1}^{J} \prod_{t=1}^{T} \prod_{k=1}^{K} \left[g_{ik}\,\mathrm{Bern}\left(y_{ijt} \mid \lambda_{jk}(\mathrm{Age}_{it})\right)\right]^{I(z_{ijt} = k)}, \tag{6}$$

where Z_i = (z_ijt)_{j=1…J,t=1…T} with z_ijt ∈ {1, 2, …, K}. Following Erosheva et al. (2007), it is easy to show that the expression in (3) is equivalent to

$$p(Y_{i} \mid \mathrm{Age}_{i}, \beta, g_{i}) = \sum_{z \in \mathcal{Z}} f^{\mathrm{AUG}}(Y_{i}, z \mid \mathrm{Age}_{i}, \beta, g_{i}), \tag{7}$$

where $\mathcal{Z} = \{(z_{jt})_{J \times T} : z_{jt} \in \{1, \dots, K\}\}$. We then see that the model in (3) can be thought of as a marginalized version of the model in (6). This equivalence shows that the TGoM model conforms to the general mixed membership structure described in Erosheva et al. (2004). It also makes it possible to construct algorithms for posterior inference in the TGoM using the augmented model (Tanner, 1996).
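The equivalence between (3) and (7) is the distributive law: a product over (j, t) of sums over k expands into a sum, over every assignment of labels z_jt, of products. For a tiny case (J = T = 2, K = 3, hence 3⁴ = 81 label arrays) the Python sketch below checks it by brute-force enumeration. Profile labels are 0-based in the code, and the λ values are hypothetical stand-ins for λ_jk(Age_it).

```python
import itertools

import numpy as np


def f_aug(Yi, Z, g_i, lam):
    """Eq. (6) for one individual. Z holds 0-based labels z_jt; lam has shape (J, T, K)."""
    J, T = Yi.shape
    out = 1.0
    for j in range(J):
        for t in range(T):
            k = Z[j, t]
            p = lam[j, t, k]
            out *= g_i[k] * (p if Yi[j, t] == 1 else 1.0 - p)
    return out


def marginal_by_enumeration(Yi, g_i, lam):
    """Right-hand side of eq. (7): sum of f_aug over all K^(J*T) label arrays."""
    J, T, K = lam.shape
    total = 0.0
    for labels in itertools.product(range(K), repeat=J * T):
        total += f_aug(Yi, np.array(labels).reshape(J, T), g_i, lam)
    return total


def product_of_sums(Yi, g_i, lam):
    """Eq. (3): prod over j, t of sum_k g_ik Bern(y_ijt | lambda_jk(Age_it))."""
    bern = np.where(Yi[..., None] == 1, lam, 1.0 - lam)   # (J, T, K)
    return np.prod(bern @ g_i)


rng = np.random.default_rng(1)
J, T, K = 2, 2, 3
Yi = rng.integers(0, 2, size=(J, T))
g_i = rng.dirichlet(np.ones(K))
lam = rng.uniform(0.05, 0.95, size=(J, T, K))   # hypothetical lambda_jk(Age_it)
lhs = product_of_sums(Yi, g_i, lam)
rhs = marginal_by_enumeration(Yi, g_i, lam)
```

Enumeration grows as K^(J·T), which is why inference samples the labels z instead of summing over them.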

3.2 Detailed Specification

The extreme trajectory functions, λ_jk(·), encode several assumptions about the dynamics of the underlying process over time, so their functional form must be chosen for each application. For this application to the NLTCS, following Connor (2006), we use a linear logit specification

$$\operatorname{logit}\left[\lambda_{jk}(\tau)\right] = \beta_{0jk} + \beta_{1jk}\,\tau. \tag{8}$$

Here β_jk = (β_0jk, β_1jk). This specification expresses the intuitively sound notion that the underlying probability of disability is a monotonic function of age. It also has the advantage of being relatively simple, with just 2 × J parameters per extreme profile. In the Supplementary Materials online we present an analysis using an alternative specification and discuss the appropriateness of (8).

Similar to Erosheva (2002) and Airoldi et al. (2008), we take the common distribution of the N membership vectors g_i, G_α, as

$$g_{i} \mid \alpha \overset{iid}{\sim} \mathrm{Dirichlet}(\alpha), \tag{9}$$

where α = (α₁, α₂, …, α_K) with α_k > 0 for all k = 1, …, K.

The Dirichlet distribution has some useful properties in this setting. First, it is conjugate to the multinomial distribution, which simplifies computations using Gibbs samplers. Second, adopting the re-parametrization α = (α₀·ξ₁, …, α₀·ξ_K) with α₀ > 0, ξ_k > 0 and Σ_k ξ_k = 1, we can interpret ξ_k as the average proportion of responses generated by the kth extreme profile (the vector ξ = (ξ₁, …, ξ_K) is the mean of G_α), and α₀ as a parameter governing the spread of the distribution: as α₀ approaches 0, samples from G_α are increasingly concentrated near the vertices of the simplex Δ_{K−1}, and as α₀ increases they are increasingly concentrated near its mean, ξ.
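The role of α₀ can be seen by simulation. The Python sketch below draws membership vectors from Dirichlet(α₀·ξ) for a hypothetical ξ and reports the average size of the largest component: near 1 when draws sit close to a vertex, and near max_k ξ_k when they cluster around the mean. The function name and the values of ξ and α₀ are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
xi = np.array([0.6, 0.3, 0.1])        # hypothetical mean membership proportions


def mean_max_component(alpha0, n=50_000):
    """Average largest membership component under Dirichlet(alpha0 * xi)."""
    g = rng.dirichlet(alpha0 * xi, size=n)
    return g.max(axis=1).mean()


for alpha0 in (0.2, 1.0, 100.0):
    print(alpha0, round(mean_max_component(alpha0), 3))
```

Small α₀ gives values close to 1 (one profile dominates each individual), and large α₀ gives values close to 0.6 (everyone resembles the population mean ξ).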

As Erosheva et al. (2007) and Airoldi et al. (2007) discuss, fixing the Dirichlet parameter α a priori is too strong an assumption for realistic modeling, and estimates can be highly sensitive to this choice. For this reason, we prefer to estimate these parameters from the data, specifying hyper-priors and computing posterior distributions. We specify hyper-priors for α₀ and ξ similar to Erosheva (2002) and Erosheva et al. (2007): α₀ ~ Gamma(a_α, b_α) and ξ ~ Dirichlet(1_K). This specification takes advantage of the interpretation of the parameters α₀ and ξ, treating them as distinct entities and modeling them separately. For the same reason, we also assume that p(α₀, ξ) = p(α₀)p(ξ).

We specify the priors for the parameters that define the extreme trajectories, β_jk = (β_0jk, β_1jk), as two independent normal distributions, $β_{0jk} \overset{iid}{\sim} N(μ_{0}, σ_{0}^{2})$ and $β_{1jk} \overset{iid}{\sim} N(μ_{1}, σ_{1}^{2})$, for all j = 1, …, J and k = 1, …, K. These priors can be made non-informative by specifying large variances. We also assume that the β_jk are a priori independent of α.

3.3 Representing Generational Changes

The basic TGoM model from Section 3.1 takes advantage of the longitudinal nature of the NLTCS by following individuals as they age. It, however, attributes all variation over time, including changes in prevalence of disability patterns, to the individual progression of aging. Thus, it attributes all changes in prevalence of disability between different epochs to the aggregation of individuals that are at distinct points of their life-trajectories.

To answer questions about changes in the ways of aging across different generations—e.g. “are younger generations acquiring disabilities differently than older ones?”—we need to take into account the birth cohort of individuals. We do so by modeling the dependence between cohorts and the membership scores, keeping the extreme trajectories the same for the whole population. This arrangement allows us to read differences in the ways of aging as differences in the underlying distribution of membership, conditional on birth cohort. We interpret these differences using the common frame of reference provided by the extreme trajectories.

A direct way of enabling inter-generational comparisons under this framework is to keep the individual-level structure proposed for the basic TGoM model, but replace the common distribution of membership vectors with a family indexed by a function of the date of birth (DOB) covariate:

$$\begin{aligned} p(y_{ijt} \mid g_{i}, \mathrm{Age}_{it}, \beta) &= \sum_{k=1}^{K} g_{ik}\,\mathrm{Bern}\left(y_{ijt} \mid \lambda_{jk}(\mathrm{Age}_{it})\right) \\ g_{i} \mid \mathrm{DOB}_{i} &\overset{indep}{\sim} G_{\alpha(\mathrm{DOB}_{i})}. \end{aligned}$$

For our application we keep the Dirichlet specification, but replace its parameter α with a function of DOB, so that $G_{\alpha(\mathrm{DOB})} = \mathrm{Dirichlet}(\alpha(\mathrm{DOB}))$. We note that under this specification the membership vectors, g_i, now depend on a covariate.

A simple, yet reasonably flexible way of specifying α(DOB) is by defining a number of cohorts, and making it constant within each of them. Let Γ = {γ₁, γ₂, …, γ_C} be a finite partition (contiguous non-overlapping intervals) of the range of possible dates of birth. Define α(DOB) = (α₁(DOB), α₂(DOB), …, α_K (DOB)) by

$$\alpha_{k}(\mathrm{DOB}) = \prod_{\gamma \in \Gamma} \left(\alpha_{k}^{\gamma}\right)^{I(\mathrm{DOB} \in \gamma)}, \tag{10}$$

where $α_{k}^{γ} > 0$ . Then, we extend the TGoM model to handle cohort information by replacing the population level distribution of membership vectors, p(g_i|α), with its conditional version, p(g_i|α(DOB_i)). Figure 2 shows a graphical representation of this expanded model.

Figure 2. Probabilistic graphical representation of the extended TGoM model with cohort effects. Gray nodes represent observed quantities; white nodes represent parameters to estimate.

We specify the same hyper-prior distributions that we used for the basic TGoM for all the newly introduced parameters. To this end, define $α_{0}^{γ} = \sum_{k=1}^{K} α_{k}^{γ}$ and $ξ_{k}^{γ} = α_{k}^{γ} / α_{0}^{γ}$, and take $α_{0}^{γ} \overset{iid}{\sim} \mathrm{Gamma}(τ, η)$ and $ξ^{γ} = (ξ_{1}^{γ}, \dots, ξ_{K}^{γ}) \overset{iid}{\sim} \mathrm{Dirichlet}(1_{K})$.
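Because Γ is a partition, exactly one indicator I(DOB ∈ γ) in (10) equals 1, so the product simply selects the parameter vector of the cohort containing DOB. The Python sketch below implements that lookup with the reparametrization α^γ = α₀^γ·ξ^γ. The cut points are read from the DOB column of Table 1 in Section 5; treating them as whole birth years, and assigning a boundary year to the later cohort, are assumptions, as are all parameter values.

```python
import bisect

import numpy as np

# Hypothetical cut points: the birth years bounding the cohorts in Table 1.
# Treating them as whole years, and assigning a boundary year to the later
# cohort, are assumptions; the exact boundary dates are not given here.
CUTS = [1906, 1914, 1919, 1926]


def cohort_index(birth_year):
    """0-based index c of the interval gamma_c of Gamma that contains DOB."""
    return bisect.bisect_right(CUTS, birth_year)


def alpha_of_dob(birth_year, alpha0_by_cohort, xi_by_cohort):
    """Eq. (10) with alpha^gamma = alpha0^gamma * xi^gamma.

    Only the cohort containing DOB has I(DOB in gamma) = 1, so the product
    in eq. (10) reduces to that cohort's parameter vector.
    """
    c = cohort_index(birth_year)
    return alpha0_by_cohort[c] * np.asarray(xi_by_cohort[c], dtype=float)


alpha0_by_cohort = [0.30, 0.28, 0.26, 0.24, 0.22]       # hypothetical alpha_0^gamma
xi_by_cohort = [[0.5, 0.3, 0.2]] * 5                     # hypothetical xi^gamma, K = 3
a = alpha_of_dob(1917.5, alpha0_by_cohort, xi_by_cohort)  # falls in the 1914-1919 cohort
```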

4 Estimation

We developed MCMC algorithms based on Gibbs sampling to obtain samples from the posterior distribution of parameters for both the basic and the generational model. These algorithms rely on the augmented data representation in (6). We present the full description in appendices A and B.
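The full samplers are described in the appendices and are not reproduced here. As a hedged illustration of why the augmented representation (6) helps, the Python sketch below implements two standard conditional updates it makes available: each label z_ijt is drawn with probability proportional to g_ik·Bern(y_ijt | λ_jk(Age_it)), and each g_i is drawn from Dirichlet(α + n_i), where n_ik counts the labels equal to k (Dirichlet-multinomial conjugacy). This is a sketch under those assumptions, not the algorithm of Appendix A; the function name is hypothetical, and the updates for β and α are omitted because the logit link is not conjugate.

```python
import numpy as np


def update_z_and_g(Y, G, lam, alpha, rng):
    """One sweep of the z- and g-updates implied by eqs. (6) and (9).

    Y     : (N, J, T) binary responses
    G     : (N, K) current membership vectors
    lam   : (N, J, T, K) current values of lambda_jk(Age_it)
    alpha : (K,) current Dirichlet parameter
    Returns (Z, G_new) with 0-based labels Z of shape (N, J, T).
    """
    N, K = G.shape
    # Full conditional of z_ijt: proportional to g_ik * Bern(y_ijt | lambda_jk(Age_it))
    w = G[:, None, None, :] * np.where(Y[..., None] == 1, lam, 1.0 - lam)
    w /= w.sum(axis=-1, keepdims=True)
    # Inverse-CDF draw from each categorical; the clip guards against rounding in cumsum
    u = rng.random(w.shape[:-1] + (1,))
    Z = np.minimum((w.cumsum(axis=-1) < u).sum(axis=-1), K - 1)
    # Conjugacy: g_i | z_i ~ Dirichlet(alpha + n_i), with n_ik = #{(j, t): z_ijt = k}
    counts = np.stack([(Z == k).sum(axis=(1, 2)) for k in range(K)], axis=1)
    G_new = np.vstack([rng.dirichlet(alpha + n_i) for n_i in counts])
    return Z, G_new


# Toy state (all values hypothetical): N = 4 people, J = 6 ADLs, T = 6 waves, K = 3 profiles.
rng = np.random.default_rng(3)
N, J, T, K = 4, 6, 6, 3
Y = rng.integers(0, 2, size=(N, J, T))
G = rng.dirichlet(np.ones(K), size=N)
lam = rng.uniform(0.05, 0.95, size=(N, J, T, K))
Z, G1 = update_z_and_g(Y, G, lam, np.full(K, 0.3), rng)
```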

5 Application to the NLTCS

We have selected an extract from the NLTCS data that includes data from all six waves. These data include all the individuals who received the screener in at least one of the first five waves of the survey (1982, 1984, 1989, 1994 or 1999). We excluded individuals who entered the sample for the first time in 2004 because of a lack of information about their dates of birth and death. Similarly, we excluded all the individuals who were institutionalized in 1982 because the NLTCS did not record their ADL statuses that year. The resulting sample size was N = 38,428 subjects. For each individual at each wave we focused on six ADLs: Eating (j = 1), Dressing (j = 2), Toileting (j = 3), Getting In or Out of Bed (j = 4), Inside Mobility (j = 5) and Bathing (j = 6). We determined the age of each individual in years by computing the difference between the interview and birth dates and assuming 365 days for all years. For computation and prior specification purposes, we re-centered ages at 80 years. However, for clarity we report all estimates and descriptive statistics related to age without the offset.
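The age convention above can be sketched in a few lines of Python; the function names and dates are hypothetical. Because calendar years average about 365.24 days, dividing by 365 makes ages slightly larger than exact calendar ages (by about 0.05 years at age 80).

```python
from datetime import date, timedelta


def age_in_years(interview: date, birth: date) -> float:
    """Age as the interview-minus-birth difference in days, divided by 365."""
    return (interview - birth).days / 365.0


def centred_age(interview: date, birth: date, centre: float = 80.0) -> float:
    """Age re-centred at 80 years, as used for computation and prior specification."""
    return age_in_years(interview, birth) - centre


# Hypothetical respondent who is exactly 80 * 365 days old on the interview date.
birth = date(1905, 6, 1)
interview = birth + timedelta(days=80 * 365)
print(age_in_years(interview, birth), centred_age(interview, birth))   # 80.0 0.0
```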

We defined five cohorts or generational groups, partitioning the ranges of possible dates of birth according to the intervals defined in the second column of Table 1. We selected these intervals so that they group approximately the same number of individuals. A salient feature of this arrangement is that individuals from the youngest cohort (cohort 5—born after 1926) have measurements only in the last three waves due to age eligibility, as its oldest members turned 65 after 1991. Also note that neither the oldest (cohort 1—born before 1906) nor the youngest (cohort 5—born after 1926) cohorts span the whole range of relevant dates of birth in the NLTCS. In fact, the oldest individual in cohort 5 could be at most 78 years old in 2004, while the youngest individual from cohort 1 could not be younger than 76 years old in 1982.

Table 1.

Cohort definition and distribution by wave

		Wave
Cohort	DOB	1982 (t = 1)	1984 (t = 2)	1989 (t = 3)	1994 (t = 4)	1999 (t = 5)	2004^* (t = 6)
1	–1906	6329	6025	1347	1397	617	70
2	1906–1914	7631	7082	3452	3335	1753	575
3	1914–1919	3696	7839	2627	5102	3679	2010
4	1919–1926	1	463	2410	4581	4724	3505
5	1926–	0	0	0	2478	6403	4251


* Only individuals present in 1999.

5.1 Basic GoM trajectory model

We fitted the basic model described in Section 3.2 to the NLTCS data using the MCMC algorithm from Appendix A, for K = 2, 3, 4 and 5 extreme profiles.

We set the prior distribution for the proportions parameter of the membership vector, ξ, as a uniform distribution over Δ_{K−1}, i.e. Dirichlet(1_K). We specified the prior distribution for the corresponding concentration parameter, α₀, as Gamma(1, 5), in shape/inverse-scale parametrization. This specification expresses a slight, not very pronounced, preference for small values of α₀. This choice is more a modeling decision than an expression of prior knowledge: small values of α₀ in the Dirichlet parametrization concentrate individual membership vectors around the vertices of the unit simplex. This produces individual membership vectors in which one profile is predominant but the other profiles still exert some effect. This behavior is desirable from an interpretative standpoint: it allows us to discuss “predominant” profiles while retaining a significant degree of flexibility in handling heterogeneity through the influence of the other profiles. For the parameters governing the extreme trajectories, we selected diffuse independent normal priors with mean 0 and variance 100 (μ₀ = μ₁ = 0 and σ₀² = σ₁² = 100).
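Software conventions for these priors differ from the notation used here, which is an easy source of error. The Python sketch below draws from the stated priors with NumPy, whose `Generator.gamma` takes a scale (so rate 5 becomes scale 1/5) and whose `Generator.normal` takes a standard deviation (so variance 100 becomes 10). The choice of K = 3 and the number of draws are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
K, S = 3, 100_000
a_alpha, b_alpha = 1.0, 5.0   # Gamma(1, 5), shape / inverse-scale (rate)

# NumPy's gamma is parametrised by scale, so a rate of 5 becomes scale = 1/5.
alpha0 = rng.gamma(shape=a_alpha, scale=1.0 / b_alpha, size=S)
# Uniform prior on the simplex: Dirichlet(1_K).
xi = rng.dirichlet(np.ones(K), size=S)
# N(0, 100) prior on a trajectory parameter: NumPy takes the standard deviation (10).
beta0 = rng.normal(loc=0.0, scale=np.sqrt(100.0), size=S)

print(alpha0.mean(), xi.mean(axis=0), beta0.std())   # roughly 0.2, [1/3, 1/3, 1/3], 10
```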

In all cases, the MCMC chains converged rapidly, reaching their stationary distributions after approximately 15,000 iterations. Still, run times were long due to the chains’ slow mixing. In all cases, we ran 120,000 iterations, discarded the first 20,000 as burn-in, and thinned the remaining 100,000, keeping 20% of them. Like other latent variable models, the TGoM is invariant to permutations of its extreme profile labels. We therefore inspected the trace plots for signs of label switching (Jasra et al., 2005) and found none. Although label switching is a potential problem, in this application the modal regions of the posterior distribution seem to be well separated due to the abundance of data.
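The bookkeeping implied by these settings is sketched below in Python, assuming that "keeping 20%" means systematic thinning that retains every fifth draw after burn-in; the function name is hypothetical.

```python
import numpy as np

N_ITER, BURN_IN, KEEP_EVERY = 120_000, 20_000, 5   # keeping 20% = one draw in five


def burn_and_thin(chain, burn_in=BURN_IN, keep_every=KEEP_EVERY):
    """Drop the first `burn_in` draws, then keep every `keep_every`-th remaining draw."""
    return chain[burn_in::keep_every]


chain = np.arange(N_ITER)     # stand-in for the stored draws of one parameter
kept = burn_and_thin(chain)
print(len(kept))              # 20000
```

Under this assumption, each run yields 20,000 retained draws per parameter.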

5.1.1 Basic Model Results

The basic TGoM model includes parameters that represent two distinct structural features: typical ways of aging, given by the extreme profiles (parameters β), and the way individuals distribute with respect to these extreme profiles (parameters ξ and α₀). Extreme profile parameters can be difficult to interpret directly. Thus we instead consider the quantities given by the transformation

$$\mathrm{Age}_{q,jk} = -\frac{1}{\beta_{1jk}}\left[\beta_{0jk} + \log\left(\frac{1-q}{q}\right)\right] + 80, \tag{11}$$

for q = 0.1, q = 0.5 and q = 0.9. These quantities express the age at which an ideal individual of extreme profile k reaches a probability q of being unable to perform ADL j. The 80-year offset is required because we re-centered the age data by subtracting 80. We also re-label extreme profiles in decreasing order of the posterior estimates of ξ_k; this is necessary because of the TGoM’s invariance to permutations of the extreme profile labels. In this way, the expression “first extreme profile” (k = 1) always refers to the extreme profile with the highest relative importance in the population (the one to which most individuals are closest; see Section 5.1), and “the last” (k = K) to the one with the lowest.
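Equation (11) follows from setting λ_jk(τ) = q in (8) and solving for τ on the centred scale, since logit(q) = −log((1 − q)/q). The Python sketch below computes Age_q,jk for a hypothetical trajectory, checks that substituting it back into the logistic curve returns q, and shows the relabeling by decreasing ξ_k using the K = 2 estimates from Table 2. The function names are hypothetical.

```python
import numpy as np


def age_q(b0, b1, q, centre=80.0):
    """Eq. (11): age at which an ideal member of profile k reaches Pr(y_j = 1) = q.

    b0, b1 are the linear-logit parameters of eq. (8) on the centred age scale;
    b1 must be non-zero.
    """
    return -(b0 + np.log((1.0 - q) / q)) / b1 + centre


def relabel_by_xi(xi):
    """0-based profile indices ordered by decreasing posterior estimate of xi_k."""
    return np.argsort(-np.asarray(xi, dtype=float))


# Hypothetical trajectory: intercept -2, slope 0.25 per year on the centred scale.
ages = {q: age_q(-2.0, 0.25, q) for q in (0.1, 0.5, 0.9)}
print(ages[0.5])                       # 88.0
order = relabel_by_xi([0.176, 0.824])  # array([1, 0]): the second profile becomes k = 1
```

When β_1jk < 0, Age_0.1,jk exceeds Age_0.9,jk, so the [Age_0.1,jk, Age_0.9,jk] segments plotted in Figure 3 are only well ordered for increasing trajectories.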

Figure 3 and Table 2 present summaries of the posterior distribution of the extreme profile and mixed membership parameters, respectively. Plots in Figure 3 are based on posterior means of the quantities Age_q,jk, for models with K = 2, 3, 4. For each extreme profile, vertical line segments represent the age interval over which the probability of being unable to perform each ADL increases from 10% to 90%, i.e. [Age_0.1,jk, Age_0.9,jk]. To aid visualization, we sorted the ADLs according to the Age_0.5,jk estimates. Note that this procedure resulted in exactly the same sequence of ADLs in every case. Table 2 shows the posterior summaries of the mixed membership distribution parameters, α₀ and ξ.

Figure 3. Posterior estimates of extreme profiles for models with K = 2, 3, 4. Vertical segments represent the age range over which ideal individuals’ probabilities of disability go up from 0.1 to 0.9, for each ADL ([Age_0.1,jk, Age_0.9,jk]). For visualization purposes, ADLs are sorted according to Age_0.5,jk posterior estimates.

Table 2.

Posterior estimates of population-level parameters for the basic model with K = 2, 3, 4, 5 extreme profiles. Numbers in parentheses are posterior standard deviations.

	α ₀	ξ ₁	ξ ₂	ξ ₃	ξ ₄	ξ ₅
K = 2	0.328 (0.007)	0.824 (0.002)	0.176 (0.002)	—	—	—
K = 3	0.261 (0.006)	0.645 (0.004)	0.251 (0.004)	0.104 (0.002)	—	—
K = 4	0.237 (0.006)	0.540 (0.005)	0.259 (0.004)	0.124 (0.003)	0.078 (0.002)	—
K = 5	0.235 (0.005)	0.496 (0.007)	0.244 (0.006)	0.128 (0.003)	0.074 (0.002)	0.058 (0.001)


Estimates of the parameter α₀ in Table 2 are relatively small for all models. This was expected, since the prior distribution of α₀, Gamma(1, 5), already expressed a preference for small values of α₀. However, as their very small posterior dispersion relative to the prior dispersion shows, these estimates are strongly data-driven. This is not surprising, considering the large amount of data available for estimation.
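The comparison of dispersions can be made explicit. A Gamma(a, b) distribution in shape/rate form has mean a/b and standard deviation √a/b, so the Gamma(1, 5) prior has mean and standard deviation 0.2. The Python snippet below divides the posterior standard deviations reported in Table 2 by this prior value.

```python
import math

a, b = 1.0, 5.0                    # Gamma(1, 5) prior on alpha_0, shape / rate
prior_mean = a / b                 # 0.2
prior_sd = math.sqrt(a) / b        # 0.2
posterior_sd = {2: 0.007, 3: 0.006, 4: 0.006, 5: 0.005}   # Table 2, by K
ratio = {K: sd / prior_sd for K, sd in posterior_sd.items()}
print(ratio)
```

The posterior standard deviations are roughly 2.5% to 3.5% of the prior standard deviation.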

For all models, the extreme profile with the highest relative importance in the population, k = 1, represents a pattern of healthy aging, with a very late onset of disability. Extreme trajectories in this profile show that, for any ADL, ideal individuals in this class have a very small probability of experiencing disability until approximately age 90. The remaining extreme profiles show patterns with progressively earlier onsets of disability as we consider them in sequence. This feature is worth noting: all models point to an inverse relationship between the relative importance of a profile in the population and its implied age of onset of disability. That is, most people’s aging trajectories are closer to a profile that describes a late onset of disability.

We note that the sequence of ADLs obtained by sorting them according to their implied age of onset of disability (represented by parameter Age_0.5,jk) is the same for all extreme profiles of all models. Closer inspection reveals that this order of acquisition of disabilities, inferred directly from the data, closely follows what we can interpret as a sequence of activities sorted in decreasing order of difficulty: inside mobility, toileting, dressing, bathing, getting in and out of bed, and eating.
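The following sketch shows how the quantities Age_{p,jk} can be obtained, assuming (as the figure caption suggests) that Age_{p,jk} is the age at which the extreme trajectory λ_jk(τ) = exp(β_0jk + β_1jk τ)/(1 + exp(β_0jk + β_1jk τ)), the logistic form in equation (17), reaches probability p. Solving for τ gives Age_{p,jk} = (logit(p) − β_0jk)/β_1jk, which is defined only for β_1jk > 0, and in particular Age_{0.5,jk} = −β_0jk/β_1jk. The coefficient values are hypothetical, chosen only to reproduce part of the ordering described above.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def age_at_probability(p, beta0, beta1):
    """Age at which exp(b0 + b1*a) / (1 + exp(b0 + b1*a)) equals p.

    The inversion is only meaningful for an increasing trajectory (beta1 > 0).
    """
    if beta1 <= 0:
        raise ValueError("age of onset is undefined for a non-positive slope")
    return (logit(p) - beta0) / beta1


# Hypothetical (beta_0jk, beta_1jk) values for one profile k; NOT estimates from the paper.
profile = {
    "inside mobility": (-27.0, 0.3),
    "bathing": (-27.6, 0.3),
    "eating": (-28.5, 0.3),
}

# [Age_0.1, Age_0.9] segments, as drawn in the extreme-profile figure.
segments = {adl: (age_at_probability(0.1, b0, b1), age_at_probability(0.9, b0, b1))
            for adl, (b0, b1) in profile.items()}

# ADLs ordered by their implied age of onset, Age_0.5.
order = sorted(profile, key=lambda adl: age_at_probability(0.5, *profile[adl]))
print(order)
print(segments)
```

Because logit(0.1) = −logit(0.9), each segment is centered at Age_0.5, and its width, 2 logit(0.9)/β_1jk, shrinks as the slope grows. The `ValueError` branch is where a negative slope, like those of profile k = 5 in the K = 5 model, would make these summaries undefined.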

Another salient feature of these results is that for k = 1, 2, 3 and 4, the inferred slope parameters of the extreme trajectories (β_1jk) are all positive, even though the prior specification allows negative values. This result supports the intuition that the probability of experiencing a disability in any ADL can only increase with age. It also makes it possible to construct the graphical summaries in Figure 3. The only exception to this regularity is profile k = 5 in the model with K = 5 extreme profiles, whose trajectories exhibit a counter-intuitive decreasing progression. We note that the relative importance of this profile in the population is small, with $\hat{ξ}_{5}$ ≈ 0.058 (compared with $\hat{ξ}_{1}$ ≈ 0.496 for the most important profile). From a modeling perspective, an obvious way of avoiding this type of aberrant behavior is to rule it out a priori, restricting the support of the slope parameters to positive values. We have implemented such a model. However, while the rest of the parameters remained almost the same, the slope parameters of most trajectories in this profile were zero or very close to zero. These outcomes, together with the results obtained using a different trajectory specification (see the Supplemental Materials), suggest that this profile captures a small amount of residual variability that is not correctly modeled by the main extreme trajectories. Accounting for this effect is an area for future improvement.

To better understand the way TGoM models handle individual-level heterogeneity, it is instructive to visualize, in addition to the extreme trajectories, the actual individual trajectories that result from the individual-level mixing, $λ_{jk}^{(i)}(τ)$. Figures 4 and 5 show a random sample of 100 such curves for each ADL, overlaid on the extreme trajectory curves, under the models with K = 3 and K = 4 extreme profiles, respectively. Most of the individual curves cluster in the vicinity of the extreme curves. This is expected, given the small value of the concentration parameter α₀. However, a substantial portion of the individual curves lies somewhere in between extremes, exhibiting trajectories that result from the interaction of more than one extreme profile. In particular, we observe a fair number of individual trajectories that fall between extremes k = 1 and k = 2. These trajectories form a somewhat homogeneous cluster that differs from the extreme profiles. Nonetheless, the TGoM model has been able to accommodate them as a combination of (mainly) profiles k = 1 and k = 2, without needing to create a whole new category for them. This behavior is what gives TGoM models the flexibility to accommodate complex individual heterogeneity while still producing meaningful and interpretable summaries. Unlike traditional LTMs (Nagin, 1999; Connor, 2006), which require that individuals follow one and only one of the typical trajectories, this approach allows individuals to depart from the main tendencies, but not too much, thus retaining interpretability.

Individual-level mixture of trajectories for the model with K = 3 extreme profiles, for each ADL. Extreme trajectories are represented with thick lines, and a random sample of 100 individual posterior trajectory curves is plotted using thin lines.

Individual-level mixture of trajectories for the model with K = 4 extreme profiles, for each ADL. Extreme trajectories are represented with thick lines, and a random sample of 100 individual posterior trajectory curves is plotted using thin lines.
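To make the individual-level mixing concrete, the sketch below assumes that, after summing over z_ijt in equation (17), an individual's probability of disability in ADL j at age τ is the membership-weighted average Σ_k g_ik λ_jk(τ) of the extreme trajectories. It draws membership vectors g_i ~ Dirichlet(α) with α = α₀ξ, using the K = 3 posterior means from Table 2, and evaluates 100 individual curves. The trajectory coefficients in `betas` are hypothetical, not estimates from the paper.

```python
import math
import random


def dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    while True:
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(draws)
        if total > 0.0:  # guard against underflow when every alpha_k is tiny
            return [d / total for d in draws]


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def individual_curve(age, g, betas):
    """Pr(y = 1 | g) at a given age: membership-weighted mix of extreme trajectories."""
    return sum(g_k * logistic(b0 + b1 * age) for g_k, (b0, b1) in zip(g, betas))


# Posterior means for K = 3 from Table 2: alpha_k = alpha_0 * xi_k.
alpha0 = 0.261
xi = [0.645, 0.251, 0.104]
alpha = [alpha0 * x for x in xi]

# Hypothetical (beta_0jk, beta_1jk) for one ADL; profile k = 1 has the latest onset.
betas = [(-27.0, 0.3), (-24.0, 0.3), (-21.0, 0.3)]

rng = random.Random(2014)
memberships = [dirichlet(alpha, rng) for _ in range(1000)]

# Share of individuals whose membership vector is dominated by a single profile.
near_vertex = sum(max(g) > 0.9 for g in memberships) / len(memberships)

# 100 individual curves evaluated at ages 65, 70, ..., 100.
ages = list(range(65, 101, 5))
curves = [[individual_curve(a, g, betas) for a in ages] for g in memberships[:100]]
print(f"share of memberships with max component > 0.9: {near_vertex:.2f}")
```

With α₀ this small, the Dirichlet draws tend to concentrate near the corners of the simplex, so most sampled curves should lie close to a single extreme trajectory while a minority fall in between, which mirrors the pattern described for Figures 4 and 5.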

5.1.2 Multivariate Model Diagnostics

TGoM models explicitly model individual-level dependency between disability outcomes, both longitudinally and between ADLs, with the help of an individual-level mixed membership structure. In order to investigate empirically how TGoMs handle this dependency, we evaluate posterior univariate and multivariate out-of-sample predictive quantities. We define

ϕ_{ijt}^{K} = \Pr(y_{ijt}^{*} = y_{ijt} \mid D, K)

(12)

ϕ_{ij}^{K} = \Pr(y_{ijt}^{*} = y_{ijt} \text{ for all } t \mid D, K)

(13)

ϕ_{it}^{K} = \Pr(y_{ijt}^{*} = y_{ijt} \text{ for all } j \mid D, K)

(14)

ϕ_{i}^{K} = \Pr(y_{ijt}^{*} = y_{ijt} \text{ for all } t \text{ and all } j \mid D, K),

(15)

where $y_{ijt}^{*}$ is the posterior predictive outcome of individual i in ADL-j at wave t, D denotes the NLTCS data, and K is the number of extreme profiles. Thus, for individual i, $ϕ_{ijt}^{K}$ is the (univariate) posterior probability of correctly predicting outcome y_ijt using a TGoM with K extreme profiles; $ϕ_{ij}^{K}$ is the probability of simultaneously correctly predicting the whole sequence of responses to ADL-j across all waves; $ϕ_{it}^{K}$ is the corresponding probability of correctly predicting all the ADLs at wave t; and $ϕ_{i}^{K}$ is the probability of simultaneously correctly predicting all the responses of the individual. To estimate the out-of-sample predictive performance of our models, we compute all these quantities using a 4-fold cross-validation scheme (Hastie et al., 2009; Airoldi et al., 2010).
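In practice these probabilities can be estimated by Monte Carlo: generate S posterior predictive replicates y*_ijt (for example, one per retained MCMC draw) and record the fraction of replicates that match the observed outcomes on the relevant set of indices. The sketch below does this for a single individual; the nested-list layout (ADL × wave) and the toy values are assumptions for illustration, and neither the cross-validation loop nor missing waves are handled.

```python
def prediction_rates(y_obs, y_rep):
    """Monte Carlo estimates of phi_ijt, phi_ij, phi_it and phi_i for one individual i.

    y_obs: J x T nested list of observed 0/1 outcomes y_ijt.
    y_rep: list of S posterior predictive replicates y*_ijt, each a J x T nested list.
    """
    J, T, S = len(y_obs), len(y_obs[0]), len(y_rep)
    # match[s][j][t] is True when replicate s reproduces the observed y_ijt.
    match = [[[rep[j][t] == y_obs[j][t] for t in range(T)] for j in range(J)]
             for rep in y_rep]
    phi_jt = [[sum(m[j][t] for m in match) / S for t in range(T)] for j in range(J)]
    phi_j = [sum(all(m[j]) for m in match) / S for j in range(J)]      # all waves of ADL j
    phi_t = [sum(all(m[j][t] for j in range(J)) for m in match) / S    # all ADLs at wave t
             for t in range(T)]
    phi = sum(all(all(row) for row in m) for m in match) / S           # every response
    return phi_jt, phi_j, phi_t, phi


# Toy example: J = 2 ADLs, T = 2 waves, S = 4 replicates (hypothetical values).
y_obs = [[0, 1], [0, 0]]
y_rep = [
    [[0, 1], [0, 0]],
    [[0, 0], [0, 0]],
    [[0, 1], [1, 0]],
    [[0, 1], [0, 0]],
]
phi_jt, phi_j, phi_t, phi = prediction_rates(y_obs, y_rep)
print(phi_jt, phi_j, phi_t, phi)
```

Averaging each quantity over individuals (and over j and t where applicable) gives file-level means of the kind reported in Table 3.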

As a comparison, we also fit a model that assumes stochastic independence between univariate outcomes, given age. We fit six (one for each ADL) non-parametric logistic regressions of π_ijt = Pr(y_ijt = 1) on age, using Generalized Additive Models (GAM; Hastie et al., 2009, Ch. 9). We use this model as a reference for assessing how our models handle the multivariate structure present in the data. To this end, we compute quantities analogous to (12)–(15): $ϕ_{ijt}^{\text{GAM}} = \text{Bern}(y_{ijt} \mid \hat{π}_{ijt})$, $ϕ_{ij}^{\text{GAM}} = \prod_{t} ϕ_{ijt}^{\text{GAM}}$, $ϕ_{it}^{\text{GAM}} = \prod_{j} ϕ_{ijt}^{\text{GAM}}$, and $ϕ_{i}^{\text{GAM}} = \prod_{j,t} ϕ_{ijt}^{\text{GAM}}$, where $\hat{π}_{ijt}$ is the fitted value of π_ijt. We compute these quantities using the same 4-fold cross-validation scheme we use for the TGoM quantities.
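Under the independence model the joint quantities are products of univariate Bernoulli probabilities, which is why they shrink quickly as more outcomes are predicted jointly. A minimal sketch for one individual, assuming the fitted probabilities π̂_ijt are already available (the GAM fit itself is not reproduced, and `independence_rates` is a hypothetical helper):

```python
import math


def bernoulli_pmf(y, p):
    """Bern(y | p) for y in {0, 1}."""
    return p if y == 1 else 1.0 - p


def independence_rates(y_obs, pi_hat):
    """phi^GAM quantities for one individual, given fitted probabilities pi_hat[j][t].

    Under independence given age, every joint quantity is a product of univariate ones.
    """
    J, T = len(y_obs), len(y_obs[0])
    phi_jt = [[bernoulli_pmf(y_obs[j][t], pi_hat[j][t]) for t in range(T)] for j in range(J)]
    phi_j = [math.prod(row) for row in phi_jt]                               # product over t
    phi_t = [math.prod(phi_jt[j][t] for j in range(J)) for t in range(T)]   # product over j
    phi = math.prod(phi_j)                                                   # product over j, t
    return phi_jt, phi_j, phi_t, phi


# Hypothetical: 6 ADLs observed at one wave, no disabilities, fitted pi_hat = 0.2 for each.
y_obs = [[0] for _ in range(6)]
pi_hat = [[0.2] for _ in range(6)]
phi_jt, phi_j, phi_t, phi = independence_rates(y_obs, pi_hat)
print(phi)  # each univariate rate is 0.8, but the joint rate is 0.8 ** 6
```

Even with a univariate rate of 0.8 for every ADL, the probability of getting all six right under independence is only 0.8⁶ (about 0.26). A model that shares information across an individual's responses, as the TGoM membership vector g_i does, is not bound by this product.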

Table 3 shows the 4-fold cross-validated means (over all their subindices) of ϕ_ijt, ϕ_ij, ϕ_it and ϕ_i for TGoM models with K = 1, 2, …, 5 extreme profiles and for the logistic GAM models. We take these numbers as estimates of the corresponding file-level rates of correct predictions for each model. Both TGoM and GAM models have similar univariate prediction rates, of around 80%, slightly favoring GAM. However, most of the joint prediction rates of the TGoM models with K > 1 are substantially better than those of the alternative. In particular, TGoM correct prediction rates for complete individual outcome vectors, $\bar{ϕ}_{i}$, are between 41.4% and 45.4%, while for the GAM model the rate drops to 24.7%. We also observe that multivariate prediction rates using TGoM models tend to be much closer to their univariate prediction rates than the corresponding quantities using the GAM alternative. For instance, the ratio $\bar{ϕ}_{i} / \bar{ϕ}_{ijt}$ (numbers in parentheses in the last column of Table 3) ranges from 51.6% to 56.8% for TGoM models (K = 2 and K = 5, respectively), while for GAM it falls to 30.4%. Finally, estimates from the TGoM model with K = 1 are almost identical to those obtained with GAM. This is because fitting a TGoM with only one extreme profile (K = 1) is equivalent to fitting J independent logistic regressions of the response variables (ADLs) on the predictor (age).
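The percentages in parentheses in Table 3 are simple ratios of the reported means. The sketch below recomputes them from the rounded table entries as a consistency check; for these entries it recovers the reported percentages to one decimal place.

```python
# Mean out-of-sample correct-prediction rates from Table 3:
# (phi_ijt, phi_ij, phi_it, phi_i) for each model.
table3 = {
    "TGoM K = 1": (0.811, 0.644, 0.452, 0.251),
    "TGoM K = 2": (0.803, 0.666, 0.567, 0.414),
    "TGoM K = 3": (0.802, 0.668, 0.593, 0.440),
    "TGoM K = 4": (0.801, 0.668, 0.605, 0.451),
    "TGoM K = 5": (0.799, 0.664, 0.607, 0.454),
    "GAM-Logistic": (0.812, 0.645, 0.451, 0.247),
}


def ratios_to_univariate(rates):
    """Each rate as a percentage of the univariate rate phi_ijt, rounded to 0.1%."""
    univariate = rates[0]
    return [round(100.0 * r / univariate, 1) for r in rates]


ratios = {model: ratios_to_univariate(rates) for model, rates in table3.items()}
for model, row in ratios.items():
    print(model, row)
```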

Table 3.

Out-of-sample rates of univariate and multivariate correct predictions for TGoM and non-parametric logistic regression models. Percentages in parentheses are the ratio of each entry to the corresponding univariate correct prediction rate, $\bar{ϕ}_{ijt}$.

Model	$\bar{ϕ}_{ijt}$	$\bar{ϕ}_{ij}$	$\bar{ϕ}_{it}$	$\bar{ϕ}_{i}$
TGoM K = 1	0.811 (100%)	0.644 (79.4%)	0.452 (55.7%)	0.251 (30.9%)
TGoM K = 2	0.803 (100%)	0.666 (82.9%)	0.567 (70.6%)	0.414 (51.6%)
TGoM K = 3	0.802 (100%)	0.668 (83.3%)	0.593 (73.9%)	0.440 (54.9%)
TGoM K = 4	0.801 (100%)	0.668 (83.4%)	0.605 (75.5%)	0.451 (56.3%)
TGoM K = 5	0.799 (100%)	0.664 (83.1%)	0.607 (76.0%)	0.454 (56.8%)
GAM-Logistic	0.812 (100%)	0.645 (79.4%)	0.451 (55.5%)	0.247 (30.4%)


An interesting feature in Table 3 is that longitudinal predictions ($\bar{ϕ}_{ij}$) are better than cross-sectional predictions ($\bar{ϕ}_{it}$) for all models. This can be explained by noting that both TGoM and GAM approaches exploit the extra longitudinal information provided by the vectors of individuals’ ages. By contrast, when modeling the multivariate cross-sectional structure, TGoM relies only on Mixed Membership and GAM only on independence given age. Nonetheless, the comparison between the two modeling approaches still favors TGoM models.

The conclusion of this prediction exercise is that TGoM models with more than one extreme profile do capture a large portion of the multivariate structure present in the data, both longitudinally and cross-sectionally.

5.2 Fitting the Cohort Extensions

We fitted the model with the cohort extensions to the NLTCS data, using the MCMC algorithm in Appendix B, for K = 2, 3 and 4 extreme profiles.

The main objective of the analysis with this model is to compare the underlying distribution of the membership vectors conditional on generational groups, as a way of assessing differences in the ways of aging between cohorts. We do this by directly comparing the parameters α^γ of these distributions across generational groups γ ∈ Γ, and interpreting them with respect to the common extreme trajectories, defined by the parameters (β_jk).

Figure 6 shows the estimates (posterior means) of the components of the vector ξ for models with K = 2, 3 and 4 extreme profiles, for each cohort. For each component k, the values $ξ_{k}^{γ_c}$ of successive generational groups γ_c are linked with lines. Reading from left to right, these sequences show the evolution of the relative weight of the kth component as we shift our attention from older to younger cohorts. Posterior estimates of the common extreme profile parameters, (β_jk), are very similar to those computed using the basic TGoM model (see Supplemental Materials for details), so we can safely refer to them when discussing extreme profiles.

Evolution of the parameter vector ξ across different generations for models with K = 2, 3 and 4 extreme profiles. The error bars show 95% equal-tailed posterior credible intervals for the kth component of the vector ξ.

The most salient feature in Figure 6 is the monotonic increase in the relative importance of the first component (k = 1) as we consider younger and younger cohorts, i.e., $ξ_{1}^{γ_{1}} < ξ_{1}^{γ_{2}} < \dots < ξ_{1}^{γ_{5}}$. This is especially clear in the models with K = 2 and K = 3. In the model with K = 4, because of the high posterior dispersion, it is not clear whether the youngest generation actually follows this pattern. A likely explanation for this uncertainty is the lack of data for ages above 78 in cohort 5.
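Equal-tailed credible intervals, like those in Figure 6, can be read directly off MCMC output as empirical quantiles. A minimal sketch, assuming the draws of one component of ξ for one cohort are available as a list; the draws below are simulated from a Beta distribution purely for illustration, and linear interpolation between order statistics is one of several common quantile conventions.

```python
import random


def quantile(sorted_xs, q):
    """Empirical q-quantile of sorted data, interpolating linearly between order statistics."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] + frac * (sorted_xs[hi] - sorted_xs[lo])


def equal_tailed_interval(draws, level=0.95):
    """Equal-tailed credible interval: (1 - level) / 2 of the posterior mass in each tail."""
    xs = sorted(draws)
    tail = (1.0 - level) / 2.0
    return quantile(xs, tail), quantile(xs, 1.0 - tail)


# Hypothetical MCMC draws of xi_1 for one cohort (not output from the paper's sampler).
rng = random.Random(6)
xi1_draws = [rng.betavariate(60.0, 40.0) for _ in range(4000)]
lower, upper = equal_tailed_interval(xi1_draws)
print(f"95% equal-tailed interval for xi_1: ({lower:.3f}, {upper:.3f})")
```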

This trend tells us that, as we consider newer cohorts, their members tend to be increasingly close to profile k = 1. This profile corresponds to the healthiest aging progression, with extremely low probability of acquiring disabilities until very advanced ages, as can be observed in Figure 3. Thus, we conclude that younger generations tend to have healthier ways of aging, compared to their elders.

6 Discussion

The methods we propose and apply here have several desirable features. First, they produce meaningful and easy-to-interpret summaries of the main temporal trends in the population. In this application, these summaries (the extreme trajectories) isolate typical ways of progressing into disability and allow a simplified analysis of the longitudinal patterns. Second, they allow a simple, but not oversimplified, characterization of individual heterogeneity in terms of the extreme trajectories. This keeps the extreme profile characterizations simple, while still allowing the representation of complex individual trajectories. Finally, the model’s extensions allow comparisons between groups of individuals defined by given static characteristics. In this application, they enable us to separate time-dependent effects that depend on age from those that depend on birth cohort.

The results obtained through the application of our methods to the NLTCS highlight some interesting characteristics of the data, and in general of the aging process in the U.S. All the models considered here showed that most individuals are close to the “healthy aging” profile (k = 1), whose associated extreme trajectories (for the 6 ADLs) describe a practically disability-free life until very late ages (90+). Then we find that profiles with trajectories that specify earlier onsets of disability exhibit progressively less importance in the population. This means that most people could be expected to have a relatively disability-free old age, and that very bad aging processes are not so common.

When considering the effect of the birth cohort (estimating population-wide extreme profiles simultaneously with individual membership conditional on cohort), we find a similar situation. However, different generations have a different membership composition: the relative importance of the “healthy aging” profile (k = 1) increases monotonically when moving from older to younger generations, to the detriment of all the other profiles. Thus, the answer to the question “do younger generations acquire disabilities differently than older ones?” appears to be affirmative. Furthermore, the difference is a positive one: not only do younger generations acquire disabilities differently; they acquire them later. These findings are consistent with previous evidence of a decline in disability obtained from purely cross-sectional analyses (Manton et al., 1997, 2006), from wave-to-wave latent class transition analysis (White and Erosheva, 2013), and from latent trajectory analysis (Connor, 2006).

Even though our results pointing to a decline in disability are consistent with previous research in the area, our methods focus on different aspects of the problem, thus enabling new insights. So far, declines in disability have been analyzed mostly from wave to wave, either as changes in prevalence across uncorrelated cross-sectional samples (Manton et al., 1997, 2006, 2007) or as transitions between states in longitudinal analyses (Stallard, 2005; White and Erosheva, 2013). Our approach, in contrast, is not rooted in survey waves, nor does it directly assess changes in the prevalence of disability. Instead, it characterizes whole individual life trajectories, and therefore enables direct comparisons across different ways of aging.

An important issue that we have addressed only informally here is the choice of the number of extreme profiles, K. In Section 5.1.2 we noted that the out-of-sample multivariate fit measures generally improved with model complexity (increasing K), although the improvement differed depending on which multivariate dimension we chose to analyze. We have also observed that the least important profiles in models with K > 4 do not reveal informative trajectories. Furthermore, our conclusions do not depend on an exact number of extreme profiles. We therefore judge that in this case we do not need to select a “best” model; instead, we have opted for reporting results from several models, with numbers of extreme profiles ranging from K = 2 to K = 4. Model selection, however, can be an important issue in other applications. Possible approaches include the use of indexes such as AIC (Akaike, 1973) or BIC (Schwarz, 1978), or their more computationally convenient counterparts DIC (Spiegelhalter et al., 2002), AICM or BICM (Raftery et al., 2007), although in this case the difficulty of computing the integrated likelihood could make this approach impractical. Another approach is to use Bayesian nonparametric specifications that favor sparse representations, such as Dirichlet process mixtures. Bhattacharya and Dunson (2012) have proposed what is essentially a nonparametric Mixed Membership model for categorical data, which could be adapted for this purpose.
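For reference, DIC (Spiegelhalter et al., 2002) needs only the log-likelihood at each retained MCMC draw and at a point estimate such as the posterior mean: with deviance D(θ) = −2 log p(Y | θ), the effective number of parameters is p_D = D̄ − D(θ̄) and DIC = D̄ + p_D. The sketch assumes these log-likelihood values have already been computed; which likelihood to use for a TGoM model (for example, conditional on or marginalized over the memberships g_i) is a modeling choice that it leaves open.

```python
def dic(loglik_draws, loglik_at_posterior_mean):
    """Deviance information criterion (Spiegelhalter et al., 2002) from MCMC output.

    loglik_draws: log p(Y | theta^(s)) for each retained draw s.
    loglik_at_posterior_mean: log p(Y | theta_bar) at the posterior mean.
    Returns (DIC, p_D), where p_D is the effective number of parameters.
    """
    deviances = [-2.0 * ll for ll in loglik_draws]
    mean_deviance = sum(deviances) / len(deviances)
    deviance_at_mean = -2.0 * loglik_at_posterior_mean
    p_d = mean_deviance - deviance_at_mean
    return mean_deviance + p_d, p_d


# Hypothetical log-likelihood values, for illustration only.
print(dic([-100.0, -102.0, -104.0], -100.5))
```

Lower DIC values indicate a better trade-off between fit and complexity; comparing models with different K would require computing the same kind of likelihood for each of them.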

As for model limitations, this analysis attributes all variability in the data to a combination of random fluctuation, age effects, cohort effects, and mixed membership. Thus it neglects other potential systematic effects, some of which might be important either to capture previously unaccounted variability or simply to better understand the underlying processes. For instance, it is well known (see, e.g., Ferrucci et al., 1996; Manton, 2008) that men and women follow different aging and mortality processes. One natural way of accounting for time-invariant categorical covariates, like gender or race, is to introduce them in the same way we introduced the DOB covariate: as conditioners on the prior distribution of individual membership. If the cells in the contingency table generated by cross-classifying individuals according to the covariates are well populated, we can directly use the TGoM extensions from Section 3.3. If this is not the case, the joint covariate vector can be smoothed using more complex prior specifications, such as those proposed in Bertolet (2008) for the Grade of Membership model.

Another related limitation of these models is that they do not account for mortality. In essence, these models correspond to what Kurland and Heagerty (2005) and Kurland et al. (2009) call “an immortal cohort”. This is of particular importance in the present application because patterns of disability are usually tied to patterns of mortality (Ferrucci et al., 1996; Connor, 2006; White and Erosheva, 2013): progression into more severe disability goes together with an increased probability of death. One way of integrating mortality into this framework is to extend the definition of extreme profiles to characterize not only patterns of disability acquisition, but also of survival. Such a joint model could be the topic of a future article.

Supplementary Material

Online Supplement 1

NIHMS688589-supplement-Online_Supplement_1.pdf (173.3 KB, PDF)

7 Acknowledgments

The author would like to thank Prof. Steve Fienberg for his guidance and encouragement while conducting this investigation, Prof. Jerry Reiter for his help and valuable suggestions during the review process, and three anonymous referees and one Associate Editor for their thorough critique and recommendations. This work is partially based on material from the Ph.D. thesis of the author in the Department of Statistics at Carnegie Mellon University. For a closely related treatment of the model and data analysis see also Manrique-Vallier (forthcoming 2014).

This research was supported in part by NIH Grant R01 AG023141-01 to Carnegie Mellon University, and NSF Grant SES-11-31897 to Duke University.

A Appendix - MCMC sampler for the TGoM model

In this appendix we present a Gibbs sampling algorithm for Bayesian estimation of the TGoM model, also described in Manrique-Vallier (forthcoming 2014). Following the discussion at the end of Section 3.1, we construct an algorithm for obtaining samples from the posterior distribution of parameters in the augmented data model in equation (6), which after marginalizing z is equivalent to the TGoM model. This posterior distribution is

p(α, β, Z, g \mid Y, \text{Age}) \propto p(α, g, β) \prod_{i=1}^{N} f^{\text{AUG}}(Y_{i}, Z_{i} \mid \text{Age}_{i}, β, g_{i}),

(16)

which, following the detailed specification from Section 3.2, is equivalent to

\begin{aligned} p(α, β, Z, g \mid Y, \text{Age}) \propto\ & \text{Gamma}(α_{0} \mid a_{α}, b_{α}) \times \text{Dirichlet}(ξ \mid 1_{K}) \\ & \times \prod_{i=1}^{N} \text{Dirichlet}(g_{i} \mid α) \times \prod_{j=1}^{J} \prod_{k=1}^{K} N(β_{0jk} \mid μ_{0}, σ_{0}^{2}) \times N(β_{1jk} \mid μ_{1}, σ_{1}^{2}) \\ & \times \prod_{i=1}^{N} \prod_{j=1}^{J} \prod_{t=1}^{T} g_{i z_{ijt}} \frac{\exp(y_{ijt} β_{0 j z_{ijt}} + y_{ijt} β_{1 j z_{ijt}} \text{Age}_{it})}{1 + \exp(β_{0 j z_{ijt}} + β_{1 j z_{ijt}} \text{Age}_{it})}, \end{aligned}

(17)

with α₀ = Σ_k α_k and ξ = (α₁/α₀, …, α_K/α₀). Parameters a_α and b_α are the shape and inverse scale (rate) parameters, respectively.

A Gibbs sampling algorithm for obtaining samples from the joint posterior distribution of (α, β, Z, g) can be constructed as follows.

1. Sampling from Z

For every i ∈ {1, …, N}, j ∈ {1, …, J} and t ∈ {1, …, T}, sample z_ijt | … ~ Discrete({1, …, K}, (p₁, p₂, …, p_K)), with

p_{k} \propto g_{ik} \frac{\exp[y_{ijt}(β_{0jk} + β_{1jk} \text{Age}_{it})]}{1 + \exp(β_{0jk} + β_{1jk} \text{Age}_{it})}

for all k ∈ {1, …, K}.
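Step 1 can be sketched as follows. Writing λ = exp(η)/(1 + exp(η)) with η = β_0jk + β_1jk Age_it, the unnormalized weight g_ik exp(y_ijt η)/(1 + exp(η)) equals g_ik λ for y_ijt = 1 and g_ik (1 − λ) for y_ijt = 0; the code uses this form, with a numerically stable logistic function, to avoid overflow. Function names and example values are hypothetical.

```python
import math
import random


def logistic(x):
    """Numerically stable exp(x) / (1 + exp(x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def z_probabilities(y, age, g_i, beta0_j, beta1_j):
    """Normalized full-conditional probabilities (p_1, ..., p_K) of z_ijt.

    p_k is proportional to g_ik * lam**y * (1 - lam)**(1 - y), with
    lam = logistic(beta_0jk + beta_1jk * age), which equals the expression for p_k above.
    """
    weights = []
    for g_ik, b0, b1 in zip(g_i, beta0_j, beta1_j):
        lam = logistic(b0 + b1 * age)
        weights.append(g_ik * (lam if y == 1 else 1.0 - lam))
    total = sum(weights)
    return [w / total for w in weights]


def sample_z(y, age, g_i, beta0_j, beta1_j, rng):
    """Draw z_ijt in {1, ..., K} from its full conditional."""
    probs = z_probabilities(y, age, g_i, beta0_j, beta1_j)
    return rng.choices(range(1, len(probs) + 1), weights=probs)[0]


# Hypothetical values for one (i, j, t) with K = 3.
rng = random.Random(1)
g_i = [0.7, 0.2, 0.1]
beta0_j, beta1_j = [-27.0, -24.0, -21.0], [0.3, 0.3, 0.3]
print(z_probabilities(1, 80.0, g_i, beta0_j, beta1_j))
print(sample_z(1, 80.0, g_i, beta0_j, beta1_j, rng))
```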

2. Sampling from β_jk

Let Ξ = {(i, t) : z_ijt = k} and assume that μ₀ = μ₁ = 0.

The full joint conditional distribution of (β_0jk, β_1jk) is

p(β_{0jk}, β_{1jk} \mid \dots) \propto \frac{\exp\left[-\left(\frac{β_{1jk}^{2}}{2σ_{1}^{2}} + \frac{β_{0jk}^{2}}{2σ_{0}^{2}}\right) + β_{0jk} \sum_{Ξ} y_{ijt} + β_{1jk} \sum_{Ξ} \text{Age}_{it}\, y_{ijt}\right]}{\prod_{Ξ} \left[1 + \exp(β_{0jk} + β_{1jk} \text{Age}_{it})\right]}

To sample from this distribution we use a random walk Metropolis step:

(a) Sample proposal values $β_{0jk}^{*} \sim N(β_{0jk}, σ_{β0}^{2})$ and $β_{1jk}^{*} \sim N(β_{1jk}, σ_{β1}^{2})$, where $σ_{β0}^{2}$ and $σ_{β1}^{2}$ are tuning parameters.
(b) With probability

\begin{aligned} r_{M} = \min\Bigg\{1,\ & \prod_{Ξ} \left[\frac{1 + \exp(β_{0jk} + β_{1jk} \text{Age}_{it})}{1 + \exp(β_{0jk}^{*} + β_{1jk}^{*} \text{Age}_{it})}\right] \\ & \times \exp\left[-\frac{β_{0jk}^{*2} - β_{0jk}^{2}}{2σ_{0}^{2}} + (β_{0jk}^{*} - β_{0jk}) \sum_{Ξ} y_{ijt}\right] \\ & \times \exp\left[-\frac{β_{1jk}^{*2} - β_{1jk}^{2}}{2σ_{1}^{2}} + (β_{1jk}^{*} - β_{1jk}) \sum_{Ξ} y_{ijt} \text{Age}_{it}\right]\Bigg\} \end{aligned}

(18)

set $(β_{0jk}, β_{1jk}) = (β_{0jk}^{*}, β_{1jk}^{*})$. Otherwise, keep the current value.
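A minimal sketch of this Metropolis step, working on the log scale: log r_M is the difference of log full conditionals, because the Gaussian random-walk proposal is symmetric, and accepting with probability min{1, exp(log r_M)} is equivalent to using (18). The data, prior scales and step sizes below are hypothetical; in practice the tuning parameters σ_β0 and σ_β1 would be adjusted to obtain reasonable acceptance rates.

```python
import math
import random


def log1pexp(x):
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def log_target(b0, b1, ys, ages, sigma0, sigma1):
    """Log full conditional of (beta_0jk, beta_1jk) up to a constant, with mu_0 = mu_1 = 0.

    ys, ages: outcomes y_ijt and ages Age_it for the pairs (i, t) in Xi (those with z_ijt = k).
    """
    lp = -b0 ** 2 / (2.0 * sigma0 ** 2) - b1 ** 2 / (2.0 * sigma1 ** 2)
    for y, age in zip(ys, ages):
        eta = b0 + b1 * age
        lp += y * eta - log1pexp(eta)
    return lp


def metropolis_beta(b0, b1, ys, ages, sigma0, sigma1, step0, step1, rng):
    """One random-walk Metropolis update; step0 and step1 are proposal standard deviations."""
    prop0 = rng.gauss(b0, step0)
    prop1 = rng.gauss(b1, step1)
    # log r_M: the proposal is symmetric, so only the ratio of targets enters.
    log_r = (log_target(prop0, prop1, ys, ages, sigma0, sigma1)
             - log_target(b0, b1, ys, ages, sigma0, sigma1))
    if rng.random() < math.exp(min(0.0, log_r)):
        return prop0, prop1
    return b0, b1


# Hypothetical data for one (j, k): disability from age 85 on, with two exceptions
# (ages 82 and 90) so that the data are not perfectly separated.
ages = list(range(65, 101))
ys = [int((a >= 85) != (a in (82, 90))) for a in ages]
rng = random.Random(7)
b0, b1 = 0.0, 0.0
for _ in range(2000):
    b0, b1 = metropolis_beta(b0, b1, ys, ages, 10.0, 1.0, 0.5, 0.005, rng)
print(b0, b1)
```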

3. Sampling from g_i

g_{i} \mid \dots \overset{\text{indep.}}{\sim} \text{Dirichlet}\left(α_{1} + \sum_{j,t} I(z_{ijt} = 1), \dots, α_{K} + \sum_{j,t} I(z_{ijt} = K)\right).
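Step 3 only needs the counts n_ik = Σ_{j,t} I(z_ijt = k). The sketch below assumes the labels for individual i are collected in a flat list and draws the Dirichlet variate by normalizing independent Gamma variables, a standard construction; the α values are roughly α₀ξ for K = 3 from Table 2, and the labels are hypothetical.

```python
import random
from collections import Counter


def sample_g(alpha, z_i, rng):
    """Draw g_i from Dirichlet(alpha_1 + n_i1, ..., alpha_K + n_iK) (step 3).

    z_i: labels z_ijt in {1, ..., K} for all j and t of individual i;
    n_ik counts how many of them equal k.
    """
    counts = Counter(z_i)
    shapes = [a_k + counts.get(k, 0) for k, a_k in enumerate(alpha, start=1)]
    # Dirichlet draw via normalized independent Gamma(shape_k, 1) variables.
    draws = [rng.gammavariate(s, 1.0) for s in shapes]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical: K = 3, 6 ADLs x 3 waves = 18 labels for one individual.
rng = random.Random(3)
alpha = [0.168, 0.066, 0.027]
z_i = [1] * 12 + [2] * 5 + [3]
g_i = sample_g(alpha, z_i, rng)
print(g_i)
```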

4. Sampling from α

The full conditional distribution of α,

p (α ∣ \dots) \propto α_{0}^{a_{α} - 1} e^{- α_{0} b_{α}} \times {[\frac{Γ (α_{0})}{\prod_{k = 1}^{K} Γ (α_{k})}]}^{N} \prod_{k = 1}^{K} {[\prod_{i = 1}^{N} g_{i k}]}^{α_{k}},

does not have any recognizable form. We use a Metropolis-Hastings step similar to Manrique-Vallier and Fienberg (2008):

(a) Obtain the proposal $α^{*} = (α_{1}^{*}, α_{2}^{*}, \dots, α_{K}^{*})$, with $α_{k}^{*} \overset{\text{indep}}{\sim} \text{lognormal}(\log α_{k}, σ^{2})$.
(b) Let $α_{0}^{*} = \sum_{k=1}^{K} α_{k}^{*}$. With probability
$\begin{aligned} r = \min\Bigg\{1,\ & e^{-b_{α}(α_{0}^{*} - α_{0})} \left(\frac{α_{0}^{*}}{α_{0}}\right)^{a_{α} - 1} \left(\prod_{k=1}^{K} \frac{α_{k}^{*}}{α_{k}}\right) \\ & \times \left[\frac{Γ(α_{0}^{*})}{Γ(α_{0})} \prod_{k=1}^{K} \frac{Γ(α_{k})}{Γ(α_{k}^{*})}\right]^{N} \prod_{k=1}^{K} \left(\prod_{i=1}^{N} g_{ik}\right)^{α_{k}^{*} - α_{k}}\Bigg\}, \end{aligned}$
set α = α^∗; otherwise, keep the current value. Obtain (α₀, ξ) by setting $α_{0} = \sum_{k=1}^{K} α_{k}$ and $ξ_{k} = α_{k} / α_{0}$, for all k = 1, …, K.
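Step 4 can be written on the log scale as below. The sketch follows the full conditional of α displayed above, with the shape a_α in the power of α₀ and the rate b_α in the exponential; multiplying each α_k by exp(σε), with ε standard normal, draws from lognormal(log α_k, σ²), and the Hastings correction for this proposal is ∏_k α*_k/α_k. The membership vectors and tuning values in the example are hypothetical.

```python
import math
import random


def log_alpha_conditional(alpha, sum_log_g, n, a_alpha, b_alpha):
    """Log full conditional of alpha (step 4), up to an additive constant.

    sum_log_g[k] is sum_i log g_ik; n is the number of individuals N.
    """
    a0 = sum(alpha)
    return ((a_alpha - 1.0) * math.log(a0) - b_alpha * a0
            + n * (math.lgamma(a0) - sum(math.lgamma(a) for a in alpha))
            + sum(a * s for a, s in zip(alpha, sum_log_g)))


def update_alpha(alpha, g, a_alpha, b_alpha, sigma, rng):
    """One Metropolis-Hastings update of alpha with independent lognormal proposals.

    Returns the (possibly unchanged) alpha together with alpha_0 and xi.
    """
    n, K = len(g), len(alpha)
    sum_log_g = [sum(math.log(g_i[k]) for g_i in g) for k in range(K)]  # assumes all g_ik > 0
    proposal = [a * math.exp(rng.gauss(0.0, sigma)) for a in alpha]
    log_r = (log_alpha_conditional(proposal, sum_log_g, n, a_alpha, b_alpha)
             - log_alpha_conditional(alpha, sum_log_g, n, a_alpha, b_alpha)
             # Hastings correction for the lognormal random walk: prod_k alpha*_k / alpha_k.
             + sum(math.log(p / a) for p, a in zip(proposal, alpha)))
    if rng.random() < math.exp(min(0.0, log_r)):
        alpha = proposal
    alpha0 = sum(alpha)
    return alpha, alpha0, [a / alpha0 for a in alpha]


# Hypothetical membership vectors for N = 3 individuals and K = 3 profiles.
rng = random.Random(5)
g = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
alpha = [0.2, 0.1, 0.05]
for _ in range(200):
    alpha, alpha0, xi = update_alpha(alpha, g, 1.0, 5.0, 0.3, rng)
print(alpha0, xi)
```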

B Appendix - fitting the generational extension

The only difference between the posterior distributions of the basic and the extended TGoM models is the distribution of g_i|α. Thus, we only have to adapt steps 3 and 4 in the previous algorithm, by replacing $\prod_{i = 1}^{N} p (g_{i} ∣ α)$ with

\prod_{i=1}^{N} p(g_{i} \mid α(\text{DOB}_{i})) = \prod_{γ \in Γ} \prod_{i=1}^{N} \left[p(g_{i} \mid α^{γ})\right]^{I(\text{DOB}_{i} \in γ)}.

(19)

Let γ_i ∈ Γ be the unique interval from the partition such that DOB_i ∈ γ_i. We obtain an MCMC sampler for this model by modifying steps 3 and 4 of the algorithm in Appendix A as follows:

3’ Sampling from g_i

g_{i} \mid \dots \overset{\text{indep}}{\sim} \text{Dirichlet}\left(α_{1}^{γ_{i}} + \sum_{j,t} I(z_{ijt} = 1), \dots, α_{K}^{γ_{i}} + \sum_{j,t} I(z_{ijt} = K)\right).

4’ Sampling from α

Let Ξ_γ = {i : γ_i = γ}. The full conditional distribution of α^γ is

p(α^{γ} \mid \dots) \propto (α_{0}^{γ})^{a_{α} - 1} e^{-α_{0}^{γ} b_{α}} \times \left[\frac{Γ(α_{0}^{γ})}{\prod_{k=1}^{K} Γ(α_{k}^{γ})}\right]^{\#(Ξ_{γ})} \prod_{k=1}^{K} \left[\prod_{i \in Ξ_{γ}} g_{ik}\right]^{α_{k}^{γ}},

where #(Ξ_γ) is the number of elements in the set Ξ_γ.

This expression is similar to the full conditional distribution of α in step 4 of Appendix A. We thus adapt the procedure by replacing r in step 4 of that algorithm with

\begin{aligned} r = \min\Bigg\{1,\ & \exp\left[-b_{α}(α_{0}^{*} - α_{0}^{γ})\right] \left(\prod_{k=1}^{K} \frac{α_{k}^{*}}{α_{k}^{γ}}\right) \left(\frac{α_{0}^{*}}{α_{0}^{γ}}\right)^{a_{α} - 1} \\ & \times \left[\frac{Γ(α_{0}^{*})}{Γ(α_{0}^{γ})} \prod_{k=1}^{K} \frac{Γ(α_{k}^{γ})}{Γ(α_{k}^{*})}\right]^{\#(Ξ_{γ})} \prod_{k=1}^{K} \left(\prod_{i \in Ξ_{γ}} g_{ik}\right)^{α_{k}^{*} - α_{k}^{γ}}\Bigg\}. \end{aligned}
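In step 4’, each α^γ is updated exactly as in step 4, except that N is replaced by #(Ξ_γ) and the products of g_ik run only over i ∈ Ξ_γ. The sketch below computes those per-cohort quantities, #(Ξ_γ) and Σ_{i∈Ξ_γ} log g_ik. The cohort boundaries, birth years, membership vectors and the half-open interval convention are all hypothetical; the actual partition Γ is defined in the main text.

```python
import bisect
import math
from collections import defaultdict


def cohort_index(dob, cutpoints):
    """Index of the interval gamma in Gamma that contains dob.

    cutpoints are the sorted interior boundaries of the partition; intervals are
    closed on the left, so index 0 is (-inf, c_1), index 1 is [c_1, c_2), and so on.
    """
    return bisect.bisect_right(cutpoints, dob)


def cohort_statistics(dobs, g, cutpoints):
    """Per-cohort #(Xi_gamma) and, for each k, the sum over i in Xi_gamma of log g_ik."""
    K = len(g[0])
    count = defaultdict(int)
    sum_log_g = defaultdict(lambda: [0.0] * K)
    for dob, g_i in zip(dobs, g):
        c = cohort_index(dob, cutpoints)
        count[c] += 1
        for k in range(K):
            sum_log_g[c][k] += math.log(g_i[k])
    return dict(count), dict(sum_log_g)


# Hypothetical birth years, membership vectors and cohort boundaries.
cutpoints = [1900, 1910, 1920, 1930]
dobs = [1895, 1905, 1908, 1925, 1933]
g = [[0.6, 0.4], [0.7, 0.3], [0.8, 0.2], [0.9, 0.1], [0.95, 0.05]]
count, sum_log_g = cohort_statistics(dobs, g, cutpoints)
print(count)
```

Each cohort's count and log-sum vector can then be plugged into the step 4 log full conditional in place of N and Σ_i log g_ik.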

References

Airoldi EM, Blei DM, Fienberg SE, Xing EP. Mixed membership stochastic blockmodels. Journal of Machine Learning Research. 2008;9:1981–2014. [PMC free article] [PubMed] [Google Scholar]
Airoldi EM, Erosheva EA, Fienberg SE, Joutard C, Love T, Shringarpure S. Reconceptualizing the classification of PNAS articles. Proceedings of the National Academy of Sciences. 2010;107:20899–20904. doi: 10.1073/pnas.1013452107. [DOI] [PMC free article] [PubMed] [Google Scholar]
Airoldi EM, Fienberg SE, Joutard C, Love TM. Discovering Latent Patterns with Hierarchical Bayesian Mixed-Membership Models. In: Poncelet P, Masseglia F, Teisseire M, editors. Data Mining Patterns: New Methods and Applications. Idea Group Inc.; Hershey, PA: 2007. pp. 240–275. [Google Scholar]
Akaike H. Second International Symposium on Information Theory. Akademinai Kiado; 1973. Information heory and an extension of the maximum likelihood principle; pp. 267–281. [Google Scholar]
Bertolet M. Department of Statistics, Carnegie Mellon University; 2008. To Weight Or Not To Weight? Incorporating Sampling Designs Into Model-Based Analyses. Ph.D. thesis. [Google Scholar]
Bhattacharya A, Dunson DB. Simplex factor models for multivariate unordered categorical data. Journal of the American Statistical Association. 2012;107:362–377. doi: 10.1080/01621459.2011.646934. [DOI] [PMC free article] [PubMed] [Google Scholar]
Clark R. An Introduction to the National Long-Term Care Survey. office of Disability, Aging, and Long-Term Care Policy with the U.S. Department of Health and Human Services; 1998. http:// aspe.hhs.gov/daltcp/reports/nltcssu2.htm [Google Scholar]
Connor J, Fienberg S, Erosheva E, White T. Towards a restructuring of the National Long Term Care Survey: A longitudinal perspective; Prepared for presentation at an Expert Panel Meeting on the National Long Term Care Survey; Committee on National Statistics, National Research Council; 2006. [Google Scholar]
Connor JT. Department of Statistics & H. John Heinz III School of Public Policy & Management. Carnegie Mellon University; 2006. Multivariate Mixture Models to Describe Longitudinal Patterns of Frailty in American Seniors. Ph.D. thesis. [Google Scholar]
Corder L, Manton K. National surveys and the health and functioning of the elderly: the effects of design and content. Journal of the American Statistical Association. 1991;86:513–525. [Google Scholar]
Erosheva E. Department of Statistics. Carnegie Mellon University; 2002. Grade of membership and latent structures with application to disability survey data. Ph.D. thesis. [Google Scholar]
Erosheva E, Fienberg S. Bayesian Mixed Membership Models for Soft Clustering and Classification. In: Weihs C, Gaul W, editors. Classification —the Ubiquitous Challenge. Springer; Berlin Heidelberg: 2005. pp. 11–26. Studies in Classification, Data Analysis, and Knowledge Organization. [Google Scholar]
Erosheva E, Fienberg S, Joutard C. Describing disability through individual-level mixture models for multivariate binary data. Annals of Applied Statistics. 2007;1:502–537. doi: 10.1214/07-aoas126. [DOI] [PMC free article] [PubMed] [Google Scholar]
Erosheva E, Fienberg SE, Lafferty JD. Mixed-membership models of scientific publications. Proceedings of the National Academy of Sciences. 2004;101:5220–5227. doi: 10.1073/pnas.0307760101. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ferrucci L, Guralnik J, Simonsick E, Salive M, Corti C, Langlois J. Progressive versus catastrophic disability: a longitudinal view of the disablement process. The Journals of Gerontology: Series A. 1996;51:M123. doi: 10.1093/gerona/51a.3.m123. [DOI] [PubMed] [Google Scholar]
Goodman LA. Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika. 1974;61:215–231. [Google Scholar]
Hastie T, Tibshirani R, Friedman J. a. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd Springer-Verlag; New York: 2009. [Google Scholar]
Jasra A, Holmes C, Stephens D. Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statistical Science. 2005;20:50–67. [Google Scholar]
Kreuter F, Muthén B. Analyzing criminal trajectory profiles: Bridging multilevel and group-based approaches using growth mixture modeling. Journal of Quantitative Criminology. 2008;24:1–31. [Google Scholar]
Kurland B, Heagerty P. Directly parameterized regression conditioning on being alive: analysis of longitudinal data truncated by deaths. Biostatistics. 2005;6:241. doi: 10.1093/biostatistics/kxi006. [DOI] [PubMed] [Google Scholar]
Kurland B, Johnson L, Egleston B, Diehr P. Longitudinal data with follow-up truncated by death: match the analysis method to research aims. Statistical science. 2009;24:211. doi: 10.1214/09-STS293. [DOI] [PMC free article] [PubMed] [Google Scholar]
Manrique-Vallier D. Mixed Membership Trajectory Models. In: Airoldi EM, Blei DM, Erosheva EA, Fienberg SE, editors. Handbook on Mixed Membership Models. Chapman & Hall/CRC; 2014. forthcoming. [Google Scholar]
Manrique-Vallier D, Fienberg S. Population size estimation using individual level mixture models. Biometrical Journal. 2008;50:1051–1063. doi: 10.1002/bimj.200810448. [DOI] [PubMed] [Google Scholar]
Manton KG. Recent declines in chronic disability in the elderly U.S.. population: risk factors and future dynamics. Annual Review of Public Health. 2008;29:91–113. doi: 10.1146/annurev.publhealth.29.020907.090812. [DOI] [PubMed] [Google Scholar]
Manton KG, Corder L, Stallard E. Chronic disability trends in elderly United States populations: 1982-1994. Proceedings of the National Academy of Sciences. 1997;94:2593–2598. doi: 10.1073/pnas.94.6.2593. [DOI] [PMC free article] [PubMed] [Google Scholar]
Manton KG, Gu X, Lamb V. Change in chronic disability from 1982 to 2004/2005 as measured by long-term changes in function and health in the US elderly population. Proceedings of the National Academy of Sciences. 2006;103:18374. doi: 10.1073/pnas.0608483103. [DOI] [PMC free article] [PubMed] [Google Scholar]
Manton KG, Lamb VL, Gu X. Medicare cost effects of recent US disability trends in the elderly future implications. Journal of Aging and Health. 2007;19:359–381. doi: 10.1177/0898264307300186. [DOI] [PubMed] [Google Scholar]
Manton KG, Stallard E, Woodbury M. A multivariate event history model based upon fuzzy states: Estimation from longitudinal surveys with informative nonresponse. Journal of Official Statistics. 1991;7:261–293. [Google Scholar]
Nagin D. Analyzing developmental trajectories: A semiparametric, group-based approach. Psychological Methods. 1999;4:139–157. doi: 10.1037/1082-989x.6.1.18. [DOI] [PubMed] [Google Scholar]
Raftery A, Newton M, Satagopan J, Krivitsky P. Estimating the Integrated Likelihood via Posterior Simulation Using the Harmonic Mean Identity. In: Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West M, editors. Bayesian Statistics. Vol. 8. Oxford University Press; 2007.
Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978;6:461–464.
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2002;64:583–639.
Stallard E. Trajectories of Morbidity, Disability, and Mortality among the US Elderly Population: Evidence from the 1984–1999 NLTCS. North American Actuarial Journal. 2005:11.
Tanner M. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. 3rd ed. Springer-Verlag; New York: 1996.
White TA, Erosheva EA. Using group-based latent class transition models to analyze chronic disability data from the National Long-Term Care Survey 1984–2004. Statistics in Medicine. 2013;32:3569–3589. doi: 10.1002/sim.5782.
Woodbury M, Clive J, Garson A Jr. Mathematical typology: A grade of membership technique for obtaining disease definition. Computers in Biomedical Research. 1978;11:277–298. doi: 10.1016/0010-4809(78)90012-5.
Xing EP, Fu W, Song L. A state-space mixed membership blockmodel for dynamic network tomography. The Annals of Applied Statistics. 2010;4:535–566.


Supplementary Materials

Online Supplement 1

NIHMS688589-supplement-Online_Supplement_1.pdf (173.3KB, PDF)

