Author manuscript; available in PMC: 2013 Oct 15.
Published in final edited form as: Stat Med. 2012 Jul 20;31(23):2745–2756. doi: 10.1002/sim.5370

Associations between Variability of Risk Factors and Health Outcomes in Longitudinal Studies

Michael R Elliott 1,2, Mary D Sammel 3, Jessica Faul 4
PMCID: PMC3470883  NIHMSID: NIHMS407040  PMID: 22815213

Abstract

Many statistical methods have been developed that treat the within-subject correlation that accompanies the clustering of subjects in longitudinal data settings as a nuisance parameter, with the focus of analytic interest being on mean outcome or profiles over time. However, there is evidence that in certain settings (Elliott 2007; Harlow et al. 2000; Sammel et al. 2001; Kikuya et al. 2008) underlying variability in subject measures may also be important in predicting future health outcomes of interest. Here we develop a method for combining information from mean profiles and residual variance to assess associations with categorical outcomes in a joint modeling framework. We consider an application relating word recall measures obtained over time to dementia onset, using data from the Health and Retirement Survey.

Keywords: Differential measurement error, Markov Chain Monte Carlo, Total recall, Dementia, Health and Retirement Survey

1 Introduction

Summary statistics such as the sample mean often describe central tendencies in observed risk factors measured at a particular point in time. However, there are many applications in which characteristics of the risk factors over time are of primary scientific interest for predicting disease. These characteristics can include both changes in the mean function, such as a slope, and measures of variability about the mean. While variances are sometimes modeled to accommodate heteroscedasticity or a hierarchical covariance structure [1], methods that treat variances as being of primary interest and the mean or trend as a nuisance are far less common than the converse [2]. Examples of such variance-focused analyses include Harlow et al. [3], where within-woman variability in menstrual cycle length at earlier ages was demonstrated to be an important predictor of abnormal uterine bleeding at later ages; Sammel et al. [4], who used a two-stage model to show that high levels of variability in reproductive hormone levels were associated with increased prevalence of menopausal symptoms such as severe hot flashes; Elliott [5], who used a penalized spline model to detrend affect data in a sample of recovering myocardial infarction patients and showed that both low levels and high levels of the residual variance were associated with increased risk of depression; and Kikuya et al. [6], who found that day-to-day variability in blood pressure was associated with increased cardiovascular and stroke mortality risk, while day-to-day variability in heart rate was associated with increased cardiac and stroke mortality risk. Outside of the medical and public health arenas, economists have considered price volatility as an important predictor of risk and returns in financial markets [7], but in general methods to assess associations between variability and risk are underdeveloped in the biostatistics literature.

Methods for longitudinal data have usually treated within-subject variability as a nuisance parameter, with the focus being on central tendency measures to relate the risk factor to an outcome of interest. For example, Muthén et al. [8] used growth mixture models to classify subjects based on underlying mean trends in childhood aggression measures and related these classes to juvenile delinquency risk. Ye et al. [9] used a measurement error model to relate profiles of PSA levels to time to prostate cancer recurrence. These methods focus on error that is non-differential, such that it does not influence the outcome of interest directly. This manuscript develops methods to model the joint distribution of within-subject mean trends and variability in repeated measurements of risk factors as predictors of categorical health outcomes, with the goal of maximizing the predictive power that can be gleaned from longitudinal risk factor data. We use these methods to determine how trends and variability in memory tests are associated with transition to dementia using data from the Health and Retirement Study. In particular, we jointly model subject-level slope-intercepts and residual variances of a memory recall test as predictors of onset of dementia during the follow-up period. Although we use a fully Bayesian approach, we provide a simulation study to consider the repeated-sampling properties of our proposed method.

1.1 Memory and Cognition Testing in the Health and Retirement Study (HRS)

Many studies have shown a positive relationship between older age and variability in performance on sensory, motor, and cognitive tasks [10,11]. There is evidence that intraindividual cognitive variability is a significant source of this variability in performance between groups, especially among older adults [12,13]. Until recently, age-related increases in intraindividual variability have usually been attributed to lack of instrument reliability [14]. While variability arising from this type of measurement error may not be meaningful in and of itself, in many cases intraindividual variability may arise from origins other than measurement error and may provide insight into underlying psychological processes and lead to several possible theoretical interpretations [14]. Intraindividual variability may provide support to existing theories of cognitive aging including the differentiation-dedifferentiation theory and the common cause hypothesis [11]. Intraindividual variability may also be an early marker of cognitive deficits [15]. More specifically, this type of variability might reflect an adaptive response to cognitive decline. For example, it may require high levels of attentional capacity to maintain low levels of variability over repeated trials of a task or across cognitive tasks [14].

Current methods of detection of dementia often rely on information on a person's level of performance from a single assessment. A main limitation of this approach is the inability to distinguish between poor performance due to low baseline intellectual ability, preclinical stages of dementia, or fluctuations in cognitive performance due to mood, sensory stimulation, or other state-based differences [16,17]. Because performance on tests such as the Mini-Mental State Examination (MMSE) is known to vary even over relatively short time intervals, our ability to detect early stages of dementia using these methods is not reliable [18]. However, interpreting lack of reliability in this case is difficult, as it may reflect characteristics of both the specific cognitive test and the individual being tested; that is, inconsistent classification arises from variability due to classification error as well as intraindividual variability in performance. In spite of the difficulties in measurement, investigators have hypothesized that variable cognitive performance would precede consistently poor performance in individuals with mild cognitive impairment (MCI) and those in the very early stages of dementia, and thus provide predictive information over and above what can be achieved from level information alone [13,17]. We assess this hypothesis using data from the Health and Retirement Study (HRS).

The HRS is a nationally representative, prospective panel study of community-dwelling US adults born between 1890 and 1959 with oversampling of minorities and Florida residents [19]. The HRS includes six cohorts: the Asset and Health Dynamics Among the Oldest Old Study (AHEAD) cohort of persons born between 1890–1923; the Children of the Depression Age (CODA) cohort of those born between 1924–1930; the original HRS cohort of those born between 1931–1941; the War Babies (WB) cohort of those born between 1942–1947; the Early Baby Boomer (EBB) cohort of those born between 1948–1953; and the Middle Baby Boomer (MBB) cohort of those born between 1954–1959 [19]. Our focus will be on the AHEAD cohort, consisting of 8,222 subjects born between 1890 and 1923, who had data collected in 1993, and, if they survived, 1995, 1998, 2000, 2002, 2004, 2006, and 2008. Interviews are conducted by telephone for most respondents under 80 years of age and face-to-face for persons 80 years of age or older. Baseline and re-interview rates have been consistently high, with baseline rates ranging from 70%–81% across the cohorts. Follow-up response rates are on average in the low to mid-90% range; subjects who fail to respond at a given wave are attempted at the next wave unless they have died; hence some missingness is intermittent.

In the HRS, cognitive function is assessed through several questions asked at every wave. For these analyses, only performance on the episodic memory tasks is considered. In particular, we focus on total recall, a measure of episodic memory that consists of immediate and delayed recall of a 10-word list administered in the HRS survey. Data on dementia diagnosis come from Medicare claims records linked to HRS respondents. Longitudinal HRS survey data are matched to administrative Medicare records for HRS respondents who have previously consented to have their Medicare data released; over 80% consent to do so. Of those who provided an identification number, 98% have been successfully matched to Medicare files. Dementia is defined using the ICD-9 codes listed in the Chronic Condition Data Warehouse (CCW) definition of Alzheimer’s Disease and Related Disorders or Senile Dementia at http://www.ccwdata.org/cs/groups/public/documents/document/ccw_conditioncategories.pdf. Respondents are classified as having received a dementia diagnosis if they had at least one dementia diagnosis code in any of the Medicare claims files, including: inpatient, outpatient, part B physician supplier, Skilled Nursing Facility (SNF), hospice, and durable medical equipment files. This claims-based diagnostic measure has reasonable sensitivity and specificity for dementia (0.85 and 0.89; see Taylor et al. [20]). The key question of interest is the degree to which trends and variability in cognitive and memory tests are associated with transition to dementia, after adjustment for educational level, race/ethnicity, and gender.

2 A Model to Relate the First Two Moments of Subject-Level Risk Factors to a Health Outcome

Sammel et al. [4] obtained subject-level growth curves and residual variances and used them to predict outcomes in a two-stage model. Here we extend this idea as a shared parameter model linking mean and variance parameters governing the continuous subject-level longitudinal risk factor measures Y with the binary outcome of interest W:

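$$Y_{it} \mid \beta_i, \sigma_i^2 \sim N(f(\beta_i; t), \sigma_i^2) \qquad (1)$$
$$W_i \mid \beta_i, \sigma_i^2, Z_i, \gamma \sim \mathrm{BER}(\pi_i), \quad \log\left(\frac{\pi_i}{1 - \pi_i}\right) = g(\gamma; \beta_i, \sigma_i^2, Z_i)$$
$$\beta_i \mid \beta, \Sigma \sim N(\beta, \Sigma)$$
$$\log(\sigma_i^2) \mid \sigma, \Psi^2 \sim N(\sigma, \Psi^2).$$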

We assume that the longitudinal risk factor measures are normally distributed with mean $f(\beta_i; t)$, which may be a linear or non-linear (polynomial or spline) function of $t$ parameterized by the subject-level parameters $\beta_i$, and subject-level residual variance $\sigma_i^2$. Similarly, we assume that the log-odds of the outcome is given by a function $g(\gamma; \beta_i, \sigma_i^2, Z_i)$ that allows for linear or non-linear relationships between the subject-level mean profile and residual variance parameters as well as other subject-level covariates, parameterized by the population-level parameters $\gamma$. For a fully Bayesian model, we ensure a proper posterior by proposing the following conjugate independent hyperpriors:

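$$\beta \sim N(\beta_0, V_{\beta_0})$$
$$\sigma \sim N(\sigma_0, V_{\sigma_0})$$
$$\Sigma^{-1} \sim \mathrm{Wishart}(k, S_0)$$
$$\Psi^{-2} \sim \mathrm{Gamma}(a_\sigma, b_\sigma)$$
$$\gamma \sim N(\gamma_0, V_\gamma).$$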

Very weakly informative hyperprior parameters were used to avoid unduly influencing the information provided by the data.

The posterior distribution is obtained using a Markov Chain Monte Carlo (MCMC) approach that combines Gibbs sampling with Metropolis-Hastings draws [21,22]. In brief, Gibbs sampling obtains draws from a joint distribution $p(\theta \mid \text{data})$ for $\theta = \{\theta_1, \ldots, \theta_q\}$ by initializing $\theta$ at some reasonable $\theta^{(0)}$ and drawing $\theta_1^{(1)}$ from $p(\theta_1 \mid \theta_2^{(0)}, \ldots, \theta_q^{(0)}, \text{data})$, $\theta_2^{(1)}$ from $p(\theta_2 \mid \theta_1^{(1)}, \theta_3^{(0)}, \ldots, \theta_q^{(0)}, \text{data})$, and so forth. As $T \to \infty$, $\theta^{(T)} \sim p(\theta_1, \ldots, \theta_q \mid \text{data})$. The conditional draws are obtained using adaptive rejection sampling [23] in WinBUGS software (WinBUGS V1.4.3, Imperial College and MRC, UK, 2007).
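To make the cycling through full conditionals concrete, the following is a minimal sketch of a Gibbs sampler for a deliberately simple toy model, not the shared parameter model fit in this paper. It assumes observations from a normal distribution with unknown mean and precision and conjugate priors, so both conditional draws are available in closed form; the analysis in this paper instead relies on adaptive rejection sampling inside WinBUGS. The function name `gibbs_normal`, the prior settings, and the simulated data are all illustrative assumptions.

```python
import random
import statistics


def gibbs_normal(y, n_draws=5000, burn_in=1000, seed=0,
                 m0=0.0, p0=1e-3, a=0.01, b=0.01):
    """Gibbs sampler for y_j ~ N(mu, 1/tau) with priors
    mu ~ N(m0, 1/p0) and tau ~ Gamma(shape=a, rate=b).
    Toy illustration only; all settings are assumptions."""
    rng = random.Random(seed)
    n, sum_y = len(y), sum(y)
    mu, tau = statistics.fmean(y), 1.0  # theta^(0): a reasonable start
    draws = []
    for it in range(burn_in + n_draws):
        # Draw mu from p(mu | tau, data): normal with precision p0 + n*tau.
        prec = p0 + n * tau
        mu = rng.gauss((p0 * m0 + tau * sum_y) / prec, prec ** -0.5)
        # Draw tau from p(tau | mu, data): gamma with updated shape and rate.
        ss = sum((v - mu) ** 2 for v in y)
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        tau = rng.gammavariate(a + n / 2, 1.0 / (b + ss / 2))
        if it >= burn_in:
            draws.append((mu, 1.0 / tau))  # store (mean, variance)
    return draws


# Simulated data with true mean 4.0 and true variance 1.5**2 = 2.25.
data_rng = random.Random(1)
y = [data_rng.gauss(4.0, 1.5) for _ in range(200)]
draws = gibbs_normal(y)
post_mean_mu = statistics.fmean(d[0] for d in draws)
post_mean_var = statistics.fmean(d[1] for d in draws)
```

Each pass through the loop updates one block of parameters conditional on the current value of the other, which is the same alternation described above for $\theta_1, \ldots, \theta_q$.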

3 Associations between Dementia and Baseline Level, Change, and Variability in Total Recall

3.1 Preliminary Analyses

In order to obtain accurate information about dementia onset, we restrict our analysis to the AHEAD cohort subjects who are fee-for-service Medicare beneficiaries and were not diagnosed with dementia at the time of the baseline (1993) interview (4,983 of 8,222 subjects). In order to estimate stable subject-level intercepts, slopes, and variances, we further restrict the analysis to the 2,372 subjects who had at least four interviews during the follow-up period; an additional 20 were excluded for lacking age data, yielding a total of 2,352 for analysis.

Table 1 shows the distribution of total recall by year of follow-up, along with age, gender, education, and race/ethnicity of participating survivors. An interview was completed at a given follow-up with 69–99% of survivors with 4 or more total interviews. Mean recall in 1993 was 8.5 words out of 20 (10 for immediate recall and 10 for delayed recall), declining to 6.5 in 2008. Subjects’ mean age at baseline was 75.4 years, increasing to 88.4 years among participating survivors in 2008. At baseline, 65% of subjects were female; 33% had less than a high school education and 15% had more than a high school education; and 83% were white, 12% African-American, and 4% were Hispanic. Participating survivors became increasingly female and more highly educated through the follow-up period.

Table 1.

Total recall, age, gender, education, and race/ethnicity by year of follow-up among participating survivors. HS=High school. (Standard deviations in parentheses).

| Year | 1993 | 1995 | 1998 | 2000 | 2002 | 2004 | 2006 | 2008 |
|---|---|---|---|---|---|---|---|---|
| n survived | 2352 | 2318 | 2317 | 2312 | 2261 | 2042 | 1657 | 1000 |
| n interviewed | 2292 | 2286 | 2269 | 2247 | 1843 | 1481 | 1146 | 852 |
| Recall | 8.5(3.7) | 8.8(3.6) | 8.2(3.6) | 7.5(3.5) | 7.2(3.5) | 6.8(3.3) | 6.5(3.3) | 6.5(3.3) |
| Age | 75.4(4.6) | 77.4(4.6) | 79.7(4.6) | 81.8(4.6) | 84.0(4.6) | 85.7(4.4) | 87.3(4.1) | 88.4(3.5) |
| % Female | 65.2 | 64.9 | 64.6 | 64.4 | 66.0 | 65.8 | 67.2 | 69.0 |
| % <HS | 33.2 | 33.0 | 32.8 | 30.7 | 29.4 | 29.0 | 29.0 | 27.8 |
| % HS | 52.0 | 52.2 | 52.4 | 52.6 | 53.7 | 54.2 | 53.7 | 53.6 |
| % >HS | 14.7 | 14.7 | 14.8 | 14.8 | 15.6 | 16.3 | 17.3 | 18.5 |
| % White | 83.1 | 83.2 | 83.4 | 83.5 | 83.9 | 83.7 | 83.2 | 83.3 |
| % Black | 12.1 | 11.8 | 11.9 | 11.7 | 11.6 | 11.7 | 11.8 | 11.6 |
| % Hispanic | 3.8 | 4.0 | 3.7 | 3.8 | 3.7 | 4.0 | 4.2 | 4.5 |
| % Other | 0.9 | 1.0 | 1.1 | 1.1 | 0.8 | 0.7 | 0.8 | 0.6 |

A dementia diagnosis was obtained among 605 subjects (25.7%) by 2008. Figure 1 plots total recall by age among a subsample of subjects who did not develop dementia and a subsample of subjects who did. There are no clear associations between the observed recall trends and the development of dementia.

Figure 1. Total recall by age among (a) 10 subjects without a dementia diagnosis, and (b) 10 subjects with a dementia diagnosis.

3.2 Joint Modeling of Recall Mean and Variance to Predict Onset of Dementia During Follow-Up

We fit the model proposed in Section 2 to the total recall and dementia outcome data, letting the risk factor measures $y_{it}$ be the total recall score raised to the power .7325 to improve the approximation to normality. The outcome $w_i$ is an indicator for whether or not dementia was diagnosed at any time during the follow-up period. Recall measures are obtained from all subjects regardless of dementia diagnosis. Because of the small number of recall observations per subject (4–8), we only considered low-degree polynomials for $f(\beta_i, t)$; based on preliminary mixed-model analysis we chose a quadratic trend: $f(\beta_i, t) = \beta_{0i} + \beta_{1i}\tilde{a}_{it} + \beta_{2i}\tilde{a}_{it}^2$, where $\tilde{a}_{it}$ is the age of the ith person at the tth follow-up visit, standardized for numerical stability by subtracting the global mean age and dividing by the global standard deviation of age: $\tilde{a}_{it} = (a_{it} - 81.0496)/5.8487$ for age $a_{it}$. Thus $\beta_{0i}$ corresponds to a subject-level mean, $\beta_{1i}$ to a subject-level slope, and $\beta_{2i}$ to a subject-level curvature in (transformed) recall scores. For the dementia model, we assume $g(\gamma, \beta_i, \sigma_i^2, z_i) = \gamma_0 + \gamma_1\beta_{0i} + 10\gamma_2\beta_{1i} + 100\gamma_3\beta_{2i} + \gamma_4\sigma_i^2 + \gamma_5 S(\sigma_i^2) + \gamma_6^T z_i$. (We inflated the random effects associated with slope and curvature to bring the components of $\gamma$ onto the same scale and thus improve convergence of the MCMC algorithm.) The function $S(x) = (x - x_1)_+^3 - \frac{x_3 - x_1}{x_3 - x_2}(x - x_2)_+^3 + \frac{x_2 - x_1}{x_3 - x_2}(x - x_3)_+^3$ contains the non-linear component of a restricted cubic spline [24] with knots at $x_1$, $x_2$, and $x_3$, termed “restricted” because $S(x)$ is constrained to be linear in its tails ($x < x_1$ and $x > x_3$), thus avoiding overfitting while still accommodating any non-linearities in the relationship between risk of dementia onset and subject-level residual variances. The knot values were chosen based on a visual inspection of histograms of the posterior means of the residual variances of a transformed recall scores-only model, and were fixed as (1,3,5). The covariate vector $z_i$ includes dummy variables for education, race/ethnicity, gender, and baseline age categories (65–70, 71–75, 76–80, and 80+). Finally, we assume relatively flat hyperpriors $\beta_0 \equiv 0$, $V_{\beta_0} = \mathrm{diag}(1000)$, $\sigma_0 = 0$, $V_{\sigma_0} = 100$, $k = 3$, $S_0 = \mathrm{diag}(.1)$, $a_\sigma = b_\sigma = .01$. Having brought the predictors of dementia to approximately a unit scale, we use priors of the form $\gamma_0 \equiv 0$ and $V_\gamma = \mathrm{diag}(100)$. Four chains of 20,000 draws were obtained after a burn-in of 1,000 draws. Convergence of the Markov Chain Monte Carlo algorithm was assessed for each parameter using the Gelman-Rubin statistic $\hat{R}$ [25], which is (approximately) the square root of the total variance of the draws of the parameter divided by the within-chain variance. The maximum value was 1.03 across all population parameters, considered sufficient for convergence.
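The restricted cubic spline component and the age standardization described above can be sketched as follows. The function names `rcs_nonlinear` and `standardize_age` are illustrative, not taken from any software used in the paper; the knots (1, 3, 5) and the centering and scaling constants are those stated in the text. The last lines check numerically that the function is zero below the first knot and linear above the last knot.

```python
def rcs_nonlinear(x, knots=(1.0, 3.0, 5.0)):
    """Non-linear component S(x) of a three-knot restricted cubic spline."""
    x1, x2, x3 = knots

    def pos3(u):
        # Truncated cubic (u)_+^3: zero when u is negative.
        return max(u, 0.0) ** 3

    return (pos3(x - x1)
            - (x3 - x1) / (x3 - x2) * pos3(x - x2)
            + (x2 - x1) / (x3 - x2) * pos3(x - x3))


def standardize_age(age, mean=81.0496, sd=5.8487):
    """Center and scale age using the global mean and SD given in the text."""
    return (age - mean) / sd


# Below the first knot S(x) is identically zero.
s_low = rcs_nonlinear(0.5)
# Above the last knot S(x) is linear, so equally spaced points have
# equal first differences.
s6, s7, s8 = rcs_nonlinear(6.0), rcs_nonlinear(7.0), rcs_nonlinear(8.0)
tail_second_difference = (s8 - s7) - (s7 - s6)
```

Because the cubic and quadratic terms cancel beyond the last knot, extreme residual variances contribute to the log-odds only linearly, which is the overfitting safeguard the text refers to.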

The posterior means of $\beta$ and $\Sigma$ were $(4.38, -.59, -.09)$ and $\begin{pmatrix} 1.190 & .078 & -.091 \\ .078 & .185 & .012 \\ -.091 & .012 & .015 \end{pmatrix}$ respectively, indicating an overall accelerating decline in recall, with considerable between-subject variability. Figure 2 shows the observed and predicted values of $y_{it}$ for 4 randomly chosen subjects without and with a dementia diagnosis, where the predicted values are given by $\hat{y}_{it} = \hat{\beta}_{0i} + 10\hat{\beta}_{1i}\tilde{a}_{it} + 100\hat{\beta}_{2i}\tilde{a}_{it}^2$ for $\hat{\beta}_{pi} = E(\beta_{pi} \mid y)$. Some subjects had approximately flat trends, while others decreased with varying degrees of rapidity, with no obvious differences between those who developed dementia and those who did not.

Figure 2. Observed and predicted recall scores with .7325 power transformation, among a subsample of those without a dementia diagnosis and with a dementia diagnosis.

Table 2 provides the posterior means and 95% credible intervals for the predictors of dementia onset during follow-up. There is moderate evidence that higher intercepts and positive or less pronounced negative curvatures are associated with increased risk of dementia, and somewhat stronger evidence that increasing variability in cognitive scores is positively associated with risk of dementia, up to a threshold level in the upper tail of the individual recall score variability distribution. To better visualize these relationships, we plot in Figure 3 the log-odds of dementia for a given subject-level mean, slope, curvature, and residual variance relative to the posterior mean of these population-level quantities (in the case of variance, this is $e^{\sigma + \Psi^2/2}$ = .93), holding baseline covariates constant. (The tick marks at the bottom of the plot denote a random sample of the posterior means of $\beta_{0i}$, $\beta_{1i}$, $\beta_{2i}$, and $\sigma_i^2$, and provide a way to interpret the range of risks associated with the subject-level residual variance in the population.) Subjects with variances below the population mean are at reduced risk, while subjects above this mean are at increased risk up to a threshold of approximately 2.5–3. As a specific example, subjects with a variance of 2.5 are predicted to have an odds ratio of 2.48 (95% CI 1.14,7.85) for development of dementia relative to subjects with a variance of 0.5. Similarly, subjects with less negative curvature are at a higher risk of dementia than subjects with more negative curvature: for example, subjects with a curvature of 0 (linear trend) are predicted to have an odds ratio of 5.42 (95% CI 1.01,947.19) for development of dementia relative to those with a curvature of −.15. Among the baseline covariates, only gender showed any evidence of being associated with risk of dementia onset, with males having odds of 0.76 relative to females (95% CI .52–1.01).
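As a rough check on how such odds ratios arise from the coefficients, the sketch below plugs the Table 2 posterior means for the joint model with a=.7325 into the log-odds function. This is a plug-in approximation under stated assumptions, not the authors' computation: the reported odds ratios presumably summarize the posterior distribution of the odds ratio across MCMC draws, so the plug-in value for the variance contrast (about 2.6) need not equal the reported 2.48. The gender contrast involves a single coefficient and reproduces the reported 0.76. The function names are illustrative.

```python
import math


def rcs_nonlinear(x, knots=(1.0, 3.0, 5.0)):
    """Non-linear component S(x) of the three-knot restricted cubic spline."""
    x1, x2, x3 = knots

    def pos3(u):
        return max(u, 0.0) ** 3  # truncated cubic (u)_+^3

    return (pos3(x - x1)
            - (x3 - x1) / (x3 - x2) * pos3(x - x2)
            + (x2 - x1) / (x3 - x2) * pos3(x - x3))


def plug_in_odds_ratio(v1, v0, g4, g5):
    """Odds ratio comparing residual variance v1 with v0, other terms held
    fixed: exp{g4*(v1 - v0) + g5*(S(v1) - S(v0))}."""
    log_or = g4 * (v1 - v0) + g5 * (rcs_nonlinear(v1) - rcs_nonlinear(v0))
    return math.exp(log_or)


# Posterior means from Table 2 (joint model, a=.7325): linear variance term
# .639, non-linear variance term -.092, male vs. female -.273.
or_variance = plug_in_odds_ratio(2.5, 0.5, g4=0.639, g5=-0.092)  # about 2.63
or_male = math.exp(-0.273)                                        # about 0.76
```

The negative coefficient on S(x) is what bends the risk curve downward at high variances, producing the threshold behavior described above.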

Table 2.

Log OR of dementia as a function of trends and variances of total recall score with power transformation a, adjusted for education, race/ethnicity, and gender. SD=standard deviation (estimated empirically in two-stage model and parametrically in joint model); HS=High school. Boldface denotes CIs that exclude 0.

| Predictor | Two-Stage Model | Joint Model, a=.7325 | Joint Model, a=2/3 | Joint Model, a=3/4 |
|---|---|---|---|---|
| Subject-Level Intercept $\beta_{0i}$ | −.003(−.040,.034) | 1.001(−.028,2.957) | .607(−.023,2.283) | .925(.004,2.615) |
| Subject-Level Slope $\beta_{1i}$ | −.001(−.025,.025) | .400(−.615,2.344) | −.077(−1.043,.747) | .442(−.398,2.057) |
| Subject-Level Curvature $\beta_{2i}$ | .065(.018,.112) | 14.81(.05,45.69) | 9.26(.21,38.44) | 13.53(.44,39.53) |
| Subject-Level Linear Variance $\sigma_i^2$ | .050(−.081,.182) | .639(.084,1.524) | .766(.195,1.490) | .478(.007,1.088) |
| Subject-Level Non-linear Variance $S(\sigma_i^2)$ | −.002(−.012,.008) | −.092(−.336,−.008) | −.777(−2.293,−.061) | −.042(−.117,−.001) |
| < HS (vs. >HS) | −.096(−.314,.122) | −.140(−.478,.156) | −.123(−.397,.139) | −.136(−.465,.152) |
| HS (vs. >HS) | .027(−.251,.305) | .012(−.375,.407) | .017(−.304,.343) | .011(−.383,.377) |
| Black (vs. White) | .003(−.294,.299) | .024(−.394,.452) | .011(−.332,.353) | .015(−.370,.412) |
| Hispanic (vs. White) | −.036(−.530,.457) | −.095(−.782,.539) | −.084(−.654,.485) | −.082(−.752,.527) |
| Other (vs. White) | .013(−.921,.946) | .041(−1.105,1.175) | .001(−1.042,.947) | .003(−1.209,1.114) |
| Male (vs. Female) | −.217(−.417,−.017) | −.273(−.645,.005) | −.234(−.493,−.006) | −.263(−.613,−.002) |
| Baseline age 70–74 (vs. 65–70) | −.152(−.468,.163) | −.226(−.715,.191) | −.186(−.556,.154) | −.210(−.699,.189) |
| Baseline age 75–79 (vs. 65–70) | −.033(−.375,.308) | −.081(−.568,.357) | −.064(−.473,.314) | −.069(−.507,.365) |
| Baseline age 80+ (vs. 65–70) | −.028(−.406,.351) | −.128(−.684,.358) | −.096(−.551,.317) | −.126(−.668,.358) |

Figure 3. Log-OR for dementia as a function of subject-level intercept, slope, curvature, and variance versus posterior mean of the population-level mean of intercept, slope, curvature, and variance. Tick marks show a random sample of 200 posterior means of the individual-level variances $\sigma_i^2$.

For comparison, a two-stage model was fit to these data. First, a separate Gaussian regression model relating recall data to a quadratic function of age (standardized to have a mean of 0 and variance of 1 across all subjects) was fit for each subject. Next, the estimated intercepts, slopes, curvatures, and residual variances obtained were used to predict the probability of dementia onset in a logistic regression model, adjusting for education, race/ethnicity, gender, and baseline age. Table 2 shows that, as in the joint model, curvature and gender are associated with risk of dementia, and in the same direction (positive or less pronounced negative curvatures associated with increased risk; males with decreased risk).
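The first stage of such a two-stage analysis can be sketched as an ordinary least-squares quadratic fit per subject. The code below is an illustrative standard-library implementation, not the software the authors used; `fit_quadratic` and the example data are hypothetical. It returns the three coefficients and the residual variance, which would then enter a logistic regression as fixed covariates in the second stage (not shown).

```python
def fit_quadratic(ages, y):
    """OLS fit of y = b0 + b1*a + b2*a^2 for one subject.
    Returns ([b0, b1, b2], residual variance)."""
    X = [[1.0, a, a * a] for a in ages]
    # Normal equations (X'X) b = X'y as an augmented 3x4 matrix.
    A = [[sum(r[j] * r[k] for r in X) for k in range(3)]
         + [sum(r[j] * v for r, v in zip(X, y))] for j in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(3):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * w for u, w in zip(A[r], A[c])]
    b = [A[j][3] / A[j][j] for j in range(3)]
    resid = [v - sum(bj * xj for bj, xj in zip(b, r)) for r, v in zip(X, y)]
    # Three mean parameters are estimated, so at least four visits are
    # needed for a residual variance estimate.
    s2 = sum(e * e for e in resid) / (len(y) - 3)
    return b, s2


# Hypothetical subject with five visits on the standardized age scale and
# noise-free scores, so the fit should recover the coefficients exactly.
ages = [-1.0, -0.5, 0.0, 0.5, 1.0]
scores = [4.0 - 0.6 * a - 0.1 * a * a for a in ages]
coefs, resid_var = fit_quadratic(ages, scores)
```

With only 4 to 8 visits per subject, each residual variance rests on 1 to 5 degrees of freedom, which illustrates why first-stage estimates are noisy when treated as known covariates.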

3.3 Model Checking

We use posterior predictive distribution (PPD) model checking [25] to assess whether the proposed model provides a reasonable approximation to the true data. The PPD “p-value” represents the probability that an observed statistic (which can be a function of both the data $y$ and the parameter $\theta$) is more extreme than a replicated statistic, conditional on the observed data: $P(T(y^{obs}, \theta) \le T(y^{rep}, \theta) \mid y)$, where $y^{rep}$ is drawn from the posterior predictive distribution $f(y^{rep} \mid y) = \int f(y^{rep} \mid \theta, y)\, p(\theta \mid y)\, d\theta$. Although PPD p-values are not true p-values in that they do not have a uniform distribution, values close to 0 or 1 give evidence of poor model fit. For the predictor data $Y_{it}$ (transformed recall scores), we computed for each subject a chi-square discrepancy statistic of the form $T_i(y_i; \beta_i, \sigma_i^2) = \sum_t (y_{it} - f(\beta_i, t))^2 / \sigma_i^2$. We compute $P(T_i(y_i^{obs}; \beta_i, \sigma_i^2) < T_i(y_i^{rep}; \beta_i, \sigma_i^2) \mid y_i^{obs})$ by keeping $y_i$ fixed at its observed values and computing 200 values of $T_i(y_i^{obs}; \beta_i, \sigma_i^2)$ from 200 draws from the posterior of $\beta_i$, $\sigma_i^2$, and comparing these with 200 draws from $T_i(y_i^{rep}; \beta_i, \sigma_i^2)$, which has a $\chi^2_{n_i}$ distribution. Figure 4 shows the resulting histogram of the 2,352 PPD p-values for each subject’s recall score trajectory. The median PPD p-value was .48; the range was .20 to .78, indicating a reasonable degree of model fit for all subjects. The largest p-value was for a subject whose recall scores were extremely low across all follow-ups, indicating “floor effects” that were not entirely captured by the normality assumption.
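The subject-level check described above can be sketched as follows, assuming posterior draws of the quadratic coefficients and residual variance for one subject are already available. The function name `ppd_pvalue`, the example data, and the constant "draws" are hypothetical and chosen only to show how the p-value behaves; in a real analysis each draw would differ. The replicated statistic is simulated directly from a chi-square distribution with as many degrees of freedom as the subject has observations, using the fact that this is a gamma distribution with shape n/2 and scale 2.

```python
import random


def ppd_pvalue(y, a_std, draws, seed=0):
    """Share of posterior draws with T(y_obs) < T(y_rep) for one subject.

    y      observed (transformed) scores
    a_std  standardized ages at each visit
    draws  sequence of (b0, b1, b2, sigma2) posterior draws
    """
    rng = random.Random(seed)
    n, hits, total = len(y), 0, 0
    for b0, b1, b2, s2 in draws:
        # Chi-square discrepancy evaluated at the observed data.
        t_obs = sum((v - (b0 + b1 * a + b2 * a * a)) ** 2
                    for v, a in zip(y, a_std)) / s2
        # T(y_rep) is chi-square with n df, i.e. Gamma(shape=n/2, scale=2).
        t_rep = rng.gammavariate(n / 2, 2.0)
        hits += t_obs < t_rep
        total += 1
    return hits / total


# Hypothetical subject: six visits, roughly linear decline.
y_obs = [5.0, 4.6, 4.4, 3.9, 3.5, 3.2]
a_std = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
p_small_var = ppd_pvalue(y_obs, a_std, [(4.1, -0.9, 0.0, 1e-6)] * 2000)
p_mid_var = ppd_pvalue(y_obs, a_std, [(4.1, -0.9, 0.0, 0.004)] * 2000)
p_large_var = ppd_pvalue(y_obs, a_std, [(4.1, -0.9, 0.0, 100.0)] * 2000)
```

A residual variance that is far too small drives the p-value toward 0 (the data look overdispersed relative to the model), and one that is far too large drives it toward 1 (underdispersed), matching the interpretation of extreme PPD p-values given above.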

Figure 4. Histogram of PPD p-values for subject-level recall score trajectory discrepancy statistic: (a) Power transformation of .7325, (b) power transformation of 2/3, (c) power transformation of 3/4.

Some preliminary transformations indicated a rather poor fit: Figure 4 shows the equivalent histograms for recall score power transformations of 2/3 and 3/4. Roughly speaking, power transformations less than .7325 led to overdispersed data, and power transformations greater than .7325 led to underdispersed data, although the integer nature of the underlying recall scores plays a role as well. Because of this sensitivity we report in Table 2 the association between dementia onset and the recall score mean profiles and residual variances for the two alternative power transformations. We found that there was a modest degree of sensitivity to the choice of transformation, but that the basic finding remained that overall mean, curvature, residual variance, and gender are associated with dementia onset, with the functional form of the residual variance association being similar. The 2/3 power transformation suggested a somewhat stronger quadratic association between risk of dementia and residual variance, as well as somewhat narrower intervals for the baseline covariate effects, possibly due to somewhat less predictive information being captured from the recall scores.

We also considered the predictive distribution of the dementia outcome $W_i$. We considered the total count $T = \sum_i w_i$, which was $T^{obs} = 605$ in the observed data; comparing this with $T^{rep} = \sum_i w_i^{rep}$ yields a posterior predictive “p-value” of .50, where $w_i^{rep}$ is drawn from a Bernoulli distribution with probability $\pi_i^{rep} = \exp\{g(\gamma^{rep}, \beta_i^{rep}, \sigma_i^{2,rep}, z_i)\} / [1 + \exp\{g(\gamma^{rep}, \beta_i^{rep}, \sigma_i^{2,rep}, z_i)\}]$, and $\gamma^{rep}$, $\beta_i^{rep}$, and $\sigma_i^{2,rep}$ are drawn from their posterior distributions in the MCMC chain. To assess whether the predicted probabilities are reasonable at the tails of their distribution as well as overall, we considered a Hosmer-Lemeshow-type fit statistic $T_k = \sum_{i \in k} w_i$, where $k = 1, \ldots, 10$ indexes the deciles of the probability of dementia $\pi_i$: we compute $T_k^{obs} = \sum_{i \in k} w_i$, where $k$ is based on the posterior draws of $\pi_i^{rep}$, and compare with the distribution of $T_k^{rep} = \sum_{i \in k} w_i^{rep}$, based on both the posterior draws of $\pi_i^{rep}$ and the posterior predictive draws of $w_i^{rep}$. This yielded PPD p-values across the deciles of (.32,.38,.34,.47,.44,.43,.56,.52,.52,.55), indicating reasonable fit for the second stage of the model over the range of $\hat{\pi}_i$, with a very modest tendency to overestimate risk of dementia in the lowest-probability subjects and underestimate risk of dementia in the highest-probability subjects.
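A minimal sketch of the total-count check follows, assuming one vector of subject-level probabilities is available per retained MCMC draw. The name `count_ppd_pvalue` and the toy inputs are hypothetical; the decile version described above would apply the same comparison to the subset of subjects falling in each decile of the predicted probability.

```python
import random


def count_ppd_pvalue(w_obs, pi_draws, seed=0):
    """P(T_obs <= T_rep) for T = sum_i w_i.

    w_obs     observed 0/1 outcomes
    pi_draws  list of probability vectors, one per posterior draw
    """
    rng = random.Random(seed)
    t_obs = sum(w_obs)
    hits = 0
    for pi in pi_draws:
        # One replicated outcome per subject from Bernoulli(pi_i).
        t_rep = sum(rng.random() < p for p in pi)
        hits += t_obs <= t_rep
    return hits / len(pi_draws)


# Toy example: 100 subjects, 25 observed events, and probabilities that
# match the observed event rate in every draw.
w_obs = [1] * 25 + [0] * 75
p_matched = count_ppd_pvalue(w_obs, [[0.25] * 100] * 1000)
# Probabilities that are far too low for the observed event count.
p_too_low = count_ppd_pvalue([1] * 50, [[0.01] * 50] * 1000)
```

When the probabilities are well calibrated the p-value sits near the middle of the unit interval, as with the value of .50 reported above; systematic under- or over-prediction pushes it toward 0 or 1.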

4 Simulation Study

We conducted a simulation study as follows. We consider 1,000 observations with 5 repeated measures per subject, with the continuous predictors generated under a Gaussian random effects linear model and the dichotomous outcome generated under a logistic model that is a function of the random effects that govern the predictors:

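$$Y_{it} \mid \beta_i, \sigma_i^2 \sim N(\beta_{0i} + \beta_{1i} t, \sigma_i^2), \quad t = 0, \ldots, 4, \quad i = 1, \ldots, 1000$$
$$W_i \mid \beta_i, \sigma_i^2, \gamma \sim \mathrm{BER}(\pi_i), \quad \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \gamma_0 + \gamma_1\beta_{0i} + \gamma_2\beta_{1i} + \gamma_3\sigma_i^2$$
$$\beta_i \sim N(\beta, \Sigma)$$
$$\log(\sigma_i^2) \sim N(\sigma, \Psi^2)$$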

We have $\beta = (4, .1)^T$, $\Sigma = \begin{pmatrix} .7 & .07 \\ .07 & .12 \end{pmatrix}$, $\sigma = -.4$, $\Psi^2 = .3$, and $\gamma = (-6, .75, 0, 2)^T$. This corresponds to a 1 standard deviation increase in the subject-level intercept being associated with a 69% increase in the odds of the outcome, and a 1 standard deviation increase in the subject-level variance being associated with a 310% increase in the odds of the outcome. The probability of the outcome is approximately 19% when the random effects are fixed at their population means.
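The data-generating process for one simulated dataset can be sketched as below, using only the Python standard library. Two sign choices are assumptions: the mean of the log variance is taken as −.4 and the logistic intercept as −6, because these values reproduce the stated outcome probability of about 19% at the population means (with the mean subject-level variance equal to $e^{\sigma + \Psi^2/2}$). The slope mean and the off-diagonal covariance are used as printed. The function and constant names are illustrative.

```python
import math
import random

BETA = (4.0, 0.1)                      # population intercept and slope
SIGMA = ((0.7, 0.07), (0.07, 0.12))    # covariance of (intercept, slope)
LOG_VAR_MEAN, LOG_VAR_VAR = -0.4, 0.3  # assumed sign on the mean
GAMMA = (-6.0, 0.75, 0.0, 2.0)         # assumed sign on the intercept


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n=1000, n_times=5, seed=0):
    """Draw (y_i, w_i) pairs from the simulation model described above."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 random-effects covariance matrix.
    l11 = math.sqrt(SIGMA[0][0])
    l21 = SIGMA[0][1] / l11
    l22 = math.sqrt(SIGMA[1][1] - l21 ** 2)
    data = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        b0 = BETA[0] + l11 * z1
        b1 = BETA[1] + l21 * z1 + l22 * z2
        s2 = math.exp(rng.gauss(LOG_VAR_MEAN, math.sqrt(LOG_VAR_VAR)))
        y = [rng.gauss(b0 + b1 * t, math.sqrt(s2)) for t in range(n_times)]
        p = expit(GAMMA[0] + GAMMA[1] * b0 + GAMMA[2] * b1 + GAMMA[3] * s2)
        w = int(rng.random() < p)
        data.append((y, w))
    return data


# Outcome probability with random effects fixed at their population means.
mean_var = math.exp(LOG_VAR_MEAN + LOG_VAR_VAR / 2)
p_at_means = expit(GAMMA[0] + GAMMA[1] * BETA[0]
                   + GAMMA[2] * BETA[1] + GAMMA[3] * mean_var)  # about 0.19
data = simulate()
```

Because the slope coefficient in the outcome model is zero, the slope random effect influences the repeated measures only, so the unknown sign of the slope mean does not affect the outcome probabilities in this sketch.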

Two chains of 2,500 draws were obtained after a burn-in of 2,500 draws. The bias (based on the posterior mean), nominal 95% credible interval coverage, and power for the population-level regression parameters estimated from 100 simulated datasets are given in Table 4. Bias is negligible for the population slope and intercept for the predictors, and coverage is approximately correct. There is a modest degree of bias for the logistic regression parameters, on the order of 10–15%, but again coverage is approximately correct. For comparison, we include the results of a two-stage analysis for the logistic regression parameters. Bias toward the null is severe, with nominal 95% coverage being essentially 0 for three of the four parameters.

5 Discussion

Despite the great proliferation of longitudinal health data during the past three decades, relatively little attention has been paid to the role that variability in such data might play in predicting outcomes of interest. This manuscript attempts to fill this gap by developing a method to combine information about both mean trends and variances in longitudinal data to predict categorical outcomes of interest. The method is applied to predict onset of dementia in elderly adults over a fourteen-year time period using recall data measured every two years. Residual variability is found to be associated with dementia risk, with subjects with low variability being less likely to develop dementia by the end of the fourteen-year follow-up period than subjects with moderate to high variability. Overall mean level and curvature (quadratic trend of recall) were marginally associated with dementia risk, with increased mean level and increased quadratic trend found to be associated with increased risk of dementia onset. Little predictive power was found for linear trend or in the baseline measures of education, race/ethnicity, or age. The cognitive performance trends had associations with the diagnostic outcome that were reversed from those hypothesized (higher intercepts associated with decreased risk, accelerating declines associated with increased risk), possibly due to the fact that intercepts and curvatures had a strong negative correlation (posterior mean of −.68), which would lead to some instability in estimation of associated effects. Increased within-person variability had a significant prognostic relationship in the hypothesized direction.

The importance of using a joint model to assess the relationship between the individual parameters governing the trends and variability of recall and risk of dementia is seen in the fact that, in simple two-stage models, the results from individual linear regression fits of recall data either had little relationship with dementia risk or had a relationship in the reverse direction from that hypothesized (negative curvature, i.e. an increasingly rapid reduction in memory performance, was associated with a decreased risk of dementia onset). This is likely due to very substantial bias toward the null induced by measurement error, and/or spurious relationships induced by floor or ceiling effects, in the first-stage model. A two-stage procedure that used, e.g., empirical Bayes estimates from random effects for both the means and variances to stabilize first-stage estimation of the mean trends and residual errors would likely have improved performance, but standard software for fitting random effects models does not typically allow for estimation of subject-level random effects for the variance components.

The authors also attempted to fit the shared parameter model in (1) using a fully likelihood-based method. However, integrating out the subject-level variance random effects proved virtually intractable using the adaptive Gaussian quadrature methods available in PROC NLMIXED in SAS V 9.1 (SAS Institute, Cary, NC). Our simulation study suggests that the Bayesian approach with weakly informative priors has reasonable repeated sampling properties and yields estimates vastly superior to those obtained from a two-stage method.
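To give a sense of why quadrature becomes burdensome, the sketch below integrates a single normal random effect out of a logistic probability using plain (non-adaptive) Gauss-Hermite quadrature. It is an illustration only and is not the algorithm used by PROC NLMIXED; the function name `marginal_prob`, the number of nodes, and the count of four random effects per subject are assumptions made for the example.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def marginal_prob(eta, sd, n_nodes=20):
    """Approximate E[expit(eta + b)] for b ~ N(0, sd^2) by Gauss-Hermite quadrature."""
    nodes, weights = hermgauss(n_nodes)
    # Change of variables b = sqrt(2) * sd * x maps the N(0, sd^2) density
    # onto the exp(-x^2) weight function used by Gauss-Hermite rules.
    b = np.sqrt(2.0) * sd * nodes
    return np.sum(weights / (1.0 + np.exp(-(eta + b)))) / np.sqrt(np.pi)


# One random effect: 20 function evaluations per subject per likelihood call.
p_marginal = marginal_prob(eta=-1.0, sd=1.5)

# A tensor-product rule over several random effects needs n_nodes ** d
# evaluations; d = 4 is a hypothetical dimension chosen for illustration.
n_nodes, n_random_effects = 20, 4
evaluations_per_subject = n_nodes ** n_random_effects

print(f"marginal probability: {p_marginal:.4f}")
print(f"grid evaluations per subject: {evaluations_per_subject}")
```

The cost of a tensor-product grid grows exponentially with the number of random effects integrated out for each subject, which is one reason simulation-based approaches such as Gibbs sampling can be more practical for models of this kind.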

Use of a low-degree polynomial to model the longitudinal recall data was dictated by both the nature of the question at hand – specifically the desire to link slopes and curvatures of recall trends to dementia risk – and the relative paucity of follow-up visits for each individual. Even if sufficient longitudinal data were available, higher-order terms in non-linear growth profiles for the longitudinal risk factors yield predictors of outcome that are difficult to interpret. Elliott [5] used a penalized spline model to detrend subject-level daily affect data consisting of up to 35 follow-up measures, relating the risk of depression to latent clusters of subject-level variability. An extension that would incorporate information from both means and variances could assign subjects to latent classes of profiles and variances, and link these classes to a categorical outcome of interest via a log-linear model [26, 27]. Such an approach requires some clustering of the profiles and the residual variability of the longitudinal predictors; in the application of interest here no such clustering was evident.

As Table 1 shows, most of the missing data in the longitudinal predictors were structurally missing (due to death rather than loss to follow-up). Use of a linear mixed model for the longitudinal data makes a missing at random (MAR) assumption [28]. When the missingness is intermittent, the MAR assumption seems reasonable: violation would require that we systematically miss low- or high-mean or highly variable observations within a subject. The MAR assumption is stronger in the small fraction of dropout data, in that it requires that our modeling assumptions for the mean or variance be correct. Future work could consider selection models that would provide sensitivity analyses for violations of the MAR assumption.

Finally, some discussion of the limitations of our analysis is in order. First, restricting our analysis to those subjects with four or more recall scores, while necessary to provide information about quadratic trends and residual variances, may also lead to selection bias, since subjects with three or fewer follow-ups may be at higher risk of death, which could have a variety of impacts on risk of dementia during follow-up. Indeed, subjects with three or fewer follow-ups were older (79.3 years vs. 75.4 years) and had lower recall scores (6.3 vs. 8.5) at baseline, and were more likely to be male (42.5% vs. 35.3%), lack a high school degree (52.5% vs. 33.3%), and be African-American (16.3% vs. 12.1%) (all p < .001 by t-test or χ² test). This limitation might be addressed in part by treating the outcome as a time-to-event measure rather than a dichotomous outcome over the whole follow-up period. This would have the advantage of accounting for administrative or competing-risk censoring in cases of drop-out or death, and would offer increased power and a more nuanced understanding of the associations of interest. However, the problem of insufficient information to estimate trends and residual variance in subjects with few follow-up measures would remain, which highlights the requirement for sufficient follow-up data to implement the proposed methods. A second major issue results from the numerical “fragility” of the models, at least as fit in WinBUGS. For example, attempts to include non-linear effects for mean trends in the prediction of dementia onset parallel to those used for variance led to numerical overflow (“trap”) errors, as did use of flatter hyperprior parameters in S0 for p−1. Direct computation of the relevant conditional distributions for the Gibbs algorithm, though more time-consuming, may allow for direct control of overflow errors; alternatively, modeling approaches such as the use of growth mixture models to classify mean trends into categorical predictors may provide more robust numerical results.
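The requirement of four or more recall scores follows from a degrees-of-freedom count: a quadratic trend uses three parameters, so at least one further observation is needed before any residual variance can be estimated. A minimal sketch of such a subject-level fit is given below; the function name `quadratic_summary` and the example scores are hypothetical, and this is an ordinary least squares illustration rather than the joint model fit in the paper.

```python
import numpy as np


def quadratic_summary(times, scores):
    """Fit intercept, slope, and curvature for one subject by least squares.

    Returns (coefficients, residual_variance), where the residual variance
    uses n - 3 degrees of freedom.
    """
    times = np.asarray(times, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n_params = 3  # intercept, linear, and quadratic terms
    if times.size <= n_params:
        raise ValueError("need at least four observations to estimate residual variance")
    design = np.vander(times, n_params, increasing=True)  # columns: 1, t, t^2
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    resid = scores - design @ coef
    resid_var = resid @ resid / (times.size - n_params)
    return coef, resid_var


# Hypothetical subject observed at five visits, with some scatter in recall.
visit_times = [0, 1, 2, 3, 4]
recall = [8.0, 9.0, 6.0, 7.0, 4.0]
coef, resid_var = quadratic_summary(visit_times, recall)
print("intercept, slope, curvature:", np.round(coef, 3))
print("residual variance:", round(float(resid_var), 3))
```

With exactly three observations the quadratic interpolates the data and the residuals are identically zero, so the function refuses to return a variance in that case.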

Table 3.

Simulation study results: Population-level regression parameters. Bias for posterior mean; coverage for 95% credible intervals.

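| Parameter (True Value) | Two-Stage Model: Bias | Two-Stage Model: Nominal 95% Coverage | Two-Stage Model: Power | Joint Model: Bias | Joint Model: Nominal 95% Coverage | Joint Model: Power |
|---|---|---|---|---|---|---|
| β0 (5) | | | | .00 | 94 | |
| β1 (.1) | | | | −.00 | 96 | |
| γ0 (−4) | 2.75 | 0 | | −.20 | 99 | |
| γ1 (.75) | −.34 | 1 | 100 | .01 | 94 | 99 |
| γ2 (0) | .43 | 35 | 65 | .01 | 93 | 7 |
| γ3 (2) | −1.53 | 0 | 100 | .17 | 95 | 100 |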

ACKNOWLEDGEMENTS

The authors wish to thank two reviewers and the Editor for their comments, which greatly improved the manuscript. This work was supported by Grant Number R03AG031980 from the National Institute on Aging.

REFERENCES

  • 1. Barnard J, McCulloch R, Meng X-L. Modeling covariance matrices in terms of standard deviations and correlations, with applications to shrinkage. Statistica Sinica. 2000;10:1281–1311.
  • 2. Carroll RJ. Variances are not always nuisance parameters. Biometrics. 2003;59:211–220. doi: 10.1111/1541-0420.t01-1-00027.
  • 3. Harlow SD, Lin X, Ho MJ. Analysis of menstrual diary data across the reproductive life span: applicability of the bipartite model approach and the importance of within-woman variance. Journal of Clinical Epidemiology. 2000;53:722–733. doi: 10.1016/s0895-4356(99)00202-4.
  • 4. Sammel MD, Wang Y, Ratcliffe SJ, Freeman E, Propert KJ. Models for within-subject heterogeneity as predictors for disease. Atlanta GA: Proceedings of The American Statistical Association, Biometrics section; 2001.
  • 5. Elliott MR. Identifying latent clusters of variability in longitudinal data. Biostatistics. 2007;8:756–771. doi: 10.1093/biostatistics/kxm003.
  • 6. Kikuya M, Ohkubo T, Metoki H, Asayama K, Hara A, Obara T, Inoue R, Hoshi H, Hashimoto J, Totsune K, Satoh H, Imai Y. Day-by-day variability of blood pressure and heart rate at home as a novel predictor of prognosis: the Ohasama study. Hypertension. 2008;52:1045–1050. doi: 10.1161/HYPERTENSIONAHA.107.104620.
  • 7. Fouque JP, Papanicolaou G, Sircar KR. Derivatives in financial markets with stochastic volatility. Cambridge, UK: Cambridge University Press; 2000.
  • 8. Muthén B, Brown CH, Masyn BJ, Khoo S-T, Wang C-P, Kellman SG, Carlin J, Liao J. General growth mixture modeling for randomized preventive interventions. Biostatistics. 2002;3:459–475. doi: 10.1093/biostatistics/3.4.459.
  • 9. Ye W, Lin X, Taylor JMG. Semiparametric modeling of longitudinal measurements and time-to-event data – a two-stage regression calibration approach. Biometrics. 2008;64:1238–1246. doi: 10.1111/j.1541-0420.2007.00983.x.
  • 10. Anstey KJ, Smith GA. Interrelationships among biological markers of aging, health, activity, acculturation, and cognitive performance in late adulthood. Psychology and Aging. 1999;14:605–618. doi: 10.1037//0882-7974.14.4.605.
  • 11. Christensen H, Mackinnon AJ, Korten AE, Jorm AF, Henderson AS, Jacomb P, Rodgers B. An analysis of diversity in the cognitive performance of elderly community dwellers: individual differences in change scores as a function of age. Psychology and Aging. 1999;14:365–379. doi: 10.1037//0882-7974.14.3.365.
  • 12. Hultsch DF, Hertzog C, Small BJ, McDonald-Miszczak L, Dixon RA. Short-term longitudinal change in cognitive performance in later life. Psychology and Aging. 1992;7:571–584. doi: 10.1037//0882-7974.7.4.571.
  • 13. Nesselroade JR. Intraindividual variability and short-term change. Commentary. Gerontology. 2004;50:44–47. doi: 10.1159/000074389.
  • 14. Martin M, Hofer SM. Intraindividual variability, change, and aging: conceptual and analytical issues. Gerontology. 2004;50:7–11. doi: 10.1159/000074382.
  • 15. Schaie KW. The impact of longitudinal studies on understanding development from young adulthood to old age. International Journal of Behavioral Development. 2000;24:257–266.
  • 16. Darby D, Maruff P, Collie A, McStephen M. Mild cognitive impairment can be detected by multiple assessments in a single day. Neurology. 2002;59:1042–1046. doi: 10.1212/wnl.59.7.1042.
  • 17. Kliegel M, Sliwinski M. MMSE cross-domain variability predicts cognitive decline in centenarians. Gerontology. 2004;50:39–43. doi: 10.1159/000074388.
  • 18. Collie A, Maruff P, Currie J. Behavioral characterization of mild cognitive impairment. Journal of Clinical and Experimental Neuropsychology. 2002;24:720–733. doi: 10.1076/jcen.24.6.720.8397.
  • 19. Juster FT, Suzman R. An overview of the Health and Retirement Study. Journal of Human Resources. 1995;30:S7–S56.
  • 20. Taylor DH Jr, Ostbye T, Langa KM, Weir D, Plassman BL. The accuracy of Medicare claims as an epidemiological tool: The case of dementia revisited. Journal of Alzheimer's Disease. 2009;17:807–815. doi: 10.3233/JAD-2009-1099.
  • 21. Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association. 1990;85:398–409.
  • 22. Gelman A, Carlin J, Stern H, Rubin D. Bayesian Data Analysis. 2nd edition. Boca Raton, FL: Chapman & Hall/CRC; 2004.
  • 23. Gilks WR, Wild P. Adaptive rejection sampling for Gibbs sampling. Applied Statistics. 1992;41:337–348.
  • 24. Durrleman S, Simon R. Flexible regression models with cubic splines. Statistics in Medicine. 1989;8:551–561. doi: 10.1002/sim.4780080504.
  • 25. Gelman A, Meng X-L, Stern H. Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica. 1996;6:733–807.
  • 26. Lin H, Turnbull BW, McCulloch CE, Slate EH. Latent class models for joint analysis of longitudinal biomarker and event process data: application to longitudinal prostate-specific antigen readings and prostate cancer. Journal of the American Statistical Association. 2002;97:53–65.
  • 27. Proust-Lima C, Letenneur L, Jacqmin-Gadda H. A nonlinear latent class model for joint analysis of multivariate longitudinal data and a binary outcome. Statistics in Medicine. 2007;26:2229–2245. doi: 10.1002/sim.2659.
  • 28. Little RJA, Rubin DB. Statistical Analysis With Missing Data. 2nd edition. New York, NY: Wiley; 2002.
