American Journal of Epidemiology. 2016 Aug 31;184(5):378–387. doi: 10.1093/aje/kwv451

A Simulation Platform for Quantifying Survival Bias: An Application to Research on Determinants of Cognitive Decline

Elizabeth Rose Mayeda *, Eric J Tchetgen Tchetgen, Melinda C Power, Jennifer Weuve, Hélène Jacqmin-Gadda, Jessica R Marden, Eric Vittinghoff, Niels Keiding, M Maria Glymour *
PMCID: PMC5013884  PMID: 27578690

Abstract

Bias due to selective mortality is a potential concern in many studies and is especially relevant in cognitive aging research because cognitive impairment strongly predicts subsequent mortality. Biased estimation of the effect of an exposure on rate of cognitive decline can occur when mortality is a common effect of exposure and an unmeasured determinant of cognitive decline and in similar settings. This potential is often represented as collider-stratification bias in directed acyclic graphs, but it is difficult to anticipate the magnitude of bias. In this paper, we present a flexible simulation platform with which to quantify the expected bias in longitudinal studies of determinants of cognitive decline. We evaluated potential survival bias in naive analyses under several selective survival scenarios, assuming that exposure had no effect on cognitive decline for anyone in the population. Compared with the situation with no collider bias, the magnitude of bias was higher when exposure and an unmeasured determinant of cognitive decline interacted on the hazard ratio scale to influence mortality or when both exposure and rate of cognitive decline influenced mortality. Bias was, as expected, larger in high-mortality situations. This simulation platform provides a flexible tool for evaluating biases in studies with high mortality, as is common in cognitive aging research.

Keywords: cognitive decline, collider-stratification bias, dementia, selection bias, selective survival, simulation, survival bias, truncation by death


Selective survival presents an important potential source of bias in longitudinal research with truncation by death. Survival bias is especially relevant in research on determinants of cognitive decline because cognitive decline predicts death (1–4), many exposures of interest also influence death, and death rates are high in older adult populations. Hernán et al. (5) represented this challenge using directed acyclic graphs, conceptualizing survival bias under the sharp null hypothesis (no effect of the exposure on the outcome in any person) as a type of collider-stratification bias that arises when survival is a common effect of the exposure and outcome process. For cognitive aging research, situations where cognitive outcomes directly influence survival and where a determinant of cognitive decline influences survival are both plausible.

Although in some situations researchers may be confident that they have appropriately measured and accounted for the selection processes, the potential for survival bias is widely recognized in numerous settings when many determinants of survival are not known or not measured. Simulation studies provide an opportunity to systematically assess the likely magnitude of bias under an array of assumptions about the data-generating process, including the causal structure and effect sizes. Simulation studies may also provide useful guidance to researchers about which assumptions about the selection process matter most for their research question.

We have developed a flexible simulation platform for quantifying the expected magnitude of bias in studies of determinants of rate of cognitive decline. Using this simulation platform, we evaluated potential survival bias when implementing naive analyses of cognitive decline under conditions of selective survival. We considered several causal scenarios, with a binary baseline exposure and continuous cognitive function measured on repeated occasions, as is typical in cohort studies of cognitive aging. The binary exposure could represent many different exposures of interest to epidemiologists—for example, a treatment in a hypothetical randomized controlled trial, such as hormone replacement therapy or intensive glycemic control, or an exposure in a hypothetical cohort study, such as a genetic variant or smoking.

Simulating data to model cognitive aging entails incorporating complex correlation structures that prevail across successive cognitive assessments of a person and error in typical neuropsychological measurements. We therefore begin by describing our overall approach to generating cognitive data. We then describe the specific causal scenarios we evaluated and report effect estimates obtained from naive regression models (standard regression models that condition on survival to outcome assessment) under each scenario. Our objective was to simulate a study of cognitive aging that corresponded with the assumptions of substantive researchers and captured the most important drivers of survivor bias but omitted superfluous data features in order to maintain transparency and constrain the number of input assumptions. We therefore adopted several simplifying assumptions for clarity of exposition, most notably the sharp null hypothesis of no effect of the exposure on rate of cognitive change in any person. Simple modifications of the simulation code could make it relevant to a range of research questions related to diverse outcomes—for example, performance of alternative analytical approaches in the presence of selective survival; effects of early life-course factors that influence survival for decades prior to study enrollment; or sex differences in dementia incidence.

METHODS

Hypothetical study

We considered a study of the effect of a binary exposure on rate of cognitive decline in adults aged ≥60 years. The hypothetical study sample was followed for up to 9 years with cognitive assessments administered every 1.5 years (i.e., up to 7 assessments). To focus discussion on survivor bias, we assumed that no other form of attrition was present and that exposure was effectively randomized at baseline (no confounding).

Generation of repeated cognitive measures and survival times

Each person in each simulated data set was assigned the following: exposure (from Bernoulli(0.50)); baseline age (baseline_age), defined as 60 years plus a random variable drawn from a Beta(2,4) distribution, which we scaled to allow a possible 40-year age range (see Web Figure 1, available at http://aje.oxfordjournals.org/); and an unmeasured continuous covariate, $U \sim N(0,1)$, which can represent a single variable or a set of variables that influence rate of cognitive change. We adopted a growth curve framework for generating cognitive measures, since it easily incorporates within-person correlation, as well as other sources of variation between people, such as random slopes. For each individual i at wave j, we generated a value of cognitive function $C_{ij}$, following an autoregressive linear model with a random intercept ($\zeta_{0i}$) and random slope ($\zeta_{1i}$) drawn from a bivariate normal distribution with mean 0, variances $\sigma^2_{\zeta 0}$ and $\sigma^2_{\zeta 1}$, and covariance $\sigma_{\zeta 01}$. Although we use the term $\text{time}_{ij}$, the value of time at wave j does not vary between individuals in the present simulations.

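$$C_{ij} = \beta_{00} + \beta_{01}\,\text{exposure}_i + \beta_{02}\,\text{baseline\_age}_i + \beta_{03}U_i + (\beta_{10} + \beta_{11}\,\text{exposure}_i + \beta_{12}\,\text{baseline\_age}_i + \beta_{13}U_i)\,\text{time}_{ij} + \zeta_{0i} + \zeta_{1i}\,\text{time}_{ij} + \epsilon_{ij}. \tag{1}$$

In this model (see coefficient definitions in Appendix Table 1), $\epsilon_{ij}$ represents unexplained variation in $C_{ij}$, where $\epsilon_{ij} = \rho\,\epsilon_{ij-1} + \alpha_{ij}$, $\epsilon_{ij} \sim N(0, \sigma^2_{\epsilon})$, $\alpha_{ij} \sim N(0, (1-\rho^2)\sigma^2_{\epsilon})$, and $\alpha_{ij} \perp \epsilon_{ij-1}$. This structure creates an autoregressive model within a person where the variance of $\epsilon_{ij}$ is constant across waves (see Web Table 1).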
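The constant-variance property of this error structure can be checked numerically. The following is a minimal Python sketch, not the authors' Stata code; the function names, the seed, and the number of replicates are illustrative assumptions, while ρ = 0.40 and the residual variance of 0.70 are the values listed in Table 2.

```python
import math
import random

def ar1_errors(n_waves, rho, var_eps, rng):
    """Draw one person's residuals from a stationary AR(1) process."""
    # The first wave is drawn directly from the marginal N(0, var_eps) distribution.
    eps = [rng.gauss(0.0, math.sqrt(var_eps))]
    # Innovations have variance (1 - rho^2) * var_eps and are independent
    # of the previous residual, which keeps the marginal variance constant.
    innov_sd = math.sqrt((1.0 - rho ** 2) * var_eps)
    for _ in range(1, n_waves):
        eps.append(rho * eps[-1] + rng.gauss(0.0, innov_sd))
    return eps

def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(2016)  # arbitrary seed, used only for reproducibility
draws = [ar1_errors(7, rho=0.40, var_eps=0.70, rng=rng) for _ in range(20000)]
var_first = sample_variance([d[0] for d in draws])
var_last = sample_variance([d[6] for d in draws])
print(f"residual variance at wave 1: {var_first:.3f}, at wave 7: {var_last:.3f}")
```

Both printed variances should be close to 0.70, because the innovation variance is scaled by (1 − ρ²).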

Our data-generating model for cognitive function allows for random intercepts and random slopes and an autoregressive within-person covariance structure. Most researchers evaluating determinants of rate of cognitive change either use a linear mixed-effects model with random intercepts and random slopes or adopt a generalized estimating equations (GEE) approach assuming an autoregressive within-person covariance structure (i.e., marginalizing over random effects). Our simulation platform can accommodate a variety of structures. For example, the data-generating model for cognitive function can be specified without random effects (by setting $\sigma^2_{\zeta 0} = 0$, $\sigma^2_{\zeta 1} = 0$, and $\sigma_{\zeta 01} = 0$) or the data-generating model can be set to correspond to a strictly random-effects model (by setting ρ = 0).

For conceptual clarity, note that equation 1 can be written more succinctly as

$$C_{ij} = X_i'\beta_0 + (X_i'\beta_1)\,\text{time}_{ij} + \zeta_{0i} + \zeta_{1i}\,\text{time}_{ij} + \epsilon_{ij}, \tag{2}$$

where $\beta_0$ is a vector of fixed effects for cognitive intercept and $\beta_1$ is a vector of fixed effects for cognitive slope (rate of cognitive change), and $X_i'$ represents the vector of covariates, including exposure, baseline age, and U.

After generating true values of cognitive function for each individual i at each wave j, we generated measured values of cognitive function ($C^*_{ij}$) by adding random measurement error, $\delta_{ij} \sim N(0, \sigma^2_{\delta})$, to the true values of cognitive function ($C_{ij}$) described in equation 1:

$$C^*_{ij} = C_{ij} + \delta_{ij}. \tag{3}$$

The simulation code can be modified to eliminate measurement error by setting $\sigma^2_{\delta} = 0$.
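Equations 1 and 3 can be combined into a single per-person generator. The sketch below is a Python illustration rather than the authors' Stata implementation. The coefficient and variance values are transcribed from Tables 1 and 2; the function name `simulate_person`, the seed, and the choice to enter baseline age as years above 60 are assumptions made here for illustration, since the centering of age is not stated in this section.

```python
import math
import random

# Archetype inputs transcribed from Tables 1 and 2.
B00, B01, B02, B03 = 0.0, 0.0, -0.05, 0.0        # fixed effects on the intercept
B10, B11, B12, B13 = -0.05, 0.0, -0.005, -0.05   # fixed effects on the slope
VAR_Z0, VAR_Z1, COV_Z01 = 0.20, 0.005, 0.01      # random-effect (co)variances
VAR_EPS, RHO, VAR_DELTA = 0.70, 0.40, 0.19       # residual, AR(1), measurement error
WAVE_TIMES = [1.5 * j for j in range(7)]         # 0, 1.5, ..., 9 years

def simulate_person(rng):
    """Return covariates, true trajectory C_ij, and measured trajectory C*_ij."""
    exposure = 1 if rng.random() < 0.50 else 0
    # ASSUMPTION: age enters the model as years above 60 (centering not stated).
    age = 40.0 * rng.betavariate(2, 4)
    u = rng.gauss(0.0, 1.0)
    # Correlated random intercept and slope via a hand-written Cholesky factor.
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    sd_z0 = math.sqrt(VAR_Z0)
    z0 = sd_z0 * e1
    z1 = (COV_Z01 / sd_z0) * e1 + math.sqrt(VAR_Z1 - COV_Z01 ** 2 / VAR_Z0) * e2
    intercept = B00 + B01 * exposure + B02 * age + B03 * u + z0
    slope = B10 + B11 * exposure + B12 * age + B13 * u + z1
    true_c, measured_c = [], []
    eps = rng.gauss(0.0, math.sqrt(VAR_EPS))
    for j, t in enumerate(WAVE_TIMES):
        if j > 0:  # AR(1) update with constant marginal variance
            eps = RHO * eps + rng.gauss(0.0, math.sqrt((1.0 - RHO ** 2) * VAR_EPS))
        c = intercept + slope * t + eps                               # equation 1
        true_c.append(c)
        measured_c.append(c + rng.gauss(0.0, math.sqrt(VAR_DELTA)))   # equation 3
    return {"exposure": exposure, "baseline_age": 60.0 + age, "u": u,
            "z0": z0, "z1": z1, "true": true_c, "measured": measured_c}

rng = random.Random(1)  # arbitrary seed
person = simulate_person(rng)
print(person["exposure"], round(person["baseline_age"], 1), len(person["measured"]))
```

Setting `VAR_DELTA` to 0 in this sketch makes the measured and true trajectories coincide, mirroring the option described above.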

We generated survival time for each person as a function of exposure (time-constant), U (time-constant), age (time-varying), rate of cognitive change (time-varying), and level of cognitive function (time-varying). At each cognitive assessment wave j, a “time to death” value is generated for each person who has remained alive up to that wave. The time-to-death variable is generated using information on the person's covariate history up to that time point as a random variable drawn from an exponential survival distribution (based on the hazard function specified in equation 4 below). If the randomly generated time to death exceeds the length of the interval between waves j and j + 1 (1.5 years), the person is considered alive at wave j + 1 and a new survival time is generated for the next interval, conditional on history up to the start of the interval. The process is repeated until the person's survival time falls within a given interval or the end of the study (7 waves), whichever comes first. Each person's hazard function at time t in the jth interval is defined as

$$h(t_{ij} \mid x) = \lambda \exp(\gamma_1\,\text{exposure}_i + \gamma_2\,\text{age}_{ij} + \gamma_3 U_i + \gamma_4\,\text{exposure}_i \times U_i + \gamma_5\,\text{slope}_{ij} + \gamma_6 C_{ij}) \tag{4}$$

(see coefficient definitions in Appendix Table 1). In this model, $\text{slope}_{ij}$ represents individual i's rate of cognitive change for the interval starting at time j and is defined as

$$\text{slope}_{ij} = (X_i'\beta_1 + \zeta_{1i}) + \frac{\epsilon_{ij+1} - \epsilon_{ij}}{\text{time}_{ij+1} - \text{time}_{ij}}, \tag{5}$$

where $(X_i'\beta_1 + \zeta_{1i})$ equals individual i's average slope throughout follow-up and $(\epsilon_{ij+1} - \epsilon_{ij})/(\text{time}_{ij+1} - \text{time}_{ij})$ represents individual i's deviation from this long-term average slope during the current time interval. This slope does not incorporate measurement error because it represents the person's true, rather than measured, rate of cognitive change. We then used the inverse cumulative hazard function transformation formula described by Bender et al. (6) (see Web Appendix 1) to generate each person's survival time for a given time interval at risk based on the above hazard function.
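The interval-by-interval survival step can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' Stata code: the function names and the seed are hypothetical, and the hazard is treated as constant within each 1.5-year interval, so the inverse cumulative hazard transformation reduces to T = −log(V)/h with V uniform on (0, 1].

```python
import math
import random

INTERVAL = 1.5     # years between cognitive assessments
N_INTERVALS = 6    # 7 waves span 6 intervals (0 to 9 years)

def hazard(lam, gamma, exposure, age, u, slope, c):
    """Equation 4: hazard for one interval, given covariates at its start."""
    g1, g2, g3, g4, g5, g6 = gamma
    return lam * math.exp(g1 * exposure + g2 * age + g3 * u
                          + g4 * exposure * u + g5 * slope + g6 * c)

def interval_slope(mean_slope, eps_now, eps_next, t_now, t_next):
    """Equation 5: long-term slope plus the interval-specific deviation."""
    return mean_slope + (eps_next - eps_now) / (t_next - t_now)

def draw_death_time(interval_hazards, rng):
    """Return years from baseline to death, or None if alive at the last wave."""
    for j, h in enumerate(interval_hazards):
        v = 1.0 - rng.random()        # uniform on (0, 1], so log(v) is finite
        t = -math.log(v) / h          # inverse of the exponential cumulative hazard
        if t <= INTERVAL:
            return j * INTERVAL + t   # death falls inside interval j
        # Otherwise the person is alive at wave j + 1 and a new time is drawn.
    return None

# Sanity check with a constant hazard of 0.10 per year in every interval:
# survival to 9 years should approach exp(-0.10 * 9).
rng = random.Random(7)  # arbitrary seed
deaths = [draw_death_time([0.10] * N_INTERVALS, rng) for _ in range(50000)]
surviving = sum(d is None for d in deaths) / len(deaths)
print(f"simulated 9-year survival: {surviving:.3f}; exp(-0.9): {math.exp(-0.9):.3f}")
```

In a full simulation, `interval_hazards` would be recomputed at each wave from the person's current age, slope, and cognitive level using `hazard`, rather than held constant as in this check.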

Causal scenarios guiding data generation

We carried out simulations under several causal scenarios (Figure 1). We chose an initial scenario (scenario A) with no anticipated survivor bias. Although the exposure influences mortality, no correlate of cognitive decline affects mortality in this scenario, so conditioning on survival should not give rise to collider-stratification bias. We then considered scenarios under which selective survival was expected to induce collider-stratification bias. In the second scenario (scenario B), mortality is influenced by exposure and U, an unmeasured determinant of rate of cognitive decline. Because interaction between causes is known to influence the magnitude of collider-stratification bias (5, 7, 8), we consider a situation in which exposure and U influence the mortality hazard multiplicatively, with no interaction on the hazard ratio scale in determining mortality (scenario B1), and also a situation where exposure and U interact on the hazard ratio scale in determining mortality (scenario B2). In scenario C1, mortality is influenced by exposure and current rate of cognitive change, and in scenario C2, level of cognitive function at the beginning of the time interval also influences mortality.

Figure 1.


Causal scenarios investigated for potential selection bias. In all scenarios, the target of inference is the effect of the exposure on rate of change in cognitive function over time (rate of cognitive decline). “Cognitive Intercept” reflects cognitive level at the baseline cognitive assessment. The bidirectional arrow between “Cognitive Intercept” and “Rate of Cognitive Decline” indicates that the random intercept and random slope terms were drawn from a bivariate distribution with nonzero covariance. The box around “Survival” represents the fact that we are conditioning on survival = 1, meaning that analyses are limited to people who are alive at each cognitive assessment. Age is a determinant of survival, cognitive intercept, and rate of cognitive decline; the box around “Age” represents the fact that we are conditioning on age by adjusting for age in all regression models for cognitive decline. A) Scenario A: no anticipated survivor bias. B) Scenarios B1 (exposure and U affect mortality) and B2 (exposure and U have more-than-multiplicative effects on the mortality hazard): mortality is influenced by exposure and U, an unmeasured determinant of rate of cognitive change, without and with an interaction on the hazard ratio scale between the exposure and U. C) Scenario C1 (exposure and rate of cognitive change affect mortality): mortality risk is influenced by exposure and current rate of cognitive change. D) Scenario C2 (exposure, rate of cognitive change, and cognitive level affect mortality): mortality risk is influenced by exposure, current rate of cognitive change, and level of cognitive function at the beginning of the time interval.

For each causal scenario, we simulated B = 1,000 samples of n = 1,500 people with the archetype parameter inputs displayed in Table 1. We selected “archetype” values that represented large associations between variables to assess survival bias when the associations between variables were at the higher bound of the spectrum of what we considered likely in real data. Specifically, every 1-unit increase in U doubles the rate of cognitive decline (β13 = β10 = −0.05). Exposure doubles the hazard of death (exp(γ1) = 2.0). A 1-unit increase in U increases the hazard of death by a factor of 5.0, either overall (scenario B1; exp(γ3) = 5.0) or among the exposed only (scenario B2; exp(γ4) = 5.0). In scenario B2, this is essentially a silencing interaction, such that if exposure is not present, U has no effect. Each 0.10-unit faster annual rate of cognitive decline increases the hazard of death by 34% (exp(γ5 × −0.10) = 1.34), and every 1-unit decrease in level of cognitive function (Cij) doubles the hazard of death (exp(γ6 × −1) = 2.0).
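The correspondence between the log-hazard coefficients in Table 1 and the hazard ratios quoted above is simple arithmetic, shown here as a short Python check. The variable names are illustrative; the coefficient values are those listed in Table 1.

```python
import math

# Log-hazard coefficients from Table 1 (archetype values).
gamma1, gamma3, gamma4, gamma5, gamma6 = 0.69, 1.61, 1.61, -2.9, -0.69

hr_exposure = math.exp(gamma1)              # exposed vs. unexposed
hr_u = math.exp(gamma3)                     # per 1-unit increase in U (scenario B1)
hr_u_exposed = math.exp(gamma4)             # per 1-unit increase in U among the exposed (B2)
hr_faster_decline = math.exp(gamma5 * -0.10)  # 0.10-unit faster annual decline
hr_lower_level = math.exp(gamma6 * -1)      # 1-unit lower cognitive level

# Slope coefficients: with beta13 equal to beta10, a 1-unit increase in U
# doubles the annual decline when the exposure and age terms are held at zero.
beta10, beta13 = -0.05, -0.05
slope_u0 = beta10
slope_u1 = beta10 + beta13

for name, value in [("exposure", hr_exposure), ("U", hr_u),
                    ("U among exposed", hr_u_exposed),
                    ("faster decline", hr_faster_decline),
                    ("lower level", hr_lower_level)]:
    print(f"hazard ratio, {name}: {value:.2f}")
print(f"annual slope at U=0: {slope_u0}, at U=1: {slope_u1}")
```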

Table 1.

Archetype Values Used to Generate Models for Cognitive Function (Cij) and Hazard of Death (h(tij|x)) for Each Causal Scenario

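Parameters used in data-generating models: the β parameters belong to the model for cognitive function ($C_{ij}$) and the γ parameters belong to the model for hazard of death ($h(t_{ij} \mid x)$).

| Scenario^a | $\beta_{00}$ | $\beta_{01}$ | $\beta_{02}$ | $\beta_{03}$ | $\beta_{10}$ | $\beta_{11}$ | $\beta_{12}$ | $\beta_{13}$^b | $\gamma_1$^b | $\gamma_2$ | $\gamma_3$^b | $\gamma_4$^b | $\gamma_5$^b | $\gamma_6$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A^c | 0 | 0 | −0.05 | 0 | −0.05 | 0 | −0.005 | −0.05 | 0.69 | 0.086 | 0 | 0 | 0 | 0 |
| B1^d | 0 | 0 | −0.05 | 0 | −0.05 | 0 | −0.005 | −0.05 | 0.69 | 0.086 | 1.61 | 0 | 0 | 0 |
| B2^e | 0 | 0 | −0.05 | 0 | −0.05 | 0 | −0.005 | −0.05 | 0.69 | 0.086 | 0 | 1.61 | 0 | 0 |
| C1^f | 0 | 0 | −0.05 | 0 | −0.05 | 0 | −0.005 | −0.05 | 0.69 | 0.086 | 0 | 0 | −2.9 | 0 |
| C2^g | 0 | 0 | −0.05 | 0 | −0.05 | 0 | −0.005 | −0.05 | 0.69 | 0.086 | 0 | 0 | −2.9 | −0.69 |

a All scenarios are variations on the general data-generating structure, which is

$$C_{ij} = \beta_{00} + \beta_{01}\,\text{exposure}_i + \beta_{02}\,\text{baseline\_age}_i + \beta_{03}U_i + (\beta_{10} + \beta_{11}\,\text{exposure}_i + \beta_{12}\,\text{baseline\_age}_i + \beta_{13}U_i)\,\text{time}_{ij} + \zeta_{0i} + \zeta_{1i}\,\text{time}_{ij} + \epsilon_{ij}$$

$$h(t_{ij} \mid x) = \lambda \exp(\gamma_1\,\text{exposure}_i + \gamma_2\,\text{baseline\_age}_{ij} + \gamma_3 U_i + \gamma_4\,\text{exposure}_i \times U_i + \gamma_5\,\text{slope}_{ij} + \gamma_6 C_{ij}).$$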

b Italic numbers represent parameters for which alternative values, which represent more moderate effect sizes, are considered.

c No anticipated survivor bias.

d Exposure and U affect mortality.

e Exposure and U have more-than-multiplicative effects on the mortality hazard.

f Exposure and rate of cognitive change affect mortality.

g Exposure, rate of cognitive change, and cognitive level affect mortality.

To assess bias with more moderate effect sizes, we examined alternative parameter inputs for selected scenarios, changing one parameter value at a time, keeping the other parameters at archetype values. We varied: the effect of U on annual rate of cognitive change (β13); the effect of the exposure on the log hazard rate of mortality (γ1); the effect of U on the log hazard rate of mortality (γ3); the interaction between exposure and U in determining the log hazard rate of mortality (γ4); and the effect of rate of cognitive change on the log hazard rate of mortality (γ5). We did not vary the distributions of random effects, within-person covariance structure, or measurement error for cognitive function. These distributions are described in Table 2.

Table 2.

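Input Variance, Covariance, and Correlation Values for Generating $C_{ij}$ (True Cognitive Function) (Text Equation 1^a) and $C^*_{ij}$ (Measured Cognitive Function) (Text Equation 3^b)

| Parameter | Definition | Value |
|---|---|---|
| $\sigma^2_{\zeta 0}$ | Variance of $\zeta_{0i}$, individual i's deviation from the group mean intercept | 0.20 |
| $\sigma^2_{\zeta 1}$ | Variance of $\zeta_{1i}$, individual i's deviation from the group mean slope | 0.005 |
| $\sigma_{\zeta 01}$ | Covariance of $\zeta_{0i}$ and $\zeta_{1i}$ | 0.01 |
| $\sigma^2_{\epsilon}$ | Variance of $\epsilon_{ij}$, unexplained variation in $C_{ij}$ | 0.70 |
| $\rho$ | Correlation between $\epsilon_{ij}$ and $\epsilon_{ij+1}$ | 0.40 |
| $\sigma^2_{\delta}$ | Variance of $\delta_{ij}$, random measurement error of cognitive function | 0.19 |

a $C_{ij} = \beta_{00} + \beta_{01}\,\text{exposure}_i + \beta_{02}\,\text{baseline\_age}_i + \beta_{03}U_i + (\beta_{10} + \beta_{11}\,\text{exposure}_i + \beta_{12}\,\text{baseline\_age}_i + \beta_{13}U_i)\,\text{time}_{ij} + \zeta_{0i} + \zeta_{1i}\,\text{time}_{ij} + \epsilon_{ij}$.

b $C^*_{ij} = C_{ij} + \delta_{ij}$.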

Because the proportion of the study population that dies throughout follow-up may influence the degree of selection bias, we also varied cumulative mortality by the end of the hypothetical follow-up period. For each causal scenario, we generated the data under conditions of low, intermediate, and high mortality (cumulative mortality of 25%, 50%, and 75%, respectively). To achieve this across causal scenarios, we adjusted the baseline hazard such that cumulative mortality equaled the target.
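The text does not say how the baseline hazard was tuned, so the following Python sketch shows only one plausible approach: bisection on λ. It is deliberately simplified and is an assumption throughout. Mortality depends on exposure alone (γ1 = 0.69, half the sample exposed), with age, U, and cognition omitted, so cumulative mortality has a closed form. In the full platform, cumulative mortality would instead have to be evaluated by simulation at each candidate λ.

```python
import math

FOLLOW_UP = 9.0   # years of follow-up
GAMMA1 = 0.69     # log hazard ratio for exposure (Table 1)

def cumulative_mortality(lam):
    """Expected proportion dead by 9 years when half the sample is exposed."""
    surv_unexposed = math.exp(-lam * FOLLOW_UP)
    surv_exposed = math.exp(-lam * math.exp(GAMMA1) * FOLLOW_UP)
    return 1.0 - 0.5 * (surv_unexposed + surv_exposed)

def calibrate_lambda(target, lo=1e-8, hi=10.0, tol=1e-10):
    """Bisection: cumulative mortality increases monotonically with lambda."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cumulative_mortality(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

calibrated = {target: calibrate_lambda(target) for target in (0.25, 0.50, 0.75)}
for target, lam in calibrated.items():
    print(f"target {target:.2f}: lambda = {lam:.4f}, "
          f"achieved = {cumulative_mortality(lam):.4f}")
```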

Assessment of survival bias in estimated exposure–cognitive change associations

As previously stated, we assumed the sharp null hypothesis that there is no person in the population for whom the exposure affects rate of cognitive change (β11 = 0). Therefore, we attributed any nonnull association between the exposure and rate of cognitive change to survival bias. We produced estimates of the effect of exposure on rate of cognitive change using measured values of cognitive function ($C^*_{ij}$) among people who were alive at each cognitive assessment based on linear mixed-effects models and population average effects models estimated using GEE, the two modeling approaches most commonly used in practice for repeated measures. In all modeling approaches, a person was censored at her last cognitive assessment prior to death or at the end of follow-up (9 years), whichever came first. Consistent with the data-generating process, we used study time as the time scale and adjusted for baseline age. In all models, we adopted a naive analysis approach without adjusting for any measure of U to reflect the fact that in practice, such a measure may not be available.

We fitted linear mixed-effects models with random intercepts and slopes, allowing for possible correlation of the random intercepts and random slopes and no additional within-person covariance structure, as is common practice for standard linear mixed-effects models. For the GEE approach, we specified an autoregressive within-person correlation structure, where $\text{Corr}(C_{ij}, C_{ij+k}) = \rho^k$ for all j and k. The GEE approach is often adopted because it is considered robust to misspecifications of the covariance matrix.

Neither of these commonly used models is completely consistent with our data-generating model, which incorporated an autoregressive within-person covariance structure in addition to random intercepts and slopes; thus, as a sensitivity analysis, we also fitted linear mixed-effects models with an autoregressive within-person covariance structure, where $\text{Corr}(e_{ij}, e_{ij+k}) = \rho^k$ for all j and k. These models more closely approximate the data-generating model.

Across the B = 1,000 simulated samples, $\beta_{11} = 0$ is the true value of the effect of exposure on rate of cognitive decline and $\hat{\beta}_{11}$ is the estimated effect of exposure on rate of cognitive decline. Because effect sizes for annual rate of cognitive change are relatively small, we expressed effect sizes in 10-year increments for clarity. We calculated the mean value of the estimate over all simulations, $\bar{\hat{\beta}}_{11} = \sum_{k=1}^{B} \hat{\beta}_{11k}/B$. We assessed accuracy as root mean square error, which is the square root of the mean squared deviation of the estimated effect of exposure on rate of cognitive decline ($\hat{\beta}_{11}$) from the true value ($\beta_{11} = 0$). We estimated the 95% confidence interval coverage as the proportion of simulations in which the 95% confidence interval for $\hat{\beta}_{11}$ included $\beta_{11} = 0$. We do not report percent bias because data were generated under the null hypothesis.
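The three performance measures can be computed from a list of per-simulation estimates and standard errors, as in the Python sketch below. The function name `summarize` and the toy inputs are hypothetical, and the Wald-type interval (estimate ± 1.96 standard errors) is an assumption; the text does not state how the confidence intervals were constructed.

```python
import math

def summarize(estimates, std_errors, truth=0.0, z=1.96):
    """Mean estimate, root mean square error, and CI coverage across simulations."""
    b = len(estimates)
    mean_estimate = sum(estimates) / b
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / b)
    # ASSUMPTION: Wald interval, estimate +/- z * standard error.
    covered = sum(1 for e, se in zip(estimates, std_errors)
                  if e - z * se <= truth <= e + z * se)
    return {"mean": mean_estimate, "rmse": rmse, "coverage": covered / b}

# Toy inputs on the annual scale (made-up numbers, not results from the paper),
# rescaled to 10-year increments before summarizing.
annual_estimates = [0.010, -0.010, 0.030]
annual_std_errors = [0.010, 0.010, 0.010]
per_decade = summarize([10 * e for e in annual_estimates],
                       [10 * s for s in annual_std_errors])
print(per_decade)
```

Because the true effect is zero, the mean estimate is itself the bias, which is why percent bias is undefined here.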

In scenarios B1 and B2, conditioning on survival induces collider-stratification bias because U is an unmeasured common cause of mortality and rate of cognitive change. To illustrate the association between exposure and U induced by selective survival in these scenarios, we examined the difference in the mean value of U between exposed and unexposed survivors in each wave of the study for scenario A (where no bias was anticipated) and scenarios B1 and B2 with low, intermediate, and high mortality. Scenarios C1 and C2 were omitted because selective survival induces negligible associations between exposure and U in these scenarios; the bias in these scenarios does not arise from U.

We used Stata SE, version 13.1 (StataCorp LP, College Station, Texas), for all data generation and analyses. The simulation code (Web Appendices 2 and 3), code books for the simulation code (Web Tables 2 and 3), and the outline of steps for using the code (Web Figure 2) are available online.

RESULTS

Recall that in all scenarios, exposure has no effect on rate of cognitive change ($\beta_{11} = 0$), so any association between exposure and rate of cognitive change ($\hat{\beta}_{11} \neq 0$) represents survival bias. In scenario A, the estimated effect of exposure on rate of cognitive change was unbiased, as expected, with estimated effects centered around the null with approximately correct coverage under low, intermediate, and high mortality for both linear mixed-effects models (upper portion of Table 3, first row) and the GEE approach (lower portion of Table 3, first row). The estimated effect of exposure on rate of cognitive change was biased to some degree in all other causal scenarios (Tables 3–6). The extent of bias was similar for linear mixed-effects models and the GEE approach, so we discuss only the mixed-model results in detail.

Table 5.

Scenario B2 (Exposure and U Have More-Than-Multiplicative Effects on the Mortality Hazard) Simulation Results for the Estimated Effect of Exposure on Rate of Cognitive Change (β11) per 10 Years, Based on Alternative Input Values (More Moderate Effect Sizes)

| Model | Condition | Low Mortality (25%): $\bar{\hat{\beta}}_{11}$ | Low: RMSE | Low: Coverage^a | Intermediate Mortality (50%): $\bar{\hat{\beta}}_{11}$ | Intermediate: RMSE | Intermediate: Coverage | High Mortality (75%): $\bar{\hat{\beta}}_{11}$ | High: RMSE | High: Coverage |
|---|---|---|---|---|---|---|---|---|---|---|
| Linear mixed-effects models | Under archetypical conditions^b | 0.164 | 0.186 | 0.527 | 0.288 | 0.304 | 0.165 | 0.412 | 0.429 | 0.081 |
| Linear mixed-effects models | Moderate effect of exposure on mortality^c | 0.154 | 0.176 | 0.568 | 0.268 | 0.285 | 0.225 | 0.383 | 0.401 | 0.117 |
| Linear mixed-effects models | Moderate effect of exposure and U on mortality^d | 0.079 | 0.117 | 0.854 | 0.158 | 0.185 | 0.635 | 0.256 | 0.283 | 0.444 |
| Linear mixed-effects models | Moderate effect of U on cognitive decline^e | 0.085 | 0.119 | 0.836 | 0.147 | 0.174 | 0.671 | 0.209 | 0.238 | 0.592 |
| GEE approach | Under archetypical conditions^b | 0.189 | 0.210 | 0.464 | 0.326 | 0.342 | 0.122 | 0.458 | 0.476 | 0.061 |
| GEE approach | Moderate effect of exposure on mortality^c | 0.177 | 0.199 | 0.517 | 0.303 | 0.320 | 0.161 | 0.427 | 0.446 | 0.090 |
| GEE approach | Moderate effect of U on mortality^d | 0.097 | 0.132 | 0.831 | 0.191 | 0.216 | 0.539 | 0.304 | 0.330 | 0.312 |
| GEE approach | Moderate effect of U on cognitive decline^e | 0.096 | 0.130 | 0.845 | 0.163 | 0.192 | 0.637 | 0.228 | 0.260 | 0.562 |

Abbreviations: GEE, generalized estimating equations; HR, hazard ratio; RMSE, root mean square error.

a Proportion of times the 95% confidence interval for $\hat{\beta}_{11}$ includes $\beta_{11}$.

b Results in this row replicate results for scenario B2 shown in Table 3 for convenience of comparisons.

c Input value for γ1 = 0.40 (HR = 1.5) instead of 0.69 (HR = 2.0).

d Input value for γ4 = 0.69 (HR = 2.0) instead of 1.61 (HR = 5.0).

e Input value for β13 = −0.025 instead of −0.05.

Table 3.

Simulation Results From Scenarios A–C for the Estimated Effect of Exposure on Rate of Cognitive Change (β11) per 10 Years, Based on Archetype Input Values

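| Model | Scenario | Low Mortality (25%): $\bar{\hat{\beta}}_{11}$ | Low: RMSE | Low: Coverage^a | Intermediate Mortality (50%): $\bar{\hat{\beta}}_{11}$ | Intermediate: RMSE | Intermediate: Coverage | High Mortality (75%): $\bar{\hat{\beta}}_{11}$ | High: RMSE | High: Coverage |
|---|---|---|---|---|---|---|---|---|---|---|
| Linear mixed-effects models | Scenario A | 0.002 | 0.085 | 0.954 | 0.000 | 0.095 | 0.958 | −0.002 | 0.120 | 0.957 |
| Linear mixed-effects models | Scenario B1 | 0.048 | 0.098 | 0.914 | 0.075 | 0.124 | 0.884 | 0.094 | 0.155 | 0.883 |
| Linear mixed-effects models | Scenario B2 | 0.164 | 0.186 | 0.527 | 0.288 | 0.304 | 0.165 | 0.412 | 0.429 | 0.081 |
| Linear mixed-effects models | Scenario C1 | 0.094 | 0.126 | 0.809 | 0.186 | 0.209 | 0.518 | 0.316 | 0.335 | 0.229 |
| Linear mixed-effects models | Scenario C2 | 0.088 | 0.121 | 0.818 | 0.170 | 0.193 | 0.562 | 0.288 | 0.308 | 0.264 |
| GEE approach | Scenario A | 0.001 | 0.088 | 0.964 | −0.001 | 0.099 | 0.960 | −0.005 | 0.128 | 0.949 |
| GEE approach | Scenario B1 | 0.053 | 0.103 | 0.919 | 0.081 | 0.132 | 0.890 | 0.100 | 0.169 | 0.884 |
| GEE approach | Scenario B2 | 0.189 | 0.210 | 0.464 | 0.326 | 0.342 | 0.122 | 0.458 | 0.476 | 0.061 |
| GEE approach | Scenario C1 | 0.092 | 0.127 | 0.838 | 0.189 | 0.214 | 0.533 | 0.338 | 0.359 | 0.193 |
| GEE approach | Scenario C2 | 0.099 | 0.131 | 0.825 | 0.183 | 0.207 | 0.527 | 0.297 | 0.319 | 0.286 |

Abbreviations: GEE, generalized estimating equations; RMSE, root mean square error.

a Proportion of times the 95% confidence interval for $\hat{\beta}_{11}$ includes $\beta_{11}$.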

Table 6.

Scenario C1 (Exposure and Rate of Cognitive Change Affect Mortality) Simulation Results for the Estimated Effect of Exposure on Rate of Cognitive Change (β11) per 10 Years, Based on Alternative Input Values (More Moderate Effect Sizes)

| Model | Condition | Low Mortality (25%): $\bar{\hat{\beta}}_{11}$ | Low: RMSE | Low: Coverage^a | Intermediate Mortality (50%): $\bar{\hat{\beta}}_{11}$ | Intermediate: RMSE | Intermediate: Coverage | High Mortality (75%): $\bar{\hat{\beta}}_{11}$ | High: RMSE | High: Coverage |
|---|---|---|---|---|---|---|---|---|---|---|
| Linear mixed-effects models | Under archetypical conditions^b | 0.094 | 0.126 | 0.809 | 0.186 | 0.209 | 0.518 | 0.316 | 0.335 | 0.229 |
| Linear mixed-effects models | Moderate effect of exposure on mortality^c | 0.055 | 0.100 | 0.902 | 0.107 | 0.142 | 0.819 | 0.180 | 0.212 | 0.673 |
| Linear mixed-effects models | Moderate effect of rate of cognitive change on mortality^d | 0.043 | 0.095 | 0.928 | 0.097 | 0.137 | 0.828 | 0.189 | 0.225 | 0.664 |
| GEE approach | Under archetypical conditions^b | 0.092 | 0.127 | 0.838 | 0.189 | 0.214 | 0.533 | 0.338 | 0.359 | 0.193 |
| GEE approach | Moderate effect of exposure on mortality^c | 0.053 | 0.102 | 0.924 | 0.108 | 0.147 | 0.819 | 0.191 | 0.225 | 0.649 |
| GEE approach | Moderate effect of rate of cognitive change on mortality^d | 0.041 | 0.097 | 0.942 | 0.098 | 0.141 | 0.840 | 0.207 | 0.244 | 0.624 |

Abbreviations: GEE, generalized estimating equations; HR, hazard ratio; RMSE, root mean square error.

a Proportion of times the 95% confidence interval for $\hat{\beta}_{11}$ includes $\beta_{11}$.

b Results in this row replicate results for scenario C1 shown in Table 3 for convenience of comparisons.

c Input value for γ1 = 0.40 (HR = 1.5) instead of 0.69 (HR = 2.0).

d Input value for γ5 = −1.0 (HR = 0.4) instead of −2.9 (HR = 0.1).

With archetype parameter values, in scenario B1 (exposure and U affect mortality), we would estimate that exposure reduces rate of cognitive decline by 0.05 and 0.09 standard deviation units per decade under low (25%) and high (75%) mortality, respectively (Table 3). Coverage for scenario B1 ranged from 91% under low mortality to 88% under high mortality. The magnitude of bias was much larger for scenario B2, where exposure and U have more-than-multiplicative effects on the mortality hazard. Scenario C1 (exposure and rate of cognitive change affect mortality) also entailed substantial bias. We would estimate that exposure reduces rate of cognitive decline by 0.09 and 0.32 standard deviation units per decade under low and high mortality, respectively. Coverage for scenario C1 ranged from 81% under low mortality to 23% under high mortality. Results for scenario C2, in which rate of cognitive change and cognitive level affect mortality, were similar to results for scenario C1. With more moderate parameter inputs in scenarios B1, B2, and C1, the magnitude of bias decreased and coverage improved (Tables 4–6).

Table 4.

Scenario B1 (Exposure and U Affect Mortality) Simulation Results for the Estimated Effect of Exposure on Rate of Cognitive Change (β11) per 10 Years, Based on Alternative Input Values (More Moderate Effect Sizes)

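| Model | Condition | Low Mortality (25%): $\bar{\hat{\beta}}_{11}$ | Low: RMSE | Low: Coverage^a | Intermediate Mortality (50%): $\bar{\hat{\beta}}_{11}$ | Intermediate: RMSE | Intermediate: Coverage | High Mortality (75%): $\bar{\hat{\beta}}_{11}$ | High: RMSE | High: Coverage |
|---|---|---|---|---|---|---|---|---|---|---|
| Linear mixed-effects models | Under archetypical conditions^b | 0.048 | 0.098 | 0.914 | 0.075 | 0.124 | 0.884 | 0.094 | 0.155 | 0.883 |
| Linear mixed-effects models | Moderate effect of exposure on mortality^c | 0.029 | 0.091 | 0.934 | 0.044 | 0.108 | 0.923 | 0.056 | 0.135 | 0.922 |
| Linear mixed-effects models | Moderate effect of U on mortality^d | 0.034 | 0.092 | 0.938 | 0.062 | 0.114 | 0.912 | 0.089 | 0.150 | 0.902 |
| Linear mixed-effects models | Moderate effect of U on cognitive decline^e | 0.026 | 0.087 | 0.939 | 0.039 | 0.104 | 0.922 | 0.048 | 0.130 | 0.933 |
| GEE approach | Under archetypical conditions^b | 0.053 | 0.103 | 0.919 | 0.081 | 0.132 | 0.890 | 0.100 | 0.169 | 0.884 |
| GEE approach | Moderate effect of exposure on mortality^c | 0.032 | 0.094 | 0.947 | 0.047 | 0.114 | 0.932 | 0.056 | 0.135 | 0.922 |
| GEE approach | Moderate effect of U on mortality^d | 0.041 | 0.098 | 0.947 | 0.072 | 0.123 | 0.901 | 0.087 | 0.149 | 0.901 |
| GEE approach | Moderate effect of U on cognitive decline^e | 0.027 | 0.090 | 0.952 | 0.041 | 0.109 | 0.940 | 0.047 | 0.130 | 0.937 |

Abbreviations: GEE, generalized estimating equations; HR, hazard ratio; RMSE, root mean square error.

a Proportion of times the 95% confidence interval for $\hat{\beta}_{11}$ includes $\beta_{11}$.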

b Results in this row replicate results for scenario B1 shown in Table 3 for convenience of comparisons.

c Input value for γ1 = 0.40 (HR = 1.5) instead of 0.69 (HR = 2.0).

d Input value for γ3 = 0.69 (HR = 2.0) instead of 1.61 (HR = 5.0).

e Input value for β13 = −0.025 instead of −0.05.

In sensitivity analyses, the extent of bias in estimates from linear mixed-effects models with an autoregressive within-person covariance structure was similar to that from the GEE approach and from linear mixed-effects models with no within-person covariance structure beyond that imposed by the random intercepts and slopes (Web Tables 4–7).

Underlying selective survival bias in scenarios B1 and B2 is a correlation between the exposure and U induced by restricting analyses to people who are alive. If U is correlated with both the exposure and rate of cognitive decline, in effect it becomes a confounder of this relationship. Thus, the larger the association between the exposure and U, the larger the magnitude of survival bias. Figure 2 shows the difference in the mean value of U between exposed and unexposed participants alive in each wave of the study from scenarios A, B1, and B2, with archetype input values under conditions of low mortality (25%), intermediate mortality (50%), and high mortality (75%). Recall that the data-generating processes for U and exposure were independent, so any association is due to selective survival. As expected, no association between exposure and U was induced in scenario A. An association between exposure and U was induced in scenario B1 (exposure and U affect mortality), but a much stronger association was induced in scenario B2 (exposure and U have more-than-multiplicative effects on the mortality hazard), and the association increased as cumulative mortality increased.
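The mechanism described in this paragraph can be reproduced in a few lines. The Python sketch below is a stripped-down illustration, not the authors' simulation. It keeps only exposure, U, and their interaction in the hazard, and omits age and cognition. The baseline hazard of 0.03 per year, the sample size, and the seed are arbitrary assumptions and were not calibrated to the 25%, 50%, or 75% mortality targets. It reports the difference in mean U between exposed and unexposed people who survive 9 years.

```python
import math
import random

def mean_u_difference(gamma1, gamma3, gamma4, lam=0.03, follow_up=9.0,
                      n=40000, seed=0):
    """Mean of U among exposed survivors minus mean of U among unexposed survivors."""
    rng = random.Random(seed)
    u_sum = {0: 0.0, 1: 0.0}
    count = {0: 0, 1: 0}
    for _ in range(n):
        exposure = 1 if rng.random() < 0.50 else 0   # independent of U by design
        u = rng.gauss(0.0, 1.0)
        h = lam * math.exp(gamma1 * exposure + gamma3 * u + gamma4 * exposure * u)
        if rng.expovariate(h) > follow_up:           # person survives follow-up
            u_sum[exposure] += u
            count[exposure] += 1
    return u_sum[1] / count[1] - u_sum[0] / count[0]

diff_a = mean_u_difference(0.69, 0.0, 0.0)     # scenario A: U does not affect death
diff_b1 = mean_u_difference(0.69, 1.61, 0.0)   # scenario B1: no interaction
diff_b2 = mean_u_difference(0.69, 0.0, 1.61)   # scenario B2: U matters only if exposed
print(f"A: {diff_a:+.3f}   B1: {diff_b1:+.3f}   B2: {diff_b2:+.3f}")
```

Under these assumptions the difference should be near zero for scenario A, negative for B1, and more strongly negative for B2. Because higher U accelerates cognitive decline, exposed survivors with lower U appear to decline more slowly, which matches the direction of the bias reported in Tables 3–5.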

Figure 2.


Difference in the mean value of U between exposed and unexposed participants alive at each wave of the study in scenarios A, B1, and B2 (archetype input values) under conditions of low mortality (25%) (A), intermediate mortality (50%) (B), and high mortality (75%) (C). Scenarios C1 and C2 were not included because selective survival induces negligible associations between exposure and U in these scenarios; the bias does not arise from U.

DISCUSSION

We developed and implemented a simulation platform for quantifying possible bias due to selective survival in studies of cognitive aging with truncation by death. This platform allows flexible specification of duration of follow-up, cumulative mortality, distribution and magnitude of effects of determinants of mortality, and distribution of cognitive level and rate of cognitive change. This flexibility allows the simulation to be adapted to diverse studies. We implemented simulations across a range of scenarios corresponding to the assumptions of substantive researchers. We observed substantial bias in data-generating models in which exposure and an unmeasured determinant of cognitive decline interacted on the hazard ratio scale to influence mortality. The magnitude of bias was more modest when the exposure and unmeasured determinant of cognitive decline did not interact on the hazard ratio scale to influence mortality. Bias consistently arose when both exposure and rate of cognitive decline directly influenced mortality. Bias was, as expected, larger in high-mortality situations.

It is widely recognized that selective survival has the potential to bias estimated effects of exposures on rate of cognitive decline (5, 9), but there have been few tools with which to systematically estimate the plausible magnitude of this bias when many determinants of survival are not known or not measured. We focused here on bias under the sharp null hypothesis, but similar bias might impede researchers’ ability to identify exposures that have protective or harmful effects on cognitive aging. We found that the magnitude of survival bias often appears large enough to obscure likely causal effects of exposures in typical cohorts. For example, in the Three-City Study–Dijon, the apolipoprotein E gene (APOE) ϵ4 allele, one of the strongest known predictors of dementia (10), was associated with an excess decline of 0.14 standard deviation units on the Mini-Mental State Examination over 10 years (11). In many scenarios, we observed magnitudes of bias similar to this effect size. Few exposures have been found to be consistently associated with rate of cognitive decline (12). This surprising paucity of consistent predictors of cognitive decline may be partially explained by survival bias.

As expected, cumulative mortality over follow-up substantially influenced the extent of bias. Cumulative mortality in studies of cognitive decline and dementia varies substantially across studies, depending in part on the age of study participants and the length of follow-up. For example, in the Atherosclerosis Risk in Communities Study cohort, 15% of participants (baseline age 48–70 years) died over 14 years of follow-up (13); in the Sacramento Area Latino Study on Aging, 23% of participants (baseline age ≥60 years) died over 10 years of follow-up (14); and in the Chicago Health and Aging Project, 54% of participants (baseline age 65–109 years) died over 12 years of follow-up (9).
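To compare cohorts with different lengths of follow-up, the cumulative mortality figures above can be converted to an average annual hazard. The sketch below assumes a constant hazard over follow-up, which is a simplification (mortality rises with age), and the function name `constant_annual_hazard` is hypothetical.

```python
import math

def constant_annual_hazard(cumulative_mortality, years):
    """Constant hazard per year implied by a cumulative mortality over `years`.

    Solves 1 - exp(-hazard * years) = cumulative_mortality for the hazard.
    """
    return -math.log(1.0 - cumulative_mortality) / years

# Cumulative mortality and follow-up length as quoted in the text
cohorts = {
    "Atherosclerosis Risk in Communities Study": (0.15, 14),
    "Sacramento Area Latino Study on Aging": (0.23, 10),
    "Chicago Health and Aging Project": (0.54, 12),
}
for name, (proportion_died, years) in cohorts.items():
    hazard = constant_annual_hazard(proportion_died, years)
    print(f"{name}: {hazard:.4f} deaths per person-year")
```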

Survival bias is often conceptualized as a type of collider-stratification bias (5, 8). Hernán et al. noted that when the exposure and a determinant of the outcome influence the collider, collider-stratification bias does not occur if the data follow a multiplicative survival model such that the 2 causes of the collider have perfectly multiplicative effects on the probability of remaining alive (5, 8). In other words, multiplicative effects imply that the conditional probability of survival given E and U is equal to a product of functions of e and u—that is, Pr(S=1|E=e,U=u)=g(e)h(u), where g(·) and h(·) represent arbitrary functions (see Appendix A.3 in Hernán et al. (5)) (8). Consistent with this, we observed substantial bias when exposure and U interacted on the hazard ratio scale in determining mortality (scenario B2) and much smaller bias when the hazard function for mortality was a multiplicative function of the exposure and U (scenario B1). This is consistent with findings from previous simulation studies, where the magnitude of collider-stratification bias was small under causal structures similar to scenario B1 unless the effects of U on mortality were very large (7, 15, 16). None of our scenarios assessed the situation in which exposure and U were perfectly multiplicative for probability of remaining alive, the situation in which no survival bias would occur, although scenario B1 (exposure and U are multiplicative for the hazard of death) approximates this situation if mortality is rare. We set up the simulation to specify covariate effects on the hazard of mortality, rather than on the cumulative probability of remaining alive at the end of follow-up as implied by the formulation of Hernán et al. (5), because the hazard-of-mortality formulation is more typically encountered in epidemiologic research.
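The reason a multiplicative survival model induces no exposure-U association among survivors can be sketched in one step. The notation f(u) for the marginal density of U is an assumption introduced here; the argument uses only the independence of E and U at baseline and the factorization Pr(S=1|E=e,U=u)=g(e)h(u).

```latex
\begin{aligned}
f(u \mid E=e, S=1)
  &= \frac{\Pr(S=1 \mid E=e, U=u)\, f(u \mid E=e)}
          {\int \Pr(S=1 \mid E=e, U=u')\, f(u' \mid E=e)\, du'} \\
  &= \frac{g(e)\, h(u)\, f(u)}{g(e) \int h(u')\, f(u')\, du'}
   = \frac{h(u)\, f(u)}{\int h(u')\, f(u')\, du'} .
\end{aligned}
```

The final expression does not depend on e, so the distribution of U among survivors is the same for exposed and unexposed persons. When the factorization fails, g(e) no longer cancels and the distribution of U among survivors can differ by exposure.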

We also observed substantially biased estimation of the exposure's effect on rate of cognitive decline when the exposure and either 1) rate of cognitive change or 2) rate of cognitive change and level of cognitive function influenced mortality (scenarios C1 and C2). While rate of cognitive decline itself may not directly influence mortality per se, rate of cognitive decline can be conceptualized as a surrogate for progression of underlying brain disease. The link between cognitive decline and impending mortality is consistent with the terminal decline and terminal drop hypotheses (17, 18). Empirical studies are needed to better understand whether level of cognitive function or rate of cognitive change is more predictive of mortality among older adults, as this is important for understanding potential degree of survival bias in research on determinants of cognitive decline.

Although some of the parameters that determine the magnitude of survival bias are easily observed in a cohort study (e.g., cumulative mortality), others are more speculative. Our study highlights the importance of deriving plausible estimates for these parameters, including nonmultiplicative influences on mortality, effects of cognitive decline on mortality, and the plausible magnitude of various rarely measured confounders of rate of cognitive decline and mortality. We selected archetype values that represented large associations between variables to assess survival bias when the associations were at the upper bound of what we considered likely in real data, but researchers studying a specific exposure of interest will need to apply input values that they deem plausible for the substantive question.

We made simplifying assumptions in order to maintain transparency and constrain the number of input parameters. It is our goal that our simulation platform will be modified to study extensions of the present work and improve understanding of selective survival in cognitive aging research. For example, extensions could incorporate loss to follow-up as an additional source of attrition, allow nonnull effects of exposure on rate of cognitive change, or introduce left-censoring such that some mortality occurs between exposure and baseline cognitive assessment. These challenges are relevant for understanding a host of empirical findings in neuroepidemiology and aging research.

The conceptualization of survival bias has been controversial because counterfactual values of the outcome are not defined for the dead (9, 19, 20). Various conceptual approaches have been proposed in response, including a principal stratification approach defining the target of inference as the survivor average causal effect—that is, the effect of exposure on outcomes of persons who would survive under either exposure regime (21). The simulation platform can be used to evaluate performance of an analytical approach for any counterfactual contrast of interest, including the survivor average causal effect.
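In potential-outcomes notation, the survivor average causal effect described above can be written as follows. The symbols Y(e) (the cognitive outcome under exposure level e) and S(e) (the survival indicator under exposure level e) are notation assumed here for illustration rather than taken from the text.

```latex
\mathrm{SACE} = E\left[\, Y(1) - Y(0) \mid S(1) = S(0) = 1 \,\right]
```

The conditioning event restricts the contrast to persons who would be alive under both exposure levels, so both potential outcomes are well defined for everyone in the target group.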

The field of dementia research has recognized the potential importance of selective survival and the causal structures that can theoretically give rise to survival bias. It is time for the field to move towards quantifying the potential magnitude of bias (22). We provide an accessible and flexible tool for examining biases in research on determinants of cognitive decline. This simulation platform can be modified to examine the performance of various methods for addressing survival bias according to study characteristics such as the interval between visits or frequency of dropout before death (21, 23, 24). The platform can also be used to quantify the potential magnitude of bias arising from other methodological challenges in cognitive aging research, including time-varying exposures and confounders (25) and unequal interval scaling in measurement of cognitive function (26).

Supplementary Material

Web Material

ACKNOWLEDGMENTS

Author affiliations: Department of Epidemiology and Biostatistics, School of Medicine, University of California, San Francisco, San Francisco, California (Elizabeth Rose Mayeda, Eric Vittinghoff, M. Maria Glymour); Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts (Eric J. Tchetgen Tchetgen); Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts (Eric J. Tchetgen Tchetgen, Jessica R. Marden); Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland (Melinda C. Power); Department of Epidemiology and Biostatistics, Milken Institute School of Public Health, George Washington University, Washington, DC (Melinda C. Power); Department of Internal Medicine, Rush Institute for Healthy Aging, Chicago, Illinois (Jennifer Weuve); Department of Epidemiology, School of Public Health, Boston University, Boston, Massachusetts (Jennifer Weuve); Institut National de la Santé et de la Recherche Médicale, U1219, Institut de Santé Publique, d’Épidémiologie et de Développement, Université de Bordeaux, Bordeaux, France (Hélène Jacqmin-Gadda); Department of Social and Behavioral Sciences, Harvard School of Public Health, Boston, Massachusetts (Jessica R. Marden); and Section of Biostatistics, Department of Public Health, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark (Niels Keiding).

This work was supported by the American Heart Association (grant 15POST25090083 (E.R.M.)), the National Institute of Allergy and Infectious Diseases (grants R21 AI113251 (E.J.T.T.) and R01 AI104459 (E.J.T.T. and J.R.M.)), the National Institute on Aging (grant T32 AG027668 (M.C.P.)), and the National Institute of Environmental Health Sciences (grants R21 ES020404 and R21 ES24700 (J.W.)).

We thank our collaborators from the Methods for Longitudinal Research on Dementia (MELODEM) Initiative for insightful comments on this project. For further information about the MELODEM Initiative, please visit melodem.org.

Portions of this work were presented at the 47th Annual Meeting of the Society for Epidemiologic Research in Seattle, Washington, June 24–27, 2014.

Conflict of interest: none declared.

Appendix Table 1.

Definitions of regression parameters used to generate models for cognitive function ($C_{ij}$) and hazard of death ($h(t_{ij} \mid x)$)

Regression Parameter | Definition of Parameter
Data-Generating Model for Cognitive Function^a: $C_{ij} = \beta_{00} + \beta_{01}\,\mathrm{exposure}_i + \beta_{02}\,\mathrm{baseline\_age}_i + \beta_{03} U_i + (\beta_{10} + \beta_{11}\,\mathrm{exposure}_i + \beta_{12}\,\mathrm{baseline\_age}_i + \beta_{13} U_i)\,\mathrm{time}_{ij} + \zeta_{0i} + \zeta_{1i}\,\mathrm{time}_{ij} + \epsilon_{ij}$
$\beta_{00}$ | Group mean cognitive intercept (baseline level of cognitive function) for the unexposed
$\beta_{10}$ | Group mean cognitive slope (annual rate of cognitive change) for the unexposed
$\beta_{01}$ | Effect of exposure on level of cognitive function at baseline
$\beta_{02}$ | Effect of a 1-year change in baseline age (in years, centered at age 60) on level of cognitive function at baseline
$\beta_{03}$ | Effect of a 1-unit change in U on level of cognitive function at baseline
$\beta_{11}$ | Effect of exposure on annual rate of cognitive change
$\beta_{12}$ | Effect of a 1-year change in baseline age on annual rate of cognitive change
$\beta_{13}$ | Effect of a 1-unit change in U on annual rate of cognitive change
$\zeta_{0i}$ | Individual i's deviation from the group mean intercept
$\zeta_{1i}$ | Individual i's deviation from the group mean slope
Data-Generating Model for Hazard of Death^b: $h(t_{ij} \mid x) = \lambda \exp(\gamma_1\,\mathrm{exposure}_i + \gamma_2\,\mathrm{age}_{ij} + \gamma_3 U_i + \gamma_4\,\mathrm{exposure}_i \times U_i + \gamma_5\,\mathrm{slope}_{ij} + \gamma_6 C_{ij})$
$\gamma_1$ | Effect of the exposure on the log hazard rate of mortality
$\gamma_2$ | Effect of a 1-year change in age on the log hazard rate of mortality
$\gamma_3$ | Effect of a 1-unit change in U on the log hazard rate of mortality
$\gamma_4$ | Additional effect of a 1-unit change in U on the log hazard rate of mortality among people who are exposed
$\gamma_5$ | Effect of a 1-unit change in rate of cognitive change on the log hazard rate of mortality
$\gamma_6$ | Effect of a 1-unit change in level of cognitive function on the log hazard rate of mortality

a Text equation 1.

b Text equation 4.
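The two data-generating models in Appendix Table 1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: the hazard is treated as constant within each interval between waves with covariates evaluated at the start of the interval, the random intercept and slope are drawn independently, the person-specific slope stands in for slope_ij, age is centered at 60 in both models, and the function name `simulate_cohort`, the dictionary keys, the baseline age range, and the default wave spacing are all hypothetical.

```python
import numpy as np

def simulate_cohort(n, beta, gamma, lam, n_waves=6, interval=2.0, seed=0):
    """Simulate cognition at each wave and survival to each wave."""
    rng = np.random.default_rng(seed)
    exposure = rng.binomial(1, 0.5, size=n)
    age0 = rng.uniform(0.0, 20.0, size=n)  # baseline age minus 60 (assumed range)
    u = rng.normal(size=n)
    zeta0 = rng.normal(scale=beta["sd_intercept"], size=n)  # random intercepts
    zeta1 = rng.normal(scale=beta["sd_slope"], size=n)      # random slopes

    # Person-specific intercept and slope from the cognitive function model
    intercept = (beta["b00"] + beta["b01"] * exposure + beta["b02"] * age0
                 + beta["b03"] * u + zeta0)
    slope = (beta["b10"] + beta["b11"] * exposure + beta["b12"] * age0
             + beta["b13"] * u + zeta1)

    times = interval * np.arange(n_waves)  # years since baseline at each wave
    eps = rng.normal(scale=beta["sd_error"], size=(n, n_waves))
    cognition = intercept[:, None] + slope[:, None] * times[None, :] + eps

    # Survival: piecewise-constant hazard over each interval between waves
    alive = np.ones((n, n_waves), dtype=bool)
    for j in range(n_waves - 1):
        log_hr = (gamma["g1"] * exposure + gamma["g2"] * (age0 + times[j])
                  + gamma["g3"] * u + gamma["g4"] * exposure * u
                  + gamma["g5"] * slope + gamma["g6"] * cognition[:, j])
        hazard = lam * np.exp(log_hr)
        # Death in the interval if an exponential waiting time falls inside it
        died = rng.exponential(1.0 / hazard) < interval
        alive[:, j + 1] = alive[:, j] & ~died

    # Cognition is unobserved (NaN) at waves after death
    observed = np.where(alive, cognition, np.nan)
    return {"exposure": exposure, "u": u, "times": times,
            "alive": alive, "cognition": observed}
```

Setting `beta["b11"]` to 0 corresponds to the null effect of exposure on rate of cognitive change examined in the paper; nonzero `gamma["g4"]`, `gamma["g5"]`, or `gamma["g6"]` switch on the interaction and cognition-dependent mortality terms.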

REFERENCES

1. Lavery LL, Dodge HH, Snitz B et al. Cognitive decline and mortality in a community-based cohort: the Monongahela Valley Independent Elders Survey. J Am Geriatr Soc. 2009;57(1):94–100.
2. Yaffe K, Lindquist K, Vittinghoff E et al. The effect of maintaining cognition on risk of disability and death. J Am Geriatr Soc. 2010;58(5):889–894.
3. Thorvaldsson V, Hofer SM, Berg S et al. Onset of terminal decline in cognitive abilities in individuals without dementia. Neurology. 2008;71(12):882–887.
4. Wilson RS, Beckett LA, Bienias JL et al. Terminal decline in cognitive function. Neurology. 2003;60(11):1782–1787.
5. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15(5):615–625.
6. Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med. 2005;24(11):1713–1723.
7. Glymour MM, Vittinghoff E. Selection bias as an explanation for the obesity paradox: just because it's possible doesn't mean it's plausible. Epidemiology. 2014;25(1):4–6.
8. Hernán MA, Robins JM. Causal Inference. Boca Raton, FL: CRC Press; 2016.
9. Weuve J, Tchetgen Tchetgen EJ, Glymour MM et al. Accounting for bias due to selective attrition: the example of smoking and cognitive decline. Epidemiology. 2012;23(1):119–128.
10. Corder EH, Saunders AM, Strittmatter WJ et al. Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer's disease in late onset families. Science. 1993;261(5123):921–923.
11. Vivot A, Glymour MM, Tzourio C et al. Association of Alzheimer's related genotypes with cognitive decline in multiple domains: results from the Three-City Dijon study. Mol Psychiatry. 2015;20(10):1173–1178.
12. Plassman BL, Williams JW Jr, Burke JR et al. Systematic review: factors associated with risk for and possible prevention of cognitive decline in later life. Ann Intern Med. 2010;153(3):182–193.
13. Mayeda ER, Haan MN, Neuhaus J et al. Type 2 diabetes and cognitive decline over 14 years in middle-aged African Americans and whites: the ARIC Brain MRI Study. Neuroepidemiology. 2014;43(3-4):220–227.
14. Mayeda ER, Haan MN, Kanaya AM et al. Type 2 diabetes and 10-year risk of dementia and cognitive impairment among older Mexican Americans. Diabetes Care. 2013;36(9):2600–2606.
15. Greenland S. Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology. 2003;14(3):300–306.
16. Liu W, Brookhart MA, Schneeweiss S et al. Implications of M bias in epidemiologic studies: a simulation study. Am J Epidemiol. 2012;176(10):938–948.
17. MacDonald SW, Hultsch DF, Dixon RA. Aging and the shape of cognitive change before death: terminal decline or terminal drop? J Gerontol B Psychol Sci Soc Sci. 2011;66(3):292–301.
18. Sliwinski MJ, Stawski RS, Hall CB et al. Distinguishing preterminal and terminal cognitive decline. Eur Psychol. 2006;11(3):172–181.
19. Chaix B, Evans D, Merlo J et al. Weighing up the dead and missing: reflections on inverse-probability weighting and principal stratification to address truncation by death. Epidemiology. 2012;23(1):129–131.
20. Tchetgen EJT, Glymour MM, Shpitser I et al. To weight or not to weight? On the relation between inverse-probability weighting and principal stratification for truncation by death. Epidemiology. 2012;23(1):132–137.
21. Tchetgen Tchetgen EJ. Identification and estimation of survivor average causal effects. Stat Med. 2014;33(21):3601–3628.
22. Weuve J, Proust-Lima C, Power MC et al. Guidelines for reporting methodological challenges and evaluating potential bias in dementia research. Alzheimers Dement. 2015;11(9):1098–1109.
23. Kurland BF, Johnson LL, Egleston BL et al. Longitudinal data with follow-up truncated by death: match the analysis method to research aims. Stat Sci. 2009;24(2):211.
24. Varadhan R, Xue Q-L, Bandeen-Roche K. Semicompeting risks in aging research: methods, issues and needs. Lifetime Data Anal. 2014;20(4):538–562.
25. Power MC, Weuve J, Sharrett AR et al. Statins, cognition, and dementia—systematic review and methodological commentary. Nat Rev Neurol. 2015;11(4):220–229.
26. Proust-Lima C, Dartigues JF, Jacqmin-Gadda H. Misuse of the linear mixed model when evaluating risk factors of cognitive decline. Am J Epidemiol. 2011;174(9):1077–1088.


Articles from American Journal of Epidemiology are provided here courtesy of Oxford University Press
