Abstract
Most longitudinal surveys construct and release wave-specific weights to adjust for attrition. However, there is no clear consensus in the literature on whether or how to apply weights in longitudinal trajectory modeling. We present a simulation study, motivated by a real-life longitudinal study of substance use, and consider different missing data mechanisms, weight construction processes, and specifications of substantive models of interest. Based on the results of the simulation study, we provide practical recommendations for analysts of longitudinal survey data with respect to weighting approaches that should be considered in alternative scenarios.
Keywords: survey weighting, attrition, simulation, trajectory modeling
INTRODUCTION
Longitudinal (or panel) surveys are used to follow participants over time, enabling trajectory modeling. However, some panel participants may drop out, resulting in panel attrition. This attrition can be monotone when the participants dropping out are lost to follow-up and never return; if retention efforts bring them back after they miss a few waves, then the attrition pattern will be intermittent. With either monotone or intermittent attrition, the concern is whether study respondents could be systematically different from attriters in terms of distributions for variables of interest, causing potential bias in estimates of trajectories. With declining response rates in surveys (Brick and Williams, 2013; de Leeuw, Hox, and Luiten, 2018), panel attrition has increasingly become a critical issue, often reducing survey quality.
Si et al. (2024) compare different approaches to nonresponse bias analysis (NRBA) in the case of monotone drop-out in longitudinal studies. Weighting adjustments are effective when the constructed weights are correlated with the survey variable of interest. Multiple imputation allows for incomplete variables to be included in the imputation model, yielding potentially less biased and more efficient estimates when the variables are predictive of the survey outcome. Multilevel models fitted using maximum likelihood (ML) estimation and marginal models estimated using generalized estimating equations (GEE) can also handle incomplete longitudinal data. Bayesian methods introduce prior information and potentially stabilize model estimation. Among the NRBA methods, weighting is commonly used in practice as it is agnostic about the survey variables of interest, whereas model-based approaches are tailored to the particular outcome variable of interest. Different weighting methods for longitudinal data might use the complete cases (CCs) that respond in all waves (including those who do not answer all questions) or the available cases (ACs) that respond in any wave. Given the focus of Si et al. (2024) on these various approaches in the case of a monotone attrition pattern, the present study focuses on the case of intermittent attrition.
To adjust for attrition, most longitudinal surveys construct and release wave-specific weights. However, there is no clear consensus in the literature on whether or how to apply weighting adjustments for attrition in longitudinal trajectory modeling. The role of weighting adjustments in regression modeling in general has been a long-debated topic (see Bollen et al., 2016, for a review; see also Pfeffermann, 1993, 2011). The effects of weighting adjustments depend on the missing data mechanism that leads to attrition, the method used to construct the adjusted weights, and the specifications of models for the variables of interest, among other factors. Many existing methods for longitudinal surveys are based on weight adjustment techniques for cases that have provided complete data over time (Heeringa et al., 2017; Lynn and Watson, 2021; Veiga et al., 2014; Vieira and Skinner, 2008), and these methods do not provide guidance on whether or how to incorporate cases with incomplete data or corresponding adjusted weights that may vary over time. For example, Schmidt and Woll (2017) evaluated different approaches for adjusting baseline sampling weights in a longitudinal study, finding that the adjusted weights were able to reduce bias due to attrition in selected descriptive estimates. However, this study included only one follow-up wave and did not consider estimation of trajectories or how to incorporate multiple sets of weights that may be available at each wave of a longitudinal study.
Related studies have considered the benefits of alternative semiparametric and non-parametric approaches for estimating response propensities and computing the corresponding adjustments to baseline sampling weights. Da Silva and Skinner (2011) considered a semiparametric regression modeling approach for modeling response propensity and presented empirical evidence (via simulation) that such an approach may result in more robust nonresponse adjustments under specific types of response mechanisms than standard logistic regression. Unfortunately, this paper also did not discuss whether or how to use wave-specific adjusted weights when estimating trajectory models. Earp et al. (2018) used regression tree methodology to understand predictors of nonresponse at the different phases of a longitudinal establishment survey, identifying establishment-level characteristics that could be the focus of future design strategies, but did not discuss how adjusted weights that might be computed using regression trees should be applied when fitting trajectory models. While these alternative methods of estimating response propensity show promise, no studies to our knowledge have evaluated the effectiveness of these adjustments in the context of trajectory modeling.
Prior studies in the statistics literature have looked at adjustment approaches for volunteer / convenience samples without complex sampling features (e.g., Ibrahim and Molenberghs, 2009); we are focused on how to incorporate time-varying adjusted weights arising in the setting of complex probability samples measured over time when fitting trajectory models. In this setting, a combination of weighted maximum likelihood estimation and accounting for cluster sampling in variance estimation has been shown to work well in simulation studies, but this prior work has once again assumed that only cases with complete data at each wave are being analyzed (Vieira and Skinner, 2008). Si et al. (2022) conducted an empirical evaluation of alternative approaches to adjusting for attrition when analyzing longitudinal survey data and found that analyzing all available observations in each wave, while simultaneously accounting for 1) the correlations among repeated observations, 2) variables affecting sample selection, and 3) attrition, was the most effective approach. These adjustment effects tend to be more pronounced for wave-specific descriptive estimates but are generally modest in covariate-adjusted trajectory modeling. However, these empirical findings need validation since the true trajectories in the Si et al. (2022) study were unknown.
We are unaware of any comprehensive simulation studies that have provided a rigorous evaluation of the performance of alternative weighting approaches to adjusting for attrition when fitting trajectory models to longitudinal survey data, considering different missing data mechanisms and substantive models of interest. To this end, we aim to perform such a simulation study and provide practical recommendations for analysts of longitudinal survey data with respect to weighting approaches that should be considered in alternative scenarios. In particular, we focus on whether and how time-varying, wave-specific weights that have been computed by a data producer to adjust for attrition (possibly using any one of a number of methods, e.g., regression trees followed by calibration at each time point) should be employed when fitting trajectory models to correct for potential bias in the estimates of the parameters defining those models.
ADJUSTMENT METHODS
Our methodological examination, which is primarily concerned with whether (and how) weights that have already been constructed should be used when fitting trajectory models to longitudinal survey data with intermittent attrition patterns, focuses on four aspects:
The attrition mechanism, which may be missing at random (MAR) or missing not at random (MNAR);
Whether the trajectory model for the survey outcome is correctly specified (yes or no);
Whether the computation of the wave-specific adjusted weights correctly accounts for the response mechanism (yes or no); and
The correlation between repeated measures, which may be weak, moderate, or strong.
Cross-classification of the levels of the above four factors results in 24 possible scenarios. We consider four analytic approaches for each of these 24 scenarios in our simulation study, including the following:
Unweighted complete-case analyses assuming a missing completely at random (MCAR) mechanism (denoted by CC), where we fit a logistic regression model to a repeatedly measured binary outcome in a data set defined by the subjects with complete data across all waves only, ignoring the weights entirely in the maximum likelihood estimation, and we include a subject identifier variable as a clustering variable to account for the correlation between repeated measures of the same subject, using the function svyglm in the R survey package (Lumley, Gao and Schneider, 2024);
Weighted complete-case analyses with one set of nonresponse-adjusted weights assuming an MAR mechanism (denoted by CCW; e.g., Schmidt and Woll, 2017), where we fit a weighted logistic regression model, using svyglm, to the same type of complete-case data set as in approach 1, accounting for the correlation between repeated measures of the same subject, and we use a single nonresponse-adjusted weight for each subject’s set of complete data in pseudo-maximum likelihood estimation (i.e., the weight takes a constant value for all observations from a subject, reflecting the inverse of the predicted probability of responding in all waves);
Unweighted available-case analyses (denoted by AC), which a) leverage all available observations from each case providing at least one measurement, unlike the CC and CCW approaches, b) assume MCAR, c) ignore weights entirely, and d) use the same type of maximum likelihood estimation approach (accounting for the correlation of the repeated measures) described for CC; and
Weighted available-case analyses (denoted by ACW), using all available observations with wave-specific nonresponse-adjusted weights incorporated in the pseudo-maximum likelihood estimation (as implemented in svyglm), assuming an MAR mechanism.
We note that these four approaches (CC, CCW, AC, and ACW) assume an ignorable missing data mechanism. Completely MCAR mechanisms are certainly unlikely in the longitudinal survey context, but any analysts performing unweighted CC analyses are making this assumption about the missing data mechanism, so we include CC as an alternative approach.
We reiterate that we include a subject identifier variable as a clustering variable in the estimation and assume a constant correlation of the repeated observations within the same subject. This approach, similar in spirit to generalized estimating equations (GEE) with an exchangeable working correlation matrix, is made possible via the svyglm function in the R package survey, which uses a Taylor Series Linearization approach by default when computing variance estimates for the regression parameters, in both unweighted and weighted analyses. We introduce these constant correlations of repeated observations from the same subject via the inclusion of random subject effects in our data generating model, selecting three different values for the variance of the random effects to introduce three different marginal within-subject correlations of the underlying latent logistic random variable that gives rise to the simulated binary responses (see details below).
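To make the four approaches concrete, the following is a minimal sketch of how each could be set up with svydesign and svyglm. The data frame dat and all of its column names are hypothetical, not part of any released code, and the misspecified outcome model described later would simply replace the formula with y ~ x * time.

```r
library(survey)

# Hypothetical long-format data frame `dat`: one row per subject-wave, with
# columns id (subject), y (binary outcome), x, time, resp (1 = responded at
# that wave), cc (1 = responded at all waves), w_cc (single complete-case
# weight), and w_wave (wave-specific attrition-adjusted weight).
dat$one <- 1  # constant weight for the unweighted analyses

# CC: unweighted complete cases, with subject id as the clustering variable
des_cc <- svydesign(ids = ~id, weights = ~one, data = subset(dat, cc == 1))
fit_cc <- svyglm(y ~ x * (time + I(time^2)), design = des_cc,
                 family = quasibinomial())

# CCW: complete cases with one nonresponse-adjusted weight per subject
des_ccw <- svydesign(ids = ~id, weights = ~w_cc, data = subset(dat, cc == 1))
fit_ccw <- svyglm(y ~ x * (time + I(time^2)), design = des_ccw,
                  family = quasibinomial())

# AC: unweighted available cases (all observed subject-waves)
des_ac <- svydesign(ids = ~id, weights = ~one, data = subset(dat, resp == 1))
fit_ac <- svyglm(y ~ x * (time + I(time^2)), design = des_ac,
                 family = quasibinomial())

# ACW: available cases with wave-specific attrition-adjusted weights
des_acw <- svydesign(ids = ~id, weights = ~w_wave,
                     data = subset(dat, resp == 1))
fit_acw <- svyglm(y ~ x * (time + I(time^2)), design = des_acw,
                  family = quasibinomial())
```

The quasibinomial family is the idiomatic choice for svyglm with weighted binary outcomes, and the Taylor Series Linearization variance estimates automatically reflect the subject-level clustering specified in the design objects.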
Various parametric, semiparametric, and non-parametric methods can be used for estimating response propensities and subsequently constructing the adjusted weights described above for the CCW or ACW approaches. Here, we use conditional inference trees (implemented via the ctree function in the R package partykit) for illustrative purposes (Hothorn and Zeileis, 2023), and focus on whether (and how) weights need to be accounted for in trajectory modeling. We apply Taylor Series Linearization to compute variance estimates and correctly capture the sampling error introduced by the sampling of clusters (individuals) and the use of weights in estimation.
SIMULATION STUDY DESIGN
Our simulation study is motivated by the prior efforts of substance use researchers to model trajectories in the probability of binge drinking in the past two weeks among young adults, studying the developmental trajectory across multiple waves from ages 18 to 30, when binge drinking is most prevalent and represents a major public health concern (Schulenberg et al., 2021; Patrick et al., 2023; Center for Behavioral Health Statistics and Quality, 2023). We have therefore designed the simulation study to reflect real-life applications as closely as possible, drawing on empirical evidence of binge drinking prevalence from prior research using large-scale epidemiological panel data. These settings can easily be generalized to other models under consideration for longitudinal survey data.
We follow the general framework of finite population inference by simulating finite population data from a superpopulation model and conducting repeated sampling studies given the finite population to evaluate the performance of the alternative approaches. This general framework is fundamental to design-based and model-based inference for finite population quantities of interest based on regression modeling (Pfeffermann, 1993; Binder and Roberts, 2009; Rubin, 2019).
Data Generating Model
Our data generating model (DGM) is a binary logit regression model, where yit (=1/0) indicates binge drinking for subject i at wave t, and the subject-specific covariate Xi (=1/0), for example, indicates the intention to attend college for subject i at baseline, which has been shown to have a positive relationship with the probability of binge drinking (Bachman et al., 1997; Schulenberg et al., 2001; Schulenberg et al., 2021; Sher & Rutledge, 2007). While more subject-specific covariates could certainly be considered in other applications or data generating models, we focused on this single covariate to keep our empirical evaluation relatively straightforward.
To introduce a constant marginal correlation of the repeated observations within a subject, we introduce a random intercept that varies across subjects, and use the following logit model to generate a binary outcome yit that is measured repeatedly over time, where i = 1, …, n is an index for subjects in the longitudinal survey, and t = 1, …, 6 is an index for waves of the study:
logit{P(yit = 1 | Xi, bi)} = β0 + β1 Xi + β2 t + β3 t² + β4 (Xi × t) + β5 (Xi × t²) + bi        (1)
In (1), we note that the random intercept bi follows a normal distribution with a mean of 0 and a variance of σ². In our simulations, we select three values of σ² that introduce a constant marginal correlation (of a specific magnitude) among the simulated draws of the underlying latent logistic random variable within a given subject. Specifically, we select values of 0.17, 1.10, and 3.28, corresponding to marginal intra-subject correlations (ICC = σ² / (σ² + π²/3), where π²/3 is the variance of the standard logistic distribution) of 0.05, 0.25, and 0.5, again in terms of the underlying latent logistic variable.
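As a quick check on this mapping between the random-intercept variance and the latent-scale ICC, the following one-liner reproduces the three ICC values:

```r
# Latent-scale ICC implied by each random-intercept variance sigma^2,
# using the standard logistic variance pi^2 / 3
sigma2 <- c(0.17, 1.10, 3.28)
round(sigma2 / (sigma2 + pi^2 / 3), 2)   # 0.05 0.25 0.50
```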
In each scenario of our simulation studies, we first simulate a finite population of size 150,000 subjects (including their repeated measures at each of the six waves) based on the data generating model, with the true values of the six regression parameters β0, …, β5 fixed in advance to values reflecting the assumed real-world scenario described below. For each subject, we first simulate their value on the binary subject-specific covariate Xi (where the first half of the population is assigned a value of 0, and the second half is assigned a value of 1, independent of other simulated measures), and the time variable takes on values equal to the t index for each of the six waves / observations (1, 2, …, 6). Conditional on the randomly drawn value of the random effect bi (see the possible distributions above), Xi, and each fixed value of t, we then simulate the time-specific values of yit based on the model in (1). We accomplish this for each of the six observations for a given subject i by back-transforming the computed logistic linear predictor into a predicted probability, and referring a random draw from a UNIFORM(0,1) distribution to that predicted probability. If the random draw is less than the predicted probability, yit = 1; otherwise, yit = 0.
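A sketch of this population-generation step is below. The specific coefficient values shown are placeholders for illustration only (the true values used in the study are fixed as described in the surrounding text but are not reproduced here), and all object names are our own.

```r
set.seed(2024)
N <- 150000; T_waves <- 6
sigma2 <- 1.10   # random-intercept variance for a moderate latent ICC of 0.25
# Placeholder coefficients (beta0 ... beta5 in model (1)) for illustration only
beta <- c(b0 = -2.0, b1 = 0.5, b2 = 0.8, b3 = -0.12, b4 = 0.2, b5 = -0.05)

pop <- expand.grid(id = seq_len(N), time = seq_len(T_waves))
pop$x <- as.integer(pop$id > N / 2)       # first half x = 0, second half x = 1
u <- rnorm(N, 0, sqrt(sigma2))            # random subject intercepts bi
eta <- with(pop, beta["b0"] + beta["b1"] * x + beta["b2"] * time +
                 beta["b3"] * time^2 + beta["b4"] * x * time +
                 beta["b5"] * x * time^2 + u[id])
# Back-transform the logistic linear predictor and refer a uniform draw to it
pop$y <- as.integer(runif(nrow(pop)) < plogis(eta))
```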
We then fit a logistic regression model with an exchangeable correlation structure to the simulated finite population data using GEE to obtain the true substantive model parameters in (1) for a given finite population. The values of these parameters reflect an assumed real-world scenario for binge drinking prevalence, where the probability of binge drinking in the aforementioned age range will initially rise and then briefly decline, with a steeper decline for those with the intention of attending college (Si et al., 2022). We assume that primary scientific interest lies in the estimation of β4, which captures the change in the initial rate of increase in the log-odds of the outcome being equal to 1 when the binary subject-specific covariate Xi is equal to 1 (as opposed to 0). Next, we select a random sample of 1,000 subjects from the finite population (including their six repeated observations) and consider one of the different attrition mechanisms discussed below (depending on the simulation scenario). We repeat this process 500 times for each scenario.
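A sketch of these two steps with geepack, continuing the hypothetical objects from the previous block, might look as follows:

```r
library(geepack)

# "True" substantive parameters for this finite population: GEE fit with an
# exchangeable working correlation (rows must be grouped by subject)
pop <- pop[order(pop$id, pop$time), ]
truth <- coef(geeglm(y ~ x * (time + I(time^2)), id = id, data = pop,
                     family = binomial, corstr = "exchangeable"))

# One of the 500 repeated-sampling draws: 1,000 subjects and their six waves
samp <- pop[pop$id %in% sample(unique(pop$id), 1000), ]
```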
As part of our evaluation methodology, we also consider misspecification of the substantive model of interest, and the ability of the alternative methods of adjusting for attrition to estimate the parameters that one would see when fitting the misspecified model to a given simulated finite population. The misspecified model assumes that the log-odds (or logit) has a linear relationship with time and omits the two terms involving the squared term of time (i.e., a naïve analyst constrains the coefficients β3 and β5 to be zero by omitting the predictors involving time-squared from the analysis model). We also compute the estimated coefficients (obtained using GEE) in this misspecified model for each simulated population in our scenarios involving model misspecification, and evaluate the ability of each approach to recover these "true" parameters in the misspecified model.
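The population-level targets for the misspecification scenarios can be obtained the same way from the reduced model; a minimal sketch:

```r
# Population-level targets under the misspecified (linear-in-time) outcome
# model: the time-squared terms are omitted, constraining beta3 and beta5 to 0
truth_mis <- coef(geeglm(y ~ x * time, id = id, data = pop,
                         family = binomial, corstr = "exchangeable"))
```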
Attrition Mechanisms
We consider MCAR, MAR, and MNAR attrition mechanisms. Given a full simulated sample of generated data based on (1), we next consider the following model that governs the probability of attrition at each wave (where cases may drop out in some waves but respond in other waves):
logit{P(Rit = 1)} = α(R) + βX(R) Xi + βt(R) t + βy(R) yit + βA(R) Ai        (2)

where Rit (=1/0) indicates whether subject i responds at wave t.
We use R in the superscript to indicate coefficients in the model defining the response mechanism. Subject-specific values on the new predictor Ai in each sample are randomly drawn from a standard normal distribution, independently from the other measures. Given choices of the values for the regression parameters in (2) (details below), the new variable A influences response propensity but is independent of the survey outcome variable by definition. In the simulations, we fix values of the parameters in the substantive model (as "truth") and then toggle values of the parameters in the model for the response mechanism. The value of α(R) is selected to achieve a certain response rate when X, time, y, and A are equal to zero (e.g., 1.10 for a response rate of 0.75). Response indicators are defined by determining whether a uniform random draw from the (0,1) range falls below the predicted response propensity at a given time point for a given case.
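A sketch of this response-indicator generation step for the sampled data samp is below. All slope values shown are hypothetical placeholders (the study fixes them as described later in this section); the intercept 1.10 is from the text, since plogis(1.10) is approximately 0.75.

```r
# Subject-level A ~ N(0, 1), constant across a subject's waves
a_subj <- rnorm(length(unique(samp$id)))
samp$a <- a_subj[match(samp$id, unique(samp$id))]

alpha_r <- 1.10                                   # response rate of 0.75 at zero
beta_xr <- 0.3; beta_tr <- -0.1; beta_ar <- 0.4   # hypothetical values
beta_yr <- 0                                      # set to a nonzero value for MNAR

eta_r <- with(samp, alpha_r + beta_xr * x + beta_tr * time +
                    beta_yr * y + beta_ar * a)
# Uniform draw referred to the predicted response propensity at each wave
samp$resp <- as.integer(runif(nrow(samp)) < plogis(eta_r))
```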
If βy(R), which captures the relationship of the binary outcome variable of interest with the log-odds of responding at a given time t, is not equal to zero in (2), then the missing data mechanism is MNAR, or non-ignorable. If this coefficient is equal to zero, and any of the other coefficients in the model are non-zero, then the missing data mechanism is MAR, or ignorable. If all the coefficients are equal to zero (where the intercept is allowed to be non-zero), then the missing data mechanism would be MCAR (and also ignorable).
In all simulations, the expected follow-up wave response rate is 0.75, and the coefficients βX(R), βt(R), and βA(R) in the model for the response mechanism are fixed at non-zero values, with βX(R) positive and βt(R) negative. The positive association of intention to attend college (as a proxy of educational attainment) with the probability of continuing to respond is supported by multiple prior studies (e.g., McCabe & West, 2016), and the negative time coefficient captures a general pattern of attrition over time consistent with findings based on the Monitoring the Future study (Si et al., 2022).
Adjustment Methods and Evaluation
To develop weights for the CCW approach, we construct a classification tree (using the aforementioned ctree function in the partykit package for R) predicting an indicator of complete response at all waves as a function of X and A, and use the inverse of the predicted response probability for each complete case as the weight. For the ACW approach, we construct six trees to predict the probability of response at each of the six follow-up waves, respectively, and obtain six sets of attrition-adjusted weights by inverting the predicted probabilities of response at each wave. We do not examine joint response modeling across the six waves, as the number of response patterns could be large, with small sample sizes for some patterns, making joint modeling difficult. We use the default control parameters within the ctree function to build the conditional inference trees. We consider both a correct specification of the response mechanism (correct R), including both X and A as predictors in the trees, and an incorrect specification (incorrect R) that omits X from the prediction. A sketch of this weight construction follows.
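The sketch below assumes a subject-level frame subj (one row per sampled subject, with columns x, a, and a complete-response indicator cc) in addition to the long-format samp from above; both frames and their column names are hypothetical.

```r
library(partykit)

# CCW: one conditional inference tree predicting complete response at all
# waves; the "incorrect R" specification would drop x from the formula
subj$cc_f <- factor(subj$cc)
tree_cc <- ctree(cc_f ~ x + a, data = subj)
subj$w_cc <- 1 / predict(tree_cc, newdata = subj, type = "prob")[, "1"]

# ACW: six trees, one per wave, each predicting response at that wave
samp$w_wave <- NA_real_
for (t in 1:6) {
  wave <- subset(samp, time == t)
  wave$resp_f <- factor(wave$resp)
  tree_t <- ctree(resp_f ~ x + a, data = wave)
  p_t <- predict(tree_t, newdata = wave, type = "prob")[, "1"]
  samp$w_wave[samp$time == t] <- 1 / p_t   # invert predicted response probability
}
```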
We note that adjusted time-varying survey weights computed in this fashion for longitudinal surveys often also incorporate calibration adjustments, given external population information available at each wave of data collection (Lynn and Watson, 2021). We did not consider these additional adjustments in our approaches. Given the many possible choices for performing these additional calibration adjustments (Lynn and Watson, 2021), we also wanted to keep our potential simulation scenarios relatively straightforward and limited in count.
We cross the four analytic approaches defined above with the aforementioned 24 possible scenarios, defined by all cross-classifications of the levels of the following four factors (a code sketch of this design grid follows the list):
MNAR vs. MAR: βy(R) ≠ 0 or βy(R) = 0;
correct or incorrect trees for nonresponse adjustment (do the trees include both A and X as predictors of attrition, or just A?);
correct or incorrect specification of the substantive model for the outcome Y (i.e., either including or omitting the t² and Xi × t² terms, whose coefficients are β3 and β5). Technically, under non-ignorable mechanisms (βy(R) ≠ 0), model misspecification will also occur when only using X and A to model the attrition indicators; and
low, moderate, or high correlations between repeated measures of the latent logistic variable within the same subject, achieved by varying the ICC values (0.05, 0.25, and 0.5) in the DGM.
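For concreteness, the 24 scenarios can be laid out as a simple design grid (object and level names are our own):

```r
# The 24 simulation scenarios as a design grid
scenarios <- expand.grid(
  mechanism  = c("MAR", "MNAR"),           # beta_yr = 0 vs. beta_yr != 0
  resp_model = c("correct", "incorrect"),  # trees using x + a vs. a only
  out_model  = c("correct", "incorrect"),  # with vs. without time-squared terms
  icc        = c(0.05, 0.25, 0.50)         # latent within-subject correlation
)
nrow(scenarios)  # 24
```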
For each scenario, we will evaluate the empirical bias, empirical root mean squared error (RMSE), and 95% confidence interval coverage for point estimates of each of the six regression parameters in the substantive model in (1) (or some reduced form of it, depending on the misspecification defined by the scenario) under the four different analytic approaches, simulating 500 samples for each of the 24 alternative scenarios. We compute 95% confidence intervals for each of the regression parameters in a given simulated sample using a standard design-based approach, adding to (or subtracting from) each point estimate the linearized standard error of the estimate multiplied by a critical t-value with alpha = 0.05 and degrees of freedom defined by the number of subjects (clusters) minus 1. We then evaluate what fraction of these intervals covers the true population parameter across the repeated samples.
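As a sketch of this evaluation step, the following hypothetical helper computes the three criteria for one coefficient across the replicate samples:

```r
# `est` and `se` are vectors of point estimates and linearized standard errors
# across the 500 simulated samples, `truth` is the finite-population value,
# and n_subj is the number of sampled subjects (clusters)
evaluate <- function(est, se, truth, n_subj = 1000) {
  tcrit <- qt(0.975, df = n_subj - 1)                 # alpha = 0.05, df = n - 1
  covered <- (est - tcrit * se <= truth) & (truth <= est + tcrit * se)
  c(bias = mean(est) - truth,
    rmse = sqrt(mean((est - truth)^2)),
    coverage = mean(covered))
}
```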
We remind readers that our inferential target is the population coefficients for the misspecified model in scenarios where the substantive model has been misspecified. When fitting the misspecified model, we aim to see how well the alternative approaches do at approximating the same misspecified model that would be fitted to the entire population of data, if a particular attrition mechanism holds. For this reason, we do not compare the estimated coefficients in the misspecified model to those in the full data generating model, as the parameters represent different quantities and are not comparable.
We performed all computation in the R environment (R Core Team, 2024), as most analytic procedures can be implemented with R packages, e.g., survey, partykit, and geepack (Højsgaard, Halekoh and Yan, 2024). All code for the simulation studies is publicly available on GitHub (link TBD).
RESULTS
We present the main results from the simulation study in Figures 1–9. The nine figures present three evaluation quantities (bias, RMSE, and coverage) with three introduced within-subject correlation values (low, moderate, and high). Each figure contains eight plots that capture the eight scenarios when the attrition is generated under either MAR or MNAR, the survey outcome model is either correctly or incorrectly specified, and the model for the response indicator is either correctly or incorrectly specified.
Figure 1.
Empirical bias of model coefficients with a low intra-subject correlation value (0.05) in the data generation model, different survey outcome model specifications (correct and incorrect Y), different attrition mechanisms (MAR and MNAR), and different model specifications for the response indicator (correct and incorrect R). CC: complete case analysis; AC: available case analysis; CCW: attrition-adjusted weighted analysis with complete cases; and ACW: attrition-adjusted weighted analysis with available cases.
Figure 9.
95% confidence interval coverage rates of model coefficients with a high intra-subject correlation value (0.5) in the data generation model, different survey outcome model specifications (correct and incorrect Y), different attrition mechanisms (MAR and MNAR), and different model specifications for the response indicator (correct and incorrect R). CC: complete case analysis; AC: available case analysis; CCW: attrition-adjusted weighted analysis with complete cases; and ACW: attrition-adjusted weighted analysis with available cases.
Figures 1–3 present the empirical bias values for the estimated model coefficients when the DGM has introduced low, moderate, and high intra-subject correlations, respectively. Under MAR, all four methods give negligible bias values. The AC and ACW estimates have smaller bias than the CC and CCW estimates and are nearly unbiased for all coefficients, particularly the quantity of interest β4, or the additional time effect when Xi = 1. The AC and ACW estimates under MAR are nearly unbiased regardless of whether the survey outcome model is correctly or incorrectly specified, and whether the model for the response indicator is correctly or incorrectly specified. Under MNAR, the CC and CCW estimates generate large bias values for three coefficient estimates in the correct outcome models and two estimates in the misspecified outcome model. The bias values under MNAR increase as the correlations increase. The AC and ACW estimates under MNAR have negligible bias, except for the intercepts under the two outcome models. The increasing correlation has smaller effects on the AC and ACW estimates than on the CC and CCW estimates. Interestingly, neither the response model specification nor the inclusion of weights causes substantial differences in the coefficient estimates for either CC or AC. Overall, the AC and ACW approaches outperform the CC and CCW approaches with smaller bias values.
Figure 3.
Empirical bias of model coefficients with a high intra-subject correlation value (0.5) in the data generation model, different survey outcome model specifications (correct and incorrect Y), different attrition mechanisms (MAR and MNAR), and different model specifications for the response indicator (correct and incorrect R). CC: complete case analysis; AC: available case analysis; CCW: attrition-adjusted weighted analysis with complete cases; and ACW: attrition-adjusted weighted analysis with available cases.
Figures 4–6 present the RMSE values of the estimated model coefficients for the 24 scenarios, where all the AC and ACW estimates have smaller RMSE values than the CC and CCW estimates. Under MAR, since all four methods give negligible bias values, the AC and ACW approaches produce estimates with lower variance than the CC and CCW methods. Under MNAR, the inflation in the RMSE values of the CC and CCW estimates is due to both larger bias and larger variance compared to the AC and ACW estimates. Under MAR, including weights does not change the results, regardless of the different model specifications. Under MNAR, including weights slightly increases the RMSEs of two coefficient estimates when the outcome model is misspecified. The results are not affected by the different correlation values. In sum, the AC and ACW approaches outperform the CC and CCW approaches with smaller RMSE values and variances.
Figure 4.
Root mean squared error (RMSE) of model coefficients with a low intra-subject correlation value (0.05) in the data generation model, different survey outcome model specifications (correct and incorrect Y), different attrition mechanisms (MAR and MNAR), and different model specifications for the response indicator (correct and incorrect R). CC: complete case analysis; AC: available case analysis; CCW: attrition-adjusted weighted analysis with complete cases; and ACW: attrition-adjusted weighted analysis with available cases.
Figure 6.
Root mean squared error (RMSE) of model coefficients with a high intra-subject correlation value (0.5) in the data generation model, different survey outcome model specifications (correct and incorrect Y), different attrition mechanisms (MAR and MNAR), and different model specifications for the response indicator (correct and incorrect R). CC: complete case analysis; AC: available case analysis; CCW: attrition-adjusted weighted analysis with complete cases; and ACW: attrition-adjusted weighted analysis with available cases.
Figures 7–9 present the coverage rates of the 95% confidence intervals for the estimated model coefficients across the 24 scenarios. Under MAR with correct outcome models, all four approaches have similar coverage rates approaching 0.95. Under MAR with incorrect outcome models and lower within-subject correlation, the AC and ACW methods have slightly higher coverage rates than CC and CCW; however, this is not the case with higher correlations. When the outcome model is misspecified under MAR, the use of weights increases the coverage rates, with ACW having higher coverage than AC, regardless of the response model specification. Under MNAR with correct outcome models, the AC and ACW methods outperform CC and CCW with coverage rates higher than 0.9, although the intercept coverage rate is the lowest. Under MNAR, when the outcome model is misspecified, all approaches fail to reasonably cover the true values of the coefficients, regardless of the use of weights and the within-subject correlations. Therefore, none of the four approaches is valid under MNAR when the outcome model is misspecified. When the outcome model is misspecified under MAR, the ACW method is recommended.
Figure 7.
95% confidence interval coverage rates for model coefficients with a low intra-subject correlation value (0.05) in the data generation model, different survey outcome model specifications (correct and incorrect Y), different attrition mechanisms (MAR and MNAR), and different model specifications for the response indicator (correct and incorrect R). CC: complete case analysis; AC: available case analysis; CCW: attrition-adjusted weighted analysis with complete cases; and ACW: attrition-adjusted weighted analysis with available cases.
Overall, the performance of the four investigated methods depends on the attrition mechanism and the survey outcome model specification. When the outcome model is correctly specified under MAR, all four methods are valid, and the AC and ACW methods have lower RMSEs than CC and CCW. Under an MNAR attrition mechanism and a correct outcome model, all four methods still work well except for the intercept estimation. However, when the outcome model is misspecified, ACW outperforms the other methods with satisfactory performance under MAR. If the outcome model is misspecified under MNAR, the AC approaches still outperform the CC approaches, but no approach can correct all of the bias in the estimated intercept under this MNAR scenario. The inclusion of weights increases the coverage rates when the outcome model has not been correctly specified, but does not result in substantial improvements.
We do not find that the results are sensitive to the response model specification, where in our setting we include both X and A as predictors in the correct response model and only A in the incorrect response model. Note that even the correct specification does not necessarily model the true attrition mechanism, as it fails to capture MNAR. It could be that the effect of the response model specification depends on the outcome model specification. Whereas the predictor X is always included in the outcome model, omitting X in the response model may not affect the outcome model estimation.
In general, the AC and ACW approaches outperform CC and CCW. When models were misspecified, we found that the AC approaches once again outperformed the CC approaches, and weighted AC analyses offered slight improvements over unweighted AC analyses in terms of confidence interval coverage. When considering the bias and RMSE of the estimates with misspecified outcome models, weighted AC approaches did not provide significant improvements. With correctly specified models, weights will only reduce precision without changing the estimates (Heeringa et al., 2017; Bollen et al., 2016; Korn and Graubard, 2011).
DISCUSSION
Overall, the results of our simulation study suggest that analysts should perform AC analysis when conducting longitudinal trajectory modeling. Analyst effort should be dedicated to improving the specification of the model for the survey outcome of interest. Correct outcome model specification in combination with AC analysis can essentially eliminate the need for weighting adjustments, and modern machine learning approaches may offer benefits over traditional parametric modeling approaches in this context (e.g., Breidt and Opsomer, 2017; Kinreich et al., 2021).
The performance of the alternative weighting adjustment methods depends on the attrition mechanisms. The outcome model and the weighting process should be based on observed information that makes the MAR assumption plausible. Under MNAR attrition mechanisms, only using available information is not sufficient, and we need additional data sets or assumptions. None of the methods considered in this simulation study could correct the bias in all of the estimated coefficients of the trajectory models in the MNAR scenarios.
We also considered a joint modeling approach based on the shared parameter (SP) model (Ibrahim & Molenberghs, 2009; Wu & Carroll, 1988), which assumes that the survey outcome and response indicator share the same random effects on the subject level and are conditionally independent. The SP model is widely used in health applications to handle MNAR scenarios. This shared structure represents a special case of MNAR, since marginally the response depends on the survey outcome. We estimated the SP model using a Bayesian framework and obtained Markov chain Monte Carlo samples for inference. However, the SP inference is on the subject level, given the random effects, rather than on the population level (as is the case when using the marginal modeling approaches described above). Moreover, the SP computation is heavy when conducting repeated studies under various simulation scenarios. We include the output of one sample run of the SP approach and describe the results in contrast to those generated by the other four approaches in the Appendix. Future work is needed to facilitate this implementation and adapt the SP model for more practical use.
We note that the assumptions of the alternative SP model that we considered may not be satisfied in practice; we found that the common random intercepts did not affect trajectory modeling estimates in our simulation studies (see the Appendix for details). West et al. (2021) have developed measures of selection bias for probit regression model coefficients estimated under MNAR mechanisms in cross-sectional settings (along with corresponding methods for adjusting the estimated coefficients). Extensions of those approaches to trajectory models with longitudinal data would be an important direction for future research.
With a misspecified outcome model, the role of the weights depends on other factors. In our simulation studies, the unweighted and weighted estimates based on the ACs were generally similar, with the weighted estimates tending to increase confidence interval coverage when the outcome model was misspecified. We recommend comparing the point estimates and applying weighted analysis for potential bias reduction if the weighting effect is large. Korn & Graubard (2011) listed three scenarios for weighted and unweighted estimates of associations to differ greatly: 1) the outcome model must be badly misspecified; 2) an omitted variable in the outcome model must have a strong interaction with the independent variables and must be highly correlated with the weights; or 3) sampling or response rates strongly depend upon the outcome variables. In addition to the point estimates, the use of weights mainly affects the standard error estimates, often resulting in inefficiency. Little & Vartivarian (2005) recommend evaluating the mean squared error as a balance of bias and variance due to weighting.
Our investigation focused on weighting adjustment methods for intermittent attrition in longitudinal studies. Alternative estimation methods that account for correlation and allow for missing data in the repeated measures include GEE, ML, and Bayesian approaches. ML estimates maximize the incomplete data likelihood based on the analysis model, for example, in growth modeling or a latent variable modeling framework (Little & Rubin, 2019). Bayesian methods can stabilize estimates with informative prior information. Si et al. (2024) have compared these different methods for monotone attrition in an empirical study, and additional simulation studies are needed to further evaluate these alternatives to weighting adjustments.
Adjusted weights play a prominent role in descriptive prevalence estimation at individual waves based on longitudinal data (Si et al., 2022). Our simulation studies show that the weighting effects are much more modest in trajectory modeling and depend on both the attrition mechanisms and the outcome model specification. Weighted analysis is ultimately a conservative approach. Though our design is based on empirical evidence of binge drinking prevalence using the Monitoring the Future panel study, the baseline data collection of which is assumed to be a simple random sample, our findings can easily be applied to more general longitudinal survey design and data analysis. For example, with a complex sample design at baseline, we could adjust the sampling weights for attrition and calibrate adjusted weights to match the available population control information. Regardless of the types of selection bias that a given set of longitudinal survey weights has been designed to address, the results of this simulation study suggest that using all available observations and careful model specification may be more important than using the weights when fitting trajectory models to longitudinal survey data.
In our simulation study, we considered 24 different scenarios. We acknowledge that this was a limited set of scenarios for providing insight into the relative effectiveness of these adjustment approaches. We also considered only fully observed predictors; in practice, these predictors could have missing values for selected cases, and some form of imputation might be used prior to any adjustments based on them. A correct outcome model specification with robust estimation is the key to success. Our findings show that the AC analyses are the preferred method and are not sensitive to varying model parameters, sample sizes, and attrition rates.
Figure 2.
Empirical bias of model coefficients with a medium intra-subject correlation value (0.25) in the data generation model, different survey outcome model specifications (correct and incorrect Y), different attrition mechanisms (MAR and MNAR), and different model specifications for the response indicator (correct and incorrect R). CC: complete case analysis; AC: available case analysis; CCW: attrition-adjusted weighted analysis with complete cases; and ACW: attrition-adjusted weighted analysis with available cases.
Figure 5.
Root mean squared error (RMSE) of model coefficients with a medium intra-subject correlation value (0.25) in the data generation model, different survey outcome model specifications (correct and incorrect Y), different attrition mechanisms (MAR and MNAR), and different model specifications for the response indicator (correct and incorrect R). CC: complete case analysis; AC: available case analysis; CCW: attrition-adjusted weighted analysis with complete cases; and ACW: attrition-adjusted weighted analysis with available cases.
Figure 8.
95% confidence interval coverage rates of model coefficients with a medium intra-subject correlation value (0.25) in the data generation model, different survey outcome model specifications (correct and incorrect Y), different attrition mechanisms (MAR and MNAR), and different model specifications for the response indicator (correct and incorrect R). CC: complete case analysis; AC: available case analysis; CCW: attrition-adjusted weighted analysis with complete cases; and ACW: attrition-adjusted weighted analysis with available cases.
Acknowledgement:
The work was supported by research grant R01DA031160 from the National Institute on Drug Abuse, National Institutes of Health.
APPENDIX
We also considered a joint modeling approach based on the shared parameter (SP) model (Ibrahim & Molenberghs, 2009; Wu & Carroll, 1988), which assumes that the survey outcome and response indicator share the same random effects on the subject level and are conditionally independent. The conditional independence is one special case of MNAR, since marginally the response depends on the survey outcome. The SP approach uses randomly varying subject effects to adjust for the possibility of a non-ignorable attrition mechanism, but relies on the specific assumption that the random subject effects are shared between the response model and the survey outcome model. The model specification affects the performance of the SP approach, which may fail if the model is misspecified.
In our SP modeling approach, we include the same random effect bi in the model for the response indicator Rit and specify a generalized linear mixed-effects model, as shown below:
logit{P(Rit = 1 | bi)} = α(R) + βX(R) Xi + βt(R) t + βA(R) Ai + bi        (A.1)
which is a special MNAR case. This model could be extended by adding random coefficients, e.g., including subject-varying coefficients of time. However, the SP approach would then require that the outcome model also has subject-varying intercepts and subject-varying coefficients of time. The resulting computation would be challenging, and the inference is on the subject level, not comparable to the GEE estimates on the population level.
We estimate the SP model via Stan using the mvbf function in the R package brms (Bürkner, 2021). Because of the heavy computational burden of the SP approach, we only include the bias values for one sample. The SP estimates are on the subject level and differ (by definition) from the true values that were the focus of the four methods compared in the paper. We present the estimated bias values for the scenario with a high intra-subject correlation in Figures A1–A2 below. A hypothetical sketch of this type of model specification follows.
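The sketch below assumes the long-format sample samp from the simulation sketches, with columns y, resp, x, time, a, and id; the exact specification used for the appendix run is not reproduced here. In brms, the |p| term links the subject-level intercepts of the two equations as correlated group-level effects, which approximates the shared intercept in (A.1), and the subset addition term restricts the outcome equation to observed waves.

```r
library(brms)

samp$obs <- samp$resp == 1   # outcome observed only at responding waves
bf_y <- bf(y | subset(obs) ~ x * (time + I(time^2)) + (1 | p | id),
           family = bernoulli())
bf_r <- bf(resp ~ x + time + a + (1 | p | id), family = bernoulli())
fit_sp <- brm(mvbf(bf_y, bf_r, rescor = FALSE), data = samp,
              chains = 4, cores = 4)
```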
The SP results for these scenarios are inferior to those under AC. A possible reason for this is that the coefficients represent different quantities, i.e., conditional subject-level relationships vs. marginal population-averaged relationships. In addition, the inclusion of one shared random intercept mainly affects variance estimation on the subject and observation levels but does not cause substantial changes in the coefficient estimates. Our finding that AC analyses should be conducted in longitudinal studies remains the same; the AC and ACW approaches continue to outperform the CC and CCW.
Figure A1.

Empirical bias of model coefficients based on one sample where the survey outcome model is correctly specified (correct Y) with a high intra-subject correlation value (0.5), and there are different attrition mechanisms (MAR, MNAR, and SP) and model specifications for the response indicator (correct and incorrect R).
Figure A2.

Empirical bias of model coefficients based on one sample where the survey outcome model is incorrectly specified (incorrect Y) with a high intra-subject correlation value (0.5), and there are different attrition mechanisms (MAR, MNAR, and SP) and model specifications for the response indicator (correct and incorrect R).
Footnotes
Disclaimer: The manuscript has not been published elsewhere and has not been submitted simultaneously for publication elsewhere.
Contributor Information
Brady T. West, Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor; Department of Biostatistics, University of Michigan, Ann Arbor; Center for the Study of Drugs, Alcohol, Smoking and Health, Department of Health Behavior and Biological Sciences, School of Nursing, University of Michigan, Ann Arbor
Yajuan Si, Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor; Department of Biostatistics, University of Michigan, Ann Arbor.
Yueying Hu, Department of Biostatistics, University of Michigan, Ann Arbor.
Sean E. McCabe, Center for the Study of Drugs, Alcohol, Smoking and Health, Department of Health Behavior and Biological Sciences, School of Nursing, University of Michigan, Ann Arbor
Phil Veliz, Center for the Study of Drugs, Alcohol, Smoking and Health, Department of Health Behavior and Biological Sciences, School of Nursing, University of Michigan, Ann Arbor; Department of Systems, Populations and Leadership, School of Nursing, University of Michigan, Ann Arbor.
REFERENCES
- Bachman J, Wadsworth K, O’Malley P, Johnston L, & Schulenberg J (1997). Smoking, drinking, and drug use in young adulthood: The impacts of new freedoms and new responsibilities. Mahwah, NJ: Lawrence Erlbaum.
- Binder DA, & Roberts G (2009). Design- and model-based inference for model parameters. Handbook of Statistics, 29, 33–54.
- Bollen KA, Biemer PP, Karr AF, Tueller S, & Berzofsky ME (2016). Are survey weights needed? A review of diagnostic tests in regression analysis. Annual Review of Statistics and Its Application, 3, 375–392.
- Breidt FJ, & Opsomer JD (2017). Model-assisted survey estimation with modern prediction techniques. Statistical Science, 32(2), 190–205. doi: 10.1214/16-STS589.
- Breiman L, Friedman JH, Olshen RA, & Stone CJ (1984). Classification and Regression Trees. Belmont, CA: Wadsworth, Inc.
- Brick JM, & Williams D (2013). Explaining rising nonresponse rates in cross-sectional surveys. The ANNALS of the American Academy of Political and Social Science, 645, 36–59.
- Bürkner P (2021). Bayesian item response modeling in R with brms and Stan. Journal of Statistical Software, 100(5), 1–54. doi: 10.18637/jss.v100.i05.
- Center for Behavioral Health Statistics and Quality (2023). Highlights for the 2022 National Survey on Drug Use and Health. https://www.samhsa.gov/data/sites/default/files/reports/rpt42731/2022-nsduh-main-highlights.pdf
- Da Silva DN, & Skinner C (2011). A semiparametric propensity score weighting method to adjust for nonresponse using multivariate auxiliary information. International Statistical Institute: Proceedings of the 58th World Statistical Congress, Dublin, Ireland (Session CPS002).
- de Leeuw E, Hox J, & Luiten A (2018). International nonresponse trends across countries and years: An analysis of 36 years of labour force survey data. Survey Insights: Methods from the Field. https://surveyinsights.org/?p=10452.
- Earp M, Toth D, Phipps P, & Oslund C (2018). Assessing nonresponse in a longitudinal establishment survey using regression trees. Journal of Official Statistics, 34(2), 463–481.
- Heeringa SG, West BT, & Berglund PA (2017). Applied Survey Data Analysis (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.
- Højsgaard S, Halekoh U, & Yan J (2024). geepack: Generalized Estimating Equation Package. R package version 1.3.10. https://CRAN.R-project.org/package=geepack.
- Hothorn T, & Zeileis A (2023). partykit: A Toolkit for Recursive Partytioning. R package version 1.2-20. https://CRAN.R-project.org/package=partykit.
- Ibrahim JG, & Molenberghs G (2009). Missing data methods in longitudinal studies: A review. Test, 18(1), 1–43.
- Kinreich S, Meyers JL, Maron-Katz A, Kamarajan C, Pandey AK, Chorlian DB, … & Porjesz B (2021). Predicting risk for Alcohol Use Disorder using longitudinal data with multimodal biomarkers and family history: A machine learning study. Molecular Psychiatry, 26(4), 1133–1141.
- Korn EL, & Graubard BI (2011). Analysis of Health Surveys (2nd ed.). John Wiley & Sons, Inc.
- Little RJA, & Rubin DB (2019). Statistical Analysis with Missing Data (3rd ed.). John Wiley & Sons.
- Little RJA, & Vartivarian S (2005). Does weighting for nonresponse increase the variance of survey means? Survey Methodology, 31(2), 161–168.
- Lumley T, Gao P, & Schneider B (2024). survey: Analysis of Complex Survey Samples. R package version 4.4-2. https://CRAN.R-project.org/package=survey.
- Lynn P, & Watson N (2021). Issues in weighting for longitudinal surveys. In Lynn P (Ed.), Advances in Longitudinal Survey Methodology (pp. 447–468). Wiley.
- McCabe SE, & West BT (2016). Selective nonresponse bias in population-based survey estimates of drug use behaviors in the United States. Social Psychiatry and Psychiatric Epidemiology, 51(1), 141–153.
- Patrick ME, Miech RA, Johnston LD, & O’Malley PM (2023). Monitoring the Future Panel Study annual report: National data on substance use among adults ages 19 to 60, 1976–2022. doi: 10.7826/ISR-UM.06.585140.002.07.0002.2023.
- Pfeffermann D (1993). The role of sampling weights when modeling survey data. International Statistical Review, 61(2), 317–337.
- Pfeffermann D (2011). Modelling of complex survey data: Why model? Why is it a problem? How can we approach it? Survey Methodology, 37(2), 115–136.
- R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org.
- Rubin DB (2019). Conditional calibration and the sage statistician. Survey Methodology, 45(2), 187–198.
- Schmidt SCE, & Woll A (2017). Longitudinal drop-out and weighting against its bias. BMC Medical Research Methodology, 17(1), 164. doi: 10.1186/s12874-017-0446-x.
- Schulenberg JE, Patrick ME, Johnston LD, O’Malley PM, Bachman JG, & Miech RA (2021). Monitoring the Future national survey results on drug use, 1975–2020: Volume II, college students and adults ages 19–60. Ann Arbor: Institute for Social Research.
- Schulenberg J, Maggs J, Long S, Sher K, Gotham H, Baer J, Kivlahan D, Marlatt G, & Zucker R (2001). The problem of college drinking: Insights from a developmental perspective. Alcoholism: Clinical and Experimental Research, 25(3), 473–477.
- Sher K, & Rutledge P (2007). Heavy drinking across the transition to college: Predicting first-semester heavy drinking from precollege variables. Addictive Behaviors, 32(4), 819–835.
- Si Y, Little R, Mo Y, & Sedransk N (2024). Nonresponse bias analysis in longitudinal studies: A comparative review with an application to the Early Childhood Longitudinal Study. International Statistical Review. doi: 10.1111/insr.12566.
- Si Y, West BT, Veliz P, Patrick ME, Schulenberg JE, Kloska DD, Terry-McElrath YM, & McCabe SE (2022). An empirical evaluation of alternative approaches to adjusting for attrition when analyzing longitudinal survey data on young adults’ substance use trajectories. International Journal of Methods in Psychiatric Research, 31(3), e1916.
- Veiga A, Smith P, & Brown J (2014). The use of sample weights in multivariate multilevel models with an application to income data collected using a rotating panel survey. Journal of the Royal Statistical Society, Series C (Applied Statistics), 63(1), 65–84.
- Vieira M, & Skinner C (2008). Estimating models for panel survey data under complex sampling. Journal of Official Statistics, 24, 343–364.
- West BT, Little RJ, Andridge RR, Boonstra PS, Ware EB, Pandit A, & Alvarado-Leiton F (2021). Assessing selection bias in regression coefficients estimated from nonprobability samples with applications to genetics and demographic surveys. The Annals of Applied Statistics, 15(3), 1556–1581.
- Wu MC, & Carroll RJ (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics, 44(1), 175–188.