Author manuscript; available in PMC: 2009 May 6.
Published in final edited form as: Biometrics. 2004 Dec;60(4):854–864. doi: 10.1111/j.0006-341X.2004.00240.x

Mixtures of Varying Coefficient Models for Longitudinal Data with Discrete or Continuous Nonignorable Dropout

Joseph W Hogan 1,*, Xihong Lin 2, Benjamin Herman 1
PMCID: PMC2677904  NIHMSID: NIHMS78400  PMID: 15606405

SUMMARY

The analysis of longitudinal repeated measures data is frequently complicated by missing data due to informative dropout. We describe a mixture model for joint distribution for longitudinal repeated measures, where the dropout distribution may be continuous and the dependence between response and dropout is semiparametric. Specifically, we assume that responses follow a varying coefficient random effects model conditional on dropout time, where the regression coefficients depend on dropout time through unspecified nonparametric functions that are estimated using step functions when dropout time is discrete (e.g., for panel data) and using smoothing splines when dropout time is continuous. Inference under the proposed semiparametric model is hence more robust than the parametric conditional linear model. The unconditional distribution of the repeated measures is a mixture over the dropout distribution. We show that estimation in the semiparametric varying coefficient mixture model can proceed by fitting a parametric mixed effects model and can be carried out on standard software platforms such as SAS. The model is used to analyze data from a recent AIDS clinical trial and its performance is evaluated using simulations.

Keywords: Clinical trials, Equivalence trial, Linear mixed model, Missing data, Nonignorable dropout, Pattern-mixture model, Pediatric AIDS, Selection bias, Smoothing splines

1. Introduction

1.1 Informative Dropout

Dropout and other types of missing data are common in long-term longitudinal studies; in many cases, dropout induces a missingness process that is nonignorable in the sense that missingness depends probabilistically on unobserved outcomes, even after conditioning on observable information. Most approaches to handling informative dropout in longitudinal data can be viewed as extensions of standard approaches such as multilevel modeling (Laird and Ware, 1982; Diggle, 1988; Breslow and Clayton, 1993) and marginal modeling (Liang and Zeger, 1986). Likelihood-based approaches include selection models (Wu and Carroll, 1988; Diggle and Kenward, 1994; Follman and Wu, 1995; Ten Have et al., 1998) and mixture models (Wu and Bailey, 1989; Little, 1993, 1994; Hogan and Laird, 1997a). Recent comprehensive surveys of parametric and likelihood-based approaches to handling dropout in longitudinal data can be found in Little (1995), Hogan and Laird (1997b), and Kenward and Molenberghs (1999). Moment-based methods also have been generalized to handle informative dropout under the selection modeling framework (see Robins, Rotnitzky, and Zhao, 1995; Rotnitzky, Robins, and Scharfstein, 1998; Scharfstein, Robins, and Rotnitzky, 1999).

In this article, we develop a general mixture modeling approach for continuous longitudinal repeated measures data where measurement times may be irregular across subjects and where dropout might be at continuous times and potentially nonignorable. The conditional distribution of repeated measures given dropout follows a varying coefficient model (VCM) (Zhang et al., 1998) where regression coefficients such as intercepts and slopes depend on dropout through unspecified nonparametric functions. The shapes of the functions are estimated using step functions when dropout time is discrete (e.g., for panel data), and using natural cubic smoothing splines (Green and Silverman, 1994) when dropout is continuous. We show that estimation in the proposed varying coefficient mixture model can proceed by fitting an augmented parametric mixed effects model. The complete data distribution is a mixture of the VCMs over the dropout distribution, and the dropout distribution can be left completely unspecified.

This class of models can be viewed as an extension of pattern-mixture models (Little, 1993, 1994) and conditional linear models (CLM) (Wu and Bailey, 1989; Hogan and Laird, 1997a) described above. The former author mainly considered panel data, while the latter authors allowed for the dropout time to be continuous but assumed the regression coefficients to be parametric functions of the dropout time. Estimation can therefore be biased if the parametric functions are misspecified. The proposed varying coefficient mixture model relaxes the parametric assumption by providing a unified framework to allow for flexible dependence of the covariate effects on dropout patterns by assuming regression coefficients to be nonparametric functions of dropout times. Hence estimation of the covariate effects is more robust to misspecification of the dependence between longitudinal responses and dropout.

1.2 Motivating Example

Protocol 128 of the ACTG was a randomized double-blind equivalency trial of high-dose (180 mg per square meter body surface area, six times daily) versus low-dose (90 mg) zidovudine (ZDV) for HIV-infected children (Brady et al., 1996). The study enrolled 424 children, randomized them to receive one of the two doses, and followed the children on a number of endpoints for up to 5 years. In this article, we are concerned with comparing longitudinal trajectory of CD4 cell counts. Children were scheduled for measurement of CD4 count every 12 weeks, but actual measurement times varied considerably. In addition, only about half of the participants completed 3 years of follow-up (113/216 [52%] on low dose, 93/208 [45%] on high dose).

A simple but reasonable approach to analyzing these data is to estimate treatment-group-specific CD4 trajectories using a linear random effects model (REM) (Laird and Ware, 1982). This model provides valid inference under ignorable dropout (Laird, 1988; Diggle and Kenward, 1994; Little, 1995). Basic exploratory analysis suggests that for the observed data, mean square root of CD4 is well described by a linear time trend in both treatment arms (Figure 1). Using the REM, estimated change from baseline to week 200 is −12.7 (SE 0.8) in the low-dose arm and −18.2 (SE 1.4) in the high-dose arm, for a difference of −5.5 (SE 1.6), favoring low dose.

Figure 1. Observed (square root) CD4 counts versus time and stratified by dose, with lowess regression line fit to pooled sample and five individual profiles highlighted.

To explore the potential for bias due to outcome-related dropout, we plotted estimated individual least-squares slopes versus follow-up time (Figure 2). A clear pattern is evident in both treatment arms, namely that lower slopes are associated with early dropout, which casts some doubt on the ignorable dropout assumption and hence on the validity of estimates from the linear REM, suggesting the need to utilize more elaborate models for addressing potential effects of informative dropout.

Figure 2. Individual-specific OLS slopes for square root of CD4 as a function of follow-up time, stratified by dose.

The remainder of our article is organized as follows. The model is described in Section 2, and estimation procedures are detailed in Section 3; this includes estimation for discrete and continuous dropout times, and for settings with censored dropout times. In Section 4, we apply the proposed model to the clinical trial described in previous sections. Section 5 presents a simulation study to evaluate the bias of the proposed method under departures from underlying assumptions. Summary and discussion follow in Section 6.

2. Mixtures of Varying Coefficient Models for Handling Informative Dropout

Suppose that the data consist of m subjects with the ith subject having ni observations over time. For the ith subject, let Yi be an ni × 1 observed outcome vector, Xi be an ni × p covariate matrix associated with fixed effects, Zi be an ni × q covariate matrix associated with random effects, and Ui be the dropout time. The complete data distribution for the response vector is the mixture obtained by integrating the joint distribution f(y, u) over u. Mixture model approaches, therefore, require specification of f(y | u) and f(u). When dropout times are discrete, it is usual to leave f(u) unspecified and estimate it nonparametrically; for continuous u, it is possible but not always desirable to use a parametric model such as log normal (Schluchter, 1992; DeGruttola and Tu, 1994). Our approach is to leave this marginal distribution unspecified.

To capture the dependence between Y and U, we assume that repeated measurements Yi for those who drop out at ui follow the varying coefficient REM

$$(Y_i \mid U = u_i) = X_i \beta(u_i) + Z_i b_i + \varepsilon_i, \tag{1}$$

where β(u) = {β1(u),…, βp(u)}T is a p × 1 vector of unknown regression coefficient functions of the dropout time u, bi is a q × 1 vector of random effects following N{0, D(θ, ui )}, εi is an ni × 1 vector of residuals following N{0, Ri(θ, ui )}, and θ is a c × 1 vector of variance components. Note that we allow the covariance matrices D and R to depend on the dropout time ui. To better understand model (1), we write Xi = (Xi1,…,Xip), where Xij is an ni × 1 vector of the values of the jth covariate measured over time for the ith subject. Equation (1) can be written as

$$(Y_i \mid U = u_i) = \sum_{j=1}^{p} X_{ij} \beta_j(u_i) + Z_i b_i + \varepsilon_i, \tag{2}$$

where βj(u) represents the jth covariate effect for those who drop out at time u.

In settings such as a panel design, where subjects are observed at prespecified finite time points (panels) and the underlying dropout times are discrete, the βj(u) are step functions (Little and Wang, 1996; Hogan and Laird, 1997a). If instead subjects are observed at different irregular time points and the underlying dropout times are continuous, we assume that each βj(u) is an unspecified smooth function. It follows that model (1) allows the covariate effects to vary with dropout times nonparametrically and therefore estimation of the covariate effects can be made more robust.

The varying coefficient specification (1) includes the pattern-mixture model (Little, 1993, 1994), random effects pattern-mixture model (Little, 1995; Hogan and Laird, 1997a), and CLM (Wu and Bailey, 1989; Schluchter, 1992) as special cases. For example, if U has a discrete distribution with finite support, then the βj(u) are step functions and (1) is a pattern-mixture model. For continuous dropout times, consider the case where Xi = (1, T i) and T i = (ti1,…, tini )T, where tik is the kth follow-up time for the ith subject. If β1(u) and β2(u) are polynomial functions of u, model (1) reduces to a CLM (Wu and Bailey, 1989). If the βj(u) are constant in u, then the mixture distribution f(y) has only one component and (1) reduces to a standard REM (Laird and Ware, 1982; Diggle, 1988).

Comparisons to selection models (e.g., Wu and Carroll, 1988; Diggle and Kenward, 1994) can be made by deriving the associated probability of dropout at time u as a function of y. Let $u^0 = (u_1^0, \ldots, u_{r-1}^0)^T$ denote the set of ordered, unique dropout times, and define an arbitrary time $u_r^0 > u_{r-1}^0$ for the completers. The selection function can be written in closed form as a logistic regression using baseline-category logits, where completers ($u = u_r^0$) define the baseline category. Let $h_s(y_i) = \mathrm{pr}(U = u_s^0 \mid y_i) / \mathrm{pr}(U = u_r^0 \mid y_i)$, which characterizes the odds of dropping out at time $u_s^0$ relative to completing the study, given repeated measures $y_i$. Taking logs,

$$\log h_s(y_i) = \log(\pi_s / \pi_r) + \log f(y_i \mid U = u_s^0) - \log f(y_i \mid U = u_r^0), \tag{3}$$

where $\pi_s = \Pr(U_i = u_s^0)$, and

$$f(y_i \mid U_i = u) \propto |V_i(\theta, u)|^{-1/2} \exp\left[ -\{y_i - X_i \beta(u)\}^T V_i(\theta, u)^{-1} \{y_i - X_i \beta(u)\} / 2 \right],$$

with $V_i(\theta, u) = Z_i D(\theta, u) Z_i^T + R_i(\theta, u)$. In general, the selection model (3) is quadratic in $y_i$. A more familiar version obtains when variance is constant across dropout times, i.e., when $V_i(\theta, u) = V_i(\theta)$. Then (3) simplifies to

$$\log h_s(y_i) = w\{\pi_s, \pi_r, \beta(u_s^0), \beta(u_r^0), V_i(\theta)\} + \{\beta(u_s^0) - \beta(u_r^0)\}^T X_i^T V_i(\theta)^{-1} y_i, \tag{4}$$

where $w(\cdot)$ is a function of terms that do not depend on $y_i$. Under this formulation, the log relative probability of dropout at $u_s^0$ is linear in $y_i$, with coefficients that depend on the difference $\beta(u_s^0) - \beta(u_r^0)$, increasing in magnitude as $\beta(u)$ depends more strongly on $u$. For binary U, (4) is consistent with the selection model restrictions used by Little (1994) and Little and Wang (1996) to identify parameters in a two-component pattern-mixture model.
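The step from (3) to (4) can be made explicit. The following is a short sketch, assuming only the multivariate normal form of $f(y_i \mid U_i = u)$ and a covariance that does not depend on dropout time; the shorthand $\beta_s = \beta(u_s^0)$, $\beta_r = \beta(u_r^0)$, and $V_i = V_i(\theta)$ is introduced here only for brevity. Because $V_i$ is common to both dropout times, the determinant terms and the quadratic term $y_i^T V_i^{-1} y_i$ cancel:

$$\begin{aligned}
\log f(y_i \mid U = u_s^0) - \log f(y_i \mid U = u_r^0)
&= -\tfrac{1}{2} (y_i - X_i \beta_s)^T V_i^{-1} (y_i - X_i \beta_s) + \tfrac{1}{2} (y_i - X_i \beta_r)^T V_i^{-1} (y_i - X_i \beta_r) \\
&= (\beta_s - \beta_r)^T X_i^T V_i^{-1} y_i - \tfrac{1}{2} \left( \beta_s^T X_i^T V_i^{-1} X_i \beta_s - \beta_r^T X_i^T V_i^{-1} X_i \beta_r \right).
\end{aligned}$$

The first term is the part of (4) that is linear in $y_i$; the second term, together with the log ratio of dropout probabilities in (3), is free of $y_i$ and makes up $w(\cdot)$.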

3. Estimation Procedures

In this section, we discuss estimation procedures when dropout times Ui are observed for all subjects. Under this circumstance, the likelihood of (Yi, Ui ) for the ith subject can be partitioned as

$$L_i(Y_i, U_i; \beta, \theta, \pi) = L_i(Y_i \mid U_i; \beta, \theta) \, L_i(U_i; \pi).$$

It follows that $(\beta, \theta)$ can be estimated by maximizing the conditional likelihood $\prod_{i=1}^{m} L_i(Y_i \mid U_i; \beta, \theta)$, and $\pi$ can be estimated by maximizing the marginal likelihood $\prod_{i=1}^{m} L_i(U_i; \pi)$. We consider estimation procedures first for the situation where subjects are observed at a common set of time points, and then where the set of observation times may be misaligned across subjects.

3.1 Estimation Procedures for Fixed and Common Observation Times

For a fixed design, such as a panel design, subjects are observed at n (often small) prespecified time points (t1,…, tn ) and the number of possible dropout times is small. Thus, the nonparametric functions βj(u) are assumed to be step functions.

Let $r$ be the number of observed distinct values of $u$, with $r \le n$. As indicated in Section 2, let $u^0 = (u_1^0, \ldots, u_r^0)^T$ be an $r \times 1$ vector of ordered distinct dropout times, where we assume completers take $u_r^0 = t_{n+1}$ for some value $t_{n+1}$. Hence $\beta_j(u)$ is fully determined by $\beta_j = (\beta_{j1}, \ldots, \beta_{jr})^T$, where $\beta_{jk}$ represents the jth covariate effect for those dropping out at $u_k^0$. Denote the incidence matrix by $N_i = (N_{i1}, \ldots, N_{ir})^T$, where $N_{ik} = 1$ if $u_i = u_k^0$ and 0 otherwise. Some calculations show that model (2) can then be written as a linear mixed model

$$Y_i = \sum_{j=1}^{p} X_{ij} N_i^T \beta_j + Z_i b_i + \varepsilon_i.$$

Equivalently, we have

$$Y = \sum_{j=1}^{p} \tilde{X}_j \beta_j + Z b + \varepsilon, \tag{5}$$

where $\tilde{X}_j = X_j \otimes N$, $X_j = (X_{1j}^T, \ldots, X_{mj}^T)^T$, $N = (N_1^T, \ldots, N_m^T)^T$, and $Y$, $Z$, $b$, and $\varepsilon$ are defined similarly to $N$. Here $A \otimes B$ denotes a direct product of matrices $A$ and $B$.
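To make the step-function design concrete, the sketch below stacks the subject-level blocks $X_{ij} N_i^T$ for a single covariate, so that each observation contributes its covariate value only to the column of its subject's dropout pattern. It is an illustration under assumed toy data, not the authors' implementation; the function name `build_step_design` and the inputs are hypothetical.

```python
import numpy as np

def build_step_design(x_by_subject, dropout_times):
    """Stack the blocks X_ij N_i^T over subjects for one covariate j.

    x_by_subject  : list of 1-D arrays, the jth covariate for each subject
    dropout_times : 1-D array with one dropout time per subject
    Returns (X_tilde, u0), where u0 holds the ordered distinct dropout times.
    """
    u0, pattern = np.unique(dropout_times, return_inverse=True)
    blocks = []
    for x_i, k in zip(x_by_subject, pattern):
        incidence = np.eye(len(u0))[k]             # one-hot incidence vector N_i
        blocks.append(np.outer(x_i, incidence))    # n_i-by-r block X_ij N_i^T
    return np.vstack(blocks), u0

# Toy data (hypothetical): three subjects, two distinct dropout times.
times = [np.array([0.0, 1.0]), np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0])]
dropout = np.array([2.0, 3.0, 2.0])
X_tilde_time, u0 = build_step_design(times, dropout)
```

Each row of `X_tilde_time` has at most one nonzero entry, which is what lets a single linear mixed model fit a separate coefficient for every dropout pattern.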

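It follows that estimation of $\beta = (\beta_1^T, \ldots, \beta_p^T)^T$ proceeds by solving the linear mixed model normal equation

$$(\tilde{X}^T V^{-1} \tilde{X}) \beta = \tilde{X}^T V^{-1} Y,$$

where $\tilde{X} = (\tilde{X}_1, \ldots, \tilde{X}_p)$, $V = \mathrm{diag}(V_i)$, and $V_i = \mathrm{cov}(Y_i) = Z_i D Z_i^T + R_i$. The covariance matrix of the estimator of $\beta$ is $(\tilde{X}^T V^{-1} \tilde{X})^{-1}$. Estimation of $\theta$ can be obtained using the restricted maximum likelihood (REML) estimating equation

$$-\frac{1}{2} \mathrm{tr}\left( P \frac{\partial V}{\partial \theta_j} \right) + \frac{1}{2} (Y - \tilde{X} \beta)^T V^{-1} \frac{\partial V}{\partial \theta_j} V^{-1} (Y - \tilde{X} \beta) = 0,$$

where $P = V^{-1} - V^{-1} \tilde{X} (\tilde{X}^T V^{-1} \tilde{X})^{-1} \tilde{X}^T V^{-1}$. The $(j, k)$th component of the information matrix of the estimator of $\theta$ is $(1/2) \mathrm{tr}\{P (\partial V / \partial \theta_j) P (\partial V / \partial \theta_k)\}$. If $D$ and $R$ depend on $u$, the identifiability of some components of $\theta$ based on the observed data might require some constraints on $\theta$ or a sensitivity analysis (Little, 1993; Daniels and Hogan, 2000). For example, consider the case with two time points (n = 2); if dropout is informative, there is no information in the observed data to fit a random intercept and slope model for those who drop out at time 2, and sensitivity analyses are needed.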

The necessary and sufficient condition for the βj to be identifiable from model (5) can be stated as follows:

Let $S_k = \{i_{k1}, \ldots, i_{k m_k}\}$ denote the indexes of those $m_k$ subjects who drop out at $u_k^0$, and $\chi_k = (X_{i_{k1}}^T, \ldots, X_{i_{k m_k}}^T)$. If $\mathrm{rank}(\chi_k) = p$ for all $k = 1, \ldots, r$, then the $\beta_j$ ($j = 1, \ldots, p$) are identifiable.

This condition states that when dropout times are discrete and finite, regression coefficients must be separately estimable for each dropout pattern. If some components of βj are not identifiable for some dropout patterns, then the observed data do not have information about these components and either parameter constraints or sensitivity analyses are needed. For example, consider again the case with two time points (n = 2); if dropout is informative, there is no information in the observed data to estimate the mean of the outcome at time 2 for those who drop out at time 2, and sensitivity analysis is hence needed. See Little and Wang (1996) and Daniels and Hogan (2000) for detailed discussion.

Note that the jth regression coefficient vector $\beta_j$ measures the jth covariate effect conditional on dropout times. Primary interest is in the marginal jth covariate effect $\tilde{\beta}_j = E_U\{\beta_j(u)\} = \pi^T \beta_j$, where $\pi = (\pi_1, \ldots, \pi_r)^T$ and $\pi_k = P(U = u_k^0)$ is the probability that a subject drops out at $u_k^0$. It follows that one can estimate $\tilde{\beta}_j$ by $\hat{\pi}^T \hat{\beta}_j$, where $\hat{\pi}_k = m_k / m$, $m_k$ is the number of subjects who drop out at $u_k^0$, and $\hat{\beta}_j$ is the maximum likelihood estimator from fitting model (5).
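The marginal effect is simply a weighted average of the pattern-specific coefficients, with weights given by the observed dropout proportions. A minimal sketch follows; the coefficient values and dropout patterns are made up for illustration, and `marginal_effect` is a hypothetical helper name.

```python
import numpy as np

def marginal_effect(beta_j, dropout_pattern):
    """Average pattern-specific coefficients over the empirical dropout distribution."""
    beta_j = np.asarray(beta_j, dtype=float)
    counts = np.bincount(dropout_pattern, minlength=len(beta_j))  # m_k per pattern
    pi_hat = counts / counts.sum()                                # pi_hat_k = m_k / m
    return float(pi_hat @ beta_j), pi_hat

# Hypothetical slope estimates for three dropout patterns (earliest first).
beta_slope = [-3.0, -2.0, -1.0]
pattern = np.array([0, 0, 1, 2, 2, 2, 2, 2])   # dropout pattern index for m = 8 subjects
estimate, pi_hat = marginal_effect(beta_slope, pattern)
```

With these toy inputs the weights are 2/8, 1/8, and 5/8, so late dropouts dominate the average in proportion to their share of the sample rather than their share of the observations.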

3.2 Estimation for Random or Misaligned Measurement Times

For situations when the measurement times are random or misaligned across subjects, repeated measures of each subject are observed at different time points and the underlying dropout times are often continuous. We therefore assume that the regression coefficient functions βj(u) are twice-differentiable smooth functions and estimate them using cubic smoothing splines. A key feature of cubic smoothing spline estimation is that we can estimate all model components—including the nonparametric regression coefficient functions βj(u), variance components θ, and smoothing parameters—within an augmented parametric linear mixed model frame-work (Zhang et al., 1998).

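Following the notation in Section 3.1, let $r$ be the number of ordered distinct values of the $U_i$, $u^0 = (u_1^0, \ldots, u_r^0)^T$ be the ordered distinct values of the $U_i$, and $\beta_j = \{\beta_j(u_1^0), \ldots, \beta_j(u_r^0)\}^T$ be the values of $\beta_j(u)$ evaluated at $u^0$. Given the variance components $\theta$, the conditional log likelihood of $\beta_j$ given the $u_i$ is

$$\begin{aligned}
\ell(Y \mid u; \beta_1, \ldots, \beta_p)
&= \sum_{i=1}^{m} \left[ -\frac{1}{2} \ln |V_i| - \frac{1}{2} \Big\{ Y_i - \sum_{j=1}^{p} X_{ij} \beta_j(u_i) \Big\}^T V_i^{-1} \Big\{ Y_i - \sum_{j=1}^{p} X_{ij} \beta_j(u_i) \Big\} \right] \\
&= -\frac{1}{2} \ln |V| - \frac{1}{2} \Big( Y - \sum_{j=1}^{p} \tilde{X}_j \beta_j \Big)^T V^{-1} \Big( Y - \sum_{j=1}^{p} \tilde{X}_j \beta_j \Big),
\end{aligned}$$

where $\tilde{X}_j$, $V_i$, $V$, and $Y$ were defined in Section 3.1.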

Following O’Sullivan, Yandell, and Raynor (1986), one can show that the natural cubic smoothing spline estimators of the βj(u) maximize the following penalized conditional log likelihood

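$$\ell(Y \mid u; \beta_1, \ldots, \beta_p) - \frac{1}{2} \sum_{j=1}^{p} \lambda_j \int_{A_1}^{A_2} [\beta_j''(u)]^2 \, du = \ell(Y \mid u; \beta_1, \ldots, \beta_p) - \frac{1}{2} \sum_{j=1}^{p} \lambda_j \beta_j^T K \beta_j, \tag{6}$$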

where the $\lambda_j$ are smoothing parameters controlling the balance between goodness-of-fit and the smoothness of the estimated $\beta_j(u)$, $A_1$ and $A_2$ specify the range of $u$, and $K$ is the nonnegative definite natural cubic smoothing spline smoothing matrix constructed using $u^0$ and defined in Green and Silverman (1994, equation [2.3]).
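For readers who want to see the matrix $K$ concretely, the sketch below assumes the usual banded construction $K = Q R^{-1} Q^T$ associated with natural cubic splines, where $Q$ and $R$ depend only on the gaps between consecutive distinct dropout times. The knot values are hypothetical, and the function name `natural_spline_penalty` is ours; it requires at least three strictly increasing knots.

```python
import numpy as np

def natural_spline_penalty(knots):
    """Roughness penalty matrix K = Q R^{-1} Q^T for strictly increasing knots."""
    knots = np.asarray(knots, dtype=float)
    r = len(knots)
    h = np.diff(knots)                        # h[i] = knots[i+1] - knots[i]
    Q = np.zeros((r, r - 2))
    R = np.zeros((r - 2, r - 2))
    for c in range(r - 2):                    # column c belongs to interior knot c+1
        Q[c, c] = 1.0 / h[c]
        Q[c + 1, c] = -1.0 / h[c] - 1.0 / h[c + 1]
        Q[c + 2, c] = 1.0 / h[c + 1]
        R[c, c] = (h[c] + h[c + 1]) / 3.0
        if c + 1 < r - 2:
            R[c, c + 1] = R[c + 1, c] = h[c + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)

# Hypothetical distinct dropout times on a rescaled axis.
knots = np.array([0.0, 0.1, 0.4, 0.5, 0.9, 1.0])
K = natural_spline_penalty(knots)
```

Constant and linear functions of the knots lie in the null space of `K`, so they are not penalized; this is why the penalized fit shrinks toward a linear dependence on dropout time as the smoothing parameter grows.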

For fixed smoothing parameters λ = (λ1,…, λp)T and the variance components θ, differentiation of (6) with respect to (β1,…, βp) gives their estimating equations as

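$$\begin{bmatrix}
\tilde{X}_1^T V^{-1} \tilde{X}_1 + \lambda_1 K & \tilde{X}_1^T V^{-1} \tilde{X}_2 & \cdots & \tilde{X}_1^T V^{-1} \tilde{X}_p \\
\vdots & \vdots & \ddots & \vdots \\
\tilde{X}_p^T V^{-1} \tilde{X}_1 & \tilde{X}_p^T V^{-1} \tilde{X}_2 & \cdots & \tilde{X}_p^T V^{-1} \tilde{X}_p + \lambda_p K
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}
=
\begin{bmatrix} \tilde{X}_1^T V^{-1} Y \\ \vdots \\ \tilde{X}_p^T V^{-1} Y \end{bmatrix}. \tag{7}$$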

One can solve equation (7) using a backfitting algorithm as follows:

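$$\hat{\beta}_j = (\tilde{X}_j^T V^{-1} \tilde{X}_j + \lambda_j K)^{-1} \tilde{X}_j^T V^{-1} \Big( Y - \sum_{k \ne j} \tilde{X}_k \beta_k \Big)$$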

for j = 1,…, p.
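The backfitting update can be sketched as follows. This is an illustration under simplifying assumptions rather than the authors' code: the data are simulated, $V$ is taken to be the identity, and a second-difference matrix stands in for the natural spline $K$. The names `backfit` and `joint_solve` are hypothetical. The second function solves the stacked system (7) directly so the two answers can be compared.

```python
import numpy as np

def backfit(X_list, y, V_inv, K, lams, n_iter=500, tol=1e-10):
    """Cycle through the p blocks of (7), updating one beta_j at a time."""
    p = len(X_list)
    betas = [np.zeros(X.shape[1]) for X in X_list]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual: subtract the current fit of every other block.
            others = sum(X_list[k] @ betas[k] for k in range(p) if k != j)
            A = X_list[j].T @ V_inv @ X_list[j] + lams[j] * K
            new = np.linalg.solve(A, X_list[j].T @ V_inv @ (y - others))
            max_change = max(max_change, float(np.max(np.abs(new - betas[j]))))
            betas[j] = new
        if max_change < tol:
            break
    return betas

def joint_solve(X_list, y, V_inv, K, lams):
    """Solve the stacked system (7) directly, for comparison."""
    X = np.hstack(X_list)
    r = K.shape[0]
    penalty = np.zeros((X.shape[1], X.shape[1]))
    for j, lam in enumerate(lams):
        penalty[j * r:(j + 1) * r, j * r:(j + 1) * r] = lam * K
    return np.linalg.solve(X.T @ V_inv @ X + penalty, X.T @ V_inv @ y)

# Toy data: intercept and time effects varying over r = 5 dropout patterns.
rng = np.random.default_rng(0)
n, r = 120, 5
pattern = rng.integers(0, r, size=n)
N = np.eye(r)[pattern]                  # one incidence row per observation
t = rng.uniform(0.0, 1.0, size=n)
X_list = [N, t[:, None] * N]            # designs for the intercept and slope functions
y = rng.normal(size=n)
V_inv = np.eye(n)                       # V = I, a simplification
D2 = np.diff(np.eye(r), n=2, axis=0)
K = D2.T @ D2                           # stand-in roughness penalty, not the spline K
lams = [1.0, 1.0]

beta_backfit = np.concatenate(backfit(X_list, y, V_inv, K, lams))
beta_joint = joint_solve(X_list, y, V_inv, K, lams)
```

Backfitting avoids forming the full $pr \times pr$ system at the cost of iterating; when the coefficient matrix in (7) is positive definite, the cycle converges to the same solution as the direct solve.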

Following Zhang et al. (1998), we show that these cubic smoothing spline estimators of the $\beta_j$ can be obtained by fitting an augmented linear mixed model. Specifically, $\beta_j$ can be written via a one-to-one transformation as

$$\beta_j = U \gamma_j + B a_j,$$

where $U = (1, u^0)$, $B = L (L^T L)^{-1}$, $L$ is an $r \times (r - 2)$ full rank matrix satisfying $K = L L^T$ and $L^T U = 0$, $\gamma_j$ is a $2 \times 1$ unknown vector, and $a_j$ is an $(r - 2) \times 1$ unknown vector. It can be shown that $\beta_j^T K \beta_j = a_j^T a_j$. The penalized log likelihood (6) becomes

$$\ell(Y \mid u; \beta_1, \ldots, \beta_p) - \frac{1}{2} \sum_{j=1}^{p} \lambda_j a_j^T a_j.$$

It follows that the nonparametric natural cubic spline estimators of the βj can be obtained by fitting the parametric linear mixed model

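$$Y = \sum_{j=1}^{p} (\tilde{X}_j U) \gamma_j + \sum_{j=1}^{p} (\tilde{X}_j B) a_j + Z b + \varepsilon, \tag{8}$$

where $\gamma = (\gamma_1^T, \ldots, \gamma_p^T)^T$ is a vector of regression coefficients, $a = (a_1^T, \ldots, a_p^T)^T$ and $b$ are independent random effects with $a \sim N\{0, \Lambda(\tau)\}$, $b \sim N[0, \mathrm{diag}\{D(\theta)\}]$, $\Lambda(\tau) = \mathrm{diag}(\tau_j I)$, $\tau_j = 1/\lambda_j$ and $\tau = (\tau_1, \ldots, \tau_p)^T$, and $\varepsilon_i \sim N\{0, R_i(\theta, u_i)\}$.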

Estimation of γ and a can proceed using the BLUP estimator by solving the normal equation

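$$\begin{bmatrix}
H^T V^{-1} H & H^T V^{-1} G \\
G^T V^{-1} H & G^T V^{-1} G + \Lambda^{-1}
\end{bmatrix}
\begin{bmatrix} \gamma \\ a \end{bmatrix}
=
\begin{bmatrix} H^T V^{-1} Y \\ G^T V^{-1} Y \end{bmatrix}, \tag{9}$$

where $H = (\tilde{X}_1 U, \ldots, \tilde{X}_p U)$ and $G = (\tilde{X}_1 B, \ldots, \tilde{X}_p B)$. Denoting $(\hat{\gamma}, \hat{a})$ as the solution of (9), the natural cubic smoothing spline estimator of $\beta_j$ is $\hat{\beta}_j = U \hat{\gamma}_j + B \hat{a}_j$. One can show that the $\hat{\beta}_j$ from (9) are identical to those obtained from solving (7). The natural cubic spline estimators $\hat{\beta}_j$ are unique when $H$ is of full rank.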
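The equivalence between the penalized system (7) and the mixed model equations (9) can be checked numerically. The sketch below does so for a single coefficient function ($p = 1$) under stated simplifications: simulated data, $V = I$, equally spaced dropout times, and a second-difference penalty standing in for the spline $K$ (it shares the property that constant and linear functions are unpenalized). All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
r, n, lam = 6, 60, 2.0
u0 = np.linspace(0.0, 1.0, r)                  # equally spaced distinct dropout times

# Stand-in penalty K = L L^T built from second differences.
D2 = np.diff(np.eye(r), n=2, axis=0)           # (r-2) x r second-difference operator
L = D2.T                                       # r x (r-2), full column rank
K = L @ L.T
U = np.column_stack([np.ones(r), u0])          # satisfies L^T U = 0 for equal spacing
B = L @ np.linalg.inv(L.T @ L)

pattern = rng.integers(0, r, size=n)
X_tilde = np.eye(r)[pattern]                   # intercept-only step design
y = rng.normal(size=n)
V_inv = np.eye(n)                              # V = I, a simplification

# Direct penalized solve: the p = 1 version of (7).
beta_direct = np.linalg.solve(X_tilde.T @ V_inv @ X_tilde + lam * K,
                              X_tilde.T @ V_inv @ y)

# Mixed model equations: the p = 1 version of (9), with Lambda^{-1} = lam * I.
H, G = X_tilde @ U, X_tilde @ B
C = np.block([[H.T @ V_inv @ H, H.T @ V_inv @ G],
              [G.T @ V_inv @ H, G.T @ V_inv @ G + lam * np.eye(r - 2)]])
rhs = np.concatenate([H.T @ V_inv @ y, G.T @ V_inv @ y])
solution = np.linalg.solve(C, rhs)
gamma_hat, a_hat = solution[:2], solution[2:]
beta_mixed = U @ gamma_hat + B @ a_hat
```

The two coefficient vectors agree up to numerical error, and $\hat{\beta}^T K \hat{\beta} = \hat{a}^T \hat{a}$ holds for the fitted values, which is what allows the roughness penalty to be treated as a random effects variance in standard mixed model software.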

We have so far assumed that the smoothing parameters λ and the variance components θ are known when estimating the βj. They are usually unknown in practice and need to be estimated from the data. Examination of the modified linear mixed model (8) suggests that τ behaves like variance components; therefore, following Zhang et al. (1998), we estimate the smoothing parameters τ and the variance components θ simultaneously using REML by treating τ as extra variance components in addition to θ in (8). The REML estimating equations for θ and τ are

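$$\begin{aligned}
-\frac{1}{2} \mathrm{tr}\left( \tilde{P} \frac{\partial V}{\partial \theta_k} \right) + \frac{1}{2} \Big( Y - \sum_{j=1}^{p} \tilde{X}_j \beta_j \Big)^T V^{-1} \frac{\partial V}{\partial \theta_k} V^{-1} \Big( Y - \sum_{j=1}^{p} \tilde{X}_j \beta_j \Big) &= 0, \\
-\frac{1}{2} \mathrm{tr}\left( \tilde{P} \tilde{X}_j B B^T \tilde{X}_j^T \right) + \frac{1}{2} \Big( Y - \sum_{j=1}^{p} \tilde{X}_j \beta_j \Big)^T V^{-1} \tilde{X}_j B B^T \tilde{X}_j^T V^{-1} \Big( Y - \sum_{j=1}^{p} \tilde{X}_j \beta_j \Big) &= 0,
\end{aligned}$$

where $\tilde{P} = V^{-1} - V^{-1} (H, G) C^{-1} (H, G)^T V^{-1}$, and $C$ is the coefficient matrix on the left-hand side of (9). Parameters from the varying coefficient mixture model (1) can therefore be obtained by fitting the parametric linear mixed model (8) using SAS PROC MIXED (Version 8.2). The smoothing matrix $B$ needs to be computed in advance.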

3.3 Inference for Marginal Regression Coefficients

The marginal jth covariate effect $\tilde{\beta}_j$ is $\tilde{\beta}_j = E_U\{\beta_j(u)\} = \int \beta_j(u) \, dF(u)$, where $F(u)$ is the c.d.f. of $u$. Using the estimated cubic smoothing spline $\hat{\beta}_j(u)$ and the empirical c.d.f. $\hat{F}(u)$, one can estimate $\tilde{\beta}_j$ by $\int \hat{\beta}_j(u) \, d\hat{F}(u) = \hat{\pi}^T \hat{\beta}_j$, where $\hat{\pi} = \sum_{i=1}^{m} N_i / m$. The delta method has been used for standard error estimation for mixture models where the support of U is very small relative to the number of subjects (Hogan and Laird, 1997a; Fitzmaurice and Laird, 2000). In our application with continuous dropout times, we found that the delta method performed poorly; the bootstrap was used instead, treating the subject as the basic resampling unit. Details are provided in the application.
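Resampling whole subjects, rather than individual observations, keeps each subject's repeated measures and dropout time together. The sketch below shows the resampling logic only; the estimator used here (the average of per-subject least-squares slopes) is a deliberately simple stand-in for refitting the mixture model, and the data and function names are hypothetical.

```python
import numpy as np

def subject_bootstrap_se(estimator, subjects, n_boot=100, seed=0):
    """Bootstrap SE of estimator(subjects), resampling whole subjects."""
    rng = np.random.default_rng(seed)
    m = len(subjects)
    replicates = []
    for _ in range(n_boot):
        idx = rng.integers(0, m, size=m)            # draw m subjects with replacement
        replicates.append(estimator([subjects[i] for i in idx]))
    return float(np.std(replicates, ddof=1))

def mean_ols_slope(subjects):
    """Stand-in estimator: average of per-subject least-squares slopes."""
    slopes = [np.polyfit(t, y, 1)[0] for t, y in subjects]
    return float(np.mean(slopes))

# Toy subjects: (times, responses) pairs with 3 to 6 visits each.
rng = np.random.default_rng(42)
subjects = []
for _ in range(40):
    t = np.sort(rng.uniform(0.0, 1.0, size=int(rng.integers(3, 7))))
    y = 5.0 - 2.0 * t + rng.normal(scale=0.5, size=t.size)
    subjects.append((t, y))

se = subject_bootstrap_se(mean_ols_slope, subjects, n_boot=100)
```

In an actual analysis the `estimator` argument would refit the full model and return the marginal coefficient of interest for each resampled dataset.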

4. Application to AIDS Clinical Trial

In this section, we analyze the ACTG data using several different mixture model formulations representing different assumptions about the missing data mechanism and provide detailed interpretation of the model parameters.

4.1 Variable Transformations and Candidate Models

As indicated in Section 1.2, observed CD4 data were transformed to the square root scale to reduce positive skewness. We fit three models (for computational tractability, each model is fit separately by dose). The first is a standard REM with subject-specific intercepts and linear time trends. The REM, briefly summarized in Section 1.2, assumes that the complete data in each treatment arm follow the linear mixed model

$$Y_{il} = \beta_1 + \beta_2 t_{il}^* + b_{1i} + b_{2i} t_{il}^* + \varepsilon_{il}, \tag{10}$$

where $Y_{il}$ is square root of CD4 count at time $t_{il}$, $l = 1, \ldots, n_i$, $b_i = (b_{1i}, b_{2i}) \sim N(0, D)$, and $\varepsilon_{il} \sim N(0, \sigma^2)$ (i.e., $R_i(\theta) = \sigma^2 I_{n_i}$), with independence between $\varepsilon_{il}$ and $b_i$. The time axis is rescaled using $t_{il}^* = (t_{il} - \bar{U}) / \mathrm{range}(t_{il})$ so that the new time scale has range 1 and is centered at the sample mean of dropout times (the range is computed for the pooled sample from both dose arms). This is a standard REM, but can be viewed as a special case of (1) where $\beta_j(u)$ is constant in $u$ for j = 1, 2.
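The rescaling of the time axis amounts to a shift by the mean dropout time followed by division by the pooled range of measurement times. A small sketch with hypothetical visit and dropout weeks (the function name `rescale_time` is ours):

```python
import numpy as np

def rescale_time(t, dropout_times, pooled_times):
    """Center at the mean dropout time and scale by the pooled range of visit times."""
    u_bar = np.mean(dropout_times)
    t_range = np.max(pooled_times) - np.min(pooled_times)
    return (np.asarray(t, dtype=float) - u_bar) / t_range

pooled_times = np.array([0.0, 12.0, 24.0, 100.0, 200.0])   # hypothetical visit weeks
dropout_times = np.array([24.0, 100.0, 200.0])             # hypothetical dropout weeks
t_star = rescale_time(pooled_times, dropout_times, pooled_times)
```

After rescaling, the spread of `t_star` is exactly one unit, so a slope on this scale reads as total change over the full follow-up window, and a time equal to the mean dropout time maps to zero.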

In addition to the REM, we fit a CLM in which individual intercepts and slopes are linear functions of dropout time, and a VCM, where intercepts and slopes are unspecified smooth functions of dropout time. The CLM is precisely model (8) under the assumption that $\tau = 0$ (equivalently, $a_j = 0$ for all j), and therefore it is just a specialized version of the VCM. Separately by treatment, the CLM elaborates (10) such that $\beta_1(u) = \gamma_1 + \gamma_2 u^*$ and $\beta_2(u) = \gamma_3 + \gamma_4 u^*$, where $u^* = (u - \bar{U}) / \mathrm{range}(t_{il})$. In our parameterization with the rescaled time axis, $\gamma_1$ is the mean of $(\mathrm{CD4})^{1/2}$ at $u = \bar{U}$, $\gamma_2$ is the “main effect” of dropout, which is the slope of $\beta_1(u)$, $\gamma_3$ is mean change in $(\mathrm{CD4})^{1/2}$ from baseline at $u = \bar{U}$, and $\gamma_4$ is the effect of interaction between dropout and change from baseline, representing the slope of $\beta_2(u)$ on $u$. These parameters are estimated in straightforward fashion by fitting with SAS PROC MIXED a standard linear mixed model $Y_{ij} = H_{ij}^T \gamma + Z_{ij}^T b_i + \varepsilon_{ij}$, where $H_{ij} = (1, U_{ij}^*, t_{ij}^*, t_{ij}^* U_{ij}^*)^T$ and $Z_{ij} = (1, t_{ij}^*)^T$.

Rescaling the time axis has some practical advantages in terms of model fitting and interpretation: (i) it makes the estimates more stable by increasing the variance of individual slopes away from zero; (ii) for the CLM, the intercept and slope main effect are already averaged over dropout time (Fitzmaurice, Laird, and Shneyer, 2001); (iii) for the REM and CLM, the slope parameter corresponds to average total change in square root CD4 from baseline to the longest follow-up time for the study; and (iv) for the CLM, the parameters $\gamma_2$ and $\gamma_4$ contrast mean intercept and slope, respectively, between those who drop out immediately (directly after $u = 0$) and those who complete the study protocol ($u = \max_{il} t_{il}$).

The VCM uses the same design matrix for $\gamma$ as the CLM. Estimation of the $a_j$ is made by specifying the $r - 2$ columns of $\tilde{X}_1 B$ and $\tilde{X}_2 B$ as independent random effects with respective variances $\tau_1^2$ and $\tau_2^2$.

4.2 Summary of Fitted Models

4.2.1 Parameter estimates

To get a crude understanding of whether MAR may be a valid assumption, we begin by summarizing regression coefficients for parameterizations of f(y | u) given by the REM and CLM; these appear in Table 1. For both models, the parameter $\gamma_3$ quantifies average change from baseline to the maximum follow-up time. Recall that in the CLM, $\beta_1(u) = \gamma_1 + \gamma_2 (u^* - \bar{u^*})$ and $\beta_2(u) = \gamma_3 + \gamma_4 (u^* - \bar{u^*})$, so that if we assume MAR is violated according to linear dependence between intercept and dropout and/or slope and dropout, the parameters $\gamma_2$ and $\gamma_4$ will quantify the degree to which MAR fails to hold. The parameter estimates from Model 1 suggest that intercepts $\beta_1(u)$ vary considerably by dropout time ($\hat{\gamma}_2$ = 14.6 and 13.0 [SE 3.3 and 3.5] for low and high dose, respectively); the same pattern is indicated for slopes $\beta_2(u)$, where $\hat{\gamma}_4$ = 26.1 and 33.7 (SE 4.8 and 6.5) for low and high dose. The parameters $\gamma_2$ and $\gamma_4$ represent the average difference in CD4 intercept ($\gamma_2$) and slope ($\gamma_4$) between those who dropped out immediately after enrolling and those who were followed for the maximum time (220 weeks). Hence the CLM indicates that dropouts have lower CD4 intercepts and slopes than completers, leading to selection bias for end-of-study comparisons due to missing data on less healthy participants. Because dropout time is centered at its mean in the CLM, $\hat{\gamma}_3$ is the expected change in square root CD4 from week 0 to 220 if a subject were followed for the whole study period (note, however, that the standard error is from the conditional distribution of Y given U). A comparison of the estimates of $\gamma_3$ under REM and CLM suggests that the effect of accounting for dropout via CLM is to correct the average change in $(\mathrm{CD4})^{1/2}$ downward.

Table 1. Parameter estimates from conditional part of joint model, assuming linear random effects structure (REM), and conditional linear model (CLM)

| Model | Parameter (a) | AZT dose: Low (90 mg) | AZT dose: High (180 mg) |
|-------|---------------|-----------------------|-------------------------|
| REM   | γ1            | 28.6 (0.8)            | 30.1 (0.9)              |
|       | γ3            | −12.7 (0.8)           | −18.2 (1.4)             |
| CLM   | γ1            | 28.9 (0.8)            | 30.3 (0.8)              |
|       | γ2            | 14.6 (3.3)            | 13.0 (3.5)              |
|       | γ3            | −15.9 (0.9)           | −21.4 (1.4)             |
|       | γ4            | 26.1 (4.8)            | 33.7 (6.5)              |

(a) See Section 4 for definitions.

In the VCM we allow both β1(u) and β2(u) to be completely unspecified for both treatment arms, and estimate them using cubic smoothing splines. Plots of the estimated functionals from the VCM, together with empirical Bayes estimates of individual CD4 intercepts and slopes, appear in Figure 3. In both treatment arms, CD4 intercept and slope are highly associated with dropout time: those who remain in the study for longer periods have higher values of both. Except for the CD4 intercept on the low-dose arm, the associations appear to be highly nonlinear.

Figure 3. Estimated functions β1(u) and β2(u) for low- and high-dose ZDV arms, together with empirical Bayes estimates of individual intercepts and slopes (on square root of CD4 scale). Slopes correspond to change from baseline to week 200. Standardized follow-up times correspond to deviation from the average from the combined sample, and one unit represents 200 weeks.

4.2.2 Comparison of treatment effect inferences across models

Because the complete data likelihood factors over (β, θ) and π, and because our three models differ only in the specification of the conditional factor f(y | u; β, θ), model selection can in principle be based on criteria for the conditional (y | u) model. However, formal model selection procedures comparing parametric and nonparametric mixed effects models are not currently well developed and are beyond the scope of this article; furthermore, there are potential difficulties associated with comparing likelihoods for models that are not properly nested in a traditional way, e.g., due to boundary-value problems.

Table 2 lists estimates and associated standard errors for the marginal regression coefficients β̃j = ∫ βj(u) dF(u), for j = 1, 2, estimated according to the procedure described in Section 3.3 (note that the marginal coefficients—and not the conditional coefficients reported in Table 1—are of direct scientific interest). Standard errors were calculated using bootstrap resampling based on 100 replicated datasets sampled with replacement. Quantile plots of the bootstrap parameter estimates showed no obvious departures from normality (not shown), so Z-statistics are used for inferences. Not surprisingly, standard errors associated with the VCM are increased relative to the CLM, reflecting uncertainty about the functional form of βj(u). For the low-dose arm, the increase is relatively modest. Comparing slopes in the low-dose arm, for example, SE(β̂2) = 1.0 for the CLM and 1.3 for the VCM. No appreciable difference in standard errors is seen in the estimated intercepts.

Table 2. Estimated intercept and slope characterizing marginal mean of CD4 trajectory under three different specifications for conditional part of joint model, with standard errors estimated via bootstrap

| Model | Parameter | Low dose    | High dose   | Difference (SE) | Z    |
|-------|-----------|-------------|-------------|-----------------|------|
| REM   | β1        | 28.6 (0.8)  | 30.1 (0.9)  |                 |      |
|       | β2        | −12.7 (0.8) | −18.2 (1.4) | −5.5 (1.6)      | −3.4 |
| CLM   | β1        | 28.9 (0.8)  | 30.3 (0.9)  |                 |      |
|       | β2        | −15.9 (1.0) | −21.4 (1.8) | −5.5 (2.0)      | −2.8 |
| VCM   | β1        | 29.0 (0.7)  | 29.9 (0.9)  |                 |      |
|       | β2        | −17.1 (1.3) | −20.1 (2.4) | −3.0 (2.7)      | −1.1 |

Table 2 also indicates the degree to which adjusting for selection bias affects the final inferences about treatment. Under the MAR assumption (REM), estimated mean difference in total change in (CD4)1/2 is −5.5, with Z-statistic = −3.4; adjustment under the CLM gives the same estimated effect, with Z-statistic = −2.8. Both lead to the conclusion that low dose is superior to high dose because the decline in CD4 is less steep. Under the VCM, the correction for possible selection biases on low dose changes the slope estimate from −12.7 (REM) to −17.1; the correction is less severe on high dose (−18.2 for REM, compared to −20.1 for VCM); the effect is to narrow the gap in treatment effect to −3.0, with Z = −1.1, representing a change of 1.56 standard errors relative to the REM, and 1.25 standard errors relative to the CLM.
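The treatment-effect summaries quoted above follow from the rounded entries of Table 2 by simple arithmetic, which the sketch below reproduces. Two points are assumptions of this illustration rather than statements from the analysis: the standard error of the difference is compared with the value implied by treating the two dose arms as independent, and small discrepancies are expected because the published figures were presumably computed from unrounded estimates.

```python
import math

# (estimate, SE) for the marginal slope in each arm, and the reported SE of the
# high-minus-low difference, all copied from Table 2.
table2 = {
    "REM": {"low": (-12.7, 0.8), "high": (-18.2, 1.4), "diff_se": 1.6},
    "CLM": {"low": (-15.9, 1.0), "high": (-21.4, 1.8), "diff_se": 2.0},
    "VCM": {"low": (-17.1, 1.3), "high": (-20.1, 2.4), "diff_se": 2.7},
}

summary = {}
for model, row in table2.items():
    diff = row["high"][0] - row["low"][0]
    se_indep = math.hypot(row["low"][1], row["high"][1])   # SE if the arms are independent
    summary[model] = {"diff": diff, "se_indep": se_indep, "z": diff / row["diff_se"]}

# Shift in the estimated treatment difference under the VCM, in units of the
# standard error reported for the REM and for the CLM.
shift_vs_rem = (summary["VCM"]["diff"] - summary["REM"]["diff"]) / table2["REM"]["diff_se"]
shift_vs_clm = (summary["VCM"]["diff"] - summary["CLM"]["diff"]) / table2["CLM"]["diff_se"]
```

The two shifts evaluate to 2.5/1.6 and 2.5/2.0, matching the 1.56 and 1.25 standard errors cited in the text, and the recomputed Z-statistics agree with the tabulated ones to within rounding.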

The VCM also provides an important substantive insight, namely that participants who drop out of low dose (the experimental dose in this trial) tend to have steeper decline in their CD4 counts, compared to those on high dose. The trial was designed to see whether the lower dose, known to be associated with fewer side effects in adults, would have efficacy equal to the high dose. The form of the β2(u) functions suggests that among the early dropouts, rate of change in CD4 for those on low dose is substantially less than for those on high dose. In an MAR analysis, early dropouts contribute less information to the estimate of population slope because they have fewer observed data points, leading to the potential selection bias seen in the REM.

5. Simulation Study

Our model gives the analyst considerable flexibility in specifying dependence between outcome and dropout in the context of a mixture model, and avoids biases that are possible if the functional form of this dependence is assumed to be known. The primary innovation of the VCM over CLM is that β(u) can be left unspecified, but this generalization relies on the key assumption that β(u) is a (vector of) smooth, twice differentiable functions of u. We designed a brief simulation study to investigate the performance of our model under violations of this assumption.

Each simulation uses datasets with n = 100 subjects having up to 15 unique dropout times, with β(u) taking three different forms. We compare estimates of mean change from baseline from a standard REM, the CLM with components of β(u) assumed linear in u, and from the VCM with β(u) left unspecified. Specifically, we assume

$$y_{il} = \beta_1(u_i) + \beta_2(u_i)\,t_{il} + b_{1i} + b_{2i}\,t_{il} + e_{il},$$

where (b1i, b2i)^T ~ N(0, D) and eil ~ N(0, σ^2). There are 15 time points {til}, equally spaced between 0 and 1. This simulation uses d11 = 4, d22 = 0.1, d12 = −0.1 (correlation ≈ −0.16), and σ^2 = 1, which implies that between-subject variation exceeds within-subject variation by a factor of about 4.
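To make the data-generating model concrete, here is a minimal Python sketch. It computes the random-effects correlation implied by D and draws one subject's complete response vector from the model above. The function name `simulate_subject`, the time grid `l/14` (which includes both endpoints), and the seed are illustrative assumptions; the authors' own simulation code is not shown in the paper.

```python
import math
import random

D11, D22, D12 = 4.0, 0.1, -0.1          # elements of D from the text
SIGMA2 = 1.0                             # within-subject error variance
TIMES = [l / 14 for l in range(15)]      # 15 equally spaced points on [0, 1] (assumed grid)

# Correlation between the random intercept and the random slope.
corr = D12 / math.sqrt(D11 * D22)

def simulate_subject(beta1, beta2, rng):
    """Draw y = beta1 + beta2*t + b1 + b2*t + e at all 15 time points."""
    # Cholesky factor of D, written out by hand for the 2 x 2 case.
    l11 = math.sqrt(D11)
    l21 = D12 / l11
    l22 = math.sqrt(D22 - l21 ** 2)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    b1, b2 = l11 * z1, l21 * z1 + l22 * z2
    return [beta1 + beta2 * t + b1 + b2 * t + rng.gauss(0, math.sqrt(SIGMA2))
            for t in TIMES]

rng = random.Random(2004)
y = simulate_subject(beta1=0.0, beta2=-1.0, rng=rng)
print(round(corr, 3), len(y))
```

In a full simulation the responses after a subject's dropout time would be discarded; that truncation step is omitted here.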

Dropout is generated from a beta mixture of binomial distributions as follows: p ~ Beta(1.5, 1.5) (mean 0.5), U* ~ Bin(15, p), and dropout time U = U*/15 ∈ [0, 1]. Finally, we assume β1(u) = 0 and vary the functional form of β2(u); candidate functions are

  (i) −exp(αu),

  (ii) −exp(αu) I(u < t*) − exp(αt*) I(u ≥ t*) (exponential with plateau effect for dropouts beyond t*),

  (iii) α1 I(u < t*) + α2 I(u ≥ t*) (two-piece step function).

Case (i) actually meets the assumptions for the VCM, and is included for validating our simulation and estimation routines; case (ii) violates the smoothness assumption and case (iii) violates both smoothness and continuity assumptions.

For (i), at α = −4, completers (U = 1) have mean change from baseline β2(1) = −exp(−4) ≈ −0.02 and early dropouts (U = 0) have mean change −1, a difference of about 3 SD (because d22 = 0.1, the random-slope SD is √0.1 ≈ 0.32). Under (ii), we keep α = −4 and invoke the plateau effect at t* = 2/3, leading to a structure wherein those who complete 2/3 of the study or more have average change from baseline equal to −exp(−4 × 2/3) ≈ −0.07. For (iii), we keep t* = 2/3 and set α1 = 0, α2 = 1.
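The dropout mechanism and the three candidate forms of β2(u) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, case (ii) is given the same leading minus sign as case (i) (which the negative slopes in Table 3 suggest), and the second indicator in cases (ii) and (iii) is taken to be I(u ≥ t*).

```python
import math
import random

ALPHA, T_STAR = -4.0, 2.0 / 3.0
ALPHA1, ALPHA2 = 0.0, 1.0   # step heights for case (iii), as given in the text

def draw_dropout(rng, n_times=15):
    """U = U*/15 with U* ~ Bin(15, p) and p ~ Beta(1.5, 1.5)."""
    p = rng.betavariate(1.5, 1.5)
    u_star = sum(rng.random() < p for _ in range(n_times))  # binomial draw
    return u_star / n_times

def beta2_smooth(u):
    """Case (i): continuous and smooth."""
    return -math.exp(ALPHA * u)

def beta2_plateau(u):
    """Case (ii): constant once u >= t* (leading sign is an assumption)."""
    return -math.exp(ALPHA * min(u, T_STAR))

def beta2_step(u):
    """Case (iii): two-piece step function."""
    return ALPHA1 if u < T_STAR else ALPHA2

rng = random.Random(0)
dropouts = [draw_dropout(rng) for _ in range(1000)]
print(beta2_smooth(1.0), beta2_plateau(1.0), beta2_step(1.0))
```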

Results are reported in Table 3. As expected, the VCM gives virtually unbiased estimation of the true slope for case (i), where β2(u) is both continuous and smooth, while both the REM and CLM show substantial upward bias. This comparison is not as trivial as it would appear, however, because exploratory plots of OLS slopes versus dropout time (e.g., Figure 2) do not always reveal an obvious functional form for β(u), particularly in the early part of the time axis. One advantage to the VCM is its effectiveness in finding a signal from noisy data.

Table 3. Results from simulation to characterize bias. REM = linear random effects model; CLM = conditional linear model with β2(u) linear; VCM = varying coefficient model with β2(u) unspecified. Each estimate represents a sample average of estimated slopes over 100 replicated datasets, each having 100 subjects with up to 15 repeated measures. Standard errors for simulation-based estimated mean appear in parentheses.

| Underlying model (a) | True slope (β2) | Estimated slope: REM | Estimated slope: CLM | Estimated slope: VCM |
|---|---|---|---|---|
| (i) Continuous, smooth | −0.159 (b) | −0.073 (0.018) | −0.119 (0.035) | −0.160 (0.028) |
| (ii) Continuous, not smooth | −0.170 (b) | −0.062 (0.015) | −0.100 (0.027) | −0.166 (0.033) |
| (iii) Discontinuous | −0.587 (b) | −0.211 (0.018) | −0.622 (0.031) | −0.715 (0.038) |

(a) See Section 5 for model descriptions.

(b) Computed to nearest 0.001 via Monte Carlo simulation.

The VCM shows only very little bias for estimating the true slope for the continuous but not everywhere-differentiable function from case (ii), but exhibits more bias than the CLM for the discontinuous function in case (iii). In all cases, however, the VCM outperforms the REM.
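The bias comparisons in this paragraph follow directly from Table 3. The short sketch below recomputes bias as the estimate minus the true slope. It takes the true slope in case (i) to be −0.159, on the assumption that it is negative like the three estimates in that row. The dictionary layout is only illustrative.

```python
# (true slope, REM, CLM, VCM) from Table 3; the sign of the case (i)
# true slope is an assumption.
table3 = {
    "i":   (-0.159, -0.073, -0.119, -0.160),
    "ii":  (-0.170, -0.062, -0.100, -0.166),
    "iii": (-0.587, -0.211, -0.622, -0.715),
}

# Bias = simulation-average estimate minus true slope, per model and case.
bias = {
    case: {model: round(est - true, 3)
           for model, est in zip(("REM", "CLM", "VCM"), ests)}
    for case, (true, *ests) in table3.items()
}

for case, row in bias.items():
    print(case, row)
```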

6. Discussion

6.1 Summary

We have proposed a mixture-modeling approach to analyzing longitudinal data with outcome-dependent dropout. Our model assumes that covariate effects depend on dropout time through unspecified functions {βj(u)}, where u is dropout time. When dropout times are discrete, the βj(u) are step functions, and when dropout is continuous, βj(u) are assumed to be unspecified smooth functions of u. This formulation generalizes pattern-mixture models (Little, 1993, 1994) and random effects mixture models (Wu and Bailey, 1988, 1989; Hogan and Laird, 1997a; Albert and Follmann, 2000) for continuous response data. Using an example from an AIDS clinical trial, we show that the model has the potential to adjust for selection biases induced by poor responders dropping out early.
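To make the mixture idea concrete for the discrete case: the marginal (population) coefficient is the average of the dropout-specific coefficients βj(u), weighted by the dropout distribution, that is, E{βj(U)} = Σ_u βj(u) P(U = u). The sketch below illustrates this with made-up slopes and dropout proportions; none of the numbers come from the trial, and `marginal_coefficient` is a hypothetical helper.

```python
def marginal_coefficient(beta_by_dropout, prob_by_dropout):
    """Average dropout-specific coefficients over the dropout distribution."""
    total = sum(prob_by_dropout.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("dropout probabilities must sum to 1")
    return sum(beta_by_dropout[u] * prob_by_dropout[u] for u in prob_by_dropout)

# Hypothetical step function beta_2(u) at three discrete dropout times.
beta2 = {1: -3.0, 2: -2.0, 3: -1.0}
# Hypothetical dropout proportions P(U = u).
probs = {1: 0.2, 2: 0.3, 3: 0.5}

marginal_slope = marginal_coefficient(beta2, probs)
print(marginal_slope)
```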

The primary innovation in our approach is that the functional dependence between covariate effects and dropout time can be left unspecified. Our simulation study shows that when this relationship is misspecified, CLMs yield biased estimates, while varying coefficient mixture models still yield unbiased estimates and are more robust. In many applications, this is a decided advantage over the CLM because the form of β(u) rarely will be known or intuitive. Moreover, it is our experience that using polynomials leads to overfitting and/or extrapolations well outside the range of data, particularly when the polynomial has degree >2. When u is continuous, our simulation also shows that inferences are unlikely to be sensitive to lack of smoothness in β(u), but could be affected by discontinuities. In both cases, however, bias is substantially less than under an MAR analysis.

6.2 Strategies for Sensitivity Analysis

Another advantage to mixture modeling in general is that extrapolations of the missing data are transparent, and lend themselves well both to substantive critique and to empirical sensitivity analysis (e.g., Rubin, 1977; Little and Wang, 1996; Daniels and Hogan, 2000; Rotnitzky et al., 2001). The nonidentifiable component of our model is the distribution of missing responses following dropout, f(ymis | yobs, u).

In our data application, for example, we assume dropouts at time u have the same slope for t > u as for t ≤ u; on the surface this is a strong assumption but it is relatively easily modified and is a sensible starting point for sensitivity analysis. Specifically, the VCM in our data example takes the form

$$(Y_{ij} \mid U_i = u) = \beta_{1i}(u) + \beta_{2i}(u)\,t_{ij} + \varepsilon_{ij}, \tag{11}$$

where β1i(u) = β1(u) + b1i, β2i(u) = β2(u) + b2i, bi = (b1i, b2i)^T ~ N{0, D(u)}, and εij ~ N{0, σ^2(u)} is independent of bi. Note that in our application, we assume D(u) = D and σ^2(u) = σ^2, an assumption that can be relaxed by introducing models to characterize the variance components as a function of u (Daniels and Pourahmadi, 2002).

One approach to sensitivity analysis in model (11) is to assume a different slope on time for t > u, i.e., assume a continuous piecewise linear model with the change point u as

$$(Y_{ij} \mid U_i = u) = \beta_{1i}(u) + \beta_{2i}(u)\,t_{ij} + \delta_i(u)\,(t_{ij} - u)_+ + \varepsilon_{ij}, \tag{12}$$

where a+ = a if a > 0 and 0 otherwise, δi(u) = δ(u) + di, and bi* = (b1i, b2i, di)^T ~ N(0, D*), with D* a 3 × 3 covariance matrix for the random effects. Model (12) assumes the slope changes from β2i(u) for t ≤ u to β2i(u) + δi(u) for t > u. The nonidentifiable sensitivity parameters are therefore δ(u) and the variance components comprising the third row (column) of D*. Observed data provide no information about these parameters; one approach is to fix them at various values, recompute quantities of interest (such as expected change in CD4 from beginning to end of the study), and examine their sensitivity with respect to δ(u) and the unidentifiable parameters of D*. Details of the sensitivity analysis will be investigated in future work.
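One way such a sensitivity analysis might be organized is sketched below for the population-level mean only (random effects set to zero): fix δ(u) at a grid of values and recompute the expected change from the start to the end of follow-up under model (12). All numerical values and function names here are hypothetical placeholders, not estimates from the trial.

```python
def mean_response(t, u, beta1, beta2, delta):
    """Population mean under model (12): slope beta2 up to u, beta2 + delta after."""
    return beta1 + beta2 * t + delta * max(t - u, 0.0)

def expected_change(u, beta2, delta, t_end=1.0):
    """Mean change from t = 0 to t = t_end for subjects dropping out at u."""
    return (mean_response(t_end, u, 0.0, beta2, delta)
            - mean_response(0.0, u, 0.0, beta2, delta))

# Hypothetical values: dropout at u = 0.4 with pre-dropout slope -2.
u, beta2 = 0.4, -2.0
for delta in (-1.0, 0.0, 1.0):
    print(delta, expected_change(u, beta2, delta))
```

Setting δ = 0 recovers model (11); a marginal summary would then average these changes over the dropout distribution.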

RÉSUMÉ

The analysis of repeated longitudinal observations is often complicated by missing data resulting from informative dropout. We describe a mixture model for the joint distribution of repeated longitudinal measures in which the dropout distribution may be continuous and the relationship between response and dropout is semiparametric. More precisely, we assume that the responses follow a mixed model conditional on dropout time, whose coefficients depend on dropout through nonparametric functions estimated using step functions when dropout times are discrete (panel studies) and using spline functions when dropout is continuous. Inference from this model is thus more robust than inference from the parametric conditional linear model. The unconditional distribution of the repeated measures is a mixture of conditional distributions. We show that the mixture model can be estimated by fitting a parametric mixed effects model using standard software such as SAS. The model is applied to data from a recent AIDS clinical trial, and its performance is evaluated by simulation.

ACKNOWLEDGEMENTS

Work on this project was funded by grants R01-AI-50505 and P30-AI-42853 (Hogan) and CA76404 (Lin) from the U.S. National Institutes of Health. The authors are grateful to Rusty Tchernis for assistance with computing related to the simulation and implementation of the bootstrap, and to Jason Roy and two anonymous reviewers for helpful comments.

REFERENCES

  1. Albert PS, Follmann D. Modeling repeated count data subject to informative dropout. Biometrics. 2000;56:667–677. doi: 10.1111/j.0006-341x.2000.00667.x.
  2. the Pediatric AIDS clinical trial. Brady MT, McGrath N, Brouwers P, et al. Randomized study of the tolerance and efficacy of high- versus low-dose zidovudine in human immunodeficiency virus-infected children with mild to moderate symptoms (ACTG 128). Journal of Infectious Diseases. 1996;173:1097–1106. doi: 10.1093/infdis/173.5.1097.
  3. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.
  4. Daniels MJ, Hogan JW. Reparameterizing the pattern mixture model for sensitivity analyses under informative dropout. Biometrics. 2000;56:1241–1248. doi: 10.1111/j.0006-341x.2000.01241.x.
  5. Daniels M, Pourahmadi M. Bayesian analysis of covariance matrices and dynamic models for longitudinal data. Biometrika. 2002;89:553–566.
  6. DeGruttola V, Tu XM. Modeling the progression of CD4-lymphocyte count and its relationship to survival time. Biometrics. 1994;50:1003–1014.
  7. Diggle PJ. An approach to the analysis of repeated measurements. Biometrics. 1988;44:959–971.
  8. Diggle P, Kenward MG. Informative drop-out in longitudinal data analysis. Applied Statistics. 1994;43:49–73.
  9. Fitzmaurice GM, Laird NM. Generalized linear mixture models for handling nonignorable dropouts in longitudinal studies. Biostatistics. 2000;1:141–156. doi: 10.1093/biostatistics/1.2.141.
  10. Fitzmaurice GM, Laird NM, Shneyer L. An alternative parameterization of the general linear mixture model for longitudinal data with non-ignorable drop-outs. Statistics in Medicine. 2001;20:1009–1021. doi: 10.1002/sim.718.
  11. Follmann D, Wu MC. An approximate generalized linear model with random effects for informative missing data. Biometrics. 1995;51:151–168.
  12. Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. London: Chapman & Hall; 1994.
  13. Hogan JW, Laird NM. Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine. 1997a;16:239–257. doi: 10.1002/(sici)1097-0258(19970215)16:3<239::aid-sim483>3.0.co;2-x.
  14. Hogan JW, Laird NM. Model-based approaches to analysing incomplete longitudinal and failure time data. Statistics in Medicine. 1997b;16:259–272. doi: 10.1002/(sici)1097-0258(19970215)16:3<259::aid-sim484>3.0.co;2-s.
  15. Kenward MG, Molenberghs G. Parametric models for incomplete continuous and categorical longitudinal data. Statistical Methods in Medical Research. 1999;8:51–83. doi: 10.1177/096228029900800105.
  16. Laird NM. Missing data in longitudinal studies. Statistics in Medicine. 1988;7:305–315. doi: 10.1002/sim.4780070131.
  17. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
  18. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
  19. Little RJA. Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association. 1993;88:125–134.
  20. Little RJA. A class of pattern-mixture models for normal incomplete data. Biometrika. 1994;81:471–483.
  21. Little RJA. Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association. 1995;90:1112–1121.
  22. Little RJA, Wang Y. Pattern-mixture models for multivariate incomplete data with covariates. Biometrics. 1996;52:98–111.
  23. O’Sullivan F, Yandell BS, Raynor WJ Jr. Automatic smoothing of regression functions in generalized linear models. Journal of the American Statistical Association. 1986;81:96–103.
  24. Robins J, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995;90:106–121.
  25. Rotnitzky A, Robins JM, Scharfstein DO. Semiparametric regression for repeated outcomes with non-ignorable non-response. Journal of the American Statistical Association. 1998;93:1321–1339.
  26. Rotnitzky A, Scharfstein D, Su TL, Robins J. Methods for conducting sensitivity analysis of trials with potentially nonignorable competing causes of censoring. Biometrics. 2001;57:103–113. doi: 10.1111/j.0006-341x.2001.00103.x.
  27. Rubin DB. Formalizing subjective notions about the effect of nonrespondents in sample surveys. Journal of the American Statistical Association. 1977;72:538–543.
  28. Scharfstein D, Robins J, Rotnitzky A. Adjusting for nonignorable nonresponse using semiparametric nonresponse models with time dependent covariates (with discussion). Journal of the American Statistical Association. 1999;94:1096–1146.
  29. Schluchter MD. Methods for the analysis of informatively censored longitudinal data. Statistics in Medicine. 1992;11:1861–1870. doi: 10.1002/sim.4780111408.
  30. Ten Have TR, Kunselman AR, Pulkstenis EP, Landis JR. Mixed effects logistic regression models for longitudinal binary response data with informative drop-out. Biometrics. 1998;54:367–383.
  31. Wu MC, Bailey K. Analysing changes in the presence of informative right censoring caused by death and withdrawal. Statistics in Medicine. 1988;7:337–346. doi: 10.1002/sim.4780070134.
  32. Wu MC, Bailey K. Estimation and comparison of changes in the presence of informative right censoring: Conditional linear model (corr: V46 p. 889). Biometrics. 1989;45:939–955.
  33. Wu MC, Carroll RJ. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics. 1988;44:175–188.
  34. Zhang D, Lin X, Raz J, Sowers M. Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association. 1998;93:710–719.
