Mixtures of Varying Coefficient Models for Longitudinal Data with Discrete or Continuous Nonignorable Dropout

Joseph W Hogan; Xihong Lin; Benjamin Herman

doi:10.1111/j.0006-341X.2004.00240.x

Author manuscript; available in PMC: 2009 May 6.

Published in final edited form as: Biometrics. 2004 Dec;60(4):854–864. doi: 10.1111/j.0006-341X.2004.00240.x

PMCID: PMC2677904 NIHMSID: NIHMS78400 PMID: 15606405

SUMMARY

The analysis of longitudinal repeated measures data is frequently complicated by missing data due to informative dropout. We describe a mixture model for the joint distribution of longitudinal repeated measures and dropout time, where the dropout distribution may be continuous and the dependence between response and dropout is semiparametric. Specifically, we assume that responses follow a varying coefficient random effects model conditional on dropout time, where the regression coefficients depend on dropout time through unspecified nonparametric functions that are estimated using step functions when dropout time is discrete (e.g., for panel data) and using smoothing splines when dropout time is continuous. Inference under the proposed semiparametric model is hence more robust than inference under the parametric conditional linear model. The unconditional distribution of the repeated measures is a mixture over the dropout distribution. We show that estimation in the semiparametric varying coefficient mixture model can proceed by fitting a parametric mixed effects model and can be carried out on standard software platforms such as SAS. The model is used to analyze data from a recent AIDS clinical trial and its performance is evaluated using simulations.

Keywords: Clinical trials, Equivalence trial, Linear mixed model, Missing data, Nonignorable dropout, Pattern-mixture model, Pediatric AIDS, Selection bias, Smoothing splines

1. Introduction

1.1 Informative Dropout

Dropout and other types of missing data are common in long-term longitudinal studies; in many cases, dropout induces a missingness process that is nonignorable in the sense that missingness depends probabilistically on unobserved outcomes, even after conditioning on observable information. Most approaches to handling informative dropout in longitudinal data can be viewed as extensions of standard approaches such as multilevel modeling (Laird and Ware, 1982; Diggle, 1988; Breslow and Clayton, 1993) and marginal modeling (Liang and Zeger, 1986). Likelihood-based approaches include selection models (Wu and Carroll, 1988; Diggle and Kenward, 1994; Follman and Wu, 1995; Ten Have et al., 1998) and mixture models (Wu and Bailey, 1989; Little, 1993, 1994; Hogan and Laird, 1997a). Recent comprehensive surveys of parametric and likelihood-based approaches to handling dropout in longitudinal data can be found in Little (1995), Hogan and Laird (1997b), and Kenward and Molenberghs (1999). Moment-based methods also have been generalized to handle informative dropout under the selection modeling framework (see Robins, Rotnitzky, and Zhao, 1995; Rotnitzky, Robins, and Scharfstein, 1998; Scharfstein, Robins, and Rotnitzky, 1999).

In this article, we develop a general mixture modeling approach for continuous longitudinal repeated measures data where measurement times may be irregular across subjects and where dropout might be at continuous times and potentially nonignorable. The conditional distribution of repeated measures given dropout follows a varying coefficient model (VCM) (Zhang et al., 1998) where regression coefficients such as intercepts and slopes depend on dropout through unspecified nonparametric functions. The shapes of the functions are estimated using step functions when dropout time is discrete (e.g., for panel data), and using natural cubic smoothing splines (Green and Silverman, 1994) when dropout is continuous. We show that estimation in the proposed varying coefficient mixture model can proceed by fitting an augmented parametric mixed effects model. The complete data distribution is a mixture of the VCMs over the dropout distribution, and the dropout distribution can be left completely unspecified.

This class of models can be viewed as an extension of pattern-mixture models (Little, 1993, 1994) and conditional linear models (CLM) (Wu and Bailey, 1989; Hogan and Laird, 1997a) described above. The former author mainly considered panel data, while the latter authors allowed for the dropout time to be continuous but assumed the regression coefficients to be parametric functions of the dropout time. Estimation can therefore be biased if the parametric functions are misspecified. The proposed varying coefficient mixture model relaxes the parametric assumption by providing a unified framework to allow for flexible dependence of the covariate effects on dropout patterns by assuming regression coefficients to be nonparametric functions of dropout times. Hence estimation of the covariate effects is more robust to misspecification of the dependence between longitudinal responses and dropout.

1.2 Motivating Example

Protocol 128 of the ACTG (AIDS Clinical Trials Group) was a randomized double-blind equivalency trial of high-dose (180 mg per square meter body surface area, six times daily) versus low-dose (90 mg) zidovudine (ZDV) for HIV-infected children (Brady et al., 1996). The study enrolled 424 children, randomized them to receive one of the two doses, and followed the children on a number of endpoints for up to 5 years. In this article, we are concerned with comparing the longitudinal trajectories of CD4 cell counts between the two dose arms. Children were scheduled for measurement of CD4 count every 12 weeks, but actual measurement times varied considerably. In addition, only about half of the participants completed 3 years of follow-up (113/216 [52%] on low dose, 93/208 [45%] on high dose).

A simple but reasonable approach to analyzing these data is to estimate treatment-group-specific CD4 trajectories using a linear random effects model (REM) (Laird and Ware, 1982). This model provides valid inference under ignorable dropout (Laird, 1988; Diggle and Kenward, 1994; Little, 1995). Basic exploratory analysis suggests that for the observed data, mean square root of CD4 is well described by a linear time trend in both treatment arms (Figure 1). Using the REM, estimated change from baseline to week 200 is −12.7 (SE 0.8) in the low-dose arm, −18.2 (SE 1.4) in the high-dose arm, for a difference of −5.5 (SE 1.6), favoring low dose.

Figure 1. Observed (square root) CD4 counts versus time, stratified by dose, with lowess regression line fit to the pooled sample and five individual profiles highlighted.

To explore the potential for bias due to outcome-related dropout, we plotted estimated individual least-squares slopes versus follow-up time (Figure 2). A clear pattern is evident in both treatment arms, namely that lower slopes are associated with early dropout, which casts some doubt on the ignorable dropout assumption and hence on the validity of estimates from the linear REM, suggesting the need to utilize more elaborate models for addressing potential effects of informative dropout.

Figure 2. Individual-specific OLS slopes for square root of CD4 as a function of follow-up time, stratified by dose.

The remainder of our article is organized as follows. The model is described in Section 2, and estimation procedures are detailed in Section 3; this includes estimation for discrete and continuous dropout times, and for settings with censored dropout times. In Section 4, we apply the proposed model to the clinical trial described in previous sections. Section 5 presents a simulation study to evaluate the bias of the proposed method under departures from underlying assumptions. Summary and discussion follow in Section 6.

2. Mixtures of Varying Coefficient Models for Handling Informative Dropout

Suppose that the data consist of m subjects with the ith subject having n_i observations over time. For the ith subject, let Y_i be an n_i × 1 observed outcome vector, X_i be an n_i × p covariate matrix associated with fixed effects, Z_i be an n_i × q covariate matrix associated with random effects, and U_i be the dropout time. The complete data distribution for the response vector is the mixture obtained by integrating the joint distribution f(y, u) over u. Mixture model approaches, therefore, require specification of f(y | u) and f(u). When dropout times are discrete, it is usual to leave f(u) unspecified and estimate it nonparametrically; for continuous u, it is possible but not always desirable to use a parametric model such as log normal (Schluchter, 1992; DeGruttola and Tu, 1994). Our approach is to leave this marginal distribution unspecified.

To capture the dependence between Y and U, we assume that repeated measurements Y_i for those who drop out at u_i follow the varying coefficient REM

(Y_{i} | U = u_{i}) = X_{i} β (u_{i}) + Z_{i} b_{i} + ε_{i},

(1)

where β(u) = {β₁(u),…, β_p(u)}^T is a p × 1 vector of unknown regression coefficient functions of the dropout time u, b_i is a q × 1 vector of random effects following N{0, D(θ, u_i )}, ε_i is an n_i × 1 vector of residuals following N{0, R_i(θ, u_i )}, and θ is a c × 1 vector of variance components. Note that we allow the covariance matrices D and R to depend on the dropout time u_i. To better understand model (1), we write X_i = (X_i1,…,X_ip), where X_ij is an n_i × 1 vector of the values of the jth covariate measured over time for the ith subject. Equation (1) can be written as

(Y_{i} | U = u_{i}) = \sum_{j = 1}^{p} X_{i j} β_{j} (u_{i}) + Z_{i} b_{i} + ε_{i},

(2)

where β_j(u) represents the jth covariate effect for those who drop out at time u.

In settings such as a panel design, where subjects are observed at prespecified finite time points (panels) and the underlying dropout times are discrete, the β_j(u) are step functions (Little and Wang, 1996; Hogan and Laird, 1997a). If instead subjects are observed at different irregular time points and the underlying dropout times are continuous, we assume that each β_j(u) is an unspecified smooth function. It follows that model (1) allows the covariate effects to vary with dropout times nonparametrically and therefore estimation of the covariate effects can be made more robust.

The varying coefficient specification (1) includes the pattern-mixture model (Little, 1993, 1994), random effects pattern-mixture model (Little, 1995; Hogan and Laird, 1997a), and CLM (Wu and Bailey, 1989; Schluchter, 1992) as special cases. For example, if U has a discrete distribution with finite support, then the β_j(u) are step functions and (1) is a pattern-mixture model. For continuous dropout times, consider the case where X_i = (1, T _i) and T _i = (t_i1,…, t_{in_i} )^T, where t_ik is the kth follow-up time for the ith subject. If β₁(u) and β₂(u) are polynomial functions of u, model (1) reduces to a CLM (Wu and Bailey, 1989). If the β_j(u) are constant in u, then the mixture distribution f(y) has only one component and (1) reduces to a standard REM (Laird and Ware, 1982; Diggle, 1988).

Comparisons to selection models (e.g., Wu and Carroll, 1988; Diggle and Kenward, 1994) can be made by deriving the associated probability of dropout at time u as a function of y. Let $u^{0} = {(u_{1}^{0}, \dots, u_{r - 1}^{0})}^{T}$ denote the set of ordered, unique dropout times, and define an arbitrary time $u_{r}^{0} > u_{r - 1}^{0}$ for the completers. The selection function can be written in closed form as a logistic regression using baseline-category logits, where completers $(u = u_{r}^{0})$ define the baseline category. Let $h_{s} (y_{i}) = pr (U = u_{s}^{0} {| y}_{i}) / pr (U = u_{r}^{0} {| y}_{i})$ , which characterizes the odds of dropping out at time $u_{s}^{0}$ relative to completing the study, given repeated measures y_i. Taking logs,

\log h_{s}(y_{i}) = \log(π_{s} / π_{r}) + \log f(y_{i} | U = u_{s}^{0}) - \log f(y_{i} | U = u_{r}^{0}),

(3)

where $π_{s} = Pr (U_{i} = u_{s}^{0})$ , and

f(y_{i} | U_{i} = u) = (2π)^{-n_{i}/2} |V_{i}(θ, u)|^{-1/2} \exp\left[ -\{y_{i} - X_{i} β(u)\}^{T} V_{i}(θ, u)^{-1} \{y_{i} - X_{i} β(u)\} / 2 \right],

with $V_{i}(θ, u) = Z_{i} D(θ, u) Z_{i}^{T} + R_{i}(θ, u)$. In general, the selection model (3) is quadratic in y_i. A more familiar version obtains when variance is constant across dropout times, i.e., when V_i(θ, u) = V_i(θ). Then (3) simplifies to

\log h_{s}(y_{i}) = w\{π_{s}, π_{r}, β(u_{s}^{0}), β(u_{r}^{0}), V_{i}(θ)\} + \{β(u_{s}^{0}) - β(u_{r}^{0})\}^{T} X_{i}^{T} V_{i}(θ)^{-1} y_{i},

(4)

where w(·) is a function of terms that do not depend on y_i. Under this formulation, the log relative probability of dropout at $u_{s}^{0}$ is linear in y_i, with coefficients that depend on the difference $β(u_{s}^{0}) - β(u_{r}^{0})$ and therefore increase in magnitude as β(u) depends more strongly on u. For binary U, (4) is consistent with the selection model restrictions used by Little (1994) and Little and Wang (1996) to identify parameters in a two-component pattern-mixture model.
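As an informal numerical check of (4), the following Python sketch (not part of the original analysis) evaluates log h_s(y_i) directly from Bayes' rule under a common covariance V_i(θ) and confirms that a change in y_i shifts the log odds linearly, with coefficient vector V_i(θ)⁻¹X_i{β(u_s⁰) − β(u_r⁰)}. All numerical values (visit times, β, D, residual variance, π) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4
t = np.array([0.0, 0.25, 0.5, 0.75])       # hypothetical visit times
X = np.column_stack([np.ones(n), t])        # X_i = (1, T_i)
Z = X.copy()                                # random intercept and slope
D = np.array([[1.0, 0.2], [0.2, 0.5]])      # cov(b_i), assumed constant in u
R = 0.3 * np.eye(n)                         # residual covariance, constant in u
V = Z @ D @ Z.T + R                         # V_i(theta)

beta_s = np.array([25.0, -20.0])            # beta(u_s^0): one dropout pattern
beta_r = np.array([30.0, -12.0])            # beta(u_r^0): completers
pi_s, pi_r = 0.2, 0.5                       # pattern probabilities

def log_normal_kernel(y, mean, V):
    """log N(y; mean, V) up to the additive constant -(n/2) log(2 pi)."""
    resid = y - mean
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * logdet - 0.5 * resid @ np.linalg.solve(V, resid)

def log_odds(y):
    """log h_s(y) computed from Bayes' rule."""
    return (np.log(pi_s / pi_r)
            + log_normal_kernel(y, X @ beta_s, V)
            - log_normal_kernel(y, X @ beta_r, V))

# Coefficient of y_i implied by (4): V^{-1} X {beta(u_s^0) - beta(u_r^0)}
coef = np.linalg.solve(V, X @ (beta_s - beta_r))

y0 = rng.normal(size=n)
delta = rng.normal(size=n)
lhs = log_odds(y0 + delta) - log_odds(y0)   # change in log odds
rhs = coef @ delta                          # change predicted by the linear form
print(np.isclose(lhs, rhs))
```

The intercept term w(·) cancels in the difference, so the check isolates the linear coefficient in y_i.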

3. Estimation Procedures

In this section, we discuss estimation procedures when dropout times U_i are observed for all subjects. Under this circumstance, the likelihood of (Y_i, U_i ) for the ith subject can be partitioned as

L_{i}(Y_{i}, U_{i}; β, θ, π) = L_{i}(Y_{i} | U_{i}; β, θ) L_{i}(U_{i}; π).

It follows that (β, θ) can be estimated by maximizing the conditional likelihood $\prod_{i = 1}^{m} L_{i}(Y_{i} | U_{i}; β, θ)$, and π can be estimated by maximizing the marginal likelihood $\prod_{i = 1}^{m} L_{i}(U_{i}; π)$. We consider estimation procedures first for the situation where subjects are observed at a common set of time points, and then where the set of observation times may be misaligned across subjects.

3.1 Estimation Procedures for Fixed and Common Observation Times

For a fixed design, such as a panel design, subjects are observed at n (often small) prespecified time points (t₁,…, t_n ) and the number of possible dropout times is small. Thus, the nonparametric functions β_j(u) are assumed to be step functions.

Let r be the number of observed distinct values of u, with r ≤ n. As indicated in Section 2, let $u^{0} = (u_{1}^{0}, \dots, u_{r}^{0})^{T}$ be an r × 1 vector of ordered distinct dropout times, where we assume completers take $u_{r}^{0} = t_{n+1}$ for some value $t_{n+1}$. Hence β_j(u) is fully determined by β_j = (β_j1,…, β_jr)^T, where β_jk represents the jth covariate effect for those dropping out at $u_{k}^{0}$. Denote the incidence matrix by N_i = (N_i1,…, N_ir)^T where N_ik = 1 if $u_{i} = u_{k}^{0}$ and 0 otherwise. Some calculations show that model (2) can then be written as a linear mixed model

Y_{i} = \sum_{j = 1}^{p} X_{i j} N_{i}^{T} β_{j} + Z_{i} b_{i} + ε_{i}.

Equivalently, we have

Y = \sum_{j = 1}^{p} {\tilde{X}}_{j} β_{j} + Z b + ε,

(5)

where ${\tilde{X}}_{j} = X_{j} \otimes N, X_{j} = {(X_{1 j}^{T}, \dots, X_{m j}^{T})}^{T}, N = {(N_{1}^{T}, \dots, N_{m}^{T})}^{T}$ , and Y, Z, b, and ε are defined similarly to N. Here A ⊗ B denotes a direct product of matrices A and B.

It follows that estimation of $β = {(β_{1}^{T}, \dots, β_{p}^{T})}^{T}$ proceeds by solving the linear mixed model normal equation

({\tilde{X}}^{T} V^{- 1} \tilde{X}) β = {\tilde{X}}^{T} V^{- 1} Y,

where X̃ = (X̃₁,…, X̃_p), V = diag(V_i), and $V_{i} = cov(Y_{i}) = Z_{i} D Z_{i}^{T} + R_{i}$. The covariance matrix of the estimator of β is (X̃^T V⁻¹ X̃)⁻¹. Estimation of θ can be obtained using the restricted maximum likelihood (REML) estimating equation

- \frac{1}{2} tr (P \frac{\partial V}{\partial θ_{j}}) + \frac{1}{2} {(Y - \tilde{X} β)}^{T} V^{- 1} \frac{\partial V}{\partial θ_{j}} V^{- 1} (Y - \tilde{X} β) = 0,

where P = V⁻¹ − V⁻¹ X̃ (X̃^T V⁻¹ X̃)^{− 1} X̃^T V⁻¹. The (j, k)th component of the information matrix of the estimator of θ is (1/2)tr{P (∂ V/∂θ_j)P (∂V/∂θ_k)}. If D and R depend on u, the identifiability of some components of θ based on the observed data might require some constraints on θ or a sensitivity analysis (Little, 1993; Daniels and Hogan, 2000). For example, consider the case with two time points (n = 2); if dropout is informative, there is no information in the observed data to fit a random intercept and slope model for those who drop out at time 2, and sensitivity analyses are needed.

The necessary and sufficient condition for the β_j to be identifiable from model (5) can be stated as follows:

Let S_k = {i_k1,…,i_{km_k}} denote the indexes of those m_k subjects who drop out at $u_{k}^{0}$ , and $χ_{k} = (X_{i_{k 1}}^{T}, \dots, X_{i_{k m_{k}}}^{T})$ . If rank(χ _k) = p for all k = 1,…, r, then the β_j (j = 1,…, p) are identifiable.

This condition states that when dropout times are discrete and finite, regression coefficients must be separately estimable for each dropout pattern. If some components of β_j are not identifiable for some dropout patterns, then the observed data do not have information about these components and either parameter constraints or sensitivity analyses are needed. For example, consider again the case with two time points (n = 2); if dropout is informative, there is no information in the observed data to estimate the mean of the outcome at time 2 for those who drop out at time 2, and sensitivity analysis is hence needed. See Little and Wang (1996) and Daniels and Hogan (2000) for detailed discussion.
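The fixed-effects step of this section can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it simulates a hypothetical three-pattern panel, builds each subject's block of X̃_j as X_ij N_i^T, checks the rank condition within each pattern, and solves the normal equation for β. The variance components are treated as known (in practice they would be estimated by REML), and all numerical values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

times = np.array([0.0, 1.0, 2.0, 3.0])       # hypothetical panel times
n_visits = [2, 3, 4]                          # visits observed in each of r = 3 patterns
r, p = 3, 2
true_beta = np.array([[10.0, 9.0, 8.0],       # beta_1k: intercept by pattern
                      [-3.0, -2.0, -1.0]])    # beta_2k: slope by pattern
D = np.diag([1.0, 0.25])                      # random-effects covariance, assumed known
sigma2 = 0.5                                  # residual variance, assumed known

A = np.zeros((p * r, p * r))                  # accumulates X~^T V^{-1} X~
c = np.zeros(p * r)                           # accumulates X~^T V^{-1} Y
chi = [[] for _ in range(r)]                  # covariate rows stacked by pattern

for i in range(300):
    k = i % r                                 # dropout pattern of subject i
    t = times[: n_visits[k]]
    X = np.column_stack([np.ones_like(t), t]) # X_i = (X_i1, X_i2); here Z_i = X_i
    N = np.zeros(r)
    N[k] = 1.0                                # incidence vector N_i
    # Subject block of X~ = (X~_1, ..., X~_p); the X~_j block is X_ij N_i^T
    Xt = np.hstack([np.outer(X[:, j], N) for j in range(p)])
    V = X @ D @ X.T + sigma2 * np.eye(len(t)) # V_i = Z_i D Z_i^T + R_i
    y = rng.multivariate_normal(X @ true_beta[:, k], V)
    A += Xt.T @ np.linalg.solve(V, Xt)
    c += Xt.T @ np.linalg.solve(V, y)
    chi[k].append(X)

# Identifiability: stacked covariates must have rank p within every pattern
identifiable = all(np.linalg.matrix_rank(np.vstack(rows)) == p for rows in chi)

beta_hat = np.linalg.solve(A, c).reshape(p, r)  # row j holds (beta_j1, ..., beta_jr)
cov_beta = np.linalg.inv(A)                     # covariance of the estimator
print(identifiable)
print(np.round(beta_hat, 2))
```

If a pattern contained only a single visit per subject, the rank check would fail for that pattern, which is the situation where constraints or sensitivity analyses are required.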

Note that the jth regression coefficient vector β_j measures the jth covariate effect conditional on dropout times. Primary interest is in the marginal jth covariate effect β̃_j = E_U{β_j(u)} = π^Tβ_j, where π = (π₁,…, π_r)^T and $π_{k} = P(U = u_{k}^{0})$ is the probability that a subject drops out at $u_{k}^{0}$. It follows that one can estimate β̃_j by π̂^T β̂_j, where π̂ = (π̂₁,…, π̂_r)^T, π̂_k = m_k / m, m_k is the number of subjects who drop out at $u_{k}^{0}$, and β̂_j is the maximum likelihood estimator from fitting model (5).
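As a small worked example of the marginal estimator π̂^T β̂_j, the Python sketch below uses made-up pattern-specific slope estimates and a made-up assignment of ten subjects to three dropout patterns; none of these numbers come from the trial.

```python
import numpy as np

# Hypothetical pattern-specific estimates beta_hat_jk, one per dropout time u_k^0
beta_hat_j = np.array([-25.0, -18.0, -10.0])

# Hypothetical dropout pattern (0, 1, or 2) of each of m = 10 subjects
pattern = np.array([0, 0, 1, 2, 2, 2, 1, 2, 0, 2])

m_k = np.bincount(pattern, minlength=len(beta_hat_j))  # subjects per pattern
pi_hat = m_k / m_k.sum()                               # pi_hat_k = m_k / m
marginal = pi_hat @ beta_hat_j                         # pi_hat^T beta_hat_j
print(pi_hat, marginal)
```

Here π̂ = (0.3, 0.2, 0.5), so the marginal effect is 0.3(−25) + 0.2(−18) + 0.5(−10) = −16.1.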

3.2 Estimation for Random or Misaligned Measurement Times

For situations when the measurement times are random or misaligned across subjects, repeated measures of each subject are observed at different time points and the underlying dropout times are often continuous. We therefore assume that the regression coefficient functions β_j(u) are twice-differentiable smooth functions and estimate them using cubic smoothing splines. A key feature of cubic smoothing spline estimation is that we can estimate all model components (including the nonparametric regression coefficient functions β_j(u), variance components θ, and smoothing parameters) within an augmented parametric linear mixed model framework (Zhang et al., 1998).

Following the notation in Section 3.1, let r be the number of distinct values of the U_i, $u^{0} = (u_{1}^{0}, \dots, u_{r}^{0})^{T}$ be the ordered distinct values of the U_i, and $β_{j} = \{β_{j}(u_{1}^{0}), \dots, β_{j}(u_{r}^{0})\}^{T}$ be the values of β_j(u) evaluated at u⁰. Given the variance components θ, the conditional log likelihood of β_j given the u_i is

\begin{aligned} ℓ(Y | u; β_{1}, \dots, β_{p}) &= \sum_{i = 1}^{m} \left[ -\frac{1}{2} \ln |V_{i}| - \frac{1}{2} \left\{ Y_{i} - \sum_{j = 1}^{p} X_{i j} β_{j}(u_{i}) \right\}^{T} V_{i}^{-1} \left\{ Y_{i} - \sum_{j = 1}^{p} X_{i j} β_{j}(u_{i}) \right\} \right] \\ &= -\frac{1}{2} \ln |V| - \frac{1}{2} \left( Y - \sum_{j = 1}^{p} \tilde{X}_{j} β_{j} \right)^{T} V^{-1} \left( Y - \sum_{j = 1}^{p} \tilde{X}_{j} β_{j} \right), \end{aligned}

where X̃_j, V_i, V, and Y were defined in Section 3.1.

Following O’Sullivan, Yandell, and Raynor (1986), one can show that the natural cubic smoothing spline estimators of the β_j(u) maximize the following penalized conditional log likelihood

\begin{aligned} & ℓ(Y | u; β_{1}, \dots, β_{p}) - \frac{1}{2} \sum_{j = 1}^{p} λ_{j} \int_{A_{1}}^{A_{2}} \{β_{j}''(u)\}^{2} \, du \\ & \quad = ℓ(Y | u; β_{1}, \dots, β_{p}) - \frac{1}{2} \sum_{j = 1}^{p} λ_{j} β_{j}^{T} K β_{j}, \end{aligned}

(6)

where the λ_j are smoothing parameters controlling the balance between goodness-of-fit and the smoothness of the estimated β_j(u), A₁ and A₂ specify the range of u, and K is the nonnegative definite natural cubic smoothing spline smoothing matrix constructed using u⁰ and defined in Green and Silverman (1994, equation [2.3]).

For fixed smoothing parameters λ = (λ₁,…, λ_p)^T and the variance components θ, differentiation of (6) with respect to (β₁,…, β_p) gives their estimating equations as

\begin{bmatrix} \tilde{X}_{1}^{T} V^{-1} \tilde{X}_{1} + λ_{1} K & \tilde{X}_{1}^{T} V^{-1} \tilde{X}_{2} & \cdots & \tilde{X}_{1}^{T} V^{-1} \tilde{X}_{p} \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{X}_{p}^{T} V^{-1} \tilde{X}_{1} & \tilde{X}_{p}^{T} V^{-1} \tilde{X}_{2} & \cdots & \tilde{X}_{p}^{T} V^{-1} \tilde{X}_{p} + λ_{p} K \end{bmatrix} \begin{bmatrix} β_{1} \\ \vdots \\ β_{p} \end{bmatrix} = \begin{bmatrix} \tilde{X}_{1}^{T} V^{-1} Y \\ \vdots \\ \tilde{X}_{p}^{T} V^{-1} Y \end{bmatrix}.

(7)

One can solve equation (7) using a backfitting algorithm as follows:

{\hat{β}}_{j} = {({\tilde{X}}_{j}^{T} V^{- 1} {\tilde{X}}_{j} + λ_{j} K)}^{- 1} {\tilde{X}}_{j}^{T} V^{- 1} (Y - \sum_{k \neq j} {\tilde{X}}_{k} β_{k})

for j = 1,…, p.
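The backfitting iteration can be sketched in Python as below, and compared against a direct solve of (7). This is an illustrative sketch under several assumptions: a toy design with one distinct dropout time per subject (r = m) and common visit times, p = 2 coefficient functions (intercept and slope), fixed smoothing parameters, and a known V. The penalty K is built as QR⁻¹Q^T from the band matrices Q and R of Green and Silverman (1994); the function name `spline_penalty` is ours.

```python
import numpy as np

def spline_penalty(u):
    """Natural cubic spline penalty K = Q R^{-1} Q^T for ordered distinct knots u."""
    r, h = len(u), np.diff(u)
    Q, R = np.zeros((r, r - 2)), np.zeros((r - 2, r - 2))
    for j in range(r - 2):
        Q[j:j + 3, j] = [1 / h[j], -1 / h[j] - 1 / h[j + 1], 1 / h[j + 1]]
        R[j, j] = (h[j] + h[j + 1]) / 3
        if j + 1 < r - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6
    return Q @ np.linalg.solve(R, Q.T)

rng = np.random.default_rng(2)
m, t = 30, np.array([0.0, 0.2, 0.4, 0.6])       # toy design: r = m distinct dropout times
u0 = np.linspace(0.1, 1.0, m)
K = spline_penalty(u0)

X1 = np.kron(np.eye(m), np.ones((len(t), 1)))   # X~_1: intercept columns
X2 = np.kron(np.eye(m), t[:, None])             # X~_2: time columns
# V^{-1} for a random intercept (variance 0.5) plus noise (variance 0.3), assumed known
Vi = np.kron(np.eye(m), np.linalg.inv(0.5 + 0.3 * np.eye(len(t))))
mean = X1 @ (2 + np.sin(3 * u0)) + X2 @ (-1 + u0 ** 2)
Y = mean + rng.normal(scale=0.5, size=mean.size)  # illustrative noise only
lam = [1e-3, 1e-3]                               # smoothing parameters, held fixed

# Direct solution of the block system (7)
A = np.block([[X1.T @ Vi @ X1 + lam[0] * K, X1.T @ Vi @ X2],
              [X2.T @ Vi @ X1, X2.T @ Vi @ X2 + lam[1] * K]])
rhs = np.concatenate([X1.T @ Vi @ Y, X2.T @ Vi @ Y])
direct = np.linalg.solve(A, rhs).reshape(2, m)

# Backfitting: update each beta_j in turn, holding the others at current values
Xs = [X1, X2]
beta = np.zeros((2, m))
for sweep in range(500):
    old = beta.copy()
    for j in range(2):
        partial = Y - sum(Xs[k] @ beta[k] for k in range(2) if k != j)
        lhs = Xs[j].T @ Vi @ Xs[j] + lam[j] * K
        beta[j] = np.linalg.solve(lhs, Xs[j].T @ Vi @ partial)
    if np.max(np.abs(beta - old)) < 1e-10:
        break
print(sweep, np.max(np.abs(beta - direct)))
```

Because the coefficient matrix in (7) is symmetric positive definite in this toy design, the cyclic updates converge to the direct solution; with strongly collinear covariates more sweeps would be needed.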

Following Zhang et al. (1998), we show that these cubic smoothing spline estimators of the β_j can be obtained by fitting an augmented linear mixed model. Specifically, β_j can be written via a one-to-one transformation as

β_{j} = U γ_{j} + B a_{j},

where U = (1, u⁰), B = L(L^TL)⁻¹, L is an r × (r − 2) full rank matrix satisfying K = LL^T and L^TU = 0, γ_j is a 2 × 1 unknown vector, and a_j is an (r − 2) × 1 unknown vector. It can be shown that $β_{j}^{T} K β_{j} = a_{j}^{T} a_{j}$. The penalized log likelihood (6) becomes

ℓ(Y | u; β_{1}, \dots, β_{p}) - \frac{1}{2} \sum_{j = 1}^{p} λ_{j} a_{j}^{T} a_{j}.

It follows that the nonparametric natural cubic spline estimators of the β_j can be obtained by fitting the parametric linear mixed model

Y = \sum_{j = 1}^{p} ({\tilde{X}}_{j} U) γ_{j} + \sum_{j = 1}^{p} ({\tilde{X}}_{j} B) a_{j} + Z b + ε,

(8)

where $γ = (γ_{1}^{T}, \dots, γ_{p}^{T})^{T}$ is a vector of regression coefficients, $a = (a_{1}^{T}, \dots, a_{p}^{T})^{T}$ and b are independent random effects with a ~ N(0, Λ(τ)), b ~ N(0, diag{D(θ)}), Λ(τ) = diag(τ_jI), τ_j = 1/λ_j and τ = (τ₁,…, τ_p)^T, and ε_i ~ N{0, R_i(θ, u_i)}.
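The reparameterization β_j = Uγ_j + Ba_j can be checked numerically. The Python sketch below is one possible construction, stated as an assumption rather than the authors' code: it takes K = QR⁻¹Q^T with Q and R as in Green and Silverman (1994), sets L = QC⁻ᵀ where R = CC^T is a Cholesky factorization, and verifies K = LL^T, L^TU = 0, and β_j^TKβ_j = a_j^Ta_j for arbitrary γ_j and a_j. The knot values are hypothetical.

```python
import numpy as np

def penalty_factors(u):
    """Band matrices Q (r x (r-2)) and R ((r-2) x (r-2)) for ordered knots u."""
    r, h = len(u), np.diff(u)
    Q, R = np.zeros((r, r - 2)), np.zeros((r - 2, r - 2))
    for j in range(r - 2):
        Q[j:j + 3, j] = [1 / h[j], -1 / h[j] - 1 / h[j + 1], 1 / h[j + 1]]
        R[j, j] = (h[j] + h[j + 1]) / 3
        if j + 1 < r - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6
    return Q, R

u0 = np.array([0.1, 0.2, 0.35, 0.5, 0.8, 1.0])   # hypothetical distinct dropout times
r = len(u0)
Q, R = penalty_factors(u0)
K = Q @ np.linalg.solve(R, Q.T)

C = np.linalg.cholesky(R)                        # R = C C^T
L = Q @ np.linalg.inv(C).T                       # L L^T = Q R^{-1} Q^T = K
U = np.column_stack([np.ones(r), u0])            # U = (1, u^0)
B = L @ np.linalg.inv(L.T @ L)                   # B = L (L^T L)^{-1}

rng = np.random.default_rng(3)
gamma_j = rng.normal(size=2)                     # arbitrary linear part
a_j = rng.normal(size=r - 2)                     # arbitrary spline coefficients
beta_j = U @ gamma_j + B @ a_j

print(np.allclose(L @ L.T, K))                   # K = L L^T
print(np.allclose(L.T @ U, 0.0, atol=1e-9))      # L^T U = 0
print(np.isclose(beta_j @ K @ beta_j, a_j @ a_j))  # penalty equals a_j^T a_j
```

The linear part Uγ_j is unpenalized because KU = 0, which is why γ_j enters (8) as a fixed effect and a_j as a random effect with variance τ_j.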

Estimation of γ and a can proceed using the BLUP estimator by solving the normal equation

\begin{bmatrix} H^{T} V^{-1} H & H^{T} V^{-1} G \\ G^{T} V^{-1} H & G^{T} V^{-1} G + Λ^{-1} \end{bmatrix} \begin{bmatrix} γ \\ a \end{bmatrix} = \begin{bmatrix} H^{T} V^{-1} Y \\ G^{T} V^{-1} Y \end{bmatrix},

(9)

where H = (X̃₁U,…, X̃_pU) and G = (X̃₁B,…, X̃_pB). Denoting (γ̂, â) as the solution of (9), the natural cubic smoothing spline estimator of β_j is β̂_j = Uγ̂_j + Bâ_j. One can show that the β̂_j from (9) are identical to those obtained from solving (7). The natural cubic spline estimators β̂_j are unique when H is of full rank.
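The equivalence between (9) and (7) can be illustrated numerically. The Python sketch below is a toy check under assumptions of our own: one distinct dropout time per subject, common visit times, p = 2, a known V, fixed λ, and arbitrary simulated responses. It solves both systems and compares the resulting β̂_j.

```python
import numpy as np

def penalty_factors(u):
    """Band matrices Q and R of the natural cubic spline penalty K = Q R^{-1} Q^T."""
    r, h = len(u), np.diff(u)
    Q, R = np.zeros((r, r - 2)), np.zeros((r - 2, r - 2))
    for j in range(r - 2):
        Q[j:j + 3, j] = [1 / h[j], -1 / h[j] - 1 / h[j + 1], 1 / h[j + 1]]
        R[j, j] = (h[j] + h[j + 1]) / 3
        if j + 1 < r - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6
    return Q, R

rng = np.random.default_rng(4)
m, t = 12, np.array([0.0, 0.3, 0.6])             # toy design with r = m
u0 = np.linspace(0.1, 1.0, m)
Q, R = penalty_factors(u0)
K = Q @ np.linalg.solve(R, Q.T)
L = Q @ np.linalg.inv(np.linalg.cholesky(R)).T   # K = L L^T
B = L @ np.linalg.inv(L.T @ L)
U = np.column_stack([np.ones(m), u0])

X1 = np.kron(np.eye(m), np.ones((len(t), 1)))    # X~_1
X2 = np.kron(np.eye(m), t[:, None])              # X~_2
Vi = np.kron(np.eye(m), np.linalg.inv(0.5 + 0.3 * np.eye(len(t))))  # V^{-1}, assumed known
Y = rng.normal(size=m * len(t))                  # arbitrary responses
lam = [0.01, 0.02]                               # lambda_j = 1 / tau_j, held fixed

# System (7) in terms of beta_1, beta_2
A7 = np.block([[X1.T @ Vi @ X1 + lam[0] * K, X1.T @ Vi @ X2],
               [X2.T @ Vi @ X1, X2.T @ Vi @ X2 + lam[1] * K]])
b7 = np.concatenate([X1.T @ Vi @ Y, X2.T @ Vi @ Y])
beta7 = np.linalg.solve(A7, b7).reshape(2, m)

# System (9) in terms of gamma and a
H = np.hstack([X1 @ U, X2 @ U])
G = np.hstack([X1 @ B, X2 @ B])
Lam_inv = np.diag(np.repeat(lam, m - 2))         # Lambda^{-1} = diag(lambda_j I)
C9 = np.block([[H.T @ Vi @ H, H.T @ Vi @ G],
               [G.T @ Vi @ H, G.T @ Vi @ G + Lam_inv]])
sol = np.linalg.solve(C9, np.concatenate([H.T @ Vi @ Y, G.T @ Vi @ Y]))
gamma, a = sol[:4].reshape(2, 2), sol[4:].reshape(2, m - 2)
beta9 = np.array([U @ gamma[j] + B @ a[j] for j in range(2)])

print(np.max(np.abs(beta9 - beta7)))
```

In practice the smoothing parameters would not be fixed as here but estimated jointly with θ, as described next.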

We have so far assumed that the smoothing parameters λ and the variance components θ are known when estimating the β_j. They are usually unknown in practice and need to be estimated from the data. Examination of the modified linear mixed model (8) suggests that τ behaves like variance components; therefore, following Zhang et al. (1998), we estimate the smoothing parameters τ and the variance components θ simultaneously using REML by treating τ as extra variance components in addition to θ in (8). The REML estimating equations for θ and τ are

\begin{aligned} -\frac{1}{2} tr\left( \tilde{P} \frac{\partial V}{\partial θ_{k}} \right) + \frac{1}{2} \left( Y - \sum_{l = 1}^{p} \tilde{X}_{l} β_{l} \right)^{T} V^{-1} \frac{\partial V}{\partial θ_{k}} V^{-1} \left( Y - \sum_{l = 1}^{p} \tilde{X}_{l} β_{l} \right) &= 0, \\ -\frac{1}{2} tr\left( \tilde{P} \tilde{X}_{j} B B^{T} \tilde{X}_{j}^{T} \right) + \frac{1}{2} \left( Y - \sum_{l = 1}^{p} \tilde{X}_{l} β_{l} \right)^{T} V^{-1} \tilde{X}_{j} B B^{T} \tilde{X}_{j}^{T} V^{-1} \left( Y - \sum_{l = 1}^{p} \tilde{X}_{l} β_{l} \right) &= 0, \end{aligned}

where P̃ = V⁻¹ − V⁻¹(H, G)C⁻¹(H, G)^TV⁻¹, and C is the coefficient matrix on the left-hand side of (9). Parameters from the varying coefficient mixture model (1) can therefore be obtained by fitting the parametric linear mixed model (8) using SAS PROC MIXED (Version 8.2). The smoothing matrix B needs to be computed in advance.

3.3 Inference for Marginal Regression Coefficients

The marginal jth covariate effect β̃_j is β̃_j = E_U{β_j(u)} = ∫ β_j(u) dF(u), where F(u) is the c.d.f. of u. Using the estimated cubic smoothing spline β̂_j(u) and the empirical c.d.f. F̂(u), one can estimate β̃_j by ∫ β̂_j(u) dF̂(u) = π̂^Tβ̂_j, where $\hat{π} = \sum_{i = 1}^{m} N_{i} / m$. The delta method has been used for standard error estimation for mixture models where the support of U is very small relative to the number of subjects (Hogan and Laird, 1997a; Fitzmaurice and Laird, 2000). In our application with continuous dropout times, we found that the delta method performed poorly; the bootstrap was used instead, treating the subject as the basic resampling unit. Details are provided in the application.
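The subject-level bootstrap can be sketched in Python as follows. The resampling scheme (draw whole subjects with replacement, then recompute both β̂_j(u) and F̂(u) on each replicate) follows the description above. The estimator itself is only a stand-in: a quadratic regression of subject-level OLS slopes on dropout time replaces a refit of model (8), purely to keep the example short. The simulated data and the function names `marginal_slope` and `bootstrap_se` are hypothetical.

```python
import numpy as np

def marginal_slope(subjects):
    """Stand-in estimator (NOT the VCM fit): smooth subject-level OLS slopes over
    dropout time u, then average the fitted curve over the empirical c.d.f. of u."""
    u = np.array([s["u"] for s in subjects])
    slopes = np.array([np.polyfit(s["t"], s["y"], 1)[0] for s in subjects])
    coef = np.polyfit(u, slopes, 2)          # crude proxy for beta_2(u)
    return np.polyval(coef, u).mean()        # integral of beta_hat(u) dF_hat(u)

def bootstrap_se(subjects, estimator, n_boot=100, seed=0):
    """Bootstrap standard error with the subject as the resampling unit."""
    rng = np.random.default_rng(seed)
    m = len(subjects)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, m, size=m)     # sample subjects with replacement
        reps[b] = estimator([subjects[i] for i in idx])
    return reps.std(ddof=1), reps

# Simulated toy data: slopes improve with longer follow-up
rng = np.random.default_rng(3)
subjects = []
for i in range(80):
    u = rng.uniform(0.3, 1.0)                # dropout time
    t = np.linspace(0.0, u, 4)               # visits up to dropout
    slope = -20.0 + 10.0 * u + rng.normal()
    y = 28.0 + slope * t + rng.normal(size=t.size)
    subjects.append({"u": u, "t": t, "y": y})

estimate = marginal_slope(subjects)
se, reps = bootstrap_se(subjects, marginal_slope)
print(round(estimate, 2), round(se, 2))
```

Resampling subjects rather than individual observations preserves the within-subject correlation and lets the uncertainty in F̂(u) propagate into the standard error.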

4. Application to AIDS Clinical Trial

In this section, we analyze the ACTG data using several different mixture model formulations representing different assumptions about the missing data mechanism and provide detailed interpretation of the model parameters.

4.1 Variable Transformations and Candidate Models

As indicated in Section 1.2, observed CD4 data were transformed to the square root scale to reduce positive skewness. We fit three models (for computational tractability, each model is fit separately by dose). The first is a standard REM with subject-specific intercepts and linear time trends. The REM, briefly summarized in Section 1.2, assumes that the complete data in each treatment arm follow the linear mixed model

Y_{i l} = β_{1} + β_{2} t_{i l}^{*} + b_{1 i} + b_{2 i} t_{i l}^{*} + ε_{i l},

(10)

where Y_il is square root of CD4 count at time t_il, l = 1,…, n_i, b_i = (b_1i, b_2i) ~ N(0, D), and ε_il ~ N(0, σ²) (i.e., R_i(θ) = σ²I_{n_i} ), with independence between ε_il and b_i. The time axis is rescaled using $t_{i l}^{*} = (t_{i l} - Ū) / range (t_{i l})$ so that the new time scale has range 1 and is centered at the sample mean of dropout times (the range is computed for the pooled sample from both dose arms). This is a standard REM, but can be viewed as a special case of (1) where β_j(u) is constant in u for j = 1, 2.

In addition to the REM, we fit a CLM in which individual intercepts and slopes are linear functions of dropout time, and a VCM, where intercepts and slopes are unspecified smooth functions of dropout time. The CLM is precisely model (8) under the assumption that τ = 0 (equivalently, a_j = 0 for all j), and therefore it is just a specialized version of the VCM. Separately by treatment, the CLM elaborates (10) such that β₁(u) = γ₁ + γ₂u* and β₂(u) = γ₃ + γ₄u*, where u* = (u − Ū)/range(t_il). In our parameterization with the rescaled time axis, γ₁ is the mean of (CD4)^1/2 at u = Ū, γ₂ is the “main effect” of dropout, which is the slope of β₁(u), γ₃ is mean change in (CD4)^1/2 from baseline at u = Ū, and γ₄ is the effect of interaction between dropout and change from baseline, representing the slope of β₂(u) on u. These parameters are estimated in straightforward fashion by using SAS PROC MIXED to fit a standard linear mixed model $Y_{i j} = H_{i j}^{T} γ + Z_{i j}^{T} b_{i} + ε_{i j}$, where $H_{i j} = (1, U_{i j}^{*}, t_{i j}^{*}, t_{i j}^{*} U_{i j}^{*})^{T}$ and $Z_{i j} = (1, t_{i j}^{*})^{T}$.
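The rescaling and the CLM design rows can be sketched in Python as below. The data are hypothetical (three subjects, visit times in weeks), and we assume that U_ij* is simply the subject's rescaled dropout time repeated on each of that subject's visit rows. Fitting the mixed model itself is left to standard software.

```python
import numpy as np

# Hypothetical long-format data: one row per visit
subject = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
t = np.array([0.0, 12.0, 24.0, 0.0, 12.0, 0.0, 12.0, 24.0, 36.0])  # weeks
u_by_subject = np.array([30.0, 14.0, 40.0])      # dropout time of each subject

u = u_by_subject[subject]                        # dropout time repeated per visit
U_bar = u_by_subject.mean()                      # sample mean of dropout times
t_range = t.max() - t.min()                      # range of pooled measurement times

t_star = (t - U_bar) / t_range                   # rescaled measurement time
u_star = (u - U_bar) / t_range                   # rescaled dropout time

# Fixed-effects rows H_ij = (1, u*, t*, t* u*) and random-effects rows Z_ij = (1, t*)
H = np.column_stack([np.ones_like(t), u_star, t_star, t_star * u_star])
Z = np.column_stack([np.ones_like(t), t_star])
print(H.shape, Z.shape)
```

With γ = (γ₁, γ₂, γ₃, γ₄)^T, the product Hγ reproduces β₁(u) + β₂(u)t* with β₁(u) = γ₁ + γ₂u* and β₂(u) = γ₃ + γ₄u*.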

Rescaling the time axis has some practical advantages in terms of model fitting and interpretation: (i) it makes the estimates more stable by increasing the variance of individual slopes away from zero; (ii) for the CLM, the intercept and slope main effect are already averaged over dropout time (Fitzmaurice, Laird, and Shneyer, 2001); (iii) for the REM and CLM, the slope parameter corresponds to average total change in square root CD4 from baseline to the longest follow-up time for the study; and (iv) for the CLM, the parameters γ₂ and γ₄ contrast mean intercept and slope, respectively, between those who drop out immediately (directly after u = 0) and those who complete the study protocol (u = max_il t_il ).

The VCM uses the same design matrix for γ as the CLM. Estimation of the a_j is made by specifying the r − 2 columns of X̃₁B and X̃₂B as independent random effects with respective variances $τ_{1}^{2}$ and $τ_{2}^{2}$.

4.2 Summary of Fitted Models

4.2.1 Parameter estimates

To get a crude understanding of whether missing at random (MAR) may be a valid assumption, we begin by summarizing regression coefficients for parameterizations of f(y | u) given by the REM and CLM; these appear in Table 1. For both models, the parameter γ₃ quantifies average change from baseline to the maximum follow-up time. Recall that in the CLM, $β_{1}(u) = γ_{1} + γ_{2}(u^{*} - \bar{u}^{*})$ and $β_{2}(u) = γ_{3} + γ_{4}(u^{*} - \bar{u}^{*})$, so that if we assume MAR is violated according to linear dependence between intercept and dropout and/or slope and dropout, the parameters γ₂ and γ₄ will quantify the degree to which MAR fails to hold. The parameter estimates from the CLM suggest that intercepts β₁(u) vary considerably by dropout time (γ̂₂ = 14.6 and 13.0 [SE 3.3 and 3.5] for low and high dose, respectively); the same pattern is indicated for slopes β₂(u), where γ̂₄ = 26.1 and 33.7 (SE 4.8 and 6.5) for low and high dose. The parameters γ₂ and γ₄ represent the average difference in CD4 intercept (γ₂) and slope (γ₄) between those who dropped out immediately after enrolling and those who were followed for the maximum time (220 weeks). Hence the CLM indicates that dropouts have lower CD4 intercepts and slopes than completers, leading to selection bias for end-of-study comparisons due to missing data on less healthy participants. Because dropout time is centered at its mean in the CLM, γ̂₃ is the expected change in square root CD4 from week 0 to 220 if a subject were followed for the whole study period (note, however, that the standard error is from the conditional distribution of Y given U). A comparison of the estimates of γ₃ under REM and CLM suggests that the effect of accounting for dropout via CLM is to correct the average change in (CD4)^1/2 downward.

Table 1.

Parameter estimates from conditional part of joint model, assuming linear random effects structure (REM), and conditional linear model (CLM)

		AZT dose
Model	Parameter^a	Low (90 mg)	High (180 mg)
REM	γ₁	28.6 (0.8)	30.1 (0.9)
	γ₃	−12.7 (0.8)	−18.2 (1.4)
CLM	γ₁	28.9 (0.8)	30.3 (0.8)
	γ₂	14.6 (3.3)	13.0 (3.5)
	γ₃	−15.9 (0.9)	−21.4 (1.4)
	γ₄	26.1 (4.8)	33.7 (6.5)

^a See Section 4 for definitions.

In the VCM we allow both β₁(u) and β₂(u) to be completely unspecified for both treatment arms, and estimate them using cubic smoothing splines. Plots of the estimated functions from the VCM, together with empirical Bayes estimates of individual CD4 intercepts and slopes, appear in Figure 3. In both treatment arms, CD4 intercept and slope are highly associated with dropout time: those who remain in the study for longer periods have higher values of both. Except for the CD4 intercept on the low-dose arm, the associations appear to be highly nonlinear.

Figure 3. Estimated functions β₁(u) and β₂(u) for low- and high-dose ZDV arms, together with empirical Bayes estimates of individual intercepts and slopes (on square root of CD4 scale). Slopes correspond to change from baseline to week 200. Standardized follow-up times correspond to deviation from the average from the combined sample, and one unit represents 200 weeks.

4.2.2 Comparison of treatment effect inferences across models

Because the complete data likelihood factors over (β, θ) and π, and because our three models differ only in the specification of the conditional factor f(y | u; β, θ), model selection can in principle be based on criteria for the conditional (y | u) model. However, formal model selection procedures comparing parametric and nonparametric mixed effects models are not currently well developed and are beyond the scope of this article; furthermore, there are potential difficulties associated with comparing likelihoods for models that are not properly nested in a traditional way, e.g., due to boundary-value problems.

Table 2 lists estimates and associated standard errors for the marginal regression coefficients β̃_j = ∫ β_j(u) dF(u), for j = 1, 2, estimated according to the procedure described in Section 3.3 (note that the marginal coefficients—and not the conditional coefficients reported in Table 1—are of direct scientific interest). Standard errors were calculated using bootstrap resampling based on 100 replicated datasets sampled with replacement. Quantile plots of the bootstrap parameter estimates showed no obvious departures from normality (not shown), so Z-statistics are used for inferences. Not surprisingly, standard errors associated with the VCM are increased relative to the CLM, reflecting uncertainty about the functional form of β_j(u). For the low-dose arm, the increase is relatively modest. Comparing slopes in the low-dose arm, for example, SE(β̂₂) = 1.0 for the CLM and 1.3 for the VCM. No appreciable difference in standard errors is seen in the estimated intercepts.
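The marginalization and bootstrap steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the integral over F(u) is replaced by an average over the empirical dropout distribution, `linear_fit` is a toy stand-in for the actual model fit (REM, CLM, or VCM), and the data are synthetic.

```python
import random
import statistics


def marginal_coefficient(beta_hat, dropout_times):
    """Average beta_hat(u_i) over the empirical dropout distribution."""
    return statistics.fmean(beta_hat(u) for u in dropout_times)


def linear_fit(dropout_times, slopes):
    """Toy stand-in for the model fit: a least-squares line in u."""
    u_bar = statistics.fmean(dropout_times)
    s_bar = statistics.fmean(slopes)
    sxx = sum((u - u_bar) ** 2 for u in dropout_times)
    sxy = sum((u - u_bar) * (s - s_bar) for u, s in zip(dropout_times, slopes))
    b = sxy / sxx
    return lambda u: s_bar + b * (u - u_bar)


def bootstrap_se(dropout_times, slopes, fit, n_boot=100, seed=0):
    """Bootstrap SE of the marginal coefficient, resampling subjects."""
    rng = random.Random(seed)
    n = len(dropout_times)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        u_star = [dropout_times[i] for i in idx]
        s_star = [slopes[i] for i in idx]
        # Refit on the resampled subjects, then re-marginalize.
        estimates.append(marginal_coefficient(fit(u_star, s_star), u_star))
    return statistics.stdev(estimates)


# Synthetic illustration only; these are not the trial data.
rng = random.Random(1)
u_obs = [rng.random() for _ in range(200)]
slope_obs = [-1.0 + 0.8 * u + rng.gauss(0, 0.3) for u in u_obs]
est = marginal_coefficient(linear_fit(u_obs, slope_obs), u_obs)
se = bootstrap_se(u_obs, slope_obs, linear_fit)
print(f"marginal slope = {est:.3f}, bootstrap SE = {se:.3f}")
```

Resampling whole subjects, rather than individual observations, keeps each subject's dropout time paired with that subject's responses, so both the fit and the dropout distribution vary across replicates.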

Table 2.

Estimated intercept and slope characterizing marginal mean of CD4 trajectory under three different specifications for conditional part of joint model, with standard errors estimated via bootstrap

Model	Parameter	Low dose	High dose	Difference (SE)	Z
REM	β₁	28.6 (0.8)	30.1 (0.9)
	β₂	−12.7 (0.8)	−18.2 (1.4)	−5.5 (1.6)	−3.4
CLM	β₁	28.9 (0.8)	30.3 (0.9)
	β₂	−15.9 (1.0)	−21.4 (1.8)	−5.5 (2.0)	−2.8
VCM	β₁	29.0 (0.7)	29.9 (0.9)
	β₂	−17.1 (1.3)	−20.1 (2.4)	−3.0 (2.7)	−1.1


Table 2 also indicates the degree to which adjusting for selection bias affects the final inferences about treatment. Under the MAR assumption (REM), estimated mean difference in total change in (CD4)^1/2 is −5.5, with Z-statistic = −3.4; adjustment under the CLM gives the same estimated effect, with Z-statistic = −2.8. Both lead to the conclusion that low dose is superior to high dose because the decline in CD4 is less steep. Under the VCM, the correction for possible selection biases on low dose changes the slope estimate from −12.7 (REM) to −17.1; the correction is less severe on high dose (−18.2 for REM, compared to −20.1 for VCM); the effect is to narrow the gap in treatment effect to −3.0, with Z = −1.1, representing a change of 1.56 standard errors relative to the REM, and 1.25 standard errors relative to the CLM.
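The arithmetic behind these comparisons can be reproduced directly from the differences and standard errors reported in Table 2. The sketch below is a small check, with hypothetical function names; it uses the tabulated standard errors of the differences rather than recomputing them from the per-arm values.

```python
# Slope differences (high minus low dose) and their bootstrap SEs, from Table 2.
TABLE2 = {"REM": (-5.5, 1.6), "CLM": (-5.5, 2.0), "VCM": (-3.0, 2.7)}


def z_statistic(diff, se):
    """Wald-type Z-statistic for the treatment difference."""
    return diff / se


def shift_in_se_units(model, reference="VCM"):
    """Change in the estimated difference when moving from `model` to
    `reference`, expressed in units of the SE under `model`."""
    diff_ref = TABLE2[reference][0]
    diff, se = TABLE2[model]
    return (diff_ref - diff) / se


for name, (diff, se) in TABLE2.items():
    print(f"{name}: Z = {z_statistic(diff, se):.2f}")

for name in ("REM", "CLM"):
    print(f"VCM vs {name}: shift = {shift_in_se_units(name):.2f} SE")
```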

The VCM also provides an important substantive insight, namely that participants who drop out of low dose (the experimental dose in this trial) tend to have steeper decline in their CD4 counts, compared to those on high dose. The trial was designed to see whether the lower dose, known to be associated with fewer side effects in adults, would have efficacy equal to the high dose. The form of the β₂(u) functions suggests that among the early dropouts, rate of change in CD4 for those on low dose is substantially less than for those on high dose. In an MAR analysis, early dropouts contribute less information to the estimate of population slope because they have fewer observed data points, leading to the potential selection bias seen in the REM.

5. Simulation Study

Our model gives the analyst considerable flexibility in specifying dependence between outcome and dropout in the context of a mixture model, and avoids biases that are possible if the functional form of this dependence is assumed to be known. The primary innovation of the VCM over CLM is that β(u) can be left unspecified, but this generalization relies on the key assumption that β(u) is a (vector of) smooth, twice differentiable functions of u. We designed a brief simulation study to investigate the performance of our model under violations of this assumption.

Each simulation uses datasets with n = 50 subjects having up to 15 unique dropout times, with β(u) taking three different forms. We compare estimates of mean change from baseline from a standard REM, the CLM with components of β(u) assumed linear in u, and from the VCM with β(u) left unspecified. Specifically, we assume

$y_{il} = β_1(u_i) + β_2(u_i) t_{il} + b_{1i} + b_{2i} t_{il} + e_{il},$

where (b_1i, b_2i)^T ~ N(0, D) and e_il ~ N(0, σ²). There are 15 time points {t_il}, equally spaced between 0 and 1. This simulation sets the elements of D to d₁₁ = 4, d₂₂ = 0.1, and d₁₂ = −0.1 (correlation ≈ −0.16), and sets σ² = 1, which implies that between-subject variation exceeds within-subject variation by a factor of about 4.
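As a quick check of these settings, the correlation between the random intercept and slope, and the ratio of between-subject to within-subject variance at time t, follow directly from the stated values of D and σ²:

```latex
\rho = \frac{d_{12}}{\sqrt{d_{11}\, d_{22}}}
     = \frac{-0.1}{\sqrt{4 \times 0.1}} \approx -0.158,
\qquad
\frac{\operatorname{Var}(b_{1i} + b_{2i} t)}{\sigma^2}
  = \frac{d_{11} + 2 t\, d_{12} + t^2 d_{22}}{\sigma^2}
  = 4 - 0.2\, t + 0.1\, t^2 \in [3.9,\, 4]
  \quad \text{for } t \in [0, 1].
```

The variance ratio therefore stays close to 4 over the whole follow-up period.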

Dropout is generated from a beta mixture of binomial distributions as follows: p ~ Beta(1.5, 1.5) (mean 0.5), U* ~ Bin(15, p), and dropout time U = U*/15 ∈ [0, 1]. Finally, we assume β₁(u) = 0 and vary the functional form of β₂(u); candidate functions are

(i) −exp(αu),
(ii) −exp(αu) I(u < t*) − exp(αt*) I(u ≥ t*) (exponential with plateau effect for dropouts beyond t*),
(iii) α₁I(u < t*) + α₂I(u ≥ t*) (two-piece step function).

Case (i) actually meets the assumptions for the VCM, and is included for validating our simulation and estimation routines; case (ii) violates the smoothness assumption and case (iii) violates both smoothness and continuity assumptions.

For (i), at α = −4, completers (U = 1) have mean change from baseline β₂(1) = −exp(−4) ≈ −0.02 and early dropouts (U = 0) have mean change −1, a difference of about 3 SD (because d₂₂ = 0.1). Under (ii), we keep α = −4 and invoke the plateau effect at t* = 2/3, leading to a structure wherein those who complete 2/3 of the study or more have average change from baseline equal to −exp(−4 × 2/3) ≈ −0.07. For (iii), we keep t* = 2/3 and set α₁ = 0, α₂ = 1.
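The data-generating mechanism for case (i) can be sketched as below. This is a minimal illustration rather than the code used for the study: the rule that responses are observed only at times t ≤ u is an assumption made here, and the seed and function names are hypothetical.

```python
import math
import random

N_TIMES = 15
TIMES = [k / (N_TIMES - 1) for k in range(N_TIMES)]  # equally spaced on [0, 1]
D11, D22, D12, SIGMA2 = 4.0, 0.1, -0.1, 1.0


def beta2_case_i(u, alpha=-4.0):
    """Case (i): smooth exponential dependence of slope on dropout time."""
    return -math.exp(alpha * u)


def draw_random_effects(rng):
    """Draw (b1, b2) ~ N(0, D) using the Cholesky factor of the 2 x 2 matrix D."""
    l11 = math.sqrt(D11)
    l21 = D12 / l11
    l22 = math.sqrt(D22 - l21 ** 2)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return l11 * z1, l21 * z1 + l22 * z2


def simulate_subject(rng, beta2=beta2_case_i):
    """Return (u, [(t, y), ...]) for one subject, with beta_1(u) = 0."""
    p = rng.betavariate(1.5, 1.5)
    u_star = sum(rng.random() < p for _ in range(N_TIMES))  # Bin(15, p)
    u = u_star / N_TIMES
    b1, b2 = draw_random_effects(rng)
    sd = math.sqrt(SIGMA2)
    # Assumption: responses are observed only at times t <= u.
    y = [(t, b1 + (beta2(u) + b2) * t + rng.gauss(0, sd))
         for t in TIMES if t <= u]
    return u, y


rng = random.Random(2004)
data = [simulate_subject(rng) for _ in range(50)]
print(f"mean dropout time = {sum(u for u, _ in data) / len(data):.3f}")
```

Passing a different callable as `beta2` would generate data under the other candidate functions.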

Results are reported in Table 3. As expected, the VCM gives virtually unbiased estimation of the true slope for case (i), where β₂(u) is both continuous and smooth, while both the REM and CLM show substantial upward bias. This comparison is not as trivial as it would appear, however, because exploratory plots of OLS slopes versus dropout time (e.g., Figure 2) do not always reveal an obvious functional form for β(u), particularly in the early part of the time axis. One advantage to the VCM is its effectiveness in finding a signal from noisy data.

Table 3.

Results from simulation to characterize bias. REM = linear random effects model; CLM = conditional linear model with β₂ (u) linear; VCM = varying coefficient model with β₂(u) unspecified. Each estimate represents a sample average of estimated slopes over 100 replicated datasets, each having 100 subjects with up to 15 repeated measures. Standard errors for simulation-based estimated mean appear in parentheses.

		Estimated slope
Underlying model^a	True slope (β₂)	REM	CLM	VCM
(i) Continuous, smooth	−0.159^b	−0.073 (0.018)	−0.119 (0.035)	−0.160 (0.028)
(ii) Continuous, not smooth	−0.170^b	−0.062 (0.015)	−0.100 (0.027)	−0.166 (0.033)
(iii) Discontinuous	−0.587^b	−0.211 (0.018)	−0.622 (0.031)	−0.715 (0.038)

^a See Section 5 for model descriptions.

^b Computed to nearest 0.001 via Monte Carlo simulation.

The VCM shows very little bias in estimating the true slope for the continuous but not everywhere-differentiable function from case (ii), but exhibits more bias than the CLM for the discontinuous function in case (iii). In all cases, however, the VCM outperforms the REM.
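The bias comparison can be read off Table 3 by subtracting the true slope from each average estimate. The short sketch below does this, taking the true slope in case (i) to be negative (−0.159), in line with the three estimates for that case; the dictionary layout and function name are illustrative only.

```python
# Average estimated slopes and true slopes from Table 3.
TABLE3 = {
    "(i)":   {"true": -0.159, "REM": -0.073, "CLM": -0.119, "VCM": -0.160},
    "(ii)":  {"true": -0.170, "REM": -0.062, "CLM": -0.100, "VCM": -0.166},
    "(iii)": {"true": -0.587, "REM": -0.211, "CLM": -0.622, "VCM": -0.715},
}


def bias(case, model):
    """Simulation-based bias: average estimate minus true slope."""
    row = TABLE3[case]
    return row[model] - row["true"]


for case in TABLE3:
    summary = ", ".join(
        f"{m}: {bias(case, m):+.3f}" for m in ("REM", "CLM", "VCM")
    )
    print(f"case {case}: {summary}")
```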

6. Discussion

6.1 Summary

We have proposed a mixture-modeling approach to analyzing longitudinal data with outcome-dependent dropout. Our model assumes that covariate effects depend on dropout time through unspecified functions {β_j(u)}, where u is dropout time. When dropout times are discrete, the β_j(u) are step functions, and when dropout is continuous, β_j(u) are assumed to be unspecified smooth functions of u. This formulation generalizes pattern-mixture models (Little, 1993, 1994) and random effects mixture models (Wu and Bailey, 1988, 1989; Hogan and Laird, 1997a; Albert and Follmann, 2000) for continuous response data. Using an example from an AIDS clinical trial, we show that the model has the potential to adjust for selection biases induced by poor responders dropping out early.

The primary innovation in our approach is that the functional dependence between covariate effects and dropout time can be left unspecified. Our simulation study shows that when this relationship is misspecified, CLMs yield biased estimates, while varying coefficient mixture models still yield unbiased estimates and are more robust. In many applications, this is a decided advantage over the CLM because the form of β(u) rarely will be known or intuitive. Moreover, it is our experience that using polynomials leads to overfitting and/or extrapolations well outside the range of data, particularly when the polynomial has degree >2. When u is continuous, our simulation also shows that inferences are unlikely to be sensitive to lack of smoothness in β(u), but could be affected by discontinuities. In both cases, however, bias is substantially less than under an MAR analysis.

6.2 Strategies for Sensitivity Analysis

Another advantage to mixture modeling in general is that extrapolations of the missing data are transparent, and lend themselves well both to substantive critique and to empirical sensitivity analysis (e.g., Rubin, 1977; Little and Wang, 1996; Daniels and Hogan, 2000; Rotnitzky et al., 2001). The nonidentifiable component of our model is the distribution of missing responses following dropout, f(y_mis | y_obs, u).

In our data application, for example, we assume dropouts at time u have the same slope for t > u as for t ≤ u; on the surface this is a strong assumption but it is relatively easily modified and is a sensible starting point for sensitivity analysis. Specifically, the VCM in our data example takes the form

$(Y_{ij} | U_i = u) = β_{1i}(u) + β_{2i}(u) t_{ij} + ε_{ij},$   (11)

where β_1i(u) = β₁(u) + b_1i, β_2i(u) = β₂(u) + b_2i, b_i = (b_1i, b_2i) ^T ~ N {0, D(u)}, and ε_ij ~ N{0, σ²(u)} and is independent of b_i. Note that in our application, we assume D(u) = D and σ²(u) = σ², an assumption that can be relaxed by introducing models to characterize the variance components as a function of u (Daniels and Pourahmadi, 2002).

One approach to sensitivity analysis in model (11) is to assume a different slope on time for t > u, i.e., assume a continuous piecewise linear model with the change point u as

$(Y_{ij} | U_i = u) = β_{1i}(u) + β_{2i}(u) t_{ij} + δ_i(u) (t_{ij} - u)_+ + ε_{ij},$   (12)

where a₊ = a if a > 0 and 0 otherwise, δ_i(u) = δ(u) + d_i, and $b_i^* = (b_{1i}, b_{2i}, d_i)^T \sim N(0, D^*)$, with D* a 3 × 3 covariance matrix for the random effects. Model (12) assumes the slope changes from β_2i(u) for t ≤ u to β_2i(u) + δ_i(u) for t > u. The nonidentifiable sensitivity parameters are therefore δ(u) and the variance components comprising the third row (column) of D*. Observed data provide no information about these parameters; one approach is to fix them at various values and recompute quantities of interest (such as expected change in CD4 from beginning to end of the study) and examine their sensitivity with respect to δ(u) and the unidentifiable parameters of D*. Details of the sensitivity analysis will be investigated in future work.
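One way to picture this strategy is the sketch below, which fixes δ(u) at a few constant values and recomputes the expected change from baseline to the end of follow-up implied by the piecewise linear mean in (12). Everything numerical here is hypothetical: the form of β₂(u), the grid of dropout times, the δ values, and the time scale on [0, 1] are chosen only for illustration, and the random-effect variance components are ignored because they do not enter the mean.

```python
import math
import statistics


def positive_part(a):
    """a_+ = a if a > 0, and 0 otherwise."""
    return a if a > 0 else 0.0


def conditional_change(u, beta2, delta, t_end=1.0):
    """E[Y(t_end) - Y(0) | U = u] under the mean structure in (12)."""
    return beta2(u) * t_end + delta(u) * positive_part(t_end - u)


def marginal_change(dropout_times, beta2, delta, t_end=1.0):
    """Average the conditional change over the empirical dropout distribution."""
    return statistics.fmean(
        conditional_change(u, beta2, delta, t_end) for u in dropout_times
    )


# Hypothetical inputs for illustration only.
dropout_times = [k / 15 for k in range(1, 16)]


def beta2(u):
    return -math.exp(-4.0 * u)


for d in (-0.5, 0.0, 0.5):
    change = marginal_change(dropout_times, beta2, lambda u, d=d: d)
    print(f"delta = {d:+.1f}: expected change = {change:.3f}")
```

Setting δ(u) = 0 recovers the assumption used in the data application, namely that the slope after dropout equals the slope before dropout; completers (u equal to the end of follow-up) are unaffected by δ.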

RÉSUMÉ

The analysis of repeated longitudinal observations is often complicated by missing data resulting from informative dropout. We describe a mixture model for the joint distribution of repeated longitudinal measures in which the dropout distribution may be continuous and the relationship between response and dropout is semiparametric. More precisely, we assume that the responses follow a mixed model conditional on dropout time, whose coefficients depend on dropout through nonparametric functions estimated using step functions when dropout times are discrete (panel studies) and using spline functions when dropout is continuous. Inference from this model is thus more robust than inference from the parametric conditional linear model. The unconditional distribution of the repeated measures is a mixture of conditional distributions. We show that the mixture model can be estimated by fitting a parametric mixed effects model with standard software such as SAS. The model is applied to data from a recent AIDS clinical trial, and its performance is evaluated by simulation.

ACKNOWLEDGEMENTS

Work on this project was funded by grants R01-AI-50505 and P30-AI-42853 (Hogan) and CA76404 (Lin) from the U.S. National Institutes of Health. The authors are grateful to Rusty Tchernis for assistance with computing related to the simulation and implementation of the bootstrap, and to Jason Roy and two anonymous reviewers for helpful comments.

REFERENCES

Albert PS, Follmann D. Modeling repeated count data subject to informative dropout. Biometrics. 2000;56:667–677. doi: 10.1111/j.0006-341x.2000.00667.x.
the Pediatric AIDS clinical trial. Brady MT, McGrath N, Brouwers P, et al. Randomized study of the tolerance and efficacy of high- versus low-dose zidovudine in human immunodeficiency virus-infected children with mild to moderate symptoms (ACTG 128). Journal of Infectious Diseases. 1996;173:1097–1106. doi: 10.1093/infdis/173.5.1097.
Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.
Daniels MJ, Hogan JW. Reparameterizing the pattern mixture model for sensitivity analyses under informative dropout. Biometrics. 2000;56:1241–1248. doi: 10.1111/j.0006-341x.2000.01241.x.
Daniels M, Pourahmadi M. Bayesian analysis of covariance matrices and dynamic models for longitudinal data. Biometrika. 2002;89:553–566.
DeGruttola V, Tu XM. Modeling the progression of CD4-lymphocyte count and its relationship to survival time. Biometrics. 1994;50:1003–1014.
Diggle PJ. An approach to the analysis of repeated measurements. Biometrics. 1988;44:959–971.
Diggle P, Kenward MG. Informative drop-out in longitudinal data analysis. Applied Statistics. 1994;43:49–73.
Fitzmaurice GM, Laird NM. Generalized linear mixture models for handling nonignorable dropouts in longitudinal studies. Biostatistics. 2000;1:141–156. doi: 10.1093/biostatistics/1.2.141.
Fitzmaurice GM, Laird NM, Shneyer L. An alternative parameterization of the general linear mixture model for longitudinal data with non-ignorable drop-outs. Statistics in Medicine. 2001;20:1009–1021. doi: 10.1002/sim.718.
Follmann D, Wu MC. An approximate generalized linear model with random effects for informative missing data. Biometrics. 1995;51:151–168.
Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. London: Chapman & Hall; 1994.
Hogan JW, Laird NM. Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine. 1997a;16:239–257. doi: 10.1002/(sici)1097-0258(19970215)16:3<239::aid-sim483>3.0.co;2-x.
Hogan JW, Laird NM. Model-based approaches to analysing incomplete longitudinal and failure time data. Statistics in Medicine. 1997b;16:259–272. doi: 10.1002/(sici)1097-0258(19970215)16:3<259::aid-sim484>3.0.co;2-s.
Kenward MG, Molenberghs G. Parametric models for incomplete continuous and categorical longitudinal data. Statistical Methods in Medical Research. 1999;8:51–83. doi: 10.1177/096228029900800105.
Laird NM. Missing data in longitudinal studies. Statistics in Medicine. 1988;7:305–315. doi: 10.1002/sim.4780070131.
Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
Little RJA. Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association. 1993;88:125–134.
Little RJA. A class of pattern-mixture models for normal incomplete data. Biometrika. 1994;81:471–483.
Little RJA. Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association. 1995;90:1112–1121.
Little RJA, Wang Y. Pattern-mixture models for multivariate incomplete data with covariates. Biometrics. 1996;52:98–111.
O’Sullivan F, Yandell BS, Raynor WJ Jr. Automatic smoothing of regression functions in generalized linear models. Journal of the American Statistical Association. 1986;81:96–103.
Robins J, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995;90:106–121.
Rotnitzky A, Robins JM, Scharfstein DO. Semiparametric regression for repeated outcomes with non-ignorable non-response. Journal of the American Statistical Association. 1998;93:1321–1339.
Rotnitzky A, Scharfstein D, Su TL, Robins J. Methods for conducting sensitivity analysis of trials with potentially nonignorable competing causes of censoring. Biometrics. 2001;57:103–113. doi: 10.1111/j.0006-341x.2001.00103.x.
Rubin DB. Formalizing subjective notions about the effect of nonrespondents in sample surveys. Journal of the American Statistical Association. 1977;72:538–543.
Scharfstein D, Robins J, Rotnitzky A. Adjusting for nonignorable nonresponse using semiparametric nonresponse models with time dependent covariates (with discussion). Journal of the American Statistical Association. 1999;94:1096–1146.
Schluchter MD. Methods for the analysis of informatively censored longitudinal data. Statistics in Medicine. 1992;11:1861–1870. doi: 10.1002/sim.4780111408.
Ten Have TR, Kunselman AR, Pulkstenis EP, Landis JR. Mixed effects logistic regression models for longitudinal binary response data with informative drop-out. Biometrics. 1998;54:367–383.
Wu MC, Bailey K. Analysing changes in the presence of informative right censoring caused by death and withdrawal. Statistics in Medicine. 1988;7:337–346. doi: 10.1002/sim.4780070134.
Wu MC, Bailey K. Estimation and comparison of changes in the presence of informative right censoring: Conditional linear model (corr: V46 p. 889). Biometrics. 1989;45:939–955.
Wu MC, Carroll RJ. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics. 1988;44:175–188.
Zhang D, Lin X, Raz J, Sowers M. Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association. 1998;93:710–719.

