On the Impact of Parametric Assumptions and Robust Alternatives for Longitudinal Data Analysis

Naiji Lu; Wan Tang; Hua He; Qin Yu; Paul Crits-Christoph; Hui Zhang; Xin Tu

doi:10.1002/bimj.200800186

Author manuscript; available in PMC: 2010 Aug 1.

Published in final edited form as: Biom J. 2009 Aug;51(4):627–643. doi: 10.1002/bimj.200800186

PMCID: PMC2875790 NIHMSID: NIHMS188151 PMID: 19688758

Summary

Models for longitudinal data are employed in a wide range of behavioral, biomedical, psychosocial, and health-care related research. One popular model for continuous response is the linear mixed-effects model (LMM). Although recent simulation studies show that LMM provides reliable estimates under departures from the normality assumption for complete data, the invariable occurrence of missing data in practical studies renders such robustness results less useful when applied to real study data. In this paper, we show by simulation studies that, in the presence of missing data, estimates of the fixed effects of LMM are biased under departures from normality. We discuss two robust alternatives, the weighted generalized estimating equations (WGEE) and the augmented WGEE (AWGEE), and compare their performance with LMM using real as well as simulated data. Our simulation results show that both WGEE and AWGEE provide valid inference for skewed non-normal data when missing data follow the missing at random (MAR) mechanism, the most popular missing data mechanism for real study data.

Keywords: Augmented weighted generalized estimating equations, double robust estimate, missing at random, surrogacy assumption, weighted generalized estimating equations

1 Introduction

Models for longitudinal data are employed in a wide range of behavioral, biomedical, psychosocial, and health-care related research studies. One popular modeling paradigm is the latent variable, or random-effect, based approach for addressing correlated responses arising from such study data (Demidenko, 2004; Fitzmaurice et al. 2004; Raudenbush and Bryk, 2002). For example, for continuous response, the linear mixed-effects model (LMM) is one of the most widely used methods for modeling longitudinal data. By using random-effects to account for correlations across multiple within-subject outcomes, the LMM extends the classic multiple linear regression for modeling cross-sectional data to a general longitudinal data setting.

One major drawback of LMM as well as other random-effects based models is their dependence on distribution assumptions for inference. In recent years, many software packages have implemented robust methods to improve the validity of inference in the presence of departures from assumed parametric models. For example, simulation studies of Maas and Hox (2004; 2005) have found that estimates of the fixed-effects, or population parameters, of LMM are quite robust when model errors depart from the assumed normal distribution. However, these simulation studies have all been carried out under complete data. In most real studies, missing data invariably occurs and the robustness of estimates from LMM requires reassessment in the presence of missing data.

In this paper, we show by simulated data that the robustness of fixed-effect estimates of LMM is compromised under non-random missing data when the error term of the model deviates from the normality assumption. To address this key limitation, we investigate the performance of two robust alternatives, the weighted generalized estimating equations (WGEE) and augmented WGEE (AWGEE), for inference for non-normal longitudinal data. In Section 2, we briefly review the parametric LMM and the distribution-free formulation of this model. In Section 3, we discuss inference by the two dueling paradigms under missing data. In Section 4, we use simulated data to compare the performance of the various approaches for fitting non-normal longitudinal data under non-random missing data. In Section 5, we give our concluding remarks.

2 Models for Longitudinal Data

Over the past three decades, studies in biomedical and behavioral sciences have evolved from simple cross-sectional study designs to modern day longitudinal trials. As longitudinal study designs use subjects as their own controls, they provide a unique opportunity to study changes of outcomes of interest over time, causal effects and disease progression, in addition to providing more power for assessing treatment differences. In this section, we briefly review the two most popular approaches for modeling longitudinal data with continuous response.

2.1 Parametric Linear Mixed-effects Model

First, consider a relatively simple longitudinal study design with only two assessments, i.e., the so-called pre-post design. Let n denote the number of subjects and y_it some continuous outcome of interest from the ith subject at time t (= 1, 2). We are interested in modeling the mean response E(y_it) of y_it over time. One popular approach is the linear mixed-effects model (LMM), which for such a pre-post study design, has the simple form:

y_{it} = β_{0} + β_{1} I_{\{t = 2\}} + b_{i} + ε_{it}, \quad b_{i} \sim N(0, σ_{b}^{2}), \quad ε_{it} \sim N(0, σ^{2}), \quad 1 \leq i \leq n, \; 1 \leq t \leq 2,

(1)

where I_{·} denotes a set indicator, and N(μ, σ²) denotes a normal distribution with mean μ and variance σ². In the LMM above, since E(y_it) = β₀ + β₁ I_{\{t = 2\}}, β₀ and β₀ + β₁ represent the (population) means of y_it at pre- (t = 1) and post-treatment (t = 2), respectively, and β = (β₀, β₁)^⊤ is known as the fixed or population effect. The latent b_i in (1) accounts for the variation across the individual subjects around the fixed effect or population mean and is known as the random effect.

Let y_i = (y_i₁, y_i₂)^⊤. It then follows from (1) that

V = Var(y_{i}) = \begin{pmatrix} σ^{2} + σ_{b}^{2} & σ_{b}^{2} \\ σ_{b}^{2} & σ^{2} + σ_{b}^{2} \end{pmatrix} = (σ^{2} + σ_{b}^{2}) C_{2}(ρ),

(2)

where $ρ = σ_{b}^{2} / (σ_{b}^{2} + σ^{2})$ is known as the intraclass correlation (ICC) and C_m(ρ) denotes an m × m compound symmetry correlation matrix with correlation coefficient ρ. The LMM for the pre-post study design will be used to illustrate the performance of LMM in Section 4.
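As a small numerical illustration of (2), the sketch below builds C_2(ρ) and the implied covariance matrix from the two variance components. The function names and the variance values are hypothetical choices made for illustration only; they do not come from the study.

```python
def compound_symmetry(m, rho):
    """Return the m x m compound symmetry correlation matrix C_m(rho)."""
    return [[1.0 if s == t else rho for t in range(m)] for s in range(m)]


def prepost_covariance(sigma2_b, sigma2):
    """Return (ICC, V) for (y_i1, y_i2) under model (1), built as in (2)."""
    total = sigma2_b + sigma2          # common marginal variance
    rho = sigma2_b / total             # intraclass correlation
    corr = compound_symmetry(2, rho)
    return rho, [[total * c for c in row] for row in corr]


# Hypothetical variance components, chosen only for illustration.
rho, V = prepost_covariance(sigma2_b=2.0, sigma2=1.0)
print("ICC:", rho)
print("V:", V)
```

The off-diagonal entry of V equals the random-effect variance, which is why a larger between-subject variance translates directly into a larger ICC.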

More generally, the LMM for longitudinal data with m assessments has the following form:

y_{it} = x_{it}^{⊤} β + z_{it}^{⊤} b_{i} + ε_{it}, \quad b_{i} \sim N(0, Σ_{b}), \quad ε_{i} \sim N(0, σ^{2} I_{m}), \quad 1 \leq i \leq n, \; 1 \leq t \leq m,

(3)

where x_it = (1, x_{i1t}, …, x_{ipt})^⊤ (z_it = (1, z_{i1t}, …, z_{ilt})^⊤) denotes a p × 1 (l × 1) vector of covariates, b_i = (b_{i1}, …, b_{il})^⊤ an l × 1 vector of normal random effects, ε_i = (ε_{i1}, …, ε_{im})^⊤ the error term for the model, N(μ, Σ) a multivariate normal with mean μ and variance Σ, and I_m the m × m identity matrix. Under (3), with Z_i = (z_{i1}, …, z_{im})^⊤, we have:

V_{i} = Var(y_{i} | x_{i}, z_{i}) = Z_{i} Σ_{b} Z_{i}^{⊤} + σ^{2} I_{m} .

(4)

By setting $x_{it} = {(1, I_{\{t = 2\}})}^{⊤}$, z_it = 1 and Σ_b = σ_b² in (4), we immediately obtain (2).
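For readers who want the intermediate step, the reduction from (4) to (2) can be written out as follows. It uses only the quantities defined above, with Z_i denoting the stacked z_it, which for the pre-post design is a column of ones.

```latex
Z_i = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \Sigma_b = \sigma_b^2,
\qquad
V_i = Z_i \Sigma_b Z_i^{\top} + \sigma^2 I_2
    = \sigma_b^2 \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
    + \sigma^2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
    = \begin{pmatrix} \sigma^2 + \sigma_b^2 & \sigma_b^2 \\
                      \sigma_b^2 & \sigma^2 + \sigma_b^2 \end{pmatrix}.
```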

Maximum likelihood (ML) is the most popular inference procedure for LMM (Demidenko, 2004; Raudenbush and Bryk, 2002). One major drawback of ML estimate is the dependence on the parametric assumptions; if data does not follow the normality assumptions in (1), model estimates may become biased or inconsistent. In recent years, many packages have adopted the sandwich variance estimate to address this issue (Goldstein, 1995; Rasbash et al., 2000; Raudenbush and Bryk, 2002). In this case, these procedures essentially yield variance estimates equivalent to those from the generalized estimating equations (GEE), which we review next.

2.2 Distribution-free Linear Model

Since the seminal work of Liang and Zeger (1986), the generalized estimating equations (GEE) approach has been widely used as an alternative for modeling longitudinal data. By modeling the marginal mean of the response at each assessment time, GEE eliminates both layers of distribution assumptions for the random effect and error term, thereby providing consistent estimates regardless of data distributions and the complexities of structure of correlated responses.

Within our context, consider the following linear model:

E (y_{i t} | x_{i t}) = μ_{i t} = x_{i t}^{⊤} β, 1 \leq i \leq n, 1 \leq t \leq m .

(5)

In (5), only the marginal mean at each time t is specified, which models the fixed-effect of LMM in (1) at time t. Under GEE, inference is based on the following score-like vector equation:

W_{n} (β) = \sum_{i = 1}^{n} W_{n i} (β) = \sum_{i = 1}^{n} G_{i} (x_{i}) S_{i} (β) = \sum_{i = 1}^{n} G_{i} (x_{i}) (y_{i} - μ_{i}) = 0,

(6)

where μ_i = (μ_{i1}, …, μ_{im})^⊤ and G_i(x_i) is some matrix function of $x_{i} = {(x_{i1}^{⊤}, \dots, x_{im}^{⊤})}^{⊤}$. If the model in (5) is correct, then the GEE in (6) is unbiased, i.e., E[W_n(β)] = 0, regardless of the choice of G_i(x_i), ensuring the consistency of the estimate of β obtained as the solution to (6) (Liang and Zeger, 1986; Diggle et al. 2002). In most applications, G_i(x_i) has the form:

G_{i} (x_{i}) = D_{i} V^{- 1} (α), D_{i} = \frac{\partial}{\partial β} μ_{i}, 1 \leq i \leq n,

(7)

where V (α) denotes a working variance matrix parameterized by α.

The phrase working variance is used to emphasize the fact that V(α) is not necessarily the true variance matrix of y_i. For example, the simplest choice is V = diag_t(Var(y_it)), where diag_t(α_t) denotes a diagonal matrix with α_t on the tth diagonal. In this case, the correlated responses y_it are treated as if they were independent, and there is no parameter α associated with this working independence model. Another popular choice is the exchangeable, or uniform compound symmetry, correlation model, $V(α) = {diag}_{t}(\sqrt{Var(y_{it})}) C_{m}(ρ) {diag}_{t}(\sqrt{Var(y_{it})})$, where C_m(ρ) denotes an m × m correlation matrix with a common correlation ρ for any pair of the component responses of y_i.

In addition to β, (6) also depends on α, though we have suppressed this dependence to highlight the fact that (6) is the equation for estimating β. Thus, before proceeding with inference about β, α must be estimated (except for the working independence model). Although the consistency of β̂ does not depend on how α is estimated, judicious choices of the type of estimates of α are required to ensure the asymptotic normality of β̂. In particular, if α̂ is $\sqrt{n}$ -consistent, β̂ is asymptotically normal with its asymptotic variance Σ_β given by (e.g. Liang and Zeger, 1986; Kowalski and Tu, Chap. 4, 2007):

Σ_{β} = B^{-1} E(G_{i} S_{i} S_{i}^{⊤} G_{i}^{⊤}) B^{-⊤}, \quad B = E(\frac{\partial^{⊤}}{\partial β} W_{ni}(β, α)) .

A consistent estimate of Σ_β is obtained by substituting consistent estimates in place of the respective quantities above. For example, we can estimate B by $\hat{B} = \sum_{i = 1}^{n} \frac{\partial^{⊤}}{\partial β} W_{n i} (\hat{β}, \hat{α}) / n$ . Since moment estimates are $\sqrt{n}$ -consistent, α is readily estimated for the working independence and exchangeable correlation models (e.g. Liang and Zeger, 1986).
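To make the sandwich formula concrete, the following sketch solves the GEE in (6) for the pre-post mean model E(y_it) = β₀ + β₁ I_{t=2} under working independence with complete data, and then assembles the sandwich variance. It is a minimal illustration under stated assumptions, not the authors' code: the function name, the simulated data (exponential errors, chosen to be skewed), and all parameter values are hypothetical.

```python
import random


def gee_prepost_independence(y1, y2):
    """Solve (6) for E(y_it) = b0 + b1*I{t=2} with V = identity.

    Returns (b0, b1, se_b0, se_b1) using the sandwich variance estimate.
    """
    n = len(y1)
    # With X_i = [[1, 0], [1, 1]] the GEE is the least-squares normal
    # equation, so the solution has a closed form.
    b0 = sum(y1) / n
    b1 = sum(y2) / n - b0
    # Per-subject bread X_i^T X_i = [[2, 1], [1, 1]]; its inverse is below.
    # The common sign of B cancels in the sandwich and is dropped.
    b_inv = [[1.0, -1.0], [-1.0, 2.0]]
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for a, b in zip(y1, y2):
        e1, e2 = a - b0, b - (b0 + b1)      # residual vector S_i
        g = (e1 + e2, e2)                   # X_i^T S_i
        for r in range(2):
            for c in range(2):
                meat[r][c] += g[r] * g[c] / n
    # Sigma_beta = B^{-1} * meat * B^{-T}; Var(beta_hat) is Sigma_beta / n.
    tmp = [[sum(b_inv[r][k] * meat[k][c] for k in range(2))
            for c in range(2)] for r in range(2)]
    cov = [[sum(tmp[r][k] * b_inv[c][k] for k in range(2)) / n
            for c in range(2)] for r in range(2)]
    return b0, b1, cov[0][0] ** 0.5, cov[1][1] ** 0.5


# Hypothetical complete data with a normal random effect and skewed errors.
rng = random.Random(1)
y1, y2 = [], []
for _ in range(500):
    b = rng.gauss(0.0, 1.0)
    y1.append(2.0 + b + rng.expovariate(1.0) - 1.0)
    y2.append(2.5 + b + rng.expovariate(1.0) - 1.0)

b0, b1, se0, se1 = gee_prepost_independence(y1, y2)
print(b0, b1, se0, se1)
```

In this special case the sandwich variance of the slope reduces to the empirical variance of the within-subject differences y_i2 − y_i1 divided by n, so the within-subject correlation is accounted for even though the working model ignores it.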

3 Inference under Missing Data

In longitudinal studies, missing data are inevitable; subjects may simply quit or they may not show up at follow-up visits. We characterize the impact of missing data on model estimates through assumptions or missing data mechanisms, which allow us to ignore the multitude of reasons for missing data and focus on addressing their impact on estimation of model parameters. The missing completely at random (MCAR) assumption models a class of missing data that does not affect model estimates when completely ignored. For example, in a treatment study, missing data resulting from patient's relocation or scheduling conflict falls into this category. However, MCAR is not a plausible model when missing data are associated with treatment interventions such as patients' deteriorated or improved health conditions due to treatment. By modeling the occurrence of missing data as a function of observed responses prior to the assessment point, the missing at random (MAR) assumption addresses this class of treatment related or response-dependent missing data.

Within the longitudinal study setting in Section 2, define a missing (or rather, observed) data indicator as:

r_{it} = \begin{cases} 1 & \text{if } y_{it} \text{ is observed} \\ 0 & \text{if } y_{it} \text{ is missing} \end{cases}, \quad r_{i} = {(r_{i1}, \dots, r_{im})}^{⊤}, \quad 1 \leq t \leq m, \; 1 \leq i \leq n .

We assume no missing data at baseline t = 1 such that r_i₁ = 1 for all 1 ≤ i ≤ n. Below, we first briefly review inference for the parametric LMM in (3) and then turn our attention to the distribution-free version in (5).

3.1 Parametric Model

Let y_i = (y_i₁, …, y_im)^┬ and $x_{i} = {(x_{i 1}^{⊤}, \dots, x_{i m}^{⊤})}^{⊤}$ . Let $y_{i}^{o}$ and $y_{i}^{m}$ denote the observed and missing responses, respectively. Under likelihood based parametric inference, we jointly model the response y_i and missing data indicator r_i.

The joint density function, f (y_i, r_i | x_i), can be factored into the product of marginal and conditional distributions:

f (y_{i}, r_{i} | x_{i}) = f (y_{i} | x_{i}) f (r_{i} | y_{i}, x_{i}) .

(8)

Under MAR, the distribution of r_i depends only on the observed response, $y_{i}^{o}$ , and thus:

f (r_{i} | y_{i}, x_{i}) = f (r_{i} | y_{i}^{o}, y_{i}^{m}, x_{i}) = f (r_{i} | y_{i}^{o}, x_{i}) .

(9)

It follows from (8) and (9) that:

\begin{aligned} f(y_{i}^{o}, r_{i} | x_{i}) & = \int f(y_{i}^{o}, y_{i}^{m} | x_{i}) f(r_{i} | y_{i}^{o}, x_{i}) \, d y_{i}^{m} \\ & = f(r_{i} | y_{i}^{o}, x_{i}, θ_{y|r}) \int f(y_{i}^{o}, y_{i}^{m} | x_{i}) \, d y_{i}^{m} \\ & = f(y_{i}^{o} | x_{i}, θ_{y}) f(r_{i} | y_{i}^{o}, x_{i}, θ_{y|r}) . \end{aligned}

(10)

If θ_y and θ_{y|r} are assumed disjoint, then following (10) the log-likelihood based on the joint observations $(y_{i}^{o}, r_{i})$ is given by:

l(θ) = \sum_{i=1}^{n} \log f(y_{i}^{o} | x_{i}, θ_{y}) + \sum_{i=1}^{n} \log f(r_{i} | y_{i}^{o}, x_{i}, θ_{y|r}) = l_{1}(θ_{y}) + l_{2}(θ_{y|r}) .

Thus, inference about θ_y can simply be based on the log-likelihood l₁ (θ_y).

Most packages provide inference about the parameters of interest θ_y based on maximizing the likelihood function l₁ (θ_y). Under the model assumptions of LMM, estimates are consistent under both MCAR and MAR. When study data fail to follow the parametric assumptions, maximum likelihood estimates are no longer guaranteed to be consistent. We examine bias from such estimates using simulated data in Section 4.

3.2 Distribution-free Model

3.2.1 Weighted Generalized Estimating Equations

In the presence of missing data, we may apply the GEE in (6) to the observed responses, i.e.,

W_{n}(β) = \sum_{i=1}^{n} W_{ni}(β) = \sum_{i=1}^{n} G_{i}(x_{i}) Δ_{i} S_{i} = \sum_{i=1}^{n} G_{i}(x_{i}) Δ_{i} (y_{i} - μ_{i}) = 0,

(11)

where Δ_i = diag_t(r_it). However, the vector estimating equation in (11) is generally biased, i.e., E(W_n(β)) ≠ 0, unless missing data follow the MCAR model. To obtain consistent estimates of β under MAR, we must revise the GEE above.

To illustrate the basic idea for modification, consider the relatively simple pre-post design with a homogeneous sample. We are interested in estimating the mean response at pre- and post-assessment, μ = E(y_i) = (E(y_i₁), E(y_i₂))^⊤. By selecting G_i(x_i) according to (7), it follows from (11) that

W_{n}(μ) = \sum_{i=1}^{n} R{(α)}^{-1} \begin{pmatrix} 1 & 0 \\ 0 & r_{i2} \end{pmatrix} \begin{pmatrix} y_{i1} - μ_{1} \\ y_{i2} - μ_{2} \end{pmatrix} = 0 .

(12)

Solving the equations above for μ yields:

\hat{μ} = {({\hat{μ}}_{1}, {\hat{μ}}_{2})}^{⊤} = {(\frac{1}{n} \sum_{i = 1}^{n} y_{i 1}, \frac{1}{n_{2}} \sum_{i = 1}^{n} r_{i 2} y_{i 2})}^{⊤}, n_{2} = \sum_{i = 1}^{n} r_{i 2} .

(13)

If the missingness of y_i₂ depends on y_i₁, it is readily checked that E(μ̂₂) ≠ μ₂, implying that μ̂₂ is not a consistent estimate. This is also clear on intuitive grounds. For example, if y_i₁ and y_i₂ are positively correlated with higher values of y_i₁ leading to missing y_i₂, μ̂₂ in (13) will be downwardly biased, since it only averages over the observed y_i₂ corresponding to lower values of y_i₁. In treatment studies, this type of response-dependent missingness often occurs if a patient feels that his/her health condition has improved (or deteriorated) during study and decides not to undergo any additional treatment.

Under MAR, the missingness of y_i₂ only depends on y_i₁, i.e.,

π_{i 2} = P (r_{i 2} = 1 | y_{i 1}, y_{i 2}) = P (r_{i 2} = 1 | y_{i 1}) .

This probability π_i₂ selects which y_i2's are to be observed based on the values of y_i₁. Thus, each ith subject observed at t = 2 represents a subgroup of 1/π_i₂ subjects with the same baseline value y_i₁, but unobserved at post-treatment because of the selection process defined by π_i₂. By augmenting each observed response y_i₂ at t = 2 with the weight function 1/π_i₂, we can statistically include the missing responses in the estimation of μ₂ by using a weighted GEE (WGEE):

W_{n}(μ) = \sum_{i=1}^{n} R{(α)}^{-1} \begin{pmatrix} 1 & 0 \\ 0 & \frac{r_{i2}}{π_{i2}} \end{pmatrix} \begin{pmatrix} y_{i1} - μ_{1} \\ y_{i2} - μ_{2} \end{pmatrix} = 0 .

(14)

It is readily checked that E(W_n(μ)) = 0, enabling (14) to yield a consistent estimate of μ₂, $\hat{μ}_{2} = (\sum_{i=1}^{n} \frac{r_{i2}}{π_{i2}} y_{i2}) / n$ (e.g. Kowalski and Tu, Chap. 4, 2007). We can also directly verify this:

{\hat{μ}}_{2} = \frac{1}{n} \sum_{i = 1}^{n} \frac{r_{i 2}}{π_{i 2}} y_{i 2} \to_{p} E (\frac{r_{i 2}}{π_{i 2}} y_{i 2}) = E [y_{i 2} \frac{1}{π_{i 2}} E (r_{i 2} | y_{i 1}, y_{i 2})] = E [y_{i 2} \frac{1}{π_{i 2}} E (r_{i 2} | y_{i 1})] = μ_{2},

where →_p denotes convergence in probability.
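The contrast between the complete-case mean in (13) and the inverse-probability-weighted mean from (14) can be checked numerically. The sketch below is only an illustration under assumed settings: the data-generating model, the logistic form of π_i₂, the seed, and the function names are all hypothetical, and π_i₂ is treated as known rather than estimated.

```python
import math
import random


def simulate_prepost(n, rng):
    """Pre-post data with a shared random effect and MAR dropout at t = 2."""
    rows = []
    for _ in range(n):
        b = rng.gauss(0.0, 1.0)
        y1 = b + rng.gauss(0.0, 1.0)
        y2 = 1.0 + b + rng.gauss(0.0, 1.0)    # true mu_2 = 1
        pi2 = 1.0 / (1.0 + math.exp(y1))      # P(r_i2 = 1 | y_i1): low for high y_i1
        r2 = 1 if rng.random() < pi2 else 0
        rows.append((y1, y2, r2, pi2))
    return rows


def naive_mean(rows):
    """Complete-case mean of y_i2, as in (13)."""
    obs = [y2 for _, y2, r2, _ in rows if r2 == 1]
    return sum(obs) / len(obs)


def ipw_mean(rows):
    """Weighted mean of y_i2 solving (14), using the known pi_i2."""
    return sum(r2 * y2 / pi2 for _, y2, r2, pi2 in rows) / len(rows)


rng = random.Random(2009)
rows = simulate_prepost(20000, rng)
naive = naive_mean(rows)
ipw = ipw_mean(rows)
print("complete-case mean:", naive)
print("weighted mean:", ipw)
```

Because higher y_i₁ leads to missing y_i₂ and the two responses are positively correlated, the complete-case mean falls below the true value of 1, while the weighted mean stays close to it.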

By comparing (12) with (14), it is seen that the latter differs from the former only in the definition of Δ_i. By carrying this modification over to a general setting with m assessments, we obtain the WGEE for inference about β for the distribution-free LMM in (5), which is defined by the same vector equation in (11) except for substituting the following modified Δ_i:

π_{i t} = P (r_{i t} = 1 | x_{i}, y_{i}), Δ_{i t} = \frac{r_{i t}}{π_{i t}}, Δ_{i} = {diag}_{t} (Δ_{i t}), 1 \leq t \leq m, 1 \leq i \leq n .

(15)

It is again readily checked that E [G_i (x_i) Δ_iS_i] = 0, ensuring that the WGEE yields consistent estimates of β (e.g. Robins et al. 1995; Kowalski and Tu, Chap. 4, 2007).

To use WGEE, we must know or have estimates of π_it. In some cases, subject dropout is created by study design and π_it are known. For example, in some multi-stage trials, patients can only enter the next stage of the study if they satisfy certain criteria such as response to treatment at the previous stage. However, as noted earlier, in most studies, missing data patterns are defined by a host of factors not directly related to study design. We discuss estimation of π_it for the general setting after introducing another robust approach for the distribution-free LMM.

3.2.2 Augmented Weighted Generalized Estimating Equations

The WGEE discussed in Section 3.2.1 depends on the model for missing data in (15). In most studies, π_it are unknown and must be modeled and estimated. If such a model is misspecified, the WGEE estimate may be inconsistent. In applications, reliable models may also exist for directly relating the missing response to the observed ones and other covariates. The augmented WGEE (AWGEE) is developed to take advantage of this additional source of modeling information to ensure valid inference when the model for π_it may be incorrect (Robins et al., 1995; Tsiatis, 2006).

To illustrate, consider again the pre-post study design. Suppose that we can predict y_i₂ directly based on y_i₁ using a linear regression:

E (y_{i 2} | y_{i 1}) = η_{0} + η_{1} y_{i 1}, 1 \leq i \leq n .

(16)

Then, we can estimate μ₂ without using WGEE by ${\tilde{μ}}_{2} = \sum_{i = 1}^{n} ({\hat{η}}_{0} + {\hat{η}}_{1} y_{i 1}) / n$ . This new estimate is consistent if (16) is a correct model, since

{\tilde{μ}}_{2} = {\hat{η}}_{0} + {\hat{η}}_{1} \frac{1}{n} \sum_{i = 1}^{n} y_{i 1} \to_{p} η_{0} + η_{1} E (y_{i 1}) = E [E (y_{i 2} | y_{i 1})] = E (y_{i 2}) = μ_{2} .

By combining both the prediction model in (16) and the WGEE in (14), we obtain an augmented WGEE to estimate μ as follows:

W_{n}(μ) = \sum_{i=1}^{n} R{(α)}^{-1} (Δ_{i} S_{i} - Δ_{i}^{c} \tilde{S}_{i}) = 0, \quad Δ_{i}^{c} = Δ_{i} - I_{2},

\tilde{S}_{i} = {(y_{i1} - μ_{1}, \hat{y}_{i2} - μ_{2})}^{⊤}, \quad \hat{y}_{i2} = η_{0} + η_{1} y_{i1} .

(17)

It is readily checked that $E[G_{i}(x_{i}) (Δ_{i} S_{i} - Δ_{i}^{c} \tilde{S}_{i})] = 0$ if either (15) or (16) or both are correct. Thus, the AWGEE above yields consistent estimates of μ if at least one of these models is correct. Further, when both models are correct, the AWGEE estimate from (17) may also be more efficient than the WGEE estimate (Robins et al., 1995; Tsiatis, 2006).
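The double robustness property can be illustrated for the second component of (17), whose solution is the average of Δ_i₂ y_i₂ − (Δ_i₂ − 1) ŷ_i₂. The sketch below deliberately uses a misspecified weight model (a constant π_i₂ = 0.5) together with a correctly specified linear prediction model fitted by least squares on the observed subjects. The simulation design, seed, and function names are hypothetical and serve only as an illustration.

```python
import math
import random


def simulate(n, rng):
    """Pre-post data; y_i2 is missing (r2 = 0) more often when y_i1 is high."""
    data = []
    for _ in range(n):
        b = rng.gauss(0.0, 1.0)
        y1 = b + rng.gauss(0.0, 1.0)
        y2 = 1.0 + b + rng.gauss(0.0, 1.0)     # true mu_2 = 1
        r2 = 1 if rng.random() < 1.0 / (1.0 + math.exp(y1)) else 0
        data.append((y1, y2, r2))
    return data


def fit_prediction_model(data):
    """Least-squares fit of the linear prediction model on observed subjects."""
    obs = [(y1, y2) for y1, y2, r2 in data if r2 == 1]
    m = len(obs)
    mx = sum(x for x, _ in obs) / m
    my = sum(y for _, y in obs) / m
    sxy = sum((x - mx) * (y - my) for x, y in obs)
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    eta1 = sxy / sxx
    return my - eta1 * mx, eta1


def wgee_mu2(data, pi):
    """WGEE estimate of mu_2; r2 = 0 zeroes out unobserved y_i2."""
    return sum(r2 * y2 / pi(y1) for y1, y2, r2 in data) / len(data)


def awgee_mu2(data, pi, eta0, eta1):
    """AWGEE estimate of mu_2 from the second component of (17)."""
    total = 0.0
    for y1, y2, r2 in data:
        delta = r2 / pi(y1)
        y_hat = eta0 + eta1 * y1
        total += delta * y2 - (delta - 1.0) * y_hat
    return total / len(data)


rng = random.Random(11)
data = simulate(20000, rng)
eta0, eta1 = fit_prediction_model(data)
wrong_pi = lambda y1: 0.5                       # misspecified weight model
wgee_wrong = wgee_mu2(data, wrong_pi)
awgee_wrong = awgee_mu2(data, wrong_pi, eta0, eta1)
print("WGEE, wrong weights:", wgee_wrong)
print("AWGEE, wrong weights:", awgee_wrong)
```

With the wrong weights, the WGEE estimate inherits the complete-case bias, whereas the augmentation term pulls the AWGEE estimate back toward the true mean because the prediction model is correct.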

The above is readily extended to a more general setting where the prediction model in (16) also involves other baseline covariates. Let u_i be a set of baseline variables including y_i₁ and the prediction model be defined by:

E (y_{i 2} | u_{i}) = η_{0} + u_{i}^{⊤} η_{1}, 1 \leq t \leq 2, 1 \leq i \leq n .

(18)

The AWGEE is defined by

W_{n} (β) = \sum_{i = 1}^{n} W_{n i} (β) = \sum_{i = 1}^{n} G_{i} (x_{i}) (Δ_{i} S_{i} - Δ_{i}^{c} {\tilde{S}}_{i}) = 0,

(19)

where ${\hat{y}}_{i2} = η_{0} + u_{i}^{⊤} η_{1}$. To ensure consistent estimation for the regression model, we make a surrogacy-type assumption, [y_i₂ | u_i, x_i] = [y_i₂ | u_i], where [y_i₂ | v_i] denotes the conditional distribution of y_i₂ given v_i (e.g. Prentice, 1989; Kowalski and Tu, 2002). Of course, the condition holds if u_i includes x_i. Under the surrogacy condition, $E[G_{i}(x_{i}) (Δ_{i} S_{i} - Δ_{i}^{c} \tilde{S}_{i})] = 0$ if either (15) or (18) or both are correct. Thus, the AWGEE in (19) yields consistent estimates of β if at least one of the two models, the missing data model in (15) or the prediction model in (18), is correct.

Although feasible in principle, it is more complex to implement AWGEE for a general longitudinal study with more than two assessments. For example, for m = 3, we need to consider two missing data patterns when predicting missing y_i₃: one with observed y_i₁ and y_i₂, and the other with observed y_i₁ only. The number of prediction models grows rapidly as the frequency of assessments increases. Further, it is more intricate to specify the prediction models than models for the missing response probabilities π_it.

3.2.3 Estimation of Weight Function and Augmented Term

Under MCAR, r_i are independent of x_i and y_i and π_it = P[r_it = 1] = π_t. In this case, π_t are readily estimated by the sample moment: ${\hat{π}}_{t} = (\sum_{i=1}^{n} r_{it}) / n$ (1 ≤ t ≤ m). In many studies, however, π_it depend on either x_i or y_i or both. It is difficult to model π_it as a function of x_i and y_i without imposing some additional assumptions regarding the relationship between them. As in the literature, we focus on the MAR mechanism.

As noted earlier, missing data in longitudinal trials often occur as the result of subject dropout due to deteriorated/improved health conditions and other related conditions, exhibiting the so-called monotone missing data pattern (MMDP). The structured patterns under MMDP make it possible to model π_it in most studies.

Under MMDP, if y_it is observed at time t, then all y_is at all earlier times s (< t) are also observed. Let

\begin{aligned} H_{it} & = \{x_{is}, y_{is}; 1 \leq s \leq t - 1\}, \quad \tilde{x}_{it} = {(x_{i1}^{⊤}, \dots, x_{i(t-1)}^{⊤})}^{⊤}, \\ \tilde{y}_{it} & = {(y_{i1}, \dots, y_{i(t-1)})}^{⊤}, \quad 2 \leq t \leq m . \end{aligned}

The subset H_it contains all observed data prior to time t. Under MAR,

π_{i t} = P (r_{i t} = 1 | x_{i}, y_{i}) = P (r_{i t} = 1 | H_{i t}) = P (r_{i t} = 1 | {\tilde{x}}_{i t}, {\tilde{y}}_{i t}) .

(20)

Thus, under MMDP and MAR, π_it are a function of observed data only, making it possible to estimate these selection probabilities.

Let p_it = P(r_it = 1 | r_{i(t−1)} = 1, H_it) denote the one-step transition probability of being observed at time t given being observed at time t − 1. Then, by invoking MMDP, it is readily checked that

π_{i t} = \prod_{s = 2}^{t} p_{i s}, 2 \leq t \leq m, 1 \leq i \leq n .

(21)

Thus, we can estimate π_it by modeling the p_it. Since p_it is the probability of a binary response, we model that using logistic regression:

logit(p_{it}(ξ_{t})) = logit(P(r_{it} = 1 | r_{i(t-1)} = 1, H_{it})) = ξ_{t}^{⊤} \tilde{w}_{it}, \quad 2 \leq t \leq m,

(22)

where ${\tilde{w}}_{i t} = {(1, {\tilde{x}}_{i t}^{⊤}, {\tilde{y}}_{i t}^{⊤})}^{⊤}$ and ${\tilde{w}}_{i} = {({\tilde{w}}_{i 2}^{⊤}, \dots, {\tilde{w}}_{i m}^{⊤})}^{⊤}$ . For each t, we can estimate ξ_t by maximum likelihood or GEE conditional on the observed w̃_it at t − 1 (2 ≤ t ≤ m).
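Once the ξ_t have been estimated, (21) and (22) turn them into the weights used by WGEE. The sketch below shows this step for a single subject under monotone dropout. It assumes, purely for illustration, that the only predictors are the earlier responses (no covariates x), and the coefficient values and function names are hypothetical rather than fitted from any data.

```python
import math


def expit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def observation_probabilities(y_obs, xi):
    """Return [pi_i1, ..., pi_ik] for a subject observed at visits 1..k.

    y_obs: responses at the observed visits, in time order.
    xi:    dict mapping t (>= 2) to coefficients
           (intercept, slope for y_i1, ..., slope for y_i(t-1)).
    """
    pis = [1.0]                                  # no missing data at baseline
    for t in range(2, len(y_obs) + 1):
        w = [1.0] + list(y_obs[: t - 1])         # w_it = (1, y_i1, ..., y_i(t-1))
        p_t = expit(sum(c * v for c, v in zip(xi[t], w)))   # model (22)
        pis.append(pis[-1] * p_t)                # product in (21)
    return pis


# Hypothetical coefficients for a study with m = 3 assessments.
xi_hat = {2: [0.6, -0.05], 3: [0.8, -0.02, -0.04]}
pis = observation_probabilities([3.0, 4.0, 5.0], xi_hat)
weights = [1.0 / p for p in pis]                 # Delta_it for observed visits
print(pis)
print(weights)
```

Only the probabilities at visits where the subject is actually observed are needed, since Δ_it = r_it / π_it vanishes whenever r_it = 0.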

For AWGEE, we again consider the pre-post study design. In this case, we can readily estimate η in (18) using GEE, where $η = {(η_{0}, η_{1}^{⊤})}^{⊤}$. With an estimate η̂, we can predict y_i₂. By using η̂ and ξ̂ = ξ̂₂, we can construct the AWGEE in (19).

3.2.4 Inference for WGEE and AWGEE Estimates

For inference based on WGEE, let Δ_i(ξ) be modeled as in (21) and (22). If ξ_t is estimated by GEE or maximum likelihood, $\hat{ξ} = {({\hat{ξ}}_{2}^{⊤}, \dots, {\hat{ξ}}_{m}^{⊤})}^{⊤}$ is the solution to the following vector estimating equation:

\sum_{i=1}^{n} q_{i}(ξ) = \sum_{i=1}^{n} {(q_{i2}^{⊤}, \dots, q_{im}^{⊤})}^{⊤} = 0, \quad q_{it} = \frac{\partial}{\partial ξ_{t}} \{ r_{i(t-1)} [r_{it} \log(p_{it}) + (1 - r_{it}) \log(1 - p_{it})] \} .

(23)

Now, let

B = E (\frac{\partial^{⊤}}{\partial β} W_{n i}), C = E (\frac{\partial^{⊤}}{\partial ξ} W_{n i}), F = E (\frac{\partial^{⊤}}{\partial ξ} q_{i} (ξ)),

where W_ni is defined in (11). Then, as shown in Appendix A, under $\sqrt{n}$ -consistency of α̂, the WGEE estimate β̂ is asymptotically normal with the asymptotic variance given by

Σ_{β} = B^{-1} (Var(W_{ni}) + Φ) B^{-⊤}, \quad Φ = C F^{-1} Var(q_{i}) F^{-⊤} C^{⊤} - E(W_{ni} q_{i}^{⊤} F^{-⊤} C^{⊤}) - {[E(W_{ni} q_{i}^{⊤} F^{-⊤} C^{⊤})]}^{⊤} .

(24)

In (24), Φ accounts for the variability of estimated ξ̂. We can estimate Σ_β by substituting consistent estimates in place of the respective quantities.

For AWGEE inference, we also need to estimate η for the prediction model in (18). Using GEE, the vector estimating equation is given by:

\sum_{i=1}^{n} g_{i}(η) = \sum_{i=1}^{n} \frac{\partial}{\partial η} \{ r_{i2} [y_{i2} - (η_{0} + u_{i}^{⊤} η_{1})]^{2} \} = 0 .

(25)

Let $\hat{φ} = {({\hat{φ}}_{1}^{⊤}, {\hat{φ}}_{2}^{⊤})}^{⊤}$ with ϕ₁ = ξ and ϕ₂ = η. Then, by combining (23) and (25), ϕ̂ can be expressed as the solution to the following joint vector estimating equation:

\sum_{i = 1}^{n} s_{i} = \sum_{i = 1}^{n} {(q_{i}^{⊤} (ϕ_{1}), g_{i}^{⊤} (ϕ_{2}))}^{⊤} = 0, s_{i} = {(q_{i}^{⊤} (ϕ_{1}), g_{i}^{⊤} (ϕ_{2}))}^{⊤} .

The AWGEE estimate β̂ is also asymptotically normal under $\sqrt{n}$ -consistency of α̂, with the asymptotic variance having the same form as in (24) except for substituting s_i for q_i and redefining $C = E(\frac{\partial^{⊤}}{\partial φ} W_{ni})$ and $F = E(\frac{\partial^{⊤}}{\partial φ} s_{i}(φ))$. Again, we can estimate the asymptotic variance by substituting consistent estimates in place of the respective parameters.

4 Application

We illustrate our considerations with both real and simulated data. We first present an application to data from a longitudinal study in depression research and then investigate the performance of the approach with small to moderate sample sizes by simulation. In all the examples, we set the statistical significance level at α = 0.05. All analyses were carried out using code we developed to implement the proposed approach on the R software platform (Free Software Foundation, 1999). This code is available from the author upon request.

4.1 Real Study

In a study on geriatric depression and associated medical comorbidities in older primary care patients, 744 subjects were enrolled from private practices and University-affiliated clinics in general internal medicine, geriatrics, and family medicine in Monroe County, New York (Lyness et al., 2007). All patients aged 65 years and older who presented for care on selected days and were capable of giving informed consent were eligible. Enrolled subjects underwent semi-structured interviews, administered by trained raters in the subjects' homes or in research offices at the UR Medical Center. The interviews included assessments of cognition, functional status, and psychopathology, the latter including the Structured Clinical Interview for DSM-IV (SCID) (Spitzer et al. 1986). Interviews and chart reviews were conducted at study intake, and again at one- and two-year follow-up time points.

In geriatric research, overall functional disability is of particular importance, as it reflects both the mental and physical health conditions of the individual. Primary measures of overall functional status include the Instrumental Activities of Daily Living (IADL), Physical Self-Maintenance Scales (PSMS), Global Assessment of Functioning (GAF), and the Karnofsky Performance Status Scale (KPSS) (Lawton and Brody, 1969; Karnofsky and Burchenal, 1949; Ware and Sherbourne). For illustration purposes, we analyzed the change of IADL from baseline to one-year follow-up, as this measure assesses instrumental activities such as shopping or using the telephone and is particularly popular in geriatric research. Further, we included only the baseline value as a predictor when modeling both the missingness of this outcome and the outcome itself at the follow-up, using the logistic model in (22) and the linear model in (18), respectively.

Of the 744 enrolled, 468 completed the IADL at the one-year follow-up. Shown in Table 1 are the estimates of the intercept and slope from the fitted logistic regression for modeling the missingness and the linear regression for modeling the outcome of IADL as a function of its baseline value at the follow-up. The baseline IADL was significant in both models, indicating that it did predict the occurrence of missing IADL as well as the outcome itself at the follow-up. Note that the negative sign of the estimate of the coefficient for baseline IADL in the logistic model indicates that the subjects with lower baseline IADL were more likely to come for assessment at the one-year follow-up. As lower IADL is associated with poorer functioning status, the observed sample at the follow-up visit seemed to be biased towards those with more severe overall functional disability at baseline.

Table 1.

Estimates of parameters of (1) logistic regression for modeling missingness at one-year follow-up, and (2) linear model for predicting IADL at one-year follow-up for the study on geriatric depression and associated medical comorbidities.

Estimates of models for missingness and outcome at one-year follow-up
Predictors	Estimate	Standard error	p-value
Logistic regression for missingness at one year follow-up
Intercept	0.635	0.086	<0.0001
Baseline IADL	−0.0491	0.018	0.007
Linear regression for predicting missing IADL at one year follow-up
Intercept	0.685	0.124	<0.0001
Baseline IADL	1.01	0.03	<0.0001


We fit the LMM in (1) and the distribution-free alternative in (5) to examine the change of IADL from baseline to the one-year follow-up. Shown in Table 2 are the estimates of the intercept β₀ and slope β₁ for the respective models under the different inference procedures. As the estimates of β₁ were positive across the board, the mean IADL increased at the follow-up visit, indicating better functioning status for the older primary care patients in this observational study.

Table 2.

Estimates of parameters of (1) linear mixed-effects model (ML), and (2) distribution-free linear model (GEE, WGEE and AWGEE) for change of IADL from baseline to one-year follow-up for the study on geriatric depression and medical comorbidities.

Estimates of models for change of IADL from baseline to one-year follow-up
Methods	β₀ (s.e.)	p-value for β₀	β₁ (s.e.)	p-value for β₁
ML	2.11(0.16)	<0.0001	0.054(0.009)	<0.0001
GEE	2.11(0.15)	<0.0001	0.032(0.014)	0.02
WGEE	2.11(0.15)	<0.0001	0.061(0.016)	0.0002
AWGEE	2.11(0.15)	<0.0001	0.060(0.013)	0.0001

However, the magnitude of the estimate of β₁ varied substantially, not only between the models but also across the different procedures within the same distribution-free linear model. WGEE and AWGEE yielded quite similar estimates (0.061 and 0.060), with AWGEE also showing improved efficiency, as indicated by its smaller asymptotic standard error (0.013 versus 0.016). GEE performed poorly: its estimate of 0.032 is roughly 50% lower than the WGEE and AWGEE estimates. For the between-model comparison, the ML estimate of β₁ from the fitted LMM (0.054) also incurred a downward bias, albeit of a much smaller magnitude than the GEE estimate. The downward bias in both cases was consistent with the fact that those assessed at the follow-up visit represented a subgroup with more severe overall functional disability at baseline.
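
The size of these discrepancies can be checked with a few lines of arithmetic. The sketch below only restates the slope estimates from Table 2 and expresses each of ML and GEE as a relative difference from WGEE and AWGEE; treating WGEE or AWGEE as the reference is an assumption made for the comparison, not a statement about the true value.

```python
# Slope estimates copied from Table 2.
beta1 = {"ML": 0.054, "GEE": 0.032, "WGEE": 0.061, "AWGEE": 0.060}


def relative_difference(estimate, reference):
    """Difference from the reference, as a fraction of the reference."""
    return (estimate - reference) / reference


for method in ("ML", "GEE"):
    for reference in ("WGEE", "AWGEE"):
        rd = relative_difference(beta1[method], beta1[reference])
        print(f"{method} vs {reference}: {100 * rd:+.1f}%")
```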

[Figure: normal Q-Q plot of the conditional residuals from the fitted LMM]

Shown in the Figure is the normal Q-Q plot of the conditional residuals obtained from the estimated fixed and random effects of the fitted LMM (Nobre and Singer, 2007). The plot clearly indicates that the residuals did not follow a normal distribution, which may explain the difference in the estimates of the fixed effects between the parametric (LMM) and distribution-free models (WGEE and AWGEE).
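
For readers who want to reproduce this kind of diagnostic, the following is a minimal sketch of how the points of a normal Q-Q plot can be computed. The function name `normal_qq_points`, the plotting positions (i + 0.5)/n and the simulated right-skewed residuals are all assumptions made for illustration; they are not the study residuals or the authors' code.

```python
import random
from statistics import NormalDist, fmean, pstdev


def normal_qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for standardized residuals."""
    n = len(residuals)
    center, spread = fmean(residuals), pstdev(residuals)
    sample = sorted((r - center) / spread for r in residuals)
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sample))


# Hypothetical right-skewed residuals (centered chi-square, 1 df), not the study data.
rng = random.Random(2009)
skewed = [rng.gammavariate(0.5, 2.0) - 1.0 for _ in range(500)]
points = normal_qq_points(skewed)
print("lowest pair :", points[0])
print("highest pair:", points[-1])
```

For normal residuals the pairs would fall near the 45-degree line; for right-skewed residuals such as these, both the lowest and the highest sample quantiles lie above their theoretical counterparts.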

4.2 Simulation Study

Given the discrepant estimates between LMM and WGEE (AWGEE) for the real study data in Section 4.1, we conducted a simulation study with a pre-post study design to investigate this issue further. We considered two non-normal distributions for the model error term of the LMM: a rescaled central chi-square with one degree of freedom and a uniform between −1 and 1. Since the results are quite similar, we only discuss and report the results from the chi-square-distributed error. To examine the performance of the models under small, moderate and large samples, we performed the simulation study with three sample sizes: n = 50, 100 and 2000. All simulations were performed with 1,000 Monte Carlo replications using the R software (R Development Core Team, 2007).

We considered the pre- and post-treatment design and simulated the outcome according to the LMM in (1) by setting β₀ = β₁ = 1 and letting ε_it (t = 1, 2) follow the rescaled chi-square distribution, $(χ_{1}^{2} - 1) \sqrt{σ^{2} / 2}$ , where $χ_{1}^{2}$ denotes a central chi-square with one degree of freedom; since $χ_{1}^{2}$ has mean 1 and variance 2, this centering and scaling gives ε_it mean 0 and variance σ². We varied $σ_{b}^{2}$ and σ² to control the within-subject correlation $ρ = σ_{b}^{2} / (σ_{b}^{2} + σ^{2})$ . We assumed no missing data at baseline t = 1 and simulated the missing response at post-treatment t = 2 under MAR according to the following logistic regression:

$$\text{logit}(π_{i2}) = \text{logit}(P(r_{i2} = 1 \mid y_{i1})) = ξ_{0} + ξ_{1} y_{i1}, \quad 1 \leq i \leq n. \qquad (26)$$

We set ξ₀ = 0.5 and ξ₁ = 1.2 to create about 25% missing response y_i₂ at t = 2.
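
The data-generating steps above can be sketched as follows. This is an illustrative reconstruction, not the authors' R code: it assumes that model (1) has the form y_it = β₀ + β₁(t − 1) + b_i + ε_it, that the random intercept b_i is normal, that the chi-square error is centered to have mean zero, and that r_i2 = 1 marks an observed response. The function name `simulate_pre_post` and the seed are hypothetical.

```python
import math
import random


def simulate_pre_post(n, beta0=1.0, beta1=1.0, var_b=2.0 / 9.0, var_e=2.0,
                      xi0=0.5, xi1=1.2, seed=0):
    """Simulate (y1, y2, r2) triples; r2 = 1 means y2 is observed."""
    rng = random.Random(seed)
    scale = math.sqrt(var_e / 2.0)
    data = []
    for _ in range(n):
        b = rng.gauss(0.0, math.sqrt(var_b))  # normal random intercept (assumed)
        # A chi-square with 1 df is Gamma(shape=0.5, scale=2); center and rescale it.
        e1 = (rng.gammavariate(0.5, 2.0) - 1.0) * scale
        e2 = (rng.gammavariate(0.5, 2.0) - 1.0) * scale
        y1 = beta0 + b + e1
        y2 = beta0 + beta1 + b + e2
        pi2 = 1.0 / (1.0 + math.exp(-(xi0 + xi1 * y1)))  # equation (26)
        r2 = 1 if rng.random() < pi2 else 0
        data.append((y1, y2, r2))
    return data


data = simulate_pre_post(20000)
missing_rate = 1.0 - sum(r for _, _, r in data) / len(data)
print(f"proportion of missing y2: {missing_rate:.3f}")
```

Because the probability of being observed increases with y_i1, subjects with low baseline values are under-represented at t = 2, which is the mechanism behind the upward bias of unweighted estimates reported below.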

Under (1), it is readily checked that, regardless of the distributions for b_i and ε_it (see Appendix),

$$E(y_{i2} \mid y_{i1}) = η_{0} + η_{1} y_{i1}, \quad η_{0} = β_{0} \left(1 - \frac{σ_{b}^{2}}{σ_{b}^{2} + σ^{2}}\right) + β_{1}, \quad η_{1} = ρ = \frac{σ_{b}^{2}}{σ_{b}^{2} + σ^{2}}, \quad 1 \leq i \leq n. \qquad (27)$$

The above was used to predict missing y_i₂ from y_i₁ for AWGEE inference about β as discussed in Section 3.2.4. To study the effect of a wrong weight function on WGEE and the robustness of AWGEE in such a scenario, we also estimated π_i₂ under an incorrect model obtained by leaving out y_i₁ in (26), i.e., a constant weight function.
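
As a worked instance of (27), the sketch below evaluates η₀ and η₁ for β₀ = β₁ = 1 under the three pairs of variance components used in the simulation study. The function name `prediction_coefficients` is hypothetical; the arithmetic follows (27) directly.

```python
def prediction_coefficients(beta0, beta1, var_b, var_e):
    """Intercept and slope of E(y_i2 | y_i1) in equation (27)."""
    rho = var_b / (var_b + var_e)
    eta0 = beta0 * (1.0 - rho) + beta1
    eta1 = rho
    return eta0, eta1


# Variance components (var_b, var_e) used in the simulation study.
settings = {"Table 3": (2.0 / 9.0, 2.0), "Table 4": (1.0, 2.0), "Table 5": (1.5, 1.0)}
for label, (var_b, var_e) in settings.items():
    eta0, eta1 = prediction_coefficients(1.0, 1.0, var_b, var_e)
    print(f"{label}: eta0 = {eta0:.3f}, eta1 = {eta1:.3f}")
```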

We considered the null H₀ : β₀ = β₁, i.e., the mean at post-treatment is twice that at pretreatment. We tested H₀ using the Wald statistic, $Q_{n}^{2} = n \hat{β}^{⊤} K^{⊤} (K \hat{Σ}_{β} K^{⊤})^{-1} K \hat{β}$ , which has an asymptotic central $χ_{1}^{2}$ distribution under H₀. We estimated the type I error rate α based on the distribution of $Q_{n}^{2}$ from 1,000 Monte Carlo (MC) replications, $\hat{α} = (\sum_{j = 1}^{1000} I_{\{Q_{nj}^{2} \geq q_{0.95}\}}) / 1000$ , where $Q_{nj}^{2}$ denotes the value of $Q_{n}^{2}$ from the jth MC replication and $q_{0.95}$ the 95th percentile of $χ_{1}^{2}$ . The maximum likelihood (ML) inference about β = (β₀, β₁)^⊤ for LMM was obtained from the LME procedure in R, while the WGEE and AWGEE estimates for the distribution-free alternative in (5) were computed based on the asymptotic results in Section 3.2.
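
The test and the Monte Carlo estimate of α can be sketched as below. Two assumptions are made loudly: the contrast K = (1, −1) is inferred from the form of H₀ (K is not defined in this section), and $\hat{Σ}_{β}$ is taken to be the estimated asymptotic covariance of $\sqrt{n}(\hat{β} − β)$. The numerical inputs are hypothetical and are not results from the paper.

```python
from statistics import NormalDist

# Assumed contrast for H0: beta0 = beta1 (K is not defined in this excerpt).
K = (1.0, -1.0)
# The 95th percentile of a chi-square with 1 df is the squared 97.5th normal percentile.
Q_95 = NormalDist().inv_cdf(0.975) ** 2


def wald_statistic(beta_hat, sigma_hat, n, contrast=K):
    """Q_n^2 = n (K beta)^2 / (K Sigma K^T) for a single linear contrast."""
    k_beta = sum(k * b for k, b in zip(contrast, beta_hat))
    k_sigma_k = sum(contrast[r] * sigma_hat[r][c] * contrast[c]
                    for r in range(2) for c in range(2))
    return n * k_beta ** 2 / k_sigma_k


def type_one_error_rate(q_values, critical=Q_95):
    """Proportion of Monte Carlo replications that reject H0."""
    return sum(q >= critical for q in q_values) / len(q_values)


# Hypothetical inputs for illustration only (not values from the paper).
q = wald_statistic((1.02, 0.95), [[2.0, -1.0], [-1.0, 4.0]], n=100)
print(f"Q_n^2 = {q:.4f}, reject at the 5% level: {q >= Q_95}")
print("rejection rate:", type_one_error_rate([0.3, 4.2, 1.1, 0.02]))
```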

Shown in Table 3 are the estimates of β and associated asymptotic standard errors averaged over 1,000 MC replications obtained from ML, GEE, WGEE and AWGEE for the respective models, with a within-subject correlation ρ = 0.1 (or $σ_{b}^{2} = 2 / 9$ and σ² = 2). The results confirmed that the baseline mean β₀ is consistently estimated by all four procedures. For β₁, ML yielded consistent estimates under the normally distributed error, but biased estimates under the rescaled $χ_{1}^{2}$ error. As expected, GEE estimates were biased under both error distributions. Note that in the rescaled $χ_{1}^{2}$ error case, the standard errors of these estimates did not increase much, so the upwardly biased estimates could yield falsely significant results in practice.

Table 3.

Averaged estimates of β over 1,000 Monte Carlo replications along with asymptotic standard errors (s.e.) and type I error rates α for sample sizes 50, 100 and 2000, with about 25% missing data at post-treatment, based on ML for the linear mixed-effects model, and GEE, WGEE and AWGEE for the distribution-free linear model, with true β₀ = 1 and β₁ = 1 and within-subject correlation ρ = 0.1.

Methods	Weight Function	Prediction Model	Normal: β₀ (s.e.)	Normal: β₁ (s.e.)	Normal: α	Chi-square: β₀ (s.e.)	Chi-square: β₁ (s.e.)	Chi-square: α
Sample size = 50
ML			1.00(0.21)	0.98(0.30)	0.06	1.00(0.22)	1.10(0.23)	0.07
GEE			1.01(0.21)	1.05(0.31)	0.05	1.00(0.20)	1.05(0.30)	0.05
WGEE	Right		0.99(0.21)	1.00(0.33)	0.08	1.01(0.21)	1.00(0.31)	0.06
WGEE	Wrong		1.01(0.21)	1.04(0.31)	0.06	1.00(0.20)	1.04(0.30)	0.06
AWGEE	Right	Right	1.00(0.21)	1.00(0.39)	0.07	1.00(0.21)	1.01(0.38)	0.07
	Wrong	Right	1.01(0.21)	0.99(0.32)	0.08	1.01(0.20)	0.99(0.32)	0.09
	Right	Wrong	0.99(0.21)	1.00(0.40)	0.06	0.99(0.21)	1.00(0.39)	0.06
	Wrong	Wrong	1.01(0.21)	1.05(0.33)	0.07	1.01(0.21)	1.05(0.34)	0.07
Sample size = 100
ML			1.00(0.15)	1.00(0.21)	0.06	1.00(0.15)	1.09(0.17)	0.06
GEE			1.00(0.15)	1.04(0.22)	0.06	1.00(0.15)	1.05(0.22)	0.05
WGEE	Right		1.00(0.15)	1.00(0.23)	0.05	1.00(0.15)	1.00(0.22)	0.06
WGEE	Wrong		1.00(0.15)	1.04(0.22)	0.07	1.00(0.15)	1.05(0.22)	0.05
AWGEE	Right	Right	1.00(0.15)	1.00(0.29)	0.04	1.00(0.15)	1.00(0.30)	0.04
	Wrong	Right	1.00(0.15)	1.00(0.23)	0.05	1.00(0.15)	1.00(0.22)	0.06
	Right	Wrong	1.00(0.15)	1.01(0.30)	0.04	1.00(0.15)	1.00(0.30)	0.04
	Wrong	Wrong	1.00(0.15)	1.04(0.24)	0.07	1.00(0.15)	1.06(0.23)	0.07
Sample size = 2000
ML			1.00(0.033)	1.00(0.049)	0.06	1.00(0.035)	1.10(0.038)	0.59
GEE			1.00(0.033)	1.04(0.049)	0.14	1.00(0.033)	1.05(0.049)	0.21
WGEE	Right		1.00(0.033)	1.00(0.054)	0.05	1.00(0.033)	1.00(0.050)	0.05
WGEE	Wrong		1.00(0.033)	1.04(0.049)	0.12	1.00(0.033)	1.05(0.049)	0.17
AWGEE	Right	Right	1.00(0.033)	1.00(0.052)	0.05	1.00(0.033)	1.00(0.051)	0.05
	Wrong	Right	1.00(0.033)	1.00(0.048)	0.06	1.00(0.033)	1.00(0.047)	0.06
	Right	Wrong	1.00(0.033)	1.00(0.053)	0.05	1.00(0.033)	1.00(0.052)	0.05
	Wrong	Wrong	1.00(0.033)	1.04(0.049)	0.12	1.00(0.033)	1.05(0.048)	0.15

Under the correct weight function, WGEE performed well. When the incorrect constant weight function was used, WGEE yielded biased estimates, while the AWGEE estimates remained close to the true value of β₁ provided the prediction model was correct. However, AWGEE did not show any noticeable gain in efficiency; in fact, the estimate of the slope β₁ had a larger standard error under AWGEE than under WGEE at the smaller sample sizes (n = 50 and 100).
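
The qualitative pattern described above can be reproduced with a simplified sketch for the pre-post design. This is not the authors' implementation: it replaces the estimating equations of Section 3.2 with their closed-form analogues for a saturated mean model (a complete-case mean for GEE, an inverse-probability-weighted mean for WGEE, and an augmented weighted mean for AWGEE), plugs in the true weight π_i₂ and the true prediction model (27) instead of estimating them, and uses normal errors with ρ = 0.6 for simplicity. All function names and the seed are hypothetical.

```python
import math
import random

VAR_B, VAR_E = 1.5, 1.0                      # rho = 0.6, as in Table 5
RHO = VAR_B / (VAR_B + VAR_E)
ETA0, ETA1 = 1.0 * (1.0 - RHO) + 1.0, RHO    # equation (27) with beta0 = beta1 = 1


def simulate(n, xi0=0.5, xi1=1.2, seed=1):
    """Rows of (y1, y2, r, pi); r = 1 means y2 is observed, pi = P(r = 1 | y1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        b = rng.gauss(0.0, math.sqrt(VAR_B))
        y1 = 1.0 + b + rng.gauss(0.0, math.sqrt(VAR_E))
        y2 = 2.0 + b + rng.gauss(0.0, math.sqrt(VAR_E))
        pi = 1.0 / (1.0 + math.exp(-(xi0 + xi1 * y1)))
        rows.append((y1, y2, 1 if rng.random() < pi else 0, pi))
    return rows


def slope_estimates(rows):
    """Estimates of beta1 = (post-treatment mean) - (baseline mean)."""
    n = len(rows)
    mean_y1 = sum(row[0] for row in rows) / n     # estimate of beta0
    p_bar = sum(row[2] for row in rows) / n       # constant ("wrong") weight

    def weighted_mean(weight):
        num = sum(r * y2 / weight(pi) for _, y2, r, pi in rows)
        den = sum(r / weight(pi) for _, _, r, pi in rows)
        return num / den

    def augmented_mean(weight):
        total = 0.0
        for y1, y2, r, pi in rows:
            w = weight(pi)
            total += r * y2 / w - (r - w) / w * (ETA0 + ETA1 * y1)
        return total / n

    post_means = {
        "GEE": weighted_mean(lambda pi: 1.0),
        "WGEE, right weight": weighted_mean(lambda pi: pi),
        "AWGEE, right weight": augmented_mean(lambda pi: pi),
        "AWGEE, wrong weight": augmented_mean(lambda pi: p_bar),
    }
    return {name: mu2 - mean_y1 for name, mu2 in post_means.items()}


estimates = slope_estimates(simulate(40000))
for name, value in estimates.items():
    print(f"{name:20s} beta1 estimate = {value:.3f}")
```

With the true β₁ = 1, the unweighted estimate is expected to be clearly too large, while the weighted and augmented versions should stay near 1; the augmented version should do so even with the constant weight, as long as the prediction model is correct.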

To further investigate the relative efficiency between WGEE and AWGEE, we replicated the above analysis with increased within-subject correlation. For example, shown in Tables 4 and 5 are the estimates from two such replicated analyses with ρ = 0.3 (or $σ_{b}^{2} = 1$ and σ² = 2) and ρ = 0.6 (or $σ_{b}^{2} = 1.5$ and σ² = 1), respectively. All conclusions above remain the same, except for the relative efficiency between WGEE and AWGEE under the correct weight function and prediction model. As ρ increased to 0.3, AWGEE had smaller standard errors than its counterpart WGEE when n = 2000. At ρ = 0.6, not only did AWGEE show smaller standard errors than WGEE across all sample sizes, but the differences also widened as compared to those with ρ = 0.1.

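Table 4.

Averaged estimates of β over 1,000 Monte Carlo replications along with asymptotic standard errors (s.e.) and type I error rates α, as in Table 3, with within-subject correlation ρ = 0.3.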

Methods	Weight Function	Prediction Model	Normal: β₀ (s.e.)	Normal: β₁ (s.e.)	Normal: α	Chi-square: β₀ (s.e.)	Chi-square: β₁ (s.e.)	Chi-square: α
Sample size = 50
ML			1.00(0.25)	1.00(0.31)	0.06	1.01(0.24)	1.08(0.32)	0.06
GEE			1.00(0.24)	1.18(0.32)	0.08	1.00(0.24)	1.22(0.33)	0.08
WGEE	Right		0.99(0.25)	1.02(0.36)	0.05	1.02(0.24)	1.00(0.35)	0.04
WGEE	Wrong		1.00(0.24)	1.18(0.33)	0.09	1.01(0.24)	1.24(0.32)	0.11
AWGEE	Right	Right	1.00(0.24)	1.01(0.40)	0.04	1.01(0.24)	0.99(0.41)	0.03
	Wrong	Right	1.01(0.24)	0.99(0.35)	0.05	1.02(0.24)	1.00(0.34)	0.05
	Right	Wrong	0.99(0.24)	1.00(0.41)	0.03	1.00(0.24)	1.01(0.41)	0.04
	Wrong	Wrong	1.00(0.24)	1.17(0.34)	0.08	1.01(0.24)	1.21(0.34)	0.10
Sample size = 100
ML			0.99(0.17)	1.01(0.22)	0.07	1.00(0.17)	1.12(0.23)	0.07
GEE			1.00(0.17)	1.19(0.23)	0.15	1.00(0.17)	1.23(0.23)	0.15
WGEE	Right		1.00(0.17)	1.01(0.26)	0.06	1.00(0.17)	1.02(0.25)	0.03
WGEE	Wrong		1.00(0.17)	1.20(0.24)	0.13	1.00(0.17)	1.23(0.23)	0.18
AWGEE	Right	Right	1.00(0.17)	0.99(0.28)	0.05	1.00(0.17)	1.00(0.27)	0.04
	Wrong	Right	1.00(0.17)	1.01(0.24)	0.06	1.00(0.17)	0.99(0.25)	0.06
	Right	Wrong	1.00(0.17)	1.01(0.29)	0.05	1.00(0.17)	0.99(0.30)	0.04
	Wrong	Wrong	1.00(0.17)	1.18(0.25)	0.11	1.00(0.17)	1.22(0.25)	0.16
Sample size = 2000
ML			1.00(0.039)	1.00(0.050)	0.06	1.00(0.038)	1.12(0.052)	0.64
GEE			1.00(0.039)	1.20(0.052)	0.97	1.00(0.038)	1.23(0.052)	0.99
WGEE	Right		1.00(0.039)	1.00(0.068)	0.06	1.00(0.039)	1.00(0.059)	0.05
WGEE	Wrong		1.00(0.039)	1.20(0.053)	0.95	1.00(0.039)	1.23(0.052)	1.00
AWGEE	Right	Right	1.00(0.039)	1.00(0.055)	0.05	1.00(0.039)	1.00(0.054)	0.06
	Wrong	Right	1.00(0.039)	1.00(0.050)	0.06	1.00(0.039)	1.00(0.050)	0.06
	Right	Wrong	1.00(0.039)	1.00(0.057)	0.05	1.00(0.039)	1.00(0.056)	0.05
	Wrong	Wrong	1.00(0.039)	1.19(0.051)	0.94	1.00(0.039)	1.23(0.052)	1.00

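Table 5.

Averaged estimates of β over 1,000 Monte Carlo replications along with asymptotic standard errors (s.e.) and type I error rates α, as in Table 3, with within-subject correlation ρ = 0.6.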

Methods	Weight Function	Prediction Model	Normal: β₀ (s.e.)	Normal: β₁ (s.e.)	Normal: α	Chi-square: β₀ (s.e.)	Chi-square: β₁ (s.e.)	Chi-square: α
Sample size = 50
ML			1.00(0.22)	1.01(0.23)	0.06	1.00(0.22)	1.08(0.24)	0.07
GEE			1.01(0.22)	1.30(0.26)	0.19	1.00(0.22)	1.32(0.25)	0.25
WGEE	Right		1.00(0.22)	1.02(0.29)	0.05	0.99(0.22)	1.04(0.29)	0.04
WGEE	Wrong		1.01(0.22)	1.30(0.27)	0.21	1.01(0.22)	1.32(0.25)	0.24
AWGEE	Right	Right	1.00(0.22)	1.01(0.27)	0.06	1.01(0.22)	1.01(0.28)	0.05
	Wrong	Right	1.01(0.22)	0.99(0.26)	0.06	1.00(0.22)	1.02(0.27)	0.06
	Right	Wrong	0.99(0.22)	1.02(0.28)	0.07	1.00(0.22)	0.99(0.28)	0.05
	Wrong	Wrong	1.01(0.22)	1.28(0.27)	0.20	1.01(0.22)	1.30(0.26)	0.22
Sample size = 100
ML			1.00(0.16)	1.00(0.16)	0.06	1.00(0.16)	1.10(0.17)	0.09
GEE			1.00(0.16)	1.30(0.18)	0.36	1.01(0.16)	1.32(0.18)	0.42
WGEE	Right		1.00(0.16)	1.02(0.21)	0.04	1.00(0.16)	1.01(0.21)	0.03
WGEE	Wrong		1.00(0.16)	1.29(0.18)	0.35	1.00(0.16)	1.33(0.18)	0.44
AWGEE	Right	Right	1.00(0.16)	1.00(0.18)	0.05	1.00(0.16)	1.00(0.19)	0.04
	Wrong	Right	1.00(0.16)	1.01(0.16)	0.06	1.00(0.16)	1.01(0.16)	0.06
	Right	Wrong	1.00(0.16)	0.99(0.19)	0.05	1.00(0.16)	1.00(0.19)	0.05
	Wrong	Wrong	1.00(0.16)	1.26(0.17)	0.34	1.00(0.16)	1.31(0.16)	0.44
Sample size = 2000
ML			1.00(0.038)	1.00(0.046)	0.06	1.00(0.035)	1.10(0.038)	0.59
GEE			1.00(0.035)	1.30(0.041)	1.00	1.00(0.035)	1.32(0.041)	1.00
WGEE	Right		1.00(0.035)	1.00(0.057)	0.05	1.00(0.035)	1.00(0.054)	0.04
WGEE	Wrong		1.00(0.035)	1.30(0.041)	1.00	1.00(0.035)	1.32(0.041)	1.00
AWGEE	Right	Right	1.00(0.035)	1.00(0.043)	0.05	1.00(0.035)	1.00(0.044)	0.05
	Wrong	Right	1.00(0.035)	1.00(0.040)	0.06	1.00(0.035)	1.00(0.040)	0.06
	Right	Wrong	1.00(0.035)	1.00(0.043)	0.06	1.00(0.035)	1.00(0.045)	0.05
	Wrong	Wrong	1.00(0.035)	1.27(0.041)	1.00	1.00(0.035)	1.30(0.042)	1.00

5 Discussion

We investigated the two primary modeling strategies for longitudinal continuous responses, the linear mixed-effects model (LMM) and the distribution-free linear model, with respect to their performance under missing data, and illustrated our considerations using real as well as simulated study data. Our results show that, under MAR, LMM generally yields biased estimates when its normality assumption is violated, while the GEE procedure for the distribution-free alternative yields biased estimates under both normal and non-normal errors. Further, as indicated by the simulation results, the standard errors of these estimates do not increase to reflect model misspecification, making inference prone to misleading findings. Thus, when modeling longitudinal data, it is important to test the MCAR assumption as discussed in Section 3.2 before applying any of the models and inference procedures considered. If the null of MCAR is rejected, WGEE and/or AWGEE should be considered, unless there is strong evidence to support the use of the alternative linear mixed-effects model.

Our simulation results also indicate that the gain in efficiency of AWGEE over WGEE depends on the magnitude of the within-subject correlation ρ. Within the context of the particular simulation model considered, AWGEE is less efficient than WGEE for small sample sizes under a small ρ such as 0.1, but becomes the more efficient procedure as ρ increases to 0.6.
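
This dependence on ρ can be made concrete from the reported standard errors. The sketch below takes the standard errors of the slope estimate from Tables 3 to 5 (chi-square error, n = 2000, correct weight function and prediction model) and computes a relative efficiency as the ratio of squared standard errors; defining relative efficiency this way is a common convention assumed here, not a quantity reported in the paper.

```python
# Standard errors of the slope estimate copied from Tables 3-5
# (chi-square error, n = 2000, correct weight function and prediction model).
standard_errors = {
    0.1: {"WGEE": 0.050, "AWGEE": 0.051},
    0.3: {"WGEE": 0.059, "AWGEE": 0.054},
    0.6: {"WGEE": 0.054, "AWGEE": 0.044},
}


def relative_efficiency(se_wgee, se_awgee):
    """Values above 1 favor AWGEE; values below 1 favor WGEE."""
    return (se_wgee / se_awgee) ** 2


efficiency = {rho: relative_efficiency(se["WGEE"], se["AWGEE"])
              for rho, se in standard_errors.items()}
for rho, value in efficiency.items():
    print(f"rho = {rho}: relative efficiency of AWGEE to WGEE = {value:.2f}")
```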

Acknowledgments

This research was supported in part by NIH grants R01-DA012249, R21-AG023956, UL1 RR024160 and R24-MH071604. We thank Ms. Bliss-Clark for her careful proofreading of the manuscript, and an anonymous referee, an Associate Editor and the Editor, Prof. Leonhard Held, for their constructive comments that led to a substantial improvement in the presentation of this research.

Appendix

Under (1), we have

$$ρ_{yb} = \text{Corr}(y_{i1}, b_{i}) = \frac{\text{Cov}(y_{i1}, b_{i})}{\sqrt{\text{Var}(y_{i1}) \text{Var}(b_{i})}} = \sqrt{\frac{σ_{b}^{2}}{σ_{b}^{2} + σ^{2}}}. \qquad (28)$$

Further, the above holds regardless of the distributions for b_i and ε_it. Now, consider the linear regression

$$b_{i} = ϕ_{0} + ϕ_{1} y_{i1} + δ_{i}, \quad δ_{i} \sim (0, σ_{δ}^{2}), \quad 1 \leq i \leq n,$$

where $(0, σ_{δ}^{2})$ denotes a distribution with mean 0 and variance $σ_{δ}^{2}$ . It follows from (28) and the relationship between linear regression coefficients and correlation that

$$ϕ_{1} = ρ_{yb} \sqrt{\frac{\text{Var}(b_{i})}{\text{Var}(y_{i1})}} = \sqrt{\frac{σ_{b}^{2}}{σ_{b}^{2} + σ^{2}}} \sqrt{\frac{σ_{b}^{2}}{σ_{b}^{2} + σ^{2}}} = \frac{σ_{b}^{2}}{σ_{b}^{2} + σ^{2}}.$$

Also, since

$$0 = E(b_{i}) = ϕ_{0} + ϕ_{1} E(y_{i1}) = ϕ_{0} + ϕ_{1} β_{0},$$

it follows that $ϕ_{0} = - β_{0} σ_{b}^{2} / (σ_{b}^{2} + σ^{2})$ .

Under (1) and regardless of the distributions for b_i and ε_it, we have

$$E(y_{i2} \mid y_{i1}) = β_{0} + β_{1} + E(b_{i} \mid y_{i1}) = β_{0} + β_{1} + ϕ_{0} + ϕ_{1} y_{i1}.$$

By substituting the expressions of ϕ₀ and ϕ₁ into the above and combining the coefficients, we obtain (27).
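
The moment identities used in this derivation can be checked numerically. The sketch below simulates baseline data with a normal random intercept and a centered, rescaled chi-square error (both assumptions of the sketch, with ρ = 0.1 and β₀ = 1), then compares the sample correlation and the least-squares coefficients of b_i on y_i₁ with the values implied above: $\sqrt{ρ}$ for the correlation, ρ for ϕ₁ and −β₀ρ for ϕ₀. The function name `check_appendix_moments` and the seed are hypothetical.

```python
import math
import random


def check_appendix_moments(n=100000, var_b=2.0 / 9.0, var_e=2.0, beta0=1.0, seed=28):
    """Return the sample Corr(y1, b) and the least-squares (phi0, phi1) of b on y1."""
    rng = random.Random(seed)
    scale = math.sqrt(var_e / 2.0)
    b = [rng.gauss(0.0, math.sqrt(var_b)) for _ in range(n)]
    # Chi-square with 1 df drawn as Gamma(shape=0.5, scale=2), then centered and rescaled.
    y1 = [beta0 + bi + (rng.gammavariate(0.5, 2.0) - 1.0) * scale for bi in b]
    mean_b, mean_y = sum(b) / n, sum(y1) / n
    cov = sum((bi - mean_b) * (yi - mean_y) for bi, yi in zip(b, y1)) / n
    var_y = sum((yi - mean_y) ** 2 for yi in y1) / n
    var_b_hat = sum((bi - mean_b) ** 2 for bi in b) / n
    corr = cov / math.sqrt(var_y * var_b_hat)
    phi1 = cov / var_y
    phi0 = mean_b - phi1 * mean_y
    return corr, phi0, phi1


corr, phi0, phi1 = check_appendix_moments()
rho = (2.0 / 9.0) / (2.0 / 9.0 + 2.0)
print(f"Corr(y1, b) = {corr:.3f}  (sqrt(rho) = {math.sqrt(rho):.3f})")
print(f"phi1 = {phi1:.3f}  (rho = {rho:.3f}),  phi0 = {phi0:.3f}  (-beta0 * rho = {-rho:.3f})")
```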

References

1. Demidenko E. Mixed Models: Theory and Applications. New York: Wiley; 2004.
2. Diggle PJ, Heagerty P, Liang KY, Zeger SL. Analysis of Longitudinal Data. 2nd ed. Oxford University Press; 2002.
3. Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. New York: Wiley; 2004.
4. Goldstein H. Multilevel Statistical Models. London: Edward Arnold; New York: Halsted; 1995.
5. Karnofsky DA, Burchenal JH. The clinical evaluation of chemotherapeutic agents in cancer. In: MacLeod CM, editor. Evaluation of Chemotherapeutic Agents. New York: Columbia; 1949.
6. Kowalski J, Tu XM. A GEE approach to modeling longitudinal data with incompatible data formats and measurement error: application to HIV immune markers. Journal of the Royal Statistical Society, Series C. 2002;51:91–114.
7. Kowalski J, Tu XM. Modern Applied U Statistics. New York: Wiley; 2007.
8. Lawton MP, Brody EM. Assessment of older people: self-maintaining and instrumental activities of daily living. Gerontologist. 1969;9:179–186.
9. Lyness JM, Niculescu A, Tu XM, Reynolds CF III, Caine ED. The relationship of medical comorbidity to depression in older primary care patients. Psychosomatics. 2007;47:435–439. doi: 10.1176/appi.psy.47.5.435.
10. Maas CJM, Hox JJ. The influence of violations of assumptions on multilevel parameter estimates and their standard errors. Computational Statistics & Data Analysis. 2004;46:427–440.
11. Maas CJM, Hox JJ. Sufficient Sample Sizes for Multilevel Modeling. Methodology. 2005;1:86–92.
12. Nobre JS, Singer JM. Residual analysis for linear mixed models. Biometrical Journal. 2007;49:863–875. doi: 10.1002/bimj.200610341.
13. Prentice RL. Surrogate endpoints in clinical trials: Definition and operational criteria. Statistics in Medicine. 1989;8:431–440. doi: 10.1002/sim.4780080407.
14. Rasbash J, Browne W, Goldstein H, Yang M, Plewis I, Healy M, Woodhouse G, Draper D, Langford I, Lewis T. A User's Guide to MLwiN. London: Multilevel Models Project, University of London; 2000.
15. Raudenbush SW, Bryk AS. Hierarchical Linear Models. 2nd ed. Thousand Oaks, CA: Sage; 2002.
16. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2007. URL http://www.R-project.org.
17. Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995;90:106–121.
18. Spitzer RL, Williams JB, Gibbon M. Structured Clinical Interview for DSM-III-R (SCID). New York: Biometrics Research; 1986.
19. Tsiatis AA. Semiparametric Theory and Missing Data. New York: Springer; 2006.
20. Ware JE, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Medical Care. 1992;30:473–483.
21. Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42:121–130.
