Author manuscript; available in PMC: 2011 Jan 20.
Published in final edited form as: Stat Neerl. 2010 May 1;64(2):133–156. doi: 10.1111/j.1467-9574.2009.00435.x

Semiparametric regression models and sensitivity analysis of longitudinal data with nonrandom dropouts

David Todem 1, KyungMann Kim 2, Jason Fine 3, Limin Peng 4
PMCID: PMC3023945  NIHMSID: NIHMS260030  PMID: 21258610

Abstract

We propose a family of regression models to adjust for nonrandom dropouts in the analysis of longitudinal outcomes with fully observed covariates. The approach conceptually focuses on generalized linear models with random effects. A novel formulation of a shared random effects model is presented and shown to provide a dropout selection parameter with a meaningful interpretation. The proposed semiparametric and parametric models are made part of a sensitivity analysis to delineate the range of inferences consistent with observed data. Concerns about model identifiability are addressed by fixing some model parameters to construct functional estimators that are used as the basis of a global sensitivity test for parameter contrasts. Our simulation studies demonstrate a large reduction of bias for the semiparametric model relative to the parametric model when the dropout rate is high or the dropout model is misspecified. The methodology’s practical utility is illustrated in a data analysis.

Key words and Phrases: Exponential family distribution, Functional estimators, Global sensitivity analysis, Informative dropout, Infimum/supremum statistic, Nonparametric mixture, Uniform convergence, Non-identifiable models

1. Introduction

Proper handling of missing data, and dropouts in particular, is critical in statistical analyses of longitudinal studies. It is well documented that improper handling of missing values may lead to misleading inferences (see, for example, Little and Rubin, 1987; Scharfstein et al., 1999; Rotnitzky et al., 2001; Kenward et al., 2001; and Hogan et al., 2004). Proper treatment of missing data in statistical analysis depends primarily on the underlying missingness mechanism, for which Little and Rubin (1987) provide a helpful terminology. Data are classified as missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) if missingness is allowed to depend on (1) none of the outcomes, (2) the observed outcomes only, and (3) unobserved outcomes as well, respectively. From a modelling standpoint, most statistical approaches for handling missing data rely on the stronger MCAR or the less restrictive MAR assumption (Hogan et al., 2004). Examples of such models in the context of longitudinal data are the generalized estimating equations (GEE) models in their original formulation (Liang and Zeger, 1986) for the MCAR mechanism, and both the weighted GEE-based (Robins et al., 1995) and likelihood-based models (e.g., Laird and Ware, 1982) for the MAR mechanism. These models are commonly employed in practical data analytic settings, owing to their conceptual simplicity and ease of implementation. They are also known to generate valid inferences under the associated missing data mechanisms (Verbeke and Molenberghs, 2000). However, when the missingness mechanism depends on the unobserved outcomes, these procedures are known to produce biased inferences. To overcome this difficulty, Diggle and Kenward (1994) and Molenberghs et al. (1997), among others, have proposed models that incorporate both the information from the measurement process and the missing data process into a unified estimating function. This has provoked a large debate about the role of such models in understanding the true data generating mechanism. The original enthusiasm was followed by skepticism about the strong and untestable assumptions on which this type of model rests (Verbeke et al., 2001). Despite these limitations, many researchers now recognize that these models should not be rejected outright but should be made part of a sensitivity analysis. A further complication is that models incorporating the measurement and dropout processes are usually not identifiable from observed data (see, for example, Troxel et al., 1998; Scharfstein et al., 1999; and Kenward et al., 2001). One then has to impose quantitative restrictions to recover identifiability. Conventional restrictions result from considering a minimal set of parameters, called sensitivity parameters, conditional upon which the remaining parameters are assumed identifiable. This method therefore produces a range of models which forms the basis of a sensitivity analysis (Vach and Blettner, 1995). Over the years, numerous authors have proposed a local sensitivity approach to assess the impact of such uncertainties on inferences when the data are partially observed (Copas and Eguchi, 2001; Verbeke et al., 2001; Troxel et al., 2002; and Todem et al., 2006). The idea stems from assessing the effects of small perturbations of the MAR model in the direction of MNAR models.
Where such methods have appeared in the literature, the sensitivity of identifiable parameters with respect to some fixed parameters is estimated via partial derivatives calculated in the neighborhood of a specified solution, typically at the MAR location. This local approach is useful, but a limitation is that only selected values of the sensitivity parameters are considered. In practice, it is often impossible to know the true values of the sensitivity parameters, making a global analysis conducted across all values of the sensitivity parameters the most conservative analytic strategy. Another limitation is that inference regarding the identifiable model parameters is ad hoc, ignoring the fact that multiple tests are conducted and that the assumed values of the sensitivity parameters may be incorrect.

We consider a class of likelihood models for the measurement and dropout processes. Specifically, we formulate a generalized linear mixed effects model to describe the joint distribution of the measurement outcomes in time and then extend this basic model to allow for nonrandom dropouts. A novel formulation of a shared random effects model is introduced to capture the dropout dependence on the outcome process. Two forms of sensitivity analysis are considered. A qualitative sensitivity analysis is conducted to address concerns about the unverifiable nature of any working MNAR model given the observed data. Specifically, we consider a nonparametric distribution as well as a fully parametric distribution for the random effects. Another line of sensitivity analysis, essentially quantitative, is performed to address concerns about model identifiability. We profile the model across a fixed parameter to delineate the range of inferences consistent with observed data. We propose a supremum test to conservatively evaluate the sensitivity of some parameter contrasts across all plausible values of the sensitivity parameter. This is particularly important when the quantity being tested does not increase or decrease monotonically as the sensitivity parameters are increased or decreased. Under monotonicity, it is only necessary to perform the tests at the limits of the sensitivity parameter space. However, in practice, missing data models are often quite complicated and it may not be clear whether monotonicity holds. In these set-ups, the analysis should be undertaken globally across all values of the sensitivity parameter.

The rest of this article is organized as follows. In Section 2, we develop a class of shared latent variable models for longitudinal data with nonrandom dropouts, and discuss some key features of the models with related theoretical and asymptotic results. In Section 3, we propose a global sensitivity test based on a supremum hypothesis to assess simultaneously any perturbations of the MAR model in the direction of MNAR models. An illustration of the methodology using a psychiatric dataset is given in Section 4. Section 5 presents a simulation study. Some remaining issues are discussed in Section 6.

2. Shared latent variable models

2.1. Setup and notations

For each subject i = 1, ⋯, n, there are q underlying outcomes represented by the vector Yi(𝒞) = (Yi(t1), ⋯, Yi(tq))′, measured respectively at discrete time points in the set 𝒞 = {t1, ⋯, tq}, where t1 < t2 < ⋯ < tq. These outcomes, however, are not always fully observed and therefore are coupled with a missingness indicator vector Ri(𝒞) = (Ri(t1), ⋯, Ri(tq))′, where Ri(t) = 1 if Yi(t) is unobserved and 0 otherwise. For a monotone missing data process, Ri(𝒞) has the property that Ri(s) = 0 whenever Ri(t) = 0 for s ≤ t, and Ri(s) = 1 whenever Ri(t) = 1 for s ≥ t. It is then intuitive to represent the series of indicators {Ri(t) : t ∈ 𝒞} for subject i by the random variable Di = 1 + ∑t∈𝒞(1 − Ri(t)). We assume that pr(Di ≥ 2) = 1 for all i, which implies that all subjects are present at the first time point. The random variable Di then takes values 2, ⋯, q + 1, with q + 1 being the realized value for a complete sequence.
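
As a purely illustrative aside (not part of the original development), the following minimal sketch shows how the dropout index Di is obtained from a monotone missingness vector Ri(𝒞); the function and array names are our own.

```python
import numpy as np

def dropout_index(r):
    """Map a monotone missingness vector R_i = (R_i(t1), ..., R_i(tq)),
    coded 1 = unobserved and 0 = observed, to D_i = 1 + sum_t (1 - R_i(t))."""
    r = np.asarray(r)
    return 1 + int(np.sum(1 - r))

# A subject observed at t1 and t2 only (q = 4) has D_i = 3;
# a completely observed sequence gives D_i = q + 1 = 5.
print(dropout_index([0, 0, 1, 1]))  # 3
print(dropout_index([0, 0, 0, 0]))  # 5
```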

2.2. Model formulation

We use a shared random parameter bi coupled with the conditional independence assumption to specify the joint distribution of the measurement outcome and dropout response for subject i, denoted by ℒ(Yi(𝒞), Di) = ∫ ℒ(Yi(𝒞)|bi)ℒ(Di|bi)dℒ(bi). In this formulation, the generic label ℒ(u) denotes the law of u and ℒ(u|v) the conditional law of u given v.

The law of Yi(𝒞) for fixed bi could be very general, but we restrict our formulation to the conditional independence assumption ℒ(Yi(𝒞)|bi) = ∏_{j=1}^{q} ℒ(Yi(tj)|bi). The conditional law for the time point outcome is assumed to be generated from a generalized linear model with random effects. Specifically, we assume that the conditional mean µi(t) = E{Yi(t)|bi, xi(t)} of a hypothetical (observed or unobserved) response Yi(t) at time point t for subject i is modeled as,

µi(t) = g−1{xi(t)′β + bi}, (1)

where β = (β0, ⋯, βp−1)′ is the slope vector associated with covariates xi(t) of dimension p × 1 and g(․) is a monotone, differentiable and invertible function. The function g(․) is assumed known and is typically chosen to be the identity function for continuous outcomes, whereas for binary outcomes, the logit link is the natural choice.

To specify the law ℒ(Di|bi), we assume that there exists an observed covariate zi(tj) that describes the discrete dropout hazard hi(tj) = pr(Di = j|Di ≥ j, bi, zi(tj)) at time point tj. Specifically, we consider the hazard model given by,

log{hi(t)/(1 − hi(t))} = ε(zi(t))′α + δϕ(bi),  0 < ϕ(bi) < 1, (2)

where ε(zi(t)) is a function of the covariate zi(t) of dimension r with associated coefficient vector α, and δ is the slope associated with ϕ(bi), ϕ(․) being a map from ℛ to the open unit interval (0, 1). One simple example of the function ϕ(․) is the nondecreasing function ϕ(bi) = (1 + e−ϖ(bi))−1, where ϖ(․) is also a nondecreasing function. When bi has an infinite support, we have limbi→−∞ ϖ(bi) = −∞ and limbi→∞ ϖ(bi) = ∞, with suitable modifications when bi has a finite support. Any such function ϖ(․) that meets the conditions above defines a function ϕ(․) and vice versa. Expressing ϖ(․) in terms of ϕ(․) is straightforward: ϖ(bi) = log{ϕ(bi)/(1 − ϕ(bi))}, which is referred to as the logit function associated with ϕ(․).

Finally, to complete the model formulation, the law of the random variable bi denoted ℒb = ℒ(bi) can be assumed parametric or left completely unspecified with a discrete support. That is,

ℒb ~ 𝒩(0, τ2)  or  ℒb ~ ∑_{m=1}^{M} πm 𝒟(ζm), (3)

where τ2 is the unknown variance and 𝒟(ζm) is the Dirac measure placing point mass πm at the single point ζm, with ∑_{m=1}^{M} πm = 1. These two random effects distributions are considered as part of a sensitivity analysis to address distributional misspecification. An advantage of the discrete law is that we do not introduce possibly inappropriate and unverifiable assumptions about the distribution of the random effects. With the maximum number of identifiable latent classes, the mixing distribution may be interpreted as a nonparametric distribution (Laird, 1978). The discreteness of this law forces the random effect bi to be one of the M unknown latent points, ζ1, ⋯, ζM. An important difference from the parametric model is that each subject does not have her own intercept in a latent class model. Instead, it is assumed that each subject belongs to one of the M latent classes and that each latent class has its own intercept. In practice, the number M is increased until the model fit no longer improves.

In the proposed model, the random effects term bi is shared between the measurement outcome and the dropout process. This idea of using a shared latent variable model for modelling longitudinal outcomes subject to nonrandom dropouts is not new (see, for example, Wu and Carroll, 1988; Albert and Follmann, 2000; and Ten Have et al., 2002). A more recent illustration of this methodology is given by Beunckens et al. (2008). These authors have proposed a latent class model for incomplete longitudinal gaussian data viewed as a candidate model in their sensitivity analysis. Our extension of this technique is the introduction of the monotonic transformation ϕ(․) on the unobserved latent variable, which gives several attractive features to the dropout model. The constraint on the transformation is similar to that imposed by Rosenbaum (2002, page 107) on the unobserved covariate used to explain the hidden bias in assigning different treatments to subjects with the same observed covariates. Our constraint may be seen as a restriction on the scale of the transformed unobserved latent variable, a restriction needed for δ to have a meaningful interpretation. It can be shown that the odds ratio of dropout hazard for two subjects with the same observed covariates for the dropout model is at most exp{|δ|}. For a sketch of the proof, we consider two subjects i and i′ who have the same observed covariates with respect to the dropout model, say zi(t) = zi′(t) (or ε(zi(t)) = ε(zi′(t))) at time point t. It can easily be shown (proof given in Appendix) that the odds ratio of dropout hazard between subjects i and i′ is given by,

{hi(t)/(1 − hi(t))}{hi′(t)/(1 − hi′(t))}−1 = exp{δ(ϕ(bi) − ϕ(bi′))}.

In other words, two subjects with the same observed covariates for the dropout model differ in their odds of dropout hazard by a factor that involves the parameter δ and the difference in their transformed unobserved covariates ϕ(bi) − ϕ(bi′). Given the constraint imposed on the function ϕ(․), it follows that the ratio above is bounded as follows,

exp{−|δ|} ≤ {hi(t)/(1 − hi(t))}{hi′(t)/(1 − hi′(t))}−1 ≤ exp{|δ|}.

Hence, for δ = 0 we have a dropout at random mechanism, where two subjects with the same observed dropout covariates have equal dropout hazard odds. When δ = log(3), two subjects with the same observed dropout covariates can differ in their instantaneous dropout odds by a factor of as much as 3.
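
To make the role of the transformation and the bound on the dropout odds ratio concrete, here is a small numerical sketch of our own (ϖ taken as the identity, covariate values and coefficients invented for illustration); it is not part of the paper's formal development.

```python
import numpy as np

def phi(b):
    """phi(b) = {1 + exp(-b)}^{-1}: the map into (0, 1) with varpi the identity."""
    return 1.0 / (1.0 + np.exp(-b))

def dropout_hazard(z, alpha, b, delta):
    """Discrete-time dropout hazard from model (2): logit h = eps(z)'alpha + delta*phi(b)."""
    return 1.0 / (1.0 + np.exp(-(np.dot(z, alpha) + delta * phi(b))))

# Two subjects with identical dropout covariates but different random effects.
z, alpha, delta = np.array([1.0, 0.5]), np.array([-2.0, 0.3]), np.log(3)
h1 = dropout_hazard(z, alpha, b=2.0, delta=delta)
h2 = dropout_hazard(z, alpha, b=-2.0, delta=delta)
odds_ratio = (h1 / (1 - h1)) / (h2 / (1 - h2))
print(odds_ratio, np.exp(abs(delta)))  # the ratio stays within exp(|delta|) = 3
```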

2.3. Model identifiability

It is well known that nonrandom dropout models are typically not identifiable from observed data (see, for example, Scharfstein et al., 1999; and Kenward et al., 2001). Both simulation-based studies and analyses of actual data show that nonrandom models often have likelihood functions with flat surfaces and/or multiple modes. We study some implications of these identifiability concerns for the shared parameter model. For this, let ℒbo = ℒ(b|D > q) and ℒbd = ℒ(b|D ≤ q) be the laws of the random effects b for complete and incomplete data subjects, respectively. Here, we have suppressed the subject index for simplicity. The law ℒb of the random effects can be written as,

ℒb = (1 − pd)ℒbo + pdℒbd, (4)

where pd = P(D ≤ q). If many independent copies of the indicator variable I(D ≤ q) are available, pd can be estimated very well. Although the random effects are not observed, the law ℒbo can also be well estimated using the law of Y(𝒞) among completers. It can be shown, using a simple probability argument, that ℒbo = ∫ ℒ(b|Y(𝒞), D > q) dℒ(Y(𝒞)|D > q). The law ℒ(Y(𝒞)|D > q) is easily identifiable from observed data. The law ℒ(b|Y(𝒞), D > q) is also identifiable using the fundamental assumption of conditional independence of measurement responses given random effects. A typical estimate of ℒ(b|Y(𝒞), D > q) is the nonparametric maximum likelihood estimate, a discrete distribution with at most n − nd support points, nd = ∑i I(Di ≤ q) being the number of dropout subjects (Lindsay, 1983; and Davidian and Giltinan, 1998). The unidentifiable component of the model is clearly ℒbd = ∫ ℒ(b|Y(𝒞), D ≤ q) dℒ(Y(𝒞)|D ≤ q), the law of b for dropout subjects, as the law ℒ(Y(𝒞)|D ≤ q) is unidentifiable from observed data. The theorem below gives some implications of this nonidentifiability in a setup where q = 2.

THEOREM 1. Assume that the study has only two planned time points, q = 2, and that pd and the law ℒbo are known. The fact that the law ℒbd is not identifiable then translates into α0, the intercept of the hazard model, and δ being related by the deterministic relationship,

exp{−α0} = (1 − pd)pd−1 ∫ exp{δϕ(b)} dℒbo(b). (5)

Here, the integration dℒbo(b) is with respect to the Lebesgue measure on the real line for the normal latent variable and with respect to the counting measure for the discrete nonparametric distribution. The proof of this theorem (given in the Appendix) is similar to that of Freedman (1999) in the context of dropout selection models. When q > 2, in our experience the joint model is often at best weakly identifiable, especially when M gets large. Such overspecification of the model can be managed in a more general way by considering a minimal set of parameters, conditional upon which the other parameters are estimable. It is important, however, to realize that the choice of the sensitivity parameter is non-unique and can be a difficult task (Kenward et al., 2001). In general, sensitivity parameters are chosen from parameters that are not of primary interest. Ideally, the choice should be closely linked to the substantive problem under analysis. A natural choice for the sensitivity parameter in our joint model is δ, the parameter that measures the extent of nonrandomness of the dropout process.
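
For intuition about how fixing δ pins down the remaining dropout parameter, the relation in (5) can be evaluated numerically. The sketch below is our own illustration: it approximates the integral by Monte Carlo under an assumed normal completer law ℒbo and a hypothetical dropout probability pd; none of these inputs come from the paper's data.

```python
import numpy as np

def phi(b):
    return 1.0 / (1.0 + np.exp(-b))  # varpi taken as the identity

def alpha0_of_delta(delta, p_d, b_completers):
    """Intercept alpha_0 implied by relation (5) for a fixed delta, a dropout
    probability p_d, and a sample approximating the completer law L_{b_o}:
        exp(-alpha_0) = {(1 - p_d)/p_d} * E_{b_o}[exp(delta * phi(b))]."""
    integral = np.mean(np.exp(delta * phi(b_completers)))
    return -np.log((1.0 - p_d) / p_d * integral)

rng = np.random.default_rng(0)
b_completers = rng.normal(0.0, 2.0, size=100_000)  # assumed normal completer law
for delta in (0.0, 1.0, 3.0):
    print(delta, round(alpha0_of_delta(delta, p_d=0.3, b_completers=b_completers), 3))
# At delta = 0 this reduces to alpha_0 = log{p_d/(1 - p_d)}.
```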

2.4. Functional semiparametric inferences via EM algorithm

Let ψ = (β′, α′, ζ′, π′)′ be the identifiable parameter vector of the semiparametric model, with ζ = (ζ1, ⋯, ζM)′ and π = (π2, ⋯, πM)′. Since M free parameters are considered for ζ, no overall intercept is allowed in the vector β for the model in (1), to ensure identifiability. The maximum likelihood estimate (MLE) of the parameter vector ψ can be obtained by means of the EM algorithm introduced by Dempster, Laird and Rubin (1977), since direct maximization of the integrated likelihood is difficult.

We denote by Yi(𝒪i), where 𝒪i = {t1, ⋯, tDi−1}, the collection of all observed outcomes for subject i and by Yi(ℳi), where ℳi = {tDi, ⋯, tq} for Di ≤ q, the collection of all missing random outcomes. We also denote by xi(𝒪i) and zi(𝒪i), respectively, the collections of all fixed effects covariates for the measurement process and the dropout hazard. Define the indicator random variables for class membership as follows: 𝒰im = 1 if the ith subject belongs to the mth class and 0 otherwise. It follows from this definition that E(𝒰im) = πm. We consider the complete data (𝒰i, Yi(𝒪i), Di), where 𝒰i = (𝒰i1, ⋯, 𝒰iM)′, and the observed data Wi = {Yi(𝒪i), Di, zi(𝒪i), xi(𝒪i)}. Unlike other EM algorithms for non-response models, our approach does not involve the missing component Yi(ℳi) when Di ≤ q. Indeed, by assuming that the possibly incomplete random vector Yi(𝒞) and the dropout variable Di are conditionally independent given the latent variable, the integration of the conditional likelihood with respect to the law of the missing data results in a closed form.

We denote by ℓ(𝒰i, Yi(𝒪i), Di) the contribution of subject i to the complete data likelihood function. Suppose that the current estimate, at step a, of the parameter vector is ψ(a)(δ) for a fixed δ. The E-step of the EM algorithm involves computing the expectation of the complete data log-likelihood given observed data and the current estimate for a fixed δ. The contribution of subject i to this conditional expectation is given by,

E{log ℓ(𝒰i, Yi(𝒪i), Di)|Wi, δ, ψ(a)(δ)} = ∑_{m=1}^{M} ∑_{t∈𝒪i} πim(ψ(a)(δ)) log{fY(Yi(t)|ζm, xi(𝒪i), ψ)} + ∑_{m=1}^{M} πim(ψ(a)(δ)) log{fD(Di|ζm, zi(𝒪i), δ, ψ)} + ∑_{m=1}^{M} πim(ψ(a)(δ)) log{πm}.

Here πim(ψ(a)(δ)) = E{𝒰im|Wi, δ, ψ(a)(δ)} is the posterior probability that the ith subject belongs to the mth latent class and is given by,

E{𝒰im|Wi, δ, ψ(a)(δ)} = [πm ∏_{t∈𝒪i} fY(Yi(t)|ζm, xi(𝒪i), ψ(a)(δ)) fD(Di|ζm, zi(𝒪i), δ, ψ(a)(δ))] / [∑_{ν=1}^{M} πν ∏_{t∈𝒪i} fY(Yi(t)|ζν, xi(𝒪i), ψ(a)(δ)) fD(Di|ζν, zi(𝒪i), δ, ψ(a)(δ))].

The term fY(Yi(t)|ζm, xi(𝒪i), ψ) is the time point conditional density of the measurement process. The quantity fD(Di|ζm, zi(𝒪i), δ, ψ) = [∏_{t∈𝒪i} (1 − hi|m(t))]{hi|m(tDi)}^{I(Di ≤ q)}, with hi|m(t) = {1 + exp(−ε(zi(t))′α − δϕ(ζm))}−1 and setting hi|m(t1) = 0 and hi|m(tq+1) = 1, is the dropout probability when the ith subject belongs to class m. In the M-step of the algorithm, the identifiable model parameters are updated so that the expected log-likelihood is maximized. An update of the parameter πm is given by πm^{(a+1)}(δ) = (1/n) ∑_{i=1}^{n} πim(ψ(a)(δ)). Maximization of the expected complete likelihood with respect to β can be done using standard algorithms for the maximum likelihood estimation of generalized linear models.
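
The E-step and the closed-form π-update of the M-step can be written compactly. The sketch below is a schematic of our own: it takes precomputed per-class likelihood factors for the measurement and dropout parts (how they are built from fY and fD is left abstract) and returns the posterior class probabilities πim and the updated class weights.

```python
import numpy as np

def e_step(fY_class, fD_class, pi):
    """Posterior class memberships pi_im(psi^(a)(delta)).

    fY_class : (n, M) array of prod_{t in O_i} f_Y(Y_i(t) | zeta_m, ...) per subject/class
    fD_class : (n, M) array of f_D(D_i | zeta_m, ...) per subject/class
    pi       : (M,) current class probabilities
    """
    joint = pi[None, :] * fY_class * fD_class           # numerators of the posterior
    return joint / joint.sum(axis=1, keepdims=True)     # normalize over the M classes

def m_step_pi(posterior):
    """Closed-form update pi_m^(a+1) = (1/n) sum_i pi_im."""
    return posterior.mean(axis=0)

# Toy illustration with n = 4 subjects and M = 2 classes (made-up factors):
fY = np.array([[0.20, 0.05], [0.10, 0.30], [0.02, 0.25], [0.15, 0.10]])
fD = np.array([[0.60, 0.40], [0.50, 0.70], [0.30, 0.80], [0.55, 0.45]])
posterior = e_step(fY, fD, pi=np.array([0.5, 0.5]))
print(posterior)             # one row of class probabilities per subject
print(m_step_pi(posterior))  # updated class weights
```

Updates of β (and α) then amount to weighted maximum likelihood fits with these posterior probabilities as weights, as noted above.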

An important issue to be confronted is the choice of M, the number of random effects classes in the discrete model in (3). It is well known that this number is not an interior point of a convex parameter space, which rules out the use of the likelihood ratio approach (McLachlan and Peel, 2000). To reduce complexity, we held M fixed, together with δ, for each model and used goodness-of-fit tools such as the Bayesian information criterion (BIC) to select the appropriate number of classes (see, for example, Beunckens et al., 2008).
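
As a schematic of this selection step (our own sketch; fit_em stands for an assumed wrapper around the EM algorithm that returns the maximized log-likelihood and the number of free parameters, and is not defined in the paper), one could scan candidate values of M at a fixed δ as follows.

```python
import numpy as np

def select_num_classes(data, fit_em, delta, candidate_M=(1, 2, 3, 4, 5)):
    """Fit the latent class model for each candidate M at a fixed delta and
    retain the value minimizing BIC = -2*loglik + n_par*log(n)."""
    n = len(data)
    best = (None, np.inf)
    for M in candidate_M:
        loglik, n_par = fit_em(data, M=M, delta=delta)  # assumed EM wrapper
        bic = -2.0 * loglik + n_par * np.log(n)
        if bic < best[1]:
            best = (M, bic)
    return best  # (selected M, its BIC)
```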

2.5. Asymptotic behaviors of the estimator of ψ

We now study the asymptotic behavior of the estimator of ψ when δ, the parameter which describes the degree of selection bias in the model, is predetermined. First, we show that it is necessary for δ to be bounded; otherwise the marginal log-likelihood function diverges.

THEOREM 2. We assume that 0 < infi,t |ε(zi(t))′α| ≤ supi,t |ε(zi(t))′α| < ∞. When δ → ∞, the profile log-likelihood log ℓ(ψ, δ) → −∞ and a finite maximizer does not exist.

Here ℓ(ψ, δ) denotes the likelihood function for a fixed δ. Note that the assumptions of the theorem are also verified under the stronger condition of a compact parameter space for the vector α and bounded associated transformed covariates. The theorem result (proof given in the Appendix) also holds for the expected complete data log-likelihood given the observed data and the current estimate for a fixed δ. This result implies that when δ is unbounded, a finite MLE of ψ does not exist. This suggests that the parameter space of δ needs to be bounded. In other words, one needs to assume that δ ∈ ℋ, a bounded set. Without any loss of generality we will restrict ℋ to the closed interval [0, Δ], with 0 ≤ Δ < ∞. In practice, subject matter experts should be consulted to decide on the choice of the largest plausible value of δ. From a technical standpoint, this choice should be computationally feasible. This limiting result is different from that of Scharfstein et al. (1999), where the parameters of interest converge to fixed values as the sensitivity parameter becomes infinitely large.

Let δ0 be the true value of δ from the joint model. If δ0 is known, the MLE of ψ0 (the true value of ψ), denoted ψ̂(δ0), can be obtained by maximizing the profile likelihood with δ fixed at the true value δ0. As the sample size n gets large, maximum likelihood theory (see, for example, Casella and Berger, 1990) ensures that ψ̂(δ0) is consistent for ψ0, that is ψ̂(δ0) →p ψ0, and n^{1/2}{ψ̂(δ0) − ψ0} →d 𝒩(0, Σ0), with Σ0 being the associated asymptotic variance-covariance matrix. Under a misspecified dropout process, i.e. δ ≠ δ0, and for large n, we have ψ̂(δ) →p ψ*(δ), with ψ*(δ) not necessarily equal to ψ0. Moreover, the profile score functions are roughly quadratic in the neighborhood of ψ*(δ) for fixed δ and the limiting distribution of n^{1/2}{ψ̂(δ) − ψ*(δ)} is 𝒩(0, Σ*(δ)), with Σ*(δ) being the asymptotic variance-covariance matrix. These pointwise weak convergence results are supported by the theory of empirical processes (see, for example, van der Vaart and Wellner, 2000). They can be made uniform across the space of δ under certain smoothness conditions (see details in the Appendix). As the covariance function Σ*(δ) characterizing the limiting distribution of ψ̂(δ) is quite complicated, the bootstrap may be used to calculate the asymptotic standard errors of parameter estimates from a misspecified model (Efron and Tibshirani, 1993). The bootstrap is especially useful when simultaneous inference about ψ*(δ) for δ ∈ ℋ is of interest and cannot be carried out analytically. In the next section, we discuss a global sensitivity analysis in which inferences are conducted simultaneously across the range of δ.

3. Global sensitivity tests

We consider the contrast Cψ, where C is an l × (p + r) matrix which defines a general framework for estimating single and multiple linear combinations of model parameters. As an example, in the special case of evaluating the effect of the jth covariate, one takes C to be a 1 × (p + r) vector with a one in the jth position and zeros elsewhere. Suppose we are interested in assessing the sensitivity of the contrast Cψ*(δ) relative to Cψ*(0) across all values of δ. Formally, this global sensitivity hypothesis can be formulated as follows,

H0 : sup_{δ∈ℋ} ‖Cψ*(δ) − Cψ*(0)‖ = 0,

where ‖․‖ represents the Euclidean norm. This hypothesis can be used to assess any perturbation of the MAR model in the direction of MNAR models. When the supremum hypothesis is not rejected, say at the 5% significance level, it can be concluded that there is no evidence from the observed data to reject ‖Cψ*(δ0) − Cψ*(0)‖ = 0. This follows from the trivial inequality,

‖Cψ*(δ0) − Cψ*(0)‖ ≤ sup_{δ∈ℋ} ‖Cψ*(δ) − Cψ*(0)‖.

In this case, Cψ̂(0) can be used to make rigorous inferences about Cψ*(δ0). However, when the supremum hypothesis is rejected, a sensitivity analysis is then carried out to identify values of δ for which the hypotheses ‖Cψ*(δ) − Cψ*(0)‖ = 0 are rejected. Our global analysis then asks how would inferences about the contrasts Cψ*(δ) − Cψ*(0) be altered by nonrandom dropouts of a magnitude defined by δ?

The supremum test can easily be performed using a nonparametric bootstrap approach. Let 0 < γ < 1. The hypothesis H0 is rejected at level γ when the observed test value is greater than the (1 − γ) percentile of the bootstrap samples of the test statistic Tsup = sup_{δ∈ℋ} ‖Cψ̂(δ) − Cψ̂(0)‖ under the null hypothesis. In the special case of estimating the effect of a one-dimensional covariate, say Cψ = β1, simultaneous confidence intervals of β1*(δ) − β1*(0) across all values of δ can be used to evaluate the sensitivity hypothesis. Let β̃1s(δ) and β̃1s(0), s = 1, ⋯, S, be bootstrap samples of β̂1(δ) and β̂1(0), respectively. A simultaneous confidence interval for β1*(δ) − β1*(0), for all δ ∈ ℋ, takes the form

{u(δ) : ℋ → ℛ; −ϑ̃γ + β̂1(δ) − β̂1(0) < u(δ) < ϑ̃γ + β̂1(δ) − β̂1(0)},

where ϑ̃γ is the (1 − γ)th empirical percentile of {sup_{δ∈ℋ} |(β̃1s(δ) − β̃1s(0)) − (β̂1(δ) − β̂1(0))|}_{s=1}^{S}. When H0 is rejected, the interval (inf_{δ∈ℋ} β̂1(δ), sup_{δ∈ℋ} β̂1(δ)) gives the minimum and the maximum sizes of the estimate of β1 over all values of the selection bias due to dropouts (see, for example, Kenward et al., 2001). This interval, which reflects dropout uncertainty, can be extended to incorporate uncertainty due to sampling imprecision by computing a lower and an upper confidence bound for inf_{δ∈ℋ} β1*(δ) and sup_{δ∈ℋ} β1*(δ), respectively.
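
A schematic of this bootstrap calibration is given below. It is our own sketch, not code from the paper: fit_beta1 stands for an assumed routine returning the profile estimate β̂1(δ) at a fixed δ, data is a list of per-subject records resampled as whole subjects, and the critical value is the centred sup-deviation percentile described above.

```python
import numpy as np

def sup_test_and_band(data, fit_beta1, delta_grid, n_boot=1000, gamma=0.05, seed=1):
    """Supremum statistic T_sup = sup_delta |beta1_hat(delta) - beta1_hat(0)| together
    with the bootstrap critical value and simultaneous (1 - gamma) confidence band."""
    rng = np.random.default_rng(seed)
    n = len(data)
    est = np.array([fit_beta1(data, d) for d in delta_grid])   # beta1_hat(delta) on a grid
    diff = est - est[0]                                        # contrast against delta = 0
    sup_dev = np.empty(n_boot)
    for s in range(n_boot):
        boot = [data[j] for j in rng.integers(0, n, size=n)]   # resample subjects
        b_est = np.array([fit_beta1(boot, d) for d in delta_grid])
        sup_dev[s] = np.max(np.abs((b_est - b_est[0]) - diff)) # centred sup deviation
    crit = np.quantile(sup_dev, 1.0 - gamma)                   # vartheta_tilde_gamma
    t_sup = np.max(np.abs(diff))
    return t_sup, crit, diff - crit, diff + crit               # statistic, cut-off, band
```

In this sketch, H0 would be rejected when t_sup exceeds crit, and the pair of arrays (diff − crit, diff + crit) traces the simultaneous band for β1*(δ) − β1*(0) over the grid of δ values.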

4. Analysis of psychiatric data

The Fluvoxamine (a serotonin reuptake inhibitor) trial is a multi-center non-comparative study, designed to reflect clinical practice closely, with out-patients diagnosed with depression, obsessive-compulsive disorder or panic disorder. Accumulated experience in controlled trials has shown that Fluvoxamine is as effective as conventional tricyclic antidepressant drugs and more effective than placebo in the treatment of depression (for a review, see Burton, 1991). However, many patients suffering from depression have concomitant morbidity associated with this condition. It was therefore decided to set up a post-marketing pharmaco-vigilance trial to study more accurately the profile of Fluvoxamine in ambulatory clinical psychiatric practice. A total of 315 patients with a diagnosis of either depression, obsessive-compulsive disorder or panic disorder were enrolled in the study. All subjects were treated with Fluvoxamine in doses ranging from 100 to 300 mg/day and underwent clinical evaluations at baseline and at 2, 4, 8 and 12 weeks. One primary endpoint comprised the side effects of the drug, recorded on an ordinal scale; a side effect occurs if new symptoms appear. Several baseline patient characteristics, such as sex, age, initial severity of the disease on a 1 to 7 scale, and duration of the mental illness, were recorded. One key objective of the study was to assess the within-subject evolution of side effects over time adjusted for baseline covariates. In particular, the drug company was interested in describing the side effects profile of each study participant for a possible change in doses. A full description of the study is given by Molenberghs and Lesaffre (1994), Lesaffre et al. (1996) and Kenward et al. (1994).

Out of 315 patients, 224 had full-sequence data; 14 subjects were not observed after recruitment, 31, 26, and 18 patients dropped out after the first, second and third visit, respectively, and 2 patients had a non-monotone missing pattern. As our analysis focuses only on dropouts, we ignore the 2 cases that had a non-monotone missing pattern and the 14 patients who dropped out before the first post-baseline visit. Compared to the start of the study, a reduction of side effects is observed throughout the study for both the completers and the non-completers (see Figure 1). Specifically, about 54% of patients who completed the study had some side effects at 2 weeks, compared to 66% for subjects who were present at 2 weeks but dropped out before the next visit. At 4 weeks, these numbers were 44% for completers compared to 62% for noncompleters, and at 8 weeks, 35% for completers compared to 44% for noncompleters. It is clear that study noncompleters are likely to be doing poorly with respect to side effects. This makes MNAR a plausible missingness mechanism. Therefore, a naive analysis that ignores this selection process may lead to an incorrect and overly optimistic conclusion. A question then emerges: how do side effects evolve over time, accounting for this potentially nonrandom dropout mechanism? To answer this question, we consider the shared latent variable model described in Section 2 to assess the effect of time on the outcome variable for various values of the parameter which measures the extent of nonrandomness of the dropout process.

Fig. 1. Log odds of side effects across time for completers and dropouts

Our analysis of the within-subject evolution (captured by time effects) is based on a dichotomized version (presence/absence) of side effects. Considering an ordinal outcome rather than binary would only add unnecessary complexity to the analysis. For this, we consider a simple Bernoulli model for which the conditional mean is given by,

E{Yi(t)|bi, xi(t)} = {1 + exp(−β′xi(t) − bi)}−1,

where Yi(t) = 1 if new symptoms occur and 0 otherwise, and β is the slope vector associated with the design covariate vector xi(t). The vector xi(t) contains the linear time variable and the baseline variables (sex, age at enrolment, initial severity of the disease on a 1 to 7 scale, and duration of the mental illness at enrolment). We extend the proposed model to account for nonrandom dropouts as follows,

log{hi(t)/(1 − hi(t))} = α′xi(t) + δϕ(bi),

where α is the slope vector associated with xi(t) and ϕ(bi) = 1/(1 + e−bi). Here the function ϖ(․) is set to the identity.

We are interested in the parameter vector β, specifically the slope parameter, say β1, of the linear time variable. We attempted to fit the parametric and the semiparametric models to the data without fixing δ, but the computations were unstable, especially for large M. Multiple starting values were tried. In some cases the algorithm diverged, while in cases where it did converge, multiple local maxima were obtained for the full likelihood. This suggests that these models are at best weakly identified for our data. To address these identifiability concerns, we then fixed δ and performed the supremum test described in Section 3 to assess the sensitivity of the parameter β1 across all values of δ, under both the parametric and the semiparametric models. For this, we computed estimates of β1*(δ) − β1*(0) and associated 95% simultaneous confidence intervals across all values of δ ∈ ℋ. These estimates and associated 95% simultaneous confidence intervals are graphically represented in Figures 2(a) and 2(b), respectively, for the semiparametric and the parametric working models. These graphs are obtained by solving the profile score equations for fixed values of δ on a grid; estimates at intermediate points are then interpolated via smoothing. The 95% critical points (.14 and .26 for the semiparametric and parametric models, respectively) used to compute the simultaneous confidence intervals were obtained using 1000 bootstrap samples. The simultaneous confidence intervals do not contain 0 for δ ≤ 40 under the semiparametric model and for δ ≤ 30 under the parametric model. Hence, unless the true odds ratio of dropout hazard for two subjects with the same fixed covariates is larger than exp(40) and exp(30) for the semiparametric and parametric models, respectively, the linear time effect appears to be relatively stable under any perturbations of the MAR model in the direction of MNAR models. The evidence from our investigation speaks against dropout being at random in the Fluvoxamine data, but the sensitivity analysis results suggest that the adjusted slope of the time effect is not much influenced by δ in light of sampling variations. The MAR model is therefore preferable for its relative simplicity. From a clinical standpoint, our MAR analysis suggests that the intensity of side effects declines with time. For each unit increase in time, the estimated adjusted odds of not showing some side effects increase by a factor (95% confidence interval in parentheses) of 2.071 (1.648, 2.603) for the semiparametric model and 2.109 (1.671, 2.662) for the parametric model. This example illustrates how a global approach can be used to assess sensitivity of parameters of primary interest when some model characteristics are potentially unidentifiable from observed data. Moreover, it also illustrates how a sensitivity analysis can be conducted on the random effects distribution to address model misspecification.

Fig. 2. Estimates and simultaneous 95% confidence intervals of β1*(δ) − β1*(0) across values of the sensitivity parameter in the range 0 ≤ δ ≤ 40, assuming (a) a semiparametric model and (b) a parametric model

5. Simulations

In this section we report the results of a numerical study conducted to evaluate the small sample performance of the functional estimators proposed in this paper when the dropout process is nonrandom. The simulations were conducted so as to roughly approximate the binary outcomes from the Fluvoxamine study. In each Monte Carlo iteration, we simulated a sample of 100 subjects with four potential measurement time points (q = 4). The measurement outcomes were simulated using the Bernoulli model fY(Yi(t)|bi, xi(t)) = µi(t)^{Yi(t)}{1 − µi(t)}^{1−Yi(t)}, where µi(t) = {1 + exp(−1 + xi(t) − bi)}−1, with xi(t) = t taking values in the set {1, 2, 3, 4}. To keep the simulation simple, the dropout observations were generated using a time independent dropout hazard model given by log{hi(t)/(1 − hi(t))} = α0 + δ0/(1 + e−bi). This simplified assumption may be seen as restrictive, as missingness is typically related to events that occur while the study is ongoing. Throughout the simulations, we also fixed δ0 = 1. We produced 20% and 50% dropout rates by setting the values of α0 to 2.094 and .855, respectively. Two different distributions with a variance of 4.025 for the random effects bi were considered: a Gaussian distribution, ℒb ~ 𝒩(0, 4.025), and a bimodal mixture of two Gaussian distributions, ℒb ~ ½𝒩(−2, .025) + ½𝒩(2, .025). This process was repeated for 250 Monte Carlo replications. Our working measurement model is the Bernoulli model with the conditional mean,

µi(t) = {1 + exp(−β0 − β1xi(t) − bi)}−1. (6)

The working dropout model is given by,

log{hi(t)/(1 − hi(t))} = α + δϕ(bi), (7)

where ϕ(bi) = 1/(1 + e−bi). Both the semiparametric and the parametric models were then assessed for various values of δ ∈ [0, 40]. In particular, we used the percentage bias relative to the truth and the mean squared errors (MSEs) for the slope β1 of the time covariate.
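
For concreteness, a minimal data-generating sketch along the lines of the design described above is given below. It is our own code, not the authors' implementation: α0 is left as an argument to be calibrated to the target dropout rate rather than hard-coded, and the masking of post-dropout outcomes is our own bookkeeping choice.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_sample(n, alpha0, delta0=1.0, q=4, bimodal=False, seed=0):
    """Simulate n subjects along the lines of the Section 5 design:
    Y_i(t) ~ Bernoulli(mu_i(t)) with mu_i(t) = {1 + exp(-1 + t - b_i)}^{-1},
    and a time-constant dropout hazard logit h_i = alpha0 + delta0 * phi(b_i),
    phi(b) = {1 + exp(-b)}^{-1}; dropout can occur at t2, ..., tq."""
    rng = np.random.default_rng(seed)
    times = np.arange(1, q + 1)
    if bimodal:  # equal mixture of two narrow normals (total variance 4.025)
        b = rng.normal(rng.choice([-2.0, 2.0], size=n), np.sqrt(0.025))
    else:        # single normal with the same variance
        b = rng.normal(0.0, np.sqrt(4.025), size=n)
    Y = rng.binomial(1, expit(1.0 - times[None, :] + b[:, None]))   # (n, q) outcomes
    h = expit(alpha0 + delta0 * expit(b))                           # per-visit hazard
    D = np.full(n, q + 1)                                           # q + 1 = complete
    for j in range(2, q + 1):
        at_risk = D == q + 1
        D[at_risk & (rng.random(n) < h)] = j
    observed = times[None, :] < D[:, None]                          # Y_i(t_j) seen iff j < D_i
    return Y, D, observed, b

# Hypothetical alpha0 for illustration; tune it to hit the desired dropout rate.
Y, D, observed, b = simulate_sample(n=100, alpha0=-2.6)
print(np.mean(D <= 4))  # empirical dropout proportion for this alpha0
```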

Figure 3 shows the results of this simulation study when the measurement and dropout data are generated from a shared normal random effects model. Overall, higher dropout rates increase the bias as well as the MSEs of the estimates for both working models. Note, however, that these differences recede when δ is close to the true value δ0 = 1, and the two working models then have similar performance in terms of bias and MSEs. However, for moderate to large deviations of δ from δ0 = 1, the normal random effects approach gives higher values of percent bias and MSEs when the dropout rate is about 50%.

Fig. 3. Percent bias and MSEs of parameter β1 as a function of the sensitivity parameter δ, with true measurement model µi(t) = {1 + exp(−1 + xi(t) − bi)}−1, true dropout process hi(t) = {1 + exp(−α0 − ϕ(bi))}−1, with α0 = 2.094 (20% dropouts) and α0 = .855 (50% dropouts), and true shared random effects ℒb ~ 𝒩(0, 4.025)

Also displayed in Figure 4 are simulation results from a Monte Carlo study where the measurement and dropout data are generated from a shared bimodal random effects model. As expected, higher dropout rates increase the bias as well as the MSEs of the two working models. We notice, however, that even at the true value δ0 = 1, a wrongly assumed normal random effects model gives a slightly higher percent bias compared to the semiparametric model. This finding suggests that misspecification of the random effects distribution by a parametric model will typically yield biased estimates even at the true value δ0. This is not uncommon in the literature (see, for example, Tao et al., 1999). Note, however, that this problem recedes when MSEs are compared. This is probably due to an increased variability which dominates the MSEs in the semiparametric model. When the dropout rate is high and δ is much larger than δ0 = 1, this relative advantage of the normal random effects model in reducing the MSEs vanishes. In this case, the MSEs appear to be dominated by the bias.

Fig. 4. Percent bias and MSEs of parameter β1 as a function of the sensitivity parameter δ, with true measurement model µi(t) = {1 + exp(−1 + xi(t) − bi)}−1, true dropout process hi(t) = {1 + exp(−α0 − ϕ(bi))}−1, with α0 = 2.094 (20% dropouts) and α0 = .855 (50% dropouts), and true shared bimodal random effects ℒb ~ ½𝒩(−2, .025) + ½𝒩(2, .025)

While the ability to model the dropout hazard using the shared random effects approach is computationally convenient, we carried out a simulation to evaluate the estimation of β1 when mis-modelling hi(t) with shared random effects when the truth is actually a logistic model of the current, possibly unobserved, response Yi(t). Specifically, we simulated the binary measurement outcomes by reconsidering the Bernoulli model with the conditional mean µi(t) = {1 + exp(−1 + xi(t) − bi)}−1, but the dropout data are simulated using the process,

log{hi(t)/(1 − hi(t))} = α0 + Yi(t).

This selection model is similar to that of Diggle and Kenward (1994) and Molenberghs et al. (1997). We set α0 to 2.094 and .855, which yield about 20% and 50% dropout rates, respectively. Our working joint model is given by (6) and (7). The results of this numerical investigation are displayed in Figures 5 and 6. It is clear that for normal random effects imposed on the mean of the measurement process and for a 20% dropout rate, both the parametric and the semiparametric models give small bias and small MSEs regardless of the values taken by δ. However, for a dropout rate of about 50%, the two methods yield large bias and large MSEs. Moreover, the bias and MSEs from the semiparametric model are lower than those of the parametric model for large values of δ, although the two models are indistinguishable for small values of δ. These findings remain unchanged when a bimodal random effects distribution is imposed on the mean of the measurement process, except that for a high dropout rate, the semiparametric model gives smaller bias and smaller MSEs for all values of δ.

Fig. 5. Percent bias and MSEs of parameter β1 as a function of the sensitivity parameter δ, with true measurement model µi(t) = {1 + exp(−1 + xi(t) − bi)}−1, true dropout process hi(t) = {1 + exp(−α0 − Yi(t))}−1, with α0 = 2.094 (20% dropouts) and α0 = .855 (50% dropouts), and true unshared random effects ℒb ~ 𝒩(0, 4.025)

Fig. 6. Percent bias and MSEs of parameter β1 as a function of the sensitivity parameter δ, with true measurement model µi(t) = {1 + exp(−1 + xi(t) − bi)}−1, true dropout process hi(t) = {1 + exp(−α0 − Yi(t))}−1, with α0 = 2.094 (20% dropouts) and α0 = .855 (50% dropouts), and true unshared bimodal random effects ℒb ~ ½𝒩(−2, .025) + ½𝒩(2, .025)

These simulation studies strongly indicate that routine modelling is not appropriate in the missing data context and that sensitivity analysis should become much more customary. In particular, when a substantial number of subjects drop out prematurely from the study, the only analysis that matters is the sensitivity analysis.

6. Discussion

We have proposed a class of semiparametric non-response models which requires the analyst to specify the joint distribution of the responses and the missing data mechanism. This class of models is made part of a sensitivity analysis to delineate the range of inferences consistent with observed data. Two types of sensitivity analyses are conducted. The first sensitivity analysis is performed on the grounds that the parametric form of the random effects distribution may be misspecified. This is done by relaxing the parametric random effects distribution to a discrete distribution with M support points. In certain situations, one may want to assume a fully parametric random effects distribution, for instance because it yields a more parsimonious description of the heterogeneity among subjects. For MNAR models, however, these parametric distributions must be cautiously interpreted because they cannot be validated using observed data. Another line of sensitivity analysis results from the fact that the proposed models, both parametric and semiparametric, appear to be weakly identifiable for our data. The identifiability issue becomes acute as M, the number of classes in the semiparametric models, gets large. To address these identifiability concerns, we profile the likelihood function across a fixed sensitivity parameter. We discuss a global approach using a supremum test to assess the sensitivity of parameter contrasts resulting from a deviation of the MAR model in the direction of MNAR models. This approach can be used to quickly check whether a hypothesis is rejected regardless of any misspecification of the sensitivity parameter. The methodology is therefore especially useful in situations where a worst case analysis is needed. In practice, MNAR models are often quite complicated and it may not be clear what the true data generating mechanism is. In these set-ups, a global sensitivity analysis appears to be the most conservative data analytic strategy.

An important question related to our working method is whether we actually believe in the shared parameter model or whether we merely use it as a device to accommodate potentially non-random dropouts. This question arises very often in practice, where the researcher is confronted with the choice of a plausible dropout model. In this study, we have used random effects that adequately describe a subject’s response profile to tie the dropout process to the measurement process. Although we have considered the simple case of a random intercept model, the method can easily be extended to incorporate a random slope with respect to time to account for each subject’s deviation from the average linear trend. We have also assumed a linear relationship between the log-odds of dropout hazard and the observed and unobserved covariates, but this is not necessary. For example, nonlinear relations and terms involving interactions between these covariates could be used with a suitable modification of the function ϕ(․). Other model formulations and extensions are possible, but the choice of any working model should be closely linked to the substantive problem under analysis. Another important issue is the choice of Δ. Our recommendation is that a group of experts should help identify a plausible and meaningful value of Δ. Of course, from a practical viewpoint, the chosen value should be computationally feasible. Inference is then said to be insensitive to the dropout selection process if only extreme and implausible values of the sensitivity parameter are required to alter it.

Acknowledgements

The authors wish to thank Solvay Pharma B.V. for permission to use the data from the Fluvoxamine study. They also wish to thank the anonymous reviewer for his or her helpful comments.

This work was supported by the NCI/NIH K-award, 1K01 CA131259 and the Michigan State University Intramural Research Grant Program-IRGP.

Appendix

Appendix A

Proof of the result that exp(−|δ|) ≤ {hi(t)/(1 − hi(t))}{hi′(t)/(1 − hi′(t))}−1 ≤ exp(|δ|) for any real δ, under the condition that zi(t) = zi′(t)

The proof of this result is straightforward and relies entirely on the constraint imposed on ϕ(․). If one assumes the hazard model in (2) and that 0 < ϕ(bi), ϕ(bi′) < 1, it can readily be shown that |ϕ(bi) − ϕ(bi′)| < 1. Under the condition that zi(t) = zi′(t), this then leads to the result that exp(−δ) < {hi(t)/(1 − hi(t))}{hi′(t)/(1 − hi′(t))}−1 < exp(δ) for δ > 0, and that {hi(t)/(1 − hi(t))}{hi′(t)/(1 − hi′(t))}−1 = 1 for δ = 0. We then have the desired result for δ ≥ 0. For δ ≤ 0, we have exp(δ) ≤ {hi(t)/(1 − hi(t))}{hi′(t)/(1 − hi′(t))}−1 ≤ exp(−δ), which is the desired result.

Appendix B

Proof of Theorem 1

We prove that the intercept of the dropout process has a deterministic relationship with the parameter δ, as given by (5). First, it is clear that the integral on the right of (5) is finite, as ℒbo is a probability measure and ϕ(․) is bounded. We can assume without any loss of generality that the measures ℒbo and ℒbd have the same support. If the two distributions do not have the same support, a common support can be defined as the union of the individual supports, where no mass is placed on points outside the support of the individual measures. Let B be any subset of this common support; we have,

(1 − pd)ℒbo(B) = pr(D > 2, b ∈ B) = ∫_{b∈B} pr(D > 2|b, z(t2)) dℒb(b) = ∫_{b∈B} pr(D > 2|b, z(t2)){(1 − pd) dℒbo(b) + pd dℒbd(b)},

where the last equality is due to (4). We therefore have,

(1 − pd) ∫_B (1 − pr(D > 2|b, z(t2))) dℒbo(b) = pd ∫_B pr(D > 2|b, z(t2)) dℒbd(b),  for any B.

We then have,

dℒbd(b)/dℒbo(b) = [(1 − pd)/pd] [1 − pr(D > 2|b, z(t2))]/pr(D > 2|b, z(t2)) (8)

Note that under the assumption that no subject drops out at the first time point, we have pr(D ≥ 2) = 1 as q = 2. This then gives pr(D > 2|b, z(t2)) = 1 − pr(D = 2|D ≥ 2, b, z(t2)) = 1 − h(t2) from the definition of the hazard function. From the hazard model in (2), we have,

[1 − pr(D > 2|D ≥ 2, b, z(t2))]/pr(D > 2|D ≥ 2, b, z(t2)) = exp{ε(z(t2))′α + δϕ(b)}.

Substituting this expression into equation (8), basic algebra gives,

dℒbd(b) = [(1 − pd)/pd] exp{ε(z(t2))′α + δϕ(b)} dℒbo(b).

Let us denote by αj, j = 0, ⋯, r − 1, the slope coefficient associated with the jth component ε(zi(t2))j of the covariate vector ε(zi(t2)) for subject i at time point t2. Here ε(zi(t2))0 = 1 represents the covariate for the intercept term α0 in the hazard model. Assume that the function ε(․) centers all covariates z, that is, ∑i ε(zi(t2))j = 0 for j = 1, ⋯, r − 1. Because ℒbd is a law, we have ∫ dℒbd(b) = 1, and then,

exp{−α0 − ∑_{j=1}^{r−1} ε(zi(t2))j αj} = [(1 − pd)/pd] ∫ exp{δϕ(b)} dℒbo(b).

By multiplying the above expression over all subjects, we then get,

exp{−nα0 − ∑_{j=1}^{r−1} {∑_{i=1}^{n} ε(zi(t2))j} αj} = {[(1 − pd)/pd] ∫ exp{δϕ(b)} dℒbo(b)}^n.

Finally, given that ∑i ε(zi(t2))j = 0 for j = 1, ⋯, r − 1, the expression in (5) holds. Hence, for a fixed δ, the equation in (5) can be used to compute α0, which we denote α0(δ) to highlight this dependence. In particular, for δ = 0 the intercept term α0 takes the value α0 = log{pd/(1 − pd)} when q = 2.

Appendix C

Proof of Theorem 2

We recall that the probability of dropping out is given by,

fD(Di|bi, zi(t), δ) = hi(t2) if Di = 2, and fD(Di|bi, zi(t), δ) = [∏_{t∈𝒪i} (1 − hi(t))]{hi(tDi)}^{I(Di < 1+q)} if Di = 3, ⋯, q + 1.

It is clear that when δ gets large and supi,t |ε(zi(t))′α| is bounded, the dropout hazard hi(t) = {1 + exp(−ε(zi(t))′α − δϕ(bi))}−1 converges to 1, since ϕ(bi) > 0. Hence, fD(Di|bi, zi(t), δ) converges to 0 when 2 < Di ≤ q + 1 and to 1 only when Di = 2.

Let us denote by l_i^d and l_j^o, respectively, the contributions of a noncompleter i and a completer j to the likelihood. This then gives log ℓ(ψ, δ) = ∑_{i=1}^{nd} log l_i^d(ψ, δ) + ∑_{i=nd+1}^{n} log l_i^o(ψ, δ), where noncompleters appear first in the dataset after a proper rearrangement. Let nd(2) be the number of subjects who dropped out right after the baseline, that is, nd(2) = ∑i I(Di = 2). After a proper rearrangement of the subgroup of noncompleters in the dataset, we have for dropout subjects,

∑_{i=1}^{nd} log l_i^d(ψ, δ) = ∑_{i=1}^{nd(2)} log l_{i,Di=2}^d(ψ, δ) + ∑_{i=nd(2)+1}^{nd} log l_{i,2<Di}^d(ψ, δ),

where l_{i,Di=2}^d(ψ, δ) and l_{i,2<Di}^d(ψ, δ) are, respectively, the likelihood contributions of a subject who drops out at the second time point and of a subject who drops out after the second time point. It is clear that when δ → ∞, we have ∑_{i=nd(2)+1}^{nd} log l_{i,2<Di}^d(ψ, δ) → −∞ and,

∑_{i=1}^{nd(2)} log l_{i,Di=2}^d(ψ, δ) → ∑_{i=1}^{nd(2)} log{∫_{bi} ∏_{t∈𝒪i} fY(Yi(t)|bi, xi(t)) dℒ(bi)} < 0 uniformly.

Then the marginal log-likelihood for dropout subjects, ∑_{i=1}^{nd} log l_i^d(ψ, δ), diverges to −∞ when δ → ∞. Similarly, for subjects with complete data, the marginal log-likelihood ∑_{i=nd+1}^{n} log l_i^o(ψ, δ) diverges to −∞ when δ → ∞. Hence, for large values of δ, the log of the marginal likelihood, given by,

∑_{i=1}^{nd} log l_i^d(ψ, δ) + ∑_{i=nd+1}^{n} log l_i^o(ψ, δ),

diverges to −∞.

It can also be shown that the expectation of the complete log-likelihood given observed data and the current parameter estimate ψ(a)(δ) for fixed δ also diverges.

Appendix D

Uniform consistency and weak convergence of ψ̂(δ), 0 ≤ δ ≤ Δ

Let Ψ denote the parameter space for ψ. We denote by l(ψ, δ, Wi) the contribution of subject i to the log-likelihood function. Define

s(ψ, δ, Wi) = ∂l(ψ, δ, Wi)/∂ψ,  S(ψ, δ) = n−1 ∑_{i=1}^{n} s(ψ, δ, Wi),  c(ψ, δ) = E{s(ψ, δ, W1)},  g(ψ, δ, Wi) = ∂s(ψ, δ, Wi)/∂ψT,

Dψ(ψ, δ) = n−1 ∑_{i=1}^{n} g(ψ, δ, Wi), and D̃ψ(ψ, δ) = E{g(ψ, δ, W1)}. For any given δ ≤ Δ, let ψ̂(δ) denote the solution to S(ψ, δ) = 0, i.e. S(ψ̂(δ), δ) = 0, and define ψ*(δ) = argmaxψ∈Ψ E{l(ψ, δ, W1)}. Define 𝒢1 = {s(ψ, δ, Wi) : ψ ∈ Ψ, δ ≤ Δ} and 𝒢2 = {g(ψ, δ, Wi) : ψ ∈ Ψ, δ ≤ Δ}.

Assume that Ψ is compact and ψ*(δ) is an interior point in Ψ for any δ ≤ Δ. In addition, assume the following regularity conditions:

  • C1.

    The function classes 𝒢1 and 𝒢2 are pointwise measurable and satisfy the uniform entropy condition; see van der Vaart and Wellner (2000) for the definitions. For example, functions which are uniformly bounded and uniformly Lipschitz of order > {dim(ψ) + dim(δ)}/2 satisfy the above conditions, where dim(·) denotes the dimension of a vector.

  • C2.

    infψ∈Ψ,δ≤Δ eigmin{−D̃ψ(ψ, δ)} > 0, where eigmin(·) denotes the minimum eigenvalue of a matrix.

Condition C1 implies that 𝒢1 and 𝒢2 are Donsker and hence Glivenko-Cantelli (van der Vaart and Wellner, 2000). Therefore,

sup_{ψ∈Ψ,δ≤Δ} ‖S(ψ, δ) − c(ψ, δ)‖ →P 0  and  sup_{ψ∈Ψ,δ≤Δ} ‖Dψ(ψ, δ) − D̃ψ(ψ, δ)‖ →P 0, (A1)

where ‖ · ‖ denotes the Euclidean norm. The definitions of ψ̂(δ) and ψ*(δ) imply that

0 = S(ψ̂(δ), δ) − c(ψ*(δ), δ) = [S(ψ̂(δ), δ) − S(ψ*(δ), δ)] + [S(ψ*(δ), δ) − c(ψ*(δ), δ)] = D̃ψ(ψ̌(δ), δ)·{ψ̂(δ) − ψ*(δ)} + ν2,n(δ), (A2)

where ψ̌(δ) is on the line segment between ψ*(δ) and ψ̂(δ), and supδ≤Δ ‖ν2,n(δ)‖ →P 0. By C2, there exists a positive number κ0, not depending on δ, such that

‖ψ̂(δ) − ψ*(δ)‖ ≤ κ0−1 ‖ν2,n(δ)‖.

The uniform consistency of ψ̂(δ) to ψ*(δ) follows from supδ≤Δ ‖ν2,n(δ)‖ →P 0.

Based on the uniform consistency of ψ̂(δ) and (A1), applying the Taylor expansion to S(ψ̂(δ), δ) around {ψ*(δ), δ} gives

n^{1/2}{ψ̂(δ) − ψ*(δ)} ≈ −n^{−1/2} ∑_{i=1}^{n} D̃ψ−1(ψ*(δ), δ){s(ψ*(δ), δ, Wi) − E{s(ψ*(δ), δ, Wi)}} ≡ n^{−1/2} ∑_{i=1}^{n} ιi(δ), (A3)

where ≈ denotes asymptotic equivalence uniformly in δ ≤ Δ.

Because C1 implies that 𝒢1 is Donsker and because C2 implies that D̃ψ−1(ψ*(δ), δ) is uniformly bounded for δ ≤ Δ, the function class {D̃ψ−1(ψ*(δ), δ)s(ψ*(δ), δ, Wi), δ ≤ Δ} is Donsker. This permits the application of a functional central limit theorem to establish the weak convergence of ψ̂(δ).

These general results for any δ ≤ Δ also establish the asymptotic results for ψ̂(δ0), where δ = δ0.

Contributor Information

David Todem, Email: todem@msu.edu, Division of Biostatistics, Department of Epidemiology, Michigan State University, B601 West Fee Hall, East Lansing, MI 48824, U.S.A.

KyungMann Kim, Departments of Statistics and Biostatistics & Medical Informatics, University of Wisconsin-Madison, 600 Highland Ave., Madison, WI 53792, U.S.A.

Jason Fine, Departments of Statistics and Biostatistics & Medical Informatics, University of Wisconsin-Madison, 600 Highland Ave., Madison, WI 53792, U.S.A.

Limin Peng, Department of Biostatistics, Rollins School of Public Health, Emory University, 1518 Clifton Rd NE, Atlanta, GA 30322, U.S.A.

References

  1. Albert PS, Follmann DA. Modeling repeated count data subject to informative dropout. Biometrics. 2000;56:667–677.
  2. Beunckens C, Molenberghs G, Verbeke G, Mallinckrodt C. A latent-class mixture model for incomplete longitudinal gaussian data. Biometrics. 2008;64:96–105.
  3. Burton SW. A review of fluvoxamine and its uses in depression. International Clinical Psychopharmacology. 1991;6 Supplement 3:1–17.
  4. Casella G, Berger RL. Statistical inference. Pacific Grove, CA: Wadsworth & Brooks/Cole; 1990.
  5. Copas J, Eguchi S. Local sensitivity approximations for selectivity bias. Journal of the Royal Statistical Society, Series B: Methodological. 2001;63:871–895.
  6. Davidian M, Giltinan DM. Nonlinear Models for Repeated Measurement Data. Chapman & Hall Ltd; 1998.
  7. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B: Methodological. 1977;39:1–22.
  8. Diggle P, Kenward MG. Informative dropout in longitudinal data analysis (with discussion). Applied Statistics. 1994;43:49–93.
  9. Efron B, Tibshirani R. An introduction to the bootstrap. London: Chapman and Hall; 1993.
  10. Freedman DA. Adjusting for nonignorable drop-out using semiparametric nonresponse models: Comment. Journal of the American Statistical Association. 1999;94:1121–1122.
  11. Hogan J, Roy J, Korkontzelou C. Biostatistics tutorial: handling dropout in longitudinal studies. Statistics in Medicine. 2004;23:1455–1497.
  12. Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995;90:106–121.
  13. Kenward MG, Goetghebeur EJT, Molenberghs G. Sensitivity analysis for incomplete categorical data. Statistical Modelling. 2001;1:31–48.
  14. Kenward MG, Lesaffre E, Molenberghs G. An application of maximum likelihood and generalized estimating equations to the analysis of ordinal data from a longitudinal study with cases missing at random. Biometrics. 1994;50:945–953.
  15. Laird N, Ware J. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
  16. Lesaffre E, Molenberghs G, Dewulf L. Effects of dropouts in a longitudinal study: An application of a repeated ordinal model. Statistics in Medicine. 1996;15:1123–1141.
  17. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
  18. Lindsay BG. The geometry of mixture likelihoods: A general theory. The Annals of Statistics. 1983;11:86–94.
  19. Little R, Rubin D. Statistical analysis with missing data. New York: John Wiley and Sons; 1987.
  20. McLachlan G, Peel D. Finite Mixture Models. New York: Wiley; 2000.
  21. Molenberghs G, Kenward MG, Lesaffre E. The analysis of longitudinal ordinal data with non-random dropout. Biometrika. 1997;84:33–44.
  22. Molenberghs G, Lesaffre E. Marginal modeling of correlated ordinal data using a multivariate Plackett distribution. Journal of the American Statistical Association. 1994;89:633–644.
  23. Rosenbaum PR. Observational Studies. New York: Springer; 2002.
  24. Rotnitzky A, Scharfstein D, Su T-L, Robins J. Methods for conducting sensitivity analysis of trials with potentially nonignorable competing causes of censoring. Biometrics. 2001;57:103–113.
  25. Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion). Journal of the American Statistical Association. 1999;94:1096–1120.
  26. Tao H, Palta M, Yandell B, Newton M. An estimation method for the semiparametric mixed effects model. Biometrics. 1999;55:102–110.
  27. Ten Have TR, Reboussin BA, Miller ME, Kunselman A. Mixed effects logistic regression models for multiple longitudinal binary functional limitation responses with informative drop-out and confounding by baseline outcomes. Biometrics. 2002;58:137–144.
  28. Todem D, Kim K, Lesaffre E. A sensitivity approach to modeling longitudinal bivariate ordered data subject to informative dropouts. Health Services and Outcomes Research Methodology. 2006;6:37–57.
  29. Troxel A, Ma G, Heitjan D. An index of local sensitivity to nonignorability. Statistica Sinica. 2002;14:1221–1237.
  30. Vach W, Blettner M. Logistic regression with incompletely observed categorical covariates – Investigating the sensitivity against violation of the missing at random assumption. Statistics in Medicine. 1995;14:1315–1329.
  31. van der Vaart AW, Wellner JA. Preservation theorems for Glivenko-Cantelli and uniform Glivenko-Cantelli theorems. In: Giné E, Mason DM, Wellner JA, editors. High Dimensional Probability II. Boston: Birkhäuser; 2000a.
  32. van der Vaart AW, Wellner JA. Weak convergence and empirical processes. New York: Springer; 2000b.
  33. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. New York: Springer-Verlag; 2000.
  34. Verbeke G, Molenberghs G, Thijs H, Lesaffre E, Kenward M. Sensitivity analysis for nonrandom dropout: A local influence approach. Biometrics. 2001;57:7–14.
  35. Wu MC, Carroll RJ. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics. 1988;44:175–188.
