Author manuscript; available in PMC: 2011 Dec 1.
Published in final edited form as: J Am Stat Assoc. 2010 Dec;105(492):1333–1346. doi: 10.1198/jasa.2010.ap09321

A Bayesian Shrinkage Model for Incomplete Longitudinal Binary Data with Application to the Breast Cancer Prevention Trial

C Wang, MJ Daniels, DO Scharfstein, S Land
PMCID: PMC3079242  NIHMSID: NIHMS205212  PMID: 21516191

Abstract

We consider inference in randomized longitudinal studies with missing data generated by skipped clinic visits and loss to follow-up. In this setting, it is well known that full data estimands are not identified unless unverified assumptions are imposed. We assume a non-future dependence model for the drop-out mechanism and partial ignorability for the intermittent missingness. We posit an exponential tilt model that links non-identifiable distributions and distributions identified under partial ignorability. This exponential tilt model is indexed by non-identified parameters, which are assumed to have an informative prior distribution, elicited from subject-matter experts. Under this model, full data estimands can be expressed as functionals of the distribution of the observed data. To avoid the curse of dimensionality, we model the distribution of the observed data using a Bayesian shrinkage model. In a simulation study, we compare our approach to a fully parametric and a fully saturated model for the distribution of the observed data. Our methodology is motivated by, and applied to, data from the Breast Cancer Prevention Trial.

Keywords: Informative drop-out, Prior elicitation, Intermittent missingness

1 Introduction

1.1 Breast Cancer Prevention Trial

The Breast Cancer Prevention Trial (BCPT) was a large multi-center, double-blinded, placebo-controlled, chemoprevention trial of the National Surgical Adjuvant Breast and Bowel Project (NSABP) designed to test the efficacy of 20 mg/day tamoxifen in preventing breast cancer and coronary heart disease (Fisher et al., 1998). The targeted population of the study was women at increased risk of developing breast cancer, including women who were age 60 or older, who were age 35–59 and had 5-year predicted risk for breast cancer at least equivalent to that of women 60 years or older, or who had a history of lobular carcinoma in situ. The study was open to accrual from June 1, 1992 through September 30, 1997 and 13,338 women were enrolled in the study during this interval. The primary objective was to determine whether long-term tamoxifen therapy was effective in preventing the occurrence of invasive breast cancer. Secondary objectives included quality of life (QOL) assessments to evaluate benefit as well as risk resulting from the use of tamoxifen.

Monitoring QOL was of particular importance for this trial since the participants were healthy women and researchers had voiced concerns about an association between clinical depression and tamoxifen use. Accordingly, data on depressive symptoms were scheduled to be collected at baseline prior to randomization, at 3 months, at 6 months, and every 6 months thereafter for up to 5 years. The primary instrument used to monitor depressive symptoms over time was the Center for Epidemiologic Studies Depression Scale (CES-D) (Radloff, 1977). This self-administered questionnaire is composed of 20 items, each scored on a scale of 0–3. A score of 16 or higher is considered indicative of likely clinical depression.

The trial was unblinded on March 31, 1998, after an interim analysis showed a dramatic reduction in the incidence of breast cancer in the treatment arm. Due to the potential loss of the control arm, we focus on QOL data collected on the 10,739 participants who were enrolled during the first two years of accrual and had their CES-D score recorded at baseline. All women in this cohort had the potential for three years of follow-up (before the unblinding).

In the BCPT, the clinical centers were not required to collect QOL data on women after they stopped their assigned therapy. This design feature aggravated the problem of missing QOL data in the trial. As reported in Land et al. (2002), more than 30% of the CES-D scores were missing at the 36-month follow-up, with a slightly higher percentage in the tamoxifen group. They also showed that women with higher baseline CES-D scores had higher rates of missing data at each follow-up visit and the mean observed CES-D scores preceding a missing measurement were higher than those preceding an observed measurement; there was no evidence that these relationships differed by treatment group.

While these results suggest that the missing data process is associated with observed QOL outcomes, one cannot rule out the possibility that the process is further related to unobserved outcomes and that this relationship is modified by treatment. In particular, investigators were concerned (a priori) that, between assessments, tamoxifen might be causing depression in some individuals, who then do not return for their next assessment. If this occurs, the data are said to be missing not at random (MNAR); otherwise the data are said to be missing at random (MAR).

1.2 Missing Data in Longitudinal Studies

In this paper, we will concern ourselves with inference in longitudinal studies, where individuals miss visits intermittently and/or do not return for subsequent visits (i.e., drop-out). In such a setting, MNAR is often referred to as informative missingness.

1.2.1 Drop-out

There are two main inferential paradigms for analyzing longitudinal studies with informative drop-out: likelihood (parametric) and non-likelihood (semi-parametric). Articles by Little (1995), Hogan and Laird (1997a) and Kenward and Molenberghs (1999), as well as recent books by Molenberghs and Kenward (2007) and Daniels and Hogan (2008), provide comprehensive reviews of likelihood-based approaches, including selection models, pattern-mixture models, and shared-parameter models. These models differ in the way the joint distribution of the outcome and missing data processes is factorized. In selection models, one specifies a model for the marginal distribution of the outcome process and a model for the conditional distribution of the drop-out process given the outcome process (see, for example, Heckman, 1979; Diggle and Kenward, 1994; Baker, 1995; Fitzmaurice et al., 1995; Molenberghs et al., 1997; Liu et al., 1999; Albert, 2000); in pattern-mixture models, one specifies a model for the conditional distribution of the outcome process given the drop-out time and a model for the marginal distribution of the drop-out time (see, for example, Little, 1993, 1994, 1995; Hogan and Laird, 1997b; Fitzmaurice and Laird, 2000; Daniels and Hogan, 2000; Roy, 2003; Birmingham and Fitzmaurice, 2002; Thijs et al., 2002; Pauler et al., 2003; Roy and Daniels, 2008); and in shared-parameter models, the outcome and drop-out processes are assumed to be conditionally independent given shared random effects (see, for example, Wu and Carroll, 1988; DeGruttola and Tu, 1994; Ten Have et al., 1998, 2000; Pulkstenis et al., 1998; Land et al., 2002; Yuan and Little, 2009). Traditionally, these models have relied on very strong distributional assumptions to obtain model identifiability.

Without these strong distributional assumptions, specific parameters from these models would not be identified from the distribution of the observed data. To address this issue within a likelihood-based framework, several authors (Nordheim, 1984; Baker et al., 1992; Little, 1994; Little and Rubin, 1999; Kurland and Heagerty, 2004; Daniels and Hogan, 2008) have promoted the use of global sensitivity analysis, whereby non- or weakly- identified, interpretable parameters are fixed and then varied to evaluate the robustness of the inferences. Scientific experts can be employed to constrain the range of these parameters.

Non-likelihood approaches to informative drop-out in longitudinal studies have been primarily developed from a selection modeling perspective. Here, the marginal distribution of the outcome process is modeled non- or semi-parametrically and the conditional distribution of the drop-out process given the outcome process is modeled semi- or fully- parametrically. In the case where the drop-out process is assumed to depend only on observable outcomes (i.e., MAR), Robins et al. (1994, 1995), van der Laan and Robins (2003) and Tsiatis (2006) developed inverse-weighted and augmented inverse-weighted estimating equations for inference. For informative drop-out, Rotnitzky et al. (1998), Scharfstein et al. (1999) and Rotnitzky et al. (2001) introduced a class of selection models, in which the model for dropout is indexed by interpretable sensitivity parameters that express departures from MAR. Inference using inverse-weighted estimating equations was proposed.

The problem with the aforementioned sensitivity analysis approaches is that the ultimate inferences can be cumbersome to display. Vansteelandt et al. (2006) developed a method for reporting ignorance and uncertainty intervals (regions) that contain the true parameter(s) of interest with a prescribed level of precision, when the true data generating model is assumed to fall within a plausible class of models (as an example, see Scharfstein et al., 2004). An alternative and very natural strategy is to specify an informative prior distribution on the non- or weakly- identified parameters and conduct a fully Bayesian analysis, whereby the ultimate inferences are reported in terms of posterior distributions. In the cross-sectional setting with a continuous outcome, Scharfstein et al. (2003) adopted this approach from a semi-parametric selection modeling perspective. Kaciroti et al. (2009) proposed a parametric pattern-mixture model for cross-sectional, clustered binary outcomes. Lee et al. (2008) introduced a fully-parametric pattern-mixture approach in the longitudinal setting with binary outcomes. In this paper, we consider a similar setting to Lee et al. (2008), but offer a more flexible strategy. In the context of BCPT, the longitudinal outcome will be the indicator that the CES-D score is 16 or higher.

1.2.2 Intermittent Missing Data

In the BCPT, approximately 15% of the responses were intermittently missing, i.e. there are missing values prior to drop-out. One approach to handle intermittent missingness is to consider a “monotonized” dataset, whereby all CES-D scores observed on an individual after their first missing score are deleted, as in Land et al. (2002). However, this increases the “drop-out” rate, loses efficiency, and may introduce bias.

Handling informative intermittent missing data is methodologically and computationally challenging and, as a result, the statistics literature is relatively limited. Most methods adopt a likelihood approach and rely on strong parametric assumptions (see, for example, Troxel et al., 1998a; Albert, 2000; Ibrahim et al., 2001; Albert et al., 2002; Lin et al., 2004). Semiparametric methods have been proposed by Troxel et al. (1998b) and Vansteelandt et al. (2007). Troxel et al. (1998b) proposed a marginal model and introduced a pseudo-likelihood estimation procedure. Vansteelandt et al. (2007) extended the ideas of Rotnitzky et al. (1998), Scharfstein et al. (1999) and Rotnitzky et al. (2001) to non-monotone missing data.

Most related to our approach are the (partial ignorability) assumptions proposed in Harel and Schafer (2009) that partition the missing data and allow one (or more) of the partitions to be ignored given the other partition(s) and the observed data. In this paper, we apply a partial ignorability assumption such that the intermittent missing data mechanism can be ignored given drop-out and treatment strata.

1.3 Outline

The paper is organized as follows. In Section 2, we describe the data structure, formalize identification assumptions and prove that the treatment-specific distribution of the full trajectory of longitudinal outcomes is identified under these assumptions. In Section 3, we introduce a saturated model for the distribution of the data that would be observed when there is drop-out, but no intermittent observations. We then illustrate how to apply shrinkage priors to parameters in the saturated model to reduce the dimensionality of the parameter space and how to elicit (conditional) informative priors for non-identified sensitivity parameters from experts. In Section 4, we assess, by simulation, the behavior of three classes of models: parametric, saturated, and shrinkage. Our analysis of the BCPT trial is presented in Section 5. Section 6 is devoted to a summary and discussion.

2 Notation, Assumptions and Identifiability

We define the following notation for a random individual. When necessary, we use the subscript i to denote data for the ith individual.

Let Z denote the treatment assignment indicator, where Z = 1 denotes tamoxifen and Z = 0 denotes placebo. Let Y be the complete response data vector with elements Yj denoting the binary outcome (i.e., depression) scheduled to be measured at the jth visit (j = 0(baseline), …, J) and let Ȳj = (Y0, …, Yj) denote the history of the outcome process through visit j. Let R be the vector of missing data indicators with the same dimension as Y, such that Rj = 1 indicates Yj is observed and Rj = 0 indicates Yj is missing. Let S = max{t: Rt = 1} be the last visit at which an individual’s depression status is recorded. If S < J, then we say that the individual has dropped out and S is referred to as the drop-out time. Let RS = {Rj: j < S} be the collection of intermittent missing data indicators recorded prior to S.

We will find it useful to distinguish three sets of data for an individual: the complete data $(Z, S, R_S, Y)$, the full data $\mathcal{F} = (Z, S, R_S, \bar Y_S)$, and the observed data $(Z, S, R_S, Y_{obs})$, where $Y_{obs}$ is the subset of $Y$ for which $R_j = 1$. It is also useful to define $Y_{mis} = (Y_{mis}^I, Y_{mis}^C, Y_{mis}^F)$, where $Y_{mis}^I = \{Y_j : R_j = 0, j < S\}$ denotes the “intermittent” missing responses, $Y_{mis}^C = \{Y_j : j = S + 1, j \le J\}$ denotes the missing response at the visit right after drop-out, and $Y_{mis}^F = \{Y_j : S + 1 < j \le J\}$ denotes the “future” missing responses. Note that $\bar Y_S = (Y_{mis}^I, Y_{obs})$.

We assume that individuals are drawn as a simple random sample from a super-population so that we have an i.i.d. data structure for the complete, full and observed data. We let the parameters θz index a model for the joint conditional distribution of S and ȲS given Z = z and the parameters φs,z index a model for the conditional distribution of RS given S = s, Ȳs and Z = z. We assume that the parameters θz and φz = (φ1,z, …, φJ,z) are distinct.

Our goal is to draw inference about $\mu_{z,j} = P[Y_j = 1 \mid Z = z]$ for j = 1, …, J and z = 0, 1. To identify μz,j from the distribution of the observed data, we make the following three (untestable) assumptions:

Assumption 1

Given Z and S, the intermittent missing data are missing at random, i.e.,

$$R_S \perp\!\!\!\perp Y_{mis}^I \mid Z, S, Y_{obs}.$$

This assumption plus the assumption that θz is a priori independent of φz implies that the intermittent missingness mechanism is ancillary or ignorable. Specifically, this means that when considering inferences about θz from a likelihood perspective, as we are in this paper, the conditional distribution of RS given Z, S and Yobs does not contribute to the likelihood and can be ignored (Harel and Schafer, 2009).

The next two assumptions show how we can identify μz,j from the conditional distribution of S and ȲS given Z = z.

Assumption 2 (Non-Future Dependence)

For j = 1, …, J,

$$P[S = j-1 \mid S \ge j-1, Y] = P[S = j-1 \mid S \ge j-1, \bar Y_j]$$

This assumption asserts that, for individuals who are on study at visit j − 1 (and hence at risk of dropping out before visit j) and who share the same history of outcomes (possibly missing) up to and including visit j, the distribution of future outcomes is the same for those who are last seen at visit j − 1 and those who remain on study at visit j. This assumption has been referred to as non-future dependence (Kenward et al., 2003).

Assumption 3 (Pattern-Mixture Representation)

For j = 1, …, J and yj = 0, 1,

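$$P[Y_j = y_j \mid S = j-1, \bar Y_{j-1}, Z = z] = \frac{P[Y_j = y_j \mid S \ge j, \bar Y_{j-1}, Z = z]\,\exp\{q_{z,j}(\bar Y_{j-1}, y_j)\}}{E[\exp\{q_{z,j}(\bar Y_{j-1}, Y_j)\} \mid S \ge j, \bar Y_{j-1}, Z = z]}$$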

where qz,j(Ȳj−1, Yj) is a specified function of its arguments.

Assumption 3 links the non-identified conditional distribution of Yj given S = j − 1, Ȳj−1 and Z = z to the conditional distribution of Yj given S ≥ j, Ȳj−1 and Z = z, which is estimable under Assumption 1, by exponential tilting via the specified function qz,j(Ȳj−1, Yj). Assumption 3 has an equivalent selection model representation, obtained using Bayes’ rule.

Assumption 3 (Selection Model Representation)

For j = 1, …, J,

$$\operatorname{logit}\{P[S = j-1 \mid S \ge j-1, \bar Y_j, Z = z]\} = h_{z,j}(\bar Y_{j-1}) + q_{z,j}(\bar Y_{j-1}, Y_j)$$

where

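$$h_{z,j}(\bar Y_{j-1}) = \operatorname{logit} P[S = j-1 \mid S \ge j-1, \bar Y_{j-1}, Z = z] - \log\big\{E[\exp\{q_{z,j}(\bar Y_{j-1}, Y_j)\} \mid S \ge j, \bar Y_{j-1}, Z = z]\big\}$$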

With this characterization, we see that the function qz,j(Ȳj−1, Yj) quantifies the influence (on a log odds ratio scale) of the potentially unobservable outcome Yj on the conditional odds of dropping out before visit j. Furthermore, the functions qz,j(Ȳj−1, Yj) are not identifiable from the distribution of the observed data, and their specification places no restrictions on the distribution of the observed data.
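A quick numerical check of the two representations of Assumption 3 may help. The Python sketch below is illustrative only: it assumes the linear form q_{z,j}(ȳ_{j−1}, y) = τ·y, and the function names are hypothetical. It tilts P[Y_j = 1 | S ≥ j, Ȳ_{j−1}, Z = z] to obtain P[Y_j = 1 | S = j − 1, Ȳ_{j−1}, Z = z], then uses Bayes’ rule to recover the drop-out log odds given Y_j and compares them with h_{z,j} + q_{z,j}, computing the normalizing expectation in h_{z,j} among individuals with S ≥ j (the group to which the tilt is applied).

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def tilt_dropout_dist(pi_stay: float, tau: float) -> float:
    """P[Y_j = 1 | S = j-1, history] from the pattern-mixture form of Assumption 3,
    assuming q(history, y) = tau * y (so exp(q) is 1 for y = 0 and e^tau for y = 1)."""
    num = pi_stay * math.exp(tau)
    return num / (num + (1.0 - pi_stay))


def check_selection_form(d: float, pi_stay: float, tau: float) -> list:
    """For y = 0, 1 return (log odds by Bayes' rule, h + q from the selection form).

    d       : P[S = j-1 | S >= j-1, history]   (identified)
    pi_stay : P[Y_j = 1 | S >= j, history]      (identified)
    """
    pi_drop = tilt_dropout_dist(pi_stay, tau)
    # E[exp(q) | S >= j, history] for the assumed linear q
    e_q = (1.0 - pi_stay) + pi_stay * math.exp(tau)
    h = logit(d) - math.log(e_q)
    pairs = []
    for y in (0, 1):
        f_drop = pi_drop if y == 1 else 1.0 - pi_drop
        f_stay = pi_stay if y == 1 else 1.0 - pi_stay
        # Bayes' rule among those at risk: P[S = j-1 | S >= j-1, history, Y_j = y]
        p = d * f_drop / (d * f_drop + (1.0 - d) * f_stay)
        pairs.append((logit(p), h + tau * y))
    return pairs


for bayes, selection in check_selection_form(d=0.10, pi_stay=0.12, tau=0.4):
    print(f"Bayes: {bayes:.6f}   h + q: {selection:.6f}")
```

Both columns should agree for any d, pi_stay and tau, which is the sense in which the two forms of Assumption 3 are the same assumption.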

Together, the above three assumptions nonparametrically just-identify μz,j for all j = 1, …, J and z = 0, 1. To see this, consider the following representation of μz,j, derived using the laws of total and conditional probability:

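$$\begin{aligned}
\mu_{z,j} ={}& \sum_{\bar y_{j-1}} P[Y_j = 1 \mid S \ge j, \bar Y_{j-1} = \bar y_{j-1}, Z = z] \\
&\times \Big\{ \prod_{l=1}^{j} P[S \ge l \mid S \ge l-1, \bar Y_{l-1} = \bar y_{l-1}, Z = z] \prod_{l=0}^{j-1} P[Y_l = y_l \mid S \ge l, \bar Y_{l-1} = \bar y_{l-1}, Z = z] \Big\} \\
&+ \sum_{k=1}^{j} \sum_{\bar y_{k-1}} P[Y_j = 1 \mid S = k-1, \bar Y_{k-1} = \bar y_{k-1}, Z = z]\, P[S = k-1 \mid S \ge k-1, \bar Y_{k-1} = \bar y_{k-1}, Z = z] \\
&\times \Big\{ \prod_{l=1}^{k-1} P[S \ge l \mid S \ge l-1, \bar Y_{l-1} = \bar y_{l-1}, Z = z] \prod_{l=0}^{k-1} P[Y_l = y_l \mid S \ge l, \bar Y_{l-1} = \bar y_{l-1}, Z = z] \Big\}
\end{aligned}$$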

All quantities on the right hand side of this equation are estimable under Assumption 1, except P[Yj = 1|S = k − 1, Ȳk−1 = ȳk−1, Z = z] for k = 1, …,j. Under Assumptions 1, 2 and 3, these latter probabilities can be shown to be identified, implying that μz,j is identified for all j and z.

Theorem 1

$P[Y_j = 1 \mid S \ge k-1, \bar Y_{k-1} = \bar y_{k-1}, Z = z]$ and $P[Y_j = 1 \mid S = k-1, \bar Y_{k-1} = \bar y_{k-1}, Z = z]$ are identified for k = 1, …, j.

Proof

See Appendix.

The identifiability result shows that, given the functions qz,j(Ȳj−1, Yj), μz,j can be expressed as a functional of the distribution of the observed data.
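To make the identification argument concrete, the sketch below computes μz,1, …, μz,J for one arm from user-supplied identified quantities and sensitivity parameters. It is one way to organize the computation, not necessarily the algorithm given in the Appendix: it assumes qz,j(ȳj−1, y) = τ·y, and it uses the consequence of non-future dependence that the conditional distribution of Yj given Ȳj−1 and S ≥ j − 1 also applies to individuals who dropped out before visit j − 1 (Kenward et al., 2003). The callables `alpha`, `gamma` and `tau` are hypothetical placeholders for, e.g., posterior draws.

```python
import math
from typing import Callable, Dict, List, Tuple

History = Tuple[int, ...]  # (y_0, ..., y_{j-1})


def mu_trajectory(J: int, alpha0: float,
                  alpha: Callable[[int, History], float],
                  gamma: Callable[[int, History], float],
                  tau: Callable[[int, History], float]) -> List[float]:
    """Return [mu_{z,1}, ..., mu_{z,J}] for one arm.

    alpha(j, h): P[Y_j = 1 | S >= j, Ybar_{j-1} = h, Z = z]     (identified)
    gamma(j, h): P[S = j-1 | S >= j-1, Ybar_{j-1} = h, Z = z]   (identified)
    tau(j, h)  : sensitivity parameter, assuming q_{z,j}(h, y) = tau * y
    """
    # Full-data probability of each history Ybar_{j-1}; start with Y_0.
    probs: Dict[History, float] = {(0,): 1.0 - alpha0, (1,): alpha0}
    mus: List[float] = []
    for j in range(1, J + 1):
        nxt: Dict[History, float] = {}
        mu_j = 0.0
        for h, ph in probs.items():
            a, d = alpha(j, h), gamma(j, h)
            w = a * math.exp(tau(j, h))
            b = w / (w + 1.0 - a)          # tilted P[Y_j = 1 | S = j-1, h]
            m = d * b + (1.0 - d) * a      # P[Y_j = 1 | S >= j-1, h]
            # Under non-future dependence, m is also P[Y_j = 1 | h] for earlier
            # drop-outs, hence for the whole arm.
            mu_j += ph * m
            nxt[h + (1,)] = ph * m
            nxt[h + (0,)] = ph * (1.0 - m)
        mus.append(mu_j)
        probs = nxt
    return mus


# Hypothetical first-order inputs, only to show the call pattern.
print(mu_trajectory(3, 0.07,
                    alpha=lambda j, h: 0.08 if h[-1] == 0 else 0.55,
                    gamma=lambda j, h: 0.08,
                    tau=lambda j, h: 0.3))
```

With τ ≡ 0 (MAR) and a constant `alpha`, every μ equals that constant; positive τ pushes each μ upward, as expected when depressed women are more likely to drop out.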

3 Modeling, Prior Specification and Posterior Computation

3.1 Modeling

We specify saturated models for the conditional distribution of S and ȲS given Z = z via models for $P[Y_j = 1 \mid S \ge j, \bar Y_{j-1}, Z = z]$ (j = 0, …, J) and for $P[S = j-1 \mid S \ge j-1, \bar Y_{j-1}, Z = z]$ (j = 1, …, J). We parameterize these models as follows:

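$$\begin{aligned}
P[Y_0 = 1 \mid Z = z] &= \alpha_{z,0} \\
P[Y_1 = 1 \mid S \ge 1, Y_0 = y, Z = z] &= \alpha_{z,1,y} \\
P[Y_j = 1 \mid S \ge j, Y_{j-1} = y, \bar Y_{j-2} = \bar y_{j-2}, Z = z] &= \alpha_{z,j,\bar y_{j-2},y} \\
P[S = 0 \mid Y_0 = y, Z = z] &= \gamma_{z,0,y} \\
P[S = j-1 \mid S \ge j-1, Y_{j-1} = y, \bar Y_{j-2} = \bar y_{j-2}, Z = z] &= \gamma_{z,j-1,\bar y_{j-2},y}
\end{aligned} \tag{1}$$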

for j = 2, …, J and y = 0, 1. Let αz denote the parameters indexing the first set of models for response and γz denote the parameters indexing the second set of models for drop-out. Recall that we defined θz to denote the parameters of the conditional distribution of S and ȲS given Z = z; thus, θz = (αz, γz).

Furthermore, we parameterize the functions qz,j(Ȳj−1, Yj) by τz,j,ȳj−1 = qz,j(ȳj−1, 1) − qz,j(ȳj−1, 0). Here, exp(τz,j,ȳj−1) represents, in the context of the BCPT, the conditional odds ratio of dropping out between visits j − 1 and j for individuals who are depressed vs. not depressed at visit j but share the depression history ȳj−1 through visit j − 1. We let τz denote the collection of τz,j,ȳj−1’s.
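To see how quickly the saturated parameterization grows, the sketch below (a hypothetical helper) counts, per treatment arm, the α, γ and τ parameters just defined. For J = 7, as in the BCPT analysis, it gives 255 α’s, 254 γ’s and 254 τ’s per arm.

```python
def saturated_param_counts(J: int) -> dict:
    """Per-arm counts of alpha, gamma and tau parameters in the saturated model (1)."""
    counts = {"alpha": 1, "gamma": 0, "tau": 0}  # alpha_{z,0}
    for j in range(1, J + 1):
        n_hist = 2 ** j              # possible histories (y_0, ..., y_{j-1})
        counts["alpha"] += n_hist    # P[Y_j = 1 | S >= j, history]
        counts["gamma"] += n_hist    # P[S = j-1 | S >= j-1, history]
        counts["tau"] += n_hist      # tau_{z,j,ybar_{j-1}}
    return counts


for J in (3, 5, 7):
    print(J, saturated_param_counts(J))
```

Each additional visit roughly doubles the parameter count, which is the exponential growth discussed next.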

3.2 Shrinkage Prior

The saturated model proposed in Section 3.1 provides a perfect fit to the distribution of the observed data. In this model, however, the number of parameters increases exponentially in J, whereas the number of data points increases only linearly in J. As a consequence, many combinations of ȳj−1 (i.e., “cells”) will be sparsely represented in the dataset. In the BCPT, about 50% of the possible realizations of Ȳ7 have fewer than two observations and about 15% have no observations. From a frequentist perspective, this implies that components of θz will be imprecisely estimated; in turn, this can adversely affect estimation of μz,j. This has been called the curse of dimensionality (Robins and Ritov, 1997). To address this problem, we introduce data-driven shrinkage priors for higher-order interactions to reduce the number of parameters in an automated manner. In particular, we assume

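$$\begin{aligned}
\alpha_{z,j,\bar y_{j-2},y} &\sim \text{Beta}\Big(m^{(\alpha)}_{z,j,y}/\eta^{(\alpha)}_{z,j,y},\ \big(1 - m^{(\alpha)}_{z,j,y}\big)/\eta^{(\alpha)}_{z,j,y}\Big) \\
\gamma_{z,j-1,\bar y_{j-2},y} &\sim \text{Beta}\Big(m^{(\gamma)}_{z,j-1,y}/\eta^{(\gamma)}_{z,j-1,y},\ \big(1 - m^{(\gamma)}_{z,j-1,y}\big)/\eta^{(\gamma)}_{z,j-1,y}\Big)
\end{aligned} \tag{2}$$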

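for j = 2, …, J and y = 0, 1. For αz,0, αz,1,y and γz,0,y (y = 0, 1), we assign Unif(0, 1) priors. Let $m_z^{(\alpha)}$ ($m_z^{(\gamma)}$) and $\eta_z^{(\alpha)}$ ($\eta_z^{(\gamma)}$) denote the collections of parameters $m^{(\alpha)}_{z,j,y}$ ($m^{(\gamma)}_{z,j-1,y}$) and $\eta^{(\alpha)}_{z,j,y}$ ($\eta^{(\gamma)}_{z,j-1,y}$), respectively.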

Note that for a random variable X that follows a Beta(m/η, (1 − m)/η) distribution, we have

$$E[X] = m \quad\text{and}\quad \operatorname{Var}[X] = m(1 - m) \times \frac{\eta}{\eta + 1}.$$

For fixed m, Var[X] → 0 as η → 0, indicating shrinkage of the distribution of X toward its mean. Thus, $\eta^{(\alpha)}_{z,j,y}$ and $\eta^{(\gamma)}_{z,j-1,y}$ serve as shrinkage parameters for $\alpha_{z,j,\bar y_{j-2},y}$ and $\gamma_{z,j-1,\bar y_{j-2},y}$, respectively. As the shrinkage parameters go to zero, the distributions of the probabilities $\alpha_{z,j,\bar y_{j-2},y}$ and $\gamma_{z,j-1,\bar y_{j-2},y}$ are shrunk toward means that do not depend on $\bar y_{j-2}$, namely $m^{(\alpha)}_{z,j,y}$ and $m^{(\gamma)}_{z,j-1,y}$, respectively. In essence, the model is being shrunk toward a first-order Markov model. The shrinkage priors allow “neighboring cells” to borrow information from each other and provide more precise inferences.
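The shrinkage behavior follows directly from the Beta moments. The short check below (helper names are illustrative) evaluates the mean and variance of Beta(m/η, (1 − m)/η) from the standard Beta(a, b) formulas and compares the variance with m(1 − m)η/(η + 1) as η decreases.

```python
def beta_shapes(m: float, eta: float) -> tuple:
    """Shape parameters (a, b) of Beta(m/eta, (1 - m)/eta), the form used in prior (2)."""
    return m / eta, (1.0 - m) / eta


def beta_mean_var(a: float, b: float) -> tuple:
    """Mean and variance of a Beta(a, b) random variable."""
    s = a + b
    return a / s, a * b / (s * s * (s + 1.0))


m = 0.3
for eta in (10.0, 1.0, 0.1, 0.01):
    mean, var = beta_mean_var(*beta_shapes(m, eta))
    closed_form = m * (1.0 - m) * eta / (eta + 1.0)
    # The mean stays at m while the variance shrinks toward 0 as eta -> 0.
    print(f"eta={eta:>5}: mean={mean:.3f}, var={var:.5f}, closed form={closed_form:.5f}")
```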

We specify independent Unif(0, 1) priors for $m^{(\alpha)}_{z,j,y}$ and $m^{(\gamma)}_{z,j-1,y}$. For the shrinkage parameters $\eta^{(\alpha)}_{z,j,y}$ and $\eta^{(\gamma)}_{z,j-1,y}$, we specify independent uniform shrinkage priors (Daniels, 1999) as follows:

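$$p\big(\eta^{(\alpha)}_{z,j,y}\big) = \frac{g\big(E^{(\alpha)}_{z,j,y}\big)}{\big\{g\big(E^{(\alpha)}_{z,j,y}\big)\,\eta^{(\alpha)}_{z,j,y} + 1\big\}^2} \quad\text{and}\quad p\big(\eta^{(\gamma)}_{z,j-1,y}\big) = \frac{g\big(E^{(\gamma)}_{z,j-1,y}\big)}{\big\{g\big(E^{(\gamma)}_{z,j-1,y}\big)\,\eta^{(\gamma)}_{z,j-1,y} + 1\big\}^2}, \tag{3}$$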

where

  • g(·) is a summary function (e.g., minimum, median or maximum, as suggested in Christiansen and Morris (1997)).

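  • $E^{(\alpha)}_{z,j,y} = \{e^{(\alpha)}_{z,j,\bar y_{j-2},y}\}$, where $e^{(\alpha)}_{z,j,\bar y_{j-2},y}$ is the expected number of subjects with $S \ge j$, $Y_{j-1} = y$, $\bar Y_{j-2} = \bar y_{j-2}$, $Z = z$.

  • $E^{(\gamma)}_{z,j-1,y} = \{e^{(\gamma)}_{z,j-1,\bar y_{j-2},y}\}$, where $e^{(\gamma)}_{z,j-1,\bar y_{j-2},y}$ is the expected number of subjects with $S \ge j-1$, $Y_{j-1} = y$, $\bar Y_{j-2} = \bar y_{j-2}$, $Z = z$.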

The expected numbers of subjects with $S \ge j$, $Y_{j-1} = y$, $\bar Y_{j-2} = \bar y_{j-2}$, $Z = z$ and with $S \ge j-1$, $Y_{j-1} = y$, $\bar Y_{j-2} = \bar y_{j-2}$, $Z = z$ can be computed as:

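$$\begin{aligned}
e^{(\alpha)}_{z,j,\bar y_{j-2},y} &= n_z \sum_{s=j}^{J} \sum_{y_j, y_{j+1}, \ldots, y_s} P[S = s, Y_s = y_s, \ldots, Y_j = y_j, Y_{j-1} = y, \bar Y_{j-2} = \bar y_{j-2} \mid Z = z] \\
e^{(\gamma)}_{z,j-1,\bar y_{j-2},y} &= n_z \sum_{s=j-1}^{J} \sum_{y_j, y_{j+1}, \ldots, y_s} P[S = s, Y_s = y_s, \ldots, Y_j = y_j, Y_{j-1} = y, \bar Y_{j-2} = \bar y_{j-2} \mid Z = z]
\end{aligned} \tag{4}$$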

where nz is the number of subjects in treatment arm z and the probabilities on the right hand side of the above equations are estimable under Assumption 1.
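Equation (4) amounts to summing the estimated pattern probabilities over all drop-out patterns that share the history ȳj−1 = (ȳj−2, y). A minimal sketch follows, assuming these probabilities are available as a dictionary keyed by (s, ȳs) and taking nz to be the number of subjects in arm z; the data layout, helper name and toy numbers are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical input: P[S = s, Ybar_s = ybar_s | Z = z] keyed by (s, (y_0, ..., y_s)),
# e.g. as estimated under Assumption 1 from a fitted observed-data model.
PatternProbs = Dict[Tuple[int, Tuple[int, ...]], float]


def expected_counts(n_z: int, probs: PatternProbs, j: int,
                    prefix: Tuple[int, ...]) -> Tuple[float, float]:
    """Return (e_alpha, e_gamma) of equation (4) for
    prefix = (y_0, ..., y_{j-2}, y), i.e. ybar_{j-2} followed by Y_{j-1} = y."""
    if len(prefix) != j:
        raise ValueError("prefix must contain y_0, ..., y_{j-1}")
    e_alpha = e_gamma = 0.0
    for (s, ybar), p in probs.items():
        if s < j - 1 or ybar[:j] != prefix:
            continue
        # The sum over later y_j, ..., y_s is implicit: every pattern that
        # shares the prefix contributes once.
        e_gamma += n_z * p          # on study through visit j-1 (S >= j-1)
        if s >= j:
            e_alpha += n_z * p      # still on study at visit j (S >= j)
    return e_alpha, e_gamma


toy = {(0, (0,)): 0.10, (0, (1,)): 0.05, (1, (0, 0)): 0.10, (1, (0, 1)): 0.05,
       (2, (0, 0, 0)): 0.40, (2, (0, 0, 1)): 0.10, (2, (1, 1, 1)): 0.20}
print(expected_counts(100, toy, j=2, prefix=(0, 0)))
```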

The expected sample sizes above are used in the prior instead of the observed binomial sample sizes which are not completely determined due to the intermittent missingness. Thus, our formulation of these priors induces a small additional amount of data dependence beyond its standard dependence on the binomial sample sizes. This additional dependence impacts the median of the prior but not its diffuseness.
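The uniform shrinkage prior in (3) has a closed-form CDF, F(η) = 1 − 1/(gη + 1), so it can be sampled by inversion. The sketch below assumes g(·) is the median of a hypothetical set of expected cell counts. Two properties are visible: the prior median of η is 1/g, so the data-dependent summary g(·) locates the prior; and 1/(gη + 1), which is the weight the posterior mean of a Beta(m/η, (1 − m)/η)–binomial cell with g observations places on m, is Uniform(0, 1) a priori whatever the value of g.

```python
import random
import statistics


def usp_cdf(eta: float, g: float) -> float:
    """CDF of the uniform shrinkage density p(eta) = g / (g*eta + 1)**2, eta > 0."""
    return 1.0 - 1.0 / (g * eta + 1.0)


def usp_quantile(u: float, g: float) -> float:
    """Inverse of usp_cdf: solves 1 - 1/(g*eta + 1) = u for eta."""
    return u / (g * (1.0 - u))


def sample_usp(g: float, rng: random.Random) -> float:
    # rng.random() lies in [0, 1), so the denominator never vanishes
    return usp_quantile(rng.random(), g)


# Hypothetical expected cell counts for one (z, j, y); g(.) taken to be the median.
cells = [3.2, 11.5, 40.0, 7.8]
g_val = statistics.median(cells)
rng = random.Random(2010)
draws = [sample_usp(g_val, rng) for _ in range(10_000)]
# Prior weight on m in a cell with g_val observations; Uniform(0, 1) a priori.
weights = [1.0 / (g_val * eta + 1.0) for eta in draws]
print("prior median of eta:", usp_quantile(0.5, g_val), " 1/g:", 1.0 / g_val)
print("mean prior weight on m:", sum(weights) / len(weights))
```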

3.3 τz given θz

The sensitivity parameters in Assumption 3, defined formally in Section 3.1, are (conditional) odds ratios. In our experience, subject matter experts often have difficulty thinking in terms of odds ratios; rather, they are more comfortable expressing beliefs about relative risks (Scharfstein et al., 2006; Shepherd et al., 2007). With this in mind, we asked Dr. Patricia Ganz, a medical oncologist and expert on quality of life outcomes in breast cancer, to express her beliefs about the risk of dropping out and its relationship to treatment assignment and depression. We then translated her beliefs into prior distributional assumptions about the odds ratio sensitivity parameters τz.

Specifically, we asked Dr. Ganz to answer the following question for each treatment group:

Q: Consider a group of women assigned to placebo (tamoxifen), who are on study through visit j−1 and who share the same history of depression (depressed or not depressed). Suppose that the probability that a randomly selected woman in this group drops out before visit j is p. For each p, what is the minimum, maximum and your best guess (median) representing how much more (e.g. twice) or less (e.g., half) likely you consider the risk of dropping out before visit j for a woman who would be depressed at visit j RELATIVE to a woman who would not be depressed at visit j?

Implicit in this question is the assumption that, for each treatment group, the relative risk of dropping out only depends on past history and the visit number through the risk of dropping out between visits j − 1 and j.

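For notational convenience, let rz(p) denote the relative risk of drop-out for treatment group z and drop-out probability p. Further, let rz,min(p), rz,med(p) and rz,max(p) denote the elicited minimum, median, and maximum relative risks. Let $p^{(y)}_{z,j}(\bar y_{j-1}) = P[S = j-1 \mid S \ge j-1, \bar Y_{j-1} = \bar y_{j-1}, Y_j = y, Z = z]$ for y = 0, 1. By definition,

$$r_z(\gamma_{z,j-1,\bar y_{j-2},y_{j-1}}) = \frac{p^{(1)}_{z,j}(\bar y_{j-1})}{p^{(0)}_{z,j}(\bar y_{j-1})}, \qquad \gamma_{z,j-1,\bar y_{j-2},y_{j-1}} = \sum_{y=0}^{1} p^{(y)}_{z,j}(\bar y_{j-1})\,\pi^{(y)}_{z,j}(\bar y_{j-1}),$$

where $\pi^{(y)}_{z,j}(\bar y_{j-1}) = P[Y_j = y \mid S \ge j-1, \bar Y_{j-1} = \bar y_{j-1}, Z = z]$ for y = 0, 1. This implies that

$$p^{(0)}_{z,j}(\bar y_{j-1}) = \frac{\gamma_{z,j-1,\bar y_{j-2},y_{j-1}}}{\pi^{(1)}_{z,j}(\bar y_{j-1})\,\{r_z(\gamma_{z,j-1,\bar y_{j-2},y_{j-1}}) - 1\} + 1}.$$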

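Since $\pi^{(1)}_{z,j}(\bar y_{j-1}) \in [0, 1]$, and since both $p^{(0)}_{z,j}(\bar y_{j-1})$ and $p^{(1)}_{z,j}(\bar y_{j-1}) = r_z(\gamma_{z,j-1,\bar y_{j-2},y_{j-1}})\,p^{(0)}_{z,j}(\bar y_{j-1})$ must be probabilities, $p^{(0)}_{z,j}(\bar y_{j-1})$ is bounded as follows given $\gamma_{z,j-1,\bar y_{j-2},y_{j-1}}$ and $r_z(\gamma_{z,j-1,\bar y_{j-2},y_{j-1}})$ (abbreviated below as γ, r and $p^{(0)}$):

  • for r ≥ 1,
    $\gamma / r \le p^{(0)} \le \min\{\gamma, 1/r\}$;
  • for r ≤ 1,
    $\gamma \le p^{(0)} \le \min\{\gamma / r, 1\}$.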

We will use these bounds to construct our prior.
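The bounds can be checked numerically. In the sketch below (hypothetical names), `p0_from_pi` evaluates p^(0) = γ/{π^(1)(r − 1) + 1}, and `p0_bounds` returns the interval obtained as π^(1) ranges over [0, 1], intersected with the requirement that p^(0) and p^(1) = r·p^(0) are at most 1 (the source of the 1/r term when r ≥ 1).

```python
def p0_from_pi(gamma: float, r: float, pi1: float) -> float:
    """p^(0) implied by gamma = p^(0) * (1 - pi^(1)) + r * p^(0) * pi^(1)."""
    return gamma / (pi1 * (r - 1.0) + 1.0)


def p0_bounds(gamma: float, r: float) -> tuple:
    """Feasible interval for p^(0) given gamma and the relative risk r."""
    lower = gamma / max(r, 1.0)
    upper = min(gamma / min(r, 1.0), 1.0 / max(r, 1.0))
    return lower, upper


for gamma, r in [(0.10, 1.2), (0.25, 1.5), (0.50, 3.0), (0.90, 0.8)]:
    lo, hi = p0_bounds(gamma, r)
    print(f"gamma={gamma}, r={r}: {lo:.4f} <= p0 <= {hi:.4f}")
```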

We construct the conditional prior of τz,j, ȳj−1 given γz,j−1,ȳj−2,yj−1 using Steps 1–4 given below. The general strategy is to use the elicited information on the relative risk at different drop-out probabilities and the bounds derived above to construct the prior of interest.

Step 1. For m ∈ {min, med, max}, interpolate (and, outside the elicited range, extrapolate) the elicited rz,m(p) across drop-out probabilities (see Figure 1) to find rz,m(γz,j−1,ȳj−2,yj−1) for any γz,j−1,ȳj−2,yj−1.

Figure 1. Extrapolation of the elicited relative risks.

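Step 2. Construct the prior of rz(γz,j−1,ȳj−2,yj−1) given γz,j−1,ȳj−2,yj−1 as a 50-50 mixture of

$$\text{Unif}\big(r_{z,\min}(\gamma_{z,j-1,\bar y_{j-2},y_{j-1}}),\ r_{z,\mathrm{med}}(\gamma_{z,j-1,\bar y_{j-2},y_{j-1}})\big)$$

and

$$\text{Unif}\big(r_{z,\mathrm{med}}(\gamma_{z,j-1,\bar y_{j-2},y_{j-1}}),\ r_{z,\max}(\gamma_{z,j-1,\bar y_{j-2},y_{j-1}})\big)$$

random variables. This preserves the elicited percentiles of the relative risk.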

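Step 3. Construct a conditional prior of $p^{(0)}_{z,j}(\bar y_{j-1})$ given $\gamma_{z,j-1,\bar y_{j-2},y_{j-1}}$ and $r_z(\gamma_{z,j-1,\bar y_{j-2},y_{j-1}})$ (written γ and r for short) as a uniform distribution with lower bound

$$\frac{\gamma}{\max\{r, 1\}}$$

and upper bound

$$\min\left\{\frac{\gamma}{\min\{r, 1\}},\ \frac{1}{\max\{r, 1\}}\right\}.$$

These bounds follow from the derivation above, together with the requirement that $p^{(0)}_{z,j}(\bar y_{j-1})$ and $p^{(1)}_{z,j}(\bar y_{j-1}) = r\,p^{(0)}_{z,j}(\bar y_{j-1})$ not exceed 1.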

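Step 4. Steps 2 and 3 induce a prior for $\tau_{z,j,\bar y_{j-1}} \mid \theta_z$ by noting that

$$\tau_{z,j,\bar y_{j-1}} = \log\left( \frac{r_z(\gamma_{z,j-1,\bar y_{j-2},y_{j-1}})\,\{1 - p^{(0)}_{z,j}(\bar y_{j-1})\}}{1 - r_z(\gamma_{z,j-1,\bar y_{j-2},y_{j-1}})\,p^{(0)}_{z,j}(\bar y_{j-1})} \right),$$

i.e., τz,j,ȳj−1 is a deterministic function of rz(γz,j−1,ȳj−2,yj−1) and $p^{(0)}_{z,j}(\bar y_{j-1})$.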
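Steps 2–4 can be sketched as a sampler for one τ given a drop-out probability γ. This is a sketch under stated assumptions, not the authors’ code: it uses the tamoxifen relative risks from Table 1, and for Step 1 it assumes a straight line through the two elicited points, extrapolated linearly and floored at 1, because the curve used in Figure 1 is not given in the text. Step 3 uses the interval [γ/max{r, 1}, min{γ/min{r, 1}, 1/max{r, 1}}], which keeps both p^(0) and r·p^(0) at most 1. Names such as `rr_at` and `draw_tau` are hypothetical.

```python
import math
import random

# Tamoxifen-arm relative risks from Table 1, at drop-out rates 10% and 25%.
RATES = (0.10, 0.25)
ELICITED = {"min": (1.10, 1.30), "med": (1.20, 1.50), "max": (1.30, 1.60)}


def rr_at(which: str, gamma: float) -> float:
    """Step 1 (assumed form): straight line through the two elicited points,
    extrapolated linearly and floored at 1.0. Figure 1 may use a different curve."""
    (r10, r25), (p10, p25) = ELICITED[which], RATES
    slope = (r25 - r10) / (p25 - p10)
    return max(1.0, r10 + slope * (gamma - p10))


def draw_tau(gamma: float, rng: random.Random) -> float:
    """Steps 2-4: one prior draw of tau given the drop-out probability gamma.
    Assumes gamma * r < 1 so that 1 - r * p0 stays positive."""
    r_min, r_med, r_max = (rr_at(k, gamma) for k in ("min", "med", "max"))
    # Step 2: 50-50 mixture of Unif(r_min, r_med) and Unif(r_med, r_max)
    if rng.random() < 0.5:
        r = rng.uniform(r_min, r_med)
    else:
        r = rng.uniform(r_med, r_max)
    # Step 3: p0 uniform on the feasible interval
    lower = gamma / max(r, 1.0)
    upper = min(gamma / min(r, 1.0), 1.0 / max(r, 1.0))
    p0 = rng.uniform(lower, upper)
    # Step 4: tau is the log odds ratio implied by (r, p0)
    return math.log(r * (1.0 - p0) / (1.0 - r * p0))


rng = random.Random(1998)
for gamma in (0.10, 0.25):
    draws = sorted(draw_tau(gamma, rng) for _ in range(5000))
    print(f"gamma={gamma:.2f}: median tau = {draws[len(draws) // 2]:.3f}")
```

Because every elicited relative risk exceeds 1, each draw of τ is positive, and draws at γ = 0.25 tend to be larger than at γ = 0.10.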

The relative risks elicited from Dr. Ganz are given in Table 1. We extrapolated the relative risks outside the ranges given in Table 1 as shown in Figure 1.

Table 1.

Elicited Percentiles of Relative Risks of Dropping Out

| Treatment | Percentile | Drop-out rate 10% | Drop-out rate 25% |
|---|---|---|---|
| Tamoxifen | Minimum (100% confident the number is above) | 1.10 | 1.30 |
| Tamoxifen | Median (best guess) | 1.20 | 1.50 |
| Tamoxifen | Maximum (100% confident the number is below) | 1.30 | 1.60 |
| Placebo | Minimum (100% confident the number is above) | 1.01 | 1.20 |
| Placebo | Median (best guess) | 1.05 | 1.30 |
| Placebo | Maximum (100% confident the number is below) | 1.10 | 1.40 |

Figure 2 shows the density of τz,j,ȳj−1 given γz,j−1,ȳj−2,yj−1 equal to 10% and 25% for the tamoxifen and placebo arms. For two patients with the same depression history up to time point j − 1, the log odds ratio of dropping out at time point j, for the patient who is depressed at time point j versus the patient who is not, increases as the overall drop-out rate at time point j increases. In general, for a given γz,j−1,ȳj−2,yj−1, the log odds ratio is higher for patients in the tamoxifen arm than for those in the placebo arm.

Figure 2. Prior conditional density of τz,j,ȳj−1 given γz,j−1,ȳj−2,yj−1. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for γz,j−1,ȳj−2,yj−1 = 0.25 and 0.10, respectively.

3.4 Posterior Computation

The following steps are used to simulate draws from the posterior of μz,j:

  1. Sample from $p(\theta_z, Y_{mis}^I \mid Y_{obs}, S, R_S, Z = z)$ using Gibbs sampling with data augmentation (see details in Appendix). Continue sampling until convergence.

  2. For each draw of γz,j−1,ȳj−2,yj−1, draw τz,j,ȳj−1 from the conditional priors described in Section 3.3.

  3. Compute μz,j by plugging the draws of αz,j,ȳj−2,y, γz,j−1,ȳj−2,y and τz,j,ȳj−1 into the identification algorithm discussed in Section 2.
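The Gibbs sampler itself is described in the Appendix and is not reproduced here. As a hedged illustration of why the Beta form of prior (2) is convenient, the sketch below shows one conditional update for a single α cell once Y_mis^I has been imputed: with a binomial likelihood, the Beta prior is conjugate. The counts are hypothetical, the function names are illustrative, and the updates of m, η and the imputation step are omitted; γ cells would be updated the same way with drop-out counts.

```python
import random


def draw_alpha_cell(m: float, eta: float, n_success: int, n_total: int,
                    rng: random.Random) -> float:
    """Draw alpha_{z,j,ybar_{j-2},y} from its full conditional given completed data.

    Prior (2): Beta(m/eta, (1-m)/eta). Likelihood: n_success of n_total subjects
    (with S >= j and the given history) are depressed at visit j.
    """
    a = m / eta + n_success
    b = (1.0 - m) / eta + (n_total - n_success)
    return rng.betavariate(a, b)


def prior_weight(eta: float, n_total: int) -> float:
    """Weight w on m in the conditional posterior mean w*m + (1-w)*n_success/n_total."""
    return 1.0 / (1.0 + n_total * eta)


rng = random.Random(0)
draws = [draw_alpha_cell(0.10, 0.05, 3, 10, rng) for _ in range(5000)]
w = prior_weight(0.05, 10)
print("Monte Carlo mean:", sum(draws) / len(draws))
print("closed form     :", w * 0.10 + (1 - w) * 3 / 10)
```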

4 Assessment of Model Performance via Simulation

We simulated observed data (no intermittent missingness) from a parametric model of the following form:

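$$\begin{aligned}
\operatorname{logit} P[Y_0 = 1 \mid Z = z] &= \alpha_{z,0,0} \\
\operatorname{logit} P[Y_1 = 1 \mid S \ge 1, Y_0 = y_0, Z = z] &= \alpha_{z,1,0} + \alpha_{z,1,1}\, y_0 \\
\operatorname{logit} P[Y_j = 1 \mid S \ge j, \bar Y_{j-1} = \bar y_{j-1}, Z = z] &= \alpha_{z,j,0} + \alpha_{z,j,1}\, y_{j-1} + \alpha_{z,j,2}\, y_{j-2} \\
\operatorname{logit} P[S = 0 \mid Y_0 = y_0, Z = z] &= \gamma_{z,0,0} + \gamma_{z,0,1}\, y_0 \\
\operatorname{logit} P[S = j-1 \mid S \ge j-1, \bar Y_{j-1} = \bar y_{j-1}, Z = z] &= \gamma_{z,j-1,0} + \gamma_{z,j-1,1}\, y_{j-1} + \gamma_{z,j-1,2}\, y_{j-2},
\end{aligned}$$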

for j = 2, …, 7. Our “true” model is a second-order Markov model.
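For readers who want to reproduce the flavor of the simulation, the sketch below draws monotone data (no intermittent missingness) from the second-order Markov model above for one arm. To stay short it uses only J = 2 and the tamoxifen coefficients for j = 0, 1, 2, read from Table 3 on the assumption that the γ1,j−1,· entries start at j = 1; the function and variable names are hypothetical.

```python
import math
import random


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Tamoxifen coefficients for j = 0, 1, 2 (Table 3); unused terms set to 0.
# A[j] = (alpha_{1,j,0}, alpha_{1,j,1}, alpha_{1,j,2})
# G[j] = (gamma_{1,j-1,0}, gamma_{1,j-1,1}, gamma_{1,j-1,2}); G[0] is unused.
A = [(-2.578, 0.0, 0.0), (-2.500, 2.460, 0.0), (-2.613, 1.978, 1.500)]
G = [None, (-2.352, 0.611, 0.0), (-2.871, 0.397, 0.121)]


def simulate_subject(J: int, rng: random.Random):
    """Return (S, [Y_0, ..., Y_S]) for one subject with monotone drop-out."""
    y = [int(rng.random() < expit(A[0][0]))]
    for j in range(1, J + 1):
        y1 = y[j - 1]
        y2 = y[j - 2] if j >= 2 else 0
        g0, g1, g2 = G[j]
        if rng.random() < expit(g0 + g1 * y1 + g2 * y2):
            return j - 1, y              # last seen at visit j - 1
        a0, a1, a2 = A[j]
        y.append(int(rng.random() < expit(a0 + a1 * y1 + a2 * y2)))
    return J, y


rng = random.Random(36)
data = [simulate_subject(2, rng) for _ in range(5000)]
print("completers:", sum(s == 2 for s, _ in data), "of", len(data))
```

Extending to J = 7 only requires filling A and G with the remaining columns of Table 3.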

We compared the performance of our shrinkage model with (1) a correct parametric model, (2) an incorrect parametric model (a first-order Markov model) and (3) a saturated model (with diffuse priors). Our shrinkage model uses the shrinkage priors proposed in Section 3.2. Note that these priors shrink the saturated model toward an incorrect parametric model (i.e., a first-order Markov model). For the saturated model, instead of the shrinkage priors, we specify independent Unif(0, 1) priors on the αz’s and γz’s.

To determine the parameters of the data generating model, we fit this model to the “monotonized” BCPT data in WinBUGS with non-informative priors and used the posterior means of the parameters αz and γz as the true parameters. We computed the “true” values of μz,j by (1) drawing 10,000 values from the elicited prior of τz given γz (Table 1), (2) computing μz,j using the identification algorithm in Section 2 for each draw, and (3) averaging the resulting μz,j’s. The model parameters and the “true” depression rates μz,j are given in Table 3.

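Table 3.

Simulation Scenario (columns: time point j, with month in parentheses; – indicates not applicable)

| Parameter | 0 (0) | 1 (3) | 2 (6) | 3 (12) | 4 (18) | 5 (24) | 6 (30) | 7 (36) |
|---|---|---|---|---|---|---|---|---|
| Tamoxifen | | | | | | | | |
| α1,j,0 | −2.578 | −2.500 | −2.613 | −2.752 | −2.626 | −2.789 | −2.811 | −2.895 |
| α1,j,1 | – | 2.460 | 1.978 | 1.940 | 2.023 | 2.072 | 1.885 | 2.007 |
| α1,j,2 | – | – | 1.500 | 1.599 | 1.389 | 1.612 | 1.639 | 1.830 |
| γ1,j−1,0 | – | −2.352 | −2.871 | −2.625 | −2.513 | −2.281 | −2.217 | −2.536 |
| γ1,j−1,1 | – | 0.611 | 0.397 | 0.460 | 0.247 | 0.320 | 0.127 | 0.228 |
| γ1,j−1,2 | – | – | 0.121 | 0.422 | 0.261 | 0.035 | 0.293 | 0.204 |
| Depression rate | 0.066 | 0.097 | 0.119 | 0.124 | 0.139 | 0.126 | 0.126 | 0.123 |
| Placebo | | | | | | | | |
| α0,j,0 | −2.653 | −2.632 | −2.59 | −2.663 | −2.598 | −2.884 | −2.853 | −3.035 |
| α0,j,1 | – | 2.708 | 2.304 | 1.874 | 2.104 | 2.068 | 2.123 | 2.243 |
| α0,j,2 | – | – | 1.241 | 1.608 | 1.471 | 1.693 | 1.540 | 1.989 |
| γ0,j−1,0 | – | −2.308 | −2.970 | −2.729 | −2.474 | −2.410 | −2.460 | −2.673 |
| γ0,j−1,1 | – | 0.466 | 0.468 | 0.469 | 0.272 | 0.376 | 0.088 | 0.001 |
| γ0,j−1,2 | – | – | −0.293 | 0.323 | 0.278 | 0.288 | 0.241 | 0.428 |
| Depression rate | 0.071 | 0.107 | 0.118 | 0.120 | 0.132 | 0.130 | 0.126 | 0.125 |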

We considered small (500), moderate (2000), large (5000) and very large (1,000,000) sample sizes for each treatment arm; for each sample size, we simulated 500 datasets. We assessed model performance using mean squared error (MSE).

In Table 4 (sample size 1,000,000 not shown), we report the MSEs of P[Yj = 1 | S ≥ j, Ȳj−1, Z = z] and P[S = j − 1 | S ≥ j − 1, Ȳj−1, Z = z], averaged over all j and all histories (see columns Y and R, respectively). We also report the MSEs for μz,j (see the remaining columns, labeled by time point). For reference, the rows labeled True give the MSEs associated with the true data generating model. At all sample sizes, the shrinkage model has lower MSEs for the rates of depression at times 3–7 than the incorrectly specified parametric model and the saturated model. Our simulation results show that as the sample size grows (e.g., to 1,000,000), both the shrinkage model and the saturated model converge to the true values of μz,j, whereas the incorrectly specified parametric model yields biased estimates.

Table 4.

Simulation Results: MSE (×10³). P and T represent placebo and tamoxifen arms, respectively.

Model Treat Observed μz,j by Time Point j (Month)

Y R 1(3) 2(6) 3(12) 4(18) 5(24) 6(30) 7(36)
Sample Size 500
True P 6.209 2.474 0.199 0.225 0.258 0.313 0.319 0.352 0.390
T 6.790 2.789 0.205 0.228 0.297 0.344 0.331 0.405 0.428

Parametric P 33.351 1.511 0.199 0.227 0.26 0.317 0.323 0.349 0.388
T 32.323 1.602 0.205 0.226 0.292 0.345 0.333 0.403 0.425

Shrinkage P 29.478 2.310 0.202 0.226 0.252 0.303 0.312 0.337 0.372
T 28.410 2.365 0.212 0.232 0.294 0.336 0.330 0.390 0.419

Saturated P 57.107 111.263 0.202 0.228 0.302 0.490 1.083 2.401 4.427
T 55.582 104.882 0.211 0.245 0.383 0.657 1.352 3.167 5.782

Sample Size 2000
True P 1.474 0.586 0.052 0.058 0.063 0.078 0.081 0.086 0.097
T 1.610 0.634 0.050 0.063 0.062 0.080 0.093 0.095 0.101

Parametric P 30.507 0.543 0.051 0.055 0.064 0.081 0.090 0.091 0.108
T 29.168 0.495 0.050 0.064 0.071 0.086 0.101 0.110 0.121

Shrinkage P 23.545 0.647 0.053 0.056 0.063 0.078 0.082 0.084 0.095
T 22.598 0.615 0.050 0.063 0.063 0.080 0.093 0.095 0.102

Saturated P 40.322 77.627 0.053 0.057 0.069 0.100 0.188 0.457 0.946
T 38.943 72.731 0.050 0.064 0.067 0.110 0.218 0.560 1.223

Sample Size 5000
True P 0.594 0.234 0.020 0.024 0.026 0.033 0.031 0.040 0.036
T 0.623 0.265 0.024 0.024 0.028 0.035 0.033 0.039 0.040

Parametric P 29.983 0.379 0.020 0.025 0.029 0.037 0.043 0.049 0.055
T 28.616 0.298 0.024 0.025 0.035 0.048 0.045 0.060 0.059

Shrinkage P 18.83 0.394 0.020 0.024 0.026 0.033 0.031 0.039 0.036
T 18.055 0.322 0.024 0.024 0.028 0.036 0.034 0.040 0.041

Saturated P 30.071 54.454 0.020 0.024 0.027 0.038 0.052 0.13 0.270
T 29.156 50.590 0.024 0.024 0.029 0.039 0.059 0.148 0.373

In addition, the MSE’s for the parameters μz,j in the shrinkage model compare favorably with those of the true parametric model for all sample sizes considered, despite the fact that the shrinkage priors were specified to shrink toward an incorrect model.

5 Application: Breast Cancer Prevention Trial (BCPT)

Table 2 displays the treatment-specific drop-out and intermittent missing rates in the BCPT. By the 7th study visit (36 months), more than 30% of patients had dropped out in each treatment arm, with a slightly higher percentage in the tamoxifen arm.

Table 2.

Missingness by Scheduled Measurement Time

Time Point j (Month)
1(3) 2(6) 3(12) 4(18) 5(24) 6(30) 7(36)
Tamoxifen (Total N = 5364, Overall Missing 34.94%)
Intermittent Missing 330 224 190 200 203 195 –
Drop-out at j 160 122 259 280 332 352 369
Cumulative Drop-out 160 282 541 821 1153 1505 1874

Placebo (Total N = 5375, Overall Missing 31.83%)
Intermittent Missing 347 215 153 181 199 197 –
Drop-out at j 157 106 247 287 309 272 333
Cumulative Drop-out 157 263 510 797 1106 1378 1711
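The cumulative drop-out row and the “Overall Missing” percentages in Table 2 can be reproduced from the per-visit drop-out counts alone. The cumulative row is a running sum, and the overall percentage is the final cumulative count divided by N, so it does not include intermittently missed visits. A short Python check using only numbers from the table:

```python
from itertools import accumulate

def dropout_summary(dropouts_by_visit, n_total):
    """Running total of drop-outs and the percentage of the arm that had dropped out by the last visit."""
    cumulative = list(accumulate(dropouts_by_visit))
    overall_pct = round(100.0 * cumulative[-1] / n_total, 2)
    return cumulative, overall_pct

tam_cum, tam_pct = dropout_summary([160, 122, 259, 280, 332, 352, 369], 5364)
pla_cum, pla_pct = dropout_summary([157, 106, 247, 287, 309, 272, 333], 5375)
print(tam_cum, tam_pct)  # [160, 282, 541, 821, 1153, 1505, 1874] 34.94
print(pla_cum, pla_pct)  # [157, 263, 510, 797, 1106, 1378, 1711] 31.83
```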

5.1 Model Fit

We fit the shrinkage model to the observed data in R, running multiple chains of 5000 iterations each with a burn-in of 1000. Convergence was checked by examining trace plots of the chains. We took g(·) in the priors for the hyperparameters (Equation 3) to be the maximum function. To compute the expected numbers of subjects ez,j,ȳj−2,y(α) and ez,j−1,ȳj−2,y(γ) in Equation (4), we assigned a point-mass prior at 0.5 to all of mz(α), mz(γ), ηz(α) and ηz(γ) (which corresponds to Unif(0, 1) priors on αz,j,ȳj−2,y and γz,j−1,ȳj−2,y) and sampled αz,j,ȳj−2,y and γz,j−1,ȳj−2,y using Step 1 of the algorithm described in Section 3.4. To avoid problems caused by data sparsity, we calculated P[S = s, Ȳs = ȳs] using the posterior means of αz,j,ȳj−2,y and γz,j−1,ȳj−2,y rather than the empirical probabilities.
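The Unif(0, 1) correspondence follows directly from the Beta parameterization used in the Appendix, B(α; m/η, (1 − m)/η). With m = η = 0.5, both shape parameters equal 1:

$$
B\!\left(\alpha;\ \frac{m}{\eta},\ \frac{1-m}{\eta}\right)\bigg|_{m=\eta=0.5}
= B(\alpha;\,1,\,1)
= \frac{\Gamma(2)}{\Gamma(1)\,\Gamma(1)}\,\alpha^{0}(1-\alpha)^{0}
= 1, \qquad 0<\alpha<1 .
$$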

To assess model fit, we compared the empirical rates with the posterior means (and 95% credible intervals) of P[Yj = 1, S ≥ j|Z = z] and P[S < j|Z = z]. As shown in Figure 3, the shrinkage model fits the observed data well.

Figure 3.


Solid and dashed lines represent the empirical rates of P[Yj = 1, S ≥ j|Z = z] and P[S < j|Z = z], respectively. The posterior means of P[Yj = 1, S ≥ j|Z = z] (diamonds) and P[S < j|Z = z] (triangles) and their 95% credible intervals are displayed at each time point.

Figure 4 illustrates the effect of shrinkage on the model fit by comparing the empirical rates and the posterior means of P[Yj = 1|S ≥ j, Ȳj−1 = ȳj−1, Z = z] for the tamoxifen arm (Z = 1) at j = 6, 7. We use the later time points for illustration because the observed data there are sparser and the shrinkage effect is more apparent. The empirical depression rates often lie on the boundary (0 or 1), and some “cells” contain no observations, so their empirical rates are undefined. From the simulation results in Section 4, we know that the empirical estimates are less reliable at later time points. Through the shrinkage priors, the probabilities P[Yj = 1|S ≥ j, Yj−1 = yj−1, Ȳj−2 = ȳj−2, Z = z] with the same ȳj−2 are shrunk toward one another and away from the boundaries. By borrowing information across neighboring cells, we are able to estimate P[Yj = 1|S ≥ j, Ȳj−1, Z = z] with better precision for all j, z and Ȳj−1. The differences between the empirical rates and the posterior means show the magnitude of the shrinkage effect. In the BCPT, the depression rate was (relatively) low, and few subjects observed at the later times had a history of mostly depression at the earlier visits; as a result, the differences were larger when Ȳj−1 contained many 1’s (depression).

Figure 4.


(A) The empirical rate and model-based posterior mean of P[Yj = 1|S ≥ j, Ȳj−1 = ȳj−1, Z = z] for the tamoxifen arm (Z = 1) and j = 6, 7. (B) The difference between the empirical rate and the model-based posterior mean of the depression rate. The x-axis is the pattern of historical response data Ȳj−1.

5.2 Inference

Figure 5 shows the posterior distribution of P[Y7 = 1|Z = z], the treatment-specific probability of depression at the end of the 36-month follow-up (solid lines). For comparison, the posterior under MAR (corresponding to point-mass priors for τ at zero) is also presented (dashed lines). The observed depression rates (i.e., complete-case analysis) were 0.124 and 0.112 for the placebo and tamoxifen arms, respectively. Under the MNAR analysis (using the elicited priors), the posterior means of the depression rates at month 36 were 0.133 (95% CI: 0.122, 0.144) and 0.125 (95% CI: 0.114, 0.136) for the placebo and tamoxifen arms, respectively; the difference was −0.007 (95% CI: −0.023, 0.008). Under MAR, the rates were 0.132 (95% CI: 0.121, 0.143) and 0.122 (95% CI: 0.111, 0.133) for the placebo and tamoxifen arms; the difference was −0.010 (95% CI: −0.025, 0.005). The posterior probability of depression was higher under MNAR than under MAR because the researchers believed that depressed patients were more likely to drop out (see Table 1), a belief that was captured by the elicited priors. Figure 6 shows that, under both MNAR and MAR, the depression rates in the two arms did not differ significantly at any measurement time (all 95% credible intervals cover zero). Similar (non-significant) treatment differences were seen when treatment comparisons were made conditional on depression status at baseline.
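A between-arm difference and its credible interval are typically summarized by differencing the two arms' MCMC draws iteration by iteration. A minimal Python sketch follows. It assumes equal-length draw lists from the two arms and an equal-tailed 95% interval; the text does not state which interval type was used. The draws in the usage example are synthetic, not the BCPT posterior.

```python
import random
from statistics import fmean, quantiles

def summarize_difference(draws_a, draws_b):
    """Posterior mean and equal-tailed 95% interval of (a - b) from per-iteration MCMC draws."""
    diffs = [a - b for a, b in zip(draws_a, draws_b)]
    cuts = quantiles(diffs, n=40, method="inclusive")  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return fmean(diffs), cuts[0], cuts[-1]

# Toy usage with synthetic draws (not the BCPT posterior).
rng = random.Random(1)
arm_1 = [rng.gauss(0.12, 0.005) for _ in range(4000)]
arm_0 = [rng.gauss(0.13, 0.005) for _ in range(4000)]
mean, lower, upper = summarize_difference(arm_1, arm_0)
covers_zero = lower <= 0.0 <= upper
```

Because the two arms are modeled separately, pairing draws by iteration index is a convenient way to propagate both posteriors into the difference.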

Figure 5.


Posterior distribution of P[Y7 = 1|Z = z]. Black and gray lines represent tamoxifen and placebo arms, respectively. Solid and dashed lines are for MNAR and MAR, respectively.

Figure 6.


Posterior mean and 95% credible interval of difference of P[Yj = 1|Z = z] between placebo and tamoxifen arms. The gray and white boxes are for MAR and MNAR, respectively.

5.3 Sensitivity of Inference to the Priors

To assess the sensitivity of inference on the 36-month depression rates to the elicited (informative) priors {rmin, rmed, rmax}, we considered several alternative scenarios based on Table 1. In the first scenario, we made the priors more or less informative by scaling the range while leaving the median unchanged; that is, we increased (or decreased) the range by a scale factor v, giving {rmed − v(rmed − rmin), rmed, rmed + v(rmax − rmed)}. In the second scenario, we shifted the prior by u, giving {u + rmin, u + rmed, u + rmax}.
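Both perturbations are affine maps of the elicited triple. A minimal Python sketch follows. The base triple (1.30, 1.50, 1.60) is illustrative and is not quoted from Table 1. It was chosen because, under each of the four perturbations, it reproduces the tamoxifen 25% column of Table 5.

```python
def rescale_prior(r_min, r_med, r_max, v=1.0, u=0.0):
    """Scale the elicited range about the median by v, then shift the whole triple by u.

    v = 1, u = 0 leaves the prior unchanged; the sensitivity analysis varies v and u separately.
    """
    return (
        u + r_med - v * (r_med - r_min),
        u + r_med,
        u + r_med + v * (r_max - r_med),
    )

# Illustrative elicited triple (not quoted from Table 1).
base = (1.30, 1.50, 1.60)
for label, kwargs in [("v = 5", {"v": 5}), ("v = 0.2", {"v": 0.2}),
                      ("u = 0.5", {"u": 0.5}), ("u = -0.5", {"u": -0.5})]:
    # Each printed triple matches the tamoxifen 25% column of Table 5 for that scenario.
    print(label, [round(x, 2) for x in rescale_prior(*base, **kwargs)])
```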

The posterior means and between-treatment differences of the depression rate at month 36, with 95% CIs, are given in Tables 5 and 6. None of the scenarios considered produced a 95% CI for the difference in 36-month depression rates that excluded zero, except the (extreme) scenario in which the elicited tamoxifen intervals were shifted by −0.5 and the elicited placebo intervals were shifted by 0.5 (Table 6).

Table 5.

Sensitivity to the Elicited Prior

Scenario (T:Tamoxifen, P:Placebo)
vT = 5, vP= 5 vT = 0.2, vP= 0.2 uT = 0.5, uP= 0.5 uT = −0.5, uP= −0.5

Treatment Percentile 10% 25% 10% 25% 10% 25% 10% 25%
Tamoxifen Minimum 0.79 0.50 1.18 1.46 1.60 1.80 0.60 0.80
Median 1.20 1.50 1.20 1.50 1.70 2.00 0.70 1.00
Maximum 1.70 2.00 1.22 1.52 1.80 2.10 0.80 1.10
P [Y7 = 1](95% CI) 0.125(0.114, 0.136) 0.125(0.114, 0.136) 0.132(0.120, 0.143) 0.117(0.107, 0.128)

Placebo Minimum 0.85 0.80 1.04 1.28 1.51 1.70 0.51 0.70
Median 1.05 1.30 1.05 1.30 1.55 1.80 0.55 0.80
Maximum 1.30 1.80 1.06 1.32 1.60 1.90 0.60 0.90
P [Y7 = 1](95% CI) 0.133(0.122, 0.144) 0.133(0.122, 0.144) 0.139(0.128, 0.150) 0.125(0.114, 0.135)

Difference of P [Y7 = 1](95% CI) −0.008(−0.024, 0.008) −0.007(−0.023, 0.008) −0.007(−0.023, 0.009) −0.008(−0.023, 0.007)

Table 6.

Sensitivity to the Elicited Prior

Scenario (T:Tamoxifen, P:Placebo)
vT= 5, vP= 0.2 vT= 0.2, vP= 5 uT = 0.5, uP= −0.5 uT = −0.5, uP= 0.5

Treatment Percentile 10% 25% 10% 25% 10% 25% 10% 25%
Tamoxifen Minimum 0.79 0.50 1.18 1.46 1.60 1.80 0.60 0.80
Median 1.20 1.50 1.20 1.50 1.70 2.00 0.70 1.00
Maximum 1.70 2.00 1.22 1.52 1.80 2.10 0.80 1.10
P [Y7 = 1](95% CI) 0.125(0.114, 0.136) 0.125(0.114, 0.136) 0.132(0.121, 0.143) 0.117(0.107, 0.128)

Placebo Minimum 1.04 1.28 0.85 0.80 0.51 0.70 1.51 1.70
Median 1.05 1.30 1.05 1.30 0.55 0.80 1.55 1.80
Maximum 1.06 1.32 1.30 1.80 0.60 0.90 1.60 1.90
P [Y7 = 1](95% CI) 0.133(0.122, 0.144) 0.133(0.122, 0.144) 0.125(0.114, 0.135) 0.139(0.128, 0.150)

Difference of P [Y7 = 1](95% CI) −0.008(−0.024, 0.008) −0.008(−0.023, 0.008) 0.007(−0.008, 0.023) −0.022(−0.037, −0.006)

We also assessed the impact of switching the priors for the placebo and tamoxifen arms; in this case, the posterior means were 0.135 (95%CI: 0.124, 0.146) and 0.123 (95%CI: 0.112, 0.134) for the placebo and tamoxifen arms respectively, while the difference was −0.012 (95%CI: −0.027, 0.004).

6 Summary and Discussion

In this paper, we have presented a Bayesian shrinkage approach for longitudinal binary data with informative missingness (both intermittent and drop-out). Our model provides a framework that incorporates expert opinion about non-identifiable parameters and avoids the curse of dimensionality by using shrinkage priors. In our analysis of the BCPT data, we concluded that there was little (if any) evidence that women on tamoxifen were more depressed than those on placebo.

An important feature of our approach is that the models for the identifiable distribution of the observed data and for the non-identifiable parameters can be specified by separate, independent data analysts. This feature can be used to increase the objectivity of necessarily subjective inferences.

Penalized likelihood (Wahba, 1990; Green and Silverman, 1994; Fan and Li, 2001) is another approach for high-dimensional statistical modeling. There are similarities between the penalized likelihood approach and our shrinkage model. In fact, the shrinkage priors on the saturated model parameters proposed in our approach can be viewed as a specific form for the penalty.
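To make the connection concrete, consider a single cell probability α (for example, αz,j,ȳj−2,y with subscripts dropped), with n subjects at risk, o of whom have Yj = 1, and its Beta(m/η, (1 − m)/η) shrinkage prior from the Appendix. Up to an additive constant, the log posterior is a binomial log-likelihood minus a penalty:

$$
\log p(\alpha \mid \text{data}, m, \eta)
= \underbrace{o\log\alpha + (n-o)\log(1-\alpha)}_{\text{binomial log-likelihood}}
\;-\; \underbrace{\left[\Big(1-\tfrac{m}{\eta}\Big)\log\alpha + \Big(1-\tfrac{1-m}{\eta}\Big)\log(1-\alpha)\right]}_{\text{penalty}}
\;+\; \text{const}.
$$

The penalty coefficients grow linearly in 1/η, so η plays the role of an inverse tuning parameter, as in penalized likelihood. Here, however, η is given a prior and sampled rather than fixed in advance.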

The ideas in this paper can be extended to continuous outcomes; for example, one could use the mixtures of Dirichlet processes model (Escobar and West, 1995) for the distribution of the observed responses. They can also be extended to multiple-cause drop-out. In this trial, missed assessments were due to a variety of reasons, including patient-specific causes such as experiencing a protocol-defined event, stopping therapy or withdrawing consent, and institution-specific causes such as understaffing and staff turnover; some of this missingness is therefore less likely to be informative. In addition, institutional differences might be addressed by allowing institution-specific parameters with priors that shrink them toward a common set of parameters. Finally, we might consider alternatives to the partial ignorability assumption (Assumption 1), which has been widely used but has also been questioned (Robins, 1997).

Acknowledgments

This research was supported by NIH grants R01-CA85295, U10-CA37377, and U10-CA69974. The authors are grateful to oncologist Patricia Ganz at UCLA for providing her expertise for the MNAR analysis.

Appendix

Proof of Theorem 1

Proof

Under Assumption 1, the parameters of the conditional joint distribution of S and ȲS given Z = z are estimable from the distribution of the observed data. Next, we show by backward induction that P[Yj = 1|S ≥ k − 1, Ȳk−1 = ȳk−1, Z = z] and P[Yj = 1|S = k − 1, Ȳk−1 = ȳk−1, Z = z] (k = 1, …, j) can be written in terms of the conditional joint distribution of S and ȲS given Z = z.

Consider k = j. By Assumption 2,

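$$
P[Y_j=1 \mid S=j-1, \bar{Y}_{j-1}=\bar{y}_{j-1}, Z=z]
= \frac{P[Y_j=1 \mid S \ge j, \bar{Y}_{j-1}=\bar{y}_{j-1}, Z=z]\,\exp\{q_{z,j}(\bar{y}_{j-1},1)\}}{E\big[\exp\{q_{z,j}(\bar{y}_{j-1},Y_j)\} \mid S \ge j, \bar{Y}_{j-1}=\bar{y}_{j-1}, Z=z\big]}
$$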

Since the right hand side is identified, we know that P[Yj = 1|S = j−1, Ȳj−1 = ȳj−1, Z = z] is identified. Further, we can write
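Because Yj is binary, the expectation in the denominator is a two-term sum, and the identity becomes a shift on the log-odds scale. Writing p = P[Yj = 1|S ≥ j, Ȳj−1 = ȳj−1, Z = z] and qy = qz,j(ȳj−1, y):

$$
P[Y_j=1 \mid S=j-1, \bar{Y}_{j-1}=\bar{y}_{j-1}, Z=z]
= \frac{p\,e^{q_1}}{p\,e^{q_1} + (1-p)\,e^{q_0}},
\qquad
\operatorname{logit} P[Y_j=1 \mid S=j-1, \bar{Y}_{j-1}=\bar{y}_{j-1}, Z=z] = \operatorname{logit} p + (q_1 - q_0).
$$

Thus q1 − q0 is the log odds ratio of Yj = 1 for subjects who drop out after visit j − 1, relative to those who remain on study, given the same history.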

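$$
\begin{aligned}
P[Y_j=1 \mid S \ge j-1, \bar{Y}_{j-1}=\bar{y}_{j-1}, Z=z]
={}& P[Y_j=1 \mid S \ge j, \bar{Y}_{j-1}=\bar{y}_{j-1}, Z=z]\; P[S \ge j \mid S \ge j-1, \bar{Y}_{j-1}=\bar{y}_{j-1}, Z=z] \\
&+ P[Y_j=1 \mid S=j-1, \bar{Y}_{j-1}=\bar{y}_{j-1}, Z=z]\; P[S=j-1 \mid S \ge j-1, \bar{Y}_{j-1}=\bar{y}_{j-1}, Z=z].
\end{aligned}
$$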

Since all quantities on the right-hand side are identified, P[Yj = 1|S ≥ j − 1, Ȳj−1 = ȳj−1, Z = z] is identified.

Suppose that P[Yj = 1|S = k − 1, Ȳk−1 = ȳk−1, Z = z] and P[Yj = 1|S ≥ k − 1, Ȳk−1 = ȳk−1, Z = z] are identified for some k with 1 < k ≤ j. We need to show that these probabilities are also identified for k′ = k − 1. To see this, note that

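$$
\begin{aligned}
P[Y_j=1 \mid S=k'-1, \bar{Y}_{k'-1}=\bar{y}_{k'-1}, Z=z]
&= P[Y_j=1 \mid S=k-2, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z] \\
&= \sum_{y_{k-1}=0}^{1} P[Y_j=1 \mid S=k-2, \bar{Y}_{k-1}=\bar{y}_{k-1}, Z=z] \times P[Y_{k-1}=y_{k-1} \mid S=k-2, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z] \\
&= \sum_{y_{k-1}=0}^{1} P[Y_j=1 \mid S \ge k-1, \bar{Y}_{k-1}=\bar{y}_{k-1}, Z=z] \times \frac{P[Y_{k-1}=y_{k-1} \mid S \ge k-1, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z]\,\exp\{q_{z,k-1}(\bar{y}_{k-2},y_{k-1})\}}{E\big[\exp\{q_{z,k-1}(\bar{y}_{k-2},Y_{k-1})\} \mid S \ge k-1, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z\big]}
\end{aligned}
$$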

The third equality follows by Assumptions 2 and 3. Since all the quantities on the right hand side of the last equality are identified, P[Yj = 1|S = k′ −1, Ȳk′−1 = ȳk′−1, Z = z] is identified. Further,

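$$
\begin{aligned}
P[Y_j=1 \mid S \ge k'-1, \bar{Y}_{k'-1}=\bar{y}_{k'-1}, Z=z]
&= P[Y_j=1 \mid S \ge k-2, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z] \\
&= \sum_{y_{k-1}=0}^{1} P[Y_j=1 \mid S \ge k-1, \bar{Y}_{k-1}=\bar{y}_{k-1}, Z=z] \times P[Y_{k-1}=y_{k-1} \mid S \ge k-1, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z] \times P[S \ge k-1 \mid S \ge k-2, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z] \\
&\quad + \sum_{y_{k-1}=0}^{1} P[Y_j=1 \mid S=k-2, \bar{Y}_{k-1}=\bar{y}_{k-1}, Z=z] \times P[Y_{k-1}=y_{k-1} \mid S=k-2, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z] \times P[S=k-2 \mid S \ge k-2, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z] \\
&= \sum_{y_{k-1}=0}^{1} P[Y_j=1 \mid S \ge k-1, \bar{Y}_{k-1}=\bar{y}_{k-1}, Z=z] \times P[Y_{k-1}=y_{k-1} \mid S \ge k-1, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z] \times P[S \ge k-1 \mid S \ge k-2, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z] \\
&\quad + \sum_{y_{k-1}=0}^{1} P[Y_j=1 \mid S \ge k-1, \bar{Y}_{k-1}=\bar{y}_{k-1}, Z=z] \times \frac{P[Y_{k-1}=y_{k-1} \mid S \ge k-1, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z]\,\exp\{q_{z,k-1}(\bar{y}_{k-2},y_{k-1})\}}{E\big[\exp\{q_{z,k-1}(\bar{y}_{k-2},Y_{k-1})\} \mid S \ge k-1, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z\big]} \times P[S=k-2 \mid S \ge k-2, \bar{Y}_{k-2}=\bar{y}_{k-2}, Z=z]
\end{aligned}
$$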

The third equality follows by Assumptions 2 and 3. Since all the quantities on the right-hand side of the last equality are identified, P[Yj = 1|S ≥ k′ − 1, Ȳk′−1 = ȳk′−1, Z = z] is identified.
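The backward induction is constructive, so it can also serve as an algorithm for computing μz,j = P[Yj = 1|Z = z] from observed-data quantities. The Python sketch below covers one arm and is a sketch of the recursion in the proof, not the authors' code. It rests on four hypothetical inputs:

- `p_obs(k, hist)`, equal to P[Yk = 1|S ≥ k, Ȳk−1 = hist];
- `hazard(k, hist)`, equal to P[S = k − 1|S ≥ k − 1, Ȳk−1 = hist];
- `q(k, hist, y)`, equal to qz,k(hist, y);
- `p_y0`, equal to P[Y0 = 1] for the always-observed baseline.

The recursion bottoms out with the indicator of Yj at k = j + 1, which reproduces the base case k = j of the proof.

```python
import math

def tilt(p, q1, q0):
    """Exponentially tilted P[Y = 1] for Y ~ Bernoulli(p): the odds are multiplied by exp(q1 - q0)."""
    num = p * math.exp(q1)
    return num / (num + (1.0 - p) * math.exp(q0))

def cond_prob(j, k, hist, p_obs, hazard, q):
    """P[Y_j = 1 | S >= k-1, Ybar_{k-1} = hist] with hist = (y_0, ..., y_{k-1})."""
    if k == j + 1:
        return float(hist[-1])                      # hist now ends with y_j itself
    # P[Y_j = 1 | S >= k, Ybar_k = (hist, y)] for y = 1, 0. As in the third equalities
    # of the proof, the same quantities are used for subjects who dropped out at k-1.
    a1 = cond_prob(j, k + 1, hist + (1,), p_obs, hazard, q)
    a0 = cond_prob(j, k + 1, hist + (0,), p_obs, hazard, q)
    p = p_obs(k, hist)                              # P[Y_k = 1 | S >= k, hist]
    p_tilt = tilt(p, q(k, hist, 1), q(k, hist, 0))  # P[Y_k = 1 | S = k-1, hist]
    stay = p * a1 + (1.0 - p) * a0                  # still on study at visit k
    drop = p_tilt * a1 + (1.0 - p_tilt) * a0        # dropped out after visit k-1
    h = hazard(k, hist)                             # P[S = k-1 | S >= k-1, hist]
    return (1.0 - h) * stay + h * drop

def mu(j, p_y0, p_obs, hazard, q):
    """mu_{z,j} = P[Y_j = 1 | Z = z], averaging over the baseline response Y_0."""
    return (p_y0 * cond_prob(j, 1, (1,), p_obs, hazard, q)
            + (1.0 - p_y0) * cond_prob(j, 1, (0,), p_obs, hazard, q))

# Toy one-arm example: a first-order Markov observed-data model, a constant
# drop-out hazard, and a hypothetical tilt q_{z,k}(hist, y) = tau * y.
def p_obs(k, hist):
    return 0.6 if hist[-1] == 1 else 0.1

def hazard(k, hist):
    return 0.2

mar = mu(7, 0.1, p_obs, hazard, lambda k, hist, y: 0.0)       # tau = 0 (MAR)
mnar = mu(7, 0.1, p_obs, hazard, lambda k, hist, y: 1.0 * y)  # tau = 1
```

With τ = 0 the tilt is inactive and the drop-out hazard cancels, so `mar` equals the observed-data Markov marginal 0.2 − 0.1 × 0.5^7 = 0.19921875. With τ = 1, drop-outs are assumed more likely to be depressed, and `mnar` is larger.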

Gibbs Sampler for Posterior Computation

In the first step of the Gibbs sampler, we draw, for each subject with intermittent missing data, from the full conditional of YmisI given αz, γz, mz(α),ηz(α),mz(γ),ηz(γ), Yobs, S, RS and Z = z. The full conditional distribution can be expressed as

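$$
P[Y^{I}_{\mathrm{mis}}=y^{I}_{\mathrm{mis}} \mid \alpha_z,\gamma_z,m^{(\alpha)}_z,\eta^{(\alpha)}_z,m^{(\gamma)}_z,\eta^{(\gamma)}_z,Y_{\mathrm{obs}}=y_{\mathrm{obs}},S=s,R_s=r_s,Z=z]
= \frac{P[Y^{I}_{\mathrm{mis}}=y^{I}_{\mathrm{mis}},Y_{\mathrm{obs}}=y_{\mathrm{obs}},S=s \mid \alpha_z,\gamma_z,m^{(\alpha)}_z,\eta^{(\alpha)}_z,m^{(\gamma)}_z,\eta^{(\gamma)}_z,Z=z]}{\sum_{\text{all } y^{I}_{\mathrm{mis}}} P[Y^{I}_{\mathrm{mis}}=y^{I}_{\mathrm{mis}},Y_{\mathrm{obs}}=y_{\mathrm{obs}},S=s \mid \alpha_z,\gamma_z,m^{(\alpha)}_z,\eta^{(\alpha)}_z,m^{(\gamma)}_z,\eta^{(\gamma)}_z,Z=z]}
$$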

where the right-hand side can be expressed as a function of ymisI, yobs, s, αz and γz.
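Because the intermittently missing responses are binary, the normalizing sum runs over at most 2^(number of missed visits) configurations and can be enumerated directly. A minimal Python sketch for one subject follows. It assumes hypothetical callables `alpha(j, hist)` = P[Yj = 1|S ≥ j, Ȳj−1 = hist] and `gamma(k, hist)` = P[S = k|S ≥ k, Ȳk = hist], built from the current draws of αz and γz. The baseline Y0 is treated as observed, and its probability is dropped because it is common to every configuration.

```python
import random
from itertools import product

def joint_prob(y, s, alpha, gamma, n_visits):
    """P[Y_1..Y_s = y_1..y_s, S = s | Y_0 = y_0] for a complete history y = [y_0, ..., y_s]."""
    prob = 1.0
    for j in range(1, s + 1):
        hist = tuple(y[:j])                  # (y_0, ..., y_{j-1})
        prob *= 1.0 - gamma(j - 1, hist)     # did not drop out after visit j-1
        a = alpha(j, hist)
        prob *= a if y[j] == 1 else 1.0 - a  # response at visit j
    if s < n_visits:
        prob *= gamma(s, tuple(y[: s + 1]))  # dropped out after visit s
    return prob

def draw_intermittent(y_obs, s, alpha, gamma, n_visits, rng=random):
    """Impute the None entries of y_obs = [y_0, ..., y_s] from their full conditional."""
    missing = [j for j, v in enumerate(y_obs) if v is None]
    candidates, weights = [], []
    for fill in product((0, 1), repeat=len(missing)):  # every completion of the missed visits
        y = list(y_obs)
        for j, v in zip(missing, fill):
            y[j] = v
        candidates.append(y)
        weights.append(joint_prob(y, s, alpha, gamma, n_visits))
    return rng.choices(candidates, weights=weights, k=1)[0]  # normalization happens here

# Toy usage: visit 2 was missed by a subject last observed at visit 3 (7 follow-up visits).
def alpha(j, hist):
    return 0.6 if hist[-1] == 1 else 0.1

def gamma(k, hist):
    return 0.05

draw = draw_intermittent([0, 1, None, 1], s=3, alpha=alpha, gamma=gamma,
                         n_visits=7, rng=random.Random(0))
```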

In the second step, we draw from the full conditional of mz(α) given {YmisI}, αz, γz, ηz(α), mz(γ), ηz(γ), {Yobs}, {S}, {RS} and {Z} = z, where braces denote the corresponding data for all individuals in the study. The full conditional can be expressed as

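$$
\prod_{j=2}^{J}\prod_{y=0}^{1} f\big(m^{(\alpha)}_{z,j,y} \mid \{Y^{I}_{\mathrm{mis}}\}, \alpha_z, \eta^{(\alpha)}_{z,j,y}, \{Y_{\mathrm{obs}}\}, \{S\}, \{Z\}=z\big)
$$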

where

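$$
f\big(m^{(\alpha)}_{z,j,y} \mid \{Y^{I}_{\mathrm{mis}}\}, \alpha_z, \eta^{(\alpha)}_{z,j,y}, \{Y_{\mathrm{obs}}\}, \{S\}, \{Z\}=z\big) \propto \prod_{i:\, S_i \ge j,\, Y_{i,j-1}=y,\, Z_i=z} B\Big(\alpha_{z,j,\bar{Y}_{i,j-2},y};\ m^{(\alpha)}_{z,j,y}/\eta^{(\alpha)}_{z,j,y},\ \big(1-m^{(\alpha)}_{z,j,y}\big)/\eta^{(\alpha)}_{z,j,y}\Big)
$$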

and B(α; c, d) is a Beta density with parameters c and d.

In the third step, we draw from the full conditional of mz(γ) given { YmisI}, αz, γz, ηz(α),mz(α),ηz(γ), {Yobs}, {S}, {RS} and {Z} = z. The full conditional can be expressed as

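$$
\prod_{j=2}^{J}\prod_{y=0}^{1} f\big(m^{(\gamma)}_{z,j-1,y} \mid \{Y^{I}_{\mathrm{mis}}\}, \gamma_z, \eta^{(\gamma)}_{z,j-1,y}, \{Y_{\mathrm{obs}}\}, \{S\}, \{Z\}=z\big)
$$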

where

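$$
f\big(m^{(\gamma)}_{z,j-1,y} \mid \{Y^{I}_{\mathrm{mis}}\}, \gamma_z, \eta^{(\gamma)}_{z,j-1,y}, \{Y_{\mathrm{obs}}\}, \{S\}, \{Z\}=z\big) \propto \prod_{i:\, S_i \ge j-1,\, Y_{i,j-1}=y,\, Z_i=z} B\Big(\gamma_{z,j-1,\bar{Y}_{i,j-2},y};\ m^{(\gamma)}_{z,j-1,y}/\eta^{(\gamma)}_{z,j-1,y},\ \big(1-m^{(\gamma)}_{z,j-1,y}\big)/\eta^{(\gamma)}_{z,j-1,y}\Big).
$$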

In the fourth step, we draw from the full conditional of ηz(α) given mz(α),{YmisI}, αz, γz, mz(γ),ηz(γ), {Yobs}, {S}, {RS} and {Z} = z. The full conditional can be expressed as

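$$
\prod_{j=2}^{J}\prod_{y=0}^{1} f\big(\eta^{(\alpha)}_{z,j,y} \mid \{Y^{I}_{\mathrm{mis}}\}, \alpha_z, m^{(\alpha)}_{z,j,y}, \{Y_{\mathrm{obs}}\}, \{S\}, \{Z\}=z\big)
$$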

where

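$$
f\big(\eta^{(\alpha)}_{z,j,y} \mid \{Y^{I}_{\mathrm{mis}}\}, \alpha_z, m^{(\alpha)}_{z,j,y}, \{Y_{\mathrm{obs}}\}, \{S\}, \{Z\}=z\big) \propto \frac{g\big(E^{(\alpha)}_{z,j,y}\big)}{\big(g(E^{(\alpha)}_{z,j,y})\,\eta^{(\alpha)}_{z,j,y}+1\big)^{2}} \prod_{i:\, S_i \ge j,\, Y_{i,j-1}=y,\, Z_i=z} B\Big(\alpha_{z,j,\bar{Y}_{i,j-2},y};\ m^{(\alpha)}_{z,j,y}/\eta^{(\alpha)}_{z,j,y},\ \big(1-m^{(\alpha)}_{z,j,y}\big)/\eta^{(\alpha)}_{z,j,y}\Big).
$$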

In the fifth step, we draw from the full conditional of ηz(γ) given mz(α),{YmisI}, αz, γz, mz(γ),ηz(α), {Yobs}, {S}, {RS} and {Z} = z. The full conditional can be expressed as

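$$
\prod_{j=2}^{J}\prod_{y=0}^{1} f\big(\eta^{(\gamma)}_{z,j-1,y} \mid \{Y^{I}_{\mathrm{mis}}\}, \gamma_z, m^{(\gamma)}_{z,j-1,y}, \{Y_{\mathrm{obs}}\}, \{S\}, \{Z\}=z\big)
$$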

where

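$$
f\big(\eta^{(\gamma)}_{z,j-1,y} \mid \{Y^{I}_{\mathrm{mis}}\}, \gamma_z, m^{(\gamma)}_{z,j-1,y}, \{Y_{\mathrm{obs}}\}, \{S\}, \{Z\}=z\big) \propto \frac{g\big(E^{(\gamma)}_{z,j-1,y}\big)}{\big(g(E^{(\gamma)}_{z,j-1,y})\,\eta^{(\gamma)}_{z,j-1,y}+1\big)^{2}} \times \prod_{i:\, S_i \ge j-1,\, Y_{i,j-1}=y,\, Z_i=z} B\Big(\gamma_{z,j-1,\bar{Y}_{i,j-2},y};\ m^{(\gamma)}_{z,j-1,y}/\eta^{(\gamma)}_{z,j-1,y},\ \big(1-m^{(\gamma)}_{z,j-1,y}\big)/\eta^{(\gamma)}_{z,j-1,y}\Big).
$$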

To draw from the full conditionals for steps two to five, we use slice sampling (Neal, 2003).
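For reference, the Python sketch below shows a univariate slice sampler with the stepping-out and shrinkage procedures of Neal (2003), together with the unnormalized log full conditional of a single η from the fourth step. The function names, the step width `w` and the inputs are illustrative assumptions:

- `cell_probs`: the Beta-distributed cell probabilities entering the product;
- `m`: the current prior mean;
- `g_e`: the value of g(E) in the prior.

The text does not specify which slice-sampling variant was used.

```python
import math
import random

def slice_sample(log_f, x0, w=1.0, max_steps=50, rng=random):
    """One univariate slice-sampling update using stepping out and shrinkage (Neal, 2003).

    log_f: unnormalized log density; must return -math.inf outside the support.
    x0:    current state (must lie inside the support).
    """
    log_y = log_f(x0) - rng.expovariate(1.0)  # log height of the auxiliary slice level
    left = x0 - w * rng.random()              # randomly placed initial interval of width w
    right = left + w
    j = int(max_steps * rng.random())         # split the step budget between the two sides
    k = max_steps - 1 - j
    while j > 0 and log_f(left) > log_y:      # step out to the left
        left -= w
        j -= 1
    while k > 0 and log_f(right) > log_y:     # step out to the right
        right += w
        k -= 1
    while True:                               # sample uniformly; shrink toward x0 on rejection
        x1 = left + (right - left) * rng.random()
        if log_f(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

def log_beta_pdf(x, a, b):
    """Log density of Beta(a, b) at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))

def make_log_eta_conditional(cell_probs, m, g_e):
    """Unnormalized log full conditional of one eta (fourth or fifth step); illustrative inputs."""
    def log_f(eta):
        if eta <= 0.0:
            return -math.inf
        a, b = m / eta, (1.0 - m) / eta
        log_prior = math.log(g_e) - 2.0 * math.log(g_e * eta + 1.0)
        return log_prior + sum(log_beta_pdf(p, a, b) for p in cell_probs)
    return log_f

# Toy usage with made-up cell probabilities, prior mean m and g(E) value.
log_f = make_log_eta_conditional([0.12, 0.08, 0.15, 0.10], m=0.1, g_e=10.0)
rng = random.Random(2010)
eta, draws = 0.5, []
for _ in range(1000):
    eta = slice_sample(log_f, eta, w=0.5, rng=rng)
    draws.append(eta)
```

Slice sampling only needs the log density up to a constant, which suits these full conditionals: their normalizing constants are not available in closed form.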

In the sixth step, we draw from the full conditional of αz given { YmisI}, γz, mz(α),ηz(α),mz(γ),ηz(γ), {Yobs}, {S}, {Rs} and {Z} = z. The full conditional can be expressed as

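$$
\prod_{j=2}^{J}\prod_{y=0}^{1}\prod_{\text{all } \bar{y}_{j-2}} f\big(\alpha_{z,j,\bar{y}_{j-2},y} \mid \{Y^{I}_{\mathrm{mis}}\}, m^{(\alpha)}_{z,j,y}, \eta^{(\alpha)}_{z,j,y}, \{Y_{\mathrm{obs}}\}, \{S\}, \{Z\}=z\big)
$$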

where

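$$
f\big(\alpha_{z,j,\bar{y}_{j-2},y} \mid \{Y^{I}_{\mathrm{mis}}\}, m^{(\alpha)}_{z,j,y}, \eta^{(\alpha)}_{z,j,y}, \{Y_{\mathrm{obs}}\}, \{S\}, \{Z\}=z\big) = B\Big(\alpha_{z,j,\bar{y}_{j-2},y};\ m^{(\alpha)}_{z,j,y}/\eta^{(\alpha)}_{z,j,y} + o^{(\alpha)}_{z,j,\bar{y}_{j-2},y},\ \big(1-m^{(\alpha)}_{z,j,y}\big)/\eta^{(\alpha)}_{z,j,y} + n^{(\alpha)}_{z,j,\bar{y}_{j-2},y} - o^{(\alpha)}_{z,j,\bar{y}_{j-2},y}\Big),
$$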

where nz,j,ȳj−2,y(α) is the number of subjects with S ≥ j, Yj−1 = y, Ȳj−2 = ȳj−2 and Z = z, and oz,j,ȳj−2,y(α) is the number of subjects with S ≥ j, Yj−1 = y, Ȳj−2 = ȳj−2, Z = z and Yj = 1.
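The sixth step is a conjugate Beta draw. Its mean makes the shrinkage described in Section 5.1 explicit: E[α|·] = (o + m/η)/(n + 1/η) is a weighted average of the prior mean m and the empirical rate o/n, with the prior acting like 1/η additional subjects. A minimal Python sketch follows. Subscripts are dropped, and the inputs in the usage lines are illustrative.

```python
import random

def beta_shapes(m, eta):
    """Shape parameters of B(.; m/eta, (1 - m)/eta): prior mean m, dispersion eta."""
    return m / eta, (1.0 - m) / eta

def draw_cell_prob(o, n, m, eta, rng=random):
    """Conjugate full-conditional draw: Beta(m/eta + o, (1 - m)/eta + n - o)."""
    a, b = beta_shapes(m, eta)
    return rng.betavariate(a + o, b + n - o)

def posterior_mean(o, n, m, eta):
    """(o + m/eta) / (n + 1/eta): a weighted average of the prior mean m and o/n."""
    a, b = beta_shapes(m, eta)
    return (a + o) / (a + b + n)

# An illustrative sparse cell: 5 subjects at risk, none depressed (empirical rate 0).
print(posterior_mean(o=0, n=5, m=0.1, eta=0.1))  # about 1/15: pulled off the boundary toward m
print(posterior_mean(o=0, n=0, m=0.1, eta=0.1))  # about 0.1: an empty cell falls back to m
```

The second call shows why shrinkage also handles empty cells, whose empirical rates are undefined.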

Finally, we draw from the full conditional of γz given {YmisI}, αz, mz(α), ηz(α), mz(γ), ηz(γ), {Yobs}, {S}, {Rs} and {Z} = z. The full conditional can be expressed as

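$$
\prod_{j=2}^{J}\prod_{y=0}^{1}\prod_{\text{all } \bar{y}_{j-2}} f\big(\gamma_{z,j-1,\bar{y}_{j-2},y} \mid \{Y^{I}_{\mathrm{mis}}\}, m^{(\gamma)}_{z,j-1,y}, \eta^{(\gamma)}_{z,j-1,y}, \{Y_{\mathrm{obs}}\}, \{S\}, \{Z\}=z\big)
$$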

where

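$$
f\big(\gamma_{z,j-1,\bar{y}_{j-2},y} \mid \{Y^{I}_{\mathrm{mis}}\}, m^{(\gamma)}_{z,j-1,y}, \eta^{(\gamma)}_{z,j-1,y}, \{Y_{\mathrm{obs}}\}, \{S\}, \{Z\}=z\big) = B\Big(\gamma_{z,j-1,\bar{y}_{j-2},y};\ m^{(\gamma)}_{z,j-1,y}/\eta^{(\gamma)}_{z,j-1,y} + o^{(\gamma)}_{z,j-1,\bar{y}_{j-2},y},\ \big(1-m^{(\gamma)}_{z,j-1,y}\big)/\eta^{(\gamma)}_{z,j-1,y} + n^{(\gamma)}_{z,j-1,\bar{y}_{j-2},y} - o^{(\gamma)}_{z,j-1,\bar{y}_{j-2},y}\Big),
$$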

where nz,j−1,ȳj−2,y(γ) is the number of subjects with S ≥ j − 1, Yj−1 = y, Ȳj−2 = ȳj−2 and Z = z, and oz,j−1,ȳj−2,y(γ) is the number of subjects with S = j − 1, Yj−1 = y, Ȳj−2 = ȳj−2 and Z = z.

References

  1. Albert PS. A Transitional Model for Longitudinal Binary Data Subject to Nonignorable Missing Data. Biometrics. 2000;56(2):602–608. doi: 10.1111/j.0006-341x.2000.00602.x.
  2. Albert PS, Follmann DA, Wang SA, Suh EB. A latent autoregressive model for longitudinal binary data subject to informative missingness. Biometrics. 2002;58(3):631–642. doi: 10.1111/j.0006-341x.2002.00631.x.
  3. Baker SG. Marginal regression for repeated binary data with outcome subject to non-ignorable non-response. Biometrics. 1995;51(3):1042–1052.
  4. Baker SG, Rosenberger WF, DerSimonian R. Closed-form estimates for missing counts in two-way contingency tables. Statistics in Medicine. 1992;11:643–657. doi: 10.1002/sim.4780110509.
  5. Birmingham J, Fitzmaurice GM. A Pattern-Mixture Model for Longitudinal Binary Responses with Nonignorable Nonresponse. Biometrics. 2002;58(4):989–996. doi: 10.1111/j.0006-341x.2002.00989.x.
  6. Christiansen CL, Morris CN. Hierarchical Poisson regression modeling. Journal of the American Statistical Association. 1997:618–632.
  7. Daniels MJ. A prior for the variance in hierarchical models. The Canadian Journal of Statistics/La Revue Canadienne de Statistique. 1999;27(3):567–578.
  8. Daniels MJ, Hogan JW. Reparameterizing the Pattern Mixture Model for Sensitivity Analyses Under Informative Dropout. Biometrics. 2000;56(4):1241–1248. doi: 10.1111/j.0006-341x.2000.01241.x.
  9. Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall/CRC; 2008.
  10. DeGruttola V, Tu XM. Modelling Progression of CD4-lymphocyte Count and its Relationship to Survival Time. Biometrics. 1994;50(4):1003–1014.
  11. Diggle P, Kenward MG. Informative Drop-out in Longitudinal Data Analysis. Applied Statistics. 1994;43(1):49–93.
  12. Escobar MD, West M. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association. 1995:577–588.
  13. Fan J, Li R. Variable Selection Via Nonconcave Penalized Likelihood and Its Oracle Properties. Journal of the American Statistical Association. 2001;96(456):1348–1361.
  14. Fisher B, Costantino JP, Wickerham DL, Redmond CK, Kavanah M, Cronin WM, Vogel V, Robidoux A, Dimitrov N, Atkins J, Daly M, Wieand S, Tan-Chiu E, Ford L, Wolmark N; other National Surgical Adjuvant Breast and Bowel Project Investigators. Tamoxifen for prevention of breast cancer: report of the National Surgical Adjuvant Breast and Bowel Project P-1 study. Journal of the National Cancer Institute. 1998;90:1371–1388. doi: 10.1093/jnci/90.18.1371.
  15. Fitzmaurice GM, Laird NM. Generalized Linear Mixture Models for Handling Non-ignorable Dropouts in Longitudinal Studies. Biostatistics. 2000;1(2):141–156. doi: 10.1093/biostatistics/1.2.141.
  16. Fitzmaurice GM, Molenberghs G, Lipsitz SR. Regression Models for Longitudinal Binary Responses with Informative Drop-Outs. Journal of the Royal Statistical Society Series B Methodological. 1995;57(4):691–704.
  17. Green PJ, Silverman BW. Nonparametric regression and generalized linear models: a roughness penalty approach. Chapman & Hall/CRC; 1994.
  18. Harel O, Schafer JL. Partial and latent ignorability in missing-data problems. Biometrika. 2009;96(1):37.
  19. Heckman J. Sample Selection Bias as a Specification Error. Econometrica. 1979;47(1):153–161.
  20. Hogan JW, Laird NM. Model-Based Approaches to Analysing Incomplete Longitudinal and Failure Time Data. Statistics in Medicine. 1997a;16(3):259–272. doi: 10.1002/(sici)1097-0258(19970215)16:3<259::aid-sim484>3.0.co;2-s.
  21. Hogan JW, Laird NM. Mixture Models for the Joint Distribution of Repeated Measures and Event Times. Statistics in Medicine. 1997b;16(3):239–257. doi: 10.1002/(sici)1097-0258(19970215)16:3<239::aid-sim483>3.0.co;2-x.
  22. Ibrahim JG, Chen MH, Lipsitz SR. Missing responses in generalised linear mixed models when the missing data mechanism is nonignorable. Biometrika. 2001;88(2):551.
  23. Kaciroti NA, Schork MA, Raghunathan T, Julius S. A Bayesian Sensitivity Model for Intention-to-treat Analysis on Binary Outcomes with Dropouts. Statistics in Medicine. 2009;28:572–585. doi: 10.1002/sim.3494.
  24. Kenward MG, Molenberghs G. Parametric Models for Incomplete Continuous and Categorical Longitudinal Data. Statistical Methods in Medical Research. 1999;8(1):51. doi: 10.1177/096228029900800105.
  25. Kenward MG, Molenberghs G, Thijs H. Pattern-mixture Models with Proper Time Dependence. Biometrika. 2003;90(1):53–71.
  26. Kurland BF, Heagerty PJ. Marginalized Transition Models for Longitudinal Binary Data with Ignorable and Non-Ignorable Drop-Out. Statistics in Medicine. 2004;23(17):2673–2695. doi: 10.1002/sim.1850.
  27. Land S, Wieand S, Day R, Ten Have T, Costantino JP, Lang W, Ganz PA. Methodological Issues In the Analysis of Quality of Life Data in Clinical Trials: Illustrations from the National Surgical Adjuvant Breast And Bowel Project (NSABP) Breast Cancer Prevention Trial. Statistical Methods for Quality of Life Studies. 2002:71–85.
  28. Lee JY, Hogan JW, Hitsman B. Sensitivity analysis and informative priors for longitudinal binary data with outcome-related drop-out. Technical Report, Brown University. 2008.
  29. Lin H, McCulloch CE, Rosenheck RA. Latent pattern mixture models for informative intermittent missing data in longitudinal studies. Biometrics. 2004;60(2):295–305. doi: 10.1111/j.0006-341X.2004.00173.x.
  30. Little RJA. Pattern-Mixture Models for Multivariate Incomplete Data. Journal of the American Statistical Association. 1993;88(421):125–134.
  31. Little RJA. A Class of Pattern-Mixture Models for Normal Incomplete Data. Biometrika. 1994;81(3):471–483.
  32. Little RJA. Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association. 1995;90(431).
  33. Little RJA, Rubin DB. Comment on “Adjusting for Non-Ignorable Drop-out Using Semiparametric Models” by Scharfstein DO, Rotnitzky A, Robins JM. Journal of the American Statistical Association. 1999;94:1130–1132.
  34. Liu X, Waternaux C, Petkova E. Influence of Human Immunodeficiency Virus Infection on Neurological Impairment: An Analysis of Longitudinal Binary Data with Informative Drop-Out. Journal of the Royal Statistical Society (Series C): Applied Statistics. 1999;48(1):103–115.
  35. Molenberghs G, Kenward MG. Missing Data in Clinical Studies. Wiley; 2007.
  36. Molenberghs G, Kenward MG, Lesaffre E. The Analysis of Longitudinal Ordinal Data with Nonrandom Drop-Out. Biometrika. 1997;84(1):33–44.
  37. Neal RM. Slice sampling. The Annals of Statistics. 2003;31(3):705–741.
  38. Nordheim EV. Inference from Nonrandomly Missing Categorical Data: an Example From a Genetic Study of Turner’s Syndrome. Journal of the American Statistical Association. 1984;79(388):772–780.
  39. Pauler DK, McCoy S, Moinpour C. Pattern Mixture Models for Longitudinal Quality of Life Studies in Advanced Stage Disease. Statistics in Medicine. 2003;22(5):795–809. doi: 10.1002/sim.1397.
  40. Pulkstenis EP, Ten Have TR, Landis JR. Model for the Analysis of Binary Longitudinal Pain Data Subject to Informative Dropout Through Remedication. Journal of the American Statistical Association. 1998;93(442):438–450.
  41. Radloff LS. The CES-D Scale: A Self-Report Depression Scale for Research in the General Population. Applied Psychological Measurement. 1977;1(3):385.
  42. Robins JM. Non-response models for the analysis of non-monotone non-ignorable missing data. Statistics in Medicine. 1997;16(1):21–37. doi: 10.1002/(sici)1097-0258(19970115)16:1<21::aid-sim470>3.0.co;2-f.
  43. Robins JM, Ritov YAA. Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Statistics in Medicine. 1997;16(3):285–319. doi: 10.1002/(sici)1097-0258(19970215)16:3<285::aid-sim535>3.0.co;2-#.
  44. Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89(427):846–866.
  45. Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995;90(429).
  46. Rotnitzky A, Robins JM, Scharfstein DO. Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association. 1998;93(444):1321–1322.
  47. Rotnitzky A, Scharfstein DO, Su T-L, Robins JM. Methods for conducting sensitivity analysis of trials with potentially nonignorable competing causes of censoring. Biometrics. 2001;57:103–113. doi: 10.1111/j.0006-341x.2001.00103.x.
  48. Roy J. Modeling Longitudinal Data with Nonignorable Dropouts Using a Latent Dropout Class Model. Biometrics. 2003;59(4):829–836. doi: 10.1111/j.0006-341x.2003.00097.x.
  49. Roy J, Daniels MJ. A general class of pattern mixture models for nonignorable dropout with many possible dropout times. Biometrics. 2008;64:538–545. doi: 10.1111/j.1541-0420.2007.00884.x.
  50. Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models. Journal of the American Statistical Association. 1999;94(448):1096–1146.
  51. Scharfstein DO, Daniels MJ, Robins JM. Incorporating Prior Beliefs about Selection Bias into the Analysis of Randomized Trials with Missing Outcomes. Biostatistics. 2003;4(4):495. doi: 10.1093/biostatistics/4.4.495.
  52. Scharfstein DO, Manski CF, Anthony JC. On the Construction of Bounds in Prospective Studies with Missing Ordinal Outcomes: Application to the Good Behavior Game Trial. Biometrics. 2004;60(1):154–164. doi: 10.1111/j.0006-341X.2004.00158.x.
  53. Scharfstein DO, Halloran ME, Chu H, Daniels MJ. On estimation of vaccine efficacy using validation samples with selection bias. Biostatistics. 2006;7(4):615. doi: 10.1093/biostatistics/kxj031.
  54. Shepherd BE, Gilbert PB, Mehrotra DV. Eliciting a Counterfactual Sensitivity Parameter. American Statistician. 2007;61(1):56.
  55. Ten Have TR, Kunselman AR, Pulkstenis EP, Landis JR. Mixed effects logistic regression models for longitudinal binary response data with informative drop-out. Biometrics. 1998;54(1):367–383.
  56. Ten Have TR, Miller ME, Reboussin BA, James MK. Mixed Effects Logistic Regression Models for Longitudinal Ordinal Functional Response Data with Multiple-Cause Drop-Out from the Longitudinal Study of Aging. Biometrics. 2000;56(1):279–287. doi: 10.1111/j.0006-341x.2000.00279.x.
  57. Thijs H, Molenberghs G, Michiels B, Verbeke G, Curran D. Strategies to fit pattern-mixture models. Biostatistics. 2002;3(2):245. doi: 10.1093/biostatistics/3.2.245.
  58. Troxel AB, Harrington DP, Lipsitz SR. Analysis of longitudinal data with non-ignorable non-monotone missing values. Journal of the Royal Statistical Society Series C (Applied Statistics). 1998a;47(3):425–438.
  59. Troxel AB, Lipsitz SR, Harrington DP. Marginal models for the analysis of longitudinal measurements with nonignorable non-monotone missing data. Biometrika. 1998b;85(3):661.
  60. Tsiatis AA. Semiparametric theory and missing data. Springer; New York: 2006.
  61. van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. Springer; 2003.
  62. Vansteelandt S, Goetghebeur E, Kenward MG, Molenberghs G. Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statistica Sinica. 2006;16(3):953–979.
  63. Vansteelandt S, Rotnitzky A, Robins J. Estimation of regression models for the mean of repeated outcomes under nonignorable nonmonotone nonresponse. Biometrika. 2007;94(4):841. doi: 10.1093/biomet/asm070.
  64. Wahba G. Spline models for observational data. Society for Industrial Mathematics; 1990.
  65. Wu MC, Carroll RJ. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics. 1988;44(1):175–188.
  66. Yuan Y, Little RJA. Mixed-effect hybrid models for longitudinal data with nonignorable drop-out. Biometrics. 2009 (in press). doi: 10.1111/j.1541-0420.2008.01102.x.
