Author manuscript; available in PMC: 2014 May 5.
Published in final edited form as: J Biopharm Stat. 2014;24(2):229–253. doi: 10.1080/10543406.2013.860769

A MULTIPLE IMPUTATION METHOD FOR SENSITIVITY ANALYSES OF TIME-TO-EVENT DATA WITH POSSIBLY INFORMATIVE CENSORING

Yue Zhao 1, Amy H Herring 2, Haibo Zhou 2, Mirza W Ali 3, Gary G Koch 2
PMCID: PMC4009741  NIHMSID: NIHMS574902  PMID: 24605967

Abstract

This article presents a multiple imputation method for sensitivity analyses of time-to-event data with possibly informative censoring. The imputed time for censored values is drawn from the failure time distribution conditional on the time of follow-up discontinuation. A variety of specifications regarding the post-discontinuation tendency of having events can be incorporated in the imputation through a hazard ratio parameter for discontinuation versus continuation of follow-up. Multiple-imputed data sets are analyzed with the primary analysis method, and the results are then combined using the methods of Rubin. An illustrative example is provided.

Keywords: Multiple imputation, Sensitivity analysis, Time-to-event data

1. INTRODUCTION

An essential property of confirmatory clinical trials is the randomization of patients so that the control and the test treatment have statistically equivalent distributions for known and unknown baseline characteristics that may have potential associations with the outcome of interest (National Research Council, 2010; CHMP, 2010). However, a ubiquitous and inevitable problem that can undermine the comparability of randomized treatment groups is potential bias from the nature and extent of missing data for patients who prematurely discontinue their planned follow-up period for the assigned treatment (or the study) without further assessment. In view of this problem, the design of many clinical trials specifies continued follow-up of patients after premature termination of the assigned treatment for such reasons as adverse events, lack of compliance, lack of efficacy, or protocol deviations. A rationale for this practice is that it provides potentially useful information about the experiences of these patients for their remaining follow-up time until their planned (or premature) discontinuation from the study (Flyer and Hirman, 2009; Walton, 2009). However, the role of this information can be unclear when patients receive effective rescue treatment after discontinuing their assigned treatment (Flyer and Hirman, 2009). For example, the comparison of regimens that begin with test treatment or placebo followed by effective rescue therapy after their discontinuation could erroneously suggest that an ineffective test treatment is effective solely because it forces more patients to switch to rescue therapy than placebo (Permutt and Pinheiro, 2009). Thus, in such situations, analyses for the comparison of the assigned treatments may need to ignore any unclear information subsequent to their discontinuation and thereby proceed with the corresponding experiences of patients as if missing.

Analytical strategies for drawing inferences from incomplete data rely on untestable assumptions about the missing data distributions and the missingness mechanism (National Research Council, 2010; CHMP, 2010). Little and Rubin (2002) classified the missing data mechanism into three categories: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). When the data are MCAR, the missing data are unrelated to the observed and unobserved study variables, so the observed data are statistically representative for the experiences of all randomized patients. In practice, however, MCAR is usually an unrealistic assumption. When the data are MAR, the missingness depends only upon the observed study variables. That is, conditional on the observed study variables, the probability of missing does not depend on the values of the missing data. When the missingness probability also depends on the values of the missing data, the data are said to be MNAR. In many situations, the MAR paradigm is realistic for the primary analysis in confirmatory clinical trials (Zhang, 2009; Mallinckrodt et al., 2008). However, the observed data can never rule out the possibility of MNAR. Therefore, sensitivity analyses exploring the implications of departures from the primary MAR assumption are always of interest to assess the robustness of the treatment effect inferences.

We consider randomized clinical trials where a time-to-event is the primary outcome. Conventional methods such as the Kaplan–Meier estimation of survival curves (Kaplan and Meier, 1958), the logrank or Wilcoxon tests (Mantel, 1966; Gehan, 1965), and the Cox proportional hazards model (Cox, 1972) are frequently employed to describe time-to-event distributions and to assess treatment effects. Missing data for a time-to-event occur for patients who prematurely discontinue follow-up for the assigned treatment (or the study) prior to the occurrence of the event or the end of their planned follow-up period (or the administrative closing date of the study). One way to address this type of missing data is to censor the follow-up times of such patients at their times of premature discontinuation. Such censoring is noninformative in a sense like the MAR assumption (Heitjan, 1994) when the assumption of its independence from the possibly unobserved time-to-event applies: that is, the possibly unknown true time to the event for a patient is the same regardless of whether or not it is actually observed (or whether censoring occurs or not prior to it). Unfortunately, the conventional MAR-like methods ignore the fact that patients who discontinue the assigned treatment no longer receive it after discontinuation. Instead, they attempt to estimate what would be expected for the study if all patients remained on their assigned treatments until the occurrence of the event or the end of their planned follow-up period (Flyer, 2009).

Alternatively, discontinuation from treatment can be specified as clinical failure when the event of interest is unfavorable (Flyer and Hirman, 2009). In this case, one has a composite endpoint (i.e., time to the event of interest or discontinuation), and it expresses the time period for which a patient has had favorable experience with treatment. The application of this method to both the control and the test treatment groups produces what can be called the worst-case analysis, because patients who discontinue treatment are managed as having much higher risk of a future event than other patients (Rothmann et al., 2009). In contrast, the method in which a control patient who discontinues treatment has his or her follow-up time censored at the time of discontinuation and such a test treatment patient is managed as having the event is known as a worst-comparison analysis (Rothmann et al., 2009). The result from the worst-comparison analysis provides a stringent boundary on the impact of patients who discontinued treatment. Both the worst-case analysis and the worst-comparison analysis have potentially unclear relevance for a study because they both make unrealistic assumptions (Wittes, 2009). Usually, they are not designated as the primary analysis, but they can be used as sensitivity analyses with the worst-comparison analysis invoking maximal stress to the robustness of the study results (Walton, 2009). If the study conclusions are not altered by such methods, then one is reassured regarding the validity of the primary MAR-like analysis. Nevertheless, many studies will not maintain robustness to such sensitivity analyses. Hence, these methods are often criticized as unrealistically stringent and potentially problematic for a promising therapy to show effectiveness (Yan et al., 2009).

For longitudinal data with discontinuing patients, Little and Yau (1996) proposed multiple imputation of the missing responses on the basis of models incorporating actual treatment doses that might apply, or imputed doses under a variety of plausible assumptions. Recently, using a similar basic approach, Roger (2008) developed a sensitivity analysis, where the estimates from a mixed-effects model in the placebo group were used to provide information about possible future behaviors of discontinued patients from the test treatment. In this article, we propose a related sensitivity analysis for time-to-event data. On the basis of Kaplan–Meier (KM) estimators (or Cox proportional hazards model counterparts), patients who discontinue their assigned treatment (or follow-up) have multiple imputations for their experiences during their unobserved remaining times until the planned end of their follow-up period (as if they continued to be followed). The imputed data sets, having only administrative censoring of follow-up for patients who did not have the event by the end of their planned follow-up period, can then be analyzed by the standard methods for right censored time-to-event data. A key feature of this multiple imputation method for sensitivity analyses is a corresponding hazard ratio parameter θ for how the conditional survival distribution for the missing extent of follow-up can allow for different post-discontinuation behaviors of patients from the placebo and the test treatment groups. One can then investigate the impact of departures from the primary missingness assumption (i.e., noninformative independent censoring) by summarizing the treatment effect as a function of θ over a plausible range. This multiple imputation method is an extension and modification of the work by Taylor et al. (2002), where the conditional KM estimators were used to impute failure times for survival analyses under a specification for non-informative censoring.

The implementation of this method is illustrated with data from a clinical trial in psychiatry.

2. CLINICAL TRIAL EXAMPLES

For illustrative purposes, we consider time-to-event data based on a clinical trial pertaining to maintenance treatment for bipolar disorder (Calabrese et al., 2003). For reasons related to the confidentiality of the data from this clinical trial, the example in this article is based on a random sample (with replacement) of 150 patients with the test treatment and 150 patients with placebo. The study design for this clinical trial had an 8- to 16-week run-in period within which all patients received the test treatment. Eligible patients who tolerated and adhered to this therapy were randomized to the test treatment or to placebo, and then followed for up to 76 weeks as the planned follow-up period. Accordingly, this study had a randomized withdrawal design, and the primary efficacy endpoint was the time to intervention for any mood episode.

In total, 97 (32.33%) patients discontinued the study prematurely (35% on placebo and 29% on test treatment). Cumulative proportions of discontinued patients are shown in Fig. 1 (which has the convention of managing the patients who completed the study with the primary event as having imputed follow-up of 76 weeks without premature discontinuation). Discontinuations predominantly occurred before 35 weeks, with higher cumulative proportions for the placebo group. The documented reasons for discontinuation are summarized in Table 1, although except perhaps for “adverse events,” they are not informative about possible missing data mechanisms. The cumulative proportions of discontinuation by those reasons are displayed for each treatment arm in Fig. A-1 of the appendix. For an informal evaluation of the association of discontinuation with treatments, patients’ demographics, and baseline psychiatric assessments, we used logistic regression models for the odds of discontinuation versus completion of the study (either with the primary outcome or completion of 76 weeks of follow-up without it). As shown in Table 2, neither the unadjusted (from univariate regression on each individual variable) nor the adjusted (from multivariate regression on all the variables) odds ratios have p-values below 0.05 for any of the baseline variables or treatments. However, in view of the substantial extent of discontinuations, sensitivity analyses to address the robustness of conclusions to the management of missing information are of interest.
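A minimal sketch of this screening step is given below, in the spirit of Table 2; it uses the statsmodels package, and the column names (dropout, treatment, age, and so on) are hypothetical placeholders for the analysis data set, which is not reproduced here.

```python
# Sketch: unadjusted and adjusted odds ratios for premature discontinuation.
# Column names (dropout = 1 if discontinued prematurely, plus covariates) are
# hypothetical placeholders; only numeric/0-1 coded covariates are assumed.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def discontinuation_odds_ratios(df: pd.DataFrame, covariates: list) -> pd.DataFrame:
    rows = []
    for cov in covariates:
        uni = smf.logit(f"dropout ~ {cov}", data=df).fit(disp=0)   # univariate model
        rows.append(("unadjusted", cov, uni.params.iloc[1],
                     *np.exp(uni.conf_int().iloc[1])))
    adj = smf.logit("dropout ~ " + " + ".join(covariates), data=df).fit(disp=0)
    for cov in covariates:  # term labels equal column names for numeric covariates
        rows.append(("adjusted", cov, adj.params[cov], *np.exp(adj.conf_int().loc[cov])))
    out = pd.DataFrame(rows, columns=["model", "covariate", "coef", "or_lower", "or_upper"])
    out["odds_ratio"] = np.exp(out["coef"])
    return out
```

The resulting data frame would be organized like Table 2, with one unadjusted and one adjusted row per baseline characteristic.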

Figure 1. Cumulative discontinuation proportions by treatment groups.

Table 1.

Discontinuations and the corresponding reasons by treatment groups

                                        Overall        Placebo        Test treatment
Disposition                             N     %        N     %        N     %
Completed study without episode         46    15.33    15    10.00    31    20.67
Intervention for a mood episode         157   52.33    82    54.67    75    50.00
Discontinued study prematurely          97    32.33    53    35.33    44    29.33
  Adverse event                         24    8.00     16    10.67    8     5.33
  Consent withdrawn                     28    9.33     13    8.67     15    10.00
  Lost to follow-up                     20    6.67     8     5.33     12    8.00
  Protocol violation                    10    3.33     3     2.00     7     4.67
  Other (including missing data)        15    5.00     13    8.67     2     1.33

Table 2.

Unadjusted and adjusted odds ratios for discontinuation

                                    Univariate logistic regression                   Multivariate logistic regression
Baseline characteristics            Coef.^a (StdErr.^b)  OR (95% CI)        P value   Coef. (StdErr.)   OR (95% CI)        P value
Treatment (test vs. placebo)        −0.275 (0.248)       0.76 (0.47, 1.23)  0.2671    −0.242 (0.253)    0.79 (0.48, 1.29)  0.3398
Age (1 year increment)              −0.012 (0.010)       0.99 (0.97, 1.01)  0.2423    −0.011 (0.011)    0.99 (0.97, 1.01)  0.2941
Gender (female vs. male)            0.111 (0.247)        1.12 (0.69, 1.81)  0.6527    −0.010 (0.258)    0.99 (0.60, 1.64)  0.9695
Pre-rand^c CGI-I score^d            0.301 (0.208)        1.35 (0.90, 2.03)  0.1474    0.584 (0.337)     1.79 (0.93, 3.47)  0.0828
Pre-rand CGI-S score                0.021 (0.168)        1.02 (0.74, 1.42)  0.9018    −0.348 (0.277)    0.71 (0.41, 1.21)  0.2085
Pre-rand GAS score                  −0.002 (0.012)       1.00 (0.98, 1.02)  0.8719    0.010 (0.017)     1.01 (0.98, 1.04)  0.5580
Pre-rand MRS 11-item total score    −0.012 (0.047)       0.99 (0.90, 1.08)  0.7962    −0.036 (0.050)    0.96 (0.87, 1.06)  0.4690
Pre-rand MRS 17-item total score    0.028 (0.029)        1.03 (0.97, 1.09)  0.3433    0.036 (0.038)     1.04 (0.96, 1.12)  0.3529

^a Coefficient.
^b Standard error.
^c Pre-randomization.
^d One unit increment for all score variables.

The primary time-to-event analysis for this example has censoring of follow-up time for patients with premature discontinuation of treatment, so it has the MAR-like assumption of noninformative independent censoring. The previously noted worst-case analysis and the worst-comparison analysis serve as sensitivity analyses. The Cox proportional hazards model with one explanatory variable for treatment is used to obtain an unadjusted hazard ratio. The nonparametric logrank and Wilcoxon tests are also used to compare the test treatment and placebo. The results from these analyses are summarized in rows 1A, 1B, and 1C of Table 3; those for the primary analysis (row 1A) are interpretable as indicating superiority of the test treatment. The worst-case analysis provides stronger results in favor of test treatment, whereas the worst-comparison analysis shows no treatment difference. The worst-case analysis tends to overstate the difference in favor of the test treatment because the placebo group has more prematurely discontinued patients. Conversely, the worst-comparison analysis excessively understates the difference in favor of test treatment by unrealistically managing all of its patients with premature discontinuation as having events at the time of discontinuation. Therefore, more realistic approaches to sensitivity analyses are worth considering to address the robustness of conclusions to possibly informative censoring of time-to-event data for a clinical trial like this example.
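A sketch of these three analyses with the lifelines package is shown below; the column names (time in weeks, event = 1 for a mood-episode intervention, disc = 1 for premature discontinuation, treat = 1 for test treatment) are assumptions about the analysis data set, and the Wilcoxon-type test is omitted from the sketch.

```python
# Sketch of the MAR-like primary analysis (1A) and the stringent sensitivity
# analyses (1B, 1C); hypothetical column names: time, event, disc, treat.
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.statistics import logrank_test

def unadjusted_cox_and_logrank(df: pd.DataFrame) -> None:
    cph = CoxPHFitter().fit(df[["time", "event", "treat"]],
                            duration_col="time", event_col="event")
    test, ctrl = df[df["treat"] == 1], df[df["treat"] == 0]
    lr = logrank_test(test["time"], ctrl["time"],
                      event_observed_A=test["event"], event_observed_B=ctrl["event"])
    print(cph.summary[["coef", "se(coef)", "exp(coef)", "p"]])
    print("logrank p-value:", lr.p_value)

# (1B) Worst case: discontinuation itself counts as an event in both groups.
def worst_case(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(event=((df["event"] == 1) | (df["disc"] == 1)).astype(int))

# (1C) Worst comparison: discontinuation counts as an event only on test treatment.
def worst_comparison(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(event=((df["event"] == 1) |
                            ((df["disc"] == 1) & (df["treat"] == 1))).astype(int))
```

Each variant would then be analyzed the same way, for example unadjusted_cox_and_logrank(worst_case(df)); a Wilcoxon-type comparison would be added analogously with the software at hand.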

Table 3.

Analyses of treatment comparisons for delaying time-to-intervention for any mood episode

                                        Semi-parametric analysis (Cox PH model)
Analysis method          Time period    Coefficient  Standard error  HR (95% CI)           p-value   Log-rank^a  Wilcoxon^a
1. Original primary and sensitivity analyses
(1A) MAR-like^b          Whole period   −0.393       0.161           0.675 (0.493, 0.925)  0.0144    0.0149      0.0048
                         0–3 weeks      −0.728       0.284           0.483 (0.277, 0.842)  0.0103
                         4–5 weeks      −0.700       0.356           0.497 (0.247, 0.998)  0.0495
                         6–20 weeks     −0.096       0.342           0.908 (0.465, 1.774)  0.7782
                         21–76 weeks    0.138        0.367           1.148 (0.559, 2.357)  0.7065
(1B) Worst case          Whole period   −0.484       0.127           0.616 (0.481, 0.790)  0.0001    0.0001      <0.0001
(1C) Worst comparison    Whole period   0.033        0.144           1.033 (0.779, 1.371)  0.8212    0.7983      0.3827
2. Kaplan–Meier multiple imputation (KMMI) method at L = 50
(2A) θ = 1               Whole period   −0.322       0.160           0.724 (0.530, 0.991)  0.0436    0.0451      0.0109
3. Proportional hazards multiple imputation (PHMI) method at L = 50
(3A) θ = 1               Whole period   −0.388       0.158           0.678 (0.497, 0.925)  0.0143    0.0145      0.0053
4. KMMI method with bootstrap at L = 500
(4A) θ = 1               Whole period   −0.336       0.170           0.714 (0.512, 0.997)  0.0480    0.0507      0.0114
5. PHMI method with bootstrap at L = 500
(5A) θ = 1               Whole period   −0.396       0.160           0.673 (0.492, 0.921)  0.0134    0.0140      0.0053

^a p-values of hypothesis tests (the Log-rank and Wilcoxon columns are p-values).
^b MAR-like: censoring patients at the time of discontinuation.

3. METHOD

In most clinical trials with time-to-event data, a primary analysis that has censoring of follow-up time for patients with premature discontinuation of treatment is generally reasonable. The primary MAR-like assumption for such analysis is noninformative independent censoring. The proposed sensitivity analysis in this article addresses the implications of departures from this assumption by imputing different outcomes for the patients with premature discontinuation. It thereby enables assessment of the robustness of the results from the primary analysis with censoring of follow-up times for patients with premature discontinuation.

Consideration is first given to a Kaplan–Meier multiple imputation (KMMI) procedure and its separate invocation for the placebo group and the test treatment group. For this purpose, we describe the KMMI strategy for a single treatment group with n patients who have the same planned follow-up time t*. For the ith patient, we observe the time Yi = min(Ti, Ci), where Ti and Ci are the potential time to event and time to premature discontinuation (or censoring) for the patient. We define the censoring indicator δi = I(Ti ≤ Ci), so that the data can be summarized by (Yi, δi) for i = 1, 2, … , n. We assume that a study has events observed at M distinct times (t1 < t2 < … < tM), and it has premature discontinuation of patients observed at K distinct times (c1 < c2 < … < cK). Also, there may be more than one patient with the same time at risk yi (i.e., occasionally tied t’s or c’s), and we assume that yi = t* and δi = 0 for at least one patient who completes the entire planned follow-up time without the event (and has administrative censoring of their follow-up time at t*).

3.1. Kaplan–Meier Multiple Imputation Strategy

To establish the notation further, let k index the censoring times before t*. Let tk,0 denote the latest failure time prior to ck (or equal to it) when t1 ≤ ck, and let tk,0 = 0 if t1 > ck. Let tk,j denote the jth failure time after ck, j = 1, 2, … , Jk, when ck < tM. Note that the possible values of Jk range from 1 to M, depending on the position of ck with respect to the order of the tm’s (m = 1, 2, … , M): Jk equals M if ck < t1, and Jk equals 1 if tM−1 ≤ ck < tM. From the data (Yi, δi), we obtain the Kaplan–Meier (KM) estimates Ŝ(t) for the survival distribution for the event times, and it has support on the observed failure times (t1, t2, … , tM).

First, we estimate the survival rates for all K + 1 censoring times (t* and ck’s, k = 1, 2, … , K). For a censoring time ck followed by at least one failure time (i.e., ck < tM), the estimate of the survival function Ŝ(ck) is defined by the straightforward convention of linear interpolation as follows:

\hat{S}(c_k) = \hat{S}(t_{k,0}) - \frac{c_k - t_{k,0}}{t_{k,1} - t_{k,0}} \times \left( \hat{S}(t_{k,0}) - \hat{S}(t_{k,1}) \right) = \frac{(t_{k,1} - c_k)\, \hat{S}(t_{k,0}) + (c_k - t_{k,0})\, \hat{S}(t_{k,1})}{t_{k,1} - t_{k,0}}. \qquad (1)

In equation (1), linear interpolation is used for computational convenience and for transparent interpretation. For the planned administrative censoring time t* > tM, (1) is not applicable because there is not a KM estimate for Ŝ(t*). Nevertheless, with motivation from a suggestion in Brown et al. (1974) to use an exponential model to extrapolate Ŝ(tM) to Ŝ(t*), we use an exponential model for the conditional survival function for the last f events (e.g., f = 5) as in equation (2),

\hat{S}\left( (t_M - t_{M-f}) \mid t > t_{M-f} \right) = \frac{\hat{S}(t_M)}{\hat{S}(t_{M-f})} = \exp\{-h \times (t_M - t_{M-f})\}, \qquad (2)

to determine the corresponding hazard h from which Ŝ(t*) is computed as shown in equation (3).

\hat{S}(t^{*}) = \hat{S}(t_M) \times \hat{S}\left( (t^{*} - t_M) \mid t > t_M \right) = \hat{S}(t_M) \times \exp\{-h \times (t^{*} - t_M)\}. \qquad (3)

For a censoring time ck after the last failure time (i.e., tM < ck < t*), equation (3) similarly provides Ŝ(ck) = Ŝ(tM) × exp{−h × (ck − tM)}.
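A minimal sketch of this tail extrapolation is shown below; it assumes the KM estimate has already been reduced to arrays of ordered failure times and the corresponding survival values (how these arrays are obtained depends on the software), and the function names are illustrative.

```python
# Sketch of equations (2)-(3): exponential extrapolation of the KM estimate
# beyond the last observed failure time t_M, based on the last f events.
import numpy as np

def tail_hazard(fail_times: np.ndarray, surv: np.ndarray, f: int = 5) -> float:
    """Constant hazard h solving S(t_M)/S(t_{M-f}) = exp(-h (t_M - t_{M-f}))."""
    t_M, t_Mf = fail_times[-1], fail_times[-1 - f]
    return -np.log(surv[-1] / surv[-1 - f]) / (t_M - t_Mf)

def surv_beyond_last_event(t: float, fail_times: np.ndarray, surv: np.ndarray,
                           f: int = 5) -> float:
    """Equation (3): S-hat(t) = S-hat(t_M) * exp(-h (t - t_M)) for t > t_M."""
    h = tail_hazard(fail_times, surv, f)
    return float(surv[-1] * np.exp(-h * (t - fail_times[-1])))
```

The same function gives Ŝ(t*) and, for a censoring time ck beyond tM, Ŝ(ck).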

Second, we construct the estimated conditional failure time distribution for each patient with premature discontinuation. A fixed hazard ratio θ for a patient with premature discontinuation having an event after their censoring time ck, relative to the patients still remaining on their assigned treatment, is introduced as the sensitivity parameter. Thus, under the proportional hazards assumption, the estimated survival function at time t (after ck) equals Ŝ(t)^θ. For a patient with premature discontinuation at ck < tM, the estimated conditional probability of having the event in the time interval [tk,j, tk,j+1], for j = 1, 2, … , (Jk − 1), is given by

\hat{f}_{k,j}(\theta) = \frac{\hat{S}(t_{k,j})^{\theta} - \hat{S}(t_{k,j+1})^{\theta}}{\hat{S}(c_k)^{\theta}}. \qquad (4)

For the intervals [ck, tk,1] and [tk,Jk, t*], the estimated conditional probabilities are

\hat{f}_{k,0}(\theta) = \frac{\hat{S}(c_k)^{\theta} - \hat{S}(t_{k,1})^{\theta}}{\hat{S}(c_k)^{\theta}} \quad \text{and} \quad \hat{f}_{k,J_k}(\theta) = \frac{\hat{S}(t_{k,J_k})^{\theta} - \hat{S}(t^{*})^{\theta}}{\hat{S}(c_k)^{\theta}}, \qquad (5)

respectively. Correspondingly, for a patient with premature discontinuation at ck with tM ≤ ck < t*, the estimated conditional probability of having the event in the time interval [ck, t*] is given by

\hat{f}_{k,0}(\theta) = \frac{\hat{S}(c_k)^{\theta} - \hat{S}(t^{*})^{\theta}}{\hat{S}(c_k)^{\theta}}. \qquad (6)

Thus, the estimate for the conditional cumulative incidence function for a patient with premature discontinuation at ck to have the event by a time t with tk,j < t < tk,j+1, for j = 1, 2, 3, … , Jk with tk,Jk+1 = t* by convention, can be obtained by cumulative summation of the f̂k,j(θ) for the respective time intervals, as shown in equation (7).

\hat{F}_{k,j}(\theta) = \sum_{j'=0}^{j} \hat{f}_{k,j'}(\theta) = 1 - \frac{\hat{S}(t_{k,j+1})^{\theta}}{\hat{S}(c_k)^{\theta}} \qquad (7)

Under this formulation, θ > 1 (or θ < 1) implies a higher (or lower) hazard after ck for patients with premature discontinuation at ck than for patients with continued follow-up after ck. Also, θ = 1 specifies that patients with premature discontinuation and those with continued follow-up on the assigned treatment have the same tendency to experience an event in the future, so it is MAR-like (and in harmony with noninformative independent censoring). Through the Cox proportional hazards model, the primary analysis can produce an estimate ϕ̂ of the hazard ratio for the effect of test treatment versus placebo under the MAR-like assumption of noninformative independent censoring for patients with premature discontinuation. However, even if this assumption is realistic, ϕ̂ pertains to what would be expected if the patients with premature discontinuation had hypothetically continued with their assigned treatments after discontinuation. Although such a perspective may be realistic for the placebo patients, it would usually be optimistic for the test treatment patients since those patients are no longer receiving test treatment after premature discontinuation. Thus, sensitivity analyses to address the implications of this issue are of interest.
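To make equations (4)–(7) concrete, the sketch below builds the conditional cumulative incidence F̂k,j(θ) for a patient censored at ck; the inputs S_ck and S_tstar are the estimates Ŝ(ck) and Ŝ(t*) from equations (1)–(3), the arrays hold the ordered failure times and KM survival values, and all names are illustrative rather than part of any package.

```python
# Sketch of equations (4)-(7): conditional cumulative incidence of the event
# for a patient discontinuing at c_k, under post-discontinuation hazard ratio theta.
import numpy as np

def conditional_cdf(c_k: float, t_star: float, fail_times: np.ndarray,
                    surv: np.ndarray, S_ck: float, S_tstar: float,
                    theta: float = 1.0):
    """Return the interval grid [c_k, t_{k,1}, ..., t_{k,Jk}, t*] and the values
    F-hat_{k,j}(theta) at the right end of each interval."""
    after = fail_times > c_k                      # the failure times t_{k,1} < ... < t_{k,Jk}
    grid = np.concatenate(([c_k], fail_times[after], [t_star]))
    S_grid = np.concatenate(([S_ck], surv[after], [S_tstar]))
    # Equation (7): F_{k,j}(theta) = 1 - S(t_{k,j+1})^theta / S(c_k)^theta
    F = 1.0 - (S_grid[1:] ** theta) / (S_ck ** theta)
    return grid, F
```

With θ = 1 this reduces to the conditional KM distribution, and when ck falls after tM the grid collapses to [ck, t*], reproducing equation (6).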

One way to proceed with sensitivity analyses is to use multiple imputation with respect to the estimated conditional cumulative incidence functions in equation (7) to impute times to event for the patients with premature discontinuation in each treatment group. For the placebo group, one would typically use θP = 1 under the realistic assumption that its patients with premature discontinuation would have comparable experience after discontinuation to their counterparts without premature discontinuation, although other specifications of θP are feasible options. The test treatment group would usually have θT > θP specified, and with θP = 1, θ = (θT/θP) = θT becomes a single parameter for calibrating sensitivity analyses. The choice of θ can either be arbitrary, such as 1.05, 1.10, 1.15, and so on, or it can be values in a range (L, U), where (1/U, 1/L) is a range of hazard ratios from previous related studies or clinical judgment for the comparison of effective medicines with placebo. For example, if previous related studies supported (1/U, 1/L) = (0.60, 0.75), then one could consider θ in the range (1.333, 1.667) for the extent to which a test treatment patient with premature discontinuation at ck has a higher hazard after ck than their counterparts with continuation of test treatment after ck (in view of their treatment after discontinuation being more like placebo than an effective treatment).

With the conditional failure time distributions defined in equation (7), the multiple imputation scheme is as follows:

  1. Generate a random number p from the uniform distribution between 0 and 1, and for computational convenience, use linear interpolation to impute failure times (although exponential model interpolations are alternatively feasible).

  2. Suppose a patient has premature discontinuation at ck < tM:
    • If 0 ≤ p ≤ f̂k,0(θ) = F̂k,0(θ), then impute the failure time tk(l) between ck and tk,1 as ck + (tk,1 − ck) × p / f̂k,0(θ), where l indicates the lth imputation set.
    • If F̂k,j(θ) ≤ p ≤ F̂k,j+1(θ) for j = 0, 1, 2, 3, … , (Jk − 1), then impute the failure time tk(l) between tk,j+1 and tk,j+2 as tk,j+1 + (tk,j+2 − tk,j+1) × [p − F̂k,j(θ)] / [F̂k,j+1(θ) − F̂k,j(θ)], where tk,Jk+1 = t* by convention.
    • If p > F̂k,Jk(θ), then manage the patient as having no event by the end of follow-up time t*.
  3. Suppose a patient has premature discontinuation between tM and t* (so that tM < ck < t*): If p ≤ f̂k,0(θ), then impute the failure time tk(l) between ck and t* as ck + (t* − ck) × p / f̂k,0(θ); otherwise, manage the patient as having no event by the end of follow-up time t*.

  4. The imputation procedure is repeated to form L imputed data sets.

The tied ck’s can be processed separately. Thus, each complete data set has no patients with premature discontinuation, so one can apply the conventional survival analysis methods for the primary analysis with only administrative censoring of follow-up at time t*.
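The draw in steps 1–3 is an inverse-CDF step with linear interpolation; a sketch under the same illustrative conventions as the previous snippet is shown below (grid and F would come from the conditional distribution built above, and rng is a numpy generator such as np.random.default_rng(2024)). Repeating the draw for every discontinued patient, and repeating that L times, yields the L completed data sets of step 4.

```python
# Sketch of steps 1-3: inverse-CDF draw with linear interpolation, given the
# interval grid and conditional CDF F-hat_{k,j}(theta) from equation (7).
import numpy as np

def impute_one(grid: np.ndarray, F: np.ndarray, t_star: float,
               rng: np.random.Generator):
    """Return (imputed_time, event_indicator) for one discontinued patient;
    grid = [c_k, t_{k,1}, ..., t*] and F[j] is the CDF value at grid[j + 1]."""
    p = rng.uniform()                      # step 1: p ~ Uniform(0, 1)
    if p > F[-1]:                          # beyond F_{k,Jk}: no event by t*
        return t_star, 0
    j = int(np.searchsorted(F, p))         # smallest j with p <= F[j]
    F_lo = 0.0 if j == 0 else float(F[j - 1])
    lo, hi = grid[j], grid[j + 1]          # linear interpolation within the interval
    t_imp = lo + (hi - lo) * (p - F_lo) / (F[j] - F_lo)
    return float(t_imp), 1
```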

In reality, most clinical trials recruit patients over a period of time and have a common closing date. Therefore, patients always have different planned follow-up times and correspondingly different administrative censoring times for when they could complete the study without the event. Such staggered patient entry can be addressed by letting tk denote the planned follow-up time for the kth patient with premature discontinuation at ck (ck < tk). For tk between two consecutive failure times (tm, tm+1), the applicable survival function can be estimated at ck and tk in a fashion analogous to equation (1). For tk after the last failure time (tk > tM), the applicable survival function at ck and tk can be estimated by the method described for equation (3). In this way, the conditional failure time distribution can be constructed from equations (4)–(7) according to a prematurely discontinued patient’s planned follow-up time tk. The multiple imputation can then be performed in the same fashion as discussed previously.

The proposed method does not seek inferences for the hypothetically true parameters for treatment effects, but rather addresses the sensitivity issues associated with the unobserved outcomes of discontinued patients. For this purpose, the multiple imputation process regards the observed information as fixed, that is, K and M, as well as the corresponding times to event and times to premature discontinuation. In the context of Bayesian multiple imputation, Rubin (1987) refers to this type of imputation as “improper,” because it does not account for the uncertainty associated with the sample estimates (i.e., the KM estimates or their Cox proportional hazards model counterparts). A way to address such uncertainty is to generate the L data sets from separate conditional failure time distributions estimated from independent nonparametric bootstrap resamples (with replacement) of the original data.
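A minimal sketch of this bootstrap wrapper is shown below; impute_once is a hypothetical placeholder for the single-imputation routine (for example, the KMMI steps sketched above), and for the KMMI variant the resampling would be done within each treatment group separately (as described in section 4.1) rather than on the pooled data frame as written here.

```python
# Sketch: generating each imputed data set from its own bootstrap resample of
# the observed data, so that the uncertainty of the KM (or Cox) estimates is
# reflected in the imputations ("proper" imputation in Rubin's sense).
from typing import Callable, List
import pandas as pd

def bootstrap_mi(df: pd.DataFrame,
                 impute_once: Callable[[pd.DataFrame, pd.DataFrame], pd.DataFrame],
                 L: int = 500, seed: int = 2024) -> List[pd.DataFrame]:
    """impute_once(resample, original) is a user-supplied routine that returns one
    completed copy of `original`, using survival estimates computed from `resample`."""
    completed = []
    for l in range(L):
        boot = df.sample(n=len(df), replace=True, random_state=seed + l)
        completed.append(impute_once(boot, df))
    return completed
```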

3.2. Parameter Estimations

The method for combining results from L imputed data sets follows well-established rules (Rubin, 1987; Rubin and Schenker, 1991), and it can be applied easily with the SAS procedure MIANALYZE. Let β be a scalar parameter, such as a survival rate or a cumulative hazard for a specific time point, or a coefficient in the Cox proportional hazards model (i.e., the log hazard ratio), that can be estimated from the complete data. Let β̂^(l) denote the point estimate for β and let V̂β^(l) denote its variance estimate from the lth data set. The overall multiple imputation (MI) estimate of β is obtained by averaging the estimates from the L complete-data analyses, β̄ = (1/L) Σ_{l=1}^{L} β̂^(l), and its estimated variance is the sum of the within-imputation variance V̄β = (1/L) Σ_{l=1}^{L} V̂β^(l) and the product of the between-imputation variance Bβ = (L − 1)^{−1} Σ_{l=1}^{L} (β̂^(l) − β̄)^2 and the finite sample correction shown in equation (8).

\hat{V}_{\beta} = \bar{V}_{\beta} + (1 + L^{-1}) B_{\beta} \qquad (8)

Given a sufficiently large sample size for the complete data to support an approximately standard normal N(0, 1) distribution for its hypothetical version of (β̂ − β) V̂^{−1/2} based on the complete data, for which missing data prevent availability, confidence intervals for β (and p-values for corresponding statistical tests) can be based on (β̄ − β) V̂β^{−1/2} having the t-distribution with approximate degrees of freedom (d.f.) as shown in equation (9).

\mathrm{d.f.} = (L - 1)\left( 1 + \left( \frac{(1 + L^{-1}) B_{\beta}}{\bar{V}_{\beta}} \right)^{-1} \right)^{2} = (L - 1)\left( 1 + R^{-1} \right)^{2} \qquad (9)

Here, R expresses the relative increase in variance due to missing information. The fraction of missing information about β is estimated as

\gamma = \frac{R + 2/(\mathrm{d.f.} + 3)}{1 + R}. \qquad (10)

For nonparametric hypothesis testing with the logrank (or Wilcoxon) statistic, β̂^(l) is the difference between test treatment and placebo for means of logrank or Wilcoxon scores, and V̂β^(l) is its estimated variance under the null hypothesis of no difference between test treatment and placebo. It then follows that β̄ V̂β^{−1/2} approximately has the t-distribution with d.f. as in equation (9). Alternatively, Ẑ^(l) = β̂^(l) / (V̂β^(l))^{1/2} can serve as β̂^(l) with corresponding V̂Z^(l) = 1, in which case the statistical test would be based on Z̄ V̂Z^{−1/2} with V̂Z = 1 + (1 + L^{−1}) BZ and BZ = Σ_{l=1}^{L} (Ẑ^(l) − Z̄)^2 / (L − 1) (Taylor et al., 2002).
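These combination rules amount to a few lines of code; a sketch is shown below, taking as inputs arrays of the L complete-data point estimates and variance estimates (this is the same computation that SAS PROC MIANALYZE performs). For a log hazard ratio, exponentiating the estimate and the confidence limits gives the combined HR and its interval.

```python
# Sketch of Rubin's rules, equations (8)-(10): combine L complete-data point
# estimates beta_hat[l] and their variance estimates V_hat[l].
import numpy as np
from scipy import stats

def combine_mi(beta_hat: np.ndarray, V_hat: np.ndarray) -> dict:
    L = len(beta_hat)
    beta_bar = beta_hat.mean()                     # overall MI estimate
    V_within = V_hat.mean()                        # within-imputation variance
    B = beta_hat.var(ddof=1)                       # between-imputation variance
    V_total = V_within + (1 + 1 / L) * B           # equation (8)
    R = (1 + 1 / L) * B / V_within                 # relative increase in variance
    dof = (L - 1) * (1 + 1 / R) ** 2               # equation (9)
    gamma = (R + 2 / (dof + 3)) / (1 + R)          # equation (10), fraction of missing information
    t_stat = beta_bar / np.sqrt(V_total)           # test of beta = 0 (e.g., log HR)
    half_width = stats.t.ppf(0.975, dof) * np.sqrt(V_total)
    return {"estimate": beta_bar, "se": float(np.sqrt(V_total)), "df": dof,
            "ci": (beta_bar - half_width, beta_bar + half_width),
            "fraction_missing": gamma, "p_value": 2 * stats.t.sf(abs(t_stat), dof)}
```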

The term L^{−1} Bβ in equation (8) and the use of the t-distribution rather than a normal distribution widen the resulting interval estimates to account for replication variability incurred by using L < ∞ (Schafer, 1999). Schafer (1999) suggests that unless the fraction of missing information γ is unduly large, the widening is not substantial, and MI inferences can be quite efficient even when L is small (usually less than 10). Nevertheless, in practice, the appropriate number of imputations should be investigated more closely, especially when the fraction of missing information is large (Horton and Lipsitz, 2001).

4. RESULTS

4.1. Performance of KMMI Method Under θ = 1

In this section, we consider the performance of the KMMI method with θ = 1 for the clinical trial in section 2. With this specification, the imputed data are produced from the same conditional failure time distributions as estimated by the KM method with censoring of the follow-up times of patients with premature discontinuation, and it thereby has the same MAR-like assumption of noninformative independent censoring. To apply this method, we proceed in accordance with Horton and Lipsitz (2001) to determine the appropriate number of imputations (L) by evaluating the stability of an estimator and its standard error (SE) with respect to different L’s. Multiple imputations are performed separately for each of the two treatment groups with θ = 1, and 100 replicates of imputations are produced for each of the following numbers of imputations (L = 3, 5, 10, 20, 30, 40, 50, 70, and 100); thus, there are nine different sets of imputations. The variabilities of the estimates for the survival function at the 20th week are summarized in boxplots in Fig. 2 relative to the conventional KM estimates (with censoring of follow-up times for patients with premature discontinuation). The corresponding relative variance increases due to missing data (R = (1 + L^{−1}) Bβ / V̄β) are summarized in Fig. 3. Compared to the conventional KM estimates, the mean values of estimates from the KMMI method are somewhat smaller for all nine sets of imputations. Also, the KMMI estimates and the corresponding R in Fig. 2 and Fig. 3 (for the 20-week survival rate) are not stable for small numbers of imputations (i.e., L ≤ 10). The variability of the MI estimates becomes smaller as the number of imputations increases, and stabilizes near L = 50 or higher for both treatment groups. Thus, 50 imputations is a reasonable choice for the amount of missing information that this example has. Although a comprehensive simulation study could shed more light on the choice of L for different extents of missing data, such research is beyond the scope of this article. Nevertheless, for any real study, the specification of at least a moderately high value of L ≥ 50 should be considered, especially given the simplicity of the computations even for large L.

Figure 2. Distributions of 20-week survival rates for 100 replications of different numbers of imputations. The conventional KM estimates are indicated with the horizontal line.

Figure 3. Distributions of relative variance increase due to missing data (R) of 20-week survival rates for 100 replications of different numbers of imputations.

We apply multiple imputation (MI) with L = 50 henceforth. The conventional KM curves for both treatment groups are shown in Fig. 4(a) with their counterparts from averaging the KM estimates for 50 data sets imputed by the KMMI method. The corresponding cumulative hazard curves (via the Nelson–Aalen estimator) are shown in Fig. 4(b). The relationships shown for the KMMI method are almost identical to their conventional counterparts. In row 2A of Table 3, results from the KMMI method are shown for the hazard ratio for the effect size of the test treatment versus placebo from the unadjusted Cox proportional hazards model (which includes only treatment), as well as for the p-values for the logrank test and the Wilcoxon test. Interestingly, the estimated hazard ratio from the KMMI method is closer to unity (and thus is a smaller effect size) and has a somewhat larger p-value than its conventional counterpart with the use of censoring (HR = 0.724 with p = 0.0436 for KMMI versus HR = 0.675 with p = 0.0144 for conventional). This disagreement between the inference for the effect of the test treatment from the KMMI method with θ = 1 and its conventional counterparts with censoring could be a consequence of nonproportional hazards during the follow-up period. As can be seen from the survival curves in Fig. 4(a) and the cumulative hazard curves in Fig. 4(b), the difference between the two treatment groups is most clearly evident during the early stage of the follow-up and less apparent later. This issue is explored further by partitioning the follow-up period into four distinct intervals with approximately equal numbers of events, and then producing conventional interval-specific hazard ratio estimates for each of them from an unadjusted Cox proportional hazards model. The results of such analysis in Table 3(1) suggest much stronger effect sizes for test treatment during 0–6 weeks than during 6–76 weeks, so they are contrary to the hazard ratio being constant during the entire follow-up period.
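One conventional way to produce such interval-specific estimates is to fit a separate unadjusted Cox model to the patients still at risk at the start of each interval, censoring at the interval's end; a sketch with lifelines is shown below. The exact handling of the interval boundaries in the original analysis is not specified, so this is only one reasonable implementation, and the column names and boundaries are illustrative.

```python
# Sketch: interval-specific hazard ratios (as in the 1A rows of Table 3) by
# restricting to patients still at risk at the interval start and censoring at
# the interval end. Hypothetical column names: time, event, treat.
import pandas as pd
from lifelines import CoxPHFitter

def interval_hazard_ratio(df: pd.DataFrame, start: float, stop: float) -> pd.Series:
    at_risk = df[df["time"] > start].copy()
    at_risk["event"] = ((at_risk["time"] <= stop) & (at_risk["event"] == 1)).astype(int)
    at_risk["time"] = at_risk["time"].clip(upper=stop) - start
    cph = CoxPHFitter().fit(at_risk[["time", "event", "treat"]],
                            duration_col="time", event_col="event")
    return cph.summary.loc["treat", ["coef", "se(coef)", "exp(coef)", "p"]]

# e.g.: pd.DataFrame([interval_hazard_ratio(df, a, b)
#                     for a, b in [(0, 3), (3, 5), (5, 20), (20, 76)]])
```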

Figure 4. Comparison of the results from the conventional (MAR-like) and the KMMI method.

When treatment is the only explanatory variable in the Cox proportional hazards regression model, its estimated effect size is approximately an average of log HR over the entire follow-up period. When there are many patients with premature discontinuation, the estimation of the average log HR through conventional methods with censoring may tend to be mainly influenced by events during the earlier part of the follow-up period (where the effect sizes for test treatment are stronger for this example). The KMMI method eliminates censoring during the follow-up period by imputing potential times to event for every patient with premature discontinuation, and it thereby puts more weight on what happens during the latter part of the follow-up period (where the effect sizes for test treatment are smaller for this example), so it produces a smaller effect size for test treatment (in the sense of an estimated hazard ratio that is closer to unity). Thus, this example suggests that the sensitivity analysis with θ = 1 for the KMMI method can be useful for evaluating the implications of nonproportional hazards during the follow-up period.

An alternative structure for multiple imputation is provided by the Breslow estimators of the survival distributions for the placebo and test treatment groups from the Cox proportional hazards model with treatment as the only explanatory variable, and it can have implementation through its counterparts for equations (1)–(7). As shown in Table 3, rows 1A and 3A, the proportional hazards multiple imputation (PHMI) method under θ = 1 provides results very similar to the conventional methods with censoring, mainly because both operate under the MAR-like assumption of noninformative independent censoring and both have the proportional hazards assumption.
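A minimal sketch of that building block is shown below, using the model-based survival curves predicted for each treatment group; whether one uses these predictions or the Breslow estimator directly is an implementation detail, and the column names remain the hypothetical ones used earlier.

```python
# Sketch of the PHMI building block: group-specific survival curves from a Cox
# model with treatment as the only covariate, to be used in place of the KM
# curves in equations (1)-(7).
import pandas as pd
from lifelines import CoxPHFitter

def cox_group_survival(df: pd.DataFrame) -> pd.DataFrame:
    cph = CoxPHFitter().fit(df[["time", "event", "treat"]],
                            duration_col="time", event_col="event")
    groups = pd.DataFrame({"treat": [0, 1]}, index=["placebo", "test"])
    # rows: event times; columns: model-based survival for each treatment group
    return cph.predict_survival_function(groups)
```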

We further consider an imputation with nonparametric bootstrap resampling so as to add extra between-imputation variability and thereby to be in better harmony with a “proper” imputation. Consequently, L may need to be much larger than 50 in order to provide appropriate precision for estimation. Both the KMMI and the PHMI methods proceed with an additional bootstrap step for L = 50, L = 100, and L = 500. The results of MI with the bootstrap for L = 500 are relatively consistent with those from the methods without the bootstrap for L = 50 (see Table A-1 for details). The bootstrap KMMI method uses separate samples with replacement for each treatment group, and its results for L = 500 (Table 3.4A) are slightly weaker than its counterparts without the bootstrap for L = 50. The PHMI method with the bootstrap uses samples with replacement from the combined treatment groups. As shown in Table 3.5A, when performed for L = 500, the PHMI with the bootstrap produces results comparable to the PHMI without the bootstrap for L = 50. The imputation methods with and without the bootstrap arise from different paradigms. The imputation methods with the bootstrap are motivated by Bayesian theory, relating the posterior distribution given the observed data to the complete-data posterior distribution for a random sample from a target population, and they therefore add more complexity to the imputation process. Alternatively, the methods without the bootstrap address the uncertainty of missing data in the context of the observed information being known and fixed. Depending on the purpose of the sensitivity analysis, either process can be applied. For this article, we emphasize the sensitivity analysis using the MI methods without the bootstrap for L = 50.

4.2. Sensitivity Analysis

The sensitivity analyses proceed with varying θ (for the test treatment) in a plausible range from 1 to 2.5 (with θ = 1 for placebo) to determine how the assessment of the treatment effect changes for the different extents of imputed events for patients with premature discontinuation at specific times ck versus patients with continued follow-up beyond those times. In this regard, θ = 2.5 = 1/0.4, and 0.4 might represent a reasonably large effect size for a clearly effective treatment versus placebo in the published clinical literature for maintenance treatments of bipolar disorder. On this basis, it is a reasonable choice for the upper bound of the sensitivity parameter θ in terms of how much more rapidly the patients who had premature discontinuation would have the event compared to those who did not; in this regard, it is useful to note that θ = ∞ corresponds to the worst-comparison analysis. The value of θ is varied by 0.01 increments from 1 to 2.5, leading to 150 treatment effect assessments. Plots of the hazard ratio estimates and the p-values for treatment comparisons are then constructed as a function of the sensitivity parameter θ. We implement both the KMMI method and the PHMI method in these sensitivity analyses. The multiple imputation results from the Cox proportional hazards models, as well as the logrank and Wilcoxon tests, are combined using the method described in section 3.2.
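Operationally, the sensitivity analysis is a loop over θ; a sketch is given below, where impute_and_analyze is a hypothetical placeholder for the imputation-plus-combination machinery sketched in section 3, and the default grid mirrors the 0.01 increments described above.

```python
# Sketch of the sensitivity loop: rerun the multiple imputation and the
# Rubin-combined analysis over a grid of theta values for the test-treatment
# group (theta = 1 for placebo), collecting the combined HR and p-values.
from typing import Callable, Dict
import numpy as np
import pandas as pd

def theta_sensitivity(df: pd.DataFrame,
                      impute_and_analyze: Callable[[pd.DataFrame, float], Dict],
                      thetas: np.ndarray = np.arange(1.0, 2.51, 0.01)) -> pd.DataFrame:
    """impute_and_analyze(df, theta) is a user-supplied routine (e.g., KMMI or
    PHMI with L = 50 plus the combination rules of section 3.2) returning, say,
    {'hr': ..., 'hr_lo': ..., 'hr_hi': ..., 'p_wald': ..., 'p_logrank': ...}."""
    rows = [{"theta": th, **impute_and_analyze(df, th)} for th in thetas]
    return pd.DataFrame(rows)   # one row per theta, ready for plotting against theta
```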

The sensitivity analysis results using the KMMI method are summarized in Fig. 5. The values of θ plotted against the estimated hazard ratios with 95% confidence intervals are shown in Fig. 5a, and the p-values obtained from the Wald test from the Cox proportional hazards model, the logrank test, and the Wilcoxon test are shown in Fig. 5b. The magnitude of the estimated treatment effect moves closer to the null (i.e., HR = 1) as the value of θ increases. The HR estimates for test treatment versus placebo range from 0.724 (for θ = 1) to 0.867 (for θ = 2.5). The corresponding p-values from the Wald test vary substantially over the range of θ, indicating that the assumptions for patients with premature discontinuation can substantially influence study conclusions. As expected, the p-values from the Wald test agree with those from the logrank test, and they are larger than those from the Wilcoxon test. In order to have p ≤ 0.05 with the Wald test (or the logrank test), θ ≤ 1.08 (or 1.05) is needed, with this specification being only slightly more stringent than the MAR-like assumption of noninformative independent censoring (i.e., θ = 1). For the Wilcoxon test, p ≤ 0.05 applies with θ ≤ 2.08, so it has better robustness to assumptions about patients with premature discontinuation of treatment for this example than the Wald test or the logrank test. Since the Wilcoxon test gives relatively more weight than the logrank test to early failures and relatively less weight to later failures, it is better able to detect the early hazard differences for this example than the logrank test. As shown in Fig. 4b and Table 3(1), the estimated treatment effect is much stronger (i.e., hazard ratios are further away from 1) in the earlier part of the follow-up.

Figure 5. Sensitivity analysis results using the KMMI method.

The results of sensitivity analyses with the PHMI method are shown in Fig. 6. Because the PHMI method invokes the possibly unrealistic proportional hazards assumption, it suggests better robustness for the conclusions from the Cox proportional hazards model and the logrank test than the KMMI method. For p ≤ 0.05 with the Wald test (or the logrank test), θ ≤ 1.59 (or 1.58) is needed; also, for the Wilcoxon test, p ≤ 0.05 applies for all θ ≤ 2.5. In general, the PHMI method may not always suggest stronger conclusions than the KMMI method. When the differences between the test treatment and the placebo are more substantial during the latter part of the follow-up period than during the early part, the KMMI method with θ = 1 could lead to stronger conclusions (i.e., estimated hazard ratios further from 1 and smaller p-values), while the PHMI method under θ = 1 would tend to produce results similar to conventional analyses with censoring of follow-up time for patients with premature discontinuation. Therefore, the sensitivity analysis based on the KMMI method may provide a more accurate assessment than the PHMI method.

Figure 6. Sensitivity analysis results using the PHMI method.

Sensitivity analyses with both the KMMI and the PHMI methods can be useful for reviewers to understand the robustness of conclusions for treatment effects to the assumptions of noninformative independent censoring and proportional hazards. The degree to which conclusions are stable across a reasonable range of θ provides an indication of the confidence that can be placed on them. Opinions on possible values of θ can be based on knowledge from other studies for similar interventions. An investigation of the differences between baseline characteristics of completers and patients with premature discontinuation can be useful, as well as the reasons for discontinuation. If such information suggests that only unrealistic values of θ would alter study conclusions, then the results of a primary analysis with conventional methods could be considered robust from a clinical perspective. When the inference about treatment effects could be overturned for plausible values of θ, then it should be viewed with caution.

5. DISCUSSION

Analysis of incomplete data is a challenge for most clinical trials. Often, MAR-like assumptions about the missing data mechanism can be reasonable for primary analyses. However, the possibility of MNAR is difficult to rule out, particularly when patients with test treatment lose its benefit after discontinuation, so sensitivity analyses for alternative ways to address missing data become of interest.

In time-to-event analyses, patients with premature discontinuation have their follow-up time censored at the time of discontinuation, and the usual assumption is noninformative independent censoring. As right-censoring is a special case of coarsened data, the assumption of noninformative independent censoring can be generalized to “coarsened at random,” which extends the concept of MAR to coarsened data (Heitjan, 1994). The MNAR issue for time-to-event data is to account properly for censoring that may be informative. Most sensitivity analyses in the literature assess the effect of various assumptions concerning the dependence between failure and censoring times (Scharfstein and Robins, 2002; Siannis et al., 2005; Ruan and Gray, 2008). However, clinical reviewers can have difficulty in understanding the interpretation of the sensitivity parameters in those analyses, and this can make the specification of reasonable ranges for the sensitivity parameter challenging.

This article discusses sensitivity analyses for time-to-event data, and its suggested methods can have several appealing features in regulatory clinical trial settings. First, they enable direct exploration of the effect of departures from the noninformative independent censoring assumption for conventional methods (such as Cox proportional hazards models, logrank tests, and Wilcoxon tests) through a sensitivity parameter that connects the unobserved outcomes and the observed outcomes, that is, a hazard ratio for a discontinued patient having an event after discontinuation relative to the patients remaining on their assigned treatment. The multiple imputation strategy is straightforward because the predictive distributions are specified directly, and they do not depend on models for assumed missingness mechanisms. The interpretation of the sensitivity parameter is transparent in the sense that the parameter is based on a standard criterion for analyzing time-to-event data, and consequently it may be more understandable to reviewers. Second, the sensitivity analysis accounts for all randomized patients. The specifications for post-discontinuation experience are intended to address the question of what the long-term benefit of initial assignment would be if patients with premature discontinuation were followed to the end of the study without other treatment. In addition, the influence of departures from the noninformative independent censoring assumption with respect to patients with premature discontinuation can be assessed either simultaneously with the proportional hazards assumption by the KMMI method or separately in its own right by the PHMI method. Third, the sensitivity analysis is based on multiple imputation of missing outcomes, and therefore it provides a simple way of generating statistical inference without the need for special software and programming. All of the analyses presented in this article can be produced with standard SAS procedures and SAS macros. Finally, the proposed sensitivity analysis anchors on a primary MAR-like assumption, and then it can have calibration toward the worst-comparison analysis through how it penalizes premature discontinuation for the test treatment. The method can be specified a priori and does not require any post hoc (i.e., data-driven) revisions. Therefore, this type of sensitivity analysis to address the missing information from censored follow-up times could be attractive in the regulatory environment.

The sensitivity analysis illustrated here was performed for a continuous time-to-event endpoint. However, the methodology and underlying principles can be extended to categorical (or interval censored) time-to-event data. Furthermore, the proposed PHMI strategy can be modified to incorporate the information of patients’ baseline risk factors. One can estimate the failure time distributions separately for subpopulations defined by baseline covariates and treatments through the multivariate Cox proportional hazards model that includes treatments and the set of covariates as explanatory variables. The conditional failure time distributions can then be used for risk-adjusted multiple imputations. Currently, the discussed MI strategies invoke separate imputations for each of the two groups with its corresponding survival distribution estimates. An alternative approach is to impute times to event for both treatment groups using the information in the placebo group. The details of this method and its corresponding results are discussed in the appendix. However, it may not address robustness as stringently as the methods that are the main focus of this article, when the placebo group has a higher proportion of discontinuations than the test treatment group.

Typically, the design of a confirmatory trial should account for the loss of power from patients with premature discontinuation (National Research Council, 2010). An often-used approach is simply to inflate the initially planned sample size by the reciprocal of one minus the anticipated premature discontinuation rate, but this may only be reasonable if the missing information is MCAR. Power calculations should be based on more plausible MAR-like assumptions, and perhaps should accommodate MNAR situations and the potentially reduced effect size estimates seen in sensitivity analyses. However, those concerns usually cannot be addressed analytically in sample size calculations. The multiple imputation strategy presented in the current sensitivity analysis method can be adapted for simulation-based power calculations to assess the effect of missing data on sample size.

ACKNOWLEDGMENTS

The authors acknowledge GlaxoSmithKline for generously providing data for the clinical trial example of the maintenance treatment for bipolar disorder. The views and opinions contained in this article shall not be construed or interpreted, whether directly or indirectly, to be the views or opinions of any of the officers or employees of GlaxoSmithKline Research and Development Limited or any of its affiliated companies forming part of the GlaxoSmithKline group of companies. Further, reliance on the information contained in this article is at the sole risk of the user. The information is provided “as is” without any warranty or implied term of any kind, either express or implied, including but not limited to any implied warranties or implied terms as to quality, fitness for a particular purpose, or non-infringement. All such implied terms and warranties are hereby excluded.

FUNDING

This article is supported in part by NIH R01 ES021900 (H.Z.).

6. APPENDIX

6.1. Graphical Displays for the Cumulative Discontinuation Proportions by Documented Reasons

Figure A-1. Cumulative discontinuation proportions by documented reasons.

6.2. Results from the KMMI and PHMI Methods With/Without Bootstrap Resampling at θ = 1

Table A-1.

KMMI and PHMI methods with/without bootstrap resampling at θ = 1

                          Semi-parametric analysis (Cox PH model)
Analysis method           Coefficient  Std Err^b  HR (95% CI)           P-value   Log-rank^a  Wilcoxon^a
1. L = 50
KMMI without bootstrap    −0.322       0.160      0.724 (0.530, 0.991)  0.0436    0.0451      0.0109
KMMI with bootstrap       −0.325       0.180      0.723 (0.507, 1.030)  0.0727    0.0770      0.0167
PHMI without bootstrap    −0.388       0.158      0.678 (0.497, 0.925)  0.0143    0.0145      0.0053
PHMI with bootstrap       −0.389       0.165      0.678 (0.490, 0.938)  0.0191    0.0204      0.0068
2. L = 100
KMMI without bootstrap    −0.328       0.159      0.720 (0.527, 0.984)  0.0394    0.0410      0.0101
KMMI with bootstrap       −0.347       0.178      0.707 (0.498, 1.003)  0.0519    0.0550      0.0127
PHMI without bootstrap    −0.394       0.158      0.674 (0.495, 0.918)  0.0124    0.0126      0.0048
PHMI with bootstrap       −0.399       0.164      0.671 (0.486, 0.925)  0.0150    0.0157      0.0056
3. L = 500
KMMI without bootstrap    −0.332       0.156      0.717 (0.528, 0.974)  0.0332    0.0345      0.0091
KMMI with bootstrap       −0.336       0.170      0.714 (0.512, 0.997)  0.0480    0.0507      0.0114
PHMI without bootstrap    −0.398       0.156      0.672 (0.495, 0.911)  0.0106    0.0108      0.0044
PHMI with bootstrap       −0.396       0.160      0.673 (0.492, 0.921)  0.0134    0.0140      0.0053

^a p-values of hypothesis tests.
^b Standard error.

6.3. Alternative KMMI Strategy for Sensitivity Analysis

An alternative way to perform sensitivity analysis is to use the information in the placebo group to impute times to event for both treatment groups with the KMMI approach. To generate one set of the L imputed data, one could first impute failure times for discontinued patients in the placebo group under a given specification of θP; the KM estimates obtained from those completed data in the placebo group are then used to perform imputation through equations (1)–(7) for the discontinued patients in the test treatment group. In this MI procedure, the sensitivity parameter θ only needs to be specified for the placebo group. Besides choosing θP = 1 to approximate a MAR-like assumption, θP > 1 can be used to address the possibility that the post-discontinuation experience is less favorable than that of the patients remaining on their assigned treatment. The results of sensitivity analysis at L = 50 under various specifications of θP are shown in Table A-2. For this particular example, the estimated treatment effect when θP = 1 is slightly weaker than that from applying the KMMI method within the individual treatment groups with θ = 1 for both (Table 3.2A). As the value of θP increases, results in favor of the test treatment become stronger, because the placebo group has more prematurely discontinued patients than the test treatment group, and thereby θP > 1 penalizes the placebo group more.

Table A-2.

Alternative KMMI strategy for sensitivity analysis

        Semi-parametric analysis (Cox PH model)
θP      Coefficient  Std Err^b  HR (95% CI)           P-value   Log-rank^a  Wilcoxon^a
1       −0.315       0.155      0.730 (0.538, 0.990)  0.0430    0.0440      0.0131
1.1     −0.332       0.154      0.717 (0.530, 0.971)  0.0313    0.0322      0.0097
1.2     −0.346       0.153      0.708 (0.524, 0.956)  0.0241    0.0248      0.0075
1.3     −0.361       0.150      0.697 (0.519, 0.935)  0.0160    0.0164      0.0053
1.4     −0.368       0.150      0.692 (0.516, 0.928)  0.0140    0.0143      0.0043
1.5     −0.384       0.151      0.681 (0.507, 0.916)  0.0110    0.0113      0.0033

^a p-values of hypothesis tests.
^b Standard error.

Footnotes

Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/lbps.

REFERENCES

  1. Brown JBW, Hollander M, Korwar RM. Nonparametric tests of independence for censored data, with applications to heart transplant studies. In: Proschan F, Serfling RJ, editors. Reliability and Biometry: Statistical Analysis of Lifelength. Philadelphia, PA: SIAM; 1974. pp. 327–354.
  2. Calabrese JR, Bowden CL, Sachs G, Yatham LN, Behnke K, Mehtonen OP, Montgomery P, Ascher J, Paska W, Earl N, DeVeaugh-Geiss J; Lamictal 605 Study Group. A placebo-controlled 18-month trial of lamotrigine and lithium maintenance treatment in recently depressed patients with bipolar I disorder. Journal of Clinical Psychiatry. 2003;64(9):1013–1024. doi: 10.4088/jcp.v64n0906.
  3. Committee for Medicinal Products for Human Use (CHMP). Guideline on Missing Data in Confirmatory Clinical Trials (EMA/CPMP/EWP/1776/99). London: CHMP; 2010.
  4. Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society, Series B (Methodological). 1972;34(2):187–220.
  5. Flyer PA. Discussion: “Incomplete data in clinical studies: Analysis, sensitivity, and sensitivity analysis” by Geert Molenberghs. Drug Information Journal. 2009;43(4):437–439.
  6. Flyer P, Hirman J. Missing data in confirmatory clinical trials. Journal of Biopharmaceutical Statistics. 2009;19(6):969–979. doi: 10.1080/10543400903242746.
  7. Gehan EA. A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika. 1965;52:203–223.
  8. Heitjan DF. Ignorability in general incomplete-data models. Biometrika. 1994;81(4):701–708.
  9. Horton NJ, Lipsitz SR. Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician. 2001;55(3):244–254.
  10. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958;53:457–481.
  11. Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York, NY: John Wiley & Sons; 2002.
  12. Little RJA, Yau L. Intent-to-treat analysis for longitudinal studies with dropouts. Biometrics. 1996;52(4):1324–1333.
  13. Mallinckrodt CH, Lane PW, Schnell D, Peng Y, Mancuso JP. Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials. Drug Information Journal. 2008;42(4):303–319.
  14. Mantel N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports. 1966;50:163–170.
  15. National Research Council, Panel on Handling Missing Data in Clinical Trials. The Prevention and Treatment of Missing Data in Clinical Trials. Washington, DC: National Academies Press; 2010. ISBN 9780309158145. Available at: www.nap.edu/openbook.php?record_id=12955.
  16. Permutt T, Pinheiro J. Editorial: Dealing with the missing data challenges in clinical trials. Drug Information Journal. 2009;43(4):403–408.
  17. Roger JH. Sensitivity Analysis for Longitudinal Studies with Withdrawals in Practice. GlaxoSmithKline, U.S. Biostatistics Annual Conference; Research Triangle Park, NC; November 2008.
  18. Rothmann MD, Koti K, Lee KY, Lu HL, Shen YL. Missing data in biologic oncology products. Journal of Biopharmaceutical Statistics. 2009;19(6):1074–1084. doi: 10.1080/10543400903242993.
  19. Ruan PK, Gray RJ. Sensitivity analysis of progression-free survival with dependent withdrawal. Statistics in Medicine. 2008;27(8):1180–1198. doi: 10.1002/sim.3015.
  20. Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York, NY: John Wiley & Sons; 1987.
  21. Rubin DB, Schenker N. Multiple imputation in health-care databases: An overview and some applications. Statistics in Medicine. 1991;10(4):585–598. doi: 10.1002/sim.4780100410.
  22. Schafer JL. Multiple imputation: A primer. Statistical Methods in Medical Research. 1999;8(1):3–15. doi: 10.1177/096228029900800102.
  23. Scharfstein DO, Robins JM. Estimation of the failure time distribution in the presence of informative censoring. Biometrika. 2002;89(3):617–634.
  24. Siannis F, Copas J, Lu G. Sensitivity analysis for informative censoring in parametric survival models. Biostatistics. 2005;6(1):77–91. doi: 10.1093/biostatistics/kxh019.
  25. Taylor JMG, Murray S, Hsu C-H. Survival estimation and testing via multiple imputation. Statistics and Probability Letters. 2002;58(3):221–232.
  26. Walton MK. Addressing and advancing the problem of missing data. Journal of Biopharmaceutical Statistics. 2009;19(6):945–956. doi: 10.1080/10543400903238959.
  27. Wittes J. Missing inaction: Preventing missing outcome data in randomized clinical trials. Journal of Biopharmaceutical Statistics. 2009;19(6):957–968. doi: 10.1080/10543400903239825.
  28. Yan X, Lee S, Li N. Missing data handling methods in medical device clinical trials. Journal of Biopharmaceutical Statistics. 2009;19(6):1085–1098. doi: 10.1080/10543400903243009.
  29. Zhang J. Sensitivity analysis of missing data: Case studies using model-based multiple imputation. Drug Information Journal. 2009;43:475–484.
