Published in final edited form as: Stat Med. 2019 Aug 8;38(22):4453–4474. doi: 10.1002/sim.8319

Maximum likelihood estimation with missing outcomes: From simplicity to complexity

Stuart G Baker 1
PMCID: PMC6879193  NIHMSID: NIHMS1038690  PMID: 31392751

Abstract

Many clinical or prevention studies involve missing or censored outcomes. Maximum likelihood (ML) methods provide a conceptually straightforward approach to estimation when the outcome is partially missing. Methods of implementing ML methods range from the simple to the complex, depending on the type of data and the missing-data mechanism. Simple ML methods for ignorable missing-data mechanisms (when data are missing at random) include complete-case analysis, complete-case analysis with covariate adjustment, survival analysis with covariate adjustment, and analysis via propensity-to-be-missing scores. More complex ML methods for ignorable missing-data mechanisms include the analysis of longitudinal dropouts via a marginal model for continuous data or a conditional model for categorical data. A moderately complex ML method for categorical data with a saturated model and either ignorable or nonignorable missing-data mechanisms is a perfect fit analysis, an algebraic method involving closed-form estimates and variances. A complex and flexible ML method with categorical data and either ignorable or nonignorable missing-data mechanisms is the method of composite linear models, a matrix method requiring specialized software. Except for the method of composite linear models, which can involve challenging matrix specifications, the implementation of these ML methods ranges in difficulty from easy to moderate.

Keywords: composite linear model, double sampling, latent class instrumental variable, missing-data mechanism, perfect fit analysis, randomized trial

1. INTRODUCTION

In many clinical or prevention studies the outcome is missing or censored. Maximum likelihood (ML) methods are a conceptually simple approach for estimation in this setting. The landmark 1976 paper by Rubin1 made several key innovations for ML estimation with missing data: a missing-data indicator as a random variable, a comprehensive likelihood framework, and the concept of ignorable and nonignorable missing-data mechanisms. Wu and Carroll,2 Heitjan and Rubin,3 and Little and Rubin,4 extended this approach to censoring mechanisms.

The basic set-up follows. The goal of the analysis is to estimate parameters in an outcome model, a model for the effect of treatment or covariates on outcome. Coupled with the outcome model is a missing-data mechanism, a model for the probability that the outcome is missing or censored. The sets of parameters for the outcome model and the missing-data mechanism do not overlap and do not constrain each other.

In the context of likelihood-based inference, an ignorable missing-data mechanism is a missing-data mechanism whose parameters factor from the likelihood and hence do not contribute to likelihood-based inference for the outcome model. Rubin1 showed that an ignorable missing-data mechanism depends only on completely observed variables, in which case the data are said to be Missing at Random (MAR). A special case of MAR is Missing Completely at Random (MCAR), corresponding to a constant probability the data are missing.

A nonignorable missing-data mechanism is simply a missing-data mechanism that is not ignorable. This tutorial introduces the terminology of directly and indirectly nonignorable missing-data mechanisms. A directly nonignorable missing-data mechanism is a nonignorable missing-data mechanism in which the probability of missing a variable depends on that variable and possibly on other variables. An indirectly nonignorable missing-data mechanism is a missing-data mechanism in which the probability of missing a variable does not depend on that variable but depends on at least one other variable that is partially missing. Table 1 summarizes this missing-data taxonomy in the context of missing outcomes.

Table 1.

Missing-data taxonomy applied to missing outcomes

Missing-data mechanism: Ignorable | Non-ignorable
Definition: Likelihood-based inference for the outcome model does not involve parameters modeling the missing-data mechanism* | Not ignorable
Implication: Missing in outcome depends only on observed variables | Missing in outcome depends on at least one unobserved variable
Outcome is said to be: Missing at random (MAR) | Missing not at random (MNAR)
Special cases: Missing completely at random (MCAR), in which missing in outcome occurs with constant probability | Directly non-ignorable, in which missing in outcome depends only on outcome; Indirectly non-ignorable, in which missing in outcome does not depend on outcome but depends on other partially missing variables
*

The sets of parameters for the outcome model and missing-data mechanism do not overlap and do not constrain one another.

Implementation of ML methods with missing outcomes can range from simple computations to complex modeling with specialized software. Because ML methods are often tailored to specific missing-data scenarios and there are numerous missing-data scenarios, it is not possible to cover all ML methods here. Table 2 lists the ML methods discussed in this tutorial.

Table 2.

Overview of ML estimation methods

Method Indications for Use Missing-Data Mechanism Implementation
Complete case analysis Missing in outcome depends on randomization group Ignorable Compute simple statistics for complete cases (participants not missing the outcome).
Complete case analysis with covariate adjustment Missing in outcome depends on randomization group and covariate Ignorable after covariate adjustment Fit the outcome model (as a function of randomization group and covariates) to complete cases with the covariate.
Survival analysis with covariate adjustment Censoring depends on randomization group and covariate Ignorable after covariate adjustment Fit the outcome model (as a function of randomization group and covariates) to survival data.
Analysis via propensity-to-be-missing scores Missing in outcome or censoring depends on randomization group and many covariates Ignorable after covariate adjustment (1) Fit a model for the missing-data mechanism.
(2) Use the fitted model to compute scores.
(3) Compute overall estimate based on quintiles of scores.
Longitudinal dropout analysis Dropout depends on previous observed outcome and possibly randomization group and covariate Ignorable For a continuous longitudinal outcome, fit a marginal model using commercial software.
For a longitudinal binary outcome, fit a conditional model.
Perfect fit analysis Saturated models with categorical data Ignorable or nonignorable (1) Set expected counts equal to observed counts and solve for parameter estimates.
(2) Compute statistic from parameter estimates.
(3) Compute estimated variance using MP transformation.
Composite linear models Flexible models with categorical data Ignorable or nonignorable Fit using specialized software.

2. COMPLETE-CASE ANALYSIS

Consider a randomized trial in which missing in univariate outcome Y depends on randomization group Z. As an example, missing in outcome depends on side effects of the experimental treatment. Complete cases are participants who are not missing the outcome. For this scenario, the ML method is a complete case analysis, an analysis involving only complete cases. Separate derivations involve continuous and binary outcomes.

2.1. Continuous outcomes

Let subscript i index trial participant. Let Yi denote the outcome with realization yi. Let MissYi denote the missing-data indicator, where MissYi = 1 if yi is missing and 0 otherwise. Let {MissY} and {ObsY} denote the set of persons with missing and observed outcomes, respectively. Let Zi denote the randomly assigned group with realization zi. The outcome model, pr(yi|zi;θ), is the distribution of outcome yi given randomization to group zi, which is modeled by parameter set θ. The missing-data mechanism, pr(MissYi=1|zi;β), is the probability of missing outcome Yi given randomization to group zi, which is modeled by parameter set β. By definition, pr(MissYi=0|zi;β) = 1 − pr(MissYi=1|zi;β). An example of this missing-data mechanism is pr(MissYi=1|Zi=0;β) = β0 = 1/2 for participants randomized to group 0, and pr(MissYi=1|Zi=1;β) = β1 = 1/3 for participants randomized to group 1, where β = {β0, β1}. The parameter sets θ and β do not overlap and do not constrain one another.

The likelihood is the product of a factor for participants missing outcome, LMissY, and a factor for participants with observed outcome, LObsY,

LikCC(θ,β) = LMissY × LObsY, where
LMissY = ∏_{i∈{MissY}} pr(MissYi=1|zi;β) × ∫ pr(yi|zi;θ) dyi = ∏_{i∈{MissY}} pr(MissYi=1|zi;β),
LObsY = ∏_{i∈{ObsY}} pr(MissYi=0|zi;β) × pr(yi|zi;θ). (1)

The factor LMissY integrates over the missing continuous outcome. Rewriting the likelihood in equation (1) by defining fCC(β) as a function of parameters involving only β and defining LikCC:Ign(θ) as a function of parameters involving only θ yields

LikCC(θ,β) = fCC(β) × LikCC:Ign(θ), where
fCC(β) = ∏_{i∈{MissY}} pr(MissYi=1|zi;β) × ∏_{i∈{ObsY}} pr(MissYi=0|zi;β),
LikCC:Ign(θ) = ∏_{i∈{ObsY}} pr(yi|zi;θ). (2)

Because fCC(β) factors from the likelihood in equation (2), the missing-data mechanism is ignorable, so ML estimation for θ involves only LikCC:Ign(θ). Moreover, because LikCC:Ign(θ) involves only observed values of outcome, ML estimation for θ involves only complete cases.

2.2. Example 1

A hypothetical trial randomizes participants to dietary supplement or placebo. The outcome is a continuous biomarker. Missing in outcome depends only on randomization group. If the biomarker is normally distributed with a different mean for each randomization group, a simple ML estimate for the effect of treatment on outcome is the difference in mean biomarkers levels between randomization groups among the complete cases.

2.3. Binary outcomes

A similar derivation applies to binary outcomes. Let nzy denote the number of persons randomized to group z =0, 1, with observed outcome y=0, 1. Let wz denote the number of persons randomized to group z =0, 1, with a missing outcome. See Table 3. The outcome model, pr(Y=1|z;θ)=θz, is the probability of outcome 1 given randomization to group z. The missing-data mechanism, pr(MissY=1|z;β)=βz, is the probability of missing outcome y given randomization group z. The likelihood with β = {β0, β1} and θ = {θ0, θ1} is

LikCC(θ,β) = LMissY × LObsY, where
LMissY = ∏_z {βz(1 − θz) + βzθz}^{wz} = ∏_z βz^{wz},
LObsY = ∏_z {(1 − βz)(1 − θz)}^{nz0} × {(1 − βz)θz}^{nz1}. (3)

The factor LMissY sums over the missing binary outcomes. Let “+” in a subscript denote summation over the index in the subscript, so nz+ = nz0 + nz1. Rewriting the likelihood in equation (3) yields

LikCC(θ,β) = fCC(β) × LikCC:Ign(θ), where
fCC(β) = ∏_z βz^{wz} × (1 − βz)^{nz+},
LikCC:Ign(θ) = ∏_z (1 − θz)^{nz0} × θz^{nz1}. (4)

ML estimation for θ comes from LikCC:Ign(θ), which involves only the complete cases {nzy}.

Table 3.

Hypothetical counts for complete-case analysis

Randomization group Outcome
Y=0 Y =1 Missing
Z=0 n00 (400) n01 (600) w0 (200)
Z=1 n10 (200) n11 (600) w1 (400)

2.4. Example 2

A hypothetical trial randomizes participants to dietary supplement or placebo. The outcome is a binary biomarker. Missing in outcome depends only on randomization group. For this scenario, a simple ML estimate of treatment effect is d = θ(EST)1 − θ(EST)0, where θ(EST)z = nz1/nz+. The estimated standard error is se = √v, where v = ∑_z θ(EST)z(1 − θ(EST)z)/nz+. For the hypothetical counts in Table 3, d = 0.150 with standard error 0.022.
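As a check, the following short Python sketch (not from the paper) reproduces this calculation from the counts in Table 3:

```python
# A minimal sketch (not the paper's code): the complete-case ML estimate and
# standard error of Example 2, computed from the counts in Table 3.
import math

n = {(0, 0): 400, (0, 1): 600, (1, 0): 200, (1, 1): 600}            # n_zy

theta = {z: n[(z, 1)] / (n[(z, 0)] + n[(z, 1)]) for z in (0, 1)}    # n_z1 / n_z+
d = theta[1] - theta[0]
v = sum(theta[z] * (1 - theta[z]) / (n[(z, 0)] + n[(z, 1)]) for z in (0, 1))
print(round(d, 3), round(math.sqrt(v), 3))                          # 0.15 and 0.022
```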

3. COMPLETE-CASE ANALYSIS WITH COVARIATE ADJUSTMENT

Consider a randomized trial in which missing in outcome Y depends on randomization group Z and baseline covariate X. If the covariate X is not included in the outcome model, the missing-data mechanism is nonignorable leading to challenging ML estimation. The simple expedient of conditioning on the baseline covariate X in the outcome model yields an ignorable likelihood and simple ML estimation based on complete cases with covariate adjustment. Separate derivations involve continuous and binary outcomes.

3.1. Continuous outcomes

Let Xi with realization xi denote the covariate for person i. The outcome model, pr(yi|zi,xi;θ), is the distribution of outcome yi given randomization to group zi and covariate xi. The missing-data mechanism, pr(MissYi=1|zi,xi;β), is the probability of missing outcome Yi given covariate xi and randomization to group zi. For example, suppose the probability of missing outcome due to a side effect of treatment is highest among participants in randomization group 1 who are age 60 or older at randomization. Let Xi = 0 if age at randomization is less than 60, and Xi = 1 otherwise. An example of this missing-data mechanism is pr(MissYi=1|Zi=0,Xi=0;β) = β00 = 1/5, pr(MissYi=1|Zi=0,Xi=1;β) = β01 = 1/5, pr(MissYi=1|Zi=1,Xi=0;β) = β10 = 1/5, and pr(MissYi=1|Zi=1,Xi=1;β) = β11 = 1/2, where β = {β00, β01, β10, β11}. The parameter sets θ and β do not overlap and do not constrain one another. The likelihood is

LikCCX(θ,β) = LMissY × LObsY, where
LMissY = ∏_{i∈{MissY}} pr(MissYi=1|zi,xi;β) × ∫ pr(yi|zi,xi;θ) dyi = ∏_{i∈{MissY}} pr(MissYi=1|zi,xi;β),
LObsY = ∏_{i∈{ObsY}} pr(MissYi=0|zi,xi;β) × pr(yi|zi,xi;θ). (5)

Rewriting the likelihood in equation (5) by defining fCCX(β) as a function of parameters involving only β and defining LikCCX:Ign(θ) as a function of parameters involving only θ yields

LikCCX(θ,β) = fCCX(β) × LikCCX:Ign(θ), where
fCCX(β) = ∏_{i∈{MissY}} pr(MissYi=1|zi,xi;β) × ∏_{i∈{ObsY}} pr(MissYi=0|zi,xi;β),
LikCCX:Ign(θ) = ∏_{i∈{ObsY}} pr(yi|zi,xi;θ). (6)

Because fCCX(β) factors from the likelihood, the missing-data mechanism is ignorable. Moreover, because LikCCX:Ign(θ) involves only observed values of outcome, ML estimation of θ involves only complete cases with covariates.

If X is partially MCAR, the likelihood based on all the data is indirectly nonignorable, leading to challenging ML estimation. However, the simple expedient of considering only the random subset of the data with the observed covariate X yields a likelihood factor involving only θ, a result related to the formulation of Little et al.5 Moreover, this likelihood factor involves only complete cases with observed values of covariate X. See Appendix A.

3.2. Example 1

A hypothetical trial randomizes participants to dietary supplement or placebo. The outcome is a continuous biomarker. Missing in outcome depends only on randomization group and age. Under this scenario, ML estimation can involve fitting to the complete cases a linear regression for the biomarker as a function of randomization group and age. The estimated treatment effect is the estimated coefficient for randomization group in the linear regression.

3.3. Binary outcomes

Consider a binary outcome and categorical baseline covariate. Suppose that missing in outcome depends only on randomization group and covariate. Let nzxy denote the number of persons randomized to group z = 0, 1 with baseline covariate x = 0, 1 and observed outcome y = 0, 1. Let wzx denote the number of persons randomized to group z = 0, 1 with baseline covariate x = 0, 1 who had a missing outcome. See Table 4. Let pr(Y=1|z,x;θ) = θzx denote the probability of outcome 1 given randomization to group z and covariate x. Let pr(MissY=1|z,x;β) = βzx denote the probability of missing outcome given randomization group z and covariate x. The likelihood with β = {β00, β01, β10, β11} and θ = {θ00, θ01, θ10, θ11} is

LikCCX(θ,β) = LMissY × LObsY, where
LMissY = ∏_{zx} {βzx(1 − θzx) + βzxθzx}^{wzx} = ∏_{zx} βzx^{wzx},
LObsY = ∏_{zx} {(1 − βzx)(1 − θzx)}^{nzx0} × {(1 − βzx)θzx}^{nzx1}. (7)

Rewriting the likelihood in equation (7) yields

LikCCX(θ,β) = fCCX(β) × LikCCX:Ign(θ), where
fCCX(β) = ∏_{zx} βzx^{wzx} × (1 − βzx)^{nzx+},
LikCCX:Ign(θ) = ∏_{zx} (1 − θzx)^{nzx0} × θzx^{nzx1}. (8)

Because LikCCX:Ign(θ) involves only observed values of Y, ML estimation of θ involves only complete cases with covariates.

Table 4.

Hypothetical counts for complete-case analysis with covariate adjustment

Randomization group Covariate Outcome
Y=0 Y =1 Missing
Z=0 X=0 n000 (100) n001 (200) w00 (100)
X=1 n010 (300) n011 (400) w01 (100)
Z=1 X=0 n100 (100) n101 (200) w10 (100)
X=1 n110 (100) n111 (400) w11 (300)

3.4. Example 2

A hypothetical trial randomizes participants to dietary supplement or placebo. The outcome is a binary biomarker. Missing in outcome depends only on randomization group and a categorical covariate. Let πx denote the known probability that the covariate takes value x in a target population. An ML estimate of treatment effect in the target population is d = ∑_x (θ(EST)1x − θ(EST)0x) πx, where θ(EST)zx = nzx1/nzx+. The estimated standard error is se = √v, where v = ∑_{zx} {θ(EST)zx(1 − θ(EST)zx)/nzx+} πx^2. For the counts in Table 4 with πx = 0.5, d = 0.114 with standard error 0.023.
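As a check, the following short Python sketch (not from the paper) reproduces this calculation from the counts in Table 4:

```python
# A minimal sketch (not the paper's code): the covariate-standardized ML estimate
# and standard error of Example 2 in Section 3, computed from the counts in
# Table 4 with pi_x = 0.5.
import math

n = {(0, 0): (100, 200), (0, 1): (300, 400),     # (z, x): (n_zx0, n_zx1)
     (1, 0): (100, 200), (1, 1): (100, 400)}
pi = {0: 0.5, 1: 0.5}                            # covariate distribution in the target population

theta = {zx: n[zx][1] / sum(n[zx]) for zx in n}  # n_zx1 / n_zx+
d = sum((theta[(1, x)] - theta[(0, x)]) * pi[x] for x in (0, 1))
v = sum(theta[zx] * (1 - theta[zx]) / sum(n[zx]) * pi[zx[1]] ** 2 for zx in n)
print(round(d, 3), round(math.sqrt(v), 3))       # 0.114 and 0.023
```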

4. SURVIVAL ANALYSIS WITH COVARIATE ADJUSTMENT

Consider a randomized trial where outcomes are survival times and censoring depends on randomization group Z and baseline covariate X. Let F denote the failure time in the absence of censoring, and let C denote the censoring time in the absence of failure. Censoring at time c implies F occurs at time c or later, and failure at time f implies C occurs after time f. Let pr(Fi=fi|zi,xi;θ) denote the probability of failure (in the absence of censoring) at time fi, given randomization group zi and covariate xi. Let pr(Ci=ci|zi,xi;β) denote the probability of censoring (in the absence of failure) at time ci, given randomization group zi and covariate xi. The parameter sets θ and β do not overlap and do not constrain one another.

If the covariate X is not included in the outcome model, the censoring mechanism is nonignorable. The simple expedient of including X in the outcome model leads to an ignorable censoring mechanism. The likelihood is

LikSurvX(θ,β) = LCens × LFail, where
LCens = ∏_{i∈{Cens}} pr(Ci=ci|zi,xi;β) × ∫_{fi ≥ ci} pr(Fi=fi|zi,xi;θ) dfi = ∏_{i∈{Cens}} pr(Ci=ci|zi,xi;β) × pr(Fi ≥ ci|zi,xi;θ),
LFail = ∏_{i∈{Fail}} ∫_{ci > fi} pr(Ci=ci|zi,xi;β) dci × pr(Fi=fi|zi,xi;θ) = ∏_{i∈{Fail}} pr(Ci > fi|zi,xi;β) × pr(Fi=fi|zi,xi;θ). (9)

The factor LCens integrates over the unobserved failure times. The factor LFail integrates over the unobserved censoring times. Rewriting the likelihood in equation (9) by defining fSurvX(β) as a function of parameters involving only β and defining LikSurvX:Ign(θ) as a function of parameters involving only θ yields

LikSurvX(θ,β) = fSurvX(β) × LikSurvX:Ign(θ), where
fSurvX(β) = ∏_{i∈{Cens}} pr(Ci=ci|zi,xi;β) × ∏_{i∈{Fail}} pr(Ci > fi|zi,xi;β),
LikSurvX:Ign(θ) = ∏_{i∈{Cens}} pr(Fi ≥ ci|zi,xi;θ) × ∏_{i∈{Fail}} pr(Fi=fi|zi,xi;θ). (10)

Because fSurvX(β) factors from the likelihood in equation (10), the censoring mechanism is ignorable, and ML estimation of θ involves only LikSurvX:Ign(θ).

If X is partially MCAR, the likelihood based on all the data is indirectly nonignorable, making ML estimation difficult. However, the simple expedient of considering only the random subset of the data with the observed covariate X leads to a likelihood with the covariate that involves only θ. See Appendix B.

4.1. Example

A hypothetical trial randomizes participants negative on a biomarker to dietary supplement or placebo. The outcome is time until the biomarker is positive. Loss-to-follow-up depends only on randomization group and age. ML estimation can involve fitting a proportional hazards model in which the hazard for failure depends on randomization group and age. The estimated treatment effect is the estimated coefficient for randomization group in the model.

5. PROPENSITY-TO-BE-MISSING SCORES

The method of propensity-to-be-missing scores6 simplifies a complete-case analysis or a survival analysis when adjusting for multiple baseline covariates. It also avoids having to specify a function for incorporating multiple covariates into the outcome model and yields an easily interpretable difference estimate. The method of propensity-to-be-missing scores involves the following three steps.

Step 1. Fit a separate model to the missing-data mechanism in each randomization group. For a univariate outcome with randomization group z, fit a model for the missing-data mechanism, pr(MissYi=1|Zi=z,xi;βz). For a survival outcome with randomization group z, fit a model for the censoring mechanism, pr(Ci=ci|Zi=z,xi;βz). For a proportional hazards model for the censoring mechanism in randomization group z, let c*(z, xi; βz) denote the proportionality component of the model, where the other component is the baseline hazard for censoring. Let β(EST)z denote the estimate of βz.

Step 2. Compute propensity-to-be-missing scores. For a univariate outcome, let scorezi=pr(MissYi=1|zi,xi;β(EST)z). For a survival outcome with a proportional hazards model for censoring, let scorezi =c*(zi, xi; β(EST)z).

Step 3. Compute estimated treatment effect and its standard error based on estimates in each quintile of scores. Divide the set of scores for each randomization group z, {scorezi}, into quintiles. For randomization group z and quintile j, let fzj denote the estimated probability of outcome or the probability of survival to a pre-specified time. Let sezj denote the estimated standard error of fzj. Let Nz denote the number in randomization group z. The estimated treatment effect is the treatment effect averaged over the quintiles,

d = ∑_j (f1j − f0j)/5 = (∑_j f1j − ∑_j f0j)/5. (11)

The estimated standard error of d is se=v, where

v = ∑_z {∑_j sezj^2/25 + ∑_j (fzj − fz5)^2 · 4/(25Nz) − ∑_{j>k} 2(fzj − fz5)(fzk − fz5)/(25Nz)}. (12)

5.1. Example

The AIDS Clinical Trials Group randomized patients to dual therapy (z=0) versus triple therapy (z=1) in two groups of equal size (Nz = 328).7 Let d denote the estimated difference in the probability of survival to 18 months with triple instead of dual therapy. Approximately 20% of subjects were missing outcomes due to refusal to continue the study or loss to follow-up. Two baseline covariates, age and CD4 count, are likely related to both survival and dropout. Following Baker et al.,6 let fzj denote the Kaplan-Meier estimate of the probability of surviving 18 months among participants in quintile j of the scores in randomization group z. Let sezj denote the estimated standard error of fzj. Substituting the values of fzj and sezj from Table 5 into equations (11) and (12) gives d = 0.072 with standard error 0.034.

Table 5.

Estimate and standard errors with propensity-to-be-missing score

Randomization group Z=0 (Dual therapy) Randomization group Z=1 (Triple Therapy)
Quintile Estimate* Standard Error Quintile Estimate* Standard Error
1 0.539 0.041 1 0.584 0.070
2 0.619 0.040 2 0.736 0.062
3 0.592 0.041 3 0.792 0.057
4 0.695 0.038 4 0.658 0.071
5 0.793 0.034 5 0.828 0.054
*

Estimated probability of surviving to 18 months in each quintile.
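As a check, the following short Python sketch (not from the paper) applies equation (11) to the Table 5 estimates; for the standard error it keeps only the leading term ∑_j sezj^2/25 of equation (12), which is already close to the full value here:

```python
# A quick check (not the paper's code) of equation (11) using the Table 5
# quintile estimates; the standard error below keeps only the leading term of
# equation (12), omitting the smaller correction terms.
import math

f0 = [0.539, 0.619, 0.592, 0.695, 0.793]    # dual therapy, quintiles 1-5
se0 = [0.041, 0.040, 0.041, 0.038, 0.034]
f1 = [0.584, 0.736, 0.792, 0.658, 0.828]    # triple therapy, quintiles 1-5
se1 = [0.070, 0.062, 0.057, 0.071, 0.054]

d = (sum(f1) - sum(f0)) / 5                          # equation (11): 0.072
v_lead = sum(s ** 2 for s in se0 + se1) / 25         # leading term of equation (12)
print(round(d, 3), round(math.sqrt(v_lead), 3))      # 0.072 and 0.033
```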

6. LONGITUDINAL DROPOUTS

Consider a randomized trial involving longitudinal outcomes in which dropout depends on previously observed outcomes and possibly randomization group and covariates. For example, participants with an unfavorable outcome at a previous time may be more likely to drop out than those with a favorable outcome at a previous time. ML estimation involving this ignorable missing-data mechanism is discussed separately for continuous and binary outcomes.

6.1. Continuous outcome

Without loss of generality, consider outcomes at three times, denoted Y1, Y2, and Y3, with Y1 always observed. The outcome model, pr(y1i,y2i,y3i|zi,xi;θ), is the joint distribution of outcomes y1i, y2i, and y3i given randomization to group zi with covariate xi. The covariate xi could be a baseline covariate or a covariate that varies over time in a predetermined manner, such as time of observation. The missing-data mechanism pr(MissY2i=1|y1i,zi,xi;β) is the probability of missing outcome Y2i given outcome y1i, randomization to group zi, and covariate xi. The missing-data mechanism pr(MissY3i=1|MissY2i=0,y1i,y2i,zi,xi;β) is the probability of missing outcome Y3i given not missing outcome Y2i, outcomes y1i and y2i, randomization to group zi, and covariate xi. The parameter sets θ and β do not overlap and do not constrain one another. The probabilities of dropout at time 2, dropout at time 3, and no dropout are, respectively,

fdrop2(y1i,zi,xi;β) = pr(MissY2i=1|y1i,zi,xi;β),
fdrop3(y1i,y2i,zi,xi;β) = pr(MissY3i=1|MissY2i=0,y1i,y2i,zi,xi;β) × pr(MissY2i=0|y1i,zi,xi;β),
fnodrop(y1i,y2i,zi,xi;β) = pr(MissY3i=0|MissY2i=0,y1i,y2i,zi,xi;β) × pr(MissY2i=0|y1i,zi,xi;β). (13)

The likelihood is the product of three factors corresponding to the three subsets of participants defined by dropouts, {DropOutTime2}, {DropOutTime3}, and {NoDropOut},

LLD(θ,β) = LDropOutTime2 × LDropOutTime3 × LNoDropOut, where
LDropOutTime2 = ∏_{i∈{DropOutTime2}} fdrop2(y1i,zi,xi;β) × ∫∫ pr(y1i,y2i,y3i|zi,xi;θ) dy3i dy2i,
LDropOutTime3 = ∏_{i∈{DropOutTime3}} fdrop3(y1i,y2i,zi,xi;β) × ∫ pr(y1i,y2i,y3i|zi,xi;θ) dy3i,
LNoDropOut = ∏_{i∈{NoDropOut}} fnodrop(y1i,y2i,zi,xi;β) × pr(y1i,y2i,y3i|zi,xi;θ). (14)

Rewriting the likelihood in equation (14) by defining fLD(β) as a function of parameters involving only β and defining LLD:Ign(θ) as a function of parameters involving only θ yields

LLD(θ,β) = fLD(β) × LLD:Ign(θ), where
fLD(β) = ∏_{i∈{DropOutTime2}} fdrop2(y1i,zi,xi;β) × ∏_{i∈{DropOutTime3}} fdrop3(y1i,y2i,zi,xi;β) × ∏_{i∈{NoDropOut}} fnodrop(y1i,y2i,zi,xi;β),
LLD:Ign(θ) = ∏_{i∈{DropOutTime2}} ∫∫ pr(y1i,y2i,y3i|zi,xi;θ) dy3i dy2i × ∏_{i∈{DropOutTime3}} ∫ pr(y1i,y2i,y3i|zi,xi;θ) dy3i × ∏_{i∈{NoDropOut}} pr(y1i,y2i,y3i|zi,xi;θ). (15)

ML estimation of θ in LLD:Ign(θ) in equation (15) typically involves a marginal outcome model in which outcome at each time is a function of time, treatment, and covariates, but not previous outcomes.

6.2. Example 1

A standard marginal outcome model assumes a multivariate normal distribution with a model for the mean outcome at each time and a structured variance-covariance matrix arising from random effects or temporal correlations.8 Using the commercial software SAS Proc Mixed,9 Allison10 fit a multivariate normal marginal model to continuous longitudinal outcomes with dropout. In Allison's model, the longitudinal outcome was the logarithm of hourly wage and the covariates were sex and year.
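For readers without SAS, the following Python sketch (a minimal analogue, not Allison's analysis) fits a random-intercept linear mixed model by maximum likelihood with statsmodels; under an ignorable dropout mechanism the likelihood-based fit simply uses all observed rows. The simulated data, column names, and effect sizes are purely illustrative:

```python
# A minimal sketch (not Allison's SAS analysis): ML fit of a random-intercept
# linear mixed model to long-format longitudinal data with dropout. Under an
# ignorable (MAR) dropout mechanism, no explicit dropout model is needed.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for i in range(200):
    group = i % 2
    b = rng.normal(0, 1)                      # random intercept for participant i
    for t in range(3):
        y = 1 + 0.5 * t + 0.3 * group * t + b + rng.normal(0, 1)
        rows.append({"id": i, "time": t, "group": group, "y": y})
        if t > 0 and y < 0:                   # dropout depends on the observed outcome (MAR)
            break
df = pd.DataFrame(rows)

fit = smf.mixedlm("y ~ time * group", data=df, groups=df["id"]).fit(reml=False)
print(fit.params)
```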

6.3. Binary outcomes

With longitudinal binary outcomes, a conditional model is often easier to implement than a marginal model. Without loss of generality, consider 3 times. For simplicity of notation, covariates are implicit. Let yj denote the binary outcome at time j. Let n1(y1) denote the number of participants who dropped out after outcome y1. Let n2(y1, y2) denote the number of participants who dropped out after outcomes y1 and y2. Let n3(y1, y2, y3) denote the number of participants with observed outcomes y1, y2, and y3. The conditional model factors the joint distribution of outcomes as pr(y1, y2, y3; θ) = pr(y1; θ) × pr(y2|y1; θ) × pr(y3|y2, y1; θ). With obvious extension of notation from the continuous outcome scenario, the likelihood is

LLD(θ,β) = LDropOutTime2 × LDropOutTime3 × LNoDropOut, where
LDropOutTime2 = ∏_{y1} {fdrop2(y1;β) × pr(y1;θ)}^{n1(y1)},
LDropOutTime3 = ∏_{y1,y2} {fdrop3(y1,y2;β) × pr(y1;θ) × pr(y2|y1;θ)}^{n2(y1,y2)},
LNoDropOut = ∏_{y1,y2,y3} {fnodrop(y1,y2;β) × pr(y1;θ) × pr(y2|y1;θ) × pr(y3|y2,y1;θ)}^{n3(y1,y2,y3)}. (16)

Rewriting the likelihood in equation (16) gives

LLD(θ,β) = fLD(β) × LLD:Ign(θ), where
fLD(β) = ∏_{y1} fdrop2(y1;β)^{n1(y1)} × ∏_{y1,y2} fdrop3(y1,y2;β)^{n2(y1,y2)} × ∏_{y1,y2} fnodrop(y1,y2;β)^{n3(y1,y2,+)},
LLD:Ign(θ) = ∏_{y1} pr(y1;θ)^{n1(y1)+n2(y1,+)+n3(y1,+,+)} × ∏_{y1,y2} pr(y2|y1;θ)^{n2(y1,y2)+n3(y1,y2,+)} × ∏_{y1,y2,y3} pr(y3|y2,y1;θ)^{n3(y1,y2,y3)}. (17)

ML estimation can involve fitting a logit model to the various factors and then combining estimates. A more parsimonious outcome model that conditions only on the previous outcome for times after the first would have a likelihood, with θ = {θ1, θ2},

LLD:Ign*(θ) = ∏_{y1} pr(y1;θ1)^{n1(y1)+n2(y1,+)+n3(y1,+,+)} × ∏_{y1,y2} pr(y2|y1;θ2)^{n2(y1,y2)+n3(y1,y2,+)} × ∏_{y2,y3} pr(y3|y2;θ2)^{n3(+,y2,y3)}. (18)

6.4. Example 2

A false positive (FP) on cancer screening is a positive screening test followed by a negative work-up or biopsy. The goal is to estimate the probability of at least one FP in a program of screens when the number of screens received varies among participants. A convenient simplification uses screen number instead of time as the longitudinal metric, so all missing FP's are dropouts. For example, receiving screens at times 1 and 3 and missing the screen at time 2 corresponds to receiving screens 1 and 2 and then dropping out. Following Baker et al.,11 let outcome Yj denote FP status (0 for no FP or 1 for FP) if screen j were received, which is missing when screen j is missing. Missing a screen likely depends on the FP status of the previous screens and possibly observed covariates, so the missing-data mechanism is ignorable.

Consider the count data in Table 6 from Baker et al.11, which corresponds to ages 50 to 54 at first screen. As will become apparent, for the goal of estimating the probability of at least one FP in a screening program, it is only necessary to consider participants with no FP on the previous screen. One covariate is time interval xj since last screen, with xj = 1 (9–12 months), 2 (13–15 months), or 3 (16–18 months). A second covariate is screen number j. Let my denote the number of participants with outcome y on screen 1. For j>1, let njxy denote the number of participants with outcome y on screen j among participants with outcome Y=0= no FP on screen j–1 and for whom screen j occurred at time interval xj since screen j–1. See Table 6. Based on an extension of equation (18), the likelihood factor involving θ = {θ1, θ2}is

LLD:Ign(θ) = ∏_y pr(Y1=y;θ1)^{my} × ∏_{j>1} ∏_x ∏_y pr(Yj=y|Y(j−1)=0,x,j;θ2)^{njxy}. (19)

ML estimation can include fitting a logit model, pr(Yj=1|Y(j−1)=0,x,j;θ2) = expit(θ20 + θ21 j + θ22 xj). The resulting estimates, θ(EST)21 = 0.23 with standard error 0.15 and θ(EST)22 = 0.017 with standard error 0.15, suggest a more parsimonious model, pr(Yj=1|Y(j−1)=0;θ2) = θ2. Let pr(Y1=1;θ1) = θ1. The ML estimates are θ(EST)1 = m1/m+ and θ(EST)2 = n++1/n+++. The estimated probability of at least one false positive in 5 screens is r = 1 − (1 − θ(EST)1)(1 − θ(EST)2)^4, with standard error se = √v, where v = (∂r/∂θ(EST)1)^2 θ(EST)1(1 − θ(EST)1)/m+ + (∂r/∂θ(EST)2)^2 θ(EST)2(1 − θ(EST)2)/n+++. Based on the counts in Table 6, θ(EST)1 = 0.0179, θ(EST)2 = 0.0069, and r = 0.045 with standard error 0.004.

Table 6.

Counts for false positive status given screen and time since last screen

Screen Time interval since last screen No false positive False positive
1 Not applicable m0 (4509) m1 (82)
2 1 n210 (1662) n211 (7)
2 n220 (1634) n221 (13)
3 n230 (291) n231 (1)
4 n240 (406) n241 (2)
3 1 n310 (1589) n311 (9)
2 n320 (1488) n321 (10)
3 n330 (218) n331 (2)
4 n340 (204) n341 (2)
4 1 n410 (1087) n411 (13)
2 n420 (1467) n421 (10)
3 n430 (193) n431 (2)
4 n440 (48) n441 (0)
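As a check, the following short Python sketch (not from the paper) reproduces the Example 2 estimates from the counts in Table 6 under the parsimonious model:

```python
# A minimal sketch (not the paper's code) reproducing the Example 2 estimates from
# the counts in Table 6 under the parsimonious model pr(Yj=1 | Y(j-1)=0) = theta2.
import math

m0, m1 = 4509, 82                                   # screen 1: no FP, FP
n_noFP = [1662, 1634, 291, 406, 1589, 1488, 218, 204, 1087, 1467, 193, 48]
n_FP = [7, 13, 1, 2, 9, 10, 2, 2, 13, 10, 2, 0]     # screens 2-4, by time interval

theta1 = m1 / (m0 + m1)                              # 0.0179
theta2 = sum(n_FP) / (sum(n_FP) + sum(n_noFP))       # 0.0069
r = 1 - (1 - theta1) * (1 - theta2) ** 4             # P(at least one FP in 5 screens)

dr_dt1 = (1 - theta2) ** 4                           # delta-method derivatives
dr_dt2 = 4 * (1 - theta1) * (1 - theta2) ** 3
v = (dr_dt1 ** 2) * theta1 * (1 - theta1) / (m0 + m1) \
    + (dr_dt2 ** 2) * theta2 * (1 - theta2) / (sum(n_FP) + sum(n_noFP))
print(round(r, 3), round(math.sqrt(v), 3))           # about 0.045 and 0.004
```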

7. PERFECT FIT ANALYSIS

A perfect fit analysis is an algebraic method of ML estimation with partially observed categorical data and a saturated model. In a saturated model the number of independent parameters equals the number of independent cell counts. The advantage of using a saturated model is that it makes as few assumptions as possible. A perfect fit analysis involves the following steps:

  1. Set observed counts equal to expected counts and solve for closed-form parameter estimates.

  2. Compute the statistic of interest from the parameter estimates.

  3. Compute the standard error using the Multinomial-Poisson (MP) transformation.

The MP transformation12 changes a complicated multinomial likelihood into a simpler likelihood for Poisson random variables with the same ML estimates and variances. Let {nu} denote the set of observed counts, indexed by u. Let d denote the statistic from the perfect fit analysis, which is a closed-form function of {nu}. With a saturated model, the MP transformation treats nu as a Poisson random variable with mean and variance equal to nu. Applying the delta method, the estimated variance of d is

varMP(d) = ∑_u (∂d/∂nu)^2 nu, (20)

a quantity easily calculated using symbolic computing. A caveat of the perfect fit analysis is that the parameter estimates are ML estimates only if they lie in the interior of the parameter space.
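As an illustration, the following Python sketch (not from the paper) uses SymPy to evaluate equation (20) symbolically for the complete-case statistic of Example 2 in Section 2.4 with the Table 3 counts; it reproduces d = 0.150 with standard error 0.022:

```python
# A minimal sketch (not the paper's code) of equation (20): the MP-transformation
# variance via symbolic differentiation, applied to the complete-case statistic
# d = n11/n1+ - n01/n0+ with each count treated as a Poisson variable.
import sympy as sp

n00, n01, n10, n11 = sp.symbols("n00 n01 n10 n11", positive=True)
counts = {n00: 400, n01: 600, n10: 200, n11: 600}

d = n11 / (n10 + n11) - n01 / (n00 + n01)                  # statistic as a function of the counts
var_mp = sum(sp.diff(d, nu) ** 2 * nu for nu in counts)    # equation (20)

print(float(d.subs(counts)), float(sp.sqrt(var_mp).subs(counts)))
```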

7.1. Example 1

The Prostate Cancer Prevention Trial randomized participants to placebo (z=0) or finasteride (z=1).13 One outcome of interest was prostate cancer status determined on biopsy (y = 0 = no, y = 1 = yes), which is missing if there is no biopsy. An auxiliary variable is a variable observed after randomization that is related to outcome. Biopsy recommendation based on a test for prostate-specific antigen (a = 0 = no or a = 1 = yes) is an auxiliary variable that is strongly related to the probability of missing the outcome. Incorporating this auxiliary variable into the model improves the adjustment for missing outcomes. Let nzay denote the number of participants in randomization group z with auxiliary variable a and observed outcome y. Let wza denote the number of participants in randomization group z with observed auxiliary variable a and missing outcome. See Table 7.

Table 7.

Counts for Prostate Cancer Prevention Trial

Randomization group Auxiliary variable (biopsy recommendation) Outcome (prostate cancer on biopsy)
Y=0 (No) Y =1 (Yes) Missing
Z=0 (Placebo) A=0 (No) n000 (618) n001 (3675) w00 (3955)
A=1 (Yes) n010 (524) n011 (479) w01 (215)
Z=1 (Finasteride) A=0 (No) n100 (381) n101 (3791) w10 (4169)
A=1 (Yes) n110 (409) n111 (458) w11 (214)

The outcome model is pr(Y=y|z)=θy|z. The auxiliary variable model, pr(A=a|y,z)=λa|zy, is the probability of auxiliary variable a given outcome y and randomization group z. The missing-data mechanism is pr(MissY=1|a,z)=βza. The model is saturated because there are 10 independent parameters (2 for θ1|z, 4 for λ1|zy, and 4 for βza) and 10 independent cell counts (8 for {nzay}and 4 for {wza} minus 2 because nz++ + wz+ is fixed). The likelihood is

Lik(θ,β) = LMissY × LObsY, where
LMissY = ∏_{za} (∑_y θy|z λa|zy βza)^{wza},
LObsY = ∏_{zay} {θy|z λa|zy (1 − βza)}^{nzay}. (21)

The perfect fit analysis yields ML estimates without numerical maximization of the likelihood in equation (21). It involves the following steps.

Step 1. Set observed counts equal to expected counts and solve for closed-form parameter estimates. Let Nz = nz++ + wz+. The relevant equations are

Nz ∑_y θy|z λa|zy βza = wza, (22)
Nz θy|z λa|zy (1 − βza) = nzay. (23)

Summing equation (23) over y and adding the result to equation (22) yields

Nz ∑_y θy|z λa|zy = nza+ + wza. (24)

Substituting equation (24) into equation (22) and solving for βza gives β(EST)za = wza/(nza+ + wza). Substituting β(EST)za for βza in equation (23) and simplifying gives

θy|z λa|zy = (nzay + wza nzay/nza+)/Nz. (25)

Summing both sides of equation (25) over a and solving for θy|z gives

θ(EST)y|z = mzy/mz+, where mzy = nz+y + ∑_a wza nzay/nza+. (26)

Step 2. Compute the statistic of interest from the parameter estimates.

The statistic of interest is d = θ(EST)1|1 − θ(EST)1|0.

Step 3. Compute the standard error using the MP transformation. The estimated standard error is se = √v, where v = varMP(d) = ∑_{zay} (∂d/∂nzay)^2 nzay + ∑_{za} (∂d/∂wza)^2 wza. Based on the counts in Table 7, which come from Baker,14 d = −0.10 with standard error 0.007, indicating that finasteride decreases prostate cancer on biopsy.

Baker15 extended this perfect fit analysis to include lza participants randomized to group z with auxiliary variable a who are missing outcome, yielding θ(EST)y|z=mzy/mz+, where mzy=nz+y+awzanzay/nza+lza. For the Prostate Cancer Prevention Trial, Baker et al.14 implemented a more complicated perfect fit analysis involving biopsy recommendation, biopsy result, and surgery result.

7.2. Example 2

A hypothetical trial randomizes smokers to a behavioral intervention or no intervention to stop smoking. The binary outcome is self-report of smoking cessation. Missing in outcome depends only on the unobserved outcome. See Appendix C for a perfect fit analysis based on Baker and Laird.16 Under this scenario, for the hypothetical data in Table 3, the estimated risk difference between the two randomization groups is d=0.677 with standard error 0.10.

7.3. Example 3

A hypothetical diagnostic testing study involves two samples cross-classifying binary test results: (i) a reference test versus a new test and (ii) a gold standard versus a reference test. The goal is to estimate the sensitivity and specificity of the new test versus the gold standard. The assumptions are an ignorable missing mechanism, the same sensitivity and specificity of both new and reference test (relative to the gold standard) in both samples, and conditional independence of reference and new test results given the gold standard. See Appendix D for a perfect fit analysis based on Baker.17 For the hypothetical data in Table 8, the estimated sensitivity of the new test relative to the gold standard is 0.80 with standard error 0.17. The estimated specificity of the new test relative to the gold standard is 0.60 with standard error 0.10.

Table 8.

Hypothetical counts for diagnostic testing example

Data Set New test
1 Reference test 0 (negative) 1 (positive)
0 (negative) z00 (84) z01 (46)
1 (positive) z10 (26) z11 (44)
Gold standard
2 Reference test 0 (negative) 1 (positive)
0 (negative) w00 (18) w01 (4)
1 (positive) w10 (2) w11 (6)

7.4. Example 4

Some randomized trials involve all-or-none compliance, the switching of treatments immediately at randomization. For all-or-none compliance (or the related all-or-none availability in before-and-after studies), Baker and Lindeman18 and Imbens and Angrist19 (followed by Angrist, Imbens, and Rubin20) independently developed a method, later called latent class instrumental variables,21 that uses potential outcomes with reasonable assumptions to estimate the causal effect of treatment among the latent class of compliers (participants who would receive the assigned treatment in either randomization group). Baker22 used a perfect fit analysis for ML estimation involving a randomized cancer screening trial, discrete-time survival data, all-or-none compliance, and latent class instrumental variables. Baker and Kramer23 formulated a perfect fit analysis for ML estimation involving a randomized trial, a partially observed binary endpoint, all-or-none compliance, and latent class instrumental variables. See Appendix E. For the hypothetical data in Table 9 under the latter scenario, the estimated treatment effect based on the perfect fit analysis is 0.4 with standard error 0.095.

Table 9.

Hypothetical counts for latent class instrumental variables with missing outcome

Randomization group Treatment received Outcome
Y=0 Y =1 Missing
Z=0 T=0 n000 (100) n001 (200) w00 (100)
T=1 n010 (400) n011 (300) w01 (100)
Z=1 T=0 n100 (300) n101 (200) w10 (200)
T=1 n110 (100) n111 (100) w11 (300)

8. COMPOSITE LINEAR MODELS

Composite linear models24 provide a flexible approach to ML estimation with complex missing-data patterns involving categorical data and either ignorable or nonignorable missing-data mechanisms. Let Uobs denote a vector of expected values for observed counts. Let U denote a vector of expected counts if there were no missing data. Let C denote a matrix of 0's and 1's that indicates which unobserved expected counts are summed to yield the observed expected counts. A composite linear model has the form,

Uobs = C U, where U = N exp{∑_k Q(k)}, Q(k) = q(k)(W(k), G(k) H(k)), H(k) = h(k)(Z(k), X(k) ϕ(k)), (27)

W(k), G(k), H(k), Z(k), and X(k) are matrices, h(k)( ) and q(k) ( ) are functions, k indexes model components, and the parameter vector ϕ(k) involves either the outcome model parameters θ or the missing-data mechanism parameters β. The parameter sets θ and β do not overlap and do not constrain one another. See Appendix F for an illustration of the matrices involved with discrete-time hazard models.

Once the matrices and functions are specified, maximization is automatic, beginning with an EM algorithm, which is insensitive to poor starting values, and then switching to a Newton-Raphson algorithm, which converges faster and yields standard errors. Examples include two-phase surveys,25 regression analysis of grouped survival data with a missing covariate,26 and misclassification.27 Software for fitting composite linear models, written in Mathematica,28 is available at https://prevention.cancer.gov/about-dcp/staff-search/stuart-g-baker-scd/composite-linear-models. The user needs to specify the matrices and functions, a task simplified by using a previous example as a template, but nevertheless challenging.

8.1. Example 1

Investigators were interested in the effect of drain, a tube for removing fluid from a wound, on wound infection following surgery. Because of the expense and difficulty of following all patients after hospital discharge to determine wound infection status, investigators implemented the following double sampling design. For a random partial follow-up sample, investigators followed 1,236 patients after surgery until wound infection, hospital release, or the end of the study at 30 days after surgery, whichever occurred first. For a random full follow-up sample, investigators followed 194 patients after surgery until wound infection (either in the hospital or after release from the hospital) or the end of the study at 30 days after surgery, whichever occurred first. Time since surgery involves 4 intervals: 1 (0–4 days), 2 (5–7) days, 3 (8–30 days), and 4 (more than 30 days). See Table 10.

Table 10.

Counts for double sampling study of wound infection

Sample Drain Interval of hospital discharge with no prior infection Interval of infection No infection by day 30 Hospital discharge without follow-up
0–4 days 5–7 days 8–30 days
Partial follow-up sample No None 6 10 7 1
0–4 days -- -- -- -- 180
5–7 days 0 -- -- -- 544
8–30 days 0 0 -- -- 232
Yes None 7 15 11 0
0–4 days -- -- -- -- 8
5–7 days 0 -- -- -- 87
8–30 days 0 0 -- -- 128
Full follow-up sample No None 2 1 1 0
0–4 days 0 0 3 39
5–7 days 0 0 3 89
8–30 days 0 0 0 30
Yes None 3 3 0 0
0–4 days 0 0 0 2
5–7 days 0 0 0 10
8–30 days 0 0 1 7

Baker et al.29 formulated the following model to analyze these data. Let hf|x denote the hazard function for wound infection (in the absence of censoring) in time interval f =1, 2, 3 given drain status x = 0 = no drain or x = 1=drain. The outcome model is

logit(hf|x)=θ0+θ2(iff=2)+θ3(iff=3)+θX(ifx=1). (28)

Let ct|fx denote the hazard function for hospital discharge at the start of time interval t given wound infection in time interval f (for t ≥ f) and drain status x. The missing-data mechanism is

logit(ct|fx)=β0+β2(ift=2)+β3(ift=3)+βX(ifx=1)+βRL(ift=f). (29)

The parameter βRL, where the subscript denotes response-linked, makes the missing-data mechanism directly nonignorable. Using the method of composite linear models, Baker et al.29 estimated βRL as β(EST)RL = −7.12 with standard error 1.44. The estimated effect of drain on wound infection was θ(EST)X = 1.40 with standard error 0.24.

8.2. Example 2

The Muscatine Coronary Risk Factor Study collected data on obesity outcome (yes or no) in girls and boys at three times (initially and 2 and 4 years later).30 Missing outcomes occurred at one or more times, yielding 7 patterns of missing data. Baker31 used composite linear models to fit a marginal outcome model in which the probability of obesity at each time depends on age at that time and gender. The outcome model coupled with a nonignorable missing-data mechanism fit substantially better than the outcome model coupled with an ignorable missing-data mechanism nested within the nonignorable missing-data mechanism. The estimated coefficient for sex in the logistic outcome model was 0.15 with standard error 0.08, indicating higher obesity levels for girls than boys.

9. DISCUSSION

An often-overlooked consideration with missing-data analyses is the need for missing-data adjustments to make sense. One criterion for sensible missing-data adjustment is that the unobserved data exist or could be ascertained. For example, if a biopsy result is missing because an eligible person did not arrive at the clinic, there exists an unobserved biopsy result that could have been ascertained if the person had arrived at the clinic. However, if the biopsy result is missing because of death, there does not exist an unobserved biopsy result that could have been ascertained. A less stringent criterion for sensible missing-data adjustment is that the unobserved result could be observed in a relevant target population where the missingness could be prevented, as might apply if missing in biopsy was due to accidental death and the target population specified no accidental deaths.

An important component of many missing-data analyses is a sensitivity analysis to determine how assumptions about the missing-data mechanism affect estimates of treatment effect in the outcome model. A model-based sensitivity analysis computes the estimated treatment effect under multiple missing-data mechanisms, as when fitting composite linear models. If an outcome model coupled with a nonignorable missing-data mechanism fits substantially better than the same outcome model coupled with a nested ignorable missing-data mechanism, reported estimates should be based on the former model. A parameter-based sensitivity analysis computes the estimated treatment effect when varying a parameter measuring the association between missing outcome and outcome.36 A data-based sensitivity analysis computes the estimated treatment effect when imputing values for the missing outcome.37, 38 A randomization-based sensitivity analysis uses the randomization distribution to bound the estimated treatment effect if missing in outcome depends on an unobserved binary variable.39 When implementing a sensitivity analysis, prior knowledge helps to limit the range of possible values.

In summary, the ML methods discussed here range from the simple to the complex. The simplest methods are complete-case analysis and complete-case analysis adjusted for covariates. Survival analyses adjusted for covariates are easy to implement using standard software. For missing in a univariate or survival outcome that depends on multiple covariates, the propensity-to-be-missing score is preferable and easy to implement. More complicated ML methods are needed for fitting models with longitudinal dropouts. Commercial software is available for continuous outcomes. With binary longitudinal outcomes, a conditional model is easy to fit, but extra work is needed to combine parameter estimates to estimate the quantity of interest. The perfect fit analysis is an underappreciated approach to obtaining closed-form ML estimates and variances for complicated likelihoods involving saturated models fit to categorical data. Some work is needed in the algebraic derivation, but it is generally simpler to implement than iterative numerical fitting. The most complex method discussed is the method of composite linear models, which is a flexible approach involving categorical data. Except for composite linear models, which awaits the development of more user-friendly software, all the above methods can contribute to the toolkit of statisticians for analyzing clinical and prevention studies with missing outcomes.

ACKNOWLEDGEMENTS

This work was supported by the National Institutes of Health.

APPENDIX A

This Appendix discusses ML estimation for a randomized trial in which missing in outcome Y depends on randomization group Z and baseline covariate X in which the baseline covariate is MCAR among some participants. Four subsets of participants defined by the pattern of missing data are missing both Y and X, {MissY:MissX}; missing Y with observed X, {MissY:ObsX}; observed Y with missing X,{ObsY:MissX}; and observed Y and observed X, {ObsY:ObsX}. Let β = (β1, β2). Let pr(xi; λ) denote the distribution of xi, which is modeled by parameter set λ. Let MissX denote the missing-data indicator for X. The probability of missing X is constant, as denoted by pr(MissXi = 1; β2). The likelihood is

LikCCX = LMissY:MissX × LMissY:ObsX × LObsY:MissX × LObsY:ObsX, where
LMissY:MissX = ∏_{i∈{MissY:MissX}} ∫∫ pr(MissYi=1|MissXi=1,zi,xi;β1) × pr(MissXi=1;β2) × pr(yi|xi,zi;θ) × pr(xi;λ) dxi dyi,
LMissY:ObsX = ∏_{i∈{MissY:ObsX}} ∫ pr(MissYi=1|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(yi|xi,zi;θ) × pr(xi;λ) dyi,
LObsY:MissX = ∏_{i∈{ObsY:MissX}} ∫ pr(MissYi=0|MissXi=1,zi,xi;β1) × pr(MissXi=1;β2) × pr(yi|xi,zi;θ) × pr(xi;λ) dxi,
LObsY:ObsX = ∏_{i∈{ObsY:ObsX}} pr(MissYi=0|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(yi|xi,zi;θ) × pr(xi;λ). (A.1)

The likelihood in equation (A.1) is indirectly nonignorable. There is no factor of the likelihood involving θ without β because β is linked to λ in LMissY:MissX and λ is linked to θ in LObsY:MissX. To simplify ML estimation, a simple expedient is to consider the likelihood for the random sample of participants with observed values of the covariate,

LikCCX:ObsX = LMissY:ObsX × LObsY:ObsX = fCCX(β,λ) × LikCCX:ObsX:Ign(θ), where
fCCX(β,λ) = ∏_{i∈{MissY:ObsX}} pr(MissYi=1|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(xi;λ) × ∏_{i∈{ObsY:ObsX}} pr(MissYi=0|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(xi;λ),
LikCCX:ObsX:Ign(θ) = ∏_{i∈{ObsY:ObsX}} pr(yi|xi,zi;θ). (A.2)

ML estimation for θ in equation (A.2) involves only LikCCX:ObsX:Ign(θ), which translates into complete case analysis adjusted for observed values of covariate X.

APPENDIX B

This Appendix discusses ML estimation in a randomized trial with survival times in which the probability of censoring depends on randomization group Z and a partially observed baseline covariate X. Four subsets of participants defined by the pattern of missing data are censored with missing X,{CensMissX}, censored with observed X, {CensObsX}, failure with missing X, {FailMissX}, and failure with observed X, {FailObsX}. Let β = (β1, β2). The probability of missing X is constant, as denoted by pr(MissXi=1;β2). The missing data patterns give rise to the following likelihood,

LikSurvX(θ,β) = LCens:MissX × LCens:ObsX × LFail:MissX × LFail:ObsX, where
LCens:MissX = ∏_{i∈{CensMissX}} ∫∫_{fi ≥ ci} pr(Ci=ci|MissXi=1,zi,xi;β1) × pr(MissXi=1;β2) × pr(Fi=fi|zi,xi;θ) × pr(xi;λ) dxi dfi,
LCens:ObsX = ∏_{i∈{CensObsX}} ∫_{fi ≥ ci} pr(Ci=ci|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(Fi=fi|zi,xi;θ) × pr(xi;λ) dfi,
LFail:MissX = ∏_{i∈{FailMissX}} ∫∫_{ci > fi} pr(Ci=ci|MissXi=1,zi,xi;β1) × pr(MissXi=1;β2) × pr(Fi=fi|zi,xi;θ) × pr(xi;λ) dxi dci,
LFail:ObsX = ∏_{i∈{FailObsX}} ∫_{ci > fi} pr(Ci=ci|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(Fi=fi|zi,xi;θ) × pr(xi;λ) dci. (B.1)

The likelihood in equation (B.1) is indirectly nonignorable. There is no factor of the likelihood involving θ without β because β is linked to λ in LCens:MissX and λ is linked to θ in LFail:MissX. To simplify ML estimation, a simple expedient is to consider the likelihood for the random sample of participants with observed values of covariate,

LikSurvX:ObsX(θ,β) = LCens:ObsX × LFail:ObsX = fSurvX(β,λ) × LikSurvX:ObsX:Ign(θ), where
fSurvX(β,λ) = ∏_{i∈{CensObsX}} pr(Ci=ci|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(xi;λ) × ∏_{i∈{FailObsX}} ∫_{ci > fi} pr(Ci=ci|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(xi;λ) dci,
LikSurvX:ObsX:Ign(θ) = ∏_{i∈{CensObsX}} pr(Fi ≥ ci|zi,xi;θ) × ∏_{i∈{FailObsX}} pr(Fi=fi|zi,xi;θ). (B.2)

ML estimation of θ in equation (B.2) involves only LikSurvX:ObsX:Ign(θ), which translates into a survival analysis for the random sample of cases with observed values of covariate X.

APPENDIX C

This Appendix derives the perfect fit estimates for a randomized trial with a binary outcome Y in which missing in outcome depends on the outcome Y but not on the randomization group Z. Let nzy denote the number of participants randomized to group z = 0, 1 with outcome y = 0, 1. Let wz denote the number of participants randomized to group z = 0, 1 who are missing the outcome. See Table 3. The outcome model, pr(Y=y|z;θ) = θy|z, is the probability of outcome y given randomization to group z. The directly nonignorable missing-data mechanism, pr(MissY=1|y;β) = βy, is the probability of missing the outcome given outcome y. The model is saturated because there are 4 independent parameters (2 for θ1|z and 2 for βy) and 4 independent cell counts (4 for {nzy} and 2 for {wz} minus 2 because nz+ + wz is fixed). The perfect fit analysis follows.

Step 1. Set expected counts equal to observed counts and solve for parameter estimates. Let ϕy = βy/(1 − βy) and μzy = Nz θy|z(1 − βy), where Nz = nz+ + wz. The relevant equations are

μzy = nzy, (C.1)
∑_y μzy ϕy = wz. (C.2)

Simultaneously solving equations (C.1) and (C.2) yields

ϕ(EST)0 = (n11 w0 − n01 w1)/(n11 n00 − n01 n10), (C.3)
ϕ(EST)1 = (n00 w1 − n10 w0)/(n11 n00 − n01 n10). (C.4)

If ϕ(EST)y ≥ 0, ϕ(EST)y is the ML estimate. If ϕ(EST)0 or ϕ(EST)1 is negative, the ML estimates are on the boundary of the parameter space.

Step 2. Compute the statistic of interest. The estimated risk difference is d = p1 − p0, where pz = nz1(1 + ϕ(EST)1)/∑_y nzy(1 + ϕ(EST)y).

Step 3. Compute the standard error using the MP transformation. The standard error is se = √v, where v = varMP(d).

APPENDIX D

This Appendix presents a perfect fit analysis for the diagnostic testing data in Table 8. Data set 1 involves {zij}, the number of persons with reference test result i = 0, 1 and new test result j = 0, 1. Data set 2 involves {wik}, the number of persons with reference test result i = 0, 1 and gold standard result k = 0, 1. The model assumes independence of the test results given the gold standard result and an ignorable missing-data mechanism. Let ψi|k denote the probability of reference test result i given gold standard result k. Let θj|k denote the probability of new test result j given gold standard result k. Let πk denote the probability of gold standard result k in data set 1 and let ρk denote the probability of gold standard result k in data set 2. The outcome model is saturated with 6 independent parameters (2 for ψ1|k, 2 for θ1|k, 1 for πk, and 1 for ρk) and 6 independent cell counts (3 for {zij} and 3 for {wik}).

Step 1. Set observed counts equal to expected counts and solve for closed-form parameter estimates. The relevant equations ignore the missing-data mechanism,

z++ ∑_k ψi|k θj|k πk = zij, (D.1)
w++ ψi|k ρk = wik. (D.2)

Summing both sides of equation (D.2) over i and solving for ρk gives ρ(EST)k=w+k/w++. Substituting ρ(EST)k into equation (D.2) and solving for ψi|k gives ψ(EST)i|k=wik/w+k. Substituting ψ(EST)i|k into equation (D.1) gives

z++ ∑_k (wik/w+k) θj|k πk = zij. (D.3)

Rewriting equation (D.3) as separate equations for i = 0 and i = 1 gives

(w00/w+0)(θj|0 π0) + (w01/w+1)(θj|1 π1) = z0j/z++, (D.4)
(w10/w+0)(θj|0 π0) + (w11/w+1)(θj|1 π1) = z1j/z++. (D.5)

Simultaneously solving equations (D.4) and (D.5) gives

θj|0 π0 = w+0(w00 z1j − w10 z0j)/{(w00 w11 − w10 w01) z++}, (D.6)
θj|1 π1 = w+1(w01 z1j − w11 z0j)/{(w00 w11 − w10 w01) z++}. (D.7)

Summing both sides of equations (D.6) and (D.7) over j and solving for π0 and π1 yields

π(EST)0 = w+0(w00 z1+ − w10 z0+)/{(w00 w11 − w10 w01) z++}, (D.8)
π(EST)1 = w+1(w01 z1+ − w11 z0+)/{(w00 w11 − w10 w01) z++}. (D.9)

Substituting π(EST)0 into equation (D.6) and π(EST)1 into equation (D.7) and solving for θj|0 and θj|1 yields

θ(EST)j|0 = (w00 z1j − w10 z0j)/(w00 z1+ − w10 z0+), (D.10)
θ(EST)j|1 = (w11 z0j − w01 z1j)/(w11 z0+ − w01 z1+). (D.11)

Step 2. Compute the statistic of interest. Specificity is θ(EST)0|0. Sensitivity is θ(EST)1|1.

Step 3. Compute the standard error using the MP transformation. The standard error is se = √v, where v = varMP(d) and d = θ(EST)0|0 or θ(EST)1|1.

APPENDIX E

This Appendix derives the perfect fit analysis for estimating the treatment effect in a randomized trial with all-or-none compliance and a binary outcome in which missing in outcome depends on randomization group and latent compliance class. Let nzby denote the number of participants randomized to treatment z who receive treatment b immediately after randomization and experience outcome y. Let wzb denote the number of persons randomized to treatment z who receive treatment b immediately after randomization and are missing the outcome. See Table 9.

Let s index latent classes defined by the potential outcomes of treatment received. Under the monotonicity assumption, s takes three possible values: A = always-takers, who would receive the new treatment regardless of the randomization group to which they might be assigned; N = never-takers, who would receive the old treatment regardless of the randomization group to which they might be assigned; and C = compliers, who would receive the assigned treatment in either randomization group.

The outcome model, pr(Y=y|z,s;θ) = θy|zs, is the probability of outcome y given randomization group z and latent class s. The missing-data mechanism, pr(MissY=1|z,s;β) = βzs, is the probability of missing outcome given randomization group z and latent class s. Let pr(S=s) = πs denote the probability of being in latent class s. Under the compound exclusion restriction assumption, the probabilities of outcome and of missing in outcome do not depend on randomization group for always-takers and never-takers, namely θy|zs = θy|s and βzs = βs for s = A and N. The model is saturated because there are 10 independent parameters (θ1|A, θ1|0C, θ1|1C, θ1|N, βA, β0C, β1C, βN, πA, and πC) and 10 independent cell counts (8 for {nzby} and 4 for {wzb} minus 2 because nz++ + wz+ is fixed). The perfect fit analysis follows.

Step 1. Set expected counts equal to observed counts and solve for parameter estimates.

Let Nz = nz++ + wz+. The relevant equations, based on the definitions of the latent classes A, C, and N, are

N0{θy|N(1 − βN)πN + θy|0C(1 − β0C)πC} = n00y, (E.1)
N0 θy|A(1 − βA)πA = n01y, (E.2)
N1 θy|N(1 − βN)πN = n10y, (E.3)
N1{θy|1C(1 − β1C)πC + θy|A(1 − βA)πA} = n11y, (E.4)
N0(βNπN + β0CπC) = w00, (E.5)
N0 βAπA = w01, (E.6)
N1 βNπN = w10, (E.7)
N1(β1CπC + βAπA) = w11. (E.8)

Summing equation (E.2) over y and adding to equation (E.6) yields

$N_0\pi_A = n_{01+} + w_{01}$. (E.9)

Summing equation (E.4) over y and adding to equation (E.8) yields

$N_1(\pi_C + \pi_A) = n_{11+} + w_{11}$. (E.10)

Subtracting equation (E.9) divided by $N_0$ from equation (E.10) divided by $N_1$ and solving for $\pi_C$ gives

$\pi^{(EST)}_C = p_1 - p_0$, where $p_1 = (n_{11+} + w_{11})/N_1$ and $p_0 = (n_{01+} + w_{01})/N_0$. (E.11)

Subtracting equation (E.6) divided by $N_0$ from equation (E.8) divided by $N_1$ and solving gives

$\beta^{(EST)}_{1C}\,\pi^{(EST)}_C = q_{11} - q_{01}$, where $q_{11} = w_{11}/N_1$ and $q_{01} = w_{01}/N_0$. (E.12)

Subtracting equation (E.7) divided by $N_1$ from equation (E.5) divided by $N_0$ and solving gives

$\beta^{(EST)}_{0C}\,\pi^{(EST)}_C = q_{00} - q_{10}$, where $q_{00} = w_{00}/N_0$ and $q_{10} = w_{10}/N_1$. (E.13)

Subtracting equation (E.2) divided by $N_0$ from equation (E.4) divided by $N_1$ and solving for $\theta_{y|1C}$ based on equations (E.11) and (E.12) gives

$\theta^{(EST)}_{y|1C} = (n_{11y}/N_1 - n_{01y}/N_0)/[\{1-\beta^{(EST)}_{1C}\}\pi^{(EST)}_C] = (n_{11y}/N_1 - n_{01y}/N_0)/\{(p_1 - p_0) - (q_{11} - q_{01})\}$. (E.14)

Subtracting equation (E.3) divided by $N_1$ from equation (E.1) divided by $N_0$ and solving for $\theta_{y|0C}$ based on equations (E.11) and (E.13) gives

$\theta^{(EST)}_{y|0C} = (n_{00y}/N_0 - n_{10y}/N_1)/[\{1-\beta^{(EST)}_{0C}\}\pi^{(EST)}_C] = (n_{00y}/N_0 - n_{10y}/N_1)/\{(p_1 - p_0) - (q_{00} - q_{10})\}$. (E.15)

Step 2. Compute the statistic of interest. The perfect fit ML estimate of the treatment effect in compliers is $d = \theta^{(EST)}_{1|1C} - \theta^{(EST)}_{1|0C}$.

Step 3. Compute the standard error using the MP transformation. The standard error is $se = \sqrt{v}$, where $v = \mathrm{var}_{MP}(d)$.
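As with Appendix D, the arithmetic of Steps 1 and 2 can be checked mechanically. The Python sketch below (an added illustration, not part of the original appendix) evaluates equations (E.11)-(E.15) and the complier treatment effect for hypothetical counts; the counts were constructed from assumed parameter values so that every estimate stays inside the parameter space, and the MP-transformation variance is again omitted.

# Perfect fit estimate of the complier treatment effect, equations (E.11)-(E.15).
# Counts n[z, b, y] and w[z, b] are hypothetical (generated from an assumed model).
n = {(0, 0, 0): 459, (0, 0, 1): 231,
     (0, 1, 0):  90, (0, 1, 1):  90,
     (1, 0, 0): 144, (1, 0, 1):  96,
     (1, 1, 0): 260, (1, 1, 1): 345}
w = {(0, 0): 110, (0, 1): 20, (1, 0): 60, (1, 1): 95}   # missing outcome

# N_z = n_{z++} + w_{z+}
N = {z: sum(n[z, b, y] for b in (0, 1) for y in (0, 1)) + w[z, 0] + w[z, 1] for z in (0, 1)}

# Equation (E.11)
p1 = (n[1, 1, 0] + n[1, 1, 1] + w[1, 1]) / N[1]
p0 = (n[0, 1, 0] + n[0, 1, 1] + w[0, 1]) / N[0]
pi_C = p1 - p0

# Equations (E.12) and (E.13)
q11, q01 = w[1, 1] / N[1], w[0, 1] / N[0]
q00, q10 = w[0, 0] / N[0], w[1, 0] / N[1]

# Equations (E.14) and (E.15), evaluated at y = 1
theta1_1C = (n[1, 1, 1] / N[1] - n[0, 1, 1] / N[0]) / ((p1 - p0) - (q11 - q01))
theta1_0C = (n[0, 0, 1] / N[0] - n[1, 0, 1] / N[1]) / ((p1 - p0) - (q00 - q10))

# Step 2: treatment effect in compliers
d = theta1_1C - theta1_0C
print(pi_C, theta1_1C, theta1_0C, d)   # 0.5, 0.6, 0.3, 0.3 with these hypothetical counts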

APPENDIX F

This Appendix presents some of the matrix components in a composite linear model for discrete-time survival. Let hf|x(θ) denote the hazard for failure (in the absence of censoring) at time f = 1, 2 for covariate x = 0, 1. Let ct|x(β) denote the hazard for censoring (in the absence of failure) at time t = 1 for covariate x = 0, 1, where censoring in an interval implies failure is not observed in the interval. Consider two simple models: logit{hf|x(θ)} = θf0 + θf1x and logit{ct|x(β)} = βt0 + βt1x. Let N denote the sample size. Let uFtx denote the expected number of persons who fail in interval t with covariate at level x. Let uCtx denote the expected number of persons censored in interval t with covariate at level x. The 8×1 column vector of expected counts with no missing data is U8×1 = (uF10, uF20, uC10, uC20, uF11, uF21, uC11, uC21)T, where, for covariate x,

uF1x = N h1|x(θ), (F.1)
uF2x = N{1 − h1|x(θ)} × h2|x(θ) × {1 − c1|x(β)}, (F.2)
uC1x = N{1 − h1|x(θ)} × c1|x(β), (F.3)
uC2x = N{1 − h1|x(θ)} × {1 − h2|x(θ)} × {1 − c1|x(β)}. (F.4)
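As a small numerical check (not part of the original appendix), the following Python sketch evaluates equations (F.1)-(F.4) at assumed parameter values; the values of θ, β, and N are arbitrary illustrations.

import numpy as np

def expit(u):                      # inverse logit
    return 1.0 / (1.0 + np.exp(-u))

N = 1000.0                                     # sample size (assumed)
theta = {(1, 0): -1.0, (1, 1): 0.5,            # theta[f, 0] = intercept, theta[f, 1] = slope (assumed)
         (2, 0): -0.5, (2, 1): 0.5}
beta = {(1, 0): -2.0, (1, 1): 0.3}             # censoring model at t = 1 (assumed)

for x in (0, 1):
    h1 = expit(theta[1, 0] + theta[1, 1] * x)  # failure hazard, interval 1
    h2 = expit(theta[2, 0] + theta[2, 1] * x)  # failure hazard, interval 2
    c1 = expit(beta[1, 0] + beta[1, 1] * x)    # censoring hazard, interval 1
    uF1 = N * h1                               # (F.1)
    uF2 = N * (1 - h1) * h2 * (1 - c1)         # (F.2)
    uC1 = N * (1 - h1) * c1                    # (F.3)
    uC2 = N * (1 - h1) * (1 - h2) * (1 - c1)   # (F.4)
    print(x, uF1, uF2, uC1, uC2, uF1 + uF2 + uC1 + uC2)   # the four counts sum to N within each x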

In matrix notation for composite linear models, the expected counts with no missing data are

U = N exp{Σk Q(k)}, where Q(k) = q(k)(W(k), G(k)H(k)) and

H(k) = h(k)(Z(k), X(k)θ(k)) for k = 1, the outcome model,
H(k) = h(k)(Z(k), X(k)β(k)) for k = 2, the missing-data mechanism.

Outcome model.

The H-component for the outcome model expresses in matrix form log{ht|x(θ)} = (θt0 + θt1x) − log{1 + exp(θt0 + θt1x)} and log{1 − ht|x(θ)} = −log{1 + exp(θt0 + θt1x)}. The H-component is H8×1(1) = log(h1|0, 1 − h1|0, h2|0, 1 − h2|0, h1|1, 1 − h1|1, h2|1, 1 − h2|1)T. In matrix notation, H8×1(1) = Z8×1(1) ° (X8×4(1) θ4×1) − log{1 + exp(X8×4(1) θ4×1)}, where Z8×1(1) = (1, 0, 1, 0, 1, 0, 1, 0)T, X8×4(1) = ((X4×2*, 04×2), (X4×2*, X4×2*)), X4×2* = ((1, 0), (1, 0), (0, 1), (0, 1)), 04×2 is a 4 × 2 matrix of 0’s, θ4×1 = (θ10, θ20, θ11, θ21)T, and the symbol “°” denotes element-by-element multiplication instead of matrix multiplication. The top half of X8×4(1) corresponds to x = 0 and the bottom half to x = 1.

The Q-component of the outcome model is Q8×1(1) = log(h1|0, (1–h1|0) h2|0, (1–h1|0), (1–h1|0) (1–h2|0), h1|1, (1–h1|1) h2|1, (1–h1|1), (1–h1|1) (1–h2|1))T. In matrix notation, Q8×1(1) = W8×1 (1) + G8×8(1) H8×1 (1), where W8×1 (1) = 08×1, G8×8(1) = ((G4×4*, 04×4), (04×4, G4×4*)), and G4×4 *= ((1, 0, 0, 0), (0, 1, 1, 0), (0, 1, 0, 0), (0, 1, 0, 1)). The top half of G8×8(1) corresponds to x=0 and the bottom half to x=1.

Missing-data mechanism.

The H-component for the censoring model is H4×1(2) = log(c1|0, 1 − c1|0, c1|1, 1 − c1|1)T. In matrix notation, H4×1(2) = Z4×1(2) ° (X4×2(2) β2×1) − log{1 + exp(X4×2(2) β2×1)}, where Z4×1(2) = (1, 0, 1, 0)T, X4×2(2) = ((1, 0), (1, 0), (1, 1), (1, 1)), and β2×1 = (β10, β11)T.

The Q-component for the censoring model is Q8×1 (2) = log(1, (1–c1|0), c1|0, (1–c1|0), 1, (1–c1|1), c1|1, (1–c1|1))T. In matrix notation, Q8×1 (2) = W8×1 (2) + G8×4 (2) H4×1 (2), where W8×1 (2) = 08×1 and G8×4 (2) = ((G4×2**, 04×2), (04×2, G4×2**)), where G4×2 **= ((0, 0), (0, 1),(1, 0), (0, 1)). The top half of G8×4 (2) corresponds to x=0 and the bottom half to x=1.
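The matrix specification above can be checked numerically. The Python sketch below (added for illustration, using the same arbitrary parameter values assumed after equations (F.1)-(F.4)) assembles Z(k), X(k), and G(k), computes H(k) and Q(k), and recovers the expected counts U = N exp{Q(1) + Q(2)}, which should agree with a direct evaluation of equations (F.1)-(F.4).

import numpy as np

N = 1000.0
theta = np.array([-1.0, -0.5, 0.5, 0.5])   # (theta10, theta20, theta11, theta21), assumed values
beta = np.array([-2.0, 0.3])               # (beta10, beta11), assumed values

# Outcome model (k = 1)
Z1 = np.array([1., 0., 1., 0., 1., 0., 1., 0.])
Xstar = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
X1 = np.block([[Xstar, np.zeros((4, 2))], [Xstar, Xstar]])                # 8 x 4
Gstar = np.array([[1., 0., 0., 0.], [0., 1., 1., 0.], [0., 1., 0., 0.], [0., 1., 0., 1.]])
G1 = np.block([[Gstar, np.zeros((4, 4))], [np.zeros((4, 4)), Gstar]])     # 8 x 8
H1 = Z1 * (X1 @ theta) - np.log(1.0 + np.exp(X1 @ theta))
Q1 = G1 @ H1                                                              # W(1) = 0, so Q(1) = G(1)H(1)

# Missing-data (censoring) mechanism (k = 2)
Z2 = np.array([1., 0., 1., 0.])
X2 = np.array([[1., 0.], [1., 0.], [1., 1.], [1., 1.]])
G2star = np.array([[0., 0.], [0., 1.], [1., 0.], [0., 1.]])
G2 = np.block([[G2star, np.zeros((4, 2))], [np.zeros((4, 2)), G2star]])   # 8 x 4
H2 = Z2 * (X2 @ beta) - np.log(1.0 + np.exp(X2 @ beta))
Q2 = G2 @ H2                                                              # W(2) = 0, so Q(2) = G(2)H(2)

U = N * np.exp(Q1 + Q2)   # (uF10, uF20, uC10, uC20, uF11, uF21, uC11, uC21)
print(U)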

Footnotes

DATA AVAILABILITY STATEMENT

The data used in the analyses are available in the tables of the paper.

