Published in final edited form as: Stat Med. 2019 Aug 8;38(22):4453–4474. doi: 10.1002/sim.8319

Maximum likelihood estimation with missing outcomes: From simplicity to complexity

Stuart G Baker 1
PMCID: PMC6879193  NIHMSID: NIHMS1038690  PMID: 31392751

Abstract

Many clinical or prevention studies involve missing or censored outcomes. Maximum likelihood (ML) methods provide a conceptually straightforward approach to estimation when the outcome is partially missing. Methods of implementing ML methods range from the simple to the complex, depending on the type of data and the missing-data mechanism. Simple ML methods for ignorable missing-data mechanisms (when data are missing at random) include complete-case analysis, complete-case analysis with covariate adjustment, survival analysis with covariate adjustment, and analysis via propensity-to-be-missing scores. More complex ML methods for ignorable missing-data mechanisms include the analysis of longitudinal dropouts via a marginal model for continuous data or a conditional model for categorical data. A moderately complex ML method for categorical data with a saturated model and either ignorable or nonignorable missing-data mechanisms is a perfect fit analysis, an algebraic method involving closed-form estimates and variances. A complex and flexible ML method with categorical data and either ignorable or nonignorable missing-data mechanisms is the method of composite linear models, a matrix method requiring specialized software. Except for the method of composite linear models, which can involve challenging matrix specifications, the implementation of these ML methods ranges in difficulty from easy to moderate.

Keywords: composite linear model, double sampling, latent class instrumental variable, missing-data mechanism, perfect fit analysis, randomized trial

1. INTRODUCTION

In many clinical or prevention studies the outcome is missing or censored. Maximum likelihood (ML) methods are a conceptually simple approach for estimation in this setting. The landmark 1976 paper by Rubin1 made several key innovations for ML estimation with missing data: a missing-data indicator as a random variable, a comprehensive likelihood framework, and the concept of ignorable and nonignorable missing-data mechanisms. Wu and Carroll,2 Heitjan and Rubin,3 and Little and Rubin,4 extended this approach to censoring mechanisms.

The basic set-up follows. The goal of the analysis is to estimate parameters in an outcome model, a model for the effect of treatment or covariates on outcome. Coupled with the outcome model is a missing-data mechanism, a model for the probability that the outcome is missing or censored. The sets of parameters for the outcome model and the missing-data mechanism do not overlap and do not constrain each other.

In the context of likelihood-based inference, an ignorable missing-data mechanism is a missing-data mechanism whose parameters factor from the likelihood and hence do not contribute to likelihood-based inference for the outcome model. Rubin1 showed that an ignorable missing-data mechanism depends only on completely observed variables, in which case the data are said to be Missing at Random (MAR). A special case of MAR is Missing Completely at Random (MCAR), corresponding to a constant probability the data are missing.

A nonignorable missing-data mechanism is simply a missing-data mechanism that is not ignorable. This tutorial introduces the terminology of directly and indirectly nonignorable missing-data mechanisms. A directly nonignorable missing-data mechanism is a nonignorable missing-data mechanism in which the probability of missing a variable depends on that variable and possibly on other variables. An indirectly nonignorable missing-data mechanism is a missing-data mechanism in which the probability of missing a variable does not depend on that variable but depends on at least one other variable that is partially missing. Table 1 summarizes this missing-data taxonomy in the context of missing outcomes.

Table 1.

Missing-data taxonomy applied to missing outcomes

Missing-data mechanism: Ignorable | Non-ignorable
Definition: Likelihood-based inference for the outcome model does not involve parameters modeling the missing-data mechanism* | Not ignorable
Implication: Missing in outcome depends only on observed variables | Missing in outcome depends on at least one unobserved variable
Outcome is said to be: Missing at random (MAR) | Missing not at random (MNAR)
Special cases: Missing completely at random (MCAR), in which missing in outcome occurs with constant probability | Directly non-ignorable, in which missing in outcome depends only on outcome; Indirectly non-ignorable, in which missing in outcome does not depend on outcome but depends on other partially missing variables
*

The sets of parameters for the outcome model and missing-data mechanism do not overlap and do not constrain one another.

Implementation of ML methods with missing outcomes can range from simple computations to complex modeling with specialized software. Because ML methods are often tailored to specific missing-data scenarios and there are numerous missing-data scenarios, it is not possible to cover all ML methods here. Table 2 lists the ML methods discussed in this tutorial.

Table 2.

Overview of ML estimation methods

Method Indications for Use Missing-Data Mechanism Implementation
Complete case analysis Missing in outcome depends on randomization group Ignorable Compute simple statistics for complete cases (participants not missing the outcome).
Complete case analysis with covariate adjustment Missing in outcome depends on randomization group and covariate Ignorable after covariate adjustment Fit the outcome model (as a function of randomization group and covariates) to complete cases with the covariate.
Survival analysis with covariate adjustment Censoring depends on randomization group and covariate Ignorable after covariate adjustment Fit the outcome model (as a function of randomization group and covariates) to survival data.
Analysis via propensity-to-be-missing scores Missing in outcome or censoring depends on randomization group and many covariates Ignorable after covariate adjustment (1) Fit a model for the missing-data mechanism.
(2) Use the fitted model to compute scores.
(3) Compute overall estimate based on quintiles of scores.
Longitudinal dropout analysis Dropout depends on previous observed outcome and possibly randomization group and covariate Ignorable For a continuous longitudinal outcome, fit a marginal model using commercial software.
For a longitudinal binary outcome, fit a conditional model.
Perfect fit analysis Saturated models with categorical data Ignorable or nonignorable (1) Set expected counts equal to observed counts and solve for parameter estimates.
(2) Compute statistic from parameter estimates.
(3) Compute estimated variance using MP transformation.
Composite linear models Flexible models with categorical data Ignorable or nonignorable Fit using specialized software.

2. COMPLETE-CASE ANALYSIS

Consider a randomized trial in which missing in univariate outcome Y depends on randomization group Z. As an example, missing in outcome depends on side effects of the experimental treatment. Complete cases are participants who are not missing the outcome. For this scenario, the ML method is a complete case analysis, an analysis involving only complete cases. Separate derivations involve continuous and binary outcomes.

2.1. Continuous outcomes

Let subscript i index trial participant. Let Yi denote the outcome with realization yi. Let MissYi denote the missing-data indicator, where MissYi = 1 if yi is missing and 0 otherwise. Let {MissY} and {ObsY} denote the set of persons with missing and observed outcomes, respectively. Let Zi denote the randomly assigned group with realization zi. The outcome model, pr(yi|zi;θ), is the distribution of outcome yi given randomization to group zi, which is modeled by parameter set θ. The missing-data mechanism, pr(MissYi=1|zi;β), is the probability of missing outcome Yi given randomization to group zi, which is modeled by parameter set β. By definition, pr(MissYi=0|zi;β) = 1 − pr(MissYi=1|zi;β). An example of this missing-data mechanism is pr(MissYi=1|Zi=0;β) = β0 = 1/2 for participants randomized to group 0, and pr(MissYi=1|Zi=1;β) = β1 = 1/3 for participants randomized to group 1, where β = {β0, β1}. The parameter sets θ and β do not overlap and do not constrain one another.

The likelihood is the product of a factor for participants missing outcome, LMissY, and a factor for participants with observed outcome, LObsY,

LikCC(θ,β) = LMissY × LObsY, where
LMissY = ∏_{i∈{MissY}} pr(MissYi=1|zi;β) × ∫ pr(yi|zi;θ) dyi = ∏_{i∈{MissY}} pr(MissYi=1|zi;β),
LObsY = ∏_{i∈{ObsY}} pr(MissYi=0|zi;β) × pr(yi|zi;θ). (1)

The factor LMissY integrates over the missing continuous outcome. Rewriting the likelihood in equation (1) by defining fCC(β) as a function of parameters involving only β and defining LikCC:Ign(θ) as a function of parameters involving only θ yields

LikCC(θ,β) = fCC(β) × LikCC:Ign(θ), where
fCC(β) = ∏_{i∈{MissY}} pr(MissYi=1|zi;β) × ∏_{i∈{ObsY}} pr(MissYi=0|zi;β),
LikCC:Ign(θ) = ∏_{i∈{ObsY}} pr(yi|zi;θ). (2)

Because fCC(β) factors from the likelihood in equation (2), the missing-data mechanism is ignorable, so ML estimation for θ involves only LikCC:Ign(θ). Moreover, because LikCC:Ign(θ) involves only observed values of outcome, ML estimation for θ involves only complete cases.

2.2. Example 1

A hypothetical trial randomizes participants to dietary supplement or placebo. The outcome is a continuous biomarker. Missing in outcome depends only on randomization group. If the biomarker is normally distributed with a different mean for each randomization group, a simple ML estimate for the effect of treatment on outcome is the difference in mean biomarkers levels between randomization groups among the complete cases.

2.3. Binary outcomes

A similar derivation applies to binary outcomes. Let nzy denote the number of persons randomized to group z =0, 1, with observed outcome y=0, 1. Let wz denote the number of persons randomized to group z =0, 1, with a missing outcome. See Table 3. The outcome model, pr(Y=1|z;θ)=θz, is the probability of outcome 1 given randomization to group z. The missing-data mechanism, pr(MissY=1|z;β)=βz, is the probability of missing outcome y given randomization group z. The likelihood with β = {β0, β1} and θ = {θ0, θ1} is

LikCC(θ,β) = LMissY × LObsY, where
LMissY = ∏_z {βz(1 − θz) + βzθz}^{wz} = ∏_z βz^{wz},
LObsY = ∏_z {(1 − βz)(1 − θz)}^{nz0} × {(1 − βz)θz}^{nz1}. (3)

The factor LMissY sums over the missing binary outcomes. Let “+” in a subscript denote summation over the index in the subscript, so nz+ = nz0 + nz1. Rewriting the likelihood in equation (3) yields

LikCC(θ,β) = fCC(β) × LikCC:Ign(θ), where
fCC(β) = ∏_z βz^{wz} × (1 − βz)^{nz+},
LikCC:Ign(θ) = ∏_z (1 − θz)^{nz0} × θz^{nz1}. (4)

ML estimation for θ comes from LikCC:Ign(θ), which involves only the complete cases {nzy}.

Table 3.

Hypothetical counts for complete-case analysis

Randomization group Outcome
Y=0 Y =1 Missing
Z=0 n00 (400) n01 (600) w0 (200)
Z=1 n10 (200) n11 (600) w1 (400)

2.4. Example 2

A hypothetical trial randomizes participants to dietary supplement or placebo. The outcome is a binary biomarker. Missing in outcome depends only on randomization group. For this scenario, a simple ML estimate of treatment effect is d = θ(EST)1 − θ(EST)0, where θ(EST)z = nz1/nz+. The estimated standard error is se = √v, where v = ∑_z θ(EST)z(1 − θ(EST)z)/nz+. For the hypothetical counts in Table 3, d = 0.150 with standard error 0.022.
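As a check, the following short Python sketch (not from the paper) reproduces this calculation from the counts in Table 3:

```python
# A minimal sketch (not the paper's code): the complete-case ML estimate and
# standard error of Example 2, computed from the counts in Table 3.
import math

n = {(0, 0): 400, (0, 1): 600, (1, 0): 200, (1, 1): 600}            # n_zy

theta = {z: n[(z, 1)] / (n[(z, 0)] + n[(z, 1)]) for z in (0, 1)}    # n_z1 / n_z+
d = theta[1] - theta[0]
v = sum(theta[z] * (1 - theta[z]) / (n[(z, 0)] + n[(z, 1)]) for z in (0, 1))
print(round(d, 3), round(math.sqrt(v), 3))                          # 0.15 and 0.022
```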

3. COMPLETE-CASE ANALYSIS WITH COVARIATE ADJUSTMENT

Consider a randomized trial in which missing in outcome Y depends on randomization group Z and baseline covariate X. If the covariate X is not included in the outcome model, the missing-data mechanism is nonignorable leading to challenging ML estimation. The simple expedient of conditioning on the baseline covariate X in the outcome model yields an ignorable likelihood and simple ML estimation based on complete cases with covariate adjustment. Separate derivations involve continuous and binary outcomes.

3.1. Continuous outcomes

Let Xi with realization xi denote the covariate for person i. The outcome model, pr(yi|zi,xi;θ), is the distribution of outcome yi given randomization to group zi and covariate xi. The missing-data mechanism, pr(MissYi=1|zi,xi;β), is the probability of missing outcome Yi given covariate xi and randomization to group zi. For example, suppose the probability of missing outcome due to a side effect of treatment is highest among participants in randomization group 1 who are age 60 or older at randomization. Let Xi = 0 if age at randomization is less than 60, and Xi = 1 otherwise. An example of this missing-data mechanism is pr(MissYi=1|Zi=0,Xi=0;β) = β00 = 1/5, pr(MissYi=1|Zi=0,Xi=1;β) = β01 = 1/5, pr(MissYi=1|Zi=1,Xi=0;β) = β10 = 1/5, and pr(MissYi=1|Zi=1,Xi=1;β) = β11 = 1/2, where β = {β00, β01, β10, β11}. The parameter sets θ and β do not overlap and do not constrain one another. The likelihood is

LikCCX(θ,β) = LMissY × LObsY, where
LMissY = ∏_{i∈{MissY}} pr(MissYi=1|zi,xi;β) × ∫ pr(yi|zi,xi;θ) dyi = ∏_{i∈{MissY}} pr(MissYi=1|zi,xi;β),
LObsY = ∏_{i∈{ObsY}} pr(MissYi=0|zi,xi;β) × pr(yi|zi,xi;θ). (5)

Rewriting the likelihood in equation (5) by defining fCCX(β) as a function of parameters involving only β and defining LikCCX:Ign(θ) as a function of parameters involving only θ yields

LikCCX(θ,β) = fCCX(β) × LikCCX:Ign(θ), where
fCCX(β) = ∏_{i∈{MissY}} pr(MissYi=1|zi,xi;β) × ∏_{i∈{ObsY}} pr(MissYi=0|zi,xi;β),
LikCCX:Ign(θ) = ∏_{i∈{ObsY}} pr(yi|zi,xi;θ). (6)

Because fCCX(β) factors from the likelihood, the missing-data mechanism is ignorable. Moreover, because LikCCX:Ign(θ) involves only observed values of outcome, ML estimation of θ involves only complete cases with covariates.

If X is partially MCAR, the likelihood based on all the data is indirectly nonignorable, leading to challenging ML estimation. However, the simple expedient of considering only the random subset of the data with the observed covariate X yields a likelihood factor involving only θ, a result related to the formulation of Little et al.5 Moreover, this likelihood factor involves only complete cases with observed values of covariate X. See Appendix A.

3.2. Example 1

A hypothetical trial randomizes participants to dietary supplement or placebo. The outcome is a continuous biomarker. Missing in outcome depends only on randomization group and age. Under this scenario, ML estimation can involve fitting to the complete cases a linear regression for the biomarker as a function of randomization group and age. The estimated treatment effect is the estimated coefficient for randomization group in the linear regression.

3.3. Binary outcomes

Consider a binary outcome and categorical baseline covariate. Suppose that missing in outcome depends only on randomization group and covariate. Let nzxy denote the number of persons randomized to group z = 0, 1 with baseline covariate x = 0, 1 and observed outcome y = 0, 1. Let wzx denote the number of persons randomized to group z = 0, 1 with baseline covariate x = 0, 1 who had a missing outcome. See Table 4. Let pr(Y=1|z,x;θ) = θzx denote the probability of outcome 1 given randomization to group z and covariate x. Let pr(MissY=1|z,x;β) = βzx denote the probability of missing outcome given randomization group z and covariate x. The likelihood with β = {β00, β01, β10, β11} and θ = {θ00, θ01, θ10, θ11} is

LikCCX(θ,β) = LMissY × LObsY, where
LMissY = ∏_{zx} {βzx(1 − θzx) + βzxθzx}^{wzx} = ∏_{zx} βzx^{wzx},
LObsY = ∏_{zx} {(1 − βzx)(1 − θzx)}^{nzx0} × {(1 − βzx)θzx}^{nzx1}. (7)

Rewriting the likelihood in equation (7) yields

LikCCX(θ,β) = fCCX(β) × LikCCX:Ign(θ), where
fCCX(β) = ∏_{zx} βzx^{wzx} × (1 − βzx)^{nzx+},
LikCCX:Ign(θ) = ∏_{zx} (1 − θzx)^{nzx0} × θzx^{nzx1}. (8)

Because LikCCX:Ign(θ) involves only observed values of Y, ML estimation of θ involves only complete cases with covariates.

Table 4.

Hypothetical counts for complete-case analysis with covariate adjustment

Randomization group Covariate Outcome
Y=0 Y =1 Missing
Z=0 X=0 n000 (100) n001 (200) w00 (100)
X=1 n010 (300) n011 (400) w01 (100)
Z=1 X=0 n100 (100) n101 (200) w10 (100)
X=1 n110 (100) n111 (400) w11 (300)

3.4. Example 2

A hypothetical trial randomizes participants to dietary supplement or placebo. The outcome is a binary biomarker. Missing in outcome depends only on randomization group and a categorical covariate. Let πx denote the known probability that the covariate takes value x in a target population. An ML estimate of treatment effect in the target population is d = ∑_x (θ(EST)1x − θ(EST)0x) πx, where θ(EST)zx = nzx1/nzx+. The estimated standard error is se = √v, where v = ∑_{zx} {θ(EST)zx(1 − θ(EST)zx)/nzx+} πx^2. For the counts in Table 4 with πx = 0.5, d = 0.114 with standard error 0.023.
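As a check, the following short Python sketch (not from the paper) reproduces this calculation from the counts in Table 4:

```python
# A minimal sketch (not the paper's code): the covariate-standardized ML estimate
# and standard error of Example 2 in Section 3, computed from the counts in
# Table 4 with pi_x = 0.5.
import math

n = {(0, 0): (100, 200), (0, 1): (300, 400),     # (z, x): (n_zx0, n_zx1)
     (1, 0): (100, 200), (1, 1): (100, 400)}
pi = {0: 0.5, 1: 0.5}                            # covariate distribution in the target population

theta = {zx: n[zx][1] / sum(n[zx]) for zx in n}  # n_zx1 / n_zx+
d = sum((theta[(1, x)] - theta[(0, x)]) * pi[x] for x in (0, 1))
v = sum(theta[zx] * (1 - theta[zx]) / sum(n[zx]) * pi[zx[1]] ** 2 for zx in n)
print(round(d, 3), round(math.sqrt(v), 3))       # 0.114 and 0.023
```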

4. SURVIVAL ANALYSIS WITH COVARIATE ADJUSTMENT

Consider a randomized trial where outcomes are survival times and censoring depends on randomization group Z and baseline covariate X. Let F denote the failure time in the absence of censoring, and let C denote the censoring time in the absence of failure. Censoring at time c implies F occurs at time c or later, and failure at time f implies C occurs after time f. Let pr(Fi=fi|zi,xi;θ) denote the probability of failure (in the absence of censoring) at time fi, given randomization group zi and covariate xi. Let pr(Ci=ci|zi,xi;β) denote the probability of censoring (in the absence of failure) at time ci, given randomization group zi and covariate xi. The parameter sets θ and β do not overlap and do not constrain one another.

If the covariate X is not included in the outcome model, the censoring mechanism is nonignorable. The simple expedient of including X in the outcome model leads to an ignorable censoring mechanism. The likelihood is

LikSurvX(θ,β) = LCens × LFail, where
LCens = ∏_{i∈{Cens}} pr(Ci=ci|zi,xi;β) × ∫_{fi ≥ ci} pr(Fi=fi|zi,xi;θ) dfi = ∏_{i∈{Cens}} pr(Ci=ci|zi,xi;β) × pr(Fi ≥ ci|zi,xi;θ),
LFail = ∏_{i∈{Fail}} ∫_{ci > fi} pr(Ci=ci|zi,xi;β) dci × pr(Fi=fi|zi,xi;θ) = ∏_{i∈{Fail}} pr(Ci > fi|zi,xi;β) × pr(Fi=fi|zi,xi;θ). (9)

The factor LCens integrates over the unobserved failure times. The factor LFail integrates over the unobserved censoring times. Rewriting the likelihood in equation (9) by defining fSurvX(β) as a function of parameters involving only β and defining LikSurvX:Ign(θ) as a function of parameters involving only θ yields

LikSurvX(θ,β) = fSurvX(β) × LikSurvX:Ign(θ), where
fSurvX(β) = ∏_{i∈{Cens}} pr(Ci=ci|zi,xi;β) × ∏_{i∈{Fail}} pr(Ci > fi|zi,xi;β),
LikSurvX:Ign(θ) = ∏_{i∈{Cens}} pr(Fi ≥ ci|zi,xi;θ) × ∏_{i∈{Fail}} pr(Fi=fi|zi,xi;θ). (10)

Because fSurvX(β) factors from the likelihood in equation (10), the censoring mechanism is ignorable, and ML estimation of θ involves only LikSurvX:Ign(θ).

If X is partially MCAR, the likelihood based on all the data is indirectly nonignorable, making ML estimation difficult. However, the simple expedient of considering only the random subset of the data with the observed covariate X leads to a likelihood with the covariate that involves only θ. See Appendix B.

4.1. Example

A hypothetical trial randomizes participants negative on a biomarker to dietary supplement or placebo. The outcome is time until the biomarker is positive. Loss-to-follow-up depends only on randomization group and age. ML estimation can involve fitting a proportional hazards model in which the hazard for failure depends on randomization group and age. The estimated treatment effect is the estimated coefficient for randomization group in the model.

5. PROPENSITY-TO-BE-MISSING SCORES

The method of propensity-to-be-missing scores6 simplifies a complete-case analysis or a survival analysis when adjusting for multiple baseline covariates. It also avoids having to specify a function for incorporating multiple covariates into the outcome model and yields an easily interpretable difference estimate. The method of propensity-to-be-missing scores involves the following three steps.

Step 1. Fit a separate model to the missing-data mechanism in each randomization group. For a univariate outcome with randomization group z, fit a model for the missing-data mechanism, pr(MissYi=1|Zi=z,xi;βz). For a survival outcome with randomization group z, fit a model for the censoring mechanism, pr(Ci=ci|Zi=z,xi;βz). For a proportional hazards model for the censoring mechanism in randomization group z, let c*(z, xi; βz) denote the proportionality component of the model, where the other component is the baseline hazard for censoring. Let β(EST)z denote the estimate of βz.

Step 2. Compute propensity-to-be-missing scores. For a univariate outcome, let scorezi=pr(MissYi=1|zi,xi;β(EST)z). For a survival outcome with a proportional hazards model for censoring, let scorezi =c*(zi, xi; β(EST)z).

Step 3. Compute estimated treatment effect and its standard error based on estimates in each quintile of scores. Divide the set of scores for each randomization group z, {scorezi}, into quintiles. For randomization group z and quintile j, let fzj denote the estimated probability of outcome or the probability of survival to a pre-specified time. Let sezj denote the estimated standard error of fzj. Let Nz denote the number in randomization group z. The estimated treatment effect is the treatment effect averaged over the quintiles,

d = ∑_j (f1j − f0j)/5 = (∑_j f1j − ∑_j f0j)/5. (11)

The estimated standard error of d is se=v, where

v = ∑_z {∑_j sezj^2/25 + ∑_j (fzj − fz5)^2 · 4/(25Nz) − ∑_{j>k} 2(fzj − fz5)(fzk − fz5)/(25Nz)}. (12)

5.1. Example

The AIDS Clinical Trials Group randomized patients to dual therapy (z=0) versus triple therapy (z=1) in two groups of equal size (Nz = 328).7 Let d denote the estimated difference in the probability of survival to 18 months with triple instead of dual therapy. Approximately 20% of subjects were missing outcomes due to refusal to continue the study or loss to follow-up. Two baseline covariates, age and CD4 count, are likely related to both survival and dropout. Following Baker et al.,6 let fzj denote the Kaplan-Meier estimate of the probability of surviving 18 months among participants in quintile j of the scores in randomization group z. Let sezj denote the estimated standard error of fzj. Substituting the values of fzj and sezj from Table 5 into equations (11) and (12) gives d = 0.072 with standard error 0.034.

Table 5.

Estimate and standard errors with propensity-to-be-missing score

Randomization group Z=0 (Dual therapy) Randomization group Z=1 (Triple Therapy)
Quintile Estimate* Standard Error Quintile Estimate* Standard Error
1 0.539 0.041 1 0.584 0.070
2 0.619 0.040 2 0.736 0.062
3 0.592 0.041 3 0.792 0.057
4 0.695 0.038 4 0.658 0.071
5 0.793 0.034 5 0.828 0.054
*

Estimated probability of surviving to 18 months in each quintile.
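As a check, the following short Python sketch (not from the paper) applies equation (11) to the Table 5 estimates; for the standard error it keeps only the leading term ∑_j sezj^2/25 of equation (12), which is already close to the full value here:

```python
# A quick check (not the paper's code) of equation (11) using the Table 5
# quintile estimates; the standard error below keeps only the leading term of
# equation (12), omitting the smaller correction terms.
import math

f0 = [0.539, 0.619, 0.592, 0.695, 0.793]    # dual therapy, quintiles 1-5
se0 = [0.041, 0.040, 0.041, 0.038, 0.034]
f1 = [0.584, 0.736, 0.792, 0.658, 0.828]    # triple therapy, quintiles 1-5
se1 = [0.070, 0.062, 0.057, 0.071, 0.054]

d = (sum(f1) - sum(f0)) / 5                          # equation (11): 0.072
v_lead = sum(s ** 2 for s in se0 + se1) / 25         # leading term of equation (12)
print(round(d, 3), round(math.sqrt(v_lead), 3))      # 0.072 and 0.033
```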

6. LONGITUDINAL DROPOUTS

Consider a randomized trial involving longitudinal outcomes in which dropout depends on previously observed outcomes and possibly randomization group and covariates. For example, participants with an unfavorable outcome at a previous time may be more likely to drop out than those with a favorable outcome at a previous time. ML estimation involving this ignorable missing-data mechanism is discussed separately for continuous and binary outcomes.

6.1. Continuous outcome

Without loss of generality, consider outcomes at three times, denoted Y1, Y2, and Y3, with Y1 always observed. The outcome model, pr(y1i,y2i,y3i|zi,xi;θ), is the joint distribution of outcomes y1i, y2i, and y3i given randomization to group zi with covariate xi. The covariate xi could be a baseline covariate or a covariate that varies over time in a predetermined manner, such as time of observation. The missing-data mechanism pr(MissY2i=1|y1i,zi,xi;β) is the probability of missing outcome Y2i given outcome y1i, randomization to group zi, and covariate xi. The missing-data mechanism pr(MissY3i=1|MissY2i=0,y1i,y2i,zi,xi;β) is the probability of missing outcome Y3i given not missing outcome Y2i, outcomes y1i and y2i, randomization to group zi, and covariate xi. The parameter sets θ and β do not overlap and do not constrain one another. The probabilities of dropout at time 2, dropout at time 3, and no dropout are, respectively,

fdrop2(y1i,zi,xi;β) = pr(MissY2i=1|y1i,zi,xi;β),
fdrop3(y1i,y2i,zi,xi;β) = pr(MissY3i=1|MissY2i=0,y1i,y2i,zi,xi;β) × pr(MissY2i=0|y1i,zi,xi;β),
fnodrop(y1i,y2i,zi,xi;β) = pr(MissY3i=0|MissY2i=0,y1i,y2i,zi,xi;β) × pr(MissY2i=0|y1i,zi,xi;β). (13)

The likelihood is the product of three factors corresponding to the three subsets of participants defined by dropouts, {DropOutTime2}, {DropOutTime3}, and {NoDropOut},

LLD(θ,β) = LDropOutTime2 × LDropOutTime3 × LNoDropOut, where
LDropOutTime2 = ∏_{i∈{DropOutTime2}} fdrop2(y1i,zi,xi;β) × ∫∫ pr(y1i,y2i,y3i|zi,xi;θ) dy3i dy2i,
LDropOutTime3 = ∏_{i∈{DropOutTime3}} fdrop3(y1i,y2i,zi,xi;β) × ∫ pr(y1i,y2i,y3i|zi,xi;θ) dy3i,
LNoDropOut = ∏_{i∈{NoDropOut}} fnodrop(y1i,y2i,zi,xi;β) × pr(y1i,y2i,y3i|zi,xi;θ). (14)

Rewriting the likelihood in equation (14) by defining fLD(β) as a function of parameters involving only β and defining LLD:Ign(θ) as a function of parameters involving only θ yields

LLD(θ,β) = fLD(β) × LLD:Ign(θ), where
fLD(β) = ∏_{i∈{DropOutTime2}} fdrop2(y1i,zi,xi;β) × ∏_{i∈{DropOutTime3}} fdrop3(y1i,y2i,zi,xi;β) × ∏_{i∈{NoDropOut}} fnodrop(y1i,y2i,zi,xi;β),
LLD:Ign(θ) = ∏_{i∈{DropOutTime2}} ∫∫ pr(y1i,y2i,y3i|zi,xi;θ) dy3i dy2i × ∏_{i∈{DropOutTime3}} ∫ pr(y1i,y2i,y3i|zi,xi;θ) dy3i × ∏_{i∈{NoDropOut}} pr(y1i,y2i,y3i|zi,xi;θ). (15)

ML estimation of θ in LLD:Ign(θ) in equation (15) typically involves a marginal outcome model in which outcome at each time is a function of time, treatment, and covariates, but not previous outcomes.

6.2. Example 1

A standard marginal outcome model assumes a multivariate normal distribution with a model for the mean outcome at each time and a structured variance-covariance matrix arising from random effects or temporal correlations.8 Using the commercial software SAS Proc Mixed,9 Allison10 fit a multivariate normal marginal model to continuous longitudinal outcomes with dropout. In Allison's model, the longitudinal outcome was the logarithm of hourly wage and the covariates were sex and year.
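For readers without SAS, the following Python sketch (a minimal analogue, not Allison's analysis) fits a random-intercept linear mixed model by maximum likelihood with statsmodels; under an ignorable dropout mechanism the likelihood-based fit simply uses all observed rows. The simulated data, column names, and effect sizes are purely illustrative:

```python
# A minimal sketch (not Allison's SAS analysis): ML fit of a random-intercept
# linear mixed model to long-format longitudinal data with dropout. Under an
# ignorable (MAR) dropout mechanism, no explicit dropout model is needed.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for i in range(200):
    group = i % 2
    b = rng.normal(0, 1)                      # random intercept for participant i
    for t in range(3):
        y = 1 + 0.5 * t + 0.3 * group * t + b + rng.normal(0, 1)
        rows.append({"id": i, "time": t, "group": group, "y": y})
        if t > 0 and y < 0:                   # dropout depends on the observed outcome (MAR)
            break
df = pd.DataFrame(rows)

fit = smf.mixedlm("y ~ time * group", data=df, groups=df["id"]).fit(reml=False)
print(fit.params)
```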

6.3. Binary outcomes

With longitudinal binary outcomes, a conditional model is often easier to implement than a marginal model. Without loss of generality, consider 3 times. For simplicity of notation, covariates are implicit. Let yj denote the binary outcome at time j. Let n1(y1) denote the number of participants who dropped out after outcome y1. Let n2(y1, y2) denote the number of participants who dropped out after outcomes y1 and y2. Let n3(y1, y2, y3) denote the number of participants with observed outcomes y1, y2, and y3. The conditional model factors the joint distribution of outcomes as pr(y1, y2, y3; θ) = pr(y1; θ) × pr(y2|y1; θ) × pr(y3|y2, y1; θ). With obvious extension of notation from the continuous outcome scenario, the likelihood is

LLD(θ,β) = LDropOutTime2 × LDropOutTime3 × LNoDropOut, where
LDropOutTime2 = ∏_{y1} {fdrop2(y1;β) × pr(y1;θ)}^{n1(y1)},
LDropOutTime3 = ∏_{y1,y2} {fdrop3(y1,y2;β) × pr(y1;θ) × pr(y2|y1;θ)}^{n2(y1,y2)},
LNoDropOut = ∏_{y1,y2,y3} {fnodrop(y1,y2;β) × pr(y1;θ) × pr(y2|y1;θ) × pr(y3|y2,y1;θ)}^{n3(y1,y2,y3)}. (16)

Rewriting the likelihood in equation (16) gives

LLD(θ,β) = fLD(β) × LLD:Ign(θ), where
fLD(β) = ∏_{y1} fdrop2(y1;β)^{n1(y1)} × ∏_{y1,y2} fdrop3(y1,y2;β)^{n2(y1,y2)} × ∏_{y1,y2} fnodrop(y1,y2;β)^{n3(y1,y2,+)},
LLD:Ign(θ) = ∏_{y1} pr(y1;θ)^{n1(y1)+n2(y1,+)+n3(y1,+,+)} × ∏_{y1,y2} pr(y2|y1;θ)^{n2(y1,y2)+n3(y1,y2,+)} × ∏_{y1,y2,y3} pr(y3|y2,y1;θ)^{n3(y1,y2,y3)}. (17)

ML estimation can involve fitting a logit model to the various factors and then combining estimates. A more parsimonious outcome model that conditions only on the previous outcome for times after the first would have a likelihood, with θ = {θ1, θ2},

LLD:Ign*(θ) = ∏_{y1} pr(y1;θ1)^{n1(y1)+n2(y1,+)+n3(y1,+,+)} × ∏_{y1,y2} pr(y2|y1;θ2)^{n2(y1,y2)+n3(y1,y2,+)} × ∏_{y2,y3} pr(y3|y2;θ2)^{n3(+,y2,y3)}. (18)

6.4. Example 2

A false positive (FP) on cancer screening is a positive screening test followed by a negative work-up or biopsy. The goal is to estimate the probability of at least one FP in a program of screens when the number of screens received varies among participants. A convenient simplification uses screen number instead of time as the longitudinal metric, so all missing FP's are dropouts. For example, receiving screens at times 1 and 3 and missing the screen at time 2 corresponds to receiving screens 1 and 2 and then dropping out. Following Baker et al.,11 let outcome Yj denote FP status (0 for no FP or 1 for FP) if screen j were received, which is missing when screen j is missing. Missing a screen likely depends on the FP status of the previous screens and possibly observed covariates, so the missing-data mechanism is ignorable.

Consider the count data in Table 6 from Baker et al.11, which corresponds to ages 50 to 54 at first screen. As will become apparent, for the goal of estimating the probability of at least one FP in a screening program, it is only necessary to consider participants with no FP on the previous screen. One covariate is time interval xj since last screen, with xj = 1 (9–12 months), 2 (13–15 months), or 3 (16–18 months). A second covariate is screen number j. Let my denote the number of participants with outcome y on screen 1. For j>1, let njxy denote the number of participants with outcome y on screen j among participants with outcome Y=0= no FP on screen j–1 and for whom screen j occurred at time interval xj since screen j–1. See Table 6. Based on an extension of equation (18), the likelihood factor involving θ = {θ1, θ2}is

LLD:Ign(θ) = ∏_y pr(Y1=y;θ1)^{my} × ∏_{j>1} ∏_x ∏_y pr(Yj=y|Y(j−1)=0,x,j;θ2)^{njxy}. (19)

ML estimation can include fitting a logit model, pr(Yj=1|Y(j−1)=0,x,j;θ2) = expit(θ20 + θ21 j + θ22 xj). The resulting estimates, θ(EST)21 = 0.23 with standard error 0.15 and θ(EST)22 = 0.017 with standard error 0.15, suggest a more parsimonious model, pr(Yj=1|Y(j−1)=0;θ2) = θ2. Let pr(Y1=1;θ1) = θ1. The ML estimates are θ(EST)1 = m1/m+ and θ(EST)2 = n++1/n+++. The estimated probability of at least one false positive in 5 screens is r = 1 − (1 − θ(EST)1)(1 − θ(EST)2)^4, with standard error se = √v, where v = (∂r/∂θ(EST)1)^2 θ(EST)1(1 − θ(EST)1)/m+ + (∂r/∂θ(EST)2)^2 θ(EST)2(1 − θ(EST)2)/n+++. Based on the counts in Table 6, θ(EST)1 = 0.0179, θ(EST)2 = 0.0069, and r = 0.045 with standard error 0.004.

Table 6.

Counts for false positive status given screen and time since last screen

Screen Time interval since last screen No false positive False positive
1 Not applicable m0 (4509) m1 (82)
2 1 n210 (1662) n211 (7)
2 n220 (1634) n221 (13)
3 n230 (291) n231 (1)
4 n240 (406) n241 (2)
3 1 n310 (1589) n311 (9)
2 n320 (1488) n321 (10)
3 n330 (218) n331 (2)
4 n340 (204) n341 (2)
4 1 n410 (1087) n411 (13)
2 n420 (1467) n421 (10)
3 n430 (193) n431 (2)
4 n440 (48) n441 (0)
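As a check, the following short Python sketch (not from the paper) reproduces the Example 2 estimates from the counts in Table 6 under the parsimonious model:

```python
# A minimal sketch (not the paper's code) reproducing the Example 2 estimates from
# the counts in Table 6 under the parsimonious model pr(Yj=1 | Y(j-1)=0) = theta2.
import math

m0, m1 = 4509, 82                                   # screen 1: no FP, FP
n_noFP = [1662, 1634, 291, 406, 1589, 1488, 218, 204, 1087, 1467, 193, 48]
n_FP = [7, 13, 1, 2, 9, 10, 2, 2, 13, 10, 2, 0]     # screens 2-4, by time interval

theta1 = m1 / (m0 + m1)                              # 0.0179
theta2 = sum(n_FP) / (sum(n_FP) + sum(n_noFP))       # 0.0069
r = 1 - (1 - theta1) * (1 - theta2) ** 4             # P(at least one FP in 5 screens)

dr_dt1 = (1 - theta2) ** 4                           # delta-method derivatives
dr_dt2 = 4 * (1 - theta1) * (1 - theta2) ** 3
v = (dr_dt1 ** 2) * theta1 * (1 - theta1) / (m0 + m1) \
    + (dr_dt2 ** 2) * theta2 * (1 - theta2) / (sum(n_FP) + sum(n_noFP))
print(round(r, 3), round(math.sqrt(v), 3))           # about 0.045 and 0.004
```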

7. PERFECT FIT ANALYSIS

A perfect fit analysis is an algebraic method of ML estimation with partially observed categorical data and a saturated model. In a saturated model the number of independent parameters equals the number of independent cell counts. The advantage of using a saturated model is that it makes as few assumptions as possible. A perfect fit analysis involves the following steps:

  1. Set observed counts equal to expected counts and solve for closed-form parameter estimates.

  2. Compute the statistic of interest from the parameter estimates.

  3. Compute the standard error using the Multinomial-Poisson (MP) transformation.

The MP transformation12 changes a complicated multinomial likelihood into a simpler likelihood for Poisson random variables with the same ML estimates and variances. Let {nu} denote the set of observed counts, indexed by u. Let d denote the statistic from the perfect fit analysis, which is a closed-form function of {nu}. With a saturated model, the MP transformation treats nu as a Poisson random variable with mean and variance equal to nu. Applying the delta method, the estimated variance of d is

varMP(d) = ∑_u (∂d/∂nu)^2 nu, (20)

a quantity easily calculated using symbolic computing. A caveat of the perfect fit analysis is that the parameter estimates are ML estimates only if they lie in the interior of the parameter space.
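As an illustration, the following Python sketch (not from the paper) uses SymPy to evaluate equation (20) symbolically for the complete-case statistic of Example 2 in Section 2.4 with the Table 3 counts; it reproduces d = 0.150 with standard error 0.022:

```python
# A minimal sketch (not the paper's code) of equation (20): the MP-transformation
# variance via symbolic differentiation, applied to the complete-case statistic
# d = n11/n1+ - n01/n0+ with each count treated as a Poisson variable.
import sympy as sp

n00, n01, n10, n11 = sp.symbols("n00 n01 n10 n11", positive=True)
counts = {n00: 400, n01: 600, n10: 200, n11: 600}

d = n11 / (n10 + n11) - n01 / (n00 + n01)                  # statistic as a function of the counts
var_mp = sum(sp.diff(d, nu) ** 2 * nu for nu in counts)    # equation (20)

print(float(d.subs(counts)), float(sp.sqrt(var_mp).subs(counts)))
```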

7.1. Example 1

The Prostate Cancer Prevention Trial randomized participants to placebo (z=0) or finasteride (z=1).13 One outcome of interest was prostate cancer status determined on biopsy (y = 0 = no, y = 1 = yes), which is missing if there is no biopsy. An auxiliary variable is a variable observed after randomization that is related to outcome. Biopsy recommendation based on a test for prostate-specific antigen (a = 0 = no or a = 1 = yes) is an auxiliary variable that is strongly related to the probability of missing the outcome. Incorporating this auxiliary variable into the model improves the adjustment for missing outcomes. Let nzay denote the number of participants in randomization group z with auxiliary variable a and observed outcome y. Let wza denote the number of participants in randomization group z with observed auxiliary variable a and missing outcome. See Table 7.

Table 7.

Counts for Prostate Cancer Prevention Trial

Randomization group Auxiliary variable (biopsy recommendation) Outcome (prostate cancer on biopsy)
Y=0 (No) Y =1 (Yes) Missing
Z=0 (Placebo) A=0 (No) n000 (618) n001 (3675) w00 (3955)
A=1 (Yes) n010 (524) n011 (479) w01 (215)
Z=1 (Finasteride) A=0 (No) n100 (381) n101 (3791) w10 (4169)
A=1 (Yes) n110 (409) n111 (458) w11 (214)

The outcome model is pr(Y=y|z)=θy|z. The auxiliary variable model, pr(A=a|y,z)=λa|zy, is the probability of auxiliary variable a given outcome y and randomization group z. The missing-data mechanism is pr(MissY=1|a,z)=βza. The model is saturated because there are 10 independent parameters (2 for θ1|z, 4 for λ1|zy, and 4 for βza) and 10 independent cell counts (8 for {nzay}and 4 for {wza} minus 2 because nz++ + wz+ is fixed). The likelihood is

Lik(θ,β) = LMissY × LObsY, where
LMissY = ∏_{za} (∑_y θy|z λa|zy βza)^{wza},
LObsY = ∏_{zay} {θy|z λa|zy (1 − βza)}^{nzay}. (21)

The perfect fit analysis yields ML estimates without numerical maximization of the likelihood in equation (21). It involves the following steps.

Step 1. Set observed counts equal to expected counts and solve for closed-form parameter estimates. Let Nz = nz++ + wz+. The relevant equations are

Nz ∑_y θy|z λa|zy βza = wza, (22)
Nz θy|z λa|zy (1 − βza) = nzay. (23)

Summing equation (23) over y and adding the result to equation (22) yields

Nz ∑_y θy|z λa|zy = nza+ + wza. (24)

Substituting equation (24) into equation (22) and solving for βza gives β(EST)za = wza/(nza+ + wza). Substituting β(EST)za for βza in equation (23) and simplifying gives

θy|z λa|zy = (nzay + wza nzay/nza+)/Nz. (25)

Summing both sides of equation (25) over a and solving for θy|z gives

θ(EST)y|z = mzy/mz+, where mzy = nz+y + ∑_a wza nzay/nza+. (26)

Step 2. Compute the statistic of interest from the parameter estimates.

The statistic of interest is d = θ(EST)1|1 − θ(EST)1|0.

Step 3. Compute the standard error using the MP transformation. The estimated standard error is se = √v, where v = varMP(d) = ∑_{zay} (∂d/∂nzay)^2 nzay + ∑_{za} (∂d/∂wza)^2 wza. Based on the counts in Table 7, which come from Baker,14 d = −0.10 with standard error 0.007, indicating that finasteride decreases prostate cancer on biopsy.

Baker15 extended this perfect fit analysis to include lza participants randomized to group z with auxiliary variable a who are missing outcome, yielding θ(EST)y|z=mzy/mz+, where mzy=nz+y+awzanzay/nza+lza. For the Prostate Cancer Prevention Trial, Baker et al.14 implemented a more complicated perfect fit analysis involving biopsy recommendation, biopsy result, and surgery result.

7.2. Example 2

A hypothetical trial randomizes smokers to a behavioral intervention or no intervention to stop smoking. The binary outcome is self-report of smoking cessation. Missing in outcome depends only on the unobserved outcome. See Appendix C for a perfect fit analysis based on Baker and Laird.16 Under this scenario, for the hypothetical data in Table 3, the estimated risk difference between the two randomization groups is d=0.677 with standard error 0.10.

7.3. Example 3

A hypothetical diagnostic testing study involves two samples cross-classifying binary test results: (i) a reference test versus a new test and (ii) a gold standard versus a reference test. The goal is to estimate the sensitivity and specificity of the new test versus the gold standard. The assumptions are an ignorable missing mechanism, the same sensitivity and specificity of both new and reference test (relative to the gold standard) in both samples, and conditional independence of reference and new test results given the gold standard. See Appendix D for a perfect fit analysis based on Baker.17 For the hypothetical data in Table 8, the estimated sensitivity of the new test relative to the gold standard is 0.80 with standard error 0.17. The estimated specificity of the new test relative to the gold standard is 0.60 with standard error 0.10.

Table 8.

Hypothetical counts for diagnostic testing example

Data Set New test
1 Reference test 0 (negative) 1 (positive)
0 (negative) z00 (84) z01 (46)
1 (positive) z10 (26) z11 (44)
Gold standard
2 Reference test 0 (negative) 1 (positive)
0 (negative) w00 (18) w01 (4)
1 (positive) w10 (2) w11 (6)

7.4. Example 4

Some randomized trials involve all-or-none compliance, the switching of treatments immediately at randomization. For all-or-none compliance (or the related all-or-none availability in before-and-after studies), Baker and Lindeman18 and Imbens and Angrist19 (followed by Angrist, Imbens, and Rubin20) independently developed a method, later called latent class instrumental variables,21 that uses potential outcomes with reasonable assumptions to estimate the causal effect of treatment among the latent class of compliers (participants who would receive the assigned treatment in either randomization group). Baker22 used a perfect fit analysis for ML estimation involving a randomized cancer screening trial, discrete-time survival data, all-or-none compliance, and latent class instrumental variables. Baker and Kramer23 formulated a perfect fit analysis for ML estimation involving a randomized trial, a partially observed binary endpoint, all-or-none compliance, and latent class instrumental variables. See Appendix E. For the hypothetical data in Table 9 under the latter scenario, the estimated treatment effect based on the perfect fit analysis is 0.4 with standard error 0.095.

Table 9.

Hypothetical counts for latent class instrumental variables with missing outcome

Randomization group Treatment received Outcome
Y=0 Y =1 Missing
Z=0 T=0 n000 (100) n001 (200) w00 (100)
T=1 n010 (400) n011 (300) w01 (100)
Z=1 T=0 n100 (300) n101 (200) w10 (200)
T=1 n110 (100) n111 (100) w11 (300)

8. COMPOSITE LINEAR MODELS

Composite linear models24 provide a flexible approach to ML estimation with complex missing-data patterns involving categorical data and either ignorable or nonignorable missing-data mechanisms. Let Uobs denote a vector of expected values for observed counts. Let U denote a vector of expected counts if there were no missing data. Let C denote a matrix of 0's and 1's that indicates which unobserved expected counts are summed to yield the observed expected counts. A composite linear model has the form,

Uobs = C U, where U = N exp{∑_k Q(k)}, Q(k) = q(k)(W(k), G(k) H(k)), H(k) = h(k)(Z(k), X(k) ϕ(k)), (27)

W(k), G(k), H(k), Z(k), and X(k) are matrices, h(k)( ) and q(k) ( ) are functions, k indexes model components, and the parameter vector ϕ(k) involves either the outcome model parameters θ or the missing-data mechanism parameters β. The parameter sets θ and β do not overlap and do not constrain one another. See Appendix F for an illustration of the matrices involved with discrete-time hazard models.

Once the matrices and functions are specified, maximization is automatic, beginning with an EM algorithm, which is insensitive to poor starting values, and then switching to a Newton-Raphson algorithm, which converges faster and yields standard errors. Examples include two-phase surveys,25 regression analysis of grouped survival data with a missing covariate,26 and misclassification.27 Software for fitting composite linear models, written in Mathematica,28 is available at https://prevention.cancer.gov/about-dcp/staff-search/stuart-g-baker-scd/composite-linear-models. The user needs to specify the matrices and functions, a task simplified by using a previous example as a template, but nevertheless challenging.

8.1. Example 1

Investigators were interested in the effect of drain, a tube for removing fluid from a wound, on wound infection following surgery. Because of the expense and difficulty of following all patients after hospital discharge to determine wound infection status, investigators implemented the following double sampling design. For a random partial follow-up sample, investigators followed 1,236 patients after surgery until wound infection, hospital release, or the end of the study at 30 days after surgery, whichever occurred first. For a random full follow-up sample, investigators followed 194 patients after surgery until wound infection (either in the hospital or after release from the hospital) or the end of the study at 30 days after surgery, whichever occurred first. Time since surgery involves 4 intervals: 1 (0–4 days), 2 (5–7) days, 3 (8–30 days), and 4 (more than 30 days). See Table 10.

Table 10.

Counts for double sampling study of wound infection

Sample Drain Interval of hospital discharge with no prior infection Interval of infection No infection by day 30 Hospital discharge without follow-up
0–4 days 5–7 days 8–30 days
Partial follow-up sample No None 6 10 7 1
0–4 days -- -- -- -- 180
5–7 days 0 -- -- -- 544
8–30 days 0 0 -- -- 232
Yes None 7 15 11 0
0–4 days -- -- -- -- 8
5–7 days 0 -- -- -- 87
8–30 days 0 0 -- -- 128
Full follow-up sample No None 2 1 1 0
0–4 days 0 0 3 39
5–7 days 0 0 3 89
8–30 days 0 0 0 30
Yes None 3 3 0 0
0–4 days 0 0 0 2
5–7 days 0 0 0 10
8–30 days 0 0 1 7

Baker et al.29 formulated the following model to analyze these data. Let hf|x denote the hazard function for wound infection (in the absence of censoring) in time interval f =1, 2, 3 given drain status x = 0 = no drain or x = 1=drain. The outcome model is

logit(hf|x)=θ0+θ2(iff=2)+θ3(iff=3)+θX(ifx=1). (28)

Let ct|fx denote the hazard function for hospital discharge at the start of time interval t given wound infection in time interval f (for t ≥ f) and drain status x. The missing-data mechanism is

logit(ct|fx)=β0+β2(ift=2)+β3(ift=3)+βX(ifx=1)+βRL(ift=f). (29)

The parameter βRL, where the subscript denotes response-linked, makes the missing-data mechanism directly nonignorable. Using the method of composite linear models, Baker et al.29 estimated βRL as β(EST)RL = −7.12 with standard error 1.44. The estimated effect of drain on wound infection was θ(EST)X = 1.40 with standard error 0.24.

8.2. Example 2

The Muscatine Coronary Risk Factor Study collected data on obesity outcome (yes or no) in girls and boys at three times (initially and 2 and 4 years later).30 Missing outcomes occurred at one or more times, yielding 7 patterns of missing data. Baker31 used composite linear models to fit a marginal outcome model in which the probability of obesity at each time depends on age at that time and gender. The outcome model coupled with a nonignorable missing-data mechanism fit substantially better than the outcome model coupled with an ignorable missing-data mechanism nested within the nonignorable missing-data mechanism. The estimated coefficient for sex in the logistic outcome model was 0.15 with standard error 0.08, indicating higher obesity levels for girls than boys.

9. DISCUSSION

An often-overlooked consideration with missing-data analyses is the need for missing-data adjustments to make sense. One criterion for sensible missing-data adjustment is that the unobserved data exist or could be ascertained. For example, if a biopsy result is missing because an eligible person did not arrive at the clinic, there exists an unobserved biopsy result that could have been ascertained if the person had arrived at the clinic. However, if the biopsy result is missing because of death, there does not exist an unobserved biopsy result that could have been ascertained. A less stringent criterion for sensible missing-data adjustment is that the unobserved result could be observed in a relevant target population where the missingness could be prevented, as might apply if missing in biopsy was due to accidental death and the target population specified no accidental deaths.

An important component of many missing-data analyses is a sensitivity analysis to determine how assumptions about the missing-data mechanism affect estimates of treatment effect in the outcome model. A model-based sensitivity analysis computes the estimated treatment effect under multiple missing-data mechanisms, as when fitting composite linear models. If an outcome model coupled with a nonignorable missing-data mechanism fits substantially better than the same outcome model coupled with a nested ignorable missing-data mechanism, reported estimates should be based on the former model. A parameter-based sensitivity analysis computes the estimated treatment effect when varying a parameter measuring the association between missing outcome and outcome.36 A data-based sensitivity analysis computes the estimated treatment effect when imputing values for the missing outcome.37, 38 A randomization-based sensitivity analysis uses the randomization distribution to bound the estimated treatment effect if missing in outcome depends on an unobserved binary variable.39 When implementing a sensitivity analysis, prior knowledge helps to limit the range of possible values.

In summary, the ML methods discussed here range from the simple to the complex. The simplest methods are complete-case analysis and complete-case analysis adjusted for covariates. Survival analyses adjusted for covariates are easy to implement using standard software. For missing in a univariate or survival outcome that depends on multiple covariates, the propensity-to-be-missing score is preferable and easy to implement. More complicated ML methods are needed for fitting models with longitudinal dropouts. Commercial software is available for continuous outcomes. With binary longitudinal outcomes, a conditional model is easy to fit, but extra work is needed to combine parameter estimates to estimate the quantity of interest. The perfect fit analysis is an underappreciated approach to obtaining closed-form ML estimates and variances for complicated likelihoods involving saturated models fit to categorical data. Some work is needed in the algebraic derivation, but it is generally simpler to implement than iterative numerical fitting. The most complex method discussed is the method of composite linear models, which is a flexible approach involving categorical data. Except for composite linear models, which awaits the development of more user-friendly software, all the above methods can contribute to the toolkit of statisticians for analyzing clinical and prevention studies with missing outcomes.

ACKNOWLEDGEMENTS

This work was supported by the National Institutes of Health.

APPENDIX A

This Appendix discusses ML estimation for a randomized trial in which missing in outcome Y depends on randomization group Z and baseline covariate X in which the baseline covariate is MCAR among some participants. Four subsets of participants defined by the pattern of missing data are missing both Y and X, {MissY:MissX}; missing Y with observed X, {MissY:ObsX}; observed Y with missing X,{ObsY:MissX}; and observed Y and observed X, {ObsY:ObsX}. Let β = (β1, β2). Let pr(xi; λ) denote the distribution of xi, which is modeled by parameter set λ. Let MissX denote the missing-data indicator for X. The probability of missing X is constant, as denoted by pr(MissXi = 1; β2). The likelihood is

LikCCX = LMissY:MissX × LMissY:ObsX × LObsY:MissX × LObsY:ObsX, where
LMissY:MissX = ∏_{i∈{MissY:MissX}} ∫∫ pr(MissYi=1|MissXi=1,zi,xi;β1) × pr(MissXi=1;β2) × pr(yi|xi,zi;θ) × pr(xi;λ) dxi dyi,
LMissY:ObsX = ∏_{i∈{MissY:ObsX}} ∫ pr(MissYi=1|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(yi|xi,zi;θ) × pr(xi;λ) dyi,
LObsY:MissX = ∏_{i∈{ObsY:MissX}} ∫ pr(MissYi=0|MissXi=1,zi,xi;β1) × pr(MissXi=1;β2) × pr(yi|xi,zi;θ) × pr(xi;λ) dxi,
LObsY:ObsX = ∏_{i∈{ObsY:ObsX}} pr(MissYi=0|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(yi|xi,zi;θ) × pr(xi;λ). (A.1)

The likelihood in equation (A.1) is indirectly nonignorable. There is no factor of the likelihood involving θ without β because β is linked to λ in LMissY:MissX and λ is linked to θ in LObsY:MissX. To simplify ML estimation, a simple expedient is to consider the likelihood for the random sample of participants with observed values of the covariate,

LikCCX:ObsX = LMissY:ObsX × LObsY:ObsX = fCCX(β,λ) × LikCCX:ObsX:Ign(θ), where
fCCX(β,λ) = ∏_{i∈{MissY:ObsX}} pr(MissYi=1|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(xi;λ) × ∏_{i∈{ObsY:ObsX}} pr(MissYi=0|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(xi;λ),
LikCCX:ObsX:Ign(θ) = ∏_{i∈{ObsY:ObsX}} pr(yi|xi,zi;θ). (A.2)

ML estimation for θ in equation (A.2) involves only LikCCX:ObsX:Ign(θ), which translates into complete case analysis adjusted for observed values of covariate X.

APPENDIX B

This Appendix discusses ML estimation in a randomized trial with survival times in which the probability of censoring depends on randomization group Z and a partially observed baseline covariate X. Four subsets of participants defined by the pattern of missing data are censored with missing X,{CensMissX}, censored with observed X, {CensObsX}, failure with missing X, {FailMissX}, and failure with observed X, {FailObsX}. Let β = (β1, β2). The probability of missing X is constant, as denoted by pr(MissXi=1;β2). The missing data patterns give rise to the following likelihood,

LikSurvX(θ,β) = LCens:MissX × LCens:ObsX × LFail:MissX × LFail:ObsX, where
LCens:MissX = ∏_{i∈{CensMissX}} ∫∫_{fi ≥ ci} pr(Ci=ci|MissXi=1,zi,xi;β1) × pr(MissXi=1;β2) × pr(Fi=fi|zi,xi;θ) × pr(xi;λ) dxi dfi,
LCens:ObsX = ∏_{i∈{CensObsX}} ∫_{fi ≥ ci} pr(Ci=ci|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(Fi=fi|zi,xi;θ) × pr(xi;λ) dfi,
LFail:MissX = ∏_{i∈{FailMissX}} ∫∫_{ci > fi} pr(Ci=ci|MissXi=1,zi,xi;β1) × pr(MissXi=1;β2) × pr(Fi=fi|zi,xi;θ) × pr(xi;λ) dxi dci,
LFail:ObsX = ∏_{i∈{FailObsX}} ∫_{ci > fi} pr(Ci=ci|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(Fi=fi|zi,xi;θ) × pr(xi;λ) dci. (B.1)

The likelihood in equation (B.1) is indirectly nonignorable. There is no factor of the likelihood involving θ without β because β is linked to λ in LCens:MissX and λ is linked to θ in LFail:MissX. To simplify ML estimation, a simple expedient is to consider the likelihood for the random sample of participants with observed values of covariate,

LikSurvX:ObsX(θ,β) = LCens:ObsX × LFail:ObsX = fSurvX(β,λ) × LikSurvX:ObsX:Ign(θ), where
fSurvX(β,λ) = ∏_{i∈{CensObsX}} pr(Ci=ci|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(xi;λ) × ∏_{i∈{FailObsX}} ∫_{ci > fi} pr(Ci=ci|MissXi=0,zi,xi;β1) × pr(MissXi=0;β2) × pr(xi;λ) dci,
LikSurvX:ObsX:Ign(θ) = ∏_{i∈{CensObsX}} pr(Fi ≥ ci|zi,xi;θ) × ∏_{i∈{FailObsX}} pr(Fi=fi|zi,xi;θ). (B.2)

ML estimation of θ in equation (B.2) involves only LikSurvX:ObsX:Ign(θ), which translates into a survival analysis for the random sample of cases with observed values of covariate X.

APPENDIX C

This Appendix derives the perfect fit estimates for a randomized trial with a binary outcome Y in which missing in outcome depends on the outcome Y but not on the randomization group Z. Let nzy denote the number of participants randomized to group z = 0, 1 with outcome y = 0, 1. Let wz denote the number of participants randomized to group z = 0, 1 who are missing the outcome. See Table 3. The outcome model, pr(Y=y|z;θ) = θy|z, is the probability of outcome y given randomization to group z. The directly nonignorable missing-data mechanism, pr(MissY=1|y;β) = βy, is the probability of missing the outcome given outcome y. The model is saturated because there are 4 independent parameters (2 for θ1|z and 2 for βy) and 4 independent cell counts (4 for {nzy} and 2 for {wz} minus 2 because nz+ + wz is fixed). The perfect fit analysis follows.

Step 1. Set expected counts equal to observed counts and solve for parameter estimates. Let ϕy = βy/(1 − βy) and μzy = Nz θy|z(1 − βy), where Nz = nz+ + wz. The relevant equations are

μzy = nzy, (C.1)
∑_y μzy ϕy = wz. (C.2)

Simultaneously solving equations (C.1) and (C.2) yields

ϕ(EST)0 = (n11 w0 − n01 w1)/(n11 n00 − n01 n10), (C.3)
ϕ(EST)1 = (n00 w1 − n10 w0)/(n11 n00 − n01 n10). (C.4)

If ϕ(EST)y ≥ 0, ϕ(EST)y is the ML estimate. If ϕ(EST)0 or ϕ(EST)1 is negative, the ML estimates are on the boundary of the parameter space.

Step 2. Compute the statistic of interest. The estimated risk difference is d = p1 − p0, where pz = nz1(1 + ϕ(EST)1)/∑_y nzy(1 + ϕ(EST)y).

Step 3. Compute the standard error using the MP transformation. The standard error is se = √v, where v = varMP(d).

APPENDIX D

This Appendix presents a perfect fit analysis for the diagnostic testing data in Table 8. Data set 1 involves {zij}, the number of persons with reference test result i = 0, 1 and new test result j = 0, 1. Data set 2 involves {wik}, the number of persons with reference test result i = 0, 1 and gold standard result k = 0, 1. The model assumes independence of the test results given the gold standard result and an ignorable missing-data mechanism. Let ψi|k denote the probability of reference test result i given gold standard result k. Let θj|k denote the probability of new test result j given gold standard result k. Let πk denote the probability of gold standard result k in data set 1 and let ρk denote the probability of gold standard result k in data set 2. The outcome model is saturated with 6 independent parameters (2 for ψ1|k, 2 for θ1|k, 1 for πk, and 1 for ρk) and 6 independent cell counts (3 for {zij} and 3 for {wik}).

Step 1. Set observed counts equal to expected counts and solve for closed-form parameter estimates. The relevant equations ignore the missing-data mechanism,

z++ ∑_k ψi|k θj|k πk = zij, (D.1)
w++ ψi|k ρk = wik. (D.2)

Summing both sides of equation (D.2) over i and solving for ρk gives ρ(EST)k=w+k/w++. Substituting ρ(EST)k into equation (D.2) and solving for ψi|k gives ψ(EST)i|k=wik/w+k. Substituting ψ(EST)i|k into equation (D.1) gives

z++ ∑_k (wik/w+k) θj|k πk = zij. (D.3)

Rewriting equation (D.3) as separate equations for i = 0 and i = 1 gives

(w00/w+0)(θj|0 π0) + (w01/w+1)(θj|1 π1) = z0j/z++, (D.4)
(w10/w+0)(θj|0 π0) + (w11/w+1)(θj|1 π1) = z1j/z++. (D.5)

Simultaneously solving equations (D.4) and (D.5) gives

θj|0 π0 = w+0(w00 z1j − w10 z0j)/{(w00 w11 − w10 w01) z++}, (D.6)
θj|1 π1 = w+1(w01 z1j − w11 z0j)/{(w00 w11 − w10 w01) z++}. (D.7)

Summing both sides of equations (D.6) and (D.7) over j and solving for π0 and π1 yields

π(EST)0 = w+0(w00 z1+ − w10 z0+)/{(w00 w11 − w10 w01) z++}, (D.8)
π(EST)1 = w+1(w01 z1+ − w11 z0+)/{(w00 w11 − w10 w01) z++}. (D.9)

Substituting π(EST)0 into equation (D.6) and π(EST)1 into equation (D.7) and solving for θj|0 and θj|1 yields

θ(EST)j|0 = (w00 z1j − w10 z0j)/(w00 z1+ − w10 z0+), (D.10)
θ(EST)j|1 = (w11 z0j − w01 z1j)/(w11 z0+ − w01 z1+). (D.11)

Step 2. Compute the statistic of interest. Specificity is θ(EST)0|0. Sensitivity is θ(EST)1|1.

Step 3. Compute the standard error using the MP transformation. The standard error is se = √v, where v = varMP(d) and d = θ(EST)0|0 or θ(EST)1|1.

APPENDIX E

This Appendix derives the perfect fit analysis for estimating the treatment effect in a randomized trial with all-or-none compliance and a binary outcome in which missing in outcome depends on randomization group and latent compliance class. Let nzby denote the number of participants randomized to treatment z who receive treatment b immediately after randomization and experience outcome y. Let wzb denote the number of persons randomized to treatment z who receive treatment b immediately after randomization and are missing the outcome. See Table 9.

Let s index latent classes defined by the potential outcomes of treatment received. Under the monotonicity assumption, s takes three possible values: A = always-takers, who would receive the new treatment regardless of the randomization group to which they might be assigned; N = never-takers, who would receive the old treatment regardless of the randomization group to which they might be assigned; and C = compliers, who would receive the assigned treatment in either randomization group.

The outcome model, pr(Y=y|z,s;θ) = θy|zs, is the probability of outcome y given randomization group z and latent class s. The missing-data mechanism, pr(MissY=1|z,s;β) = βzs, is the probability of missing outcome given randomization group z and latent class s. Let pr(S=s) = πs denote the probability of being in latent class s. Under the compound exclusion restriction assumption, the probabilities of outcome and of missing in outcome do not depend on randomization group for always-takers and never-takers, namely θy|zs = θy|s and βzs = βs for s = A and N. The model is saturated because there are 10 independent parameters (θ1|A, θ1|0C, θ1|1C, θ1|N, βA, β0C, β1C, βN, πA, and πC) and 10 independent cell counts (8 for {nzby} and 4 for {wzb} minus 2 because nz++ + wz+ is fixed). The perfect fit analysis follows.

Step 1. Set expected counts equal to observed counts and solve for parameter estimates.

Let Nz = nz++ + wz+. The relevant equations, based on the definitions of the latent classes A, C, and N, are

N0{θy|N(1 − βN)πN + θy|0C(1 − β0C)πC} = n00y, (E.1)
N0 θy|A(1 − βA)πA = n01y, (E.2)
N1 θy|N(1 − βN)πN = n10y, (E.3)
N1{θy|1C(1 − β1C)πC + θy|A(1 − βA)πA} = n11y, (E.4)
N0(βNπN + β0CπC) = w00, (E.5)
N0 βAπA = w01, (E.6)
N1 βNπN = w10, (E.7)
N1(β1CπC + βAπA) = w11. (E.8)

Summing equation (E.2) over y and adding to equation (E.6) yields

$N_0\pi_A = n_{01+} + w_{01}$. (E.9)

Summing equation (E.4) over y and adding to equation (E.8) yields

$N_1(\pi_C + \pi_A) = n_{11+} + w_{11}$. (E.10)

Subtracting equation (E.9) divided by $N_0$ from equation (E.10) divided by $N_1$ and solving for $\pi_C$ gives

$\pi^{(EST)}_C = p_1 - p_0$, where $p_1 = (n_{11+} + w_{11})/N_1$ and $p_0 = (n_{01+} + w_{01})/N_0$. (E.11)

Subtracting equation (E.6) divided by $N_0$ from equation (E.8) divided by $N_1$ and solving gives

$\beta^{(EST)}_{1C}\,\pi^{(EST)}_C = q_{11} - q_{01}$, where $q_{11} = w_{11}/N_1$ and $q_{01} = w_{01}/N_0$. (E.12)

Subtracting equation (E.7) divided by $N_1$ from equation (E.5) divided by $N_0$ and solving gives

$\beta^{(EST)}_{0C}\,\pi^{(EST)}_C = q_{00} - q_{10}$, where $q_{00} = w_{00}/N_0$ and $q_{10} = w_{10}/N_1$. (E.13)

Subtracting equation (E.2) divided by $N_0$ from equation (E.4) divided by $N_1$ and solving for $\theta_{y|1C}$ based on equations (E.11) and (E.12) gives

$\theta^{(EST)}_{y|1C} = (n_{11y}/N_1 - n_{01y}/N_0)/[\{1-\beta^{(EST)}_{1C}\}\pi^{(EST)}_C] = (n_{11y}/N_1 - n_{01y}/N_0)/\{(p_1 - p_0) - (q_{11} - q_{01})\}$. (E.14)

Subtracting equation (E.3) divided by $N_1$ from equation (E.1) divided by $N_0$ and solving for $\theta_{y|0C}$ based on equations (E.11) and (E.13) gives

$\theta^{(EST)}_{y|0C} = (n_{00y}/N_0 - n_{10y}/N_1)/[\{1-\beta^{(EST)}_{0C}\}\pi^{(EST)}_C] = (n_{00y}/N_0 - n_{10y}/N_1)/\{(p_1 - p_0) - (q_{00} - q_{10})\}$. (E.15)

Step 2. Compute the statistic of interest. The perfect fit ML estimate of the treatment effect in compliers is $d = \theta^{(EST)}_{1|1C} - \theta^{(EST)}_{1|0C}$.

Step 3. Compute the standard error using the MP transformation. The standard error is $se = \sqrt{v}$, where $v = \mathrm{var}_{MP}(d)$.
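As with Appendix D, the arithmetic of Steps 1 and 2 can be checked mechanically. The Python sketch below (an added illustration, not part of the original appendix) evaluates equations (E.11)-(E.15) and the complier treatment effect for hypothetical counts; the counts were constructed from assumed parameter values so that every estimate stays inside the parameter space, and the MP-transformation variance is again omitted.

# Perfect fit estimate of the complier treatment effect, equations (E.11)-(E.15).
# Counts n[z, b, y] and w[z, b] are hypothetical (generated from an assumed model).
n = {(0, 0, 0): 459, (0, 0, 1): 231,
     (0, 1, 0):  90, (0, 1, 1):  90,
     (1, 0, 0): 144, (1, 0, 1):  96,
     (1, 1, 0): 260, (1, 1, 1): 345}
w = {(0, 0): 110, (0, 1): 20, (1, 0): 60, (1, 1): 95}   # missing outcome

# N_z = n_{z++} + w_{z+}
N = {z: sum(n[z, b, y] for b in (0, 1) for y in (0, 1)) + w[z, 0] + w[z, 1] for z in (0, 1)}

# Equation (E.11)
p1 = (n[1, 1, 0] + n[1, 1, 1] + w[1, 1]) / N[1]
p0 = (n[0, 1, 0] + n[0, 1, 1] + w[0, 1]) / N[0]
pi_C = p1 - p0

# Equations (E.12) and (E.13)
q11, q01 = w[1, 1] / N[1], w[0, 1] / N[0]
q00, q10 = w[0, 0] / N[0], w[1, 0] / N[1]

# Equations (E.14) and (E.15), evaluated at y = 1
theta1_1C = (n[1, 1, 1] / N[1] - n[0, 1, 1] / N[0]) / ((p1 - p0) - (q11 - q01))
theta1_0C = (n[0, 0, 1] / N[0] - n[1, 0, 1] / N[1]) / ((p1 - p0) - (q00 - q10))

# Step 2: treatment effect in compliers
d = theta1_1C - theta1_0C
print(pi_C, theta1_1C, theta1_0C, d)   # 0.5, 0.6, 0.3, 0.3 with these hypothetical counts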

APPENDIX F

This Appendix presents some of the matrix components in a composite linear model for discrete-time survival. Let hf|x(θ) denote the hazard for failure (in the absence of censoring) at time f = 1, 2 for covariate x = 0, 1. Let ct|x(β) denote the hazard for censoring (in the absence of failure) at time t = 1 for covariate x = 0, 1, where censoring in an interval implies failure is not observed in the interval. Consider two simple models: logit{hf|x(θ)} = θf0 + θf1x and logit{ct|x(β)} = βt0 + βt1x. Let N denote the sample size. Let uFtx denote the expected number of persons who fail in interval t with covariate at level x. Let uCtx denote the expected number of persons censored in interval t with covariate at level x. The 8×1 column vector of expected counts with no missing data is U8×1 = (uF10, uF20, uC10, uC20, uF11, uF21, uC11, uC21)T, where, for covariate x,

uF1x = N h1|x(θ), (F.1)
uF2x = N{1 − h1|x(θ)} × h2|x(θ) × {1 − c1|x(β)}, (F.2)
uC1x = N{1 − h1|x(θ)} × c1|x(β), (F.3)
uC2x = N{1 − h1|x(θ)} × {1 − h2|x(θ)} × {1 − c1|x(β)}. (F.4)
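As a small numerical check (not part of the original appendix), the following Python sketch evaluates equations (F.1)-(F.4) at assumed parameter values; the values of θ, β, and N are arbitrary illustrations.

import numpy as np

def expit(u):                      # inverse logit
    return 1.0 / (1.0 + np.exp(-u))

N = 1000.0                                     # sample size (assumed)
theta = {(1, 0): -1.0, (1, 1): 0.5,            # theta[f, 0] = intercept, theta[f, 1] = slope (assumed)
         (2, 0): -0.5, (2, 1): 0.5}
beta = {(1, 0): -2.0, (1, 1): 0.3}             # censoring model at t = 1 (assumed)

for x in (0, 1):
    h1 = expit(theta[1, 0] + theta[1, 1] * x)  # failure hazard, interval 1
    h2 = expit(theta[2, 0] + theta[2, 1] * x)  # failure hazard, interval 2
    c1 = expit(beta[1, 0] + beta[1, 1] * x)    # censoring hazard, interval 1
    uF1 = N * h1                               # (F.1)
    uF2 = N * (1 - h1) * h2 * (1 - c1)         # (F.2)
    uC1 = N * (1 - h1) * c1                    # (F.3)
    uC2 = N * (1 - h1) * (1 - h2) * (1 - c1)   # (F.4)
    print(x, uF1, uF2, uC1, uC2, uF1 + uF2 + uC1 + uC2)   # the four counts sum to N within each x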

In matrix notation for composite linear models, the expected counts with no missing data are

U = N exp{Σk Q(k)}, where Q(k) = q(k)(W(k), G(k)H(k)) and

H(k) = h(k)(Z(k), X(k)θ(k)) for k = 1, the outcome model,
H(k) = h(k)(Z(k), X(k)β(k)) for k = 2, the missing-data mechanism.

Outcome model.

The H-component for the outcome model expresses in matrix form log{ht|x(θ)} = (θt0 + θt1x) − log{1 + exp(θt0 + θt1x)} and log{1 − ht|x(θ)} = −log{1 + exp(θt0 + θt1x)}. The H-component is H8×1(1) = log(h1|0, 1 − h1|0, h2|0, 1 − h2|0, h1|1, 1 − h1|1, h2|1, 1 − h2|1)T. In matrix notation, H8×1(1) = Z8×1(1) ° (X8×4(1) θ4×1) − log{1 + exp(X8×4(1) θ4×1)}, where Z8×1(1) = (1, 0, 1, 0, 1, 0, 1, 0)T, X8×4(1) = ((X4×2*, 04×2), (X4×2*, X4×2*)), X4×2* = ((1, 0), (1, 0), (0, 1), (0, 1)), 04×2 is a 4 × 2 matrix of 0’s, θ4×1 = (θ10, θ20, θ11, θ21)T, and the symbol “°” denotes element-by-element multiplication instead of matrix multiplication. The top half of X8×4(1) corresponds to x = 0 and the bottom half to x = 1.

The Q-component of the outcome model is Q8×1(1) = log(h1|0, (1–h1|0) h2|0, (1–h1|0), (1–h1|0) (1–h2|0), h1|1, (1–h1|1) h2|1, (1–h1|1), (1–h1|1) (1–h2|1))T. In matrix notation, Q8×1(1) = W8×1 (1) + G8×8(1) H8×1 (1), where W8×1 (1) = 08×1, G8×8(1) = ((G4×4*, 04×4), (04×4, G4×4*)), and G4×4 *= ((1, 0, 0, 0), (0, 1, 1, 0), (0, 1, 0, 0), (0, 1, 0, 1)). The top half of G8×8(1) corresponds to x=0 and the bottom half to x=1.

Missing-data mechanism.

The H-component for the censoring model is H4×1(2) = log(c1|0, 1 − c1|0, c1|1, 1 − c1|1)T. In matrix notation, H4×1(2) = Z4×1(2) ° (X4×2(2) β2×1) − log{1 + exp(X4×2(2) β2×1)}, where Z4×1(2) = (1, 0, 1, 0)T, X4×2(2) = ((1, 0), (1, 0), (1, 1), (1, 1)), and β2×1 = (β10, β11)T.

The Q-component for the censoring model is Q8×1 (2) = log(1, (1–c1|0), c1|0, (1–c1|0), 1, (1–c1|1), c1|1, (1–c1|1))T. In matrix notation, Q8×1 (2) = W8×1 (2) + G8×4 (2) H4×1 (2), where W8×1 (2) = 08×1 and G8×4 (2) = ((G4×2**, 04×2), (04×2, G4×2**)), where G4×2 **= ((0, 0), (0, 1),(1, 0), (0, 1)). The top half of G8×4 (2) corresponds to x=0 and the bottom half to x=1.
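The matrix specification above can be checked numerically. The Python sketch below (added for illustration, using the same arbitrary parameter values assumed after equations (F.1)-(F.4)) assembles Z(k), X(k), and G(k), computes H(k) and Q(k), and recovers the expected counts U = N exp{Q(1) + Q(2)}, which should agree with a direct evaluation of equations (F.1)-(F.4).

import numpy as np

N = 1000.0
theta = np.array([-1.0, -0.5, 0.5, 0.5])   # (theta10, theta20, theta11, theta21), assumed values
beta = np.array([-2.0, 0.3])               # (beta10, beta11), assumed values

# Outcome model (k = 1)
Z1 = np.array([1., 0., 1., 0., 1., 0., 1., 0.])
Xstar = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
X1 = np.block([[Xstar, np.zeros((4, 2))], [Xstar, Xstar]])                # 8 x 4
Gstar = np.array([[1., 0., 0., 0.], [0., 1., 1., 0.], [0., 1., 0., 0.], [0., 1., 0., 1.]])
G1 = np.block([[Gstar, np.zeros((4, 4))], [np.zeros((4, 4)), Gstar]])     # 8 x 8
H1 = Z1 * (X1 @ theta) - np.log(1.0 + np.exp(X1 @ theta))
Q1 = G1 @ H1                                                              # W(1) = 0, so Q(1) = G(1)H(1)

# Missing-data (censoring) mechanism (k = 2)
Z2 = np.array([1., 0., 1., 0.])
X2 = np.array([[1., 0.], [1., 0.], [1., 1.], [1., 1.]])
G2star = np.array([[0., 0.], [0., 1.], [1., 0.], [0., 1.]])
G2 = np.block([[G2star, np.zeros((4, 2))], [np.zeros((4, 2)), G2star]])   # 8 x 4
H2 = Z2 * (X2 @ beta) - np.log(1.0 + np.exp(X2 @ beta))
Q2 = G2 @ H2                                                              # W(2) = 0, so Q(2) = G(2)H(2)

U = N * np.exp(Q1 + Q2)   # (uF10, uF20, uC10, uC20, uF11, uF21, uC11, uC21)
print(U)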

Footnotes

DATA AVAILABILITY STATEMENT

The data used in the analyses are available in the tables of the paper.

