Author manuscript; available in PMC: 2023 Jun 15.
Published in final edited form as: Stat Med. 2021 Sep 29;40(29):6634–6650. doi: 10.1002/sim.9203

A comparison of methods for analyzing a binary composite endpoint with partially observed components in randomized controlled trials

Tra My Pham 1, Ian R White 1, Brennan C Kahan 1, Tim P Morris 1, Simon J Stanworth 2, Gordon Forbes 3
PMCID: PMC7614656  EMSID: EMS176925  PMID: 34590333

Abstract

Composite endpoints are commonly used to define primary outcomes in randomized controlled trials. A participant may be classified as meeting the endpoint if they experience an event in one or several components (eg, a favorable outcome based on a composite of being alive and attaining negative culture results in trials assessing tuberculosis treatments). Partially observed components that are not missing simultaneously complicate the analysis of the composite endpoint. An intuitive strategy frequently used in practice for handling missing values in the components is to derive the values of the composite endpoint from observed components when possible, and exclude from analysis participants whose composite endpoint cannot be derived. Alternatively, complete record analysis (CRA) (excluding participants with any missing components) or multiple imputation (MI) can be used. We compare a set of methods for analyzing a composite endpoint with partially observed components mathematically and by simulation, and apply these methods in a reanalysis of a published trial (TOPPS). We show that the derived composite endpoint can be missing not at random even when the components are missing completely at random. Consequently, the treatment effect estimated from the derived endpoint is biased while CRA results without the derived endpoint are valid. Missing at random mechanisms require MI of the components. We conclude that, although superficially attractive, deriving the composite endpoint from observed components should generally be avoided. Despite the potential risk of imputation model mis-specification, MI of missing components is the preferred approach in this study setting.

Keywords: compatibility, composite endpoints, missing data, multiple imputation, RCTs

1. Introduction

Composite endpoints are commonly used to define primary outcomes in randomized controlled trials, such as those in rheumatoid arthritis, tuberculosis, and cardiovascular diseases.1–5 A composite endpoint can be constructed from two or more components. As a simple example of a composite endpoint, a participant may be classified as meeting the endpoint if they experience an event in one or several components; for instance, a favorable outcome in trials assessing tuberculosis treatments may be defined based on a composite endpoint of the participant being alive and attaining negative culture results during follow-up.

In practice, not all components of a composite endpoint are fully observed, and components that are not always missing or observed simultaneously complicate the analysis of the composite endpoint. A strategy often used in practice is to perform a complete record analysis (CRA) in which only participants with observed data in all components are included. Such a strategy may yield less efficient, and potentially even biased, estimates when the components are not missing completely at random (MCAR).

To make more use of available data, another strategy is to derive the composite endpoint from observed components when possible, and exclude from analysis participants whose composite endpoint cannot be derived.6,7 In the aforementioned example of trials assessing tuberculosis treatments, suppose that a participant is classified as having an unfavorable outcome if they either die or have positive culture results. For a given participant with missing culture results, their endpoint can be derived to be unfavorable if we know that they die before the end of the trial, whereas their endpoint cannot be ascertained (and therefore considered missing) if they are alive. Another type of composite endpoint is the time to the first of two or more events, whichever occurs first, and might be of primary interest in many clinical trials. For example, in cancer trials, a commonly used primary endpoint is progression-free survival, defined as the time from randomization to tumor progression or death. Some participants may be lost to follow-up before experiencing an event (ie, the progression component is missing), while their vital status at the end of the trial might be obtained from linkage to external death registry data (ie, the mortality component is “observed”). This setting was previously explored by Daniel and Tsiatis,8 who demonstrated how external information on the mortality component of the composite endpoint for participants lost to follow-up before experiencing an event can be incorporated in augmented inverse probability weighted estimating equations in order to increase efficiency.

Previously, O’Keeffe et al9 studied a binary composite endpoint with seven components, measured repeatedly for individuals during follow-up. The authors investigated the scenario in which if one component of the composite endpoint is missing at a particular time point, then all components are missing. Thus, it would not be possible to derive the value of the composite endpoint at time points where the components are missing. Rombach et al10 focused on composite endpoints that are linear functions of the components, which generally cannot be derived if at least one component is missing. Nevertheless, some scoring manuals allow for a small number of components to be substituted by the mean score of the available components (ie, single imputation with the average of the observed values).

While an analysis of the derived endpoint (i) is intuitively sensible, since we sometimes can determine a participant’s endpoint from the value of only one component, and (ii) uses more observed data compared with a CRA, it is not clear under which missingness mechanisms of the components valid inference is achieved. In addition, the exclusion of observed components without an event from the analysis (eg, data from participants who are known to be alive, ie, no event in the mortality component, but whose culture results are missing) means that the derived endpoint may not be MCAR or missing at random (MAR), even when the components are MCAR.6

Maximum likelihood estimation has previously been considered for the assessment of treatment effect on a composite endpoint that is constructed from two or more partially observed components.6,7,11 This approach appears to work well when values of the components are MCAR or MAR. However, implementation in standard statistical software is limited, and incorporating baseline covariates in the analysis is not straightforward.

Multiple imputation (MI) has increasingly been used to handle missing data in trials, and is an alternative approach for the analysis of a composite endpoint with incomplete data in the components. MI is commonly performed assuming data are MAR. The application of MI in handling missing values in the components of a composite endpoint poses several practical questions, requiring further consideration.

  • First, should MI be performed at the composite or component level?

  • Second, when imputing at the composite level, should MI be performed on participants whose composite endpoint cannot be derived from their observed components, or on all participants whose data are missing in any components, regardless of whether their endpoint can be derived?

  • Third, an essential condition for inference after MI to be valid is compatibility between the imputation and analysis models.12–14 If MI is to be used, how should the imputation model be specified so that the associations between the components, as well as between the composite endpoint and other variables in the substantive analysis model, are correctly reflected in the imputed data?

The aim of this paper is to examine a set of methods, readily available in common statistical software packages, for analyzing a binary composite endpoint with partially observed components. The remainder of this paper is organized as follows. In Section 2, we introduce and describe our motivating data set from the TOPPS trial.15 In Section 3, we consider the case of a simple composite endpoint with two components (one fully observed and one with missing values) and show algebraically that the endpoint derived from the observed component can be missing not at random (MNAR) even when the missing component is MCAR. Section 4 presents a simulation study which compares methods for handling missing data in the components for two types of composite endpoint. This shows that MI performed at the component level is generally preferable. If MI at the composite level is used, it should be performed on all participants whose data are missing in any components, and this approach only provides valid inference when the components are MCAR. Specifying the imputation model for MI at the component level requires careful consideration on the potential interactions between the components as well as with randomized treatment. A reanalysis of the TOPPS trial is presented in Section 5; and Section 6 concludes with a discussion.

2. Motivating Example: the TOPPS Trial

The trial of prophylactic platelets (TOPPS) was a randomized, open-label, noninferiority trial assessing whether a policy of not giving prophylactic platelet transfusions was as effective and safe as a policy of providing prophylaxis to prevent bleeding in patients with haematologic cancers.15 A total of 600 participants were recruited from 14 haematology centres in the UK and Australia between 2006 and 2011.

Eligible participants were 16 years or older who were undergoing, or were about to undergo, chemotherapy or stem-cell transplantation to treat a haematologic cancer, and who had, or were expected to have, thrombocytopenia. Participants were randomized in a 1:1 ratio to receive, or not to receive, prophylactic platelet transfusions. Bleeding assessment was conducted daily, and the primary outcome was the occurrence of at least one bleeding event in the 30 days after randomization (ie, a binary composite endpoint constructed from 30 binary indicators of whether the participant had a bleeding event on each day). The structure of this composite endpoint is the same as any other composite endpoint made up of “an event in any of the components”, and the missing bleeding assessments on some days means that this composite endpoint suffers from the same aforementioned issues.

Bleeding was experienced in 151 of 300 (50%) participants in the no-prophylaxis group, and 128 of 298 (43%) participants in the prophylaxis group. The trial reported an adjusted difference in proportions of 8.4%, 90% confidence interval (CI) 1.7% to 15.2%. Therefore, noninferiority of a no-prophylaxis strategy compared to a prophylaxis strategy for platelet transfusions was not declared based on a noninferiority margin of 15%.

For the primary analysis, MI was used to account for days with missing bleeding assessments. Briefly, the 30-day follow-up period was split into six time blocks of five days (ie, days 1 to 5, days 6 to 10, days 11 to 15, days 16 to 20, days 21 to 25, and days 26 to 30), and the number of bleeds occurring during each time block was counted. The number of bleeds in a time block was set to missing if three or more bleeding assessments were missing in that time block. For missing time blocks, the number of bleeds was then imputed from proportional odds models, conditional on the other time blocks and minimization variables, using the multivariate imputation by chained equations (MICE) approach.16

3. A Simple Composite Endpoint With Two Components

In this section, we explore the mathematical properties of the simplest binary composite endpoint with two binary components. We determine the missingness mechanism of the derived endpoint when one component is fully observed and the other component is MCAR. We also demonstrate the potential bias associated with an analysis of the derived endpoint compared with a CRA, and discuss model specification for MI.

Let y be a binary composite endpoint with two binary components z1 and z2; y, z1, z2 take values 0 or 1. We define a simple composite endpoint y as

$$
y = \begin{cases} 1, & \text{if } z_1 = 1 \text{ or } z_2 = 1; \\ 0, & \text{if } z_1 = 0 \text{ and } z_2 = 0. \end{cases}
$$

Let pjk = P (z1 = j and z2 = k) ; j, k take values 0 or 1. Then P (y = 0) = p00 and P (y = 1) = p01 + p10 + p11. Further, suppose that z1 is fully observed for all participants, while z2 is missing for a subset of participants.

3.1. Missingness mechanism of the derived endpoint when one component is MCAR

When z2 is missing and z1 is observed, the composite endpoint y can be derived from the observed component z1 to take value 1 when z1 = 1, while y cannot be determined when z1 = 0. In other words, y is derivable from z1 = 1 regardless of the value of z2, whereas when z1 = 0 the value of y depends on what the missing value of z2 is, and in this case z1 alone does not provide sufficient information for y to be derived. This is because the composite y is defined as either z1 = 1 or z2 = 1.

We define rz2 as the binary response indicator, taking values 1 when z2 is observed, and 0 otherwise. Let ryderiv denote the binary response indicator for the derived endpoint yderiv,

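$$
r_{y_{\text{deriv}}} = \begin{cases} 1, & \text{if } r_{z_2} = 1 \text{ or } (r_{z_2} = 0 \text{ and } z_1 = 1); \\ 0, & \text{if } r_{z_2} = 0 \text{ and } z_1 = 0. \end{cases}
$$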

Suppose z2 is MCAR with probability P (rz2 = 1) = α, then

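$$
\begin{aligned}
P(r_{y_{\text{deriv}}} = 0) &= P(r_{z_2} = 0 \text{ and } z_1 = 0) = (1 - \alpha)(p_{00} + p_{01}); \\
P(r_{y_{\text{deriv}}} = 1) &= P[r_{z_2} = 1 \text{ or } (r_{z_2} = 0 \text{ and } z_1 = 1)] = \alpha + (1 - \alpha)(p_{10} + p_{11}).
\end{aligned}
$$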

The distribution of y among the subset of participants whose endpoint is considered missing is given by

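$$
\begin{aligned}
P(y = 1 \mid r_{y_{\text{deriv}}} = 0) &= \frac{P(y = 1 \text{ and } r_{y_{\text{deriv}}} = 0)}{P(r_{y_{\text{deriv}}} = 0)} \\
&= \frac{P(r_{z_2} = 0 \text{ and } z_1 = 0 \text{ and } z_2 = 1)}{P(r_{z_2} = 0 \text{ and } z_1 = 0)} \\
&= \frac{(1 - \alpha)\, p_{01}}{(1 - \alpha)(p_{00} + p_{01})} = \frac{p_{01}}{p_{00} + p_{01}}.
\end{aligned} \tag{1}
$$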

Similarly, the distribution of y among participants with a derivable endpoint can be written as

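$$
\begin{aligned}
P(y = 1 \mid r_{y_{\text{deriv}}} = 1) &= \frac{P(y = 1 \text{ and } r_{y_{\text{deriv}}} = 1)}{P(r_{y_{\text{deriv}}} = 1)} \\
&= \frac{P(y = 1 \text{ and } r_{z_2} = 1) + P(r_{z_2} = 0 \text{ and } z_1 = 1)}{P(r_{z_2} = 1) + P(r_{z_2} = 0 \text{ and } z_1 = 1)} \\
&= \frac{\alpha (p_{10} + p_{01} + p_{11}) + (1 - \alpha)(p_{10} + p_{11})}{\alpha + (1 - \alpha)(p_{10} + p_{11})} \\
&= \frac{\alpha p_{01} + p_{10} + p_{11}}{\alpha + (1 - \alpha)(p_{10} + p_{11})}.
\end{aligned} \tag{2}
$$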

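Since (1) ≠ (2) in general, the probability that y = 1 differs between participants whose endpoint is derivable and those whose endpoint is not; yderiv will therefore generally be MNAR even when z2 is MCAR.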
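The comparison of (1) and (2) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name and the joint probabilities passed to it are hypothetical values chosen only for illustration.

```python
def derived_endpoint_distribution(p00, p01, p10, p11, alpha):
    """Return P(y=1 | endpoint not derivable) and P(y=1 | endpoint derivable).

    pjk = P(z1 = j and z2 = k); alpha = P(z2 observed) under MCAR.
    """
    assert abs(p00 + p01 + p10 + p11 - 1.0) < 1e-12
    # Equation (1): z2 is missing and z1 = 0, so y equals the unseen z2.
    p_y1_not_derivable = p01 / (p00 + p01)
    # Equation (2): z2 is observed, or z2 is missing and z1 = 1.
    p_y1_derivable = (alpha * p01 + p10 + p11) / (alpha + (1 - alpha) * (p10 + p11))
    return p_y1_not_derivable, p_y1_derivable


# Hypothetical joint distribution of (z1, z2) and response probability.
miss, deriv = derived_endpoint_distribution(p00=0.4, p01=0.2, p10=0.3, p11=0.1, alpha=0.7)
print(f"P(y=1 | not derivable) = {miss:.3f}")
print(f"P(y=1 | derivable)     = {deriv:.3f}")
```

With these illustrative inputs the two conditional probabilities differ, so whether the derived endpoint is missing depends on the value of y itself.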

3.2. Bias in analysis of the derived endpoint versus complete records

3.2.1. Analysis of the derived endpoint

In a randomized controlled trial, suppose we have a treatment variable x taking values 1 for treatment or 0 for control. Let sjk = P (z1 = j and z2 = k | x = 1) and tjk = P (z1 = j and z2 = k | x = 0) ; j, k take values 0 or 1. When both components z1 and z2 are fully observed, the probability of y = 1 in the treatment and control arms is given by

$$
P(y = 1 \mid x = 1) = s_{01} + s_{10} + s_{11} = 1 - s_{00}; \tag{3}
$$
$$
P(y = 1 \mid x = 0) = t_{01} + t_{10} + t_{11} = 1 - t_{00}. \tag{4}
$$

Suppose our effect measure of interest is an odds ratio (OR). From (3) and (4), the full-data OR for the treatment effect can be written as

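$$
\mathrm{OR}_{\text{full}} = \frac{P(y = 1 \mid x = 1) / P(y = 0 \mid x = 1)}{P(y = 1 \mid x = 0) / P(y = 0 \mid x = 0)} = \frac{(s_{01} + s_{10} + s_{11})\, t_{00}}{(t_{01} + t_{10} + t_{11})\, s_{00}}. \tag{5}
$$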

With incomplete data, the distribution of the composite endpoint y among participants randomized to the treatment arm, whose endpoint can be derived from the values of z1, is

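$$
P(y = 1 \mid r_{y_{\text{deriv}}} = 1, x = 1) = \frac{\alpha s_{01} + s_{10} + s_{11}}{\alpha + (1 - \alpha)(s_{10} + s_{11})}; \tag{6}
$$
$$
P(y = 0 \mid r_{y_{\text{deriv}}} = 1, x = 1) = \frac{\alpha s_{00}}{\alpha + (1 - \alpha)(s_{10} + s_{11})}, \tag{7}
$$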

where α = P (rz2 = 1). Similarly, the distribution of y among participants randomized to the control arm, whose endpoint is derivable from the values of z1, is

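$$
P(y = 1 \mid r_{y_{\text{deriv}}} = 1, x = 0) = \frac{\alpha t_{01} + t_{10} + t_{11}}{\alpha + (1 - \alpha)(t_{10} + t_{11})}; \tag{8}
$$
$$
P(y = 0 \mid r_{y_{\text{deriv}}} = 1, x = 0) = \frac{\alpha t_{00}}{\alpha + (1 - \alpha)(t_{10} + t_{11})}. \tag{9}
$$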

From (6), (7), (8), (9), the OR for the treatment effect based on the derived endpoint is given by

$$
\mathrm{OR}_{\text{deriv}} = \frac{[\alpha s_{01} + s_{10} + s_{11}]\, t_{00}}{[\alpha t_{01} + t_{10} + t_{11}]\, s_{00}}, \tag{10}
$$

which, in general, is not equal to the OR given in (5) when the components are fully observed.

From (5) and (10), the ratio of ORderiv to ORfull is given by

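$$
\frac{\mathrm{OR}_{\text{deriv}}}{\mathrm{OR}_{\text{full}}} = \frac{\alpha s_{01} + s_{10} + s_{11}}{\alpha t_{01} + t_{10} + t_{11}} \Bigg/ \frac{s_{01} + s_{10} + s_{11}}{t_{01} + t_{10} + t_{11}} = \frac{1 - (1 - \alpha) \dfrac{s_{01}}{s_{01} + s_{10} + s_{11}}}{1 - (1 - \alpha) \dfrac{t_{01}}{t_{01} + t_{10} + t_{11}}} = \frac{1 - (1 - \alpha)\sigma}{1 - (1 - \alpha)\tau}. \tag{11}
$$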

From (11), with σ = s01/(s01 + s10 + s11) and τ = t01/(t01 + t10 + t11), the direction of bias in the OR due to missing data is determined by the relative sizes of σ and τ. ORderiv will be inflated in analysis of the derived endpoint if σ < τ, and biased downwardly if σ > τ. An unbiased estimate of the OR is achieved when σ = τ, for example, when there is no effect of treatment on any of the components (ie, sjk = tjk for all j, k). The maximum magnitude of bias due to one component being MCAR will be to increase or decrease the OR by a factor of α.
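The bound on the size of the bias can be made explicit with a short worked inequality. This is added as an illustration and assumes only that σ and τ are proportions in [0, 1] and that 0 < α ≤ 1:

$$
\alpha \le 1 - (1 - \alpha)\sigma \le 1
\quad \text{and} \quad
\alpha \le 1 - (1 - \alpha)\tau \le 1
\quad \Longrightarrow \quad
\alpha \le \frac{\mathrm{OR}_{\text{deriv}}}{\mathrm{OR}_{\text{full}}} = \frac{1 - (1 - \alpha)\sigma}{1 - (1 - \alpha)\tau} \le \frac{1}{\alpha}.
$$

The lower bound is attained when σ = 1 and τ = 0, and the upper bound when σ = 0 and τ = 1, so the OR is multiplied or divided by at most a factor of α.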

To illustrate this, suppose P (z1 = 1 | x = 1) = P (z2 = 1 | x = 1) = 0.7, P (z1 = 1 | x = 0) = P (z2 = 1 | x = 0) = 0.2, and z1 ⊥ z2 | x. Then σ = 0.21/0.91 = 0.23 and τ = 0.16/0.36 = 0.44. If 70% of data in z2 are MCAR (ie, α = 0.3), then ORderiv/ORfull = (1 − 0.7 × 0.23)/(1 − 0.7 × 0.44) ≈ 1.22, so the OR will be overestimated by 22%.
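The arithmetic in this illustration can be reproduced with a few lines of code. The sketch below is an assumption-laden check rather than the authors' code: it takes the component event probabilities to be 0.7 in the treatment arm and 0.2 in the control arm, which is the assignment that yields σ = 0.23 and τ = 0.44, and the function name is hypothetical.

```python
def sigma_tau_ratio(p_treat, p_ctrl, alpha):
    """Return (sigma, tau, OR_deriv / OR_full) from equation (11).

    Assumes z1 and z2 are independent within arm and share the same
    marginal event probability (p_treat or p_ctrl).
    """
    def share_of_01(p):
        p01 = (1 - p) * p      # z1 = 0, z2 = 1
        p10 = p * (1 - p)      # z1 = 1, z2 = 0
        p11 = p * p            # z1 = 1, z2 = 1
        return p01 / (p01 + p10 + p11)

    sigma = share_of_01(p_treat)
    tau = share_of_01(p_ctrl)
    ratio = (1 - (1 - alpha) * sigma) / (1 - (1 - alpha) * tau)
    return sigma, tau, ratio


sigma, tau, ratio = sigma_tau_ratio(p_treat=0.7, p_ctrl=0.2, alpha=0.3)
print(f"sigma = {sigma:.2f}, tau = {tau:.2f}, OR_deriv / OR_full = {ratio:.3f}")
```

A ratio above 1 corresponds to an inflated OR from the derived endpoint; swapping the two arm probabilities inverts the ratio and hence the direction of the bias.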

3.2.2. Analysis of complete records

Suppose the analysis is performed on participants with observed data in both components, that is, rz2 = 1. Then the distribution of the composite endpoint y among the complete records is the same as that when there are no missing data, as shown below.

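$$
P(y = 1 \mid r_{z_2} = 1, x = 1) = \frac{P(y = 1 \text{ and } r_{z_2} = 1 \mid x = 1)}{P(r_{z_2} = 1 \mid x = 1)} = \frac{\alpha (s_{01} + s_{10} + s_{11})}{\alpha} = s_{01} + s_{10} + s_{11} = P(y = 1 \mid x = 1). \tag{12}
$$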

It follows from (12) that, if the analysis discards participants with missing data in the incomplete component, the resulting estimated treatment effect will be unbiased.
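The contrast between the two analyses can also be seen in a small Monte Carlo experiment. The sketch below is illustrative only: the event probabilities (0.7 in the treatment arm, 0.2 in the control arm), the independence of z1 and z2 within arm, the response probability α = 0.3, the sample size, and the seed are all assumptions made for the example, and the function name is hypothetical.

```python
import math
import random


def simulate_log_ors(n_per_arm=100_000, alpha=0.3, p_treat=0.7, p_ctrl=0.2, seed=1):
    """Return log ORs from full data, complete records, and the derived endpoint."""
    rng = random.Random(seed)
    # counts[method][arm] = [number with y = 0, number with y = 1]
    counts = {m: {0: [0, 0], 1: [0, 0]} for m in ("full", "cra", "deriv")}
    for x, p in ((1, p_treat), (0, p_ctrl)):
        for _ in range(n_per_arm):
            z1 = rng.random() < p
            z2 = rng.random() < p
            z2_observed = rng.random() < alpha   # MCAR response indicator for z2
            y = int(z1 or z2)
            counts["full"][x][y] += 1
            if z2_observed:
                # Both components observed: used by CRA and by the derived analysis.
                counts["cra"][x][y] += 1
                counts["deriv"][x][y] += 1
            elif z1:
                # z2 missing but z1 = 1: y is derived as 1.
                counts["deriv"][x][1] += 1
            # z2 missing and z1 = 0: y cannot be derived, participant excluded.

    def log_or(c):
        return math.log((c[1][1] * c[0][0]) / (c[1][0] * c[0][1]))

    return {method: log_or(c) for method, c in counts.items()}


log_ors = simulate_log_ors()
for method, value in log_ors.items():
    print(f"{method:>5}: log OR = {value:.3f}")
```

Under these assumptions the complete record estimate should sit close to the full-data estimate, while the derived-endpoint estimate should be shifted by roughly the log of the ratio in (11).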

3.3. MI of the incomplete component

When data in z2 are missing (with z1 fully observed), MI can be performed either at the composite level, that is, y is imputed directly, or at the component level, that is, z2 is imputed first and then y is passively imputed from z1 and z2.

For MI at the composite level, y can be imputed whenever z2 is missing, regardless of the values of z1 (MI-CRA). Alternatively, y can be derived from the values of z1 first before the remaining missing (nonderivable) values in y are imputed (MI-Deriv).

Suppose the substantive analysis model is a logistic regression model for the composite endpoint y, conditional on randomized treatment x. Then x needs to be included in the imputation model for y to ensure compatibility between the imputation and analysis models.14

Specification of the imputation model at the component level, that is, when z2 is imputed, is more complex. Both the fully observed component z1 and randomized treatment x should be included in the imputation model for z2. However, the imputation model for z2 can be specified in several ways, by:

  • including x and z1 as main effects (MIC-main);

  • including z1 as main effect and stratifying the imputation by x, so that the association between z2 and z1 varies by x (MIC-x); or

  • stratifying the imputation by both x and z1, so that the distribution of z2 differs across strata defined by values of x and z1 (MIC-x-z1).

The correct specification of the imputation model depends on the true associations between z1, z2, and x. Note that in this example the last imputation model will never be mis-specified but, as usual, there is a balance between the ability to be unbiased for any given data generating mechanism, and the practical chance that the imputation model will not converge for a given sample size and data set. The simulation study presented in the next section explores these MI approaches in more detail.

4. Simulation Study

4.1. Design

4.1.1. Aims

We conducted a simulation study to explore the statistical properties of a set of methods for handling missing values in the components of a composite endpoint (described in Section 3.3), as well as to support our analytic results in Section 3.

4.1.2. Data generating mechanism

We considered the case of a randomized controlled trial in which participants are randomized by simple randomization with equal probability to either the treatment or control arm (denoted by x, taking values 1 or 0, respectively). For each participant, a binary composite endpoint y is constructed from three binary components z1, z2, z3; y and the zs take values 0 or 1. Two examples of how a composite endpoint may be constructed from three components, which we refer to as simple and complex composite endpoints, were considered, where

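$$
y_{\text{simple}} = \begin{cases} 1, & \text{if } z_1 = 1 \text{ or } z_2 = 1 \text{ or } z_3 = 1; \\ 0, & \text{if } z_1 = 0 \text{ and } z_2 = 0 \text{ and } z_3 = 0; \end{cases}
$$

and

$$
y_{\text{complex}} = \begin{cases} 1, & \text{if } z_1 = 1 \text{ and } (z_2 = 1 \text{ or } z_3 = 1); \\ 0, & \text{otherwise}. \end{cases}
$$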

When data in the components are completely observed, there are eight combinations of these components from which the values of y are determined (Table 1). In this simulation study, we first generated data in the components and then used them to construct the composite endpoint. To control the associations between the components, we defined a saturated log-linear model for the count of each combination c,

$$
\log(\mu_c) = \mu_0 + LP_c, \quad c = 1, \ldots, 8, \tag{13}
$$

where LPc is the linear predictor and μ0 is the intercept term included in the model for the counts to sum to the total number of participants. LPc can be written in terms of the components as

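$$
LP_c = \lambda_1 z_1 + \lambda_2 z_2 + \lambda_3 z_3 + \lambda_{12} z_1 z_2 + \lambda_{23} z_2 z_3 + \lambda_{13} z_1 z_3 + \lambda_{123} z_1 z_2 z_3, \tag{14}
$$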

where λ12, λ23, λ13 correspond to the pairwise log ORs between any two components when the remaining component takes value 0, and λ123 represents the interaction between any two components in a logistic regression model with the remaining component as the dependent variable.

Table 1. Simulation study: all possible combinations of the components for constructing the simple and complex composite endpoints, and associated linear predictors in the log-linear model for the combinations of components.
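| Combination c | z1 | z2 | z3 | ysimple | ycomplex | Linear predictor LPc for log(μc) |
|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 1 | 1 | 0 | λ3 |
| 3 | 0 | 1 | 0 | 1 | 0 | λ2 |
| 4 | 0 | 1 | 1 | 1 | 0 | λ2 + λ3 + λ23 |
| 5 | 1 | 0 | 0 | 1 | 0 | λ1 |
| 6 | 1 | 0 | 1 | 1 | 1 | λ1 + λ3 + λ13 |
| 7 | 1 | 1 | 0 | 1 | 1 | λ1 + λ2 + λ12 |
| 8 | 1 | 1 | 1 | 1 | 1 | λ1 + λ2 + λ3 + λ12 + λ23 + λ13 + λ123 |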

Then the probability of each combination is given by

$$
p_c = \frac{\exp(LP_c)}{\sum_{c'=1}^{8} \exp(LP_{c'})}. \tag{15}
$$

The expressions for the linear predictor corresponding to the eight combinations are presented in Table 1. It follows that the probability of meeting the composite endpoint is

$$
P(y_{\text{simple}} = 1) = \sum_{c=2}^{8} p_c; \tag{16}
$$
$$
P(y_{\text{complex}} = 1) = \sum_{c=6}^{8} p_c. \tag{17}
$$
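Equations (14) to (17) translate directly into code. The following is a minimal sketch in which the function names are hypothetical and the λ values are arbitrary placeholders, not the values in Supplementary Table S1.

```python
import math
from itertools import product


def combination_probabilities(lam):
    """Return the eight (z1, z2, z3) combinations in Table 1 order and their probabilities."""
    # product yields (0,0,0), (0,0,1), (0,1,0), ..., (1,1,1), ie, c = 1, ..., 8.
    combos = list(product((0, 1), repeat=3))
    weights = []
    for z1, z2, z3 in combos:
        lp = (lam["1"] * z1 + lam["2"] * z2 + lam["3"] * z3
              + lam["12"] * z1 * z2 + lam["23"] * z2 * z3 + lam["13"] * z1 * z3
              + lam["123"] * z1 * z2 * z3)               # equation (14)
        weights.append(math.exp(lp))
    total = sum(weights)
    return combos, [w / total for w in weights]          # equation (15)


def endpoint_probabilities(probs):
    """Return P(y_simple = 1) and P(y_complex = 1), equations (16) and (17)."""
    p_simple = sum(probs[1:])     # combinations c = 2, ..., 8
    p_complex = sum(probs[5:])    # combinations c = 6, 7, 8
    return p_simple, p_complex


# Placeholder lambda values for a single arm (NOT those of Supplementary Table S1).
lam_example = {"1": 0.2, "2": -0.5, "3": -0.5, "12": 0.3, "23": 0.3, "13": 0.3, "123": 0.0}
combos, probs = combination_probabilities(lam_example)
p_simple, p_complex = endpoint_probabilities(probs)
print(f"P(y_simple = 1) = {p_simple:.3f}, P(y_complex = 1) = {p_complex:.3f}")
```

Calling the functions once per arm with arm-specific λ values gives the arm-level event rates that determine the true treatment effect.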

We considered three cases for the associations between the components and randomized treatment, where

  I. λ123 = 0 in both treatment and control arms;

  II. λ123 = 0 in the treatment arm but ≠ 0 in the control arm;

  III. λ123 ≠ 0 in both arms, with a different value in each arm.

These cases were considered in order to assess the validity of MI at the component level under potential mis-specification of the imputation model.

In addition, we assumed that data in z1 are fully observed, while z2 and z3 contain missing values generated under three missingness mechanisms (described later in this section).

The procedure for generating complete data was as follows.

  • Generate Nsample = 2 000 complete values of a binary treatment variable x taking values 0 or 1 from the model
    $x \sim \text{Bernoulli}(p_x = 0.5)$,
    reflecting simple randomization, with the sample size chosen to reduce small-sample bias associated with logistic regression;
  • Separately for each treatment arm, generate a categorical variable c which takes values 1 to 8 from (15), with values of λs selected to give a control arm event rate of 0.57 and event rate in the intervention arm of 0.84 (Supplementary Table S1);

  • Generate three components from c with values corresponding to those in Table 1, that is,
    • z1 = 1 if c > 4; and 0 otherwise;
    • z2 = 1 if c = 3, 4, 7, 8; and 0 otherwise;
    • and z3 = 1 if c = 2, 4, 6, 8; and 0 otherwise;

Finally, generate a binary composite endpoint y taking values 0 or 1 from the three components zs (Table 1). With the values of λs given in Supplementary Table S1, the effect of treatment x on the composite endpoint y is given by

logit[P(y=1x)]=β0+βxx, (18)

where, for both simple and complex composite endpoints, β0 and βx are equal to 0.3 and 1.35, respectively.

Missing data were then introduced as follows.

  • Generate binary response indicators rzl for zl, l = 2, 3, from the following model
    $$
    \text{logit}[P(r_{z_l} = 1 \mid z_1, x)] = \alpha_0 + \alpha_x x + \alpha_{z_1} z_1 + \alpha_{x z_1} x z_1, \quad l = 2, 3, \tag{19}
    $$
    where
    • (i) α0 = 0.7, αx = αz1 = αxz1 = 0, corresponding to a MCAR mechanism;
    • (ii) α0 = 1.05, αx = -0.75, αz1 = 0.25, αxz1 = 0, corresponding to the first MAR mechanism (MAR1); and
    • (iii) α0 = 1.05, αx = -0.75, αz1 = 0.25, αxz1 = 0.25, corresponding to the second MAR mechanism (MAR2).

    Under each of these three missingness mechanisms, the probability of observing each component is around 0.7, and the probability of observing all components is around 0.49;

  • For l = 2, 3, set zl to missing if rzl = 0.
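The missingness step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' Stata code: the function and dictionary names are hypothetical, missing values are represented by None, and the response indicators for z2 and z3 are drawn independently given x and z1 (consistent with the stated probability of about 0.49 of observing both).

```python
import math
import random

# Coefficients of model (19) for the three missingness mechanisms.
MECHANISMS = {
    "MCAR": {"a0": 0.7, "ax": 0.0, "az1": 0.0, "axz1": 0.0},
    "MAR1": {"a0": 1.05, "ax": -0.75, "az1": 0.25, "axz1": 0.0},
    "MAR2": {"a0": 1.05, "ax": -0.75, "az1": 0.25, "axz1": 0.25},
}


def response_probability(x, z1, a0, ax, az1, axz1):
    """P(component observed | z1, x) from the logistic model (19)."""
    lp = a0 + ax * x + az1 * z1 + axz1 * x * z1
    return 1.0 / (1.0 + math.exp(-lp))


def set_missing(record, mechanism, rng):
    """Return a copy of record with z2 and z3 set to None where unobserved."""
    p = response_probability(record["x"], record["z1"], **MECHANISMS[mechanism])
    out = dict(record)
    for name in ("z2", "z3"):
        if rng.random() >= p:      # response indicator equals 0
            out[name] = None
    return out


rng = random.Random(2021)
example = {"x": 1, "z1": 0, "z2": 1, "z3": 0}
for mechanism in MECHANISMS:
    p_obs = response_probability(example["x"], example["z1"], **MECHANISMS[mechanism])
    print(mechanism, f"P(observed) = {p_obs:.3f}", set_missing(example, mechanism, rng))
```

Because z1 and x are never set to missing, the mechanisms labelled MAR1 and MAR2 depend only on fully observed variables at the component level.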

These steps were repeated Nrep = 2 000 times under each of the nine scenarios of cases I to III and missingness mechanisms MCAR, MAR1, MAR2, for simple and complex composite endpoints separately (Figure 1). The number of simulation repetitions was chosen to produce a Monte Carlo error of 0.5% on a coverage of 95%.
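The stated Monte Carlo error follows from the binomial standard error of an estimated coverage proportion. A minimal check, with a hypothetical function name:

```python
import math


def coverage_mcse(coverage, n_rep):
    """Monte Carlo standard error of an estimated coverage proportion."""
    return math.sqrt(coverage * (1 - coverage) / n_rep)


mcse = coverage_mcse(coverage=0.95, n_rep=2000)
print(f"Monte Carlo SE of coverage = {100 * mcse:.2f}%")
```

With 2 000 repetitions and a true coverage of 95%, this gives a Monte Carlo standard error of about half a percentage point.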

Figure 1. Simulation study: simulation scenarios for simple and complex composite endpoints; each combination in the dashed boxes was repeated independently Nrep = 2 000 times. x, randomized treatment; λ123, three-way interaction between the components in the log-linear model.

4.1.3. Estimands

The estimand is the log odds ratio βx for the treatment effect, whose true value is 1.35.

4.1.4. Methods of analysis

We compared the following methods for handling missing values in z2 and z3 (Table 2).

  1. CRA: perform a complete record analysis, excluding from analysis participants with missing values in either component;

  2. Deriv: derive y from the observed components when possible, exclude from analysis participants whose y cannot be derived and is considered missing;

  3. MI-CRA (MI of the composite endpoint): perform MI of y whenever a component is missing, regardless of whether y is derivable from the observed components. The imputation model for the composite endpoint is conditional on the randomized treatment x;

  4. MI-Deriv (MI of the composite endpoint): derive y from the observed components when possible, perform MI of y for the remaining missing values. The imputation model for the composite endpoint is conditional on the randomized treatment x;

  5. MIC-main (MI of the components): perform MI of z2 and z3 using MICE; the conditional model for each component includes the randomized treatment x, the fully observed component z1, and the other incomplete component as main effects; y is passively imputed from the observed and imputed components.

  6. MIC-x (MI of the components): perform MI of z2 and z3 using MICE; the conditional model for each component includes the fully observed component z1 and the other incomplete component as main effects, and imputation is stratified by randomized treatment x; y is passively imputed from the observed and imputed components.

  7. MIC-x-z1 (MI of the components): perform MI of z2 and z3 using MICE; the conditional model for each component includes the other incomplete component as main effect, and imputation is stratified by the fully observed component z1 and randomized treatment x; y is passively imputed from the observed and imputed components.

Table 2. Simulation study: methods for handling missing values in partially observed components z2 and z3. y, composite endpoint; x, randomized treatment; z1, fully observed component.
| Method | Variable(s) imputed | Imputation model predictors |
|---|---|---|
| CRA | | |
| Deriv | | |
| MI-CRA (a) | yCRA | x |
| MI-Deriv (a) | yderiv | x |
| MIC-main (b) | z2, z3 | z1, z2 or z3, x |
| MIC-x (b) | z2, z3 | z1, z2 or z3; stratified by x |
| MIC-x-z1 (b) | z2, z3 | z2 or z3; stratified by z1 and x |

(a) Univariate MI using logistic regression.

(b) MICE using logistic regression for conditional models.
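The derivation step used by Deriv and MI-Deriv amounts to evaluating the composite definition with three-valued logic, where a missing component leaves the result undetermined unless the observed components already settle it. The sketch below is an illustration with hypothetical function names; missing components are represented by None.

```python
def or3(a, b):
    """Three-valued OR: 1 if either is 1, 0 if both are 0, else None (undetermined)."""
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return None


def and3(a, b):
    """Three-valued AND: 0 if either is 0, 1 if both are 1, else None (undetermined)."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return None


def derive_simple(z1, z2, z3):
    """y_simple = 1 if any component is 1; 0 only if all three are observed as 0."""
    return or3(z1, or3(z2, z3))


def derive_complex(z1, z2, z3):
    """y_complex = 1 if z1 = 1 and (z2 = 1 or z3 = 1); 0 otherwise."""
    return and3(z1, or3(z2, z3))


# z1 observed, z2 missing, z3 observed as 0.
print(derive_simple(1, None, 0), derive_simple(0, None, 0))
print(derive_complex(0, None, 0), derive_complex(1, None, 0))
```

Participants for whom the function returns None are those excluded by Deriv and imputed by MI-Deriv, whereas CRA and MI-CRA treat every participant with any missing component as having a missing endpoint.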

For all MI methods, results from the imputed data sets were pooled using Rubin’s rules.17 From the chosen values of λs (Supplementary Table S1) the imputation model at the component level that is compatible with the substantive analysis model for case I is MIC-x; z2 was imputed from the following conditional model

$$
\text{logit}[P(z_2 = 1 \mid z_1, z_3, x)] = \gamma_0 + \gamma_1 z_1 + \gamma_3 z_3 + \gamma_x x + \gamma_{1x} z_1 x + \gamma_{3x} z_3 x,
$$

and similarly for z3, with z2 as predictor.

For cases II and III, the compatible MI strategy at the component level is MIC-x-z1. The following conditional model was used to impute z2 (and similarly for z3, with z2 as predictor)

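$$
\text{logit}[P(z_2 = 1 \mid z_1, z_3, x)] = \gamma_0 + \gamma_1 z_1 + \gamma_3 z_3 + \gamma_x x + \gamma_{13} z_1 z_3 + \gamma_{1x} z_1 x + \gamma_{3x} z_3 x + \gamma_{13x} z_1 z_3 x.
$$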

4.1.5. Performance measures

Bias, efficiency of β̂x (in terms of the empirical and average model standard errors), and coverage of 95% CIs were calculated for each of the nine simulation scenarios,18,19 with analyses of full data (ie, before any values in z2 and z3 are set to missing) provided for comparison. These performance measures are defined as follows.

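  • Bias: $E[\hat{\beta}] - \beta$;

  • Empirical standard error: $\sqrt{\text{Var}(\hat{\beta})}$;

  • Average model standard error: $\sqrt{E[\widehat{\text{Var}}(\hat{\beta})]}$;

  • Coverage: $P(\hat{\beta}_{\text{low}} \le \beta \le \hat{\beta}_{\text{upp}})$.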
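These performance measures can be estimated from the per-repetition point estimates and standard errors. The sketch below is a simplified illustration, not a reimplementation of simsum: the function name is hypothetical, the toy inputs are made up, and the intervals are normal-based Wald intervals, which is an assumption (intervals produced after MI typically use a t reference distribution).

```python
import math
import statistics


def performance(estimates, std_errors, true_value, z=1.96):
    """Estimate bias, empirical SE, average model SE, and coverage."""
    bias = statistics.fmean(estimates) - true_value
    emp_se = statistics.stdev(estimates)                     # SD of the estimates
    mod_se = math.sqrt(statistics.fmean([se ** 2 for se in std_errors]))
    covered = [
        est - z * se <= true_value <= est + z * se
        for est, se in zip(estimates, std_errors)
    ]
    coverage = sum(covered) / len(covered)
    return {"bias": bias, "emp_se": emp_se, "mod_se": mod_se, "coverage": coverage}


# Toy inputs standing in for the results of three simulation repetitions.
result = performance(
    estimates=[1.30, 1.42, 1.33],
    std_errors=[0.12, 0.11, 0.12],
    true_value=1.35,
)
print(result)
```

Comparing the empirical and average model standard errors indicates whether the model-based variance estimator is, on average, correctly sized.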

All simulations were performed in Stata/MP 15.1 (ref. 20); the code is available at https://github.com/mytrapham/misscomposite. The commands mi impute logit and mi impute chained were used for creating the imputations at the composite level and component level, respectively, and mi estimate for fitting the analysis model to the imputed data sets and pooling the results. Simulation results were analyzed using the community-contributed command simsum.19

4.2. Results

4.2.1. Simple composite endpoint

Simulation results for a simple composite endpoint are summarized graphically in Figures 2, 3, and 4 for β̂x (ie, our main estimand); results for β̂0 are presented in Supplementary Figures S1 to S3 for reference.

Figure 2. Simple composite endpoint, case I: performance measures for β̂x under different missingness mechanisms of the components; βx = 1.35. Error bars, ±1.96 × Monte Carlo errors; filled and hollow points, empirical and average model standard errors, respectively; vertical lines at 0 and 95 for bias and coverage, respectively.

Figure 3. Simple composite endpoint, case II: performance measures for β̂x under different missingness mechanisms of the components; βx = 1.35. Error bars, ±1.96 × Monte Carlo errors; filled and hollow points, empirical and average model standard errors, respectively; vertical lines at 0 and 95 for bias and coverage, respectively.

Figure 4. Simple composite endpoint, case III: performance measures for β̂x under different missingness mechanisms of the components; βx = 1.35. Error bars, ±1.96 × Monte Carlo errors; filled and hollow points, empirical and average model standard errors, respectively; vertical lines at 0 and 95 for bias and coverage, respectively.

Analysis of full data is unbiased with the smallest standard errors and coverage at the nominal 95% level. MI-CRA and MI-Deriv produce very similar results to CRA and analysis of the derived endpoint, respectively; hence, their results are not presented. This is because for MI at the composite level the imputation and analysis models are identical, and MI results only reflect additional Monte Carlo errors.

Case I: λ123(x = 1) = λ123(x = 0) = 0

CRA is unbiased when the components z2 and z3 are MCAR. Under the posited MAR mechanisms where the components are missing conditional on both z1 (fully observed) and randomized treatment x, the composite endpoint y is thus MNAR conditional on its values, in which case CRA provides biased estimates of βs as the theory suggests. If we instead consider a MAR mechanism where z2 and z3 are missing conditional only on randomized treatment x, then CRA will be unbiased.

Analysis of the derived endpoint is biased across all missingness mechanisms considered, consistent with the analytic results (Section 3). Bias is severe in both parameter estimates, apart from the log odds ratio β̂x under MCAR, where bias is minimal. This might be due to bias in the treatment and control log odds being cancelled out when used to calculate the log OR.

MI at the component level with randomized treatment x and fully observed component z1 as main effects in the conditional imputation models (MIC-main) is biased, as the two-way interactions between the components and randomized treatment are omitted from the imputation model. By contrast, MI at the component level with z1 as main effect and stratified by x (MIC-x) is unbiased, as it is the correct model in this scenario. Since MI at the component level stratified by both x and z1 (MIC-x-z1) is a more general model than MIC-x, it is also correct and unbiased. For scenarios where both MI at the component level and CRA are valid methods, MI is more efficient than CRA.
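To make the idea of stratified imputation concrete, the following is a minimal sketch for a single incomplete binary component, not the MICE procedure used in the simulation study. It assumes hypothetical records with keys `x`, `z1`, and `z2` (with `None` marking a missing `z2`), and it swaps the conditional logistic imputation models for a simpler Beta posterior draw within each (x, z1) stratum. The function name and the uniform prior are illustrative choices.

```python
import random

def impute_z2_stratified(records, rng):
    """Return one completed copy of records, imputing z2 within (x, z1) strata.

    Assumption: a saturated model per stratum with a uniform Beta(1, 1) prior,
    used here in place of the logistic models fitted by MICE.
    """
    # Collect the observed z2 values in each stratum.
    observed = {}
    for rec in records:
        if rec["z2"] is not None:
            observed.setdefault((rec["x"], rec["z1"]), []).append(rec["z2"])

    # Draw one event probability per stratum from its Beta posterior, so that
    # imputations reflect uncertainty about the stratum-specific probability.
    p_draw = {}
    for key in sorted({(rec["x"], rec["z1"]) for rec in records}):
        obs = observed.get(key, [])
        events = sum(obs)
        p_draw[key] = rng.betavariate(events + 1, len(obs) - events + 1)

    # Fill in missing values; observed values are left untouched.
    completed = []
    for rec in records:
        z2 = rec["z2"]
        if z2 is None:
            z2 = int(rng.random() < p_draw[(rec["x"], rec["z1"])])
        completed.append({**rec, "z2": z2})
    return completed

# Hypothetical toy data: two arms, fully observed z1, partially observed z2.
records = [
    {"x": 0, "z1": 0, "z2": 0}, {"x": 0, "z1": 0, "z2": None},
    {"x": 0, "z1": 1, "z2": 1}, {"x": 0, "z1": 1, "z2": None},
    {"x": 1, "z1": 0, "z2": 1}, {"x": 1, "z1": 0, "z2": None},
    {"x": 1, "z1": 1, "z2": 0}, {"x": 1, "z1": 1, "z2": None},
]
completed = impute_z2_stratified(records, random.Random(2021))
```

Because each stratum has its own probability, the imputation allows the distribution of z2 to differ freely across values of x and z1, which is the sense in which stratification is more general than including x and z1 as main effects. Repeating the call with different random states would give the multiple completed data sets.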

Case II: $\lambda_{123}(x=1) = 0 \neq \lambda_{123}(x=0)$; case III: $\lambda_{123}(x=1) \neq \lambda_{123}(x=0) \neq 0$

Results under cases II and III are similar to those seen under case I. While MIC-x-z1, which accounts for the three-way interaction between the components and randomized treatment in the conditional imputation models, is the only correct approach in these cases, bias in MIC-x appears to be minimal for both parameter estimates across the missingness mechanisms. Bias in MIC-x may be more apparent with other choices of parameter values in the data generating mechanism.

4.2.2. Complex composite endpoint

Simulation results for the complex composite endpoint are summarized graphically in Supplementary Figures S4 to S9 (for both β^x and β^0); they are similar to results for the simple composite endpoint. MI at the component level occasionally suffered from perfect prediction (often termed separation) when imputation was stratified by randomized treatment x and fully observed component z1 (MIC-x-z1); however, all occurrences of perfect prediction were overcome when augmentation was used in MI (via the specification of option augment in mi impute, Supplementary Table S2).21 This approach involves “augmenting” the data set by adding a few extra observations with small weights to the data during estimation of model parameters in a way that overcomes perfect prediction.21

5. Reanalysis of the TOPPS Trial

5.1. Methods of analysis

The composite endpoint in TOPPS was a simple composite endpoint constructed from 30 daily bleeding assessments, with an outcome event occurring if the participant experienced at least one bleeding event. We anticipated perfect prediction to be an issue when performing MI at the component level with 30 components. Thus, following what had been done in the original TOPPS analysis, we split the 30-day follow-up period into six time blocks, each of five days.

We considered two approaches for defining the completeness of these six blocks; the latter was how block-level completeness had been defined in the original TOPPS analysis.

  • Approach 1: each block was set to missing if bleeding status was missing for any of the five days;

  • Approach 2: each block was set to missing if bleeding status was missing for at least three of the five days.

Our main focus was missing data at block level. Since most of the missing data were at block level, we used relatively ad hoc methods to handle missing data within blocks. We handled missing data within blocks by a CRA approach (approach 1); as a sensitivity analysis we also derived the bleeding status for the blocks (approach 2). For blocks that were not set to missing (according to approaches 1 and 2), each block took value 1 if there was at least one bleeding event during the five days, and value 0 otherwise (ie, an initial block-wise derivation step in approach 2). These six blocks were then used to construct the composite endpoint, which took value 1 if any block took value 1, and 0 if all blocks took value 0.
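The block construction described above can be sketched as follows. This is an illustrative reading of approaches 1 and 2, not the code used in the reanalysis; the function names, the `max_missing` parameter, and the example participant are hypothetical. Daily assessments are coded 1 (bleed), 0 (no bleed), or `None` (missing).

```python
def block_value(days, max_missing):
    """Return 1/0 for a block, or None if too many daily assessments are missing."""
    n_missing = sum(d is None for d in days)
    if n_missing > max_missing:
        return None
    # Block-wise derivation: 1 if any observed day records a bleed, else 0.
    return int(any(d == 1 for d in days))

def make_blocks(daily, block_size=5, max_missing=0):
    """Split daily assessments into consecutive blocks.

    max_missing=0 mirrors approach 1 (any missing day sets the block to missing);
    max_missing=2 mirrors approach 2 (three or more missing days do).
    """
    return [block_value(daily[i:i + block_size], max_missing)
            for i in range(0, len(daily), block_size)]

def composite_cra(blocks):
    """Composite for complete record analysis: missing if any block is missing."""
    if any(b is None for b in blocks):
        return None
    return int(any(b == 1 for b in blocks))

def composite_deriv(blocks):
    """Derived composite: 1 if any observed block is 1, missing if still undetermined."""
    if any(b == 1 for b in blocks):
        return 1
    if any(b is None for b in blocks):
        return None
    return 0

# Hypothetical participant: day 1 missing, a bleed on day 7, days 11 to 13 missing.
daily = [0] * 30
daily[0] = None
daily[6] = 1
daily[10:13] = [None, None, None]

blocks_a1 = make_blocks(daily, max_missing=0)
blocks_a2 = make_blocks(daily, max_missing=2)
```

For this participant the composite is missing under CRA in both approaches, because at least one block is missing, whereas the derived endpoint equals 1 because a bleed is observed in the second block.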

In this reanalysis, we compared the following methods for handling missing values in the six time blocks: (i) CRA; (ii) Deriv; (iii) MI-CRA; (iv) MI-Deriv; (v) MIC-main; and (vi) MIC-trt. For MIC-main, we performed MI of the blocks using MICE, where the conditional imputation model for each block included the randomized treatment and other incomplete blocks as main effects. For MIC-trt, blocks were imputed using MICE; the conditional model for each block included other incomplete blocks as main effects, and imputation was stratified by the randomized treatment. Since none of the blocks were fully observed, MI at the component level stratified by the randomized treatment and fully observed component(s) (ie, a version of MIC-x-z1 in Section 4) was not relevant here. All MI methods were performed using 50 imputations and 20 burn-in cycles.

Initially, MIC-trt was performed using Stata’s mi impute chained (MIC-trt 1). However, perfect prediction led to nonconvergence in one of the imputations which caused MI to break down, and specifying the augment option did not help overcome this. We therefore considered two alternatives: (i) use the community-contributed command ice22 (MIC-trt 2); and (ii) use mi impute chained, but imputing each block conditional on two adjacent blocks instead of all other blocks (MIC-trt 3). These two alternatives successfully imputed missing values in the incomplete time blocks.

As in the original TOPPS analysis, our substantive analysis model was a generalized linear model for the composite endpoint (constructed from six time blocks) on randomized treatment, with an identity link and binomial family. For simplicity, minimization variables used in the original TOPPS analysis were not included in our substantive analysis and imputation models. Our estimand was the difference in proportions of participants who had bleeding events between the two treatment arms (no-prophylaxis versus prophylaxis platelet transfusion).
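With randomized treatment as the only covariate, a binomial model with an identity link is saturated, so (provided both arm proportions lie strictly between 0 and 1) its maximum likelihood estimate of the treatment coefficient should coincide with the simple difference in sample proportions. The sketch below computes that difference with an unpooled Wald interval. The function name and the counts in the example are made up for illustration, and the 90% default level is only a nod to the interval quoted for the original analysis.

```python
from math import sqrt
from statistics import NormalDist

def risk_difference(events1, n1, events0, n0, level=0.90):
    """Difference in proportions (arm 1 minus arm 0) with a Wald confidence interval."""
    p1 = events1 / n1
    p0 = events0 / n0
    rd = p1 - p0
    # Unpooled standard error: each arm contributes its own binomial variance.
    se = sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rd, se, (rd - z * se, rd + z * se)

# Hypothetical counts, not TOPPS data.
rd, se, (lower, upper) = risk_difference(50, 100, 40, 100)
```

With multiply imputed data, a function like this would be applied to each completed data set and the results combined afterwards.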

5.2. Results

Of the 600 participants, the majority did not have any missing bleeding assessments in any of the six time blocks (Supplementary Table S3). When treating a block as missing if any bleeding assessment was missing (ie, approach 1), 462 (77%) participants had complete data in all six time blocks, and 9 (2%) had missing data in all six time blocks. The remaining 129 (21%) participants had between one and five incomplete time blocks. The 462 (77%) participants with complete data were included in the CRA, while Deriv used data from 518 (86%) participants, those with complete data for all blocks, or at least one nonmissing block in which a bleeding event was recorded.

In approach 2 (ie, treating a block as missing if at least three of the five bleeding assessments were missing), 553 (92%) participants had complete data in all six time blocks, and 5 (1%) had missing data in all blocks. The rest of the participants (42; 7%) had between one and five incomplete time blocks. CRA included 553 (92%) participants with complete data; Deriv was performed on 576 (96%) participants whose endpoint was derivable from the observed time blocks.

Figure 5 presents the difference in proportions of participants who had bleeding events between the two treatment arms under different methods for handling missing bleeding events. The estimated proportions by randomized treatment are given in Supplementary Table S4. For MI methods, Monte Carlo errors for the estimated differences are less than 10% of the corresponding estimated standard errors with 50 imputations.
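As a rough illustration of how estimates from the imputed data sets can be combined and how a Monte Carlo error can be judged against the standard error, the sketch below applies Rubin's rules17 to per-imputation estimates and variances. It takes the Monte Carlo error of the pooled estimate to be the square root of B/m, where B is the between-imputation variance and m the number of imputations; this is one common definition and is an assumption here, as is the function name. The input numbers are invented.

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and squared standard errors by Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)           # pooled point estimate
    w = mean(variances)               # within-imputation variance
    b = variance(estimates)           # between-imputation variance (divisor m - 1)
    total = w + (1 + 1 / m) * b       # total variance
    return {
        "estimate": q_bar,
        "se": sqrt(total),
        "mc_error": sqrt(b / m),      # assumed definition of Monte Carlo error
    }

# Hypothetical risk differences and variances from three imputed data sets.
pooled = pool_rubin([0.08, 0.10, 0.09], [0.0016, 0.0016, 0.0016])
ratio = pooled["mc_error"] / pooled["se"]
```

The ratio of `mc_error` to `se` shrinks as the number of imputations grows, which is the kind of check behind statements that the Monte Carlo error is below a given fraction of the standard error.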

Figure 5.

TOPPS reanalysis: difference in proportions of participants who had bleeding events between the two treatment arms under different methods for handling missing bleeding events. MIC-trt 1, MI performed by mi impute chained, imputation of each block is conditional on all other blocks and stratified by randomized treatment; MIC-trt 2, MI performed by ice, imputation of each block is conditional on all other blocks and stratified by randomized treatment; MIC-trt 3, MI performed by mi impute chained, imputation of each block is conditional on two adjacent blocks and stratified by randomized treatment

Apart from Deriv and MI-Deriv, results are generally comparable across methods, which are also similar to the original TOPPS analysis result (risk difference 0.084, 90% CI 0.017 to 0.152). MI-CRA and MI-Deriv are similar to CRA and Deriv, respectively, as seen in Section 4. Deriv and MI-Deriv produce the largest estimated differences in both approaches, and are the only methods that are statistically significant under a superiority design (in approach 1). These results are in line with our analytic and simulation results for Deriv and MI-Deriv. MI methods performed at the component level produce estimates that are more efficient than CRA, with narrower CIs.

6. Discussion

When analyzing a binary composite endpoint with nonsimultaneously missing data in the components, a strategy frequently used in practice is to derive the endpoint from the observed components when possible and discard data from participants whose endpoint cannot be derived. By exploring the missingness mechanism of the derived endpoint both mathematically and by simulation, we showed that even when the components are MCAR, the composite endpoint derived from the observed components can be MNAR. As a result, an analysis of the derived endpoint will be biased. Omitting from analysis participants with missing data in the components (ie, a CRA) can reduce efficiency when the components are MCAR, and lead to bias when the components are MAR.
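A small worked calculation can make the first point tangible. The sketch below enumerates every combination of two independent binary components and their observation indicators under MCAR, and compares the true event probability with what CRA and the derived endpoint recover in a single arm. The event probability (0.3), observation probability (0.7), and independence of the components are illustrative assumptions, not the settings of the simulation study.

```python
from itertools import product

def derived(z, r):
    """Derived composite: 1 if any observed component is 1, 0 if all observed and 0."""
    if any(zi == 1 and ri == 1 for zi, ri in zip(z, r)):
        return 1
    if all(ri == 1 for ri in r):
        return 0
    return None  # cannot be derived from the observed components

def summarize(p_event=0.3, p_obs=0.7):
    mass = {}                  # probability mass for each (y, derived value) pair
    cra = {0: 0.0, 1: 0.0}     # mass of complete records by composite value
    for z1, z2, r1, r2 in product((0, 1), repeat=4):
        prob = 1.0
        for z in (z1, z2):
            prob *= p_event if z else 1 - p_event
        for r in (r1, r2):
            prob *= p_obs if r else 1 - p_obs   # MCAR: unrelated to z
        y = max(z1, z2)
        d = derived((z1, z2), (r1, r2))
        mass[(y, d)] = mass.get((y, d), 0.0) + prob
        if r1 and r2:
            cra[y] += prob
    p_y1 = sum(p for (y, _), p in mass.items() if y == 1)
    p_d1 = mass.get((1, 1), 0.0)
    p_d0 = mass.get((0, 0), 0.0)
    return {
        "true": p_y1,
        "cra": cra[1] / (cra[0] + cra[1]),
        "deriv": p_d1 / (p_d1 + p_d0),
        "missing_given_y1": mass.get((1, None), 0.0) / p_y1,
        "missing_given_y0": mass.get((0, None), 0.0) / (1 - p_y1),
    }

result = summarize()
```

Under these assumptions the true probability and the CRA proportion are both 0.51, while the derived endpoint gives about 0.61. The derived endpoint is missing with probability about 0.26 when y = 1 and 0.51 when y = 0, so its missingness depends on its own value even though the components are MCAR.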

Our simulation study compared a set of methods, readily available in common statistical software packages, for handling missing values in the components of a binary composite endpoint. MI is a natural approach, and performing MI at the component level is generally preferable. Imputing the incomplete components when they are MCAR can improve efficiency compared with a CRA or MI at the composite level (MI-CRA). Under complex MAR mechanisms of the components, valid inference can be achieved with MI at the component level. By defining a model for the relations between the components in the data generating mechanism of our simulation design, we demonstrated that the choice of imputation model for the incomplete components might not be straightforward. The correct choice depends on the interactions among the components and between the components and randomized treatment. In the scenarios examined in our simulation study, MICE with conditional imputation models for the incomplete components, stratified by the randomized treatment and fully observed component (ie, allowing for the distribution of the incomplete components to differ across strata defined by values of the randomized treatment and fully observed component), is generally preferred to the other specifications of MI under consideration.

For nonmonotone patterns of missing data, the two standard model-based MI approaches are MICE16 and joint model imputation;13 theoretical equivalence of these two approaches in certain settings has been explored previously.23,24 While MICE involves specifying a series of conditional imputation models for the incomplete variables, joint model imputation is commonly based on the specification of a multivariate normal distribution for the incomplete variables. Here our MI results were obtained using MICE for the incomplete binary components, but alternatively these components could be imputed using the joint model imputation approach. When joint model imputation is performed for incomplete binary variables, one approach is to treat them as continuous in the imputation model, which means the imputed variables can take values other than 0/1. An additional rounding step could be used, but some approaches to rounding have been shown to yield bias in certain settings.25,26 Thus, joint model imputation might not be appropriate for the incomplete binary components considered in our simulation study and the TOPPS trial. In addition, an advantage of MICE is that the method is more flexible in handling missing values in several variables of different types. Here we considered the setting where all incomplete variables to be imputed are binary components of the composite endpoint, but in practice we might also need to impute other incomplete variables which are, for example, continuous, alongside the binary components.

In this article, we explored a binary composite endpoint constructed from two or more binary components. Unlike the setting investigated by O’Keeffe et al9 (described in Section 1), we examined the scenario where the components are not always missing (MCAR/MAR) simultaneously, and thus the composite endpoint can be derived from the components depending on their observed values. This difference in the missingness pattern has implications for whether imputation should be performed at the composite or component level, as has been shown in our simulation study.

Although we did not consider a composite endpoint that is the time to the first of two or more events, whichever occurs first (as described in Section 1), our finding about potential bias associated with deriving the endpoint from observed components can still apply to this type of composite endpoint. MI at the component level is also possible, although it is potentially more complex since the imputation needs to be performed for both the time to event and event indicator.

In the reanalysis of the TOPPS trial, we chose to split the 30-day period into six time blocks of five days as had been done in the original analysis of the trial. Other ways of splitting the follow-up period into time blocks could also be considered. For example, in the most extreme case, we could even consider splitting this period into 30 blocks of one day; however, given the size of the TOPPS data set, performing MI of 30 components while allowing for the imputation to be stratified by randomized treatment would likely result in nonconvergence. In fact, even with six blocks of five days, convergence was not achieved for one of the methods considered (MIC-trt 1) under approach 1 used for defining the completeness of these six blocks (Figure 5, Section 5.2). The choice of block size requires balancing the ability to obtain unbiased estimates under a given data generating mechanism against the risk of nonconvergence of the imputation model for a given sample size and data set.

MI allows for the inclusion of auxiliary variables in the imputation model. Good candidates for auxiliary variables are those that are predictive of both the missing values and the probability of data being missing.27 Including these auxiliary variables in the imputation model will improve the plausibility of the MAR assumption and reduce bias. Auxiliary variables that are only predictive of the missing values can help to reduce the standard errors of estimates in the analysis model.27 In the reanalysis of the TOPPS trial, the inclusion of such auxiliary variables (if available) could improve the performance of MI, although whether additional interaction terms need to be specified in the conditional imputation models requires further exploration.

The reanalysis of the TOPPS trial suggested that results were relatively robust to the choice of method for handling missing values in the components (ie, six blocks of five daily bleeding assessments) of the composite endpoint. However, CRA produced the widest CI and represents a potential waste of resources. Compared with other methods under comparison, Deriv and MI-Deriv produced the largest estimated differences. They were also the only methods that changed the statistical significance of the results under a superiority design, which might be explained by the bias demonstrated in our analytic and simulation results. This bias can also negatively impact the results of a noninferiority analysis. In practice, bias associated with using the derived endpoint can potentially change the conclusion of the trial.

Our results highlighted the need to give careful consideration to the choice of method for handling missing data in the components when analyzing a composite endpoint. Although superficially attractive, an analysis of the derived endpoint should generally be avoided or used with extreme caution. Despite the risk of imputation model mis-specification, we showed that MI at the component level is the preferred approach in this study setting.

Supplementary Material

Supporting Information

Acknowledgements

TMP, IRW, BCK, and TPM were supported by the UK Medical Research Council (grant numbers MC_UU_12023/21 and MC_UU_00004/07). GF received support from the Nkateko trial, funded by the UK Medical Research Council under The Global Alliance for Chronic Diseases (GACD) programme (grant reference MR/JO16020/1), and was partially funded by the National Institute for Health Research (NIHR; NF-SI-0617-10120). The TOPPS trial was supported by grants from the National Health Service Blood and Transplant Research and Development Committee (PG04-05) and the Australian Red Cross Blood Service.

Funding information

Australian Red Cross Blood Service; Medical Research Council, Grant/Award Numbers: MC_UU_12023/21, MC_UU_00004/07, MR/JO16020/1; National Institute for Health Research, Grant/Award Number: NF-SI-0617-10120; NHS Blood and Transplant, Grant/Award Number: PG04-05

Data Availability Statement

The code and data used in the simulation study (Section 4) are available at https://github.com/mytrapham/misscomposite. The data used in the TOPPS reanalysis (Section 5) are available from the corresponding author of the original TOPPS publication (SJS) upon request; the code is available at https://github.com/mytrapham/misscomposite.

References

  • 1. Cordoba G, Schwartz L, Woloshin S, Bae H, Gøtzsche PC. Definition, reporting, and interpretation of composite outcomes in clinical trials: systematic review. BMJ. 2010;341(7769):381. doi: 10.1136/bmj.c3920.
  • 2. Ibrahim F, Tom BD, Scott DL, Prevost AT. A systematic review of randomised controlled trials in rheumatoid arthritis: the reporting and handling of missing data in composite outcomes. Trials. 2016;17(1):1–8. doi: 10.1186/s13063-016-1402-5.
  • 3. Furin J, Alirol E, Allen E, et al. Drug-resistant tuberculosis clinical trials: proposed core research definitions in adults. Int J Tuberc Lung Dis. 2016;20(3):290–294. doi: 10.5588/ijtld.15.0490.
  • 4. Bonnett LJ, Ken-Dror G, Davies GR. Quality of reporting of outcomes in phase III studies of pulmonary tuberculosis: a systematic review. Trials. 2018;19(1):1–7. doi: 10.1186/s13063-018-2522-x.
  • 5. Ferreira-González I, Busse JW, Heels-Ansdell D, et al. Problems with use of composite end points in cardiovascular trials: systematic review of randomised controlled trials. BMJ. 2007;334(7597):786–788. doi: 10.1136/bmj.39136.682083.AE.
  • 6. Quan H, Zhang D, Zhang J, Devlamynck L. Analysis of a binary composite endpoint with missing data in components. Stat Med. 2007;26:4703–4718. doi: 10.1002/sim.2893.
  • 7. Li X, Caffo B, Scharfstein D. On the potential for illogic with logically defined outcomes. Biostatistics. 2007;8(4):800–804. doi: 10.1093/biostatistics/kxm006.
  • 8. Daniel RM, Tsiatis AA. Efficient estimation of the distribution of time to composite endpoint when some endpoints are only partially observed. Lifetime Data Anal. 2013;19(4):513–546. doi: 10.1007/s10985-013-9261-9.
  • 9. O’Keeffe AG, Farewell DM, Tom BD, Farewell VT. Multiple imputation of missing composite outcomes in longitudinal data. Stat Biosci. 2016;8(2):310–332. doi: 10.1007/s12561-016-9146-z.
  • 10. Rombach I, Gray AM, Jenkinson C, Murray DW, Rivero-Arias O. Multiple imputation for patient reported outcome measures in randomised controlled trials: advantages and disadvantages of imputing at the item, subscale or composite score level. BMC Med Res Methodol. 2018;18(1):1–16. doi: 10.1186/s12874-018-0542-6.
  • 11. Li X, Caffo BS. Comparison of proportions for composite endpoints with missing components. J Biopharm Stat. 2011;21(2):271–281. doi: 10.1080/10543406.2011.550109.
  • 12. Meng XL. Multiple-imputation inferences with uncongenial sources of input. Stat Sci. 1994;9(4):538–558.
  • 13. Schafer JL. Analysis of Incomplete Multivariate Data. Chapman & Hall/CRC Press; London, UK: 1997.
  • 14. Bartlett JW, Seaman SR, White IR, Carpenter JR. Multiple imputation of covariates by fully conditional specification: accommodating the substantive model. Stat Methods Med Res. 2015;24(4):462–487. doi: 10.1177/0962280214521348.
  • 15. Stanworth SJ, Estcourt LJ, Powter G, et al. A no-prophylaxis platelet-transfusion strategy for hematologic cancers. New Engl J Med. 2013;368(19):1771–1780. doi: 10.1056/NEJMoa1212772.
  • 16. van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med. 1999;18:681–694. doi: 10.1002/(sici)1097-0258(19990330)18:6<681::aid-sim71>3.0.co;2-r.
  • 17. Rubin DB. Multiple Imputation for Nonresponse in Surveys. Wiley; New York, NY: 1987.
  • 18. Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38:2074–2102. doi: 10.1002/sim.8086.
  • 19. White IR. simsum: analyses of simulation studies including Monte Carlo error. Stata J. 2010;10(3):369–385.
  • 20. StataCorp. Stata Statistical Software: Release 15. College Station, TX: StataCorp LP; 2017.
  • 21. White IR, Daniel R, Royston P. Avoiding bias due to perfect prediction in multiple imputation of incomplete categorical variables. Comput Stat Data Anal. 2010;54:2267–2275. doi: 10.1016/j.csda.2010.04.005.
  • 22. Royston P, White IR. Multiple imputation by chained equations (MICE): implementation in Stata. J Stat Softw. 2011;45(4):1–20.
  • 23. van Buuren S. Multiple imputation of discrete and continuous data by fully conditional specification. Stat Methods Med Res. 2007;16:219–242. doi: 10.1177/0962280206074463.
  • 24. Hughes RA, White IR, Seaman SR, Carpenter JR, Tilling K, Sterne JAC. Joint modelling rationale for chained equations. BMC Med Res Methodol. 2014;14:28. doi: 10.1186/1471-2288-14-28.
  • 25. Horton NJ, Lipsitz SR, Parzen M. A potential for bias when rounding in multiple imputation. Am Stat. 2003;57(4):229–232. doi: 10.1198/0003130032314.
  • 26. Bernaards CA, Belin TR, Schafer JL. Robustness of a multivariate normal approximation for imputation of incomplete binary data. Stat Med. 2007;26:1368–1382. doi: 10.1002/sim.2619.
  • 27. Carpenter JR, Kenward MG. Multiple Imputation and Its Application. 1st ed. John Wiley & Sons, Ltd; Chichester, West Sussex: 2013.
