Author manuscript; available in PMC: 2022 Mar 7.
Published in final edited form as: J Am Stat Assoc. 2021 Apr 27;116(536):1700–1712. doi: 10.1080/01621459.2021.1900859

A Bayesian Hierarchical CACE Model Accounting for Incomplete Noncompliance With Application to a Meta-analysis of Epidural Analgesia on Cesarean Section

Jincheng Zhou 1,*, James S Hodges 2, Haitao Chu 2,*
PMCID: PMC8901124  NIHMSID: NIHMS1687558  PMID: 35261417

Abstract

Noncompliance with assigned treatments is a common challenge in analyzing and interpreting randomized clinical trials (RCTs). One way to handle noncompliance is to estimate the complier-average causal effect (CACE), the intervention’s efficacy in the subpopulation that complies with assigned treatment. In a two-step meta-analysis, one could first estimate CACE for each study, then combine them to estimate the population-averaged CACE. However, when some trials do not report noncompliance data, the two-step meta-analysis can be less efficient and potentially biased by excluding these trials. This paper proposes a flexible Bayesian hierarchical CACE framework to simultaneously account for heterogeneous and incomplete noncompliance data in a meta-analysis of RCTs. The models are motivated by and used for a meta-analysis estimating the CACE of epidural analgesia on cesarean section, in which only 10 of 27 trials reported complete noncompliance data. The new analysis includes all 27 studies and the results present new insights on the causal effect after accounting for noncompliance. Compared to the estimated risk difference of 0.8% (95% CI: −0.3%, 1.9%) given by the two-step intention-to-treat meta-analysis, the estimated CACE is 4.1% (95% CrI: −0.3%, 10.5%). We also report simulation studies to evaluate the performance of the proposed method.

Keywords: Bayesian methods, causal effect, missing data, randomized trial, meta-analysis

1. Introduction

Well-conducted randomized controlled trials (RCTs) are considered the hallmark of evidence-based medicine and the gold standard for evaluating efficacy and safety in clinical research. However, noncompliance with treatment assignment and missing data occur frequently in clinical trials and can affect their validity. Noncompliance occurs when some participants do not take or receive their assigned treatments. Missing outcome or compliance status happens when study investigators do not collect those items on some subjects because of loss to follow-up or other reasons. Ignoring noncompliance or missing data may lead to biased estimates of causal effects in the standard intention-to-treat (ITT) analysis.

Many methods have been developed for analyzing a single study with noncompliance, or noncompliance together with missing outcome data. Baker and Lindeman (1994) and Angrist et al. (1996) independently estimated the effect of treatment using the latent class instrumental variable (IV) method. Later on, Frangakis and Rubin (2002) proposed a principal stratification framework to estimate the complier average causal effect (CACE) with binary compliance status. An extensive literature uses this framework to estimate CACE for different types of outcome in a single study with noncompliance (Yau and Little, 2001; Ye et al., 2014; Cheng, 2009). When a study has both noncompliance and missing outcome data, the CACE approach can still be used but further assumptions about the missing data mechanism are required. One commonly used assumption is “latent ignorability” (LI), which means that the missing data are missing at random conditional on compliance status, i.e., missingness has no residual dependence on the outcomes, given the observed data and the latent unobserved compliance classes. Under this assumption, several models that accommodate missing outcomes have been developed for inference about CACE (O’Malley and Normand, 2005; Peng et al., 2004). Chen et al. (2009) discussed identifiability and estimation of CACE with missing outcome data under a nonignorability assumption, i.e., when the missing data mechanism depends on the unobserved outcome. Analytical strategies for handling noncompliance are also increasingly used (Jo et al., 2010; Stuart et al., 2008), although not as widely as missing data methods.

Although inference for a clinical trial with noncompliance or missing data has been well studied, little attention has been paid to handling both missing data and noncompliance in a meta-analysis. Meta-analysis, the statistical approach for synthesizing evidence from multiple studies, is gaining popularity in many fields due to the rapid growth of interest in comparative effectiveness research and evidence-based medicine (Egger et al., 2008). While multivariate and network meta-analysis (NMA) methods have been developed recently for meta-analyses of data consisting of multiple outcomes, multiple treatments, or multiple diagnostic tests (Lumley, 2002; Jackson et al., 2011; Zhang et al., 2014; Riley et al., 2017; Ma et al., 2018; Lian et al., 2019), important research gaps remain in meta-analysis in the area of causal inference. In particular, researchers have only recently started investigating causal effects in meta-analysis accounting for noncompliance (Baker and Kramer, 2005; Baker et al., 2016).

When noncompliance data are reported in each trial, intuitively one can first estimate CACE for each study, then combine these estimates using a meta-analytic method such as a common effect, fixed effects, or random effects model to estimate the population-averaged CACE. We call this naive method a “two-step” approach. The two-step approach — which can be viewed as a special case of a model using only trials with complete noncompliance data — can be less efficient and potentially biased because it excludes trials without noncompliance data. In a meta-analysis of randomized clinical trials, Zhou et al. (2019) proposed a Bayesian hierarchical model to estimate the CACE accounting for heterogeneous noncompliance. However, trials that do not report noncompliance data must be excluded, potentially leading to less efficient and biased estimates (Baker, 2020; Zhou et al., 2020).

In real meta-analyses, it is common that some trials do not report noncompliance data because they may not have been reported in the primary analysis. The present paper’s motivating study, a meta-analysis by Bannister-Tyrrell et al. (2015), has full compliance data reported for only 10 of 27 studies. Their goal was to estimate the causal effect of epidural analgesia in labor on the occurrence of cesarean section, but their analysis included only 9 studies with full compliance data and non-zero cesarean section events. Our proposed Bayesian hierarchical model framework aims to include studies that do not report noncompliance data and studies with zero events. This is the first paper dealing with this important issue. The main purposes are 1) to develop a flexible statistical framework that uses noncompliance data that is both heterogeneous across studies and incomplete in some studies, in a meta-analysis of RCTs with ordinal or binary outcomes; 2) to apply the method to a meta-analysis estimating the CACE of epidural analgesia on cesarean section, and compare it with the traditional two-step ITT meta-analysis.

The rest of this article is organized as follows. Section 2 describes the motivating case study of epidural analgesia, in which noncompliance varies between studies and compliance status was missing for 17 of 27 studies. Section 3 first presents the assumptions for estimating the causal effect and for missingness, then describes the Bayesian hierarchical model and how to compute the posterior distributions for the overall and study-specific CACEs. Section 4 applies the model to the epidural analgesia case study using a particular approach to model selection and presents an analysis of the results’ sensitivity to the missing data assumptions. Section 5 reports simulation studies evaluating the proposed approach under a variety of conditions. Finally, Section 6 discusses our findings and potential extensions in future work.

2. A Motivating Meta-analysis of the Effect of Epidural Analgesia on Cesarean Section

2.1. Data Sources

Epidural analgesia in labor is a highly effective method of labor pain relief but it remains controversial whether epidural analgesia in labor increases the risk of cesarean section delivery. Solid evidence to support or refute this association is still limited, mainly because RCTs in obstetrics often have high rates of noncompliance.

In this setting, the consequences of receiving epidural analgesia are more important to clinicians and patients than the impact of being assigned to it. The ITT analysis, which estimates the difference in cesarean section risk between women randomized to epidural analgesia versus control, can therefore give a biased estimate of the effect of receiving epidural analgesia because of differential noncompliance. Bannister-Tyrrell et al. (2015) conducted an exploratory meta-analysis of the association between epidural analgesia in labor and cesarean section using the 9 trials, out of the 27 RCTs included in their systematic review, that have full compliance data with non-zero events.

Data were recorded on treatment assignment r (r = 1 for epidural analgesia, r = 0 for no/other analgesia in labor), actual received intervention t (t = 1 for epidural analgesia, t = 0 for no/other analgesia in labor), and frequency of cesarean section o (o = 1 for yes, o = 0 for no) by compliance with the assigned intervention, where noncompliance describes participants who were randomly assigned to receive epidural analgesia in labor but who in fact received either another or no analgesia, or who were assigned to the control group but ultimately received epidural analgesia in labor. Then for study i (i = 1,2, …,I), the count Nirto denotes the number of patients in randomization group r who received intervention t and had outcome o.

The cesarean section event rates and noncompliance rates vary substantially between trials as the inclusion and exclusion criteria, labor management strategies, etc. differ between trials. In the 27 RCTs, 4,459 women were assigned to receive epidural analgesia and 4,426 were assigned to receive non-epidural or no analgesia. Complete data were available on the cesarean outcome, with 470 cesarean deliveries in women assigned to the epidural and 419 cesarean deliveries in women assigned to non-epidural or no analgesia.

However, complete data on the number of cesarean sections in the compliant and noncompliant groups were available for only 10 studies, and data on noncompliance status per randomization group were only partly available for 13 of the 27 RCTs. We use t = ∗ to denote that the actually received intervention is missing, and reorganize the available complete data and marginal data in Table 1. If $N_{irto}$ is available for each t ∈ {0, 1}, the corresponding marginal count $N_{ir*o}$ is set to 0; otherwise, if the actually received intervention for arm r of study i is missing, only the marginal data $N_{ir*o}$ are shown in the table.

Table 1:

Data from randomized controlled trials of epidural analgesia in labor

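Complete-data columns give counts $N_{irto}$ by allocated arm r (0 = control, 1 = epidural), received treatment t (0 = control, 1 = epidural), and cesarean outcome o (0 = no, shown as −; 1 = yes, shown as +). Missing-data columns give the marginal counts $N_{ir*o}$ for arms in which the received treatment was not reported.

| Study | Author, Year | $N_{i000}$ (−) | $N_{i001}$ (+) | $N_{i010}$ (−) | $N_{i011}$ (+) | $N_{i100}$ (−) | $N_{i101}$ (+) | $N_{i110}$ (−) | $N_{i111}$ (+) | $N_{i0*0}$ (−) | $N_{i0*1}$ (+) | $N_{i1*0}$ (−) | $N_{i1*1}$ (+) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bofill, 1997 † | 37 | 2 | 11 | 1 | 2 | 0 | 42 | 5 | 0 | 0 | 0 | 0 |
| 2 | Clark, 1998 † | 72 | 6 | 68 | 16 | 7 | 2 | 134 | 13 | 0 | 0 | 0 | 0 |
| 3 | Dickinson, 2002 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 428 | 71 | 408 | 85 |
| 4 | Evron, 2008 | 40 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 129 | 19 |
| 5 | El Kerdawy, 2010 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 3 | 11 | 4 |
| 6 | Gambling, 1998 | 0 | 0 | 0 | 0 | 206 | 10 | 371 | 29 | 573 | 34 | 0 | 0 |
| 7 | Grandjean, 1979 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 59 | 1 | 30 | 0 |
| 8 | Halpern, 2004 † | 62 | 5 | 44 | 7 | 0 | 0 | 112 | 12 | 0 | 0 | 0 | 0 |
| 9 | Head, 2002 † | 51 | 7 | 2 | 0 | 3 | 0 | 43 | 10 | 0 | 0 | 0 | 0 |
| 10 | Hogg, 2000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 46 | 6 | 46 | 7 |
| 11 | Howell, 2001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 169 | 16 | 171 | 13 |
| 12 | Jain, 2003 † | 72 | 11 | 0 | 0 | 0 | 2 | 36 | 7 | 0 | 0 | 0 | 0 |
| 13 | Long, 2003 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 44 | 6 | 29 | 1 |
| 14 | Loughnan, 2000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 270 | 40 | 268 | 36 |
| 15 | Lucas, 2001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 304 | 62 | 309 | 63 |
| 16 | Muir, 1996 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 2 | 25 | 3 |
| 17 | Muir, 2000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 9 | 86 | 11 |
| 18 | Nafisi, 2006 † | 179 | 19 | 0 | 0 | 0 | 0 | 173 | 24 | 0 | 0 | 0 | 0 |
| 19 | Nikkola, 1997 † | 6 | 0 | 4 | 0 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 |
| 20 | Philipsen, 1989 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 48 | 6 | 47 | 10 |
| 21 | Ramin, 1995 † | 546 | 17 | 95 | 8 | 230 | 2 | 393 | 39 | 0 | 0 | 0 | 0 |
| 22 | Sharma, 1997 † | 336 | 16 | 5 | 0 | 114 | 1 | 231 | 12 | 0 | 0 | 0 | 0 |
| 23 | Sharma, 2002 | 0 | 0 | 0 | 0 | 11 | 1 | 199 | 15 | 213 | 20 | 0 | 0 |
| 24 | Shifman, 2007 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 32 | 18 | 45 | 15 |
| 25 | Thalme, 1974 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 4 | 8 | 6 |
| 26 | Thorp, 1993 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 44 | 1 | 36 | 12 |
| 27 | Volmanen, 2008 † | 23 | 1 | 3 | 0 | 1 | 0 | 23 | 1 | 0 | 0 | 0 | 0 |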

The † indicates that the corresponding study has complete data on compliance status.

2.2. Analysis of Event Rates and Noncompliance Rates

Bannister-Tyrrell et al. (2015) estimated the effect of epidural analgesia in labor on cesarean section using the basic ITT analysis on all of the 27 RCTs, and also using the IV analysis but including only the 9 studies with complete data on the number of cesarean sections in compliant and noncompliant participants. Zhou et al. (2019) estimated the CACE using the 10 studies that reported full compliance data. In this paper, we further investigate whether the studies with incomplete data provide extra information about the causal effect of epidural analgesia.

The ITT meta-analysis of the 27 RCTs gave a pooled risk ratio 1.10 (95% confidence interval: 0.97, 1.25; P=0.071) for cesarean section following epidural analgesia in labor, which implies that epidural analgesia in labor does not increase the risk of cesarean section. However, due to high rates of noncompliance, an ITT meta-analysis may not be a good way to estimate the effect of receiving epidural analgesia. The ITT meta-analysis pooled effect is potentially biased, especially when noncompliance reporting cannot be assumed to be random with respect to the outcome in the meta-analysis. To investigate the association between the ITT event rates and the noncompliance rates, we used a bivariate generalized linear mixed effects model (BGLMM) (Chu et al., 2012) to do the analysis because several studies had 0 events or 0 noncompliance. The BGLMM assumes a bivariate normal distribution of probabilities in the two groups (p1i, p0i) in a transformed scale, where the probabilities can be either event rates (p1i = P(oi = 1|ri = 1), p0i = P(oi = 1|ri = 0)) or noncompliance rates (p1i = P(ti = 0|ri = 1), p0i = P(ti = 1|ri = 0)). Specifically, we use a probit random effects model specified as:

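$$\Phi^{-1}(p_{1i}) = u + \eta_{1i}, \quad \Phi^{-1}(p_{0i}) = v + \eta_{0i}, \quad (\eta_{1i}, \eta_{0i})^{T} \sim \mathrm{MVN}(0, \Sigma_{\eta}). \tag{1}$$

In this model, $\Phi(\cdot)$ is the standard Gaussian cumulative distribution function, $(\eta_{1i}, \eta_{0i})$ are random effects, and the covariance matrix is

$$\Sigma_{\eta} = \begin{pmatrix} \sigma_u^2 & \rho\sigma_u\sigma_v \\ \rho\sigma_u\sigma_v & \sigma_v^2 \end{pmatrix}.$$

We chose the probit link because it has a closed-form formula for the marginal probabilities, $E(p_{1i}) = \Phi\left(u/\sqrt{1+\sigma_u^2}\right)$ and $E(p_{0i}) = \Phi\left(v/\sqrt{1+\sigma_v^2}\right)$, based on Equation (1).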

We did a Bayesian analysis using JAGS (Plummer, 2003) to draw Markov chain Monte Carlo (MCMC) samples from the joint posterior distribution. We assigned vague priors N(0, 1000) to the fixed effects u, v, and the commonly-used inverse Wishart distribution InvW(I, ν = 3) to the covariance matrix Ση, where I is the identity matrix. The cesarean section event rates (p1i = P(oi = 1|ri = 1), p0i = P(oi = 1|ri = 0)), and the noncompliance rates (p1i = P(ti = 0|ri = 1), p0i = P(ti = 1|ri = 0)) were analyzed separately using the model in Equation (1). After 10,000 burn-in samples, 40,000 posterior samples were drawn. The overall estimates E(p1i) and E(p0i) were calculated using the closed-form formula shown above. We present MCMC results as posterior medians followed by 95% equal-tail credible interval (CrI) in brackets for the rest of this article. The marginal probability of having a cesarean section in patients assigned to epidural analgesia was estimated as 12.9% (9.9%, 17.0%), while in those assigned to no/other analgesia it was 11.3% (8.5%, 15.0%). Also, the noncompliance rate in the epidural analgesia arm E{P(ti = 0|ri = 1)} was 15.6% (5.4%, 29.0%), while in the no/other analgesia arm E{P(ti = 1|ri = 0)} was 13.8% (3.4%, 31.3%).
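The closed-form marginal probability under the probit random effects model can be checked numerically. The following is a minimal sketch, not the authors' code: the function names and the values of u and σu are hypothetical and chosen only for illustration. It compares Φ(u/√(1+σu²)) with a direct numerical integration of Φ(u + η) over η ~ N(0, σu²).

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard Gaussian CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def marginal_prob_closed_form(u: float, sigma: float) -> float:
    """E(p) when Phi^{-1}(p) = u + eta and eta ~ N(0, sigma^2)."""
    return std_normal_cdf(u / math.sqrt(1.0 + sigma ** 2))


def marginal_prob_numeric(u: float, sigma: float, n_grid: int = 20001) -> float:
    """Integrate Phi(u + eta) against the N(0, sigma^2) density (trapezoid rule)."""
    lo, hi = -10.0 * sigma, 10.0 * sigma
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        eta = lo + k * step
        density = math.exp(-0.5 * (eta / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * std_normal_cdf(u + eta) * density
    return total * step


# Hypothetical values for illustration only; these are not estimates from the paper.
u_example, sigma_example = -1.2, 0.4
closed = marginal_prob_closed_form(u_example, sigma_example)
numeric = marginal_prob_numeric(u_example, sigma_example)
```

The two quantities should agree to within numerical integration error, which is why the probit link allows the overall rates to be computed directly from each MCMC draw of (u, σu).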

Figure 1 shows the study-specific posterior medians and 95% CrIs for the cesarean section event rates (horizontal lines) and noncompliance rates (vertical lines) in both the epidural analgesia arm (dashed line) and the control arm (solid line). Noncompliance rates show somewhat different patterns in the two randomization groups: as the event rate increases, the noncompliance rate tends to be higher, but this trend is more obvious in the control groups. Arguably, the association is in the opposite direction for the treated groups.

Figure 1:

Study-specific event rates vs. noncompliance rates in studies of epidural analgesia in labor. Coordinates of each dot are the posterior medians of the study-specific event rate and noncompliance rate. Horizontal lines represent the 95% CrI of the study-specific cesarean section event rate. Vertical lines represent the 95% CrI of the study-specific noncompliance rate. Dashed lines show results in the epidural analgesia arm, while solid lines show results in the no/other analgesia group. The horizontal axis has a logarithmic scale.

The relationship between these two rates motivates us to develop a causal inference meta-analysis framework for the treatment effects, rather than use the ITT meta-analysis ignoring noncompliance. However, the existing complier average causal effect (CACE) framework needs complete information on compliance for each study. With completely or partially missing data on compliance in many studies, we aim to develop a new method that can use all studies and still have a valid causal interpretation. We introduce this method in Section 3 by first defining essential notation and assumptions.

3. Statistical Methods

3.1. Definition of the Complier Average Causal Effect (CACE)

3.1.1. Notation

In a meta-analysis with $I$ two-armed randomized trials, $N_i$ is the number of subjects in the $i$-th trial, where $N_{i0}$ is the number randomly assigned to the control/placebo group and $N_{i1}$ to the active treatment group. Let $R_{ij} = r$ index the randomization assignment for subject $j$ in study $i$, with $r = 0$ for assignment to control and $r = 1$ for assignment to treatment. Let $T_{ij}^{r} = t \in \{0, 1\}$ be the potential treatment received under randomization assignment $r$, where $t = 1$ indicates receiving the active treatment and $t = 0$ placebo. Let $Y_{ij}^{r,t} = o \in \{1, 2, \ldots, O\}$ be the potential outcome under randomization assignment $r$ and treatment received $t$ for the $j$-th subject in the $i$-th trial. Note that the sets $\{Y_{ij}^{r,t}\}$ and $\{T_{ij}^{r}\}$ are the potential outcomes and treatment-received statuses under all possible $r$ and $t$, but for each subject in a trial, only one of the possible values of each set can be observed. Therefore, we denote the observed response and received treatment variables as $Y_{ij}$ and $T_{ij}$ for the $j$-th subject in the $i$-th trial. We allow $T_{ij} = *$ if the actual received treatment is not recorded, and $Y_{ij} = *$ if the outcome is not recorded for the $j$-th patient in the $i$-th study. Then we let $M_i$ be the $N_i$-dimensional vector of missingness indicators for all subjects in trial $i$, with individual element $M_{ij} = m$ indicating whether subject $j$ has actual treatment received status on record ($m = 0$) or missing ($m = 1$).

Following Imbens and Rubin (1997), we let Cij be the latent compliance class of the j-th patient in the i-th trial, defined as follows:

  1. $C_{ij} = 0$, never-taker, if $(T_{ij}^{0}, T_{ij}^{1}) = (0, 0)$, i.e., subjects who would receive control if randomized to either group;

  2. $C_{ij} = 1$, complier, if $(T_{ij}^{0}, T_{ij}^{1}) = (0, 1)$, i.e., subjects who would receive the intervention to which they were randomized;

  3. $C_{ij} = 2$, always-taker, if $(T_{ij}^{0}, T_{ij}^{1}) = (1, 1)$, i.e., subjects who would receive active treatment if randomized to either group;

  4. $C_{ij} = 3$, defier, if $(T_{ij}^{0}, T_{ij}^{1}) = (1, 0)$, i.e., subjects who would receive the intervention opposite to their randomized assignment.

A subject’s compliance status $C_{ij}$ is not observable because, in a two-arm trial, only one of $T_{ij}^{0}$ and $T_{ij}^{1}$ can be observed. Based on the observed randomization group and actual treatment received, the compliance classes can only be partially identified (see Table 2, columns $R_{ij}$, $T_{ij}$, and $C_{ij}$).
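The partial identification of compliance classes can be made concrete with a small lookup. This is an illustrative sketch only; the names `CLASS_BY_POTENTIAL_TREATMENT` and `compatible_classes` are hypothetical. The optional `monotonicity` flag anticipates the monotonicity assumption stated in Section 3.1.2, which rules out defiers.

```python
from typing import Dict, Set, Tuple

# Class codes follow the text: 0 never-taker, 1 complier, 2 always-taker, 3 defier.
# Keys are the pairs (treatment received if assigned control, if assigned treatment).
CLASS_BY_POTENTIAL_TREATMENT: Dict[Tuple[int, int], int] = {
    (0, 0): 0,
    (0, 1): 1,
    (1, 1): 2,
    (1, 0): 3,
}


def compatible_classes(r: int, t: int, monotonicity: bool = False) -> Set[int]:
    """Return the latent classes consistent with observed assignment r and treatment t.

    A class is compatible when its potential treatment under assignment r equals t.
    """
    classes = {
        label
        for potential, label in CLASS_BY_POTENTIAL_TREATMENT.items()
        if potential[r] == t
    }
    if monotonicity:
        classes.discard(3)  # no defiers under monotonicity
    return classes


observed_groups = {(r, t): compatible_classes(r, t) for r in (0, 1) for t in (0, 1)}
```

Each observed (r, t) group mixes two latent classes, so no group identifies a single class unless a further assumption removes one of them.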

Table 2:

Observed groups, latent compliance classes and outcome probabilities of trial i

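| $R_{ij}$ | $T_{ij}$ | $C_{ij}$ | $Y_{ij} = o \in \{1, \ldots, O\}$ | Count |
|---|---|---|---|---|
| 0 | 0 | 0 (never-taker) or 1 (complier) | $M(N_{i00}, q_{io})$, with $q_{io} = \dfrac{\pi_{ic}v_{io} + \pi_{in}s_{io}}{1 - \pi_{ia}}$ | $N_{i00o}$ |
| 0 | 1 | 2 (always-taker) or 3 (defier) | $M(N_{i01}, b_{io})$ | $N_{i01o}$ |
| 1 | 0 | 0 (never-taker) or 3 (defier) | $M(N_{i10}, s_{io})$ | $N_{i10o}$ |
| 1 | 1 | 1 (complier) or 2 (always-taker) | $M(N_{i11}, p_{io})$, with $p_{io} = \dfrac{\pi_{ic}u_{io} + \pi_{ia}b_{io}}{1 - \pi_{in}}$ | $N_{i11o}$ |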

Defiers are ruled out by the monotonicity assumption.

3.1.2. Assumptions and Outcome Distributions

For each study, we make assumptions identical to those listed in Angrist et al. (1996):

Assumption 1: Stable unit treatment value assumption (SUTVA) (Rubin, 1980).

The outcome for a subject is unaffected by the particular assignments of treatments to the other subjects. That is, if $r = r'$ then $T_{ij}^{r} = T_{ij}^{r'}$; and if $r = r'$ and $t = t'$ then $Y_{ij}^{r,t} = Y_{ij}^{r',t'}$.

Assumption 2: Random assignment to randomization groups.

For all Ni subjects in the i-th trial, the treatment assignment is random. This assumption implies that the proportion of compliers should be the same in the intervention and control groups.

Assumption 3: Exclusion restriction.

For subject $j$ in the $i$-th trial, $Y_{ij}^{r,t} = Y_{ij}^{r',t}$ for all $r$, $r'$ and $t$, i.e., the randomization assignment affects responses only through its effect on treatment received. This assumption allows us to define $Y_{ij}^{t} \equiv Y_{ij}^{r,t} = Y_{ij}^{r',t}$ for all $r$, $r'$ and $t$. Therefore, for always-takers and never-takers, the distribution of outcomes does not depend on the randomization group.

Assumption 4: $E[T_{ij}^{1} - T_{ij}^{0}] \neq 0$ for each $i$.

For each trial, we assume the fraction of subjects who receive each intervention varies by randomization group.

Assumption 5: Monotonicity.

$P[T_{ij}^{1} \geq T_{ij}^{0}] = 1$ for each trial. This implies that no subject would receive the treatment opposite to the one assigned under both assignments, i.e., under assignment to active treatment and under assignment to control. This assumption rules out the existence of defiers and reduces the number of compliance types for which we must derive estimates, permitting a properly identified model.

Assuming randomized assignment and the exclusion restriction implies two restrictions: 1) the proportions of always-takers, never-takers, and compliers are the same in the control and treatment groups; 2) for never-takers and always-takers, the outcome distribution is the same under assignment to control and to active treatment. With these two restrictions, for discrete outcomes $o \in \{1, \ldots, O\}$ we can extend the notation in Cheng (2009) and Baker (2011) and define the following parameters for latent compliance classes and response rates in the $i$-th study: 1) $\pi_{ia}$ and $\pi_{in}$ are the probabilities of being an always-taker and a never-taker, respectively, so the probability of being a complier in the $i$-th study, $\pi_{ic}$, is $1 - \pi_{ia} - \pi_{in}$; 2) $u_{io}$ is the probability of having outcome $o$ for a complier randomized to the treatment group, and $v_{io}$ is the probability for a complier randomized to the control group in the $i$-th study; $s_{io}$ is the probability a never-taker has outcome $o$ in the $i$-th study; and $b_{io}$ is the probability an always-taker has outcome $o$ in the $i$-th study; where $\sum_{o=1}^{O} u_{io} = \sum_{o=1}^{O} v_{io} = \sum_{o=1}^{O} s_{io} = \sum_{o=1}^{O} b_{io} = 1$. Although latent compliance classes cannot be fully identified based on randomization group ($R_{ij}$) and observed treatment received ($T_{ij}$), the above two restrictions allow us to write the distributions of observed $N_{irt}$ in terms of the parameters for compliance classes and response rates, where $N_{irt} = \sum_{j} I(R_{ij} = r, T_{ij} = t)$ denotes the number of individuals in each observed group. Let $M(N_{irt}, x_{io})$ denote a multinomial distribution with $N_{irt}$ subjects and multinomial probabilities $\{x_{io}\}$. The observed count for each outcome $o$ in group $\{j : R_{ij} = r, T_{ij} = t\}$ is $N_{irto}$, $o = 1, \ldots, O$. Table 2 shows the distribution of each observed count in trial $i$, where $q_{io} = (\pi_{ic}v_{io} + \pi_{in}s_{io})/(1 - \pi_{ia})$ and $p_{io} = (\pi_{ic}u_{io} + \pi_{ia}b_{io})/(1 - \pi_{in})$ are the probabilities corresponding to $N_{i00o}$ and $N_{i11o}$, $o \in \{1, \ldots, O\}$.

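Furthermore, according to the relations between observed groups and latent compliance classes, we have $\sum_{o} N_{i00o} = N_{i00}$ and $\sum_{o} N_{i01o} = N_{i01}$, with expectations $N_{i0}(1 - \pi_{ia})$ and $N_{i0}\pi_{ia}$, respectively, so the vector of observed counts in the control group $(N_{i001}, \ldots, N_{i00O}, N_{i011}, \ldots, N_{i01O})$ follows a multinomial distribution $M(N_{i0}, x_{i0} = (x_{i001}, \ldots, x_{i00O}, x_{i011}, \ldots, x_{i01O}))$, where $x_{i00o} = q_{io}(1 - \pi_{ia}) = \pi_{ic}v_{io} + \pi_{in}s_{io}$, $x_{i01o} = b_{io}\pi_{ia}$, and $o \in \{1, \ldots, O\}$. Similarly, in the active treatment group, the vector of observed counts $(N_{i101}, \ldots, N_{i10O}, N_{i111}, \ldots, N_{i11O})$ follows a multinomial distribution $M(N_{i1}, x_{i1} = (x_{i101}, \ldots, x_{i10O}, x_{i111}, \ldots, x_{i11O}))$, where $x_{i10o} = s_{io}\pi_{in}$, $x_{i11o} = p_{io}(1 - \pi_{in}) = \pi_{ic}u_{io} + \pi_{ia}b_{io}$, and $o \in \{1, \ldots, O\}$.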

Let $\lambda_i$ be the probability $P(R_{ij} = 1)$, which is usually known in a trial and treated as fixed. Therefore, for study $i$ ($i = 1, 2, \ldots, I$), all observed counts $N_{irto}$ follow a single multinomial distribution, with corresponding probability $P_{irto}$, for $r \in \{0, 1\}$, $t \in \{0, 1\}$, $o \in \{1, \ldots, O\}$. In mathematical notation, the distribution is $M(N_i, x_i = \{P_{irto}\})$, where $P_{i0to} = (1 - \lambda_i)x_{i0to}$ and $P_{i1to} = \lambda_i x_{i1to}$.

In addition to Assumptions 1–5, we make the latent ignorable (LI) missing assumption described in Section 1. That is, given the observed data and the latent unobserved compliance classes, missingness has no residual dependence on the outcomes. Under the LI assumption, Table 3 summarizes a typical data structure and notation for a study i with missing treatment-received status for randomized treatment group r ∈ {0, 1}. In each cell of Table 3, the first row shows the count and the second row shows the corresponding probability of the outcome; for a study in which subjects randomized to r had missing data on actual treatment received, only the rows labeled “Missing” would be observed.

Table 3:

Typical data for study i with missing actual treatment received status in randomization group r ∈ {0, 1}

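| Treatment received | | Outcome 1 | … | Outcome $O$ |
|---|---|---|---|---|
| 0 | count | $N_{ir01}$ | … | $N_{ir0O}$ |
| | probability | $P_{ir01}$ | … | $P_{ir0O}$ |
| 1 | count | $N_{ir11}$ | … | $N_{ir1O}$ |
| | probability | $P_{ir11}$ | … | $P_{ir1O}$ |
| Missing | count | $N_{ir*1}$ | … | $N_{ir*O}$ |
| | probability | $P_{ir01} + P_{ir11}$ | … | $P_{ir0O} + P_{ir1O}$ |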

In each cell, the first row: the observed count; the second row: the corresponding probability.

3.1.3. CACE in Meta-analysis

One causal effect of interest in many studies is the CACE discussed in Section 1. The CACE for the $i$-th two-arm trial is defined as $\theta_i^{CACE} = E(Y_{ij}^{1} - Y_{ij}^{0} \mid C_{ij} = 1)$. The overall causal effect $\theta^{CACE}$ from the meta-analysis can be estimated by taking the expectation of $\theta_i^{CACE}$ over all $I$ trials, $\theta^{CACE} = E(\theta_i^{CACE})$. For an ordinal outcome $Y_{ij} = o \in \{1, \ldots, O\}$, suppose we use equally spaced scores $\{1, 2, \ldots, O\}$ to reflect the real distances between categories; then $\theta_i^{CACE}$ is $\sum_{o}(o \times u_{io}) - \sum_{o}(o \times v_{io})$. When the outcome is binary, we let $o \in \{0, 1\}$, so the CACE for the $i$-th trial is $\theta_i^{CACE} = u_{i1} - v_{i1}$.

A positive (negative) value of $\theta_i^{CACE}$ indicates a beneficial treatment effect in the $i$-th trial if a higher value of $o$ means a better (worse) outcome, and $\theta_i^{CACE} = 0$ indicates no causal effect of treatment for compliers. Besides the aforementioned equally spaced scores $\{1, 2, \ldots, O\}$, their linear transforms may also be sensible in many cases and provide a reasonable compromise (Agresti, 2003). Alternative scoring systems such as midranks are also possible. When uncertain about which scoring choice to use, a sensitivity analysis can be conducted on different reasonable choices to see how they affect the estimates.
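The score-based definition of the study-specific CACE can be illustrated with a few lines of code. This is a hedged sketch: the function `study_cace` and the probability vectors are hypothetical and do not come from the paper's data. It shows the default equally spaced scores, an alternative scoring passed explicitly, and the binary case with scores {0, 1}.

```python
from typing import Optional, Sequence


def study_cace(u: Sequence[float], v: Sequence[float],
               scores: Optional[Sequence[float]] = None) -> float:
    """Score-weighted difference between complier outcome distributions.

    u[k], v[k] are the complier probabilities of the k-th outcome category under
    assignment to treatment and control. Scores default to 1, 2, ..., O.
    """
    if len(u) != len(v):
        raise ValueError("u and v must have the same number of categories")
    if scores is None:
        scores = list(range(1, len(u) + 1))
    if len(scores) != len(u):
        raise ValueError("scores must match the number of categories")
    return sum(s * (u_o - v_o) for s, u_o, v_o in zip(scores, u, v))


# Hypothetical complier outcome probabilities for a three-category ordinal outcome.
u_ordinal = [0.2, 0.3, 0.5]
v_ordinal = [0.3, 0.4, 0.3]
ordinal_cace = study_cace(u_ordinal, v_ordinal)

# A linear transform of the scores rescales the CACE by the same slope.
rescaled_cace = study_cace(u_ordinal, v_ordinal, scores=[3, 5, 7])

# Binary outcome with o in {0, 1}: the CACE reduces to u_i1 - v_i1.
binary_cace = study_cace([0.85, 0.15], [0.90, 0.10], scores=[0, 1])
```

Because the category probabilities sum to one in each arm, adding a constant to every score leaves the CACE unchanged, and multiplying the scores by a constant multiplies the CACE by that constant.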

3.2. Estimation and Inference

3.2.1. The Likelihood

Let $N_i = \{N_{ir}\}$ be the vector of observed data in study $i$, where $r$ refers to the randomization group ($r = 1$ for treatment and $r = 0$ for the control/placebo arm). In each arm $r$, $N_{ir} = \{N_{ir}^{c}, N_{ir}^{m}\}$, where the superscripts $c$ and $m$ denote complete and marginal counts, respectively, and $N_{ir}^{c} = \{N_{irto}\}$ for each $t \in \{0, 1\}$ and $o \in \{1, \ldots, O\}$. If the full compliance data were observed in arm $r$ of study $i$, the corresponding marginal counts $N_{ir}^{m} = \{N_{ir*o}\}$ are set to 0. Otherwise, if the actual received-treatment status in randomization arm $r$ of study $i$ was missing, only the marginal data $N_{ir}^{m} = \{N_{ir*o}\}$ are available.

From Section 3.1.2, if full compliance data were observed in both randomization groups, all observed counts $N_{irto}$ follow a single multinomial distribution, with probability $P_{irto}$, where $P_{i0to} = (1 - \lambda_i)x_{i0to}$ and $P_{i1to} = \lambda_i x_{i1to}$. Furthermore, as indicated by Table 3, all $N_{ir*o}$ also follow a multinomial distribution with probability $P_{ir0o} + P_{ir1o}$ if only marginal data were observed, for $o \in \{1, \ldots, O\}$ in the $i$-th trial. Therefore, defining $\beta_i = (\pi_{ia}, \pi_{in}, s_i, b_i, u_i, v_i)$, where $s_i = (s_{i1}, \ldots, s_{i(O-1)})$, $b_i = (b_{i1}, \ldots, b_{i(O-1)})$, $u_i = (u_{i1}, \ldots, u_{i(O-1)})$, $v_i = (v_{i1}, \ldots, v_{i(O-1)})$, study $i$’s likelihood contribution is

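$$
\begin{aligned}
L_i(\beta_i) = \prod_{j}\prod_{o}\; & P_{i00o}^{(1-R_{ij})(1-T_{ij})(1-M_{ij})I(Y_{ij}=o)}\; P_{i01o}^{(1-R_{ij})T_{ij}(1-M_{ij})I(Y_{ij}=o)} \\
& \times P_{i10o}^{R_{ij}(1-T_{ij})(1-M_{ij})I(Y_{ij}=o)}\; P_{i11o}^{R_{ij}T_{ij}(1-M_{ij})I(Y_{ij}=o)} \\
& \times (P_{i00o}+P_{i01o})^{(1-R_{ij})M_{ij}I(Y_{ij}=o)}\; (P_{i10o}+P_{i11o})^{R_{ij}M_{ij}I(Y_{ij}=o)},
\end{aligned} \tag{2}
$$

where the relations among the components of $\beta_i$ and $P_{irto}$ are summarized in Section 3.1.2, $j = 1, \ldots, N_i$, $o = 1, \ldots, O$, and the indicator function $I(Y_{ij} = o) = 1$ if $Y_{ij} = o$ and 0 otherwise. The parameters are subject to $\sum_{o} u_{io} = \sum_{o} v_{io} = \sum_{o} s_{io} = \sum_{o} b_{io} = 1$ and $0 \leq \pi_{ia}, \pi_{in}, u_{io}, v_{io}, s_{io}, b_{io} \leq 1$. The likelihood function for all trials in a meta-analysis is $L(\beta) = \prod_{i} L_i(\beta_i)$.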

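We use trials with binary outcomes to further illustrate the modeling; this also represents the situation in the motivating example. In this case, $o \in \{0, 1\}$, i.e., $s_{i0} + s_{i1} = b_{i0} + b_{i1} = u_{i0} + u_{i1} = v_{i0} + v_{i1} = 1$ for study $i$, so the vector parameters $s_i$, $b_i$, $u_i$, $v_i$ reduce to $s_{i1}$, $b_{i1}$, $u_{i1}$, $v_{i1}$. Data can be arranged as shown in Table 1, where in each randomization arm, data are shown either in the column “Complete data” or in the column “Missing data”, with values in the other columns all 0. Thus the observed data are $N_{ir} = \{N_{ir}^{c}, N_{ir}^{m}\} = \{N_{ir00}, N_{ir01}, N_{ir10}, N_{ir11}, N_{ir*0}, N_{ir*1}\}$ for $r \in \{0, 1\}$. Then the likelihood contribution for the $i$-th trial can be written as

$$
\begin{aligned}
L_i(\beta_i) ={} & \left[(1-\lambda_i)\{\pi_{ic}(1-v_{i1}) + \pi_{in}(1-s_{i1})\}\right]^{N_{i000}} \{(1-\lambda_i)(\pi_{ic}v_{i1} + \pi_{in}s_{i1})\}^{N_{i001}} \\
& \times \{(1-\lambda_i)\pi_{ia}(1-b_{i1})\}^{N_{i010}} \{(1-\lambda_i)\pi_{ia}b_{i1}\}^{N_{i011}} \\
& \times \{\lambda_i\pi_{in}(1-s_{i1})\}^{N_{i100}} \{\lambda_i\pi_{in}s_{i1}\}^{N_{i101}} \\
& \times \left[\lambda_i\{\pi_{ic}(1-u_{i1}) + \pi_{ia}(1-b_{i1})\}\right]^{N_{i110}} \{\lambda_i(\pi_{ic}u_{i1} + \pi_{ia}b_{i1})\}^{N_{i111}} \\
& \times \left[(1-\lambda_i)\{\pi_{ic}(1-v_{i1}) + \pi_{in}(1-s_{i1}) + \pi_{ia}(1-b_{i1})\}\right]^{N_{i0*0}} \{(1-\lambda_i)(\pi_{ic}v_{i1} + \pi_{in}s_{i1} + \pi_{ia}b_{i1})\}^{N_{i0*1}} \\
& \times \left[\lambda_i\{\pi_{ic}(1-u_{i1}) + \pi_{ia}(1-b_{i1}) + \pi_{in}(1-s_{i1})\}\right]^{N_{i1*0}} \{\lambda_i(\pi_{ic}u_{i1} + \pi_{ia}b_{i1} + \pi_{in}s_{i1})\}^{N_{i1*1}},
\end{aligned} \tag{3}
$$

where $\beta_i = (\pi_{ia}, \pi_{in}, s_{i1}, b_{i1}, u_{i1}, v_{i1})$, and the parameters vary between studies following some distributions with hyper-parameters, which we now describe.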
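To make the structure of Equation (3) concrete, the sketch below evaluates the binary-outcome log-likelihood contribution of one study. It is an illustration under stated assumptions, not the authors' JAGS implementation: the function names are hypothetical, the parameter values are arbitrary, and λ is set to 0.5 for simplicity. The counts are those of study 6 (Gambling, 1998) in Table 1, which reports complete data in the epidural arm and only marginal data in the control arm.

```python
import math
from typing import Dict


def cell_probabilities(pi_a: float, pi_n: float, s1: float, b1: float,
                       u1: float, v1: float, lam: float) -> Dict[str, float]:
    """Cell probabilities keyed by 'rto' (assigned arm, received treatment, outcome).

    Keys with '*' in the middle are marginal cells for arms with missing
    treatment-received status; they sum the two complete-data cells.
    """
    pi_c = 1.0 - pi_a - pi_n
    p = {
        "000": (1 - lam) * (pi_c * (1 - v1) + pi_n * (1 - s1)),
        "001": (1 - lam) * (pi_c * v1 + pi_n * s1),
        "010": (1 - lam) * pi_a * (1 - b1),
        "011": (1 - lam) * pi_a * b1,
        "100": lam * pi_n * (1 - s1),
        "101": lam * pi_n * s1,
        "110": lam * (pi_c * (1 - u1) + pi_a * (1 - b1)),
        "111": lam * (pi_c * u1 + pi_a * b1),
    }
    for r in "01":
        for o in "01":
            p[r + "*" + o] = p[r + "0" + o] + p[r + "1" + o]
    return p


def log_likelihood(counts: Dict[str, int], probs: Dict[str, float]) -> float:
    """Log of Equation (3): sum of count * log(cell probability) over non-empty cells."""
    return sum(n * math.log(probs[cell]) for cell, n in counts.items() if n > 0)


# Counts for study 6 in Table 1; cells not listed are zero.
counts_study6 = {"100": 206, "101": 10, "110": 371, "111": 29, "0*0": 573, "0*1": 34}

# Arbitrary illustrative parameter values (assumptions, not fitted estimates).
probs = cell_probabilities(pi_a=0.05, pi_n=0.30, s1=0.05, b1=0.10,
                           u1=0.08, v1=0.06, lam=0.5)
loglik = log_likelihood(counts_study6, probs)
```

The marginal cells are what allow an arm without compliance data to contribute: its counts still inform the mixture of compliers, never-takers, and always-takers through the summed probabilities.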

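To account for potential between-study heterogeneity of the compliance classes and outcome probabilities, we consider a random effects model. Specifically, to guarantee the desired properties of latent compliance classes in study $i$, i.e., $\pi_{in} + \pi_{ia} + \pi_{ic} = 1$ and $0 \leq \pi_{in}, \pi_{ia}, \pi_{ic} \leq 1$, and to allow these probabilities to vary between studies, the parameters are specified as $\pi_{in} = \dfrac{\exp(n_i)}{1 + \exp(n_i) + \exp(a_i)}$ and $\pi_{ia} = \dfrac{\exp(a_i)}{1 + \exp(n_i) + \exp(a_i)}$, so that $\pi_{ic} = 1 - \pi_{in} - \pi_{ia}$, where $n_i = \alpha_n + \delta_{in}$ and $a_i = \alpha_a + \delta_{ia}$. The random effect $(\delta_{in}, \delta_{ia})$ has a bivariate normal distribution with mean 0 and variance-covariance matrix $\Sigma_{lc} = \begin{pmatrix} \sigma_n^2 & \rho\sigma_n\sigma_a \\ \rho\sigma_n\sigma_a & \sigma_a^2 \end{pmatrix}$, to allow correlation between $n_i$ and $a_i$ across studies.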

We also define random effect models on the transformed scale of each response probability $s_{i1}$, $b_{i1}$, $u_{i1}$, $v_{i1}$: $g(s_{i1}) = \alpha_s + \delta_{is}$, $g(b_{i1}) = \alpha_b + \delta_{ib}$, $g(u_{i1}) = \alpha_u + \delta_{iu}$, $g(v_{i1}) = \alpha_v + \delta_{iv}$, where $g(\cdot)$ is a link function such as the logit or probit. These response rates are assumed to be independent across principal strata, so $\delta_{is} \sim N(0, \sigma_s^2)$, $\delta_{ib} \sim N(0, \sigma_b^2)$, $\delta_{iu} \sim N(0, \sigma_u^2)$, $\delta_{iv} \sim N(0, \sigma_v^2)$. The model can easily be extended to more general cases with more than binary outcomes.
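The random effects structure can be sketched as a generative step that maps hyper-parameters to one study's parameters. This is a minimal illustration, not the authors' implementation: the function `draw_study_parameters` and all hyper-parameter values are hypothetical, and the links (logit for the never-taker and always-taker response rates, probit for the complier rates) follow the choice described in Section 3.2.2.

```python
import math
import random
from typing import Dict


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def std_normal_cdf(x: float) -> float:
    """Inverse probit (standard Gaussian CDF)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def draw_study_parameters(rng: random.Random, alpha: Dict[str, float],
                          sigma: Dict[str, float], rho: float) -> Dict[str, float]:
    """Draw one study's compliance-class and response probabilities."""
    # Correlated bivariate normal random effects for the compliance classes.
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    n_i = alpha["n"] + sigma["n"] * z1
    a_i = alpha["a"] + sigma["a"] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    denom = 1.0 + math.exp(n_i) + math.exp(a_i)
    params = {
        "pi_n": math.exp(n_i) / denom,
        "pi_a": math.exp(a_i) / denom,
        "pi_c": 1.0 / denom,
    }
    # Independent random effects for the response rates.
    params["s1"] = expit(alpha["s"] + rng.gauss(0.0, sigma["s"]))
    params["b1"] = expit(alpha["b"] + rng.gauss(0.0, sigma["b"]))
    params["u1"] = std_normal_cdf(alpha["u"] + rng.gauss(0.0, sigma["u"]))
    params["v1"] = std_normal_cdf(alpha["v"] + rng.gauss(0.0, sigma["v"]))
    return params


# Hypothetical hyper-parameter values, for illustration only.
alpha_example = {"n": -1.0, "a": -2.5, "s": -2.5, "b": -2.0, "u": -1.2, "v": -1.4}
sigma_example = {"n": 0.8, "a": 0.8, "s": 0.5, "b": 0.5, "u": 0.3, "v": 0.3}
study_params = draw_study_parameters(random.Random(1), alpha_example, sigma_example, rho=0.3)
```

The multinomial-logit form keeps the three class probabilities positive and summing to one for any real-valued random effects, which is the property the specification is meant to guarantee.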

3.2.2. Prior Specifications and the Posterior Distribution

We assign proper but diffuse prior distributions for the hyper-parameters. Specifically, $\alpha_n$ and $\alpha_a$ both follow $N(0, 2.5^2)$, such that under the simplest situation (a fixed effects model), a 95% prior probability interval for any of the probabilities $\pi_{in}$, $\pi_{ia}$, $\pi_{ic}$ ranges from about 0.001 to 0.91; and $\alpha_s$, $\alpha_b$, $\alpha_u$, $\alpha_v$ all follow $N(0, 2^2)$, which implies a 95% interval for the probabilities $s_{i1}$, $b_{i1}$, $u_{i1}$, $v_{i1}$ ranging from about 0.01 to 0.98. The hyper-priors for the precision parameters $\sigma_s^{-2}$, $\sigma_b^{-2}$, $\sigma_u^{-2}$ and $\sigma_v^{-2}$ are assumed to be Gamma(2,2), which corresponds to a 95% interval of (0.6, 2.9) for the corresponding standard deviations, allowing moderate heterogeneity in the response probabilities. The prior for the precision matrix $\Sigma_{lc}^{-1}$ is Wishart, i.e., $W(I, 3)$, where $I$ is the identity matrix. In a reduced model with one of $\sigma_n^2$, $\sigma_a^2$ set to 0, the prior of the other precision parameter is also assumed to be Gamma(2,2), which gives moderate heterogeneity for the latent compliance class probabilities.
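The interval implied by the Gamma(2,2) prior on a precision can be reproduced with a short calculation. This sketch assumes the shape-rate parameterization (the convention used by JAGS for `dgamma`); the helper names are hypothetical. For shape 2 the Gamma CDF has the closed form $1 - e^{-y}(1 + y)$ with $y = \text{rate} \times x$, so quantiles can be found by bisection without external libraries.

```python
import math


def gamma2_cdf(x: float, rate: float) -> float:
    """CDF of a Gamma distribution with shape 2 and the given rate."""
    y = rate * x
    return 1.0 - math.exp(-y) * (1.0 + y)


def gamma2_quantile(prob: float, rate: float,
                    lo: float = 0.0, hi: float = 100.0, tol: float = 1e-12) -> float:
    """Invert gamma2_cdf by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma2_cdf(mid, rate) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# 95% equal-tail interval for the precision, assuming shape = 2 and rate = 2.
precision_lower = gamma2_quantile(0.025, rate=2.0)
precision_upper = gamma2_quantile(0.975, rate=2.0)

# Standard deviation = precision^(-1/2), so the interval end points swap.
sd_lower = 1.0 / math.sqrt(precision_upper)
sd_upper = 1.0 / math.sqrt(precision_lower)
```

Under this parameterization the standard deviation interval comes out at roughly (0.60, 2.87), which agrees with the (0.6, 2.9) quoted in the text after rounding.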

Let $f(\beta_i \mid \beta_0, \Sigma_0)$ denote the distributions described in Section 3.2.1 of all parameters βi = (πia, πin, si1, bi1, ui1, vi1), where β0 refers to the vector of mean hyper-parameters (αn, αa, αs, αb, αu, αv), and Σ0 denotes the covariance hyper-parameters $\Sigma_{lc}^{-1}$, $\sigma_s^{-2}$, $\sigma_b^{-2}$, $\sigma_u^{-2}$ and $\sigma_v^{-2}$. Denoting the prior distributions specified above as f(β0) and f(Σ0), the joint posterior distribution is then proportional to $\left[\prod_i L_i(\beta_i)\, f(\beta_i \mid \beta_0, \Sigma_0)\right] f(\beta_0)\, f(\Sigma_0)$. We sample from the joint posterior using Markov chain Monte Carlo (MCMC) methods, specifically Gibbs and Metropolis-Hastings sampling algorithms (Gelfand and Smith, 1990).

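As mentioned in Section 3.1.3, for binary outcomes, θCACE can be estimated as $E(\theta_i^{CACE}) = E(u_{i1}) - E(v_{i1})$. Integrating out the random effects, $E(u_{i1}) = \int_{-\infty}^{+\infty} g^{-1}(\alpha_u + t)\, \sigma_u^{-1} \phi(t/\sigma_u)\, dt$ and $E(v_{i1}) = \int_{-\infty}^{+\infty} g^{-1}(\alpha_v + t)\, \sigma_v^{-1} \phi(t/\sigma_v)\, dt$, where ϕ(·) is the standard Gaussian density. Using probit link functions for ui1 and vi1, we have closed-form formulas $E(u_{i1}) = \Phi\left(\alpha_u / \sqrt{1+\sigma_u^2}\right)$ and $E(v_{i1}) = \Phi\left(\alpha_v / \sqrt{1+\sigma_v^2}\right)$ so that

$$\theta^{CACE} = \Phi\left(\frac{\alpha_u}{\sqrt{1+\sigma_u^2}}\right) - \Phi\left(\frac{\alpha_v}{\sqrt{1+\sigma_v^2}}\right).$$ (4)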

For si1 and bi1, we used the logit link random effects model. Though the integral in E(si1) does not have a closed-form formula, it has a well-established approximation, $E(s_{i1}) \approx \mathrm{logit}^{-1}\left(\alpha_s / \sqrt{1 + C^2\sigma_s^2}\right)$, where $C = \frac{16\sqrt{3}}{15\pi}$ (Zeger et al., 1988). This approximation also applies to estimating the overall always-taker response rate E(bi1). One can use either the same or different links for parameters ui1, vi1, si1 and bi1. In particular, the logit and probit links approximate each other very well. For convenience, we chose the probit link for ui1 and vi1 because it gives us a closed form for the posterior θCACE, while we chose the logit link for si1 and bi1 because it is more commonly used.
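
The marginal means discussed above can be compared against direct numerical integration. The following sketch uses a simple trapezoid rule; the values of alpha and sigma are illustrative assumptions, and the helper names are hypothetical.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def marginal_mean(inv_link, alpha, sigma, n=4001, width=8.0):
    """E[g^{-1}(alpha + t)] for t ~ N(0, sigma^2), by the trapezoid rule."""
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        t = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * inv_link(alpha + t) * norm_pdf(t / sigma) / sigma
    return total * h

alpha, sigma = -1.2, 0.6  # illustrative values only

# Probit link: closed form versus numerical integration.
probit_exact = norm_cdf(alpha / math.sqrt(1.0 + sigma ** 2))
probit_numeric = marginal_mean(norm_cdf, alpha, sigma)

# Logit link: approximation with C = 16*sqrt(3)/(15*pi) versus integration.
C = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)
logit_approx = expit(alpha / math.sqrt(1.0 + C ** 2 * sigma ** 2))
logit_numeric = marginal_mean(expit, alpha, sigma)
```

For the probit link the two numbers should agree up to integration error, whereas for the logit link a small discrepancy is expected because the formula is an approximation.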

In each MCMC iteration, draws of θCACE are calculated from the MCMC draws using Equation (4). We use medians and equal-tail credible intervals (CrIs) of these posterior samples to make inferences for the random effects models.
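
A minimal sketch of this post-processing step follows. The "posterior draws" here are simulated stand-ins with arbitrary means and spreads, since real draws would come from the MCMC output; the function names are hypothetical.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def theta_cace(alpha_u, sigma_u, alpha_v, sigma_v):
    """Equation (4): marginal CACE under probit links for u and v."""
    return (norm_cdf(alpha_u / math.sqrt(1.0 + sigma_u ** 2))
            - norm_cdf(alpha_v / math.sqrt(1.0 + sigma_v ** 2)))

def equal_tail_summary(draws, level=0.95):
    """Return (lower, median, upper) using linear-interpolation quantiles."""
    s = sorted(draws)
    def quantile(p):
        pos = p * (len(s) - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])
    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(0.5), quantile(1.0 - tail)

# Stand-in draws (NOT real posterior output); sigma_v = 0 mimics a model
# with no random effect on v.
rng = random.Random(2021)
draws = []
for _ in range(5000):
    a_u = rng.gauss(-1.2, 0.10)
    s_u = abs(rng.gauss(0.6, 0.10))
    a_v = rng.gauss(-1.5, 0.05)
    draws.append(theta_cace(a_u, s_u, a_v, 0.0))

lower, median, upper = equal_tail_summary(draws)
```

Applying Equation (4) draw by draw, rather than to posterior summaries of the hyper-parameters, propagates their joint uncertainty into the interval for θCACE.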

3.2.3. Model Selection and Implementation

The model specified in Section 3.2.1 included all possible random effects to account for possible between-study heterogeneity of the fractions in the compliance classes and heterogeneity of the response rate probabilities. However, over-fitting the data with too many random effects should be avoided because it may inflate posterior variances. Therefore, we have used a forward selection procedure to choose the final model, beginning with a model having no random effects and at each forward step adding the random-effect component that gave the largest improvement in the deviance information criterion (DIC) (Spiegelhalter et al., 2002). Other model-selection approaches can be substituted easily, e.g., using a different model-selection criterion or a different search strategy.
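
The forward selection loop can be sketched as follows. The DIC values come from a toy function with made-up gains, and the stopping threshold `min_gain` is a hypothetical stand-in for "no notable improvement"; in practice each DIC would be obtained by fitting the corresponding model.

```python
def forward_select(candidates, dic_of, min_gain):
    """Greedy forward selection: add the effect with the largest DIC drop."""
    selected = frozenset()
    current = dic_of(selected)
    path = [(selected, current)]
    remaining = set(candidates)
    while remaining:
        scores = {c: dic_of(selected | {c}) for c in remaining}
        best = min(scores, key=scores.get)
        if current - scores[best] < min_gain:
            break  # no candidate improves DIC enough
        selected = selected | {best}
        current = scores[best]
        remaining.discard(best)
        path.append((selected, current))
    return selected, path

# Toy DIC: each random effect lowers (or raises) DIC by a fixed amount.
TOY_GAIN = {"n": 50.0, "a": 30.0, "s": 20.0, "u": 10.0, "b": -2.0, "v": -1.0}

def toy_dic(effects):
    return 1000.0 - sum(TOY_GAIN[e] for e in effects)

selected, path = forward_select(TOY_GAIN, toy_dic, min_gain=3.0)
```

With these toy gains the procedure adds four effects and then stops, because neither remaining effect lowers the DIC.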

We used JAGS software version 4.3 via the rjags package in R to sample from the joint posterior distribution. We ran three independent MCMC chains with starting points drawn randomly from their prior distributions. After 10,000 burn-in samples, the subsequent 100,000 posterior samples were obtained for each chain. Convergence to the stationary distribution was assessed using trace plots, sample autocorrelation, and the Gelman and Rubin statistic (Gelman and Rubin, 1992).
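
For readers unfamiliar with the Gelman and Rubin statistic, the sketch below computes a basic version of the potential scale reduction factor for one scalar parameter. It omits the degrees-of-freedom correction of the original proposal, and the chains are simulated purely for illustration.

```python
import random
import statistics

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between / n
    return (var_hat / within) ** 0.5

rng = random.Random(0)
# Three chains sampling the same distribution: R-hat should be near 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# Three chains stuck around different values: R-hat should be well above 1.
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 5.0, 10.0)]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Values close to 1 are consistent with convergence, while values well above 1 indicate that the chains have not yet mixed.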

3.2.4. Model for Complete Data Only

Here we discuss how the naive “two-step” approach introduced in Section 1 can be viewed as a special case of our model using only trials with complete noncompliance data. In this situation, only trials with complete data Nirc={Nirto} are used to make inference on CACE. Then the likelihood contribution of the i-th study is

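$$L_i(\beta_i) = \prod_j \prod_o P_{i00o}^{(1-R_{ij})(1-T_{ij})(1-M_{ij}) I(Y_{ij}=o)}\, P_{i01o}^{(1-R_{ij}) T_{ij} (1-M_{ij}) I(Y_{ij}=o)}\, P_{i10o}^{R_{ij} (1-T_{ij})(1-M_{ij}) I(Y_{ij}=o)}\, P_{i11o}^{R_{ij} T_{ij} (1-M_{ij}) I(Y_{ij}=o)}.$$ (5)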

Note that when Mij = 1 (i.e., for trials with incomplete noncompliance data), Li(βi) = 1. Thus for trial i with complete noncompliance data Nirc={Nirto}, one can separately estimate θiCACE and obtain a standard error. One can then combine these study-specific estimates using a standard meta-analysis method, such as a fixed-effect or random effects model, to estimate the population-averaged CACE. Alternatively, one can obtain the posterior estimate of θCACE through the joint posterior distribution, which is proportional to the likelihood for trials with complete noncompliance data, $L(\beta) = \prod_i L_i(\beta_i)$, multiplied by the prior distributions. Note that by Lin and Zeng (2010), the two-step approach can be viewed as asymptotically equivalent to the model maximizing the joint likelihood. Therefore, in the simulation section below, we compare the performance of our proposed model including all trials with a model using only trials with complete noncompliance data instead of a two-step frequentist approach.
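
The second step of the two-step approach can be sketched with a fixed-effect (inverse-variance) pooling of study-specific estimates. The input estimates and standard errors below are hypothetical numbers, not values from the case study, and the first step (estimating each θiCACE) is assumed to have been done already.

```python
def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted average and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled, pooled_se

# Hypothetical study-specific CACE estimates and standard errors.
study_estimates = [0.02, 0.04]
study_ses = [0.01, 0.01]
pooled, pooled_se = pool_fixed_effect(study_estimates, study_ses)
```

With equal standard errors the pooled estimate reduces to the simple average; a random effects version would add a between-study variance component to each weight.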

4. Case Study Results

4.1. Model Selection Results

We estimated the CACE of epidural analgesia in labor on cesarean section including all of the 27 RCTs introduced in Section 2. Although the full model has 6 potential random effects in total, δin, δia, δis, δib, δiu and δiv, we adopted the forward selection procedure described in Section 3.2.3. DIC, DIC improvement, and the effective number of parameters (pD) for each model considered in the forward selection procedure are presented in the Supplementary materials (Table W1). Starting with the model with no random effects (called Model I), at each forward step we added one random-effect component that gave the largest improvement in DIC until adding that random effect gave no notable improvement. The final model was Model Vc, including random effects δia, δin, δis, and δiu.

Figure 2 shows the kernel-smoothed posterior density of θCACE from the model selected in each forward step. The plot suggests θCACE has a fairly symmetric posterior density for all models. After adding the random effect δiu to probit(ui1) in Model Vc, the posterior of θCACE is shifted right and its variance increased considerably, which further indicates the importance of appropriately accounting for random effects.

Figure 2:

Posterior densities of θCACE of epidural analgesia in labor on cesarean section for models selected at each forward step; plotted are the kernel smoothed density estimates from 100,000 Monte Carlo samples.

Table 4 lists estimated parameters from the fixed effects model (Model I) and the final model (Model Vc), where the triple of percentiles, $_{2.5}50_{97.5}$, is used to display each parameter’s posterior median with its 95% equal tail credible interval, as suggested by Louis and Zeger (2009). Monte Carlo integration (Ueberhuber, 1997) was used to estimate the probability of being in each principal stratum, πa, πc, and πn when δin and δia were both present (Model Vc). The marginal never-taker response rate s1 = E(si1) of Model Vc was estimated using the approximation $E(s_{i1}) \approx \mathrm{logit}^{-1}\left(\alpha_s / \sqrt{1 + C^2\sigma_s^2}\right)$, $C = \frac{16\sqrt{3}}{15\pi}$, and the marginal treated complier response rate u1 = E(ui1) was estimated using the closed-form formula $E(u_{i1}) = \Phi\left(\alpha_u / \sqrt{1+\sigma_u^2}\right)$. For other marginal response rates (e.g., b1, v1), the values were directly estimated by transforming back the fixed-effect parameters if the probabilities were assumed to be the same across studies according to either Model I or Model Vc. For example, the marginal always-taker response rate was b1 = E(bi1) = logit−1(αb) because bi1 had no random effect in either model. Based on the final model (Model Vc), the posterior median and interval for θCACE were $_{-0.003}0.041_{0.105}$; the interval covers zero, indicating a nonsignificant complier average causal effect, though the estimated effect was more than twice the estimate from the fixed effect model (Model I). The random effects δin and δia on the transformed scale had posterior median standard deviations of 1.65 and 2.24, respectively, while the random effect for s1 had a standard deviation of 2.11 on the logit scale. After adding random effects for δin, δia, δis and δiu, the posteriors of πn, πa, s1 and u1 changed markedly from those estimated by Model I.

Table 4:

Summary of parameter estimates for the epidural analgesia meta-analysis

| Parameter | Model I (None) | Model Vc (δin, δia, δis, δiu) |
|---|---|---|
| θCACE | $_{-0.003}0.017_{0.038}$ | $_{-0.003}0.041_{0.105}$ |
| Overall never-taker probability πn | $_{0.214}0.230_{0.246}$ | $_{0.033}0.101_{0.259}$ |
| Overall always-taker probability πa | $_{0.136}0.152_{0.170}$ | $_{0.065}0.190_{0.400}$ |
| Overall complier probability πc | $_{0.594}0.618_{0.641}$ | $_{0.544}0.687_{0.787}$ |
| Overall never-taker response s1 | $_{0.029}0.046_{0.068}$ | $_{0.116}0.254_{0.488}$ |
| Always-taker response b1 | $_{0.124}0.168_{0.216}$ | $_{0.100}0.140_{0.174}$ |
| Treated complier response u1 | $_{0.093}0.112_{0.131}$ | $_{0.065}0.108_{0.173}$ |
| Control complier response v1 | $_{0.078}0.095_{0.112}$ | $_{0.054}0.068_{0.083}$ |
| Mean parameter of ni | $_{-1.089}{-0.988}_{-0.887}$ | $_{-3.196}{-2.173}_{-1.224}$ |
| Mean parameter of ai | $_{-1.542}{-1.399}_{-1.260}$ | $_{-3.521}{-2.038}_{-0.758}$ |
| Standard deviation of ni | | $_{1.055}1.645_{2.846}$ |
| Standard deviation of ai | | $_{1.402}2.240_{3.901}$ |
| Standard deviation of logit(si1) | | $_{1.231}2.110_{4.131}$ |
| Standard deviation of probit(ui1) | | $_{0.431}0.600_{0.912}$ |

The notation $_{L}P_{U}$ denotes the posterior median P with 95% equal tailed credible limits (L, U).
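
As a worked check of the Model I column, plugging the posterior medians of the mean parameters of ni and ai (−0.988 and −1.399) into the transformation of Section 3.2.1 approximately reproduces the reported class probabilities. This is only a plausibility check: a transformed median need not equal the median of the transformed draws, although here there are no random effects and the agreement is close.

```python
import math

# Posterior medians from Table 4, Model I (no random effects).
alpha_n, alpha_a = -0.988, -1.399

denom = 1.0 + math.exp(alpha_n) + math.exp(alpha_a)
pi_n = math.exp(alpha_n) / denom  # never-taker probability
pi_a = math.exp(alpha_a) / denom  # always-taker probability
pi_c = 1.0 / denom                # complier probability

# Model I point estimate of the CACE as the difference of the reported
# complier response rates u1 and v1.
theta_model_1 = 0.112 - 0.095
```

The three probabilities round to 0.230, 0.152 and 0.618, matching the table, and the difference u1 − v1 matches the reported θCACE median of 0.017.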

Figure 3 is a forest plot of the posterior medians and 95% equal-tail CrIs of θiCACE for each trial based on the final model, Model Vc. Studies with a “†” in the “Study (Author, Year)” column had complete data on compliance status and we used solid lines to represent their CrIs. For a study with incomplete data, as its θiCACE was not directly estimable by the single trial, we used a dashed line to show the posterior 95% CrI. The figure shows that studies with complete data tend to have shorter credible intervals, while the study-specific estimates θiCACE were quite heterogeneous, indicating differences in the study populations. Compared to the overall risk difference estimated by the two-step ITT meta-analysis, the overall θCACE from the final model had a wider 95% CrI but still covered zero, suggesting the effect of epidural analgesia in labor on cesarean section is not statistically significant from the perspective of causal inference. However, compared to the estimated risk difference of 0.8% (95% CI: −0.3%, 1.9%) given by the two-step ITT meta-analysis, the estimated CACE was 4.1% (95% CrI: −0.3%, 10.5%), suggesting a potentially much larger effect size. This shows the potential dilution of the estimated average treatment effect in a two-step ITT meta-analysis.

Figure 3:

Forest plot of θCACE of epidural analgesia in labor on cesarean section. The center of each square and the horizontal lines represent the posterior median and 95% equal tail CrI of θiCACE for each study from the final model, Model Vc. The first diamond indicates the pooled estimate of θCACE and its 95% CrI. The second diamond is the overall risk difference (RD) with its 95% CI from the fixed-effect ITT analysis. The symbol † indicates that the study has complete data on compliance status. With complete data, a solid horizontal line is used to represent the posterior 95% CrI of θiCACE, whereas a dashed line is used for the CrI for a study with incomplete compliance data.

4.2. Sensitivity to the LI Assumption

The above models were built upon the assumption of latent ignorable (LI) missingness. However, this assumption may not be satisfied in some applications. For example, studies showing a treatment effect may have a higher chance of reporting compliance status. This is a form of missing not at random (MNAR): the probability of missing compliance data depends on the outcome. However, in practice, one can never tell from the data at hand whether missingness is LI or MNAR (Little and Rubin, 2014). Thus, we present a sensitivity analysis that uses a known MNAR mechanism to show its impact on treatment-effect estimates.

Let the I × 2 matrix Ξ denote the study-level compliance missingness of a meta-analysis dataset containing I studies and 2 treatment arms. The entries of Ξ are ξir, i = 1, …, I and r = 0, 1, with ξir = 1 if compliance information is missing in randomized group r of study i, and ξir = 0 if the data are complete. We assume $\xi_{ir} \sim \mathrm{Bern}(p_{ir}^{mis})$, where $p_{ir}^{mis}$ is the probability of missing compliance status (i.e., no data on the actual treatment taken) in study i’s randomized group r. We specify a model of missingness for $p_{ir}^{mis}$ as

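$$\mathrm{logit}(p_{i0}^{mis}) = \gamma_{00} + \gamma_{10} \times \mathrm{logit}(v_{i1}), \qquad \mathrm{logit}(p_{i1}^{mis}) = \gamma_{01} + \gamma_{11} \times \mathrm{logit}(u_{i1}).$$ (6)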

In this model, γ00 (γ01) is a scalar parameter, and γ10 (γ11) describes the strength of association between the missingness probability and the study-specific response rate of a complier in the randomized control (treatment) group, i.e., the components of θiCACE. When γ1r = 0 for r = 0,1, the missingness probabilities are not related to any model parameters, hence the missingness is completely at random (MCAR). For the purpose of assessing the effect of MNAR, for a given γ10 and γ11, this model of missingness can be incorporated in the likelihood in Section 3.2.1 and treated as if it is known to be true. Note that the model of missingness described here is not for general MNAR scenarios but is specific for the CACE problem, and we only consider scenarios in which missingness is related to components of θCACE.
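
The missingness model in Equation (6) can be evaluated with a few lines of Python. The function name and the parameter values in the comments are illustrative assumptions.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_missing(gamma0, gamma1, response_rate):
    """Probability that an arm lacks compliance data, as in Equation (6)."""
    return expit(gamma0 + gamma1 * logit(response_rate))

# gamma1 = 0: missingness does not depend on the response rate (MCAR).
p_mcar = p_missing(0.0, 0.0, 0.10)
# gamma1 > 0: arms with higher complier response rates are more often missing.
p_low = p_missing(0.0, 2.0, 0.10)
p_high = p_missing(0.0, 2.0, 0.20)
```

With a zero intercept and unit slope the missingness probability equals the response rate itself, which gives a convenient sanity check on the parameterization.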

In this case study, as the random effect δiv was not selected into the final Model Vc, the response rates for compliers randomized to the control group (vi1) were the same across trials. Thus the missing probabilities in the control arm pi0mis according to Equation (6) were also the same for all studies i. For illustration, we set γ10 = 0 in conducting sensitivity analyses to the specific MNAR scenario, to explore the impact on CACE estimates as γ11 changes from negative to positive. Since a flat prior for γ0r with a large variance would lead to a marginal prior distribution for pirmis heavily weighted towards 0 and 1, we follow Zhang et al. (2017) by specifying a logistic(0,1) prior for γ0r, which gives an approximate uniform prior for pirmis on (0,1).

Figure 4 summarizes the posterior of θCACE from the meta-analysis of epidural analgesia in labor when we set γ10 = 0 in Equation (6) and allow γ11 to range from −2.5 to 2.5 under the final model (Model Vc).

Figure 4:

Posterior of θCACE of the epidural analgesia in labor meta-analysis under the assumption that the missingness probability in the treatment arm pi1mis is linearly related to ui1 on the logit scale. Bold solid line: posterior median; fine solid lines: 95% equal-tail credible interval; fine dashed lines: 95% highest posterior density credible interval. The fine dotted horizontal line is θCACE = 0.

As γ11 increased from −2.5 to 2.5, the posterior median of θCACE increased from about 0.02 to 0.07, and the 95% equal-tail credible interval of θCACE no longer covered zero when the coefficient of logit(ui1) was over about 0.3. If we used 95% highest posterior density credible interval instead of the equal-tailed interval, the significance of θCACE changed when γ11 was over about 0.7. Thus, when the missingness probabilities were positively and strongly enough correlated with ui1, the CACE became statistically significant, which differs from the conclusion drawn in Section 4.1 under the LI assumption. Therefore, the missingness mechanism for compliance influences the causal effect estimates in this epidural analgesia in labor meta-analysis.

5. Simulation

5.1. Simulation Setups

We conducted simulation studies to evaluate how the proposed method performs under different assumptions. As in the case study, we assumed o ∈ {0, 1}, i.e., the outcome is binary. We set (αn, αa, αs, αb, αu, αv) = (−0.4, −0.6, 0.5, −0.5, −0.5, 0.5), so that the true values in the absence of random effects were πic = 0.45, πin = 0.30, πia = 0.25 and θiCACE = −0.38. When random effects were present, we assumed the random effects had standard deviation 0.5, i.e., each of σn, σa, σs, σu = 0.5. To evaluate the model’s performance and the impact of random effects, we generated compliance status and outcomes data with three sets of random effects, corresponding to Section 4.1’s Model IIIe (δin, δia), Model IVa (δin, δia, δis), and Model Vc (δin, δia, δis, δiu). The logit link was used for si1 and the probit link was used for ui1 when δis or δiu was present in the model, respectively. Under each scenario, we simulated 2000 datasets. Each dataset comprised 20 studies in which 350 subjects per study were randomized to either the treatment or control group with a 1 : 1 ratio (λ = 0.5). The setup values for the MCMC algorithm, including the number of MCMC chains, method for generating starting points, numbers of iterations for burn-in and after burn-in, were the same as in Section 3.2.3.
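
The stated true values can be recovered from the hyper-parameter settings. The sketch below assumes probit links for ui1 and vi1 (as in the case study); under that assumption the settings give a CACE of about −0.38, negative as in the description of the MNAR scenario.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Simulation settings for the mean hyper-parameters.
alpha = {"n": -0.4, "a": -0.6, "s": 0.5, "b": -0.5, "u": -0.5, "v": 0.5}

denom = 1.0 + math.exp(alpha["n"]) + math.exp(alpha["a"])
pi_n = math.exp(alpha["n"]) / denom
pi_a = math.exp(alpha["a"]) / denom
pi_c = 1.0 / denom

# CACE without random effects, assuming probit links for u and v.
theta_true = norm_cdf(alpha["u"]) - norm_cdf(alpha["v"])
```

Rounded to two decimals, the results are 0.30, 0.25, 0.45 and −0.38.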

We created partially missing compliance data under the MCAR, LI, and MNAR assumptions, as follows. Under the MCAR assumption, the missing indicators for all studies were prespecified such that the first ten studies in the control arm (R = 0), and the 6-th to 15-th studies in the treatment arm (R = 1) did not have compliance information, so that only 5 studies had full data in both arms. To generate partially incomplete data under the LI and MNAR assumption, we applied a logit model to calculate the missingness probabilities in the control arm (R = 0) and treatment arm (R = 1) separately, which were used to generate the random missingness indicators to keep only the marginal data in that arm of the study.

The models for the missingness indicators are:

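$$\xi_{ir} \sim \mathrm{Bern}(p_{ir}^{mis}),$$
$$\text{LI:} \quad \mathrm{logit}(p_{ir}^{mis}) = \beta_{0r} + \beta_{1r} \times \mathrm{logit}(\pi_{ic}),$$
$$\text{MNAR:} \quad p_{i0}^{mis} = 0.5, \quad \mathrm{logit}(p_{i1}^{mis}) = \gamma_{01} + \gamma_{11} \times \mathrm{logit}(u_{i1}),$$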

where r = 0,1 indicate the control and treatment groups respectively. If the missing indicator ξir = 1, then data on compliance status in the i-th study arm r were set to missing, i.e., only marginal values Nir∗1, Nir∗0 were available. Our data-generation settings imply that the parameter πic is independent of θiCACE, so we considered the missing assumption to be LI. For ease of presentation, we let pi0mis=pi1mis in the LI scenario so β0r and β1r can be reduced to β0 and β1. For MNAR, we assumed the probability of missing compliance data in the treatment arm is related to ui1 on the logit scale.
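
The masking step can be sketched as below: a missingness indicator is drawn for an arm and, when it equals 1, the counts by treatment received are collapsed to outcome marginals. The cell counts, intercepts, and helper names are hypothetical.

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_mis_li(beta0, beta1, pi_c):
    """LI scenario: missingness depends on the complier proportion."""
    return expit(beta0 + beta1 * logit(pi_c))

def p_mis_mnar(gamma0, gamma1, u):
    """MNAR scenario (treatment arm): missingness depends on u_i1."""
    return expit(gamma0 + gamma1 * logit(u))

def mask_arm(counts, missing):
    """counts[(t, o)] holds N_irto for one arm; collapse over t if missing."""
    if not missing:
        return dict(counts)
    return {("*", o): counts[(0, o)] + counts[(1, o)] for o in (0, 1)}

rng = random.Random(7)
arm = {(0, 0): 40, (0, 1): 10, (1, 0): 100, (1, 1): 25}  # hypothetical counts
xi = 1 if rng.random() < p_mis_li(0.0, 2.0, 0.45) else 0
observed = mask_arm(arm, xi == 1)
```

Masking never changes the arm's total sample size or its outcome totals; only the breakdown by treatment received is lost.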

The intercept terms were chosen to control the expected missingness probability at about 0.5 in each scenario. Under the LI assumption, we set β1 = 2, referring to the scenario in which the missingness probabilities depend on the probability of being a complier. In a study with a higher proportion of compliers, the noncompliance rates tend to be smaller such that the ITT analysis would perform well, which may imply a higher probability of not reporting compliance information. Thus the coefficient β1 was set to a positive value, matching the above situation.

Under the MNAR assumption, we set γ11 = −2 to produce a scenario in which missingness in the treatment arm is related to the response rate in the compliers, while the missing probability in the control arm was set to the fixed value of 0.5. As the true value of θiCACE in our setting is negative (a beneficial complier average causal effect if the outcome o = 1 is an adverse event), we created a scenario with 1) a fixed response rate for a control complier (vi1); and 2) a missingness probability in the treatment arm that decreases as the treated-complier response rate ui1 increases. Thus, when the beneficial CACE was larger in magnitude (smaller ui1), it was more likely that the study’s investigators would not report compliance information. To achieve this, we set the coefficient γ11 to be negative. The value γ11 = −2 not only implies a reasonable strength of association between pi1mis and ui1, but also ensures that the distribution of pi1mis is well spread out between 0 and 1.

Under each missingness assumption and each true random-effect model, we compare the performance of our proposed method with the naive approach (described in Section 3.2.4) that includes only studies with complete data. Note that to create the missing compliance data, we just added Nir11 and Nir01 to give the marginal Nir∗1, and added Nir10 and Nir00 to give Nir∗0. For the analyses using the data from all studies, because no studies were discarded, the true underlying parameters still describe the data, and the proposed approach can be robust to different missing data generating mechanisms. However, for the naive approach that only includes studies with complete compliance information, patterns of the missing mechanism are expected to have an impact on the results.

We used a model selection procedure in the simulation, fitting each dataset with all of the following candidate models: 1) no random effect; 2) random effects only on (δin, δia); 3) random effects on (δin, δia, δis), and 4) random effects on (δin, δia, δis, δiu), which correspond to the models selected in each forward step from the case study model selection procedure (Model I, IIf, IIIe, IVa, and Vc). We counted the frequency of selecting each model using DIC. Note that either πic or ui1 must be generated with a random effect to ensure the missingness probabilities vary across studies. Thus under LI, we generate data with random effects (δin, δia), (δin, δia, δis), and (δin, δia, δis, δiu), and under MNAR we generate data with (δin, δia, δis, δiu).

5.2. Simulation Results

Table 5 summarizes results from the simulation studies regarding θCACE, comparing the two approaches, the proposed model (“Model including all studies”) and the naive method (“On studies with complete data”; described in Section 3.2.4), in terms of relative bias (ReBias), mean square error (MSE), 95% credible interval coverage probability (CP), 95% credible interval length (CIL), and relative efficiency (RelEff), defined as MSE from the naive analysis divided by MSE using the proposed model. Under each missingness mechanism considered, we fit the model including the same random effects as in generating the data.
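
The performance measures can be computed from replicate-level results as in the following sketch. The two replicates shown are made-up numbers used only to exercise the function; the dictionary keys mirror the abbreviations in the text.

```python
def summarize(estimates, lowers, uppers, truth):
    """Relative bias, MSE, coverage probability and mean interval length."""
    n = len(estimates)
    bias = sum(estimates) / n - truth
    mse = sum((e - truth) ** 2 for e in estimates) / n
    covered = sum(1 for lo, up in zip(lowers, uppers) if lo <= truth <= up)
    length = sum(up - lo for lo, up in zip(lowers, uppers)) / n
    return {"ReBias": bias / truth, "MSE": mse, "CP": covered / n, "CIL": length}

def relative_efficiency(mse_naive, mse_proposed):
    """MSE of the naive analysis divided by MSE of the proposed model."""
    return mse_naive / mse_proposed

# Two hypothetical replicates with true value -0.38.
result = summarize(
    estimates=[-0.40, -0.30],
    lowers=[-0.50, -0.36],
    uppers=[-0.30, -0.24],
    truth=-0.38,
)
```

A relative efficiency above 1 means the proposed model has the smaller MSE.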

Table 5:

Simulation results: relative bias (ReBias), mean square error (MSE), 95% credible interval coverage probabilities (CP), and 95% credible interval length (CIL) for θCACE

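| Missing Mechanism | Random Effects | All studies: ReBias | All studies: MSE | All studies: CP | All studies: CIL | Complete data: ReBias | Complete data: MSE | Complete data: CP | Complete data: CIL | RelEff |
|---|---|---|---|---|---|---|---|---|---|---|
| MCAR | None | 0.003 | 0.001 | 0.951 | 0.106 | 0.006 | 0.003 | 0.957 | 0.202 | 3.599 |
| MCAR | δin, δia | 0.017 | 0.001 | 0.969 | 0.133 | 0.018 | 0.002 | 0.960 | 0.198 | 2.294 |
| MCAR | δin, δia, δis | −0.013 | 0.001 | 0.953 | 0.134 | −0.001 | 0.003 | 0.952 | 0.200 | 2.432 |
| MCAR | δin, δia, δis, δiu | 0.010 | 0.002 | 0.988 | 0.240 | −0.087 | 0.007 | 0.993 | 0.455 | 3.022 |
| LI, β1 = 2 | δin, δia | 0.046 | 0.001 | 0.950 | 0.142 | 0.008 | 0.003 | 0.957 | 0.219 | 2.204 |
| LI, β1 = 2 | δin, δia, δis | 0.030 | 0.001 | 0.963 | 0.142 | 0.003 | 0.003 | 0.951 | 0.217 | 2.627 |
| LI, β1 = 2 | δin, δia, δis, δiu | 0.065 | 0.004 | 0.961 | 0.247 | −0.066 | 0.008 | 0.989 | 0.445 | 2.321 |
| MNAR, γ11 = −2 | δin, δia, δis, δiu | −0.022 | 0.003 | 0.978 | 0.241 | −0.319 | 0.021 | 0.945 | 0.507 | 7.493 |

"All studies" refers to the proposed model including all studies; "Complete data" refers to the analysis on studies with complete data.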

ReBias = Bias/True Value

Under the different missingness assumptions, the proposed model provided nearly unbiased estimates for θCACE with smaller MSE. Generally, the estimates were slightly more biased when the data were generated under LI or MNAR compared to MCAR, or as the number of random effects increased. The coverage probabilities remained close to or above the nominal level 0.95 in all scenarios. The naive method that discards studies with incomplete data also performed reasonably well when data were generated under the MCAR or LI missingness mechanism, with little or no bias, though the proposed method was more efficient with consistently smaller MSE and shorter 95% credible interval length, as it gained efficiency by including information from more studies. However, when data were generated with missingness probabilities that were strongly associated with one component of θiCACE (the MNAR assumption), the naive approach using only studies with complete compliance data had substantially larger relative bias and MSE. Moreover, the relative efficiency values were greater than two in all scenarios, providing evidence that our proposed model is much more efficient than simply discarding studies without complete compliance data.

Under each missing mechanism and true data-generating model, we fit four candidate models with different numbers of random effects as described in Section 5.1. The Supplementary Materials present additional simulation results with interpretations, including a table summarizing the frequency of selecting each candidate model as the “best” model in each set of simulations (Table W2), and a table of the relative bias, MSE, 95% credible interval coverage probability, and 95% CIL for θCACE fitting the four candidate models under each data-generating scenario (Table W3). The results indicate that random effects should be selected carefully to account for potential between-study heterogeneity when estimating CACE in a meta-analysis.

6. Discussion

We proposed an innovative Bayesian hierarchical model to estimate CACE in the meta-analysis of RCTs, accounting for both heterogeneous and incompletely reported noncompliance among studies, and we applied it to a case study of epidural analgesia trials to estimate the average causal effect on the cesarean section after accounting for noncompliance. We also conducted simulation studies to evaluate the performance of our approach under different missingness mechanisms and the impact of misspecification of random effects. To the best of our knowledge, this is the first meta-analysis of RCTs estimating CACE while adjusting for incomplete noncompliance data.

Using the proposed method, all 27 epidural analgesia trials were included in the CACE meta-analysis. Including information from studies with incomplete noncompliance data may introduce additional heterogeneity into the meta-analysis and may affect the overall CACE estimate. Compared to the estimated risk difference of 0.8% (95% CI: −0.3%, 1.9%) given by the two-step ITT meta-analysis, the estimated CACE was 4.1% (95% CrI: −0.3%, 10.5%). Thus we conclude that the potential dilution of the estimated treatment effect by the ITT meta-analysis notwithstanding, epidural analgesia in labor does not affect the risk of cesarean section in a strict causal interpretation. This method allows us simultaneously to account both for the inherent heterogeneity in noncompliance rates between treatment arms and across studies (as shown in Figure 1), and for incomplete noncompliance data in some studies. It provides a feasible way to estimate a clinically meaningful causal effect in meta-analysis by including all studies, which can be applied to different therapeutic areas in RCTs with binary or ordinal outcomes.

The simulations indicated that 1) our approach had a good chance of identifying the correct model, and 2) our proposed model had better efficiency for estimating CACE, with smaller MSE and shorter credible intervals, compared to the model only using trials with complete compliance data. Our simulations under the MNAR assumption did not discard any studies when fitting the proposed models, so we expected they would still give unbiased CACE estimates. The naive approach including only studies with complete compliance data gave biased estimates, which was not surprising because the studies with complete data are no longer representative of all studies under the MNAR missing mechanism.

Besides handling the situation in which some studies in a meta-analysis do not report compliance information, the proposed method can be extended to handle missing outcomes. Missing outcome data in RCTs commonly occur when researchers do not collect follow-up outcomes for some subjects. For example, consider a vaccine trial in which patients randomized to the vaccination group were encouraged to receive a flu shot, but patients themselves decided whether to receive flu shots, and the vaccination actually received was recorded. For the outcome of flu-related hospitalization, missing outcomes could occur if some patients had flu but were treated at hospitals not participating in the study, or if some patients simply had unknown hospitalization status. In this case, we can extend the likelihood in Equation (2) by adding a column “missing” to the right of Table 3, and the corresponding probabilities would be the sum of the probabilities of all cells in that row. This extension could improve the estimation of the probabilities of latent classes. Although outcome data are frequently missing for some patients within a single study, a typical meta-analysis of randomized clinical trials focuses on the treatment’s effect on the outcome variable, so studies reporting only compliance data but not outcome data, if any exist, are generally not included. Nevertheless, the model can be extended to incorporate partial outcome missingness in some studies.

The proposed model can also be extended to incorporate study-level predictors. Depending on whether the study-level covariates are assumed to be associated with the latent compliance class probabilities or the outcome response rates, they can be included in the models for one or more of the transformed study-level parameters πin, πia, sio, bio, uio or vio. For example, if study-specific mean age in a meta-analysis is believed to be associated with the proportion of never-takers, always-takers, and compliers, one can add the mean age variable xi to the equations for the transformed πin, πia. Specifically, in Section 3.2.1 we proposed a generalized inverse logistic transformation to guarantee in study i that πin + πia + πic = 1 and 0 ≤ πin, πia, πic ≤ 1: $\pi_{in} = \frac{\exp(n_i)}{1+\exp(n_i)+\exp(a_i)}$, $\pi_{ia} = \frac{\exp(a_i)}{1+\exp(n_i)+\exp(a_i)}$. In the case with study-level predictor xi, we may want to let ni = αn + βnxi + δin and ai = αa + βaxi + δia. Then the posterior median and credible interval of βn and βa provide information about the magnitude and significance of the association between study-level mean age and the never-taker, always-taker and complier probabilities. Study-level predictors can also be added to the models for outcome response rates depending on the specific nature of the trials and outcome measure. As suggested by several reviewers, study-level covariates may make the LI+MAR assumption more plausible in practice.

Recently, extensions of models estimating CACE with missing data in a single study have been developed. Specifically, Chen et al. (2009) discussed the identifiability and estimation of CACE under a nonignorable missing mechanism; Peng et al. (2004) proposed an extended general location model to estimate the CACE with missing data in the outcome and in baseline covariates. Estimating CACE with missing data in longitudinal and survival outcomes has also been discussed (Yau and Little, 2001). These methods have been proposed only for the single-study setting; potential extensions for estimating CACE in meta-analysis await further development. Furthermore, as network meta-analysis expands the scope of a conventional pairwise meta-analysis to simultaneously compare multiple treatments by synthesizing both direct and indirect information (Lumley, 2002; Zhang et al., 2014), extending the CACE meta-analysis methods to network meta-analysis is also a promising future research topic that awaits further exploration.

Acknowledgments

We are grateful to the Editor, the associate Editor, and anonymous reviewers whose comments greatly improved this article.

This research was supported in part by NIH NLM R21012744 and NLM R01LM012982.

Footnotes

Supplementary Materials

The supplementary materials contain additional results from the case study and simulations. The data and R JAGS code used to produce the results of this paper are available at the GitHub repository https://github.com/JinchengZ/CACEmetaBayes.git.

References

  1. Agresti A (2003). Categorical Data Analysis, volume 482. John Wiley & Sons.
  2. Angrist JD, Imbens GW, and Rubin DB (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91, 444–455.
  3. Baker SG (2011). Estimation and inference for the causal effect of receiving treatment on a multinomial outcome: an alternative approach. Biometrics 67, 319–323.
  4. Baker SG (2020). CACE and meta-analysis (letter to the editor). Biometrics 76, 1383–1384.
  5. Baker SG and Kramer BS (2005). Simple maximum likelihood estimates of efficacy in randomized trials and before-and-after studies, with implications for meta-analysis. Statistical Methods in Medical Research 14, 349–367.
  6. Baker SG, Kramer BS, and Lindeman KS (2016). Latent class instrumental variables: a clinical and biostatistical perspective. Statistics in Medicine 35, 147–160.
  7. Baker SG and Lindeman KS (1994). The paired availability design: a proposal for evaluating epidural analgesia during labor. Statistics in Medicine 13, 2269–2278.
  8. Bannister-Tyrrell M, Miladinovic B, Roberts CL, and Ford JB (2015). Adjustment for compliance behavior in trials of epidural analgesia in labor using instrumental variable meta-analysis. Journal of Clinical Epidemiology 68, 525–533.
  9. Chen H, Geng Z, and Zhou X-H (2009). Identifiability and estimation of causal effects in randomized trials with noncompliance and completely nonignorable missing data. Biometrics 65, 675–682.
  10. Cheng J (2009). Estimation and inference for the causal effect of receiving treatment on a multinomial outcome. Biometrics 65, 96–103.
  11. Chu H, Nie L, Chen Y, Huang Y, and Sun W (2012). Bivariate random effects models for meta-analysis of comparative studies with binary outcomes: methods for the absolute risk difference and relative risk. Statistical Methods in Medical Research 21, 621–633.
  12. Egger M, Davey-Smith G, and Altman D (2008). Systematic Reviews in Health Care: Meta-analysis in Context. John Wiley & Sons.
  13. Frangakis CE and Rubin DB (2002). Principal stratification in causal inference. Biometrics 58, 21–29.
  14. Gelfand AE and Smith AFM (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398–409.
  15. Gelman A and Rubin DB (1992). Inference from iterative simulation using multiple sequences. Statistical Science 7, 457–472.
  16. Imbens GW and Rubin DB (1997). Bayesian inference for causal effects in randomized experiments with noncompliance. The Annals of Statistics 25, 305–327.
  17. Jackson D, Riley R, and White IR (2011). Multivariate meta-analysis: potential and promise. Statistics in Medicine 30, 2481–2498.
  18. Jo B, Ginexi EM, and Ialongo NS (2010). Handling missing data in randomized experiments with noncompliance. Prevention Science 11, 384–396.
  19. Lian Q, Hodges JS, and Chu H (2019). A Bayesian hierarchical summary receiver operating characteristic model for network meta-analysis of diagnostic tests. Journal of the American Statistical Association 114, 949–961.
  20. Lin D and Zeng D (2010). On the relative efficiency of using summary statistics versus individual-level data in meta-analysis. Biometrika 97, 321–332.
  21. Little RJ and Rubin DB (2014). Statistical Analysis with Missing Data. John Wiley & Sons.
  22. Louis TA and Zeger SL (2009). Effective communication of standard errors and confidence intervals. Biostatistics 10, 1–2.
  23. Lumley T (2002). Network meta-analysis for indirect treatment comparisons. Statistics in Medicine 21, 2313–2324.
  24. Ma X, Lian Q, Chu H, Ibrahim JG, and Chen Y (2018). A Bayesian hierarchical model for network meta-analysis of multiple diagnostic tests. Biostatistics 19, 87–102.
  25. O’Malley AJ and Normand S-LT (2005). Likelihood methods for treatment noncompliance and subsequent nonresponse in randomized trials. Biometrics 61, 325–334.
  26. Peng Y, Little RJ, and Raghunathan TE (2004). An extended general location model for causal inferences from data subject to noncompliance and missing values. Biometrics 60, 598–607.
  27. Plummer M (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing, volume 124, page 125. Vienna, Austria.
  28. Riley RD, Jackson D, Salanti G, Burke DL, Price M, Kirkham J, and White IR (2017). Multivariate and network meta-analysis of multiple outcomes and multiple treatments: rationale, concepts, and examples. BMJ 358, j3932.
  29. Rubin DB (1980). Randomization analysis of experimental data: The Fisher randomization test comment. Journal of the American Statistical Association 75, 591–593.
  30. Spiegelhalter DJ, Best NG, Carlin BP, and Van Der Linde A (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64, 583–639.
  31. Stuart EA, Perry DF, Le H-N, and Ialongo NS (2008). Estimating intervention effects of prevention programs: accounting for noncompliance. Prevention Science 9, 288–298.
  32. Ueberhuber CW (1997). Numerical Computation 1: Methods, Software, and Analysis, volume 16. Springer Science & Business Media.
  33. Yau LHY and Little RJ (2001). Inference for the complier-average causal effect from longitudinal data subject to noncompliance and missing data, with application to a job training assessment for the unemployed. Journal of the American Statistical Association 96, 1232–1244.
  34. Ye C, Beyene J, Browne G, and Thabane L (2014). Estimating treatment effects in randomised controlled trials with non-compliance: a simulation study. BMJ Open 4, e005362.
  35. Zeger SL, Liang K-Y, and Albert PS (1988). Models for longitudinal data: a generalized estimating equation approach. Biometrics 44, 1049–1060.
  36. Zhang J, Carlin BP, Neaton JD, Soon GG, Nie L, Kane R, Virnig BA, and Chu H (2014). Network meta-analysis of randomized clinical trials: reporting the proper summaries. Clinical Trials 11, 246–262.
  37. Zhang J, Chu H, Hong H, Virnig BA, and Carlin BP (2017). Bayesian hierarchical models for network meta-analysis incorporating nonignorable missingness. Statistical Methods in Medical Research 26, 2227–2243.
  38. Zhou J, Hodges JS, and Chu H (2020). Rejoinder to “CACE and meta-analysis (letter to the editor)” by Stuart Baker. Biometrics 76, 1385–1389.
  39. Zhou J, Hodges JS, Suri MFK, and Chu H (2019). A Bayesian hierarchical model estimating CACE in meta-analysis of randomized clinical trials with noncompliance. Biometrics 75, 978–987.
