Author manuscript; available in PMC: 2020 Nov 30.
Published in final edited form as: Stat Med. 2019 Dec 9;39(3):252–264. doi: 10.1002/sim.8396

Exact inference for the random-effect model for meta-analyses with rare events

Jessica Gronsbell 1, Chuan Hong 2, Lei Nie 3, Ying Lu 1, Lu Tian 1
PMCID: PMC7704100  NIHMSID: NIHMS1649227  PMID: 31820458

Abstract

Meta-analysis allows for the aggregation of results from multiple studies to improve statistical inference for the parameter of interest. In recent years, random-effect meta-analysis has been employed to synthesize estimates of incidence rates of adverse events across heterogeneous clinical trials to evaluate treatment safety. However, the validity of existing approaches relies on asymptotic approximation as the number of studies becomes large. In practice, a limited number of trials are typically available for analysis. Moreover, adverse events are typically rare; thus, study-specific incidence rate estimates may be unstable or undefined. In this paper, we present a method for construction of an exact confidence interval for the location parameter of the beta-binomial model through inversion of exact tests. The coverage level of the proposed confidence interval is guaranteed to achieve at least the nominal level, regardless of the number of studies or the within-study sample size, making it particularly applicable to the study of rare-event data.

Keywords: exact inference, meta-analysis, random-effect model, rare events

1 |. INTRODUCTION

Interval estimation for a binomial probability is one of the most well-studied and methodologically important problems in statistics. For instance, premarketing and postmarketing evaluation of the safety of pharmaceutical products often rests on the estimation of the incidence rate of adverse events (AEs). When a single trial is available for assessment, inference for the incidence rate may be made with the simple binomial model or various asymptotic methods.1–3 However, AEs are typically rare; therefore, single trials are not sufficiently powered to make sound inference. A recent example involves Mylotarg, a targeted treatment for acute myeloid leukemia (AML) that was voluntarily withdrawn from the US market after confirmatory trials identified safety concerns such as early mortality (EM). The incidence rate of EM was estimated to be 1∕14 = 7.1% with an exact binomial 95% confidence interval (CI) of [0.2%, 33.9%] based on data from a multicenter Phase II trial.4 As the results of additional studies were made available, a meta-analytic approach enabled the synthesis of information from individual trials to make more reliable inference on the rate of EM. The simplest of such methods is the fixed-effect (FE) meta-analysis that assumes that the underlying study-specific incidence rates are identical; thus, the data may be aggregated and similarly evaluated with the binomial model. For example, additional studies reported 2 out of 8, 0 out of 6, 0 out of 6, and 2 out of 7 patients experienced EM. Combining these results yields an estimated incidence rate of (1 + 2 + 0 + 0 + 2)∕(14 + 8 + 6 + 6 + 7) = 5∕41 = 12.2% with a narrower 95% CI of [4.1%, 26.2%].
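The exact binomial intervals quoted in this paragraph can be reproduced with a short sketch. It assumes that "exact" refers to the usual Clopper-Pearson construction (inverting two one-sided binomial tests); the function names `binom_cdf`, `bisect_decreasing`, and `exact_binomial_ci` are illustrative and not taken from the paper.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def bisect_decreasing(f, lo=0.0, hi=1.0, iters=60):
    """Root of a decreasing function f on [lo, hi] with f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(y, n, alpha=0.05):
    """Clopper-Pearson interval for y events out of n subjects (assumed construction)."""
    # Lower limit: the p at which P(X >= y | p) equals alpha / 2.
    lower = 0.0 if y == 0 else bisect_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(y - 1, n, p)))
    # Upper limit: the p at which P(X <= y | p) equals alpha / 2.
    upper = 1.0 if y == n else bisect_decreasing(
        lambda p: binom_cdf(y, n, p) - alpha / 2)
    return lower, upper

for y, n in [(1, 14), (5, 41)]:  # single trial, then the pooled FE data
    lo, hi = exact_binomial_ci(y, n)
    print(f"{y}/{n}: rate = {y / n:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

The two printed intervals can be compared with the single-trial and pooled intervals quoted above.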

However, the assumption of a common incidence rate across studies is often too restrictive. Studies typically differ in various aspects such as patient inclusion criteria and treatment administration, resulting in heterogeneity of study-specific effects. A more realistic model allows the study-specific rates to follow a random distribution. The objective of statistical inference, and the aim of random-effect meta-analysis, is therefore to estimate the parameter (oftentimes the location parameter) characterizing such a distribution.5 In the current literature, the most popular random-effect model assumes that

$$\hat\mu_i \mid \mu_i \sim N(\mu_i, \sigma_i^2), \quad \mu_i \sim N(\mu_0, \tau_0^2), \quad i = 1, \ldots, K \tag{1}$$

or equivalently that

$$\hat\mu_i \sim N(\mu_0, \sigma_i^2 + \tau_0^2),$$

where $\hat\mu_i$ is the point estimator of the study-specific effect $\mu_i$, $\mu_0$ is the overall effect, $\sigma_i^2$ is the known within-study variance, $\tau_0^2$ is the unknown between-study variance, and $K$ is the number of studies. When the outcome of interest is binary, $\mu_i$ and its empirical estimator are commonly taken as $\mathrm{logit}(\pi_i)$, where $\pi_i$ is the underlying probability of observing the event in the $i$th study.

In their seminal paper, DerSimonian and Laird (DL) proposed to estimate $\mu_0$ with the optimal linear combination of the $\hat\mu_i$, where the weights are constructed via a simple moment estimator for the between-study variance $\tau_0^2$.6 The associated 100(1 − α)% CI for $\mu_0$ is constructed based on an asymptotic approximation requiring that K → ∞. Although the DL method is widely used in practice, the number of studies utilized in a meta-analysis is typically not large and frequently less than 7.7 In the Mylotarg example, for instance, the total number of available trials for analysis amounted to 7. As a result, $\tau_0^2$ may be imprecisely estimated and the finite sample performance of the DL CI can be poor. Performance issues are further exacerbated for binomial outcomes when study-specific sample sizes are small and/or $\pi_i$ is close to zero (or one) as the normality assumption of (1) is violated.8–12 Additionally, the logit proportion and its estimate of variance become undefined for zero-event studies, leading to the use of arbitrary continuity corrections.13 These corrections may induce bias and do not alleviate the normality violation. Nonlinear random-effect models have thus been proposed such as the normal-binomial and beta-binomial (BB) models, but the performance of these methods still relies on a large number of individual studies.10,14,15
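For orientation, the DL construction discussed above can be sketched in a few lines. This is the standard textbook form of the moment estimator, not code from the paper; the function names, the choice to apply the 0.5 continuity correction only to studies with zero or all events, and the use of a normal quantile of 1.96 are assumptions made for illustration. The data are the five EM studies listed in Table A2.

```python
from math import exp, log, sqrt

def dersimonian_laird(est, var):
    """DL pooled estimate, its standard error, and the moment estimate of tau^2."""
    k = len(est)
    w = [1.0 / v for v in var]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, est)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, est))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)              # truncated at zero
    w_star = [1.0 / (v + tau2) for v in var]        # random-effect weights
    mu = sum(wi * yi for wi, yi in zip(w_star, est)) / sum(w_star)
    return mu, sqrt(1.0 / sum(w_star)), tau2

def logit_with_correction(y, n, cc=0.5):
    """Logit proportion and its variance; cc is added only at the boundary (assumption)."""
    if y == 0 or y == n:
        y, n = y + cc, n + 2 * cc
    return log(y / (n - y)), 1 / y + 1 / (n - y)

ys, ns = [2, 1, 0, 0, 2], [8, 14, 6, 6, 7]          # EM data, 6 mg/m2 regimen
est, var = zip(*(logit_with_correction(y, n) for y, n in zip(ys, ns)))
mu, se, tau2 = dersimonian_laird(est, var)

def expit(x):
    return 1 / (1 + exp(-x))

print(f"pooled rate = {expit(mu):.3f}, tau^2 = {tau2:.3f}")
print(f"95% CI = [{expit(mu - 1.96 * se):.3f}, {expit(mu + 1.96 * se):.3f}]")
```

With only five small studies, both the weights and the interval depend heavily on the continuity correction, which is the sensitivity the paragraph above describes.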

To address these limitations, we propose a method for constructing an exact CI for the location parameter of a BB model based on the inversion of exact tests. The coverage level of the CI is guaranteed to achieve at least the nominal level up to Monte Carlo error with simulations confirming that the CI is not overly conservative. Moreover, our implementation of the proposed method allows for efficient estimation on a personal computer. In Section 2, we present our procedure for construction of the exact CI. The performance of our method is then evaluated based on extensive simulation studies and several meta-analyses aiming to assess the safety and efficacy of Mylotarg in Section 3. We conclude with a discussion in Section 4.

2 |. METHODS

2.1 |. Data structure

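The observed data consist of $D_0 = \{Y_i \mid i = 1, \ldots, K\}$, where $Y_i$ is the number of events out of $n_i$ subjects in the $i$th study for $i = 1, \ldots, K$. To allow for between-study variation, we assume a BB random-effect model:

$$Y_i \mid \pi_i \overset{\text{ind.}}{\sim} \mathrm{Bin}(n_i, \pi_i), \quad \pi_i \overset{\text{ind.}}{\sim} \mathrm{Beta}(\alpha_0, \beta_0), \quad i = 1, \ldots, K,$$

or equivalently that

$$Y_i \sim \mathrm{BB}(n_i, \alpha_0, \beta_0).$$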

We require α0, β0 > 1 to ensure the unimodality of the random-effect distribution and hence the identifiability of the location parameter. This model implies

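$$E\left(\frac{Y_i}{n_i}\right) = \frac{\alpha_0}{\alpha_0 + \beta_0} = \mu_0$$

and

$$\mathrm{Var}\left(\frac{Y_i}{n_i}\right) = \frac{\mu_0(1-\mu_0)}{n_i} + \left(1 - \frac{1}{n_i}\right)\nu_0 = \mu_0(1-\mu_0)\left\{\frac{1}{n_i} + \left(1 - \frac{1}{n_i}\right)\tau_0\right\},$$

where $\nu_0 = \mu_0(1-\mu_0)\tau_0$ is the variance of the random-effect distribution and $\tau_0 = (\alpha_0 + \beta_0 + 1)^{-1}$ represents the between-study heterogeneity. For clarity of presentation, we reparameterize the BB distribution with respect to $\mu_0$ and $\nu_0$ based on the following equivalences:

$$\alpha_0 = \mu_0\left\{\frac{\mu_0(1-\mu_0) - \nu_0}{\nu_0}\right\} \quad\text{and}\quad \beta_0 = (1-\mu_0)\left\{\frac{\mu_0(1-\mu_0) - \nu_0}{\nu_0}\right\}. \tag{2}$$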
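The reparameterization in (2) is easy to check numerically. The sketch below converts between (α0, β0) and (μ0, ν0) and evaluates (μ0, τ0) for the four (α0, β0) pairs used later in the simulation study; the function names are illustrative only.

```python
def to_mu_nu(alpha, beta):
    """(alpha, beta) -> (mu, nu), with nu = mu (1 - mu) tau and tau = 1 / (alpha + beta + 1)."""
    mu = alpha / (alpha + beta)
    tau = 1.0 / (alpha + beta + 1.0)
    return mu, mu * (1.0 - mu) * tau

def to_alpha_beta(mu, nu):
    """Inverse map given by equation (2)."""
    s = (mu * (1.0 - mu) - nu) / nu     # equals alpha + beta
    return mu * s, (1.0 - mu) * s

def proportion_variance(n, mu, nu):
    """Var(Y / n) for Y ~ BB(n, alpha, beta), written in terms of (mu, nu)."""
    return mu * (1.0 - mu) / n + (1.0 - 1.0 / n) * nu

for alpha, beta in [(1.2, 75), (3.6, 225), (1.2, 10), (3.6, 30)]:
    mu, nu = to_mu_nu(alpha, beta)
    tau = nu / (mu * (1.0 - mu))
    a, b = to_alpha_beta(mu, nu)        # round trip back to (alpha, beta)
    print(f"alpha={a:.1f}, beta={b:.1f}: mu0={mu:.3f}, tau0={tau:.3f}")
```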

Remark 1.

While the standard BB model allows α0, β0 > 0, our additional requirement of α0, β0 > 1 implies that

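$$\nu_0 \le \mu_0(1-\mu_0)\min\left(\frac{\mu_0}{1+\mu_0}, \frac{1-\mu_0}{2-\mu_0}\right) = \nu_{\sup}(\mu_0), \tag{3}$$

which significantly reduces the parameter space of the standard BB model to

$$\{(\mu_0, \nu_0) \mid \nu_0 \le \nu_{\sup}(\mu_0)\}.$$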

The reduced parameter space in the (μ0, ν0) plane is illustrated in Figure 1. The constraint on α0 and β0 serves a dual purpose. First, it guarantees that the distribution of πi is unimodal and thus facilitates the interpretation of μ0 = E(πi). Second, it ensures that μ0 is identifiable when all of the Yi are close to zero. For instance, consider the toy example:

$$\{(Y_i, n_i) = (0, 1000) \mid i = 1, \ldots, 6\}.$$

Intuitively, this presents strong evidence that all of the πi are close to zero and so is μ0. However, without our unimodality constraint, one cannot rule out the possibility that πi ~ Beta(0.001, 0.003), which approximately assigns probabilities 3/4 and 1/4 to πi = 0 and πi = 1, respectively. Under this model, there is about a 17% probability that all six πi are very close to zero by chance and generate the observed data. The corresponding μ0 = 1∕4, which clearly contradicts our intuition that μ0 should be ≈ 0 as all six observed incidence rates are zero. This paradox is caused by the second mode of the Beta(0.001, 0.003) distribution and can be resolved by the proposed constraint on α0 and β0.
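Both the bound in (3) and the toy example can be checked directly. The sketch below assumes that the quoted figure of about 17% is the marginal probability of observing zero events in all six studies under Beta(0.001, 0.003); it uses the closed form P(Y = 0) = ∏_{j=0}^{n−1} (β + j)∕(α + β + j) for Y ~ BB(n, α, β). The function names are illustrative.

```python
def nu_sup(mu):
    """Upper bound on nu implied by alpha, beta > 1 (equation (3))."""
    return mu * (1 - mu) * min(mu / (1 + mu), (1 - mu) / (2 - mu))

def prob_zero_events(n, alpha, beta):
    """P(Y = 0) for Y ~ BB(n, alpha, beta), i.e. E[(1 - pi)^n] for pi ~ Beta(alpha, beta)."""
    p = 1.0
    for j in range(n):
        p *= (beta + j) / (alpha + beta + j)
    return p

# Toy example: six studies of size 1000 with no events, pi_i ~ Beta(0.001, 0.003).
p_all_zero = prob_zero_events(1000, 0.001, 0.003) ** 6
print(f"P(all six studies have zero events) = {p_all_zero:.3f}")

# On the boundary nu = nu_sup(mu) with mu < 1/2, equation (2) gives alpha = 1.
mu = 0.1
nu = nu_sup(mu)
alpha_on_boundary = mu * (mu * (1 - mu) - nu) / nu
print(f"nu_sup({mu}) = {nu:.5f}, alpha on the boundary = {alpha_on_boundary:.3f}")
```

The first printed value is roughly 0.17, in line with the probability quoted in the text, while μ0 = 0.001∕0.004 = 1∕4 under the same model.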

FIGURE 1.

The boundary of the parameter space in the (μ, ν) plane for the standard beta-binomial model (solid) and our restricted model (dashed)

2.2 |. Proposed exact CI for the location parameter

With the aim of constructing an interval for μ0 guaranteed to achieve at least the nominal level when K is small, we regard ν0 as a nuisance parameter. To this end, we consider an unconditional test for testing

$$H_0: \mu_0 = \mu$$

based on a test statistic T(μ;D0) that is a function of both μ and the observed data D0. For now, we assume that larger values of T(μ;D0) provide more evidence against H0 and defer our discussion about the specific choice of T(μ;D0) to the subsequent section. To control the type I error, the unconditional test eliminates ν0 with the profile p value:

$$p(\mu; D_0) = \sup_{\nu} P\{T(\mu; D_{\mu,\nu}) \ge T(\mu; D_0)\} = \sup_{\nu} p(\mu, \nu; D_0),$$

where the probability is with respect to the random data $D_{\mu,\nu}$ following the mixed-effect BB model with parameters μ and ν. Thus, we would reject the null hypothesis if $p(\mu; D_0) < \alpha$, where α stands for the type I error (rather than the parameter of the Beta distribution). The corresponding 100(1 − α)% CI is the set of μ for which $p(\mu; D_0) \ge \alpha$.

Assuming that the true values of μ and ν are covered by the finite intervals [μL, μU] and [νL, νU], respectively, an exact CI for μ0 may then be constructed with the following generic two-step procedure:

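  • (A) For each value (μ, ν) in a dense H × J grid G over [μL, μU] × [νL, νU], compute $p(\mu, \nu; D_0)$ based on the null distribution of the test statistic $T(\mu; D_{\mu,\nu})$.

  • (B) Obtain the (1 − α) level CI for μ0 as
    $$\left[\inf\{\mu \in [\mu_L, \mu_U] : (\mu, \nu) \in \Omega_{1-\alpha}(D_0)\},\ \sup\{\mu \in [\mu_L, \mu_U] : (\mu, \nu) \in \Omega_{1-\alpha}(D_0)\}\right],$$
    where $\Omega_{1-\alpha}(D_0) = \{(\mu, \nu) \mid p(\mu, \nu; D_0) \ge \alpha\}$ is the 100(1 − α)% confidence region for (μ0, ν0).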

However, the exact distribution of $T(\mu; D_{\mu,\nu})$, which is needed to calculate $p(\mu, \nu; D_0)$, is typically not tractable. We therefore propose to augment the first step (A) in the above procedure by approximating the null distribution through Monte Carlo simulation. Specifically, for each (μ, ν) ∈ G, we:

  • (A.1)

    Generate $D_m = \{Y_{im} \mid i = 1, \ldots, K\}$, where $Y_{im} \sim \mathrm{BB}(n_i, \mu, \nu)$, for m = 1, …, M.

  • (A.2)

    Compute $T_m = T(\mu; D_m)$ and approximate the distribution of $T(\mu; D_{\mu,\nu})$ with the empirical distribution of $\{T_m \mid m = 1, \ldots, M\}$.

  • (A.3)

    Compute the p value as $p(\mu, \nu; D_0) = M^{-1}\sum_{m=1}^{M} I\{T_m \ge T(\mu; D_0)\}$.
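Steps (A.1) to (A.3) amount to a plain Monte Carlo p value at a single grid point (μ, ν), which can be sketched as follows. Because the paper's test statistic is only introduced in the next section, the sketch uses the simple squared-difference statistic mentioned in Remark 3 as a stand-in. The function names, the seed, and the example values of μ and ν are assumptions for illustration.

```python
import random

def sample_bb(n, mu, nu, rng):
    """One draw from BB(n, mu, nu) in the (mu, nu) parameterization of equation (2)."""
    s = (mu * (1 - mu) - nu) / nu                 # alpha + beta
    pi = rng.betavariate(mu * s, (1 - mu) * s)    # study-specific rate
    return sum(rng.random() < pi for _ in range(n))

def t_simple(mu, ys, ns):
    """Stand-in statistic (sum Y_i - mu * sum n_i)^2; larger values are evidence against H0."""
    return (sum(ys) - mu * sum(ns)) ** 2

def mc_p_value(mu, nu, ys, ns, stat=t_simple, M=2000, seed=0):
    """Monte Carlo estimate of p(mu, nu; D0) following steps (A.1)-(A.3)."""
    rng = random.Random(seed)
    t_obs = stat(mu, ys, ns)
    hits = 0
    for _ in range(M):
        sim = [sample_bb(n, mu, nu, rng) for n in ns]   # (A.1) simulate D_m
        if stat(mu, sim, ns) >= t_obs:                  # (A.2) compute T_m and compare
            hits += 1
    return hits / M                                     # (A.3) empirical tail probability

ys, ns = [2, 1, 0, 0, 2], [8, 14, 6, 6, 7]   # EM data from Table A2
print(mc_p_value(mu=0.12, nu=0.005, ys=ys, ns=ns))
print(mc_p_value(mu=0.60, nu=0.005, ys=ys, ns=ns))
```

Repeating this call over every (μ, ν) in the grid G gives the O(MHJ) cost noted in the text.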

While the above procedure guarantees that the resulting CI will achieve at least nominal coverage up to Monte Carlo error, the computational complexity is of O(MHJ) which may be restrictive when estimating the CI on a personal computer. In the next section, we describe our choice of test statistic as well as the details of the numerical computation that allows for the CI to be efficiently constructed.

2.3 |. Proposed test statistic and details of computation

The speed of our proposed procedure clearly rests on the choice of T(μ;D0). Our goal in constructing the test statistic is thus to balance computational efficiency with performance. Here, we propose the use of a simple Wald statistic based on a DL-type estimator. In particular, we suggest estimating μ0 with

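$$\hat\mu = \frac{\sum_{i=1}^{K} (Y_i/n_i)\, w_i^{-1}}{\sum_{i=1}^{K} w_i^{-1}},$$

where

$$\hat\nu = \max\left\{0,\ \frac{\sum_{i=1}^{K}\left\{(\tilde Y_i/\tilde n_i)^2 - \hat\mu_{\mathrm{int}}/\tilde n_i\right\}}{\sum_{i=1}^{K}(1 - 1/\tilde n_i)} - \hat\mu_{\mathrm{int}}^2\right\}, \quad \hat\mu_{\mathrm{int}} = \frac{\sum_{i=1}^{K}\tilde Y_i}{\sum_{i=1}^{K}\tilde n_i},$$

$w_i = \hat\mu_{\mathrm{int}}(1 - \hat\mu_{\mathrm{int}})/n_i + (1 - 1/n_i)\hat\nu$, and

$$(\tilde Y_i, \tilde n_i) = \begin{cases} (Y_i + 1,\ n_i + 2) & \text{if } Y_i = 0 \text{ or } Y_i = n_i \text{ for } i = 1, \ldots, K, \\ (Y_i,\ n_i) & \text{otherwise.} \end{cases}$$

The continuity correction is employed to ensure that $\mathrm{Var}(Y_i/n_i \mid \pi_i) > 0$ and $\hat\nu$ is well defined when $Y_i = 0$ or $Y_i = n_i$ in all $K$ studies. The test statistic is taken accordingly as

$$T(\mu; D_0) = \left(\sum_{i=1}^{K} w_i^{-1}\right)(\hat\mu - \mu)^2 \tag{4}$$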

and may be computed quickly.
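A minimal sketch of the statistic in (4) follows. It reads the continuity correction as applying to every study only when all K studies have Yi = 0 or Yi = ni, which is one interpretation of the condition stated above; this reading, and the function name `dl_type_statistic`, are assumptions.

```python
def dl_type_statistic(mu, ys, ns):
    """Return (T(mu; D0), mu_hat, nu_hat) for event counts ys and sample sizes ns."""
    # Continuity correction, applied only if every study sits on the boundary (assumed reading).
    if all(y == 0 or y == n for y, n in zip(ys, ns)):
        yt, nt = [y + 1 for y in ys], [n + 2 for n in ns]
    else:
        yt, nt = list(ys), list(ns)

    mu_int = sum(yt) / sum(nt)                                  # pooled initial estimate
    num = sum((y / n) ** 2 - mu_int / n for y, n in zip(yt, nt))
    den = sum(1 - 1 / n for n in nt)
    nu_hat = max(0.0, num / den - mu_int ** 2)                  # moment estimate of nu

    # Estimated variance of Y_i / n_i, built from the original sample sizes.
    w = [mu_int * (1 - mu_int) / n + (1 - 1 / n) * nu_hat for n in ns]
    inv_w = [1 / wi for wi in w]
    mu_hat = sum((y / n) * iw for y, n, iw in zip(ys, ns, inv_w)) / sum(inv_w)

    return sum(inv_w) * (mu_hat - mu) ** 2, mu_hat, nu_hat

ys, ns = [2, 1, 0, 0, 2], [8, 14, 6, 6, 7]   # EM data from Table A2
t, mu_hat, nu_hat = dl_type_statistic(0.10, ys, ns)
print(f"mu_hat = {mu_hat:.4f}, nu_hat = {nu_hat:.5f}, T(0.10) = {t:.4f}")
```

Each evaluation costs O(K) arithmetic operations, so the M evaluations needed per grid point remain cheap.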

Moreover, our empirical results suggest a desirable property of $T(\mu; D_0)$: for a fixed μ and ν1 < ν2,

$$P\{T(\mu; D_{\mu,\nu_1}) \ge t\} < P\{T(\mu; D_{\mu,\nu_2}) \ge t\} \tag{5}$$

for t in the tail region of interest. This inequality implies that

$$p(\mu; D_0) = \sup_{\nu} p(\mu, \nu; D_0) \approx p\{\mu, \nu_{\sup}(\mu); D_0\},$$

which can greatly simplify the calculation of $p(\mu; D_0)$. While (5) is not straightforward to verify analytically and holds only approximately, the underlying rationale is intuitive: a larger variance in the distribution of πi implies higher variability for the test statistic and in turn increases the tail probability2. Figure 2 contains an example confidence region based on the proposed test statistic in the (μ, ν) plane supporting the claim in (5). We leverage this property to reduce the computational complexity by a factor of O(J) by considering only values along the boundary of the reduced BB parameter space illustrated in Figure 2. In practice, our CI may be constructed based on the following simple procedure:

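  • (Step 1)

    Obtain the CI based on the asymptotic chi-square approximation to $T(\mu; D_0)$, denoted by $[\mu_{LB}^{MOM}, \mu_{UB}^{MOM}]$.

  • (Step 2)
    Letting $\tilde p(\mu; D_0) = p\{\mu, \nu_{\sup}(\mu); D_0\}$, compute
    $$\tilde p(\tilde\mu_{LB}^{MOM}; D_0) \quad\text{and}\quad \tilde p(\tilde\mu_{UB}^{MOM}; D_0),$$
    where $\tilde\mu_{LB}^{MOM} = \max\{s, \mu_{LB}^{MOM}\}$, $\tilde\mu_{UB}^{MOM} = \min\{1 - s, \mu_{UB}^{MOM}\}$, and $s$ is the chosen grid size.
  • (Step 3)
    Identify the starting points for the lower and upper confidence bounds, respectively, as
    $$\tilde\mu_{\inf} = \tilde\mu_{LB}^{MOM}\, I\{\tilde p(\tilde\mu_{LB}^{MOM}; D_0) \ge \alpha\} + \hat\mu\, I\{\tilde p(\tilde\mu_{LB}^{MOM}; D_0) < \alpha\}$$
    and
    $$\tilde\mu_{\sup} = \tilde\mu_{UB}^{MOM}\, I\{\tilde p(\tilde\mu_{UB}^{MOM}; D_0) \ge \alpha\} + \hat\mu\, I\{\tilde p(\tilde\mu_{UB}^{MOM}; D_0) < \alpha\}.$$
  • (Step 4)
    Iterate out to the left and right along the boundary provided in (3), computing $\tilde p(\mu; D_0)$ using steps A.1 to A.3, until $[\mu_{LB}, \mu_{UB}]$ is identified, where
    $$\mu_{LB} = \inf\{\mu \mid \tilde p(\mu; D_0) \ge \alpha\}$$
    and
    $$\mu_{UB} = \sup\{\mu \mid \tilde p(\mu; D_0) \ge \alpha\}.$$
    In practice, we would compute
    $$\{\tilde p(\tilde\mu_{\inf} - ks; D_0),\ k = 1, 2, \ldots\} \quad\text{and}\quad \{\tilde p(\tilde\mu_{\sup} + ks; D_0),\ k = 1, 2, \ldots\},$$
    and let
    $$\mu_{LB} = \min\{\tilde\mu_{\inf} - ks \mid \tilde p(\tilde\mu_{\inf} - ks; D_0) \ge \alpha,\ k \ge 0\}$$
    and
    $$\mu_{UB} = \max\{\tilde\mu_{\sup} + ks \mid \tilde p(\tilde\mu_{\sup} + ks; D_0) \ge \alpha,\ k \ge 0\}.$$
  • (Step 5)
    Compute $p(\mu; D_0) = \sup_{\nu} p(\mu, \nu; D_0)$ for $\mu \in [\mu_{LB} - \delta, \mu_{LB}]$ for some small $\delta > 0$. To this end, we need to calculate $p(\mu, \nu; D_0)$ for many pairs of ν and μ. For example, to compute the lower bound of the CI, we would need to compute $p(\mu_{(j)}, \nu_{(ji)}; D_0)$ and approximate $p(\mu_{(j)}; D_0)$ by
    $$\max\{p(\mu_{(j)}, \nu_{(ji)}; D_0) \mid i = 1, \ldots, I_j\},$$
    where $\{\mu_{(1)}, \mu_{(2)}, \ldots, \mu_{(J)}\}$ are $J$ equally spaced points within the interval $[\mu_{LB} - \delta, \mu_{LB}]$ and $\{\nu_{(j1)}, \nu_{(j2)}, \ldots, \nu_{(jI_j)}\}$ are $I_j$ equally spaced points within the interval $[0, \nu_{\sup}(\mu_{(j)})]$. The lower end of the CI is given by
    $$\min\{\mu_{(j)} \mid p(\mu_{(j)}; D_0) \ge \alpha,\ j = 1, \ldots, J\}.$$
  • (Step 6)

    Likewise, we can calculate the upper end of the CI by computing $p(\mu; D_0)$ for $\mu \in [\mu_{UB}, \mu_{UB} + \delta]$.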

FIGURE 2.

Example confidence region in the (μ, ν) plane illustrating the stochastic dominance property in (5). The dashed line is the boundary of the parameter space for the restricted beta-binomial model

This algorithm uses the fact that $p(\mu; D_0) \approx p\{\mu, \nu_{\sup}(\mu); D_0\}$, so that the cutoff values μLB and μUB based on the latter are very close to the ends of the CI. In practice, further gains in computational speed may be achieved with standard parallelization techniques.

Remark 2.

While we may consider other scales (such as logit) in deciding the grid points for searching the confidence bounds of μ, we prefer the original scale with the intention to control the precision on the probability scale. In practice, the grid size s needs to be chosen according to the required precision of the incidence rate of interest. Oftentimes, s = 0.001 is a sensible choice. The number of Monte Carlo iterations in calculating p(μ,ν;D0) should also depend on the desired precision. For example, if we want to estimate a p-value of 0.05 with a standard error of 0.005 then M needs to be ≥ 2000. The tuning parameter, δ, in steps 5 and 6 can be set as a multiple of s such as 10s in practice.
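The requirement on M in this remark follows from the binomial standard error of a Monte Carlo proportion. As a worked check, for a true p value of 0.05 and a target standard error of 0.005:

$$\mathrm{SE}(\hat p) = \sqrt{\frac{p(1-p)}{M}} \le 0.005 \iff M \ge \frac{0.05 \times 0.95}{0.005^2} = 1900,$$

so taking M ≥ 2000 meets the target with a small margin.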

Remark 3.

We note that the selection of the test statistic is not unique. A simple alternative is $\tilde T(\mu; D_0) = \left(\sum_{i=1}^{K} Y_i - \mu\sum_{i=1}^{K} n_i\right)^2$. In our limited experience, we have found that the corresponding exact CI is similar to that based on $T(\mu; D_0)$ when K is small. One potential advantage of this test statistic is that the aforementioned approximation to $p\{\mu, \nu_{\sup}(\mu); D_0\}$ appears to become an equality, i.e., $p(\mu; D_0) = p\{\mu, \nu_{\sup}(\mu); D_0\}$. Thus, we may skip steps 5 and 6 in the proposed algorithm to further accelerate the computation.
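To make the boundary search concrete, the following is a simplified sketch built on the alternative statistic of this remark. It departs from the six-step procedure in several ways that should be read as assumptions: it scans a full coarse grid of μ rather than starting from the MOM interval, it evaluates p values only on the boundary ν = νsup(μ) (i.e., it skips steps 5 and 6), and it uses s = 0.01 and M = 500 purely to keep the run time short. All function names are illustrative.

```python
import random

def nu_sup(mu):
    """Boundary of the restricted parameter space, equation (3)."""
    return mu * (1 - mu) * min(mu / (1 + mu), (1 - mu) / (2 - mu))

def sample_bb(n, mu, nu, rng):
    """One draw from BB(n, mu, nu) using the reparameterization in (2)."""
    s = (mu * (1 - mu) - nu) / nu
    pi = rng.betavariate(mu * s, (1 - mu) * s)
    return sum(rng.random() < pi for _ in range(n))

def t_alt(mu, ys, ns):
    """Alternative statistic from Remark 3: (sum Y_i - mu * sum n_i)^2."""
    return (sum(ys) - mu * sum(ns)) ** 2

def boundary_p_value(mu, ys, ns, M, rng):
    """Monte Carlo p value evaluated at nu = nu_sup(mu)."""
    nu = nu_sup(mu)
    t_obs = t_alt(mu, ys, ns)
    hits = sum(
        t_alt(mu, [sample_bb(n, mu, nu, rng) for n in ns], ns) >= t_obs
        for _ in range(M)
    )
    return hits / M

def exact_ci_sketch(ys, ns, alpha=0.05, s=0.01, M=500, seed=1):
    """Smallest and largest grid values of mu that are not rejected at level alpha."""
    rng = random.Random(seed)
    grid = [k * s for k in range(1, round(1 / s))]
    accepted = [mu for mu in grid
                if boundary_p_value(mu, ys, ns, M, rng) >= alpha]
    return min(accepted), max(accepted)

ys, ns = [2, 1, 0, 0, 2], [8, 14, 6, 6, 7]   # EM data from Table A2
lo, hi = exact_ci_sketch(ys, ns)
print(f"approximate 95% CI for mu0: [{lo:.2f}, {hi:.2f}]")
```

A finer grid and a larger M, as suggested in Remark 2, would be needed before comparing the output with the intervals reported in Section 3.2.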

3 |. NUMERICAL STUDIES

3.1 |. Simulation study

We evaluated the performance of the proposed method in finite samples through a simulation study. Throughout, the observed data were generated as

$$Y_i \mid \pi_i \sim \mathrm{Bin}(n_i, \pi_i), \quad \pi_i \sim \mathrm{Beta}(\alpha_0, \beta_0), \quad i = 1, \ldots, K,$$

for different values of K and (α0, β0) to represent varying degrees of heterogeneity and event rates. Specifically, we consider a rare-event rate setting with

$$\text{Setting 1a: } (\alpha_0, \beta_0) = (1.2, 75) \quad\text{and}\quad \text{Setting 1b: } (\alpha_0, \beta_0) = (3.6, 225)$$

and a low event rate setting with

$$\text{Setting 2a: } (\alpha_0, \beta_0) = (1.2, 10) \quad\text{and}\quad \text{Setting 2b: } (\alpha_0, \beta_0) = (3.6, 30)$$

so that (μ0, τ0) = (0.016, 0.013) and (0.016, 0.004) in Settings 1a and 1b, respectively, and (μ0, τ0) = (0.107, 0.082) and (0.107, 0.029) in Settings 2a and 2b, respectively. For all scenarios, we set K = 5, 10, 15, and 20 and let the study-specific sample sizes be n1 = n2 = 50, n3 = n4 = 100, and ni = 150 for i ≥ 5. For comparison, we present the performance of the CIs based on the DL and Sidik-Jonkman (SJ) methods with (1) the logistic transformation and 0.5 continuity correction (DL-LGT and SJ-LGT) and (2) the arcsine transformation (DL-ASIN and SJ-ASIN), as well as the CI based on the asymptotic chi-square approximation to the proposed test statistic (4), denoted as the method-of-moments (MOM) approach. We have also obtained the exact CI under the FE model, where the pooled data satisfy $\sum_{i=1}^{K} Y_i \sim \mathrm{Bin}\left(\sum_{i=1}^{K} n_i, \mu_0\right)$. In Tables 1 and 2, we summarize the performance of the 95% CIs from the various procedures based on the median interval length and empirical coverage probability from 2000 simulated data sets.
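The data-generating step of this simulation design can be sketched as follows. The dictionary of settings and the sample-size pattern are taken from the description above; the function names and the seed are illustrative, and the sketch covers only data generation, not the interval methods being compared.

```python
import random

# (alpha0, beta0) for the four settings described in the text.
SETTINGS = {"1a": (1.2, 75), "1b": (3.6, 225), "2a": (1.2, 10), "2b": (3.6, 30)}

def study_sizes(K):
    """n1 = n2 = 50, n3 = n4 = 100, and n_i = 150 for i >= 5."""
    return [50, 50, 100, 100][:K] + [150] * max(0, K - 4)

def simulate_dataset(setting, K, rng):
    """Draw (ys, ns) with pi_i ~ Beta(alpha0, beta0) and Y_i | pi_i ~ Bin(n_i, pi_i)."""
    alpha0, beta0 = SETTINGS[setting]
    ns = study_sizes(K)
    ys = []
    for n in ns:
        pi = rng.betavariate(alpha0, beta0)
        ys.append(sum(rng.random() < pi for _ in range(n)))
    return ys, ns

rng = random.Random(2019)
for name in SETTINGS:
    ys, ns = simulate_dataset(name, K=5, rng=rng)
    print(name, list(zip(ys, ns)), f"pooled rate = {sum(ys) / sum(ns):.3f}")
```

Empirical coverage for any interval method can then be estimated by repeating the draw (2000 times in the paper) and counting how often the interval contains μ0 = α0∕(α0 + β0).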

TABLE 1.

The empirical coverage probabilities (CovP) of the 95% confidence intervals and the median lengths (Length) of the various methods in the rare-event rate setting and moderate study-specific sample sizes

(1a) α0 = 1.2, β0 = 75
K = 5 K = 10 K = 15 K = 20
Method CovP (Length) CovP (Length) CovP (Length) CovP (Length)
Exact 0.962 (0.040) 0.954 (0.028) 0.958 (0.023) 0.950 (0.019)
MOM 0.821 (0.025) 0.850 (0.018) 0.876 (0.016) 0.900 (0.014)
DL-LGT 0.882 (0.032) 0.848 (0.021) 0.831 (0.017) 0.841 (0.015)
SJ-LGT 0.958 (0.037) 0.958 (0.023) 0.961 (0.019) 0.970 (0.016)
DL-ASIN 0.792 (0.026) 0.751 (0.018) 0.699 (0.015) 0.648 (0.013)
SJ-ASIN 0.822 (0.028) 0.781 (0.019) 0.734 (0.016) 0.684 (0.014)
FE 0.890 (0.024) 0.790 (0.015) 0.788 (0.011) 0.786 (0.010)
(1b) α0 = 3.6, β0 = 225
K = 5 K = 10 K = 15 K = 20
Method CovP (Length) CovP (Length) CovP (Length) CovP (Length)
Exact 0.978 (0.039) 0.980 (0.025) 0.976 (0.018) 0.971 (0.015)
MOM 0.874 (0.024) 0.903 (0.015) 0.920 (0.012) 0.920 (0.011)
DL-LGT 0.898 (0.031) 0.837 (0.019) 0.769 (0.015) 0.714 (0.012)
SJ-LGT 0.959 (0.034) 0.961 (0.021) 0.964 (0.017) 0.958 (0.014)
DL-ASIN 0.858 (0.025) 0.840 (0.017) 0.812 (0.014) 0.776 (0.012)
SJ-ASIN 0.878 (0.028) 0.877 (0.019) 0.854 (0.015) 0.814 (0.013)
FE 0.948 (0.026) 0.914 (0.015) 0.905 (0.011) 0.906 (0.010)

TABLE 2.

The empirical coverage probabilities (CovP) of the 95% confidence intervals and the median lengths (Length) of the various methods in the low event rate setting and moderate study-specific sample sizes

(2a) α0 = 1.2, β0 = 10
K = 5 K = 10 K = 15 K = 20
Method CovP (Length) CovP (Length) CovP (Length) CovP (Length)
Exact 0.962 (0.222) 0.955 (0.157) 0.961 (0.116) 0.960 (0.094)
MOM 0.802 (0.122) 0.882 (0.101) 0.909 (0.087) 0.920 (0.076)
DL-LGT 0.864 (0.131) 0.878 (0.091) 0.840 (0.073) 0.816 (0.063)
SJ-LGT 0.890 (0.151) 0.905 (0.107) 0.882 (0.087) 0.844 (0.076)
DL-ASIN 0.832 (0.136) 0.850 (0.102) 0.836 (0.085) 0.819 (0.075)
SJ-ASIN 0.838 (0.137) 0.854 (0.104) 0.844 (0.086) 0.820 (0.075)
FE 0.478 (0.058) 0.426 (0.035) 0.428 (0.028) 0.425 (0.024)
(2b) α0 = 3.6, β0 = 30
K = 5 K = 10 K = 15 K = 20
Method CovP (Length) CovP (Length) CovP (Length) CovP (Length)
Exact 0.990 (0.176) 0.968 (0.103) 0.959 (0.074) 0.969 (0.061)
MOM 0.836 (0.084) 0.883 (0.066) 0.914 (0.056) 0.925 (0.049)
DL-LGT 0.888 (0.096) 0.906 (0.067) 0.920 (0.055) 0.922 (0.047)
SJ-LGT 0.924 (0.106) 0.938 (0.075) 0.940 (0.062) 0.945 (0.053)
DL-ASIN 0.866 (0.096) 0.892 (0.070) 0.896 (0.058) 0.900 (0.050)
SJ-ASIN 0.887 (0.099) 0.904 (0.072) 0.907 (0.059) 0.909 (0.052)
FE 0.688 (0.059) 0.626 (0.036) 0.625 (0.028) 0.642 (0.024)

With the exception of the SJ-LGT method, all comparison methods fail to achieve the nominal coverage level of 95%. For instance, the empirical coverage probabilities for the remaining methods based on the random-effect model range from 64.8% to 93.2% with K = 20. Although the SJ-LGT method is generally the best performing method of the DL and SJ methods considered, the coverage probability substantially deteriorates with increasing K in Setting 2a. The MOM method exhibits more stable behavior but falls well below the nominal level for K = 20 in Setting 1a. As expected, the FE method exhibits significant undercoverage with the best performance in Setting 1b where the between-study heterogeneity is the smallest.

The proposed method is conservative for K = 5 and the settings with lower heterogeneity (Settings 1b and 2b), but overall, we find that the coverage levels are only moderately higher than 95% in the scenarios considered. As expected, the length of the proposed CI decreases with increasing K. With K = 20, the length is generally on par with that of the competing methods, while the proposed CI yields a coverage level closer to the desired 95%.

For completeness, we also considered analogous scenarios with smaller study-specific sample sizes. Specifically, for both settings, we set the study-specific sample sizes to n1 = n2 = 10, n3 = n4 = 15, and ni = 20 for i ≥ 5. The results are summarized in Tables 3 and 4. When μ0 = 0.107, the FE method exhibits undercoverage while the empirical coverage levels of CIs based on the new method are all above the nominal level. The empirical coverage levels of CIs based on SJ-LGT are also satisfactory. When μ0 = 0.016, the empirical coverage levels of CIs based on DL-LGT, SJ-LGT, DL-ASIN, and SJ-ASIN are substantially lower than the nominal level. A potential reason is that the bias caused by the 0.5 correction in zero-event studies becomes nonnegligible relative to the true incidence rates, which are very low.

TABLE 3.

The empirical coverage probabilities (CovP) of the 95% confidence intervals and the median lengths (Length) of the various methods in the rare-event rate setting and small study-specific sample sizes

(1a) α0 = 1.2, β0 = 75
K = 5 K = 10 K = 15 K = 20
Method CovP (Length) CovP (Length) CovP (Length) CovP (Length)
Exact 0.961 (0.098) 0.952 (0.046) 0.944 (0.035) 0.956 (0.029)
MOM 0.999 (0.115) 1.000 (0.068) 0.998 (0.054) 0.999 (0.045)
DL-LGT 0.696 (0.112) 0.308 (0.064) 0.088 (0.051) 0.026 (0.042)
SJ-LGT 0.696 (0.113) 0.308 (0.065) 0.088 (0.051) 0.026 (0.043)
DL-ASIN 0.944 (0.088) 0.796 (0.052) 0.573 (0.042) 0.285 (0.036)
SJ-ASIN 0.962 (0.088) 0.840 (0.054) 0.586 (0.042) 0.295 (0.036)
FE 0.964 (0.077) 0.968 (0.040) 0.960 (0.033) 0.950 (0.029)
(1b) α0 = 3.6, β0 = 225
K = 5 K = 10 K = 15 K = 20
Method CovP (Length) CovP (Length) CovP (Length) CovP (Length)
Exact 0.962 (0.098) 0.964 (0.046) 0.956 (0.035) 0.964 (0.029)
MOM 1.000 (0.115) 1.000 (0.068) 0.998 (0.053) 0.999 (0.045)
DL-LGT 0.697 (0.112) 0.270 (0.064) 0.086 (0.050) 0.020 (0.042)
SJ-LGT 0.696 (0.113) 0.270 (0.065) 0.086 (0.050) 0.020 (0.043)
DL-ASIN 0.942 (0.088) 0.828 (0.053) 0.584 (0.042) 0.276 (0.036)
SJ-ASIN 0.958 (0.088) 0.853 (0.052) 0.598 (0.042) 0.285 (0.036)
FE 0.966 (0.077) 0.981 (0.040) 0.962 (0.033) 0.962 (0.029)

TABLE 4.

The empirical coverage probabilities (CovP) of the 95% confidence intervals and the median lengths (Length) of the various methods in the low event rate setting and small study-specific sample sizes

(2a) α0 = 1.2, β0 = 10
K = 5 K = 10 K = 15 K = 20
Method CovP (Length) CovP (Length) CovP (Length) CovP (Length)
Exact 0.970 (0.229) 0.962 (0.146) 0.960 (0.119) 0.960 (0.104)
MOM 0.834 (0.160) 0.859 (0.110) 0.894 (0.095) 0.900 (0.086)
DL-LGT 0.920 (0.189) 0.882 (0.133) 0.860 (0.110) 0.869 (0.095)
SJ-LGT 0.960 (0.215) 0.964 (0.148) 0.968 (0.121) 0.974 (0.104)
DL-ASIN 0.914 (0.151) 0.882 (0.107) 0.888 (0.091) 0.874 (0.079)
SJ-ASIN 0.930 (0.169) 0.927 (0.120) 0.934 (0.099) 0.916 (0.086)
FE 0.840 (0.154) 0.803 (0.098) 0.823 (0.077) 0.814 (0.065)
(2b) α0 = 3.6, β0 = 30
K = 5 K = 10 K = 15 K = 20
Method CovP (Length) CovP (Length) CovP (Length) CovP (Length)
Exact 0.990 (0.218) 0.983 (0.134) 0.981 (0.102) 0.975 (0.087)
MOM 0.906 (0.155) 0.912 (0.100) 0.922 (0.079) 0.926 (0.069)
DL-LGT 0.924 (0.177) 0.881 (0.116) 0.831 (0.093) 0.774 (0.080)
SJ-LGT 0.974 (0.198) 0.973 (0.133) 0.968 (0.107) 0.970 (0.092)
DL-ASIN 0.967 (0.145) 0.931 (0.095) 0.920 (0.076) 0.900 (0.066)
SJ-ASIN 0.974 (0.162) 0.957 (0.110) 0.955 (0.089) 0.948 (0.077)
FE 0.934 (0.154) 0.900 (0.098) 0.910 (0.076) 0.914 (0.065)

Furthermore, we have investigated the validity of (5):

$$P\{T(\mu; D_{\mu,\nu_1}) \ge t\} < P\{T(\mu; D_{\mu,\nu_2}) \ge t\}, \quad\text{for } \nu_1 < \nu_2,$$

via numerical studies. Specifically, with μ0 at the previous simulation settings, we estimate the cumulative distribution function of $T(\mu_0; D_{\mu_0,\nu})$ for ν = νsup(μ0)k∕5, k = 1, …, 5, using Monte Carlo simulation. To this end, we simulated $10^6$ test statistics for each combination of (μ0, ν, n1, …, nK) and obtained the empirical survival function. We have plotted the survival functions for μ0 = 0.016 and μ0 = 0.107 with moderate study-specific sample sizes in Figures 3 and 4, respectively, to examine whether they are stochastically ordered according to the value of ν. Likewise, we have plotted the survival function for μ0 = 0.107 with small study-specific sample sizes in Figure 5. The simulation results confirm the inequality (5) and especially the claim that

$$\sup_{\nu \le \nu_{\sup}(\mu)} P\{T(\mu; D_{\mu,\nu}) \ge t\} = P\{T(\mu; D_{\mu,\nu_{\sup}(\mu)}) \ge t\}$$

under the given settings.

FIGURE 3.

The empirical survival functions for the test statistic with μ = 0.016 for the moderate study-specific sample size setting. The five survival curves from bottom to top correspond to increasing values of ν

FIGURE 4.

The empirical survival functions for the test statistic with μ = 0.107 for the moderate study-specific sample size setting. The five survival curves from bottom to top correspond to increasing values of ν

FIGURE 5.

The empirical survival functions for the test statistic with μ = 0.107 for the small study-specific sample size setting. The five survival curves from bottom to top correspond to increasing values of ν

3.2 |. Real data analysis

AML is an aggressive cancer that begins in the bone marrow and results in an elevated level of immature white blood cells called myeloid blasts in the bloodstream. CD33 is an antigen expressed on the blast cells in the majority of AML patients and has been identified as a suitable candidate for targeted treatment of the disease. One such drug, Mylotarg, was approved by the FDA in 2000 for the treatment of patients with CD33-positive AML in first relapse who are 60 years of age or older and who are not candidates for other types of cytotoxic chemotherapy. Confirmatory trials revealed safety concerns, including higher rates of EM and veno-occlusive disease (VOD) and failed to confirm a clinical benefit. Here, we illustrate the proposed method in three meta-analyses of 14 trials aiming to evaluate the safety and/or efficacy of an unfractionated regimen of Mylotarg: 2 doses of 6 mg/m2 given 14 days apart.4 The studies consisted of three pivotal phase II multicenter trials as well as dose-escalation and postmarketing studies in patients with relapsed or refractory AML. The efficacy of the treatment regimen was evaluated based on meta-analyses of complete remission (CR) rate across the studies presented in Table A1 in the Appendix. Safety was assessed with meta-analyses of EM and VOD based on the studies presented in Tables A2 and A3. We also provide some discussion of the comparison of the 6 mg/m2 unfractionated regimen to a fractionated dosing regimen of 3 mg/m2 given on days 1, 4, and 7 for which very few studies were available for analysis.

The 95% CIs and corresponding lengths based on the proposed exact method as well as the FE, SJ-LGT, and SJ-ASIN methods for the 6 mg/m2 regimen are presented in Table 5. The number of available studies for analysis is quite small for each of the outcomes. Additionally, all studies have 16 or fewer patients and the number of zero-event studies is high. For example, CR was not observed in four of the six studies considered. As a result, we observe substantial differences in the point estimates for the three methods, ranging from 0.013 to 0.035, as well as in the corresponding CIs. The SJ-LGT interval is more than twice the length of the SJ-ASIN interval and longer than the CI from the exact method. Similar though less extreme patterns are observed for the EM and VOD analyses. Consistent with our simulation results and intuition, we also find that the intervals from the FE method are shorter than those from the proposed method. These differences highlight the utility of our method in practice, as its validity does not rely on the choice of transformation or large sample approximation.

TABLE 5.

Random effects meta-analyses of the efficacy (complete remission (CR)) and safety (early mortality (EM) and veno-occlusive disease (VOD)) of the 6 mg/m2 regimen of Mylotarg based on the proposed exact method and the SJ-LGT and SJ-ASIN methods. Presented are the 95% confidence intervals and corresponding lengths for each of the methods

Study CR EM VOD
K 6 5 7
Exact 0.007–0.128 (0.121) 0.049–0.340 (0.291) 0.072–0.367 (0.295)
SJ-LGT 0.027–0.185 (0.158) 0.064–0.342 (0.279) 0.112–0.421 (0.310)
SJ-ASIN 0.003–0.076 (0.073) 0.001–0.252 (0.251) 0.012–0.306 (0.294)
FE 0.004–0.121 (0.116) 0.041–0.262 (0.221) 0.105–0.350 (0.245)

In addition to evaluating the 6 mg/m2 unfractionated treatment regimen, interest also lay in comparison to a fractionated dosing regimen of 3 mg/m2. However, only two studies evaluated CR and EM for the fractionated regimen and three studies for VOD. We therefore constructed exact binomial CIs based on the pooled data, effectively assuming an FE model, as it is difficult to support the modeling assumptions for a random-effect model in this setting. The resulting CIs for CR, EM, and VOD were estimated to be (0.153, 0.379), (0.035, 0.170), and (0, 0.042), respectively. While these results suggest some benefit in efficacy for the fractionated treatment versus the unfractionated treatment as well as lower rates of VOD and EM, they must be interpreted with caution due to the small number of studies and the use of the FE approach. The confidence intervals based on the proposed random-effect approach are wider as expected, namely (0.060, 0.724), (0.030, 0.377), and (0, 0.169), respectively, but we again caution that there is insufficient information to support the related model assumptions for the application of this method.

4 |. DISCUSSION

In this paper, we have proposed a random-effect model for combining multiple binomial random variables with underlying probabilities following a parametric distribution. In particular, we introduced an exact CI for the location parameter of the BB model. The method is valid regardless of the number of studies or the sample size within each of the studies and is thus especially suitable for studying rare-event data where the event probability is close to zero. While the performance of the method depends on the choice of the test statistic, we choose the modified DL test statistic to ensure that our method performs similarly to the DL method when K is not small, as the DL method is asymptotically optimal. A closely related problem and a direction warranting future research is to extend our method to combine a group of two-by-two tables, which are used to characterize the between-group difference in binary outcomes from multiple studies.

Finally, we note that when the number of studies is fewer than four, we often lack sufficient information to either support or refute the model assumptions involved. We therefore advise practitioners to avoid overinterpreting the results, even though the proposed method remains valid under correctly specified model assumptions. Additionally, we emphasize that having a small number of studies is not by itself a good reason for using the simple FE model, since its implicit assumption is even stronger than that of the random-effect model. As a general suggestion, the choice between the random-effect and FE approaches should be based on the observed data, and one should always clearly state the key assumptions together with the analysis results.

ACKNOWLEDGEMENT

This research was supported by NIH grant 5R01 HL08977807. This article reflects the views of the authors and should not be construed to represent FDA views or policies.

APPENDIX

DATA FOR MYLOTARG EXAMPLE

TABLE A1.

Complete remission rates for the studies of the two regimens of Mylotarg

(a) 6 mg/m2
Study | Events | Total
Study 101 | 0 | 8
Study 102 | 1 | 14
Study 103 | 0 | 6
Study 100374 | 1 | 6
Piccaluga, 2004 | 0 | 7
van der Heiden, 2006 | 0 | 16
(b) 3 mg/m2
Study | Events | Total
MyloFrance 1 | 15 | 57
Brethon, 2006 | 1 | 6

TABLE A2.

Rates of early mortality for the studies of the two regimens of Mylotarg

(a) 6 mg/m2
Study | Events | Total
Study 101 | 2 | 8
Study 102 | 1 | 14
Study 103 | 0 | 6
Study 100374 | 0 | 6
Piccaluga, 2004 | 2 | 7
(b) 3 mg/m2
Study | Events | Total
MyloFrance 1 | 4 | 57
Thomas, 2005 | 3 | 24

TABLE A3.

Rates of veno-occlusive disease for the studies of the two regimens of Mylotarg

(a) 6 mg/m2
Study | Events | Total
Study 101 | 1 | 8
Study 102 | 6 | 14
Study 103 | 0 | 6
Study 100374 | 2 | 6
Thomas, 2005 | 1 | 6
Piccaluga, 2004 | 0 | 7
Zwaan, 2003 | 0 | 1
(b) 3 mg/m2
Study | Events | Total
MyloFrance 1 | 0 | 57
Thomas, 2005 | 0 | 24
Brethon, 2006 | 0 | 6

Footnotes

DATA AVAILABILITY STATEMENT

The data for the Mylotarg example are provided in the Appendix.

REFERENCES

  • 1. Blyth CR, Still HA. Binomial confidence intervals. J Am Stat Assoc. 1983;78(381):108–116.
  • 2. Agresti A, Caffo B. Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. Am Stat. 2000;54(4):280–288.
  • 3. Brown LD, Cai TT, Dasgupta A. Confidence intervals for a binomial proportion and asymptotic expansions. Ann Stat. 2002;30(1):160–201.
  • 4. US Food and Drug Administration. Federal Drug Administration Briefing Document: Oncologic Drugs Advisory Committee Meeting. 2017.
  • 5. Normand SL. Tutorial in biostatistics meta-analysis: formulating, evaluating, combining, and reporting. Stat Med. 1999;18(3):321–359.
  • 6. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177–188.
  • 7. Davey J, Turner RM, Clarke MJ, Higgins JP. Characteristics of meta-analyses and their component studies in the Cochrane database of systematic reviews: a cross-sectional, descriptive analysis. BMC Med Res Method. 2011;11(1):160.
  • 8. Platt RW, Leroux BG, Breslow N. Generalized linear mixed models for meta-analysis. Stat Med. 1999;18(6):643–654.
  • 9. Shuster JJ, Jones LS, Salmon DA. Fixed vs random-effect meta-analysis in rare event studies: the rosiglitazone link with myocardial infarction and cardiac death. Stat Med. 2007;26(24):4375–4385.
  • 10. Hamza TH, van Houwelingen HC, Stijnen T. The binomial distribution of meta-analysis was preferred to model within-study variability. J Clin Epidemiol. 2008;61(1):41–51.
  • 11. Stijnen T, Hamza TH, Özdemir P. Random effects meta-analysis of event outcome in the framework of the generalized linear mixed model with applications in sparse data. Stat Med. 2010;29(29):3046–3067.
  • 12. Bhaumik DK, Amatya A, Normand SL, Greenhouse J, Kaizar E, Neelon B, Gibbons RD. Meta-analysis of rare binary adverse event data. J Am Stat Assoc. 2012;107(498):555–567.
  • 13. Sweeting M, Sutton A, Lambert P. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Stat Med. 2004;23(9):1351–1375.
  • 14. Young-Xu Y, Chan KA. Pooling overdispersed binomial data to estimate event rate. BMC Med Res Method. 2008;8(1):58.
  • 15. Ma Y, Chu H, Mazumdar M. Meta-analysis of proportions of rare events–a comparison of exact likelihood methods with robust variance estimation. Commun Stat-Simul Comput. 2016;45(8):3036–3052.
