Exact inference for the random-effect model for meta-analyses with rare events

Jessica Gronsbell; Chuan Hong; Lei Nie; Ying Lu; Lu Tian

doi:10.1002/sim.8396

Author manuscript; available in PMC: 2020 Nov 30.

Published in final edited form as: Stat Med. 2019 Dec 9;39(3):252–264. doi: 10.1002/sim.8396


PMCID: PMC7704100 NIHMSID: NIHMS1649227 PMID: 31820458

Abstract

Meta-analysis allows for the aggregation of results from multiple studies to improve statistical inference for the parameter of interest. In recent years, random-effect meta-analysis has been employed to synthesize estimates of incidence rates of adverse events across heterogeneous clinical trials to evaluate treatment safety. However, the validity of existing approaches relies on asymptotic approximation as the number of studies becomes large. In practice, a limited number of trials are typically available for analysis. Moreover, adverse events are typically rare; thus, study-specific incidence rate estimates may be unstable or undefined. In this paper, we present a method for construction of an exact confidence interval for the location parameter of the beta-binomial model through inversion of exact tests. The coverage level of the proposed confidence interval is guaranteed to achieve at least the nominal level, regardless of the number of studies or the within-study sample size, making it particularly applicable to the study of rare-event data.

Keywords: exact inference, meta-analysis, random-effect model, rare events

1 |. INTRODUCTION

Interval estimation for a binomial probability is one of the most well-studied and methodologically important problems in statistics. For instance, premarketing and postmarketing evaluation of the safety of pharmaceutical products often relies on estimating the incidence rate of adverse events (AEs). When a single trial is available for assessment, inference for the incidence rate may be made with the simple binomial model or various asymptotic methods.¹⁻³ However, AEs are typically rare; therefore, single trials are not sufficiently powered to make sound inference. A recent example involves Mylotarg, a targeted treatment for acute myeloid leukemia (AML) that was voluntarily withdrawn from the US market after confirmatory trials identified safety concerns such as early mortality (EM). The incidence rate of EM was estimated to be 1/14 = 7.1% with an exact binomial 95% confidence interval (CI) of [0.2%, 33.9%] based on data from a multicenter Phase II trial.⁴ As the results of additional studies were made available, a meta-analytic approach enabled the synthesis of information from individual trials to make more reliable inference on the rate of EM. The simplest of such methods is the fixed-effect (FE) meta-analysis, which assumes that the underlying study-specific incidence rates are identical; thus, the data may be aggregated and similarly evaluated with the binomial model. For example, additional studies reported that 2 out of 8, 0 out of 6, 0 out of 6, and 2 out of 7 patients experienced EM. Combining these results yields an estimated incidence rate of (1 + 2 + 0 + 0 + 2)/(14 + 8 + 6 + 6 + 7) = 5/41 = 12.2% with a narrower 95% CI of [4.1%, 26.2%].
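The single-trial and pooled (FE) intervals quoted above can be reproduced, to the reported precision, with a short sketch. It assumes that "exact binomial CI" refers to the Clopper-Pearson interval, which the text does not state explicitly; the function names are illustrative, and the bounds are found by simple bisection rather than by any particular library routine.

```python
from math import comb

def binom_cdf(y, n, p):
    """P(Y <= y) for Y ~ Bin(n, p); equals 0 when y < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(y + 1))

def bisect_root(f, lo=0.0, hi=1.0, iters=200):
    """Root of a function that is positive at lo and negative at hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(y, n, alpha=0.05):
    """Clopper-Pearson interval (assumed form of the 'exact binomial CI')."""
    # Lower bound: the p at which P(Y >= y) equals alpha / 2.
    lower = 0.0 if y == 0 else bisect_root(
        lambda p: alpha / 2 - (1 - binom_cdf(y - 1, n, p)))
    # Upper bound: the p at which P(Y <= y) equals alpha / 2.
    upper = 1.0 if y == n else bisect_root(
        lambda p: binom_cdf(y, n, p) - alpha / 2)
    return lower, upper

# Early-mortality counts quoted in the text: first trial, then four more.
events = [1, 2, 0, 0, 2]
sizes = [14, 8, 6, 6, 7]

single = exact_binomial_ci(events[0], sizes[0])
pooled_rate = sum(events) / sum(sizes)
pooled = exact_binomial_ci(sum(events), sum(sizes))

print(f"single trial: {events[0]}/{sizes[0]}, CI = ({single[0]:.3f}, {single[1]:.3f})")
print(f"pooled rate:  {pooled_rate:.3f}, CI = ({pooled[0]:.3f}, {pooled[1]:.3f})")
```

Pooling in this way treats all 41 patients as exchangeable, which is exactly the common-rate assumption questioned in the next paragraph.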

However, the assumption of a common incidence rate across studies is often too restrictive. Studies typically differ in various aspects, such as patient inclusion criteria and treatment administration, resulting in heterogeneity of study-specific effects. A more realistic model allows the study-specific rates to follow a random distribution. The objective of statistical inference is therefore to estimate the parameter (oftentimes the location parameter) characterizing such a distribution, which is the aim of random-effect meta-analysis.⁵ Within the current literature, the most popular random-effect model assumes that

$$\hat{\mu}_{i} \mid \mu_{i} \sim N(\mu_{i}, \sigma_{i}^{2}), \quad \mu_{i} \sim N(\mu_{0}, \tau_{0}^{2}), \quad i = 1, \dots, K \tag{1}$$

or equivalently that

$$\hat{\mu}_{i} \sim N(\mu_{0}, \sigma_{i}^{2} + \tau_{0}^{2}),$$

where ${\hat{μ}}_{i}$ is the point estimator of the study-specific effect μ_i, μ₀ is the overall effect, $σ_{i}^{2}$ is the known within-study variance, $τ_{0}^{2}$ is the unknown between-study variance, and K is the number of studies. When the outcome of interest is binary, μ_i and its empirical estimator are commonly taken as logit(π_i), where π_i is the underlying probability of observing the event in the ith study.

In their seminal paper, DerSimonian and Laird (DL) proposed to estimate μ₀ with the optimal linear combination of $\hat{\mu}_{i}$, where the weights are constructed via a simple moment estimator for the between-study variance $\tau_{0}^{2}$.⁶ The associated 100(1 − α)% CI for μ₀ is constructed based on an asymptotic approximation requiring that K → ∞. Although the DL method is widely used in practice, the number of studies utilized in a meta-analysis is typically not large and frequently less than 7.⁷ In the Mylotarg example, for instance, the total number of available trials for analysis amounted to 7. As a result, $\tau_{0}^{2}$ may be imprecisely estimated and the finite sample performance of the DL CI can be poor. Performance issues are further exacerbated for binomial outcomes when study-specific sample sizes are small and/or π_i is close to zero (or one), as the normality assumption of (1) is violated.⁸⁻¹² Additionally, the logit proportion and its variance estimate become undefined for zero-event studies, leading to the use of arbitrary continuity corrections.¹³ These corrections may induce bias and do not alleviate the normality violation. Nonlinear random-effect models, such as the normal-binomial and beta-binomial (BB) models, have thus been proposed, but the performance of these methods still relies on a large number of individual studies.¹⁰,¹⁴,¹⁵
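To make the DL construction concrete, the following is a minimal sketch of the textbook DerSimonian-Laird interval on the logit scale. It is not taken from the paper: the function name, the choice to apply the 0.5 continuity correction only to studies with zero or all events, and the use of the early-mortality counts from the Introduction as input are all assumptions made for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def dl_logit_ci(events, sizes, alpha=0.05, cc=0.5):
    """Textbook DL random-effect interval for a pooled proportion (logit scale)."""
    theta, var = [], []
    for y, n in zip(events, sizes):
        # Assumed convention: correct only studies where the logit is undefined.
        c = cc if y in (0, n) else 0.0
        theta.append(log((y + c) / (n - y + c)))
        var.append(1 / (y + c) + 1 / (n - y + c))
    k = len(theta)
    w = [1 / v for v in var]
    theta_fe = sum(wi * t for wi, t in zip(w, theta)) / sum(w)
    # Moment estimator of the between-study variance, truncated at zero.
    q = sum(wi * (t - theta_fe) ** 2 for wi, t in zip(w, theta))
    denom = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / denom)
    # Random-effect weights and normal-approximation interval.
    w_re = [1 / (v + tau2) for v in var]
    theta_re = sum(wi * t for wi, t in zip(w_re, theta)) / sum(w_re)
    se = sqrt(1 / sum(w_re))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    expit = lambda x: 1 / (1 + exp(-x))
    return expit(theta_re), (expit(theta_re - z * se), expit(theta_re + z * se)), tau2

est, (lo, hi), tau2 = dl_logit_ci([1, 2, 0, 0, 2], [14, 8, 6, 6, 7])
print(f"estimate = {est:.3f}, CI = ({lo:.3f}, {hi:.3f}), tau^2 = {tau2:.3f}")
```

With only five small studies, both the moment estimate of the between-study variance and the normal quantile rest on approximations that the paragraph above identifies as unreliable, which is the motivation for the exact approach.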

To address these limitations, we propose a method for constructing an exact CI for the location parameter of a BB model based on the inversion of exact tests. The coverage level of the CI is guaranteed to achieve at least the nominal level up to Monte Carlo error with simulations confirming that the CI is not overly conservative. Moreover, our implementation of the proposed method allows for efficient estimation on a personal computer. In Section 2, we present our procedure for construction of the exact CI. The performance of our method is then evaluated based on extensive simulation studies and several meta-analyses aiming to assess the safety and efficacy of Mylotarg in Section 3. We conclude with a discussion in Section 4.

2 |. METHODS

2.1 |. Data structure

The observed data consist of $D^{0} = \{Y_{i} \mid i = 1, \dots, K\}$, where Y_i is the number of events out of n_i subjects in the ith study for i = 1, …, K. To allow for between-study variation, we assume a BB random-effect model:

$$Y_{i} \mid \pi_{i} \overset{\text{ind.}}{\sim} \text{Bin}(n_{i}, \pi_{i}), \quad \pi_{i} \overset{\text{ind.}}{\sim} \text{Beta}(\alpha_{0}, \beta_{0}), \quad i = 1, \dots, K,$$

or equivalently that

$$Y_{i} \sim \text{BB}(n_{i}, \alpha_{0}, \beta_{0}).$$

We require α₀, β₀ > 1 to ensure the unimodality of the random-effect distribution and hence the identifiability of the location parameter. This model implies

$$E\left(\frac{Y_{i}}{n_{i}}\right) = \frac{\alpha_{0}}{\alpha_{0} + \beta_{0}} = \mu_{0}$$

and

$$\text{Var}\left(\frac{Y_{i}}{n_{i}}\right) = \frac{\mu_{0}(1 - \mu_{0})}{n_{i}} + \left(1 - \frac{1}{n_{i}}\right)\nu_{0} = \mu_{0}(1 - \mu_{0})\left\{\frac{1}{n_{i}} + \left(1 - \frac{1}{n_{i}}\right)\tau_{0}\right\},$$

where ν₀ = μ₀(1 − μ₀)τ₀ is the variance of the random-effect distribution and τ₀ = (α₀+ β₀ + 1)⁻¹ represents the between-study heterogeneity. For clarity of presentation, we reparameterize the BB distribution with respect to μ₀ and ν₀ based on the following equivalences:

$$\alpha_{0} = \mu_{0}\left\{\frac{\mu_{0}(1 - \mu_{0}) - \nu_{0}}{\nu_{0}}\right\} \quad \text{and} \quad \beta_{0} = (1 - \mu_{0})\left\{\frac{\mu_{0}(1 - \mu_{0}) - \nu_{0}}{\nu_{0}}\right\}. \tag{2}$$
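The mapping between (α₀, β₀) and (μ₀, ν₀) can be checked numerically. The sketch below uses illustrative function names and the parameter values (1.2, 75) from the simulation settings reported later in the paper; it simply confirms that the two parameterizations round-trip.

```python
def to_mu_nu(alpha, beta):
    """(alpha, beta) -> (mu, nu), with tau = 1 / (alpha + beta + 1)."""
    mu = alpha / (alpha + beta)
    tau = 1 / (alpha + beta + 1)
    return mu, mu * (1 - mu) * tau

def to_alpha_beta(mu, nu):
    """(mu, nu) -> (alpha, beta), following the reparameterization in (2)."""
    c = (mu * (1 - mu) - nu) / nu  # equals alpha + beta
    return mu * c, (1 - mu) * c

mu, nu = to_mu_nu(1.2, 75.0)
alpha, beta = to_alpha_beta(mu, nu)
print(f"mu = {mu:.4f}, nu = {nu:.6f}, alpha = {alpha:.4f}, beta = {beta:.4f}")
```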

Remark 1.

While the standard BB model allows α₀, β₀ > 0, our additional requirement of α₀, β₀ > 1 implies that

$$\nu_{0} \leq \mu_{0}(1 - \mu_{0}) \min\left(\frac{\mu_{0}}{1 + \mu_{0}}, \frac{1 - \mu_{0}}{2 - \mu_{0}}\right) = \nu_{sup}(\mu_{0}), \tag{3}$$

which significantly reduces the parameter space of the standard BB model to

$$\{(\mu_{0}, \nu_{0}) \mid \nu_{0} \leq \nu_{sup}(\mu_{0})\}.$$

The reduced parameter space in the (μ₀, ν₀) plane is illustrated in Figure 1. The constraint on α₀ and β₀ serves a dual purpose. First, it guarantees that the distribution of π_i is unimodal and thus facilitates the interpretation of μ₀ = E(π_i). Second, it ensures that μ₀ is identifiable when all of the Y_i are close to zero. For instance, consider the toy example:

$$\{(Y_{i}, n_{i}) = (0, 1000) \mid i = 1, \dots, 6\}.$$

Intuitively, this presents strong evidence that all of the π_i are close to zero and so is μ₀. However, without our unimodality constraint, one cannot rule out the possibility that π_i ~ Beta(0.001, 0.003), which assigns probabilities of approximately 3/4 and 1/4 to π_i = 0 and π_i = 1, respectively. Under this model, there is about a 17% probability that all six π_i are very close to zero by chance and generate the observed data. The corresponding μ₀ = 1/4, which clearly contradicts our intuition that μ₀ should be ≈ 0, as all six observed incidence rates are zero. This paradox is caused by the second mode of the Beta(0.001, 0.003) distribution and can be resolved by the proposed constraint on α₀ and β₀.
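The 17% figure can be verified from the beta-binomial probability mass function, since P(Y = 0) = B(α, β + n)/B(α, β). The sketch below is illustrative only (the function names are not from the paper). For contrast, it also evaluates the same probability at the boundary of the restricted model with μ₀ = 1/4, where (3) gives ν_sup = 0.0375 and hence (α₀, β₀) = (1, 3).

```python
from math import exp, lgamma

def nu_sup(mu):
    """Upper bound on nu implied by alpha, beta > 1; see (3)."""
    return mu * (1 - mu) * min(mu / (1 + mu), (1 - mu) / (2 - mu))

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def prob_zero_events(n, alpha, beta):
    """P(Y = 0) for Y ~ BB(n, alpha, beta)."""
    return exp(log_beta(alpha, beta + n) - log_beta(alpha, beta))

# Six independent studies, each with 0 events out of 1000 subjects.
p_bimodal = prob_zero_events(1000, 0.001, 0.003) ** 6
p_restricted = prob_zero_events(1000, 1.0, 3.0) ** 6

print(f"nu_sup(0.25)              = {nu_sup(0.25):.4f}")
print(f"P(data | Beta(0.001,0.003)) = {p_bimodal:.3f}")
print(f"P(data | Beta(1, 3))        = {p_restricted:.2e}")
```

Under the bimodal prior the all-zero data remain plausible with μ₀ = 1/4, whereas under the restricted model with the same μ₀ they are essentially impossible, which is how the constraint restores identifiability.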

FIGURE 1. The boundary of the parameter space in the (μ, ν) plane for the standard beta-binomial model (solid) and our restricted model (dashed)

2.2 |. Proposed exact CI for the location parameter

With the aim of constructing an interval for μ₀ guaranteed to achieve at least the nominal level when K is small, we regard ν₀ as a nuisance parameter. To this end, we consider an unconditional test for testing

H_{0} : μ_{0} = μ

based on a test statistic $T (μ; D^{0})$ that is a function of both μ and the observed data $D^{0}$ . For now, we assume that larger values of $T (μ; D^{0})$ provide more evidence against H₀ and defer our discussion about the specific choice of $T (μ; D^{0})$ to the subsequent section. To control the type I error, the unconditional test eliminates ν₀ with the profile p value:

$$p(\mu; D^{0}) = \sup_{\nu} P\{T(\mu; D^{\mu, \nu}) \geq T(\mu; D^{0})\} = \sup_{\nu} p(\mu, \nu; D^{0}),$$

where the probability is with respect to the random data $D^{\mu, \nu}$ following the random-effect BB model with parameters μ and ν. Thus, we would reject the null hypothesis if $p(\mu; D^{0}) < \alpha$, where α stands for the type I error rate (rather than the parameter of the Beta distribution). The corresponding 100(1 − α)% CI is the set of μ for which $p(\mu; D^{0}) \geq \alpha$.

Assuming that the true values of μ and ν are covered by the finite intervals [μ_L, μ_U] and [ν_L, ν_U], respectively, an exact CI for μ₀ may then be constructed with the following generic two-step procedure:

(A)
For each value (μ, ν) in a dense H × J grid G over [μ_L, μ_U] × [ν_L, ν_U], compute $p(\mu, \nu; D^{0})$ based on the null distribution of the test statistic $T(\mu; D^{\mu, \nu})$.
(B)
Obtain the 100(1 − α)% CI for μ₀ as
$\left[\inf\{\mu \mid (\mu, \nu) \in \Omega_{1 - \alpha}(D^{0})\}, \ \sup\{\mu \mid (\mu, \nu) \in \Omega_{1 - \alpha}(D^{0})\}\right],$
where $\Omega_{1 - \alpha}(D^{0}) = \{(\mu, \nu) \mid p(\mu, \nu; D^{0}) \geq \alpha\}$ is the 100(1 − α)% confidence region for (μ₀, ν₀).

However, the exact distribution of $T (μ; D^{μ, ν})$ which is needed to calculate $p (μ, ν; D^{0})$ , is typically not tractable. We therefore propose to augment step (A) in the above procedure by approximating the null distribution through Monte Carlo simulation. Specifically, for each (μ, ν) ∈ G, we:

(A.1)
Generate $D_{m} = \{Y_{im} \mid i = 1, \dots, K\}$, where $Y_{im} \sim \text{BB}(n_{i}, \mu, \nu)$, for m = 1, …, M.
(A.2)
Compute $T_{m} = T(\mu; D_{m})$ and approximate the distribution of $T(\mu; D^{\mu, \nu})$ with the empirical distribution of $\{T_{m} \mid m = 1, \dots, M\}$.
(A.3)
Compute the p value as $p(\mu, \nu; D^{0}) = M^{-1} \sum_{m = 1}^{M} I\{T_{m} \geq T(\mu; D^{0})\}$.
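Steps (A.1) to (A.3) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the test statistic is passed in as a function, and the default used here is the simple pooled statistic (ΣY_i − μΣn_i)², which the paper mentions as an alternative in Remark 3, rather than the Wald statistic proposed in Section 2.3. The function names, the seed, and the example values of μ and ν are hypothetical.

```python
import random

def to_alpha_beta(mu, nu):
    """(mu, nu) -> (alpha, beta), following (2)."""
    c = (mu * (1 - mu) - nu) / nu
    return mu * c, (1 - mu) * c

def draw_bb(n, mu, nu, rng):
    """One draw of Y ~ BB(n, mu, nu): pi ~ Beta, then Y | pi ~ Bin(n, pi)."""
    a, b = to_alpha_beta(mu, nu)
    pi = rng.betavariate(a, b)
    return sum(rng.random() < pi for _ in range(n))

def pooled_stat(mu, events, sizes):
    """Illustrative statistic: squared deviation of the pooled count."""
    return (sum(events) - mu * sum(sizes)) ** 2

def mc_pvalue(mu, nu, events, sizes, stat=pooled_stat, M=2000, seed=0):
    rng = random.Random(seed)
    t_obs = stat(mu, events, sizes)
    hits = 0
    for _ in range(M):
        sim = [draw_bb(n, mu, nu, rng) for n in sizes]  # step (A.1)
        hits += stat(mu, sim, sizes) >= t_obs            # steps (A.2)-(A.3)
    return hits / M

events, sizes = [1, 2, 0, 0, 2], [14, 8, 6, 6, 7]
p_near = mc_pvalue(0.12, 0.005, events, sizes)  # mu close to the pooled rate
p_far = mc_pvalue(0.80, 0.02, events, sizes)    # mu far from the data
print(p_near, p_far)
```

A value of μ is retained in the confidence set only if this p value is at least α for some ν on the grid, so each grid point costs M simulated data sets.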

While the above procedure guarantees that the resulting CI will achieve at least nominal coverage up to Monte Carlo error, its computational complexity is O(MHJ), which may be prohibitive when constructing the CI on a personal computer. In the next section, we describe our choice of test statistic as well as the details of the numerical computation that allow the CI to be efficiently constructed.

2.3 |. Proposed test statistic and details of computation

The speed of our proposed procedure clearly rests on the choice of $T(\mu; D^{0})$. Our goal in constructing the test statistic is thus to balance computational efficiency with performance. Here, we propose the use of a simple Wald statistic based on a DL-type estimator. In particular, we suggest estimating μ₀ with

$$\hat{\mu} = \frac{\sum_{i = 1}^{K} (Y_{i} / n_{i}) w_{i}^{-1}}{\sum_{i = 1}^{K} w_{i}^{-1}},$$

where

$$\hat{\nu} = \max\left\{0, \frac{\sum_{i = 1}^{K} \left\{\left(\frac{\tilde{Y}_{i}}{\tilde{n}_{i}}\right)^{2} - \frac{\hat{\mu}_{int}}{\tilde{n}_{i}}\right\}}{\sum_{i = 1}^{K} \left(1 - \frac{1}{\tilde{n}_{i}}\right)} - \hat{\mu}_{int}^{2}\right\}, \quad \hat{\mu}_{int} = \frac{\sum_{i = 1}^{K} \tilde{Y}_{i}}{\sum_{i = 1}^{K} \tilde{n}_{i}},$$

$w_{i} = \{\hat{\mu}_{int}(1 - \hat{\mu}_{int})\} / n_{i} + (1 - 1 / n_{i})\hat{\nu}$, and

$$(\tilde{Y}_{i}, \tilde{n}_{i}) = \begin{cases} (Y_{i} + 1, n_{i} + 2) & \text{if } Y_{i} = 0 \text{ or } Y_{i} = n_{i}, \\ (Y_{i}, n_{i}) & \text{otherwise,} \end{cases} \quad i = 1, \dots, K.$$

The continuity correction is employed to ensure that $Var (\frac{Y_{i}}{n_{i}} ∣ π_{i}) > 0$ and $\hat{ν}$ is well defined when Y_i = 0 or Y_i = n_i in all K studies. The test statistic is taken accordingly as

$$T(\mu; D^{0}) = \left(\sum_{i = 1}^{K} w_{i}^{-1}\right)(\hat{\mu} - \mu)^{2} \tag{4}$$

and may be computed quickly.
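A direct transcription of the estimator and of the statistic in (4) is sketched below. The function name and the returned tuple are illustrative choices; the formulas follow the definitions above, with the continuity correction applied only when computing the intermediate estimates of μ₀ and ν₀.

```python
def wald_statistic(mu, events, sizes):
    """Return (T, mu_hat, nu_hat) for the DL-type Wald statistic in (4)."""
    # Continuity-corrected data, used only for mu_int and nu_hat.
    adj = [(y + 1, n + 2) if y in (0, n) else (y, n)
           for y, n in zip(events, sizes)]
    mu_int = sum(y for y, _ in adj) / sum(n for _, n in adj)
    num = sum((y / n) ** 2 - mu_int / n for y, n in adj)
    den = sum(1 - 1 / n for _, n in adj)
    nu_hat = max(0.0, num / den - mu_int**2)
    # Study-specific variances w_i and inverse-variance weights.
    w = [mu_int * (1 - mu_int) / n + (1 - 1 / n) * nu_hat for n in sizes]
    inv_w = [1 / wi for wi in w]
    mu_hat = sum((y / n) * iw
                 for y, n, iw in zip(events, sizes, inv_w)) / sum(inv_w)
    return sum(inv_w) * (mu_hat - mu) ** 2, mu_hat, nu_hat

events, sizes = [1, 2, 0, 0, 2], [14, 8, 6, 6, 7]
t, mu_hat, nu_hat = wald_statistic(0.10, events, sizes)
t_at_estimate = wald_statistic(mu_hat, events, sizes)[0]
t_all_zero, mu_hat_zero, _ = wald_statistic(0.01, [0] * 6, [1000] * 6)
print(t, mu_hat, nu_hat)
```

Because the correction keeps the intermediate estimate of μ₀ strictly between 0 and 1, every w_i is positive and the statistic remains defined even when every study reports zero events.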

Moreover, our empirical results suggest that $T(\mu; D^{0})$ has a desirable property: for a fixed μ and ν₁ < ν₂,

$$P\{T(\mu; D^{\mu, \nu_{1}}) \geq t\} < P\{T(\mu; D^{\mu, \nu_{2}}) \geq t\} \tag{5}$$

for t in the tail region of interest. This inequality implies that

$$p(\mu; D^{0}) = \sup_{\nu} p(\mu, \nu; D^{0}) \approx p\{\mu, \nu_{sup}(\mu); D^{0}\},$$

which can greatly simplify the calculation of $p(\mu; D^{0})$. While (5) is not straightforward to verify analytically and holds only approximately, the underlying rationale is intuitive: a larger variance in the distribution of π_i implies higher variability of the test statistic and, in turn, a larger tail probability². Figure 2 contains an example confidence region based on the proposed test statistic in the (μ, ν) plane supporting the claim in (5). We leverage this property to reduce the computational complexity by a factor of O(J) by considering only values along the boundary of the reduced BB parameter space illustrated in Figure 2. In practice, our CI may be constructed based on the following simple procedure:

(Step 1)
Obtain the CI based on the asymptotic chi-square approximation to $T(\mu; D^{0})$, denoted by $[\mu_{LB}^{MOM}, \mu_{UB}^{MOM}]$.
(Step 2)
Letting $\tilde{p}(\mu; D^{0}) = p\{\mu, \nu_{sup}(\mu); D^{0}\}$, compute
$\tilde{p}(\tilde{\mu}_{LB}^{MOM}; D^{0})$ and $\tilde{p}(\tilde{\mu}_{UB}^{MOM}; D^{0})$,
where $\tilde{\mu}_{LB}^{MOM} = \max\{s, \mu_{LB}^{MOM}\}$, $\tilde{\mu}_{UB}^{MOM} = \min\{1 - s, \mu_{UB}^{MOM}\}$, and s is the chosen grid size.
(Step 3)
Identify the starting points for the lower and upper confidence bounds, respectively, as
$\tilde{\mu}_{inf} = \tilde{\mu}_{LB}^{MOM} I\{\tilde{p}(\tilde{\mu}_{LB}^{MOM}; D^{0}) \geq \alpha\} + \hat{\mu} I\{\tilde{p}(\tilde{\mu}_{LB}^{MOM}; D^{0}) < \alpha\}$
and
$\tilde{\mu}_{sup} = \tilde{\mu}_{UB}^{MOM} I\{\tilde{p}(\tilde{\mu}_{UB}^{MOM}; D^{0}) \geq \alpha\} + \hat{\mu} I\{\tilde{p}(\tilde{\mu}_{UB}^{MOM}; D^{0}) < \alpha\}.$
(Step 4)
Iterate outward to the left and right along the boundary given in (3), computing $\tilde{p}(\mu; D^{0})$ with steps (A.1) to (A.3), until the interval [μ_LB, μ_UB] is obtained, where
$\mu_{LB} = \inf\{\mu \mid \tilde{p}(\mu; D^{0}) \geq \alpha\}$
and
$\mu_{UB} = \sup\{\mu \mid \tilde{p}(\mu; D^{0}) \geq \alpha\}.$
In practice, we would compute
$\{\tilde{p}(\tilde{\mu}_{inf} - ks; D^{0}), k = 1, 2, \dots\}$ and $\{\tilde{p}(\tilde{\mu}_{sup} + ks; D^{0}), k = 1, 2, \dots\},$
and let
$\mu_{LB} = \min\{\tilde{\mu}_{inf} - ks \mid \tilde{p}(\tilde{\mu}_{inf} - ks; D^{0}) \geq \alpha, k \geq 0\}$
and
$\mu_{UB} = \max\{\tilde{\mu}_{sup} + ks \mid \tilde{p}(\tilde{\mu}_{sup} + ks; D^{0}) \geq \alpha, k \geq 0\}.$
(Step 5)
Compute $p(\mu; D^{0}) = \sup_{\nu} p(\mu, \nu; D^{0})$ for μ ∈ [μ_LB − δ, μ_LB] for some small δ > 0. To this end, we need to calculate $p(\mu, \nu; D^{0})$ for many pairs of μ and ν. For example, to compute the lower bound of the CI, we would compute $p(\mu^{(j)}, \nu^{(j_{i})}; D^{0})$ and approximate $p(\mu^{(j)}; D^{0})$ by
$\max\{p(\mu^{(j)}, \nu^{(j_{i})}; D^{0}) \mid i = 1, \dots, I_{j}\},$
where $\{\mu^{(1)}, \mu^{(2)}, \dots, \mu^{(J)}\}$ are J equally spaced points within the interval [μ_LB − δ, μ_LB] and $\{\nu^{(j_{1})}, \nu^{(j_{2})}, \dots, \nu^{(j_{I_{j}})}\}$ are I_j equally spaced points within the interval $[0, \nu_{sup}(\mu^{(j)})]$. The lower end of the CI is given by
$\min\{\mu^{(j)} \mid p(\mu^{(j)}; D^{0}) \geq \alpha, j = 1, \dots, J\}.$
(Step 6)
Likewise, we can calculate the upper end of the CI by computing $p(\mu; D^{0})$ for μ ∈ [μ_UB, μ_UB + δ].

FIGURE 2. Example confidence region in the (μ, ν) plane illustrating the stochastic dominance property in (5). The dashed line is the boundary of the parameter space for the restricted beta-binomial model

This algorithm uses the fact that $p(\mu; D^{0}) \approx p\{\mu, \nu_{sup}(\mu); D^{0}\}$, so that the cutoff values μ_LB and μ_UB based on the latter are very close to the ends of the CI. In practice, further gains in computational speed may be achieved with standard parallelization techniques.
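The outward search along the boundary ν = ν_sup(μ) can be illustrated with a heavily simplified sketch. Several departures from the procedure above are assumptions made to keep the example short: the search starts from the pooled rate rather than from the ends of the MOM interval, the statistic is the pooled statistic (ΣY_i − μΣn_i)² mentioned in Remark 3 rather than the Wald statistic in (4), the refinement in Steps 5 and 6 is omitted, and the grid size, M, and seed are arbitrary.

```python
import random

def nu_sup(mu):
    """Boundary of the restricted parameter space; see (3)."""
    return mu * (1 - mu) * min(mu / (1 + mu), (1 - mu) / (2 - mu))

def p_tilde(mu, events, sizes, M, rng):
    """Monte Carlo p value at (mu, nu_sup(mu)) for the pooled statistic."""
    nu = nu_sup(mu)
    c = (mu * (1 - mu) - nu) / nu
    a, b = mu * c, (1 - mu) * c
    total_n = sum(sizes)
    t_obs = (sum(events) - mu * total_n) ** 2
    hits = 0
    for _ in range(M):
        total = 0
        for n in sizes:
            pi = rng.betavariate(a, b)
            total += sum(rng.random() < pi for _ in range(n))
        hits += (total - mu * total_n) ** 2 >= t_obs
    return hits / M

def boundary_ci(events, sizes, alpha=0.05, s=0.01, M=500, seed=1):
    """Walk outward on the grid {s, 2s, ...} until the p value drops below alpha."""
    rng = random.Random(seed)
    k_max = round(1 / s) - 1
    start = min(max(round(sum(events) / sum(sizes) / s), 1), k_max)
    lo = hi = start
    while lo - 1 >= 1 and p_tilde((lo - 1) * s, events, sizes, M, rng) >= alpha:
        lo -= 1
    while hi + 1 <= k_max and p_tilde((hi + 1) * s, events, sizes, M, rng) >= alpha:
        hi += 1
    return lo * s, hi * s

events, sizes = [1, 2, 0, 0, 2], [14, 8, 6, 6, 7]
lower, upper = boundary_ci(events, sizes)
print(f"boundary-based interval: ({lower:.2f}, {upper:.2f})")
```

Stopping at the first grid point with a p value below α makes the walk sensitive to Monte Carlo noise near the cutoff, which is one reason a larger M and the local refinement of Steps 5 and 6 matter in a real implementation.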

Remark 2.

While we may consider other scales (such as logit) in deciding the grid points for searching the confidence bounds of μ, we prefer the original scale with the intention to control the precision on the probability scale. In practice, the grid size s needs to be chosen according to the required precision of the incidence rate of interest. Oftentimes, s = 0.001 is a sensible choice. The number of Monte Carlo iterations in calculating $p (μ, ν; D^{0})$ should also depend on the desired precision. For example, if we want to estimate a p-value of 0.05 with a standard error of 0.005 then M needs to be ≥ 2000. The tuning parameter, δ, in steps 5 and 6 can be set as a multiple of s such as 10s in practice.
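The stated requirement on M follows from the standard error of a Monte Carlo proportion. The short derivation below is a worked check under the usual binomial variance formula; it shows that the exact threshold is 1900, which M = 2000 satisfies with a small margin.

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1 - p)}{M}} \leq 0.005
\quad \Longleftrightarrow \quad
M \geq \frac{p(1 - p)}{0.005^{2}} = \frac{0.05 \times 0.95}{0.000025} = 1900 .
```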

Remark 3.

We note that the selection of the test statistic is not unique. A simple alternative is $\tilde{T}(\mu; D^{0}) = \left(\sum_{i = 1}^{K} Y_{i} - \mu \sum_{i = 1}^{K} n_{i}\right)^{2}$. In our limited experience, we have found that the corresponding exact CI is similar to that based on $T(\mu; D^{0})$ when K is small. One potential advantage of this test statistic is that the aforementioned approximation to $p\{\mu, \nu_{sup}(\mu); D^{0}\}$ appears to become an equality, ie, $p(\mu; D^{0}) = p\{\mu, \nu_{sup}(\mu); D^{0}\}$. Thus, we may skip steps 5 and 6 in the proposed algorithm to further accelerate the computation.

3 |. NUMERICAL STUDIES

3.1 |. Simulation study

We evaluated the performance of the proposed method in finite samples through a simulation study. Throughout, the observed data were generated as

$$Y_{i} \mid \pi_{i} \sim \text{Bin}(n_{i}, \pi_{i}), \quad \pi_{i} \sim \text{Beta}(\alpha_{0}, \beta_{0}), \quad i = 1, \dots, K,$$

for different values of K and (α₀, β₀) to represent varying degrees of heterogeneity and event rates. Specifically, we consider a rare-event rate setting with

Setting 1a: (α₀, β₀) = (1.2, 75) and Setting 1b: (α₀, β₀) = (3.6, 225)

and a low event rate setting with

Setting 2a: (α₀, β₀) = (1.2, 10) and Setting 2b: (α₀, β₀) = (3.6, 30)

so that (μ₀, τ₀) = (0.016, 0.013) and (0.016, 0.004) in Settings 1a and 1b, respectively, and (μ₀, τ₀) = (0.107, 0.082) and (0.107, 0.029) in Settings 2a and 2b, respectively. For all scenarios, we set K = 5, 10, 15, and 20 and let the study-specific sample sizes be n₁ = n₂ = 50, n₃ = n₄ = 100, and n_i = 150 for i ≥ 5. For comparison, we present the performance of the CIs based on the DL and Sidik-Jonkman (SJ) methods with (1) the logistic transformation and 0.5 continuity correction (DL-LGT and SJ-LGT) and (2) the arcsine transformation (DL-ASIN and SJ-ASIN), as well as the CI based on the asymptotic chi-square approximation to the proposed test statistic in (4), denoted as the method-of-moments (MOM) CI. We have also obtained the exact CI under the FE model, where the pooled data satisfy $\sum_{i = 1}^{K} Y_{i} \sim \text{Bin}(\sum_{i = 1}^{K} n_{i}, \mu_{0})$. In Tables 1 and 2, we summarize the performance of the 95% CIs from the various procedures based on the median interval length and empirical coverage probability from 2000 simulated data sets.
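The quoted values of (μ₀, τ₀) follow directly from μ₀ = α₀/(α₀ + β₀) and τ₀ = 1/(α₀ + β₀ + 1). The sketch below checks them and generates one data set under the stated design; the dictionary, function names, and random seed are illustrative and not part of the paper.

```python
import random

SETTINGS = {"1a": (1.2, 75), "1b": (3.6, 225), "2a": (1.2, 10), "2b": (3.6, 30)}

def location_and_heterogeneity(alpha, beta):
    """Return (mu_0, tau_0) for a Beta(alpha, beta) random-effect distribution."""
    return alpha / (alpha + beta), 1 / (alpha + beta + 1)

def sample_sizes(K):
    """n1 = n2 = 50, n3 = n4 = 100, and n_i = 150 for i >= 5."""
    return ([50, 50, 100, 100] + [150] * max(0, K - 4))[:K]

def simulate_dataset(alpha, beta, K, rng):
    """Draw pi_i ~ Beta(alpha, beta), then Y_i | pi_i ~ Bin(n_i, pi_i)."""
    sizes = sample_sizes(K)
    events = []
    for n in sizes:
        pi = rng.betavariate(alpha, beta)
        events.append(sum(rng.random() < pi for _ in range(n)))
    return events, sizes

summary = {name: tuple(round(v, 3) for v in location_and_heterogeneity(*ab))
           for name, ab in SETTINGS.items()}
events, sizes = simulate_dataset(*SETTINGS["2a"], K=5, rng=random.Random(0))
print(summary)
print(events, sizes)
```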

TABLE 1.

The empirical coverage probabilities (CovP) of the 95% confidence intervals and the median lengths (Length) of the various methods in the rare-event rate setting and moderate study-specific sample sizes

(1a) α₀ = 1.2, β₀ = 75
	K = 5	K = 10	K = 15	K = 20
Method	CovP (Length)	CovP (Length)	CovP (Length)	CovP (Length)
Exact	0.962 (0.040)	0.954 (0.028)	0.958 (0.023)	0.950 (0.019)
MOM	0.821 (0.025)	0.850 (0.018)	0.876 (0.016)	0.900 (0.014)
DL-LGT	0.882 (0.032)	0.848 (0.021)	0.831 (0.017)	0.841 (0.015)
SJ-LGT	0.958 (0.037)	0.958 (0.023)	0.961 (0.019)	0.970 (0.016)
DL-ASIN	0.792 (0.026)	0.751 (0.018)	0.699 (0.015)	0.648 (0.013)
SJ-ASIN	0.822 (0.028)	0.781 (0.019)	0.734 (0.016)	0.684 (0.014)
FE	0.890 (0.024)	0.790 (0.015)	0.788 (0.011)	0.786 (0.010)
(1b) α₀ = 3.6, β₀ = 225
	K = 5	K = 10	K = 15	K = 20
Method	CovP (Length)	CovP (Length)	CovP (Length)	CovP (Length)
Exact	0.978 (0.039)	0.980 (0.025)	0.976 (0.018)	0.971 (0.015)
MOM	0.874 (0.024)	0.903 (0.015)	0.920 (0.012)	0.920 (0.011)
DL-LGT	0.898 (0.031)	0.837 (0.019)	0.769 (0.015)	0.714 (0.012)
SJ-LGT	0.959 (0.034)	0.961 (0.021)	0.964 (0.017)	0.958 (0.014)
DL-ASIN	0.858 (0.025)	0.840 (0.017)	0.812 (0.014)	0.776 (0.012)
SJ-ASIN	0.878 (0.028)	0.877 (0.019)	0.854 (0.015)	0.814 (0.013)
FE	0.948 (0.026)	0.914 (0.015)	0.905 (0.011)	0.906 (0.010)


TABLE 2.

The empirical coverage probabilities (CovP) of the 95% confidence intervals and the median lengths (Length) of the various methods in the low event rate setting and moderate study-specific sample sizes

(2a) α₀ = 1.2, β₀ = 10
	K = 5	K = 10	K = 15	K = 20
Method	CovP (Length)	CovP (Length)	CovP (Length)	CovP (Length)
Exact	0.962 (0.222)	0.955 (0.157)	0.961 (0.116)	0.960 (0.094)
MOM	0.802 (0.122)	0.882 (0.101)	0.909 (0.087)	0.920 (0.076)
DL-LGT	0.864 (0.131)	0.878 (0.091)	0.840 (0.073)	0.816 (0.063)
SJ-LGT	0.890 (0.151)	0.905 (0.107)	0.882 (0.087)	0.844 (0.076)
DL-ASIN	0.832 (0.136)	0.850 (0.102)	0.836 (0.085)	0.819 (0.075)
SJ-ASIN	0.838 (0.137)	0.854 (0.104)	0.844 (0.086)	0.820 (0.075)
FE	0.478 (0.058)	0.426 (0.035)	0.428 (0.028)	0.425 (0.024)
(2b) α₀ = 3.6, β₀ = 30
	K = 5	K = 10	K = 15	K = 20
Method	CovP (Length)	CovP (Length)	CovP (Length)	CovP (Length)
Exact	0.990 (0.176)	0.968 (0.103)	0.959 (0.074)	0.969 (0.061)
MOM	0.836 (0.084)	0.883 (0.066)	0.914 (0.056)	0.925 (0.049)
DL-LGT	0.888 (0.096)	0.906 (0.067)	0.920 (0.055)	0.922 (0.047)
SJ-LGT	0.924 (0.106)	0.938 (0.075)	0.940 (0.062)	0.945 (0.053)
DL-ASIN	0.866 (0.096)	0.892 (0.070)	0.896 (0.058)	0.900 (0.050)
SJ-ASIN	0.887 (0.099)	0.904 (0.072)	0.907 (0.059)	0.909 (0.052)
FE	0.688 (0.059)	0.626 (0.036)	0.625 (0.028)	0.642 (0.024)


With the exception of the SJ-LGT method, all comparison methods fail to achieve the nominal coverage level of 95%. For instance, the empirical coverage probabilities for the remaining methods based on the random-effect model range from 64.8% to 93.2% with K = 20. Although the SJ-LGT method is generally the best performing method of the DL and SJ methods considered, the coverage probability substantially deteriorates with increasing K in Setting 2a. The MOM method exhibits more stable behavior but falls well below the nominal level for K = 20 in Setting 1a. As expected, the FE method exhibits significant undercoverage with the best performance in Setting 1b where the between-study heterogeneity is the smallest.

The proposed method is conservative for K = 5 and the settings with lower heterogeneity (Settings 1b and 2b), but overall, we find that the coverage levels are only moderately higher than 95% in the scenarios considered. As expected, the length of the proposed CI decreases with increasing K. With K = 20, the length is generally on par with that of the competing methods, although the proposed CI yields a coverage level closer to the desired 95%.

For completeness, we also considered analogous scenarios with smaller study-specific sample sizes. Specifically, for both settings, we set the study-specific sample sizes to n₁ = n₂ = 10, n₃ = n₄ = 15, and n_i = 20 for i ≥ 5. The results are summarized in Tables 3 and 4. When μ₀ = 0.107, the FE method exhibits undercoverage, while the empirical coverage levels of CIs based on the new method are all above the nominal level. The empirical coverage levels of CIs based on SJ-LGT are also satisfactory. When μ₀ = 0.016, the empirical coverage levels of CIs based on DL-LGT, SJ-LGT, DL-ASIN, and SJ-ASIN are substantially lower than the nominal level. A potential reason is that the bias caused by the 0.5 correction in zero-event studies becomes nonnegligible relative to the true incidence rates, which are very low.

TABLE 3.

The empirical coverage probabilities (CovP) of the 95% confidence intervals and the median lengths (Length) of the various methods in the rare-event rate setting and small study-specific sample sizes

(1a) α₀ = 1.2, β₀ = 75
	K = 5	K = 10	K = 15	K = 20
Method	CovP (Length)	CovP (Length)	CovP (Length)	CovP (Length)
Exact	0.961 (0.098)	0.952 (0.046)	0.944 (0.035)	0.956 (0.029)
MOM	0.999 (0.115)	1.000 (0.068)	0.998 (0.054)	0.999 (0.045)
DL-LGT	0.696 (0.112)	0.308 (0.064)	0.088 (0.051)	0.026 (0.042)
SJ-LGT	0.696 (0.113)	0.308 (0.065)	0.088 (0.051)	0.026 (0.043)
DL-ASIN	0.944 (0.088)	0.796 (0.052)	0.573 (0.042)	0.285 (0.036)
SJ-ASIN	0.962 (0.088)	0.840 (0.054)	0.586 (0.042)	0.295 (0.036)
FE	0.964 (0.077)	0.968 (0.040)	0.960 (0.033)	0.950 (0.029)
(1b) α₀ = 3.6, β₀ = 225
	K = 5	K = 10	K = 15	K = 20
Method	CovP (Length)	CovP (Length)	CovP (Length)	CovP (Length)
Exact	0.962 (0.098)	0.964 (0.046)	0.956 (0.035)	0.964 (0.029)
MOM	1.000 (0.115)	1.000 (0.068)	0.998 (0.053)	0.999 (0.045)
DL-LGT	0.697 (0.112)	0.270 (0.064)	0.086 (0.050)	0.020 (0.042)
SJ-LGT	0.696 (0.113)	0.270 (0.065)	0.086 (0.050)	0.020 (0.043)
DL-ASIN	0.942 (0.088)	0.828 (0.053)	0.584 (0.042)	0.276 (0.036)
SJ-ASIN	0.958 (0.088)	0.853 (0.052)	0.598 (0.042)	0.285 (0.036)
FE	0.966 (0.077)	0.981 (0.040)	0.962 (0.033)	0.962 (0.029)


TABLE 4.

The empirical coverage probabilities (CovP) of the 95% confidence intervals and the median lengths (Length) of the various methods in the low event rate setting and small study-specific sample sizes

(2a) α₀ = 1.2, β₀ = 10
	K = 5	K = 10	K = 15	K = 20
Method	CovP (Length)	CovP (Length)	CovP (Length)	CovP (Length)
Exact	0.970 (0.229)	0.962 (0.146)	0.960 (0.119)	0.960 (0.104)
MOM	0.834 (0.160)	0.859 (0.110)	0.894 (0.095)	0.900 (0.086)
DL-LGT	0.920 (0.189)	0.882 (0.133)	0.860 (0.110)	0.869 (0.095)
SJ-LGT	0.960 (0.215)	0.964 (0.148)	0.968 (0.121)	0.974 (0.104)
DL-ASIN	0.914 (0.151)	0.882 (0.107)	0.888 (0.091)	0.874 (0.079)
SJ-ASIN	0.930 (0.169)	0.927 (0.120)	0.934 (0.099)	0.916 (0.086)
FE	0.840 (0.154)	0.803 (0.098)	0.823 (0.077)	0.814 (0.065)
(2b) α₀ = 3.6, β₀ = 30
	K = 5	K = 10	K = 15	K = 20
Method	CovP (Length)	CovP (Length)	CovP (Length)	CovP (Length)
Exact	0.990 (0.218)	0.983 (0.134)	0.981 (0.102)	0.975 (0.087)
MOM	0.906 (0.155)	0.912 (0.100)	0.922 (0.079)	0.926 (0.069)
DL-LGT	0.924 (0.177)	0.881 (0.116)	0.831 (0.093)	0.774 (0.080)
SJ-LGT	0.974 (0.198)	0.973 (0.133)	0.968 (0.107)	0.970 (0.092)
DL-ASIN	0.967 (0.145)	0.931 (0.095)	0.920 (0.076)	0.900 (0.066)
SJ-ASIN	0.974 (0.162)	0.957 (0.110)	0.955 (0.089)	0.948 (0.077)
FE	0.934 (0.154)	0.900 (0.098)	0.910 (0.076)	0.914 (0.065)


Furthermore, we have investigated the validity of (5):

$$P\{T(\mu; D^{\mu, \nu_{1}}) \geq t\} < P\{T(\mu; D^{\mu, \nu_{2}}) \geq t\}, \quad \text{for } \nu_{1} < \nu_{2},$$

via numerical studies. Specifically, with μ₀ at the previous simulation setting, we estimate the cumulative distribution function of $T (μ_{0}; D^{μ_{0}, ν})$ for ν = ν_sup(μ₀)k∕5, k = 1, …, 5, using Monte Carlo simulation. To this end, we simulated 10⁶ test statistics for each combination of (μ₀, ν, n₁, …, n_K) and obtained the empirical survival function. We have plotted the survival functions for μ₀ = 0.016 and μ₀ = 0.107 with moderate study specific sample sizes in Figures 3 and 4, respectively, to examine whether they are stochastically ordered according to the value of ν. Likewise, we have plotted the survival function for μ₀ = 0.107 with small study specific sample sizes in Figure 5. The simulation results confirm the inequality (5) and especially the claim that

$$\sup_{\nu \leq \nu_{sup}(\mu)} P\{T(\mu; D^{\mu, \nu}) \geq t\} = P\{T(\mu; D^{\mu, \nu_{sup}(\mu)}) \geq t\}$$

under the given settings.
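A scaled-down version of this check is sketched below. It is only indicative: it uses 1000 rather than 10⁶ replicates, a single arbitrary threshold t, K = 5 with the moderate sample sizes, and the pooled statistic (ΣY_i − μΣn_i)² from Remark 3 in place of the Wald statistic in (4). All of these choices, along with the function names and seed, are assumptions made for illustration.

```python
import random

def nu_sup(mu):
    """Boundary of the restricted parameter space; see (3)."""
    return mu * (1 - mu) * min(mu / (1 + mu), (1 - mu) / (2 - mu))

def tail_prob(mu, nu, sizes, t, M, rng):
    """Monte Carlo estimate of P{(sum Y - mu * sum n)^2 >= t} under BB(mu, nu)."""
    c = (mu * (1 - mu) - nu) / nu
    a, b = mu * c, (1 - mu) * c
    total_n = sum(sizes)
    hits = 0
    for _ in range(M):
        total = 0
        for n in sizes:
            pi = rng.betavariate(a, b)
            total += sum(rng.random() < pi for _ in range(n))
        hits += (total - mu * total_n) ** 2 >= t
    return hits / M

mu = 0.107
sizes = [50, 50, 100, 100, 150]
rng = random.Random(2024)
# nu = nu_sup(mu) * k / 5 for k = 1, ..., 5, as in the numerical study.
tails = [tail_prob(mu, nu_sup(mu) * k / 5, sizes, t=25.0**2, M=1000, rng=rng)
         for k in range(1, 6)]
print([round(p, 3) for p in tails])
```

If the ordering in (5) holds for this statistic, the estimated tail probabilities should tend to increase with k, with the largest value attained at ν = ν_sup(μ), up to Monte Carlo noise between neighboring values.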

FIGURE 3. The empirical survival functions for the test statistic with μ = 0.016 for the moderate study-specific sample size setting. The five survival curves from bottom to top correspond to increasing values of ν

FIGURE 4. The empirical survival functions for the test statistic with μ = 0.107 for the moderate study-specific sample size setting. The five survival curves from bottom to top correspond to increasing values of ν

FIGURE 5. The empirical survival functions for the test statistic with μ = 0.107 for the small study-specific sample size setting. The five survival curves from bottom to top correspond to increasing values of ν

3.2 |. Real data analysis

AML is an aggressive cancer that begins in the bone marrow and results in an elevated level of immature white blood cells called myeloid blasts in the bloodstream. CD33 is an antigen expressed on the blast cells in the majority of AML patients and has been identified as a suitable candidate for targeted treatment of the disease. One such drug, Mylotarg, was approved by the FDA in 2000 for the treatment of patients with CD33-positive AML in first relapse who are 60 years of age or older and who are not candidates for other types of cytotoxic chemotherapy. Confirmatory trials revealed safety concerns, including higher rates of EM and veno-occlusive disease (VOD) and failed to confirm a clinical benefit. Here, we illustrate the proposed method in three meta-analyses of 14 trials aiming to evaluate the safety and/or efficacy of an unfractionated regimen of Mylotarg: 2 doses of 6 mg/m² given 14 days apart.⁴ The studies consisted of three pivotal phase II multicenter trials as well as dose-escalation and postmarketing studies in patients with relapsed or refractory AML. The efficacy of the treatment regimen was evaluated based on meta-analyses of complete remission (CR) rate across the studies presented in Table A1 in the Appendix. Safety was assessed with meta-analyses of EM and VOD based on the studies presented in Tables A2 and A3. We also provide some discussion of the comparison of the 6 mg/m² unfractionated regimen to a fractionated dosing regimen of 3 mg/m² given on days 1, 4, and 7 for which very few studies were available for analysis.

The 95% CIs and corresponding lengths based on the proposed exact method as well as the FE, SJ-LGT, and SJ-ASIN methods for the 6 mg/m² regimen are presented in Table 5. The number of available studies for analysis is quite small for each of the outcomes. Additionally, all studies have 16 or fewer patients and the number of zero-event studies is high. For example, CR was not observed in four of the six studies considered. As a result, we observe substantial differences in the point estimates for the three methods, ranging from 0.013 to 0.035, as well as in the corresponding CIs. The SJ-LGT interval is more than twice the length of the SJ-ASIN interval and longer than the CI from the exact method. Similar though less extreme patterns are observed for the EM and VOD analyses. Consistent with our simulation results and intuition, we also find that the intervals from the FE method are shorter than those from the proposed method. These differences highlight the utility of our method in practice, as its validity does not rely on the choice of transformation or on large sample approximation.

TABLE 5.

Random-effect meta-analyses of the efficacy (complete remission (CR)) and safety (early mortality (EM) and veno-occlusive disease (VOD)) of the 6 mg/m² regimen of Mylotarg based on the proposed exact method and the SJ-LGT and SJ-ASIN methods, together with the FE analysis. Presented are the 95% confidence intervals and corresponding lengths for each of the methods; K is the number of studies

Study	CR	EM	VOD
K	6	5	7
Exact	0.007–0.128 (0.121)	0.049–0.340 (0.291)	0.072–0.367 (0.295)
SJ-LGT	0.027–0.185 (0.158)	0.064–0.342 (0.279)	0.112–0.421 (0.310)
SJ-ASIN	0.003–0.076 (0.073)	0.001–0.252 (0.251)	0.012–0.306 (0.294)
FE	0.004–0.121 (0.116)	0.041–0.262 (0.221)	0.105–0.350 (0.245)

In addition to evaluating the 6 mg/m² unfractionated treatment regimen, interest also lay in its comparison to a fractionated dosing regimen of 3 mg/m². However, only two studies evaluated CR and EM for the fractionated regimen, and only three evaluated VOD. We therefore constructed exact binomial CIs based on the pooled data, effectively assuming an FE model, as it is difficult to support the modeling assumptions for a random-effect model in this setting. The resulting CIs for CR, EM, and VOD were (0.153, 0.379), (0.035, 0.170), and (0, 0.042), respectively. While these results suggest some benefit in efficacy for the fractionated treatment versus the unfractionated treatment, as well as lower rates of VOD and EM, they must be interpreted with caution due to the small number of studies and the use of the FE approach. The confidence intervals based on the proposed random-effect approach are wider, as expected, namely (0.060, 0.724), (0.030, 0.377), and (0, 0.169), respectively, but we again caution that there is insufficient information to support the model assumptions required for the application of this method.
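The pooled exact binomial intervals can be sketched in a few lines of Python. The sketch assumes that "exact binomial CI" means the two-sided Clopper-Pearson interval, which the text does not state explicitly; the helper names `binom_cdf`, `solve_increasing`, and `clopper_pearson` are illustrative and not taken from the authors' software. The pooled counts are the column sums of the 3 mg/m² panels of Tables A1 to A3. For VOD, where no events were observed, the upper limit has the closed form 1 − 0.025^(1/87) ≈ 0.042, in agreement with the reported interval.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); an empty sum gives 0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(f, target, tol=1e-10):
    """Bisection on [0, 1] for f(p) = target, assuming f is increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for x events among n patients."""
    # Lower limit solves P(X >= x | p) = alpha/2; it is 0 when x == 0.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha/2, i.e. P(X > x | p) = 1 - alpha/2;
    # it is 1 when x == n.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# Pooled events and totals for the 3 mg/m² regimen (Tables A1-A3, panel b).
pooled = {
    "CR": (15 + 1, 57 + 6),
    "EM": (4 + 3, 57 + 24),
    "VOD": (0 + 0 + 0, 57 + 24 + 6),
}
for outcome, (x, n) in pooled.items():
    lo, hi = clopper_pearson(x, n)
    print(f"{outcome}: {x}/{n} -> ({lo:.3f}, {hi:.3f})")
```

Bisection on the binomial tail probabilities is used here only to keep the sketch free of external dependencies; a beta-quantile formulation would give the same limits.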

4 |. DISCUSSION

In this paper, we have proposed a random-effect model for combining multiple binomial random variables whose underlying probabilities follow a parametric distribution. In particular, we introduced an exact CI for the location parameter of the BB model. The method is valid regardless of the number of studies or the sample size within each study and is thus especially suitable for rare-event data, where the event probability is close to zero. While the performance of the method depends on the choice of the test statistic, we chose the modified DL test statistic so that our method performs similarly to the DL method when K is not small, since the DL method is asymptotically optimal. A closely related problem, and a direction warranting future research, is to extend our method to combine a group of two-by-two tables that characterize the between-group difference in binary outcomes from multiple studies.

Finally, we note that when the number of studies is fewer than four, we often lack sufficient information to either support or refute the model assumptions involved. We therefore advise practitioners to avoid overinterpreting the results, even though the proposed method remains valid under correctly specified model assumptions. Additionally, we emphasize that having a small number of studies is not, by itself, a good reason for using the simple FE model, since its implicit assumption is even stronger than that of the random-effect model. As a general suggestion, the choice between random-effect and FE approaches should be based on the observed data, and one should always clearly state the key assumptions together with the analysis results.
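The caution about the FE model can be illustrated with a small Monte Carlo sketch. It is not the simulation study reported in this paper. The mean rate of 0.10, the heterogeneity parameter `rho` = 0.2, the Beta parameterization (mean `mu`, variance `rho * mu * (1 - mu)`), and the function names are all assumptions chosen for illustration; the study sizes are those of the five 6 mg/m² EM studies in Table A2. The pooled interval is again assumed to be the Clopper-Pearson interval, and coverage is checked by test inversion: `mu` lies inside the interval exactly when neither one-sided exact binomial test rejects it at level 0.025.

```python
import random
from math import comb


def cp_interval_covers(x, n, mu, alpha=0.05):
    """True if the Clopper-Pearson interval from x events in n patients contains mu."""
    pmf = [comb(n, i) * mu**i * (1 - mu) ** (n - i) for i in range(n + 1)]
    # P(X <= x | mu) and P(X >= x | mu) must both be at least alpha/2.
    return sum(pmf[: x + 1]) >= alpha / 2 and sum(pmf[x:]) >= alpha / 2


def fe_coverage(mu, rho, sizes, reps=2000, seed=1):
    """Monte Carlo coverage of the pooled (FE) exact interval for the mean rate mu.

    rho = 0 gives identical study rates; rho > 0 draws each study rate from a
    Beta distribution with mean mu and variance rho * mu * (1 - mu).
    """
    rng = random.Random(seed)
    n_total = sum(sizes)
    hits = 0
    for _ in range(reps):
        x = 0
        for n_k in sizes:
            if rho > 0:
                scale = (1 - rho) / rho
                p_k = rng.betavariate(mu * scale, (1 - mu) * scale)
            else:
                p_k = mu
            # Number of events in study k, drawn as n_k Bernoulli(p_k) trials.
            x += sum(rng.random() < p_k for _ in range(n_k))
        hits += cp_interval_covers(x, n_total, mu)
    return hits / reps


sizes = [8, 14, 6, 6, 7]  # study sizes from Table A2(a)
print("identical study rates:", fe_coverage(0.10, 0.0, sizes))
print("heterogeneous study rates:", fe_coverage(0.10, 0.2, sizes))
```

When the study rates are identical, the pooled exact interval attains at least the nominal level by construction. Under the assumed heterogeneity, the pooled count is overdispersed relative to the binomial model, so the estimated coverage should be expected to fall below the nominal level.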

ACKNOWLEDGEMENT

This research was supported by NIH grant 5R01 HL08977807. This article reflects the views of the authors and should not be construed to represent FDA views or policies.

APPENDIX

DATA FOR MYLOTARG EXAMPLE

TABLE A1.

Complete remission rates for the studies of the two regimens of Mylotarg

(a) 6 mg/m²
	Events	Total
Study 101	0	8
Study 102	1	14
Study 103	0	6
Study 100374	1	6
Piccaluga, 2004	0	7
van der Heiden, 2006	0	16
(b) 3 mg/m²
	Events	Total
MyloFrance 1	15	57
Brethon, 2006	1	6

TABLE A2.

Rates of early mortality for the studies of the two regimens of Mylotarg

(a) 6 mg/m²
	Events	Total
Study 101	2	8
Study 102	1	14
Study 103	0	6
Study 100374	0	6
Piccaluga, 2004	2	7
(b) 3 mg/m²
	Events	Total
MyloFrance 1	4	57
Thomas, 2005	3	24

TABLE A3.

Rates of veno-occlusive disease for the studies of the two regimens of Mylotarg

(a) 6 mg/m²
	Events	Total
Study 101	1	8
Study 102	6	14
Study 103	0	6
Study 100374	2	6
Thomas, 2005	1	6
Piccaluga, 2004	0	7
Zwaan, 2003	0	1
(b) 3 mg/m²
	Events	Total
MyloFrance 1	0	57
Thomas, 2005	0	24
Brethon, 2006	0	6

Footnotes

DATA AVAILABILITY STATEMENT

The data for the Mylotarg example are provided in the Appendix.

REFERENCES

1. Blyth CR, Still HA. Binomial confidence intervals. J Am Stat Assoc. 1983;78(381):108–116.
2. Agresti A, Caffo B. Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. Am Stat. 2000;54(4):280–288.
3. Brown LD, Cai TT, Dasgupta A. Confidence intervals for a binomial proportion and asymptotic expansions. Ann Stat. 2002;30(1):160–201.
4. US Food and Drug Administration. Federal Drug Administration Briefing Document: Oncologic Drugs Advisory Committee Meeting. 2017.
5. Normand SL. Tutorial in biostatistics meta-analysis: formulating, evaluating, combining, and reporting. Statis Med. 1999;18(3):321–359.
6. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177–188.
7. Davey J, Turner RM, Clarke MJ, Higgins JP. Characteristics of meta-analyses and their component studies in the cochrane database of systematic reviews: a cross-sectional, descriptive analysis. BMC Med Res Method. 2011;11(1):160.
8. Platt RW, Leroux BG, Breslow N. Generalized linear mixed models for meta-analysis. Statis Med. 1999;18(6):643–654.
9. Shuster JJ, Jones LS, Salmon DA. Fixed vs random-effect meta-analysis in rare event studies: the rosiglitazone link with myocardial infarction and cardiac death. Statis Med. 2007;26(24):4375–4385.
10. Hamza TH, van Houwelingen HC, Stijnen T. The binomial distribution of meta-analysis was preferred to model within-study variability. J Clin Epidemiol. 2008;61(1):41–51.
11. Stijnen T, Hamza TH, Özdemir P. Random effects meta-analysis of event outcome in the framework of the generalized linear mixed model with applications in sparse data. Statis Med. 2010;29(29):3046–3067.
12. Bhaumik DK, Amatya A, Normand SL, Greenhouse J, Kaizar E, Neelon B, Gibbons RD. Meta-analysis of rare binary adverse event data. J Am Stat Assoc. 2012;107(498):555–567.
13. Sweeting M, Sutton A, Lambert P. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Statis Med. 2004;23(9):1351–1375.
14. Young-Xu Y, Chan KA. Pooling overdispersed binomial data to estimate event rate. BMC Med Res Method. 2008;8(1):58.
15. Ma Y, Chu H, Mazumdar M. Meta-analysis of proportions of rare events–a comparison of exact likelihood methods with robust variance estimation. Commun Stat-Simul Comput. 2016;45(8):3036–3052.
