Bayesian Estimation and Testing in Random-Effects Meta-Analysis of Rare Binary Events Allowing for Flexible Group Variability

Ming Zhang; Jackson Barth; Johan Lim; Xinlei Wang

doi:10.1002/sim.9695

Author manuscript; available in PMC: 2024 May 20.

Published in final edited form as: Stat Med. 2023 Mar 4;42(11):1699–1721. doi: 10.1002/sim.9695

Ming Zhang ¹, Jackson Barth ¹, Johan Lim ², Xinlei Wang ³

PMCID: PMC10192012 NIHMSID: NIHMS1877249 PMID: 36869639

Abstract

Rare binary events data arise frequently in medical research. Due to lack of statistical power in individual studies involving such data, meta-analysis has become an increasingly important tool for combining results from multiple independent studies. However, traditional meta-analysis methods often report severely biased estimates in such rare-event settings. Moreover, many rely on models assuming a pre-specified direction for variability between control and treatment groups for mathematical convenience, which may be violated in practice. Based on a flexible random-effects model that removes the assumption about the direction, we propose new Bayesian procedures for estimating and testing the overall treatment effect and inter-study heterogeneity. Our Markov chain Monte Carlo algorithm employs Pólya-Gamma augmentation so that all conditionals are known distributions, greatly facilitating computational efficiency. Our simulation shows that the proposed approach generally reports less biased and more stable estimates compared to existing methods. We further illustrate our approach using two real examples, one using rosiglitazone data from 56 studies and the other using stomach ulcers data from 41 studies.

Keywords: Bayesian hierarchical model, binomial normal, data augmentation, log odds ratio, model selection, Pólya-Gamma

1. INTRODUCTION

Meta-analysis is a systematic and quantitative procedure used to integrate information from a set of individual studies.¹ This process is widely used in many fields of research such as medicine, education, psychology, criminology, etc.² For example, pharmaceutical scientists typically use meta-analysis of (rare) binary events to investigate the efficacy or safety of healthcare interventions, yielding more reliable statistical inference. In this paper, we focus on meta-analysis of rare binary events.

Rare binary outcomes are common in studies involving rare diseases or drug safety. Due to low background incidence rates or small sample sizes, researchers often have difficulty in reporting a reliable and generalizable result from each study alone. Thus, meta-analysis has become a routine for analysis of such data and received a lot of attention. Fixed-effect models (FEMs) were widely used in this field, where an identical treatment effect is assumed across all individual studies. Popular FEM-based approaches include the Mantel-Haenszel method (MH),³ the inverse-variance weighted method,⁴ the empirical logit method,⁵ and Peto’s approach.⁶ One challenge in meta-analysis of rare binary events is dealing with studies that contain zero events in one or both arms. Typical quick fixes include using continuity correction with the MH or inverse-variance weighted methods and excluding zero-event studies as suggested by Peto’s approach, which may lead to substantial bias.^7,8,9 Instead of using point estimators, Tian et al.¹⁰ derived an interval estimation procedure under the FEM framework without relying on large-sample approximation or continuity correction.

Compared to FEMs, random-effects models (REMs) assume existence of between-study variability. REMs are considered to be more plausible than FEMs since experimental conditions, protocols, and patients’ characteristics vary from study to study, and the goal of meta-analysis is perhaps not only getting the identical point estimate of the treatment effect but also generalizing the result to broader scenarios.² When outcomes are binary, binomial-normal hierarchical models (BN) are the most popular among REMs.^{11,12,13,14,15} For instance, Houwelingen et al.¹¹ derived likelihood-based estimators of model parameters by assigning a bivariate normal distribution to the logit-transformed probabilities of two groups. Bhaumik et al.¹² considered a BN model and proposed method-of-moments estimators of the treatment effect (i.e., the simple average estimator; SA) and the heterogeneity parameter (i.e., the improved Paule and Mandel estimator; IPM). However, SA tends to overestimate the treatment effect and IPM tends to underestimate the heterogeneity, and their bias increases when the background incidence rate becomes lower. Simmonds and Higgins¹³ slightly modified the BN model in Bhaumik et al.¹² by treating the baseline risk for each trial as a fixed effect. Still, both models make the same assumption of larger variability in the treatment group than in the control group. Li and Wang¹⁴ proposed a new flexible BN model without assuming a specific direction between the variances of the two groups. That is, it allows the variability in the treatment group to be smaller, larger or equal to that of the control group.

In addition to the frequentist REM approaches mentioned above, Bayesian approaches have also been used to model meta-analysis of rare binary events. In recent decades, researchers have used the Bayesian framework more frequently for these studies,^{16,17,9,18,19} due in large part to increased computing power and the development of Markov chain Monte Carlo (MCMC) techniques. Smith et al.¹⁶ first proposed a fully Bayesian framework and implemented their process via BUGS software.²⁰ Günhan et al.¹⁸ adopted the same model and introduced weakly informative priors to the treatment effects. Bai et al.¹⁷ and Ren et al.⁹ used the model from Bhaumik et al.¹² and extended it to the Bayesian paradigm by assigning different priors to hyperparameters. More recently, Hong et al.¹⁹ conducted a simulation to compare six Bayesian methods and nine frequentist approaches in estimating the log odds ratio (LOR) for the overall treatment effect in the context of meta-analysis of rare binary events, and recommended two estimators: the Heterogeneous Treatment Effect method based on a Binomial-Beta hierarchical framework (HTE-Beta) and a weighted estimator proposed by Shuster et al. (SGSwgt).²¹ However, they did not examine the performance in estimating inter-study heterogeneity.

The REMs considered in the literature often assume that the variability in the treatment group is larger than that in the control group^12,17,9 or that the variability is equal between two arms.^16,18 In practice, a model that allows unequal variability between the two groups and assumes no specific direction is less restrictive and more appropriate. Li and Wang¹⁴ were the first to propose such a flexible REM. But the authors mainly focused on theoretical aspects of the SA estimator proposed in Bhaumik et al.¹² and performance evaluation of existing estimators of the LOR. In this paper, we implement a Bayesian framework based on the REM in Li and Wang¹⁴ and develop new estimators of key model parameters. Our method, called FlexB, allows data to determine the direction of variance comparison (i.e., which group has larger variability) and accounts for this uncertainty in parameter estimation, rather than making an assumption about the direction which may be inaccurate. We further propose a Gibbs sampler, in which we integrate the Pólya-Gamma data-augmentation technique into the hierarchical model structure for efficient and reliable posterior sampling.²² Unlike most previous Bayesian works implemented using Stan, JAGS, or BUGS, we implement our MCMC algorithm using Rcpp, which integrates C++ with R, to achieve much faster computation.²³

Another important factor to meta-analysis of rare binary events is hypothesis testing or model selection, which can provide straightforward conclusions to research questions related to efficacy or safety issues. For instance, Bhaumik et al.¹² proposed a large-sample test using the SA estimator to test an overall treatment effect. They also derived two tests using Cochran’s Q statistics and parametric bootstrap (PB) techniques for testing between-study heterogeneity of treatment effects. Bai et al.¹⁷ employed a Bayesian model-selection approach for simultaneous hypothesis testing rather than testing the overall treatment effect or the inter-study heterogeneity separately. They showed that the deviance information criterion (DIC)²⁴ based on the (marginalized) likelihood function of parameters of main interest is better than several competitors in choosing correct models. Under the flexible REM proposed in Li and Wang,¹⁴ we consider a similar DIC approach for simultaneous hypothesis testing of key model parameters. We also adapt Bayesian information criterion (BIC)²⁵ to our meta-analysis framework to select the best model.

The remainder of this paper is organized as follows. In Section 2, we present the proposed Bayesian hierarchical approach including the BN model allowing flexible group variability, prior specification, and posterior computation via a Gibbs sampler based on the Pólya-Gamma data-augmentation technique.²² In Section 3, we consider three model selection criteria for the purpose of hypothesis testing, including BIC and DIC. In Section 4, we conduct simulation studies to evaluate the performance of the proposed FlexB and compare it with existing methods in estimation and testing of the treatment effect and the inter-study heterogeneity. Section 5 illustrates the proposed method using two data examples. Section 6 ends the paper with conclusions and discussions. Our method is implemented in R package “metaFlexB” and is available at https://github.com/chriszhangm/metaFlexB.

2. FLEXB: A BAYESIAN HIERARCHICAL APPROACH

2.1. The flexible REM

Let I be the number of independent studies involved in a meta-analysis, x_i1 (x_i2) be the number of rare events out of n_i1 (n_i2) cases in the control (treatment) group of the ith study. Each case in the control (treatment) group has probability p_i1 (p_i2) of having the event of interest. Let ϕ_i1 (ϕ_i2) denote the logit-transformed p_i1 (p_i2), namely $ϕ_{i j} \equiv ln \frac{p_{i j}}{1 - p_{i j}}$ ; and let θ_i be the treatment effect for the ith study on the log-odds scale, where $θ_{i} \equiv ϕ_{i 2} - ϕ_{i 1} = ln [\frac{p_{i 2}}{1 - p_{i 2}} / \frac{p_{i 1}}{1 - p_{i 1}}]$ . Then ϕ_i1 (ϕ_i2) can be expressed as a linear combination of random components θ_i and μ_i, each following a Gaussian distribution with all μ_is and θ_is assumed to be mutually independent. Li and Wang¹⁴ formulated the following flexible binomial-normal hierarchical random-effects model:

x_{i 1} ~ Bin (n_{i 1}, p_{i 1}), x_{i 2} ~ Bin (n_{i 2}, p_{i 2}), logit (p_{i 1}) = ϕ_{i 1}, logit (p_{i 2}) = ϕ_{i 2}, where ϕ_{i 1} = μ_{i} - ω θ_{i}, ϕ_{i 2} = μ_{i} + (1 - ω) θ_{i}, μ_{i} ~ N (μ_{0}, σ^{2}), θ_{i} ~ N (θ_{0}, τ^{2})

(1)

where the unknown parameter ω can be a constant in the interval [0, 1], introduced to control the direction of variability in the two groups. That is, for τ² > 0, if ω ∈ [0, 0.5), then the variance of ϕ_i2 is greater than that of ϕ_i1; if ω = 0.5, the variance of ϕ_i2 is equal to that of ϕ_i1; and if ω ∈ (0.5, 1], the variance of ϕ_i2 is smaller than that of ϕ_i1. More specifically, when ω = 0, (1) would reduce to the model used in Bhaumik et al.¹² and Bai et al.¹⁷ with

ϕ_{i 1} = μ_{i}, ϕ_{i 2} = μ_{i} + θ_{i},

where μ_i becomes the baseline risk in the control group; when ω = 0.5, (1) yields the model in Smith et al.¹⁶ and Günhan et al.¹⁸ with

ϕ_{i 1} = μ_{i} - \frac{1}{2} θ_{i}, ϕ_{i 2} = μ_{i} + \frac{1}{2} θ_{i},

which assumes equal variability in both groups. For τ² = 0, the variance of ϕ_i2 is the same as that of ϕ_i1, which equals σ², regardless of the value of ω.

We adopt a graphical approach to visualize the model (1),²⁶ where each node denotes either observed data or model parameters (Figure 1). Double rectangles denote fixed constants (n_i1, n_i2) that are determined before each study i; single rectangles represent observed event counts (x_i1, x_i2); circles denote unknown model parameters at various levels, including the probabilities of the rare event in the control and treatment groups p_i1 and p_i2, their logits ϕ_i1 and ϕ_i2, (μ_i, θ_i, ω) used to linearly model the logits, and their distributional parameters (μ₀, σ², θ₀, τ²) as well as (λ_i1, λ_i2, κ_i) that are introduced for computational ease; dashed rectangles represent the sampling process of using a data-augmentation technique that will be described in Section 2.3.

Graphical model for the flexible random-effects model; green color indicates those quantities introduced by the Pólya-Gamma data-augmentation technique; pink color indicates key parameters in previous REMs; yellow color indicates the newly added key parameter to reflect the variability direction in treatment and control groups.

For the model depicted in Figure 1, we marginalize over study-specific parameters μ_is and θ_is, and consider the set of parameters Θ = {ϕ₁, ϕ₂, μ₀, θ₀, σ², τ², ω} in our Bayesian analysis, where ϕ₁ = {ϕ₁₁, …, ϕ_I1} and ϕ₂ = {ϕ₁₂, …, ϕ_I2}. Let $X = {x_{i 1}, x_{i 2}}_{i = 1}^{I}$ denote the data we observe. Then the joint distribution of (X, Θ) can be written as

p (X, Θ) = \prod_{i = 1}^{I} p (x_{i 1}, x_{i 2} ∣ ϕ_{i 1}, ϕ_{i 2}) p (ϕ_{i 1}, ϕ_{i 2} ∣ μ_{0}, θ_{0}, σ^{2}, τ^{2}, ω) p (μ_{0}, θ_{0}, σ^{2}, τ^{2}, ω),

(2)

where

p (x_{i 1}, x_{i 2} ∣ ϕ_{i 1}, ϕ_{i 2}) = (\begin{matrix} n_{i 1} \\ x_{i 1} \end{matrix}) (\begin{matrix} n_{i 2} \\ x_{i 2} \end{matrix}) \frac{{(e^{ϕ_{i 1}})}^{x_{i 1}}}{{(1 + e^{ϕ_{i 1}})}^{n_{i 1}}} \frac{{(e^{ϕ_{i 2}})}^{x_{i 2}}}{{(1 + e^{ϕ_{i 2}})}^{n_{i 2}}}

and

ϕ_{i 1}, ϕ_{i 2} ∣ μ_{0}, θ_{0}, σ^{2}, τ^{2}, ω ~ N ((\begin{matrix} μ_{0} - ω θ_{0} \\ μ_{0} + (1 - ω) θ_{0} \end{matrix}), (\begin{matrix} σ_{ϕ . 11}^{2} & σ_{ϕ . 12}^{2} \\ σ_{ϕ . 21}^{2} & σ_{ϕ . 22}^{2} \end{matrix}))

(3)

Here, $σ_{ϕ . 11}^{2} = σ^{2} + ω^{2} τ^{2}$ , $σ_{ϕ . 22}^{2} = σ^{2} + {(1 - ω)}^{2} τ^{2}$ , $σ_{ϕ . 12}^{2} = σ^{2} - ω (1 - ω) τ^{2}$ .
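The role of ω in (3) can be checked numerically. The following is a minimal sketch, not part of the authors' implementation: the function name `phi_covariance` is hypothetical, and the values σ² = 0.5 and τ² = 0.8 are borrowed from the simulation settings in Section 4 purely for illustration.

```python
def phi_covariance(sigma2, tau2, omega):
    """Covariance matrix of (phi_i1, phi_i2) implied by model (1), as in (3)."""
    var1 = sigma2 + omega ** 2 * tau2              # Var(phi_i1)
    var2 = sigma2 + (1.0 - omega) ** 2 * tau2      # Var(phi_i2)
    cov12 = sigma2 - omega * (1.0 - omega) * tau2  # Cov(phi_i1, phi_i2)
    return [[var1, cov12], [cov12, var2]]


# Compare the two group variances for the three values of omega.
for omega in (0.0, 0.5, 1.0):
    cov = phi_covariance(sigma2=0.5, tau2=0.8, omega=omega)
    print(omega, cov[0][0], cov[1][1], cov[0][1])
```

For ω = 0 the treatment-group variance exceeds the control-group variance, for ω = 0.5 the two coincide, and for ω = 1 the ordering is reversed, matching the description following (1).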

2.2. Prior specification

As usual, the hyper-parameters in the top layer of the graphical model in Figure 1 are assumed to be a priori independent; that is, p(μ₀, θ₀, σ², τ², ω) = p(μ₀) p(θ₀) p(σ²) p(τ²) p(ω). We consider non-informative uniform priors for μ₀ and θ₀: μ₀ ~ U(L_μ, U_μ), θ₀ ~ U(L_θ, U_θ), where upper and lower bounds contain virtually all plausible values of μ₀ and θ₀. To define these ranges, we first use the RE model used in Bhaumik et al.¹² to get rough estimates ${{\hat{μ}}_{1}, {\hat{μ}}_{2}, \dots, {\hat{μ}}_{I}}$ and ${{\hat{θ}}_{1}, {\hat{θ}}_{2}, \dots, {\hat{θ}}_{I}}$ for all I studies, where ${\hat{μ}}_{i} = ln \frac{x_{i 1} + 0.5}{n_{i 1} - x_{i 1} + 0.5}$ , ${\hat{θ}}_{i} = ln \frac{x_{i 2} + 0.5}{n_{i 2} - x_{i 2} + 0.5} - ln \frac{x_{i 1} + 0.5}{n_{i 1} - x_{i 1} + 0.5}$ . Next, we define $L_{μ} = min {{\hat{μ}}_{1}, {\hat{μ}}_{2}, \dots, {\hat{μ}}_{I}} - c$ , $U_{μ} = max {{\hat{μ}}_{1}, {\hat{μ}}_{2}, \dots, {\hat{μ}}_{I}} + c$ , $L_{θ} = min {{\hat{θ}}_{1}, {\hat{θ}}_{2}, \dots, {\hat{θ}}_{I}} - c$ , and $U_{θ} = max {{\hat{θ}}_{1}, {\hat{θ}}_{2}, \dots, {\hat{θ}}_{I}} + c$ , where we assign c = 5 as in Bai et al.¹⁷ so that the priors for μ₀ and θ₀ are very conservative. Next, the conditional conjugate prior IG (a, b), an inverse-gamma distribution, is considered for both σ² and τ², where we assign a small value such as 0.01 to a and b, to reflect our lack of information about the variance terms.
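The construction of the uniform prior bounds can be sketched in a few lines. This is an illustrative sketch only: the function name `uniform_prior_bounds` and the toy counts are hypothetical, while the continuity-corrected estimates and the default c = 5 follow the definitions above.

```python
import math


def uniform_prior_bounds(x1, n1, x2, n2, c=5.0):
    """Return ((L_mu, U_mu), (L_theta, U_theta)) from per-study rough estimates.

    x1, n1: events and sample sizes in the control groups.
    x2, n2: events and sample sizes in the treatment groups.
    """
    # Continuity-corrected log odds in each arm.
    mu_hat = [math.log((x + 0.5) / (n - x + 0.5)) for x, n in zip(x1, n1)]
    trt_logit = [math.log((x + 0.5) / (n - x + 0.5)) for x, n in zip(x2, n2)]
    # Rough study-level log odds ratios.
    theta_hat = [t - m for t, m in zip(trt_logit, mu_hat)]
    mu_bounds = (min(mu_hat) - c, max(mu_hat) + c)
    theta_bounds = (min(theta_hat) - c, max(theta_hat) + c)
    return mu_bounds, theta_bounds


# Toy data with zero-event arms (hypothetical values).
mu_bounds, theta_bounds = uniform_prior_bounds(
    x1=[0, 1, 2], n1=[100, 200, 150], x2=[1, 0, 3], n2=[100, 200, 150]
)
print(mu_bounds, theta_bounds)
```

Because 0.5 is added to every cell, the rough estimates remain finite for studies with zero events, so the bounds are always well defined.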

As to the variability direction parameter ω, we consider a discrete uniform prior distribution defined on the sample space Ω = {0, 0.5, 1}, i.e. P (ω = d) = 1∕3 where d ∈ Ω. The reason for using this discrete uniform distribution instead of a continuous uniform distribution over the interval [0, 1] is two-fold. Firstly, this discrete prior builds a strong connection to well-established models in the literature, increasing the interpretability of our analysis results. As mentioned in Section 2.1, ω = 0 corresponds to the model considered in Bhaumik et al. and Bai et al.,^12,17 while ω = 0.5 corresponds to the model in Smith et al. and Günhan et al.^16,18 Using the flexible model in (1), combined with this prior, we can rely on a data-driven approach to determine which (if any) of the two models fits the data well. Secondly, as will be shown in Section S5 in Supplementary Material, using the discrete prior leads to similar results to those from using the continuous prior, even when data are actually generated with ω from U(0, 1).

Our vague and diffuse priors above, combined with the assumed a priori independence, lead to the following joint prior distribution, given by

p (μ_{0}, θ_{0}, σ^{2}, τ^{2}, ω) \propto 𝟙_{(L_{μ}, U_{μ})} (μ_{0}) 𝟙_{(L_{θ}, U_{θ})} (θ_{0}) {(σ^{2})}^{- a - 1} exp (- \frac{b}{σ^{2}}) {(τ^{2})}^{- a - 1} exp (- \frac{b}{τ^{2}}) .

(4)

2.3. Gibbs sampling based on Pólya-Gamma data-augmentation

Our posterior computation employs Gibbs sampling to obtain posterior samples, where, to our knowledge, the Pólya-Gamma method²² is adapted to the context of meta-analysis of (rare) binary events for the first time.

The joint posterior distribution is given by p(Θ|X) ∝ p (X, Θ), where Θ = {ϕ₁, ϕ₂, σ², τ², ω, μ₀, θ₀}. To simplify the notations, we use Θ∕θ to denote the parameter set that includes all the parameters except for θ. It is easy to verify that the full conditional posterior distributions derived from p(Θ|X) for the global mean and variance terms are all known distributions (i.e., truncated normal or inverse gamma), shown below:

p (μ_{0} ∣ X, Θ / μ_{0}) \propto N (\frac{(1 - ω) \sum_{i = 1}^{I} ϕ_{i 1} + ω \sum_{i = 1}^{I} ϕ_{i 2}}{I}, \frac{σ^{2}}{I (2 ω (1 - ω) + 1)}) 𝟙_{(L_{μ}, U_{μ})} (μ_{0}),

p (θ_{0} ∣ X, Θ / θ_{0}) \propto N (\frac{\sum_{i = 1}^{I} ϕ_{i 2} - \sum_{i = 1}^{I} ϕ_{i 1}}{I}, \frac{τ^{2}}{I}) 𝟙_{(L_{θ}, U_{θ})} (θ_{0}),

p (σ^{2} ∣ X, Θ / σ^{2}) \propto IG (a + \frac{I}{2}, b + \frac{\sum_{i = 1}^{I} {(ω (ϕ_{i 2} - ϕ_{i 1}) + ϕ_{i 1} - μ_{0})}^{2}}{2}),

p (τ^{2} ∣ X, Θ / τ^{2}) \propto IG (a + \frac{I}{2}, b + \frac{\sum_{i = 1}^{I} {(ϕ_{i 1} - ϕ_{i 2} + θ_{0})}^{2}}{2}) .

To sample p(ϕ_i1, ϕ_i2|X, Θ∕{ϕ_i1, ϕ_i2}) for each study i, Bai et al.¹⁷ utilized the rejection sampling algorithm by specifying N(μ₀, σ²) and N(θ₀, τ²) as the proposal distributions for μ_i and θ_i, respectively. However, this process is unsatisfactory in both computational efficiency and estimation stability, as will be shown later. Instead, we use a data-augmentation strategy based on Pólya-Gamma latent variables, proposed for logistic models by Polson et al.²² This strategy avoids the need for rejection/Metropolis–Hastings sampling, numerical integration, or analytical approximation, provided one can easily sample from a Pólya-Gamma distribution. Polson et al.²² further developed an efficient sampler for the Pólya-Gamma distribution and showed the effectiveness of the Pólya-Gamma method in various scenarios.

Let λ_i1, λ_i2 be the auxiliary parameters with Pólya-Gamma (PG) distributions, namely

p (λ_{i 1} ∣ ϕ_{i 1}, X) ~ PG (n_{i 1}, ϕ_{i 1})

p (λ_{i 2} ∣ ϕ_{i 2}, X) ~ PG (n_{i 2}, ϕ_{i 2})

and let $Λ_{i} = (\begin{matrix} λ_{i 1} & 0 \\ 0 & λ_{i 2} \end{matrix})$ . Then we can derive the following conditional posterior for (ϕ_i1, ϕ_i2), given the introduced Λ_i and other parameters:

p (ϕ_{i 1}, ϕ_{i 2} ∣ μ_{0}, θ_{0}, σ^{2}, τ^{2}, ω, Λ_{i}, X) ~ N (m_{i}, V_{Λ_{i}}),

where

V_{Λ_{i}}^{- 1} = Λ_{i} + {(\begin{matrix} σ_{ϕ . 11}^{2} & σ_{ϕ . 12}^{2} \\ σ_{ϕ . 21}^{2} & σ_{ϕ . 22}^{2} \end{matrix})}^{- 1},

m_{i} = V_{Λ_{i}} (κ_{i} + {(\begin{matrix} σ_{ϕ . 11}^{2} & σ_{ϕ . 12}^{2} \\ σ_{ϕ . 21}^{2} & σ_{ϕ . 22}^{2} \end{matrix})}^{- 1} (\begin{matrix} μ_{0} - ω θ_{0} \\ μ_{0} + (1 - ω) θ_{0} \end{matrix})),

κ_{i} = {(x_{i 1} - \frac{n_{i 1}}{2}, x_{i 2} - \frac{n_{i 2}}{2})}^{T} .

We note that the Pólya-Gamma mixture framework introduced above requires no tuning and is easy to implement.
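To make the conditional update for (ϕ_i1, ϕ_i2) concrete, the sketch below computes m_i and V_{Λ_i} for one study from the formulas above. It is an illustration rather than the authors' Rcpp code: the function names `inv2` and `phi_conditional` are hypothetical, and the Pólya-Gamma draws λ_i1 and λ_i2 are taken as inputs because drawing them requires a dedicated sampler that is not shown here.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def phi_conditional(x1, n1, x2, n2, lam1, lam2,
                    mu0, theta0, sigma2, tau2, omega):
    """Mean m_i and covariance V_i of the Gaussian conditional of (phi_i1, phi_i2)."""
    cov12 = sigma2 - omega * (1.0 - omega) * tau2
    prior_cov = [[sigma2 + omega ** 2 * tau2, cov12],
                 [cov12, sigma2 + (1.0 - omega) ** 2 * tau2]]
    prior_prec = inv2(prior_cov)
    prior_mean = [mu0 - omega * theta0, mu0 + (1.0 - omega) * theta0]
    kappa = [x1 - n1 / 2.0, x2 - n2 / 2.0]
    # V^{-1} = Lambda_i + Sigma_phi^{-1}, with Lambda_i = diag(lam1, lam2).
    post_prec = [[prior_prec[0][0] + lam1, prior_prec[0][1]],
                 [prior_prec[1][0], prior_prec[1][1] + lam2]]
    V = inv2(post_prec)
    # m = V (kappa + Sigma_phi^{-1} prior_mean).
    rhs = [kappa[k]
           + prior_prec[k][0] * prior_mean[0]
           + prior_prec[k][1] * prior_mean[1] for k in range(2)]
    m = [V[k][0] * rhs[0] + V[k][1] * rhs[1] for k in range(2)]
    return m, V


# Hypothetical study with given Polya-Gamma draws.
m, V = phi_conditional(x1=1, n1=300, x2=3, n2=280, lam1=2.0, lam2=5.0,
                       mu0=-5.0, theta0=0.2, sigma2=0.5, tau2=0.8, omega=0.5)
print(m, V)
```

Given m_i and V_{Λ_i}, a draw of (ϕ_i1, ϕ_i2) is a standard bivariate normal draw, so no tuning parameter enters this step.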

Lastly, for every iteration, we update the variance direction parameter ω by calculating the discrete posterior probabilities:

p (ω = d ∣ X, Θ / ω) \propto \prod_{i = 1}^{I} p (ϕ_{i 1}, ϕ_{i 2} ∣ μ_{0}, θ_{0}, σ^{2}, τ^{2}, ω = d), for d \in \{0, 0.5, 1\},

(5)

which is proportional to the density function of the bivariate normal distribution in (3). Thus, ω can be sampled based on the normalized updated probabilities in (5).
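The update of ω can be sketched as follows. This is a minimal illustration under stated assumptions: the names `log_bvn_density`, `omega_posterior_probs` and `sample_omega` are hypothetical, the bivariate normal densities in (3) are multiplied over all I studies because ω is shared across studies, and the discrete uniform prior contributes only a constant. Log-densities are used to avoid numerical underflow.

```python
import math
import random

OMEGA_SUPPORT = (0.0, 0.5, 1.0)


def log_bvn_density(phi1, phi2, mu0, theta0, sigma2, tau2, omega):
    """Log density of (phi_i1, phi_i2) under the bivariate normal in (3)."""
    v11 = sigma2 + omega ** 2 * tau2
    v22 = sigma2 + (1.0 - omega) ** 2 * tau2
    v12 = sigma2 - omega * (1.0 - omega) * tau2
    det = v11 * v22 - v12 ** 2
    d1 = phi1 - (mu0 - omega * theta0)
    d2 = phi2 - (mu0 + (1.0 - omega) * theta0)
    quad = (v22 * d1 * d1 - 2.0 * v12 * d1 * d2 + v11 * d2 * d2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def omega_posterior_probs(phi1, phi2, mu0, theta0, sigma2, tau2):
    """Normalized conditional probabilities of omega over {0, 0.5, 1}."""
    logw = [sum(log_bvn_density(a, b, mu0, theta0, sigma2, tau2, d)
                for a, b in zip(phi1, phi2))
            for d in OMEGA_SUPPORT]
    top = max(logw)  # subtract the maximum before exponentiating
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]


def sample_omega(probs, rng=random):
    """Draw omega from its discrete conditional distribution."""
    return rng.choices(OMEGA_SUPPORT, weights=probs, k=1)[0]


# Hypothetical current values of the logits for four studies.
phi1 = [-5.0, -5.0, -5.0, -5.0]
phi2 = [-6.0, -5.5, -4.5, -4.0]
probs = omega_posterior_probs(phi1, phi2, mu0=-5.0, theta0=0.0,
                              sigma2=0.5, tau2=0.8)
print(probs, sample_omega(probs, random.Random(1)))
```

In this toy input the control-group logits do not vary while the treatment-group logits do, so most of the conditional probability falls on ω = 0.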

After a burn-in period to achieve convergence, we use posterior sample means, ${\hat{θ}}_{0}$ and ${\hat{μ}}_{0}$ , to estimate θ₀ and μ₀. As in Bai et al.,¹⁷ we use posterior medians, ${\hat{τ}}^{2}$ and ${\hat{σ}}^{2}$ , to estimate τ² and σ² because their (marginal) posterior distributions are heavily skewed. Finally, we use the posterior mode to estimate ω due to its discrete posterior.

3. HYPOTHESIS TESTING VIA MODEL SELECTION

We now pivot the discussion to hypothesis testing and outline how this is implemented in our FlexB approach. Under the Bayesian paradigm, hypothesis testing can be viewed as an analogue to model selection. Here, we consider three different approaches to model selection using (a) Akaike information criterion (AIC), (b) Bayesian information criterion (BIC), and (c) deviance information criterion (DIC). Based on the flexible BN model (1), we focus on simultaneous testing of the overall treatment effect θ₀, the heterogeneity parameter τ², and the variance direction parameter ω, which corresponds to selection between the following eight candidate models:

ℳ_{1} : θ_{0} = 0, τ^{2} = 0,

ℳ_{2} - ℳ_{4} : θ_{0} = 0, τ^{2} \neq 0, ω = 0, 0.5, 1

ℳ_{5} : θ_{0} \neq 0, τ^{2} = 0, ω = 0, or 0.5 or 1

ℳ_{6} - ℳ_{8} : θ_{0} \neq 0, τ^{2} \neq 0, ω = 0, 0.5, 1.

where for ℳ₁, ω is no longer needed as θ_i ≡ 0; for ℳ₅ : θ₀ ≠ 0, τ² = 0, ω = 0, 0.5 and 1 lead to three non-identifiable models, so they are indeed one model with different parameterization. This simultaneous testing approach not only allows us to examine the existence of the overall treatment effect and/or between-study heterogeneity, but also indicates whether the popular existing BN models considered in the literature^12,16,18 fit the data and if so, which model.

Each of ℳ₁− ℳ₈ is a special case of (1). In what follows, we detail how each information criterion is computed for each candidate model.

3.1. BIC

Our FlexB model (1) can be written in the form of the following generalized linear mixed model (GLMM):

x_{i j} ~ Bin (n_{i j}, p_{i j}),

logit (p_{i j}) = μ_{i} + (j - ω - 1) θ_{i},

μ_{i} ~ N (μ_{0}, σ^{2}),

θ_{i} ~ N (θ_{0}, τ^{2}),

where j = 1 for the control group and j = 2 for the treatment group. Let $μ = {(μ_{i})}_{i = 1}^{I}$ and $θ = {(θ_{i})}_{i = 1}^{I}$ . Then the marginal likelihood ℒ for the above GLMM, as a function of (μ₀, θ₀, σ², τ², ω), is given by

ℒ = p {X = x ∣ μ_{0}, θ_{0}, σ^{2}, τ^{2}, ω} = \int \dots \int p {X = x ∣ μ, θ, ω} p (μ ∣ μ_{0}, σ^{2}) p (θ ∣ θ_{0}, τ^{2}) d μ d θ = \prod_{i = 1}^{I} \iint \prod_{j = 1}^{2} p {X_{i j} = x_{i j} ∣ μ_{i}, θ_{i}, ω} p (μ_{i} ∣ μ_{0}, σ^{2}) p (θ_{i} ∣ θ_{0}, τ^{2}) d μ_{i} d θ_{i} = \prod_{i = 1}^{I} \iint (\begin{matrix} n_{i 1} \\ x_{i 1} \end{matrix}) (\begin{matrix} n_{i 2} \\ x_{i 2} \end{matrix}) \frac{{(e^{μ_{i} - ω θ_{i}})}^{x_{i 1}}}{{(1 + e^{μ_{i} - ω θ_{i}})}^{n_{i 1}}} \frac{{(e^{μ_{i} + (1 - ω) θ_{i}})}^{x_{i 2}}}{{(1 + e^{μ_{i} + (1 - ω) θ_{i}})}^{n_{i 2}}} \times \frac{1}{2 π σ τ} e^{- \frac{1}{2} (\frac{{(μ_{i} - μ_{0})}^{2}}{σ^{2}} + \frac{{(θ_{i} - θ_{0})}^{2}}{τ^{2}})} d μ_{i} d θ_{i} .

(6)

However, there is no closed-form solution to the integration in the likelihood function ℒ, which can only be evaluated numerically. For instance, Laplace approximation is recommended by Wolfinger²⁷ and McCullagh and Nelder²⁸ to solve this issue. Then by maximizing the approximate likelihood, we obtain $\hat{ℒ}$ , and BIC can be defined as $BIC = - 2 ln \hat{ℒ} + k ln (I)$ , where k is the number of estimated parameters and I is the number of component studies. Clearly, ℒ would be defined differently for candidate models ℳ₁ − ℳ₈. For example, for ℳ₂: θ₀ = 0, τ² ≠ 0, ω = 0, the likelihood in (6) becomes $ℒ_{ℳ_{2}} = p {X = x ∣ μ_{0}, θ_{0} = 0, σ^{2}, τ^{2}, ω = 0}$ , and ${\hat{ℒ}}_{ℳ_{2}}$ will be the (approximate) likelihood evaluated at the maximum likelihood estimates of {μ₀, σ², τ²}. Then, BIC for ℳ₂ can be defined as ${BIC}_{ℳ_{2}} = - 2 ln {\hat{ℒ}}_{ℳ_{2}} + 3 ln (I)$ . After getting all BIC values for ℳ₁ − ℳ₈, the model with the smallest value is deemed to be the best.
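Once the maximized (approximate) log-likelihoods are available, the BIC comparison itself is simple arithmetic. The sketch below assumes those log-likelihoods have already been obtained elsewhere, for example by Laplace approximation; the function names `bic` and `select_by_bic` are hypothetical, the log-likelihood values are made-up numbers used only to show the mechanics, and the parameter count for ℳ₆ is our own assumption (only k = 3 for ℳ₂ is stated above).

```python
import math


def bic(max_log_lik, k, n_studies):
    """BIC = -2 ln(L_hat) + k ln(I)."""
    return -2.0 * max_log_lik + k * math.log(n_studies)


def select_by_bic(candidates, n_studies):
    """candidates maps a model name to (maximized log-likelihood, number of parameters)."""
    scores = {name: bic(ll, k, n_studies) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)  # the smallest BIC is preferred
    return best, scores


# Illustrative values only; these are not results from the paper.
candidates = {
    "M2": (-100.0, 3),  # mu0, sigma^2, tau^2 with theta0 = 0, omega = 0
    "M6": (-99.5, 4),   # assumed count: mu0, theta0, sigma^2, tau^2 with omega = 0
}
best, scores = select_by_bic(candidates, n_studies=20)
print(best, scores)
```

Here the extra parameter in the larger model does not improve the log-likelihood enough to offset the ln(I) penalty, so the smaller model is selected.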

3.2. DIC

Spiegelhalter et al.²⁴ proposed DIC to select the best model from the perspective of prediction given the observed data. For model g, the Bayesian deviance is defined as D (β) = −2ln (p(X|β, ℳ_g)) + 2ln (p (X)), where β is a set of parameters of interest, X is the data, and p (X) is a standardizing term that only relates to the observed data. The posterior mean of the deviance $\bar{D} (β)$ and the effective number of parameters p_D are written as $\bar{D} (β) = E_{β} (D (β))$ and $p_{D} = \bar{D} (β) - D (\bar{β})$ , respectively. Then, $D I C = p_{D} + \bar{D} (β)$ , and the model with the smallest DIC will be selected as the best one.
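The DIC arithmetic can be sketched from MCMC output as follows. This is a minimal illustration: the function name `dic_from_samples` is hypothetical, and it assumes that the deviance has already been evaluated at each posterior draw and at the posterior mean of β. Since the standardizing term 2 ln p(X) is identical for all candidate models fitted to the same data, it can be dropped when models are compared.

```python
def dic_from_samples(deviance_samples, deviance_at_posterior_mean):
    """Compute D_bar, p_D and DIC from posterior deviance evaluations.

    deviance_samples: D(beta) evaluated at each retained posterior draw.
    deviance_at_posterior_mean: D(beta_bar), the deviance at the posterior mean.
    """
    d_bar = sum(deviance_samples) / len(deviance_samples)  # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean               # effective number of parameters
    return {"D_bar": d_bar, "p_D": p_d, "DIC": p_d + d_bar}


# Toy deviance values (hypothetical).
print(dic_from_samples([10.0, 12.0, 14.0], deviance_at_posterior_mean=11.0))
```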

Since we adopt a hierarchical framework, as shown in Figure 1, the likelihood function p(X|β, ℳ_g) can be defined differently according to the specific definition of β²⁴. For example, under ℳ₂: θ₀ = 0, τ² ≠ 0, ω = 0, the marginal distribution of X can be written as

p (X ∣ ℳ_{2}) = \iint p (X ∣ ϕ_{1}, ϕ_{2}, ℳ_{2}) p (ϕ_{1}, ϕ_{2} ∣ ℳ_{2}) d ϕ_{1} d ϕ_{2}

or alternatively

p (X ∣ ℳ_{2}) = \int p (X ∣ ψ, ℳ_{2}) p (ψ ∣ ℳ_{2}) d ψ,

where ψ = {μ₀, θ₀, σ², τ², ω}. Therefore, the likelihood p (X|β, ℳ_g) in DIC can be either p (X|ϕ₁, ϕ₂, ℳ₂) or p (X|ψ, ℳ₂), where β can be (ϕ₁, ϕ₂) or ψ accordingly, and

p (X ∣ ϕ_{1}, ϕ_{2}, ℳ_{2}) = \prod_{i = 1}^{I} (\begin{matrix} n_{i 1} \\ x_{i 1} \end{matrix}) (\begin{matrix} n_{i 2} \\ x_{i 2} \end{matrix}) \frac{{(e^{ϕ_{i 1}})}^{x_{i 1}}}{{(1 + e^{ϕ_{i 1}})}^{n_{i 1}}} \frac{{(e^{ϕ_{i 2}})}^{x_{i 2}}}{{(1 + e^{ϕ_{i 2}})}^{n_{i 2}}},

p (X ∣ ψ, ℳ_{2}) = \prod_{i = 1}^{I} \iint (\begin{matrix} n_{i 1} \\ x_{i 1} \end{matrix}) (\begin{matrix} n_{i 2} \\ x_{i 2} \end{matrix}) \frac{{(e^{μ_{i}})}^{x_{i 1}}}{{(1 + e^{μ_{i}})}^{n_{i 1}}} \frac{{(e^{μ_{i} + θ_{i}})}^{x_{i 2}}}{{(1 + e^{μ_{i} + θ_{i}})}^{n_{i 2}}} \times \frac{1}{2 π σ τ} e^{- \frac{1}{2} (\frac{{(μ_{i} - μ_{0})}^{2}}{σ^{2}} + \frac{θ_{i}^{2}}{τ^{2}})} d μ_{i} d θ_{i} .

In this paper, we use p(X|ψ, ℳ_g) to compute DIC for model g since we focus on inference about θ₀ and τ². Our choice is also recommended by Bai et al.,¹⁷ who found better performance when using p(X|ψ, ℳ_g) rather than p(X|ϕ₁, ϕ₂, ℳ_g).

4. SIMULATION

We conduct simulation studies to evaluate the performance of the proposed FlexB and compare FlexB with other popular methods for meta-analysis of rare binary events. The methods are evaluated in estimating and testing the overall treatment effect θ₀ and heterogeneity parameter τ², using four metrics: bias, mean squared error (MSE), coverage probability, and interval width. Specifically, for a parameter of interest (say β, where β is either θ₀ or τ² here), we define $Bias (β) \equiv \frac{\sum_{j = 1}^{J} ({\hat{β}}_{j} - β)}{J}$ and $MSE (β) \equiv \frac{\sum_{j = 1}^{J} {({\hat{β}}_{j} - β)}^{2}}{J}$ for each method, where ${\hat{β}}_{j}$ is the corresponding estimate of β at the j^th replication among J total replicates; and the coverage probability and interval width are computed using 95% confidence intervals (for frequentist methods) or equal-tail credible intervals (for Bayesian methods). Throughout our simulation and real data analyses later in Section 5, we apply the default continuity correction factor 0.5 to all frequentist methods unless otherwise specified. By contrast, for Bayesian methods considered, we do not apply any continuity correction or remove studies with zero events as they can handle such studies automatically via incorporation of prior information into data analysis.
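The four metrics can be written down directly from these definitions. The sketch below is illustrative: the function names are hypothetical, and averaging the interval widths over replicates is our assumption about how the reported width is summarized.

```python
def bias(estimates, truth):
    """Average of (estimate - truth) over replicates."""
    return sum(e - truth for e in estimates) / len(estimates)


def mse(estimates, truth):
    """Average of squared (estimate - truth) over replicates."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def coverage_probability(intervals, truth):
    """Fraction of (lower, upper) intervals that contain the true value."""
    return sum(lo <= truth <= hi for lo, hi in intervals) / len(intervals)


def mean_interval_width(intervals):
    """Average width of the (lower, upper) intervals (assumed summary)."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


# Toy replicates for a parameter whose true value is 1.0 (hypothetical values).
estimates = [0.9, 1.1, 1.3]
intervals = [(0.5, 1.5), (1.1, 1.9)]
print(bias(estimates, 1.0), mse(estimates, 1.0))
print(coverage_probability(intervals, 1.0), mean_interval_width(intervals))
```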

4.1. Performance on estimation of θ₀

We focus on assessing the estimates of θ₀ by varying (a) the value of θ₀ and (b) the value of μ₀ in the BN model (1) under various simulated scenarios. Let I = 10, 20, 50, 80 represent four different sizes of meta-analysis (we defer a discussion about the choice of I to Section 6); let ω = 0, 0.5, 1 represent the scenario that the variability in the treatment group is larger than, equal to, or smaller than that in the control group, respectively. For (a), we set θ₀ ∈ {−1, −0.8, …, 1}, μ₀ = −5, σ² = 0.5, and τ² = 0.8, which correspond to (very) rare binary events in general. For (b), we let μ₀ ∈ {−5, −4.5, …, 0}, θ₀ = 0, σ² = 0.5 and τ² = 0.8 to evaluate the performance of the estimates for both rare and prevalent binary events. Using each combination of all parameters, we simulate probabilities of the event for both groups, $p_{i 1} = \frac{e^{ϕ_{i 1}}}{1 + e^{ϕ_{i 1}}}$ and $p_{i 2} = \frac{e^{ϕ_{i 2}}}{1 + e^{ϕ_{i 2}}}$ for study i = 1, 2, …, I. Then, for every study, we randomly draw integers from 50 to 1000 as the number of subjects in the treatment group n_i2 and in the control group n_i1. Lastly, for each study i, we randomly draw the number of events in the control group x_i1 and that in the treatment group x_i2 from Bin (n_i1, p_i1) and Bin (n_i2, p_i2), respectively. We simulate J = 300 replicates for each combination of the parameters for (a) and (b), and we report results from the proposed FlexB using the four criteria mentioned above.
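One replicate of this data-generating process can be sketched as follows. The sketch is illustrative and not the authors' simulation code: the function name `simulate_meta_analysis` is hypothetical, the two group sizes are assumed to be drawn independently and uniformly from the integers 50 to 1000, and binomial counts are generated by summing Bernoulli trials so that only the standard library is needed.

```python
import math
import random


def simulate_meta_analysis(n_studies, mu0, theta0, sigma2, tau2, omega, rng):
    """Simulate one meta-analysis data set under model (1)."""
    studies = []
    for _ in range(n_studies):
        mu_i = rng.gauss(mu0, math.sqrt(sigma2))
        theta_i = rng.gauss(theta0, math.sqrt(tau2))
        phi1 = mu_i - omega * theta_i            # control-group logit
        phi2 = mu_i + (1.0 - omega) * theta_i    # treatment-group logit
        p1 = 1.0 / (1.0 + math.exp(-phi1))
        p2 = 1.0 / (1.0 + math.exp(-phi2))
        n1 = rng.randint(50, 1000)               # assumed independent group sizes
        n2 = rng.randint(50, 1000)
        x1 = sum(rng.random() < p1 for _ in range(n1))  # Bin(n1, p1)
        x2 = sum(rng.random() < p2 for _ in range(n2))  # Bin(n2, p2)
        studies.append({"x1": x1, "n1": n1, "x2": x2, "n2": n2})
    return studies


# One replicate of setting (a) with I = 10, theta0 = 0 and omega = 0.5.
data = simulate_meta_analysis(10, mu0=-5.0, theta0=0.0, sigma2=0.5,
                              tau2=0.8, omega=0.5, rng=random.Random(2023))
print(data[:3])
```

With μ₀ = −5 the baseline event probability is below 1%, so zero-event arms appear frequently in the simulated studies.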

For point estimation of the overall treatment effect θ₀, we employ ten existing methods for the purpose of comparison, where the first eight are frequentist methods including Mantel & Haenszel (MH),³ DerSimonian & Laird (DSL),⁵ empirical logit (EL),⁵ GLMM^29,30 based on the generalized linear mixed model in Bhaumik et al.,¹² median unbiased estimator (MUE),³¹ simple (unweighted) average (SA) with the default continuity correction factor 0.5,¹² Shuster, Guo and Skyler’s weighted estimator (SGSwgt),²¹ and simple (unweighted) average with continuity correction 0.25 (SA25) due to its good performance under rare settings reported in Li and Wang¹⁴, and the last two are Bayesian methods including BAYES¹⁷ and HTE-Beta.¹⁹ For interval estimation of θ₀, we compare the proposed FlexB with four other approaches including Hartung-Knapp/Sidik-Jonkman (HKSJ) method^32,33,34 recommended in Weber et al.,³⁵ Wald confidence intervals recommended in Pateras et al.³⁶ using three different heterogeneity estimators: Hartung-Makambi (HM),³⁷ positive Sidik-Jonkman (SJ)³⁴ and improved Paule & Mandel (IPM).¹²

Figure 2 shows bias results for point estimates of the overall treatment effect θ₀ from the eleven methods, as mentioned above, when θ₀ varies along the horizontal axis in each subplot, ω varies across rows (the top row of four subplots for ω = 0, middle row for ω = 0.5, and bottom row for ω = 1), and I varies across columns (left column of three subplots for I = 10, and so on). In general, FlexB has the lowest average bias across all scenarios. The top panel (ω = 0) shows that GLMM, BAYES, and FlexB perform well, yielding nearly unbiased estimates. Among the other methods, MH, DSL, EL, SGSwgt and HTE-Beta always overestimate θ₀, while SA25, SA and MUE only overestimate θ₀ when it is negative. The middle panel (ω = 0.5) reveals that FlexB, MH, SA25, and SGSwgt achieve the best performance, all producing estimates around the true values. Other methods such as MUE, HTE-Beta, EL, SA and DSL overestimate θ₀ when θ₀ < 0 but underestimate θ₀ when θ₀ > 0. BAYES and GLMM consistently underestimate θ₀, and the fluctuation of the BAYES estimates in the (I = 20, ω = 0.5) plot indicates occasional instability of the algorithm. In the bottom panel (ω = 1), FlexB often surpasses other procedures, yielding almost unbiased estimates for all θ₀ values. SA25, SA and MUE are slightly worse than FlexB as they give biased estimates when θ₀ > 0. Among other approaches that underestimate θ₀ consistently, DSL has the least bias except for θ₀ values close to 1. We also observe that, for all panels, the bias of FlexB estimates tends to decrease as the number of studies I increases.

Bias comparison of point estimates of θ₀ by Mantel & Haenszel (MH), DerSimonian & Laird (DSL), empirical logit (EL), generalized linear mixed model estimator (GLMM), median unbiased estimator (MUE), simple (unweighted) average (SA) with the default continuity correction factor 0.5, Shuster, Guo and Skyler’s weighted estimator (SGSwgt), simple (unweighted) average with continuity correction 0.25 (SA25), BAYES, HTE-Beta (HTE), and proposed FlexB for different values of θ₀, ω, and I. Settings: n_i1,n_i2 ~ U (50, 1000), μ₀ = −5, θ₀ = {−1, −0.9, −0.8, …, 1}, σ² = 0.5, τ² = 0.8.

Figure S1 of Supplementary Material shows MSE results for estimating θ₀ by the different methods. In general, FlexB works well for I = 50 and 80 and also has reasonable MSE for I = 10 and 20. Among other approaches, the top panel (ω = 0) shows that SA, SA25, MUE, BAYES and GLMM perform well while EL, DSL, MH, SGSwgt and HTE-Beta report relatively large MSE. In the bottom panel (ω = 1), SA, SA25 and MUE still perform well, but GLMM and BAYES show large MSE even with large I; as I increases, FlexB reports smaller MSE and its ranking rises from the middle to the very top (e.g., for I = 50 and 80, it is the best for almost all θ₀ values). All methods behave more similarly in the middle panel (ω = 0.5), where the difference in their performance becomes smaller as I gets larger.
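The bias and MSE comparisons above can be reproduced from replicate-level estimates with a few lines of code. The sketch below uses the standard Monte Carlo definitions (mean error and mean squared error over replicates), which we assume match those used for the figures; the helper name `bias_and_mse` and the five replicate estimates are hypothetical.

```python
def bias_and_mse(estimates, true_value):
    """Monte Carlo bias and mean squared error over simulation replicates."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, mse

# Hypothetical estimates of theta_0 from five replicates with true value 0.5.
estimates = [0.42, 0.55, 0.61, 0.38, 0.49]
bias, mse = bias_and_mse(estimates, true_value=0.5)
```

Because MSE combines squared bias and variance, a method can be nearly unbiased yet still show a large MSE when its estimates are unstable across replicates.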

Figure 3 shows bias results of all methods by varying μ₀ while fixing θ₀ at zero. We find that four methods, FlexB, SA, SA25 and MUE, form the top performing group for all combinations of (I, ω), producing estimates around the true value of θ₀ = 0 consistently. When ω = 0, the top panel shows that BAYES and GLMM perform well besides the best four (yet not as well as the best four) while the remaining approaches including HTE-Beta and SGSwgt significantly overestimate θ₀. When ω = 0.5, only BAYES and GLMM underestimate θ₀ while the others perform well. When ω = 1, all methods except for the top group underestimate θ₀. The bias of the eleven methods across all scenarios roughly follows the order FlexB≈SA25≈SA≈MUE<DSL<BAYES≈GLMM<HTE-Beta≈SGSwgt≈EL≈MH. Figure S2 of Supplementary Material shows the corresponding MSE results for estimating θ₀ by varying μ₀. Here, FlexB, SA, SA25 and MUE form the top group in terms of MSE; SGSwgt, MH, HTE-Beta and EL form the bottom group while the other three stand somewhere in the middle but closer to the top group.

Figure 4 shows coverage results for 95% interval estimates of θ₀ from the proposed FlexB and four other methods (i.e., HKSJ, HM, SJ, and IPM) for different θ₀, ω and I values. The overall performance of all five methods appears to follow the order FlexB>SJ>HKSJ>IPM>HM. Clearly, FlexB performs best and provides nearly unbiased coverage in all cases considered, anchoring around the nominal level indicated by the red dashed line. For the other four, in the top panel (ω = 0), the coverage tends to decrease as θ₀ decreases while the trend is opposite in the bottom panel (ω = 1), as they provide lower coverage with larger θ₀. When ω = 0.5, the four methods tend to report lower coverage with larger absolute values of θ₀, and thus all have a non-monotone pattern. We also observe that all methods except for FlexB have lower coverage as the number of studies I increases. Figure S3 presents the corresponding width results for the interval estimates of θ₀, showing that methods offering better coverage have larger width as well. Thus, FlexB gives wider intervals compared to the other four while providing better coverage. Note that narrower intervals are preferred only when they can provide adequate coverage. Figure 4 shows that the four competing methods fail to do so, especially when θ₀ moves away from zero.
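Coverage and width, as compared above, are simple summaries of the replicate-level intervals. The sketch below assumes that coverage is the proportion of intervals containing the true value and that width is averaged over replicates; the helper name `coverage_and_width` and the four intervals are hypothetical.

```python
def coverage_and_width(intervals, true_value):
    """Empirical coverage and mean width of (lower, upper) interval estimates."""
    hits = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    mean_width = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return hits / len(intervals), mean_width

# Hypothetical 95% intervals for theta_0 from four replicates, true value 0.5.
intervals = [(-0.2, 0.9), (0.1, 1.1), (0.6, 1.4), (-0.1, 0.4)]
coverage, width = coverage_and_width(intervals, true_value=0.5)
```

Reading the two numbers together reflects the point made above: a narrow interval is only an advantage if its coverage stays near the nominal level.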

Coverage comparison of 95% interval estimates of θ₀ by Hartung-Knapp/Sidik-Jonkman (HKSJ) method, three Wald confidence intervals using Hartung-Makambi estimator (HM), positive Sidik-Jonkman estimator (SJ), and improved Paule and Mandel estimator (IPM), and proposed FlexB for different values of θ₀, ω, and I. Settings: n_i1, n_i2 ~ U (50, 1000), μ₀ = −5, θ₀ = {−1, −0.9, −0.8, …, 1}, σ² = 0.5, τ² = 0.8.

Figure 5 shows coverage results from the five methods by varying μ₀ while fixing θ₀ at zero. Again, the relative performance of the methods follows the same order as in Figure 4, where FlexB has relatively stable coverage and generally outperforms the other four that tend to have lower coverage as μ₀ decreases and I increases for ω ≠ 0.5. Figure S4 shows the corresponding width results. We can see that the width of each method decreases as I or μ₀ increases. As in Figure S3, FlexB reports larger width than the other approaches for small I or μ₀ but as I and μ₀ get large, the difference diminishes.

Coverage comparison of 95% interval estimates of θ₀ by Hartung-Knapp/Sidik-Jonkman (HKSJ) method, three Wald confidence intervals using Hartung-Makambi estimator (HM), positive Sidik-Jonkman estimator (SJ), and improved Paule and Mandel estimator (IPM), and proposed FlexB for different values of μ₀, ω, and I. Settings: n_i1, n_i2 ~ U (50, 1000), μ₀ = {−5, −4.5, …, 0}, θ₀ = 0, σ² = 0.5, τ² = 0.8.

4.2. Performance on estimation of τ²

To evaluate the performance of FlexB in estimating the heterogeneity parameter τ², we vary (a) the value of τ² and (b) the value of μ₀ in the BN model (1). We choose the same settings for the number of studies I and the variance direction parameter ω as in Section 4.1; for (a), we set τ² ∈ {0, 0.1, 0.2, …, 1}, θ₀ = 0, σ² = 0.5, μ₀ = −5 and for (b) μ₀ ∈ {0, −0.5, −1, …, −5}, θ₀ = 0, σ² = 0.5, τ² = 0.8. We compare point estimates of τ² from the proposed FlexB and seven other methods in terms of bias and MSE, including Paule & Mandel (PM),³⁸ DerSimonian & Laird (DSL),⁵ GLMM, SJ, DerSimonian & Kacher (DSK),³⁹ improved Paule & Mandel (IPM),¹² and BAYES. As for interval estimation, we consider four competitors recommended by Zhang et al.,⁴⁰ who conducted comprehensive simulation studies to compare 16 different types of confidence intervals for the heterogeneity parameter. The methods include two profile likelihood confidence intervals based on maximum likelihood estimation (PLML) proposed by Hardy and Thompson⁴¹ and restricted maximum likelihood estimation (PLREML) proposed by Viechtbauer,⁴² Sidik-Jonkman (SJ) method, and approximate Jackson (AJ) method.⁴³

Figure 6 presents bias results for point estimates of the heterogeneity parameter τ² from the eight methods when τ² varies. When τ² > 0 (i.e., between-study heterogeneity exists), the top panel (ω = 0) shows that BAYES, GLMM and FlexB are the top performing group with much less bias than the other five methods; the middle and bottom panels (ω = 0.5, 1) show that FlexB is the winner (except for only a few occasions when τ² and I are both small), while the bias of BAYES and GLMM grows quickly as τ² gets large. Among the other five methods, the performance of IPM is better when τ² ≤ 0.4, while SJ outperforms the other four when τ² > 0.4. On the other hand, when there is no heterogeneity (τ² = 0), SJ reports the worst results, and BAYES, GLMM and FlexB overestimate τ² slightly while the other methods seem to be unbiased in this particular case. Figure S5 in Supplementary Material shows the corresponding MSE results for estimating τ² using the eight methods. Obviously, there is no method that consistently outperforms the others in all the settings. It seems that for τ² ≥ 0.4, SJ performs the best, often followed by FlexB; however, when τ² < 0.4, SJ has the largest MSE while FlexB becomes the best or close to the best, especially when I is not small. Figure 7 shows bias results of the eight methods by varying μ₀ while fixing τ² at 0.8. We observe that FlexB and SJ have much smaller bias in estimating τ² regardless of μ₀, compared to the other methods that usually significantly underestimate τ². Although GLMM and BAYES perform well in the top row (ω = 0), they tend to report larger bias in the middle and bottom row (ω = 0.5, 1). The other four methods (IPM, PM, DSK, and DSL) perform poorly when μ₀ = −5 but the bias slowly decreases as μ₀ increases to 0. Figure S6 shows the corresponding MSE results for estimating τ², where FlexB performs reasonably well in most scenarios, and SJ has quite comparable results. 
We note that when varying μ₀ while fixing τ² at 0.8, FlexB and SJ are always in the top performing group, regardless of (I, ω) settings, but as shown in Figure 6 (and Figure 8 below), SJ only performs well when τ² ≥ 0.4 and may achieve the best results around τ² = 0.8. Thus, SJ may report much less favorable results if we fix τ² at a value smaller than 0.4.

Bias comparison of estimates of τ² by Paule & Mandel (PM), DerSimonian & Laird (DSL), generalized linear mixed model (GLMM), Sidik-Jonkman (SJ), DerSimonian & Kacher (DSK), improved Paule & Mandel (IPM), BAYES, and proposed FlexB for different values of τ², ω, and I. Settings: n_ic, n_it ~ U (50, 1000), μ₀ = −5, θ₀ = 0, σ² = 0.5, τ² = {0, 0.1, 0.2, …, 1}.

Coverage comparison of 95% interval estimates of τ² by proposed FlexB, profile likelihood confidence intervals based on maximum likelihood estimation (PLML), profile likelihood confidence intervals based on restricted maximum likelihood estimation (PLREML), Sidik-Jonkman (SJ) and approximate Jackson (AJ) methods for different values of τ², ω, and I. Settings: n_ic,n_it ~ U (50, 1000), μ₀ = −5, θ₀ = 0, σ² = 0.5, τ² = {0.1, 0.2, …, 1}.

Figure 8 shows coverage results for 95% interval estimates of τ² from the proposed FlexB and four other methods (i.e., PLML, PLREML, SJ, and AJ) for different τ², ω and I values. We find that FlexB is the only method that can provide almost unbiased coverage in all cases while the others show significant undercoverage. SJ performs poorly when τ² < 0.4 but as τ² gets larger, it becomes the second best. For the other three methods, the performance appears to follow the order PLREML>PLML>AJ. Figure S7 shows the corresponding width results for interval estimates of τ², where FlexB reports larger widths than the other approaches, due to its better coverage.

Figures 9 and S8 show coverage and width results, respectively, from the five methods by varying μ₀ while fixing τ² at 0.8. Again, we find that FlexB, often with larger width, provides better coverage than the others in most scenarios.

4.3. Performance on Bayesian hypothesis testing

We now examine how BIC and DIC, as detailed in Section 3, perform on hypothesis testing (or model selection) in various simulated settings of (θ₀, τ², ω). For the purpose of benchmarking, we consider the Akaike information criterion (AIC) since it tends to give better estimates for models of certain characteristics.³⁰ We also include a sequential testing procedure based on tests proposed in Bhaumik et al.¹² For testing H₀ : θ₀ = 0, Bhaumik et al.¹² recommend a z-test procedure using the statistic T₂ based on the SA estimator of θ₀. For testing H₀ : τ² = 0, the authors advocate a parametric bootstrap approach based on the test statistic T₄, which outperforms an alternative test based on Cochran’s Q statistic. We refer to their approach as Bhaumik’s sequential testing (BST) procedure since the two parameters need to be tested separately.

In this simulation study, we vary both θ₀ and τ² in the set {0, 0.3, 0.6, 0.9, 1.2} and ω ∈ {0, 0.5, 1} while fixing μ₀ = −5, σ² = 0.5, I = 50 and n_ic, n_it ~ U (50, 1000). For each setting of (θ₀, τ², ω), 500 replicates were generated. We define the accuracy of a selection procedure as the proportion of replications in which the correct (θ₀, τ²) setting was selected; ω does not enter this definition because BST can only test θ₀ and τ².
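The accuracy measure just defined can be sketched as follows. The code assumes that a "(θ₀, τ²) setting" means the zero versus non-zero status of each parameter, so that every replicate results in one of four possible selections; the helper names `setting_of` and `accuracy` and the four selections are hypothetical.

```python
def setting_of(theta0, tau2):
    """Encode a setting as (theta0 is non-zero, tau2 is non-zero)."""
    return (theta0 != 0, tau2 != 0)

def accuracy(selected, theta0, tau2):
    """Proportion of replicates whose selected setting matches the truth."""
    truth = setting_of(theta0, tau2)
    return sum(1 for s in selected if s == truth) / len(selected)

# Hypothetical selections from four replicates simulated with
# theta0 = 0.3 and tau2 = 0.6, so the correct setting is (True, True).
selected = [(True, True), (True, False), (True, True), (False, True)]
acc = accuracy(selected, theta0=0.3, tau2=0.6)
```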

Figure 10 shows contour plots of accuracy for the four selection procedures in various simulated settings of (θ₀, τ²) with ω = 0.5. DIC and AIC appear to have the best overall performance. However, when there is no inter-study heterogeneity (τ² = 0), BIC seems to outperform the others since it uses a larger penalty and thus favors simpler models. Similar observations can be made from the results for ω = 0 and 1, which are omitted for brevity.
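The effect of the larger BIC penalty can be illustrated with the textbook forms AIC = 2k − 2ℓ and BIC = k ln(n) − 2ℓ, where ℓ is the maximized log-likelihood and k the number of free parameters. The sketch below is not the Section 3 implementation: taking n to be the number of studies and the two log-likelihood values are assumptions made purely for illustration.

```python
import math

def aic(loglik, k):
    """Akaike information criterion (smaller is better)."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion (smaller is better)."""
    return k * math.log(n) - 2 * loglik

n = 50  # assumed sample size, here the number of studies
simple = {"loglik": -120.0, "k": 2}  # hypothetical fit with tau^2 = 0
larger = {"loglik": -118.5, "k": 3}  # hypothetical fit with tau^2 free

aic_prefers_larger = aic(**larger) < aic(**simple)
bic_prefers_larger = bic(n=n, **larger) < bic(n=n, **simple)
```

With these hypothetical numbers the extra parameter improves the log-likelihood by 1.5, which is enough to offset the AIC penalty of 2 per parameter but not the BIC penalty of ln(50) ≈ 3.9, so AIC selects the larger model and BIC the simpler one.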

Contour plots of average accuracies for testing different combinations of the overall treatment effect θ₀ and inter-study heterogeneity τ². Top-left: AIC; Top-right: BIC; Bottom-left: DIC; Bottom-right: Bhaumik’s sequential testing (BST). Settings: n_ic, n_it ~ U (50, 1000), I = 50, μ₀ = −5, ω = 0.5, σ² = 0.5.

4.4. Computational efficiency of FlexB

In Section 2.3, we showed how Pólya-Gamma data augmentation can be integrated into our flexible Bayesian REM to avoid the need for rejection/Metropolis–Hastings sampling, numerical integration, or analytical approximation. However, the computational efficiency of the FlexB algorithm has not yet been examined, so we do so here. Since the proposed FlexB, BAYES and HTE-Beta are all Bayesian approaches, we compare their computation time. We set μ₀ = −5, θ₀ = 1, σ² = 0.5, τ² = 0.8, ω = 0, and I ∈ {10, 20, …, 80}; the computation time is averaged over 100 replicate datasets, with 10,000 MCMC iterations per dataset. Table 1 shows that both HTE-Beta and FlexB are much faster than BAYES, and that FlexB is the fastest, taking less than half of HTE-Beta’s time for every size of meta-analysis.

TABLE 1.

Comparison of the mean computation time (in seconds) by BAYES, HTE-Beta and proposed FlexB. Settings: n_i1,n_i2 ~ U (50, 1000), θ₀ = 1, τ² = 0.8, μ₀ = −5, σ² = 0.5, I ∈ {10, 20, …, 80}.

I	BAYES	HTE-Beta	FlexB
10	22.38	1.22	0.48
20	38.47	2.32	0.85
30	52.96	3.53	1.23
40	68.27	4.49	1.55
50	84.28	5.71	2.12
60	98.42	6.64	2.54
70	111.06	8.03	2.96
80	124.85	9.03	3.07
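The relative timings implied by Table 1 can be checked directly. The sketch below copies the table values and computes, for each I, the ratio of FlexB to HTE-Beta time and the speed-up of FlexB over BAYES; the variable names are our own.

```python
# Mean computation time in seconds from Table 1: (BAYES, HTE-Beta, FlexB).
table1 = {
    10: (22.38, 1.22, 0.48),
    20: (38.47, 2.32, 0.85),
    30: (52.96, 3.53, 1.23),
    40: (68.27, 4.49, 1.55),
    50: (84.28, 5.71, 2.12),
    60: (98.42, 6.64, 2.54),
    70: (111.06, 8.03, 2.96),
    80: (124.85, 9.03, 3.07),
}

# Fraction of HTE-Beta's time that FlexB needs, per number of studies I.
flexb_over_hte = {I: flexb / hte for I, (_, hte, flexb) in table1.items()}

# How many times faster FlexB is than BAYES, per number of studies I.
speedup_vs_bayes = {I: bayes / flexb for I, (bayes, _, flexb) in table1.items()}
```

For every I in the table, FlexB needs less than 40% of HTE-Beta's time and is more than 37 times faster than BAYES.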

5. DATA EXAMPLES

5.1. Rosiglitazone meta-analysis (56 studies)

Rosiglitazone (marketed as Avandia), approved by the US Food and Drug Administration (FDA) in 1999, is a widely used anti-diabetes drug in the US. Shortly after the drug was marketed to patients, there was much debate about potential adverse effects of rosiglitazone on cardiovascular safety.^44,45,46 Nissen and Wolski⁴⁷ conducted a meta-analysis of 56 trials to investigate the adverse effects of rosiglitazone (see Table S1 for detailed data in Supplementary Material). In the 56 independent trials, 19,509 patients were assigned randomly to the treatment group (rosiglitazone), and 16,022 patients were assigned to the control group. There were 159 myocardial infarction (MI) and 103 cardiovascular death (CVD) cases in the treatment group and 136 MI and 98 CVD cases in the control group. The overall incidence across all 56 trials is low for both MI (0.83%) and CVD (0.57%).
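The overall incidence figures quoted above follow from pooling events and patients over both arms, as the short calculation below shows. It is a descriptive check only, not an estimate from the FlexB model, and the variable names are our own.

```python
n_treat, n_ctrl = 19509, 16022  # patients in the rosiglitazone and control groups
events = {"MI": (159, 136), "CVD": (103, 98)}  # (treatment, control) event counts

total_n = n_treat + n_ctrl
pooled_pct = {
    outcome: 100 * (x_t + x_c) / total_n
    for outcome, (x_t, x_c) in events.items()
}

for outcome, pct in pooled_pct.items():
    print(f"{outcome}: {pct:.2f}%")
```

This prints 0.83% for MI and 0.57% for CVD, in agreement with the rates reported in the text.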

Figure 11 displays the posterior densities of all global parameters from FlexB and the densities of sample log odds for treatment and control groups from component studies (the bottom right plot). Table 2(a) shows the summary statistics of posterior draws of (θ₀, τ², μ₀, σ²) from FlexB, where C.I. represents a credible interval, and Table 2(b) compares our (FlexB) estimates of θ₀ and τ² with those obtained by other approaches. We see that μ₀ is very low, estimated at −5.761 for MI and −6.633 for CVD, confirming that those adverse events are extremely rare. As for estimating θ₀ for MI events, our FlexB reports 0.247 but the Bayesian C.I. contains 0, indicating there is no strong evidence to claim the existence of an overall treatment effect. We also observe that DSL produces the largest estimate 0.295, followed by MH and SGSwgt; GLMM reports an estimate of 0.226, but with p-value 0.061, which is on the borderline of significance. All other methods report non-significant results on the treatment effect. The large, significant estimates from DSL, MH, and SGSwgt are not surprising, since in Section 4.1 these methods perform poorly and have the largest bias when estimating θ₀. For CVD data, SA, EL and MUE report slightly negative estimates of θ₀ while other approaches including FlexB produce positive estimates. HTE-Beta reports the largest estimate (0.297), followed by FlexB (0.160) and BAYES (0.137). As mentioned in Section 4.1, HTE-Beta yields the largest bias among the three Bayesian approaches and is worse than some frequentist approaches such as SA and MUE. None of the methods rejects the null hypothesis of θ₀ = 0. The Bayesian C.I. of our FlexB covers 0, indicating there is no strong evidence of a treatment effect for CVD either.
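Summaries of the kind reported in Table 2(a) can be obtained from posterior draws along the following lines. The sketch makes several assumptions: the draws are synthetic normal values rather than FlexB output, the "SE" column is taken to be the posterior standard deviation, and the 95% C.I. is taken to be an equal-tailed interval. The helper name `summarize_draws` is hypothetical.

```python
import random
import statistics

def summarize_draws(draws):
    """Posterior mean, standard deviation, median and equal-tailed 95% interval."""
    cuts = statistics.quantiles(draws, n=40)  # cut points every 2.5%
    return {
        "mean": statistics.fmean(draws),
        "sd": statistics.stdev(draws),
        "median": statistics.median(draws),
        "interval": (cuts[0], cuts[-1]),  # 2.5% and 97.5% quantiles
    }

# Synthetic stand-in for MCMC draws of theta_0 (NOT the paper's posterior).
rng = random.Random(1)
draws = [rng.gauss(0.25, 0.15) for _ in range(10_000)]

summary = summarize_draws(draws)
lower, upper = summary["interval"]
interval_contains_zero = lower <= 0.0 <= upper
```

When `interval_contains_zero` is true, the draws give no strong evidence that the parameter differs from zero, which is the reading applied to θ₀ in the text.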

Rosiglitazone example: posterior densities of all global parameters and densities of sample log odds for treatment and control groups from the 56 component studies for (a) myocardial infarction (MI), and (b) cardiovascular death (CVD).

TABLE 2.

Rosiglitazone example: (a) a summary of FlexB estimates for myocardial infarction (MI) and cardiovascular death (CVD); (b) comparison of different methods in estimating θ₀ and τ² for MI and CVD.

	MI				CVD
	Mean	SE	Median	95% C.I.	Mean	SE	Median	95% C.I.
θ ₀	0.247	0.147	0.241	(−0.054,0.579)	0.160	0.232	0.144	(−0.342,0.687)
τ ²	0.058	0.075	0.033	(0.001,0.201)	0.117	0.188	0.052	(0.002,0.454)
μ ₀	−5.761	0.224	−5.754	(−6.225,−5.336)	−6.633	0.307	−6.612	(−7.280,−6.087)
σ ²	1.008	0.325	0.957	(0.454,1.543)	1.535	0.623	1.429	(0.547,2.860)
(a)

	θ ₀				τ ²
	MI	p-value	CVD	p-value		MI	CVD
SA	0.030	0.446	−0.063	0.395	DSL	0	0
DSL	0.295	0.006	0.118	0.184	DSK	0	0
EL	0.159	0.081	−0.059	0.325	PM	0	0
MH	0.250	0.015	0.026	0.422	IPM	0	0
MUE	0.071	0.268	−0.032	0.402	GLMM	0	0
GLMM	0.226	0.061	0.007	0.962	BAYES	0.042	0.024
SGSwgt	0.248	0.016	0.036	0.391	FlexB	0.033	0.052
BAYES	0.175	*	0.137	*
HTE-Beta	0.220	*	0.297	*
FlexB	0.247	*	0.160	*
(b)

For estimating the heterogeneity parameter τ², DSL, DSK, PM, and IPM report a value of 0 for both MI and CVD data while FlexB and BAYES report values slightly greater than 0. In Figure 11(a) and (b), for both MI and CVD, we can observe that the variability of sample log odds in the treatment groups is similar to that in the control groups (in theory, the variability in both groups is equal when either ω = 0.5 or τ² = 0); also, the posterior distributions of ω show that the posterior probabilities at 0, 0.5 and 1 do not differ much. Recall that under ℳ₁ and ℳ₅ (both assuming τ² = 0), the posterior distribution of ω is the same as its prior distribution, where 0, 0.5, and 1 occur with equal probability. Moreover, the estimates of τ² using FlexB are close to 0. Given these results, it is reasonable to conclude that τ² = 0 for both the MI and CVD data. For model selection, all procedures choose model ℳ₁ (θ₀ = 0 & τ² = 0) for the CVD data. For MI data, BIC and BST select ℳ₁, while AIC and DIC choose ℳ₅ (θ₀ ≠ 0 & τ² = 0). Despite this, we conclude that ℳ₁ is the best fit for the MI data since, from Figure 10, BIC performs better than the others when θ₀ = 0 and our FlexB C.I. of θ₀ covers 0. Therefore, we conclude that there is no significant treatment effect for either the MI or the CVD data, and no heterogeneity in either dataset. Our conclusion agrees with the literature^48,49,15 and the FDA, which has eased the restrictions on rosiglitazone.⁵⁰

5.2. Stomach ulcers meta-analysis (41 studies)

Efron⁵¹ conducted a meta-analysis to compare a new surgical treatment for stomach ulcers with an older one. The meta-analysis includes 41 independent studies, in which the treatment group (new) contains 916 patients and the control group (old) has 907 patients. There are 170 and 352 total events (recurrent bleeding) in the treatment and control groups, respectively. The overall incidence rate is 28.6% in this example. For detailed data, see Table S2 in Supplementary Material.

Figure 12 shows the posterior densities of all global parameters from FlexB and the densities of sample log odds for treatment and control groups from component studies. Table 3 displays summary statistics of posterior draws of (θ₀, τ², μ₀, σ²) from FlexB and its comparison with other methods in the estimation of θ₀ and τ². The density plots of sample log odds in Figure 12 (the bottom right plot) suggest that the variability in the control groups is larger than that in the treatment groups. This shows a real example where the assumption about the variability direction in those popular existing models such as Smith et al.¹⁶ and Bhaumik et al.¹² is invalid. Clearly, the posterior distribution of ω from our FlexB model strongly supports the correct choice of ω = 1.

Stomach ulcers example: posterior densities of all global parameters and densities of sample log odds for treatment and control groups from the 41 component studies.

TABLE 3.

Stomach ulcers example: (a) a summary of FlexB estimates; (b) comparison of different methods in estimating θ₀ and τ².

	Stomach Ulcers
	Mean	SE	Median	95% C.I.
θ ₀	−1.305	0.273	−1.298	(−1.828,−0.764)
τ ²	2.273	0.830	2.122	(0.951,3.935)
μ ₀	−1.509	0.180	−1.517	(−1.867,−1.164)
σ ²	0.524	0.242	0.485	(0.125,1.001)
(a)

	θ ₀	p-value		τ ²
SA	−1.370	<0.001	DSL	0.778
DSL	−1.026	<0.001	DSK	1.136
EL	−0.855	<0.001	PM	1.275
MH	−1.101	<0.001	IPM	2.368
MUE	−1.447	<0.001	GLMM	1.500
GLMM	−1.386	<0.001	BAYES	1.246
SGSwgt	−1.083	<0.001	FlexB	2.122
BAYES	−1.349	*
HTE-Beta	−1.251	*
FlexB	−1.305	*
(b)

Our simulation results show that FlexB, SA and MUE are the top three methods for estimating θ₀ with larger μ₀ (i.e., not so rare events), as shown in Figure 3. Also, as previously shown in Figure 7, FlexB and IPM report nearly unbiased estimates of τ² with larger I and μ₀, while the others tend to underestimate τ². In Table 3(b), FlexB reports an estimate of −1.305 when estimating θ₀, with SA and MUE providing similar estimates of −1.370 and −1.447. All methods including the proposed FlexB give the same conclusion that an overall treatment effect exists (i.e., θ₀ ≠ 0). As for estimating τ², FlexB and IPM report similar estimates while other methods provide smaller values, which is consistent with what we have observed in our simulation. As to model selection, all approaches choose the model with θ₀ ≠ 0 and τ² ≠ 0. We therefore conclude that the risk of recurrent bleeding is lower for patients undergoing the new treatment compared with the traditional treatment. We also conclude there exists between-study heterogeneity of treatment effects.

6. DISCUSSION

Based on a flexible random-effects model, we develop a novel Bayesian procedure (FlexB) to estimate and test the overall treatment effect θ₀ and inter-study heterogeneity τ² in meta-analysis of rare binary events. FlexB removes the assumption about the direction of variability between control and treatment groups in classical REMs, which may be violated in practical situations (see the second data example in Section 5). It relies on a data-adaptive approach to determine an appropriate direction rather than fixing it beforehand. Our Markov chain Monte Carlo algorithm adapts the Pólya-Gamma data-augmentation technique²² into the proposed Bayesian hierarchical framework for meta-analysis of rare binary events so that the corresponding full conditionals are all known distributions, which, combined with implementation using Rcpp, brings ease of implementation, efficiency in computation and stability in estimation. Our simulation shows that FlexB generally reports less biased results in both point and interval estimation of θ₀ and τ², compared with other frequentist and Bayesian competitors. For simultaneous testing of θ₀ and τ², DIC and AIC are the overall best procedures. However, BIC performs best when there is no inter-study heterogeneity. We further illustrate our estimation and testing approach in the rosiglitazone and stomach ulcers meta-analyses, where observations made from real data conform well with those made from simulated data. In the rosiglitazone meta-analysis, we conclude that there is no overall treatment effect and no heterogeneity for the myocardial infarction (MI) data and the cardiovascular death (CVD) data. In the stomach ulcers meta-analysis, we demonstrate that none of the popular REMs in the literature fits this example and that our FlexB leads to the correct choice of the variance direction parameter. We further conclude that the risk of recurrent bleeding would decrease using the new treatment, and that there exists inter-study heterogeneity.

As pointed out by one of our reviewers, Rhodes et al.⁵² report that 75% of meta-analyses contain five or fewer studies. However, this finding was based on 6,492 continuous-outcome meta-analyses within the Cochrane Database of Systematic Reviews, as clearly stated in their abstract and indicated in their title as well. We would like to point out that this summary may not be representative of the rare binary setting (the focus of our paper). This is mainly because for rare binary data, if the number of studies I is small, meta-analysis would not help much unless researchers have useful prior information and employ a Bayesian approach to leverage such information.⁵³ Although it can confirm that the event probabilities are very small, meta-analysis cannot tell how close to zero they are without prior information, especially in the presence of zero or double zero tables. Thus, people usually apply meta-analysis to rare binary events when I is not small. For example, the most well-known meta-analysis for rare binary data in the literature is perhaps the one about side effects of rosiglitazone that we re-analyze in Section 5.1, with I = 48 in Nissen and Wolski’s 2007 study⁵⁴ and 56 in their 2010 study.⁴⁷ In other reported meta-analyses of rare binary events,^{55,56,57,8,58,59} the number of studies I is typically larger than 20. This is why we simulate data with I = 10, 20, 50, 80, to represent very small, small, medium, and large sizes of meta-analysis. Our simulation results show that the proposed FlexB has superior performance for I = 50 and 80 and works reasonably well for I = 10 and 20, too. Since FlexB employs virtually non-informative or diffuse priors, we do not recommend the use of FlexB with meta-analysis of only a few studies, in which FlexB may yield larger MSE. In such situations, we refer readers to Friede et al.⁶⁰ and Günhan et al.¹⁸ where (weakly) informative priors are suggested. It is worth extending such priors to FlexB and examining its performance for meta-analysis with I ≤ 5.

In this paper, we employ the flexible random-effects model proposed by Li and Wang,¹⁴ which assumes that the treatment effects θ_j are fully heterogeneous across component studies. Recently, Moreno et al.⁶¹ considered situations where some of the θ_j are equal, making full heterogeneity no longer valid. They proposed a Bayesian model selection procedure for estimating the true cluster model, and then employed Bayesian model averaging to estimate the so-called meta parameter θ. Their proposed method works with small I (in their data example, I = 6) and they mention that computational difficulties arise when the number of studies I is moderately large. In the context of rare binary data, where I is typically much larger, it would be interesting to consider a formal binomial-normal model for such situations and develop an efficient algorithm.

ACKNOWLEDGMENT

This study was supported by the National Institutes of Health (Grant No.: R15GM131390 to X. Wang).

Footnotes

CONFLICT OF INTEREST

The authors declare no potential conflict of interests.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are included in the Supplementary Material.

REFERENCES

1. Glass GV. Primary, Secondary, and Meta-Analysis of Research. Educational Researcher 1976; 5(10): 3–8.
2. Borenstein M. Introduction to meta-analysis. Chichester, U.K: John Wiley & Sons. 2009.
3. Mantel N, Haenszel W. Statistical Aspects of the Analysis of Data From Retrospective Studies of Disease. JNCI: Journal of the National Cancer Institute 1959; 22(4): 719–748.
4. Cochran WG. The Combination of Estimates from Different Experiments. Biometrics 1954; 10(1): 101.
5. DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clinical Trials 1986; 7(3): 177–188.
6. Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Progress in cardiovascular diseases 1985; 27: 335–371.
7. Bradburn MJ, Deeks JJ, Berlin JA, Localio AR. Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Statistics in Medicine 2006; 26(1): 53–77.
8. Kuss O. Statistical methods for meta-analyses including information from studies without any events-add nothing to nothing and succeed nevertheless. Statistics in Medicine 2014; 34(7): 1097–1116.
9. Ren Y, Lin L, Lian Q, Zou H, Chu H. Real-world Performance of Meta-analysis Methods for Double-Zero-Event Studies with Dichotomous Outcomes Using the Cochrane Database of Systematic Reviews. Journal of General Internal Medicine 2019; 34(6): 960–968.
10. Tian L, Cai T, Pfeffer MA, Piankov N, Cremieux PY, Wei LJ. Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent 2 × 2 tables with all available data but without artificial continuity correction. Biostatistics 2008; 10(2): 275–281.
11. Houwelingen HCV, Zwinderman KH, Stijnen T. A bivariate approach to meta-analysis. Statistics in Medicine 1993; 12(24): 2273–2284.
12. Bhaumik DK, Amatya A, Normand SLT, et al. Meta-Analysis of Rare Binary Adverse Event Data. Journal of the American Statistical Association 2012; 107(498): 555–567.
13. Simmonds MC, Higgins JP. A general framework for the use of logistic regression models in meta-analysis. Statistical Methods in Medical Research 2016; 25(6): 2858–2877.
14. Li L, Wang X. Meta-analysis of rare binary events in treatment groups with unequal variability. Statistical Methods in Medical Research 2019; 28(1): 263–274.
15. Zhang C, Wang X, Chen M, Wang T. A comparison of hypothesis tests for homogeneity in meta-analysis with focus on rare binary events. Research Synthesis Methods 2021; 12(4): 408–428.
16. Smith TC, Spiegelhalter DJ, Thomas A. Bayesian approaches to random-effects meta-analysis: A comparative study. Statistics in Medicine 1995; 14(24): 2685–2699.
17. Bai O, Chen M, Wang X. Bayesian Estimation and Testing in Random Effects Meta-Analysis of Rare Binary Adverse Events. Statistics in Biopharmaceutical Research 2016; 8(1): 49–59.
18. Günhan BK, Röver C, Friede T. Random-effects meta-analysis of few studies involving rare events. Research Synthesis Methods 2020; 11(1): 74–90.
19. Hong H, Wang C, Rosner GL. Meta-analysis of rare adverse events in randomized clinical trials: Bayesian and frequentist methods. Clinical Trials 2020; 18(1): 3–16.
20. Gilks WR, Thomas A, Spiegelhalter DJ. A Language and Program for Complex Bayesian Modelling. The Statistician 1994; 43(1): 169.
21. Shuster JJ, Guo JD, Skyler JS. Meta-analysis of safety for low event-rate binomial trials. Research Synthesis Methods 2012; 3(1): 30–50. doi: 10.1002/jrsm.1039
22. Polson NG, Scott JG, Windle J. Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables. Journal of the American Statistical Association 2013; 108(504): 1339–1349.
23. Eddelbuettel D, François R. Rcpp: Seamless R and C++ Integration. Journal of Statistical Software 2011; 40(8): 1–18.
24. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002; 64(4): 583–639.
25. Schwarz G. Estimating the Dimension of a Model. The Annals of Statistics 1978; 6(2): 461–464.
26. Whittaker. Graphical Models in Applied Multi Statis. John Wiley and Sons. 1990.
27. Wolfinger R. Laplace’s approximation for nonlinear mixed models. Biometrika 1993; 80(4): 791–795.
28. McCullagh P, Nelder J. Generalized Linear Models. Routledge. 2019.
29. Breslow NE, Clayton DG. Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association 1993; 88(421): 9.
30. Agresti A. Categorical Data Analysis. Wiley. 2012.
31. Parzen M, Lipsitz S, Ibrahim J, Klar N. An Estimate of the Odds Ratio That Always Exists. Journal of Computational and Graphical Statistics 2002; 11(2): 420–436.
32. Hartung J, Knapp G. On tests of the overall treatment effect in meta-analysis with normally distributed responses. Statistics in Medicine 2001; 20(12): 1771–1782. doi: 10.1002/sim.791
33. Hartung J, Knapp G. A refined method for the meta-analysis of controlled clinical trials with binary outcome. Statistics in Medicine 2001; 20(24): 3875–3889. doi: 10.1002/sim.1009
34. Sidik K, Jonkman JN. Simple heterogeneity variance estimation for meta-analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics) 2005; 54(2): 367–384. doi: 10.1111/j.1467-9876.2005.00489.x
35. Weber F, Knapp G, Glass Ä, Kundt G, Ickstadt K. Interval estimation of the overall treatment effect in random-effects meta-analyses: Recommendations from a simulation study comparing frequentist, Bayesian, and bootstrap methods. Research Synthesis Methods 2020; 12(3): 291–315. doi: 10.1002/jrsm.1471
36. Pateras K, Nikolakopoulos S, Mavridis D, Roes KC. Interval estimation of the overall treatment effect in a meta-analysis of a few small studies with zero events. Contemporary Clinical Trials Communications 2018; 9: 98–107. doi: 10.1016/j.conctc.2017.11.012
37.Hartung J, Makambi KH. Reducing the Number of Unjustified Significant Results in Meta-analysis. Communications in Statistics - Simulation and Computation 2003; 32(4): 1179–1190. doi: 10.1081/sac-120023884 [DOI] [Google Scholar]
38.Paule R, Mandel J. Consensus Values and Weighting Factors. Journal of Research of the National Bureau of Standards 1982; 87(5): 377. [DOI] [PMC free article] [PubMed] [Google Scholar]
39.DerSimonian R, Kacker R. Random-effects model for meta-analysis of clinical trials: An update. Contemporary Clinical Trials 2007; 28(2): 105–114. [DOI] [PubMed] [Google Scholar]
40.Zhang C, Chen M, Wang X. Statistical Methods for Quantifying Between-study Heterogeneity in Meta-analysis with Focus on Rare Binary Events. Statistics and Its Interface 2020; 13(4): 449–464. doi: 10.4310/sii.2020.v13.n4.a3 [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Hardy R, Thompson S. A likelihood approach to meta-analysis with random effects. Statistics in Medicine 1996; 15(6): 619–629. doi: [DOI] [PubMed] [Google Scholar]
42.Viechtbauer W Confidence intervals for the amount of heterogeneity in meta-analysis. Statistics in medicine 2007; 26: 37–52. doi: 10.1002/sim.2514 [DOI] [PubMed] [Google Scholar]
43.Jackson D, Bowden J, Baker R. Approximate confidence intervals for moment-based estimators of the between-study variance in random effects meta-analysis. Research synthesis methods 2015; 6: 372–382. doi: 10.1002/jrsm.1162 [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Singh S, Loke YK, Furberg CD. Long-term Risk of Cardiovascular Events With Rosiglitazone. JAMA 2007; 298(10): 1189. [DOI] [PubMed] [Google Scholar]
45.Drazen JM, Morrissey S, Curfman GD. Rosiglitazone — Continued Uncertainty about Safety. New England Journal of Medicine 2007; 357(1): 63–64. [DOI] [PubMed] [Google Scholar]
46.Dahabreh IJ. Meta-analysis of rare events: an update and sensitivity analysis of cardiovascular events in randomized trials of rosiglitazone. Clinical Trials 2008; 5(2): 116–120. [DOI] [PubMed] [Google Scholar]
47.Nissen SE, Wolski K. Rosiglitazone revisited: an updated meta-analysis of risk for myocardial infarction and cardiovascular mortality. Archives of Internal Medicine 2010; 170(14): 1191–1201. [DOI] [PubMed] [Google Scholar]
48.Lane PW. Meta-analysis of incidence of rare events. Statistical Methods in Medical Research 2012; 22(2): 117–132. [DOI] [PubMed] [Google Scholar]
49.Böhning D, Mylona K, Kimber A. Meta-analysis of clinical trials with rare events. Biometrical Journal 2015; 57(4): 633–648. doi: 10.1002/bimj.201400184 [DOI] [PubMed] [Google Scholar]
50.McCarthy M US regulators relax restrictions on rosiglitazone. BMJ 2013; 347(nov28 1): f7144–f7144. [DOI] [PubMed] [Google Scholar]
51.Efron B Empirical Bayes Methods for Combining Likelihoods. Journal of the American Statistical Association 1996; 91(434): 538–550. [Google Scholar]
52.Rhodes KM, Turner RM, Higgins JP. Predictive distributions were developed for the extent of heterogeneity in meta-analyses of continuous outcome data. Journal of Clinical Epidemiology 2015; 68(1): 52–60. doi: 10.1016/j.jclinepi.2014.08.012 [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Wang G, Cheng Y, Chen M, Wang X. Jackknife empirical likelihood confidence intervals for assessing heterogeneity in meta-analysis of rare binary event data. Contemporary Clinical Trials 2021; 107: 106440. [DOI] [PMC free article] [PubMed] [Google Scholar]
54.Nissen SE, Wolski K. Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes. New England Journal of Medicine 2007; 356(24): 2457–2471. [DOI] [PubMed] [Google Scholar]
55.Crowley P Interventions for preventing or improving the outcome of delivery at or beyond term. The Cochrane database of systematic reviews 1997; 2(CD000170). doi: 10.1002/14651858.cd000170 [DOI] [PubMed] [Google Scholar]
56.Bellamy L, Casas JP, Hingorani AD, Williams D. Type 2 diabetes mellitus after gestational diabetes: a systematic review and meta-analysis. The Lancet 2009; 373(9677): 1773–1779. doi: 10.1016/s0140-6736(09)60731-5 [DOI] [PubMed] [Google Scholar]
57.Feng X, Zheng BS, Shi JJ, Qian J, He W, Zhou HF. Association of glutathione S-transferase P1 gene polymorphism with the susceptibility of lung cancer. Molecular Biology Reports 2012; 39(12): 10313–10323. [DOI] [PubMed] [Google Scholar]
58.Hemkens LG, Ewald H, Gloy VL, et al. Colchicine for prevention of cardiovascular events. CochraneDatabaseofSystematic Reviews 2016. doi: 10.1002/14651858.cd011047.pub2 [DOI] [PMC free article] [PubMed] [Google Scholar]
59.Sharma T, Guski LS, Freund N, Gøtzsche PC. Suicidality and aggression during antidepressant treatment: systematic review and meta-analyses based on clinical study reports. BMJ 2016: i65. doi: 10.1136/bmj.i65 [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Friede T, Röver C, Wandel S, Neuenschwander B. Meta-analysis of few small studies in orphan diseases. ResearchSynthesis Methods 2017; 8(1): 79–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
61.Moreno E, Vázquez-Polo FJ, Negrín MA. Bayesian meta-analysis: The role of the between-sample heterogeneity. Statistical Methods in Medical Research 2017; 27(12): 3643–3657. doi: 10.1177/0962280217709837 [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

NIHMS1877249-supplement-supinfo.pdf (522.5 KB, pdf)

Data Availability Statement

The data that support the findings of this study are included in the Supplementary Material.
