Author manuscript; available in PMC: 2020 Feb 27.
Published in final edited form as: Biometrics. 2019 Apr 13;75(2):485–493. doi: 10.1111/biom.12998

Exact inference on the random-effects model for meta-analyses with few studies

Haben Michael 1, Suzanne Thornton 2, Minge Xie 2, Lu Tian 3
PMCID: PMC7045874  NIHMSID: NIHMS1551845  PMID: 30430540

Abstract

We describe an exact, unconditional, non-randomized procedure for producing confidence intervals for the grand mean in a normal-normal random effects meta-analysis. The procedure targets meta-analyses based on too few primary studies, ≤ 7, say, to allow for the conventional asymptotic estimators, e.g., DerSimonian and Laird (1986), or non-parametric resampling-based procedures, e.g., Liu et al. (2017). Meta-analyses with so few studies are common, with one recent sample of 22,453 health-related meta-analyses finding a median of 3 primary studies per meta-analysis (Davey et al., 2011). Reliable and efficient inference procedures are therefore needed to address this setting. The coverage level of the resulting CI is guaranteed to be above the nominal level, up to Monte Carlo error, provided the meta-analysis contains more than 1 study and the model assumptions are met. After employing several techniques to accelerate computation, the new CI can be easily constructed on a personal computer. Simulations suggest that the proposed CI typically is not overly conservative. We illustrate the approach on several contrasting examples of meta-analyses investigating the effect of calcium intake on bone mineral density.

Keywords: bone mineral density, exact inference, meta-analysis, small-sample

1 ∣. INTRODUCTION

The random effects model is often used to account for between-study heterogeneity when conducting a meta-analysis. When the distribution of the primary study treatment effect estimates is approximately normal, the simple normal-normal model is commonly used, and the DerSimonian-Laird (“DL”) method and its variations are the most popular approach to estimating the model’s parameters and performing statistical inference (DerSimonian and Laird, 1986). However, the DL method is based on an asymptotic approximation and its use is only justified when the number of studies is large. In many fields, the number of studies used in a meta-analysis or sub-meta-analysis rarely exceeds 20 and is typically fewer than 7 (Davey et al., 2011), leaving inferences based on the DL estimator questionable. Indeed, extensive simulation studies have found that the coverage probability of the DL-based confidence interval (CI) can be substantially lower than the nominal level in various settings (Kontopantelis et al., 2010; IntHout et al., 2014), leading to false positives. One reason for this poor performance is that the asymptotic approximation ignores the variability in estimating the heterogeneous variance, which can be substantial when the number of studies is small (Higgins et al., 2009).

Various remedies have been proposed to correct the under-coverage of DL-based confidence intervals. Hartung and Knapp (2001) proposed an unbiased estimator of the variance of the DL point estimator explicitly accounting for the variability in estimating the heterogeneous variance. Sidik and Jonkman (2006) used the heavy-tailed t-distribution to approximate the distribution of a modified Wald-type test statistic based on the DL estimator. Using the more robust t- rather than normal distribution has also been proposed (Raghunathan, 1993; Berkey et al., 1995; Follmann and Proschan, 1999). Hardy and Thompson (1996), Vangel and Rukhin (1999), Viechtbauer (2005), and Raudenbush (2009) proposed procedures based on maximum-likelihood estimation. Noma (2011) further improved the performance of the likelihood-based inference procedure when the number of studies is small by using a Bartlett-type correction. Zeng and Lin (2015) describe a resampling procedure to approximate the “large cluster” asymptotic distribution, i.e., as the primary study sizes all grow. Bayesian approaches incorporating external information have been developed by many authors (Smith et al., 1995; Higgins and Whitehead, 1996; Bodnar et al., 2017). However, with few exceptions, most of these methods still depend on an asymptotic approximation and their performance with very few studies has only been examined by specific simulation studies. To overcome these difficulties, potentially conservative but “exact” inference procedures for the random effects model have been proposed (Follmann and Proschan, 1999; Wang et al., 2010; Liu et al., 2017; Wang and Tian, 2018). A permutation rather than the asymptotic limiting distribution is used to approximate the distribution of the relevant test statistics and thus the validity of the associated inference is guaranteed for any number of studies. However, due to the discreteness of the permutation distribution, the highest confidence level that may be achieved without randomization depends on the number of studies. For example, a 95% confidence interval can only be constructed with more than 5 studies. While Bayesian methods also permit statistical inference with fewer studies, the results are correspondingly sensitive to the choice of the prior distributions.
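The dependence on the number of studies can be illustrated with a little arithmetic. The sketch below assumes a sign-flip permutation scheme with 2^K equally likely configurations and a two-sided test, so that the smallest attainable p-value is 2/2^K; this scheme is an assumption made for illustration rather than a detail stated in the text, and the function names are hypothetical.

```python
def min_two_sided_p(k):
    """Smallest two-sided p-value when 2**k sign patterns are equally likely."""
    return 2 / 2 ** k


def min_studies(alpha=0.05):
    """Fewest studies for which a non-randomized level-alpha test can reject."""
    k = 1
    while min_two_sided_p(k) > alpha:
        k += 1
    return k


for k in (5, 6):
    print(k, min_two_sided_p(k))  # 5 -> 0.0625, 6 -> 0.03125
print(min_studies(0.05))          # 6
```

Under this assumption, five studies give a smallest p-value of 0.0625, which cannot fall below 0.05, while six studies give 0.03125.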

The main contribution of this paper is to propose a set of new methods for constructing exact, unconditional, non-randomized frequentist CIs for the location parameter of the normal-normal model by inverting exact tests. The coverage level of the resulting CI is guaranteed to be above the nominal level, up to Monte Carlo error, as long as the meta-analysis contains more than 1 study. After employing several techniques to accelerate computation, the new CI can be easily constructed on a personal computer. Simulations suggest that the proposed CI typically is not overly conservative. In Section 2, we present our procedure for constructing exact CIs for the population mean; in Section 3, we report results from comprehensive simulation studies; in Section 4, we illustrate the proposed method with a real data example; and in Section 5 we conclude the paper with additional discussion.

2 ∣. METHOD

The observed data consist of $Y_0 = \{Y_k, k = 1, \ldots, K\}$, where $Y_k$ follows a random effects model,

$$Y_k \mid \theta_k \overset{\text{ind.}}{\sim} N(\theta_k, \sigma_k^2), \qquad \theta_k \overset{\text{ind.}}{\sim} N(\mu_0, \tau_0^2), \qquad k = 1, \ldots, K,$$

with the variances $\sigma_k^2 > 0$, $k = 1, \ldots, K$, assumed known. The random effects model implies the simple parametric model

$$Y_k \overset{\text{ind.}}{\sim} N(\mu_0, \sigma_k^2 + \tau_0^2), \qquad k = 1, \ldots, K. \tag{1}$$

In the context of a meta-analysis, the pairs $(Y_k, \sigma_k^2)$, $k = 1, \ldots, K$, are interpreted as observed effects and known within-study variances drawn from $K$ studies, respectively. The unobserved population effect and between-study variance are $\mu_0$ and $\tau_0^2$, respectively. The goal is inference on the location parameter $\mu_0$, viewing $\tau_0^2$ as a nuisance parameter. The typical number of studies depends on the area of research and can be small, e.g., $K \le 10$.

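With $\tau_0^2$ known, the uniformly minimum variance unbiased estimator of $\mu_0$ under (1) is given by

$$\frac{\sum_{k=1}^K Y_k (\tau_0^2 + \sigma_k^2)^{-1}}{\sum_{k=1}^K (\tau_0^2 + \sigma_k^2)^{-1}}.$$

As $\tau_0^2$ is unknown, DerSimonian and Laird (1986) propose substituting a simplified method of moments estimator,

$$\hat\tau_{DL}^2 = \max\left\{0,\ \frac{\sum_{k=1}^K (Y_k - \hat\mu_F)^2 \sigma_k^{-2} - (K - 1)}{\sum_{k=1}^K \sigma_k^{-2} - \left(\sum_{k=1}^K \sigma_k^{-4}\right) \big/ \left(\sum_{k=1}^K \sigma_k^{-2}\right)}\right\},$$

where

$$\hat\mu_F = \frac{\sum_{k=1}^K Y_k \sigma_k^{-2}}{\sum_{k=1}^K \sigma_k^{-2}}$$

is the minimum variance unbiased estimator of $\mu_0$ under a fixed effects model, i.e., when $\tau_0^2 = 0$. The resulting estimator is known as the “DerSimonian-Laird” estimator of $\mu_0$:

$$\hat\mu_{DL} = \frac{\sum_{k=1}^K Y_k (\hat\tau_{DL}^2 + \sigma_k^2)^{-1}}{\sum_{k=1}^K (\hat\tau_{DL}^2 + \sigma_k^2)^{-1}}.$$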

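By an analogous substitution, a level $1 - \alpha$ confidence interval for $\mu_0$ is given by

$$\left[\hat\mu_{DL} - z_{1-\alpha/2}\left\{\sum_{k=1}^K (\hat\tau_{DL}^2 + \sigma_k^2)^{-1}\right\}^{-1/2},\ \ \hat\mu_{DL} + z_{1-\alpha/2}\left\{\sum_{k=1}^K (\hat\tau_{DL}^2 + \sigma_k^2)^{-1}\right\}^{-1/2}\right]. \tag{2}$$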
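As a concrete illustration of the DerSimonian-Laird estimator and the interval (2), the following minimal sketch computes both from a list of effects and known within-study variances. The function names `dl_fit` and `dl_ci` are hypothetical, and only the Python standard library is used.

```python
from statistics import NormalDist


def dl_fit(y, v):
    """DerSimonian-Laird fit; y = observed effects, v = known within-study variances."""
    k = len(y)
    w = [1.0 / vi for vi in v]  # fixed-effects weights sigma_k^{-2}
    mu_f = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_f) ** 2 for wi, yi in zip(w, y))
    denom = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / denom)  # truncated moment estimator
    w_re = [1.0 / (tau2 + vi) for vi in v]  # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = sum(w_re) ** -0.5
    return mu, tau2, se


def dl_ci(y, v, alpha=0.05):
    """Wald-type interval (2) based on the normal approximation."""
    mu, _, se = dl_fit(y, v)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mu - z * se, mu + z * se


# Two studies with effects 0 and 4 and unit variances:
print(dl_fit([0.0, 4.0], [1.0, 1.0]))  # mu = 2, tau2 = 7, se = 2
print(dl_ci([0.0, 4.0], [1.0, 1.0]))
```

With only two studies the interval is driven entirely by a single-degree-of-freedom estimate of the between-study variance, which is the situation in which the normal approximation is least trustworthy.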

The justification of the CI given in (2) relies on the asymptotic approximation

$$T_0(\mu_0; Y) = (\hat\mu_{DL} - \mu_0)^2 \sum_{k=1}^K (\hat\tau_{DL}^2 + \sigma_k^2)^{-1} \sim \chi_1^2 \tag{3}$$

as the number of studies, $K$, grows to infinity and $\max\{\sigma_k\}/\min\{\sigma_k\}$ is uniformly bounded. However, the exact distribution of $T_0(\mu_0; Y)$ depends on $\tau_0^2$ and may be very different from a $\chi_1^2$ distribution when $K$ is moderate or small (Hoaglin, 2016). Consequently, the finite-sample performance of the CI given by (2) is often unsatisfactory. We propose constructing an exact CI for $\mu_0$ by first constructing an exact confidence region for $(\mu_0, \tau_0^2)$. To this end, let $T\{(\mu, \tau^2); Y_0\}$ denote a scalar test statistic, which may depend on the null parameter $(\mu, \tau^2)$, for the simple hypothesis $(\mu_0, \tau_0^2) = (\mu, \tau^2)$. The specific choice of $T\{(\mu, \tau^2); Y_0\}$ will be discussed later and here we only assume that a high value of $T\{(\mu, \tau^2); Y_0\}$ represents grounds for rejection. For a given choice of $T\{(\mu, \tau^2); Y_0\}$, a $1 - \alpha$ level CI for $\mu_0$ can be constructed as follows:

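  1. Obtain bounds $[\mu_{\min}, \mu_{\max}]$ and $[\tau_{\min}^2, \tau_{\max}^2]$ for $\mu_0$ and $\tau_0^2$.

  2. For each pair of $\mu$ and $\tau^2$ in an $R \times R$ grid of points on $[\mu_{\min}, \mu_{\max}] \times [\tau_{\min}^2, \tau_{\max}^2]$,
    (a) Compute the null distribution of $T\{(\mu, \tau^2); Y(\mu, \tau^2)\}$, where $Y(\mu, \tau^2) = \{\tilde{Y}_k, k = 1, \ldots, K\}$ with $\tilde{Y}_k \overset{\text{ind.}}{\sim} N(\mu, \sigma_k^2 + \tau^2)$, $k = 1, \ldots, K$.
    (b) Compute the p-value $p_{\mu,\tau^2}(Y_0) = P[T\{(\mu, \tau^2); Y(\mu, \tau^2)\} > T\{(\mu, \tau^2); Y_0\}]$.

  3. Obtain a confidence region for $(\mu_0, \tau_0^2)$ as $\Omega_{1-\alpha}(Y_0) = \{(\mu, \tau^2) : p_{\mu,\tau^2}(Y_0) > \alpha\}$.

  4. Project $\Omega_{1-\alpha}(Y_0)$ onto the $\mu$ axis to obtain a CI for $\mu_0$: $\{\mu : (\mu, \tau^2) \in \Omega_{1-\alpha}(Y_0)\}$.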

We discuss below the selection of appropriate bounds for the first step; here, we assume their existence for purposes of illustration.

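This method generates the exact CI for $\mu_0$ in the sense that

$$\operatorname{pr}\left(\mu_0 \in \{\mu : (\mu, \tau^2) \in \Omega_{1-\alpha}(Y_0)\}\right) \ge 1 - \alpha.$$

This is due to the fact that

$$\operatorname{pr}\left(\mu_0 \in \{\mu : (\mu, \tau^2) \in \Omega_{1-\alpha}(Y_0)\}\right) \ge \operatorname{pr}\{(\mu_0, \tau_0^2) \in \Omega_{1-\alpha}(Y_0)\} = \operatorname{pr}\{p_{\mu_0,\tau_0^2}(Y_0) \ge \alpha\} = \operatorname{pr}(U \ge \alpha) = 1 - \alpha,$$

where the random variable $U$ follows the unit uniform distribution. Here, we assume that $\tau_0^2 \in [\tau_{\min}^2, \tau_{\max}^2]$. If $\tau_{\min}^2$ and $\tau_{\max}^2$ are chosen depending on the data in such a way that $\operatorname{pr}(\tau_{\min}^2 < \tau_0^2 < \tau_{\max}^2) \ge 1 - \beta$, then the guaranteed coverage probability of the proposed CI is $1 - \alpha - \beta \approx 1 - \alpha$ for very small $\beta$. The error of the approximation, i.e., the magnitude of $\beta$, can be made arbitrarily small by methods described further below.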
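The effect of data-dependent bounds on the coverage can be sketched with a union bound. Write $I = [\tau_{\min}^2, \tau_{\max}^2]$ for the data-dependent range and $\mathrm{CI}$ for the projected interval; both symbols are introduced here only for illustration. If $\tau_0^2 \in I$ and $p_{\mu_0,\tau_0^2}(Y_0) > \alpha$, then $(\mu_0, \tau_0^2)$ lies in the confidence region and so $\mu_0 \in \mathrm{CI}$. Hence, assuming the p-value at the true parameter is uniformly distributed,

$$
\begin{aligned}
\operatorname{pr}(\mu_0 \notin \mathrm{CI})
&\le \operatorname{pr}(\tau_0^2 \notin I) + \operatorname{pr}\{p_{\mu_0,\tau_0^2}(Y_0) \le \alpha\} \\
&\le \beta + \alpha,
\end{aligned}
$$

so the coverage probability is at least $1 - \alpha - \beta$. For example, $\alpha = 0.05$ and $\beta = 0.001$ give a guaranteed coverage of at least 0.949.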

The cumulative distribution function of $T\{(\mu, \tau^2); Y(\mu, \tau^2)\}$ may not be analytically tractable, but it is well defined for any given grid point $(\mu, \tau^2)$ and can always be approximated by a Monte Carlo simulation. To be specific, given $(\mu, \tau^2)$, we may approximate the distribution of $T\{(\mu, \tau^2); Y(\mu, \tau^2)\}$ in step 2(a) as follows:

2(a) For $b = 1, \ldots, B$,

  1. Generate $e_{1b}, \ldots, e_{Kb} \overset{\text{ind.}}{\sim} N(0, 1)$.

  2. Let $Y_{kb} = \mu + (\sigma_k^2 + \tau^2)^{1/2} e_{kb}$, $k = 1, \ldots, K$, and let $Y_b = \{Y_{kb}, k = 1, \ldots, K\}$.

  3. Let $T_b = T\{(\mu, \tau^2); Y_b\}$ be the corresponding test statistic based on the generated data $Y_b$.

The empirical distribution of $\{T_1, \ldots, T_B\}$ can be used to approximate the distribution of $T\{(\mu, \tau^2); Y(\mu, \tau^2)\}$.

Since the estimation of the null distribution in step 2(a) does not depend on any asymptotic approximation, both the p-value, $p_{\mu,\tau^2}(Y_0)$, and the confidence region, $\Omega_{1-\alpha}(Y_0)$, are “exact” if we can safely ignore the errors of the grid approximation and the Monte Carlo simulation above, which can be controlled by increasing the grid density and $B$ in step 2(a), respectively.
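The Monte Carlo step at a single grid point can be sketched as follows. This is a minimal illustration, not the authors' implementation: the statistic used is the Wald-type statistic of (3) alone, the p-value is taken as the share of simulated statistics at least as large as the observed one (so that small p-values count against the null, in line with high values of the statistic being grounds for rejection), and the names `wald_stat` and `mc_pvalue` are hypothetical.

```python
import random


def wald_stat(mu, tau2, y, v):
    """T0(mu; y) of (3). tau2 is unused here: T0 plugs in the DL estimate instead."""
    k = len(y)
    w = [1.0 / vi for vi in v]
    mu_f = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_f) ** 2 for wi, yi in zip(w, y))
    denom = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2_dl = max(0.0, (q - (k - 1)) / denom)
    w_re = [1.0 / (tau2_dl + vi) for vi in v]
    mu_dl = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return (mu_dl - mu) ** 2 * sum(w_re)


def mc_pvalue(mu, tau2, y0, v, stat=wald_stat, n_draws=2000, seed=0):
    """Monte Carlo p-value for the simple null (mu0, tau0^2) = (mu, tau2)."""
    rng = random.Random(seed)
    t_obs = stat(mu, tau2, y0, v)
    exceed = 0
    for _ in range(n_draws):
        # Draw Y_kb = mu + sqrt(sigma_k^2 + tau^2) * e_kb under the null.
        y_sim = [mu + (vi + tau2) ** 0.5 * rng.gauss(0.0, 1.0) for vi in v]
        if stat(mu, tau2, y_sim, v) >= t_obs:
            exceed += 1
    return exceed / n_draws


y0 = [0.1, -0.3, 0.2, 0.5]
v = [1.0, 1.0, 1.0, 1.0]
print(mc_pvalue(0.0, 0.5, y0, v))    # plausible null: large p-value
print(mc_pvalue(100.0, 0.0, y0, v))  # implausible null: p-value of 0.0
```

Increasing `n_draws` plays the role of increasing $B$: it reduces the Monte Carlo error of the p-value at the cost of computation time.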

Because the data $Y_k$, $k = 1, \ldots, K$, are distributed as $N(\mu, \sigma_k^2 + \tau_0^2)$, $k = 1, \ldots, K$, whenever the shifted data $Y_k - \mu$, $k = 1, \ldots, K$, are distributed as $N(0, \sigma_k^2 + \tau_0^2)$, $k = 1, \ldots, K$, we restrict our focus to equivariant statistics (Lehmann and Romano, 2006), that is, $T$ satisfying $T\{(\mu, \tau^2); Y_0\} = T\{(0, \tau^2); Y_0 - \mu\}$, where $Y_0 - \mu = \{Y_k - \mu, k = 1, \ldots, K\}$. In this situation, testing the null $H_0: (\mu_0, \tau_0^2) = (\mu, \tau^2)$ based on the data $Y_0$ is the same as testing the null $H_0: (\mu_0, \tau_0^2) = (0, \tau^2)$ based on the shifted data $Y_0 - \mu$. When the test statistic is equivariant, the computations in step 2(a) need only be performed once for each $\tau^2$ in the grid rather than each pair $(\mu, \tau^2)$. Thus, although a 2-dimensional grid is used in the algorithm, the computational complexity remains linear in the grid size, $R$. More specifically, steps (2)–(3) become:

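2′. For each $\tau^2$ of an $R$-sized grid on $[\tau_{\min}^2, \tau_{\max}^2]$,

  (a) Compute the distribution of $T\{(0, \tau^2); Y(0, \tau^2)\}$.

  (b) Compute $q_{1-\alpha;\tau^2}$, the $1 - \alpha$ quantile of $T\{(0, \tau^2); Y(0, \tau^2)\}$.

  (c) Compute $\Omega_{1-\alpha}(\tau^2; Y_0) = \{(\mu, \tau^2) \mid T\{(\mu, \tau^2); Y_0\} = T\{(0, \tau^2); Y_0 - \mu\} < q_{1-\alpha;\tau^2}\}$.

3′. Compute a $(1 - \alpha)$-level confidence region for $(\mu_0, \tau_0^2)$ as

$$\bigcup_{\tau^2 \in [\tau_{\min}^2, \tau_{\max}^2]} \Omega_{1-\alpha}(\tau^2; Y_0).$$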
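The shortcut in steps 2′ and 3′, followed by the projection onto the $\mu$ axis, can be sketched as below. To keep the example short, it uses a stand-in equivariant statistic (a weighted Wald statistic with $\tau^2$ fixed at its null value), which is an assumption of this sketch and not the statistic proposed in this paper; the grids, the function names, and the number of draws are likewise illustrative.

```python
import random


def stat(mu, tau2, y, v):
    """Stand-in equivariant statistic: weighted Wald with tau2 at its null value."""
    w = [1.0 / (tau2 + vi) for vi in v]
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return (mu_hat - mu) ** 2 * sum(w)


def null_quantile(tau2, v, alpha, n_draws, rng):
    """Steps 2'(a)-(b): Monte Carlo 1 - alpha quantile of T{(0, tau2); Y(0, tau2)}."""
    draws = sorted(
        stat(0.0, tau2, [(vi + tau2) ** 0.5 * rng.gauss(0.0, 1.0) for vi in v], v)
        for _ in range(n_draws)
    )
    return draws[min(n_draws - 1, int((1 - alpha) * n_draws))]


def projected_ci(y0, v, tau2_grid, mu_grid, alpha=0.05, n_draws=2000, seed=0):
    """Union over the tau2 grid of accepted mu values, projected onto the mu axis."""
    rng = random.Random(seed)
    accepted = []
    for tau2 in tau2_grid:
        q = null_quantile(tau2, v, alpha, n_draws, rng)  # one simulation per tau2
        for mu in mu_grid:
            shifted = [yi - mu for yi in y0]  # equivariance: test (0, tau2) on Y0 - mu
            if stat(0.0, tau2, shifted, v) < q:
                accepted.append(mu)
    return (min(accepted), max(accepted)) if accepted else None


y0 = [0.0, 1.0, 2.0]
v = [1.0, 1.0, 1.0]
mu_grid = [i / 10 for i in range(-50, 71)]
print(projected_ci(y0, v, [0.0, 1.0, 2.0], mu_grid))
```

The null distribution is simulated once per value of $\tau^2$ and reused for every $\mu$ on the grid, which is where the linear cost in $R$ comes from.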

We propose the test statistics

$$T\{(\mu, \tau^2); Y\} = T_0(\mu; Y) + c_0 T_{lik}\{(\mu, \tau^2); Y\}, \tag{4}$$

where $T_0(\mu; Y)$ is the same Wald-type test statistic used in the DerSimonian-Laird procedure,

$$T_{lik}\{(\mu, \tau^2); Y\} = -\frac{1}{2}\sum_{k=1}^K \left[\frac{(Y_k - \hat\mu_{DL})^2}{\hat\tau_{DL}^2 + \sigma_k^2} + \log\{2\pi(\hat\tau_{DL}^2 + \sigma_k^2)\}\right] + \sum_{k=1}^K \frac{1}{2}\left[\frac{(Y_k - \mu)^2}{\tau^2 + \sigma_k^2} + \log\{2\pi(\tau^2 + \sigma_k^2)\}\right],$$

and $c_0$ is a tuning parameter controlling the relative contributions of these two statistics. While $T_0(\mu; Y)$ directly focuses on the location parameter $\mu_0$, $T_{lik}\{(\mu, \tau^2); Y\}$, similar to the likelihood ratio test statistic, targets the combination of $\mu_0$ and $\tau_0^2$ and helps to construct a narrower CI of $\mu_0$ when the number of studies is small. The proposed test statistics satisfy the equivariance condition, ensuring speedy computation when carrying out the procedure on a typical personal computer.
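A minimal sketch of the combined statistic (4) follows, together with a numerical check of the equivariance condition. The function names are hypothetical, and the code is a direct transcription of the formulas above rather than the implementation in the authors' software.

```python
import math


def dl_fit(y, v):
    """Return the DerSimonian-Laird estimates (mu_dl, tau2_dl)."""
    k = len(y)
    w = [1.0 / vi for vi in v]
    mu_f = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_f) ** 2 for wi, yi in zip(w, y))
    denom = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / denom)
    w_re = [1.0 / (tau2 + vi) for vi in v]
    return sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re), tau2


def neg_loglik(mu, tau2, y, v):
    """Negative log-likelihood of model (1) at (mu, tau2)."""
    return 0.5 * sum(
        (yi - mu) ** 2 / (tau2 + vi) + math.log(2 * math.pi * (tau2 + vi))
        for yi, vi in zip(y, v)
    )


def t_wald(mu, y, v):
    mu_dl, tau2_dl = dl_fit(y, v)
    return (mu_dl - mu) ** 2 * sum(1.0 / (tau2_dl + vi) for vi in v)


def t_lik(mu, tau2, y, v):
    # Likelihood contrast between the null value and the DL estimates.
    mu_dl, tau2_dl = dl_fit(y, v)
    return neg_loglik(mu, tau2, y, v) - neg_loglik(mu_dl, tau2_dl, y, v)


def t_combined(mu, tau2, y, v, c0):
    """Statistic (4): T0 + c0 * Tlik."""
    return t_wald(mu, y, v) + c0 * t_lik(mu, tau2, y, v)


y = [0.3, 1.9, -0.4, 2.2]
v = [0.5, 1.0, 1.5, 2.0]
shifted = [yi - 1.3 for yi in y]
# Equivariance: the two values printed below agree up to rounding error.
print(t_combined(1.3, 0.7, y, v, 1.2), t_combined(0.0, 0.7, shifted, v, 1.2))
```

Because the DL estimates are not maximum likelihood estimates, the likelihood contrast can be negative for some $(\mu, \tau^2)$; it is exactly zero at the DL estimates themselves.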

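A further simplification afforded by this choice of test statistics is that step 2′(c) may be carried out by solving the quadratic inequality

$$A(\tau)\mu_0^2 + B(\tau)\mu_0 + C(\tau) < 0,$$

where

$$
\begin{aligned}
A(\tau) &= \sum_{k=1}^K \left\{\frac{1}{\hat\tau_{DL}^2 + \sigma_k^2} + \frac{c_0}{2(\tau^2 + \sigma_k^2)}\right\} > 0, \\
B(\tau) &= -\sum_{k=1}^K \left\{\frac{2\hat\mu_{DL}}{\hat\tau_{DL}^2 + \sigma_k^2} + \frac{c_0 Y_k}{\tau^2 + \sigma_k^2}\right\}, \\
C(\tau) &= \sum_{k=1}^K \frac{c_0}{2}\left[\frac{Y_k^2}{\tau^2 + \sigma_k^2} + \log\frac{\tau^2 + \sigma_k^2}{\hat\tau_{DL}^2 + \sigma_k^2} - \frac{(Y_k - \hat\mu_{DL})^2}{\hat\tau_{DL}^2 + \sigma_k^2}\right] + \hat\mu_{DL}^2 \sum_{k=1}^K \frac{1}{\hat\tau_{DL}^2 + \sigma_k^2} - q_{1-\alpha;\tau^2}.
\end{aligned}
\tag{5}
$$

As a result, the confidence interval of $\mu_0$ when $\tau_0 = \tau$, $\Omega_{1-\alpha}(\tau^2; Y_0)$, is simply the segment with endpoints

$$\left(\frac{-B(\tau) - \Delta(\tau)^{1/2}}{2A(\tau)},\ \frac{-B(\tau) + \Delta(\tau)^{1/2}}{2A(\tau)}\right),$$

when $\Delta(\tau) = B(\tau)^2 - 4A(\tau)C(\tau) \ge 0$, and an empty set, otherwise.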
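The closed-form solution can be sketched as follows, assuming the quantile $q_{1-\alpha;\tau^2}$ has already been obtained by simulation; here it is simply passed in as a number. The function names and the numerical inputs are illustrative only.

```python
import math


def dl_fit(y, v):
    """Return the DerSimonian-Laird estimates (mu_dl, tau2_dl)."""
    k = len(y)
    w = [1.0 / vi for vi in v]
    mu_f = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_f) ** 2 for wi, yi in zip(w, y))
    denom = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / denom)
    w_re = [1.0 / (tau2 + vi) for vi in v]
    return sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re), tau2


def quad_coeffs(tau2, y, v, c0, q):
    """Coefficients A, B, C of (5) for a given tau2 and quantile q."""
    mu_dl, tau2_dl = dl_fit(y, v)
    a = sum(1.0 / (tau2_dl + vi) + c0 / (2.0 * (tau2 + vi)) for vi in v)
    b = -sum(
        2.0 * mu_dl / (tau2_dl + vi) + c0 * yi / (tau2 + vi) for yi, vi in zip(y, v)
    )
    c = sum(
        0.5 * c0 * (
            yi ** 2 / (tau2 + vi)
            + math.log((tau2 + vi) / (tau2_dl + vi))
            - (yi - mu_dl) ** 2 / (tau2_dl + vi)
        )
        for yi, vi in zip(y, v)
    )
    c += mu_dl ** 2 * sum(1.0 / (tau2_dl + vi) for vi in v) - q
    return a, b, c


def mu_interval(tau2, y, v, c0, q):
    """Interval of mu values accepted at this tau2, or None if it is empty."""
    a, b, c = quad_coeffs(tau2, y, v, c0, q)
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)


# Two studies, c0 = 1.2, and a hypothetical quantile q = 5.0 at tau2 = 3.0:
print(mu_interval(3.0, [0.0, 4.0], [1.0, 1.0], 1.2, 5.0))
```

Solving the quadratic replaces a scan over a grid of $\mu$ values, so only the grid over $\tau^2$ remains.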

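To choose $\tau_{\min}^2$ and $\tau_{\max}^2$ in step (1) of the algorithm, we may use the endpoints of a $100(1 - \beta)\%$, e.g., 99.9%, confidence interval of $\tau_0^2$. This CI can be constructed by inverting the pivotal statistic

$$T_3(\tau^2) = (WY)^\top \{W \Sigma(\tau) W^\top\}^{-1} (WY),$$

where $Y = (Y_1, \ldots, Y_K)^\top$, $\Sigma(\tau) = \operatorname{diag}\{\sigma_1^2 + \tau^2, \ldots, \sigma_K^2 + \tau^2\}$, and

$$
W = \begin{pmatrix}
\dfrac{\sigma_1^{-2}}{\sum_{i=1}^K \sigma_i^{-2}} - 1 & \dfrac{\sigma_2^{-2}}{\sum_{i=1}^K \sigma_i^{-2}} & \cdots & \dfrac{\sigma_K^{-2}}{\sum_{i=1}^K \sigma_i^{-2}} \\
\dfrac{\sigma_1^{-2}}{\sum_{i=1}^K \sigma_i^{-2}} & \dfrac{\sigma_2^{-2}}{\sum_{i=1}^K \sigma_i^{-2}} - 1 & \cdots & \dfrac{\sigma_K^{-2}}{\sum_{i=1}^K \sigma_i^{-2}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\sigma_1^{-2}}{\sum_{i=1}^K \sigma_i^{-2}} & \dfrac{\sigma_2^{-2}}{\sum_{i=1}^K \sigma_i^{-2}} & \cdots & \dfrac{\sigma_K^{-2}}{\sum_{i=1}^K \sigma_i^{-2}} - 1
\end{pmatrix}.
$$

The pivot follows a $\chi_{K-1}^2$ distribution when $\tau^2 = \tau_0^2$.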

Since our goal is a CI for $\mu_0$, the shape of the confidence region is crucial to its performance: the projection of $\Omega_{1-\alpha}(Y_0)$ onto the $\mu$ axis should be as small as possible, relative to the area of the confidence region. Figure 1 plots two confidence regions with the same confidence coefficient, but substantially different projected lengths. To avoid an overly conservative CI, we prefer a confidence region with boundaries parallel to the $\tau$-axis, or nearly so. The shape of $\Omega_{1-\alpha}(Y_0)$ is determined by the way we combine $T_0(\mu; Y)$ and $T_{lik}\{(\mu, \tau^2); Y\}$ or, more generally, by the choice of $T\{(\mu, \tau^2); Y\}$. Because the proposed statistics (4) are quadratic in $\mu$, the resulting confidence regions are a union of intervals with similar centers and tend not to produce overly conservative CIs when the tuning parameter $c_0$ is chosen appropriately.

FIGURE 1. The projection of the confidence region; the solid and dashed thick lines are boundaries of two confidence regions.

The proposed test statistic was chosen to balance performance and computation costs. For example, the true likelihood ratio test statistic under model (1) may be more informative than $T_{lik}\{(\mu, \tau^2); Y\}$, but its evaluation involves computing the maximum likelihood estimate and is substantially slower. The proposed algorithm is easily parallelized, so further gains in computing speed are available.

3 ∣. NUMERICAL STUDY

In this section, we study the small-sample performance of the proposed method through a comprehensive simulation study. Observed data are simulated under the random effects model

$$Y_k \sim N(\mu_0, \tau_0^2 + \sigma_k^2), \qquad k = 1, \ldots, K,$$

where $\sigma_1, \ldots, \sigma_K$ are $K$ equally spaced points in the interval $[1, 5]$, that is, $\sigma_k = 1 + 4(k - 1)/(K - 1)$, $k = 1, \ldots, K$. The population variance $\tau_0^2$ takes values 0, 12.5, and 25 to mimic settings with low, moderate, and high study heterogeneity, respectively. The corresponding $I^2$ measures of heterogeneity are approximately 0, 50%, and 70%, respectively.
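The data-generating step of the simulation can be sketched as follows. The function names and the random seed are illustrative, and the value of $\mu_0$ used in the example is an arbitrary choice for this sketch, since the text does not state one.

```python
import random


def within_study_sds(k):
    """K equally spaced within-study standard deviations on [1, 5]."""
    return [1 + 4 * (j - 1) / (k - 1) for j in range(1, k + 1)]


def simulate_effects(mu0, tau2_0, k, rng):
    """Draw Y_k ~ N(mu0, tau0^2 + sigma_k^2); return effects and within-study variances."""
    sds = within_study_sds(k)
    variances = [s ** 2 for s in sds]
    y = [rng.gauss(mu0, (tau2_0 + vi) ** 0.5) for vi in variances]
    return y, variances


rng = random.Random(1)
print(within_study_sds(5))  # [1.0, 2.0, 3.0, 4.0, 5.0]
for tau2_0 in (0.0, 12.5, 25.0):  # low, moderate, high heterogeneity
    y, variances = simulate_effects(0.0, tau2_0, 5, rng)
    print(tau2_0, [round(yi, 2) for yi in y])
```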

In the first set of simulations, we examine the effect of the tuning parameter $c_0$ on the performance of the proposed method. For each set of simulated data, we construct a series of CIs using the proposed method with $c_0$ ranging from 0 to 2.5 in increments of 0.1, and the number of studies $K$ ranges from 3 to 20. Based on results from 10,000 simulated datasets under each combination of settings, we calculate the empirical coverage levels and average lengths of the resulting 95% CIs. In all settings, the empirical coverage levels of the proposed CIs are above the nominal level and therefore we optimize power by selecting the value of $c_0$ with the shortest CI lengths. When $K \ge 10$, the choice of $c_0$ does not have a pronounced effect on CI length. When $K$ is between 3 and 6, the setting of primary interest, assigning more weight to the likelihood ratio-type statistic typically reduces the length of the CIs. We summarize the value of $c_0$ achieving the minimum mean 95% CI length in Figure 2. Based on these results, we suggest a tuning parameter $c_0 = 1.2$ for meta-analyses with fewer than 6 studies, $c_0 = 0.6$ for meta-analyses with 6–10 studies, $c_0 = 0.2$ for meta-analyses with 10–20 studies, and $c_0 = 0$ for meta-analyses with more than 20 studies.

FIGURE 2. The choice of $c_0$ achieving the minimum mean 95% CI length is plotted against the number $K$ of studies, at 3 levels of between-study heterogeneity.

In the second set of simulations, we compare the performance of the proposed CIs with existing alternatives. For 10,000 replicates at each data-generation setting described above, we construct CIs using the DerSimonian-Laird, Sidik-Jonkman, and restricted maximum likelihood asymptotic variance estimates, as well as the proposed CI with the recommended tuning parameter. In Figure 3 we summarize the average coverage and lengths of these CIs. In the presence of moderate heterogeneity, $I^2 = 0.5$, the empirical coverage level of the DL method is below 90% when $K \le 10$, with the lowest coverage, approximately 75%, when the number of studies is 3. The CIs based on the Sidik-Jonkman estimator have better coverage, but still drop below 90% when $K \le 5$. In contrast, the proposed exact CIs using the recommended tuning parameter settings do not fall below the nominal 95% coverage level. Moreover, the coverage level is not overly conservative even for small $K$. The length of the 95% CI is comparable to the lengths of the asymptotic CIs, when these match the nominal coverage level, e.g., $K = 20$. When $I^2 = 0$, i.e., the random effects model degenerates to the fixed effects model, all methods, including the asymptotic estimators, control the Type 1 error. Sidik-Jonkman’s CI is overly conservative even for moderate $K$ values, while the proposed CIs, also overly conservative at lower values of $K$, improve steadily as $K$ increases. When $I^2 = 0.70$, only the proposed CIs maintain the proper coverage level, while other methods fall below the nominal level for $K$ as large as 10–20.

FIGURE 3. Comparison by 95% CI coverage and length of the proposed estimator with 3 commonly used estimators based on asymptotic approximations. Data were generated according to model (1) with the number of studies $K$ varying between 3 and 20 and the ratio of between- to average within-variance adjusted to give 3 levels of between-study heterogeneity. The proposed estimator achieves the nominal size at all configurations, with overcoverage evident where the heterogeneity is low or the studies are very few (3–4).

Several other common estimators, including Hedges-Olkin, Hunter-Schmidt, and maximum likelihood, were also tested, with performance found to be generally intermediate between the performance of the DerSimonian-Laird and Sidik-Jonkman estimators. These other comparisons are reported in the Supplementary Materials. Also reported in the Supplementary Materials are results for a Bayesian estimator using a non-informative prior, as recently implemented by Röver (2017). The simulation results of the Bayesian estimator are on the whole comparable to our estimator but slightly more conservative. However, its theoretical basis is somewhat incomplete and our evaluation of its performance is limited to the investigated simulation settings.

In a third set of simulations, we compare the performance of the proposed estimator to other common estimators under misspecifications of the model, such as a skew or heavy-tailed distribution. Specifically, rather than using a normal distribution, we used a centered chi-square variable (Supplementary Material, Table 2), a Cauchy distribution (Supplementary Material, Table 3), a centered exponential distribution (Supplementary Material, Table 4), and a uniform distribution on the interval [−5, 5] (Supplementary Material, Table 5) to generate $\theta_k$. We typically find that the coverage rate of the proposed estimator is somewhat conservative, whereas the asymptotic estimators fall below the nominal level, sometimes significantly so. The Bayesian estimator with non-informative prior performs similarly to the proposed estimator, though somewhat more conservatively, at least under the default parameters of the selected implementation.

4 ∣. EXAMPLE

Tai et al. (2015) conduct a random effects meta-analysis of 59 randomized controlled trials to determine if increased calcium intake affects bone mineral density (“BMD”). Altogether, these trials measured the changes in BMD at five skeletal sites over three time points and measured the effect of calcium intake on BMD from dietary sources and from calcium supplements. We illustrate the proposed method using four meta-analyses. The first meta-analysis investigates changes in BMD of the lumbar spine and is based on the findings of 27 trials that lasted fewer than 18 months. As shown in Table 1, the 95% CI produced by the proposed exact method does not differ very much from the 95% CI based on the DL method. The two intervals have a similar length and are centered around a BMD difference of about 1.2. We also construct the exact CI by permuting a Hodges-Lehmann type estimator (Liu et al., 2017). The resulting interval is very similar to the interval produced by the proposed method. These similarities are to be expected since the normality assumptions of the DL estimator may not be too unreasonable for a meta-analysis based on this number of primary studies.

TABLE 1. Random effects meta-analyses of the effect of calcium supplements on percentage change in bone mineral density (Tai et al. (2015), Figs. 1, 3, and 7). The meta-analyses were carried out using the DerSimonian-Laird variance estimator (as in Tai et al. (2015)), the permutation test of Wang and Tian (2018), applicable to meta-analyses with 6 or more studies, and the proposed exact method. A nominal 95% CI is reported, with the length in parentheses. On the two smaller meta-analyses (K = 3, 5) the proposed exact method fails to reject the null of no change, whereas the asymptotic DL method does reject.

| Study | K | DerSimonian-Laird | Permutation | Proposal |
|---|---|---|---|---|
| Lumbar spine | 27 | 0.828–1.669 (0.841) | 0.788–1.758 (0.970) | 0.768–1.726 (0.958) |
| Total hip | 6 | 0.502–1.847 (1.345) | 0.000–2.298 (2.298) | 0.159–2.246 (2.087) |
| Forearm | 5 | 0.209–3.378 (3.169) | not available | −0.459–4.124 (4.583) |
| Total body | 3 | 0.268–1.778 (1.511) | not available | −0.740–2.796 (3.536) |

Two of the other random effects meta-analyses investigate changes in BMD in the hip and forearm, based on six and five trials, respectively, that lasted for more than two years. The fourth analysis we consider here is the meta-analysis of three trials that lasted fewer than 18 months and measured changes in BMD for the total body of subjects. For these three meta-analyses, however, the number of studies is small, and the DL method may be expected to fall short of the nominal level. In the hip study, the proposed exact method and the DL method both yield the same conclusion, producing 95% confidence intervals rejecting the null of no change in BMD, although the exact method produces confidence intervals that are wider than their DL counterparts. In contrast, the DL 95% confidence intervals for the forearm and total body studies find a significant change in BMD whereas the exact method does not, suggesting that the DL method may be giving a false positive in these two cases. The intervals and their lengths are given in Table 1. Note that the exact 95% CI based on the permutation method is not available for the last two meta-analyses, since the number of studies is fewer than 6.
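The comparison described above can be reproduced directly from the interval endpoints reported in Table 1. The short sketch below records the DL and proposed intervals and flags the analyses in which the DL interval excludes zero while the exact interval does not; the variable and function names are illustrative.

```python
# 95% CI endpoints copied from Table 1 (DL and proposed exact method).
TABLE_1 = {
    "Lumbar spine": {"DL": (0.828, 1.669), "Proposal": (0.768, 1.726)},
    "Total hip": {"DL": (0.502, 1.847), "Proposal": (0.159, 2.246)},
    "Forearm": {"DL": (0.209, 3.378), "Proposal": (-0.459, 4.124)},
    "Total body": {"DL": (0.268, 1.778), "Proposal": (-0.740, 2.796)},
}


def excludes_zero(ci):
    """True when the interval rejects the null of no change in BMD."""
    lo, hi = ci
    return lo > 0 or hi < 0


def disagreements(table):
    """Analyses where DL rejects 'no change' but the exact interval does not."""
    return [
        name
        for name, row in table.items()
        if excludes_zero(row["DL"]) and not excludes_zero(row["Proposal"])
    ]


for name, row in TABLE_1.items():
    ratio = (row["Proposal"][1] - row["Proposal"][0]) / (row["DL"][1] - row["DL"][0])
    print(f"{name}: exact/DL length ratio = {ratio:.2f}")
print(disagreements(TABLE_1))  # ['Forearm', 'Total body']
```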

A table including confidence intervals obtained using other common estimators of τ2 is included in the Supplementary Materials.

5 ∣. DISCUSSION

We have proposed a method to construct an exact CI for the population mean under the normal-normal model commonly used in meta-analysis. Appropriate coverage is guaranteed, up to Monte Carlo error, even when the number of studies used in the meta-analysis is as small as 2. As an important limitation, the proposed “exact” inference procedure is developed under stringent parametric assumptions, which cannot be effectively examined from the data when the number of studies is small. We have examined by simulation a few common misspecifications, but the results still need to be interpreted with extreme caution. On the other hand, there is a practical need for meta-analyses with few studies, where unverifiable assumptions are unavoidable. The main objective of this paper is to propose a valid statistical method when those assumptions hold true. This incremental contribution is arguably warranted by the frequency with which meta-analyses with few studies are conducted using existing methods making the same parametric assumptions.

While convenient, the normal assumption for the study-specific treatment effect estimate may not be valid in other settings. For example, the treatment effect estimate may be an odds ratio from a 2 × 2 contingency table. If the total sample sizes are small or if cell entries are close to 0, the normal assumption for the odds ratio may be inappropriate. More generally, $Y_k$ may be a quantity relevant to a treatment effect with $Y_k \mid \theta_k$ following a non-normal, e.g., hypergeometric, distribution depending on the study-specific parameter $\theta_k$. In such a case, the model for $\theta_k$ and the corresponding inference procedure warrant further research. More recently, there have been several new developments on confidence distribution and related generalized fiducial inference that have facilitated new inference procedures for meta-analysis (Xie and Singh, 2013; Claggett et al., 2014). These developments may also be promising directions for developing exact inference procedures for meta-analysis.


ACKNOWLEDGEMENTS

The authors would like to thank the editor, associate editor, and two referees for their constructive comments. This research is partially supported by R01 HL089778 (NIH/NHLBI) and NSF-DMS 1513483, 1737857, and 1812048.

Footnotes

SUPPORTING INFORMATION

Web Appendices, Tables, and Figures referenced in Sections 3 and 4 are available with this paper at the Biometrics website on Wiley Online Library. Routines in the R programming language for computing exact CIs for the population mean by the method proposed in Section 2 are also available at the Biometrics website on Wiley Online Library, and may also be installed from CRAN as package rma.exact. Figure 1 was generated using rma.exact.

REFERENCES

  1. Berkey CS, Hoaglin DC, Mosteller F, and Colditz GA (1995). A random-effects regression model for meta-analysis. Stat Med 14, 395–411.
  2. Bodnar O, Link A, Arendacká B, Possolo A, and Elster C (2017). Bayesian estimation in random effects meta-analysis using a non-informative prior. Stat Med 36, 378–399.
  3. Claggett B, Xie M, and Tian L (2014). Meta-analysis with fixed, unknown, study-specific parameters. J Am Stat Assoc 109, 1660–1671.
  4. Davey J, Turner RM, Clarke MJ, and Higgins JP (2011). Characteristics of meta-analyses and their component studies in the Cochrane Database of Systematic Reviews: A cross-sectional, descriptive analysis. BMC Med Res Methodol 11, 160.
  5. DerSimonian R and Laird N (1986). Meta-analysis in clinical trials. Control Clin Trials 7, 177–188.
  6. Follmann DA and Proschan MA (1999). Valid inference in random effects meta-analysis. Biometrics 55, 732–737.
  7. Hardy RJ and Thompson SG (1996). A likelihood approach to meta-analysis with random effects. Stat Med 15, 619–629.
  8. Hartung J and Knapp G (2001). On tests of the overall treatment effect in meta-analysis with normally distributed responses. Stat Med 20, 1771–1782.
  9. Higgins J, Thompson SG, and Spiegelhalter DJ (2009). A reevaluation of random-effects meta-analysis. J R Stat Soc Series A: Stat Soc 172, 137–159.
  10. Higgins J and Whitehead A (1996). Borrowing strength from external trials in a meta-analysis. Stat Med 15, 2733–2749.
  11. Hoaglin DC (2016). Misunderstandings about Q and “Cochran’s Q test” in meta-analysis. Stat Med 35, 485–495.
  12. IntHout J, Ioannidis JP, and Borm GF (2014). The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Med Res Methodol 14, 25.
  13. Kontopantelis E, Reeves D, et al. (2010). metaan: Random-effects meta-analysis. Stata J 10, 395.
  14. Lehmann EL and Romano JP (2006). Testing statistical hypotheses. Springer Science & Business Media.
  15. Liu S, Lee S, and Xie M (2017). Exact inference on meta-analysis with generalized fixed-effects and random-effects models. Biostat Epidemiol 2, 1–22.
  16. Noma H (2011). Confidence intervals for a random-effects meta-analysis based on Bartlett-type corrections. Stat Med 30, 3304–3312.
  17. Raghunathan T (1993). Analysis of binary data from a multicentre clinical trial. Biometrika 80, 127–139.
  18. Raudenbush SW (2009). Analyzing effect sizes: Random-effects models. In The Handbook of Research Synthesis and Meta-analysis, Second edition, 295–316.
  19. Röver C (2017). Bayesian random-effects meta-analysis using the bayesmeta R package. ArXiv e-prints.
  20. Sidik K and Jonkman JN (2006). Robust variance estimation for random effects meta-analysis. Comput Stat Data Anal 50, 3681–3701.
  21. Smith TC, Spiegelhalter DJ, and Thomas A (1995). Bayesian approaches to random-effects meta-analysis: A comparative study. Stat Med 14, 2685–2699.
  22. Tai V, Leung W, Grey A, Reid IR, and Bolland MJ (2015). Calcium intake and bone mineral density: Systematic review and meta-analysis. BMJ 351, h4183.
  23. Vangel MG and Rukhin AL (1999). Maximum likelihood analysis for heteroscedastic one-way random effects ANOVA in interlaboratory studies. Biometrics 55, 129–136.
  24. Viechtbauer W (2005). Bias and efficiency of meta-analytic variance estimators in the random-effects model. J Educ Behav Stat 30, 261–293.
  25. Wang R, Tian L, Cai T, and Wei L (2010). Nonparametric inference procedure for percentiles of the random effects distribution in meta-analysis. Ann Appl Stat 4, 520.
  26. Wang Y and Tian L (2018). An efficient numerical algorithm for exact inference in meta analysis. J Stat Comput Simul 88, 646–656.
  27. Xie M.-g. and Singh K (2013). Confidence distribution, the frequentist distribution estimator of a parameter: A review. Int Stat Rev 81, 3–39.
  28. Zeng D and Lin D (2015). On random-effects meta-analysis. Biometrika 102, 281–294.
