Author manuscript; available in PMC: 2016 Jun 1.
Published in final edited form as: Biometrika. 2015 Apr 23;102(2):281–294. doi: 10.1093/biomet/asv011

On random-effects meta-analysis

D. ZENG, D. Y. LIN
PMCID: PMC4681410  NIHMSID: NIHMS743320  PMID: 26688589

Summary

Meta-analysis is widely used to compare and combine the results of multiple independent studies. To account for between-study heterogeneity, investigators often employ random-effects models, under which the effect sizes of interest are assumed to follow a normal distribution. It is common to estimate the mean effect size by a weighted linear combination of study-specific estimators, with the weight for each study being inversely proportional to the sum of the variance of the effect-size estimator and the estimated variance component of the random-effects distribution. Because the estimator of the variance component involved in the weights is random and correlated with study-specific effect-size estimators, the commonly adopted asymptotic normal approximation to the meta-analysis estimator is grossly inaccurate unless the number of studies is large. When individual participant data are available, one can also estimate the mean effect size by maximizing the joint likelihood. We establish the asymptotic properties of the meta-analysis estimator and the joint maximum likelihood estimator when the number of studies is either fixed or increases at a slower rate than the study sizes and we discover a surprising result: the former estimator is always at least as efficient as the latter. We also develop a novel resampling technique that improves the accuracy of statistical inference. We demonstrate the benefits of the proposed inference procedures using simulated and empirical data.

Keywords: Clustered data, Evidence-based medicine, Genetic association, Heterogeneity, Individual patient data, Maximum likelihood estimation, Random-effects model, Research synthesis, Summary statistic

1. Introduction

Meta-analysis compares and combines results from multiple independent studies, with the hope of identifying consistent patterns and sources of disagreement. It has been routinely used in all areas of statistical applications, from astronomy to zoology. The meta-analysis literature has grown exponentially over the past three decades, owing to the need for reliable summarization of the vast and expanding volume of scientific research. According to the Web of Science, there were 2006 meta-analysis publications during the years 1985–1994, 13 154 during 1995–2004, and more than 55 000 during 2005–2014. The impact of meta-analysis on medical research and clinical practice has been enormous (see, e.g., Whitehead, 2002; Sutton & Higgins, 2008). Most of the recent discoveries about genetic variants associated with complex human diseases and traits have been made through meta-analysis (e.g., McCarthy et al., 2008; Evangelou & Ioannidis, 2013). Meta-analysis is likely to play an even greater role in the current big-data era.

Conventional meta-analysis relies on summary statistics, such as estimated effect sizes and standard errors, from relevant studies. Advances in technology and communications have made it increasingly feasible to access data on individual participants (e.g., Sutton et al., 2000). Indeed, joint analysis of individual patient data is considered the gold standard in systematic reviews of randomized clinical trials (e.g., Chalmers et al., 1993). Recently, a number of consortia have been formed to share individual participant data from genetic association studies (e.g., Psychiatric GWAS Consortium Steering Committee, 2009; Evangelou & Ioannidis, 2013; Lin et al., 2013).

Suppose that there are K independent studies, with nk participants for the kth study. The data consist of {𝒪ki : k = 1, …, K; i = 1, …, nk}, where 𝒪ki represents the observation, including the response variable and explanatory variables, on the ith participant of the kth study. Assume that the density function of 𝒪ki is proportional to fk(·; βk, ηk), where βk is the parameter of interest, namely the effect size for a new treatment or genetic mutation, and ηk is a set of nuisance parameters, such as the error variance and the regression effects of demographic variables. Let β̂k be the maximum likelihood estimator of βk based on the likelihood $L_k(\beta_k, \eta_k) = \prod_{i=1}^{n_k} f_k(\mathcal{O}_{ki}; \beta_k, \eta_k)$, and let V̂k be the estimated variance of β̂k. We assume that standard regularity conditions (Cox & Hinkley, 1974, p. 281) hold and that the nk are large enough that β̂k is approximately normal with mean βk and variance V̂k.

Under the fixed-effects model, βk = β for all k = 1, …, K. The familiar inverse-variance estimator of β is

$$\hat\beta = \Bigl(\sum_{k=1}^{K} \hat V_k^{-1}\Bigr)^{-1} \sum_{k=1}^{K} \hat V_k^{-1} \hat\beta_k,$$

which is approximately normal with mean β and variance $(\sum_{k=1}^{K} \hat V_k^{-1})^{-1}$. One can instead estimate β by the maximum likelihood estimator that maximizes the joint likelihood $\prod_{k=1}^{K} L_k(\beta, \eta_k)$. Olkin & Sampson (1998) and Mathew & Nordstrom (1999) showed that, in the case of comparing multiple treatments and a control for a continuous outcome with known error variances, this maximum likelihood estimator is the same as the inverse-variance estimator β̂. Recently, Lin & Zeng (2010) proved that the two estimators are asymptotically equivalent for all commonly used parametric and semiparametric models.

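We focus on the random-effects model

$$\beta_k = \beta + \xi_k \quad (k = 1, \ldots, K),$$

where ξk ~ N(0, τ²). Because β̂k is approximately normal with mean β and variance V̂k + τ², it is natural to estimate β by

$$\hat\beta_{\mathrm{MA}} = \Bigl(\sum_{k=1}^{K} \hat w_k\Bigr)^{-1} \sum_{k=1}^{K} \hat w_k \hat\beta_k, \tag{1}$$

where $\hat w_k = (\hat V_k + \hat\tau^2_{\mathrm{MA}})^{-1}$, with (DerSimonian & Laird, 1986)

$$\hat\tau^2_{\mathrm{MA}} = \max\Biggl\{0,\ \frac{\sum_{k=1}^{K} \hat V_k^{-1}(\hat\beta_k - \hat\beta)^2 - (K-1)}{\sum_{k=1}^{K} \hat V_k^{-1} - \sum_{k=1}^{K} \hat V_k^{-2} \big/ \sum_{k=1}^{K} \hat V_k^{-1}}\Biggr\}. \tag{2}$$

Inference on β is typically based on the normal approximation

$$\hat\beta_{\mathrm{MA}} \sim N\Bigl\{\beta,\ \Bigl(\sum_{k=1}^{K} \hat w_k\Bigr)^{-1}\Bigr\}. \tag{3}$$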

For finite K, the estimator $\hat\tau^2_{\mathrm{MA}}$ involved in the ŵk is random and correlated with the β̂k, so the normal approximation given in (3) can be very crude. Indeed, it is well documented that confidence intervals based on (3) have very poor coverage for small and moderate K (see, e.g., Biggerstaff & Tweedie, 1997; Brockwell & Gordon, 2001). Various methods have been proposed to improve the confidence intervals (e.g., Hardy & Thompson, 1996; Biggerstaff & Tweedie, 1997; Brockwell & Gordon, 2001; Sidik & Jonkman, 2002; Henmi & Copas, 2010); however, none of these methods has been rigorously justified or widely accepted.
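Equations (1)–(3) translate directly into a few lines of array code. The following is a minimal Python sketch rather than the authors' software; the function name `dersimonian_laird` and its argument names are illustrative assumptions, and the inputs are the study-specific estimates β̂k and their estimated variances V̂k.

```python
import numpy as np
from statistics import NormalDist


def dersimonian_laird(beta_hat, v_hat, level=0.95):
    """Return (beta_MA, tau2_MA, confidence interval) from equations (1)-(3)."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    v_hat = np.asarray(v_hat, dtype=float)
    K = beta_hat.size

    # Inverse-variance (fixed-effects) estimator used inside equation (2).
    inv_v = 1.0 / v_hat
    beta_fixed = np.sum(inv_v * beta_hat) / np.sum(inv_v)

    # Equation (2): method-of-moments estimator of tau^2, truncated at zero.
    q = np.sum(inv_v * (beta_hat - beta_fixed) ** 2)
    denom = np.sum(inv_v) - np.sum(inv_v ** 2) / np.sum(inv_v)
    tau2 = max(0.0, (q - (K - 1)) / denom)

    # Equation (1): weighted combination with weights 1 / (V_k + tau^2).
    w = 1.0 / (v_hat + tau2)
    beta_ma = np.sum(w * beta_hat) / np.sum(w)

    # Equation (3): normal approximation with variance 1 / sum(w).
    se = np.sum(w) ** -0.5
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return beta_ma, tau2, (beta_ma - z * se, beta_ma + z * se)
```

For example, `dersimonian_laird([0.0, 1.0], [0.1, 0.1])` gives a pooled estimate of 0·5 and a variance-component estimate of 0·4. The interval it returns is the one based on (3), so it inherits the coverage problems described above when K is small.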

When individual participant data are available, one can maximize the loglikelihood

$$\sum_{k=1}^{K} \log \int_{\xi_k} \prod_{i=1}^{n_k} f_k(\mathcal{O}_{ki}; \beta + \xi_k, \eta_k)\, (2\pi\tau^2)^{-1/2} \exp(-\xi_k^2/2\tau^2)\, d\xi_k. \tag{4}$$

Denote the resulting estimators of β, τ² and (η1, …, ηK) by β̂ML, $\hat\tau^2_{\mathrm{ML}}$ and (η̂1, …, η̂K). It is challenging to establish the theoretical properties of the maximum likelihood estimators, because the nk are typically larger than K. This problem is similar to that encountered in the analysis of large clusters. The existing theory for random-effects models with large clusters requires that the variance component be known (Bellamy et al., 2005). The fact that the variance component needs to be estimated from the data poses major theoretical challenges.

In the present paper, we establish the asymptotic properties of the meta-analysis estimators β̂MA and $\hat\tau^2_{\mathrm{MA}}$ and the maximum likelihood estimators β̂ML and $\hat\tau^2_{\mathrm{ML}}$ for the situations of a fixed K and diverging K. We then investigate the asymptotic relative efficiency of β̂MA to β̂ML and reveal that the former is at least as efficient as the latter. In addition, we develop a novel resampling technique that yields substantially more accurate inference than the normal approximation given in (3). Finally, we demonstrate the advantages of the new inference procedures by applying them to data from clinical trials on the treatment of cancer-associated anaemia.

2. Asymptotic theory

We assume that the nk (k = 1, …, K) are comparable and denote their median by n. We write τ² = σ²/n, where σ² is a constant, such that the between-study variability is of the order n⁻¹ and thus comparable to the within-study variability V̂k. We estimate σ² by $\hat\sigma^2_{\mathrm{MA}} = n\hat\tau^2_{\mathrm{MA}}$ and $\hat\sigma^2_{\mathrm{ML}} = n\hat\tau^2_{\mathrm{ML}}$ in the meta-analysis and maximum likelihood estimation, respectively. Note that $\hat\sigma^2_{\mathrm{MA}}$ is a consistent method-of-moments estimator.

We assume the following regularity conditions.

  • Condition 1. The parameters (β, ηk, σ2) lie in the interior of a compact set 𝒞1 × 𝒞2k × 𝒞3 within the parameter domain.

  • Condition 2. For (βk, ηk) ∈ 𝒞1 × 𝒞2k, the function log fk(𝒪; βk, ηk) is thrice continuously differentiable.

  • Condition 3. For k =1,…, K, nk = pkn for some constant pk within a compact interval in (0,∞).

  • Condition 4. The information matrix of fk(𝒪; β, ηk) is continuous in a neighbourhood of the true parameter value (β0, ηk0), and its eigenvalues have positive lower and upper bounds uniformly for k =1,…, K.
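The scaling τ² = σ²/n combines with Condition 3 to give the weights and variances that appear in the limiting distributions of this section. As an informal check, assuming only that $n_k \hat V_k$ is close to its limit, written $v_k$ here, and that $n_k = p_k n$:

```latex
\hat V_k + \tau^2 \;\approx\; \frac{v_k}{n_k} + \frac{\sigma^2}{n}
  \;=\; \frac{v_k + p_k\sigma^2}{p_k n},
\qquad
\frac{1}{n}\,(\hat V_k + \tau^2)^{-1} \;\approx\; \frac{p_k}{v_k + p_k\sigma^2},
\qquad
\operatorname{var}\{n^{1/2}(\hat\beta_k - \beta_0)\} \;\approx\; \frac{v_k + p_k\sigma_0^2}{p_k}.
```

The last expression is the variance of $Z_k/p_k^{1/2}$ when $Z_k$ is normal with mean zero and variance $v_k + p_k\sigma_0^2$, which is how the study-specific estimators enter the limits.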

In practice, K is small relative to n. Therefore, we investigate the asymptotic properties of the meta-analysis estimators and maximum likelihood estimators in the cases where K is fixed or diverges to ∞ at a slower rate than n. The results for the first case are stated below.

Theorem 1

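Under Conditions 1–4,

$$\hat\sigma^2_{\mathrm{ML}} \to \mathcal{A}_{\mathrm{ML}} = \arg\max_{\sigma^2} \Biggl\{ -\frac12 \sum_{k=1}^{K} \log(I_{\beta k}^{-1} + p_k\sigma^2) - \frac12 \sum_{k=1}^{K} \frac{Z_k^2}{v_k + p_k\sigma^2} + \frac12 \Bigl(\sum_{k=1}^{K} \frac{p_k}{v_k + p_k\sigma^2}\Bigr)^{-1} \Bigl(\sum_{k=1}^{K} \frac{p_k^{1/2} Z_k}{v_k + p_k\sigma^2}\Bigr)^{2} \Biggr\}$$

in distribution, where $I_{\beta k} = -E\{\partial^2 \log f_k(\mathcal{O}; \beta_0, \eta_{k0})/\partial\beta_k^2\}$, Z1, …, ZK are independent zero-mean normal random variables with variances $v_k + p_k\sigma_0^2$, vk is the limit of $n_k \hat V_k$, and σ0² is the true value of σ². In addition,

$$n^{1/2}(\hat\beta_{\mathrm{ML}} - \beta_0) \to \Bigl(\sum_{k=1}^{K} \frac{p_k}{v_k + p_k \mathcal{A}_{\mathrm{ML}}}\Bigr)^{-1} \sum_{k=1}^{K} \frac{p_k^{1/2} Z_k}{v_k + p_k \mathcal{A}_{\mathrm{ML}}}$$

in distribution. Furthermore,

$$\hat\sigma^2_{\mathrm{MA}} \to \mathcal{A}_{\mathrm{MA}} = \max\Biggl[0,\ \frac{\sum_{k=1}^{K} \tilde v_k^{-1} \bigl\{ Z_k/p_k^{1/2} - \bigl(\sum_{k=1}^{K} \tilde v_k^{-1}\bigr)^{-1} \sum_{k=1}^{K} \tilde v_k^{-1} Z_k/p_k^{1/2} \bigr\}^2 - (K-1)}{\sum_{k=1}^{K} \tilde v_k^{-1} - \sum_{k=1}^{K} \tilde v_k^{-2} \big/ \sum_{k=1}^{K} \tilde v_k^{-1}}\Biggr]$$

in distribution, where ṽk = vk/pk, and

$$n^{1/2}(\hat\beta_{\mathrm{MA}} - \beta_0) \to \Bigl(\sum_{k=1}^{K} \frac{p_k}{v_k + p_k \mathcal{A}_{\mathrm{MA}}}\Bigr)^{-1} \sum_{k=1}^{K} \frac{p_k^{1/2} Z_k}{v_k + p_k \mathcal{A}_{\mathrm{MA}}}$$

in distribution.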

This theorem indicates that neither the maximum likelihood estimator β̂ML nor the meta-analysis estimator β̂MA is asymptotically normal. Their asymptotic distributions are mixtures of normal random variables, the mixing probabilities being random and correlated with the normal random variables. This phenomenon is different from the usual asymptotic theory (Cox & Hinkley, 1974, § 9.2) and is caused by the fact that K is fixed.
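The limiting law of the meta-analysis estimator in Theorem 1 can be explored by direct simulation. The sketch below draws the normal variables Zk, evaluates the limit 𝒜MA of the variance-component estimator, and then forms the limit of n^{1/2}(β̂MA − β0). The function name `simulate_ma_limit` and its arguments are assumptions made for illustration; the inputs `v`, `p` and `sigma0_sq` stand for the vk, pk and σ0² of the theorem, which would be unknown in practice.

```python
import numpy as np


def simulate_ma_limit(v, p, sigma0_sq, n_draws=100_000, seed=0):
    """Draw from the joint limit of (sigma2_MA, n^{1/2}(beta_MA - beta_0))."""
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    p = np.asarray(p, dtype=float)
    K = v.size

    # Z_k ~ N(0, v_k + p_k * sigma0^2), independent across studies.
    z = rng.normal(0.0, np.sqrt(v + p * sigma0_sq), size=(n_draws, K))
    u = z / np.sqrt(p)               # limit of n^{1/2}(beta_hat_k - beta_0)

    # A_MA: the limit of equation (2) after rescaling by n.
    inv = p / v                      # 1 / v_tilde_k with v_tilde_k = v_k / p_k
    centre = (inv * u).sum(axis=1, keepdims=True) / inv.sum()
    q = (inv * (u - centre) ** 2).sum(axis=1)
    denom = inv.sum() - (inv ** 2).sum() / inv.sum()
    a_ma = np.maximum(0.0, (q - (K - 1)) / denom)

    # Weighted combination with random weights p_k / (v_k + p_k * A_MA).
    w = p / (v + p * a_ma[:, None])
    limit = (w * u).sum(axis=1) / w.sum(axis=1)
    return a_ma, limit
```

With five equally sized studies, vk = 1 and σ0² = 1, roughly a quarter of the draws of 𝒜MA are exactly zero, which illustrates why the weights are random and why the limit is a mixture rather than a single normal distribution.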

Before stating the asymptotic results for divergent K, we make two additional assumptions.

  • Condition 5. We have n→∞, K → ∞, and Kn−1/2→0.

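  • Condition 6. For any σ² ∈ 𝒞3, the following limit exists:
    $$g(\sigma^2) = -\lim_{K\to\infty} \frac{1}{K} \sum_{k=1}^{K} \Bigl\{ \frac{v_k + p_k\sigma_0^2}{v_k + p_k\sigma^2} + \log(I_{\beta k}^{-1} + p_k\sigma^2) \Bigr\},$$

    where g(σ²) has a unique maximum at σr² in 𝒞3.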

Remark 1

Condition 5 implies that the number of studies increases to infinity but not as fast as the study sizes. The function g(·) is the limit of the profile likelihood function for σ², so Condition 6 guarantees the convergence of $\hat\sigma^2_{\mathrm{ML}}$ to some unique value as K → ∞.

Theorem 2

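Under Conditions 1–6, $\hat\sigma^2_{\mathrm{ML}} \to \sigma_r^2$ and β̂ML → β0 in probability. In addition,

$$(Kn)^{1/2}(\hat\beta_{\mathrm{ML}} - \beta_0) \to N\bigl\{0,\ \Sigma(\sigma_r^2, \sigma_0^2)^{-1}\, \Sigma(\sigma_r^2, \sigma_r^2)\, \Sigma(\sigma_r^2, \sigma_0^2)^{-1}\bigr\}$$

in distribution, where

$$\Sigma(\sigma_1^2, \sigma_2^2) = \lim_{K\to\infty} \frac{1}{K} \sum_{k=1}^{K} \frac{p_k (v_k + p_k\sigma_0^2)}{(v_k + p_k\sigma_1^2)(v_k + p_k\sigma_2^2)},$$

which is assumed to exist for any σ1² and σ2² in 𝒞3. Under Conditions 1–5, $\hat\sigma^2_{\mathrm{MA}} \to \sigma_0^2$ and β̂MA → β0 in probability, and $(Kn)^{1/2}(\hat\beta_{\mathrm{MA}} - \beta_0) \to N\{0, \Sigma(\sigma_0^2, \sigma_0^2)^{-1}\}$ in distribution. Furthermore, var(β̂ML) ≥ var(β̂MA) asymptotically, with equality if and only if σr² = σ0² or ṽ1 = ··· = ṽK.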

Theorem 2 provides some intriguing results. First, the maximum likelihood estimator for σ² may not even be consistent; it converges to σr² as defined in Condition 6. Second, the meta-analysis estimator β̂MA is at least as efficient as the maximum likelihood estimator β̂ML. Both results contradict standard likelihood theory, which is applicable only to a large number of small clusters with fixed parameters. In our case, K is relatively small compared to n, and the variance component τ² changes with n.
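The efficiency comparison in Theorem 2 can be evaluated numerically once the vk, pk, σ0² and σr² are specified. The sketch below replaces the limit over K in the theorem by a finite-K average, which is an assumption made for illustration, and the function names `sigma_fn` and `asymptotic_relative_efficiency` are hypothetical.

```python
import numpy as np


def sigma_fn(v, p, sigma0_sq, s1, s2):
    """Finite-K average standing in for the limit function of Theorem 2."""
    v = np.asarray(v, dtype=float)
    p = np.asarray(p, dtype=float)
    return np.mean(p * (v + p * sigma0_sq) / ((v + p * s1) * (v + p * s2)))


def asymptotic_relative_efficiency(v, p, sigma0_sq, sigmar_sq):
    """Ratio var(beta_ML) / var(beta_MA) of the two asymptotic variances."""
    cross = sigma_fn(v, p, sigma0_sq, sigmar_sq, sigma0_sq)
    same_r = sigma_fn(v, p, sigma0_sq, sigmar_sq, sigmar_sq)
    same_0 = sigma_fn(v, p, sigma0_sq, sigma0_sq, sigma0_sq)
    var_ml = same_r / cross ** 2     # sandwich form for the ML estimator
    var_ma = 1.0 / same_0            # variance for the meta-analysis estimator
    return var_ml / var_ma
```

The ratio is at least 1 for any inputs, equals 1 when `sigmar_sq` equals `sigma0_sq`, and equals 1 when the ratios vk/pk are all the same, matching the equality conditions stated in the theorem.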

3. A special case

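To gain some insights into the asymptotic properties of the maximum likelihood estimators described in Theorems 1 and 2, we consider the special case of simple linear regression:

$$Y_{ki} = \alpha_k + (\beta + \xi_k) X_{ki} + \varepsilon_{ki} \quad (i = 1, \ldots, n_k;\ k = 1, \ldots, K),$$

where ξk ~ N(0, σ²/n), εki ~ N(0, 1), and ξk is independent of εki. Define

$$m_{x1k} = \sum_{i=1}^{n_k} X_{ki}, \quad m_{x2k} = \sum_{i=1}^{n_k} X_{ki}^2, \quad m_{yk} = \sum_{i=1}^{n_k} Y_{ki}, \quad m_{yxk} = \sum_{i=1}^{n_k} Y_{ki} X_{ki}.$$

The loglikelihood function is, up to some constant,

$$-\frac12 \sum_{k=1}^{K} \log\Bigl(\frac{n}{m_{x2k}} + \sigma^2\Bigr) - \frac12 \sum_{k=1}^{K} \sum_{i=1}^{n_k} (Y_{ki} - \alpha_k - \beta X_{ki})^2 + \frac12 \sum_{k=1}^{K} \frac{1}{n/\sigma^2 + m_{x2k}} (m_{yxk} - \alpha_k m_{x1k} - \beta m_{x2k})^2.$$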

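After maximization over the αk and β, we obtain the profile loglikelihood function for σ²,

$$l_n(\sigma^2) = -\frac12 \sum_{k=1}^{K} \log\Bigl(\frac{n}{m_{x2k}} + \sigma^2\Bigr) - \frac12 \frac{n}{\sigma^2} \sum_{k=1}^{K} \frac{(n_k m_{yxk} - m_{yk} m_{x1k})^2 / (n_k m_{x2k} - m_{x1k}^2)}{n_k n/\sigma^2 + n_k m_{x2k} - m_{x1k}^2} + \frac12 \frac{n}{\sigma^2} \frac{\bigl\{\sum_{k=1}^{K} (n_k m_{yxk} - m_{yk} m_{x1k}) / (n_k n/\sigma^2 + n_k m_{x2k} - m_{x1k}^2)\bigr\}^2}{\sum_{k=1}^{K} (n_k m_{x2k} - m_{x1k}^2) / (n_k n/\sigma^2 + n_k m_{x2k} - m_{x1k}^2)}.$$

Note that $\hat\beta_k = (n_k m_{yxk} - m_{yk} m_{x1k}) / (n_k m_{x2k} - m_{x1k}^2)$ with variance vk/nk, where $v_k = n_k^2 / (n_k m_{x2k} - m_{x1k}^2)$. Writing ṽk = n vk/nk = vk/pk, we can rewrite the profile loglikelihood function as

$$l_n(\sigma^2) = -\frac12 \sum_{k=1}^{K} \log\Bigl(\frac{n}{m_{x2k}} + \sigma^2\Bigr) - \frac{n}{2} \sum_{k=1}^{K} \frac{(\hat\beta_k - \beta_0)^2}{\tilde v_k + \sigma^2} + \frac{n}{2} \frac{\bigl\{\sum_{k=1}^{K} (\hat\beta_k - \beta_0)/(\tilde v_k + \sigma^2)\bigr\}^2}{\sum_{k=1}^{K} 1/(\tilde v_k + \sigma^2)}.$$

If K is fixed while n goes to ∞, then $\hat\beta_k = \beta_0 + (\sigma_0^2/n + v_k/n_k)^{1/2} Z_k$, where Z1, …, ZK are independent standard normal. It follows that

$$l_n(\sigma^2) = -\frac12 \sum_{k=1}^{K} \log\Bigl(\frac{n}{m_{x2k}} + \sigma^2\Bigr) - \frac12 \sum_{k=1}^{K} \frac{(\tilde v_k + \sigma_0^2) Z_k^2}{\tilde v_k + \sigma^2} + \frac12 \frac{\bigl\{\sum_{k=1}^{K} (\tilde v_k + \sigma_0^2)^{1/2} Z_k / (\tilde v_k + \sigma^2)\bigr\}^2}{\sum_{k=1}^{K} 1/(\tilde v_k + \sigma^2)},$$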

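which is the result for $\hat\sigma^2_{\mathrm{ML}}$ given in Theorem 1. When K diverges, Theorem 2 implies that the argument maximizing $K^{-1} l_n(\sigma^2)$, i.e., $\hat\sigma^2_{\mathrm{ML}}$, converges to σr², which maximizes the function

$$\lim_{K\to\infty} \frac{1}{K} \Biggl[ -\frac12 \sum_{k=1}^{K} \log\Bigl\{\frac{1}{p_k E(X_{k1}^2)} + \sigma^2\Bigr\} - \frac12 \sum_{k=1}^{K} \frac{\mathrm{var}(X_{k1})^{-1}/p_k + \sigma_0^2}{\mathrm{var}(X_{k1})^{-1}/p_k + \sigma^2} \Biggr]. \tag{5}$$

Suppose that pk = 1 for k = 1, …, K and that the Xki all have the same distribution with mean mx and variance vx. Then (5) becomes

$$-\frac12 \log\{(v_x + m_x^2)^{-1} + \sigma^2\} - \frac12\, \frac{v_x^{-1} + \sigma_0^2}{v_x^{-1} + \sigma^2}.$$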

Let σm² and σM² denote, respectively, the minimum and maximum of σ² in 𝒞3. It can be shown that if $\sigma_0^2 < 4 m_x^2 v_x^{-1} (v_x + m_x^2)^{-1} - v_x^{-1}$, then σr² = σm²; otherwise, σr² can be σm², σM² or a value that maximizes the above limit, and the maximizer has to be

$$-\frac12 (v_x^{-1} - \sigma_0^2) + \frac12 \bigl[ (v_x^{-1} + \sigma_0^2) \{ v_x^{-1} + \sigma_0^2 - 4 m_x^2 v_x^{-1} (v_x + m_x^2)^{-1} \} \bigr]^{1/2},$$

which is σ0² if mx = 0. In particular, if Xki ~ 2 × Ber(0·5) − 1 + mx, σ0² = 1, σm² = 0·001 and σM² = 2, then the above derivation implies that σr² = σm² = 0·001 under mx = 2^{1/2} and $\sigma_r^2 = \{(1 - m_x^2)/(1 + m_x^2)\}^{1/2} = 0{\cdot}6^{1/2}$ under mx = 0·5. In both cases, σr² ≠ σ0²; however, the asymptotic variances of β̂ML and β̂MA are the same since the ṽk (k = 1, …, K) are equal.
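The limit σr² in this special case can be computed by comparing the limiting profile function at the two endpoints of 𝒞3 and at the interior stationary point given by the closed-form expression above. The following is a minimal sketch under the stated assumptions (pk = 1 and identically distributed covariates); the function name `sigma_r_squared` and its argument names are illustrative.

```python
import math


def sigma_r_squared(m_x, v_x, sigma0_sq, sigma_min_sq, sigma_max_sq):
    """Maximizer over [sigma_min_sq, sigma_max_sq] of the limiting profile function."""
    a = 1.0 / (v_x + m_x ** 2)
    b = 1.0 / v_x

    def h(s):
        # Limiting profile function for p_k = 1 and common covariate law.
        return -0.5 * math.log(a + s) - 0.5 * (b + sigma0_sq) / (b + s)

    candidates = [sigma_min_sq, sigma_max_sq]

    # Interior candidate: the larger root of the stationarity equation.
    disc = (b + sigma0_sq) * (b + sigma0_sq - 4.0 * m_x ** 2 * b * a)
    if disc >= 0.0:
        root = -0.5 * (b - sigma0_sq) + 0.5 * math.sqrt(disc)
        if sigma_min_sq <= root <= sigma_max_sq:
            candidates.append(root)

    return max(candidates, key=h)
```

With `v_x = 1`, `sigma0_sq = 1` and the interval (0·001, 2), the function returns the lower endpoint for `m_x = 2 ** 0.5` and the interior value 0·6^{1/2} for `m_x = 0.5`, in line with the two cases discussed above; for `m_x = 0` it returns the true value 1.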

To induce unequal variances, we let nk = 50, 100, 400 and 800 for every four studies, and in each study let Xki take the value 0 with a probability drawn from the Un(0·05, 0·5) distribution and the value 2 otherwise. We set αk = 0 (k = 1, …, K), β0 = 0·5, σ0² = 1, σm² = 0, σM² = ∞ and n = 200. Then the asymptotic relative efficiency of β̂MA to β̂ML is approximately 1·2 according to Theorem 2. Using Monte Carlo simulation, we found the empirical relative efficiencies to be approximately 1·037, 1·083, 1·125, 1·171 and 1·218 for K = 100, 200, 300, 400 and 500, respectively. The distributions of β̂ML, β̂MA, $\hat\sigma^2_{\mathrm{ML}}$ and $\hat\sigma^2_{\mathrm{MA}}$ for K = 300 are displayed in Fig. 1. The empirical means of β̂ML, β̂MA, $\hat\sigma^2_{\mathrm{ML}}$ and $\hat\sigma^2_{\mathrm{MA}}$ are 0·50, 0·50, 0·00076 and 1·00, respectively, and the corresponding standard errors are 0·0073, 0·0069, 0·0263 and 0·169.

Fig. 1. Estimated density functions of: (a) β̂ML and β̂MA; (b) $\hat\sigma^2_{\mathrm{ML}}$ and $\hat\sigma^2_{\mathrm{MA}}$ in simple linear regression. In each panel, the solid curve corresponds to the maximum likelihood estimator and the dashed curve to the meta-analysis estimator.

4. Inference procedures

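When K is large, we make inference about β0 by using the asymptotic normality of β̂ML and β̂MA described in Theorem 2. The asymptotic variance of $(nK)^{1/2}(\hat\beta_{\mathrm{ML}} - \beta_0)$ can be consistently estimated by $\hat\Sigma(\hat\sigma^2_{\mathrm{ML}}, \hat\sigma^2_{\mathrm{MA}})^{-1}\, \hat\Sigma(\hat\sigma^2_{\mathrm{ML}}, \hat\sigma^2_{\mathrm{ML}})\, \hat\Sigma(\hat\sigma^2_{\mathrm{ML}}, \hat\sigma^2_{\mathrm{MA}})^{-1}$, where

$$\hat\Sigma(\sigma_1^2, \sigma_2^2) = \frac{1}{K} \sum_{k=1}^{K} \frac{p_k (n_k \hat V_k + p_k \hat\sigma^2_{\mathrm{MA}})}{(n_k \hat V_k + p_k\sigma_1^2)(n_k \hat V_k + p_k\sigma_2^2)}.$$

The asymptotic variance of $(nK)^{1/2}(\hat\beta_{\mathrm{MA}} - \beta_0)$ can be consistently estimated by $\hat\Sigma(\hat\sigma^2_{\mathrm{MA}}, \hat\sigma^2_{\mathrm{MA}})^{-1}$. The normal approximation based on this variance estimator is equivalent to the normal approximation given in (3), which is the DerSimonian–Laird method.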
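The large-K variance estimators can be sketched as follows. The code assumes that `n` is the reference sample size used to define pk = nk/n and that the variance-component estimates on the σ² scale are supplied by the caller; the function names `sigma_hat`, `var_beta_ma` and `var_beta_ml` are hypothetical.

```python
import numpy as np


def sigma_hat(v_hat, n_k, n, s1, s2, sigma_ma_sq):
    """Plug-in estimate of the limit function evaluated at (s1, s2)."""
    v_hat = np.asarray(v_hat, dtype=float)
    n_k = np.asarray(n_k, dtype=float)
    p = n_k / n                      # p_k = n_k / n
    nv = n_k * v_hat                 # n_k * V_hat_k estimates v_k
    num = p * (nv + p * sigma_ma_sq)
    return np.mean(num / ((nv + p * s1) * (nv + p * s2)))


def var_beta_ma(v_hat, n_k, n, sigma_ma_sq):
    """Estimated variance of beta_MA (not rescaled by nK)."""
    K = len(v_hat)
    s = sigma_hat(v_hat, n_k, n, sigma_ma_sq, sigma_ma_sq, sigma_ma_sq)
    return 1.0 / (n * K * s)


def var_beta_ml(v_hat, n_k, n, sigma_ml_sq, sigma_ma_sq):
    """Estimated sandwich variance of beta_ML (not rescaled by nK)."""
    K = len(v_hat)
    cross = sigma_hat(v_hat, n_k, n, sigma_ml_sq, sigma_ma_sq, sigma_ma_sq)
    same = sigma_hat(v_hat, n_k, n, sigma_ml_sq, sigma_ml_sq, sigma_ma_sq)
    return same / cross ** 2 / (n * K)
```

Because nk V̂k + pk σ² = nk(V̂k + σ²/n), the value returned by `var_beta_ma` coincides with the reciprocal of the summed DerSimonian–Laird weights, which is the equivalence with (3) noted above.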

When K is small, the normal approximations to the distributions of β̂ML and β̂MA are no longer accurate, so we appeal to Theorem 1. Without knowledge of σ0², however, it is not possible to estimate the asymptotic distributions given in Theorem 1. To deal with this problem, we propose a double resampling approach to meta-analysis. Theorem 1 shows that, for fixed K, the asymptotic distribution of $n^{1/2}(\hat\beta_{\mathrm{MA}} - \beta_0)$ depends on the asymptotic distribution of $\hat\sigma^2_{\mathrm{MA}}$. Thus, we first simulate σ² from the distribution of 𝒜MA, in which $Z_k \sim N(0, n_k \hat V_k + p_k \hat\sigma^2_{\mathrm{MA}})$. For each sampled σ², say σs², we then generate β as

$$\hat\beta_{\mathrm{MA}} + n^{-1/2} \Bigl(\sum_{k=1}^{K} \frac{p_k}{n_k \hat V_k + p_k\sigma_s^2}\Bigr)^{-1} \sum_{k=1}^{K} \frac{p_k^{1/2} Z_k}{n_k \hat V_k + p_k\sigma_s^2},$$

where $Z_k \sim N(0, n_k \hat V_k + p_k\sigma_s^2)$. Operationally, this resampling procedure is equivalent to the following: simulate τ² from equation (2), in which $\hat\beta_k \sim N(\hat\beta_{\mathrm{MA}}, \hat V_k + \hat\tau^2_{\mathrm{MA}})$; for each sampled τ², say τs², simulate β from equation (1), in which $\hat\beta_k \sim N(\hat\beta_{\mathrm{MA}}, \hat V_k + \tau_s^2)$ and $\hat w_k = (\hat V_k + \tau_s^2)^{-1}$. We repeat this process B times to obtain B² values of β. The empirical distribution of those values is used to make inference about β.
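The operational form of the resampling procedure can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the count of resampled values is read as a fixed number of draws of β for each sampled τ² (arguments `n_outer` and `n_inner`), and the function names `dl_tau2` and `double_resample` are hypothetical.

```python
import numpy as np


def dl_tau2(beta_hat, v_hat):
    """Equation (2); beta_hat may be 1-D (K,) or 2-D (replicates, K)."""
    inv_v = 1.0 / v_hat
    centre = (inv_v * beta_hat).sum(axis=-1, keepdims=True) / inv_v.sum()
    q = (inv_v * (beta_hat - centre) ** 2).sum(axis=-1)
    denom = inv_v.sum() - (inv_v ** 2).sum() / inv_v.sum()
    return np.maximum(0.0, (q - (v_hat.size - 1)) / denom)


def double_resample(beta_hat, v_hat, n_outer=200, n_inner=200, seed=0):
    """Return beta_MA and n_outer * n_inner resampled values of beta."""
    rng = np.random.default_rng(seed)
    beta_hat = np.asarray(beta_hat, dtype=float)
    v_hat = np.asarray(v_hat, dtype=float)
    K = beta_hat.size

    tau2_ma = dl_tau2(beta_hat, v_hat)
    w = 1.0 / (v_hat + tau2_ma)
    beta_ma = np.sum(w * beta_hat) / np.sum(w)

    # Stage 1: resample the study estimates and recompute tau^2 via (2).
    stage1 = rng.normal(beta_ma, np.sqrt(v_hat + tau2_ma), size=(n_outer, K))
    tau2_s = dl_tau2(stage1, v_hat)

    # Stage 2: for each sampled tau^2, resample again and apply (1).
    draws = np.empty((n_outer, n_inner))
    for s, t2 in enumerate(tau2_s):
        w_s = 1.0 / (v_hat + t2)
        sim = rng.normal(beta_ma, np.sqrt(v_hat + t2), size=(n_inner, K))
        draws[s] = sim @ w_s / w_s.sum()
    return beta_ma, draws.ravel()
```

One way to turn the draws into a 95% interval, assumed here rather than taken from the text, is to use their empirical quantiles, for example `np.quantile(draws, [0.025, 0.975])`.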

5. Simulation studies

We used simulation to evaluate the proposed inference procedures. To facilitate comparisons with existing methods, we adopted the simulation set-up of Brockwell & Gordon (2001), which corresponds to a typical scenario of estimating a log odds ratio. Specifically, we generated the parameter estimates as

$$\hat\beta_k = \beta + N(0, \tau^2 + \hat V_k) \quad (k = 1, \ldots, K),$$

where β = 0·5 and the V̂k (k = 1, …, K) are realizations from a $\chi^2_1$ distribution multiplied by 0·25 and then restricted to lie within the interval (0·009, 0·6). We varied τ² from 0 to 0·1 and K from 5 to 50. The corresponding values of I² (Higgins et al., 2003) are displayed in the Supplementary Material. For each simulated dataset, we obtained the 95% confidence intervals for β using the new resampling approach with B = 1000, the DerSimonian–Laird method, the profile likelihood method of Hardy & Thompson (1996), and the resampling method of Jackson & Bowden (2009). The coverage probabilities based on 10 000 replicates are shown in Fig. 2 and the Supplementary Material; the corresponding mean widths are shown in the Supplementary Material.
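The data-generating step of this set-up can be sketched as follows, together with an empirical coverage check for the DerSimonian–Laird interval only. Two details are assumptions: the restriction to (0·009, 0·6) is implemented by rejection sampling, and the variances are redrawn in every replicate. The function names `draw_variances`, `dl_interval` and `dl_coverage` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def draw_variances(K, rng):
    """Draw K values of 0.25 * chi-square(1), keeping those in (0.009, 0.6)."""
    out = []
    while len(out) < K:
        cand = 0.25 * rng.chisquare(1, size=K)
        out.extend(cand[(cand > 0.009) & (cand < 0.6)])
    return np.array(out[:K])


def dl_interval(beta_hat, v_hat, level=0.95):
    """DerSimonian-Laird interval from equations (1)-(3)."""
    K = beta_hat.size
    inv_v = 1.0 / v_hat
    centre = np.sum(inv_v * beta_hat) / np.sum(inv_v)
    q = np.sum(inv_v * (beta_hat - centre) ** 2)
    denom = np.sum(inv_v) - np.sum(inv_v ** 2) / np.sum(inv_v)
    tau2 = max(0.0, (q - (K - 1)) / denom)
    w = 1.0 / (v_hat + tau2)
    beta_ma = np.sum(w * beta_hat) / np.sum(w)
    half = NormalDist().inv_cdf(0.5 + level / 2.0) * np.sum(w) ** -0.5
    return beta_ma - half, beta_ma + half


def dl_coverage(K, tau2, beta=0.5, n_rep=2000, seed=0):
    """Empirical coverage of the DerSimonian-Laird interval."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        v_hat = draw_variances(K, rng)
        beta_hat = beta + rng.normal(0.0, np.sqrt(tau2 + v_hat))
        lo, hi = dl_interval(beta_hat, v_hat)
        hits += int(lo <= beta <= hi)
    return hits / n_rep
```

Calling `dl_coverage(10, 0.07)` gives a Monte Carlo estimate of the coverage for K = 10 and τ² = 0·07; the other methods compared in this section would need their own interval functions.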

Fig. 2. Empirical coverage probabilities of nominal 95% confidence intervals plotted against τ² for (a) K = 10 and (b) K = 20, and plotted against K for (c) τ² = 0·03 and (d) τ² = 0·07. In each panel, the different curves correspond to the new resampling method (solid), the DerSimonian–Laird method (dashed), the Jackson–Bowden method (dotted), and the Hardy–Thompson method (dot-dash).

The new method has reasonable coverage probabilities, especially when τ² is small or K is large. Its coverage probabilities are always higher than those of the DerSimonian–Laird method. The differences between the two methods become smaller as K increases. Indeed, both methods have correct coverage probabilities when K is 50. The Jackson–Bowden method is very conservative. The new method has better coverage than the profile likelihood method when K is small and τ² is not too small. By comparing our Fig. 2 with Fig. 4 of Brockwell & Gordon (2001), Fig. 1 of Sidik & Jonkman (2002), and Fig. 1 of Henmi & Copas (2010), we see that the new method outperforms the other methods. The confidence intervals of the new and profile likelihood methods have similar widths, and both are wider than those of the DerSimonian–Laird method.

6. Real-data example

The erythropoiesis-stimulating agents erythropoietin and darbepoetin are approved to treat chemotherapy-associated anaemia in patients with nonmyeloid malignancies. To evaluate mortality rates associated with the administration of these agents for the treatment of anaemia in cancer patients, Bennett et al. (2008) conducted a systematic review of 52 phase III clinical trials with 13 611 patients that compared the erythropoiesis-stimulating agents with placebo or standard care with respect to mortality. The estimated hazard ratios and 95% confidence intervals are shown in Fig. 2 of Bennett et al. (2008). Using the DerSimonian–Laird method, Bennett et al. (2008) obtained a hazard ratio estimate of 1·10 with a 95% confidence interval of (1·01, 1·20), which raised concerns about the safety of administering these agents to patients with cancer. Our resampling method yields a 95% confidence interval of (1·006, 1·201), which is very close to the DerSimonian–Laird counterpart. The corresponding intervals are (0·973, 1·242) and (0·986, 1·230) for the Jackson–Bowden and Hardy–Thompson methods, respectively.

Bennett et al. (2008) also applied the DerSimonian–Laird method to a subset of six trials consisting of 2089 patients who did not receive chemotherapy or radiation therapy and obtained a hazard ratio estimate of 1·29 with a 95% confidence interval of (1·00, 1·67). Our resampling method yields a 95% confidence interval of (0·851, 1·963), which is considerably wider. In this case, τ^MA2=0, so the DerSimonian–Laird method is actually the same as the fixed-effects method. By contrast, our resampling approach accounts for the variation in the estimation of τ2 and thus will not reduce to the fixed-effects method even when the point estimate of τ2 is zero. The Jackson–Bowden and Hardy–Thompson methods yield intervals of (0·798, 2·094) and (0·703, 1·792), respectively.

7. Remarks

Effect sizes tend to vary among study populations because of differences in demographic and environmental factors. Furthermore, the treatments or outcomes may not be identical across clinical trials, so the treatment effects may differ even for similar patient populations. In genetic association studies, different definitions and measurements of phenotypes, as well as different collections and manipulations of genotype data, also contribute to between-study heterogeneity. Thus, it is important to allow for heterogeneity through the use of random-effects models, especially when one is interested in parameter estimation rather than hypothesis testing. The confidence intervals under fixed-effects models have extremely poor coverage under even mild heterogeneity (e.g., Brockwell & Gordon, 2001; Henmi & Copas, 2010).

The prevailing approach to random-effects meta-analysis is the DerSimonian–Laird method. Indeed, DerSimonian & Laird’s 1986 paper has been cited more than 10 000 times in the Web of Science database. Our paper provides a rigorous asymptotic theory for the DerSimonian–Laird estimator in cases where the number of studies is fixed or divergent. In addition, we propose a resampling technique that yields more accurate inference than the commonly adopted normal approximation.

In most applications, the number of studies is much smaller than the study sizes, so that Condition 5 holds. It would be interesting to consider situations in which K diverges at the same rate as n or nr for some r ≥ 1/2. Such an extension would require careful examination of the higher-order expansion in the quadratic approximation to the profile loglikelihood of each study.

The random-effects model proposed by DerSimonian & Laird (1986) and considered in this paper assumes that the random effect is normally distributed. Additional simulation studies have revealed that our resampling procedure performs satisfactorily under other random-effects distributions; see Fig. 3. Recently, Wang et al. (2010) proposed a nonparametric inference procedure for the percentiles of the random-effects distribution. Their method works well for small K but may not be statistically efficient.

Fig. 3. Empirical coverage probabilities of nominal 95% confidence intervals when the random effects are from the τ × Ga(1, 1) distribution centred at its mean, plotted against τ² for (a) K = 10 and (b) K = 20, and plotted against K for (c) τ² = 0·03 and (d) τ² = 0·07. In each panel, the different curves correspond to the new resampling method (solid), the DerSimonian–Laird method (dashed), the Jackson–Bowden method (dotted), and the Hardy–Thompson method (dot-dash).

For ethical and logistical reasons, individual participant data are not as easily accessible as summary statistics. Our work shows that it is not necessary to collect individual participant data. In fact, maximum likelihood analysis of individual participant data can give worse results than meta-analysis of summary statistics: the maximum likelihood estimator of σ² may not be consistent as K → ∞, and the maximum likelihood estimator of β is never more efficient than the meta-analysis estimator.

We have assumed that β is a scalar; however, all our results can be extended to multivariate random-effects models (e.g., Jackson et al., 2010; Chen et al., 2012). Specifically, the meta-analysis estimator of β still takes the form of (1), whereas expression (2) is replaced by an estimator of the covariance matrix of the random effects (e.g., Chen et al., 2012). The basic conclusions of Theorems 1 and 2 continue to hold; a key change in the proofs is to approximate the profile likelihood function by a quadratic function based on the multivariate version of the Laplace approximation. Our resampling procedure remains the same except that multivariate versions of the estimators are used.

Our work can be applied to the analysis of large clusters in other contexts. The most rigorous theory for the analysis of large clusters was provided by Bellamy et al. (2005), who quantified the bias of the penalized quasilikelihood estimator of the cluster-level covariate effect in generalized linear mixed models for group-randomized trials under the assumption of a known variance component. Our framework covers this case with an unknown variance component upon setting βk = Xkγ + ξk (k = 1, …, K), where Xk is a group-level treatment indicator. More generally, we may assume that βk = Xkγk (k = 1, …, K), where γk = γ + ξk and Xk is a cluster-level covariate. Our theory suggests that meta-analysis of summary statistics is preferable to maximum likelihood analysis of individual participant data in such situations.

Acknowledgments

This research was supported by the U.S. National Institutes of Health. The authors thank the editor, an associate editor and two referees for careful reviews and helpful comments.

Appendix. Proofs of Theorems 1 and 2

The proofs of both theorems rely on a local expansion of the loglikelihood function around the maximum likelihood estimators. By Condition 2 and Theorem 2.83 of van der Vaart & Wellner (1996), the class of functions {log fk(𝒪; βk, ηk) : (βk, ηk) ∈ 𝒞1 × 𝒞2k} and the corresponding classes of first and second derivatives are uniformly Donsker and Glivenko–Cantelli with respect to any probability measure. We will use this fact repeatedly. Let ℘nk denote the empirical measure for (𝒪k1, …, 𝒪k,nk) conditional on ξk, and let ℘k be the conditional expectation under the density fk(· ; βk0, ηk0), where βk0 is the true value of βk. Then the kth summand in (4) can be written as

$$l_k(\beta, \sigma^2, \eta_k) = -\frac12 \log\frac{2\pi\sigma^2}{n} + \log \int_{\xi_k} \exp\Bigl( -n_k \Bigl[ \wp_{nk}\{-\log f_k(\mathcal{O}; \beta + \xi_k, \eta_k)\} + \frac{n \xi_k^2}{2 n_k \sigma^2} \Bigr] \Bigr)\, d\xi_k.$$

Consider an open set 𝒩 for (β, η1, …, ηK, σ²) with |β − β0| ≤ M(nK)^{−1/2}, ‖ηk − ηk0‖ ≤ Mn^{−1/2} (k = 1, …, K) for some large M to be chosen later, and σ² ∈ 𝒞3. We will show that with probability tending to 1, there exists a local maximizer of $\sum_{k=1}^{K} l_k(\beta, \sigma^2, \eta_k)$ in this neighbourhood. The proof of existence consists of four main steps. First, we obtain a Laplace approximation to the integral in lk(β, σ², ηk). Second, we show that for each (β, σ²) there exist estimators η̂k(β, σ²) (k = 1, …, K) which maximize $\sum_{k=1}^{K} l_k(\beta, \sigma^2, \eta_k)$, and we obtain the profile loglikelihood function $\sum_{k=1}^{K} pl_k(\beta, \sigma^2)$, where plk(β, σ²) = lk{β, σ², η̂k(β, σ²)}. Third, we show that there exists an estimator β̂ML(σ²) which maximizes $\sum_{k=1}^{K} pl_k(\beta, \sigma^2)$ for each σ². Finally, we show that there exists an estimator for σ², denoted by $\hat\sigma^2_{\mathrm{ML}}$, which maximizes $\sum_{k=1}^{K} pl_k\{\hat\beta_{\mathrm{ML}}(\sigma^2), \sigma^2\}$, so we obtain $\hat\beta_{\mathrm{ML}} = \hat\beta_{\mathrm{ML}}(\hat\sigma^2_{\mathrm{ML}})$ and $\hat\eta_k = \hat\eta_k(\hat\beta_{\mathrm{ML}}, \hat\sigma^2_{\mathrm{ML}})$. These four steps are detailed in the Supplementary Material.

Proof of Theorem 1

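When K is fixed, $\hat\sigma^2_{\mathrm{ML}}$ does not converge to any constant. It follows from equation (S5) in the Supplementary Material and the arg max continuous mapping theorem (van der Vaart & Wellner, 1996, Theorem 3.2.2) that

$$\hat\sigma^2_{\mathrm{ML}} \to \arg\max_{\sigma^2} \Biggl\{ -\frac12 \sum_{k=1}^{K} \log(I_{\beta k}^{-1} + p_k\sigma^2) - \frac12 \sum_{k=1}^{K} \frac{Z_k^2}{v_k + p_k\sigma^2} + \frac12 \Bigl(\sum_{k=1}^{K} \frac{p_k}{v_k + p_k\sigma^2}\Bigr)^{-1} \Bigl(\sum_{k=1}^{K} \frac{p_k^{1/2} Z_k}{v_k + p_k\sigma^2}\Bigr)^{2} \Biggr\}$$

in distribution. It then follows from equation (S4) in the Supplementary Material that

$$n^{1/2}(\hat\beta_{\mathrm{ML}} - \beta_0) \to \Bigl(\sum_{k=1}^{K} \frac{p_k}{v_k + p_k \mathcal{A}_{\mathrm{ML}}}\Bigr)^{-1} \sum_{k=1}^{K} \frac{p_k^{1/2} Z_k}{v_k + p_k \mathcal{A}_{\mathrm{ML}}}$$

in distribution. Likewise, the asymptotic distribution for ($\hat\sigma^2_{\mathrm{MA}}$, β̂MA) can be obtained from the arg max continuous mapping theorem.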

Proof of Theorem 2

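We examine the random process Qn(σ²) given in (S5) of the Supplementary Material. Let

$$Z_{nk}(\sigma^2) = \frac{n_k \{\xi_{k0} + v_k (\wp_{nk} - \wp_k)(S_k)\}}{n^{1/2}(v_k + p_k\sigma^2)}.$$

Then

$$Q_n(\sigma^2) = -\frac12 \sum_{k=1}^{K} \log(p_k\sigma^2 I_{\beta k} + 1) - \frac12 \sum_{k=1}^{K} \frac{n}{n_k} Z_{nk}^2(\sigma^2)(v_k + p_k\sigma^2) + \frac{n}{2} \Bigl(\sum_{k=1}^{K} \frac{n_k}{v_k + p_k\sigma^2}\Bigr)^{-1} \Bigl\{\sum_{k=1}^{K} Z_{nk}(\sigma^2)\Bigr\}^2.$$

Consider the random process $K^{-1/2} \sum_{k=1}^{K} Z_{nk}(\sigma^2)$ indexed by σ². We verify the conditions in Theorem 2.11.1 of van der Vaart & Wellner (1996). First, we verify the Lindeberg condition,

$$\sum_{k=1}^{K} K^{-1} E\Bigl[ \max_{\sigma^2 \in (\sigma_m^2, \sigma_M^2)} Z_{nk}^2(\sigma^2)\, I\Bigl\{ K^{-1/2} \max_{\sigma^2 \in (\sigma_m^2, \sigma_M^2)} |Z_{nk}(\sigma^2)| > \delta \Bigr\} \Bigr] \to 0, \quad \delta > 0.$$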

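By the Markov inequality, the left-hand side is bounded by $\sum_k E\{Z_{nk}^4(\sigma_m^2)\}/(\delta K)^2$. In addition,

$$\begin{aligned}
E(\xi_{k0}^4) &= 2\sigma^4/n^2, \\
E[\{(\wp_{nk} - \wp_k)(S_k)\}^4] &= n_k^{-4} \{E(S_k^4) n_k + n_k(n_k - 1) v_k\} \{1 + O(n^{-1})\}, \\
E[\xi_{k0}^2 \{(\wp_{nk} - \wp_k)(S_k)\}^2] &= E(\xi_{k0}^2\, E[\{(\wp_{nk} - \wp_k)(S_k)\}^2 \mid \xi_{k0}]) = E\{\xi_{k0}^2\, \mathrm{var}(S_k \mid \xi_{k0})/n_k\} = v_k \sigma^2 (n n_k)^{-1} \{1 + O(n^{-1})\}, \\
E\{\xi_{k0}^3 (\wp_{nk} - \wp_k)(S_k)\} &= E[\xi_{k0}^3\, E\{(\wp_{nk} - \wp_k)(S_k) \mid \xi_{k0}\}] = 0, \\
\bigl| E[\xi_{k0} \{(\wp_{nk} - \wp_k)(S_k)\}^3] \bigr| &= \bigl| E(\xi_{k0}\, E[\{(\wp_{nk} - \wp_k)(S_k)\}^3 \mid \xi_{k0}]) \bigr| \le E\{ |\xi_{k0}|\, n_k^{-2} E(|S_k|^3 \mid \xi_{k0}) \} \le c_3 n^{-5/2}
\end{aligned}$$

for some constant c3. Thus, $\sum_k E\{Z_{nk}^4(\sigma_m^2)\}/(\delta K)^2 \le c_4 K^{-1} \to 0$, where c4 is a constant. For the second condition in Theorem 2.11.1 of van der Vaart & Wellner (1996), we note that, by the mean value theorem,

$$K^{-1} \sum_{k=1}^{K} E[\{Z_{nk}(\sigma_1^2) - Z_{nk}(\sigma_2^2)\}^2] \le |\sigma_1^2 - \sigma_2^2|^2 \sum_{k=1}^{K} E\Bigl[ \frac{p_k^2 n_k \{\xi_{k0} + v_k(\wp_{nk} - \wp_k)(S_k)\}^2}{(v_k + p_k\sigma_m^2)^2} \Bigr] \to 0$$

as |σ1² − σ2²| → 0. Since 𝒞3 is one-dimensional, it is easy to see that the random entropy condition in Theorem 2.11.1 holds. In light of Condition 6, Theorem 2.11.1 implies that $K^{-1/2} \sum_{k=1}^{K} Z_{nk}(\sigma^2)$ converges weakly to a zero-mean Gaussian process with covariance function Σ(σ1², σ2²) between σ1² and σ2². Thus, the third term of Qn(σ²) is op(K) uniformly in σ². Similarly, for the second term of Qn(σ²),

$$-\frac{1}{2K} \sum_{k=1}^{K} \frac{n}{n_k} Z_{nk}^2(\sigma^2)(v_k + p_k\sigma^2) \to -\frac12 \lim_{K\to\infty} \frac{1}{K} \sum_{k=1}^{K} \frac{v_k + p_k\sigma_0^2}{v_k + p_k\sigma^2}$$

in probability. Combining the above results, we conclude that, in probability,

$$\frac{1}{K} \Bigl\{ Q_n(\sigma^2) + \frac12 \sum_{k=1}^{K} \log(I_{\beta k}) \Bigr\} \to \frac12 g(\sigma^2),$$

which has a unique maximum at σr² according to Condition 6.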

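We are ready to show that $\hat\sigma^2_{\mathrm{ML}} \to \sigma_r^2$ in probability. Since $\hat\sigma^2_{\mathrm{ML}}$ is bounded, we can always choose a further subsequence from any subsequence, still denoted by $\hat\sigma^2_{\mathrm{ML}}$, such that $\hat\sigma^2_{\mathrm{ML}} \to \sigma_*^2$ for some limit point $\sigma_*^2$. Because $\sum_{k=1}^{K} pl_k\{\hat\beta_{\mathrm{ML}}(\hat\sigma^2_{\mathrm{ML}}), \hat\sigma^2_{\mathrm{ML}}\} \ge \sum_{k=1}^{K} pl_k\{\hat\beta_{\mathrm{ML}}(\sigma_r^2), \sigma_r^2\}$, it follows from equation (S5) in the Supplementary Material that

$$\frac{1}{K} \Bigl\{ Q_n(\hat\sigma^2_{\mathrm{ML}}) + \frac12 \sum_{k=1}^{K} \log(I_{\beta k}) \Bigr\} \ge \frac{1}{K} \Bigl\{ Q_n(\sigma_r^2) + \frac12 \sum_{k=1}^{K} \log(I_{\beta k}) \Bigr\} - O_p(1)\, n^{-1/2}.$$

Taking the limit on both sides yields $g(\sigma_*^2) \ge g(\sigma_r^2)$. Therefore $\sigma_*^2 = \sigma_r^2$.

In light of (S4) in the Supplementary Material, for the local maximum likelihood estimator $\hat\beta_{\mathrm{ML}} = \hat\beta_{\mathrm{ML}}(\hat\sigma^2_{\mathrm{ML}})$, we have

$$\begin{aligned}
(Kn)^{1/2}(\hat\beta_{\mathrm{ML}} - \beta_0) &= (Kn)^{1/2} \Bigl(\sum_{k=1}^{K} \frac{n_k}{v_k + p_k\hat\sigma^2_{\mathrm{ML}}}\Bigr)^{-1} \Bigl[ \sum_{k=1}^{K} \frac{n_k}{v_k + p_k\hat\sigma^2_{\mathrm{ML}}} \{\xi_{k0} + v_k(\wp_{nk} - \wp_k)(S_k)\} \Bigr] + o_p(1) \\
&= \Bigl( \frac{1}{nK} \sum_{k=1}^{K} \frac{n_k}{v_k + p_k\hat\sigma^2_{\mathrm{ML}}} \Bigr)^{-1} \Bigl\{ \sum_{k=1}^{K} K^{-1/2} Z_{nk}(\hat\sigma^2_{\mathrm{ML}}) \Bigr\} + o_p(1).
\end{aligned}$$

By Theorem 2.11.1 of van der Vaart & Wellner (1996) and the convergence of $\hat\sigma^2_{\mathrm{ML}}$ to σr²,

$$(Kn)^{1/2}(\hat\beta_{\mathrm{ML}} - \beta_0) \to N\bigl\{0,\ \Sigma(\sigma_r^2, \sigma_0^2)^{-1}\, \Sigma(\sigma_r^2, \sigma_r^2)\, \Sigma(\sigma_r^2, \sigma_0^2)^{-1}\bigr\}$$

in distribution.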

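We now derive the asymptotic distribution of β̂MA. Because $\hat\sigma^2_{\mathrm{MA}} \to \sigma_0^2$ in probability as K → ∞, we have $\sup_k |\hat w_k/n - w_k| = o_p(1)$, where $w_k = p_k(v_k + p_k\sigma_0^2)^{-1}$. Thus,

$$(Kn)^{1/2}(\hat\beta_{\mathrm{MA}} - \beta_0) = \frac{\sum_{k=1}^{K} K^{1/2} (\hat w_k/n)\, n^{1/2}(\hat\beta_k - \beta_0)}{\sum_{k=1}^{K} (\hat w_k/n)} = \frac{\sum_{k=1}^{K} K^{1/2} w_k\, n^{1/2}(\hat\beta_k - \beta_0)}{\sum_{k=1}^{K} w_k} \{1 + o_p(1)\}.$$

Since $\max_k w_k / \sum_k w_k \to 0$ by Condition 4, the Lindeberg–Feller central limit theorem yields that, in distribution,

$$\frac{\sum_{k=1}^{K} K^{1/2} w_k\, n^{1/2}(\hat\beta_k - \beta_0)}{\sum_{k=1}^{K} w_k} \to N\{0, \Sigma(\sigma_0^2, \sigma_0^2)^{-1}\}.$$

By the Cauchy–Schwarz inequality, $\Sigma(\sigma_r^2, \sigma_0^2)^{-1}\, \Sigma(\sigma_r^2, \sigma_r^2)\, \Sigma(\sigma_r^2, \sigma_0^2)^{-1} \ge \Sigma(\sigma_0^2, \sigma_0^2)^{-1}$, and equality holds if and only if (ṽk + σr²) is proportional to (ṽk + σ0²) for k = 1, …, K. This condition is met if σr² = σ0² or the ṽk (k = 1, …, K) are all equal. The lower bound is the asymptotic variance of β̂MA. Thus β̂MA has a smaller asymptotic variance than β̂ML, unless σr² = σ0² or the ṽk are all the same.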
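The Cauchy–Schwarz step can be made explicit. Writing Σ(σ1², σ2²) for the limit function defined in Theorem 2, and introducing the auxiliary sequences x_k and y_k purely for this illustration, one choice that yields the inequality is:

```latex
x_k = \frac{\{p_k (v_k + p_k\sigma_0^2)\}^{1/2}}{v_k + p_k\sigma_r^2},
\qquad
y_k = \Bigl(\frac{p_k}{v_k + p_k\sigma_0^2}\Bigr)^{1/2},
\\[4pt]
x_k y_k = \frac{p_k}{v_k + p_k\sigma_r^2},
\qquad
x_k^2 = \frac{p_k (v_k + p_k\sigma_0^2)}{(v_k + p_k\sigma_r^2)^2},
\qquad
y_k^2 = \frac{p_k}{v_k + p_k\sigma_0^2},
\\[4pt]
\Bigl(\frac{1}{K}\sum_{k=1}^{K} x_k y_k\Bigr)^2
  \le \Bigl(\frac{1}{K}\sum_{k=1}^{K} x_k^2\Bigr)\Bigl(\frac{1}{K}\sum_{k=1}^{K} y_k^2\Bigr)
\;\Longrightarrow\;
\Sigma(\sigma_r^2,\sigma_0^2)^2 \le \Sigma(\sigma_r^2,\sigma_r^2)\,\Sigma(\sigma_0^2,\sigma_0^2).
```

Dividing both sides by Σ(σr², σ0²)² Σ(σ0², σ0²) gives the stated bound. Equality requires x_k to be proportional to y_k, that is, (v_k + p_kσ_r²) proportional to (v_k + p_kσ_0²), which after division by p_k is the proportionality condition on ṽ_k + σ_r² and ṽ_k + σ_0².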

Footnotes

Supplementary material

Supplementary material available at Biometrika online includes additional technical and simulation results.

Contributor Information

D. ZENG, Email: dzeng@bios.unc.edu.

D. Y. LIN, Email: lin@bios.unc.edu.

References

  1. Bellamy SL, Li Y, Lin X, Ryan LM. Quantifying PQL bias in estimating cluster-level covariate effects in generalized linear mixed models for group-randomized trials. Statist Sinica. 2005;15:1015–32.
  2. Bennett CL, Silver SM, Djulbegovic B, Samaras AT, Blau CA, Gleason KJ, Barnato SE, Elverman KM, Courtney DM, McKoy JM, Edwards BJ, Tigue CC, Raisch DW, Yarnold PR, Dorr DA, Kuzel TM, Tallman MS, Trifilio SM, West DP, Lai SY, et al. Venous thromboembolism and mortality associated with recombinant erythropoietin and darbepoetin administration for the treatment of cancer-associated anemia. J Am Med Assoc. 2008;299:914–24. doi: 10.1001/jama.299.8.914.
  3. Biggerstaff BJ, Tweedie RL. Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis. Statist Med. 1997;16:753–68. doi: 10.1002/(sici)1097-0258(19970415)16:7<753::aid-sim494>3.0.co;2-g.
  4. Brockwell SE, Gordon IR. A comparison of statistical methods for meta-analysis. Statist Med. 2001;20:825–40. doi: 10.1002/sim.650.
  5. Chalmers I, Sandercock P, Wennberg J. The Cochrane collaboration: Preparing, maintaining, and disseminating systematic reviews of the effects of health care. Ann New York Acad Sci. 1993;703:156–65. doi: 10.1111/j.1749-6632.1993.tb26345.x.
  6. Chen H, Manning AK, Dupuis J. A method of moments estimator for random effect multivariate meta-analysis. Biometrics. 2012;68:1278–84. doi: 10.1111/j.1541-0420.2012.01761.x.
  7. Cox DR, Hinkley DV. Theoretical Statistics. New York: Chapman & Hall; 1974.
  8. DerSimonian R, Laird N. Meta-analysis in clinical trials. Contr Clin Trials. 1986;7:177–88. doi: 10.1016/0197-2456(86)90046-2.
  9. Evangelou E, Ioannidis JP. Meta-analysis methods for genome-wide association studies and beyond. Nature Rev Genet. 2013;14:379–89. doi: 10.1038/nrg3472.
  10. Hardy RJ, Thompson SG. A likelihood approach to meta-analysis with random effects. Statist Med. 1996;15:619–29. doi: 10.1002/(SICI)1097-0258(19960330)15:6<619::AID-SIM188>3.0.CO;2-A.
  11. Henmi M, Copas JB. Confidence intervals for random effects meta-analysis and robustness to publication bias. Statist Med. 2010;29:2969–83. doi: 10.1002/sim.4029.
  12. Higgins J, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. Br Med J. 2003;327:557–60. doi: 10.1136/bmj.327.7414.557.
  13. Jackson D, Bowden J. A re-evaluation of the ‘quantile approximation method’ for random effects meta-analysis. Statist Med. 2009;28:338–48. doi: 10.1002/sim.3487.
  14. Jackson D, White IR, Thompson SG. Extending DerSimonian and Laird’s methodology to perform multivariate random effects meta-analyses. Statist Med. 2010;29:1282–97. doi: 10.1002/sim.3602.
  15. Lin DY, Zeng D. On the relative efficiency of using summary statistics versus individual level data in meta-analysis. Biometrika. 2010;97:321–32. doi: 10.1093/biomet/asq006.
  16. Lin DY, Zeng D, Tang ZZ. Quantitative trait analysis in sequencing studies with trait-dependent sampling. Proc Nat Acad Sci. 2013;110:12247–52. doi: 10.1073/pnas.1221713110.
  17. Mathew T, Nordstrom K. On the equivalence of meta-analysis using literature and using individual patient data. Biometrics. 1999;55:1221–3. doi: 10.1111/j.0006-341x.1999.01221.x.
  18. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, Hirschhorn JN. Genome-wide association studies for complex traits: Consensus, uncertainty and challenges. Nature Rev Genet. 2008;9:356–69. doi: 10.1038/nrg2344.
  19. Olkin I, Sampson A. Comparison of meta-analysis versus analysis of variance of individual patient data. Biometrics. 1998;54:317–22.
  20. Psychiatric GWAS Consortium Steering Committee. A framework for interpreting genome-wide association studies of psychiatric disorders. Molec Psychiatry. 2009;14:10–17. doi: 10.1038/mp.2008.126.
  21. Sidik K, Jonkman JN. A simple confidence interval for meta-analysis. Statist Med. 2002;21:3153–9. doi: 10.1002/sim.1262.
  22. Sutton AJ, Higgins JPT. Recent developments in meta-analysis. Statist Med. 2008;27:625–50. doi: 10.1002/sim.2934.
  23. Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F. Methods for Meta-Analysis in Medical Research. Chichester: Wiley; 2000.
  24. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. New York: Springer; 1996.
  25. Wang R, Tian L, Cai T, Wei LJ. Nonparametric inference procedure for percentiles of the random effects distribution in meta-analysis. Ann Appl Statist. 2010;4:520–32. doi: 10.1214/09-AOAS280SUPP.
  26. Whitehead A. Meta-Analysis of Controlled Clinical Trials. Chichester: Wiley; 2002.
