On the relative efficiency of using summary statistics versus individual-level data in meta-analysis

D Y Lin; D Zeng

doi:10.1093/biomet/asq006

2010 Apr 15;97(2):321–332. doi: 10.1093/biomet/asq006

PMCID: PMC3412575 PMID: 23049122

Summary

Meta-analysis is widely used to synthesize the results of multiple studies. Although meta-analysis is traditionally carried out by combining the summary statistics of relevant studies, advances in technologies and communications have made it increasingly feasible to access the original data on individual participants. In the present paper, we investigate the relative efficiency of analyzing original data versus combining summary statistics. We show that, for all commonly used parametric and semiparametric models, there is no asymptotic efficiency gain by analyzing original data if the parameter of main interest has a common value across studies, the nuisance parameters have distinct values among studies, and the summary statistics are based on maximum likelihood. We also assess the relative efficiency of the two methods when the parameter of main interest has different values among studies or when there are common nuisance parameters across studies. We conduct simulation studies to confirm the theoretical results and provide empirical comparisons from a genetic association study.

Some key words: Cox regression, Evidence-based medicine, Genetic association, Individual patient data, Information matrix, Linear regression, Logistic regression, Maximum likelihood, Profile likelihood, Research synthesis

1. Introduction

Meta-analysis, the combination of results from a series of independent studies, is gaining popularity in many fields, including medicine, psychology, epidemiology, education, genetics and ecology. In particular, meta-analysis publications in medical research have grown enormously over the last three decades, due to greater emphasis on evidence-based medicine and the need for reliable summarization of the vast and expanding volume of clinical research (e.g. Sutton et al., 2000; Whitehead, 2002). Most of the recent discoveries on genetic variants influencing complex human diseases were made possible through meta-analysis of multiple studies (e.g. Lohmueller et al., 2003; Zeggini et al., 2008).

Traditionally, meta-analysis is carried out by combining the summary statistics of relevant studies, which are available in journal articles. With improving technologies and communications and increasing recognition of the benefits of meta-analysis, it is becoming more feasible to access the raw or original data on individual participants (e.g. Sutton et al., 2000). Indeed, meta-analysis of individual patient data is regarded as the gold standard in systematic reviews of randomized clinical trials (e.g. Chalmers et al., 1993). Recently, a number of networks and consortia have been created to share original data from genetic association studies (e.g. Kavvoura & Ioannidis, 2008; The Psychiatric GWAS Consortium Steering Committee, 2009). In general, obtaining original data is difficult, costly and time-consuming. A question naturally arises as to how much efficiency gain can be achieved by analyzing original or individual-level data over combining summary statistics.

A partial answer to this question was provided by Olkin & Sampson (1998), who showed that, in the case of comparing multiple treatments and a control with respect to a continuous outcome, the traditional meta-analysis based on estimated treatment contrasts is equivalent to the least-squares regression analysis of individual patient data if there are no study-by-treatment interactions and the error variances are constant across trials. Mathew & Nordström (1999) claimed that the equivalence holds even if the error variances are different across trials. There has been no theoretical investigation beyond this special setting. Empirically, meta-analysis using original data has been found to be generally similar but not identical to meta-analysis using summary statistics (e.g. Whitehead, 2002, Ch. 5).

In the present paper, we provide a systematic investigation into the relative efficiency of using summary statistics versus original data in fixed-effects meta-analysis, which assumes a common effect among studies. We prove that the two types of meta-analysis are asymptotically equivalent for all commonly used parametric and semiparametric models provided that the effect sizes are indeed the same for all studies, the nuisance parameters have different values across studies, and maximum likelihood estimation is used in the calculations of summary statistics and in the joint analysis of original data. We also investigate the relative efficiency of the two methods when the effect sizes are different among studies or when there are common nuisance parameters across studies. We illustrate the theoretical results with simulated and empirical data.

2. Theoretical results

2.1. Main results

Suppose that there are K independent studies, with n_k participants for the kth study. The original data consist of (Y_ki, X_ki) (k = 1, . . ., K ; i = 1, . . ., n_k), where Y_ki is the response variable for the ith participant of the kth study, and X_ki is the corresponding vector of explanatory variables. The response variable can be continuous or discrete, univariate or multivariate. Under fixed-effects models, the conditional density of Y_ki given X_ki takes the form f (y, x; β, η_k), where β is a vector of parameters common to all K studies, and η_k is a vector of parameters specific to the kth study. A simple example is the linear regression model for the normal response variable:

Y_{k i} = α_{k} + β^{T} X_{k i} + ∊_{k i} (k = 1, \dots, K; i = 1, \dots, n_{k}),

where ∊_ki is normal with mean zero and variance $σ_{k}^{2}$ (Whitehead, 2002, § 5.2.1). In this case, $f (y, x; β, η_{k}) = (2π σ_{k}^{2})^{- 1 / 2} \exp \{- (y - α_{k} - β^{T} x)^{2} / (2 σ_{k}^{2})\}$ , and η_k = (α_k, $σ_{k}^{2}$ ). Additional examples are given in § 2.4. We wish to make inference about β.

Meta-analysis is usually performed on a scalar parameter. We allow vector-valued β for two reasons. First, there are important applications in which the effects of interest, such as treatment differences in a multi-arm clinical trial or co-dominant effects of a genetic variant, are truly multivariate. Second, if the nuisance parameters, e.g. intercepts or confounding effects, have the same values among the K studies, then performing meta-analysis jointly on the effects of interest and the common nuisance parameters can improve statistical efficiency, as will be discussed in § 2.2. Of course, our formulation includes scalar β as a special case.

Let β̂_k be the maximum likelihood estimator of β by maximizing the kth study likelihood:

L_{k} (β, η_{k}) = \prod_{i = 1}^{n_{k}} f (Y_{k i}, X_{k i}; β, η_{k}),

and let β̃ be the maximum likelihood estimator of β by maximizing the joint likelihood

L (β, η_{1}, \dots, η_{K}) = \prod_{k = 1}^{K} L_{k} (β, η_{k}) .

The profile likelihood functions for β based on L_k (β, η_k) and L(β, η₁, . . ., η_K) are, respectively,

p l_{k} (β) = sup_{η_{k}} L_{k} (β, η_{k})

and

p l (β) = sup_{η_{1}, \dots, η_{K}} L (β, η_{1}, \dots, η_{K}) .

The corresponding observed profile information matrices are ℐ_k (β) = −∂² log pl_k (β)/∂β² and ℐ(β) = −∂² log pl(β)/∂β². The maximizer of the profile likelihood is the same as the maximum likelihood estimator in that β̂_k = argmax pl_k (β) and β̃ = argmax pl(β). Write n = ∑_k n_k and assume that n_k/n → c_k ∈ (0, 1) as n → ∞. Assume also that the regularity conditions for profile likelihood as stated in Murphy & van der Vaart (2000) hold. Then n_k $ℐ_{k}^{- 1}$ (β̂_k) and nℐ⁻¹(β̃) are consistent estimators of the covariance matrices of $n_{k}^{1 / 2}$ (β̂_k − β) and n^1/2(β̃ − β), respectively.

Remark 1. For survival data and other censored data, the likelihood needs to be modified. If Y_ki is right censored at Ỹ_ki, then we replace f (Y_ki, X_ki; β, η_k) in the likelihood by S(Ỹ_ki, X_ki; β, η_k), where S(y, x; β, η_k) = $\int_{y}^{\infty} f (u, x; β, η_{k}) d u$ .

Remark 2. Our framework allows different likelihood functions among studies, and the statistical models are not necessarily regression models. In meta-analysis of diagnostic accuracy data, the likelihood for each study pertains to the multinomial distribution of a 2 × 2 contingency table, and β can be sensitivity or specificity or both.

In traditional meta-analysis, one collates summary statistics β̂_k and vâr(β̂_k) = 1/ℐ_k (β̂_k) (k = 1, . . ., K) for a scalar parameter β. The well-known inverse-variance estimator of β is

\hat{β} = \frac{\sum_{k = 1}^{K} {\hat{β}}_{k} / v \hat{a} r ({\hat{β}}_{k})}{\sum_{k = 1}^{K} 1 / v \hat{a} r ({\hat{β}}_{k})},

(1)

and its variance is estimated by

v \hat{a} r (\hat{β}) = \frac{1}{\sum_{k = 1}^{K} 1 / v \hat{a} r ({\hat{β}}_{k})} .

(2)

To allow vector-valued β, we use a multivariate version of estimator (1):

\hat{β} = \left\{\sum_{k = 1}^{K} ℐ_{k} ({\hat{β}}_{k})\right\}^{- 1} \sum_{k = 1}^{K} ℐ_{k} ({\hat{β}}_{k}) {\hat{β}}_{k},

(3)

whose covariance matrix is estimated by

v \hat{a} r (\hat{β}) = \left\{\sum_{k = 1}^{K} ℐ_{k} ({\hat{β}}_{k})\right\}^{- 1} .

(4)

Clearly, (3) and (4) reduce to (1) and (2) if β is a scalar. If original data are available, one can estimate β by the maximum likelihood estimator β̃, whose covariance matrix is estimated by vâr(β̃) = ℐ⁻¹(β̃). It is easy to verify that $p l (β) = \prod_{k = 1}^{K} p l_{k} (β)$ , which implies that $ℐ (β) = \sum_{k = 1}^{K} ℐ_{k} (β)$ . Thus,

v \hat{a} r (\tilde{β}) = \left\{\sum_{k = 1}^{K} ℐ_{k} (\tilde{β})\right\}^{- 1} .

(5)

Equations (4) and (5) show that vâr(β̂) and vâr(β̃) take the same form; the only difference is that ℐ_k (β) are evaluated at β̂_k in the former and at β̃ in the latter. Under standard regularity conditions, β̂_k (k = 1, . . ., K) and β̃ converge to β, and $n_{k}^{- 1}$ ℐ_k (β) (k = 1, . . ., K) converge to constant matrices. It follows that n^1/2(β̂ − β) and n^1/2(β̃ − β) have the same limiting normal distribution. Thus, using summary statistics has the same asymptotic efficiency as using original data.
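As an illustration of estimators (1) and (2), the following Python sketch combines scalar study-level estimates by inverse-variance weighting. The function name `inverse_variance_combine` and the three pairs of summary statistics are hypothetical and are not taken from the paper; the vector-valued versions (3) and (4) would replace the reciprocals by matrix inverses.

```python
def inverse_variance_combine(estimates, variances):
    """Fixed-effects combination of scalar estimates, following (1) and (2).

    estimates: study-specific estimates of the common effect
    variances: their estimated variances, 1 / I_k evaluated at each estimate
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate")
    weights = [1.0 / v for v in variances]            # observed information I_k
    total_information = sum(weights)
    beta_hat = sum(w * b for w, b in zip(weights, estimates)) / total_information
    var_hat = 1.0 / total_information                 # estimator (2)
    return beta_hat, var_hat


# Hypothetical summary statistics from three studies (illustration only).
beta_hat, var_hat = inverse_variance_combine([0.80, 0.65, 0.90], [0.09, 0.04, 0.16])
print(beta_hat, var_hat ** 0.5)
```

The combined estimate is pulled toward the study with the smallest variance, and the combined variance is smaller than any single study's variance.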

2.2. Common nuisance parameters

According to the results of the last section, meta-analysis based on summary statistics has the same asymptotic efficiency as the maximum likelihood estimator of full data if the former analysis is performed jointly on all common parameters. It is generally difficult to obtain multivariate summary statistics, especially in retrospective meta-analysis of published results. Thus, it is important to determine the efficiency loss of meta-analysis based on the univariate summary statistics for the effect of main interest when there are other common effects, which are referred to as common nuisance parameters.

Suppose that β is a scalar parameter representing a common effect of main interest and that a subset of η_k, denoted by γ, is a vector of common nuisance parameters. Denote the profile information matrices for (β, γ) based on the kth study data and all the data as

ℐ_{k} = \begin{pmatrix} ℐ_{k β β} & ℐ_{k β γ} \\ ℐ_{k γ β} & ℐ_{k γ γ} \end{pmatrix} and ℐ = \begin{pmatrix} ℐ_{β β} & ℐ_{β γ} \\ ℐ_{γ β} & ℐ_{γ γ} \end{pmatrix},

respectively. The variance of β̂_k is approximately (ℐ_kββ − ℐ_kβγ $ℐ_{k γ γ}^{- 1}$ ℐ_kγβ)⁻¹, so the variance of β̂ is approximately {∑_k (ℐ_kββ − ℐ_kβγ $ℐ_{k γ γ}^{- 1}$ ℐ_kγβ)}⁻¹. The variance of β̃ is approximately (ℐ_ββ − ℐ_βγ $ℐ_{γ γ}^{- 1}$ ℐ_γβ)⁻¹. Thus, the relative efficiency of β̂ to β̃ is approximately

\frac{\sum_{k} ℐ_{k β β} - \sum_{k} ℐ_{k β γ} ℐ_{k γ γ}^{- 1} ℐ_{k γ β}}{ℐ_{β β} - ℐ_{β γ} ℐ_{γ γ}^{- 1} ℐ_{γ β}} .

Because ℐ = ∑_k ℐ_k, the relative efficiency can also be expressed as

\frac{\sum_{k} ℐ_{k β β} - \sum_{k} ℐ_{k β γ} ℐ_{k γ γ}^{- 1} ℐ_{k γ β}}{\sum_{k} ℐ_{k β β} - (\sum_{k} ℐ_{k β γ}) {(\sum_{k} ℐ_{k γ γ})}^{- 1} (\sum_{k} ℐ_{k γ β})} .

It follows from Lemma 1 of Appendix A that

\sum_{k} ℐ_{k β γ} ℐ_{k γ γ}^{- 1} ℐ_{k γ β} ⩾ (\sum_{k} ℐ_{k β γ}) {(\sum_{k} ℐ_{k γ γ})}^{- 1} (\sum_{k} ℐ_{k γ β}) .

Thus, the relative efficiency is always less than or equal to 1. It also follows from Lemma 1 that the relative efficiency is 1 if and only if ℐ_kβγ $ℐ_{k γ γ}^{- 1}$ = ℐ_lβγ $ℐ_{l γ γ}^{- 1}$ (∀k ≠ l). Note that ℐ_kβγ $ℐ_{k γ γ}^{- 1}$ ≈ − var(β̂_k)⁻¹ cov (β̂_k, γ̂_k), where β̂_k and γ̂_k are the maximum likelihood estimators of β and γ based on the kth study data. Thus, the relative efficiency is 1 if and only if var(β̂_k)⁻¹cov(β̂_k, γ̂_k) are the same among the K studies. Obviously, the latter condition is satisfied if cov(β̂_k, γ̂_k) = 0 (k = 1, . . ., K). The foregoing conclusions also hold for multivariate β.
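The relative efficiency expression above can be evaluated directly once the blocks of the study-specific information matrices are known. The sketch below does so for scalar β and a scalar common nuisance parameter γ; the function name and the numerical blocks are hypothetical and chosen only to show the three situations discussed in the text: equal ratios ℐ_kβγ/ℐ_kγγ, unequal ratios, and zero cross-information.

```python
def relative_efficiency(blocks):
    """Approximate efficiency of the univariate summary-statistics estimator
    relative to the joint maximum likelihood estimator.

    blocks: one tuple (I_bb, I_bg, I_gg) per study, for scalar beta and a
            scalar common nuisance parameter gamma, with I_gg > 0.
    """
    sum_bb = sum(bb for bb, _, _ in blocks)
    sum_bg = sum(bg for _, bg, _ in blocks)
    sum_gg = sum(gg for _, _, gg in blocks)
    # Numerator: sum of study-specific profile informations for beta.
    numerator = sum(bb - bg * bg / gg for bb, bg, gg in blocks)
    # Denominator: profile information for beta from the pooled matrix.
    denominator = sum_bb - sum_bg * sum_bg / sum_gg
    return numerator / denominator


# Hypothetical information blocks (I_bb, I_bg, I_gg) for two studies.
equal_ratio = [(10.0, 2.0, 5.0), (20.0, 4.0, 10.0)]      # I_bg / I_gg = 0.4 in both
unequal_ratio = [(10.0, 2.0, 5.0), (20.0, -4.0, 10.0)]   # ratios 0.4 and -0.4
orthogonal = [(10.0, 0.0, 5.0), (20.0, 0.0, 10.0)]       # zero cross-information

re_equal = relative_efficiency(equal_ratio)
re_unequal = relative_efficiency(unequal_ratio)
re_orthogonal = relative_efficiency(orthogonal)
print(re_equal, re_unequal, re_orthogonal)
```

Only the second configuration loses efficiency, in line with the condition that the ratios ℐ_kβγ/ℐ_kγγ be the same among studies.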

2.3. Unequal effect sizes

The fixed-effects meta-analysis assumes that the effect sizes β_k are the same across studies. This assumption does not affect the Type I error of hypothesis testing since all effect sizes are the same under the null hypothesis. However, it is of practical importance to determine the relative power of using summary statistics versus original data when the β_k are unequal.

Write U_k (β) = ∂ log pl_k (β)/∂β and U(β) = ∂ log pl(β)/∂β. By definition, U_k (β̂_k) = 0 (k = 1, . . ., K) and U(β̃) = 0. Since $U (β) = \sum_{k = 1}^{K} U_{k} (β)$ , we have $\sum_{k = 1}^{K} \{U_{k} ({\hat{β}}_{k}) - U_{k} (\tilde{β})\} = 0$ . By the mean-value theorem applied to each U_k, whose derivative is −ℐ_k, $\sum_{k = 1}^{K} ℐ_{k} (β_{k}^{*}) ({\hat{β}}_{k} - \tilde{β}) = 0$ , where $β_{k}^{*}$ lies between β̂_k and β̃. In other words,

\tilde{β} = \left\{\sum_{k = 1}^{K} ℐ_{k} (β_{k}^{*})\right\}^{- 1} \sum_{k = 1}^{K} ℐ_{k} (β_{k}^{*}) {\hat{β}}_{k} .

(6)

Comparison of (6) to (3) reveals that β̃ is the same kind of weighted combination of β̂_k as β̂, with the weights ℐ_k (β) evaluated at $β_{k}^{*}$ rather than β̂_k. As shown in § 2.1, the only difference between vâr(β̃) and vâr(β̂) lies in the evaluation of ℐ_k (β) at β̃ versus β̂_k. Hence, meta-analysis based on summary statistics and meta-analysis of original data will have similar power provided that the ℐ_k (β) do not change their values drastically when β varies between β̂_k and β̃. When the β_k are unequal, the limits of β̂ and β̃ pertain to weighted combinations of β_k, rather than a common parameter in a statistical model.

We consider the local alternatives H₁_n : $β_{k}^{(n)}$ = β + O(n^−1/2) (k = 1, . . ., K). Under H₁_n, the estimators β̂_k (k = 1, . . ., K) converge in probability to β. We show in Appendix B that β̃ also converges in probability to β under H₁_n. It follows that, for each k, the weights n⁻¹ ℐ_k (β̂_k), n⁻¹ ℐ_k (β̃) and n⁻¹ ℐ_k ( $β_{k}^{*}$ ) all converge in probability to the same constant matrix. Thus, meta-analysis based on summary statistics and meta-analysis of original data have the same asymptotic power against H₁_n.

For hypothesis testing, one can use score statistics in meta-analysis. For testing the null hypothesis H₀ : β = β₀, the score statistics based on summary statistics and original data are

\left\{\sum_{k = 1}^{K} U_{k} (β_{0})\right\}^{T} \left\{\sum_{k = 1}^{K} ℐ_{k} (β_{0})\right\}^{- 1} \left\{\sum_{k = 1}^{K} U_{k} (β_{0})\right\},

and U^T(β₀)ℐ⁻¹(β₀)U (β₀), respectively. The two statistics are numerically identical since $U (β) = \sum_{k = 1}^{K} U_{k} (β)$ and $ℐ (β) = \sum_{k = 1}^{K} ℐ_{k} (β)$ . This equivalence holds whether the true effect sizes are equal or not.

2.4. Special cases

In this section, we consider the three most common cases of meta-analysis: linear regression for continuous response, logistic regression for binary response and Cox regression for potentially censored survival time. We will pay particular attention to the form of the observed profile information matrices ℐ_k because, as shown in §§ 2.1 and 2.3, the only difference between β̂ and β̃ lies in the argument of the profile information matrix. We use X to denote the explanatory variables of main interest and Z to denote the unit component and possibly other explanatory variables or covariates. The numbers and types of covariates need not be the same among the K studies.

Example 1. Assume that the distribution of Y_ki conditional on X_ki and Z_ki is normal with mean β^TX_ki + $γ_{k}^{T}$ Z_ki and variance $σ_{k}^{2}$ . The observed profile information matrix for β based on the kth study data is $ℐ_{k} (σ_{k}^{2}) = D_{k} / σ_{k}^{2}$ , where

D_{k} = \sum_{i = 1}^{n_{k}} X_{k i}^{\otimes 2} - \sum_{i = 1}^{n_{k}} X_{k i} Z_{k i}^{T} {(\sum_{i = 1}^{n_{k}} Z_{k i}^{\otimes 2})}^{- 1} \sum_{i = 1}^{n_{k}} Z_{k i} X_{k i}^{T},

and a^⊗² = aa^T. The maximum likelihood estimators of $σ_{k}^{2}$ based on the kth study data and all data are, respectively,

{\hat{σ}}_{k}^{2} = n_{k}^{- 1} \sum_{i = 1}^{n_{k}} {(Y_{k i} - {\hat{β}}_{k}^{T} X_{k i} - {\hat{γ}}_{k}^{T} Z_{k i})}^{2} and {\tilde{σ}}_{k}^{2} = n_{k}^{- 1} \sum_{i = 1}^{n_{k}} {(Y_{k i} - {\tilde{β}}^{T} X_{k i} - {\tilde{γ}}_{k}^{T} Z_{k i})}^{2},

where β̂_k and γ̂_k are the least-squares estimators of β and γ_k based on the kth study data, and β̃ and γ̃_k are the maximum likelihood estimators of β and γ_k based on all data. By definition, β̂ = {∑_k ℐ_k ( ${\hat{σ}}_{k}^{2}$ )}⁻¹ ∑_k ℐ_k ( ${\hat{σ}}_{k}^{2}$ )β̂_k and vâr(β̂) = {∑_k ℐ_k ( ${\hat{σ}}_{k}^{2}$ )}⁻¹. It is easy to show that β̃ = {∑_k ℐ_k ( ${\tilde{σ}}_{k}^{2}$ )}⁻¹ ∑_k ℐ_k ( ${\tilde{σ}}_{k}^{2}$ )β̂_k and vâr(β̃) = {∑_k ℐ_k ( ${\tilde{σ}}_{k}^{2}$ )}⁻¹. Thus, β̂ and β̃, and their variance estimators, differ only in whether ℐ_k ( $σ_{k}^{2}$ ) is evaluated at ${\hat{σ}}_{k}^{2}$ or ${\tilde{σ}}_{k}^{2}$ . In general, ${\hat{σ}}_{k}^{2}$ and ${\tilde{σ}}_{k}^{2}$ are approximately the same, so that the results of meta-analysis using summary statistics and using original data are similar. Under the assumed model, both ${\hat{σ}}_{k}^{2}$ and ${\tilde{σ}}_{k}^{2}$ converge to $σ_{k}^{2}$ , so that using summary statistics is asymptotically equivalent to using original data. If the assumed model is incorrect, then the two estimators may converge to different constants.

The setting considered by Olkin & Sampson (1998) and Mathew & Nordström (1999) is a special case of our model in which X consists of treatment indicators and Z = 1. In that setting, D_k is a function of group sizes only. Under the assumption that $σ_{1}^{2} = \dots = σ_{K}^{2}$ , one can define β̂ as (∑_k D_k)⁻¹ ∑_k D_k β̂_k, which also turns out to be the expression of β̃ under the common variance assumption. Thus, using summary statistics is numerically identical to using original data, which is the finding of Olkin & Sampson (1998). Mathew & Nordström (1999) stated that the equivalence continues to hold even if the error variances are unequal. They used the true values of $σ_{k}^{2}$ in their definition of β̂. Since $σ_{k}^{2}$ need to be estimated from the data, the equivalence holds only asymptotically rather than numerically.
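The quantities in Example 1 can be sketched in Python for the simplest case of a single treatment indicator X and Z = 1, so that D_k reduces to the within-study sum of squares of X. All function names and the two small datasets below are hypothetical. The sketch computes the study summaries (β̂_k, D_k, σ̂_k²), the estimator weighted by ℐ_k = D_k/σ̂_k², the estimator weighted by D_k alone, and the least-squares slope from the pooled individual-level data with study-specific intercepts.

```python
def study_summary(x, y):
    """Least-squares slope, D_k and ML variance estimate for one study (Z = 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    d_k = sum((xi - x_bar) ** 2 for xi in x)
    beta_k = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / d_k
    alpha_k = y_bar - beta_k * x_bar
    sigma2_k = sum((yi - alpha_k - beta_k * xi) ** 2 for xi, yi in zip(x, y)) / n
    return beta_k, d_k, sigma2_k


def combine_unequal_variances(summaries):
    """Summary-statistics estimator with weights I_k = D_k / sigma2_k."""
    info = [d / s2 for _, d, s2 in summaries]
    beta = sum(w * b for w, (b, _, _) in zip(info, summaries)) / sum(info)
    return beta, 1.0 / sum(info)


def combine_common_variance(summaries):
    """Summary-statistics estimator with weights D_k (common error variance)."""
    return sum(d * b for b, d, _ in summaries) / sum(d for _, d, _ in summaries)


def pooled_common_slope(studies):
    """Least-squares slope from individual-level data, study-specific intercepts."""
    num = den = 0.0
    for x, y in studies:
        x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
        num += sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        den += sum((xi - x_bar) ** 2 for xi in x)
    return num / den


# Hypothetical data: (treatment indicators, continuous responses) per study.
studies = [
    ([0, 0, 1, 1], [1.0, 1.4, 2.1, 2.3]),
    ([0, 0, 0, 1, 1, 1], [0.2, 0.5, 0.1, 1.5, 1.1, 1.6]),
]
summaries = [study_summary(x, y) for x, y in studies]
beta_weighted, var_weighted = combine_unequal_variances(summaries)
beta_common = combine_common_variance(summaries)
beta_pooled = pooled_common_slope(studies)
print(beta_weighted, beta_common, beta_pooled)
```

Under the common-variance weighting, the summary-statistics estimate coincides with the pooled least-squares slope up to rounding error, whereas the estimate weighted by D_k/σ̂_k² generally differs slightly because the variances are estimated.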

Example 2. Assume that

pr (Y_{k i} = 1 | X_{k i}, Z_{k i}) = \frac{exp (β^{T} X_{k i} + γ_{k}^{T} Z_{k i})}{1 + exp (β^{T} X_{k i} + γ_{k}^{T} Z_{k i})} (k = 1, \dots, K; i = 1, \dots, n_{k}) .

Write θ_k = (β, γ_k). The observed profile information matrix for β based on the kth study data is

ℐ_{k} (θ_{k}) = \sum_{i = 1}^{n_{k}} υ_{k i} (θ_{k}) X_{k i}^{\otimes 2} - \left\{\sum_{i = 1}^{n_{k}} υ_{k i} (θ_{k}) X_{k i} Z_{k i}^{T}\right\} \left\{\sum_{i = 1}^{n_{k}} υ_{k i} (θ_{k}) Z_{k i} Z_{k i}^{T}\right\}^{- 1} \left\{\sum_{i = 1}^{n_{k}} υ_{k i} (θ_{k}) Z_{k i} X_{k i}^{T}\right\},

where $υ_{k i} (θ_{k}) = e^{β^{T} X_{k i} + γ_{k}^{T} Z_{k i}} / {(1 + e^{β^{T} X_{k i} + γ_{k}^{T} Z_{k i}})}^{2}$ . Note that ℐ_k depends on θ_k only through the υ_ki (θ_k). Clearly, υ_ki (θ_k) = pr(Y_ki = 1){1 − pr(Y_ki = 1)}, which is not sensitive to the value of θ_k unless pr(Y_ki = 1) is extreme. Thus, the results of meta-analysis using summary statistics and using original data are generally similar, whether the effect sizes are equal or not.
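The insensitivity of υ_ki to the parameter values can be checked numerically. The short sketch below evaluates the weight as a function of the linear predictor and tabulates it at a few success probabilities; the function names and the chosen probabilities are illustrative assumptions.

```python
from math import exp, log


def logistic_weight(linear_predictor):
    """Weight e^t / (1 + e^t)^2 entering the profile information matrix."""
    e = exp(linear_predictor)
    return e / (1.0 + e) ** 2


def weight_at_probability(p):
    """Same weight, indexed by the success probability p = e^t / (1 + e^t)."""
    return logistic_weight(log(p / (1.0 - p)))


# Moderate probabilities give nearly constant weights; extreme ones do not.
for p in (0.5, 0.4, 0.3, 0.1, 0.05):
    print(p, round(weight_at_probability(p), 4))
```

Moving the success probability from 0.5 to 0.4 changes the weight by only a few percent, whereas moving it from 0.1 to 0.05 roughly halves the weight.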

Example 3. The Cox (1972) proportional hazards model specifies that the hazard function of the survival time Y_ki conditional on covariates X_ki takes the form

λ (y | X_{k i}) = λ_{k 0} (y) exp (β^{T} X_{k i}) (k = 1, \dots, K; i = 1, \dots, n_{k}),

where λ_k₀(·) are arbitrary baseline hazard functions. In the presence of right censoring, the data consist of (Ỹ_ki, Δ_ki, X_ki) (k = 1, . . ., K ; i = 1, . . ., n_k), where Ỹ_ki = min(Y_ki, C_ki), Δ_ki = I (Y_ki ⩽ C_ki), C_ki is the censoring time on Y_ki and I (·) is the indicator function. The observed profile information matrix for β based on the kth study data is $ℐ_{k} (β) = \sum_{i = 1}^{n_{k}} Δ_{k i} V_{k} (β; {\tilde{Y}}_{k i})$ , where

V_{k} (β; y) = \frac{\sum_{j = 1}^{n_{k}} I ({\tilde{Y}}_{k j} ⩾ y) exp (β^{T} X_{k j}) X_{k j}^{\otimes 2}}{\sum_{j = 1}^{n_{k}} I ({\tilde{Y}}_{k j} ⩾ y) exp (β^{T} X_{k j})} - \left\{\frac{\sum_{j = 1}^{n_{k}} I ({\tilde{Y}}_{k j} ⩾ y) exp (β^{T} X_{k j}) X_{k j}}{\sum_{j = 1}^{n_{k}} I ({\tilde{Y}}_{k j} ⩾ y) exp (β^{T} X_{k j})}\right\}^{\otimes 2} .

Here V_k (β ; y) is an empirical covariance matrix of X and is not sensitive to the value of β. Thus, the results of meta-analysis using summary statistics and using original data are similar whether the effect sizes are equal or not.
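For a scalar covariate, V_k (β; y) is the variance of X among subjects at risk at time y, weighted by exp(βX). The sketch below computes it, together with the observed profile information ℐ_k (β), for one small study. The function names and the six observations are hypothetical, and the sketch assumes no tied event times.

```python
from math import exp


def risk_set_variance(beta, y, times, x):
    """V_k(beta; y) for scalar X: exp(beta * x)-weighted variance of x
    among subjects whose observed time is at least y."""
    at_risk = [xj for tj, xj in zip(times, x) if tj >= y]
    weights = [exp(beta * xj) for xj in at_risk]
    total = sum(weights)
    mean = sum(w * xj for w, xj in zip(weights, at_risk)) / total
    second_moment = sum(w * xj * xj for w, xj in zip(weights, at_risk)) / total
    return second_moment - mean * mean


def cox_profile_information(beta, times, events, x):
    """I_k(beta): sum of V_k(beta; observed time) over uncensored subjects."""
    return sum(
        risk_set_variance(beta, t, times, x)
        for t, d in zip(times, events)
        if d == 1
    )


# Hypothetical study: observed times, event indicators, treatment indicators.
times = [2.0, 3.0, 5.0, 7.0, 8.0, 11.0]
events = [1, 1, 0, 1, 1, 0]
x = [1, 0, 1, 0, 1, 0]

info_at_zero = cox_profile_information(0.0, times, events, x)
info_at_half = cox_profile_information(0.5, times, events, x)
print(info_at_zero, info_at_half)
```

Comparing the two printed values shows how much the information changes when β moves from 0 to 0.5 in this particular dataset.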

3. Numerical results

3.1. Simulation studies

We conducted simulation studies to assess how well the asymptotic efficiency results of § 2 approximate realistic situations. We mimicked meta-analysis of randomized clinical trials with a binary outcome and simulated data from the following logistic regression model:

pr (Y_{k i} = 1 | X_{k i}) = \frac{exp (α_{k} + β_{k} X_{k i})}{1 + exp (α_{k} + β_{k} X_{k i})} (k = 1, \dots, K; i = 1, \dots, n_{k}),

(7)

where X_ki is the treatment indicator, and the proportion of subjects receiving treatment 1 in the kth trial is p_k. The first set of simulation studies was focused on K = 2. We set α₁ = 0 and α₂ = −1 to yield approximately 50% and 30% overall success rates, and chose various values of β₁, β₂, n₁, n₂, p₁ and p₂ to cover a wide range of log odds ratios, sample sizes and treatment assignment ratios. For each combination of the simulation parameters, we generated 1 million datasets; for each dataset, we performed the two types of meta-analysis, i.e. the one based on summary statistics versus the one based on original data. The results are summarized in Table 1.

Table 1.

Meta-analysis based on summary statistics versus original data

						Summary statistics				Original data
β₁	β₂	p₁	p₂	n₁	n₂	Mean	SE	SEE	Power	Mean	SE	SEE	Power
0.8	0.8	0.5	0.5	100	100	0.807	0.302	0.302	0.772	0.813	0.304	0.301	0.775
				100	200	0.805	0.247	0.247	0.912	0.809	0.248	0.246	0.913
				200	100	0.805	0.244	0.245	0.917	0.809	0.246	0.244	0.918
		0.2	0.5	100	100	0.803	0.337	0.340	0.665	0.817	0.342	0.338	0.685
				100	200	0.801	0.264	0.266	0.866	0.810	0.266	0.265	0.874
				200	100	0.805	0.286	0.287	0.818	0.814	0.289	0.285	0.826
		0.5	0.2	100	100	0.810	0.327	0.330	0.698	0.812	0.331	0.328	0.702
				100	200	0.805	0.277	0.278	0.830	0.807	0.279	0.276	0.832
				200	100	0.808	0.257	0.258	0.885	0.808	0.259	0.257	0.884
0.5	1.0	0.5	0.5	100	100	0.744	0.294	0.298	0.717	0.753	0.298	0.296	0.730
				100	200	0.829	0.242	0.244	0.935	0.834	0.244	0.243	0.937
				200	100	0.658	0.238	0.240	0.793	0.666	0.240	0.239	0.807
		0.2	0.5	100	100	0.804	0.332	0.334	0.681	0.817	0.339	0.333	0.695
				100	200	0.879	0.261	0.263	0.928	0.887	0.265	0.263	0.929
				200	100	0.715	0.280	0.280	0.732	0.726	0.284	0.280	0.745
		0.5	0.2	100	100	0.698	0.323	0.324	0.581	0.703	0.329	0.324	0.587
				100	200	0.783	0.275	0.274	0.820	0.787	0.278	0.274	0.818
				200	100	0.621	0.253	0.253	0.693	0.625	0.256	0.253	0.696
1.0	0.5	0.5	0.5	100	100	0.760	0.305	0.308	0.701	0.770	0.310	0.306	0.712
				100	200	0.672	0.249	0.251	0.771	0.681	0.252	0.250	0.781
				200	100	0.843	0.248	0.250	0.931	0.849	0.250	0.249	0.932
		0.2	0.5	100	100	0.685	0.335	0.348	0.500	0.712	0.342	0.341	0.553
				100	200	0.611	0.264	0.271	0.621	0.631	0.267	0.267	0.661
				200	100	0.779	0.284	0.294	0.777	0.796	0.290	0.289	0.800
		0.5	0.2	100	100	0.817	0.322	0.338	0.690	0.813	0.330	0.330	0.696
				100	200	0.720	0.272	0.284	0.727	0.722	0.277	0.278	0.739
				200	100	0.890	0.256	0.265	0.932	0.884	0.260	0.261	0.929

Mean, mean parameter estimates; SE, standard errors; SEE, mean standard error estimates; Power, power at the 0.05 significance level.

When the effect sizes are the same between the two studies, i.e. β₁ = β₂, both β̂ and β̃ are nearly unbiased, and their standard errors are close to each other. Thus, the two types of meta-analysis have similar efficiency. To be more specific, β̂ appears to have slightly smaller standard error than β̃ and is thus a little more efficient in that sense; however, β̃ is slightly biased upward and its standard error tends to be slightly underestimated, so that the corresponding Wald test tends to be slightly more powerful than that of β̂. When the effect sizes are unequal between the two studies, i.e. β₁ ≠ β₂, the differences between the two methods become more appreciable. In many cases, meta-analysis based on original data is a little more powerful than meta-analysis based on summary statistics. There are also cases in which the latter is slightly more powerful than the former.
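The two analyses compared in Table 1 can be sketched for a single dataset. With a binary treatment indicator and study-specific intercepts, each trial reduces to a 2 × 2 table, β̂_k is the log odds ratio, and the observed profile information of Example 2 simplifies to {1/(n₀υ₀) + 1/(n₁υ₁)}⁻¹, where n₀ and n₁ are the arm sizes and υ₀ and υ₁ the corresponding fitted values of p(1 − p). The code below is a sketch under these assumptions, not the authors' simulation program; the two tables are hypothetical, and the joint maximum likelihood estimate is found by bisection on the profile score.

```python
from math import exp, log


def expit(t):
    return 1.0 / (1.0 + exp(-t))


def find_root(func, lo, hi, iterations=100):
    """Bisection for a monotone function that changes sign on [lo, hi]."""
    sign_lo = func(lo) > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (func(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def intercept_given_beta(table, beta):
    """Maximizer of the study likelihood over the intercept for fixed beta."""
    d0, n0, d1, n1 = table
    return find_root(
        lambda a: n0 * expit(a) + n1 * expit(a + beta) - (d0 + d1), -30.0, 30.0
    )


def profile_score(table, beta):
    d0, n0, d1, n1 = table
    return d1 - n1 * expit(intercept_given_beta(table, beta) + beta)


def profile_information(table, beta):
    d0, n0, d1, n1 = table
    a = intercept_given_beta(table, beta)
    p0, p1 = expit(a), expit(a + beta)
    return 1.0 / (1.0 / (n0 * p0 * (1 - p0)) + 1.0 / (n1 * p1 * (1 - p1)))


# Hypothetical trials: (control events, control size, treated events, treated size).
tables = [(25, 50, 34, 50), (27, 100, 45, 100)]

# Summary statistics: log odds ratios combined with weights I_k(beta_hat_k).
log_odds_ratios = [log(d1 * (n0 - d0) / (d0 * (n1 - d1))) for d0, n0, d1, n1 in tables]
infos = [profile_information(t, b) for t, b in zip(tables, log_odds_ratios)]
beta_hat = sum(i * b for i, b in zip(infos, log_odds_ratios)) / sum(infos)
var_hat = 1.0 / sum(infos)

# Original data: joint maximum likelihood with study-specific intercepts.
beta_tilde = find_root(lambda b: sum(profile_score(t, b) for t in tables), -10.0, 10.0)
var_tilde = 1.0 / sum(profile_information(t, beta_tilde) for t in tables)

print(beta_hat, var_hat ** 0.5)
print(beta_tilde, var_tilde ** 0.5)
```

Repeating this comparison over many datasets generated from model (7) would give the kind of summaries reported in Table 1.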

In the second set of studies, we simulated K trials of size n from (7) with α_k = −1, β_k = 1 and p_k = 0.5 (k = 1, . . ., K). For both meta-analysis of summary statistics and meta-analysis of original data, the K intercepts might be assumed to be the same or allowed to be different. To impose a common intercept in meta-analysis of summary statistics, we used the bivariate summary statistics for (α_k, β_k). Table 2 displays the relative efficiency results based on 10 000 replicates with nonzero cell counts. Meta-analysis of summary statistics appears to be as efficient as, or slightly more efficient than, meta-analysis of original data when the two methods make the same modelling assumptions. For n = 10, meta-analysis of original data with a common intercept is a bit more efficient than meta-analysis of summary statistics with different intercepts.

Table 2.

Relative efficiency of using summary statistics versus original data in meta-analysis

	K = 20			K = 50			K = 100
n	RE1	RE2	RE3	RE1	RE2	RE3	RE1	RE2	RE3
10	1.108	0.968	1.093	1.113	0.966	1.097	1.112	0.967	1.098
20	1.161	1.010	1.112	1.160	1.006	1.113	1.156	1.004	1.110
50	1.092	1.017	1.065	1.096	1.021	1.068	1.096	1.022	1.069
100	1.049	1.007	1.030	1.047	1.007	1.029	1.046	1.007	1.030

RE1 is the relative efficiency of meta-analysis of summary statistics with a common intercept to meta-analysis of original data with a common intercept; RE2 is the relative efficiency of meta-analysis of summary statistics with different intercepts to meta-analysis of original data with a common intercept; RE3 is the relative efficiency of meta-analysis of summary statistics with different intercepts to meta-analysis of original data with different intercepts.

3.2. Major depression data

Major depression is a complex common disease with enormous public health significance. The lifetime prevalence of this disorder is approximately 15% and is two-fold higher in women than men. Recently, a genome-wide association study was conducted to identify single nucleotide polymorphisms, SNPs, that are associated with major depression (Sullivan et al., 2009). Using a case-control sample of 1738 cases and 1802 controls, the investigators found strong signals in a region surrounding the gene piccolo, PCLO. The investigators then attempted to replicate the results with five independent case-control samples. For our illustration, we exclude the two replication samples that do not have information on sex, which is an important predictor of major depression. The remaining three replication samples have 1907, 2489 and 2005 subjects, with a total of 3135 cases and 3266 controls.

A total of 30 SNPs in the PCLO region were genotyped in the replication samples. For each SNP, we fit the logistic regression model

pr (Y_{k i} = 1 | X_{k i}, Z_{k i}) = \frac{exp (α_{k} + β X_{k i} + γ_{k} Z_{k i})}{1 + exp (α_{k} + β X_{k i} + γ_{k} Z_{k i})} (k = 1, \dots, 3; i = 1, \dots, n_{k}),

(8)

where Y_ki is the case-control status of the ith subject in the kth sample, and X_ki and Z_ki are the corresponding genotype score and sex indicator; the genotype score is the number of copies of the less frequent nucleotide of the SNP that the subject carries. The estimates of the genetic effects and their standard error estimates vary substantially among the three replication samples. For meta-analysis of summary statistics, the estimates of the genetic effects pre-adjusted for sex from the three samples are combined according to formula (1).

We allow α_k in (8) to be different among the three samples so as to reflect the unequal case-control ratios; γ_k may be the same or different among the three samples. When γ_k are allowed to be different in both the meta-analysis based on summary statistics and the meta-analysis based on original data, the two methods yield virtually identical results for the genetic effects of the 30 SNPs: the largest absolute difference between the two sets of log odds ratio estimates is 0.00066, and the largest relative difference between the two sets of standard error estimates is 0.25%. When γ_k are assumed to be the same among the three replication samples in the meta-analysis of original data but not in the meta-analysis of summary statistics, the differences between the two methods become slightly more noticeable: the largest absolute difference between the two sets of log odds ratio estimates is 0.0045, and the largest relative difference between the two sets of standard error estimates is 0.35%. Incidentally, the covariance estimates between β̂_k and γ̂_k are virtually zero, so the differences should be small in light of the results of § 2.2.

4. Remarks

The theoretical results of the present paper are much broader than those of Olkin & Sampson (1998) and Mathew & Nordström (1999), even in the special setting considered by those authors; we have clarified the conditions for the equivalence results stated in those two papers and examined the consequences of violating the underlying assumptions. We have considered more general models for continuous response variables, as well as general parametric and semiparametric models for other response variables.

Our work has important practical implications. There is an ongoing debate on whether the benefits of using original data outweigh the extra cost of taking this approach. The statistical issues surrounding this debate have not been understood well. We have shown theoretically and numerically that there is little or no efficiency gain by analyzing original data. Meta-analysis based on summary statistics reduces resource utilization, simplifies data collection and analysis and avoids the bias and efficiency loss caused by exclusion of studies without original data.

By accessing original data, one can enhance comparability among studies with respect to inclusion/exclusion criteria, definitions of variables, creations of subgroups and adjustments of covariates, ensure estimation of the same parameter by the same statistical method and perform model building and diagnostics. Many of these benefits can still be achieved if all participating investigators follow a common set of guidelines on quality control and statistical analysis and then submit their summary statistics to the meta-analyst. Providing summary statistics is logistically much simpler than transferring original data. Indeed, protection of human subjects and other study policies often prohibit investigators from releasing original data.

One reason for obtaining original data is to model individual-level covariates. It is widely recognized that using study-level summaries of covariates can yield highly biased and inefficient meta-analysis (Berlin et al., 2002; Lambert et al., 2002). We have shown theoretically and numerically that there is no bias or efficiency loss if the effect estimates are properly adjusted for individual-level covariates within each study and then combined via formula (1) or (3). The results of § 2 apply not only to meta-analysis of main effects, but also to meta-analysis of interactions, such as treatment-covariate interactions. If study-specific estimates of interactions are unavailable and one is forced to estimate interactions from study-specific main effects and average covariate values, then there can be serious power loss and bias (Simmonds & Higgins, 2007).

In our context, the asymptotics pertain to individual study sizes n_k. For commonly used parametric and semiparametric models, such as linear, logistic and Cox regression models, the asymptotic approximations are accurate even for small n_k. When the data for individual studies are very sparse, the parameter estimates may be undefined or unreliable. In that case, analysis of original data will encounter the same difficulties if it is stratified by study, but will tend to be more stable if it is unstratified. Unstratified analysis is more efficient than stratified analysis, but can be misleading if the underlying populations differ among studies. If individual studies are very small, then model building and diagnostics are possible only by pooling the data.

We have focused on fixed-effects models, which assume a common value for the parameter of main interest among studies. An alternative approach is to employ random-effects models, in which the parameter of main interest is treated as a random variable with different realizations across studies (DerSimonian & Laird, 1986). It is technically more challenging to deal with random-effects models than fixed-effects models. Indeed, the properties of meta-analysis under random-effects models have not been investigated rigorously. Our preliminary investigations reveal that the conclusions of § 2 hold for random-effects models under certain conditions. The results will be communicated in a separate report.
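For orientation, the random-effects approach cited above can be sketched for a scalar parameter. This is a minimal sketch of the textbook moment-based estimator commonly attributed to DerSimonian & Laird (1986), not a method developed in this paper; the function name and the input numbers are hypothetical.

```python
import numpy as np

def dersimonian_laird(est, var):
    """Moment-based random-effects combination of study estimates.

    est: study-specific estimates; var: their estimated variances.
    Returns the pooled estimate, its variance and the between-study variance tau2.
    """
    est, var = np.asarray(est, float), np.asarray(var, float)
    w = 1 / var                                   # fixed-effects weights
    fixed = np.sum(w * est) / np.sum(w)           # fixed-effects estimate
    q_stat = np.sum(w * (est - fixed) ** 2)       # heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q_stat - (len(est) - 1)) / c)
    w_star = 1 / (var + tau2)                     # random-effects weights
    pooled = np.sum(w_star * est) / np.sum(w_star)
    return pooled, 1 / np.sum(w_star), tau2

# Made-up inputs: three studies with equal variances and different estimates.
pooled, pooled_var, tau2 = dersimonian_laird([0.0, 0.5, 1.0], [0.01, 0.01, 0.01])
print(pooled, pooled_var, tau2)
```

When the estimated between-study variance is zero, the weights reduce to the fixed-effects weights, so the two approaches give the same combined estimate.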

Acknowledgments

This research was supported by the National Institutes of Health, U.S.A. The authors thank the editor and two referees for helpful comments.

Appendix A

Some useful matrix results

Lemma 1. For any matrices A_{p×q}, B_{q×q}, C_{p×q} and D_{q×q} with B > 0 and D > 0,

A B^{-1} A^{T} + C D^{-1} C^{T} ⩾ (A + C) (B + D)^{-1} (A + C)^{T}.    (A1)

The equality holds if and only if AB⁻¹ = CD⁻¹.

Proof. Since B > 0 and D > 0, we can find a nonsingular matrix P such that B = P diag{λ₁, . . ., λ_q} P^T and D = P diag{μ₁, . . ., μ_q} P^T, where λ_i > 0 and μ_i > 0 (i = 1, . . ., q). By redefining A as A(P^T)⁻¹ and C as C(P^T)⁻¹, it suffices to prove the lemma when B = diag{λ₁, . . ., λ_q} and D = diag{μ₁, . . ., μ_q}. Let a_i be the ith column of A and c_i be the ith column of C. Then (A1) becomes

\sum_{i=1}^{q} (λ_i^{-1} a_i a_i^{T} + μ_i^{-1} c_i c_i^{T}) ⩾ \sum_{i=1}^{q} (λ_i + μ_i)^{-1} (a_i + c_i) (a_i + c_i)^{T}.

We wish to show that, for each i, $λ_i^{-1} a_i a_i^{T} + μ_i^{-1} c_i c_i^{T} ⩾ (λ_i + μ_i)^{-1} (a_i + c_i) (a_i + c_i)^{T}$ or equivalently $λ_i^{2} c_i c_i^{T} + μ_i^{2} a_i a_i^{T} ⩾ λ_i μ_i (a_i c_i^{T} + c_i a_i^{T})$. The desired inequality holds if, for any x,

(λ_i x^{T} c_i)^{2} + (μ_i x^{T} a_i)^{2} ⩾ 2 λ_i μ_i (x^{T} a_i) (x^{T} c_i),

which is obvious from the Cauchy–Schwarz inequality or, equivalently, from (λ_i x^{T} c_i − μ_i x^{T} a_i)^{2} ⩾ 0. The foregoing inequality becomes an equality for all x if and only if λ_i c_i = μ_i a_i. Thus, the equality in (A1) holds if and only if AB⁻¹ = CD⁻¹.
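Lemma 1 can be checked numerically. The sketch below draws random matrices of the stated dimensions, confirms that the difference between the two sides of (A1) has no negative eigenvalue, and confirms that the difference vanishes when C is chosen so that CD⁻¹ = AB⁻¹. The helper names, dimensions and random seed are hypothetical choices for illustration.

```python
import numpy as np

def random_spd(rng, q):
    """Random symmetric positive definite q x q matrix."""
    M = rng.normal(size=(q, q))
    return M @ M.T + q * np.eye(q)

def lemma_gap(A, B, C, D):
    """Left side minus right side of (A1); should be positive semidefinite."""
    lhs = A @ np.linalg.solve(B, A.T) + C @ np.linalg.solve(D, C.T)
    S = A + C
    rhs = S @ np.linalg.solve(B + D, S.T)
    return lhs - rhs

rng = np.random.default_rng(1)
p, q = 3, 4
A, C = rng.normal(size=(p, q)), rng.normal(size=(p, q))
B, D = random_spd(rng, q), random_spd(rng, q)

# Generic case: the smallest eigenvalue of the gap should be nonnegative.
min_eig = np.linalg.eigvalsh(lemma_gap(A, B, C, D)).min()

# Equality case: C = A B^{-1} D gives C D^{-1} = A B^{-1}.
C_eq = A @ np.linalg.solve(B, D)
gap_eq = np.abs(lemma_gap(A, B, C_eq, D)).max()
print(min_eig, gap_eq)
```

A numerical check of this kind is no substitute for the proof, but it is a quick way to catch a misplaced inverse or transpose when the lemma is applied.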

Appendix B

Consistency of the maximum likelihood estimator of full data under local alternatives

By the profile likelihood theory (Murphy & van der Vaart, 2000),

\log pl_k(β) = \log pl_k(\hat{β}_k) - \frac{1}{2} (β - \hat{β}_k)^{T} ℐ_k(β_k) (β - \hat{β}_k) + o_p\{(n^{1/2} ‖β - β_k‖ + 1)^{2}\}

for β in a neighbourhood of the true value of β_k. Denote the true value of β by β₀. Because β̂_k − β_k = O_p(n^{−1/2}), β_k − β₀ = O(n^{−1/2}) and ℐ_k(β_k) = ℐ_k(β̂_k) + o_p(n), we have

\sum_{k=1}^{K} \log pl_k(β) = \sum_{k=1}^{K} \log pl_k(\hat{β}_k) - \frac{1}{2} \sum_{k=1}^{K} (β - \hat{β}_k)^{T} ℐ_k(\hat{β}_k) (β - \hat{β}_k) + o_p\{(n^{1/2} ‖β - β_0‖ + 1)^{2}\}

for β in a neighbourhood of β₀. By definition, β̂ maximizes

- \frac{1}{2} \sum_{k=1}^{K} (β - \hat{β}_k)^{T} ℐ_k(\hat{β}_k) (β - \hat{β}_k).

It then follows from the Taylor series expansion that

- \frac{1}{2} \sum_{k=1}^{K} (\hat{β} - \hat{β}_k)^{T} ℐ_k(\hat{β}_k) (\hat{β} - \hat{β}_k) ⩾ - \frac{1}{2} \sum_{k=1}^{K} (β - \hat{β}_k)^{T} ℐ_k(\hat{β}_k) (β - \hat{β}_k) + α n ‖\hat{β} - β‖^{2}

for some positive constant α. Thus,

\log pl(\hat{β}) - \log pl(β) ⩾ α n ‖\hat{β} - β‖^{2} + o_p(n ‖β - β_0‖^{2} + n ‖\hat{β} - β_0‖^{2} + 1).

Because ‖β̂ − β₀‖ = O_p(n^{−1/2}), the foregoing inequality implies that pl(β̂) > pl(β) for any β such that ‖β − β̂‖ = n^{−1/2}M for a large M. Hence, there exists a local maximum within the n^{−1/2}M-neighbourhood of β̂. We denote that local maximizer by β̃ and conclude that β̃ − β₀ = O_p(n^{−1/2}).
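The Taylor expansion step in the argument above can be written out explicitly. The following is a sketch in which the symbols Q, ℐ (without subscript) and λ_min are introduced here for convenience and do not appear in the original derivation; the last inequality assumes that the smallest eigenvalue of the summed information grows at least linearly in n, that is, λ_min(ℐ) ⩾ 2αn for some α > 0.

```latex
Q(\beta) = -\frac{1}{2} \sum_{k=1}^{K} (\beta - \hat\beta_k)^{T}
           \mathcal{I}_k(\hat\beta_k) (\beta - \hat\beta_k),
\qquad
\mathcal{I} = \sum_{k=1}^{K} \mathcal{I}_k(\hat\beta_k).

% The first-order condition gives the information-weighted average.
\nabla Q(\hat\beta) = -\sum_{k=1}^{K} \mathcal{I}_k(\hat\beta_k) (\hat\beta - \hat\beta_k) = 0
\;\Longrightarrow\;
\hat\beta = \mathcal{I}^{-1} \sum_{k=1}^{K} \mathcal{I}_k(\hat\beta_k) \hat\beta_k .

% Q is exactly quadratic with Hessian -\mathcal{I}, so the expansion about \hat\beta has no remainder.
Q(\hat\beta) - Q(\beta)
  = \frac{1}{2} (\beta - \hat\beta)^{T} \mathcal{I} (\beta - \hat\beta)
  \geq \frac{1}{2} \lambda_{\min}(\mathcal{I}) \, \|\hat\beta - \beta\|^{2}
  \geq \alpha n \|\hat\beta - \beta\|^{2}.
```

Under that assumption the constant α depends only on the limiting per-observation information, which is why it can be taken as fixed in the displayed inequality.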

References

Berlin JA, Santanna J, Schmid CH, Szczech LA, Feldman HI. Individual patient- versus group-level data meta-regressions for the investigation of treatment effect modifiers: ecological bias rears its ugly head. Statist Med. 2002;21:371–87. doi: 10.1002/sim.1023.
Chalmers I, Sandercock P, Wennberg J. The Cochrane collaboration: preparing, maintaining, and disseminating systematic reviews of the effects of health care. Ann New York Acad Sci. 1993;703:156–65. doi: 10.1111/j.1749-6632.1993.tb26345.x.
Cox DR. Regression models and life-tables (with Discussion). J R Statist Soc B. 1972;34:187–220.
DerSimonian R, Laird N. Meta-analysis in clinical trials. Contr Clin Trials. 1986;7:177–88. doi: 10.1016/0197-2456(86)90046-2.
Kavvoura FK, Ioannidis JPA. Methods for meta-analysis in genetic association studies: a review of their potential and pitfalls. Hum Genet. 2008;123:1–14. doi: 10.1007/s00439-007-0445-9.
Lambert PC, Sutton AJ, Abrams KR, Jones DR. A comparison of summary patient-level covariates in meta-regression with individual patient data meta-analysis. J Clin Epidemiol. 2002;55:86–94. doi: 10.1016/s0895-4356(01)00414-0.
Lohmueller KE, Pearce CL, Pike, Lander ES, Hirschhorn JN. Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nature Genet. 2003;33:177–82. doi: 10.1038/ng1071.
Mathew T, Nordstrom K. On the equivalence of meta-analysis using literature and using individual patient data. Biometrics. 1999;55:1221–3. doi: 10.1111/j.0006-341x.1999.01221.x.
Murphy SA, van der Vaart AW. On the profile likelihood. J Am Statist Assoc. 2000;95:449–65.
Olkin I, Sampson A. Comparison of meta-analysis versus analysis of variance of individual patient data. Biometrics. 1998;54:317–22.
Simmonds MC, Higgins JP. Covariate heterogeneity in meta-analysis: criteria for deciding between meta-regression and individual patient data. Statist Med. 2007;26:2982–99. doi: 10.1002/sim.2768.
Sullivan PF, de Geus EJC, Willemsen G, James MR, Smit JH, Zandbelt T, Arolt V, Baune BT, Blackwood D, Cichon S, Coventry WL, Dohke M, Farmer A, Fava M, Gordon SD, He Q, Heath A, Heutink P, Holsboer F, Hoogendijk WJ, et al. Genome-wide association for major depressive disorder: a possible role for the presynaptic protein piccolo. Molec Psychiat. 2009;14:359–75. doi: 10.1038/mp.2008.125.
Sutton AJ, Abrams KR, Jones, Sheldon TA, Song F. Methods for Meta-Analysis in Medical Research. Chichester: Wiley; 2000.
The Psychiatric GWAS Consortium Steering Committee. A framework for interpreting genome-wide association studies of psychiatric disorders. Molec Psychiat. 2009;14:10–17. doi: 10.1038/mp.2008.126.
Whitehead A. Meta-Analysis of Controlled Clinical Trials. Chichester: Wiley; 2002.
Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL, Hu T, de Bakker PIW, Abecasis GR, Almgren P, Andersen G, Ardlie K, Boström KB, Bergman RM, Bonnycastle LL, Borch-Johnsen K, Burtt NP, Chen H, Chines PS, Daly MJ, Deodhar P, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nature Genet. 2008;40:638–45. doi: 10.1038/ng.120.
