Undue reliance on I² in assessing heterogeneity may mislead

Gerta Rücker; Guido Schwarzer; James R Carpenter; Martin Schumacher

doi:10.1186/1471-2288-8-79

BMC Medical Research Methodology. 2008 Nov 27;8:79.

PMCID: PMC2648991 PMID: 19036172

Abstract

Background

The heterogeneity statistic I², interpreted as the percentage of variability due to heterogeneity between studies rather than sampling error, depends on precision, that is, the size of the studies included.

Methods

Based on a real meta-analysis, we simulate artificially 'inflating' the sample size under the random effects model. For a given inflation factor M = 1, 2, 3,... and for each trial i, we create an M-inflated trial by drawing a treatment effect estimate from the random effects model, using $s_{i}^{2}/M$ as the within-trial sampling variance.

Results

As precision increases, while estimates of the heterogeneity variance τ² remain unchanged on average, estimates of I² increase rapidly to nearly 100%. A similar phenomenon is apparent in a sample of 157 meta-analyses.

Conclusion

When deciding whether or not to pool treatment estimates in a meta-analysis, the yard-stick should be the clinical relevance of any heterogeneity present. τ², rather than I², is the appropriate measure for this purpose.

Background

In meta-analysis, three principal sources of heterogeneity can be distinguished. These are (i) clinical baseline heterogeneity between patients from different studies, measured, e.g., in patient baseline characteristics and not necessarily reflected on the outcome measurement scale; (ii) statistical heterogeneity, quantified on the outcome measurement scale, that may or may not be clinically relevant and may or may not be statistically significant, and (iii) heterogeneity from other sources, e.g. design-related heterogeneity. In this article, we only deal with statistical heterogeneity. References [1-7] give an introduction to the large literature in this area. We do not discuss how to assess clinical baseline heterogeneity.

In this paper, we show that I² increases with the number of patients included in the studies in a meta-analysis. In the light of this, we argue that I² is in general of limited use in assessing clinically relevant heterogeneity.

The article is structured as follows. After introducing existing measures of heterogeneity in meta-analysis and discussing their properties, we illustrate the problem of interpreting the measure I² using an example from the literature. We then present a simulation study which explores the effect of sample size inflation on I², and finally conclude with a discussion.

Methods

Let k be the number of studies in a meta-analysis. Further, let $x_{i}$ be the within-study treatment effect estimate (e.g., a log odds ratio), $s_{i}^{2}$ the within-study variance of $x_{i}$, and $w_{i}$ the weight of study i (i = 1,..., k). In this article, we always use inverse variance weights, that is, $w_{i} = 1/s_{i}^{2}$ if the fixed effect model is used, and $w_{i} = 1/(s_{i}^{2} + τ^{2})$ if the random effects model is used (see below for the definition and estimation of the heterogeneity variance τ²). Several measures of statistical heterogeneity are widely used:

1. Cochran's Q statistic, which under the null hypothesis of no heterogeneity approximately follows a χ² distribution with k - 1 degrees of freedom [8]. Q is given by

Q = \sum_{i = 1}^{k} w_{i} {(x_{i} - \frac{\sum w_{j} x_{j}}{\sum w_{j}})}^{2};

2. Higgins' and Thompson's I², derived from Cochran's Q by defining [4]

I^{2} = \max\left\{0, \frac{Q - (k - 1)}{Q}\right\};

3. the between-study variance, τ², as estimated in a random effects meta-analysis. There are several proposals for estimating τ² in a meta-analysis, such as the REML estimator or the Hedges-Olkin estimator [5-7,9]. Nevertheless, most reviewers use the moment-based estimate of τ² [10], implemented in RevMan [11] and calculated with the fixed effect weights $w_{i} = 1/s_{i}^{2}$ as

{\hat{τ}}^{2} = \max\left\{0, \frac{Q - (k - 1)}{\sum w_{i} - \frac{\sum w_{i}^{2}}{\sum w_{i}}}\right\};

4. H², derived from Cochran's Q by defining [4]

H^{2} = \frac{Q}{k - 1},

and

5. R², similar to H² and calculated from τ² and a so-called 'typical' within-study variance σ² (which must be estimated), defined as:

R^{2} = \frac{τ^{2} + σ^{2}}{σ^{2}} .
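The measures above can all be computed from the $x_{i}$ and $s_{i}^{2}$ alone. The following is a minimal Python sketch, not the RevMan implementation; the function name `heterogeneity_measures` is hypothetical, and for R² it assumes the 'typical' within-study variance $σ^{2} = (k - 1)\sum w_{i} / ((\sum w_{i})^{2} - \sum w_{i}^{2})$, which we take to be the choice described by Higgins and Thompson [4].

```python
import math


def heterogeneity_measures(x, s2):
    """Q, I^2, moment (DerSimonian-Laird) tau^2, H^2 and R^2 for study
    estimates x with within-study variances s2 (weights w_i = 1/s_i^2)."""
    k = len(x)
    df = k - 1
    w = [1.0 / v for v in s2]
    sw = sum(w)
    sw2 = sum(wi * wi for wi in w)
    pooled = sum(wi * xi for wi, xi in zip(w, x)) / sw          # fixed effect estimate
    q = sum(wi * (xi - pooled) ** 2 for wi, xi in zip(w, x))    # Cochran's Q
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0               # I^2 as a proportion
    tau2 = max(0.0, (q - df) / (sw - sw2 / sw))                 # moment estimate of tau^2
    h2 = q / df                                                 # H^2 = Q / (k - 1)
    # Assumed 'typical' within-study variance used for R^2
    sigma2 = df * sw / (sw * sw - sw2)
    r2 = (tau2 + sigma2) / sigma2
    return {"Q": q, "I2": i2, "tau2": tau2, "H2": h2, "R2": r2, "sigma2": sigma2}


# Hypothetical log odds ratios and within-study variances
m = heterogeneity_measures([0.1, 0.3, -0.2, 0.5], [0.01, 0.02, 0.015, 0.03])
print({name: round(value, 3) for name, value in m.items()})
```

With this choice of σ², whenever ${\hat{τ}}^{2} > 0$ one has $I^{2} = {\hat{τ}}^{2}/({\hat{τ}}^{2} + σ^{2})$ and $R^{2} = H^{2}$, which is one way of seeing that the measures are directly related.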

As seen here, and described elsewhere [4], some measures are directly related, and others approximately related. Table 1 shows key properties of the various measures; more details are given in [4]. In summary:

Table 1.

Properties of measures of heterogeneity.

Measure	Scale	Range	Increases with number of studies	Increases with precision (study size)
Q	absolute	[0, ∞)	yes	yes
I²	percent	[0, 100%]	no	yes
τ, τ²	outcome	[0, ∞)	no	no
H, H²	absolute	[1, ∞)	no	yes
R, R²	absolute	[1, ∞)	no	yes


1. Q, which under H₀ approximately follows a χ² distribution with k - 1 degrees of freedom, is the weighted sum of squared differences between the study estimates and the fixed effect estimate. It increases with the number of studies, k, in the meta-analysis; under H₀ its expected value is k - 1.

2. In contrast to Q, the statistic I² was introduced by Higgins and Thompson [4] as a measure independent of k, the number of studies in the meta-analysis. I² is interpreted as the percentage of variability in the treatment estimates which is attributable to heterogeneity between studies rather than to sampling error.

3. τ² describes the underlying between-study variability. Its square root, τ, is measured in the same units as the outcome. Its estimates do not systematically increase with either the number or the size of studies in a meta-analysis.

4. H² is a test statistic. It is the ratio of the observed Q to its expected value in the absence of heterogeneity, k - 1. Thus it does not systematically increase with the number of studies [4]. H corresponds to the residual standard deviation in a radial (Galbraith) plot [12]. H = 1 indicates perfect homogeneity.

5. R² is the square of a statistic R which describes the inflation of the random effects confidence interval compared to that from the fixed effect model. It does not increase with k. R² = 1 indicates perfect homogeneity [4].

Notice that, in contrast to τ², the measures Q, I², H and R all depend on the precision, which is proportional to study size [13]. Thus, given an underlying model, if the study sizes are enlarged, the confidence intervals become smaller and the heterogeneity, measured (say) using I², increases. This is reflected in the interpretation: as I² is the percentage of variability that is due to between-study heterogeneity, 1 - I² is the percentage of variability that is due to sampling error. When the studies become very large, the sampling error tends to 0 and I² tends to 100%. Such heterogeneity may not be clinically relevant.
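For intuition, I² can be written (exactly, for the moment estimator, with a suitably defined 'typical' within-study variance σ² [4]) as $τ^{2}/(τ^{2} + σ^{2})$. If an M-fold increase in precision divides σ² by M while τ² stays fixed, I² must approach 100%. The Python sketch below assumes τ² = 0.018 (the estimate reported in the Results) and an arbitrary illustrative σ² = 0.08; the function name and the value of σ² are hypothetical, not taken from the original analysis.

```python
def i2_from_variances(tau2, sigma2, m=1):
    """I^2 = tau^2 / (tau^2 + sigma^2 / M): heterogeneity tau2 is held fixed
    while the typical within-study variance sigma2 shrinks M-fold."""
    return tau2 / (tau2 + sigma2 / m)


TAU2 = 0.018     # heterogeneity variance, unchanged by larger studies
SIGMA2 = 0.08    # hypothetical typical within-study variance at M = 1
for m in (1, 4, 16, 64, 256):
    print(f"M = {m:3d}: I2 = {100 * i2_from_variances(TAU2, SIGMA2, m):.1f}%")
```

Nothing about the between-study variability changes across the loop; only the sampling error does, yet I² climbs towards 100%.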

We now explore this further using simulation. Note first that simply scaling up all sample sizes by a common factor while leaving the treatment effect estimates unchanged is not appropriate: if study sizes were truly to increase, each estimate would approach the true value for its study rather than remain fixed at the originally observed value. Instead, we simulate under the random effects model. Under this model, μ and τ² are assumed constant, and the total variance in study i is $σ_{i}^{2} + τ^{2}$, which decreases with increasing study sample size, eventually tending to τ².

Study size inflation based on the random effects model

Suppose that in a meta-analysis trial i reports treatment effect estimate $x_{i}$ (e.g., on the log odds scale) with observed sampling variance $s_{i}^{2}$. Let τ² denote the heterogeneity variance. The model is

x_{i} = μ + \sqrt{σ_{i}^{2} + τ^{2}}\, ϵ_{i}, \quad ϵ_{i} \sim N(0, 1),

where μ is the average treatment effect. For a given inflation factor M = 1, 2, 3,..., the model with inflated sample size (corresponding to an M-fold increase in precision) is

x_{M, i} = μ + \sqrt{σ_{i}^{2}/M + τ^{2}}\, ϵ'_{i}, \quad ϵ'_{i} \sim N(0, 1).

We generate one illustrative meta-analysis for each inflation factor. For each trial in each meta-analysis, we generate a random M-inflated trial by drawing a treatment effect estimate $x_{M,i}$ from this model, using $s_{i}^{2}/M$ as the within-trial sampling variance and the DerSimonian-Laird estimate ${\hat{τ}}^{2}$ for the heterogeneity parameter τ².
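The inflation step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the within-trial variances of the 70 trials are not reproduced here, so hypothetical values are drawn at random; μ is fixed at log(0.732), the random effects estimate reported in the Results, which is our assumption because the text does not state which value of μ was used; and the helper names `q_i2_tau2` and `inflate` are illustrative.

```python
import math
import random


def q_i2_tau2(x, s2):
    """Cochran's Q, I^2 and DerSimonian-Laird tau^2 (fixed effect weights)."""
    w = [1.0 / v for v in s2]
    sw = sum(w)
    pooled = sum(wi * xi for wi, xi in zip(w, x)) / sw
    q = sum(wi * (xi - pooled) ** 2 for wi, xi in zip(w, x))
    df = len(x) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    tau2 = max(0.0, (q - df) / (sw - sum(wi * wi for wi in w) / sw))
    return q, i2, tau2


def inflate(s2, mu, tau2, m, rng):
    """Draw one M-inflated meta-analysis from the random effects model:
    x_{M,i} = mu + sqrt(s_i^2 / M + tau2) * eps_i, with eps_i ~ N(0, 1)."""
    s2_m = [v / m for v in s2]
    x_m = [mu + math.sqrt(v + tau2) * rng.gauss(0.0, 1.0) for v in s2_m]
    return x_m, s2_m


rng = random.Random(79)                             # fixed seed for reproducibility
s2 = [rng.uniform(0.005, 0.2) for _ in range(70)]   # hypothetical s_i^2 for 70 trials
mu, tau2 = math.log(0.732), 0.018                   # assumed mu; tau2 as in the example
for m in (1, 4, 16, 64):
    x_m, s2_m = inflate(s2, mu, tau2, m, rng)
    q, i2, tau2_hat = q_i2_tau2(x_m, s2_m)
    print(f"M={m:3d}  tau2_hat={tau2_hat:.3f}  Q={q:8.1f}  I2={100 * i2:5.1f}%")
```

Because every run draws fresh $ϵ'_{i}$, the printed ${\hat{τ}}^{2}$ fluctuates around τ² while Q and I² grow with M; the exact numbers will differ from those in Table 2.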

Results

We use data from a large meta-analysis of 70 trials of thrombolytic therapy in acute myocardial infarction [14]. The original analysis using the fixed effect model (Mantel-Haenszel method) gives an odds ratio of 0.747 with a 95% confidence interval (95% CI) of [0.705; 0.792]. Using the random effects model, the odds ratio is 0.732, 95% CI [0.664; 0.808]. The DerSimonian-Laird estimate of τ² is 0.018 (H = 1.11, 95% CI [1; 1.29], I² = 18.6%, 95% CI [0%; 40.1%]). As Q = 85 on 69 degrees of freedom (p = 0.0953), there is no significant evidence of heterogeneity at the 5% level.
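As a quick arithmetic check, I² and H follow from the reported Q and k = 70 via the formulas in the Methods. Because Q is reported rounded to 85, the recomputed I² (about 18.8%) differs slightly from the reported 18.6%.

```python
import math

q, k = 85.0, 70              # reported (rounded) Q and number of trials
df = k - 1                   # degrees of freedom of Cochran's Q
i2 = max(0.0, (q - df) / q)  # I^2 = (Q - (k - 1)) / Q
h = math.sqrt(q / df)        # H = sqrt(Q / (k - 1))
print(f"I2 = {100 * i2:.1f}%, H = {h:.2f}")  # prints: I2 = 18.8%, H = 1.11
```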

We now explore the effect of increasing M. Figure 1 shows forest plots of the original meta-analysis along with illustrative meta-analyses generated for M = 4, 16 and 64. The behavior of the heterogeneity measures is shown in Table 2. While the variation in ${\hat{τ}}^{2}$ is essentially random, the values of Q, H and I² increase rapidly with increasing sample size.

**Figure 1. Top left panel: meta-analysis of thrombolytic therapy in acute myocardial infarction** [14]. Other panels: illustrative randomly sampled versions of the same meta-analysis with sample-size inflation factors of M = 4, 16 and 64 (details in text).

Table 2.

Effect of increasing within-trial precision (factor M) on heterogeneity measures (data in [14]). Values in brackets are 95% confidence intervals.

Factor M	${\hat{τ}}^{2}$	Q (P-value)	I² [95% CI]	H [95% CI]

1	0.018	85 (0.0953)	18.6% [0%; 40.1%]	1.11 [1; 1.29]
4	0.008	98 (0.0135)	29.2% [4.5%; 47.6%]	1.19 [1.02; 1.38]
16	0.027	454 (<0.0001)	84.8% [81.4%; 87.5%]	2.56 [2.32; 2.83]
64	0.028	1708 (<0.0001)	96.0% [95.4%; 96.5%]	4.98 [4.65; 5.32]


Figures 2 and 3 give two other perspectives on this. Figure 2 shows that as M increases, ${\hat{τ}}^{2}$ varies randomly, while (i) the average of the within-study variances, (ii) the estimated total variance (under the model) and (iii) the observed total variance all decrease rapidly. Using the same data, Figure 3 shows how I² behaves. Note how rapidly it approaches 100%.

**Figure 2. Within-study variation, decreasing with increasing sample size while heterogeneity remains constant**. Details in text.

**Figure 3. Percentage I² of variation due to heterogeneity rather than to sampling error against sample size (same simulation data as in Figure 2)**.

Empirical evaluation: a sample of meta-analyses

In order to examine the behavior and the order of magnitude of I² empirically, we further looked at a sample of 157 meta-analyses with binary endpoints. This data set was kindly provided by Peter Jüni [15]. We calculated τ² and I² for each meta-analysis. Further, for each meta-analysis i = 1,..., 157, we calculated the median size $n_{i}$ of its contributing studies. After excluding all meta-analyses with both τ² = I² = 0 (n = 58), we fitted a linear model to the remaining 99 meta-analyses with I² as outcome and ${\hat{τ}}_{i}$ and $\log n_{i}$ as covariates (thus implicitly assuming a log-normal distribution for study size).

As expected, I² increases with both heterogeneity ($β_{τ}$ = 65.873, SE = 4.788, p < 0.001) and median study size ($β_{\log n}$ = 8.503, SE = 1.460, p < 0.001). The residual standard error is 13.07, with an adjusted $R_{adj}^{2}$ of 0.6621 (F = 97.01 on 2 and 96 df, p < 0.001). That is, even after adjusting for between-study heterogeneity, I² depends strongly on study size. Figure 4 illustrates the results.
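The model above is an ordinary linear regression of I² on ${\hat{τ}}_{i}$ and $\log n_{i}$. A minimal sketch using NumPy least squares follows; the function name `fit_i2_model` is hypothetical, and since the 157 meta-analyses are not reproduced here, the usage example only checks that the fit recovers known coefficients (10, 50, 7) from made-up, noise-free data.

```python
import numpy as np


def fit_i2_model(i2, tau_hat, median_n):
    """OLS fit of I2 = b0 + b_tau * tau_hat + b_logn * log(median_n).

    Returns the coefficient vector (b0, b_tau, b_logn)."""
    i2 = np.asarray(i2, dtype=float)
    design = np.column_stack([
        np.ones_like(i2),                            # intercept
        np.asarray(tau_hat, dtype=float),            # estimated between-study SD
        np.log(np.asarray(median_n, dtype=float)),   # log of median study size
    ])
    coef, *_ = np.linalg.lstsq(design, i2, rcond=None)
    return coef


# Made-up, noise-free data generated from known coefficients (10, 50, 7)
tau_hat = np.array([0.05, 0.10, 0.20, 0.30, 0.15, 0.40])
median_n = np.array([40, 120, 80, 300, 60, 500])
i2 = 10 + 50 * tau_hat + 7 * np.log(median_n)
print(np.round(fit_i2_model(i2, tau_hat, median_n), 3))
```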

**Figure 4. I² against median study size in a sample of 157 meta-analyses**. Light, grey and black dots and regression lines correspond to the first, second and third tercile of the distribution of τ². Within each class of meta-analyses, I² increases with median study size.

Discussion

The main advantage of the statistic I² is that it does not depend on the number of studies in a meta-analysis. Thus, using I² instead of Q, it is possible to compare the statistical heterogeneity of meta-analyses with different numbers of studies [4]. Also, I² is easily interpreted by clinicians as the percentage of variability in the treatment estimates which is attributable to heterogeneity between studies rather than to sampling error.

However, an immediate (but often overlooked) consequence of this interpretation is that I² increases with the number of patients included in the studies in a meta-analysis. In a recent simulation using continuous outcomes, others found empirically that I² increased with increasing numbers of patients per trial even though τ² was kept fixed [16]. Unfortunately, as demonstrated by a recent empirical study [17], reviewers seem to be unaware of this when they use I² to decide whether to pool studies in a meta-analysis. Some authors also seem to be reluctant to call I² a statistic, using instead words such as metric [18], index [19], or even point estimate [17,18,20]. On the other hand, the term 'statistical test' is used in connection with I² in one of these references ([20], p. 915). In another reference [18], the authors proposed an algorithm for a sensitivity analysis that successively excludes 'outlying' trials until I² falls below a prespecified level. In response to this [21], Higgins showed that the exclusion of a large trial with its effect close to the pooled estimate can be the most efficient way to reduce I².

Our simulation highlights the problem of interpreting heterogeneity measured by I² as clinical heterogeneity. This is analogous to interpreting statistically significant effects (P < 0.05) as clinically relevant. In our view, the decision on whether or not to pool studies in a meta-analysis should not be based solely on I². Instead, studies with relatively large I² may usefully be pooled when the clinically relevant heterogeneity (in efficacy and covariates) is acceptably small.

Further, as τ is measured on the same scale as the outcome, it can be used directly to quantify variability. Indeed, clinically meaningful heterogeneity on the outcome scale could be pre-specified. For example, a reviewer may decide in advance that three studies with odds ratios of 0.8, 1 and 1.25 cannot be pooled; in other words, relative effect ratios of 0.8 = 1/1.25 are too great. This corresponds to a standard deviation $τ_{0} = -\log 0.8 = \log 1.25 ≈ 0.22 ≈ \sqrt{0.05}$ on the log scale and thus a threshold of $τ_{0}^{2}$ = 0.05 for the heterogeneity variance τ².
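The threshold arithmetic can be checked directly: τ₀ is the sample standard deviation (denominator k - 1) of the three log odds ratios. The helper `heterogeneity_acceptable` is a hypothetical illustration of comparing ${\hat{τ}}^{2}$ with a pre-specified $τ_{0}^{2}$; it uses only the point estimate and ignores the uncertainty in ${\hat{τ}}^{2}$.

```python
import math
import statistics

# Log odds ratios of the three hypothetical studies with OR 0.8, 1 and 1.25
log_ors = [math.log(or_) for or_ in (0.8, 1.0, 1.25)]
tau0 = statistics.stdev(log_ors)            # sample SD with denominator k - 1
tau0_sq = tau0 ** 2
print(round(tau0, 3), round(tau0_sq, 3))    # prints: 0.223 0.05


def heterogeneity_acceptable(tau2_hat, threshold):
    """Hypothetical rule: heterogeneity is deemed clinically acceptable if
    the estimated tau^2 does not exceed the pre-specified threshold."""
    return tau2_hat <= threshold


# tau^2 estimate of 0.018 from the thrombolysis example
print(heterogeneity_acceptable(0.018, tau0_sq))  # prints: True
```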

While Higgins and Thompson in their papers [4,22] thoroughly described the properties of the various measures and distinguished between them, we feel current guidelines are likely to let misconceptions persist. For example, the 'Cochrane Handbook for Systematic Reviews of Interventions' (outdated Version 4.2.6, page 138) stated 'A value [of I²] greater than 50% may be considered as substantial heterogeneity'. The recent Version 5.0.1, while admitting that 'thresholds for the interpretation of I² can be misleading, since the importance of inconsistency depends on several factors', nevertheless lists overlapping ranges of I² which provide 'a rough guide to interpretation' (see Table 3) [23]. The result is that some reviewers conclude that studies must not be pooled if I² > 50% [24,25]. By contrast, Section 9.5.4 of the handbook states 'The choice between a fixed-effect and a random-effects meta-analysis should never be made on the basis of a statistical test of heterogeneity'. Further, some methodologists discourage reviewers from using tests for funnel plot asymmetry if I² > 50% [26].

Table 3.

Ranges for interpretation of I² following the Cochrane Handbook for Systematic Reviews of Interventions (Version 5.0.1) [23].

0% to 40%	might not be important
30% to 60%	may represent moderate heterogeneity
50% to 90%	may represent substantial heterogeneity
75% to 100%	considerable heterogeneity


We believe the interpretation issues stem from the concept of I² as 'the proportion of variance (un)explained', referred to as 'widely familiar' to clinicians by Higgins and Thompson [4] (Section 4). However, there is a fundamental difference between the interpretation of the coefficient of determination $R_{reg}^{2}$ in regression analysis, which is subconsciously invoked by this phrase, and that of I². On the one hand, $R_{reg}^{2}$ (that is, the square of the correlation coefficient) is a measure of the association between the dependent and the independent variable, which homes in on the true value as the sample size increases. On the other hand, I² tends to 100% as the number of patients increases. Although one may argue that the 'unit' corresponding to the 'observation' in a regression is the study, not the patient, this link is only strictly valid if the sample sizes of new studies are distributed similarly to those of existing studies. This is not universally true: often small trials are followed by larger ones. Thus I² will tend to increase artificially as evidence accumulates.

To address this, more weight should be given to often overlooked comments by Higgins and Thompson ([4], p. 1545), who state 'Note that we do not propose that our measure should be independent of the precisions of estimates observed in the studies. Thus sets of studies with identical heterogeneity τ², but with different degrees of sampling error σ², will produce different measures.... Describing the underlying between-study variability ... can best be achieved simply by estimating the between-study variance, τ².'

Conclusion

When deciding whether or not to pool treatment estimates in a meta-analysis, the yard-stick should be the clinical relevance of any heterogeneity present. τ², rather than I², is the appropriate measure for this purpose.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

GR proposed the model for sample size inflation, did all calculations and wrote the first draft of the manuscript. GS, JC and MS contributed to the writing and approved the final version.

Pre-publication history

The pre-publication history for this paper can be accessed here:

http://www.biomedcentral.com/1471-2288/8/79/prepub

Acknowledgments

GR and JC are funded by Deutsche Forschungsgemeinschaft (FOR 534 Schw 821/2-2). The authors wish to thank Peter Jüni for providing data and all reviewers and Douglas G Altman for helpful discussion.

Contributor Information

Gerta Rücker, Email: ruecker@imbi.uni-freiburg.de.

Guido Schwarzer, Email: sc@imbi.uni-freiburg.de.

James R Carpenter, Email: James.Carpenter@lshtm.ac.uk.

Martin Schumacher, Email: ms@imbi.uni-freiburg.de.

References

[1] Hardy RJ, Thompson SG. Detecting and describing heterogeneity in meta-analysis. Statistics in Medicine. 1998;17:841–856. doi: 10.1002/(SICI)1097-0258(19980430)17:8<841::AID-SIM781>3.0.CO;2-D.
[2] Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: A comparison of methods. Statistics in Medicine. 1999;18:2693–2708. doi: 10.1002/(SICI)1097-0258(19991030)18:20<2693::AID-SIM235>3.0.CO;2-V.
[3] Engels EA, Schmid CH, Terrin N, Olkin I, Lau J. Heterogeneity and statistical significance in meta-analysis: An empirical study of 125 meta-analyses. Statistics in Medicine. 2000;19:1707–1728. doi: 10.1002/1097-0258(20000715)19:13<1707::AID-SIM491>3.0.CO;2-P.
[4] Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine. 2002;21:1539–1558. doi: 10.1002/sim.1186.
[5] Sidik K, Jonkman JN. Simple heterogeneity variance estimation for meta-analysis. JRSS Series C (Applied Statistics). 2005;54:367–384. doi: 10.1111/j.1467-9876.2005.00489.x.
[6] Knapp G, Biggerstaff BJ, Hartung J. Assessing the amount of heterogeneity in random-effects meta-analysis. Biometrical Journal. 2006;48:271–285. doi: 10.1002/bimj.200510175.
[7] Viechtbauer W. Confidence intervals for the amount of heterogeneity in meta-analysis. Statistics in Medicine. 2007;26:37–52. doi: 10.1002/sim.2514.
[8] Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10:101–129. doi: 10.2307/3001666.
[9] Hedges LV. A random effects model for effect sizes. Psychological Bulletin. 1983;93:388–395. doi: 10.1037/0033-2909.93.2.388.
[10] DerSimonian R, Laird N. Meta-analysis in Clinical Trials. Controlled Clinical Trials. 1986;7:177–188. doi: 10.1016/0197-2456(86)90046-2.
[11] Review Manager (RevMan) [Computer program]. Version 5.0. 2008. http://www.cc-ims.net/RevMan/RevMan5/
[12] Galbraith RF. A note on graphical presentation of estimated odds ratios from several clinical trials. Statistics in Medicine. 1988;7:889–894. doi: 10.1002/sim.4780070807.
[13] Mittlböck M, Heinzl H. A simulation study comparing properties of heterogeneity measures in meta-analyses. Statistics in Medicine. 2006;25:4321–4333. doi: 10.1002/sim.2692.
[14] Olkin I. Statistical and theoretical considerations in meta-analysis. Journal of Clinical Epidemiology. 1995;48:133–146. doi: 10.1016/0895-4356(94)00136-E.
[15] Jüni P. Department of Social and Preventive Medicine, University of Berne, Switzerland. Personal Communication. 2006.
[16] Friedrich JO, Adhikari N, Beyene J. The ratio of means method as an alternative to mean differences for analyzing continuous outcome variables in meta-analysis: a simulation study. BMC Medical Research Methodology. 2008;8:32. doi: 10.1186/1471-2288-8-32. http://www.biomedcentral.com/1471-2288/8/32
[17] Ioannidis JP, Patsopoulos NA, Rothstein HR. Reasons or excuses for avoiding meta-analysis in forest plots. BMJ. 2008;336:1413–1415. doi: 10.1136/bmj.a117. http://www.bmj.com/cgi/content/full/336/7658/1413
[18] Patsopoulos NA, Evangelou E, Ioannidis JP. Sensitivity of between-study heterogeneity in meta-analysis: proposed metrics and empirical evaluation. International Journal of Epidemiology. 2008;37:1148–1157. doi: 10.1093/ije/dyn065.
[19] Huedo-Medina TB, Sánchez-Meca J, Marín-Martínez F, Botella J. Assessing heterogeneity in meta-analysis: Q statistic or I² index? Psychological Methods. 2006;11:193–206. doi: 10.1037/1082-989X.11.2.193.
[20] Ioannidis JP, Patsopoulos NA, Evangelou E. Uncertainty in heterogeneity estimates in meta-analyses. BMJ. 2007;335:914–916. doi: 10.1136/bmj.39343.408449.80. http://www.bmj.com/cgi/content/full/335/7626/914
[21] Higgins JP. Commentary: Heterogeneity in meta-analysis should be expected and appropriately quantified. International Journal of Epidemiology. 2008;37:1158–1160. doi: 10.1093/ije/dyn204.
[22] Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analysis. BMJ. 2003;327:557–560. doi: 10.1136/bmj.327.7414.557. http://www.bmj.com/cgi/content/full/327/7414/557
[23] Higgins JP, Green S. Cochrane Handbook for Systematic Reviews of Interventions Version 5.0.1. 2008. http://www.cochrane-handbook.org
[24] Thomas D, Elliott E, Naughton G. Exercise for type 2 diabetes mellitus. Cochrane Database of Systematic Reviews. 2006;19:3. doi: 10.1002/14651858.CD002968.pub2.
[25] Timmer A, McDonald JW, MacDonald JK. Azathioprine and 6-mercaptopurine for maintenance of remission in ulcerative colitis. Cochrane Database Syst Rev. 2007;24:CD000478. doi: 10.1002/14651858.CD000478.pub2.
[26] Ioannidis JPA, Trikalinos TA. The appropriateness of asymmetry tests for publication bias in meta-analyses: a large survey. Canadian Medical Association Journal. 2007;176:1091–1096. doi: 10.1503/cmaj.060410.
