Uncertainty in heterogeneity estimates in meta-analyses

John P A Ioannidis; Nikolaos A Patsopoulos; Evangelos Evangelou

BMJ 2007 Nov 3;335(7626):914–916. doi:10.1136/bmj.39343.408449.80

PMCID: PMC2048840 PMID: 17974687

Abstract

John Ioannidis, Nikolaos Patsopoulos, and Evangelos Evangelou argue that, although meta-analyses often measure heterogeneity between studies, these estimates can have large uncertainty, which must be taken into account when interpreting evidence.

Summary points

The extent of between study heterogeneity should be measured when interpreting results of meta-analyses
Meta-analyses rarely document uncertainty in estimates of heterogeneity
Our evaluation of a large number of meta-analyses shows a wide range of uncertainty about the extent of heterogeneity in most
Confidence intervals of I² should be calculated and considered when interpreting meta-analyses

An important aim of systematic reviews and meta-analyses is to assess the extent to which different studies give similar or dissimilar results.¹ Clinical, methodological, and biological heterogeneity are often topic specific, but statistical heterogeneity can be examined with the same methods in all meta-analyses. The perceived presence or absence of statistical heterogeneity therefore often shapes important decisions by meta-analysts and clinicians. These include whether the studies are similar enough to be combined; whether a treatment is applicable to all patients or should be “individualised” because its benefits or harms vary across types of patients; and whether a risk factor affects all exposed people or only select populations. How uncertain is the extent of statistical heterogeneity in meta-analyses? And is this uncertainty properly taken into account when the results are interpreted?

Evaluating heterogeneity between studies

Many statistical tests are available for evaluating heterogeneity between studies.² ³ Until recently, the most popular was Cochran's Q, a statistic based on the χ² test.⁴ However, Cochran's Q usually has low power to detect heterogeneity. Its value also depends on the number of studies, so it cannot be compared across different meta-analyses.² ³ Higgins and colleagues, in two highly cited papers,⁵ ⁶ proposed the routine use of the I² statistic. I² is calculated as [(Q−df)/Q]×100%, where df is the degrees of freedom (number of studies minus 1); negative values are set to 0%. I² therefore ranges from 0% to 100% and describes the proportion of the total variation across studies that is due to heterogeneity rather than chance. This statistic can be used to compare the amount of inconsistency across meta-analyses, even when they include different numbers of studies.⁷ I² is routinely reported in Cochrane reviews (it is a standard option in RevMan) and is increasingly used in meta-analyses published in medical journals.
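As a minimal sketch of these definitions, the Python below computes Cochran's Q with inverse-variance (fixed effect) weights and then I², truncating negative values at zero. The effect estimates and variances are hypothetical, and the function names are ours, not those of RevMan or any other package.

```python
def cochran_q(effects, variances):
    """Cochran's Q for study estimates with known within-study variances.

    Uses inverse-variance (fixed effect) weights; effects are assumed to be on
    an additive scale, such as log odds ratios.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Weighted sum of squared deviations from the pooled estimate
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, k):
    """I² = (Q - df) / Q with df = k - 1, set to 0 when Q <= df."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


# Hypothetical example: three log odds ratios, each with variance 0.04
effects = [0.0, 0.5, 1.0]
variances = [0.04, 0.04, 0.04]
q = cochran_q(effects, variances)
print(f"Q = {q:.2f}, I² = {100 * i_squared(q, len(effects)):.0f}%")
```

Here Q is 12.5 on 2 degrees of freedom, so I² is (12.5−2)/12.5, or 84%.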

Higgins and colleagues suggested that we could “tentatively assign adjectives of low, moderate, and high to I² values of 25%, 50%, and 75%.”⁶ Like any estimate, however, I² is subject to uncertainty, and Higgins and Thompson provided methods to calculate it.⁵ More recently, other investigators compared the performance of I² and Q in Monte Carlo simulations across diverse meta-analytic conditions. They found that I², like Q, has low statistical power when the number of studies is small, and that its confidence intervals can be wide.⁸
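The instability of I² with few studies can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, five studies with equal within-study variance v=1 and between-study variance τ²=1, so that the true I² is 50%; it is not a reconstruction of the simulations in reference 8, and `simulate_i2` is a hypothetical helper.

```python
import math
import random


def simulate_i2(k, tau2, v, reps, seed=1):
    """Simulate I² estimates for k studies that share within-study variance v.

    Each observed effect is a true effect (between-study variance tau2) plus
    sampling error (variance v), i.e. normal with variance v + tau2.
    """
    rng = random.Random(seed)
    sd = math.sqrt(v + tau2)
    df = k - 1
    estimates = []
    for _ in range(reps):
        ys = [rng.gauss(0.0, sd) for _ in range(k)]
        mean = sum(ys) / k  # equal weights, so the pooled estimate is the plain mean
        q = sum((y - mean) ** 2 for y in ys) / v
        estimates.append(max(0.0, (q - df) / q) if q > 0 else 0.0)
    return estimates


est = simulate_i2(k=5, tau2=1.0, v=1.0, reps=4000)
share_zero = sum(e == 0.0 for e in est) / len(est)
share_high = sum(e >= 0.75 for e in est) / len(est)
print(f"true I² = 50%; estimated 0%: {share_zero:.0%}; estimated >=75%: {share_high:.0%}")
```

Under these assumptions Q/2 follows a χ² distribution with 4 degrees of freedom, so roughly 26% of simulated meta-analyses should give an estimate of I²=0% and roughly 9% an estimate of 75% or more, although the true value is 50% in every one of them.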

Interpreting heterogeneity in selected meta-analyses

Inferences about the extent of heterogeneity must be especially cautious when the 95% confidence intervals around I² are wide, ranging from low to high heterogeneity. Such uncertainty is usually ignored in systematic reviews, however. This can result in misconceptions. For example, a systematic review of corticosteroids for Kawasaki disease found a point estimate I²=59%.⁹ The authors decided to exclude the two studies that were most different, saying that their removal eliminated all of the across study heterogeneity (Q=5.59, P=0.588, I²=0.00). In fact, the 95% confidence interval for this I²=0% estimate still extends from 0% to 56%. With two small randomised trials and six non-randomised comparisons remaining, the meta-analysis concluded that corticosteroids consistently halve the risk of coronary aneurysms. However, the two largest randomised trials on this topic were published after the meta-analysis. Heterogeneity resurfaced: the largest trial found no effect on coronary dimensions,¹⁰ while the other trial showed an 80% reduction in the risk of coronary artery abnormalities.¹¹

Eight systematic reviews published in the BMJ between 1 July 2005 and 1 January 2006 performed meta-analyses of randomised trials, and seven of them performed some statistical analysis of heterogeneity between studies (table on bmj.com).¹² ¹³ ¹⁴ ¹⁵ ¹⁶ ¹⁷ ¹⁸ Each of these reviews stated that it had tried to interpret heterogeneity, and seven meta-analyses provided enough information for us to calculate the 95% confidence interval of I². With one exception, the lower 95% confidence limit was 0% (rounded to the nearest integer percentage). The upper 95% confidence limit always exceeded the 50% threshold, and in four cases it also exceeded the 75% threshold. A conclusive statement was possible in only one case, where I² was 69% (95% confidence interval 40% to 80%) and the Q statistic had P<0.001; the authors justifiably concluded that “there was significant heterogeneity among these trials.”¹³ This meta-analysis included 15 studies, so both Q and I² had good power. In all the other meta-analyses (two to 12 studies each), it would be difficult to make strong statements about heterogeneity. Only one review presented a 95% confidence interval for its I² estimate.¹² The authors concluded that “we could not observe significant heterogeneity,” and indeed the Q statistic had P=0.19. However, with only five studies, the power to detect heterogeneity was negligible: I² was 35%, and its 95% confidence interval ranged from 0% (no heterogeneity) to 76% (high heterogeneity).
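Because I² and Q are linked by I²=(Q−df)/Q, a reported I² and number of studies can be inverted to recover Q and its P value. The sketch below does this for the five-study review (I²=35%, so df=4). The χ² tail probability uses the closed form that holds only for even degrees of freedom, the helper names are hypothetical, and because the published I² is rounded the recovered Q is approximate.

```python
import math


def q_from_i2(i2, k):
    """Invert I² = (Q - df) / Q to recover Q (meaningful only when I² > 0)."""
    df = k - 1
    return df / (1.0 - i2)


def chi2_sf_even_df(x, df):
    """Upper-tail χ² probability; this closed form is valid only for even df."""
    if df % 2:
        raise ValueError("this shortcut needs an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # (x/2)^j / j!
        total += term
    return math.exp(-half) * total


# Reported values for the five-study review: I² = 35%
q = q_from_i2(0.35, k=5)
p = chi2_sf_even_df(q, df=4)
print(f"Q ≈ {q:.2f}, P ≈ {p:.2f}")
```

This gives Q of about 6.15 and P of about 0.19, consistent with the P value reported by the authors.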

Uncertainty in I²: large scale survey of meta-analyses

This limitation is not confined to the selected examples presented here; it is probably the rule rather than the exception. We used two large datasets of meta-analyses to evaluate empirically the extent of uncertainty in I² estimates. Firstly, we looked at meta-analyses with binary outcomes and four or more synthesised studies in the Cochrane Database of Systematic Reviews (Issue 4, 2005). Because each Cochrane review may include several meta-analyses, we used only the one with the largest number of studies per review; in the case of ties, we used the one with the largest total sample size. We did not include meta-analyses of two or three studies. Such meta-analyses form a sizeable proportion of the Cochrane Library,¹⁹ but their 95% confidence intervals of I² almost always span a wide range of heterogeneity, unless the studies are large and give very different results. In total, we calculated I² and its 95% confidence interval for 1011 meta-analyses. The second dataset was a previously described database of 50 meta-analyses of gene-disease associations that had found a nominally statistically significant effect (P<0.05) for the proposed genetic risk factor.²⁰
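The rule for choosing one meta-analysis per Cochrane review can be written as a short filter-and-select step. This is a sketch under assumed record fields: `review_id`, `n_studies`, `total_sample_size`, and `binary_outcome` are hypothetical names, not Cochrane Library fields, and exact ties on both criteria keep the first record encountered, a detail the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class MetaAnalysis:
    review_id: str          # hypothetical identifier of the parent review
    n_studies: int
    total_sample_size: int
    binary_outcome: bool


def select_one_per_review(analyses, min_studies=4):
    """For each review, keep the eligible meta-analysis with the most studies,
    breaking ties by the largest total sample size."""
    best = {}
    for ma in analyses:
        # Eligibility: binary outcome and at least min_studies studies
        if not ma.binary_outcome or ma.n_studies < min_studies:
            continue
        current = best.get(ma.review_id)
        key = (ma.n_studies, ma.total_sample_size)
        if current is None or key > (current.n_studies, current.total_sample_size):
            best[ma.review_id] = ma
    return list(best.values())


analyses = [
    MetaAnalysis("A", 6, 500, True),
    MetaAnalysis("A", 6, 900, True),   # same number of studies, larger sample
    MetaAnalysis("A", 3, 5000, True),  # too few studies
    MetaAnalysis("B", 3, 100, True),   # review B has no eligible analysis
    MetaAnalysis("C", 8, 200, False),  # not a binary outcome
    MetaAnalysis("C", 5, 300, True),
]
selected = select_one_per_review(analyses)
for ma in selected:
    print(ma)
```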

Figure 1 shows the lower and upper 95% confidence limits of I² for the two sets of meta-analyses. The pattern is similar in both. Among meta-analyses with I² ≤25% (low heterogeneity), 83% of the Cochrane meta-analyses and 73% of the genetic risk factor meta-analyses have an upper 95% confidence limit that reaches into the range of large heterogeneity (I² ≥50%). Among meta-analyses with I² ≥50% (large heterogeneity), 67% of the Cochrane meta-analyses and 52% of the genetic risk factor meta-analyses have a lower 95% confidence limit that reaches into the range of low heterogeneity (I² ≤25%).
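The proportions quoted above follow from simple threshold rules applied to each meta-analysis's point estimate and confidence limits. The sketch below encodes those rules; `crossing_shares` is a hypothetical helper and the input intervals are invented for illustration, not taken from the survey data.

```python
def crossing_shares(intervals):
    """Return (share of low-I² analyses whose upper limit is >= 50,
    share of large-I² analyses whose lower limit is <= 25).

    `intervals` holds (point estimate, lower limit, upper limit) in percent.
    """
    low = [iv for iv in intervals if iv[0] <= 25]    # low heterogeneity
    large = [iv for iv in intervals if iv[0] >= 50]  # large heterogeneity
    up_share = sum(1 for _, _, hi in low if hi >= 50) / len(low) if low else float("nan")
    down_share = sum(1 for _, lo, _ in large if lo <= 25) / len(large) if large else float("nan")
    return up_share, down_share


# Hypothetical (I², lower, upper) triples, in percent
example = [(0, 0, 56), (20, 0, 45), (35, 0, 76), (69, 40, 80), (60, 10, 85)]
print(crossing_shares(example))
```

Analyses with point estimates between 25% and 50% fall into neither group, which is why the two proportions have different denominators.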

**Fig 1** Confidence intervals for estimated I² in 1011 Cochrane meta-analyses and 50 meta-analyses of genetic risk factors. The median number of studies was 7 (interquartile range 5-11) and 20 (13-26), respectively; the median total sample size was 1112 (512-2691) and 4660 (2823-8761), respectively; and the median I² was 21% (0-50%) and 38% (5-60%), respectively.

Meta-analyses where I² is estimated at 0% are especially prone to misinterpretation. Many reviews interpret this estimate as absence of heterogeneity, but the upper 95% confidence limit may be substantial (as in the Kawasaki example discussed above⁹). Figure 2 shows the distribution of the upper 95% confidence limit of I² in the two sets of meta-analyses, restricted to those with I²=0% (n=373 Cochrane meta-analyses, n=12 genetic meta-analyses). The upper 95% confidence limit exceeds 33% in all of these meta-analyses, and for 81% of them it is 50% or higher. Because of the way that research is currently reported, considerable heterogeneity between studies cannot be excluded with confidence in most meta-analyses. Some heterogeneity between studies is probably present in most meta-analyses, and claims of homogeneity may sometimes be stronger than the evidence allows. Trusting a non-significant P value for the Q statistic and an I² estimate of 0% may lead to spurious certainty about the comparability and similarity of study results.

**Fig 2** Proportion of meta-analyses with estimated I²=0% whose upper 95% confidence limit of I² is lower than a given value

Technical aspects

The confidence interval of I² can be calculated by several methods.⁵ Two of them, a test based approach and a non-central χ² based approach, have been implemented in Stata (heterogi module). The two methods perform comparably, although the test based approach often gives lower values for both the lower and the upper confidence limits, so the non-central χ² based approach may be preferable.
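For illustration, here is a minimal sketch of the test based interval as we understand it from Higgins and Thompson:⁵ a confidence interval is built for ln H, where H=√(Q/df), with a standard error that depends on whether Q exceeds the number of studies k, and the limits are then converted to I²=(H²−1)/H², truncated at 0. The function name is hypothetical, this is not the heterogi code, and the non-central χ² approach is not sketched here.

```python
import math


def i2_ci_test_based(q, k, z=1.96):
    """Approximate 95% CI for I² from Cochran's Q and k studies (test based method).

    Builds a normal-theory interval for ln(H), H = sqrt(Q / df), then maps each
    limit back to I² = (H² - 1) / H².
    """
    if k < 3 or q <= 0:
        raise ValueError("needs k >= 3 studies and a positive Q")
    df = k - 1
    ln_h = 0.5 * math.log(q / df)
    if q > k:
        se = 0.5 * (math.log(q) - math.log(df)) / (math.sqrt(2 * q) - math.sqrt(2 * k - 3))
    else:
        se = math.sqrt((1.0 / (2 * (k - 2))) * (1 - 1.0 / (3 * (k - 2) ** 2)))

    def to_i2(ln_h_limit):
        h2 = math.exp(2 * ln_h_limit)
        return max(0.0, (h2 - 1) / h2)  # I² cannot be negative

    return to_i2(ln_h - z * se), to_i2(ln_h + z * se)


# Q reconstructed from the reported I² of 35% with five studies
lower, upper = i2_ci_test_based(q=4 / 0.65, k=5)
print(f"I² 95% CI: {lower:.0%} to {upper:.0%}")
```

For the five-study review this gives limits of about 0% and 76%, in line with the interval quoted for it above.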

Concluding comments

All statistical tests for heterogeneity are weak, including I². The clinical implications of this are considerable and must be examined on a case by case basis. Putting too much trust in homogeneity of effects may give a false sense of reassurance that one size fits all. Lack of evidence of heterogeneity is not evidence of homogeneity. Conversely, putting too much trust in the presence of heterogeneity of effects may lead to spurious subgroup and exploratory analyses. Given that I² is not precise, 95% confidence intervals should always be given.

Supplementary Material

Web extra

Contributors and sources: JPAI has a long standing interest in meta-analyses and heterogeneity and had the original idea for this article. NAP and EE collected the data. NAP performed statistical analyses with help from JPAI and EE. JPAI wrote the manuscript and NAP and EE commented on it. JPAI is guarantor.

Competing interests: None declared.

Provenance and peer review: Not commissioned; externally peer reviewed.

References

1. Lau J, Ioannidis JPA, Schmid CH. Summing up evidence: one answer is not always enough. Lancet 1998;351:123-7.
2. Sutton A, Abrams K, Jones D, Sheldon T, Song F. Methods for meta-analysis in medical research. Chichester: Wiley, 2000.
3. Petitti DB. Approaches to heterogeneity in meta-analysis. Stat Med 2001;20:3625-33.
4. Cochran WG. The combination of estimates from different experiments. Biometrics 1954;10:101-29.
5. Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21:1539-58.
6. Higgins JPT, Thompson SG, Deeks J, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003;327:557-60.
7. Mittlbock M, Heinzl H. A simulation study comparing properties of heterogeneity measures in meta-analyses. Stat Med 2006;25:4321-33.
8. Huedo-Medina TB, Sánchez-Meca J, Marín-Martínez F, Botella J. Assessing heterogeneity in meta-analysis: Q statistic or I² index? Psychol Methods 2006;11:193-206.
9. Wooditch AC, Aronoff SC. Effect of initial corticosteroid therapy on coronary artery aneurysm formation in Kawasaki disease: a meta-analysis of 862 children. Pediatrics 2005;116:989-95.
10. Newburger JW, Sleeper LA, McCrindle BW, Minich LL, Gersony W, Vetter VL, et al; Pediatric Heart Network Investigators. Randomized trial of pulsed corticosteroid therapy for primary treatment of Kawasaki disease. N Engl J Med 2007;356:663-75.
11. Inoue Y, Okada Y, Shinohara M, Kobayashi T, Kobayashi T, Tomomasa T, et al. A multicenter prospective randomized trial of corticosteroids in primary therapy for Kawasaki disease: clinical course and coronary artery outcome. J Pediatr 2006;149:336-41.
12. Maier PC, Funk J, Schwarzer G, Antes G, Falck-Ytter YT. Treatment of ocular hypertension and open angle glaucoma: meta-analysis of randomised controlled trials. BMJ 2005;331:134.
13. Dennis CL. Psychosocial and psychological interventions for prevention of postnatal depression: systematic review. BMJ 2005;331:15.
14. Devereaux PJ, Beattie WS, Choi PT, Badner NH, Guyatt GH, Villar JC, et al. How strong is the evidence for the use of perioperative beta blockers in non-cardiac surgery? Systematic review and meta-analysis of randomised controlled trials. BMJ 2005;331:313-21.
15. Taylor SJ, Candy B, Bryar RM, Ramsay J, Vrijhoef HJ, Esmond G, et al. Effectiveness of innovations in nurse led chronic disease management for patients with chronic obstructive pulmonary disease: systematic review of evidence. BMJ 2005;331:485.
16. Webster AC, Woodroffe RC, Taylor RS, Chapman JR, Craig JC. Tacrolimus versus ciclosporin as primary immunosuppression for kidney transplant recipients: meta-analysis and meta-regression of randomised trial data. BMJ 2005;331:810.
17. McDonald MA, Simpson SH, Ezekowitz JA, Gyenes G, Tsuyuki RT. Angiotensin receptor blockers and risk of myocardial infarction: systematic review. BMJ 2005;331:873.
18. Glass J, Lanctot KL, Herrmann N, Sproule BA, Busto UE. Sedative hypnotics in older people with insomnia: meta-analysis of risks and benefits. BMJ 2005;331:1169.
19. Ioannidis JP, Trikalinos TA, Zintzaras E. Extreme between-study homogeneity in meta-analyses could offer useful insights. J Clin Epidemiol 2006;59:1023-32.
20. Ioannidis JP, Trikalinos TA, Khoury MJ. Implications of small effect sizes of individual genetic variants on the design and interpretation of genetic association studies of complex diseases. Am J Epidemiol 2006;164:609-14.
