Abstract
Meta-analysis is a powerful tool to estimate measures of associations or effects based on published or unpublished reports. However, problems exist in many meta-analyses, particularly related to study heterogeneity. This article proposes a way of concluding meta-analysis results using P values, taking heterogeneity into account. There is little published research focused on evaluating conclusiveness of summary results of reported meta-analyses. Generally, a P value is directly linked to the test statistic $z = b/s_b$ following a standard normal distribution with mean zero and unit variance, where $b$ is an estimator of $\beta$ and $s_b$ is the estimated standard error of $b$ for any study included in a meta-analysis. This forms the basis of the proposed method for deriving overall test statistics and corresponding P values used for comparing results of meta-analyses. Two published meta-analyses were chosen and specific software was applied. Results are consistent with the two published meta-analysis reports in terms of P values for significance and direction of summary measure of treatment effect. This proposed method can be utilized to safeguard against improper conclusions of published meta-analyses due to heterogeneity. Exploring more sophisticated statistical methods for situations when the key assumption applied to this proposed method is violated could be pursued and could expand the scope of applications beyond this method.
Keywords: Meta-analysis, Heterogeneity, P value
Meta-analysis is a powerful tool to estimate overall measures of associations or effects based on published or unpublished reports.1 However, many problems exist in most, if not all, meta-analyses. One specific problem that may cast doubt on the result of a meta-analysis is study heterogeneity.2 Two of the most commonly cited issues relevant to heterogeneity are different treatment effect parameters (generally expressed as $\beta_1$, $\beta_2$, etc; called regression coefficients in linear regression analysis, logistic regression analysis of odds ratios, or regression analysis of relative risks) and different variances of these parameter estimates (eg, $\sigma_1^2$, $\sigma_2^2$). Other important issues may include different study populations (eg, Asian populations, European populations), different study sample sizes (eg, $n_1$, $n_2$), different study designs (eg, clinical trials, cohort studies, case-control studies, cross-sectional studies), and the appropriateness of a fixed effects model versus a random effects model.2 Very little published research has focused specifically on evaluating whether the summary result of a reported meta-analysis is accurate.
The present article proposes a simple and robust way of concluding meta-analysis results using P values and standardized effect sizes. P values are used as the means of converting meta-analysis results to defined/known test statistics which are expressible as a function of the estimates of the βs and σs described above. These test statistics are then applied to deriving the standardized normal z values (ie, standardized effect sizes) based on the sign (positive or negative) of βs. With a given level of significance (ie, type I error), say 0.05, the conclusiveness of meta-analysis results can be determined. Alternatively, the standardized normal z values of the proposed method can also be derived if estimates of βs, σs, or the confidence limits are reported in the meta-analysis. This method has been tested using published meta-analysis results. Interpretations and limitations of the results from this method are also discussed.
Methods
Possible outcome measures such as the odds ratio, relative risk, risk difference, rate ratio, and number needed to treat may each be chosen to obtain a summary measure in a reported meta-analysis. Along with each of these measures, a corresponding P value, which is compared with the level of significance, is almost always reported. In general, a P value is directly linked to the test statistic $z = b/s_b$, which follows a standard normal distribution with mean 0 and unit variance,3 where $b$ is an estimator of $\beta$ and $s_b$ is the estimated standard error of $b$ for any study included in a meta-analysis. Formally, let us assume there are $k$ studies included in a meta-analysis, and let $p_i$ denote the P value directly obtained from the $i$th study, where $i = 1, 2, \ldots, k$. With the given $p_i$, the inverse standardized normal test statistic $z_i$ can be derived from the probability equation $\Pr(z_i > z_0 \text{ or } z_i < -z_0) = p_i$, for $i = 1, 2, \ldots, k$, where $z_0$ represents the assumed constant for hypothesis testing; the sign of $z_i$ is taken from the sign of the corresponding estimate $b$. These $z_i$ are then averaged to obtain the sample mean, denoted $\bar{z} = \sum_{i=1}^{k} z_i / k$, which is $N(0, 1/k)$ since the $z_i$ are assumed independent and each $z_i$ is $N(0, 1)$ for $i = 1, 2, \ldots, k$. The confidence interval (CI) for its true mean $\mu_{\bar{z}}$ can then be constructed as $\bar{z} \pm z_{\alpha/2} \times s_{\bar{z}}$, where $\alpha$ is the assigned level of significance, $z_{\alpha/2}$ is the corresponding standardized normal value, and $s_{\bar{z}} = \{\sum_{i=1}^{k} (z_i - \bar{z})^2 / [k \times (k - 1)]\}^{1/2}$. If the derived CI excludes zero, then evidence exists that the meta-analysis of interest has shown a significant treatment effect. In general this treatment effect is presented as a summary measure such as an odds ratio, rate ratio, or relative risk. In addition, $\bar{z}$ and $s_{\bar{z}}$ can be used to define the test statistic $\bar{z}/s_{\bar{z}}$, which is treated as $N(0, 1)$, so its corresponding P value can easily be derived. The resulting P value can be used to confirm or negate the reported result of the same meta-analysis.
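The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the PASS and SAS programs used for the analyses: the function names `z_from_p` and `summarize_z` are hypothetical, the reported P values are assumed to be two-sided, and the input data are the LORs and P values listed in Table 1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, unit variance


def z_from_p(p_value, estimate):
    """Convert a two-sided P value into a signed z value.

    The magnitude z0 solves Pr(Z > z0 or Z < -z0) = p; the sign is taken
    from the reported estimate b (a zero estimate is treated as positive).
    """
    magnitude = STD_NORMAL.inv_cdf(1.0 - p_value / 2.0)
    return math.copysign(magnitude, estimate)


def summarize_z(z_values, alpha=0.05):
    """Return zbar, its standard error, the CI, zbar/s_zbar and its two-sided P."""
    k = len(z_values)
    z_bar = sum(z_values) / k
    s_zbar = math.sqrt(sum((z - z_bar) ** 2 for z in z_values) / (k * (k - 1)))
    crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    ci = (z_bar - crit * s_zbar, z_bar + crit * s_zbar)
    overall = z_bar / s_zbar
    p_overall = 2.0 * (1.0 - STD_NORMAL.cdf(abs(overall)))
    return z_bar, s_zbar, ci, overall, p_overall


# Reported LORs and P values from Table 1 (Burr et al).
lors = [1.055, -0.100, 0.626, 0.017, 1.068, -0.025,
        -0.117, -0.381, 0.507, 0.000, 0.385, 0.405]
p_values = [0.005, 0.388, 0.006, 0.886, 0.023, 0.835,
            0.596, 0.111, 0.006, 0.999, 0.061, 0.111]

z_values = [z_from_p(p, b) for p, b in zip(p_values, lors)]
z_bar, s_zbar, ci, overall, p_overall = summarize_z(z_values)
print(f"zbar = {z_bar:.4f}, s_zbar = {s_zbar:.4f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"z = {overall:.4f}, two-sided P = {p_overall:.4f}")
```

Because the sketch carries full floating-point precision, its output may differ from the published figures in the last decimal place.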
It is not uncommon for actual P values to be omitted from a meta-analysis while other relevant statistics, such as the rate ratio and corresponding (eg, 95%) confidence limits (CL), are included. The proposed method is still applicable in this situation. Taking the odds ratio (OR) as an example, the standardized statistic $z = b/s_b$ of the proposed method can be obtained from the CL for each study, where $b = \mathrm{LOR}$ (that is, the log OR) and $s_b = \ln(\mathrm{OR}_u/\mathrm{OR})/1.96$ or $s_b = \ln(\mathrm{OR}/\mathrm{OR}_l)/1.96$ is the standard error of the LOR. $\mathrm{OR}_l$ and $\mathrm{OR}_u$ represent the lower and upper 95% CL for the OR, respectively.
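As a brief illustration of this conversion, the sketch below derives b, its standard error, and z from a ratio estimate and one of its 95% confidence limits. The function name `z_from_confidence_limit` is hypothetical, and the input values (a ratio of 3.3 with limits 2.2 and 4.9) are taken from the second row of Table 2.

```python
import math

Z_95 = 1.96  # critical value used in the text for 95% confidence limits


def z_from_confidence_limit(ratio, limit, upper=True):
    """Return (b, s_b, z) from a ratio estimate and one 95% confidence limit.

    b is the log ratio; s_b is ln(upper/ratio)/1.96 when the upper limit is
    supplied, or ln(ratio/lower)/1.96 when the lower limit is supplied.
    """
    b = math.log(ratio)
    if upper:
        s_b = math.log(limit / ratio) / Z_95
    else:
        s_b = math.log(ratio / limit) / Z_95
    return b, s_b, b / s_b


# Ratio 3.3 with 95% CL 2.2 to 4.9 (second row of Table 2).
b_u, s_u, z_u = z_from_confidence_limit(3.3, 4.9, upper=True)
b_l, s_l, z_l = z_from_confidence_limit(3.3, 2.2, upper=False)
print(f"upper CL: b = {b_u:.4f}, s_b = {s_u:.4f}, z = {z_u:.4f}")
print(f"lower CL: b = {b_l:.4f}, s_b = {s_l:.4f}, z = {z_l:.4f}")
```

The two limits give slightly different standard errors whenever the reported interval is not exactly symmetric on the log scale, which is common after rounding.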
Data Analysis and Statistical Software
Data from two published meta-analyses were chosen for comparisons. The first data set was published by Burr et al,4 who considered 12 epidemiological studies that investigated the link between presence/absence of the platelet PlA polymorphism and presence/absence of coronary heart disease. Reported statistics included LORs, mainly from a logistic regression model, and the corresponding P values. The second data set used was published by Gareen et al.5 They considered 32 studies (readers can refer to tables 2-3 in Gareen's paper) used to assess the association between intrauterine devices and pelvic inflammatory disease. Reported statistics mainly included rate ratios and 95% CL. The commercially available statistical software package PASS6 was used to obtain the inverse normal z statistic values, and a SAS7 program was written and carried out for obtaining zbar and the corresponding CI based on the results of each of the published meta-analyses.
Results
Table 1 compares the reported results of the meta-analysis conducted by Burr et al4 with the results obtained using the proposed method. Burr et al4 reported LORs and P values. For our purposes, each LOR was converted to an OR (= exp[LOR]), and the z values were calculated from the reported P values, signed according to the LOR. Based on the calculated z values, the proposed method led to a mean of the z values of 0.9160 with a standard error of 0.4586 and a 95% confidence interval of (0.0172, 1.8149), which does not include 0. The final test statistic z = 0.9160/0.4586 = 1.9976 was obtained, yielding a two-sided P value of 0.0457. This is consistent with the findings of Burr et al.4 In Burr's study,4 it was concluded that the PlA polymorphism is a risk factor for coronary heart disease, with the mean of the distribution of the LOR estimated to be 0.2 (ie, OR = exp[0.2] = 1.2), and the null hypothesis that this mean is 0 was rejected, with a two-sided P value of 0.049. In the same study, both the LOR and the standard error of the LOR (SELOR) were available, so the alternative approach proposed here may also be applied. After directly calculating the ratio LOR/SELOR for each of the 12 studies, the mean standardized z was 0.9123 with a standard error of 0.4554, which corresponds to a final z statistic of 0.9123/0.4554 = 2.003, yielding a P value of 0.0452. The small difference between the two P values (0.0457 vs 0.0452) is most likely due to rounding of the reported values.
Table 1. Results reported by Burr et al (2003)4 and values obtained with the proposed method.
LOR (reported) | P (reported) | Converted OR (proposed) | z (proposed)
---|---|---|---
1.055 | 0.005 | 2.87 | 2.8070
-0.100 | 0.388 | 0.90 | -0.8633
0.626 | 0.006 | 1.87 | 2.7478
0.017 | 0.886 | 1.02 | 0.1434
1.068 | 0.023 | 2.91 | 2.2734
-0.025 | 0.835 | 0.98 | -0.2083
-0.117 | 0.596 | 0.89 | -0.5302
-0.381 | 0.111 | 0.68 | -1.5937
0.507 | 0.006 | 1.66 | 2.7478
0.000 | 0.999 | 1.00 | 0.0013
0.385 | 0.061 | 1.47 | 1.8735
0.405 | 0.111 | 1.50 | 1.5937
LOR, log-odds ratio; OR, odds ratio
Table 2 shows the comparisons between the values reported for the meta-analysis conducted by Gareen et al5 and the results obtained using the method proposed in this paper. In Gareen's study5 the rate ratio (RR) and corresponding 95% CL were reported. In the current study, each RR was converted to the LRR (= log[RR]) and its standard error (SERLRR) was derived using the lower and upper CL reported by Gareen et al. Results were very similar using either the reported lower or upper CL. Therefore, only the SERLRR values obtained using the upper CL are included in table 2. Also in table 2, each of the 32 z values was calculated as LRR/SERLRR. Based on the calculated z values, the proposed method resulted in a mean of the z values of 3.9147 with a standard error of 0.4999. The final test statistic z = 3.9147/0.4999 = 7.8309 was obtained, yielding a two-sided P value of < 0.0001. This is consistent with Gareen's report5 in which it was concluded that there was a consistent positive association of intrauterine devices with both symptomatic and asymptomatic pelvic inflammatory disease (RR = 4.1, 95% CL = 2.9-5.8, and P value < 0.001).
Table 2. Results reported by Gareen et al (2000)5 and values obtained with the proposed method.
RR (reported) | RRl (reported) | RRu (reported) | LRR (proposed) | SERLRRu (proposed) | z (proposed)
---|---|---|---|---|---
1.9 | 0.38 | 9.1 | 0.6418 | 0.7992 | 0.8031
3.3 | 2.2 | 4.9 | 1.1939 | 0.2017 | 5.9196
2.6 | 2.1 | 3.1 | 0.9556 | 0.0897 | 10.6475
4.4 | 2.2 | 9.2 | 1.4816 | 0.3763 | 3.9370
4.9 | 2.7 | 9.0 | 1.5892 | 0.3102 | 5.1233
2.8 | 1.5 | 5.4 | 1.0296 | 0.3351 | 3.0727
5.8 | 0.011 | 3029.0 | 1.7579 | 3.1929 | 0.5505
2.1 | 1.4 | 3.2 | 0.7419 | 0.2149 | 3.4524
8.6 | 5.3 | 14.0 | 2.1518 | 0.2486 | 8.6548
1.9 | 0.38 | 9.3 | 0.6419 | 0.8103 | 0.7921
6.4 | 1.5 | 27.0 | 1.8563 | 0.7345 | 2.5274
1.9 | 0.48 | 7.8 | 0.6419 | 0.7206 | 0.8908
2.2 | 1.7 | 2.7 | 0.7885 | 0.1045 | 7.5460
2.1 | 1.6 | 2.8 | 0.7419 | 0.1468 | 5.0549
2.9 | 1.8 | 4.7 | 1.0647 | 0.2464 | 4.3219
1.1 | 0.26 | 4.6 | 0.0953 | 0.7300 | 0.1306
2.3 | 1.2 | 4.7 | 0.8329 | 0.3646 | 2.2843
9.3 | 3.9 | 23.0 | 2.2300 | 0.4620 | 4.8271
2.3 | 0.91 | 6.0 | 0.8329 | 0.4892 | 1.7026
2.9 | 2.1 | 4.0 | 1.0647 | 0.1641 | 6.4892
4.1 | 1.1 | 15.0 | 1.4110 | 0.6618 | 2.1322
2.3 | 1.4 | 3.8 | 0.8329 | 0.2562 | 3.2514
17.0 | 7.8 | 37.0 | 2.8332 | 0.3968 | 7.1404
12.0 | 0.98 | 135.0 | 2.4849 | 1.2349 | 2.0123
11.0 | 3.4 | 35.0 | 2.3979 | 0.5905 | 4.0605
1.5 | 0.77 | 2.8 | 0.4055 | 0.3185 | 1.2733
6.7 | 1.1 | 40.0 | 1.9021 | 0.9116 | 2.0865
10.0 | 3.1 | 33.0 | 2.3026 | 0.6091 | 3.7800
132.0 | 57.0 | 304.0 | 4.8828 | 0.4256 | 11.4721
9.0 | 1.5 | 54.0 | 2.1972 | 0.9142 | 2.4035
1.9 | 1.3 | 2.8 | 0.6419 | 0.1978 | 3.2443
12.0 | 3.4 | 45.0 | 2.4849 | 0.6744 | 3.6848
RR, rate ratio; RRl, 95% lower confidence limit; RRu, 95% upper confidence limit; LRR, log-rate ratio; SERLRRu, standard error of LRR using RRu.
Discussion
Using the proposed method, we were able to consistently reproduce the results of the well-conducted, published meta-analysis reports that were used for comparisons. It has been said that meta-analysis is the analysis of the analyses. Here, an analysis of the analysis of the analyses was conducted. In this study, we propose a simple and robust way of concluding the results of published meta-analyses. It is simple because no sample size adjustment, variance adjustment, or weighting is necessary. It is robust because the test statistic used consistently follows a standard normal distribution, which takes into account the possible differences in effect size (the numerator of $b/s_b$) and its corresponding standard error (the denominator of $b/s_b$) among studies included in the meta-analysis. The proposed method has value in that it will boost confidence in the results of published meta-analyses if agreement exists. If the findings from the method proposed in this paper disagree with the reported meta-analysis results, the author strongly recommends that readers question the conclusiveness of the reported findings. It is important for the conclusion of any meta-analysis to be further confirmed. For example, the suggested increase in risk of cardiovascular events observed in a meta-analysis was later refuted by a well-conducted, placebo-controlled, randomized clinical trial.8
Limitations
The key assumption made for the proposed method is that the test statistics utilized are independent and identically distributed (IID) standard normal variables. This is justifiable only when the sample size is sufficiently large (eg, 50 or more) in each of the studies included in a meta-analysis. Even though the unweighted average used in the proposed method can be regarded as unbiased,9 it may produce a larger mean square error than expected when outliers exist. Also, the proposed method is exactly equivalent to the fixed effect pooled estimate on the standardized effect sizes. It remains to be determined whether real data exist for which a conclusive (P < 0.05) meta-analysis run using the random effect model is found inconclusive (P > 0.05) by the proposed method, or vice versa; a hypothetical example is given in table 3 as an illustration. In table 3, all LORs are equal (OR = 1.5), and the P value is < 0.001 (conclusive). However, the proposed method results in a P value of 0.1314 (inconclusive), presumably due to the extreme observed z value of 113.2342.
Table 3.
Study | LOR | SELOR | z |
---|---|---|---|
S1 | 0.405465 | 1.880704 | 0.215592 |
S2 | 0.405465 | 2.572847 | 0.157594 |
S3 | 0.405465 | 0.849294 | 0.477414 |
S4 | 0.405465 | 2.951985 | 0.137353 |
S5 | 0.405465 | 0.261408 | 1.551082 |
S6 | 0.405465 | 1.209421 | 0.335256 |
S7 | 0.405465 | 2.207609 | 0.183667 |
S8 | 0.405465 | 1.236299 | 0.327967 |
S9 | 0.405465 | 0.789342 | 0.513675 |
S10 | 0.405465 | 0.003581 | 113.2342 |
S11 | 0.405465 | 1.850707 | 0.219087 |
S12 | 0.405465 | 0.083472 | 4.857507 |
S13 | 0.405465 | 2.237144 | 0.181242 |
S14 | 0.405465 | 1.945276 | 0.208436 |
S15 | 0.405465 | 2.759249 | 0.146948 |
S16 | 0.405465 | 1.695643 | 0.239122 |
S17 | 0.405465 | 2.049513 | 0.197835 |
S18 | 0.405465 | 2.529477 | 0.160296 |
S19 | 0.405465 | 1.779514 | 0.227852 |
S20 | 0.405465 | 1.17509 | 0.34505 |
S21 | 0.405465 | 1.973475 | 0.205457 |
S22 | 0.405465 | 2.165817 | 0.187211 |
S23 | 0.405465 | 2.966013 | 0.136704 |
S24 | 0.405465 | 2.137419 | 0.189698 |
S25 | 0.405465 | 1.552926 | 0.261097 |
S26 | 0.405465 | 1.86624 | 0.217263 |
S27 | 0.405465 | 2.702653 | 0.150025 |
S28 | 0.405465 | 1.846319 | 0.219607 |
S29 | 0.405465 | 2.54761 | 0.159155 |
S30 | 0.405465 | 1.004315 | 0.403723 |
S31 | 0.405465 | 1.780947 | 0.227668 |
S32 | 0.405465 | 1.177176 | 0.344439 |
LOR, log-odds ratio; SELOR, standard error LOR
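The contrast illustrated by table 3 can be explored with a short sketch. It applies the unweighted mean of the z values to the tabulated LOR and SELOR values and, for comparison, an inverse-variance fixed effect pooled estimate. The pooling formula is an assumption of this sketch, since the text does not state how the conventional P value was computed; because all LORs are equal, a random effect model would be expected to give the same pooled estimate.

```python
import math

LOR = 0.405465  # common log-odds ratio in table 3 (OR = 1.5)
SELOR = [1.880704, 2.572847, 0.849294, 2.951985, 0.261408, 1.209421,
         2.207609, 1.236299, 0.789342, 0.003581, 1.850707, 0.083472,
         2.237144, 1.945276, 2.759249, 1.695643, 2.049513, 2.529477,
         1.779514, 1.17509, 1.973475, 2.165817, 2.966013, 2.137419,
         1.552926, 1.86624, 2.702653, 1.846319, 2.54761, 1.004315,
         1.780947, 1.177176]

# Proposed method: unweighted mean of the study-level z values.
z_values = [LOR / se for se in SELOR]
k = len(z_values)
z_bar = sum(z_values) / k
s_zbar = math.sqrt(sum((z - z_bar) ** 2 for z in z_values) / (k * (k - 1)))
proposed_stat = z_bar / s_zbar

# Comparison (assumed for illustration): inverse-variance fixed effect pooling.
weights = [1.0 / se ** 2 for se in SELOR]
pooled_lor = sum(w * LOR for w in weights) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))
pooled_stat = pooled_lor / pooled_se

print(f"largest study z      = {max(z_values):.2f}")
print(f"proposed statistic   = {proposed_stat:.4f} (critical value 1.96)")
print(f"fixed effect z value = {pooled_stat:.2f}")
```

The single very precise study inflates both the mean and the spread of the z values, so the ratio of the two stays below the 1.96 critical value, whereas the same study dominates the inverse-variance weights and drives the pooled statistic far above it.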
Another limitation is that the applicability of the proposed approach depends on whether P values and the corresponding parameter estimates ($b$) are presented in a meta-analysis. The approach is also workable when both $b$ and $s_b$ are reported in, or derivable from, a meta-analysis. Finally, the proposed method was applied to only two published data sets. However, the method is simple enough for readers to apply it themselves to any meta-analysis data they have collected or identified.
In summary, the proposed method can be utilized to safeguard against improper conclusions reported by certain published meta-analyses, due primarily to heterogeneity among the studies included in the meta-analysis. Exploring more sophisticated statistical methods for situations in which the key assumption of the proposed method is violated is worth pursuing and may benefit from simulations with different levels of heterogeneity. Such work could enhance and expand the scope of applications beyond the method proposed here.
Acknowledgements
The author thanks the Marshfield Clinic Research Foundation's Office of Scientific Writing and Publication for assistance in the preparation of this manuscript. The author also wishes to thank the referees for their insightful comments and feedback which led to improvements to this manuscript.
References
1. Petitti DB. Meta-Analysis, Decision Analysis and Cost-Effectiveness Analysis: Method for Quantitative Synthesis in Medicine. 2nd ed. New York, NY: Oxford University Press; 1994.
2. Elwood JM. Critical Appraisal of Epidemiological Studies and Clinical Trials. 2nd ed. New York, NY: Oxford University Press; 1998.
3. Hogg RV, Craig AT. Introduction to Mathematical Statistics. 4th ed. New York, NY: Macmillan Publishing Company; 1978.
4. Burr D, Doss H, Cooke GE, Goldschmidt-Clermont PJ. A meta-analysis of studies on the association of the platelet PlA polymorphism of glycoprotein IIIa and risk of coronary heart disease. Stat Med 2003;22:1741–1760.
5. Gareen HF, Greenland S, Morgenstern H. Intrauterine devices and pelvic inflammatory disease: Meta-analyses of published studies, 1974-1990. Epidemiology 2000;11:589–597.
6. Hintz J. NCSS and PASS. Statistical & Power Analysis Software. Kaysville, Utah; 2001. Available at: http://www.ncss.com
7. SAS, version 9.2. Cary, NC: SAS Institute; 2005.
8. Michele TM, Pinheiro S, Iyasu S. The safety of tiotropium--the FDA's conclusions. N Engl J Med 2010;363:1097–1099.
9. Shuster JJ. Empirical vs natural weighting in random effects meta-analysis. Stat Med 2010;29:1259–1265.