American Journal of Public Health
. 2006 Mar;96(3):515–522. doi: 10.2105/AJPH.2003.036343

Sufficiency and Stability of Evidence for Public Health Interventions Using Cumulative Meta-Analysis

Paige Muellerleile 1, Brian Mullen 1
PMCID: PMC1470523  PMID: 16449603

Abstract

We propose cumulative meta-analysis as the procedure of completing a new meta-analysis at each successive wave in a research database. Two facets of cumulative knowledge are considered: the first, sufficiency, refers to whether the meta-analytic database adequately demonstrates that a public health intervention works. The second, stability, refers to the shifts over time in the accruing evidence about whether a public health intervention works.

We used a hypothetical data set to develop the indicators of sufficiency and stability, and then applied them to existing, published datasets. Our discussion centers on the implications of the use of this procedure in evaluating public health interventions.


Meta-analysis is the statistical integration of the results of independent studies.14 This approach to the quantitative review of the weight of evidence has proven to be useful in helping determine the effectiveness of public health interventions. Meta-analysis has been used to gauge the effectiveness of interventions aimed at changing patient behavior,57 interventions aimed at changing physician behavior,8 and interventions aimed at more far-reaching public health policy.9

Traditional meta-analysis can inform public health interventions and policies, usually to determine whether an intervention has an impact on health practices, and the magnitude of that impact. However, traditional meta-analysis overlooks 2 aspects of public health information. The first is sufficiency. Sufficiency refers to whether the meta-analytic database adequately demonstrates whether a public health intervention works. For example, 1 meta-analysis10 synthesizes the relationship between socioeconomic status and self-esteem, integrating the results of 446 hypothesis tests conducted among 312 940 participants. This number of hypothesis tests raises the question of whether there was sufficient justification for using valuable research and participant resources to conduct the 446th hypothesis test to help establish the relationship between socioeconomic status and self-esteem. If there was little value in adding the 446th hypothesis test, was there sufficient value in the 445th test? What about the 200th test?11 For many public health issues, collecting additional evidence for an already-established effect may waste more than research and participant resources: delaying implementation of effective risk-reduction interventions may also waste health care resources, employer costs, and lives.

The second of the aspects overlooked by traditional meta-analysis is stability. Stability refers to the shifts over time in the accruing evidence about whether a public health intervention works. For example, the purported effects of sex education have been controversial. Studies of the efficacy of sex education programs have rendered conflicting estimates of the effects of these programs on adolescent sexual activity: some studies indicate that sex education programs decrease sexual activity.12 Others indicate that sex education programs do not appear to influence rates of sexual activity.13 Still others indicate that sex education programs lead to increased sexual activity.14 As additional studies are added, the estimate of the typical effect of sex education programs on adolescent sexual activity may continue to fluctuate.15 For a number of public health issues, implementing effective interventions is a worthwhile effort. However, implementing ineffective interventions can waste health care resources, employer costs, and lives.

We describe these 2 aspects of cumulative knowledge in the public health context. We discuss previous efforts to interpret cumulative meta-analysis, explain indicators of sufficiency and stability to aid interpretation of cumulative meta-analysis, and consider the use of the indicators of sufficiency and stability in a set of previously published meta-analyses.

CUMULATIVE META-ANALYSIS

Cumulative meta-analysis refers to the process of performing new meta-analyses at successive points in time in a research domain.16 Therefore, at each “wave” of the database (each time a study is added), a separate meta-analysis is conducted. For simplicity, all of the examples in this paper are assumed to conform to usual standards for performing an informative meta-analysis. These standards involve thorough literature searches using well-defined criteria for a specific hypothesis, and consideration of the methodological soundness of the studies to be included. They also involve careful and consistent extraction of precise tests of significance and effect size. Comprehensive discussions of standards for performing meta-analyses can be found in several sources.14,17,18

To illustrate the examination of evidence for sufficiency and stability in cumulative meta-analysis, we will make use of a hypothetical data set that has previously been used to illustrate other meta-analytic issues.1,2,16,19,20 Table 1 describes the data set, which includes the results of 10 studies of the effects of X on Y. For this example, let X = some public health intervention (e.g., seatbelt laws) and let Y = some public health outcome (e.g., traffic fatalities). For each hypothesis test, Table 1 also presents the corresponding Z for significance and ZFisher for effect size (the Fisher logarithmic transformation of the product moment r for effect size). This use of ZFisher is consistent with the meta-analytic techniques4,21,22 used in this effort. Cumulative meta-analyses using Cohen d or Hedges g, or any other linear metric of effect size, could be conducted similarly.

TABLE 1—

Hypothetical Meta-Analytic Database

                                                 Significance Levels    Effect Sizes
Study  Year  Statistic (df)       n   Direction of Effecta   Z      P         ZFisher  r
1      1981  χ2(1) = 23.000      110  +                      4.80   8.35E−7   0.49     .457
2      1982  r(78) = .335         80  +                      3.04   .00119    0.35     .335
3      1983  P = .000001          80  +                      4.75   .000001   0.59     .531
4      1984  t(98) = 6.500       100  +                      5.91   2.08E−9   0.62     .549
5      1985  F(1, 88) = 15.000    90  +                      3.71   .00010    0.40     .382
6      1986  F(1, 63) = 10.250    65  +                      3.07   .00107    0.39     .374
7      1987  r(68) = .535         70  +                      4.77   9.45E−7   0.60     .535
8      1988  Z = 3.891            70  +                      3.891  .00005    0.50     .465
9      1989  t(63) = 6.000        65  +                      5.31   5.79E−8   0.70     .603
10     1990  P = .01              60  +                      2.33   .01       0.31     .300

Note. Data in this table are hypothetical only.

aResults of these hypothesis tests are in the expected direction.

Source. Mullen B.1,2
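The Z and ZFisher columns of Table 1 can be recovered from the reported test statistics with standard conversion formulas. The sketch below is illustrative (the function names are ours, not the authors'); it reproduces several Table 1 entries within rounding:

```python
import math

def r_from_t(t, df):
    # Effect size r from a t test: r = sqrt(t^2 / (t^2 + df))
    return math.sqrt(t**2 / (t**2 + df))

def r_from_F(F, df_error):
    # r from a 1-df F test: r = sqrt(F / (F + df_error))
    return math.sqrt(F / (F + df_error))

def r_from_chi2(chi2, n):
    # r from a 1-df chi-square test: r = sqrt(chi2 / n)
    return math.sqrt(chi2 / n)

def fisher_z(r):
    # Fisher's logarithmic transformation of r (equivalently atanh(r))
    return 0.5 * math.log((1 + r) / (1 - r))

# Study 4 of Table 1: t(98) = 6.500 gives r ≈ .549, ZFisher ≈ 0.62
r4 = r_from_t(6.5, 98)
z4 = fisher_z(r4)
```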

Initially, assessment of the evidence for sufficiency and stability comes from visual examination of the results of a cumulative meta-analysis. Figure 1a presents a graph of the data set described in Table 1. The data point at wave 1 represents the effect size from the first study, published in 1981. Its value is ZFisher = 0.49. At wave 2, the data point represents the mean effect size, combining the data from the first and second studies. The value of the effect size from the second study is ZFisher = 0.35, resulting in a mean effect size of Z̄Fisher 2 = 0.42. Performing a new meta-analysis for each of the 10 waves in the database results in a mean effect size of Z̄Fisher 10 = 0.50.
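The wave-by-wave averaging described above can be sketched in a few lines. A simple unweighted mean of the ZFisher values in Table 1 reproduces the reported values at waves 2 and 10:

```python
# ZFisher effect sizes from Table 1, in publication order
z_fisher = [0.49, 0.35, 0.59, 0.62, 0.40, 0.39, 0.60, 0.50, 0.70, 0.31]

# Mean effect size recomputed at each successive wave of the database
cumulative_means = [sum(z_fisher[:i]) / i for i in range(1, len(z_fisher) + 1)]
# wave 2 -> 0.42; wave 10 -> 0.495 (reported as 0.50)
```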

FIGURE 1—


Cumulative meta-analysis using Mullen’s hypothetical database: average effect for each wave in the database (a), failsafe ratio (b), individual hypothesis tests (c), and cumulative slope (d).

Source. Mullen.2

The 95% confidence intervals (CIs) around each Z̄Fisher i are not intended for use as estimators of inferential probabilities. Cumulative meta-analysis necessarily involves multiple tests of the same hypothesis, and using CIs for estimating inferential probabilities therefore increases the likelihood of committing a Type I error. In this context, rather than being indications of the likelihood that the effects are significant, the CIs indicate the range of values that are statistically equivalent to the parameter. In other words, the CIi around the Z̄Fisher i for wave i indicates the range of values indistinguishable from the parameter value. Generally, the CIis become narrower as the number of hypothesis tests, ki, increases, and as the cumulative sample size, ∑Ni, increases.19 For a Z̄Fisher i that remains constant, then, additional studies result in narrower CIis around that mean, which decreases the range of values for the effect size that are statistically equivalent to the true effect size.
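This narrowing can be illustrated with a common large-sample construction of such intervals. We cannot confirm this is the exact formula behind Figure 1a, so treat it as a sketch: each study's ZFisher has sampling variance of approximately 1/(n − 3), so an inverse-variance-weighted mean has variance 1/∑(n − 3), and the interval shrinks as studies and participants accumulate.

```python
import math

def ci_mean_fisher_z(z_values, ns, crit=1.96):
    # Each ZFisher has approximate sampling variance 1/(n - 3), so the
    # inverse-variance-weighted mean has variance 1/sum(n - 3).
    weights = [n - 3 for n in ns]
    mean = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    half_width = crit / math.sqrt(sum(weights))
    return mean - half_width, mean + half_width

# Wave 1 vs wave 2 of Table 1: the interval narrows as N accumulates
lo1, hi1 = ci_mean_fisher_z([0.49], [110])
lo2, hi2 = ci_mean_fisher_z([0.49, 0.35], [110, 80])
```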

From the first wave through the end of the database, the evidence for the effect of X on Y appeared to be sufficient: the CIi around the mean effect size did not include the value of zero. Put differently, the range of values for the mean effect size at each wave appeared to be statistically different from a null effect. Therefore, it would be hard to argue for additional research about the effects of X on Y, as it appears that the effect was there from the start. Similarly, from the first wave through the end of the database, the evidence for the effect of X on Y appeared to be stable: there is little change in the value of the mean effect. Therefore, it would be hard to argue for additional research to determine whether the emergent picture of the effect of X upon Y might change. Although the visual information presented in Figure 1a portrays a simple data set for which this interpretation is straightforward, one cannot expect real datasets to be so obliging. In real datasets, it may be very difficult to determine when there was sufficient evidence to determine that X had a particular effect on Y. Moreover, in real datasets, it may be very difficult to determine when the effect became stable; that is, the point at which the value of the effect of X upon Y did not change appreciably from one wave to the next.

PREVIOUS EFFORTS TO INTERPRET CUMULATIVE META-ANALYSES

Previous meta-analytic undertakings23–26 have not differentiated sufficiency from stability; however, both sufficiency and stability are implied in these efforts. For example, Lau and others24 observed that there was sufficient evidence for researchers to have shown intravenous streptokinase for acute myocardial infarction to be a lifesaving therapy 25 years before its approval by the Food and Drug Administration. Likewise, they noted that 2 additional clinical trials did not change the value of the therapy established by the preceding evidence. Nevertheless, previous efforts to interpret cumulative meta-analyses were based on a visual inspection of the accumulating results, similar to the foregoing discussion of results portrayed in Figure 1. However, visual inspection of accumulating results may not yield a straightforward answer about whether there is sufficient evidence to determine that X has an effect on Y, nor does it necessarily yield a straightforward answer about whether that effect has achieved stability.

Pogue and Yusuf25 suggested a different approach for determining when accumulating evidence is statistically significant, which involves the adaptation of classical monitoring boundaries. They propose that the cumulative meta-analyst calculate an “optimum information size,” which is the cumulative sample size needed to demonstrate an effect, in light of event rates and the minimum reasonable values of the independent variable that would be considered consequential.

Although their efforts to produce a method for statistical inference within cumulative meta-analysis are commendable, there has been little debate about the efficacy of the proposed monitoring boundaries. We propose the use of more straightforward indicators of sufficiency and stability, even though there may not be accompanying inferential probabilities for them. The first reason for using more straightforward indicators is their simplicity. The second reason for using more straightforward indicators is that Pogue and Yusuf25 require a priori specification of the optimum information size. However, a researcher must know what the event rates might be—which requires an understanding of what minimum effects of the independent variable are both consequential and reasonable—before specifying the optimum information size. In other words, the researcher would need extensive knowledge of the observed results of the accumulated research before undertaking a cumulative meta-analysis to understand the observed results of the accumulated research. Finally, the third reason for using more straightforward indicators is that Pogue and Yusuf were concerned only with sufficiency: whether additional evidence is needed to establish that X has some effect upon Y. They did not address whether that effect has become stable across waves in a database.25 For these reasons, we propose that cumulative meta-analysts make use of more straightforward indicators of (both) sufficiency and stability.

INDICATORS OF SUFFICIENCY AND STABILITY

The indicators we propose rely on inspection of graphs of a type of meta-analytic “time-series” data.16 Some researchers have argued there is little agreement among judges who interpret visual information,27,28 which may result in different conclusions about those data than conclusions on the basis of statistical analysis.29–31 However, a meta-analysis on the subject of visual interpretation of data showed that interjudge agreement can be quite good.32 Moreover, other scholars have recommended guidelines for creation of graphical presentations that facilitate interjudge agreement (e.g., consistent axes and scaling).33–41 Following such guidelines, we hope to reduce the potential for disagreement among judges.

Clearly, the hypothetical database presented in Table 1, used to generate Figure 1, appears to demonstrate sufficient evidence for the effect of X upon Y. It also appears that the effect of X upon Y is stable. However, real research databases are unlikely to be as clear-cut as this one. Therefore, we will outline the procedures for generating indicators of sufficiency and stability using the hypothetical database, and then use the same procedures in real databases.

The Failsafe Ratio

There is a bias in favor of publishing reports of significant results.42–47 The consequence of this bias is the possibility that unpublished or unknown studies with null results may exist in researchers’ file drawers.47 To address the file drawer problem, Rosenthal developed a technique for estimating the number of unpublished, unretrieved studies with null results that would have to exist in file drawers to bring the overall combined probability to just significant at the α = .05 level. The resulting “failsafe number,” Nfs(P = 0.05),42 is calculated as follows:

Nfs(P = 0.05) = (ΣZ)² / 2.706 − k,

where ΣZ is the sum of the Z scores for the k hypothesis tests in the database, and 2.706 is the square of the one-tailed critical value Z = 1.645.

Rosenthal47 noted that it would be unlikely that there would be 5 times as many unretrieved studies as there were in the meta-analyst’s database. He proposed that Nfs(P = 0.05) exceed 5k + 10 (the addition of 10 studies would ensure that for very small meta-analytic databases of 1 or 2 studies, the number of unretrieved studies would be 15 or 20, rather than only 5 or 10). The importance of the failsafe number Nfs(P = 0.05) and Rosenthal’s47 5k + 10 standard is illustrated by the studies that use it.48–52 The “failsafe ratio” is an indicator of the relative sizes of the failsafe number and the Rosenthal standard, and is calculated as follows:

Failsafe ratioi = Nfs(P = 0.05)i / (5ki + 10),

where ki = the number of studies in the database at wave i. If the failsafe ratio is less than 1.000, then Nfs(P = 0.05)i at wave i has not exceeded the 5ki + 10 standard. Thus, the results at wave i are still vulnerable to future null results. If the failsafe ratio exceeds 1.000, then Nfs(P = 0.05)i at wave i has exceeded the 5ki + 10 standard. Thus, the results at wave i will tolerate future null results.
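Both quantities are straightforward to compute at each wave. A minimal sketch using Rosenthal's formula, Nfs = (ΣZ)²/2.706 − k, and the Z values from Table 1 (2.706 is the square of the one-tailed critical value 1.645):

```python
def failsafe_number(zs):
    # Rosenthal's file-drawer N at P = .05 (one-tailed):
    # N_fs = (sum of Z)^2 / 2.706 - k, where 2.706 = 1.645^2
    k = len(zs)
    return sum(zs) ** 2 / 2.706 - k

def failsafe_ratio(zs):
    # Ratio of N_fs to Rosenthal's 5k + 10 tolerance standard
    k = len(zs)
    return failsafe_number(zs) / (5 * k + 10)

# Z scores for significance from Table 1, in publication order
z = [4.80, 3.04, 4.75, 5.91, 3.71, 3.07, 4.77, 3.891, 5.31, 2.33]
ratios = [failsafe_ratio(z[:i]) for i in range(1, len(z) + 1)]
# wave 1 ≈ 0.50 (vulnerable); wave 2 ≈ 1.035; wave 10 ≈ 10.48
```

The ratio first exceeds 1.000 at wave 2, matching the point at which the text declares the database sufficient.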

Figure 1b displays the cumulative meta-analysis from Figure 1a, with the addition of the failsafe ratio calculated at each wave of the database. For example, the first wave had 1 study (k1 = 1), and Nfs(P = 0.05)1 = 7.5. Therefore, the value of the failsafe ratio would be:

Failsafe ratio1 = 7.5 / (5(1) + 10) = 7.5 / 15 = 0.500.

Because the value of the failsafe ratio is less than 1.000, the results at wave 1 are still vulnerable to future null results. The second wave added 1 study (k2 = 2), and Nfs(P = 0.05)2 = 20.7. Therefore, the value of the failsafe ratio would be:

Failsafe ratio2 = 20.7 / (5(2) + 10) = 20.7 / 20 = 1.035.

Because the failsafe ratio exceeds 1.000, the results at wave 2 are likely to tolerate future null results. The value of the failsafe ratio continues to increase to a value of 10.483 by the 10th wave of the database.

Inspection of the failsafe ratio displayed in Figure 1b reveals that the number of studies in the database with null results needed to reduce the combined significance to P = 0.05 becomes excessive beyond the second wave in the database, where the failsafe ratio exceeds 1.000. From that point onward in time, there seems to be no need for additional research to establish the effect of X on Y; there is sufficient evidence that the phenomenon exists, and additional research is unlikely to change the weight of that evidence.

Although the failsafe ratio can indicate the sufficiency of a research database, it does not adequately address the stability of the effect size. To the extent that the results of additional studies are of different magnitudes (as long as they are not null effects, on average), there can be fluctuations in the magnitude of the cumulative effect size that will not be captured by examination of the failsafe ratio. It is necessary to consider a more direct indicator of stability.

The Cumulative Slope

One way to determine whether there is a change in a database over time is to plot the data and examine the slope of the plotted points. The combined effect sizes presented as Z̄Fisher i at each wave can mask the change in effect size in successive waves. In Figure 1c, each data point’s placement has been preserved across waves, rather than presenting the average effect for each wave. For example, the first study in the database appears at wave 1 (ZFisher = 0.49). That same data point is also displayed at subsequent waves. The second study in the database appears at wave 2 (ZFisher = 0.35). That data point, along with the first, is displayed at all subsequent waves. In the final wave of the database, all 10 data points appear, for a total of 55 data points in the figure.

Additionally, the regression line in Figure 1c is the result of regressing the ∑ki = 55 data points upon wave number ki as a predictor. The purpose of the regression is to estimate the rate of change (slope) across all of the waves of the meta-analytic database. It would be inappropriate to use the slope to derive inferential probabilities, because meta-analytic data violate the assumptions of the general linear model for statistical inference.1,2,4,53,54 However, the least-squares estimates of regression parameters such as the slope and the intercept are not biased. In the hypothetical database, the regression equation that results from the 55 ZFisher-and-ki data pairs is ẐFisher = 0.46 + 0.004(k). In the cumulative meta-analysis, the near-zero slope (0.004) indicates that the best-fitting line levels off as the number of hypothesis tests increases. In other words, the effect becomes stable, not changing dramatically across waves in the database. A comparison of the size of the slope in successive waves in the database provides the cumulative meta-analyst with a means of determining whether a phenomenon has become stable.

Figure 1c does not show how the regression line may have changed across waves, which would indicate the point in the database at which the regression line became stable. In contrast, Figure 1d displays the cumulative meta-analysis from Figure 1a, with the addition of the cumulative slope, which changes as regressions are performed on the successive ZFisher-and-ki data pairs at each wave of the database. In Figure 1d, the absolute values of the slopes resulting from regressing effect size on each successive wave i constitute the “cumulative slope.” Absolute value is used because of the chance that the first few effect sizes are larger (resulting in a negative slope) or smaller (resulting in a positive slope) than the eventual mean effect size. For example, the first cumulative slope plotted at wave 2, β = −0.070, represents the slope from the 3 data points at waves 1 and 2. The values of the slopes fluctuate between |−0.070| at wave 2 and +0.023 at wave 4. After that point, they level out at around 0.000.
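The expanded-points regression and the per-wave cumulative slope can both be computed with ordinary least squares. This sketch reproduces the slope of −0.070 at wave 2 and the overall slope of approximately 0.004 across all 10 waves:

```python
def ols_slope(points):
    # Least-squares slope for a list of (x, y) pairs
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

# ZFisher effect sizes from Table 1, in publication order
z_fisher = [0.49, 0.35, 0.59, 0.62, 0.40, 0.39, 0.60, 0.50, 0.70, 0.31]

def expanded_points(upto):
    # At wave i, every study entered so far is plotted at x = i,
    # so wave 10 contributes 10 points and the full set has 55.
    return [(i, z_fisher[j]) for i in range(1, upto + 1) for j in range(i)]

slope_wave2 = ols_slope(expanded_points(2))   # -0.070
slope_all = ols_slope(expanded_points(10))    # ~0.004
cumulative_slope = [abs(ols_slope(expanded_points(i))) for i in range(2, 11)]
```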

Inspection of the slopes displayed in Figure 1d reveals that the phenomenon becomes stable after the third wave in the database, where the value of the cumulative slope approaches 0.000. Thus, to the extent that the cumulative slope is different from 0.000, the cumulative weight of evidence continues to fluctuate. As the cumulative slope approaches 0.000, the cumulative weight of evidence has become stable. In other words, additional studies are unlikely to change the picture of the phenomenon.

Examining sufficiency and stability as complementary aspects of an emerging cumulative meta-analytic database allows the analyst to consider the separate contributions that sufficiency and stability can make toward understanding the phenomenon. In the case of a phenomenon that appears to be strong at the outset, a cumulative slope of 0.000 indicates that additional studies would continue to support the phenomenon’s existence (high sufficiency). However, in the case of a phenomenon that appears to be negligible or null at the outset, a cumulative slope of 0.000 suggests that additional studies would not support existence of the phenomenon (low sufficiency). As such, the cumulative slope is a better indicator of the stability of a phenomenon than of sufficient evidence for it.

Summary

Figure 1a displays a hypothetical example of a database that is both sufficient and stable from the outset. The indicators of sufficiency (failsafe ratio) and stability (cumulative slope) permit the cumulative meta-analyst to determine when there was sufficient evidence for the existence of the phenomenon, and when it became stable. Because the failsafe ratio and cumulative slope established sufficiency and stability early in the hypothetical database, these indicators appear to correspond with the conclusion the analyst might have drawn from an examination of Figure 1a. The following section makes use of these indicators in real meta-analyses that are less obvious than the hypothetical example.

APPLICATIONS TO ACTUAL META-ANALYSES

A selection of meta-analyses published in the public health literature can illustrate the application of these indicators of sufficiency and stability. Selection of the following 4 meta-analyses was on the basis of 2 factors: they attempted to evaluate the effectiveness of a particular public health intervention, and they used compatible meta-analytic techniques. The selection includes McArthur’s55 integration of school-based interventions to promote heart-healthy eating behaviors, White and Pitts’56 integration of drug education interventions for reducing drug use, Koger et al.’s57 integration of music therapy interventions for increasing skills among adults with dementia, and Acton and Kang’s58 integration of interventions to reduce burden among caregivers for adults with dementia. By happenstance, 2 of these datasets address issues for youth,55,56 and 2 datasets address issues for older adults.57,58 Table 2 provides descriptive information for these studies.

TABLE 2—

Actual Meta-Analyses Used to Illustrate Cumulative Meta-Analysis

Study Hypothesis No. of Studies No. of Hypothesis Tests Total No. of Participants
McArthur 55 School-based cardiovascular programs that focus on nutrition will increase heart-healthy eating. 9 12 3828
White and Pitts56 Interventions for adolescents and young adults reduce marijuana use at > 2-year follow-up. 10 11 13 201
Koger et al.57 Music therapy improves behavioral, social, emotional, and cognitive skills of adults with dementia. 21 21 336
Acton and Kang 58 Interventions for caregivers of adults with dementia reduce negative cognitive and behavioral consequences for the caregiver. 24 27 1293

Figure 2 depicts the cumulative meta-analyses for the 4 meta-analytic databases, including the failsafe ratio and the cumulative slope. Examination of Figure 2a reveals that the heart-healthy nutrition programs55 achieved sufficiency (with the failsafe ratio exceeding the critical value of 1.000) from the outset, with a modest meta-analytic database of k1 = 2 hypothesis tests. This achievement of sufficiency occurred before 10 of 12, or approximately 83%, of all of the potentially includable hypothesis tests were conducted. Examination of Figure 2a also reveals that the heart-healthy nutrition programs achieved stability (with the cumulative slope approaching 0.000) at the second wave in the database, with a modest database of k2 = 7 hypothesis tests. Stability was achieved before 5 of 12, or approximately 42%, of all of the potentially includable hypothesis tests were conducted. Early examination of a cumulative meta-analysis of heart-healthy eating programs would have revealed that excessive time and effort was invested in evaluating a program for which sufficiency and stability had been established much earlier.

FIGURE 2—


Cumulative meta-analysis using real meta-analytic databases: heart-healthy eating programs55 (a), drug abuse prevention programs56 (b), music therapy programs57 (c), and caregiver burden reduction programs58 (d).

Source. McArthur,55 White and Pitts,56 Koger et al.,57 and Acton and Kang.58

Examination of Figure 2b reveals a different pattern. The drug abuse prevention programs56 did not achieve sufficiency at all, even after 10 studies (k8 = 11 hypothesis tests). However, stability (at a null effect) was established early in the cumulative meta-analytic database (4 studies, k3 = 4 hypothesis tests, 9 years before the meta-analysis, and before 54% of the includable hypothesis tests). The cumulative meta-analysis for the drug abuse prevention programs would have revealed that a good deal of effort and resources had been invested in conducting research on a phenomenon that never achieved sufficiency, yet for which stability might have been established long ago.

A third picture emerges from examination of Figure 2c. The music therapy interventions57 did not achieve sufficiency until after 7 studies (k4 = 7 hypothesis tests, 6 years before the meta-analysis, and before 67% of the includable hypothesis tests). Further examination of Figure 2c reveals that the interventions achieved stability just after that point (k5 = 11 hypothesis tests, 5 years before the meta-analysis, and before 48% of the includable hypothesis tests). The cumulative meta-analysis for the effectiveness of music therapy for adults with dementia reveals that excessive time and effort was invested in evaluating programs for which sufficiency and stability had been established much earlier. However, unlike the nutrition programs in Figure 2a, the cumulative meta-analysis for music therapy would have indicated that more data needed to accumulate before the sufficiency and stability of the intervention effectiveness could be established.

Finally, the picture that emerges in Figure 2d is similar to that of Figure 2b. The caregiver burden reduction programs58 did not achieve sufficiency at all, even after 24 studies (k13 = 27 hypothesis tests). Stability, however, was established relatively early in the cumulative meta-analytic database (k4 = 7 hypothesis tests, 12 years before the meta-analysis, and before 74% of the includable hypothesis tests). The cumulative meta-analysis for the caregiving burden interventions would have revealed that a good deal of effort and resources had been invested in conducting research on a phenomenon that never achieved sufficiency and yet for which stability might have been established long ago.

DISCUSSION

We have proposed the failsafe ratio as an indicator of sufficiency, and the cumulative slope as an indicator of stability. The indicators illustrate the complementary nature of the sufficiency and stability aspects of cumulative knowledge in the hypothetical data set presented in Table 1, but they also illustrate sufficiency and stability in the 4 real meta-analytic examples.55–58 Consider the heart-healthy nutrition programs,55 which appeared to have strong effects from the outset. Examining the failsafe ratio would have confirmed that additional studies were unlikely to change the weight of the evidence. Examining the cumulative slope would have confirmed that additional studies were unlikely to change the aggregate picture of the phenomenon. Similarly, drug abuse prevention programs56 appeared to have weak effects at the outset. Examining the failsafe ratio would have indicated that additional studies could change the weight of evidence for the (very weak) effect. Examining the cumulative slope, however, would have indicated that additional studies were unlikely to change the aggregate picture of the (very weak) effect. In those cases, and in the case of the caregiver burden reduction programs,58 the failsafe ratio and cumulative slope identify the points in the history of a cumulative database where additional tests of a hypothesis amount to flogging a dead horse.

The complementary aspects of cumulative knowledge, sufficiency and stability, correspond with 2 dimensions of study outcome: significance level and effect size. First, significance level refers to the likelihood of having obtained the observed results, or results more extreme, if in fact the null hypothesis of no difference is true, whereas sufficiency refers to whether the cumulative weight of evidence allows us to accept the existence of the phenomenon. Sufficiency requires a highly significant cumulative probability. Second, effect size refers to the strength of a phenomenon, whereas stability refers to whether the cumulative weight of evidence has leveled off at a steady aggregate picture of the phenomenon. Stability requires a steady cumulative average effect. The cumulative meta-analytic context underscores the role of the size of the database. At the individual study level, significance levels and effect sizes are linked through the size of the sample. That is, a significant effect of P = 0.0499999 might be weak if based on a large sample (n = 1000, ZFisher = 0.052), but strong if based on a small sample (n = 3, ZFisher = 1.830).4 Given the correspondence between significance level/effect size and sufficiency/stability, the size of the database should play a pivotal role in cumulative meta-analysis. Indeed, this appears to be the point of Schmidt’s20 admonition: when is it possible to tell when there is sufficient evidence for the existence of a phenomenon?

PUBLIC HEALTH IMPLICATIONS OF CUMULATIVE META-ANALYSIS

The implications for using cumulative meta-analysis are varied. Among its possible uses are changing school curricula, changing recommendations for physicians, assessing research goals, or modifying criteria for funding research. Although cumulative meta-analysis cannot take the place of other considerations that inform decision-making practice, it is an additional tool that policy makers can use to make better decisions about implementing programs.

The cumulative meta-analysis generated from the integration of heart-healthy nutrition interventions55 demonstrated that, early on, both sufficiency and stability for an effective program were attained. However, the cumulative meta-analysis generated from the integration of drug abuse prevention programs56 demonstrated that sufficiency was never established, but stability for the essentially null effect was established by the fourth wave in the database. Yet these 2 programs appear to receive differential research support and commitment. For example, the Healthy People 201059 guidelines delineate only 1 objective for improving nutrition in school meals, but there are at least 7 objectives for decreasing substance use among schoolchildren. Although drug abuse is a serious public health problem, the Healthy People 2010 objectives appear to be made on the basis of some of the same studies that appeared in White and Pitts’ meta-analysis,56 indicating an overemphasis on promoting programs from which schoolchildren derive no benefit. Meanwhile, the objectives underemphasize a program from which schoolchildren derive significant benefits. Despite the emerging cultural alarm over obesity and its associated health problems, efficacious heart-healthy eating programs appear to be overlooked. Indeed, a simple MEDLINE search of the literature on schoolchildren corroborates this suspicion: a search for heart healthy and nutrition yielded 13 citations; a search for drug abuse and prevention yielded 651 citations. The rendered wisdom from current research objectives is that there is more promotion of (ineffective) drug abuse prevention programs than of (effective) heart-healthy eating programs.

Consider the cumulative meta-analysis generated from the integration of interventions to reduce caregiver burden.58 The cumulative meta-analysis demonstrated that stability for the negligible effect was attained by the seventh wave in the database, indicating no substantive changes in the accruing evidence that interventions do not reduce caregiver burden. Nevertheless, 7 years after stability was established, 1 article60 set out recommendations for physicians to identify and intervene with overburdened caregivers. Those recommendations included the same educational, counseling, and respite-care services assessed in the primary-level studies integrated in Acton and Kang’s58 meta-analysis.

Moreover, 8 years after stability for the negligible effect was attained, the US Department of Health and Human Services61 issued a preliminary report on governmental commitments to programs for independent living, including caregiver burden reduction programs. The report claims that “a growing body of evidence confirms that the provision of supportive services can diminish caregiver burden, [and] permit caregivers to remain in the workforce. . . .”61 The 2001 appropriations for the National Caregiver Support Program were $125 000 000.61 To date, we have been unable to determine that any appropriations have been dedicated to music therapy programs. The rendered wisdom from current research objectives is that there is more promotion of (ineffective) caregiver burden reduction programs than of (effective) music therapy programs.

The examples above make it clear that public health research can benefit from tools for determining when sufficient evidence has accrued to establish intervention efficacy. This approach has several valuable applications. For research questions involving moderators, cumulative meta-analysis can be used to examine sufficiency and stability separately within levels of the moderator: the evidence from studies testing the intervention at 1 level of the moderator may demonstrate sufficiency, whereas studies testing another level may not. Similarly, cumulative meta-analysis can be used to gauge the fit of public policy recommendations to the evidence: despite the evidence that the effect of caregiver burden reduction levels off at zero, policy recommendations favor more funding. Finally, this approach may provide an empirically based benchmark against which granting agencies can evaluate funding proposals: proposals for new studies that use cumulative meta-analysis to document that the current evidence for an intervention has not yet achieved stability stand as particularly valuable opportunities to invest time, effort, and resources. The failsafe ratio and the cumulative slope can reveal information about an emerging phenomenon and thereby help researchers make the best use of the limited resources needed to advance the state of the science and improve public health.
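To make the 2 indicators concrete, their computations can be sketched in a few lines of code. This is an illustrative sketch rather than our published implementation (the exact computational details appear in Mullen et al.16): the function names are hypothetical, studies are pooled here with a simple Stouffer combination of their standard normal deviates, the failsafe N follows Rosenthal’s formula47 against the conventional tolerance level of 5k + 10, and stability is gauged as the slope of the running mean effect size across waves.

```python
import math

def cumulative_meta(z_scores, criterion_z=1.645):
    """At each successive wave k, pool the first k studies' standard
    normal deviates (Stouffer's method), compute Rosenthal's failsafe N,
    and express it as the failsafe ratio N_fs / (5k + 10).
    Sufficiency is indicated when the ratio exceeds 1."""
    results = []
    for k in range(1, len(z_scores) + 1):
        zs = z_scores[:k]
        z_combined = sum(zs) / math.sqrt(k)                  # Stouffer combined Z
        n_fs = max(0.0, (sum(zs) / criterion_z) ** 2 - k)    # failsafe N
        ratio = n_fs / (5 * k + 10)                          # failsafe ratio
        results.append((k, z_combined, ratio))
    return results

def cumulative_slope(effect_sizes):
    """Slope of the cumulative (running) mean effect size regressed on
    wave number; a slope near zero indicates stability."""
    if len(effect_sizes) < 2:
        raise ValueError("need at least 2 waves to estimate a slope")
    cum_means = [sum(effect_sizes[:k]) / k
                 for k in range(1, len(effect_sizes) + 1)]
    k = len(cum_means)
    x_bar = (k + 1) / 2
    y_bar = sum(cum_means) / k
    num = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(range(1, k + 1), cum_means))
    den = sum((x - x_bar) ** 2 for x in range(1, k + 1))
    return num / den
```

In this sketch, a database whose failsafe ratio never exceeds 1 (as in the drug abuse prevention literature56) would fail the sufficiency criterion at every wave, whereas a cumulative slope flattening toward zero (as in the caregiver burden literature58) would signal that additional waves are unlikely to change the conclusion.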

Peer Reviewed

Contributors: Both authors developed the conceptual perspective, analyzed the data, and wrote the article.

Human Participant Protection: No protocol approval was needed for this study.

References

1. Mullen B. Advanced BASIC Meta-Analysis. Hillsdale, NJ: Lawrence Erlbaum Associates; 1989.
2. Mullen B. Advanced BASIC Meta-Analysis. 2nd ed. Mahwah, NJ: Lawrence Erlbaum Associates. In press.
3. Mullen B, Rosenthal R. BASIC Meta-Analysis. Hillsdale, NJ: Lawrence Erlbaum Associates; 1985.
4. Rosenthal R. Meta-Analytic Procedures for Social Research. Newbury Park, CA: Sage; 1991.
5. McDonald HP, Garg AX, Haynes RB. Interventions to enhance patient adherence to medication prescriptions: scientific review. JAMA. 2002;288:2868–2879.
6. Peterson AM, Takiya L, Finley R. Meta-analysis of interventions to improve drug adherence in patients with hyperlipidemia. Pharmacotherapy. 2003;23:80–87.
7. Roter DL, Hall JA, Merisca R, Nordstrom B, Cretin D, Svarstad B. Effectiveness of interventions to improve patient compliance: a meta-analysis. Med Care. 1998;36:1138–1161.
8. Davis D, O’Brien MA, Freemantle N, Wolf FM, Mazmanian P, Taylor-Vaisey A. Impact of formal continuing medical education: do conferences, workshops, rounds, and other traditional continuing education activities change physician behavior or health care outcomes? JAMA. 1999;282:867–874.
9. Fichtenberg CM, Glantz SA. Effect of smoke-free workplaces on smoking behaviour: systematic review. BMJ. 2002;325:188–191.
10. Twenge JM, Campbell WK. Self-esteem and socioeconomic status: a meta-analytic review. Pers Soc Psychol Rev. 2002;6:59–71.
11. Schmidt FL. What do data really mean? Research findings, meta-analysis, and cumulative knowledge in psychology. Am Psychol. 1992;47:1173–1181.
12. Ku L, Sonenstein FL, Pleck JH. Factors influencing first intercourse for teenage men. Public Health Rep. 1993;108:680–694.
13. Eisen M, Zellman GL. Changes in incidence of sexual intercourse of unmarried teenagers following a community-based sex education program. J Sex Res. 1987;23:527–533.
14. Marsiglio W, Mott FL. The impact of sex education on sexual activity, contraceptive use and premarital pregnancy among American teenagers. Fam Plann Perspect. 1986;18:151–162.
15. Rosnow RL, Rosenthal R. Statistical procedures and the justification of knowledge in psychological science. Am Psychol. 1989;44:1276–1284.
16. Mullen B, Muellerleile P, Bryant B. Cumulative meta-analysis: a consideration of indicators of sufficiency and stability. Pers Soc Psychol Bull. 2001;27:1450–1462.
17. Cooper HM. The Integrative Research Review: A Social Science Approach. Beverly Hills, CA: Sage; 1984.
18. Light RJ, Pillemer DB. Summing Up: The Science of Reviewing Research. Cambridge, MA: Harvard University Press; 1984.
19. Johnson B, Mullen B, Salas E. A comparison of the three major meta-analytic approaches. J Appl Psychol. 1995;80:94–106.
20. Schmidt FL, Hunter JE. Comparison of three meta-analysis methods revisited: an analysis of Johnson, Mullen, & Salas (1995). J Appl Psychol. 1999;84:144–148.
21. Rosenthal R, Rubin DB. Interpersonal expectancy effects: the first 345 studies. Behav Brain Sci. 1978;3:410–415.
22. Rosenthal R, Rubin DB. Comment: assumptions and procedures in the file drawer problem. Stat Sci. 1988;3:120–125.
23. Antman EM, Lau J, Kupelnick B. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. JAMA. 1992;268:240–248.
24. Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med. 1992;327:248–254.
25. Pogue JM, Yusuf S. Cumulating evidence from randomized trials: utilizing sequential monitoring boundaries for cumulative meta-analysis. Control Clin Trials. 1997;18:580–593.
26. Yusuf S, Held P, Furberg C. Update of effects of calcium antagonists in myocardial infarction or angina in light of the second Danish Verapamil Infarction Trial (DAVIT-II) and other recent studies. Am J Cardiol. 1991;67:1295–1297.
27. DeProspero A, Cohen S. Inconsistent visual analysis of intrasubject data. J Appl Behav Anal. 1979;12:573–579.
28. Furlong MJ, Wampold BE. Intervention effects and relative variation as dimensions in experts’ use of visual inference. J Appl Behav Anal. 1982;15:415–421.
29. Gottman JM, Glass GV. Analysis of interrupted time-series experiments. In: Kratochwill TR, ed. Single Subject Research: Strategies for Evaluating Change. New York, NY: Academic Press; 1978:197–235.
30. Jones R, Weinrott M, Vaught R. Effects of serial dependency on the agreement between visual and statistical inference. J Appl Behav Anal. 1978;11:277–283.
31. Tryon WW. A simplified time-series analysis for evaluating treatment interventions. J Appl Behav Anal. 1982;15:423–429.
32. Ottenbacher KJ. Interrater agreement of visual analysis in single subject decisions: quantitative review and analysis. Am J Ment Retard. 1993;98:135–142.
33. Chambers JM, Cleveland WS, Kleiner B, Tukey PA. Graphical Methods for Data Analysis. Belmont, CA: Wadsworth; 1983.
34. Cleveland WS. The Elements of Graphing Data. Summit, NJ: Hobart Press; 1994.
35. Cleveland WS, McGill R. Graphical perception: theory, experimentation, and application to the development of graphical methods. J Am Stat Assoc. 1984;79:531–554.
36. Cleveland WS, McGill R. The many faces of a scatterplot. J Am Stat Assoc. 1984;79:807–822.
37. Mosteller F, Tukey JW. Data analysis, including statistics. In: Lindzey G, Aronson E, eds. The Handbook of Social Psychology. Vol 2. 2nd ed. Reading, MA: Addison-Wesley; 1968.
38. Tufte ER. Envisioning Information. Cheshire, CT: Graphics Press; 1990.
39. Tufte ER. Visual Explanations. Cheshire, CT: Graphics Press; 1997.
40. Tukey JW. Data-based graphics: visual display in the decades to come. Stat Sci. 1990;5:327–339.
41. Wainer H. Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot. New York, NY: Copernicus; 1997.
42. Cooper HM. Statistically combining independent studies: a meta-analysis of sex differences in conformity research. J Pers Soc Psychol. 1979;37:131–135.
43. Greenwald AG. Consequences of prejudice against the null hypothesis. Psychol Bull. 1975;82:1–20.
44. Hedges LV, Vevea JL. Estimating effect size under publication bias: small sample properties and robustness of a random effects selection model. J Educ Behav Stat. 1996;21:299–333.
45. Hojat M, Gonnella JS, Caelleigh AS. Impartial judgment by the “gatekeepers” of science: fallibility and accountability in the peer review process. Adv Health Sci Educ Theory Pract. 2003;8:75–96.
46. Olson CM, Rennie D, Cook D, et al. Publication bias in editorial decision making. JAMA. 2002;287:2825–2828.
47. Rosenthal R. The “file drawer problem” and tolerance for null results. Psychol Bull. 1979;86:638–641.
48. Beck CT. A meta-analysis of the relationship between postpartum depression and infant temperament. Nurs Res. 1996;45:225–230.
49. Herbert TB, Cohen S. Depression and immunity: a meta-analytic review. Psychol Bull. 1993;113:472–486.
50. Ito TA, Miller N, Pollock VE. Alcohol and aggression: a meta-analysis on the moderating effects of inhibitory cues, triggering events, and self-focused attention. Psychol Bull. 1996;120:60–82.
51. Sheeran P, Orbell S. Do intentions predict condom use? Meta-analysis and examination of six moderator variables. Br J Soc Psychol. 1998;37:231–250.
52. Sweeney PD, Anderson K, Bailey S. Attributional styles and depression: a meta-analytic review. J Pers Soc Psychol. 1986;50:974–991.
53. Hedges LV, Olkin I. Statistical Methods for Meta-Analysis. Orlando, FL: Academic Press; 1985.
54. McCain LJ, McCleary R. The statistical analysis of the simple interrupted time-series quasi-experiment. In: Cook TD, Campbell DT, eds. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago, IL: Rand McNally; 1979:233–293.
55. McArthur DB. Heart healthy eating behaviors of children following a school-based intervention: a meta-analysis. Issues Compr Pediatr Nurs. 1998;21:35–48.
56. White D, Pitts M. Educating young people about drugs: a systematic review. Addiction. 1998;93:1475–1487.
57. Koger SM, Chapin K, Brotons M. Is music therapy an effective intervention for dementia? A meta-analytic review of literature. J Music Ther. 1999;36:2–15.
58. Acton GJ, Kang J. Interventions to reduce the burden of caregiving for an adult with dementia: a meta-analysis. Res Nurs Health. 2001;24:349–360.
59. Healthy People 2010: Understanding and Improving Health. 2nd ed. Washington, DC: US Department of Health and Human Services; 2000.
60. Kasuya RT, Polgar-Bailey P, Takeuchi R. Caregiver burden and burnout: a guide for primary care physicians. Postgrad Med. 2000;108:119–123.
61. US Department of Health and Human Services. Delivering on the Promise: Preliminary Report. 2001. Available at: http://www.hhs.gov/newfreedom/prelim/caregive.html. Accessed November 16, 2003.
