Abstract
An international, multi-disciplinary effort aims to identify evidence-based treatments (EBTs) or interventions, that is, specific techniques or programs that successfully target and change specific behaviors. In clinical psychology, EBTs are identified based on the outcomes of randomized controlled trials examining whether treatments outperform control or alternative treatment conditions. Treatment outcomes are measured in multiple ways, and different ways of gauging outcomes routinely yield inconsistent conclusions. Historically, EBT research has not accounted for these inconsistencies. This paper highlights the implications of such inconsistencies, describes a framework for redressing inconsistent findings, and illustrates how the framework can guide future work on how to administer and combine treatments to maximize their effects and how to study treatments meta-analytically.
Keywords: efficacy, effectiveness, intervention, range of possible changes, treatment
Movements toward identifying evidence-based treatments (EBTs) or interventions encompass multiple disciplines, including dentistry, education, medicine, nursing, psychology, and social work. Each area identifies specific interventions, therapies, or programs that successfully target and change specific problem domains or behaviors (e.g., academic achievement, mood, delinquency, hypertension). Within psychology, particularly clinical, counseling, educational, and school psychology, several EBTs have been identified. Different professional groups, organizations, and task forces, as well as groups in different countries (e.g., within the European Union), states, provinces, and territories (e.g., within the United States and Canada) have developed systems delineating specific criteria for identifying EBTs. A key criterion is that the treatment outperforms a no-treatment or alternative treatment group in randomized controlled trials. This paper elaborates on this criterion, highlights critical interpretive problems that apply to treatment research, and describes a way to redress these problems. We raise these issues within evidence-based psychotherapy, but the points apply more generally to evidence-based intervention research.
Inconsistencies in the Evidence
Controlled trials use multiple outcome measures of a given construct, as well as assessments of multiple constructs, both sound scientific practices when defining a construct. This strategy takes on heightened significance in EBT research because a single measure rarely captures the constructs of interest: patient outcomes and the range of domains reflecting dysfunction or well-being (e.g., positive changes in maladjustment, anxiety, impairment, mood). Thus, a single study includes multiple measures of both the same construct (e.g., depression) and related constructs (e.g., anxiety, impairment). These multiple measures vary in the source of information (e.g., relatives, teachers, clinicians), the way measurements are taken (e.g., symptom counts, disorder diagnoses), and the way they are examined statistically. Researchers rarely hypothesize that some measures, and not others, will support the treatment; often, it appears that researchers expect all measures to suggest the treatment is effective.
What if the measures do not all lead to the same conclusion? If, for example, ten measures are used, how many of them should support the treatment? Two of ten, five of ten, eight of ten? Currently, treatment research does not readily address these questions. This is a critical issue in EBT research, because inconsistencies often arise across assessments of both adults and youths and across the many constructs treated in the clinical sciences (e.g., depression, aggression, parenting; Achenbach, 2006; De Los Reyes & Kazdin, 2005, 2006). Multiple measures are necessary, and each provides reliable and valid information; it is not the case that some are “right” and others “wrong”. Yet, they often lead to inconsistent conclusions.
Within studies, only some measures show that the treatment and control conditions are statistically different (e.g., De Los Reyes & Kazdin, 2006; Flannery-Schroeder & Kendall, 2000; Webster-Stratton & Hammond, 1997). Often, researchers focus on supportive measures and do not discuss the other measures or merely note that they did not “come out”. Further, between two or more studies of the same treatment, measures that support and do not support the treatment in one study do not necessarily lead to the same conclusions in other studies (e.g., Barrett, Dadds, & Rapee, 1996; Kendall, 1994; Kendall et al., 1997). Therefore, at the end of controlled trials, statements can range from concluding that the treatment is evidence-based, to concluding that it is not evidence-based, or to concluding that the evidence is mixed and it depends on the measure (De Los Reyes & Kazdin, 2006).
There has been insufficient recognition of inconsistent evidence, and no model exists for integrating inconsistencies in a way that accounts for all of the evidence. It is possible to acknowledge inconsistencies and still use the evidence to identify EBTs. Moreover, these inconsistencies might signify important circumstances in which evidence suggests treatments are effective and circumstances in which evidence is inconclusive. For instance, consistent findings from informants who observe behavior in one context (e.g., mothers observing a child at home), coupled with inconsistent findings from informants who observe behavior in another context (e.g., teachers observing that same child at school), may suggest where an intervention yields particularly robust outcomes (home-based rather than school-based behavior). One way of addressing inconsistencies is to devise a plan for identifying patterns in evidence that reveal the ways in which treatments are most effective.
The Range of Possible Changes Model
The Range of Possible Changes (RPC) Model was designed to consider within- and between-study consistencies to identify EBTs. By “range”, we mean the myriad conclusions that might be drawn from multiple findings that are discrepant in their support (or lack thereof) of a particular treatment’s effects. The model also applies to treatment literatures that often employ a single measure or source to gauge treatment effects (e.g., smoking cessation, weight loss). Indeed, in these literatures the methods by which outcomes are quantified are often arbitrary (Blanton & Jaccard, 2006), suggesting that even single outcomes can and ought to be examined in multiple ways.
The model provides a classification system that identifies EBTs based in part on whether multiple or specific outcome methods consistently yield similar conclusions. Within this system are categories that classify the many different kinds of studies that produce evidence for treatments (Table 1). Broadly, the categories span classifications of studies that find consistent evidence across multiple ways of gauging outcomes (e.g., Best Evidence for Change), consistent evidence when employing specific outcome methods (e.g., Evidence for Measure- or Method-Specific Change), and inconsistent evidence (e.g., Limited Evidence for Change) (De Los Reyes & Kazdin, 2006). Further, the categories can be applied to classifying evidence, depending on what is targeted for treatment. In other words, one can classify evidence based on multiple measures that represent the same outcome domain (e.g., multiple symptom reduction measures, multiple risk factor measures). Most critically, the RPC Model can be applied to examine whether two studies of the same treatment yield consistent evidence between them. Consider, for example, two studies that examined whether a treatment reduces symptoms of anxiety. If each study can be classified within the same category (e.g., Best Evidence for Change), then the two studies may be classified as providing consistent evidence for the reduction of anxiety symptoms.
Table 1.
Category | Criteria |
---|---|
Best Evidence for Change | At least 80% of the findings from three or more informants, measures, and analytic methods show differences, and at least three findings were gleaned from each of the informants, measures, and methods. There is no clear informant-specific, measure-specific, or method-specific pattern of findings. The evidence suggests the intervention successfully targets the construct. |
Evidence for Probable Change | More than 50% of the findings from three or more informants, measures, and analytic methods show differences, and at least three findings were gleaned from each of the informants, measures, and methods. There is no clear informant-specific, measure-specific, or method-specific pattern of findings. The evidence suggests the intervention probably changes the targeted outcome domain, yet future work ought to examine why inconsistencies occurred. |
Limited Evidence for Change | Either 50% or less of the findings from three or more informants, measures, and analytic methods show differences, or less than the grand majority (less than 80%) of findings from specific informant’s ratings, measures, and/or methods show differences. Any differences found are either scattered across outcomes from multiple informants, measures, or methods, or are not found predominantly on outcomes from specific informants, measures, and/or methods. The evidence is inconclusive. |
No Evidence for Change | No differences are observed. The evidence is completely inconclusive. |
Evidence for Informant-Specific Change | Differences are found on the grand majority (80%) of ratings provided by specific informant(s), and at least three findings were gleaned from the informant(s) for which specificity of findings were observed. The evidence suggests the treatment might change the domain when it is exhibited in specific situations or in interactions with specific informant(s). |
Evidence for Measure- or Method-Specific Change | Differences are found on the grand majority (80%) of specific measure(s) or analytic method(s), and at least three findings were gleaned from the measure(s) or method(s) for which specificity of findings were observed. The evidence suggests the intervention might change the domain when it is measured with specific kinds of measure(s), method(s), or both. |
Note.
Adapted from De Los Reyes & Kazdin, 2006. In the categories above, by “informants”, we mean reporters of outcomes (e.g., self, spouse or significant other, clinician, laboratory observer, biological, institutional records). By “measures”, we mean ways to assess outcomes (e.g., questionnaire or symptom-count measures, laboratory observations, diagnostic interviews). By “analytic methods”, we mean statistical strategies (e.g., tests of mean differences, tests of diagnostic status).
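The threshold logic in Table 1 can be read as a simple decision rule. The sketch below is a minimal, simplified illustration of that logic, not the model's official operationalization: the finding encoding (dicts with `informant`, `measure`, `method`, and a boolean `supports`) and the function name `rpc_category` are our own assumptions, and the real criteria (e.g., the minimum of three findings per source) are only partially enforced here.

```python
from collections import defaultdict

def rpc_category(findings):
    """Classify a study's evidence using a simplified reading of the
    RPC Model thresholds (Table 1). Each finding is a dict with keys
    'informant', 'measure', 'method', and a boolean 'supports'
    (i.e., a treatment-vs-control difference was found).
    The encoding is illustrative, not prescribed by the model."""
    if not any(f["supports"] for f in findings):
        return "No Evidence for Change"

    overall = sum(f["supports"] for f in findings) / len(findings)

    # Specificity check: when overall support is below 80%, look for a
    # single informant, measure, or method whose own findings (at least
    # three of them) show differences at least 80% of the time.
    facets = [("informant", "Evidence for Informant-Specific Change"),
              ("measure", "Evidence for Measure- or Method-Specific Change"),
              ("method", "Evidence for Measure- or Method-Specific Change")]
    for facet, label in facets:
        groups = defaultdict(list)
        for f in findings:
            groups[f[facet]].append(f["supports"])
        for hits in groups.values():
            if len(hits) >= 3 and sum(hits) / len(hits) >= 0.80 and overall < 0.80:
                return label

    if overall >= 0.80:
        return "Best Evidence for Change"
    if overall > 0.50:
        return "Evidence for Probable Change"
    return "Limited Evidence for Change"
```

For example, a study whose findings all show differences across three informants and three measures would fall under Best Evidence for Change, whereas one whose differences concentrate in a single informant's ratings would fall under Evidence for Informant-Specific Change.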
In addition, the model acknowledges that outcomes might be tested in multiple ways. Specifically, outcomes are often evaluated by examining statistical differences between treatment and control conditions, yielding a limited set of possible findings (e.g., treatment is effective, evidence is inconclusive, treatment makes people worse). Indeed, the classification categories described in Table 1 are based on this method. However, another method assesses how much of a difference exists between conditions (e.g., effect size, or the degree of difference between the average scores of treatment and control participants). Combining these two methods might reveal nuances in a treatment’s effectiveness. For example, a study’s evidence might meet criteria for the Best Evidence for Change category (Table 1), and yet have observed magnitudes of change ranging from small-to-large. Thus, the RPC Model addresses this key aspect of research, by guiding the incorporation of classifications based on categorical statistical differences with evaluations of the range of outcomes based on degree of statistical differences (for a discussion of measurement reliability and statistical power issues see De Los Reyes & Kazdin, 2006).
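The continuous method described above is commonly operationalized as a standardized mean difference. The sketch below is one conventional way to compute it (pooled-SD Cohen's d) and to summarize the range of effects within a study; the function names and the dictionary input shape are illustrative assumptions, not part of the RPC Model itself.

```python
import statistics

def cohens_d(treated, control):
    """Pooled-SD Cohen's d: the degree of difference between the
    average scores of treatment and control participants, in
    standard-deviation units."""
    n1, n2 = len(treated), len(control)
    s1, s2 = statistics.variance(treated), statistics.variance(control)
    pooled_sd = (((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd

def effect_size_range(outcomes):
    """Given {outcome_name: (treated_scores, control_scores)}, return
    the smallest and largest effect sizes observed in the study, i.e.
    the study's range of magnitudes of change."""
    ds = [cohens_d(t, c) for t, c in outcomes.values()]
    return min(ds), max(ds)
```

A study classified as Best Evidence for Change could then still be reported, for instance, as exhibiting effects ranging from small to large across its outcomes.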
Advances and Future Directions
Prior research has identified EBTs and yet has not accounted for inconsistent findings. However, inconsistencies might reveal important information about treatment effects: They highlight both the variety of ways that a treatment may change behaviors and the specific circumstances in which a treatment may be effective (Table 1). The RPC Model addresses inconsistencies and reveals directions for future research that may lead to a greater understanding of how to administer and combine treatments to maximize their effects, and how to conduct meta-analytic reviews of treatment research.
First, the RPC Model identifies the circumstances in which treatments might produce consistent effects. For instance, consider a treatment that the evidence suggests produces robust effects within specific circumstances (e.g., at school or with peers, symptom reduction) and inconsistent effects within other circumstances (e.g., at home, diagnostic remission). With this evidence, researchers have an increased understanding of how to administer that treatment in future studies (e.g., where effects were consistently observed). Further, researchers have a greater understanding of how long that treatment ought to be administered (e.g., enough to produce symptom reductions, longer to produce both symptom reductions and diagnostic remission). Therefore, the RPC Model guides knowledge of treatment effects leading to sensible decision-making as to where and how to administer treatments.
Second, the RPC Model identifies two potentially fruitful methods for combining treatments. Broadly, one might combine treatments such that each produces consistent effects that the other does not. This strategy is like fitting two puzzle pieces together, where each piece “fills in” the gaps left open by the other. Specifically, one strategy might involve combining two or more treatments identified as producing consistent effects in different domains of the same construct. An example might be a protocol combining a treatment that consistently produces effects on symptom outcome measures but not on risk factor outcome measures with another treatment that consistently produces effects on risk factor outcome measures but not on symptom outcome measures.
Another method might involve two or more treatments that are identified as producing consistent effects in different contexts or circumstances within the same domain (Table 1). For instance, one might combine a treatment that produces consistent symptom reductions on school-based and not home-based measures with a treatment that produces consistent symptom reductions on home-based and not school-based measures. Therefore, the RPC Model guides the development of cost-effective methods of combining treatments so that effects are not redundant between treatments in a combined protocol.
Third, the RPC Model informs future meta-analytic reviews of treatment research. Traditional meta-analytic reviews have identified effects of specific treatment techniques by averaging effects multiple times: not only within studies, but also between studies of the same or similar techniques (e.g., Matt, 1989; Stice & Shaw, 2004). However, with average treatment effects, it remains unclear whether consistent evidence is found within any one study or between any two studies. For instance, a sample of treatment studies might on average yield large treatment effects. Yet, that sample might include multiple studies that yielded statistically significant effects on only half of their outcome measures, with no two studies yielding the same ranges of magnitudes of effects (e.g., no two studies suggesting effects ranged from medium-to-large). Further, even procedures that statistically correct for potentially biasing factors in effect size estimates (differences in integrity of treatment administration, differences in reliability of measures; Hunter & Schmidt, 2004) often apply corrections at an aggregate level. Aggregate measures and their corrections do not necessarily yield evidence about whether individual measures within and between studies replicate the same effect or consistently suffer from the same biasing factors (De Los Reyes & Kazdin, 2006).
The RPC Model might be used to study evidence meta-analytically, by employing both categorical (Table 1) and continuous (effect size) measures of treatment effects. For example, within a sample of studies of the same treatment, one could both classify each study categorically using the RPC Model categories and calculate effect sizes for each outcome to determine the range of effects observed for each study (i.e., highest and lowest effect sizes). With this information, one can address a number of pertinent research questions. For example, one can examine whether multiple studies are both consistently classified in the same RPC Model category and exhibit similar ranges of treatment effects (e.g., two or more studies classified in the Evidence for Probable Change category, exhibiting medium-to-large treatment effects). Further, one could examine moderators both of RPC Model categorical classifications and of the upper- and lower-limit effects observed within each study. For instance, one could study whether sample (gender, age), treatment (individual vs. group), and methodological (reliability of measures) characteristics are related to the likelihood that a study would be classified in a particular RPC Model category, or related to the average range or distance between the highest and lowest effect sizes observed within studies. Additionally, the framework’s employment of effect size measures makes it possible to use versions of statistical correction procedures to account for differences among studies in the treatments examined, and differences among outcome measures in their reliability or other measurement properties (Hunter & Schmidt, 2004). Thus, one can study treatments meta-analytically and still account for important information about the consistency of treatment effects, as well as identify moderators of within- and between-study consistency.
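The meta-analytic bookkeeping described above can be sketched as a small aggregator. This is a minimal illustration under our own assumptions about input shape: each study is represented as a pair of an RPC category label and a (lowest, highest) effect-size tuple, and the function name `consistency_summary` is hypothetical.

```python
from collections import defaultdict

def consistency_summary(studies):
    """Group studies of the same treatment by RPC Model category and
    report, per category, how many studies fall in it and the overall
    span of effect sizes across those studies. `studies` is a list of
    (category, (d_min, d_max)) tuples; this input shape is an
    illustrative assumption, not prescribed by the model."""
    groups = defaultdict(list)
    for category, (lo, hi) in studies:
        groups[category].append((lo, hi))
    return {category: {"n_studies": len(ranges),
                       "d_min": min(lo for lo, _ in ranges),
                       "d_max": max(hi for _, hi in ranges)}
            for category, ranges in groups.items()}
```

A reviewer could then ask, for instance, whether studies sharing the Evidence for Probable Change classification also share a medium-to-large span of effects, and probe moderators of membership in each group.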
Conclusions
The movement toward identifying EBTs advances a research literature that spans multiple disciplines and types of interventions in mental health, physical health, and education. This paper is not meant to detract from, but rather to build on, the remarkable gains made in EBT research. In clinical psychology, non-EBTs for adults and youths continue to be employed in clinical practice even when EBTs are available that target the same behaviors. Although a given study might reveal inconsistent outcomes, and this raises significant issues, this ought to be presented in the context of a key reality: Hundreds of “evidence-less” treatments are being administered to patients (Kazdin, 2000), and some evidence, although inconsistent, is clearly better than none. This paper does not advocate non-EBTs where EBTs are available.
A critical interpretive issue requires further attention, namely, in a given study and across studies that replicate that original study, some measures show a change and others do not. This reality applies to treatments for both adults and youths and encompasses the range of behaviors targeted in research. There has been tacit selection of the measures that show change. In part, this selection is driven by basic science issues, in that “null and negative effects” are difficult to interpret and can arise for myriad reasons (e.g., low statistical power or small sample size, poor measure reliability). However, statistically significant and positive effects might also be difficult to interpret and can arise for multiple reasons. Null effects can be real (i.e., reflect that no change occurred), just as much as significant changes on measures could be attributable to chance fluctuations in outcomes.
The RPC Model takes into account inconsistencies, and employing the framework will allow researchers to draw reliable and valid conclusions amidst inconsistencies. Further, the RPC Model establishes interesting leads toward further understanding intervention effects and how to maximize them. We encourage research to use the RPC Model in future work evaluating: (a) the circumstances in which interventions produce the most consistent effects; (b) how to combine interventions; and (c) intervention effects via meta-analytic review. More than a single model, we encourage further work on the matter of inconsistencies and how they ought to be integrated to draw conclusions from EBT research.
Acknowledgments
This work was supported, in part, by National Institute of Mental Health Grant MH67540 (Andres De Los Reyes) and by National Institute of Mental Health Grant MH59029 (Alan E. Kazdin). We are very grateful to Shannon M.A. Kundey for her extremely helpful comments on a previous version of this paper.
References
- Achenbach TM. As others see us: Clinical and research implications of cross-informant correlations for psychopathology. Current Directions in Psychological Science. 2006;15:94–98.
- Barrett PM, Dadds MR, Rapee RM. Family treatment of childhood anxiety: A controlled trial. Journal of Consulting and Clinical Psychology. 1996;64:333–342. doi: 10.1037//0022-006x.64.2.333.
- Blanton H, Jaccard J. Arbitrary metrics in psychology. American Psychologist. 2006;61:27–41. doi: 10.1037/0003-066X.61.1.27.
- De Los Reyes A, Kazdin AE. Informant discrepancies in the assessment of childhood psychopathology: A critical review, theoretical framework, and recommendations for further study. Psychological Bulletin. 2005;131:483–509. doi: 10.1037/0033-2909.131.4.483.
- De Los Reyes A, Kazdin AE. Conceptualizing changes in behavior in intervention research: The range of possible changes model. Psychological Review. 2006;113:554–583. doi: 10.1037/0033-295X.113.3.554.
- Flannery-Schroeder EC, Kendall PC. Group and individual cognitive-behavioral treatments for youth with anxiety disorders: A randomized clinical trial. Cognitive Therapy and Research. 2000;24:251–278.
- Hunter JE, Schmidt FL. Methods of meta-analysis: Correcting error and bias in research findings. 2nd ed. Thousand Oaks, CA: Sage; 2004.
- Kazdin AE. Psychotherapy for children and adolescents: Directions for research and practice. New York: Oxford University Press; 2000.
- Kendall PC. Treating anxiety disorders in children: Results of a randomized clinical trial. Journal of Consulting and Clinical Psychology. 1994;62:100–110. doi: 10.1037//0022-006x.62.1.100.
- Kendall PC, Flannery-Schroeder EC, Panichelli-Mindel SM, Southam-Gerow M, Henin A, Warman M. Therapy for youths with anxiety disorders: A second randomized clinical trial. Journal of Consulting and Clinical Psychology. 1997;65:366–380. doi: 10.1037//0022-006x.65.3.366.
- Matt GE. Decision rules for selecting effect sizes in meta-analysis: A review and reanalysis of psychotherapy outcome studies. Psychological Bulletin. 1989;105:106–115. doi: 10.1037/0033-2909.105.1.106.
- Stice E, Shaw H. Eating disorder prevention programs: A meta-analytic review. Psychological Bulletin. 2004;130:206–227. doi: 10.1037/0033-2909.130.2.206.
- Webster-Stratton C, Hammond M. Treating children with early-onset conduct problems: A comparison of child and parent training interventions. Journal of Consulting and Clinical Psychology. 1997;65:93–109. doi: 10.1037//0022-006x.65.1.93.
Recommended Readings
- *Achenbach TM, Krukowski RA, Dumenci L, Ivanova MY. Assessment of adult psychopathology: Meta-analyses and implications of cross-informant correlations. Psychological Bulletin. 2005;131:361–382. doi: 10.1037/0033-2909.131.3.361. Documented the general importance of informant discrepancies to adult clinical assessments.
- *Achenbach TM, McConaughy SH, Howell CT. Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin. 1987;101:213–232. A seminal meta-analysis that identified informant discrepancies as a general clinical child assessment issue.
- *Achenbach TM. 2006. See reference list. A brief review of the implications of informant discrepancies for clinical assessment.
- *De Los Reyes A, Kazdin AE. 2005. See reference list. This paper advances a theoretical framework to explain why informant discrepancies exist in clinical child assessments.
- *De Los Reyes A, Kazdin AE. 2006. See reference list. This paper discusses the RPC Model in more detail than the current paper.
- *Rosenthal R, DiMatteo MR. Meta-analysis: Recent developments in quantitative methods for literature reviews. Annual Review of Psychology. 2001;52:59–82. doi: 10.1146/annurev.psych.52.1.59. This paper provides a general treatment of meta-analysis and its methodology.