Perspectives on Behavior Science. 2020 Aug 20;43(4):725–760. doi: 10.1007/s40614-020-00265-9

Search and Selection Procedures of Literature Reviews in Behavior Analysis

Seth A King 1, Douglas Kostewicz 2, Olivia Enders 2, Taneal Burch 3, Argnue Chitiyo 4, Johanna Taylor 5, Sarah DeMaria 2, Milsha Reid 2
PMCID: PMC7724014  PMID: 33381686

Abstract

Literature reviews allow professionals to identify effective interventions and assess developments in research and practice. As in other forms of scientific inquiry, the transparency of literature searches enhances the credibility of findings, particularly in regard to intervention research. The current review evaluated the characteristics of search methods employed in literature reviews appearing in publications concerning behavior analysis (n = 28) from 1997 to 2017. Specific aims included determining the frequency of narrative, systematic, and meta-analytic reviews over time; examining the publication of reviews in specific journals; and evaluating author reports of literature search and selection procedures. Narrative reviews (51.30%; n = 630) represented the majority of the total sample (n = 1,228), followed by systematic (31.51%; n = 387) and meta-analytic (17.18%; n = 211) reviews. In contrast to trends in related fields (e.g., special education), narrative reviews continued to represent a large portion of published reviews each year. The evaluated reviews exhibited multiple strengths; nonetheless, issues involving the reporting and execution of searches may limit the validity and replicability of literature reviews. A discussion of implications for research follows an overview of findings.

Keywords: Systematic review, Literature review, Quality indicators


An ever-increasing volume of material produced through novel research underscores two essential scientific processes: (1) the generation of new information and (2) the incorporation of discoveries into existing knowledge (Chalmers, Hedges, & Cooper, 2002). Across all of the subdisciplines of behavior analysis (e.g., applied behavior analysis [ABA], experimental analysis of behavior [EAB]; Cooper, Heron, & Heward, 2007), researchers appraise previous studies, summarize knowledge, examine scientific trends, and assess the efficacy of interventions through approaches collectively referred to as literature reviews (Slavin, 1995). Collected research findings guide the decisions of researchers, practitioners, and policymakers (Maggin, Talbott, Van Acker, & Kumm, 2017). Researchers assess previous scholarship in accordance with two general approaches: the narrative or systematic review (Petticrew & Roberts, 2008).

The narrative review relies on the subjectivity of authors in identifying relevant material (Cooper, Hedges, & Valentine, 2009). Narrative reviews typically lack formal search or analysis procedures and possess flexibility in terms of questions of interest (e.g., theory; Talbott, Maggin, Van Acker, & Kumm, 2018). Often indistinguishable from discussion articles, narrative reviews in behavior analysis cite examples from the literature to extend or challenge existing theories of behavior (e.g., Shahan, 2010). Narrative reviews also hold appeal as an accessible means of circulating findings (Maggin et al., 2017). Hence, narrative reviews promote the use of research by nontechnical audiences (Behrstock-Sherratt, Drill, & Miller, 2011).

Critics have questioned the utility of narrative reviews as an acceptable means of analyzing research due to their opaque, subjective approach to identifying included studies (Chalmers et al., 2002; Odom, 2009; Slavin, 1995). Hantula (2016) called for reviews to incorporate systematic approaches, arguing narrative reviews have the potential to misrepresent evidence and incorporate bias. The absence of methodological detail further impedes efforts at replication, as when authors expound upon a previous review of a topic (e.g., functional analysis; Schlichenmeyer, Roscoe, Rooker, Wheeler, & Dubé, 2013) or adopt different approaches in examining similar issues (e.g., Lemons et al., 2016).

In contrast, systematic reviews provide transparent descriptions of search and selection procedures (Garg, Hackam, & Tonelli, 2008). This approach to evaluating previous literature, though initially created in the first half of the 20th century, emerged in most fields during the late 1970s and provides a clear description of the method used to identify and select studies for the review (Chalmers et al., 2002). Proponents suggest systematic reviews provide a credible means of evaluating interventions, assessing research quality, revealing gaps in knowledge, and guiding future studies (Cook et al., 2015). The utility of systematic reviews stems from transparent descriptions of research objectives and study selection, which permit consumers to evaluate the strength of the author’s conclusions (Maggin et al., 2017).

Meta-analytic reviews likewise involve thorough literature searches, with the additional steps of quantifying and statistically combining the study results, examining moderating variables, and arriving at precise conclusions regarding treatment (Petticrew & Roberts, 2008). Meta-analysis represents an approach to examining literature developed only within the last 50 years (Chalmers et al., 2002). Beyond simply averaging study effects, meta-analysis entails statistically combining findings from two or more separate studies to yield (1) individual effect sizes for each study, (2) a summary statistic consisting of the weighted average of effects from each study, (3) a measure reflecting the uncertainty of results (e.g., confidence interval, probability distribution), and (4) a determination of whether studies measure the same quantity (i.e., fixed effects) or effects following a distribution across studies (i.e., random effects; Deeks, Higgins, & Altman, 2019). When combined with adequate search and selection procedures, meta-analysis purports to concisely encapsulate the results of entire bodies of literature.
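
The weighting arithmetic in steps (2) and (3) can be made concrete with a brief sketch; the effect sizes and variances below are hypothetical, not values taken from any study cited here, and the calculation assumes a fixed-effect model:

import math

# Hypothetical per-study effect sizes and sampling variances;
# none of these numbers come from actual studies.
effects = [0.42, 0.55, 0.31]
variances = [0.04, 0.09, 0.06]

# (2) Summary statistic: inverse-variance weighted average of effects.
weights = [1 / v for v in variances]
summary = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# (3) Uncertainty: the standard error of the weighted mean yields a 95%
# confidence interval under the fixed-effect assumption in (4).
se = math.sqrt(1 / sum(weights))
low, high = summary - 1.96 * se, summary + 1.96 * se
print(f"Summary effect = {summary:.3f}, 95% CI = ({low:.3f}, {high:.3f})")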

Due in part to the magnitude of research evidence available in the latter decades of the 20th century, the number of systematic reviews appearing in Scopus and the Web of Science (WOS) databases grew exponentially between 2000 and 2010 (Hansen & Trifkovic, 2013). The evidence-based practice (EBP) movement, which emphasizes the use of methods supported by multiple high-quality studies (American Psychological Association [APA] Presidential Task Force on Evidence-Based Practice, 2006), further explains the increase in systematic reviews (Cook & Cook, 2013). Although not uniformly accepted within behavior analysis (e.g., Pennypacker, 2012; Slocum et al., 2014), the EBP movement has gained traction within related disciplines of special education (Cook & Odom, 2013) and psychology (APA Presidential Task Force on Evidence-Based Practice, 2006). Systematic reviews play a critical role in substantiating and disseminating EBPs throughout the professional community (Moore et al., 2019). As a result, the methodology of systematic reviews has implications for treatment and practice (Mackay, Barkham, Ries, & Stiles, 2003).

Systematic reviews address a wide range of topics aside from EBPs. Review objectives specific to behavior analysis include (1) evaluating the field (e.g., status of women; McSweeney & Swindell, 1998), (2) assessing the prevalence of specific research practices (e.g., dependent measure assessment; Kostewicz, King, Datchuk, Brennan, & Casey, 2016), (3) providing historical overviews (e.g., founding of behavior analysis; Morris, Altus, & Smith, 2013), and (4) questioning established theory (e.g., verbal operants; Gamba, Goyos, & Petursdottir, 2015). Delineated methodology provides insight into the intent and applicability of reviews (King, Davidson, Chitiyo, & Apple, 2020).

Describing procedures in systematic reviews and meta-analyses alleviates confusion resulting from contradictory findings presented in overlapping reviews, or separate reviews pertaining to the same subject (e.g., Siontis, Hernandez-Boussard, & Ioannidis, 2013). Differences in methodology have resulted in discordant reviews on topics including arthritis (e.g., Thorlund, Druyts, Avina-Zubieta, Wu, & Mills, 2013), social skills interventions (e.g., Wang, Parrila, & Cui, 2013; cf. Ledford, King, Harbin, & Zimmerman, 2018), and the prevalence of replication studies (e.g., Lemons et al., 2016; cf. Therrien, Mathews, Hirsch, & Solis, 2016). Processes used in identifying and selecting eligible studies partially explain the disparate findings of discordant reviews. Unstated search and selection procedures prevent consumers from identifying biases or sources of conflict with the potential to undermine the validity of findings.

Of the many benefits of systematic reviews, perhaps the most relevant to behavior analysis concerns their role in providing a transparent record of the replicability and generality of studies unified through any of a number of variables including subjects, interventions, or outcomes of interest (King et al., 2020). Producing similar results following the employment of a method used in an earlier study (i.e., replication) represents a priority for behavior analysts (e.g., Sidman, 1960; Baer, Wolf, & Risley, 1968; Laraway, Snycerski, Pradhan, & Huitema, 2019; Tincani & Travers, 2019). In this article, we review historical material with relevance to systematic and meta-analytic reviews in behavior analysis. We then describe bias associated with literature search and selection procedures. Finally, we review the search and selection procedures reported in systematic and meta-analytic reviews published in journals of behavior analysis.

Systematic Reviews, Meta-Analyses, and Behavior Analysis

The next section provides an overview of scholarship relevant to literature reviews in behavior analysis. First, we briefly describe approaches to experimentation and data analysis within the field (e.g., repudiation of inferential statistics) with the potential to influence the development and assessment of literature reviews (e.g., Baer et al., 1968; Graf, 1982; Ator, 1999; Perone, 2019). Next, we focus on issues with a more direct relationship to systematic reviews and meta-analyses, such as the popularity of narrative reviews and a debate regarding methodology originally published in Perspectives on Behavior Science (PoBS; Derenne & Baron, 1999; Kollins, Newland, & Critchfield, 1999; Baron & Derenne, 2000; Critchfield, Newland, & Kollins, 2000). We conclude with an overview of more recent writings concerning replication, statistics, and reviews in behavior analysis.

Experimentation and Data Analysis in Behavior Analysis

Behavior analysis encompasses a series of distinct approaches to scientific inquiry designed to demonstrate relations between the environment and behavior (Sidman, 1960; Baer et al., 1968). Skinner (1938, 1956) firmly established the focus of behavior analysis on controlled, repeated observations of individual organisms and asserted the independence of the science of behavior from psychology’s adherence to group experimentation. Hence, research in behavior analysis often focuses on the internal replication of experimental effects at the individual level through the systematic manipulation of controlling variables (Sidman, 1960).

Behavior analysts have traditionally rejected the statistical analysis used in group experimentation, in which the aggregation of participants in control or treatment conditions (a) masks variability within each group, (b) obscures the change of behavior over time, or (c) distances researchers from variables that control behavior (Baer et al., 1968; Baer, 1977; Ator, 1999; Johnston & Pennypacker, 2009). Beginning at least with Skinner’s (1938) publication of cumulative record curves, behavior analysts have preferred to describe, both narratively and with descriptive statistics, graphic depictions of response data. Through this process of visual analysis, the detection and determination of clear experimental effects and functional relations occurs without the impediment of formal significance tests (Perone, 1999; Kazdin, 2011). Behavior analysts (e.g., Perone, 1999; Killeen, 2019) have repeatedly criticized null-hypothesis testing, a cornerstone of inferential statistics (e.g., Nickerson, 2000), as an abstruse, arbitrary process that detracts from observations of behavior change. Wariness of statistics also extends to efforts to fuse inferential analysis with traditional behavior analysis—a form of eclecticism Sidman (1960) warned could irrevocably compromise behavior science and produce “a hopeless confusion of basically incompatible data and principles” (p. 54). Many critics objected to techniques beyond the visual analysis of graphic data displays due to the typical issues associated with statistics as well as the propensity of statistics to inflate the effects of interventions (e.g., Baer et al., 1968; Graf, 1982).

Despite the emphasis on internal replication and the rejection of techniques associated with the aggregation of data, behavior analysts recognize the importance of replicating experimental effects externally, beyond individual subjects (Sidman, 1960). In his foundational Tactics of Scientific Research, Sidman (1960) provided many of the guiding principles for single-case design (SCD) experiments—the traditional approach to behavior analytic research often involving a small number of participants—yet explicitly underscored the importance of replication beyond an original subject. According to Sidman, a single experiment replete with multiple demonstrations of effect (e.g., internal replications as obtained through a reversal design) may provide initial evidence of a relationship between variables. However, the significance of such work may only be truly confirmed through subsequent studies. Just as the repeated demonstration of an effect with a single individual contributes to the evidence base regarding the principles of behavior, external reproduction of a study confirms whether anticipated or unanticipated factors might impede replication. Sidman and subsequent authors (e.g., Baer et al., 1968; Baer, Wolf, & Risley, 1987) further noted the importance of exploring the generality of effects, or the extent to which initially observed relations between variables maintain across settings, organisms, and behaviors. As with replication, Sidman (1960) suggested reproducing effects across a wide range of participants and laboratories resulted in a stronger body of evidence.

Demonstrating external replication and generality, however, requires multiple studies recognized as sharing some sort of similarity despite differences in subjects, settings, or methods (Critchfield et al., 2000). This process implicitly requires some form of aggregation, yet emphasizing commonalities across experiments potentially negates features that might otherwise distinguish one work from another. Though rejected by many behavior analysts, the pursuit of quantitative approaches to SCD stemmed from (1) concerns regarding the reliability of visual analysis (e.g., DeProspero & Cohen, 1979; Hojem & Ottenbacher, 1988; for a review, see Ninci, Vannest, Wilson, & Zhang, 2015) and (2) an interest in synthesizing the effects of intervention studies featuring SCDs (Scruggs, Mastropieri, & Casto, 1987a). Advocates of synthesizing SCDs suggested the process could result in stronger conclusions regarding intervention effects and a greater understanding of factors (e.g., settings, populations) that influence the generality of effects (Gingerich, 1984). The rejection of these techniques, as well as the dueling emphases on (1) internal replication at the individual level and (2) demonstrating the generality and replication of effects across multiple subjects complicates attempts to summarize behavior analytic research. Generalizing about the effects of interventions or variables across studies, without obscuring the characteristics of individual experiments, represents a considerable challenge for reviewers (Critchfield et al., 2000).

Reviews in Behavior Analysis

Leaving aside the objection to statistics within behavior analysis, Sidman’s (1960) writings regarding the importance of external replication and extension, combined with a general emphasis on highly detailed descriptions of research methods (e.g., Baer et al., 1968), would appear to provide some basis for the proliferation of systematic reviews. However, the origins of behavior analysis as a basic science, in which researchers attempted to examine universal principles through highly controlled animal experiments, inspired a less systematic alternative to assessing the literature: the narrative review (Mace & Critchfield, 2010). Narrative reviews often attempt to link laboratory studies to natural phenomena irrespective of their actual relation in reality (Hayes, Blackledge, & Barnes-Holmes, 2001). This approach to analyzing literature involves logically extending findings from the laboratory into compelling, under-researched areas, while also preserving the individual characteristics of thematically unified studies through in-depth, qualitative descriptions (e.g., Salzberg, Strain, & Baer, 1987).

In the face of increasing efforts to quantitatively synthesize SCD research during the 1980s (e.g., Center, Skiba, & Casey, 1985), Salzberg et al. (1987) offered a pointed rejection of quantitative SCD syntheses (e.g., Scruggs et al., 1987a) as well as a robust endorsement of narrative reviews. Salzberg et al. (1987) provided a narrative examination of studies quantified in an earlier synthesis (Scruggs, Mastropieri, Cook, & Escobar, 1986) and concluded the narrative approach preserved information discarded through alternative approaches, commenting “if what we have done here is meta-analysis, we recommend it as a better model; if not, we recommend it anyway” (p. 48). Critics of Salzberg et al.’s (1987) commentary correctly noted the authors did not address—and exemplified—widely recognized issues with the precision, objectivity, and validity of traditional narrative reviews (e.g., Scruggs, Mastropieri, & Casto, 1987b).

A series of articles initially pertaining to a quantitative synthesis in which the authors did not report a detailed description of procedures (Kollins, Newland, & Critchfield, 1997) provided an expansive dialogue concerning appropriate methods for summarizing research in behavior analysis. Derenne and Baron (1999) suggested the quantitative comparison of aggregated studies deviated from the emphasis on individual responses needed to observe the controlling influence of variables on behavior. The authors also identified search and selection criteria that result in the inclusion of studies with dissimilar participants, settings, or quality control procedures as a form of aggregation with the potential to distort findings. Rather than calling for the abandonment of systematic reviews and meta-analyses, however, Derenne and Baron acknowledged the limitations of traditional narrative reviews and proposed that the standards for primary behavior analytic research extend to the selection and scrutiny of studies featured in literature reviews.

The remaining entries in the discourse explicitly highlighted the conceptual challenges of reviewing literature in behavior analysis. Kollins et al. (1999) suggested the aggregation of studies through search criteria represented an unavoidable limitation, because all potential methodological choices are in some way compromised. Baron and Derenne (2000) conceded this point, and recommended a transparent approach that would include, categorize, and analyze studies during the review process. In concluding the exchange, Critchfield et al. (2000) argued that the determination of generality of behavioral effects across studies—a fundamental goal of behavior analysis—inescapably involves accumulating studies with dissimilarities. Citing the decisive role research methodology plays in determining outcomes, the authors further suggested that the effects of various approaches to literature review should be examined empirically rather than be assumed. Both groups of authors conceived of a role for systematic and narrative reviews in behavior analysis. Taken together, the series of articles represented a call for greater transparency and methodological heterogeneity in reviews of research in behavior analysis.

Recent Developments and Summary

The impact of previous scholarship on the prevalence of systematic reviews and meta-analyses cannot be determined without an assessment of publication trends in behavior analysis. Examinations of reviews published in the related discipline of special education reveal an increase in systematic reviews and meta-analyses (King et al., 2020), as well as an increase in syntheses of SCD studies employing the percentage of nonoverlapping data (PND) and other quantitative effect sizes (Maggin, O’Keefe, & Johnson, 2011a). The dramatic increase in the use of effect-size metrics and meta-analysis techniques appearing in special education has had a number of implications for SCD research (Pustejovsky & Ferron, 2017). Researchers appear to have accepted the general premise of meta-analysis as applied to group design studies. Newer effect-size metrics (e.g., log response ratio; Pustejovsky, 2015) exceed nonoverlap effects in terms of technical adequacy, and between-case effects permit the aggregation of single-case and group design studies (Shadish, Hedges, Horner, & Odom, 2015). Analysis increasingly allows for the synthesis and evaluation of data within and across studies, facilitating the preservation of data at the individual level in meta-analyses of SCDs (Pustejovsky & Ferron, 2017). Nonetheless, researchers frequently identify problems with the metrics of effect used in meta-analyses as well as an absence of quantitative effects in the primary literature (Laraway et al., 2019).
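
To illustrate the difference between a nonoverlap metric and one of the newer effect sizes named here, the sketch below computes PND and the log response ratio for hypothetical single-case data; the simplified formulas and values are illustrative only:

import math

def pnd(baseline, treatment):
    # Percentage of nonoverlapping data: share of treatment-phase points
    # exceeding the highest baseline point (for a behavior meant to increase).
    ceiling = max(baseline)
    return 100 * sum(x > ceiling for x in treatment) / len(treatment)

def log_response_ratio(baseline, treatment):
    # Log response ratio: natural log of the ratio of phase means.
    mean_b = sum(baseline) / len(baseline)
    mean_t = sum(treatment) / len(treatment)
    return math.log(mean_t / mean_b)

# Hypothetical session-by-session response counts for one subject.
a_phase = [2, 3, 4, 3]
b_phase = [6, 7, 5, 8, 7]
print(pnd(a_phase, b_phase))                            # 100.0
print(round(log_response_ratio(a_phase, b_phase), 2))   # 0.79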

Recent authors of meta-analyses have attempted to bridge the gap between practices in special education and the historical dictates of behavior analysis. In addition to calculating numerous effect sizes for SCDs, Lanovaz, Turgeon, Cardinal, and Wheatley (2019) analyzed the probability of an experimental effect replicating within ABAB designs in terms of the percentage of apparent effects, as determined through visual analysis. This metric, similar to the success rate metric proposed by Lanovaz and Rapp (2016), does not avoid the issues with aggregation inherent to all systematic reviews identified by Baron and Derenne (2000). Yet the use of visual analysis as the basis of a descriptive summary statistic of an intervention’s effectiveness, as opposed to a quantitative aggregation of time series data, may be more acceptable to behavior analysts. Likewise, Killeen (2019) advocated for the use of Bayesian statistics, where the predicted effectiveness of an intervention stems from the knowledge of its effectiveness in prior applications. Such an analysis avoids the pitfalls of null-hypothesis testing and uses the record of replication documented across studies as the basis for the derivation of effects rather than the aggregation of individual data points.
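
One way to picture the Bayesian logic described by Killeen (2019) is a beta-binomial update, in which the documented record of replications determines the predicted probability that the next application succeeds; the sketch below is our illustration under that assumption, not a procedure taken from Killeen:

# Hypothetical replication record for an intervention across studies.
successes, failures = 14, 3

# Starting from a uniform Beta(1, 1) prior, the posterior after observing
# the record is Beta(1 + successes, 1 + failures).
alpha, beta = 1 + successes, 1 + failures

# The posterior mean serves as the predicted probability of success in the
# next application, derived from replications rather than raw data points.
predicted = alpha / (alpha + beta)
print(f"Predicted probability of replication: {predicted:.2f}")  # 0.79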

Various behavior analysis journals observe different standards for literature reviews. The Journal of Applied Behavior Analysis (JABA) introduced brief reviews in article format in 2010 (“Brief reviews”), with submission guidelines requiring authors to evaluate research published in JABA within 5 years of the review and observe strict page limitations. These guidelines potentially limit information available to consumers regarding search and selection procedures. In contrast, PoBS endorsed the use of methodologically explicit reviews featuring sophisticated techniques over narrative reviews—an apparent departure from tradition, potentially serving as a tacit acknowledgement of the persistence of narrative reviews in the field (Hantula, 2016). The Journal of the Experimental Analysis of Behavior (JEAB) likewise encouraged authors to depart from traditional methods of analysis and submit articles with more sophisticated forms of statistics, though not specifically in the context of literature reviews, and always with an eye toward preserving the behavior analytic nature of the journal (Kyonka, Mitchell, & Bizo, 2019; Galizio, 2020).

A general concern with the veracity of research findings has also inspired renewed interest in literature reviews in behavior analysis. Scholars in psychology and other behavioral sciences have increasingly expressed alarm with the apparent failure of many research teams to successfully replicate previously observed findings—a phenomenon often described as a replication crisis (Perone, 2019; Tincani & Travers, 2019). Laraway et al. (2019) suggested the traditional emphasis on replication within behavior analysis does not protect the field from widespread replication failures. Factors contributing to a possible replication crisis in behavior analysis include (1) publication practices that relegate unsuccessful replication studies to obscurity; (2) inconsistent, imprecise descriptions of study procedures and effects; (3) the proliferation of different measurement systems across studies; and (4) the varying quality of experiments implemented by individual research teams. Laraway et al. further identified thorough, transparent literature reviews and meta-analyses as a potential safeguard against the replication crisis. Hence, concerns related to the perceived validity of research have inspired a reevaluation of the unsettled question regarding appropriate methods for summarizing studies in behavior analysis (Baron & Derenne, 2000).

In sum, the importance of preserving individual responses of organisms, combined with the traditional use of narrative reviews, may have prevented the appearance of systematic reviews and meta-analyses within behavior analysis despite the emphasis placed on external replication and generality by prominent figures (e.g., Sidman, 1960). Dismissal of the statistical methods employed in meta-analysis showed minimal signs of abeyance by the 1990s (e.g., Ator, 1999), though scholars increasingly espoused the benefits of explicit, systematic literature search and selection procedures (e.g., Derenne & Baron, 1999). Although the aggregation of studies inherent to all forms of review poses the risk of violating historic precepts of behavior analysis, such risks are integral to the essential task of demonstrating the generality of behavioral principles (Critchfield et al., 2000; Sidman, 1960). Moreover, the careful, transparent application of research methods may mitigate many concerns regarding methods employed in literature reviews.

Potential Sources of Bias in Literature Reviews

As noted by Kollins et al. (1999), decisions authors make in identifying and selecting literature unavoidably influence findings. The next section reviews characteristics of search and selection procedures with the potential to bias conclusions. These include methods used in identifying articles as well as the bases for excluding otherwise eligible studies (e.g., publication status, quality, relevance, methods of analysis).

Search Procedures

Coverage of the literature represents a measure scholars use to assess the quality of reviews (Delaney & Tamás, 2018). Booth’s (2010) rationale for conducting broad, transparent searches included increasing the likelihood of locating pertinent scholarship, demonstrating procedural rigor, and avoiding questions regarding the completeness of the review. Authors who provide details of database searches facilitate efforts to replicate and evaluate limitations of reviews (e.g., Boolean operators; Sampson et al., 2009). Although they represent powerful tools, databases alone cannot support claims regarding the comprehensiveness of a search (Delaney & Tamás, 2018). Using a small number of electronic databases likewise risks omitting references due to the disparity of records featured in specific databases (e.g., Lawrence, 2008). Likewise, hand searches involving discipline-specific journals, when used in isolation, may overlook work published in alternative outlets. Restricted searches potentially omit research outside of the traditions of behavior analysis that may nonetheless strengthen intervention practices and research methodology in general (Laraway et al., 2019).

Publication Status

Authors select articles for review based on their resources and research objectives (Petticrew & Roberts, 2008). It can be argued that attempting to include all studies, regardless of relevance or quality, risks introducing bias into the review (Booth, 2010). Unlike peer-reviewed articles, many unpublished sources concerning a topic may prove difficult for authors to locate. Partial inclusion of specific types of research records (e.g., master’s theses) based on the author’s inability to account for a large portion of existing material represents another source of bias (Ferguson & Brannick, 2012). Hence, many reviewers exclude research based on availability, publication source, or quality (Mahood, Van Eerd, & Irvin, 2014).

The tendency of reviews to exclude unpublished work, however, compounds the effect of publication bias—the overrepresentation of studies with positive findings in peer-reviewed journals (Cook & Therrien, 2017; Tincani & Travers, 2019). In other words, journals publish initial findings that demonstrate an effective intervention, and ignore more mundane replication studies, especially when such studies do not result in positive findings. Thus, published studies typically demonstrate positive intervention effects, relative to unpublished studies.

Publication bias partially stems from the predilection of journal editors for ostensibly novel, exciting, or, at minimum, statistically significant results (Laraway et al., 2019). The resistance of behavior analysis to inferential statistics has supposedly inoculated the field against publication bias, because behavior journals typically do not reject articles on the basis of statistical significance (Pustejovsky & Ferron, 2017; Tincani & Travers, 2019). However, the emphasis on large effects in SCD studies (Baer et al., 1968) may have barred studies with ambiguous findings from publication (Kratochwill, Levin, & Horner, 2018). Results of a survey conducted by Shadish, Zelinsky, Vevea, and Kratochwill (2016) indicated SCD researchers preferred studies with positive findings. Further analyses of studies in behavior analysis (e.g., Sham & Smith, 2014) and related fields (e.g., special education; Gage, Cook, & Reichow, 2017) reveal a disparity in the findings of published and unpublished work, suggesting the exclusion of unpublished studies distorts the results of reviews (Polanin, Tanner-Smith, & Hennessy, 2016). When findings differ based on the language of publication, exclusively including English-language sources constitutes an additional source of bias (e.g., Egger et al., 1997).

Study Quality

Study selection based on methodological rigor may also alter the results of systematic reviews. Considerations of study quality primarily relate to efforts taken in reducing the apparent influence of irrelevant factors, or threats to validity, on results (Petursdottir & Carr, 2018). Efforts to introduce research findings into practice (e.g., the EBP movement) often emphasize high-quality studies, because poor studies may produce unwarranted conclusions (Cook & Cook, 2013). Due to concerns regarding the role of clinical judgment, terminology, and other issues related to evidence (e.g., Slocum et al., 2014), institutions within behavior analysis (e.g., Association for Behavior Analysis International [ABAI]) have yet to disseminate formal evidence standards. Nonetheless, behavior analysts generally encourage the evaluation of methodology when consulting research (Petursdottir & Carr, 2018) and have noted factors such as baseline trends and the length of observations as features to consider when evaluating the rigor of studies (e.g., Derenne & Baron, 1999). Many disciplines including psychology (e.g., Weisz & Hawley, 2002) and education (e.g., What Works Clearinghouse [WWC], 2017) have adopted formal indicators of quality often used to eliminate studies from consideration (e.g., Busacca, Anderson, & Moore, 2015). Guidelines for evaluating quality encompass various designs, including experimental, correlational, and qualitative work (e.g., Thompson, Diamond, McWilliam, Snyder, & Snyder, 2005).

The practice of excluding material based on quality poses several issues, however, particularly when the literature base consists of few studies or the area of interest impedes high-quality research (Petticrew, 2015). Failure to satisfy standards does not entirely preclude the contribution of important information to future inquiry. Studies that do not meet quality standards due to the failure to demonstrate a treatment effect may provide insight into when interventions do not yield results (Kratochwill et al., 2018). In addition, the application of quality indicators has implications for reviews beyond their use as a basis for exclusion. Different indicators of quality produce discordant overlapping reviews (Moore, Maggin, Thompson, Gordon, Daniels, & Lang, 2019; Wendt & Miller, 2012). For example, Maggin, Chafouleas, Goddard, and Johnson (2011b) refuted earlier reviews suggesting token economies represented an EBP after determining many studies supporting the practice in educational environments did not meet quality standards.

Relation to Objectives

Decisions regarding selection criteria with regard to research design, intervention, and participant status relate to the objectives of the review (King et al., 2020). Less precise selection criteria permit the inclusion of a broad array of studies, with the consequence of grouping fundamentally dissimilar studies (e.g., “apples to oranges”; Sharpe, 1997). Subsequent conclusions potentially distort views of outcomes, interventions, or other variables (Maggin et al., 2017). Derenne and Baron (1999) identified the conflation of studies with dramatically different participants and interventions as a threat to validity of systematic reviews and meta-analyses in behavior analysis. Reviews restricted to the SCDs historically associated with behavior analysis risk failing to account for group designs that provide additional evidence regarding the effectiveness of interventions.

Analysis

Authors of systematic reviews of intervention research often provide a measure of effectiveness beyond those featured in the original study. Because effect sizes cannot accommodate a range of common SCDs and group designs (e.g., Shadish et al., 2015), such procedures effectively constitute a criterion for exclusion. Quantitative effect sizes, typically associated with meta-analyses of group designs, increasingly appear in systematic and meta-analytic reviews of SCD research as a means of addressing the limitations of visual analysis (Manolov, Losada, Chacon-Moscoso, & Sanduvete-Chaves, 2016). In a departure from the historic reliance on visual analysis and repudiation of statistics in behavior analysis (e.g., Wolery, Busick, Reichow, & Barton, 2010), frequent publishers of SCD research increasingly require reviews to quantify effects (e.g., Hantula, 2016). Because no single effect size applies to the full range of designs used to examine questions relevant to interventions and other concerns, the types of analyses featured in reviews have implications for findings.

Purpose

Critics suggest systematic reviews present questionable findings beneath a misleading veneer of technical precision (e.g., Slocum et al., 2014; Ioannidis, 2016). Concerns regarding the adverse effects of inaccurate literature searches led to efforts to improve systematic reviews (Chalmers et al., 2002). Guidelines set forth by the Cochrane Collaboration—an organization dedicated to systematic reviews of healthcare interventions (Cooper et al., 2009)—and the similar Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Liberati et al., 2009) identify core aspects of the search process. These guidelines require explicit descriptions of objectives, search procedures, selection criteria, data collection, and study results (Shea, Dube, & Moher, 2001). In addition to disclosing search and selection procedures, reviews satisfying the Cochrane standards feature exhaustive searches and selection criteria designed to minimize bias (Higgins et al., 2011). Efforts to assess literature reviews have also emerged in numerous disciplines (e.g., education; Valentine, Cooper, Patall, Tyson, & Robinson, 2010; Moher, Tetzlaff, Tricco, Sampson, & Altman, 2007). Notwithstanding the considerable disagreement regarding the appropriate approach to summarizing previous studies in behavior analysis, scholars appear to be reaching consensus regarding the importance of methodological transparency in literature reviews (Derenne & Baron, 1999; Critchfield et al., 2000; Laraway et al., 2019).

In a recent study, King et al. (2020) conducted a broad assessment of reviews published in special education from 2004 to 2016 (n = 1,196), emphasizing the search and selection procedures featured in systematic reviews and meta-analyses (n = 901). Findings indicated a marked shift toward transparency, with a decrease in the number of narrative reviews accompanied by an increase in the number of published systematic reviews. Transparent descriptions of searches, which in general featured multiple search methods (73.60%), appeared in the majority of reviews. Authors reported explicit details regarding specific inclusion criteria less frequently. Areas in which reported search criteria represented potential problems included the reliance on peer-reviewed and English-language research as well as the limited use of quality assessment. Given the overlap in the populations served and interventions employed, findings from special education have potential implications for behavior analysts. Nonetheless, the characteristics of reviews published in behavior analysis remain uncertain.

An appraisal of literature reviews in behavior analysis has the potential to promote discourse regarding the strengths and limitations of publication practices. The extent to which literature reviews include a wide range of evidence directly relates to their success in identifying effective treatment, describing research trends, and challenging theory. The characteristics of published systematic literature reviews and meta-analyses further reflect the quality of the information consumed within the field. This study evaluated the characteristics of reviews appearing in behavior analysis journals associated with sources of bias. To reflect recent review practices and assess publication trends, the search identified literature reviews published between 1997 and 2017. Research questions included: (1) How has the appearance of narrative, systematic, and meta-analytic reviews changed over time? and (2) To what extent do authors of systematic reviews and meta-analyses provide details regarding search and selection procedures? Given their prominence in the field of behavior analysis, we report findings from all journals constituting the sample as well as the publications of the Society for the Experimental Analysis of Behavior (SEAB) and ABAI (n = 7; Table 1).

Table 1.

Identified Records and Review Methodology of Records from Included Journals

Title* Impact DB HS N S M F
Group 1
  Behavioral Disorders 1.161 183 7 8 31 8 47
  Behavior Modification 2.024 142 20 40 29 13 82
  Journal of Clinical Psychology 2.33 380 14 77 29 23 129
  Journal of Consulting and Clinical Psychology 4.536 234 8 24 22 81 127
  Psychopharmacology 3.222 210 23 99 45 14 158
  Group 1 Total 1149 72 248 156 139 543
Group 2
  Behavior Therapy 3.228 120 7 18 12 17 47
  Journal of Applied Behavior Analysis 1.534 108 13 35 20 4 59
  Journal of Behavioral Education 1.087 67 5 6 15 12 33
  Journal of Clinical Psychology in Medical Settings 1.893 153 2 13 20 4 37
  Journal of Organizational Behavior Management 1.419 95 6 11 18 1 30
  Perspectives on Behavior Scienceb 1.357 98 30 25 29 2 56
  Group 2 Total 641 63 108 114 40 262
Group 3
  The Analysis of Verbal Behavior - 39 12 9 19 - 28
  Behavior Analysis: Research and Practicea (1999–2017) - 95 5 19 13 6 38
  Behavioural and Cognitive Psychotherapy 1.633 282 7 19 14 5 38
  Behavioral Interventions .792 48 7 14 10 2 26
  Journal of Behavior Therapy and Experimental Psychiatry 2.397 74 - 7 2 11 20
  Journal of Experimental Psychopathology (2010–2017) .479 24 15 9 16 1 26
  Group 3 Total 562 46 77 74 25 176
Group 4
  Behavioural Pharmacology 1.854 143 14 44 3 2 49
  Behavior Analysis in Practice (2008–2017) - 60 3 9 10 - 19
  Child and Family Behavior Therapy .531 206 3 10 4 1 15
  Experimental and Clinical Psychopharmacology 2.354 73 5 18 8 3 29
  The Psychological Record 1.026 127 7 14 6 - 20
  Group 4 Total 609 32 94 31 6 132
Group 5
  Behavior and Philosophy - 36 1 1 - - 1
  Behavior and Social Issues - 33 2 3 4 - 7
  Behavioural Processes 1.555 185 18 65 2 1 68
  European Journal of Applied Behavior Analysis (2000–2017) - 238 4 7 4 - 11
  Journal of the Experimental Analysis of Behavior 2.010 110 4 20 2 - 22
  Journal of Experimental Psychology-ALC 1.861 16 - 6 - - 6
  Group 5 Total 618 29 102 12 1 115
TOTAL (Across Groups): 3,579 242 630 387 211 1,228

Note: DB = Database search; HS = Hand search; N = Narrative; S = Systematic; M = Meta-analysis; F = Final. Underlined journals published by the Association for Behavior Analysis International or the Society for the Experimental Analysis of Behavior. The DB column refers to records retrieved in the initial database search; the HS column refers to hand-search records included in the final sample. Impact factors taken from the Web of Science (WOS) Journal Citation Reports® Social Sciences Edition (2017). Groups represent quintiles based on the total systematic and meta-analytic reviews. *Journals published within the full search range unless otherwise noted. a Alternate titles featured in search include The Behavior Analyst Today. b Alternate titles featured in search include The Behavior Analyst

Method

Search Procedures

A literature search completed in February 2018 selected reviews published in peer-reviewed, English-language journals related to behavior analysis. At the suggestion of the editor of PoBS, journals published by the SEAB, ABAI, or identified by Hantula, Critchfield, and Rasmussen (2017) met criteria for inclusion. Journals featured in the 2017 WOS Journal Citation Reports® Social Sciences Edition (JCR; Clarivate Analytics, 2017) categories for clinical psychology or behavioral sciences also met criteria for inclusion, provided their aims and scope explicitly referred to behavior analysis or SCD. WOS subject categories correspond to citation patterns (Wang & Waltman, 2016) and represent an internationally accepted journal subject classification system (Waltman, 2016). Special education journals (e.g., Education and Treatment of Children) met criteria for exclusion due to their inclusion in a previous review (King et al., 2020). A complete list of included journals (n = 28) appears in Table 1.

For each of the journals, an electronic search evaluated articles published between 1997 and December 2017. We adopted a 20-year search time range for three reasons. First, we accepted the rationale of WWC (2017) reviews, which exclude material published more than 20 years prior to the review in order to represent current practice and avoid scholarship involving methods, contexts, or populations substantially different from the contemporary setting. Second, we wanted to avoid charges of penalizing older reviews (i.e., > 20 years old) composed several years prior to the initial refinement of interdisciplinary standards for literature syntheses (e.g., Moher et al., 1999; see Shea et al., 2001). Third, the initial year of eligibility for the current search coincides with the publication of the review conducted by Kollins et al. (1997), which served as the basis of the first series of articles debating the merits of literature syntheses in PoBS (Derenne & Baron, 1999; Kollins et al., 1999; Baron & Derenne, 2000; Critchfield et al., 2000).

Initial search procedures consisted of an electronic search of article titles and abstracts. In particular, a command line search of PsycINFO and ProQuest Psychology databases in the title and abstract search fields used the following Boolean string:

(JN("[JOURNAL]") AND TI(review* OR meta-analy* OR "meta analy*" OR synthes* OR survey)) OR (JN("[JOURNAL]") AND AB(review* OR meta-analy* OR "meta analy*" OR synthes* or survey))

The research team reviewed identified titles and abstracts and eliminated records that were irrelevant to the research questions (e.g., primary investigations). We then evaluated the full texts of the remaining articles using the article inclusion criteria. Following the electronic search, we conducted a hand search of all material published in included journals over the entire period of interest (i.e., 1997–2017).

Inclusion Criteria

Literature reviews, defined as an explicit assessment of previous literature, articles, studies, research, or evidence without describing an original investigation, included articles that (1) were identified as a literature review, survey, evaluation, meta-analysis, summary, or synthesis in the article title or abstract; and (2) solely analyzed data from aggregated manuscripts, books, reports, or dissertations. Articles satisfying these parameters appearing in selected journals (Table 1) within the targeted publication range met criteria for inclusion regardless of population, intervention, or other features of the targeted studies. Excluded articles reviewed clinical records (e.g., Kahng et al., 2015) or topics without explicitly referencing supporting literature or research. Book reviews, editorials, opinion pieces, and topic reviews not identified as literature reviews also met criteria for exclusion. Five members of the research team, including two PhD-level faculty members and three doctoral students, evaluated all titles and abstracts.

Coding

The research team assessed articles using binary codes derived from previous evaluations (e.g., King et al., 2020; Maggin et al., 2017). Prior to application, an expert panel consisting of three PhD faculty members with experience in conducting reviews and in behavior analysis evaluated the codes. Coders included three PhD-level faculty and four doctoral students in behavior analysis or special education. All coders received training via a 60-min video-recorded module and practiced coding on a minimum of three sample articles until reaching a criterion of 95.00% agreement on three consecutive articles. Table 2 provides a detailed description of codes.

Table 2.

Coding Categories and Descriptions

Indicator Sub-Categories Description
Methodology Narrative Review does not provide information regarding search procedures.
Systematic Review completely or partially identifies procedures used to identify studies.
Meta-analysis Systematic reviews described as meta-analyses or that statistically analyzed results.
Area of Focus Intervention Review concerned treatment or intervention in terms of their outcomes.
Feature Review primarily assessed features or characteristics of studies (e.g., fidelity).
Other Review addressed conceptual, historical, or other concerns.
Search Procedures Database Indicated use of a database search and number of databases used. Also included reports of search terms including: provided (specific terms); field codes (i.e., terms searched in a specific component of the article); and Boolean operators (i.e., explicit syntax used in the search).
Hand Search Indicated a manual search of publications. Additional codes noted whether authors specified the publications searched and whether the search encompassed a single journal or exclusively the journal that published the review.
Ancestral Authors reported searching reference lists or literature reviews for articles.
Consultation Authors consulted researchers in the field or editors to identify eligible articles.
Selection Criteria Temporal Article identified date of occurrence (i.e., when the search was conducted) and date range (i.e., years in which articles were eligible for inclusion).
Language Indication of explicit language restriction (i.e., whether language served as a basis of exclusion), nonexplicit restriction (i.e., exclusive search of English-language publications), and English-only (i.e., explicit restriction of inclusion to English publications).
Operational Criteria Indicated if review concerned specific age groups, disabilities, behaviors, or interventions. Not applicable for reviews unrelated to these factors. If applicable, noted whether authors provided operational definitions for each.
Sources Reviewed Provides indication of documents explicitly subjected to analysis. Included peer-reviewed journals, dissertations, book chapters, and other (e.g., conference proceedings). Also indicates whether authors exclusively included peer-reviewed sources or articles exclusive to a single journal.
Designs Identified designs of studies included in review. Includes group, qualitative, single-case, not specified, and other (e.g., conference proposals).
Quality Authors evaluated and/or excluded studies based on formal assessment (i.e., explicitly outlined or cited quality indicators). Categories for specific indicators (e.g., author created, What Works Clearinghouse) developed following analysis of reported citations.
Analysis Refers to procedures used to evaluate effects of interventions: visual analysis (i.e., explicit procedures used to interpret or categorize data), author report (i.e., presents claims provided by study authors or does not specify means of analysis), and effect size. Categories for specific effect indices developed following analysis.

Descriptive codes

Coders recorded the date of publication and first author of all articles.

Review classification

Classification codes categorized reviews as narrative, systematic, or meta-analytic. Following principal data collection, the combined total of systematic and meta-analytic reviews in each publication served as the basis for dividing journals into quintiles.
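
As an illustration of this grouping step, the sketch below ranks hypothetical journal totals into quintiles; the journal names and counts are placeholders, not values from Table 1:

import pandas as pd

# Hypothetical combined totals of systematic and meta-analytic reviews.
totals = pd.Series({"Journal A": 158, "Journal B": 82, "Journal C": 33,
                    "Journal D": 20, "Journal E": 7})

# Rank journals by total (1 = most reviews) and cut the ranks into
# quintiles, mirroring the Group 1-5 structure of Table 1.
ranks = totals.rank(method="first", ascending=False)
groups = pd.qcut(ranks, q=5, labels=[1, 2, 3, 4, 5])
print(groups)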

Area of focus

Area of focus codes reflected the topics addressed by the authors. Areas of focus included interventions, features, and other. Given the critical role of systematic reviews in identifying evidence supporting the use of specific treatments, we reported codes pertaining to intervention reviews separately.

Search procedures

Search procedure codes denoted the use of (1) electronic databases, (2) hand searches, (3) ancestral searches, and (4) expert consultation. Database searches included any procedure involving the use of electronic programs designed to search article features such as comprehensive indices of published material (e.g., PsycINFO), online applications (e.g., Google Scholar), and search engines directly aligned with a journal (e.g., PoBS). Due to the availability of journal-specific databases, reviews that selected articles from a single journal without identifying the method did not receive a search procedure code. Additional codes pertained to details of specific search methods (e.g., search terms, number of journals searched).

Selection procedures

Selection procedure codes pertained to explicit reports of article inclusion criteria, including publication date, language (e.g., English only), operational descriptions of targeted variables, eligible sources, study designs, quality, and data analysis. Explicit statements represented a definitive indication of the research designs subject to review and did not include general terms (e.g., “experimental”) encompassing various designs. Application of the operationalization code varied based on the foci of the review, which coders assessed across specific dimensions (e.g., age) prior to assessing the suitability of descriptions.

Quality assessment procedures expressly evaluated the quality, rigor, or suitability of design features of included studies. Design features included the presentation of baseline data, data collection frequency, or phases required for experimental control. Quality assessment did not include studies evaluating a single methodological feature across studies (e.g., interobserver agreement [IOA]; e.g., Kostewicz et al., 2016). Coders categorized quality assessment procedures based on reported citations following the completion of the initial analysis. We attributed procedures described in detail without an accompanying citation to the author of the review. Analysis codes indicated whether reviews, regardless of focus, reported effectiveness of included studies (i.e., change in an intervention outcome). As with quality procedures, coders categorized effect indices based on the descriptions featured in the reviews (e.g., PND). Coders categorized effect sizes based on the standardized mean difference, such as Cohen’s d and Glass’s method (Cumming, 2011), as SMD.
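
The two SMD variants named above differ only in the standardizer: Cohen’s d divides the mean difference by a pooled standard deviation, whereas Glass’s method uses the control-group standard deviation alone. A minimal sketch with hypothetical group data:

import statistics as st

def cohens_d(treatment, control):
    # Mean difference divided by the pooled standard deviation.
    n1, n2 = len(treatment), len(control)
    s1, s2 = st.stdev(treatment), st.stdev(control)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (st.mean(treatment) - st.mean(control)) / pooled

def glass_delta(treatment, control):
    # Mean difference divided by the control-group standard deviation only.
    return (st.mean(treatment) - st.mean(control)) / st.stdev(control)

# Hypothetical outcome scores for two groups.
tx = [12, 15, 14, 16, 13]
ctl = [10, 11, 9, 12, 10]
print(round(cohens_d(tx, ctl), 2), round(glass_delta(tx, ctl), 2))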

Analysis

Given the descriptive nature of study objectives, analyses involved reporting the number or proportion of reviews exhibiting specific characteristics. Authors entered all codes into a spreadsheet during the coding process. Descriptive statistics were calculated using SPSS.
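
The authors report using SPSS; for readers who prefer open tools, a rough equivalent of the year-by-type tabulations underlying Figure 2 can be produced as below (the column names and rows are hypothetical):

import pandas as pd

# Hypothetical coding spreadsheet: one row per included review.
df = pd.DataFrame({
    "year": [1997, 1997, 1998, 1998, 1998],
    "type": ["narrative", "systematic", "narrative", "meta", "systematic"],
})

# Counts of each review type per year, converted to row percentages.
counts = df.groupby(["year", "type"]).size().unstack(fill_value=0)
percents = counts.div(counts.sum(axis=1), axis=0) * 100
print(percents.round(2))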

Interobserver Agreement

Coders collected IOA for review selection and coding. Personnel received training prior to both procedures. Coders then evaluated articles published beyond the range of the search (e.g., prior to 2006) until obtaining 95.00% agreement on three consecutive articles. For abstract searches, coders derived agreement—concurrence on inclusion status—by dividing the number of agreements by the total number of abstracts. Two coders independently assessed 100% of abstracts (n = 3,579) obtained via databases, with an IOA of 94.87%. Following the hand search, coders then derived IOA for 20.03% of included articles (n = 246) using the point-to-point approach (Kennedy, 2005), resulting in an average IOA of 95.06% (range = 78.00–100%; SD = 6.4). Authors discussed discrepancies until reaching consensus.
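
A minimal sketch of the point-to-point agreement calculation described here (Kennedy, 2005); the coder records are hypothetical:

def point_to_point_ioa(coder_a, coder_b):
    # Agreements divided by total items, expressed as a percentage.
    if len(coder_a) != len(coder_b):
        raise ValueError("Coders must rate the same items.")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * agreements / len(coder_a)

# Hypothetical inclusion decisions (1 = include, 0 = exclude) on ten abstracts.
coder_1 = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
coder_2 = [1, 0, 1, 0, 0, 0, 1, 1, 0, 1]
print(point_to_point_ioa(coder_1, coder_2))  # 90.0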

Results

The database search generated 3,579 abstracts following the removal of duplicates. A hand search of all included journals yielded an additional 242 articles. Articles identified in the hand search typically described the study using terms omitted from the database query because of their potential to increase the number of irrelevant records (e.g., “evaluate”). In addition, several studies evaluated specific article characteristics without identifying the methodology (i.e., literature review) or compared secondary analysis techniques (e.g., effect sizes for SCDs) without explicitly referring to a requisite literature review in the abstract. The two searches resulted in a total sample of 1,228 reviews. Figure 1 depicts the study selection process.

Fig. 1. Study selection procedures

On average, 44 reviews appeared in each journal (range = 1–158; SD = 38.53), with an average of 60 reviews published per year (range = 29–114; SD = 19). Each journal published an average of 2.23 reviews per year (range = 0–16; SD = 2.61). Unless otherwise indicated, the term systematic review in the following sections refers to both systematic and meta-analytic reviews.

Review Classification

Narrative reviews represented the majority of reviews (51.30%; n = 630), with systematic (31.51%; n = 387) and meta-analytic reviews (17.18%; n = 211) appearing less frequently. Narrative, systematic, and meta-analytic reviews exhibited a yearly publication rate of 30 (range = 19–41; SD = 5.88), 18.43 (range = 4–51; SD = 11.24), and 10.05 (range = 1–22; SD = 5.35), respectively. A description of the reviews featured in each journal appears in Table 1. Journals published an average of 22.50 narrative (range = 1–99; SD = 23.39), 14.88 systematic (range = 0–45; SD = 11.09), and 10.55 meta-analytic reviews (range = 0–81; SD = 17.71). A large portion of systematic reviews (88.89%) appeared in the top three quintiles (M = 26.62%; range = 19.12–40.31%; SD = 10.60), which encompassed 17 of the included journals. In contrast, 65.56% of meta-analyses appeared in journals in the top quintile.

Figure 2 depicts the types of reviews published from 1997 to 2017. Though variable, narrative reviews exhibited a declining trend between 1997 and 2004, yet consistently represented over half of reviews published each year until 2006. The percentage of narrative reviews remained largely consistent between 2005 and 2010 before assuming a decelerating trend in 2011. Compared to narrative reviews, fewer systematic reviews appeared from 1997 to 2015. However, systematic reviews exhibited an increasing trend, which rapidly accelerated beginning in 2011. The percentage of systematic reviews published each year surpassed narrative reviews in 2016. Beginning in 1997, meta-analytic reviews exhibited a gradually increasing trend. Although the percentage of meta-analytic reviews in general did not exceed alternatives during this span, meta-analyses and systematic reviews combined represented more than half of all reviews published each year after 2012.

Fig. 2. Narrative, systematic, and meta-analytic reviews identified in the total sample of behavior analysis journals (top) and in the publications of SEAB and ABAI (bottom). Graphs depict the percentage of reviews across all journals, by type, as well as the total number of reviews published each year (gray bars)

SEAB and ABAI publications

Narrative reviews represented 54.50% (n = 115) of the 211 reviews published in SEAB and ABAI journals. Systematic reviews represented slightly less than half of reviews (42.65%; n = 90). Meta-analyses appeared in a small number of reviews (2.84%; n = 6). Yearly publication rates of narrative reviews (5.47; range = 0–10; SD = 3.01) exceeded those of systematic (4.28; range = 1–12; SD = 2.67) and meta-analytic reviews (0.29; range = 0–2; SD = 0.56).

Reviews within SEAB/ABAI journals exhibited both similarities to and differences from the overall sample (Figure 2). Narrative reviews equaled or exceeded the percentage of systematic reviews published in all but 5 years of the assessed publication range. Unlike the overall sample, narrative and systematic reviews maintained approximately the same levels, with slightly increasing trends and low variability, and meta-analyses rarely appeared in the identified sample prior to 2014.

Area of Focus

Roughly half of systematic reviews involved intervention research (48.83%; n = 292; e.g., Seubert, Fryling, Wallace, Jiminez, & Meier, 2014). Conceptual reviews (30.10%; n = 180; e.g., Littell & Girvin, 2002) and methodology reviews (21.07%; n = 126; e.g., Buchanan, Husfeldt, Berg, & Houlihan, 2008) appeared less often. Table 3 describes the search and selection procedures reported across all systematic reviews and within intervention reviews specifically.

Table 3. Search and Selection Procedures Featured in Systematic Reviews and Meta-analyses

Feature | All Foci: Total | All Foci: ABAI/SEAB | Intervention Focused: Total | Intervention Focused: ABAI/SEAB
Systematic 387 (64.71) 90 (93.75) 143 (48.97) 19 (76.00)
Meta-Analysis 211 (35.28) 6 (6.25) 149 (51.02) 6 (24.00)
Search Procedures
  Methods
    One Method 166 (27.76) 55 (57.29) 43 (14.73) 10 (40.00)
    Two Method 289 (48.33) 34 (35.42) 157 (53.77) 12 (48.00)
    > Three Methods 143 (23.91) 7 (7.29) 92 (31.51) 3 (12.00)
  Database Searches
    Reported 531 (88.80) 62 (64.58) 283 (96.92) 22 (88.00)
    Database Only 107 (20.15) 22 (35.48) 38 (13.42) 7 (31.82)
    Unlisted 13 (2.45) 1 (1.61) 7 (2.47) 0 (0.00)
    One Database 114 (21.47) 21 (33.87) 40 (14.13) 4 (18.18)
    Multiple 404 (76.08) 40 (64.51) 236 (83.39) 18 (81.82)
    Mean Databases (R; SD) 3 (1–16; 2) 2.41 (1–7; 1.52) 3.3 (1–16; 2.24) 2.77 (1–6; 1.51)
    Terms Reported 460 (86.63) 47 (75.81) 248 (87.63) 20 (90.91)
    Field Reported 275 (51.79) 31 (50.00) 151 (53.36) 12 (54.55)
    Booleans Reported 71 (13.37) 2 (3.23) 44 (15.55) 2 (9.09)
  Hand Searches
    Reported 164 (27.42) 46 (47.92) 62 (21.23) 4 (16.00)
    Hand Search Only 45 (27.44) 30 (65.22) 3 (4.84) 3 (75.00)
    Publications Listed 143 (87.20) 41 (89.13) 56 (90.32) 4 (100)
    Mean Journals Searched (R; SD) 4.9 (1–25; 4) 3.44 (1–17; 4.22) 4.9 (1–25; 4.24) 2.5 (1–7; 3)
    Restricted to Single Journal 30 (18.29) 18 (39.13) 3 (4.84) 2 (50.00)
    Exclusive to Same Journal 24 (14.63) 15 (32.61) 2 (3.23) 2 (50.00)
  Ancestral Searches
    Reported 395 (66.05) 31 (32.29) 232 (79.45) 13 (52.00)
    Ancestral Search Only 12 (3.29) 3 (9.68) 2 (.86) 0 (0.00)
  Consultation
    Reported 97 (16.22) 5 (5.21) 64 (21.92) 4 (16.00)
    Consultation Only 1 (1.03) 0 (0.00) 0 (0.00) 0 (0.00)
Selection Procedures
  Temporal
    Date of Search 95 (15.89) 6 (6.25) 50 (17.12) 2 (8.00)
    Range of Search 283 (47.32) 60 (62.50) 121 (41.44) 13 (52.00)
  Language
    Explicit Restrictions 216 (36.12) 9 (9.38) 115 (39.38) 4 (16.00)
    Non-explicit Restriction 68 (11.37) 36 (37.50) 6 (2.05) 4 (16.00)
    English Only 250 (88.03) 43 (95.56) 102 (84.30) 8 (100)
  Operational Criteria
    Applicable 450 (75.25) 43 (44.79) 284 (97.26) 24 (96.00)
    All Concerns Operationalized 237 (52.67) 24 (55.81) 143 (50.35) 13 (54.17)
  Sources Reviewed
    Explicit 204 (34.11) 33 (34.38) 112 (38.36) 13 (52.00)
    Peer-reviewed 190 (93.14) 27 (81.82) 111 (99.11) 13 (100)
    Peer-reviewed Exclusive 155 (75.98) 21 (63.64) 89 (79.46) 11 (84.62)
    Dissertation 28 (13.73) 4 (12.12) 20 (17.86) 2 (15.38)
    Book Chapter 11 (5.39) 6 (18.18) 7 (6.25) 0 (0.00)
    Other 18 (8.82) 3 (9.09) 18 (16.07) 0 (0.00)
  Design
    Explicit 222 (37.12) 21 (21.88) 155 (53.08) 13 (52.00)
    Group 156 (70.27) 9 (42.86) 115 (74.19) 5 (38.46)
    Group Exclusive 115 (51.80) 3 (14.29) 87 (56.13) 3 (23.08)
    Single-case 89 (40.09) 17 (80.95) 63 (40.64) 10 (76.92)
    Single-case Exclusive 52 (23.42) 11 (52.38) 37 (23.87) 8 (61.54)
    Qualitative 14 (6.31) 3 (14.29) 7 (4.52) 0 (0.00)
    Other 28 (12.61) 4 (19.05) 10 (6.45) 1 (7.69)
  Quality
    Formal 114 (19.06) 1 (1.04) 90 (30.82) 1 (4.00)
    Cochrane 18 (15.79) 0 (0.00) 16 (17.78) 0 (0.00)
    APA Task Force 4 (3.51) 0 (0.00) 3 (3.33) 0 (0.00)
    CEC/WWC* 24 (21.05) 1 (100) 20 (22.22) 1 (100)
    Author 22 (19.30) 0 (0.00) 16 (17.78) 0 (0.00)
    Other 51 (44.74) 0 (0.00) 42 (46.67) 0 (0.00)
    Multiple 17 (14.91) 0 (0.00) 14 (15.56) 0 (0.00)
    Excluded 8 (7.02) 0 (0.00) 7 (7.78) 0 (0.00)
  Analysis
    Assessed Outcomes 481 (80.43) 42 (43.75) 292 (100) 25 (100)
    Author Report 290 (60.29) 30 (71.43) 156 (53.42) 14 (56.00)
    Author Report Only 223 (46.36) 26 (61.90) 108 (36.99) 12 (48.00)
    Visual Analysis 27 (5.61) 4 (9.52) 20 (6.85) 3 (12.00)
    Effect Sizes Calculated 238 (49.48) 11 (26.19) 168 (57.53) 9 (36.00)
    Multiple Effect Sizes 46 (19.33) 3 (27.27) 33 (19.64) 3 (33.33)
    SMD 152 (63.87) 8 (72.73) 119 (70.83) 6 (67.00)
    Pearson’s r 27 (11.34) 0 (0.00) 5 (2.98) 0 (0.00)
    PND 19 (7.98) 2 (18.18) 15 (8.93) 2 (22.22)
    Tau-U 12 (5.04) 1 (9.09) 6 (3.57) 1 (11.11)
    NAP 7 (2.94) 1 (9.09) 3 (1.79) 1 (11.11)
    PAND 5 (2.10) 0 (0.00) 4 (2.38) 0 (0.00)
    IRD 5 (2.10) 0 (0.00) 4 (2.38) 0 (0.00)
    Other 61 (25.63) 1 (9.09) 41 (24.40) 1 (11.11)

Note. Values are n (%) unless otherwise indicated. Category percentages may exceed 100% because some articles exhibited several characteristics. Percentages for indented items were derived from the preceding category without indentation. ABAI/SEAB = journals published either by the Association for Behavior Analysis International or the Society for the Experimental Analysis of Behavior; SMD = standardized mean difference; R = range

SEAB and ABAI publications

Systematic reviews appearing in SEAB and ABAI publications most often involved reviews of methodology (45.83%; n = 44). Conceptual reviews (28.13%; n = 27) and intervention reviews (26.04%; n = 25) represented roughly equivalent proportions of the remaining articles.

Search Procedures

Systematic and meta-analytic reviews described search procedures with varying levels of detail. Table 3 describes the number of methods employed, the use of specific search methods, and procedures unique to each method. Authors employed more than one search method in 72.24% of reviews (n = 432), with intervention reviews using multiple searches in 85.27% of instances (n = 249). Authors frequently identified specific databases and search terms; however, information regarding field codes or Boolean operators appeared less frequently. Reviews featuring hand searches generally identified the publications included in the search. Of the articles featuring hand searches, 27.44% (n = 45) employed the method in isolation. Authors also reported restricting hand searches to a single journal (18.29%; n = 30), though this occurred less frequently in intervention reviews (4.84%; n = 3).
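To illustrate the level of detail needed for replication, the sketch below shows how a fully specified database search might be reported; the databases, terms, and field codes are hypothetical examples rather than search strings drawn from the reviewed articles.

```python
# Hypothetical search specification illustrating the details reviews
# often omitted: databases, terms, field codes, and Boolean operators.
# None of these values are drawn from the reviewed articles.
databases = ["PsycINFO", "ERIC", "MEDLINE"]
query = (
    'TI("literature review" OR "systematic review" OR meta-analy*) '  # TI = title field
    'AND AB("behavior analy*" OR "single-case")'                      # AB = abstract field
)
print(f"Searched {', '.join(databases)} with: {query}")
```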

SEAB and ABAI publications

Authors of reviews appearing in SEAB or ABAI journals reported using one search method in the majority of reviews (57.29%; n = 55), although intervention reviews typically featured multiple search methods (60.00%; n = 15). Of reviews employing a single method, hand searches represented the most common approach (54.55%; n = 30), followed by database searches (40.00%; n = 22). Information regarding database searches typically included search terms and fields, though specific Boolean operators appeared less commonly. Approximately one third of systematic reviews employing hand searches restricted the search to a single journal (39.13%; n = 18) or the journal in which the article appeared (32.61%; n = 15).

Selection Procedures

Selection criteria codes pertained to seven domains relevant to the selection of articles within systematic reviews. Domains included criteria related to temporal characteristics (e.g., publication date), language, descriptions of targeted variables, eligible sources, and study designs featured in each review. Codes also pertained to procedures with the potential to influence article inclusion; in particular, the use of quality assessment and methods of data analysis. Table 3 provides an overview of these domains within systematic reviews.

Temporal

Authors described the range of years eligible for review in 47.32% (n = 283) of all reviews and 41.44% (n = 121) of intervention reviews. Search dates appeared infrequently in the total and intervention reviews, at 15.89% (n = 95) and 17.12% (n = 50), respectively.

  • SEAB and ABAI publications. The range of years eligible for review appeared in 62.50% (n = 60) of all reviews and 52.00% (n = 13) of intervention reviews. Less than 10.00% of reviews, regardless of focus, reported the date of the search.

Language

Approximately one third of reviews (36.12%; n = 216) reported explicit exclusion criteria based on language, and an additional 11.37% (n = 68) provided sufficient detail to infer restrictions on the basis of language. These reviews typically restricted inclusion to articles written in English (Table 3).

  • SEAB and ABAI publications. Reviews appearing in SEAB or ABAI publications infrequently reported explicit criteria concerning the language of eligible publications (9.38%; n = 9), although sufficient information to discern the language of included publications appeared in an additional 37.50% of systematic reviews (n = 36). Authors generally limited inclusion to English-language publications (95.56%; n = 43).

Operationalization of criteria

The majority of total systematic reviews (75.25%; n = 450) targeted a combination of variables related to participant age, disabilities, behaviors, or interventions. Of these, approximately half operationalized the relevant variables (Table 3).

  • SEAB and ABAI publications. Of reviews that targeted applicable variables (44.79%; n = 43), 55.81% operationalized all variables relevant to the search (n = 24).

Sources reviewed

Authors reported explicit descriptions of article sources in 34.11% (n = 204) of total reviews. Descriptions of source criteria appeared more frequently in systematic reviews of intervention research (38.36%; n = 112). Regardless of whether they concerned interventions, systematic reviews most frequently obtained records from peer-reviewed articles. Of these, 75.98% (n = 155) exclusively evaluated peer-reviewed articles. Intervention reviews also excluded sources other than peer-reviewed articles (79.46%; n = 89).

  • SEAB and ABAI publications. Articles featured descriptions of eligible sources in 34.38% of the total sample (n = 33) and 52.00% of intervention reviews (n = 13). Searches relied extensively on peer-reviewed articles, with majorities of the total sample (63.64%; n = 21) and intervention reviews (84.62%; n = 11) exclusively featuring peer-reviewed articles.

Study designs

Explicit criteria pertaining to study design appeared in 37.12% (n = 222) of total reviews and 53.08% of intervention reviews (n = 155). Authors reported including group designs and SCDs most frequently (Table 3). Of the reviews that reported design criteria, approximately half of all systematic reviews (51.80%; n = 115) and intervention-focused reviews (56.13%; n = 87) exclusively included group designs. Authors restricted article inclusion to SCDs in 23.42% (n = 52) of total reviews and 23.87% (n = 37) of intervention reviews.

  • SEAB and ABAI publications. Within SEAB and ABAI publications, inclusion criteria pertaining to study design appeared in 21.88% (n = 21) of the total sample and 52.00% (n = 13) of intervention reviews. Of these, authors restricted article inclusion to SCDs in 52.38% (n = 11) of the total sample and 61.54% (n = 8) of intervention reviews.

Quality assessment

Formal quality assessment procedures appeared in 19.06% (n = 114) of systematic reviews and 30.82% (n = 90) of intervention-focused reviews. Quality assessment included indicators associated with education (e.g., Horner et al., 2005; WWC, 2017; Council for Exceptional Children, 2014), the Cochrane Collaboration (e.g., Higgins et al., 2011), and author-created measures. Few reviews (7.02%; n = 8) excluded articles based on formal quality assessment.

  • SEAB and ABAI publications. A single review appearing in SEAB and ABAI publications (1.04%) formally assessed the quality of included articles, using a contemporaneous version of the What Works Clearinghouse standards (Sham & Smith, 2014).

Analysis

Authors assessed study outcomes in 80.43% (n = 481) of total reviews and 100% (n = 292) of reviews focusing on interventions. Author report appeared in more than half of the total sample (60.29%; n = 290) and intervention reviews (53.42%; n = 156). Approximately half of all reviews (46.36%; n = 223) used author report exclusively. Visual analyses appeared in 5.61% of reviews (n = 27). Effect sizes appeared more commonly (49.48%; n = 238). Indices based on the SMD represented the most frequent effect size (63.87%; n = 152). Nonoverlap effect sizes associated with SCDs (e.g., PND, Tau-U; Parker, Vannest, & Davis, 2011) appeared in 20.17% of total reviews (n = 48).
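For readers unfamiliar with nonoverlap indices, a minimal sketch of PND (Scruggs, Mastropieri, & Casto, 1987a) appears below; the data are invented, and the sketch assumes a target behavior expected to increase.

```python
# Minimal sketch of the percentage of nonoverlapping data (PND;
# Scruggs, Mastropieri, & Casto, 1987a): the percentage of
# treatment-phase points exceeding the highest baseline point.
# Data are invented and assume the behavior should increase.

def pnd(baseline, treatment):
    ceiling = max(baseline)
    nonoverlapping = sum(point > ceiling for point in treatment)
    return 100 * nonoverlapping / len(treatment)

print(pnd([2, 3, 2, 4], [5, 6, 4, 7, 8]))  # 80.0; four of five points exceed 4
```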

  • SEAB and ABAI publications. Outcome assessment appeared in 43.75% (n = 42) of reviews published in SEAB and ABAI journals. Author report represented the only form of analysis in 61.90% (n = 26) of all systematic reviews and 48.00% (n = 12) of intervention reviews. Visual analysis appeared in 9.52% (n = 4) of all systematic reviews. Authors provided effect sizes in 26.19% (n = 11) of reviews.

Discussion

This article evaluated the characteristics of literature reviews published in journals of behavior analysis from 1997 to 2017. Narrative reviews continue to represent a significant portion of published reviews, particularly in SEAB and ABAI publications, in which meta-analyses rarely appear. Within the larger sample of journals, the appearance of systematic and meta-analytic reviews has gradually accelerated since the late 1990s, and over the last 5 years systematic approaches to literature reviews represented more than half of published reviews. Authors of systematic reviews generally conducted comprehensive searches featuring multiple methods, though SEAB and ABAI journals reported using one search method (most often hand searches) more frequently relative to the total sample. Descriptions of database searches suggested many authors used a range of databases, but fewer authors provided details needed to replicate the review (e.g., field codes, Boolean operators). In terms of article selection, intervention reviews provided more explicit descriptions of criteria. Nonetheless, a limited number of reviews and meta-analyses described selection procedures in detail, and the selection procedures described in some reviews had the potential to adversely influence results (e.g., reliance on peer-reviewed material) by limiting the number of applicable articles subject to review, thus reducing the accuracy of findings.

Notwithstanding the emphasis on the behavior of individuals in behavior analysis, the necessity of examining replication and generality across studies, as articulated by Sidman (1960), implicitly entails the evaluation of studies united by some theme. Literature reviews provide a formal mechanism for documenting the extent of generality across studies. The increasing appearance of reviews over time, regardless of methodology, generally reflects a corresponding commitment to examining previous literature within ABAI and SEAB journals as well as behavior analysis more broadly. Debates regarding appropriate approaches toward examining the behavior-analytic literature (e.g., Salzberg et al., 1987; Kollins et al., 1999) further suggest that the presence or absence of specific features in reviews results from decisions regarding the literature summary process rather than a lack of concern for reviews in general.

Compared to the related discipline of special education (King et al., 2020), reviews appeared approximately 50% less often in behavior analysis journals. The disparity likely represents the continued emergence of behavior analysis as an independent discipline rather than a lack of adequate reflection on findings within the field (Vargas, 1987; Dorsey, Weinberg, Zane, & Guidi, 2009). Given the considerable overlap between ABA and services provided to individuals with disabilities (Marr, 2017) as well as psychology (Fraley & Vargas, 1986), much work pertaining to behavior analysis may have appeared in outlets more directly related to these fields. However, the increased appearance of work relevant to behavior analysts in theoretically eclectic disciplines (e.g., special education) may prevent such scholarship from reaching consumers of journals exclusive to behavior analysis (Odom, 2009).

In contrast to the dramatic decline observed in special education (King et al., 2020), narrative reviews continue to represent a large portion of reviews published in behavior analysis, and within SEAB and ABAI journals in particular. This pattern suggests both the persistence of a traditional preference for narrative reviews in behavior analysis (e.g., Salzberg et al., 1987; Mace & Critchfield, 2010) and the influence of developments unique to related disciplines. Special education falls under the purview of governmental entities that encourage systematic reviews of intervention research in pursuit of EBP (Cook & Cook, 2013). Hence, publication trends may reflect the difference between a primarily applied discipline (i.e., special education) and a field characterized by a range of nonclinical subdisciplines (e.g., EAB; philosophical behaviorism; Cooper et al., 2007). Flagship applied and cross-categorical journals (e.g., JABA; PoBS) published larger numbers of systematic reviews relative to outlets concerning EAB (e.g., The Psychological Record), practitioner concerns, or other subdisciplines. To reach a wider audience, authors may publish reviews of basic scientific research in cross-categorical journals. Given that fewer than half of all systematic reviews addressed intervention research, however, this does not fully explain the disparity in the publication rates of journals.

The influence of historical predispositions within behavior analysis remains especially apparent in the relative absence of meta-analyses, particularly in ABAI and SEAB journals. Many authors expressed misgivings about the use of meta-analyses in behavior analysis (e.g., Salzberg et al., 1987; Derenne & Baron, 1999), and significance tests and statistical aggregation do not accord with methods espoused by the field’s dominant figures (e.g., Skinner, 1938; Sidman, 1960). In contrast to the total sample, in which the largest number of reviews pertained to interventions, many of the systematic reviews featured in ABAI and SEAB journals concerned methodological issues. Leaving aside the value of such work to the field, this suggests authors may have attempted to circumvent disputes pertaining to the summation of research by abstaining from analysis entirely (Critchfield et al., 2000). A related, and potentially more alarming, issue concerns the use of author report to assess intervention results as opposed to any form of objective evaluation, including visual analysis. Avoiding the challenge of summarizing literature in behavior analysis by defaulting to imprecise, narrative descriptions potentially obscures the magnitude and direction of findings and stymies efforts to assess replicability and generality (Scruggs et al., 1987b; Laraway et al., 2019).

Although initially published less frequently than narrative reviews, systematic reviews and meta-analyses increased over time and, combined, have represented the majority of reviews appearing in the total sample since 2012. The concentration of meta-analyses in a small number of journals, together with the few meta-analyses in SEAB and ABAI publications, suggests these increases may not reflect trends across the field. Regardless, the growth in systematic reviews would appear to reflect a greater emphasis on transparency in behavior analysis in general (Tincani & Travers, 2019). However, systematic reviews and meta-analyses exhibited marked variability in methodological transparency. Authors frequently provided information related to the breadth of the search (e.g., methods used), yet uncertainty concerning the years included in the search or the precise search terms could lead to difficulties in replicating a review. As noted by Critchfield et al. (2000), the methods employed by authors play a large role in determining the results. Whatever the views of individual scholars on the propriety of specific search, selection, or analysis procedures, consumers cannot adequately interpret findings without a clear description of procedures.

In addition to issues with transparency, many reviews in behavior analysis reported selection criteria with the potential to distort findings, such as the exclusion of material other than peer-reviewed journal articles (Cook, 2014). As noted by Sham and Smith (2014), neglecting unpublished literature has negative implications for consumers of behavior analysis due to the inflation of treatment effectiveness associated with publication bias (Cook, 2014). Laraway et al. (2019) likewise identified the emphasis on published material as a threat to the accurate evaluation of replication in behavior analysis. Increasing the accessibility of unpublished material through the inclusion of such studies in widely disseminated literature reviews represents a potential step in reducing the harm associated with ineffective treatment (Sham & Smith, 2014). However, publication practices in behavior analysis currently resemble those in other fields (e.g., special education; King et al., 2020; Gage et al., 2017) that overwhelmingly predicate conclusions regarding treatment effectiveness on peer-reviewed research.

The appearance of quality indicators in behavior analysis journals, though infrequent, approximated their application in fields more closely associated with EBP (i.e., special education; 27.40%; King et al., 2020). Authors most often used quality indicators appearing in earlier, largely uncited work (i.e., “other”), created their own standards, or adopted standards from medicine (i.e., Cochrane Collaboration; Higgins et al., 2011) and education (e.g., WWC, 2017). The use of objective, delineated quality standards represents a positive development, as practitioners routinely invoke evidence as (1) a purported benefit of behavior analysis and (2) a justification for punitive interventions (Vollmer et al., 2011). The validity of such claims remains questionable in the absence of a clear definition of acceptable evidence. Consensus regarding evidence standards within behavior analysis could also prevent the confusion created by the proliferation of quality standards (Wendt & Miller, 2012).

Although an exhaustive discussion of data analysis rests beyond the purview of this article, the present findings raise issues unrelated to search or selection procedures. Prevalent throughout the reviews, author report of study results represents one of the most inclusive forms of assessing outcomes, as reviewers include all records regardless of their compatibility with the analysis. Extensive use of this practice appears to reflect an uncritical acceptance of an author’s interpretation, however, which the actual data may not consistently support. Notwithstanding the inclusion of journals publishing a wide variety of studies, the use of the SMD indicates that behavior analysis encompasses a range of methodologies beyond SCD. The tradition of SCD research within behavior analysis remains evident given the relative frequency of nonoverlap effect sizes used in the literature (e.g., Parker et al., 2011). Yet the absence of visual analysis invites questions regarding the transparency of such syntheses. In describing studies, authors may have implicitly applied visual analysis, thereby rendering their conclusions indistinguishable from those of the primary study. Increasing recognition of the limitations of visual analysis suggests more transparent applications of the procedure should appear in reviews and primary studies (Manolov & Vannest, 2019).

The logistics of both the publication and research processes may partially explain the acceptance of reviews with ostensible sources of bias. Attempts to provide comprehensive descriptions of search procedures, though needed to evaluate a review, often conflict with journal page limitations. Relying exclusively on English-language research potentially threatens the validity of findings and underscores issues behavior analysis continues to encounter in becoming an international discipline (Martin, Nosik, & Carr, 2016). The provincial nature of certain questions, combined with the difficulty of securing and interpreting multilingual literature, likely explains journal editors’ continued acceptance of such practices (King et al., 2020). Renewed dialogue among authors and editors regarding the essential components of reviews (perhaps an additional objective of this review) could result in a change in publication practices or, at minimum, an explicit commitment to the status quo (Maner, 2014). Likewise, editors may address issues resulting from page limitations through the adoption of online supplements (Tenopir et al., 2011).

Limitations

In highlighting issues with systematic and meta-analytic reviews in behavior analysis, the present review employed methods that perpetuated some of the same practices associated with bias. In particular, the review relied exclusively on peer-reviewed articles obtained from a narrow range of English-language publications. The journal selection process omitted lesser-known or low-impact outlets associated with behavior analysis as well as unpublished work, and difficulty in identifying discrete behavior analysis publications likely resulted in the exclusion of relevant outlets. However, we did not intend to suggest literature reviews, or research of any kind, could achieve perfection. This review assessed the features of articles published in journals broadly representative of behavior analysis (Hantula, Critchfield, & Rasmussen, 2017) to contribute to discussions regarding standards of scholarship in the field (e.g., Petticrew, 2015). Because all research, including the current review, inescapably exhibits flaws (Kollins et al., 1999), this analysis eschewed the subtly pejorative term “quality indicator” in order to draw attention to defensible aspects of searches that might nonetheless introduce bias. This study contributes to the field in spite of these shortcomings and remains sufficiently transparent to permit redress in future research.

The coding scheme employed in the present review encompassed methodological transparency (i.e., search details) as well as the implications of prevalent search practices (i.e., the influence of reported criteria on findings). Codes did not exhaustively address features examined in previous reviews (e.g., Mackay et al., 2003); rather, the codes applied reflect the focus of this review on features distinct to the research review process (i.e., search and selection procedures). Assessing features such as keywords, research questions, or measurement assessment procedures represents a valuable area of inquiry for future scholarship.

An additional limitation involves the meta-analysis designation, as some studies bearing the term may not appropriately be described as meta-analyses. We intentionally used a liberal definition of meta-analysis in order to identify articles featuring attempts at quantitative synthesis within a discipline with a tradition of hostility toward statistical analysis (e.g., Kollins et al., 1997). This definition likely inflated the number of identified meta-analyses, particularly in ABAI and SEAB journals. Issues related to the quality of meta-analyses within behavior analysis, and the extent to which such work conforms to guidelines for quantitative synthesis (e.g., Maggin et al., 2011a, 2011b), though beyond the scope of this review, merit further study.

Future Directions

Recognition of the importance of methodology could contribute to changes in both the consumption and production of literature reviews. Given the current results, consumers should contextualize review findings based on the transparency of methods, the breadth of the search, and the inclusion of multiple sources. Explicitly stating research objectives, search procedures, and selection criteria could improve the utility of reviews more generally. Recent systematic reviews in behavior analysis provide excellent examples that could serve as models for future reviewers. For example, Sham and Smith (2014) provided a clear description of procedures and included a variety of published and unpublished sources. Miller and Lee (2013) explicitly defined inclusion criteria and assessed the impact of various effect metrics using three separate effect sizes. Authors should include easily obtained unpublished material (e.g., dissertations) whenever possible, especially given (1) the feasibility of small-scale SCD studies and (2) the demonstrable impact of such work on the results of reviews (Ferguson & Brannick, 2012). Although quality indicators have the potential to mitigate the influence of studies with questionable methodology, reviewers should exercise caution in selecting them. Current quality indices, such as those of the WWC (2017), omit important aspects of the literature (e.g., graphical display; Kubina, Kostewicz, Brennan, & King, 2017) or include problematic criteria (e.g., relaxed standards for studies involving harmful behavior; Cook et al., 2015).

Aside from referencing models of transparency, we hesitate to advocate for the use of distinct approaches, such as a specific set of quality indicators. Because findings may indeed vary with the rigor of experimental procedures, employing quality assessment procedures addresses threats to validity posed by conflating studies regardless of rigor. Standards promulgated by a specific organization, though potentially convenient, need not be used provided authors explicitly describe the guidelines and procedures related to quality assessment.

Likewise, our finding regarding the minimal number of meta-analyses does not necessarily represent an endorsement of a specific form of research synthesis in behavior analysis. The field has indeed avoided issues associated with methods that misrepresent findings through staunch adherence to scientific principles established by Skinner (1938), Sidman (1960), and others. In more recent studies, scholars have suggested many popular SCD effect sizes misrepresent findings and should therefore be accompanied by visual analysis (e.g., Wolery et al., 2010; Tarlow, 2017). As evidenced by the focus of this review, analysis represents a critical, yet secondary, concern. Any analysis must proceed from a defensible process of identifying and selecting evidence, because interpreting unrepresentative literature samples will always produce misleading conclusions. Finally, literature reviews that contribute to assessing the extent of research replicability and generality, rightly observed as guiding principles of behavior analysis, may best involve systematic methods but need not necessarily involve quantitative synthesis. Nonetheless, behavior analysis would clearly benefit from attempting to engage with, constructively critique, and develop behaviorally aligned alternatives for quantifying, comparing, and summarizing behavioral research (e.g., Kollins et al., 1997; Lanovaz & Rapp, 2016; Killeen, 2019). The increasing willingness of behavior analysis journals to confront the challenge of summarizing behavioral research represents a positive step for the field and science more generally (e.g., Critchfield et al., 2000; Lanovaz & Rapp, 2016; Killeen, 2019).

The issues observed in many of the systematic reviews and meta-analyses published in behavior analysis present a number of opportunities for researchers. The growing number of systematic reviews suggests the discipline has begun to favor more transparent, structured approaches. Although current methods may allow for some sources of bias, continued effort to address these issues may result in a body of work that fully conveys the strength of evidence supporting behavior analysis in an objective manner befitting the science. Replicating existing reviews that featured opaque method sections or limited selection criteria could yield improved guidance for professionals. In addition, such work would allow a more precise, subject-specific evaluation of the impact of review procedures on findings. The apparent dearth of systematic reviews in subdisciplines of behavior analysis requires further scrutiny; however, these fields may benefit from a renewed interest in comprehensive examinations of empirical work.

Confusion surrounding the term “meta-analysis” as used in the field also warrants a more thorough evaluation of existing meta-analyses in behavior analysis and agreement upon what meta-analysis entails (Baron & Derenne, 2000). As with statistical methods in general, adopting a common understanding of meta-analysis would likely address some of the trepidation regarding its use or misuse in behavior analysis. Many articles described as meta-analyses, particularly those featuring SCDs, aggregate studies using simple averages or are otherwise inconsistent with meta-analyses used in other disciplines (Maggin et al., 2011b). Methods of synthesizing SCDs consistent with the Cochrane definition have recently proliferated and, given the trend observed in this review, may appear more frequently in behavior analysis journals.
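To make the distinction concrete, the sketch below contrasts a simple average of study effects with the inverse-variance weighting conventional in fixed-effect meta-analysis (e.g., Deeks, Higgins, & Altman, 2019); the effect sizes and variances are invented for illustration.

```python
# Contrast between simple averaging of study effects and the
# inverse-variance (fixed-effect) weighting conventional in
# meta-analysis (e.g., Deeks, Higgins, & Altman, 2019).
# Effect sizes and sampling variances below are invented.
effects = [0.80, 0.30, 0.50]    # per-study effect sizes
variances = [0.40, 0.05, 0.10]  # per-study sampling variances

simple_mean = sum(effects) / len(effects)  # treats all studies equally

weights = [1 / v for v in variances]       # more precise studies weigh more
weighted_mean = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

print(round(simple_mean, 2))    # 0.53
print(round(weighted_mean, 2))  # 0.4: the imprecise 0.80 study counts less
```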

Publication trends suggest systematic reviews may continue to represent the dominant form of literature review in many journals of behavior analysis. Although this represents a positive development, the findings of the current study indicate the limited disclosure of search and selection procedures diminishes the utility of otherwise methodical literature reviews. Literature reviews contribute to claims regarding the replicability and generality of behavioral principles (Sidman, 1960). A greater acknowledgment of the importance of transparent, thorough reviews may enhance applied activities as well as refinements in the theory of behavior analysis.


References

  1. American Psychological Association Presidential Task Force on Evidence-Based Practice Evidence-based practice in psychology. The American Psychologist. 2006;61(4):271–285. doi: 10.1037/0003-066X.61.4.271. [DOI] [PubMed] [Google Scholar]
  2. Ator NA. Statistical inference in behavior analysis: Environmental determinants? Perspectives on Behavior Science. 1999;22:93–97. doi: 10.1007/BF03391985. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Baer DM. Perhaps it would be better not to know everything. Journal of Applied Behavior Analysis. 1977;10:167–172. doi: 10.1901/jaba.1977.10-167. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Baer DM, Wolf MM, Risley TR. Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis. 1968;1:91–97. doi: 10.1901/jaba.1968.1-91. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Baer DM, Wolf MM, Risley TR. Some still-current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis. 1987;20:313–327. doi: 10.1901/jaba.1987.20-313. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Baron A, Derenne A. Quantitative summaries of single-subject studies: What do group comparisons tell us about individual performances? Perspectives on Behavior Science. 2000;23:101. doi: 10.1007/BF03392004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Behrstock-Sherratt E, Drill K, Miller S. Is the supply in demand? Exploring how, when, and why teachers use research. Washington, DC: American Institutes for Research; 2011. [Google Scholar]
  8. Booth A. How much searching is enough? Comprehensive versus optimal retrieval for technology assessments. International Journal of Technology Assessment in Health Care. 2010;26:431–435. doi: 10.1017/S0266462310000966. [DOI] [PubMed] [Google Scholar]
  9. Buchanan J, Husfeldt JD, Berg TM, Houlihan D. Publication trends in behavioral gerontology in the past 25 years: Are the elderly still an understudied population in behavioral research? Behavioral Interventions. 2008;23:65–74. [Google Scholar]
  10. Busacca ML, Anderson A, Moore DW. Self-management for primary school students demonstrating problem behavior in regular classrooms: Evidence review of single-case design research. Journal of Behavioral Education. 2015;24(4):373–401. [Google Scholar]
  11. Center BA, Skiba RJ, Casey A. A methodology for the quantitative synthesis of intra-subject design research. Journal of Special Education. 1985;19(4):387–400. [Google Scholar]
  12. Chalmers I, Hedges LV, Cooper H. A brief history of research synthesis. Evaluation & the Health Professions. 2002;25:12–37. doi: 10.1177/0163278702025001003. [DOI] [PubMed] [Google Scholar]
  13. Clarivate Analytics. (2017). 2016 journal citation reports® social sciences edition. Retrieved from https://jcr.clarivate.com/.
  14. Cook BG. A call for examining replication and bias in special education research. Remedial & Special Education. 2014;35(4):233–246. [Google Scholar]
  15. Cook BG, Buysse V, Klingner J, Landrum TJ, McWilliam RA, Tankersley M, Test DW. CEC's standards for classifying the evidence base of practices in special education. Remedial & Special Education. 2015;36(4):220–234. [Google Scholar]
  16. Cook BG, Cook SC. Unraveling evidence-based practices in special education. Journal of Special Education. 2013;47(2):71–82. [Google Scholar]
  17. Cook BG, Odom SL. Evidence-based practices and implementation science in special education. Exceptional Children. 2013;79(2):135–144. [Google Scholar]
  18. Cook, B. G., & Therrien, W. J. (2017). Null effects and publication bias in special education research. Behavioral Disorders, 42(4), 149–158
  19. Cooper H, Hedges LV, Valentine JC, editors. The handbook of research synthesis and meta-analysis. 2. New York, NY: Russell Sage Foundation; 2009. [Google Scholar]
  20. Cooper JO, Heron TE, Heward WL. Applied behavior analysis. 2. Upper Saddle River, NJ: Pearson; 2007. [Google Scholar]
  21. Council for Exceptional Children. (2014). Council for Exceptional Children standards for evidence-based practices in special education. Retrieved from http://www.cec.sped.org/~/media/Files/Standards/Evidence%20based%20Practices%20and%20Practice/EBP%20FINAL.pdf.
  22. Critchfield TS, Newland CM, Kollins SH. The good, the bad, and the aggregate. Perspectives on Behavior Science. 2000;23:107–115. doi: 10.1007/BF03392005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Cummings P. Arguments for and against standardized mean differences (effect sizes) Archives of Pediatrics & Adolescent Medicine. 2011;165(7):592–596. doi: 10.1001/archpediatrics.2011.97. [DOI] [PubMed] [Google Scholar]
  24. Deeks, J. J., Higgins, P. T., & Altman, D. G. (2019). Analyzing data and undertaking meta-analyses. In J. P. T. Higgins, J. Thomas., J. Chandler, M. Cumpston, T. Li, M. J. Page, & V. A. Welch (Eds.), Cochrane handbook for systematic reviews of interventions version 6.0 (updated July 2019). Cochrane, 2019. Available from http://www.training.cochrane.org/handbook.
  25. Delaney A, Tamás PA. Searching for evidence or approval? A commentary on database search in systematic reviews and alternative information retrieval methodologies. Research Synthesis Methods. 2018;9(1):124–131. doi: 10.1002/jrsm.1282. [DOI] [PubMed] [Google Scholar]
  26. DeProspero A, Cohen S. Inconsistent visual analysis of variance model for the intrasubject replication design. Journal of Applied Behavior Analysis. 1979;12:563–570. [Google Scholar]
  27. Derenne A, Baron A. Human sensitivity to reinforcement: A comment on Kollins, Newland, and Critchfield’s (1997) quantitative literature review. Perspectives on Behavior Science. 1999;22:35–41. doi: 10.1007/BF03391976. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Dorsey MF, Weinberg M, Zane T, Guidi MM. The case for licensure of applied behavior analysts. Behavior Analysis in Practice. 2009;2(1):53–58. doi: 10.1007/BF03391738. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Egger M, Zellweger-Zähner T, Schneider M, Junker C, Lengeler C, Antes G. Language bias in randomized controlled trials published in English and German. The Lancet. 1997;350(9074):326–329. doi: 10.1016/S0140-6736(97)02419-7. [DOI] [PubMed] [Google Scholar]
  30. Ferguson CJ, Brannick MT. Publication bias in psychological science: prevalence, methods for identifying and controlling, and implications for the use of meta-analyses. Psychological Methods. 2012;17(1):120. doi: 10.1037/a0024445. [DOI] [PubMed] [Google Scholar]
  31. Fraley LE, Vargas EA. Separate disciplines: The study of behavior and the study of the psyche. Perspectives on Behavior Science. 1986;9:47–59. doi: 10.1007/BF03391929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Gage NA, Cook BG, Reichow B. Publication bias in special education meta-analyses. Exceptional Children. 2017;83(4):428–445. [Google Scholar]
  33. Galizio M. JEAB: Past, present, and future. Journal of the Experimental Analysis of Behavior. 2020;113:3–7. doi: 10.1002/jeab.574. [DOI] [PubMed] [Google Scholar]
  34. Gamba J, Goyos C, Petursdottir AI. The functional independence of mands and tacts: Has it been demonstrated empirically? Analysis of Verbal Behavior. 2015;31:10–38. doi: 10.1007/s40616-014-0026-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Garg AX, Hackam D, Tonelli M. Systematic review and meta-analysis: When one study is just not enough. Clinical Journal of the American Society of Nephrology. 2008;3(1):253–260. doi: 10.2215/CJN.01430307. [DOI] [PubMed] [Google Scholar]
  36. Gingerich WJ. Meta-analysis of applied time-series data. Journal of Applied Behavioral Science. 1984;20:71–79. doi: 10.1177/002188638402000113. [DOI] [PubMed] [Google Scholar]
  37. Graf SA. Is this the right road? A review of Kratochwill's single subject research: Strategies for evaluating change. Perspectives on Behavior Science. 1982;5:95. [Google Scholar]
  38. Hansen H, Trifkovic N. Systematic reviews: Questions, methods and usage. Copenhagen, Denmark: Danish International Development Agency; 2013. [Google Scholar]
  39. Hantula D, Critchfield TS, Rasmussen E. Swan song. Perspectives on Behavior Science. 2017;40(2):297–303. doi: 10.1007/s40614-017-0132-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Hantula, D. A. (2016). Editorial: A very special issue. Perspectives on Behavior Science, 39, 1–5. [DOI] [PMC free article] [PubMed]
  41. Hayes, S. C., Blackledge, J. T., & Barnes-Holmes. (2001). Language and cognition: Constructing an alternative approach with the behavioral tradition. In S. C. Hayes, D. Barnes-Holmes, & B. Roche (Eds.), Relational frame theory: A Post-Skinnerian account of human language and cognition (pp. 3–20). Cham, Switzerland: Springer.
  42. Higgins JP, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, et al. The Cochrane Collaboration's tool for assessing risk of bias in randomized trials. BMJ. 2011;343:d5928–d5928. doi: 10.1136/bmj.d5928. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Hojem MA, Ottenbacher KJ. Empirical investigation of visual-inspection versus trend-line analysis of single-subject data. Physical Therapy. 1988;68(6):983–988. doi: 10.1093/ptj/68.6.983. [DOI] [PubMed] [Google Scholar]
  44. Horner RH, Carr EG, Halle J, McGee G, Odom S, Wolery M. The use of single subject design research to identify evidence-based practices in special education. Exceptional Children. 2005;71:165–179. [Google Scholar]
  45. Ioannidis, J. P. (2016). The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. The Milbank Quarterly, 94(3), 485–514. [DOI] [PMC free article] [PubMed]
  46. Johnston JM, Pennypacker HS. Strategies and tactics of behavioral research. 3. New York, NY: Routledge; 2009. [Google Scholar]
  47. Kahng S, Hausman NL, Fisher AB, Donaldson JM, Cox JR, Lugo M, Wiskow KM. The safety of functional analyses of self-injurious behavior. Journal of Applied Behavior Analysis. 2015;48(1):107–114. doi: 10.1002/jaba.168. [DOI] [PubMed] [Google Scholar]
  48. Kazdin AE. Single-case research designs: methods for clinical and applied settings. 2. New York, NY: Oxford University Press; 2011. [Google Scholar]
  49. Kennedy CH. Single-case designs for educational research. Boston, MA: Pearson; 2005. [Google Scholar]
  50. Killeen PR. Predict, control, and replicate to understand: How statistics can foster the fundamental goals of science. Perspectives on Behavior Science. 2019;42:109–132. doi: 10.1007/s40614-018-0171-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. King S, Davidson K, Chitiyo A, Apple D. Evaluating article search and selection procedures in special education literature reviews. Remedial & Special Education. 2020;41:3–17. [Google Scholar]
  52. Kollins SH, Newland MC, Critchfield TS. Human sensitivity to reinforcement in operant choice: How much do consequences matter? Psychonomic Bulletin & Review. 1997;4(2):208–220. doi: 10.3758/BF03209395. [DOI] [PubMed] [Google Scholar]
  53. Kollins SH, Newland MC, Critchfield TS. Quantitative integration of single-subject studies: Methods and misinterpretations. Perspectives on Behavior Science. 1999;22:149–157. doi: 10.1007/BF03391992. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Kostewicz D, King S, Datchuk S, Brennan K, Casey S. Data Collection and Measurement Assessment in Behavioral Research. Behavior Analysis: Research. Practice. 2016;16(1):19–33. [Google Scholar]
  55. Kratochwill TR, Levin JR, Horner RH. Negative results: Conceptual and methodological dimensions in single- case intervention research. Remedial & Special Education. 2018;39:67–76. [Google Scholar]
  56. Kubina RM, Kostewicz DE, Brennan KM, King SA. A critical review of line graphs in behavior analytic journals. Educational Psychology Review. 2017;29(3):583–598. [Google Scholar]
  57. Kyonka EG, Mitchell SH, Bizo LA. Beyond inference by eye: Statistical and graphing practices in JEAB, 1992–2017. Journal of the Experimental Analysis of Behavior. 2019;111:155–165. doi: 10.1002/jeab.509. [DOI] [PubMed] [Google Scholar]
  58. Lanovaz MJ, Rapp JT. Using single-case experiments to support evidence-based decisions: How much is enough? Behavior Modification. 2016;40:377–395. doi: 10.1177/0145445515613584. [DOI] [PubMed] [Google Scholar]
  59. Lanovaz MJ, Turgeon S, Cardinal P, Wheatley TL. Using single-case designs in practical settings: Is within-subject replication always necessary? Perspectives on Behavior Science. 2019;42:153–162. doi: 10.1007/s40614-018-0138-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Laraway S, Snycerski S, Pradhan S, Huitema BE. An overview of scientific reproducibility: Consideration of relevant issues for behavior science/analysis. Perspectives on Behavior Science. 2019;42:33–57. doi: 10.1007/s40614-019-00193-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Lawrence DW. What is lost when searching only one literature database for articles relevant to injury prevention and safety promotion? Injury Prevention. 2008;14(6):401–404. doi: 10.1136/ip.2008.019430. [DOI] [PubMed] [Google Scholar]
  62. Ledford JR, King SA, Harbin ER, Zimmerman KN. Antecedent social skills interventions for individuals with ASD: What works, for whom, and under what conditions? Focus on Autism & Other Developmental Disabilities. 2018;33(1):3–13. [Google Scholar]
  63. Lemons CJ, King SA, Davidson KA, Berryessa TL, Gajjar SA, Sacks LH. An inadvertent concurrent replication: Same roadmap, different journey. Remedial & Special Education. 2016;37(4):213–222. [Google Scholar]
  64. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: Explanation and elaboration. Journal of Clinical Epidemiology. 2009;62:e1–e34. doi: 10.1016/j.jclinepi.2009.06.006. [DOI] [PubMed] [Google Scholar]
  65. Littell JH, Girvin H. Stages of change: A critique. Behavior Modification. 2002;26:223–273. doi: 10.1177/0145445502026002006. [DOI] [PubMed] [Google Scholar]
  66. Mace FC, Critchfield TS. Translational research in behavior analysis: Historical traditions and imperative for the future. Journal of the Experimental Analysis of Behavior. 2010;93:293–312. doi: 10.1901/jeab.2010.93-293. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Mackay HC, Barkham M, Rees A, Stiles WB. Appraisal of published reviews of research on psychotherapy and counseling with adults, 1990–1998. Journal of Consulting & Clinical Psychology. 2003;71(4):652. doi: 10.1037/0022-006x.71.4.652. [DOI] [PubMed] [Google Scholar]
  68. Maggin DM, Chafouleas SM, Goddard KM, Johnson AH. A systematic evaluation of token economies as a classroom management tool for students with challenging behavior. Journal of School Psychology. 2011;49(5):529–554. doi: 10.1016/j.jsp.2011.05.001. [DOI] [PubMed] [Google Scholar]
  69. Maggin DM, O'Keeffe BV, Johnson AH. A quantitative synthesis of methodology in the meta-analysis of single-subject research for students with disabilities: 1985–2009. Exceptionality. 2011;19:109–135. [Google Scholar]
  70. Maggin DM, Talbott E, Van Acker EY, Kumm S. Quality indicators for systematic reviews in behavioral disorders. Behavioral Disorders. 2017;42(2):52–64. [Google Scholar]
  71. Mahood Q, Van Eerd D, Irvin E. Searching for grey literature for systematic reviews: Challenges and benefits. Research Synthesis Methods. 2014;5:221–234. doi: 10.1002/jrsm.1106. [DOI] [PubMed] [Google Scholar]
  72. Maner JK. Let’s put our money where our mouth is if authors are to change their ways, reviewers (and editors) must change with them. Perspectives on Psychological Science. 2014;9(3):343–351. doi: 10.1177/1745691614528215. [DOI] [PubMed] [Google Scholar]
  73. Manolov R, Losada JL, Chacón-Moscoso S, Sanduvete-Chaves S. Analyzing two-phase single-case data with non-overlap and mean difference indices: illustration, software tools, and alternatives. Frontiers in Psychology. 2016;7:1–16. doi: 10.3389/fpsyg.2016.00032. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Manolov, R., & Vannest, K. J. (2019). A visual aid and objective rule encompassing the data features of visual analysis. Behavior Modification. 10.1177/0145445519854323. [DOI] [PubMed]
  75. Marr MJ. The future of behavior analysis: Foxes and hedgehogs revisited. Perspectives on Behavior Science. 2017;40(1):197–207. doi: 10.1007/s40614-017-0107-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Martin NT, Nosik MR, Carr JE. International publication trends in the journal of applied behavior analysis: 2000–2014. Journal of Applied Behavior Analysis. 2016;49(2):416–420. doi: 10.1002/jaba.279. [DOI] [PubMed] [Google Scholar]
  77. McSweeney FK, Swindell S. Women in the experimental analysis of behavior. Perspectives on Behavior Science. 1998;21(2):193–202. doi: 10.1007/BF03391963. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Miller FG, Lee DL. Do functional behavioral assessments improve intervention effectiveness for students diagnosed with ADHD? A single-subject meta-analysis. Journal of Behavioral Education. 2013;22:253–282. [Google Scholar]
  79. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomized controlled trials: The QUOROM statement. The Lancet. 1999;354(9193):1896–1900. doi: 10.1016/s0140-6736(99)04149-5. [DOI] [PubMed] [Google Scholar]
  80. Moher D, Tetzlaff J, Tricco AC, Sampson M, Altman DG. Epidemiology and reporting characteristics of systematic reviews. PLoS Medicine. 2007;4(3):e78. doi: 10.1371/journal.pmed.0040078. [DOI] [PMC free article] [PubMed] [Google Scholar]
  81. Moore TC, Maggin DM, Thompson KM, Gordon JR, Daniels S, Lang LE. Evidence review for teacher praise to improve students’ classroom behavior. Journal of Positive Behavior Interventions. 2019;21(1):3–18. [Google Scholar]
  82. Morris EK, Altus DE, Smith NG. A study in the founding of applied behavior analysis through its publications. Perspectives on Behavior Science. 2013;36(1):73–107. doi: 10.1007/BF03392293. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Nickerson R. Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods. 2000;5(2):241–301. doi: 10.1037/1082-989x.5.2.241. [DOI] [PubMed] [Google Scholar]
  84. Ninci J, Vannest KJ, Wilson V, Zhang N. Interrater agreement between visual analysts of single-case data: a meta-analysis. Behavior Modification. 2015;39:510–541. doi: 10.1177/0145445515581327. [DOI] [PubMed] [Google Scholar]
  85. Odom SL. The tie that binds: Evidence-based practice, implementation science, and outcomes for children. Topics in Early Childhood Special Education. 2009;29(1):53–61. [Google Scholar]
  86. Parker RI, Vannest JK, Davis JL. Effect size in single-case research: A review of nine nonoverlap techniques. Behavior Modification. 2011;35:303–322. doi: 10.1177/0145445511399147. [DOI] [PubMed] [Google Scholar]
  87. Pennypacker HS. Evidence reconsidered. European Journal of Behavior Analysis. 2012;13(1):83–86. [Google Scholar]
  88. Perone M. Statistical inference in behavior analysis: Experimental control is better. Perspectives on Behavior Science. 1999;22:109–116. doi: 10.1007/BF03391988. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Perone M. How I learned to stop worrying and love replication failures. Perspectives on Behavior Science. 2019;42(1):91–108. doi: 10.1007/s40614-018-0153-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Petticrew M. Time to rethink the systematic review catechism? Moving from “what works” to “what happens”. Systematic Reviews. 2015;1(4):1–6. doi: 10.1186/s13643-015-0027-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  91. Petticrew M, Roberts H. Systematic reviews in the social sciences: A practical guide. Malden, MA: Blackwell; 2008. [Google Scholar]
  92. Petursdottir AI, Carr JE. Applying the taxonomy of validity threats from mainstream research design to single-case experiments in applied behavior analysis. Behavior Analysis in Practice. 2018;11(3):228–240. doi: 10.1007/s40617-018-00294-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. Polanin JR, Tanner-Smith EE, Hennessy EA. Estimating the difference between published and unpublished effect sizes: A meta-review. Review of Educational Research. 2016;86:207–236. [Google Scholar]
  94. Pustejovsky, J. E. (2015). Effects of measurement operation on the magnitude of nonoverlap effect sizes for single-case experimental designs. Paper presented at the 2015 annual meeting of the American Educational Research Association. Chicago, Illinois; April 15-April 20.
  95. Pustejovsky, J. E., & Ferron, J. M. (2017). Research synthesis and meta-analysis of single-case designs. In J. M. Kaufmann, D. P. Hallahan, & P. C. Pullen (Eds.), Handbook of special education (2nd ed.). New York, NY: Routledge. pp. 168–186
  96. Salzberg CL, Strain PS, Baer DM. Meta-analysis for single-subject research: When does it clarify, when does it obscure? Remedial & Special Education. 1987;8:43–48.
  97. Sampson M, McGowan J, Cogo E, Grimshaw J, Moher D, Lefebvre C. An evidence-based practice guideline for the peer review of electronic search strategies. Journal of Clinical Epidemiology. 2009;62(9):944–952. doi: 10.1016/j.jclinepi.2008.10.012.
  98. Schlichenmeyer KJ, Roscoe EM, Rooker GW, Wheeler EE, Dube WV. Idiosyncratic variables that affect functional analysis outcomes: A review (2001–2010). Journal of Applied Behavior Analysis. 2013;46(1):339–348. doi: 10.1002/jaba.12.
  99. Scruggs TE, Mastropieri MA, Casto G. The quantitative synthesis of single-subject research: Methodology and validation. Remedial & Special Education. 1987;8:24–33.
  100. Scruggs TE, Mastropieri MA, Casto G. Response to Salzberg, Strain, and Baer. Remedial & Special Education. 1987;8:49–52.
  101. Scruggs TE, Mastropieri MA, Cook SB, Escobar C. Early intervention for children with conduct disorders: A quantitative synthesis of single-subject research. Behavioral Disorders. 1986;11:260–271.
  102. Seubert C, Fryling MJ, Wallace MD, Jiminez AR, Meier AE. Antecedent interventions for pediatric feeding problems. Journal of Applied Behavior Analysis. 2014;47:449–453. doi: 10.1002/jaba.117.
  103. Shadish WR, Hedges LV, Horner RH, Odom SL. The role of between-case effect size in conducting, interpreting, and summarizing single-case research (NCER 2015-002). Washington, DC: National Center for Education Research, Institute of Education Sciences, U.S. Department of Education; 2015. Retrieved from http://ies.ed.gov/
  104. Shadish WR, Zelinsky NA, Vevea JL, Kratochwill TR. A survey of publication practices of single-case design researchers when treatments have small or large effects. Journal of Applied Behavior Analysis. 2016;49:656–673. doi: 10.1002/jaba.308.
  105. Shahan TA. Conditioned reinforcement and response strength. Journal of the Experimental Analysis of Behavior. 2010;93(2):269–289. doi: 10.1901/jeab.2010.93-269.
  106. Sham E, Smith T. Publication bias in studies of an applied behavior-analytic intervention: An initial analysis. Journal of Applied Behavior Analysis. 2014;47(3):663–678. doi: 10.1002/jaba.146.
  107. Sharpe D. Of apples and oranges, file drawers and garbage: Why validity issues in meta-analysis will not go away. Clinical Psychology Review. 1997;17(8):881–901.
  108. Shea B, Dubé C, Moher D. Assessing the quality of reports of systematic reviews: The QUOROM statement compared to other tools. In: Egger M, Smith GD, Altman DG, editors. Systematic reviews in health care: Meta-analysis in context. 2nd ed. London, UK: BMJ Publishing; 2001. pp. 122–139.
  109. Sidman M. Tactics of scientific research: Evaluating experimental data in psychology. New York, NY: Basic Books; 1960.
  110. Siontis KC, Hernandez-Boussard T, Ioannidis JP. Overlapping meta-analyses on the same topic: Survey of published studies. BMJ. 2013;347:f4501. doi: 10.1136/bmj.f4501.
  111. Skinner BF. The behavior of organisms: An experimental analysis. New York, NY: Appleton-Century-Crofts; 1938.
  112. Skinner BF. A case history in scientific method. American Psychologist. 1956;11:221–233.
  113. Slavin RE. Best evidence synthesis: An intelligent alternative to meta-analysis. Journal of Clinical Epidemiology. 1995;48(1):9–18. doi: 10.1016/0895-4356(94)00097-a.
  114. Slocum TA, Detrich R, Wilczynski SM, Spencer TD, Lewis T, Wolfe K. The evidence-based practice of applied behavior analysis. Perspectives on Behavior Science. 2014;37(1):41–56. doi: 10.1007/s40614-014-0005-2.
  115. Talbott E, Maggin DM, Van Acker EY, Kumm S. Quality indicators for reviews of research in special education. Exceptionality. 2018;26(4):245–265.
  116. Tarlow KR. An improved rank correlation effect size statistic for single-case designs: Baseline corrected Tau. Behavior Modification. 2017;41(4):427–467. doi: 10.1177/0145445516676750.
  117. Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, et al. Data sharing by scientists: Practices and perceptions. PLoS ONE. 2011;6(6):e21101. doi: 10.1371/journal.pone.0021101.
  118. Therrien WJ, Mathews HM, Hirsch SE, Solis M. Progeny review: An alternative approach for examining the replication of intervention studies in special education. Remedial & Special Education. 2016;37:235–243. doi: 10.1177/0741932516646081.
  119. Thompson B, Diamond KE, McWilliam R, Snyder P, Snyder SW. Evaluating the quality of evidence from correlational research for evidence-based practice. Exceptional Children. 2005;71:181–194.
  120. Thorlund K, Druyts E, Aviña-Zubieta JA, Wu P, Mills EJ. Why the findings of published multiple treatment comparison meta-analyses of biologic treatments for rheumatoid arthritis are different: An overview of recurrent methodological shortcomings. Annals of the Rheumatic Diseases. 2013;72(9):1524–1535. doi: 10.1136/annrheumdis-2012-201574.
  121. Tincani M, Travers J. Replication research, publication bias, and applied behavior analysis. Perspectives on Behavior Science. 2019;42(1):59–75. doi: 10.1007/s40614-019-00191-5.
  122. Valentine JC, Cooper HM, Patall EA, Tyson D, Robinson JC. A method for evaluating research syntheses: The quality, conclusions, and consensus of 12 syntheses of the effects of after-school programs. Research Synthesis Methods. 2010;1(1):20–23. doi: 10.1002/jrsm.3.
  123. Vargas EA. “Separate disciplines” is another name for survival. Perspectives on Behavior Science. 1987;10(1):119–121. doi: 10.1007/BF03392420.
  124. Vollmer TR, Hagopian LP, Bailey JS, Dorsey MF, Hanley GP, Lennox D, et al. The Association for Behavior Analysis International position statement on restraint and seclusion. Perspectives on Behavior Science. 2011;34(1):103. doi: 10.1007/BF03392238.
  125. Waltman L. A review of the literature on citation impact indicators. Journal of Informetrics. 2016;10:365–391.
  126. Wang S, Parrila R, Cui Y. Meta-analysis of social skills interventions of single-case research for individuals with autism spectrum disorders: Results from three-level HLM. Journal of Autism & Developmental Disorders. 2013;43:1701–1716. doi: 10.1007/s10803-012-1726-2.
  127. Wang Q, Waltman L. Large-scale analysis of the accuracy of the journal classification systems of Web of Science and Scopus. Journal of Informetrics. 2016;10:347–364.
  128. Web of Science (WOS). 2016 journal citation reports® social sciences edition. Thomson Reuters; 2017. Retrieved from https://jcr.incites.thomsonreuters.com
  129. Weisz JR, Hawley KM. Procedural and coding manual for identification of beneficial treatments. Washington, DC: American Psychological Association, Society for Clinical Psychology, Division 12; 2002.
  130. Wendt O, Miller B. Quality appraisal of single-subject experimental designs: An overview and comparison of different appraisal tools. Education & Treatment of Children. 2012;35:235–268.
  131. What Works Clearinghouse. Standards handbook (Version 4.0). 2017. Retrieved from https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_standards_handbook_v4.pdf
  132. Wolery M, Busick M, Reichow B, Barton EE. Comparison of overlap methods for quantitatively synthesizing single-subject data. Journal of Special Education. 2010;44:18–28.
