Behavior Analysis in Practice. 2015 Oct 2;8(2):123–133. doi: 10.1007/s40617-015-0090-z

In Dreams Begin Responsibility: Why and How to Measure the Quality of Graduate Training in Applied Behavior Analysis

Thomas S. Critchfield

Abstract

Although no one knows just how effective graduate training may be in creating effective practitioners of applied behavior analysis, there are plenty of logical and historical reasons to think that not all practitioners are equally competent. I detail some of those reasons and explain why practitioner effectiveness may be a more pressing worry now than in the past. Because ineffective practitioners harm the profession, rigorous mechanisms are needed for evaluating graduate training programs in terms of the field effectiveness of their practitioners. Accountability of this nature, while difficult to arrange, would make applied behavior analysis nearly unique among professions, would complement existing quality control processes, and would help to protect the positive reputation and vigorous consumer demand that the profession currently enjoys.

Keywords: Applied behavior analysis, Graduate training, Value added


B.F. Skinner liked to imagine that the power of behavioral analyses would bring to our discipline’s door a world hungry for solutions to difficult problems. Ever the empiricist, however, in his final years, Skinner showed signs of creeping discouragement over the prospects for this outcome (e.g., Ulman 2014). By the time I entered behavior analysis in the early 1980s, for instance, the available evidence was not encouraging. In applied behavior analysis (ABA), jobs were scarce and those that were available tended to be poorly compensated and situated in unpleasant environments. Not a lot of people wanted to be ABA practitioners, and not a lot of graduate programs emphasized this specialization.

As the reader well knows, times have changed. At least within the domain of autism services, demand for ABA practitioners has never been higher. Students clamor to gain entry into an ever-expanding roster of ABA graduate programs, and new ABA positions seem to arise faster than they can be filled (Shook and Favell 2008; Shook et al. 2002). In this dream-come-true scenario, a casual observer might dismiss Dixon et al. as party poopers for questioning the competence of some ABA practitioners, a concern that prompted them to advocate better monitoring of the quality of ABA graduate training (Dixon et al. 2015). For illustrative purposes, Dixon et al. focused on a particular aspect of graduate training, the amount of exposure to research that ABA students can expect to receive. This elicited useful comments about the desirability of practitioner research training1 and about ways to properly assess the research climate in graduate programs (present journal issue). Largely unaddressed in the resulting discussion, however, was this central apprehension: Is there really, as Dixon et al. suggested, reason to worry about the product that our graduate programs are sending into the human services marketplace?

The answer to this question is precisely what Dixon et al. (2015) indicated: Nobody knows for sure. Without systematic monitoring of practitioner proficiency, there is no way to determine how well ABA graduate training succeeds in generating field competence. We do, however, know a couple of things that are consistent with the notion that not all ABA practitioners are equally skilled. First, we know that concerns about the proficiency of at least some practitioners trace back to the profession’s early days (e.g., Deitz 1978; Hayes 1978; Hayes et al. 1980; Michael 1980). Second, based on casual observation, we know that ABA graduate programs have always varied in their curricular emphases, pedagogical processes, and field supervision practices, and some training models, presumably, work better than others. It would not be surprising, therefore, if throughout the profession’s history training programs have produced an “effectiveness distribution” among graduates, similar to that shown in Fig. 1a, in which at least some ABA practitioners lack the skills to reliably effect positive behavior change. The practitioner certification process has helped to bring this possibility into clearer focus because the Behavior Analyst Certification Board (BACB) publishes information on the results of certification examinations. For example, in February 2015, roughly 39 % of individuals failed to pass their Master’s level certification exam on a first attempt (retrieved July 14, 2015, from http://www.bacb.com/index.php?page=67). These individuals – most of whom completed BACB-approved course sequences, and some of whom must have graduated from ABA graduate training programs – could not correctly answer multiple-choice questions about basic principles and procedures involved in behavior change. Such data illustrate that graduate training does not always achieve its intended goals.

Fig. 1 Hypothetical distributions of practitioner effectiveness. See text for explanation

Regarding these data, a skeptic could point out that at least there is more accountability in practitioner preparation than in the past. Most notably, a practitioner certification process exists in the first place, and thus people who fail a certification exam cannot become board certified. For each graduate course sequence that BACB approves2 it publishes the percentage of applicants who pass the certification examination (see www.bcba.com), so in this key way it is more obvious than in the past when ABA training has failed to take hold. Additionally, since 1991, behavior analysis graduate programs have been able to pursue an accreditation process, sponsored by the Association for Behavior Analysis International (ABAI), which is “designed to encourage and support exemplary training of behavior scientists and scientist practitioners in the experimental and theoretical foundations of behavior analysis and in ethical and evidence-based practice” (retrieved July 19, 2015, from https://www.abainternational.org/accreditation.aspx; see also Hopkins 1991).

Unfortunately, neither board certification nor program accreditation provides much real insight into the practitioner competence that concerned Dixon et al. (2015). By design, board certification focuses on minimum, rather than optimum, educational standards (a point to which I will return later; see Shook 1993; Starin et al. 1993; Shook et al. 2004). Certification also hinges on an examination requiring only that multiple-choice questions be answered correctly. As Dixon et al. (2015) observed, this is a far cry from verifying that practitioners can competently design or implement interventions in field settings (see also Maher 1988). The ABAI program accreditation process applies somewhat more elaborate educational standards, but it is optional (most ABA programs do not participate), and it focuses primarily on program characteristics rather than the competence of program graduates.

The Problem of Quality Control in ABA Graduate Training

The only way to rise above speculation and be certain of how well ABA graduate training prepares practitioners to deliver ABA services is to systematically evaluate this outcome. Being experts in behavior, our community of behavior analysts can figure out how to do this, but being experts in measurement, we can anticipate that a rigorous system of evaluation will require effort and resources to implement, and thus is not something to undertake lightly. There is, then, a Catch-22 in the stance adopted by Dixon et al. (2015): Systematically evaluating graduate training seems most compelling when training quality is known to be inconsistent, but only data from a formal evaluation process can reveal unambiguously whether a problem exists with ABA training quality.

Some Reasons to Monitor Graduate Training Quality

Upon reflection, a variety of arguments can be advanced for pursuing better evaluation of graduate training quality even in the absence of hard data documenting a quality control problem. First and foremost is that measuring behavior is part and parcel of being a behavior analyst. We tell students that ABA is an empirical enterprise and that objectively measuring behavior is the only way to know whether an intervention works (e.g., Cooper et al. 2013). We do not tell students to take data only if doing so is convenient or only if they suspect that an intervention may be failing. Graduate training is a form of behavioral intervention, and there is no a priori reason why evaluation standards should be lower for graduate training than for client therapy.

A second reason to properly evaluate graduate training is that we know that ABA services are effective when implemented with integrity (e.g., Makrygianni and Reed 2010). This contrasts ABA positively with numerous quack and pseudoscientific therapies (e.g., Green 1996), and increases the odds that ABA practitioners actually can satisfy consumer needs if well trained. Yet poor treatment integrity can undermine even the best of therapies, and no practitioner can successfully implement an intervention that he or she does not understand in the first place (e.g., Detrich 1999). To state the problem in a different way, ABA done badly is, in fact, not really ABA, so there is a premium on making sure that ABA practitioners know what they are doing.

A third reason to properly evaluate graduate training is that, according to widely accepted ethical principles, ABA has one guiding purpose: to enhance client well-being (Bailey and Burch 2011). Although conventionally a practitioner’s “professional success” can mean many things (e.g., earning a good income or maintaining a balanced lifestyle despite the rigors of delivering services), within the ABA profession’s ethical framework a practitioner succeeds only when client welfare improves as a result of the interventions he or she implements. ABA graduate programs exist to train practitioners, and so, by extension, they succeed only to the extent that their graduates are professionally effective. Guided by its embrace of the so-called right to effective treatment, the ABA profession does not tolerate ineffective interventions (Bailey and Burch 2011), and it should not tolerate ineffective graduate training programs either.

A fourth reason to properly evaluate graduate training concerns societal responsibility. As Skinner (e.g., 1953) maintained in framing the mission of applied behavior analysis before that profession even existed, effective behavior change makes the world a better place. To illustrate, each untreated case of autism creates client discomfort, hardship for loved ones, and a lifetime drain on the economy that has been estimated at several million dollars (e.g., Jarbrink and Knapp 2009). Effective treatment mitigates these adverse effects, and good graduate training creates practitioners who are capable of delivering effective treatment. Of course, the opposite also is true. Ineffective treatments maintain or exacerbate harmful behaviors (Lilienfeld 2002), and poor graduate training creates practitioners who are likely to be ineffective. Analyses conducted in other disciplines suggest that the very worst practitioners – to invoke a round number, perhaps the least skilled 10 % or so – cause most of this damage (e.g., Chetty et al. 2011; Hanushek 2011). Given a distribution of levels of practitioner expertise (Fig. 1a), a profession like ABA should be driven to eliminate the leftmost tail (Fig. 1b). Identifying the least effective graduate programs would be an important part of this process.

A fifth reason to properly evaluate graduate training is that ineffective professionals harm not just clients, but also the profession. Much of today’s demand for ABA services was mobilized by word of mouth from satisfied consumers (e.g., Maurice 1993), and what word of mouth gives it can just as easily take away. If enough consumers have unsatisfactory experiences, their stories will circulate and suppress future demand for services (e.g., see Lilienfeld 2002). In an era of social media and other electronic communication, that circulation is rapid and pervasive, and marketing research shows that demand is suppressed more by negative reviews than it is enhanced by positive ones (e.g., Chevalier and Mayzlin 2006). Practitioners in the leftmost tail in ABA’s distribution of expertise (Fig. 1a) are those most likely to garner negative reviews. Eliminating this tail (Fig. 1b) thus may be an imperative of disciplinary survival.

The preceding frames a final reason to properly evaluate graduate training, even under uncertainty that a quality control problem exists: According to a common-sense risk analysis, it is prudent for the ABA profession to err on the side of caution. As Fig. 2 illustrates, enhanced measures of graduate training quality are unlikely to harm the profession, but they might help. If the profession has no quality control problem, then it matters little whether the quality of graduate training is measured (Fig. 2a) or not (Fig. 2b), except that in the former case, unlike currently, we would know for certain that practitioner training is consistently good and could herald this desirable state of affairs to consumers in ways that few professions are able (Maher 1988). If, however, a quality control problem exists—and this was a central argument advanced by Dixon et al. (2015)—failing to properly acknowledge it could cause disastrous erosion of consumer demand for ABA services. Rigorous evaluation of graduate training quality would make clear the existence of any such problem and would provide ineffective training programs with a clear mandate to change or go out of business (resulting in an improved distribution of expertise; see Fig. 2d) as students seek out more effective programs and consumers of ABA services choose to hire practitioners with better training than those programs provide.

Fig. 2 Risks inherent in pursuing, versus not pursuing, the systematic evaluation of graduate training quality. See text for explanation

Here it is important to consider whether the erosion of demand caused by ineffective practitioners is likely to be “industry wide” or “producer specific.” An example may help to define the critical issues. In the early days of the microcomputer industry, many manufacturers produced computers (Freiberger and Swaine 2000). Some of them did so more successfully than others. Think of this as similar to today’s ABA marketplace in which some graduate training programs produce better practitioners than others. In 1982, I had the misfortune of purchasing an Apple II clone that performed quite erratically. Enough consumers had similar experiences that this manufacturer was out of the microcomputer business within a few years. In this instance, poor manufacturing did not cause an industry-wide erosion of demand. Rather than abandoning microcomputers, people merely stopped buying this particular brand. By analogy, would not consumers hold selected practitioners and graduate programs responsible for poor performance, but not the profession of ABA generally?

A major difference between the early microcomputer market and the contemporary market for ABA services concerns the availability of alternatives. A computer is the only viable tool for doing much of what computers are for, so a consumer who abandons one brand of computer may choose another brand but is unlikely to stop computing entirely. The match between tool and tasks is so exclusive that no one will try to sell you a Ninja® blender as a substitute for a computer (and if they did, such an apples-instead-of-oranges offer would tempt no consumer). By contrast, consumers who seek out ABA services encounter a dizzying array of competing “products” like

… Education, Auditory Integration Training, various drugs, vitamins, and other ‘natural’ substances, imitation therapy …, Facilitated Communication, Sensory Integration Therapy, music therapy, Gentle Teaching, special diets …, patterning, deep pressure therapy, dolphin therapy, rhythmic entrainment …, and more. (Green 1996, p. 15)

Consumers who become dissatisfied with an ABA provider may perceive that these alternative therapies offer a different way to achieve the same benefits. Never mind that ABA interventions are rooted in science whereas the basis for many alternative therapies ranges from unclear to ludicrous. Never mind that, as Green (1996) and other critics of quack therapies have suggested, such therapies are a worse alternative to ABA than a blender is to a computer (blenders do not compute, but they do something useful). Consumers, who may be uninformed about research support for ABA services and influenced by unscrupulous promotion of alternative therapies (e.g., Green 1996; Maurice 1993), tend to be fickle “therapy hoppers” who happily swap apples for oranges (Shute 2010). It may be too much to hope that consumers will spontaneously distinguish between better and worse ABA services in a different way than they distinguish between ABA and other kinds of therapies. The safe money, therefore, says that ineffective services harm the ABA profession generally, not just the specific practitioners and training programs that are responsible for those services.

Reasons to Suspect Recent Changes in Graduate Training Quality

In addition to long-standing, generic reasons to care about the consistency of ABA graduate training, there could be newer and more specific concerns. Behavior reflects the environments that shape it, and most existing ABA graduate programs arose rather recently in a socioeconomic context of intense consumer demand for program graduates. According to the “law of supply” in microeconomics, when demand exists supply will arise to meet it (Mas-Colell et al. 1995). In the 1980s, for instance, as consumer demand for microcomputers took off, numerous companies began producing them (Freiberger and Swaine 2000). There is no reason to think the law of supply excludes graduate training, with graduate programs functioning as “suppliers” who deliver the “product” of new practitioners to market.3 Consumer demand for practitioners is why so many ABA graduate programs exist in the first place.

Although the law of supply predicts that supply will arise to meet demand, it is largely silent about how this should occur. Behavior principles may offer some insights into how supply is mobilized under conditions like those our field has recently experienced. Although many strategies may exist for increasing supply, in a multi-operant environment, behaviors usually predominate that are reinforced frequently and with a minimum of effort and delay to reinforcement. This maxim inspires two predictions about supply behaviors under conditions of unmet demand.

The first prediction is for design rigidity, by which I mean limited attention to creating better products. When demand exceeds supply, better products are not needed because consumers are satisfied with existing designs. As a result, for instance, during the early explosion of demand for microcomputers, most manufacturers used existing circuitry and text-based software interfaces (e.g., “Disk Operating System,” or DOS). We know today that better hardware and software are possible, desirable, and profitable … but this is irrelevant to the historical contingencies. Low-capacity, DOS-based computers were good enough for newly tech-hungry consumers, and so that is what manufacturers made.

Applying this design rigidity assumption to the ABA profession, we may suppose that most of the recently created training programs were designed to reflect the curricular emphases, pedagogical processes, and field supervision practices that were familiar to their founding faculty.4 It seems likely, therefore, that whatever variability in practitioner competence existed historically in the ABA profession (Fig. 1a) has been more or less replicated in the current boom times. Whether this affects the profession’s risk of reputational harm is not known. If consumer demand is a function of the relative proportion of practitioners who are ineffective, then the profession may be no worse off than in the past. But perhaps what matters is the raw number of ineffective practitioners. The more practitioners who are trained using existing training methods, the larger the pool of ineffective practitioners, and thus the higher the raw frequency of unsatisfactory consumer experiences that can be shared with other potential consumers.

The second prediction is for production innovation, by which I mean improved efficiency in making already-designed products in quantity. With robust demand comes considerable incentive to build existing products more quickly and cheaply. Examples of improved production efficiency include Henry Ford’s advances in assembly line technology to speed the manufacturing of automobiles and, in the more recent history of US industry, the outsourcing of manufacturing to low-wage countries to reduce the cost of production (e.g., by 2004, Apple Computer had moved all of its manufacturing outside the USA; Lo 2011).

Production efficiency is a clear focus in contemporary ABA graduate training. Consider the current emphasis on preparing practitioners at the Master’s, rather than doctoral, level. Many reasons exist for this emphasis, but one consequence of it is obvious and critical to rapidly building a community of practitioners. Master’s training, which typically lasts for 2 years, produces more graduates per unit time than doctoral training, which typically consumes 4 or more years, and thus is a better way to build capacity in a growing young profession (Shook 1993). Other possible production-enhancement strategies that may be seen in current ABA training programs include delivering instruction online rather than in person, accepting large classes of new graduate students, “outsourcing” significant portions of program teaching and supervision to inexpensive adjunct faculty, and, as Dixon et al. (2015) suggested, de-emphasizing costly and time-consuming aspects of training such as immersing students in the research process.

In terms of the resulting quality of practitioner skill, we simply do not know how these approaches compare to those of traditional programs (e.g., small entering classes, in-person instruction by permanent faculty, in the context of doctoral training). The generic concern about production innovations, however, is that they create compromises in product design and/or quality. For example, in recent years intense consumer demand for fish led to drastic expansion of aquaculture facilities in China, some of which, in an effort to operate more cheaply, ignored food safety practices that are commonly employed in the USA (Reinberg 2007). In another possible example, the majority of hardware problems in Apple’s iPhone® apparently involve components not being connected properly during overseas manufacturing (Satariano 2014). Behavior analysts might refer to these examples as problems of implementation integrity. By extension, even a well-designed graduate program might fail to train competent practitioners if it relies on flawed “production” processes, and the absence of reliable metrics of program quality calls attention to a variety of gut-level fears about production efficiencies, such as: Master’s programs offer only about half as much training as doctoral programs! Online instruction lacks rigor and nuance! Adjunct faculty are less qualified than permanent ones! Increasing the number of students in a program reduces the amount of attention that each student can receive! Practitioners who do not understand research cannot keep abreast of developments in the field! If any of these factors really compromise practitioner training, then the expertise generated by ABA graduate programs could be shifting toward the lower end of its distribution (Fig. 1b), which would increase the odds of consumer dissatisfaction and thereby promote the erosion of demand for ABA services.

To summarize the present section, the problem with having something wonderful like consumer demand is that losing it becomes possible. Because nobody wants ABA to return to its threadbare past, we absolutely should worry about whether today’s practitioners are up to the weighty responsibilities with which society has charged them. If enough of them are not, consumers could turn their backs on ABA, and subsequent generations of applied behavior analysts could receive no second chance to prove the profession’s value. In addressing this concern, the ABA professional community can choose between two imperfect options. One is to hope that naturally unfolding market forces gradually weed out ineffective practitioners and low-quality graduate training programs. This approach seems dangerously passive5 given the risk that market forces could instead destroy the profession’s good name and human services market share. The other option is to actively devise a system whereby the quality of ABA practitioners and graduate programs is, as Dixon et al. (2015) advocated, made transparently and publicly evident. This approach, while effortful, is consistent with behavior analysts’ strengths in evaluating behavior in terms of functional relations (see the following section) and has the potential to protect the human services market share currently enjoyed by the ABA profession.

Functional Relations of Effective Graduate Training

In referring to ABA training effectiveness, I have mainly invoked the clinical outcomes created by program graduates because, as observers of credentialing processes have noted, whatever behavioral repertoire a graduate program creates is tested most directly in the crucible of clinical practice (e.g., Maher 1988). Like all behavior, practitioner behavior that determines client progress is best understood as embedded in functional relations. Behavior analysts know that to evaluate a functional relation requires simultaneous attention to independent variables and dependent variables (here, features of the graduate training experience and of program graduates, respectively). Unfortunately, most existing approaches to graduate program evaluation have shortcomings as analyses of meaningful functional relations.

Regarding independent variables, in an accreditation process like ABAI’s or a ranking system like that used by US News & World Report, programs may be favored that offer certain kinds of educational experiences (e.g., course topics or research apprenticeships) or provide students with certain kinds of resources (e.g., financial support). These are independent variables of uncertain bearing on subsequent practitioner skill (e.g., Maher 1988). During my Master’s training in Educational Psychology, I learned about philosophy-driven instructional systems that are considered by their advocates to be exemplary because their features are inspired by preferred theories of learning and child development (while actual impact on student achievement often seems to be of little interest; see Watkins 1987). Many approaches to evaluating graduate training seem similar in that they focus on topographical aspects of training programs but not on whether programs change students in the most important ways.

Regarding dependent variables, some evaluation systems track student outcomes, but often not outcomes that are clearly related to subsequent professional competence. For example, the US News ranking system gives credit for a program’s academic reputation as measured through surveys of high school guidance counselors and officials of other universities (see http://www.usnews.com/education/best-colleges). Program reputation might be influenced by the field expertise of program graduates, but it also could reflect, to a large extent, extraneous variables such as how well a program markets itself or the prestige of its host institution (in the latter case, an ABA program at the University of Chicago might receive more of a halo effect than one at Illinois State University). BACB data on certification exam pass rates also constitute a rather indirect type of dependent variable. Certification exam score might correlate with field competence but could also be strongly influenced by such factors as general academic talent, and thus, a program with a high pass rate might simply be recruiting talented test takers (Dixon et al. 2015). Overall, the BACB certification process verifies that an individual has graduated from a Master’s program, passed course work on selected topics, endured a required number of clinical hours, and earned a passing score on the certification examination (http://www.bcba.com). If data exist to verify that these factors correlate with ABA field competence, I am unaware of them.6

To be fair to the ABA profession and bodies like BACB and ABAI that serve it, most applied fields do not directly and systematically evaluate the clinical effectiveness of their practitioners (Maher 1988). For the profession of ABA to do better will require devising an appropriate measurement system through which practitioner effectiveness can be quantified and compared, and there is good and bad news regarding this challenge. The good news is that ABA practitioners typically are taught how to quantify behavior through behavioral assessment and to measure behavior change through single-subject experimental designs (e.g., Cooper et al. 2013). Theoretically, then, a large proportion of the ABA practitioner population may already be collecting data on how well clients are progressing.

One piece of bad news is that typical clinical case data might be less informative than first imagined, for three reasons. First, confidentiality rules and other factors prevent clinical case records from being publicly circulated, so no obvious avenue exists to access the relevant data. Second, clinical case records might provide a biased sample of practitioner expertise, because one probable characteristic of weak ABA practitioners is a failure to engage in data-based case management and to maintain good clinical records. Thus, even if we could access all available case data, we might see only the performance of more-competent practitioners. Third, in behavioral assessment, client behaviors are defined to fit the particulars of individual cases, and no two clients may have precisely the same target behaviors. Moreover, no two practitioners might operationally define the same target behavior in exactly the same way. Thus, it would be difficult to compare case records across cases and practitioners. Some post-hoc standardization is possible using effect-size statistics (Parker and Hagan-Burke 2007), but of course, this does not address the problem of obtaining useful data for all practitioners.
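To make the post hoc standardization idea concrete, below is a minimal sketch, in Python, of one simple nonoverlap statistic (the percentage of nonoverlapping data) of the general sort covered in the single-case effect-size literature that Parker and Hagan-Burke (2007) represent. The function name and all data values are invented for illustration; this is not a recommendation of any particular statistic.

# Illustration only: a simple nonoverlap effect size (percentage of
# nonoverlapping data) places differently defined target behaviors on a
# common 0-100 scale. All values are hypothetical.

def percent_nonoverlapping_data(baseline, treatment, therapeutic_direction="decrease"):
    """Percentage of treatment observations falling beyond the most extreme baseline point."""
    if therapeutic_direction == "decrease":
        best_baseline = min(baseline)
        nonoverlapping = [x for x in treatment if x < best_baseline]
    else:
        best_baseline = max(baseline)
        nonoverlapping = [x for x in treatment if x > best_baseline]
    return 100.0 * len(nonoverlapping) / len(treatment)

# Client A's target behavior is recorded as responses per hour, client B's as
# percentage of intervals; the statistic nonetheless yields comparable values.
client_a = percent_nonoverlapping_data([12, 9, 14, 11], [6, 4, 3, 5, 2])
client_b = percent_nonoverlapping_data([55, 60, 58], [35, 40, 56, 30])
print(client_a, client_b)  # 100.0 and 75.0

A statistic of this kind standardizes the scale of comparison across cases but, as noted above, does nothing to solve the access and sampling problems.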

An obvious solution to the access problem would be to implement a profession-wide system for measuring client progress, although few professions have attempted anything like this. One familiar approximation is the use, in the US public education system, of standardized tests for assessing student achievement. Although there are pronounced concerns about the validity of these tests, the general approach of testing most students in most schools provides a uniform way to compare the progress of individual students, and thereby, a means of relating teacher behaviors to student gains.

If the ABA profession had a uniform means of tracking client progress, it would be possible to link client outcomes to the practitioners who work with them. Those outcomes could then be linked to the graduate programs that trained the practitioners (Fig. 3). The functional relations thus described, uniting the practices of specific training programs and the developmental trajectories of clients who are served by program graduates, would speak directly to the values and goals of the ABA profession. Unlike the measures typically employed in graduate program evaluations – things like faculty accomplishments (Dixon et al. 2015) or program resources (as per the US News ranking system) – these functional relations would tell consumers exactly what they want to know about practitioners and graduate programs. They would also directly document the social impact of the ABA profession to consumers and policy makers in ways that typical ABA research does not.

Fig. 3 The general logic of value-added analyses. The progress of individual clients (left) is quantified and related to the practitioners (middle) who worked with them. The aggregate results for practitioners are related to the programs that trained them (right). Training programs thus are evaluated in terms of the aggregate field effectiveness of their graduates. See text for additional explanation

To justify the last point, it is important to distinguish efficacy evidence, which indicates whether a therapeutic approach can work when the contextual stars are properly aligned, from effectiveness evidence, which indicates whether this approach works under normative field conditions (Schoenwald and Hoagwood 2001). ABA is empirically supported but much of the relevant research examines efficacy (i.e., it takes place in fairly well-controlled training clinics, employs well-supervised staff, and taps into a variety of university resources). Field effectiveness, by contrast, is assessed in uncontrolled settings incorporating clients who exhibit varied problems; therapists who have not been screened for exemplary skill; and treatment regimens that are therapist- and consumer-directed rather than, as in efficacy research, strictly regimented (Fixsen et al. 2005; Strosahl et al. 1998). Too little is known about how well ABA interventions that have been vetted under favorable conditions actually serve consumers in the everyday trenches (e.g., see Fixsen et al. 2005). A system for evaluating graduate programs thus could also serve as a sort of omnibus effectiveness evaluation for the entire ABA profession.

Field effectiveness evidence tends to focus on general functioning rather than the discrete behavioral symptoms (Strosahl et al. 1998) that tend to be monitored in ABA practice and research. A familiar example will illustrate the distinction. In his pioneering research on early intensive autism intervention, Lovaas (1987) reported the highest level of school placement achieved by each child participant. Although “school placement” is not child behavior, this outcome likely is influenced by a host of specific child behaviors, unfolding collectively and over time (no child is likely to be placed in a “normal” classroom without exhibiting many academically appropriate and socially acceptable behaviors). Moreover, consumers who may not discern momentary changes in, say, rates of escape-maintained disruptive behavior will instantly know whether a child’s school placement is age-typical and desirable. Social validity and correlation with clinically important behaviors thus make general functioning variables useful—perhaps the most useful—components of effectiveness evidence.

Behavior analysts seeking standardized, general-functioning measures will find no perfect options of which I am aware. Behavior analysts themselves have created a few standardized, observation-based assessment systems, such as the Verbal Behavior Milestones Assessment and Placement Program (Sundberg 2008) and the PEAK Relational Training System (e.g., Dixon et al. 2014), but these may not describe general functioning in the same globally informative way as, say, school placement. More widely employed instruments such as the Vineland Adaptive Behavior Scales (Sparrow et al. 2005) and the Mullen Scales of Early Learning (Mullen 1995) may provide a clearer focus on general functioning. They also tend to offer certain procedural and interpretative advantages, such as strictly manualized administration methods and well-established psychometric properties, including richly detailed outcome norms. However, such instruments also tend to employ measurement techniques (e.g., rating scales rather than direct observation) that make many behavior analysts uncomfortable.

Let us imagine, however, that the ABA profession settles on an acceptable clinical outcome measure that it somehow manages to implement profession wide. Still required to map the functional relations uniting graduate training and practitioner effectiveness would be a quantitative model that can aggregate the outcomes from (a) multiple clients who were served by the same practitioner and (b) multiple practitioners who were trained by the same graduate program (e.g., Fig. 3). The value-added approach to evaluating schoolteacher effectiveness (Boyd et al. 2008) may offer an example of how to approach this problem. The approach assumes that students will change over time, and asks what proportion of student change can be attributed to a given teacher. Specifically, how do teacher A’s students perform compared to those of a hypothetical average teacher? Because schools have a standardized method of tracking student achievement, the mean test scores of all students in a locality (e.g., district or state) can define the “average teacher’s” contribution to student progress. The term “value added” implies the amount of student gain that is created beyond what could be expected given “teaching as usual.”
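To fix the core arithmetic in mind, here is a minimal sketch with invented numbers: a practitioner’s “value added” is the mean gain of his or her clients relative to the mean gain across all clients in the locality. All names and scores are hypothetical.

# Illustration only: the core value-added comparison, using invented data.
# Each tuple is (pre_score, post_score) on some uniform outcome measure.

from statistics import mean

all_clients = [(40, 55), (38, 44), (52, 60), (45, 47), (50, 66), (41, 49)]
practitioner_a_clients = [(40, 55), (50, 66)]

def mean_gain(clients):
    return mean(post - pre for pre, post in clients)

locality_norm = mean_gain(all_clients)             # the "average practitioner's" contribution
value_added_a = mean_gain(practitioner_a_clients) - locality_norm
print(round(locality_norm, 2), round(value_added_a, 2))  # 9.17 and 6.33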

Mathematically, it is a fairly simple matter to determine the extent to which a given teacher’s actual student outcomes depart from objective norms. The comparison becomes trickier, however, if one takes into account variables besides teacher behaviors that also correlate with student outcomes, including historical factors (e.g., effectiveness of a child’s previous teachers), stable child factors (e.g., IQ and socioeconomic status), and current environmental factors (e.g., school funding and levels of parental involvement in school activities). These factors can vary across children, classrooms, and localities. Therefore, value-added models employ multivariate statistical methods to determine the extent to which a given teacher’s student outcomes both diverge from local norms and are independent of various contextual factors that teachers do not control (Boyd et al. 2008).7

Such a complex endeavor incorporates many pitfalls (e.g., Ballou and Springer 2015), and the value-added approach to teacher evaluation is best described as an emerging technology that requires considerable further development (Noell and Burns 2006). In broad strokes, however, value-added methods are an attempt at something that the ABA profession has not to my knowledge discussed: systematically quantifying the relation between individual practitioners (i.e., teachers) and the general functioning of their clients (i.e., students). If we imagine that such a thing is possible on a profession-wide scale, then it is but a small additional step to include practitioner graduate program affiliation as a predictor variable in the multivariate models (Noell and Burns 2006). If a program provides superior training, then, compared to other programs and adjusting for a variety of contextual factors, its graduates should be associated with statistically verified superior client outcomes. Consumers of ABA graduate programs, students and employers alike, should find this kind of information useful, and the community of ABA graduate programs would profit from knowing which among them are the empirically determined “best practices” programs.
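As a rough sketch of how program affiliation might enter such a model, the hypothetical regression below treats a practitioner’s graduate program as a categorical predictor of client gain while adjusting for a few contextual variables. The data file, column names, and covariates are invented for illustration, and actual value-added models are typically multilevel and far more carefully specified (Boyd et al. 2008; Noell and Burns 2006).

# Hypothetical illustration: client gain regressed on graduate program
# affiliation plus contextual factors that the practitioner does not control.
# The file and column names are invented; real models are more elaborate.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("client_outcomes.csv")              # one row per client (hypothetical file)
df["gain"] = df["post_score"] - df["pre_score"]

# C(program) enters program affiliation as a categorical predictor; the other
# terms adjust for variables outside the practitioner's control.
model = smf.ols(
    "gain ~ C(program) + baseline_severity + age + parent_involvement",
    data=df,
).fit()
print(model.summary())   # program coefficients estimate adjusted program effects

In the same spirit, specific features of training programs could be entered as additional predictors, a possibility taken up in the next paragraph.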

Determining which graduate programs produce the most effective practitioners is not, of course, the same as understanding how they do this. Graduate programs are intervention packages consisting of many components, some of which may be critical to graduates’ field effectiveness and some of which may not. To disentangle the influences, specific features of graduate programs can be included as predictor variables in value-added analyses. For instance, value-added research in teacher education reveals superior outcomes for students whose teachers were taught not just about educational interventions but also how to implement them. Thus, the value-added approach is entirely compatible with a careful analysis of the independent variables of graduate training.

It should be obvious that sizeable hurdles would need to be overcome in order to create an ABA effectiveness monitoring system, but this could, I am confident, be accomplished. Not many decades ago, I would have labeled as delusional anyone who claimed it possible to create a certification system that large numbers of practitioners would seek out (and pay for), that employers would covet, and around which numerous universities would tailor degree programs. Somehow, all of this came to pass, and I would not bet against the ABA profession a second time. If the ABA profession can properly evaluate practitioners and training programs, and if there exist compelling reasons why it should, then the only remaining question is: What are we waiting for? The consequences of inadequate professional quality control do not become less pronounced over time.

Effectiveness Monitoring Does Not Replace Certification

In closing, it is important to stress that a system of effectiveness monitoring for practitioners and graduate programs should complement, not compete with, existing mechanisms of professional quality control such as practitioner certification and graduate program accreditation. In the service of brevity, to defend this claim I will focus on certification because it is more widely implemented than accreditation. The primary motivation for establishing an effectiveness monitoring system—that the costs of inaction could include erosion of public demand for ABA services—also was invoked in support of certification (Shook 1993), so it is clear that goals of the two initiatives are broadly aligned. Effectiveness monitoring and certification would not be redundant, however, as an examination of the precepts of certification will reveal.

Credentials like ABA certification become valuable to practitioners when tied to right-to-practice rules (e.g., only people with the credential may provide services) and fee-for-service arrangements (e.g., only people with certain qualifications may bill insurance for services). In order to persuade policy makers that their practitioners are deserving of these privileges and protections, professions typically enforce some degree of consistency in practitioner qualifications (Haskell 1977). A critical function of professional qualifying standards, therefore, is gatekeeping, or excluding non-credentialed individuals (Maher 1988; Shook et al. 1988). These standards represent a difficult balancing act. On the one hand, they must be stringent enough to establish professional credibility. On the other hand, if they are too stringent, there will not exist enough practitioners to satisfy consumer demand or to justify the profession’s policy-granted privileges. For this reason, credentialing tends to enforce minimum rather than optimum qualifications (e.g., Maher 1988; Shook 1993; Starin et al. 1993). Across many professions, the norm is to require prospective professionals to meet knowledge-based requirements such as completing educational programs and passing examinations, but the knowledge of interest is at most prerequisite to, rather than emblematic of, practical expertise. In ABA, for instance, knowing about interventions is not the same as being able to implement them (e.g., see Shook et al. 2004).

It is often said that practitioner certification protects consumers (e.g., Shook 1993; Shook et al. 1988), but the extent to which this is true probably varies with the stringency, and specificity to practice, of qualifying standards. If a profession applies only minimum standards in deciding who will become credentialed, it provides only minimum protection to consumers (e.g., Gilley and Galbraith 1986; Haskell 1977). In part because most credentialing processes do not directly evaluate competence in the field (Maher 1988), critics have bemoaned a tendency for professions to place less emphasis on quality control than on asserting marketplace dominance. The author of an extensive survey of credentialing in many professions offered this pessimistic assessment of certification procedures:

Their stated functions, those that primarily justify their existence, are governance in maintaining standards and discipline, and the protection of the public welfare. … [T]he degree to which these functions are carried out is, at best, uncertain. More certain are the secondary consequences of practices, including the control of the supply of workers, which affect the wages of the licensed and certified, the enhancement of the status of an occupation or profession, and the prestige of practitioners. There is more in licensing and certification that clearly benefits the professions and professionals, therefore, than there is that clearly benefits the consumer. (Klemp 1980, cited in Maher 1988, p. 417).

The preceding may sound incriminating, but I prefer the following upbeat appraisal: Certification is pretty good at what it is for. There is nothing cynical in celebrating the fact that certification has helped to create occupational opportunities in ABA that were unimaginable a few decades ago. After all, no matter how powerful and effective ABA may be when properly employed, it cannot make the world a better place without an army of practitioners who are gainfully employed to deliver ABA services. From this perspective, it is a happy side effect that certification has created qualifying standards where once there were none, even if these standards exert only limited control over practitioner competence. Should greater assurance be needed that ABA practitioners (and the programs that train them) are effective, it is within the purview and skill set of the ABA community to make obtaining this assurance a priority. Proponents of certification have long acknowledged that complementary steps might be required to create exemplary practitioner competence (e.g., see Moore and Shook 2001; Shook et al. 2002; Starin et al. 1993). Now that ABA professionals enjoy the benefits of certification, it is a logical step in the profession’s development to supplement whatever quality control the certification process can provide. Doing so will create more substantive consumer protections than does certification and, like certification, probably will be good for business. Moreover, profession-wide monitoring of practitioner effectiveness, as would be required for rigorous evaluation of graduate programs, could eventually inform the evolution of certification standards by making the experiences necessary to becoming an effective practitioner a matter of empirical record rather than of speculation.

Footnotes

1

Given our discipline’s strong empirical roots, there is much more to say about this. Among hypotheses worthy of discussing and testing are that research training (a) prepares students to understand clinical realities by showing behavior change in real time as a function of planned interventions; (b) teaches general measurement and data-based decision skills that help with the evaluation of clinical practice; (c) contributes to future treatment integrity and transportability by highlighting why interventions work; (d) spurs future clinical innovation by exposing future practitioners to research advances; (e) creates scientist practitioners who are capable of conducting research in practice settings; and (f) supports the discipline’s science wing by revealing to some students, in ways they might not have suspected, that their reinforcers lie in conducting research.

2

These course sequences can be embedded in graduate degree programs but the certification board does not approve or disapprove programs. As I understand the process, approval verifies that courses address selected topics but does not imply anything about the effectiveness of instruction.

3

Because I will examine some ways that demand influences supply, it is useful to specify how a market analogy does and does not apply to graduate training. In typical economic systems, suppliers of goods may profit directly from meeting demand. For “producers” of new ABA practitioners the contingencies are less direct, because graduate programs do not “sell” their students to employers the way manufacturers sell goods to consumers. Rather, they sell training to interested students who later test the employment market. Nevertheless, because programs cannot exist without students, and students presumably are sensitive to how graduate training relates to employment contingencies, some link between employment market and demand for graduate training can be assumed. I also assume that graduating students and placing them into professional positions serves as a reinforcer for those who run graduate programs—but of course, not the only reinforcer. Contingencies within academic institutions are complex and multilayered, so sometimes a program with few students will survive or a program with strong student interest will be scuttled (e.g., see Wolf 2001, on Arizona State University’s celebrated but short-lived “Fort Skinner in the Desert”). As a general rule, however, programs and the people who run them fare better when they can graduate lots of students. It is in this sense that I label program graduates as “reinforcers.”

4

Here I comment on the modal contingencies under which new programs develop, not on the creativity or commitment to innovation of many fine colleagues who, I am well aware, labor tirelessly in an attempt to keep good practitioners flowing into the marketplace.

5

Or, as Benjamin Franklin remarked (and I am not making this up), “He who lives upon hope, dies farting” (Poor Richard's Almanack 1736). Franklin later amended the aphorism to, “He who lives upon hope, dies fasting.” Both versions address the pitfalls of misplaced passiveness. The updated version better describes the potential consequences for the ABA profession if it does not set its graduate training house in order, but the original version is more fun.

6

But there are examples from other disciplines of credentialing processes that, in the context of field expertise, are essentially inert. For example, some research in teacher education suggests that formal credentialing is unrelated to student achievement (Kane et al. 2006).

7

Baseball fans may recognize this approach as sharing properties with the “wins above replacement” (WAR) statistic, which estimates how a team’s win total differs from normative win totals as a result of the accomplishments of a given player (Thom and Palmer 2015). WAR is based on multivariate models that consider how a player’s measured accomplishments compare to those of other players, and how various kinds of accomplishments correlate with team win totals.

Author Note: The title is based on W.B. Yeats’ epigram to the poem “Responsibilities 1914,” which deftly melds the wisdom of Spider-Man (“With great power there must also come -- great responsibility!”; Amazing Fantasy #15, August, 1962) and Benjamin Franklin (“By failing to prepare, you are preparing to fail;” source unverified).

References

  1. Bailey J, Burch M. Ethics for applied behavior analysis. 2. New York: Routledge; 2011.
  2. Ballou D, Springer M. Using student test scores to measure teacher performance: some problems in the design and implementation of evaluation systems. Educational Researcher. 2015;44(2):77–86. doi: 10.3102/0013189X15574904.
  3. Boyd, D., Grossman, P., Lankford, H., Loeb, S., & Wyckoff, J. (2008). Teacher preparation and student achievement. National Bureau of Economic Research Working Paper 14313. Retrieved July 8, 2015, from http://www.nber.org/papers/w14314
  4. Chetty, R., Friedman, J.N., Rockoff, J.E. (2011). The long-term impacts of teachers: teacher value-added and student outcomes in adulthood. National Bureau of Economic Research Working Paper 17699. Retrieved July 19, 2015, from http://www.nber.org/papers/w17699
  5. Chevalier J, Mayzlin D. The effect of word of mouth on sales: online book reviews. Journal of Marketing Research. 2006;43:345–354. doi: 10.1509/jmkr.43.3.345.
  6. Cooper JO, Heron TE, Heward WL. Applied behavior analysis. 2. New York: Pearson; 2013.
  7. Deitz SM. Current status of applied behavior analysis: science versus technology. American Psychologist. 1978;33:805–814. doi: 10.1037/0003-066X.33.9.805.
  8. Detrich R. Increasing treatment fidelity by matching interventions to contextual variables within the educational setting. School Psychology Review. 1999;28:608–620.
  9. Dixon MR, Belisle J, Whiting SW, Rowsey KE. Developing a normative sample of the PEAK relational training system—direct training module for comparison to individuals with autism. Research in Autism Spectrum Disorders. 2014;8(11):1597–1606. doi: 10.1016/j.rasd.2014.07.020.
  10. Dixon MR, Reed DD, Smith T, Belisle J, Jackson RE. Research rankings of behavior analytic graduate training programs and their faculty. Behavior Analysis in Practice. 2015;8:7–15. doi: 10.1007/s40617-015-0057-0.
  11. Fixsen, D.L., Naoom, S.F., Blase, K.A., Friedman, R.M., & Wallace, F. (2005). Implementation research: a synthesis of the literature. Tampa: University of South Florida, Louis de la Parte Florida Mental Health Institute, The National Implementation Research Network (FMHI Publication #231).
  12. Freiberger P, Swaine M. Fire in the valley: the making of the personal computer. New York: McGraw Hill; 2000.
  13. Gilley JW, Galbraith MW. Examining professional certification. Training and Development Journal. 1986;40(6):60–61.
  14. Green G. Evaluating claims about treatments for autism. In: Maurice C, Green G, Luce S, editors. Behavioral intervention for young children with autism: a manual for parents and professionals. Austin: Pro-Ed; 1996. pp. 15–28.
  15. Hanushek EA. The economic value of higher teacher quality. Educational Research Review. 2011;30:477–479.
  16. Haskell T. The emergence of professional social science: The American Social Science Association and the Nineteenth Century crisis of authority. Urbana: University of Illinois Press; 1977.
  17. Hayes SC. Theory and technology in behavior analysis. The Behavior Analyst. 1978;1:35–41. doi: 10.1007/BF03392370.
  18. Hayes SC, Rincover A, Solnick JV. The technical drift of applied behavior analysis. Journal of Applied Behavior Analysis. 1980;13:275–285. doi: 10.1901/jaba.1980.13-275.
  19. Hopkins BL. ABA to begin accrediting graduate programs of studies in behavior analysis. ABA Newsletter. 1991;14(3):19–21.
  20. Jarbrink K, Knapp M. The economic impact of autism in Britain. Autism. 2009;5:7–22. doi: 10.1177/1362361301005001002.
  21. Kane, T.J., Rockoff, J.E., & Staiger, D.O. (2006). What does teacher certification tell us about teacher effectiveness? Evidence from New York City. National Bureau of Economic Research Working Paper 12155. Downloaded July 28, 2015, from http://www.nber.org/papers/w12155.
  22. Lilienfeld SO. The scientific review of mental health practice: our raison d’etre. The Scientific Review of Mental Health Practice. 2002;1:1–9.
  23. Lo CP. Global outsourcing or foreign direct investment: why Apple chose outsourcing for the iPod. Japan and the World Economy. 2011;23:163–169. doi: 10.1016/j.japwor.2011.06.002.
  24. Lovaas OI. Behavioral treatment and normal educational and intellectual functioning in young autistic children. Journal of Consulting and Clinical Psychology. 1987;55:3–9. doi: 10.1037/0022-006X.55.1.3.
  25. Maher WJ. Contexts for understanding professional certification: opening Pandora's box? American Archivist. 1988;51:408–427. doi: 10.17723/aarc.51.4.h17366pq2550l482.
  26. Makrygianni MK, Reed P. A meta-analytic review of the effectiveness of behavioural early intervention programs for children with autism spectrum disorders. Research in Autism Spectrum Disorders. 2010;4:577–593. doi: 10.1016/j.rasd.2010.01.014.
  27. Mas-Colell A, Whinston M, Green J. Principles of microeconomics. Oxford: Oxford University Press; 1995.
  28. Maurice C. Let me hear your voice. New York: Knopf; 1993.
  29. Michael J. Flight from behavior analysis. The Behavior Analyst. 1980;3:1–21. doi: 10.1007/BF03391838.
  30. Moore J, Shook GL. Certification, accreditation, and quality control in behavior analysis. The Behavior Analyst. 2001;24:44–55. doi: 10.1007/BF03392018.
  31. Mullen EM. Mullen scales of early learning. New York: Pearson; 1995.
  32. Noell GH, Burns JL. Value-added assessment of teacher preparation: an illustration of emerging technology. Journal of Teacher Education. 2006;57:37–50. doi: 10.1177/0022487105284466.
  33. Parker RI, Hagan-Burke S. Useful effect-size interpretations for single-case research. Behavior Therapy. 2007;38:95–105. doi: 10.1016/j.beth.2006.05.002.
  34. Reinberg, S. (2007). FDA halts imports of farmed fish from China. Retrieved July 23, 2015, from http://abcnews.go.com/Health/Healthday/story?id=4507743
  35. Satariano, A. (2014). Apple’s iPhone first responders. BloombergBusiness. Retrieved July 15, 2015, from http://www.bloomberg.com/bw/articles/2014-09-04/for-iphone-6-defects-apple-has-failure-analysis-team-ready.
  36. Schoenwald SK, Hoagwood K. Effectiveness, transportability, and dissemination of interventions. What matters when? Psychiatric Services. 2001;52:1190–1197. doi: 10.1176/appi.ps.52.9.1190.
  37. Shook GL. The professional credential in behavior analysis. The Behavior Analyst. 1993;16:87–101. doi: 10.1007/BF03392614.
  38. Shook GL, Favell J. The Behavior Analysis Certification Board and the profession of behavior analysis. Behavior Analysis in Practice. 2008;1:44–48. doi: 10.1007/BF03391720.
  39. Shook GL, Johnston JM, Cone J, Thomas D, Greer D, Beard J, Herring R. Credentialing, quality assurance and right to practice. Kalamazoo: Association for Behavior Analysis; 1988.
  40. Shook GL, Ala'i-Rosales S, Glenn SS. Training and certifying behavior analysts. Behavior Modification. 2002;26:27–48. doi: 10.1177/0145445502026001003.
  41. Shook GL, Johnston JM, Mellichamp FH. Determining essential content for applied behavior analysis practitioners. The Behavior Analyst. 2004;27:67–94. doi: 10.1007/BF03392093.
  42. Shute N. Desperate for a cure. Scientific American. 2010;330(4):80–85. doi: 10.1038/scientificamerican1010-80.
  43. Skinner BF. Science and human behavior. New York: Basic Books; 1953.
  44. Sparrow SS, Cicchetti DV, Balla DA. Vineland adaptive behavior scales. 2. New York: Pearson; 2005.
  45. Starin S, Hemingway M, Hartsfield F. Credentialing behavior analysts and the Florida Behavior Analysis Certification Program. The Behavior Analyst. 1993;16:153–166. doi: 10.1007/BF03392620.
  46. Strosahl KD, Hayes SC, Bergan J, Romano P. Assessing the field effectiveness of acceptance and commitment therapy: an example of the manipulation training research method. Behavior Therapy. 1998;29:35–69. doi: 10.1016/S0005-7894(98)80017-8.
  47. Sundberg ML. Verbal behavior milestones assessment and placement program: the VB-MAPP. Concord: AVB Press; 2008.
  48. Thom J, Palmer P. The hidden game of baseball. Chicago: University of Chicago Press; 2015.
  49. Ulman J. The Ulman-Skinner letters. European Journal of Behavior Analysis. 2014;15:11–19.
  50. Watkins CL. Project follow through: a case study of contingencies influencing instructional practices of the educational establishment. Cambridge: Cambridge Center for Behavioral Studies; 1987.
  51. Wolf MM. Application of operant conditioning principles to the behavior of an autistic child: a 25-year follow-up and the development of the Teaching-Family Model. In: O'Donohue W, Henderson D, Hayes S, Fisher J, Hayes L, editors. A history of the behavioral therapies: founders' personal histories. Reno: Context Press; 2001. pp. 289–294.
