Abstract
The standards associated with high-stakes professional credentialing are well established in the field of testing and measurement and are well supported by antitrust, administrative, and contract law. These standards have evolved to ensure that the scope of work for a field’s practitioners is appropriately reflected in the content of credentialing examinations and that practitioners and other stakeholders are involved in all phases of the credentialing process. This article describes the procedures by which the content of credentialing examinations is determined. The certification programs administered by the Behavior Analyst Certification Board are used as an illustration throughout. The article also considers the implications of these procedures and mechanisms.
Keywords: Certification, Certification standards, Certification examination content, Behavior Analyst Certification Board
The procedures used by the Behavior Analyst Certification Board® (BACB®) to develop its certification examinations are the same procedures used for certification and licensure examinations worldwide. The science of test development is a professional field in its own right, and many educational institutions offer advanced degrees in testing and measurement. The testing field’s professional associations (e.g., Institute for Credentialing Excellence; Council on Licensure, Enforcement, and Regulation; and Association of Test Publishers) have developed an extensive body of literature describing best practices in measurement and evaluation (e.g., American Educational Research Association, American Psychological Association, and National Council on Measurement in Education 1999). Furthermore, formal standards have been developed for evaluating credentialing programs through third-party audit processes such as those of the National Commission for Certifying Agencies (NCCA) and the American National Standards Institute (ANSI). To date, 112 credentialing organizations, including the BACB, have had programs accredited by NCCA (Institute for Credentialing Excellence 2014), and 48 organizations have had programs accredited by ANSI (2014).
The legal foundation for high-stakes credentialing procedures evolved out of the interaction of a variety of legal principles. Significant legal implications for certification bodies are found in antitrust, administrative, and contract law. In antitrust law, certification programs offering credentials that are considered prerequisites to practice must demonstrate that those credentials are reasonable. Failure to demonstrate reasonableness of the credentialing requirements could result in claims of unlawful restraint of trade. In administrative law, certification programs must also meet the requirements of minimal due process. Notice to candidates about the testing specifications and certification requirements, along with opportunities to appeal denials, are founded in best practices arising out of due process laws and procedures. In contract law, the application for certification and any rules and requirements relating to how to apply, standards for qualifying, renewal and recertification, and examination requirements constitute a contractual relationship between the certifying body and the candidate/certificant.
With regard to actual test questions, the most significant legal concern is “reasonableness” under the antitrust considerations identified above. There is a long line of precedent finding reasonableness to be grounded in the process followed to develop and score a test item. An example of how this standard is applied appeared in the US Supreme Court decision in Ricci v. DeStefano (2009), informally referred to as the “Connecticut firefighter” lawsuit. In this case, the Supreme Court reinstated examination results even though those results had a substantial negative impact on minority firefighters. The examination results were reinstated because the processes used to develop the examination items and “cut score” (passing score) were demonstrated to be valid and reasonable. The courts reviewing the case took into consideration the extent to which the examination questions followed generally accepted best practices for development, including a foundation in job task analysis, with procedures in place to ensure unbiased item writing, administration, scoring, and post-examination review. The best practices for certification examinations often mirror best practices for employment testing (United States Equal Employment Opportunity Commission 1978).
Consider the following question previously used on an earlier version of the BACB examination for Board Certified Behavior Analyst® (BCBA®) certification.
Charmaine has sporadic incontinence. Recently, incontinence has increased to two or three occurrences per day. The change appears to coincide with a change in her medication, which was adjusted when she was last seen by her physician about three weeks ago. What should the behavior analyst do FIRST?
- A. Have Charmaine keep an incontinence log
- B. Conduct a functional analysis
- C. Advise caregivers to contact her physician
- D. Review all records
Did you answer it correctly? In case you are not sure, the best answer is option “C.” This is the best answer because the change in Charmaine’s incontinence seems to coincide with a medication change. Having Charmaine keep a log is not helpful because we already know the rate is two to three times per day. Conducting a functional analysis is premature given that there is a potential cause, which can be evaluated by contacting the physician. Reviewing all records goes beyond what is necessary and could be an invasion of Charmaine’s privacy.
Thousands of questions similar to this one make up the item pool for BACB examinations. This item pool is a continually updated collection of questions that have survived item analysis evaluations from previous examination administrations. An item analysis is a routine statistical evaluation of each item used in an examination, confirming that each item adequately discriminates between candidates who perform well on the exam and those who perform poorly (Livingston 2006). Items that are problematic are either revised or discarded. As with all new items, revised items are included on future examinations to determine whether they perform well and can therefore be included in the pool. Such items are scored and analyzed but not counted in the candidate’s performance. Generally, this process is described as “pre-testing” items.
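For readers unfamiliar with item analysis, the two statistics at its core are item difficulty (the proportion of candidates answering correctly) and item discrimination (the correlation between answering an item correctly and performance on the rest of the examination). The following Python sketch is our illustration of those computations, not BACB code; the flagging thresholds are hypothetical.

```python
# A minimal item analysis sketch: item difficulty (p) and item
# discrimination (point-biserial correlation with the rest-score).
# This is our illustration, not BACB code; thresholds are hypothetical.
import numpy as np

def item_analysis(responses: np.ndarray):
    """responses: (n_candidates, n_items) matrix; 1 = correct, 0 = incorrect."""
    totals = responses.sum(axis=1)
    stats = []
    for j in range(responses.shape[1]):
        item = responses[:, j]
        rest = totals - item               # exclude the item from its own total
        p = item.mean()                    # difficulty: proportion correct
        r = np.corrcoef(item, rest)[0, 1]  # discrimination
        stats.append((p, r))
    return stats

# Illustrative use: flag items that are too hard/easy or discriminate poorly.
rng = np.random.default_rng(0)
data = (rng.random((500, 20)) < 0.7).astype(int)
for j, (p, r) in enumerate(item_analysis(data)):
    if not (0.25 <= p <= 0.95) or r < 0.15:
        print(f"item {j}: p = {p:.2f}, r = {r:.2f} -> flag for review")
```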
This item pool constitutes one part of a set of contingencies that influence what authors include in textbooks, what instructors incorporate into course syllabi, and what students study in taking these courses and preparing for the certification examination. These contingencies are therefore an important part of determining the competencies assessed by BACB examinations. The focus of this article is where examination questions come from and the elaborate process underlying the determination of standards used by the BACB and similar high-stakes professional credentialing programs. High-stakes credentialing programs are those that have significant consequences not just for the candidate and the credentialing body but for the public, which can be put at risk by unqualified candidates. Protecting the public from this risk is the rationale underlying all facets of the credentialing process.
How Examination Items are Developed
Items may be written by a variety of individuals under varying circumstances. Most professional credentialing programs rely on individuals who hold the credential for which the examination is being developed. By virtue of having obtained and maintained the credential, these people are deemed to be “subject matter experts” or SMEs. SMEs are usually volunteers who have been working in the profession for some time and wish to give something back to their field. As an example of this general approach, items in the pool from which BACB examinations are constructed are written by BACB certificants who have participated in a 2-h workshop presented by the BACB’s psychometrician (an expert in testing and measurement; the second author). This workshop teaches participants how to develop good multiple-choice test questions and provides an overview of the steps required to develop fair, valid, and reliable examinations.

Upon completing the workshop, these certificants are assigned specific tasks for which to draft test items. Assignments are determined by an inventory of the question pool, conducted prior to each workshop, that identifies areas in need of additional items. Depending on the nature of the workshop, the SMEs may write their items on site or submit them online using a secure website designed for this purpose. All submitted items enter the BACB item pool as “draft” items. The BACB conducts periodic item-review workshops during which a panel of BACB certificants is convened to review, revise, and approve draft items for use as “pilot” items on an examination. Pilot items are not included among the scored items on an examination; they are administered to gather data on how they might perform if included on future examinations.

Only items that “pass the test” become eligible for use as scored items on future examinations. To “pass,” an item must be answered correctly by the majority of candidates and must appropriately discriminate between candidates who do and do not possess sufficient knowledge to obtain the credential; in other words, answering the item correctly should be positively correlated with overall examination scores. After pilot testing, the statistical data gathered on each item are reviewed by the psychometrician. In the event of questionable statistical performance, the item is reviewed and then revised or discarded by another panel of SMEs.
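The workflow just described, in which draft items are panel-reviewed into pilot status and promoted only if they perform well, can be summarized as a simple state machine. The sketch below is ours, in Python; the status names and thresholds are assumptions for illustration, not published BACB criteria.

```python
# A sketch of the item life cycle described above: draft items are
# panel-reviewed into pilot status, and pilot items are promoted only
# if they perform well. Statuses and thresholds are our assumptions.
from dataclasses import dataclass

@dataclass
class Item:
    text: str
    status: str = "draft"  # draft -> pilot -> active | discarded

def review_draft(item: Item, panel_approved: bool) -> None:
    # Panel review: approved drafts become unscored pilot items.
    item.status = "pilot" if panel_approved else "discarded"

def evaluate_pilot(item: Item, p_correct: float, discrimination: float) -> None:
    # "Pass the test": most candidates answer correctly, and the item
    # correlates positively with overall examination scores.
    if item.status == "pilot" and p_correct > 0.5 and discrimination > 0.0:
        item.status = "active"
    else:
        item.status = "discarded"  # or sent back to an SME panel for revision
```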
Construction of questions is driven by specific knowledge, skill, and ability statements (KSAs). These are detailed statements, much like operational definitions, that expand on the tasks that appear in the published task list describing the content of credentialing examinations. For the BACB, the KSAs are designed to serve as “prompts” for the certificants who will draft items for the examinations. Although not an exhaustive list of every concept or activity pertinent to the practice of applied behavior analysis (ABA), they cover key points that should be included in the examinations.
The BACB’s KSAs are developed by a panel of certificants shortly after the task list is approved by the BACB’s Board of Directors. The KSAs comprise an internal document that is used as a guide during item writing. For example, the task statement for the above item was “G-02: Consider biological/medical variables that may be affecting the client.” (BACB 2012). The specific KSA statement for the above item was “Seek consultation to identify potential medical issues causing behavior problems.” Each KSA serves as the basis for several items. Although these items focus on the same KSA, each may take a slightly different approach to probing the candidate’s understanding of the material.
An important benefit of this redundancy is that it increases the size of the overall item pool. The necessary size of the pool is determined by the frequency with which the examination is administered, the number of candidates who sit for the examination during each testing window, and the number of examination forms used during a given testing window. An examination form is a unique collection of items (150 items for the BCBA and 130 items for the Board Certified Assistant Behavior Analyst®, BCaBA®) selected to cover the content identified by the task list. For security reasons, the BACB administers multiple examination forms during the testing windows each year. The item pool contains over 10 times the number of items required to create one examination form for each credential; that is, more than 1,500 items for the BCBA examination and more than 1,300 for the BCaBA examination.
The KSAs are in turn derived from task statements. In the case of the BACB, these are descriptive statements that identify the work activities performed by BCBAs and BCaBAs. Each task statement consists of a verb describing the action that takes place, an object receiving that action, and one or more qualifiers if needed for clarification. Task statements covering related material are organized into logical content areas, which represent the major job functions of an applied behavior analysis practitioner. For instance, there are presently 11 content areas covering the material that both BCBA and BCaBA candidates are responsible for knowing. These content areas contain from 3 to 21 task statements, although some tasks involve multiple statements. Collectively, these content areas and their tasks are called the task list. The Fourth Edition Task List includes 115 tasks (BACB 2012).
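To make the anatomy of a task statement concrete, the structure described above (a verb, an object, and optional qualifiers, organized under a content area) might be represented as follows. The field names are ours, purely for illustration.

```python
# The anatomy of a task statement, represented as a data structure.
# Field names are our own, for illustration only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class TaskStatement:
    code: str                    # e.g., "G-02"
    verb: str                    # the action that takes place
    obj: str                     # the object receiving that action
    qualifiers: List[str] = field(default_factory=list)

# The task statement cited earlier in this article:
g02 = TaskStatement("G-02", "Consider", "biological/medical variables",
                    ["that may be affecting the client"])
```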
How the Task List is Developed
Task lists result from an elaborate process called a job analysis (Raymond and Neustel 2006; Shook et al. 2004). A job analysis identifies the key functions and basic job duties of a profession at a particular point in time. By design, they represent not the latest practices or trends, but the mainstream activities generally accepted by practitioners. This conservative approach protects against including content that is not yet established by research or broadly accepted within a field and that may yet fall by the wayside. This caution means job analyses must be periodically updated by repeating the effort, typically every 5 to 10 years depending on the needs of the profession, to accommodate advances that eventually pass muster.
A job analysis consists of specific components. First, the credentialing organization convenes a representative panel of SMEs. Individuals are selected based on their experience and expertise in broad areas of the field represented by practitioners. The resulting panel membership typically considers dimensions such as gender, geography, type of employment, area of expertise, professional contributions, and so forth. The goal is to bring together a diverse group of panel members that provides a good cross-section of the field as a whole. In the case of the BACB’s most recent job analysis, for example, this panel included university faculty, as well as practitioners working in different areas within the USA and in other countries, with training from different educational institutions, working in a variety of applied settings, and with various levels of supervisory experience.
The panel is brought together for a multi-day meeting to review and consider possible content or organizational revisions to the existing task list, as well as changes to educational, practice, or other requirements for certification. Given the intentional diversity of panel membership, it is important that panelists feel free to argue for whatever changes they find appropriate. This meeting is typically coordinated by a professional in the field of testing and measurement who is skilled in facilitating productive discussion and encouraging the group to systematically probe all aspects of the existing task list and to consider all points of view. The BACB’s most recent panel was coordinated by its psychometrician. The process focused on inclusion of new content, elimination of outdated and redundant content, and reorganization of content into different task areas.
The product of the expert panel is a set of revisions to the existing task list agreed to by majority vote of the panel. The next step involves professionals in testing and measurement turning these revisions into an electronic survey instrument that asks respondents to rate the frequency and importance of each task. Other questions may probe the level of supervision needed and the potential for harm that would result from a lack of competence. The details of the survey questions vary based on the needs of the field in which the survey is conducted. The survey may be tested as a draft instrument with a preliminary cohort of experts in the field to ensure that its design and other features will yield useful information. Based on this feedback, minor edits that do not conflict with the panel’s revisions may be made to the survey. A draft of the BACB’s most recent job analysis survey instrument was sent to 282 experts in behavior analysis around the world. The final form of the survey is then sent electronically to a large sample of the field’s membership.
As an example of the survey process, the BACB’s most recent job analysis survey was administered using a web-based survey tool. Survey participants were asked to provide some background information and to then respond to the survey based on their current credential. Given the extensive time and effort required to complete the survey accurately, and the importance of the survey results to the BACB, five type 5 continuing education credits were offered to individuals who completed the entire survey. Based on the BACB database, the effective sample size for the 2009 survey was 7,067. Of these, 2,236 (31.64%) responded to the survey. The number of responses was sufficient to be considered a representative sample of the certificant population and to permit appropriate analyses to be performed; indeed, the response rate was consistent with industry standards (Henderson and Smith 2009). We refer the reader to the May 2011 issue of the BACB Newsletter for additional information about this survey and its findings (BACB 2011).
Survey data are then thoroughly analyzed. This analysis typically includes evaluating responses across various demographic categories, including age, gender, race, geography, training and experience, employment, and other dimensions. However, the primary focus of the analysis lies in the respondents’ evaluations of each of the task statements in terms of its importance and frequency of performance. In preparing respondents to rate the tasks, survey instructions may encourage respondents to consider factors such as (a) the frequency with which the specified activity is performed, (b) the risks associated with performing the activity poorly, and (c) whether the activity should be tested on the certification examinations. In the case of the BACB, this analysis is conducted separately for BCBA and BCaBA certificants.
A report of the job analysis survey is prepared by testing and measurement professionals and submitted to the credentialing agency. The centerpiece of this report provides descriptive statistical measures of respondent ratings of each task statement’s importance for practitioners. The governing body of the credentialing organization (in the case of the BACB, its Board of Directors) or an assigned committee sets a cutoff for these ratings to determine whether any of the proposed task statements should be eliminated from the task list. The agency also considers recommendations of the expert panel regarding possible changes in educational, practice, or other requirements for certification. The final result of this process is a decision by the organization to promulgate a revised set of task statements and associated requirements for how practitioners must prepare to qualify for the field’s credentials.
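The rating analysis and cutoff decision can be sketched as follows. This is a hedged illustration in Python with pandas; the 1-to-5 rating scale, column names, and cutoff value are assumptions for illustration, not the BACB’s actual parameters.

```python
# A hedged sketch of the rating analysis and cutoff decision. The 1-5
# rating scale, column names, and cutoff value are assumptions for
# illustration, not the BACB's actual parameters.
import pandas as pd

ratings = pd.DataFrame({
    "task": ["G-01", "G-01", "G-02", "G-02", "G-03", "G-03"],
    "importance": [4, 5, 5, 4, 2, 1],  # 1 = not important ... 5 = critical
})

summary = ratings.groupby("task")["importance"].agg(["mean", "std", "count"])
CUTOFF = 3.0  # hypothetical floor set by the governing body
summary["retain"] = summary["mean"] >= CUTOFF
print(summary)  # G-03's mean of 1.5 falls below the cutoff -> eliminated
```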
These requirements are scheduled to take effect at a specified future date to allow the field to prepare for the changes. Before they are implemented, however, several steps are necessary. The KSAs must be reviewed and revised to match the new task list, with new KSAs being written to cover any new content that was added. The pool of test items must be reviewed and compared to the new task list and KSAs to determine which items can be retained and where they fit. An inventory must then be conducted to identify tasks and KSAs that require more items. The inventory will be used to guide the efforts of item writers who will draft new items based on the new task list. As already described, new items are pilot tested and reviewed to ensure that they meet acceptable performance criteria. After the item pool has been sufficiently updated, new test forms can be generated that match the updated task list requirements.
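The inventory step might look like the following sketch: count the retained items mapped to each task and flag shortfalls to guide item writers. The per-task target is a hypothetical planning parameter, not a published BACB figure.

```python
# A sketch of the inventory step: count retained items per task and
# flag shortfalls to guide item writers. The per-task target is a
# hypothetical planning parameter.
from collections import Counter

item_tasks = ["G-01", "G-01", "G-02", "G-03", "G-03", "G-03"]  # task of each retained item
TARGET_PER_TASK = 5

counts = Counter(item_tasks)
for task in ("G-01", "G-02", "G-03"):
    shortfall = TARGET_PER_TASK - counts[task]
    if shortfall > 0:
        print(f"{task}: draft {shortfall} more items")
```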
How the Examination’s Passing Score is Determined
At this point, a cut score used to determine whether a candidate passes or fails the examination must be established. There are different approaches to this task, but most of them involve bringing in a panel of practitioners to systematically review and assess the difficulty of each test item on a “base” examination form. The BACB uses a modified Angoff approach in which panel members estimate the proportion of entry-level practitioners (i.e., those who have sufficient competency to obtain the BCBA or BCaBA credential) who will know the answer to each item (Angoff 1984). The estimates from all panel members are averaged to arrive at a recommended cut score, which is then presented to the BACB Board of Directors for approval.
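In concrete terms, a modified Angoff computation reduces to averaging judges’ expected scores for a minimally competent candidate. The sketch below illustrates the arithmetic with made-up ratings for a four-item “examination”; real panels rate every item on a full form.

```python
# The arithmetic of a modified Angoff study, with made-up ratings for a
# four-item "examination": each judge estimates the proportion of
# minimally competent candidates who would answer each item correctly;
# a judge's estimates sum to an expected raw score, and the panel's
# recommended cut score is the average across judges.
import numpy as np

# rows = judges, columns = items; entries are estimated proportions correct
angoff_ratings = np.array([
    [0.80, 0.65, 0.90, 0.55],
    [0.70, 0.70, 0.85, 0.60],
    [0.85, 0.60, 0.95, 0.50],
])

per_judge_cut = angoff_ratings.sum(axis=1)  # [2.90, 2.85, 2.90]
recommended_cut = per_judge_cut.mean()      # about 2.88 of 4 raw points
print(per_judge_cut, round(recommended_cut, 2))
```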
How the Task List is Turned into an Examination
Actual examinations are created by selecting items from the pool that match the specifications for the base examination form that resulted from the job analysis study. These specifications are referred to as the examination blueprint because they provide detailed information on the content that will be covered in each examination form, even down to the number of questions that will be asked about each task. The item selection process is essentially a stratified random sampling of the item pool, with the strata representing the tasks. Each “active” item that is eligible for use on an examination form has a detailed statistical history, which includes the number of people who answer the item correctly, how answering the item correctly relates to overall test performance, and the number of people who select each of the wrong answers. In addition to statistical performance, other factors considered when selecting items include how frequently the items have been used and whether the items are “enemies” of other selected items (i.e., giving away answers to other items or asking the same question as another item).
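Form assembly as stratified random sampling might be sketched as follows. The data structures are hypothetical, and the enemy map is assumed to be symmetric; a production system would also weight selection by item exposure and statistical history.

```python
# Form assembly as stratified random sampling: draw the blueprint's
# quota of items per task, skipping "enemies" of anything already
# selected. Data structures are hypothetical; the enemy map is assumed
# to be symmetric.
import random

def assemble_form(pool, blueprint, enemies):
    """pool: {task: [item_id, ...]}; blueprint: {task: items needed};
    enemies: {item_id: set of incompatible item_ids}."""
    form, chosen = [], set()
    for task, n_needed in blueprint.items():
        for _ in range(n_needed):
            candidates = [i for i in pool[task]
                          if i not in chosen
                          and not (enemies.get(i, set()) & chosen)]
            pick = random.choice(candidates)
            form.append(pick)
            chosen.add(pick)
    return form

pool = {"G-01": ["a1", "a2", "a3"], "G-02": ["b1", "b2"]}
blueprint = {"G-01": 2, "G-02": 1}
enemies = {"a1": {"b2"}, "b2": {"a1"}}
print(assemble_form(pool, blueprint, enemies))
```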
Once the examination form has been selected, it is reviewed by a panel of SMEs to ensure that it meets the blueprint requirements and that all of the selected items are accurate and reflect current practices. In addition, each form is statistically equated to the base examination form to ensure that any differences in difficulty level across forms are taken into account. As a result of the equating process, the pass rates generally remain quite stable throughout the life span of each base examination form. New base examination forms are created approximately every 5 years or whenever there is a significant change to the examination content, such as the introduction of a new task list.
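The text does not specify the BACB’s exact equating method, but linear (mean-sigma) equating is one common way to place scores from a new form onto the base form’s scale. The sketch below shows the idea with made-up score distributions; this simple variant assumes randomly equivalent candidate groups.

```python
# A sketch of linear (mean-sigma) equating, one common way to place
# scores from a new form onto the base form's scale. The BACB's exact
# equating method is not specified in the text, and this simple variant
# assumes randomly equivalent candidate groups; scores are made up.
import statistics

def mean_sigma_equate(new_scores, base_scores):
    """Return a function mapping a raw new-form score to the base scale."""
    m_new, s_new = statistics.mean(new_scores), statistics.stdev(new_scores)
    m_base, s_base = statistics.mean(base_scores), statistics.stdev(base_scores)
    a = s_base / s_new
    b = m_base - a * m_new
    return lambda x: a * x + b

# If the new form runs harder (scores 5 points lower on average),
# equated scores shift up so performance across forms is comparable.
equate = mean_sigma_equate(new_scores=[95, 100, 105, 110],
                           base_scores=[100, 105, 110, 115])
print(equate(102))  # -> 107.0
```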
After the experts have approved an examination form, it is administered to candidates during a testing window. After the testing window ends, an item analysis is conducted. This analysis includes an evaluation of the number of candidates who select each answer choice and the relationship between selecting each choice and overall test scores. Items that perform poorly (e.g., those that many candidates answer incorrectly or those that have a negative relationship to test scores) are flagged for review by another panel of SMEs. On rare occasions, the panel may determine that the flagged items have flaws, such as more than one correct answer or even no correct answer. In these cases, the panel may recommend adjustments to the scoring key so that candidates are not adversely affected by the flawed items. Once this review process is completed, scores for the examination are finalized and reported to candidates. Thus, every examination form is subjected to the scrutiny of multiple experts in the field throughout the development cycle. This ensures that any variations in the quality and difficulty of items are accounted for in the scoring process so that candidates have an equal opportunity to demonstrate their knowledge of applied behavior analysis.
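The key-adjustment remedy can be illustrated with a small scoring function that accepts more than one keyed answer for a dual-keyed item, or credits all candidates when an item turns out to have no defensible answer. The structures here are hypothetical.

```python
# A sketch of the key-adjustment remedy: the scoring key can accept
# more than one answer for a dual-keyed item, or credit every candidate
# when an item turns out to have no defensible answer. Structures here
# are hypothetical.
def score(responses, key):
    """responses: {item: candidate's choice}; key: {item: set of accepted
    answers, or None meaning 'credit all candidates'}."""
    total = 0
    for item, accepted in key.items():
        if accepted is None or responses.get(item) in accepted:
            total += 1
    return total

key = {"q1": {"C"}, "q2": {"B", "D"}, "q3": None}  # q2 dual-keyed; q3 credited
print(score({"q1": "C", "q2": "D", "q3": "A"}, key))  # -> 3
```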
The Foundation of High-Stakes Credentialing Procedures
Many other fields use the processes described here to develop credentialing programs that identify competent practitioners and protect the public health, safety, and welfare. For example, Cardiovascular Credentialing International offers ANSI-accredited certifications in eight different specialty areas for technicians working in the cardiovascular field. The National Registry of Food Safety Professionals certifies over 100,000 food safety managers annually through an ANSI-accredited program. (ISC)2 offers an ANSI-accredited credential held by almost 100,000 professionals working in the field of information security. The Dental Assisting National Board certifies over 33,000 dental and orthodontic assistants through its two NCCA-accredited examinations. Accreditation of a credentialing body’s practices by ANSI or NCCA is a demanding process indicating that the body relies on appropriate job analysis studies to define the scope of work for its practitioners and includes practitioners and other stakeholders throughout all phases of the credential development process, in accordance with testing and measurement industry standards (see American National Standards Institute 2003; National Commission for Certifying Agencies 2003).
There are a number of advantages to adhering to national accreditation and best practices in the field of certification. Certificants gain assurances that their examination, application, and related documentation are fairly reviewed in accordance with current psychometric and legal standards for credentialing. The requisite appeal process for denied applications and disciplinary actions also helps to ensure fair enforcement of certification requirements. Consumers, employers, and legislators benefit from a uniform basis to help assess qualifications of service providers. Certificants, universities, and the overall community can depend on a mechanism for notice of proposed changes. Finally, there is comfort in knowing that the certification procedures undergo independent and unbiased review by standard-setting professionals.
Influence over Credentialing Content
The focus of BACB credentialing standards is to produce ABA practitioners who meet the minimum competencies necessary to serve consumers as effectively as the field’s science and technology will allow. It is understandable that there might be disagreement within the field regarding these minimum competencies. Indeed, it is appropriate that such a discussion be ongoing because it reflects a vibrant discipline with genuine interest in its practitioner community. For example, pressure from those with expertise in the basic research literature for greater representation of their interests in practitioner credentialing standards is important in helping to maintain the relationship between the science and its technology. Some understandably push for task standards that better reflect the value of conceptual issues in the work of practitioners. Still others appropriately argue that different treatment populations and settings should be represented by increasing specialization in practitioner credentials.
Such diverse voices are important because, although the standards for credentialing a field’s practitioners may emerge from all of the field’s interests, there is no a priori best answer for what those standards should be. Each interest group may offer its recommendations with unyielding conviction, but it is important that the inevitable conflicts are not settled by political processes. An approach based on the political power of one interest or another might create a clear set of standards, but it can result in a variety of problems. For instance, there may be few educational programs that can meet standards created in this manner. Though its proponents may be pleased with this outcome, a small and slowly growing practitioner cohort may only ensure that the credential has limited impact in the marketplace and little value. Another possible problem is that practitioner training may provide broad and deep expertise in certain areas, acquired at considerable expense to students, even though some of this expertise may have little practical value in the daily work of practitioners. A curriculum too strongly biased toward one interest may also limit training in other areas, resulting in certain deficits in practitioner skills. Of course, each community of interest may argue that this is already the case and that the problem can be rectified by modifying the standards so that academic curricula properly reflect its particular concerns.
At the least, all parties to this important debate might agree that the desired outcome of credentialing standards is the production of an adequate supply of credentialed practitioners who have the minimum competencies needed to represent the best of what the field has to offer. However, what does the phrase “minimum competencies” mean? It is tempting to focus on the pejorative connotation of the term “minimum” and argue that we should be aiming at a higher standard, but this misunderstands the term’s application to credentialing standards. Any credentialing examination sets some minimum standard for the competencies of those who pass. That standard may be quite high by some criteria, but it is still a minimum because those who fail to achieve a passing score do not earn the credential. Physicians, lawyers, dentists, accountants, and other credentialed professionals all pass examinations that define the minimum competencies targeted by the task statements underlying those examinations.
Across practitioners within a field, there is always variation in expertise above this minimum. There will always be some credentialed practitioners who know more or are more effective than others. If some advanced level of expertise above an existing minimum standard were defined by the profession, through the above-described job analysis process, as minimally necessary for its desired standard of competence, it would then be included in the task statements designating minimum competencies. Of course, there would still be some practitioners whose skills exceeded even this new standard. The challenge in determining appropriate task statements is to identify the minimum competencies needed by practitioners to reflect what the field has to offer the society.
These and other issues are part of an important and healthy ongoing debate in behavior analysis. It is important that this discussion not be hampered by a “let us determine the task list” remedy proposed by one interest or another in the field that conflicts with the processes described in this article, which adhere to the standards of the field of testing and measurement and are consistent with their legal foundation. The primary purpose of these established methods is to ensure that the content of credentialing examinations is broadly based in the mainstream views within a field and does not reflect the agenda of a particular organization, group, or interest. In other words, these methods have evolved to protect consumers by ensuring that credentialed practitioners in a field have demonstrated the minimum competencies resulting from a process that prevents any one group or interest from having excessive influence on the designation of those competencies. For the field of behavior analysis, these methods serve the function of producing task standards, which guide academic training requirements and examination content, from a mix of academics, researchers, and practitioners. Practitioners must be well represented because they are in the best position to respond to the job analysis survey question of how important each task is in their work. To argue that practitioners who have already earned their credential would respond to the importance of proposed tasks in a way that lowers credentialing standards for future candidates not only insults these professionals but also reveals a perspective that is out of touch with the contingencies of practice.
Our description of established credentialing procedures should make it especially clear that no organization should play a standard-setting role by establishing its own procedures without subject matter and psychometric guidance. In this regard, it is important to understand that the BACB’s Board of Directors and Chief Executive Officer do not control the content of its task statements or credentialing exams. They do not create the task statements, selectively modify them, or pick and choose among them, aside from setting a floor for evaluating job analysis task rating data in a manner consistent with established credentialing procedures. By following such established procedures, the BACB’s certification programs have achieved a status of legal and professional defensibility and parity with how other professions’ credentials are developed.
References
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
- American National Standards Institute. (2003). General requirements for bodies operating certification systems of persons. Washington, DC.
- American National Standards Institute. (2014). Accreditation directory. https://www.ansica.org/wwwversion2/outside/ALLdirectoryListing.asp?menuID=2&prgID=201&status=4. Accessed 3 Apr 2014.
- Angoff, W. H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: Educational Testing Service.
- Behavior Analyst Certification Board. (2011). BACB Newsletter, May 2011. http://www.bacb.com/newsletter/BACB_Newsletter_05_2011.pdf. Accessed 3 Apr 2014.
- Behavior Analyst Certification Board. (2012). Fourth edition task list. http://www.bacb.com/Downloadfiles/TaskList/BACB_Fourth_Edition_Task_List.pdf. Accessed 3 Apr 2014.
- Henderson, J. P., & Smith, D. (2009). Job/practice analysis. In J. Knapp, L. Anderson, & C. Wild (Eds.), Certification: The ICE handbook (pp. 123–148). Washington, DC: Institute for Credentialing Excellence.
- Institute for Credentialing Excellence. (2014). NCCA-accredited certification programs. http://www.credentialingexcellence.org/p/cm/ld/fid=121. Accessed 3 Apr 2014.
- Livingston, S. A. (2006). Item analysis. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 421–441). Mahwah, NJ: Lawrence Erlbaum Associates.
- National Commission for Certifying Agencies. (2003). National Commission for Certifying Agencies standards for the accreditation of national certification programs. Washington, DC.
- Raymond, M. R., & Neustel, S. (2006). Determining the content of credentialing examinations. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 191–223). Mahwah, NJ: Lawrence Erlbaum Associates.
- Ricci v. DeStefano, 129 S. Ct. 2658, 2671 (2009).
- Shook, G. L., Johnston, J. M., & Mellichamp, F. (2004). Determining essential content for applied behavior analyst practitioners. The Behavior Analyst, 27, 67–94. doi:10.1007/BF03392093.
- United States Equal Employment Opportunity Commission. (1978). United States Equal Employment Opportunity Commission guidelines on employment testing procedures. Washington, DC.