Abstract
Risk assessment and evaluation before behavioral assessment and intervention is required by the Behavior Analyst Certification Board® (BACB®) Ethics Code for Behavior Analysts (BACB, 2020). However, methods for conducting such evaluations and the factors to consider are not readily available. Deochand et al. (2020, Behavior Analysis in Practice, 13, 978–990) developed the Functional Analysis Risk Assessment Decision Tool (FARADT) to aid behavior analysts in ethical decision making regarding whether to conduct a functional analysis. An empirical evaluation of whether use of the FARADT affects novice users' ratings of risk has not yet been conducted. The present study served as a pilot evaluation of expert and novice behavior analysts' ratings of risk with and without access to the FARADT when given scenarios in which a functional analysis was being considered. Results indicated that, for our participants, the FARADT decreased the variability of novices' risk ratings and produced ratings that more closely matched the intended risk level of the vignettes for both experts and novices. These results provide preliminary evidence that decision-making tools may be helpful to both novice and expert behavior analysts.
To comply with the Ethics Code for Behavior Analysts (Behavior Analyst Certification Board® [BACB®], 2020), behavior analysts must evaluate the risks associated with functional analyses prior to beginning assessment.
The FARADT is a tool that may be helpful to both expert and novice behavior analysts as they evaluate the risks inherent in functional analyses.
There is limited empirical research on the utility and effectiveness of behavior-analytic decision-making tools.
Our findings suggest experts engage in complex covert verbal behavior when evaluating risk.
More research is needed on the decision-making processes experts utilize when analyzing complex and nuanced contexts of assessment and treatment.
Supplementary Information
The online version contains supplementary material available at 10.1007/s40617-024-01006-z.
Keywords: Functional analysis, Ethical decision making, Risk assessment
The implementation of function-based interventions based on reinforcement increases the effectiveness of behavioral treatment (Neef & Iwata, 1994). Functional analysis (FA) has been shown to successfully identify the variables that reinforce problem behavior (i.e., the function[s] of the behavior). These results are used to inform the development of interventions to decrease maladaptive behavior and improve adaptive responding. Despite the clear benefits of FA, some risks to the individual and others exist when the individual is exposed to contexts that are hypothesized to evoke maladaptive behavior. These potential risks may include damage to property, injury to the individual, and injury to others in the environment. It is important to weigh the risks of such assessments against the benefits to the individual.
Risk assessments and risk/benefit analyses are often recommended before selecting behavioral assessments and interventions (Iwata & Dozier, 2008). The consideration of risk to individuals served is referred to in several sections of the Ethics Code for Behavior Analysts, including codes 2.13 Selecting, Designing, and Implementing Assessments; 2.14 Selecting, Designing, and Implementing Behavior-Change Interventions; and 2.15 Minimizing Risk of Behavior-Change Interventions (BACB, 2020). However, there is limited information regarding the specific variables that should be considered when assessing risk or a systematic way to assess risk levels for assessments and interventions. In contrast, the medical field has published several methods to guide medical providers through decision-making in regard to risk-benefit analyses, including qualitative and quantitative analyses (Spielthenner, 2012). These methods and tools are largely absent from the behavior-analytic literature.
Behavior analysts with experience in conducting FAs may be able to readily evaluate risks associated with implementing an FA in a specific context, with a specific individual, and with a specific topography of challenging behavior. Experienced behavior analysts may also be able to identify potential modifications to mitigate risk. In contrast, behavior analysts just entering the field may have less experience in, and therefore may be less skilled at, evaluating and mitigating risk in conducting a functional analysis. As of 2022, 34% of credentialed Board Certified Behavior Analysts® (BCBAs®) had been certified for 3 years or less, and 55.2% had been certified for 5 years or less (BACB, n.d.). Given the increasing number of new BCBA certificants each year, tools are needed to assist novice BCBAs in making decisions about challenging situations in which they have limited experience.
Tools exist for some decisions that behavior analysts face, such as those that aid in the selection of measurement procedures for challenging behavior (LeBlanc et al., 2016) and treatment for escape-maintained behavior (Geiger et al., 2010). Similar tools to aid behavior analysts in deciding whether to conduct a functional analysis may be beneficial. Moreover, decision tools that guide novice BCBAs through these difficult decisions will likely improve the safety of recipients of behavior-analytic services and increase the likelihood that risks are limited or prevented when possible. Deochand et al. (2020) surveyed 664 BCBAs and doctoral-level Board Certified Behavior Analysts (BCBA-Ds®) to evaluate the need for a tool to objectively evaluate the risk of conducting an FA. Approximately 96% of respondents reported that such a tool would be useful for the field of behavior analysis.
Following the results of the survey, Deochand et al. (2020) developed the Functional Analysis Risk Assessment Decision Tool (FARADT), an interactive tool created for use in Microsoft Excel. (This tool can be downloaded at https://link.springer.com/article/10.1007/s40617-020-00433-y. Readers are referred there for a copy of the tool and its details.) In the tool, a tab labeled “Risk Assessment” requires the user to click ratings (1–6) on four subsets of risk: (1) clinical experience of the overseeing BCBA, (2) characteristics of the physical environment, (3) availability and training of support staff, and (4) intensity of the target behavior. Before entering a rating for each quadrant, users are prompted to consider questions related to each risk level. Examples are provided to help guide the user in determining an appropriate rating. After the user inputs ratings from 1 to 6 for each of the four subsets of risk, the algorithm within the FARADT provides an output of overall risk, with a scale ranging from slight to high risk. In addition, associated recommendations are provided for the user (e.g., continue with FA, consider revisions, do not proceed). Additional sections of the tool provide specific recommendations matched to the type of risk identified in the tool to mitigate risk during an FA, as well as references to published articles that describe these methods of reducing risk.
As part of the FARADT’s development, a panel of ten behavior analysts with extensive experience in FAs evaluated the tool to determine its utility and the appropriateness of its content (see Deochand et al., 2020, for a thorough description of the review and evaluation process). Following the review and subsequent revisions, the experts reported that it was an “instructional resource” for students learning about the functional analysis and a “supporting resource” for novice BCBAs (Deochand et al., 2020). While these data provided information on the impressions of experts, Deochand et al. (2020) did not empirically evaluate the FARADT to determine whether expert versus novice ratings of risk on the tool are comparable. Thus, it is unclear whether the tool functions to improve evaluations of risk, and therefore ethical decision making, for novice behavior analysts. Such evaluations of decision-making tools are largely absent from the research base.
Thus, although decision-making tools exist in the behavior-analytic literature, it is not clear whether or how they impact decision making for novice behavior analysts. Furthermore, it is not clear whether such tools produce decision making that more closely matches that of expert behavior analysts. Whether these decision-making tools achieve more appropriate decision making should be assessed. The present study was an initial attempt to evaluate whether the FARADT produced comparable ratings of risk for expert and novice behavior analysts. BCBAs with differing levels of FA experience (i.e., “experts” and “novices”) evaluated the risk levels of potential FA circumstances (described in vignettes) with and without access to the FARADT. Risk levels reported by the experts and novices were compared at both the group and individual levels. Finally, the expert and novice participants were asked about their perceptions of the FARADT and how it impacted their decision making.
General Method
Materials
Materials included an electronic copy of the FARADT, eight vignettes, a recruitment email, and a series of surveys, as well as miscellaneous other materials.
FARADT
The FARADT (Deochand et al., 2020) was downloaded via the link to the electronic supplemental materials at the end of the article (https://link.springer.com/article/10.1007/s40617-020-00433-y). Users of the FARADT click on a rating in each of four subsets of risk, based on descriptors provided in the tool and how those align with their specific context: (1) clinical experience of the overseeing BCBA, (2) characteristics of the physical environment, (3) availability and training of support staff, and (4) intensity of the target behavior. The tool converts these ratings to an overall level of risk, which is displayed on a slider spanning four levels: slight, moderate, substantial, and high risk. Figure 1 shows an example of the slider with a “slight risk” overall rating, indicated by the shaded box. The tool produces a numeric score that is not visible to users and is used only by Excel to determine the shaded area of the slider. These numeric values were accessible to the authors of this study, as tool developers, and are shown in the figure to demonstrate the numeric risk levels reported in this study. However, it is important to note that users of the FARADT could see only the risk level descriptors; they did not see the numeric score shown in the figure.
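To make the tool's mechanics concrete, below is a minimal sketch of a FARADT-style aggregation. It assumes, purely for illustration, that the four subset ratings are summed and the total is mapped onto the four risk bands using hypothetical cut points; the actual weighting and thresholds are embedded in the published Excel workbook and are not reproduced here.

```python
# Illustrative only: the band cut points below are hypothetical, not the
# FARADT's actual thresholds, which are internal to the Excel workbook.
RISK_BANDS = [
    (8, "slight risk"),        # hypothetical: totals 4-8
    (14, "moderate risk"),     # hypothetical: totals 9-14
    (19, "substantial risk"),  # hypothetical: totals 15-19
    (24, "high risk"),         # hypothetical: totals 20-24
]

def overall_risk(clinical_experience: int, environment: int,
                 support_staff: int, behavior_intensity: int) -> str:
    """Map four 1-6 subset ratings to an overall risk category."""
    ratings = (clinical_experience, environment, support_staff, behavior_intensity)
    if not all(1 <= r <= 6 for r in ratings):
        raise ValueError("Each subset rating must be between 1 and 6.")
    total = sum(ratings)
    for upper_bound, category in RISK_BANDS:
        if total <= upper_bound:
            return category

# Example: a fairly safe context with a moderately intense behavior
print(overall_risk(clinical_experience=2, environment=1,
                   support_staff=2, behavior_intensity=3))  # -> "slight risk"
```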
Fig. 1.
FARADT output. Note. The numbers on the scale were added for clarity purposes and are not visible to users of the FARADT tool
Vignettes
The lead author constructed eight vignettes that were designed to evoke four different overall levels of risk: slight, moderate, substantial, or high risk. Two vignettes were written for each level of risk. Each vignette consisted of an average of 150 words and summarized a context in which an FA was being considered. Information pertaining to each risk subset/quadrant on the FARADT (i.e., clinical experience of behavior analyst, physical environment, support staff, and behavior intensity) was included in each vignette. For example, in one vignette, the information regarding the overseeing behavior analyst’s clinical experience level stated, “Sami has been certified as a behavior analyst for three years. She has two years of experience conducting functional analyses. She has conducted functional analyses of aggression (hair pulling and biting), inappropriate vocalizations, and self-injurious behavior (skin-picking, hand-to-head).” In another vignette, information regarding the physical environment included: “The center at which [the BCBA] works has an empty room with a round table and padded chair.” The full text of each vignette can be found in the Supplemental Materials.
Prior to commencement of the study, each of the eight vignettes was reviewed independently by ten BCBAs who had been certified for a minimum of 5 years and regularly conducted functional behavior assessments and FAs. These reviewers were excluded from participation in the present study beyond this pre-study phase. The reviewers independently read each vignette and provided a rating for each subset of risk (ratings of 1–6 per subset) on the FARADT. The tool used these ratings to calculate the reviewer's overall risk category for that vignette. In addition, the reviewers had the opportunity to write comments regarding the vignette and their scores to alert the researchers to areas of the vignettes that needed adjustment. The lead author collected these ratings, compared them to the intended risk level of each vignette, and determined agreement among the reviewers for risk scores. If at least eight of the ten reviewers agreed on the ratings of risk for each subset and the overall level of risk, the vignette was used verbatim in the study. If there was disagreement in the ratings of risk, the vignette was further refined and re-reviewed. During the first round of review, the two vignettes written to evoke scores of high risk and the two written to evoke scores of slight risk met criteria for inclusion in the study without further refinement. However, discrepancies occurred for the substantial risk and moderate risk vignettes. The authors examined the reviewers' ratings and comments, edited those vignettes, and returned them to the ten reviewers for a second round of ratings. Seven of the ten reviewers responded with updated ratings using the FARADT. If at least six of the seven reviewers agreed on the overall risk category, the vignette was used in the study as written. Of the four vignettes sent out for the second round of review, one more met criteria for inclusion. However, discrepancies remained for the three remaining vignettes. At this point, all reviewers were invited to a virtual meeting to discuss the vignettes and arrive at a consensus on their wording. Five of the seven reviewers met with the researchers via a group video conference for this purpose, and all five reached agreement on the wording and risk level of the remaining vignettes. The edited versions of these vignettes were then used in the study. The vignettes were labeled High Risk 1 (HR 1), High Risk 2 (HR 2), Substantial Risk 1 (SR 1), Substantial Risk 2 (SR 2), Moderate Risk 1 (MR 1), Moderate Risk 2 (MR 2), Slight Risk 1 (SLR 1), and Slight Risk 2 (SLR 2).
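The agreement criteria applied across review rounds amount to a simple modal-agreement check, sketched below for the overall risk category only (the first round additionally required agreement on each subset rating; in the study, tabulation was done by hand by the lead author):

```python
from collections import Counter

def meets_criterion(overall_categories: list[str], min_agree: int) -> bool:
    """True if at least `min_agree` reviewers reported the same overall
    risk category (8 of 10 in round one; 6 of 7 in round two)."""
    if not overall_categories:
        return False
    _, modal_count = Counter(overall_categories).most_common(1)[0]
    return modal_count >= min_agree

round_one = ["high risk"] * 9 + ["substantial risk"]
print(meets_criterion(round_one, min_agree=8))  # True -> vignette used verbatim
```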
E-mail and Surveys
A recruitment email was created to invite behavior analysts to participate in the study. Details on how this email was distributed are provided below in Procedures.
There were four surveys created for the study: an Inclusion Survey, a Baseline Risk Assessment Survey (hereafter referred to as the Baseline Survey), a Risk Assessment with FARADT Survey (hereafter referred to as the FARADT Survey), and a Social Validity Survey. All surveys are provided in the Supplemental Materials.
Miscellaneous Other Materials
Participants completed the study activities using their own computers, smartphones, or tablets. Amazon gift cards in the amount of $25 were provided to the first six participants in each pool (novice and expert) who completed all portions of the study (i.e., all surveys).
Participant Recruitment and Participants
After review and approval of the study's method and procedures by Western Michigan University's Human Subjects Institutional Review Board, expert and novice participants were recruited through emails sent to all BCBA and BCBA-D certificants via the Behavior Analyst Certification Board's (BACB, n.d.) Mass Email Service and the Teaching Behavior Analysis email listserv.
The first criterion for inclusion in the study, for all participants, was that they reported no prior knowledge of the FARADT. From that point, participants were sorted into one of two categories. “Expert” participants were respondents who reported at least three of the following: (1) holding a BCBA-D credential; (2) having experience conducting (i.e., acting as overseeing BCBA for) at least 15 FAs in their career; (3) having experience conducting functional analyses of aggression, self-injurious behavior, and property destruction; and (4) having been first or second author of a peer-reviewed publication that included a functional analysis. Of note, holding a BCBA credential for a specific number of years (or at all) was not a required criterion for inclusion in the expert pool, because some prominent experts in behavior analysis are not BCBAs, and we did not want to screen out those individuals. “Novices” were participants who reported having earned their BCBA credential within the previous 2 years and at least one of the following: (1) having conducted five or fewer FAs in their career as a behavior analyst and/or (2) having conducted FAs solely for relatively minor problem behaviors (crying, tantrums, and noncompliance). Six experts and nine novices completed all phases of the study. Specific participant demographics as reported on the survey are shown in Tables 1 and 2. Participants who met criteria for neither “expert” nor “novice” were excluded from the study.
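In sketch form, these screening rules can be summarized as follows. The field names are hypothetical stand-ins for Inclusion Survey items; in the study, the lead author applied the rules manually (see Procedures):

```python
def classify(r: dict) -> str | None:
    """Sort a respondent into 'expert', 'novice', or None (excluded).
    Keys are hypothetical stand-ins for Inclusion Survey items."""
    if r["prior_knowledge_of_faradt"]:
        return None  # prior knowledge of the FARADT excluded all respondents
    expert_criteria_met = sum([
        r["holds_bcba_d"],
        r["fas_overseen"] >= 15,
        r["fa_experience_with_aggression_sib_property_destruction"],
        r["first_or_second_author_of_fa_publication"],
    ])
    if expert_criteria_met >= 3:
        return "expert"
    if r["holds_bcba"] and r["years_certified"] <= 2 and (
        r["fas_overseen"] <= 5 or r["only_minor_topographies"]
    ):
        return "novice"
    return None  # met neither set of criteria
```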
Table 1.
Novice demographics
| Participant | Credential | Amount of Time Credentialed | Number of FAs Overseen | Behaviors Assessed in FAs | First or Second Author of Publication? |
|---|---|---|---|---|---|
| Novice 1 | BCBA | 0–2 years | 0–5 | Aggression, tantrum, non-compliance | No |
| Novice 2 | BCBA | 0–2 years | 0–5 | Aggression | No |
| Novice 3 | BCBA | 0–2 years | 0–5 | Tantrum, non-compliance | No |
| Novice 4 | BCBA | 0–2 years | 0–5 | Self-injurious behavior, tantrum, non-compliance | No |
| Novice 5 | BCBA | 0–2 years | 0–5 | Crying, tantrum | No |
| Novice 6 | BCBA | 0–2 years | 0–5 | Aggression, property destruction | No |
| Novice 7 | BCBA | 0–2 years | 0–5 | Aggression, property destruction, self-injurious behavior, tantrum, non-compliance | No |
| Novice 8 | BCBA | 0–2 years | 0–5 | Property destruction, tantrum, non-compliance | No |
| Novice 9 | BCBA | 0–2 years | 0–5 | Aggression, property destruction, self-injurious behavior, crying, tantrum | No |
Table 2.
Expert demographics
| Participant | Credential | Amount of Time Credentialed | Number of FAs Overseen | Behaviors Assessed in FAs | First or Second Author of Publication? |
|---|---|---|---|---|---|
| Expert 1 | BCBA-D | 3 or more years | 15 or more | Aggression, property destruction, self-injurious behavior, non-compliance | Yes |
| Expert 2 | BCBA-D | 3 or more years | 15 or more | Aggression, property destruction, self-injurious behavior, crying, tantrum, non-compliance | Yes |
| Expert 3 | BCBA-D | 3 or more years | 15 or more | Aggression, property destruction, self-injurious behavior, crying, tantrum | No |
| Expert 4 | BCBA | 0–2 years | 15 or more | Aggression, property destruction, self-injurious behavior, crying, tantrum, non-compliance | Yes |
| Expert 5 | BCBA | 3 or more years | 15 or more | Aggression, property destruction, self-injurious behavior, crying, tantrum, non-compliance | Yes |
| Expert 6 | BCBA-D | 3 or more years | 15 or more | Aggression, property destruction, self-injurious behavior, crying, tantrum | Yes |
Procedures
All sessions were conducted remotely and asynchronously via a series of Qualtrics surveys. The sequence of surveys was as follows: Inclusion Survey, Baseline Survey, FARADT Survey, and the Social Validity Survey. When a potential participant clicked the Qualtrics link in the recruitment email, an informed consent form was provided. If the respondent provided informed consent by clicking in the appropriate places, they were then presented with the Inclusion Survey. After submitting their responses to the Inclusion Survey, a screen appeared that stated: “Thank you for filling out this survey. You may qualify for participation in the study. Please enter your email below if you would like to participate in the study. An investigator will be in contact within two business days to follow up.”
Upon receiving survey responses, the lead author screened whether respondents met criteria for participation in the study by manually reviewing their responses to the inclusion survey. When a potential participant was identified, the lead author sorted the participants into “expert” and “novice” categories based on their responses to the inclusion survey and the criteria described above. Expert and novice participants were assigned to either Group 1 or 2 based on their order of enrollment. The first expert was assigned to Expert Group 1 (n = 3), the second to Expert Group 2 (n = 3), the third to Expert Group 1, and so on for a total of six experts. The novices were assigned to groups in the same manner, with the first identified novice assigned to Novice Group 1 (n = 5) and the second identified novice assigned to Novice Group 2 (n = 4), for a total of nine novices.
Next, the first author emailed each participant notifying them of their inclusion in the study and provided them with the following information: their assigned participant identifier, instructions on how to complete the study surveys, and a link to the Qualtrics surveys. Four vignettes were presented to each participant during each of two conditions (Baseline Survey and FARADT Survey, described below): one high risk, one substantial risk, one moderate risk, and one slight risk. Qualtrics was programmed to present the vignettes in a counterbalanced fashion (e.g., SR 1 vs. SR 2) across participants while ensuring that (a) each participant completed each vignette only one time during the study and (b) half of the participants in a group completed the same four vignettes in the Baseline Survey and a different set of four vignettes in the FARADT Survey, while the other half of the participants in the same group completed the vignette sets in the opposite fashion. However, owing to a technical glitch in Qualtrics, Expert Group 2 and Novice Group 2 received the Baseline Survey vignettes in a fixed (non-randomized) order. See Table 3 for a description of which groups received which vignettes during each phase of the study.
Table 3.
Order of vignette presentation
| Condition | Expert Group 1 (Experts 1–3) | Expert Group 2 (Experts 4–6) | Novice Group 1 (Novices 1–5) | Novice Group 2 (Novices 6–9) |
|---|---|---|---|---|
| Baseline Risk Assessment | SLR 2, MR 1, SR 2, HR 1 (randomized*) | SR 1, SLR 1, HR 2, MR 2 | SLR 2, MR 1, SR 2, HR 1 (randomized*) | SR 1, SLR 1, HR 2, MR 2 |
| Risk Assessment with FARADT | SLR 1, MR 2, SR 1, HR 2 (randomized*) | SLR 2, MR 1, SR 2, HR 1 (randomized*) | SLR 1, MR 2, SR 1, HR 2 (randomized*) | SLR 2, MR 1, SR 2, HR 1 (randomized*) |
*The order of vignettes within each condition (e.g., SLR 2, MR 1, SR 2, HR 1) does not reflect the order in which the vignettes were presented to participants, as these were randomized for each participant via Qualtrics’ randomizer algorithm
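To make the counterbalancing scheme in Table 3 concrete, below is a minimal Python sketch of the intended assignment logic. The function and set labels are our own illustration; the actual assignment was implemented in Qualtrics, and, as noted above, a glitch caused the Group 2 baseline vignettes to be presented in a fixed rather than randomized order.

```python
import random

SET_A = ["SLR 2", "MR 1", "SR 2", "HR 1"]
SET_B = ["SLR 1", "MR 2", "SR 1", "HR 2"]

def assign_vignettes(group: int, rng: random.Random) -> dict[str, list[str]]:
    """Group 1 rates Set A at baseline and Set B with the FARADT;
    Group 2 the reverse. Each participant sees every vignette once."""
    baseline, faradt = (SET_A, SET_B) if group == 1 else (SET_B, SET_A)
    baseline, faradt = list(baseline), list(faradt)
    rng.shuffle(baseline)  # order randomized per participant (intended)
    rng.shuffle(faradt)
    return {"baseline": baseline, "faradt": faradt}

print(assign_vignettes(group=1, rng=random.Random(0)))
```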
Vignettes were presented one at a time, and participants were instructed to read each one. Because participants completed the surveys asynchronously and remotely, we could not observe whether they actually read the survey content; therefore, participants were required to make an active response to indicate they had read each portion of a vignette. Specifically, participants were shown the first portion of a vignette and were required to click “I've read this and I'm ready to move on” before the next portion was displayed. After reading the entire vignette, participants were asked to rate the risk level, either the overall risk level or the risk level for each of the four subsets described above, depending on the experimental condition (see below). Participants then clicked a right-facing arrow to progress to the next vignette. No backtracking was allowed (i.e., participants could not return to a previous vignette and adjust their ratings). An optional “Comments” section was also provided after each vignette, in which participants could enter up to 1,500 characters of additional notes, comments, or considerations.
Baseline Survey
In the Baseline Survey, after reading each vignette, each participant was instructed to indicate their perceived overall level of risk for conducting an FA by moving a slider from slight risk (1) to high risk (12). Participants were provided with no tools or assistance for making their ratings. After participants had read and responded to all four vignettes, they clicked a button to indicate they were finished.
FARADT Survey
Next, in the FARADT Survey, participants rated risk for four different vignettes. This time, however, they were provided access to the FARADT and rated risk on each of the four subsets identified in the tool, which produced an overall rating of risk from those subset ratings. This survey began with an instruction to download the Microsoft Excel version of the FARADT from Qualtrics. Participants then were required to read a brief, four- to five-sentence introduction to the FARADT and view a brief video explaining how to use it. Finally, participants were instructed to refer to and use the FARADT when evaluating the vignettes and were required to choose “I agree” from a dropdown menu to progress to the first vignette. After reading all parts of a vignette, participants used the FARADT to evaluate the risk described in the vignette by clicking on the level of risk in each subset of the tool. Participants were also prompted to enter these same risk entries in a matrix-style question on the survey. Participants were then prompted to observe the overall risk the FARADT presented as a result of their entries on the four subsets and to enter the resulting overall risk level (slight risk, moderate risk, substantial risk, or high risk) on the survey. Finally, participants could enter any comments about the vignette and/or risk ratings. After responding to all four vignettes, participants clicked a button to indicate they were finished.
Social Validity Survey
After completing the Baseline and FARADT surveys, the Social Validity Survey appeared. Participants were presented with the instruction: “Please rate each statement with your level of agreement.” The statements were:
The risk levels evaluated by the FARADT were what I would have thought they would be.
The FARADT would be helpful for more novice BCBAs in assessing risk when conducting functional analyses.
The FARADT prompted me to consider variables that contribute to risk that I would not have considered otherwise.
After submitting their answers to the Social Validity Survey, participants were shown the following message: “Thank you for your time and effort in completing the preceding surveys! Please enter your email below. If you are one of the first 12 respondents to complete all portions of the study, you will receive an email with instructions on how to redeem your $25 Amazon gift card.” If the participant was one of the first 12 respondents to complete all portions of the study, the lead author emailed them their Amazon gift card. If the participant was not one of the first 12 respondents to complete all portions of the study, the participant did not receive any further communication from the lead author.
Independent Variable, Dependent Variables, Design, and Analysis
The independent variable was access to the FARADT while novices and experts conducted risk evaluations. There were four dependent variables related to participant ratings of risk as entered on the Qualtrics surveys. The overall risk category (slight risk, moderate risk, substantial risk, or high risk), as well as the numeric risk level within that category, was measured for each participant. In addition to evaluating the participant ratings in and of themselves, the ratings were also compared to the intended overall risk category and numeric risk rating that was agreed upon by the pre-study review group. To make this comparison, we calculated the deviation of individual participant ratings from the intended risk level agreed upon by the pre-study group.
The deviation from the intended numeric risk level was calculated for each individual participant and each vignette by subtracting the intended numeric risk level from the numeric risk level reported by the participant. For example, if the intended numeric risk level of a vignette was 4 and the participant rated the risk level as 4, the deviation was 0, indicating the participant exactly matched the intended level of risk for that vignette. If the intended level was 4 and the participant rated the risk level as 9, the deviation was 5, indicating they overestimated the intended level of risk. If the intended level was 4 and the participant rated the risk level as 1, the deviation was −3, indicating they underestimated the level of risk. The larger the deviation in absolute value (whether positive or negative), the greater the over- or underestimation. These deviations from intended numeric risk levels were graphed for each individual participant and vignette to allow for visual analysis of individual performance.
An additional measure derived from participant ratings was the mean numeric risk level reported by each group, which was compared across conditions in multiple ways. The mean numeric risk level was calculated by summing all numeric risk levels reported for a vignette by the novices or experts in a group and dividing by the number of participants in that group. For example, when Expert Group 1 (which had three experts) evaluated Vignette SLR 1, the three reported numeric risk levels were summed and divided by three.
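As a worked illustration of these two measures, consider the following sketch. The intended numeric levels for the moderate, substantial, and high risk vignettes below are those reported in the Results; the slight risk value and all participant ratings are hypothetical placeholders.

```python
from statistics import mean

# Intended numeric risk levels per vignette, as described in the Results
# (the slight-risk value of 2 is an assumed placeholder).
INTENDED = {"SLR 1": 2, "MR 1": 5, "SR 1": 7, "HR 1": 12}

def deviation(reported: int, vignette: str) -> int:
    """Reported minus intended: 0 = exact match, >0 = overestimate,
    <0 = underestimate."""
    return reported - INTENDED[vignette]

print(deviation(9, "SR 1"))  # -> 2 (overestimated the intended level of 7)
print(deviation(4, "SR 1"))  # -> -3 (underestimated)

# Group mean numeric risk level for one vignette (hypothetical ratings
# from the three members of Expert Group 1):
print(round(mean([6, 8, 9]), 2))  # -> 7.67
```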
A pre- and post-AB design across multiple participants (Cooper et al., 2019) was utilized to evaluate risk ratings in the Baseline Survey and the FARADT Survey. This quasi-experimental design allowed us to compare expert vs. novice participant performance with and without access to the FARADT within and across vignettes. Visual analysis of graphed data was used to compare responding between the Baseline Survey and the FARADT Survey. Analyses were conducted at the group level (across novice and expert participants) and within participants (across the Baseline and FARADT Surveys).
Results
Group Evaluations of Risk
First, we were interested in determining whether overall mean novice and expert numeric ratings of risk more closely matched the overall risk category (slight, moderate, substantial, or high risk) when participants used the FARADT to guide their risk evaluations than when they did not.
Mean Numeric Risk Levels: Novices Vs. Experts by Overall Risk Categories
Figure 2 depicts the mean numeric risk levels reported by experts and novices for each vignette, compared to the intended risk rating for that vignette. The bars show the mean level of risk provided by all participants in a group, and the individual data points show the risk levels provided by individuals in that group. Because results differed across risk categories, they are discussed below by the risk category of the vignettes.
Fig. 2.
Mean numeric risk ratings by vignette
Vignettes HR 1 and HR 2 were intended to evoke a numeric risk rating of 12 and a “high risk” category rating. The top panel of Fig. 2 depicts the data for HR 1 (top left panel) and HR 2 (top right panel). On average, novices evaluated HR 1 as high risk in both the Baseline Survey and the FARADT Survey. Experts slightly underestimated risk in the Baseline Survey, rating the vignette as substantial risk (mean numeric risk level of 9), but their ratings with the FARADT matched the intended risk level on average (mean numeric risk level of 11.33). Both novice and expert risk evaluations of HR 2 in the Baseline Survey were far below the intended risk level, on average. With the FARADT, however, both novice and expert ratings of risk were consistent with the intended high-risk level. Further, for both HR 1 and HR 2, there was less variability in ratings across participants when the FARADT was used.
Results for vignettes SR 1 (left) and SR 2 (right) are shown in the second panel. These vignettes were intended to depict a numeric risk rating of 7 and a “substantial risk” category. Mean novice risk ratings for SR 1 matched the intended risk level during both the Baseline Survey and the FARADT Survey. Novices also nearly matched the intended risk level for SR 2 in the Baseline Survey and matched it when provided the FARADT. Experts, on the other hand, underestimated the risk level for SR 1 in both the Baseline Survey and the FARADT Survey. For SR 2, mean expert numeric risk ratings matched the intended risk category in the Baseline Survey, but when given access to the FARADT, expert numeric ratings of risk fell below the intended category. As with HR 1 and HR 2, variability among novice ratings was reduced with the use of the FARADT. For experts, however, little change in variability occurred with the FARADT, and variability actually increased slightly in the case of SR 2.
Moderate risk vignettes (MR 1 [left] and MR 2 [right]), shown in the third panel of Fig. 2, were intended to depict a numeric risk rating of 5 and a risk category of “moderate risk.” In the Baseline Survey, novice and expert ratings of risk for MR 1 matched the intended risk level. However, with the FARADT, both groups overestimated the intended risk level. For MR 2, by contrast, both novices and experts underestimated risk in the Baseline Survey and matched the intended risk level with the FARADT. Generally, variability of ratings was reduced with the FARADT, with the exception of the novice ratings for MR 1, for which variability actually increased.
Slight risk vignettes (SLR 1 [left] and SLR 2 [right]) were intended to evoke a “slight risk” rating of risk. Both novices and experts rated risk as “slight risk” both in the Baseline Survey and with the FARADT. Variability was roughly equivalent across both novices and experts and across the Baseline Survey and the FARADT Survey.
Means of risk evaluations across participants are one measure of vignette-specific responding that can be used to evaluate the effects of the FARADT. However, because of the counterbalanced nature of the experimental design, the means in the Baseline Survey and the FARADT Survey for experts and novices reflect different participants for each vignette. That is, the participants who evaluated the HR 1 vignette in the Baseline Survey evaluated the HR 2 vignette with the FARADT, and those who evaluated HR 2 in the Baseline survey evaluated HR 1 with the FARADT. Thus, conclusions derived from the group averages may not be reflective of individual participant responding on the Baseline and FARADT Surveys. Therefore, we also evaluated performance with and without the FARADT at the individual level. These comparisons are described below.
Individual Evaluations of Risk
Novices
Figure 3 displays deviations from the intended risk level for novices in the Baseline Survey and the FARADT Survey. Seven of the nine novices (Novices 1, 3, 4, 5, 6, 8, and 9) made risk assessments that more closely matched the intended risk level of the vignettes when they used the FARADT to guide their assessment. For example, in the Baseline Survey, Novice 5’s numeric risk ratings were either slightly above or well below the intended numeric risk rating across vignettes. However, when Novice 5 used FARADT to guide their risk assessment, their assessment of risk closely matched the intended risk levels for each vignette. For Novices 2 and 7, using the FARADT to guide risk ratings produced ratings that less closely matched the intended risk level, in most cases, producing risk levels that were too high.
Fig. 3.
Novice deviation from intended risk level by vignette. Note. The order of data points does not represent the order in which participants contacted the respective vignettes
Experts
Figure 4 shows the individual-level data for experts across all vignettes, in both the Baseline and FARADT Surveys. Five of the six experts (Experts 2, 3, 4, 5, and 6) had higher agreement with the intended levels of risk when they used the FARADT to guide their ratings. For example, Expert 6’s rating of one vignette (SLR 1) matched the intended numeric risk rating in the Baseline Survey. However, Expert 6’s ratings of risk for all of the remaining vignettes in the Baseline Survey were below the intended risk rating. When using the FARADT to guide assessment of risk, Expert 6 rated three of the four vignettes in exact agreement with the intended risk rating and rated one vignette’s risk slightly below the intended risk rating.
Fig. 4.
Expert deviation from intended risk level by vignette. Note. The order of data points does not represent the order in which participants contacted the respective vignettes
For one of the six experts (Expert 2), risk assessments using the FARADT resulted in lower agreement with the intended risk level. Specifically, in the Baseline Survey, Expert 2's ratings of risk matched the intended level of risk for three of the four vignettes. However, when using the FARADT to guide risk ratings, Expert 2 underestimated risk for two vignettes and overestimated risk for the other two.
Social Validity Survey Results
All 15 participants completed the Social Validity Survey. Results for all participants are depicted in Fig. 5. Taken together, a slight majority of participants somewhat or strongly agreed that the risk levels reported by the FARADT were what they would have predicted. Approximately 87% of participants reported that the FARADT would be helpful for novice BCBAs in assessing the risk of conducting a functional analysis, and almost all participants reported that the FARADT prompted them to consider variables related to risk that they would not have considered otherwise. When viewed separately, fewer experts than novices reported that the FARADT prompted them to consider variables they would not otherwise have considered, and, in general, experts rated the utility of the FARADT lower than novices did.
Fig. 5.
Social validity results
Discussion
This study sought to evaluate whether providing experts and novices with access to the FARADT would evoke ratings of risk that closely matched the intended risk levels of a variety of vignettes. We hypothesized that expert ratings of risk would match the intended risk levels both with and without the FARADT, given their experience and expertise, but that novices' ratings would closely match intended risk levels only when they used the FARADT. Taken together, however, the results suggest that both novice and expert assessments of risk agreed more closely with intended risk levels when participants had access to the FARADT. This was demonstrated at both the individual and group levels. Further, both experts and novices reported high levels of social validity for the FARADT, with many indicating that the FARADT prompted them to consider variables they had not previously considered when evaluating risk, although novices responded more positively than experts on the social validity questionnaire. These results suggest that both novices and experts can benefit from using decision-making tools such as the FARADT to guide their evaluations of risk before performing a functional analysis.
Perhaps unsurprisingly, the high- and slight-risk vignettes produced the least variability in ratings of risk, regardless of access to the FARADT. This suggests that the signals for risk in these situations may be clearer and that the FARADT may be less influential in decision making when contextual variables indicate risk is either very low or very high. In contrast, the vignettes written to evoke moderate and substantial ratings of risk produced variability in risk ratings across both expert and novice participants. In other words, it was more difficult for participants to discriminate that risk existed, and to evaluate its extent, when risk was in the moderate to substantial range. This was consistent with the findings from our pre-study group of experts: those experts could easily agree on high- and low-risk contexts, but a consensus meeting was required to reach agreement on the moderate- to substantial-risk contexts. Contexts where risk is more “middle of the road” appear to make signal detection more difficult.
We speculated about why risk was more difficult for participants to identify in these contexts. It appeared that the language used in the vignettes, combined with individuals' experiences and histories with assessment and treatment of problem behavior, may affect which stimuli function as signals for risk. Throughout this study, we found that specific words had nuanced meanings for different individuals, depending on their histories. For example, during the video meeting, we identified portions of the vignettes that led to disagreement among our pre-study reviewers. Most of the disagreements occurred within the subset of risk associated with behavior intensity. When a vignette described the use of blocking for problem behavior (e.g., “blocking procedure is successful 95% of the time”), reviewers responded differently to the behavior intensity rating. Some reviewers rated behavior intensity lower because successful blocking limits injury; others felt risk was still high because blocking was required to limit injury. Reviewers also noted they heavily considered the permanent products of dangerous behavior. For example, reviewers noted that when a vignette described self-hair-pulling resulting in loss of hair each time the behavior occurred, it was difficult to determine behavior intensity; some asked how much total hair loss the behavior produced. Similarly, the reviewers reported they weighed the injury left by aggression (e.g., bruising, redness, or broken skin) heavily in their assessments of risk. The language used in the vignettes mattered greatly to the reviewers, and their histories of working with individuals with severe problem behavior also shaped their considerations. In general, for vignettes intended to evoke moderate to substantial risk ratings, the expert reviewers held divergent opinions but could reach consensus after discussing and clarifying information with one another. Some of the variability we observed in participants' risk ratings was likewise likely due to individual histories with problem behavior, the subtleties of language in specific vignettes, or access to the FARADT. Future research could evaluate how subtle changes in the language used to describe problem behavior alter ratings of risk.
On average, however, participants more closely matched the intended risk level when using the FARADT for contexts of moderate to substantial risk (as well as other risk levels). In the moderate risk vignettes, all participants increased their ratings of risk when provided with the FARADT, thus overestimating the risk of conducting a functional analysis. This was true even when their ratings matched the intended level of risk in the Baseline Survey, when they did not use the FARADT. In other words, when participants' ratings of moderate-risk vignettes agreed with the intended level in the Baseline Survey, their ratings increased in the FARADT Survey, indicating decreased agreement and overestimation of the intended risk level. This suggests the FARADT may prove most useful for guiding assessment of risk when risk is moderate to substantial. It is important to note, however, that the FARADT often produced risk ratings that overestimated the intended risk level. Overestimating risk is probably preferable to underestimating risk where safety is concerned. On the other hand, overestimating risk may be problematic, especially if it results in delaying or denying access to assessment of problem behavior and subsequent effective, function-based treatment. For example, if a therapist determines a functional analysis is too risky and proceeds with treatment that was not informed by a functional analysis, new risks are introduced for the client. The Ethics Code for Behavior Analysts' Code 2.01, Providing Effective Treatment, states that behavior analysts “provide services that are conceptually consistent with behavioral principles, based on scientific evidence, and designed to maximize desired outcomes for and protect all clients… from harm” (BACB, 2020). By withholding a systematic evaluation of the function of challenging behavior, treatment may not be effective, increasing the likelihood of further injury to the client or other stakeholders. High risk for conducting a functional analysis does not necessarily mean a functional analysis should not be conducted. Rather, adaptations and modifications to functional analysis procedures should be considered, as suggested in another portion of the FARADT. Users may review the Risk Reduction and Consideration tabs of the FARADT for alternative methods of analysis that mitigate potential risk and safely identify the function of the behavior.
Taken together, results of the social validity survey indicate that most participants (66.67%) reported the FARADT helped them make better assessments of risk, not that it produced assessments of risk that were off the mark. We disaggregated the social validity data by experts and novices to identify any differences in responding across the groups. Generally, novices responded more favorably on all survey items, with most novices somewhat or strongly agreeing with each statement. Experts, on the other hand, showed more variability as to whether the FARADT prompted them to think about variables contributing to risk that they would not otherwise have considered; three of the six expert participants disagreed or neither agreed nor disagreed. This is rather unsurprising, given that many experts may already be skilled at considering variables that increase risk during an FA. Most participants agreed or strongly agreed that the tool was useful for novices and that it seemed to produce assessments of risk that were on target.
As with any study, there are limitations that may constrain conclusions about the FARADT. First, the accuracy of a risk assessment is difficult to evaluate, as the result of any risk assessment has no true value: risk is a highly subjective concept, and many contextual variables can inform a determination of risk. For this reason, the term “accuracy” was not used as a dependent variable in this study; rather, we used “agreement with intended risk levels.” In an effort to increase the likelihood that our vignettes contained information that would evoke specific levels of risk and would be representative of situations behavior analysts might encounter when evaluating risk, a group of independent reviewers evaluated the vignettes prior to the study and provided their perceived risk levels. Despite these efforts, the risk levels these reviewers identified may not be the same as those other behavior analysts might identify. Further, there appeared to be nuances and subtleties that were difficult to capture in our vignettes, which our experts identified through discussion with one another. Sometimes the experts indicated they did not have enough information, or they stated they inferred information that was not present in the vignette based on their own experiences. Thus, it is important to understand that the “intended” risk level of the vignettes is only that: intended. The true risk level of any potential functional analysis context may be impossible to identify.
Similarly, we evaluated only the total risk as suggested by the FARADT. We could not evaluate how participants rated each subset of risk (clinical experience, physical environment, support staff, and intensity of the target behavior) with and without the FARADT because of the way we collected data. During the Baseline Survey, we asked participants to provide only an overall rating of risk rather than ratings on each of the four subsets. We did this because we did not want to provide instruction on these four subsets in the baseline condition; we were interested in measuring indications of risk in the absence of any further guidance. Thus, it is unknown what factors our participants considered in determining the overall level of risk during baseline and whether these changed with the use of the FARADT. An analysis of these factors would require a different approach but would be of much interest, because it may provide insights into what factors individuals take into account when evaluating risk and how they weight those factors. The FARADT considers four subset categories of risk (clinical experience of the overseeing BCBA, characteristics of the physical environment, availability and training of support staff, and intensity of the target behavior), but there may be other factors individuals consider when evaluating risk. Furthermore, the FARADT weights each of these subsets differently, and it is unclear whether all individuals weigh the subsets similarly. One person may weigh a given subset more heavily than another person does, while the second person weighs a different subset more heavily. In the end, these different weightings may or may not “cancel each other out” and produce the same overall level of risk, as illustrated in the sketch below. Better understanding the factors at play in a risk analysis, and the weight each factor should carry, might assist in further refining decision-making models for analyzing risk.
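As a simple numeric illustration of this point, consider two hypothetical raters who apply different weights to the same four subset ratings (neither the weights nor the ratings come from the FARADT itself):

```python
# Hypothetical example: weighted aggregation of four subset ratings.
ratings_cancel = [3, 4, 4, 3]   # experience, environment, staff, intensity
ratings_differ = [3, 4, 2, 5]

weights_a = [0.4, 0.2, 0.2, 0.2]  # Rater A emphasizes clinical experience
weights_b = [0.2, 0.2, 0.2, 0.4]  # Rater B emphasizes behavior intensity

def weighted_total(weights: list[float], ratings: list[int]) -> float:
    return sum(w * r for w, r in zip(weights, ratings))

# Different emphases, same overall score (the weightings "cancel out"):
print(weighted_total(weights_a, ratings_cancel))  # -> 3.4
print(weighted_total(weights_b, ratings_cancel))  # -> 3.4

# Same two raters, a different vignette, diverging overall scores:
print(weighted_total(weights_a, ratings_differ))  # -> 3.4
print(weighted_total(weights_b, ratings_differ))  # -> 3.8
```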
Other limitations relate to the study's design. We attempted to control for practice effects, and for variability that might be caused by specific information in the vignettes, by counterbalancing vignettes across experts and novices and across the Baseline and FARADT Surveys. Thus, the vignettes an individual evaluated in the Baseline Survey and in the FARADT Survey were different, and it is unknown how a participant's rating of any one specific vignette would have changed when given the FARADT. Another approach would have been to have participants review the same vignettes a second time with the FARADT; such a design would control for vignette-specific differences but would introduce the practice effects our design ruled out. Future research might consider having participants review the same vignettes in both the Baseline Survey and the FARADT Survey to determine whether participants consider the same contextual variables differently when they assess risk with the FARADT. It should also be noted that a glitch in Qualtrics caused Novice Group 2 and Expert Group 2 to receive a fixed sequence of vignettes in the Baseline Survey; it is unclear what effect, if any, this had on outcomes. Finally, this study included a relatively small sample size, and the results are likely not reflective of a representative sample. A larger participant pool and the use of a group design may have provided more conclusive results. These data were, and should be, considered “pilot” data suggesting that further study of the FARADT, perhaps using a group design, may be warranted.
Whether the participants in our study were, in fact, “experts” in functional analysis, decision making, or ethics is unknown. Our inclusionary criteria for “expert” participants were based on years of experience conducting functional analyses and topographies of problem behavior evaluated. It was difficult for us to determine what level of experience in any of these categories made someone an “expert.” Furthermore, we did not include criteria for expertise in ethics or decision making to qualify as “experts.” In hindsight, these criteria may have been necessary for expertise in determining risk. Having expertise in functional analysis may or may not mean an individual has a history of making ethical decisions regarding functional analyses. Evaluating this as an inclusionary criterion for “expertise” would be difficult. However, future researchers on this topic may wish to consider including experts in ethics and decision making in their participant pool.
Finally, because the study was conducted remotely and asynchronously, it is not possible to know whether participants actually downloaded and used the tool to conduct their evaluations of risk. It seems likely they did, given their feedback on the social validity survey and because they were required to play a video about the tool, but it is possible they did not. Another possibility that cannot be ruled out is that participants used other resources when completing the surveys. This may limit our confidence that the FARADT alone produced the changes observed from the Baseline Survey to the FARADT Survey.
The comments provided by participants afforded insight into potential future improvements to the FARADT. One expert noted that the FARADT addresses only the potential risk involved in conducting a standard functional analysis. Although the FARADT provides recommendations for potential modifications or other types of analyses (e.g., choice assessments, brief FAs, trial-based FAs) based on the reported subsets of risk, the tool itself does not take those modifications into consideration in the initial assessment of risk. It may be advisable to modify the FARADT in some way to allow for these variations. Additionally, one novice reported that the language in the FARADT was “vague,” and another suggested separating tissue damage from the ability to block the target behavior in the behavior intensity subset of risk. These suggestions were informative regarding other factors and nuances individuals consider when evaluating risk.
As evidenced by these comments, weighing the risks and benefits of conducting a functional analysis is an important and difficult process. As is true for all ethical and moral dilemmas, behavior analysts must weigh multiple considerations to come to a decision, and each individual may weigh these considerations differently based on their unique history. The varied responses by the experts in this study, and the number of rounds of revision required for the reviewers to agree on the risk levels described in the vignettes, provide evidence of the difficult nature of risk assessment, especially in the mid-level risk range (i.e., substantial and moderate). Despite these challenges, behavior analysts must rise to the occasion and make ethical decisions across assessment and intervention as outlined in the BACB Ethics Code. Ethical decision making incorporates the consideration of many contextual variables, and making some of those variables explicit during a behavior analyst's training may allow for the development of an ethical decision-making repertoire. Decision-making tools may assist newly certified behavior analysts, or behavior analysts with limited experience, in making ethical and sound decisions and behaving in ways consistent with the ethics code. Although some decision-making tools exist (e.g., Deochand et al., 2020; Geiger et al., 2010; LeBlanc et al., 2016), they are seldom (if ever) evaluated to determine whether they alter decision making, so the empirical research on the efficacy of these tools is extremely limited. This study represents one attempt to empirically evaluate the effects of one such tool on decision making. At the same time, it is important for practitioners to understand that tools like the FARADT are not intended to make definitive decisions for a behavior analyst. Rather, such tools should be used to guide conversations among colleagues and evoke consideration of the multiple variables that should influence clinical decisions; given the results of this study, the FARADT may be useful for helping to guide those discussions. As a field, we have limited data on how to effectively teach and measure decision-making skills, and it is our hope that this study will encourage future research in this area, including research on whether decision-making tools effectively impact the covert verbal behavior in which individuals engage during the analytical process. After all, decision making of this type is the “analysis” in “behavior analysis.”
Acknowledgments
We thank Dr. Jonathan Baker and Dr. Rebecca Eldridge for their contributions to this manuscript. We also express sincere gratitude to John Staubitz, Dr. Michael Kranak, Dr. Adam Briggs, Nathan VanderWeele, Dr. Kelly Schieltz, Dr. Sarah Mead Jasperse, and Colleen O’Grady for their invaluable input and assistance with developing the materials for this study. We thank the Behavior Analyst Certification Board® (BACB®) for allowing us to distribute the surveys used for this study via their Mass Email Service.
Author Note
This study was completed by the first author in partial fulfillment of a doctoral degree in behavior analysis at Western Michigan University.
Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Declarations
Conflict of interest
We have no conflicts of interest to disclose.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Behavior Analyst Certification Board. (2020). Ethics code for behavior analysts. https://bacb.com/wp-content/ethics-code-for-behavior-analysts/
- Behavior Analyst Certification Board. (n.d.). BACB certificant data. https://www.bacb.com/BACB-certificant-data
- Cooper, J. O., Heron, T. E., & Heward, W. L. (2019). Applied behavior analysis (3rd ed.). Pearson Education.
- Deochand, N., Eldridge, R. R., & Peterson, S. M. (2020). Toward the development of a functional analysis risk assessment decision tool. Behavior Analysis in Practice, 13(4), 978–990. 10.1007/s40617-020-00433-y
- Geiger, K. B., Carr, J. E., & LeBlanc, L. A. (2010). Function-based treatments for escape-maintained problem behavior: A treatment-selection model for practicing behavior analysts. Behavior Analysis in Practice, 3(1), 22–32. 10.1007/BF03391755
- Iwata, B. A., & Dozier, C. L. (2008). Clinical application of functional analysis methodology. Behavior Analysis in Practice, 1(1), 3–9. 10.1007/bf03391714
- LeBlanc, L. A., Raetz, P. B., Sellers, T. P., & Carr, J. E. (2016). A proposed model for selecting measurement procedures for the assessment and treatment of problem behavior. Behavior Analysis in Practice, 9(1), 77–83. 10.1007/s40617-015-0063-2
- Neef, N. A., & Iwata, B. A. (1994). Current research on functional analysis methodologies: An introduction. Journal of Applied Behavior Analysis, 27(2), 211–214. 10.1901/jaba.1994.27-211
- Spielthenner, G. (2012). Risk-benefit analysis: From a logical point of view. Journal of Bioethical Inquiry, 9, 161–170. 10.1007/s11673-012-9366-y