Behavior Analysis in Practice
2024 Feb 15;17(2):643–655. doi: 10.1007/s40617-024-00910-8

A Practitioner’s Guide to Measuring Procedural Fidelity

Cody Morris, Stephanie H. Jones, Jacob P. Oliveira
PMCID: PMC11219619  PMID: 38966272

Abstract

Ensuring high levels of procedural fidelity during behavior-analytic interventions is a crucial component of providing effective behavior-analytic services. However, few resources are available to help guide practitioners through measuring procedural fidelity. In fact, most published behavior-analytic research on procedural fidelity analyzes a single treatment procedure, which might not completely reflect the process of monitoring and addressing the procedural fidelity of a robust treatment package that might be necessary in clinical settings. The purpose of this article is to guide behavior analysts through the process of creating and using procedural fidelity measurement systems, with a focus on direct observation of implementation as a means of fidelity data collection. This process consists of six steps: (1) task analyze treatment procedures into measurable units; (2) assign measures to each treatment component; (3) plan the direct observation; (4) collect procedural fidelity data; (5) analyze and interpret procedural fidelity data; and (6) take action to improve procedural fidelity. Each step is described and discussed in the article.

Keywords: Procedural fidelity, Treatment fidelity, Treatment monitoring, Treatment integrity, Procedural integrity, Practitioners


Procedural fidelity (PF), sometimes used interchangeably with the term treatment integrity (Han et al., 2022), is the extent to which a treatment is implemented as designed (Gresham, 1989; Vollmer et al., 2008). In other words, PF describes how well treatment implementation aligns with the way the treatment is described in the plan/protocol. For example, a treatment plan could specify that a clinician must provide praise for each correct academic response that a client emits. If the clinician provides praise as specified within the plan for each correct response, they would be implementing this component of the plan with perfect PF. If the clinician provides praise for every other correct response, the failures to praise each correct response would be considered PF errors. In this example, the frequency of errors would correspond with relatively low PF (specifically a 50% PF score) for this component of the plan.

There are several reasons that PF is essential for effective and ethical behavior-analytic practice. One reason is that high PF1 correlates with treatment effectiveness (Fryling et al., 2012; Wilder et al., 2006), which means an intervention is more likely to work if it is implemented correctly. In addition, most of the evidence that supports behavior-analytic interventions is based on research interventions that were conducted under high PF conditions (Hagermoser Sanetti & Kratochwill, 2008). Therefore, it can be argued that treatments implemented with poor PF would not meet the criteria to be considered evidence-based. In fact, investigations of reduced PF implementation of treatments suggest that low-fidelity implementation reduces the effectiveness of treatments (Brand et al., 2019; Fryling et al., 2012).

Issues can arise from low PF in a variety of treatment contexts (Brand et al., 2019; Falakfarsa et al., 2021). For example, research evaluating the impact of PF failures in problem behavior contexts has shown that research participants are more likely to continue engaging in problem behaviors and less likely to engage in replacement behaviors when PF is low (Arkoosh et al., 2007; St. Peter Pipkin et al., 2010). Likewise, research evaluating the impact of PF failures in skill acquisition contexts has shown that research participants are less likely to demonstrate targeted skills when PF is low (Carroll et al., 2013). Thus, behavior analysts must always be aware of threats caused by PF failures and strategies for addressing/preventing PF issues regardless of the treatment contexts in which they provide services.

PF failures are often categorized into one of two specific types of failure: omission errors and commission errors. Omission errors are defined generally as failing to implement a component of a prescribed procedure, and commission errors are defined generally as actively implementing a component that is not part of the prescribed procedure. Although both omission and commission errors can be problematic, they sometimes have idiosyncratic impacts across interventions. For example, St. Peter Pipkin et al. (2010) indicated that commission errors were more detrimental than omission errors when treating problem behavior. In contrast, Carroll et al. (2013) indicated that omission errors reliably slowed the rate of target acquisition whereas commission errors produced mixed results (i.e., one client’s acquisition of the skill was not affected, one client gained the skill after several sessions, and the last client never gained the skill). The implications of these findings are compelling: some PF errors may cause clients to take more time to gain skills that are important for them to successfully navigate their environment, and other PF errors may cause clients to never gain those skills. These findings also highlight that fidelity errors can have idiosyncratic effects across clients.

Monitoring and measuring PF is essential to addressing issues related to poor/low PF. To ensure effective strategies are used to monitor and measure PF, careful consideration must be given to the modality of measuring PF. Like many behaviors, PF can be measured directly and/or indirectly. An example of a direct measure of PF would be a supervisor observing and collecting data on a supervisee implementing a treatment. Examples of indirect measures of PF include permanent products and self-report. Although indirect measures are convenient and might be necessary in some treatment contexts (Hagermoser Sanetti & Fallon, 2011; Sanetti & Collier-Meek, 2014), they are more susceptible to confounding variables than direct measures (Johnston et al., 2020). Thus, direct measures of PF should be prioritized and used when possible.

Vollmer et al. (2008) provided recommendations for directly collecting count measures of PF for a differential reinforcement of alternative behavior (DRA) intervention. We aim to extend this line of research by describing a process for establishing a PF measurement system that can be adapted to a variety of intervention packages (e.g., DRA, mand training, and natural environment teaching [NET]) rather than a single intervention (e.g., DRA) and capture a variety of measures of PF (e.g., latency, duration, interresponse time). The purpose of this article is to provide recommendations for creating successful PF measurement systems for individual clients’ treatment packages that benefit from direct observation of implementer–client interactions.

Creating Procedural Fidelity Measurement Systems

Although the concept of measuring PF is relatively straightforward (i.e., record how well a treatment is being implemented), the actual practice of measuring PF can be complicated (Fallon et al., 2020; Suhrheinrich et al., 2020; Vollmer et al., 2008). One of the most imposing complications to measuring PF is that PF measurement systems must capture idiosyncratic client, treatment, and context needs. There are no all-purpose or standardized PF measurement systems that can be acquired and applied to all treatment situations. Even treatment contexts that are narrow/specific in their purpose would not likely be able to use a single PF measurement system unless they only provide a single, standardized treatment to all clients. Therefore, the recommendations provided in this article are not designed to help create a single, one-size-fits-all PF system. Instead, the recommendations described here are meant to guide practitioners through the critical steps of creating individualized PF measurement systems that can be adapted to fit the diverse needs of clients and their treatment contexts.

The recommendations for creating PF measurement systems are divided into the following six steps: (1) task analyze treatment procedures into measurable units; (2) assign measures to each treatment component; (3) plan the direct observation; (4) collect PF data; (5) analyze and interpret PF data; and (6) take action to improve PF. Each step is described in detail in the following sections.

Step 1. Task Analyze Treatment Procedures into Measurable Units

The purpose of any PF measurement system is to accurately measure the degree to which a treatment is being implemented as planned so that it can be evaluated. Thus, measurement systems are idiosyncratic to the treatment that is supposed to be in place. Therefore, the process of creating a PF measurement system must begin with identifying and task-analyzing the client’s treatment procedures (Collier-Meek et al., 2020; Vollmer et al., 2008).

Identifying and task-analyzing treatment procedures is heavily influenced by the treatments that are used to support a client. This step of the process should not be conducted until appropriate and effective evidence-based treatments have been arranged for a client. Furthermore, all treatments should be described technologically in something akin to a behavior plan or protocol as part of standard practice. If a treatment is not described technologically prior to arranging the PF measurement system, doing so should be a priority.

Describing treatment technologically consists of task analyzing procedures (i.e., breaking the procedures into their component parts) and describing them in a way that would enable any reasonably qualified person to implement the plan as intended. Technological descriptions of treatments must be detailed, objective/behavioral, and complete. Detailed descriptions provide an exact depiction of the treatment components. Objective/behavioral descriptions are those that refer only to observable/measurable behaviors. Complete descriptions are thorough and include instructions for all relevant components. Together, a treatment description that is detailed, objective/behavioral, and complete is one that clearly outlines exactly how to implement the treatment effectively.

Technological descriptions of treatments should be arranged sequentially and/or conditionally depending on their function (Vollmer et al., 2008). Sequential descriptions describe the order in which the steps of an intervention are supposed to occur regardless of specific events or conditions. For example, a toothbrushing routine is largely sequential. After the client puts toothpaste on the toothbrush, staff should prompt the client to use the toothbrush to scrub all teeth, etc. On the other hand, a conditional description describes a reaction to a specific event or condition. For example, when a student begins to threaten another student with physical harm, the teacher should redirect the student to a different area of the classroom. Both sequential and conditional treatment descriptions are dependent on contexts, but the conditions required for sequential descriptions are embedded within the sequence, whereas conditional descriptions are based on conditions that are produced outside of the task. Some treatment components may be both sequential and conditional. For example, the conditional response to the client who begins to threaten another student with physical harm might require sequential steps once the conditional response begins. Thus, the descriptions of treatments may require one or both arrangements to sufficiently describe what should be done and under what circumstances it should be done.

Once all treatment components are described technologically, they can be used as the basis for the PF measurement system. An important step in preparing PF measurement systems for treatment procedures is separating each individual component into its own unit and organizing the components by sequence or condition. It is often helpful to organize the components under different heading levels based on their sequential or conditional requirements. For example, a standard sequential description could have a first-level heading that says, “during the toothbrushing routine,” followed by a list of the steps involved with toothbrushing in a second-level heading. For conditional descriptions, the first-level heading should be used for the initial conditional requirement, and then subsequent headings should be used to describe the treatment components. For example, a conditional description might have a first-level heading that says, “when the client begins to threaten another student with physical harm,” followed by recommended steps to respond to the threats in second-level headings. Conditional descriptions might require multiple heading levels depending on how many conditional components are involved with the plan. Treatment components that are separated and logically organized into a single document that overviews the essential components of the treatment are called a procedural fidelity checklist.
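For practitioners who maintain checklists electronically, the heading-based organization described above maps naturally onto a nested data structure. The sketch below is purely illustrative (the headings and steps are hypothetical examples drawn from this section, not a prescribed format):

```python
# Each first-level heading (a sequence or condition) maps to the ordered
# list of treatment components that fall under it.
pf_checklist = {
    "during the toothbrushing routine": [
        "Prompt the client to put toothpaste on the toothbrush",
        "Prompt the client to scrub all teeth with the toothbrush",
    ],
    "when the client begins to threaten another student with physical harm": [
        "Redirect the student to a different area of the classroom",
    ],
}

# Print the checklist with its heading levels preserved.
for heading, components in pf_checklist.items():
    print(heading)
    for component in components:
        print("  -", component)
```

Because each component occupies its own entry, each can later be assigned its own measure and scored in isolation, as Step 2 requires.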

If the treatment plan is described in a narrative format, it must be further broken into a sectioned format to isolate each component. Take, for example, this procedure:

Before meal preparation, staff will inform Jack that the meal is being prepared, that he should stay out of the kitchen during this time, and what time the meal will be ready. After the meal is ready, staff will inform Jack that he can enter the kitchen and get his food.

Although this narrative treatment description is relatively easy to follow, it is not separated into its individual components or obviously organized by sequence and/or condition. Separating and organizing the individual components by sequence and/or condition is important because it allows for individual evaluation of each component part. Thus, the procedure from the previous example should be transferred to a format similar to that shown in Table 1.

Table 1.

Example of identifying and task analyzing the treatment procedures

Before mealtime:
  Staff will inform Jack . . .
  That the meal is being prepared
  That he should stay out of the kitchen during this time
  What time the meal will be ready
After the meal is ready:
  Staff will inform Jack that he can enter the kitchen and get his food.

The format shown in Table 1 organizes the before- and after-meal conditional contexts as first-level headings and the treatment components as second- and third-level headings. The before-meal preparation component requires three related but different behaviors from staff: they must tell Jack three different things. The after-meal component has only one requirement: to tell Jack one thing. It is important that each individual statement be isolated because combining the three different statements into a single component would confound the measurement of each individual statement. Thus, each component is separated into a logical unit that can be measured in isolation.

After and/or during the process of identifying and task-analyzing the treatment procedures, unclear and difficult-to-measure treatment components should be removed or revised. Treatment components that cannot or will not be measured should be removed from the PF checklist, whereas treatment components that need to be better explained, defined, or elaborated should be adjusted accordingly. It is essential that every component remaining on the PF checklist be appropriate and measurable after this step.

When best practices are followed during treatment planning, there will be few, if any, treatment components ruled out or adjusted. Nevertheless, issues related to the measurability of a treatment component may arise during this process. For example, the Ethics Code for Behavior Analysts (Behavior Analyst Certification Board [BACB], 2020) specifies that all clients should be treated with compassion, dignity, and respect. Although there is no question that treating clients with compassion, dignity, and respect is important, the concepts of compassion, dignity, and respect may be prone to subjective and inconsistent interpretations because they are used relatively infrequently in behavior analytic research but frequently in the common vernacular. As a result, embedding those types of terms and concepts into treatment plans without operationally defining them would render accurate PF measurements extremely challenging, if not impossible. Thus, when terms like this are embedded within treatment components, it is important that they be operationally defined in the treatment plan and PF checklist to decrease the subjectivity of the terms.

There are a couple of strategies that might be helpful when operationally defining terms that are prone to subjective and inconsistent interpretations, like compassion, dignity, and respect. The first strategy is to review behavior-analytic literature for existing operational definitions of the terms and concepts in question. For example, resources for conceptualizing and operationally defining compassion have recently become increasingly available in behavior-analytic journals (e.g., LeBlanc et al., 2020; Rohrer et al., 2021; Taylor et al., 2019). Another strategy that might assist with operationalizing challenging terms is to consider rescoping and simplifying the terms and concepts in question. For example, compassion, dignity, and respect are broad concepts that might manifest differently depending on the treatment context. In some cases, it might be helpful to utilize and operationalize specific behaviors that might fit under compassion, dignity, and respect instead of the broader terms. For example, it might be simpler to specify that staff should refer to clients with respectful salutations such as “sir” or “ma’am” (see Reid et al., 2018) instead of broadly stating that they are expected to treat clients with compassion, dignity, and respect.

Step 2. Assign Measures to Each Treatment Component

Assigning measures to each treatment component is a necessary step toward measuring PF (Kodak et al., 2022; Vollmer et al., 2008). The basic goal of assigning measures is to identify the most pertinent dimension(s) of the behavior involved with the treatment component and arrange an individualized measure of that dimension so that those components can be accurately measured and intervened on. It is essential that each treatment component listed on the PF checklist be evaluated individually and assigned a separate measure during this process so that the evaluation of each component is not confounded by another variable or inappropriate measure.

There are three measurable dimensions of behavior often referred to as repeatability, temporal extent, and temporal locus (Cooper et al., 2019; Johnston et al., 2020). Repeatability refers to the countable instances of behavior and includes specific measures like count and frequency/rate. Temporal extent refers to the total amount of time a behavior occurs and includes specific measures like duration. Finally, temporal locus refers to the time between a behavior and another event and includes specific measures like latency and interresponse time (IRT). Any given behavior may be most accurately captured by using one or more of these basic measures. It is important that each individual component be evaluated and assigned a measure that effectively captures the critical aspects of that behavior.

Table 2 shows a section of a multicomponent treatment procedure with each treatment component isolated. When reviewing the treatment components, it is evident that the components cannot all be measured using the same process. For example, notifying Aiden that the interval is starting, providing intermittent encouragement/praise to Aiden at least five times throughout the interval, redirecting Aiden within 15 s of problem behavior, giving Aiden access to a 3–4-min enriched break, and providing attention every 30 s might all require different measures. Notifying Aiden about the interval could involve a dichotomous (e.g., yes or no) count measure. Providing intermittent encouragement/praise would require a specific count (i.e., a tally of the total count). Redirecting Aiden within 15 s would require a latency measure. Giving Aiden a 3–4-min enriched break would require a duration measure. Finally, providing attention every 30 s would require an IRT measure.

Table 2.

Example treatment procedures

At the start of the interval:
  Set a timer for a 15-min interval
  Notify Aiden that the interval is starting
During the interval:
  Continuously monitor Aiden’s behavior throughout the interval
  Provide intermittent encouragement/praise to Aiden at least 5 times throughout the interval
If problem behavior occurs:
  Withhold reinforcement
  Redirect Aiden to engage in the relevant classroom activity . . .
  Within 15 s of the behavior beginning
  Then, reset the timer
If problem behavior does not occur:
  Praise Aiden for completing the interval
  Give Aiden a 3–4-min enriched break from his work
  Give Aiden access to his preferred tangible
  Provide attention to Aiden every 30 s on average

As demonstrated in Table 2, even short and simple PF checklists might require several different measures to accurately capture the responses of interest. If not arranged carefully, the use of different measures for different treatment components could complicate the analysis of the PF because different measures cannot easily be compared. Thus, each measure should be taken and evaluated against its own standard and calculated as the percentage of the criterion met (i.e., component scores). For dichotomous measures, a component score of 0% or 100% will be provided, depending on whether the component was correctly implemented. Other measures will produce more nuanced scores. Specific count measures would be calculated by comparing the obtained count to the intended count (e.g., correct responses vs. opportunities to respond). When errors are indicated by lower obtained counts as compared to the intended count (i.e., omission errors), the obtained count should be divided by the intended count. On the other hand, when errors are indicated by higher obtained counts as compared to the intended count (i.e., commission errors), the intended count should be divided by the obtained count.2 For example, if a treatment procedure is supposed to occur five times and it only occurs four, the component score would be calculated as 4/5 = 80%. However, if the treatment procedure occurred seven times instead of five times as intended, the component score would be calculated as 5/7 = 71.42%. Time-based measures like duration, latency, and IRT should be calculated in the same way by substituting the obtained and intended counts for the obtained and intended times. For example, if an intervention requires a duration of 3 min (180 s) and the implementation only occurred for 2.5 min (150 s), the percentage of the criterion met would be calculated as 150/180 = 83.33%. A duration of 4 min (240 s) would be calculated as 180/240 = 75%.
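The scoring rules described above can be summarized in a short helper function. This is a minimal sketch, not part of the original article; the function names are ours, and the range variant reflects the convention (used later in Table 4) that a value falling anywhere within a prescribed range, such as a 3–4-min break, earns a 100% component score:

```python
def component_score(obtained, intended):
    """Percentage of criterion met for a single treatment component.

    Shortfalls (omission-style errors) divide obtained by intended;
    overruns (commission-style errors) divide intended by obtained,
    so scores never exceed 100%. Works for counts or for times in seconds.
    """
    if intended <= 0:
        raise ValueError("intended criterion must be positive")
    if obtained <= intended:
        return 100 * obtained / intended
    return 100 * intended / obtained


def component_score_range(obtained, low, high):
    """Score against a range criterion (e.g., a 180-240 s enriched break)."""
    if low <= obtained <= high:
        return 100.0
    if obtained < low:
        return 100 * obtained / low
    return 100 * high / obtained


component_score(4, 5)                 # omission: 4/5 = 80%
component_score(7, 5)                 # commission: 5/7, about 71.4%
component_score(150, 180)             # short duration: about 83.3%
component_score(240, 180)             # long duration: 180/240 = 75%
component_score_range(193, 180, 240)  # within the prescribed range: 100%
```

Applied to the worked examples in the paragraph above, the function reproduces each reported score.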

It is important to note that it can be challenging to capture every possible commission error on a PF checklist because commission errors can vary so widely. However, conceptualizing commission errors as either an excessive action (i.e., implementing a component of the treatment more than is prescribed) or an extraneous action (i.e., doing something that is not related to the treatment plan) provides a framework that can be embedded into a PF checklist. When excessive action commission errors occur, they can be captured and calculated via the process described in the previous paragraph. When extraneous action commission errors occur, they can simply be noted on the PF checklist without the need for any measurement mechanism because any occurrence of an extraneous action would be considered problematic.

Table 3 shows the accumulated steps of selecting and arranging PF measures onto an example PF checklist based on Table 2. Each treatment component is separated into isolated and measurable units under the treatment component column. Next to each treatment component is the specific measure that is to be taken for that component, along with the measurement instructions. As discussed previously, some of the measures are dichotomous or specific counts, and others are duration, latency, and IRT measures. Each measure has embedded recording parameters. For example, dichotomous measures require a “yes” or “no” to be marked, and specific counts require a tally. All of the time-based measures require a time to be input into the sheet, and the IRT requires multiple time recordings to be averaged for the final score because there are six to eight opportunities for a 30-s IRT within a 3–4-min period of time. The final column shows the component score based on the percentage of the criterion met for each individual treatment component. Although this exact layout will not be useful for every PF checklist, arranging the treatment components, measures, and component scores into separate columns or sections may help organize the measurement process.

Table 3.

Example of PF data collection sheet

Treatment Component Measure Component Score
At the start of the interval:
  Set a timer for a 15-min interval Circle- Yes/No ( )
  Notify Aiden that the interval is starting Circle- Yes/No ( )
During the interval:
  Continuously monitor Aiden’s behavior throughout the interval Circle- Yes/No ( )
  Provide intermittent encouragement/praise to Aiden at least 5 times throughout the interval Tally Frequency- ( )/5 ( )
If problem behavior occurs:
  Withhold reinforcement Tally- Correct ( )/Opportunities ( ); Calculate Percent Correct- Correct ( )/Opportunities ( ) ( )
  Redirect Aiden to engage in the relevant classroom activity . . . Circle- Yes/No ( )
  Within 15 s of the behavior beginning Record Latency- ( s. )/15 s. ( )
  Then, reset the timer Circle- Yes/No ( )
If problem behavior does not occur:
  Praise Aiden for completing the interval Circle- Yes/No ( )
  Give Aiden a 3–4-min enriched break from his work Record Duration- ( s. )/180 s. ( )
  Give Aiden access to his preferred tangible Circle- Yes/No ( )
  Provide attention to Aiden every 30 s on average Record IRT- 1. ( s. ), 2. ( s. ), 3. ( s. ), 4. ( s. ), 5. ( s. ), 6. ( s. ), 7. ( s. ), 8. ( s. ); Calculate Average IRT- ( s. )/30 s. ( )

Table 4 shows a scored version of Table 3 as an example of the measurement and scoring system. All dichotomous measures are recorded by circling “Yes” or “No” and scored based on whether the component occurred or not. The specific count measure was recorded by tallying the frequency (i.e., three instances). That frequency was then compared to the criterion (i.e., five instances) by dividing three by five. The latency measure was recorded by measuring the time between Aiden’s problem behavior and staff redirecting him (i.e., 18 s), and compared to the criterion (i.e., within 15 s) by dividing 15 by 18. The duration measure was recorded by measuring the length of Aiden’s enriched break (193 s) and compared to the criterion (180–240 s). Finally, the IRT measure was obtained by recording each IRT (opportunities 7 and 8 were not applicable because the break only lasted 193 s), and then averaging those IRTs, which equaled 31.16 s. The averaged IRT time was then compared to the criterion (30 s) by dividing 30 by 31.16. Individual component scores were determined by these comparison calculations.

Table 4.

Example of completed PF data collection sheet

Treatment Component Measure Component Score
At the start of the interval:
  Set a timer for a 15-min interval Circle- (Yes)/No ( 100% )
  Notify Aiden that the interval is starting Circle- Yes/(No) ( 0% )
During the interval:
  Continuously monitor Aiden’s behavior throughout the interval Circle- (Yes)/No ( 100% )
  Provide intermittent encouragement/praise to Aiden at least 5 times throughout the interval Tally Frequency- (III)/5 ( 60% )
If problem behavior occurs:
  Withhold reinforcement Tally- Correct (I)/Opportunities (III); Calculate Percent Correct- Correct ( 1 )/Opportunities ( 3 ) ( 66.67% )

  Redirect Aiden to engage in the relevant classroom activity . . . Circle- (Yes)/No ( 100% )
  Within 15 s of the behavior beginning Record Latency- ( 18 s. )/15 s. ( 83% )
  Then, reset the timer Circle- (Yes)/No ( 100% )
If problem behavior does not occur:
  Praise Aiden for completing the interval Circle- Yes/(No) ( 0% )
  Give Aiden a 3–4-min enriched break from his work Record Duration- ( 193 s. )/180 s. ( 100% )
  Give Aiden access to his preferred tangible Circle- (Yes)/No ( 100% )
  Provide attention to Aiden every 30 s on average Record IRT- 1. ( 39 s. ), 2. ( 26 s. ), 3. ( 41 s. ), 4. ( 37 s. ), 5. ( 15 s. ), 6. ( 29 s. ), 7. ( NA ), 8. ( NA ); Calculate Average IRT- ( 31.16 s. ) ( 96% )
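The IRT arithmetic in Table 4 can be reproduced in a few lines (a sketch; the variable names are ours). Note that the mean of the six recorded IRTs is 31.166… s, which Table 4 truncates to 31.16 s:

```python
# IRTs recorded in Table 4; opportunities 7 and 8 were not applicable
# because the break lasted only 193 s.
irts = [39, 26, 41, 37, 15, 29]    # seconds
criterion = 30                     # "every 30 s on average"

avg_irt = sum(irts) / len(irts)    # about 31.17 s
# The obtained average exceeds the criterion (a commission-style overrun),
# so the intended time is divided by the obtained time.
score = 100 * criterion / avg_irt  # about 96%
```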

Tables 5, 6, and 7 show examples of different treatment components that have been task analyzed into measurable units (Step 1) and assigned measures (Step 2). It is important to note that each of the example PF data collection sheets is set up for spot checks (i.e., the observation of a single iteration of the intervention) as a way to demonstrate a diverse range of measures within the same PF data collection sheet. For example, in Table 3, components like “set a timer for a 15-min interval” are set up for dichotomous measures because a spot check would only capture one iteration of the interval being set. If multiple iterations of the intervention are to be observed in the same observation (e.g., multiple 15-min intervals are going to be observed and recorded), dichotomous measures like setting the 15-min interval should be converted to a specific count measure that captures the number of correct responses and opportunities. A clear example of an applicable specific count measure that could replace the dichotomous measure examples is shown in the measurement for withholding reinforcement when problem behavior occurs in Table 3.
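The conversion described above, from a spot-check dichotomous measure to a specific count across multiple observed iterations, amounts to pooling the per-iteration yes/no records. A hypothetical sketch (the data are invented for illustration):

```python
# One yes/no record per observed 15-min interval for "set a timer".
timer_set_correctly = [True, True, False, True]

correct = sum(timer_set_correctly)     # True counts as 1, False as 0
opportunities = len(timer_set_correctly)
score = 100 * correct / opportunities  # 3/4 = 75%
```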

Table 5.

Example of PF data collection sheet for a skill acquisition procedure—Shape identification

Treatment Component Measure Component Score
Before starting work:
  Remove all nonwork materials from the work area Circle- Yes/No ( )
  Place token board on the desk with correct response requirement Circle- Yes/No ( )
During work:
  Say, “Find [correct shape].” Circle- Yes/No ( )
  Place target stimuli in the correct location identified by data sheet Circle- Yes/No ( )
If correct response:
  Provide specific praise Circle- Yes/No ( )
If no response within 5 s:
  Provide a gestural prompt within 3 s of the opportunity Record Latency- ( s. )/3 s. ( )
If incorrect response or no response after the prompt:
  Provide the error correction statement Circle- Yes/No ( )
When work requirement is completed:
  Provide a 3–4-min enriched break from his work Record Duration- ( s. )/180 s. ( )

Table 6.

Example of PF data collection sheet for a self-care routine—Sorting laundry

Treatment Component Measure Component Score
Before starting work:
  Put white and black bin in front of the client Circle- Yes/No ( )
  Place unsorted laundry between the bins Circle- Yes/No ( )
During work:
  Say, “It is time to sort laundry” Circle- Yes/No ( )
If correct response:
  Provide specific praise Circle- Yes/No ( )
If no response:
  Model the sorting task within 10 s of the opportunity Record Latency- ( s. )/10 s. ( )
If incorrect response:
  Remove incorrect items and ask the client to try again. Circle- Yes/No ( )
When work requirement is completed:
  Provide a 3–4-min enriched break from his work Record Duration- ( s. )/180 s. ( )

Table 7.

Example of PF Data Collection Sheet for a Trial-Based Functional Analysis—Tangible Condition

Treatment Component Measure Component Score
During the Control Condition:
  Provide attention every 15 s Record IRT- 1. ( s. ) 2. ( s. ) 3. ( s. ) 4. ( s. ) 5. ( s. ) 6. ( s. ) 7. ( s. ) 8. ( s. ) Calculate Average IRT- ( s. )/30 s. ( )
  Provide access to highly preferred item Circle- Yes/No ( )
  Withhold all demands Circle- Yes/No ( )
If the target behavior occurs:
  Provide no programmed consequences Circle- Yes/No ( )
During the Test Condition:
  Remove the tangible at the start of the condition Circle- Yes/No ( )
  Provide attention every 15 s Record IRT- 1. ( s. ) 2. ( s. ) 3. ( s. ) 4. ( s. ) 5. ( s. ) 6. ( s. ) 7. ( s. ) 8. ( s. ) Calculate Average IRT- ( s. )/30 s. ( )
  Withhold all demands Circle- Yes/No ( )
If the target behavior occurs:
  Return the tangible item within 2 s. Record Latency- ( s. )/2 s. ( )
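For IRT rows like those in Table 7, the observer records each inter-response time, averages them, and compares the average to the criterion. The following is a minimal sketch; the recorded values are hypothetical, and the 30-s comparison mirrors the criterion shown on the data sheet:

```python
def average_irt(irts_s: list[float]) -> float:
    """Average inter-response time (in seconds) across recorded deliveries."""
    return sum(irts_s) / len(irts_s)

# Hypothetical IRTs (s) for eight attention deliveries in one condition
recorded = [14.2, 16.8, 15.5, 13.9, 17.1, 15.0, 14.4, 16.1]
avg = average_irt(recorded)
print(round(avg, 1), avg <= 30)  # average IRT and whether it meets the criterion
```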

Step 3. Plan the Direct Observations

As previously stated, this article focuses on PF measurement systems for individual clients’ treatment packages that require direct observation. Thus, this section will focus on considerations for obtaining measures via direct observation. Although direct observation is helpful because it enables the measurement of PF components that cannot be captured reliably via permanent product or self-report methods (Collier-Meek et al., 2020; Noell et al., 2002, 2005; Noell, 2007), it involves logistical complications that must be managed (Kodak et al., 2022). Namely, direct observation requires human resources because a clinical supervisor or PF observer/data collector is needed to monitor the treatment being delivered (Han et al., 2022).

Clinical supervisors are frequently burdened with a copious number of responsibilities that compete for their time and attention in applied settings. As a result, some clinical supervisors might have difficulty finding the time needed to conduct direct observation of their supervisees’ PF. However, as stated in the introduction to this article, PF is essential for effective and ethical behavior-analytic practice. Clinical supervisors who do not arrange for the direct observation of their supervisees’ PF are at risk of pervasive PF issues that could render some treatments less effective or even ineffective (Fryling et al., 2012), decrease the overall quality of services (Reid et al., 2021), and put clients at risk of mistreatment (Morris & Hollins, 2022). Thus, arranging PF observation and measurement is an essential component of clinical supervision that must be balanced with the supervisors’ other critical responsibilities.

To assist time-strapped clinical supervisors, direct observation systems should be designed to be as efficient as possible. Practical considerations that can affect the efficiency of measuring PF include prioritizing specific treatment components, selecting meaningful observation periods, and determining the right length and frequency of observations. Although measuring every component of every treatment plan all of the time might be ideal, it is not feasible in most treatment contexts. Thus, supervisors must prioritize which treatment components they monitor for PF based on factors that include the potential impact and complexity of the treatment components (Gresham et al., 2000; Kodak et al., 2022). For example, in a treatment package that includes instructional strategies for novel and mastered tasks, the supervisor may want to prioritize monitoring the novel instruction because it is likely to have a greater impact on the student’s learning goals and be more complicated for the staff to implement. However, even relatively simple procedures can encounter treatment drift over time (Cooper et al., 2019), so it might be advisable to sample a wide array of each client’s procedures, even when primarily focusing on high-impact and complex treatment components.

After the treatment components are prioritized, supervisors must select meaningful observation periods. Direct observation should be carefully planned based on when the supervisor is likely to observe important aspects of implementation (e.g., observing PF of novel instruction delivery). This could be based on the supervisee’s naturalistic schedule or contrived by temporarily rearranging the supervisee’s schedule to fit the supervisor’s availability. For example, if the novel instructions are typically implemented in the morning when the supervisor is unavailable, the supervisor can request that the novel lessons be moved to the afternoon when they are available. If the supervisor’s and supervisee’s schedules cannot be rearranged for an observation, the supervisor could consider video recording the session when it is appropriate to do so.

Supervisors should also consider the length of observations. In general, the goal is to arrange for the most efficient observation length possible (Vollmer et al., 2008). Thus, supervisors should evaluate the treatment components they plan to observe, schedule their observation to begin as the targeted treatment components begin, and conduct an observation that is long enough to adequately capture the essence of the targeted treatment component. For example, when observing a staff member deliver novel instructions, supervisors must select the number of trials and/or sessions of the novel instruction delivery that they observe. The number of trials/sessions observed needs to produce a representative sample of the staff member’s PF; however, a brief sample may be sufficient for some plans (Collier-Meek et al., 2018; Vollmer et al., 2008).

The final variable to consider when trying to conduct direct observation of PF is the frequency of the observations. Regular and frequent observation is generally regarded as best practice (Reid et al., 2021), but again, it may not be feasible. Perhaps the most important component of determining how often to monitor staff’s PF is the nature of the supervisory relationship. Guidelines are available that specify the required amount of supervision for registered behavior technicians (RBTs), board certified assistant behavior analysts (BCaBAs), and individuals accruing supervision hours to qualify for the board certified behavior analyst (BCBA) credential. The handbooks for the three credentials specify that RBTs need supervision for 5% of their services each month (BACB, 2022a), BCaBAs need supervision for 2%–5% of their services each month depending on the amount of experience they have (BACB, 2022b), and individuals accruing supervision hours to qualify for the BCBA credential need supervision for 5%–10% of their services each supervision period depending on the structure of their supervision (i.e., supervised fieldwork vs. concentrated supervised fieldwork; BACB, 2022c). However, it should be noted that these supervision requirements are considered to be the minimally accepted guidelines, not necessarily best practices. Thus, the formal supervision guidelines should only be used to inform the absolute minimal acceptable amount of supervision rather than the aspiration for high-quality supervision.
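The percentage requirements above translate directly into minimum monthly hours. The following is a minimal sketch; the service-hour figures are hypothetical, and only the 5% RBT minimum comes from the cited handbook:

```python
def minimum_supervision_hours(service_hours: float, required_pct: float) -> float:
    """Minimum supervision hours implied by a percentage-of-services requirement."""
    return service_hours * required_pct / 100

# Hypothetical: an RBT delivering 120 service hours in a month at the 5% minimum
print(minimum_supervision_hours(120, 5))  # 6.0
```

Because these figures represent floors rather than targets, supervisors aiming for high-quality supervision would plan observation time well above the computed minimum.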

Another component that is important to consider when determining how often to monitor staff’s PF is previous staff performance. Frequent observation of PF provides an opportunity to intervene on performance issues early and often, which can help prevent long-term PF issues. Thus, it might be advisable to focus observations on low performance in some situations. However, frequent observation of PF also provides an opportunity to deliver praise and positive feedback that could improve rapport between supervisors and supervisees and support employee retention (Reid et al., 2021). Therefore, supervisors should be careful not to overlook high-performing staff or high-performance contexts when selecting their observation periods. Instead, supervisors should attempt to obtain a balance between low and high performance in their scheduled observations.

Step 4. Collect Procedural Fidelity Data

The act of collecting PF data is not unlike collecting other behavioral data. Supervisors should generally follow the same best-practice data collection methods they use when collecting data on their clients when they collect PF data. However, there are important nuances to observing staff performance and collecting PF data. Perhaps the most important nuance of collecting PF data is reactivity to an observer’s presence (i.e., staff performing differently in front of a supervisor; Kazdin, 1982). If staff perform differently in the presence of a supervisor, there is a risk of the data being artificially high and generally inaccurate (Ingham & Greer, 1992). Thus, supervisors should acknowledge and account for the possibility of inflated data and take action to minimize the reactivity.

Reid et al. (2021) proposed the following four strategies to reduce reactivity to supervisor observations: (1) monitor frequently; (2) begin monitoring immediately upon entering the staff workplace; (3) conduct unexpected, “return” monitoring sessions; and (4) monitor on a generally unpredictable schedule (p. 81). Monitoring frequently minimizes novelty related to the supervisor’s presence that might affect staff. Beginning monitoring immediately, conducting unexpected observations, and observing on an unpredictable schedule all reduce the ability of staff to prepare differently or change what they are doing when the supervisor enters the work area. By engaging in some or all of these strategies and reducing reactivity, supervisors will be more likely to obtain accurate data. However, it is possible that some of the strategies that produce accurate data during observations could negatively affect staff satisfaction and rapport. Satisfaction with and support from supervisors have been linked to increased retention of direct care staff (Novack & Dixon, 2019), so selecting observation strategies that balance supervisee satisfaction and observation accuracy is important. Some observation strategies, like collecting data immediately upon entering the workspace, could be aversive to supervisees if their work culture is one where staff expect small talk and pleasantries when they see their supervisor. Thus, Reid et al. (2021) also recommended that supervisors greet and inform supervisees what they are monitoring before starting their observation to promote staff acceptance of the observation.

Step 5. Analyze and Interpret Procedural Fidelity Data

Once PF data are collected, they need to be analyzed so that they can inform the next supervisory steps. Depending on the measurement system, PF data can be analyzed in many ways. PF data are often summarized so that there is an overall PF score that describes how well the intervention was implemented. This overall PF score is sometimes referred to as a global PF measure (Cook et al., 2015; Kodak et al., 2022). Global PF measures are often calculated by dividing correct implementer responses by correct and incorrect implementer responses. However, as discussed in the section on selecting measures, global measures may not always reflect the implementation of individual plan components. For example, the “during the interval” procedures of Aiden’s plan include a binary measure (continuously monitor Aiden’s behavior throughout the interval) and a specific measure (provide intermittent encouragement/praise to Aiden at least five times throughout the interval). If the clinician correctly implemented the treatment component with the binary measure (100%) but provided praise on only two of the five required occasions (40%), a global PF score calculated as the average of the two component scores would be 70%. This figure artificially inflates how well the plan was implemented; the praise component alone was implemented with only 40% PF.
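The masking effect in the 70% example can be made concrete with a short sketch. This is illustrative only; the averaging rule and function name are assumptions, as the article does not prescribe a single formula for global PF:

```python
def global_pf(component_scores: list[float]) -> float:
    """Global PF as the unweighted mean of component PF scores (percentages)."""
    return sum(component_scores) / len(component_scores)

# Binary component implemented correctly (100%); praise delivered on only
# 2 of the 5 required occasions (40%). The global score masks the 40%.
monitoring = 100.0     # continuous monitoring: yes/no, scored correct
praise = 100 * 2 / 5   # specific count measure: 2 of at least 5 deliveries
print(global_pf([monitoring, praise]))  # 70.0
```

Reporting the component scores alongside (or instead of) the global score avoids this masking, which is the article’s point in the paragraph that follows.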

Thus, practitioners should be cautious about only summarizing data in a global measure. Global measures of PF may obscure important errors (Cook et al., 2015). In addition, assessing component PF scores may facilitate the provision of specific positive and constructive feedback to the implementer (Kodak et al., 2022). A low global measure of PF may lead supervisors to have a primarily constructive conversation with implementers, whereas component measures could allow for the identification of elements of implementation that would benefit from positive supervisor acknowledgment. In other words, supervisees may be better equipped to improve their PF when provided feedback along the lines of “you provided praise correctly 40% of the time but consistently implemented the error correction procedure” rather than “you implemented the treatment correctly 70% of the time” because they are receiving specific performance feedback. Thus, supervisors may be able to better shape performance by using component PF measures that will allow them to provide differential reinforcement of high-performance areas while providing corrective feedback for low-performance areas.

Step 6. Take Action to Improve Procedural Fidelity

Finally, if low PF scores are identified, practitioners should identify an intervention to increase PF. Talking to implementers about their PF scores could lead to the discovery that aspects of the plan may need to be clarified or amended due to environmental constraints. For example, upon detection of low PF levels for implementing Aiden’s DRO procedure, you may interview the staff and discover that 15 min is a challenging interval length for them to maintain because other clients rotate activities every 15 min, which distracts them from Aiden’s plan. This conversation can then focus on identifying an effective interval length that works within the applicable context. Upon discussion with the staff, you could change the interval to 13 min, so that Aiden is on his break while the staff rotate other clients’ activities.

If a conversation regarding barriers does not result in modifications as described above, there are other evidence-based treatments for increasing PF. For example, behavioral skills training results in improved PF (e.g., Clayton & Headley, 2019). Feedback in isolation may also be sufficient to change implementer PF levels in some circumstances (Ward-Horner & Sturmey, 2012).

Conclusion

There are many important considerations for developing an effective PF measurement system. For the purposes of this article, we focused on providing recommendations for developing a PF data collection system that requires direct observation of implementer behavior. Namely, we provided recommendations and examples of protocolizing treatments, adjusting protocols to include implementer behavior that is important to measure, and deciding on measures of implementer behavior. We also provided brief recommendations for analyzing and using PF data to support implementer performance. We did not discuss other important contextual variables that might affect PF, such as the selection of high-quality treatments, implementer prerequisite skills, and supervision logistics, because they were beyond the scope of this article.

We described how to create a PF sheet prior to measurement, but it also may be necessary to change PF sheets as problems arise. For example, you may notice that when Aiden engages in problem behavior, his behavior technician reprimands him while redirecting him. It may be important to amend the PF sheet to account for this commission error, which may function as the addition of an aversive stimulus to Aiden’s environment.

The recommendations provided in this article were designed as guiding principles rather than prescriptive, step-by-step processes because PF measurement systems must be highly individualized to fit idiosyncratic client needs. For example, it may be important for a client that praise is delivered enthusiastically. To account for the quality of praise, a different type of PF measurement focused on qualitative aspects of caregiver behavior may be necessary (e.g., Weyman & Sy, 2018). In addition, our recommendations have largely not been evaluated empirically for ease of use. Future research should evaluate the most efficient methods of monitoring and measuring PF when direct observation is necessary. For example, researchers could evaluate how many instances of implementer behavior are generally representative within a sample. Future research should also examine how to collect PF data on qualitative aspects of treatments (e.g., enthusiasm and promoting dignity).

PF measurement is a critical piece of ethical supervision and treatment provision. However, it has been discussed less frequently in research and practice than the importance of the subject matter warrants. As our field continues to grow and meet the diverse needs of consumers, ensuring that our interventions represent best practices that are implemented with high PF is essential.

Data Sharing

Not applicable

Declarations

Conflicts of Interest

We have no known conflict of interest to disclose.

Research Involving Human Participants and/or Animals

Not applicable.

Informed Consent

Not applicable.

Footnotes

1. High PF refers to the implementation of a treatment closely matching the expectations of the treatment. Low PF refers to the implementation of a treatment diverging from the expectations of the treatment.

2. Exceeding an intended count or time would only be considered an error if a cap is set for that treatment component. For example, Example 2 specifies that Aiden should be given a 3–4-min enriched break. Because a range is specified, it is clear that anything below or above that amount of time would be considered an error. Example 2 also specifies that Aiden should be encouraged/praised at least five times through the interval. Thus, instances of encouragement/praise that exceed the intended amount would not violate the treatment protocol and, therefore, would not be considered a PF error.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Arkoosh MK, Derby KM, Wacker DP, Berg W, McLaughlin TF, Barretto A. A descriptive evaluation of long-term treatment integrity. Behavior Modification. 2007;31(6):880–895. doi: 10.1177/0145445507302254.
  2. Behavior Analyst Certification Board. (2020). Ethics code for behavior analysts. https://bacb.com/wp-content/ethics-code-for-behavior-analysts/
  3. Behavior Analyst Certification Board. (2022a). Registered behavior technician handbook. https://www.bacb.com/wp-content/uploads/2022/01/RBTHandbook_230823-a.pdf
  4. Behavior Analyst Certification Board. (2022b). Board certified assistant behavior analyst handbook. https://www.bacb.com/wp-content/uploads/2022/01/BCaBAHandbook_230623-a.pdf
  5. Behavior Analyst Certification Board. (2022c). Board certified behavior analyst handbook. https://www.bacb.com/wp-content/uploads/2022/01/BCBAHandbook_230804-a.pdf
  6. Brand D, Henley AJ, DiGennaro Reed FD, Gray E, Crabbs B. A review of published studies involving parametric manipulations of treatment integrity. Journal of Behavioral Education. 2019;28(1):1–26. doi: 10.1007/s10864-018-09311-8.
  7. Carroll RA, Kodak T, Fisher WW. An evaluation of programmed treatment-integrity errors during discrete-trial instruction. Journal of Applied Behavior Analysis. 2013;46(2):379–394. doi: 10.1002/jaba.49.
  8. Clayton M, Headley A. The use of behavioral skills training to improve staff performance of discrete trial training. Behavioral Interventions. 2019;34(1):136–143. doi: 10.1002/bin.1656.
  9. Collier-Meek MA, Fallon LM, Gould K. How are treatment integrity data assessed? Reviewing the performance feedback literature. School Psychology Quarterly. 2018;33(4):517. doi: 10.1037/spq0000239.
  10. Collier-Meek MA, Sanetti LM, Fallon L, Chafouleas S. Exploring the influences of assessment method, intervention steps, intervention sessions, and observation timing on treatment fidelity estimates. Assessment for Effective Intervention. 2020;46(1):3–13. doi: 10.1177/1534508419857228.
  11. Cook JE, Subramaniam S, Brunson LY, Larson NA, Poe SG, St. Peter CC. Global measures of treatment integrity may mask important errors in discrete-trial training. Behavior Analysis in Practice. 2015;8(1):37–47. doi: 10.1007/s40617-014-0039-7.
  12. Cooper JO, Heron TE, Heward WL. Applied behavior analysis. 3rd ed. Pearson Education; 2019.
  13. Falakfarsa G, Brand D, Jones L, Godinez ES, Richardson DC, Hanson RJ, Velazquez SD, Wills C. Treatment integrity reporting in Behavior Analysis in Practice 2008–2019. Behavior Analysis in Practice. 2021;15(2):443–453. doi: 10.1007/s40617-021-00573-9.
  14. Fallon LM, Cathcart SC, Sanetti LMH. Assessing parents’ treatment fidelity: A survey of practitioners in home settings. Focus on Autism & Other Developmental Disabilities. 2020;35(1):15–25. doi: 10.1177/1088357619866192.
  15. Fryling MJ, Wallace MD, Yassine JN. Impact of treatment integrity on intervention effectiveness. Journal of Applied Behavior Analysis. 2012;45(2):449–453. doi: 10.1901/jaba.2012.45-449.
  16. Gresham FM. Assessment of treatment integrity in school consultation and prereferral intervention. School Psychology Review. 1989;18(1):37–50. doi: 10.1080/02796015.1989.12085399.
  17. Gresham FM, MacMillan DL, Beebe-Frankenberger ME, Bocian KM. Treatment integrity in learning disabilities intervention research: Do we really know how treatments are implemented? Learning Disabilities Research & Practice. 2000;15(4):198–205. doi: 10.1207/SLDRP1504_4.
  18. Hagermoser Sanetti LM, Fallon LM. Treatment integrity assessment: How estimates of adherence, quality, and exposure influence interpretation of implementation. Journal of Educational & Psychological Consultation. 2011;21(3):209–232. doi: 10.1080/10474412.2011.595163.
  19. Hagermoser Sanetti LM, Kratochwill TR. Treatment integrity in behavioral consultation: Measurement, promotion, and outcomes. International Journal of Behavioral Consultation & Therapy. 2008;4(1):95–114. doi: 10.1037/h0100835.
  20. Han JB, Bergmann S, Brand D, Wallace MD, St. Peter CC, Feng J, Long BP. Trends in reporting procedural integrity: A comparison. Behavior Analysis in Practice. 2022;16:388–398. doi: 10.1007/s40617-022-00741-5.
  21. Ingham P, Greer RD. Changes in student and teacher responses in observed and generalized settings as a function of supervisor observations. Journal of Applied Behavior Analysis. 1992;25(1):153–164. doi: 10.1901/jaba.1992.25-153.
  22. Johnston JM, Pennypacker HS, Green G. Strategies and tactics of behavioral research and practice. 4th ed. Routledge/Taylor & Francis Group; 2020.
  23. Kazdin AE. Observer effects: Reactivity of direct observation. New Directions for Methodology of Social & Behavioral Science. 1982;14:5–19.
  24. Kodak T, Bergmann S, Waite M. Strengthening the procedural fidelity research-to-practice loop in animal behavior. Journal of the Experimental Analysis of Behavior. 2022;118(2):215–236. doi: 10.1002/jeab.780.
  25. LeBlanc LA, Taylor BA, Marchese NV. The training experiences of behavior analysts: Compassionate care and therapeutic relationships with caregivers. Behavior Analysis in Practice. 2020;13(2):387–393. doi: 10.1007/s40617-019-00368-z.
  26. Morris C, Hollins NA. On the uncanny similarities between police brutality and client mistreatment. Behavior Analysis in Practice. 2022;15(4):1106–1111. doi: 10.1007/s40617-021-00576-6.
  27. Noell GH. Research examining the relationships among consultation process, treatment integrity, and outcomes. In: Erchul WP, Sheridan SM, editors. Handbook of research in school consultation: Empirical foundations for the field. Lawrence Erlbaum Associates; 2007.
  28. Noell GH, Duhon GJ, Gatti SL, Connell JE. Consultation, follow-up, and implementation of behavior management interventions in general education. School Psychology Review. 2002;31(2):217–234. doi: 10.1080/02796015.2002.12086152.
  29. Noell GH, Witt JC, Slider NJ, Connell JE, Gatti SL, Williams KL, Resetar JL, Duhon GJ. Treatment implementation following behavioral consultation in schools: A comparison of three follow-up strategies. School Psychology Review. 2005;34(1):87–106. doi: 10.1080/02796015.2005.12086277.
  30. Novack MN, Dixon DR. Predictors of burnout, job satisfaction, and turnover in behavior technicians working with individuals with autism spectrum disorder. Review Journal of Autism & Developmental Disorders. 2019;6(4):413–421. doi: 10.1007/s40489-019-00171-0.
  31. Pipkin CSP, Vollmer TR, Sloman KN. Effects of treatment integrity failures during differential reinforcement of alternative behavior: A translational model. Journal of Applied Behavior Analysis. 2010;43(1):47–70. doi: 10.1901/jaba.2010.43-47.
  32. Reid DH, Rosswurm M, Rotholz DA. No less worthy: Recommendations for behavior analysts treating adults with intellectual and developmental disabilities with dignity. Behavior Analysis in Practice. 2018;11(1):71–79. doi: 10.1007/s40617-017-0203-y.
  33. Reid DH, Parsons MB, Green CW. The supervisor's guidebook: Evidence-based strategies for promoting work quality and enjoyment among human service staff. Thomas; 2021.
  34. Rohrer JL, Marshall KB, Suzio C, Weiss MJ. Soft skills: The case for compassionate approaches or how behavior analysis keeps finding its heart. Behavior Analysis in Practice. 2021;14(4):1135–1143. doi: 10.1007/s40617-021-00563-x.
  35. Sanetti LMH, Collier-Meek MA. Increasing the rigor of procedural fidelity assessment: An empirical comparison of direct observation and permanent product review methods. Journal of Behavioral Education. 2014;23:60–88. doi: 10.1007/s10864-013-9179-z.
  36. Sanetti LMH, Collier-Meek MA. Increasing implementation science literacy to address the research-to-practice gap in school psychology. Journal of School Psychology. 2019;76:33–47. doi: 10.1016/j.jsp.2019.07.008.
  37. Shahan TA, Greer BD. Destructive behavior increases as a function of reductions in alternative reinforcement during schedule thinning: A retrospective quantitative analysis. Journal of the Experimental Analysis of Behavior. 2021;116(2):243–248. doi: 10.1002/jeab.708.
  38. St. Peter CC, Brand D, Jones SH, Wolgemuth HR, Lipien L. On a persisting curious double standard in behavior analysis: Behavioral scholars’ perspectives on procedural fidelity. Journal of Applied Behavior Analysis. 2023;56(2):336–351. doi: 10.1002/jaba.974.
  39. St. Peter Pipkin C, Vollmer TR, Sloman KN. Effects of treatment integrity failures during differential reinforcement of alternative behavior: A translational model. Journal of Applied Behavior Analysis. 2010;43(1):47–70. doi: 10.1901/jaba.2010.43-47.
  40. Suhrheinrich J, Dickson KS, Chan N, Chan JC, Wang T, Stahmer AC. Fidelity assessment in community programs: An approach to validating simplified methodology. Behavior Analysis in Practice. 2020;13:29–39. doi: 10.1007/s40617-019-00337-6.
  41. Taylor BA, LeBlanc LA, Nosik MR. Compassionate care in behavior analytic treatment: Can outcomes be enhanced by attending to relationships with caregivers? Behavior Analysis in Practice. 2019;12(3):654–666. doi: 10.1007/s40617-018-00289-3.
  42. Vollmer TR, Sloman KN, St. Peter Pipkin C. Practical implications of data reliability and treatment integrity monitoring. Behavior Analysis in Practice. 2008;1:4–11. doi: 10.1007/BF03391722.
  43. Ward-Horner J, Sturmey P. Component analysis of behavior skills training in functional analysis. Behavioral Interventions. 2012;27(2):75–92. doi: 10.1002/bin.1339.
  44. Weyman JR, Sy JR. Effects of neutral and enthusiastic praise on the rate of discrimination acquisition. Journal of Applied Behavior Analysis. 2018;51(2):335–344. doi: 10.1002/jaba.440.
  45. Wilder DA, Atwell J, Wine B. The effects of varying levels of treatment integrity on child compliance during treatment with a three-step prompting procedure. Journal of Applied Behavior Analysis. 2006;39(3):369–373. doi: 10.1901/jaba.2006.144-05.
  46. Williams DE, Vollmer TE. Essential components of written behavior treatment plans. Research in Developmental Disabilities. 2015;36:323–327. doi: 10.1016/j.ridd.2014.10.003.
