Abstract
Timeout is an effective behavior-reduction strategy with considerable generality. However, little is known about how timeout is implemented under natural conditions, or how errors in implementation affect its effectiveness. During Experiment 1, we observed teachers implementing timeout during play to evaluate how frequently they implemented timeout following target behavior (omission errors) and other behaviors (commission errors) for four children. Teachers rarely implemented timeout; thus, omission errors were frequent, but commission errors rarely occurred. During Experiment 2, we used a reversal design to compare timeout implemented with 0% omission integrity, 100% integrity, and the level of omission integrity observed during Experiment 1 for two of the participants. Timeout implemented with reduced integrity decreased problem behavior relative to baseline, suggesting that infrequent teacher implementation of timeout may have been sufficient to reduce problem behavior.
Keywords: negative punishment, omission errors, school, timeout, treatment integrity
Timeout involves the removal of reinforcers, or the loss of opportunity to earn reinforcers, for a period of time contingent upon undesirable behavior (e.g., Cuenin & Harris, 1986; Hobbs & Forehand, 1977; Warzak et al., 2012). Timeout has been used effectively in educational contexts to reduce challenging behavior and, when paired with reinforcement procedures, to increase appropriate behavior (Everett, 2010). Timeout may be a particularly common intervention during play-based educational periods (e.g., recess; Donaldson & Vollmer, 2011), when natural forms of reinforcement are frequently available and reinforcers maintaining undesirable behavior may be difficult to control. However, the efficacy of timeout may be affected by the consistency with which it is implemented (e.g., Clark et al., 1973; Zimmerman & Baydan, 1963). In schools, timeout may occur more intermittently than planned if teachers are busy attending to other students, or more often than planned if teachers use timeout for unwanted responses other than those specified in a behavior plan. Such unplanned variability in the timeout schedule would constitute a treatment integrity failure. The term treatment integrity refers to the extent to which procedures are implemented as designed or described (Peterson et al., 1982). Reduced integrity can negatively affect intervention efficacy (e.g., Arkoosh et al., 2007; Fryling et al., 2012) and make it difficult to identify functional relations between behavior and environment (Vollmer et al., 2008).
Treatment integrity failures are often categorized as omission errors or commission errors. Omission errors are said to occur when a component of the intervention is omitted or not implemented as designed. In timeout procedures, omission errors would occur when teachers failed to implement timeout when it was specified in a behavior plan. As noted above, omission errors may occur when teachers are busy attending to other students, or if implementing timeout becomes aversive to the teachers. Commission errors are said to occur when a component of the procedure is added or implemented at the incorrect time. In timeout procedures, commission errors would occur if teachers implemented timeout following student responses other than those specified by the behavior plan. Such overuse of punishment procedures by educators has been the subject of speculation (e.g., Maag, 2001), but has not been demonstrated in teachers’ implementation of timeout.
Little is known about the likelihood of omission and commission errors during naturalistic implementation of timeout. Taylor and Miller (1997, Experiment 1) measured the treatment integrity with which classroom staff (teacher, aide, and intern) implemented timeout, and corresponding rates of students’ challenging behavior during naturalistic classroom observations. Classroom staff implemented timeout with approximately 67% integrity (errors included omission and commission errors, providing attention during timeout, and implementing timeout for too long), and the students engaged in problem behavior across 40% to 59% of intervals, on average. The staff received training on the timeout procedure after the initial observations, which increased integrity to approximately 98% and decreased problem behavior (to 4% of intervals on average) for two students. These data suggest that reduced treatment integrity may negatively impact the efficacy of timeout procedures. However, several different kinds of integrity failures occurred before teacher training, and the experimenters neither reported the frequency of the different kinds of errors nor directly manipulated treatment integrity to isolate the effects of integrity failures.
Despite the high likelihood of integrity failures during teachers’ implementation of timeout (e.g., Taylor & Miller, 1997), the impact of intermittent use of timeout (omission errors) or overuse of timeout (commission errors) remains largely unknown. To date, omission errors during timeout have received the most attention in the empirical literature. In studies manipulating the omission integrity of treatment packages that included timeout, challenging behavior remained suppressed until the likelihood of timeout was reduced to 50% or 25% (Northup et al., 1997; Rhymer et al., 2002). However, integrity manipulations occurred for both the timeout and positive-reinforcement (e.g., DRA) components of the larger packages in these studies, so the effects of reduced integrity on timeout per se cannot be assessed. Other research has isolated the effects of intermittent implementation of timeout, but results have been mixed across studies. For example, Pendergrass (1971) found that brief timeouts implemented on a continuous schedule suppressed challenging behavior, but that intermittent timeouts were ineffective even when implemented for longer durations. In other studies, intermittent timeout schedules effectively suppressed challenging behavior to low rates (e.g., Barton et al., 1987; Clark et al., 1973, Experiment 2; Donaldson & Vollmer, 2012; Jackson & Calhoun, 1977), or initially suppressed behavior but lost their efficacy as intermittency increased (e.g., Calhoun & Matherne, 1975; Clark et al., 1973; Jackson & Calhoun, 1977; Zimmerman & Baydan, 1963). The reasons for these potentially contradictory findings remain unclear, but further research on intermittent timeout (that is, timeout with omission errors) seems warranted.
No research to date has experimentally examined impacts of commission errors on timeout. The reason for this paucity of research is unclear, but may be because commission errors infrequently occur during naturalistic implementation. That is, although conceptual papers warn about the overuse of timeout procedures (e.g., Maag, 2001), it may be the case that such commission errors rarely occur in actual practice. Although Taylor and Miller (1997) mention commission errors in their narrative descriptions of integrity failures (e.g., “. . .applying timeout to non-targeted problem behaviors. . .”, p. 10), the frequency of commission errors cannot be determined from their data because commission and omission errors were aggregated in their quantitative analysis.
Thus, at least two gaps in the literature currently exist. First, the naturally occurring frequency of omission and commission errors during timeout implementation remains unknown. Identifying the frequency of these errors would be useful to inform subsequent experimental studies evaluating impacts of those errors (similar to the approach taken by Carroll et al., 2013). Second, and relatedly, the extent to which common errors reduce the efficacy of timeout remains unclear, particularly when timeout is implemented in the absence of other positive-reinforcement treatment components. The current study attempted to address these gaps in the literature by measuring naturally occurring timeout implementation by school personnel who already implemented existing timeout procedures in a school context (Experiment 1), and by experimentally manipulating integrity during timeout procedures implemented in the absence of concurrent differential reinforcement (Experiment 2). To accomplish these aims, we evaluated timeout from play situations in schools because it allowed for isolation of timeout procedures from other teacher-mediated reinforcement procedures (such as the DRA procedures included in evaluations by Northup et al., 1997 and Rhymer et al., 2002).
Study 1
Method
The purpose of Study 1 was to identify how frequently teachers made omission and commission errors during naturalistic implementation of timeout in play situations.
Participants.
We recruited from local public schools by distributing a flyer that described timeout as sitting out from play for misbehaving. Student-teacher dyads were eligible if the student engaged in problem behavior during play, and the teacher was already implementing timeout during play (i.e., we did not add timeout procedures to any student’s behavior plan). The first five teachers (and their students) who responded to the flyer met these criteria.
Willis was a 10-year-old Caucasian boy with diagnoses of Attention Deficit/Hyperactivity Disorder (ADHD), Reactive Attachment Disorder, and Seizure Disorder. Karl, a long-term substitute teacher, served as Willis’s teacher. Karl had previously worked with high-school students for 15 years. At the time of consent, he had been implementing timeout with Willis for 2 months.
Sonny was a 6-year-old African American boy with diagnoses of Autism, Seizure Disorder, and ADHD. Ian was a 6-year-old Caucasian boy with diagnoses of ADHD and Oppositional Defiant Disorder. Sonny and Ian attended the same classroom. Their teacher, Jill, had been teaching for 16 years, and was a Board Certified Behavior Analyst (BCBA). At the time of consent, Jill had been implementing timeout with Sonny for 2 months and Ian for 2 weeks.
Keith was a 9-year-old Caucasian boy with diagnoses of Traumatic Brain Injury, ADHD, and Bipolar Disorder. Keith was in a co-teach classroom; his teachers were Kelly and Cathy. Kelly had been teaching for 9 years and was a BCBA. Cathy had been teaching for 5 years. At the time of consent, they had been implementing timeout with Keith for 1 month.
Charley was a 5-year-old Caucasian boy with a diagnosis of Sensory Processing Disorder. Charley attended an integrated preschool classroom. His teachers were Paula and Dorothy, who had been teaching for 17 and 23 years, respectively. At the time of consent, timeout had been in place for Charley for approximately 2 years.
Settings.
Willis, Sonny, Keith, and Ian attended an alternative-education public school that served elementary-aged students who engaged in severe challenging behavior. The school had three classrooms (each serving one to five students), a gymnasium, and a playground. Charley attended a private preschool that served infants to kindergarten-aged children. The preschool had several classrooms, a gymnasium, and a playground. None of the teachers used programmed, class-wide behavior management procedures during the observed periods.
Pre-Assessment.
After verifying that the student and teacher were eligible for participation, we obtained informed consent from the student’s legal guardian and teacher, and assent from Willis and Keith, who were over 7 years old. To obtain information about the existing timeout procedure, we conducted a brief interview with the teacher and reviewed the written Behavior Intervention Plan (BIP). Because Charley’s teachers did not have an existing written plan, we drafted definitions of the behavior resulting in timeout, identified components of the procedure from the interview, and verified these with Charley’s teachers. The components of each student’s timeout procedure are listed in the Supplemental Materials.
Measurement.
We observed the entire duration of regularly scheduled recesses for Willis, Sonny, Keith, and Ian. For Ian, we also collected data during the entire duration of trade-in times (periods when Ian was allowed to exchange tokens for activities). Recesses occurred on the school’s outdoor playground when weather permitted, or in the indoor gymnasium. Typically, one classroom attended recess at a time, such that there were one to five students and two staff (excluding researchers) interacting during recesses. For Charley, we collected data during the entire duration of free-choice play or for 120 min, whichever came first. Free-choice play was a period when Charley was allowed to engage in child-directed play with various activities in the classroom. Typically, about 15 other students and two teachers (excluding researchers) were present.
Trained observers collected continuous data using laptops with real-time data collection software. Each student was observed twice per week on average (range, 1–6 observations per week). Observations lasted for 29 min on average (range, 6–120 min). Observation duration varied due to changes in the time allocated to free play. We collected data on problem behavior and teacher implementation of timeout for a minimum of five school days and until rates of behavior were stable or increasing.
We used the operational definition for selected student responses from the existing BIP whenever possible. If the operational definition was absent or if additional clarification about the response was needed, we interviewed the student’s teacher. Thus, all responses were defined based on what the teachers noted would result in implementation of timeout according to the existing BIP. Willis, Sonny, and Keith engaged in aggression. Ian engaged in aggression, property destruction, and negative peer interactions. Charley engaged in aggression, property destruction, loud vocalizations, and language. Operational definitions for each student appear in Table 1. To evaluate treatment integrity of timeout implementation (our primary dependent variable), we also collected data on components of the students’ individualized plans, such as warnings about timeout, instructions to go to timeout, physical guidance to timeout, the occurrence of timeout, adult and peer attention during timeout, and access to materials during timeout. When more than one teacher implemented timeout (i.e., Charley), we collected data as if they were one person; we did not collect data on which teacher implemented each component.
Table 1.
Operational Definitions of Problem Behavior Across Students for Experiment 1.
| Student | Operational definition | Topography |
|---|---|---|
| Willis | Actual or attempted forceful contact with another person (including throwing objects at someone) | Aggression |
| Sonny | Actual or attempted hitting (with open hand or closed fist), kicking, pinching, biting, scratching, spitting on or grabbing another person, or throwing objects within 1 ft of another person | Aggression |
| Keith | Actual or attempted forceful contact between some part of his body and a peer (i.e., hitting, head butting, biting, or hair pulling). This also included spitting on a peer and throwing objects within 1 ft of a peer. | Peer aggression |
| Ian | Actual or attempted forceful contact with another person including throwing objects at someone; ripping, swiping, throwing, banging, or kicking materials, toppling furniture, or breaking materials; teasing peers (including name calling), telling peers what to do, or instructing peers to engage in problem behavior | Aggression; property destruction; negative peer interactions |
| Charley | Actual or attempted forceful contact with another person (i.e., hitting, spitting, scratching, kicking, pushing, rough play), puffing chest out while grimacing and leaning into a peer’s space; throwing, kicking, tossing, ripping, sweeping, or stepping on materials or destroying a peer’s creation (excluded kneeling, sitting, or crawling on toys); vocalizations above conversation volume; swearing or pretending to shoot guns. | Aggression; property destruction; loud vocalizations; language |
Interobserver Agreement (IOA).
Data collectors were previously trained by practicing computerized data collection on at least two different research projects until IOA was at least 80% across all responses for three consecutive sessions. These trained observers collected data throughout the evaluation.
A second independent data collector collected data for 33%, 86%, 60%, 44%, and 40% of observations for Willis, Sonny, Keith, Ian, and Charley, respectively. We used software to calculate proportional agreement. Each observer’s data were divided into 10-s intervals. For each 10-s interval, the program divided the smaller count by the larger count and converted the result to a percentage. If observers agreed on the nonoccurrence of behavior, the IOA for that interval was scored as 100%. Average IOA across all behaviors and events was 98% (range, 86–100%) for Willis, 99% (range, 96–100%) for Sonny, 99% (range, 97–100%) for Keith, 100% for Ian, and 99% (range, 96–100%) for Charley.
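The proportional-agreement calculation described above can be sketched in a few lines of code. This is a hypothetical illustration only; the study used dedicated real-time data-collection software, and the function name and data layout here are our own:

```python
# Hypothetical sketch of proportional interobserver agreement (IOA) for
# count data binned into 10-s intervals; not the study's actual software.

def proportional_ioa(counts_a, counts_b):
    """Mean per-interval agreement (%) between two observers' counts.

    counts_a and counts_b are parallel lists: one response count per
    10-s interval for each observer.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("Observers must score the same number of intervals")
    per_interval = []
    for a, b in zip(counts_a, counts_b):
        if a == 0 and b == 0:
            per_interval.append(100.0)  # agreement on nonoccurrence scores 100%
        else:
            per_interval.append(100.0 * min(a, b) / max(a, b))
    return sum(per_interval) / len(per_interval)

# Example: observers agree on intervals 1 and 3, disagree (2 vs. 1) on
# interval 2: proportional_ioa([0, 2, 1], [0, 1, 1]) -> (100 + 50 + 100) / 3
```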
Timeout Procedures.
Recall that we collected data based on the existing timeout procedures developed by the participants’ educational teams.1 Thus, each participant had an individualized timeout procedure that was limited to play contexts (description in Supplemental Materials; additional details available from corresponding author). The BIP for each student specified that timeout should follow each instance of target behavior. Generally, students were required to sit for 1 min following problem behavior, except that Charley sat until 10 s of calm behavior and Sonny was required to sit for the remainder of recess during the second timeout of the day.
Analysis of Timeout Schedule.
We calculated omission- and commission-integrity percentages for each teacher based on the temporal relation between target behavior and the occurrence of timeout.
Omission Integrity Calculation.
Omission errors occurred when the teacher did not implement timeout (defined as the student being within 1 ft of the timeout area) within 2 min of an instance of the targeted problem behavior. Implementation of timeout included instructing and, if necessary, guiding the student to timeout. We selected 2 min as a liberal criterion to account for delays in reaching the timeout area due to problem behavior. To determine the integrity percentage, we divided the number of instances of problem behavior followed by timeout within 2 min by the total number of instances of problem behavior, and multiplied by 100.
Commission-Integrity Calculation.
Commission errors occurred when the teacher implemented timeout, but a targeted response had not occurred in the previous 2 min. To determine the integrity percentage, we divided the total instances of timeout preceded by problem behavior within the previous 2 min by the total instances of timeout, and multiplied by 100.
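Together, the omission and commission calculations above amount to a simple computation over timestamped events. The following sketch is purely illustrative; the function names and data layout are our own, and the study’s observers used real-time data-collection software rather than this code:

```python
# Hypothetical sketch of the omission- and commission-integrity calculations;
# event times are seconds from session start. Not the study's actual software.

WINDOW = 120  # the liberal 2-min criterion described above, in seconds

def omission_integrity(behavior_times, timeout_times):
    """Percentage of target behaviors followed by timeout within WINDOW s."""
    if not behavior_times:
        return None  # undefined when no target behavior occurred (cf. Ian)
    followed = sum(
        any(0 <= t - b <= WINDOW for t in timeout_times)
        for b in behavior_times
    )
    return 100.0 * followed / len(behavior_times)

def commission_integrity(behavior_times, timeout_times):
    """Percentage of timeouts preceded by a target behavior within WINDOW s."""
    if not timeout_times:
        return None  # undefined when timeout was never implemented
    preceded = sum(
        any(0 <= t - b <= WINDOW for b in behavior_times)
        for t in timeout_times
    )
    return 100.0 * preceded / len(timeout_times)
```

Note that under this windowed definition a single implemented timeout can follow, and thus be credited to, more than one recent target response, so the count of timeout-followed responses need not equal the count of timeouts.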
Overall Integrity.
In addition to analyzing the frequency with which teachers used timeout, we completed checklists of the timeout procedures specified in each participant’s BIP. This secondary measure of integrity allowed us to capture the extent to which teachers provided instructions or guidance to go to timeout or followed reactive procedures if students left timeout without permission. Because the rapidity with which a teacher needed to instruct the student to go to timeout was not specified in any of the written BIP documents, we considered an instruction “correct” if it occurred within 10 s of the instance of challenging behavior. We completed a checklist for each opportunity to implement a component of the procedure.
Results and Discussion
Results are shown in Figure 1. For each participant, omission integrity is shown by the black bar and commission integrity is shown by the white bar. Because these data are calculated as correct implementation of the intervention, taller bars show higher levels of treatment integrity.
Figure 1.
Percentage of omission and commission integrity across students in Experiment 1. The asterisk for Ian indicates that no challenging behavior was observed, so omission integrity could not be calculated.
We observed Willis for 2.3 hr across six school days; one instance of timeout occurred. Timeout occurred after only 2 of 38 total instances of the target response (5% omission integrity). Karl, Willis’s teacher, never implemented timeout in the absence of targeted responding, resulting in 100% commission integrity. Overall, Karl never told Willis rules about recess behavior or timeout, stated the rule that Willis violated, instructed Willis to timeout (he just physically guided him), had Willis sit for the correct duration of time (he only had him sit for 12 s consecutively), or physically guided Willis back to timeout when he left the area. However, Karl did not provide any attention or access to materials during timeout. Using the integrity checklist, Karl’s overall average integrity was 3% (range, 0–17% integrity).
We observed Keith for 1.7 hr across five school days; two total instances of timeout occurred. Timeout occurred after 2 of 18 total instances of the target response (11% omission integrity). Kelly and Cathy, Keith’s teachers, never implemented timeout in the absence of targeted responding, resulting in 100% commission integrity. Overall, they never instructed Keith to sit or had him sit for the correct duration (timeout was always longer than 1 min). However, they always refrained from commenting directly on problem behavior and instructed him to rejoin recess. Using the checklist measure, Kelly and Cathy’s overall integrity was 50% (range, 40–60% integrity).
Ian never engaged in targeted problem behavior nor did Jill implement timeout across 2.6 hr of observations occurring across nine school days. Because we could not calculate omission integrity, Ian’s omission data are not pictured in Figure 1.
We observed Sonny for 1.7 hr across seven school days; four total instances of timeout occurred. Timeout occurred after only 3 of 15 total instances of the target response (20% omission integrity). Jill implemented timeout once for disrobing (not a targeted response), resulting in 75% commission integrity. Overall, Jill never instructed Sonny to approach her and never implemented the first timeout for the correct duration (timeout was always longer than 1 min). However, she occasionally instructed Sonny to sit and always physically guided him to timeout when necessary, restricted access to items during timeout, and implemented the second timeout for the correct duration. Using the checklist, Jill’s overall integrity was 24% (range, 0–54% integrity).
We observed Charley for 9.5 hr across five school days; five total instances of timeout occurred. Timeout occurred after 8 of 185 individual target responses (4% omission integrity). Dorothy and Paula implemented timeout in the absence of targeted responding once for using a rude tone of voice, resulting in 80% commission integrity. Overall, they never had Charley sit for the correct duration of time (timeout was always longer than 10 s of calm behavior). However, they occasionally gave Charley warnings about timeout and instructions to sit. In addition, they always guided him to timeout when necessary and started counting to 10 when he told them he was ready (by saying, “1”). Using the checklist measure, Paula and Dorothy’s overall integrity was 10% (range, 0–43% integrity).
Across all participants, omission integrity was lower than commission integrity. In other words, teachers rarely implemented timeout regardless of student behavior, which resulted in high commission integrity (few instances of timeout when it should not have occurred) and low omission integrity (few instances of timeout when it should have occurred). When timeout did occur, it was often not conducted as specified in the students’ BIPs. Teachers tended to implement several components inconsistently or not at all (e.g., timeout instructions, timeout warnings, and timeout duration). The infrequent use of timeout resulted in a relatively small number of observations of the actual timeout procedures (12 total instances of teacher-initiated timeout across participants), despite a total of more than 17 hr of observation time.
Omission integrity may have been inversely related to class size and to the number of behaviors targeted for timeout. Charley’s teachers, Paula and Dorothy, had the lowest omission integrity (4%) and the most students (about 15 children) to manage during play. In addition, Paula and Dorothy were targeting four types of problem behavior (aggression, property destruction, loud vocalizations, and language) for timeout. Karl’s, Jill’s, and Kelly and Cathy’s classrooms ranged from one to five children, and those teachers were targeting only one type of problem behavior (aggression) for Willis, Sonny, and Keith. Thus, it may have been more challenging for Paula and Dorothy to catch every instance of four different forms of problem behavior while managing more students. When low treatment integrity is a concern, it may be necessary to target the most important behavior with the timeout procedure and use other reinforcement-based or antecedent-based procedures to address minor forms of problem behavior.
It is also possible that the severity of the behavior impacted the integrity with which timeout was implemented. As noted above, Charley’s teachers used timeout for the most diverse topographies of problem behavior, including some topographies (like loud vocalizations and language) that were not dangerous. Charley was also the youngest (and smallest) of the participants in our study. The timeout procedures for our other participants primarily focused on behavior that was likely to result in imminent harm to the environment (e.g., property destruction) or other people (e.g., aggression or negative peer interactions, which often evoked aggression from Ian’s peers); teachers may have been more likely to follow through with timeout for responses that they viewed as more potentially dangerous. These possibilities might be interesting to explore in future descriptive evaluations of timeout implementation.
The aim of this study was to conduct naturalistic observations of the frequency of timeout that was part of students’ existing Behavior Intervention Plans. It is possible that low omission integrity and inconsistent BIP implementation were observed because teachers were implementing the necessary components with the level of integrity needed to maintain manageable rates of targeted problem behavior (this may also have been why no problem behavior was observed for Ian). In other words, although the students’ BIPs specified FR-1 timeout, it may not have been necessary for teachers to implement timeout for every instance of behavior. The existing timeout literature provides some support for the use of intermittent timeout (e.g., Clark et al., 1973, Experiment 2; Donaldson & Vollmer, 2012; Jackson & Calhoun, 1977), and rates of challenging behavior were generally low during our observations. Thus, in Study 2, we experimentally manipulated omission integrity to compare behavior at the integrity levels with which the teachers were implementing (intermittent timeout) to no-intervention baselines and 100% integrity phases.
Study 2
Method
Participants and Setting.
We initially recruited all four participants who engaged in challenging behavior during Study 1 (all but Ian) for Study 2. However, when timeout was removed (during baseline), Charley engaged in very low rates of challenging behavior (M = 0.2/min) that decreased to zero across the phase, so we recommended that teachers remove timeout from his behavior plan. Sonny’s teacher-developed timeout procedure did not suppress aggression to levels acceptable by the teachers even when implemented with perfect integrity. Although a modified procedure was moderately more effective (data available in Supplemental Materials), the school year ended before we could assess reduced-integrity. Therefore, Willis and Keith participated in Experiment 2. The setting was identical to Experiment 1. The researcher who implemented timeout was a Master’s level behavior analyst with extensive experience with a variety of reinforcement- and punishment-based interventions (including timeout); she was familiar with the students but did not typically interact with them. An average of 4 sessions (range, 1–5 sessions) were conducted with each student per week.
Measurement and Data Collection.
We collected data on the frequency of targeted problem behavior across all phases of the experiment. Problem behavior for each student was the same as in Study 1, except for Keith. After Study 1, his teachers changed his BIP to include all negative peer interactions in addition to peer aggression as criteria for timeout. Negative peer interactions were defined as threatening a peer with physical harm, instructing a peer to engage in problem behavior, labeling peers with negative adjectives or nouns (e.g., “You are. . . [dumb, a baby, a fatty rat, stupid, annoying].”), and pointing or sneering at a peer (including directly calling a peer a swear word [e.g., “You are a mother-fucker”]).
We collected paper-pencil data by tallying the instances of problem behavior across 1-min intervals. Data were collected in 1-min intervals so that researchers could simultaneously implement timeout and collect data. All topographies of problem behavior targeted for timeout (e.g., aggression, negative peer interactions) were scored as the same class, but we differentiated between problem behavior that occurred during timein and timeout (each was scored in a separate column on the data sheet). We only included the data on problem behavior during timein for our analyses. We also recorded the duration of each session (which varied naturally based on the teacher-specified duration of the free play activity) and timestamps of when the experimenter told the student to go to timeout, when timeout started (the student was within 1 ft of the timeout area), and when timeout ended.
Interobserver agreement (IOA) scores for problem behavior were calculated using a proportional agreement calculation. The smaller count of problem behavior was divided by the larger count of problem behavior for each 1-min interval. Then, those results for each interval were summed, divided by the total number of intervals, and multiplied by 100. After calculating IOA between the two observers for each session, we averaged the scores across sessions for each student. Interobserver agreement was collected for 57% and 62% of sessions for Willis and Keith, respectively. On average, IOA was 94% for Willis, and 92% for Keith.
A second observer collected procedural fidelity data on the experimenter’s behavior during 74% and 68% of sessions for Willis and Keith, respectively. Procedural fidelity was calculated as the number of procedural components implemented correctly divided by the total possible procedural components. On average, fidelity was 99% for Willis, and 97% for Keith.
Procedure.
We used a reversal design to demonstrate experimental control. At least five sessions, but no more than 15 sessions, were conducted per phase. The decision to change phases was based on visual inspection of graphed data or meeting the 15-session maximum. We evaluated three conditions: a no-timeout baseline phase (to ensure that timeout procedures were necessary to suppress behavior), a high-integrity phase (during which timeout was implemented with 100% integrity to ensure that timeout was effective for that student), and a reduced-integrity phase with an integrity level matched to that used by their teachers in Study 1. Typically, baseline preceded an evaluation of timeout to keep recent reinforcement history constant across phases. However, for Keith, we also completed a reduced-integrity phase following 100% integrity to provide preliminary information on the possible influences of recent reinforcement history on responding when integrity was reduced.
We did not control the interactions between students and their peers. If potentially dangerous behavior occurred, we redirected peers to play in a different location (e.g., by walking away). This redirection strategy was common in the classrooms and was deemed acceptable by the teachers for maintaining the safety of students (particularly during no-timeout baseline phases). Teachers sometimes placed themselves between students if they thought it was necessary to maintain safety, but did not use timeout or physically manage students.
We instructed teachers not to implement timeout at any point during our evaluation, but teachers otherwise talked to and played with the students as usual. The researchers and data collectors interacted with the students by talking to them or assisting them if the student approached and initiated the interaction. Only the researcher implemented timeout when programmed.
Baseline.
During this phase, problem behavior did not result in any programmed consequences (with the exception that researchers or teachers may have prompted peers to move away, as described above). Teachers interacted with the students by playing with them, talking to them, and assisting them when requested. However, they did not implement timeout.
High-Integrity.
We designated a bench or other location in the play area for timeout; these were the same designated areas the teachers used in Study 1. During each session, we instructed the student to go to timeout following each instance of targeted problem behavior by saying, “[Student Name], go sit.” Each instance of targeted behavior resulted in timeout, and we physically guided the student to timeout if necessary (all researchers were previously trained in the safe use of physical guidance). Access to all materials was restricted, and we did not attend to the student during timeout. We used the duration of timeout that was specified in each student’s BIP (1 min for both Willis and Keith).
We also added instructions for Willis because he often engaged in high rates of aggression when abrupt changes occurred in his environment, and his psychiatrist recommended that we inform him of any changes to help him manage his own behavior. Thus, we described the timeout contingency to Willis prior to the first session of the 100%-integrity and reduced-integrity phases; we used similar instructions regardless of the programmed level of integrity. We also read a script describing the rules prior to each recess so that he was aware of the expectations for his behavior (see Supplemental Materials for rules scripts).
Reduced-Integrity.
During reduced-integrity phases, timeouts were programmed to occur intermittently. We used the level of omission integrity with which teachers implemented timeout in Study 1 to determine the frequency of timeout. For Willis, an average of one in 20 instances of targeted behavior was programmed to result in timeout (5% integrity). For Keith, an average of one in nine instances of targeted behavior was programmed to result in timeout (11% integrity). All other procedures of this phase (e.g., timeout duration, restricted access to items during timeout) were the same as in the 100%-integrity phase. For Willis, we reminded him that timeout would occur following problem behavior (but did not specify the frequency with which it would occur) and still read the rules prior to each recess. The experimenter used a list created with the RANDBETWEEN function in Excel to determine which instances of problem behavior would result in timeout.
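The random-ratio scheduling described above can be sketched as follows. The study pre-generated a random list in Excel; this Python analog, including the function name and parameter choices, is an illustrative assumption rather than the study's actual procedure.

```python
import random

def timeout_schedule(n_instances, mean_ratio, seed=None):
    """Flag each instance of problem behavior for timeout with probability
    1/mean_ratio, approximating a random-ratio schedule (e.g., RR 20 for a
    programmed 5% integrity, RR 9 for 11%). A fixed seed makes the list
    reproducible, mirroring a pre-generated spreadsheet column."""
    rng = random.Random(seed)
    return [rng.randrange(mean_ratio) == 0 for _ in range(n_instances)]

# Which of 20 hypothetical instances of problem behavior would produce
# timeout under an RR 9 (11% programmed integrity) schedule:
plan = timeout_schedule(20, 9, seed=1)
```

Because the schedule is probabilistic, the obtained proportion of instances followed by timeout in any one session can drift above or below the programmed value, which is relevant to the obtained-integrity figures reported below.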
Obtained Integrity.
Obtained levels of treatment integrity sometimes differed from the programmed levels because participants occasionally engaged in bursts of responding before the researcher could implement timeout, and because timeout occurred according to probabilistic (random-ratio) schedules during reduced-integrity phases. Obtained integrity was evaluated using the data collected by the primary observer. We calculated obtained omission integrity by dividing the number of instances of timeout by the instances of problem behavior that occurred during timein, and multiplying by 100. For Willis, whose programmed level of reduced-integrity was 5%, obtained integrity was 25.0% because of low rates of problem behavior during the reduced-integrity phase. For Keith, whose programmed level of reduced-integrity was 11%, obtained integrity was 17.8%.
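The obtained-integrity calculation is a simple ratio; a minimal sketch follows. The counts below are hypothetical (the study does not report raw counts), chosen only to reproduce Keith's reported 17.8%.

```python
def obtained_omission_integrity(timeouts_delivered, problem_during_timein):
    """Percentage of timein problem behavior that actually resulted in
    timeout: timeouts delivered / instances of problem behavior x 100."""
    return 100 * timeouts_delivered / problem_during_timein

# Hypothetical counts chosen to reproduce Keith's reported value:
print(round(obtained_omission_integrity(8, 45), 1))  # -> 17.8
```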
Data Analysis.
We graphed rates of problem behavior during timein for each session across phases (including rates obtained during timein during Study 1). Instances of problem behavior during timeout were excluded from analyzed rates because, if the transition to timeout evoked problem behavior, rates of problem behavior would be inflated and influence the interpretation of the procedure’s effectiveness.
Results and Discussion
Figure 2 shows data for Study 2. Each row of graphs shows data for a participant, with the left graphs showing session-by-session rates and the right graphs showing means and standard deviations. The first phase (labeled EXP 1) shows response rates when the teacher implemented timeout naturalistically. Data for Willis are shown in the top row. Rates of aggression were variable when teachers implemented the procedure during Study 1. During the first two baseline phases of Study 2, when timeout was not implemented, aggression increased throughout the phases. High-integrity implementation of timeout reduced aggression (M = 0.1/min), suggesting that timeout was an effective intervention. Rates of aggression remained low when timeout was implemented with 5% omission integrity (M = 0.04/min). Willis experienced at least one integrity failure during both sessions in which aggression occurred (sessions 29 and 32 on the graph). After session 38, Willis changed classrooms and attended recess with different peers (shown by the dotted phase-change line in the line graph). Willis engaged in few instances of aggression with his new peers, despite the absence of timeout. Given that aggression occurred infrequently during baseline after he changed classrooms, we decided that timeout was no longer warranted and discontinued data collection. Thus, the single reduced-integrity phase suggested that timeout remained effective when implemented with the same average integrity as observed in Study 1.
Figure 2.
Rates of targeted challenging behavior during time-in for Experiment 2. Each row shows data for one of the participants, with session-by-session data on the left and mean response rates on the right. The first phase (labeled EXP 1) shows rates during naturalistic teacher implementation of timeout. The dotted phase-change line in Willis’ graph shows the point at which he changed classrooms.
The second row of graphs in Figure 2 shows data for Keith. Keith engaged in variable but generally increasing rates of negative interactions during Study 1. During baseline phases, rates of negative interactions increased (M = 0.4/min) and continued to be variable, particularly in the latter two baseline phases. Negative peer interactions became so unpredictable in both frequency and intensity during the third baseline phase (sessions 53–58) that Keith’s teacher requested that we terminate the phase early. In contrast, high-integrity implementation of timeout resulted in low, fairly stable rates of negative interactions (M = 0.09/min). Rates generally remained low when timeout was programmed to occur with 11% integrity (M = 0.1/min). Keith experienced at least one integrity failure during each reduced-integrity session in which negative peer interactions occurred, except for session 69 in the graph, during which the single instance of a negative interaction resulted in timeout. Although the average rate of negative peer interactions was nearly identical to high-integrity phases, slightly more variable responding occurred during the reduced-integrity phases than during phases with high-integrity. Mean response rates were more than halved during the reduced-integrity phase that followed high-integrity (M = 0.07/min) relative to the reduced-integrity phases that followed baseline (M = 0.18/min), but this difference was not statistically significant (Mann–Whitney U = 151.5, p = 0.77).
The means shown in the right column of Figure 2 demonstrate that removing the timeout procedure entirely (baseline) increased mean response rates relative to naturalistic implementation of timeout by the teachers. Additionally, implementation of timeout with both high and reduced-integrity resulted in lower mean response rates relative to baseline. For Willis, average rates of aggression were lower in the reduced-integrity condition than in the high-integrity condition or when the teachers implemented timeout. It is possible that initial exposure to high-integrity implementation of timeout may have affected subsequent low-integrity implementation (similar to effects observed by Colón & Ahearn, 2019). For Keith, rates of negative peer interactions declined as integrity improved, and rates in our implementation of low-integrity were similar to those obtained by the teachers, suggesting that teachers may have been implementing with what they considered to be sufficient integrity to suppress the behavior. For both participants, the low rates of challenging behavior during reduced-integrity conditions are particularly notable given how far treatment integrity was reduced during those conditions.
That intermittent timeout reduced responding is perhaps not surprising, given previous studies demonstrating that intermittent timeout can be effective (e.g., Clark et al., 1973, Experiment 2; Donaldson & Vollmer, 2012; Jackson & Calhoun, 1977). However, several of these studies (Clark et al.; Donaldson & Vollmer; Jackson & Calhoun) used relatively rich schedules of intermittent timeout (e.g., VR 4, VR 5, VR 8). In our experiment, low response rates persisted with leaner programmed schedules of timeout (random-ratio [RR] 9 and RR 20), but responding returned when we discontinued timeout. Thus, there may be some critical integrity level at which timeout must be implemented to be effective, even if that level is quite low (e.g., 1% of all instances of responding).
Although intermittent timeout reduced responding relative to baseline for both participants, it was more effective for Willis than for Keith. This may be because of the rules provided to Willis prior to each recess. Recall that we retained largely the same instructions between high-integrity and reduced-integrity phases. Because the rule accurately described the contingencies during the high-integrity phase, the rules alone (without the contingencies) may have suppressed behavior during the reduced-integrity phase. However, we were unfortunately unable to replicate Willis’s findings within or across participants. More research on the effects of rules on timeout effectiveness is warranted.
Discussion
We descriptively evaluated how consistently teachers implemented timeout (Study 1) and replicated those treatment integrity failures in an experimental, parametric evaluation (Study 2). Teachers rarely implemented timeout with their students, resulting in high levels of commission integrity but low levels of omission integrity. However, these low levels of integrity seemed sufficient to reduce challenging behavior for the two participants in Study 2. Like some previous studies (e.g., Clark et al., 1973, Experiment 2; Donaldson & Vollmer, 2012; Jackson & Calhoun, 1977), our findings support the idea that intermittent timeout can reduce behavior. This finding is important because it suggests that timeout may be a robust procedure that is resistant to treatment challenges.
The efficacy of intermittent timeout also suggests that timeout may not need to be programmed to occur on a continuous schedule. Future research should evaluate when timeout must be implemented on a continuous schedule and when it can be implemented on an intermittent schedule. Previous research on treatment integrity has suggested that initial exposure to high-integrity intervention may affect responding during subsequent integrity failures (e.g., Colón & Ahearn, 2019; St. Peter Pipkin et al., 2010). For example, Colón and Ahearn demonstrated that interspersing high-integrity sessions during a reduced-integrity phase mitigated negative effects of reduced integrity during response interruption and redirection procedures. Recall that, in Study 2 of the current study, both participants experienced a high-integrity timeout phase before low-integrity phases. We considered this initial high-integrity phase important for demonstrating the efficacy of the teacher-designed timeout procedure, and it identified that the procedure was not effective for one student (Sonny; data available in Supplemental Materials). Thus, for Sonny, omitting an initial high-integrity phase could have resulted in extensive time spent in low-integrity phases only to yield uninterpretable data, because timeout was not effective even when implemented perfectly. To address the possible role of phase sequence, we conducted two final phases with Keith, in which reduced-integrity followed high-integrity rather than baseline. Although mean rates of problem behavior were lower during this reduced-integrity phase than during those that followed baseline, the decrease was not statistically significant. Thus, the role of reinforcement history when timeout is implemented with integrity failures remains unclear.
Future research could adopt experimental approaches similar to the one used by Colón and Ahearn to more fully explore how histories with high-integrity timeout might inoculate against later, lower-integrity implementation. If intermittent timeout is more effective following high-integrity timeout than following baseline, behavior analysts may need to focus on training teachers sufficiently to ensure initial implementation of timeout with high-integrity. Taking the time to train staff well in the beginning may result in more robust treatment effects if integrity failures occur later.
In our studies, we were particularly interested in timeout procedures as they were implemented in practice by teachers. To this end, we recruited participants for whom timeout was already part of their behavior plans and maintained the components of the teacher-developed plans when we experimentally manipulated the frequency of timeout in Study 2. The strength of this approach is that it allowed us to gain a snapshot of actual practices (rather than procedures imposed by researchers). However, there were also distinct limitations of this approach. First, the existing timeout procedures did not always align with best practices in the literature. For example, Sonny’s teachers used a short timeout initially followed by a timeout with a potentially very extended duration—a procedure that has no established efficacy in the literature (notably, we offered to assist Sonny’s teachers in updating his procedures to align with best practice, but they declined). Our sample did not permit an evaluation of the extent to which teacher-developed plans align with published best practices, but such an evaluation may make a useful contribution to the literature. Second, no formal functional analyses were available for the participants. Thus, it is possible that timeout may have actually been a contraindicated procedure for some participants. However, we examined timeout as it is typically used in school contexts, and we explicitly avoided recruiting students whose teachers used timeout during work situations, in an attempt to reduce the likelihood that timeout may function as a reinforcement procedure. Finally, using the existing timeout procedures resulted in some variation in the actual procedural components across participants (e.g., instructions were used for Willis but not Keith), which could have contributed to variability observed in Study 2. 
Analyzing the interactions between particular timeout components and the integrity with which those components are implemented would be a potentially interesting avenue for future research.
Our studies focused on effects of omission errors (failure to implement timeout altogether) because this was a common error during our descriptive observations. However, omission and commission errors during specific components of a timeout procedure could also impact effectiveness. For our students, existing timeout procedures included several components, like providing instructions to go to timeout and guiding students to timeout if needed. The potential negative impact of integrity failures on other timeout components is unclear. It seems likely that there are critical features of timeout (such as limiting access to reinforcers) that impact timeout efficacy as much, or more, than does overall timeout frequency. Future research should evaluate which specific components of a timeout procedure are necessary to suppress problem behavior.
Our study was conducted during naturally occurring play situations in the students’ schools, which resulted in considerable uncontrolled variability in the environments. For example, the number of students present at recess and the games that students chose to play varied naturally from session to session. These uncontrolled variables almost certainly induced variability in challenging behavior in our study. For example, if the peer that always facilitated freeze tag, a game that can involve rough-and-tumble play, was absent, the students might have played independently (e.g., on the swings), reducing the likelihood of challenging behavior independently of our programmed contingencies. For participants who engaged in peer aggression, peer absences may have reduced the likelihood that aggression would occur. We used a reversal design in Study 2 and often conducted phases for up to 15 days in an attempt to increase experimental control, but our data are limited by the considerable amount of obtained variability. Future studies could collect data in more highly controlled contexts to attain better stability and isolate the impact of reduced-integrity. However, this increased control may come at the expense of external or ecological validity.
The low levels of omission integrity obtained in Study 1 may have been due to how teachers viewed the definitions of target behavior. Recall that we used the operational definitions from the students’ BIPs. However, the formal definitions of aggression used by the teachers (see Table 1) captured several instances of pretend play (e.g., pretend fighting as Power Rangers). Although this behavior met the criterion for timeout according to the teachers’ definitions, the teachers may not have considered these instances of aggression to be socially significant problems and instead may have classified them in the moment as play. Teachers may also be less likely to implement timeout for behavior that is not highly severe because the implementation of timeout may be effortful for the teacher or evoke additional challenging behavior from the student. It was not possible to determine the relation between omission errors and the severity of each instance of behavior from our data. However, evaluating the impact of behavior severity on naturalistic treatment integrity seems like a useful area for future research.
Because timeout may suppress responding even when implemented highly intermittently, it may be a useful intervention component when few resources are available or inconsistent implementation is a concern. Additional research is needed on the variables (e.g., classroom size, number of target behaviors, specific components included in the procedure, teacher experience, history of implementation, detail of operational definitions, etc.) that may impact the effectiveness of timeout procedures. Further research is also needed on the timein variables (e.g., density and types of reinforcers available) that influence timeout. Such research will allow behavior analysts to design multicomponent interventions that include low-effort but effective use of timeout procedures.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Author Biographies
Apral P. Foreman is now a training manager for Blue Sprig Pediatrics and completed this study in partial fulfillment of the requirements for a doctoral degree in Psychology.
Claire C. St. Peter is a professor of Psychology at West Virginia University whose research focuses on development and dissemination of effective interventions for school-aged children who engage in antisocial behavior.
Gabrielle A. Mesches is now a research coordinator at Northwestern University whose interests span a variety of clinical interventions for at-risk populations.
Nicole Robinson is now the clinical director of The Hope Learning Center, where she oversees the provision of educational and therapeutic services for children with special needs.
Lucie M. Romano is now a practicing behavior analyst who specializes in precision teaching.
Footnotes
Ethical Approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the West Virginia University Institutional Review Board (IRB #1409437044) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed Consent
Informed consent was obtained from teachers and parents of all individual participants included in the study.
Supplemental Material
Supplemental material for this article is available online.
Note
1. Note that we chose to measure integrity relative to the procedures already in place in the classroom because of research suggesting that teacher involvement in the development of plans may influence integrity. However, using existing plans meant that not all plans necessarily followed best practices in the literature (e.g., Sonny was required to sit for the remaining duration of recess upon the second timeout).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
References
- Arkoosh MK, Derby KM, Wacker DP, Berg W, McLaughlin TF, & Barretto A. (2007). A descriptive evaluation of long-term treatment integrity. Behavior Modification, 31(6), 880–895. 10.1177/0145445507302254
- Barton LE, Brulle AR, & Repp AC (1987). Effects of differential scheduling of timeout to reduce maladaptive responding. Exceptional Children, 53(4), 351–356. 10.1177/001440298705300410
- Calhoun KS, & Matherne P. (1975). The effects of varying schedules of timeout on aggressive behavior of a retarded girl. Journal of Behavior Therapy and Experimental Psychiatry, 6(2), 139–143. 10.1016/0005-7916(75)90039-7
- Carroll RA, Kodak T, & Fisher WW (2013). An evaluation of programmed treatment-integrity errors during discrete-trial instruction. Journal of Applied Behavior Analysis, 46(2), 379–394. 10.1002/jaba.49
- Clark HB, Rowbury T, Baer AM, & Baer DM (1973). Timeout as a punishing stimulus in continuous and intermittent schedules. Journal of Applied Behavior Analysis, 6(3), 443–455. 10.1901/jaba.1973.6-443
- Colón CL, & Ahearn WH (2019). An analysis of treatment integrity of response interruption and redirection. Journal of Applied Behavior Analysis, 52(2), 337–354. 10.1002/jaba.537
- Cuenin LH, & Harris KR (1986). Planning, implementing, and evaluating timeout interventions with exceptional students. TEACHING Exceptional Children, 18(4), 272–276. 10.1177/004005998601800408
- Donaldson JM, & Vollmer TR (2011). An evaluation and comparison of timeout procedures with and without release contingencies. Journal of Applied Behavior Analysis, 44(4), 693–705. 10.1901/jaba.2011.44-693
- Donaldson JM, & Vollmer TR (2012). A procedure for thinning the schedule of time-out. Journal of Applied Behavior Analysis, 45(3), 625–630. 10.1901/jaba.2012.45-625
- Everett GE (2010). Time-out in special education settings: The parameters of previous implementation. North American Journal of Psychology, 12(1), 159–170.
- Fryling MJ, Wallace MD, & Yassine JN (2012). Impact of treatment integrity on intervention effectiveness. Journal of Applied Behavior Analysis, 45(2), 449–453. 10.1901/jaba.2012.45-449
- Hobbs SA, & Forehand R. (1977). Important parameters in the use of timeout with children: A re-examination. Journal of Behavior Therapy and Experimental Psychiatry, 8(4), 365–370. 10.1016/0005-7916(77)90004-0
- Jackson JL, & Calhoun KS (1977). Effects of two variable-ratio schedules of timeout: Changes in target and non-target behaviors. Journal of Behavior Therapy and Experimental Psychiatry, 8(2), 195–199. 10.1016/0005-7916(77)90047-7
- Maag JW (2001). Rewarded by punishment: Reflections on the disuse of positive reinforcement in schools. Exceptional Children, 67(2), 173–186. 10.1177/001440290106700203
- Northup J, Fisher W, Kahng SW, Harrell R, & Kurtz P. (1997). An assessment of the necessary strength of behavioral treatments for severe behavior problems. Journal of Developmental and Physical Disabilities, 9(1), 1–16. 10.1023/A:1024984526008
- Pendergrass VE (1971). Effects of length of time-out from positive reinforcement and schedule of application in suppression of aggressive behavior. The Psychological Record, 21(1), 75–80. 10.1007/BF03393992
- Peterson L, Homer AL, & Wonderlich SA (1982). The integrity of independent variables in behavior analysis. Journal of Applied Behavior Analysis, 15(4), 477–492. 10.1901/jaba.1982.15-477
- Rhymer KN, Evans-Hampton TN, McCurdy M, & Watson TS (2002). Effects of varying levels of treatment integrity on toddler aggressive behavior. Special Services in the Schools, 18(1–2), 75–82. 10.1300/J008v18n01_05
- St. Peter Pipkin C, Vollmer TR, & Sloman K. (2010). Effects of treatment integrity failures during differential reinforcement of alternative behavior: A translational model. Journal of Applied Behavior Analysis, 43(1), 47–70. 10.1901/jaba.2010.43-47
- Taylor J, & Miller M. (1997). When timeout works some of the time: The importance of treatment integrity and functional assessment. School Psychology Quarterly, 12(1), 4–22. 10.1037/h0088943
- Vollmer TR, Sloman KN, & St. Peter Pipkin C. (2008). Practical implications of data reliability and treatment integrity monitoring. Behavior Analysis in Practice, 1(2), 4–11. 10.1007/BF03391722
- Warzak WJ, Floress MT, Kellen M, Kazmerski JS, & Chopko S. (2012). Research forum-trends in time-out research: Are we focusing our efforts where our efforts are needed? The Behavior Therapist, 35(2), 30–33.
- Zimmerman J, & Baydan NT (1963). Punishment of sΔ responding of humans in conditional matching to sample by time-out. Journal of the Experimental Analysis of Behavior, 6(4), 589–597. 10.1901/jeab.1963.6-589