Author manuscript; available in PMC: 2020 Oct 1.
Published in final edited form as: J Exp Psychol Gen. 2018 Nov 12;148(10):1665–1674. doi: 10.1037/xge0000519

Monitoring prediction errors facilitates cognition in action

John Plass 1,3, Simon Choi 1, Satoru Suzuki 1,2, Marcia Grabowecky 1,2
PMCID: PMC6511482  NIHMSID: NIHMS990287  PMID: 30421944

Abstract

Cognition in action requires strategic allocation of attention between internal processes and the sensory environment. We hypothesized that this resource allocation could be facilitated by mechanisms that predict sensory results of self-generated actions. Sensory signals conforming to predictions would be safely ignored to facilitate focus on internally generated content, whereas those violating predictions would draw attention for additional scrutiny. During a visual-verbal serial digit-recall task, we varied the temporal relationship between task-irrelevant keypresses and auditory distractors so that the distractors were either temporally coupled or decoupled with keypresses. Consistent with our hypothesis, distractors were more likely to interfere with target maintenance and intrude into working memory when they were decoupled from keypresses, thereby violating action-based sensory predictions. Interference was maximal when sounds preceded keypresses, suggesting that stimuli were most distracting when their timing was inconsistent with expected action-sensation contingencies. In a follow-up experiment, neither auditory nor visual cues to distractor timing produced similar effects, suggesting a unique action-based mechanism. These results suggest that action-based sensory predictions are used to dynamically optimize attentional allocation during cognition in action.

Introduction

Cognition in everyday life requires strategic allocation of attentional resources to both cognitive processes and sensory signals that may indicate environmental demands. For example, navigating a city block requires encoding, maintaining, and manipulating (e.g., mentally rotating) sequences of location cues in working memory while also monitoring the environment for potential dangers, such as uneven sidewalks or oncoming cars.

However, the ability to concurrently cognize and monitor the environment is limited by mutual interference between attention and working memory processes required by both tasks. For example, attentional tracking, selection, and re-allocation can interfere with the maintenance or manipulation of visuospatial content in working memory (Awh, Jonides, & Reuter-Lorenz, 1998; Awh & Jonides, 2001; Corbetta & Shulman, 2002; Oh & Kim, 2004; Woodman & Luck, 2004; Fougnie & Marois, 2006). Similarly, phonological working memory can be disrupted by auditory signals (Baddeley, 1966, 1992; Colle & Welsh, 1976; Logie, Sala, Laiacona, Chalmers, & Wynn, 1996; Salame & Baddeley, 1982, 1987; Jones & Macken, 1993), especially when they capture attention (e.g., R. W. Hughes, Vachon, & Jones, 2005; Vachon, Hughes, & Jones, 2012).

Moreover, large-scale patterns of neural activity suggest a global opposition between processing of internally- and externally-generated signals, with hemodynamic activity in networks that encode internally-generated content being anti-correlated with activity in networks associated with externally-oriented perceptual and attentional processing (Fox et al., 2005). Additionally, during self-reports of internally-generated thought, neurophysiological responses to sensory stimuli are dampened and behavioral responses are less reliable (Kam & Handy, 2013).

Despite these clear impediments to concurrent processing of internally- and externally-generated signals, a wide variety of human activities, including navigation, social interaction, physical recreation, artistic production, complex tool use, and general problem solving, often require continuous and near-simultaneous processing of externally-generated sensory information and information generated or sustained by internal processes. This problem is exacerbated in activities involving dynamic interactions with the environment because self-generated actions may become an additional source of competing sensory stimuli, triggering attentional tracking or re-orienting operations that can interfere with working memory (Awh & Jonides, 2001; Oh & Kim, 2004; Woodman & Luck, 2004; Fougnie & Marois, 2006; R. W. Hughes et al., 2005; Vachon et al., 2012). How do humans manage the competing demands of externally-oriented perceptual processes and internally-oriented cognitive processes when interacting with dynamic sensory environments?

We hypothesized that, during simultaneous cognition and action, attentional allocation is continuously optimized by monitoring the accuracy of action-based sensory predictions. Because actions are selected to produce particular environmental effects, discrepancies between expected and actual sensory outcomes are indicative of deficiencies in an organism’s working model of the surrounding environment (Friston, 2011; Clark, 2013, 2015). Therefore, during cognition, the magnitude of prediction errors could be used to gauge the need for attentional re-allocation towards potentially informative environmental signals. When the sensory results of self-generated actions are consistent with expected outcomes, they can be safely ignored, freeing attentional resources to be allocated to information being maintained in working memory. However, when sensory input is inconsistent with expected action-sensation contingencies, expectations must be reevaluated, requiring further attentional scrutiny of unexpected sensory events. Thus, monitoring prediction errors could facilitate the efficient allocation of attention to internal or external signals in accordance with time-varying informational needs.

Because self-generated actions place stereotypical constraints on the timing and content of subsequent sensations, corollary signals could be used to guide attentional allocation towards or away from a wide variety of sensory signals during cognition in action (Waszak, Cardoso-Leite, & Hughes, 2012; Clark, 2013). For example, initiating forward locomotion produces a predictable array of sensations, including a particular rate and trajectory of optic flow, temporal pattern of auditory footsteps, and spatiotemporal pattern of tactile signals on the feet and skin. While this array of signals would likely be distracting in an unrelated context (e.g., while sitting), their predictability on the basis of self-generated actions could be used to minimize their salience when they are task-irrelevant, potentially reducing interference with ongoing attentional and cognitive processes. Moreover, because these particular signals would be expected on the basis of one’s own actions, signals that deviate from this expectation might be more likely to attract attention, even if they would be relatively inconspicuous under different circumstances. Previous research suggests that tactile (Blakemore, Frith, & Wolpert, 1999; Bays, Wolpert, & Flanagan, 2005), visual (Cardoso-Leite, Mamassian, Schütz-Bosbach, & Waszak, 2010), and auditory (Sato, 2008; Weiss, Herwig, & Schütz-Bosbach, 2011) signals can all be perceptually attenuated by action-based predictability. However, it is currently unknown whether action-based predictions are also used to direct attention between internal and external signals, as we have proposed.

To test this hypothesis, we conducted a visual-verbal serial digit-recall experiment in which auditory distractors were temporally coupled or decoupled with keypress responses to task-irrelevant cues. Because action-based predictions attenuate both neural responses to and the subjective intensity of sensory stimuli (Waszak et al., 2012; G. Hughes, Desantis, & Waszak, 2013), we hypothesized that the distracting effects of auditory stimuli would be reduced by action-based temporal predictability when keypresses and distractors were temporally coupled. However, when auditory distractors were decoupled from keypresses, the distractors would no longer be predictable based on the corollary signals and thus attract attention, disrupting working memory. Violations of expectations produced by, for example, auditory deviants can disrupt cognition by capturing attention (R. W. Hughes, Vachon, & Jones, 2007; Vachon et al., 2012; Parmentier, 2014). Because predictable action-sensation contingencies may produce sensory expectations similar to those produced by repetitive or structured stimulus sequences (Moore & Haggard, 2008), we hypothesized that violations of action-based sensory expectations would produce similar results. Such an outcome would provide evidence in support of an attentional guidance mechanism that operates on the confirmation or violation of action-based sensory predictions.

Experiment 1

Participants completed a visually-presented serial digit-recall task. During the presentation of 7-digit sequences, participants were prompted to make simple keypress responses to task-irrelevant visual cues that appeared after 4 out of 7 randomly-selected digits (Figure 1a). Auditory distractors (spoken conflicting numbers) were always presented around the time of each keypress response, but the temporal relationship between the keypress response and the auditory distractor varied by condition (Figure 1b). On “action-coupled” trials, auditory distractors were presented immediately (7.5 ms) after each keypress response. Thus, the timing of individual auditory distractors was predictable based on participants’ self-generated actions.

Figure 1.

Design and results of Experiment 1. (a) On each trial, participants made speeded responses to a task-irrelevant red square while memorizing a sequence of seven visually-presented digits. The square turned red following the presentation of four randomly-selected numbers per trial. Participants reported the visually-presented digits at the end of each trial with no time constraints. (b) A conflicting auditory distractor number was presented either 7.5 ms following participants’ keypress response to the red square (the action-coupled condition) or following the red square at a delay sampled from a Gaussian distribution fit to participants’ response times from the previous 24 trials (the action-decoupled condition). (c) Left. Overall recall for visually-presented digits was significantly more accurate when distractors were temporally coupled to keypresses. Right. For digits presented before speeded responses to the red square, recall accuracy was significantly decreased only when distractors preceded keypresses by more than the typical temporal window of integration for actions and sounds (±100 ms). (d) Auditory distractors were more likely to intrude into working memory when they were temporally decoupled from keypresses. Error bars represent ±1 SEM adjusted for the repeated-measures design. (*) p < .05, (**) p < .01, (****) p < .001. All other comparisons p > .05.

By contrast, on “action-decoupled” trials, distractor timing was not determined by the timing of the keypress response but, rather, the distribution of previous response times. Specifically, each distractor latency was sampled from a Gaussian distribution with mean and standard deviation determined by the previous 24 response times. Thus, on the action-decoupled trials, the timing of individual distractors was not predictable based on participants’ actions; however, the mean and variability of distractor latencies (relative to the onsets of digit targets) were matched to those on the action-coupled trials. The action-coupled and action-decoupled trials were randomly intermixed.
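The latency sampling described for the action-decoupled trials can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the class name `DecoupledLatencySampler`, the use of the sample (rather than population) standard deviation, and the rolling window over response times are all assumptions made for the example.

```python
import random
import statistics
from collections import deque

WINDOW = 24  # number of most recent response times used (value taken from the text)


class DecoupledLatencySampler:
    """Hypothetical helper: draws distractor latencies from a Gaussian
    fitted to the most recent response times."""

    def __init__(self, window=WINDOW, seed=None):
        # deque with maxlen silently drops the oldest value once full
        self.recent_rts = deque(maxlen=window)
        self.rng = random.Random(seed)

    def record(self, rt_ms):
        """Store one keypress response time (ms after the red-frame cue)."""
        self.recent_rts.append(rt_ms)

    def sample(self):
        """Draw one distractor latency (ms after the red-frame cue)."""
        if len(self.recent_rts) < 2:
            raise ValueError("need at least two response times to fit a Gaussian")
        mu = statistics.mean(self.recent_rts)
        sigma = statistics.stdev(self.recent_rts)  # assumption: sample SD
        return self.rng.gauss(mu, sigma)
```

Because the draw ignores the current keypress, the latency matches the action-coupled trials in mean and variability while carrying no information about when the participant actually responds on that cue.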

Because only a single instance of action-sensation coupling is necessary to establish the expectation of subsequent co-occurrence (Moore & Haggard, 2008), we expected that reliable temporal coupling during action-coupled trials and chance temporal coupling during action-decoupled trials would evoke robust expectations of action-sensation coupling throughout the task. We hypothesized that confirmation of these expectations in the action-coupled condition would facilitate attentional allocation to the serial recall task, resulting in improved recall accuracy, while violations of expectations in the action-decoupled condition would increase attentional capture by auditory distractors, producing more distractor intrusions into working memory.

Method

Participants.

Forty-three (24 female) undergraduate students at Northwestern University (age range: 18-21 years; mean age = 19.0) gave informed consent to participate in exchange for partial course credit. We decided in advance to use a minimum sample size of 35. The number of participants available to us through our department participant pool for one academic quarter was sufficient to meet our criterion, so all pool participants for the quarter were used. All participants had normal or corrected-to-normal visual acuity, normal color vision, and normal hearing.

Apparatus.

Visual stimuli were presented on a 19” color LCD monitor (60 Hz, 1280 × 1024 resolution) at a viewing distance of approximately 70 cm in a dimly lit room. Participants responded using the space bar, enter key, and the horizontally arranged numerical keys at the top of a standard keyboard. Auditory stimuli were presented through Sennheiser HD 280 Pro closed-back headphones. These headphones provided up to 32 dB of external noise attenuation, which was sufficient to render any sounds produced by keypresses inaudible. Experiments were written in MATLAB using Psychophysics Toolbox 3.0.12 (Brainard, 1997; Pelli, 1997).

Stimuli.

Visual stimuli were single-digit numbers (approx. 0.6° by 1.15°; Helvetica font) presented within a 2.75° square frame (5 px thick) on a white background. On each trial, 7 digits, randomly selected from 1 through 9 (without replacement), were presented sequentially. Each digit was presented for 250 ms, followed by a 1250 ms inter-stimulus interval. After 4 out of 7 digits (randomly selected), the bounding frame was briefly (100 ms) replaced with a red, thickened (8 px) frame, cuing participants to respond as fast as possible by pressing the space bar (Figure 1a). The red frame was always presented 550 ms after the onset of an individual digit (and 700 ms before the onset of the next digit) so that the auditory distractor was presented well after perceptual encoding of the visual stimulus and participants had sufficient time to respond to it before the onset of the next digit.

Auditory stimuli were amplitude-normalized (55 dB SPL) recordings of a male speaker vocalizing the numbers 1 through 9 (mean duration: 262 ms; SD: 32 ms). Each auditory distractor was randomly selected from these numbers, except that it was different from the immediately preceding and following visual numbers. After 6 practice trials with no auditory distractors, 30 trials with action-coupled and 30 trials with action-decoupled distractors were presented in random order.

Procedure.

On each trial, participants memorized a 7-digit sequence to the best of their ability while responding as fast as they could to each presentation of the red frame. The experimenter emphasized that the auditory stimuli would never be task-relevant and were only presented to make the task more difficult. At the end of each trial, participants were prompted to type in the numbers that they had seen. Numbers were displayed on screen as they were entered and could be erased using the backspace key. Participants finalized their responses by pressing the enter key. The experimenter explained that responses would be scored using the strict serial-recall criterion; that is, individual digits are counted as correct only if the correct digit is reported in the correct serial position. Finally, to encourage participants to respond to the keypress task, they were misled to believe that additional trials would be added if they failed to respond.
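The strict serial-recall criterion can be illustrated with a short sketch. The function name `strict_serial_score` and the handling of responses shorter than the presented sequence (missing positions simply count as incorrect) are assumptions for the example.

```python
def strict_serial_score(presented, reported):
    """Proportion of digits reported in their correct serial position.

    A digit counts as correct only if it matches the presented digit at the
    same index. zip() stops at the shorter list, so positions missing from a
    short response are never counted as hits.
    """
    hits = sum(1 for p, r in zip(presented, reported) if p == r)
    return hits / len(presented)


# Two adjacent digits swapped: both swapped positions are scored as errors.
score = strict_serial_score([3, 1, 4, 5, 9, 2, 6], [3, 1, 4, 9, 5, 2, 6])
```

In this example the participant recalled every digit, but transposing 5 and 9 leaves only five of the seven positions correct.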

Results

Working memory interference.

To assess the effects of action-based sensory predictability on working memory, we compared recall accuracy between the two distractor conditions. Consistent with our hypothesis, accuracy was significantly higher in the action-coupled condition than in the action-decoupled condition, t(42) = 2.95, p = .005, dz = .45 (Figure 1c, left panel).
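For readers checking the effect size, the standardized paired difference dz relates to the paired t statistic by dz = t / sqrt(n). The sketch below is illustrative only: the function names are hypothetical, and the per-participant accuracies are not available here, so only the conversion from the reported t is evaluated.

```python
import math
import statistics


def paired_t_and_dz(cond_a, cond_b):
    """Paired t statistic and Cohen's dz from per-participant scores."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    # dz is the mean difference divided by the SD of the differences
    dz = statistics.mean(diffs) / statistics.stdev(diffs)
    t = dz * math.sqrt(n)
    return t, dz


def dz_from_t(t, n):
    """Recover dz from a reported paired t statistic and sample size."""
    return t / math.sqrt(n)


print(round(dz_from_t(2.95, 43), 2))  # 0.45
```

With t = 2.95 and n = 43 participants, the conversion gives a value that rounds to the reported dz of .45.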

To test whether this result could be attributed to factors unrelated to action-sensation predictability, we evaluated a series of plausible alternative explanations. It is possible that participants could have strategically shifted their response times or omitted responses to the red square in the action-coupled condition to minimize distraction. However, there were no significant differences in mean response times or response frequencies to the red square between the two conditions (mean response time = 397 ms [SE = 2 ms, corrected for within-subjects comparison] for the action-coupled condition and 401 ms [SE = 2 ms] for the action-decoupled condition, t(42) = 1.35, p = .185; mean response frequency = 89.6% [SE = 0.4] for the action-coupled condition and 89.6% [SE = 0.4] for the action-decoupled condition, t(42) = 0.04, p = .969). We additionally compared response time distributions between the two conditions using a separate Anderson-Darling (AD) test for each of the 43 participants. We found no significant differences between response time distributions for any participant (AD test statistic, median: 1.10, range: 0.19 to 2.42; critical value: 2.492), suggesting that participants did not strategically vary their response patterns depending on the condition.

We next considered whether differences in the number of distractors presented between the two conditions could have driven the observed effect. Because participants occasionally failed to respond to the red square, slightly fewer distractors were presented in the action-coupled condition, in which distractor presentation depended on whether participants responded. However, there was no significant correlation between the across-condition difference in distractor frequency and the across-condition difference in working-memory accuracy, r = 0.003, t(41) = .02, p = .99, providing no evidence that the condition effect was driven by differences in distractor frequency.
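The t statistic reported for this correlation follows from the standard conversion t = r * sqrt(n - 2) / sqrt(1 - r^2). A small sketch (the function name `t_from_r` is hypothetical) confirms that the reported values are mutually consistent:

```python
import math


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


print(round(t_from_r(0.003, 43), 2))  # 0.02
```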

To directly assess whether action-based sensory predictions influenced working memory independently of potential effects of distractor frequency, we compared recall accuracy only for those visual numbers that were followed by both keypresses and auditory distractors during encoding. Consistent with our hypothesis, recall was significantly more accurate for visual numbers followed by action-coupled auditory distractors (M = 73.7%, SE = 0.8) than those followed by action-decoupled auditory distractors (M = 70.8%, SE = 0.8), t(42) = 2.35, p = .023, dz = 0.36.

To assess the extent to which distractor interference depended on the temporal relationship between keypresses and auditory distractors, we binned trials from the action-decoupled condition based on the temporal relationship between keypresses and auditory distractors (Figure 1c, right panel). Because previous research suggested that the window of perceived simultaneity between keypresses and sounds is approximately ±100 ms (symmetrical) for untrained observers (Toida, Ueno, & Shimada, 2014), we partitioned trials into four distractor-timing bins relative to keypresses: 100-200 ms before keypresses (Mean number of distractors = 23.3, SD = 5.8), 0-100 ms before keypresses (M = 25.2, SD = 8.8), 0-100 ms after keypresses (M = 29.4, SD = 11.7), and 100-200 ms after keypresses (M = 25.4, SD = 6.4).
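The four timing bins can be expressed as a simple lookup on the signed offset between distractor onset and keypress. This is a sketch under stated assumptions: the function name `timing_bin`, the half-open bin edges, and the exclusion of offsets beyond ±200 ms are choices made for the example, since the text does not specify how boundary cases were handled.

```python
def timing_bin(offset_ms):
    """Map distractor-minus-keypress offset (ms) to a bin number 1-4.

    Negative offsets mean the distractor preceded the keypress.
    Returns None for offsets outside the +/-200 ms range (assumption).
    """
    edges = [(-200, -100, 1),  # 100-200 ms before the keypress
             (-100, 0, 2),     # 0-100 ms before the keypress
             (0, 100, 3),      # 0-100 ms after the keypress
             (100, 200, 4)]    # 100-200 ms after the keypress
    for lower, upper, label in edges:
        if lower <= offset_ms < upper:  # half-open interval (assumption)
            return label
    return None
```

Under this scheme, bins 2 and 3 together span the ±100 ms window of perceived simultaneity, while bins 1 and 4 fall just outside it.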

In comparison to performance on action-coupled trials, recall was less accurate when auditory distractors preceded keypresses by more than the typical window of perceived simultaneity (t[34] = 4.32, p < .001, dz = 0.66, for time-bin 1), but not when distractors were within the window (t[34] = 1.47, p = .150, for time-bin 2, t[34] = 0.03, p = .976, for time-bin 3) or when they immediately followed the window (t[34] = 0.15, p = .884 for time-bin 4). Recall was significantly less accurate for the earliest time bin than for all other time bins (t[34] = 2.84, p = .006, dz = 0.43 for time-bin 1 vs. time-bin 2, t[34] = 3.97, p < .001, dz = 0.61, for time-bin 1 vs. time-bin 3, and t[34] = 3.94, p < .001, dz = 0.60, for time-bin 1 vs. time-bin 4), but there were no significant differences in recall accuracy among the later time bins (p’s > .23). Together, these results suggest that auditory distractors immediately preceding the action-perception integration window strongly interfere with working memory encoding.

Attentional capture.

Whereas our analyses so far suggest that action-decoupled distractors disrupt working memory, they do not directly demonstrate that this disruption was due to attentional capture by unexpected distractors. To evaluate the hypothesis that action-decoupled distractors were more likely to capture attention than action-coupled distractors, we compared how often numbers intruded into working memory in each condition when they were presented as auditory distractors, but not as visual targets, on a given trial (Figure 1d). Because auditory distraction can disrupt both the encoding and serial ordering of sequentially presented items in working memory (R. W. Hughes et al., 2007; R. W. Hughes, Hurlstone, Marsh, Vachon, & Jones, 2013), auditory distractors that intrude into working memory may not be recalled in the serial position that corresponds to their presentation time. We thus used mistaken recall of distractors, regardless of their serial position in participants’ responses, as an index of attentional capture. As expected, numbers presented only as auditory distractors were more likely to intrude into working memory when they were temporally decoupled from keypresses than when they were coupled to keypresses, t(42) = 2.41, p = .021, dz = 0.37 (Figure 1d).
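The intrusion index can be sketched for a single trial as follows. The function name `intrusion_rate` and the choice to count each distinct distractor-only number once per trial are assumptions; the text does not state how repeated distractors or per-trial aggregation were handled.

```python
def intrusion_rate(targets, distractors, reported):
    """Proportion of distractor-only numbers that appear anywhere in the response.

    A number is 'distractor-only' if it was spoken as a distractor on this
    trial but never shown as a visual target. Serial position is ignored.
    Returns None when the trial has no distractor-only numbers.
    """
    distractor_only = set(distractors) - set(targets)
    if not distractor_only:
        return None
    intruded = distractor_only & set(reported)
    return len(intruded) / len(distractor_only)


# 7 and 8 were heard but never shown; only 7 was (mistakenly) reported.
rate = intrusion_rate(targets=[3, 1, 4, 5, 9, 2, 6],
                      distractors=[7, 8, 1, 7],
                      reported=[3, 1, 7, 5, 9, 2, 6])
```

Here the distractor 1 is excluded from the index because it was also a visual target, so its appearance in the response cannot be attributed to the auditory stream.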

Discussion

Action-based predictability of distractor timing in the action-coupled condition afforded superior working memory performance and reduced distractor intrusions relative to the action-decoupled condition, while stimulus-based temporal predictability (i.e., the variability in distractor timing relative to the action-demanding red square) was balanced between the two conditions. Because it was always disadvantageous to attend to auditory distractors and participants were instructed to ignore them, these results suggest that distractors were more likely to capture attention when their timing was not predictable based on participants’ actions. The fact that working memory interference occurred primarily when auditory distractors immediately preceded (rather than followed) the typical action-perception integration window suggests that distractors captured attention especially when their timing was inconsistent with the expected temporal contingency between actions and sensations. These results are consistent with the hypothesis that attention is dynamically re-allocated between internal cognitive processes and external sensory signals in accordance with the accuracy of action-based sensory predictions.

Experiment 2

Experiment 2 was designed to test whether the effects observed in Experiment 1 depended specifically on action-based sensory predictions or could also be produced by other, external cues that predict distractor timing. Because anticipatory cues to target timing are ineffective under working memory load, we hypothesized that they would also be ineffective in guiding attention towards or away from distractors during our working memory task (Capizzi, Sanabria, & Correa, 2012; Capizzi, Correa, & Sanabria, 2013). Thus, we hypothesized that the effects of predictability observed in Experiment 1 were driven by action-specific prediction mechanisms, and could not be reproduced by substituting external timing cues for self-generated actions.

To test this hypothesis, we designed a task that was analogous to the task used in Experiment 1, except that keypresses were replaced with auditory or visual “warning” signals that provided either valid or invalid cues to the timing of auditory distractors. To ensure that distractor timing was comparable to that in Experiment 1, distractors were presented at random delays (relative to the onsets of visual target digits) drawn from a Gaussian distribution with mean (550 ms response cue delay plus 399 ms response time) and standard deviation (117 ms) determined by the distributions of response times observed in Experiment 1. On sequences with valid timing cues, warning signals preceded auditory distractors with a fixed interval of 250 ms. This interval was chosen based on previous reports showing robust temporal orienting effects with similar cue-target asynchronies (Griffin, Miniussi, & Nobre, 2001). On sequences with invalid timing cues, warning signals were presented 250 ms before a delay determined by a separate draw from the same distractor-delay distribution, producing temporal decorrelations comparable to those in the action-decoupled condition of Experiment 1, while preserving the statistics of distractor delays relative to digit targets.
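The cue and distractor scheduling for Experiment 2 can be sketched as below. The constants come from the text; the function name `schedule_event` and the use of Python's `random.Random.gauss` are illustrative assumptions rather than the authors' implementation.

```python
import random

CUE_LEAD_MS = 250            # fixed cue-to-distractor interval on valid trials
DELAY_MEAN_MS = 550 + 399    # response-cue delay plus mean response time (Exp. 1)
DELAY_SD_MS = 117            # SD of response times observed in Experiment 1


def schedule_event(valid, rng):
    """Return (cue_onset_ms, distractor_onset_ms) relative to digit onset."""
    distractor = rng.gauss(DELAY_MEAN_MS, DELAY_SD_MS)
    if valid:
        # Valid cue: always exactly 250 ms before the distractor.
        cue = distractor - CUE_LEAD_MS
    else:
        # Invalid cue: 250 ms before an independent draw from the same
        # distribution, so cue timing is decorrelated from the distractor.
        cue = rng.gauss(DELAY_MEAN_MS, DELAY_SD_MS) - CUE_LEAD_MS
    return cue, distractor
```

Both branches draw the distractor delay from the same distribution, which is what keeps the distractor statistics relative to the digit targets matched across valid and invalid sequences.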

Participants.

Forty-four (28 female) undergraduate students at the University of Michigan (age range: 18-21 years; mean age = 18.6) gave informed consent to participate in exchange for partial course credit. Experiment 2 was approved by the University of Michigan Health Sciences and Behavioral Sciences Institutional Review Board.

Apparatus.

The apparatus was the same as that used in Experiment 1.

Stimuli.

The stimuli were the same as those used in Experiment 1, except that auditory and visual warning stimuli were introduced to replace the keypress component of Experiment 1. The visual warning stimulus was identical to the keypress cue used in Experiment 1. The auditory warning stimulus was a 500 Hz pure tone that lasted 100 ms. Both auditory and visual stimuli were time-locked to the screen refresh closest to their scheduled timing.
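Time-locking to the nearest screen refresh amounts to rounding each scheduled onset to a multiple of the frame duration. The sketch below assumes the 60 Hz refresh rate reported for the Experiment 1 apparatus; the function name `snap_to_refresh` is hypothetical, and the actual Psychophysics Toolbox timing calls are not shown.

```python
REFRESH_HZ = 60  # monitor refresh rate reported in the Apparatus section


def snap_to_refresh(scheduled_ms, refresh_hz=REFRESH_HZ):
    """Round a scheduled onset (ms) to the nearest frame boundary."""
    frame_ms = 1000.0 / refresh_hz  # about 16.67 ms per frame at 60 Hz
    # Note: Python's round() resolves exact .5 ties to the even frame.
    return round(scheduled_ms / frame_ms) * frame_ms
```

At 60 Hz this introduces a timing error of at most half a frame (about 8.3 ms), which is small relative to the 117 ms standard deviation of the distractor delays.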

Procedure.

Participants completed the same serial recall task used in Experiment 1, except that the keypress component was removed. Trials with auditory and visual warning cues were presented in counterbalanced blocks of 60 trials each, with valid-cue and invalid-cue trials randomly intermixed within each block. Participants completed 3 practice trials with each type of warning cue, and were told explicitly which cue would be used at the beginning of each block.

Results

Working memory interference.

To assess the effects of cue-based predictability on working memory, we compared overall recall accuracy between conditions with temporally valid and temporally invalid cues. Consistent with our hypothesis, there were no significant effects of cue validity for either visual (Figure 2a, left panel; t[43] = 1.247, p = .219) or auditory cues (Figure 2c, left panel; t[43] = −0.09, p = .926). Additionally, there was no interaction between block order and cue modality, F(1, 42) = 0.848, p = .363, or three-way interaction between block order, cue modality, and cue validity, F(1, 42) = 0.117, p = .734, suggesting that the results were not influenced by potential fatigue effects across blocks.

Figure 2.

Results of Experiment 2. In Experiment 2, participants completed the same task as in Experiment 1, except that visual (panels A-B) or auditory (panels C-D) anticipatory cues were substituted for keypress responses. Sensory cues were consistently presented 250 ms before auditory distractors in the “valid cue” condition and at variable latencies in the “invalid cue” condition, with the temporal statistics of both the sensory cues and distractors matched between the two conditions. (a) Left. In the visual cue conditions, overall recall accuracy for visually-presented digits did not differ significantly between conditions with valid and invalid temporal cues. Right. Recall accuracy did not differ across conditions after binning “invalid cue” trials by distractor timing. (b) In the visual cue conditions, there was no significant difference in distractor intrusions across conditions with valid and invalid cues. (c) Left. In the auditory cue conditions, overall recall accuracy for visually-presented digits did not differ significantly between conditions with valid and invalid temporal cues. Right. Recall accuracy did not differ across conditions after binning “invalid cue” trials by distractor timing. (d) In the auditory cue conditions, there was no significant difference in distractor intrusions across conditions with valid and invalid cues. Error bars represent ±1 SEM adjusted for the repeated-measures design. “n.s” = “not significant.”

To assess the evidence for the null hypothesis that cue validity did not affect recall, we used a “Bayesian t-test” to estimate the relative likelihood of our results under the null versus alternative hypotheses (i.e., the Bayes factor in favor of the null hypothesis [BF01]; Rouder, Speckman, Sun, Morey, & Iverson, 2009; Morey, Rouder, Pratte, & Speckman, 2011). Using a default prior on Cohen’s d (zero-centered Cauchy distribution with scale parameter .707) recommended by Rouder, Morey, and Wagenmakers (2016), we found that, for visual cues, our results were 2.97 times more likely under the null hypothesis than the alternative hypothesis and, for auditory cues, 6.10 times more likely under the null hypothesis. These results therefore provide substantive evidence against the hypothesis that cue-based predictability affected recall accuracy.
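For intuition about how such a Bayes factor is obtained from a t statistic, the sketch below numerically evaluates the JZS integral as it is commonly stated following Rouder et al. (2009), with the Cauchy scale r entering through the term N·g·r². The exact form of the integrand, the function name `jzs_bf01`, and the trapezoidal integration over log(g) are assumptions for illustration; the authors presumably used dedicated software, and this sketch is not claimed to reproduce their reported values.

```python
import math


def jzs_bf01(t, n, r=0.707, steps=20000, lo=-15.0, hi=40.0):
    """Approximate BF01 for a one-sample or paired t statistic (JZS prior).

    t: observed t statistic; n: number of participants; r: Cauchy prior scale.
    The integral over g is evaluated on u = log(g) with the trapezoid rule.
    """
    v = n - 1  # degrees of freedom
    null_like = (1 + t * t / v) ** (-(v + 1) / 2)

    def integrand(u):
        g = math.exp(u)
        a = 1 + n * g * r * r
        # Inverse-gamma(1/2, 1/2) density on g (mixing prior for the Cauchy)
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g))
        like = a ** -0.5 * (1 + t * t / (a * v)) ** (-(v + 1) / 2)
        return like * prior * g  # extra g is the Jacobian of g = exp(u)

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, steps):
        total += integrand(lo + i * h)
    alt_like = total * h
    return null_like / alt_like
```

Values above 1 favor the null hypothesis and values below 1 favor the alternative; for a fixed sample size, BF01 shrinks as the absolute t statistic grows.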

As with Experiment 1, we also tested to see whether cue validity affected recall for the specific items that were followed by auditory distractors during encoding. Again, there was no effect of cue validity for visual (Valid: M = 75.7%, SE = 1.0; Invalid: M = 76.4%, SE = 1.0; t[43] = −0.51, p = .614; BF01 = 5.42) or auditory cues (Valid: M = 76.5%, SE = 1.0; Invalid: M = 75.0%, SE = 1.0; t[43] = 1.14, p = .261; BF01 = 3.34).

Because the effect observed in Experiment 1 was primarily attributable to distractors presented more than 100 ms before their expected timing, we also tested to see if distractors presented more than 100 ms before the usual interval decreased recall accuracy in Experiment 2. Consistent with our hypothesis, there was no significant difference in recall accuracy for digits followed by validly-cued distractors and those followed by early distractors in either the visual (Figure 2a, right panel; t[43] = 1.55, p = .128) or auditory conditions (Figure 2c, right panel; t[43] = 1.37, p = .179). Bayes factors for this analysis (Visual: BF01 = 2.02; Auditory: BF01 = 2.58) suggested only modest evidence for the null hypotheses. However, these results clearly differed from those observed in Experiment 1, in which an equivalent analysis produced a Bayes factor of 251.6 in favor of the alternative hypothesis. Comparing results across experiments, we observed significant interactions between predictor type (action or sensory) and distractor timing (expected or early) when comparing the effects of action-based predictability in Experiment 1 to those of visual, F(1, 85) = 7.710, p = .007, and auditory, F(1, 85) = 6.729, p = .011, cue-based predictability in Experiment 2. There were no main effects of predictor type (Action vs. Auditory: F[1, 85] = 2.418, p = .124; Action vs. Visual: F[1, 85] = 1.482, p = .227), indicating that overall performance was comparable across the two experiments. Finally, there were no significant differences in recall accuracy for digits followed by validly-cued distractors and digits followed by distractors in any of the other time bins (Visual: all p’s > .49, Figure 2a, right panel; Auditory: all p’s > .37, Figure 2c, right panel). Collectively, these results suggest that external cues to distractor timing do not affect working memory performance in the same way that action-based predictions do.

Attentional capture.

To assess the effects of cue-based temporal expectations on attentional capture, we measured how often numbers intruded into working memory when they were presented as auditory distractors, but not as visual targets, on a given trial. As with Experiment 1, we used mistaken recall of distractors, regardless of their serial position in participants’ responses, as an index of attentional capture. Consistent with our hypothesis, the percentage of distractors that intruded into working memory was not affected by the validity of either visual (Figure 2b; t[43] = 0.99, p = .330; BF01 = 3.89) or auditory cues (Figure 2d; t[43] = 0.62, p = .539; BF01 = 5.11).
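
The intrusion measure described above can be sketched as a simple count. This is a minimal illustration, assuming each trial is stored as a tuple of target digits, distractor digits, and the recalled response; the function name `intrusion_rate` and this data layout are hypothetical. Distractors that also appeared as targets on the same trial are excluded, and serial position in the response is ignored, as described in the text.

```python
def intrusion_rate(trials):
    """Proportion of eligible distractors that appear in the recalled sequence.

    trials: iterable of (targets, distractors, response) tuples of digits.
    A distractor is eligible only if it was not also a target on that trial.
    """
    eligible = 0
    intruded = 0
    for targets, distractors, response in trials:
        for digit in distractors:
            if digit in targets:
                continue  # presented as a target too, so not counted
            eligible += 1
            if digit in response:  # serial position is ignored
                intruded += 1
    return intruded / eligible if eligible else float("nan")


if __name__ == "__main__":
    # Hypothetical trials: seven targets, some auditory distractors, one response each.
    example = [
        ([1, 2, 3, 4, 5, 6, 7], [8, 9, 3, 0], [1, 2, 8, 4, 5, 6, 7]),
        ([1, 2, 3, 4, 5, 6, 7], [8, 9], [1, 2, 3, 4, 5, 6, 7]),
    ]
    print(intrusion_rate(example))
```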

Discussion

In contrast to self-generated actions, anticipatory sensory cues to distractor timing did not improve recall accuracy or reduce distractor intrusions. This suggests that attention was not reallocated between internal and external signals in accordance with the accuracy of cue-based predictions. Because the valid cues in Experiment 2 were as predictive of distractor timing (relative to invalid cues) as distractor-coupled keypresses were (relative to distractor-decoupled keypresses) in Experiment 1, these results suggest that cue-based temporal predictability could not be utilized to guide attention in the same manner as action-based predictability.

While it could be argued that these results were potentially due to sensory cues providing less precise temporal information than action-based cues, our results do not support this interpretation. If this were the case, valid sensory cues would be expected to produce weaker, but non-negligible, recall enhancements when compared to action-coupled keypresses. However, our results suggest that valid sensory cues did not just produce weaker recall enhancements, but did not provide any reliable advantage over invalid sensory cues. Bayesian model comparisons indicated that our data provided substantive evidence for this null hypothesis, suggesting that these results do not simply reflect a lack of statistical power. Indeed, these data allowed us to estimate overall recall accuracy with a standard error of less than one percentage point in each condition (corrected for within-subjects design), suggesting that our study was adequately powered to detect any non-trivial differences between conditions. Moreover, because the timing of the valid sensory cues was chosen to be consistent with that of effective warning cues reported in previous experiments (Griffin et al., 2001), this null result is unlikely to reflect any general inadequacy of the temporal cues. Rather, consistent with previous behavioral and neuropsychological research (Triviño, Correa, Arnedo, & Lupiáñez, 2010; Capizzi, Sanabria, & Correa, 2012; Capizzi, Correa, & Sanabria, 2013; see General Discussion), these results suggest that participants were unable to use sensory warning cues to generate temporal predictions while performing a simultaneous cognitive task.
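
The within-subjects correction mentioned above removes between-participant variability before computing standard errors. The sketch below shows one common approach (participant-mean normalization with a correction for the number of conditions); this section does not specify which correction was applied, so the method, the function name `within_subject_se`, and the example scores are assumptions for illustration only.

```python
import math
import statistics


def within_subject_se(scores):
    """Standard error per condition after removing between-participant offsets.

    scores: list of per-participant lists, one value per condition.
    """
    n_cond = len(scores[0])
    n_subj = len(scores)
    grand_mean = statistics.mean(v for row in scores for v in row)
    # Center each participant on the grand mean.
    normed = [[v - statistics.mean(row) + grand_mean for v in row]
              for row in scores]
    # Correction for the variance removed by normalization.
    correction = math.sqrt(n_cond / (n_cond - 1))
    return [
        statistics.stdev([row[j] for row in normed]) / math.sqrt(n_subj) * correction
        for j in range(n_cond)
    ]


if __name__ == "__main__":
    # Hypothetical accuracy scores (percent) for valid and invalid cues.
    print(within_subject_se([[70, 72], [80, 82], [90, 92]]))
```

In the hypothetical example, participants differ widely in overall accuracy but show identical condition differences, so the corrected standard errors shrink to zero even though the uncorrected ones would be large.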

Additionally, we considered the possibility that other differences between Experiments 1 and 2 could have produced this result. One potentially important difference is that the sensory warning cues in Experiment 2 were themselves presented at temporally variable latencies, potentially producing additional distraction that might have interfered with any facilitative effects produced by temporal predictability. However, previous research suggests that temporal variability in distractor timing itself does not produce distraction during serial recall tasks (Parmentier & Beaman, 2015). Rather, distractions due to temporal variability are driven by violations of temporal expectations, such as those produced by temporal deviants embedded in otherwise rhythmic presentations (e.g., Hughes, Vachon, & Jones, 2005). Because cues were always presented at variable latencies (i.e., non-rhythmically), and because the distributions of cue-onset latencies were matched across conditions, temporal variability in cue latencies would not have violated any stimulus-based temporal expectations throughout the experiment. Further, while keypress cues were temporally fixed in Experiment 1 in the sense that they appeared following a fixed delay from a digit target, they were temporally variable in the sense that they appeared randomly after 4 out of the 7 target digits. Therefore, warning cues in Experiment 2 would have been approximately as distracting as keypress cues in Experiment 1. Indeed, overall performance did not differ significantly across experiments, suggesting that task difficulty was comparable across tasks, and unlikely to have contributed to the observed differences across experiments. Rather, the use of variably timed cues allowed for the distribution of distractor latencies to be closely matched across conditions and experiments, ensuring that any performance differences between conditions could be attributed to temporal information provided by either self-generated actions or warning cues, rather than differences in latency distributions.

Taken together, our results suggest that temporal predictability produced by anticipatory sensory cues could not be utilized to guide attention in the same manner as action-based predictability. This suggests that the prediction-based attentional modulations observed in Experiment 1 did not depend on a general temporal prediction mechanism, but on a mechanism that relies on action-based predictions in particular.

General Discussion

In two experiments, we demonstrated that the allocation of attention between internal cognitive processes and external sensory signals depends on the degree to which sensory input is predictable on the basis of self-generated actions. In Experiment 1, we used a visual-verbal serial digit-recall task to demonstrate that recall accuracy depends on the action-based predictability of distractor timing during working memory encoding. Distractors that were temporally decoupled from keypresses were more likely to interfere with target recall and intrude into working memory than those that were temporally coupled with keypresses. In a second experiment, we verified that these effects were unique to action-based predictions, and could not be produced by auditory or visual cues to distractor timing.

Together, these results suggest that attentional allocation is dynamically optimized by mechanisms that monitor the accuracy of temporal predictions about the sensory consequences of self-generated actions. This strategic reallocation likely plays a role in facilitating cognition during action, in which cognitive and environmental signals frequently place competing demands on attentional and working memory processes.

Because cognition always occurs while the cognizing organism is embedded in a potentially unpredictable environment, the system may utilize various sources of information to predict the timing of sensory events, including stimulus-based, context-based, history-based, and action-based predictors. Such predictions would allow a focus on internal cognitive processes to be maintained so long as sensory signals conform to them, while selectively diverting attention to sensory signals that violate them. However, the ability to exploit various predictors may be constrained by potential interference between cognitive operations and prediction processes that are supported by the same neural subsystems.

In Experiment 2, we found that sensory cues to distractor timing could not be used to guide attention during cognition in the same way that self-generated actions could, although they were similarly predictive of distractor timing. Because working memory load can interfere with cue-based temporal orienting of attention, participants’ ability to exploit external timing cues may have been limited by interference between prediction and working memory mechanisms that draw on the same neurocognitive substrates (Capizzi et al., 2012, 2013). Neuroimaging and lesion mapping studies suggest that cue-based temporal orienting relies critically on prefrontal regions implicated in voluntary cognitive control (Triviño, Correa, Arnedo, & Lupiáñez, 2010) and frontoparietal networks implicated in voluntary attention and working memory (Coull, Frith, Büchel, & Nobre, 2000). Thus, the ability to use external cues to guide temporal attention may be constrained by concurrent cognitive operations involving “top-down” cognitive control.

By contrast, action-based prediction mechanisms may allow the system to circumvent these limitations by repurposing sensory predictions used in action generation and control. Because actions are generated to produce particular environmental effects, motor and pre-motor systems must predict the outcomes of potential actions to identify motor programs that are most consistent with the organism’s goals or expectations (Kawato & Wolpert, 1998; Grush, 2004; Friston, 2011; Clark, 2013, 2015). By communicating these predictions to sensory or attentional systems, mechanisms responsible for predicting action outcomes could guide attention in accordance with sensory expectations without employing the cognitively demanding prediction mechanisms implicated in cue-based temporal orienting. Thus, action-based prediction mechanisms may allow for the flexible reallocation of attention in accordance with changing informational needs without interrupting ongoing cognitive operations.

This interpretation is consistent with theories of neural computation that emphasize the importance of “action-oriented predictive processing” in sensory coding and neural processing more broadly (Friston & Stephan, 2007; Friston, 2011; Clark, 2013, 2015). On these accounts, both perception and action are accomplished by minimizing discrepancies between sensory signals and generative models that map between sensations and their causes. Perceptual inference is performed by identifying generative models that best predict incoming sensory signals. Action intentions are fulfilled by minimizing discrepancies between intended and observed action results. Through mutual reinforcement, these processes enable successful interaction with the environment by producing situationally appropriate beliefs and actions.

On these accounts, sensory signals that violate action-based expectations would trigger large-scale re-evaluations of plausible generative models as the system attempted to account for unexplained input. Because an accurate model relating actions to their likely outcomes is a core prerequisite for survival, this re-evaluation might be prioritized over other concurrent goals, interrupting ongoing processes to divert attention towards unexpected signals.

Consistent with this interpretation, we found in Experiment 1 that distractors were primarily disruptive when they preceded keypresses, and were therefore inconsistent with the expected temporal contingency between actions and sensations. Because temporal contingency is a critical cue for the perception of causality (Michotte, 1963/2017; Bassili, 1976), these results may suggest that distractors were most likely to attract attention when they violated learned models of the causal relationship between actions and sensations within the task context. However, further research directly manipulating the perception of causality is needed to better substantiate this claim.

Finally, distinguishing between cognitively-mediated and action-based prediction mechanisms may help to explain why different types of stimulus-based predictability produce distinct effects on working memory performance. Whereas temporal expectations based on stimulus history (e.g., rhythm) can influence attentional allocation during working memory tasks, temporal expectations produced by predictive cues do not (R. W. Hughes et al., 2005; Capizzi et al., 2012, 2013). One potential explanation for this dissociation is that the encoding and prediction of stimulus rhythm is at least partially subserved by the motor system, while cue-based temporal orienting relies on the high-level cognitive control systems described above. Because stimulus rhythm is encoded in beta-band activity throughout various regions involved in action generation and control (Fujioka, Trainor, Large, & Ross, 2012; Morillon & Baillet, 2017), temporal expectations encoded in those oscillations are unlikely to be disrupted by unrelated cognitive operations. Therefore, action- and rhythm-based expectations produced by this system are likely to play a role in attentional allocation during complex cognition, whereas external cue-based expectations are not.

Altogether, our results suggest that humans use action-based sensory predictions to optimize attentional allocation during cognition, when cognitively-mediated mechanisms for prediction and attentional control may fail. This prediction mechanism likely exploits learned causal relationships between actions and sensations, allowing uninformative action effects to be ignored while novel, unexpected sensations are attended. During cognition in action, this may allow for attention to be optimally re-allocated between internal and external signals in accordance with time-varying knowledge of the encompassing environment, thereby ensuring the strategic coordination of attention, action, and cognition across a wide variety of neural, bodily, and environmental states.

Acknowledgments

This research was supported by NIH grant T32 NS047987. We thank Dr. David Brang for contributing laboratory resources to this project. We also thank Sarah Landes and Chris Denney for their assistance in data collection.

Footnotes

1

The Anderson-Darling test is a non-parametric, non-directional test that is sensitive to many potential differences between distributions, including differences in location, scale, symmetry, tail-density, and overall shape (Anderson & Darling, 1952; Engmann & Cousineau, 2011). Simulations suggest that this test is more sensitive to differences in response time distributions than the commonly used Kolmogorov-Smirnov test, and is well-powered for sample sizes similar to ours.

References

  1. Anderson TW, & Darling DA (1952). Asymptotic Theory of Certain “Goodness of Fit” Criteria Based on Stochastic Processes. The Annals of Mathematical Statistics, 23(2), 193–212.
  2. Awh E, & Jonides J (2001). Overlapping mechanisms of attention and spatial working memory. Trends in Cognitive Sciences, 5(3), 119–126.
  3. Awh E, Jonides J, & Reuter-Lorenz PA (1998). Rehearsal in spatial working memory. Journal of Experimental Psychology: Human Perception and Performance, 24(3), 780.
  4. Baddeley A (1966). Short-term memory for word sequences as a function of acoustic, semantic and formal similarity. Quarterly Journal of Experimental Psychology, 18(4), 362–365. https://doi.org/10.1080/14640746608400055
  5. Baddeley A (1992). Working memory. Science (New York, N.Y.), 255(5044), 556–559.
  6. Bassili JN (1976). Temporal and spatial contingencies in the perception of social events. Journal of Personality and Social Psychology, 33(6), 680.
  7. Bays PM, Wolpert DM, & Flanagan JR (2005). Perception of the Consequences of Self-Action Is Temporally Tuned and Event Driven. Current Biology, 15(12), 1125–1128. https://doi.org/10.1016/j.cub.2005.05.023
  8. Blakemore S-J, Frith CD, & Wolpert DM (1999). Spatio-Temporal Prediction Modulates the Perception of Self-Produced Stimuli. Journal of Cognitive Neuroscience, 11(5), 551–559. https://doi.org/10.1162/089892999563607
  9. Brainard DH (1997). The Psychophysics Toolbox. Spatial Vision, 10(4), 433–436.
  10. Capizzi M, Correa Á, & Sanabria D (2013). Temporal orienting of attention is interfered by concurrent working memory updating. Neuropsychologia, 51(2), 326–339. https://doi.org/10.1016/j.neuropsychologia.2012.10.005
  11. Capizzi M, Sanabria D, & Correa Á (2012). Dissociating controlled from automatic processing in temporal preparation. Cognition, 123(2), 293–302. https://doi.org/10.1016/j.cognition.2012.02.005
  12. Cardoso-Leite P, Mamassian P, Schütz-Bosbach S, & Waszak F (2010). A New Look at Sensory Attenuation: Action-Effect Anticipation Affects Sensitivity, Not Response Bias. Psychological Science, 21(12), 1740–1745. https://doi.org/10.1177/0956797610389187
  13. Clark A (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.
  14. Clark A (2015). Embodied prediction. In Open MIND. Frankfurt am Main: MIND Group.
  15. Colle HA, & Welsh A (1976). Acoustic masking in primary memory. Journal of Verbal Learning and Verbal Behavior, 15(1), 17–31. https://doi.org/10.1016/S0022-5371(76)90003-7
  16. Corbetta M, & Shulman GL (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews. Neuroscience, 3(3), 201.
  17. Coull JT, Frith CD, Büchel C, & Nobre AC (2000). Orienting attention in time: behavioural and neuroanatomical distinction between exogenous and endogenous shifts. Neuropsychologia, 38(6), 808–819.
  18. Engmann S, & Cousineau D (2011). Comparing Distributions: The two-sample Anderson-Darling test as an alternative to the Kolmogorov-Smirnoff test. Journal of Applied Quantitative Methods, 6, 1–17.
  19. Fougnie D, & Marois R (2006). Distinct Capacity Limits for Attention and Working Memory: Evidence From Attentive Tracking and Visual Working Memory Paradigms. Psychological Science, 17(6), 526–534. https://doi.org/10.1111/j.1467-9280.2006.01739.x
  20. Fox MD, Snyder AZ, Vincent JL, Corbetta M, Essen DCV, & Raichle ME (2005). The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proceedings of the National Academy of Sciences of the United States of America, 102(21), 9673–9678. https://doi.org/10.1073/pnas.0504136102
  21. Friston K (2011). Embodied Inference: or “I think therefore I am, if I am what I think.” In Embodied Inference: or “I think therefore I am, if I am what I think” (pp. 89–125). Imprint Academic.
  22. Friston K, & Stephan K (2007). Free-energy and the brain. Synthese, 159(3), 417–458.
  23. Fujioka T, Trainor LJ, Large EW, & Ross B (2012). Internalized Timing of Isochronous Sounds Is Represented in Neuromagnetic Beta Oscillations. Journal of Neuroscience, 32(5), 1791–1802. https://doi.org/10.1523/JNEUROSCI.4107-11.2012
  24. Griffin IC, Miniussi C, & Nobre AC (2001). Orienting attention in time. Frontiers in Bioscience: A Journal and Virtual Library, 6, D660.
  25. Grush R (2004). The emulation theory of representation: Motor control, imagery, and perception. Behavioral and Brain Sciences, 27(3), 377–396. https://doi.org/10.1017/S0140525X04000093
  26. Hughes G, Desantis A, & Waszak F (2013). Mechanisms of intentional binding and sensory attenuation: The role of temporal prediction, temporal control, identity prediction, and motor prediction. Psychological Bulletin, 139(1), 133.
  27. Hughes RW, Hurlstone MJ, Marsh JE, Vachon F, & Jones DM (2013). Cognitive control of auditory distraction: impact of task difficulty, foreknowledge, and working memory capacity supports duplex-mechanism account. Journal of Experimental Psychology: Human Perception and Performance, 39(2), 539.
  28. Hughes RW, Vachon F, & Jones DM (2005). Auditory attentional capture during serial recall: Violations at encoding of an algorithm-based neural model? Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(4), 736.
  29. Hughes RW, Vachon F, & Jones DM (2007). Disruption of short-term memory by changing and deviant sounds: Support for a duplex-mechanism account of auditory distraction. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(6), 1050–1061. https://doi.org/10.1037/0278-7393.33.6.1050
  30. Jones DM, & Macken WJ (1993). Irrelevant tones produce an irrelevant speech effect: Implications for phonological coding in working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(2), 369–381. https://doi.org/10.1037/0278-7393.19.2.369
  31. Kam JWY, & Handy TC (2013). The neurocognitive consequences of the wandering mind: a mechanistic account of sensory-motor decoupling. Frontiers in Psychology, 4. https://doi.org/10.3389/fpsyg.2013.00725
  32. Kawato M, & Wolpert D (1998). Internal models for motor control. Sensory Guidance of Movement, 218, 291–307.
  33. Logie RH, Sala SD, Laiacona M, Chalmers P, & Wynn V (1996). Group aggregates and individual reliability: The case of verbal short-term memory. Memory & Cognition, 24(3), 305–321. https://doi.org/10.3758/BF03213295
  34. Michotte A (2017). The Perception of Causality. Routledge.
  35. Moore J, & Haggard P (2008). Awareness of action: Inference and prediction. Consciousness and Cognition, 17(1), 136–144.
  36. Morey RD, Rouder JN, Pratte MS, & Speckman PL (2011). Using MCMC chain outputs to efficiently estimate Bayes factors. Journal of Mathematical Psychology, 55(5), 368–378.
  37. Morillon B, & Baillet S (2017). Motor origin of temporal predictions in auditory attention. Proceedings of the National Academy of Sciences, 201705373. https://doi.org/10.1073/pnas.1705373114
  38. Oh S-H, & Kim M-S (2004). The role of spatial working memory in visual search efficiency. Psychonomic Bulletin & Review, 11(2), 275–281. https://doi.org/10.3758/BF03196570
  39. Parmentier FB (2014). The cognitive determinants of behavioral distraction by deviant auditory stimuli: A review. Psychological Research, 78(3), 321–338.
  40. Parmentier FB, & Beaman CP (2015). Contrasting effects of changing rhythm and content on auditory distraction in immediate memory. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 69(1), 28.
  41. Pelli DG (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10(4), 437–442.
  42. Rouder JN, Morey R, & Wagenmakers E-J (2016). The Interplay between Subjectivity, Statistical Practice, and Psychological Science. Collabra: Psychology, 2(1). https://doi.org/10.1525/collabra.28
  43. Rouder JN, Speckman PL, Sun D, Morey RD, & Iverson G (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237.
  44. Salamé P, & Baddeley A (1982). Disruption of short-term memory by unattended speech: Implications for the structure of working memory. Journal of Verbal Learning and Verbal Behavior, 21(2), 150–164. https://doi.org/10.1016/S0022-5371(82)90521-7
  45. Salamé P, & Baddeley A (1987). Noise, unattended speech and short-term memory. Ergonomics, 30(8), 1185–1194. https://doi.org/10.1080/00140138708966007
  46. Sato A (2008). Action observation modulates auditory perception of the consequence of others’ actions. Consciousness and Cognition, 17(4), 1219–1227. https://doi.org/10.1016/j.concog.2008.01.003
  47. Toida K, Ueno K, & Shimada S (2014). Recalibration of subjective simultaneity between self-generated movement and delayed auditory feedback. NeuroReport, 25(5), 284–288.
  48. Triviño M, Correa Á, Arnedo M, & Lupiáñez J (2010). Temporal orienting deficit after prefrontal damage. Brain, 133(4), 1173–1185. https://doi.org/10.1093/brain/awp346
  49. Vachon F, Hughes RW, & Jones DM (2012). Broken expectations: violation of expectancies, not novelty, captures auditory attention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(1), 164.
  50. Waszak F, Cardoso-Leite P, & Hughes G (2012). Action effect anticipation: Neurophysiological basis and functional consequences. Neuroscience & Biobehavioral Reviews, 36(2), 943–959. https://doi.org/10.1016/j.neubiorev.2011.11.004
  51. Weiss C, Herwig A, & Schütz-Bosbach S (2011). The self in action effects: selective attenuation of self-generated sounds. Cognition, 121(2), 207–218. https://doi.org/10.1016/j.cognition.2011.06.011
  52. Woodman GF, & Luck SJ (2004). Visual search is slowed when visuospatial working memory is occupied. Psychonomic Bulletin & Review, 11(2), 269–274. https://doi.org/10.3758/BF03196569
