Scientific Reports. 2020 Jan 30;10:1496. doi: 10.1038/s41598-020-58447-z

Sensorimotor Integration Can Enhance Auditory Perception

John C. Myers, Jeffrey R. Mock, Edward J. Golob
PMCID: PMC6992622  PMID: 32001755

Abstract

Whenever we move, speak, or play musical instruments, our actions generate auditory sensory input. The sensory consequences of our actions are thought to be predicted via sensorimotor integration, which involves anatomical and functional links between auditory and motor brain regions. The physiological connections are relatively well established, but less is known about how sensorimotor integration affects auditory perception. The sensory attenuation hypothesis suggests that the perceived loudness of self-generated sounds is attenuated to help distinguish self-generated sounds from ambient sounds. Sensory attenuation would work for louder ambient sounds, but could lead to less accurate perception if the ambient sounds were quieter. We hypothesize that a key function of sensorimotor integration is the facilitated processing of self-generated sounds, leading to more accurate perception under most conditions. The sensory attenuation hypothesis predicts better performance for higher but not lower intensity comparisons, whereas sensory facilitation predicts improved perception regardless of comparison sound intensity. A series of experiments tested these hypotheses, with results supporting the enhancement hypothesis. Overall, people were more accurate at comparing the loudness of two sounds when making one of the sounds themselves. We propose that the brain selectively modulates the perception of self-generated sounds to enhance representations of action consequences.

Subject terms: Auditory system, Sensorimotor processing

Introduction

Many of our actions have auditory consequences, such as hearing our own speech or playing musical instruments. Predicting the auditory consequences of our actions is important for motor control, and there is evidence that sensorimotor networks in the brain generate these predictions1,2. Predictions about features of action-related sounds (e.g., frequency or amplitude) are conveyed from motor to auditory networks to help coordinate movements and correct errors3–6. The underlying cognitive process of sensorimotor integration is thought to be subtractive, where the brain dynamically compares predictions about self-generated sounds to the actual consequences of the actions5,7. This comparison is also thought to be important for establishing the sense of agency in human actions, which is a core feature of consciousness8,9.

In this study we asked how sensorimotor integration affects the accuracy of auditory perception. We use the term “active sounds” to refer to sounds that are generated by the participant; active sounds are always a consequence of the participant’s actions. By contrast, “passive sounds” are presented to participants and are never a direct consequence of their actions. Researchers typically study the processing of active vs. passive sounds by having people speak or press buttons to generate sounds and then asking them questions about their perceptions (e.g., ‘Which sound was louder?’).

The neural basis of sensorimotor integration has been examined during spontaneous speech in humans and non-human primates. Electroencephalography (EEG), magnetoencephalography (MEG), and electrocorticography (ECoG) studies in humans consistently find smaller evoked responses to active sounds10–14. Similarly, when marmoset monkeys actively generate vocal calls, the mean firing rate of most neurons (~80%) in the auditory cortex is reduced, compared to when the monkeys listen to passive playback of the same sound12,13. However, not all cells reduce their responsiveness: the remaining ~20% show the opposite response, firing faster to active sounds. The functional significance of the brain having different responses to active vs. passive sounds is unclear. One hypothesis is that the reduced firing rates of auditory cortical neurons during active sounds reflect the subtractive process that attenuates the perceived loudness of the auditory feedback14,15. However, the hypothesis that comparing predicted vs. actual sounds causes sensory attenuation does not fully account for the neurons that increase their response to active sounds. Possibly, the differential responses between neuronal populations indicate a special case of predictive coding, a broader aspect of cognition involving predictions that the brain makes about the world (e.g., sensory predictions)16–18. Indeed, similar neural mechanisms might underlie both sensory and sensorimotor prediction, because even in the absence of movement, the premotor cortex has been shown to increase activity when participants try to predict audio/visual sequences19.

Several behavioral studies of auditory perception and sensorimotor integration have reported attenuated loudness perception for active sounds, leading to speculation that loudness attenuation might be a way to distinguish sensory inputs due to one’s own actions from other sources20,21. In previous studies, when participants were asked to judge which of two successive sounds was louder, the active sounds were generally judged to be slightly quieter than passively delivered sounds. This conclusion was largely based on psychophysical measures of the point of subjective equality (PSE)5,21,22. In the context of perceiving loudness, the PSE is the decibel value where the loudness of two stimuli is indistinguishable to the participant (i.e., 50% discriminability). For an ideal sensory observer, the mean dB difference would be 0. When actively triggering a sound, the PSE has been shown to be slightly lower than for a passive sound, by less than 1 dB22. Another quantification method involves measuring the accuracy of perceptual judgments relative to “ground truth” dB levels. Standard and test tones are presented in each trial; within a block of trials the level of the test tone is always either above or below the standard dB level, and an observer’s loudness judgment on each trial is either correct or incorrect. A value of accuracy such as 75%, which is intermediate between random and perfect performance with two choices, is then used to define a discrimination threshold. In this study, we took this approach to test whether the objective accuracy of auditory perception is affected by active vs. passive conditions.
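To make the two quantification approaches concrete, the sketch below (in Python, with invented parameter values; the two-parameter logistic is a common modeling choice, not the specific function used in the studies cited above) shows how a PSE and a 75% accuracy threshold would each be read off a fitted psychometric function.

```python
import numpy as np

def logistic(x, x0, k):
    """Logistic psychometric function centered at x0 with steepness k."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# PSE approach: model P("comparison judged louder") against the signed dB
# difference; the PSE is where the curve crosses 0.5 (50% discriminability).
x0_pse, k = -0.6, 1.5                  # hypothetical fit: PSE shifted -0.6 dB
print(f"PSE = {x0_pse:+.1f} dB; P at PSE = {logistic(x0_pse, x0_pse, k):.2f}")

# Accuracy approach: model percent correct against the absolute dB difference,
# running from chance (0.5) to perfect (1.0) in a two-choice task.
def percent_correct(delta_db, x0, k):
    return 0.5 + 0.5 * logistic(delta_db, x0, k)

x0_acc = 2.0                           # hypothetical: 75% correct at 2 dB
print(f"75% threshold = {x0_acc:.1f} dB; "
      f"accuracy there = {percent_correct(x0_acc, x0_acc, k):.2f}")
```

The key difference is that the PSE marks maximal response uncertainty (50% ‘louder’ responses), whereas the 75% threshold is anchored to objectively correct responses.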

A key limitation of the previous research is that stimulus order was not counterbalanced5,22,23. In those studies, on active trials the first sound was always active and the second sound was always passive. Prior studies have established that perceptual judgments can be influenced by stimulus order24–26, and thus one goal of the current study was to control for stimulus order. Another complication in the literature is that participants can be better at detecting active vs. passive sounds27. This suggests that, for very quiet sounds at the limits of detectability, active sounds are perceived to be louder, not quieter, than passive sounds. Taken together, the above observations show that loudness perception can differ between active and passively generated sounds. However, the specific patterns of results vary, and there are important methodological issues, such as order effects, that need to be addressed, which leaves the functional significance of auditory-motor integration unclear. The rationale for the current study is to directly test whether sensorimotor integration globally attenuates the perceived loudness of active sounds, or whether sensorimotor predictions can modulate perception in order to make more accurate judgments about the world. The purpose of sensorimotor integration is likely to modify sensory processing for action-related functions3,14,28,29.

There are theoretical reasons to question whether loudness attenuation can be used to distinguish sounds caused by one’s actions vs. other sources. First, using loudness to determine agency requires a precise representation of the expected loudness in memory; yet loudness matching based on memory is known to be imprecise30. Substantial noise is present not only in memory, but also in sound production, sound perception, and the environment. For example, the speech sounds of a repeated word are not acoustically identical29. Perceptual judgments of a given stimulus are also variable, which is the reason that experimental studies, including this one, average behavioral responses over many trials31. The perception of sensory feedback can also be masked by other sounds in the environment32. These four sources of noise (memory, motor output, sensory input, environment) are reasons to question whether humans can make loudness judgments that are fine-grained and reliable enough to distinguish sensory input arising from one’s own actions vs. other sources. At the very least, additional acoustic features, such as sound frequency, may play a role in establishing agency.

This study comprises three sound level discrimination experiments and one auditory detection experiment. In the discrimination task, two sounds are played in a row, and participants decide which one is louder. If perceived loudness is attenuated for active sounds, then subjects should be more sensitive to loudness differences when active sounds are less intense than the comparison sound (i.e., lower decibel level). Perceiving active sounds as quieter would exaggerate the perceptual difference between self-generated vs. louder tones (see Fig. 1). Conversely, subjects should be less sensitive to loudness differences when active sounds are more intense than the comparison sounds. We hypothesize that sensorimotor integration will improve intensity discrimination performance regardless of which sound (active vs. passive) has the objectively higher sound level. Intensity discrimination can also be affected by expectations related to the sound level33. Thus, Experiment 1 included a between-subjects design to test whether active sounds are perceived differently when participants expect the passive sounds to be lower vs. higher intensity (±1–5 dB). In Experiments 2–3, intensity level direction was included as a within-subjects factor to test whether active sounds (always 70 dB) were perceived as louder or softer than passive sounds that were objectively ±2 dB away from the active sound (68 dB or 72 dB). The ±2 dB range was chosen because the accuracy of the perceptual judgments was near 75% in Experiment 1, equidistant from chance-level (50%) and perfect performance (100%). The role of sound feature expectation was examined by comparing loudness judgments for frequencies that, on the basis of previous training, were expected (75% of trials) vs. unexpected (25% of trials). Note that expectation about upcoming events and attention are inter-related, but can be experimentally distinguished34. Lastly, in Experiment 4 we hypothesized that low-intensity sounds would be more readily detected if they were self-generated and also matched the expected frequency.

Figure 1.

Model of how auditory-motor prediction would affect perception under two competing hypotheses. In this example the left column illustrates objective levels of a standard sound and comparison sounds presented 2 dB above and below the 70 dB level of the standard. The middle column shows how ‘sensory attenuation’ predicts that the perceived loudness of an active sound is reduced, which should make it easier to judge loudness relative to a higher intensity comparison sound. Conversely, it should be harder to compare loudness with the lower comparison sound. We propose instead that active sounds benefit from sensorimotor processing, which should improve loudness judgments regardless of whether the comparison sound level is above or below the standard.

Methods and Results

Experiment 1 methods

Experimental design

The purpose of Experiment 1 was to test whether sensorimotor integration improves sound intensity discrimination as a function of the predicted frequency and intensity. Experiment 1 consisted of a motor condition (active) and a non-motor control condition (passive). Participants first learned to associate pressing a button with pure tone auditory feedback (acquisition phase). Next, participants were asked to compare the loudness of the feedback to comparison tones of different intensities, using a two-alternative forced choice task. In the active condition, participants pressed a button to generate the standard reference tone, and comparison tones were always computer generated. In the passive condition, both the standard reference and comparison tones were computer-generated.

Participants

University students (N = 42; female/male = 30/12; age, 21.26 ± 0.79 years; one left-handed) received course credit for participation. Pure tone thresholds were tested from 500 to 8000 Hz with an audiometer (Maico, Eden Prairie, MN, USA), and all participants were within normal hearing limits. All participants reported normal neurological and psychiatric health. Each participant signed an informed consent form in order to participate, and all experimental procedures were performed in accordance with a protocol approved by the University of Texas at San Antonio Institutional Review Board, consistent with the Declaration of Helsinki.

Procedure and stimuli

Participants were seated in an audiometric room in front of a computer monitor and keyboard with a pair of headphones (Audio-Technica, ATH-M20). Visual instructions were provided via computer monitor.

Acquisition phase

The acquisition phase is a training routine designed to evoke sensorimotor prediction by pairing a simple motor command (e.g., pressing a button) with a predictable sensory consequence (e.g., pure tone or somatosensory feedback)1,5,22,35,36. In Experiment 1, participants learned the association between a button press (left vs. right index finger) and a pure tone (600 Hz or 700 Hz, 250 ms duration, 10 ms rise/fall time, 70 dB SPL; 200 trials, ~3.5 s/trial). In each trial, a visual cue (green ‘L’ or ‘R’, 300 ms) told the participants to push a key on a standard keyboard with either their left (‘~’ key) or right (‘+’ key) index finger, which triggered the auditory feedback 50 ms after the key press (see Fig. 2). For 50% of subjects, the left-key press generated the 600 Hz standard tone and the right-key press generated the 700 Hz standard tone; this tone-to-key mapping was counterbalanced across all participants. The inter-trial interval (ITI) was 1000 ms. We assigned distinct tone frequencies to each button in order to encourage the sense of agency and promote sensorimotor prediction during the task. Following the acquisition phase, a two-alternative forced choice (2AFC) procedure was used to measure the effects of sensorimotor integration on intensity discrimination.

Figure 2.

Experiments 1–3: Acquisition Phase Procedure. Participants pressed buttons to generate pure tones in response to visual cues (green ‘L’ or ‘R’) for 200 trials. Visual cues were presented for 300 ms to indicate which index finger to use to generate the feedback (left or right). For 50% of subjects, the left-key press generated a 600 Hz tone and the right-key press generated a 700 Hz tone (counterbalanced across all participants).

Two-alternative forced choice task

The test phase consisted of a 2AFC paradigm in which a standard reference tone (always 70 dB SPL) and a variable intensity comparison tone were presented consecutively in each trial. Half of the participants (n = 21) were presented with 5 lower intensity comparison tones (65, 66, 67, 68, or 69 vs. 70 dB), and the other half (n = 21) were presented with 5 higher intensity comparison tones (70 dB vs. 71, 72, 73, 74, or 75 dB). In the active condition, participants pressed a button to generate the standard tone in response to visual cues (green ‘L’ or ‘R’ for 300 ms). Tones were presented 50 ms after the button press. In the passive condition, both the standard and comparison tones were externally generated and linked with comparable visual cues to maximize predictability (red ‘L’ or ‘R’ for 300 ms). In both conditions, the standard tone was either preceded or followed by a ±1–5 dB externally generated comparison tone (inter-stimulus interval (ISI) = 700–1000 ms). The time between the offset of the first sound and the onset of the second sound randomly varied between 700 ms, 850 ms, and 1000 ms (p = 0.33 per interval). This slight variability across trials was included to keep participants engaged in the task. Notably, when the second sound was the active sound (i.e., self-generated), the ISI was ~860 ms longer than when the first sound was active (i.e., 700/850/1000 ms + ~860 ms). This ISI discrepancy occurred because participants were given an unconstrained amount of time to generate the sound. At the end of each trial, participants were asked, ‘Which tone was louder?’ The next trial began 1300 ms after subjects made their loudness judgment (see Fig. 3).

Figure 3.

Experiments 1–3: Two-alternative Forced Choice Procedure. For each trial participants were presented with two consecutive tones and instructed to choose the louder tone. In the active condition (A) participants self-generated one of the two tones (always the standard tone at 70 dB) with a button press. In the passive condition (B) neither tone was produced by the participant. For 25% of trials, incongruent tones were presented that were originally mapped to the opposite visual cue and button press during the acquisition phase.

To test the importance of hearing the expected frequency for auditory-motor prediction, 25% of the trials contained incongruent frequencies that, in the acquisition phase, were mapped to the opposite response. All trials were counterbalanced for stimulus order (standard tone 1st vs. 2nd) (see Fig. 3), and the two tones in a trial always had the same frequency. Active and passive trials were presented in separate, alternating blocks of 56 trials (560 total trials; 5 blocks/condition). The left vs. right cue on each trial was random (50% probability); the overall trial structure is sketched below.
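As an illustration of this trial structure, the following sketch (Python; the exact shuffling and counterbalancing scheme is our own simplification of the text) builds one active block for the higher-intensity group with counterbalanced stimulus order, 25% incongruent frequencies, and jittered ISIs.

```python
import random

def make_block(n_trials=56):
    """Build one block of 2AFC trials: counterbalanced stimulus order,
    25% incongruent frequencies, a comparison level drawn from +1..+5 dB,
    and a jittered inter-stimulus interval (a simplified sketch)."""
    trials = []
    for i in range(n_trials):
        trials.append({
            "cue": random.choice(["L", "R"]),           # 50% left/right
            "standard_first": i % 2 == 0,               # counterbalanced order
            "congruent": i % 4 != 0,                    # 75% congruent trials
            "comparison_db": 70 + random.choice([1, 2, 3, 4, 5]),
            "isi_ms": random.choice([700, 850, 1000]),  # p = 0.33 each
        })
    random.shuffle(trials)  # randomize trial order within the block
    return trials

block = make_block()
print(block[0])  # inspect one generated trial
```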

Data analysis

Psychometric (logistic) functions were fit to each subject’s percent correct discrimination data as a function of the standard vs. comparison intensity difference (±1–5 dB), using a maximum likelihood procedure. Psychometric functions were analyzed for each intensity level direction (between-subjects factor: higher dB or lower dB), condition (within-subjects factor: active, passive), frequency (within-subjects factor: congruent, incongruent), and stimulus order (within-subjects factor: standard tone 1st or 2nd). Intensity discrimination threshold was defined as the point on the psychometric function corresponding to 75% correct discrimination, halfway between chance (50%) and perfect performance24,37. Intensity discrimination slope was defined as the estimated percent increase in performance for every 1-dB change in intensity. Each measure provides important information about how perception is affected by the motor task. Intensity discrimination thresholds and slopes were assessed using a 2 (intensity level direction: ±1–5 dB) × 2 (condition: active vs. passive) × 2 (frequency: congruent vs. incongruent) × 2 (stimulus order: standard tone first or second) mixed analysis of variance (ANOVA). Each subject therefore had eight threshold and eight slope values, one for each combination of condition, frequency, and stimulus order. Effect sizes were computed using partial eta squared (ηp2). In Experiment 1, intensity level direction was included as a between-subjects factor to separately evaluate the effect of consistently louder vs. consistently quieter comparison tones on perceptual sensitivity, because making judgments on comparisons that vary in both directions might implicitly affect perceptual judgments. Experiments 2–3 later include intensity level direction as a within-subjects factor.
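A minimal sketch of such a maximum likelihood fit is given below (Python with NumPy/SciPy; the trial counts are invented, and the scaled two-parameter logistic is one standard parameterization rather than necessarily the authors' exact implementation).

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data for one subject and condition: absolute dB differences
# tested and the number of correct responses out of n trials at each level.
levels = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # |comparison - standard| in dB
n_correct = np.array([30, 38, 46, 50, 53])
n_trials = np.full(5, 56)

def p_correct(x, thresh, slope):
    """Logistic scaled to run from chance (0.5) to 1.0; equals 0.75 at x = thresh."""
    return 0.5 + 0.5 / (1.0 + np.exp(-slope * (x - thresh)))

def neg_log_likelihood(params):
    thresh, slope = params
    p = np.clip(p_correct(levels, thresh, slope), 1e-6, 1 - 1e-6)
    # Binomial log-likelihood of the observed correct/incorrect counts
    return -np.sum(n_correct * np.log(p) + (n_trials - n_correct) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=[2.5, 1.0], method="Nelder-Mead")
thresh, slope = fit.x
# Near threshold, accuracy rises by roughly 12.5 * slope percentage points per dB.
print(f"75% threshold = {thresh:.2f} dB, logistic slope = {slope:.2f} per dB")
```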

Experiment 1 results

A 2 (intensity level direction) × 2 (condition) × 2 (frequency) × 2 (stimulus order) ANOVA indicated an interaction of condition × frequency (F(1,40) = 9.79, p < 0.010, ηp2 = 0.20; Fig. 4). In the active condition, thresholds were lower for congruent and higher for incongruent frequencies (F(1,40) = 25.32, p < 0.001, ηp2 = 0.39). Discrimination thresholds (the intensity difference needed for 75% accuracy) decreased when self-generated frequencies were congruent (1.77 ± 0.08 dB), but increased when frequencies were incongruent (2.27 ± 0.10 dB) (Fig. 4A). In the passive condition, congruent vs. incongruent frequencies did not affect thresholds (p = 0.40). The mostly symmetrical effect for higher vs. lower intensity comparisons is shown in Fig. 5. Stimulus order (standard 1st vs. 2nd) showed a trend toward a main effect on discrimination thresholds, but the effect was not significant (p = 0.091).

Figure 4.

Experiment 1 - Effect of motor commands and feedback frequency on intensity discrimination across each intensity level. In the active condition, sensitivity diminished when actively generating incongruent sounds (A: p < 0.010). Discrimination thresholds were highest for self-generated, incongruent feedback (B: p < 0.050). Error bars reflect standard error of the mean.

Figure 5.

Experiment 1. Condition × congruence effects between subjects, where comparison tone intensities were either higher or lower than the 70 dB standard. In both the up and down directions, discrimination thresholds were higher when generating incongruent frequencies, which suggests that perception is biased towards predicted action effects (p < 0.010). Error bars reflect standard error of the mean.

For the slope measure there were no significant effects involving active vs. passive conditions. However, there was a main effect of frequency, showing a steeper slope for the congruent frequency in both conditions (F(1,40) = 6.10, p = 0.018, ηp2 = 0.13) (see Fig. 5). There was a trend for a condition × intensity level interaction, with steeper slopes when comparison tones were below vs. above the standard level (p = 0.070, ηp2 = 0.08).

The results of Experiment 1 provide evidence that actively generating a tone affects loudness threshold judgments, with greater perceptual acuity for expected vs. unexpected tones. Discrimination thresholds during passive listening were in-between those of the active expected and unexpected tones, and were comparable for expected and unexpected passive tones. The slope results show that the steepness of the relation between intensity differences and loudness judgments was not influenced by active vs. passive stimulus delivery. Instead, overall sensitivity to a difference being present was greatest for active-expected tones, intermediate for passive tones, and least for active-unexpected tones.

Experiment 2 methods

Experimental design

The purpose of Experiment 2 was to test the effect of auditory-motor prediction on perceptual accuracy at intensities above and below the self-generated intensity (70 ± 2 dB). In Experiment 1, the intensity level direction of the comparison tone was a between-subjects factor: half of the participants were presented with quieter comparison tones and the other half compared the 70 dB standard to louder tones. Experiment 2 was a simplified intensity discrimination task consisting of only 68 and 72 dB comparison tones, tested within subjects. If auditory-motor predictions are more accurate than passive predictions, then discrimination accuracy should be higher even in a context where both intensity and frequency vary. Our central hypothesis was that accuracy would be greater in the active vs. passive condition. We also hypothesized that congruent vs. incongruent frequencies might affect perception differently when a bidirectional range of intensities is presented, rather than the unidirectional range used in Experiment 1.

Just as in Experiment 1, Experiment 2 consisted of an active condition and a non-motor passive condition. Subjects pressed buttons to generate standard tones (70 dB) in an identical acquisition phase, followed by a similar 2AFC intensity discrimination task. The independent variables were condition (active vs. passive), frequency (congruent (75% of trials) vs. incongruent (25% of trials)), stimulus order (standard tone 1st vs. 2nd), and intensity (−2 dB vs. +2 dB). Recall that when subjects actively produced a tone it was always the standard; thus the order of actively produced tones was also counterbalanced across trials. Subjects were tested with both −2 dB and +2 dB comparison tones in Experiment 2, as opposed to −1 to −5 dB or +1 to +5 dB in Experiment 1. The active and passive trials were presented in alternating blocks of ~50 trials (592 total trials; 6 blocks/condition). The number of trials was equivalent across active and passive conditions.

Participants

Participants were a new set of university students (n = 24; female/male = 20/4; age = 18.23 ± 0.12 years; all right-handed) who received course credit for participation.

Data analysis

Intensity discrimination accuracy was used to measure perceptual sensitivity as a function of auditory-motor prediction. The data were also re-analyzed by calculating sensitivity (d-prime (d′)) using signal detection theory38; the statistical findings did not differ from those reported below (see Supplementary Materials). For simplicity, only discrimination accuracy (% correct) is presented below. Discrimination accuracy effects were quantified by a 2 (condition: active vs. passive) × 2 (frequency: congruent vs. incongruent) × 2 (stimulus order: standard tone 1st or 2nd) × 2 (intensity: −2 dB or +2 dB) analysis of variance (ANOVA).
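For reference, a d′ computation under signal detection theory38 could look like the sketch below (Python; the counts and the mapping of ‘comparison louder’ responses onto hits vs. false alarms are our own illustrative assumptions).

```python
from scipy.stats import norm

def dprime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).
    The log-linear correction keeps rates away from 0 and 1."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical counts: a "comparison louder" response counts as a hit when
# the comparison really was louder (+2 dB) and as a false alarm when it was
# actually quieter (-2 dB).
print(f"d' = {dprime(hits=230, misses=66, false_alarms=80, correct_rejections=216):.2f}")
```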

Experiment 2 Results

Discrimination accuracy showed a main effect of condition (F(1,23) = 4.64, p = 0.042, ηp2 = 0.17); accuracy was higher in the active vs. passive condition (active M = 0.79, SE = 0.03; passive M = 0.76, SE = 0.03). Accuracy was greater in the active condition in 67% of subjects, with a mean perceptual benefit of 2.7 ± 0.8%. A three-way condition × frequency × intensity interaction was observed, F(1,23) = 7.39, p = 0.012, ηp2 = 0.24. The interaction indicated that under most conditions participants showed greater perceptual accuracy in the active condition, except when self-generating congruent frequency tones that were lower in intensity than the comparison tone (70 dB vs. 72 dB) (see Fig. 6A).

Figure 6.

Experiments 2 and 3 – Sensorimotor integration improved intensity discrimination under most conditions (p < 0.001). The main exception was that self-generated congruent tones were harder to compare to higher intensity externally generated tones (i.e., CONG +2: self = 70 dB vs. other = 72 dB). Error bars reflect standard error of the mean.

There was also a main effect of stimulus order (p = 0.006, ηp2 = 0.29), indicating better performance when the standard tone was presented first. Beyond the main effect, a strong condition × stimulus order interaction indicated that performance in the active condition improved when the self-generated standard tone was presented first (M = 0.83, SE = 0.03) vs. second (M = 0.75, SE = 0.03) (p = 0.002, ηp2 = 0.35) (see Fig. 7). The passive condition was unaffected by stimulus order (p = 0.58). The results of Experiment 2 suggest that sensorimotor prediction improves intensity discrimination accuracy under most conditions, with a larger benefit when the standard tone is presented first.

Figure 7.

Condition × Stimulus Order. Across Experiments 2–3, there was a main effect of stimulus order, indicating that participants were better at intensity discrimination when the standard tone was presented first (p < 0.001). The interaction suggests that auditory-motor prediction can improve mental representations of self-generated sounds for better comparison with subsequent stimuli. Error bars reflect standard error of the mean.

Experiment 3 methods

Experimental design

The purpose of Experiment 3 was to replicate the findings of Experiment 2 in a more dynamic context. To test whether the increased accuracy was due to active and passive trials being in separate blocks, Experiment 3 used a design identical to Experiment 2 with one modification: instead of presenting active and passive trials in separate blocks, the trials of each condition were interleaved. Thus the nature of each upcoming trial was unpredictable; it could be either active or passive. The active and passive trial types were presented randomly, with ~50 trials per block (592 total trials; 6 blocks/condition).

Participants

A new group of university students were recruited (n = 24; female/male = 14/10; age, 19.46 ± 0.68 years; six left-handed, one ambidextrous), and they received course credit for participation.

Data analysis

Intensity discrimination accuracy effects were again quantified by a 2 (condition: active vs. passive) × 2 (frequency: congruent vs. incongruent) × 2 (stimulus order: standard tone 1st or 2nd) × 2 (intensity: −2 dB or +2 dB) analysis of variance (ANOVA).

Experiment 3 results

Consistent with the results of Experiment 2, discrimination accuracy showed a main effect of condition, F(1,23) = 23.92, p < 0.001, ηp2 = 0.51, and was better in the active (M = 0.79, SE = 0.02) vs. passive condition (M = 0.74, SE = 0.03). Accuracy was greatest in the active condition in 75% of subjects, with an average perceptual benefit of 5.7 ± 1.1%. There was a condition × frequency interaction (p < 0.001, ηp2 = 0.40), showing that the active condition was unhindered by incongruent frequencies (~1% difference), while performance in the passive condition worsened for incongruent frequencies (~7% difference). A similar three-way condition × frequency × intensity interaction was observed, indicating greater perceptual accuracy in the active condition except when self-generating lower intensity congruent frequencies, F(1,23) = 16.67, p < 0.001, ηp2 = 0.42 (see Fig. 6B).

There was a strong main effect of intensity (p = 0.001, ηp2 = 0.62), showing that subjects were consistently better at making loudness judgments when the comparison tone was 68 dB (M = 0.80, SE = 0.03) vs. 72 dB (M = 0.73, SE = 0.02) (see Fig. 6B). Stimulus order also showed a robust main effect (p < 0.001, ηp2 = 0.64), indicating again that accuracy improved in both conditions when the standard tone was presented first (M = 0.80, SE = 0.02) vs. second (M = 0.74, SE = 0.02). An order × intensity interaction indicated that performance worsened in both conditions when the 72 dB comparison tones were presented before the 70 dB standard tones (p = 0.025, ηp2 = 0.20). Consistent with Experiment 2, we observed another condition × stimulus order interaction, suggesting that the benefits of presenting the standard tone first were greater in the active condition (p = 0.026, ηp2 = 0.20). The standard tones may have attained an expectation benefit because they were presented on every trial as a reference point for comparison (see Fig. 7 for order effects across Experiments 2–3).

The results of Experiment 3 replicated the findings of Experiment 2 within the more variable context of a mixed design. Across Experiments 2 and 3, self-generating the standard tone yielded superior intensity discrimination relative to passive delivery in 75% of the participants. In some subjects, performance in the active condition exceeded passive performance by nearly 20% (see Fig. 8).

Figure 8.

Experiments 2 and 3. Auditory-motor prediction improved intensity discrimination for most subjects in Experiments 2–3 (green circles). Red circles mark subjects who performed better in the passive condition. Results demonstrate substantial perceptual benefits for sensorimotor prediction that exceed any expectation benefits in the passive condition (p < 0.001). Error bars reflect standard error of the mean.

Experiment 4 methods

Experimental design

In Experiments 1–3 we used intensity discrimination tests to determine the effects of sensorimotor prediction on auditory perceptual acuity. In Experiment 4, we measured auditory detection by employing a similar paradigm which comprised an acquisition phase followed by an adaptive staircase procedure39. The purpose of Experiment 4 was to determine if sensorimotor prediction improves auditory stimulus detection as a function of congruent vs. incongruent frequencies.

Participants

Participants were university students (N = 22; female/male = 12/10; age, 20.64 ± 0.70 years; one left-handed), who received course credit for participation.

Acquisition phase

Participants learned to associate an index finger button press with ‘target’ tone feedback (600 Hz or 1000 Hz, 250 ms duration, 10 ms rise/fall time, 70 dB SPL). On each trial, a visual cue (green ‘PRESS’, 300 ms) instructed participants to press a button to generate the congruent target frequency. If participants responded too slowly (reaction time >350 ms), the auditory feedback was altered to an incongruent frequency above or below the target (i.e., ±100 Hz). The incongruent feedback was designed to ensure that participants performed the single-button task attentively, thus forming a strong sensorimotor association. The task continued until a minimum of 90% correct responses was reached (~200 trials, ~3.5 s/trial). Each subject was tested in two frequency ranges (500/600/700 Hz and 900/1000/1100 Hz). In each range the middle frequency was the target tone, and the incongruent frequencies were ±100 Hz.

Detection test phase

In alternating trials, visual cues indicated when to press the button for the congruent target tone (active cue: green ‘PRESS’) vs. when to simply listen for the target (passive cue: red ‘LISTEN’). The target tones started at 70 dB. Following each auditory stimulus, participants reported whether they perceived the sound by pressing a ‘Yes’ or ‘No’ button (‘left’ or ‘right’ arrow keys). If they reported ‘Yes’ to hearing the tone, the next tone decreased in intensity by a variable step size (−10, −5, or −1 dB). If they reported ‘No’, the next stimulus increased in intensity (+5, +2.5, or +0.5 dB). Whenever participants reversed their responses from ‘Yes’ to ‘No’ or vice versa, the corresponding intensity value was recorded as a ‘reversal,’ and the step size changed incrementally with each reversal. We used the average intensity after 12 reversals to estimate detection thresholds. After the initial detection thresholds for the congruent frequencies were estimated in both active and passive conditions, the two incongruent non-target frequencies (±100 Hz from targets) were introduced at intensities equal to the congruent threshold. Detection thresholds were tested in two frequency ranges for each subject (500/600/700 Hz and 900/1000/1100 Hz, with the middle frequency as the target). To measure the consistency of detection thresholds for congruent frequencies, we continued to re-estimate the target detection thresholds by resetting the intensity back to 70 dB (≥12 additional reversals) until the thresholds for incongruent frequencies (25% of trials) were reached.
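The sketch below (Python) simulates this staircase logic under stated assumptions: the simulated listener's Gaussian sensory noise and the rule that step sizes shrink by one level per reversal are our own additions, since the text specifies only the step values and the 12-reversal stopping criterion.

```python
import random

def staircase_threshold(true_threshold, n_reversals=12, start_db=70.0):
    """Simulate the adaptive staircase: 'Yes' lowers the level, 'No' raises it,
    and the running step size shrinks after each response reversal."""
    down_steps = [10.0, 5.0, 1.0]   # dB decrements after a 'Yes'
    up_steps = [5.0, 2.5, 0.5]      # dB increments after a 'No'
    level, last, reversals, step_i = start_db, None, [], 0
    while len(reversals) < n_reversals:
        # Hypothetical listener: hears the tone if it exceeds the true
        # threshold after adding Gaussian sensory noise (our assumption).
        heard = level + random.gauss(0.0, 1.0) > true_threshold
        if last is not None and heard != last:
            reversals.append(level)        # record the level at each reversal
            step_i = min(step_i + 1, 2)    # shrink the step size
        level += -down_steps[step_i] if heard else up_steps[step_i]
        last = heard
    # Threshold estimate: mean intensity across the 12 reversals
    return sum(reversals) / len(reversals)

print(f"estimated threshold = {staircase_threshold(true_threshold=8.0):.1f} dB")
```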

Data analysis

Congruent target thresholds were measured at least 5 times for each subject. The thresholds of 3 subjects were rejected on the basis of extreme variability across these measures (i.e., more than ±3 standard deviations from the grand mean). The threshold data were then feature-scaled to a 0–1 range across frequency ranges for each subject, because auditory thresholds are known to vary across frequencies (500–1100 Hz)40,41. Detection thresholds were analyzed using a 2 (range: 500–700 Hz vs. 900–1100 Hz) × 2 (condition: active vs. passive) × 3 (frequency: target, +100 Hz, −100 Hz) ANOVA.
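Assuming that feature scaling here refers to standard min–max normalization applied within each frequency range for each subject (our reading of the text), the transformation is:

```python
import numpy as np

def feature_scale(thresholds):
    """Min-max scale one subject's thresholds within a frequency range to [0, 1],
    removing baseline differences in absolute sensitivity across frequencies."""
    t = np.asarray(thresholds, dtype=float)
    return (t - t.min()) / (t.max() - t.min())

# Hypothetical thresholds (dB) for one subject in one range: target, +100, -100 Hz
print(feature_scale([4.1, 6.3, 5.2]))   # -> [0.  1.  0.5]
```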

Experiment 4 results

Target detection thresholds did not differ between active and passive conditions (p = 0.826), but there was a significant condition × frequency interaction (F(2,34) = 4.47, p = 0.019, ηp2 = 0.21). The interaction showed that thresholds for active and passive conditions were comparable for congruent targets, but thresholds for incongruent tones were nearly 2 dB higher in the active condition (active M = 4.68, SE = 1.32; passive M = 2.69, SE = 1.94) (see Fig. 9). In both active and passive conditions, detection thresholds were lower for the higher frequencies (F(2,34) = 6.06, p = 0.006, ηp2 = 0.263), which is a general property of the human auditory system37. However, as indicated by the interaction above, detection thresholds for the higher frequency incongruent tones were greater in the active condition. The results of Experiment 4 provide evidence that sensorimotor integration could involve selective filters for the predicted frequency content of auditory action consequences, especially when stimuli are more difficult to hear, at near-threshold intensity levels. Sensorimotor integration may maintain the detectability of target frequencies while decreasing sensitivity for non-targets42.

Figure 9.

Effects of auditory-motor prediction × feedback frequency on detection thresholds. Target tone detection thresholds did not differ between active and passive conditions. However, mean detection thresholds for the less common (25%) non-target frequencies (−100 & +100 Hz) were nearly 2 dB higher in the active condition (p = 0.019). Error bars reflect standard error of the mean.

Discussion

The primary aim of this study was to test whether actively producing a sound improves the accuracy of auditory perception. The secondary aim was to determine if there was any effect of expected vs. unexpected frequencies, which has important implications for attention control and the sense of agency. Theoretically, a key function of sensorimotor integration is to maximize action performance, especially for actions that have variable sensory consequences4. The results of Experiment 1 showed effects related to ‘self’ vs. ‘other’ sound production and whether the frequency of the sound was expected vs. unexpected. Discrimination thresholds were lowest when self-generated sounds occurred at the expected (congruent) frequency and highest when the frequency of the self-generated sound was unexpected (incongruent). This pattern of frequency-specific facilitation was seen regardless of whether the self-generated tones were louder or softer than the comparison tones (a between-subjects factor).

In Experiments 2 and 3, accuracy was tested in a context where the loudness judgments were made between sounds that were ±2 dB apart. We chose this ±2 dB range because it was the intensity difference where performance was closest to the 75% discrimination threshold in Experiment 1. The results of Experiments 2–3 showed that ~75% of the participants made more accurate perceptual judgments in the active condition, regardless of the frequency of the self-generated output. Lastly, absolute detection thresholds in Experiment 4 did not differ for active vs. passively produced sounds at the expected frequency, but were somewhat higher in the active condition for unexpected frequencies.

Overall, the findings in Experiments 1–3 showed a mostly symmetrical pattern in which accuracy was greater when judging actively triggered sounds relative to passive presentation. The only exception, observed in both Experiments 2 and 3, was that the accuracy advantage for active sounds was not evident when the standard was at the expected frequency and the comparison tone was slightly louder. This finding was replicated, but future work is needed to better understand its significance.

Although Experiments 1–3 were consistent on the main finding that accuracy was greater for active vs. passive tones, there was a difference with respect to congruent vs. incongruent frequencies. In Experiment 1 the threshold for active congruent trials was lower than for active incongruent trials, while congruent and incongruent trials had comparable results in Experiments 2 and 3. One possible explanation is that Experiment 1 had a greater range of intensity levels (5 levels vs. 2 in Experiments 2 and 3) and a wider range of discrimination difficulty (1 to 5 dB in Experiment 1 vs. 2 dB in Experiments 2 and 3). We speculate that the conditions in Experiment 1 may have encouraged increased attention to the frequency of the tones (congruent vs. incongruent), especially in the active condition where participants generated one of the sounds themselves. The role of task difficulty in differences between congruent and incongruent sounds may also apply to Experiment 4, where low-intensity detection thresholds were significantly higher for incongruent frequencies, but only in the active condition (see Fig. 9). Future research could directly examine the effect of discrimination conditions and task difficulty on sensorimotor integration in perception.

Previous work suggests that motor control includes both action programs and a forward model that predicts the sensory consequence of an action14. A subtractive comparison between actual and predicted sensory feedback is performed to fine tune motor control, and the reduced perceived loudness specified by the attenuation hypothesis is thought to be a byproduct of this subtraction5,20,22. Loudness attenuation is also thought to help distinguish sounds from one’s own actions from those generated by other sources21. Both of these concepts rely on the premise that action sound effects are generally perceived as quieter than objective reality.

The results from Experiments 1–3 did not support the attenuation model’s core idea that a self-generated sound is perceived to be quieter relative to when the same sound is triggered by other means. If the perceived loudness of actively generated standards was reduced then participants would have had better discrimination performance when the level of comparison sounds exceeded the standard, and worse performance when the level of comparison sounds was less than the standard. Instead, actively producing the sound had comparable perceptual benefits regardless of whether the comparison tone was more or less intense than the standard. Indeed, performance accuracy favoring the active condition was somewhat better when using quieter comparison tones.


Stimulus order effects

A major difference between this study and prior work is that earlier studies using two-alternative forced choice procedures always presented the self-generated sound first on each trial5,22. This was typically done because a different variable, such as a manipulation related to agency, was of primary interest, and controlling for order would have added many more trials. Nonetheless, it is important to counterbalance stimulus order because it can influence neural responses and perceptual judgments, and can lead to response bias25,46,47. In Experiments 1–3 stimulus order was counterbalanced, and in Experiments 2–3 order effects were specifically examined. There were large order effects due to greater accuracy in the active condition when the self-generated standard tone was presented first rather than second. When the active sound was second in the sequence, the ISI was longer by ~860 ms due to participant reaction times (active 1st ISI: 700–1000 ms; active 2nd ISI: ~1560–1860 ms). The increased time between sounds may have diminished the perceptual benefits of sensorimotor integration down to passive levels. The findings suggest that in studies where active sounds could only occur as the first stimulus, the difference between active and passive conditions on loudness perception might have been overestimated. However, a limitation of this study is that the average time between tones was longer when the active tone was presented second, which could also have influenced performance.

When comparing two successive sounds, the first sound serves as a referent that is maintained in echoic memory. The memory of that first tone is then used to make a loudness judgment when the second tone is presented. In Experiments 2–3, we found that the perceptual benefits of active sounds were mainly present when the first of the two sounds was self-generated. We suspect that the benefits of actively generating the first tone could be related to an improved precision or increased durability of the echoic memory trace for active sounds, possibly due to an attentional benefit. Echoic memory is fleeting, with progressive decay occurring over several seconds30,48. A more precise echoic representation of the first tone’s loudness, and/or greater resistance to decay over time, would lead to better performance when the memory of the first tone is compared to perception of the second tone. For passive sounds, we found no differences in accuracy related to stimulus order, suggesting that the order effects were not due to general stimulus factors such as refractory effects46,49.

Future work is needed to better equate the time between tones when the active tone is presented first vs. second. This would control for the likelihood of greater echoic memory decay when the active tone is presented second. It follows that one reason the active vs. passive difference is smaller when the active tone is second could be that overall loudness discrimination is worse due to echoic memory decay. The results in Fig. 7 do not support this interpretation, because accuracy when active tones were presented second was about the same as with two passively delivered tones, which had a shorter interval between tones. If actively generated tones do yield a benefit in terms of precision and/or resistance to decay in echoic memory, as suggested above, then another possibility is that these echoic memory benefits are less useful because subjects make their loudness judgment right after hearing the second tone.

Previous studies of auditory-motor integration focused almost exclusively on the point of subjective equality (PSE), where sensory attenuation was indicated by lower PSE values for active vs. passive conditions5,9,21. This study focused on measures of perceptual acuity (discrimination threshold and accuracy), rather than testing for shifts in the PSE. Even though there are similarities between perceptual acuity and the PSE, the differences are also relevant to interpreting the findings. Changes in the PSE indicate subjective effects of sensorimotor integration because the dB value for the PSE is near 50% discriminability22. Changes in discrimination threshold are a more objective measure of perceptual acuity because the discrimination threshold is set to 75% accuracy, where subjects can reliably discriminate between the loudness of two sounds. Note again that the ±2 dB differences examined in Experiments 2 and 3 were near the 75% threshold observed in Experiment 1. Future studies on sensorimotor integration and perception should control for stimulus order, as was done here, in tandem with examining a variety of perceptual measures to help reconcile mixed findings among studies.

We emphasize that from the perspective of a sensory system, the information corresponding to sensory feedback from an action is not a given because there is often more than one sound source in the environment at the same time. The sensory consequences of an action need to first be identified before being compared to the prediction of a forward model. We speculate that the small perceptual differences between active and passively generated sounds may reflect the selection of sensory input that is created by motor output. An alternative possibility is that perceptual differences for active vs. passive stimuli reflect a subsequent comparison between the selected sensory information and the forward motor model. The mechanisms of this selectivity are unclear, but may have similarities to biased competition models of attention, with the forward model providing the bias toward sensory features of the expected feedback50,51. Many studies in the visual modality, and to a lesser degree in audition, show that manipulations of attention can influence perception32,52. Human EEG data suggest that attention effects are present in a typical active vs. passive task, but are dissociable from motor control effects10. The results from Experiment 1 are broadly consistent with an expectation bias, but in Experiments 2 and 3 there were no substantial differences between the expected and unexpected frequencies. In addition, if the benefits on performance for active sounds are a special case of predictive coding then attending to and/or expecting the stimuli may not be necessary53. Clearly future work is needed to more fully test the effects of sensorimotor integration and the potential roles of expectation and attention.

As described above in the context of sources of noise, the impact of active generation on loudness perception is subtle, but reliable, and is seen both within and between different study protocols. The subtle nature shows that the comparison of expected and actual sensory feedback in motor control does not substantially distort perception. As with other influences on auditory perception such as spatial biases by eye position43, choice of effectors44, and numerosity45, the perceptual benefits of actively producing the sound are comparable to the magnitude of just-noticeable-differences. This preserves accurate perception, and only by averaging over many trials are the subtle biases revealed.

Potential multisensory influences

In addition to self-generated movement, the active and passive conditions also differ by the presence vs. absence of tactile feedback from pressing a button. We speculate that the tactile feedback from button pressing could have influenced the results, because loudness ratings increase with concurrent tactile stimulation54,55. However, if better performance in the active condition were due only to tactile feedback, then condition would not have interacted with any other factor. Instead, condition interacted with frequency congruence (Experiment 1) and stimulus order (Experiments 2, 3), and the condition effect was absent for congruent trials with more intense comparison tones (Experiments 2, 3). We conclude that tactile influences in these types of tasks are worth investigating, but they cannot fully explain the present set of results.

The first three experiments measured discrimination of clearly audible sounds. Experiment 4 tested whether detection of near-threshold tones differed between active and passive presentation. At the expected frequency, thresholds were nearly identical in the active and passive conditions. For the unexpected frequencies, thresholds were greater in the active condition, which suggests that unexpected frequencies were filtered out in the active condition. Other studies that examined detection of signals in noise also support the idea of an attentional filter centered on an expected frequency32,56. In contrast, Reznik and colleagues found that detection thresholds were lower for actively generated sounds vs. passive listening27. In the current study, we did not find lower detection thresholds for active sounds. Relevant differences between the studies include the mixed-trials design used here vs. the blocked design of Reznik and colleagues. The current study also included a passive condition that promoted expectations for frequency, which can dissociate the perceptual effects of expectation from motor control. We also note that using incongruent tones during the training phase might have influenced threshold testing, relative to using novel tone frequencies. The rationale was that all of the sounds at testing would be familiar to the subjects, and thus less likely to elicit novelty-related activity that could, at least initially, interfere with threshold testing.

In everyday life, auditory-motor control enables us to speak fluidly and distinguish the sounds we make from the sounds generated by other sources. This study has strengthened the current understanding of action-effects on perception and increased our knowledge about how perception can be more accurate when the motor system is engaged. Faulty auditory-motor integration is associated with a variety of clinical problems, ranging from stuttering disfluency57,58 to auditory hallucinations common in schizophrenia (i.e., misattributions of self- vs. other auditory input)59,60. Understanding how the brain integrates sensory and motor information is essential to understanding consciousness, human behavior, and mental health. Even the sense of agency in our actions is thought to depend on sensorimotor interactions22,61–63. Therefore, sensorimotor integration will continue to drive further research.

Supplementary information

Dataset 1. (300.8KB, xlsx)

Acknowledgements

Partially funded by NIH/NIGMS RISE GM60655.

Author contributions

All authors (J.C.M., J.R.M., & E.J.G.) contributed to the design of the experiments. J.C.M. and J.R.M. collected and analyzed the data. J.C.M. wrote the initial manuscript and all authors contributed to revising the manuscript.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information is available for this paper at 10.1038/s41598-020-58447-z.

References

  • 1.Elsner B, Hommel B. Effect anticipation and action control. J. Exp. Psychol. Hum. Percept. Perform. 2001;27:229–240. doi: 10.1037/0096-1523.27.1.229. [DOI] [PubMed] [Google Scholar]
  • 2.Guenther FH. Cortical interactions underlying the production of speech sounds. J. Commun. Disord. 2006;39:350–365. doi: 10.1016/j.jcomdis.2006.06.013. [DOI] [PubMed] [Google Scholar]
  • 3.Crapse TB, Sommer MA. Corollary discharge across the animal kingdom. Nat Rev Neurosci. 2008;9:587–600. doi: 10.1038/nrn2457. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Burgess JD, Lum JAG, Hohwy J, Enticott PG. Echoes on the motor network: how internal motor control structures afford sensory experience. Brain Struct. Funct. 2017;222:3865–3888. doi: 10.1007/s00429-017-1484-1. [DOI] [PubMed] [Google Scholar]
  • 5.Sato A. Action observation modulates auditory perception of the consequence of others’ actions. Conscious. Cogn. 2008;17:1219–1227. doi: 10.1016/j.concog.2008.01.003. [DOI] [PubMed] [Google Scholar]
  • 6.von Holst, E. Relations between the central nervous system and the peripheral organs. Br. J. Anim. Behav. (1954).
  • 7.Wolpert DM, Ghahramani Z, An JM. internal model for sensorimotor integration. Science (80-.). 1995;29:1880–2. doi: 10.1126/science.7569931. [DOI] [PubMed] [Google Scholar]
  • 8.Haggard P, Clark S. Intentional action: Conscious experience and neural prediction. Conscious. Cogn. 2003;12:695–707. doi: 10.1016/S1053-8100(03)00052-7. [DOI] [PubMed] [Google Scholar]
  • 9.Hughes G, Desantis A, Waszak F. Mechanisms of intentional binding and sensory attenuation: The role of temporal prediction, temporal control, identity prediction, and motor prediction. Psychol. Bull. 2013;139:133–151. doi: 10.1037/a0028566. [DOI] [PubMed] [Google Scholar]
  • 10.Saupe K, Widmann A, Trujillo-Barreto NJ, Schröger E. Sensorial suppression of self-generated sounds and its dependence on attention. Int. J. Psychophysiol. 2013;90:300–310. doi: 10.1016/j.ijpsycho.2013.09.006. [DOI] [PubMed] [Google Scholar]
  • 11.Schafer EWP, Marcus MM. Self-stimulation sensory responses. Science (80-.). 1973;181:175–177. doi: 10.1126/science.181.4095.175. [DOI] [PubMed] [Google Scholar]
  • 12.Eliades SJ, Wang X. Sensory-motor interaction in the primate auditory cortex during self-initiated vocalizations. J. Neurophysiol. 2003;89:2194–2207. doi: 10.1152/jn.00627.2002. [DOI] [PubMed] [Google Scholar]
  • 13.Eliades, S. J., Wang, X. & Surgery, N. In Marmoset AuditoryCortex. 98–111, 10.1016/j.heares.2017.03.001.Contributions (2018).
14. Franklin DW, Wolpert DM. Computational mechanisms of sensorimotor control. Neuron. 2011;72:425–442. doi: 10.1016/j.neuron.2011.10.006.
15. Greenlee JDW, et al. Human auditory cortical activation during self-vocalization. PLoS One. 2011;6:e14744. doi: 10.1371/journal.pone.0014744.
16. Clark A. Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behav. Brain Sci. 2013;36:181–204. doi: 10.1017/S0140525X12000477.
17. Friston K, Kiebel S. Predictive coding under the free-energy principle. Philos. Trans. R. Soc. B Biol. Sci. 2009;364:1211–1221. doi: 10.1098/rstb.2008.0300.
18. Knill DC, Pouget A. The Bayesian brain: The role of uncertainty in neural coding and computation. Trends Neurosci. 2004;27:712–719. doi: 10.1016/j.tins.2004.10.007.
19. Schubotz RI, von Cramon DY. Predicting perceptual events activates corresponding motor schemes in lateral premotor cortex: An fMRI study. Neuroimage. 2002;15:787–796. doi: 10.1006/nimg.2001.1043.
20. Stenner M-P, Bauer M, Heinze H-J, Haggard P, Dolan RJ. Parallel processing streams for motor output and sensory prediction during action preparation. J. Neurophysiol. 2015;113:1752–1762. doi: 10.1152/jn.00616.2014.
21. Sato A, Yasuda A. Illusion of sense of self-agency: Discrepancy between the predicted and actual sensory consequences of actions modulates the sense of self-agency, but not the sense of self-ownership. Cognition. 2005;94:241–255. doi: 10.1016/j.cognition.2004.04.003.
22. Weiss C, Herwig A, Schütz-Bosbach S. The self in action effects: Selective attenuation of self-generated sounds. Cognition. 2011;121:207–218. doi: 10.1016/j.cognition.2011.06.011.
23. Stenner MP, et al. Subliminal action priming modulates the perceived intensity of sensory action consequences. Cognition. 2014;130:227–235. doi: 10.1016/j.cognition.2013.11.008.
24. Klein SA. Measuring, estimating, and understanding the psychometric function: a commentary. Percept. Psychophys. 2001;63:1421–1455. doi: 10.3758/BF03194552.
25. Ulrich R, Vorberg D. Estimating the difference limen in 2AFC tasks: Pitfalls and improved estimators. Atten. Percept. Psychophys. 2009;71:1219–1227. doi: 10.3758/APP.71.6.1219.
26. Yeshurun Y, Carrasco M, Maloney LT. Bias and sensitivity in two-interval forced choice procedures: Tests of the difference model. Vision Res. 2008;48:1837–1851. doi: 10.1016/j.visres.2008.05.008.
27. Reznik D, Henkin Y, Schadel N, Mukamel R. Lateralized enhancement of auditory cortex activity and increased sensitivity to self-generated sounds. Nat. Commun. 2014;5:1–11. doi: 10.1038/ncomms5059.
28. Eliades SJ, Wang X. Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature. 2008;453:1102–1106. doi: 10.1038/nature06910.
29. Guenther FH, Vladusich T. A neural theory of speech acquisition and production. J. Neurolinguistics. 2012;25:408–422. doi: 10.1016/j.jneuroling.2009.08.006.
30. Keller T, Cowan N. Developmental increase in the duration of memory for tone pitch. Dev. Psychol. 1994;30:855–863. doi: 10.1037/0012-1649.30.6.855.
31. García-Pérez MA, Alcalá-Quintana R. Improving the estimation of psychometric functions in 2AFC discrimination tasks. Front. Psychol. 2011;2:1–9. doi: 10.3389/fpsyg.2011.00096.
32. Dai H, Scharf B, Buus S. Effective attenuation of signals in noise under focused attention. J. Acoust. Soc. Am. 1991;89:2837–2842. doi: 10.1121/1.400721.
33. Reznik D, Henkin Y, Levy O, Mukamel R. Perceived loudness of self-generated sounds is differentially modified by expected sound intensity. PLoS One. 2015;10:e0127651. doi: 10.1371/journal.pone.0127651.
34. Summerfield C, Egner T. Expectation (and attention) in visual cognition. Trends Cogn. Sci. 2009;13:403–409. doi: 10.1016/j.tics.2009.06.003.
35. Baess P, Horváth J, Jacobsen T, Schröger E. Selective suppression of self-initiated sounds in an auditory stream: An ERP study. Psychophysiology. 2011;48:1276–1283. doi: 10.1111/j.1469-8986.2011.01196.x.
36. Cardoso-Leite P, Mamassian P, Schütz-Bosbach S, Waszak F. A new look at sensory attenuation. Psychol. Sci. 2010;21:1740–1745. doi: 10.1177/0956797610389187.
37. Lukaszewski JS, Elliott DN. Auditory threshold as a function of forced-choice technique, feedback, and motivation. J. Acoust. Soc. Am. 1962;34:1274–1277.
38. Macmillan NA, Creelman CD. Detection Theory: A User’s Guide. Lawrence Erlbaum Associates; 2005.
39. Cornsweet TN. The staircase-method in psychophysics. Am. J. Psychol. 1962;75:485–491.
40. Ozimek E, Zwislocki JJ. Relationships of intensity discrimination to sensation and loudness levels: Dependence on sound frequency. J. Acoust. Soc. Am. 1996;100:3304–3320. doi: 10.1121/1.416993.
41. van Schijndel NH, Houtgast T, Festen JM. Intensity discrimination of Gaussian-windowed tones: indications for the shape of the auditory frequency-time window. J. Acoust. Soc. Am. 1999;105:3425–3435. doi: 10.1121/1.424683.
42. Poulet JFA, Hedwig B. A corollary discharge maintains auditory sensitivity during sound production. Nature. 2002;418:872–876. doi: 10.1038/nature00919.
43. Lewald J, Ehrenstein WH. The effect of eye position on auditory lateralization. Exp. Brain Res. 1996;108:473–485. doi: 10.1007/BF00227270.
44. Lewald J, Ehrenstein WH. Influence of head-to-trunk position on sound lateralization. Exp. Brain Res. 1998;121:230–238. doi: 10.1007/s002210050456.
45. Golob EJ, Lewald J, Getzmann S, Mock JR. Numerical value biases sound localization. Sci. Rep. 2017;7:1–14. doi: 10.1038/s41598-017-17429-4.
46. Golob EJ, Miranda GG, Johnson JK, Starr A. Sensory cortical interactions in aging, mild cognitive impairment, and Alzheimer’s disease. Neurobiol. Aging. 2001;22:755–763. doi: 10.1016/S0197-4580(01)00244-5.
47. García-Pérez MA, Alcalá-Quintana R. The difference model with guessing explains interval bias in two-alternative forced-choice detection procedures. J. Sens. Stud. 2010;25:876–898. doi: 10.1111/j.1745-459X.2010.00310.x.
48. Näätänen R, Winkler I. The concept of auditory stimulus representation in cognitive neuroscience. Psychol. Bull. 1999;125:826–859. doi: 10.1037/0033-2909.125.6.826.
49. Zeng FG, Turner CW. Intensity discrimination in forward masking. J. Acoust. Soc. Am. 1992;92:782–787. doi: 10.1121/1.403947.
50. Desimone R, Duncan J. Neural mechanisms of selective visual attention. Annu. Rev. Neurosci. 1995;18:193–222. doi: 10.1146/annurev.ne.18.030195.001205.
51. Mock JR, Foundas AL, Golob EJ. Modulation of sensory and motor cortex activity during speech preparation. Eur. J. Neurosci. 2011;33:1001–1011. doi: 10.1111/j.1460-9568.2010.07585.x.
52. Anton-Erxleben K, Carrasco M. Attentional enhancement of spatial resolution: linking behavioural and neurophysiological evidence. Nat. Rev. Neurosci. 2013;14:188–200.
53. Bendixen A, SanMiguel I, Schröger E. Early electrophysiological indicators for predictive processing in audition: A review. Int. J. Psychophysiol. 2012;83:120–131. doi: 10.1016/j.ijpsycho.2011.08.003.
54. Gillmeister H, Eimer M. Tactile enhancement of auditory detection and perceived loudness. Brain Res. 2007;1160:58–68. doi: 10.1016/j.brainres.2007.03.041.
55. Schürmann M, Caetano G, Jousmäki V, Hari R. Hands help hearing: Facilitatory audiotactile interaction at low sound-intensity levels. J. Acoust. Soc. Am. 2004;115:830–832. doi: 10.1121/1.1639909.
56. Scharf B, Quigley S, Aoki C, Peachey N, Reeves A. Focused auditory attention and frequency selectivity. Percept. Psychophys. 1987;42:215–223. doi: 10.3758/BF03203073.
57. Max L, Guenther FH, Gracco VL, Ghosh SS, Wallace ME. Unstable or insufficiently activated internal models and feedback-biased motor control as sources of dysfluency: A theoretical model of stuttering. Contemp. Issues Commun. Sci. Disord. 2004;31:105–122. doi: 10.1044/cicsd_31_S_105.
58. Mock JR, Foundas AL, Golob EJ. Speech preparation in adults with persistent developmental stuttering. Brain Lang. 2015;149:97–105. doi: 10.1016/j.bandl.2015.05.009.
59. Mathalon DH, Ford JM. Corollary discharge dysfunction in schizophrenia: Evidence for an elemental deficit. 39:305–310.
60. Pynn LK, DeSouza JFX. The function of efference copy signals: Implications for symptoms of schizophrenia. Vision Res. 2013;76:124–133. doi: 10.1016/j.visres.2012.10.019.
61. Kühn S, et al. Whodunnit? Electrophysiological correlates of agency judgements. PLoS One. 2011;6:e28657. doi: 10.1371/journal.pone.0028657.
62. Bäß P, Jacobsen T, Schröger E. Suppression of the auditory N1 event-related potential component with unpredictable self-initiated tones: Evidence for internal forward models with dynamic stimulation. Int. J. Psychophysiol. 2008;70:137–143. doi: 10.1016/j.ijpsycho.2008.06.005.
63. Houde JF, Nagarajan SS, Sekihara K, Merzenich MM. Modulation of the auditory cortex during speech: An MEG study. J. Cogn. Neurosci. 2002;14:1125–1138. doi: 10.1162/089892902760807140.
