2021 Mar 20;11(3):394. doi: 10.3390/brainsci11030394

Table 1.

Summary of major forms of speech degradation with representative experimental studies in healthy listeners.

Degradation Type / Study | Participants | Methodology | Major Findings
ACCENTS
Target process: Phonemic and intonational representations
Ecological relevance: Understanding messages conveyed via non-canonical spoken phonemes and suprasegmental intonation
Bent and Bradlow [65] | 65 healthy participants (age: 19.1) | Participants listened to English sentences spoken by Chinese, Korean, and English native speakers. | Non-native listeners found speech from non-native English speakers as intelligible as from a native speaker.
Clarke and Garrett [66] | 164 healthy participants (American English) | Participants listened to English sentences spoken with a Spanish, Chinese, or English accent. | Processing speed was initially slower for accented speech, but this deficit diminished with exposure.
Floccia, Butler, Goslin and Ellis [54] | 54 healthy participants (age: 19.7; Southern British English) | Participants had to say if the last word in a spoken sentence was real or not. | Changing accent caused a delay in word identification, whether accent change was regional or foreign.
ALTERED AUDITORY FEEDBACK
Target process: Influence of auditory feedback on speech production
Ecological relevance: Ability to hear, process, and regulate speech from own production
Siegel and Pick [67] | 20 healthy participants | Participants produced speech whilst hearing amplified feedback of their own voice. | Participants lowered their voices (displaying the sidetone amplification effect) in all conditions.
Jones and Munhall [68] | 18 healthy participants (age: 22.4; Canadian English) | Participants produced vowels with altered feedback of F0 shifted up or down. | Participants compensated for change in F0.
Donath et al. [69] | 22 healthy participants (age: 23; German) | Participants said a nonsense word with feedback of their frequency randomly shifting downwards. | Participants adjusted their voice F0 only after a set delay, reflecting the time needed to process the feedback first.
Stuart et al. [70] | 17 healthy participants (age: 32.9; American English) | Participants spoke under DAF at 0, 25, 50, and 200 ms at normal and fast rates of speech. | There were more dysfluencies at 200 ms, and more dysfluencies at the fast rate of speech.
DICHOTIC LISTENING
Target process: Auditory scene analysis (auditory attention)
Ecological relevance: Processing of spoken information with competing verbal material
Moray [71] | Healthy participants, no other information given | Participants were told to focus on a message played to one ear, with a competing message in the other ear. | Participants did not recognize the content in the unattended message.
Lewis [72] | 12 healthy participants | Participants were told to attend to a message presented in one ear, with a competing message in the other. | Participants could not recall the unattended message, but semantic similarity affected reaction times.
Ding and Simon [73] | 10 healthy participants (age: 19–25) | Under MEG, participants heard competing messages in each ear, and were asked to attend to each in turn. | Auditory cortex tracked temporal modulations of both signals, but tracking was stronger for the attended one.
NOISE-VOCODED SPEECH
Target process: Phonemic spectral detail
Ecological relevance: Understanding whisper (similar quality to speech heard by cochlear implant users)
Shannon, Zeng, Kamath, Wygonski and Ekelid [59] | 8 healthy participants | Participants listened to and repeated simple sentences that had been noise-vocoded to different degrees. | Performance improved with number of channels; high speech recognition was achieved with only 3 channels.
Davis, Johnsrude, Hervais-Adelman, Taylor and McGettigan [58] | 12 healthy participants (age: 18–25; British English) | Participants listened to and then transcribed 6-channel noise-vocoded sentences. | Participants showed rapid improvement over the course of 30-sentence exposure.
Scott, Rosen, Lang and Wise [35] | 7 healthy participants (age: 38) | Under PET, participants listened to spoken sentences that were noise-vocoded to various degrees. | Selective response to speech intelligibility in left anterior STS.
PERCEPTUAL RESTORATION
Target process: Message interpolation
Ecological relevance: Understanding messages in intermittent or varying noise (e.g., a poor telephone line)
Warren [57] | 20 healthy participants | Participants identified where the gap was in sentences where a phoneme was replaced by silence/white noise. | Participants were more likely to mislocalize a missing phoneme that was replaced by noise.
Samuel [74] | 20 healthy participants (English) | Participants heard sentences in which white noise was either “Added” to or “Replaced” a phoneme. | Phonemic restoration was more common for longer words and certain phone classes.
Leonard, Baud, Sjerps and Chang [43] | 5 healthy participants (age: 38.6; English/Italian) | Subdural electrode arrays recorded while participants listened to words with noise-replaced phonemes. | Electrode responses to words with a noise-replaced phoneme were comparable to responses to intact words.
SINEWAVE SPEECH
Target process: Speech reconstruction and adaptation from very impoverished cues
Ecological relevance: Synthetic model for impoverished speech signal and perceptual learning
Remez, Rubin, Pisoni and Carrell [63] | 54 control participants | Naïve listeners heard SWS replicas of spoken sentences and were later asked to transcribe the sentences. | Most listeners did not initially identify the SWS as speech, but were able to transcribe them when told this.
Barker and Cooke [64] | 12 control participants | Participants were asked to transcribe SWS or amplitude-comodulated SWS sentences. | Recognition for SWS ranged from 35–90%, and amplitude-comodulated SWS ranged from 50–95%.
Möttönen, Calvert, Jääskeläinen, Matthews, Thesen, Tuomainen and Sams [37] | 21 control participants (age: 18–36; English) | Participants underwent two fMRI scans: one before training on SWS, and one post-training. | Activity in left posterior STS was increased after SWS training.
SPEECH-IN-NOISE
Target process: Auditory scene analysis (parsing of phonemes from acoustic background)
Ecological relevance: Understanding messages in background noise (e.g., “cocktail party effect”)
Pichora-Fuller et al. [75] | 24 participants in three groups (age: 23.9; 70.4; 75.8; English) | Participants repeated the last word of sentences in 8-talker babble. Half had predictable endings. | Both groups of older listeners derived more benefit from context than younger listeners.
Parbery-Clark et al. [76] | 31 control participants (incl. 16 musicians; age: 23; English) | Participants were assessed via clinical measures of speech perception in noise. | Musicians outperformed the non-musicians on both QuickSIN and HINT.
Anderson et al. [77] | 120 control participants (age: 63.9) | Peripheral auditory function, cognitive ability, speech-in-noise, and life experience were examined. | Central processing and cognitive function predicted variance in speech-in-noise perception.
TIME-COMPRESSED SPEECH
Target process: Phoneme duration (rate of presentation)
Ecological relevance: Understanding rapid speech
Dupoux and Green [60] | 160 control participants (English) | Participants transcribed spoken sentences that were compressed to 38% and 45% of their original durations. | Participants improved over time. This happened more rapidly for the 45% compressed sentences.
Poldrack et al. [78] | 8 control participants (age: 20–29; English) | Participants listened to time-compressed speech. Brain responses were tracked using fMRI. | Activity in bilateral IFG and left STG increased with compression, until speech became incomprehensible.
Peelle et al. [79] | 8 control participants (age: 22.6; English) | Participants listened to sentences manipulated for complexity and time-compression in an fMRI study. | Time-compressed sentences recruited AC and premotor cortex, regardless of complexity.

The table is ordered by type of speech degradation. Information in the Participants column is based on available information from the original papers; age is given as a mean or range, and language refers to participants’ native languages. Abbreviations: AC, anterior cingulate; DAF, delayed auditory feedback; F0, fundamental frequency; fMRI, functional magnetic resonance imaging; HINT, Hearing in Noise Test; IFG, inferior frontal gyrus; ms, millisecond; QuickSIN, Quick Speech in Noise Test; PET, positron emission tomography; STG, superior temporal gyrus; STS, superior temporal sulcus; SWS, sinewave speech.