Table 1.
| Degradation Type | Study | Participants | Methodology | Major Findings |
|---|---|---|---|---|
| ACCENTS. Target process: phonemic and intonational representations. Ecological relevance: Understanding messages conveyed via non-canonical spoken phonemes and suprasegmental intonation. | Bent and Bradlow [65] | 65 healthy participants (age: 19.1) | Participants listened to English sentences spoken by Chinese, Korean, and English native speakers. | Non-native listeners found speech from non-native English speakers as intelligible as from a native speaker. |
| | Clarke and Garrett [66] | 164 healthy participants (American English) | Participants listened to English sentences spoken with a Spanish, Chinese, and English accent. | Processing speed initially slower for accented speech, but this deficit diminished with exposure. |
| | Floccia, Butler, Goslin and Ellis [54] | 54 healthy participants (age 19.7; Southern British English) | Participants had to say if the last word in a spoken sentence was real or not. | Changing accent caused a delay in word identification, whether accent change was regional or foreign. |
| ALTERED AUDITORY FEEDBACK. Target process: Influence of auditory feedback on speech production. Ecological relevance: Ability to hear, process, and regulate speech from own production. | Siegel and Pick [67] | 20 healthy participants | Participants produced speech whilst hearing amplified feedback of their own voice. | Participants lowered their voices (displaying the sidetone amplification effect) in all conditions. |
| | Jones and Munhall [68] | 18 healthy participants (age: 22.4; Canadian English) | Participants produced vowels with altered feedback of F0 shifted up or down. | Participants compensated for change in F0. |
| | Donath et al. [69] | 22 healthy participants (age: 23; German) | Participants said a nonsense word while the F0 of their auditory feedback was randomly shifted downwards. | Participants adjusted their voice F0 after a set delay, attributed to the feedback being processed first. |
| | Stuart et al. [70] | 17 healthy participants (age: 32.9; American English) | Participants spoke under DAF at 0, 25, 50, 200 ms at normal and fast rates of speech. | There were more dysfluencies at 200 ms, and more dysfluencies at the fast rate of speech. |
| DICHOTIC LISTENING. Target process: Auditory scene analysis (auditory attention). Ecological relevance: Processing of spoken information with competing verbal material. | Moray [71] | Healthy participants, no other information given | Participants were told to focus on a message played to one ear, with a competing message in the other ear. | Participants did not recognize the content in the unattended message. |
| | Lewis [72] | 12 healthy participants | Participants were told to attend to a message presented in one ear, with a competing message in the other. | Participants could not recall the unattended message, but semantic similarity affected reaction times. |
| | Ding and Simon [73] | 10 healthy participants (age 19–25) | Under MEG, participants heard competing messages in each ear and were asked to attend to each in turn. | Auditory cortex tracked temporal modulations of both signals, but tracking was stronger for the attended one. |
| NOISE-VOCODED SPEECH. Target process: Phonemic spectral detail. Ecological relevance: Understanding whisper (similar quality to speech heard by cochlear implant users). | Shannon, Zeng, Kamath, Wygonski and Ekelid [59] | 8 healthy participants | Participants listened to and repeated simple sentences that had been noise-vocoded to different degrees. | Performance improved with number of channels; high speech recognition was achieved with only 3 channels. |
| | Davis, Johnsrude, Hervais-Adelman, Taylor and McGettigan [58] | 12 healthy participants (age 18–25; British English) | Participants listened to and then transcribed 6-channel noise-vocoded sentences. | Participants showed rapid improvement over the course of 30-sentence exposure. |
| | Scott, Rosen, Lang and Wise [35] | 7 healthy participants (age 38) | Under PET, participants listened to spoken sentences that were noise-vocoded to various degrees. | Selective response to speech intelligibility in left anterior STS. |
| PERCEPTUAL RESTORATION. Target process: Message interpolation. Ecological relevance: Understanding messages in intermittent or varying noise (e.g., a poor telephone line). | Warren [57] | 20 healthy participants | Participants identified where the gap was in sentences where a phoneme was replaced by silence/white noise. | Participants were more likely to mislocalize a missing phoneme that was replaced by noise. |
| | Samuel [74] | 20 healthy participants (English) | Participants heard sentences in which white noise was either “Added” to or “Replaced” a phoneme. | Phonemic restoration was more common for longer words and certain phone classes. |
| | Leonard, Baud, Sjerps and Chang [43] | 5 healthy participants (age 38.6; English/Italian) | Subdural electrode arrays recorded neural activity while participants listened to words with noise-replaced phonemes. | Electrode responses to words with a replaced phoneme were comparable to responses to intact words. |
| SINEWAVE SPEECH. Target process: Speech reconstruction and adaptation from very impoverished cues. Ecological relevance: Synthetic model for impoverished speech signal and perceptual learning. | Remez, Rubin, Pisoni and Carrell [63] | 54 control participants | Naïve listeners heard SWS replicas of spoken sentences and were later asked to transcribe the sentences. | Most listeners did not initially identify the SWS as speech, but were able to transcribe them when told this. |
| | Barker and Cooke [64] | 12 control participants | Participants were asked to transcribe SWS or amplitude-comodulated SWS sentences. | Recognition for SWS ranged from 35–90%, and amplitude-comodulated SWS ranged from 50–95%. |
| | Möttönen, Calvert, Jääskeläinen, Matthews, Thesen, Tuomainen and Sams [37] | 21 control participants (age 18–36; English) | Participants underwent two fMRI scans: one before training on SWS, and one post-training. | Activity in left posterior STS was increased after SWS training. |
| SPEECH-IN-NOISE. Target process: Auditory scene analysis (parsing of phonemes from acoustic background). Ecological relevance: Understanding messages in background noise (e.g., “cocktail party effect”). | Pichora-Fuller et al. [75] | 24 participants in three groups (age 23.9; 70.4; 75.8; English) | Participants repeated the last word of sentences in 8-talker babble. Half had predictable endings. | Both groups of older listeners derived more benefit from context than younger listeners. |
| | Parbery-Clark et al. [76] | 31 control participants (incl. 16 musicians; age: 23; English) | Participants were assessed via clinical measures of speech perception in noise. | Musicians outperformed the non-musicians on both QuickSIN and HINT. |
| | Anderson et al. [77] | 120 control participants (age 63.9) | Peripheral auditory function, cognitive ability, speech-in-noise, and life experience were examined. | Central processing and cognitive function predicted variance in speech-in-noise perception. |
| TIME-COMPRESSED SPEECH. Target process: Phoneme duration (rate of presentation). Ecological relevance: Understanding rapid speech. | Dupoux and Green [60] | 160 control participants (English) | Participants transcribed spoken sentences that were compressed to 38% and 45% of their original durations. | Participants improved over time. This happened more rapidly for the 45% compressed sentences. |
| | Poldrack et al. [78] | 8 control participants (age: 20–29; English) | Participants listened to time-compressed speech. Brain responses were tracked using fMRI. | Activity in bilateral IFG and left STG increased with compression, until speech became incomprehensible. |
| | Peelle et al. [79] | 8 control participants (age: 22.6; English) | Participants listened to sentences manipulated for complexity and time-compression in an fMRI study. | Time-compressed sentences recruited AC and premotor cortex, regardless of complexity. |
The table is ordered by type of speech degradation. Information in the Participants column is based on available information from the original papers; age is given as a mean or range, and language refers to participants’ native languages. Abbreviations: AC, anterior cingulate; DAF, delayed auditory feedback; F0, fundamental frequency; fMRI, functional magnetic resonance imaging; HINT, Hearing in Noise Test; IFG, inferior frontal gyrus; MEG, magnetoencephalography; ms, millisecond; QuickSIN, Quick Speech in Noise Test; PET, positron emission tomography; STG, superior temporal gyrus; STS, superior temporal sulcus; SWS, sinewave speech.