Figure 10.
Giraud and Poeppel's theory of speech perception: Cortical theta and gamma oscillations parse connected speech. This five-step theory assumes a high-resolution spectro-temporal representation of speech in primary auditory cortex. This representation cannot rely only on the phase-locked, synchronous firing of the neurons that generate the FFR of the cABR up to 1500 Hz. Rather, a place-rate code must constitute such an (asynchronous) cortical input. Thus, a typical spike train provides the input to deep layer IV cortical neurons, which phase-lock to speech amplitude modulations. Response onset elicits a reset of theta oscillations in superficial layers II and III, the output of the auditory cortex (step 1). After this reset, theta oscillations follow the speech envelope (step 2). In turn, the theta reset causes a brief pause in gamma activity and a subsequent reset of gamma oscillations. The coupling of theta and gamma generators becomes both stronger and “nested,” such that the phase of the speech-envelope-following theta oscillations controls the phase and power of the gamma oscillations (step 3). This gamma power controls the excitability of the neurons generating the feedforward signal from primary auditory cortex to higher-order areas (step 4). This neuronal excitability phase-aligns to speech modulations (step 5). We postulate that such cortical modulations of neuronal firing by neuronal oscillations parse auditory input and serve as a context that corticofugally influences subcortical neural entrainment of corticopetal-corticofugal loops. Accordingly, linguistic factors can promote the perception of speech in noise in a top-down manner. Credit: Reprinted by permission from Macmillan Publishers Ltd: NATURE NEUROSCIENCE (Giraud and Poeppel, 2012), Copyright © 2012.