Letter
J. Acoust. Soc. Am. 155(3), 1704–1706 (2024 Mar 1). doi: 10.1121/10.0025125

Direct neural coding of speech: Reconsideration of Whalen et al. (2006) (L)

D. H. Whalen1,2,a)
PMCID: PMC10908555  PMID: 38426833

Abstract

Previous brain imaging results indicated that speech perception proceeded independently of the auditory primitives that are the product of primary auditory cortex [Whalen, Benson, Richardson, Swainson, Clark, Lai, Mencl, Fulbright, Constable, and Liberman (2006). J. Acoust. Soc. Am. 119, 575–581]. Recent evidence using electrocorticography [Hamilton, Oganian, Hall, and Chang (2021). Cell 184, 4626–4639] indicates that there is a more direct connection from subcortical regions to cortical speech regions than previous studies had shown. Although the mechanism differs, the Hamilton, Oganian, Hall, and Chang result supports the original conclusion even more strongly: Speech perception does not rely on the analysis of primitives from auditory analysis. Rather, the speech signal is processed as speech from the beginning.

I. INTRODUCTION

Perception of the acoustic speech signal is an important component of human communication, subject to many decades of research. Because it is an acoustic signal, it has been taken as obvious that the objects of speech perception are acoustic (Diehl and Kluender, 1989; Ohala, 1996; Fox et al., 2020). In such an approach, auditory processing in the primary auditory cortex (PAC) is necessary (Obleser et al., 2006; Mesgarani et al., 2008; Steinschneider et al., 2013).

Many behavioral studies challenge the assumption of the primacy of acoustics, indicating instead that perception extracts the articulatory gestures that could have generated the acoustic pattern (Liberman et al., 1967; Fowler, 2006; Whalen, 2019). One extensive study shows that none of the Gestalt principles of perceptual organization that have been proposed for acoustic signals apply to speech perception (Remez et al., 1994).

These theories, and the assumptions about the contribution of PAC to speech perception, have been put to the test in a variety of studies. Two studies are the focus of this Letter to the Editor: one relying on the assumed neural pathway for acoustic information, and the other showing previously undetected pathways. Both indicate that the processing of speech does not rely on primary auditory cues but, instead, on speech as a separate, specialized system.

II. WHALEN ET AL. (2006)

A functional magnetic resonance imaging (fMRI) study was undertaken to determine whether auditory primitives from PAC were necessary for speech perception or not (Whalen et al., 2006). The stimuli were designed to test a range of complexity in both speech and nonspeech signals. Although speech and nonspeech signals are difficult to equate for complexity, the speech signals were, if anything, more complex than the nonspeech ones. Therefore, if the first stage of processing were to take place in PAC, both the speech and the nonspeech should have shown greater activation there with increasing complexity.

The results indicated that speech complexity did not correlate with PAC activation at all; instead, portions of the superior temporal gyrus (STG) showed a correlation. This was interpreted as showing that speech is treated in a specialized way from the beginning of the analysis of the acoustic signal. Primary auditory features did not appear to be used in constructing a speech percept.
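To make the correlational logic concrete, the sketch below tests whether region-of-interest (ROI) activation tracks parametric stimulus complexity, separately by condition. The ROI labels, toy activation values, and per-condition correlation are hypothetical simplifications for illustration; the actual study used fMRI general linear model contrasts, not this analysis.

```python
# Minimal sketch of the correlational logic: does activation in a given
# ROI increase with stimulus complexity? All values here are invented.
import numpy as np
from scipy.stats import pearsonr

complexity = np.array([1, 2, 3])  # ordinal complexity levels (low -> high)

# Hypothetical mean activation (e.g., percent signal change) per level.
activation = {
    ("PAC", "nonspeech"): np.array([0.20, 0.35, 0.52]),  # tracks complexity
    ("PAC", "speech"):    np.array([0.30, 0.31, 0.29]),  # flat: no tracking
    ("STG", "speech"):    np.array([0.25, 0.41, 0.58]),  # tracks complexity
}

for (roi, condition), y in activation.items():
    r, p = pearsonr(complexity, y)
    print(f"{roi} / {condition}: r = {r:+.2f} (p = {p:.3f})")
```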

Examination of the neural pathways known at the time did not indicate any significant direct connections from subcortical regions to STG (e.g., Scott et al., 2000), and such views continue to be published (e.g., Peelle and Wingfield, 2022). It was therefore posited that a form of “post-emption” (Whalen et al., 2006, p. 580) would inform PAC that analysis was not necessary. That is, the speech system would recognize that a part of the acoustic signal was speech and would then “post-empt” that energy from the PAC's analysis. This model was consistent with behavioral studies showing that, given a head start of about 50 ms, a nonspeech percept could be established that would then pre-empt the signal's use as speech (Whalen and Liberman, 1996).

III. HAMILTON ET AL. (2021)

A recent study employed intracranial recordings and stimulation of the human brain to explore auditory dependencies much more directly (Hamilton et al., 2021). PAC is located deep within the lateral sulcus and is thus challenging to access. In rare cases, however, high-resolution functional mapping of this region is required for neurosurgical intervention. Such cases provide an opportunity to record neural responses in PAC to auditory stimuli of various sorts using electrocorticography (ECoG) and to perturb and/or suppress activity in PAC and other auditory regions using direct focal electrocortical stimulation (ECS). The ECoG recordings reported in this study showed that both PAC and regions of STG were rapidly activated during audition, suggesting that two pathways process acoustic information in parallel. Furthermore, as in the earlier studies of Penfield (Penfield and Roberts, 1959), it is possible to induce localized, reversible (in)activation of neural regions by stimulating one or more electrodes with a small amount of electricity. The researchers did so with PAC and STG. When PAC was stimulated, patients reported hearing strange nonspeech sounds, yet speech stimuli presented acoustically at the same time were perceived without any interference. When STG was stimulated, speech perception was marginal.
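As an illustration of how parallel (rather than strictly serial) processing can be probed with such recordings, the following sketch compares response-onset latencies for PAC and STG electrodes: if STG responses begin as early as PAC responses, a serial PAC-to-STG cascade is hard to maintain. The traces, threshold rule, and timing values are invented for illustration and are a simplification of the analyses in Hamilton et al. (2021).

```python
# Sketch: compare onset latencies of hypothetical trial-averaged
# high-gamma envelopes from a PAC and an STG electrode.
import numpy as np

FS = 1000  # sampling rate in Hz, so one sample = 1 ms; stimulus onset at t = 0

def onset_latency_ms(trace, baseline_ms=100, n_sd=5.0):
    """First time the trace exceeds baseline mean + n_sd * SD, in ms
    relative to stimulus onset (the end of the baseline window)."""
    base = trace[:baseline_ms]
    thresh = base.mean() + n_sd * base.std()
    above = np.flatnonzero(trace[baseline_ms:] > thresh)
    return above[0] * 1000 / FS if above.size else None

# Hypothetical envelopes: 100 ms baseline noise, then a sustained response.
rng = np.random.default_rng(0)
pac = rng.normal(0, 0.1, 400); pac[130:] += 1.0  # responds ~30 ms post-onset
stg = rng.normal(0, 0.1, 400); stg[135:] += 1.0  # ~35 ms: nearly as early

print("PAC onset:", onset_latency_ms(pac), "ms")
print("STG onset:", onset_latency_ms(stg), "ms")
```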

The authors conclude that there are strong indications of direct connections from subcortical auditory regions to STG. Further work is needed to elaborate these pathways, but such a direct route would be far more efficient than the post-emption proposed by Whalen et al. (2006); as noted above, there had simply been little prior evidence for it. The fact that speech perception was not impaired at all when PAC was perturbed indicates that this direct pathway carries sufficient information for speech perception without contributions from auditory primitives.

IV. SPECIALIZATION AND COMPETITION

The timing of the processing of speech must be, to some extent, simultaneous with that of nonspeech, because there is no overt marker of what is speech and what is not. If a signal can be parsed as a speech signal, it is speech; otherwise, not (see Mattingly and Liberman, 1988, p. 785). This is true even when the signal does not appear to come from a human but nonetheless has the same kind of coherence, as in sinewave speech (Remez et al., 1981). Therefore, it is likely that all sounds project to both PAC and STG. Exactly how the perceptual system resolves the resulting competition is only partly understood, but the results of the two target papers of this Letter indicate that further effort at tracking the coordination is justified, perhaps from a different perspective than has been taken in the past.
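Sinewave speech is straightforward to illustrate: three sinusoids track the first three formants, discarding natural voice quality while preserving the coherent pattern of formant change (Remez et al., 1981). The sketch below uses hand-specified, hypothetical formant trajectories; actual sinewave replicas are built from formant tracks measured on a recorded utterance.

```python
# Minimal sinewave-speech sketch: three frequency-gliding sinusoids
# standing in for F1-F3. The trajectories below are invented.
import numpy as np

FS = 16000          # sampling rate in Hz
DUR = 0.3           # duration in seconds
t = np.linspace(0, DUR, int(FS * DUR), endpoint=False)

def tone_track(f_start, f_end):
    """Sinusoid whose frequency glides linearly from f_start to f_end (Hz)."""
    freq = np.linspace(f_start, f_end, t.size)
    phase = 2 * np.pi * np.cumsum(freq) / FS  # integrate frequency to phase
    return np.sin(phase)

# Hypothetical F1-F3 trajectories for a /da/-like transition into a vowel.
signal = (1.0 * tone_track(400, 700)      # F1 rises
          + 0.6 * tone_track(1700, 1200)  # F2 falls
          + 0.3 * tone_track(2600, 2500)) # F3 nearly steady
signal /= np.abs(signal).max()  # normalize to +/-1 for playback
```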

One concrete example of how the competition unfolds is found in duplex perception (Rand, 1974), in which a signal that is heard as nonspeech also contributes to the speech percept in the opposite ear. When two percepts are attributed to the same signal, it is clear that two systems are active. With dichotic presentation, such doubling of the percept seems rather intuitive, but duplex perception can occur with monaural presentation as well (Whalen and Liberman, 1987). When the small portion of the signal that determines the place of articulation of the perceived stop (in this case, the F3 transition) is low in intensity, only the speech is heard. At higher intensities, the nonspeech aspect of the F3 transition becomes apparent, and its loudness increases as the intensity increases (Xu et al., 1997). The nonspeech system can “capture” the F3 transition if it has a 50 ms head start (Whalen and Liberman, 1996). Thus, the competition between the systems is immediate and intricate.
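The stimulus manipulation behind such monaural duplex results can be sketched as follows: an isolated F3 transition is synthesized and mixed into a base syllable at a range of relative levels. The durations, frequencies, and level steps below are hypothetical placeholders, not the parameters of the cited experiments.

```python
# Sketch of the duplex-perception manipulation: vary the level of an
# isolated F3 transition mixed into a base syllable. Values are invented.
import numpy as np

FS = 16000
t = np.arange(int(0.05 * FS)) / FS                 # 50 ms F3 transition

freq = np.linspace(2700, 2400, t.size)             # falling F3 glide (/da/-like)
f3_chirp = np.sin(2 * np.pi * np.cumsum(freq) / FS)

def mix(base, chirp, level_db):
    """Add the F3 transition to the base syllable at a given relative level."""
    gain = 10 ** (level_db / 20)
    out = base.copy()
    out[:chirp.size] += gain * chirp               # transition at syllable onset
    return out

base = np.zeros(int(0.3 * FS))  # placeholder for the synthesized base syllable
stimuli = {db: mix(base, f3_chirp, db) for db in (-30, -20, -10, 0)}
```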

V. CONCLUSION

Our knowledge of brain organization continues to expand as new techniques and situations are explored. The results of Hamilton et al. (2021) were not attainable until recently, and thus the evidence for a direct auditory connection to STG was quite difficult to discover. Post-mortem analysis of brain connections does not allow a determination of the utility of any neurons that might connect different regions, so only a combination of imaging and behavioral results can provide the relevant data. Whalen et al. (2006) was based on the best evidence available at the time, but the recent results of Hamilton et al. (2021) undermine its anatomical assumptions.

The conclusion of Whalen et al. (2006), however, is made even stronger by the results of Hamilton et al. (2021): Acoustic analysis of speech occurs largely in STG. The analysis of auditory primitives in PAC is not necessary and would, indeed, be counterproductive in many cases. It is still possible that an acoustic representation of speech occurs entirely within STG, rather than depending on the gestures proposed in Whalen et al. (2006). Such a duplication of the products of PAC is possible, but there is no immediately apparent reason for it. In either case, speech signals are processed, at least for the most part, as speech from the first connection with the cortex.

ACKNOWLEDGMENTS

This work was supported by NIH grant DC-002717 to Haskins Laboratories and the Yale Child Study Center. Thanks go to Gregg Castellucci, David J. Ostry, Edward F. Chang, Kenneth R. Pugh, Joseph C. Toscano, Bob McMurray, and Liberty Hamilton for helpful comments.

AUTHOR DECLARATIONS

Conflict of Interest

The author reports no conflict of interest.

DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

1. Diehl, R. L., and Kluender, K. R. (1989). “On the objects of speech perception,” Ecol. Psych. 1, 121–144. doi:10.1207/s15326969eco0102_2
2. Fowler, C. A. (2006). “Compensation for coarticulation reflects gesture perception, not spectral contrast,” Percept. Psychophys. 68, 161–177. doi:10.3758/BF03193666
3. Fox, N. P., Leonard, M., Sjerps, M. J., and Chang, E. F. (2020). “Transformation of a temporal speech cue to a spatial neural code in human auditory cortex,” eLife 9, e53051. doi:10.7554/eLife.53051
4. Hamilton, L. S., Oganian, Y., Hall, J., and Chang, E. F. (2021). “Parallel and distributed encoding of speech across human auditory cortex,” Cell 184, 4626–4639. doi:10.1016/j.cell.2021.07.019
5. Liberman, A. M., Cooper, F. S., Shankweiler, D. P., and Studdert-Kennedy, M. (1967). “Perception of the speech code,” Psychol. Rev. 74, 431–461. doi:10.1037/h0020279
6. Mattingly, I. G., and Liberman, A. M. (1988). “Specialized perceiving systems for speech and other biologically significant sounds,” in Auditory Function, edited by G. M. Edelman, W. E. Gall, and W. M. Cowan (John Wiley & Sons, New York), pp. 775–793.
7. Mesgarani, N., David, S. V., Fritz, J. B., and Shamma, S. A. (2008). “Phoneme representation and classification in primary auditory cortex,” J. Acoust. Soc. Am. 123, 899–909. doi:10.1121/1.2816572
8. Obleser, J., Scott, S. K., and Eulitz, C. (2006). “Now you hear it, now you don't: Transient traces of consonants and their nonspeech analogues in the human brain,” Cereb. Cortex 16, 1069–1076. doi:10.1093/cercor/bhj047
9. Ohala, J. J. (1996). “Speech perception is hearing sounds, not tongues,” J. Acoust. Soc. Am. 99, 1718–1725. doi:10.1121/1.414696
10. Peelle, J. E., and Wingfield, A. (2022). “How our brains make sense of noisy speech,” Acoust. Today 18, 40–48. doi:10.1121/AT.2022.18.3.40
11. Penfield, W., and Roberts, L. (1959). Speech and Brain-Mechanisms (Princeton University Press, Princeton, NJ).
12. Rand, T. C. (1974). “Dichotic release from masking for speech,” J. Acoust. Soc. Am. 55, 678–680. doi:10.1121/1.1914584
13. Remez, R. E., Rubin, P. E., Berns, S. M., Pardo, J. S., and Lang, J. M. (1994). “On the perceptual organization of speech,” Psychol. Rev. 101, 129–156. doi:10.1037/0033-295X.101.1.129
14. Remez, R. E., Rubin, P. E., Pisoni, D. B., and Carrell, T. D. (1981). “Speech perception without traditional speech cues,” Science 212, 947–950. doi:10.1126/science.7233191
15. Scott, S. K., Blank, C. C., Rosen, S., and Wise, R. J. S. (2000). “Identification of a pathway for intelligible speech in the left temporal lobe,” Brain 123, 2400–2406. doi:10.1093/brain/123.12.2400
16. Steinschneider, M., Nourski, K. V., and Fishman, Y. I. (2013). “Representation of speech in human auditory cortex: Is it special?,” Hear. Res. 305, 57–73. doi:10.1016/j.heares.2013.05.013
17. Whalen, D. H. (2019). “The motor theory of speech perception,” in Oxford Research Encyclopedia of Linguistics, edited by M. Aronoff (Oxford University Press, Oxford, UK).
18. Whalen, D. H., Benson, R. R., Richardson, M., Swainson, B., Clark, V., Lai, S., Mencl, W. E., Fulbright, R. K., Constable, R. T., and Liberman, A. M. (2006). “Differentiation for speech and nonspeech processing within primary auditory cortex,” J. Acoust. Soc. Am. 119, 575–581. doi:10.1121/1.2139627
19. Whalen, D. H., and Liberman, A. M. (1987). “Speech perception takes precedence over nonspeech perception,” Science 237, 169–171. doi:10.1126/science.3603014
20. Whalen, D. H., and Liberman, A. M. (1996). “Limits on phonetic integration in duplex perception,” Percept. Psychophys. 58, 857–870. doi:10.3758/BF03205488
21. Xu, Y., Liberman, A. M., and Whalen, D. H. (1997). “On the immediacy of phonetic perception,” Psychol. Sci. 8, 358–362. doi:10.1111/j.1467-9280.1997.tb00425.x


