Sci Rep. 2016 Nov 23;6:37647. doi: 10.1038/srep37647

Figure 1. Schematic of speech segmentations in automatic speech recognition (ASR) system and brain.


Segmenting continuous speech into short frames is the first step in speech recognition. In ASR systems, the most widely used segmentation approach takes fixed-size time bins, defined by an external clock, as its reference (‘time-partitioned’). This approach is computationally simple but is limited in its ability to reflect the quasi-regular temporal structure of speech. The brain, by contrast, has no external timing reference and instead uses an intrinsic slow (neuronal) oscillation as its segmentation reference. During comprehension, this oscillation is phase-locked to the speech envelope, which allows the segmentation to follow the quasi-regular temporal dynamics of speech. The phase of the oscillation is divided into four quadrants (φi), and the speech waveform and the speech-induced spike trains are segmented and color-coded according to the phase quadrant of the reference oscillation (‘phase-partitioned’). Because the reference oscillation tracks the speech, this approach can produce time bins of unequal duration, depending on the temporal dynamics of speech. In this paper, we investigated whether the speech envelope itself can serve as a temporal reference for segmenting speech.
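To make the contrast between the two schemes concrete, the sketch below implements both under stated assumptions; it is an illustration, not the authors' analysis pipeline. All function names are hypothetical. The reference is assumed to be a narrowband signal (here a synthetic 4 Hz sinusoid standing in for a slow oscillation; a real speech envelope would first need band-pass filtering for its Hilbert phase to be meaningful). Instantaneous phase is taken from the analytic signal, built with the FFT in the same way as a standard Hilbert transform, and quadrant i is assumed to cover phases [i·π/2, (i+1)·π/2) after wrapping to [0, 2π). The caption does not specify where the quadrant boundaries lie.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (standard Hilbert-transform construction)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)


def phase_quadrants(reference):
    """Label each sample with the quadrant (0-3) of the reference's phase.

    Assumption: quadrant i covers [i*pi/2, (i+1)*pi/2) after wrapping to [0, 2*pi).
    """
    phase = np.mod(np.angle(analytic_signal(reference)), 2 * np.pi)
    # Clamp guards against np.mod returning exactly 2*pi for tiny negative phases.
    return np.minimum((phase // (np.pi / 2)).astype(int), 3)


def phase_partition(quadrants):
    """'Phase-partitioned' bins: runs of constant quadrant as (start, stop, quadrant)."""
    q = np.asarray(quadrants)
    # A change in the quadrant label marks a segment boundary.
    change = np.flatnonzero(np.diff(q)) + 1
    starts = np.concatenate(([0], change))
    stops = np.concatenate((change, [q.size]))
    return [(int(s), int(e), int(q[s])) for s, e in zip(starts, stops)]


def time_partition(n_samples, bin_size):
    """'Time-partitioned' bins: fixed-size (start, stop) windows on an external clock."""
    return [(s, min(s + bin_size, n_samples)) for s in range(0, n_samples, bin_size)]


def label_spikes(spike_samples, quadrants):
    """Color-code spikes (given as sample indices) by the reference quadrant at each spike."""
    return [int(quadrants[i]) for i in spike_samples]


# Usage: 1 s of a 4 Hz reference sampled at 1 kHz (synthetic, for illustration only).
fs = 1000
t = np.arange(fs) / fs
reference = np.cos(2 * np.pi * 4 * t - np.pi / 4)
quadrants = phase_quadrants(reference)
phase_bins = phase_partition(quadrants)
time_bins = time_partition(t.size, 100)
print(len(time_bins), sorted({e - s for s, e in time_bins}))
print(len(phase_bins), sorted({e - s for s, e, _ in phase_bins}))
```

With fixed 100-sample windows, every time-partitioned bin has the same length. The phase-partitioned bins instead follow the reference: even for this perfectly regular sinusoid their lengths differ (partial quadrants at the edges, 62 or 63 samples in between), and for a real, quasi-regular envelope the bin durations would vary with the speech itself.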