2020 Aug 25;9:e53051. doi: 10.7554/eLife.53051

Figure 1. Speech sound categories that are distinguished by a temporal cue are spatially encoded in the peak amplitude of neural activity in distinct neural populations.

(A) Stimuli varied only in voice-onset time (VOT), the duration between the onset of the burst (top) and the onset of voicing (bottom) (a.u. = arbitrary units). (B) Acoustic waveforms of the first 100 ms of the six synthesized stimuli. (C) Behavior for one example participant (mean ± bootstrap SE). Best-fit psychometric curve (mixed effects logistic regression) yields a voicing category boundary between 20 and 30 ms (50% crossover point). (D) Neural responses in the same representative participant show selectivity for either voiceless or voiced VOTs at different electrodes. Electrode size indicates peak high-gamma (HG; z-scored) amplitude at all speech-responsive temporal lobe sites. Electrode color reflects strength and direction of selectivity (Spearman’s ρ between peak HG amplitude and VOT) at VOT-sensitive sites (p<0.05). (E) Average HG responses (± SE) to voiced (0–20 ms VOTs; red) and voiceless (30–50 ms VOTs; blue) stimuli in two example electrodes from (D), aligned to stimulus onset (e1: voiceless-selective, V-; e2: voiced-selective, V+). Horizontal black bars indicate timepoints with category discriminability (p<0.005). Grey boxes mark the average peak window (± SD) across all VOT-sensitive electrodes (n = 49). (F) Population-based classification of voicing category (/p/ vs. /b/) during the peak window (150–250 ms after stimulus onset). Chance is 50%. Boxes show interquartile range across all participants; whiskers extend to best- and worst-performing participants; horizontal bars show median performance. Asterisks indicate significantly better-than-chance classification across participants (p<0.05; n.s. = not significant). Circles represent individual participants.
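The selectivity index in panel (D) is a rank correlation between peak HG amplitude and VOT. The following is a minimal sketch of how such an index could be computed, not the authors' analysis code. The function names, the 10 ms VOT spacing, and the amplitude values are hypothetical and chosen only for illustration.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: six VOT values (ms) and made-up peak HG amplitudes.
vots = [0, 10, 20, 30, 40, 50]
peak_hg_voiceless_site = [0.4, 0.5, 0.7, 1.6, 1.9, 2.1]  # grows with VOT
peak_hg_voiced_site = [2.0, 1.8, 1.7, 0.6, 0.5, 0.3]     # shrinks with VOT

rho_voiceless = spearman_rho(vots, peak_hg_voiceless_site)
rho_voiced = spearman_rho(vots, peak_hg_voiced_site)
```

Under this convention a positive ρ marks a site that responds more strongly to longer (voiceless) VOTs and a negative ρ marks a voiced-selective site. Which sign maps to which color is an assumption here.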

Figure 1—figure supplement 1. Identification behavior across all participants with behavioral data.

(A) Mean (± SE across participants; n = 4 of 7 participants) percent /pa/ responses for each voice-onset time (VOT) stimulus. Best-fit psychometric curve (mixed effects logistic regression) yields a voicing category boundary at 21.0 ms (50% crossover point; see Materials and methods for details). (B) Behavior (mean ± bootstrap SE) for each individual participant (P1, P2, P6, P7). Total trials (n) listed for each participant (see Supplementary file 1). Best-fit psychometric curves and category boundaries were computed using the mixed effects logistic regression across all participants, adjusted by the random intercept fit by the model for each participant. Voicing category boundaries were subject-dependent, with 3 of 4 participants’ boundaries occurring between 20 and 30 ms. P1 is the representative participant in Figure 1C.
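The 50% crossover point of a logistic psychometric curve follows directly from its coefficients. The sketch below assumes a model of the form P(/pa/) = 1 / (1 + exp(-(b0 + u + b1 × VOT))), where u is a participant's random intercept. The function names and all coefficient values are hypothetical; they are not the fitted values from the study.

```python
import math

def p_voiceless(vot_ms, intercept, slope, random_intercept=0.0):
    """Predicted probability of a /pa/ response under a logistic model."""
    z = intercept + random_intercept + slope * vot_ms
    return 1.0 / (1.0 + math.exp(-z))

def category_boundary(intercept, slope, random_intercept=0.0):
    """VOT (ms) at which the predicted probability equals 0.5.

    The logistic function equals 0.5 where its linear predictor is 0,
    so the boundary solves intercept + random_intercept + slope * VOT = 0.
    """
    return -(intercept + random_intercept) / slope

# Hypothetical fixed effects and one hypothetical participant offset.
b0, b1 = -5.0, 0.2
group_boundary = category_boundary(b0, b1)              # 25.0 ms
shifted_boundary = category_boundary(b0, b1, 0.5)       # 22.5 ms
```

A positive random intercept moves the boundary toward shorter VOTs, which is one way participant-specific boundaries can differ while sharing a common slope.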

Figure 1—figure supplement 2. Locations of all speech-responsive and VOT-sensitive electrodes in each participant (P1–P7).

P1 is the representative participant in Figure 1D. Electrode color reflects strength and direction of selectivity (Spearman’s ρ between peak HG amplitude and VOT) at the subset of VOT-sensitive sites (p<0.05) selective for either voiceless VOTs (/p/; blue) or voiced VOTs (/b/; red). Electrode size indicates peak high-gamma (HG; z-scored) amplitude at all speech-responsive temporal lobe sites. Maximum and minimum electrode size and selectivity were calculated per participant for visualization.

Figure 1—figure supplement 3. Analysis of evoked local field potentials reveals that some electrodes that encode VOT in their peak high-gamma amplitude also exhibit amplitude and/or temporal response features that are VOT-dependent.

(A) Grand average auditory evoked potential (AEP) to all VOT stimuli. Evoked local field potentials (negative up-going) were averaged over all VOT-sensitive STG electrodes for one representative participant (P1) (mean ± SE, computed across electrodes). Three peaks of the AEP were identified for analysis: 75–100 ms (Pα), 100–150 ms (Nα), and 150–250 ms (Pβ) after stimulus onset. (B) Correlation coefficients (Pearson’s r) quantifying association between VOT and latency (top) or amplitude (bottom) of each peak (Pα: left; Nα: middle; Pβ: right) for each VOT-sensitive electrode for which that peak could be reliably identified (see Figure 1—figure supplement 4 and Materials and methods for details of this analysis). Horizontal bars represent bootstrapped estimate of correlation coefficient (mean and 95% CI) for each electrode (blue: voiceless-selective; red: voiced-selective; electrodes sorted by mean correlation value). Black bars around an electrode’s mean indicate that encoding of VOT by the designated parameter (latency or amplitude of a given peak) was significant (95% CI excluded r = 0; grey bars: not significant). Later peaks were reliably identified for fewer electrodes (Pα: n = 32 of 49 electrodes; Nα: n = 19; Pβ: n = 15).
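The significance criterion in panel (B), a bootstrapped 95% CI that excludes r = 0, can be sketched as below. This is an illustrative percentile bootstrap that resamples (VOT, value) pairs. The study's own procedure (see Materials and methods) may resample trials differently, and the function names, seed, and simulated latencies are hypothetical.

```python
import random
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation; returns None if either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5

def bootstrap_r_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap mean and (1 - alpha) CI of Pearson's r."""
    rng = random.Random(seed)
    n = len(x)
    rs = []
    while len(rs) < n_boot:
        # Resample observation pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples
            rs.append(r)
    rs.sort()
    lower = rs[int((alpha / 2) * n_boot)]
    upper = rs[int((1 - alpha / 2) * n_boot) - 1]
    return mean(rs), lower, upper

def excludes_zero(lower, upper):
    """True if the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

# Simulated example: peak latency (ms) that increases with VOT plus noise.
noise = random.Random(1)
vots = [0, 10, 20, 30, 40, 50] * 5
latencies = [100 + 0.4 * v + noise.gauss(0, 2) for v in vots]
mean_r, ci_low, ci_high = bootstrap_r_ci(vots, latencies)
significant = excludes_zero(ci_low, ci_high)
```

Returning `None` for zero-variance resamples avoids a division by zero when a resample happens to draw a single VOT value.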

Figure 1—figure supplement 4. Complex and variable associations between VOT and amplitude/temporal features of auditory evoked local field potentials (AEPs) exist in responses of electrodes that robustly encode voicing in their peak high-gamma amplitude.

(A to D) Average high-gamma responses (± SE) to voiced (0–20 ms VOTs; red) and voiceless (30–50 ms VOTs; blue) stimuli in four representative VOT-sensitive STG electrodes, including two voiceless-selective (A: e1, C: e3) and two voiced-selective (B: e2, D: e4) electrodes, aligned to stimulus onset. Vertical bars indicate relative scaling of high-gamma (z-scored) in each panel. The two leftmost electrodes (e1, e2) correspond to e1 and e2 in the main text (e.g., Figure 1E). (E to H) Average local field potentials (± SE) evoked by voiced/voiceless stimuli in the same four electrodes, aligned to stimulus onset. Vertical bars (negative up-going) indicate relative scaling of voltage in each panel. The three peaks of the AEP that were identified for analysis are labeled for each electrode (Pα, Nα, Pβ; see Figure 1—figure supplement 3). For a given electrode, peaks were omitted from this analysis if they could not be reliably identified across bootstrapped samples of trials from all six VOT conditions (e.g., Pβ for e4). See Materials and methods for details. (I to L) Average local field potentials evoked by each VOT stimulus (line color) in the same four electrodes, aligned to stimulus onset. (M to P) Mean latency (± bootstrap SE) of each AEP peak for each VOT stimulus for the same four electrodes. Mean bootstrapped correlation (Pearson’s r) between VOT and peak latency is shown for each peak/electrode. (Q to T) Mean amplitude (± bootstrap SE) of each AEP peak for each VOT stimulus for the same four electrodes. Mean bootstrapped correlation (Pearson’s r) between VOT and peak amplitude is shown for each peak/electrode. Note that negative correlations are visually represented as rising from left to right. These correlation coefficients are the source data for the summary representations in Figure 1—figure supplement 3.