Abstract
Modulation-filterbank models discard phase information above very low rates of amplitude modulation (AM). The present work evaluated this restriction by measuring thresholds for discriminating the starting phase of sinusoidal modulators of wideband-noise carriers. Results showed a low-pass characteristic, with some listeners unable to perform the task once the modulation rate was greater than 12.5 Hz. For others, however, thresholds were obtained at AM rates up to one to two octaves higher. Intersubject variability may in part relate to the presence of multiple discrimination cues, only some of which are based on comparison of the ongoing pattern of envelope fluctuation.
1. Introduction
Current psychophysical models of auditory processing of amplitude modulation (AM) are often based on a modulation filterbank (MF). Following envelope detection, stimuli in MF models are processed by an array of broadly tuned, overlapping bandpass filters that span the range of envelope fluctuation rates detectable by human observers. Past work has shown MF models to be successful in accounting for a variety of results, including AM detection with various carrier types and bandwidths, AM depth discrimination, and modulation masking (Dau et al., 1997, 1999; Verhey et al., 2003; Ewert and Dau, 2004). While some model applications base decision statistics solely on integrated envelope power, others retain the temporal response features of filterbank channels centered at 10 Hz and below. This low-pass restriction is based on work by Dau (1996), who measured the ability of three listeners to discriminate the starting phase of sinusoidal envelope modulators.
The temporal pattern of low-rate envelope fluctuation can play an important role in envelope perception, most notably with speech, for which intelligibility depends on both the magnitude and the phase of the low-rate modulation spectrum (Greenberg and Arai, 2004). Understanding the basis and limitations of envelope-pattern processing can be important when applying MF models to complex stimulus configurations (Viemeister et al., 2004). With variation of envelope starting phase, discrimination may be cued either by the temporal pattern of the envelope or by processing of some related consequence of the stimulus manipulation (e.g., intensity discrimination between observation intervals at a specific stimulus time). The goal of the present work was to gain insight into the cues that listeners might use to discriminate envelope phase. To limit the possible involvement of specific cues, some conditions included randomization of stimulus parameters. Individually or in combination, the possible randomizations included overall level, stimulus duration, and the starting phase of the reference modulator.
2. Method
Thresholds for discriminating the starting phase of sinusoidal modulators of gated wideband-noise carriers were measured in radians. The modulation index was always 1.0. In the cued two-interval forced-choice task, the nonsignal interval repeated the cue modulator, whereas the phase increment was added to the modulator of the signal interval. Independent noise samples were used for the carrier on each stimulus presentation. In the audio-frequency domain, the short-term amplitude spectrum of a specific carrier sample varies with modulator phase; the use of independent noise carriers prevents this variation from serving as a discrimination cue.
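The stimulus construction described above can be sketched as follows. This is a minimal illustration, not the software used in the study: it assumes white Gaussian noise as the carrier, omits the 12-kHz low-pass filtering, ramps, and level calibration, and assumes (hypothetically) that the cue is presented first, followed by the two observation intervals in random order. The names `sam_noise` and `make_trial` are illustrative.

```python
import math
import random

def sam_noise(fm_hz, phase_rad, dur_s, fs_hz=44100, m=1.0, rng=None):
    """SAM noise: s(t) = [1 + m*sin(2*pi*fm*t + phase)] * n(t).

    n(t) is a fresh Gaussian noise sample on every call, mirroring the use of
    independent carriers on each presentation. m = 1.0 as in the text.
    """
    rng = rng if rng is not None else random.Random()
    n_samples = int(round(dur_s * fs_hz))
    w = 2.0 * math.pi * fm_hz
    return [
        (1.0 + m * math.sin(w * i / fs_hz + phase_rad)) * rng.gauss(0.0, 1.0)
        for i in range(n_samples)
    ]

def make_trial(fm_hz, cue_phase, delta_phase, dur_s, fs_hz=44100, rng=None):
    """One cued 2IFC trial (assumed ordering: cue, then two intervals).

    The nonsignal interval repeats the cue phase; the signal interval adds
    delta_phase. Returns waveforms, signal position, and modulator phases.
    """
    rng = rng if rng is not None else random.Random()
    signal_index = rng.randrange(2)          # signal in interval 0 or 1
    phases = [cue_phase, cue_phase]
    phases[signal_index] = cue_phase + delta_phase
    cue = sam_noise(fm_hz, cue_phase, dur_s, fs_hz, rng=rng)
    intervals = [sam_noise(fm_hz, p, dur_s, fs_hz, rng=rng) for p in phases]
    return {"cue": cue, "intervals": intervals,
            "signal_index": signal_index, "phases": phases}

# Usage: 10-Hz AM, cue in sine phase, 0.5-rad increment, 400-ms presentations.
trial = make_trial(10.0, 0.0, 0.5, 0.4, fs_hz=8000, rng=random.Random(1))
```

Because every call to `sam_noise` draws new noise, the cue and the nonsignal interval share only the modulator, not the carrier waveform.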
In each condition, the starting phase of the cue modulator was either fixed in sine phase or randomized across trials. Random phases were selected from a uniform distribution ranging from 0.0 to 2π radians. Duration and level were also either fixed or randomized. With duration randomization, the cue duration was always 400 ms, and the duration of each observation interval was randomly selected from a uniform distribution extending from 200 to 600 ms. In conditions without duration randomization, all stimulus presentations of a trial had a common duration, which ranged from 50 to 800 ms across conditions. At each AM rate, duration was restricted so that at least one full cycle of modulation was presented. The interstimulus interval was always 400 ms, and all stimuli were shaped with 5-ms cos² rise/fall ramps. In the fixed-level conditions, the overall stimulus level was 65 dB SPL, corresponding to a pressure spectrum level of 24.5 dB. Conditions with level randomization employed a 20-dB rove (±10 dB) about a mean level of 65 dB SPL. Dell PCs with 24-bit Echo Gina 3G soundcards were used for stimulus generation and experimental control. Following digital-to-analog conversion at a 44.1-kHz sampling rate, stimuli were low-pass filtered at 12 kHz and presented diotically through Sennheiser HD 520 II headphones to subjects seated in a double-walled soundproof booth.
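The per-trial randomization and stimulus shaping described above can be sketched as follows. Assumptions not stated in the text: the level rove is uniform, "sine phase" corresponds to a starting phase of 0 rad, and all helper names are hypothetical. The last function shows the arithmetic linking overall level and spectrum level for flat-spectrum noise: 65 − 24.5 = 40.5 dB implies a bandwidth of 10^4.05, or about 11.2 kHz, of the same order as the 12-kHz low-pass cutoff (the text does not state the noise bandwidth directly).

```python
import math
import random

MEAN_LEVEL_DB = 65.0   # overall level in dB SPL (from the text)
ROVE_DB = 10.0         # +/-10 dB rove, i.e., a 20-dB range
RAMP_S = 0.005         # 5-ms cos^2 rise/fall ramps

def draw_trial_params(fm_hz, rand_phase, rand_dur, rand_level, rng,
                      fixed_dur_s=0.4):
    """Draw cue phase, durations, and levels for [cue, interval 1, interval 2]."""
    cue_phase = rng.uniform(0.0, 2.0 * math.pi) if rand_phase else 0.0
    if rand_dur:
        durs = [0.4] + [rng.uniform(0.2, 0.6) for _ in range(2)]
    else:
        durs = [fixed_dur_s] * 3
    if min(durs) < 1.0 / fm_hz:
        raise ValueError("each presentation needs at least one modulation cycle")
    if rand_level:   # assumption: uniform rove, drawn anew for each presentation
        levels = [MEAN_LEVEL_DB + rng.uniform(-ROVE_DB, ROVE_DB) for _ in range(3)]
    else:
        levels = [MEAN_LEVEL_DB] * 3
    return {"cue_phase": cue_phase, "durs": durs, "levels": levels}

def apply_cos2_ramps(x, fs_hz, ramp_s=RAMP_S):
    """Scale onset and offset by a cos^2-shaped ramp rising from 0 to 1."""
    n = int(round(ramp_s * fs_hz))
    y = list(x)
    for i in range(min(n, len(y) // 2)):
        g = math.sin(0.5 * math.pi * i / n) ** 2
        y[i] *= g
        y[-1 - i] *= g
    return y

def level_gain(level_db, ref_db=MEAN_LEVEL_DB):
    """Linear amplitude gain for a level change relative to the reference."""
    return 10.0 ** ((level_db - ref_db) / 20.0)

def implied_bandwidth_hz(overall_db, spectrum_level_db):
    """Flat-spectrum noise: spectrum level = overall level - 10*log10(bandwidth)."""
    return 10.0 ** ((overall_db - spectrum_level_db) / 10.0)

params = draw_trial_params(10.0, True, True, True, random.Random(3))
```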
Thresholds were measured with a three-down, one-up adaptive tracking procedure with feedback (Levitt, 1971). The phase increment was initially changed by a factor of 1.25 at each step; after two reversals, the factor was reduced to 1.125. A run was terminated after 16 reversals, and threshold was estimated as the geometric mean of the phase increments at the last 12 reversals. Reported threshold values are the geometric mean of at least six such estimates. If the tracking procedure called for a phase increment equal to or greater than π radians, no threshold was calculated for that run. In Sec. 3, results indicating that a threshold could not be obtained in a specific condition represent at least four attempts. Data were collected from six listeners, though only four participated in all conditions. The experimental protocol was approved by the Institutional Review Board of Loyola University Chicago.
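The tracking rule can be sketched as below, assuming a hypothetical `respond(delta)` callback that returns True when the (simulated) listener is correct. The starting increment of 1.0 rad and the 1000-trial cap are assumptions for illustration; the text specifies neither. A three-down, one-up rule converges on the increment yielding about 79.4% correct (Levitt, 1971), since 0.794³ ≈ 0.5.

```python
import math

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def run_track(respond, start=1.0, n_down=3, big=1.25, small=1.125,
              switch_after=2, n_reversals=16, n_avg=12,
              ceiling=math.pi, max_trials=1000):
    """Three-down, one-up track on the phase increment (radians).

    Returns the geometric mean of the increments at the last n_avg reversals,
    or None if the track calls for an increment >= ceiling (pi) or runs past
    max_trials (a safeguard added for this sketch).
    """
    delta, run, direction, reversals = start, 0, None, []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            return geometric_mean(reversals[-n_avg:])
        factor = big if len(reversals) < switch_after else small
        if respond(delta):
            run += 1
            if run < n_down:
                continue              # three correct in a row to step down
            run, step = 0, "down"
        else:
            run, step = 0, "up"       # any error steps up
        if direction is not None and step != direction:
            reversals.append(delta)   # record the increment at each reversal
        direction = step
        delta = delta / factor if step == "down" else delta * factor
        if delta >= ceiling:
            return None
    return None

toy = lambda delta: delta >= 0.5       # hypothetical noiseless listener
print(run_track(toy))                  # settles near 0.49 rad
print(run_track(lambda d: False))      # None: increment would reach pi
```

With the toy listener, the track oscillates between 0.4608 and 0.5184 rad after the step size is reduced. A run-level threshold from several such runs would then be combined with `geometric_mean`, as described in the text.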
3. Results and discussion
Individual differences characterized the results from all conditions. Figure 1 shows individual thresholds as a function of AM rate with the cue modulator either in sine (circles) or random (triangles) phase. Stimulus duration was 400 ms. The lowest AM rate at which threshold could not be obtained with the adaptive procedure is indicated by the corresponding symbol with a slash. For both configurations of cue-modulator phase, the results indicate a low-pass characteristic; that is, threshold values rose with AM rate. The highest AM rate at which threshold could be obtained varied across subjects from 10 to 50 Hz. Except for subject S2, randomizing cue-modulator phase had little effect on this upper cutoff rate. The low-pass result is in general agreement with the findings of Dau (1996). In that study, performance was measured for detecting a modulator-phase increment of π radians with randomization of the reference phase. All three subjects were at chance performance once the modulation rate was increased to 12 Hz. The present work, however, indicates that for some subjects, the highest AM rate at which starting phase is discriminable can be one to two octaves higher, despite randomization of the reference phase angle.
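The octave comparison is a base-2 log ratio of modulation rates. A minimal check, taking the 12.5-Hz figure from the abstract as the reference (`octaves_above` is an illustrative name):

```python
import math

def octaves_above(rate_hz, ref_hz):
    """Distance between two modulation rates in octaves (base-2 log ratio)."""
    return math.log2(rate_hz / ref_hz)

for rate in (25.0, 50.0):
    print(f"{rate:g} Hz is {octaves_above(rate, 12.5):.1f} octave(s) above 12.5 Hz")
```

Rates of 25 and 50 Hz thus lie one and two octaves above 12.5 Hz, the latter matching the highest upper cutoff observed across subjects.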
Fig. 1.
Individual phase-discrimination thresholds as a function of AM rate with a 400-ms stimulus duration. The cue modulator was either in sine (circles) or random (triangles) phase. Unconnected symbols with a slash indicate that a threshold could not be obtained in that condition. Error bars represent 1 standard deviation (s.d.) of the mean threshold.
In Fig. 1, phase randomization affected threshold values more than it affected the upper rate limit, and in almost all cases it elevated thresholds over the entire range of AM rates at which individual thresholds could be measured. This result suggests that randomization of cue-modulator phase disrupted the use of a discrimination cue (or cues) available at all AM rates. Several cues could potentially support envelope-phase discrimination, and the effect of phase randomization, or more generally of the starting phase angle per se, may depend on the specific cue. Listeners could compare or cross correlate the temporal patterns of envelope fluctuation. Alternatively, they could focus on a specific stimulus time and attempt to detect a cross-interval intensity difference. Finally, the “effective” or perceived stimulus duration may vary with modulator phase if, in some cases, the envelope begins and ends near trough values. The effect of phase randomization may then stem either from actual disruption of the information carried by a specific discrimination cue or from a reduction in how often the cue occurs in its “optimal” form. While the present data sets do not distinguish between these possibilities, the remaining conditions were intended to evaluate the possible involvement of the various discrimination cues.
Dau et al. (1997) modeled AM detection with a decision statistic based on cross correlation of the signal with a stored template. If envelope-phase discrimination relied on such a correlation receiver, performance in terms of d’ should improve in proportion to the square root of duration. Figure 2 shows thresholds as a function of stimulus duration in conditions in which cue-modulator phase was randomized; the parameter is AM rate. With 20-Hz AM, the conditions in which thresholds could not be measured were at the longer, not the shorter, stimulus durations. In conditions with measurable thresholds, the functions were all relatively flat. Both results contrast with predictions based on discrimination by a correlation receiver. Dau (1996) likewise reported no significant dependence of envelope-phase-discrimination performance on duration in the range of 300–700 ms. Because duration discrimination follows an approximate Weber relationship (Abel, 1972), a given phase-induced change in effective duration should become harder to detect as overall duration increases, so a duration cue predicts thresholds that rise with duration. The absence of an effect of duration in both past and present results therefore argues against phase discrimination cued by a change in perceived stimulus duration.
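The square-root prediction follows from a textbook ideal observer for a known signal in white Gaussian noise, for which d’ = ‖s₁ − s₀‖/σ. The sketch below applies this directly to the modulator waveforms rather than to modulation-filterbank outputs with internal noise as in Dau et al. (1997), so it illustrates the scaling argument only; the function names are illustrative. Over whole modulation cycles, the normalized correlation between modulators differing by Δφ equals cos Δφ regardless of duration, whereas the squared difference between them grows linearly with duration, so quadrupling the duration doubles d’.

```python
import math

def envelope(fm_hz, phase_rad, dur_s, fs_hz, m=1.0):
    """Sampled sinusoidal envelope 1 + m*sin(2*pi*fm*t + phase)."""
    n = int(round(dur_s * fs_hz))
    w = 2.0 * math.pi * fm_hz
    return [1.0 + m * math.sin(w * i / fs_hz + phase_rad) for i in range(n)]

def ac_correlation(x, y):
    """Normalized correlation of the fluctuating (zero-mean) parts of x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    xa = [v - mx for v in x]
    ya = [v - my for v in y]
    num = sum(a * b for a, b in zip(xa, ya))
    return num / math.sqrt(sum(a * a for a in xa) * sum(b * b for b in ya))

def ideal_dprime(fm_hz, delta_phase, dur_s, fs_hz, noise_sd):
    """Matched-filter observer in white Gaussian noise: d' = ||s1 - s0|| / sd."""
    s0 = envelope(fm_hz, 0.0, dur_s, fs_hz)
    s1 = envelope(fm_hz, delta_phase, dur_s, fs_hz)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s0, s1))) / noise_sd

fs = 1000.0
r = ac_correlation(envelope(10.0, 0.0, 0.4, fs), envelope(10.0, 0.5, 0.4, fs))
ratio = ideal_dprime(10.0, 0.5, 0.8, fs, 1.0) / ideal_dprime(10.0, 0.5, 0.2, fs, 1.0)
print(f"correlation = {r:.4f}, d' ratio for 800 vs 200 ms = {ratio:.3f}")
```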
Fig. 2.
Individual thresholds as a function of stimulus duration in conditions with randomization of cue-modulator phase. Symbol shape indicates AM rate of either 5 (squares), 10 (circles), or 20 (triangles) Hz. Unconnected symbols with a slash indicate that a threshold could not be obtained in that condition. Error bars represent 1 s.d. of the mean threshold.
Cross-interval intensity discrimination at specific stimulus times is another potential cue that reflects processing of a consequence of envelope modulation rather than of the ongoing modulation per se. In this scheme, a listener might, for example, estimate the time of the first peak of the cue modulator and judge in which of the two observation intervals the level differed at that time after stimulus onset. Results from the first data set indicated a substantial effect of randomizing the reference envelope phase. Although phase randomization most likely had its effect by varying, from trial to trial, the times of the cue-modulator peaks relative to onset, it does not eliminate the ability to base decisions on cross-interval intensity discrimination. Randomizing the overall level of each stimulus presentation within a trial can eliminate this cue. In the final set of conditions, level and duration were randomized, both individually and in combination with cue-modulator phase.
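To make the size of this cue concrete, the sketch below computes an idealized envelope-level difference between intervals at a single time point. It assumes instantaneous envelope values with a modulation index of 1 and ignores temporal integration, carrier fluctuations, and peripheral compression; `envelope` and `level_diff_db` are hypothetical helpers. For a sine-phase cue at 10 Hz and a 0.5-rad increment, the difference at the cue's first peak is about 0.55 dB, small compared with the 20-dB range of the level rove used to eliminate the cue.

```python
import math

def envelope(t_s, fm_hz, phase_rad, m=1.0):
    """Sinusoidal envelope 1 + m*sin(2*pi*fm*t + phase) at time t_s."""
    return 1.0 + m * math.sin(2.0 * math.pi * fm_hz * t_s + phase_rad)

def level_diff_db(t_s, fm_hz, cue_phase, delta_phase, m=1.0, floor=1e-6):
    """Envelope level of a cue-phase interval re: the signal interval at t_s, in dB.

    Uses instantaneous envelope values; `floor` avoids log(0) at troughs.
    """
    ref = max(envelope(t_s, fm_hz, cue_phase, m), floor)
    sig = max(envelope(t_s, fm_hz, cue_phase + delta_phase, m), floor)
    return 20.0 * math.log10(ref / sig)

fm = 10.0
t_peak = 1.0 / (4.0 * fm)     # first peak of a sine-phase modulator (25 ms at 10 Hz)
d_db = level_diff_db(t_peak, fm, 0.0, 0.5)
print(f"level difference at the cue peak: {d_db:.2f} dB")
```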
Results for three subjects are shown in Table 1. The ability to perform the task with parameter randomization varied dramatically among subjects. Subject S1 was able to complete all but two of the conditions, whereas for S2, thresholds could not be obtained in 18 of the 30 conditions that involved randomization. As in the data of Fig. 1, randomization affected sensitivity in a way that varied across subjects. For individual listeners, data trends were similar with level and with duration randomization. Although phase or duration randomization makes the cross-interval intensity-discrimination cue less reliable, only level randomization eliminates it; the similar effects of level and duration randomization therefore suggest that this cue was often not used effectively. Concurrent randomization of several stimulus parameters generally made the task more difficult. Results from conditions in which stimulus duration was varied (see Fig. 2) suggest that the effect of duration randomization relates neither to variation in threshold with duration nor to loss of a duration-discrimination cue. Along with restricting the potential involvement of various discrimination cues, randomization introduces uncertainty, which can increase the variance of the discrimination process. Overall, the data suggest that randomization itself affects performance by introducing uncertainty.
Table 1.
Individual subject thresholds in radians as a function of AM rate. Conditions labeled "fixed" had the cue modulator in sine phase; the labels "dB," "ms," and "rad" indicate randomization of level, duration, and starting phase, respectively. An asterisk indicates that a threshold could not be measured adaptively in that condition.
| Subj | Condition | 5 Hz | 10 Hz | 20 Hz | 30 Hz | 40 Hz |
|---|---|---|---|---|---|---|
| S1 | fixed | 0.37 | 0.56 | 0.61 | 0.78 | 0.96 |
| | dB | 0.42 | 0.69 | 0.83 | 0.9 | 1.21 |
| | ms | 0.92 | 0.52 | 0.77 | 0.75 | 0.84 |
| | rad | 1.02 | 1.19 | 1.18 | 1.60 | 1.56 |
| | rad, dB | 1.06 | 1.41 | 1.67 | 2.20 | * |
| | rad, ms | 1.46 | 1.27 | 1.01 | 1.41 | 1.92 |
| | rad, dB, ms | 1.46 | 1.24 | 1.89 | 1.88 | * |
| S2 | fixed | 0.56 | 0.88 | 0.73 | 0.59 | 0.87 |
| | dB | 0.85 | 1.5 | * | * | * |
| | ms | 0.7 | 1.32 | * | * | * |
| | rad | 1.29 | 1.30 | 1.83 | * | * |
| | rad, dB | 1.08 | 1.66 | * | * | * |
| | rad, ms | 1.70 | 2.05 | * | * | * |
| | rad, dB, ms | 1.52 | * | * | * | * |
| S3 | fixed | 0.37 | 0.48 | 0.58 | 1.37 | * |
| | dB | 0.74 | 1.11 | 0.96 | 1.14 | * |
| | ms | 1.11 | 0.98 | 0.85 | 1.16 | 1.33 |
| | rad | 1.06 | 1.55 | 1.44 | * | * |
| | rad, dB | 1.33 | 1.94 | 2.06 | * | * |
| | rad, ms | 1.89 | 1.80 | * | * | * |
| | rad, dB, ms | * | * | * | * | * |
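The counts of unmeasurable conditions quoted in the text can be checked directly against Table 1. The dictionary below simply transcribes the table (None marks an asterisk); `count_missing` is a hypothetical helper that tallies asterisks over the randomized conditions only.

```python
# Table 1 transcribed: thresholds (rad) at 5, 10, 20, 30, 40 Hz; None = asterisk.
TABLE_1 = {
    "S1": {"fixed": [0.37, 0.56, 0.61, 0.78, 0.96],
           "dB": [0.42, 0.69, 0.83, 0.9, 1.21],
           "ms": [0.92, 0.52, 0.77, 0.75, 0.84],
           "rad": [1.02, 1.19, 1.18, 1.60, 1.56],
           "rad, dB": [1.06, 1.41, 1.67, 2.20, None],
           "rad, ms": [1.46, 1.27, 1.01, 1.41, 1.92],
           "rad, dB, ms": [1.46, 1.24, 1.89, 1.88, None]},
    "S2": {"fixed": [0.56, 0.88, 0.73, 0.59, 0.87],
           "dB": [0.85, 1.5, None, None, None],
           "ms": [0.7, 1.32, None, None, None],
           "rad": [1.29, 1.30, 1.83, None, None],
           "rad, dB": [1.08, 1.66, None, None, None],
           "rad, ms": [1.70, 2.05, None, None, None],
           "rad, dB, ms": [1.52, None, None, None, None]},
    "S3": {"fixed": [0.37, 0.48, 0.58, 1.37, None],
           "dB": [0.74, 1.11, 0.96, 1.14, None],
           "ms": [1.11, 0.98, 0.85, 1.16, 1.33],
           "rad": [1.06, 1.55, 1.44, None, None],
           "rad, dB": [1.33, 1.94, 2.06, None, None],
           "rad, ms": [1.89, 1.80, None, None, None],
           "rad, dB, ms": [None] * 5},
}

def count_missing(subject):
    """(number of asterisks, number of cells) over the randomized conditions."""
    cells = [v for cond, row in TABLE_1[subject].items() if cond != "fixed" for v in row]
    return sum(v is None for v in cells), len(cells)

for subj in TABLE_1:
    print(subj, count_missing(subj))
```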
The absence of an effect of stimulus duration was interpreted above as inconsistent with use of a decision statistic based on cross correlation. Alternatively, listeners may, at least in some conditions, perform a limited pattern matching of distinctive stimulus features. According to subject reports, one cue that varies with phase is irregularity of the envelope pattern near stimulus onset. With periodic modulation, the time between envelope peaks does not depend on the starting modulator phase, but the time from stimulus onset to the first envelope peak does. This variation may be perceived as altering the regularity of the envelope pattern. Perception of envelope rhythm requires resolution of the individual envelope peaks, that is, a percept of envelope fluctuation rather than of roughness. The low fluctuation strength associated with AM rates above 20 Hz (see Fastl, 1983) is consistent with the involvement of a rhythm cue in the current results. It is unclear whether this cue would persist at AM rates of 30 Hz and above, at which some subjects were still able to perform the phase-discrimination task.
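The dependence of onset-to-first-peak time on starting phase, with a constant inter-peak interval, follows directly from the modulator equation. A minimal sketch, assuming an envelope of the form 1 + sin(2πf_m t + φ) with t measured from stimulus onset and ramps ignored; `peak_times_s` is an illustrative name.

```python
import math

def peak_times_s(fm_hz, phase_rad, dur_s):
    """Times of envelope peaks of 1 + sin(2*pi*fm*t + phase) within [0, dur)."""
    period = 1.0 / fm_hz
    # The sine peaks where its argument equals pi/2 (mod 2*pi).
    first = ((0.5 * math.pi - phase_rad) % (2.0 * math.pi)) / (2.0 * math.pi * fm_hz)
    times, k = [], 0
    while first + k * period < dur_s:
        times.append(first + k * period)
        k += 1
    return times

for phase in (0.0, 0.5 * math.pi, math.pi, 1.5 * math.pi):
    t = peak_times_s(10.0, phase, 0.4)
    gaps = [b - a for a, b in zip(t, t[1:])]
    print(f"phase {phase:.2f} rad: first peak at {1000 * t[0]:.1f} ms, "
          f"inter-peak interval {1000 * min(gaps):.1f}-{1000 * max(gaps):.1f} ms")
```

At 10 Hz, the first peak moves between 0 and 75 ms after onset across these phases, while successive peaks remain 100 ms apart.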
In the modulation domain, a change in the starting phase of the modulator affects the short-term amplitude spectrum. At stimulus onset, the effect appears as amplitude splatter, which conceivably is discriminable with frequency-selective modulation processing. Sheft and Yost (2004) demonstrated that variation in the envelope-phase spectrum can be discriminated for durations as brief as roughly 16 ms (shorter durations were not considered). To evaluate the potential utility of short-term spectral cues at stimulus onset, simulations were run using the ten-channel MF-model parameters of Sheft and Yost (2004). To estimate best-case performance, model predictions ignored potential cross-MF-channel masking (see Sheft and Yost, 2004), and the decision statistic was based solely on the change in channel-output power calculated over a 10-ms window at stimulus onset. Without internal noise added to the simulations, model performance is limited solely by the intrinsic envelope fluctuations of the independent carrier samples. In conditions with the cue modulator in sine phase, model performance increased with AM rate, in contrast to the low-pass trend of subject performance. At 40 Hz, the simulated d’ value was 4.1 with a phase increment of π radians. However, model performance dropped to a d’ value of no greater than 0.3 when the internal noise level needed to account for the discrimination results of Sheft and Yost (2004) was added. Randomization of the cue-modulator starting phase also adversely affected the simulations, with model performance never significantly exceeding chance. Overall, these results argue against the involvement of short-term spectral cues in the current discrimination conditions.
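The drop from d’ = 4.1 to no more than 0.3 can be related to internal noise under a simple assumption that the text does not state: an equal-variance Gaussian decision variable to which independent Gaussian internal noise is added. Under that assumption, the internal-noise standard deviation must be at least about 13.6 times the external (carrier-driven) standard deviation. The helpers are hypothetical and are not the simulation code used here.

```python
import math

def dprime_with_internal_noise(d_external, noise_ratio):
    """d' after adding independent Gaussian internal noise to the decision variable.

    noise_ratio = sigma_internal / sigma_external (equal-variance Gaussian assumption).
    """
    return d_external / math.sqrt(1.0 + noise_ratio ** 2)

def implied_noise_ratio(d_without, d_with):
    """Internal-to-external noise ratio that reduces d_without to d_with."""
    return math.sqrt((d_without / d_with) ** 2 - 1.0)

k = implied_noise_ratio(4.1, 0.3)
print(f"implied sigma_int / sigma_ext = {k:.1f}")
print(f"check: d' = {dprime_with_internal_noise(4.1, k):.2f}")
```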
Along with cross-MF-channel masking of short-term spectral-amplitude differences, another likely source of masking is temporal, that is, masking of onset splatter by subsequent stimulation. Because of the intrinsic envelope fluctuations of the wideband-noise carriers, wideband modulation is present throughout the stimulus and is not solely a result of onset splatter. Temporal masking would therefore most likely degrade the information derived from a 10-ms window at stimulus onset in the simulations described above. A second effect of intrinsic carrier fluctuations is the introduction of second-order modulation when the carrier is sinusoidally modulated. The higher MF channels show significant responses to the intrinsic carrier fluctuations, and the Hilbert envelope of these responses follows the temporal pattern of the lower-rate sinusoidal modulator; it is thus a function of modulator phase. As with the onset cues considered above, adding internal noise to the model simulations eliminated the efficacy of this potential discrimination cue.
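A crude illustration of this second-order modulation is sketched below. It assumes white Gaussian noise sampled at 8 kHz and uses the mean power in 5-ms windows as a stand-in for the response of a high-rate MF channel to intrinsic carrier fluctuations; it is not the MF model of Sheft and Yost (2004), and the helper names are hypothetical. The short-term power tracks the squared modulator at its actual phase and is anticorrelated with the pattern at the opposite phase, so it carries modulator-phase information.

```python
import math
import random

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def window_power(x, win):
    """Mean power in consecutive non-overlapping windows of `win` samples."""
    return [sum(v * v for v in x[i:i + win]) / win
            for i in range(0, len(x) - win + 1, win)]

def second_order_demo(fm_hz=10.0, phase=0.0, dur_s=0.4, fs_hz=8000,
                      win_s=0.005, seed=0):
    rng = random.Random(seed)
    n, win = int(round(dur_s * fs_hz)), int(round(win_s * fs_hz))
    w = 2.0 * math.pi * fm_hz
    env = [1.0 + math.sin(w * i / fs_hz + phase) for i in range(n)]
    anti = [1.0 + math.sin(w * i / fs_hz + phase + math.pi) for i in range(n)]
    x = [e * rng.gauss(0.0, 1.0) for e in env]    # SAM noise with m = 1
    p = window_power(x, win)                      # power of the fast fluctuations
    # Correlate with the expected power pattern at the true and opposite phase.
    return pearson(p, window_power(env, win)), pearson(p, window_power(anti, win))

r_same, r_opposite = second_order_demo()
print(f"r (true phase) = {r_same:.2f}, r (opposite phase) = {r_opposite:.2f}")
```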
From a modeling standpoint, the key issue raised by the present work is the extent to which MF models should retain envelope-phase information. Regardless of the specific discrimination cue(s) used by listeners, the results bear on this issue as long as the decision statistic is based on MF output. MF models incorporate a low filter Q and shallow filter-skirt slopes. Because the filters respond significantly to stimuli well removed from their center frequency (CF), envelope-phase discrimination would be expected at AM rates that exceed the CFs of the MF channels that retain temporal response features. The current results indicate a need to raise the upper cutoff for preserving phase information in MF models. Using the model parameters of Sheft and Yost (2004), with phase information preserved in one additional MF channel centered at 18 Hz, envelope-phase discrimination can be simulated for AM rates of up to 35 Hz. Preserving phase information in a second additional channel is required to model the best performance in the current data set, that of subject S1. In either case, intersubject variability is assumed to reflect suboptimal processing by some listeners.
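The consequence of a low Q can be seen in the magnitude response of a generic second-order bandpass filter. The Q of 1 used below is a hypothetical value chosen only for illustration; the actual filter parameters of Sheft and Yost (2004) are not given here, and `bandpass_gain` is an illustrative name. With a CF of 18 Hz, a 35-Hz modulator is attenuated by only about 4.8 dB, so a phase-preserving channel at that CF would still pass it substantially.

```python
import math

def bandpass_gain(f_hz, fc_hz, q):
    """Magnitude of a second-order (resonant) bandpass filter, 1 at the CF."""
    return 1.0 / math.sqrt(1.0 + (q * (f_hz / fc_hz - fc_hz / f_hz)) ** 2)

for rate in (10.0, 18.0, 35.0, 50.0):
    g = bandpass_gain(rate, 18.0, 1.0)    # Q = 1: hypothetical "low Q" value
    print(f"{rate:4.0f} Hz: {20 * math.log10(g):6.1f} dB re: CF")
```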
In contrast to the present results, both lateralization and synchrony-detection procedures demonstrate an ability to use envelope-phase information at moderate to high AM rates (e.g., Yost and Sheft, 1989; Bernstein and Trahiotis, 2002). Conceivably, the envelope processing required by these tasks may precede or operate in parallel with an MF stage. Alternatively, restrictions on MF models may be task dependent. By analogy, although models of auditory masking are often based on signal and masker power levels, they are not meant to imply a general loss of fine-structure phase information, which would leave lateralization ability unexplained. A similar task dependency may be required when considering AM processing.
Acknowledgment
This research was supported by NIDCD grant numbers DC005423 and DC00625.
References and links
- Abel SM. Duration discrimination of noise and tone bursts. J. Acoust. Soc. Am. 1972;51:1219–1223. doi: 10.1121/1.1912963.
- Bernstein LR, Trahiotis C. Enhancing sensitivity to interaural delays at high frequencies by using ‘transposed stimuli’. J. Acoust. Soc. Am. 2002;112:1026–1036. doi: 10.1121/1.1497620.
- Dau T. Modeling Auditory Processing of Amplitude Modulation. BIS; Universität Oldenburg: 1996.
- Dau T, Kollmeier B, Kohlrausch A. Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. J. Acoust. Soc. Am. 1997;102:2892–2905. doi: 10.1121/1.420344.
- Dau T, Verhey J, Kohlrausch A. Intrinsic envelope fluctuations and modulation-detection thresholds for narrow-band noise carriers. J. Acoust. Soc. Am. 1999;106:2752–2760. doi: 10.1121/1.428103.
- Ewert SD, Dau T. External and internal limitations in amplitude-modulation processing. J. Acoust. Soc. Am. 2004;116:478–490. doi: 10.1121/1.1737399.
- Fastl H. Fluctuation strength of modulated tones and broadband noise. In: Klinke R, Hartmann R, editors. Hearing—Physiological Bases and Psychophysics. Springer; Berlin: 1983.
- Greenberg S, Arai T. What are the essential cues for understanding spoken language? IEICE Trans. Inf. Syst. 2004;E87-D:1059–1070.
- Levitt H. Transformed up-down methods in psychoacoustics. J. Acoust. Soc. Am. 1971;49:467–477.
- Sheft S, Yost WA. Minimum integration times for processing of amplitude modulation. In: Pressnitzer D, de Cheveigné A, McAdams S, Collet L, editors. Auditory Signal Processing: Physiology, Psychoacoustics, and Models. Springer; New York: 2004.
- Verhey JL, Ewert SD, Dau T. Modulation masking produced by complex tone modulators. J. Acoust. Soc. Am. 2003;114:2135–2146. doi: 10.1121/1.1612489.
- Viemeister NF, Stellmack MA, Byrne AJ. The role of temporal structure in envelope processing. In: Pressnitzer D, de Cheveigné A, McAdams S, Collet L, editors. Auditory Signal Processing: Physiology, Psychoacoustics, and Models. Springer; New York: 2004.
- Yost WA, Sheft S. Across-critical-band processing of amplitude-modulated tones. J. Acoust. Soc. Am. 1989;85:848–857. doi: 10.1121/1.397556.


