Auditory stream segregation using amplitude modulated bandpass noise

Yingjiu Nie; Peggy B Nelson

doi:10.3389/fpsyg.2015.01151

2015 Aug 7;6:1151.

PMCID: PMC4528102 PMID: 26300831

Abstract

The purpose of this study was to investigate the roles of spectral overlap and amplitude modulation (AM) rate in stream segregation of noise signals, as well as to test the build-up effect based on these two cues. Segregation ability was evaluated using an objective paradigm with listeners' attention focused on stream segregation. Stimulus sequences consisted of two interleaved sets of bandpass noise bursts (A and B bursts). The A and B bursts differed in spectrum, AM-rate, or both. The amount of the difference between the two sets of noise bursts was varied. Long and short sequences were studied to investigate the build-up effect for segregation based on spectral and AM-rate differences. Results showed the following: (1) Stream segregation ability increased with greater spectral separation. (2) Larger AM-rate separations were associated with stronger segregation abilities. (3) Spectral separation was found to elicit the build-up effect for the range of spectral differences assessed in the current study. (4) AM-rate separation interacted with spectral separation, suggesting an additive effect of spectral separation and AM-rate separation on segregation build-up. The findings suggest that, when normal-hearing listeners direct their attention towards segregation, they are able to segregate auditory streams based on reduced spectral contrast cues that vary by the amount of spectral overlap. Further, regardless of the spectral separation, they are able to use AM-rate difference as a secondary/weaker cue. Based on the spectral differences, listeners can segregate auditory streams better as the listening duration is prolonged; that is, sparse spectral cues elicit build-up of segregation. However, AM-rate differences only appear to elicit build-up when combined with spectral difference cues.

Keywords: amplitude modulation, auditory scene analysis, auditory stream segregation, auditory streaming, bandpass noise, build-up segregation, cochlear implant simulations, sequential grouping

Introduction

Auditory stream segregation (also referred to as auditory streaming) occurs naturally in daily life, such as when listening to a talker at a party or when following a melody played by an instrument in an orchestra. Listeners with normal hearing (NH) interpret a mixture of ongoing sounds in such a way that sounds from different sources are allocated to individual sound generators that are perceptually concurrent. Both spectral and temporal differences have been documented as cues that can elicit stream segregation in NH listeners. Studies have employed both pure tones (Bregman and Campbell, 1971; Warren and Obusek, 1972; van Noorden, 1975; Dannenbring and Bregman, 1976a) and bandpass noises (Dannenbring and Bregman, 1976b; Bregman et al., 1999; Nie et al., 2014) to investigate the effect of frequency differences on stream segregation. Bregman et al. (1999) found that interleaved narrowband noises with different amounts of spectral overlap could be perceived as from different auditory streams. Other research has documented that differences in temporal envelopes (Singh and Bregman, 1997; Vliegen et al., 1999; Vliegen and Oxenham, 1999; Grimault et al., 2000, 2001; Roberts et al., 2002) and amplitude modulation rate (Grimault et al., 2002) can induce stream segregation without the presence of spectral cues.

Conflicting findings have been reported on whether cochlear implant users are able to form auditory streams based on auditory signals they perceive with presumably degraded spectral contrasts but well-preserved temporal information. The inconsistency could be attributed to numerous differences among the studies. For example, spectral cue based (Cooper and Roberts, 2009) vs. amplitude modulation based stream segregation (Hong and Turner, 2006) has been evaluated; strength of segregation was measured using self-reported perception (Chatterjee et al., 2006; Marozeau et al., 2013; Böckmann-Barthel et al., 2014) vs. performance-based tasks (Hong and Turner, 2006, 2009; Cooper and Roberts, 2007); tasks with performance promoted by segregation (Hong and Turner, 2009) vs. tasks with performance hindered by segregation (Cooper and Roberts, 2007, 2009, Experiment 1) were used; stimuli involving acoustical signals (e.g., Hong and Turner, 2006) vs. electrical signals (e.g., Chatterjee et al., 2006) were presented to the listeners. Large differences among methodologies make conclusions difficult to interpret.

Even less understood in CI users is one of the key characteristics of stream segregation: the build-up effect, which refers to the formation of auditory streams over time following the onset of the mixture of the sound sequences (Bregman, 1990). Chatterjee et al. (2006) and Cooper and Roberts (2009) failed to observe the build-up of streaming in CI users based on the electrode distance equivalent to the spectral differences between stimulus sequences. Cooper and Roberts concluded, based on the lack of build-up, that CI users are unable to segregate auditory streams. However, emerging research has suggested that the build-up effect may not be observed in NH listeners (Micheyl and Oxenham, 2010b; Deike et al., 2012; Denham et al., 2013). Böckmann-Barthel et al. (2014) further reported a comparable time course of stream segregation in NH listeners and CI users: for both groups, build-up was absent for tone sequences that differed adequately in frequency and present when the frequency difference became ambiguous for stream segregation.

The current study aimed to investigate stream segregation in NH listeners when their listening condition resembled what CI users would commonly experience with degraded auditory cues. Sequences of amplitude modulated bandpass noise used in this study contained two critical cues for CI users—the degraded frequency-difference cue and the supposedly intact AM-rate cue. Unlike previous works (Vliegen and Oxenham, 1999; Hong and Turner, 2009) that varied the amount of inter-stream difference in one cue while controlling for the difference in the other cue, we examined conditions with both inter-stream spectral contrast and amplitude modulation (AM) rate contrast, individually and together. The dual-varying contrasts were studied as a simplistic representation of the co-existing spectral contrast and temporal envelope contrast available to CI users when the stimulus sequences were acoustic pure tones (Böckmann-Barthel et al., 2014).

A performance-based stimulus paradigm (also referred to as an “objective” paradigm) was used to assess stream segregation performance in a listening task. In contrast to a “subjective” paradigm, in which stream segregation is assessed based on listeners' reports of perceiving one or two streams, an “objective” paradigm is less affected by listener bias, such as listeners having different perceptual criteria for reporting one or two streams. Tasks to identify a violation of temporal regularity have been developed for the performance-based paradigm in different studies (Roberts et al., 2002; Micheyl and Oxenham, 2010a). This study employed a segregation-facilitated paradigm designed so that, for better performance, listeners presumably focused attention on segregating auditory streams in order to identify a temporal violation in the stimulus sequences of noise bursts. Although the direction of attention toward segregation may not be under the listener's control, at least not completely (as suggested by Thompson et al., 2011), it is in line with top-down processing: CI users frequently need mental effort to segregate speech from background noise because of the reduced robustness of auditory cues.

The build-up of stream segregation for bandpass noises, based on spectral and/or AM-rate separations, was also explored in this work. Frequency differences have been confirmed to be a cue for build-up streaming in NH individuals when they listen to pure tone sequences (Anstis and Saida, 1985; Cusack et al., 2004; Thompson et al., 2011). In this study, we investigated whether listeners show build-up of stream segregation when listening to bandpass noises with a systematically varied amount of spectral overlap, which reduced the frequency contrast between the potential streams to resemble the spectral interaction of signals delivered via a CI electrode array. It is hypothesized, but not well established, that the temporal envelope can also be a cue for segregation build-up. The inconsistent findings on build-up in CI users (as reviewed earlier), together with the lack of research on temporal-envelope based build-up, warrant further research in this area. Understanding how NH listeners use degraded spectral cues coupled with temporal-envelope cues to form auditory streams and build up auditory stream segregation with attention directed to segregation may help lay a basis for further understanding of CI users.

Experiment 1

Materials and methods

Participants

Ten adult listeners (five male) between 19 and 32 years of age participated in the study. Their hearing thresholds were no greater than 20 dB HL at audiometric frequencies of 250, 500, 1000, 1500, 2000, 3000, 4000, 6000, and 8000 Hz on the right side. The procedures for conducting the experiments on human participants were approved by the Institutional Review Boards at the University of Minnesota.

Apparatus

For all experiments, the stimuli were processed live through a SoundMAX Integrated Digital Audio sound card installed in a Dell Pentium 4 computer. Listeners performed the task in a double-walled sound-attenuated booth. Stimuli were generated using a MATLAB script at a sampling rate of 22,050 Hz. Fourth-order Butterworth filters were designed and applied to the stimuli in MATLAB.

Stimulus sequences

Twelve-pair condition (long sequences eliciting build-up)

Twelve repeated pairs of A and B noise bursts were generated as described in our previous work (Nie et al., 2014) with modifications and additional conditions, where A and B bursts were either broadband noise or bandpass noise carrying sinusoidal AM (with 100% modulation depth and fixed phase). They differed either in the center frequency of the noise band, or in the AM-rate, or both.

Each A or B burst was generated with a different sample of noise. The duration of an A or B burst was 80 ms including 8-ms rise/fall ramps. The burst repetition time (BRT), defined as the interval between the onsets of two consecutive bursts (i.e., the onset of an A burst and that of the B burst preceding or following it, or the onset of a B burst and that of the A burst preceding or following it), was 130 ms. The A bursts (excluding the initial one) were jittered from their nominal temporal locations by an amount drawn randomly on each presentation from a rectangular distribution ranging from 0 to 40 ms. The amount of jitter of A bursts was selected based on a pilot study, which demonstrated adequate disruption to following the rhythm of A-B pairs. The rationale for presenting B bursts steadily was that B bursts consisted of a passband with a lower frequency range (from 200 to 1426 Hz), which may provide the major information for speech understanding. Bashford and Warren (1987) found that NH listeners scored 98% or higher when listening to words and sentences that were lowpass filtered at a cutoff frequency of 1100 Hz. In addition, Whitmal and DeRoy (2012) reported that, for NH listeners, frequencies below 1500 Hz became more important when natural speech was processed through a vocoder. Therefore, it was of interest to investigate listeners' ability to follow the stream in this lower frequency range considering its importance for speech perception (see the Section on Procedure for details about the task).

Two types of stimulus sequences were adopted differing in the placement of the last B burst as illustrated in Figure 1. In a delayed sequence, the last B burst was delayed from its nominal temporal position by 30 ms, whereas, in a no-delay sequence, the last B burst was advanced by an amount drawn randomly on each presentation from a rectangular distribution ranging from 0 to 10 ms. The total duration was 3.1 s for the delayed sequences and 3.06–3.07 s for the no-delay sequences.

**Figure 1. Illustration of the stimulus paradigm (modified from Nie et al., 2014)**. **(A,B)** show the delayed sequences, with the dark solid lines marking the duration of the delay for the last B burst. **(C,D)** show the no-delay sequences. **(A,C)** depict the integrated perception and **(B,D)** depict the segregated perception. The spectral conditions for A and B bursts are A₆₇₈ and B₁₂₃₄, respectively. The AM rates shown on the A bursts and B bursts are 25 and 300 Hz, respectively.
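The timing rules above can be checked with a short sketch. It is not the authors' MATLAB script: the function names are hypothetical, the sequence is assumed to start with an A burst, and the A-burst jitter is assumed to push a burst later than its nominal position (the text gives only the 0 to 40 ms range). Under those assumptions the sketch reproduces the stated total durations of 3.1 s (delayed) and 3.06 to 3.07 s (no-delay).

```python
import random

BURST_MS = 80   # burst duration, including the 8-ms ramps
BRT_MS = 130    # onset-to-onset interval between consecutive bursts
N_PAIRS = 12    # long (12-pair) sequences


def sequence_onsets(delayed, n_pairs=N_PAIRS, rng=random):
    """Return (a_onsets, b_onsets) in ms for one interleaved A-B sequence."""
    a_onsets, b_onsets = [], []
    for pair in range(n_pairs):
        a_nominal = 2 * pair * BRT_MS      # assumption: the sequence starts with A
        b_nominal = a_nominal + BRT_MS
        # A bursts after the first are jittered by 0-40 ms
        # (assumption: the jitter delays the burst rather than advancing it).
        jitter = rng.uniform(0, 40) if pair > 0 else 0.0
        a_onsets.append(a_nominal + jitter)
        b_onsets.append(b_nominal)         # B bursts are presented steadily
    # Last B burst: delayed by 30 ms, or advanced by a random 0-10 ms.
    b_onsets[-1] += 30.0 if delayed else -rng.uniform(0, 10)
    return a_onsets, b_onsets


def total_duration_s(b_onsets):
    """Sequence duration in seconds: offset of the final B burst."""
    return (b_onsets[-1] + BURST_MS) / 1000


_, b_delayed = sequence_onsets(delayed=True)
_, b_no_delay = sequence_onsets(delayed=False)
print(total_duration_s(b_delayed), total_duration_s(b_no_delay))
```

With a jitter of at most 40 ms and an 80-ms burst, a jittered A burst ends no later than 120 ms after its nominal onset, so under these assumptions it never overlaps the following B burst at 130 ms.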

Independent Gaussian noise was generated for each broadband noise (BBN) burst. To obtain the bandpass noises, the independent Gaussian noise for each noise burst was filtered at cutoff frequencies adopted from the vocoder bands in Fu and Nogaki (2005). Table 1 shows the cutoff frequencies with a resolution of eight bands. The bands were numbered from one to eight, corresponding to bands with center frequencies from low to high. The B band was obtained by filtering a Gaussian noise at the low cutoff frequency of band 1 and the high cutoff frequency of band 4; hence the B band (B₁₂₃₄) covered bands 1 through 4 in Fu and Nogaki. With the same method, three higher bands (e.g., bands 6, 7, and 8) formed another bandpass noise, which was presented as one of the A band conditions and coded as Axxx, where xxx denotes the band numbers (e.g., A₆₇₈). While the spectrum of the B band was constant (i.e., encompassing the lowest four vocoder bands), the spectra of the A bands covered four conditions in relation to the spectrum of the B band:

Table 1.

Cutoff frequencies of the A and B bands at four spectral conditions and the relationship of the A and B bands with the eight vocoder bands from Fu and Nogaki (2005).

Spectral condition	A bursts	B bursts
Seventy-seven-percent-overlap	A₂₃₄: Bands 2, 3, and 4	B₁₂₃₄: Bands 1, 2, 3, and 4
Forty-one-percent-overlap	A₃₄₅: Bands 3, 4, and 5	B₁₂₃₄: Bands 1, 2, 3, and 4
Seventeen-percent-overlap	A₄₅₆: Bands 4, 5, and 6	B₁₂₃₄: Bands 1, 2, 3, and 4
No-overlap	A₆₇₈: Bands 6, 7, and 8	B₁₂₃₄: Bands 1, 2, 3, and 4

Vocoder band (8-band resolution)	1	2	3	4	5	6	7	8
Low cutoff frequency, Hz (ERB)	200 (5.84)	359 (8.77)	591 (11.86)	931 (15.08)	1426 (18.39)	2149 (21.77)	3205 (25.17)	4748 (28.62)
High cutoff frequency, Hz (ERB)	359	591	931	1426	2149	3205	4748	7000 (32.09)

The cutoff boundaries of the vocoder bands in the ERB scale are shown in parentheses.

First, no-overlap—A₆₇₈ B₁₂₃₄ (A band consisted of bands 6, 7, and 8 as in Fu and Nogaki).

Second, seventeen percent (17%) overlap—A₄₅₆B₁₂₃₄ (A band consisted of bands 4, 5, and 6) with 17.1% overlap in the equivalent rectangular bandwidth (ERB) scale (Glasberg and Moore, 1990), derived from Equation 1.

$$\text{Overlap} = \frac{\text{high cutoff boundary of B band} - \text{low cutoff boundary of A band}}{\text{high cutoff boundary of A band} - \text{low cutoff boundary of B band}} \times 100\% \tag{1}$$

where the cutoff boundaries were calculated in the ERB scale (Table 1), i.e., $\frac{18.39 - 15.08}{25.17 - 5.84} \times 100\% = 17.1\%$.

Third, forty-one percent (41%) overlap—A₃₄₅B₁₂₃₄ (A band consisted of bands 3, 4, and 5) with 41.0% overlap in the ERB scale.

Fourth, one hundred percent (100%) overlap—A_BBNB_BBN (both A and B bands consisted of broadband noise).

It should be noted that the slope of the bandpass filters was set at 12 dB/octave to resemble the shallow filter slope in CI users (Anderson et al., 2011). In consequence, the actual band overlap was larger than that calculated using Equation 1.
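The nominal overlap percentages from Equation 1 can be reproduced with the sketch below. It uses the ERB-number formula of Glasberg and Moore (1990), 21.4 log10(4.37F + 1) with F in kHz, which matches the parenthesized values in Table 1. The function names are hypothetical, and clamping non-overlapping bands to 0% is an assumption, since Equation 1 would otherwise return a negative value for A₆₇₈. Filter slopes are ignored, so these are the nominal values, not the larger actual overlap noted above.

```python
import math

# Band edges (Hz) of the eight vocoder bands listed in Table 1.
EDGES_HZ = [200, 359, 591, 931, 1426, 2149, 3205, 4748, 7000]


def erb_number(freq_hz):
    """ERB-number of a frequency (Glasberg and Moore, 1990)."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000 + 1)


def band_edges_erb(first_band, last_band):
    """Low and high cutoff (ERB scale) of a noise spanning bands first..last."""
    low_hz = EDGES_HZ[first_band - 1]
    high_hz = EDGES_HZ[last_band]
    return erb_number(low_hz), erb_number(high_hz)


def overlap_percent(a_bands, b_bands=(1, 4)):
    """Equation 1: nominal A/B overlap in percent on the ERB scale."""
    a_low, a_high = band_edges_erb(*a_bands)
    b_low, b_high = band_edges_erb(*b_bands)
    overlap = (b_high - a_low) / (a_high - b_low) * 100
    return max(0.0, overlap)   # assumption: disjoint bands count as 0%


for label, bands in [("A234", (2, 4)), ("A345", (3, 5)),
                     ("A456", (4, 6)), ("A678", (6, 8))]:
    print(label, round(overlap_percent(bands), 1))
```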

Four comparisons of AM-rates were applied between A and B bands, as follows. First, unmodulated (AM0-0) with no AM applied to either A or B band; second, no separation in modulation rate (AM25-25) with both A and B bands modulated at a rate of 25 Hz; third, modulation rates 2 octaves apart (AM25-100) with A and B bands modulated at rates of 25 and 100 Hz, respectively; and fourth, modulation rates 3.58 octaves apart (AM25-300) with A and B bands modulated at rates of 25 and 300 Hz, respectively.
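As an illustration of the AM conditions, the sketch below builds one wideband Gaussian noise burst with 100% sinusoidal AM and computes the octave separation between two AM rates. It is a minimal sketch, not the authors' MATLAB code: the function names, the sine starting phase, and the raised-cosine ramp shape are assumptions, and the bandpass filtering step is omitted.

```python
import math
import random

FS_HZ = 22050  # sampling rate stated in the Apparatus section


def am_noise_burst(am_rate_hz, dur_ms=80, ramp_ms=8, fs=FS_HZ, rng=random):
    """Gaussian noise burst with 100%-depth sinusoidal AM (0 = unmodulated)."""
    n = round(fs * dur_ms / 1000)
    n_ramp = round(fs * ramp_ms / 1000)
    burst = []
    for i in range(n):
        sample = rng.gauss(0.0, 1.0)
        if am_rate_hz > 0:
            # 100% modulation depth; the starting phase is an assumption.
            sample *= 1 + math.sin(2 * math.pi * am_rate_hz * i / fs)
        # Onset and offset ramps (raised-cosine shape is an assumption).
        if i < n_ramp:
            sample *= 0.5 * (1 - math.cos(math.pi * i / n_ramp))
        elif i >= n - n_ramp:
            sample *= 0.5 * (1 - math.cos(math.pi * (n - 1 - i) / n_ramp))
        burst.append(sample)
    return burst


def octave_separation(rate_a_hz, rate_b_hz):
    """Distance between two AM rates in octaves."""
    return abs(math.log2(rate_b_hz / rate_a_hz))


print(octave_separation(25, 100), round(octave_separation(25, 300), 2))
```

Note that an 80-ms burst contains only two full modulation cycles at 25 Hz, compared with 8 cycles at 100 Hz and 24 cycles at 300 Hz.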

Three-pair sequences (short sequences providing baseline for evaluating the build-up effect)

Three pairs of A and B bursts (3-pair) were presented for three spectral separations including 100%-overlap (i.e., A_BBNB_BBN), 41%-overlap (i.e., A₃₄₅B₁₂₃₄), and no-overlap (i.e., A₆₇₈B₁₂₃₄). The temporal settings for A and B bursts in a 3-pair sequence were the same as those in a 12-pair sequence with only the first, second, and the last stimulus pairs of a 12-pair sequence preserved.

Procedure

In a pilot study, it was observed that the attentional effort required to perform the task was too high for listeners to maintain concentration for a two interval approach due to the length of a stimulus sequence in addition to the substantially reduced cues. Therefore, d′ was measured through a single interval yes/no approach. In each interval, either a delayed sequence or a no-delay sequence was presented.

The stimulus sequences were presented monaurally to the right ear through a TDH 49 headphone at 70 dB SPL for each noise burst calibrated based on the root-mean-square value. The task was to determine whether the delayed sequence or the no-delay sequence was presented in each trial. Two response options were given in two graphic boxes on a computer screen, one showing “1 Longer” for the “delayed” option and the other one showing “2 Shorter” for the “no-delay” option. The participants pressed on the keyboard number 1 (for the “delayed” choice) or number 2 (for the “no-delay” choice). Feedback was provided following each response by illuminating the box corresponding to the correct answer on the screen. Participants were allowed to take as much time as they needed to make the selection for each trial.

This task directed listeners to focus attention on segregating two streams in order to perform better. To detect the delayed last B burst, listeners had to discriminate the prolonged gap between the last two B bursts from the constant B-to-B gaps of the previous 11 B bursts (see Figure 1, panels B and D, for the difference between the delayed and no-delay sequences). The jittered timing of A bursts introduced uncertainty into the A-to-B gaps and thus made them an ineffective cue for identifying the delayed B burst. Hence, listeners had to follow B bursts and ignore A bursts in order to judge the gaps between B bursts. In other words, for better performance, listeners presumably made a mental effort to segregate B bursts from A bursts to form a perceptual stream of B bursts. To sum up, the better a listener could segregate the B stream from the A stream, the better he/she could detect the delayed last B burst.

Four blocks of 70 trials were run for each condition with a 50% chance of occurrence for either the signal sequence (i.e., delayed sequence) or the reference sequence (i.e., no-delay sequence). The first 10 trials were designed to facilitate the listeners forming and maintaining stream segregation. From the last 60 trials, the hit rate and false alarm rate were calculated, which were used to compute a d′. Ceiling performance (i.e., 100% for hit rate and 0% for false alarm rate) was reached in 7% of the total number of blocks in all listeners and was corrected using Equations 2 and 3 (Macmillan and Creelman, 2005).

$$\text{Hit Rate} = \left(1 - \frac{1}{2S}\right) \times 100\% \tag{2}$$

$$\text{False Alarm Rate} = \frac{1}{2N} \times 100\% \tag{3}$$

where S and N represent the total possible numbers of trials presented for signal and reference sequences, respectively.
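A minimal sketch of the d′ computation with the ceiling correction of Equations 2 and 3 follows. The function names and the example counts are hypothetical. The text describes the correction only for ceiling performance; applying the mirrored correction to a hit rate of 0 or a false alarm rate of 1 is an assumption added here so that the z-transform is always defined.

```python
from statistics import NormalDist


def corrected_rates(hits, n_signal, false_alarms, n_reference):
    """Hit and false alarm rates as proportions, adjusted at the extremes."""
    hit_rate = hits / n_signal
    fa_rate = false_alarms / n_reference
    if hit_rate == 1:
        hit_rate = 1 - 1 / (2 * n_signal)       # Equation 2
    elif hit_rate == 0:
        hit_rate = 1 / (2 * n_signal)           # assumption: mirrored correction
    if fa_rate == 0:
        fa_rate = 1 / (2 * n_reference)         # Equation 3
    elif fa_rate == 1:
        fa_rate = 1 - 1 / (2 * n_reference)     # assumption: mirrored correction
    return hit_rate, fa_rate


def d_prime(hits, n_signal, false_alarms, n_reference):
    """d' = z(hit rate) - z(false alarm rate) for a yes/no task."""
    hit_rate, fa_rate = corrected_rates(hits, n_signal,
                                        false_alarms, n_reference)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical block: 30 delayed and 30 no-delay trials, perfect performance.
print(round(d_prime(30, 30, 0, 30), 2))
```

With 30 signal and 30 reference trials, the correction caps d′ for a perfect block at roughly 4.26, so the reported block-level d′-values have an upper bound that depends on how many trials of each type were presented.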

Following an initial training session (see Familiarization for detail), participants were presented stimulus sequences in a random order of the spectral separation/duration condition. Four AM-rate separations were randomly nested under each spectral/duration condition. Participants completed their sessions across multiple days, one or two 1.5-h sessions each day. They were encouraged to take a 5-min break after 2 or 3 blocks. Due to time constraints, six participants participated in the 100%-overlap/3-pair conditions; among these six participants, four participated in the no-overlap/3-pair conditions. All 10 participants participated in the rest of the conditions.

Familiarization

Training session

The first 1.5-h session was designed for training purposes. The structure of the stimulus sequences was described to the participants verbally and with a schematic illustration. They were encouraged to follow the subsequence consisting of elements that were presented steadily. Only 12-pair sequences were used in this session.

Participants were initially presented with the presumably easiest condition—no-overlap. All participants reported perceiving segregated streams in this block. Additional blocks of the same condition were undertaken until a participant's d′ was larger than 2. Then, the spectral separation was decreased progressively with the AM-rate separation of either AM25-300 or AM0-0 applied to each of the spectral conditions. With 30–45 min of familiarization (1–5 blocks for each of the spectral conditions), all participants reported consistent segregation perception throughout at least one block in each of the spectral conditions of no-overlap, 17%-overlap, and 41%-overlap. However, they reported difficulties in holding the segregated perception for the 100%-overlap condition with AM25-300 separation, for which participants needed 45–60 min to repeat 8–12 blocks.

Experimental sessions

Prior to data collection in each experimental session, participants practiced the task with two 40-trial blocks of 12-pair sequences, one for the no-overlap condition and one for the 100%-overlap condition with the AM separation of AM25-300. All participants reported the capability of holding segregated perception throughout the block of no-overlap condition. More blocks were presented if participants reported absolutely no perception of segregation for the 100%-overlap condition until they reported intermittent segregation perception.

Data analysis

IBM SPSS Statistics version 21 was used for data analysis; means and standard errors are reported in the results. Data were analyzed using a linear mixed-model approach, which is specified in the Results Section for readability.

Results and discussion

Auditory stream segregation based on spectral separation and AM-rate separation

Listener performance measured with 12-pair stimulus sequences was analyzed via a linear mixed-model. The spectral separation and AM-rate separation were assessed for the fixed repeated effect, while the subject variables in the model included participants and the repetitions of d′ measures within each observational unit (i.e., a given AM-rate separation nested in a spectral separation).

Figure 2 shows mean d′-values for the 12-pair sequences under each spectral/AM-rate separation. Significant differences were found for spectral separation [F_{(3, 585)} = 77.09, p < 0.0001], and AM-rate separation [F_{(3, 585)} = 7.61, p < 0.0001]. No significant interaction was seen between spectral separation and AM-rate separation [F_{(9, 585)} = 1.01, p = 0.4317]. These findings suggest that when either the spectral separation or the AM-rate separation increases, listeners can better segregate ongoing interleaved stimuli into different perceptual streams.

Figure 2. Mean d′-values in various spectral and AM-rate conditions for the 12-pair stimulus sequences in Experiment 1 (error bars represent ±1 standard error around the means).

Pairwise comparisons between spectral separations

Pairwise comparisons with Bonferroni adjustment showed progressively increased d′-values (Table 2) as spectral separation between A and B subsequences increased from 100%-overlap to no-overlap (p < 0.001 for each comparison).

Table 2.

Mean d′-values in the four spectral conditions for pooled data across AM-rate conditions and mean d′-values in the four AM-rate conditions for pooled data across spectral conditions.

SPECTRAL CONDITION
	100% overlap	41% overlap	17% overlap	No-overlap
d′	1.66 (0.07)	2.13 (0.08)	2.40 (0.06)	2.67 (0.06)
AM-RATE CONDITION
	AM0-0	AM25-25	AM25-100	AM25-300
d′	2.07 (0.07)	2.13 (0.07)	2.31 (0.08)	2.34 (0.07)

The measured standard errors are shown in parentheses.

Pairwise comparisons between AM-rate separations

The mean d′-values for the four AM-rate conditions are shown in Table 2. With Bonferroni adjustment, better performance was revealed for AM25-300 than for AM25-25 (p = 0.0134) and AM0-0 (p = 0.0006). Performance for AM25-100 was also significantly better than for AM25-25 (p = 0.0446) and for AM0-0 (p = 0.0025). No difference was shown between the AM0-0 and AM25-25 conditions or between the AM25-100 and AM25-300 conditions (p > 0.9999 for either comparison). These results suggest that when the AM-rate difference is 2 octaves or larger, it can be a cue for listeners to segregate the interleaved A and B noise bursts into two auditory streams.

Build-up effect: stream segregation based on 3- vs. 12-pair stimulus sequences

Comparison of results for 3- and 12-pair stimuli revealed the extent of segregation build-up. For a given participant, a spectral separation (including the four AM-rate separations nested under it) for the 12-pair stimulus sequences was excluded from the mixed model of analysis, if it was not tested for the 3-pair stimulus sequences. Repeated factors were spectral separation and AM-rate separation, with subject variables of participants, duration of sequences, and repetitions of the d′ measure within a given observational unit. Three independent factors were assessed including sequence duration (12-pair vs. 3-pair), spectral separation (no-overlap, 41%-overlap, and 100%-overlap), and AM-rate separation (AM25-300, AM25-100, and AM25-25).

Overall, listeners showed better performance in the 12-pair conditions (mean = 2.25 ± 0.10) than in the 3-pair conditions (mean = 1.48 ± 0.10) [F_{(1, 86)} = 27.80, p < 0.0001]. A significant interaction was revealed for spectral separation X duration [F_{(2, 427)} = 5.13, p = 0.0063] (left panel in Figure 3), but not for AM-rate separation X duration [F_{(2, 398)} = 0.34, p = 0.7137] (right panel in Figure 3). However, the three way interaction of spectral separation X AM-rate separation X duration was found to be significant [F_{(12, 399)} = 2.04, p = 0.0407] (Figure 4).

Figure 3. Contrasts of mean d′-values in Experiment 1 between 12-pair and 3-pair stimulus sequences for the three spectral separations (left panel) and for the three AM-rate separations (right panel) (error bars represent ±1 standard error). Significance was found for the interaction of spectral separation X sequence duration, but not for the interaction of AM-rate separation X sequence duration.

Figure 4. Mean d′-values for 12-pair and 3-pair conditions in Experiment 1 are illustrated as a function of AM-rate separation in three spectral separation conditions: no-overlap, 41%-overlap, and 100%-overlap. Error bars represent ±1 standard error.

These results indicate that listeners were able to segregate the A and B streams better for the 12-pair sequences. In addition, the significant interaction of spectral separation and sequence duration revealed a steeper slope for 12-pair sequences in the function relating performance to spectral separation. This suggests a greater build-up effect of stream segregation for a larger spectral difference. In other words, spectral separation elicited the build-up effect and facilitated stream segregation.

While the overall non-significant interaction of AM-rate separation and sequence duration indicates limited to no build-up of stream segregation as the AM-rate separation increased, the significant three-way interaction (spectral separation X AM-rate separation X sequence duration) suggests that the effect of AM-rate on build-up may depend on spectral separation. Figure 4 reveals a greater increase in d′ scores with increasing AM-rate separation for the 12-pair stimulus sequences than for the 3-pair sequences when the two stimulus subsequences were spectrally different (i.e., in the 41%-overlap or no-overlap condition). In addition, this trend appears more salient in the 41%-overlap than in the no-overlap condition, suggesting that a possible interaction of AM-rate separation and sequence duration is stronger in, if not confined to, the 41%-overlap condition, in which the inter-subsequence spectral separation is smaller.

Experiment 2

The objective of this experiment was to confirm the AM-rate cue for build-up as suggested by the interaction of spectral separation and AM-rate separation for the build-up segregation in Experiment 1. We assessed listeners' performance when the two stimulus subsequences were more spectrally overlapping than what had been tested in Experiment 1. The Apparatus in this experiment was identical to that in Experiment 1.