Abstract
A general finding of psychoacoustic studies is that detectability d' of a noisy signal grows less than optimally with the number N of independent observations of the signal. Competing accounts implicate internal noise common to all observations or nonoptimal decision weights given to observations. A discriminant analysis of listeners’ trial-by-trial responses in a multitone level-discrimination task favored the latter account.
Introduction
This paper is concerned with the question of how listeners accumulate information from multiple observations of a signal to improve their ability to detect and discriminate sounds. The question is of fundamental importance in psychoacoustics as it addresses the larger issue of how redundancy in signals, inherent in speech and everyday sounds, can serve to benefit detection and recognition in noise. Past studies have, indeed, shown an improvement in detectability with repeated observations of the signal [1-4]. However, the rate of improvement is found to be less than ideal for independent observations in excess of N = 4. These studies employed a sample-discrimination procedure wherein the frequencies of N tones, presented in sequence, were drawn independently and at random on each trial from one of two normal distributions differing in mean. The listeners’ task was to identify on each trial whether the tone frequencies were drawn from the distribution with high or low mean. Expressed as d', ideal performance in this task grows as the square-root of N; however, the obtained performance was found to grow more nearly as the cube-root of N. This basic result has been replicated for the sample-discrimination of tone intensity [3, 4] and tone duration [4] and for tones presented simultaneously as well as sequentially [3, 4].
Two competing accounts have been given for the less than optimal growth of d' with N. The first attributes the suboptimal growth to an additive source of central internal noise on each trial that is common to all observations [1, 2, 5]. In this model, the standard deviation of the central noise σC appears as an additive term in the prediction for d', effectively slowing the rate at which performance grows with N. The alternative account attributes the suboptimal growth to an effective limit on the number of tones that listeners can process simultaneously [6]. The limit is represented as a nonoptimal weighting of the information provided by the different tones, where some of the decision weights, wj, are close to or equal to zero. Both factors can be incorporated in a general model for which predicted performance is given by
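$$ d' = \frac{\Delta \sum_{j=1}^{N} w_j}{\sqrt{\sum_{j=1}^{N} w_j^{2}\left(\sigma^{2} + \sigma_{Pj}^{2}\right) + \sigma_{C}^{2}}}, \tag{1} $$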
where Δ and σ are, respectively, the difference in means and the standard deviation of the stimulus distributions given in the experiment, σPj represents the influence of peripheral internal noise, which is independent for each observation, and 0 ≤ wj ≤ 1. Both weighting and internal noise accounts make equally good predictions for d', given freedom in choosing values for σC, σPj, and wj. However, the two accounts make quite different predictions for how the listener’s trial-by-trial response correlates with individual observations. Zero weight given to some observations requires that some of these correlations be disproportionately large. Internal noise common to all observations requires these correlations to be near equal. We use the general model of Eq. 1 to test these predictions in what follows.
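The contrast between the two accounts can be made concrete with a small numerical sketch. It assumes the general model takes the common weighted-sum form d' = Δ Σwj / sqrt(Σwj²(σ² + σPj²) + σC²). The function name `d_prime` and the specific values used below (σC = 3 dB, a hypothetical listener who attends to at most three tones) are illustrative choices, not estimates from the study.

```python
import numpy as np

def d_prime(weights, delta=3.0, sigma=3.0, sigma_p=0.0, sigma_c=0.0):
    """Predicted d' for a weighted sum of N observations.

    Assumed form: delta * sum(w) / sqrt(sum(w^2 * (sigma^2 + sigma_p^2)) + sigma_c^2).
    sigma_p may be a scalar or one value per observation.
    """
    w = np.asarray(weights, dtype=float)
    sp = np.broadcast_to(np.asarray(sigma_p, dtype=float), w.shape)
    signal = delta * w.sum()
    noise_var = np.sum(w**2 * (sigma**2 + sp**2)) + sigma_c**2
    return signal / np.sqrt(noise_var)

if __name__ == "__main__":
    for n in range(1, 14, 2):
        ideal = d_prime(np.ones(n))                       # sqrt(N) growth
        central = d_prime(np.ones(n), sigma_c=3.0)        # illustrative central noise
        w_limited = (np.arange(n) < 3).astype(float)      # hypothetical 3-tone limit
        limited = d_prime(w_limited)
        print(n, round(ideal, 2), round(central, 2), round(limited, 2))
```

Under these assumptions both the central-noise and the limited-weight variants fall below the square-root-of-N curve, which is why d' alone cannot separate the two accounts.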
Method
The stimuli were N-tone complexes, where N ranged from 1 to 13 in steps of 2. The number of tones was varied in two ways. In the first, the tones (100 ms in duration) were presented simultaneously with frequencies spaced at equilog intervals from 250 to 7890 Hz (f_{j+1}/f_j = 1.33). This spacing ensured negligible mutual masking among tones [7]. The number of tones was increased beginning with the lowest frequency tone and ending with the highest. In the second condition, the tones (30 ms in duration) were presented sequentially without silent gaps and with a fixed frequency of 2000 Hz. Individual tones were gated on and off with 5-ms, cosine-squared ramps. In both conditions, the individual levels of the tones on each trial comprised a random sample of size N from one of two, equivariate, normal level distributions differing in mean (μLOW = 65 and μHIGH = 68 dB SPL, σ = 3 dB). On each day of the experiment, a total of 50 different N-tone complexes were generated representing 50 random samples of levels from each distribution; we will refer to these as the high- and low-level set. All tone complexes were played over a 16-bit, audio-quality, digital-to-analogue converter at a 20-kHz rate and were low-pass filtered (8-kHz cutoff, 120 dB∕octave) to remove aliased components.
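The level sampling described above can be sketched as follows. This is a minimal illustration rather than the software used in the study: the function names are hypothetical, and the frequency ratio is taken to be exactly 4/3 (an assumption, chosen because 250 × (4/3)^12 is close to the stated upper frequency of 7890 Hz).

```python
import numpy as np

def tone_frequencies(n_tones, f_low=250.0, ratio=4.0 / 3.0):
    """Equal-log-spaced frequencies in Hz, starting from the lowest tone.

    ratio = 4/3 is an assumption consistent with the reported 1.33.
    """
    return f_low * ratio ** np.arange(n_tones)

def sample_level_sets(n_tones, n_samples=50, mu_low=65.0, mu_high=68.0,
                      sigma=3.0, rng=None):
    """Draw the low- and high-level sets: n_samples complexes of n_tones levels (dB SPL)."""
    rng = np.random.default_rng(rng)
    low = rng.normal(mu_low, sigma, size=(n_samples, n_tones))
    high = rng.normal(mu_high, sigma, size=(n_samples, n_tones))
    return low, high

if __name__ == "__main__":
    print(np.round(tone_frequencies(13)))
    low_set, high_set = sample_level_sets(n_tones=13, rng=0)
    print(low_set.shape, high_set.shape)
```

Each row of the returned arrays holds the tone levels of one complex; a 2IFC trial would pair one row from each set in random order.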
A fixed, two-interval, forced-choice (2IFC) procedure was used to measure d' in each condition. On each trial, the listener heard a pair of the N-tone complexes separated by approximately 350 ms. One complex of the pair was randomly selected from the low-level set, the other from the high-level set of tone complexes. The listener’s task was to identify the interval containing the complex drawn from the high-level set. The value of N was fixed for each block of 50 trials, and feedback was given after each trial.
Conditions were run in pseudorandom order and replications were made across days. A single percent correct score was obtained from a total of at least 1000 trials per subject per condition. The mean percent correct scores were then converted to d' values for the 2IFC procedure [8]. Three female students from the University of Wisconsin–Madison, ranging in age from 18 to 26 years, participated as paid listeners. They ran a single 2-h session per day, five days of each week. Sessions included frequent breaks. All listeners received two or more weeks of training in the 2IFC task before data collection began.
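The conversion from percent correct to d' is not spelled out in the text. A minimal sketch, assuming the standard equal-variance Gaussian relation for two-interval forced choice, d' = √2 · z(PC), is given below; the function name `dprime_2ifc` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def dprime_2ifc(prop_correct):
    """Convert proportion correct in 2IFC to d' (equal-variance Gaussian assumption)."""
    if not 0.0 < prop_correct < 1.0:
        raise ValueError("proportion correct must lie strictly between 0 and 1")
    # z-transform of proportion correct, scaled by sqrt(2) for two intervals
    return sqrt(2.0) * NormalDist().inv_cdf(prop_correct)

if __name__ == "__main__":
    for pc in (0.60, 0.76, 0.90):
        print(pc, round(dprime_2ifc(pc), 2))
```

Chance performance (50% correct) maps to d' = 0, and scores of exactly 0% or 100% are rejected because the transform is unbounded there.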
Results
Figure 1 gives the d' results (symbols) for the three listeners (S1–3) for both simultaneous and sequential presentation of tones (panel columns). The values of d' are normalized with respect to d' (N = 1) to facilitate comparison of the growth rate of d' with N across conditions. The dashed curve gives the optimal rate of growth. The data replicate earlier studies showing closer to cube-root of N growth. The red curve gives the prediction of the internal noise model (σC ranging from 0.13 to 0.65 dB across listeners and conditions). The blue curve represents the weighting model where the weights are treated as free parameters. The weighting model, of course, describes the data perfectly as the data are overdetermined by the number of free parameters in the model. The comparison is made, however, to underscore that the d' data do not adequately distinguish the predictions of the two models.
Figure 1.
Open symbols give obtained values of d', expressed relative to d' (N = 1), for the three listeners (S1–3) for both simultaneous and sequential presentation of tones (columns). The optimal growth rate of d' is given by the black curve. The predictions of the internal noise model and the weighting model are given, respectively, by the red and blue curves. The d' data do not distinguish the predictions of the two models.
To distinguish the predictions of the models, it is necessary to measure the strength of the relation between the listener’s response and the individual observations from trial to trial. This was done by means of a discriminant analysis of the trial-by-trial data based on a logistic model. Let PR=2 denote the probability of responding that the second interval contained a signal from the high-level set, and let ΔLj represent the difference in the levels of the jth tone across the two observation intervals (second minus first). The logistic model is given by
$$ \operatorname{logit}\left(P_{R=2}\right) = b_0 + \sum_{j=1}^{N} b_j \, \Delta L_j, \tag{2} $$
where logit(·) is the inverse logistic function. The strength of the relation between the listener’s response and the individual observations is given by the normalized regression coefficients bj in Eq. 2; Σbj = 1. These values were estimated using the GLMFIT routine of the software application MATLAB v.7.0 and are shown in Fig. 2. (Note: symbols and error bars have been omitted for clarity of presentation.) The regression weights reveal a tendency of listener responses to be most strongly influenced by the highest and lowest frequency tones for simultaneous presentation of tones and the first and last tones for the sequential presentation. This result is consistent with the predictions of the weighting model. However, it remains to be determined whether the effect can account entirely for the less than optimal growth rate of d' with N. To this end we undertook an analysis of weighting efficiency [6, 9]. Let d'wgt represent the performance of each listener predicted exclusively from their relative decision weights in Eq. 1; that is, assuming no other limits imposed by internal noise (σC = σPj = 0). Weighting efficiency is then defined as
$$ \eta_{wgt} = \left(\frac{d'_{wgt}}{d'_{ideal}}\right)^{2}, \tag{3} $$
where d'ideal is the performance of an optimal detector yielding square-root-of-N growth of d'. We next define an overall performance efficiency ηobt representing the combined influence of the weighting efficiency and internal noise,
$$ \eta_{obt} = \left(\frac{d'_{obt}}{d'_{ideal}}\right)^{2} = \eta_{wgt}\,\eta_{nos}, \tag{4} $$
where d'obt is the obtained performance and the influence of internal noise ηnos = (d'obt/d'wgt)^2 is simply that component of performance not accounted for by the decision weights. The weighting model now predicts that ηwgt should decrease with N at the same rate as ηobt, while ηnos remains constant. Conversely, the internal noise model predicts that ηnos should decrease with N at the same rate as ηobt, while ηwgt remains constant.
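The efficiency decomposition can be illustrated with a short sketch. It assumes that, with internal noise set to zero, performance predicted from the weights alone is Δ Σwj / (σ sqrt(Σwj²)) and that the optimal detector achieves √N · Δ/σ, with all d' values expressed on the same scale. The function name `efficiencies` and the example weights are hypothetical.

```python
import numpy as np

def efficiencies(weights, d_obt, delta=3.0, sigma=3.0):
    """Return (eta_wgt, eta_nos, eta_obt) for given decision weights and obtained d'.

    Assumes d_obt is on the same scale as delta / sigma per observation.
    """
    w = np.asarray(weights, dtype=float)
    n = w.size
    d_ideal = np.sqrt(n) * delta / sigma                         # optimal detector
    d_wgt = delta * w.sum() / (sigma * np.sqrt(np.sum(w**2)))    # weights only, no internal noise
    eta_wgt = (d_wgt / d_ideal) ** 2
    eta_nos = (d_obt / d_wgt) ** 2
    eta_obt = (d_obt / d_ideal) ** 2                             # equals eta_wgt * eta_nos
    return eta_wgt, eta_nos, eta_obt

if __name__ == "__main__":
    # Hypothetical listener who relies mainly on the first and last of five tones
    w = [0.35, 0.10, 0.10, 0.10, 0.35]
    print([round(float(v), 3) for v in efficiencies(w, d_obt=1.5)])
```

Because η_wgt depends only on the relative weights, rescaling all weights by a common factor leaves it unchanged; equal weights give η_wgt = 1.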
Figure 2.
The normalized regression coefficients bj obtained from the logistic regression model of Eq. 2 are plotted for each listener (panel rows) and condition (panel columns) as a function of N. Symbols have been omitted for clarity of presentation.
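The estimation of normalized regression coefficients from trial-by-trial data can be sketched in Python. The study itself used MATLAB's GLMFIT; the code below is a stand-in that fits the logistic model by Newton-Raphson iteration with NumPy. The simulated listener, its weights, and its internal noise (3 dB) are hypothetical and serve only to show that the procedure recovers relative weights.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logit(P) = b0 + sum_j b_j * X[:, j] by Newton-Raphson; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])          # prepend intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))          # predicted P(R = 2)
        grad = A.T @ (y - p)                           # gradient of the log-likelihood
        hess = (A * (p * (1.0 - p))[:, None]).T @ A    # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical listener who weights the first and last tones most heavily
rng = np.random.default_rng(1)
n_trials, n_tones = 4000, 5
true_w = np.array([0.35, 0.10, 0.10, 0.10, 0.35])

# Level differences (second interval minus first): mean +/-3 dB, SD 3*sqrt(2) dB
high_second = rng.integers(0, 2, size=n_trials)
sign = 2.0 * high_second - 1.0
delta_L = 3.0 * sign[:, None] + rng.normal(0.0, 3.0 * np.sqrt(2.0),
                                           size=(n_trials, n_tones))

# Weighted sum plus assumed internal noise; respond "2" if positive
decision = delta_L @ true_w + rng.normal(0.0, 3.0, size=n_trials)
response = (decision > 0).astype(float)

beta = fit_logistic(delta_L, response)
b_norm = beta[1:] / beta[1:].sum()                     # normalize so the b_j sum to 1
print(np.round(b_norm, 2))
```

Normalizing by the sum removes the overall scale of the coefficients, which reflects internal noise rather than the relative reliance on each tone.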
To test these predictions, we computed ηwgt and ηnos taking the normalized regression coefficients as estimates of the listeners’ decision weights, wj = bj. The results are shown in Fig. 3; here ηwgt and ηnos are given by the blue and red curves, respectively, ηobt is given by the open symbols and the black line represents ideal performance. The results clearly support the weighting model. While the effect of internal noise, as measured by ηnos, serves to degrade overall performance, the less than optimal growth rate of performance with increasing N corresponds to the reduction in weighting efficiency with increasing N.
Figure 3.
For each condition (panel columns) and listener (panel rows), the obtained values of ηobt (symbols), ηwgt (blue curve) and ηnos (red curve) are plotted as a function of N. The horizontal black line gives ideal performance. See text for further details.
Conclusion
The results of the present study replicate those of past studies showing suboptimal growth of d' with N in a multitone level discrimination task. In past studies, competing accounts in terms of decision weights and internal noise were not meaningfully distinguished by their predictions for d'. In the present study, however, the weighting model is clearly supported by a discriminant analysis showing listener responses to depend largely on the “bracketing” tones within a stimulus, an effect that accounts entirely for the suboptimal growth of d' with N.
References and links
1. Berg B. G., “Internal noise in auditory decision tasks,” Ph.D. dissertation, Indiana University, Bloomington, IN, 1987.
2. Berg B. G. and Robinson D. E., “Multiple observations and internal noise,” J. Acoust. Soc. Am. 81, S33 (1987). doi:10.1121/1.2024197
3. Lutfi R. A., “Informational processing of complex sound. I. Intensity discrimination,” J. Acoust. Soc. Am. 86, 934–944 (1989). doi:10.1121/1.398728
4. Lutfi R. A., “Informational processing of complex sound. III. Cross-dimensional analysis,” J. Acoust. Soc. Am. 87, 2141–2148 (1990). doi:10.1121/1.399182
5. Durlach N. I., Braida L. D., and Ito Y., “Towards a model for discrimination of broadband signals,” J. Acoust. Soc. Am. 80, 63–72 (1986). doi:10.1121/1.394084
6. Lutfi R. A. and Liu C. J., “Individual differences in source identification from synthesized impact sounds,” J. Acoust. Soc. Am. 122, 1017–1028 (2007). doi:10.1121/1.2751269
7. Scharf B., “Critical bands,” in Foundations of Modern Auditory Theory, edited by Tobias J. V. (Academic, New York, 1970), pp. 159–202.
8. Swets J. A., Signal Detection and Recognition by Human Observers (Wiley, New York, 1964), pp. 147–198.
9. Berg B. G., “Observer efficiency and weights in a multiple observation task,” J. Acoust. Soc. Am. 88, 149–158 (1990). doi:10.1121/1.399962



