The Journal of the Acoustical Society of America. 2011 Jun 23;130(1):EL32–EL37. doi: 10.1121/1.3599022

Temporal integration of loudness measured using categorical loudness scaling and matching procedures

Daniel L Valente 1,a), Suyash N Joshi 1, Walt Jesteadt 1
PMCID: PMC3138798  PMID: 21786865

Abstract

Temporal integration of loudness of 1 kHz tones with 5 and 200 ms durations was assessed in four subjects using two loudness measurement procedures: categorical loudness scaling (CLS) and loudness matching. CLS provides a reliable and efficient procedure for collecting data on the temporal integration of loudness, and previously reported nonmonotonic behavior observed at mid-sound pressure levels is replicated with this procedure. Stimuli that are assigned to the same category are effectively matched in loudness, allowing the measurement of temporal integration with CLS without curve-fitting, interpolation, or assumptions concerning the form of the loudness growth function.

Introduction

“Loudness is the primary perceptual correlate of the level of a sound” (Florentine, 2010). The difference in sound pressure level (SPL) required to equate a long- and a short-duration tone or noise in loudness, commonly described as temporal integration of loudness, has been found to vary nonmonotonically as a function of the level of the standard tone or noise (Florentine et al., 1996; Buus et al., 1999; Epstein and Florentine, 2005; Anweiler and Verhey, 2006). It therefore provides a context in which to evaluate measurement procedures.

Loudness is typically measured by a number of procedures including direct loudness matching, magnitude estimation, cross-modality matching, magnitude production, balancing and categorical loudness scaling (CLS) (for a recent review, see Marks and Florentine, 2010). Use of CLS has been questioned by the loudness research community (e.g., Hellman, 1999; Marks and Florentine, 2010), but has been used in many clinical studies because the typical subject time commitment, as well as the training necessary to perform this loudness measurement, is less. Another potential complication when using CLS is that it requires randomized levels within a block of trials and this has been shown to produce the most variable results in equal-loudness matches using adaptive psychophysical procedures (e.g., Silva and Florentine, 2006).

CLS provides meaningful labels that subjects can give to stimuli that directly relate to their perception (e.g., soft, loud) and the labels have now been standardized in ISO 16832 (Kinkel, 2007). It has been shown to provide a reliable and time-effective procedure by which to obtain loudness functions (Al-Salim et al., 2010; Anweiler and Verhey, 2006). Al-Salim et al. (2010) have reported CLS data in 74 subjects (58 with hearing loss) showing within-subject test–retest reliability (expressed as the correlation r between visit 1 and visit 2 slope estimates) for pure tones greater than 0.94 at three frequencies: 1, 2, and 4 kHz. Anweiler and Verhey (2006) have used CLS to measure temporal integration for bandpass-filtered noise with 10 and 1000 ms durations and have reported nonmonotonic temporal integration functions comparable to those obtained with loudness matching. A major drawback of the CLS procedure is that the resulting loudness functions are plotted in arbitrary categorical units (CUs) that are not proportional to loudness. A tone with a loudness of 40 CUs is not twice as loud as a tone with a loudness of 20 CUs and the slopes of CLS loudness functions cannot be interpreted as direct measures of compression. Garnier et al. (1999) have measured temporal integration for white noise bursts with 16.25 and 300 ms durations and have reported nonmonotonic temporal integration functions comparing mean SPL levels across six categories from “very soft” to “very loud.” Their functions did not require use of CUs, but could not be compared directly to results obtained with loudness matching.

In the present study, the aim was to measure temporal integration of loudness as a function of level using both CLS and loudness-matching procedures in the same subjects and to analyze the CLS data without reference to CUs. The approach is similar to Garnier et al. (1999), but the data have been obtained with a CLS procedure following ISO 16832 (Kinkel, 2007) using stimulus parameters modeled on those used by Florentine et al. (1996). Results are plotted in coordinates that permit a direct comparison of matching and CLS data.

Method

Four normal-hearing volunteers (one male, three female), ages 25–31, participated. All had previous experience in psychoacoustic tasks. Hearing had been screened at 0.5, 1, 2, and 4 kHz within the past year using a two-interval-forced-choice (2IFC) adaptive procedure. Thresholds were ≤ 15 dB SPL for all test frequencies bilaterally for all subjects.

The stimuli were 1 kHz tones with equivalent rectangular durations of 5 and 200 ms, generated digitally in Matlab at 44.1 kHz with 7 ms cosine² ramps. Stimuli were delivered to the left ear of the subject in a double-walled sound-attenuated booth through Sennheiser HD250 Linear II headphones. Quiet thresholds for the 5 ms tone were also measured for all four subjects.
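As a rough illustration of the stimulus description above, the following Python sketch builds a 1 kHz tone sampled at 44.1 kHz with 7 ms cosine-squared onset and offset ramps. It is not the authors' Matlab code. The function name `make_tone` and the `plateau_ms` parameter are assumptions introduced for illustration; the source gives equivalent rectangular durations but not the steady-state durations, so the plateau is left as a free parameter.

```python
import math

def make_tone(freq_hz=1000.0, plateau_ms=0.0, ramp_ms=7.0, fs=44100):
    """Return samples of a tone with cosine-squared on/off ramps.

    plateau_ms is a hypothetical parameter: the source reports equivalent
    rectangular durations (5 and 200 ms), not steady-state durations.
    """
    n_ramp = round(ramp_ms * fs / 1000.0)
    n_plateau = round(plateau_ms * fs / 1000.0)
    n_total = 2 * n_ramp + n_plateau
    samples = []
    for n in range(n_total):
        if n < n_ramp:
            # Rising ramp: envelope grows smoothly from 0 toward 1.
            env = math.sin(0.5 * math.pi * n / n_ramp) ** 2
        elif n >= n_ramp + n_plateau:
            # Falling ramp: mirror image of the rising ramp.
            m = n_total - 1 - n
            env = math.sin(0.5 * math.pi * m / n_ramp) ** 2
        else:
            # Steady-state portion at full amplitude.
            env = 1.0
        samples.append(env * math.sin(2.0 * math.pi * freq_hz * n / fs))
    return samples

# A tone made of the two ramps only (no steady-state portion).
tone = make_tone(plateau_ms=0.0)
```

Amplitude here is normalized to a peak of 1; calibration to dB SPL depends on the playback chain and is outside the scope of the sketch.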

For the loudness matching portion of the experiment, short-duration standard tones were created from 20 to 90 dB SPL in 10 dB steps. Long-duration standard tones were created from 10 to 80 dB SPL in 10 dB steps. For the CLS portion of the experiment, short and long duration tones were created with the same SPL ranges as the matching portion, but with a 5 dB SPL step size. Absolute thresholds for the short-duration tones were measured for each subject using a 2IFC procedure with a decision rule chosen to estimate the level required for 71% correct (Levitt, 1971). Step sizes of 4 dB were used until the fourth reversal, at which point a 2 dB step was used. Subjects were tested in four 50 trial blocks. The threshold for each block of trials was calculated from the average of the level of the reversal points after the fourth reversal.
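The threshold track described above can be sketched as follows. The sketch assumes a two-down, one-up rule, which is the transformed up-down rule from Levitt (1971) that converges near 71% correct; the source names the target percentage but not the rule explicitly. The function `run_staircase` and the `is_correct` callback (a stand-in for the listener's response on a 2IFC trial) are hypothetical names.

```python
def run_staircase(is_correct, start_level, n_trials=50):
    """Two-down, one-up adaptive track; returns (threshold, reversal_levels).

    is_correct(level) -> bool is a hypothetical stand-in for the listener.
    """
    level = start_level
    step = 4.0             # dB, used until the fourth reversal
    n_correct = 0          # consecutive correct responses
    last_direction = 0     # +1 = last change was up, -1 = down, 0 = none yet
    reversals = []
    for _ in range(n_trials):
        if is_correct(level):
            n_correct += 1
            if n_correct < 2:
                continue   # need two correct in a row before stepping down
            n_correct = 0
            direction = -1
        else:
            n_correct = 0
            direction = +1
        # A reversal is a change in the direction of the track.
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)
            if len(reversals) == 4:
                step = 2.0  # smaller step from the fourth reversal onward
        last_direction = direction
        level += direction * step
    late = reversals[4:]    # reversal points after the fourth reversal
    threshold = sum(late) / len(late) if late else None
    return threshold, reversals

# Idealized deterministic listener: correct whenever the level is >= 20 dB.
threshold, reversals = run_staircase(lambda level: level >= 20, start_level=40)
```

With a real listener the responses are probabilistic, so the track wanders around the 71% point instead of alternating between two levels as it does for this idealized listener.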

The first loudness tasks that subjects performed were loudness matches between the short- and long-duration tones made using a 2IFC adaptive procedure where the subject voted for the louder interval of each trial. Rather than using the one-up, one-down procedure used by Florentine et al. (1996), two interleaved tracks were used to estimate the 71% and 29% points on the psychometric function, as described by Jesteadt (1980). Both the short and long tone served as a standard in separate interleaved tracks, resulting in a total of four tracks in all. Within each trial, a subject heard either the fixed long or short tone in each interval with an equal a priori probability. Subjects voted for the louder interval. Subjects were not given feedback after their answer. A single matching level for either the short-tone fixed or long-tone fixed was determined by taking the mean of the levels corresponding to the 71% and 29% points. The levels for the variable tone started 10 dB above or below the expected matching level and within each condition, the level of the fixed short tone was 10 dB greater than the fixed long tone. Four complete matches were performed by subjects at each short- and long-tone level. The average level difference between the short and long tones that were judged to be equally loud was interpreted as the amount of temporal integration. The order of levels for loudness matching was counterbalanced with half the subjects ascending across blocks and the other half descending across blocks.
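The arithmetic that turns track estimates into a temporal integration value can be sketched as below. The function names and all numeric values are hypothetical and chosen only to illustrate the calculation; they are not data from the study.

```python
def matching_level(level_71, level_29):
    """Mean of the levels at the 71% and 29% points of the psychometric function."""
    return (level_71 + level_29) / 2.0

def temporal_integration(short_level, long_level):
    """Level difference (dB) between equally loud short and long tones."""
    return short_level - long_level

def mean(values):
    return sum(values) / len(values)

# Short tone fixed at 60 dB SPL; the long tone is the variable (hypothetical values).
long_match = matching_level(level_71=48.0, level_29=40.0)
ti_short_fixed = temporal_integration(60.0, long_match)

# Long tone fixed at 50 dB SPL; the short tone is the variable (hypothetical values).
short_match = matching_level(level_71=70.0, level_29=62.0)
ti_long_fixed = temporal_integration(short_match, 50.0)

# Averaging over four repeated matches at one level (hypothetical values).
ti_mean = mean([16.0, 14.0, 18.0, 16.0])
```

Keeping the sign convention fixed as short minus long makes the short-tone-fixed and long-tone-fixed conditions directly comparable.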

The CLS procedure used in this study was similar to that used by Al-Salim et al. (2010), but followed more closely the recommendation for CLS data collection presented in ISO 16832 (Kinkel, 2007). An LCD monitor presented a graphical display of the CLS categories represented by 11 horizontal bars that increased in relative length as the category increased. Every other bar was labeled by the following loudness categories: “Cannot Hear,” “Soft,” “Medium,” “Loud,” “Very Loud,” and “Too Loud.” Intermediate categories were not labeled and no numerical labels were used.

After subjects heard the tone presented in a trial, they indicated which loudness category it belonged in by clicking on the horizontal bar that corresponded to that category. Three conditions were run for the CLS portion of the experiment: short tone alone, long tone alone, and short and long tone interleaved. Each block of CLS conditions consisted of 100 trials. Short tones were picked at random from the 20–90 dB SPL list, and long tones from the 10–80 dB list to correspond to the same range of SPLs presented to subjects in the loudness matching task. Subjects completed four repetitions of each condition.

Temporal integration was determined by the difference between the mean short-tone SPL and long-tone SPL per CLS category. Short and long tones that are given the same category label can be thought of as approximately matched in loudness.
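The per-category calculation described above can be sketched as follows. The trial data and the integer coding of categories are hypothetical, used only to show the grouping and differencing; the function names are likewise introduced for illustration.

```python
from collections import defaultdict

def mean_spl_per_category(trials):
    """trials: iterable of (spl_db, category) pairs -> {category: mean SPL}."""
    sums = defaultdict(lambda: [0.0, 0])
    for spl, category in trials:
        sums[category][0] += spl
        sums[category][1] += 1
    return {cat: total / n for cat, (total, n) in sums.items()}

def cls_temporal_integration(short_trials, long_trials):
    """Mean short-tone SPL minus mean long-tone SPL for each shared category."""
    short_means = mean_spl_per_category(short_trials)
    long_means = mean_spl_per_category(long_trials)
    shared = sorted(set(short_means) & set(long_means))
    return {cat: short_means[cat] - long_means[cat] for cat in shared}

# Hypothetical responses; categories are coded as integers for the display bars.
short_trials = [(40, 3), (45, 3), (60, 5), (70, 5), (90, 9)]
long_trials = [(25, 3), (30, 3), (40, 5), (50, 5)]
ti = cls_temporal_integration(short_trials, long_trials)
```

Only categories used for both durations yield a value, which is why category 9 in this example produces no estimate.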

Results and discussion

Figure 1 shows the mean CLS and loudness matching data for four individual subjects. The abscissa for all panels shows the level of the 5 ms tone (in dB) with the difference between equally loud 5 and 200 ms tones shown on the ordinate. Polynomial fits to the mean CLS and matching data are also shown. The individual subject’s quiet threshold for the 5 ms tone is shown by a dotted vertical line. Two sets of data are shown for each procedure. For CLS, data were obtained in a condition where short and long tones were rated separately and in one where the two durations were mixed in the same list. For matching, data were obtained with either the short tone or the long tone as the standard, but always in interleaved tracks. The number of categories that each subject used for CLS conditions for both long and short tones, as well as the average SD for CLS and matching levels, is given in Table 1.

Figure 1.


Mean temporal integration as a function of level for four individual subjects (1–4) estimated from CLS and loudness matching. The best polynomial fit to the data is shown for each individual track within a panel. The fit order was chosen by visual inspection as the lowest order that adequately represented the data. The threshold for the 5 ms tone is also shown for all four subjects.

Table 1.

The number of categories used for both long and short duration tones for CLS in the combined “C” and separate “S” conditions. The mean SD of SPLs across repetitions is shown as a measure of reliability for CLS and matching conditions. Subjects show more variability in CLS conditions than in matching conditions.

Subject | # Cat C | # Cat S | SD CLS (dB) | SD Match (dB)
1 | 7 | 7 | 8.0 | 3.9
2 | 6 | 8 | 8.8 | 4.2
3 | 9 | 8 | 10.2 | 5.2
4 | 8 | 9 | 9.7 | 4.8

Each of the four subjects showed more temporal integration for mid-level SPLs with the CLS procedure. The CLS data were analyzed by a two-way analysis of variance (ANOVA) for repeated measures. The dependent variable was the difference between the short and long tones and the independent variables were category (nine levels corresponding to verbal labels from Very Soft to Very Loud), and list (combined short and long tones vs. separate short and long tones). A significant effect of category was seen [F(8,41) = 4.47, p < 0.001]. Neither the effect of list [F(1,41) = 1.05, p = 0.311] nor the list-by-category interaction [F(8,41) = 0.62, p = 0.752] was significant in this analysis.

The between-subjects matching data were more variable than the CLS data, despite lower within-subject standard deviations (SDs) for loudness balances. Subjects 2 and 4 showed a similar amount of temporal integration in the short-tone fixed and long-tone fixed conditions. Subjects 1 and 3 showed little temporal integration in the long-tone fixed conditions. A two-way ANOVA for repeated measures was used to analyze the data with difference between short and long tones as the dependent variable and presentation level and condition (short-tone fixed vs. long-tone fixed) as the independent measures. A significant interaction of condition by presentation level confirms what is seen in Fig. 1 [F(7,41) = 4.61, p < 0.001]. On average more temporal integration at mid-levels was measured while holding the short tone constant and varying the level of the long tone. A significant main effect of presentation level was also seen [F(7,41) = 5.82, p < 0.001]. The main effect of condition was not significant in this analysis [F(1,41) = 1.39, p = 0.245].

Figure 2 shows the mean difference between the short- and long-tone levels for CLS and matching averaged across the four subjects tested. Two quadratic fits to the mean CLS and matching data for the current study are also displayed in Fig. 2, along with functions reproduced from Florentine et al. (1996), Buus et al. (1999), and Epstein and Florentine (2005). A similar amount of temporal integration was measured as a function of level using the CLS procedure as in the previous studies. In the current study, less temporal integration as a function of level was measured using the matching procedure, but the results were within the range of previous studies.

Figure 2.


(Color online) Measured temporal integration as a function of level for the current study as well as for previous studies which used the same stimulus parameters.

The goal in obtaining data concerning the temporal integration of loudness using CLS and loudness matching was to explore the use of CLS to measure loudness in the context of a well-established phenomenon, the nonmonotonic relation between temporal integration and level (e.g., Florentine et al., 1996). Anweiler and Verhey (2006) have demonstrated good agreement between loudness scaling and matching for stimuli varying in both duration and bandwidth, but their analysis was based on functions fitted to the CUs assigned to the verbal categories and on inverse functions used to determine level differences for a given loudness. These transformations are questionable because the CU scale is arbitrary and lacks even interval-scale properties [see Marks and Florentine (2010) for an excellent review of the issues].

The current study adopted a more direct way of measuring temporal integration as a function of level without assumptions regarding CUs, similar to the analysis performed by Garnier et al. (1999). It is only assumed that short and long tones assigned to the same CLS category are effectively matched in loudness. By calculating the mean SPL for each CLS category for both the short and long tone, a set of levels that are effectively matched in loudness per category is determined. These matched levels require no assumptions about the underlying loudness growth function and whether or not it can be determined from a non-ratio-scale loudness measure.

By using CLS for loudness data collection, subjects are, in essence, performing a matching task at multiple levels with levels randomized within a block of trials (RWB). Silva and Florentine (2006) showed that a similar RWB technique in the context of loudness matching (where the fixed tone level varied within blocks of trials) resulted in more variable data, as well as greater temporal integration at mid-levels. The authors attributed both effects to induced loudness reduction (for a recent overview, see Epstein, 2007) and cautioned against using the RWB procedure for equal-loudness matches. Because CLS requires use of a range of levels that covers a range of categories, the use of RWB levels of the judged stimuli is unavoidable. The greater temporal integration at mid-levels observed in this set of subjects using CLS as opposed to matching may therefore reflect induced loudness reduction. Nevertheless, data from a large group of subjects performing CLS with the equivalent of RWB levels show test–retest reliability of mean stimulus level per category to be high (r greater than 0.94, as reported by Al-Salim et al., 2010).

The data from the current study analyzed using the technique described previously show an amount of temporal integration measured as a function of level comparable to that observed in previous studies using identical stimulus parameters (Florentine et al., 1996; Buus et al., 1999; Epstein and Florentine, 2005). The amount of temporal integration measured as a function of level in the matching conditions was less than previously reported, perhaps as a result of the limited number of matching trials used in the current experiment or of the complexity of the loudness matching task.

CLS was demonstrated to be an effective loudness measurement procedure in a study that requires stimuli to be matched at multiple levels. An additional benefit for subjects is that the familiar meaningful labels given to categories can be directly related to their internal perceptual representations for sound levels. In short, they understand the task.

The analysis presented here is more straightforward than the one used by Anweiler and Verhey (2006) and does not require assumptions about the properties of the CU scale. The similarity of the effects observed in the two studies, however, suggests that their use of the CU scale in curve fitting and interpolation did not invalidate their results. Florentine et al. (1996) described an analysis, based on the hypothesis that the loudness ratio of long and short tones is equal at all levels, that allowed them to use their matching data to derive a loudness function in sones. In principle, this analysis also could be used to convert CLS categories to sones. In practice, however, it is difficult to obtain stable estimates of the overall range of the loudness function using this approach with either matching or CLS data.

Acknowledgments

The authors thank Melissa Krivohlavek for data collection and Hongyang Tan for help with data analysis. The authors also thank Mary Florentine and two anonymous reviewers for their helpful comments on an earlier version of this manuscript. This research was supported by NIH Grants Nos. R01 DC006648, T32 DC000013, and P30 DC004662.

References and links

  1. Al-Salim, S. C., Kopun, J. G., Neely, S. T., Jesteadt, W., Stiegemann, B., and Gorga, M. P. (2010). “Reliability of categorical loudness scaling and its relation to threshold,” Ear Hear. 31, 567–578.
  2. Anweiler, A. K., and Verhey, J. L. (2006). “Spectral loudness summation for short and long signals as a function of level,” J. Acoust. Soc. Am. 119, 2919–2928.
  3. Buus, S., Florentine, M., and Poulsen, T. (1999). “Temporal integration of loudness in listeners with hearing losses of primarily cochlear origin,” J. Acoust. Soc. Am. 105, 3464–3480.
  4. Epstein, M., and Florentine, M. (2005). “Matching test of the equal-loudness-ratio hypothesis using cross-modality matching functions,” J. Acoust. Soc. Am. 118, 907–913.
  5. Epstein, M. (2007). “An introduction to induced loudness reduction,” J. Acoust. Soc. Am. 122, EL74–EL80.
  6. Florentine, M. (2010). “Loudness,” in Loudness, edited by Florentine M., Popper A. N., and Fay R. R. (Springer Science+Business Media, LLC, New York), p. 4.
  7. Florentine, M., Buus, S., and Poulsen, T. (1996). “Temporal integration of loudness as a function of level,” J. Acoust. Soc. Am. 99, 1633–1644.
  8. Garnier, S., Micheyl, C., Berger-Vachon, C., and Collet, L. (1999). “Effect of signal duration on categorical loudness scaling in normal and in hearing-impaired listeners,” Audiology 38, 196–201.
  9. Hellman, R. P. (1999). “Cross-modality matching: A tool for measuring loudness in sensorineural loss,” Ear Hear. 20, 193–213.
  10. Jesteadt, W. (1980). “An adaptive procedure for subjective judgments,” Percept. Psychophys. 28, 85–88.
  11. Kinkel, M. (2007). “The new ISO 16832 ‘Acoustics–loudness scaling by means of categories’,” 8th EFAS Congress/10th Congress of the German Society of Audiology, pp. 1–4.
  12. Levitt, H. (1971). “Transformed up-down methods in psychoacoustics,” J. Acoust. Soc. Am. 49, 467–477.
  13. Marks, L. E., and Florentine, M. (2010). “Measurement of loudness, part I: Methods, problems, and pitfalls,” in Loudness, edited by Florentine M., Popper A. N., and Fay R. R. (Springer Science+Business Media, LLC, New York), pp. 17–56.
  14. Silva, I., and Florentine, M. (2006). “Effect of adaptive psychophysical procedure on loudness matches,” J. Acoust. Soc. Am. 120, 2124–2131.

Articles from The Journal of the Acoustical Society of America are provided here courtesy of Acoustical Society of America
