Abstract
Two approaches to the automated detection of alarm sounds are compared, one based on a change in overall sound level (RMS), the other a change in periodicity, as given by the power of the normalized autocorrelation function (PNA). Receiver operating characteristics in each case were obtained for different exemplars of four classes of alarm sounds (bells/chimes, buzzers/beepers, horns/whistles, and sirens) embedded in four noise backgrounds (cafeteria, park, traffic, and music). The results suggest that PNA combined with RMS may be used to improve current alarm-sound alerting technologies for the hard-of-hearing.
Introduction
Alarm sounds are an important class of sounds with distinctive features that serve as a call to action. These properties make the automated detection of alarm sounds a model problem for computational auditory scene analysis [1,2]. Two properties, in particular, distinguish alarm sounds from most everyday sounds. First, and most obvious, they can be quite loud. Current alarm-alerting technologies for the hard-of-hearing, in fact, rely on this feature to detect an alarm as a change in ambient sound level (RMS). Second, they tend to be periodic, repeating regularly in bursts (long-term periodicity) and/or having a fundamental frequency giving rise to the perception of pitch (short-term periodicity). Current alarm-alerting technologies that require the alarm to produce a significant change in ambient sound level cannot be relied upon to detect alarms of low or moderate level (e.g., a tornado siren heard at a distance or a phone ringing in another room). Nor can they be relied upon when the ambient noise level is high (e.g., street or loud cafeteria noise). In such cases, which occur frequently, a change in ambient periodicity may instead signal the presence of an alarm. The goal of the present work was to evaluate this possibility using the power of the normalized autocorrelation function (PNA) to extract periodicity.
Method
A total of 75 recordings of alarm sounds were gathered from various websites and divided into four categories based loosely on their mechanical and acoustical properties. Category descriptions are given in Table 1. The alarms occurred in four different types of background noise: park, cafeteria, traffic, and music. One ongoing recording (approximately 1–2 min in length) of each type of background noise was taken from different locations on the web. Descriptions of the background sounds are given in Table 2. All sound recordings had a sampling rate of 22 050 Hz with a resolution of 16 bits. The recordings were equated in average power so that the average alarm-to-noise ratio was 0 dB.
Table 1.
Acoustic properties of the four categories of alarm sounds. Number of exemplars in each category indicated in parentheses. (AM = amplitude modulation, FM = frequency modulation).
| Category | Examples | AM | FM | Line spectra | Broad band |
|---|---|---|---|---|---|
| Bells/chimes (14) | door bell, phone, railroad crossing | Yes | No | Yes | Yes |
| Buzzers/beepers (27) | alarm clock, smoke detector, electronics | Yes | No | Yes | Yes |
| Horns/whistles (27) | car horns, train whistles, police whistle | No | No | Yes | Yes |
| Sirens (7) | tornado siren, ambulance, fire truck | Yes | Yes | No | Yes |
Table 2.
Description of background sounds.
| Category | Description |
|---|---|
| Park | birds chirping, children playing, leaves rustling, wind noise |
| Cafeteria | people talking, dishes clanging, chairs moving |
| Traffic | cars passing, engine noise |
| Music | Foxy Lady, Jimi Hendrix |
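The level equalization described above (alarm and background equated in average power, giving an alarm-to-noise ratio of 0 dB) can be illustrated with a short sketch. This is not the authors' code: the function names `mean_power` and `mix_at_ratio` are hypothetical, and signals are assumed to be plain lists of floating-point samples.

```python
import math

def mean_power(x):
    """Average power (mean square) of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def mix_at_ratio(alarm, background, ratio_db=0.0):
    """Scale `alarm` so that its average power sits `ratio_db` dB above
    (or below) that of `background`, then add the two sample by sample.

    Hypothetical helper for illustration; both inputs are truncated to
    the shorter of the two lengths.
    """
    n = min(len(alarm), len(background))
    alarm, background = alarm[:n], background[:n]
    # Power the alarm must have to reach the requested ratio.
    target_power = mean_power(background) * 10.0 ** (ratio_db / 10.0)
    gain = math.sqrt(target_power / mean_power(alarm))
    scaled = [gain * a for a in alarm]
    mixed = [a + b for a, b in zip(scaled, background)]
    return scaled, mixed
```

Calling `mix_at_ratio(alarm, background, 0.0)` corresponds to the 0 dB condition; passing `10.0` or `-10.0` would correspond to the higher and lower alarm-to-noise ratios mentioned in the Results.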
Receiver operating characteristics (ROCs) for two different detection algorithms were generated from 100 1-s samples of alarm + background and background alone, drawn at random from each sound recording. The first detection algorithm (RMS detector) reported an alarm whenever the level of the sample exceeded a pre-specified threshold. Let R denote the discrete autocorrelation function of a sample. The decision rule for the RMS detector was
$$R(0) > \beta,$$
where R(0) is the power of the sample and β is the threshold value varied to obtain each ROC. Note that in any practical application a threshold value would need to be selected to achieve an acceptable false-alarm rate, whatever the detection algorithm. This could be done adaptively, for instance, by sampling the ambient environment in the absence of an alarm. The second detection algorithm (PNA detector) reported an alarm when the periodicity of the sample exceeded a pre-specified threshold. The decision rule for the PNA detector was
$$\mathrm{PNA} > \beta,$$
where PNA is the power of the autocorrelation function of the sample normalized by the power of the sample. The autocorrelation was computed over the entire sample and was not normalized to account for the fewer overlapping samples at longer lags. The threshold β was varied, as before, to obtain each ROC. The logic of using the total power in R was to capture the multiple scales of periodicity that exist both within and across alarms. We experimented with various other algorithms in this regard, but this one seemed to work best. Normalizing with respect to the power of the sample was done to remove the only component of R not associated with the periodicity of the sample.
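The two decision statistics described above can be sketched as follows. The exact expression for PNA is not recoverable from the text, so the form used here is an assumption: the sum of squared autocorrelation values at nonzero lags, divided by R(0) squared. The function names are likewise hypothetical. The direct autocorrelation below is quadratic in the sample length, so it is only practical for short test signals; a full 1-s sample at 22 050 Hz would call for an FFT-based implementation.

```python
import math

def autocorrelation(x):
    """Biased discrete autocorrelation R(k) = (1/N) * sum_n x[n] * x[n+k].

    Every lag is divided by N rather than by the N - k overlapping terms,
    i.e. longer lags are not compensated for having fewer products.
    """
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(n)]

def rms_statistic(x):
    """R(0): the average power of the sample."""
    return sum(v * v for v in x) / len(x)

def pna_statistic(x):
    """ASSUMED form of PNA: power of R at nonzero lags divided by R(0)**2."""
    r = autocorrelation(x)
    if r[0] == 0.0:
        return 0.0  # silent sample: nothing periodic to report
    return sum(v * v for v in r[1:]) / (r[0] * r[0])

def detect(statistic, beta):
    """Report an alarm when the statistic exceeds the threshold beta."""
    return statistic > beta
```

Under this assumed form the PNA statistic does not change when the sample is rescaled, which is consistent with the stated purpose of the normalization: overall level is left to the RMS detector, and PNA responds only to periodic structure.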
Results
Figure 1 gives the ROCs for each class of alarm (circles = bells/chimes, squares = buzzers/beepers, triangles = horns/whistles, and diamonds = sirens) for each noise background (panels). The ROCs for RMS and PNA are given, respectively, by filled and unfilled symbols. Both detection algorithms produce asymmetric ROCs and typically yield hit rates of 80% or more for false-alarm rates above 25%. PNA gives significantly higher hit rates than RMS at false-alarm rates below 25%, but does not perform as well for sirens or when the background is music. For sirens, in fact, the ROCs fall below the positive diagonal, so that an alarm is actually less likely to be reported for alarm + background than for background alone. The poor performance of PNA in these cases is to be expected. Music, as a background, is already highly periodic; hence, the addition of an alarm does not greatly change the ambient periodicity. Sirens, on the other hand, are highly aperiodic inasmuch as they have a high degree of frequency modulation (FM). Adding a siren to the background, particularly a siren with a high rate of FM such as the ones used here, is thus more likely to decrease periodicity, resulting in fewer hits than false alarms. With these two exceptions, PNA nonetheless significantly outperforms RMS at low false-alarm rates. The general pattern of results was, moreover, the same at both a higher (+10 dB) and a lower (−10 dB) alarm-to-noise ratio.
Figure 1.
ROCs are shown for each class of alarm (○ = bells/chimes, □ = buzzers/beepers, △ = horns/whistles, and ◊ = sirens) in each of the four different noise backgrounds (panels). ROCs for RMS and PNA are given, respectively, by filled and unfilled symbols.
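For readers who wish to reproduce curves of this kind, the following sketch shows one way to trace an ROC by sweeping the threshold β over decision statistics computed from alarm + background samples and from background-alone samples. It also shows one possible reading of the adaptive threshold selection mentioned in the Method, in which β is chosen from background-only samples to meet a target false-alarm rate. Both function names are hypothetical, and the procedure is an assumption about how such curves are typically built rather than a description of the authors' software.

```python
def roc_points(signal_stats, noise_stats):
    """Sweep beta over every observed statistic value and return
    (false_alarm_rate, hit_rate) pairs sorted by false-alarm rate."""
    thresholds = sorted(set(signal_stats) | set(noise_stats))
    # One extra threshold below all values so the curve reaches (1, 1).
    thresholds = [thresholds[0] - 1.0] + thresholds
    points = []
    for beta in thresholds:
        hits = sum(s > beta for s in signal_stats) / len(signal_stats)
        false_alarms = sum(s > beta for s in noise_stats) / len(noise_stats)
        points.append((false_alarms, hits))
    return sorted(points)

def threshold_for_false_alarm_rate(noise_stats, max_rate):
    """Smallest observed beta whose false-alarm rate on background-only
    samples does not exceed max_rate."""
    for beta in sorted(noise_stats):
        rate = sum(s > beta for s in noise_stats) / len(noise_stats)
        if rate <= max_rate:
            return beta
    # Reached only if max_rate is negative; fall back to the strictest value.
    return max(noise_stats)
```

A curve that falls below the positive diagonal, as reported here for sirens under PNA, appears in this representation as points whose hit rate is lower than their false-alarm rate.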
Conclusions
Alarm sounds occur rarely in everyday listening, and when they do, they typically occur in a background of ambient noise. Viable alarm-sound alerting technologies, therefore, need to operate at low false-alarm rates and potentially low signal-to-noise ratios. Past technologies based on sinusoidal modeling [1], standard speech recognition algorithms [1], and spectral/temporal template matching [3] are computationally intensive and have had limited success in this regard. However, a technology based on the PNA requires comparatively few computational resources and appears to achieve the desired criteria in most cases, the notable exceptions being frequency-modulated sirens and backgrounds of music. The success is due in large measure to the fact that alarm sounds tend to be more periodic than the ambient environmental noise in which they occur. The results suggest that PNA, when combined with more traditional RMS detection algorithms, may hold promise as a means of improving alarm-sound alerting technologies for the hard-of-hearing.
Acknowledgment
The authors would like to thank Dr. Christophe Micheyl and an anonymous reviewer for helpful comments on an earlier version of this manuscript. This research was supported by NIDCD Grant No. R01 DC001262-20.
References and links
1. Ellis D., “Detecting alarm sounds,” Recognition and Organization of Real-World Sounds: Workshop on Consistent and Reliable Acoustic Cues, Aalborg, Denmark (2001), pp. 59–62.
2. Bregman A.S., Auditory Scene Analysis: The Perceptual Organization of Sound (MIT Press, Cambridge, MA, 1990), pp. 1–773.
3. Xiao X., Yao H., and Guo C., “Automatic detection of alarm sounds in cockpit voice recordings,” Proceedings of the 2009 IITA International Conference on Control, Automation and Systems Engineering (2009), pp. 599–602.

