Abstract
The paper considers the problem of detecting and classifying acoustic signals based on information (entropy) criteria. A number of new information features based on time-frequency distributions are proposed, including the spectrogram and its upgraded version, the reassigned spectrogram. To confirm and verify the proposed characteristics, modeling on synthetic signals and numerical verification of the solution of the multiclass classification problem by machine learning methods on real hydroacoustic recordings are carried out. The high classification results obtained allow us to assert the advantages of using the proposed characteristics.
Keywords: information entropy, spectral complexity, time-frequency distribution
1. Introduction
The history of signal detection and classification problems is associated with the development of processing methods whose main mathematical apparatus is signal transformations that transfer a time series from the time domain to another signal space. The most famous frequency transform is the Fourier transform, which has become a de facto synonym for frequency signal decomposition due to its mathematical validity and algorithmic efficiency. However, a strong limitation of the Fourier transform, and of any other purely frequency transformation, remains its inability to track changes in the frequency components of the signal over time. To solve this problem, time-frequency transformations come to the rescue, the most popular of which is again the windowed (short-time) Fourier transform and its squared magnitude, the spectrogram, which is the simplest and most efficient time-frequency transformation. Cohen [1] introduced and described a general class of such transformations, special cases of which are the Wigner–Ville, Rihaczek, and Choi–Williams transformations, among others. Thus, a method based on the Wigner–Ville transformation has become very popular in radar processing due to the convenience of using it to resolve linear frequency modulation (LFM) signals.
A comprehensive description of time-frequency transformations is contained in the monograph by Boualem Boashash [2]. In particular, this monograph, together with a number of the author's other works [3,4,5], investigates the properties of the Rényi entropy for estimating the number of components of multicomponent signals and provides examples of using this estimate as a classification feature in signal classification problems. An original method for estimating the number of signal components based on entropy was also proposed by the current article's authors in [6].
Since the basis of a time-frequency distribution (TFD) is the kernel function generated by the reference signal, many authors compare TFDs with various kernel functions on different signals, as illustrated in the review [7].
In the case of an unknown reference signal, refs. [8,9] consider three variants of the distance between TFDs based on the Kullback–Leibler divergence, the Jensen–Shannon divergence, and the Rényi cross-entropy between the two distributions. A cross-entropy approach is also utilized in [10]. In turn, Baraniuk et al. in [11,12,13] offer a modified formula for calculating the Jensen–Shannon divergence, taking into account the possible inconsistency of the TFD, where the geometric mean of two TFDs is used instead of the arithmetic mean.
The problems of detecting moving targets are considered in [14,15], where the Wigner–Ville entropy of the time-frequency distribution is calculated, and the decision to detect a target is made when an information metric exceeds a threshold obtained with the constant false alarm rate (CFAR) algorithm, an adaptive algorithm often used in radar to detect targets in the presence of noise, clutter, and interfering signals.
The work [16] explores the use of entropy in the problem of voice activity detection (VAD) for speech signal processing.
Entropic approaches to the processing of medical electroencephalogram (EEG) signals are investigated in [17], and in [18], the entropy from the Stockwell time-frequency transform (also known as the S-transform) is used to classify and detect heart valve pathology from electrocardiography (ECG) recordings.
In the articles [19,20], the entropy of TFD is used to solve problems of processing technical industrial signals related to monitoring the condition of production equipment.
The current work is devoted to the study of the concept of information complexity of time-frequency distributions in the problem of signal detection and classification. The novelty of the proposed approach is as follows:
introduction of the concept of the TFD information complexity;
using Rényi entropy to calculate the information complexity of two-dimensional probability distributions;
application of the proposed information characteristics to the classification problem of acoustic signals.
The article consists of five main Sections, an Introduction, and a Conclusion. In Section 2 the classification problem of time series of acoustic origin is briefly formulated. In Section 3, Section 4 and Section 5, the necessary mathematical representations of time-frequency distributions are introduced, some of their previously unknown properties are investigated, and entropy and information criteria based on statistical complexity of various nature are proposed, as well as different ways for calculation of discrete distributions. Section 6 is devoted to a classification experiment of real natural and technical recordings using machine learning methods. In Conclusions (Section 7), the main results of the work are analyzed and plans for further research are outlined.
2. Statement of Signal Classification Problem
The problem of multiclass signal classification is based on a training dataset consisting of a set of objects $\{x_i\}_{i=1}^{Q}$, where each object $x_i$ represents a studied signal of length $N$, and labels $y_i$ such that $y_i \in \{1, \dots, K\}$, where $K$ is the number of classes ($K = 2$ corresponds to a binary classification, i.e., a detection problem, when it is necessary to determine the presence or absence of a useful signal in a signal–noise mixture).
It is required to construct a mapping (parametric dependence model) $f(x, \theta)$ with a vector of model parameters $\theta$ that approximates the real unknown dependence according to some criterion of empirical risk minimization,
$$\hat{\theta} = \arg\min_{\theta} \frac{1}{Q} \sum_{i=1}^{Q} \mathcal{L}\bigl(f(x_i, \theta), y_i\bigr), \qquad (1)$$
where $\mathcal{L}$ is the loss function, which indicates the deviation of the model output $f(x_i, \theta)$ from the correct answer $y_i$.
The next step is the formation of a feature space. Each entry $x_i$ can be matched with basic signal characteristics: statistical features of the normalized/whitened signal, its discrete Fourier spectrum
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-\frac{2\pi i}{N} k n}, \qquad k = 0, \dots, N-1, \qquad (2)$$
or the corresponding power spectrum
$$|X[k]|^2, \qquad k = 0, \dots, N-1, \qquad (3)$$
as well as various information characteristics based on two-dimensional time-frequency distributions presented in the following sections of the article. The purpose of the study is to examine the appropriateness of these characteristics for solution of the signal classification problem.
In various fields of physics and biology, the concepts of a useful signal, noise, and signal–noise mixture are defined independently. In hydroacoustics, which is the main application area of the current study, environmental noise is random in nature, while the useful signal is determined by the operation of ship mechanisms and is deterministic but unknown; most importantly, the signal-to-noise ratio is available only for indirect measurement.
3. Time-Frequency Distributions
3.1. Spectrogram and Wigner–Ville Distribution
Despite the limited time-frequency resolution, the simplicity of determining the spectrogram makes it one of the most popular time-frequency distributions, both for initial comparative theoretical analysis and as a reference in practical applications [2].
The short-time continuous Fourier transform of the signal $x(t)$ is written as follows:
$$F_x(t, \omega) = \int_{-\infty}^{+\infty} x(\tau)\, h(\tau - t)\, e^{-i\omega\tau}\, d\tau, \qquad (4)$$
where $h(\tau)$ is some window function. The spectral power density, or the spectrogram $S_x$, corresponds to the squared magnitude of this value:
$$S_x(t, \omega) = \left| F_x(t, \omega) \right|^2. \qquad (5)$$
The short-time discrete Fourier transform of a signal $x$ with a window function $h$ of length $N_w$ is defined by the following expression:
$$F_x[n, k] = \sum_{m=0}^{N_w - 1} x[nH + m]\, h[m]\, e^{-\frac{2\pi i}{N_w} k m}, \qquad (6)$$
where $N_w$ is the length of one window processed with the DFT, $H$ is the hop length between adjacent windows, and $n$ is the sequential number of the window. Consequently, the discrete spectrogram is defined as
$$S[n, k] = \left| F_x[n, k] \right|^2, \qquad n = 0, \dots, M-1, \quad k = 0, \dots, N-1, \qquad (7)$$
where $M$ is the total number of windows, and $N$ is the total number of frequencies, taking into account the symmetry of the spectrum.
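A minimal NumPy sketch of the discrete STFT and spectrogram (6)–(7) might look as follows; the Hann window, window length, and hop length are illustrative assumptions rather than the parameters used in the experiments.

```python
import numpy as np

def discrete_spectrogram(x, n_win=512, hop=256):
    """Discrete spectrogram (7): squared magnitude of the windowed DFT (6)."""
    m = np.arange(n_win)
    h = 0.5 * (1.0 - np.cos(2.0 * np.pi * m / (n_win - 1)))   # Hann window
    # frames of length n_win taken every `hop` samples
    frames = np.lib.stride_tricks.sliding_window_view(x, n_win)[::hop]
    F = np.fft.rfft(frames * h, axis=-1)                      # STFT, one row per window
    return np.abs(F) ** 2                                     # S[n, k], shape (M, N)

# usage: S = discrete_spectrogram(np.random.randn(16384))
```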
The Wigner–Ville distribution is a prototype of distributions that differ qualitatively from the spectrogram. Exploring its strengths and weaknesses has become one of the main directions in the development of this field. Wigner was aware of the existence of other joint densities, but chose the one that has now become the Wigner distribution “because it seems to be the simplest.” The Wigner distribution was introduced into signal analysis by Ville [1] about 15 years after Wigner’s paper. Ville argued for its plausibility and derived it using a method based on characteristic functions.
The Wigner–Ville distribution of the signal $x(t)$ is written as follows:
$$W_x(t, \omega) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} x\!\left(t + \frac{\tau}{2}\right) x^{*}\!\left(t - \frac{\tau}{2}\right) e^{-i\omega\tau}\, d\tau. \qquad (8)$$
The Wigner distribution is considered bilinear in terms of the signal, since the signal is included twice in its calculation.
Remark 1.
$W_x(t, \omega)$ will always have regions of negative values for any signal, with one exception: the LFM chirp signal with a Gaussian amplitude envelope. The reason is that for this signal the Wigner distribution is not bilinear and belongs to the class of positive distributions.
Remark 2.
There is a connection between the spectrogram and the Wigner–Ville distribution through the window function. This relation connects two bilinear distributions, namely
$$S_x(t, \omega) = \frac{1}{2\pi} \iint W_x(u, \xi)\, W_h(u - t, \xi - \omega)\, du\, d\xi, \qquad (9)$$
where $W_h$ is the Wigner–Ville transform of the window function $h$.
3.2. Reassigned Spectrogram
An important consequence of formula (9) is that the value that the spectrogram takes at each point $(t, \omega)$ is the result of summing the values of $W_x$ within a certain time-frequency domain. In other words, $S_x(t, \omega)$ is a number assigned to the geometric center of the region over which the summation is performed. This is similar to assigning the total mass of an object to its geometric center, an arbitrary point that, with rare exceptions, has no reason to correspond to the actual mass distribution.
A much more sensible choice is to assign the total mass to the center of gravity, and it is precisely this approach that corresponds to the reassignment principle [2]:
- at each point $(t, \omega)$ where the value of the spectrogram is defined, two values are also calculated,
$$\hat{t}(t, \omega) = \frac{\iint u\, W_x(u, \xi)\, W_h(u - t, \xi - \omega)\, du\, d\xi}{\iint W_x(u, \xi)\, W_h(u - t, \xi - \omega)\, du\, d\xi}, \qquad \hat{\omega}(t, \omega) = \frac{\iint \xi\, W_x(u, \xi)\, W_h(u - t, \xi - \omega)\, du\, d\xi}{\iint W_x(u, \xi)\, W_h(u - t, \xi - \omega)\, du\, d\xi}, \qquad (10)$$
which define the local distribution centers of $W_x$ seen through the window centered at $(t, \omega)$;
- then the value of the spectrogram is moved from the point $(t, \omega)$ to this centroid $(\hat{t}, \hat{\omega})$, which allows us to determine the reassigned spectrogram as follows:
$$RS_x(t', \omega') = \iint S_x(t, \omega)\, \delta\bigl(t' - \hat{t}(t, \omega)\bigr)\, \delta\bigl(\omega' - \hat{\omega}(t, \omega)\bigr)\, dt\, d\omega, \qquad (11)$$
where $\delta$ is the Dirac delta function. In general, for $(\hat{t}, \hat{\omega}) \neq (t, \omega)$, the value of the spectrogram corresponds to a new position on the time-frequency plane, namely, it is moved to the point $(\hat{t}(t, \omega), \hat{\omega}(t, \omega))$.
In practice, according to [21], to calculate the centroid $(\hat{t}, \hat{\omega})$, one can use a more efficient procedure based on the STFT:
$$\hat{t}(t, \omega) = t + \Re\!\left(\frac{F_x^{\mathcal{T}h}(t, \omega)}{F_x^{h}(t, \omega)}\right), \qquad \hat{\omega}(t, \omega) = \omega - \Im\!\left(\frac{F_x^{\mathcal{D}h}(t, \omega)}{F_x^{h}(t, \omega)}\right), \qquad (12)$$
where $\mathcal{D}h(\tau) = \frac{d h(\tau)}{d\tau}$ is the derivative of the used window function $h$, and $\mathcal{T}h(\tau) = \tau\, h(\tau)$.
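A minimal NumPy sketch of the reassignment procedure (10)–(12), with sign conventions matching the STFT definition used above; the window parameters, the energy threshold, and the accumulation grid are illustrative assumptions.

```python
import numpy as np

def reassigned_spectrogram(x, fs, n_win=512, hop=128):
    """Reassignment via three STFTs, as in Eq. (12); energy is then
    accumulated on a regular grid at the reassigned coordinates, cf. Eq. (11)."""
    m = np.arange(n_win)
    h = 0.5 * (1.0 - np.cos(2 * np.pi * m / (n_win - 1)))            # Hann window
    dh = (np.pi / (n_win - 1)) * np.sin(2 * np.pi * m / (n_win - 1))  # dh/dm
    th = m * h                                                        # time-weighted window

    frames = np.lib.stride_tricks.sliding_window_view(x, n_win)[::hop]
    F_h = np.fft.rfft(frames * h, axis=-1)
    F_dh = np.fft.rfft(frames * dh, axis=-1)
    F_th = np.fft.rfft(frames * th, axis=-1)

    S = np.abs(F_h) ** 2                                              # ordinary spectrogram (7)
    eps = np.finfo(float).eps
    ratio_t = F_th / (F_h + eps)
    ratio_f = F_dh / (F_h + eps)

    n_frames, n_bins = F_h.shape
    t_hat = (np.arange(n_frames)[:, None] * hop + ratio_t.real) / fs             # seconds
    f_hat = (np.arange(n_bins)[None, :] / n_win - ratio_f.imag / (2 * np.pi)) * fs  # Hz

    # discard bins with negligible energy, whose reassigned coordinates are unreliable
    mask = S > (1e-8 * S.max())
    t_edges = np.linspace(0, len(x) / fs, n_frames + 1)
    f_edges = np.linspace(0, fs / 2, n_bins + 1)
    RS, _, _ = np.histogram2d(t_hat[mask], f_hat[mask],
                              bins=[t_edges, f_edges], weights=S[mask])
    return S, RS
```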
4. Entropy of Time-Frequency Distributions
4.1. Classical Information Criteria and Discrete Distributions
In previous articles by the authors, the normalized Shannon entropy is studied, which is calculated from the squared amplitude spectral distribution (3):
$$H = -\frac{1}{\log N} \sum_{k=1}^{N} p_k \log p_k, \qquad (13)$$
where $p_k = |X[k]|^2 / \sum_{j=1}^{N} |X[j]|^2$ is a normalized discrete distribution, so that $\sum_{k=1}^{N} p_k = 1$.
When calculating the sum (13), it is assumed that $0 \log 0 = 0$ by continuity, and this assumption is valid for all subsequent equations.
This measure of entropy evaluates the uniformity of the distribution of signal energy in the frequency domain. High spectral entropy means greater uniformity in the distribution of signal energy, while low entropy means less uniformity. Spectral entropy can be used to discriminate a narrowband signal from a broadband one, for example, to distinguish between a tone signal and white noise; however, it cannot be used to distinguish two broadband signals, for example, an FM signal and noise.
In fact, time-frequency distributions concentrate energy for frequency-modulated signals in the same way that the Fourier transform concentrates energy for harmonic components. Thus, time-frequency extensions of entropy measures can distinguish between two different classes of broadband signals when the energy of one class is evenly distributed in the time-frequency domain, for example, in the case of white noise, while the energy of the second class is concentrated in a certain area of the time-frequency plane, for example, in the case of an FM signal [2].
Time-frequency distributions that are positive over the entire analysis domain and normalized to the total energy can be used to calculate information characteristics of a signal. In this case, the density of the discrete distribution is determined by the formula
$$p^{S}_{nk} = \frac{S[n, k]}{\sum_{n=1}^{M} \sum_{k=1}^{N} S[n, k]}, \qquad (14)$$
where the index $S$ is associated with the spectrogram (7). In the case of the reassigned spectrogram (11), the index $R$ and the notation $p^{R}_{nk}$ will be used later in the text.
The Shannon entropy of a time-frequency distribution is an extension of the spectral entropy and can be obtained from it by replacing the Fourier transform of the signal with the time-frequency transform in Equation (13) and then replacing the one-dimensional summation with a two-dimensional one. In the discrete case, it can be defined as
$$H^{S} = -\frac{1}{\log(MN)} \sum_{n=1}^{M} \sum_{k=1}^{N} p^{S}_{nk} \log p^{S}_{nk}. \qquad (15)$$
However, most time-frequency transformations do not have the property of positivity, so researchers prefer to use the Rényi entropy as an entropy measure,
$$H_{\alpha} = \frac{1}{1 - \alpha} \log \sum_{n=1}^{M} \sum_{k=1}^{N} \left(p_{nk}\right)^{\alpha}, \qquad (16)$$
where the index $\alpha$ defines the order of the Rényi entropy ($\alpha > 0$, $\alpha \neq 1$); odd orders $\alpha \geq 3$ are usually used in time-frequency analysis [5]. When the parameter $\alpha$ tends to one, the Rényi entropy converges to the Shannon entropy.
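A minimal sketch of the discrete distribution (14) and the entropies (15)–(16) for a nonnegative TFD; the normalization of the Shannon entropy by the logarithm of the number of cells mirrors Equation (13) and is an assumption of this illustration.

```python
import numpy as np

def tfd_distribution(S):
    """Normalize a nonnegative TFD to a discrete probability distribution, Eq. (14)."""
    return S / S.sum()

def shannon_entropy(p):
    """Normalized Shannon entropy, Eqs. (13)/(15); 0*log 0 is taken as 0 by continuity."""
    n_cells = p.size
    p = p[p > 0]
    return -(p * np.log(p)).sum() / np.log(n_cells)

def renyi_entropy(p, alpha=3.0):
    """Rényi entropy of order alpha, Eq. (16); tends to the Shannon entropy as alpha -> 1."""
    return np.log((p[p > 0] ** alpha).sum()) / (1.0 - alpha)

# usage: p = tfd_distribution(S); H, H3 = shannon_entropy(p), renyi_entropy(p, alpha=3.0)
```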
According to [5], the value of the short-term Rényi entropy for a spectrogram of monocomponent harmonic signal is
| (17) |
Equation (17) shows that the local entropy of a time slice of a single-component signal spectrogram depends on the duration of the interval, the standard deviation $\sigma$ of the spectrogram window, and the order $\alpha$ of the Rényi entropy. The local Rényi entropy decreases with increasing parameter $\alpha$. Large values of the entropy order emphasize the peaked character of the spectrogram, which means they reduce the entropy. Note also that the second term in Equation (17) decreases with increasing entropy order $\alpha$. For a two-component signal, one can similarly obtain
| (18) |
Hence follows the counting property of the Rényi entropy, i.e., the direct dependence of the entropy on the number of harmonic components in the signal, which makes it a very powerful feature for estimating that number.
4.2. New Information Criteria and Discrete Distributions
Another way to determine the discrete density is to normalize the spectrum of each signal processing window (each row of the spectrogram matrix in the notation of (7)) to the total energy of that window, as shown below:
$$\omega_{nk} = \frac{S[n, k]}{\sum_{k=1}^{N} S[n, k]}, \qquad n = 1, \dots, M. \qquad (19)$$
Let us establish the properties of the matrix $\Omega = (\omega_{nk})$.
Remark 3.
The matrix $\Omega$ calculated by formula (19) is a right stochastic matrix by construction, provided that it is square ($M = N$).
By the ergodic theorem, for a regular stochastic matrix $\Omega$ there exists a vector $\Pi$ such that
$$\lim_{s \to \infty} \Omega^{s} = \mathbb{1}\, \Pi,$$
where $\mathbb{1} = (1, 1, \dots, 1)^{T}$ is a unit vector.
Remark 4.
The matrix $\Omega$ is a right stochastic matrix. Therefore, the operations for calculating information characteristics, in particular the Shannon and Rényi entropies, are defined for this matrix.
Remark 5.
The Shannon and Rényi entropies can be calculated from the matrix $\Omega$ and the vector $\Pi$. It turns out that
(20)
At the same time, the maximum value of the Rényi entropy is $\log N$.
It should be noted that for matrix elements, on average, is valid for all .
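A sketch of the per-window normalization (19) and of the vector Π from the ergodic theorem, approximated here by simple power iteration; the square shape and regularity of Ω are assumptions required by the theorem, as noted in Remark 3.

```python
import numpy as np

def row_stochastic(S):
    """Eq. (19): normalize each spectrogram row (one processing window) to unit sum."""
    return S / S.sum(axis=1, keepdims=True)

def stationary_vector(omega, n_iter=200):
    """Approximate the vector Pi in lim Omega^s = 1*Pi by repeated left multiplication.
    Requires a square, regular (right stochastic) matrix omega."""
    pi = np.full(omega.shape[0], 1.0 / omega.shape[0])
    for _ in range(n_iter):
        pi = pi @ omega        # row sums of omega are 1, so pi stays a distribution
    return pi
```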
5. Complexity of Time-Frequency Distributions
In our earlier works [22,23], a characteristic called statistical complexity was investigated. The concepts of the disequilibrium function $D$ and the statistical complexity $C$ of a discrete probability distribution $P$ were first introduced in [24]:
$$C[P] = D[P, Q] \cdot H[P],$$
where the disequilibrium $D[P, Q]$ determines the distance between the signal distribution $P$ and the noise distribution $Q$, and $H[P]$ is the entropy of the signal distribution $P$.
The simplest example of a disequilibrium function is the Euclidean distance in the space of discrete probability distributions, which is convenient when comparing with noise having a uniform spectral distribution $Q_u = (1/N, \dots, 1/N)$:
$$D_E[P, Q_u] = \sum_{k=1}^{N} \left( p_k - \frac{1}{N} \right)^2. \qquad (21)$$
The statistical complexity defined through the disequilibrium (21) has the form
$$C_E[P] = D_E[P, Q_u] \cdot H[P]. \qquad (22)$$
The Jensen–Shannon disequilibrium and the corresponding statistical complexity are defined as
$$D_{JS}[P, Q] = JSD[P, Q], \qquad C_{JS}[P] = D_{JS}[P, Q] \cdot H[P], \qquad (23)$$
where $JSD$ is the Jensen–Shannon divergence
$$JSD[P, Q] = H\!\left[\frac{P + Q}{2}\right] - \frac{1}{2} H[P] - \frac{1}{2} H[Q]. \qquad (24)$$
The disequilibrium based on the total variation distance and the corresponding statistical complexity are defined as
$$D_{TV}[P, Q] = \frac{1}{2} \sum_{k=1}^{N} \left| p_k - q_k \right|, \qquad C_{TV}[P] = D_{TV}[P, Q] \cdot H[P]. \qquad (25)$$
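A minimal sketch of the disequilibrium and complexity measures (21)–(25) for a one-dimensional discrete distribution, using the uniform distribution as the noise reference; the normalization conventions follow the formulas above and are illustrative. For a two-dimensional TFD distribution, the same functions can be applied to the flattened array.

```python
import numpy as np

def shannon_entropy(p):
    """Normalized Shannon entropy of a discrete distribution, as in Eq. (13)."""
    n = p.size
    p = p[p > 0]
    return -(p * np.log(p)).sum() / np.log(n)

def euclidean_complexity(p):
    """Eqs. (21)-(22): Euclidean disequilibrium to the uniform distribution times entropy."""
    d_e = ((p - 1.0 / p.size) ** 2).sum()
    return d_e * shannon_entropy(p)

def jensen_shannon_complexity(p, q):
    """Eqs. (23)-(24): Jensen-Shannon disequilibrium between P and Q times the entropy of P."""
    m = 0.5 * (p + q)
    jsd = shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))
    return jsd * shannon_entropy(p)

def total_variation_complexity(p, q):
    """Eq. (25): total variation disequilibrium times the entropy of P."""
    return 0.5 * np.abs(p - q).sum() * shannon_entropy(p)
```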
Thus, transitioning from frequency distributions to time-frequency distributions naturally leads us to introduce and define a complexity function for them. Obvious candidates as a disequilibrium function for time-frequency distributions are as follows:
Kullback–Leibler divergence $D_{KL}[P \| Q]$;
Rényi divergence $D_{\alpha}[P \| Q]$ (it can be considered a generalization of $D_{KL}$, since when $\alpha$ tends to 1 it becomes $D_{KL}$);
Jensen–Shannon divergence for the Rényi entropy $D_{JS}^{\alpha}[P, Q]$;
Euclidean distance $D_E[P, Q]$;
Total variation distance $D_{TV}[P, Q]$.
While the last two functions are trivial in terms of the transition to two-dimensional distributions, the first three items of the list require some comments and are discussed in more detail below.
5.1. Rényi Divergence
The Rényi divergence between two time-frequency probability distributions $P$ and $Q$ has the form
$$D_{\alpha}[P \| Q] = \frac{1}{\alpha - 1} \log \sum_{n=1}^{M} \sum_{k=1}^{N} \frac{p_{nk}^{\alpha}}{q_{nk}^{\alpha - 1}}, \qquad (26)$$
and it becomes the Kullback–Leibler divergence when $\alpha$ tends to 1.
When $\alpha \to 1$, one can write
$$D_{KL}[P \| Q] = \sum_{n=1}^{M} \sum_{k=1}^{N} p_{nk} \log \frac{p_{nk}}{q_{nk}}. \qquad (27)$$
A symmetrized divergence is often used. It is relatively straightforward to calculate and takes the following form:
$$D_{\alpha}^{sym}[P, Q] = \frac{1}{2} \left( D_{\alpha}[P \| Q] + D_{\alpha}[Q \| P] \right). \qquad (28)$$
In the current work, such a distance is not applicable as a disequilibrium, since in practice the uniform reference distribution $Q$ is used.
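A minimal sketch of the Rényi divergence (26) and its symmetrized variant (28); the order α = 3 is an illustrative assumption.

```python
import numpy as np

def renyi_divergence(p, q, alpha=3.0):
    """Rényi divergence D_alpha[P || Q], Eq. (26); tends to the KL divergence as alpha -> 1.
    Terms with p = 0 contribute nothing; a zero in q where p > 0 yields an infinite divergence."""
    mask = p > 0
    p, q = p[mask], q[mask]
    return np.log((p ** alpha / q ** (alpha - 1.0)).sum()) / (alpha - 1.0)

def symmetrized_renyi_divergence(p, q, alpha=3.0):
    """Symmetrized form, Eq. (28)."""
    return 0.5 * (renyi_divergence(p, q, alpha) + renyi_divergence(q, p, alpha))
```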
5.2. Jensen–Shannon Divergence for Time-Frequency Distributions
Richard Baraniuk et al. in their articles [11,12,13] introduced an analog of the Jensen–Shannon distance for TFDs,
$$D_{JS}^{\alpha}[P, Q] = H_{\alpha}\!\left[\sqrt{P \cdot Q}\,\right] - \frac{1}{2}\left( H_{\alpha}[P] + H_{\alpha}[Q] \right), \qquad (29)$$
where $\sqrt{P \cdot Q}$ denotes the matrix with elements $\sqrt{p_{nk} q_{nk}}$, and the sign “·” is an element-wise matrix multiplication. Now let us explore the properties of the function (29).
Lemma 1.
For any discrete distributions $P$ and $Q$ and for $\alpha > 1$, it is valid that
$$H_{\alpha}\!\left[\sqrt{P \cdot Q}\,\right] \geq \frac{1}{2}\left( H_{\alpha}[P] + H_{\alpha}[Q] \right), \qquad (30)$$
and equality is possible only if $P = Q$.
Proof.
To prove the lemma, we need to evaluate the expression
$$\left( \sum_{i} p_i^{\alpha/2} q_i^{\alpha/2} \right)^{2} \leq \sum_{i} p_i^{\alpha} \sum_{i} q_i^{\alpha},$$
which is the squared Cauchy–Bunyakovsky inequality, where $p_i^{\alpha/2}$ and $q_i^{\alpha/2}$ are the $i$-th elements of the corresponding distributions $P$ and $Q$ raised to the power of $\alpha/2$. By applying the operation $\frac{1}{1-\alpha}\log(\cdot)$ to this inequality, we obtain the statement of the lemma. □
Lemma 2.
Let $\sum_i p_i = 1$ and $\sum_i q_i = 1$ be valid for discrete densities $P$ and $Q$, and let $Q = P + \Delta P$, where $\sum_i \Delta p_i = 0$ and $|\Delta p_i| \ll p_i$. Then
(31)
Proof.
Let us decompose the function $H_{\alpha}\!\left[\sqrt{P \cdot (P + \Delta P)}\,\right]$ of many variables into a Taylor series up to the third term in the neighborhood of the matrix $P$. Here and below, summation is performed over repeated indices. We get
(32)
In turn, decomposing the function $\frac{1}{2}\left( H_{\alpha}[P] + H_{\alpha}[P + \Delta P] \right)$ into a Taylor series in the vicinity of the same point gives
(33)
Subtracting the penultimate expression from the last one gives the statement of the lemma. □
Let us consider an elementary example to illustrate Lemma 2.
Example 1.
Let us choose
(34) Then . The calculation of entropies and divergence gives
(35) On the other hand, according to Lemma 2 we have and
As one can see, the direct calculation using formula (29) and the approximation from Lemma 2 give close results.
5.3. New Information Characteristics
Summing up the discussion of the divergences of TFDs from previous Sections, it is proposed to investigate the following signal characteristics for signal classification:
- Related to the Shannon entropy:
(36)
- Related to the Rényi entropy:
(37)
Moreover, the discrete distributions $P$ can be calculated over different supports and carry the corresponding indexes:
$p_k$ for the spectrum (3): a one-dimensional discrete distribution;
$p^{S}_{nk}$ for the spectrogram (7): a two-dimensional discrete distribution;
$p^{R}_{nk}$ for the reassigned spectrogram (11): a two-dimensional discrete distribution.
Remark 6.
Summation in (36) and (37) is performed:
over the index $k$ for one-dimensional discrete distributions;
over the pairs of indices $(n, k)$ for two-dimensional discrete distributions.
Thus, systems (36) and (37), together with the different discrete distributions, provide a significant number of information characteristics, which can be used as classification features and are explored in the next section; a schematic assembly of such a feature vector is sketched below.
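The sketch combines the entropy and complexity functions introduced above over the spectrum-based and spectrogram-based distributions; its particular composition is an illustrative assumption rather than the exact feature sets (36) and (37).

```python
import numpy as np

def entropy(p):
    """Normalized Shannon entropy with 0*log 0 = 0."""
    n = p.size
    p = p[p > 0]
    return -(p * np.log(p)).sum() / np.log(n)

def feature_vector(x, n_win=512, hop=256):
    """Illustrative per-window feature vector: entropy and Euclidean complexity
    computed over the spectral and the spectrogram distributions."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    p_spec = spec / spec.sum()                                   # Eq. (3), normalized
    m = np.arange(n_win)
    h = 0.5 * (1.0 - np.cos(2 * np.pi * m / (n_win - 1)))        # Hann window
    frames = np.lib.stride_tricks.sliding_window_view(x, n_win)[::hop]
    S = np.abs(np.fft.rfft(frames * h, axis=-1)) ** 2            # spectrogram, Eq. (7)
    p_tfd = (S / S.sum()).ravel()                                # Eq. (14)
    feats = []
    for p in (p_spec, p_tfd):
        H = entropy(p)
        d_e = ((p - 1.0 / p.size) ** 2).sum()                    # Eq. (21)
        feats += [H, d_e * H]                                    # H and C_E, Eq. (22)
    return np.array(feats)
```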
6. Modeling
This section is devoted to verifying the theoretical propositions presented in the previous sections and the applicability of the proposed information characteristics for solving the problem of detection and classification in numerical experiments with both model signals and more complex real-world recordings.
6.1. Model Signal Description
To illustrate the obtained theoretical results, the following types of model signals are used:
harmonic signals;
linearly frequency-modulated chirp signals (LFM chirp signals);
model signals of marine vessels.
The harmonic signal has the form
$$x(t) = \sum_{j=1}^{K} A_j \sin\bigl(2\pi (f_0 + (j-1)\Delta f)\, t + \varphi_j\bigr) \qquad (38)$$
and consists of the sum of $K$ harmonic components with amplitudes $A_j$ and frequencies $f_j = f_0 + (j-1)\Delta f$, where $f_0$ and $\Delta f$ are the constant fundamental frequency and the step between the frequencies, respectively. The sampling frequency and window size are selected so that the harmonic samples are not blurred in the resulting spectrum.
The LFM chirp signal is described by the following equation:
$$x(t) = A \sin\!\left( 2\pi \left( f_0 t + \frac{c\, t^2}{2} \right) + \varphi_0 \right) \qquad (39)$$
with an initial phase $\varphi_0$ and represents a signal with frequency varying according to the following linear law:
$$f(t) = f_0 + c\, t, \qquad (40)$$
where $f_0$ is the initial frequency at time $t = 0$. The rate of rise $c$ is determined by the difference of the frequencies at the initial and final time moments, respectively,
$$c = \frac{f_1 - f_0}{T}, \qquad (41)$$
where $f_1$ is the frequency at the final moment $T$.
The signal simulating the acoustic radiation of a marine vessel is modeled according to [25] as
$$x(t) = \left( 1 + \sum_{j=1}^{K} A_j \cos(2\pi j f_{s} t) \right) n_c(t) + n_e(t), \qquad (42)$$
where $f_{s}$ is the shaft frequency, i.e., the rotation frequency of the propeller shaft;
$n_c(t)$ and $n_e(t)$ are the cavitation noise of the propellers and the noise of the marine environment, respectively;
$K$ is the number of harmonic components with the fundamental frequency $f_{s}$ forming the signal;
$A_j$ are the corresponding amplitudes of each component.
Marine environment noise is modeled as white Gaussian noise with parameters chosen to satisfy a predefined signal-to-noise ratio (SNR). The noise of the shaft and propeller rotation is modulated at the shaft rotation frequency $f_s$ and at the blade frequency $f_b = m f_s$, equal to the product of the shaft rotation frequency and the number $m$ of propeller blades [26]. Due to the nonlinear effects that occur during acoustic radiation, the ship's noise spectrum, as a rule, contains harmonics of the shaft and blade frequencies, forming a single tonal scale with a base equal to the shaft frequency. In some cases, the shaft frequency and its harmonics may not appear, and then the spectrum may contain only the blade scale.
Cavitation is the process of formation of discontinuities in the medium during rotation of the propeller, characterized by the appearance of vapor-gas bubbles of various sizes and concentrations in the liquid. Cavitation noise is modeled with white Gaussian noise in a narrow band of cavitation frequencies (from 1 kHz to 3 kHz).
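For reference, the three families of model signals can be generated along the following lines; the sampling rate, amplitudes, shaft frequency, and blade count are illustrative assumptions, while the 1–3 kHz cavitation band follows the description above.

```python
import numpy as np
from scipy.signal import chirp, butter, sosfilt

rng = np.random.default_rng(0)
fs, T = 16000, 1.0
t = np.arange(int(fs * T)) / fs

# (38): sum of K harmonics with fundamental f0 and step df, random phases
K, f0, df = 5, 100.0, 50.0
harm = sum(np.sin(2 * np.pi * (f0 + k * df) * t + rng.uniform(0, 2 * np.pi))
           for k in range(K))

# (39)-(41): LFM chirp sweeping from f0 to f1 over the frame
lfm = chirp(t, f0=200.0, t1=T, f1=3000.0, method='linear')

# (42): cavitation noise amplitude-modulated by shaft/blade harmonics plus ambient noise
f_shaft, m_blades = 5.0, 4
envelope = 1.0 + 0.5 * np.cos(2 * np.pi * f_shaft * t) \
               + 0.3 * np.cos(2 * np.pi * m_blades * f_shaft * t)
sos = butter(4, [1000, 3000], btype='bandpass', fs=fs, output='sos')
cavitation = sosfilt(sos, rng.standard_normal(t.size))      # band-limited (1-3 kHz) noise
ship = envelope * cavitation + 0.1 * rng.standard_normal(t.size)
```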
6.2. Description of Real Signals
Three datasets corresponding to different signal types will be used as real signals:
Bioacoustic signals;
Recordings of hydroacoustic background marine noise;
Hydroacoustic ship signals.
Whale recordings from the dataset of the article [27] were used as bioacoustic signals. The recordings are three-second segments containing whale vocalizations.
The recordings of natural background noise are taken from the QiandaoEar22 dataset [28], recorded at night in calm conditions at Qiandao Lake in China. Despite the recording conditions, the acoustic signals obtained are non-trivial in terms of spectral content and are very different from, for example, synthetic white noise.
The ship recordings are sourced from the DeepShip dataset [29], which is the most popular hydroacoustic dataset for solving classification problems using machine learning methods. The dataset was recorded from 2016 to 2018 in Vancouver Bay. The data in this set are divided into four classes: cargo ship, tugboat, tanker, and passenger ship. The advantage of this set is that it was recorded in a marine environment in different seasons and under different conditions. Along with ship signals, the recordings contain natural background noises, sounds of marine mammals, and noises from other human activities. The distance to the objects ranges from several hundred meters to two thousand meters.
6.3. Statistical Experiments for Detecting Model Signals
To illustrate the analytical results of the article, we use a statistical modeling technique based on the analysis of generated numerical data, described in detail in our previous works [22,23]. All numerical results were obtained using Python 3.12 with the NumPy 1.26 and SciPy 1.16 libraries. The spectrograms are computed using the AudioFlux [30] digital signal processing library.
Let us consider pairs of data sequences corresponding to two hypotheses of signal reception:
$$H_0: \; x[n] = \xi[n], \qquad H_1: \; x[n] = s[n] + \xi[n], \qquad n = 1, \dots, N. \qquad (43)$$
The hypothesis $H_0$ corresponds to a decision that only noise is received, and the hypothesis $H_1$ to a decision that a mixture of a useful signal and noise is received, where the sequence $x[n]$ is the time series of received data, $s[n]$ is the useful signal, $\xi[n]$ is additive white Gaussian noise, and $N$ is the length of the time series of data (i.e., of the frame).
To verify the quality of the separation of the useful deterministic signal and noise, statistics were collected on $Q$ numerically generated frames of the signal–noise mixture of length $2N = 16{,}384$ with a spectrum size of $N = 8192$, respectively, for each type of signal described in Section 6.1. Spectrograms were computed with a Hann window function of fixed window size and hop length. The order $\alpha$ of the Rényi entropy is fixed across all experiments.
In all experiments, the signal for each pair was generated randomly. Thus, for harmonic signals, the number of components with random phases varied. For LFM chirp signals, the initial and final frequencies in the window were randomly selected. The additive white Gaussian noise was obtained by a Gaussian sequence generator with zero mean and fixed variance (within a single set of $Q$ frames). The signal amplitude was selected to satisfy a predefined signal-to-noise ratio (SNR), which is described by the formula
$$\mathrm{SNR} = 10 \log_{10} \frac{E_s}{E_{\xi}}, \qquad (44)$$
where $E_s$ and $E_{\xi}$ are the total energies of the signal and noise, respectively, calculated as the sum of the spectral decomposition powers of the sequences $s[n]$ and $\xi[n]$.
For each resulting sequence $x[n]$, discrete normalized frequency distributions and time-frequency distributions are calculated. Further, based on these distributions, the values of the information characteristics (36) and (37) are calculated for the noise and for the mixture of noise with a signal, corresponding to the two hypotheses in (43).
The final result of the modeling and comparison of the calculated information metrics is the dependence of the binary classification quality (AUC ROC) and of the detection probability on the signal-to-noise ratio.
To obtain such a dependence, for a number of noise variance values corresponding to a set of SNRs from −20 dB to 0 dB, the $Q$-frame statistics described above are collected, and histograms of the information feature distributions are constructed from them, which are then used to calculate the AUC ROC values and the detection probability. An example histogram and AUC ROC graph are shown in Figure 1.
Figure 1.
Histogram and AUC ROC graph for , calculated for Q = 20,000 experiments.
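The detection experiment described above can be reproduced schematically as follows; the frame length, the number of trials, the choice of the spectral entropy as the single test feature, and the use of scikit-learn for the AUC ROC computation are assumptions of this sketch rather than the exact setup of the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
N, Q, snr_db, fs = 8192, 2000, -10.0, 16000
t = np.arange(2 * N) / fs

def spectral_entropy(x):
    """Normalized spectral entropy, Eq. (13)."""
    p = np.abs(np.fft.rfft(x)) ** 2
    p = p / p.sum()
    n = p.size
    p = p[p > 0]
    return -(p * np.log(p)).sum() / np.log(n)

scores, labels = [], []
for _ in range(Q):
    noise = rng.standard_normal(2 * N)
    f0, f1 = sorted(rng.uniform(100, 4000, size=2))
    signal = np.sin(2 * np.pi * (f0 * t + (f1 - f0) / (2 * t[-1]) * t ** 2))  # LFM chirp
    # scale the signal to the requested SNR, Eq. (44)
    gain = np.sqrt(10 ** (snr_db / 10) * (noise @ noise) / (signal @ signal))
    for x, label in ((noise, 0), (gain * signal + noise, 1)):     # hypotheses (43)
        scores.append(spectral_entropy(x))
        labels.append(label)

# entropy drops when a structured signal is present, so its negative serves as the score
print("AUC ROC:", roc_auc_score(labels, -np.array(scores)))
```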
In all the figures presented below, only the most revealing information characteristics are retained in order to simplify the visual understanding of the graphs and get rid of unrepresentative results associated with calculation errors for low SNRs. This approach has no effect on the qualitative conclusions reached at the end of the subsection.
Figure 2 shows a comparison of the AUC ROC metric and the probability of detection for harmonic signals (38) for spectral information characteristics.
Figure 2.
The quality of harmonic signal detection for information characteristics calculated from the spectrum.
Figure 3 shows a comparison between the AUC ROC metric and the detection probability for harmonic signals, focusing on information characteristics calculated from a spectrogram. It can be seen that in the case of pure harmonic signals, the spectral characteristics show themselves much better than their time-frequency counterparts.
Figure 3.
The quality of harmonic signal detection for information characteristics calculated from a spectrogram.
In turn, the reverse situation is observed for LFM chirp signals (39). In this case, the spectral characteristics degrade noticeably even at high signal-to-noise ratios, as shown in Figure 4, whereas their time-frequency counterparts show good detection quality, as illustrated in Figure 5.
Figure 4.
The quality of the LFM chirp signal detection for information characteristics calculated from the spectrum.
Figure 5.
The quality of the LFM chirp signal detection for the information characteristics calculated from the spectrogram.
A similar experiment conducted for model cavitation signals of marine vessels (42) demonstrates similar detection quality for both types of information characteristics, as shown in Figure 6 and Figure 7.
Figure 6.
The quality of the simulated acoustic radiation of a marine vessel signal detection for information characteristics calculated from the spectrum.
Figure 7.
The quality of the simulated acoustic radiation of a marine vessel signal detection for the information characteristics calculated from the spectrogram.
The presented graphs correspond to the intuitive idea that time-frequency distributions and features based on them are able to distinguish signals whose frequency composition changes significantly over a time equal to the duration of the window under consideration.
Therefore, in the case of simple harmonic signals, the constant components have a positive effect on the spectral characteristics, since the spectral analysis window is much longer than the individual windows used to calculate the spectrogram, and the spectral features thus show adequate signal–noise separation quality at lower SNRs.
However, in the case of LFM chirp signals, the spectral composition changes during the observation window, i.e., the frequency transformation over the entire window is no longer able to adequately reflect the signal features, whereas in the time-frequency distribution matrix the frequencies are localized much better, and therefore features based on time-frequency transformations stand out favorably in the problem of detecting such signals.
Lastly, the entropy characteristics based on the distribution (19) are compared with the other criteria. This comparison for LFM chirp signals is illustrated in Figure 8. It can be seen that their detection quality lies between that of the spectral and that of the spectrogram-based characteristics, which is of research interest and may be studied in future work.
Figure 8.
The quality of LFM chirp signal detection for different entropy measures.
6.4. Plane for Classification of Real Signals
Next, we consider the real signals from the datasets described in Section 6.2. It is worth noting that the recordings were not subjected to any pre-processing, except for resampling to a common sampling frequency, which is necessary for the uniformity of the size of the signal windows of all classes of the training dataset, as well as centering and normalization, which are standard procedures in the practice of working with acoustic signals using machine learning methods.
Entropy characteristics are associated with an effective signal classification mechanism based on the analysis of the entropy–complexity plane [31,32]. Without going into the details outlined in the mentioned articles, this approach is justified by the fact that the complexity estimate provides additional insight into the details of the probability distribution of the system, which are not captured by the entropy alone. It can also help reveal information related to the correlation structure between the components of the studied physical process [31]. Hence, the entropy–complexity plane allows one to explore the hidden parameters of the signals [6] and can be used to classify them.
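A sketch of how such a classification plane can be constructed: one point (H, C) is computed per signal window from its spectrogram distribution and colored by class; the helper follows the definitions of Sections 4 and 5, `windows_by_class` is a hypothetical container, and matplotlib is used only for plotting.

```python
import numpy as np
import matplotlib.pyplot as plt

def entropy_complexity(x, n_win=512, hop=256):
    """(H, C_E) of the spectrogram distribution of one signal window."""
    m = np.arange(n_win)
    h = 0.5 * (1.0 - np.cos(2 * np.pi * m / (n_win - 1)))
    frames = np.lib.stride_tricks.sliding_window_view(x, n_win)[::hop]
    S = np.abs(np.fft.rfft(frames * h, axis=-1)) ** 2
    p = (S / S.sum()).ravel()
    H = -(p[p > 0] * np.log(p[p > 0])).sum() / np.log(p.size)
    C = ((p - 1.0 / p.size) ** 2).sum() * H
    return H, C

# `windows_by_class` is a hypothetical dict: class name -> list of 1-D signal windows
def plot_plane(windows_by_class):
    for name, windows in windows_by_class.items():
        pts = np.array([entropy_complexity(w) for w in windows])
        plt.scatter(pts[:, 0], pts[:, 1], s=2, label=name)
    plt.xlabel("entropy H"); plt.ylabel("complexity C"); plt.legend(); plt.show()
```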
Figure 9, Figure 10 and Figure 11 show such planes constructed from signals corresponding to the classes of background marine noise, cargo ship, and bioacoustic signatures of whales. The approximate number of signal windows of each class, and accordingly of dots of each color on the graphs, in this experiment is 20,000.
Figure 9.
Classification planes for the Shannon entropy H and the corresponding complexity for the spectrum and spectrogram distributions.
Figure 10.
Classification planes for the Rényi entropy and the corresponding complexity for the spectrum and spectrogram distributions.
Figure 11.
Classification planes for the Rényi entropy and the corresponding complexity for the spectrum and spectrogram distributions.
It can be seen that while the spectral characteristics do not allow us to reliably separate the described signal classes, the time-frequency characteristics successfully cope with this task.
In addition, it is worth noting how different time-frequency distributions affect the appearance of such diagrams. Figure 12 shows the planes constructed for the conventional (7) and reassigned (11) spectrograms. It can be seen that, for the reassigned spectrogram, the clouds of points corresponding to different classes are grouped more tightly around their centers of mass, and the distances between the different classes become greater.
Figure 12.
Classification planes for the Shannon entropy H and the corresponding complexity for conventional and reassigned spectrograms distributions.
The demonstrated results show that information features of frequency and time-frequency distributions of signals can be used to solve the problem of signal classification, which is the subject of the next subsection.
6.5. Using Entropy Features to Classify Signals with Machine Learning Methods
To study the possibilities of classifying real signals using the proposed features and to numerically evaluate the quality of such classification, a machine learning method was used, namely the XGBoost [33] gradient boosting algorithm over decision trees, which is one of the most popular approaches for solving machine learning problems on tabular data. The results were obtained using the XGBoost library for the Python programming language.
For each class of signals, a training dataset of 20,000 signal windows with 30 features per window has been calculated. It is important that the training and test datasets are separated before the stage of feature calculation, i.e., it is guaranteed that the test dataset contains signals that are not present in the training dataset. The test dataset contains 5000 signal windows for each class. Four classes are considered in the experiment:
natural marine background noise (Noise);
bioacoustic signals of whales (Whale);
hydroacoustic signals of a tugboat (Tug);
hydroacoustic signals of a passenger ship (Passenger).
As a result of training the classifier, the following metrics were obtained:
| (45) |
where $F_1^{macro}$ is the $F_1$ measure averaged over all classes, and $MCC$ is the Matthews correlation coefficient. Figure 13 shows a normalized confusion matrix obtained after training the model.
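A schematic reproduction of the training setup; the random stand-in feature tables and the hyperparameters shown are placeholders rather than the values used to obtain (45).

```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import f1_score, matthews_corrcoef, confusion_matrix

# stand-in feature tables; in the experiment these are the 30 information features
# computed for 20,000 training and 5,000 test windows per class
rng = np.random.default_rng(0)
X_train, y_train = rng.standard_normal((4000, 30)), rng.integers(0, 4, 4000)
X_test, y_test = rng.standard_normal((1000, 30)), rng.integers(0, 4, 1000)

model = xgb.XGBClassifier(
    objective="multi:softprob",
    n_estimators=500,
    learning_rate=0.05,      # a small step yields many trees and a smooth loss curve (cf. Figure 14)
    eval_metric="mlogloss",
)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)

y_pred = model.predict(X_test)
print("macro F1:", f1_score(y_test, y_pred, average="macro"))
print("MCC:", matthews_corrcoef(y_test, y_pred))
print(confusion_matrix(y_test, y_pred, normalize="true"))   # cf. Figure 13
# model.feature_importances_ gives the per-feature importances shown in Figure 15
```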
Figure 13.
Confusion matrix of the trained classifier.
Figure 14 shows a training performance graph, i.e., the dependence of the loss function mlogloss (multiclass logarithmic loss) on the iteration number for the training and test data. The large final number of trees is associated with the chosen learning rate, which guarantees the smoothness of the gradient convergence process.
Figure 14.
XGBoost classifier training graph.
Furthermore, it is interesting to examine the feature importances of the trained classifier. From Figure 15, the influence of the reassigned spectrogram distribution on the final decision becomes obvious, since the features associated with it have the highest importance and weight in the decisions made by the trained classifier.
Figure 15.
The importance of features in classification.
Thus, it can be stated that the information features considered in the current work achieve good classification quality for acoustic signals of different natures and can be used in machine learning detection and classification systems.
7. Conclusions
The paper provides a time-frequency analysis of the problem of detecting and classifying acoustic signals based on information (entropy) criteria. A new method for calculating the discrete distribution in the time-frequency domain is proposed, including the use of the spectrogram and the reassigned spectrogram. Further information properties of the matrix $\Omega$ and the vector $\Pi$ in the problem of distinguishing close hypotheses for weak signal detection have yet to be established.
To justify the applicability of the proposed characteristics and validate their classification quality, modeling on synthetic signals and numerical verification of the solution of the multiclass classification problem by machine learning methods on real hydroacoustic recordings are carried out. The obtained high classification results confirm the potential of using the proposed characteristics.
Future work will be devoted to an additional study of the problem of classifying similar classes of signals, as well as the joint use of the proposed information characteristics and classical signal features to improve the quality of classification.
Abbreviations
The following abbreviations are used in this manuscript:
| AUC ROC | Area Under the Receiver Operating Characteristic Curve |
| CFAR | Constant False Alarm Rate |
| ECG | Electrocardiogram |
| EEG | Electroencephalogram |
| JSD | Jensen–Shannon Divergence |
| LFM | Linear Frequency Modulation |
| SNR | Signal-To-Noise Ratio |
| STFT | Short-Time Fourier transform |
| TFD | Time-Frequency Distribution |
| VAD | Voice Activity Detection |
Author Contributions
Conceptualization, A.G. and P.L.; methodology, A.G. and P.L.; software, P.L. and L.B.; validation, P.L., A.G. and L.B.; formal analysis, A.G. and L.B.; investigation, A.G., P.L., L.B. and V.B.; resources, P.L. and L.B.; data curation, P.L.; writing—original draft preparation, A.G., P.L., L.B. and V.B.; writing—review and editing, A.G., P.L., L.B. and V.B.; visualization, P.L. and L.B.; supervision, A.G.; project administration, A.G. and P.L.; funding acquisition, A.G. and P.L. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original data presented in the study are openly available in https://www.frdr-dfdr.ca/repo/dataset/4a3113e6-1d58-6bb4-aaf2-a9adf75165be (accessed on 10 August 2025), https://github.com/irfankamboh/DeepShip (accessed on 10 August 2025) and https://github.com/xiaoyangdu22/QiandaoEar22 (accessed on 10 August 2025).
Conflicts of Interest
The authors declare no conflicts of interest.
Funding Statement
The work was partially supported by the Russian Science Foundation under grant no 23-19-00134.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
References
1. Cohen L. Time-Frequency Analysis. Prentice Hall PTR: Hoboken, NJ, USA, 1995; p. 320.
2. Boashash B., Khan N.A., Ben-Jabeur T. Time–frequency features for pattern recognition using high-resolution TFDs: A tutorial review. Digit. Signal Process. 2015, 40, 1–30. doi: 10.1016/j.dsp.2014.12.015.
3. Malarvili M.B., Sucic V., Mesbah M., Boashash B. Renyi entropy of quadratic time-frequency distributions: Effects of signals parameters. In Proceedings of the 2007 9th International Symposium on Signal Processing and Its Applications, Sharjah, United Arab Emirates, 12–15 February 2007.
4. Sucic V., Saulig N., Boashash B. Analysis of local time-frequency entropy features for nonstationary signal components time supports detection. Digit. Signal Process. 2014, 34, 56–66. doi: 10.1016/j.dsp.2014.07.013.
5. Sucic V., Saulig N., Boashash B. Estimating the number of components of a multicomponent nonstationary signal using the short-term time-frequency Rényi entropy. EURASIP J. Adv. Signal Process. 2011, 2011, 125. doi: 10.1186/1687-6180-2011-125.
6. Babikov V.G., Galyaev A.A. Information diagrams and their capabilities for classifying weak signals. Probl. Inf. Transm. 2024, 60, 127–140. doi: 10.1134/S0032946024020042.
7. Bačnar D., Saulig N., Vuksanović I.P., Lerga J. Entropy-Based Concentration and Instantaneous Frequency of TFDs from Cohen's, Affine, and Reassigned Classes. Sensors 2022, 22, 3727. doi: 10.3390/s22103727.
8. Aviyente S. Divergence measures for time-frequency distributions. In Proceedings of the Seventh International Symposium on Signal Processing and Its Applications, Paris, France, 1–4 July 2003; pp. 121–124.
9. Zarjam P., Azemi G., Mesbah M., Boashash B. Detection of newborns' EEG seizure using time-frequency divergence measures. In Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada, 17–21 May 2004; pp. 429–432.
10. Porta A., Baselli G., Lombardi F., Montano N., Malliani A., Cerutti S. Conditional entropy approach for the evaluation of the coupling strength. Biol. Cybern. 1999, 81, 119–129. doi: 10.1007/s004220050549.
11. Baraniuk R., Flandrin P., Janssen A., Michel O. Measuring time-frequency information content using the Renyi entropies. IEEE Trans. Inf. Theory 2001, 47, 1391–1409. doi: 10.1109/18.923723.
12. Michel O., Baraniuk R., Flandrin P. Time-frequency based distance and divergence measures. In Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, Philadelphia, PA, USA, 25–28 October 1994.
13. Flandrin P., Baraniuk R., Michel O. Time-frequency complexity and information. In Proceedings of the ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing, Adelaide, SA, Australia, 19–22 April 1994.
14. Kalra M., Kumar S., Das B. Moving Ground Target Detection With Seismic Signal Using Smooth Pseudo Wigner–Ville Distribution. IEEE Trans. Instrum. Meas. 2020, 69, 3896–3906. doi: 10.1109/TIM.2019.2932176.
15. Xu Y., Zhao Y., Jin C., Qu Z., Liu L., Sun X. Salient target detection based on pseudo-Wigner-Ville distribution and Rényi entropy. Opt. Lett. 2010, 35, 475–477. doi: 10.1364/OL.35.000475.
16. Vranković A., Ipšić I., Lerga J. Entropy-Based Extraction of Useful Content from Spectrograms of Noisy Speech Signals. In Proceedings of the 2021 International Symposium ELMAR, Zadar, Croatia, 13–15 September 2021.
17. Liu C., Gaetz W., Zhu H. Estimation of Time-Varying Coherence and Its Application in Understanding Brain Functional Connectivity. EURASIP J. Adv. Signal Process. 2010, 2010, 390910. doi: 10.1155/2010/390910.
18. Moukadem A., Dieterlen A., Brandt C. Shannon Entropy based on the S-Transform Spectrogram applied on the classification of heart sounds. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 704–708.
19. Batistić L., Lerga J., Stanković I. Detection of motor imagery based on short-term entropy of time–frequency representations. Biomed. Eng. Online 2023, 22, 41. doi: 10.1186/s12938-023-01102-1.
20. Sang Y.F., Wang D., Wu J.C., Zhu Q.P., Wang L. Entropy-Based Wavelet De-noising Method for Time Series Analysis. Entropy 2009, 11, 1123–1147. doi: 10.3390/e11041123.
21. Auger F., Flandrin P., Lin Y.T., McLaughlin S., Meignen S., Oberlin T., Wu H.T. Time-frequency reassignment and synchrosqueezing: An overview. IEEE Signal Process. Mag. 2013, 30, 32–41. doi: 10.1109/MSP.2013.2265316.
22. Galyaev A.A., Babikov V.G., Lysenko P.V., Berlin L.M. A New Spectral Measure of Complexity and Its Capabilities for Detecting Signals in Noise. Dokl. Math. 2024, 110, 361–368. doi: 10.1134/S1064562424702235.
23. Galyaev A.A., Berlin L.M., Lysenko P.V., Babikov V.G. Order statistics of the normalized spectral distribution for detecting weak signals in white noise. Autom. Remote Control 2024, 85, 1041–1055. doi: 10.1134/S0005117924700401.
24. López-Ruiz R., Mancini H., Calbet X. A statistical measure of complexity. Phys. Lett. A 1995, 209, 321–326. doi: 10.1016/0375-9601(95)00867-5.
25. Liu Z., Lü L., Yang C., Jiang Y., Huang L., Du J. DEMON Spectrum Extraction Method Using Empirical Mode Decomposition. In Proceedings of the 2018 OCEANS—MTS/IEEE Kobe Techno-Oceans (OTO), Kobe, Japan, 28–31 May 2018; pp. 1–5.
26. Kudryavtsev A.A., Luginets K.P., Mashoshin A.I. Amplitude Modulation of Underwater Noise Produced by Seagoing Vessels. Akust. Zhurnal 2003, 49, 224–228. doi: 10.1134/1.1560380.
27. Kirsebom O.S., Frazao F., Simard Y., Roy N., Matwin S., Giard S. Performance of a deep neural network at detecting North Atlantic right whale upcalls. J. Acoust. Soc. Am. 2020, 147, 2636–2646. doi: 10.1121/10.0001132.
28. Du X., Hong F. QiandaoEar22: A high-quality noise dataset for identifying specific ship from multiple underwater acoustic targets using ship-radiated noise. EURASIP J. Adv. Signal Process. 2024, 2024, 96. doi: 10.1186/s13634-024-01181-9.
29. Irfan M., Jiangbin Z., Ali S., Iqbal M., Masood Z., Hamid U. DeepShip: An underwater acoustic benchmark dataset and a separable convolution based autoencoder for classification. Expert Syst. Appl. 2021, 183, 115270. doi: 10.1016/j.eswa.2021.115270.
30. Tanky, van, Dong L., cool, Eberenz J. libAudioFlux/audioFlux: v0.1.9. Zenodo: Geneva, Switzerland, 2024.
31. Ribeiro H.V., Zunino L., Lenzi E.K., Santoro P.A., Mendes R.S. Complexity-Entropy Causality Plane as a Complexity Measure for Two-Dimensional Patterns. PLoS ONE 2012, 7, e40689. doi: 10.1371/journal.pone.0040689.
32. Wang J., Chen Z. Feature Extraction of Ship-Radiated Noise Based on Intrinsic Time-Scale Decomposition and a Statistical Complexity Measure. Entropy 2019, 21, 1079. doi: 10.3390/e21111079.
33. Chen T., Guestrin C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.