A New Sparse Blind Source Separation Method for Determined Linear Convolutive Mixtures in Time-Frequency Domain

Mostafa Bella; Hicham Saylani

2020 Jun 5;12119:357–366. doi: 10.1007/978-3-030-51935-3_38

Editors: Abderrahim El Moataz, Driss Mammass, Alamin Mansouri, Fathallah Nouboud

PMCID: PMC7340919

Abstract

This paper presents a new Blind Source Separation method for linear convolutive mixtures, which exploits the sparsity of source signals in the time-frequency domain. This method especially brings a solution to the artifacts problem that affects the quality of signals separated by existing time-frequency methods. These artifacts are in fact introduced by a time-frequency masking operation, used by all these methods. Indeed, by focusing on the case of determined mixtures, we show that this problem can be solved with much less restrictive sparsity assumptions than those of existing methods. Test results show the superiority of our new proposed method over existing ones based on time-frequency masking.

Keywords: Blind source separation, Linear convolutive mixtures, Sparsity, Time-frequency masking, Bin-wise clustering, Determined mixtures

1 Introduction

Blind Source Separation (BSS) aims to find a set of N unknown signals, called sources and denoted by $s_j(n)$ (with $j = 1, \dots, N$), knowing only a set of M mixtures of these sources, called observations and denoted by $x_i(n)$ (with $i = 1, \dots, M$). This discipline is receiving increasing attention thanks to the diversity of its fields of application, among which are audio, biomedical, seismic and telecommunications. In this paper, we are interested in so-called linear convolutive (LC) mixtures, for which each mixture $x_i(n)$ is expressed in terms of the sources and their delayed versions as follows:

$$x_i(n) = \sum_{j=1}^{N} \sum_{q=0}^{Q} h_{ij}(q)\, s_j(n-q) = \sum_{j=1}^{N} h_{ij}(n) * s_j(n), \quad i = 1, \dots, M, \tag{1}$$

where:

$h_{ij}(q)$ represents the impulse response coefficients of the mixing filter linking the source of index j to the sensor of index i,
Q is the order of the longest filter,
the symbol “$*$” denotes the linear convolution operator.
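As an illustration of the mixing model (1), the following minimal sketch builds LC mixtures with NumPy. The function name `convolutive_mix`, the array layout and the random test signals are assumptions made for this example only; they do not come from the toolbox used in the experiments.

```python
import numpy as np

def convolutive_mix(sources, filters):
    """Linear convolutive mixing.

    sources: real array of shape (N, L), one row per source s_j(n).
    filters: real array of shape (M, N, Q + 1), filters[i, j] = h_ij(q).
    Returns an array of shape (M, L), one row per mixture x_i(n).
    """
    n_src, length = sources.shape
    n_obs = filters.shape[0]
    mixtures = np.zeros((n_obs, length))
    for i in range(n_obs):
        for j in range(n_src):
            # Linear convolution h_ij * s_j, truncated to the source length.
            mixtures[i] += np.convolve(filters[i, j], sources[j])[:length]
    return mixtures

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 1000))   # N = 2 synthetic sources
h = rng.standard_normal((2, 2, 4))   # M = 2 sensors, filters of order Q = 3
x = convolutive_mix(s, h)
print(x.shape)
```

With filters of order Q = 0 the same function reduces to a linear instantaneous mixture, i.e. a plain matrix product between the mixing gains and the sources.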

In the field of BSS, the case of LC mixtures is still of interest, since the performance of existing methods remains modest compared to the particular case of linear instantaneous mixtures, for which $Q = 0$. BSS methods for LC mixtures can be classified into two main families: the so-called temporal methods, which deal with mixtures in the time domain, and the so-called frequency methods, which deal with mixtures in the time-frequency (TF) domain. The performance of the former is generally very modest, and their assumptions are more restrictive than those of the latter. Indeed, based mostly on the independence of source signals, the most efficient temporal methods are comparable to frequency ones only for very short filters (i.e. low Q), and generally require over-determined mixtures (i.e. $M > N$) [12, 16]. Based mostly on the sparsity of source signals in the TF domain, the frequency methods have shown good performance in the determined case (i.e. $M = N$) and even in the under-determined case (i.e. $M < N$), despite increasing filter lengths [4, 8, 9, 13–15]. These frequency methods start by transposing Eq. (1) into the TF domain using the short-time Fourier transform (STFT) as follows:

$$X_i(m,k) = \sum_{j=1}^{N} H_{ij}(k)\, S_j(m,k), \tag{2}$$

where:

$X_i(m,k)$ and $S_j(m,k)$ are the STFT representations of $x_i(n)$ and $s_j(n)$ respectively, m being the time frame index and k the frequency bin index,
K and T are the length of the analysis window¹ and the number of time windows used by the STFT respectively²,
$H_{ij}(k)$ is the Discrete Fourier Transform of $h_{ij}(n)$ calculated on K points.
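The approximation behind Eq. (2) can be checked numerically. The sketch below is only an illustration under assumptions chosen for this example: a hand-written STFT with a Hann window, K = 512, a hop of 128 samples, white-noise sources, and short decaying filters of order Q = 7 (far below K). None of these values are those of the experiments reported later.

```python
import numpy as np

def stft(x, K, hop):
    """STFT with a Hann analysis window of length K; returns (frames, K // 2 + 1)."""
    window = np.hanning(K)
    starts = range(0, len(x) - K + 1, hop)
    return np.array([np.fft.rfft(window * x[t:t + K]) for t in starts])

rng = np.random.default_rng(0)
K, hop, length = 512, 128, 8192
decay = 0.5 ** np.arange(8)                        # filter order Q = 7
signs = rng.choice([-1.0, 1.0], size=(2, 2, 8))
h = signs * decay                                  # h[i, j]: source j to sensor i
s = rng.standard_normal((2, length))               # two synthetic sources

# Time-domain LC mixtures, as in Eq. (1).
x = np.array([sum(np.convolve(h[i, j], s[j])[:length] for j in range(2))
              for i in range(2)])

S = np.array([stft(sj, K, hop) for sj in s])       # (N, frames, bins)
X = np.array([stft(xi, K, hop) for xi in x])       # (M, frames, bins)
H = np.fft.rfft(h, n=K, axis=-1)                   # (M, N, bins)

# Right-hand side of Eq. (2): sum over j of H_ij(k) S_j(m, k).
X_model = np.einsum('ijk,jtk->itk', H, S)
rel_err = np.linalg.norm(X - X_model) / np.linalg.norm(X)
print(f"relative error of the TF-domain model: {rel_err:.4f}")
```

The residual is small but not zero, because the convolution is linear rather than circular. Increasing the filter length relative to K in this sketch makes the residual grow.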

Among the most efficient and relatively recent frequency methods, we can mention those based on TF masking [2, 4, 6–9, 13–15]. These methods often exploit sparsity by assuming that the source signals are W-disjoint orthogonal, i.e. not overlapping³ in the TF domain. Their principle is to estimate, for each source $s_j(n)$, a separation mask that groups the TF points where only this source is present. Applying the estimated mask to one of the frequency observations keeps only the TF points belonging to the source $s_j(n)$, and thus separates it from the rest of the mixture. Depending on the procedure used to estimate the masks, we distinguish between two types of BSS methods based on TF masking: the so-called full-band methods [2, 4, 6, 9], for which the masks are estimated as a whole using a clustering algorithm that processes all frequency bins simultaneously, and the so-called bin-wise methods [7, 8, 13–15], for which the masks are estimated using a clustering algorithm that processes only one frequency bin at a time.
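To make the masking principle concrete, here is a toy sketch with oracle binary masks. It is not a blind method: the masks are computed from the true sources, which a real BSS method cannot access. The random sparse TF representations, the unit mixing gains and the helper `sparse_spectrogram` are all assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (50, 129)   # (time frames, frequency bins), arbitrary sizes

def sparse_spectrogram(density):
    """Random complex TF representation where only a fraction of the points is active."""
    values = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    return values * (rng.random(shape) < density)

S1, S2 = sparse_spectrogram(0.3), sparse_spectrogram(0.3)
X = S1 + S2                        # one frequency observation

# Oracle binary masks: each TF point is assigned to the dominant source.
mask1 = np.abs(S1) >= np.abs(S2)
mask2 = ~mask1
S1_hat, S2_hat = mask1 * X, mask2 * X

overlap = np.mean((S1 != 0) & (S2 != 0))
err = np.linalg.norm(S1_hat - S1) / np.linalg.norm(S1)
print(f"overlapping TF points: {overlap:.1%}, relative error on source 1: {err:.2f}")
```

In this toy setting the estimate is exact at every TF point where at most one source is active. All of the error comes from the overlapping points, which is where W-disjoint orthogonality fails.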

Among the most popular full-band methods, we can cite those proposed in [4, 9], which cluster the level ratios and phase differences between the frequency observations $X_i(m,k)$ to estimate the separation masks. However, this clustering is not always reliable, especially when the order Q of the mixing filters increases [4]. Moreover, when the maximum distance between the sensors is greater than half the wavelength of the maximum frequency of the source signals involved, a problem called spatial aliasing is inevitable [4]. The bin-wise methods [7, 8, 13–15] are robust to these two problems. However, they require an additional step to solve a permutation problem in the estimated masks when passing from one frequency bin to another, a classical problem common to all bin-wise BSS methods.

However, all of these BSS methods based on TF masking (full-band and bin-wise) suffer from an artifacts problem, which affects the quality of the separated signals and is due to the fact that the W-disjoint orthogonality assumption is not perfectly verified in practice. Indeed, being introduced by the TF masking operation, these artifacts become more and more troublesome as the spectral overlap of the source signals in the TF domain increases. In [11], the authors proposed a first solution to this problem, which consists of a cepstral smoothing of the spectral masks before applying them to the frequency observations $X_i(m,k)$. An interesting extension of this technique, proposed in [3], consists in applying cepstral smoothing not to the spectral masks but rather to the separated signals, i.e. after applying the separation masks. Since these two techniques [3, 11] had only been validated on a few full-band methods, we recently proposed in [5] to evaluate their effectiveness using a few popular bin-wise methods. However, these two solutions could only reduce one particular type of artifact, called musical noise [3, 5, 11]. With the same aim of avoiding the artifacts caused by the TF masking operation, we propose in this paper a new BSS method which also exploits the sparsity of source signals in the TF domain, for determined LC mixtures. Indeed, by focusing on the case of determined mixtures, we show that we can avoid TF masking and also relax the W-disjoint orthogonality assumption. Note that the case of determined mixtures was also addressed in [1], but with an assumption which is again very restrictive: each of the source signals must have at least one whole time frame of silence⁴. Thus, our new method makes it possible to carry out the separation while avoiding the artifacts introduced by TF masking, with sparsity assumptions much less restrictive than those of existing methods.

We begin in Sect. 2 by describing our method. We then present in Sect. 3 various experimental results that measure the performance of our method compared to existing methods, and we conclude in Sect. 4 with perspectives on our work.

2 Proposed Method

The sole sparsity assumption of our method is the following.

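Assumption: For each source $s_j(n)$ and for each frequency bin k, there is at least one TF point (m, k) where it is present alone, i.e.:

$$\forall j,\ \forall k,\ \exists\, m \ \text{such that}\ S_j(m,k) \neq 0 \ \text{and}\ S_l(m,k) = 0 \ \ \forall\, l \neq j. \tag{3}$$

Thus, at the TF points (m, k) that verify the assumption (3), called single-source points, the relation (2) gives us:

$$X_i(m,k) = H_{ij}(k)\, S_j(m,k), \quad i = 1, \dots, M. \tag{4}$$
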
Our method proceeds in two steps. The first step, which exploits the probabilistic masks used by Sawada et al. in [14, 15], consists in identifying for each source of index “j” and each frequency bin “k” the index “ Inline graphic ” such that the TF point best verifies the Eq. (4), then in estimating the separating filters, denoted and defined by:

The second step consists in recombining the mixtures $X_i(m,k)$ using the separating filters in order to finally obtain an estimate of the separated sources. The two steps of our method are the subject of Sects. 2.1 and 2.2 respectively.

2.1 Estimation of the Separating Filters

Since the processing in this first step of our method is performed independently in each frequency bin, we simplify the notations in this section by omitting the frequency bin index “k”. Using a matrix formulation, Eq. (2) gives us:

$$\mathbf{X}(m) = \mathbf{H}\, \mathbf{S}(m), \tag{6}$$

where $\mathbf{X}(m) = [X_1(m), \dots, X_M(m)]^T$, $\mathbf{S}(m) = [S_1(m), \dots, S_N(m)]^T$ and $\mathbf{H}$ is the $M \times N$ matrix with entries $H_{ij}$. During this first step, we proceed as follows:

Each vector is normalized and then whitened as follows:
7
where is given by , with .
Each vector is modeled by a complex Gaussian density function of the form [14]:
8
where and are respectively the centroid (with unit norm ) and the variance of each cluster . This density function can be described by the following mixing model:
9
where are the mixture ratios and is the parameter set of the mixing model. Then, an iterative algorithm of the type expectation-maximization (EM) is used to estimate the parameter set , as well as the posterior probabilities at each TF point, which are none other than the probabilistic masks used in [14]. In the expectation step, these posterior probabilities are given by:
10
In the maximization step, the update of centroid is given by the eigenvector associated with the largest eigenvalue of the matrix defined by:
11
The parameters and are updated respectively via the following relations:
12

13
However, since the EM algorithm used in [14, 15] is sensitive to the initialization⁵, we propose in our method to initialize the masks with those obtained by a modified version of the MENUET method [4]. Indeed, in the clustering step for the estimation of the masks, we replaced the k-means algorithm used in [4] by the fuzzy c-means (FCM) algorithm used in [13], in order to obtain probabilistic masks.
After the convergence of the EM algorithm, the classical permutation problem between the different frequency bins is solved by the algorithm proposed in [15], which is based on the inter-frequency correlation between the time sequences of posterior probabilities in each frequency bin k. In the following we denote these posterior probabilities by .
Unlike the approach adopted in [14, 15], which uses all the TF points of the estimated probabilistic masks, we are interested in this step only in identifying one single-source TF point for each source of index “j” and for each frequency bin “k”, i.e. a single time frame index, which best verifies our working assumption (4). We then define this index as the index “m” for which the presence probability of the corresponding source is maximum⁶:
14
After having identified these “best” single-source TF points , we finish this first step of our method by estimating the separating filters defined in (5) by:
15
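The steps above can be sketched for a single frequency bin as follows. This is only an illustration under several assumptions that are not taken from the text: whitening is omitted, the posteriors are initialized randomly (the method described above initializes them from a modified MENUET/FCM clustering), the density uses one common normalization of the line-orientation model of [14], the permutation alignment across bins is not shown, and the separating filters are taken to be the ratios of each observation to the first one at the selected frame. The names `em_posteriors`, `estimate_filter_ratios` and `rel_energy_floor` are hypothetical.

```python
import numpy as np

def em_posteriors(X, n_src, n_iter=30, seed=0):
    """Soft-cluster the observation vectors of ONE frequency bin with EM.

    X: complex array (M, T) of STFT vectors; returns posteriors of shape (n_src, T).
    """
    rng = np.random.default_rng(seed)
    M, T = X.shape
    Z = X / np.maximum(np.linalg.norm(X, axis=0), 1e-12)    # unit-norm vectors
    post = rng.dirichlet(np.ones(n_src), size=T).T          # random soft init (assumption)
    for _ in range(n_iter):
        log_lik = np.empty((n_src, T))
        for c in range(n_src):
            weight = max(post[c].sum(), 1e-12)
            # M-step: centroid = eigenvector of the largest eigenvalue of R.
            R = (post[c] * Z) @ Z.conj().T
            a = np.linalg.eigh(R)[1][:, -1]
            # Squared distance of each vector to the line spanned by the centroid.
            d2 = np.sum(np.abs(Z - np.outer(a, a.conj() @ Z)) ** 2, axis=0)
            sigma2 = max(post[c] @ d2 / ((M - 1) * weight), 1e-8)
            alpha = max(weight / T, 1e-12)                   # mixture ratio
            log_lik[c] = np.log(alpha) - (M - 1) * np.log(np.pi * sigma2) - d2 / sigma2
        # E-step: turn the weighted likelihoods into posterior probabilities.
        log_lik -= log_lik.max(axis=0)
        post = np.exp(log_lik)
        post /= post.sum(axis=0)
    return post

def estimate_filter_ratios(X, post, rel_energy_floor=1e-3):
    """Pick, per source, the frame of maximum posterior among frames with
    non-negligible energy, and return the ratios X_i / X_1 at that frame."""
    energy = np.sum(np.abs(X) ** 2, axis=0)
    usable = energy > rel_energy_floor * energy.max()
    best = np.argmax(np.where(usable, post, -np.inf), axis=1)
    return best, X[1:, best] / X[0, best]

# Toy bin: source 1 alone in frames 0-59, source 2 alone in frames 60-119.
rng = np.random.default_rng(1)
H = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
S = rng.standard_normal((2, 200)) + 1j * rng.standard_normal((2, 200))
S[1, :60] = 0
S[0, 60:120] = 0
X = H @ S
post = em_posteriors(X, n_src=2)
best, F = estimate_filter_ratios(X, post)
print(best, F.shape)
```

Posteriors are relative probabilities, so several frames can tie at a value close to 1. The energy floor plays the role of the practical restriction mentioned in footnote 6, and a real implementation would need a more careful tie-breaking rule than the plain `argmax` used here.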

2.2 Estimation of the Separated Sources

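In this section, for more clarity, we provide the mathematical bases for the second step of our method for two LC mixtures of two sources, i.e. for $M = N = 2$. The generalization to the case $M = N > 2$ can be derived directly from this in an obvious way. In this case, the mixing Eq. (1) gives us:

$$\begin{cases} x_1(n) = h_{11}(n) * s_1(n) + h_{12}(n) * s_2(n) \\ x_2(n) = h_{21}(n) * s_1(n) + h_{22}(n) * s_2(n) \end{cases} \tag{16}$$

As we pass to the TF domain, we get:

$$\begin{cases} X_1(m,k) = H_{11}(k)\, S_1(m,k) + H_{12}(k)\, S_2(m,k) \\ X_2(m,k) = H_{21}(k)\, S_1(m,k) + H_{22}(k)\, S_2(m,k) \end{cases} \tag{17}$$
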
We use the separating filters Inline graphic , with and , estimated in the first step to recombine these two mixtures as follows:

Since we have Inline graphic and , based on the Eq. (15), we get after all simplifications have been made:

In order to ultimately obtain the contributions of sources in one of the sensors, we propose to add a post-processing step (as in [1]) which consists in multiplying the signals Inline graphic by filters, denoted by , as follows:

After all the simplifications are done, we get:

By denoting Inline graphic the inverse STFT of we get:

These signals are none other than the contributions of the source signals $s_1(n)$ and $s_2(n)$ on the first sensor (see the expression of the mixture $x_1(n)$ in (16)).
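The recombination and post-processing described in this section can be checked numerically in the two-source, two-sensor case. The sketch below assumes that the separating filters are the ratios $H_{2j}(k)/H_{1j}(k)$, taken here as exactly known, and uses one possible form of recombination and post-processing filter. These specific expressions are assumptions of the example, chosen to be consistent with the stated result (the contributions of the sources on the first sensor), and are not a transcription of the original equations.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 40, 16   # arbitrary numbers of time frames and frequency bins

def crandn(*shape):
    """Complex Gaussian array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

H = crandn(2, 2, K)                      # H[i, j, k]: source j to sensor i
S = crandn(2, T, K)                      # S[j, m, k]
X = np.einsum('ijk,jmk->imk', H, S)      # TF-domain mixtures

# Assumed separating filters: ratio of second to first sensor for each source.
F1 = H[1, 0] / H[0, 0]
F2 = H[1, 1] / H[0, 1]

# Recombination: each combination cancels one of the two sources.
Y1 = X[1] - F1 * X[0]                    # source 1 cancelled, source 2 remains
Y2 = X[1] - F2 * X[0]                    # source 2 cancelled, source 1 remains

# Post-processing: rescale to obtain the contributions on the first sensor.
C2 = Y1 / (F2 - F1)                      # equals H_12 S_2
C1 = Y2 / (F1 - F2)                      # equals H_11 S_1

print(np.allclose(C1, H[0, 0] * S[0]), np.allclose(C2, H[0, 1] * S[1]))
```

No mask is applied at any point: every TF point of the mixtures is used, including those where both sources are active. In practice the filters are only estimates, so the cancellation is approximate, and bins where the two ratios are nearly equal would make the post-processing division ill-conditioned.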

3 Results

In order to evaluate the performance of our method and compare it to the most popular bin-wise methods known for their good performance, namely the method proposed by Sawada et al. [15] and the UCBSS method [13], we performed several tests on different sets of mixtures. Each set consists of two mixtures of two real audio sources, sampled at 16 kHz and with a duration of 10 s each, obtained using different filter sets. The coefficients of these mixing filters are generated by the toolbox [10], which simulates a real acoustic room characterized by a reverberation time denoted by $RT_{60}$⁷. They depend on the distance between the two sensors (microphones), denoted by D, and on the absolute value of the difference between the directions of arrival of the two source signals. For the calculation of the STFT, we used a 2048-sample Hanning window (as analysis window) with a 75% overlap. To measure the performance, we used two of the criteria most commonly used by the BSS community, the Signal to Distortion Ratio (SDR) and the Signal to Artifacts Ratio (SAR), both provided by the BSSeval toolbox [17] and expressed in decibels (dB). The SDR measures the global performance of any BSS method, while the SAR provides specific information on its performance in terms of the artifacts present in the separated signals.
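The STFT settings above translate into a few derived quantities, worked out in the short sketch below. The relation "filter length = sampling rate × reverberation time" is inferred from footnote 9 (800 coefficients at 50 ms, 3200 at 200 ms); applying it to the intermediate values is an assumption of this example, as is the helper name `filter_taps`.

```python
fs = 16_000            # sampling rate (Hz)
window_len = 2048      # analysis window length K (samples)
overlap = 0.75         # fraction of overlap between consecutive windows

hop = round(window_len * (1 - overlap))    # shift between windows (samples)
window_ms = 1000 * window_len / fs         # window duration (ms)
n_bins = window_len // 2 + 1               # non-redundant bins of a real signal

def filter_taps(rt60_ms, fs=fs):
    """Mixing filter length in coefficients for a reverberation time in ms."""
    return fs * rt60_ms // 1000

print(f"hop = {hop} samples, window = {window_ms} ms, bins = {n_bins}")
for rt60_ms in (50, 100, 150, 200):
    taps = filter_taps(rt60_ms)
    print(f"RT60 = {rt60_ms} ms: {taps} coefficients, longer than window: {taps > window_len}")
```

Under this relation the mixing filters become longer than the 2048-sample analysis window for the two largest reverberation times, which is the regime where the approximation of Eq. (2) is the least accurate (see footnotes 1 and 2).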

For each test, we evaluated the performance of the three methods, in terms of SDR and SAR, over 4 different realizations of the mixtures, corresponding to different sets of source signals. Thus, the values provided below for SDR and SAR represent the average obtained over these 4 realizations⁸.

In the first experiment, we evaluated the performance as a function of the parameter D and of the difference between the directions of arrival, for an acoustic room characterized by $RT_{60}$ = 50 ms. Table 1 groups the performance for D = 0.3 m and D = 1 m, where the last column for each value of the parameter D represents the average value of SDR and SAR over the three values of the difference between the directions of arrival.

Table 1.

SDR (dB) and SAR (dB) as a function of D and of the difference between the directions of arrival, for $RT_{60}$ = 50 ms.

Method	Criterion	D = 0.3 m				D = 1 m			
					Mean				Mean
Sawada	SDR (dB)	11.75	11.76	12.25	11.92	12.46	12.20	11.88	12.18
	SAR (dB)	12.16	12.17	12.72	12.35	12.74	12.59	12.34	12.56
UCBSS	SDR (dB)	5.02	8.68	5.82	6.51	8.65	8.71	10.73	9.36
	SAR (dB)	7.71	9.99	8.30	8.67	9.67	9.88	11.73	10.43
Proposed method	SDR (dB)	16.55	17.40	17.17	17.04	16.17	15.08	16.03	15.76
	SAR (dB)	17.74	18.56	18.57	18.29	17.83	16.13	17.81	17.26


According to Table 1, our method performs better than the other two methods in all the configurations tested, the values being averaged over the 4 realizations of mixtures. In terms of mean SDR, it outperforms the better of the two (the method of Sawada et al.) by about 5 dB for D = 0.3 m (17.04 dB against 11.92 dB) and by about 3.5 dB for D = 1 m (15.76 dB against 12.18 dB). This performance difference is even more visible in terms of SAR (about 5.9 dB and 4.7 dB respectively), which confirms that the artifacts introduced by our method are less significant than those introduced by the other two methods.
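The margins quoted above can be recomputed from the "Mean" columns of Table 1. The sketch below only re-derives them from the reported values. The dictionary layout and variable names are choices made for this example.

```python
# Mean SDR and SAR (dB) from Table 1, as (D = 0.3 m, D = 1 m).
mean_scores = {
    "SDR": {"Sawada": (11.92, 12.18), "UCBSS": (6.51, 9.36), "Proposed": (17.04, 15.76)},
    "SAR": {"Sawada": (12.35, 12.56), "UCBSS": (8.67, 10.43), "Proposed": (18.29, 17.26)},
}
spacings = ("D = 0.3 m", "D = 1 m")

margins = {}
for criterion, scores in mean_scores.items():
    for idx, spacing in enumerate(spacings):
        # Best competing method for this criterion and sensor spacing.
        best_baseline = max(v[idx] for name, v in scores.items() if name != "Proposed")
        margins[(criterion, spacing)] = round(scores["Proposed"][idx] - best_baseline, 2)
        print(f"{criterion}, {spacing}: +{margins[(criterion, spacing)]:.2f} dB over the best baseline")
```

In all four cases the best baseline is the method of Sawada et al., and the gain in SAR exceeds the gain in SDR.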

In our second experiment, we were interested in the behavior of our method with regard to the increase of the reverberation time $RT_{60}$, while fixing the distance D and the difference between the directions of arrival (D = 0.3 m). Table 2 groups the performance of the three methods in terms of SDR, for $RT_{60}$ ranging from 50 ms to 200 ms⁹.

Table 2.

SDR (dB) as a function of $RT_{60}$, for D = 0.3 m and a fixed difference between the directions of arrival.

Method	$RT_{60}$ = 50 ms	100 ms	150 ms	200 ms
Sawada	11.76	11.42	9.26	7.65
UCBSS	8.68	5.12	3.83	3.04
Proposed method	17.40	13.60	11.02	8.12


According to Table 2, the best performance is again obtained with our method, whatever the reverberation time. However, this performance degrades as $RT_{60}$ increases. This result, which is common to all BSS methods, is expected: the higher the reverberation time, the less the assumption made by these methods on the source signals (here, sparseness in the TF domain) is verified.
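The trend in Table 2 can be quantified directly from the reported SDR values. The sketch below computes, for each reverberation time, the gap between the proposed method and the method of Sawada et al., as well as the total SDR loss of each method between 50 ms and 200 ms. The variable names are choices made for this example.

```python
rt60_ms = (50, 100, 150, 200)
sdr = {   # SDR (dB) from Table 2
    "Sawada": (11.76, 11.42, 9.26, 7.65),
    "UCBSS": (8.68, 5.12, 3.83, 3.04),
    "Proposed": (17.40, 13.60, 11.02, 8.12),
}

# Gap between the proposed method and the best baseline at each RT60.
gap = [round(p - s, 2) for p, s in zip(sdr["Proposed"], sdr["Sawada"])]
# SDR lost by each method when RT60 goes from 50 ms to 200 ms.
loss = {name: round(values[0] - values[-1], 2) for name, values in sdr.items()}

for t, g in zip(rt60_ms, gap):
    print(f"RT60 = {t} ms: proposed method ahead by {g:.2f} dB")
print(loss)
```

The proposed method stays ahead at every reverberation time, but its advantage shrinks from more than 5 dB at 50 ms to less than 0.5 dB at 200 ms.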

4 Conclusion and Perspectives

In this paper, we have proposed a new Blind Source Separation method for linear convolutive mixtures, with a sparsity assumption in the time-frequency domain that is much less restrictive than those of existing methods [1, 2, 4, 6–9, 13–15]. Indeed, by focusing on the case of determined mixtures, we have shown that our method avoids the problem of artifacts in the separated signals, from which most of these methods suffer [2, 4, 6–9, 13–15]. According to the results of the tests performed, the performance of our new method, in terms of SDR and SAR, is better than that obtained with the method proposed by Sawada et al. [15] and with the UCBSS method [13], which are known for their good performance among existing methods. Nevertheless, considering that these results were obtained over only 4 different realizations of the mixtures and only for some values of the parameters involved, a larger statistical performance study including all these parameters is desirable to confirm these results. Furthermore, it would be interesting to propose a solution to this problem of artifacts in the case of under-determined linear convolutive mixtures as well.

Footnotes

¹ Assuming that the length K of the analysis window used is sufficiently larger than the filters order Q.

² It should be noted however that the equality in Eq. (2) is only an approximation. This equality would only be true if the discrete convolution used was circular, which is not the case here. We also note that this STFT is generally used with an analysis window other than the rectangular window [2, 4, 6–9, 13–15].

³ Which means that at each TF point at most one source is present.

⁴ Of length greater than or equal to the length K of the analysis window used in the calculation of the STFT.

⁵ Which is done randomly in [14, 15] and can lead to very poor performance.

⁶ Note however that in practice, only the indices “m” with non-negligible energy are concerned by Eq. (14).

⁷ $RT_{60}$ represents the time required for reflections of a direct sound to decay by 60 dB below the level of the direct sound.

⁸ We have indeed opted for these 4 realizations instead of only one in order to come as close as possible to a statistical validation of our results.

⁹ I.e. the mixing filters length ($Q + 1$) varies from 800 coefficients (for $RT_{60}$ = 50 ms) to 3200 coefficients (for $RT_{60}$ = 200 ms).

Contributor Information

Abderrahim El Moataz, Email: abderrahim.elmoataz-billah@unicaen.fr.

Driss Mammass, Email: mammass@uiz.ac.ma.

Alamin Mansouri, Email: alamin.mansouri@u-bourgogne.fr.

Fathallah Nouboud, Email: fathallah.nouboud@uqtr.ca.

Mostafa Bella, Email: mostafa.bella@edu.uiz.ac.ma.

Hicham Saylani, Email: h.saylani@uiz.ac.ma.

References

1. Albouy, B., Deville, Y.: Alternative structures and power spectrum criteria for blind segmentation and separation of convolutive speech mixtures. In: Fourth International Conference on Independent Component Analysis and Blind Source Separation (ICA2003), pp. 361–366, April 2003
2. Alinaghi A, Jackson PJ, Liu Q, Wang W. Joint mixing vector and binaural model based stereo source separation. IEEE/ACM Trans. Audio Speech Lang. Process. 2014;22(9):1434–1448. doi: 10.1109/TASLP.2014.2320637.
3. Ansa, Y., Araki, S., Makino, S., Nakatani, T., Yamada, T., Nakamura, A., Kitawaki, N.: Cepstral smoothing of separated signals for underdetermined speech separation. In: 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2506–2509 (2010)
4. Araki S, Sawada H, Mukai R, Makino S. Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors. Sig. Process. 2007;87(8):1833–1847. doi: 10.1016/j.sigpro.2007.02.003.
5. Bella, M., Saylani, H.: Réduction des artéfacts au niveau des sources audio séparées par masquage temps fréquence en utilisant le lissage cepstral. In: Colloque International TELECOM 2019 and JFMMA, pp. 58–61, June 2019
6. Ito, N., Araki, S., Nakatani, T.: Permutation-free convolutive blind source separation via full-band clustering based on frequency-independent source presence priors. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3238–3242, May 2013
7. Ito, N., Araki, S., Nakatani, T.: Modeling audio directional statistics using a complex bingham mixture model for blind source extraction from diffuse noise. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 465–468, March 2016
8. Ito, N., Araki, S., Yoshioka, T., Nakatani, T.: Relaxed disjointness based clustering for joint blind source separation and dereverberation. In: 2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 268–272, September 2014
9. Jourjine, A., Rickard, S., Yilmaz, O.: Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures. In: 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings, vol. 5, pp. 2985–2988, June 2000
10. Lehmann EA, Johansson AM. Prediction of energy decay in room impulse responses simulated with an image-source model. J. Acoust. Soc. Am. 2008;124(1):269–77. doi: 10.1121/1.2936367.
11. Madhu, N., Breithaupt, C., Martin, R.: Temporal smoothing of spectral masks in the cepstral domain for speech separation. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 45–48, March 2008
12. Pedersen, M.S., Larsen, J., Kjems, U., Parra, L.C.: A survey of convolutive blind source separation methods. In: Springer Handbook of Speech Processing. Springer, November 2007
13. Reju VG, Koh SN, Soon IY. Underdetermined convolutive blind source separation via time-frequency masking. IEEE Trans. Audio Speech Lang. Process. 2010;18(1):101–116. doi: 10.1109/TASL.2009.2024380.
14. Sawada, H., Araki, S., Makino, S.: A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures. In: 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 139–142, October 2007
15. Sawada H, Araki S, Makino S. Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment. IEEE Trans. Audio Speech Lang. Process. 2011;19(3):516–527. doi: 10.1109/TASL.2010.2051355.
16. Saylani H, Hosseini S, Deville Y. Blind separation of convolutive mixtures of non-stationary and temporally uncorrelated sources based on joint diagonalization. In: Elmoataz A, Mammass D, Lezoray O, Nouboud F, Aboutajdine D, editors. Image and Signal Processing; Heidelberg: Springer; 2012. pp. 191–199.
17. Vincent E, Gribonval R, Fevotte C. Performance measurement in blind audio source separation. IEEE Trans. Audio Speech Lang. Process. 2006;14(4):1462–1469. doi: 10.1109/TSA.2005.858005.
