Abstract
This paper presents an efficient framework for estimating the direction-of-arrival (DOA) of wideband sound sources. The framework constructs a single wideband cross-correlation matrix from the narrowband cross-correlation matrices of all frequency bins. It is inspired by the coherent signal subspace technique, but improves the linear transformation procedure: the new procedure requires no preliminary DOA estimation, because it exploits unique cross-correlation matrices between the received signal and itself at distinct frequencies, together with the higher-order generalized singular value decomposition of the array of these matrices. Wideband DOAs are then estimated with any subspace-based technique for narrowband DOA estimation, using the proposed wideband correlation matrix in place of the narrowband correlation matrix. The framework therefore allows recent narrowband subspace methods to estimate the DOAs of wideband sources directly, which reduces computational complexity and simplifies the estimation algorithm. Practical examples are presented to showcase its applicability and effectiveness, and the results show that the fusion methods perform better than the other methods over a range of signal-to-noise ratios with just a few sensors, which makes the framework suitable for practical use.
Keywords: direction-of-arrival (DOA), higher-order generalized singular value decomposition (HOGSVD), wideband sources, sound source, array processing, subspace method, cross-correlation
1. Introduction
Sound source localization has received much attention during the past decades, and has become an important part of navigation systems [1,2]. Direction-of-arrival (DOA) estimation in particular plays a critical role in navigation systems for the exploration of sources in widespread applications, including acoustic signal processing [3,4,5,6,7,8]. Several approaches have been proposed to estimate DOA. For instance, time-difference-of-arrival-based DOA estimation is one of the most frequently used approaches, and is widely known through the generalized cross-correlation with phase transform (GCC-PHAT) [9]. Its low computational requirement makes it attractive for practical applications; however, its major drawback is low robustness in noisy and multipath environments. Another relevant approach is adopted from independent component analysis (ICA) in blind source separation [10,11]. ICA searches for independent components by measuring deviations from Gaussian distributions, for example by maximizing negentropy or kurtosis. DOAs are estimated easily by using the separated components for all frequency bins, but the estimation accuracy of such a method is highly sensitive to the non-Gaussianity measures.
As an alternative approach to estimating narrowband DOAs, the subspace method has been proposed in an effort to improve estimation performance. The most prominent methods exploit the signal and noise subspaces to achieve more robust results, such as multiple signal classification (MUSIC) [12], estimation of signal parameters via rotational invariance techniques (ESPRIT) [13], and the propagator method [14,15], which have been used frequently for one-dimensional (1D) DOA estimation with a uniform linear array (ULA) of sensors. For two-dimensional (2D) DOA estimation, a different geometrical structure of the sensor array is required, and it was previously found that an L-shaped array is considerably effective for estimating 2D DOAs [16]. Additionally, the L-shaped array allows for simple implementation, because it consists of two ULAs connected orthogonally at one end of each ULA. For these reasons, the L-shaped array is widely applied to 2D DOA estimation [17,18,19,20,21,22,23,24,25,26], and its practical applications can be found in past research [27,28]. Although the narrowband subspace method may be unable to estimate wideband DOAs directly, one possible solution is to apply the narrowband subspace method at each temporal frequency, and then to estimate the wideband DOAs by interpolating the narrowband DOA results across all frequency bins [29,30]. However, the intensive computational cost of this solution may limit its practical use.
Several approaches have been proposed to solve the problem of estimating wideband DOAs. For example, incoherent MUSIC (IMUSIC) is one of the simplest methods for estimating wideband DOAs [31]. IMUSIC consists of two steps. First, a noise subspace model is constructed for each temporal frequency. Then, wideband DOAs are obtained by minimizing the norm of the orthogonal relation between a steering vector and the noise subspaces of all frequency bins. Although IMUSIC was demonstrated to be effective for estimating the DOAs of multiple wideband signals in the high signal-to-noise ratio (SNR) region, a single small distortion of the noise subspace at any frequency can affect the whole set of DOA results. Many attempts have been made recently to overcome this problem. For instance, the test of orthogonality of frequency subspaces (TOFS) was proposed to overcome this difficulty [32], but the performance degradation caused by the small distortion remains a challenge. Another relevant approach is the test of orthogonality of projected subspaces (TOPS) [33]. TOPS estimates DOAs by constructing the signal subspace of one reference frequency, and then measuring the orthogonality between this signal subspace and the noise subspaces of all frequency bins. The simulations showed that TOPS is able to achieve higher accuracy than IMUSIC in the mid SNR range; however, undesirable false peaks still remain. Revised and improved versions of TOPS were proposed recently to reduce these false peaks [34,35], but their computational complexity is dramatically higher than that of the classical TOPS.
Another notable approach to wideband DOA estimation is the coherent signal subspace method (CSS) [36,37]. CSS focuses the correlation matrix of the received signals at each temporal frequency into a single matrix, called a universal correlation matrix, associated with one focusing frequency via a linear transformation procedure. Wideband DOAs are estimated by applying a single scheme of any narrowband subspace method to the universal correlation matrix. In addition to the transformation procedure of CSS [38,39,40], a process of DOA preliminary estimation is required before the wideband DOAs can be estimated. A common shortcoming is therefore this requirement of DOA preliminary estimation, which means that any inferior initialization can lead to biased estimates. According to the literature [31,32,33,41], CSS demonstrates poorer performance than other methods such as TOPS; this is because the solutions of the transformation procedure in CSS are focused solely on the subspace between a temporal frequency and the focusing frequency. To the best knowledge of the authors, this means that the fundamental component of the transformation matrix may differ across frequency bins, which is clearly apparent when a narrowband DOA result at some frequency is not close enough to the true DOA. A single component distortion can affect the whole set of DOA results. Therefore, the solutions have to exhibit the exact component even when the power present in the received signal at that frequency is very weak; in other words, the solution of the transformation matrix has to be focused across all frequency bins instead of on a pair of different frequencies.
Therefore, the purpose of this paper is to investigate an alternative for estimating wideband 2D DOAs in a more efficient way. We consider wideband sources as sound sources, such as human speech and musical sounds. In order to estimate the wideband DOAs, we address the issue of transforming multiple narrowband cross-correlation matrices for all frequency bins into a wideband cross-correlation matrix. Our study is inspired by the computational model of CSS, with further improvement of the linear transformation procedure [36,37,38,39,40]. Since the transformation procedures of CSS are focused only on the subspace between the current and reference frequencies, as previously mentioned, we propose a new transformation procedure which focuses all frequency bins simultaneously and efficiently. To our knowledge, the higher-order generalized singular value decomposition (HOGSVD) is used here for the first time to achieve this [42]. By employing the HOGSVD of arrays of the new unique cross-correlation matrix, whose elements in the row and column positions are sample cross-correlation matrices between the received signal and itself at two distinct frequencies, the new transformation procedure no longer requires any process of DOA preliminary estimation. Finally, the wideband cross-correlation matrix is constructed via the proposed transformation procedure, and the wideband DOAs can be estimated by employing any subspace-based technique for estimating narrowband DOAs, using this wideband correlation matrix instead of the narrowband correlation matrix. The proposed framework therefore allows recent narrowband subspace methods to estimate the DOAs of wideband sources directly, which reduces computational complexity and simplifies the estimation algorithm. Practical examples, such as 2D-MUSIC and ESPRIT with an L-shaped array, are presented to showcase its applicability and effectiveness.
The rest of this paper is organized as follows. Section 2 presents the array signal model, the basic assumptions and the problem formulation for transforming the narrowband sample cross-correlation matrices of all frequency bins into a single matrix, which is called the wideband cross-correlation matrix. The new transformation procedure is introduced in Section 3.1, and its effective solution via HOGSVD in Section 3.2. Section 3.3 describes the proposed framework for estimating wideband DOAs, which combines the proposed transformation procedure with the DOA estimation scheme of a recent narrowband subspace method; practical examples are presented in Section 3.3.1 and Section 3.3.2. The simulation and experimental results are compared with several existing methods in Section 4 and Section 5. Finally, Section 6 concludes this paper.
2. Preliminaries
2.1. Data Model
The proposed method presented in this paper considers far-field sound sources. Received signals are a composition of the multiple sources, each one consisting of an angle in a spherical coordinate system. The received signals are transformed into a time-frequency representation via the short-time Fourier transform (STFT), and are given by
| (1) |
where is the summation of a received signal, is a source signal, is an additive noise, the constant M is the number of microphone elements, and K is the number of incident sources. The matrix stands for the array manifold where and are phase angle components of the source on x and z axes in the spherical coordinate system. Note that the elements in depend on an array geometry.
Consider the L-shaped array structure consisting of two ULAs as illustrated in Figure 1, the received signals are simplified as
| (2) |
where
| (3) |
Figure 1.
L-shaped microphone array configuration for 2D DOA estimation.
From the above definitions, and the subscript x belong to the x subarray, and likewise, and the subscript z belong to the z subarray, where N is the number of microphone elements in each subarray with . The variable t is time, f is a source frequency, d is the spacing of the microphone elements, is the wavelength with respect to where c is the speed of sound in the current medium, and is a reference frequency.
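As an illustration of how the quantities f, d and c enter the array manifold, the following minimal sketch builds one column of a ULA manifold. It assumes the common far-field convention in which element m has phase 2πfdm·cos(angle)/c; the sign and angle conventions of Equation (3) may differ, and the function name `ula_steering` and its default values are hypothetical, chosen only to match the simulation settings (d = 5 cm, c = 343 m/s).

```python
import cmath
import math


def ula_steering(n_mics, freq_hz, angle_deg, spacing_m=0.05, c=343.0):
    """Far-field steering vector of a uniform linear array (assumed convention).

    Element m carries the phase 2*pi*f*d*m*cos(angle)/c relative to element 0.
    """
    phase_step = (2.0 * math.pi * freq_hz * spacing_m
                  * math.cos(math.radians(angle_deg)) / c)
    return [cmath.exp(1j * phase_step * m) for m in range(n_mics)]


# Six microphones, a 3.43 kHz component, source at 60 degrees from the array axis.
# With d = 5 cm and c = 343 m/s the phase step is pi * cos(60 deg) = pi / 2.
a = ula_steering(n_mics=6, freq_hz=3430.0, angle_deg=60.0)
```

Every element has unit magnitude, so only the phase progression along the array encodes the angle, and that progression changes with f, which is why a narrowband manifold cannot be reused across frequency bins without a transformation.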
2.2. Basic Assumptions
Based on recent reviews, the proposed framework requires the following assumptions:
Assumption 1: The number of sources is known or predicted in advance [43,44].
Assumption 2: The spacing between adjacent elements of each subarray and spacing between and should be set to for avoiding the angle ambiguity in array structure radiation [1,2,16].
Assumption 3: The source is assumed to be a Gaussian complex random variable, as suggested by the literature [12,16,31]. However, we consider wideband sources as sound sources such as human speech; therefore, can also be a super-Gaussian complex random variable, and in the most general case it is a non-stationary signal over an appropriate period of time.
Assumption 4: According to the acoustic theory of speech, sound sources, especially human speech, exhibit frequency dependence [45]; that is, the cross-covariance between a source and itself at distinct frequencies is not zero; where . Next, suppose that are uncorrelated, which implies that and are statistically independent of each other when ; . When , the sources can be taken to be partially dependent, following the literature [45]; therefore, a sample cross-covariance matrix of the incident sources over two different frequencies is given by
| (4) |
Remark that is equal to , and is a variance at frequency f of the source.
Assumption 5: Additive white Gaussian noise is considered in this paper, and is modeled as a Gaussian random variable as in past studies. A noise cross-covariance matrix over two different frequencies is given by
| (5) |
where , and is an i-by-i identity matrix. Note again that where is the variance of the noise at frequency f. In the case of the L-shaped array structure in Equation (2), we have
| (6) |
where is an i-by-j null matrix.
2.3. Transformation Problem
Under the data model and assumptions in Section 2.1 and Section 2.2, a cross-correlation matrix of the received signals is defined as
| (7) |
where . In order to transform over the available frequency range into a single smoothed matrix, which is named as a wideband cross-correlation, a transformation procedure is required as mentioned previously [36], which is expressed as
| (8) |
where
| (9) |
is the wideband cross-correlation matrix, and P is the number of STFT frequency bins. is a transformation matrix, which was originally designed by using the ordinary beamforming technique [36], or by minimizing the Frobenius norm of array manifold matrices [37]. The objective of is to transform the array manifold at any given f into . All previous solutions of are based solely on the subspace between a pair of distinct frequencies , as emphasized in the introduction [36,37,38,39,40]. When the power of the source at some frequency is weak or less than the noise power, the matrix may not share any common angle of because its non-zero eigenvalues are not full rank, which results in a performance degradation for estimating both and the wideband DOAs. If the transformation matrix can be focused across all frequency bins instead of on a pair of frequencies, a good estimate of the DOAs in Equation (8) might be expected. Based on this hypothesis, a new concept and scheme are presented in the next section.
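The cross-correlation between the received signal and itself at two frequencies, on which the transformation problem is built, can be estimated from STFT snapshots. The sketch below assumes the usual sample estimator, the time average of x(t, f_i) x(t, f_j)^H over T frames; the function name `sample_cross_correlation` and the toy data are hypothetical and only illustrate the shape and symmetry of the result.

```python
def sample_cross_correlation(x_fi, x_fj):
    """Sample cross-correlation between two frequency bins (assumed estimator).

    x_fi, x_fj: lists of T snapshots, each a list of M complex STFT values.
    Returns the M-by-M matrix (1/T) * sum_t x_fi[t] * x_fj[t]^H.
    """
    n_snap = len(x_fi)
    m = len(x_fi[0])
    r = [[0j] * m for _ in range(m)]
    for xi, xj in zip(x_fi, x_fj):
        for p in range(m):
            for q in range(m):
                r[p][q] += xi[p] * xj[q].conjugate()
    return [[v / n_snap for v in row] for row in r]


# Toy data: two snapshots from a two-microphone array at one frequency bin.
x_f1 = [[1 + 0j, 1j], [1j, -1 + 0j]]
r_same = sample_cross_correlation(x_f1, x_f1)
```

When both arguments come from the same bin the result is Hermitian; for two distinct bins it generally is not, which is why an SVD rather than an EVD is applied to such matrices.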
3. Proposed Method
This section introduces a new procedure for estimating a transformation matrix, its alternative solution by using the higher-order generalized singular value decomposition (HOGSVD), and practical examples of wideband DOA estimation scheme.
3.1. Problem for Estimating the Transformation Matrix and Its Solution
We start by introducing the following lemma that will be useful for obtaining a solution of transformation matrix.
Lemma 1.
Given a set of two distinct frequencies by into Equation (7), and given a transformation matrix which satisfies the property in Equation (9), assume that , the cross-correlation can be factorized into the singular value decomposition (SVD) form:
| (10) |
where , are the matrices of left and right singular vectors and the diagonal matrix of singular values in the signal subspace, and likewise, , are those in the noise subspace. If the K largest singular values of and are equal, then is a matrix with orthonormal columns.
Proof.
Since the transformation procedure of is expressed by and the array manifold and are full rank matrices [36], Lemma 1 is valid if and only if the K largest singular values of and are equal; therefore, . Considering that the smallest singular values of are close to zero by assuming a noise-free signal and using solely the signal subspace , we have
| (11) |
Performing the eigenvalue decomposition (EVD) of Equation (11), the square roots of the non-zero eigenvalues of the above matrix are equal to [46,47]. This completes the proof of the lemma. □
Lemma 1 shows that and share common singular values and right singular vectors, whereas their left singular vectors may differ. Since and are full rank, their remaining components are given by [48]:
| (12) |
where
| (13) |
are full-rank, invertible matrices. From Equations (9) and (12), we have
| (14) |
which means that the right singular vectors of and the left singular vectors of share a common subspace when is unitary.
Since the left singular vectors of exist, we continue to introduce a new transformation procedure. The matrix can be found as a solution to
| (15) |
where is the Frobenius norm, and is the sum of squares of the K largest singular values of . If the constraint in Equation (15) is not imposed, then one possible choice is obtained from the least squares problem [49,50]: the solution is derived by finding the point where the derivative of the cost function with respect to is zero, which gives , and , which is difficult in practice. To solve the problem more practically, an alternative solution is introduced, which is based on the constraint in Equation (15) and Lemma 1:
Theorem 1.
Let be the matrices in the signal subspace containing the left and right singular vectors of . Imposing the constraint in Equation (15) and Lemma 1, along with the modification of the orthogonal Procrustes problem (MOP), an alternative solution to Equation (15) is given by
| (16) |
where † stands for the pseudo-inverse of a matrix. Defining the square matrix as the matrix containing error corrections, the error of transformation remains consistent with the following equation:
| (17) |
where and are the diagonal matrix of the K largest singular values, and the noise subspace left singular vectors of , respectively.
Proof.
See Appendix A. □
Theorem 1 provides an efficient way to construct without any process of DOA preliminary estimation, but the solution is still based solely on the subspace between a pair of distinct frequencies. In order to obtain the solution across all frequency bins, the next section presents an alternative for constructing by using HOGSVD along with Theorem 1.
3.2. Estimation of the Transformation Matrices by HOGSVD
Suppose we have a set of P complex matrices and all of them have a full rank;
| (18) |
where is a set of frequency intervals, and the cross-correlation matrices and are obtained from Equation (7). The HOGSVD of these P matrices is defined by the generalized singular value decomposition (GSVD) of the datasets, in which the right singular vectors are identical in all decompositions [42], as follows:
| (19) |
where , are the matrix of left singular vectors, , are the matrix of right singular vectors, and , are the diagonal matrix of singular values. Note that subscripts s and n denote subspace of signal and noise, respectively. Unlike the left singular vectors and that have orthonormal columns by performing SVD, and now have unit 2-norm columns instead.
To show that is equal to for all frequency bins, let us start from brief description of HOGSVD benchmark. The matrix is obtained by performing EVD on the following matrix;
| (20) |
Let us redefine
| (21) |
where
| (22) |
is the matrix of the smallest singular values of , and , by employing Theorem 1 (For details, see Appendix A). Substituting Equations (21) and (22) into Equation (20), we have
| (23) |
Since , for all frequency bins, therefore
| (24) |
where
| (25) |
Performing EVD in Equation (24), we can obtain , which reveals that is equal to for all frequency bins. In addition, it can be seen that the matrix or is estimated by focusing all frequency bins simultaneously; when the power of the source at some frequency is weak or less than the noise power, the matrices still share the common angle of across all frequency bands effectively and identically.
After obtaining the right singular vectors of , we move forward to find its left singular vectors. We start by considering the following equations based on Equations (19) and (25);
| (26) |
We remark again that , have unit 2-norm columns instead of orthonormal columns [42];
| (27) |
where . Then, the singular values are obtained as follows:
| (28) |
where is the Euclidean norm, and is a column of . Finally, the matrices , are obtained by solving Equation (26) with Equation (28), which also satisfy the condition in Equation (27).
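The step from Equation (26) to Equations (27) and (28) can be sketched in isolation. The sketch assumes, as in the standard HOGSVD construction [42], that Equation (26) yields a matrix whose k-th column is the k-th left singular vector scaled by its singular value, so that each singular value is the Euclidean norm of a column and dividing by it leaves unit 2-norm columns. The function name `split_unit_columns` and the toy matrix are hypothetical.

```python
import math


def split_unit_columns(b):
    """Factor b as U * diag(sigma), where every column of U has unit 2-norm.

    b: list of rows of complex numbers with no all-zero column (full rank assumed).
    """
    rows, cols = len(b), len(b[0])
    # Singular value k is the Euclidean norm of column k.
    sigma = [math.sqrt(sum(abs(b[r][k]) ** 2 for r in range(rows)))
             for k in range(cols)]
    # Normalizing each column gives unit 2-norm (not necessarily orthonormal) columns.
    u = [[b[r][k] / sigma[k] for k in range(cols)] for r in range(rows)]
    return u, sigma


b = [[3 + 0j, 0j], [4j, 2 + 0j]]
u, sigma = split_unit_columns(b)
```

In this toy case the two columns of `u` have unit norm but are not mutually orthogonal, which mirrors the remark that the HOGSVD left singular vectors are unit-norm rather than orthonormal.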
After performing HOGSVD of Equation (18) to obtain the left and right singular vectors of , the transformation matrices can be assembled as follows:
| (29) |
Note that since orthonormal columns have not yet been assumed on the matrix in Theorem 1, the transformation procedure via HOGSVD is still compatible with Theorem 1 without requiring any modifications (For details, see Equations (A13) and (A14) in Appendix A).
We now consider the computational complexity of HOGSVD. It is not surprising that HOGSVD has a heavy computational burden; that is because matrix inversions are intensively used in Equation (20). To avoid the computational burden caused by the matrix inversions, Equation (20) is reformulated by the following technique [51]. It begins by performing the economy-sized QR decomposition of Equation (19);
| (30) |
where is the upper triangular matrix, and is one portion of the -by-M matrix resulting from the QR decomposition of Equation (19). Next, is simplified as
| (31) |
where
| (32) |
Performing EVD of Equation (32), we have , where and are the matrix of eigenvectors and the matrix of eigenvalues, respectively. Finally, the alternative computation of is expressed as , where the K smallest eigenvalues of belong to the signal subspace.
The computational complexities of the conventional HOGSVD in Equation (20) and the optimized HOGSVD in Equation (32) are investigated under the following scenario: an matrix addition, subtraction, multiplication and element-wise multiplication are computed in the traditional way, whereas an matrix inversion and the QR decomposition of an matrix are implemented by using the Gauss–Jordan elimination algorithm and the Householder transformation, respectively. Comparing the computation costs of Equations (20) and (32) in Table 1 and Table 2, the technique in Equations (31) and (32) simplifies the mathematical model, reduces the number of matrix operations and improves the speed of computation. When , the optimized HOGSVD has an arithmetic complexity of , which is remarkably lower than that of the conventional HOGSVD, . Since P is in most cases much greater than M, the cost of the optimized HOGSVD is expected to be lower than that of the conventional HOGSVD.
Table 1.
Command used in HOGSVD.
| Command Name | Command Count: HOGSVD in Equation (20) | Command Count: Optimized HOGSVD in Equation (32) |
|---|---|---|
| Matrix Addition/Subtraction | | |
| Element-wise Multiplication | 1 | 0 |
| Matrix Multiplication | | |
| Matrix Inversion | P | |
| QR Decomposition | 0 | 1 |
| Eigenvalue Decomposition (EVD) | 1 | 1 |
Remark: is caused by a matrix multiplication of .
Table 2.
Computational complexities.
| Command Name | Complex Floating Point Operations per Command |
|---|---|
| Matrix Addition/Subtraction | |
| Element-wise Multiplication | |
| Matrix Multiplication | |
| Matrix Inversion (Gauss-Jordan elimination) | |
| QR Decomposition (Householder transformation) | |
| HOGSVD in Equation (20) without counting EVD | |
| Optimized HOGSVD in Equation (32) without counting EVD | |
3.3. DOA Estimation Scheme
After the transformation matrices are formed by using HOGSVD, we now proceed to describe a framework for estimating the wideband DOAs. We start by simplifying the wideband cross-correlation matrix in Equation (8) with EVD form and substituting with , as follows:
| (33) |
where
| (34) |
Here, and are the diagonal matrix of eigenvalues and matrix of eigenvectors of Equation (33) in signal subspace, and possess unitary property by the fact that are the matrices with orthonormal columns [46,47]. Remark that is also derived by performing EVD; the matrices , are the eigenvectors and diagonal matrix of eigenvalues in signal subspace, and likewise, , are with noise subspace. Furthermore, considering only the signal subspace by focusing on the K largest singular values , we can expect that Equation (33) is equivalent to Equation (8);
| (35) |
which can be proved by employing Lemma 1, Equations (12)–(14), and Equations (A4)–(A6) in Appendix A (we omit the proof since the result follows from straightforward substitution). At this stage, provides an efficient way to transform any given f into by observing the solution across frequency bands without loss of generality; that is, the transformation is no longer biased by a pair of distinct frequencies . Furthermore, the wideband cross-correlation matrix in Equation (33) is a combination of the narrowband sample cross-correlation matrices across all frequency bins, but its array manifolds are focused on the single reference frequency by using . It is therefore feasible to estimate the wideband DOAs by employing any recent subspace-based technique for estimating narrowband DOAs [18,20,21,22,23,24,25,26], using this wideband correlation matrix instead of the narrowband correlation matrix. Practical examples, such as MUSIC and ESPRIT, are presented in the next section to showcase its applicability and effectiveness.
Remarks: In the case of the L-shaped array structure in Equation (2), we can repeat the proposed transformation procedure to find the solution for the x subarray in Equations (2) and (3); starting from Equation (7) by replacing with , the solution for the x subarray is given by:
| (36) |
| (37) |
| (38) |
By performing the same procedure, the solution for z subarray is likewise given by replacing with and the subscript x with z in Equations (36)–(38);
| (39) |
| (40) |
| (41) |
3.3.1. DOA Estimation Scheme via MUSIC
MUSIC estimates the DOA of the sources by locating the peaks of MUSIC spectrum along with exploiting the orthogonality of the signal and noise subspaces [12,48]. Let us define the complementary orthogonal space which is orthogonal to ;
| (42) |
for all , where is a column of as shown in Equation (3). Additionally, the following complementary orthogonal space is also valid;
| (43) |
by the fact that , which implies that it is possible to reduce a computational complexity of Equation (33) by using only instead of calculating . The computationally efficient two-dimensional MUSIC (2D-MUSIC) spectrum is expressed as
| (44) |
When the denominator in Equation (44) approaches zero at the true angles of the signals, the 2D-MUSIC spectrum exhibits sharp peaks indicating these angles. In the case of the L-shaped array structure, the x and z subarray angles are estimated separately by locating the spectral peaks of the following equations:
| (45) |
where are columns of , respectively.
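The peak search behind a one-dimensional MUSIC spectrum can be sketched under strong simplifying assumptions: a single source, a half-wavelength ULA at the reference frequency (phase step π·cos(angle)), and an ideal noise-subspace projector built analytically from the true steering vector instead of from the EVD of the wideband matrix. All names (`steering`, `music_spectrum`) and values are hypothetical.

```python
import cmath
import math


def steering(n, angle_deg):
    """Half-wavelength ULA steering vector (assumed phase step pi*cos(angle))."""
    step = math.pi * math.cos(math.radians(angle_deg))
    return [cmath.exp(1j * step * m) for m in range(n)]


def music_spectrum(noise_projector, n, grid_deg):
    """Evaluate 1 / (a^H * P_noise * a) on a grid of candidate angles."""
    spectrum = []
    for ang in grid_deg:
        a = steering(n, ang)
        denom = sum(a[p].conjugate() * noise_projector[p][q] * a[q]
                    for p in range(n) for q in range(n)).real
        # Floor the denominator so the exact true angle does not divide by zero.
        spectrum.append(1.0 / max(denom, 1e-12))
    return spectrum


n = 6
true_angle = 70.0
a0 = steering(n, true_angle)
# Ideal projector onto the noise subspace for one source: I - a0 a0^H / N.
proj = [[(1.0 if p == q else 0.0) - a0[p] * a0[q].conjugate() / n
         for q in range(n)] for p in range(n)]
grid = [float(g) for g in range(1, 180)]
spec = music_spectrum(proj, n, grid)
estimate = grid[spec.index(max(spec))]
```

The spectrum peaks where the candidate steering vector is orthogonal to the noise subspace; in the proposed framework the projector would instead be formed from the noise-subspace vectors obtained for the wideband cross-correlation matrix.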
3.3.2. DOA Estimation Scheme via ESPRIT
We start by recalling the array manifold and in Equation (3). ESPRIT takes advantage of the rotational invariance property of ULA [13], as follows:
| (46) |
where
| (47) |
and stand for the first and last rows of , respectively. Similar to [20,21,26], the matrices can be simplified with Equations (3), (36)–(38) and (46), as follows:
| (48) |
where are invertible matrices, and stand for the first and last rows of , respectively. Considering Equation (48), we can construct new matrices as follows:
| (49) |
The angles can thus be estimated by the eigenvalues of , as follows:
| (50) |
where is the eigenvalue of , respectively. Furthermore, as in MUSIC, it is possible to reduce the computational complexity by using only ;
| (51) |
where
| (52) |
and stand for the first and last rows of , respectively.
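The rotational invariance used by ESPRIT can be sketched for the simplest case of one source, where the signal subspace is a single vector known only up to a complex scale. The sketch assumes the phase convention 2πfd·cos(angle)/c between adjacent elements and reads "first and last rows" as the first N-1 and last N-1 rows; the function name `esprit_single_source` and the default values are hypothetical.

```python
import cmath
import math


def esprit_single_source(signal_vec, freq_hz, spacing_m=0.05, c=343.0):
    """Estimate one angle (degrees) from a length-N signal-subspace vector."""
    first = signal_vec[:-1]  # first N-1 rows
    last = signal_vec[1:]    # last N-1 rows
    # Least-squares scalar psi such that last is approximately first * psi.
    num = sum(f.conjugate() * l for f, l in zip(first, last))
    den = sum(abs(f) ** 2 for f in first)
    psi = num / den
    cos_angle = cmath.phase(psi) * c / (2.0 * math.pi * freq_hz * spacing_m)
    # Clamp against rounding before taking the arc cosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


# Steering vector for 60 degrees at 3.43 kHz (phase step pi/2), arbitrarily scaled.
vec = [(0.3 - 0.7j) * cmath.exp(1j * (math.pi / 2) * m) for m in range(6)]
est = esprit_single_source(vec, freq_hz=3430.0)
```

With K sources the scalar `psi` becomes a K-by-K matrix whose eigenvalues carry the phases, but the shift between the two row selections is the same idea; the arbitrary complex scale on `vec` does not affect the estimate.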
4. Numerical Simulations
In this section, the performance of the fusion methods obtained with the proposed framework is demonstrated in the following four scenarios: (1) the performance of the selected methods and the proposed methods with respect to source types, (2) the performance with respect to the number of microphone elements, (3) the performance when considering automatic pairing of the x and z subarray angles, and (4) the performance in a reverberant environment. Scenarios 1, 2 and 4 estimate the x and z subarray angles separately by using the data model in Equation (2), whereas Scenario 3 estimates the x and z subarray angles simultaneously, with automatic pairing, by using the data model in Equation (1). We compare the proposed methods in simulation with the following methods: IMUSIC [31], TOFS [32], TOPS [33], Squared-TOPS [34], and WS-TOPS [35]. Note that the CSS-based methods are excluded from these tests, because the unintended biases caused by the process of DOA preliminary estimation would have to be taken into consideration when comparing with the other candidate methods, as discussed in the literature [31,32,33,41].
To measure the overall performance of estimating the x and z subarray angles in each scenario, the root-mean-square error (RMSE) and standard deviation (SD) are defined by the following equations:
| (53) |
| (54) |
where K is the number of sources, J is the number of trials, represent the estimated x and z subarray angles in each trial, represent the averages of the estimated x and z subarray angles, and represent the true x and z subarray angles.
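The two metrics can be sketched for a single subarray as follows. The sketch assumes the usual definitions, with the RMSE taken relative to the true angles and the SD relative to the mean of the estimates, both averaged over J trials and K sources; the exact normalization and the way Equations (53) and (54) combine the x and z angles are not reproduced here, and the function names and toy values are hypothetical.

```python
import math


def rmse(estimates, truth):
    """Root-mean-square error; estimates is J trials x K angles, truth is K angles."""
    j, k = len(estimates), len(truth)
    sq = sum((est[i] - truth[i]) ** 2 for est in estimates for i in range(k))
    return math.sqrt(sq / (j * k))


def sd(estimates):
    """Standard deviation of the estimates around their per-source mean."""
    j, k = len(estimates), len(estimates[0])
    means = [sum(est[i] for est in estimates) / j for i in range(k)]
    sq = sum((est[i] - means[i]) ** 2 for est in estimates for i in range(k))
    return math.sqrt(sq / (j * k))


# Two trials, two sources; every estimate is off by exactly one degree.
trials = [[29.0, 61.0], [31.0, 59.0]]
truth = [30.0, 60.0]
error = rmse(trials, truth)
spread = sd(trials)
```

RMSE captures bias and variance together, while SD captures only the spread, so a method with a low SD but a high RMSE is consistently wrong rather than noisy.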
Computer simulations were carried out in Matlab® R2017a, using a PC with Debian GNU/Linux 9.4 x86_64, an Intel® Core™ i5-4590 CPU at 3.30 GHz, 16 GB RAM, and Intel® Math Kernel Library 11.3.1 on BLAS and LAPACK 3.5.0. Each scenario is repeated 100 times, and the simulation parameters are chosen as follows: the sampling frequency is 48 kHz, the output of each microphone is captured for 1 s, the speed of sound c is 343 m/s, the spacing of the microphone elements d is 5 cm, the STFT focusing frequency range is from 0.1 to 16 kHz, and the reference frequency is 3.43 kHz. Note that we used perturbations of the true angles by adding Gaussian random noise.
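As a quick consistency check on these parameters, and assuming that the spacing condition in Assumption 2 is the usual half wavelength at the reference frequency (the symbols \lambda_0 and f_0 below are our notation for that wavelength and frequency), the chosen values agree:

```latex
\lambda_0 = \frac{c}{f_0} = \frac{343\ \text{m/s}}{3430\ \text{Hz}} = 0.1\ \text{m},
\qquad
\frac{\lambda_0}{2} = 0.05\ \text{m} = 5\ \text{cm} = d .
```

The 5 cm spacing is therefore exactly half a wavelength at 3.43 kHz, which suggests why this value was chosen as the reference frequency.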
4.1. Scenario 1: Performance with Respect to Source Types
Figure 2 and Figure 3 show performance comparisons of the selected methods and the proposed methods in terms of RMSE and SD over a range of SNRs. The proposed methods are the modified MUSIC in Equation (45) and ESPRIT in Equations (50)–(52). The number of microphone elements in each subarray is six, and the three uncorrelated source angles are placed at , and . In Figure 2a and Figure 3a, the sources are human speech. The sources in Figure 2b and Figure 3b are recorded piano sounds comprising various monochromatic notes, with a sampling frequency range up to 48 kHz. Note that none of the sources is a stationary signal. The results in Figure 2 and Figure 3 show that the proposed method with ESPRIT handles both source types efficiently compared to the other candidate methods over acceptable SNR ranges. It is interesting to take a close look at 40 dB SNR in Figure 2 and Figure 3, where IMUSIC, TOFS, and the proposed method with MUSIC and with ESPRIT show very low RMSE, which attests to good DOA estimation. When the SNR is decreased to 25 dB, IMUSIC and TOFS begin to show a much higher RMSE than the proposed methods. When the SNR is decreased to 10 dB, all tested methods are significantly degraded by noise, but the proposed method with ESPRIT still gives more satisfactory results than the other methods. It should be mentioned that IMUSIC and TOFS require the number of sensor elements to be much higher than the number of sources to achieve fairly good results [31,32,33,41]. Hence, the simulation results in Figure 2 and Figure 3 provide evidence that the proposed methods estimate better than the other candidate methods when the incident sources are wideband and non-stationary signals. Although the performance of the proposed method with MUSIC is also degraded by noise, its overall performance is still better than that of the other methods.
Figure 2.
RMSE estimation performance versus SNR in Scenario 1; (a) three different human speeches, and (b) three uncorrelated musical sounds, where six microphones are employed in each subarray.
Figure 3.
SD estimation performance versus SNR in Scenario 1; (a) three different human speeches, and (b) three uncorrelated musical sounds, where six microphones are employed in each subarray.
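The RMSE and SD values plotted in Figure 2 and Figure 3 can be computed from repeated trials along the following lines. This is a minimal sketch under assumed definitions (RMSE taken about the true angle, SD taken about the mean estimate); the function name `doa_rmse_sd` and the sample values are hypothetical, and the definitions given earlier in Section 4 take precedence.

```python
import numpy as np

def doa_rmse_sd(estimates_deg, true_deg):
    """Return (RMSE, SD) in degrees for one source over repeated trials.

    estimates_deg: 1-D sequence of estimated angles, one per trial.
    true_deg: the true source angle.
    Assumed definitions: RMSE is taken about the true angle,
    SD is taken about the mean of the estimates.
    """
    est = np.asarray(estimates_deg, dtype=float)
    rmse = np.sqrt(np.mean((est - true_deg) ** 2))
    sd = np.std(est)  # population standard deviation (ddof=0)
    return rmse, sd

# Hypothetical trial results for a source at 60 degrees.
rmse, sd = doa_rmse_sd([59.0, 61.0, 60.0, 62.0], true_deg=60.0)
bias = np.mean([59.0, 61.0, 60.0, 62.0]) - 60.0
```

With these definitions the squared RMSE equals the squared SD plus the squared bias, which is why the two metrics are reported together: a method can have a small SD and still a large RMSE if its estimates are consistently offset.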
4.2. Scenario 2: Performance with Respect to the Number of Microphone Elements
Figure 4 and Figure 5 illustrate performance comparisons of the selected methods and the proposed methods in terms of RMSE and SD over a range of SNR. The three uncorrelated sources are human speech and are placed at the same angles as before. We start with the case of twelve microphones in Figure 4c and Figure 5c. IMUSIC, TOFS and WS-TOPS exhibit remarkably low RMSE in the SNR range from 15 to 30 dB; this is because their performance depends strongly on the number of sensor elements exceeding the number of sources [31,32,33,41]. Likewise, the proposed methods with MUSIC and ESPRIT also demonstrate very low RMSE, which suggests that the proposed methods, IMUSIC, TOFS and WS-TOPS are all effective for wideband DOA estimation in this setting. However, a low number of microphone elements should be considered for more practical applications. In the case of eight microphones in each subarray, the performance of the selected methods is limited by the number of microphone elements, as illustrated in Figure 4b and Figure 5b. Furthermore, the performance of the selected methods degrades dramatically when four microphones are employed, as illustrated in Figure 4a and Figure 5a. The reason is that undesirable false peaks, caused by noise perturbation, occur in the spatial spectrum of the selected methods; when the noise power at some frequency is high or greater than the source power, the orthogonality between the noise subspace and the search space at that frequency may not be sufficient to prevent false-alarm peaks [41]. In contrast, the RMSE performance of the proposed methods is also degraded, but less than that of the other methods, because the proposed methods focus on the subspace for all frequency bins simultaneously, as shown in Section 3.
Therefore, the proposed methods provide substantially better RMSE performance than the other methods, which implies that the dependency between the number of microphone elements and the number of sources can be relaxed. This ability is valuable in many practical applications.
Figure 4.
RMSE estimation performance versus SNR in Scenario 2; three human speeches are employed and the number of microphone elements in each subarray is (a) four, (b) eight, and (c) twelve.
Figure 5.
SD estimation performance versus SNR in Scenario 2; three human speeches are employed and the number of microphone elements in each subarray is (a) four, (b) eight, and (c) twelve.
4.3. Scenario 3: Performance with Considering Automatic Pairing
This scenario estimates the DOAs of the x and z subarray angles simultaneously with automatic pairing, following the data model in Equation (1). Because the L-shaped array structure consists of two ULAs, as illustrated in Figure 1, some research works estimate the DOAs of the x and z subarray angles separately by implementing 1D DOA estimation for each ULA [17,18,19,20,21,22,23,24,25,26]. When more than one source is present, these algorithms require an additional angle pair-matching procedure to map the relationship between the two independent subarray angles; for instance, the corresponding angle pairs can be found by rearranging the alignment of with a fixed right-hand side of the array manifolds of the z-subarray in the sample cross-covariance matrix [52]. It should be noted that a pair-matching procedure may result in performance degradation caused by pair-matching errors. In order to achieve automatic pairing without the pair-matching procedure, we selected the modified 2D-MUSIC in Equation (44) as the proposed method in this scenario. Furthermore, TOPS, Squared-TOPS and WS-TOPS are excluded from these tests because these methods support only the ULA model. Note that a 2D peak finding algorithm was employed for 2D-IMUSIC, 2D-TOFS and the proposed method. Figure 6 and Figure 7 show performance comparisons of 2D-IMUSIC, 2D-TOFS and the proposed method in terms of RMSE and SD over a range of SNR, where the total number of microphone elements across all subarrays is eight, and the three uncorrelated sources are human speech placed at the same angles as before. Figure 6 indicates that the proposed method with 2D-MUSIC exhibits overall performance very similar to that of 2D-IMUSIC and 2D-TOFS when the SNR exceeds 10 dB; however, the computational burden of the proposed method can be significantly lower than that of the other methods, as discussed further in Section 4.5.
Figure 6.
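RMSE estimation performance versus SNR in Scenario 3, where the total number of microphone elements across all subarrays is eight.
Figure 7.
SD estimation performance versus SNR in Scenario 3, where the total number of microphone elements across all subarrays is eight.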
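The 2D peak finding step mentioned in Scenario 3 is not specified in detail, so the following is only one plausible sketch of it: a grid point of the sampled spatial spectrum is treated as a peak when it is strictly larger than all of its neighbours, and the strongest peaks are returned as paired angle indices. The function name `find_2d_peaks` and the synthetic spectrum are hypothetical and are not taken from the paper.

```python
import numpy as np

def find_2d_peaks(spectrum, num_sources):
    """Return (row, col) indices of the strongest local maxima.

    spectrum: 2-D array, e.g. a spatial spectrum sampled on a grid of
    x-subarray angles (rows) and z-subarray angles (columns).
    A grid point counts as a peak when it is strictly larger than all
    of its (up to eight) neighbours.
    """
    p = np.asarray(spectrum, dtype=float)
    # Pad with -inf so that border points can be compared uniformly.
    padded = np.pad(p, 1, mode="constant", constant_values=-np.inf)
    is_peak = np.ones(p.shape, dtype=bool)
    rows, cols = p.shape
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            is_peak &= p > neighbour
    peak_idx = np.argwhere(is_peak)
    # Sort the peaks by height, strongest first.
    order = np.argsort(p[is_peak])[::-1]
    return [tuple(int(v) for v in peak_idx[i]) for i in order[:num_sources]]

# Hypothetical spectrum with two bumps at (60, 80) and (120, 100) degrees
# on a 1-degree grid.
grid = np.arange(180)
xx, zz = np.meshgrid(grid, grid, indexing="ij")
spectrum = (np.exp(-((xx - 60) ** 2 + (zz - 80) ** 2) / 50.0)
            + 0.8 * np.exp(-((xx - 120) ** 2 + (zz - 100) ** 2) / 50.0))
peaks = find_2d_peaks(spectrum, num_sources=2)
```

Because each returned index pair comes from a single point of the joint spectrum, the x and z subarray angles are paired automatically and no separate pair-matching step is needed.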
4.4. Scenario 4: Performance under Reverberation Environment
In this scenario, we compared the RMSE and SD performance of the proposed methods with that of the other methods with respect to reverberation time. The DOAs of the x and z subarray angles were estimated separately by using the data model in Equation (2), without automatic pairing. The proposed methods in this scenario are the modified MUSIC in Equation (45) and ESPRIT in Equations (50)–(52). The reverberation was simulated by following the procedure in [53], and the simulated wall absorption coefficients are shown in Table 3, where the dimensions of the enclosure room are m, the reverberation time is measured as RT60, and the reverberation time ranges from 200 to 1000 ms. The three uncorrelated sources are the same as before, and the number of microphone elements in each subarray is twelve. Figure 8 illustrates performance comparisons of the selected methods and the proposed methods, where the color in Figure 8a denotes RMSE and the color in Figure 8b denotes SD. The vertical axis represents the reverberation time and the horizontal axis represents the SNR. The simulation results in Figure 8 indicate that reverberation strongly affects the RMSE and SD performance of both the selected methods and the proposed methods, and the performance degrades more significantly at high noise levels and long reverberation times. As the reverberation time decreases, all selected methods begin to show low RMSE. This means that the trade-off between robustness to reverberation and SNR should be considered carefully in actual applications, for instance by applying a reverberation cancellation or noise cancellation technique to obtain more reliable RMSE and SD performance.
The proposed methods, however, largely outperform the other methods across the tested reverberation times and SNR levels between 10 and 40 dB, without considering the trade-off. This supports the claim that the proposed methods can be especially effective for wideband DOA estimation in a reverberant environment.
Table 3.
Wall absorption coefficients at various reverberation times in Scenario 4 [53].
| Reverberation Time based on RT60 (Millisecond) | Axial Wall Plane | |||||
|---|---|---|---|---|---|---|
| Positive Direction | Negative Direction | |||||
| 200 | 0.7236 | 0.2021 | 0.6844 | 0.0792 | 0.2436 | 0.5586 |
| 300 | 0.7142 | 0.1687 | 0.7666 | 0.2650 | 0.2387 | 0.7043 |
| 400 | 0.7306 | 0.0555 | 0.7731 | 0.4091 | 0.8493 | 0.8587 |
| 500 | 0.5064 | 0.4974 | 0.8248 | 0.4189 | 0.8069 | 0.7572 |
| 600 | 0.6074 | 0.6299 | 0.8028 | 0.7599 | 0.6373 | 0.8209 |
| 700 | 0.7442 | 0.7624 | 0.8734 | 0.6922 | 0.6480 | 0.7893 |
| 800 | 0.6779 | 0.6827 | 0.7865 | 0.8045 | 0.8386 | 0.8430 |
| 900 | 0.6992 | 0.7111 | 0.7741 | 0.8752 | 0.8233 | 0.9081 |
| 1000 | 0.7622 | 0.7707 | 0.9394 | 0.8248 | 0.8192 | 0.8398 |
Figure 8.
Performance evaluations of Scenario 4; (a) RMSE estimation performance versus SNR, and (b) SD estimation performance versus SNR, where three uncorrelated human speeches are employed in a reverberant environment. The reverberation was simulated by following the procedure in [53], where the dimensions of the enclosure room are m, the reverberation time is measured as RT60, and the wall absorption coefficients are given in Table 3.
4.5. Computational Complexity
The computational complexity of the proposed methods was evaluated by measuring execution time under a stable environment. We compare the computational complexity in the following cases: (1) calculating the DOAs of the x and z subarray angles separately, as shown in Figure 9a, and (2) calculating the DOAs of both subarray angles simultaneously, as shown in Figure 9b. Note that the computational burden of the peak searching algorithm is taken into account in this study, where the number of search angles in each subarray is 180. Figure 9 shows that the computation time of the other methods grows faster than that of the proposed methods. This is because the execution time of the peak searching algorithm is high, and almost all selected methods require intensive computation to test the orthogonality of the subspace and search space of the narrowband sample cross-correlation matrices for all frequency bins, which results in high computation costs. In contrast, the proposed methods transform all narrowband sample cross-correlation matrices across all frequency bins into a single matrix, as shown in Equations (33)–(35), and this matrix contains the useful information of the source cross-correlation matrices across all frequency bins as ; in other words, the orthogonality testing of the subspace and search space can be done by using the wideband cross-correlation matrix in Equations (33)–(35) instead of the narrowband sample cross-correlation matrices for all frequency bins. Therefore, the computational complexity of the proposed methods is remarkably lower than that of the other methods, which is confirmed by the test results in Figure 9.
Figure 9.
Computational complexities; (a) changing the number of microphone elements in each subarray N, and (b) changing the total number of microphone elements across all subarrays M, where the number of incident sources .
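To make the cost argument of Section 4.5 concrete, the following sketch implements a generic incoherent MUSIC search, in which one eigendecomposition and one full angle scan are needed for every frequency bin. It is not the implementation of IMUSIC, TOFS, or the proposed method used in this paper; the far-field ULA steering model, the speed of sound of 343 m/s, the function names `steering`, `music_spectrum` and `incoherent_music`, and the ideal test matrices are all illustrative assumptions. Only the 9 cm spacing and the 180 search angles are taken from the text.

```python
import numpy as np

def steering(theta_deg, freq_hz, num_mics, spacing_m=0.09, c=343.0):
    """Far-field ULA steering vector (assumed model, angle from the array axis)."""
    delay = spacing_m * np.cos(np.deg2rad(theta_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delay * np.arange(num_mics))

def music_spectrum(R, num_sources, freq_hz, angles_deg):
    """Narrowband MUSIC pseudo-spectrum for one correlation matrix R."""
    num_mics = R.shape[0]
    _, vecs = np.linalg.eigh(R)                # eigenvalues in ascending order
    noise = vecs[:, :num_mics - num_sources]   # noise-subspace eigenvectors
    spectrum = np.empty(len(angles_deg))
    for i, theta in enumerate(angles_deg):
        a = steering(theta, freq_hz, num_mics)
        # Orthogonality test between the noise subspace and a(theta);
        # the small constant avoids division by zero.
        spectrum[i] = 1.0 / (np.linalg.norm(noise.conj().T @ a) ** 2 + 1e-12)
    return spectrum

def incoherent_music(R_per_bin, freqs_hz, num_sources, angles_deg):
    """Average the narrowband spectra: one EVD and one full scan per bin."""
    spectra = [music_spectrum(R, num_sources, f, angles_deg)
               for R, f in zip(R_per_bin, freqs_hz)]
    return np.mean(spectra, axis=0)

# Hypothetical ideal correlation matrices (rank-one signal plus white noise):
# one source at 70 degrees, six microphones, three frequency bins.
angles = np.arange(180)
freqs = [500.0, 1000.0, 1500.0]
R_bins = []
for f in freqs:
    a0 = steering(70.0, f, 6)
    R_bins.append(np.outer(a0, a0.conj()) + 0.01 * np.eye(6))
spec = incoherent_music(R_bins, freqs, num_sources=1, angles_deg=angles)
estimate = int(angles[np.argmax(spec)])
```

With F frequency bins and 180 search angles, this loop performs F eigendecompositions and 180F orthogonality tests. A search on a single wideband matrix needs one eigendecomposition and 180 tests, not counting the cost of constructing that matrix, which is the source of the growth-rate difference discussed above.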
5. Experimental Results
In this section, experiments were carried out to examine the performance of the proposed methods. The experimental parameters were the same as in the previous simulations, except as follows. We used human speakers as sources of the original speech with random sentences. Their speech was recorded continuously for 20 runs, and each recorded signal, approximately 1 min long, was cut into 3 s epochs. The structure of the microphone array follows Figure 1 and Figure 10, and the specifications of the microphones and the recording device are given in Table 4. The experiment was performed in an indoor meeting room whose dimensions are shown in Figure 11, where the sound pressure level in the meeting room in a normal situation is 46.6 dBA and the estimated reverberation time based on RT60 is 219 ms.
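The epoch segmentation described above can be sketched as follows. The sketch assumes non-overlapping epochs and a channel-first array layout, neither of which is stated in the text; the function name `split_into_epochs` is hypothetical, while the 48 kHz sampling frequency, the eight microphones and the 3 s epoch length come from this section and Table 4.

```python
import numpy as np

def split_into_epochs(signal, fs=48000, epoch_sec=3.0):
    """Cut a multichannel recording into non-overlapping epochs.

    signal: array of shape (num_channels, num_samples).
    Returns an array of shape (num_epochs, num_channels, epoch_len);
    trailing samples that do not fill a whole epoch are dropped.
    """
    x = np.asarray(signal)
    epoch_len = int(round(fs * epoch_sec))
    num_epochs = x.shape[1] // epoch_len
    trimmed = x[:, :num_epochs * epoch_len]
    return trimmed.reshape(x.shape[0], num_epochs, epoch_len).transpose(1, 0, 2)

# Hypothetical 8-channel, 60 s recording (zeros stand in for audio).
recording = np.zeros((8, 60 * 48000), dtype=np.float32)
epochs = split_into_epochs(recording)
```

Under these assumptions a 1 min recording yields 20 epochs of 144,000 samples per channel, and each epoch can be passed to the DOA estimator independently.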
Figure 10.
Photograph of the microphone array system.
Table 4.
System specification
| Hardware Type/Parameter | Specification/Value |
|---|---|
| Audio Interface | Roland® Octa-capture (UA-1010) |
| Sampling Frequency | 48,000 Hz |
| Microphone Name | Behringer® C-2 studio condenser microphone |
| Number of Microphones | 8 |
| Pickup Patterns | Cardioid (8.9 mV/Pa; 20–20,000 Hz) |
| Diaphragm Diameter | 16 mm |
| Equivalent Noise Level | 19.0 dBA (IEC 651) |
| SNR Ratio | 75 dB |
| Microphone Structure | L-shaped Array |
| Spacing of Microphone | 9 cm |
Figure 11.
Photograph of the experimental environment, floor plan and the room dimensions.
Two scenarios are considered: (1) estimating the DOAs of the x and z subarray angles separately, and (2) estimating the DOAs of the x and z subarray angles simultaneously with automatic pairing. In Experiment 1, the proposed methods are the modified MUSIC in Equation (45) and ESPRIT in Equations (50)–(52), compared with the following methods: IMUSIC [31], TOFS [32], TOPS [33], Squared-TOPS [34], and WS-TOPS [35]. In Experiment 2, the proposed method is the modified 2D-MUSIC in Equation (44), compared with 2D-IMUSIC [31] and 2D-TOFS [32].
Table 5 and Table 6 show performance comparisons of the selected methods and the proposed methods in terms of RMSE for different numbers of sources, where Table 5 is for Experiment 1 and Table 6 is for Experiment 2. The boldfaced results highlight the minimum RMSE in each case. As highlighted in Table 5, IMUSIC exhibited the lowest average RMSE when a single source was used, but the other methods, including the proposed methods, also exhibited similarly low RMSE within an acceptable error range. When two sources are used, the performance of TOPS, Squared-TOPS and WS-TOPS degrades severely, whereas IMUSIC, TOFS and the proposed methods degrade only slightly and still maintain sufficiently good performance. When the number of incident sources increases to three, the performance of IMUSIC, TOFS, TOPS, Squared-TOPS and WS-TOPS degrades significantly, because those methods require the number of sensor elements to be much higher than the number of sources to achieve reasonably good results, which can be verified by referring to the simulation results in Section 4, Figure 4 and Figure 5. The proposed methods, however, are able to estimate the DOAs of three sources effectively and better than the selected methods. The reason is that the proposed methods focus on the subspace across all frequency bins simultaneously instead of on each frequency band individually, as stated in Section 3.2. For Experiment 2 in Table 6, the experimental results indicate that the proposed method with 2D-MUSIC exhibits overall performance very similar to that of 2D-IMUSIC and 2D-TOFS.
As already stated in Section 4.5, the computational complexity of the proposed method is lower than that of 2D-IMUSIC and 2D-TOFS because those methods check the orthogonality of the subspace and search space of the narrowband sample cross-correlation matrices for all frequency bins, which results in a very high computation requirement. The proposed method instead tests the orthogonality of the subspace and search space by using the wideband sample cross-correlation matrix in Equation (33), yet this is sufficient to achieve an effect comparable to using the subspaces of the narrowband sample cross-correlation matrices for all frequency bins. Overall, the experimental results in Table 5 and Table 6 provide evidence that the proposed methods have better estimation performance than the other methods with respect to the number of incident sources.
Table 5.
Performance evaluation on Experiment 1. The boldfaced results highlight the optimal minimum RMSE.
| Incident Sources | | | RMSE of DOAs (Degree) | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Number | Position | Angle (Degree) | IMUSIC | TOFS | TOPS | Squared-TOPS | WS-TOPS | Proposed Method with MUSIC | Proposed Method with ESPRIT |
| 1 | | 96 | 0.3050 | 0.2050 | 1.0950 | 1.3350 | 0.5600 | 0.7750 | 0.7074 |
| | | 86 | 0.5400 | 1.2600 | 1.2750 | 2.0150 | 0.6850 | 0.5700 | 0.6915 |
| | Average | | 0.4225 | 0.7325 | 1.1850 | 1.6750 | 0.6225 | 0.6725 | 0.6995 |
| 2 | | 65 | 1.1857 | 1.7286 | 20.0143 | 28.5857 | 37.8714 | 1.5000 | 2.0284 |
| | | 150 | 9.6000 | 6.6857 | 26.3571 | 39.7857 | 88.2000 | 8.8143 | 8.6800 |
| | | 55 | 1.0714 | 1.6857 | 22.2571 | 19.4000 | 32.2429 | 2.9714 | 3.8695 |
| | | 100 | 8.3714 | 8.3857 | 5.0143 | 6.7857 | 60.2286 | 6.6714 | 3.1630 |
| | Average | | 5.0571 | 4.6214 | 18.4107 | 23.6393 | 54.6357 | 4.9893 | 4.4353 |
| 3 | | 58 | 2.1400 | 2.3900 | 46.5500 | 52.8100 | 40.9500 | 3.6600 | 4.0334 |
| | | 55 | 55.0000 | 55.0000 | 55.0000 | 55.0000 | 55.0000 | 9.4300 | 4.1057 |
| | | 100 | 1.8400 | 2.0000 | 41.5700 | 62.4000 | 70.9100 | 1.8700 | 2.4554 |
| | | 95 | 95.0000 | 83.4200 | 52.4500 | 71.4800 | 95.0000 | 9.7700 | 5.8638 |
| | | 130 | 10.9300 | 11.8900 | 28.8300 | 32.2800 | 95.2400 | 8.2500 | 6.9071 |
| | | 120 | 26.9800 | 25.8400 | 16.1200 | 18.0100 | 91.2800 | 5.9400 | 7.3165 |
| | Average | | 31.9817 | 30.0900 | 40.0867 | 48.6633 | 74.7300 | 6.4867 | 5.1137 |
Table 6.
Performance evaluation on Experiment 2. The boldfaced results highlight the optimal minimum RMSE.
| Incident Sources | | | RMSE of DOAs (Degree) | | |
|---|---|---|---|---|---|
| Number | Position | Angle (Degree) | 2D-IMUSIC | 2D-TOFS | Proposed Method with 2D-MUSIC |
| 1 | | 96 | 0.9000 | 0.9000 | 0.9000 |
| | | 86 | 0.4000 | 1.0500 | 0.7500 |
| | Average | | 0.6500 | 0.9750 | 0.8250 |
| 2 | | 57 | 0.9500 | 1.1500 | 1.1000 |
| | | 91 | 1.0500 | 1.8000 | 1.7000 |
| | | 139 | 4.9500 | 5.2000 | 5.4500 |
| | | 96 | 3.1500 | 3.3000 | 2.0500 |
| | Average | | 2.5250 | 2.8625 | 2.5750 |
| 3 | | 48 | 0.9500 | 1.5500 | 1.9500 |
| | | 86 | 1.4500 | 0.8000 | 2.4500 |
| | | 98 | 0.9000 | 1.8000 | 1.1500 |
| | | 95 | 1.4500 | 2.1500 | 2.6000 |
| | | 152 | 2.7000 | 2.4000 | 5.9000 |
| | | 95 | 4.5000 | 3.9000 | 1.4500 |
| | Average | | 1.9917 | 2.1000 | 2.5833 |
| 4 | | 100 | 5.8095 | 6.5238 | 3.2857 |
| | | 94 | 2.4286 | 2.6190 | 1.6667 |
| | | 51 | 1.2381 | 1.0952 | 2.5714 |
| | | 95 | 0.5714 | 0.6667 | 1.3333 |
| | | 134 | 1.9524 | 1.8571 | 3.9524 |
| | | 103 | 10.0952 | 10.2857 | 9.2857 |
| | | 153 | 7.4762 | 7.8095 | 7.8571 |
| | | 89 | 4.7143 | 4.7143 | 5.3810 |
| | Average | | 4.2857 | 4.4464 | 4.4167 |
Since the sound source directions are static in Table 5 and Table 6, it is necessary to consider moving sound sources for more practical use. In future work, we will extend the proposed method for moving sound sources, and further develop the prototype to support more realistic tasks.
6. Conclusions
An efficient framework for estimating the DOAs of wideband sound sources was presented. The issue of transforming multiple narrowband cross-correlation matrices for all frequency bins into a wideband cross-correlation matrix has been addressed by focusing on the signal subspace for all frequency bins simultaneously, instead of the pairing of temporal and reference frequency as done by the CSS-based methods. A new solution to this problem has been given by performing HOGSVD of the array of novel cross-correlation matrices, where the elements in the row and column positions are sample cross-correlation matrices between the received signal and itself on two distinct frequencies. The theoretical analysis showed that the proposed transformation procedure provides the best solution under appropriate constraints and no longer requires any preliminary DOA estimation. Subsequently, we provided an alternative way to construct the wideband cross-correlation matrix via the proposed transformation procedure, and wideband DOAs were estimated using this wideband matrix together with a single DOA estimation scheme from any narrowband subspace method. A major contribution of this paper is that the proposed framework enables recent narrowband subspace methods to estimate the DOAs of wideband sources directly, which reduces computational complexity and simplifies the estimation algorithm. We have also presented several examples of using the proposed framework, such as the integration of the 2D-MUSIC, MUSIC, and ESPRIT methods with L-shaped microphone arrays. Furthermore, the simulation and experimental results showed that the fusion methods based on the proposed framework performed effectively compared to other wideband DOA estimation methods over a range of SNR, with far fewer sensors and under high-noise and reverberant conditions.
We believe that the proposed method represents an efficient way for wideband DOA estimation and would be able to improve wideband DOA estimates not only for acoustic signal processing but also other possible related fields.
Acknowledgments
The authors are grateful to Kochi University of Technology for Monthly Support via a grant of Special Scholarship Program over a period of three years.
Appendix A. Proof of Theorem 1
This appendix provides a detailed derivation of Theorem 1. We begin by considering the cross-correlation matrices in Equation (7). can be constructed into the EVD form, which is given by
| (A1) |
where , are the matrix of eigenvectors and diagonal matrix of eigenvalues in signal subspace, and likewise, , are with noise subspace. In case of , it can be derived by performing SVD, which directly follows from Equation (10). Since and are full rank matrices [36], its remaining components are expressed as follows [48]:
| (A2) |
where
| (A3) |
are also full rank and invertible. Note again that have orthonormal columns [46], hence, it is obvious to see that
| (A4) |
| (A5) |
From Equation (A4), we may expect that have unitary property, but it is incorrect when considering Equation (A5). Therefore, a proposition of have to be identity;
| (A6) |
When considering only the signal subspace, it can be seen from Equations (A4)–(A6) that the right singular vectors of and the eigenvectors of are identical.
Next, we continue to generalize the objective function in Equation (15) by utilizing Orthogonal Procrustes (OP) [54], but with some modification (MOP). The objective function in Equation (15) is rederived by
| (A7) |
where returns the real part of the variable a, trace of the square matrix . Considering each expression in Equation (A7), the trace of a product of two square matrices is independent of the orders;
| (A8) |
Employing Lemma 1, next, we have
| (A9) |
From Equation (A6), we finally have
| (A10) |
Substituting Equations (A8)–(A10) into Equation (A7), the objective function is simplified as
| (A11) |
Three expressions of , , are completely isolated from . Therefore, the optimization problem is redefined as
| (A12) |
Now there are two possible cases which we need to consider. The first case is when the smallest singular values of are close to zero; the other is when some of the smallest singular values of are greater than zero.
Case 1: Assuming that all the smallest singular values of are close to zero, we have
| (A13) |
Using the proposition of Equation (A6) and employing Lemma 1, two possible solutions to reach the maximum point of Equation (A13) can be found. The first solution is given by
| (A14) |
where orthonormal columns has not yet been defined on , and the second solution is given by
| (A15) |
where has orthonormal columns. Note that the subscript † denotes the pseudo-inverse. When the constraints in Equation (A13) are imposed into Equations (A15) and (A14), we can have that , , and the maximum is achieved;
| (A16) |
Case 2: Assuming that some of the smallest singular values of are greater than zero, the best solution of Equation (A12) is the same as Equation (A15), and its minimum is equal to Equation (A16);
| (A17) |
In contrast, when Equation (A14) is used in Equation (A12), the minimum of the cost function remains
| (A18) |
Using the solution of Equation (A14) rather than Equation (A15) allows us to relax the error constraint in the hope of arriving at a reduction in the computation of HOGSVD (For details, see Section 3.2), but this is still sufficient for estimating without loss of generality; the squares of smallest singular values of are very close to zeros, so we can assume that . Remark that error of the transformation remains consistent with the following equation;
| (A19) |
To further reduce a computational burden caused by performing SVD of and EVD of , we reinitialize the cross-correlation matrix as
| (A20) |
which makes it possible to reduce the computation by performing a single SVD operation on .
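For reference, the classical Orthogonal Procrustes problem on which the derivation in this appendix builds can be checked numerically as follows. The sketch covers only the standard, unmodified problem (find the unitary matrix Q that minimises the Frobenius norm of AQ − B), not the modified version (MOP) derived above; the function name `orthogonal_procrustes` and the random test matrices are hypothetical.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Unitary Q minimising ||A @ Q - B||_F (classical solution via SVD)."""
    # Expanding the norm leaves Re tr(Q^H A^H B) as the only Q-dependent
    # term; it is maximised by Q = U V^H, where A^H B = U S V^H.
    U, _, Vh = np.linalg.svd(A.conj().T @ B)
    return U @ Vh

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
# Build a known unitary matrix from a QR factorisation.
Q_true, _ = np.linalg.qr(rng.standard_normal((3, 3))
                         + 1j * rng.standard_normal((3, 3)))
B = A @ Q_true
Q_est = orthogonal_procrustes(A, B)

# The trace of a product of two matrices is independent of their order.
trace_ab = np.trace(A.conj().T @ B)
trace_ba = np.trace(B @ A.conj().T)
```

In this noise-free example the recovered matrix coincides with the unitary matrix used to generate B, and the two traces agree, which illustrates the trace property used in the step from Equation (A7) to Equation (A8).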
Author Contributions
B.S. conceived of the hypothesis, provided the mathematical proof, designed and performed the experiments and wrote the manuscript as part of a PhD project. M.F. supervised the project and contributed to the development of the ideas.
Funding
This work was supported by JSPS KAKENHI Grant Number JP18K12111 and MEXT Grant Number 91506000972.
Conflicts of Interest
The authors declare no conflict of interest.
References
- 1.Haykin S., Liu K.R. Handbook on Array Processing and Sensor Networks. Wiley; Hoboken, NJ, USA: 2010. [DOI] [Google Scholar]
- 2.Zekavat R., Buehrer R.M. Handbook of Position Location: Theory, Practice and Advances. 1st ed. Wiley; Hoboken, NJ, USA: 2011. [DOI] [Google Scholar]
- 3.Song K., Liu Q., Wang Q. Olfaction and Hearing Based Mobile Robot Navigation for Odor/Sound Source Search. Sensors. 2011;11:2129–2154. doi: 10.3390/s110202129. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Velasco J., Pizarro D., Macias-Guarasa J. Source Localization with Acoustic Sensor Arrays Using Generative Model Based Fitting with Sparse Constraints. Sensors. 2012;12:13781–13812. doi: 10.3390/s121013781. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Tiete J., Domínguez F., Silva B.D., Segers L., Steenhaut K., Touhafi A. SoundCompass: A Distributed MEMS Microphone Array-Based Sensor for Sound Source Localization. Sensors. 2014;14:1918–1949. doi: 10.3390/s140201918. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Clark B., Flint J.A. Acoustical Direction Finding with Time-Modulated Arrays. Sensors. 2016;16:2107. doi: 10.3390/s16122107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Hoshiba K., Washizaki K., Wakabayashi M., Ishiki T., Kumon M., Bando Y., Gabriel D., Nakadai K., Okuno H.G. Design of UAV-Embedded Microphone Array System for Sound Source Localization in Outdoor Environments. Sensors. 2017;17:2535. doi: 10.3390/s17112535. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Liu H., Li B., Yuan X., Zhou Q., Huang J. A Robust Real Time Direction-of-Arrival Estimation Method for Sequential Movement Events of Vehicles. Sensors. 2018;18:992. doi: 10.3390/s18040992. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Knapp C., Carter G. The generalized correlation method for estimation of time delay. IEEE Trans. Acoust. Speech Signal Process. 1976;24:320–327. doi: 10.1109/TASSP.1976.1162830. [DOI] [Google Scholar]
- 10.Sawada H., Mukai R., Araki S., Makino S. A robust and precise method for solving the permutation problem of frequency-domain blind source separation. IEEE Trans. Speech Audio Process. 2004;12:530–538. doi: 10.1109/TSA.2004.832994. [DOI] [Google Scholar]
- 11.Yokoi K., Hamada N. ICA-Based Separation and DOA Estimation of Analog Modulated Signals in Multipath Environment. IEICE Trans. Commun. 2005;88-B:4246–4249. doi: 10.1093/ietcom/e88-b.11.4246. [DOI] [Google Scholar]
- 12.Schmidt R. Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag. 1986;34:276–280. doi: 10.1109/TAP.1986.1143830. [DOI] [Google Scholar]
- 13.Roy R., Kailath T. ESPRIT-estimation of signal parameters via rotational invariance techniques. IEEE Trans. Acoust. Speech Signal Process. 1989;37:984–995. doi: 10.1109/29.32276. [DOI] [Google Scholar]
- 14.Marcos S., Marsal A., Benidir M. Performances analysis of the propagator method for source bearing estimation; Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-94); Adelaide, SA, Australia. 19–22 April 1994; pp. IV/237–IV/240. [DOI] [Google Scholar]
- 15.Marcos S., Marsal A., Benidir M. The propagator method for source bearing estimation. Signal Process. 1995;42:121–138. doi: 10.1016/0165-1684(94)00122-G. [DOI] [Google Scholar]
- 16.Hua Y., Sarkar T.K., Weiner D.D. An L-shaped array for estimating 2-D directions of wave arrival. IEEE Trans. Antennas Propag. 1991;39:143–146. doi: 10.1109/8.68174. [DOI] [Google Scholar]
- 17.Porozantzidou M.G., Chryssomallis M.T. Azimuth and elevation angles estimation using 2-D MUSIC algorithm with an L-shape antenna; Proceedings of the 2010 IEEE Antennas and Propagation Society International Symposium; Toronto, ON, Canada. 11–17 July 2010; pp. 1–4. [DOI] [Google Scholar]
- 18.Wang G., Xin J., Zheng N., Sano A. Computationally Efficient Subspace-Based Method for Two-Dimensional Direction Estimation With L-Shaped Array. IEEE Trans. Signal Process. 2011;59:3197–3212. doi: 10.1109/TSP.2011.2144591. [DOI] [Google Scholar]
- 19.Nie X., Wei P. Array Aperture Extension Algorithm for 2-D DOA Estimation with L-Shaped Array. Progress Electromagn. Res. Lett. 2015;52:63–69. doi: 10.2528/PIERL15011502. [DOI] [Google Scholar]
- 20.Tayem N. Azimuth/Elevation Directional Finding with Automatic Pair Matching. Int. J. Antennas Propag. 2016;2016:5063450. doi: 10.1155/2016/5063450. [DOI] [Google Scholar]
- 21.Wang Q., Yang H., Chen H., Dong Y., Wang L. A Low-Complexity Method for Two-Dimensional Direction-of-Arrival Estimation Using an L-Shaped Array. Sensors. 2017;17:190. doi: 10.3390/s17010190. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Li J., Jiang D. Joint Elevation and Azimuth Angles Estimation for L-Shaped Array. IEEE Antennas Wirel. Propag. Lett. 2017;16:453–456. doi: 10.1109/LAWP.2016.2582922. [DOI] [Google Scholar]
- 23.Dong Y.Y., Chang X. Computationally Efficient 2D DOA Estimation for L-Shaped Array with Unknown Mutual Coupling. Math. Probl. Eng. 2018;2018:1–9. doi: 10.1155/2018/5454719. [DOI] [Google Scholar]
- 24.Hsu K.C., Kiang J.F. Joint Estimation of DOA and Frequency of Multiple Sources with Orthogonal Coprime Arrays. Sensors. 2019;19:335. doi: 10.3390/s19020335. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Wu T., Deng Z., Li Y., Li Z., Huang Y. Estimation of Two-Dimensional Non-Symmetric Incoherently Distributed Source with L-Shape Arrays. Sensors. 2019;19:1226. doi: 10.3390/s19051226. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Gao X., Hao X., Li P., Li G. An Improved Two-Dimensional Direction-of-Arrival Estimation Algorithm for L-Shaped Nested Arrays with Small Sample Sizes. Sensors. 2019;19:2176. doi: 10.3390/s19092176. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Omer M., Quadeer A.A., Al-Naffouri T.Y., Sharawi M.S. An L-shaped microphone array configuration for impulsive acoustic source localization in 2-D using orthogonal clustering based time delay estimation; Proceedings of the 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA); Sharjah, UAE. 12–14 February 2013; pp. 1–6. [DOI] [Google Scholar]
- 28.Wajid M., Kumar A., Bahl R. Direction-of-arrival estimation algorithms using single acoustic vector-sensor; Proceedings of the International Conference on Multimedia, Signal Processing and Communication Technologies (IMPACT); Aligarh, India. 24–26 November 2017; pp. 84–88. [DOI] [Google Scholar]
- 29.Sugimoyo Y., Miyabe S., Yamada T., Makino S., Juang B.H. An Extension of MUSIC Exploiting Higher-Order Moments via Nonlinear Mapping. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 2016;E99.A:1152–1162. doi: 10.1587/transfun.E99.A.1152. [DOI] [Google Scholar]
- 30. Suksiri B., Fukumoto M. Multiple Frequency and Source Angle Estimation by Gaussian Mixture Model with Modified Microphone Array Data Model. J. Signal Process. 2017;21:163–166. doi: 10.2299/jsp.21.163.
- 31. Su G., Morf M. The signal subspace approach for multiple wide-band emitter location. IEEE Trans. Acoust. Speech Signal Process. 1983;31:1502–1522. doi: 10.1109/TASSP.1983.1164233.
- 32. Yu H., Liu J., Huang Z., Zhou Y., Xu X. A New Method for Wideband DOA Estimation; Proceedings of the International Conference on Wireless Communications, Networking and Mobile Computing; Shanghai, China. 21–25 September 2007; pp. 598–601.
- 33. Yoon Y.S., Kaplan L.M., McClellan J.H. TOPS: New DOA estimator for wideband signals. IEEE Trans. Signal Process. 2006;54:1977–1989. doi: 10.1109/TSP.2006.872581.
- 34. Okane K., Ohtsuki T. Resolution Improvement of Wideband Direction-of-Arrival Estimation “Squared-TOPS”; Proceedings of the IEEE International Conference on Communications; Cape Town, South Africa. 23–27 May 2010; pp. 1–5.
- 35. Hayashi H., Ohtsuki T. DOA estimation for wideband signals based on weighted Squared TOPS. EURASIP J. Wirel. Commun. Netw. 2016;2016:243. doi: 10.1186/s13638-016-0743-9.
- 36. Wang H., Kaveh M. Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources. IEEE Trans. Acoust. Speech Signal Process. 1985;33:823–831. doi: 10.1109/TASSP.1985.1164667.
- 37. Hung H., Kaveh M. Focussing matrices for coherent signal-subspace processing. IEEE Trans. Acoust. Speech Signal Process. 1988;36:1272–1281. doi: 10.1109/29.1655.
- 38. Valaee S., Kabal P. Wideband array processing using a two-sided correlation transformation. IEEE Trans. Signal Process. 1995;43:160–172. doi: 10.1109/78.365295.
- 39. Valaee S., Champagne B., Kabal P. Localization of wideband signals using least-squares and total least-squares approaches. IEEE Trans. Signal Process. 1999;47:1213–1222. doi: 10.1109/78.757209.
- 40. Suksiri B., Fukumoto M. A Computationally Efficient Wideband Direction-of-Arrival Estimation Method for L-Shaped Microphone Arrays; Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS); Florence, Italy. 27–30 May 2018; pp. 1–5.
- 41. Abdelbari A. Ph.D. Thesis. Near East University; Nicosia, Cyprus: 2018. Direction of Arrival Estimation of Wideband RF Sources.
- 42. Ponnapalli S.P., Saunders M.A., Van Loan C.F., Alter O. A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression from Multiple Organisms. PLoS ONE. 2011;6:1–11. doi: 10.1371/journal.pone.0028072.
- 43. Xin J., Zheng N., Sano A. Simple and Efficient Nonparametric Method for Estimating the Number of Signals Without Eigendecomposition. IEEE Trans. Signal Process. 2007;55:1405–1420. doi: 10.1109/TSP.2006.889982.
- 44. Nadler B. Nonparametric Detection of Signals by Information Theoretic Criteria: Performance Analysis and an Improved Estimator. IEEE Trans. Signal Process. 2010;58:2746–2756. doi: 10.1109/TSP.2010.2042481.
- 45. Diehl R. Acoustic and auditory phonetics: The adaptive design of speech sound systems. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 2008;363:965–978. doi: 10.1098/rstb.2007.2153.
- 46. Van Der Veen A., Deprettere E.F., Swindlehurst A.L. Subspace-based signal analysis using singular value decomposition. Proc. IEEE. 1993;81:1277–1308. doi: 10.1109/5.237536.
- 47. Hogben L. Handbook of Linear Algebra. CRC Press; Boca Raton, FL, USA: 2006. Discrete Mathematics and Its Applications.
- 48. Naidu P. Sensor Array Signal Processing. Taylor & Francis; Abingdon-on-Thames, UK: 2000.
- 49. Meyer C.D., editor. Matrix Analysis and Applied Linear Algebra. Society for Industrial and Applied Mathematics; Philadelphia, PA, USA: 2000.
- 50. Horn R.A., Johnson C.R. Matrix Analysis. 2nd ed. Cambridge University Press; Cambridge, UK: 2012.
- 51. Van Loan C.F. The Higher-Order Generalized Singular Value Decomposition. Cornell University; Ithaca, NY, USA: 2015. Structured Matrix Computations from Structured Tensors: Lecture 6.
- 52. Wei Y., Guo X. Pair-Matching Method by Signal Covariance Matrices for 2D-DOA Estimation. IEEE Antennas Wirel. Propag. Lett. 2014;13:1199–1202. doi: 10.1109/LAWP.2014.2331076.
- 53. Lehmann E.A., Johansson A.M. Prediction of energy decay in room impulse responses simulated with an image-source model. J. Acoust. Soc. Am. 2008;124:269–277. doi: 10.1121/1.2936367.
- 54. Gower J.C., Dijksterhuis G.B. Procrustes Problems. 1st ed. Oxford University Press; Oxford, UK: 2004.











