Author manuscript; available in PMC: 2022 Jan 1.
Published in final edited form as: IEEE/ACM Trans Audio Speech Lang Process. 2020 Nov 17;29:171–186. doi: 10.1109/taslp.2020.3038526

Proportionate Adaptive Filtering Algorithms Derived Using an Iterative Reweighting Framework

Ching-Hua Lee 1, Bhaskar D Rao 1, Harinath Garudadri 1
PMCID: PMC7996480  NIHMSID: NIHMS1653718  PMID: 33778097

Abstract

In this paper, based on sparsity-promoting regularization techniques from the sparse signal recovery (SSR) area, least mean square (LMS)-type sparse adaptive filtering algorithms are derived. The approach mimics the iterative reweighted ℓ2 and ℓ1 SSR methods that majorize the regularized objective function during the optimization process. We show that introducing the majorizers leads to the same algorithm as simply using the gradient update of the regularized objective function, as is done in existing approaches. Different from the past works, the reweighting formulation naturally leads to an affine scaling transformation (AST) strategy, which effectively introduces a diagonal weighting on the gradient, giving rise to new algorithms that demonstrate improved convergence properties. Interestingly, setting the regularization coefficient to zero in the proposed AST-based framework leads to the Sparsity-promoting LMS (SLMS) and Sparsity-promoting Normalized LMS (SNLMS) algorithms, which exploit but do not strictly enforce the sparsity of the system response if it already exists. The SLMS and SNLMS realize proportionate adaptation for convergence speedup should sparsity be present in the underlying system response. In this manner, we develop a new way for rigorously deriving a large class of proportionate algorithms, and also explain why they are useful in applications where the underlying systems admit certain sparsity, e.g., in acoustic echo and feedback cancellation.

Index Terms—: affine scaling, iterative reweighted, proportionate adaptation, sparse adaptive filter, sparse signal recovery

I. Introduction

ADAPTIVE filters [1], [2], [3], [4] have been an active research area over the past few decades for their capabilities of estimating and tracking time-varying systems. In several applications, the impulse responses (IRs) of the underlying systems to be identified are often sparse or compressible (quasi-sparse), i.e., only a small percentage of the IR components have a significant magnitude while the rest are zero or small. Examples include network and acoustic echo cancellation [5], [6], [7], hands-free mobile telephony [8], acoustic feedback reduction in hearing aids [9], and underwater acoustic communications [10], to mention a few. Designing adaptive filters that can exploit the sparse structure of the underlying system response for performance improvement over the conventional approaches, e.g., the least mean square (LMS) and normalized LMS (NLMS), is of great interest and importance, especially for acoustic and speech applications. In this paper, we utilize as a starting point the iterative reweighted ℓ2 and ℓ1 algorithms that have been developed in the sparse signal recovery (SSR) area to minimize diversity measures [11]. Incorporating an affine scaling transformation (AST) [12], [13] into the algorithm design process, we present a new methodology for developing a large class of adaptive filters that leverage the sparse nature of the system responses.

A. Related Work

An early and influential work on identifying sparse IRs is the proportionate NLMS (PNLMS) algorithm proposed by Duttweiler [5] for acoustic echo cancellation. The main idea behind the approach is to update each filter coefficient using a step size proportional to the magnitude of the estimated coefficient, as opposed to the NLMS which assigns a uniform adaptation gain to all coefficients. Consequently, when the system is sparse, larger coefficients are adapted using relatively large steps compared to the smaller ones with the PNLMS. The overall convergence can thus be sped up by focusing on adjusting the significant coefficients, rather than treating them all equally as in the NLMS. Although the PNLMS was developed in an intuitive way, i.e., the equations used to calculate the proportionate factors that realize step-size control were based on good heuristics rather than on any optimization criterion, it has motivated many new proportionate variants for sparse system identification. The proportionate class of algorithms represents an important subset among sparsity-aware adaptive filters.

The recent progress on SSR has led to a number of computational algorithms, e.g., [14], [15], [16], [17], [18], [19], among others. This makes available a plethora of approaches for systematically designing sparsity-aware adaptive algorithms that are a natural complement to the SSR batch estimation techniques. As a result, different from the proportionate approaches, another class of sparse adaptive filters has been introduced by integrating sparsity-inducing regularization to accelerate the convergence of near-zero coefficients in sparse systems. This has led to several sparse adaptive filtering algorithms and even to a general framework of adaptive filters that incorporates sparsity. SSR-motivated adaptive algorithms represent another important class of sparsity-aware adaptive filters. We now discuss a few works on the proportionate class followed by the SSR variants.

Several variants of the PNLMS have been proposed and [20] provides a good summary. Examples include the improved PNLMS (IPNLMS) [21], the IPNLMS based on the ℓ0 "norm" (IPNLMS-ℓ0) [22], etc. In [23], Martin et al. utilized a natural gradient framework to deduce adaptive filters having similar features to the PNLMS that can exploit the sparse structure. Rao and Song [24] and Jin [25] proposed a framework for promoting sparsity in adaptive filters based on minimizing diversity measures. The framework is quite general and encompasses a broad range of adaptive filtering algorithms having similarity with the PNLMS algorithm. Benesty et al. [26] derived the PNLMS from a different perspective by using a basis pursuit [17] formulation. Following them, Liu and Grant [27] proposed a general framework of proportionate adaptive filters based on convex optimization and sparseness measures, which covers many traditional proportionate algorithms.

Several SSR-inspired algorithms have been introduced by integrating a sparsity-inducing regularizer into the original LMS objective function to accelerate the convergence of near-zero coefficients in sparse systems. For example, Chen et al. [28] proposed the zero-attracting LMS (ZA-LMS), derived by including the ℓ1 norm penalty in the objective function. They also proposed the reweighted ZA-LMS (RZA-LMS) obtained by incorporating the log-sum penalty. Later, using an approximation of the ℓ0 "norm" as a sparsity-inducing term, Gu et al. [29] proposed the ℓ0-LMS, which is capable of better estimating sparse systems. In [30], the authors utilized the ℓp-norm-like regularization function and considered the quantitative learning of the regularizer. Another work in this area is the reweighted ℓ1 norm penalized LMS algorithm proposed and studied in [31] for improving the ZA-LMS and RZA-LMS.

Recently, some works have considered both proportionate adaptation and sparsity-inducing regularization together. For example, [32] presents a modified PNLMS update equation with a zero attractor as in the ZA-LMS for all the taps, derived by introducing a carefully constructed ℓ1 norm penalty in the PNLMS objective function. Other than the ℓ1 norm, [33], [34] apply the ℓp norm penalty to the PNLMS cost function and derive ℓp-norm-constrained proportionate algorithms for improved broadband multipath channel estimation and active noise control. [35] encompasses a number of sparsity-aware adaptive filtering algorithms that go beyond the LMS and NLMS, including proportionate and regularization-based approaches. [36], [37] provide a general framework to combine proportionate updates and sparsity-inducing regularizers. In Section III, we will derive algorithms whose update rules also consist of a proportionate term and another term due to regularization. However, our derivation follows a very different path from these previous works.

B. Contributions of the Paper

In this paper, inspired by the conceptual similarity with SSR, our goal is to add to this interesting body of work on adaptive filtering and sparsity. The contributions of the paper are the following:

  1. The sparsity-aware adaptive filters developed here lie at the intersection of the proportionate class and the SSR-inspired adaptive algorithms, providing an interesting bridge between the two. We start with the rigorous formulation of a regularization framework and derive sparsity-aware adaptive filtering algorithms. Specifically, based on diversity measure minimization in SSR, we adopt the reweighted ℓ2 and ℓ1 frameworks [11]. Using an AST methodology [12], [13] in the algorithm development, our work naturally results in a general class of proportionate algorithms. This is a unique feature of this work. The combination of AST and the reweighting frameworks constitutes the main innovation of our adaptive algorithm development framework.

  2. Under the proposed framework, we introduce the Sparsity-promoting LMS (SLMS) and Sparsity-promoting NLMS (SNLMS) algorithms that promote sparsity without biasing the adaptation process by adopting λ = 0, where λ is the regularization coefficient associated with the sparsity penalty. This is not possible for the class of algorithms currently in existence that utilize a sparsity-inducing regularization penalty. The SLMS and SNLMS can be viewed as realizing proportionate adaptation like the PNLMS class of algorithms [5]. Therefore, utilizing λ = 0 in our framework paves the way for explaining why proportionate algorithms are useful in circumstances where the channels to be estimated admit certain sparsity. This provides theoretical support to existing proportionate algorithms which were mostly developed based on good heuristics rather than on optimization criteria. More importantly, unlike most of them that design the proportionate factors heuristically, our SSR-motivated framework leads to a more systematic way of designing the factors, and permits incorporation of a broad class of diversity measures that have proven to be effective for SSR in our algorithms.

  3. Compared to existing derivations of proportionate-type algorithms, using the proposed framework we derive the algorithms in a more natural way in terms of incorporating sparsity using a regularization framework. For instance, in some of the existing works modified objective functions were proposed that impose sparsity on the “change” of the filter rather than on the filter itself, e.g., [24], [25], [26], [27], [32]. However, since the assumption is that the filter itself is sparse, the motivation for enforcing sparsity on the “change” rather than on the filter is not clear and at best indirect. In contrast, we work with the general mean squared error (MSE) criterion in which sparsity can be directly imposed via regularization on the filter.

  4. Steady-state analysis of the proposed algorithms is conducted and simulation results are provided to demonstrate the effectiveness of the proposed algorithms compared to existing approaches. Examples with the acoustic channel response measured on a real-world hearing aid device using speech input are also presented.

A portion of this work has been previously published as a conference paper [39]. The current paper extends it by providing a general framework for incorporating flexible diversity measures into sparse adaptive filters, together with theoretical discussion and supporting simulation results.

C. Organization of the Paper

The rest of the paper is organized as follows. Section II provides background on adaptive filters. Section III derives adaptive filters that incorporate sparsity based on diversity measure minimization, by utilizing the reweighted ℓ2 and ℓ1 frameworks together with the AST methodology. Section IV introduces the SLMS and SNLMS that adopt λ = 0. Section V discusses the steady-state analysis. Section VI presents simulation results. Section VII concludes the paper.

D. Notation

Let $\mathbb{R}^M$ denote the M-dimensional real Euclidean space. $\mathbb{R}^{N\times M}$ denotes the set of N × M real matrices. $\mathbb{R}_+$ denotes the set of non-negative real numbers. Superscript $T$ denotes the transpose of a vector or matrix. E[·] denotes the mathematical expectation. Vectors and matrices are denoted by boldface lowercase and uppercase letters, respectively. Scalars are denoted by italics. For a vector $\mathbf{x} = [x_0, x_1, \ldots, x_{M-1}]^T \in \mathbb{R}^M$, the ℓp norm (where p > 0) is defined as $\|\mathbf{x}\|_p = \left(\sum_{i=0}^{M-1} |x_i|^p\right)^{1/p}$. The ℓ0 "norm" $\|\mathbf{x}\|_0$ is defined as the number of nonzero entries of the vector $\mathbf{x}$. We use diag{$x_i$} to denote the M-by-M diagonal matrix whose i-th diagonal element is $x_i$. We use sgn(·) to denote the component-wise sign function that takes the sign of the entries of its argument. $\nabla_{\mathbf{x}}$ denotes the gradient operator w.r.t. $\mathbf{x}$. $d$ denotes the differential operator. tr($\mathbf{X}$) denotes the trace of a square matrix $\mathbf{X} \in \mathbb{R}^{M\times M}$. $\mathbf{I}$ is the identity matrix. $\mathbf{1}$ denotes the vector of all ones. $\mathbf{0}$ denotes the vector of all zeros. We use $\mathcal{N}(\cdot,\cdot)$ to denote the normal distribution with the first and second arguments being the mean and (co)variance, respectively.

II. Background on Adaptive Filtering and SSR

We provide some preliminaries of adaptive filters in the context of system identification and present several examples of existing sparsity-aware adaptive filtering algorithms. We also discuss the iterative reweighting frameworks in SSR for developing our adaptive algorithms in later sections.

A. Adaptive Filtering for System Identification

Let $\mathbf{h}_n = [h_{0,n}, h_{1,n}, \ldots, h_{M-1,n}]^T$ denote the adaptive filter of length M at discrete time instant n. Assume the IR of the underlying system is $\mathbf{h}^o = [h_0^o, h_1^o, \ldots, h_{M-1}^o]^T$, and the model for the observed or desired signal is $d_n = \mathbf{u}_n^T\mathbf{h}^o + v_n$, where $\mathbf{u}_n = [u_n, u_{n-1}, \ldots, u_{n-M+1}]^T$ is the vector containing the M most recent samples of the input signal $u_n$ and $v_n$ is an additive noise signal. The output of the adaptive filter $\mathbf{u}_n^T\mathbf{h}_n$ is subtracted from $d_n$ to obtain the error signal $e_n = d_n - \mathbf{u}_n^T\mathbf{h}_n$. The goal in general is to sequentially update the coefficients of $\mathbf{h}_n$ upon the arrival of a new data pair $(\mathbf{u}_n, d_n)$, such that eventually $\mathbf{h}_n = \mathbf{h}^o$, i.e., to identify the unknown system.
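
To make the signal model concrete, the following is a minimal MATLAB sketch (not taken from the paper) that generates synthetic data according to $d_n = \mathbf{u}_n^T\mathbf{h}^o + v_n$ for a hypothetical sparse IR; all names and values (M, N, the tap positions) are illustrative choices.

```matlab
% Minimal sketch of the system identification signal model (illustrative only).
M = 256;                          % filter length
N = 5000;                         % number of time samples
h_o = zeros(M, 1);                % a hypothetical sparse impulse response
h_o([5 40 140]) = [1.0 -0.5 0.2];
u = randn(N, 1);                  % white Gaussian input signal
v = sqrt(0.01) * randn(N, 1);     % additive noise v_n
d = zeros(N, 1);
for n = M:N
    u_n  = u(n:-1:n-M+1);         % M most recent input samples
    d(n) = u_n' * h_o + v(n);     % observed (desired) signal d_n
end
```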

The most classic adaptive filtering algorithms are the LMS and NLMS [1], [2], [3], which can be derived based on minimizing the MSE objective function:

$\min_{\mathbf{h}} J(\mathbf{h}) \triangleq E[e_n^2] = E\left[(d_n - \mathbf{u}_n^T\mathbf{h})^2\right]$. (1)

The method of steepest descent (gradient descent) for optimizing (1) suggests the following recursion for updating the filter coefficients [2]:

$\mathbf{h}_{n+1} = \mathbf{h}_n - \frac{\mu}{2}\nabla_{\mathbf{h}} J(\mathbf{h}_n)$, (2)

where μ > 0 is the step size. To develop adaptive algorithms, in practice the gradient $\nabla_{\mathbf{h}} J(\mathbf{h}_n) = -2E[\mathbf{u}_n e_n]$ is replaced by the instantaneous estimate $-2\mathbf{u}_n e_n$, i.e., the stochastic gradient [2], [3], leading to the standard LMS algorithm:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu\,\mathbf{u}_n e_n$. (3)

The normalized version of (3), i.e., the NLMS algorithm, can be derived based on the principle of minimum disturbance [2]. Alternatively, it can be obtained by performing exact line search for the optimal step size at each iteration [40]. Then, practically, a scaling factor $\tilde{\mu} > 0$ is introduced to exercise control over the adaptation and a small regularization constant δ > 0 is also employed to avoid division by zero [2], leading to the standard NLMS algorithm:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \tilde{\mu}\,\dfrac{\mathbf{u}_n e_n}{\mathbf{u}_n^T\mathbf{u}_n + \delta}$. (4)
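
As a minimal sketch (assuming the variables u, d, and M from the data-generation example above are in the workspace), the LMS (3) and NLMS (4) recursions can be realized as follows; the step-size values are illustrative, not recommendations.

```matlab
% Minimal sketch of the LMS (3) and NLMS (4) updates (illustrative parameters).
mu   = 0.005;                     % LMS step size
mu_t = 0.5;   delta = 0.01;       % NLMS scaling factor and regularization constant
h_lms  = zeros(M, 1);
h_nlms = zeros(M, 1);
for n = M:length(d)
    u_n = u(n:-1:n-M+1);
    e_lms  = d(n) - u_n' * h_lms;                                   % error signal
    h_lms  = h_lms + mu * u_n * e_lms;                              % Eq. (3)
    e_nlms = d(n) - u_n' * h_nlms;
    h_nlms = h_nlms + mu_t * u_n * e_nlms / (u_n' * u_n + delta);   % Eq. (4)
end
```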

1). Sparsity-Aware Adaptive Filtering Algorithms:

When the underlying system response is sparse, a class of algorithms realizing proportionate adaptation [20] are able to take advantage of the structural sparsity. A typical update rule with proportionate adaptation is:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu\,\mathbf{P}_n\mathbf{u}_n e_n$, (5)

or the normalized version:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \tilde{\mu}\,\dfrac{\mathbf{P}_n\mathbf{u}_n e_n}{\mathbf{u}_n^T\mathbf{P}_n\mathbf{u}_n + \delta}$, (6)

where

$\mathbf{P}_n = \mathrm{diag}\{p_{0,n}, p_{1,n}, \ldots, p_{M-1,n}\}$ (7)

is an M-by-M diagonal matrix assigning different weights to the step sizes for different filter taps, referred to as the proportionate matrix. It redistributes the adaptation gains among all coefficients and emphasizes the large ones in order to speed up their convergence. Typically, at the n-th iteration the diagonal entries are computed as:

$p_{i,n} = \dfrac{\gamma_{i,n}}{\sum_{j=0}^{M-1}\gamma_{j,n}}$, (8)

i = 0, 1, …, M − 1, where $\gamma_{i,n}$ is algorithm-dependent and examples of such algorithms include the PNLMS [5], IPNLMS [21], IPNLMS-ℓ0 [22], etc. In general, if the estimated filter coefficients $h_{i,n}$ are sparse, the resulting $\gamma_{i,n}$ (and thus $p_{i,n}$) will also tend to be sparsely distributed (with positive values).
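
The sketch below illustrates the proportionate update (5) with the gain rule (7)–(8); the choice of $\gamma_{i,n}$ here is a simplified PNLMS-style rule (a floor combined with $|h_{i,n}|$) used only for illustration and is not claimed to be the exact rule of [5]. It again assumes u, d, and M from the earlier data-generation sketch.

```matlab
% Minimal sketch of a proportionate LMS-type update, Eqs. (5), (7), (8).
% The gamma rule is a simplified, PNLMS-flavored illustration.
mu = 0.05;                                           % illustrative step size
h  = zeros(M, 1);
for n = M:length(d)
    u_n   = u(n:-1:n-M+1);
    e_n   = d(n) - u_n' * h;
    gamma = max(abs(h), 0.01 * max(abs(h)) + 1e-4);  % algorithm-dependent factors
    p     = gamma / sum(gamma);                      % proportionate gains, Eq. (8)
    h     = h + mu * (p .* u_n) * e_n;               % P_n u_n e_n with P_n = diag{p}
end
```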

Another class of algorithms, inspired by developments in the SSR area, take sparsity into account using a regularization-based approach, e.g., [28], [29], [30], [31]. The algorithms are obtained by adding a sparsity-inducing term G(h) to the MSE objective function:

$\min_{\mathbf{h}} J_G(\mathbf{h}) \triangleq J(\mathbf{h}) + \lambda G(\mathbf{h})$, (9)

where λ > 0 is the regularization coefficient. By simply applying (stochastic) gradient descent on (9):

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu\,\mathbf{u}_n e_n - \frac{\mu\lambda}{2}\nabla_{\mathbf{h}} G(\mathbf{h}_n)$, (10)

various algorithms can be obtained with different sparsity-inducing functions G(·). Examples include the ZA-LMS [28], RZA-LMS [28], and ℓ0-LMS [29], [41].
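
For instance, a minimal sketch of (10) with the ℓ1 penalty G(h) = ‖h‖1 (whose gradient is sign(h) wherever it is defined) gives a ZA-LMS-style recursion; the λ and μ values are illustrative, and u, d, M are assumed from the earlier sketch.

```matlab
% Minimal sketch of the regularized stochastic-gradient update (10)
% with G(h) = ||h||_1, i.e., a ZA-LMS-style zero attractor.
mu     = 0.005;
lambda = 1e-3;
h      = zeros(M, 1);
for n = M:length(d)
    u_n = u(n:-1:n-M+1);
    e_n = d(n) - u_n' * h;
    h   = h + mu * u_n * e_n - (mu * lambda / 2) * sign(h);   % Eq. (10)
end
```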

B. Iterative Reweighting Algorithms in SSR

The optimization of (9) is actually an SSR problem. The sparsity regularization term G(·) represents the general diversity measure that, when minimized, encourages sparsity in its argument. A separable function of the form $G(\mathbf{h}) = \sum_{i=0}^{M-1} g(h_i)$ is commonly used, where g(·) has the following properties [11]:

  • Property 1: g(z) is symmetric, i.e., g(z) = g(−z) = g(|z|);

  • Property 2: g(|z|) is monotonically increasing with |z|;

  • Property 3: g(0) is finite;

  • Property 4: g(z) is concave in |z| or $z^2$.

Any function that satisfies the above properties is a candidate for effective SSR algorithm development.

The concave nature of the regularization penalty poses challenges to the diversity measure minimization problem. The iterative reweighted ℓ2 [15], [18] and ℓ1 [19] methods are popular batch estimation algorithms for solving such minimization problems in SSR. By introducing a weighted ℓ2 or ℓ1 norm term as an upper bound for the diversity measure term in each iteration, they form and solve a convex optimization problem at each step to approach the optimal solution [11]. Specifically, instead of (9), at iteration n the reweighted ℓ2 framework suggests solving:

$\min_{\mathbf{h}} J_n^{\ell_2}(\mathbf{h}) \triangleq J(\mathbf{h}) + \lambda\|\mathbf{W}_n^{-1}\mathbf{h}\|_2^2$, (11)

and the reweighted ℓ1 framework suggests solving:

$\min_{\mathbf{h}} J_n^{\ell_1}(\mathbf{h}) \triangleq J(\mathbf{h}) + \lambda\|\mathbf{W}_n^{-1}\mathbf{h}\|_1$, (12)

where $\mathbf{W}_n = \mathrm{diag}\{w_{i,n}\}$ is positive definite and each $w_{i,n}$ is computed based on the current estimate $h_{i,n}$, depending on which framework (reweighted ℓ2 or ℓ1) and diversity measure (choice of G(·)) are used.

To elaborate, for using the reweighted ℓ2 formulation (11), the diversity measure function g(z) has to be concave in $z^2$ for Property 4; i.e., it satisfies $g(z) = f(z^2)$, where f(z) is concave for $z \in \mathbb{R}_+$. Based on [11], we have $w_{i,n}$ given as:

$w_{i,n} = \left(\left.\dfrac{df(z)}{dz}\right|_{z=h_{i,n}^2}\right)^{-\frac{1}{2}}$. (13)

For using the reweighted ℓ1 formulation (12), g(z) has to be concave in |z| for Property 4; i.e., it satisfies $g(z) = f(|z|)$, where f(z) is concave for $z \in \mathbb{R}_+$. In this case, $w_{i,n}$ is given as:

$w_{i,n} = \left(\left.\dfrac{df(z)}{dz}\right|_{z=|h_{i,n}|}\right)^{-1}$. (14)

To utilize the reweighting frameworks, we first choose an appropriate diversity measure G(h) and then use (13) or (14) to obtain the corresponding update form of $\mathbf{W}_n$. Several examples will be presented in Section IV-B.
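
As a concrete illustration of this recipe, the sketch below evaluates (13) and (14) for the log-sum choice f(z) = log(z + ε), whose derivative is 1/(z + ε); the resulting weights reproduce (36) and (37) derived later in Section IV-B. The numerical values are arbitrary examples.

```matlab
% Minimal sketch of the weight computations (13) and (14) for f(z) = log(z + eps0).
eps0 = 0.1;
df   = @(z) 1 ./ (z + eps0);          % f'(z) for the log-sum penalty
h_n  = [0.8; -0.05; 0; 0.3];          % an example current filter estimate
w_l2 = df(h_n.^2) .^ (-1/2);          % reweighted l2 weights, Eq. (13) -> Eq. (36)
w_l1 = df(abs(h_n)) .^ (-1);          % reweighted l1 weights, Eq. (14) -> Eq. (37)
W_l2 = diag(w_l2);                    % W_n for the reweighted l2 framework
W_l1 = diag(w_l1);                    % W_n for the reweighted l1 framework
```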

III. Proposed Framework for Incorporating Sparsity in Adaptive Filters

Our framework for developing sparse adaptive filters is based on (9). However, we will be deriving algorithms in a different way rather than using a simple gradient descent as is typically done in existing regularization-based adaptive filtering approaches, e.g., (10). Our novel derivation consists of two stages: i) adapting the iterative reweighting frameworks [11] popular in SSR to the adaptive filtering setting, followed by ii) the AST strategy [12], [13] from the optimization literature to obtain new adaptive filtering algorithms.

A. Reweighting Methods for Adaptive Filtering

The reweighting methods introduced in Section II-B actually belong to the more general class of majorization-minimization (MM) algorithms [42]. In each iteration n, the weighted ℓ2 or ℓ1 norm term majorizes G(h) at the current estimate $\mathbf{h}_n$, thereby providing a surrogate function (or majorizer) $J_n^{\ell_2}(\mathbf{h})$ or $J_n^{\ell_1}(\mathbf{h})$ for the regularized objective function $J_G(\mathbf{h})$. Sequentially minimizing the surrogate functions allows the algorithm to produce more focal estimates as optimization progresses. Hopefully, when the number of iterations is large enough, the optimal solution can be well approached or even achieved [11].

In SSR, it is typical that the surrogate function is exactly minimized in each iteration n. For the purpose of developing adaptive filtering algorithms, here we consider performing only one step of gradient descent per iteration. In this sense, it corresponds to the generalized MM [43], where one does not need to minimize the majorizer but only to ensure that it decreases in every iteration. Indeed, the MM viewpoint provides an interesting observation about optimizing (9) and the reweighting formulations (11) and (12) with gradient descent, as stated in the following proposition:

Proposition 3.1:

For any point $\mathbf{h}_n$ at which G(h) is differentiable, the gradient vector of the surrogate function $J_n^{\ell_2}(\mathbf{h})$ or $J_n^{\ell_1}(\mathbf{h})$ evaluated at $\mathbf{h}_n$ coincides with that of the regularized objective function $J_G(\mathbf{h})$, i.e., $\nabla_{\mathbf{h}} J_n^{\ell_2}(\mathbf{h}_n) = \nabla_{\mathbf{h}} J_G(\mathbf{h}_n)$ and $\nabla_{\mathbf{h}} J_n^{\ell_1}(\mathbf{h}_n) = \nabla_{\mathbf{h}} J_G(\mathbf{h}_n)$.

Proof:

Since the surrogate function majorizes JG(h) at hn, the tangent plane (supporting hyperplane) of the majorizer coincides with that of JG(h) at hn. Consequently, the gradient vectors are the same at hn.

The observation in Proposition 3.1 implies that, if gradient descent (with a fixed step size) is utilized for optimization, then adopting the reweighting frameworks (11) and (12) will be equivalent to directly working on (9) and lead to the existing regularization-based algorithms such as the ZA-LMS. In the following, we introduce the AST strategy naturally suggested by the reweighting frameworks, leading to new algorithms markedly different from those obtained by directly optimizing (9) with gradient descent.

B. AST-Based Adaptive Filtering Algorithms

The reweighting frameworks (11) and (12) naturally suggest the following reparameterization in terms of the (affinely) scaled variable q:

$\mathbf{q} \triangleq \mathbf{W}_n^{-1}\mathbf{h}$. (15)

This step can be interpreted as the AST commonly employed by the interior point approach to solving linear and nonlinear programming problems [12], where Wn is used as the scaling matrix. It is pre-calculated and treated as a given matrix at iteration n to perform a change of coordinates (variables) [44] from h to q, acting as a scaling technique in gradient descent methods [45]. In the optimization literature, AST-based methods transform the original problem into an equivalent one, favorably positioning the current point at the center of the feasible region for expediting the optimization process [13]. While we do not claim this argument is rigorous in the context of adaptive filtering, where the convergence behavior is hard to characterize due to the nonlinear nature of the update equations and the long term dependency on the data, the numerical results appear to support this observation of enjoying the benefits of AST for convergence speedup.

Now we apply (15) to reparameterize the objective functions Jnl2(h) and Jnl1(h) and perform minimization w.r.t. q, that is:

$\min_{\mathbf{q}} \tilde{J}_n^{\ell_2}(\mathbf{q}) \triangleq J_n^{\ell_2}(\mathbf{W}_n\mathbf{q}) = J(\mathbf{W}_n\mathbf{q}) + \lambda\|\mathbf{q}\|_2^2$ (16)

and

$\min_{\mathbf{q}} \tilde{J}_n^{\ell_1}(\mathbf{q}) \triangleq J_n^{\ell_1}(\mathbf{W}_n\mathbf{q}) = J(\mathbf{W}_n\mathbf{q}) + \lambda\|\mathbf{q}\|_1$, (17)

for the reweighted ℓ2 and ℓ1 cases, respectively. The overall update process can conceptually be summarized as follows: i) given $\mathbf{h}$, compute $\mathbf{W}_n$ followed by $\mathbf{q}$; ii) update $\mathbf{q}$ using a gradient descent algorithm; iii) use this new $\mathbf{q}$ to obtain the updated $\mathbf{h}$; iv) repeat Steps i)–iii) till convergence.

More formally, to proceed with gradient-based updates, following [40] we define the a posteriori AST variable at time n:

$\mathbf{q}_{n|n} \triangleq \mathbf{W}_n^{-1}\mathbf{h}_n$ (18)

and the a priori AST variable at time n:

$\mathbf{q}_{n+1|n} \triangleq \mathbf{W}_n^{-1}\mathbf{h}_{n+1}$. (19)

The recursive update by using gradient descent in the q domain can be formulated as:

$\mathbf{q}_{n+1|n} = \mathbf{q}_{n|n} - \frac{\mu}{2}\nabla_{\mathbf{q}}\tilde{J}_n^{\ell_2}(\mathbf{q}_{n|n})$ (20)

and

$\mathbf{q}_{n+1|n} = \mathbf{q}_{n|n} - \frac{\mu}{2}\nabla_{\mathbf{q}}\tilde{J}_n^{\ell_1}(\mathbf{q}_{n|n})$, (21)

for optimizing (16) and (17), respectively.

Using the chain rule and the AST relationships (15) and (18), we can write (20) and (21) respectively as:

$\mathbf{q}_{n+1|n} = \mathbf{q}_{n|n} - \frac{\mu}{2}\mathbf{W}_n\nabla_{\mathbf{h}} J_n^{\ell_2}(\mathbf{h}_n)$ (22)

and

$\mathbf{q}_{n+1|n} = \mathbf{q}_{n|n} - \frac{\mu}{2}\mathbf{W}_n\nabla_{\mathbf{h}} J_n^{\ell_1}(\mathbf{h}_n)$. (23)

Premultiplying both sides of (22) and (23) by $\mathbf{W}_n$ and noting the relationships (18) and (19), we transform the q-domain updates (22) and (23) back to the h domain respectively as:

$\mathbf{h}_{n+1} = \mathbf{h}_n - \frac{\mu}{2}\mathbf{W}_n^2\nabla_{\mathbf{h}} J_n^{\ell_2}(\mathbf{h}_n)$ (24)

and

$\mathbf{h}_{n+1} = \mathbf{h}_n - \frac{\mu}{2}\mathbf{W}_n^2\nabla_{\mathbf{h}} J_n^{\ell_1}(\mathbf{h}_n)$. (25)

By Proposition 3.1, we can replace $\nabla_{\mathbf{h}} J_n^{\ell_2}(\mathbf{h}_n)$ and $\nabla_{\mathbf{h}} J_n^{\ell_1}(\mathbf{h}_n)$ with $\nabla_{\mathbf{h}} J_G(\mathbf{h}_n)$. Thus, (24) and (25) can both be written as:

$\mathbf{h}_{n+1} = \mathbf{h}_n - \frac{\mu}{2}\mathbf{W}_n^2\nabla_{\mathbf{h}} J_G(\mathbf{h}_n)$. (26)

Note that, based on the aforementioned update process i)–iv), we could in fact directly apply (15) to reparameterize $J_G(\mathbf{h})$ and obtain (26) without going through the reweighting formulation, as long as the scaling matrix $\mathbf{W}_n$ is specified. In this sense, the reweighting methods suggest a suitable $\mathbf{W}_n$ that eventually becomes a diagonal weighting matrix $\mathbf{W}_n^2$ on the gradient $\nabla_{\mathbf{h}} J_G(\mathbf{h}_n)$ in the update rule. Hopefully, it alters the ordinary descent direction in a way that leads to convergence improvement. We should also emphasize that the scaling matrices $\mathbf{W}_n$ suggested by (24) and (25) will in general be different for the same G(h), despite the fact that both updates can be written as (26).

In practice, the following update rule is suggested over (26) for avoiding instability and slow convergence issues:

$\mathbf{h}_{n+1} = \mathbf{h}_n - \frac{\mu}{2}\mathbf{S}_n\nabla_{\mathbf{h}} J_G(\mathbf{h}_n)$, (27)

where

$\mathbf{S}_n = \dfrac{\mathbf{W}_n^2}{\frac{1}{M}\mathrm{tr}(\mathbf{W}_n^2)}$, (28)

referred to as the sparsity-promoting matrix, is the normalized version of $\mathbf{W}_n^2$. As a fixed step size μ is used, normalizing the weighting matrix compensates for any arbitrary scaling inherent in $\mathbf{W}_n^2$ that might cause instability (scaling too large) or slow convergence (scaling too small). Note that by (28) we always have $\mathrm{tr}(\mathbf{S}_n) = M$, the same as in the ordinary gradient descent (non-AST) case, which essentially has $\mathbf{S}_n = \mathbf{I}$ whose trace is also M.

Finally, to obtain the adaptive algorithm, we follow the standard procedure of replacing $\nabla_{\mathbf{h}} J_G(\mathbf{h}_n) = -2E[\mathbf{u}_n e_n] + \lambda\nabla_{\mathbf{h}} G(\mathbf{h}_n)$ in (27) with its instantaneous estimate $-2\mathbf{u}_n e_n + \lambda\nabla_{\mathbf{h}} G(\mathbf{h}_n)$, leading to:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu\,\mathbf{S}_n\mathbf{u}_n e_n - \frac{\mu\lambda}{2}\mathbf{S}_n\nabla_{\mathbf{h}} G(\mathbf{h}_n)$. (29)

We see that there is a term with a diagonal weighting $\mathbf{S}_n$ on the LMS update vector $\mathbf{u}_n e_n$, similar to that in the proportionate algorithms (5) and (6). We also see another term weighted by λ, which is due to the introduction of the regularizer, like that of (10). Therefore, the AST framework leads to a more general algorithm comprised of proportionate adaptation and sparsity-inducing regularization. We thus refer to (29) as the generalized sparse LMS algorithm.
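
A minimal sketch of the generalized sparse LMS (29) is given below, instantiated with the reweighted ℓ2 log-sum weights (36), for which G(h) = Σᵢ log(hᵢ² + ε) and ∇G(h) has entries 2hᵢ/(hᵢ² + ε). Parameter values are illustrative, and u, d, M are assumed from the earlier data-generation sketch.

```matlab
% Minimal sketch of the generalized sparse LMS (29) with S_n from (28)
% and the log-sum/reweighted-l2 weights (36). Illustrative parameters.
mu = 0.005;  lambda = 1e-3;  eps0 = 0.01;
h  = zeros(M, 1);
for n = M:length(d)
    u_n   = u(n:-1:n-M+1);
    e_n   = d(n) - u_n' * h;
    w2    = h.^2 + eps0;                 % w_{i,n}^2, from Eq. (36)
    s     = w2 / (sum(w2) / M);          % diagonal of S_n, Eq. (28): tr(S_n) = M
    gradG = 2 * h ./ (h.^2 + eps0);      % gradient of the log-sum penalty
    h     = h + mu * (s .* u_n) * e_n - (mu * lambda / 2) * (s .* gradG);  % Eq. (29)
end
```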

C. Discussions

It may seem at first glance that applying the reweighting techniques to (9) straightforwardly leads to our algorithm. We stress that this is not true. If the AST (15) were not considered, adopting the reweighting schemes would still end up with an update rule like (10) according to Proposition 3.1, rather than the proposed (29). It is also worth mentioning that there is a considerable difference between the proposed algorithm (29) and existing SSR algorithms based on (11) and (12): the conventional SSR techniques are batch estimation methods for recovering the underlying sparse representation, while the proposed algorithm is specifically tailored to the adaptive filtering scenario. That is, as gradient descent is adopted for optimization, we actually perform a gradual update of the filter coefficients in each iteration n, rather than looking for an exact minimizer of the surrogate function as is typically pursued in SSR. This enables the algorithm to track temporal variations and environmental changes. Certainly, considering the gradient noise in real scenarios, it may raise the question of whether the algorithm is convergent. However, even the standard LMS and NLMS, which are based on gradual updates, work well in many practical situations with gradient noise. In Section VI, experimental results will demonstrate that the proposed algorithm, like the LMS and NLMS, also behaves well when a certain level of environmental noise is present.

Finally, the following theorem establishes the convergence of the q domain recursions (20) and (21) and their relationships to (9) to shed light on the convergence of the adaptive algorithm (29) developed based on them:

Theorem 3.1:

For the objective function $J_G(\mathbf{h})$ in (9) with the general diversity measure G(h) satisfying Properties 1–4 in Section II-B, there exists a step size sequence $\{\mu_n\}_{n=0}^{\infty}$ such that each of the update recursions (20) and (21) monotonically converges to a local minimum (or saddle point) of (9) under a wide-sense stationary (WSS) environment, i.e., $u_n$ and $d_n$ are jointly WSS.

Proof:

See Appendix A.

IV. Sparsity-Promoting Algorithms Adopting λ = 0

An interesting situation arises when we consider the limiting case of λ → 0+ for the proposed framework. By setting λ = 0 in (29), we see the λ-weighted term due to regularization vanishes, leading to a simpler equation:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu\,\mathbf{S}_n\mathbf{u}_n e_n$. (30)

The main feature of (30) is that it is able to promote sparsity of the system (through Sn) if it already exists while not strictly enforcing it (as λ = 0). This shall become clearer in later discussions. We refer to the algorithm (30) as the Sparsity-promoting LMS (SLMS).

The normalized version of (30) can also be developed by performing exact line search for the optimal step size at iteration n just like that when deriving the NLMS:

$\mu_n = \arg\min_{\mu}\left(d_n - \mathbf{u}_n^T(\mathbf{h}_n + \mu\,\mathbf{S}_n\mathbf{u}_n e_n)\right)^2 = \dfrac{1}{\mathbf{u}_n^T\mathbf{S}_n\mathbf{u}_n}$. (31)

Similar to the NLMS, we introduce $\tilde{\mu} > 0$ to exercise control over the adaptation and δ > 0 to avoid division by zero, resulting in:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \tilde{\mu}\,\dfrac{\mathbf{S}_n\mathbf{u}_n e_n}{\mathbf{u}_n^T\mathbf{S}_n\mathbf{u}_n + \delta}$. (32)

We refer to the algorithm (32) as the Sparsity-promoting NLMS (SNLMS).

An obvious benefit of adopting λ = 0 is that the computation for the term due to regularization is no longer needed, and we do not have to tweak this coefficient anymore (which is typically not a trivial task in practice). Still, the SLMS and SNLMS have the ability to leverage sparsity owing to the diagonal weighting Sn, which is similar to the proportionate matrix Pn in (5) and (6). Again, this is made possible due to the use of the AST (15), wherein the gradient descent update is done in the q variable rather than in the original h domain. Otherwise, we will end up with algorithms like (10) that will reduce to the ordinary LMS/NLMS when using λ = 0.

The SLMS and SNLMS can in fact be viewed as a broader class of proportionate algorithms. Actually, with certain choices of diversity measures and corresponding parameters, we can have the PNLMS (approximately) as a special case. For example, as we will see in Section IV-B, using p = 1 in (34) for Wn, the sparsity-promoting matrix Sn approximates the proportionate matrix Pn of PNLMS. Indeed, one of the main advantages of the SLMS and SNLMS is their ability to incorporate flexible diversity measures. It allows the algorithms to fit the sparsity level of the system response by optimizing corresponding sparsity control parameters in a more informed manner due to the underlying connections to SSR. Furthermore, the derivations provide theoretical support to the class of proportionate algorithms that were mostly motivated based on heuristics, explaining why they are useful in practical system identification tasks with sparse channels, e.g., in acoustic echo/feedback cancellation, from an SSR viewpoint.

A. Interpretation of λ = 0 from Optimization Perspective

We further discuss the interpretation of using λ = 0 in our framework from the optimization perspective. Recall that the AST reparameterization (15) results in the optimization problems (16) and (17). Setting λ = 0 leads both to:

$\min_{\mathbf{q}} J(\mathbf{W}_n\mathbf{q})$. (33)

This actually applies a change of coordinates to the unregularized problem (1) via (15). Since Wn is invertible, the problem of finding the h that minimizes J(h) is equivalent to finding the q which minimizes J(Wnq). Therefore, the advantage of solving (33) is that the solution is guaranteed to also be a solution of (1), which is not true for (9) with λ > 0. Thus, the optimization is unbiased while promoting sparsity – it is able to take advantage of sparsity whereas avoiding any bias incurred by the introduction of the sparsity regularizer. As noted in [45], the performance of gradient-based methods is dependent on the parameterization – a new choice may substantially alter convergence characteristics. Introducing variable scalings may speed up convergence by altering the descent direction toward the optimum. In our case, solving (33) with appropriately selected Wn can expedite the adaptation procedure toward the optimum of (1).

This observation can also be illustrated by looking at (9), which indicates a trade-off between estimation quality, as reflected in the MSE objective function, and solution sparsity, as controlled by λ. In the limiting case of λ → 0+, the regularization term exerts a diminishing impact on enforcing sparsity on the solution, meaning that eventually no sparse solution is favored over other possible solutions. To elaborate, with λ = 0 and under a WSS environment, all the algorithms derived from (9) minimize the MSE and converge toward the Wiener-Hopf solution. However, not surprisingly, the paths they take are different and depend on how the iterations are developed. If the Wiener-Hopf solution is sparse, then all will converge toward the same sparse solution asymptotically. Interestingly, the SLMS and SNLMS, because of their proportionate nature similar to the PNLMS-type algorithms, can take advantage of the sparsity and are capable of speeding up convergence without compromising estimation quality should sparsity be present. This observation will later be supported by experimental results in Section VI-B.

B. Example Diversity Measures and Corresponding Wn

To illustrate the flexibility of the proposed framework, we provide example algorithms instantiated with popular diversity measures that have proved effective in SSR.

Consider the ℓp-norm-like diversity measure with $g(h_i) = |h_i|^p$, 0 < p ≤ 2, for the reweighted ℓ2 framework [15], [12]. Using (13) leads to the update form of $\mathbf{W}_n$:

$w_{i,n} = \left(\dfrac{2}{p}(|h_{i,n}| + c)^{2-p}\right)^{\frac{1}{2}}$. (34)

Note that we have empirically added a small regularization constant c > 0 for avoiding algorithm stagnation and instability, which also ensures the positive definiteness of $\mathbf{W}_n$ [39]. The parameter p ∈ (0, 2] in (34) is responsible for controlling the sparsity degree, as the ℓp-norm-like diversity measure is associated with super-Gaussian prior distributions. In general, a smaller p corresponds to a heavier-tailed distribution, encouraging stronger sparsity in the parameters. It is worth noting that using p → 1 in (34) results in a proportionate factor close to that of the PNLMS. On the other hand, letting p = 2 recovers the standard LMS/NLMS.
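
The short sketch below (with arbitrary example coefficients) evaluates (34) for a few values of p and normalizes the squared weights as in (28); it shows the transition from near-proportionate gains at p = 1 toward uniform gains at p = 2.

```matlab
% Minimal sketch: effect of p in the lp-based weights (34) on the normalized gains.
c    = 0.001;
h_n  = [0.8; -0.05; 0; 0.3];                     % example current estimate
M_ex = numel(h_n);
for p = [1 1.5 2]
    w2 = (2/p) * (abs(h_n) + c).^(2 - p);        % w_{i,n}^2 from Eq. (34)
    s  = w2 / (sum(w2) / M_ex);                  % normalization as in Eq. (28)
    fprintf('p = %.1f: diag(S_n) = %s\n', p, mat2str(s.', 3));
end
```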

The ℓp-norm-like diversity measure can also be adopted in the reweighted ℓ1 framework if 0 < p ≤ 1. Applying (14), we obtain the update form of $\mathbf{W}_n$ in this case:

$w_{i,n} = \dfrac{1}{p}(|h_{i,n}| + c)^{1-p}$. (35)

Again, a small constant c > 0 is added. The sparsity control parameter of (35) is now p ∈ (0, 1]. In this case, using p → 0.5 in (35) results in a proportionate factor close to that of the PNLMS, whereas letting p = 1 recovers the standard LMS/NLMS.

We can also consider the log-sum penalty with $g(h_i) = \log(h_i^2 + \epsilon)$, ϵ > 0, for the reweighted ℓ2 framework [18]. The function is readily amenable to the use of (13) to obtain the update form of $\mathbf{W}_n$ as:

$w_{i,n} = (h_{i,n}^2 + \epsilon)^{\frac{1}{2}}$. (36)

Or consider the log-sum penalty with $g(h_i) = \log(|h_i| + \epsilon)$, ϵ > 0, for the reweighted ℓ1 framework [19]. Using (14), the update form of $\mathbf{W}_n$ becomes:

$w_{i,n} = |h_{i,n}| + \epsilon$. (37)

The sparsity control parameter is ϵ > 0 for the two log-sum penalty cases. From (36) and (37) we can see that ϵ controls how much proportionate adaptation is encouraged: as ϵ becomes smaller, the term $h_{i,n}^2$ or $|h_{i,n}|$ becomes more dominant. Consequently, the algorithms exhibit a stronger proportionate adaptation characteristic. On the contrary, as ϵ becomes larger, the influence of $h_{i,n}^2$ or $|h_{i,n}|$ reduces. Thus, the algorithm will approach the standard LMS/NLMS when $\epsilon \gg h_{i,n}^2$ or $\epsilon \gg |h_{i,n}|$. In practice, one can start from a large ϵ and reduce it to find a suitable value.
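
The following sketch (again with arbitrary example coefficients) illustrates this effect of ε on the log-sum weights (36): small ε yields strongly proportionate gains, whereas large ε drives the normalized gains toward uniform, LMS/NLMS-like behavior.

```matlab
% Minimal sketch: effect of eps on the normalized gains obtained from Eq. (36).
h_n  = [0.8; -0.05; 0; 0.3];
M_ex = numel(h_n);
for eps0 = [1e-4 1e-2 1 100]
    w2 = h_n.^2 + eps0;                  % w_{i,n}^2 from Eq. (36)
    s  = w2 / (sum(w2) / M_ex);          % normalization as in Eq. (28)
    fprintf('eps = %g: diag(S_n) = %s\n', eps0, mat2str(s.', 3));
end
```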

More example functions can be found in [46], [27], including $g(h_i) = \arctan(|h_i|/\epsilon)$, ϵ > 0, also suggested in [19], which works for both the reweighted ℓ2 and ℓ1 frameworks. Note that different diversity measures can result in different computational complexity for calculating $\mathbf{W}_n$. Notably, for example, the ℓp-norm-like function resulting in (34) or (35) might incur extra computation for calculating the quantity to the power 2 − p or 1 − p for some p values (e.g., non-integer powers).

Algorithm 1 summarizes the proposed SLMS and SNLMS. A MATLAB implementation of the algorithms is available at https://github.com/chinghualee/SLMS_SNLMS.
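
For reference, the following is a minimal sketch (not the released MATLAB code) of one plausible per-iteration realization of the SLMS (30) and SNLMS (32) with the ℓp-based weights (34); parameter values are illustrative, and u, d, M are assumed from the earlier data-generation sketch.

```matlab
% Minimal sketch of the SLMS (30) and SNLMS (32) recursions with Eq. (34) weights.
p = 1.2;  c = 0.001;  mu = 0.005;  mu_t = 0.5;  delta = 0.01;   % illustrative
h_slms  = zeros(M, 1);
h_snlms = zeros(M, 1);
for n = M:length(d)
    u_n = u(n:-1:n-M+1);
    % SLMS, Eq. (30)
    e_s    = d(n) - u_n' * h_slms;
    w2     = (2/p) * (abs(h_slms) + c).^(2 - p);    % Eq. (34)
    s      = w2 / (sum(w2) / M);                    % Eq. (28)
    h_slms = h_slms + mu * (s .* u_n) * e_s;
    % SNLMS, Eq. (32)
    e_ns    = d(n) - u_n' * h_snlms;
    w2      = (2/p) * (abs(h_snlms) + c).^(2 - p);
    s       = w2 / (sum(w2) / M);
    h_snlms = h_snlms + mu_t * (s .* u_n) * e_ns / (u_n' * (s .* u_n) + delta);
end
```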


C. Comparison to Existing Work on PNLMS-Type Algorithms

Note that in the IPNLMS [21] and IPNLMS-ℓ0 [22] there is also a parameter for fitting the sparsity degree, which was heuristically introduced to weight between proportionate and non-proportionate updates. However, this empirical parameter does not reflect the sparsity level of the underlying system directly. In our algorithms, we have sparsity control parameters that play a similar role for fitting different sparsity levels. However, based on diversity measures in SSR, they have direct connections to the system sparsity, thereby offering a more intuitive parameter selection procedure. Our algorithms thus have the advantages of enjoying theoretical support and leveraging sparsity more straightforwardly.

In terms of algorithm derivations, PNLMS-type algorithms were mostly developed from a constrained optimization problem following the principle of minimal disturbance, e.g., [24], [25], [26], [27], [32], in which modified objective functions have been proposed that impose sparsity on the "change" of the filter rather than on the filter itself. For example, [24], [25], [32] considered enforcing sparsity on the difference between the current and updated filters; [26], [27] imposed sparsity on the so-called correctness component as defined in [26], which also represents the change in the filter coefficients. However, since the assumption is that the filter itself is sparse rather than the difference between successive updates, the motivation to enforce sparsity on the "change" of the filter is less clear. Sparsity, in turn, does not seem to fit in straightforwardly under the commonly adopted constrained optimization framework. In contrast, we adopt the general MSE criterion in which filter sparsity can be directly imposed via regularization, which is more direct and makes intuitive sense.

V. Steady-State Performance Analysis

The signal model of system identification described in Section II-A is employed for performance analysis. We further assume the noise $v_n$ is i.i.d. according to $\mathcal{N}(0, \sigma_v^2)$. We also introduce several other assumptions useful for simplifying the analysis. Although these assumptions may seem restrictive, they make meaningful analysis possible without significant loss of insight and are also commonly adopted in the literature. We shall later see that these assumptions lead to theoretical results that are supported by experiments.

Assumption 1:

The input data vector $\mathbf{u}_n$ is independent of $\mathbf{u}_k$ for n ≠ k. Furthermore, $\mathbf{u}_n$ is independent of $v_k$ for all n and k. In practice and from past experience in adaptive filters, this assumption simplifies the analysis and does lead to useful insights [2], [3], despite the fact that it does not in general hold true.

Assumption 2:

The input data vector obeys $\mathbf{u}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$ for all n. This technical assumption facilitates the analysis by taking advantage of useful results on Gaussian random variables [4].

Assumption 3:

At steady-state, the diagonal matrix $\mathbf{W}_n$ in the update equations can be viewed as a fixed matrix. As suggested in [5], [23], when the system is at steady-state and the step size is sufficiently small, the coefficients converge in both the mean and mean squared senses. Thus, replacing $\mathbf{W}_n$ by a fixed matrix becomes reasonable and convenient.

For convenience we shall consider the algorithm of the following form for performance analysis:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu\,\mathbf{S}\mathbf{u}_n e_n$, (38)

where $\mathbf{S} = \mathrm{diag}\{s_i\}$ with $s_i > 0$, ∀i = 0, 1, …, M − 1.

For a fixed underlying system ho, define the steady-state excess MSE [4]:

$J_{ex} \triangleq \lim_{n\to\infty} E\left[\left(\mathbf{u}_n^T(\mathbf{h}^o - \mathbf{h}_n)\right)^2\right]$. (39)

Under Assumption 1, we have the steady-state MSE:

$J \triangleq \lim_{n\to\infty} E[e_n^2] = \sigma_v^2 + J_{ex}$. (40)

The following theorems characterize the steady-state behavior of (38):

Theorem 5.1 (Steady-state excess MSE):

Under Assumptions 1–2, with a sufficiently small μ and assuming $\mathbf{R} = \sigma_u^2\mathbf{I}$, the steady-state excess MSE of the adaptive filter (38) is given by:

$J_{ex} = \dfrac{\mu\sum_{i=0}^{M-1}\frac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i}}{1 - \mu\sum_{i=0}^{M-1}\frac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i}}\,\sigma_v^2$. (41)
Proof:

See Appendix B.

Theorem 5.2 (Convergence conditions):

Under Assumptions 1–2, with a sufficiently small μ and assuming $\mathbf{R} = \sigma_u^2\mathbf{I}$, for the adaptive filter (38):

  1. It converges in the mean sense if:
    $|\lambda_{\max}\{\mathbf{I} - \mu\sigma_u^2\mathbf{S}\}| < 1$, (42)
    where $\lambda_{\max}\{\mathbf{X}\}$ denotes the eigenvalue of a square matrix $\mathbf{X}$ that is largest in magnitude.
  2. It converges in the mean squared sense if:
    $0 < \mu < \left(\sum_{i=0}^{M-1}\dfrac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i}\right)^{-1}$. (43)
Proof:

See Appendix C.

A. Steady-State Performance of SLMS

Consider the case where Assumptions 1–3 are in place and $\mathbf{R} = \sigma_u^2\mathbf{I}$. For analyzing the proposed SLMS (30), we first need to identify an appropriate $\mathbf{S}$ with regard to Assumption 3. A useful approximation at steady-state is to replace the occurrence of $\mathbf{h}_n$ by the true system $\mathbf{h}^o$; that is, to use $\mathbf{S} = \mathbf{W}^2 / \left(\frac{1}{M}\mathrm{tr}(\mathbf{W}^2)\right)$, where $\mathbf{W} = \mathrm{diag}\{w_i\}$ with $w_i$ given by (13) for the reweighted ℓ2 case, or by (14) for the reweighted ℓ1 case, both computed based on the corresponding true coefficient $h_i^o$. Now, since $\mathrm{tr}(\mathbf{S}) = M$, the excess MSE (41) can be approximated as:

$J_{ex} \approx \dfrac{\mu\sum_{i=0}^{M-1}\frac{\sigma_u^2 s_i}{2}}{1 - \mu\sum_{i=0}^{M-1}\frac{\sigma_u^2 s_i}{2}}\,\sigma_v^2 = \dfrac{\frac{\mu\sigma_u^2}{2}\mathrm{tr}(\mathbf{S})}{1 - \frac{\mu\sigma_u^2}{2}\mathrm{tr}(\mathbf{S})}\,\sigma_v^2 = \dfrac{\mu}{\frac{2}{M\sigma_u^2} - \mu}\,\sigma_v^2$, (44)

where for the approximation we assume a sufficiently small step size μ such that $2 - 2\mu\sigma_u^2 s_i \approx 2$, ∀i = 0, 1, …, M − 1.
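
As a quick numerical illustration of (44) (under the stated assumptions and with illustrative values), the predicted steady-state MSE $J = \sigma_v^2 + J_{ex}$ from (40) can be tabulated for a few step sizes:

```matlab
% Minimal sketch: theoretical SLMS steady-state MSE from Eqs. (40) and (44).
M = 256;  sigma_u2 = 1;  sigma_v2 = 0.01;          % illustrative values
for mu = [0.0005 0.001 0.0025]                     % must satisfy mu < 2/(M*sigma_u2)
    J_ex = mu / (2/(M*sigma_u2) - mu) * sigma_v2;  % Eq. (44)
    fprintf('mu = %.4f: J_ex = %.2e, steady-state MSE = %.2e\n', ...
            mu, J_ex, sigma_v2 + J_ex);
end
```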

Now, for the mean squared convergence condition, although the upper bound in (43) of Theorem 5.2 contains μ itself, after some inspection it is clear that the lowest stability limit on μ occurs when S has its diagonal elements nonzero at one tap position (with a value of M) and zero at all others [5]. With such an S, it leads to:

$0 < \mu < \dfrac{2}{3M\sigma_u^2}$. (45)

On the other hand, the largest stability limit is associated with a proportionate matrix assigning equal gains at each position [5], i.e., S = diag{si} with si = 1, ∀i = 0, 1, …, M − 1. With such an S we have:

$0 < \mu < \dfrac{2}{(2 + M)\sigma_u^2}$. (46)

For a large M, the largest stability limit can be approximated as $\frac{2}{M\sigma_u^2} = \frac{2}{\mathrm{tr}(\mathbf{R})}$, which is also the stability limit of the LMS [4]. This result is not surprising since using an $\mathbf{S}$ that assigns uniform gains essentially recovers the LMS.

B. Steady-State Performance of SNLMS

Consider the case where Assumptions 1–3 are in place and $\mathbf{R} = \sigma_u^2\mathbf{I}$. For analyzing the proposed SNLMS (32), we first must identify a fixed $\mathbf{S}$ to approximate the term $\mathbf{S}_n / (\mathbf{u}_n^T\mathbf{S}_n\mathbf{u}_n)$ (where we have ignored δ), for which an exact characterization seems difficult, if at all possible, to obtain. However, if we fix $\mathbf{W}_n = \mathbf{W}$ at steady-state by Assumption 3, where $\mathbf{W}$ is again computed based on the true system $\mathbf{h}^o$, then we have:

$\mathbf{S} = \dfrac{\frac{\mathbf{W}^2}{\frac{1}{M}\mathrm{tr}(\mathbf{W}^2)}}{\mathbf{u}_n^T\left(\frac{\mathbf{W}^2}{\frac{1}{M}\mathrm{tr}(\mathbf{W}^2)}\right)\mathbf{u}_n} = \dfrac{\mathbf{W}^2}{\mathbf{u}_n^T\mathbf{W}^2\mathbf{u}_n} \approx \dfrac{\mathbf{W}^2}{\sigma_u^2\mathrm{tr}(\mathbf{W}^2)}$, (47)

with the approximation $\mathbf{u}_n^T\mathbf{W}^2\mathbf{u}_n \approx E[\mathbf{u}_n^T\mathbf{W}^2\mathbf{u}_n] = \sigma_u^2\mathrm{tr}(\mathbf{W}^2)$ utilized. A useful fact of (47) is that $\mathrm{tr}(\mathbf{S}) = (\sigma_u^2)^{-1}$. We can thus approximate the excess MSE (41) (replacing μ by $\tilde{\mu}$) as:

$J_{ex} \approx \dfrac{\tilde{\mu}\sum_{i=0}^{M-1}\frac{\sigma_u^2 s_i}{2}}{1 - \tilde{\mu}\sum_{i=0}^{M-1}\frac{\sigma_u^2 s_i}{2}}\,\sigma_v^2 = \dfrac{\tilde{\mu}\sigma_u^2\sum_{i=0}^{M-1}s_i}{2 - \tilde{\mu}\sigma_u^2\sum_{i=0}^{M-1}s_i}\,\sigma_v^2 = \dfrac{\tilde{\mu}\sigma_u^2\mathrm{tr}(\mathbf{S})}{2 - \tilde{\mu}\sigma_u^2\mathrm{tr}(\mathbf{S})}\,\sigma_v^2 = \dfrac{\tilde{\mu}\sigma_u^2(\sigma_u^2)^{-1}}{2 - \tilde{\mu}\sigma_u^2(\sigma_u^2)^{-1}}\,\sigma_v^2 = \dfrac{\tilde{\mu}}{2 - \tilde{\mu}}\,\sigma_v^2$, (48)

for $\tilde{\mu}$ sufficiently small such that $2 - 2\tilde{\mu}\sigma_u^2 s_i \approx 2$, ∀i = 0, 1, …, M − 1.

For the mean squared convergence condition, using the same argument as in the SLMS case for (45) and (46), we can obtain the lowest stability limit as:

$0 < \tilde{\mu} < \dfrac{2}{3}$ (49)

and the largest stability limit as:

$0 < \tilde{\mu} < \dfrac{2}{1 + \frac{2}{M}}$. (50)

For a large M, (50) becomes approximately $0 < \tilde{\mu} < 2$, which is the classic result for the NLMS [4].

VI. Simulation Results

The proposed algorithms are evaluated using computer simulations in MATLAB. We consider three system IRs, shown in Fig. 1, which represent different sparsity levels: quasi-sparse, sparse, and dispersive systems. The IR of the quasi-sparse system is an acoustic feedback path between the microphone and the loudspeaker of a hearing aid device that was measured in a real-world scenario. It represents a typical IR of many practical system identification problems where a certain degree of structural sparsity exists. The sparse and dispersive IRs were artificially generated. Each of these IRs has 256 taps. We conducted experiments to obtain the MSE learning curves (i.e., the ensemble average of $e_n^2$ as a function of iteration n) for performance comparison. The ensemble averaging was performed over 1000 independent Monte Carlo runs for obtaining each curve. In all experiments, the adaptive filter coefficients were initialized with all zeros. For the input signal, we mainly consider two types of $u_n$: i) a zero-mean, unit-variance white Gaussian process and ii) a first-order autoregressive (AR) process generated according to $u_n = \rho u_{n-1} + \eta_n$, where ρ = 0.8 and $\eta_n$ is i.i.d. according to $\mathcal{N}(0, 1)$. We also include results with speech inputs to demonstrate the algorithm performance with non-stationary signals. The system noise $v_n$ is i.i.d. according to $\mathcal{N}(0, 0.01)$. Regarding the algorithms, when using (34) for updating $\mathbf{W}_n$, a small positive constant c = 0.001 was always used.
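
To indicate the flavor of the evaluation protocol (with reduced, illustrative counts rather than the paper's exact setup), a minimal sketch of an ensemble-averaged MSE learning curve for the NLMS with a randomly generated sparse IR is:

```matlab
% Minimal sketch of a Monte Carlo MSE learning curve (illustrative, reduced scale).
M = 256;  N = 4000;  runs = 50;  mu_t = 0.5;  delta = 0.01;
mse = zeros(N, 1);
for r = 1:runs
    h_o = zeros(M, 1);  h_o(randperm(M, 8)) = randn(8, 1);    % random sparse IR
    u = randn(N, 1);  v = sqrt(0.01) * randn(N, 1);           % white input, noise
    h = zeros(M, 1);  e = zeros(N, 1);
    for n = M:N
        u_n  = u(n:-1:n-M+1);
        e(n) = (u_n' * h_o + v(n)) - u_n' * h;
        h    = h + mu_t * u_n * e(n) / (u_n' * u_n + delta);  % NLMS adaptation
    end
    mse = mse + e.^2 / runs;                                  % ensemble average
end
plot(10*log10(mse(M:end)));  xlabel('iteration n');  ylabel('MSE (dB)');
```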

Fig. 1:

IRs of (a) quasi-sparse, (b) sparse, and (c) dispersive systems. The quasi-sparse IR is an acoustic feedback path of a hearing aid that was measured from a real-world scenario. The sparse and dispersive IRs were artificially generated.

A. Comparison of Algorithms with and without AST

Fig. 2 compares the proposed generalized sparse LMS (29), i.e., the AST-based approach, to some existing regularization-based algorithms of the form (10), i.e., regular gradient descent without AST. Specifically, we use the ℓp-norm-like penalty $\|\mathbf{h}\|_p^p$ with p = 1 and the log-sum penalty $\sum_{i=0}^{M-1}\log(|h_i| + \epsilon)$ with ϵ = 0.1 as examples. These two choices of the sparsity-inducing function G(h) in (10) result in the ZA-LMS and RZA-LMS [28], respectively. We compare them with the corresponding AST-based algorithms obtained from (29), also adopting the two penalty functions for G(h), which lead to (34) and (37) for computing $\mathbf{W}_n$, respectively. We set μ = 0.0025 and λ = 0.001 in all cases and used the white Gaussian process input. Fig. 2(a) shows the results of identifying the sparse IR and Fig. 2(b) is the case of estimating the quasi-sparse IR. From the results we see that the AST strategy leads to algorithms (dotted lines) that demonstrate faster convergence than the existing approaches (solid lines).

Fig. 2:

Comparison of algorithms with and without AST for identifying (a) sparse and (b) quasi-sparse IRs with white Gaussian process input. Solid lines are existing approaches as given by (10). Dotted lines are their corresponding AST-based algorithms given by (29). It can be seen that AST leads to improved performance.

B. Effect of Sparsity Control Parameter on SLMS and SNLMS

In this experiment we investigate the effect of the sparsity control parameter on the convergence of the SLMS (30) and SNLMS (32). We use the ℓp-norm-like diversity measure $\|\mathbf{h}\|_p^p$ within the reweighted ℓ2 framework, i.e., using (34) for updating $\mathbf{W}_n$, for demonstration purposes. We study the cases of the sparsity control parameter p = 1, 1.2, 1.5, 1.8, 2. We also include the LMS (3) and NLMS (4) performance curves for reference. For the LMS and SLMS we used μ = 0.0025. For the NLMS and SNLMS we used $\tilde{\mu} = 0.5$ and δ = 0.01.

Fig. 3 and Fig. 4 show the resulting MSE curves for the SLMS using the white Gaussian noise input and the SNLMS using the AR process input, respectively. Recall that the proportionate factors of the SLMS/SNLMS using (34) for $\mathbf{W}_n$ approximate those of the PNLMS when p → 1, and recover the LMS/NLMS when p = 2, as discussed in Section IV-B. Therefore, the parameter p plays the role of fitting different sparsity levels, and the selection of p can be crucial for obtaining optimal performance for IRs with different sparsity degrees. The results in both Fig. 3 and Fig. 4 suggest that for the quasi-sparse case, the fastest convergence is given by p ∈ [1.2, 1.5], which seems reasonable in terms of finding a balance between the PNLMS (p → 1) and the LMS/NLMS (p = 2). On the other hand, for the sparse system, p ∈ [1, 1.2] gives the best results, which is also reasonable since as the sparsity level increases, a more PNLMS-like algorithm can be more favorable. Finally, for the dispersive system we see that p ∈ [1.8, 2] results in the fastest convergence and is comparable to, if not better than, the LMS and NLMS. This indicates that a more LMS/NLMS-like algorithm is preferable when the system IR is far from sparse. To conclude, the results show that the algorithms exploit the underlying system structure in the way we expect.

Fig. 3:

Effect of sparsity control parameter p on convergence of SLMS for (a) quasi-sparse, (b) sparse, and (c) dispersive IRs with white Gaussian process input. It can be seen that the optimal p value varies with the sparsity degree.

Fig. 4:

Effect of sparsity control parameter p on convergence of SNLMS for (a) quasi-sparse, (b) sparse, and (c) dispersive IRs with AR process input. In the colored input case here we have similar observations to the white input case of Fig. 3.

C. Effect of Step Size on SLMS and SNLMS

Fig. 5 studies the effect of the step size on the convergence behavior of the SLMS and SNLMS. We again used (34) for updating $\mathbf{W}_n$. Fig. 5(a) shows the resulting MSE curves obtained by running the SLMS with p = 1.2 on the sparse IR with various μ values, using the white Gaussian noise input. Fig. 5(b) shows the resulting MSE curves obtained by running the SNLMS with p = 1.5 on the quasi-sparse IR with various $\tilde{\mu}$ values, using the AR process input. The dotted lines indicate the theoretical steady-state MSE levels computed from (40) using (44) and (48) for the SLMS and SNLMS, respectively. We can see that, similar to the well-known trade-off in the LMS and NLMS, a larger step size results in faster convergence at the expense of steady-state performance. We also see that as the step size increases the theoretical prediction becomes less accurate; this is probably due to the approximation made based on the small step size assumption for arriving at (44) and (48). Nevertheless, the prediction agrees well with the steady-state MSE in most cases for a small step size. In addition, though several assumptions have been made to arrive at (40), (44), and (48), the results show that they predict reasonably well in the case of white input and even for correlated input.

Fig. 5:

Effect of step size μ or $\tilde{\mu}$ on convergence of (a) SLMS for the sparse IR with white Gaussian process input and (b) SNLMS for the quasi-sparse IR with AR process input. Dotted lines indicate the theoretical steady-state MSE levels. It can be seen that the theoretical prediction agrees reasonably well with the experimental results, especially for a small step size.

D. Comparison with Existing Algorithms

We compare the proposed SLMS and SNLMS using (34) for $\mathbf{W}_n$ with existing LMS-type and NLMS-type algorithms. To see how the algorithms behave in a changing environment, in each of the following experiments, a change in the underlying system was introduced by shifting the IR to the right by 16 samples in the middle of the adaptation process [47].
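
The system change can be realized, for example, by the following operation on the IR (a hedged illustration assuming the column vector h_o and the sample count N from the earlier sketches); the shifted response is zero-padded at the front and truncated at the end.

```matlab
% Minimal sketch of the abrupt system change used in the tracking experiments:
% shift the IR to the right by 16 samples halfway through the adaptation.
shift = 16;
h_o_shifted = [zeros(shift, 1); h_o(1:end-shift)];   % delayed (shifted) version of h_o
% In the adaptation loop: if n == floor(N/2), set h_o = h_o_shifted.
```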

Fig. 6 compares the LMS-type algorithms using the white Gaussian process input. Fig. 6(a) and Fig. 6(b) show the MSE curves obtained with the quasi-sparse and sparse IRs, respectively. For the LMS we used μ = 0.0025. For the ZA-LMS, RZA-LMS, and ℓ0-LMS we fixed μ = 0.0025 and then experimentally optimized the remaining parameters to obtain the best performance in each case. For the SLMS we used p = 1.5 and μ = 0.002 in the quasi-sparse case and p = 1.2 and μ = 0.0005 in the sparse case. From the results we can see that all the sparsity-aware algorithms outperform the LMS, with the SLMS demonstrating the best result. Comparing Fig. 6(a) and Fig. 6(b), we also see that the benefit brought by existing sparsity-aware algorithms becomes limited when the system is less sparse, while the SLMS still provides significant improvement.

Fig. 6:

Comparison of LMS-type algorithms with white Gaussian process input on (a) quasi-sparse and (b) sparse IRs. One can see that the proposed SLMS outperforms all the other approaches in both cases.

Fig. 7 compares the NLMS-type algorithms using the AR process input. Fig. 7(a) and Fig. 7(b) show the MSE curves obtained with the quasi-sparse and sparse IRs, respectively. For all the algorithms we used $\tilde{\mu} = 0.5$. For the NLMS we used δ = 0.01. For the PNLMS, IPNLMS, and IPNLMS-ℓ0 we set δ = 0.01/M according to [47], and experimentally optimized the remaining parameters to obtain the best performance in each case. For the SNLMS we used p = 1.5 in the quasi-sparse case and p = 1.2 in the sparse case. We used δ = 0.01 for the SNLMS, same as for the NLMS. From the results we again observe the benefit of using sparsity-aware adaptation. In addition, the SNLMS demonstrates performance as good as, if not better than, the other proportionate algorithms.

Fig. 7:

Comparison of NLMS-type algorithms with AR process input on (a) quasi-sparse and (b) sparse IRs. One can see that the proposed SNLMS performs better than all the other approaches in both cases.

Fig. 8 considers a more practical scenario where we used a speech signal as the input and the quasi-sparse IR, which represents an acoustic channel of practical interest, as the underlying system. The input signal-to-noise ratio (SNR) was set to 20 dB using white Gaussian noise. For the SLMS and SNLMS we used p = 1.5, which is a suitable choice for quasi-sparse systems. For evaluation we compare the normalized misalignment $\|\mathbf{h}^o - \mathbf{h}_n\|_2^2 / \|\mathbf{h}^o\|_2^2$. In Fig. 8(a) we see that the SLMS performs much better than the LMS, while the ℓ0-LMS fails to provide any improvement. This may be due to the fact that existing regularization-based algorithms tend to enforce sparsity in a more aggressive manner as they work with λ > 0, which may not be beneficial, and may even be harmful, when the underlying system is not truly sparse. In Fig. 8(b) we see that the SNLMS demonstrates better convergence behavior than the NLMS, and is also better than the IPNLMS and IPNLMS-ℓ0.

Fig. 8:

Comparison of (a) LMS-type and (b) NLMS-type algorithms for identifying the quasi-sparse acoustic channel response with speech input at 20 dB SNR. It can be seen that the SLMS and SNLMS perform the best in both cases.

Fig. 9 shows the results for a noisier environment, i.e., 0 dB input SNR, for the same experimental setting as Fig. 8 (only the step size parameters were further tuned due to the stronger noise). We see that in Fig. 9(a) the SLMS significantly outperforms the LMS, while the ℓ0-LMS performs worse. The SNLMS in Fig. 9(b), on the other hand, still performs better than the NLMS, and is comparable to the other proportionate algorithms. From the results we see that our observations on the SLMS and SNLMS performance appear robust to the noise condition.

Fig. 9:

Comparison of (a) LMS-type and (b) NLMS-type algorithms for identifying the quasi-sparse acoustic channel response with speech input at 0 dB SNR. In the noisier setting here the SLMS and SNLMS perform comparably, if not better than, other competing algorithms.

VII. Conclusion

In this paper, we developed a mathematical framework for rigorously deriving adaptive filters that exploit the sparse structure of the underlying system response. We started with the regularized objective framework of SSR and developed algorithms that are of the proportionate type. As a result, the adaptive algorithms are quite general and can accommodate a range of regularization functions. The framework utilizes the AST methodology within the iterative reweighted ℓ2 and ℓ1 frameworks for deriving algorithms. We showed that the AST is crucial for obtaining improved adaptive filtering performance over existing approaches when gradient descent approaches are used. We further introduced the SLMS and SNLMS by adopting a zero regularization coefficient in our framework, which take advantage of, though do not strictly enforce, the sparsity of the underlying system if it already exists. Note that the proposed framework is not limited to the algorithms that we have presented so far. Any other penalty function that satisfies the conditions imposed on the diversity measure can potentially be a good candidate for obtaining effective adaptive algorithms by utilizing the framework.

Acknowledgment

The authors would like to thank M. Liang for fruitful discussions leading to Proposition 3.1 in the paper.

This work was supported by National Institutes of Health/National Institute on Deafness and Other Communication Disorders under Grants R01DC015436 and R33DC015046 and National Science Foundation/Information and Intelligent Systems under Award 1838830.

Appendix A

Proof of Theorem 3.1

The proof follows the idea in [48]. We wish to show that the regularized objective function $J_G(\mathbf{h})$ in (9) is decreased in each iteration when optimized via (20) and (21). Before proceeding, we need the following lemmas:

Lemma A.1:

For the general diversity measure $G(\mathbf{h}) = \sum_{i=0}^{M-1} g(h_i)$ that satisfies Properties 1–4 in Section II-B, with $g(z)$ being strictly concave in $z^2$ for Property 4, we have:

$G(\mathbf{h}_{n+1}) - G(\mathbf{h}_n) < \|W_n^{-1}\mathbf{h}_{n+1}\|_2^2 - \|W_n^{-1}\mathbf{h}_n\|_2^2$, (51)

where $W_n = \mathrm{diag}\{w_{i,n}\}$ with $w_{i,n}$ given by (13).

Proof:

Since $g(z)$ is strictly concave in $z^2$, it can be written as $g(z) = f(z^2)$ where $f(z)$ is strictly concave for $z \in \mathbb{R}_+$. Due to the strict concavity, we have the following inequality:

$f(z_2) - f(z_1) < f'(z_1)(z_2 - z_1)$ (52)

which holds for any $z_1, z_2 \in \mathbb{R}_+$ with $z_1 \neq z_2$. Note that we use $f'(z_1)$ to denote the first-order derivative of $f(z)$ w.r.t. $z$ evaluated at $z = z_1$.

Substituting $z_1 = h_{i,n}^2$ and $z_2 = h_{i,n+1}^2$ into (52) gives:

$f(h_{i,n+1}^2) - f(h_{i,n}^2) < f'(h_{i,n}^2)\,(h_{i,n+1}^2 - h_{i,n}^2)$. (53)

Noting that $f(h_{i,n+1}^2) = g(h_{i,n+1})$ and $f(h_{i,n}^2) = g(h_{i,n})$, we have:

$g(h_{i,n+1}) - g(h_{i,n}) < f'(h_{i,n}^2)\,(h_{i,n+1}^2 - h_{i,n}^2)$. (54)

From (13) we have $f'(h_{i,n}^2) = w_{i,n}^{-2}$. Therefore,

$g(h_{i,n+1}) - g(h_{i,n}) < w_{i,n}^{-2}\,(h_{i,n+1}^2 - h_{i,n}^2)$. (55)

Summing over i = 0, 1, …, M − 1 on both sides of (55) justifies (51) of Lemma A.1.
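As a worked instance of Lemma A.1, consider (as an assumed example, not the only choice admitted by Properties 1–4) the $p$-norm-like diversity measure $g(z) = |z|^p$ with $0 < p < 2$. Then $f(t) = t^{p/2}$ is strictly concave on $\mathbb{R}_+$, and the relation $f'(h_{i,n}^2) = w_{i,n}^{-2}$ gives

$f'(h_{i,n}^2) = \frac{p}{2}\,|h_{i,n}|^{p-2} = w_{i,n}^{-2} \;\Longrightarrow\; w_{i,n} = \sqrt{2/p}\,\,|h_{i,n}|^{1-p/2},$

so larger coefficients receive larger weights, which is the proportionate behavior exploited in the main text. In practice a small constant (cf. footnote 12) presumably keeps the weights away from zero for zero-valued coefficients.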

Lemma A.2:

For the general diversity measure $G(\mathbf{h}) = \sum_{i=0}^{M-1} g(h_i)$ that satisfies Properties 1–4 in Section II-B, with $g(z)$ being strictly concave in $|z|$ for Property 4, we have:

$G(\mathbf{h}_{n+1}) - G(\mathbf{h}_n) < \|W_n^{-1}\mathbf{h}_{n+1}\|_1 - \|W_n^{-1}\mathbf{h}_n\|_1$, (56)

where $W_n = \mathrm{diag}\{w_{i,n}\}$ with $w_{i,n}$ given by (14).

Proof:

Since $g(z)$ is strictly concave in $|z|$, it can be written as $g(z) = f(|z|)$ where $f(z)$ is strictly concave for $z \in \mathbb{R}_+$. Again, the inequality (52) holds due to the strict concavity of $f(z)$.

Substituting $z_1 = |h_{i,n}|$ and $z_2 = |h_{i,n+1}|$ into (52) gives:

$f(|h_{i,n+1}|) - f(|h_{i,n}|) < f'(|h_{i,n}|)\,(|h_{i,n+1}| - |h_{i,n}|)$. (57)

Noting that $f(|h_{i,n+1}|) = g(h_{i,n+1})$ and $f(|h_{i,n}|) = g(h_{i,n})$, we have:

$g(h_{i,n+1}) - g(h_{i,n}) < f'(|h_{i,n}|)\,(|h_{i,n+1}| - |h_{i,n}|)$. (58)

From (14) we have $f'(|h_{i,n}|) = w_{i,n}^{-1}$. Therefore,

$g(h_{i,n+1}) - g(h_{i,n}) < w_{i,n}^{-1}\,(|h_{i,n+1}| - |h_{i,n}|)$. (59)

Summing over $i = 0, 1, \ldots, M-1$ on both sides of (59) justifies (56) of Lemma A.2.
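Similarly, as a worked instance of Lemma A.2 under the same assumed measure $g(z) = |z|^p$, now with $0 < p < 1$ so that $g$ is strictly concave in $|z|$, we have $f(t) = t^p$, and the relation $f'(|h_{i,n}|) = w_{i,n}^{-1}$ gives

$f'(|h_{i,n}|) = p\,|h_{i,n}|^{p-1} = w_{i,n}^{-1} \;\Longrightarrow\; w_{i,n} = \frac{1}{p}\,|h_{i,n}|^{1-p},$

which again assigns larger weights to larger coefficients.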

Now we are ready to show that JG(h) decreases in each iteration by using the update recursions (20) and (21).

First, for the reweighted $\ell_2$ framework with $J_n^{\ell_2}(\mathbf{q})$ in (16), we have:

$\begin{aligned} J_G(\mathbf{h}_{n+1}) - J_G(\mathbf{h}_n) &= [J(\mathbf{h}_{n+1}) + \lambda G(\mathbf{h}_{n+1})] - [J(\mathbf{h}_n) + \lambda G(\mathbf{h}_n)] \\ &< [J(\mathbf{h}_{n+1}) + \lambda\|W_n^{-1}\mathbf{h}_{n+1}\|_2^2] - [J(\mathbf{h}_n) + \lambda\|W_n^{-1}\mathbf{h}_n\|_2^2] \\ &= [J(W_n\mathbf{q}_{n+1|n}) + \lambda\|\mathbf{q}_{n+1|n}\|_2^2] - [J(W_n\mathbf{q}_{n|n}) + \lambda\|\mathbf{q}_{n|n}\|_2^2] \\ &= J_n^{\ell_2}(\mathbf{q}_{n+1|n}) - J_n^{\ell_2}(\mathbf{q}_{n|n}). \end{aligned}$ (60)

The inequality follows from Lemma A.1, and the AST relationships (18) and (19) are also utilized. As we optimize (16) via gradient descent, we can make $J_n^{\ell_2}(\mathbf{q})$ decrease in each iteration $n$, i.e., $J_n^{\ell_2}(\mathbf{q}_{n+1|n}) - J_n^{\ell_2}(\mathbf{q}_{n|n}) < 0$, for some $\mu_n$. Therefore, such a choice of $\{\mu_n\}_{n=0}^{\infty}$ ensures the decrease in $J_G(\mathbf{h})$ according to (60), and the update recursion (20) monotonically converges to a local minimum (or saddle point) of (9) under a WSS environment.
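The existence of such a step size can be made explicit with the standard descent argument: assuming the gradient of $J_n^{\ell_2}$ is Lipschitz with some constant $L_n > 0$ around $\mathbf{q}_{n|n}$ (a reasonable assumption for a squared-error cost together with the quadratic penalty in (16)), the gradient step $\mathbf{q}_{n+1|n} = \mathbf{q}_{n|n} - \mu_n\nabla J_n^{\ell_2}(\mathbf{q}_{n|n})$ satisfies

$J_n^{\ell_2}(\mathbf{q}_{n+1|n}) \le J_n^{\ell_2}(\mathbf{q}_{n|n}) - \mu_n\Big(1 - \frac{L_n\mu_n}{2}\Big)\big\|\nabla J_n^{\ell_2}(\mathbf{q}_{n|n})\big\|_2^2,$

so any $0 < \mu_n < 2/L_n$ yields strict decrease whenever the gradient is nonzero.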

On the other hand, for the reweighted $\ell_1$ framework with $J_n^{\ell_1}(\mathbf{q})$ in (17), we have:

$\begin{aligned} J_G(\mathbf{h}_{n+1}) - J_G(\mathbf{h}_n) &= [J(\mathbf{h}_{n+1}) + \lambda G(\mathbf{h}_{n+1})] - [J(\mathbf{h}_n) + \lambda G(\mathbf{h}_n)] \\ &< [J(\mathbf{h}_{n+1}) + \lambda\|W_n^{-1}\mathbf{h}_{n+1}\|_1] - [J(\mathbf{h}_n) + \lambda\|W_n^{-1}\mathbf{h}_n\|_1] \\ &= [J(W_n\mathbf{q}_{n+1|n}) + \lambda\|\mathbf{q}_{n+1|n}\|_1] - [J(W_n\mathbf{q}_{n|n}) + \lambda\|\mathbf{q}_{n|n}\|_1] \\ &= J_n^{\ell_1}(\mathbf{q}_{n+1|n}) - J_n^{\ell_1}(\mathbf{q}_{n|n}). \end{aligned}$ (61)

The inequality follows from Lemma A.2, and the AST relationships (18) and (19) are again utilized. By the same argument as in the reweighted $\ell_2$ case, there exists a choice of $\{\mu_n\}_{n=0}^{\infty}$ that ensures the decrease in $J_G(\mathbf{h})$ according to (61), and the update recursion (21) monotonically converges to a local minimum (or saddle point) of (9) under a WSS environment.

Appendix B

Proof of Theorem 5.1

The proof follows the discussion in [4], [5], [25]. Substituting $e_n = d_n - \mathbf{u}_n^T\mathbf{h}_n$ into (38), we have:

$\mathbf{h}_{n+1} = \mathbf{h}_n - \mu S\mathbf{u}_n\mathbf{u}_n^T\mathbf{h}_n + \mu S\mathbf{u}_n d_n$. (62)

Using the fact that $d_n = \mathbf{u}_n^T\mathbf{h}^o + v_n$, we have:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu S\mathbf{u}_n\mathbf{u}_n^T(\mathbf{h}^o - \mathbf{h}_n) + \mu S\mathbf{u}_n v_n$. (63)

Define the misalignment vector $\boldsymbol{\varepsilon}_n$ as:

$\boldsymbol{\varepsilon}_n = \mathbf{h}^o - \mathbf{h}_n$. (64)

Then from (63) we have:

$\boldsymbol{\varepsilon}_{n+1} = (I - \mu S\mathbf{u}_n\mathbf{u}_n^T)\boldsymbol{\varepsilon}_n - \mu S\mathbf{u}_n v_n$. (65)

Next, based on (65) we have:

$\boldsymbol{\varepsilon}_{n+1}\boldsymbol{\varepsilon}_{n+1}^T = (I - \mu S\mathbf{u}_n\mathbf{u}_n^T)\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^T(I - \mu\mathbf{u}_n\mathbf{u}_n^T S) + \mu^2 v_n^2\, S\mathbf{u}_n\mathbf{u}_n^T S + \Xi$, (66)

where Ξ represents the remaining cross terms whose expectations are zero.

Let $\Omega_n = E[\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^T]$. Taking expectation on both sides of (66), we have:

$\Omega_{n+1} = \underbrace{E\big[(I - \mu S\mathbf{u}_n\mathbf{u}_n^T)\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^T(I - \mu\mathbf{u}_n\mathbf{u}_n^T S)\big]}_{\Theta} + \mu^2\sigma_v^2\, SRS$. (67)

Note that:

$\Theta = \Omega_n - \mu SR\Omega_n - \mu\Omega_n RS + \mu^2 S\,E[\mathbf{u}_n\mathbf{u}_n^T\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^T\mathbf{u}_n\mathbf{u}_n^T]\,S$. (68)

With Assumptions 1 and 2 it can be shown that [4]:

$E[\mathbf{u}_n\mathbf{u}_n^T\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^T\mathbf{u}_n\mathbf{u}_n^T] = 2R\Omega_n R + R\,\mathrm{tr}(R\Omega_n)$. (69)

Thus,

$\Theta = \Omega_n - \mu SR\Omega_n - \mu\Omega_n RS + 2\mu^2 SR\Omega_n RS + \mu^2 SR\,\mathrm{tr}(R\Omega_n)\,S$. (70)

Then, with $R = \sigma_u^2 I$, in steady state, i.e., as $n \to \infty$,

$\Omega_\infty = \Omega_\infty - \mu\sigma_u^2 S\Omega_\infty - \mu\sigma_u^2\Omega_\infty S + 2\mu^2\sigma_u^4 S\Omega_\infty S + \mu^2\sigma_u^4 S\,\mathrm{tr}(\Omega_\infty)\,S + \mu^2\sigma_u^2\sigma_v^2 S^2$. (71)

This implies:

$\boldsymbol{\omega}_\infty = \boldsymbol{\omega}_\infty - 2\mu\sigma_u^2 S\boldsymbol{\omega}_\infty + 2\mu^2\sigma_u^4 S^2\boldsymbol{\omega}_\infty + \mu^2\sigma_u^4\,\mathbf{s}^2\mathbf{1}^T\boldsymbol{\omega}_\infty + \mu^2\sigma_u^2\sigma_v^2\,\mathbf{s}^2$, (72)

where $\boldsymbol{\omega}_\infty$ and $\mathbf{s}$ are the vectors consisting of the diagonal elements of $\Omega_\infty$ and $S$, respectively, and $\mathbf{s}^2$ denotes the element-wise square of the vector $\mathbf{s}$.

The steady-state excess MSE is then:

$J_{\mathrm{ex}} \triangleq \lim_{n\to\infty} E\big[(\mathbf{u}_n^T(\mathbf{h}^o - \mathbf{h}_n))^2\big] = \lim_{n\to\infty} E\big[(\mathbf{u}_n^T\boldsymbol{\varepsilon}_n)^2\big] = \lim_{n\to\infty}\mathrm{tr}\big(\Omega_n E[\mathbf{u}_n\mathbf{u}_n^T]\big) = \sigma_u^2\,\mathrm{tr}(\Omega_\infty) = \sigma_u^2\,\mathbf{1}^T\boldsymbol{\omega}_\infty$. (73)

Substituting (73) into (72) and rearranging, we have:

$\omega_{i,\infty} = \dfrac{\mu^2\sigma_u^2 s_i^2 J_{\mathrm{ex}} + \mu^2\sigma_u^2\sigma_v^2 s_i^2}{2\mu\sigma_u^2 s_i - 2\mu^2\sigma_u^4 s_i^2}$. (74)

This then leads to:

$J_{\mathrm{ex}} = \sigma_u^2\sum_{i=0}^{M-1}\omega_{i,\infty} = \sum_{i=0}^{M-1}\dfrac{\mu^2\sigma_u^2 s_i^2 J_{\mathrm{ex}} + \mu^2\sigma_u^2\sigma_v^2 s_i^2}{2\mu s_i - 2\mu^2\sigma_u^2 s_i^2}$, (75)

which yields:

$J_{\mathrm{ex}} = \dfrac{\mu\sum_{i=0}^{M-1}\dfrac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i}}{1 - \mu\sum_{i=0}^{M-1}\dfrac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i}}\,\sigma_v^2$. (76)

This justifies Theorem 5.1.
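As a sanity check of (76), the following is a minimal Monte Carlo sketch, assuming a fixed diagonal matrix $S$, zero-mean white Gaussian input of variance $\sigma_u^2$, and white Gaussian noise of variance $\sigma_v^2$ (independent regressors are drawn at each iteration, consistent in spirit with Assumptions 1 and 2); the dimensions and parameter values are arbitrary, chosen only so that the stability conditions of Theorem 5.2 hold.

```python
import numpy as np

rng = np.random.default_rng(0)
M, sigma_u2, sigma_v2, mu = 16, 1.0, 1e-2, 0.05          # assumed illustrative values
h_o = rng.standard_normal(M) * (rng.random(M) < 0.25)    # sparse "true" response
s = rng.uniform(0.5, 1.5, M)                              # diagonal of a fixed S
S = np.diag(s)

# Theoretical steady-state excess MSE from Eq. (76).
A = np.sum(mu * sigma_u2 * s / (2.0 - 2.0 * mu * sigma_u2 * s))
J_ex_theory = A / (1.0 - A) * sigma_v2

# Monte Carlo: iterate h <- h + mu * S u e and average (u^T (h_o - h))^2 in steady state.
n_iter, n_avg = 200_000, 50_000
h = np.zeros(M)
ex = []
for n in range(n_iter):
    u = rng.normal(0.0, np.sqrt(sigma_u2), M)
    d = u @ h_o + rng.normal(0.0, np.sqrt(sigma_v2))
    if n >= n_iter - n_avg:
        ex.append((u @ (h_o - h)) ** 2)                   # excess error before the update
    h = h + mu * (S @ (u * (d - u @ h)))
print(f"theory: {J_ex_theory:.3e}   simulation: {np.mean(ex):.3e}")
```

For sufficiently small step sizes the empirical average should approach the value predicted by (76).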

Appendix C

Proof of Theorem 5.2

Note that Assumption 1 ensures that $\mathbf{h}_n$, $\mathbf{u}_n$, and $v_n$ are mutually independent. Thus, taking expectation of both sides of (65) gives:

$E[\boldsymbol{\varepsilon}_{n+1}] = (I - \mu SR)\,E[\boldsymbol{\varepsilon}_n]$. (77)

Therefore, the following condition is sufficient for convergence in the mean sense [4]:

$|\lambda_{\max}\{I - \mu SR\}| < 1$. (78)

With $R = \sigma_u^2 I$, Theorem 5.2-i) is justified.

From (76) we see that, by requiring:

$1 - \mu\sum_{i=0}^{M-1}\dfrac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i} > 0$, (79)

we obtain the stability bound for μ as:

$0 < \mu < \left(\sum_{i=0}^{M-1}\dfrac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i}\right)^{-1}$, (80)

which justifies Theorem 5.2-ii).
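As a small numerical aid (a sketch under the same assumptions as above; the function name is ours), conditions (78) and (79) can be checked for a given step size and diagonal of $S$:

```python
import numpy as np

def is_stable(mu, s, sigma_u2):
    """Check Theorem 5.2: (i) mean stability |1 - mu*sigma_u2*s_i| < 1 for all i, and
    (ii) the mean-square condition 1 - mu * sum_i sigma_u2*s_i / (2 - 2*mu*sigma_u2*s_i) > 0."""
    s = np.asarray(s, dtype=float)
    mean_ok = np.all(np.abs(1.0 - mu * sigma_u2 * s) < 1.0)
    terms = sigma_u2 * s / (2.0 - 2.0 * mu * sigma_u2 * s)
    return bool(mean_ok and (1.0 - mu * np.sum(terms) > 0.0))
```

For instance, the parameter values used in the sketch of Appendix B satisfy both conditions, which is why the Monte Carlo run there converges.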

Footnotes

1

This similarity has been noticed in [38] where sparse adaptive filtering techniques were utilized for solving the SSR problem. Here we take the opposite direction as we are interested in utilizing SSR techniques for assisting the adaptive filtering algorithms. Both cases exploit the connections between SSR and adaptive filtering but the objectives are different.

2

The algorithms usually reduce to the conventional LMS or NLMS algorithm if the regularization coefficient λ is set to zero.

3

Note that $\|\mathbf{x}\|_p$ for $0 < p < 1$ does not satisfy the required axioms for a norm and therefore is not technically a norm. For simplicity of exposition, since the range of $p$ considered is from 0 to 2, we use the norm terminology to cover this entire range.

4

The quotation marks are used to warn that it is not a proper norm.

5

By abusing the notation we use $\nabla_{\mathbf{x}}$ also for the subgradient operator without explicitly noting it.

6

Formally, $\tilde{\mu}$ is called the normalized step size. For brevity, we still refer to it as the step size, but keep in mind that it does not have the same significance as the $\mu$ in (3). Note that it is also common in the literature for the same step-size notation to be shared by both LMS and NLMS without explicit distinction.

7

By abusing the terminology we implicitly use “gradient” also for subgradient whenever appropriate.

8

The positive definiteness can be shown to hold for a wide variety of diversity measures used in SSR. In cases where it is not, the positive definiteness can still be ensured by utilizing some small regularization constant.

9

For a point at which $G(\mathbf{h})$ is non-differentiable, this can still hold by properly choosing the subgradients.

10

Note that the chain rule here is basically $\nabla_{\mathbf{q}} = W_n\nabla_{\mathbf{h}}$ as a result of the change of variables (15) for a given $W_n$ at iteration $n$.

11

Note that for Property 4, Theorem 3.1 holds for (20) of the reweighted $\ell_2$ framework if $g(z)$ is concave in $z^2$. On the other hand, it holds for (21) of the reweighted $\ell_1$ framework if $g(z)$ is concave in $|z|$.

12

We suggest that $c$ be kept relatively small compared to the amplitudes of the filter coefficients so that it does not significantly affect the convergence.

13

Due to the $1/M$ factor in (28), which is not present in (8) of existing PNLMS-type algorithms, the division by $M$ is not needed for $\delta$ in SNLMS.

References

• [1] Widrow B and Stearns SD, Adaptive Signal Processing, Pearson, 1985.
• [2] Haykin S, Adaptive Filter Theory, 5th edition, Pearson, 2013.
• [3] Sayed AH, Adaptive Filters, John Wiley & Sons, 2011.
• [4] Manolakis DG, Ingle VK, and Kogon SM, Statistical and Adaptive Signal Processing: Spectral Estimation, Signal Modeling, Adaptive Filtering, and Array Processing, McGraw-Hill, Boston, 2000.
• [5] Duttweiler DL, "Proportionate normalized least-mean-squares adaptation in echo cancelers," IEEE Trans. Speech Audio Process., vol. 8, no. 5, pp. 508–518, 2000.
• [6] Benesty J, Gänsler T, Morgan DR, Sondhi MM, and Gay SL, Advances in Network and Acoustic Echo Cancellation, Springer, 2001.
• [7] Paleologu C, Benesty J, and Ciochină S, "Sparse adaptive filters for echo cancellation," Synth. Lect. Speech Audio Process., vol. 6, no. 1, pp. 1–124, 2010.
• [8] Hänsler E and Schmidt G, Topics in Acoustic Echo and Noise Control: Selected Methods for the Cancellation of Acoustical Echoes, the Reduction of Background Noise, and Speech Processing, Springer Science & Business Media, 2006.
• [9] Lee C-H, Rao BD, and Garudadri H, "Sparsity promoting LMS for adaptive feedback cancellation," in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2017, pp. 226–230.
• [10] Kocic M, Brady D, and Stojanovic M, "Sparse equalization for real-time digital underwater acoustic communications," in Proc. MTS/IEEE OCEANS, 1995, vol. 3, pp. 1417–1422.
• [11] Wipf D and Nagarajan S, "Iterative reweighted $\ell_1$ and $\ell_2$ methods for finding sparse solutions," IEEE J. Sel. Top. Signal Process., vol. 4, no. 2, pp. 317–329, 2010.
• [12] Rao BD and Kreutz-Delgado K, "An affine scaling methodology for best basis selection," IEEE Trans. Signal Process., vol. 47, no. 1, pp. 187–200, 1999.
• [13] Nash SG and Sofer A, Linear and Nonlinear Programming, McGraw-Hill, 1996.
• [14] Tibshirani R, "Regression shrinkage and selection via the lasso," J. Roy. Statist. Soc. Series B (Methodological), pp. 267–288, 1996.
• [15] Gorodnitsky IF and Rao BD, "Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 600–616, 1997.
• [16] Tipping ME, "Sparse Bayesian learning and the relevance vector machine," J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.
• [17] Chen SS, Donoho DL, and Saunders MA, "Atomic decomposition by basis pursuit," SIAM Review, vol. 43, no. 1, pp. 129–159, 2001.
• [18] Chartrand R and Yin W, "Iteratively reweighted algorithms for compressive sensing," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2008, pp. 3869–3872.
• [19] Candès EJ, Wakin MB, and Boyd SP, "Enhancing sparsity by reweighted $\ell_1$ minimization," J. Fourier Anal. Appl., vol. 14, no. 5, pp. 877–905, 2008.
• [20] Wagner K and Doroslovački M, Proportionate-type Normalized Least Mean Square Algorithms, John Wiley & Sons, 2013.
• [21] Benesty J and Gay SL, "An improved PNLMS algorithm," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2002, pp. 1881–1884.
• [22] Paleologu C, Benesty J, and Ciochină S, "An improved proportionate NLMS algorithm based on the $\ell_0$ norm," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2010, pp. 309–312.
• [23] Martin RK, Sethares WA, Williamson RC, and Johnson CR, "Exploiting sparsity in adaptive filters," IEEE Trans. Signal Process., vol. 50, no. 8, pp. 1883–1894, 2002.
• [24] Rao BD and Song B, "Adaptive filtering algorithms for promoting sparsity," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2003, pp. 361–364.
• [25] Jin Y, Algorithm Development for Sparse Signal Recovery and Performance Limits Using Multiple-User Information Theory, Ph.D. dissertation, University of California, San Diego, 2011.
• [26] Benesty J, Paleologu C, and Ciochină S, "Proportionate adaptive filters from a basis pursuit perspective," IEEE Signal Process. Lett., vol. 17, no. 12, pp. 985–988, 2010.
• [27] Liu J and Grant SL, "A generalized proportionate adaptive algorithm based on convex optimization," in Proc. IEEE China Summit Int. Conf. Signal Inform. Process. (ChinaSIP), 2014, pp. 748–752.
• [28] Chen Y, Gu Y, and Hero AO, "Sparse LMS for system identification," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2009, pp. 3125–3128.
• [29] Gu Y, Jin J, and Mei S, "$\ell_0$ norm constraint LMS algorithm for sparse system identification," IEEE Signal Process. Lett., vol. 16, no. 9, pp. 774–777, 2009.
• [30] Wu FY and Tong F, "Gradient optimization p-norm-like constraint LMS algorithm for sparse system estimation," Signal Process., vol. 93, no. 4, pp. 967–971, 2013.
• [31] Taheri O and Vorobyov SA, "Reweighted $\ell_1$-norm penalized LMS for sparse channel estimation and its analysis," Signal Process., vol. 104, pp. 70–79, 2014.
• [32] Das RL and Chakraborty M, "Improving the performance of the PNLMS algorithm using $\ell_1$ norm regularization," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 7, pp. 1280–1290, 2016.
• [33] Li Y and Hamamura M, "An improved proportionate normalized least-mean-square algorithm for broadband multipath channel estimation," Sci. World J., vol. 2014.
• [34] Albu F, Caciula I, Li Y, and Wang Y, "The $\ell_p$-norm proportionate normalized least mean square algorithm for active noise control," in Proc. Int. Conf. Syst. Theory, Control, Comput. (ICSTCC), 2017, pp. 401–405.
• [35] Lima MVS, Ferreira TN, Martins WA, and Diniz PSR, "Sparsity-aware data-selective adaptive filters," IEEE Trans. Signal Process., vol. 62, no. 17, pp. 4557–4572, 2014.
• [36] Pelekanakis K and Chitre M, "New sparse adaptive algorithms based on the natural gradient and the $\ell_0$-norm," IEEE J. Oceanic Eng., vol. 38, no. 2, pp. 323–332, 2013.
• [37] Ferreira TN, Lima MVS, Diniz PSR, and Martins WA, "Low-complexity proportionate algorithms with sparsity-promoting penalties," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2016, pp. 253–256.
• [38] Jin J, Gu Y, and Mei S, "A stochastic gradient approach on compressive sensing signal reconstruction based on adaptive filtering framework," IEEE J. Sel. Top. Signal Process., vol. 4, no. 2, pp. 409–420, 2010.
• [39] Lee C-H, Rao BD, and Garudadri H, "Proportionate adaptive filters based on minimizing diversity measures for promoting sparsity," in Proc. Asilomar Conf. Signals, Syst., Comput. (ACSSC), 2019, pp. 769–773.
• [40] Variddhisaï T and Mandic DP, "On an RLS-like LMS adaptive filter," in Proc. IEEE Int. Conf. Digital Signal Process. (DSP), 2017, pp. 1–5.
• [41] Su G, Jin J, Gu Y, and Wang J, "Performance analysis of $\ell_0$ norm constraint least mean square algorithm," IEEE Trans. Signal Process., vol. 60, no. 5, pp. 2223–2235, 2012.
• [42] Sun Y, Babu P, and Palomar DP, "Majorization-minimization algorithms in signal processing, communications, and machine learning," IEEE Trans. Signal Process., vol. 65, no. 3, pp. 794–816, 2017.
• [43] Oliveira JP, Bioucas-Dias JM, and Figueiredo MAT, "Adaptive total variation image deblurring: A majorization–minimization approach," Signal Process., vol. 89, no. 9, pp. 1683–1693, 2009.
• [44] Boyd S and Vandenberghe L, Convex Optimization, Cambridge University Press, 2004.
• [45] Luenberger DG and Ye Y, Linear and Nonlinear Programming, 4th edition, Springer, 2016.
• [46] Chen L and Gu Y, "From least squares to sparse: A non-convex approach with guarantee," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2013, pp. 5875–5879.
• [47] Benesty J, Paleologu C, and Ciochină S, "On regularization in adaptive filtering," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 6, pp. 1734–1742, 2010.
• [48] Rao BD, Engan K, Cotter SF, Palmer J, and Kreutz-Delgado K, "Subset selection in noise based on diversity measure minimization," IEEE Trans. Signal Process., vol. 51, no. 3, pp. 760–770, 2003.
