Author manuscript; available in PMC: 2022 Jan 1.
Published in final edited form as: IEEE/ACM Trans Audio Speech Lang Process. 2020 Nov 17;29:171–186. doi: 10.1109/taslp.2020.3038526

Proportionate Adaptive Filtering Algorithms Derived Using an Iterative Reweighting Framework

Ching-Hua Lee 1, Bhaskar D Rao 1, Harinath Garudadri 1
PMCID: PMC7996480  NIHMSID: NIHMS1653718  PMID: 33778097

Abstract

In this paper, based on sparsity-promoting regularization techniques from the sparse signal recovery (SSR) area, least mean square (LMS)-type sparse adaptive filtering algorithms are derived. The approach mimics the iterative reweighted ℓ2 and ℓ1 SSR methods that majorize the regularized objective function during the optimization process. We show that introducing the majorizers leads to the same algorithm as simply using the gradient update of the regularized objective function, as is done in existing approaches. Different from the past works, the reweighting formulation naturally leads to an affine scaling transformation (AST) strategy, which effectively introduces a diagonal weighting on the gradient, giving rise to new algorithms that demonstrate improved convergence properties. Interestingly, setting the regularization coefficient to zero in the proposed AST-based framework leads to the Sparsity-promoting LMS (SLMS) and Sparsity-promoting Normalized LMS (SNLMS) algorithms, which exploit but do not strictly enforce the sparsity of the system response if it already exists. The SLMS and SNLMS realize proportionate adaptation for convergence speedup should sparsity be present in the underlying system response. In this manner, we develop a new way for rigorously deriving a large class of proportionate algorithms, and also explain why they are useful in applications where the underlying systems admit certain sparsity, e.g., in acoustic echo and feedback cancellation.

Index Terms—: affine scaling, iterative reweighted, proportionate adaptation, sparse adaptive filter, sparse signal recovery

I. Introduction

ADAPTIVE filters [1], [2], [3], [4] have been an active research area over the past few decades for their capabilities of estimating and tracking time-varying systems. In several applications, the impulse responses (IRs) of the underlying systems to be identified are often sparse or compressible (quasi-sparse), i.e., only a small percentage of the IR components have a significant magnitude while the rest are zero or small. Examples include network and acoustic echo cancellation [5], [6], [7], hands-free mobile telephony [8], acoustic feedback reduction in hearing aids [9], and underwater acoustic communications [10], to mention a few. Designing adaptive filters that can exploit the sparse structure of the underlying system response for performance improvement over the conventional approaches, e.g., the least mean square (LMS) and normalized LMS (NLMS), is of great interest and importance, especially for acoustic and speech applications. In this paper, we utilize as a starting point the iterative reweighted ℓ2 and ℓ1 algorithms that have been developed in the sparse signal recovery (SSR) area to minimize diversity measures [11]. Incorporating an affine scaling transformation (AST) [12], [13] into the algorithm design process, we present a new methodology for developing a large class of adaptive filters that leverage the sparse nature of the system responses.

A. Related Work

An early and influential work on identifying sparse IRs is the proportionate NLMS (PNLMS) algorithm proposed by Duttweiler [5] for acoustic echo cancellation. The main idea behind the approach is to update each filter coefficient using a step size proportional to the magnitude of the estimated coefficient, as opposed to the NLMS which assigns a uniform adaptation gain to all coefficients. Consequently, when the system is sparse, larger coefficients are adapted using relatively large steps compared to the smaller ones with the PNLMS. The overall convergence can thus be sped up by focusing on adjusting the significant coefficients, rather than treating them all equally as in the NLMS. Although the PNLMS was developed in an intuitive way, i.e., the equations used to calculate the proportionate factors that realize step-size control were based on good heuristics rather than on any optimization criterion, it has motivated many new proportionate variants for sparse system identification. The proportionate class of algorithms represents an important subset among sparsity-aware adaptive filters.

The recent progress on SSR has led to a number of computational algorithms, e.g., [14], [15], [16], [17], [18], [19], among others. This makes available a plethora of approaches for systematically designing sparsity-aware adaptive algorithms that are a natural complement to the SSR batch estimation techniques. As a result, different from the proportionate approaches, another class of sparse adaptive filters has been introduced by integrating sparsity-inducing regularization to accelerate the convergence of near-zero coefficients in sparse systems. This has led to several sparse adaptive filtering algorithms and even to a general framework of adaptive filters that incorporates sparsity. SSR-motivated adaptive algorithms represent another important class of sparsity-aware adaptive filters. We now discuss a few works on the proportionate class followed by the SSR variants.

Several variants of the PNLMS have been proposed and [20] provides a good summary. Examples include the improved PNLMS (IPNLMS) [21], the IPNLMS based on the ℓ0 "norm" (IPNLMS-ℓ0) [22], etc. In [23], Martin et al. utilized a natural gradient framework to deduce adaptive filters having similar features to the PNLMS that can exploit the sparse structure. Rao and Song [24] and Jin [25] proposed a framework for promoting sparsity in adaptive filters based on minimizing diversity measures. The framework is quite general and encompasses a broad range of adaptive filtering algorithms having similarity with the PNLMS algorithm. Benesty et al. [26] derived the PNLMS from a different perspective by using a basis pursuit [17] formulation. Following them, Liu and Grant [27] proposed a general framework of proportionate adaptive filters based on convex optimization and sparseness measures, which covers many traditional proportionate algorithms.

Several SSR-inspired algorithms have been introduced by integrating a sparsity-inducing regularizer into the original LMS objective function to accelerate the convergence of near-zero coefficients in sparse systems. For example, Chen et al. [28] proposed the zero-attracting LMS (ZA-LMS), derived by including the ℓ1 norm penalty in the objective function. They also proposed the reweighted ZA-LMS (RZA-LMS) obtained by incorporating the log-sum penalty. Later, using an approximation of the ℓ0 "norm" as a sparsity-inducing term, Gu et al. [29] proposed the ℓ0-LMS, which is capable of better estimating sparse systems. In [30], the authors utilized the ℓp-norm-like regularization function and considered the quantitative learning of the regularizer. Another work in this area is the reweighted ℓ1 norm penalized LMS algorithm proposed and studied in [31] for improving the ZA-LMS and RZA-LMS.

Recently, some works have considered both proportionate adaptation and sparsity-inducing regularization together. For example, [32] presents a modified PNLMS update equation with a zero attractor as in the ZA-LMS for all the taps, derived by introducing a carefully constructed ℓ1 norm penalty in the PNLMS objective function. Other than the ℓ1 norm, [33], [34] apply the ℓp norm penalty to the PNLMS cost function and derive ℓp-norm-constrained proportionate algorithms for improved broadband multipath channel estimation and active noise control. [35] encompasses a number of sparsity-aware adaptive filtering algorithms that go beyond the LMS and NLMS, including proportionate and regularization-based approaches. [36], [37] provide a general framework to combine proportionate updates and sparsity-inducing regularizers. In Section III, we will derive algorithms whose update rules also consist of a proportionate term and another term due to regularization. However, our derivation follows a very different path from these previous works.

B. Contributions of the Paper

In this paper, inspired by the conceptual similarity with SSR, our goal is to add to this interesting body of work on adaptive filtering and sparsity. The contributions of the paper are the following:

  1. The sparsity-aware adaptive filters developed here lie at the intersection of the proportionate class and the SSR-inspired adaptive algorithms, providing an interesting bridge between the two. We start with the rigorous formulation of a regularization framework and derive sparsity-aware adaptive filtering algorithms. Specifically, based on diversity measure minimization in SSR, we adopt the reweighted ℓ2 and ℓ1 frameworks [11]. Using an AST methodology [12], [13] in the algorithm development, our work naturally results in a general class of proportionate algorithms. This is a unique feature of this work. The combination of AST and the reweighting frameworks constitutes the main innovation of our adaptive algorithm development framework.

  2. Under the proposed framework, we introduce the Sparsity-promoting LMS (SLMS) and Sparsity-promoting NLMS (SNLMS) algorithms that promote sparsity without biasing the adaptation process by adopting λ = 0, where λ is the regularization coefficient associated with the sparsity penalty. This is not possible for the class of algorithms currently in existence that utilize a sparsity-inducing regularization penalty. The SLMS and SNLMS can be viewed as realizing proportionate adaptation like the PNLMS class of algorithms [5]. Therefore, utilizing λ = 0 in our framework paves the way for explaining why proportionate algorithms are useful in circumstances where the channels to be estimated admit certain sparsity. This provides theoretical support to existing proportionate algorithms which were mostly developed based on good heuristics rather than on optimization criteria. More importantly, unlike most of them that design the proportionate factors heuristically, our SSR-motivated framework leads to a more systematic way of designing the factors, and permits incorporation of a broad class of diversity measures that have proven to be effective for SSR in our algorithms.

  3. Compared to existing derivations of proportionate-type algorithms, using the proposed framework we derive the algorithms in a more natural way in terms of incorporating sparsity using a regularization framework. For instance, in some of the existing works modified objective functions were proposed that impose sparsity on the “change” of the filter rather than on the filter itself, e.g., [24], [25], [26], [27], [32]. However, since the assumption is that the filter itself is sparse, the motivation for enforcing sparsity on the “change” rather than on the filter is not clear and at best indirect. In contrast, we work with the general mean squared error (MSE) criterion in which sparsity can be directly imposed via regularization on the filter.

  4. Steady-state analysis of the proposed algorithms is conducted and simulation results are provided to demonstrate the effectiveness of the proposed algorithms compared to existing approaches. Examples with the acoustic channel response measured on a real-world hearing aid device using speech input are also presented.

A portion of this work has been previously published as a conference paper [39]. The current paper extends it by providing a general framework for incorporating flexible diversity measures into sparse adaptive filters, together with theoretical discussion and supporting simulation results.

C. Organization of the Paper

The rest of the paper is organized as follows. Section II provides background on adaptive filters. Section III derives adaptive filters that incorporate sparsity based on diversity measure minimization, by utilizing the reweighted ℓ2 and ℓ1 frameworks together with the AST methodology. Section IV introduces the SLMS and SNLMS that adopt λ = 0. Section V discusses the steady-state analysis. Section VI presents simulation results. Section VII concludes the paper.

D. Notation

Let $\mathbb{R}^M$ denote the M-dimensional real Euclidean space. $\mathbb{R}^{N\times M}$ denotes the set of N × M real matrices. $\mathbb{R}_+$ denotes the set of non-negative real numbers. Superscript $T$ denotes the transpose of a vector or matrix. E[·] denotes the mathematical expectation. Vectors and matrices are denoted by boldface lowercase and uppercase letters, respectively. Scalars are denoted by italics. For a vector $\mathbf{x} = [x_0, x_1, \ldots, x_{M-1}]^T \in \mathbb{R}^M$, the ℓp norm (where p > 0) is defined as $\|\mathbf{x}\|_p = \left(\sum_{i=0}^{M-1} |x_i|^p\right)^{1/p}$. The ℓ0 "norm" $\|\mathbf{x}\|_0$ is defined as the number of nonzero entries of the vector $\mathbf{x}$. We use diag{$x_i$} to denote the M-by-M diagonal matrix whose i-th diagonal element is $x_i$. We use sgn(·) to denote the component-wise sign function that takes the sign of the entries of its argument. $\nabla_{\mathbf{x}}$ denotes the gradient operator w.r.t. $\mathbf{x}$. $d$ denotes the differential operator. tr($\mathbf{X}$) denotes the trace of a square matrix $\mathbf{X} \in \mathbb{R}^{M\times M}$. $\mathbf{I}$ is the identity matrix. $\mathbf{1}$ denotes the vector of all ones. $\mathbf{0}$ denotes the vector of all zeros. We use $\mathcal{N}(\cdot,\cdot)$ to denote the normal distribution with the first and second arguments being the mean and (co)variance, respectively.

II. Background on Adaptive Filtering and SSR

We provide some preliminaries of adaptive filters in the context of system identification and present several examples of existing sparsity-aware adaptive filtering algorithms. We also discuss the iterative reweighting frameworks in SSR for developing our adaptive algorithms in later sections.

A. Adaptive Filtering for System Identification

Let $\mathbf{h}_n = [h_{0,n}, h_{1,n}, \ldots, h_{M-1,n}]^T$ denote the adaptive filter of length M at discrete time instant n. Assume the IR of the underlying system is $\mathbf{h}^o = [h_0^o, h_1^o, \ldots, h_{M-1}^o]^T$, and the model for the observed or desired signal is $d_n = \mathbf{u}_n^T\mathbf{h}^o + v_n$, where $\mathbf{u}_n = [u_n, u_{n-1}, \ldots, u_{n-M+1}]^T$ is the vector containing the M most recent samples of the input signal $u_n$ and $v_n$ is an additive noise signal. The output of the adaptive filter $\mathbf{u}_n^T\mathbf{h}_n$ is subtracted from $d_n$ to obtain the error signal $e_n = d_n - \mathbf{u}_n^T\mathbf{h}_n$. The goal in general is to sequentially update the coefficients of $\mathbf{h}_n$ upon the arrival of a new data pair $(\mathbf{u}_n, d_n)$, such that eventually $\mathbf{h}_n = \mathbf{h}^o$, i.e., to identify the unknown system.
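
To make the signal model concrete, the following is a minimal MATLAB sketch (not taken from the paper) that generates synthetic data according to $d_n = \mathbf{u}_n^T\mathbf{h}^o + v_n$ for a hypothetical sparse IR; all names and values (M, N, the tap positions) are illustrative choices.

```matlab
% Minimal sketch of the system identification signal model (illustrative only).
M = 256;                          % filter length
N = 5000;                         % number of time samples
h_o = zeros(M, 1);                % a hypothetical sparse impulse response
h_o([5 40 140]) = [1.0 -0.5 0.2];
u = randn(N, 1);                  % white Gaussian input signal
v = sqrt(0.01) * randn(N, 1);     % additive noise v_n
d = zeros(N, 1);
for n = M:N
    u_n  = u(n:-1:n-M+1);         % M most recent input samples
    d(n) = u_n' * h_o + v(n);     % observed (desired) signal d_n
end
```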

The most classic adaptive filtering algorithms are the LMS and NLMS [1], [2], [3], which can be derived based on minimizing the MSE objective function:

$\min_{\mathbf{h}} J(\mathbf{h}) \triangleq E[e_n^2] = E\left[(d_n - \mathbf{u}_n^T\mathbf{h})^2\right]$. (1)

The method of steepest descent (gradient descent) for optimizing (1) suggests the following recursion for updating the filter coefficients [2]:

$\mathbf{h}_{n+1} = \mathbf{h}_n - \frac{\mu}{2}\nabla_{\mathbf{h}} J(\mathbf{h}_n)$, (2)

where μ > 0 is the step size. To develop adaptive algorithms, in practice the gradient $\nabla_{\mathbf{h}} J(\mathbf{h}_n) = -2E[\mathbf{u}_n e_n]$ is replaced by the instantaneous estimate $-2\mathbf{u}_n e_n$, i.e., the stochastic gradient [2], [3], leading to the standard LMS algorithm:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu\,\mathbf{u}_n e_n$. (3)

The normalized version of (3), i.e., the NLMS algorithm, can be derived based on the principle of minimum disturbance [2]. Alternatively, it can be obtained by performing exact line search for the optimal step size at each iteration [40]. Then, practically, a scaling factor $\tilde{\mu} > 0$ is introduced to exercise control over the adaptation and a small regularization constant δ > 0 is also employed to avoid division by zero [2], leading to the standard NLMS algorithm:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \tilde{\mu}\,\dfrac{\mathbf{u}_n e_n}{\mathbf{u}_n^T\mathbf{u}_n + \delta}$. (4)
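
As a minimal sketch (assuming the variables u, d, and M from the data-generation example above are in the workspace), the LMS (3) and NLMS (4) recursions can be realized as follows; the step-size values are illustrative, not recommendations.

```matlab
% Minimal sketch of the LMS (3) and NLMS (4) updates (illustrative parameters).
mu   = 0.005;                     % LMS step size
mu_t = 0.5;   delta = 0.01;       % NLMS scaling factor and regularization constant
h_lms  = zeros(M, 1);
h_nlms = zeros(M, 1);
for n = M:length(d)
    u_n = u(n:-1:n-M+1);
    e_lms  = d(n) - u_n' * h_lms;                                   % error signal
    h_lms  = h_lms + mu * u_n * e_lms;                              % Eq. (3)
    e_nlms = d(n) - u_n' * h_nlms;
    h_nlms = h_nlms + mu_t * u_n * e_nlms / (u_n' * u_n + delta);   % Eq. (4)
end
```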

1). Sparsity-Aware Adaptive Filtering Algorithms:

When the underlying system response is sparse, a class of algorithms realizing proportionate adaptation [20] are able to take advantage of the structural sparsity. A typical update rule with proportionate adaptation is:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu\,\mathbf{P}_n\mathbf{u}_n e_n$, (5)

or the normalized version:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \tilde{\mu}\,\dfrac{\mathbf{P}_n\mathbf{u}_n e_n}{\mathbf{u}_n^T\mathbf{P}_n\mathbf{u}_n + \delta}$, (6)

where

$\mathbf{P}_n = \mathrm{diag}\{p_{0,n}, p_{1,n}, \ldots, p_{M-1,n}\}$ (7)

is an M-by-M diagonal matrix assigning different weights to the step sizes for different filter taps, referred to as the proportionate matrix. It redistributes the adaptation gains among all coefficients and emphasizes the large ones in order to speed up their convergence. Typically, at the n-th iteration the diagonal entries are computed as:

$p_{i,n} = \dfrac{\gamma_{i,n}}{\sum_{j=0}^{M-1}\gamma_{j,n}}$, (8)

i = 0, 1, …, M − 1, where $\gamma_{i,n}$ is algorithm-dependent and examples of such algorithms include the PNLMS [5], IPNLMS [21], IPNLMS-ℓ0 [22], etc. In general, if the estimated filter coefficients $h_{i,n}$ are sparse, the resulting $\gamma_{i,n}$ (and thus $p_{i,n}$) will also tend to be sparsely distributed (with positive values).
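
The sketch below illustrates the proportionate update (5) with the gain rule (7)–(8); the choice of $\gamma_{i,n}$ here is a simplified PNLMS-style rule (a floor combined with $|h_{i,n}|$) used only for illustration and is not claimed to be the exact rule of [5]. It again assumes u, d, and M from the earlier data-generation sketch.

```matlab
% Minimal sketch of a proportionate LMS-type update, Eqs. (5), (7), (8).
% The gamma rule is a simplified, PNLMS-flavored illustration.
mu = 0.05;                                           % illustrative step size
h  = zeros(M, 1);
for n = M:length(d)
    u_n   = u(n:-1:n-M+1);
    e_n   = d(n) - u_n' * h;
    gamma = max(abs(h), 0.01 * max(abs(h)) + 1e-4);  % algorithm-dependent factors
    p     = gamma / sum(gamma);                      % proportionate gains, Eq. (8)
    h     = h + mu * (p .* u_n) * e_n;               % P_n u_n e_n with P_n = diag{p}
end
```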

Another class of algorithms, inspired by developments in the SSR area, take sparsity into account using a regularization-based approach, e.g., [28], [29], [30], [31]. The algorithms are obtained by adding a sparsity-inducing term G(h) to the MSE objective function:

$\min_{\mathbf{h}} J_G(\mathbf{h}) \triangleq J(\mathbf{h}) + \lambda G(\mathbf{h})$, (9)

where λ > 0 is the regularization coefficient. By simply applying (stochastic) gradient descent on (9):

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu\,\mathbf{u}_n e_n - \frac{\mu\lambda}{2}\nabla_{\mathbf{h}} G(\mathbf{h}_n)$, (10)

various algorithms can be obtained with different sparsity-inducing functions G(·). Examples include the ZA-LMS [28], RZA-LMS [28], and ℓ0-LMS [29], [41].
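
For instance, a minimal sketch of (10) with the ℓ1 penalty G(h) = ‖h‖1 (whose gradient is sign(h) wherever it is defined) gives a ZA-LMS-style recursion; the λ and μ values are illustrative, and u, d, M are assumed from the earlier sketch.

```matlab
% Minimal sketch of the regularized stochastic-gradient update (10)
% with G(h) = ||h||_1, i.e., a ZA-LMS-style zero attractor.
mu     = 0.005;
lambda = 1e-3;
h      = zeros(M, 1);
for n = M:length(d)
    u_n = u(n:-1:n-M+1);
    e_n = d(n) - u_n' * h;
    h   = h + mu * u_n * e_n - (mu * lambda / 2) * sign(h);   % Eq. (10)
end
```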

B. Iterative Reweighting Algorithms in SSR

The optimization of (9) is actually an SSR problem. The sparsity regularization term G(·) represents the general diversity measure that, when minimized, encourages sparsity in its argument. A separable function of the form $G(\mathbf{h}) = \sum_{i=0}^{M-1} g(h_i)$ is commonly used, where g(·) has the following properties [11]:

  • Property 1: g(z) is symmetric, i.e., g(z) = g(−z) = g(|z|);

  • Property 2: g(|z|) is monotonically increasing with |z|;

  • Property 3: g(0) is finite;

  • Property 4: g(z) is concave in |z| or $z^2$.

Any function that satisfies the above properties is a candidate for effective SSR algorithm development.

The concave nature of the regularization penalty poses challenges to the diversity measure minimization problem. The iterative reweighted ℓ2 [15], [18] and ℓ1 [19] methods are popular batch estimation algorithms for solving such minimization problems in SSR. By introducing a weighted ℓ2 or ℓ1 norm term as an upper bound for the diversity measure term in each iteration, they form and solve a convex optimization problem at each step to approach the optimal solution [11]. Specifically, instead of (9), at iteration n the reweighted ℓ2 framework suggests solving:

$\min_{\mathbf{h}} J_n^{\ell_2}(\mathbf{h}) \triangleq J(\mathbf{h}) + \lambda\|\mathbf{W}_n^{-1}\mathbf{h}\|_2^2$, (11)

and the reweighted ℓ1 framework suggests solving:

$\min_{\mathbf{h}} J_n^{\ell_1}(\mathbf{h}) \triangleq J(\mathbf{h}) + \lambda\|\mathbf{W}_n^{-1}\mathbf{h}\|_1$, (12)

where $\mathbf{W}_n = \mathrm{diag}\{w_{i,n}\}$ is positive definite and each $w_{i,n}$ is computed based on the current estimate $h_{i,n}$, depending on which framework (reweighted ℓ2 or ℓ1) and diversity measure (choice of G(·)) are used.

To elaborate, for using the reweighted ℓ2 formulation (11), the diversity measure function g(z) has to be concave in $z^2$ for Property 4; i.e., it satisfies $g(z) = f(z^2)$, where f(z) is concave for $z \in \mathbb{R}_+$. Based on [11], we have $w_{i,n}$ given as:

$w_{i,n} = \left(\left.\dfrac{df(z)}{dz}\right|_{z=h_{i,n}^2}\right)^{-\frac{1}{2}}$. (13)

For using the reweighted ℓ1 formulation (12), g(z) has to be concave in |z| for Property 4; i.e., it satisfies $g(z) = f(|z|)$, where f(z) is concave for $z \in \mathbb{R}_+$. In this case, $w_{i,n}$ is given as:

$w_{i,n} = \left(\left.\dfrac{df(z)}{dz}\right|_{z=|h_{i,n}|}\right)^{-1}$. (14)

To utilize the reweighting frameworks, we first choose an appropriate diversity measure G(h) and then use (13) or (14) to obtain the corresponding update form of $\mathbf{W}_n$. Several examples will be presented in Section IV-B.
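
As a concrete illustration of this recipe, the sketch below evaluates (13) and (14) for the log-sum choice f(z) = log(z + ε), whose derivative is 1/(z + ε); the resulting weights reproduce (36) and (37) derived later in Section IV-B. The numerical values are arbitrary examples.

```matlab
% Minimal sketch of the weight computations (13) and (14) for f(z) = log(z + eps0).
eps0 = 0.1;
df   = @(z) 1 ./ (z + eps0);          % f'(z) for the log-sum penalty
h_n  = [0.8; -0.05; 0; 0.3];          % an example current filter estimate
w_l2 = df(h_n.^2) .^ (-1/2);          % reweighted l2 weights, Eq. (13) -> Eq. (36)
w_l1 = df(abs(h_n)) .^ (-1);          % reweighted l1 weights, Eq. (14) -> Eq. (37)
W_l2 = diag(w_l2);                    % W_n for the reweighted l2 framework
W_l1 = diag(w_l1);                    % W_n for the reweighted l1 framework
```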

III. Proposed Framework for Incorporating Sparsity in Adaptive Filters

Our framework for developing sparse adaptive filters is based on (9). However, we will be deriving algorithms in a different way rather than using a simple gradient descent as is typically done in existing regularization-based adaptive filtering approaches, e.g., (10). Our novel derivation consists of two stages: i) adapting the iterative reweighting frameworks [11] popular in SSR to the adaptive filtering setting, followed by ii) the AST strategy [12], [13] from the optimization literature to obtain new adaptive filtering algorithms.

A. Reweighting Methods for Adaptive Filtering

The reweighting methods introduced in Section II-B actually belong to the more general class of majorization-minimization (MM) algorithms [42]. In each iteration n, the weighted ℓ2 or ℓ1 norm term majorizes G(h) at the current estimate $\mathbf{h}_n$, thereby providing a surrogate function (or majorizer) $J_n^{\ell_2}(\mathbf{h})$ or $J_n^{\ell_1}(\mathbf{h})$ for the regularized objective function $J_G(\mathbf{h})$. Sequentially minimizing the surrogate functions allows the algorithm to produce more focal estimates as optimization progresses. Hopefully, when the number of iterations is large enough, the optimal solution can be well approached or even achieved [11].

In SSR, it is typical that the surrogate function is exactly minimized in each iteration n. For the purpose of developing adaptive filtering algorithms, here we consider performing only one step of gradient descent per iteration. In this sense, it corresponds to the generalized MM [43], where one does not need to minimize the majorizer but only to ensure that it decreases in every iteration. Indeed, the MM viewpoint provides an interesting observation about optimizing (9) and the reweighting formulations (11) and (12) with gradient descent, as stated in the following proposition:

Proposition 3.1:

For any point $\mathbf{h}_n$ at which G(h) is differentiable, the gradient vector of the surrogate function $J_n^{\ell_2}(\mathbf{h})$ or $J_n^{\ell_1}(\mathbf{h})$ evaluated at $\mathbf{h}_n$ coincides with that of the regularized objective function $J_G(\mathbf{h})$, i.e., $\nabla_{\mathbf{h}} J_n^{\ell_2}(\mathbf{h}_n) = \nabla_{\mathbf{h}} J_G(\mathbf{h}_n)$ and $\nabla_{\mathbf{h}} J_n^{\ell_1}(\mathbf{h}_n) = \nabla_{\mathbf{h}} J_G(\mathbf{h}_n)$.

Proof:

Since the surrogate function majorizes JG(h) at hn, the tangent plane (supporting hyperplane) of the majorizer coincides with that of JG(h) at hn. Consequently, the gradient vectors are the same at hn.

The observation in Proposition 3.1 implies that, if gradient descent (with a fixed step size) is utilized for optimization, then adopting the reweighting frameworks (11) and (12) will be equivalent to directly working on (9) and lead to the existing regularization-based algorithms such as the ZA-LMS. In the following, we introduce the AST strategy naturally suggested by the reweighting frameworks, leading to new algorithms markedly different from those obtained by directly optimizing (9) with gradient descent.

B. AST-Based Adaptive Filtering Algorithms

The reweighting frameworks (11) and (12) naturally suggest the following reparameterization in terms of the (affinely) scaled variable q:

$\mathbf{q} \triangleq \mathbf{W}_n^{-1}\mathbf{h}$. (15)

This step can be interpreted as the AST commonly employed by the interior point approach to solving linear and nonlinear programming problems [12], where Wn is used as the scaling matrix. It is pre-calculated and treated as a given matrix at iteration n to perform a change of coordinates (variables) [44] from h to q, acting as a scaling technique in gradient descent methods [45]. In the optimization literature, AST-based methods transform the original problem into an equivalent one, favorably positioning the current point at the center of the feasible region for expediting the optimization process [13]. While we do not claim this argument is rigorous in the context of adaptive filtering, where the convergence behavior is hard to characterize due to the nonlinear nature of the update equations and the long term dependency on the data, the numerical results appear to support this observation of enjoying the benefits of AST for convergence speedup.

Now we apply (15) to reparameterize the objective functions Jnl2(h) and Jnl1(h) and perform minimization w.r.t. q, that is:

$\min_{\mathbf{q}} \tilde{J}_n^{\ell_2}(\mathbf{q}) \triangleq J_n^{\ell_2}(\mathbf{W}_n\mathbf{q}) = J(\mathbf{W}_n\mathbf{q}) + \lambda\|\mathbf{q}\|_2^2$ (16)

and

$\min_{\mathbf{q}} \tilde{J}_n^{\ell_1}(\mathbf{q}) \triangleq J_n^{\ell_1}(\mathbf{W}_n\mathbf{q}) = J(\mathbf{W}_n\mathbf{q}) + \lambda\|\mathbf{q}\|_1$, (17)

for the reweighted ℓ2 and ℓ1 cases, respectively. The overall update process can conceptually be summarized as follows: i) given $\mathbf{h}$, compute $\mathbf{W}_n$ followed by $\mathbf{q}$; ii) update $\mathbf{q}$ using a gradient descent algorithm; iii) use this new $\mathbf{q}$ to obtain the updated $\mathbf{h}$; iv) repeat Steps i)–iii) till convergence.

More formally, to proceed with gradient-based updates, following [40] we define the a posteriori AST variable at time n:

$\mathbf{q}_{n|n} \triangleq \mathbf{W}_n^{-1}\mathbf{h}_n$ (18)

and the a priori AST variable at time n:

$\mathbf{q}_{n+1|n} \triangleq \mathbf{W}_n^{-1}\mathbf{h}_{n+1}$. (19)

The recursive update by using gradient descent in the q domain can be formulated as:

$\mathbf{q}_{n+1|n} = \mathbf{q}_{n|n} - \frac{\mu}{2}\nabla_{\mathbf{q}}\tilde{J}_n^{\ell_2}(\mathbf{q}_{n|n})$ (20)

and

$\mathbf{q}_{n+1|n} = \mathbf{q}_{n|n} - \frac{\mu}{2}\nabla_{\mathbf{q}}\tilde{J}_n^{\ell_1}(\mathbf{q}_{n|n})$, (21)

for optimizing (16) and (17), respectively.

Using the chain rule and the AST relationships (15) and (18), we can write (20) and (21) respectively as:

$\mathbf{q}_{n+1|n} = \mathbf{q}_{n|n} - \frac{\mu}{2}\mathbf{W}_n\nabla_{\mathbf{h}} J_n^{\ell_2}(\mathbf{h}_n)$ (22)

and

$\mathbf{q}_{n+1|n} = \mathbf{q}_{n|n} - \frac{\mu}{2}\mathbf{W}_n\nabla_{\mathbf{h}} J_n^{\ell_1}(\mathbf{h}_n)$. (23)

Premultiplying both sides of (22) and (23) by $\mathbf{W}_n$ and noting the relationships (18) and (19), we transform the q-domain updates (22) and (23) back to the h domain respectively as:

$\mathbf{h}_{n+1} = \mathbf{h}_n - \frac{\mu}{2}\mathbf{W}_n^2\nabla_{\mathbf{h}} J_n^{\ell_2}(\mathbf{h}_n)$ (24)

and

$\mathbf{h}_{n+1} = \mathbf{h}_n - \frac{\mu}{2}\mathbf{W}_n^2\nabla_{\mathbf{h}} J_n^{\ell_1}(\mathbf{h}_n)$. (25)

By Proposition 3.1, we can replace $\nabla_{\mathbf{h}} J_n^{\ell_2}(\mathbf{h}_n)$ and $\nabla_{\mathbf{h}} J_n^{\ell_1}(\mathbf{h}_n)$ with $\nabla_{\mathbf{h}} J_G(\mathbf{h}_n)$. Thus, (24) and (25) can both be written as:

$\mathbf{h}_{n+1} = \mathbf{h}_n - \frac{\mu}{2}\mathbf{W}_n^2\nabla_{\mathbf{h}} J_G(\mathbf{h}_n)$. (26)

Note that, based on the aforementioned update process i)–iv), we could in fact directly apply (15) to reparameterize $J_G(\mathbf{h})$ and obtain (26) without going through the reweighting formulation, as long as the scaling matrix $\mathbf{W}_n$ is specified. In this sense, the reweighting methods suggest a suitable $\mathbf{W}_n$ that eventually becomes a diagonal weighting matrix $\mathbf{W}_n^2$ on the gradient $\nabla_{\mathbf{h}} J_G(\mathbf{h}_n)$ in the update rule. Hopefully, it alters the ordinary descent direction in a way that leads to convergence improvement. We should also emphasize that the scaling matrices $\mathbf{W}_n$ suggested by (24) and (25) will in general be different for the same G(h), despite the fact that both updates can be written as (26).

In practice, the following update rule is suggested over (26) for avoiding instability and slow convergence issues:

$\mathbf{h}_{n+1} = \mathbf{h}_n - \frac{\mu}{2}\mathbf{S}_n\nabla_{\mathbf{h}} J_G(\mathbf{h}_n)$, (27)

where

$\mathbf{S}_n = \dfrac{\mathbf{W}_n^2}{\frac{1}{M}\mathrm{tr}(\mathbf{W}_n^2)}$, (28)

referred to as the sparsity-promoting matrix, is the normalized version of $\mathbf{W}_n^2$. As a fixed step size μ is used, normalizing the weighting matrix compensates for any arbitrary scaling inherent in $\mathbf{W}_n^2$ that might cause instability (scaling too large) or slow convergence (scaling too small). Note that by (28) we always have $\mathrm{tr}(\mathbf{S}_n) = M$, the same as in the ordinary gradient descent (non-AST) case, which essentially has $\mathbf{S}_n = \mathbf{I}$ whose trace is also M.

Finally, to obtain the adaptive algorithm, we follow the standard procedure of replacing $\nabla_{\mathbf{h}} J_G(\mathbf{h}_n) = -2E[\mathbf{u}_n e_n] + \lambda\nabla_{\mathbf{h}} G(\mathbf{h}_n)$ in (27) with its instantaneous estimate $-2\mathbf{u}_n e_n + \lambda\nabla_{\mathbf{h}} G(\mathbf{h}_n)$, leading to:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu\,\mathbf{S}_n\mathbf{u}_n e_n - \frac{\mu\lambda}{2}\mathbf{S}_n\nabla_{\mathbf{h}} G(\mathbf{h}_n)$. (29)

We see that there is a term with a diagonal weighting $\mathbf{S}_n$ on the LMS update vector $\mathbf{u}_n e_n$, similar to that in the proportionate algorithms (5) and (6). We also see another term weighted by λ, which is due to the introduction of the regularizer, like that of (10). Therefore, the AST framework leads to a more general algorithm comprised of proportionate adaptation and sparsity-inducing regularization. We thus refer to (29) as the generalized sparse LMS algorithm.
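
A minimal sketch of the generalized sparse LMS (29) is given below, instantiated with the reweighted ℓ2 log-sum weights (36), for which G(h) = Σᵢ log(hᵢ² + ε) and ∇G(h) has entries 2hᵢ/(hᵢ² + ε). Parameter values are illustrative, and u, d, M are assumed from the earlier data-generation sketch.

```matlab
% Minimal sketch of the generalized sparse LMS (29) with S_n from (28)
% and the log-sum/reweighted-l2 weights (36). Illustrative parameters.
mu = 0.005;  lambda = 1e-3;  eps0 = 0.01;
h  = zeros(M, 1);
for n = M:length(d)
    u_n   = u(n:-1:n-M+1);
    e_n   = d(n) - u_n' * h;
    w2    = h.^2 + eps0;                 % w_{i,n}^2, from Eq. (36)
    s     = w2 / (sum(w2) / M);          % diagonal of S_n, Eq. (28): tr(S_n) = M
    gradG = 2 * h ./ (h.^2 + eps0);      % gradient of the log-sum penalty
    h     = h + mu * (s .* u_n) * e_n - (mu * lambda / 2) * (s .* gradG);  % Eq. (29)
end
```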

C. Discussions

It may seem at first glance that applying the reweighting techniques to (9) straightforwardly leads to our algorithm. We stress that this is not true. If the AST (15) were not considered, adopting the reweighting schemes would still end up with an update rule like (10) according to Proposition 3.1, rather than the proposed (29). It is also worth mentioning that there is a considerable difference between the proposed algorithm (29) and existing SSR algorithms based on (11) and (12): the conventional SSR techniques are batch estimation methods for recovering the underlying sparse representation, while the proposed algorithm is specifically tailored to the adaptive filtering scenario. That is, as gradient descent is adopted for optimization, we actually perform a gradual update of the filter coefficients in each iteration n, rather than looking for an exact minimizer of the surrogate function as is typically pursued in SSR. This enables the algorithm to track temporal variations and environmental changes. Certainly, considering the gradient noise in real scenarios, it may raise the question of whether the algorithm is convergent. However, even the standard LMS and NLMS, which are based on gradual updates, work well in many practical situations with gradient noise. In Section VI, experimental results will demonstrate that the proposed algorithm, like the LMS and NLMS, also behaves well when a certain level of environmental noise is present.

Finally, the following theorem establishes the convergence of the q domain recursions (20) and (21) and their relationships to (9) to shed light on the convergence of the adaptive algorithm (29) developed based on them:

Theorem 3.1:

For the objective function $J_G(\mathbf{h})$ in (9) with the general diversity measure G(h) satisfying Properties 1–4 in Section II-B, there exists a step size sequence $\{\mu_n\}_{n=0}^{\infty}$ such that each of the update recursions (20) and (21) monotonically converges to a local minimum (or saddle point) of (9) under a wide-sense stationary (WSS) environment, i.e., $u_n$ and $d_n$ are jointly WSS.

Proof:

See Appendix A.

IV. Sparsity-Promoting Algorithms Adopting λ = 0

An interesting situation arises when we consider the limiting case of λ → 0+ for the proposed framework. By setting λ = 0 in (29), we see the λ-weighted term due to regularization vanishes, leading to a simpler equation:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu\,\mathbf{S}_n\mathbf{u}_n e_n$. (30)

The main feature of (30) is that it is able to promote sparsity of the system (through Sn) if it already exists while not strictly enforcing it (as λ = 0). This shall become clearer in later discussions. We refer to the algorithm (30) as the Sparsity-promoting LMS (SLMS).

The normalized version of (30) can also be developed by performing exact line search for the optimal step size at iteration n just like that when deriving the NLMS:

$\mu_n = \arg\min_{\mu}\left(d_n - \mathbf{u}_n^T(\mathbf{h}_n + \mu\,\mathbf{S}_n\mathbf{u}_n e_n)\right)^2 = \dfrac{1}{\mathbf{u}_n^T\mathbf{S}_n\mathbf{u}_n}$. (31)

Similar to the NLMS, we introduce $\tilde{\mu} > 0$ to exercise control over the adaptation and δ > 0 to avoid division by zero, resulting in:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \tilde{\mu}\,\dfrac{\mathbf{S}_n\mathbf{u}_n e_n}{\mathbf{u}_n^T\mathbf{S}_n\mathbf{u}_n + \delta}$. (32)

We refer to the algorithm (32) as the Sparsity-promoting NLMS (SNLMS).

An obvious benefit of adopting λ = 0 is that the computation for the term due to regularization is no longer needed, and we do not have to tweak this coefficient anymore (which is typically not a trivial task in practice). Still, the SLMS and SNLMS have the ability to leverage sparsity owing to the diagonal weighting Sn, which is similar to the proportionate matrix Pn in (5) and (6). Again, this is made possible due to the use of the AST (15), wherein the gradient descent update is done in the q variable rather than in the original h domain. Otherwise, we will end up with algorithms like (10) that will reduce to the ordinary LMS/NLMS when using λ = 0.

The SLMS and SNLMS can in fact be viewed as a broader class of proportionate algorithms. Actually, with certain choices of diversity measures and corresponding parameters, we can have the PNLMS (approximately) as a special case. For example, as we will see in Section IV-B, using p = 1 in (34) for Wn, the sparsity-promoting matrix Sn approximates the proportionate matrix Pn of PNLMS. Indeed, one of the main advantages of the SLMS and SNLMS is their ability to incorporate flexible diversity measures. It allows the algorithms to fit the sparsity level of the system response by optimizing corresponding sparsity control parameters in a more informed manner due to the underlying connections to SSR. Furthermore, the derivations provide theoretical support to the class of proportionate algorithms that were mostly motivated based on heuristics, explaining why they are useful in practical system identification tasks with sparse channels, e.g., in acoustic echo/feedback cancellation, from an SSR viewpoint.

A. Interpretation of λ = 0 from Optimization Perspective

We further discuss the interpretation of using λ = 0 in our framework from the optimization perspective. Recall that the AST reparameterization (15) results in the optimization problems (16) and (17). Setting λ = 0 leads both to:

$\min_{\mathbf{q}} J(\mathbf{W}_n\mathbf{q})$. (33)

This actually applies a change of coordinates to the unregularized problem (1) via (15). Since Wn is invertible, the problem of finding the h that minimizes J(h) is equivalent to finding the q which minimizes J(Wnq). Therefore, the advantage of solving (33) is that the solution is guaranteed to also be a solution of (1), which is not true for (9) with λ > 0. Thus, the optimization is unbiased while promoting sparsity – it is able to take advantage of sparsity whereas avoiding any bias incurred by the introduction of the sparsity regularizer. As noted in [45], the performance of gradient-based methods is dependent on the parameterization – a new choice may substantially alter convergence characteristics. Introducing variable scalings may speed up convergence by altering the descent direction toward the optimum. In our case, solving (33) with appropriately selected Wn can expedite the adaptation procedure toward the optimum of (1).

This observation can also be illustrated by looking at (9), which indicates a trade-off between estimation quality, as reflected in the MSE objective function, and solution sparsity, as controlled by λ. In the limiting case of λ → 0+, the regularization term exerts a diminishing impact on enforcing sparsity on the solution, meaning that eventually no sparse solution is favored over other possible solutions. To elaborate, with λ = 0 and under a WSS environment, all the algorithms derived from (9) minimize the MSE and converge toward the Wiener-Hopf solution. However, not surprisingly, the paths they take are different and depend on how the iterations are developed. If the Wiener-Hopf solution is sparse, then all will converge toward the same sparse solution asymptotically. Interestingly, the SLMS and SNLMS, because of their proportionate nature similar to the PNLMS-type algorithms, can take advantage of the sparsity and are capable of speeding up convergence without compromising estimation quality should sparsity be present. This observation will later be supported by experimental results in Section VI-B.

B. Example Diversity Measures and Corresponding Wn

To illustrate the flexibility of the proposed framework, we provide example algorithms instantiated with popular diversity measures that have proved effective in SSR.

Consider the ℓp-norm-like diversity measure with $g(h_i) = |h_i|^p$, 0 < p ≤ 2, for the reweighted ℓ2 framework [15], [12]. Using (13) leads to the update form of $\mathbf{W}_n$:

$w_{i,n} = \left(\dfrac{2}{p}(|h_{i,n}| + c)^{2-p}\right)^{\frac{1}{2}}$. (34)

Note that we have empirically added a small regularization constant c > 0 for avoiding algorithm stagnation and instability, which also ensures the positive definiteness of $\mathbf{W}_n$ [39]. The parameter p ∈ (0, 2] in (34) is responsible for controlling the sparsity degree, as the ℓp-norm-like diversity measure is associated with super-Gaussian prior distributions. In general, a smaller p corresponds to a heavier-tailed distribution, encouraging stronger sparsity in the parameters. It is worth noting that using p → 1 in (34) results in a proportionate factor close to that of the PNLMS. On the other hand, letting p = 2 recovers the standard LMS/NLMS.
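
The short sketch below (with arbitrary example coefficients) evaluates (34) for a few values of p and normalizes the squared weights as in (28); it shows the transition from near-proportionate gains at p = 1 toward uniform gains at p = 2.

```matlab
% Minimal sketch: effect of p in the lp-based weights (34) on the normalized gains.
c    = 0.001;
h_n  = [0.8; -0.05; 0; 0.3];                     % example current estimate
M_ex = numel(h_n);
for p = [1 1.5 2]
    w2 = (2/p) * (abs(h_n) + c).^(2 - p);        % w_{i,n}^2 from Eq. (34)
    s  = w2 / (sum(w2) / M_ex);                  % normalization as in Eq. (28)
    fprintf('p = %.1f: diag(S_n) = %s\n', p, mat2str(s.', 3));
end
```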

The ℓp-norm-like diversity measure can also be adopted in the reweighted ℓ1 framework if 0 < p ≤ 1. Applying (14), we obtain the update form of $\mathbf{W}_n$ in this case:

$w_{i,n} = \dfrac{1}{p}(|h_{i,n}| + c)^{1-p}$. (35)

Again, a small constant c > 0 is added. The sparsity control parameter of (35) is now p ∈ (0, 1]. In this case, using p → 0.5 in (35) results in a proportionate factor close to that of the PNLMS, whereas letting p = 1 recovers the standard LMS/NLMS.

We can also consider the log-sum penalty with $g(h_i) = \log(h_i^2 + \epsilon)$, ϵ > 0, for the reweighted ℓ2 framework [18]. The function is readily amenable to the use of (13) to obtain the update form of $\mathbf{W}_n$ as:

$w_{i,n} = (h_{i,n}^2 + \epsilon)^{\frac{1}{2}}$. (36)

Or consider the log-sum penalty with $g(h_i) = \log(|h_i| + \epsilon)$, ϵ > 0, for the reweighted ℓ1 framework [19]. Using (14), the update form of $\mathbf{W}_n$ becomes:

$w_{i,n} = |h_{i,n}| + \epsilon$. (37)

The sparsity control parameter is ϵ > 0 for the two log-sum penalty cases. From (36) and (37) we can see that ϵ controls how much proportionate adaptation is encouraged: as ϵ becomes smaller, the term $h_{i,n}^2$ or $|h_{i,n}|$ becomes more dominant. Consequently, the algorithms exhibit a stronger proportionate adaptation characteristic. On the contrary, as ϵ becomes larger, the influence of $h_{i,n}^2$ or $|h_{i,n}|$ reduces. Thus, the algorithm will approach the standard LMS/NLMS when $\epsilon \gg h_{i,n}^2$ or $\epsilon \gg |h_{i,n}|$. In practice, one can start from a large ϵ and reduce it to find a suitable value.
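
The following sketch (again with arbitrary example coefficients) illustrates this effect of ε on the log-sum weights (36): small ε yields strongly proportionate gains, whereas large ε drives the normalized gains toward uniform, LMS/NLMS-like behavior.

```matlab
% Minimal sketch: effect of eps on the normalized gains obtained from Eq. (36).
h_n  = [0.8; -0.05; 0; 0.3];
M_ex = numel(h_n);
for eps0 = [1e-4 1e-2 1 100]
    w2 = h_n.^2 + eps0;                  % w_{i,n}^2 from Eq. (36)
    s  = w2 / (sum(w2) / M_ex);          % normalization as in Eq. (28)
    fprintf('eps = %g: diag(S_n) = %s\n', eps0, mat2str(s.', 3));
end
```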

More example functions can be found in [46], [27], including $g(h_i) = \arctan(|h_i|/\epsilon)$, ϵ > 0, also suggested in [19], which works for both the reweighted ℓ2 and ℓ1 frameworks. Note that different diversity measures can result in different computational complexity for calculating $\mathbf{W}_n$. Notably, for example, the ℓp-norm-like function resulting in (34) or (35) might incur extra computation for calculating the quantity to the power 2 − p or 1 − p for some p values (e.g., non-integer powers).

Algorithm 1 summarizes the proposed SLMS and SNLMS. A MATLAB implementation of the algorithms is available at https://github.com/chinghualee/SLMS_SNLMS.
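
For reference, the following is a minimal sketch (not the released MATLAB code) of one plausible per-iteration realization of the SLMS (30) and SNLMS (32) with the ℓp-based weights (34); parameter values are illustrative, and u, d, M are assumed from the earlier data-generation sketch.

```matlab
% Minimal sketch of the SLMS (30) and SNLMS (32) recursions with Eq. (34) weights.
p = 1.2;  c = 0.001;  mu = 0.005;  mu_t = 0.5;  delta = 0.01;   % illustrative
h_slms  = zeros(M, 1);
h_snlms = zeros(M, 1);
for n = M:length(d)
    u_n = u(n:-1:n-M+1);
    % SLMS, Eq. (30)
    e_s    = d(n) - u_n' * h_slms;
    w2     = (2/p) * (abs(h_slms) + c).^(2 - p);    % Eq. (34)
    s      = w2 / (sum(w2) / M);                    % Eq. (28)
    h_slms = h_slms + mu * (s .* u_n) * e_s;
    % SNLMS, Eq. (32)
    e_ns    = d(n) - u_n' * h_snlms;
    w2      = (2/p) * (abs(h_snlms) + c).^(2 - p);
    s       = w2 / (sum(w2) / M);
    h_snlms = h_snlms + mu_t * (s .* u_n) * e_ns / (u_n' * (s .* u_n) + delta);
end
```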


C. Comparison to Existing Work on PNLMS-Type Algorithms

Note that in the IPNLMS [21] and IPNLMS-ℓ0 [22] there is also a parameter for fitting the sparsity degree, which was heuristically introduced to weight between proportionate and non-proportionate updates. However, this empirical parameter does not reflect the sparsity level of the underlying system directly. In our algorithms, we have sparsity control parameters that play a similar role for fitting different sparsity levels. However, based on diversity measures in SSR, they have direct connections to the system sparsity, thereby offering a more intuitive parameter selection procedure. Our algorithms thus have the advantages of enjoying theoretical support and leveraging sparsity more straightforwardly.

In terms of algorithm derivations, PNLMS-type algorithms were mostly developed from a constrained optimization problem following the principle of minimal disturbance, e.g., [24], [25], [26], [27], [32], in which modified objective functions have been proposed that impose sparsity on the "change" of the filter rather than on the filter itself. For example, [24], [25], [32] considered enforcing sparsity on the difference between the current and updated filters; [26], [27] imposed sparsity on the so-called correctness component as defined in [26], which also represents the change in the filter coefficients. However, since the assumption is that the filter itself is sparse rather than the difference between successive updates, the motivation to enforce sparsity on the "change" of the filter is less clear. Sparsity, in turn, does not seem to fit in straightforwardly under the commonly adopted constrained optimization framework. In contrast, we adopt the general MSE criterion in which filter sparsity can be directly imposed via regularization, which is more direct and makes intuitive sense.

V. Steady-State Performance Analysis

The signal model of system identification described in Section II-A is employed for performance analysis. We further assume the noise $v_n$ is i.i.d. according to $\mathcal{N}(0, \sigma_v^2)$. We also introduce several other assumptions useful for simplifying the analysis. Although these assumptions may seem restrictive, they make meaningful analysis possible without significant loss of insight and are also commonly adopted in the literature. We shall later see that these assumptions lead to theoretical results that are supported by experiments.

Assumption 1:

The input data vector $\mathbf{u}_n$ is independent of $\mathbf{u}_k$ for n ≠ k. Furthermore, $\mathbf{u}_n$ is independent of $v_k$ for all n and k. In practice and from past experience in adaptive filters, this assumption simplifies the analysis and does lead to useful insights [2], [3], despite the fact that it does not in general hold true.

Assumption 2:

The input data vector obeys $\mathbf{u}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$ for all n. This technical assumption facilitates the analysis by taking advantage of useful results on Gaussian random variables [4].

Assumption 3:

At steady-state, the diagonal matrix $\mathbf{W}_n$ in the update equations can be viewed as a fixed matrix. As suggested in [5], [23], when the system is at steady-state and the step size is sufficiently small, the coefficients converge in both the mean and mean squared senses. Thus, replacing $\mathbf{W}_n$ by a fixed matrix becomes reasonable and convenient.

For convenience we shall consider the algorithm of the following form for performance analysis:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu\,\mathbf{S}\mathbf{u}_n e_n$, (38)

where $\mathbf{S} = \mathrm{diag}\{s_i\}$ with $s_i > 0$, ∀i = 0, 1, …, M − 1.

For a fixed underlying system ho, define the steady-state excess MSE [4]:

$J_{ex} \triangleq \lim_{n\to\infty} E\left[\left(\mathbf{u}_n^T(\mathbf{h}^o - \mathbf{h}_n)\right)^2\right]$. (39)

Under Assumption 1, we have the steady-state MSE:

$J \triangleq \lim_{n\to\infty} E[e_n^2] = \sigma_v^2 + J_{ex}$. (40)

The following theorems characterize the steady-state behavior of (38):

Theorem 5.1 (Steady-state excess MSE):

Under Assumptions 1–2, with a sufficiently small μ and assuming $\mathbf{R} = \sigma_u^2\mathbf{I}$, the steady-state excess MSE of the adaptive filter (38) is given by:

$J_{ex} = \dfrac{\mu\sum_{i=0}^{M-1}\frac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i}}{1 - \mu\sum_{i=0}^{M-1}\frac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i}}\,\sigma_v^2$. (41)
Proof:

See Appendix B.

Theorem 5.2 (Convergence conditions):

Under Assumptions 1–2, with a sufficiently small μ and assuming $\mathbf{R} = \sigma_u^2\mathbf{I}$, for the adaptive filter (38):

  1. It converges in the mean sense if:
    $|\lambda_{\max}\{\mathbf{I} - \mu\sigma_u^2\mathbf{S}\}| < 1$, (42)
    where $\lambda_{\max}\{\mathbf{X}\}$ denotes the eigenvalue of a square matrix $\mathbf{X}$ that is largest in magnitude.
  2. It converges in the mean squared sense if:
    $0 < \mu < \left(\sum_{i=0}^{M-1}\dfrac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i}\right)^{-1}$. (43)
Proof:

See Appendix C.

A. Steady-State Performance of SLMS

Consider the case where Assumptions 1–3 are in place and $\mathbf{R} = \sigma_u^2\mathbf{I}$. For analyzing the proposed SLMS (30), we first need to identify an appropriate $\mathbf{S}$ with regard to Assumption 3. A useful approximation at steady-state is to replace the occurrence of $\mathbf{h}_n$ by the true system $\mathbf{h}^o$; that is, to use $\mathbf{S} = \mathbf{W}^2 / \left(\frac{1}{M}\mathrm{tr}(\mathbf{W}^2)\right)$, where $\mathbf{W} = \mathrm{diag}\{w_i\}$ with $w_i$ given by (13) for the reweighted ℓ2 case, or by (14) for the reweighted ℓ1 case, both computed based on the corresponding true coefficient $h_i^o$. Now, since $\mathrm{tr}(\mathbf{S}) = M$, the excess MSE (41) can be approximated as:

$J_{ex} \approx \dfrac{\mu\sum_{i=0}^{M-1}\frac{\sigma_u^2 s_i}{2}}{1 - \mu\sum_{i=0}^{M-1}\frac{\sigma_u^2 s_i}{2}}\,\sigma_v^2 = \dfrac{\frac{\mu\sigma_u^2}{2}\mathrm{tr}(\mathbf{S})}{1 - \frac{\mu\sigma_u^2}{2}\mathrm{tr}(\mathbf{S})}\,\sigma_v^2 = \dfrac{\mu}{\frac{2}{M\sigma_u^2} - \mu}\,\sigma_v^2$, (44)

where for the approximation we assume a sufficiently small step size μ such that $2 - 2\mu\sigma_u^2 s_i \approx 2$, ∀i = 0, 1, …, M − 1.
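
As a quick numerical illustration of (44) (under the stated assumptions and with illustrative values), the predicted steady-state MSE $J = \sigma_v^2 + J_{ex}$ from (40) can be tabulated for a few step sizes:

```matlab
% Minimal sketch: theoretical SLMS steady-state MSE from Eqs. (40) and (44).
M = 256;  sigma_u2 = 1;  sigma_v2 = 0.01;          % illustrative values
for mu = [0.0005 0.001 0.0025]                     % must satisfy mu < 2/(M*sigma_u2)
    J_ex = mu / (2/(M*sigma_u2) - mu) * sigma_v2;  % Eq. (44)
    fprintf('mu = %.4f: J_ex = %.2e, steady-state MSE = %.2e\n', ...
            mu, J_ex, sigma_v2 + J_ex);
end
```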

Now, for the mean squared convergence condition, although the upper bound in (43) of Theorem 5.2 contains μ itself, after some inspection it is clear that the lowest stability limit on μ occurs when S has its diagonal elements nonzero at one tap position (with a value of M) and zero at all others [5]. With such an S, it leads to:

$0 < \mu < \dfrac{2}{3M\sigma_u^2}$. (45)

On the other hand, the largest stability limit is associated with a proportionate matrix assigning equal gains at each position [5], i.e., S = diag{si} with si = 1, ∀i = 0, 1, …, M − 1. With such an S we have:

$0 < \mu < \dfrac{2}{(2 + M)\sigma_u^2}$. (46)

For a large M, the largest stability limit can be approximated as $\frac{2}{M\sigma_u^2} = \frac{2}{\mathrm{tr}(\mathbf{R})}$, which is also the stability limit of the LMS [4]. This result is not surprising since using an $\mathbf{S}$ that assigns uniform gains essentially recovers the LMS.

B. Steady-State Performance of SNLMS

Consider the case where Assumptions 1–3 are in place and $\mathbf{R} = \sigma_u^2\mathbf{I}$. For analyzing the proposed SNLMS (32), we first must identify a fixed $\mathbf{S}$ to approximate the term $\mathbf{S}_n / (\mathbf{u}_n^T\mathbf{S}_n\mathbf{u}_n)$ (where we have ignored δ), for which an exact characterization seems difficult, if at all possible, to obtain. However, if we fix $\mathbf{W}_n = \mathbf{W}$ at steady-state by Assumption 3, where $\mathbf{W}$ is again computed based on the true system $\mathbf{h}^o$, then we have:

$\mathbf{S} = \dfrac{\frac{\mathbf{W}^2}{\frac{1}{M}\mathrm{tr}(\mathbf{W}^2)}}{\mathbf{u}_n^T\left(\frac{\mathbf{W}^2}{\frac{1}{M}\mathrm{tr}(\mathbf{W}^2)}\right)\mathbf{u}_n} = \dfrac{\mathbf{W}^2}{\mathbf{u}_n^T\mathbf{W}^2\mathbf{u}_n} \approx \dfrac{\mathbf{W}^2}{\sigma_u^2\mathrm{tr}(\mathbf{W}^2)}$, (47)

with the approximation $\mathbf{u}_n^T\mathbf{W}^2\mathbf{u}_n \approx E[\mathbf{u}_n^T\mathbf{W}^2\mathbf{u}_n] = \sigma_u^2\mathrm{tr}(\mathbf{W}^2)$ utilized. A useful fact of (47) is that $\mathrm{tr}(\mathbf{S}) = (\sigma_u^2)^{-1}$. We can thus approximate the excess MSE (41) (replacing μ by $\tilde{\mu}$) as:

$J_{ex} \approx \dfrac{\tilde{\mu}\sum_{i=0}^{M-1}\frac{\sigma_u^2 s_i}{2}}{1 - \tilde{\mu}\sum_{i=0}^{M-1}\frac{\sigma_u^2 s_i}{2}}\,\sigma_v^2 = \dfrac{\tilde{\mu}\sigma_u^2\sum_{i=0}^{M-1}s_i}{2 - \tilde{\mu}\sigma_u^2\sum_{i=0}^{M-1}s_i}\,\sigma_v^2 = \dfrac{\tilde{\mu}\sigma_u^2\mathrm{tr}(\mathbf{S})}{2 - \tilde{\mu}\sigma_u^2\mathrm{tr}(\mathbf{S})}\,\sigma_v^2 = \dfrac{\tilde{\mu}\sigma_u^2(\sigma_u^2)^{-1}}{2 - \tilde{\mu}\sigma_u^2(\sigma_u^2)^{-1}}\,\sigma_v^2 = \dfrac{\tilde{\mu}}{2 - \tilde{\mu}}\,\sigma_v^2$, (48)

for $\tilde{\mu}$ sufficiently small such that $2 - 2\tilde{\mu}\sigma_u^2 s_i \approx 2$, ∀i = 0, 1, …, M − 1.

For the mean squared convergence condition, using the same argument as in the SLMS case for (45) and (46), we can obtain the lowest stability limit as:

$0 < \tilde{\mu} < \dfrac{2}{3}$ (49)

and the largest stability limit as:

$0 < \tilde{\mu} < \dfrac{2}{1 + \frac{2}{M}}$. (50)

For a large M, (50) becomes approximately $0 < \tilde{\mu} < 2$, which is the classic result for the NLMS [4].

VI. Simulation Results

The proposed algorithms are evaluated using computer simulations in MATLAB. We consider three system IRs, shown in Fig. 1, which represent different sparsity levels: quasi-sparse, sparse, and dispersive systems. The IR of the quasi-sparse system is an acoustic feedback path between the microphone and the loudspeaker of a hearing aid device that was measured in a real-world scenario. It represents a typical IR of many practical system identification problems where a certain degree of structural sparsity exists. The sparse and dispersive IRs were artificially generated. Each of these IRs has 256 taps. We conducted experiments to obtain the MSE learning curves (i.e., the ensemble average of $e_n^2$ as a function of iteration n) for performance comparison. The ensemble averaging was performed over 1000 independent Monte Carlo runs for obtaining each curve. In all experiments, the adaptive filter coefficients were initialized with all zeros. For the input signal, we mainly consider two types of $u_n$: i) a zero-mean, unit-variance white Gaussian process and ii) a first-order autoregressive (AR) process generated according to $u_n = \rho u_{n-1} + \eta_n$, where ρ = 0.8 and $\eta_n$ is i.i.d. according to $\mathcal{N}(0, 1)$. We also include results with speech inputs to demonstrate the algorithm performance with non-stationary signals. The system noise $v_n$ is i.i.d. according to $\mathcal{N}(0, 0.01)$. Regarding the algorithms, when using (34) for updating $\mathbf{W}_n$, a small positive constant c = 0.001 was always used.
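
To indicate the flavor of the evaluation protocol (with reduced, illustrative counts rather than the paper's exact setup), a minimal sketch of an ensemble-averaged MSE learning curve for the NLMS with a randomly generated sparse IR is:

```matlab
% Minimal sketch of a Monte Carlo MSE learning curve (illustrative, reduced scale).
M = 256;  N = 4000;  runs = 50;  mu_t = 0.5;  delta = 0.01;
mse = zeros(N, 1);
for r = 1:runs
    h_o = zeros(M, 1);  h_o(randperm(M, 8)) = randn(8, 1);    % random sparse IR
    u = randn(N, 1);  v = sqrt(0.01) * randn(N, 1);           % white input, noise
    h = zeros(M, 1);  e = zeros(N, 1);
    for n = M:N
        u_n  = u(n:-1:n-M+1);
        e(n) = (u_n' * h_o + v(n)) - u_n' * h;
        h    = h + mu_t * u_n * e(n) / (u_n' * u_n + delta);  % NLMS adaptation
    end
    mse = mse + e.^2 / runs;                                  % ensemble average
end
plot(10*log10(mse(M:end)));  xlabel('iteration n');  ylabel('MSE (dB)');
```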

Fig. 1:

IRs of (a) quasi-sparse, (b) sparse, and (c) dispersive systems. The quasi-sparse IR is an acoustic feedback path of a hearing aid that was measured from a real-world scenario. The sparse and dispersive IRs were artificially generated.

A. Comparison of Algorithms with and without AST

Fig. 2 compares the proposed generalized sparse LMS (29), i.e., the AST-based approach, to some existing regularization-based algorithms of the form (10), i.e., regular gradient descent without AST. Specifically, we use the ℓp-norm-like penalty $\|\mathbf{h}\|_p^p$ with p = 1 and the log-sum penalty $\sum_{i=0}^{M-1}\log(|h_i| + \epsilon)$ with ϵ = 0.1 as examples. These two choices of the sparsity-inducing function G(h) in (10) result in the ZA-LMS and RZA-LMS [28], respectively. We compare them with the corresponding AST-based algorithms obtained from (29), also adopting the two penalty functions for G(h), which lead to (34) and (37) for computing $\mathbf{W}_n$, respectively. We set μ = 0.0025 and λ = 0.001 in all cases and used the white Gaussian process input. Fig. 2(a) shows the results of identifying the sparse IR and Fig. 2(b) is the case of estimating the quasi-sparse IR. From the results we see that the AST strategy leads to algorithms (dotted lines) that demonstrate faster convergence than the existing approaches (solid lines).

Fig. 2:

Comparison of algorithms with and without AST for identifying (a) sparse and (b) quasi-sparse IRs with white Gaussian process input. Solid lines are existing approaches as given by (10). Dotted lines are their corresponding AST-based algorithms given by (29). It can be seen that AST leads to improved performance.

B. Effect of Sparsity Control Parameter on SLMS and SNLMS

In this experiment we investigate the effect of the sparsity control parameter on the convergence of the SLMS (30) and SNLMS (32). We use the ℓp-norm-like diversity measure $\|\mathbf{h}\|_p^p$ within the reweighted ℓ2 framework, i.e., using (34) for updating $\mathbf{W}_n$, for demonstration purposes. We study the cases of the sparsity control parameter p = 1, 1.2, 1.5, 1.8, 2. We also include the LMS (3) and NLMS (4) performance curves for reference. For the LMS and SLMS we used μ = 0.0025. For the NLMS and SNLMS we used $\tilde{\mu} = 0.5$ and δ = 0.01.

Fig. 3 and Fig. 4 show the resulting MSE curves for the SLMS using the white Gaussian noise input and the SNLMS using the AR process input, respectively. Recall that the proportionate factors of the SLMS/SNLMS using (34) for $\mathbf{W}_n$ approximate those of the PNLMS when p → 1, and recover the LMS/NLMS when p = 2, as discussed in Section IV-B. Therefore, the parameter p plays the role of fitting different sparsity levels, and the selection of p can be crucial for obtaining optimal performance for IRs with different sparsity degrees. The results in both Fig. 3 and Fig. 4 suggest that for the quasi-sparse case, the fastest convergence is given by p ∈ [1.2, 1.5], which seems reasonable in terms of finding a balance between the PNLMS (p → 1) and the LMS/NLMS (p = 2). On the other hand, for the sparse system, p ∈ [1, 1.2] gives the best results, which is also reasonable since as the sparsity level increases, a more PNLMS-like algorithm can be more favorable. Finally, for the dispersive system we see that p ∈ [1.8, 2] results in the fastest convergence and is comparable to, if not better than, the LMS and NLMS. This indicates that a more LMS/NLMS-like algorithm is preferable when the system IR is far from sparse. To conclude, the results show that the algorithms exploit the underlying system structure in the way we expect.

Fig. 3:

Effect of sparsity control parameter p on convergence of SLMS for (a) quasi-sparse, (b) sparse, and (c) dispersive IRs with white Gaussian process input. It can be seen that the optimal p value varies with the sparsity degree.

Fig. 4:

Effect of sparsity control parameter p on convergence of SNLMS for (a) quasi-sparse, (b) sparse, and (c) dispersive IRs with AR process input. In the colored input case here we have similar observations to the white input case of Fig. 3.

C. Effect of Step Size on SLMS and SNLMS

Fig. 5 studies the effect of the step size on the convergence behavior of the SLMS and SNLMS. We again used (34) for updating $\mathbf{W}_n$. Fig. 5(a) shows the resulting MSE curves obtained by running the SLMS with p = 1.2 on the sparse IR with various μ values, using the white Gaussian noise input. Fig. 5(b) shows the resulting MSE curves obtained by running the SNLMS with p = 1.5 on the quasi-sparse IR with various $\tilde{\mu}$ values, using the AR process input. The dotted lines indicate the theoretical steady-state MSE levels computed from (40) using (44) and (48) for the SLMS and SNLMS, respectively. We can see that, similar to the well-known trade-off in the LMS and NLMS, a larger step size results in faster convergence at the expense of steady-state performance. We also see that as the step size increases the theoretical prediction becomes less accurate; this is probably due to the approximation made based on the small step size assumption for arriving at (44) and (48). Nevertheless, the prediction agrees well with the steady-state MSE in most cases for a small step size. In addition, though several assumptions have been made to arrive at (40), (44), and (48), the results show that they predict reasonably well in the case of white input and even for correlated input.

Fig. 5:

Effect of step size μ or $\tilde{\mu}$ on convergence of (a) SLMS for the sparse IR with white Gaussian process input and (b) SNLMS for the quasi-sparse IR with AR process input. Dotted lines indicate the theoretical steady-state MSE levels. It can be seen that the theoretical prediction agrees reasonably well with the experimental results, especially for a small step size.

D. Comparison with Existing Algorithms

We compare the proposed SLMS and SNLMS using (34) for $\mathbf{W}_n$ with existing LMS-type and NLMS-type algorithms. To see how the algorithms behave in a changing environment, in each of the following experiments, a change in the underlying system was introduced by shifting the IR to the right by 16 samples in the middle of the adaptation process [47].
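
The system change can be realized, for example, by the following operation on the IR (a hedged illustration assuming the column vector h_o and the sample count N from the earlier sketches); the shifted response is zero-padded at the front and truncated at the end.

```matlab
% Minimal sketch of the abrupt system change used in the tracking experiments:
% shift the IR to the right by 16 samples halfway through the adaptation.
shift = 16;
h_o_shifted = [zeros(shift, 1); h_o(1:end-shift)];   % delayed (shifted) version of h_o
% In the adaptation loop: if n == floor(N/2), set h_o = h_o_shifted.
```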

Fig. 6 compares the LMS-type algorithms using the white Gaussian process input. Fig. 6(a) and Fig. 6(b) show the MSE curves obtained with the quasi-sparse and sparse IRs, respectively. For the LMS we used μ = 0.0025. For the ZA-LMS, RZA-LMS, and ℓ0-LMS we fixed μ = 0.0025 and then experimentally optimized the remaining parameters to obtain the best performance in each case. For the SLMS we used p = 1.5 and μ = 0.002 in the quasi-sparse case and p = 1.2 and μ = 0.0005 in the sparse case. From the results we can see that all the sparsity-aware algorithms outperform the LMS, with the SLMS demonstrating the best result. Comparing Fig. 6(a) and Fig. 6(b), we also see that the benefit brought by existing sparsity-aware algorithms becomes limited when the system is less sparse, while the SLMS still provides significant improvement.

Fig. 6:

Comparison of LMS-type algorithms with white Gaussian process input on (a) quasi-sparse and (b) sparse IRs. One can see that the proposed SLMS outperforms all the other approaches in both cases.

Fig. 7 compares the NLMS-type algorithms using the AR process input. Fig. 7(a) and Fig. 7(b) show the MSE curves obtained with the quasi-sparse and sparse IRs, respectively. For all the algorithms we used $\tilde{\mu} = 0.5$. For the NLMS we used δ = 0.01. For the PNLMS, IPNLMS, and IPNLMS-ℓ0 we set δ = 0.01/M according to [47], and experimentally optimized the remaining parameters to obtain the best performance in each case. For the SNLMS we used p = 1.5 in the quasi-sparse case and p = 1.2 in the sparse case. We used δ = 0.01 for the SNLMS, same as for the NLMS. From the results we again observe the benefit of using sparsity-aware adaptation. In addition, the SNLMS demonstrates performance as good as, if not better than, the other proportionate algorithms.

Fig. 7:

Comparison of NLMS-type algorithms with AR process input on (a) quasi-sparse and (b) sparse IRs. One can see that the proposed SNLMS performs better than all the other approaches in both cases.

Fig. 8 considers a more practical scenario where we used a speech signal as the input and the quasi-sparse IR, which represents an acoustic channel of practical interest, as the underlying system. The input signal-to-noise ratio (SNR) was set to 20 dB using white Gaussian noise. For the SLMS and SNLMS we used p = 1.5, which is a suitable choice for quasi-sparse systems. For evaluation we compare the normalized misalignment $\|\mathbf{h}^o - \mathbf{h}_n\|_2^2 / \|\mathbf{h}^o\|_2^2$. In Fig. 8(a) we see that the SLMS performs much better than the LMS, while the ℓ0-LMS fails to provide any improvement. This may be due to the fact that existing regularization-based algorithms tend to enforce sparsity in a more aggressive manner as they work with λ > 0, which may not be beneficial, and may even be harmful, when the underlying system is not truly sparse. In Fig. 8(b) we see that the SNLMS demonstrates better convergence behavior than the NLMS, and is also better than the IPNLMS and IPNLMS-ℓ0.

Fig. 8:

Comparison of (a) LMS-type and (b) NLMS-type algorithms for identifying the quasi-sparse acoustic channel response with speech input at 20 dB SNR. It can be seen that the SLMS and SNLMS perform the best in both cases.

Fig. 9 shows the results for a noisier environment, i.e., 0 dB input SNR, for the same experimental setting as Fig. 8 (only the step size parameters were further tuned due to the stronger noise). We see that in Fig. 9(a) the SLMS significantly outperforms the LMS, while the ℓ0-LMS performs worse. The SNLMS in Fig. 9(b), on the other hand, still performs better than the NLMS, and is comparable to the other proportionate algorithms. From the results we see that our observations on the SLMS and SNLMS performance appear robust to the noise condition.

Fig. 9:

Comparison of (a) LMS-type and (b) NLMS-type algorithms for identifying the quasi-sparse acoustic channel response with speech input at 0 dB SNR. In the noisier setting here the SLMS and SNLMS perform comparably, if not better than, other competing algorithms.

VII. Conclusion

In this paper, we developed a mathematical framework for rigorously deriving adaptive filters that exploit the sparse structure of the underlying system response. We started with the regularized objective framework of SSR and developed algorithms that are of the proportionate type. As a result, the adaptive algorithms are quite general and can accommodate a range of regularization functions. The framework utilizes the AST methodology within the iterative reweighted ℓ2 and ℓ1 frameworks for deriving algorithms. We showed that the AST is crucial for obtaining improved adaptive filtering performance over existing approaches when gradient descent approaches are used. We further introduced the SLMS and SNLMS by adopting a zero regularization coefficient in our framework, which take advantage of, though do not strictly enforce, the sparsity of the underlying system if it already exists. Note that the proposed framework is not limited to the algorithms that we have presented so far. Any other penalty function that satisfies the conditions imposed on the diversity measure can potentially be a good candidate for obtaining effective adaptive algorithms by utilizing the framework.

Acknowledgment

The authors would like to thank M. Liang for fruitful discussions leading to Proposition 3.1 in the paper.

This work was supported by National Institutes of Health/National Institute on Deafness and Other Communication Disorders under Grants R01DC015436 and R33DC015046 and National Science Foundation/Information and Intelligent Systems under Award 1838830.

Appendix A

Proof of Theorem 3.1

The proof follows the idea in [48]. We wish to show that the regularized objective function $J_G(\mathbf{h})$ in (9) is decreased in each iteration when optimized via (20) and (21). Before proceeding, we need the following lemmas:

Lemma A.1:

For the general diversity measure $G(\mathbf{h}) = \sum_{i=0}^{M-1} g(h_i)$ that satisfies Properties 1–4 in Section II-B, with $g(z)$ being strictly concave in $z^2$ for Property 4, we have:

$G(\mathbf{h}_{n+1}) - G(\mathbf{h}_n) < \|W_n^{-1}\mathbf{h}_{n+1}\|_2^2 - \|W_n^{-1}\mathbf{h}_n\|_2^2$, (51)

where $W_n = \mathrm{diag}\{w_{i,n}\}$ with $w_{i,n}$ given by (13).

Proof:

Since $g(z)$ is strictly concave in $z^2$, it can be written as $g(z) = f(z^2)$ where $f(z)$ is strictly concave for $z \in \mathbb{R}_+$. Due to the strict concavity, we have the following inequality:

$f(z_2) - f(z_1) < f'(z_1)(z_2 - z_1)$ (52)

which holds for any $z_1, z_2 \in \mathbb{R}_+$ with $z_1 \neq z_2$. Note that we use $f'(z_1)$ to denote the first-order derivative of $f(z)$ w.r.t. $z$ evaluated at $z = z_1$.

Substituting $z_1 = h_{i,n}^2$ and $z_2 = h_{i,n+1}^2$ into (52) gives:

$f(h_{i,n+1}^2) - f(h_{i,n}^2) < f'(h_{i,n}^2)\,(h_{i,n+1}^2 - h_{i,n}^2)$. (53)

Noting that $f(h_{i,n+1}^2) = g(h_{i,n+1})$ and $f(h_{i,n}^2) = g(h_{i,n})$, we have:

$g(h_{i,n+1}) - g(h_{i,n}) < f'(h_{i,n}^2)\,(h_{i,n+1}^2 - h_{i,n}^2)$. (54)

From (13) we have $f'(h_{i,n}^2) = w_{i,n}^{-2}$. Therefore,

$g(h_{i,n+1}) - g(h_{i,n}) < w_{i,n}^{-2}\,(h_{i,n+1}^2 - h_{i,n}^2)$. (55)

Summing over i = 0, 1, …, M − 1 on both sides of (55) justifies (51) of Lemma A.1.
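As a worked instance of Lemma A.1, consider (as an assumed example, not the only choice admitted by Properties 1–4) the $p$-norm-like diversity measure $g(z) = |z|^p$ with $0 < p < 2$. Then $f(t) = t^{p/2}$ is strictly concave on $\mathbb{R}_+$, and the relation $f'(h_{i,n}^2) = w_{i,n}^{-2}$ gives

$f'(h_{i,n}^2) = \frac{p}{2}\,|h_{i,n}|^{p-2} = w_{i,n}^{-2} \;\Longrightarrow\; w_{i,n} = \sqrt{2/p}\,\,|h_{i,n}|^{1-p/2},$

so larger coefficients receive larger weights, which is the proportionate behavior exploited in the main text. In practice a small constant (cf. footnote 12) presumably keeps the weights away from zero for zero-valued coefficients.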

Lemma A.2:

For the general diversity measure $G(\mathbf{h}) = \sum_{i=0}^{M-1} g(h_i)$ that satisfies Properties 1–4 in Section II-B, with $g(z)$ being strictly concave in $|z|$ for Property 4, we have:

$G(\mathbf{h}_{n+1}) - G(\mathbf{h}_n) < \|W_n^{-1}\mathbf{h}_{n+1}\|_1 - \|W_n^{-1}\mathbf{h}_n\|_1$, (56)

where $W_n = \mathrm{diag}\{w_{i,n}\}$ with $w_{i,n}$ given by (14).

Proof:

Since $g(z)$ is strictly concave in $|z|$, it can be written as $g(z) = f(|z|)$ where $f(z)$ is strictly concave for $z \in \mathbb{R}_+$. Again, the inequality (52) holds due to the strict concavity of $f(z)$.

Substituting $z_1 = |h_{i,n}|$ and $z_2 = |h_{i,n+1}|$ into (52) gives:

$f(|h_{i,n+1}|) - f(|h_{i,n}|) < f'(|h_{i,n}|)\,(|h_{i,n+1}| - |h_{i,n}|)$. (57)

Noting that $f(|h_{i,n+1}|) = g(h_{i,n+1})$ and $f(|h_{i,n}|) = g(h_{i,n})$, we have:

$g(h_{i,n+1}) - g(h_{i,n}) < f'(|h_{i,n}|)\,(|h_{i,n+1}| - |h_{i,n}|)$. (58)

From (14) we have $f'(|h_{i,n}|) = w_{i,n}^{-1}$. Therefore,

$g(h_{i,n+1}) - g(h_{i,n}) < w_{i,n}^{-1}\,(|h_{i,n+1}| - |h_{i,n}|)$. (59)

Summing over $i = 0, 1, \ldots, M-1$ on both sides of (59) justifies (56) of Lemma A.2.
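Similarly, as a worked instance of Lemma A.2 under the same assumed measure $g(z) = |z|^p$, now with $0 < p < 1$ so that $g$ is strictly concave in $|z|$, we have $f(t) = t^p$, and the relation $f'(|h_{i,n}|) = w_{i,n}^{-1}$ gives

$f'(|h_{i,n}|) = p\,|h_{i,n}|^{p-1} = w_{i,n}^{-1} \;\Longrightarrow\; w_{i,n} = \frac{1}{p}\,|h_{i,n}|^{1-p},$

which again assigns larger weights to larger coefficients.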

Now we are ready to show that JG(h) decreases in each iteration by using the update recursions (20) and (21).

First, for the reweighted $\ell_2$ framework with $J_n^{\ell_2}(\mathbf{q})$ in (16), we have:

$\begin{aligned} J_G(\mathbf{h}_{n+1}) - J_G(\mathbf{h}_n) &= [J(\mathbf{h}_{n+1}) + \lambda G(\mathbf{h}_{n+1})] - [J(\mathbf{h}_n) + \lambda G(\mathbf{h}_n)] \\ &< [J(\mathbf{h}_{n+1}) + \lambda\|W_n^{-1}\mathbf{h}_{n+1}\|_2^2] - [J(\mathbf{h}_n) + \lambda\|W_n^{-1}\mathbf{h}_n\|_2^2] \\ &= [J(W_n\mathbf{q}_{n+1|n}) + \lambda\|\mathbf{q}_{n+1|n}\|_2^2] - [J(W_n\mathbf{q}_{n|n}) + \lambda\|\mathbf{q}_{n|n}\|_2^2] \\ &= J_n^{\ell_2}(\mathbf{q}_{n+1|n}) - J_n^{\ell_2}(\mathbf{q}_{n|n}). \end{aligned}$ (60)

The inequality follows from Lemma A.1, and the AST relationships (18) and (19) are also utilized. As we optimize (16) via gradient descent, we can make $J_n^{\ell_2}(\mathbf{q})$ decrease in each iteration $n$, i.e., $J_n^{\ell_2}(\mathbf{q}_{n+1|n}) - J_n^{\ell_2}(\mathbf{q}_{n|n}) < 0$, for some $\mu_n$. Therefore, such a choice of $\{\mu_n\}_{n=0}^{\infty}$ ensures the decrease in $J_G(\mathbf{h})$ according to (60), and the update recursion (20) monotonically converges to a local minimum (or saddle point) of (9) under a WSS environment.
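The existence of such a step size can be made explicit with the standard descent argument: assuming the gradient of $J_n^{\ell_2}$ is Lipschitz with some constant $L_n > 0$ around $\mathbf{q}_{n|n}$ (a reasonable assumption for a squared-error cost together with the quadratic penalty in (16)), the gradient step $\mathbf{q}_{n+1|n} = \mathbf{q}_{n|n} - \mu_n\nabla J_n^{\ell_2}(\mathbf{q}_{n|n})$ satisfies

$J_n^{\ell_2}(\mathbf{q}_{n+1|n}) \le J_n^{\ell_2}(\mathbf{q}_{n|n}) - \mu_n\Big(1 - \frac{L_n\mu_n}{2}\Big)\big\|\nabla J_n^{\ell_2}(\mathbf{q}_{n|n})\big\|_2^2,$

so any $0 < \mu_n < 2/L_n$ yields strict decrease whenever the gradient is nonzero.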

On the other hand, for the reweighted $\ell_1$ framework with $J_n^{\ell_1}(\mathbf{q})$ in (17), we have:

$\begin{aligned} J_G(\mathbf{h}_{n+1}) - J_G(\mathbf{h}_n) &= [J(\mathbf{h}_{n+1}) + \lambda G(\mathbf{h}_{n+1})] - [J(\mathbf{h}_n) + \lambda G(\mathbf{h}_n)] \\ &< [J(\mathbf{h}_{n+1}) + \lambda\|W_n^{-1}\mathbf{h}_{n+1}\|_1] - [J(\mathbf{h}_n) + \lambda\|W_n^{-1}\mathbf{h}_n\|_1] \\ &= [J(W_n\mathbf{q}_{n+1|n}) + \lambda\|\mathbf{q}_{n+1|n}\|_1] - [J(W_n\mathbf{q}_{n|n}) + \lambda\|\mathbf{q}_{n|n}\|_1] \\ &= J_n^{\ell_1}(\mathbf{q}_{n+1|n}) - J_n^{\ell_1}(\mathbf{q}_{n|n}). \end{aligned}$ (61)

The inequality follows from Lemma A.2, and the AST relationships (18) and (19) are again utilized. By the same argument as in the reweighted $\ell_2$ case, there exists a choice of $\{\mu_n\}_{n=0}^{\infty}$ that ensures the decrease in $J_G(\mathbf{h})$ according to (61), and the update recursion (21) monotonically converges to a local minimum (or saddle point) of (9) under a WSS environment.

Appendix B

Proof of Theorem 5.1

The proof follows the discussion in [4], [5], [25]. Substituting $e_n = d_n - \mathbf{u}_n^T\mathbf{h}_n$ into (38), we have:

$\mathbf{h}_{n+1} = \mathbf{h}_n - \mu S\mathbf{u}_n\mathbf{u}_n^T\mathbf{h}_n + \mu S\mathbf{u}_n d_n$. (62)

Using the fact that $d_n = \mathbf{u}_n^T\mathbf{h}^o + v_n$, we have:

$\mathbf{h}_{n+1} = \mathbf{h}_n + \mu S\mathbf{u}_n\mathbf{u}_n^T(\mathbf{h}^o - \mathbf{h}_n) + \mu S\mathbf{u}_n v_n$. (63)

Define the misalignment vector $\boldsymbol{\varepsilon}_n$ as:

$\boldsymbol{\varepsilon}_n = \mathbf{h}^o - \mathbf{h}_n$. (64)

Then from (63) we have:

$\boldsymbol{\varepsilon}_{n+1} = (I - \mu S\mathbf{u}_n\mathbf{u}_n^T)\boldsymbol{\varepsilon}_n - \mu S\mathbf{u}_n v_n$. (65)

Next, based on (65) we have:

$\boldsymbol{\varepsilon}_{n+1}\boldsymbol{\varepsilon}_{n+1}^T = (I - \mu S\mathbf{u}_n\mathbf{u}_n^T)\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^T(I - \mu\mathbf{u}_n\mathbf{u}_n^T S) + \mu^2 v_n^2\, S\mathbf{u}_n\mathbf{u}_n^T S + \Xi$, (66)

where Ξ represents the remaining cross terms whose expectations are zero.

Let $\Omega_n = E[\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^T]$. Taking expectation on both sides of (66), we have:

$\Omega_{n+1} = \underbrace{E\big[(I - \mu S\mathbf{u}_n\mathbf{u}_n^T)\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^T(I - \mu\mathbf{u}_n\mathbf{u}_n^T S)\big]}_{\Theta} + \mu^2\sigma_v^2\, SRS$. (67)

Note that:

$\Theta = \Omega_n - \mu SR\Omega_n - \mu\Omega_n RS + \mu^2 S\,E[\mathbf{u}_n\mathbf{u}_n^T\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^T\mathbf{u}_n\mathbf{u}_n^T]\,S$. (68)

With Assumptions 1 and 2 it can be shown that [4]:

$E[\mathbf{u}_n\mathbf{u}_n^T\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^T\mathbf{u}_n\mathbf{u}_n^T] = 2R\Omega_n R + R\,\mathrm{tr}(R\Omega_n)$. (69)

Thus,

$\Theta = \Omega_n - \mu SR\Omega_n - \mu\Omega_n RS + 2\mu^2 SR\Omega_n RS + \mu^2 SR\,\mathrm{tr}(R\Omega_n)\,S$. (70)

Then, with $R = \sigma_u^2 I$, in steady state, i.e., as $n \to \infty$,

$\Omega_\infty = \Omega_\infty - \mu\sigma_u^2 S\Omega_\infty - \mu\sigma_u^2\Omega_\infty S + 2\mu^2\sigma_u^4 S\Omega_\infty S + \mu^2\sigma_u^4 S\,\mathrm{tr}(\Omega_\infty)\,S + \mu^2\sigma_u^2\sigma_v^2 S^2$. (71)

This implies:

$\boldsymbol{\omega}_\infty = \boldsymbol{\omega}_\infty - 2\mu\sigma_u^2 S\boldsymbol{\omega}_\infty + 2\mu^2\sigma_u^4 S^2\boldsymbol{\omega}_\infty + \mu^2\sigma_u^4\,\mathbf{s}^2\mathbf{1}^T\boldsymbol{\omega}_\infty + \mu^2\sigma_u^2\sigma_v^2\,\mathbf{s}^2$, (72)

where $\boldsymbol{\omega}_\infty$ and $\mathbf{s}$ are the vectors consisting of the diagonal elements of $\Omega_\infty$ and $S$, respectively, and $\mathbf{s}^2$ denotes the element-wise square of the vector $\mathbf{s}$.

The steady-state excess MSE is then:

$J_{\mathrm{ex}} \triangleq \lim_{n\to\infty} E\big[(\mathbf{u}_n^T(\mathbf{h}^o - \mathbf{h}_n))^2\big] = \lim_{n\to\infty} E\big[(\mathbf{u}_n^T\boldsymbol{\varepsilon}_n)^2\big] = \lim_{n\to\infty}\mathrm{tr}\big(\Omega_n E[\mathbf{u}_n\mathbf{u}_n^T]\big) = \sigma_u^2\,\mathrm{tr}(\Omega_\infty) = \sigma_u^2\,\mathbf{1}^T\boldsymbol{\omega}_\infty$. (73)

Substituting (73) into (72) and rearranging, we have:

$\omega_{i,\infty} = \dfrac{\mu^2\sigma_u^2 s_i^2 J_{\mathrm{ex}} + \mu^2\sigma_u^2\sigma_v^2 s_i^2}{2\mu\sigma_u^2 s_i - 2\mu^2\sigma_u^4 s_i^2}$. (74)

This then leads to:

$J_{\mathrm{ex}} = \sigma_u^2\sum_{i=0}^{M-1}\omega_{i,\infty} = \sum_{i=0}^{M-1}\dfrac{\mu^2\sigma_u^2 s_i^2 J_{\mathrm{ex}} + \mu^2\sigma_u^2\sigma_v^2 s_i^2}{2\mu s_i - 2\mu^2\sigma_u^2 s_i^2}$, (75)

which yields:

$J_{\mathrm{ex}} = \dfrac{\mu\sum_{i=0}^{M-1}\dfrac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i}}{1 - \mu\sum_{i=0}^{M-1}\dfrac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i}}\,\sigma_v^2$. (76)

This justifies Theorem 5.1.
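As a sanity check of (76), the following is a minimal Monte Carlo sketch, assuming a fixed diagonal matrix $S$, zero-mean white Gaussian input of variance $\sigma_u^2$, and white Gaussian noise of variance $\sigma_v^2$ (independent regressors are drawn at each iteration, consistent in spirit with Assumptions 1 and 2); the dimensions and parameter values are arbitrary, chosen only so that the stability conditions of Theorem 5.2 hold.

```python
import numpy as np

rng = np.random.default_rng(0)
M, sigma_u2, sigma_v2, mu = 16, 1.0, 1e-2, 0.05          # assumed illustrative values
h_o = rng.standard_normal(M) * (rng.random(M) < 0.25)    # sparse "true" response
s = rng.uniform(0.5, 1.5, M)                              # diagonal of a fixed S
S = np.diag(s)

# Theoretical steady-state excess MSE from Eq. (76).
A = np.sum(mu * sigma_u2 * s / (2.0 - 2.0 * mu * sigma_u2 * s))
J_ex_theory = A / (1.0 - A) * sigma_v2

# Monte Carlo: iterate h <- h + mu * S u e and average (u^T (h_o - h))^2 in steady state.
n_iter, n_avg = 200_000, 50_000
h = np.zeros(M)
ex = []
for n in range(n_iter):
    u = rng.normal(0.0, np.sqrt(sigma_u2), M)
    d = u @ h_o + rng.normal(0.0, np.sqrt(sigma_v2))
    if n >= n_iter - n_avg:
        ex.append((u @ (h_o - h)) ** 2)                   # excess error before the update
    h = h + mu * (S @ (u * (d - u @ h)))
print(f"theory: {J_ex_theory:.3e}   simulation: {np.mean(ex):.3e}")
```

For sufficiently small step sizes the empirical average should approach the value predicted by (76).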

Appendix C

Proof of Theorem 5.2

Note that Assumption 1 ensures that $\mathbf{h}_n$, $\mathbf{u}_n$, and $v_n$ are mutually independent. Thus, taking expectation of both sides of (65) gives:

$E[\boldsymbol{\varepsilon}_{n+1}] = (I - \mu SR)\,E[\boldsymbol{\varepsilon}_n]$. (77)

Therefore, the following condition is sufficient for convergence in the mean sense [4]:

$|\lambda_{\max}\{I - \mu SR\}| < 1$. (78)

With $R = \sigma_u^2 I$, Theorem 5.2-i) is justified.

From (76) we see that, by requiring:

$1 - \mu\sum_{i=0}^{M-1}\dfrac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i} > 0$, (79)

we obtain the stability bound for μ as:

$0 < \mu < \left(\sum_{i=0}^{M-1}\dfrac{\sigma_u^2 s_i}{2 - 2\mu\sigma_u^2 s_i}\right)^{-1}$, (80)

which justifies Theorem 5.2-ii).
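As a small numerical aid (a sketch under the same assumptions as above; the function name is ours), conditions (78) and (79) can be checked for a given step size and diagonal of $S$:

```python
import numpy as np

def is_stable(mu, s, sigma_u2):
    """Check Theorem 5.2: (i) mean stability |1 - mu*sigma_u2*s_i| < 1 for all i, and
    (ii) the mean-square condition 1 - mu * sum_i sigma_u2*s_i / (2 - 2*mu*sigma_u2*s_i) > 0."""
    s = np.asarray(s, dtype=float)
    mean_ok = np.all(np.abs(1.0 - mu * sigma_u2 * s) < 1.0)
    terms = sigma_u2 * s / (2.0 - 2.0 * mu * sigma_u2 * s)
    return bool(mean_ok and (1.0 - mu * np.sum(terms) > 0.0))
```

For instance, the parameter values used in the sketch of Appendix B satisfy both conditions, which is why the Monte Carlo run there converges.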

Footnotes

1

This similarity has been noticed in [38] where sparse adaptive filtering techniques were utilized for solving the SSR problem. Here we take the opposite direction as we are interested in utilizing SSR techniques for assisting the adaptive filtering algorithms. Both cases exploit the connections between SSR and adaptive filtering but the objectives are different.

2

The algorithms usually reduce to the conventional LMS or NLMS algorithm if the regularization coefficient λ is set to zero.

3

Note that $\|\mathbf{x}\|_p$ for $0 < p < 1$ does not satisfy the required axioms for a norm and therefore is not technically a norm. For simplicity of exposition, since the range of $p$ considered is from 0 to 2, we use the norm terminology to cover this entire range.

4

The quotation marks are used to warn that it is not a proper norm.

5

By abusing the notation we use $\nabla_{\mathbf{x}}$ also for the subgradient operator without explicitly noting it.

6

Formally, $\tilde{\mu}$ is called the normalized step size. For brevity, we still refer to it as the step size, but keep in mind that it does not have the same significance as the $\mu$ in (3). Note that it is also common in the literature for the same step-size notation to be shared by both LMS and NLMS without explicit distinction.

7

By abusing the terminology we implicitly use “gradient” also for subgradient whenever appropriate.

8

The positive definiteness can be shown to hold for a wide variety of diversity measures used in SSR. In cases where it is not, the positive definiteness can still be ensured by utilizing some small regularization constant.

9

For a point at which $G(\mathbf{h})$ is non-differentiable, this can still hold by properly choosing the subgradients.

10

Note that the chain rule here is basically $\nabla_{\mathbf{q}} = W_n\nabla_{\mathbf{h}}$ as a result of the change of variables (15) for a given $W_n$ at iteration $n$.

11

Note that for Property 4, Theorem 3.1 holds for (20) of the reweighted $\ell_2$ framework if $g(z)$ is concave in $z^2$. On the other hand, it holds for (21) of the reweighted $\ell_1$ framework if $g(z)$ is concave in $|z|$.

12

We suggest that $c$ be kept relatively small compared to the amplitudes of the filter coefficients so that it does not significantly affect the convergence.

13

Due to the $1/M$ factor in (28), which is not present in (8) of existing PNLMS-type algorithms, the division by $M$ is not needed for $\delta$ in SNLMS.

References

• [1] Widrow B and Stearns SD, Adaptive Signal Processing, Pearson, 1985.
• [2] Haykin S, Adaptive Filter Theory, 5th edition, Pearson, 2013.
• [3] Sayed AH, Adaptive Filters, John Wiley & Sons, 2011.
• [4] Manolakis DG, Ingle VK, and Kogon SM, Statistical and Adaptive Signal Processing: Spectral Estimation, Signal Modeling, Adaptive Filtering, and Array Processing, McGraw-Hill, Boston, 2000.
• [5] Duttweiler DL, "Proportionate normalized least-mean-squares adaptation in echo cancelers," IEEE Trans. Speech Audio Process., vol. 8, no. 5, pp. 508–518, 2000.
• [6] Benesty J, Gänsler T, Morgan DR, Sondhi MM, and Gay SL, Advances in Network and Acoustic Echo Cancellation, Springer, 2001.
• [7] Paleologu C, Benesty J, and Ciochină S, "Sparse adaptive filters for echo cancellation," Synth. Lect. Speech Audio Process., vol. 6, no. 1, pp. 1–124, 2010.
• [8] Hänsler E and Schmidt G, Topics in Acoustic Echo and Noise Control: Selected Methods for the Cancellation of Acoustical Echoes, the Reduction of Background Noise, and Speech Processing, Springer Science & Business Media, 2006.
• [9] Lee C-H, Rao BD, and Garudadri H, "Sparsity promoting LMS for adaptive feedback cancellation," in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2017, pp. 226–230.
• [10] Kocic M, Brady D, and Stojanovic M, "Sparse equalization for real-time digital underwater acoustic communications," in Proc. MTS/IEEE OCEANS, 1995, vol. 3, pp. 1417–1422.
• [11] Wipf D and Nagarajan S, "Iterative reweighted $\ell_1$ and $\ell_2$ methods for finding sparse solutions," IEEE J. Sel. Top. Signal Process., vol. 4, no. 2, pp. 317–329, 2010.
• [12] Rao BD and Kreutz-Delgado K, "An affine scaling methodology for best basis selection," IEEE Trans. Signal Process., vol. 47, no. 1, pp. 187–200, 1999.
• [13] Nash SG and Sofer A, Linear and Nonlinear Programming, McGraw-Hill, 1996.
• [14] Tibshirani R, "Regression shrinkage and selection via the lasso," J. Roy. Statist. Soc. Series B (Methodological), pp. 267–288, 1996.
• [15] Gorodnitsky IF and Rao BD, "Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 600–616, 1997.
• [16] Tipping ME, "Sparse Bayesian learning and the relevance vector machine," J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.
• [17] Chen SS, Donoho DL, and Saunders MA, "Atomic decomposition by basis pursuit," SIAM Review, vol. 43, no. 1, pp. 129–159, 2001.
• [18] Chartrand R and Yin W, "Iteratively reweighted algorithms for compressive sensing," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2008, pp. 3869–3872.
• [19] Candès EJ, Wakin MB, and Boyd SP, "Enhancing sparsity by reweighted $\ell_1$ minimization," J. Fourier Anal. Appl., vol. 14, no. 5, pp. 877–905, 2008.
• [20] Wagner K and Doroslovački M, Proportionate-type Normalized Least Mean Square Algorithms, John Wiley & Sons, 2013.
• [21] Benesty J and Gay SL, "An improved PNLMS algorithm," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2002, pp. 1881–1884.
• [22] Paleologu C, Benesty J, and Ciochină S, "An improved proportionate NLMS algorithm based on the $\ell_0$ norm," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2010, pp. 309–312.
• [23] Martin RK, Sethares WA, Williamson RC, and Johnson CR, "Exploiting sparsity in adaptive filters," IEEE Trans. Signal Process., vol. 50, no. 8, pp. 1883–1894, 2002.
• [24] Rao BD and Song B, "Adaptive filtering algorithms for promoting sparsity," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2003, pp. 361–364.
• [25] Jin Y, Algorithm Development for Sparse Signal Recovery and Performance Limits Using Multiple-User Information Theory, Ph.D. dissertation, University of California, San Diego, 2011.
• [26] Benesty J, Paleologu C, and Ciochină S, "Proportionate adaptive filters from a basis pursuit perspective," IEEE Signal Process. Lett., vol. 17, no. 12, pp. 985–988, 2010.
• [27] Liu J and Grant SL, "A generalized proportionate adaptive algorithm based on convex optimization," in Proc. IEEE China Summit Int. Conf. Signal Inform. Process. (ChinaSIP), 2014, pp. 748–752.
• [28] Chen Y, Gu Y, and Hero AO, "Sparse LMS for system identification," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2009, pp. 3125–3128.
• [29] Gu Y, Jin J, and Mei S, "$\ell_0$ norm constraint LMS algorithm for sparse system identification," IEEE Signal Process. Lett., vol. 16, no. 9, pp. 774–777, 2009.
• [30] Wu FY and Tong F, "Gradient optimization p-norm-like constraint LMS algorithm for sparse system estimation," Signal Process., vol. 93, no. 4, pp. 967–971, 2013.
• [31] Taheri O and Vorobyov SA, "Reweighted $\ell_1$-norm penalized LMS for sparse channel estimation and its analysis," Signal Process., vol. 104, pp. 70–79, 2014.
• [32] Das RL and Chakraborty M, "Improving the performance of the PNLMS algorithm using $\ell_1$ norm regularization," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 7, pp. 1280–1290, 2016.
• [33] Li Y and Hamamura M, "An improved proportionate normalized least-mean-square algorithm for broadband multipath channel estimation," Sci. World J., vol. 2014.
• [34] Albu F, Caciula I, Li Y, and Wang Y, "The $\ell_p$-norm proportionate normalized least mean square algorithm for active noise control," in Proc. Int. Conf. Syst. Theory, Control, Comput. (ICSTCC), 2017, pp. 401–405.
• [35] Lima MVS, Ferreira TN, Martins WA, and Diniz PSR, "Sparsity-aware data-selective adaptive filters," IEEE Trans. Signal Process., vol. 62, no. 17, pp. 4557–4572, 2014.
• [36] Pelekanakis K and Chitre M, "New sparse adaptive algorithms based on the natural gradient and the $\ell_0$-norm," IEEE J. Oceanic Eng., vol. 38, no. 2, pp. 323–332, 2013.
• [37] Ferreira TN, Lima MVS, Diniz PSR, and Martins WA, "Low-complexity proportionate algorithms with sparsity-promoting penalties," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2016, pp. 253–256.
• [38] Jin J, Gu Y, and Mei S, "A stochastic gradient approach on compressive sensing signal reconstruction based on adaptive filtering framework," IEEE J. Sel. Top. Signal Process., vol. 4, no. 2, pp. 409–420, 2010.
• [39] Lee C-H, Rao BD, and Garudadri H, "Proportionate adaptive filters based on minimizing diversity measures for promoting sparsity," in Proc. Asilomar Conf. Signals, Syst., Comput. (ACSSC), 2019, pp. 769–773.
• [40] Variddhisaï T and Mandic DP, "On an RLS-like LMS adaptive filter," in Proc. IEEE Int. Conf. Digital Signal Process. (DSP), 2017, pp. 1–5.
• [41] Su G, Jin J, Gu Y, and Wang J, "Performance analysis of $\ell_0$ norm constraint least mean square algorithm," IEEE Trans. Signal Process., vol. 60, no. 5, pp. 2223–2235, 2012.
• [42] Sun Y, Babu P, and Palomar DP, "Majorization-minimization algorithms in signal processing, communications, and machine learning," IEEE Trans. Signal Process., vol. 65, no. 3, pp. 794–816, 2017.
• [43] Oliveira JP, Bioucas-Dias JM, and Figueiredo MAT, "Adaptive total variation image deblurring: A majorization–minimization approach," Signal Process., vol. 89, no. 9, pp. 1683–1693, 2009.
• [44] Boyd S and Vandenberghe L, Convex Optimization, Cambridge University Press, 2004.
• [45] Luenberger DG and Ye Y, Linear and Nonlinear Programming, 4th edition, Springer, 2016.
• [46] Chen L and Gu Y, "From least squares to sparse: A non-convex approach with guarantee," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2013, pp. 5875–5879.
• [47] Benesty J, Paleologu C, and Ciochină S, "On regularization in adaptive filtering," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 6, pp. 1734–1742, 2010.
• [48] Rao BD, Engan K, Cotter SF, Palmer J, and Kreutz-Delgado K, "Subset selection in noise based on diversity measure minimization," IEEE Trans. Signal Process., vol. 51, no. 3, pp. 760–770, 2003.
