Author manuscript; available in PMC: 2013 Mar 5.
Published in final edited form as: Automatica (Oxf). 2009 Apr;45(4):956–964. doi: 10.1016/j.automatica.2008.11.020

Towards Identification of Wiener Systems with the Least Amount of a priori Information: IIR Cases

Er-Wei Bai a, John Reyland Jr a
PMCID: PMC3587721  NIHMSID: NIHMS106936  PMID: 23471210

Abstract

In this paper, we investigate what constitutes the least amount of a priori information on the nonlinearity so that the linear part is identifiable in the non-Gaussian input case. Under the white noise input, three types of a priori information are considered including quadrant information, point information and monotonic information. In all three cases, identifiability has been established and the corresponding nonparametric identification algorithms are developed along with their convergence proofs.

Keywords: system identification, nonlinear systems, wiener systems, a priori information

1 Introduction

The Wiener nonlinear system has been used in various applications and identification of such systems has been an active research area for a long time [3,4,6,7,10,14,15,18]. In Wiener system identification, several assumptions are made in the literature. In the case that not enough a priori information on the unknown system is available, a common assumption is a Gaussian random input [4,6,10,15]. Thanks to the Bussgang Theorem [2], identification of Wiener systems is then possible. Without the Gaussian assumption, identification of Wiener systems becomes non-trivial. In the case that the nonlinearity is known a priori, identification of the linear part is relatively easy [16,18]. If the nonlinearity is unknown, it is usually assumed that the nonlinearity or the inverse of the nonlinearity is expressed by some known basis functions [3,13] or by a piece-wise polynomial function [14,17]. Therefore, a nonlinear and non-parametric identification problem is reduced to a parameter estimation problem, which is often much simpler because there is no uncertainty in the structure anymore. The key assumption is either the invertibility of the unknown nonlinearity or the availability of some appropriate basis functions. Recently, another approach for Wiener system identification was proposed based on a monotonic assumption [19]. It was shown [19] that if the linear part is FIR and the nonlinearity is monotonic, identification of the FIR linear part is possible using only the input-output data, though the solution is not necessarily unique.

Since only the input-output data is available for identification purposes and no internal signals are available, identification of the linear part and/or the nonlinearity is generally impossible if no assumptions are made on the unknown system or the input. This is why conditions as discussed above are imposed in the literature. However, a fundamental question remains unanswered so far, i.e., what constitutes the least amount of a priori information required for a non-Gaussian white input case in Wiener system identification? Answers to this question have an impact on both theory and applications. Unfortunately, a solution to this question requires not only a mathematical quantification of a priori information but also that this quantification be expressed in practical terms for application purposes. This is a hard problem. A closely related but slightly simplified question is what constitutes the least amount of a priori information on the unknown nonlinearity so that the linear part of the system can be uniquely identified based on input-output data. Obviously, if the linear part can be identified, the static nonlinearity can be consequently identified. Unfortunately, this reformulated and simplified question is again hard to answer because we are facing the same difficulty of mathematical quantification of a priori information. To overcome the difficulty, our approach is to study the problem in an indirect way, i.e., to develop identification algorithms for the linear part based on as little a priori information as possible on the unknown nonlinearity under a white noise input.

The paper is a continuation of our previous work [2] that discusses the same problem but was limited to Wiener systems with an FIR linear part. Several interesting results were obtained in [2]. It was shown, under the FIR assumption, that identification of the linear part is feasible with very little a priori information on the unknown nonlinearity. For instance, quadrant information on the unknown nonlinearity suffices for identification purposes. Note that no exact values of the nonlinearity are needed, only sign information. Further, the nonlinearity can be non-smooth and non-monotonic. In addition, it was shown in [2] that either point a priori or monotonic a priori information also suffices. It became clear [2] that for a Wiener system with an FIR linear part, identification is possible with very little a priori information. It was not clear, however, whether and how similar conclusions could be extended to Wiener systems with an IIR linear part. The proofs used in the FIR case are not easily modifiable to an IIR case. In this paper, we show that the main results derived for FIR cases can be extended to IIR cases, though the extensions are nontrivial and the derivations are more involved. In particular, it is shown that identification of the IIR part is feasible based only on quadrant or point or monotonousness a priori information on the unknown nonlinearity.

The layout of the paper is as follows. The system and problems are introduced in Section 2. Section 3 discusses the point a priori information. With this little a priori information, it is shown that the linear part can be uniquely identified and the corresponding numerical algorithm is also developed along with its convergence proof. In Section 4, the results are extended to a priori information in terms of monotonousness. Similar identifiability results are established. Section 5 is devoted to a priori information in terms of quadrant knowledge of the unknown nonlinearity. It is shown that with quadrant information, the linear part can be uniquely identified. Finally, some concluding remarks are provided in Section 6.

2 Problem statement and preliminary

The Wiener system considered in the paper is shown in Figure 1, where the unknown linear and nonlinear parts are represented by

$$G(z) = \sum_{i=0}^{\infty} h(i) z^{-i}, \quad \text{and} \quad f(\cdot) \qquad (2.1)$$

respectively. The linear part G(z) is assumed to be stable so that |h(i)| ≤ Mλ^i for some M < ∞ and 0 < λ < 1. No order information on G(z) is available, unless otherwise specified. The input, internal signal and output at time k = 0, 1, …, N are represented by u(k), x(k) and y(k) respectively. The internal signal x(k) is unavailable for identification. The nonlinearity is unknown but bounded for bounded inputs. No structural a priori information on f(·) is assumed.

Fig. 1. Wiener system.

Because of the embedded scaling ambiguity in Wiener systems, either the linear part or the nonlinear part has to be normalized for identification purposes [2]. It is assumed in the paper that

$$\|h\|^2 = \sum_{i=0}^{\infty} h^2(i) = 1$$

where h = (h(0), h(1), …)′ is an infinite-dimensional vector representing the impulse response of the linear part. Further, it is assumed that the first non-zero entry of h is positive. All these assumptions are standard to guarantee identifiability. Throughout the paper, it is also assumed that the input u(k) is a bounded independent identically distributed (iid) random sequence and its probability density function is positive over an interval [−a, a] for some 0 < a < 1. No specific distribution on the input is needed and the actual distribution could be unknown. Clearly, all the signals u(k), x(k) and y(k) are bounded because of the stability of the system and bounded inputs.

The goal of identification is to determine an estimate $\hat{G}(z) = \sum_{i=0}^{\infty} \hat{h}(i) z^{-i}$ of G(z), based on the input-output data up to time N with little a priori information on the unknown nonlinearity f(·), specified later in subsequent sections, so that

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \left|G(e^{j\omega}) - \hat{G}(e^{j\omega})\right|^2 d\omega \to 0 \qquad (2.2)$$

as N → ∞ in some probability sense, preferably convergence with probability one. Note again that no order information on G(z) is available.

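Now, observe that if the estimate Ĝ(z) is stable, then

$$\|h - \hat{h}\|^2 = \sum_{i=0}^{\infty} \left|h(i) - \hat{h}(i)\right|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|G(e^{j\omega}) - \hat{G}(e^{j\omega})\right|^2 d\omega$$

where ĥ = (ĥ(0), ĥ(1), …)′ is the impulse response vector of the estimate Ĝ(z). Thus, the identification problem is equivalent to finding an estimate ĥ of h so that ĥ → h. Now, given a positive integer n, define

$$h_n = (h(0), h(1), \ldots, h(n-1))', \quad \hat{h}_n = (\hat{h}(0), \hat{h}(1), \ldots, \hat{h}(n-1))'.$$

Because of the stability assumption and $\sum_{i=0}^{\infty} h(i)^2 = 1$, we have $\sum_{i=n}^{\infty} h(i)^2 \to 0$ as n → ∞. Thus, ĥ → h if and only if ĥn → hn as n → ∞. Further, ||hn|| → 1 implies

$$\|\hat{h}_n - h_n\| = \left\|\hat{h}_n - \frac{h_n}{\|h_n\|} + \frac{h_n}{\|h_n\|} - h_n\right\| \le \left\|\hat{h}_n - \frac{h_n}{\|h_n\|}\right\| + \left\|\frac{h_n}{\|h_n\|} - h_n\right\|$$

The second term equals |1 − ||hn||| and goes to zero as n → ∞. Therefore, as n → ∞,

$$\|\hat{h} - h\| \to 0 \Leftrightarrow \|\hat{h}_n - h_n\| \to 0 \Leftrightarrow \left\|\hat{h}_n - \frac{h_n}{\|h_n\|}\right\| \to 0$$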

What we have to do is to identify the normalized first n taps of the impulse response of the unknown linear part. In short, to overcome the problem of the unknown order, our way is to find the impulse response.
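The equivalence between the time-domain error ||h − ĥ||² and the frequency-domain integral in (2.2) can be checked numerically. The following is a minimal sketch, assuming a hypothetical geometric impulse response (the values of `lam`, `L`, `n` and `nfft` are illustrative choices, not taken from the paper) and an idealized estimate equal to the normalized first n taps.

```python
import numpy as np

# Hypothetical stable impulse response h(i) = c * lam**i, scaled so that ||h|| = 1.
lam = 0.6
L = 200                                    # truncation length; lam**L is negligible
h = lam ** np.arange(L)
h /= np.linalg.norm(h)

# Idealized estimate: the normalized first n taps, zero afterwards.
n = 12
h_hat = np.zeros(L)
h_hat[:n] = h[:n] / np.linalg.norm(h[:n])

# Time-domain error ||h - h_hat||^2.
err_time = np.sum((h - h_hat) ** 2)

# Frequency-domain error (1/2pi) * integral of |G - G_hat|^2 over [-pi, pi],
# approximated by the average of |FFT|^2 over a uniform frequency grid.
nfft = 4096
diff = np.fft.fft(h - h_hat, nfft)
err_freq = np.mean(np.abs(diff) ** 2)

print(err_time, err_freq)
```

The two numbers should agree to rounding error, and both shrink as n grows because the neglected tail of h decays geometrically.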

3 Point a priori information

In this section, we consider identification of the linear part with point a priori information f(x0) = y0 on the unknown nonlinearity for some known x0 and y0. For simplicity, both x0 and y0 are assumed to be at the origin.

Assumption 3.1

It is assumed that

$$f(x) = 0 \Leftrightarrow x = 0$$

and f(·) is continuous in the neighborhood of the origin.

The condition is based on the local information f(0) = 0 but is stronger than the local point condition f(0) = 0. The implication f(x) = 0 ⇒ x = 0 provides some global information on the nonlinearity since no other value of x could lead to f(x) = 0.

There are two aspects in identification based on the a priori information f(0) = 0. First, no other a priori information on the unknown f(·) is known, so all the observed outputs y(k) ≠ 0 together with the corresponding inputs do not reveal much information on x(k) or on f(·). In other words, theoretically only the outputs y(k) = 0 together with the corresponding inputs u(k) are useful for identification. Second, in practice y(k) = 0 exactly is unlikely, and relying on it is not robust in the presence of noise. The hope is that, by continuity of f(·) in the neighborhood of the origin, y(k) ≈ 0 implies x(k) ≈ 0, which would result in an estimate ĥn close to hn/||hn||. Thus, the analysis contains two parts. The first part is to show that hn/||hn|| can be identified if there is enough data available under the constraint y(k) = 0. Then, we will show that with the data set |y(k)| ≤ ε for some small ε > 0, the obtained estimate is a continuous function of ε and converges to hn/||hn|| as ε → 0.

For each n, consider a fictitious FIR system

$$x(k) = \frac{1}{\|h_n\|}\,(h(0), h(1), \ldots, h(n-1)) \underbrace{\begin{pmatrix} u(k) \\ u(k-1) \\ \vdots \\ u(k-n+1) \end{pmatrix}}_{\phi_n(k)}$$

where hn = (h(0), h(1), …, h(n − 1))′ ≠ 0, which is automatically satisfied for large n because ||hn|| → ||h|| = 1. Given the input-output data set {φn(k), y(k)}, k = 1, …, N, it can be easily verified [2] that hn/||hn|| is identifiable for this fictitious FIR linear system based on the point a priori information f(0) = 0 if and only if there exist some 1 ≤ p1 < p2 < … < pk ≤ N so that x(p1) = x(p2) = … = x(pk) = 0 (or equivalently y(p1) = … = y(pk) = 0) and the corresponding matrix Φ(p1, p2, …, pk) satisfies

$$\operatorname{rank} \underbrace{\begin{pmatrix} \phi_n'(p_1) \\ \phi_n'(p_2) \\ \vdots \\ \phi_n'(p_k) \end{pmatrix}}_{\Phi(p_1, \ldots, p_k)} = n - 1. \qquad (3.1)$$

Further, let Φ(p1, …, pk) = UΣ(V1, V2, …, Vn)′ be the singular value decomposition (SVD) of Φ(p1, …, pk). It follows that

$$V_n = h_n/\|h_n\| \qquad (3.2)$$

modulo the ± sign. Therefore, hn/||hn|| is identifiable from the SVD of Φ(p1, …, pk) for data y(p1) = … = y(pk) = 0. Now, the actual system is not FIR but IIR:

$$x(k) - \sum_{i=n}^{\infty} h(i)u(k-i) = h_n'\phi_n(k)$$

Define

$$z(k) = \left[x(k) - \sum_{i=n}^{\infty} h(i)u(k-i)\right] \Big/ \|h_n\| = \frac{h_n'}{\|h_n\|}\phi_n(k) \qquad (3.3)$$

If z(p1) = z(p2) = … = z(pk) = 0, the same conclusion as discussed above applies. With the fact that $\left|\sum_{i=n}^{\infty} h(i)u(k-i)\right| \le M_1\lambda^n \to 0$ for large n, y(k) ≈ 0 implies x(k), z(k) ≈ 0. The question is whether the SVD of Φ(p1, …, pk) would provide a vector Vn that is close to hn/||hn|| when ε is small but non-zero. To this end, we need some preliminary work.
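The exact-data statement in (3.1) and (3.2) can be illustrated with a small numerical sketch. It assumes a hypothetical 6-tap vector `h_n` and builds regressors that satisfy hn′φ = 0 exactly by projecting random vectors, as a stand-in for rows collected at times where y(pj) = 0; none of these values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Hypothetical normalized taps with a positive first entry.
h_n = np.array([0.8, 0.5, -0.3, 0.2, 0.1, -0.05])
h_n /= np.linalg.norm(h_n)

# Regressors with h_n' phi = 0: remove the h_n component from random vectors.
raw = rng.uniform(-1.0, 1.0, size=(20, n))
Phi = raw - np.outer(raw @ h_n, h_n)

# Rank condition (3.1): the rows span the (n-1)-dimensional complement of h_n.
rank = np.linalg.matrix_rank(Phi)

# The right singular vector of the smallest singular value spans the null space.
_, s, Vt = np.linalg.svd(Phi)
v = Vt[-1]
if v[0] < 0:            # resolve the +/- ambiguity: first entry positive
    v = -v

print(rank, np.linalg.norm(v - h_n))
```

With exact zeros the recovered vector matches hn/||hn|| up to rounding; the analysis that follows addresses what happens when the rows only satisfy |y(k)| ≤ ε.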

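First, for each n, define an orthonormal basis e1, e2, …, en−1 and en = hn/||hn|| in R^n. Construct a truncated cone Ci around each ei, i = 1, 2, …, n, as follows. For i = 1, 2, …, n − 1,

$$\phi \in C_i \Leftrightarrow 0 < \frac{a}{2} \le \|\phi\| \le 1, \ \text{and} \ \cos(\angle(\phi, e_j)) \begin{cases} \ge \frac{8}{9} & \text{if } j = i \\ \le \frac{a}{9(n-2)} & \text{if } j \ne i \end{cases} \qquad (3.4)$$

where ∠(φ, ej) is the angle between φ and ej, and [−a, a] is the interval in which the input probability density function is positive. For i = n,

$$\phi \in C_n \Leftrightarrow 0 < \|\phi\| \le \varepsilon(n) \ \text{and} \ \cos(\angle(\phi, e_j)) \begin{cases} \ge \frac{8}{9} & \text{if } j = n \\ \le \frac{a}{9(n-2)} & \text{if } j \ne n \end{cases} \qquad (3.5)$$

where nε(n) → 0 as n → ∞. Clearly, if φ ∈ Cn, we have

$$\langle \phi, e_i \rangle = \|\phi\|\,\|e_i\| \cdot \cos(\angle(\phi, e_i)) \begin{cases} \le \varepsilon \frac{a}{9(n-2)} & i = 1, 2, \ldots, n-1 \\ \le \varepsilon & i = n \end{cases} \qquad (3.6)$$

and similarly, if φ ∈ Ci, i = 1, 2, …, n − 1, we have

$$\langle \phi, e_j \rangle = \|\phi\|\,\|e_j\| \cdot \cos(\angle(\phi, e_j)) \begin{cases} \le \frac{a}{9(n-2)} & j \ne i \\ \ge \frac{4}{9}a & j = i \end{cases} \qquad (3.7)$$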

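Now recall the definition of φn(ij) = (u(ij), u(ij − 1), …, u(ij − n + 1))′. Write each φn(ij) in terms of the basis vectors ei

$$\phi_n(i_j) = \beta_{j1}e_1 + \beta_{j2}e_2 + \cdots + \beta_{jn}e_n$$

where βji is the projection of φn(ij) on ei, and

$$\begin{pmatrix} x(i_1) \\ \vdots \\ x(i_n) \end{pmatrix} = \begin{pmatrix} \phi_n'(i_1) \\ \vdots \\ \phi_n'(i_n) \end{pmatrix} h_n + \underbrace{\begin{pmatrix} \sum_{i=n}^{\infty} h(i)u(i_1 - i) \\ \vdots \\ \sum_{i=n}^{\infty} h(i)u(i_n - i) \end{pmatrix}}_{R(i_1, \ldots, i_n)}$$

$$= \begin{pmatrix} \beta_{1,1} & \cdots & \beta_{1,n} \\ \vdots & & \vdots \\ \beta_{n,1} & \cdots & \beta_{n,n} \end{pmatrix} \begin{pmatrix} e_1' \\ \vdots \\ e_n' \end{pmatrix} h_n + R(i_1, \ldots, i_n)$$

$$= \Bigg\{ \underbrace{\begin{pmatrix} \beta_{1,1} & \cdots & \beta_{1,n-1} \\ \vdots & & \vdots \\ \beta_{n-1,1} & \cdots & \beta_{n-1,n-1} \\ \beta_{n,1} & \cdots & \beta_{n,n-1} \end{pmatrix} \begin{pmatrix} e_1' \\ \vdots \\ e_{n-1}' \end{pmatrix}}_{Q} + \underbrace{\begin{pmatrix} \beta_{1,n} \\ \vdots \\ \beta_{n-1,n} \\ \beta_{n,n} \end{pmatrix} e_n'}_{E(\varepsilon)} \Bigg\} h_n + R(i_1, \ldots, i_n) \qquad (3.8)$$

for some 1 ≤ i1 < i2 < … < in ≤ N.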

Lemma 3.1

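Consider the Wiener system shown in Figure 1 under Assumption (3.1). Then, we have

  1. For any given large n and ε(n) satisfying nε(n) → 0 as n → ∞, with probability one as N → ∞, there exists a sequence of φn(ij), j = 1, 2, …, n so that |y(ij)| ≤ ε and

    $$\operatorname{rank} \Phi(i_1, \ldots, i_n) = \operatorname{rank} \begin{pmatrix} \phi_n'(i_1) \\ \phi_n'(i_2) \\ \vdots \\ \phi_n'(i_n) \end{pmatrix} \ge n - 1.$$
  2. The matrix Φ(i1, …, in) can be written as

    $$\Phi(i_1, \ldots, i_n) = Q + E(\varepsilon) \qquad (3.9)$$

    for some Q and E(ε), where rank Q = n − 1 independent of ε and ||E(ε)|| → 0 as ε → 0. Further, let

    $$\Phi(i_1, i_2, \ldots, i_n) = U(\varepsilon)\Sigma(\varepsilon)(V_1(\varepsilon), V_2(\varepsilon), \ldots, V_n(\varepsilon))'$$

    be the SVD of Φ(i1, i2, …, in). Then, modulo the ± sign,

    $$\left\|V_n(\varepsilon) - \frac{h_n}{\|h_n\|}\right\| \to 0, \quad \text{as } n \to \infty. \qquad (3.10)$$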

Proof

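The proof of the first part is essentially the same as for the FIR case [2] by noting ||R(i1, …, in)|| → 0 as n → ∞. To show the second part, consider a submatrix as in (3.8)

$$\begin{pmatrix} \beta_{1,1} & \cdots & \beta_{1,n-1} \\ \vdots & & \vdots \\ \beta_{n-1,1} & \cdots & \beta_{n-1,n-1} \end{pmatrix}$$

By the construction of the cones Ci and the definition of φn(ij), it follows that

$$\beta_{ii} \ge \frac{4}{9}a, \quad |\beta_{ij}| \le \frac{a}{9(n-2)}, \quad i \ne j$$

which leads to

$$\beta_{ii} - \frac{1}{2}\left\{ \sum_{j=1, j \ne i}^{n-1} |\beta_{ji}| + \sum_{j=1, j \ne i}^{n-1} |\beta_{ij}| \right\} \ge \frac{4}{9}a - \frac{1}{9}a = \frac{3}{9}a$$

By the Gershgorin Theorem [11] on singular values, the singular values of the above submatrix satisfy

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{n-1} \ge \frac{3}{9}a$$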

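independent of n. Further, by the fact that

$$\begin{pmatrix} e_1' \\ \vdots \\ e_{n-1}' \end{pmatrix} (e_1, \ldots, e_{n-1}) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

and

$$Q = \begin{pmatrix} \beta_{1,1} & \cdots & \beta_{1,n-1} \\ \vdots & & \vdots \\ \beta_{n-1,1} & \cdots & \beta_{n-1,n-1} \\ 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} e_1' \\ \vdots \\ e_{n-1}' \end{pmatrix} + \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \\ \beta_{n,1} & \cdots & \beta_{n,n-1} \end{pmatrix} \begin{pmatrix} e_1' \\ \vdots \\ e_{n-1}' \end{pmatrix} \qquad (3.11)$$

and

$$|\beta_{n,1}| \le \varepsilon, \ \ldots, \ |\beta_{n,n-1}| \le \varepsilon, \qquad (3.12)$$

we have from the Wielandt-Hoffman Theorem [8] that the first n − 1 singular values of Q satisfy

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{n-1} \ge \frac{3}{9}a - O(n\varepsilon) \ge \frac{2}{9}a$$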

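for large n. Moreover, Qhn = 0, because hn ⊥ ei for i = 1, 2, …, n − 1, implies that the last or the smallest singular value σn = 0 and

$$\operatorname{rank} Q = n - 1$$

for all n. In addition, |βn,j| ≤ ε implies ||E(ε)|| = O(nε) → 0. Now, define

$$Q = U\Sigma(V_1, \ldots, V_n)' \quad \text{and} \quad Q + E(\varepsilon) = U(\varepsilon)\Sigma(\varepsilon)(V_1(\varepsilon), \ldots, V_n(\varepsilon))' \qquad (3.13)$$

It is clear that hn/||hn|| = ±Vn and what is left to show is that Vn(ε) → Vn when n gets larger. To this end, again by the Wielandt-Hoffman Theorem [8], the gap between the smallest singular value and the second smallest singular value of the matrix Q + E(ε) is bounded below by

$$\sigma_{n-1} - \sigma_n \ge \frac{2}{9}a - O(n\varepsilon) \ge \frac{1}{9}a$$

Now, we apply a version of the Circle Theorem [5] (equation 1.2.19)

$$\sin(\angle(V_n(\varepsilon), V_n)) \le \frac{O(n\varepsilon)}{(\sigma_{n-1} - \sigma_n) - O(n\varepsilon)} \le \frac{O(n\varepsilon)}{\frac{1}{9}a - O(n\varepsilon)} \to 0 \qquad (3.14)$$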

as n → ∞. Since ||Vn|| = ||Vn(ε)|| = 1, the conclusion Vn(ε) → Vn = hn/||hn|| follows. This completes the proof.
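The Gershgorin-type lower bound on the singular values used in the proof can be checked numerically. The sketch below assumes a worst-case (n − 1) × (n − 1) matrix whose diagonal equals 4a/9 and whose off-diagonal entries have magnitude a/(9(n − 2)) with random signs; the values of `a` and `n` and the random signs are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
a, n = 1.0, 12
m = n - 1                                   # size of the submatrix in (3.8)

# Worst-case magnitudes allowed by the cone construction.
B = rng.choice([-1.0, 1.0], size=(m, m)) * a / (9 * (n - 2))
np.fill_diagonal(B, 4 * a / 9)

# Gershgorin-type bound: min_i { |b_ii| - (row_i + col_i off-diagonal sums) / 2 }.
off = np.abs(B) - np.diag(np.abs(np.diag(B)))
bound = np.min(np.abs(np.diag(B)) - 0.5 * (off.sum(axis=0) + off.sum(axis=1)))

# Smallest singular value (singular values are returned in descending order).
sigma_min = np.linalg.svd(B, compute_uv=False)[-1]

print(bound, sigma_min)
```

Here the bound evaluates to 3a/9, and the smallest singular value should not fall below it, which is the property the proof relies on before perturbing by E(ε).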

The following result is a direct consequence of the above lemma.

Theorem 3.1

Let ĥn = (ĥ(0), …, ĥ(n − 1))′ = ±Vn(ε) be the estimate of hn/||hn|| so that the first non-zero entry is positive and

$$\hat{G}(z) = \sum_{i=0}^{n-1} \hat{h}(i) z^{-i}$$

Then, as n → ∞

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \left|G(e^{j\omega}) - \hat{G}(e^{j\omega})\right|^2 d\omega \to 0$$

Based on the results, we can collect the data set |y(i1)| ≤ ε, |y(i2)| ≤ ε, …, |y(in)| ≤ ε so that rank Φ(i1, i2, …, in) ≥ n − 1. Then, the SVD of Φ(i1, i2, …, in) provides the estimate ĥn = Vn(ε), modulo the ± sign. A problem is that only data at times i1, i2, …, in are used and all other data is discarded, which is not efficient and in fact is not robust in the presence of noise. A more efficient way is to use all the data with |y(k)| ≤ ε and the corresponding matrix Φ. The analysis as discussed before carries over with no or minimal modifications. At the same time, since more data is used, the average effect of the noise is reduced, making the identification algorithm more robust. We are now in a position to introduce the identification algorithm based on point a priori information.

Identification algorithm with the point a priori information f(0) = 0: Consider the system shown in Figure 1 under Assumption (3.1).

Step 1: Collect data u(k)’s and y(k)’s, k = 1, 2, …, N.

Step 2: For each n, construct a submatrix Φ(i1, i2, …, il) of Φ(1, 2, …, N) by deleting the k-th row if |y(k)| > ε, where nε → 0 as n → ∞.

Step 3: Calculate the SVD

$$\Phi(i_1, \ldots, i_l) = U(\varepsilon)\Sigma(\varepsilon)(V_1(\varepsilon), \ldots, V_n(\varepsilon))'.$$

Step 4: Define ĥn = ±Vn(ε) so that the first non-zero element of ĥn is positive.

Step 5: Set $\hat{G}(z) = \sum_{i=0}^{n-1} \hat{h}(i) z^{-i}$.

Then, from the lemma and theorem, for each n, ĥn → hn/||hn|| as N → ∞. Further, as n gets larger and larger, ĥn → h and Ĝ(e^{jω}) → G(e^{jω}) in the integral least squares sense.
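Steps 1 to 4 above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `identify_point`, the geometric test system, the nonlinearity f(x) = x(2 + cos 5x) and the tolerance used to decide which entry counts as non-zero are all hypothetical choices made for the example.

```python
import numpy as np

def identify_point(u, y, n, eps):
    """Keep rows with |y(k)| <= eps and return the last right singular vector."""
    N = len(u)
    # Row for time k is phi_n(k) = (u(k), u(k-1), ..., u(k-n+1)), k = n-1, ..., N-1.
    Phi = np.column_stack([u[n - 1 - i : N - i] for i in range(n)])
    Phi_eps = Phi[np.abs(y[n - 1:]) <= eps]
    if Phi_eps.shape[0] < n:
        raise ValueError("too few rows with |y(k)| <= eps; increase N or eps")
    _, _, Vt = np.linalg.svd(Phi_eps, full_matrices=False)
    h_hat = Vt[-1]
    # Sign convention: first non-zero entry positive (the 1e-8 tolerance is an assumption).
    first = np.flatnonzero(np.abs(h_hat) > 1e-8)[0]
    return h_hat if h_hat[first] > 0 else -h_hat

# Hypothetical test system, not the paper's example.
rng = np.random.default_rng(0)
N, n, eps, lam = 5000, 10, 0.1, 0.5
h = np.sqrt(1 - lam**2) * lam ** np.arange(60)   # ||h|| = 1 up to truncation
u = rng.uniform(-1.0, 1.0, N)
x = np.convolve(u, h)[:N]                        # internal signal, unseen by the estimator
y = x * (2.0 + np.cos(5.0 * x))                  # f(x) = 0 only at x = 0, non-monotonic

h_hat = identify_point(u, y, n, eps)
err = np.sum((h[:n] / np.linalg.norm(h[:n]) - h_hat) ** 2)
print(err)
```

The estimator never touches `x` or the form of f; it only uses the inputs and the outputs that fall within ε of zero.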

We comment that in the algorithm, the choice of ε is not unique. Small ε throws away all the data which is larger than ε and results in few data to construct the estimate. Thus, it takes a longer time to collect the same number of data useful to construct the estimate for a small ε than a large ε. On the other hand, however, a large ε collects y(k) that is not so small which results in x(k) that is not in the neighborhood of 0 but is mistakenly considered to be near 0 and used to construct the estimate. Clearly, this tends to increase the bias and at the same time, to reduce the variance because more and more data can be used. So the choice of ε is to balance the bias and variance which is reminiscent of the choice of the bandwidth in kernel identification [9,12]. The idea is to use local data near y(k) = 0 to identify the linear part without interference from the unknown nonlinearity. Some guidelines are provided in Section 5.4 of [9]. Of course, preferably, any choice of ε needs to be tested on a fresh data set for validation purpose.

We now provide a numerical simulation. Let the linear part be a 4th order system

$$G(z) = \frac{0.7616z^2 + 0.6160}{z^4 + 0.223z^2 + 0.41} \qquad (3.15)$$

and the nonlinear part be a non-continuous, non-symmetric and non-monotonic nonlinearity shown in Figure 2,

$$y = f(x) = \begin{cases} 0.5x - 0.2 & x \le -0.2 \\ 1.2x & -0.2 < x \le 0.8 \\ 0.3x + 0.5 & x > 0.8 \end{cases} \qquad (3.16)$$

Fig. 2. The unknown nonlinearity.

The input u(k) is iid and uniformly distributed in [−1, 1], and Gaussian noise is added to the output. Figures 3 and 4 show the estimates ĥn and Ĝ(e^{jω}) of hn and G(e^{jω}) respectively when N = 3,000, ε = 0.1, SNR = 20 dB and n = 30 with the estimation error

$$\|\hat{h} - h\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|\hat{G}(e^{j\omega}) - G(e^{j\omega})\right|^2 d\omega = 0.0011$$

Fig. 3. ĥn and hn.

Fig. 4. Ĝ(e^{jω}) (solid) and G(e^{jω}) (dash-dot).

The estimate ĥ is defined as ĥ = (ĥ(0), …, ĥ (n − 1), 0, 0, …)′.
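As a sanity check on the example system, the impulse response of (3.15) can be generated by the difference-equation recursion and its squared norm compared with the normalization ||h|| = 1 assumed in Section 2. The sketch below assumes (3.15) reads G(z) = (0.7616z² + 0.6160)/(z⁴ + 0.223z² + 0.41); the helper `impulse_response` and the truncation length are illustrative.

```python
import numpy as np

# (3.15) rewritten in powers of z^-1:
# G(z) = (0.7616 z^-2 + 0.6160 z^-4) / (1 + 0.223 z^-2 + 0.41 z^-4)
b = np.array([0.0, 0.0, 0.7616, 0.0, 0.6160])
a = np.array([1.0, 0.0, 0.223, 0.0, 0.41])

def impulse_response(b, a, length):
    """h(k) = b(k) - sum_{i>=1} a(i) h(k-i): response to a unit impulse, a(0) = 1."""
    h = np.zeros(length)
    for k in range(length):
        acc = b[k] if k < len(b) else 0.0
        for i in range(1, min(k, len(a) - 1) + 1):
            acc -= a[i] * h[k - i]
        h[k] = acc
    return h

h = impulse_response(b, a, 200)
norm_sq = np.sum(h ** 2)
print(norm_sq)      # squared norm of the (truncated) impulse response
print(h[:6])        # leading taps; the first non-zero tap is h(2)
```

Under this reading the squared norm comes out very close to one and the first non-zero tap is positive, which agrees with the normalization conventions stated earlier. Note that h(0) = h(1) = 0 here, so in practice the sign rule has to ignore leading taps that are estimated as tiny non-zero values.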

To demonstrate the performance of the identification, the algorithm has been simulated for different combinations of SNR, data length N, order n and threshold ε. Table 1 shows the estimation error ||ĥ − h||² for various N and SNR values when ε and n are fixed at ε = 0.1 and n = 30. Table 2 shows the estimation error ||ĥ − h||² for various ε and n when N and SNR are fixed at N = 3,000 and SNR = 20 dB. All the results are the averages of 50 Monte Carlo simulations.

Table 1. Estimation error vs N and SNR when ε = 0.1 and n = 30.

SNR        10dB     20dB     40dB     (unlabeled)
N=1,000    0.2552   0.0344   0.0049   0.0027
N=2,000    0.0852   0.0144   0.0022   0.0010
N=3,000    0.0600   0.0087   0.0012   0.0007

Table 2. Estimation error vs ε and n when N = 3,000 and SNR = 20 dB.

           n=15     n=20     n=30
ε = 0.08   0.0046   0.0069   0.0106
ε = 0.1    0.0038   0.0056   0.0085
ε = 0.12   0.0036   0.0050   0.0082

4 Monotonic nonlinearities

The idea of point a priori information is that though there is no other information about the nonlinearity, data in the neighborhood of origin could be used to construct an estimate because the knowledge about the nonlinearity around the origin is known. In this section, we extend the idea to a case where no point a priori information is available but the nonlinearity is assumed to be monotonic in some intervals. More precisely, it is assumed that

Assumption 4.1

There exists an interval −∞ < f̲ < f̄ < ∞ and within the interval f(x) ∈ [f̲, f̄], f(·) is continuous and

$$f(x_1) = f(x_2) \Leftrightarrow x_1 = x_2$$

Again, we comment that the assumption actually contains some global information on the nonlinearity. Let f(x̲) = f̲ and f(x̄) = f̄. Then, the assumption prevents the nonlinearity from taking any value between f̲ and f̄ anywhere outside of the range (x̲, x̄).

Clearly, f(·) is monotonic if y = f(x) ∈ [f̲, f̄].

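Now, define

$$z(i, j) = x(i) - x(j), \quad \psi_n(i, j) = \phi_n(i) - \phi_n(j). \qquad (4.1)$$

It is easily verified that

$$z(i, j) = h_n'\psi_n(i, j) + \sum_{l=n}^{\infty} h(l)\big(u(i-l) - u(j-l)\big) \qquad (4.2)$$

The equation is reminiscent of (3.3), which was the key for identification based on point a priori information. Note from the monotonic assumption,

$$y(i) - y(j) = f(x(i)) - f(x(j)) = 0 \Leftrightarrow x(i) = x(j) \Leftrightarrow z(i, j) = 0$$

and

$$|f(x(i)) - f(x(j))| \le \varepsilon \Rightarrow |x(i) - x(j)| \le \varepsilon_1$$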

for small ε1 thanks to the continuity of f, if ε is small enough. Therefore, by re-naming z(i, j) as x(i) and ψn(i, j) as φn(i), everything developed for point a priori information in the previous section can be carried over here. The following result is a straightforward extension of Theorem 3.1.

Theorem 4.1

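Consider the system shown in Figure 1 under Assumption (4.1). Assume that the probability density function of y = f(x) is positive in the interval [f̲, f̄]. Then,

  1. For any n and ε > 0 so that nε → 0 as n → ∞, with probability one as N → ∞, there exist two sequences ψn(il, jl) = φn(il) − φn(jl) and |z(il, jl)| = |y(il) − y(jl)| ≤ ε and

    $$\operatorname{rank} \Psi(i_1, j_1, \ldots, i_n, j_n) = \operatorname{rank} \begin{pmatrix} \phi_n'(i_1) - \phi_n'(j_1) \\ \vdots \\ \phi_n'(i_n) - \phi_n'(j_n) \end{pmatrix} \ge n - 1. \qquad (4.3)$$
  2. The matrix Ψ(i1, j1, …, in, jn) can be written as

    $$\Psi(i_1, j_1, \ldots, i_n, j_n) = Q + E(\varepsilon)$$

    for some Q and E(ε), where rank Q = n − 1 independent of ε and ||E(ε)|| → 0 as ε → 0. Further, let

    $$\Psi(i_1, j_1, \ldots, i_n, j_n) = U(\varepsilon)\Sigma(\varepsilon)(V_1(\varepsilon), V_2(\varepsilon), \ldots, V_n(\varepsilon))'$$

    be the SVD of Ψ. Then, modulo the ± sign,

    $$\left\|V_n(\varepsilon) - \frac{h_n}{\|h_n\|}\right\| \to 0, \quad \text{as } n \to \infty \qquad (4.4)$$

    or equivalently $\frac{1}{2\pi}\int_{-\pi}^{\pi} \left|G(e^{j\omega}) - \hat{G}(e^{j\omega})\right|^2 d\omega \to 0$.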

Similarly, we can define the identification algorithm where the unknown nonlinearity is monotonic in [f̲, f̄].

Identification algorithm with the monotonic assumption: Consider the system in Figure 1 under Assumption (4.1).

Step 1: Collect data u(k) and y(k) for those y(k) ∈ [f̲, f̄].

Step 2: Sort the collected data in decreasing order y(k1) ≥ y(k2) ≥ … ≥ y(kl).

Step 3: For each n and ε with nε → 0, construct z(ki, ki+1) = y(ki) − y(ki+1), ψn(ki, ki+1) = φn(ki) − φn(ki+1). Construct a submatrix Ψn(i1, j1, …, il, jl) of Ψn by deleting the q-th row if |z(q, q + 1)| > ε.

Step 4: Calculate the SVD Ψn(i1, j1, …, il, jl) = U(ε)Σ(ε)(V1(ε), …, Vn(ε))′.

Step 5: Define ĥn = ±Vn(ε) so that the first non-zero element of ĥn is positive.

Step 6: Set $\hat{G}(z) = \sum_{i=0}^{n-1} \hat{h}(i) z^{-i}$.

As before, ĥn → h and Ĝ(e^{jω}) → G(e^{jω}).
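The six steps can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the function name `identify_monotonic`, the geometric test system, the strictly increasing nonlinearity with an offset (so that no point information f(0) = 0 is available), the output interval and the threshold are all hypothetical.

```python
import numpy as np

def identify_monotonic(u, y, n, eps, y_lo, y_hi):
    N = len(u)
    Phi = np.column_stack([u[n - 1 - i : N - i] for i in range(n)])
    yy = y[n - 1:]
    # Step 1: keep outputs inside the interval where f is assumed one-to-one.
    inside = (yy >= y_lo) & (yy <= y_hi)
    Phi, yy = Phi[inside], yy[inside]
    # Step 2: sort in decreasing order of y.
    order = np.argsort(-yy)
    Phi, yy = Phi[order], yy[order]
    # Step 3: differences of adjacent pairs; drop pairs with |z| > eps.
    z = yy[:-1] - yy[1:]
    Psi = (Phi[:-1] - Phi[1:])[np.abs(z) <= eps]
    if Psi.shape[0] < n:
        raise ValueError("too few admissible pairs; increase N or eps")
    # Steps 4-5: last right singular vector, first non-zero entry positive.
    _, _, Vt = np.linalg.svd(Psi, full_matrices=False)
    h_hat = Vt[-1]
    first = np.flatnonzero(np.abs(h_hat) > 1e-8)[0]   # tolerance is an assumption
    return h_hat if h_hat[first] > 0 else -h_hat

# Hypothetical test system, not the paper's example.
rng = np.random.default_rng(0)
N, n, lam = 3000, 10, 0.5
h = np.sqrt(1 - lam**2) * lam ** np.arange(60)
u = rng.uniform(-1.0, 1.0, N)
x = np.convolve(u, h)[:N]
y = 0.2 + x + 0.3 * np.sin(3.0 * x)     # strictly increasing, f(0) != 0

h_hat = identify_monotonic(u, y, n, eps=0.01, y_lo=-0.8, y_hi=1.2)
err = np.sum((h[:n] / np.linalg.norm(h[:n]) - h_hat) ** 2)
print(err)
```

Because sorted neighbors have nearly equal outputs, almost every pair survives the threshold, which is one way to see why this variant can use far more data than the point-information algorithm.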

We now test the algorithm on the same example (3.15) as in the previous section under the same input but under the assumption that the nonlinearity is monotonic for |y| ≤ 0.7. Figures 5 and 6 show the estimates ĥn and Ĝ(e^{jω}) of hn and G(e^{jω}) respectively when N = 2,000, ε = 0.1, SNR = 20 dB and n = 30 with the estimation error 0.0041.

Fig. 5. ĥn and hn, monotonic assumption.

Fig. 6. Ĝ(e^{jω}) (solid) and G(e^{jω}) (dash-dot), monotonic assumption.

Again, to demonstrate the performance of the identification algorithm, Table 3 shows the estimation errors for various N and SNR when ε = 0.1 and n = 30 and Table 4 shows the estimation errors for various ε and n when N = 1,000 and SNR = 20 dB. All the results are the averages of 50 Monte Carlo simulations.

Table 3. Estimation error vs N and SNR when ε = 0.1 and n = 30, monotonic a priori information.

SNR        10dB     20dB     40dB     (unlabeled)
N=500      0.0670   0.0186   0.0020   0.0000021
N=1,000    0.0315   0.0080   0.0009   0.00000026
N=2,000    0.0142   0.0042   0.0005   0.00000003

Table 4. Estimation error vs ε and n when N = 1,000 and SNR = 20 dB, monotonic a priori information.

           n=15     n=20     n=30
ε = 0.08   0.0037   0.0054   0.0084
ε = 0.1    0.0040   0.0058   0.0086
ε = 0.12   0.0038   0.0051   0.0084

This algorithm seems to outperform the one with point a priori information. One explanation is that this algorithm utilizes the data y ∈ [−0.7, 0.7] while the previous one only uses the data y close to zero. Simply put, more data is available to this algorithm than to the previous one and thus the effect of noise is smaller. We also comment that the nonlinearity is actually non-continuous but the algorithm works anyway. This is because the nonlinearity is non-continuous only at one point for |y| ≤ 0.7. Further, pairs of data collected on the two segments separated by this point will not be used in identification because the difference 0.18 is larger than the threshold ε = 0.1.

5 Quadrant a priori information

In this section, we discuss identification with quadrant or sign a priori information. It is assumed that

Assumption 5.1

sign(x)=sign(f(x))=sign(y).

Clearly, the unknown nonlinearity is strictly in the first and third quadrants and no other information is available. The nonlinearity can be non-smooth and non-monotonic. It is important to comment that the results derived in this section are not limited to a priori information of Assumption 5.1 but apply to sign(y) = −sign(x) or other similar a priori information with minimal modifications.

In this section, we make an additional assumption on the linear part.

Assumption 5.2

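The order m of the linear part is known

$$G(z) = \frac{\alpha_1 z^{m-1} + \alpha_2 z^{m-2} + \cdots + \alpha_m}{z^m + \beta_1 z^{m-1} + \cdots + \beta_m}$$

Obviously, there is no other a priori information and identification has to rely on the knowledge of sign(x(k)) = sign(y(k)). Let (α̂1, …, α̂m, β̂1, …, β̂m)′ denote an estimate of (α1, …, αm, β1, …, βm)′ and

$$\hat{G}(z) = \frac{\hat{\alpha}_1 z^{m-1} + \hat{\alpha}_2 z^{m-2} + \cdots + \hat{\alpha}_m}{z^m + \hat{\beta}_1 z^{m-1} + \cdots + \hat{\beta}_m}$$

an estimate of G(z). Because Ĝ(z) = G(z) if

$$(\hat{\alpha}_1, \ldots, \hat{\alpha}_m, \hat{\beta}_1, \ldots, \hat{\beta}_m) = (\alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_m)$$

our approach to find an estimate is by the following minimization

$$(\hat{\alpha}_1, \ldots, \hat{\alpha}_m, \hat{\beta}_1, \ldots, \hat{\beta}_m) = \arg\min_{\bar{\alpha}_i, \bar{\beta}_j} \sum_{k=1}^{N} \big(\operatorname{sign}(y(k)) - \operatorname{sign}(\bar{y}(k))\big)^2 = \arg\min_{\bar{\alpha}_i, \bar{\beta}_j} \sum_{k=1}^{N} \big(\operatorname{sign}(y(k)) - \operatorname{sign}(\bar{x}(k))\big)^2 \qquad (5.1)$$

subject to ||ĥ|| = 1, where ĥ is the impulse response of the estimate Ĝ(z) and x̄(k) is generated by the input and $\bar{G}(z) = \frac{\bar{\alpha}_1 z^{m-1} + \bar{\alpha}_2 z^{m-2} + \cdots + \bar{\alpha}_m}{z^m + \bar{\beta}_1 z^{m-1} + \cdots + \bar{\beta}_m}$.

It is clear that $\sum_{k=1}^{N} \big(\operatorname{sign}(y(k)) - \operatorname{sign}(\hat{x}(k))\big)^2 = 0$ if (α̂1, …, α̂m, β̂1, …, β̂m) = (α1, …, αm, β1, …, βm). To guarantee that the optimization (5.1) will produce a correct estimate, what we have to show is that for all (α̂1, …, α̂m, β̂1, …, β̂m) ≠ (α1, …, αm, β1, …, βm), $0 < \sum_{k=1}^{N} \big(\operatorname{sign}(x(k)) - \operatorname{sign}(\hat{x}(k))\big)^2$ or equivalently,

$$\big(\operatorname{sign}(x(k)) - \operatorname{sign}(\hat{x}(k))\big)^2 = 4$$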

for some k. The meaning is that the minimization of (5.1) has one and only one solution that is achieved at the true but unknown (α1, …, αm, β1, …, βm).

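Recall h and ĥ are the impulse responses of G(z) and Ĝ(z) respectively. Obviously,

$$h = \hat{h} \Leftrightarrow (\hat{\alpha}_1, \ldots, \hat{\alpha}_m, \hat{\beta}_1, \ldots, \hat{\beta}_m) = (\alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_m)$$

Now, write

$$x(kn) = h_n'\phi_n(kn) + \sum_{i=n}^{\infty} h(i)u(kn - i), \quad \hat{x}(kn) = \hat{h}_n'\phi_n(kn) + \sum_{i=n}^{\infty} \hat{h}(i)u(kn - i)$$

Before presenting the main result of this section, we make a few observations.

  1. φn(kn) and φn(jn) are iid if j ≠ k. Also, φn(kn) is an n-dimensional random vector that assumes any direction with a positive probability. Moreover, ||φn(kn)|| ≥ a/2 with a positive probability.

  2. ||ĥ|| = ||h|| = 1. Thus for any small ε > 0, there is an n1 > 0 such that for all n ≥ n1,

    $$1 \ge \sum_{k=0}^{n-1} h(k)^2 \ge 1 - \varepsilon, \quad 1 \ge \sum_{k=0}^{n-1} \hat{h}(k)^2 \ge 1 - \varepsilon$$

    Further, hn ≠ ĥn if h ≠ ĥ.

  3. For any small ε > 0, there exists n2 > 0 such that for all n ≥ n2,

    $$\left|\sum_{i=n}^{\infty} h(i)u(n - i)\right| \le \varepsilon, \quad \left|\sum_{i=n}^{\infty} \hat{h}(i)u(n - i)\right| \le \varepsilon.$$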

We now state the main results of this section.

Theorem 5.1

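Consider the Wiener system in Figure 1 under Assumptions 5.1 and 5.2, and the estimate $\hat{G}(z) = \frac{\hat{\alpha}_1 z^{m-1} + \hat{\alpha}_2 z^{m-2} + \cdots + \hat{\alpha}_m}{z^m + \hat{\beta}_1 z^{m-1} + \cdots + \hat{\beta}_m}$ derived from the minimization of (5.1). Then, with probability one as N → ∞,

$$\sum_{k=1}^{N} \big(\operatorname{sign}(y(k)) - \operatorname{sign}(\hat{x}(k))\big)^2 \ge 4$$

if (α̂1, …, α̂m, β̂1, …, β̂m) ≠ (α1, …, αm, β1, …, βm).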

Proof

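If (α̂1, …, α̂m, β̂1, …, β̂m) ≠ (α1, …, αm, β1, …, βm) or equivalently ĥ ≠ h, the angle θ = ∠(h, ĥ) between h and ĥ is non-zero. There are two cases, 0 < θ < 90° and 90° ≤ θ ≤ 180°. The proof for the second case is similar and we only show the first case.

From the observations, there is a large (possibly unknown) n and a small (possibly unknown) ε > 0 such that

$$h_n \ne \hat{h}_n, \quad \|h_n\| > 1 - \varepsilon, \quad \|\hat{h}_n\| > 1 - \varepsilon, \quad \left|\sum_{i=n}^{\infty} h(i)u(kn - i)\right| < \varepsilon, \quad \left|\sum_{i=n}^{\infty} \hat{h}(i)u(kn - i)\right| < \varepsilon$$

and

$$0 < \varepsilon < \frac{a}{2}(1 - \varepsilon)\sin(\xi), \quad \frac{1}{4}\theta \le \xi \le \frac{3}{4}\theta.$$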

This is because the right hand side goes to a positive value and the middle term goes to zero as ε → 0. In addition, the angle ∠(hn, ĥn) is between [3/4θ, 5/4θ] for large n because ∠(hn, ĥn) → ∠(h, ĥ). Further, there exists a vector φn(kn) with ||φn(kn)|| ≥ a/2 as shown in Figure 7.

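Fig. 7. hn, ĥn and φn(kn).

Now,

$$\cos\left(90° + \tfrac{1}{4}\theta\right) = -\sin\left(\tfrac{1}{4}\theta\right) < -\frac{2\varepsilon}{a(1 - \varepsilon)}, \quad \cos\left(90° + \tfrac{3}{4}\theta\right) = -\sin\left(\tfrac{3}{4}\theta\right) < -\frac{2\varepsilon}{a(1 - \varepsilon)}, \quad \cos\left(90° - \tfrac{1}{2}\theta\right) = \sin\left(\tfrac{1}{2}\theta\right) > \frac{2\varepsilon}{a(1 - \varepsilon)}$$

which results in

$$h_n'\phi_n(kn) = \|h_n\|\,\|\phi_n(kn)\|\cos(\angle(h_n, \phi_n(kn))) < -\|h_n\|\,\|\phi_n(kn)\|\frac{2\varepsilon}{a(1 - \varepsilon)} < -\varepsilon$$

$$\hat{h}_n'\phi_n(kn) = \|\hat{h}_n\|\,\|\phi_n(kn)\|\cos(\angle(\hat{h}_n, \phi_n(kn))) = \|\hat{h}_n\|\,\|\phi_n(kn)\|\cos\left(90° - \tfrac{1}{2}\theta\right) > \varepsilon$$

Therefore,

$$x(kn) = \sum_{i=n}^{\infty} h(i)u(kn - i) + h_n'\phi_n(kn) < \varepsilon + h_n'\phi_n(kn) < 0$$

$$\hat{x}(kn) = \sum_{i=n}^{\infty} \hat{h}(i)u(kn - i) + \hat{h}_n'\phi_n(kn) > -\varepsilon + \hat{h}_n'\phi_n(kn) > 0$$

or

$$\big(\operatorname{sign}(x(kn)) - \operatorname{sign}(\hat{x}(kn))\big)^2 = 4 \qquad (5.2)$$

By the continuity arguments, any vector close enough to φn(kn) would result in the same conclusion as (5.2). Again from the observations, φn(kn), k = 1, 2, …, is iid, satisfies ||φn(kn)|| ≥ a/2 with a positive probability and assumes any direction with a positive probability. Therefore, there is a positive probability for each k that φn(kn) produces (5.2). More precisely, for each k, there is a positive probability p > 0 that (sign(x(kn)) − sign(x̂(kn)))² = 4 if (α̂1, …, α̂m, β̂1, …, β̂m) ≠ (α1, …, αm, β1, …, βm). By the Borel-Cantelli Lemma with $\sum_{k=1}^{\infty} (1 - p)^k < \infty$, we conclude that with probability one as N → ∞, there is a k such that (sign(x(kn)) − sign(x̂(kn)))² = 4. This completes the proof.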
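The final probabilistic step can be written out as a short worked bound. It assumes, as in the observations above, that the non-overlapping blocks φn(kn), k = 1, 2, …, are independent and that each block produces (5.2) with probability at least p > 0; the symbol K, the number of complete blocks available in N samples, is introduced here only for illustration.

$$P\Big(\big(\operatorname{sign}(x(kn)) - \operatorname{sign}(\hat{x}(kn))\big)^2 \ne 4 \ \text{for all } k = 1, \ldots, K\Big) \le (1 - p)^K \to 0 \quad \text{as } K \to \infty$$

Since K grows without bound as N → ∞, the probability that no sign mismatch is ever observed vanishes, so the cost in (5.1) is eventually at least 4 for any incorrect parameter vector.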

The result presented above is actually weaker than its counterpart for an FIR case as in [2], where not only does the minimization have one and only one global minimum but also there is no other local minimum. In other words, the objective function is a monotonic function of the angle between the estimate and the true but unknown system. We conjecture that the same conclusion holds for the IIR case but do not have a proof yet.

Identification algorithm under quadrant a priori information: Consider the system in Figure 1 under Assumptions (5.1) and (5.2).

Step 1: Collect data φ (k) and y(k), k = 1, …, N.

Step 2: Solve the minimization problem (5.1) to find the estimate (α̂1, …, α̂m, β̂1, …, β̂m).

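Step 3: Define

$$\hat{G}(z) = \frac{\hat{\alpha}_1 z^{m-1} + \hat{\alpha}_2 z^{m-2} + \cdots + \hat{\alpha}_m}{z^m + \hat{\beta}_1 z^{m-1} + \cdots + \hat{\beta}_m}$$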
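The objective in (5.1) is straightforward to evaluate for a candidate parameter vector. The sketch below is an illustration under assumptions: a hypothetical first-order system (m = 1) with a pole at 0.5, a hypothetical nonlinearity confined to the first and third quadrants, and a coarse grid search over β1 used as a simple stand-in for the genetic algorithm mentioned in the text. Because the sign of x̄(k) does not depend on a positive scaling of the numerator, α1 is fixed to 1 during the search and the normalization ||ĥ|| = 1 can be applied afterwards.

```python
import numpy as np

def simulate(alpha, beta, u):
    """x(k) = sum_i alpha_i u(k-i) - sum_i beta_i x(k-i), zero initial conditions."""
    m = len(alpha)
    x = np.zeros(len(u))
    for k in range(len(u)):
        acc = 0.0
        for i in range(1, min(k, m) + 1):
            acc += alpha[i - 1] * u[k - i] - beta[i - 1] * x[k - i]
        x[k] = acc
    return x

def sign_cost(alpha, beta, u, y):
    """Objective (5.1): sum over k of (sign(y(k)) - sign(x_bar(k)))^2."""
    x_bar = simulate(alpha, beta, u)
    return float(np.sum((np.sign(y) - np.sign(x_bar)) ** 2))

# Hypothetical data, not the paper's example.
rng = np.random.default_rng(0)
N = 2000
u = rng.uniform(-1.0, 1.0, N)
alpha_true, beta_true = [1.0], [-0.5]          # G(z) = 1 / (z - 0.5)
x = simulate(alpha_true, beta_true, u)
y = x * (2.0 + np.cos(5.0 * x))                # sign(f(x)) = sign(x), non-monotonic

# Coarse grid over beta_1 (stand-in for the genetic algorithm).
grid = np.round(np.arange(-0.9, 0.9001, 0.05), 2)
costs = [sign_cost([1.0], [b], u, y) for b in grid]
beta_hat = grid[int(np.argmin(costs))]
print(beta_hat, min(costs))
```

The cost is piecewise constant in the parameters, which is why a zero-order search such as a grid or a genetic algorithm is a natural fit; gradient-based methods would see a zero gradient almost everywhere.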

We now test the algorithm on the same example (3.15) as in the previous section under the same input but under the assumption sign(y(k)) = sign(x(k)). A genetic algorithm [1] was applied with n = 30 and N = 5,000 and 10,000. The genetic algorithm is a heuristic zero-order iterative search algorithm. The total number of parents in the genetic algorithm was 64. Figures 8 and 9 show the estimates ĥn and Ĝ(e^{jω}) of hn and G(e^{jω}) respectively when SNR = 20 dB. Table 5 shows the estimation errors for various SNR. All the results are the averages of 50 Monte Carlo simulations.

Fig. 8. ĥn and hn, sign a priori information.

Fig. 9. Ĝ(e^{jω}) (solid) and G(e^{jω}) (dash-dot), sign a priori information.

Table 5. Estimation error, sign a priori information.

SNR        10dB     20dB     40dB     (unlabeled)
N=5,000    0.0093   0.0028   0.0005   0.0003
N=10,000   0.0057   0.0014   0.0002   0.0001

6 Concluding remarks

The focus of this paper is to derive identifiability under various minimal a priori information on the unknown nonlinearity. No theoretical results on noise analysis are presented. Noise effects are however extensively tested in numerical simulations. Theoretical study of noise effects will be an interesting research topic.

Our long term goal is to find what constitutes the least amount of a priori information that makes identification of a Wiener system possible. The findings presented in the paper are useful in this regard but there is still a long way to go to find the answer.

Biographies

graphic file with name nihms106936b1.gifEr-Wei Bai was educated in Fudan University, Shanghai Jiaotong University, both in Shanghai, China, and the University of California at Berkeley. Dr. Bai is Professor of Electrical and Computer Engineering at the University of Iowa where he teaches and conducts research in identification, control, signal processing and their applications in engineering and medicine.

Dr. Bai is an IEEE Fellow and a recipient of the President’s Award for Teaching Excellence.

John Reyland, Jr. is a Principal Digital Signal Processing Engineer at Rockwell Collins, Inc. in Cedar Rapids, Iowa. He is also a Ph.D. candidate in Electrical and Computer Engineering at the University of Iowa. Mr. Reyland has a B.S.E.E. from Texas A&M University and an M.S.E.E. from George Mason University in Fairfax, Virginia.

Footnotes

This paper was not presented at any IFAC meeting. The work was supported in part by NSF ECS-0555394 and NIH/NIBIB EB004287.

References

  • 1. Abe M. Comparison of the Convergence of IIR Evolutionary Digital Filters and Other Adaptive Digital Filters on a Multiple-Peak Surface. Proc of the Thirty-First Asilomar Conference on Signals, Systems & Computers. 1997;2:1674–1678.
  • 2. Bai EW, Reyland J. Towards identification of Wiener systems with the least amount of a priori information on the nonlinearity. Automatica. 2008;44:910–919. doi: 10.1016/j.automatica.2008.11.020.
  • 3. Bai EW. A blind approach to the Hammerstein-Wiener model identification. Automatica. 2002;38:967–979.
  • 4. Billings SA, Fakhouri SY. Identification of a class of nonlinear systems using correlation analysis. Proc of IEE. 1978;125(7):691–697.
  • 5. Bjorck A. Numerical methods for least squares problems. SIAM; 1996.
  • 6. Hu X, Chen HF. Strong consistence of recursive identification for Wiener systems. Automatica. 2005;41:1905–1916.
  • 7. Crama P, Schoukens J. Initial estimates of Wiener and Hammerstein systems using multisine excitation. IEEE Trans on Instrumentation and Measurement. 2001;50:1791–1795.
  • 8. Golub GH, Van Loan C. Matrix Computations. The Johns Hopkins University Press; Baltimore, Maryland: 1984.
  • 9. Fan J, Yao Q. Nonlinear Time Series. Springer; New York: 2003.
  • 10. Greblicki W. Nonparametric identification of Wiener systems. IEEE Trans on Info Theory. 1992;38:1487–1493.
  • 11. Johnson C, Szulc T. Further lower bounds for the smallest singular values. Linear Algebra and its Applications. 1998;272:169–179.
  • 12. Nadaraya EA. Nonparametric Estimation of Probability Densities and Regression Curves. Kluwer Academic Pub; Dordrecht, The Netherlands: 1989.
  • 13. Papoulis A, Pillai SU. Probability, Random Variables and Stochastic Processes. 4th ed. McGraw Hill; Boston: 2002.
  • 14. Voros J. Parameter identification of Wiener systems with discontinuous nonlinearities. Systems and Control Letters. 2001;44(5):363–372.
  • 15. Westwick D, Verhaegen M. Identifying MIMO Wiener systems using subspace model identification method. Signal Processing. 1996;52:235–258.
  • 16. Wigren T. Circle criteria in recursive identification. IEEE Trans on Automatic Control. 1997;42:975–979.
  • 17. Wigren T. Recursive prediction error identification using the nonlinear Wiener model. Automatica. 1993;29:1011–1025.
  • 18. Wigren T. Adaptive filtering using quantized output measurements. IEEE Trans on Signal Processing. 1998;46:3423–3426.
  • 19. Zhang Q, Iouditski A, Ljung L. Identification of Wiener system with monotonous nonlinearity. IFAC Symp on System Identification. Newcastle, Australia: 2006. pp. 166–171.
