Abstract
In this paper, we investigate what constitutes the least amount of a priori information on the nonlinearity so that the linear part is identifiable in the non-Gaussian input case. Under the white noise input, three types of a priori information are considered including quadrant information, point information and monotonic information. In all three cases, identifiability has been established and the corresponding nonparametric identification algorithms are developed along with their convergence proofs.
Keywords: system identification, nonlinear systems, Wiener systems, a priori information
1 Introduction
The Wiener nonlinear system has been used in various applications and identification of such systems has been an active research area for a long time [3,4,6,7,10,14,15,18]. In Wiener system identification, several assumptions are made in the literature. In the case that not enough a priori information on the unknown system is available, a common assumption is a Gaussian random input [4,6,10,15]. Thanks to the Bussgang Theorem [2], identification of Wiener systems is possible. Without the Gaussian assumption, identification of Wiener systems becomes non-trivial. In the case that the nonlinearity is known a priori, identification of the linear part is relatively easy [16,18]. If the nonlinearity is unknown, it is usually assumed that the nonlinearity or the inverse of the nonlinearity is expressed by some known basis functions [3,13] or by a piece-wise polynomial function [14,17]. Therefore, a nonlinear and non-parametric identification problem is reduced to a parameter estimation problem, which is often much simpler because there is no uncertainty in the structure anymore. The key assumption is either the invertibility of the unknown nonlinearity or the availability of some appropriate basis functions. Recently, another approach for Wiener system identification was proposed based on a monotonic assumption [19]. It was shown [19] that if the linear part is FIR and the nonlinearity is monotonic, identification of the FIR linear part is possible using only the input-output data, though the solution is not necessarily unique.
Since only the input-output data is available for identification purposes and no internal signals are available, identification of the linear part and/or the nonlinearity is generally impossible if no assumptions are made on the unknown system or the input. This is why conditions as discussed above are imposed in the literature. However, a fundamental question remains unanswered so far, i.e., what constitutes the least amount of a priori information required for a non-Gaussian white input case in Wiener system identification? Answers to this question have impacts on both theory and applications. Unfortunately, a solution of this question requires not only mathematical quantification of a priori information but also that this quantification be expressed in practical terms for application purposes. This is a hard problem. A closely related but somewhat simplified question is what constitutes the least amount of a priori information on the unknown nonlinearity so that the linear part of the system can be uniquely identified based on input-output data. Obviously, if the linear part can be identified, the static nonlinearity can be consequently identified. Unfortunately, this reformulated and simplified question is again hard to answer because we are facing the same difficulty of mathematical quantification of a priori information. To overcome the difficulty, our approach is to study the problem in an indirect way, i.e., to develop identification algorithms for the linear part based on as little a priori information on the unknown nonlinearity as possible under a white noise input.
The paper is a continuation of our previous work [2], which discusses the same problem but was limited to Wiener systems with an FIR linear part. Several interesting results were obtained in [2]. It was shown, under the FIR assumption, that identification of the linear part is feasible with very little a priori information on the unknown nonlinearity. For instance, quadrant information on the unknown nonlinearity suffices for identification purposes. Note that no exact values of the nonlinearity are needed, only sign information. Further, the nonlinearity can be non-smooth and non-monotonic. In addition, it was shown in [2] that either point a priori or monotonic a priori information also suffices. It became clear [2] that for a Wiener system with an FIR linear part, identification is possible with very little a priori information. It was not clear, however, whether and how similar conclusions could be extended to Wiener systems with an IIR linear part. The proofs used in the FIR case are not easily modifiable to an IIR case. In this paper, we show that the main results derived for FIR cases can be extended to IIR cases, though the extensions are nontrivial and the derivations are more involved. In particular, it is shown that identification of the IIR part is feasible based only on quadrant or point or monotonousness a priori information on the unknown nonlinearity.
The layout of the paper is as follows. The system and problems are introduced in Section 2. Section 3 discusses the point a priori information. With this little a priori information, it is shown that the linear part can be uniquely identified and the corresponding numerical algorithm is also developed along with its convergence proof. In Section 4, the results are extended to a priori information in terms of monotonousness. Similar identifiability results are established. Section 5 is devoted to a priori information in terms of quadrant knowledge of the unknown nonlinearity. It is shown that with quadrant information, the linear part can be uniquely identified. Finally, some concluding remarks are provided in Section 6.
2 Problem statement and preliminary
Fig. 1. Wiener system.
The Wiener system considered in the paper is shown in Figure 1, where the unknown linear and nonlinear parts are represented by
| (2.1) |
respectively. The linear part G(z) is assumed to be stable so that |h(i)| ≤ Mλ^i for some M < ∞ and 0 < λ < 1. No order information on G(z) is available, unless otherwise specified. The input, internal signal and output at time k = 0, 1, …, N are represented by u(k), x(k) and y(k) respectively. The internal signal x(k) is unavailable for identification. The nonlinearity is unknown but bounded for bounded inputs. No structural a priori information on f(·) is assumed.
Because of the embedded scaling ambiguity in Wiener systems, either the linear part or the nonlinear part has to be normalized for identification purposes [2]. It is assumed in the paper that ||h|| = 1,
where h = (h(0), h(1), …)′ is an infinite-dimensional vector representing the impulse response of the linear part. Further, it is assumed that the first non-zero entry of h is positive. All these assumptions are standard to guarantee identifiability. Throughout the paper, it is also assumed that the input u(k) is a bounded independent identically distributed (iid) random sequence and its probability density function is positive over an interval [−a, a] for some 0 < a < 1. No specific distribution on the input is needed and the actual distribution could be unknown. Clearly, all the signals u(k), x(k) and y(k) are bounded because of the stability of the system and bounded inputs.
The goal of identification is to determine an estimate of G(z), based on the input-output data up to time N with little a priori information on the unknown nonlinearity f(·), specified later in subsequent sections, so that
| (2.2) |
as N → ∞ in some probability sense, preferably convergence with probability one. Note again that no order information on G(z) is available.
Now, observe that if the estimate Ĝ(z) is stable, then
where ĥ = (ĥ(0), ĥ(1), …)′ is the impulse response vector of the estimate Ĝ(z). Thus, the identification problem is equivalent to finding ĥ of h so that ĥ → h. Now, given a positive integer n, define
Because of the stability assumption as n → ∞. Thus, ĥ → h if and only if ĥn → hn as n → ∞. Further, ||hn|| → 1 implies
The second term goes to zero as n → ∞ and therefore, as n → ∞,
Hence, it suffices to identify the normalized first n taps of the impulse response of the unknown linear part. In short, the problem of the unknown order is overcome by estimating the impulse response directly.
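The truncation argument above can be checked numerically. The following is a minimal sketch, assuming a hypothetical stable second-order linear part (poles at 0.5 and 0.8, not the example used later in the paper) and a helper `impulse_response` introduced only for illustration. It normalizes a long impulse response to ||h|| = 1 and prints ||hn|| and the tail norm ||h − (hn, 0, 0, …)|| for increasing n.

```python
import numpy as np

def impulse_response(b, a, n):
    """First n impulse-response taps of B(z^-1)/A(z^-1), assuming a[0] == 1."""
    h = np.zeros(n)
    for k in range(n):
        acc = b[k] if k < len(b) else 0.0
        for i in range(1, min(k, len(a) - 1) + 1):
            acc -= a[i] * h[k - i]
        h[k] = acc
    return h

# Hypothetical stable IIR system with poles at 0.5 and 0.8 (an assumption).
b, a = [1.0, 0.4], [1.0, -1.3, 0.4]
h = impulse_response(b, a, 400)   # long truncation standing in for the infinite h
h /= np.linalg.norm(h)            # normalization ||h|| = 1

tails = {}
for n in (5, 10, 20, 40):
    h_n = h[:n]
    tails[n] = np.linalg.norm(h[n:])   # ||h - (h_n, 0, 0, ...)||
    print(n, np.linalg.norm(h_n), tails[n])
```

Because |h(i)| decays geometrically, the tail norm shrinks geometrically in n, so hn/||hn|| and hn become indistinguishable for moderate n.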
3 Point a priori information
In this section, we consider identification of the linear part with point a priori information f(x0) = y0 on the unknown nonlinearity for some known x0 and y0. For simplicity, both x0 and y0 are assumed to be at the origin.
Assumption 3.1
It is assumed that f(x) = 0 if and only if x = 0,
and f(·) is continuous in the neighborhood of the origin.
The condition is based on the local information f(0) = 0 but is stronger than the local point condition f(0) = 0. The implication f(x) = 0 ⇒ x = 0 provides some global information on the nonlinearity, since no other value of x can lead to f(x) = 0.
There are two aspects in identification based on the a priori information f(0) = 0. First, no other a priori information on the unknown f(·) is known, so the observed outputs y(k) ≠ 0 together with the corresponding inputs do not reveal much information on x(k) or on f(·). In other words, theoretically only the outputs y(k) = 0 together with the corresponding inputs u(k) are useful for identification. Practically, however, y(k) = 0 exactly is unlikely and, in fact, relying on it is not robust in the presence of noise. The hope is that, by continuity of f(·) in the neighborhood of the origin, y(k) ≈ 0 implies x(k) ≈ 0, which would result in an estimate ĥn close to hn/||hn||. Thus, the analysis contains two parts. The first part is to show that hn/||hn|| can be identified if there is enough data available under the constraint y(k) = 0. Then, we will show that with the data set |y(k)| ≤ ε for some small ε > 0, the obtained estimate is a continuous function of ε and converges to hn/||hn|| as ε → 0.
For each n, consider a fictitious FIR system
where hn = (h(0), h(1), …, h(n − 1))′ ≠ 0, which is automatically satisfied for large n because ||hn|| → ||h|| = 1. Given the input-output data set, it can be easily verified [2] that hn/||hn|| is identifiable for this fictitious FIR linear system based on the point a priori information f(0) = 0 if and only if there exist some 1 ≤ p1 < p2 < … < pk ≤ N so that x(p1) = x(p2) = … = x(pk) = 0 (or equivalently y(p1) = … = y(pk) = 0) and the corresponding matrix Φ(p1, p2, …, pk) satisfies
| (3.1) |
Further, let Φ (p1, …, pk ) = U Σ(V1, V2, …, Vn)′ be the singular value decomposition (SVD) of Φ(p1, …, pk ). It follows that
| (3.2) |
modulo the ± sign. Therefore, hn/||hn|| is identifiable from the SVD of Φ(p1, …, pk) for data y(p1) = … = y(pk) = 0. Now, the actual system is not FIR but IIR
Define
| (3.3) |
If z(p1) = z(p2) = … = z(pk) = 0, the same conclusion as discussed above applies. Moreover, for large n, y(k) ≈ 0 implies x(k) ≈ 0 and z(k) ≈ 0. The question is whether the SVD of Φ(p1, …, pk) would provide a vector Vn that is close to hn/||hn|| when ε is small but non-zero. To this end, we need some preliminary results.
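The exact-zero case can be illustrated with a short numerical sketch. It assumes that the identifiability condition amounts to the selected regressor matrix having rank n − 1 with hn in its null space, and it uses a hypothetical random direction `h_n`; the rows are constructed artificially so that hn′φ = 0 holds exactly, which real data would satisfy only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Hypothetical unit-norm h_n with a positive first entry (an assumption).
h_n = rng.standard_normal(n)
h_n /= np.linalg.norm(h_n)
if h_n[0] < 0:
    h_n = -h_n

# Regressors with h_n' phi(p) = 0, obtained by projecting random rows
# onto the orthogonal complement of h_n.
raw = rng.uniform(-1.0, 1.0, size=(40, n))
Phi = raw - np.outer(raw @ h_n, h_n)

# The right singular vector of the smallest singular value spans the null space.
_, s, Vt = np.linalg.svd(Phi, full_matrices=False)
v_n = Vt[-1]
if v_n[0] < 0:          # resolve the +/- ambiguity
    v_n = -v_n

print(s[-1], np.linalg.norm(v_n - h_n))
```

With exact zeros the smallest singular value vanishes up to rounding and the last right singular vector recovers the direction of hn, which is the situation the perturbation analysis below relaxes.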
First, for each n, define an orthonormal basis e1, e2, …, en−1 and en = hn/||hn|| in R^n. Construct a truncated cone Ci around each ei, i = 1, 2, …, n, as follows. For i = 1, 2, …, n − 1,
| (3.4) |
where ∠(φ, ej) is the angle between φ and ej, and [−a, a] is the interval in which the input probability density function is positive. For i = n,
| (3.5) |
where as n → ∞. Clearly, if φ ∈ Cn, we have
| (3.6) |
and similarly, if φ ∈ Ci, i = 1, 2, …, n − 1, we have
| (3.7) |
Now recall the definition of φn(ij ) = (u(ij ), u(ij − 1), …, u(ij − n + 1))′. Write each φn(ij) in terms of the basis functions ei’s
where β ji is the projection of φn(ij) on ei, and
| (3.8) |
for some 1 ≤ i1 < i2 < … < in ≤ N.
Lemma 3.1
Consider the Wiener system shown in Figure 1 under Assumption (3.1). Then, we have
- For any given large n and ε(n) satisfying as n → ∞, with probability one as N → ∞, there exists a sequence of φn(ij), j = 1, 2, …, n, so that |y(ij)| ≤ ε and
- The matrix Φ(i1, …, in) can be written as
| (3.9) |
for some Q and E(ε), where rank Q = n − 1 independent of ε and ||E(ε)|| → 0 as ε → 0. Further, let
be the SVD of Φ(i1, i2, …, in). Then, modulo the ± signs,
| (3.10) |
Proof
The proof of the first part is essentially the same as for the FIR case [2] by noting ||R(i1, …, in)|| → 0 as n → ∞. To show the second part, consider a submatrix as in (3.8)
By the construction of the cones Ci’s and the definition of φn(ij), it follows that
which leads to
By the Gershgorin Theorem [11] on singular values, the singular values of the above submatrix satisfy
independent of n. Further, by the fact that
and
| (3.11) |
and
| (3.12) |
we have from the Wielandt-Hoffman Theorem [8] that the first n − 1 singular values of Q satisfy
for large n. Moreover, Qhn = 0, because hn ⊥ ei for i = 1, 2, …, n − 1, implies that the last or the smallest singular value σn = 0 and
for all n. In addition, |βn,j| ≤ ε implies . Now, define
| (3.13) |
It is clear that hn/||hn|| = ±Vn and what is left to show is that Vn(ε) → Vn when n gets larger. To this end, again by the Wielandt-Hoffman Theorem [8], the gap between the smallest singular value and the second smallest singular value of the matrix Q + E(ε) is bounded below by
Now, we apply a version of the Circle Theorem [5] (Eq. 1.2.19)
| (3.14) |
as n → ∞. Since ||Vn|| = ||Vn(ε)|| = 1, the conclusion Vn(ε) → Vn = hn/||hn|| follows. This completes the proof.
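The perturbation step of the proof can be illustrated numerically. The sketch below assumes a hypothetical rank n − 1 matrix `Q` whose null space is spanned by a unit vector `h_n`, and a fixed perturbation direction `E0` scaled by ε; none of these are taken from the paper. It prints the distance between the smallest right singular vector of Q + εE0 and h_n as ε decreases.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
h_n = np.ones(n) / np.sqrt(n)            # hypothetical unit-norm direction

rows = rng.uniform(-1.0, 1.0, size=(40, n))
Q = rows - np.outer(rows @ h_n, h_n)     # Q h_n = 0 and rank Q = n - 1
E0 = rng.uniform(-1.0, 1.0, size=Q.shape)

def smallest_right_singular_vector(M):
    """Right singular vector of the smallest singular value of M."""
    v = np.linalg.svd(M, full_matrices=False)[2][-1]
    # The sign is resolved against h_n for this demonstration only.
    return v if v @ h_n >= 0 else -v

errors = []
for eps in (1e-1, 1e-2, 1e-3):
    v_eps = smallest_right_singular_vector(Q + eps * E0)
    errors.append(np.linalg.norm(v_eps - h_n))
    print(eps, errors[-1])
```

As long as the second smallest singular value of Q stays bounded away from zero, the error shrinks roughly in proportion to the size of the perturbation, which is the behavior the lemma relies on.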
The following result is a direct consequence of the above lemma.
Theorem 3.1
Let ĥn = (ĥ(0), …, ĥ(n − 1))′ = ±Vn(ε) be the estimate of hn/||hn|| so that the first non-zero entry is positive and
Then, as n → ∞
Based on the results, we can collect the data set |y(i1)| ≤ ε, |y(i2)| ≤ ε, …, |y(in)| ≤ ε so that rank Φ(i1, i2, …, in) ≥ n − 1. Then, the SVD of Φ(i1, i2, …, in) provides the estimate ĥn = Vn(ε), modulo the ± sign. A problem is that only the data at times i1, i2, …, in are used and all other data are discarded, which is not efficient and is not robust in the presence of noise. A more efficient way is to use all the data with |y(k)| ≤ ε and the corresponding matrix Φ. The analysis discussed before carries over with no or minimal modifications. At the same time, since more data are used, the average effect of the noise is reduced, making the identification algorithm more robust. We are now in a position to introduce the identification algorithm based on point a priori information.
Identification algorithm with the point a priori information f(0) = 0: Consider the system shown in Figure 1 under Assumption (3.1).
Step 1: Collect data u(k)’s and y(k)’s, k = 1, 2, …, N.
Step 2: For each n, construct a submatrix Φ(i1, i2, …, il) of Φ(1, 2, …, N) by deleting the k-th row if |y(k)| > ε, where as n → ∞.
Step 3: Calculate the SVD Φ(i1, i2, …, il) = U(ε)Σ(ε)(V1(ε), …, Vn(ε))′.
Step 4: Define ĥn = ±Vn(ε) so that the first non-zero element of ĥn is positive.
Step 5: Set .
Then, from the lemma and theorem, for each n, ĥn → hn/||hn|| as N → ∞. Further, as n gets larger and larger, ĥn → h and Ĝ (ejω ) → G(ejω ) in the integral least squares sense.
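The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the linear part is a hypothetical first-order system with impulse response proportional to 0.6^i, the nonlinearity f(x) = x(2 + sin 8x) is a made-up non-monotonic function that vanishes only at the origin, the output is noise-free, and the helper names `fix_sign`, `regressors` and `identify_point` are not from the paper.

```python
import numpy as np

def fix_sign(v, tol=1e-12):
    """Flip v so that its first non-zero entry is positive."""
    idx = np.flatnonzero(np.abs(v) > tol)
    return -v if idx.size and v[idx[0]] < 0 else v

def regressors(u, n):
    """Rows phi_n(k) = (u(k), u(k-1), ..., u(k-n+1)) for k = n-1, ..., N-1."""
    return np.array([u[k - n + 1:k + 1][::-1] for k in range(n - 1, len(u))])

def identify_point(u, y, n, eps):
    """Keep the rows with |y(k)| <= eps and return the sign-fixed V_n(eps)."""
    Phi = regressors(u, n)
    keep = np.abs(y[n - 1:]) <= eps
    if keep.sum() < n:
        raise ValueError("fewer than n rows satisfy |y(k)| <= eps")
    _, _, Vt = np.linalg.svd(Phi[keep], full_matrices=False)
    return fix_sign(Vt[-1]), int(keep.sum())

rng = np.random.default_rng(1)
N, n, eps = 20000, 10, 0.05

h = 0.6 ** np.arange(60)                 # hypothetical IIR impulse response
h /= np.linalg.norm(h)                   # normalization ||h|| = 1
u = rng.uniform(-1.0, 1.0, N)            # iid bounded input
x = np.convolve(u, h)[:N]                # internal signal, not used by the estimator
y = x * (2.0 + np.sin(8.0 * x))          # hypothetical f with f(x) = 0 only at x = 0

h_hat, rows = identify_point(u, y, n, eps)
target = h[:n] / np.linalg.norm(h[:n])
err = np.linalg.norm(h_hat - target) ** 2
print(rows, err)
```

Only `u` and `y` enter `identify_point`; the internal signal `x` is generated solely to produce the data. The estimator returns the normalized first n taps, so the estimate of the full impulse response is obtained by padding with zeros.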
We comment that in the algorithm, the choice of ε is not unique. A small ε discards all the data with |y(k)| > ε and leaves few data to construct the estimate. Thus, for a small ε it takes longer to collect the same number of useful data than for a large ε. On the other hand, a large ε admits y(k) that are not so small, so that the corresponding x(k) are not in the neighborhood of 0 but are mistakenly treated as being near 0 when constructing the estimate. Clearly, this tends to increase the bias and, at the same time, to reduce the variance because more data can be used. So the choice of ε balances bias and variance, which is reminiscent of the choice of the bandwidth in kernel identification [9,12]. The idea is to use local data near y(k) = 0 to identify the linear part without interference from the unknown nonlinearity. Some guidelines are provided in Section 5.4 of [9]. Of course, preferably, any choice of ε needs to be tested on a fresh data set for validation purposes.
We now provide a numerical simulation. Let the linear part be a 4th-order system
| (3.15) |
and the nonlinear part be a non-continuous, non-symmetric and non-monotonic nonlinearity shown in Figure 2,
| (3.16) |
Fig. 2. The unknown nonlinearity.
The input u(k) is iid and uniformly distributed in [−1, 1], and Gaussian noise is added to the output. Figures 3 and 4 show the estimates ĥn, Ĝ(ejω) of hn and G(ejω) respectively when N = 3,000, ε = 0.1, SNR = 20 dB and n = 30 with the estimation error
Fig. 3. ĥn and hn.
Fig. 4. Ĝ(ejω) (solid) and G(ejω) (dash-dot).
The estimate ĥ is defined as ĥ = (ĥ(0), …, ĥ (n − 1), 0, 0, …)′.
To demonstrate the performance of the identification, the algorithm has been simulated for different combinations of SNR, data length N, order n and threshold ε. Table 1 shows the estimation error ||ĥ − h||2 for various N and SNR values when ε and n are fixed at ε = 0.1 and n = 30. Table 2 shows the estimation error ||ĥ − h||2 for various ε and n when N and SNR are fixed at N = 3,000 and SNR = 20 dB. All the results are the averages of 50 Monte Carlo simulations.
Table 1.
Estimation error vs N and SNR when ε = 0.1 and n = 30.
| SNR | 10dB | 20dB | 40dB | ∞ |
|---|---|---|---|---|
| N=1,000 | 0.2552 | 0.0344 | 0.0049 | 0.0027 |
| N=2,000 | 0.0852 | 0.0144 | 0.0022 | 0.0010 |
| N=3,000 | 0.0600 | 0.0087 | 0.0012 | 0.0007 |
Table 2.
Estimation error vs ε and n when N = 3,000 and SNR = 20 dB.
| | n=15 | n=20 | n=30 |
|---|---|---|---|
| ε = 0.08 | 0.0046 | 0.0069 | 0.0106 |
| ε = 0.1 | 0.0038 | 0.0056 | 0.0085 |
| ε = 0.12 | 0.0036 | 0.0050 | 0.0082 |
4 Monotonic nonlinearities
The idea of point a priori information is that, though there is no other information about the nonlinearity, data in the neighborhood of the origin can be used to construct an estimate because the nonlinearity is known at the origin. In this section, we extend the idea to a case where no point a priori information is available but the nonlinearity is assumed to be monotonic in some intervals. More precisely, it is assumed that
Assumption 4.1
There exists an interval −∞ < f̲ < f̄ < ∞ and within the interval f(x) ∈ [f̲, f̄], f(·) is continuous and
Again, we comment that the assumption actually contains some global information on the nonlinearity. Let f(x̲) = f̲ and f(x̄) = f̄. Then, the assumption prevents the nonlinearity from taking any value between f̲ and f̄ anywhere outside of the range (x̲, x̄).
Clearly, f(·) is monotonic if y = f(x) ∈ [f̲, f̄].
Now, define
| (4.1) |
It is easily verified that
| (4.2) |
The equation is reminiscent of (3.3) and is a key for identification based on point a priori information. Note from the monotonic assumption,
and
for small ε1 thanks to the continuity of f, if ε is small enough. Therefore, by re-naming z(i, j) as x(i) and ψn(i, j) as φn(i), everything developed for point a priori information in the previous section can be carried over here. The following result is a straightforward extension of Theorem 3.1.
Theorem 4.1
Consider the system shown in Figure 1 under Assumption (4.1). Assume that the probability density function of y = f(x) is positive in the interval [f̲, f̄]. Then,
- For any n and ε > 0 so that as n → ∞, with probability one as N → ∞, there exist two sequences ψn(il, jl) = φn(il) − φn(jl) and |z(il, jl)| = |y(il) − y(jl)| ≤ ε and
| (4.3) |
- The matrix Ψ(i1, j1, …, in, jn) can be written as
for some Q and E(ε), where rank Q = n − 1 independent of ε and ||E(ε)|| → 0 as ε → 0. Further, let
be the SVD of Ψ. Then, modulo the ± signs,
| (4.4) |
or equivalently .
Similarly, we can define the identification algorithm where the unknown nonlinearity is monotonic in [f̲, f̄].
Identification algorithm with the monotonic assumption: Consider the system in Figure 1 under Assumption (4.1).
Step 1: Collect data u(k)’s and y(k)’s for those y(k) ∈ [f̲, f̄].
Step 2: Sort the collected data in decreasing order y(k1) ≥ y(k2) ≥ … ≥ y(kl).
Step 3: For each n and ε with , construct z(ki, ki+1) = y(ki) − y(ki+1), ψn(ki, ki+1) = φn(ki) − φn(ki+1). Construct a submatrix Ψn(i1, j1, …, il, jl) of Ψn by deleting the q-th row if |z(q, q + 1)| > ε.
Step 4: Calculate the SVD Ψn(i1, j1, …, il, jl) =U (ε)Σ(ε)(V1(ε), …, Vn(ε))′.
Step 5: Define ĥn = ±Vn(ε) so that the first non-zero element of ĥn is positive.
Step 6: Set .
As before, ĥn → h and Ĝ (ejω) → G(ejω).
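The monotonic algorithm can be sketched in the same way as the point-information one. The example below is an illustration under stated assumptions: a hypothetical linear part with impulse response proportional to 0.6^i, a made-up nonlinearity that equals x for |x| ≤ 0.5 (so it is monotonic on the output interval [−0.5, 0.5]) and oscillates with |f(x)| ≥ 0.6 elsewhere, noise-free outputs, and helper names that are not from the paper.

```python
import numpy as np

def fix_sign(v, tol=1e-12):
    """Flip v so that its first non-zero entry is positive."""
    idx = np.flatnonzero(np.abs(v) > tol)
    return -v if idx.size and v[idx[0]] < 0 else v

def regressors(u, n):
    """Rows phi_n(k) = (u(k), u(k-1), ..., u(k-n+1)) for k = n-1, ..., N-1."""
    return np.array([u[k - n + 1:k + 1][::-1] for k in range(n - 1, len(u))])

def identify_monotonic(u, y, n, f_lo, f_hi, eps):
    """Difference regressors of outputs that are adjacent after sorting."""
    Phi = regressors(u, n)
    yv = y[n - 1:]
    inside = np.flatnonzero((yv >= f_lo) & (yv <= f_hi))   # Step 1
    order = inside[np.argsort(-yv[inside])]                # Step 2: decreasing y
    z = yv[order[:-1]] - yv[order[1:]]                     # Step 3: output gaps
    Psi = Phi[order[:-1]] - Phi[order[1:]]
    Psi = Psi[np.abs(z) <= eps]
    if Psi.shape[0] < n:
        raise ValueError("fewer than n usable pairs")
    _, _, Vt = np.linalg.svd(Psi, full_matrices=False)     # Step 4
    return fix_sign(Vt[-1])                                # Step 5

rng = np.random.default_rng(2)
N, n, eps = 5000, 10, 0.01

h = 0.6 ** np.arange(60)                 # hypothetical IIR impulse response
h /= np.linalg.norm(h)
u = rng.uniform(-1.0, 1.0, N)
x = np.convolve(u, h)[:N]
# Hypothetical nonlinearity: identity on |x| <= 0.5, non-monotonic outside.
y = np.where(np.abs(x) <= 0.5, x, np.sign(x) * (0.8 + 0.2 * np.sin(20.0 * x)))

h_hat = identify_monotonic(u, y, n, -0.5, 0.5, eps)
target = h[:n] / np.linalg.norm(h[:n])
err = np.linalg.norm(h_hat - target) ** 2
print(err)
```

Sorting brings outputs with nearly equal values next to each other, so each retained difference ψ is nearly orthogonal to hn even though no single output needs to be close to zero.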
We now test the algorithm on the same example (3.15) as in the previous section under the same input but under the assumption that the nonlinearity is monotonic for |y| ≤ 0.7. Figures 5 and 6 show the estimates ĥn, Ĝ(ejω) of hn and G(ejω) respectively when N = 2,000, ε = 0.1, SNR = 20 dB and n = 30 with the estimation error 0.0041.
Fig. 5. ĥn and hn, monotonic assumption.
Fig. 6. Ĝ(ejω) (solid) and G(ejω) (dash-dot), monotonic assumption.
Again, to demonstrate the performance of the identification algorithm, Table 3 shows the estimation errors for various N and SNR when ε = 0.1 and n = 30 and Table 4 shows the estimation errors for various ε and n when N = 1,000 and SNR = 20 dB. All the results are the averages of 50 Monte Carlo simulations.
Table 3.
Estimation error vs N and SNR when ε = 0.1 and n = 30, monotonic a priori information.
| SNR | 10dB | 20dB | 40dB | ∞ |
|---|---|---|---|---|
| N=500 | 0.0670 | 0.0186 | 0.0020 | 0.0000021 |
| N=1,000 | 0.0315 | 0.0080 | 0.0009 | 0.00000026 |
| N=2,000 | 0.0142 | 0.0042 | 0.0005 | 0.00000003 |
Table 4.
Estimation error vs ε and n when N = 1,000 and SNR = 20 dB, monotonic a priori information.
| | n=15 | n=20 | n=30 |
|---|---|---|---|
| ε = 0.08 | 0.0037 | 0.0054 | 0.0084 |
| ε = 0.1 | 0.0040 | 0.0058 | 0.0086 |
| ε = 0.12 | 0.0038 | 0.0051 | 0.0084 |
This algorithm seems to outperform the one with point a priori information. One explanation is that this algorithm utilizes the data y ∈ [−0.7, 0.7], whereas the previous one only uses the data y close to zero. Simply put, more data are available to this algorithm than to the previous one and thus the effect of noise is smaller. We also comment that the nonlinearity is actually non-continuous but the algorithm works anyway. This is because the nonlinearity is non-continuous only at one point for |y| ≤ 0.7. Further, pairs of data taken from the two segments separated by this point are not used in identification because the jump of 0.18 is larger than the threshold ε = 0.1.
5 Quadrant a priori information
In this section, we discuss identification with quadrant or sign a priori information. It is assumed that
Assumption 5.1
Clearly, the unknown nonlinearity is strictly in the first and third quadrants and no other information is available. The nonlinearity can be non-smooth and non-monotonic. It is important to comment that the results derived in this section are not limited to a priori information of Assumption 5.1 but apply to sign(y) = −sign(x) or other similar a priori information with minimal modifications.
In this section, we make an additional assumption on the linear part.
Assumption 5.2
The order m of the linear part is known.
Obviously, there is no other a priori information and identification has to rely on the knowledge of sign(x(k)) = sign(y(k)). Let (α̂1, …, α̂m, β̂1, …, β̂m)′ denote an estimate of (α1, …, αm, β1, …, βm)′ and
an estimate of G(z). Because Ĝ (z) = G(z) if
our approach to find an estimate is by the following minimization
| (5.1) |
subject to ||ĥ|| = 1, where ĥ is the impulse response of the estimate Ĝ (z) and x̄ (k) is generated by the input and .
It is clear that if (α̂1, …, α̂m, β̂1, …, β̂m) = (α1, …, αm, β1, …, βm). To guarantee that the optimization (5.1) will produce a correct estimate, what we have to show is that for all (α̂1, …, α̂m, β̂1, …, β̂m) ≠ (α1, …, αm, β1, …, βm), or equivalently,
for some k. The meaning is that the minimization of (5.1) has one and only one solution that is achieved at the true but unknown (α1, …, αm, β1, …, βm).
Recall h and ĥ are the impulse responses of G(z) and Ĝ (z) respectively. Obviously,
Now, write
Before presenting the main result of this section, we make a few observations.
- φn(kn) and φn(jn) are iid if j ≠ k. Also, φn(kn) is an n-dimensional random vector that assumes any direction with a positive probability. Moreover, ||φn(kn)|| ≥ a/2 with a positive probability.
- ||ĥ|| = ||h|| = 1. Thus, for any small ε > 0, there is an n1 > 0 such that for all n ≥ n1,
Further, hn ≠ ĥn if h ≠ ĥ.
- For any small ε > 0, there exists n2 > 0 such that for all n ≥ n2,
We now state the main results of this section.
Theorem 5.1
Consider the Wiener system in Figure 1 under Assumptions 5.1 and 5.2, and the estimate derived from the minimization of (5.1). Then, with probability one as N → ∞,
if (α̂1, …, α̂m, β̂1, …, β̂m) ≠ (α1, …, αm, β1, …, βm).
Proof
If (α̂1, …,α̂m, β̂1, …, β̂m) ≠ (α1, …, αm, β1, …, βm) or equivalently ĥ ≠ h, the angle θ = ∠(h, ĥ) between h and ĥ is non-zero. There are two cases, 0 < θ < 90° and 90° ≤ θ ≤ 180°. The proof for the second case is similar and we only show the first case.
From the observations, there is a large (possibly unknown) n and a small (possibly unknown) ε > 0 such that
and
This is because the right hand side goes to a positive value and the middle term goes to zero as ε → 0. In addition, the angle ∠(hn, ĥn) lies in [3θ/4, 5θ/4] for large n because ∠(hn, ĥn) → ∠(h, ĥ). Further, there exists a vector φn(kn) with ||φn(kn)|| ≥ a/2 as shown in Figure 7.
Fig. 7. hn, ĥn and φn(kn).
Now,
which results in
Therefore,
or
| (5.2) |
By the continuity arguments, any vector close enough to φn(kn) would result in the same conclusion as (5.2). Again from the observations, φn(kn), k = 1, 2, …, is iid, satisfies ||φn(kn)|| ≥ a/2 with a positive probability and assumes any direction with a positive probability. Therefore, there is a positive probability for each k that φn(kn) produces (5.2). More precisely, for each k, there is a positive probability p > 0 that (sign(x(kn)) − sign(x̂(kn)))2 = 4 if (α̂1, …, α̂m, β̂1, …, β̂m) ≠ (α1, …, αm, β1, …, βm). By the Borel Lemma, we conclude that with probability one as N → ∞, there is a k such that (sign(x(kn)) − sign(x̂(kn)))2 = 4. This completes the proof.
The result presented above is actually weaker than its counterpart for an FIR case as in [2], where not only does the minimization have one and only one global minimum but also there is no other local minimum. In other words, the objective function is a monotonic function of the angle between the estimate and the true but unknown system. We conjecture the same conclusion holds for the IIR case but do not have any proof yet.
Identification algorithm under quadrant a priori information: Consider the system in Figure 1 under Assumptions (5.1) and (5.2).
Step 1: Collect data φ (k) and y(k), k = 1, …, N.
Step 2: Solve the minimization problem (5.1) to find the estimate (α̂1, …, α̂m, β̂1, …, β̂m).
Step 3: Define
We now test the algorithm on the same example (3.15) as in the previous section under the same input but under the assumption sign(y(k)) = sign(x(k)). A genetic algorithm [1] was applied with n = 30 and N = 5,000 and 10,000. The genetic algorithm is a heuristic zero-order iterative search algorithm. The number of parents in the genetic algorithm was 64. Figures 8 and 9 show the estimates ĥn, Ĝ(ejω) of hn and G(ejω) respectively when SNR = 20 dB. Table 5 shows the estimation errors for various SNR values. All the results are the averages of 50 Monte Carlo simulations.
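The sign-based criterion can be illustrated with a small sketch. Several things here are assumptions rather than the paper's setup: the cost is taken to be the average of (sign(y(k)) − sign(x̂(k)))², the linear part is a hypothetical first-order system x(k) = a·x(k−1) + b·u(k), the nonlinearity is a made-up function confined to the first and third quadrants, and a coarse grid search over the pole replaces the genetic algorithm purely for brevity.

```python
import numpy as np

def simulate_linear(u, b, a):
    """x(k) = a*x(k-1) + b*u(k): hypothetical first-order IIR linear part."""
    x = np.zeros(len(u))
    prev = 0.0
    for k, uk in enumerate(u):
        prev = a * prev + b * uk
        x[k] = prev
    return x

def sign_cost(u, y, b, a):
    """Average squared sign mismatch between y and the candidate's internal signal."""
    x_hat = simulate_linear(u, b, a)
    return np.mean((np.sign(y) - np.sign(x_hat)) ** 2)

rng = np.random.default_rng(3)
N = 5000
a_true, b_true = 0.6, 1.0

u = rng.uniform(-1.0, 1.0, N)
x = simulate_linear(u, b_true, a_true)
y = x * (2.0 + np.sin(8.0 * x))          # hypothetical f with sign(f(x)) = sign(x)

# Grid search over the pole (a stand-in for the genetic algorithm).
# The gain b is fixed because sign(x_hat) does not depend on a positive scaling.
a_grid = [i / 10 for i in range(10)]
costs = [sign_cost(u, y, 1.0, a) for a in a_grid]
best_a = a_grid[int(np.argmin(costs))]

print(best_a, costs)
print(sign_cost(u, y, 3.0, a_true))      # rescaled gain gives the same signs
```

The cost is zero at the true pole and positive at every other grid point, while rescaling the gain leaves it unchanged, which is why a normalization such as ||ĥ|| = 1 is imposed in the minimization.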
Fig. 8. ĥn and hn, sign a priori information.
Fig. 9. Ĝ(ejω) (solid) and G(ejω) (dash-dot), sign a priori information.
Table 5.
Estimation error, sign a priori information.
| SNR | 10dB | 20dB | 40dB | ∞ |
|---|---|---|---|---|
| N=5,000 | 0.0093 | 0.0028 | 0.0005 | 0.0003 |
| N=10,000 | 0.0057 | 0.0014 | 0.0002 | 0.0001 |
6 Concluding remarks
The focus of this paper is to derive identifiability under various minimal a priori information on the unknown nonlinearity. No theoretical results on noise analysis are presented. Noise effects are however extensively tested in numerical simulations. Theoretical study of noise effects will be an interesting research topic.
Our long-term goal is to find what constitutes the least amount of a priori information that makes identification of a Wiener system possible. The findings presented in the paper are useful in this regard but there is still a long way to go to find the answer.
Biographies
Er-Wei Bai was educated in Fudan University, Shanghai Jiaotong University, both in Shanghai, China, and the University of California at Berkeley. Dr. Bai is Professor of Electrical and Computer Engineering at the University of Iowa where he teaches and conducts research in identification, control, signal processing and their applications in engineering and medicine.
Dr. Bai is an IEEE Fellow and a recipient of the President’s Award for Teaching Excellence.
John Reyland, Jr. is a Principal Digital Signal Processing Engineer at Rockwell Collins, Inc. in Cedar Rapids, Iowa. He is also a Ph.D. candidate in Electrical and Computer Engineering at the University of Iowa. Mr. Reyland has a B.S.E.E. from Texas A&M University and an M.S.E.E. from George Mason University in Fairfax, Virginia.
Footnotes
This paper was not presented at any IFAC meeting. The work was supported in part by NSF ECS-0555394 and NIH/NIBIB EB004287.
References
- 1. Abe M. Comparison of the Convergence of IIR Evolutionary Digital Filters and Other Adaptive Digital Filters on a Multiple-Peak Surface. Proc the Thirty-First Asilomar Conference on Signals, Systems & Computers. 1997;2:1674–1678.
- 2. Bai EW, Reyland J. Towards identification of Wiener systems with the least amount of a priori information on the nonlinearity. Automatica. 2008;44:910–919. doi: 10.1016/j.automatica.2008.11.020.
- 3. Bai EW. A blind approach to the Hammerstein-Wiener model identification. Automatica. 2002;38:967–979.
- 4. Billings SA, Fakhouri SY. Identification of a class of nonlinear systems using correlation analysis. Proc of IEE. 1978;125(7):691–697.
- 5. Bjorck A. Numerical methods for least squares problems. SIAM publisher; 1996.
- 6. Hu X, Chen HF. Strong consistence of recursive identification for Wiener systems. Automatica. 2005;41:1905–1916.
- 7. Crama P, Schoukens J. Initial estimates of Wiener and Hammerstein systems using multisine excitation. IEEE Trans on Instrumentation and Measurement. 2001;50:1791–1795.
- 8. Golub GH, Van Loan C. Matrix Computations. The Johns Hopkins University Press; Baltimore, Maryland: 1984.
- 9. Fan J, Yao Q. Nonlinear Time Series. Springer; New York: 2003.
- 10. Greblicki W. Nonparametric identification of Wiener systems. IEEE Trans on Info Theory. 1992;38:1487–1493.
- 11. Johnson C, Szulc T. Further lower bounds for the smallest singular values. Linear Algebra and its Applications. 1998;272:169–179.
- 12. Nadaraya EA. Nonparametric Estimation of Probability Densities and Regression Curves. Kluwer Academic Pub; Dordrecht, The Netherlands: 1989.
- 13. Papoulis A, Pillai SU. Probability, Random Variables and Stochastic Processes. 4. McGraw Hill; Boston: 2002.
- 14. Voros J. Parameter identification of Wiener systems with discontinuous nonlinearities. Systems and Control Letters. 2001;44(5):363–372.
- 15. Westwick D, Verhaegen M. Identifying MIMO Wiener systems using subspace model identification method. Signal Processing. 1996;52:235–258.
- 16. Wigren T. Circle criteria in recursive identification. IEEE Trans on Automatic Control. 1997;42:975–979.
- 17. Wigren T. Recursive prediction error identification using the nonlinear Wiener model. Automatica. 1993;29:1011–1025.
- 18. Wigren T. Adaptive filtering using quantized output measurements. IEEE Trans on Signal Processing. 1998;46:3423–3426.
- 19. Zhang Q, Iouditski A, Ljung L. Identification of Wiener system with monotonous nonlinearity. IFAC Symp on System Identification. Newcastle, Australia; 2006. pp. 166–171.



