Circuits Syst. Signal Process. 2021 Aug 12;41(2):915–932. doi: 10.1007/s00034-021-01809-3

Identification of the ARX Model with Random Impulse Noise Based on Forgetting Factor Multi-error Information Entropy

Shaoxue Jing
PMCID: PMC8359771  PMID: 34404959

Abstract

Entropy has been widely applied in system identification in the last decade. In this paper, a novel stochastic gradient algorithm based on minimum Shannon entropy is proposed. Though it needs less computation than the mean square error algorithm, the traditional stochastic gradient algorithm converges relatively slowly. To make the convergence faster, a multi-error method and a forgetting factor are integrated into the algorithm. The scalar error is replaced by a vector of stacked errors. Further, a simple step size method is proposed and a forgetting factor is adopted to adjust the step size. The proposed algorithm is used to estimate the parameters of an ARX model with random impulse noise. Several numerical simulations and a case study indicate that the proposed algorithm obtains more accurate estimates than the traditional gradient algorithm and converges faster.

Keywords: ARX model, Parameter estimation, Minimum error entropy, Information gradient, Multi-error, Forgetting factor

Introduction

The ARX model is an AutoRegressive model with eXogenous terms [31]. Because of its simplicity and easy parameterization, the ARX model has been widely used to model many real systems, with applications in micro-turbines, data improving, fault detection, biomedical signals, COVID-19 case forecasting and communication systems [1, 3, 7, 28, 34, 44].

Much research has been performed to identify ARX models in the last five decades. A piecewise auto-regressive exogenous structure was adopted to forecast the river floods [18]. A novel automated framework based on generalized spectral decomposition was proposed to estimate the parameters of an ARX model [33]. A new method based on the expectation–maximization (EM) algorithm was utilized for the identification of ARX models subject to missing data [24]. A recursive EM algorithm based on Student’s t-distribution was used for robust identification of ARX models [8]. A modified momentum gradient descent algorithm was investigated to identify ARX models [50]. A three-stage algorithm was studied for the identification of fractional differencing ARX with errors in variables [25].

However, most of the noises considered in the aforementioned papers are white noises or Gaussian noises. Random impulse signals can often be found in industrial signals, such as image signals, audio signals and communication signals  [2, 9, 46].

Identification criteria play an important role in system identification. The classical identification criteria include the least square criterion, maximum likelihood criterion and so on. These criteria have the advantages of simple calculation and easy theoretical analysis. However, the performance of the least-squares algorithm is poor for the non-Gaussian case, and the maximum likelihood algorithm needs to know the conditional probability density of the sample. Because of these problems, many researchers have put forward many other criteria, such as p-norm error criterion and mixed-norm error criterion [39, 53]. In recent years, information criteria have become more widespread in signal processing and system identification [14, 23, 32]. Compared with mean square error (MSE) criterion, which focuses on second-order statistics, the information-theoretic criterion (e.g., minimum error entropy (MEE) [4], Renyi’s entropy [15, 41], fixed-point maximum correntropy [21]) is related to various statistical behaviors of the probability density function (pdf) of the error. Algorithms based on information-theoretic criterion may have better performance than MSE-based algorithms [6, 15, 16].

In the last decade, entropy has found significant applications in system identification. A maximum correntropy criterion (MCC) algorithm was proposed for sparse system identification based on normalized adaptive filter theory [30]. An extended version of correntropy, whose center can be located at any position, and a new optimization criterion based on MCC with a variable center, were proposed [5]. A blocked proportionate normalized maximum correntropy algorithm and a separable maximum correntropy adaptive algorithm were presented to identify dynamic systems [29, 45].

To decrease the entropy estimators’ complexity, a stochastic information gradient (SIG) algorithm was proposed and its performance was investigated [14]. To improve the estimates, a joint stochastic gradient algorithm based on MSE and MEE was proposed and applied to identify an RBF network [4]. Though having less complexity, the SIG converges very slowly. To speed up the SIG, like the multi-innovation used in [12], a multi-error strategy is adopted and a feasible equation for calculation of the step size is introduced.

Since its introduction in 2003, the SIG algorithm has been widely used in system identification and machine learning. For example, a kernel-based gradient descent algorithm based on MEE was proposed to find nonlinear structures in the data, and its convergence rate was deduced [22]. A kernel adaptive filter for quaternion data was developed, and a new algorithm based on the SIG approach was applied to this filter [37]. To avoid unstable training or poor performance in deep learning, a strategy of directly estimating the gradients of information measures with respect to model parameters was explored, and a general gradient estimation method for information measures was proposed [52]. To avoid potentially sub-optimal solutions with respect to class separability, a dimensionality reduction network training procedure based on the stochastic estimate of the mutual information gradient was presented [38].

For several decades, data-driven techniques have been used in modeling and fault detection. For example, a surrogate model was developed based on a data-driven approach to facilitate the design and optimization of permanent magnet systems [36]. A Matlab toolbox for data-based fault detection was developed in a unified data-driven framework [27]. A new recursive total principle component regression-based design and implementation approach was proposed for efficient data-driven fault detection and applied to vehicular cyber-physical systems [26].

In this paper, the problem of parameter identification of the ARX model disturbed by random impulse noise is studied. The premise is that the structure of the model is known, the type of noise is known, the identification data are normal measurement data, and there are no outliers except impulse noise. The possible outliers, modeling errors, and other uncertainties in practice are not considered. Interested readers can refer to the recent literature [13, 47–49]. The main contributions of this work are as follows:

  1. For the SIG algorithm, a simple step size method is proposed.

  2. To make the algorithm faster, a multi-error method that uses stack error instead of instantaneous scalar error is applied.

  3. Since the stack length can only be taken as an integer, a forgetting factor is used to further accelerate the algorithm.

  4. The proposed algorithm is utilized to estimate the parameters of an ARX model with random impulse noise. Several numerical simulations and a case study show the effectiveness of the algorithm.

The rest of this work is organized as follows. In the next section, we describe the ARX model to be estimated. Based on an SIG algorithm in Sect. 3, a multi-error SIG with a forgetting factor is presented in Sect. 4. The convergence and the computational cost are analyzed in Sect. 5. Then, parameter estimation of an ARX model with a random impulse noise and a gas furnace dataset from the literature [42] is used to validate the proposed algorithm in Sect. 6. Finally, conclusions are presented in Sect. 7.

Problem Description

Consider an ARX model depicted in Fig. 1, where $u(k)$ is the input and $y(k)$ is the output. $A(z^{-1})$ and $B(z^{-1})$ are two polynomials with respect to $z^{-1}$, and their degrees are $n_a$ and $n_b$, respectively. The model is polluted by a random impulse noise $v(k)$.

Fig. 1. Block diagram of an ARX model

It is easy to find that

$$y(k)=\frac{B(z^{-1})}{A(z^{-1})}u(k)+\frac{1}{A(z^{-1})}v(k). \tag{1}$$

Multiplying both sides of Eq. 1 by $A(z^{-1})$ gives

$$A(z^{-1})y(k)=B(z^{-1})u(k)+v(k). \tag{2}$$

Suppose $A(z^{-1})=1+a_1z^{-1}+a_2z^{-2}+\cdots+a_{n_a}z^{-n_a}$ and $B(z^{-1})=b_1z^{-1}+b_2z^{-2}+\cdots+b_{n_b}z^{-n_b}$; then we can directly parameterize the model as follows,

$$y(k)=\left[1-A(z^{-1})\right]y(k)+B(z^{-1})u(k)+v(k)=\varphi^T(k)\theta+v(k), \tag{3}$$

with

$$\theta=[a_1,\ldots,a_{n_a},b_1,\ldots,b_{n_b}]^T\in\mathbb{R}^{n\times 1},\quad \varphi(k)=[-y(k-1),\ldots,-y(k-n_a),u(k-1),\ldots,u(k-n_b)]^T\in\mathbb{R}^{n\times 1},\quad n=n_a+n_b. \tag{4}$$

Then, the identification of the ARX model shown in Fig. 1 can be transformed into the estimation of the parameters $\theta$ based on the observations $\{u(k),y(k)\}_{k=1}^{N}$, where $N$ is the data length.
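The parameterization in Eqs. 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration and not the author's code: the function names `regressor` and `simulate_arx` are hypothetical, indices start at 0 instead of 1, and samples before the start of the record are assumed to be zero.

```python
def regressor(y, u, k, na, nb):
    """Build phi(k) = [-y(k-1),...,-y(k-na), u(k-1),...,u(k-nb)] (Eq. 4).

    Assumption: samples with a negative time index are taken as 0.
    """
    def past(signal, lag):
        return signal[k - lag] if k - lag >= 0 else 0.0

    return ([-past(y, j) for j in range(1, na + 1)]
            + [past(u, j) for j in range(1, nb + 1)])


def simulate_arx(theta, u, v, na, nb):
    """Generate y(k) = phi(k)^T theta + v(k) (Eq. 3) for k = 0..len(u)-1."""
    y = []
    for k in range(len(u)):
        y.append(0.0)  # placeholder; the regressor only reads y[k-1], y[k-2], ...
        phi = regressor(y, u, k, na, nb)
        y[k] = sum(p * t for p, t in zip(phi, theta)) + v[k]
    return y
```

With `theta` ordered as in Eq. 4, the same `regressor` function can be reused by any recursive estimator that works on the pairs $(\varphi(k), y(k))$.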

However, traditional identification algorithms, such as the least-squares algorithm and the mean square error algorithm, only consider the second moment of the error, and in some cases (such as the presence of random impulse noise) their identification results deteriorate. The information criterion algorithm based on a probability density function (pdf) considers the statistical information of each order of the error and is expected to achieve better estimates.

Next, we introduce the SIG algorithm and then describe our algorithm based on information gradient, which is integrated with the multi-error and a forgetting factor.

SIG of Shannon’s Error Entropy

Consider the parameterized system in Eq. 3, and denote a random error $e(k)$ as

$$e(k)=\bar{y}(k)-\varphi^T(k)\theta, \tag{5}$$

where $\bar{y}(k)$ is the system output without noise.

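Shannon’s entropy for $e$ with pdf $f(e)$ is [43]

$$H(e)=-\int_{-\infty}^{\infty}f(e)\log f(e)\,\mathrm{d}e=E\left[-\log f(e)\right]. \tag{6}$$

In practice, the pdf of $e$, i.e., $f(e)$, is unknown. Thus, Eq. 6 cannot be used to calculate the entropy of $e$ directly. One way is to utilize a Parzen window to approximate the unknown pdf underlying the $N$ observations by [41]

$$\hat{f}(e)=\frac{1}{N}\sum_{i=1}^{N}\kappa_\sigma\left(e-e(i)\right), \tag{7}$$

where $\kappa_\sigma(\cdot)$ is the kernel function with size $\sigma$ [40].

At time $k$, a Parzen window estimate of the pdf of $e$ with window length $L$ is

$$\hat{f}(e(k))=\frac{1}{L}\sum_{i=k-L}^{k-1}\kappa_\sigma\left(\Delta_{ki}\right), \tag{8}$$

where $\Delta_{ki}=e(k)-e(i)$, and $e(k)$, $e(i)$ denote the errors at times $k$ and $i$, respectively.

Thus, the stochastic entropy estimate at time $k$ becomes

$$\hat{H}(e(k))=E\left[-\log\left(\frac{1}{L}\sum_{i=k-L}^{k-1}\kappa_\sigma\left(\Delta_{ki}\right)\right)\right]. \tag{9}$$

Dropping the expectation in Eq. 9 [14], we obtain

$$\hat{H}(e(k))\approx-\log\left(\frac{1}{L}\sum_{i=k-L}^{k-1}\kappa_\sigma\left(\Delta_{ki}\right)\right). \tag{10}$$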
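As a small illustration of Eqs. 8 and 10, the sketch below evaluates the windowed entropy estimate for a list of stored errors. It assumes the Gaussian kernel that the paper adopts in Eq. 12; the function names and the 0-based indexing are hypothetical choices made for this example only.

```python
import math


def gaussian_kernel(delta, sigma):
    """Gaussian kernel of Eq. 12: exp(-delta^2 / (2 sigma^2))."""
    return math.exp(-delta * delta / (2.0 * sigma * sigma))


def entropy_estimate(errors, k, L, sigma):
    """Stochastic entropy estimate of Eq. 10 at time index k.

    Uses the L errors e(k-L), ..., e(k-1) as the Parzen window (Eq. 8).
    Requires k >= L.
    """
    window = errors[k - L:k]
    mean_kernel = sum(gaussian_kernel(errors[k] - e_i, sigma)
                      for e_i in window) / L
    return -math.log(mean_kernel)
```

When the current error coincides with all errors in the window, every kernel value equals 1 and the estimate is 0; the estimate grows as the current error moves away from the recent ones.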

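The stochastic gradient of Shannon entropy with respect to $\theta$ at time $k$, denoted $g$, is

$$g=-\frac{\sum_{i=k-L}^{k-1}\kappa'_\sigma\left(\Delta_{ki}\right)\left(\frac{\partial e(k)}{\partial\theta}-\frac{\partial e(i)}{\partial\theta}\right)}{\sum_{i=k-L}^{k-1}\kappa_\sigma\left(\Delta_{ki}\right)}, \tag{11}$$

where $\kappa'_\sigma(\cdot)$ is the derivative of the kernel function.

Using the following Gaussian kernel with variance $\sigma^2$, i.e.,

$$\kappa_\sigma\left(\Delta_{ki}\right)=\exp\left(-\frac{\Delta_{ki}^2}{2\sigma^2}\right), \tag{12}$$

Equation 11 becomes

$$g=\frac{\sum_{i=k-L}^{k-1}\kappa_\sigma\left(\Delta_{ki}\right)\epsilon_{ki}\Delta_{ki}}{\sigma^2\sum_{i=k-L}^{k-1}\kappa_\sigma\left(\Delta_{ki}\right)}, \tag{13}$$

with

$$\epsilon_{ki}=\varphi(k)-\varphi(i). \tag{14}$$

The SIG for estimating the parameter vector $\theta$ is obtained as follows:

$$\hat{\theta}(k)=\hat{\theta}(k-1)+\eta(k)g, \tag{15}$$

where $\eta(k)$ is the step size, which is critical for convergence speed. However, the equations for calculating the step size in [4] and [14] are too complicated to evaluate online. Here, we use the step size equation of the stochastic gradient algorithm [10]:

$$r(k)=r(k-1)+\|\varphi(k)\|^2,\quad r(0)=1,\quad \eta(k)=\frac{1}{r(k)}. \tag{16}$$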

In practice, both $\theta$ and the noise-free output $\bar{y}(k)$ in Eq. 5 are unknown. A feasible way is to replace them by the previous estimate $\hat{\theta}(k-1)$ and the measured output $y(k)$, respectively. Thus, Eq. 5 becomes

$$e(k)=y(k)-\varphi^T(k)\hat{\theta}(k-1). \tag{17}$$
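One SIG iteration (Eqs. 12 to 17) can be sketched as follows. This is an illustrative sketch rather than the author's implementation: the name `sig_step` is hypothetical, indices are 0-based, and all errors in the window are assumed to be evaluated with the current estimate $\hat{\theta}(k-1)$, a detail the text does not spell out. The direction $g$ follows Eq. 13 as printed and enters the update with the plus sign of Eq. 15.

```python
import math


def sig_step(theta, r, phis, ys, L, sigma):
    """One SIG update at the newest sample k = len(ys) - 1.

    phis[j] is the regressor phi(j) and ys[j] the measured output y(j).
    Requires at least L + 1 samples. Returns the new (theta, r).
    """
    k = len(ys) - 1
    n = len(theta)

    def err(j):
        # Eq. 17, evaluated with the current estimate (assumption).
        return ys[j] - sum(p * t for p, t in zip(phis[j], theta))

    e_k = err(k)
    num = [0.0] * n
    den = 0.0
    for i in range(k - L, k):
        delta = e_k - err(i)                                   # Delta_ki
        kappa = math.exp(-delta * delta / (2.0 * sigma ** 2))  # Eq. 12
        den += kappa
        for j in range(n):
            # kappa * eps_ki * Delta_ki, with eps_ki = phi(k) - phi(i) (Eq. 14)
            num[j] += kappa * (phis[k][j] - phis[i][j]) * delta
    r = r + sum(p * p for p in phis[k])                        # Eq. 16
    if den == 0.0:
        # All kernel values underflowed; skip the parameter update.
        return list(theta), r
    g = [x / (sigma ** 2 * den) for x in num]                  # Eq. 13
    return [t + g_j / r for t, g_j in zip(theta, g)], r       # Eq. 15
```

In a full identification run this function would be called once per new sample, with `r` initialised to 1 as in Eq. 16.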

Forgetting Factor Multi-error SIG Algorithm

One drawback of the SIG algorithm is its slow convergence. To enable the algorithm to converge faster, a multi-error strategy is adopted and Eq. 13 is rewritten as follows:

$$g=\frac{\sum_{i=k-L}^{k-1}\kappa_\sigma\left(\Delta_{ki}\right)\Delta\Phi(p;k,i)\,\Delta E(p;k,i)}{\sigma^2\sum_{i=k-L}^{k-1}\kappa_\sigma\left(\Delta_{ki}\right)} \tag{18}$$

with

$$\Delta\Phi(p;k,i)=\Phi(p,k)-\Phi(p,i),\quad \Delta E(p;k,i)=E(p,k)-E(p,i), \tag{19}$$

where $p$ is the stack length and $E(p,k)$ and $\Phi(p,k)$ are the stacked error vector and stacked information matrix, respectively,

$$E(p,k)=[e(k),e(k-1),\ldots,e(k-p+1)]^T\in\mathbb{R}^{p\times 1}, \tag{20}$$

and

$$\Phi(p,k)=[\varphi(k),\varphi(k-1),\ldots,\varphi(k-p+1)]\in\mathbb{R}^{n\times p}. \tag{21}$$

Note that the scalar error $\Delta_{ki}$ in Eq. 13 is replaced by the vector error $\Delta E(p;k,i)$ in Eq. 18. In other words, multi-error takes the place of a single error. Thus, the algorithm is named a multi-error SIG algorithm (ME-SIG).

The stack length $p$ can only be a positive integer. To make the ME-SIG faster, a forgetting factor (FF) $\lambda$ is introduced. The first equation of Eq. 16 becomes

$$r(k)=\lambda r(k-1)+\|\varphi(k)\|^2,\quad r(0)=1. \tag{22}$$

Equations 15 and 17–22, together with the step size $\eta(k)=1/r(k)$, constitute the FF-ME-SIG algorithm.
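A minimal sketch of one FF-ME-SIG iteration, following Eqs. 15 and 17 to 22 as printed, is given below. The name `ff_me_sig_step` is hypothetical, indices are 0-based, and every stacked error is assumed to be computed with the current estimate $\hat{\theta}(k-1)$; the text does not state which estimate is used for the older entries of $E(p,k)$. Setting `p = 1` and `lam = 1.0` reduces the step to the plain SIG update.

```python
import math


def ff_me_sig_step(theta, r, phis, ys, k, L, p, sigma, lam):
    """One FF-ME-SIG update at time index k. Returns the new (theta, r)."""
    if k < L + p - 1:
        raise ValueError("need k >= L + p - 1 so that all stacks are defined")
    n = len(theta)
    # Errors of Eq. 17 for samples 0..k under the current estimate (assumption).
    e = [ys[j] - sum(a * t for a, t in zip(phis[j], theta))
         for j in range(k + 1)]
    num = [0.0] * n
    den = 0.0
    for i in range(k - L, k):
        delta = e[k] - e[i]                                    # Delta_ki
        kappa = math.exp(-delta * delta / (2.0 * sigma ** 2))  # Eq. 12
        den += kappa
        for j in range(p):
            # Column j of DeltaPhi(p;k,i) times entry j of DeltaE(p;k,i)
            d_e = e[k - j] - e[i - j]
            for m in range(n):
                num[m] += kappa * (phis[k - j][m] - phis[i - j][m]) * d_e
    r = lam * r + sum(a * a for a in phis[k])                  # Eq. 22
    if den == 0.0:
        # All kernel values underflowed; skip the parameter update.
        return list(theta), r
    g = [x / (sigma ** 2 * den) for x in num]                  # Eq. 18
    return [t + g_m / r for t, g_m in zip(theta, g)], r       # Eq. 15
```

Because $\lambda<1$ keeps $r(k)$ from growing without bound, the step size $1/r(k)$ stays larger than in Eq. 16, which is the mechanism by which the forgetting factor speeds up the recursion.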

Performance Analysis

Convergence Analysis

The approximate linearization approach [17] is used to analyze the convergence of the proposed ME-SIG algorithm in Eq. (15) with Eqs. (18) and (19).

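Subtracting $\theta_0$ from both sides of Eq. (15), we obtain

$$\tilde{\theta}(k)=\tilde{\theta}(k-1)+\eta(k)g, \tag{23}$$

where $\tilde{\theta}(k)=\hat{\theta}(k)-\theta_0$ is the estimation error vector of the parameter.

Approximating the gradient $g$ in Eq. (18) by its first-order Taylor expansion gives

$$g\approx g(\theta_0)+H_1\left(\hat{\theta}(k-1)-\theta_0\right)=H_1\left(\hat{\theta}(k-1)-\theta_0\right)=H_1\tilde{\theta}(k-1), \tag{24}$$

where $H_1=\partial g^T(\theta_0)/\partial\theta$ is the Hessian matrix and is expressed as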

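$$H_1=\frac{\sum_{i=k-L}^{k-1}\kappa'_\sigma\left(\Delta_{ki}\right)\Delta E^T(p;k,i)\Delta\Phi^T(k,i)}{\sigma^2\sum_{i=k-L}^{k-1}\kappa_\sigma\left(\Delta_{ki}\right)}+\frac{\sum_{i=k-L}^{k-1}\kappa_\sigma\left(\Delta_{ki}\right)\Delta E^T(p;k,i)\Delta\Phi^T(k,i)}{\sigma^2\sum_{i=k-L}^{k-1}\kappa_\sigma\left(\Delta_{ki}\right)}-\frac{\sum_{i=k-L}^{k-1}\kappa_\sigma\left(\Delta_{ki}\right)\Delta E^T(p;k,i)\Delta\Phi^T(k,i)\sum_{i=k-L}^{k-1}\kappa'_\sigma\left(\Delta_{ki}\right)}{\sigma^2\sum_{i=k-L}^{k-1}\kappa_\sigma^2\left(\Delta_{ki}\right)}. \tag{25}$$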

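Substituting Eq. (24) into Eq. (23), we obtain

$$\tilde{\theta}(k)=\tilde{\theta}(k-1)+\eta(k)H_1\tilde{\theta}(k-1). \tag{26}$$

We analyze the convergence of Eq. 26 by borrowing results from the LMS convergence theory [19, 20]. Assume that the Hessian matrix $H_1$ is a normal matrix and can be decomposed into the following normal form:

$$H_1=Q_1\Lambda_1Q_1^{-1}, \tag{27}$$

where $Q_1$ is an $m\times m$ orthogonal matrix, $\Lambda_1=\mathrm{diag}[\gamma_1,\gamma_2,\ldots,\gamma_m]$, and $\gamma_i$ is an eigenvalue of $H_1$. Then, the recursive Eq. 26 can be expressed as

$$\tilde{\theta}(k)=Q_1\left[I+\eta(k)\Lambda_1\right]Q_1^{-1}\tilde{\theta}(k-1)=Q_1\prod_{i=1}^{k}\left[I+\eta(i)\Lambda_1\right]Q_1^{-1}\tilde{\theta}(0). \tag{28}$$

Clearly, $\tilde{\theta}(k)\to 0$, i.e., $\hat{\theta}(k)\to\theta_0$, if the following conditions are satisfied:

$$\left|1+\eta(i)\gamma_j\right|<1,\quad i=1,2,\ldots,k,\quad j=1,2,\ldots,m. \tag{29}$$

Thus, a sufficient condition that ensures the convergence of the algorithm is as follows:

$$\gamma_j<0,\quad j=1,2,\ldots,m,\qquad 0<\eta(i)<\frac{2}{\max_j|\gamma_j|}. \tag{30}$$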
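The conditions in Eqs. 29 and 30 are easy to check numerically once the eigenvalues are available. The sketch below assumes real eigenvalues supplied as a plain list; the function names and the example eigenvalues are hypothetical and do not come from the paper.

```python
def step_size_bound(eigs):
    """Upper bound 2 / max_j |gamma_j| on the step size (Eq. 30)."""
    if any(g >= 0 for g in eigs):
        raise ValueError("Eq. 30 requires every eigenvalue to be negative")
    return 2.0 / max(abs(g) for g in eigs)


def is_contraction(eigs, eta):
    """Check |1 + eta * gamma_j| < 1 for every eigenvalue (Eq. 29)."""
    return all(abs(1.0 + eta * g) < 1.0 for g in eigs)


# Hypothetical eigenvalues, for illustration only.
example_eigs = [-0.5, -2.0]
bound = step_size_bound(example_eigs)               # 2 / 2.0 = 1.0
ok_small = is_contraction(example_eigs, 0.9)        # 0.9 is below the bound
ok_large = is_contraction(example_eigs, 1.1)        # 1.1 exceeds the bound
```

For these example values a step size of 0.9 satisfies Eq. 29 for both modes, while 1.1 violates it for the eigenvalue of largest magnitude.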

Computational Analysis

According to the calculation method of [11], the computational cost of each iteration of the three algorithms is shown in Table 1, where $e^x$ is calculated by its Taylor expansion using the first three terms. ‘Time’ denotes the running time of the numerical example.

Table 1. Computational cost of SIG, ME-SIG and FF-ME-SIG

Algorithm    Computation cost (flops)    Flops for n=4, L=3, p=5    Time (s)
SIG          3nL+17L+n+2                 93                         0.1788
ME-SIG       npL+pL+16L+2                125                        0.1860
FF-ME-SIG    npL+pL+16L+3                126                        0.1862

From the computational cost and running time, we can see that the SIG algorithm has lower values than the other two, while ME-SIG and FF-ME-SIG differ very little. The computational complexity of the two algorithms proposed in this paper is larger than that of the SIG algorithm, because they need the calculation of the multi-error. In terms of running time, the two proposed algorithms take approximately 4% more time than the SIG algorithm, which is not a significant difference.
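The per-iteration counts in Table 1 can be reproduced directly from the three formulas. The helper name `flops_per_iteration` below is hypothetical; the formulas and the values n=4, L=3, p=5 are those of Table 1.

```python
def flops_per_iteration(n, L, p):
    """Per-iteration flop counts using the formulas of Table 1."""
    return {
        "SIG": 3 * n * L + 17 * L + n + 2,
        "ME-SIG": n * p * L + p * L + 16 * L + 2,
        "FF-ME-SIG": n * p * L + p * L + 16 * L + 3,
    }


counts = flops_per_iteration(n=4, L=3, p=5)
# Relative running-time overhead of ME-SIG over SIG from the Time column.
time_overhead = 0.1860 / 0.1788 - 1.0
```

The forgetting factor adds a single multiplication per iteration, which is why the last two rows differ by one flop.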

Experimental Results

Consider the ARX model depicted in Fig. 1 with

$$A(z^{-1})=1.0-1.5z^{-1}+0.7z^{-2},\quad B(z^{-1})=1.0z^{-1}+0.5z^{-2}, \tag{31}$$

where the input $u(k)$ is an M-sequence and $v(k)$ is a random impulse noise. Specifically, 5% of the output data (30 outputs) are randomly selected, and an impulse with random amplitude between 0 and 1 is added to each of them. The curves of input $u(k)$ and output $y(k)$ are shown in Fig. 2. All simulation experiments use this model.
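The noise described above can be sketched as follows. This is an assumption-laden illustration: the paper does not give its generator, so the function name `impulse_noise`, the uniform amplitude distribution and the seeded `random.Random` instance are choices made here for reproducibility.

```python
import random


def impulse_noise(num_samples, fraction, rng):
    """Random impulse noise: zero everywhere except at a random subset.

    A share `fraction` of the positions is drawn without replacement and
    receives an amplitude drawn uniformly from [0, 1] (assumed distribution).
    """
    v = [0.0] * num_samples
    for idx in rng.sample(range(num_samples), round(fraction * num_samples)):
        v[idx] = rng.uniform(0.0, 1.0)
    return v


v = impulse_noise(600, 0.05, random.Random(0))
```

With 600 samples and a 5% share, 30 positions carry an impulse, which matches the "30 outputs" stated in the text.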

Fig. 2. Curves of input–output data

Numerical Simulation

(1) Results using SIG algorithm

The parameter estimates using the SIG with window length L=3 are shown in Table 2, where the estimation error $\delta$ is defined as $\delta=\frac{\|\hat{\theta}(k)-\theta_0\|}{\|\theta_0\|}\times 100\%$. The estimation error decreases as the data length $k$ increases (for a given $L$). However, the error is still very large (65.6092%) at the end of the estimation.

Table 2. Results using SIG algorithm (L=5)

k        25        50        100       200       400       600       True value
a1       -0.8723   -0.8514   -0.8443   -0.8453   -0.8461   -0.8436   -1.5000
a2       -0.1863   -0.0802   -0.0595   -0.0454   -0.0264   -0.0135    0.7000
b1        0.1902    0.2164    0.2192    0.2238    0.2288    0.2321    1.0000
b2       -0.0186    0.0397    0.0456    0.0526    0.0612    0.0666    0.5000
δ (%)    69.6451   68.1897   67.5927   66.9201   66.0873   65.6092
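The error measure $\delta$ can be checked against the tabulated values. The sketch below assumes the Euclidean norm, which reproduces the last column of Table 2 up to the rounding of the printed estimates; the function name `relative_error` is hypothetical.

```python
import math


def relative_error(theta_hat, theta_true):
    """delta = ||theta_hat - theta_true|| / ||theta_true|| * 100 (percent)."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_hat, theta_true)))
    return 100.0 * diff / math.sqrt(sum(b * b for b in theta_true))


theta_true = [-1.5, 0.7, 1.0, 0.5]
# SIG estimates at k = 600 from Table 2.
delta_sig = relative_error([-0.8436, -0.0135, 0.2321, 0.0666], theta_true)
```

Here `delta_sig` evaluates to about 65.61, in line with the tabulated 65.6092.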

(2) Results using ME-SIG algorithm

The parameter estimates using the proposed ME-SIG with stack length p=5 and L=3 are shown in Table 3. The estimation errors with different p are depicted in Fig. 3 (L=3). It can be seen that:

  1. Estimation error of a given p decreases when data length k increases;

  2. With stack length p increasing, the estimation error decreases quickly.

Table 3. Results using ME-SIG algorithm (p=5, L=3)

k        25        50        100       200       400       600       True value
a1       -1.3218   -1.3024   -1.3212   -1.3428   -1.3725   -1.3874   -1.5000
a2       -0.2220    0.4978    0.5452    0.5706    0.5970    0.6133    0.7000
b1        0.3139    0.8141    0.8219    0.8260    0.8375    0.8454    1.0000
b2       -0.0422    0.2836    0.3057    0.3321    0.3684    0.3854    0.5000
δ (%)    29.3258   20.1085   17.7285   15.8231   13.3009   11.9745

Fig. 3. Estimation errors using ME-SIG with different stack length p

(3) Results using FF-ME-SIG algorithm

The parameter estimates using proposed FF-ME-SIG with forgetting factor λ=0.99 are shown in Table 4, where L=3 and p=5. It can be seen that:

  1. Estimation error decreases when data length k increases;

  2. Compared with Table 3, the estimation error of the FF-ME-SIG is smaller at the same data length k.

Table 4. Results using FF-ME-SIG algorithm (p=5, L=3, λ=0.99)

k        25        50        100       200       400       600       True value
a1       -1.3236   -1.3184   -1.3425   -1.3746   -1.4372   -1.4641   -1.5000
a2       -0.2209    0.5130    0.5627    0.6075    0.6513    0.6741    0.7000
b1        0.3152    0.8302    0.8386    0.8463    0.8864    0.9188    1.0000
b2       -0.0425    0.3015    0.3283    0.3724    0.4446    0.4723    0.5000
δ (%)    29.1555   18.4745   15.7657   12.6848    7.4742    4.8337

(4) Comparison of the results of SIG, ME-SIG and FF-ME-SIG algorithms

The estimation errors using SIG, ME-SIG and FF-ME-SIG are depicted in Fig. 4.

Fig. 4. Estimation errors using SIG, ME-SIG, FF-ME-SIG

It can be seen that:

  1. All curves decrease when data length k increases;

  2. The estimation error of the SIG algorithm is larger than that of ME-SIG, which means that multi-error can improve the accuracy of the SIG’s estimate;

  3. The estimate of the FF-ME-SIG is the most accurate one of the three. In other words, the introduction of the forgetting factor improves the accuracy of the estimation.

(5) Results using FF-ME-SIG algorithm under different noise additions

To test the performance of the algorithm under different noise levels, we add 5%, 10%, 30%, and 50% of noises to the samples. The mean of the noise is 0.5, and the amplitude is between 0 and 1. The estimation results when k=600 are shown in Table 5, where p=5,L=3,λ=0.99. The estimation error curve is shown in Fig. 5. It can be seen from Table 5 and Fig. 5 that with the increase in the added noise, the estimation error tends to increase, but the change is small, which indicates that the proposed algorithm has strong adaptability to noise.

Table 5. Results using FF-ME-SIG algorithm under different noise additions (p=5, L=3, λ=0.99)

Parameter   5%        10%       30%       50%       True value
a1          -1.4641   -1.4322   -1.4807   -1.3607   -1.5000
a2           0.6741    0.6419    0.6910    0.5651    0.7000
b1           0.9188    0.9084    0.9329    0.9424    1.0000
b2           0.4723    0.5026    0.4043    0.4841    0.5000
δ (%)        4.8337    6.4037    5.9477   10.1580

Fig. 5. Estimates of FF-ME-SIG using samples with different noise additions

(6) Results using the FF-ME-SIG for the identification of the Narendra difference equations

To support this paper’s argument, we add a further simulation example involving a synthetic input–output relation, namely the Narendra difference equation proposed in [35, 51]:

$$y(n+1)=0.3y(n)+0.6y(n-1)+f[e(n)], \tag{32}$$

where

$$f(e)=0.6\sin(\pi e)+0.3\sin(3\pi e)+0.1\sin(5\pi e),\quad e(n)=\sin\left[(1+a)\omega_0 n\right]. \tag{33}$$

The following structure is used to model the above equation:

$$y(k)=a_1y(k-1)+a_2y(k-2)+a_3y(k-3)+a_4y^2(k-1)+a_5y^2(k-2). \tag{34}$$

Let the data length be 240. The estimate obtained with the proposed algorithm at $k=240$ is $[0.3084, 0.3255, 0.3641, -0.0264, 0.0372]^T$. The predicted $y(k)$ and observed $y(k)$ are depicted in Fig. 6.
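Equation 34 is nonlinear in the data but linear in the parameters, so it fits the same regression form $y(k)=\varphi^T(k)\theta$ used for the ARX model. The sketch below builds the corresponding regressor and evaluates a one-step prediction with the estimate reported above; the function names are hypothetical and the short output sequence in the example is made up for illustration.

```python
def narendra_regressor(y, k):
    """Regressor of Eq. 34: [y(k-1), y(k-2), y(k-3), y(k-1)^2, y(k-2)^2]."""
    return [y[k - 1], y[k - 2], y[k - 3], y[k - 1] ** 2, y[k - 2] ** 2]


def predict(y, k, theta):
    """One-step prediction phi(k)^T theta for the structure of Eq. 34."""
    return sum(p * t for p, t in zip(narendra_regressor(y, k), theta))


# Estimate reported in the text at k = 240.
THETA_HAT = [0.3084, 0.3255, 0.3641, -0.0264, 0.0372]

# Made-up past outputs y(0), y(1), y(2), used only to exercise the function.
y_hist = [1.0, 2.0, 3.0]
y_pred = predict(y_hist, 3, THETA_HAT)
```

Because the regressor plays the role of $\varphi(k)$, the FF-ME-SIG recursion can be applied to this model without modification.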

Fig. 6. Predicted and observed output using FF-ME-SIG for Eqs. 32–33

(7) Results using FF-ME-SIG algorithm and RLS, SG algorithm

To demonstrate the advantage of the proposed algorithm, the identification results of the stochastic gradient (SG) algorithm and the recursive least squares (RLS) algorithm are compared with those of the proposed algorithm. Figure 7 shows the estimation error curves of the three algorithms. When k=600, the estimation errors of the SG, RLS and FF-ME-SIG algorithms are 55.5395%, 5.2045% and 4.8337%, respectively. The estimation error of SG is very large, and the estimation error of the RLS algorithm is slightly larger than that of the proposed algorithm. Moreover, due to the impulse noise, the estimation error of the RLS algorithm fluctuates dramatically, which indicates that the estimates given by the RLS algorithm change significantly.

Fig. 7. Estimate errors using SG, RLS and FF-ME-SIG

Case Study

The data set of a gas furnace from the literature [42] is used to validate the proposed algorithm. These data were continuously collected from a gas furnace and then read every 9 s. The air feed of the furnace was kept constant, but the methane feed rate was varied and the resulting CO2 concentration in the off gases was measured. There are 296 input–output data in this set. The first 200 data are adopted to estimate the parameters. The curves of the input and output are shown in Fig. 8. The estimation results and the prediction errors using SIG, ME-SIG and FF-ME-SIG are listed in Table 6. The outputs y(k) and the prediction errors pe(k) using the proposed algorithm are shown in Fig. 9.

  1. It can be seen from Table 6 that among three algorithms, the proposed FF-ME-SIG algorithm has the smallest RMSE, which means that the proposed algorithm can give the most accurate estimate.

  2. As shown in Fig. 9, the outputs of the obtained model using the proposed algorithm can predict the observations well.
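The RMSE used to compare the three algorithms is not written out in the text. A minimal sketch, assuming the usual root-mean-square of the prediction errors $pe(k)=y(k)-\hat{y}(k)$, is shown below; the function name `rmse` is hypothetical, and the text does not say whether the RMSE is computed on the 200 estimation samples or on the remaining data.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square of the prediction errors observed - predicted."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))
```

A smaller RMSE means that the model outputs track the measured CO2 concentration more closely.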

Fig. 8. Curves of the input and output data of a gas furnace

Table 6. Results using SIG, ME-SIG and FF-ME-SIG algorithm

Parameter   SIG       ME-SIG    FF-ME-SIG
a1          -0.4295   -0.6126   -0.6215
a2          -0.3003   -0.0561    0.0029
b1          -0.0868   -0.0904    0.0614
b2          -0.1350   -0.2137   -0.1414
b3          -0.1891   -0.4097   -0.4190
b4          -0.1994   -0.4148   -0.4577
b5          -0.1686   -0.2513   -0.2869
RMSE         0.2782    0.1183    0.0594

Fig. 9. Curves of the outputs and prediction errors of a gas furnace

Conclusions

In this paper, a novel SIG algorithm based on minimum error entropy is presented. The traditional SIG algorithm needs less computation than the MSE algorithm. However, it converges slowly. A multi-error strategy and a forgetting factor are introduced to speed up the SIG. We compared the results of SIG, ME-SIG and FF-ME-SIG by estimating the parameters of an ARX model with random impulse noise and through a case study. It is found that SIG with ME and FF can obtain accurate estimates, and it has a quick convergence rate.

Symbols and abbreviations

$u(k)$: System input at time $k$
$y(k)$: System output at time $k$
$\bar{y}(k)$: System output without noise at time $k$
$v(k)$: System noise at time $k$
$\theta$: Parameter vector
$\theta_0$: True value of parameter vector
$\hat{\theta}(k)$: Parameter estimate at time $k$
$\tilde{\theta}(k)$: $\tilde{\theta}(k)=\hat{\theta}(k)-\theta_0$
$n$: Dimension of parameter vector
$N$: Data length
$L$: Parzen window length
$\sigma$: Kernel width of Gaussian kernel
$e$: Error variable
$e(k)$: Error at time $k$
pdf: Probability density function
$f(e)$: Pdf of error
$\hat{f}(e(k))$: Estimate of $f(e)$ at time $k$
$H(e)$: Shannon entropy for $e$
$\hat{H}(e(k))$: Estimate of $H(e)$ at time $k$
$E[\cdot]$: Mathematical expectation
$\Delta_{ki}$: $\Delta_{ki}=e(k)-e(i)$
$\epsilon_{ki}$: $\epsilon_{ki}=\varphi(k)-\varphi(i)$
$g(k)$: Stochastic gradient of Shannon entropy
$\kappa_\sigma(\cdot)$: Gaussian kernel with variance $\sigma^2$
$\kappa'_\sigma(\cdot)$: Derivative of $\kappa_\sigma(\cdot)$
SG: Stochastic gradient
SIG: Stochastic information gradient
$\eta(k)$: Step size of SIG
ME: Multi-error
$p$: Stack length of ME
$E(p,k)$: Stacked error vector
$\Phi(p,k)$: Stacked information matrix
$\Delta\Phi(p;k,i)$: $\Delta\Phi(p;k,i)=\Phi(p,k)-\Phi(p,i)$
$\Delta E(p;k,i)$: $\Delta E(p;k,i)=E(p,k)-E(p,i)$
FF: Forgetting factor
$\lambda$: Forgetting factor
$H_1$: $H_1=\partial g^T/\partial\theta$
$I$: Identity matrix
$\gamma$: Eigenvalue

Data Availability Statement

All data generated or analyzed during this study are included in the attached file “data used in the paper.xlsx”.

Footnotes

This work is supported by the Natural Science Research Project of Jiangsu Higher School, China (19KJD510001), and the National Scholarship Foundation of China (201908320048).

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Akouemo HN, Povinelli RJ. Data improving in time series using ARX and ANN models. IEEE Trans. Power Syst. 2017;32(5):3352–3359. doi: 10.1109/TPWRS.2017.2656939.
  2. Awad A. Impulse noise reduction in audio signal through multi-stage technique. Eng. Sci. Technol. Int. J. 2018;22(2):629–636.
  3. C. Böck, K. Kostoglou, P. Kovacs, M. Huemer, J. Meier, A linear parameter varying ARX model for describing biomedical signal couplings, in Computer Aided Systems Theory-EUROCAST 2019. 17th International Conference. (Las Palmas de Gran Canaria, Spain, 2020), pp. 339–346
  4. B. Chen, J. Hu, H. Li, Z. Sun. A joint stochastic gradient algorithm and its application to system identification with RBF networks, in Proceedings of the 6th World Congress on Intelligent Control and Automation (Dalian, China, 2006), pp. 1754–1758
  5. Chen B, Wang X, Li Y, Principe JC. Maximum correntropy criterion with variable center. IEEE Signal Process. Lett. 2019;26(8):1212–1216. doi: 10.1109/LSP.2019.2925692.
  6. Chen B, Zhu Y, Hu J, Principe JC. System Parameter Identification: Information Criteria and Algorithms. New York: Elsevier; 2013.
  7. Chen J, Liu Y, Ding F, Zhu Q. Gradient-based particle filter algorithm for an ARX model with nonlinear communication output. IEEE Trans. Syst. Man Cybern. Syst. 2020;50(6):2198–2207. doi: 10.1109/TSMC.2018.2810277.
  8. X. Chen, S. Zhao, F. Liu, Robust identification of linear ARX models with recursive EM algorithm based on student’s t-distribution. J. Frankl. Inst. 358(1), 1103–1121 (2021)
  9. Dawood H, Dawood H, Guo P. Removal of random-valued impulse noise by local statistics. Multimed. Tools Appl. 2015;74(24):11485–11498. doi: 10.1007/s11042-014-2246-1.
  10. Ding F. System identification. Part F: multi-innovation identification theory and methods. J. Nanjing Univ. Inf. Sci. Technol. 2012;4(1):1–28.
  11. Ding F. New Theory of System Identification. Beijing: Tsinghua University Press; 2013.
  12. Ding F, Chen T. Performance analysis of multi-innovation gradient type identification methods. Automatica. 2007;43(1):1–14. doi: 10.1016/j.automatica.2006.07.024.
  13. Dong X, He S, Stojanovic V. Robust fault detection filter design for a class of discrete-time conic-type nonlinear Markov jump systems with jump fault signals. IET Control Theory Appl. 2020;14(14):1912–1919. doi: 10.1049/iet-cta.2019.1316.
  14. Erdogmus D, Hild KE, Principe JC. Online entropy manipulation: stochastic information gradient. Signal Process. Lett. 2003;10(8):242–245. doi: 10.1109/LSP.2003.814400.
  15. Erdogmus D, Principe JC. An error-entropy minimization algorithm for supervised training of nonlinear adaptive systems. IEEE Trans. Signal Process. 2002;50(7):1780–1786. doi: 10.1109/TSP.2002.1011217.
  16. Erdogmus D, Principe JC. Generalized information potential criterion for adaptive system training. IEEE Trans. Neural Netw. 2002;13(5):35–44. doi: 10.1109/TNN.2002.1031936.
  17. Erdogmus D, Principe JC. Convergence properties and data efficiency of the minimum error entropy criterion in Adaline training. IEEE Trans. Signal Process. 2003;51(7):1966–1978. doi: 10.1109/TSP.2003.812843.
  18. Hadid B, Duviella E, Lecoeuche S. Data-driven modeling for river flood forecasting based on a piecewise linear ARX system identification. J. Process Control. 2020;86:44–56. doi: 10.1016/j.jprocont.2019.12.007.
  19. Haykin S. Least-Mean-Square Adaptive Filters. New York: Wiley; 2003.
  20. Haykin S. Adaptive Filter Theory. England: Pearson Education Limited; 2014.
  21. Heravi AR, Hodtani GA. Comparison of the convergence rates of the new correntropy-based Levenberg–Marquardt (CLM) method and the fixed-point maximum correntropy (FP-MCC) algorithm. Circuits Syst. Signal Process. 2018;37(7):2884–2910. doi: 10.1007/s00034-017-0694-3.
  22. Hu T, Wu Q, Zhou D. Kernel gradient descent algorithm for information theoretic learning. J. Approx. Theory. 2021;263:105518. doi: 10.1016/j.jat.2020.105518.
  23. Hyvarinen A, Oja E. Independent component analysis: algorithms and applications. Neural Netw. 2000;13(4):411–430. doi: 10.1016/S0893-6080(00)00026-5.
  24. Isaksson AJ. Identification of ARX-models subject to missing data. IEEE Trans. Autom. Control. 2002;38(5):813–819. doi: 10.1109/9.277253.
  25. Ivanov DV, Sandler IL, Katsyuba OA, Vlasova VN. Identification of FARARX models with errors in variables. In: Jain V, Patnaik S, Popentiu VF, Sethi I, editors. Recent Trends in Intelligent Computing, Communication and Devices. Advances in Intelligent Systems and Computing. Singapore: Springer; 2020. pp. 481–487.
  26. Jiang Y, Yin S. Recursive total principle component regression based fault detection and its application to vehicular cyber-physical systems. IEEE Trans. Industr. Inf. 2017;14(4):1415–1423. doi: 10.1109/TII.2017.2752709.
  27. Jiang Y, Yin S. Recent advances in key-performance-indicator oriented prognosis and diagnosis with a Matlab toolbox: Db-kit. IEEE Trans. Industr. Inf. 2018;15(5):2849–2858. doi: 10.1109/TII.2018.2875067.
  28. Jurado F, Cano A. Use of ARX algorithms for modelling micro-turbines on the distribution feeder. IEE Proc. Gener. Trans. Distrib. 2004;151(2):232–238. doi: 10.1049/ip-gtd:20040096.
  29. Li Y, Jiang Z, Shi W, Han X, Chen B. Blocked maximum correntropy criterion algorithm for cluster-sparse system identifications. IEEE Trans. Circuits Syst. II Express Briefs. 2019;66(11):1915–1919.
  30. Li Y, Wang Y, Yang R, Albu F. A soft parameter function penalized normalized maximum correntropy criterion algorithm for sparse system identification. Entropy. 2017;19(1):1–16. doi: 10.3390/e19010045.
  31. Ljung L. System Identification: Theory for the User. Beijing: Tsinghua University Press; 2002.
  32. Magdy W, Elsayed T. Unsupervised adaptive microblog filtering for broad dynamic topics. Inf. Process. Manage. 2016;52(4):513–528. doi: 10.1016/j.ipm.2015.11.004.
  33. D. Maurya, A. Tangirala, S. Narasimhan. ARX model identification using generalized spectral decomposition. eprint arXiv:2008.04779 (2020)
  34. Najeh T, Njima CB, Garna T, Ragot J. Input fault detection and estimation using pi observer based on the ARX-Laguerre model. Int. J. Adv. Manuf. Technol. 2017;90(5):1317–1336. doi: 10.1007/s00170-016-9414-6.
  35. Narendra KS, Parthasarathy K. Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Netw. 1990;1(1):4–27. doi: 10.1109/72.80202.
  36. Nguyen VT, Bermingham M, Dargusch MS. Data-driven modelling of the interaction force between permanent magnets. J. Magn. Magn. Mater. 2021;532:167869. doi: 10.1016/j.jmmm.2021.167869.
  37. Ogunfunmi T, Safarian C. The quaternion stochastic information gradient algorithm for nonlinear adaptive systems. IEEE Trans. Signal Process. 2019;67(23):5909–5921. doi: 10.1109/TSP.2019.2944757.
  38. Özdenizci O, Erdogmus D. Stochastic mutual information gradient estimation for dimensionality reduction networks. Inf. Sci. 2021;570:298–305. doi: 10.1016/j.ins.2021.04.066.
  39. Papoulis EV, Stathaki T. A normalized robust mixed-norm adaptive algorithm for system identification. IEEE Signal Process. Lett. 2004;11(1):56–59. doi: 10.1109/LSP.2003.819353.
  40. Parzen E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962;33(3):1065–1076. doi: 10.1214/aoms/1177704472.
  41. Principe JC. Information Theoretic Learning: Renyis Entropy and Kernel Perspectives. New York: Springer; 2010.
  42. Söderström T, Stoica P. Instrumental variable methods for system identification. Lect. Notes Control Inf. Sci. 1983;21(1):1–9.
  43. Shannon CE. A mathematical theory of communication. Bell Syst. Tech. J. 1948;27(3):379–423. doi: 10.1002/j.1538-7305.1948.tb01338.x.
  44. Sharma RR, Kumar M, Maheshwari S, Ray KP. Evdhm-Arima-based time series forecasting model and its application for Covid-19 cases. IEEE Trans. Instrum. Meas. 2021;70:1–10. doi: 10.1109/TIM.2020.3041833.
  45. Shi W, Li Y, Chen B. A separable maximum correntropy adaptive algorithm. IEEE Trans. Circuits Syst. II Express Briefs. 2020;67(11):2797–2801.
  46. Shieh W, Djordjevic IB. OFDM for Optical Communications. London: Elsevier; 2010.
  47. Stojanovic V, He S, Zhang B. State and parameter joint estimation of linear stochastic systems in presence of faults and non-gaussian noises. Int. J. Robust Nonlinear Control. 2020;30(16):6683–6700. doi: 10.1002/rnc.5131.
  48. Stojanovic V, Prsic D. Robust identification for fault detection in the presence of non-gaussian noises: application to hydraulic servo drives. Nonlinear Dyn. 2020;100:2299–2313. doi: 10.1007/s11071-020-05616-4.
  49. Tao H, Wang P, Chen Y, Stojanovic V, Yang H. An unsupervised fault diagnosis method for rolling bearing using STFT and generative neural networks. J. Franklin Inst. 2020;357(11):7286–7307. doi: 10.1016/j.jfranklin.2020.04.024.
  50. Tu Q, Rong Y, Chen J. Parameter identification of ARX models based on modified momentum gradient descent algorithm. Complexity. 2020;2020(3):1–11. doi: 10.1155/2020/9537075.
  51. Turchetti C, Biagetti G, Gianfelici F, Crippa P. Nonlinear system identification: an effective framework based on the Karhunen–Loeve transform. IEEE Trans. Signal Process. 2009;57(2):536–550. doi: 10.1109/TSP.2008.2008964.
  52. Wen L, Bai H, He L, Zhou Y, Zhou M, Xu Z. Gradient estimation of information measures in deep learning. Knowl. Based Syst. 2021;224:107046. doi: 10.1016/j.knosys.2021.107046.
  53. Zayyani H. Continuous mixed p-norm adaptive algorithm for system identification. IEEE Signal Process. Lett. 2014;21(9):1108–1110. doi: 10.1109/LSP.2014.2325495.


Articles from Circuits, Systems, and Signal Processing are provided here courtesy of Nature Publishing Group
