Entropy. 2020 Jun 6;22(6):629. doi: 10.3390/e22060629

LSSVR Model of G-L Mixed Noise-Characteristic with Its Applications

Shiguang Zhang 1,2,3,*, Ting Zhou 4,*, Lin Sun 1,3, Wei Wang 1, Baofang Chang 1
PMCID: PMC7517163  PMID: 33286401

Abstract

Due to the complexity of wind speed, it has been reported that mixed-noise models, constituted by multiple noise distributions, perform better than single-noise models. However, most existing regression models assume a single noise distribution. Therefore, we study the least squares SVR with Gaussian–Laplacian mixed homoscedastic noise (GLMLSSVR) and with Gaussian–Laplacian mixed heteroscedastic noise (GLMHLSSVR) for complicated or unknown noise distributions. The augmented Lagrange multiplier (ALM) technique is used to solve the GLMLSSVR model, and GLMLSSVR is applied to short-term wind-speed prediction with historical data. The prediction results indicate that the presented model is superior to the single-noise model and performs well.

Keywords: least squares SVR, Gaussian–Laplacian mixed noise characteristic, empirical risk loss, equality constraint, wind-speed forecasting

1. Introduction

In practical applications, if the data are collected in a multi-source environment, the noise distribution is complex and unknown. Therefore, it is almost impossible for a single noise distribution to describe the real noise clearly [1]. LSSVR is a method of LR that implements a sum-of-squares error function together with regularization, thus controlling the bias–variance trade-off [2,3]. It is intended to find the hidden linear structure in the original data [4,5]. To pass from linear to nonlinear functions, the following generalization can be made [6]: input vectors are mapped into a high-dimensional feature space H (a Hilbert space) through some nonlinear mapping, and the solution of the optimization problem is sought in H. Using a suitable kernel function K(·,·), nonlinear mappings can be estimated by kernel LSSVR, an extension of LR with kernel techniques. In recent years, LSSVR has become increasingly popular as a data-rich nonlinear forecasting tool [7], applicable in many different contexts [8,9,10], such as machine learning, optical character recognition, and especially wind speed/power forecasting.

Generally, the existing techniques used for wind-speed forecasting include: (i) physical; (ii) statistical (also called data-driven); and (iii) artificial intelligence (AI)-based methods. The physical models attempt to estimate wind flow around and inside the wind farm using the physical laws governing atmospheric behavior [11,12]. The statistical models seek relationships between a set of explanatory variables and the on-line measured generation data, and only the historical wind-speed data recorded at the site are used to establish the statistical model; this can be done in a variety of ways, including the persistence method and auto-regressive models [13,14]. AI methods include artificial neural networks (ANNs) [15], deep learning [16], SVR machines [17,18], and hybrid methods [19,20].

Suykens et al. [21,22,23] proposed the least squares support vector regression model with Gaussian noise (LSSVR, also known as kernel ridge regression (KRR)). A mixed model based on multi-objective optimization [24,25] and a mixed method based on singular spectrum analysis, the firefly algorithm, and a BP neural network [26] predict wind speed with complicated noise, indicating that mixed prediction methods have powerful predictive ability. A mixed LSSVR machine [27] has been applied to forecast wind speed with noise, improving the performance of wind-speed prediction. GLMSVR models [28], fitted with Gaussian–Laplacian (G-L) mixed noise, have been developed and obtain good performance compared with existing regression algorithms.

To solve the above problems, we study the LSSVR model of the G-L mixed noise characteristic for complex or unknown noise distributions, and we construct a technique to search for the optimal solution of the corresponding regression task. Although many LSSVR algorithms have been implemented in past years, we exploit the ALM method, as shown in Section 4. If the task is not differentiable or is discontinuous, the subgradient descent method can be employed; the SMO algorithm [29] can also be used for very large sample sizes.

The structure of this paper is as follows. Section 2 derives the optimal empirical risk loss by Bayesian principle. Section 3 constructs the LSSVR model of G-L mixed noise. Section 4 gives the solution and algorithm design of GLMLSSVR. In Section 5, the numerical experiment of short-term wind-speed prediction is presented. Finally, we conclude the work.

2. Bayesian Principle to Mixed Noise Empirical Risk Loss

Given the dataset

$D_N = \{(A_1, y_1), (A_2, y_2), \ldots, (A_N, y_N)\},$ (1)

where $A_i = (x_{i1}, x_{i2}, \ldots, x_{in})^T \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$ $(i=1,2,\ldots,N)$ are the training data. $\mathbb{R}$ denotes the set of real numbers, $\mathbb{R}^n$ is the $n$-dimensional Euclidean space, $N$ is the sample size, and the superscript $T$ denotes the matrix transpose. Assuming that the samples of the dataset $D_N$ are generated by an additive noise process $\xi$, the relationship between the measured value $y_i$ and the predicted value $f(A_i)$ is:

$y_i = f(A_i) + \xi_i, \quad i = 1, 2, \ldots, N,$ (2)

where the $\xi_i$ are random and i.i.d. (independent and identically distributed) with density $p(\xi_i)$ of mean $\mu$ and standard deviation $\sigma$. Generally, the noise PDF (probability density function) $p(\xi) = p(y - f(A))$ is unknown. It is necessary to predict the unknown target $f(A)$ from the training set $D_f \subseteq D_N$.

Following the authors of [30,31], the optimal empirical risk loss in the sense of Maximum Likelihood Estimation (MLE) is

$l(\xi) = l(A, y, f(A)) = -\log p(y - f(A)),$ (3)

i.e., the empirical risk loss $l(\xi)$ is the negative log-likelihood of the noise characteristic.

It is assumed that the noise in Equation (2) is Laplacian, with PDF $p(\xi) = \frac{1}{2}e^{-|\xi|}$. By Equation (3), the MLE-optimal empirical risk loss is then $l(\xi) = |\xi|$.

Suppose the noise in Equation (2) is Gaussian with zero mean and homoscedastic standard deviation $\sigma$. By Equation (3), the empirical risk loss of Gaussian noise with homoscedasticity is $l(\xi) = \frac{1}{2\sigma^2}\xi^2$. If instead the noise in Equation (2) is Gaussian with zero mean and heteroscedastic standard deviation $\sigma_i$, then by Equation (3) the empirical risk loss for Gaussian noise with heteroscedasticity is $l(\xi_i) = \frac{1}{2\sigma_i^2}\xi_i^2$ $(i = 1, \ldots, N)$.
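For instance, substituting the Gaussian density into Equation (3) makes the quadratic loss explicit:

$$l(\xi) = -\log\Big(\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\xi^2/(2\sigma^2)}\Big) = \frac{\xi^2}{2\sigma^2} + \log\big(\sqrt{2\pi}\,\sigma\big),$$

where the additive constant $\log(\sqrt{2\pi}\,\sigma)$ does not affect minimization and is dropped; likewise $-\log\big(\frac{1}{2}e^{-|\xi|}\big) = |\xi| + \log 2$ recovers the Laplacian loss.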

Assume the noise $\xi$ in Equation (2) is a mixture of two kinds of noise with PDFs $p_1(\xi)$ and $p_2(\xi)$, respectively, and suppose that $p(\xi) = [p_1(\xi)]^{\lambda_1} \cdot [p_2(\xi)]^{\lambda_2}$. By Equation (3), the corresponding empirical risk loss of the mixed noise is

$l(\xi) = \lambda_1 \cdot l_1(\xi) + \lambda_2 \cdot l_2(\xi),$ (4)

where $l_1(\xi) > 0$ and $l_2(\xi) > 0$ are the convex empirical risk losses of the above two kinds of noise characteristic, respectively, and the weight factors satisfy $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$.
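As an illustration, here is a minimal sketch of the G-L mixed empirical risk loss of Equation (4), the function plotted in Figure 1 below; the function and parameter names are ours:

```python
import numpy as np

def gl_mixed_loss(xi, lam1=0.5, lam2=0.5, sigma=1.0):
    """G-L mixed empirical risk loss of Eq. (4):
    l(xi) = lam1 * xi^2 / (2*sigma^2) + lam2 * |xi|,
    with lam1, lam2 >= 0 and lam1 + lam2 = 1."""
    xi = np.asarray(xi, dtype=float)
    gaussian_part = xi ** 2 / (2.0 * sigma ** 2)   # l1: homoscedastic Gaussian
    laplacian_part = np.abs(xi)                    # l2: Laplacian
    return lam1 * gaussian_part + lam2 * laplacian_part

# Example: evaluate the loss on a grid of residuals, as in Figure 1.
xi = np.linspace(-3.0, 3.0, 7)
print(gl_mixed_loss(xi, lam1=0.5, lam2=0.5))
```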

Figure 1 displays the Gaussian–Laplacian (G-L) empirical risk loss for different parameter values (lambda in the legend denotes $\lambda$) [29].

Figure 1. G-L empirical risk loss of different parameters.

3. LSSVR Model of G-L Mixed Noise-Characteristic

Given the training samples $D_f \subseteq D_N$, construct the linear regressor $f(A) = \varpi^T A + b$. To deal with nonlinear problems, the procedure can be summarized as follows: input vectors $A_i \in \mathbb{R}^n$ are mapped into a high-dimensional feature space $H$ through a nonlinear mapping $\Phi$ (with a prior distribution), induced by a nonlinear kernel function $K(A_i, A_j)$; the kernel mapping $\Phi$ is associated with any positive definite Mercer kernel.

Definition 1

([6,28]). Positive definite Mercer kernel: Assume that $X$ is a subset of $\mathbb{R}^n$. A kernel function $K(A_i, A_j)$ defined on $X \times X$ is called a positive definite Mercer kernel if there exists a mapping $\Phi: X \to H$ ($H$ a Hilbert space) such that

$K(A_i, A_j) = (\Phi(A_i) \cdot \Phi(A_j)), \quad (i, j = 1, 2, \ldots, N),$ (5)

where $(\cdot)$ represents the inner product in the space $H$.

The optimization problem is then solved in the space $H$: the inner products $(A_i \cdot A_j)$ of input vectors are replaced by the inner products $(\Phi(A_i) \cdot \Phi(A_j))$ in the feature space $H$. Through the kernel $K(A_i, A_j) = (\Phi(A_i) \cdot \Phi(A_j))$, the linear model is extended to a nonlinear LSSVR, as sketched below.
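To make the kernel substitution concrete, here is a hedged sketch of building the Gram matrix that stands in for the inner products in $H$; the helper name and the bandwidth parameterization are ours:

```python
import numpy as np

def gaussian_gram(A, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||A_i - A_j||^2 / (2*sigma^2)).
    A has shape (N, n): one input vector A_i per row."""
    sq_norms = np.sum(A ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * A @ A.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))

# K[i, j] plays the role of the inner product (Phi(A_i) . Phi(A_j)) in H.
A = np.random.rand(5, 3)
K = gaussian_gram(A, sigma=0.8)
assert np.allclose(K, K.T) and np.allclose(np.diag(K), 1.0)
```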

In general, a mixed distribution has good approximation ability for any continuous distribution. When there is no prior knowledge of the real noise, it can adapt well to unknown or complicated noise. Thus, we present a unified LSSVR model with mixed noise characteristics (MLSSVR). The primal problem of the MLSSVR model is formalized as

$$\min\Big\{ g_P^{MLSSVR} = \frac{1}{2}\varpi^T\varpi + \frac{C}{N}\Big[\lambda_1 \sum_{i=1}^N l_1(\xi_i) + \lambda_2 \sum_{i=1}^N l_2(\xi_i)\Big] \Big\} \quad \text{s.t.} \quad \xi_i = y_i - \varpi^T\Phi(A_i) - b,$$ (6)

where the parameter $\varpi \in \mathbb{R}^n$ is the weight vector, $b$ is the bias term, $C > 0$ is the penalty parameter, and the weight factors satisfy $\lambda_1, \lambda_2 \ge 0$, $\lambda_1 + \lambda_2 = 1$. $(A_i, y_i) \in D_N$, and $\Phi(A)$ is a nonlinear mapping which transfers the input dataset to the higher-dimensional feature space $H$. $\xi_i = y_i - \varpi^T\Phi(A_i) - b$ is the random noise variable at time $i$ $(i = 1, 2, \ldots, N)$, and $l_1(\xi_i) > 0$, $l_2(\xi_i) > 0$ $(i = 1, 2, \ldots, N)$ are the convex loss functions for the noise characteristic at the sample point $(A_i, y_i) \in D_N$.

In application domains, most noise distributions obey neither a Gaussian nor a Laplacian distribution: the noise distribution is complicated, and it is almost impossible to describe real noise with a single distribution. It has been reported that mixed-noise models, constituted by multiple noise distributions, perform better than single-noise models [1]. As a function-fitting machine, the goal is to estimate an unknown function $f(A)$ from the dataset $D_f \subseteq D_N$. In this section, G-L mixed homoscedastic and heteroscedastic noise distributions are used to fit complicated noise characteristics.

3.1. LSSVR Model of G-L Mixed Homoscedastic Noise-Characteristic

Suppose the noise in Equation (2) is Gaussian with zero mean and homoscedastic standard deviation $\sigma$. By Equation (3), the empirical risk loss of the homoscedastic Gaussian noise characteristic is $l_1(\xi) = \frac{1}{2\sigma^2}\xi^2$, and that of the Laplacian noise is $l_2(\xi) = |\xi|$. Adopting the G-L mixed homoscedastic noise distribution to fit a complicated noise characteristic, by Equation (4) the empirical risk loss of G-L mixed homoscedastic noise is $l(\xi) = \frac{\lambda_1}{2\sigma^2}\xi^2 + \lambda_2|\xi|$. We thus put forward the LSSVR model of the G-L mixed homoscedastic noise characteristic (GLMLSSVR); the primal problem of GLMLSSVR is depicted as

$$\min\Big\{ g_P^{GLMLSSVR} = \frac{1}{2}\varpi^T\varpi + \frac{C}{N}\Big(\frac{\lambda_1}{2\sigma^2}\sum_{i=1}^N \xi_i^2 + \lambda_2\sum_{i=1}^N |\xi_i|\Big)\Big\} \quad \text{s.t.} \quad \xi_i = y_i - \varpi^T\Phi(A_i) - b,$$ (7)

where the parameter vector $\varpi \in \mathbb{R}^n$, $\sigma^2$ is homoscedastic, $C > 0$ is a penalty parameter, and the weight factors satisfy $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$.

Proposition 1.

The solution of the primal problem in Equation (7) of GLMLSSVR exists and is unique in $\varpi$.

Theorem 1.

The dual problem of the primal problem in Equation (7) is

$$\max\Big\{ g_D^{GLMLSSVR} = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_j K(A_i, A_j) + \sum_{i=1}^N \alpha_i y_i - \frac{N}{2C\lambda_1}\sum_{i=1}^N \sigma^2\Big(\alpha_i - \frac{C\lambda_2}{N}\Big)^2 \Big\} \quad \text{s.t.} \quad \sum_{i=1}^N \alpha_i = 0,$$ (8)

where $\sigma^2$ is homoscedastic, $C > 0$ is a penalty parameter, and the weight factors satisfy $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$.

Proof. 

We introduce the Lagrange functional $L(\varpi, b, \alpha, \xi)$ as $L(\varpi, b, \alpha, \xi) = \frac{1}{2}\varpi^T\varpi + \frac{C}{N}\Big(\frac{\lambda_1}{2\sigma^2}\sum_{i=1}^N \xi_i^2 + \lambda_2\sum_{i=1}^N |\xi_i|\Big) + \sum_{i=1}^N \alpha_i\big(y_i - \varpi^T\Phi(A_i) - b - \xi_i\big)$.

Minimizing $L(\varpi, b, \alpha, \xi)$ by setting its partial derivatives with respect to $\varpi$, $b$, and $\xi$ to zero, on the basis of the KKT conditions, we get

$\frac{\partial L}{\partial \varpi} = 0, \quad \frac{\partial L}{\partial b} = 0, \quad \frac{\partial L}{\partial \xi_i} = 0.$

We obtain

$\varpi = \sum_{i=1}^N \alpha_i \Phi(A_i),$
$\sum_{i=1}^N \alpha_i = 0,$
$\frac{C}{N}\Big(\frac{\lambda_1}{\sigma^2}\xi_i + \lambda_2\Big) - \alpha_i = 0 \quad (i = 1, 2, \ldots, N).$

Substituting these extremum conditions back into $L(\varpi, b, \alpha, \xi)$ and maximizing over $\alpha$ yields the dual problem in Equation (8) of the primal problem in Equation (7). □
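For completeness, the final substitution can be expanded. Using the third extremum condition (which takes $\xi_i \ge 0$, so $|\xi_i| = \xi_i$), we have $\xi_i = \frac{\sigma^2}{\lambda_1}\big(\frac{N\alpha_i}{C} - \lambda_2\big)$, and the $\xi$-dependent terms of $L$ collapse per sample as

$$\frac{C\lambda_1}{2N\sigma^2}\,\xi_i^2 + \frac{C\lambda_2}{N}\,\xi_i - \alpha_i\xi_i = -\frac{C\sigma^2}{2N\lambda_1}\Big(\frac{N\alpha_i}{C} - \lambda_2\Big)^2 = -\frac{N\sigma^2}{2C\lambda_1}\Big(\alpha_i - \frac{C\lambda_2}{N}\Big)^2,$$

which, combined with $\frac{1}{2}\varpi^T\varpi - \sum_i \alpha_i \varpi^T\Phi(A_i) = -\frac{1}{2}\sum_{i,j}\alpha_i\alpha_j K(A_i, A_j)$ and $b\sum_i \alpha_i = 0$, gives exactly the objective of Equation (8).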

Therefore,

$\varpi = \sum_{i=1}^N \alpha_i \Phi(A_i),$
$b = \frac{1}{N}\sum_{i=1}^N\Big[y_i - \sum_{j=1}^N \alpha_j K(A_i, A_j) - \frac{\sigma^2}{\lambda_1}\Big(\frac{N\alpha_i}{C} - \lambda_2\Big)\Big].$

The decision function for GLMLSSVR may be represented as

$f(A) = \varpi^T\Phi(A) + b = \sum_{i=1}^N \alpha_i K(A_i, A) + b,$

where the parameter vector $\varpi \in \mathbb{R}^n$, $\Phi: \mathbb{R}^n \to H$, $(\Phi(A_i) \cdot \Phi(A_j))$ is the inner product in $H$, and $K(A_i, A_j) = (\Phi(A_i) \cdot \Phi(A_j))$ is the kernel function.

Suppose the noise in Equation (2) is homoscedastic Gaussian noise, i.e., Gaussian noise with zero mean and homoscedastic variance $\sigma^2$. The dual problem of LSSVR then follows as a special case of Theorem 1 (taking $\lambda_1 = 1$, $\lambda_2 = 0$):

$$\max\Big\{ g_D^{LSSVR} = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_j K(A_i, A_j) + \sum_{i=1}^N \alpha_i y_i - \frac{N}{2C}\sum_{i=1}^N \sigma^2\alpha_i^2 \Big\} \quad \text{s.t.} \quad \sum_{i=1}^N \alpha_i = 0.$$ (9)

3.2. LSSVR Model of G-L Mixed Heteroscedastic Noise-Characteristic

It is assumed that the noise in Equation (2) is Gaussian with zero mean and heteroscedastic standard deviation $\sigma_i$, that is, $\sigma_i \ne \sigma_j$ for $i \ne j$ $(i, j = 1, \ldots, N)$. From Equation (3), the empirical risk loss of the heteroscedastic Gaussian noise characteristic is $l_1(\xi_i) = \frac{1}{2\sigma_i^2}\xi_i^2$, and the loss function of the Laplacian noise is $l_2(\xi_i) = |\xi_i|$ $(i = 1, \ldots, N)$. Utilizing the G-L mixed heteroscedastic noise distribution to model a complicated noise characteristic, from Equation (4) the loss function corresponding to G-L mixed heteroscedastic noise is $l(\xi_i) = \frac{\lambda_1}{2\sigma_i^2}\xi_i^2 + \lambda_2|\xi_i|$ $(i = 1, \ldots, N)$. We propose the new LSSVR model with the G-L mixed heteroscedastic noise characteristic (GLMHLSSVR); the primal problem of GLMHLSSVR is depicted as

$$\min\Big\{ g_P^{GLMHLSSVR} = \frac{1}{2}\varpi^T\varpi + \frac{C}{N}\Big(\frac{\lambda_1}{2}\sum_{i=1}^N \frac{\xi_i^2}{\sigma_i^2} + \lambda_2\sum_{i=1}^N |\xi_i|\Big)\Big\} \quad \text{s.t.} \quad \xi_i = y_i - \varpi^T\Phi(A_i) - b,$$ (10)

where the parameter vector $\varpi \in \mathbb{R}^n$, the $\sigma_i^2$ $(i = 1, 2, \ldots, N)$ are heteroscedastic, and $C > 0$ is the penalty parameter. The weight factors satisfy $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$.

Proposition 2.

The solution of the primal problem in Equation (10) of GLMHLSSVR exists and is unique in $\varpi$.

Theorem 2.

The dual problem of model GLMHLSSVR in Equation (10) is

$$\max\Big\{ g_D^{GLMHLSSVR} = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_j K(A_i, A_j) + \sum_{i=1}^N \alpha_i y_i - \frac{N}{2C\lambda_1}\sum_{i=1}^N \sigma_i^2\Big(\alpha_i - \frac{C\lambda_2}{N}\Big)^2 \Big\} \quad \text{s.t.} \quad \sum_{i=1}^N \alpha_i = 0,$$ (11)

where the $\sigma_i^2$ $(i = 1, 2, \ldots, N)$ are heteroscedastic and $C > 0$ is the penalty parameter. The weight factors satisfy $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$.

Proof. 

The proof of Theorem 2 follows by analogy with that of Theorem 1. □

We have

$\varpi = \sum_{i=1}^N \alpha_i \Phi(A_i),$
$b = \frac{1}{N}\sum_{i=1}^N\Big[y_i - \sum_{j=1}^N \alpha_j K(A_i, A_j) - \frac{\sigma_i^2}{\lambda_1}\Big(\frac{N\alpha_i}{C} - \lambda_2\Big)\Big].$

The decision function for GLMHLSSVR may be expressed as

$f(A) = \varpi^T\Phi(A) + b = \sum_{i=1}^N \alpha_i K(A_i, A) + b,$

where the parameter vector $\varpi \in \mathbb{R}^n$, $\Phi: \mathbb{R}^n \to H$, and $K(A_i, A_j)$ is the kernel function.

If the noise in Equation (2) is G-L mixed homoscedastic noise, in which the Gaussian component has zero mean and homoscedastic variance $\sigma^2$, then Theorem 1 can be deduced from Theorem 2.

4. Solution from ALM

In this section, we use the augmented Lagrange multiplier (ALM) method [32] to solve the dual problem in Equation (8) by applying gradient descent or Newton's method to a sequence of equality-constrained problems; by eliminating the equality constraints, arbitrary equality-constrained problems can be reduced to equivalent unconstrained ones [33,34]. For large-scale training samples, rapid optimization techniques can be combined with the proposed model, for example the sequential minimal optimization (SMO) algorithm [29] and the stochastic gradient descent (SGD) algorithm [35].

Theorems 1 and 2 provide effective identification techniques for GLMLSSVR and GLMHLSSVR, respectively. In this section, we derive the ALM solution and the algorithm design for the LSSVR model of the G-L mixed homoscedastic noise characteristic (GLMLSSVR); a code sketch follows the steps below. Analogously, the solution of the GLMHLSSVR model can be obtained by the ALM method.

(1) Let dataset be DN={(A1,y1),(A2,y2),,(AN,yN)}, where AiRn, yiR, i=1,,N.

(2) Search for the optimal parameters $C, \lambda_1, \lambda_2$ using the 10-fold cross-validation strategy, and select an appropriate kernel function $K(\cdot, \cdot)$.

(3) Solve the GLMLSSVR dual problem in Equation (8), and obtain the optimal solution $\alpha = (\alpha_1, \ldots, \alpha_N)$.

(4) Build the decision function as follows:

$f(A) = \varpi^T\Phi(A) + b = \sum_{i=1}^N \alpha_i K(A_i, A) + b.$

Here the parameter vector $\varpi \in \mathbb{R}^n$, $b = \frac{1}{N}\sum_{i=1}^N\big[y_i - \sum_{j=1}^N \alpha_j K(A_i, A_j) - \frac{\sigma^2}{\lambda_1}\big(\frac{N\alpha_i}{C} - \lambda_2\big)\big]$, $\Phi: \mathbb{R}^n \to H$, $(\Phi(A_i) \cdot \Phi(A_j))$ $(i, j = 1, 2, \ldots, N)$ is the inner product in $H$, and $K(A_i, A_j) = (\Phi(A_i) \cdot \Phi(A_j))$ is a kernel function.
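A minimal sketch of steps (3) and (4), assuming the reconstructed dual in Equation (8); the penalty ρ, step size, and iteration counts are our illustrative choices, and the inner gradient descent stands in for the gradient/Newton steps mentioned above:

```python
import numpy as np

def glmlssvr_alm(K, y, C=181.0, lam1=0.5, lam2=0.5, sigma2=1.0,
                 rho=10.0, outer=50, inner=200, lr=None):
    """Sketch of solving the GLMLSSVR dual (8) by the augmented
    Lagrange-multiplier (ALM) method: minimize -g_D(alpha) subject to
    sum(alpha) = 0, handled by a multiplier beta and penalty rho.
    Parameter names are ours; K is the N x N kernel Gram matrix."""
    N = len(y)
    shift = C * lam2 / N                      # target of the quadratic term
    tau = N * sigma2 / (C * lam1)             # weight of the quadratic term
    alpha, beta = np.zeros(N), 0.0
    if lr is None:                            # conservative step for the
        lr = 1.0 / (np.linalg.norm(K, 2) + tau + rho * N)  # quadratic objective
    for _ in range(outer):
        for _ in range(inner):
            # gradient of -g_D(alpha) + beta*sum(alpha) + (rho/2)*sum(alpha)^2
            g = K @ alpha - y + tau * (alpha - shift) \
                + (beta + rho * alpha.sum()) * np.ones(N)
            alpha -= lr * g
        beta += rho * alpha.sum()             # multiplier update
    # bias term, following the averaged expression after Theorem 1
    b = np.mean(y - K @ alpha - (sigma2 / lam1) * (N * alpha / C - lam2))
    return alpha, b

def predict(K_test, alpha, b):
    """f(A) = sum_i alpha_i K(A_i, A) + b; K_test has shape (M, N)."""
    return K_test @ alpha + b
```

For large $N$, the inner loop could be replaced by SMO or SGD, as noted above.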

5. Case Study

This section tests and verifies the validity of the constructed GLMLSSVR model by comparing it with other techniques on a dataset $D_N$ from Heilongjiang, China. The case study consists of the following subsections: the G-L mixed-noise characteristic of wind speed, prediction performance evaluation criteria, and short-term wind-speed forecasting based on an actual dataset.

5.1. G-L Mixed-Noise-Characteristic of Wind-Speed

To demonstrate the effectiveness of the proposed model, we collected wind-speed data from Heilongjiang. The dataset consists of more than one year of wind-speed data, with values recorded every 10 min. We first examined the G-L mixed noise and conducted experiments on it. Turbulence is the main reason for the high uncertainty of random wind-speed fluctuations, and from the wind-energy perspective the most significant feature of the resource is its variability. We now examine the distribution of wind speed: taking a wind-speed value every 5 s, we calculate the histogram of wind speed within 1–2 h. Two typical distributions are given, one calculated when the wind speed is high and the other when it is low (see Figure 2 and Figure 3, respectively).

Figure 2. High wind speed distribution.

Figure 3. Low wind speed distribution.

We analyzed a one-month time-series dataset and used the persistence method to investigate the error distribution [32]. The results show that the wind-speed error $\xi$ obtained from the persistence prediction does not follow a single distribution, but approximately follows the G-L mixed distribution, with PDF (up to normalization) $p(\xi) = \frac{1}{2}e^{-|\xi|} \cdot \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{\xi^2}{2\sigma^2}}$, as shown in Figure 4; a sketch of this error computation follows the figure.

Figure 4. G-L mixed distribution of wind-speed forecasting error with the persistence method.
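The persistence-error series behind Figure 4 can be sketched as follows (variable names ours; the persistence forecast simply reuses the previous observation):

```python
import numpy as np

def persistence_errors(x):
    """Persistence forecast: x_hat[t] = x[t-1]; the error series
    xi[t] = x[t] - x[t-1] is what Figure 4 histograms."""
    x = np.asarray(x, dtype=float)
    return x[1:] - x[:-1]

# A histogram of these errors can then be compared with the
# G-L mixed density above, e.g.:
# xi = persistence_errors(wind_speed)        # 10-min wind-speed series
# hist, edges = np.histogram(xi, bins=50, density=True)
```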

As can be seen from the above figures, the wind-speed error approximately satisfies the G-L mixed distribution; the forecasting task therefore involves mixed noise.

5.2. Prediction Performance Evaluation Criteria

It is generally known that no prediction model forecasts perfectly. The predictive performance of νSVR, GNSVR, LSSVR, and GLMLSSVR is assessed by standard evaluation criteria: MAE (mean absolute error), RMSE (root mean square error), MAPE (mean absolute percentage error), and SEP (standard error of prediction). The four criteria are defined as follows:

$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^N |y_i - \hat{y}_i|,$ (12)
$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^N \frac{|y_i - \hat{y}_i|}{y_i} \times 100\%,$ (13)
$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^N (y_i - \hat{y}_i)^2},$ (14)
$\mathrm{SEP} = \frac{\mathrm{RMSE}}{\bar{y}} \times 100\%,$ (15)

where $N$ is the size of the dataset $D_N$, $y_i$ is the $i$th actual observed value, $\hat{y}_i$ is the $i$th forecasted result, and $\bar{y}$ is the mean of the observations $y_i \in D_N$ [36,37,38,39,40]. MAE shows how close the predicted values are to the observed values, while RMSE measures the overall deviation between predicted and observed values. MAPE is the ratio between the error and the observed value, and SEP is the ratio of RMSE to the average observation. The latter two are dimensionless measures of the accuracy of a wind-speed system and are sensitive to small changes.
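In code, the four criteria of Equations (12)–(15) can be computed directly; a minimal sketch (the function name is ours, and all observed wind speeds are assumed positive):

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """MAE, RMSE, MAPE, SEP as in Equations (12)-(15)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                            # Eq. (12)
    mape = 100.0 * np.mean(np.abs(err) / y_true)          # Eq. (13)
    rmse = np.sqrt(np.mean(err ** 2))                     # Eq. (14)
    sep = 100.0 * rmse / np.mean(y_true)                  # Eq. (15)
    return mae, rmse, mape, sep
```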

5.3. Short-Term Wind-Speed Forecasting with Real dataset

In this section, 2160 consecutive data points (1–2160, a time span of 15 days) are extracted as the training set and 720 consecutive data points (2161–2880, a time span of 5 days) as the testing set. The input vector is $A_i = (x_{i-11}, x_{i-10}, \ldots, x_{i-1}, x_i)$, where $x_j$ is the actual observed wind speed at moment $j$ $(j = i-11, i-10, \ldots, i)$, and the forecasting target is $x_{i+step}$, where $step = 1, 3, 6$. That is, the above models are used to forecast the wind speed at each point $x_i$ after 10, 30, and 60 min, respectively; a windowing sketch is given below. Figures 5–13 describe the forecasting results given by the models νSVR, GNSVR, LSSVR, and GLMLSSVR.
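A sketch of the sliding-window construction described above (the helper name and its signature are ours):

```python
import numpy as np

def make_windows(x, width=12, step=1):
    """Build input vectors A_i = (x_{i-11}, ..., x_i) and targets
    x_{i+step}, matching the setup of Section 5.3."""
    x = np.asarray(x, dtype=float)
    A, y = [], []
    for i in range(width - 1, len(x) - step):
        A.append(x[i - width + 1 : i + 1])   # 12 consecutive observations
        y.append(x[i + step])                # value step points ahead
    return np.array(A), np.array(y)

# step = 1, 3, 6 corresponds to 10-, 30- and 60-min-ahead forecasts
# on a 10-min sampled series, e.g.:
# A_train, y_train = make_windows(x[:2160], step=1)
```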

Figure 5. Result of four wind-speed forecasting models after 10 min.

Figure 6. Error of four wind-speed forecasting models after 10 min.

Figure 7. Residual box plot of four wind-speed forecasting models after 10 min.

Figure 8. Result of four wind-speed forecasting models after 30 min.

Figure 9. Error of four wind-speed forecasting models after 30 min.

Figure 10. Residual box plot of four wind-speed forecasting models after 30 min.

Figure 11. Result of four wind-speed forecasting models after 60 min.

Figure 12. Error of four wind-speed forecasting models after 60 min.

Figure 13. Residual box plot of four wind-speed forecasting models after 60 min.

The models νSVR, GNSVR, LSSVR, and GLMLSSVR were implemented in Matlab 7.8. The initial parameter ranges of GLMLSSVR were $C \in [1, 200]$, $\nu \in (0, 1)$, and $\lambda_1, \lambda_2 \in [0, 1]$. The optimal parameters $C, \nu, \lambda_1, \lambda_2$ were searched using the 10-fold cross-validation technique; the technology of parameter selection is studied in detail in [41,42], and a cross-validation sketch follows the kernel definitions below. In this simulation, the parameters were set to $C = 181$, $\nu = 0.5$, $\lambda_1 = 0.5$, $\lambda_2 = 0.5$. Practical experience demonstrates that both the polynomial kernel and the Gaussian kernel perform well under the smoothness assumption. Accordingly, the models νSVR, GNSVR, LSSVR, and GLMLSSVR employ the polynomial and Gaussian kernel functions [43]:

$K(A_i, A_j) = ((A_i \cdot A_j) + 1)^d,$
$K(A_i, A_j) = e^{-\frac{\|A_i - A_j\|^2}{2\sigma^2}},$

where $d$ is a positive integer and $\sigma$ is a positive number.
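Parameter selection by 10-fold cross-validation can be sketched as a plain grid search over $C$, $\lambda_1$, and the kernel parameter; the helper below is hypothetical and the grids are illustrative, not the ones used in the paper:

```python
import numpy as np

def cv_select(A, y, train_fn, metric_fn, Cs, lam1s, sigmas, folds=10):
    """Hypothetical 10-fold CV grid search: train_fn(A_tr, y_tr, C, lam1,
    sigma) returns a predictor; metric_fn(y_true, y_pred) returns an error."""
    idx = np.arange(len(y))
    splits = np.array_split(idx, folds)
    best, best_err = None, np.inf
    for C in Cs:
        for lam1 in lam1s:            # lam2 = 1 - lam1 by the constraint
            for s in sigmas:
                errs = []
                for k in range(folds):
                    te = splits[k]
                    tr = np.concatenate(
                        [splits[j] for j in range(folds) if j != k])
                    f = train_fn(A[tr], y[tr], C, lam1, s)
                    errs.append(metric_fn(y[te], f(A[te])))
                if np.mean(errs) < best_err:
                    best, best_err = (C, lam1, s), np.mean(errs)
    return best, best_err
```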

The dual problems of νSVR, of the SVR with the Gaussian-noise model (GNSVR), and of LSSVR are as follows.

νSVR: The authors of [41,44] define the dual problem of νSVR as

$$\max\Big\{ g_D^{\nu SVR} = -\frac{1}{2}\sum_{i \in RSV}\sum_{j \in RSV}(\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)K(A_i, A_j) + \sum_{i \in RSV}(\alpha_i^* - \alpha_i)y_i \Big\}$$
$$\text{s.t.} \quad \sum_{i=1}^N(\alpha_i^* - \alpha_i) = 0, \quad 0 \le \alpha_i^{(*)} \le \frac{C}{N}, \quad \sum_{i=1}^N(\alpha_i + \alpha_i^*) \le C\nu, \quad i = 1, \ldots, N.$$ (16)

GNSVR: The authors of [45,46] studied SVR with equality and inequality constraints. The loss function of Gaussian noise is $c(\xi_i) = \xi_i^2/2$ $(i = 1, \ldots, N)$; thus, the dual problem of GNSVR is

$$\max\Big\{ g_D^{GNSVR} = -\frac{1}{2}\sum_{i \in RSV}\sum_{j \in RSV}(\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)K(A_i, A_j) + \sum_{i \in RSV}(\alpha_i^* - \alpha_i)y_i - \frac{N}{2C}\sum_{i=1}^N\big(\alpha_i^2 + (\alpha_i^*)^2\big) \Big\}$$
$$\text{s.t.} \quad \sum_{i=1}^N(\alpha_i^* - \alpha_i) = 0, \quad 0 \le \alpha_i^{(*)} \le \frac{C}{N}, \quad \sum_{i=1}^N(\alpha_i + \alpha_i^*) \le C\nu, \quad i = 1, \ldots, N.$$ (17)

LSSVR: The authors of [22] studied LSSVR for the Gaussian-noise model. The dual problem of LSSVR is

$$\max\Big\{ g_D^{LSSVR} = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_j K(A_i, A_j) + \sum_{i=1}^N \alpha_i y_i - \frac{N}{2C}\sum_{i=1}^N \alpha_i^2 \Big\} \quad \text{s.t.} \quad \sum_{i=1}^N \alpha_i = 0,$$ (18)

where $\xi_i, \xi_i^*$ are slack variables and $C > 0$, $\nu \in (0, 1]$ are constants. For νSVR and GNSVR, the size of $\epsilon$ is not given in advance; it is a variable whose value is traded off against the model complexity and the slack variables through $\nu$ [35].
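Since the LSSVR dual in Equation (18) is an equality-constrained concave quadratic program, it can also be solved exactly from its KKT linear system; this standard construction (not code from the paper) provides a baseline and a cross-check for the ALM sketch above:

```python
import numpy as np

def lssvr_solve(K, y, C=181.0):
    """Solve the LSSVR dual (18) exactly: the KKT conditions of
    max -0.5*a'Ka + y'a - (N/2C)*a'a  s.t.  sum(a) = 0
    form one linear system in (alpha, b)."""
    N = len(y)
    M = np.zeros((N + 1, N + 1))
    M[:N, :N] = K + (N / C) * np.eye(N)   # stationarity block
    M[:N, N] = 1.0                        # multiplier of sum(alpha) = 0,
    M[N, :N] = 1.0                        # which coincides with the bias b
    sol = np.linalg.solve(M, np.append(y, 0.0))
    return sol[:N], sol[N]                # alpha, b
```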

In Figure 5, Figure 8, and Figure 11, the wind-speed forecasting results at the points $A_i$ of νSVR, GNSVR, LSSVR, and GLMLSSVR are presented after 10, 30, and 60 min, respectively. Figure 6, Figure 9, and Figure 12 show the error statistics of wind-speed prediction using the above four models. The box plots (Figure 7, Figure 10, and Figure 13) further and intuitively demonstrate the comparative error statistics of the four wind-speed forecasting models at several noise levels. The statistical criteria MAE, MAPE, RMSE, and SEP are displayed in Table 1, Table 2, and Table 3.

Table 1.

Error statistic of four wind-speed forecasting models after 10 min.

Model MAE (m/s) RMSE (m/s) MAPE (%) SEP (%)
νSVR 0.4280 0.5833 8.02 7.12
GNSVR 0.4256 0.5789 7.92 7.07
LSSVR 0.4219 0.5768 7.94 7.06
GLMLSSVR 0.4190 0.5711 7.91 7.05

Table 2.

Error statistic of four wind-speed forecasting models after 30 min.

Model MAE (m/s) RMSE (m/s) MAPE (%) SEP (%)
νSVR 0.7979 1.0116 23.36 12.53
GNSVR 0.7368 0.9886 19.93 11.89
LSSVR 0.7109 0.9226 17.17 11.43
GLMLSSVR 0.6185 0.8241 10.71 10.19

Table 3.

Error statistic of four wind-speed forecasting models after 60 min.

Model MAE (m/s) RMSE (m/s) MAPE (%) SEP (%)
νSVR 0.9994 1.2580 33.93 15.66
GNSVR 0.9728 1.2355 31.78 15.37
LSSVR 0.9646 1.2177 29.01 15.16
GLMLSSVR 0.8835 1.1180 25.72 13.97

From the box-whisker plots in Figure 7, Figure 10, and Figure 13, as well as Table 1, Table 2, and Table 3, it can be concluded that, in most cases, the forecasting error of GLMLSSVR is smaller than that of νSVR, GNSVR, and LSSVR. As the prediction horizon increases to 30 and 60 min, the forecasting errors of all models grow, and the relative advantage of any single model becomes less decisive. Nevertheless, Table 1, Table 2, and Table 3 show that, under all of the criteria MAE, MAPE, RMSE, and SEP, the Gaussian–Laplacian mixed-noise model is consistently better than the classical models.

6. Conclusions

Most existing regression techniques suppose that the noise model is single. Wind-speed forecasting is complicated by volatility and uncertainty, so it is difficult to model with a single noise distribution. Our main work is summarized as follows: (1) the optimal empirical risk loss of G-L mixed noise is deduced by the Bayesian principle; (2) the LSSVR models of G-L mixed homoscedastic noise (GLMLSSVR) and G-L mixed heteroscedastic noise (GLMHLSSVR) for complicated noise are developed; (3) the dual problems of GLMLSSVR and GLMHLSSVR are obtained using the Lagrange functional and the KKT conditions; (4) the stability and effectiveness of the algorithm are ensured by solving GLMLSSVR with the ALM method; and (5) the proposed technique is used to predict short-term wind speed from historical data, forecasting the wind speed 10, 30, and 60 min ahead, respectively. The comparison results show that the proposed model is better than classical techniques under the statistical criteria.

In the same way, one can also study Gaussian–Laplacian or Gaussian–Weibull mixed-noise classification models. Such new hybrid noise models would effectively address complicated-noise classification problems.

Abbreviations

The following abbreviations are used in this manuscript:

LR Linear regression model
ν-SVR ν-Support vector regression
GN-SVR ν-SVR model of Gaussian homoscedastic-noise
LSSVR Least squares support vector regression model
GLM-LSSVR LSSVR model of Gaussian–Laplacian mixed homoscedastic-noise
ALM Augmented Lagrange multiplier method

Author Contributions

Conceptualization, S.Z.; Formal analysis, S.Z. and T.Z.; Methodology, S.Z. and L.S.; Writing–original draft, S.Z. and T.Z.; Writing–review & editing, W.W. and B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (NSFC) (Nos. 11702087 and 61772176) and the Natural Science Foundation Project of Henan (No. 182300410130).

Conflicts of Interest

The authors declare no conflict of interest.

References

  • 1.Bishop C.M. Pattern Recognition and Machine Learning. Springer; New York, NY, USA: 2006. [Google Scholar]
  • 2.Tikhonov A.A., Arsenin V.Y. Solutions of Ill-Posed Problems. Wiley; New York, NY, USA: 1977. [Google Scholar]
  • 3.Gonen A., Orabona F., Shalev-Shwartz S. Solving Ridge Regression using Sketched Preconditioned SVRG; Proceedings of the 33rd International Conference on Machine Learning; New York, NY, USA. 19–24 June 2016. [Google Scholar]
  • 4.Hoerl A.E. Application of ridge analysis to regression problems. Chem. Eng. Prog. 1962;58:54–59. [Google Scholar]
  • 5.Zhang Z.H., Dai G., Xu C.F. Regularized Discriminant Analysis, Ridge Regression and Beyond. J. Mach. Learn. Res. 2010;11:2199–2228. [Google Scholar]
  • 6.Sun L., Wang L., Ding W., Qian Y., Xu J. Feature Selection Using Fuzzy Neighborhood Entropy-Based Uncertainty Measures for Fuzzy Neighborhood Multigranulation Rough Sets. IEEE Trans. Fuzzy Syst. 2020 doi: 10.1109/TFUZZ.2020.2989098. [DOI] [Google Scholar]
  • 7.Jiao L.C., Bo L.F., Wang L. Fast Sparse Approximation for Least Squares Support Vector Machine. IEEE Trans. Neural Netw. 2007;18:685–697. doi: 10.1109/TNN.2006.889500. [DOI] [PubMed] [Google Scholar]
  • 8.Völgyesi L., Palánc B., Fekete K., Popper G. Application of Kernel Ridge Regression to Network Levelling via Mathematica. Geophys. Res. Abstr. 2005;73:263–276. [Google Scholar]
  • 9.Sun L., Zhang X., Qian Y., Xu J., Zhang S. Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification. Inf. Sci. 2019;502:18–41. doi: 10.1016/j.ins.2019.05.072. [DOI] [Google Scholar]
  • 10.Douak F., Melgani F., Benoudjit N. Kernel ridge regression with active learning for wind-speed prediction. Appl. Energy. 2013;103:328–340. doi: 10.1016/j.apenergy.2012.09.055. [DOI] [Google Scholar]
  • 11.Alexiadis M.C., Dokopoulos P.S., Sahsamanoglou H.S., Manousaridis I.M. Short term forecasting of wind speed and related electrical power. J. Sol. Energy. 1998;63:61–68. doi: 10.1016/S0038-092X(98)00032-2. [DOI] [Google Scholar]
  • 12.Negnevitsky M., Potter C.W. Innovative short-term wind generation prediction techniques; Proceedings of the power systems conference and exposition; Atlanta, GA, USA. 29 October–1 November 2006. [Google Scholar]
  • 13.Torres J.L., Garcia A., De Blas M., De Francisco A. Forecast of hourly average wind speed with ARMA models in Navarre (Spain) J. Sol. Energy. 2005;79:65–77. doi: 10.1016/j.solener.2004.09.013. [DOI] [Google Scholar]
  • 14.Kavasseri R.G., Seetharaman K. Day-ahead wind-speed forecasting using f-ARIMA models. Renew. Energy. 2009;34:1388–1393. doi: 10.1016/j.renene.2008.09.006. [DOI] [Google Scholar]
  • 15.Li G., Shi J. On comparing three artificial neural networks for wind speed forecasting. Appl. Energy. 2010;87:2313–2320. doi: 10.1016/j.apenergy.2009.12.013. [DOI] [Google Scholar]
  • 16.Hu Q., Zhang R., Zhou Y. Transfer learning for short-term wind-speed prediction with deep neural networks. Renew. Energy. 2016;85:83–95. doi: 10.1016/j.renene.2015.06.034. [DOI] [Google Scholar]
  • 17.Salcedo-Sanz S., Ortiz-Garcı E.G., Pérez-Bellido Á.M., Portilla-Figueras A., Prieto L. Short term wind-speed prediction based on evolutionary support vector regression algorithms. Expert Syst. Appl. 2011;38:4052–4057. doi: 10.1016/j.eswa.2010.09.067. [DOI] [Google Scholar]
  • 18.Zhou J., Shi J., Li G. Fine tuning support vector machines for short-term wind speed forecasting. Energy Convers. Manag. 2011;52:1990–1998. doi: 10.1016/j.enconman.2010.11.007. [DOI] [Google Scholar]
  • 19.Liu H., Tian H.-Q., Chen C., Li Y.-F. A hybrid statistical method to predict wind speed and wind power. Renew. Energy. 2010;35:1857–1861. doi: 10.1016/j.renene.2009.12.011. [DOI] [Google Scholar]
  • 20.Wang Y., Hu Q., Li L., Foley A.M., Srinivasan D. Approaches to wind power curve modeling: A review and discussion. Renew. Sustain. Energy Rev. 2019;116:109422. doi: 10.1016/j.rser.2019.109422. [DOI] [Google Scholar]
  • 21.Suykens J., Vandewalle J. Least Squares Support Vector Machine Classifiers. Neural Process. Lett. 1999;9:293–300. doi: 10.1023/A:1018628609742. [DOI] [Google Scholar]
  • 22.Suykens J., Lukas L., Vandewalle J. Sparse approximation using least square vector machines; Proceedings of the IEEE International Symposium on Circuits and Systems; Geneva, Switzerland. 28–31 May 2000; pp. 757–760. [Google Scholar]
  • 23.Suykens J., De Brabanter J., Lukas L., Vandewalle J. Weighted least squares support vector machines: robustness and sparse approximation. Neurocomputing. 2002;48:85–105. doi: 10.1016/S0925-2312(01)00644-0. [DOI] [Google Scholar]
  • 24.Du P., Wang J., Guo Z., Yang W. Research and application of a novel hybrid forecasting system based on multi-objective optimization for wind speed forecasting. Energy Convers. Manag. 2017;150:90–107. doi: 10.1016/j.enconman.2017.07.065. [DOI] [Google Scholar]
  • 25.Sun L., Wang L., Ding W., Qian Y., Xu J. Neighborhood multi-granulation rough sets-based attribute reduction using Lebesgue and entropy measures in incomplete neighborhood decision systems. Knowl. -Based Syst. 2020;192:105373. doi: 10.1016/j.knosys.2019.105373. [DOI] [Google Scholar]
  • 26.Jiang Y., Huang G.Q. A hybrid method based on singular spectrum analysis, firefly algorithm, and BP neural network for short-term wind-speed forecasting. Energies. 2016;9:757. [Google Scholar]
  • 27.Jiang Y., Huang G. Short-term wind speed prediction: Hybrid of ensemble empirical mode decomposition, feature selection and error correction. Energy Convers. Manag. 2017;144:340–350. doi: 10.1016/j.enconman.2017.04.064. [DOI] [Google Scholar]
  • 28.Zhang S., Zhou T., Sun L., Wang W., Wang C., Mao W. ν-Support Vector Regression Model Based on Gauss-Laplace Mixture Noise Characteristic for Wind Speed Prediction. Entropy. 2019;21:1056. doi: 10.3390/e21111056. [DOI] [Google Scholar]
  • 29.Shevade S., Keerthi S.S., Bhattacharyya C., Murthy K. Improvements to the SMO algorithm for SVM regression. IEEE Trans. Neural Netw. 2000;11:1188–1193. doi: 10.1109/72.870050. [DOI] [PubMed] [Google Scholar]
  • 30.Klaus-Robert M., Sebastian M. An introduction to kernel-based learning algorithms. IEEE Trans. Neural Netw. 2001;12:181–202. doi: 10.1109/72.914517. [DOI] [PubMed] [Google Scholar]
  • 31.Chu W., Keerthi S., Ong C.J. Bayesian Support Vector Regression Using a Unified Loss Function. IEEE Trans. Neural Netw. 2004;15:29–44. doi: 10.1109/TNN.2003.820830. [DOI] [PubMed] [Google Scholar]
  • 32.Rockafellar R.T. Augmented Lagrange Multiplier Functions and Duality in Nonconvex Programming. SIAM J. Control. 1974;12:268–285. doi: 10.1137/0312021. [DOI] [Google Scholar]
  • 33.Boyd S., Vandenberghe L. Convex Optimization. Cambridge University Press; Cambridge, UK: 2004. pp. 521–620. [Google Scholar]
  • 34.Wang S., Zhang N., Wu L., Wang Y. Wind speed forecasting based on the hybrid ensemble empirical mode decomposition and GA-BP neural network method. Renew. Energy. 2016;94:629–636. doi: 10.1016/j.renene.2016.03.103. [DOI] [Google Scholar]
  • 35.Bordes A., Bottou L., Gallinari P. SGD-QN: Careful quasiNewton stochastic gradient descent. J. Mach. Learn. Res. 2009;10:1737–1754. [Google Scholar]
  • 36.Bludszuweit H., Dominguez-Navarro J., Llombart A. Statistical Analysis of Wind Power Forecast Error. IEEE Trans. Power Syst. 2008;23:983–991. doi: 10.1109/TPWRS.2008.922526. [DOI] [Google Scholar]
  • 37.Fabbri A., Román T.G.S., Abbad J.R., Quezada V.H.M. Assessment of the cost associated with wind generation prediction errors in a liberalized electricity market. IEEE Trans. Power Syst. 2005;20:1440–1446. doi: 10.1109/TPWRS.2005.852148. [DOI] [Google Scholar]
  • 38.Guo Z., Zhao J., Zhang W., Wang J. A corrected hybrid approach for wind speed prediction in Hexi Corridor of China. Energy. 2011;36:1668–1679. doi: 10.1016/j.energy.2010.12.063. [DOI] [Google Scholar]
  • 39.Wang J.Z., Hu J.M. A robust combination approach for short-term wind-speed forecasting and analysis-Combination of the ARIMA, ELM, SVM and LSSVM forecasts using a GPR model. Energy. 2015;93:41–56. doi: 10.1016/j.energy.2015.08.045. [DOI] [Google Scholar]
  • 40.Abdoos A.A. A new intelligent method based on combination of VMD and ELM for short term wind power forecasting. Neurocomputing. 2016;203:111–120. doi: 10.1016/j.neucom.2016.03.054. [DOI] [Google Scholar]
  • 41.Chalimourda A., Schölkopf B., Smola A.J. Experimentally optimal ν in support vector regression for different noise models and parameter settings. Neural Netw. 2004;17:127–141. doi: 10.1016/S0893-6080(03)00209-0. [DOI] [PubMed] [Google Scholar]
  • 42.Cherkassky V., Ma Y. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw. 2004;17:113–126. doi: 10.1016/S0893-6080(03)00169-2. [DOI] [PubMed] [Google Scholar]
  • 43.Kwok J.T., Tsang I.W. Linear dependency between and the input noise in ϵ-support vector regression. IEEE Trans. Neural Netw. 2003;14:544–553. doi: 10.1109/TNN.2003.810604. [DOI] [PubMed] [Google Scholar]
  • 44.Schölkopf B., Smola A.J., Williamson R.C., Bartlett P. New Support Vector Algorithms. Neural Comput. 2000;12:1207–1245. doi: 10.1162/089976600300015565. [DOI] [PubMed] [Google Scholar]
  • 45.Wu Q. A hybrid-forecasting model based on Gaussian support vector machine and chaotic particle swarm optimization. Expert Syst. Appl. 2010;37:2388–2394. doi: 10.1016/j.eswa.2009.07.057. [DOI] [Google Scholar]
  • 46.Wu Q., Law R. The forecasting model based on modified SVRM and PSO penalizing Gaussian noise. Expert Syst. Appl. 2011;38:1887–1894. doi: 10.1016/j.eswa.2010.07.120. [DOI] [Google Scholar]

