Entropy. 2020 Jun 6;22(6):629. doi: 10.3390/e22060629

LSSVR Model of G-L Mixed Noise-Characteristic with Its Applications

Shiguang Zhang 1,2,3,*, Ting Zhou 4,*, Lin Sun 1,3, Wei Wang 1, Baofang Chang 1
PMCID: PMC7517163  PMID: 33286401

Abstract

Due to the complexity of wind speed, it has been reported that mixed-noise models, constituted by multiple noise distributions, perform better than single-noise models. However, most existing regression models assume a single noise distribution. Therefore, we study the least squares SVR with Gaussian–Laplacian mixed homoscedastic noise (GLMLSSVR) and with Gaussian–Laplacian mixed heteroscedastic noise (GLMHLSSVR) for complicated or unknown noise distributions. The augmented Lagrange multiplier (ALM) technique is used to solve the GLMLSSVR model, and GLMLSSVR is applied to short-term wind-speed prediction with historical data. The prediction results indicate that the presented model is superior to the single-noise model and performs well.

Keywords: least squares SVR, Gaussian–Laplacian mixed noise characteristic, empirical risk loss, equality constraint, wind-speed forecasting

1. Introduction

In practical applications, if the data are collected in a multi-source environment, the noise distribution is complex and unknown. Therefore, it is almost impossible for a single noise distribution to describe the real noise clearly [1]. LSSVR is a method of LR that implements a sum-of-squares error function together with regularization, thus controlling the bias–variance trade-off [2,3]. It is intended to find the hidden linear structure in the original data [4,5]. To pass from linear to nonlinear functions, the following generalization can be made [6]: input vectors are mapped into a high-dimensional feature space H (a Hilbert space) through some nonlinear mapping, and the solution of the optimization problem is sought in H. Using a suitable kernel function K(·,·), nonlinear mappings can be estimated by kernel LSSVR, an extension of LR with kernel techniques. In recent years, LSSVR has become increasingly popular as a data-rich nonlinear forecasting tool [7], applicable in many different contexts [8,9,10], such as machine learning, optical character recognition, and especially wind speed/power forecasting.

Generally, the existing techniques used for wind-speed forecasting include: (i) physical; (ii) statistical (also called data-driven); and (iii) artificial intelligence (AI)-based methods. The physical models attempt to estimate wind flow around and inside the wind farm using the physical laws governing atmospheric behavior [11,12]. The statistical models seek relationships between a set of explanatory variables and the on-line measured generation data, and only the historical wind-speed data recorded at the site are used to establish the statistical model; this can be done in a variety of ways, including the persistence method and auto-regressive models [13,14]. AI methods include artificial neural networks (ANNs) [15], deep learning [16], SVR machines [17,18], and hybrid methods [19,20].

Suykens et al. [21,22,23] proposed the least squares support vector regression model with Gaussian noise (LSSVR, also known as kernel ridge regression (KRR)). A mixed model based on multi-objective optimization [24,25] and a mixed method based on singular spectrum analysis, the firefly algorithm, and a BP neural network [26] predict wind speed with complicated noise, indicating that mixed prediction methods have powerful predictive ability. A mixed LSSVR machine [27] has been applied to forecast wind speed with noise, improving the performance of wind-speed prediction. GLMSVR models [28], fitted with Gaussian–Laplacian (G-L) mixed noise, have been developed and obtain good performance compared with existing regression algorithms.

To solve the above problems, we study the LSSVR model of the G-L mixed noise characteristic for complex or unknown noise distributions, and we construct a technique to search for the optimal solution of the corresponding regression task. Although many LSSVR algorithms have been implemented in past years, we exploit the ALM method, as shown in Section 4. If the task is not differentiable or is discontinuous, the subgradient descent method can be employed; the SMO algorithm [29] can also be used for very large sample sizes.

The structure of this paper is as follows. Section 2 derives the optimal empirical risk loss by Bayesian principle. Section 3 constructs the LSSVR model of G-L mixed noise. Section 4 gives the solution and algorithm design of GLMLSSVR. In Section 5, the numerical experiment of short-term wind-speed prediction is presented. Finally, we conclude the work.

2. Bayesian Principle to Mixed Noise Empirical Risk Loss

Given the dataset

$D_N = \{(A_1, y_1), (A_2, y_2), \ldots, (A_N, y_N)\},$ (1)

where $A_i = (x_{i1}, x_{i2}, \ldots, x_{in})^T \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$ $(i=1,2,\ldots,N)$ are the training data. $\mathbb{R}$ denotes the set of real numbers, $\mathbb{R}^n$ is the $n$-dimensional Euclidean space, $N$ is the sample size, and the superscript $T$ denotes the matrix transpose. Assuming that the samples of the dataset $D_N$ are generated by an additive noise process $\xi$, the relationship between the measured value $y_i$ and the predicted value $f(A_i)$ is:

$y_i = f(A_i) + \xi_i, \quad i = 1, 2, \ldots, N,$ (2)

where the $\xi_i$ are random and i.i.d. (independent and identically distributed) with density $p(\xi_i)$ of mean $\mu$ and standard deviation $\sigma$. Generally, the noise PDF (probability density function) $p(\xi) = p(y - f(A))$ is unknown. It is necessary to predict the unknown target $f(A)$ from the training set $D_f \subseteq D_N$.

Following the authors of [30,31], the optimal empirical risk loss in the sense of Maximum Likelihood Estimation (MLE) is

$l(\xi) = l(A, y, f(A)) = -\log p(y - f(A)),$ (3)

i.e., the empirical risk loss $l(\xi)$ is the negative log-likelihood of the noise characteristic.

It is assumed that the noise in Equation (2) is Laplacian, with PDF $p(\xi) = \frac{1}{2}e^{-|\xi|}$. By Equation (3), the MLE-optimal empirical risk loss is then $l(\xi) = |\xi|$.

Suppose the noise in Equation (2) is Gaussian with zero mean and homoscedastic standard deviation $\sigma$. By Equation (3), the empirical risk loss of Gaussian noise with homoscedasticity is $l(\xi) = \frac{1}{2\sigma^2}\xi^2$. If instead the noise in Equation (2) is Gaussian with zero mean and heteroscedastic standard deviation $\sigma_i$, then by Equation (3) the empirical risk loss for Gaussian noise with heteroscedasticity is $l(\xi_i) = \frac{1}{2\sigma_i^2}\xi_i^2$ $(i = 1, \ldots, N)$.
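For instance, substituting the Gaussian density into Equation (3) makes the quadratic loss explicit:

$$l(\xi) = -\log\Big(\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\xi^2/(2\sigma^2)}\Big) = \frac{\xi^2}{2\sigma^2} + \log\big(\sqrt{2\pi}\,\sigma\big),$$

where the additive constant $\log(\sqrt{2\pi}\,\sigma)$ does not affect minimization and is dropped; likewise $-\log\big(\frac{1}{2}e^{-|\xi|}\big) = |\xi| + \log 2$ recovers the Laplacian loss.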

Assume the noise $\xi$ in Equation (2) is a mixture of two kinds of noise with PDFs $p_1(\xi)$ and $p_2(\xi)$, respectively, and suppose that $p(\xi) = [p_1(\xi)]^{\lambda_1} \cdot [p_2(\xi)]^{\lambda_2}$. By Equation (3), the corresponding empirical risk loss of the mixed noise is

$l(\xi) = \lambda_1 \cdot l_1(\xi) + \lambda_2 \cdot l_2(\xi),$ (4)

where $l_1(\xi) > 0$ and $l_2(\xi) > 0$ are the convex empirical risk losses of the above two kinds of noise characteristic, respectively, and the weight factors satisfy $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$.
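As an illustration, here is a minimal sketch of the G-L mixed empirical risk loss of Equation (4), the function plotted in Figure 1 below; the function and parameter names are ours:

```python
import numpy as np

def gl_mixed_loss(xi, lam1=0.5, lam2=0.5, sigma=1.0):
    """G-L mixed empirical risk loss of Eq. (4):
    l(xi) = lam1 * xi^2 / (2*sigma^2) + lam2 * |xi|,
    with lam1, lam2 >= 0 and lam1 + lam2 = 1."""
    xi = np.asarray(xi, dtype=float)
    gaussian_part = xi ** 2 / (2.0 * sigma ** 2)   # l1: homoscedastic Gaussian
    laplacian_part = np.abs(xi)                    # l2: Laplacian
    return lam1 * gaussian_part + lam2 * laplacian_part

# Example: evaluate the loss on a grid of residuals, as in Figure 1.
xi = np.linspace(-3.0, 3.0, 7)
print(gl_mixed_loss(xi, lam1=0.5, lam2=0.5))
```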

Figure 1 displays the Gaussian–Laplacian (G-L) empirical risk loss for different parameter values (lambda in the legend denotes $\lambda$) [29].

Figure 1. G-L empirical risk loss of different parameters.

3. LSSVR Model of G-L Mixed Noise-Characteristic

Given the training samples $D_f \subseteq D_N$, construct the linear regressor $f(A) = \varpi^T A + b$. To deal with nonlinear problems, the procedure can be summarized as follows: input vectors $A_i \in \mathbb{R}^n$ are mapped into a high-dimensional feature space $H$ through a nonlinear mapping $\Phi$ (with a prior distribution), induced by a nonlinear kernel function $K(A_i, A_j)$; the kernel mapping $\Phi$ is associated with any positive definite Mercer kernel.

Definition 1

([6,28]). Positive definite Mercer kernel: Assume that $X$ is a subset of $\mathbb{R}^n$. A kernel function $K(A_i, A_j)$ defined on $X \times X$ is called a positive definite Mercer kernel if there exists a mapping $\Phi: X \to H$ ($H$ a Hilbert space) such that

$K(A_i, A_j) = (\Phi(A_i) \cdot \Phi(A_j)), \quad (i, j = 1, 2, \ldots, N),$ (5)

where $(\cdot)$ represents the inner product in the space $H$.

The optimization problem is then solved in the space $H$: the inner products $(A_i \cdot A_j)$ of input vectors are replaced by the inner products $(\Phi(A_i) \cdot \Phi(A_j))$ in the feature space $H$. Through the kernel $K(A_i, A_j) = (\Phi(A_i) \cdot \Phi(A_j))$, the linear model is extended to a nonlinear LSSVR, as sketched below.
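To make the kernel substitution concrete, here is a hedged sketch of building the Gram matrix that stands in for the inner products in $H$; the helper name and the bandwidth parameterization are ours:

```python
import numpy as np

def gaussian_gram(A, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||A_i - A_j||^2 / (2*sigma^2)).
    A has shape (N, n): one input vector A_i per row."""
    sq_norms = np.sum(A ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * A @ A.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))

# K[i, j] plays the role of the inner product (Phi(A_i) . Phi(A_j)) in H.
A = np.random.rand(5, 3)
K = gaussian_gram(A, sigma=0.8)
assert np.allclose(K, K.T) and np.allclose(np.diag(K), 1.0)
```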

In general, a mixed distribution has good approximation ability for any continuous distribution. When there is no prior knowledge of the real noise, it can adapt well to unknown or complicated noise. Thus, we present a unified LSSVR model with mixed noise characteristics (MLSSVR). The primal problem of the MLSSVR model is formalized as

$$\min\Big\{ g_P^{MLSSVR} = \frac{1}{2}\varpi^T\varpi + \frac{C}{N}\Big[\lambda_1 \sum_{i=1}^N l_1(\xi_i) + \lambda_2 \sum_{i=1}^N l_2(\xi_i)\Big] \Big\} \quad \text{s.t.} \quad \xi_i = y_i - \varpi^T\Phi(A_i) - b,$$ (6)

where the parameter $\varpi \in \mathbb{R}^n$ is the weight vector, $b$ is the bias term, $C > 0$ is the penalty parameter, and the weight factors satisfy $\lambda_1, \lambda_2 \ge 0$, $\lambda_1 + \lambda_2 = 1$. $(A_i, y_i) \in D_N$, and $\Phi(A)$ is a nonlinear mapping which transfers the input dataset to the higher-dimensional feature space $H$. $\xi_i = y_i - \varpi^T\Phi(A_i) - b$ is the random noise variable at time $i$ $(i = 1, 2, \ldots, N)$, and $l_1(\xi_i) > 0$, $l_2(\xi_i) > 0$ $(i = 1, 2, \ldots, N)$ are the convex loss functions for the noise characteristic at the sample point $(A_i, y_i) \in D_N$.

In application domains, most noise distributions obey neither a Gaussian nor a Laplacian distribution: the noise distribution is complicated, and it is almost impossible to describe real noise with a single distribution. It has been reported that mixed-noise models, constituted by multiple noise distributions, perform better than single-noise models [1]. As a function-fitting machine, the goal is to estimate an unknown function $f(A)$ from the dataset $D_f \subseteq D_N$. In this section, G-L mixed homoscedastic and heteroscedastic noise distributions are used to fit complicated noise characteristics.

3.1. LSSVR Model of G-L Mixed Homoscedastic Noise-Characteristic

Suppose the noise in Equation (2) is Gaussian with zero mean and homoscedastic standard deviation $\sigma$. By Equation (3), the empirical risk loss of the homoscedastic Gaussian noise characteristic is $l_1(\xi) = \frac{1}{2\sigma^2}\xi^2$, and that of the Laplacian noise is $l_2(\xi) = |\xi|$. Adopting the G-L mixed homoscedastic noise distribution to fit a complicated noise characteristic, by Equation (4) the empirical risk loss of G-L mixed homoscedastic noise is $l(\xi) = \frac{\lambda_1}{2\sigma^2}\xi^2 + \lambda_2|\xi|$. We thus put forward the LSSVR model of the G-L mixed homoscedastic noise characteristic (GLMLSSVR); the primal problem of GLMLSSVR is depicted as

$$\min\Big\{ g_P^{GLMLSSVR} = \frac{1}{2}\varpi^T\varpi + \frac{C}{N}\Big(\frac{\lambda_1}{2\sigma^2}\sum_{i=1}^N \xi_i^2 + \lambda_2\sum_{i=1}^N |\xi_i|\Big)\Big\} \quad \text{s.t.} \quad \xi_i = y_i - \varpi^T\Phi(A_i) - b,$$ (7)

where the parameter vector $\varpi \in \mathbb{R}^n$, $\sigma^2$ is homoscedastic, $C > 0$ is a penalty parameter, and the weight factors satisfy $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$.

Proposition 1.

The solution of the primal problem in Equation (7) of GLMLSSVR exists and is unique in $\varpi$.

Theorem 1.

The dual problem of the primal problem in Equation (7) is

$$\max\Big\{ g_D^{GLMLSSVR} = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_j K(A_i, A_j) + \sum_{i=1}^N \alpha_i y_i - \frac{N}{2C\lambda_1}\sum_{i=1}^N \sigma^2\Big(\alpha_i - \frac{C\lambda_2}{N}\Big)^2 \Big\} \quad \text{s.t.} \quad \sum_{i=1}^N \alpha_i = 0,$$ (8)

where $\sigma^2$ is homoscedastic, $C > 0$ is a penalty parameter, and the weight factors satisfy $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$.

Proof. 

We introduce the Lagrange functional $L(\varpi, b, \alpha, \xi)$ as $L(\varpi, b, \alpha, \xi) = \frac{1}{2}\varpi^T\varpi + \frac{C}{N}\Big(\frac{\lambda_1}{2\sigma^2}\sum_{i=1}^N \xi_i^2 + \lambda_2\sum_{i=1}^N |\xi_i|\Big) + \sum_{i=1}^N \alpha_i\big(y_i - \varpi^T\Phi(A_i) - b - \xi_i\big)$.

Minimizing $L(\varpi, b, \alpha, \xi)$ by setting its partial derivatives with respect to $\varpi$, $b$, and $\xi$ to zero, on the basis of the KKT conditions, we get

$\frac{\partial L}{\partial \varpi} = 0, \quad \frac{\partial L}{\partial b} = 0, \quad \frac{\partial L}{\partial \xi_i} = 0.$

We obtain

$\varpi = \sum_{i=1}^N \alpha_i \Phi(A_i),$
$\sum_{i=1}^N \alpha_i = 0,$
$\frac{C}{N}\Big(\frac{\lambda_1}{\sigma^2}\xi_i + \lambda_2\Big) - \alpha_i = 0 \quad (i = 1, 2, \ldots, N).$

Substituting these extremum conditions back into $L(\varpi, b, \alpha, \xi)$ and maximizing over $\alpha$ yields the dual problem in Equation (8) of the primal problem in Equation (7). □
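For completeness, the final substitution can be expanded. Using the third extremum condition (which takes $\xi_i \ge 0$, so $|\xi_i| = \xi_i$), we have $\xi_i = \frac{\sigma^2}{\lambda_1}\big(\frac{N\alpha_i}{C} - \lambda_2\big)$, and the $\xi$-dependent terms of $L$ collapse per sample as

$$\frac{C\lambda_1}{2N\sigma^2}\,\xi_i^2 + \frac{C\lambda_2}{N}\,\xi_i - \alpha_i\xi_i = -\frac{C\sigma^2}{2N\lambda_1}\Big(\frac{N\alpha_i}{C} - \lambda_2\Big)^2 = -\frac{N\sigma^2}{2C\lambda_1}\Big(\alpha_i - \frac{C\lambda_2}{N}\Big)^2,$$

which, combined with $\frac{1}{2}\varpi^T\varpi - \sum_i \alpha_i \varpi^T\Phi(A_i) = -\frac{1}{2}\sum_{i,j}\alpha_i\alpha_j K(A_i, A_j)$ and $b\sum_i \alpha_i = 0$, gives exactly the objective of Equation (8).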

Therefore,

$\varpi = \sum_{i=1}^N \alpha_i \Phi(A_i),$
$b = \frac{1}{N}\sum_{i=1}^N\Big[y_i - \sum_{j=1}^N \alpha_j K(A_i, A_j) - \frac{\sigma^2}{\lambda_1}\Big(\frac{N\alpha_i}{C} - \lambda_2\Big)\Big].$

The decision function for GLMLSSVR may be represented as

$f(A) = \varpi^T\Phi(A) + b = \sum_{i=1}^N \alpha_i K(A_i, A) + b,$

where the parameter vector $\varpi \in \mathbb{R}^n$, $\Phi: \mathbb{R}^n \to H$, $(\Phi(A_i) \cdot \Phi(A_j))$ is the inner product in $H$, and $K(A_i, A_j) = (\Phi(A_i) \cdot \Phi(A_j))$ is the kernel function.

Suppose the noise in Equation (2) is homoscedastic Gaussian noise, i.e., Gaussian noise with zero mean and homoscedastic variance $\sigma^2$. The dual problem of LSSVR then follows as a special case of Theorem 1 (taking $\lambda_1 = 1$, $\lambda_2 = 0$):

$$\max\Big\{ g_D^{LSSVR} = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_j K(A_i, A_j) + \sum_{i=1}^N \alpha_i y_i - \frac{N}{2C}\sum_{i=1}^N \sigma^2\alpha_i^2 \Big\} \quad \text{s.t.} \quad \sum_{i=1}^N \alpha_i = 0.$$ (9)

3.2. LSSVR Model of G-L Mixed Heteroscedastic Noise-Characteristic

It is assumed that the noise in Equation (2) is Gaussian with zero mean and heteroscedastic standard deviation $\sigma_i$, that is, $\sigma_i \ne \sigma_j$ for $i \ne j$ $(i, j = 1, \ldots, N)$. From Equation (3), the empirical risk loss of the heteroscedastic Gaussian noise characteristic is $l_1(\xi_i) = \frac{1}{2\sigma_i^2}\xi_i^2$, and the loss function of the Laplacian noise is $l_2(\xi_i) = |\xi_i|$ $(i = 1, \ldots, N)$. Utilizing the G-L mixed heteroscedastic noise distribution to model a complicated noise characteristic, from Equation (4) the loss function corresponding to G-L mixed heteroscedastic noise is $l(\xi_i) = \frac{\lambda_1}{2\sigma_i^2}\xi_i^2 + \lambda_2|\xi_i|$ $(i = 1, \ldots, N)$. We propose the new LSSVR model with the G-L mixed heteroscedastic noise characteristic (GLMHLSSVR); the primal problem of GLMHLSSVR is depicted as

$$\min\Big\{ g_P^{GLMHLSSVR} = \frac{1}{2}\varpi^T\varpi + \frac{C}{N}\Big(\frac{\lambda_1}{2}\sum_{i=1}^N \frac{\xi_i^2}{\sigma_i^2} + \lambda_2\sum_{i=1}^N |\xi_i|\Big)\Big\} \quad \text{s.t.} \quad \xi_i = y_i - \varpi^T\Phi(A_i) - b,$$ (10)

where the parameter vector $\varpi \in \mathbb{R}^n$, the $\sigma_i^2$ $(i = 1, 2, \ldots, N)$ are heteroscedastic, and $C > 0$ is the penalty parameter. The weight factors satisfy $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$.

Proposition 2.

The solution of the primal problem in Equation (10) of GLMHLSSVR exists and is unique in $\varpi$.

Theorem 2.

The dual problem of model GLMHLSSVR in Equation (10) is

$$\max\Big\{ g_D^{GLMHLSSVR} = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_j K(A_i, A_j) + \sum_{i=1}^N \alpha_i y_i - \frac{N}{2C\lambda_1}\sum_{i=1}^N \sigma_i^2\Big(\alpha_i - \frac{C\lambda_2}{N}\Big)^2 \Big\} \quad \text{s.t.} \quad \sum_{i=1}^N \alpha_i = 0,$$ (11)

where the $\sigma_i^2$ $(i = 1, 2, \ldots, N)$ are heteroscedastic and $C > 0$ is the penalty parameter. The weight factors satisfy $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$.

Proof. 

The proof of Theorem 2 follows by analogy with that of Theorem 1. □

We have

$\varpi = \sum_{i=1}^N \alpha_i \Phi(A_i),$
$b = \frac{1}{N}\sum_{i=1}^N\Big[y_i - \sum_{j=1}^N \alpha_j K(A_i, A_j) - \frac{\sigma_i^2}{\lambda_1}\Big(\frac{N\alpha_i}{C} - \lambda_2\Big)\Big].$

The decision function for GLMHLSSVR may be expressed as

$f(A) = \varpi^T\Phi(A) + b = \sum_{i=1}^N \alpha_i K(A_i, A) + b,$

where the parameter vector $\varpi \in \mathbb{R}^n$, $\Phi: \mathbb{R}^n \to H$, and $K(A_i, A_j)$ is the kernel function.

If the noise in Equation (2) is G-L mixed homoscedastic noise, in which the Gaussian component has zero mean and homoscedastic variance $\sigma^2$, then Theorem 1 can be deduced from Theorem 2.

4. Solution from ALM

In this section, we use the augmented Lagrange multiplier (ALM) method [32] to solve the dual problem in Equation (8) by applying gradient descent or Newton's method to a sequence of equality-constrained problems; by eliminating the equality constraints, arbitrary equality-constrained problems can be reduced to equivalent unconstrained ones [33,34]. For large-scale training samples, rapid optimization techniques can be combined with the proposed model, for example the sequential minimal optimization (SMO) algorithm [29] and the stochastic gradient descent (SGD) algorithm [35].

Theorems 1 and 2 provide effective identification techniques for GLMLSSVR and GLMHLSSVR, respectively. In this section, we derive the ALM solution and the algorithm design for the LSSVR model of the G-L mixed homoscedastic noise characteristic (GLMLSSVR); a code sketch follows the steps below. Analogously, the solution of the GLMHLSSVR model can be obtained by the ALM method.

(1) Let dataset be DN={(A1,y1),(A2,y2),,(AN,yN)}, where AiRn, yiR, i=1,,N.

(2) Search for the optimal parameters $C, \lambda_1, \lambda_2$ using the 10-fold cross-validation strategy, and select an appropriate kernel function $K(\cdot, \cdot)$.

(3) Solve the GLMLSSVR dual problem in Equation (8), and obtain the optimal solution $\alpha = (\alpha_1, \ldots, \alpha_N)$.

(4) Build the decision function as follows:

$f(A) = \varpi^T\Phi(A) + b = \sum_{i=1}^N \alpha_i K(A_i, A) + b.$

Here the parameter vector $\varpi \in \mathbb{R}^n$, $b = \frac{1}{N}\sum_{i=1}^N\big[y_i - \sum_{j=1}^N \alpha_j K(A_i, A_j) - \frac{\sigma^2}{\lambda_1}\big(\frac{N\alpha_i}{C} - \lambda_2\big)\big]$, $\Phi: \mathbb{R}^n \to H$, $(\Phi(A_i) \cdot \Phi(A_j))$ $(i, j = 1, 2, \ldots, N)$ is the inner product in $H$, and $K(A_i, A_j) = (\Phi(A_i) \cdot \Phi(A_j))$ is a kernel function.
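A minimal sketch of steps (3) and (4), assuming the reconstructed dual in Equation (8); the penalty ρ, step size, and iteration counts are our illustrative choices, and the inner gradient descent stands in for the gradient/Newton steps mentioned above:

```python
import numpy as np

def glmlssvr_alm(K, y, C=181.0, lam1=0.5, lam2=0.5, sigma2=1.0,
                 rho=10.0, outer=50, inner=200, lr=None):
    """Sketch of solving the GLMLSSVR dual (8) by the augmented
    Lagrange-multiplier (ALM) method: minimize -g_D(alpha) subject to
    sum(alpha) = 0, handled by a multiplier beta and penalty rho.
    Parameter names are ours; K is the N x N kernel Gram matrix."""
    N = len(y)
    shift = C * lam2 / N                      # target of the quadratic term
    tau = N * sigma2 / (C * lam1)             # weight of the quadratic term
    alpha, beta = np.zeros(N), 0.0
    if lr is None:                            # conservative step for the
        lr = 1.0 / (np.linalg.norm(K, 2) + tau + rho * N)  # quadratic objective
    for _ in range(outer):
        for _ in range(inner):
            # gradient of -g_D(alpha) + beta*sum(alpha) + (rho/2)*sum(alpha)^2
            g = K @ alpha - y + tau * (alpha - shift) \
                + (beta + rho * alpha.sum()) * np.ones(N)
            alpha -= lr * g
        beta += rho * alpha.sum()             # multiplier update
    # bias term, following the averaged expression after Theorem 1
    b = np.mean(y - K @ alpha - (sigma2 / lam1) * (N * alpha / C - lam2))
    return alpha, b

def predict(K_test, alpha, b):
    """f(A) = sum_i alpha_i K(A_i, A) + b; K_test has shape (M, N)."""
    return K_test @ alpha + b
```

For large $N$, the inner loop could be replaced by SMO or SGD, as noted above.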

5. Case Study

This section tests and verifies the validity of the constructed GLMLSSVR model by comparing it with other techniques on a dataset $D_N$ from Heilongjiang, China. The case study consists of the following subsections: the G-L mixed-noise characteristic of wind speed, prediction performance evaluation criteria, and short-term wind-speed forecasting based on an actual dataset.

5.1. G-L Mixed-Noise-Characteristic of Wind-Speed

To demonstrate the effectiveness of the proposed model, we collected wind-speed data from Heilongjiang. The dataset consists of more than one year of wind-speed data, with values recorded every 10 min. We first examined the G-L mixed noise and conducted experiments on it. Turbulence is the main reason for the high uncertainty of random wind-speed fluctuations, and from the wind-energy perspective the most significant feature of the resource is its variability. We now examine the distribution of wind speed: taking a wind-speed value every 5 s, we calculate the histogram of wind speed within 1–2 h. Two typical distributions are given, one calculated when the wind speed is high and the other when it is low (see Figure 2 and Figure 3, respectively).

Figure 2. High wind speed distribution.

Figure 3. Low wind speed distribution.

We analyzed a one-month time-series dataset and used the persistence method to investigate the error distribution [32]. The results show that the wind-speed error $\xi$ obtained from the persistence prediction does not follow a single distribution, but approximately follows the G-L mixed distribution, with PDF (up to normalization) $p(\xi) = \frac{1}{2}e^{-|\xi|} \cdot \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{\xi^2}{2\sigma^2}}$, as shown in Figure 4; a sketch of this error computation follows the figure.

Figure 4. G-L mixed distribution of wind-speed forecasting error with the persistence method.
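The persistence-error series behind Figure 4 can be sketched as follows (variable names ours; the persistence forecast simply reuses the previous observation):

```python
import numpy as np

def persistence_errors(x):
    """Persistence forecast: x_hat[t] = x[t-1]; the error series
    xi[t] = x[t] - x[t-1] is what Figure 4 histograms."""
    x = np.asarray(x, dtype=float)
    return x[1:] - x[:-1]

# A histogram of these errors can then be compared with the
# G-L mixed density above, e.g.:
# xi = persistence_errors(wind_speed)        # 10-min wind-speed series
# hist, edges = np.histogram(xi, bins=50, density=True)
```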

As can be seen from the above figures, the wind-speed error approximately satisfies the G-L mixed distribution; the forecasting task therefore involves mixed noise.

5.2. Prediction Performance Evaluation Criteria

It is generally known that no prediction model forecasts perfectly. The predictive performance of νSVR, GNSVR, LSSVR, and GLMLSSVR is assessed by standard evaluation criteria: MAE (mean absolute error), RMSE (root mean square error), MAPE (mean absolute percentage error), and SEP (standard error of prediction). The four criteria are defined as follows:

$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^N |y_i - \hat{y}_i|,$ (12)
$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^N \frac{|y_i - \hat{y}_i|}{y_i} \times 100\%,$ (13)
$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^N (y_i - \hat{y}_i)^2},$ (14)
$\mathrm{SEP} = \frac{\mathrm{RMSE}}{\bar{y}} \times 100\%,$ (15)

where $N$ is the size of the dataset $D_N$, $y_i$ is the $i$th actual observed value, $\hat{y}_i$ is the $i$th forecasted result, and $\bar{y}$ is the mean of the observations $y_i \in D_N$ [36,37,38,39,40]. MAE shows how close the predicted values are to the observed values, while RMSE measures the overall deviation between predicted and observed values. MAPE is the ratio between the error and the observed value, and SEP is the ratio of RMSE to the average observation. The latter two are dimensionless measures of the accuracy of a wind-speed system and are sensitive to small changes.
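In code, the four criteria of Equations (12)–(15) can be computed directly; a minimal sketch (the function name is ours, and all observed wind speeds are assumed positive):

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """MAE, RMSE, MAPE, SEP as in Equations (12)-(15)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                            # Eq. (12)
    mape = 100.0 * np.mean(np.abs(err) / y_true)          # Eq. (13)
    rmse = np.sqrt(np.mean(err ** 2))                     # Eq. (14)
    sep = 100.0 * rmse / np.mean(y_true)                  # Eq. (15)
    return mae, rmse, mape, sep
```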

5.3. Short-Term Wind-Speed Forecasting with Real dataset

In this section, 2160 consecutive data points (1–2160, a time span of 15 days) are extracted as the training set and 720 consecutive data points (2161–2880, a time span of 5 days) as the testing set. The input vector is $A_i = (x_{i-11}, x_{i-10}, \ldots, x_{i-1}, x_i)$, where $x_j$ is the actual observed wind speed at moment $j$ $(j = i-11, i-10, \ldots, i)$, and the forecasting target is $x_{i+step}$, where $step = 1, 3, 6$. That is, the above models are used to forecast the wind speed at each point $x_i$ after 10, 30, and 60 min, respectively; a windowing sketch is given below. Figures 5–13 describe the forecasting results given by the models νSVR, GNSVR, LSSVR, and GLMLSSVR.
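A sketch of the sliding-window construction described above (the helper name and its signature are ours):

```python
import numpy as np

def make_windows(x, width=12, step=1):
    """Build input vectors A_i = (x_{i-11}, ..., x_i) and targets
    x_{i+step}, matching the setup of Section 5.3."""
    x = np.asarray(x, dtype=float)
    A, y = [], []
    for i in range(width - 1, len(x) - step):
        A.append(x[i - width + 1 : i + 1])   # 12 consecutive observations
        y.append(x[i + step])                # value step points ahead
    return np.array(A), np.array(y)

# step = 1, 3, 6 corresponds to 10-, 30- and 60-min-ahead forecasts
# on a 10-min sampled series, e.g.:
# A_train, y_train = make_windows(x[:2160], step=1)
```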

Figure 5. Result of four wind-speed forecasting models after 10 min.

Figure 6. Error of four wind-speed forecasting models after 10 min.

Figure 7. Residual box plot of four wind-speed forecasting models after 10 min.

Figure 8. Result of four wind-speed forecasting models after 30 min.

Figure 9. Error of four wind-speed forecasting models after 30 min.

Figure 10. Residual box plot of four wind-speed forecasting models after 30 min.

Figure 11. Result of four wind-speed forecasting models after 60 min.

Figure 12. Error of four wind-speed forecasting models after 60 min.

Figure 13. Residual box plot of four wind-speed forecasting models after 60 min.

The models νSVR, GNSVR, LSSVR, and GLMLSSVR were implemented in Matlab 7.8. The initial parameter ranges of GLMLSSVR were $C \in [1, 200]$, $\nu \in (0, 1)$, and $\lambda_1, \lambda_2 \in [0, 1]$. The optimal parameters $C, \nu, \lambda_1, \lambda_2$ were searched using the 10-fold cross-validation technique; the technology of parameter selection is studied in detail in [41,42], and a cross-validation sketch follows the kernel definitions below. In this simulation, the parameters were set to $C = 181$, $\nu = 0.5$, $\lambda_1 = 0.5$, $\lambda_2 = 0.5$. Practical experience demonstrates that both the polynomial kernel and the Gaussian kernel perform well under the smoothness assumption. Accordingly, the models νSVR, GNSVR, LSSVR, and GLMLSSVR employ the polynomial and Gaussian kernel functions [43]:

$K(A_i, A_j) = ((A_i \cdot A_j) + 1)^d,$
$K(A_i, A_j) = e^{-\frac{\|A_i - A_j\|^2}{2\sigma^2}},$

where $d$ is a positive integer and $\sigma$ is a positive number.
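Parameter selection by 10-fold cross-validation can be sketched as a plain grid search over $C$, $\lambda_1$, and the kernel parameter; the helper below is hypothetical and the grids are illustrative, not the ones used in the paper:

```python
import numpy as np

def cv_select(A, y, train_fn, metric_fn, Cs, lam1s, sigmas, folds=10):
    """Hypothetical 10-fold CV grid search: train_fn(A_tr, y_tr, C, lam1,
    sigma) returns a predictor; metric_fn(y_true, y_pred) returns an error."""
    idx = np.arange(len(y))
    splits = np.array_split(idx, folds)
    best, best_err = None, np.inf
    for C in Cs:
        for lam1 in lam1s:            # lam2 = 1 - lam1 by the constraint
            for s in sigmas:
                errs = []
                for k in range(folds):
                    te = splits[k]
                    tr = np.concatenate(
                        [splits[j] for j in range(folds) if j != k])
                    f = train_fn(A[tr], y[tr], C, lam1, s)
                    errs.append(metric_fn(y[te], f(A[te])))
                if np.mean(errs) < best_err:
                    best, best_err = (C, lam1, s), np.mean(errs)
    return best, best_err
```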

The dual problems of νSVR, of the SVR with the Gaussian-noise model (GNSVR), and of LSSVR are as follows.

νSVR: The authors of [41,44] define the dual problem of νSVR as

$$\max\Big\{ g_D^{\nu SVR} = -\frac{1}{2}\sum_{i \in RSV}\sum_{j \in RSV}(\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)K(A_i, A_j) + \sum_{i \in RSV}(\alpha_i^* - \alpha_i)y_i \Big\}$$
$$\text{s.t.} \quad \sum_{i=1}^N(\alpha_i^* - \alpha_i) = 0, \quad 0 \le \alpha_i^{(*)} \le \frac{C}{N}, \quad \sum_{i=1}^N(\alpha_i + \alpha_i^*) \le C\nu, \quad i = 1, \ldots, N.$$ (16)

GNSVR: The authors of [45,46] studied SVR with equality and inequality constraints. The loss function of Gaussian noise is $c(\xi_i) = \xi_i^2/2$ $(i = 1, \ldots, N)$; thus, the dual problem of GNSVR is

$$\max\Big\{ g_D^{GNSVR} = -\frac{1}{2}\sum_{i \in RSV}\sum_{j \in RSV}(\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)K(A_i, A_j) + \sum_{i \in RSV}(\alpha_i^* - \alpha_i)y_i - \frac{N}{2C}\sum_{i=1}^N\big(\alpha_i^2 + (\alpha_i^*)^2\big) \Big\}$$
$$\text{s.t.} \quad \sum_{i=1}^N(\alpha_i^* - \alpha_i) = 0, \quad 0 \le \alpha_i^{(*)} \le \frac{C}{N}, \quad \sum_{i=1}^N(\alpha_i + \alpha_i^*) \le C\nu, \quad i = 1, \ldots, N.$$ (17)

LSSVR: The authors of [22] studied LSSVR for the Gaussian-noise model. The dual problem of LSSVR is

$$\max\Big\{ g_D^{LSSVR} = -\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_j K(A_i, A_j) + \sum_{i=1}^N \alpha_i y_i - \frac{N}{2C}\sum_{i=1}^N \alpha_i^2 \Big\} \quad \text{s.t.} \quad \sum_{i=1}^N \alpha_i = 0,$$ (18)

where $\xi_i, \xi_i^*$ are slack variables and $C > 0$, $\nu \in (0, 1]$ are constants. For νSVR and GNSVR, the size of $\epsilon$ is not given in advance; it is a variable whose value is traded off against the model complexity and the slack variables through $\nu$ [35].
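Since the LSSVR dual in Equation (18) is an equality-constrained concave quadratic program, it can also be solved exactly from its KKT linear system; this standard construction (not code from the paper) provides a baseline and a cross-check for the ALM sketch above:

```python
import numpy as np

def lssvr_solve(K, y, C=181.0):
    """Solve the LSSVR dual (18) exactly: the KKT conditions of
    max -0.5*a'Ka + y'a - (N/2C)*a'a  s.t.  sum(a) = 0
    form one linear system in (alpha, b)."""
    N = len(y)
    M = np.zeros((N + 1, N + 1))
    M[:N, :N] = K + (N / C) * np.eye(N)   # stationarity block
    M[:N, N] = 1.0                        # multiplier of sum(alpha) = 0,
    M[N, :N] = 1.0                        # which coincides with the bias b
    sol = np.linalg.solve(M, np.append(y, 0.0))
    return sol[:N], sol[N]                # alpha, b
```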

In Figure 5, Figure 8, and Figure 11, the wind-speed forecasting results at the points $A_i$ of νSVR, GNSVR, LSSVR, and GLMLSSVR are presented after 10, 30, and 60 min, respectively. Figure 6, Figure 9, and Figure 12 show the error statistics of wind-speed prediction using the above four models. The box plots (Figure 7, Figure 10, and Figure 13) further and intuitively demonstrate the comparative error statistics of the four wind-speed forecasting models at several noise levels. The statistical criteria MAE, MAPE, RMSE, and SEP are displayed in Table 1, Table 2, and Table 3.

Table 1.

Error statistic of four wind-speed forecasting models after 10 min.

Model MAE (m/s) RMSE (m/s) MAPE (%) SEP (%)
νSVR 0.4280 0.5833 8.02 7.12
GNSVR 0.4256 0.5789 7.92 7.07
LSSVR 0.4219 0.5768 7.94 7.06
GLMLSSVR 0.4190 0.5711 7.91 7.05

Table 2.

Error statistic of four wind-speed forecasting models after 30 min.

Model MAE (m/s) RMSE (m/s) MAPE (%) SEP (%)
νSVR 0.7979 1.0116 23.36 12.53
GNSVR 0.7368 0.9886 19.93 11.89
LSSVR 0.7109 0.9226 17.17 11.43
GLMLSSVR 0.6185 0.8241 10.71 10.19

Table 3.

Error statistic of four wind-speed forecasting models after 60 min.

Model MAE (m/s) RMSE (m/s) MAPE (%) SEP (%)
νSVR 0.9994 1.2580 33.93 15.66
GNSVR 0.9728 1.2355 31.78 15.37
LSSVR 0.9646 1.2177 29.01 15.16
GLMLSSVR 0.8835 1.1180 25.72 13.97

From the box-whisker plots in Figure 7, Figure 10, and Figure 13, as well as Table 1, Table 2, and Table 3, it can be concluded that, in most cases, the forecasting error of GLMLSSVR is smaller than that of νSVR, GNSVR, and LSSVR. As the prediction horizon increases to 30 and 60 min, the forecasting errors of all models grow, and the relative advantage of any single model becomes less decisive. Nevertheless, Table 1, Table 2, and Table 3 show that, under all of the criteria MAE, MAPE, RMSE, and SEP, the Gaussian–Laplacian mixed-noise model is consistently better than the classical models.

6. Conclusions

Most existing regression techniques suppose that the noise model is single. Wind-speed forecasting is complicated by volatility and uncertainty, so it is difficult to model with a single noise distribution. Our main work is summarized as follows: (1) the optimal empirical risk loss of G-L mixed noise is deduced by the Bayesian principle; (2) the LSSVR models of G-L mixed homoscedastic noise (GLMLSSVR) and G-L mixed heteroscedastic noise (GLMHLSSVR) for complicated noise are developed; (3) the dual problems of GLMLSSVR and GLMHLSSVR are obtained using the Lagrange functional and the KKT conditions; (4) the stability and effectiveness of the algorithm are ensured by solving GLMLSSVR with the ALM method; and (5) the proposed technique is used to predict short-term wind speed from historical data, forecasting the wind speed 10, 30, and 60 min ahead, respectively. The comparison results show that the proposed model is better than classical techniques under the statistical criteria.

In the same way, one can also study Gaussian–Laplacian or Gaussian–Weibull mixed-noise classification models. Such new hybrid noise models would effectively address complicated-noise classification problems.

Abbreviations

The following abbreviations are used in this manuscript:

LR Linear regression model
ν-SVR ν-Support vector regression
GN-SVR ν-SVR model of Gaussian homoscedastic-noise
LSSVR Least squares support vector regression model
GLM-LSSVR LSSVR model of Gaussian–Laplacian mixed homoscedastic-noise
ALM Augmented Lagrange multiplier method

Author Contributions

Conceptualization, S.Z.; Formal analysis, S.Z. and T.Z.; Methodology, S.Z. and L.S.; Writing–original draft, S.Z. and T.Z.; Writing–review & editing, W.W. and B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (NSFC) (Nos. 11702087 and 61772176) and the Natural Science Foundation Project of Henan (No. 182300410130).

Conflicts of Interest

The authors declare no conflict of interest.

References

  • 1.Bishop C.M. Pattern Recognition and Machine Learning. Springer; New York, NY, USA: 2006. [Google Scholar]
  • 2.Tikhonov A.A., Arsenin V.Y. Solutions of Ill-Posed Problems. Wiley; New York, NY, USA: 1977. [Google Scholar]
  • 3.Gonen A., Orabona F., Shalev-Shwartz S. Solving Ridge Regression using Sketched Preconditioned SVRG; Proceedings of the 33rd International Conference on Machine Learning; New York, NY, USA. 19–24 June 2016. [Google Scholar]
  • 4.Hoerl A.E. Application of ridge analysis to regression problems. Chem. Eng. Prog. 1962;58:54–59. [Google Scholar]
  • 5.Zhang Z.H., Dai G., Xu C.F. Regularized Discriminant Analysis, Ridge Regression and Beyond. J. Mach. Learn. Res. 2010;11:2199–2228. [Google Scholar]
  • 6.Sun L., Wang L., Ding W., Qian Y., Xu J. Feature Selection Using Fuzzy Neighborhood Entropy-Based Uncertainty Measures for Fuzzy Neighborhood Multigranulation Rough Sets. IEEE Trans. Fuzzy Syst. 2020 doi: 10.1109/TFUZZ.2020.2989098. [DOI] [Google Scholar]
  • 7.Jiao L.C., Bo L.F., Wang L. Fast Sparse Approximation for Least Squares Support Vector Machine. IEEE Trans. Neural Netw. 2007;18:685–697. doi: 10.1109/TNN.2006.889500. [DOI] [PubMed] [Google Scholar]
  • 8.Völgyesi L., Palánc B., Fekete K., Popper G. Application of Kernel Ridge Regression to Network Levelling via Mathematica. Geophys. Res. Abstr. 2005;73:263–276. [Google Scholar]
  • 9.Sun L., Zhang X., Qian Y., Xu J., Zhang S. Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification. Inf. Sci. 2019;502:18–41. doi: 10.1016/j.ins.2019.05.072. [DOI] [Google Scholar]
  • 10.Douak F., Melgani F., Benoudjit N. Kernel ridge regression with active learning for wind-speed prediction. Appl. Energy. 2013;103:328–340. doi: 10.1016/j.apenergy.2012.09.055. [DOI] [Google Scholar]
  • 11.Alexiadis M.C., Dokopoulos P.S., Sahsamanoglou H.S., Manousaridis I.M. Short term forecasting of wind speed and related electrical power. J. Sol. Energy. 1998;63:61–68. doi: 10.1016/S0038-092X(98)00032-2. [DOI] [Google Scholar]
  • 12.Negnevitsky M., Potter C.W. Innovative short-term wind generation prediction techniques; Proceedings of the power systems conference and exposition; Atlanta, GA, USA. 29 October–1 November 2006. [Google Scholar]
  • 13.Torres J.L., Garcia A., De Blas M., De Francisco A. Forecast of hourly average wind speed with ARMA models in Navarre (Spain) J. Sol. Energy. 2005;79:65–77. doi: 10.1016/j.solener.2004.09.013. [DOI] [Google Scholar]
  • 14.Kavasseri R.G., Seetharaman K. Day-ahead wind-speed forecasting using f-ARIMA models. Renew. Energy. 2009;34:1388–1393. doi: 10.1016/j.renene.2008.09.006. [DOI] [Google Scholar]
  • 15.Li G., Shi J. On comparing three artificial neural networks for wind speed forecasting. Appl. Energy. 2010;87:2313–2320. doi: 10.1016/j.apenergy.2009.12.013. [DOI] [Google Scholar]
  • 16.Hu Q., Zhang R., Zhou Y. Transfer learning for short-term wind-speed prediction with deep neural networks. Renew. Energy. 2016;85:83–95. doi: 10.1016/j.renene.2015.06.034. [DOI] [Google Scholar]
  • 17.Salcedo-Sanz S., Ortiz-Garcı E.G., Pérez-Bellido Á.M., Portilla-Figueras A., Prieto L. Short term wind-speed prediction based on evolutionary support vector regression algorithms. Expert Syst. Appl. 2011;38:4052–4057. doi: 10.1016/j.eswa.2010.09.067. [DOI] [Google Scholar]
  • 18.Zhou J., Shi J., Li G. Fine tuning support vector machines for short-term wind speed forecasting. Energy Convers. Manag. 2011;52:1990–1998. doi: 10.1016/j.enconman.2010.11.007. [DOI] [Google Scholar]
  • 19.Liu H., Tian H.-Q., Chen C., Li Y.-F. A hybrid statistical method to predict wind speed and wind power. Renew. Energy. 2010;35:1857–1861. doi: 10.1016/j.renene.2009.12.011. [DOI] [Google Scholar]
  • 20.Wang Y., Hu Q., Li L., Foley A.M., Srinivasan D. Approaches to wind power curve modeling: A review and discussion. Renew. Sustain. Energy Rev. 2019;116:109422. doi: 10.1016/j.rser.2019.109422. [DOI] [Google Scholar]
  • 21.Suykens J., Vandewalle J. Least Squares Support Vector Machine Classifiers. Neural Process. Lett. 1999;9:293–300. doi: 10.1023/A:1018628609742. [DOI] [Google Scholar]
  • 22.Suykens J., Lukas L., Vandewalle J. Sparse approximation using least square vector machines; Proceedings of the IEEE International Symposium on Circuits and Systems; Geneva, Switzerland. 28–31 May 2000; pp. 757–760. [Google Scholar]
  • 23.Suykens J., De Brabanter J., Lukas L., Vandewalle J. Weighted least squares support vector machines: robustness and sparse approximation. Neurocomputing. 2002;48:85–105. doi: 10.1016/S0925-2312(01)00644-0. [DOI] [Google Scholar]
  • 24.Du P., Wang J., Guo Z., Yang W. Research and application of a novel hybrid forecasting system based on multi-objective optimization for wind speed forecasting. Energy Convers. Manag. 2017;150:90–107. doi: 10.1016/j.enconman.2017.07.065. [DOI] [Google Scholar]
  • 25.Sun L., Wang L., Ding W., Qian Y., Xu J. Neighborhood multi-granulation rough sets-based attribute reduction using Lebesgue and entropy measures in incomplete neighborhood decision systems. Knowl. -Based Syst. 2020;192:105373. doi: 10.1016/j.knosys.2019.105373. [DOI] [Google Scholar]
  • 26.Jiang Y., Huang G.Q. A hybrid method based on singular spectrum analysis, firefly algorithm, and BP neural network for short-term wind-speed forecasting. Energies. 2016;9:757. [Google Scholar]
  • 27.Jiang Y., Huang G. Short-term wind speed prediction: Hybrid of ensemble empirical mode decomposition, feature selection and error correction. Energy Convers. Manag. 2017;144:340–350. doi: 10.1016/j.enconman.2017.04.064. [DOI] [Google Scholar]
  • 28.Zhang S., Zhou T., Sun L., Wang W., Wang C., Mao W. ν-Support Vector Regression Model Based on Gauss-Laplace Mixture Noise Characteristic for Wind Speed Prediction. Entropy. 2019;21:1056. doi: 10.3390/e21111056. [DOI] [Google Scholar]
  • 29.Shevade S., Keerthi S.S., Bhattacharyya C., Murthy K. Improvements to the SMO algorithm for SVM regression. IEEE Trans. Neural Netw. 2000;11:1188–1193. doi: 10.1109/72.870050. [DOI] [PubMed] [Google Scholar]
  • 30.Klaus-Robert M., Sebastian M. An introduction to kernel-based learning algorithms. IEEE Trans. Neural Netw. 2001;12:181–202. doi: 10.1109/72.914517. [DOI] [PubMed] [Google Scholar]
  • 31.Chu W., Keerthi S., Ong C.J. Bayesian Support Vector Regression Using a Unified Loss Function. IEEE Trans. Neural Netw. 2004;15:29–44. doi: 10.1109/TNN.2003.820830. [DOI] [PubMed] [Google Scholar]
  • 32.Rockafellar R.T. Augmented Lagrange Multiplier Functions and Duality in Nonconvex Programming. SIAM J. Control. 1974;12:268–285. doi: 10.1137/0312021. [DOI] [Google Scholar]
  • 33.Boyd S., Vandenberghe L. Convex Optimization. Cambridge University Press; Cambridge, UK: 2004. pp. 521–620. [Google Scholar]
  • 34.Wang S., Zhang N., Wu L., Wang Y. Wind speed forecasting based on the hybrid ensemble empirical mode decomposition and GA-BP neural network method. Renew. Energy. 2016;94:629–636. doi: 10.1016/j.renene.2016.03.103. [DOI] [Google Scholar]
  • 35.Bordes A., Bottou L., Gallinari P. SGD-QN: Careful quasiNewton stochastic gradient descent. J. Mach. Learn. Res. 2009;10:1737–1754. [Google Scholar]
  • 36.Bludszuweit H., Dominguez-Navarro J., Llombart A. Statistical Analysis of Wind Power Forecast Error. IEEE Trans. Power Syst. 2008;23:983–991. doi: 10.1109/TPWRS.2008.922526. [DOI] [Google Scholar]
  • 37.Fabbri A., Román T.G.S., Abbad J.R., Quezada V.H.M. Assessment of the cost associated with wind generation prediction errors in a liberalized electricity market. IEEE Trans. Power Syst. 2005;20:1440–1446. doi: 10.1109/TPWRS.2005.852148. [DOI] [Google Scholar]
  • 38.Guo Z., Zhao J., Zhang W., Wang J. A corrected hybrid approach for wind speed prediction in Hexi Corridor of China. Energy. 2011;36:1668–1679. doi: 10.1016/j.energy.2010.12.063. [DOI] [Google Scholar]
  • 39.Wang J.Z., Hu J.M. A robust combination approach for short-term wind-speed forecasting and analysis-Combination of the ARIMA, ELM, SVM and LSSVM forecasts using a GPR model. Energy. 2015;93:41–56. doi: 10.1016/j.energy.2015.08.045. [DOI] [Google Scholar]
  • 40.Abdoos A.A. A new intelligent method based on combination of VMD and ELM for short term wind power forecasting. Neurocomputing. 2016;203:111–120. doi: 10.1016/j.neucom.2016.03.054. [DOI] [Google Scholar]
  • 41.Chalimourda A., Schölkopf B., Smola A.J. Experimentally optimal ν in support vector regression for different noise models and parameter settings. Neural Netw. 2004;17:127–141. doi: 10.1016/S0893-6080(03)00209-0. [DOI] [PubMed] [Google Scholar]
  • 42.Cherkassky V., Ma Y. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw. 2004;17:113–126. doi: 10.1016/S0893-6080(03)00169-2. [DOI] [PubMed] [Google Scholar]
  • 43.Kwok J.T., Tsang I.W. Linear dependency between and the input noise in ϵ-support vector regression. IEEE Trans. Neural Netw. 2003;14:544–553. doi: 10.1109/TNN.2003.810604. [DOI] [PubMed] [Google Scholar]
  • 44.Schölkopf B., Smola A.J., Williamson R.C., Bartlett P. New Support Vector Algorithms. Neural Comput. 2000;12:1207–1245. doi: 10.1162/089976600300015565. [DOI] [PubMed] [Google Scholar]
  • 45.Wu Q. A hybrid-forecasting model based on Gaussian support vector machine and chaotic particle swarm optimization. Expert Syst. Appl. 2010;37:2388–2394. doi: 10.1016/j.eswa.2009.07.057. [DOI] [Google Scholar]
  • 46.Wu Q., Law R. The forecasting model based on modified SVRM and PSO penalizing Gaussian noise. Expert Syst. Appl. 2011;38:1887–1894. doi: 10.1016/j.eswa.2010.07.120. [DOI] [Google Scholar]

