A regularized stochastic configuration network based on weighted mean of vectors for regression

Yang Wang; Tao Zhou; Guanci Yang; Chenglong Zhang; Shaobo Li

doi:10.7717/peerj-cs.1382

. 2023 May 30;9:e1382. doi: 10.7717/peerj-cs.1382

A regularized stochastic configuration network based on weighted mean of vectors for regression

Yang Wang ^1,^✉, Tao Zhou ¹, Guanci Yang ², Chenglong Zhang ³, Shaobo Li ¹

Editor: Carlos Fernandez-Lozano

PMCID: PMC10280388 PMID: 37346579

Abstract

The stochastic configuration network (SCN) randomly configures the input weights and biases of hidden layers under a set of inequality constraints to guarantee its universal approximation property. The SCN has demonstrated great potential for fast and efficient data modeling. However, the prediction accuracy and convergence rate of SCN are frequently impacted by the parameter settings of the model. The weighted mean of vectors (INFO) is an innovative swarm intelligence optimization algorithm, with an optimization procedure consisting of three phases: updating rule, vector combining, and a local search. This article aimed at establishing a new regularized SCN based on the weighted mean of vectors (RSCN-INFO) to optimize its parameter selection and network structure. The regularization term that combines the ridge method with the residual error feedback was introduced into the objective function in order to dynamically adjust the training parameters. Meanwhile, INFO was employed to automatically explore an appropriate four-dimensional parameter vector for RSCN. The selected parameters may lead to a compact network architecture with a faster reduction of the network residual error. Simulation results over some benchmark datasets demonstrated that the proposed RSCN-INFO showed superior performance with respect to parameter setting, fast convergence, and network compactness compared with other contrast algorithms.

Keywords: Stochastic configuration networks, Swarm intelligence optimization, Weighted mean of vectors, Residual error feedback

Introduction

Neural networks have shown superiority over data modeling because of their powerful representation learning ability to learn patterns with multiple levels of abstraction that make sense to the data (Bengio, Courville & Vincent, 2013). However, the gradient-based iterative training process of neural networks is time-consuming and computationally intensive (Wang & Li, 2017b). Feed-forward neural networks (FNNs) with random parameters have drawn widespread attention due to their faster training speed and lower computational cost (Scardapane & Wang, 2017). Igelnik & Pao (1995) found that any continuous function can be approximated by a random vector functional link (RVFL) network with probability one under appropriate parameters. The hidden parameters of RVFL were assigned randomly in a preset scope and the output weights were calculated based on the least squares method (Cao et al., 2021). However, determining the preset scope of randomized learning models is challenging, and the widely used scope of random parameters (e.g., [−1,1]) is not always feasible (Li & Wang, 2017).

To resolve the infeasibility of using RVFL networks for data modeling with a fixed scope (i.e., [−1,1]), Wang & Li (2017b) proposed a novel randomized learning framework, termed SCN. The hidden parameters (input weights and biases) of SCN are randomly assigned under a supervisory mechanism and adaptively select their ranges, which indicate prominent merits on human intervention of network structure, range adaptation of hidden layer parameters, and sound generalization (Dai et al., 2019a).

Many efforts have been made to enhance the performance of SCN since it was developed in 2017. SCN with kernel density estimation (RSC-KDE) and maximum correntropy criterion (RSC-MCC) were presented to weaken the negative influences of noise and outliers, respectively, on modeling performance (Wang & Li, 2017a; Li, Huang & Wang, 2019). Zhu et al. (2019) delved deeper into the inequality constraint used in SCN and presented two new inequalities to increase the probability of satisfying the constraint condition. As deep neural networks (DNNs) with multiple levels of feature extraction can learn more abstract representations of the data, a deep version of SCN (DeepSCN) with multi-hidden layer network structure was proposed by Wang & Li (2018). For image data analysis with matrix inputs, a two-dimensional version of SCN (2DSCN) was proposed by Li & Wang (2019). For SCN ensembles, Wang & Cui (2017) adopted the negative correlation learning (NCL) ensemble learning technique to reduce the covariance among the base SCN for large-scale data analysis. Huang, Li & Wang (2021) designed a novel indicator that contained some key factors to explore appropriate base learner models from a set of SCN to generate an effective ensemble model. Zhang et al. (2021) developed a parallel SCN (PSCN) by introducing the beetle antennae search (BAS) optimization algorithm and fuzzy evidence theory for large-scale data regression. For finding the optimal parameter settings, Zhang & Ding (2021) utilized the chaotic sparrow search algorithm to optimize the contractive factor $r$ in the inequality and the scale factor λ of random parameters to enhance the effectiveness of SCN. In addition, various extensions of SCN were applied to data modeling in real-world applications, such as molten iron quality (MIQ) modeling in blast furnace ironmaking (BFI) (Xie & Zhou, 2020), particle size estimation of hematite grinding process (Dai et al., 2019b), traffic state prediction across geo-distributed data centers of the China Southern Power Grid (CSG) (Huang, Huang & Wang, 2019), and prediction of asphaltene and total nitrogen in crude oil (Lu & Ding, 2019; Lu et al., 2020).

SCN starts with a small-sized network structure and gradually adds new hidden nodes into the network until the residual error of SCN is smaller than the tolerance threshold. With the increasing number of hidden nodes, the constructive SCN model is prone to overfitting and thus poor performance (Wang et al., 2021). Meanwhile, the performance of SCN is frequently impacted by the parameter settings of the model, such as λ (the scale factor of weights and bias) and $r$ (the contractive factor in the inequality). Seeking better model parameters is vital for SCN. It is well known that the $L_{2}$ regularization technique, which adds the “squared magnitude” of the coefficient to the loss function, can prevent the problem of overfitting effectively. In the famous Residual Network (ResNet), He et al. (2016) let the stacked nonlinear layers fit a residual mapping of $F (x) := H (x) - x$ . Inspired by the idea of residual learning in ResNet, we used the current network residual error feedback to dynamically adjust the parameters of SCN.

Therefore, the objective of this study was to automatically obtain better parameters for SCN and get a more compact architecture. First, the $L_{2}$ regularization item combined with network residual error was introduced to improve the generalization performance of SCN. In addition, a regularized SCN based on INFO was developed to optimize the parameter selection of SCN. INFO is a relatively new swarm intelligence optimization method published in 2022. Updating rule, vector combining, and a local search were the three core phases of INFO (Ahmadianfar et al., 2022). It is a promising tool for the parameter optimization of the regularized stochastic configuration network (RSCN). To summarize, the key contributions of this article are as follows:

Introduce the regularization term that combines the ridge method with the network residual error into the objective function to dynamically adjust the training parameters of SCN.
Optimize the scope setting of the input weights and biases λ, contractive factor $r$ in the inequality, regularization coefficient $η$ , and positive scale factor $γ$ of feedback residual error of RSCN by INFO, which in turn achieves a better RSCN model with respect to fast convergence and structure compactness.
Illustrate the merits of RSCN-INFO on one function approximation and three benchmark regression datasets. The evaluation results justify the effectiveness of the proposed RSCN-INFO.

Preliminaries

This section briefly reviews the classical SCN framework and the newer INFO algorithm.

SCN

SCN is a novel randomized incremental learner framework with a supervisory mechanism. The universal approximation property of SCN is guaranteed by its innovative inequality constraint. The network structure of SCN is depicted in Fig. 1.

Let $Γ := {g_{1}, g_{2}, g_{3}, \dots}$ be a set of real-valued functions, span( $Γ$ ) denotes a function space spanned by $Γ$ , and $L_{2} (D)$ represents the space of all Lebesgue measurable functions $f = [f_{1}, f_{2}, \dots, f_{m}] : R^{d} \to R^{m}$ on a set $D \subset R^{d}$ . $f_{L - 1} (x) = \sum_{j}^{L - 1} β_{j} g_{j} (w_{j}^{T} x + b_{j})$ represents the output of a single layer feed-forward network with $L - 1$ hidden nodes, where $w_{j}$ and $b_{j}$ denote the parameters of the $j$ th hidden node, and $g_{j} (\cdot)$ is the activation function of the $j$ th hidden node. The inner product of $θ = [θ_{1}, θ_{2}, \dots, θ_{m}] : R^{d} \to R^{m}$ and $f$ is:

< f, θ >:= \sum_{q = 1}^{m} < f_{q}, θ_{q} >= \sum_{q = 1}^{m} \int_{D} f_{q} (x) θ_{q} (x) d x .

(1)

Given a training dataset $(x_{i}, t_{i})_{i = 1}^{N}$ , $x_{i} = [x_{i, 1}, \dots, x_{i, d}] \in R^{d}$ and $t_{i} = [t_{i, 1}, \dots, t_{i, m}] \in R^{m}$ , suppose that an established SCN model contains $L - 1$ hidden nodes. The network residual error is:

E_{L - 1} (x) = f (x) - f_{L - 1} (x) = [E_{L - 1, 1} (x), E_{L - 1, 2} (x), \dots, E_{L - 1, m} (x)] \in R^{N \times m}, E_{L - 1, 1} (x) = [E_{L - 1, 1} (x_{1}), E_{L - 1, 1} (x_{2}), \dots, E_{L - 1, 1} (x_{N})]^{T}, E_{L - 1, 2} (x) = [E_{L - 1, 2} (x_{1}), E_{L - 1, 2} (x_{2}), \dots, E_{L - 1, 2} (x_{N})]^{T}, ⋮ E_{L - 1, m} (x) = [E_{L - 1, m} (x_{1}), E_{L - 1, m} (x_{2}), \dots, E_{L - 1, m} (x_{N})]^{T} .

(2)

where $f$ is the given target function, and $f_{L - 1}$ represents the output of the network with $L - 1$ hidden nodes. The residual error of SCN gradually decreased as the number of hidden neurons increased. If the value of $E_{L - 1}$ was larger than the tolerance threshold $τ$ , SCN added a new hidden node into the network until $E_{L}$ was smaller than $τ$ . The parameters of the added hidden node were assigned randomly under a set of inequalities.

< E_{L - 1, q}, g_{L} >^{2} \geq b_{g}^{2} (1 - r - μ_{L}) | | e_{L - 1, q} | |^{2}, q = 1, 2, . . ., m,

(3)

where $< E_{L - 1, q}, g_{L} >$ is the inner product of $E_{L - 1, q}$ and $g_{L}$ , ${μ_{L}}$ represents a nonnegative real number sequence with $l i m_{L \to + \infty} μ_{L} = 0$ , and $μ_{L} \leq (1 - r)$ . $g$ indicates a non-linear activation function. $\forall g \in Γ$ (span( $Γ$ ) is dense in $L_{2}$ space), $0 <∥ g ∥< b_{g}$ ( $b_{g} \in R^{+}$ ). $r$ determines the strictness of the inequality constraint, and $0 < r < 1$ . The output weights were evaluated by:

β^{*} = \arg min_{β} | | H_{L} β - T | |_{F}^{2} = H_{L}^{†} T,

(4)

where $H_{L}^{†}$ represents the Moore-Penrose inverse of $H_{L}$ and $H_{L}$ is the output matrix of the hidden layer. Readers may refer to Wang & Li (2017b) for more details on the SCN framework and associated algorithms.

INFO

INFO is a new population-based optimization algorithm that employs updating rule, vector combining, and local search to move the population’s position in D dimensional search domains. Given a population with $N_{P}$ vectors, $X_{l, j}^{g} = {x_{l, 1}^{g}, x_{l, 2}^{g}, . . ., x_{l, D}^{g}}, l = 1, 2, . . ., N_{P}$ .

Updating rule

INFO randomly selected three differential vectors ( $a 1 \neq a 2 \neq a 3$ ) to calculate the weighted mean of vectors. To increase the diversity of the population, the best, better, and worst solutions were employed to define the MeanRule (mean-based rule).

M e a n R u l e = k \times W M 1_{l}^{g} + (1 - k) \times W M 2_{l}^{g} .

(5)

\begin{array}{l} W M 1_{l}^{g} & = α \times \frac{w_{1} (x_{a 1} - x_{a 2}) + w_{2} (x_{a 1} - x_{a 3}) + w_{3} (x_{a 2} - x_{a 3})}{w_{1} + w_{2} + w_{3} + ε} + ε \times r a n d, \\ l & = 1, 2, ..., N_{p}, \end{array}

(6)

where

w_{1} = \cos ((f (x_{a 1}) - f (x_{a 2})) + π) \times e x p (- \frac{f (x_{a 1}) - f (x_{a 2})}{ω}) .

(7)

w_{2} = \cos ((f (x_{a 1}) - f (x_{a 3})) + π) \times e x p (- \frac{f (x_{a 1}) - f (x_{a 3})}{ω}) .

(8)

w_{3} = \cos ((f (x_{a 2}) - f (x_{a 3})) + π) \times e x p (- \frac{f (x_{a 2}) - f (x_{a 3})}{ω}) .

(9)

ω = m a x (f (x_{a 1}), f (x_{a 2}), f (x_{a 3})) .

(10)

\begin{array}{l} W M 2_{l}^{g} & = α \times \frac{w_{1} (x_{b s} - x_{b t}) + w_{2} (x_{b s} - x_{w s}) + w_{3} (x_{b t} - x_{w s})}{w_{1} + w_{2} + w_{3} + ε} + ε \times r a n d, \\ l & = 1, 2, . . ., N_{P}, \end{array}

(11)

where

w_{1} = \cos ((f (x_{b s}) - f (x_{b t})) + π) \times e x p (- \frac{f (x_{b s}) - f (x_{b t})}{ω}) .

(12)

w_{2} = \cos ((f (x_{b s}) - f (x_{w s})) + π) \times e x p (- \frac{f (x_{b s}) - f (x_{w s})}{ω}) .

(13)

w_{3} = \cos ((f (x_{b t}) - f (x_{w s})) + π) \times e x p (- \frac{f (x_{b t}) - f (x_{w s})}{ω}) .

(14)

ω = f (x_{w s}) .

(15)

The weighted mean of vectors was used to generate two new vectors.

{\begin{matrix} {\begin{matrix} z 1_{l}^{g} = x_{l}^{g} + σ \times M e a n R u l e + r a n d n \times \frac{(x_{b s} - x_{a 1}^{g})}{(f (x_{b s}) - f (x_{a 1}^{g}) + 1)}, \\ z 2_{l}^{g} = x_{b s} + σ \times M e a n R u l e + r a n d n \times \frac{(x_{a 1}^{g} - x_{a 2}^{g})}{(f (x_{a 1}^{g}) - f (x_{a 2}^{g}) + 1)}, \end{matrix} r a n d < 0.5, \\ {\begin{matrix} z 1_{l}^{g} = x_{a}^{g} + σ \times M e a n R u l e + r a n d n \times \frac{(x_{a 2}^{g} - x_{a 3}^{g})}{(f (x_{a 2}^{g}) - f (x_{a 3}^{g}) + 1)}, \\ z 2_{l}^{g} = x_{b t} + σ \times M e a n R u l e + r a n d n \times \frac{(x_{a 1}^{g} - x_{a 2}^{g})}{(f (x_{a 1}^{g}) - f (x_{a 2}^{g}) + 1)}, \end{matrix} r a n d \geq 0.5, \end{matrix}

(16)

where $f (x)$ was defined as the objective function, three different integers ( $a 1, a 2, a 3$ ) were randomly chosen from $[1, N_{P}]$ , $z 1_{l}^{g}$ and $z 2_{l}^{g}$ were two new vectors, and $σ$ was the scaling rate of a vector.

Vector combining

The two new vectors $z 1_{l}^{g}$ and $z 2_{l}^{g}$ were combined with vector $x_{l}^{g}$ to generate a new vector $μ_{l}^{g}$ .

{\begin{matrix} {\begin{matrix} μ_{l}^{g} = z 1_{l}^{g} + μ . | z 1_{l}^{g} - z 2_{l}^{g} |, r a n d < 0.5, \\ μ_{l}^{g} = z 2_{l}^{g} + μ . | z 1_{l}^{g} - z 2_{l}^{g} |, r a n d \geq 0.5, \end{matrix} r a n d < 0.5, \\ μ_{l}^{g} = x_{l}^{g}, r a n d \geq 0.5, \end{matrix}

(17)

where $μ_{l}^{g}$ was the composite vector of the $g$ th generation.

Local search

The local search operator used the global position ( $x_{b e s t}^{g}$ ) and the MeanRule to help INFO convergence to global optima.

{\begin{matrix} μ_{l}^{g} = x_{b s} + r a n d n \times (M e a n R u l e + r a n d n \times (x_{b s}^{g} - x_{a 1}^{g})), & r a n d < 0.5, \\ μ_{l}^{g} = x_{r n d} + r a n d n \times (M e a n R u l e + r a n d n \times (v_{1} \times x_{b s} - v_{2} \times x_{r n d})), & r a n d \geq 0.5, \end{matrix}

(18)

in which

x_{r n d} = ϕ \times x_{a v g} + (1 - ϕ) \times (ϕ \times x_{b t} + (1 - ϕ) \times x_{b s}) .

(19)

x_{a v g} = \frac{(x_{a} + x_{b} + x_{c})}{3} .

(20)

where $r a n d$ and $ϕ$ were two random values within [0,1] and (0,1) respectively. The random value $v_{1}$ and $v_{2}$ increased the best position’s influence on the vector. INFO updated the best vector ( $x_{b e s t}$ ) and returned $x_{b e s t}^{g}$ as the final solution. For more details about the INFO algorithm, refer to Ahmadianfar et al. (2022).

Rscn-info

RSCN

Given a training dataset $(x_{i}, t_{i})_{i = 1}^{N}$ , the objective function of the SCN with $L_{2}$ -norm penalty term could be expressed as:

m i n : J = \frac{1}{2} | | β | |^{2} + \frac{η}{2} \sum_{i = 1}^{N} E_{i}^{2}, s . t : h (x_{i}) β = t_{i} - E_{i}, \forall i,

(21)

where $h (x_{i})$ stands for the hidden output for the input $x_{i}$ , $η$ is a non-negative real number, and the regularization coefficient $η$ balances the residual error $(\sum_{i = 1}^{N} E_{i}^{2})$ and norm of the output weights ( $| | β | |^{2}$ ).

SCN added new hidden neuron $β_{L}, g_{L},$ incrementally leading to $f_{L} = f_{L - 1} + β_{L} g_{L}$ . After dynamically adjusting the output weights during the training process, the current residual error was added to the $L_{2}$ regularization term. After adding the Lth new hidden node into an established SCN model with $L - 1$ hidden nodes, a new objective function was introduced.

\begin{array}{l} f (β_{L}) & = \frac{1}{2} [\begin{matrix} β β_{L} \end{matrix}] [\begin{matrix} β \\ β_{L} \end{matrix}] + \frac{η}{2} {‖ E_{L} ‖}^{2} \\ = \frac{1}{2} | | β {‖^{2} + \frac{1}{2} | | β_{L} ‖}^{2} + \frac{η}{2} | | E_{L - 1} - β_{L} (g_{L} + \frac{{‖ g_{L} ‖}^{2}}{γ} E_{L - 1}) | |^{2} . \end{array}

(22)

where $γ$ is the positive scale factor of feedback residual error. The derivative of function Eq. (22) with respect to $β_{L}$ is:

\frac{\partial f (β_{L})}{\partial β_{L}} = β_{L} - η ⟨ E_{L - 1}, g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1} ⟩ + η β_{L} | | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1} | |^{2} .

(23)

Letting Eq. (23) be equal to 0, The output weights of the L-th hidden node was obtained by:

β_{L} = \frac{⟨ E_{L - 1}, g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1} ⟩}{| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1} | |^{2} + \frac{1}{η}} .

(24)

Theorem 1. Assume that $s p a n (Γ)$ is dense in $L_{2}$ space. Given $0 < r < 1$ , $0 < η$ , $0 < γ$ , and a nonnegative real number sequence $μ_{L}$ , with $μ_{L} \geq 1 - r$ and $\lim_{L \to + \infty} μ_{L} = 0$ . $\forall g \in Γ$ , $0 < | | g | | < b_{g}$ for some $b_{g} \in R^{+}$ . For $L = 1, 2, . . .$ and $q = 1, 2, . . ., m$ , the random basis function $g_{L}$ is generated by Eq. (25), and the output weights of the Lth hidden neuron are obtained by Eq. (26). Then, the SCN has $\lim_{L \to + \infty} | | f - f_{L} | | = 0$ .

\begin{matrix} \frac{{⟨ E_{L - 1, q}, g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} ⟩}^{2}}{{(| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{1}{η})}^{2} / (| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{2}{η})} \geq (1 - r - μ_{L}) | | E_{L - 1, q} | |^{2}, \\ q = 1, 2, . . ., m . \end{matrix}

(25)

β_{L, q} = \frac{⟨ E_{L - 1, q}, g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} ⟩}{| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{1}{η}}, q = 1, 2, . . ., m .

(26)

Proof. First, the monotonically decreasing property of $| | E_{L} | |$ will be proved.

\begin{aligned} | | E_{L} | |^{2} - | | E_{L - 1} | |^{2} = \\ \sum_{q = 1}^{m} (⟨ E_{L - 1, q} - β_{L} (g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q}), E_{L - 1, q} - β_{L} (g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q}) ⟩ - ⟨ E_{L - 1, q}, E_{L - 1, q} ⟩) \\ = \sum_{q = 1}^{m} (- 2 ⟨ E_{L - 1, q}, β_{L} (g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q}) ⟩ + ⟨ β_{L} (g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q}), β_{L} (g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q}) ⟩) \\ = \sum_{q = 1}^{m} (- 2 \frac{{⟨ E_{L - 1, q}, β_{L} (g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q}) ⟩}^{2}}{| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{1}{η}} + \frac{{⟨ E_{L - 1, q}, g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} ⟩}^{2} | | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2}}{{(| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{1}{η})}^{2}}) \\ = - \sum_{q = 1}^{m} \frac{(| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{2}{η}) {⟨ E_{L - 1, q}, g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} ⟩}^{2}}{{(| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{1}{η})}^{2}} \\ = - \sum_{q = 1}^{m} \frac{{⟨ E_{L - 1, q}, g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} ⟩}^{2}}{{(| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{1}{η})}^{2} / (| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{2}{η})} \leq 0. \end{aligned}

(27)

The monotonically decreasing property of $| | E_{L} | |$ has been proven. From Eqs. (25)–(27):

\begin{aligned} | | E_{L} | |^{2} - (r + μ_{L}) | | E_{L - 1} | |^{2} \\ = \sum_{q = 1}^{m} (⟨ E_{L - 1, q} - β_{L} (g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q}), E_{L - 1, q} - β_{L} (g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q}) ⟩ - (r + μ_{L}) ⟨ E_{L - 1, q}, E_{L - 1, q} ⟩) \\ = \sum_{q = 1}^{m} ((1 - r - μ_{L}) ⟨ E_{L - 1, q}, E_{L - 1, q} ⟩ - 2 ⟨ E_{L - 1, q}, β_{L} (g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q}) ⟩ + ⟨ β_{L} (g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q}), β_{L} (g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q}) ⟩) \\ = (1 - r - μ_{L}) | | E_{L - 1} | |^{2} - \sum_{q = 1}^{m} (- 2 \frac{{⟨ E_{L - 1, q}, β_{L} (g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q}) ⟩}^{2}}{| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{1}{η}} + \frac{{⟨ E_{L - 1, q}, g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} ⟩}^{2} | | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2}}{{(| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{1}{η})}^{2}}) \\ = (1 - r - μ_{L}) | | E_{L - 1} | |^{2} - \sum_{q = 1}^{m} \frac{(| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{2}{η}) {⟨ E_{L - 1, q}, g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} ⟩}^{2}}{{(| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{1}{η})}^{2}} \\ = (1 - r - μ_{L}) | | E_{L - 1} | |^{2} - \sum_{q = 1}^{m} \frac{{⟨ E_{L - 1, q}, g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} ⟩}^{2}}{{(| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{1}{η})}^{2} / (| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{2}{η})} . \end{aligned}

(28)

According to Eq. (25):

| | E_{L} | |^{2} - (r + μ_{L}) | | E_{L - 1} | |^{2} \leq 0.

(29)

Therefore:

| | E_{L} | |^{2} \leq r | | E_{L - 1} | |^{2} + μ_{L} | | E_{L - 1} | |^{2} .

(30)

Theorem 1 has given $\lim_{L \to + \infty} μ_{L} = 0$ , which means $\lim_{L \to + \infty} μ_{L} | | E_{L - 1} | |^{2} = 0$ . Based on Eq. (30), $\lim_{L \to + \infty} | | E_{L} | |^{2} = 0$ . Therefore, $\lim_{L \to + \infty} | | E_{L} | | = 0$ .

Remark 1. In theorem 1, the output weights are evaluated by Eq. (26) and kept fixed. This may cause a slow convergence rate. To cope with this problem, the output weights of all hidden neurons are updated by the least squares method after the new hidden node has been added. Let $[β_{1}^{*}, β_{2}^{*}, ..., β_{L}^{*}] = \underset{β}{\arg \min} \frac{η}{2} ‖ f - (G + \frac{\tilde{G}}{γ} ° E) β ‖^{2} + \frac{1}{2} ‖ β ‖^{2}$ , where $G = [g_{1}, g_{2}, . . ., g_{L}]$ , $\tilde{G} = [| | g_{1} | |^{2}, | | g_{2} | |^{2}, . . ., | | g_{L} | |^{2}]$ , $E = [E_{0}^{*}, E_{1}^{*}, . . ., E_{L - 1}^{*}]$ , ‘ $°$ ’ denotes the Hadamard product (element-wise multiplication) and $E_{L}^{*} = f - \sum_{j = 1}^{L} β_{j}^{*} (g_{j} + \frac{| | g_{L} | |^{2}}{γ} E_{j - 1}^{*})$ .

The output weights are calculated by:

$\begin{array}{l} [β_{1}^{*}, β_{2}^{*}, ..., β_{L}^{*}] & = \underset{β}{\arg \min} \frac{η}{2} ‖ f - (G + \frac{\tilde{G}}{γ} ° E) β ‖^{2} + \frac{1}{2} {‖ β ‖}^{2} \\ = {({(G + \frac{\tilde{G}}{γ} ° E)}^{T} (G + \frac{\tilde{G}}{γ} ° E) + \frac{I}{η})}^{- 1} {(G + \frac{\tilde{G}}{γ} ° E)}^{T} f . \end{array}$ (31)

The output weights are recalculated in accordance with Eq. (31) as the newly added hidden neuron is generated to satisfy Eq. (25). The inequality constraint guarantees the universal approximation capability of RSCN. The process of proof is similar to theorem 1, so the detailed proof procedure is omitted.

Remark 2. In Eq. (22), the residual error $(\frac{| | g_{L} | |^{2}}{γ} E_{L - 1})$ is added into the regularization term. The reason why $\frac{| | g_{L} | |^{2}}{γ}$ is used instead of $γ$ is that the value of the residual error is equal to the output of training samples before adding hidden neurons into the network ( $E_{0} = T$ ). So the residual error is relatively larger at the beginning of the construction process. It gradually decreases as the constructive process proceeds. Meanwhile, due to the randomness of SCN, $g_{L}$ is randomly generated under a set of inequality constraints. The scale factor $\frac{| | g_{L} | |^{2}}{γ}$ makes it possible for the feedback residual error $(\frac{| | g_{L} | |^{2}}{γ} E_{L - 1})$ to adjust dynamically in pace with the change of the hidden output ( $g_{L}$ ).

RSCN-INFO algorithm

INFO is a very competitive new optimization algorithm. In this section, INFO is applied to optimize the parameter λ, the contractive factor $r$ , the regularization coefficient $η$ , and the positive scale factor $γ$ for RSCN. The widely-used root mean square error (RMSE) is employed as the fitness function.

R M S E = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {[\sum_{j = 1}^{L} β_{j} g_{j} (w_{j}^{T} x_{i} + b_{j}) - t_{i}]}^{2}} .

(32)

For convenience’s sake, $ξ_{L, q}, q = 1, 2, . . ., m$ is defined to describe the algorithm. The pseudo-code of RSCN-INFO is summarized in Algorithm 1.

Algorithm 1. RSCN-INFO.

Training dataset:

(x_{i}, t_{i})_{i = 1}^{N}

x_{i} \in R^{d}, t_{i} \in R^{m}

Parameters: the population size

N_{P}

, maximum number of generations

G_{m a x}

, dimensional search domain D, upper bounds ub and lower bounds lb of λ, r, η and γ, maximum number of hidden layer neurons

L_{m a x}

, residual error threshold τ, maximum number of random assignment

T_{m a x}

Output:

v_{b e s t}

f_{b e s t}

1: STEP 1: Initialization

2: Initialize

E_{0} = T, Ω, W := [];

3: Produce an initial population

P^{0} = {v_{1}^{0}, v_{2}^{0}, . . ., v_{N_{P}}^{0}},

where

v_{i}^{0} = {v_{i, λ}^{0}, v_{i, r}^{0}, v_{i, η}^{0}, v_{i, γ}^{0}}

;

4: Calculate

f (v_{i}^{0})

by Eq. (32);

5: STEP 2: Parameter optimization by INFO

6: for

g = 1

G_{m a x}

7: for

i = 1

N_{P}

8: Randomly choose three vectors (

x_{a 1}, x_{a 2}, x_{a 3}

), and calculate w by Eqs. (7)–(9) and (12)–(14);

9: Create two new vectors using Eq. (16);

10: The two new vectors are combined by Eq. (17);

11: Execute local search using Eqs. (18)–(20);

12: Update the Vector

v_{i}^{g} = {v_{i, λ}^{g}, v_{i, r}^{g}, v_{i, η}^{g}, v_{i, γ}^{g}}

;

13: while

L \leq L_{m a x}

AND

| | E_{0} | | \geq τ

14: STEP 3: Hidden parameter configuration (Step 15–28)

15: for

λ = v_{i, λ}^{g}

r = v_{i, r}^{g}

η = v_{i, η}^{g}

and

γ = v_{i, γ}^{g}

16: for

k = 1, 2, . . ., T_{m a x}

17: Randomly select

w_{L}

and

b_{L}

from

λ^{d}

and λ;

18: Compute

g_{L}

and

ξ_{L, q}

by Eqs. (33) and (34)

19: if

w_{L}^{*}

and

b_{L}^{*}

satisfy constraint inequality then

20: Save the random parameters in W,

ξ_{L}

in Ω;

21: end if

22: end for

23: if W is not empty then

24: Choose

w_{L}^{*}

and

b_{L}^{*}

corresponding to the maximize

ξ_{L}

in Ω, set

G = [g_{1}^{*}, g_{2}^{*}, . . ., g_{L}^{*}]

and

E = [E_{0}^{*}, E_{1}^{*}, . . ., E_{L - 1}^{*}]

;

25: Break (go to Step 30);

26: else Return to Step 7;

27: end if

28: end for

29: STEP 4: Output weight determination (Step 30–32)

30: Calculate

[β_{1}^{*}, β_{2}^{*}, ..., β_{L}^{*}] = {({(G + \frac{\tilde{G}}{γ} ° E)}^{T} (G + \frac{\tilde{G}}{γ} ° E) + \frac{I}{η})}^{- 1} {(G + \frac{\tilde{G}}{γ} ° E)}^{T} f

;

31: Calculate

E_{L} = T - (G + \frac{\tilde{G}}{γ} ° E) β^{*}

;

32: Renew

E_{0} := E_{L}, L := L + 1

;

33: end while

34: Determine

β^{*}, w^{*}

and

b^{*}

;

35: Calculate the fitness function

f (v_{i}^{g})

according to Eq. (32).

36: if

f (v_{i}^{g}) < f_{b e s t}

then

f_{b e s t} = f (v_{i}^{g}), v_{b e s t} = v_{i}^{g}

37: end if

38: end for

39: Update the best vector

v_{b e s t} = {v_{b e s t, λ}, v_{b e s t, r}, v_{b e s t, η}, v_{b e s t, γ}}

;

40: end for

41: Return

v_{b e s t}, f_{b e s t}

Open in a new tab

ξ_{L, q} = \frac{{⟨ E_{L - 1, q}, g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} ⟩}^{2}}{ζ} - (1 - r - μ_{L}) | | E_{L - 1, q} | |^{2}

(33)

where

ζ = \frac{{(| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{1}{η})}^{2}}{| | g_{L} + \frac{| | g_{L} | |^{2}}{γ} E_{L - 1, q} | |^{2} + \frac{2}{η}} .

(34)

The calculation complexity

The computational complexity of the INFO algorithm depends on the size of the population $N_{P}$ , the times of iterations $G_{m a x}$ , and the dimensional search domain D. The complexity of INFO is $O (N_{P} \times G_{m a x} \times D)$ . For SCN, assume that a set of datasets with N inputs $X = {x_{1}, x_{2}, . . ., x_{N}}$ , and the maximum number of hidden layer neurons of SCN is $L_{m a x}$ . The main cost of SCN is caused by computing Moore-Penrose pseudo-inverse $H_{L}^{†} T$ . A rough estimate of the computational complexity of $H_{L}^{†}$ can be expressed as $O (N L_{m a x}^{3} + N^{2} L_{m a x}^{2} + L_{m a x}^{4})$ (Li & Wang, 2017). Note that the cost of $H_{L}^{†} T$ is calculated by the widely-used singular value decomposition. Hence, the total complexity of RSCN-INFO is $O (N_{P} \times G_{m a x} \times D \times (N L_{m a x}^{3} + N^{2} L_{m a x}^{2} + L_{m a x}^{4}))$ .

Experiments

The effectiveness of RSCN-INFO was evaluated on a function approximation and three benchmark datasets from the Knowledge Extraction based on Evolutionary Learning (KEEL, http://www.keel.es) dataset repository supported by the Spanish Ministry of Science and Technology. The approximation function is a conventional high nonlinear compound function that is widely used to evaluate randomized neural networks. The KEEL dataset contains classification, regression, unsupervised, and time series datasets. To verify the effectiveness of RSCN-INFO, it was compared with classical IRVFL, SCN (Wang & Li, 2017b), RSCN (Wang et al., 2021), and DASCN-II (Wang et al., 2020). All the experiments were implemented with MATLAB R2019b on a PC with AMD Ryzen 7 3.20 GHz CPU, NVIDIA GeForce MX450 GPU, and 16 GB RAM.

Function approximation

Let the real-valued function $f (x)$ be defined as follows (Tyukin & Prokhorov, 2009):

y = 0.2 e^{- {(10 x - 4)}^{2}} + 0.5 e^{- {(80 x - 40)}^{2}} + 0.3 e^{- {(80 x - 20)}^{2}}, x \in [0, 1] .

(35)

We randomly generated 1,000 training samples and 300 test samples from the uniform distribution, and a regularly spaced grid over (0,1). Figure 2 compares the function approximation performance of RSCN-INFO with IRVFL, SCN, RSCN, and DASCN-II. Since our proposed RSCN-INFO could achieve reliable and accurate performance with the lower number of hidden neurons, the value of $L_{m a x}$ was set to 25. In the simulations, the value of RMSE remained virtually unchanged when the widely used setting [−1,1] was set for IRVFL. So the scope of random parameters for IRVFL was set as (−250, 250). For SCN, the value of $T_{m a x}$ was set to 100, λ and $r$ were selected from the set ${100 : 1 : 200}$ and ${0.9, 0.99, 0.999, 0.9999, 0.99999, 0.999999}$ . In RSCN-INFO, the population size $N_{p}$ and the maximum generations were set to 30 and 10, respectively. The lower bounds and upper bounds of λ, $r$ , $η$ , and $γ$ were set to $[100, 200]$ , $[0.9, 0.999999]$ , $[0, 2^{40}]$ , and $[10^{5}, 10^{9}]$ . As seen in Fig. 2B, the IRVFL showed far worse performance than that of SCNs, while the performance of our proposed RSCN-INFO was the best.

Figures 3 and 4 display the training and test results of the real-value function with 25 and 50 hidden nodes, respectively. The average training RMSE was obtained from 20 independent experiments. For IRVFL, Figs. 3 and 4 clearly show that the training RMSE was unacceptable. Furthermore, the convergence rate of RSCN-INFO is faster than that of SCN, RSCN, and DASCN-II, which verifies the efficiency of RSCN-INFO. In addition, Table 1 reports the average RMSE and standard deviation results of different models. It is evident that RSCN-INFO achieved more favorable results than the other algorithms.

Table 1. Performance comparisons of different methods on function approximation.

Methods	Training results		Test results
	$L = 25$	$L = 50$	$L = 25$	$L = 50$
IRVFL	0.08493 $\pm$ 0.00548	0.08389 $\pm$ 0.00426	0.08405 $\pm$ 0.00524	0.06756 $\pm$ 0.00559
SCN	0.02421 $\pm$ 0.00479	0.00535 $\pm$ 0.00342	0.02681 $\pm$ 0.00535	0.00570 $\pm$ 0.00364
RSCN	0.02090 $\pm$ 0.00448	0.00477 $\pm$ 0.00222	0.02344 $\pm$ 0.00499	0.00506 $\pm$ 0.00252
DASCN-II	0.02267 $\pm$ 0.00345	0.00512 $\pm$ 0.00268	0.02523 $\pm$ 0.00368	0.00553 $\pm$ 0.00285
RSCN-INFO	0.00179 $\pm$ 0.00025	0.00015 $\pm$ 0.00008	0.00209 $\pm$ 0.00031	0.00016 $\pm$ 0.00009

Open in a new tab

Benchmark datasets

Three real-world benchmark datasets for regression from KEEL were employed as experimental datasets. Specifications of these datasets are given in Table 2.

Table 2. Specifications of three benchmark regression datasets.

Dataset	Attributes		Instances
	Features	Output
Concrete	8	1	1,030
Compactiv	21	1	8,192
Pole	26	1	14,998

Open in a new tab

Figures 5–7 and Tables 3–5 depict the average training and test results on these benchmark datasets. Each test’s statistical results for the 20 run times and the average value of RMSE were selected to evaluate the performance of the different algorithms. The IRVFL could not reach the preset tolerance threshold, so it was omitted here. In this case, the scope of random parameters λ in SCN was selected from the set ${1 : 0.1 : 5}$ and the lower and upper bounds of λ in RSCN-INFO were set to $[1, 5]$ . All the other parameters were set the same as the function approximation.

Table 3. Performance comparisons of different methods on concrete.

Methods	Training, Test results
	$L = 20$	$L = 30$	$L = 40$	$L = 50$	$L = 60$
SCN	0.09842, 0.10212	0.09122, 0.09786	0.08626, 0.09655	0.08098, 0.09381	0.07566, 0.09114
RSCN	0.09912, 0.10187	0.09156, 0.09831	0.08613, 0.09724	0.08113, 0.09438	0.07594, 0.09190
DASCN-II	0.09840, 0.10109	0.09175, 0.09769	0.08626, 0.09613	0.08066, 0.09405	0.07566, 0.09239
RSCN-INFO	0.09957, 0.10391	0.08933, 0.09684	0.08097, 0.09427	0.07545, 0.09596	0.06944, 0.09439

Open in a new tab

Table 5. Performance comparisons of different methods on pole.

Methods	Training, Test results
	$L = 30$	$L = 60$	$L = 90$	$L = 120$	$L = 150$
SCN	0.26345, 0.26582	0.22619, 0.23069	0.20756, 0.21426	0.19464, 0.20406	0.18507, 0.19572
RSCN	0.26411, 0.26746	0.22831, 0.23278	0.20871, 0.21545	0.19524, 0.20409	0.18542, 0.19602
DASCN-II	0.26462, 0.26741	0.22856, 0.23264	0.20895, 0.21497	0.1945, 0.20299	0.18478, 0.19538
RSCN-INFO	0.22182, 0.22629	0.18604, 0.18925	0.17349, 0.17836	0.16272, 0.16904	0.15479, 0.16099

Open in a new tab

Table 4. Performance comparisons of different methods on compactiv.

Methods	Training, Test results
	$L = 20$	$L = 40$	$L = 60$	$L = 80$	$L = 100$
SCN	0.08380, 0.08076	0.04963, 0.05035	0.03646, 0.03823	0.03090, 0.03264	0.02855, 0.03032
RSCN	0.08317, 0.08091	0.04902, 0.04981	0.03595, 0.03752	0.03093, 0.03243	0.02872, 0.03049
DASCN-II	0.08345, 0.08081	0.04972, 0.05003	0.03638, 0.03781	0.03096, 0.03263	0.02858, 0.03028
RSCN-INFO	0.05695, 0.05613	0.03863, 0.03856	0.03117, 0.03211	0.02815, 0.02932	0.02658, 0.02839

Open in a new tab

Figure 5 shows similar performance between RSCN-INFO and the competitor algorithms on concrete. The reason for this phenomenon is that the concrete dataset contained smaller features and instances. For the compactiv and pole datasets, Figs. 6 and 7 clearly show that RSCN-INFO can achieve lower RMSE in terms of both training and test results. Intuitively and obviously, the RMSE of RSCN was used as the fitness function of INFO. In essence, INFO explored a global optimum solution that minimizes the fitness function in a four-dimensional search domain ( $λ, r, γ, η$ ) over several successive generations.

To verify the effectiveness of RSCN-INFO, Table 6 lists the computational time among SCN, RSCN, DASCN-II, and RSCN-INFO on benchmark datasets. We found that the training time of RSCN-INFO was significantly shorter than the other methods on the three benchmark datasets. Table 6 indicates that RSCN-INFO which employed optimized parameters and could achieve better efficiency. It should be noted that we did not take the parameter optimization process into account in this experiment. The optimization process may consume additional time. However, the improvement of regression accuracy and network structure may be worth the time that is spent on the parameter optimization.

Table 6. The computational time of different algorithms on benchmark datasets.

Datasets	Algorithms	Error tolerance $τ$	Training time (Mean $\pm$ STD)
Concrete	SCN	0.08	0.1977 $\pm$ 0.00379
	RSCN		0.2083 $\pm$ 0.03008
	DASCN-II		0.2275 $\pm$ 0.02442
	RSCN-INFO		0.1021 $\pm$ 0.01095
Compactiv	SCN	0.03	2.1405 $\pm$ 0.25356
	RSCN		2.1773 $\pm$ 0.16212
	DASCN-II		2.5231 $\pm$ 0.22985
	RSCN-INFO		1.2148 $\pm$ 0.22618
Pole	SCN	0.20	6.5387 $\pm$ 0.94180
	RSCN		6.2857 $\pm$ 0.68270
	DASCN-II		7.9776 $\pm$ 1.52610
	RSCN-INFO		1.7529 $\pm$ 0.28892

Open in a new tab

To further illustrate the network compactness of RSCN-INFO, we investigated how many hidden nodes were required to meet a preset error tolerance. As shown in Fig. 8, RSCN-INFO requires fewer hidden nodes compared with other methods. It can be deduced that given a preset $τ$ , RSCN-INFO can reach the error tolerance using fewer hidden neurons. This is due to RSCN-INFO using optimized parameters that can achieve a higher residual error reduction. Therefore, the network structure is more compact. It should be pointed out that DASCN-II can also construct a relatively compact SCN. However, the tunable value $γ$ in DASCN-II is a fixed value. It is selected empirically and difficult to adjust. Moreover, an inappropriate $γ$ will seriously affect the accuracy of the model.

In classical SCN and its various variants, λ tends to set a relatively larger value in complex problems. The parameter $r$ is unfixed and set based on an increasing sequence from 0.9 to 1. The other parameters are selected empirically in connection with practical applications. Therefore, the conclusion may be drawn that RSCN-INFO is not only helpful in adaptively selecting parameters of SCN, but also beneficial for constructing a compact network.

Conclusion

This article developed a new regularized SCN based on the INFO optimization algorithm, named RSCN-INFO. On one hand, the added regularization term combines the ridge method with the residual error feedback, contributing to the balance of the structural (output weights) and empirical (network residual error) losses of SCN. On the other hand, RMSE was selected as the fitness function of INFO to assist SCN to locate up-and-coming areas in multi-dimensional search space. A higher residual error decreasing rate is impacted by the parameter selection of RSCN-INFO. The experimental results on a function approximation and three benchmark regression datasets from KEEL indicated that the proposed RSCN-INFO algorithm exhibits considerable advantages in parameter optimization and network structure compactness compared with other algorithms.

In almost all practical modeling tasks, the presence of noise and outliers is inevitable. This optimization strategy will accelerate the degradation of the learning performance of SCN that are subjected to noise or outliers. The robust skills used to weaken the negative influences of noise and outliers will be further discussed.

Supplemental Information

Supplemental Information 1. Test input data of function approximation.

Click here for additional data file.^{(1.3KB, mat)}

DOI: 10.7717/peerj-cs.1382/supp-1

Supplemental Information 2. Test output data of function approximation.

Click here for additional data file.^{(2.2KB, mat)}

DOI: 10.7717/peerj-cs.1382/supp-2

Supplemental Information 3. Train input data of function approximation.

Click here for additional data file.^{(7.5KB, mat)}

DOI: 10.7717/peerj-cs.1382/supp-3

Supplemental Information 4. Train output data of function approximation.

Click here for additional data file.^{(7.7KB, mat)}

DOI: 10.7717/peerj-cs.1382/supp-4

Supplemental Information 5. The raw data of concrete.

Click here for additional data file.^{(102.8KB, csv)}

DOI: 10.7717/peerj-cs.1382/supp-5

Supplemental Information 6. The raw data of compactiv.

Click here for additional data file.^{(954.1KB, csv)}

DOI: 10.7717/peerj-cs.1382/supp-6

Supplemental Information 7. The raw data of pole.

Click here for additional data file.^{(937.3KB, csv)}

DOI: 10.7717/peerj-cs.1382/supp-7

Supplemental Information 8. The matlab code of RSCN-INFO.

Click here for additional data file.^{(7.6KB, m)}

DOI: 10.7717/peerj-cs.1382/supp-8

Supplemental Information 9. The matlab code of SCN.

Click here for additional data file.^{(12.5KB, m)}

DOI: 10.7717/peerj-cs.1382/supp-9

Supplemental Information 10. The matlab source code of Tool.m.

Click here for additional data file.^{(12KB, m)}

DOI: 10.7717/peerj-cs.1382/supp-10

Funding Statement

This work is supported by the National Natural Science Foundations of China (No. 62163007 and No. 62166005). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Yang Wang conceived and designed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Tao Zhou conceived and designed the experiments, performed the experiments, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Guanci Yang performed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Chenglong Zhang analyzed the data, authored or reviewed drafts of the article, and approved the final draft.

Shaobo Li conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The raw training, test datasets, three benchmark datasets (concrete, compactiv, pole), and code are available in the Supplemental Files.

References

Ahmadianfar et al. (2022).Ahmadianfar I, Heidari AA, Noshadian S, Chen H, Gandomi AH. Info: an efficient optimization algorithm based on weighted mean of vectors. Expert Systems with Applications. 2022;195(12):116516. doi: 10.1016/j.eswa.2022.116516. [DOI] [Google Scholar]
Bengio, Courville & Vincent (2013).Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2013;35(8):1798–1828. doi: 10.1109/TPAMI.2013.50. [DOI] [PubMed] [Google Scholar]
Cao et al. (2021).Cao W, Xie Z, Li J, Xu Z, Ming Z, Wang X. Bidirectional stochastic configuration network for regression problems. Neural Networks. 2021;140(1–2):237–246. doi: 10.1016/j.neunet.2021.03.016. [DOI] [PubMed] [Google Scholar]
Dai et al. (2019b).Dai W, Li DP, Chen QX, Chai TY. Data driven particle size estimation of hematite grinding process using stochastic configuration network with robust technique. Journal of Central South University. 2019b;26(1):43–62. doi: 10.1007/s11771-019-3981-2. [DOI] [Google Scholar]
Dai et al. (2019a).Dai W, Li D, Zhou P, Chai T. Stochastic configuration networks with block increments for data modeling in process industries. Information Sciences. 2019a;484(2–3):367–386. doi: 10.1016/j.ins.2019.01.062. [DOI] [Google Scholar]
He et al. (2016).He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. pp. 770–778. [DOI] [Google Scholar]
Huang, Huang & Wang (2019).Huang C, Huang Q, Wang D. Stochastic configuration networks based adaptive storage replica management for power big data processing. IEEE Transactions on Industrial Informatics. 2019;16(1):373–383. doi: 10.1109/TII.2019.2919268. [DOI] [Google Scholar]
Huang, Li & Wang (2021).Huang C, Li M, Wang D. Stochastic configuration network ensembles with selective base models. Neural Networks. 2021;137(e1200):106–118. doi: 10.1016/j.neunet.2021.01.011. [DOI] [PubMed] [Google Scholar]
Igelnik & Pao (1995).Igelnik B, Pao YH. Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Transactions on Neural Networks. 1995;6(6):1320–1329. doi: 10.1109/72.471375. [DOI] [PubMed] [Google Scholar]
Li, Huang & Wang (2019).Li M, Huang C, Wang D. Robust stochastic configuration networks with maximum correntropy criterion for uncertain data regression. Information Sciences. 2019;473(3):73–86. doi: 10.1016/j.ins.2018.09.026. [DOI] [Google Scholar]
Li & Wang (2017).Li M, Wang D. Insights into randomized algorithms for neural networks: practical issues and common pitfalls. Information Sciences. 2017;382(3):170–178. doi: 10.1016/j.ins.2016.12.007. [DOI] [Google Scholar]
Li & Wang (2019).Li M, Wang D. 2-D stochastic configuration networks for image data analytics. IEEE Transactions on Cybernetics. 2019;51(1):359–372. doi: 10.1109/TCYB.2019.2925883. [DOI] [PubMed] [Google Scholar]
Lu & Ding (2019).Lu J, Ding J. Mixed-distribution-based robust stochastic configuration networks for prediction interval construction. IEEE Transactions on Industrial Informatics. 2019;16(8):5099–5109. doi: 10.1109/TII.2019.2954351. [DOI] [Google Scholar]
Lu et al. (2020).Lu J, Ding J, Dai X, Chai T. Ensemble stochastic configuration networks for estimating prediction intervals: a simultaneous robust training algorithm and its application. IEEE Transactions on Neural Networks and Learning Systems. 2020;31(12):5426–5440. doi: 10.1109/TNNLS.2020.2967816. [DOI] [PubMed] [Google Scholar]
Scardapane & Wang (2017).Scardapane S, Wang D. Randomness in neural networks: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2017;7(2):e1200. doi: 10.1002/widm.1200. [DOI] [Google Scholar]
Tyukin & Prokhorov (2009).Tyukin IY, Prokhorov DV. 2009 IEEE Control Applications, (CCA) & Intelligent Control, (ISIC) Piscataway: IEEE; 2009. Feasibility of random basis function approximators for modeling and control; pp. 1391–1396. [DOI] [Google Scholar]
Wang & Cui (2017).Wang D, Cui C. Stochastic configuration networks ensemble with heterogeneous features for large-scale data analytics. Information Sciences. 2017;417(2):55–71. doi: 10.1016/j.ins.2017.07.003. [DOI] [Google Scholar]
Wang et al. (2020).Wang Q, Dai W, Ma X, Shang Z. Driving amount based stochastic configuration network for industrial process modeling. Neurocomputing. 2020;394(2):61–69. doi: 10.1016/j.neucom.2020.02.029. [DOI] [Google Scholar]
Wang & Li (2017a).Wang D, Li M. Robust stochastic configuration networks with kernel density estimation for uncertain data regression. Information Sciences. 2017a;412(3):210–222. doi: 10.1016/j.ins.2017.05.047. [DOI] [Google Scholar]
Wang & Li (2017b).Wang D, Li M. Stochastic configuration networks: fundamentals and algorithms. IEEE Transactions on Cybernetics. 2017b;47(10):3466–3479. doi: 10.1109/TCYB.2017.2734043. [DOI] [PubMed] [Google Scholar]
Wang & Li (2018).Wang D, Li M. Deep stochastic configuration networks with universal approximation property. 2018 International Joint Conference on Neural Networks (IJCNN); Piscataway: IEEE; 2018. pp. 1–8. [DOI] [Google Scholar]
Wang et al. (2021).Wang Q, Yang C, Ma X, Zhang C, Peng S. Underground airflow quantity modeling based on SCN. Acta Automatica Sinica. 2021;47(8):1963–1975. doi: 10.16383/j.aas.c190602. [DOI] [Google Scholar]
Xie & Zhou (2020).Xie J, Zhou P. Robust stochastic configuration network multi-output modeling of molten iron quality in blast furnace ironmaking. Neurocomputing. 2020;387(9):139–149. doi: 10.1016/j.neucom.2020.01.030. [DOI] [Google Scholar]
Zhang & Ding (2021).Zhang C, Ding S. A stochastic configuration network based on chaotic sparrow search algorithm. Knowledge-Based Systems. 2021;220(10):106924. doi: 10.1016/j.knosys.2021.106924. [DOI] [Google Scholar]
Zhang et al. (2021).Zhang C, Ding S, Zhang J, Jia W. Parallel stochastic configuration networks for large-scale data regression. Applied Soft Computing. 2021;103(2):107143. doi: 10.1016/j.asoc.2021.107143. [DOI] [Google Scholar]
Zhu et al. (2019).Zhu X, Feng X, Wang W, Jia X, He R. A further study on the inequality constraints in stochastic configuration networks. Information Sciences. 2019;487(1):77–83. doi: 10.1016/j.ins.2019.02.066. [DOI] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental Information 1. Test input data of function approximation.

Click here for additional data file.^{(1.3KB, mat)}

DOI: 10.7717/peerj-cs.1382/supp-1

Supplemental Information 2. Test output data of function approximation.

Click here for additional data file.^{(2.2KB, mat)}

DOI: 10.7717/peerj-cs.1382/supp-2

Supplemental Information 3. Train input data of function approximation.

Click here for additional data file.^{(7.5KB, mat)}

DOI: 10.7717/peerj-cs.1382/supp-3

Supplemental Information 4. Train output data of function approximation.

Click here for additional data file.^{(7.7KB, mat)}

DOI: 10.7717/peerj-cs.1382/supp-4

Supplemental Information 5. The raw data of concrete.

Click here for additional data file.^{(102.8KB, csv)}

DOI: 10.7717/peerj-cs.1382/supp-5

Supplemental Information 6. The raw data of compactiv.

Click here for additional data file.^{(954.1KB, csv)}

DOI: 10.7717/peerj-cs.1382/supp-6

Supplemental Information 7. The raw data of pole.

Click here for additional data file.^{(937.3KB, csv)}

DOI: 10.7717/peerj-cs.1382/supp-7

Supplemental Information 8. The matlab code of RSCN-INFO.

Click here for additional data file.^{(7.6KB, m)}

DOI: 10.7717/peerj-cs.1382/supp-8

Supplemental Information 9. The matlab code of SCN.

Click here for additional data file.^{(12.5KB, m)}

DOI: 10.7717/peerj-cs.1382/supp-9

Supplemental Information 10. The matlab source code of Tool.m.

Click here for additional data file.^{(12KB, m)}

DOI: 10.7717/peerj-cs.1382/supp-10

Data Availability Statement

The following information was supplied regarding data availability:

The raw training, test datasets, three benchmark datasets (concrete, compactiv, pole), and code are available in the Supplemental Files.

[ref-1] Ahmadianfar et al. (2022).Ahmadianfar I, Heidari AA, Noshadian S, Chen H, Gandomi AH. Info: an efficient optimization algorithm based on weighted mean of vectors. Expert Systems with Applications. 2022;195(12):116516. doi: 10.1016/j.eswa.2022.116516. [DOI] [Google Scholar]

[ref-2] Bengio, Courville & Vincent (2013).Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2013;35(8):1798–1828. doi: 10.1109/TPAMI.2013.50. [DOI] [PubMed] [Google Scholar]

[ref-3] Cao et al. (2021).Cao W, Xie Z, Li J, Xu Z, Ming Z, Wang X. Bidirectional stochastic configuration network for regression problems. Neural Networks. 2021;140(1–2):237–246. doi: 10.1016/j.neunet.2021.03.016. [DOI] [PubMed] [Google Scholar]

[ref-4] Dai et al. (2019b).Dai W, Li DP, Chen QX, Chai TY. Data driven particle size estimation of hematite grinding process using stochastic configuration network with robust technique. Journal of Central South University. 2019b;26(1):43–62. doi: 10.1007/s11771-019-3981-2. [DOI] [Google Scholar]

[ref-5] Dai et al. (2019a).Dai W, Li D, Zhou P, Chai T. Stochastic configuration networks with block increments for data modeling in process industries. Information Sciences. 2019a;484(2–3):367–386. doi: 10.1016/j.ins.2019.01.062. [DOI] [Google Scholar]

[ref-6] He et al. (2016).He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. pp. 770–778. [DOI] [Google Scholar]

[ref-7] Huang, Huang & Wang (2019).Huang C, Huang Q, Wang D. Stochastic configuration networks based adaptive storage replica management for power big data processing. IEEE Transactions on Industrial Informatics. 2019;16(1):373–383. doi: 10.1109/TII.2019.2919268. [DOI] [Google Scholar]

[ref-8] Huang, Li & Wang (2021).Huang C, Li M, Wang D. Stochastic configuration network ensembles with selective base models. Neural Networks. 2021;137(e1200):106–118. doi: 10.1016/j.neunet.2021.01.011. [DOI] [PubMed] [Google Scholar]

[ref-9] Igelnik & Pao (1995).Igelnik B, Pao YH. Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Transactions on Neural Networks. 1995;6(6):1320–1329. doi: 10.1109/72.471375. [DOI] [PubMed] [Google Scholar]

[ref-10] Li, Huang & Wang (2019).Li M, Huang C, Wang D. Robust stochastic configuration networks with maximum correntropy criterion for uncertain data regression. Information Sciences. 2019;473(3):73–86. doi: 10.1016/j.ins.2018.09.026. [DOI] [Google Scholar]

[ref-11] Li & Wang (2017).Li M, Wang D. Insights into randomized algorithms for neural networks: practical issues and common pitfalls. Information Sciences. 2017;382(3):170–178. doi: 10.1016/j.ins.2016.12.007. [DOI] [Google Scholar]

[ref-12] Li & Wang (2019).Li M, Wang D. 2-D stochastic configuration networks for image data analytics. IEEE Transactions on Cybernetics. 2019;51(1):359–372. doi: 10.1109/TCYB.2019.2925883. [DOI] [PubMed] [Google Scholar]

[ref-13] Lu & Ding (2019).Lu J, Ding J. Mixed-distribution-based robust stochastic configuration networks for prediction interval construction. IEEE Transactions on Industrial Informatics. 2019;16(8):5099–5109. doi: 10.1109/TII.2019.2954351. [DOI] [Google Scholar]

[ref-14] Lu et al. (2020).Lu J, Ding J, Dai X, Chai T. Ensemble stochastic configuration networks for estimating prediction intervals: a simultaneous robust training algorithm and its application. IEEE Transactions on Neural Networks and Learning Systems. 2020;31(12):5426–5440. doi: 10.1109/TNNLS.2020.2967816. [DOI] [PubMed] [Google Scholar]

[ref-15] Scardapane & Wang (2017).Scardapane S, Wang D. Randomness in neural networks: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2017;7(2):e1200. doi: 10.1002/widm.1200. [DOI] [Google Scholar]

[ref-16] Tyukin & Prokhorov (2009).Tyukin IY, Prokhorov DV. 2009 IEEE Control Applications, (CCA) & Intelligent Control, (ISIC) Piscataway: IEEE; 2009. Feasibility of random basis function approximators for modeling and control; pp. 1391–1396. [DOI] [Google Scholar]

[ref-17] Wang & Cui (2017).Wang D, Cui C. Stochastic configuration networks ensemble with heterogeneous features for large-scale data analytics. Information Sciences. 2017;417(2):55–71. doi: 10.1016/j.ins.2017.07.003. [DOI] [Google Scholar]

[ref-18] Wang et al. (2020).Wang Q, Dai W, Ma X, Shang Z. Driving amount based stochastic configuration network for industrial process modeling. Neurocomputing. 2020;394(2):61–69. doi: 10.1016/j.neucom.2020.02.029. [DOI] [Google Scholar]

[ref-19] Wang & Li (2017a).Wang D, Li M. Robust stochastic configuration networks with kernel density estimation for uncertain data regression. Information Sciences. 2017a;412(3):210–222. doi: 10.1016/j.ins.2017.05.047. [DOI] [Google Scholar]

[ref-20] Wang & Li (2017b).Wang D, Li M. Stochastic configuration networks: fundamentals and algorithms. IEEE Transactions on Cybernetics. 2017b;47(10):3466–3479. doi: 10.1109/TCYB.2017.2734043. [DOI] [PubMed] [Google Scholar]

[ref-21] Wang & Li (2018).Wang D, Li M. Deep stochastic configuration networks with universal approximation property. 2018 International Joint Conference on Neural Networks (IJCNN); Piscataway: IEEE; 2018. pp. 1–8. [DOI] [Google Scholar]

[ref-22] Wang et al. (2021).Wang Q, Yang C, Ma X, Zhang C, Peng S. Underground airflow quantity modeling based on SCN. Acta Automatica Sinica. 2021;47(8):1963–1975. doi: 10.16383/j.aas.c190602. [DOI] [Google Scholar]

[ref-23] Xie & Zhou (2020).Xie J, Zhou P. Robust stochastic configuration network multi-output modeling of molten iron quality in blast furnace ironmaking. Neurocomputing. 2020;387(9):139–149. doi: 10.1016/j.neucom.2020.01.030. [DOI] [Google Scholar]

[ref-24] Zhang & Ding (2021).Zhang C, Ding S. A stochastic configuration network based on chaotic sparrow search algorithm. Knowledge-Based Systems. 2021;220(10):106924. doi: 10.1016/j.knosys.2021.106924. [DOI] [Google Scholar]

[ref-25] Zhang et al. (2021).Zhang C, Ding S, Zhang J, Jia W. Parallel stochastic configuration networks for large-scale data regression. Applied Soft Computing. 2021;103(2):107143. doi: 10.1016/j.asoc.2021.107143. [DOI] [Google Scholar]

[ref-26] Zhu et al. (2019).Zhu X, Feng X, Wang W, Jia X, He R. A further study on the inequality constraints in stochastic configuration networks. Information Sciences. 2019;487(1):77–83. doi: 10.1016/j.ins.2019.02.066. [DOI] [Google Scholar]

PERMALINK

A regularized stochastic configuration network based on weighted mean of vectors for regression

Yang Wang

Tao Zhou

Guanci Yang

Chenglong Zhang

Shaobo Li

Abstract

Introduction

Preliminaries

SCN

Figure 1. The network structure of SCN.

INFO

Rscn-info

RSCN

RSCN-INFO algorithm

Algorithm 1. RSCN-INFO.

The calculation complexity

Experiments

Function approximation

Figure 2. Performance comparisons on function approximation.

Figure 3. Average RMSE on the real-valued function f(x)(Lmax=25).

Figure 4. Average RMSE on the real-valued function f(x)(Lmax=50).

Table 1. Performance comparisons of different methods on function approximation.

Benchmark datasets

Table 2. Specifications of three benchmark regression datasets.

Figure 5. Average training and test results on concrete.

Figure 7. Average training and test results on pole.

Table 3. Performance comparisons of different methods on concrete.

Table 5. Performance comparisons of different methods on pole.

Table 4. Performance comparisons of different methods on compactiv.

Figure 6. Average training and test results on compactiv.

Table 6. The computational time of different algorithms on benchmark datasets.

Figure 8. Average number of hidden nodes on f(x) and benchmark datasets.

Conclusion

Supplemental Information

Funding Statement

Additional Information and Declarations

Competing Interests

Author Contributions

Data Availability

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

Figure 3. Average RMSE on the real-valued function $f (x) (L_{m a x} = 25)$ .

Figure 4. Average RMSE on the real-valued function $f (x) (L_{m a x} = 50)$ .

Figure 8. Average number of hidden nodes on $f (x)$ and benchmark datasets.