Abstract
Extreme learning machine (ELM) is a novel and fast learning method for training single-layer feed-forward networks. However, because it demands a larger number of hidden neurons, the prediction speed of ELM is not fast enough. An evolutionary ELM based on differential evolution (DE) has been proposed to reduce the prediction time of the original ELM, but it may still get stuck at local optima. In this paper, a novel algorithm hybridizing DE and the metaheuristic coral reef optimization (CRO), called differential evolution coral reef optimization (DECRO), is proposed to balance explorative and exploitive power and thereby reach better performance. The ideas behind DECRO and its implementation are discussed in detail in this article. DE, CRO and DECRO are each applied to ELM training. Experimental results show that DECRO-ELM can reduce the prediction time of the original ELM and trains ELM better than both DE and CRO.
Keywords: Extreme learning machine (ELM), Differential evolution (DE), Coral reef optimization (CRO), Differential evolution coral reef optimization (DECRO)
Introduction
Recently, the modeling of cognitive processes has been widely discussed and has attracted many researchers to study the related learning algorithms (Wang et al. 2013; Wennekers and Palm 2009; Lee et al. 2012; Chowdhury et al. 2015). Among cognition-inspired machine learning algorithms, extreme learning machine (ELM) (Huang et al. 2004) is a novel and fast learning method based on the structure of a single-layer feed-forward network (SLFN). During ELM training, the input layer parameters of an SLFN are randomly set without optimization, and the output layer weights are calculated by the Moore–Penrose (MP) generalized inverse without iteration. With this idea, ELM is not only much faster than traditional gradient-based algorithms, but it also avoids getting stuck at local optima and obtains artificial neural network (ANN) models with better generalization performance. However, without optimized input layer weights and biases, more hidden layer nodes are needed to improve the performance of ELM, which slows down its prediction speed. Zhu et al. (2005) proposed the evolutionary ELM (E-ELM), where the input layer weights and biases are learned using differential evolution (DE), so as to combine the global optimization power of evolutionary computing with the efficiency of ELM training; this enhances the prediction speed of ELM and yields compact networks.
The DE (Storn and Price 1997) algorithm used in E-ELM to train the input layer parameters is well known for its global optimization ability and its efficiency in locating global solutions. DE has been applied to a wide range of science and engineering problems (Roque and Martins 2015; Bhadra and Bandyopadhyay 2015; Hamedi et al. 2015; Chen et al. 2015; Atif and Al-Sulaiman 2015; García-Domingo et al. 2015; Sarkar et al. 2015) because it is simple and straightforward to implement, and because very few of its parameters need to be tuned manually. However, it has been pointed out that DE may get stuck at local optima for some problems (Ronkkonen et al. 2005) and does not perform well on problems that are not linearly separable (Langdon and Poli 2007). In standard DE, new individuals are generated from the information of different individuals, which leads to good explorative global searching power, but the local searching power near each individual, especially the best ones, is relatively poor. An early paper on evolutionary programming (Birru et al. 1999) also indicated that local searching helps to reach the global optimum if the algorithm can find the basin of attraction of the global optimum, which reduces the time needed to converge. The explorative power and exploitive power should therefore be balanced to reach better performance. In this sense, we develop a novel algorithm, differential evolution coral reef optimization (DECRO), by hybridizing DE with the novel metaheuristic coral reef optimization (CRO) (Salcedo-Sanz et al. 2014a). CRO is a metaheuristic that models and simulates coral reproduction; it employs both a mutation process to avoid local optima and an exploitive process similar to simulated annealing. All three algorithms, i.e. DE, CRO and DECRO, are applied to training ELM input layer parameters following the framework of E-ELM. The corresponding approaches are denoted DE-ELM, CRO-ELM and DECRO-ELM respectively.
The rest of this paper is organized as follows: “DE algorithm and CRO algorithm” section briefly introduces DE and CRO, “DECRO-the proposed algorithm” section proposes the DECRO algorithm, “Apply DECRO to training ELM” section introduces the original ELM and application of DECRO to train ELM (i.e. DECRO-ELM). “Experiments” section presents the experimental results and conclusions.
DE algorithm and CRO algorithm
Differential evolution (DE) algorithm
DE, proposed by Storn and Price (1997), is a powerful search algorithm for optimization problems that uses the vector differences of individuals to perturb the population members.
Outline of DE
As an initial setting, DE employs a population Pop of N individuals, each representing a D-dimensional solution vector x_i = (x_{i,1}, ..., x_{i,D}). At the beginning of DE, each individual is randomly generated in the search space. During the DE algorithm, new individuals with better objective function values are generated by iterating three fundamental operations (mutation, crossover and selection) until a certain stop criterion is reached.
Mutation
During mutation, a donor vector v_i is generated as a candidate for each individual x_i of the current population Pop as in (1).

v_i = x_{r1} + F · (x_{r2} − x_{r3})    (1)

where r1, r2, r3 are randomly chosen individual indexes ranging from 1 to N, all different from each other and from i, and F is the scaling factor that weights the difference of individuals.
Crossover
With the generated donor vector v_i and the original individual x_i, a trial vector u_i is generated by binomial crossover as follows.

u_{i,j} = v_{i,j}, if rand ≤ CR or j = j_rand; u_{i,j} = x_{i,j}, otherwise    (2)

where CR denotes the crossover probability for each dimension, rand is a random number subject to the uniform distribution U(0, 1), and j_rand is a randomly generated integer ranging from 1 to D (the problem dimensionality).
Selection
For each individual in the current population, the trial vector u_i is compared with the original individual x_i, and only the one with the better objective function value is incorporated into the population of the next generation.

x_i^{g+1} = u_i, if f(u_i) ≤ f(x_i); x_i^{g+1} = x_i, otherwise    (3)

where x_i^{g+1} denotes the ith individual of the next generation's population and f(·) is the fitness function (objective function).
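To make the three operations concrete, here is a minimal NumPy sketch of one DE generation for a minimization problem; the array layout, the default parameter values and the objective function f are illustrative assumptions rather than the exact implementation used in this paper.

```python
import numpy as np

def de_generation(pop, f, F=0.7, CR=0.1, rng=None):
    """One DE generation: mutation (1), binomial crossover (2), greedy selection (3).

    pop : (N, D) array holding the current population (N >= 4).
    f   : objective function to minimize, f(x) -> float.
    """
    rng = rng or np.random.default_rng()
    N, D = pop.shape
    new_pop = pop.copy()
    for i in range(N):
        # Mutation: three distinct indices, all different from i
        r1, r2, r3 = rng.choice([j for j in range(N) if j != i], size=3, replace=False)
        donor = pop[r1] + F * (pop[r2] - pop[r3])
        # Binomial crossover: at least dimension j_rand is taken from the donor
        j_rand = rng.integers(D)
        mask = rng.random(D) <= CR
        mask[j_rand] = True
        trial = np.where(mask, donor, pop[i])
        # Selection: keep the better of trial and original individual
        if f(trial) <= f(pop[i]):
            new_pop[i] = trial
    return new_pop
```

Calling this function repeatedly on an initial random population until the evaluation budget is spent gives the standard DE loop assumed in the rest of the paper.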
Coral reef optimization (CRO) metaheuristic algorithm
CRO is a novel algorithm proposed by Salcedo-Sanz et al. (2014a) that tackles optimization problems by modeling and simulating coral reproduction and reef formation. A series of corresponding applications has been carried out (Salcedo-Sanz et al. 2014a, b, c, d, e, 2015, 2013). The main processes of CRO are described as follows.
Terminology and notations
Let Λ be a model of the reef, consisting of an N × M square grid. We assume that each square (i, j) of Λ is able to allocate a coral (or colony of corals) representing a solution to our problem, encoded as a string of numbers in a given alphabet. The CRO algorithm is first initialized at random by assigning some squares in Λ to be occupied by corals (i.e. solutions to the problem) and leaving the other squares empty, i.e. holes in the reef where new corals can freely settle and grow. The rate between free and total squares at the beginning of the algorithm is an important parameter of CRO, denoted in what follows as ρ0. Each coral is labeled with an associated fitness value given by the problem's objective function f. Note that the reef will progress as long as healthier (stronger) corals, which represent better solutions to the problem at hand, survive, while less healthy corals perish.
Partition of the existing corals
A certain fraction (denoted as Fb) of the existing corals is selected uniformly at random to be broadcast spawners, while the remaining existing corals (a fraction 1 − Fb) are selected to be brooders.
Broadcast spawning (crossover)
Couples are selected from the pool of broadcast spawners in each iteration; each couple forms a coral larva by crossover, which is then released into the water (see "Larvae setting (competition for a living space)" section). Note that once two corals have been selected to be the parents of a larva, they are not chosen again in that iteration (i.e. two corals are parents only once in a given iteration). Couple selection can be done uniformly at random or by any fitness-proportionate selection approach (e.g. roulette wheel).
Brooding (mutation for local searching)
For all brooders, brooding is modeled as the formation of a coral larva by means of a random mutation of the brooding coral (self-fertilization, considering hermaphrodite corals). The produced larva is then released into the water in a similar fashion to the larvae generated by broadcast spawning.
Larvae setting (competition for a living space)
Once all the larvae are formed, either through broadcast spawning or by brooding, they try to settle and grow in the reef. Each larva randomly tries, for a given number of times (denoted as k), to settle in a square of the reef. If the square is empty (free space in the reef), the coral grows there. If a coral already occupies the square at hand, the new larva settles only if its fitness is better than that of the existing coral. Finally, if all k attempts of a larva fail, it is eliminated.
Asexual reproduction (budding)
In the modeling of asexual reproduction (budding or fragmentation), the overall set of existing corals in the reef is sorted by fitness value (given by f), from which a fraction Fa duplicate themselves and try to settle in a different part of the reef by following the settling process described in the "Larvae setting (competition for a living space)" section.
Depredation in polyp phase (eliminate corals with poor fitness value)
At the end of each iteration, a small number of corals in the reef can be depredated, thus liberating space in the reef for the next coral generation. The depredation operator is applied with a very small probability at each iteration, and exclusively to a fraction Fd of the corals with the worst fitness. Note that any coral can be repeated at most a given number of times in the reef; otherwise the redundant repetitions are eliminated and the corresponding squares are released.
The processes described in the "Partition of the existing corals" to "Depredation in polyp phase (eliminate corals with poor fitness value)" sections are repeated iteratively until a certain stop criterion is reached.
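As a rough illustration of the reef mechanics described above, the following sketch implements larvae setting, budding and depredation for a minimization problem, assuming the reef is stored as a dictionary mapping grid squares to (solution, fitness) pairs; the helper names and the depredation probability Pd are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def settle(reef, larva, fit, grid_squares, k, rng):
    """Larvae setting: a larva tries at most k random squares; it occupies a free square
    or replaces a weaker coral (smaller fitness is better). Returns False if it dies."""
    for _ in range(k):
        sq = grid_squares[rng.integers(len(grid_squares))]
        if sq not in reef or fit < reef[sq][1]:
            reef[sq] = (larva, fit)
            return True
    return False

def budding(reef, Fa, grid_squares, k, rng):
    """Asexual reproduction: the best Fa fraction of corals duplicate themselves
    and compete for space again via the same settling process."""
    best = sorted(reef.values(), key=lambda c: c[1])[: max(1, int(Fa * len(reef)))]
    for sol, fit in best:
        settle(reef, np.copy(sol), fit, grid_squares, k, rng)

def depredation(reef, Fd, Pd, rng):
    """Depredation: with a small probability Pd, remove a fraction Fd of the worst corals."""
    if rng.random() < Pd and len(reef) > 1:
        worst = sorted(reef, key=lambda sq: reef[sq][1], reverse=True)
        for sq in worst[: int(Fd * len(worst))]:
            del reef[sq]
```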
DECRO-the proposed algorithm
According to Salcedo-Sanz et al. (2014a), the explorative power of CRO is governed by broadcast spawning, which carries out the majority of the global searching, and by brooding, which helps the search jump out of local optima. As for exploitive power, the budding process ensures that CRO carefully searches the neighborhood of the current population, and the larvae setting process controls local searching by a simulated-annealing-like process. To better balance the explorative and exploitive power, we propose a hybrid algorithm called DECRO, where DE is used to carry out the broadcast spawning. In this manner, the DE algorithm enhances the explorative power of CRO, while CRO renders exploitive power to DE. Details of DECRO are described as follows.
Compared with the original CRO, the broadcast spawning, budding and depredation processes are improved as follows.
Improved partition:
Instead of selecting the broadcast spawners uniformly at random, the improved partition selects the 1 − Fb fraction of existing corals with better fitness values as brooders, to enhance the local searching power around the top candidates, while the remaining Fb fraction of existing corals act as broadcast spawners to explore the solution space. Fb can be tuned as a function of the iteration number to further enhance the dynamic performance of DECRO.
Improved broadcast spawning:
During one step of the improved broadcast spawning, a larva candidate is generated by DE for each coral belonging to the current set of broadcast spawners, and only the larvae that outperform their ancestors are included in the set of selected larvae. The formal expression is given in Algorithm 1.
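One possible reading of this step, reusing the DE operators sketched earlier, is shown below; the function name and parameter defaults are assumptions for illustration.

```python
import numpy as np

def de_broadcast_spawning(spawners, f, F=0.7, CR=0.1, rng=None):
    """Improved broadcast spawning: each spawner produces one DE trial vector, and only
    larvae that improve on their parent enter the selected-larvae set."""
    rng = rng or np.random.default_rng()
    if len(spawners) < 4:            # DE/rand/1 needs at least 4 distinct individuals
        return []
    Ns, D = spawners.shape
    selected = []
    for i in range(Ns):
        r1, r2, r3 = rng.choice([j for j in range(Ns) if j != i], size=3, replace=False)
        donor = spawners[r1] + F * (spawners[r2] - spawners[r3])
        j_rand = rng.integers(D)
        mask = rng.random(D) <= CR
        mask[j_rand] = True
        trial = np.where(mask, donor, spawners[i])
        if f(trial) < f(spawners[i]):      # keep only larvae that outperform the parent
            selected.append(trial)
    return selected
```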
Improved budding:
During one step of the improved budding, to enhance the local searching power, instead of simply copying the top Fa corals, an extra Cauchy mutation (generating a random number subject to a Cauchy distribution, denoted randc) is carried out for each of the top Fa corals. The merge of the mutated corals and their ancestors forms the candidate larvae set; only the half with better fitness values survives and struggles for living space as described in the "Larvae setting (competition for a living space)" section. The formal expression is given in Algorithm 2.
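A sketch of this step under the same assumptions as above; the scale of the Cauchy perturbation is an illustrative knob, not a value prescribed by the paper.

```python
import numpy as np

def cauchy_budding(corals, fits, f, Fa=0.2, scale=0.1, rng=None):
    """Improved budding: the top Fa corals are perturbed with a heavy-tailed Cauchy
    mutation, parents and mutants are merged, and the better half survives.

    corals : (N, D) array, fits : (N,) fitness values (smaller is better)."""
    rng = rng or np.random.default_rng()
    order = np.argsort(fits)
    top = corals[order[: max(1, int(Fa * len(corals)))]]
    mutants = top + scale * rng.standard_cauchy(top.shape)  # Cauchy-distributed jumps
    pool = np.vstack([top, mutants])
    pool_fits = np.array([f(x) for x in pool])
    keep = np.argsort(pool_fits)[: len(pool) // 2]           # better half survives
    return pool[keep], pool_fits[keep]
```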
Improved depredation:
In the improved depredation, no coral is eliminated; instead, a novel strategy is proposed to deal with redundant repetitions. Rather than eliminating a redundant coral, a local search is carried out, by which the redundant coral is replaced.
The outline of DECRO is summarized in the following pseudo-code.
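Since the pseudo-code itself is not reproduced here, the following high-level Python outline assembles the pieces sketched above into one possible DECRO loop; the reef representation, the linear Fb schedule and the simplification of brooding and depredation are assumptions made for illustration, not the authors' exact procedure.

```python
import numpy as np

def decro(f, D, n_iter, N=5, M=2, Fa=0.2, Fd=0.1, Pd=0.05, k=2, rho0=0.3, rng=None):
    """High-level DECRO loop (minimization) built from the sketches above."""
    rng = rng or np.random.default_rng()
    grid_squares = [(i, j) for i in range(N) for j in range(M)]
    reef = {}                                     # grid square -> (solution, fitness)
    n_init = int((1 - rho0) * len(grid_squares))  # rho0 assumed: initial fraction of free squares
    for idx in rng.permutation(len(grid_squares))[:n_init]:
        x = rng.random(D)
        reef[grid_squares[idx]] = (x, f(x))
    for t in range(n_iter):
        Fb = 0.9 - 0.5 * t / n_iter               # improved partition: Fb decreases linearly
        corals = sorted(reef.values(), key=lambda c: c[1])
        n_brood = int((1 - Fb) * len(corals))
        brooders = [c[0] for c in corals[:n_brood]]                # best corals: exploit
        spawners = np.array([c[0] for c in corals[n_brood:]])      # the rest: explore
        larvae = de_broadcast_spawning(spawners, f, rng=rng)       # improved broadcast spawning
        larvae += [b + 0.1 * rng.standard_cauchy(D) for b in brooders]  # brooding mutation
        for larva in larvae:
            settle(reef, larva, f(larva), grid_squares, k, rng)
        buds, bud_fits = cauchy_budding(np.array([c[0] for c in corals]),
                                        np.array([c[1] for c in corals]),
                                        f, Fa=Fa, rng=rng)         # improved budding
        for b, bf in zip(buds, bud_fits):
            settle(reef, b, bf, grid_squares, k, rng)
        depredation(reef, Fd, Pd, rng)             # simplified depredation step
    return min(reef.values(), key=lambda c: c[1])  # best (solution, fitness) found
```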
Apply DECRO to training ELM
Similar to a traditional ANN, during the training phase of ELM the output layer weights are calculated from the training examples, while during the prediction phase unseen examples are given to the ELM and the predicted output is calculated based on the trained ELM model.
Review of original ELM training
The original ELM training method is summarized as follows.
Given a training set {(x_i, t_i), i = 1, ..., m}, where x_i ∈ R^d is the ith input vector and t_i is the ith target vector, an activation function g(x), and the number of hidden nodes denoted as n_h:
1. Calculate the hidden layer output matrix

H = [ g(w_1 · x_1 + b_1)  …  g(w_{n_h} · x_1 + b_{n_h})
      ⋮                        ⋮
      g(w_1 · x_m + b_1)  …  g(w_{n_h} · x_m + b_{n_h}) ]    (4)

where the input weights w_j are d-dimensional column vectors and, together with the biases b_j, are randomly generated without optimization. Let b be the bias vector (b_1, ..., b_{n_h}); for simplicity we define the generalized weight W as the input weight matrix with b appended as an extra row, and append a constant 1 to each input vector, so that hereafter the hidden matrix H can be denoted as G(XW).
2. Estimate the output layer weight β by the following equation

β = H† T    (5)

where H† is the Moore–Penrose generalized inverse of H and T = (t_1, ..., t_m)^T is the target matrix.
Owing to the random generation of W, the whole training process finishes without iteration, which makes training an ELM much faster than training a traditional gradient-based ANN. However, such a training process needs a much larger number of hidden layer nodes n_h than traditional training processes, which may retard the prediction speed.
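A minimal NumPy sketch of this training procedure, assuming a sigmoid activation and the generalized-weight convention defined above; it is an illustration, not the authors' code.

```python
import numpy as np

def elm_train(X, T, n_h, rng=None):
    """Original ELM training: random generalized weight W, closed-form output weight beta.

    X : (m, d) training inputs, T : (m, n_o) training targets, n_h : number of hidden nodes."""
    rng = rng or np.random.default_rng()
    m, d = X.shape
    W = rng.uniform(-1, 1, size=(d + 1, n_h))  # input weights with the bias appended as an extra row
    X1 = np.hstack([X, np.ones((m, 1))])       # append a constant 1 to each input, so H = G(X W)
    H = 1.0 / (1.0 + np.exp(-X1 @ W))          # hidden layer output matrix, Eq. (4), sigmoid g
    beta = np.linalg.pinv(H) @ T               # Moore-Penrose solution of Eq. (5)
    return W, beta
```

The only free choice here is n_h; everything else is either random or determined in closed form, which is exactly why the training is so fast.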
Influence of nh on the prediction efficiency of ELM
To better understand how nh affects ELM prediction, the computational complexity of ELM prediction for unseen examples needs to be further discussed. Algorithm 4 shows the procedure of ELM for predicting the output of unknown test examples, where m is the number of examples in the test set, d is the dimension of each example, nh is the number of hidden layer neurons, and n_o is the number of output layer neurons (i.e. the dimensionality of the output for each example). According to the prediction algorithm, the total computational complexity depends on the calculation of H and Hβ, and we have Eq. (6).

C_total = C_H + C_{Hβ}    (6)
Firstly, to calculate H, a matrix multiplication taking O(m · d · nh) is carried out, followed by the calculation of the activation function G, which takes O(m · nh · C_g), where C_g is the complexity of one evaluation of g; together we have C_H = O(m · nh · (d + C_g)). Secondly, to obtain Hβ, only a simple matrix multiplication is needed, which takes O(m · nh · n_o). Above all, we have

C_total = O(m · nh · (d + C_g + n_o))    (7)
It is obvious that nh dominates C_total when nh becomes extremely large. For the original ELM, the hidden layer weights are randomly generated without any optimization procedure; in consequence, a much larger nh is needed to maintain the same accuracy as classical neural network training algorithms such as BP. As the discussion above shows, this feature may significantly slow down the response speed of ELM when predicting unknown examples. In order to balance training efficiency and prediction efficiency, optimization techniques can be embedded into the training procedure of ELM.
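The two products that dominate Eq. (7) are visible directly in a prediction sketch like the following, continuing the notation of the training sketch above.

```python
import numpy as np

def elm_predict(W, beta, X_test):
    """ELM prediction for unseen examples; the cost is dominated by the two products below."""
    m = X_test.shape[0]
    X1 = np.hstack([X_test, np.ones((m, 1))])
    H = 1.0 / (1.0 + np.exp(-X1 @ W))  # O(m * d * n_h) multiply plus O(m * n_h) activations
    return H @ beta                    # O(m * n_h * n_o)
```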
Training ELM based on DECRO
To improve the prediction efficiency of ELM discussed in the "Influence of nh on the prediction efficiency of ELM" section, the evolutionary framework of ELM (E-ELM) was first proposed by Zhu et al. (2005); it reduces the number of hidden layer nodes of the original ELM while preserving its training efficiency. In Zhu et al. (2005), the input layer parameters are trained by DE and the output layer weights are calculated as in the original algorithm. The formal expression of E-ELM is presented in Algorithm 5.
In order to develop a better evolutionary algorithm to embed into the E-ELM framework mentioned above, we apply DECRO to train ELM; the corresponding training algorithm is denoted DECRO-ELM, where DECRO is used to optimize W and b, and works as follows.
As illustrated by Figs. 1 and 2, the solution vector of an existing coral located at grid square (i, j) (see "Terminology and notations" section) in DECRO-ELM is the vectorization of the generalized weight W; the output weight β is calculated as in the "Review of original ELM training" section; the fitness function of a coral is defined as the mean square error (MSE) on the training set; and the input weight W is solved by DECRO according to Algorithm 5, with DE replaced by DECRO.
Fig. 1.
An individual in DECRO-ELM is exactly a vector coding of the input layer weight
Fig. 2.
After transforming the vector into a matrix, an ELM model is trained for each individual, and the fitness function is exactly the MSE of that ELM
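Putting Figs. 1 and 2 together, a hedged sketch of the fitness evaluation of a single DECRO individual might look as follows; the reshape convention and the sigmoid activation are assumptions made for illustration.

```python
import numpy as np

def decro_elm_fitness(coral_vec, X_train, T_train, n_h):
    """Fitness of one DECRO individual: reshape the vector into the generalized weight (Fig. 1),
    train an ELM with it, and return the training MSE (Fig. 2)."""
    m, d = X_train.shape
    W = coral_vec.reshape(d + 1, n_h)           # vector -> generalized weight (assumed layout)
    X1 = np.hstack([X_train, np.ones((m, 1))])
    H = 1.0 / (1.0 + np.exp(-X1 @ W))
    beta = np.linalg.pinv(H) @ T_train          # output weights as in the original ELM
    return float(np.mean((H @ beta - T_train) ** 2))  # fitness = training MSE
```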
Experiments
To test the effectiveness of the proposed algorithm, DE and CRO are also embedded into the E-ELM framework, denoted DE-ELM and CRO-ELM hereafter. The performances of DECRO-ELM, DE-ELM, CRO-ELM and the original ELM with a larger number of hidden nodes are tested on four real-world regression datasets, with each algorithm run 30 times on each dataset. One-way ANOVA is employed to measure the statistical significance of the performance differences. For experiments with variance homogeneity, LSD (Hayter 1986) is used for pairwise comparison of performance, and Dunnett's T3 (Dunnett 1955) is employed otherwise. SPSS 19.0 is used to carry out these analyses. To verify the prediction efficiency of the proposed algorithm, the running time of each algorithm to predict the test set of each dataset over 30 runs is measured, and the mean value is recorded.
Parameters setting
- Data set separation:
The #training set/#test set ratio for all datasets tested in this paper is set to 80%/20%.
- DE-ELM:
for DE, we set F = 0.7, CR = 0.1, and the population size to 10
- CRO-ELM:
N = 5, M = 2, Fb linearly decreases from 0.9 to 0.4 (see the footnote), Fa = 0.2, Fd = 0.1, k = 2, ρ0 = 0.3
- DECRO-ELM:
the parameter settings for DECRO-ELM are simply the combination of those of DE-ELM and CRO-ELM.
For all three E-ELMs the number of function evaluations is set to 200. To simplify the expressions, all X-ELM (X = DECRO, DE, CRO) are denoted as X in the following tables. Note that all the examples in each dataset are normalized to the interval [0, 1].
Bike sharing dataset
The bike sharing dataset was published by Hadi Fanaee-T at the Laboratory of Artificial Intelligence and Decision Support (LIAAD), University of Porto (Fanaee-T and Gama 2013), and can be obtained from http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset. The goal of this dataset is to monitor mobility in a city using data recorded by bike sharing systems. Of the daily and hourly monitored datasets, only the daily monitored dataset (day.csv) is used. To reduce redundancy, two attributes called casual and registered are removed. The basic information and the n_h parameter for the four algorithms are described in Table 1.
Table 1.
Summary of bike sharing dataset
| Training set size | 13,911 |
| Test set size | 3478 |
| Number of attributes | 16 |
| n_h for E-ELMs | 12 |
| n_h for original ELM | 30 |
The statistical summary of the MSE for DECRO-ELM, DE-ELM, CRO-ELM and ELM is recorded in Table 2, and the one-way ANOVA results are recorded in Tables 3, 4 and 5. Tables 2, 3, 4 and 5 show that on the training set the mean MSE of the proposed DECRO-ELM algorithm is significantly better than that of all the other algorithms except the ELM with a larger number of hidden layer nodes, while on the test set the mean MSE of the proposed DECRO-ELM algorithm with 12 hidden layer nodes is significantly better than that of all the other algorithms except CRO.
Table 2.
MSE statistics for bike sharing dataset
| | Mean | Min | Max | SD | Median |
|---|---|---|---|---|---|
| Train | |||||
| DECRO | 7.089 | 6.573 | 7.717 | 0.2480 | 7.083 |
| DE | 8.196 | 7.523 | 8.737 | 0.2748 | 8.201 |
| CRO | 7.601 | 6.980 | 8.367 | 0.3761 | 7.605 |
| ELM | 7.240 | 6.476 | 8.317 | 0.4195 | 7.183 |
| Test | |||||
| DECRO | 8.058 | 7.345 | 9.323 | 0.4710 | 8.022 |
| DE | 10.58 | 9.136 | 13.29 | 0.9633 | 10.54 |
| CRO | 8.461 | 7.420 | 11.13 | 0.8592 | 8.302 |
| ELM | 10.69 | 8.791 | 13.63 | 0.9650 | 10.73 |
Bold values highlight the best result for each metric (ANOVA test, p < 0.05)
Table 3.
Test for variance homogeneity for bike sharing dataset
| | Levene statistics | df1 | df2 | p value |
|---|---|---|---|---|
| Train | 3.399 | 3 | 116 | 0.020 |
| Test | 3.686 | 3 | 116 | 0.014 |
Table 4.
ANOVA test for bike sharing data set
| | F | p value |
|---|---|---|
| Train | 61.927 | 0.000 |
| Test | 78.595 | 0.000 |
Table 5.
Pairwise comparison for bike sharing data set
| | Comparison method | Alg i | mse_i − mse | p value |
|---|---|---|---|---|
| Train | Dunnett T3 | DE | 0.00110* | 0.000 |
| | | CRO | 0.00051* | 0.000 |
| | | ELM | 0.00015 | 0.459 |
| Test | Dunnett T3 | DE | 0.00252* | 0.000 |
| | | CRO | 0.00040 | 0.171 |
| | | ELM | 0.00263* | 0.000 |
mse, the mse performance of DECRO-ELM
* p < 0.05
Concrete compressive strength data set
This dataset was published in Yeh (1998); it can be obtained from http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength and is used to model the compressive strength of high-performance concrete based on features such as the water-to-cement ratio and the content of other cement ingredients. The basic information and the n_h parameter for the four algorithms are described in Table 6.
Table 6.
Summary of concrete compressive strength data set
| Training set size | 826 |
| Test set size | 207 |
| Number of attributes | 9 |
| n_h for E-ELMs | 30 |
| n_h for original ELM | 60 |
The statistical summary of the MSE for DECRO-ELM, DE-ELM, CRO-ELM and ELM is recorded in Table 7, and the one-way ANOVA results are recorded in Tables 8, 9 and 10. The data in Tables 7, 8, 9 and 10 indicate that the mean MSE of DECRO-ELM is significantly better than that of all three other algorithms on both the training set and the test set.
Table 7.
MSE statistics for concrete compressive strength data set
| | Mean | Min | Max | SD | Median |
|---|---|---|---|---|---|
| Train | |||||
| DECRO | 7.144 | 6.395 | 8.184 | 0.4023 | 7.081 |
| DE | 8.635 | 8.310 | 8.976 | 0.1519 | 8.645 |
| CRO | 7.713 | 6.889 | 8.594 | 0.4644 | 7.672 |
| ELM | 7.511 | 6.437 | 8.280 | 0.4348 | 7.574 |
| Test | |||||
| DECRO | 8.372 | 6.741 | 10.003 | 0.7606 | 8.454 |
| DE | 10.21 | 9.241 | 11.83 | 0.6348 | 10.21 |
| CRO | 8.945 | 7.396 | 10.81 | 0.8385 | 8.930 |
| ELM | 9.189 | 6.629 | 11.47 | 0.8943 | 9.260 |
Bold values highlight the best result for each metric (ANOVA test, p < 0.05)
Table 8.
Test for variance homogeneity for concrete compressive strength data set
| | Levene statistics | df1 | df2 | p value |
|---|---|---|---|---|
| Train | 6.408 | 3 | 116 | 0.000 |
| Test | 0.759 | 3 | 116 | 0.519 |
Table 9.
ANOVA test for concrete compressive strength data set
| | F | p value |
|---|---|---|
| Train | 79.255 | 0.000 |
| Test | 27.586 | 0.000 |
Table 10.
Pairwise comparison for concrete compressive strength data set
| | Comparison method | Alg i | mse_i − mse | p value |
|---|---|---|---|---|
| Train | Dunnett T3 | DE | 0.00149* | 0.000 |
| | | CRO | 0.00056* | 0.000 |
| | | ELM | 0.00036 | 0.009 |
| Test | LSD | DE | 0.00183* | 0.000 |
| | | CRO | 0.00057 | 0.007 |
| | | ELM | 0.00081* | 0.000 |
mse, the mse performance of DECRO-ELM
* p < 0.05
Housing data set
This dataset was compiled by Harrison and Rubinfeld and is available at http://archive.ics.uci.edu/ml/datasets/Housing; the goal is to predict MEDV (the median value of owner-occupied homes in $1000s). The reader is referred to Belsley et al. (1980) and Quinlan (1993) for more details. The basic information and the n_h parameter for all algorithms are described in Table 11.
Table 11.
Summary of housing data set
| Training set size | 405 |
| Test set size | 101 |
| Number of attributes | 14 |
| n_h for E-ELMs | 12 |
| n_h for original ELM | 25 |
The MSE statistics for DECRO-ELM, DE-ELM, CRO-ELM and ELM are recorded in Table 12, and the one-way ANOVA results are recorded in Tables 13, 14 and 15. From Tables 12, 13, 14 and 15 it is concluded that the mean MSE of DECRO-ELM is significantly better than that of CRO-ELM and not significantly different from those of DE-ELM and ELM on both the training set and the test set; however, the minimum and maximum MSE of DECRO-ELM on the training set, and its minimum MSE on the test set, are slightly better than those of all three other algorithms.
Table 12.
MSE statistics for housing data set
| | Mean | Min | Max | SD | Median |
|---|---|---|---|---|---|
| Train | |||||
| DECRO | 8.341 | 6.831 | 9.351 | 0.5895 | 8.422 |
| DE | 8.645 | 7.338 | 9.438 | 0.5273 | 8.767 |
| CRO | 9.386 | 7.993 | 10.55 | 0.5989 | 9.504 |
| ELM | 8.718 | 7.028 | 11.28 | 1.082 | 8.564 |
| Test | |||||
| DECRO | 10.42 | 7.106 | 15.84 | 1.795 | 10.32 |
| DE | 10.76 | 8.316 | 14.23 | 1.425 | 10.62 |
| CRO | 11.79 | 9.318 | 13.75 | 0.9624 | 11.86 |
| ELM | 10.73 | 7.777 | 16.37 | 1.802 | 10.81 |
Bold values highlight the best result for each metric (ANOVA test, p < 0.05)
Table 13.
Test for variance homogeneity for housing data set
| | Levene statistics | df1 | df2 | p value |
|---|---|---|---|---|
| Train | 5.726 | 3 | 116 | 0.001 |
| Test | 2.686 | 3 | 116 | 0.050 |
Table 14.
ANOVA test for housing data set
| | F | p value |
|---|---|---|
| Train | 10.424 | 0.000 |
| Test | 4.356 | 0.006 |
Table 15.
Pairwise comparison for housing data set
| | Comparison method | Alg i | mse_i − mse | p value |
|---|---|---|---|---|
| Train | Dunnett T3 | DE | 0.00030* | 0.231 |
| | | CRO | 0.00104* | 0.000 |
| | | ELM | 0.00037 | 0.480 |
| Test | Dunnett T3 | DE | 0.00033* | 0.962 |
| | | CRO | 0.00136 | 0.005 |
| | | ELM | 0.00031* | 0.984 |
mse, the mse performance of DECRO-ELM
* p < 0.05
Yacht hydrodynamics data set
This dataset (Gerritsma et al. 1981; Ortigosa et al. 2007) was donated by Roberto Lopez of the Ship Hydromechanics Laboratory, Maritime and Transport Technology Department, Technical University of Delft, and can be obtained from http://archive.ics.uci.edu/ml/datasets/Yacht+Hydrodynamics. It is aimed at predicting the residuary resistance of sailing yachts from features such as the basic hull dimensions and the boat velocity. The yacht hydrodynamics dataset contains 308 full-scale experiments, which were performed at the Delft Ship Hydromechanics Laboratory for that purpose. The basic information and the n_h parameter for the four algorithms are described in Table 16. The MSE statistics for DECRO-ELM, DE-ELM, CRO-ELM and ELM are recorded in Table 17, and the one-way ANOVA results are recorded in Tables 18, 19 and 20. From the experimental results it is concluded that on the training set the mean MSE of DECRO-ELM with 40 hidden layer nodes is significantly better than that of all the other algorithms except the ELM with 80 hidden layer nodes, while on the test set the MSE of DECRO-ELM is significantly better than that of all three other algorithms. The large difference between the training and test results of ELM also shows that ELM is prone to overfit the training set when employing a larger number of hidden layer nodes, which increases the model complexity.
Table 16.
Summary of yacht hydrodynamics data set
| Training set size | 247 |
| Test set size | 61 |
| Number of attributes | 7 |
| n_h for E-ELMs | 40 |
| n_h for original ELM | 80 |
Table 17.
MSE statistics for yacht hydrodynamics data set
| | Mean | Min | Max | SD | Median |
|---|---|---|---|---|---|
| Train | |||||
| DECRO | 0.1555 | 0.09930 | 0.2183 | 0.03265 | 0.1573 |
| DE | 1.436 | 0.9108 | 1.950 | 0.2753 | 1.392 |
| CRO | 0.2174 | 0.1329 | 0.3981 | 0.07310 | 0.1864 |
| ELM | 0.1763 | 0.09655 | 0.3679 | 0.07386 | 0.1447 |
| Test | |||||
| DECRO | 0.2615 | 0.1150 | 0.4340 | 0.07623 | 0.2636 |
| DE | 2.519 | 1.159 | 4.061 | 0.6966 | 2.439 |
| CRO | 0.3582 | 0.1933 | 0.8077 | 0.1348 | 0.2878 |
| ELM | 1.054 | 0.4195 | 2.272 | 0.4878 | 0.9814 |
Bold values highlight the best result for each metric (ANOVA test, p < 0.05)
Table 18.
Test for variance homogeneity for yacht hydrodynamics data set
| | Levene statistics | df1 | df2 | p value |
|---|---|---|---|---|
| Train | 34.657 | 3 | 116 | 0.000 |
| Test | 27.924 | 3 | 116 | 0.000 |
Table 19.
ANOVA test for yacht hydrodynamics data set
| | F | p value |
|---|---|---|
| Train | 520.870 | 0.000 |
| Test | 168.639 | 0.000 |
Table 20.
Pairwise comparison for yacht hydrodynamics data set
| | Comparison method | Alg i | mse_i − mse | p value |
|---|---|---|---|---|
| Train | Dunnett T3 | DE | 0.00128* | 0.000 |
| | | CRO | 0.00006* | 0.001 |
| | | ELM | 0.00002 | 0.662 |
| Test | Dunnett T3 | DE | 0.00225* | 0.000 |
| | | CRO | 0.00009 | 0.009 |
| | | ELM | 0.00079* | 0.000 |
mse, the mse performance of DECRO-ELM
* p < 0.05
Summary of the experiment results
According to Table 21, in most of the 8 comparisons (training and test set for each dataset), DECRO-ELM significantly outperforms both of its ancestors, and the performance of DECRO-ELM is at least no worse than that of the original ELM with a larger number of hidden nodes. The average prediction times over the four datasets in Table 22 show that all three E-ELMs reach a faster prediction speed than the original ELM. In summary, DECRO-ELM improves the performance of DE-ELM and CRO-ELM and enhances the prediction speed of the original ELM.
Table 21.
Summary of the experiment result
| Algorithm i | DECRO outperforms i | DECRO performs as well as i | DECRO performs worse than i |
|---|---|---|---|
| DE | 6 | 2 | 0 |
| CRO | 7 | 1 | 0 |
| ELM with larger n_h | 4 | 4 | 0 |
Table 22.
Average prediction time test
| Data set | DECRO | DE | CRO | ELM |
|---|---|---|---|---|
| Bike Sharing | 0.525 | 0.696 | 0.476 | 1.41 |
| Concrete | 0.895 | 1.150 | 0.973 | 2.49 |
| Housing | 0.250 | 0.322 | 0.239 | 0.763 |
| Yacht | 0.608 | 1.100 | 0.631 | 1.53 |
Conclusions
In this paper we have proposed a novel hybrid algorithm, DECRO, that combines differential evolution with the coral reefs optimization approach. The resulting DECRO algorithm has been applied to training the input layer parameters of the extreme learning machine, yielding DECRO-ELM. On four real-world regression problems it has been shown that the proposed DECRO-ELM algorithm obtains good prediction precision and faster predictions than the original ELM, DE-ELM and CRO-ELM.
Acknowledgments
This work was sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars and the National Key Technology R&D Program in the 12th Five-Year Plan of China (No. 2013BAI13B06).
Footnotes
To make Fb change dynamically, we set Fb(t) = 0.9 − 0.5 · t/T, where t is the current iteration number and T is the total iteration number, so that Fb decreases linearly from 0.9 to 0.4.
References
- Atif M, Al-Sulaiman FA. Optimization of heliostat field layout in solar central receiver systems on annual basis using differential evolution algorithm. Energy Convers Manag. 2015;95:1–9. doi: 10.1016/j.enconman.2015.01.089. [DOI] [Google Scholar]
- Belsley DA, Kuh E, Welsch RE. Regression diagnostics: identifying influential data and sources of collinearity. Hoboken: Wiley; 1980. pp. 244–261. [Google Scholar]
- Bhadra T, Bandyopadhyay S. Unsupervised feature selection using an improved version of differential evolution. Expert Syst Appl. 2015;42:4042–4053. doi: 10.1016/j.eswa.2014.12.010. [DOI] [Google Scholar]
- Birru HK, Chellapilla K, Rao SS (1999) Local search operators in fast evolutionary programming. In: Proceedings of the IEEE into Congress on Evolutionary Computation, pp 1506–1513
- Chen Y, Mahalec V, Chen Y, Liu X, He R, Sun K. Reconfiguration of satellite orbit for cooperative observation using variable-size multi-objective differential evolution. Eur J Oper Res. 2015;242:10–20. doi: 10.1016/j.ejor.2014.09.025. [DOI] [Google Scholar]
- Chowdhury AR, Chetty M, Evans R (2015) Stochastic S-system modeling of gene regulatory network. Cogn Neurodyn 9:535–547 [DOI] [PMC free article] [PubMed]
- Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. J Am Stat Assoc. 1955;50:1096–1121. doi: 10.1080/01621459.1955.10501294. [DOI] [Google Scholar]
- Fanaee-T H, Gama J. Event labeling combining ensemble detectors and background knowledge. Progress in artificial intelligence. Berlin: Springer; 2013. pp. 1–15. [Google Scholar]
- García-Domingo B, Carmona CJ, Rivera-Rivas AJ, del Jesus MJ, Aguilera J. A differential evolution proposal for estimating the maximum power delivered by CPV modules under real outdoor conditions. Expert Syst Appl. 2015;42:5452–5462. doi: 10.1016/j.eswa.2015.02.032. [DOI] [Google Scholar]
- Gerritsma J, Onnink R, Versluis A. Geometry, resistance and stability of the delft systematic yacht hull series. Int Shipbuild Prog. 1981;28:276–297. [Google Scholar]
- Hamedi N, Iranshahi D, Rahimpour MR, Raeissi S, Rajaei H. Development of a detailed reaction network for industrial upgrading of heavy reformates to xylenes using differential evolution technique. J Taiwan Inst Chem Eng. 2015;48:56–72. doi: 10.1016/j.jtice.2014.10.015. [DOI] [Google Scholar]
- Hayter AJ. The maximum familywise error rate of fisher’s least significant difference test. J Am Stat Assoc. 1986;81:1000–1004. doi: 10.1080/01621459.1986.10478364. [DOI] [Google Scholar]
- http://archive.ics.uci.edu/ml/datasets/Yacht+Hydrodynamics
- Huang G-B, Zhu Q-Y, Siew C-K (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of the international joint conference on neural networks (IJCNN2004), pp 25–29
- Langdon WB, Poli R. Evolving problems to learn about particle swarm optimizers and other search algorithms. IEEE Trans Evol Comput. 2007;11:561–578. doi: 10.1109/TEVC.2006.886448. [DOI] [Google Scholar]
- Lee S-Y, Song H-A, Amari S. A new discriminant NMF algorithm and its application to the extraction of subtle emotional differences in speech. Cogn Neurodyn. 2012;6(6):525–535. doi: 10.1007/s11571-012-9213-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ortigosa I, Lopez R, Garcia J (2007) A neural networks approach to residuary resistance of sailing yachts prediction. In: Proceedings of the international conference on marine engineering MARINE
- Quinlan R (1993) Combining instance-based and model-based learning. In: Proceedings on the tenth international conference of machine learning, pp 236–243
- Ronkkonen J, Kukkonen S, Price KV (2005) Real parameter optimization with differential evolution. In: Proceedings of IEEE CEC, vol 1. pp 506–513
- Roque CMC, Martins PALS. Differential evolution optimization for the analysis of composite plates with radial basis collocation meshless method. Compos Struct. 2015;75:317–326. doi: 10.1016/j.compstruct.2015.01.019. [DOI] [Google Scholar]
- Salcedo-Sanz S, Gallo-Marazuela D, Pastor-Sánchez A, Carro-Calvo L, Portilla-Figueras A, Prieto L. Offshore wind farm design with the coral reefs optimization algorithm. Renew Energy. 2014;63:109–115. doi: 10.1016/j.renene.2013.09.004. [DOI] [Google Scholar]
- Salcedo-Sanz S, Pastor-Sánchez A, Prieto L, Blanco-Aguilera A, García-Herrera R. Feature selection in wind speed prediction systems based on a hybrid coral reefs optimization—extreme learning machine approach. Energy Convers Manag. 2014;87:10–18. doi: 10.1016/j.enconman.2014.06.041. [DOI] [Google Scholar]
- Salcedo-Sanz S, Casanova-Mateo C, Pastor-Sánchez A, Sánchez-Girón M. Daily global solar radiation prediction based on a hybrid coral reefs optimization—extreme learning machine approach. Solar Energy. 2014;105:91–98. doi: 10.1016/j.solener.2014.04.009. [DOI] [Google Scholar]
- Salcedo-Sanz S, Garcia-Diaz P, Portilla-Figueras JA, Del Ser J, Gil-Lopez S. A coral reefs optimization algorithm for optimal mobile network deployment with electromagnetic pollution control criterion. Appl Soft Comput. 2014;24:239–248. doi: 10.1016/j.asoc.2014.07.007. [DOI] [Google Scholar]
- Salcedo-Sanz S, Pastor-Sanchez A, Del Ser J, Prieto L, Geem ZW. A coral reefs optimization algorithm with harmony search operators for accurate wind speed prediction. Renew Energy. 2015;75:93–101. doi: 10.1016/j.renene.2014.09.027. [DOI] [Google Scholar]
- Salcedo-Sanz S, Del Ser J, Landa-Torres I, Gil-López S, Portilla-Figueras JA (2014) The coral reefs optimization algorithm: a novel metaheuristic for efficiently solving optimization problems. Sci World J. Article ID 739768 [DOI] [PMC free article] [PubMed]
- Salcedo-Sanz S, Pastor-Sánchez A, Gallo-Marazuela D, Portilla-Figueras A (2013) A novel coral reefs optimization algorithm for multi-objective problems. Lecture Notes in Computer Science, vol 8206. pp 326–333
- Sarkar S, Das S, Chaudhuri SS. A multilevel color image thresholding scheme based on minimum cross entropy and differential evolution. Pattern Recognit Lett. 2015;54:27–35. doi: 10.1016/j.patrec.2014.11.009. [DOI] [Google Scholar]
- Storn R, Price K. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim. 1997;11:341–359. doi: 10.1023/A:1008202821328. [DOI] [Google Scholar]
- Wang X, Lv Q, Wang B, Zhang L. Airport detection in remote sensing images: a method based on saliency map. Cogn Neurodyn. 2013;7(2):143–154. doi: 10.1007/s11571-012-9223-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wennekers T, Palm G. Syntactic sequencing in Hebbian cell assemblies. Cogn Neurodyn. 2009;3(4):429–441. doi: 10.1007/s11571-009-9095-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yeh IC. Modeling of strength of high performance concrete using artificial neural networks. Cem Concr Res. 1998;28:1797–1808. doi: 10.1016/S0008-8846(98)00165-3. [DOI] [Google Scholar]
- Zhu QY, Qin AK, Suganthan PN, Huang GB. Evolutionary extreme learning machine. Pattern Recognit. 2005;38:1759–1763. doi: 10.1016/j.patcog.2005.03.028. [DOI] [Google Scholar]


