Abstract
This paper presents a grammatical evolution (GE)-based methodology to automatically design third-generation artificial neural networks (ANNs), also known as spiking neural networks (SNNs), for solving supervised classification problems. The proposal performs the SNN design by exploring the search space of three-layered feedforward topologies with configured synaptic connections (weights and delays), so that no explicit training is carried out. In addition, the designed SNNs have partial connections between the input and hidden layers, which may help avoid redundancies and reduce the dimensionality of the input feature vectors. The proposal was tested on several well-known benchmark datasets from the UCI repository and statistically compared against a similar design methodology for second-generation ANNs and an adapted version of that methodology for SNNs; moreover, the results of both reference methodologies and of the proposed one were improved by changing the fitness function used in the design process. The proposed methodology shows competitive and consistent results, and the statistical tests support the conclusion that the designs produced by the proposal perform better than those produced by the other methodologies.
1. Introduction
Artificial neural networks (ANNs) have been successfully used in theoretical and practical fields to solve several kinds of problems (e.g., classification [1, 2], robotic locomotion [3, 4], and function approximation [5, 6]). Basically, ANNs are characterized by computing units that are interconnected through communication links which send and/or receive messages of some data type [7]; these elements define what is known as their architecture or topology. Three generations of ANNs can be distinguished according to their computing units [8]; they are capable of solving problems of digital (first to third generation), analog (second to third generation), and spatiotemporal (third generation only) nature. The first generation is based on threshold units such as McCulloch–Pitts neurons [9] or perceptrons [10]. The second generation is based on computing units that apply continuous activation functions (e.g., sigmoid or hyperbolic tangent functions); ANNs of this generation can be trained with gradient descent-based algorithms such as the backpropagation learning rule [11]. The third generation is based on spiking neurons (see [12] for a detailed reference) such as the integrate-and-fire model [13] or the Hodgkin–Huxley neuron [14]; ANNs of this generation are known as spiking neural networks (SNNs), and they are the kind of ANNs considered in this paper.
Usually, the implementation of an ANN to solve a specific problem, regardless of the generation it belongs to, requires human experts who define the ANN's topological elements, the learning rule, and its parameters, among other design criteria. The experts perform such a design process empirically, following some rule of thumb, or by trial and error; this is because there is no well-established methodology to set up the ANN design for a given problem. It is well known that the good performance of ANNs is strongly related to their design and related criteria; thus, designing an ANN can be a challenge. Several studies have explored the learnability issues of ANNs related to their design, for example, the combinatorial problems that arise in the design of feedforward ANNs [15] or the problems that ANNs with fixed architectures may face when learning a specific problem [7, 16–19]. Insights have been given to ease or enhance the learnability of ANNs, for example, by applying constraints to the task to be learned or to the ANN's architecture [7, 15]. As an example of constraints applied to the architecture, partially connected ANNs have shown equal or better performance than their fully connected versions; among other benefits, they reduce the network complexity and its training and recall times [20]. Another insight is to develop algorithms capable of changing the architecture of an ANN during the learning process [7, 15].
Nowadays, evolutionary artificial neural networks (EANNs) are a special class of ANNs that result from using evolutionary algorithms (EAs), or other kinds of metaheuristic methods, to adapt the design of ANNs to a specific task or problem; this is achieved by optimizing one or several of their design criteria (the term neuroevolution has also been used to refer to this kind of design method). Thus, EANNs allow us, in some manner, to avoid or overcome the learnability issues related to ANN architectures and to dispense, partially or completely, with human experts (see [21–24] for comprehensive reviews). There are four main approaches for deploying EANNs [25]: weight optimization [26–28], topology structure optimization [25, 29–31], joint weight and topology structure optimization [32–38], and learning rule optimization [39, 40]. Most of the work on EANNs has focused on deploying ANNs from the first and second generations.
Recently, efforts to use SNNs for solving real problems from engineering and industry have been increasing because of interesting characteristics of spiking neurons, such as their greater computational power compared with less biologically plausible neuron models and the ability of SNNs to solve problems with fewer computing units than ANNs from previous generations [19, 41]. Although there are learning rules to adapt the parameters of SNNs, such as SpikeProp [42], the use of metaheuristic algorithms is a common practice to adapt their parameters or define design criteria, because metaheuristics overcome drawbacks of such learning rules [43] and allow handling the greater variety of design criteria (parameters of neuron models and synapses, types of synapses, the topology's wiring patterns, encoding scheme, etc.) that these kinds of ANNs present; in this work, the combination of SNNs and metaheuristic algorithms is referred to as evolutionary spiking neural networks (ESNNs). In [44–48], the synaptic weights of a single spiking neuron, e.g., the integrate-and-fire model [13] or the Izhikevich model [49], are calibrated by means of algorithms such as differential evolution (DE) [50], particle swarm optimization (PSO) [51], the cuckoo search algorithm (CSA) [52], or the genetic algorithm (GA) [53] to perform classification tasks; the spiking neuron performs the classification by using the firing rate encoding scheme as the similarity criterion to assign the class to which an input pattern belongs. In other works [43, 54–57], three-layered feedforward SNNs whose synaptic connections are formed by a weight and a delay were implemented to solve supervised classification problems through the use of time-to-first-spike as a classification criterion; in these works, the training has been carried out by means of evolutionary strategy (ES) [58, 59] and PSO algorithms. An extension of these works is made in [60, 61], where the number of hidden layers and their computing units are defined by grammatical evolution (GE) [62] in addition to the metaheuristic learning. More complex SNN frameworks have been developed and trained with metaheuristics (such as ES) to perform tasks such as visual pattern recognition, audio-visual pattern recognition, taste recognition, ecological modelling, sign language recognition, object movement recognition, and EEG spatio/spectrotemporal pattern recognition (see [63] for a review of these frameworks). Robotic locomotion is solved through SNNs designed by metaheuristics in [60, 64, 65]; in these works, both the connectivity pattern and the synaptic weights of each Beslon–Mazet–Soula (BMS) [66] neuron model in SNNs called spiking central pattern generators (SCPGs) are defined through GE or Christiansen grammar evolution (CGE) [67] algorithms; all individual designs are integrated to define the SCPGs that allow the locomotion of legged robots.
The present paper proposes a design methodology for three-layered feedforward ANNs of the third generation for solving supervised classification problems. The design methodology incorporates partial connectivity between the input and hidden layers, which contributes to reducing the topological complexity of the ESNNs; in addition, partial connectivity may also reduce the number of features of the input vector, thus indirectly performing dimensionality reduction. The proposal explores the search space of three-layered feedforward topologies with configured synaptic connections, so an explicit learning process is not required. This kind of design methodology has been previously proposed for ANNs of the first and second generations, where it can be considered as the design of composed functions. To the best of the authors' knowledge, this is the first attempt to design SNNs by defining both the number of computing units and their configured connectivity patterns (weights and delays). The rest of the paper is organized as follows: Section 2 explains the proposed methodology and its constituent methods. The experimental configuration of the proposal and of the other methodologies used for comparison, together with their results, is given in Section 3. In Section 4, the results of the proposed methodology are statistically compared with those of the other methodologies. Finally, Section 5 contains the conclusions of the paper and the future work based on it.
2. Design Methodology and Concepts
This paper proposes a framework to design partially connected spiking neural networks (SNNs) for solving supervised classification problems. The proposed framework requires the following elements: a temporal encoding scheme to transform original input data into a form suitable for the network; a context-free grammar in Backus–Naur form (BNF grammar) to guide the generation of neural network words and a mapping process to transform the genotype of individuals into functional network designs; a fitness function and a target definition to determine the performance of the proposed networks; and a search engine to optimize the solutions. A general diagram of the methodology can be seen in Figure 1.
Figure 1.
General diagram for the proposed framework.
2.1. Spiking Neural Networks
Spiking neural networks (SNNs) constitute the third generation of ANNs because of the inclusion of the firing time component in their computation process [8].
2.1.1. Spike Response Model
The spike response model (SRM) is employed in this framework as the basis for the SNNs. An SRM neuron fires (i.e., produces a spike) whenever its membrane potential surpasses the firing threshold (θ). In the SRM, the membrane potential is calculated through time as a linear summation of postsynaptic potentials (PSPs) (excitatory and/or inhibitory), which are caused by impinging spikes arriving at the neuron through its presynaptic connections (Figure 2); each PSP is weighted and delayed by its synaptic connection.
Figure 2.

Membrane potential of neuron j: linear summation of PSPs [55].
The membrane potential $x_j(t)$ of neuron $j$ at time $t$ is calculated as the sum of the contributions $y_i(t)$ from its connecting presynapses ($i \in \Gamma_j$), weighted by $w_{ji}$, as in the following equation:
$$x_j(t) = \sum_{i \in \Gamma_j} w_{ji}\, y_i(t) \qquad (1)$$
The unweighted contribution $y_i(t)$ is described by equation (2), in which the function $\varepsilon(t - t_i - d_{ji})$ describes the form of the PSP generated by an impinging spike coming from the presynaptic neuron $i$ at simulation time $t$; the parameters of presynaptic connection $i$ are its firing time $t_i$ and the synaptic delay $d_{ji}$:
$$y_i(t) = \varepsilon\!\left(t - t_i - d_{ji}\right) \qquad (2)$$
The spike response function $\varepsilon(t)$ describes the form of the PSPs and is defined in the following equation, where $\tau$ represents the membrane potential time constant that defines the decay time of the postsynaptic potential:
$$\varepsilon(t) = \begin{cases} \dfrac{t}{\tau}\, e^{\,1 - t/\tau}, & t > 0,\\[4pt] 0, & \text{otherwise}, \end{cases} \qquad (3)$$
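To make the model concrete, the following is a minimal Python sketch (not the authors' implementation) of how equations (1)–(3) can be evaluated to obtain the first firing time of a postsynaptic neuron; the synapse parameters, threshold, and time step in the example are illustrative assumptions.

```python
import numpy as np

def epsilon(t, tau=9.0):
    """Spike response function, eq. (3): PSP shape with time constant tau (ms)."""
    return np.where(t > 0, (t / tau) * np.exp(1.0 - t / tau), 0.0)

def first_firing_time(presyn_times, weights, delays,
                      theta=1.0, tau=9.0, t_max=30.0, dt=0.01):
    """Simulate eqs. (1)-(2) and return the first time the membrane
    potential of neuron j crosses the threshold theta (or None)."""
    for t in np.arange(0.0, t_max, dt):
        # Membrane potential: weighted sum of delayed PSP contributions.
        x = sum(w * epsilon(t - t_i - d, tau)
                for w, t_i, d in zip(weights, presyn_times, delays))
        if x >= theta:
            return t
    return None

# Hypothetical three-synapse example: input spike times (ms), weights, delays (ms).
print(first_firing_time([1.0, 3.0, 5.0], [0.7, 0.5, 0.9], [1.0, 2.0, 0.5]))
```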
2.2. Temporal Encoding
Due to the nature of the employed neural model, the original features from the dataset must be transformed into spikes prior to introducing them into the network. For this purpose, the one-dimensional encoding in the following equation is employed [56]:
$$Y = a + \frac{(b - a)\,(f - m)}{r}, \qquad r = M - m \qquad (4)$$
where Y is the spike temporal value, f is the original feature value, [a, b] are the lower and upper temporal interval limits of the encoding, M and m hold the maximum and minimum values that the variable f takes, respectively, and r is the range between M and m. This encoding method preserves the dimension of the samples in the dataset while providing a temporal representation of the scalar values that is suitable for insertion into the network.
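Assuming the linear form of equation (4) reconstructed above, the encoding can be sketched as follows; the default interval limits a = 0.01 ms and b = 9 ms are the values used later in Section 3.

```python
def encode_feature(f, m, M, a=0.01, b=9.0):
    """One-dimensional temporal encoding, eq. (4): map a scalar feature f
    (observed in [m, M]) linearly onto a firing time in [a, b] ms."""
    r = M - m                      # range of the feature
    if r == 0:                     # constant feature: place it at the interval start
        return a
    return a + (b - a) * (f - m) / r

# Example: a feature value of 2.5 observed over [0, 10] maps to roughly 2.26 ms.
print(encode_feature(2.5, m=0.0, M=10.0))
```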
2.3. Grammatical Evolution (GE)
Grammatical evolution is an evolutionary algorithm based on the combination of genetic algorithms and context-free grammars [68]. It employs a BNF grammar related to the problem, a mapping process to obtain the functional form of solutions, and a search engine to drive the search process.
2.3.1. BNF Grammar
The Backus–Naur form grammar (Figure 3) is employed to define the topology of the network and its parameters. Any word produced by this grammar includes an arbitrary number of hidden neurons and some specific pre- and postsynapses with their respective parameters. The opening curly bracket symbol ({) indicates the division between hidden neurons, the opening parenthesis (() marks the different synapses, and the at symbol (@) precedes the synapse-specific weight and delay values.
Figure 3.

Proposed BNF grammar for designing partially connected SNNs.
Figure 4 illustrates an example of a word generated by the proposed grammar and its corresponding network topology. Relating the word to its network topology, it can be seen that the word contains two "{" symbols (see the end of each row), implying that the network has two hidden neurons. In this case, each row has two "(" symbols, meaning three synaptic configurations (although this can vary for each hidden neuron), where the first and second synaptic configurations represent connections with neurons from the input layer and the last configuration marks the synapse with the output layer; each synaptic configuration is formed by a neuron identifier, a synaptic weight, and a delay. In Figure 4, each presynaptic neuron and its synaptic connection with a postsynaptic neuron are portrayed in the same color to clarify the reading of the transformation process from a word to a network topology.
Figure 4.

Example of a word generated by the proposed grammar and its corresponding network topology.
2.3.2. Mapping Process
The mapping process transforms an individual from its genotypic form into its phenotypic form, which represents a functional network. The depth-first mapping process employed in this framework is the standard in GE; basically, it begins by deriving the left-most nonterminal symbol (initially, the <architecture> nonterminal), i.e., replacing it by one of its productions, until all nonterminal symbols in depth are derived, and then it moves to the current left-most nonterminal. The process continues until either the nonterminals are depleted or all elements of the genotype have been used.
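The mechanism can be sketched as follows with a toy grammar (not the BNF grammar of Figure 3); the genome is assumed to be a list of integer codons, and genome wrapping is used when the genotype is exhausted, as is standard in GE.

```python
def ge_map(genome, grammar, start="<architecture>", max_wraps=2):
    """Standard depth-first GE mapping: expand the left-most nonterminal by
    selecting production (codon mod number of productions); stop when no
    nonterminals remain or the genome (with wrapping) is exhausted."""
    word, idx, used = [start], 0, 0
    limit = len(genome) * (max_wraps + 1)
    while used < limit:
        nts = [i for i, s in enumerate(word) if s in grammar]
        if not nts:                         # fully derived word
            return "".join(word)
        i = nts[0]                          # left-most nonterminal
        rules = grammar[word[i]]
        choice = genome[idx % len(genome)] % len(rules)
        word[i:i + 1] = rules[choice]       # replace it by the chosen production
        idx, used = idx + 1, used + 1
    return None                             # mapping failed (incomplete derivation)

# Toy grammar, for illustration only (not the grammar of Figure 3).
toy = {"<architecture>": [["<neuron>"], ["<neuron>", "{", "<architecture>"]],
       "<neuron>": [["n", "<weight>"]],
       "<weight>": [["@0.5"], ["@-1.2"]]}
print(ge_map([3, 0, 1, 0, 0, 0], toy))      # e.g. "n@-1.2{n@0.5"
```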
2.3.3. Search Engine
Several population-based metaheuristic algorithms can be used as the search engine of grammatical evolution. The well-known genetic algorithm (GA) and differential evolution (DE) are used in this framework [69].
2.4. Fitness Function
Two different fitness functions are considered to provide a measure of the ability of the solutions to solve the problem:
(1) The squared error is defined in the following equation, where P is the total number of training patterns, O is the number of neurons in the output layer, and $t_o^{a}(p)$ and $t_o^{d}(p)$ are the actual and desired firing times of output neuron o for pattern p, respectively:
$$E = \sum_{p=1}^{P} \sum_{o=1}^{O} \left( t_o^{a}(p) - t_o^{d}(p) \right)^2 \qquad (5)$$
(2) The accuracy error on the design subset is given in the following equation, where C is the number of correct predictions and T is the total number of predictions:
$$E_{\mathrm{acc}} = 1 - \frac{C}{T} \qquad (6)$$
Both fitness functions are designed to be minimized.
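Assuming the forms of equations (5) and (6) given above, both fitness functions can be sketched as follows (illustrative only; the class target times used in the example are those reported in Section 3).

```python
import numpy as np

def squared_error(actual, desired):
    """Eq. (5): sum of squared differences between actual and desired
    output firing times over all patterns and output neurons."""
    actual, desired = np.asarray(actual), np.asarray(desired)
    return np.sum((actual - desired) ** 2)

def accuracy_error(predicted_classes, true_classes):
    """Eq. (6): 1 - C/T, where C is the number of correct predictions."""
    predicted_classes = np.asarray(predicted_classes)
    true_classes = np.asarray(true_classes)
    return 1.0 - np.mean(predicted_classes == true_classes)

# Hypothetical example with class target times 12, 15, 18 ms (Section 3):
targets = {0: 12.0, 1: 15.0, 2: 18.0}
print(squared_error([12.3, 17.9], [targets[0], targets[2]]))   # 0.1
print(accuracy_error([0, 2, 1], [0, 2, 2]))                    # ~0.333
```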
2.5. Target
In order to obtain a prediction, a particular firing time is assigned to each class in the employed dataset, resulting in a desired time-to-first-spike for every sample belonging to a specific class.
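For instance, with the target times later reported in Section 3 (12 ms, 15 ms, 18 ms, ...), the class-to-target mapping can be sketched as follows; the 3 ms spacing is inferred from that list.

```python
def class_targets(classes, first_ms=12.0, step_ms=3.0):
    """Assign a desired time-to-first-spike to each class label."""
    return {cls: first_ms + step_ms * k for k, cls in enumerate(classes)}

print(class_targets(["setosa", "versicolor", "virginica"]))
# {'setosa': 12.0, 'versicolor': 15.0, 'virginica': 18.0}
```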
3. Experiments and Results
Twelve supervised classification benchmark datasets from the UCI Machine Learning Repository [70] were considered for experimentation: Balance Scale, Blood Transfusion Service Center (Blood), Breast Cancer Wisconsin (Breast Cancer), Japanese Credit Screening (Card), Pima Indians Diabetes (Diabetes), Fertility, Glass Identification (Glass), Ionosphere, Iris Plant, Liver Disorders (Liver), Parkinson, and Wine. Table 1 shows the details of the datasets employed.
Table 1.
Datasets employed for experimentation.
| Dataset | Instances | Classes | Features |
|---|---|---|---|
| Balance Scale | 625 | 3 | 4 |
| Blood | 748 | 2 | 4 |
| Breast Cancer | 683 | 2 | 9 |
| Card | 653 | 2 | 15 |
| Diabetes | 768 | 2 | 8 |
| Fertility | 100 | 2 | 9 |
| Glass | 214 | 6 | 9 |
| Ionosphere | 351 | 2 | 33 |
| Iris Plant | 150 | 3 | 4 |
| Liver | 345 | 2 | 6 |
| Parkinson | 195 | 2 | 22 |
| Wine | 178 | 3 | 13 |
Each dataset was randomly divided into two subsets of approximately the same size, ensuring that the instances of each class are evenly distributed between the subsets. One of these subsets is assigned as the design set, while the other is the test set.
Then, the design set is employed to carry out the GE process, while the test set is reserved to assess the performance of the best solution provided by the evolutionary process.
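Such a class-balanced half split can be reproduced, for instance, with a stratified split; the snippet below is an illustrative sketch (not the authors' code) using scikit-learn and the Iris dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Class-balanced 50/50 split: stratify=y keeps class proportions equal
# in the design and test subsets.
X, y = load_iris(return_X_y=True)
X_design, X_test, y_design, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
print(len(X_design), len(X_test))   # 75 75
```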
Aiming to compare the performance of neural models from different generations in solving pattern recognition tasks, six different configurations were considered, as shown in Table 2, with the following details:
α configurations employ the parameters defined in [35], focusing on developing second-generation partially connected ANNs
β configurations aim to be homologous to the α configurations but are used to produce third-generation partially connected ANNs
γ configurations are defined as the β configurations but employ DE as the search engine instead of GA
Table 2.
Configurations included in experimentation.
| Configuration | Fitness function | Search engine |
|---|---|---|
| α 1 | Squared error | GA |
| α 2 | Accuracy error | GA |
| β 1 | Squared error | GA |
| β 2 | Accuracy error | GA |
| γ 1 | Squared error | DE |
| γ 2 | Accuracy error | DE |
Parameters of the configurations were matched to make the comparison as fair as possible. Furthermore, configurations labeled with subscript 1 use the squared error as the fitness function to guide the evolutionary process, while configurations labeled with subscript 2 use the accuracy error on the design set.
In order to support the statistical analysis, 33 experiments were performed for each configuration so that the central limit theorem [71] can be invoked. The specific parameters used in this framework for the β and γ configurations are provided next.
Temporal Encoding: The one-dimensional encoding scheme observes a temporal range from 0.01 to 9 milliseconds (ms).
SRM: membrane potential time constant τ = 9; targets: {12 ms, 15 ms, 18 ms, …} (depending on the number of classes in the dataset); simulation time ∈ [10 ms, target of the last class plus 2 ms]; threshold θ = 1 millivolt (mV); weight range ∈ [−999.99, 999.99]; delay range ∈ [0.01, 19.99] ms.
GA: binary search space {0, 1}; codon size = 8 bits; individual dimension = 4000 bits (500 codons); population size = 100; function calls = 1,000,000; K-tournament selection (K = 5); elitism percentage = 10%; one-point crossover operator; mutation: bit-negation operator (5%).
DE: real search space [0, 255]; individual dimension = 500; function calls = 1,000,000; population size = 100; crossover rate = 10%; mutation strategy: DE/rand/1.
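For reference, the parameters listed above can be gathered into a single structure such as the following; the key names are ours, while the values are those reported in this section.

```python
# Experimental parameters of the beta and gamma configurations (Section 3).
CONFIG = {
    "encoding": {"t_min_ms": 0.01, "t_max_ms": 9.0},
    "srm": {"tau": 9.0, "theta_mV": 1.0,
            "weight_range": (-999.99, 999.99), "delay_range_ms": (0.01, 19.99),
            "class_targets_ms": [12.0, 15.0, 18.0]},   # extended per extra class
    "ga": {"codon_bits": 8, "codons": 500, "population": 100,
           "function_calls": 1_000_000, "tournament_k": 5,
           "elitism": 0.10, "mutation_rate": 0.05},
    "de": {"genome_length": 500, "codon_range": (0, 255), "population": 100,
           "function_calls": 1_000_000, "crossover_rate": 0.10,
           "strategy": "DE/rand/1"},
}
```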
Tables 3 and 4 show the results obtained by carrying out the aforementioned methodology. The accuracy value ∈ [0, 1] grades the average performance of the configurations when classifying specific datasets, along with the corresponding standard deviation over all experiments. Design accuracy relates to the performance of the best network topology obtained by the evolutionary algorithm, whilst test accuracy indicates the performance of such a network applied to the test subset; the highest values are indicated in boldface.
Table 3.
Accuracy of design and testing on every configuration for Balance Scale, Blood, Breast Cancer, Card, Diabetes, and Fertility datasets.
| Dataset | Configuration | Design accuracy | Test accuracy |
|---|---|---|---|
| Balance Scale | α 1 | 0.7486 ± 0.0525 | 0.7219 ± 0.0629 |
| α 2 | 0.8354 ± 0.0460 | 0.8077 ± 0.0582 | |
| β 1 | 0.7331 ± 0.0718 | 0.6944 ± 0.0752 | |
| β 2 | 0.8363 ± 0.0272 | 0.8078 ± 0.0427 | |
| γ 1 | 0.8528 ± 0.0197 | 0.8346 ± 0.0261 | |
| γ 2 | 0.8960 ± 0.0062 | 0.8647 ± 0.0134 | |
|
| |||
| Blood | α 1 | 0.7711 ± 0.0112 | 0.7622 ± 0.0117 |
| α 2 | 0.8010 ± 0.0135 | 0.7731 ± 0.0168 | |
| β 1 | 0.7747 ± 0.0110 | 0.7607 ± 0.0138 | |
| β 2 | 0.7863 ± 0.0162 | 0.7684 ± 0.0120 | |
| γ 1 | 0.7760 ± 0.0076 | 0.7618 ± 0.0088 | |
| γ 2 | 0.7957 ± 0.0145 | 0.7685 ± 0.0155 | |
|
| |||
| Breast Cancer | α 1 | 0.9494 ± 0.0141 | 0.9418 ± 0.0238 |
| α 2 | 0.9781 ± 0.0073 | 0.9585 ± 0.0077 | |
| β 1 | 0.9474 ± 0.0121 | 0.9405 ± 0.0151 | |
| β 2 | 0.9677 ± 0.0073 | 0.9432 ± 0.0142 | |
| γ 1 | 0.9574 ± 0.0111 | 0.9384 ± 0.0140 | |
| γ 2 | 0.9749 ± 0.0062 | 0.9478 ± 0.0117 | |
|
| |||
| Card | α 1 | 0.8624 ± 0.0099 | 0.8641 ± 0.0140 |
| α 2 | 0.8779 ± 0.0160 | 0.8591 ± 0.0174 | |
| β 1 | 0.8561 ± 0.0502 | 0.8524 ± 0.0527 | |
| β 2 | 0.8814 ± 0.0137 | 0.8561 ± 0.0153 | |
| γ 1 | 0.8740 ± 0.0134 | 0.8596 ± 0.0197 | |
| γ 2 | 0.8879 ± 0.0120 | 0.8535 ± 0.0166 | |
|
| |||
| Diabetes | α 1 | 0.7506 ± 0.0231 | 0.7457 ± 0.0207 |
| α 2 | 0.7843 ± 0.0156 | 0.7476 ± 0.0224 | |
| β 1 | 0.7570 ± 0.0151 | 0.7490 ± 0.0215 | |
| β 2 | 0.7780 ± 0.0143 | 0.7477 ± 0.0126 | |
| γ 1 | 0.7810 ± 0.0153 | 0.7370 ± 0.0152 | |
| γ 2 | 0.7902 ± 0.0134 | 0.7389 ± 0.0205 | |
|
| |||
| Fertility | α 1 | 0.8988 ± 0.0256 | 0.8218 ± 0.0616 |
| α 2 | 0.9297 ± 0.0204 | 0.8170 ± 0.0551 | |
| β 1 | 0.9255 ± 0.0243 | 0.8309 ± 0.0459 | |
| β 2 | 0.9182 ± 0.0222 | 0.8279 ± 0.0467 | |
| γ 1 | 0.9455 ± 0.0199 | 0.8479 ± 0.0462 | |
| γ 2 | 0.9370 ± 0.0131 | 0.8236 ± 0.0484 | |
Table 4.
Accuracy of design and testing on every configuration for Glass, Ionosphere, Iris Plant, Liver, Parkinson, and Wine datasets.
| Dataset | Configuration | Design accuracy | Test accuracy |
|---|---|---|---|
| Glass | α 1 | 0.2549 ± 0.1345 | 0.2404 ± 0.1433 |
| α 2 | 0.6035 ± 0.0448 | 0.5374 ± 0.0688 | |
| β 1 | 0.4288 ± 0.0673 | 0.4002 ± 0.0590 | |
| β 2 | 0.6641 ± 0.0255 | 0.5947 ± 0.0436 | |
| γ 1 | 0.4895 ± 0.0574 | 0.4351 ± 0.0476 | |
| γ 2 | 0.7126 ± 0.0190 | 0.6186 ± 0.0413 | |
|
| |||
| Ionosphere | α 1 | 0.8549 ± 0.0374 | 0.8374 ± 0.0295 |
| α 2 | 0.9158 ± 0.0217 | 0.8669 ± 0.0267 | |
| β 1 | 0.8543 ± 0.0537 | 0.8137 ± 0.0708 | |
| β 2 | 0.9190 ± 0.0284 | 0.8724 ± 0.0241 | |
| γ 1 | 0.9351 ± 0.0182 | 0.8907 ± 0.0240 | |
| γ 2 | 0.9616 ± 0.0113 | 0.9015 ± 0.0201 | |
|
| |||
| Iris Plant | α 1 | 0.8857 ± 0.1111 | 0.8663 ± 0.1269 |
| α 2 | 0.9653 ± 0.0161 | 0.9386 ± 0.0210 | |
| β 1 | 0.9733 ± 0.0157 | 0.9382 ± 0.0217 | |
| β 2 | 0.9859 ± 0.0109 | 0.9362 ± 0.0163 | |
| γ 1 | 0.9794 ± 0.0123 | 0.9325 ± 0.0164 | |
| γ 2 | 0.9923 ± 0.0074 | 0.9358 ± 0.0261 | |
|
| |||
| Liver | α 1 | 0.6820 ± 0.0406 | 0.6462 ± 0.0536 |
| α 2 | 0.7352 ± 0.0245 | 0.6660 ± 0.0356 | |
| β 1 | 0.6834 ± 0.0481 | 0.6304 ± 0.0476 | |
| β 2 | 0.7461 ± 0.0224 | 0.6632 ± 0.0394 | |
| γ 1 | 0.7472 ± 0.0183 | 0.6723 ± 0.0302 | |
| γ 2 | 0.7636 ± 0.0196 | 0.6612 ± 0.0295 | |
|
| |||
| Parkinson | α 1 | 0.8719 ± 0.0395 | 0.8281 ± 0.0568 |
| α 2 | 0.9080 ± 0.0159 | 0.8596 ± 0.0285 | |
| β 1 | 0.8563 ± 0.0264 | 0.8033 ± 0.0519 | |
| β 2 | 0.8953 ± 0.0205 | 0.8503 ± 0.0353 | |
| γ 1 | 0.9025 ± 0.0266 | 0.8380 ± 0.0387 | |
| γ 2 | 0.9200 ± 0.0172 | 0.8494 ± 0.0377 | |
|
| |||
| Wine | α 1 | 0.6881 ± 0.1549 | 0.6415 ± 0.1551 |
| α 2 | 0.9063 ± 0.0441 | 0.8384 ± 0.0606 | |
| β 1 | 0.7375 ± 0.1126 | 0.6816 ± 0.1098 | |
| β 2 | 0.8895 ± 0.0641 | 0.7855 ± 0.0686 | |
| γ 1 | 0.9318 ± 0.0285 | 0.8620 ± 0.0491 | |
| γ 2 | 0.9638 ± 0.0164 | 0.8684 ± 0.0458 | |
Tables 5 and 6 show some of the characteristics of the generated topologies, focusing on the average number of input vector features actually employed by the networks and the corresponding rate with respect to the total size of the original input vector, as well as the average numbers of hidden units and synapses present in the generated networks. In the Supplementary Materials, some examples of SNN topologies with the best obtained results are shown; each example contains the benchmark dataset, the used configuration, the accuracies of the design and test phases, the generated word, and the network topology.
Table 5.
Topology characteristics for every dataset on configurations α 1, β 1, and γ 1.
| Configuration | Dataset | Average number of features employed | Rate of used features | Average number of hidden units | Average number of synapses |
|---|---|---|---|---|---|
| α 1 | Balance Scale | 2.97 ± 0.67 | 0.74 | 1.58 ± 0.85 | 9.58 ± 2.89 |
| Blood | 2.30 ± 0.76 | 0.58 | 1.85 ± 0.99 | 7.64 ± 2.24 | |
| Breast Cancer | 2.52 ± 0.99 | 0.28 | 2.03 ± 1.09 | 8.79 ± 2.04 | |
| Card | 2.52 ± 0.93 | 0.17 | 2.09 ± 0.93 | 8.67 ± 2.22 | |
| Diabetes | 2.09 ± 0.90 | 0.26 | 1.97 ± 1.11 | 7.09 ± 1.91 | |
| Fertility | 3.03 ± 0.97 | 0.34 | 2.30 ± 1.29 | 9.24 ± 2.87 | |
| Glass | 2.21 ± 1.32 | 0.25 | 1.48 ± 0.93 | 11.73 ± 3.77 | |
| Ionosphere | 2.24 ± 1.05 | 0.07 | 2.21 ± 1.30 | 7.61 ± 2.81 | |
| Iris Plant | 1.30 ± 0.52 | 0.33 | 1.70 ± 1.00 | 8.18 ± 2.35 | |
| Liver | 2.48 ± 0.70 | 0.41 | 1.88 ± 0.84 | 6.64 ± 1.79 | |
| Parkinson | 2.27 ± 1.38 | 0.10 | 1.85 ± 1.50 | 8.00 ± 4.65 | |
| Wine | 2.48 ± 1.23 | 0.19 | 1.97 ± 1.75 | 9.73 ± 4.48 | |
|
| |||||
| β 1 | Balance Scale | 3.52 ± 0.56 | 0.88 | 4.09 ± 1.58 | 12.36 ± 4.32 |
| Blood | 3.12 ± 0.69 | 0.78 | 4.76 ± 2.55 | 12.03 ± 5.77 | |
| Breast Cancer | 6.12 ± 1.47 | 0.68 | 4.03 ± 1.85 | 14.24 ± 5.46 | |
| Card | 5.79 ± 1.95 | 0.39 | 4.39 ± 2.09 | 12.85 ± 5.47 | |
| Diabetes | 3.39 ± 0.95 | 0.42 | 3.55 ± 1.71 | 09.36 ± 4.00 | |
| Fertility | 5.58 ± 1.74 | 0.62 | 4.09 ± 2.22 | 12.39 ± 6.02 | |
| Glass | 5.70 ± 1.62 | 0.63 | 3.03 ± 1.82 | 11.88 ± 6.30 | |
| Ionosphere | 5.52 ± 2.24 | 0.17 | 3.55 ± 2.05 | 12.15 ± 6.68 | |
| Iris Plant | 3.39 ± 0.89 | 0.85 | 4.73 ± 2.60 | 13.27 ± 6.93 | |
| Liver | 3.94 ± 1.07 | 0.66 | 4.06 ± 1.91 | 11.03 ± 5.32 | |
| Parkinson | 5.12 ± 1.77 | 0.23 | 4.36 ± 2.45 | 11.48 ± 5.66 | |
| Wine | 6.15 ± 1.88 | 0.47 | 3.82 ± 1.49 | 13.03 ± 4.93 | |
|
| |||||
| γ 1 | Balance Scale | 4.00 ± 0.00 | 1.00 | 4.03 ± 1.64 | 12.79 ± 3.75 |
| Blood | 3.79 ± 0.48 | 0.95 | 4.70 ± 1.71 | 13.42 ± 4.23 | |
| Breast Cancer | 6.70 ± 1.38 | 0.74 | 5.39 ± 2.39 | 16.30 ± 5.93 | |
| Card | 8.76 ± 2.85 | 0.58 | 5.12 ± 2.04 | 18.03 ± 8.12 | |
| Diabetes | 6.24 ± 1.33 | 0.78 | 5.67 ± 3.19 | 17.76 ± 9.36 | |
| Fertility | 7.24 ± 1.18 | 0.80 | 4.79 ± 2.42 | 16.79 ± 6.20 | |
| Glass | 6.79 ± 1.32 | 0.75 | 4.97 ± 2.18 | 16.24 ± 5.85 | |
| Ionosphere | 11.64 ± 3.31 | 0.35 | 4.45 ± 1.86 | 18.03 ± 6.07 | |
| Iris Plant | 3.91 ± 0.29 | 0.98 | 5.79 ± 2.79 | 15.73 ± 6.89 | |
| Liver | 5.03 ± 1.03 | 0.84 | 4.64 ± 2.36 | 14.39 ± 6.49 | |
| Parkinson | 8.91 ± 3.41 | 0.40 | 4.18 ± 2.37 | 15.06 ± 7.58 | |
| Wine | 8.64 ± 1.92 | 0.66 | 4.97 ± 1.62 | 16.97 ± 5.33 | |
Table 6.
Topology characteristics for every dataset on configurations α 2, β 2, and γ 2.
| Configuration | Dataset | Average number of features employed | Rate of used features | Average number of hidden units | Average number of synapses |
|---|---|---|---|---|---|
| α 2 | Balance Scale | 3.73 ± 0.45 | 0.93 | 2.06 ± 1.23 | 11.48 ± 3.47 |
| Blood | 3.33 ± 0.77 | 0.83 | 2.52 ± 1.28 | 11.64 ± 3.95 | |
| Breast Cancer | 3.85 ± 1.10 | 0.43 | 2.06 ± 1.04 | 10.82 ± 3.93 | |
| Card | 5.39 ± 2.20 | 0.36 | 2.76 ± 1.54 | 14.42 ± 6.27 | |
| Diabetes | 3.21 ± 0.95 | 0.40 | 2.33 ± 1.22 | 10.64 ± 3.56 | |
| Fertility | 5.30 ± 1.59 | 0.59 | 2.94 ± 2.01 | 15.39 ± 7.72 | |
| Glass | 2.79 ± 1.22 | 0.31 | 1.88 ± 0.91 | 13.91 ± 3.70 | |
| Ionosphere | 4.45 ± 1.67 | 0.13 | 3.55 ± 1.42 | 14.18 ± 4.36 | |
| Iris Plant | 1.76 ± 1.05 | 0.44 | 2.00 ± 1.67 | 11.27 ± 6.49 | |
| Liver | 3.33 ± 0.94 | 0.56 | 1.67 ± 0.84 | 09.18 ± 3.02 | |
| Parkinson | 3.67 ± 2.22 | 0.17 | 1.91 ± 1.03 | 10.09 ± 4.43 | |
| Wine | 3.15 ± 1.52 | 0.24 | 2.67 ± 1.49 | 12.30 ± 5.26 | |
|
| |||||
| β 2 | Balance Scale | 3.88 ± 0.33 | 0.97 | 3.39 ± 1.92 | 09.88 ± 4.16 |
| Blood | 3.21 ± 0.69 | 0.80 | 4.21 ± 3.50 | 11.55 ± 9.49 | |
| Breast Cancer | 5.76 ± 1.63 | 0.64 | 3.97 ± 2.41 | 11.94 ± 6.09 | |
| Card | 6.94 ± 2.24 | 0.46 | 4.55 ± 2.85 | 14.00 ± 7.21 | |
| Diabetes | 4.06 ± 1.23 | 0.51 | 3.91 ± 2.11 | 10.85 ± 5.65 | |
| Fertility | 5.30 ± 2.11 | 0.59 | 3.39 ± 2.01 | 10.36 ± 5.64 | |
| Glass | 5.18 ± 1.45 | 0.58 | 3.73 ± 1.66 | 11.21 ± 4.44 | |
| Ionosphere | 6.48 ± 2.27 | 0.20 | 4.30 ± 2.67 | 12.94 ± 6.33 | |
| Iris Plant | 3.24 ± 0.74 | 0.81 | 4.09 ± 2.50 | 10.61 ± 6.30 | |
| Liver | 4.27 ± 1.21 | 0.71 | 3.52 ± 1.78 | 10.52 ± 4.39 | |
| Parkinson | 5.39 ± 2.44 | 0.25 | 3.09 ± 1.99 | 09.64 ± 5.51 | |
| Wine | 5.85 ± 1.73 | 0.45 | 4.24 ± 2.22 | 12.85 ± 5.21 | |
|
| |||||
| γ 2 | Balance Scale | 4.00 ± 0.00 | 1.00 | 3.15 ± 2.12 | 10.55 ± 5.78 |
| Blood | 3.67 ± 0.59 | 0.92 | 4.58 ± 2.56 | 12.79 ± 5.86 | |
| Breast Cancer | 7.30 ± 1.27 | 0.81 | 4.61 ± 1.82 | 15.85 ± 5.21 | |
| Card | 8.58 ± 2.45 | 0.57 | 4.00 ± 2.47 | 15.09 ± 7.10 | |
| Diabetes | 6.06 ± 1.41 | 0.76 | 4.24 ± 1.78 | 13.76 ± 4.96 | |
| Fertility | 6.73 ± 1.33 | 0.75 | 4.06 ± 2.73 | 14.21 ± 6.65 | |
| Glass | 6.70 ± 1.17 | 0.74 | 4.91 ± 1.60 | 15.06 ± 4.26 | |
| Ionosphere | 11.64 ± 4.14 | 0.35 | 3.88 ± 2.20 | 16.88 ± 7.10 | |
| Iris Plant | 3.61 ± 0.69 | 0.90 | 3.82 ± 2.18 | 10.97 ± 5.42 | |
| Liver | 4.91 ± 0.90 | 0.82 | 3.21 ± 1.53 | 10.45 ± 4.11 | |
| Parkinson | 7.97 ± 2.47 | 0.36 | 3.39 ± 1.50 | 12.36 ± 4.20 | |
| Wine | 8.15 ± 1.96 | 0.63 | 4.70 ± 1.93 | 16.12 ± 5.57 | |
4. Comparative Statistical Analysis
As detailed in the previous section, data samples were obtained from thirty-three independent experiments for each configuration on every dataset. Thereupon, several statistical tests [72] were applied to these data. First of all, a Shapiro–Wilk test [73] was applied to determine the normality of the samples. This test showed that the data can indeed be modelled under normal distributions. Further analysis was divided into three tests: one applied to the configurations using the squared error as the fitness function, one applied to the configurations using the accuracy error as the fitness function, and one applied to all configurations.
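For reproducibility, the sequence of tests can be carried out with standard statistical packages; the sketch below uses SciPy and statsmodels (an assumption, since the paper does not specify its tooling) on a generic table of per-experiment accuracies.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def analyze(df):
    """df: one row per experiment with columns 'accuracy', 'config', 'dataset'."""
    # 1) Shapiro-Wilk normality test per configuration.
    for cfg, grp in df.groupby("config"):
        w, p = stats.shapiro(grp["accuracy"])
        print(f"Shapiro-Wilk {cfg}: p = {p:.3f}")
    # 2) Two-way ANOVA with configuration and dataset as factors.
    model = smf.ols("accuracy ~ C(config) + C(dataset)", data=df).fit()
    print(anova_lm(model, typ=2))
    # 3) Post hoc comparison of configurations: Tukey HSD.
    print(pairwise_tukeyhsd(df["accuracy"], df["config"], alpha=0.05))

# Synthetic illustration: 33 runs for each of three configurations on two datasets.
rng = np.random.default_rng(0)
rows = [{"config": c, "dataset": d, "accuracy": rng.normal(0.80 + 0.02 * i, 0.02)}
        for i, c in enumerate(["alpha1", "beta1", "gamma1"])
        for d in ["iris", "wine"] for _ in range(33)]
analyze(pd.DataFrame(rows))
```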
4.1. Test of Designs Driven by Squared Error Fitness Function
In order to verify the statistical significance of the results, analysis of variance (ANOVA [74]) tests were applied, firstly, to determine whether implementing different methodologies to develop weighted network topologies impacts the classification accuracy and, secondly, to identify which of these methodologies offers the best performance. Table 7 shows the results obtained by the two-way ANOVA test, taking both configurations and datasets as independent variables.
Table 7.
Two-way ANOVA F, pairwise t-test, and Tukey HSD test with Bonferroni correction for squared error configurations.
| ANOVA | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Configuration | 2 | 0.7109 | 0.35547 | 73.217 | <2.2e-16 |
| Dataset | 11 | 25.3929 | 2.30845 | 475.4710 | <2.2e-16 |
| Residuals | 1174 | 5.6999 | 0.00486 | ||
|
| |||||
| t-Tests | α 1 | β 1 | |||
|
| |||||
| β 1 | 0.5936 | — | |||
| γ 1 | 1.9e-6 | 0.0006 | |||
|
| |||||
| Tukey HSD | diff | lwr | upr | p adj | |
|
| |||||
| β 1 vs. γ 1 | 0.014 | 0.0032 | 0.0544 | 0.00 | |
| α 1 vs. β 1 | 0.014 | 0.0030 | 0.026 | 0.0078 | |
| α 1 vs. γ 1 | 0.057 | 0.0460 | 0.0693 | 0.00 | |
ANOVA's null hypothesis (H0) states that the observed samples come from one unique normal distribution. As the p values (Pr(>F) in Table 7) are smaller than the significance level of 0.05, H0 is rejected, i.e., the samples are not statistically similar. In other words, it can be concluded that the configurations come from different distributions. This test provides relevant statistical evidence to support the conclusion that changing the methodology used to generate weighted topologies influences the classification accuracy of the networks.
Pairwise t-tests and Tukey HSD [75] tests were applied next. As in the ANOVA test, the null hypothesis in both tests assumes that the samples come from a single distribution. Table 7 shows the t-test p values with a Bonferroni correction. Based on these results and a significance level of 0.05, it can be inferred that the γ 1 configuration is statistically different from the β 1 and α 1 configurations. The Tukey HSD test results, also in Table 7, uphold that the γ 1 configuration is significantly different from the other configurations. Accordingly, the higher performance of the γ 1 configuration is noticeable among the three left-most configurations shown in, e.g., the Fertility (Figure 5), Glass (Figure 6), and Ionosphere (Figure 7) performance box plots.
Figure 5.

Box plots of the performance of all configurations on the Fertility dataset.
Figure 6.

Box plots of the performance of all configurations on the Glass dataset.
Figure 7.

Box plots of the performance of all configurations on the Ionosphere dataset.
4.2. Test of Designs Driven by Accuracy Error Fitness Function
Statistical analysis for the configurations driven by the accuracy error fitness function was performed with the same approach as in the previous subsection; Table 8 shows the ANOVA, t-test, and Tukey HSD tests applied to these configurations. In this case, the pairwise t-test shows no statistically significant difference between the α 2 and γ 2 configurations (the null hypothesis H0 cannot be rejected for this pair); however, the α 2 configuration requires higher computational power to carry out the design task due to its search engine and its respective operators (crossover and mutation). These issues are not present for the γ 2 configuration; besides, it achieves similar accuracy results with lower dispersion, as can be noticed in the right-most configurations shown in, e.g., the Fertility (Figure 5), Glass (Figure 6), and Ionosphere (Figure 7) performance box plots, and this behavior was consistently observed for all benchmark datasets. The Tukey HSD test shows that the γ 2 configuration is statistically different from both the α 2 and β 2 configurations; this, along with the behavior observed in the box plots, supports γ 2 as the best-performing configuration.
Table 8.
Two-way ANOVA F, pairwise t-test, and Tukey HSD test with Bonferroni correction for accuracy error configurations.
| ANOVA | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Configuration | 2 | 0.0769 | 0.0384 | 26.319 | 6.58e − 12 |
| Dataset | 11 | 12.3300 | 1.1210 | 767.4670 | <2.2e − 16 |
| Residuals | 1174 | 1.7160 | 0.0014 | ||
|
| |||||
| t-Tests | α 2 | β 2 | |||
|
| |||||
| β 2 | 1.0000 | — | |||
| γ2 | 0.1030 | 1.3e − 3 | |||
|
| |||||
| Tukey HSD | diff | lwr | upr | p adj | |
|
| |||||
| β 2 vs. γ 2 | 0.0170 | 0.0110 | 0.0240 | 0.0000 | |
| α 2 vs. β 2 | −0.0010 | −0.0070 | 0.0050 | 0.8830 | |
| α 2 vs. γ 2 | 0.0100 | 0.0100 | 0.0220 | 0.0001 | |
4.3. Test of All Configurations
An omnibus test was applied to the entire set of experiments, considering both the configurations and the fitness functions as independent variables. A two-way ANOVA test was applied to determine whether varying both observed variables influences the accuracy performance. Table 9 contains the results, which provide statistical evidence to reject the null hypothesis H0; in other words, the accuracy performance is affected by both variables. The p values lower than the significance level of 0.05 indicate that changing the fitness function (squared error or accuracy error) and the configuration does indeed affect the accuracy obtained by the generated topology.
Table 9.
Two-way ANOVA F and pairwise t-test with Bonferroni correction tests for all configurations.
| ANOVA | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Fitness function | 1 | 1.1220 | 1.1200 | 58.9180 | 2.36e − 14 |
| Configuration | 4 | 0.7880 | 0.1960 | 10.3400 | 2.663e − 08 |
| Residuals | 2370 | 45.1490 | 0.0190 | ||
|
| |||||
| t-Tests | α 1 | α 2 | β 1 | β 2 | γ 1 |
|
| |||||
| α 2 | 0.0001 | — | — | — | — |
| β 1 | 0.0100 | 0.00 | — | — | — |
| β 2 | 0.0001 | 0.9900 | 0.0010 | — | — |
| γ 1 | 0.0001 | 0.8900 | 0.0010 | 0.9680 | — |
| γ 2 | 0.0000 | 0.0040 | 0.0000 | 0.0010 | 4.8e − 5 |
Finally, a pairwise t-test was applied to discern whether, given two configurations, their performances are statistically similar. Considering the p values in Table 9 and a significance level of 0.05, it can be inferred, with statistical confidence, that the γ 2 configuration generally outperforms the other configurations.
5. Conclusions and Future Work
This paper presents a GE-based methodology to design partially connected ANNs for solving supervised classification problems; interesting characteristics of the methodology are that it provides weighted topologies, which allows an explicit training stage to be avoided, and that those topologies exhibit partial connectivity between the input and hidden layers, which may avoid redundancies and reduce the dimensionality of the input feature vectors. The proposed methodology (γ 2) evolved from progressive improvements made to a base methodology (α 1), which uses GE with a GA as the search engine and the squared error as the fitness function; improvements were made by changing the neuron model, which allowed SNNs to be generated (β 1) instead of second-generation ANNs, and by changing the search engine to DE (γ 1) instead of GA. All the aforementioned configurations were also adapted to use another fitness function based on the accuracy error of the generated ANNs, denoted α 2, β 2, and γ 2.
In order to validate the achieved improvements, several statistical tests were applied. Each configuration was tested on twelve well-known benchmark datasets of supervised classification problems by performing 33 experiments for each dataset. Three types of statistical analysis were performed. The first was applied to the α 1, β 1, and γ 1 configurations, which use the squared error as the fitness function; in this analysis, the γ 1 configuration was shown to outperform the other configurations based on the statistical tests and the box plots. The second analysis focused on the α 2, β 2, and γ 2 configurations, which use the accuracy error as the fitness function; based on the Tukey HSD test, this analysis yielded a conclusion similar to that of the first analysis, but with respect to the γ 2 configuration. The last analysis compared all configurations and showed statistical evidence to support that γ 2 is the best configuration, with competitive performance and lower dispersion in its designs.
Focusing on the topology designs and performance results, the evolutionary designs led to solution topologies with fewer connections than equivalent fully connected topologies, hence reducing the complexity of the networks while achieving good classification performance. The topology simplification provided good network designs (i.e., the design accuracy was competitive), but better generalization to unseen data in the test phase remains desirable; some particular cases exhibited lower test accuracies, evidencing an opportunity for improvement.
Due to the flexibility of the context-free grammars employed in GE, other aspects of neural network topologies can be considered to cope with the detected issues while preserving the accomplished enhancements. The design process may consider other traits, e.g., the selection of the neural model and/or the search engine, the specification of the model parameters, or even increasing the number of hidden layers to design SNNs with deep learning topologies. Moreover, other types of topologies with structures different from layered networks can be explored, such as those of reservoir computing or central pattern generators. Furthermore, other kinds of grammar-based genetic programming algorithms can be used to add semantics to the design process, such as Christiansen grammar evolution [67].
Finally, contemplating the fitness function as another relevant aspect for producing enhanced designs, further considerations can be made: minimizing the number of processing units in the hidden layer, or considering other evaluation measurements to comply with other kinds of problems; these criteria may be combined into a weighted mono-objective fitness function or handled with multi-objective algorithms such as the nondominated sorting genetic algorithm (NSGA) [76].
Acknowledgments
The authors wish to thank the National Technology of México and University of Guanajuato. A. Espinal wishes to thank SEP-PRODEP for the support provided to the Project 511-6/17-8074 “Diseño y entrenamiento de Redes Neuronales Artificiales mediante Algoritmos Evolutivos.” G. López-Vázquez and A. Rojas-Domínguez thank the National Council of Science and Technology of México (CONACYT) for the support provided by means of the Scholarship for Postgraduate Studies (701071) and research grant CÁTEDRAS-2598, respectively. This work was supported by the CONACYT Project FC2016-1961 ”Neurociencia Computacional: de la teoría al desarrollo de sistemas neuromórficos”.
Data Availability
The supervised classification dataset benchmarks used to support the findings of this study have been taken from the UCI Machine Learning Repository of the University of California, Irvine (http://archive.ics.uci.edu/ml/datasets.html).
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Supplementary Materials
Examples of the best results obtained for SNNs are shown; each example contains the benchmark dataset, used configuration, accuracies of design and test phases, the generated word, and the network topology.
References
- 1.Markou M., Singh S. Novelty detection: a review-part 2: Signal processing. 2003;83(12):2499–2521. doi: 10.1016/j.sigpro.2003.07.019. [DOI] [Google Scholar]
- 2.Zhang G. P. Neural networks for classification: a survey. IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews) 2000;30(4):451–462. doi: 10.1109/5326.897072. [DOI] [Google Scholar]
- 3.Ijspeert A. J. Central pattern generators for locomotion control in animals and robots: a review. Neural Networks. 2008;21(4):642–653. doi: 10.1016/j.neunet.2008.03.014. [DOI] [PubMed] [Google Scholar]
- 4.Yu J., Tan M., Chen J., Zhang J. A survey on CPG-inspired control models and system implementation. IEEE Transactions OnNeural Networks and Learning Systems. 2014;25(3):441–456. doi: 10.1109/tnnls.2013.2280596. [DOI] [PubMed] [Google Scholar]
- 5.Elfwing S., Uchibe E., Doya K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks. 2018;107:3–11. doi: 10.1016/j.neunet.2017.12.012. [DOI] [PubMed] [Google Scholar]
- 6.Scarselli F., Chung Tsoi A. Universal approximation using feedforward neural networks: a survey of some existing methods, and some new results. Neural networks. 1998;11(1):15–37. doi: 10.1016/s0893-6080(97)00097-x. [DOI] [PubMed] [Google Scholar]
- 7.Judd J. S. Neural Network Design and The Complexity of Learning. Neural Network Modeling and Connectionism Series. Cambridge, MA, USA: MIT Press; 1990. [Google Scholar]
- 8.Maass W. Networks of spiking neurons: the third generation of neural network models. Neural Networks. 1997;10(9):1659–1671. doi: 10.1016/s0893-6080(97)00011-7. [DOI] [Google Scholar]
- 9.McCulloch W. S., Pitts W. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics. Dec. 1943;5(4):115–133. doi: 10.1007/bf02478259. [DOI] [PubMed] [Google Scholar]
- 10.Rosenblatt F. The Perceptron, A Perceiving And Recognizing Automaton (Project PARA) Buffalo, NY, USA: Cornell Aeronautical Laboratory; 1957. [Google Scholar]
- 11.Rumelhart D. E., Hinton G. E., Williams R. J. Learning representations by back-propagating errors. nature. 1986;323(6088):533–536. doi: 10.1038/323533a0. [DOI] [Google Scholar]
- 12.Gerstner W., Kistler W. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge, UK: Cambridge University Press; 2002. [Google Scholar]
- 13.Lapicque L. Recherches quantitatives sur l’excitation electrique des nerfs traitee comme une polarization. Journal de Physiologie et de Pathologie Generalej. 1907;9:620–635. [Google Scholar]
- 14.Hodgkin A. L., Huxley A. F. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal Of Physiology. 1952;117(4):500–544. doi: 10.1113/jphysiol.1952.sp004764. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Amaldi E., Mayoraz E., de Werra D. A review of combinatorial problems arising in feedforward neural network design. Discrete Applied Mathematics. 1994;52(2):111–138. doi: 10.1016/0166-218x(92)00184-n. [DOI] [Google Scholar]
- 16.Blum A. L., Rivest R. L. Training a 3-node neural network is NP-complete. Neural Networks. 1992;5(1):117–127. doi: 10.1016/s0893-6080(05)80010-3. [DOI] [Google Scholar]
- 17.DasGupta B., Siegelmann H. T., Sontag E. Theoretical Advances in Neural Computation and Learning. Boston, MA, USA: Springer; 1994. On the intractability of loading neural networks; pp. 357–389. [DOI] [Google Scholar]
- 18.Judd S. On the complexity of loading shallow neural networks. Journal of Complexity. 1988;4(3):177–192. doi: 10.1016/0885-064x(88)90019-2. [DOI] [Google Scholar]
- 19.Maass W., Schmitt M. On the complexity of learning for spiking neurons with temporal coding. Information and Computation. 1999;153(1):26–46. doi: 10.1006/inco.1999.2806. [DOI] [Google Scholar]
- 20.Elizondo D., Fiesler E. A survey of partially connected neural networks. International Journal of Neural Systems. 1997;8(5-6):535–558. doi: 10.1142/s0129065797000513. [DOI] [PubMed] [Google Scholar]
- 21.Ding S., Li H., Su C., Yu J., Jin F. Evolutionary artificial neural networks: a review. Artificial Intelligence Review. 2013;39(3):251–260. doi: 10.1007/s10462-011-9270-6. [DOI] [Google Scholar]
- 22.Floreano D., Dürr P., Mattiussi C. Neuroevolution: from architectures to learning. Evolutionary Intelligence. 2008;1(1):47–62. doi: 10.1007/s12065-007-0002-4. [DOI] [Google Scholar]
- 23.Ojha V. K., Abraham A., Snášel V. Metaheuristic design of feedforward neural networks: a review of two decades of research. Engineering Applications of Artificial Intelligence. 2017;60:97–116. doi: 10.1016/j.engappai.2017.01.013. [DOI] [Google Scholar]
- 24.Yao X. Evolving artificial neural networks. Proceedings of the IEEE. 1999;87(9):1423–1447. doi: 10.1109/5.784219. [DOI] [Google Scholar]
- 25.Tayefeh M., Taghiyareh F., Forouzideh N., Caro L. Evolving artificial neural network structure using grammar encoding and colonial competitive algorithm. Neural Computing and Applications. 2013;22(1):1–16. [Google Scholar]
- 26.Espinal A., Sotelo-Figueroa M., Soria-Alcaraz J. A., et al. Comparison of PSO and DE for training neural networks. Proceedings of 2013 12th Mexican International Conference on Artificial Intelligence; November 2013; Mexico City, Mexico. pp. 83–87. [Google Scholar]
- 27.Morales A. K. Non-standard norms in genetically trained neural networks. Proceedings of 2000 IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks; May 2000; San Antonio, TX, USA. IEEE; pp. 43–51. [Google Scholar]
- 28.Morales A. K. Training neural networks using non-standard norms–preliminary results. Proceedings of Mexican International Conference on Artificial Intelligence; April 2000; Acapulco, Mexico. pp. 350–364. [Google Scholar]
- 29.Alba E., Aldana J., Troya J. M. Full automatic ann design: a genetic approach. Proceedings of International Workshop on Artificial Neural Networks; June 1993; Sitges, Spain. Springer; pp. 399–404. [Google Scholar]
- 30.De Mingo Lopez L. F., Gomez Blas N., Arteta A. The optimal combination: grammatical swarm, particle swarm optimization and neural networks. Journal of Computational Science. 2012;3(1-2):46–55. doi: 10.1016/j.jocs.2011.12.005. [DOI] [Google Scholar]
- 31.Kitano H. Designing neural networks using genetic algorithms with graph generation system. Complex systems. 1990;4(4):461–476. [Google Scholar]
- 32.Ahmadizar F., Soltanian K., AkhlaghianTab F., Tsoulos I. Artificial neural network development by means of a novel combination of grammatical evolution and genetic algorithm. Engineering Applications of Artificial Intelligence. 2015;39:1–13. doi: 10.1016/j.engappai.2014.11.003. [DOI] [Google Scholar]
- 33.Garro B. A., Sossa H., Vazquez R. A. Design of artificial neural networks using a modified particle swarm optimization algorithm. Proceedings of The 2009 International Joint Conference On Neural Networks, IJCNN’09; June 2009; Atlanta, GA, USA. IEEE Press; pp. 2363–2370. [Google Scholar]
- 34.Garro B. A., Vázquez R. A. Designing artificial neural networks using particle swarm optimization algorithms. Computational intelligence and neuroscience. 2015;2015:20. doi: 10.1155/2015/369298.369298 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Quiroz-Ramírez O., Espinal A., Ornelas-Rodríguez M., et al. Partially-connected artificial neural networks developed by grammatical evolution for pattern recognition problems. Fuzzy Logic Augmentation of Neural and Optimization Algorithms: Theoretical Aspects and Real Applications. 2018;749:99–112. doi: 10.1007/978-3-319-71008-2_9. [DOI] [Google Scholar]
- 36.Rivero D., Dorado J., Rabuñal J., Pazos A. Generation and simplification of artificial neural networks by means of genetic programming. Neurocomputing. 2010;73(16–18):3200–3223. doi: 10.1016/j.neucom.2010.05.010. [DOI] [Google Scholar]
- 37.Sheng W., Shan P., Mao J., Zheng Y., Chen S., Wang Z. An adaptive memetic algorithm with rank-based mutation for artificial neural network architecture optimization. IEEE Access. 2017;5:18895–18908. doi: 10.1109/access.2017.2752901. [DOI] [Google Scholar]
- 38.Tsoulos I., Gavrilis D., Glavas E. Neural network construction and training using grammatical evolution. Neurocomputing. 2008;72(1–3):269–277. doi: 10.1016/j.neucom.2008.01.017. [DOI] [Google Scholar]
- 39.Fontanari J., Meir R. Evolving a learning algorithm for the binary perceptron. Network: Computation in Neural Systems. 1991;2(4):353–359. doi: 10.1088/0954-898x/2/4/002. [DOI] [Google Scholar]
- 40.Kim H. B., Jung S. H., Kim T. G., Park K. H. Fast learning method for back-propagation neural network by evolutionary adaptation of learning rates. Neurocomputing. 1996;11(1):101–106. doi: 10.1016/0925-2312(96)00009-4. [DOI] [Google Scholar]
- 41.Ghosh-Dastidar S., Adeli H. Spiking neural networks. International Journal of Neural Systems. 2009;19(04):295–308. doi: 10.1142/s0129065709002002. [DOI] [PubMed] [Google Scholar]
- 42.Bohte S. M., Kok J. N., La Poutré H. Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing. 2002;48(1–4):17–37. doi: 10.1016/s0925-2312(01)00658-0. [DOI] [Google Scholar]
- 43.Belatreche A. Biologically Inspired Neural Networks: Models, Learning, and Applications. Saarbrücken, Germany: VDM Verlag; 2010. [Google Scholar]
- 44.Cachón A., Vázquez R. A. Tuning the parameters of an integrate and fire neuron via a genetic algorithm for solving pattern recognition problems. Neurocomputing. 2015;148:187–197. doi: 10.1016/j.neucom.2012.11.059. [DOI] [Google Scholar]
- 45.Vazquez R. A. Advances in Artificial Intelligence–IBERAMIA 2010. Lecture Notes in Computer Science. Vol. 6433. Berlin, Germany: Springer; 2010. Pattern recognition using spiking neurons and firing rates. [DOI] [Google Scholar]
- 46.Vazquez R. A. Training spiking neural models using cuckoo search algorithm. Proceedings of IEEE Congress on Evolutionary Computation; June 2011; New Orleans, LA, USA. pp. 679–686. [Google Scholar]
- 47.Vazquez R. A., Cachon A. Integrate and fire neurons and their application in pattern recognition. Proceedings of 7th International Conference on Electrical Engineering Computing Science and Automatic Control; September 2010; Tuxtla Gutierrez, Mexico. pp. 424–428. [Google Scholar]
- 48.Vázquez R. A., Garro B. A. Advances in Swarm Intelligence. Lecture Notes in Computer Science. Berlin, Germany: Springer; 2011. Training spiking neurons by means of particle swarm optimization; pp. 242–249. [DOI] [Google Scholar]
- 49.Izhikevich E. M. Simple model of spiking neurons. IEEE Transactions on neural networks. 2003;14(6):1569–1572. doi: 10.1109/tnn.2003.820440. [DOI] [PubMed] [Google Scholar]
- 50.Storn R., Price K. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of global optimization. 1997;11(4):341–359. doi: 10.1023/a:1008202821328. [DOI] [Google Scholar]
- 51.Kennedy J., Eberhart R. C. Particle swarm optimization. Proceediongs of IEEE International Conference on Neural Networks; November-December 1995; Perth, Australia. pp. 1942–1948. [Google Scholar]
- 52.Gandomi A. H., Yang X.-S., Alavi A. H. Cuckoo search algorithm: a metaheuristic approach to solve structural optimization problems. Engineering with computers. 2013;29(1):17–35. doi: 10.1007/s00366-011-0241-y. [DOI] [Google Scholar]
- 53.Holland J. Adaptation in Natural and Artificial Systems. Ann Arbor, MI, USA: University of Michigan Press; 1975. [Google Scholar]
- 54.Altamirano J. S., Ornelas M., Espinal A., et al. Advances in Pattern Recognition. 2015. Comparing evolutionary strategy algorithms for training spiking neural networks; p. 9. [Google Scholar]
- 55.Belatreche A., Maguire L. P., McGinnity M., Wu Q. X. An evolutionary strategy for supervised training of biologically plausible neural networks. Proceedings of the Sixth International Conference on Computational Intelligence and Natural Computing (CINC); September 2003; Cary, NC, USA. pp. 1524–1527. [Google Scholar]
- 56.Belatreche A., Maguire L. P., McGinnity T. M. Advances in design and application of spiking neural networks. Soft Computing. 2006;11(3):239–248. doi: 10.1007/s00500-006-0065-7. [DOI] [Google Scholar]
- 57.Shen H., Liu N., Li X., Wang Q. A cooperative method for supervised learning in spiking neural networks. Proceedings of 14th International Conference on Computer Supported Cooperative Work in Design; April 2010; Shanghai, China. IEEE; pp. 22–26. [Google Scholar]
- 58.Rechenberg I. Evolutionsstrategie: Optimierung Technischer Systeme Nach Prinzipien der biologischen Evolution. Problemata, 15. Stuttgart, Germany: Frommann-Holzboog Verlag; 1973. [Google Scholar]
- 59.Schwefel H. P. Numerische Optimierung Von Computer-Modellen Mittels der Evolutionsstrategie. Vol. 1. Basel, Switzerland: Birkhäuser; 1977. [Google Scholar]
- 60.Espinal A., Carpio M., Ornelas M., Puga H., Melin P., Sotelo-Figueroa M. Recent Advances on Hybrid Approaches for Designing Intelligent Systems. Cham, Switzerland: Springer; 2014. Comparing metaheuristic algorithms on the training process of spiking neural networks; pp. 391–403. [DOI] [Google Scholar]
- 61.Espinal A., Carpio M., Ornelas M., Puga H., Melín P., Sotelo-Figueroa M. Developing architectures of spiking neural networks by using grammatical evolution based on evolutionary strategy. Proceedings of Mexican Conference on Pattern Recognition; June 2014; Cancun, Mexico. Springer; pp. 71–80. [Google Scholar]
- 62.Ryan C., Collins J., O’Neill M. Grammatical evolution: evolving programs for an arbitrary language. Proceedings of Genetic Programming: First European Workshop, EuroGP’98; April 1998; Paris, France. Springer Berlin Heidelberg; pp. 83–96. [Google Scholar]
- 63.Schliebs S. Optimisation And Modelling Of Spiking Neural Networks: Enhancing Neural Information Processing Systems Through The Power Of Evolution. Saarbrücken, Germany: LAP Lambert Academic Publishing; 2010. [Google Scholar]
- 64.Espinal A., Rostro-Gonzalez H., Carpio M., et al. Quadrupedal robot locomotion: a biologically inspired approach and its hardware implementation. Computational Intelligence and Neuroscience. 2016;2016:13. doi: 10.1155/2016/5615618.5615618 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Guerra-Hernandez E. I., Espinal A., Batres-Mendoza P., Garcia-Capulin C. H., Romero-Troncoso R. De J., Rostro-Gonzalez H. A fpga-based neuromorphic locomotion system for multi-legged robots. IEEE Access. 2017;5:8301–8312. doi: 10.1109/access.2017.2696985. [DOI] [Google Scholar]
- 66.Soula H., Beslon G., Mazet O. Spontaneous dynamics of asymmetric random recurrent spiking neural networks. Neural Computation. 2006;18(1):60–79. doi: 10.1162/089976606774841567. [DOI] [PubMed] [Google Scholar]
- 67.Ortega A., De La Cruz M., Alfonseca M. Christiansen grammar evolution: grammatical evolution with semantics. IEEE Transactions on Evolutionary Computation. 2007;11(1):77–90. doi: 10.1109/tevc.2006.880327. [DOI] [Google Scholar]
- 68.O’Neill M., Ryan C. Grammatical evolution. IEEE Transactions on Evolutionary Computation. 2001;5(4):349–358. [Google Scholar]
- 69.Talbi E.-G. Metaheuristics: From Design to Implementation. Hoboken, NJ, USA: John Wiley & Sons; 2009. [Google Scholar]
- 70.Dheeru D., Karra Taniskidou E. UCI machine learning repository. 2017. http://archive.ics.uci.edu/ml.
- 71.Gnedenko B. V., Kolmogorov A. N. Limit Distributions for Sums of Independent Random Variables. Cambridge, MA, USA: Addison-Wesley Pub. Co.; 1954. [Google Scholar]
- 72.Soria Alcaraz J. A., Ochoa G., Carpio M., Puga H. Evolvability metrics in adaptive operator selection. Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation; July 2014; Vancouver, Canada. ACM; pp. 1327–1334. [Google Scholar]
- 73.Shapiro S. S., Wilk M. B. An analysis of variance test for normality (complete samples) Biometrika. 1965;52(3-4):591–611. doi: 10.2307/2333709. [DOI] [Google Scholar]
- 74.Anscombe F. J. The validity of comparative experiments. Journal of the Royal Statistical Society. Series A (General) 1948;111(3):181–211. doi: 10.2307/2984159. [DOI] [Google Scholar]
- 75.Montgomery D. C. Design and analysis of experiments. Hoboken, NJ, USA: Wiley; 2013. [Google Scholar]
- 76.Srinivas N., Deb K. Muiltiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary computation. 1994;2(3):221–248. doi: 10.1162/evco.1994.2.3.221. [DOI] [Google Scholar]