Protein & Cell. 2016 Aug 9;7(10):735–748. doi: 10.1007/s13238-016-0302-5

The superior fault tolerance of artificial neural network training with a fault/noise injection-based genetic algorithm

Feng Su 1,2,3,#, Peijiang Yuan 1,#, Yangzhen Wang 2,3,#, Chen Zhang 2,3
PMCID: PMC5055486  PMID: 27502185

Abstract

Artificial neural networks (ANNs) are powerful computational tools that are designed to replicate the human brain and are adopted to solve a variety of problems in many different fields. Fault tolerance (FT), an important property of ANNs, ensures their reliability when significant portions of a network are lost. In this paper, a fault/noise injection-based (FIB) genetic algorithm (GA) is proposed to construct fault-tolerant ANNs. The FT performance of an FIB-GA was compared with that of a common genetic algorithm, the back-propagation algorithm, and the modification of weights algorithm. The FIB-GA showed a slower fitting speed when solving the exclusive OR (XOR) problem and the overlapping classification problem, but it significantly reduced the errors in cases of single or multiple faults in ANN weights or nodes. Further analysis revealed that the fit weights showed no correlation with the fitting errors in the ANNs constructed with the FIB-GA, suggesting a relatively even distribution of the various fitting parameters. In contrast, the output weights of the ANNs trained with the other three algorithms demonstrated a positive correlation with the errors. Our findings therefore indicate that a combination of the fault/noise injection-based method and a GA is capable of introducing FT to ANNs and imply that distributed ANNs demonstrate superior FT performance.

Electronic supplementary material

The online version of this article (doi:10.1007/s13238-016-0302-5) contains supplementary material, which is available to authorized users.

Keywords: artificial neural networks, fault tolerance, genetic algorithm

Introduction

The brain is composed of biological neural networks (BNNs) that contain billions of interconnecting neurons with the ability to perform computations. Artificial neural networks (ANNs), mathematical models that mimic BNNs, are typically built as structured node groups with activation functions and connection weights that are adjusted based on the applied learning rules (Hampson, 1991, 1994; Basheer and Hajmeer, 2000; Krogh, 2008). Because of their powerful computational and learning abilities, ANNs are being used increasingly in various fields, including computation, engineering, machine learning, clinical medicine, and cognitive science (Presnell and Cohen, 1993; Baxt, 1995; Dybowski and Gant, 1995; Forsstrom and Dalton, 1995; Kamimura et al., 1996; Almeida, 2002; Lisboa, 2002; Rajan and Tolley, 2005; Lisboa and Taktak, 2006; Patel and Goyal, 2007; Hu et al., 2013; Street et al., 2013; Azimi et al., 2015).

Fault tolerance (FT), an important feature of BNNs, ensures the fidelity and reliability of a system’s input-output relationship. The FT of BNNs is thought to rely on extensive parallel interconnections, distributed information storage and processing, and self-learning and self-organizing characteristics. For instance, Alzheimer’s patients lose a significant number of neurons (sometimes equaling half the normal brain mass) but still maintain certain brain functions (Fayed et al., 2012; Li et al., 2012; Weiner et al., 2015; Pini et al., 2016). Moreover, structural measurements of various areas of the brain have revealed that brain volume has no direct correlation with cognitive decline in patients (Braskie and Thompson, 2014). Fault tolerance is also an important consideration in the construction of ANNs, especially in highly variable or “fail-safe” systems (Protzel et al., 1993; Phatak and Koren, 1995b). A fault-tolerant ANN is a special ANN system designed to work normally, or at least to a certain degree of normalcy, even if some of its components are unexpectedly damaged. Recently, FT performance has become more important, partly because soft errors caused by transient faults are an unavoidable concern in very large-scale integration (VLSI) technology, whose feature sizes are approaching the nanoscale (Mahdiani et al., 2012).

A common way to construct a fault-tolerant ANN is to replicate neurons (nodes) in the hidden layer (Emmerson and Damper, 1993; Medler and Dawson, 1994; Phatak and Koren, 1995a; Tchernev et al., 2005). In this way, FT is introduced to the ANN at the expense of increased complexity. For instance, ANNs with thousands of artificial neurons and up to a million interconnections in the hidden layer are required to solve complex problems, such as mapping landslide susceptibility (Arnone et al., 2014), modeling pectus excavatum corrective prostheses (Rodrigues et al., 2014), and reconstructing traffic networks (Jiang et al., 2014). However, this increase in network complexity makes the hardware implementation of ANNs relatively difficult and inefficient. Other methods that have been proposed to build fault-tolerant ANNs include adjusting the distribution of weight values (Cavalieri and Mirabella, 1999a), using an empirical equation to deduce the mean prediction error (Sum and Leung, 2008), and adopting two objective functions during training (i.e., one that deals with open-weight faults and another that deals with open-node faults (Mak et al., 2011)) to improve network FT. However, to our knowledge, no studies have investigated in detail whether a genetic algorithm might enhance the FT of ANNs.

A genetic algorithm (GA) is a heuristic algorithm used to search for an optimal solution to a problem by mimicking the evolutionary process of natural selection (Holland, 1975; Goldberg, 1989). The GA process is iterative and includes initialization, selection, and genetic operation. The genetic operation usually consists of inheritance, mutation, and crossover. In each iteration, also called a generation, a fitness function is used to evaluate the fitness of the individuals and to find the best solution. Thus, a GA requires only that the objective function can be evaluated, which makes it suitable for complex and non-linear problems. Genetic algorithms have been applied to solve a variety of problems, especially when the basis functions are discontinuous or non-differentiable (Forrest, 1993; Maddox, 1995; Willett, 1995; Pedersen and Moult, 1996; Meurice et al., 1998; Weber, 1998; Liu and Wang, 2001; Rothlauf et al., 2002; Jamshidi, 2003; Leardi, 2007; Wu, 2007; Gerlee et al., 2011; Manning et al., 2013; Pena-Malavera et al., 2014).
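To make the generation loop concrete, the sketch below implements a generic GA of the kind described here in Python; the population size, tournament selection, uniform crossover, and Gaussian mutation scale are illustrative choices rather than the specific operators used in this study (those are detailed in the Methods).

```python
import numpy as np

def genetic_algorithm(fitness, n_params, pop_size=20, n_generations=200,
                      mutation_scale=0.5, seed=0):
    """Minimal GA sketch: evolve real-valued parameter vectors to minimize `fitness`."""
    rng = np.random.default_rng(seed)
    population = rng.uniform(-1.0, 1.0, size=(pop_size, n_params))    # initialization
    for _ in range(n_generations):
        errors = np.array([fitness(ind) for ind in population])       # fitness evaluation
        elite = population[np.argsort(errors)[:2]]                    # carry the two best forward
        children = []
        while len(children) < pop_size - 2:
            # selection: each parent is the better of two randomly drawn individuals (tournament)
            i, j = rng.integers(0, pop_size, 2)
            p1 = population[i] if errors[i] < errors[j] else population[j]
            i, j = rng.integers(0, pop_size, 2)
            p2 = population[i] if errors[i] < errors[j] else population[j]
            mask = rng.integers(0, 2, n_params)                        # crossover: mix the parents
            child = np.where(mask == 1, p1, p2)
            child = child + rng.normal(0.0, mutation_scale, n_params)  # mutation: Gaussian perturbation
            children.append(child)
        population = np.vstack([elite] + children)
    errors = np.array([fitness(ind) for ind in population])
    return population[np.argmin(errors)], float(errors.min())

# Example: minimize a simple quadratic objective over five parameters
best, best_err = genetic_algorithm(lambda w: float(np.sum((w - 0.3) ** 2)), n_params=5)
```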

Thus, this study proposes an approach that combines a fault/noise injection-based (FIB) learning algorithm with a GA to build fault-tolerant ANNs and demonstrates this method’s superior FT performance in comparison with a general GA (GE-GA) and two classic algorithms, the back-propagation (BP) algorithm and the modification of weights (MW) method, in solving an exclusive OR (XOR) problem and an overlapping classification problem.

Results

Training ANNs to solve an XOR problem with a GA

An XOR problem was used to train the ANN with either a GE-GA or an FIB-GA. Figure 1A illustrates the architecture of the ANN, and Fig. S1 illustrates the artificial neuron model. Two classic algorithms were included in the comparisons: back-propagation (BP) and modification of weights (MW). The BP algorithm is a traditional learning method based on gradient descent, and the MW algorithm modifies a weight during the learning phase if its absolute value exceeds a certain threshold. For each experiment, the training proceeded until the terminating condition (i.e., error less than 0.001 or number of iterations reaching 1,000) was satisfied. Figure 1B illustrates the change in the error (or minimum error in training with a GE-GA or FIB-GA) in one iteration versus the number of iterations. Errors with all four fitting methods declined with increasing iterations. The BP and MW methods reached the terminating condition much faster (BP: 3.0 ± 0.0 and MW: 3.8 ± 0.8 iterations) than the other two methods (GE-GA: 610.6 ± 274.8 and FIB-GA: 905.6 ± 148.4 iterations). Furthermore, Error_BP, Error_MW, and Error_GE-GA decayed approximately exponentially (fitting function f(x) = a·e^(−τx); τ = 5.6196 ± 0.8967, 4.1171 ± 0.6494, and 0.0186 ± 0.0050 iteration⁻¹, respectively); however, Error_FIB-GA decayed much more slowly and in an irregular manner (Fig. 1B), suggesting low efficiency in the parameter optimization process. As there are four elements in the output vector c = (c1, c2, c3, c4) and the calculated error in one iteration is the average over all four individual elements, we also examined, in each training period, the error of each element, which is given by

$$\mathrm{Error}\text{-}c_i = \left|c_i^{\mathrm{calculated}} - c_i^{\mathrm{actual}}\right|, \quad i = 1, 2, 3, 4$$

Figure 1. The use of BP, MW, GE-GA, and FIB-GA in training ANNs to solve an XOR problem. (A) The topology of the artificial neural networks. (B) The plot of the fitting errors versus the number of iterations. The inset shows the changes of the error within the initial six iterations. (C) The summary graph comparing the error of each element in the output vectors among the BP, MW, GE-GA, and FIB-GA ANNs. (D) The plot of the weights for the output neuron versus the fitting errors in ANN training with the BP, MW, GE-GA, and FIB-GA.

Figure 1C illustrates the fluctuations in the error of each element during 20 independent trainings. The average fluctuations during the training of ANNs with BP, MW, GE-GA, and FIB-GA were 0.0001 ± 0.0000, 0.0001 ± 0.0000, 0.0007 ± 0.0009, and 0.0024 ± 0.0025, respectively. Statistical analysis revealed that the FIB-GA method showed the largest fluctuation of the four methods (Table S1). Taken together, all four methods demonstrated the capability of training the ANN successfully, although at different speeds.

Typically, in an FT ANN, the impact of each node is distributed as evenly as possible so as to avoid dominant nodes. Thus, we evaluated the correlation between the different ANN parameters and the fitting errors. All 25 parameters were grouped into four categories: 12 weights (weight_ij) and six biases (bias_j) for the neurons in the hidden layer, and six weights (weight_jm) and one bias (bias_m) for the output neuron. Table 1 summarizes the correlation coefficients and significance between each category of parameters and the errors. The weights for the hidden-layer neurons in the ANNs trained using the BP and the GE-GA were strongly negatively correlated with the fitting error; however, those in the other two ANNs were not. The bias for the output neuron had no significant correlation with the fitting errors for any of the four algorithms. All the output-neuron weights in the ANNs trained with the GE-GA, BP, and MW strongly correlated with the errors (Fig. 1D); however, those in the FIB-GA-trained ANN did not. These results show that in the ANN trained with the FIB-GA, no parameter set correlated with the fitting error, a finding that implies there is no dominant parameter in ANNs trained via the use of an FIB-GA.
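The kind of correlation analysis reported in Table 1 can be sketched as follows. How the weights in a category are paired with the per-network fitting errors is not fully specified in the text, so the pairing used below (every weight in a category paired with its own network's average error) is an assumption, and the weight and error arrays are placeholders.

```python
import numpy as np
from scipy import stats

# Hypothetical inputs: 20 independently trained networks, each contributing its six
# hidden-to-output weights and its average fitting error (placeholder values here).
rng = np.random.default_rng(1)
output_weights = rng.normal(size=(20, 6))       # shape (networks, weights in the category)
fitting_errors = rng.uniform(0, 0.01, size=20)  # one average error per network

# Pair every weight in the category with its network's error and compute Pearson r and p.
x = output_weights.ravel()
y = np.repeat(fitting_errors, output_weights.shape[1])
r, p = stats.pearsonr(x, y)
print(f"Pearson r = {r:.3f}, p = {p:.3f}")
```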

Table 1. Correlation between each category of parameters and average errors

               BP ANN                                         MW ANN
Parameter      weight_ij  bias_j   weight_jm  bias_m          weight_ij  bias_j   weight_jm  bias_m
r              −0.659     0.158    0.936      −0.152          0.051      0.018    0.857      0.064
p              0.002      0.507    0          0.521           0.831      0.940    0          0.788

               GE-GA ANN                                      FIB-GA ANN
Parameter      weight_ij  bias_j   weight_jm  bias_m          weight_ij  bias_j   weight_jm  bias_m
r              −0.789     −0.070   0.901      0.291           −0.260     0.524    0.339      −0.258
p              0          0.768    0          0.214           0.268      0.018    0.144      0.272

(weight_ij and bias_j: weights and biases of the hidden-layer neurons; weight_jm and bias_m: weights and bias of the output neuron)

The FT performance of ANNs in solving an XOR problem with a single fault

Fault tolerance is the property that allows an ANN or BNN to operate properly in the event that one or more components are lost. We began by comparing the errors among the ANNs generated by the BP, MW, GE-GA, and FIB-GA methods when one randomly selected network parameter was changed to 0 (voided). The plot of the errors versus the faulty parameters in 20 independent experiments clearly shows that the ANNs constructed using the FIB-GA have the smallest errors (Fig. 2A). The averaged errors from 20 independent experiments are as follows: BP: 0.2623 ± 0.0614, MW: 0.2507 ± 0.0355, GE-GA: 0.2746 ± 0.0698, and FIB-GA: 0.1527 ± 0.0150 (statistical test in Table S2). Defining a fault output as an error equal to or exceeding 0.4, the error rates show a similar trend (Fig. 2B): BP: 24.80 ± 9.68%, MW: 21.60 ± 7.94%, GE-GA: 23.80 ± 10.66%, and FIB-GA: 8.60 ± 5.24% (statistical test in Table S2). Next, the FT performances of the four ANNs were compared when one neuron (rather than one parameter) in the hidden layer completely lost its responsiveness. The performances of all four ANNs were reduced compared with the fully functional ANNs, while the FIB-GA ANNs showed the smallest errors (Fig. 2C): BP: 0.3900 ± 0.1041, MW: 0.3645 ± 0.0567, GE-GA: 0.5167 ± 0.1413, and FIB-GA: 0.2936 ± 0.0410 (statistical test in Table S3), and the lowest error rates with a 0.4 threshold (Fig. 2D): BP: 45.00 ± 19.57%, MW: 40.83 ± 13.76%, GE-GA: 54.17 ± 20.86%, and FIB-GA: 27.50 ± 13.55% (statistical test in Table S3).
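A minimal sketch of this single-fault test, assuming a trained network whose forward pass and average output error are wrapped in a hypothetical evaluate_error callable: each parameter is voided in turn, and a fault output is counted whenever the resulting error reaches the 0.4 threshold.

```python
import numpy as np

FAULT_THRESHOLD = 0.4  # an error at or above this value counts as a fault output

def single_fault_scan(params, evaluate_error):
    """Void each parameter in turn and record the resulting error.

    params: 1-D array of all trained network parameters (25 in the 6-hidden-neuron ANN).
    evaluate_error: hypothetical callable mapping a parameter vector to the average output error.
    Returns the averaged error and the error rate over all single-fault configurations.
    """
    errors = []
    for i in range(len(params)):
        faulty = params.copy()
        faulty[i] = 0.0                       # single stuck-at-zero fault on parameter i
        errors.append(evaluate_error(faulty))
    errors = np.asarray(errors)
    return float(errors.mean()), float((errors >= FAULT_THRESHOLD).mean())
```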

Figure 2. The FT performance of ANNs in solving an XOR problem with a single faulty parameter or neuron in the hidden layer. (A and B) The plot of errors versus the faulty parameter (A) and a histogram of error occurrence (B) in 20 independent experiments using ANNs trained via the BP, MW, GE-GA, and FIB-GA methods. (C and D) The plot of errors versus the faulty neurons in the hidden layer (C) and a histogram of error occurrence (D) in 20 independent experiments using ANNs trained via the BP, MW, GE-GA, and FIB-GA methods. In panels C and D, a fault output represents a fitting with an error equal to or exceeding a threshold of 0.4 (red lines).

As the output matrix is composed of four elements in the XOR problem, the errors of the individual elements were also compared. The distributions of Error-c_i among the four ANNs were plotted while voiding one parameter or one neuron in the hidden layer (Fig. S2). Among the four algorithms, the ANN trained with the FIB-GA consistently showed the smallest errors (Fig. S2A): BP: 0.2623 ± 0.0614, MW: 0.2507 ± 0.0355, GE-GA: 0.2746 ± 0.0698, and FIB-GA: 0.1527 ± 0.0150 (statistical test in Table S4), and the lowest error rate (Fig. S2B): BP: 25.40 ± 7.94%, MW: 25.90 ± 5.39%, GE-GA: 25.75 ± 7.35%, and FIB-GA: 14.85 ± 3.30% (statistical test in Table S4). When one neuron in the hidden layer was voided randomly, the average error and the error rate with the 0.4 threshold showed a similar trend (Fig. S2C): average error: BP: 0.3900 ± 0.1041, MW: 0.3645 ± 0.0567, GE-GA: 0.5167 ± 0.1413, and FIB-GA: 0.2936 ± 0.0410 (statistical test in Table S5). Figure S2D shows the error rate with a 0.4 threshold: BP: 43.75 ± 17.34%, MW: 42.50 ± 12.72%, GE-GA: 48.13 ± 16.36%, and FIB-GA: 31.25 ± 6.55% (statistical test in Table S5). Together, these results clearly show that the FIB-GA ANN has superior FT when one parameter or one neuron in the network is lost.

The FT performance of ANNs in solving an XOR problem with multiple faults

In both ANNs and BNNs, faults typically occur at multiple sites rather than being restricted to a single element. Thus, we compared the performance of the ANNs in solving an XOR problem when two to four parameters were disabled simultaneously. As each ANN has 25 parameters, there are 300 (C(25,2)), 2,300 (C(25,3)), and 12,650 (C(25,4)) combinations when two, three, and four parameters are set to 0, respectively. Figure 3A illustrates the distribution of errors; the summarized data clearly show that the FIB-GA-trained ANN still performed best under multiple-fault conditions (Table 2; statistical test in Tables S6–S7). Next, we examined the errors occurring when two to six neurons in the hidden layer were voided. Under these circumstances, the ANN trained using the GE-GA showed the largest errors, while the ANN trained using the FIB-GA demonstrated the best performance (Fig. 3B). The error rates with a 0.4 threshold displayed the same order in the fitting performance (Fig. 3C). Not surprisingly, the performance of the ANNs trained using the FIB-GA was significantly better than that of the other three ANNs; however, the performance of the FIB-GA-trained ANNs weakened as the number of voided neurons increased (Table 3 and Fig. S3; statistical test in Tables S8–S9). Since the performance of ANNs relies heavily on the number of nodes in the hidden layer (Xu and Xu, 2013; Sasakawa et al., 2014), we next investigated whether the number of hidden neurons could affect the FT performance of the four ANNs in solving the XOR problem. In ANNs with three or nine neurons in the hidden layer, the FIB-GA-trained ANN continued to demonstrate an FT performance that was superior to that of the BP, MW, and GE-GA ANNs (three neurons: Table 4; statistical test in Tables S10–S11; nine neurons: Table 5; statistical test in Tables S12–S13).
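The combination counts quoted above follow directly from the binomial coefficients C(25,2) = 300, C(25,3) = 2,300, and C(25,4) = 12,650; a sketch of the exhaustive multi-fault scan, reusing the hypothetical evaluate_error callable from the single-fault example, is shown below.

```python
from itertools import combinations
from math import comb

import numpy as np

def multi_fault_scan(params, evaluate_error, n_faults):
    """Evaluate the error for every combination of `n_faults` parameters forced to zero."""
    errors = []
    for idx in combinations(range(len(params)), n_faults):
        faulty = params.copy()
        faulty[list(idx)] = 0.0               # void the selected parameters simultaneously
        errors.append(evaluate_error(faulty))
    return np.asarray(errors)

# Number of fault combinations for the 25-parameter network
print([comb(25, k) for k in (2, 3, 4)])  # [300, 2300, 12650]
```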

Figure 3. The FT performance of ANNs in solving an XOR problem with multiple faulty parameters or neurons in the hidden layer. (A) The histogram of error occurrence in 20 independent experiments using ANNs trained via the BP, MW, GE-GA, and FIB-GA methods. (B and C) The plot of errors versus faulty neurons (B) and a histogram of error occurrence (C) in 20 independent experiments using ANNs trained via the BP, MW, GE-GA, and FIB-GA methods.

Table 2. Average errors and error rates of multiple faulty parameters with six hidden neurons

                 No. of faulty parameters   BP                 MW                 GE-GA              FIB-GA
Average error    2                          0.4295 ± 0.0925    0.4092 ± 0.0538    0.4534 ± 0.1092    0.2603 ± 0.0271
                 3                          0.5419 ± 0.1086    0.5158 ± 0.0649    0.5752 ± 0.1334    0.3392 ± 0.0373
                 4                          0.6199 ± 0.1167    0.5903 ± 0.0716    0.6609 ± 0.1490    0.3983 ± 0.0453
Error rate       2                          47.03% ± 13.11%    45.03% ± 9.92%     45.45% ± 15.25%    20.05% ± 7.50%
                 3                          63.52% ± 12.68%    62.87% ± 9.52%     62.07% ± 15.54%    31.93% ± 9.06%
                 4                          74.61% ± 10.75%    74.31% ± 8.18%     73.43% ± 13.86%    43.29% ± 9.75%

Table 3. Average errors and error rates of multiple faulty neurons with six hidden neurons

                 No. of faulty neurons      BP                 MW                 GE-GA              FIB-GA
Average error    2                          0.5229 ± 0.1127    0.4891 ± 0.0794    0.7041 ± 0.1804    0.4077 ± 0.0657
                 3                          0.5960 ± 0.1231    0.5593 ± 0.0878    0.7891 ± 0.2036    0.4665 ± 0.0837
                 4                          0.6227 ± 0.1210    0.5974 ± 0.0969    0.8235 ± 0.2398    0.5009 ± 0.0909
                 5                          0.6115 ± 0.1174    0.6133 ± 0.1176    0.8096 ± 0.3082    0.5267 ± 0.0905
                 6                          0.5765 ± 0.1094    0.6037 ± 0.1489    0.7618 ± 0.4398    0.5382 ± 0.0740
Error rate       2                          62.67% ± 12.31%    62.00% ± 13.87%    77.67% ± 18.00%    47.33% ± 15.58%
                 3                          80.00% ± 13.76%    78.00% ± 12.61%    81.75% ± 16.96%    55.25% ± 16.26%
                 4                          88.33% ± 14.00%    86.67% ± 11.03%    89.00% ± 13.90%    65.00% ± 14.49%
                 5                          90.83% ± 16.64%    94.17% ± 11.18%    95.00% ± 9.52%     81.67% ± 13.13%
                 6                          100.00% ± 0.00%    100.00% ± 0.00%    100.00% ± 0.00%    100.00% ± 0.00%

Table 4. Error rates of multiple faulty parameters and neurons with three hidden neurons

                     No. voided    BP                 MW                 GE-GA              FIB-GA
Faulty parameters    1             33.85% ± 14.21%    41.15% ± 16.03%    38.85% ± 13.08%    8.46% ± 6.06%
                     2             63.91% ± 12.94%    71.15% ± 13.27%    70.51% ± 13.50%    30.77% ± 7.51%
                     3             82.52% ± 8.51%     87.13% ± 7.39%     86.00% ± 9.24%     56.59% ± 4.78%
Faulty neurons       1             71.67% ± 16.31%    75.00% ± 18.34%    86.67% ± 19.94%    18.33% ± 17.01%
                     2             95.00% ± 16.31%    95.00% ± 16.31%    88.33% ± 16.31%    46.67% ± 42.44%
                     3             100.00% ± 0.00%    100.00% ± 0.00%    100.00% ± 0.00%    100.00% ± 0.00%

Table 5. Error rates of multiple faulty parameters and neurons with nine hidden neurons

                     No. voided    BP                 MW                 GE-GA              FIB-GA
Faulty parameters    1             22.70% ± 9.62%     23.11% ± 8.01%     20.95% ± 11.29%    1.22% ± 1.38%
                     2             46.67% ± 12.70%    47.70% ± 12.56%    38.20% ± 15.48%    4.44% ± 1.91%
                     3             63.45% ± 13.11%    64.35% ± 12.40%    52.43% ± 16.25%    9.57% ± 2.85%
Faulty neurons       1             41.67% ± 15.24%    45.00% ± 14.18%    48.89% ± 20.52%    1.67% ± 4.07%
                     2             60.14% ± 16.05%    65.69% ± 14.20%    66.39% ± 17.40%    10.83% ± 6.62%
                     3             71.43% ± 12.67%    76.31% ± 11.55%    75.42% ± 14.32%    20.89% ± 7.74%

Together, these results demonstrate that the FIB-GA ANN has superior FT in solving XOR problems when multiple parameters or neurons in the network are lost.

The FT performance of ANNs in solving an overlapping classification problem

Next, we examined the FT performance of the four ANNs in solving an overlapping classification problem (Fig. 4A), which is more complicated than an XOR problem. Solving overlapping classification problems using ANNs has been investigated extensively in the pattern recognition and machine-learning fields (Lovell and Bradley, 1996; Tang et al., 2010; Xiong et al., 2010). We adopted an ANN with the same structure described above (Fig. 1A). The terminating condition was set at 1,000 iterations, since none of the four ANNs could reduce the fitting error to 0.001 or less within 1,000 iterations, partly due to the complexity of the problem. Figure 4B illustrates the changes in the correct rates versus the number of iterations. The correct rates of the BP and MW ANNs increased significantly faster than those of the GE-GA and FIB-GA ANNs (fitting function f(x) = a·e^(−τx); square class RCR: BP: 0.4662 ± 0.0006, τ = 0.2717, R² = 0.9880; MW: 0.4672 ± 0.0010, τ = 0.2529, R² = 0.8855; GE-GA: 0.4605 ± 0.0166, τ = 0.0040, R² = 0.9068; FIB-GA: 0.4578 ± 0.0083, τ = 0.0030, R² = 0.8502; circle class RCR: BP: 0.4542 ± 0.0006, τ = 0.2210, R² = 0.9230; MW: 0.4541 ± 0.0115, τ = 0.2581, R² = 0.8996; GE-GA: 0.4547 ± 0.0115, τ = 0.0084, R² = 0.9014; FIB-GA: 0.4605 ± 0.0074, τ = 0.0111, R² = 0.7846; statistical test in Table S14). The numbers of iterations required for the RCR to reach 0.45 were as follows: BP: 23.2 ± 14.3, MW: 15.8 ± 8.3, GE-GA: 613.7 ± 341.8, and FIB-GA: 583.1 ± 287.7.

We then examined the FT performance of these four ANNs in solving the overlapping classification problem and found that the FIB-GA ANN showed significantly fewer errors when one parameter or one neuron was voided. When the parameters were voided one at a time in 20 independent experiments, the square class RCR for the FIB-GA ANN was the highest, while the other three ANNs did not differ significantly (square class RCRs: BP: 0.3407 ± 0.0397, MW: 0.3536 ± 0.0317, GE-GA: 0.2863 ± 0.0501, and FIB-GA: 0.4116 ± 0.0247; circle class RCRs: BP: 0.2978 ± 0.0600, MW: 0.3075 ± 0.0301, GE-GA: 0.2151 ± 0.0601, and FIB-GA: 0.4045 ± 0.0258; statistical test in Table S15) (Fig. 4C). When one neuron in the hidden layer was voided randomly, the FIB-GA ANN still significantly outperformed the other three ANNs (square class RCRs: BP: 0.2392 ± 0.1049, MW: 0.2646 ± 0.0743, GE-GA: 0.1334 ± 0.1160, and FIB-GA: 0.4224 ± 0.0586; circle class RCRs: BP: 0.2396 ± 0.1523, MW: 0.2574 ± 0.0911, GE-GA: 0.0073 ± 0.1242, and FIB-GA: 0.4164 ± 0.0640; statistical test in Table S16) (Fig. 4D). We next randomly voided two to three parameters or neurons in these ANNs and compared their FT performance. As illustrated in Figure 5, voiding two to three parameters or neurons reduced the performance of all four ANNs, but the FIB-GA ANN showed the smallest errors and lowest error rates under almost all the fault conditions tested (Tables S17–S18). Thus, our data clearly demonstrate that the ANN trained using the FIB-GA, albeit at a lower training speed, has superior FT compared with the ANNs trained using the BP, MW, and GE-GA methods.

Figure 4. The FT performance of different ANNs with six hidden-layer neurons in solving an overlapping classification problem. (A) Two classes of Gaussian noise sources. (B) The plot of the relative correct rate versus the number of iterations. The inset shows the changes of the error within the initial 40 iterations. (C and D) The plot of errors versus the faulty parameters (C) and the plot of errors versus the faulty neurons in the hidden layer (D) in 20 independent experiments when using ANNs trained via the BP, MW, GE-GA, and FIB-GA methods.

Figure 5. Relative correct rates of different ANNs with six hidden-layer neurons in solving an overlapping classification problem. (A) The plot of circle RCR (top) and square RCR (bottom) versus the number of faulty parameters. (B) The plot of circle RCR (top) and square RCR (bottom) versus the number of faulty neurons.

Discussion

This study compared the FT performance of ANNs constructed with four different algorithms: BP, MW, GE-GA, and FIB-GA. The FIB-GA was built on the fault/noise injection-based learning approach, in which faults or noise are added to the inputs, weights, or nodes during training and which has proven to be a common and efficient way of training fault-tolerant neural networks (Leung and Sum, 2008; Ho et al., 2010). Our results clearly show that the FIB learning algorithm is an efficient method for improving the FT performance of ANNs. The data of this study show that the FIB-GA significantly reduces the error between the actual and desired outputs when one or multiple neurons are voided. It is worth noting that, when solving an XOR problem or an overlapping classification problem with continuous and differentiable basis functions, the GE-GA does not offer an advantage over the BP and MW algorithms. This suggests that FT is not an intrinsic property of GA-trained ANNs. In contrast, compared with the two GAs, the BP and MW algorithms showed much faster training speeds, which implies a trade-off between training efficiency and fault tolerance in ANNs. One option for increasing the FT performance of ANNs is to avoid weights or neurons whose loss has a large effect on the errors. Our analysis showed that the weights between the hidden layer and the output layer in the ANNs trained with the GE-GA, BP, or MW, but not those trained with the FIB-GA, are correlated with the output errors, a finding which clearly supports the notion that robustness is greater in a distributed ANN. This is also consistent with previous attempts to improve the partial FT of ANNs by distributing the absolute values of the weights uniformly (Cavalieri and Mirabella, 1999a, b). In addition, Macia and Sole reported that degeneracy, rather than redundancy, is necessary for reliable designs of NAND (NOT AND, a binary operation) gate-forming systems (Macia and Sole, 2009). Considering that BNNs are also distributed systems with high FT performance, distributed storage and processing appear to be key properties of both ANNs and BNNs. Together, our results suggest that a fault/noise injection-based genetic algorithm can serve as an efficient approach for improving the FT of ANNs.

Methods

The architecture of the ANN

In this study, a three-layer ANN was constructed: an input layer of two neurons, a hidden layer comprising a variable number of neurons, and an output layer of one neuron. Figure 1A shows the architecture. Each neuron receives multiple weighted inputs and sends one weighted output to the connected neurons. Simply stated, neurons are interconnected in a unidirectional manner (input → hidden → output), and there is no connection within a layer (Fig. 1A). The input of neuron j in the hidden layer is given by

$$x_j = \sum_{i=1}^{n} \mathrm{Input}_{ij} \times \mathrm{weight}_{ij} + b_j$$

where Input_ij, weight_ij, and b_j denote the input, weight, and input bias of the postsynaptic neuron j, which is connected to the presynaptic neuron i, and n denotes the number of presynaptic neurons connecting to neuron j.

The output of the neuron j is given by the following tansig function:

$$y_j = f(x_j) = \frac{1 - e^{-2x_j}}{1 + e^{-2x_j}}$$
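The tansig output function above is mathematically identical to the hyperbolic tangent, as the following short check (assuming NumPy) illustrates:

```python
import numpy as np

def tansig(x):
    """tansig(x) = (1 - exp(-2x)) / (1 + exp(-2x)), which equals tanh(x)."""
    return (1.0 - np.exp(-2.0 * x)) / (1.0 + np.exp(-2.0 * x))

x = np.linspace(-3, 3, 7)
assert np.allclose(tansig(x), np.tanh(x))
```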

Calculation of the ANN to solve the XOR problem

A classic XOR problem was selected for use in training and examining the performance of the ANN. According to the architecture of the ANN used in this study (Fig. 1A), the training data, a1 and a2, are defined as follows:

$$a_1 = (1, 0, 1, 0), \qquad a_2 = (1, 1, 0, 0)$$

Thus, the actual output of the XOR problem with these two inputs is given by

$$y = (y_1, y_2, y_3, y_4) = a_1 \oplus a_2 = (\lnot a_1 \wedge a_2) \vee (a_1 \wedge \lnot a_2) = (0, 1, 1, 0)$$

For one solution set, the output of the ANN is given by

$$c = (c_1, c_2, c_3, c_4)$$

where $c_p = \sum_{j=1}^{6} f(x_j) \times \mathrm{weight}_{jm} + b_m$, and $x_j$ denotes the input to neuron j in the hidden layer from the two presynaptic neurons ($a_1$ and $a_2$), which is given by $x_j = \sum_{i=1}^{2} a_{ip} \times \mathrm{weight}_{ij} + b_j$. weight_jm and b_m denote the weight and bias of the neuron m in the output layer, respectively.

Thus, the error for one solution set is given by

$$\mathrm{Error} = \frac{1}{4}\sum_{p=1}^{4}\left|c_p - y_p\right|$$
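For illustration, the forward pass and the error for one solution set can be sketched as follows; the ordering of the 25 parameters inside the vector (input-to-hidden weights, hidden biases, hidden-to-output weights, output bias) is an assumption made for this sketch, and the output neuron is taken to be linear, as in the expression for c_p above.

```python
import numpy as np

A = np.array([[1, 1], [0, 1], [1, 0], [0, 0]], dtype=float)  # the four (a1, a2) input pairs
Y = np.array([0, 1, 1, 0], dtype=float)                      # desired XOR outputs

def xor_error(params, n_hidden=6):
    """Average |c_p - y_p| over the four XOR patterns for one parameter vector.

    Assumed layout of `params` (25 values for n_hidden = 6): input->hidden weights
    (2 * n_hidden), hidden biases (n_hidden), hidden->output weights (n_hidden), output bias (1).
    """
    params = np.asarray(params, dtype=float)
    w_ih = params[:2 * n_hidden].reshape(2, n_hidden)
    b_h = params[2 * n_hidden:3 * n_hidden]
    w_ho = params[3 * n_hidden:4 * n_hidden]
    b_o = params[4 * n_hidden]
    hidden = np.tanh(A @ w_ih + b_h)          # tansig activation of the hidden layer
    c = hidden @ w_ho + b_o                   # linear output neuron
    return float(np.mean(np.abs(c - Y)))

# Example: error of a random parameter vector
err = xor_error(np.random.default_rng(0).uniform(-1, 1, 25))
```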

Training of the ANN using a GE-GA

A GE-GA was adopted for training the ANN. The basic idea of a GA is to mimic the process of natural selection and to find the best solution to a problem after several generations. In this study, the upper limit of iterations was set at 1,000, and 20 individuals (i.e., sets of solutions) were used in each generation (i.e., training cycle). The best individual is defined as the set of solutions having the minimum error. In the first generation, each individual was assigned randomly. In the subsequent generations, the 20 individuals consisted of three parts: two elite individuals (N_elite), which are the two individuals carried forward from the previous generation because they have the smallest errors; 14 crossover individuals (N_crossover), which are generated by combining two selected parents; and four mutation individuals (N_mutation). The selection of parents is based on the scaled position (Scaled_i) of each individual within its generation. Rank_i is defined as the position of individual i when all the individuals in one generation are sorted by error in ascending order. Thus, for individual i, the probability of being selected as a parent is calculated as follows:

$$P_i = \frac{\mathrm{Scaled}_i}{\sum_{j=1}^{20} 1/\sqrt{j}} = \mathrm{Scaled}_i / 7.5953$$

where $\mathrm{Scaled}_i = 1/\sqrt{\mathrm{Rank}_i}$ ($\mathrm{Rank}_i \neq \mathrm{Rank}_j$ if $i \neq j$).

A line segment was then constructed from sub-segments whose lengths were proportional to the P_i of each individual. A step size was given by 1/N_parent, where N_parent = 2·N_crossover + N_mutation = 2 × 14 + 4 = 32, and an initial position was denoted Initial_position, where 0 < Initial_position < 1/N_parent. A cursor was then placed at Initial_position and moved along the segment in steps of 1/N_parent. At each step, the individual on whose sub-segment the cursor lands is selected as a parent. Thus, this algorithm generates 32 parents in one generation (Fig. S4). The crossover process generates a child by crossing two parents, Parent_1 = (parP1_1, …, parP1_25) and Parent_2 = (parP2_1, …, parP2_25), with a randomly generated binary vector Coef = (Coe_1, …, Coe_25), where Coe_i is assigned 0 or 1 by rounding a value randomly selected in the open interval (0, 1). The parameter vector of the child, Child = (parC_1, …, parC_25), generated by the crossover is given by

$$\mathrm{parC}_i = \mathrm{parP1}_i \times \mathrm{Coe}_i + \mathrm{parP2}_i \times (1 - \mathrm{Coe}_i)$$

The mutation process generates a child from one parent, Parent_1 = (parP1_1, …, parP1_25), with a vector Coef = (Coe_1, …, Coe_25), where Coe_i follows a Gaussian distribution centered at 0 (Fig. S4C). The standard deviation of the Gaussian distribution is 1 in the first generation and shrinks linearly to 0 at the last generation. The parameter vector of the child, Child = (parC_1, …, parC_25), generated by the mutation is given by

$$\mathrm{parC}_i = \mathrm{parP1}_i + \mathrm{Coe}_i$$
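A sketch of the three operators described above, with Scaled_i = 1/√Rank_i rank scaling, cursor-based parent selection along a line segment, binary-vector crossover, and Gaussian mutation whose standard deviation shrinks linearly across generations; function names and the random-number handling are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_parents(errors, n_parents):
    """Pick parent indices by moving a cursor in equal steps along a segment whose
    sub-segments are proportional to P_i = (1/sqrt(Rank_i)) / sum_j 1/sqrt(j)."""
    ranks = np.empty(len(errors), dtype=int)
    ranks[np.argsort(errors)] = np.arange(1, len(errors) + 1)    # rank 1 = smallest error
    scaled = 1.0 / np.sqrt(ranks)
    probs = scaled / scaled.sum()
    edges = np.cumsum(probs)                                     # right edge of each sub-segment
    step = 1.0 / n_parents
    positions = rng.uniform(0.0, step) + step * np.arange(n_parents)
    return np.searchsorted(edges, positions)

def crossover(parent1, parent2):
    """Child takes each element from parent1 where Coe_i = 1, otherwise from parent2."""
    coef = rng.integers(0, 2, parent1.size)
    return np.where(coef == 1, parent1, parent2)

def mutate(parent, generation, n_generations):
    """Gaussian perturbation whose standard deviation shrinks linearly from 1 to 0."""
    sigma = 1.0 * (1.0 - generation / n_generations)
    return parent + rng.normal(0.0, sigma, parent.size)
```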

The goal of training the ANN with a GE-GA is to search for the individual with the minimal Error_GE-GA, which is given by

$$\mathrm{Error}_{\text{GE-GA}} = \frac{1}{4}\sum_{p=1}^{4}\left|c_p - y_p\right| \qquad (1)$$

Training of the ANN with an FIB-GA

In addition to the GE-GA, an FIB-GA was used to train the ANN. In the FIB-GA, faults in the ANN parameters are considered during the training process. Thus, the error for one set of solutions is given by

$$\mathrm{Error}_{\text{FIB-GA}} = \frac{1}{25}\sum_{i=1}^{25}\mathrm{Error}_i \qquad (2)$$

where Error_i is the error when the ith of the 25 parameters is forced to 0, assuming the corresponding parameter becomes faulty.
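A sketch of this objective, reusing the hypothetical xor_error function from the earlier sketch: the fitness of one candidate solution is the mean of the errors obtained when each of its parameters is, in turn, forced to zero.

```python
import numpy as np

def fib_fitness(params):
    """Error_FIB-GA: average error over single stuck-at-zero faults on every parameter."""
    params = np.asarray(params, dtype=float)
    faulty_errors = []
    for i in range(len(params)):
        faulty = params.copy()
        faulty[i] = 0.0                          # inject the fault on parameter i
        faulty_errors.append(xor_error(faulty))  # xor_error: hypothetical helper defined earlier
    return float(np.mean(faulty_errors))
```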

Overlapping classification problem

Two classes of Gaussian noise sources were considered (Fig. 4A). The first class is shown as blue squares, with a mean at (a_1, a_2) coordinates of (0.25, 0.25). The second class is shown as red circles, with a mean at (a_1, a_2) coordinates of (0.75, 0.75). Both classes have a standard deviation of 0.2. The coordinates (a_1, a_2) were used as the input for ANN training, and the target output was 0.5 for the circle class and −0.5 for the square class. Each class had 500 points in total, and all the data were shuffled before training was initiated. An actual output value larger than 0 for a point in the circle class and an actual output value less than 0 for a point in the square class were regarded as correct.

For each class with N (0 ≤ N ≤ 500) points classified correctly, the relative correct rate (RCR) is defined as follows:

$$\mathrm{RCR} = \frac{N}{500} - 0.5$$
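A sketch of the data generation and RCR computation described in this subsection; the class means, standard deviation, targets, and decision rule are taken from the text, while the trained network itself is left out (its outputs are passed in as an array).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_overlapping_data(n_per_class=500, std=0.2):
    """Two overlapping Gaussian classes: squares around (0.25, 0.25), circles around (0.75, 0.75)."""
    square = rng.normal(loc=(0.25, 0.25), scale=std, size=(n_per_class, 2))  # target -0.5
    circle = rng.normal(loc=(0.75, 0.75), scale=std, size=(n_per_class, 2))  # target +0.5
    inputs = np.vstack([square, circle])
    targets = np.concatenate([np.full(n_per_class, -0.5), np.full(n_per_class, 0.5)])
    order = rng.permutation(len(inputs))                   # shuffle before training
    return inputs[order], targets[order]

def relative_correct_rate(outputs, targets, class_sign):
    """RCR = N_correct / 500 - 0.5 for one class; class_sign is +1 (circle) or -1 (square)."""
    in_class = np.sign(targets) == class_sign
    correct = np.sign(outputs[in_class]) == class_sign     # output > 0 for circle, < 0 for square
    return correct.sum() / 500 - 0.5                       # 500 points per class
```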


Acknowledgements

This work was supported by grants from the National Basic Research Program (973 Program) (Nos. 2014CB942804, 2014BAI03B01, and 2012YQ0302604), Beijing Institute of Collaborative Innovation (15I-15-BJ), and the Seeding Grant for Medicine and Life Sciences of Peking University (2014-MB-11).

Abbreviations

ANNs, artificial neural networks; BP, back-propagation; FIB, fault/noise injection-based; FT, fault tolerance; GA, genetic algorithm; MW, modification of weights; VLSI, very large-scale integration; XOR, exclusive OR.

Compliance with ethics guidelines

Feng Su, Peijiang Yuan, Yangzhen Wang and Chen Zhang declare that they have no conflict of interest. This article does not contain any studies with human or animal subjects performed by any of the authors.

Author contributions

F.S., P.Y., Y.W., and C.Z. carried out the experiments; F.S., P.Y., Y.W., and C.Z. contributed to the planning of the work; F.S., P.Y., Y.W., and C.Z. wrote the paper.

Footnotes

Feng Su, Peijiang Yuan and Yangzhen Wang have contributed equally to this work.

References

  1. Almeida JS. Predictive non-linear modeling of complex data by artificial neural networks. Curr Opin Biotechnol. 2002;13:72–76. doi: 10.1016/S0958-1669(02)00288-4. [DOI] [PubMed] [Google Scholar]
  2. Arnone E, Francipane A, Noto LV, Scarbaci A, La Loggia G. Strategies investigation in using artificial neural network for landslide susceptibility mapping: application to a Sicilian catchment. J Hydroinf. 2014;16:502–515. doi: 10.2166/hydro.2013.191. [DOI] [Google Scholar]
  3. Azimi P, Mohammadi HR, Benzel EC, Shahzadi S, Azhari S, Montazeri A. Artificial neural networks in neurosurgery. J Neurol Neurosurg Psychiatry. 2015;86:251–256. doi: 10.1136/jnnp-2014-307807. [DOI] [PubMed] [Google Scholar]
  4. Basheer IA, Hajmeer M. Artificial neural networks: fundamentals, computing, design, and application. J Microbiol Methods. 2000;43:3–31. doi: 10.1016/S0167-7012(00)00201-3. [DOI] [PubMed] [Google Scholar]
  5. Baxt WG. Application of artificial neural networks to clinical medicine. Lancet. 1995;346:1135–1138. doi: 10.1016/S0140-6736(95)91804-3. [DOI] [PubMed] [Google Scholar]
  6. Braskie MN, Thompson PM. A focus on structural brain imaging in the Alzheimer’s disease neuroimaging initiative. Biol Psychiatry. 2014;75:527–533. doi: 10.1016/j.biopsych.2013.11.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Cavalieri S, Mirabella O. A novel learning algorithm which improves the partial fault tolerance of multilayer neural networks. Neural Netw. 1999;12:91–106. doi: 10.1016/S0893-6080(98)00094-X. [DOI] [PubMed] [Google Scholar]
  8. Cavalieri S, Mirabella O. A novel learning algorithm which improves the partial fault tolerance of multilayer neural networks. Neural Netw. 1999;12:91–106. doi: 10.1016/S0893-6080(98)00094-X. [DOI] [PubMed] [Google Scholar]
  9. Dybowski R, Gant V. Artificial neural networks in pathology and medical laboratories. Lancet. 1995;346:1203–1207. doi: 10.1016/S0140-6736(95)92904-5. [DOI] [PubMed] [Google Scholar]
  10. Emmerson MD, Damper RI. Determining and improving the fault-tolerance of multilayer perceptrons in a pattern-recognition application. IEEE Trans Neural Netw. 1993;4:788–793. doi: 10.1109/72.248456. [DOI] [PubMed] [Google Scholar]
  11. Fayed N, Modrego PJ, Salinas GR, Gazulla J. Magnetic resonance imaging based clinical research in Alzheimer’s disease. J Alzheimers Dis. 2012;31:S5–18. doi: 10.3233/JAD-2011-111292. [DOI] [PubMed] [Google Scholar]
  12. Forrest S. Genetic algorithms: principles of natural selection applied to computation. Science. 1993;261:872–878. doi: 10.1126/science.8346439. [DOI] [PubMed] [Google Scholar]
  13. Forsstrom JJ, Dalton KJ. Artificial neural networks for decision support in clinical medicine. Ann Med. 1995;27:509–517. doi: 10.3109/07853899509002462. [DOI] [PubMed] [Google Scholar]
  14. Gerlee P, Basanta D, Anderson AR. Evolving homeostatic tissue using genetic algorithms. Prog Biophys Mol Biol. 2011;106:414–425. doi: 10.1016/j.pbiomolbio.2011.03.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Goldberg DE. Genetic algorithms in search, optimization, and machine learning. Reading: Addison-Wesley Pub. Co; 1989. [Google Scholar]
  16. Hampson S. Generalization and specialization in artificial neural networks. Prog Neurobiol. 1991;37:383–431. doi: 10.1016/0301-0082(91)90008-O. [DOI] [PubMed] [Google Scholar]
  17. Hampson S. Problem solving in artificial neural networks. Prog Neurobiol. 1994;42:229–281. doi: 10.1016/0301-0082(94)90065-5. [DOI] [PubMed] [Google Scholar]
  18. Ho KI, Leung CS, Sum J. Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks. IEEE Trans Neural Netw. 2010;21:938–947. doi: 10.1109/TNN.2010.2046179. [DOI] [PubMed] [Google Scholar]
  19. Holland JH. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. Ann Arbor: University of Michigan Press; 1975. [Google Scholar]
  20. Hu X, Cammann H, Meyer HA, Miller K, Jung K, Stephan C. Artificial neural networks and prostate cancer–tools for diagnosis and management. Nat Rev Urol. 2013;10:174–182. doi: 10.1038/nrurol.2013.9. [DOI] [PubMed] [Google Scholar]
  21. Jamshidi M. Tools for intelligent control: fuzzy controllers, neural networks and genetic algorithms. Philos Trans R Soc Lond A. 2003;361:1781–1808. doi: 10.1098/rsta.2003.1225. [DOI] [PubMed] [Google Scholar]
  22. Jiang DD, Zhao ZY, Xu ZZ, Yao CP, Xu HW. How to reconstruct end-to-end traffic based on time-frequency analysis and artificial neural network. Aeu-Int J Electron Commun. 2014;68:915–925. doi: 10.1016/j.aeue.2014.04.011. [DOI] [Google Scholar]
  23. Kamimura R, Konstantinov K, Stephanopoulos G. Knowledge-based systems, artificial neural networks and pattern recognition: applications to biotechnological processes. Curr Opin Biotechnol. 1996;7:231–234. doi: 10.1016/S0958-1669(96)80018-8. [DOI] [PubMed] [Google Scholar]
  24. Krogh A. What are artificial neural networks? Nat Biotechnol. 2008;26:195–197. doi: 10.1038/nbt1386. [DOI] [PubMed] [Google Scholar]
  25. Leardi R. Genetic algorithms in chemistry. J Chromatogr A. 2007;1158:226–233. doi: 10.1016/j.chroma.2007.04.025. [DOI] [PubMed] [Google Scholar]
  26. Leung CS, Sum JP. A fault-tolerant regularizer for RBF networks. IEEE Trans Neural Netw. 2008;19:493–507. doi: 10.1109/TNN.2007.912320. [DOI] [PubMed] [Google Scholar]
  27. Li J, Pan P, Huang R, Shang H. A meta-analysis of voxel-based morphometry studies of white matter volume alterations in Alzheimer’s disease. Neurosci Biobehav Rev. 2012;36:757–763. doi: 10.1016/j.neubiorev.2011.12.001. [DOI] [PubMed] [Google Scholar]
  28. Lisboa PJ. A review of evidence of health benefit from artificial neural networks in medical intervention. Neural Netw. 2002;15:11–39. doi: 10.1016/S0893-6080(01)00111-3. [DOI] [PubMed] [Google Scholar]
  29. Lisboa PJ, Taktak AF. The use of artificial neural networks in decision support in cancer: a systematic review. Neural Netw. 2006;19:408–415. doi: 10.1016/j.neunet.2005.10.007. [DOI] [PubMed] [Google Scholar]
  30. Liu F, Wang J. Genetic algorithms and its application to spectral analysis. Guang Pu Xue Yu Guang Pu Fen Xi. 2001;21:331–335. [PubMed] [Google Scholar]
  31. Lovell BC, Bradley AP. The multiscale classifier. IEEE Trans Pattern Anal Mach Intell. 1996;18:124–137. doi: 10.1109/34.481538. [DOI] [Google Scholar]
  32. Macia J, Sole RV. Distributed robustness in cellular networks: insights from synthetic evolved circuits. J R Soc Interface. 2009;6:393–400. doi: 10.1098/rsif.2008.0236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Maddox J. Genetics helping molecular-dynamics. Nature. 1995;376:209. doi: 10.1038/376209a0. [DOI] [Google Scholar]
  34. Mahdiani HR, Fakhraie SM, Lucas C. Relaxed fault-tolerant hardware implementation of neural networks in the presence of multiple transient errors. IEEE Trans Neural Netw Learn Syst. 2012;23:1215–1228. doi: 10.1109/TNNLS.2012.2199517. [DOI] [PubMed] [Google Scholar]
  35. Mak SK, Sum PF, Leung CS. Regularizers for fault tolerant multilayer feedforward networks. Neurocomputing. 2011;74:2028–2040. doi: 10.1016/j.neucom.2010.09.025. [DOI] [Google Scholar]
  36. Manning T, Sleator RD, Walsh P. Naturally selecting solutions: the use of genetic algorithms in bioinformatics. Bioengineered. 2013;4:266–278. doi: 10.4161/bioe.23041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Medler DA, Dawson MR. Training redundant artificial neural networks: imposing biology on technology. Psychol Res. 1994;57:54–62. doi: 10.1007/BF00452996. [DOI] [PubMed] [Google Scholar]
  38. Meurice N, Leherte L, Vercauteren DP. Comparison of benzodiazepine-like compounds using topological analysis and genetic algorithms. SAR QSAR Environ Res. 1998;8:195–232. doi: 10.1080/10629369808039141. [DOI] [PubMed] [Google Scholar]
  39. Patel JL, Goyal RK. Applications of artificial neural networks in medical science. Curr Clin Pharmacol. 2007;2:217–226. doi: 10.2174/157488407781668811. [DOI] [PubMed] [Google Scholar]
  40. Pedersen JT, Moult J. Genetic algorithms for protein structure prediction. Curr Opin Struct Biol. 1996;6:227–231. doi: 10.1016/S0959-440X(96)80079-0. [DOI] [PubMed] [Google Scholar]
  41. Pena-Malavera A, Bruno C, Fernandez E, Balzarini M. Comparison of algorithms to infer genetic population structure from unlinked molecular markers. Stat Appl Genet Mol Biol. 2014;13:391–402. doi: 10.1515/sagmb-2013-0006. [DOI] [PubMed] [Google Scholar]
  42. Phatak DS, Koren I. Complete and partial fault-tolerance of feedforward neural nets. IEEE Trans Neural Netw. 1995;6:446–456. doi: 10.1109/72.363479. [DOI] [PubMed] [Google Scholar]
  43. Phatak DS, Koren I. Complete and partial fault tolerance of feedforward neural nets. IEEE Trans Neural Netw. 1995;6:446–456. doi: 10.1109/72.363479. [DOI] [PubMed] [Google Scholar]
  44. Pini L, Pievani M, Bocchetta M, Altomare D, Bosco P, Cavedo E, Galluzzi S, Marizzoni M, Frisoni GB. Brain atrophy in Alzheimer’s disease and aging. Ageing Res Rev. 2016;28:30002. doi: 10.1016/j.arr.2016.01.002. [DOI] [PubMed] [Google Scholar]
  45. Presnell SR, Cohen FE. Artificial neural networks for pattern recognition in biochemical sequences. Annu Rev Biophys Biomol Struct. 1993;22:283–298. doi: 10.1146/annurev.bb.22.060193.001435. [DOI] [PubMed] [Google Scholar]
  46. Protzel PW, Palumbo DL, Arras MK. Performance and fault-tolerance of neural networks for optimization. IEEE Trans Neural Netw. 1993;4:600–614. doi: 10.1109/72.238315. [DOI] [PubMed] [Google Scholar]
  47. Rajan P, Tolley DA. Artificial neural networks in urolithiasis. Curr Opin Urol. 2005;15:133–137. doi: 10.1097/01.mou.0000160629.81978.7a. [DOI] [PubMed] [Google Scholar]
  48. Rodrigues PL, Rodrigues NF, Pinho ACM, Fonseca JC, Correia-Pinto J, Vilaca JL. Automatic modeling of pectus excavatum corrective prosthesis using artificial neural networks. Med Eng Phys. 2014;36:1338–1345. doi: 10.1016/j.medengphy.2014.06.020. [DOI] [PubMed] [Google Scholar]
  49. Rothlauf F, Goldberg DE, Heinzl A. Network random keys: a tree representation scheme for genetic and evolutionary algorithms. Evol Comput. 2002;10:75–97. doi: 10.1162/106365602317301781. [DOI] [PubMed] [Google Scholar]
  50. Sasakawa T, Sawamoto J, Tsuji H. Neural network to control output of hidden node according to input patterns. Am J Intell Syst. 2014;4:196–203. [Google Scholar]
  51. Street ME, Buscema M, Smerieri A, Montanini L, Grossi E. Artificial neural networks, and evolutionary algorithms as a systems biology approach to a data-base on fetal growth restriction. Prog Biophys Mol Biol. 2013;113:433–438. doi: 10.1016/j.pbiomolbio.2013.06.003. [DOI] [PubMed] [Google Scholar]
  52. Sum J, Leung ACS. Prediction error of a fault tolerant neural network. Neurocomputing. 2008;72:653–658. doi: 10.1016/j.neucom.2008.05.009. [DOI] [Google Scholar]
  53. Tang W, Mao KZ, Mak LO, Ng GW (2010) Classification for overlapping classes using optimized overlapping region detection and soft decision. Paper presented at: information fusion
  54. Tchernev EB, Mulvaney RG, Phatak DS. Investigating the fault tolerance of neural networks. Neural Comput. 2005;17:1646–1664. doi: 10.1162/0899766053723096. [DOI] [PubMed] [Google Scholar]
  55. Weber L. Applications of genetic algorithms in molecular diversity. Curr Opin Chem Biol. 1998;2:381–385. doi: 10.1016/S1367-5931(98)80013-6. [DOI] [PubMed] [Google Scholar]
  56. Weiner MW, Veitch DP, Aisen PS, Beckett LA, Cairns NJ, Cedarbaum J, Green RC, Harvey D, Jack CR, Jagust W, et al. 2014 update of the Alzheimer’s disease neuroimaging initiative: a review of papers published since its inception. Alzheimers Dement. 2015;11:001. doi: 10.1016/j.jalz.2014.11.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Willett P. Genetic algorithms in molecular recognition and design. Trends Biotechnol. 1995;13:516–521. doi: 10.1016/S0167-7799(00)89015-0. [DOI] [PubMed] [Google Scholar]
  58. Wu AH. Use of genetic and nongenetic factors in warfarin dosing algorithms. Pharmacogenomics. 2007;8:851–861. doi: 10.2217/14622416.8.7.851. [DOI] [PubMed] [Google Scholar]
  59. Xiong H, Wu J, Liu L (2010) Classification with class overlapping: a systematic study. ICEBI-10
  60. Xu C, Xu C. Optimization analysis of dynamic sample number and hidden layer node number based on BP neural network. Berlin: Springer; 2013. [Google Scholar]
