Computational Intelligence and Neuroscience. 2019 Feb 3;2019:1383752. doi: 10.1155/2019/1383752

Using a Support Vector Machine Based Decision Stage to Improve the Fault Diagnosis on Gearboxes

Rodrigo P Monteiro 1, Mariela Cerrada 2, Diego R Cabrera 2, René V Sánchez 2, Carmelo J A Bastos-Filho 3
PMCID: PMC6378083  PMID: 30863433

Abstract

Gearboxes are mechanical devices that play an essential role in several applications, e.g., the transmission of automotive vehicles. Their malfunctioning may result in economic losses and accidents, among other consequences. The rise of powerful graphical processing units (GPUs) has spread the use of deep learning-based solutions to many problems, including fault diagnosis on gearboxes. Those solutions usually require a significant amount of data, high computational power, and a long training process. Training deep learning-based systems may not be feasible when GPUs are not available. This paper proposes a solution to reduce the training time of deep learning-based fault diagnosis systems without compromising their accuracy. The solution is based on the use of a decision stage to interpret all the probability outputs of a classifier whose output layer uses the softmax activation function. Two classification algorithms were applied to perform the decision. We reduced the training time by almost 80% without compromising the average accuracy of the fault diagnosis system.

1. Introduction

Gearboxes are mechanical devices that provide speed and torque conversion from rotating sources of power to other mechanisms. They have a crucial role in several applications, e.g., industrial rotating machines, automotive vehicles, and wind turbines. Their malfunction may not only impair the operation of a given system but also result in economic losses and safety risks [1]. This way, the use of fast and effective fault diagnosis techniques is necessary, since the early detection of failures allows more efficient management of the maintenance activities and leads to safer operation of the system [2].

Gearboxes may present several failure modes. Most of them are related to mechanical components and lubrication conditions. One failure mode that requires attention is the tooth breakage of gears, which is liable to compromise the machine operation in a significant way [3].

Supported by the advent of powerful computational devices, e.g., graphical processing units (GPUs), deep learning-based techniques have become essential tools in fault detection and fault diagnosis research fields. Their superior performance in applications related to classification and object detection tasks has also supported their popularization [4].

Plenty of works relating deep learning and fault diagnosis in gearboxes have arisen in recent years. Zhao et al. [5] proposed a variant of deep residual networks (DRNs) that uses dynamically weighted wavelet coefficients to improve the performance of the diagnostic process. Their work is motivated by the absence of a consensus about the most critical frequency bands regarding the useful information for systems that perform the diagnosis on planetary gearboxes. Their system finds discriminative sets of features by dynamically adjusting the weights applied to the wavelet packet coefficients. Cabrera et al. [6] proposed the use of a deep convolutional neural network (DCNN) trained in advance by a stacked convolutional autoencoder (SCAE) to determine fault severity in gearboxes. Their system performs unsupervised detection of hierarchical time-frequency patterns using the DCNN. The SCAE improves the DCNN performance by capturing a priori patterns.

Furthermore, Deutsch and He [7] use a feedforward deep belief network (DBN) to predict the remaining useful life of mechanical machines. They combine the self-taught feature-learning capability of DBNs with the predicting power of feedforward neural networks to extract features from vibration signals, assess the integrity of the machine, and make the prediction. Jiang et al. [8] proposed the use of stacked multilevel-denoising autoencoders to perform the fault diagnosis on the gearboxes of wind turbines. The features are learned through an unsupervised process, which is followed by a supervised fine-tuning process with the label information for classification. They also use multiple noise levels to train the autoencoder and enhance the feature learning and classification capabilities. Jiang et al. [9] also proposed a gearbox diagnosis system based on multiscale convolutional neural networks. They combined multiscale and hierarchical learning to capture information at different scales, improving the performance of the classifier.

Monteiro et al. [10] proposed a fault diagnosis system based on Fourier transform (FT) spectrograms and deep convolutional neural networks. They also discussed in their work the influence of the model depth and the amount of available training data on the network performance. Shao et al. [11] used transfer learning to perform fault diagnosis on mechanical machines. A DCNN model pretrained on ImageNet, followed by a fine-tuning process, carried out the fault diagnosis. Other works, e.g., Zeng et al. [12] and Liao et al. [13], use convolutional neural networks associated with the S-transform and the wavelet transform, respectively, to classify the gearbox health condition.

One of the major issues about deep learning-based solutions for fault diagnosis systems is their computational burden; e.g., the training process of deep models is often long and demands a large amount of training data. Such a setback is usually overcome by using computers with powerful GPUs, e.g., [12]. However, this sort of hardware is not always available to everyone. Thus, it is necessary to find alternative ways to reduce the computational cost of deep learning-based solutions without compromising their performance regarding accuracy.

This paper proposes the addition of a decision stage at the output of DCNN-based fault diagnosis systems, which are commonly based on classification algorithms [5, 10, 12, 13]. The outputs of those systems often represent the probabilities that a given input belongs to each failure mode in a given set. The failure mode that presents the highest probability value is chosen. Although this approach has proved reasonable for a number of applications, the information provided by the remaining outputs is usually lost.

We believe that this information can also be used to improve the performance of the classifier. The case study is the one analyzed in [6, 7], which poses the problem of the fault severity diagnosis related to the gear tooth breakage failure mode. The decision stage analyzes the outputs of all classes, i.e., severity levels, and decides the severity of the gearbox fault based on their probability distribution. Since this artifice improves the classification results, the same model architecture can be trained in fewer epochs without compromising its accuracy, thus reducing the training time. Decision stages, on the other hand, are well-known tools. They are commonly employed in multimodal and committee-based classification systems, where they combine the results obtained by multiple classifiers to improve the accuracy of the whole system [14, 15].

The remainder of this paper is organized as follows: Section 2 presents the theoretical background, Section 3 presents the details of the experiments carried out in this research, Section 4 presents the results obtained and discusses their relevance, and Section 5 summarizes the main findings and implications of this work.

2. Theoretical Background

2.1. Convolutional Neural Networks

Convolutional neural networks (CNNs) are models inspired by biological processes. The pattern of the connections among neurons, i.e., the processing units of neural networks, is similar to that of the animal visual cortex. They perform well on object recognition and classification tasks [16]. Object detection [17], disease detection [18], and fault diagnosis [6, 10] are three examples of applications that use CNNs. Their basic structure consists of an input layer, alternating blocks of convolutional and pooling layers, followed by fully connected layers and an output layer [16]. Modifications in this structure may occur, depending on the application. This structure is illustrated in Figure 1. The role of each layer is explained as follows:

  • (a)

    Input layer: this layer receives and stores raw input data. It also specifies the width, height, and number of channels of the input data [19].

  • (b)
    Convolutional layers: they learn feature representations from a set of input data and generate feature maps. Those maps are created by convolving their inputs with a set of learned weights. An activation function, e.g., the ReLU function, is applied to the output of the convolution step. The following equation shows the general formulation of a convolutional layer:
    x_j^l = f\left(\sum_{i \in M_j} x_i^{l-1} \ast k_{ji}^l + b_j^l\right), \quad (1)
    in which l refers to the current layer, i and j are the indices of the elements of the previous and current layers, respectively, M_j is a set of input maps, k_{ji}^l is the weight matrix of the i-th convolutional kernel of the l-th layer applied to the j-th input feature map, and b_j^l is the bias.
  • (c)
    Pooling layers: they reduce the spatial resolution of feature maps, improving the spatial invariance to input distortions and translations [19]. Most of the recent works employ a variation of this layer called the max pooling [16]. It propagates to the next layers the maximum value from a neighborhood of elements. This operation is defined by
    y_j^{rs} = \max_{(p,q) \in R_{rs}} x_k^{pq}, \quad (2)
    in which y_j^{rs} is the output of the pooling process regarding the j-th feature map and x_k^{pq} is the element at location (p, q) contained in the pooling region R_{rs}. The pooling process is also known as subsampling [19].
  • (d)

    Fully connected and output layers: they interpret feature representations and perform high-level reasoning [16]. They also compute the scores of each output class [19]. The number of output nodes depends on the number of classes [12].

Figure 1. Basic structure of a convolutional neural network [16].
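To make equations (1) and (2) concrete, the sketch below implements a single convolutional layer with ReLU activation and a max-pooling operation using NumPy and SciPy. It is only an illustration: the kernel sizes, map counts, and function names are assumptions, not taken from the paper.

```python
import numpy as np
from scipy.signal import correlate2d

def conv_layer(input_maps, kernels, biases):
    """Eq. (1): the j-th output map sums the correlations of every input map i
    with the kernel k_ji, adds the bias b_j, and applies the ReLU activation f."""
    output_maps = []
    for j, b in enumerate(biases):
        acc = sum(correlate2d(x, kernels[i][j], mode="valid")
                  for i, x in enumerate(input_maps))
        output_maps.append(np.maximum(acc + b, 0.0))  # ReLU
    return output_maps

def max_pool(feature_map, size=2):
    """Eq. (2): propagate the maximum value of each size-by-size region R_rs."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop incomplete border regions
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Toy usage: one 8x8 input map, two output maps with 3x3 kernels
rng = np.random.default_rng(0)
maps = conv_layer([rng.standard_normal((8, 8))],
                  kernels=[[rng.standard_normal((3, 3)) for _ in range(2)]],
                  biases=[0.1, -0.1])
pooled = [max_pool(m) for m in maps]
print(pooled[0].shape)  # (3, 3): the 6x6 convolution output pooled by 2x2 regions
```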

2.2. Fourier Transform Spectrograms

The Fourier transform (FT) is an essential technique in the field of signal analysis. It informs the frequency composition of a given signal, as well as the magnitude contribution of each frequency [20]. Noise filtering, pattern recognition, and signal modulation are some applications that may be improved by the Fourier transform and its variants, e.g., the discrete Fourier transform (DFT), suitable for processing digital signals, and the fast Fourier transform (FFT), a more efficient algorithm to calculate the DFT [21].

The Fourier transform spectrograms represent signals using time, frequency, and magnitude information. The short-time Fourier transform (STFT) is an FT variant commonly used to generate this sort of representation because it performs time-dependent spectral analyses [21]. The spectrograms show how the spectrum of frequencies of a given signal varies over time. Spectrograms are also used in fault diagnosis applications [10, 22].
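As a minimal illustration of the frequency analysis described above, the snippet below uses NumPy's FFT to recover the dominant frequency of a synthetic two-tone signal. The sampling rate and tone frequencies are arbitrary choices for the example, not values from the paper.

```python
import numpy as np

fs = 1000                                    # sampling rate in Hz (illustrative)
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 8 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

spectrum = np.abs(np.fft.rfft(x)) / len(x)   # magnitude of each frequency component
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
print(freqs[np.argmax(spectrum)])            # prints 8.0, the dominant frequency
```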

2.3. Support Vector Machines

The support vector machine (SVM) is a versatile and powerful machine learning technique [23]. It can be used to solve classification (both linear and nonlinear), regression, and even outlier detection problems, making it one of the most popular machine learning algorithms [23, 24]. Its use is also popular in fault diagnosis of rotating machinery [25]. This technique aims to identify hyperplanes capable of separating datasets in high-dimensional feature spaces. The distance between the separating hyperplane and the closest data points is called the margin, and the SVM maximizes this margin [23].

A linearly separable dataset allows the SVM to define hyperplanes capable of separating the data into categories, regardless of the number of dimensions presented by the feature space. However, in most applications, the information is not linearly separable in feature spaces with a given dimensionality. Thus, it is necessary to map the dataset into a feature space with a higher number of dimensions, in which the data will be linearly separable. This mapping process is performed by using kernels, e.g., polynomial and radial basis function kernels [23, 24].
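As a small illustration of the kernel idea, the sketch below fits scikit-learn's SVC with an RBF kernel to a toy dataset that is not linearly separable in its original two-dimensional space. The dataset and parameters are arbitrary choices for the example.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not separable by any hyperplane in the original 2-D space
X, y = make_circles(n_samples=500, factor=0.4, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf")           # the RBF kernel implicitly maps the data to a higher-dimensional space
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # close to 1.0 on this toy problem
```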

2.4. Multilayer Perceptron

The multilayer perceptron (MLP) is a feedforward neural network. MLPs can distinguish nonlinearly separable patterns. They consist of several nodes, named "neurons," arranged in multiple layers as in a directed graph, with each layer fully connected to the subsequent one. Those layers are usually divided into three types: input, hidden, and output layers. Multilayer perceptrons are considered universal approximators: an MLP with one hidden layer and enough neurons can approximate any continuous function [23, 24].

3. Materials and Methods

3.1. Experimental Setup: Obtaining the Vibration Signals

We arranged the experimental setup according to Figure 2. It was used to obtain the vibration measurements of the gearbox. The electric motor (M) drives the gearbox, composed of two gears (Z1 and Z2). Those gears are mounted on independent shafts. A magnetic brake (B) is connected to the output shaft. Table 1 lists some features of those components.

Figure 2. Experimental setup [10]. A, accelerometer; B, brake system; M, motor; E, encoder; Z1, first gear; Z2, second gear.

Table 1.

Characteristics of the components of the experimental arrangement.

Component Description
Electric motor (M) Motor Siemens 1LA7 090-4YA60: 1.49 kW, 4 poles, and 28.33 Hz
Gear 1 (Z1) Pinion: 76 mm, 30 teeth, pressure angle = 20°, and helix angle = 20°
Gear 2 (Z2) Gear: 112 mm, 45 teeth, pressure angle = 20°, and helix angle = 20°
Brake (B) Magnetic brake: proportional force to input voltage and belt coupled

Besides, a Danfoss VLT 1.5 kW speed drive drives the electric motor, and a TDK Lambda voltage source (GEN 150-10, 0–150 V, 10 A) drives the magnetic brake. A unidirectional accelerometer (A), vertically placed on the gearbox close to the input shaft, collects the vibration signals. This accelerometer is an IMI Sensor 603C01, 100 mV/g. An NI 9234 data acquisition card performs the digitization of the analog signals. This card has a 24-bit resolution and a 50 kHz sampling rate and is specifically designed for piezoelectric sensors.

As previously mentioned, the proposed experiment aims to use the vibration signals of the gearbox to evaluate the severity of tooth breakage faults in helical gears. For this purpose, one tooth of the helical gear Z1 was subjected to different damage levels. On the other hand, gear Z2 has not been modified. Ten scenarios were taken into account, i.e., one for the gear Z1 unbroken and the others regarding nine fault severity levels of gear Z1. Those scenarios are listed in Figure 3 and Table 2.

Figure 3. Ten scenarios of gear tooth breakage. (a) Class P1. (b) Class P2. (c) Class P3. (d) Class P4. (e) Class P5. (f) Class P6. (g) Class P7. (h) Class P8. (i) Class P9. (j) Class P10.

Table 2.

Damage severity levels of gear Z1 tooth breakage fault.

Code Damage (mm) Tooth percentage (%)
P1 0.00 100.00
P2 2.37 88.42
P3 4.00 80.42
P4 5.73 71.94
P5 7.60 62.81
P6 10.57 48.29
P7 12.37 39.48
P8 14.33 29.85
P9 17.15 14.36
P10 20.43 0.00

We have also considered the gearbox working under different operating conditions; that is, we took into account different loads and rotation speeds. The rotation speed had five scenarios, being constant in three of them and variable in the other two. The load applied by the magnetic braking system, on the other hand, had three scenarios, in which the load presented constant values. Those scenarios are detailed in Tables 3 and 4, for the rotation speeds and the loads, respectively.

Table 3.

Rotation frequencies.

Code Frequency profile Rotation frequency (Hz)
F1 Constant 8
F2 Constant 12
F3 Constant 15
F4 Sine period = 2 s 8–15
F5 Square period = 2 s 8–15

Table 4.

Loads.

Code Voltage supplied to the magnetic brake
L1 0 V
L2 10 V
L3 30 V

We acquired each sample of vibration signal over a time interval of 10 s. Furthermore, we performed each combined scenario three times. This way, the database was originally composed of 45 signals for each fault severity, that is, a balanced database of 450 signals considering all the ten severity levels. The magnitude of those signals was normalized to the range [0, 1], and the signals were divided into 0.25-second excerpts, resulting in 1,800 signals for each fault severity level and 18,000 signals in the whole balanced database.
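A minimal sketch of this preprocessing, assuming the raw signals are stored as NumPy arrays; the 50 kHz sampling rate comes from the acquisition card, so each 0.25 s excerpt holds 12,500 samples and each 10 s signal yields 40 excerpts.

```python
import numpy as np

def preprocess(signal, fs=50_000, excerpt_seconds=0.25):
    """Normalize a vibration signal to [0, 1] and split it into fixed-length excerpts."""
    normalized = (signal - signal.min()) / (signal.max() - signal.min())
    samples_per_excerpt = int(excerpt_seconds * fs)      # 12,500 samples
    n_excerpts = len(normalized) // samples_per_excerpt  # 40 excerpts per 10 s signal
    trimmed = normalized[: n_excerpts * samples_per_excerpt]
    return trimmed.reshape(n_excerpts, samples_per_excerpt)

excerpts = preprocess(np.random.randn(500_000))  # a dummy 10 s signal sampled at 50 kHz
print(excerpts.shape)                            # (40, 12500)
```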

3.2. Experimental Setup: Training the Classification System

The system proposed to assess the fault severity of the gearbox is based on a deep convolutional neural network architecture. Thus, a bidimensional representation of the input signals was necessary. We chose to represent them in the time-frequency domain, since this sort of representation allows visualizing when the specific frequency components related to the failure arise.

The short-time Fourier transform was the technique we used to generate the bidimensional representation of the signals, i.e., the Fourier transform spectrograms. The STFT has a lower computational cost than other time-frequency representation techniques [26]. This characteristic is especially important to the proposed system since we are dealing with a real-time application. The STFT configuration included a Hamming window of size 128 and an overlap of 50%. These choices combine the selectivity of the Hamming window with a balance between the smoothness of the resulting representation and a low computational cost.

Two experimental scenarios were designed. In the first one, the signal information was condensed into 175 × 175 pixel RGB images. This sort of data allows more information to be provided to the fault severity assessment system, since artifices such as colormaps can be used. On the other hand, it increases the computational burden of the system because the input has three channels. In the second scenario, we used 175 × 175 pixel grayscale images. Unlike the previous scenario, the only information provided by the spectrograms is the magnitude of the Fourier transform. Figure 4 shows an example of a spectrogram obtained by the described process.

Figure 4. Spectrogram obtained by using the STFT.
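A possible implementation of the spectrogram image generation described above, using SciPy's STFT with the Hamming window of 128 samples and 50% overlap; the colormap (viridis) and the resizing approach are assumptions made for illustration, since the paper does not specify them.

```python
import numpy as np
from scipy.signal import stft
from matplotlib import cm
from PIL import Image

def spectrogram_image(excerpt, fs=50_000, rgb=True, size=(175, 175)):
    # STFT with a Hamming window of 128 samples and 50% (64-sample) overlap
    _, _, Z = stft(excerpt, fs=fs, window="hamming", nperseg=128, noverlap=64)
    mag = np.abs(Z)
    mag = (mag - mag.min()) / (mag.max() - mag.min() + 1e-12)        # scale to [0, 1]
    if rgb:
        pixels = (cm.viridis(mag)[..., :3] * 255).astype(np.uint8)   # colormap -> 3 channels
    else:
        pixels = (mag * 255).astype(np.uint8)                        # magnitude only, 1 channel
    return np.asarray(Image.fromarray(pixels).resize(size))

image = spectrogram_image(np.random.randn(12_500))  # one 0.25 s excerpt
print(image.shape)                                  # (175, 175, 3)
```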

The classification system used in this work is composed of three convolutional layers, three max-pooling layers, one fully connected layer, and one output layer. The softmax activation function was used in the neurons of the output layer, so that its outputs are probability values between 0 and 1, and the ReLU activation function was used in the neurons of the remaining layers. This architecture provided a satisfactory performance in the fault severity assessment of gearboxes, according to Monteiro et al. [10]. It is illustrated in Figure 5. In addition, a support vector machine was used to analyze the output of the system and improve its performance. This algorithm was already used in similar applications, such as the one proposed by Li et al. [27], which employed an SVM to merge the results of classifiers that act on multimodal data.

Figure 5. Network architecture.
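A Keras sketch of the described architecture is shown below. The numbers of filters, kernel sizes, and the width of the fully connected layer are not specified in the text above and are therefore illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dcnn(input_shape=(175, 175, 3), n_classes=10):
    """Three conv + max-pooling blocks, one fully connected layer, softmax output."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),   # filter counts and kernel sizes are assumed
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),      # fully connected layer (width assumed)
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```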

Regarding the training step, the data were split into three groups: training, test, and validation. We used the validation dataset to reduce the occurrence of overfitting-related problems. They contain, respectively, 50%, 25%, and 25% of the signals, balanced across fault severity levels. The training process was performed according to 10- and 50-epoch scenarios. The configuration of the computer used to train the model was OS Windows 10 Home, 64 bits, 15.9 GB of memory (RAM), an Intel® Core™ i7-6500 CPU @ 2.50 GHz × 2, and an AMD Radeon™ T5 M330 (no CUDA support).
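Continuing the architecture sketch above, the split and training procedure could look like the following; the dummy arrays, optimizer, and batch size are assumptions, since the paper does not report them.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the real spectrogram dataset: replace with the 18,000 images and labels.
rng = np.random.default_rng(0)
X = rng.random((1000, 175, 175, 3), dtype=np.float32)
y = rng.integers(0, 10, size=1000)

# 50% training, 25% validation, 25% test, balanced per fault severity class
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.50, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=0)

model = build_dcnn()                     # from the architecture sketch above
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=10,                     # the 10-epoch scenario; 50 for the longer one
          batch_size=32)                 # batch size assumed
print(model.evaluate(X_test, y_test))
```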

4. Results and Discussion

The first discussion regards the training time of a fault diagnosis system based on deep convolutional neural networks. It is well known that computers with GPUs can handle the computational burden of deep learning solutions much better than those with only CPUs. On the other hand, computers with GPUs are more expensive, meaning that they are not always available. Table 5 illustrates this problem. It shows the average training time of 30 DCNNs with the architecture mentioned in the last section, for computers with different configurations and the RGB image dataset. The models were trained for 50 epochs. We trained this number of models to guarantee the statistical relevance of the results. The first computer configuration (GPU computer) was used by Monteiro et al. [10]. It consisted of OS Ubuntu 16.04 LTS, 64 bits, 15.6 GB of memory, an Intel Xeon(R) E5-2609 v3 CPU @ 1.90 GHz × 12, and Gallium 0.4 on NV117 graphics. The second one was presented in the previous section.

Table 5.

Average training time of DCNNs with different computer configurations.

Configuration Average training time (s)
Computer with GPU 815
Computer with CPU 10,601

One can observe that the training process on the computer without the GPU was much longer than on the computer with the GPU; i.e., it was about 13 times longer. In some situations, depending on the amount of data or time available, the use of computers without GPUs can be impractical.

Some choices can be made to overcome this problem. Reducing the number of training epochs is one of them. As can be observed in Table 6, reducing the number of training epochs from 50 to 10 decreased the average training time by about 78.7%. On the other hand, such a reduction had a performance cost: the average accuracy decreased by about 1.8%. This behavior was expected, since the models had fewer iterations to learn the features of the training data. The results in Table 6 were obtained by training 30 DCNNs in each scenario.

Table 6.

Average accuracy and average training time for models trained with 50 and 10 epochs.

Number of training epochs Average accuracy (%) Average training time (s)
50 96.64 10,601
10 94.84 2,263

Before proposing modifications to the fault diagnosis system, it is necessary to identify the major difficulties of the model. We have taken the model trained for 10 epochs as the reference. Table 7 lists the average and standard deviation values of the accuracy for 30 models. One can observe that for some classes, e.g., P1 and P2, the system presented high accuracy values, i.e., close to 100%. On the other hand, the models presented a poor performance for the input data from classes such as P6 and P7.

Table 7.

Average accuracy and standard deviation of 30 models trained for 10 epochs.

Class Average accuracy (%) Standard deviation
P1 100 0
P2 99.79 0.32
P3 98.30 6.20
P4 95.92 4.47
P5 94.32 7.66
P6 89.89 12.30
P7 83.79 13.03
P8 94.10 7.86
P9 96.50 4.60
P10 95.87 7.26

This analysis can be deepened by observing the outputs of the classifier. Figure 6 shows how the output probabilities of the models are distributed according to the class of the input image. Regarding the inputs belonging to class P1, one can observe that the output probabilities of the models were very close to 1 for class P1 and very close to 0 for the others. This helps to understand why the model accuracy for this class was 100%. On the other hand, the distribution profiles belonging to classes P6 and P7 show that the outputs of the networks were not as accurate as in the previous case. Indeed, choosing only the output that presents the highest probability value can lead to wrong classifications due to the significant presence of outliers.

Figure 6. Output probabilities of models trained for 10 epochs, for all the ten classes. (a) Class P1. (b) Class P2. (c) Class P3. (d) Class P4. (e) Class P5. (f) Class P6. (g) Class P7. (h) Class P8. (i) Class P9. (j) Class P10.

To overcome this issue, we proposed a solution based on using the output probabilities of all the ten classes to perform the correct classification. Such a solution can be implemented in several ways. One of them is the use of shallow classifiers, e.g., a multilayer perceptron or a support vector machine. Those classifiers identify the gear severity class by using the information contained in the output probabilities of the deep convolutional neural network. Thus, the system response is obtained by analyzing a probability distribution, and not only a single value.

We used the support vector machine in this research. It was trained with the outputs of the DCNN regarding the training data previously established. The results of the proposed modification are listed in Tables 8 and 9. Table 8 shows the average results of each class, regarding 30 models, and compares them with the results of the scenario without the additional classifier. Table 9 shows the average results regarding all classes and models, also comparing to the scenario without the additional classifier.
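A minimal sketch of the decision stage, continuing the training sketch above: the DCNN's ten softmax probabilities for each training sample become the input features of an SVM. Scikit-learn's SVC with its default RBF kernel is assumed here, since the paper does not state the kernel used.

```python
from sklearn.svm import SVC

# Softmax outputs of the trained DCNN, shape (n_samples, 10)
train_probs = model.predict(X_train)
test_probs = model.predict(X_test)

decision_stage = SVC(kernel="rbf")        # kernel choice assumed
decision_stage.fit(train_probs, y_train)

# Final severity prediction from the full probability distribution,
# instead of simply taking the argmax over the ten outputs.
y_pred = decision_stage.predict(test_probs)
print((y_pred == y_test).mean())          # accuracy of the combined system
```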

Table 8.

Average accuracy and standard deviation of models that present or not an additional classifier.

Class Without additional classifier With additional classifier
Average accuracy (%) Standard deviation Average accuracy (%) Standard deviation
P1 100 0 100 0
P2 99.79 0.32 99.79 0.32
P3 98.30 6.20 99.42 0.79
P4 95.92 4.47 97.74 1.71
P5 94.32 7.66 97.56 1.89
P6 89.89 12.30 94.36 4.29
P7 83.79 13.03 92.70 3.36
P8 94.10 7.86 96.71 1.84
P9 96.50 4.60 97.78 1.66
P10 95.87 7.26 97.96 1.51

Table 9.

Average accuracy and average training time for models that present or not an additional classifier.

Average accuracy (%) Average training time (s)
Without classifier 94.85 2,263
With classifier 97.4 2,264

From Table 8, one can infer that the inclusion of a classifier improved the model performance regarding all the 10 classes, both concerning average accuracy and standard deviation. Also, from Table 9, one can observe that the average accuracy increased by about 2.56% at the cost of adding less than 1 second to the average training time. These results are even more significant when compared with those obtained from the training process with 50 epochs, seen in Table 6. The average accuracy of the proposed model was only 0.76% higher, but with an average training time 78.64% shorter. This made the training of the fault diagnosis system significantly faster. Furthermore, we employed two additional metrics to ensure the reliability of the obtained results: the F-score and the AUC. The first is the harmonic mean of precision and recall. The second is defined as the area under the receiver operating characteristic (ROC) curve. Their average values are listed in Table 10, and both show an improvement trend aligned with the one observed in Table 9; i.e., the diagnosis system with the classifier presented metric values about 2% higher than the system without it.
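These two metrics can be computed, for example, with scikit-learn as in the sketch below; the macro averaging and the one-vs-rest AUC are assumptions, since the paper does not state the averaging scheme used.

```python
from sklearn.metrics import f1_score, roc_auc_score

# y_test: true severity classes; y_pred: predictions of the decision stage;
# test_probs: DCNN softmax outputs, shape (n_samples, 10), from the sketch above
f_score = f1_score(y_test, y_pred, average="macro")
auc = roc_auc_score(y_test, test_probs, multi_class="ovr", average="macro")
print(f_score, auc)
```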

Table 10.

Average F-score and average AUC for models that present or not an additional classifier.

Average F-score Average AUC
Without classifier 0.95 0.97
With classifier 0.97 0.99

Regarding the average time to perform the classification of one single input, the addition of a decision stage did not cause significant changes. Indeed, the average classification time, which was about 0.03 seconds without the decision stage, increased less than 0.001 seconds.

To evaluate how significant the improvements provided by the proposed solution were, the Wilcoxon signed-rank statistical test was applied to the outputs of the fault diagnosis systems with and without the decision stage. The results are listed in Table 11. The Wilcoxon test is a nonparametric hypothesis test that can be used to assess whether two distributions are equivalent [28]. If they are not, there was a statistically significant improvement (Λ symbol). Otherwise, the improvement was not significant (— symbol). Table 11 shows that we had significant improvements for classes P4, P5, P6, P7, and P9. Although the remaining classes also showed some improvement, it was not statistically significant.
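The test can be applied with SciPy as sketched below; the paired samples are assumed to be the per-model accuracies for a given class, with and without the decision stage, and the numbers shown are illustrative, not the paper's data.

```python
import numpy as np
from scipy.stats import wilcoxon

# Per-model accuracies for a single class, with and without the decision stage
# (illustrative values only).
acc_without = np.array([83.8, 85.1, 79.2, 88.0, 81.5, 84.3, 86.9, 80.7, 82.4, 87.2])
acc_with    = np.array([92.7, 93.1, 90.5, 94.0, 91.8, 92.2, 93.6, 91.0, 92.9, 93.3])

stat, p_value = wilcoxon(acc_without, acc_with)
print("significant" if p_value < 0.05 else "not significant")  # Λ vs. — in Tables 11 and 19
```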

Table 11.

Results of the Wilcoxon signed-rank test comparing the results with and without the decision stage.

Class Wilcoxon result
P1 —
P2 —
P3 —
P4 Λ
P5 Λ
P6 Λ
P7 Λ
P8 —
P9 Λ
P10 —

We also analyzed the performance of the decision stage when a different classification algorithm was applied. This algorithm was the multilayer perceptron, i.e., a neural network. We set the size of the input layer equal to 10, the hidden layer with 21 neurons, and the output layer with one neuron. We deployed the logistic sigmoid as the activation function. We set the number of training epochs equal to 200. The number of training, test, and validation samples remained the same as in the SVM scenario. We trained 30 MLPs to guarantee the statistical relevance of the results. We show the results for the SVM and MLP in Tables 12 and 13. Table 12 lists the average values of F1-score, AUC, and accuracy. Table 13 lists the average training and running times.
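A possible configuration of the MLP decision stage using scikit-learn is sketched below, reusing the DCNN outputs from the SVM sketch; note that MLPClassifier handles the ten-class output internally, so it does not reproduce the single output neuron literally, and the solver is an assumption.

```python
from sklearn.neural_network import MLPClassifier

mlp_stage = MLPClassifier(hidden_layer_sizes=(21,),   # 21 hidden neurons
                          activation="logistic",      # logistic sigmoid activation
                          max_iter=200,               # up to 200 training epochs
                          random_state=0)
mlp_stage.fit(train_probs, y_train)                   # same DCNN outputs used for the SVM
print(mlp_stage.score(test_probs, y_test))
```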

Table 12.

Average F-score, AUC, and accuracy for models that present MLPs and SVMs as decision stages.

Average F-score Average AUC Average accuracy (%)
SVM 0.97 0.99 97.4
MLP 0.97 0.99 97.25

Table 13.

Average training and running times for models that present MLPs and SVMs as decision stages.

Average training time (s) Average running time (s)
SVM 0.33 0.00022
MLP 2.35 0.00001

Regarding the metrics, despite the slight advantage presented by the SVM decision stage concerning accuracy, a Wilcoxon test suggested that the results achieved by both algorithms were statistically equivalent. This means that the proposed solution can be implemented with classifiers other than the SVM. On the other hand, an interesting fact arises from the training and running times. The training process of the MLP decision stage was about 7.1 times longer than that of the SVM, whereas its running time was more than 20 times shorter. The proper choice in this trade-off depends on the contemplated application.

Those analyses with the SVM decision stage were also performed with the grayscale spectrogram images. They aimed to evaluate how the fault diagnosis system would deal with the reduction of the available information. Table 14 shows the average accuracy and the average training time for 30 models belonging to each scenario. These scenarios regarded models trained with RGB and grayscale images, for 10 epochs. Only the test data were employed in the calculation of the average accuracy. One can observe that the models trained with RGB images provided the best accuracy results, i.e., about 10% higher. On the other hand, the average training time of the models trained with grayscale images was lower, probably due to the smaller amount of information being processed.

Table 14.

Average accuracy and average training time for models trained with RGB and grayscale images.

Scenario Average accuracy (%) Average training time (s)
Grayscale 84.98 1,967
RGB 94.84 2,263

Table 15 and Figure 7 help to understand what happens to the performance of the classification model trained using the grayscale images, regarding all the ten classes. One can observe that, compared with the RGB scenario, even the results of classes like P1 worsened, for both average accuracy and standard deviation. Figure 7 shows how the output probabilities of the DCNN are distributed according to the class of the input image. The number of outliers increased significantly compared with the RGB scenario. This behavior occurred for all classes.

Table 15.

Average accuracy and standard deviation of models trained for 10 epochs using grayscale images.

Class Average accuracy (%) Standard deviation
P1 99.98 0.07
P2 98.35 1.33
P3 93.04 6.96
P4 81.58 12.42
P5 75.50 15.05
P6 69.13 19.51
P7 74.30 15.03
P8 82.70 7.16
P9 85.47 8.51
P10 89.81 9.57

Figure 7. Output probabilities of models trained for 10 epochs using grayscale images, for all the ten classes. (a) Class P1. (b) Class P2. (c) Class P3. (d) Class P4. (e) Class P5. (f) Class P6. (g) Class P7. (h) Class P8. (i) Class P9. (j) Class P10.

By employing the decision stage in this new scenario, we also intend to evaluate whether the proposed solution can improve the performance of the DCNN-based classifier by using the output probabilities of all the 10 classes.

The results obtained by using a support vector machine trained with the outputs of the deep convolutional neural networks are listed in Tables 16 and 17. Table 16 shows the average results for each class regarding 30 trained models. Those results are compared with those of the scenario without the additional classifier. Table 17 shows the average results regarding all classes and models, also comparing them with the scenario without the additional stage. From Table 16, one can infer that the inclusion of a classifier improved the model performance regarding all the ten classes, both regarding average accuracy and standard deviation. On the other hand, even these improved systems were not capable of outperforming those trained with the RGB spectrogram images, as seen in Table 14. Besides, from Table 17, one can observe that the average accuracy increased by about 4.18% at the cost of adding less than 1 second to the average training time. This relative improvement was superior to the one observed in the previous experimental scenario. Table 18 shows the average F-score and AUC for the systems with and without the decision stage. These two metrics reinforce the improvement trend caused by the addition of a decision stage.

Table 16.

Average accuracy and standard deviation of models that present or not an additional classifier (grayscale scenario).

Class Without additional classifier With additional classifier
Average accuracy (%) Standard deviation Average accuracy (%) Standard deviation
P1 99.98 0.07 99.99 0.06
P2 98.35 1.33 99.50 0.65
P3 93.04 6.96 94.73 1.78
P4 81.58 12.42 86.64 3.82
P5 75.50 15.05 81.26 3.54
P6 69.13 19.51 79.60 4.14
P7 74.30 15.03 81.61 2.47
P8 82.70 7.16 86.16 1.83
P9 85.47 8.51 88.46 1.41
P10 89.81 9.57 94.70 2.89

Table 17.

Average accuracy and average training time for models that present or not an additional classifier (grayscale scenario).

Average accuracy (%) Average training time (s)
Without classifier 84.98 1,967
With classifier 89.16 1,968

Table 18.

Average F-score and average AUC for models that present or not an additional classifier (grayscale scenario).

Average F-score Average AUC
Without classifier 0.85 0.92
With classifier 0.89 0.94

Regarding the average classification time, in this scenario, the addition of a decision stage did not cause significant changes either. It increased from about 0.022 seconds to less than 0.023 seconds.

To evaluate how significant the improvement provided by the proposed solution was in this new scenario, the Wilcoxon signed-rank statistical test was also applied to the outputs of the fault diagnosis systems with and without the decision stage. The results are listed in Table 19, which shows that we had significant improvements for classes P4, P5, P6, P7, P8, and P10. Once again, although the other classes also showed improvements, they were not significant.

Table 19.

Results of the Wilcoxon signed-rank test for the grayscale scenario, comparing the results with and without the decision stage.

Class Wilcoxon result
P1 —
P2 —
P3 —
P4 Λ
P5 Λ
P6 Λ
P7 Λ
P8 Λ
P9 —
P10 Λ

5. Conclusions

We analyzed the use of a decision stage to interpret the outputs of a fault diagnosis system based on deep convolutional neural networks. Those outputs correspond to the probability that an input belongs to each class of a given set. This way, instead of using a conventional approach like choosing the class with the highest probability value, we analyzed the output distribution of the deep classifier to perform a more reliable fault diagnosis.

The results have shown that we could improve the accuracy of the classification system and reduce the training time by almost 80% without compromising the execution time, which increased by about 0.001 seconds. This improvement is especially significant for situations in which powerful hardware, e.g., graphical processing units, is not available. Thus, a fault diagnosis system with a given accuracy value can be obtained by using only a small fraction of the training time that would otherwise be required. Those results were achieved by using the SVM as the decision maker, which had the output probabilities of the original fault diagnosis system as input information. Similar results were achieved by implementing an MLP as the decision maker, suggesting that the proposed solution can also be implemented with algorithms other than the SVM.

We also assessed the use of RGB and grayscale input spectrograms. Although the addition of a decision stage caused improvements in both scenarios, these improvements presented different magnitudes. The accuracy of the systems operating on grayscale images increased more than that of the systems operating on RGB images. However, the final accuracy of the systems trained on RGB images was superior. This behavior can be explained by the amount of information available in each kind of image, as previously discussed. Furthermore, the difference between the execution times in the two scenarios was not significant in absolute terms, suggesting that using RGB images would not compromise the operation of the system in real-time applications.

Future work includes applying this fault diagnosis methodology to other kinds of failures and to problems from different physical domains, e.g., fault diagnosis using acoustics. Moreover, the performance of other algorithms employed as decision makers can be evaluated.

Data Availability

The data used to support the results of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

1. Chinniah Y. Analysis and prevention of serious and fatal accidents related to moving parts of machinery. Safety Science. 2015;75:163–173. doi: 10.1016/j.ssci.2015.02.004.
2. Yu J., Yip L., Makis V. Notice of retraction: wavelet analysis with time-synchronous averaging of planetary gearbox vibration data for fault detection, diagnostics, and condition based maintenance. Proceedings of the 2010 2nd International Conference on Mechanical and Electronics Engineering (ICMEE); August 2010; Kyoto, Japan.
3. Nie M., Wang L. Review of condition monitoring and fault diagnosis technologies for wind turbine gearbox. Procedia CIRP. 2013;11:287–290. doi: 10.1016/j.procir.2013.07.018.
4. Richter J., Streitferdt D., Rozova E. On the development of intelligent optical inspections. Proceedings of the 2017 IEEE 7th Annual Computing and Communication Workshop and Conference (CCWC); January 2017; Las Vegas, NV, USA.
5. Zhao M., Kang M., Tang B., Pecht M. Deep residual networks with dynamically weighted wavelet coefficients for fault diagnosis of planetary gearboxes. IEEE Transactions on Industrial Electronics. 2018;65(5):4290–4300. doi: 10.1109/tie.2017.2762639.
6. Cabrera D., Sancho F., Li C., et al. Automatic feature extraction of time-series applied to fault severity assessment of helical gearbox in stationary and non-stationary speed operation. Applied Soft Computing. 2017;58:53–64. doi: 10.1016/j.asoc.2017.04.016.
7. Deutsch J., He D. Using deep learning-based approach to predict remaining useful life of rotating components. IEEE Transactions on Systems, Man, and Cybernetics. 2018;48(1):2168–2216. doi: 10.1109/tsmc.2017.2697842.
8. Jiang G., He H., Xie P., Tang Y. Stacked multilevel-denoising autoencoders: a new representation learning approach for wind turbine gearbox fault diagnosis. IEEE Transactions on Instrumentation and Measurement. 2017;66(9):2391–2402. doi: 10.1109/tim.2017.2698738.
9. Jiang G., He H., Yan J., Xie P. Multiscale convolutional neural networks for fault diagnosis of wind turbine gearbox. IEEE Transactions on Industrial Electronics. 2018;66(4):3196–3207. doi: 10.1109/tie.2018.2844805.
10. Monteiro R. P., Cerrada M., Bastos-Filho C. J. A., Cabrera D., Sánchez R. V. Convolutional neural networks using Fourier transform spectrogram to classify the severity of gear tooth breakage. Proceedings of the International Conference on Sensing, Diagnostics, Prognostics, and Control; August 2018; Xi'an, China.
11. Shao S., McAleer S., Yan R., Baldi P. Highly-accurate machine fault diagnosis using deep transfer learning. IEEE Transactions on Industrial Informatics. 2018. doi: 10.1109/tii.2018.2864759.
12. Zeng X., Liao Y., Li W. Gearbox fault classification using S-transform and convolutional neural network. Proceedings of the 2016 10th International Conference on Sensing Technology (ICST); November 2016; Nanjing, China.
13. Liao Y., Zeng X., Li W. Wavelet transform based convolutional neural network for gearbox fault classification. Proceedings of the Prognostics and System Health Management Conference (PHM-Harbin), 2017; July 2017; Harbin, China.
14. Mangai U., Samanta S., Das S., Chowdhury P. A survey of decision fusion and feature fusion strategies for pattern classification. IETE Technical Review. 2010;27(4):293–307. doi: 10.4103/0256-4602.64604.
15. Niu G. Data-Driven Technology for Engineering Systems Health Management. Beijing, China: Springer; 2017.
16. Rawat W., Wang Z. Deep convolutional neural networks for image classification: a comprehensive review. Neural Computation. 2017;29(9):2352–2449. doi: 10.1162/neco_a_00990.
17. Abbas S. M., Singh S. N. Region-based object detection and classification using faster R-CNN. Proceedings of the 2018 4th International Conference on Computational Intelligence & Communication Technology (CICT); 2018; Sonepat, India.
18. Moran M. B. H., Conci A., González J. R., et al. Identification of thyroid nodules in infrared images by convolutional neural networks. Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN); July 2018; Rio de Janeiro, Brazil.
19. Patterson J., Gibson A. Deep Learning: A Practitioner's Approach. Sebastopol, CA, USA: O'Reilly Media, Inc.; 2017.
20. Gao Z., Cecati C., Ding S. X. A survey of fault diagnosis and fault-tolerant techniques-Part I: fault diagnosis with model-based and signal-based approaches. IEEE Transactions on Industrial Electronics. 2015;62(6):3757–3767. doi: 10.1109/tie.2015.2417501.
21. Rao K. R., Kim D. N., Hwang J. J. Fast Fourier Transform—Algorithms and Applications. Berlin, Germany: Springer Science & Business Media; 2011.
22. Feng Z., Liang M., Chu F. Recent advances in time-frequency analysis methods for machinery fault diagnosis: a review with application examples. Mechanical Systems and Signal Processing. 2013;38(1):165–205. doi: 10.1016/j.ymssp.2013.01.017.
23. Mohammed M., Khan M. B., Bashier E. B. M. Machine Learning: Algorithms and Applications. Boca Raton, FL, USA: CRC Press; 2017.
24. Géron A. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Sebastopol, CA, USA: O'Reilly; 2017.
25. Widodo A., Yang B.-S. Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal Processing. 2007;21(6):2560–2574. doi: 10.1016/j.ymssp.2006.12.007.
26. Bouchikhi E. H., Choqueuse V., Benbouzid M., Charpentier J., Barakat G. A comparative study of time-frequency representations for fault detection in wind turbines. Proceedings of the 37th Annual Conference of the IEEE Industrial Electronics Society (IECON 2011); November 2011; Melbourne, VIC, Australia.
27. Li C., Sanchez R.-V., Zurita G., Cerrada M., Cabrera D., Vásquez R. E. Multimodal deep support vector classification with homologous features and its application to gearbox fault diagnosis. Neurocomputing. 2015;168:119–127. doi: 10.1016/j.neucom.2015.06.008.
28. Bijma F., Jonker M., van der Vaart A. An Introduction to Mathematical Statistics. Amsterdam, Netherlands: Amsterdam University Press; 2011.
