Scientific Reports. 2025 Nov 24;15:41533. doi: 10.1038/s41598-025-25469-4

Bearing fault diagnosis method based on WSST and ISSA-MCNN-BIGRU

Shien Dong 1, Weiyan Tong 1, Hongwei Bai 1, Qiang Liu 1
PMCID: PMC12644591  PMID: 41285948

Abstract

Rolling bearings are essential components of large-scale rotating machinery. Nonetheless, their fault diagnosis still encounters significant challenges, including the difficulty of extracting discriminative features, relatively low recognition rates, and heavy reliance on expert experience. To address these challenges, this paper proposes a hybrid diagnostic framework that integrates the Wavelet Synchrosqueezed Transform (WSST), an Improved Sparrow Search Algorithm (ISSA), a Multi-Scale Convolutional Neural Network (MCNN), and a Bidirectional Gated Recurrent Unit (BiGRU). First, WSST is employed to obtain high-resolution time-frequency representations that capture subtle transient characteristics of bearing vibration signals. Second, MCNN performs multi-scale spatial feature extraction on the WSST-generated images, enabling the simultaneous capture of fine-grained and coarse-grained fault patterns. Third, BiGRU is introduced to learn bidirectional temporal dependencies, thereby enhancing the model’s capability to represent sequential data. Crucially, ISSA, augmented with chaotic Tent mapping, a Gaussian mutation strategy, and a Levy flight mechanism, is applied to adaptively optimize key hyperparameters of the MCNN-BiGRU network (learning rate, convolutional kernel sizes, number of GRU units). Experimental results on both the Case Western Reserve University and Southeast University bearing datasets demonstrate that the proposed ISSA-MCNN-BiGRU model achieves a fault diagnosis accuracy of up to 99.75%, outperforming baseline models such as standalone GRU, BiGRU, MCNN-BiGRU, PSO-MCNN-BiGRU, and GA-MCNN-BiGRU in terms of accuracy, stability, and generalization. Additionally, in different noise environments, the proposed model’s accuracy is significantly higher than that of comparative models, demonstrating strong robustness.

Keywords: Rolling bearings, Fault diagnosis, Wavelet synchrosqueezed transform, Multi-Scale convolution, Bidirectional gated recurrent unit

Subject terms: Electrical and electronic engineering, Mechanical engineering

Introduction

Rotating mechanical equipment—such as aero-engines, high-speed trains, wind turbine generators, and precision machining tools—plays a pivotal role in modern industrial systems1,2. As an indispensable core element, rolling bearings operate under high-speed rotation and heavy loads for prolonged durations, rendering them susceptible to fatigue failure, surface wear, and structural cracks. Bearing faults not only degrade equipment performance but also incur substantial economic losses and safety hazards. Therefore, the development of scientifically rigorous and effective bearing fault diagnosis methodologies is of significant practical importance for ensuring operational reliability and mitigating potential risks3,4.

Conventional fault diagnosis techniques that rely exclusively on time-domain features of one-dimensional vibration signals tend to neglect critical frequency-domain information, impeding the comprehensive extraction of signal characteristics and reducing diagnostic reliability5. In response, Xufeng Zhao et al.6 applied the continuous wavelet transform (CWT) to convert one-dimensional vibration signals into time–frequency images, employed a two-dimensional convolutional neural network (2D CNN) for spatial feature extraction, and utilized a Siamese neural network (SNN) to measure sample similarity for bearing fault classification. While this approach effectively leverages time–frequency representations, its performance can be sensitive to the choice of mother wavelet and scale parameters, and it often struggles under low signal-to-noise ratio (SNR) conditions. Moreover, the reliance on handcrafted 2D embeddings may limit the ability to capture subtle temporal dependencies. Similarly, Wiciak et al.7 introduced the synchrosqueezed wavelet transform (WSST) to obtain sharper time–frequency representations, thereby improving feature clarity and diagnostic accuracy. However, WSST-based methods typically involve high computational complexity during transform computation and may require careful tuning to balance time-frequency resolution against robustness to noise.

Deep learning-based hybrid architectures have emerged as a global research focus due to their exceptional capacity for feature extraction and fault detection8. Dominik Luczak9 demonstrated that integrating a Multi-Scale Convolutional Neural Network (MCNN) with a Bidirectional Long Short-Term Memory (BiLSTM) network effectively captures both multi-scale spatial features and temporal dynamics, resulting in enhanced learning efficiency and diagnostic performance. MCNNs excel at capturing features across varying scales, outperforming conventional single-scale CNNs in scenarios involving complex data with heterogeneous feature sizes and densities10. Building on this, Ya-Jing Huang et al.11 proposed CA-MCNN, which fuses MCNN with max and average pooling layers to extract richer multi-scale information, thereby improving fault detection. Nevertheless, these hybrid frameworks often depend on manual hyperparameter selection for convolutional kernel sizes and pooling strategies, which can lead to suboptimal configurations when transferred to different datasets or operating conditions. In addition, Dongsheng Yuan et al.12 integrated MCNN with BiLSTM and introduced an improved Northern Gaussian algorithm for noise optimization, collectively enhancing model convergence and diagnostic precision. However, such improvements focus primarily on noise suppression rather than addressing domain shifts or the need for robust hyperparameter tuning, which remain significant challenges in practical deployments.

Bidirectional Gated Recurrent Units (BiGRU) have been shown to achieve comparable performance to BiLSTM networks with fewer parameters, owing to their simplified gating mechanisms and efficient training properties13. For instance, Ziwei Xu et al.14 combined a CNN-BiGRU architecture with amplitude modulation (AM) preprocessing to improve bearing fault diagnosis, demonstrating that the hybrid framework outperforms several single-model approaches. G. Jai Arul Jose et al.15 developed a recommendation system based on a dilated MCNN-BiGRU architecture, highlighting the superiority of MCNN over conventional CNN for multi-scale feature extraction. Similarly, Ruizhe Yao et al.16 and Zongren Wang et al.17 confirmed that coupling MCNN with BiGRU yields higher diagnostic accuracy than standalone models. Although these studies underscore the benefits of combining convolutional and recurrent modules, most do not incorporate metaheuristic optimization, leading to potential inconsistencies when models are applied to datasets with different fault types or when operating under variable load and speed conditions18,19. Consequently, there remains a need for frameworks that can automatically adapt hyperparameters to new scenarios and maintain high performance.

To address this gap, Xue Jiankai et al.20 introduced the Sparrow Search Algorithm (SSA), which mimics the foraging and anti-predation behaviors of sparrows to achieve rapid convergence and robust search capabilities. Despite SSA’s strengths, its standard form may still encounter premature convergence and high sensitivity to population initialization, which can limit its applicability to complex fault-diagnosis tasks. Accordingly, the present study proposes an Improved Sparrow Search Algorithm (ISSA) that integrates chaotic tent mapping to enhance population diversity, a Gaussian mutation strategy to strengthen local search, and a Levy flight mechanism to facilitate global exploration. ISSA adaptively optimizes the structural and hyperparameter configurations of the MCNN-BiGRU network, including convolutional kernel sizes, filter numbers, learning rates, and recurrent unit dimensions, thereby accelerating convergence and mitigating entrapment in local minima. By automating hyperparameter selection, ISSA reduces reliance on manual tuning and improves model robustness when applied to diverse bearing conditions21.

Although hybrid deep learning models benefit from rich data-driven feature extraction, their robustness may decline under domain shifts or varying operational conditions. Recently, interpretable physics-informed domain adaptation paradigms have been proposed to enhance model generalization and robustness by incorporating domain knowledge into the learning process22. For example, physics-informed neural networks can encode mechanical conservation laws or bearing-specific vibration signatures to guide feature learning, while domain adaptation techniques align feature distributions between source and target domains to reduce performance degradation across different operating conditions23. However, these methods typically require detailed mechanical models or extensive labeled data from multiple domains, which may not always be available in industrial contexts. Additionally, existing interpretable domain adaptation methods often focus on marginal and conditional distribution alignment but may overlook high-dimensional temporal-spatial correlations in time-frequency representations. Therefore, there is a clear need for integrated frameworks that combine automated hyperparameter optimization with physics-informed constraints and domain adaptation strategies to ensure robust performance without incurring prohibitive labeling or modeling overheads24.

Motivated by these observations, the present study proposes a novel hybrid diagnostic framework that combines WSST-based feature extraction with an ISSA-optimized MCNN-BiGRU network. This framework is distinguished by two principal innovations:

  1. Enhanced Time–Frequency Feature Representation: By employing WSST, the proposed method obtains sharpened time–frequency representations that capture signal characteristics with greater precision than standard CWT or synchrosqueezed transforms reported in prior studies. The resulting high-resolution representations serve as enriched inputs to the MCNN, facilitating more effective multi-scale spatial feature extraction.

  2. Advanced Hyperparameter Optimization: Unlike existing PSO- or GA-based optimization methods, ISSA integrates chaotic tent mapping, Gaussian mutation, and Levy flight mechanisms to improve the balance between exploration and exploitation. This facilitates adaptive hyperparameter tuning of the MCNN-BiGRU network, enhancing convergence speed, avoiding local optima, and improving model stability.

Moreover, the proposed framework implicitly addresses domain shift challenges by combining WSST’s denoising properties with ISSA’s robust global search capability. This integration reduces sensitivity to variations in bearing conditions and operational noise, complementing physics-informed domain adaptation efforts without requiring explicit mechanical modeling.

Relevant theories

Wavelet synchrosqueezed transform

The Wavelet Synchrosqueezed Transform (WSST) is a time-frequency analysis method based on wavelet analysis. It optimizes the results of the Continuous Wavelet Transform (CWT) through the technique of synchrosqueezing to enhance the resolution and clarity of time-frequency analysis25. The WSST algorithm can be divided into three steps, the details of which are as follows:

  1. First, perform the Continuous Wavelet Transform (CWT) on the signal $x(t)$ to obtain the wavelet coefficients $W_x(a,b)$:

$$W_x(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt \tag{1}$$

In Eq. (1), $a$ is the scale factor, $b$ is the time-shift factor, and $\psi^{*}$ is the complex conjugate of the mother wavelet function.

  2. Second, the instantaneous frequency $\omega_x(a,b)$ is calculated from the phase of the wavelet coefficients:

$$\omega_x(a,b) = \frac{\partial \varphi_x(a,b)}{\partial b} \tag{2}$$

In Eq. (2), $\varphi_x(a,b)$ represents the phase of the wavelet coefficients.

  3. Lastly, the wavelet coefficients are mapped from the time-scale domain $(b,a)$ to the time-frequency domain $(b,\omega)$, and the frequencies are synchronized and squeezed to concentrate the energy around the instantaneous frequency. In the discrete case, the formula for synchrosqueezing is:

$$T_x(\omega_l,b) = \frac{1}{\Delta\omega} \sum_{a_k:\ |\omega_x(a_k,b)-\omega_l| \le \Delta\omega/2} W_x(a_k,b)\, a_k^{-3/2}\, \Delta a_k \tag{3}$$

In the continuous case, the formula for synchrosqueezing is:

$$T_x(\omega,b) = \int_{0}^{+\infty} W_x(a,b)\, a^{-3/2}\, \delta\big(\omega_x(a,b)-\omega\big)\, da \tag{4}$$

In Eqs. (3) and (4), $\omega_l$ is the center frequency, $\Delta\omega$ is the frequency bandwidth, $\Delta a_k$ is the spacing between adjacent discrete scales, and $\delta(\cdot)$ is the Dirac delta function.
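The three steps above can be illustrated with a small, pure-Python sketch. It is not the MATLAB `wsst()` routine used in this study: the analytic Morlet wavelet with center frequency `w0 = 6`, the phase-increment estimate of the instantaneous frequency, the relative threshold, and all function names are assumptions chosen only to make the discrete synchrosqueezing step concrete.

```python
import cmath
import math

W0 = 6.0  # assumed Morlet centre frequency (rad per unit of dimensionless time)


def morlet(u, w0=W0):
    """Analytic Morlet mother wavelet at dimensionless time u (assumed wavelet)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * u) * math.exp(-0.5 * u * u)


def cwt(x, fs, scales, w0=W0):
    """Discretised CWT: coeffs[k][b] for scale scales[k] (seconds) and sample b."""
    n = len(x)
    coeffs = []
    for a in scales:
        half = min(n - 1, math.ceil(4.0 * a * fs))  # cut the Gaussian at 4 sigma
        kernel = [morlet(m / (a * fs), w0).conjugate() for m in range(-half, half + 1)]
        row = []
        for b in range(n):
            lo, hi = max(0, b - half), min(n, b + half + 1)
            acc = sum(x[i] * kernel[i - b + half] for i in range(lo, hi))
            row.append(acc / (math.sqrt(a) * fs))
        coeffs.append(row)
    return coeffs


def wsst(x, fs, scales, df, n_bins, rel_threshold=1e-3, w0=W0):
    """Reassign CWT coefficients to frequency bins of width df (Hz)."""
    n = len(x)
    coeffs = cwt(x, fs, scales, w0)
    peak = max(abs(v) for row in coeffs for v in row)
    squeezed = [[0j] * n for _ in range(n_bins)]
    for k, a in enumerate(scales):
        left, right = max(k - 1, 0), min(k + 1, len(scales) - 1)
        da = (scales[right] - scales[left]) / (right - left)  # local scale spacing
        for b in range(n - 1):
            w = coeffs[k][b]
            if abs(w) <= rel_threshold * peak:
                continue  # skip coefficients whose phase is unreliable
            # Instantaneous frequency from the phase increment between samples.
            f_inst = cmath.phase(coeffs[k][b + 1] * w.conjugate()) * fs / (2.0 * math.pi)
            l = int(round(f_inst / df))
            if 0 <= l < n_bins:
                squeezed[l][b] += w * a ** -1.5 * da / df
    return squeezed


# A 50 Hz cosine sampled at 1 kHz; scales are matched to 25-200 Hz (log-spaced).
fs, n, f0, df, n_bins = 1000.0, 256, 50.0, 5.0, 60
x = [math.cos(2.0 * math.pi * f0 * i / fs) for i in range(n)]
scales = sorted(W0 / (2.0 * math.pi * 25.0 * 8.0 ** (k / 15)) for k in range(16))
T = wsst(x, fs, scales, df, n_bins)
mid = n // 2
ridge_bin = max(range(n_bins), key=lambda l: abs(T[l][mid]))
print("ridge frequency (Hz):", ridge_bin * df)
```

For a single-tone input, every scale with a significant coefficient reports the same instantaneous frequency, so the energy that the CWT spreads over several scales collapses into one frequency bin. This is the sharpening effect that the synchrosqueezing step is meant to provide.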

Convolutional neural network

The Convolutional Neural Network (CNN) is a deep learning architecture that is widely used in fields such as image recognition, and has attracted attention for its powerful feature extraction capabilities26. The core of CNN lies in its hierarchical structure, which mainly consists of convolutional layers, pooling layers, and fully connected layers27. The structure is shown in Fig. 1.

Fig. 1. The architecture of convolutional neural networks.

In the convolutional layer, multiple convolutional kernels are used to perform local perception and feature extraction on the input data28. The calculation formula for the j-th output $x_j^{l}$ of the l-th layer is:

$$x_j^{l} = \sum_{c} x_c^{l-1} * w_{c,j}^{l} + b_j^{l} \tag{5}$$

In Eq. (5), $x_j^{l}$ is the j-th output of the l-th layer; $x_c^{l-1}$ is the output of the c-th channel of the $(l-1)$-th layer; $w_{c,j}^{l}$ denotes the convolutional kernel weights applied to the c-th channel in the l-th layer; $b_j^{l}$ is the bias of the j-th output of the l-th layer; $*$ denotes the convolution operation.

The pooling layer is used to reduce the spatial dimensions of feature maps, decrease the computational load, and retain important features. Common pooling methods include max pooling and average pooling29.

The fully connected layer aggregates and classifies the features extracted by the convolutional and pooling layers, and outputs the final classification results through the Softmax activation function30.
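As a minimal illustration of the convolution and pooling operations described above, the following pure-Python sketch computes one output feature map as a sum of per-channel correlations plus a bias, then applies non-overlapping max pooling. The function names, the toy image, and the edge-detecting kernel are illustrative assumptions, not part of the proposed model.

```python
def correlate2d_valid(image, kernel):
    """Valid 2-D cross-correlation (what CNN layers usually call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def conv_output(channels, kernels, bias):
    """One output map: sum over input channels of correlations, plus a bias."""
    maps = [correlate2d_valid(x, w) for x, w in zip(channels, kernels)]
    return [[bias + sum(m[r][c] for m in maps) for c in range(len(maps[0][0]))]
            for r in range(len(maps[0]))]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows or columns that do not fit are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


# Toy single-channel image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to left-to-right intensity increases
feature_map = conv_output([image], [kernel], bias=0.0)
pooled = max_pool(feature_map)
print(feature_map[0])  # [0.0, 2.0, 0.0, 0.0]
print(pooled)          # [[2.0, 0.0], [2.0, 0.0]]
```

The feature map responds only where the kernel overlaps the edge, and pooling halves each spatial dimension while keeping that response.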

Gated recurrent unit

The Gated Recurrent Unit (GRU) is a simplified version of the recurrent neural network structure31, used for processing sequential data and particularly adept at addressing the vanishing gradient problem in long sequences. It controls the flow of information through two key gating mechanisms: the update gate and the reset gate32. The structure of the GRU is shown in Fig. 2.

Fig. 2. The diagram of the GRU structure.

The calculation formula for the reset gate is:

$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t] + b_r\right) \tag{6}$$

The calculation formula for the update gate is:

$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t] + b_z\right) \tag{7}$$

The calculation formula for the candidate hidden state is:

$$\tilde{h}_t = \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t] + b\right) \tag{8}$$

The calculation formula for the current hidden state is:

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{9}$$

In Eqs. (6)–(9), the reset gate $r_t$ and the update gate $z_t$ are calculated first, where $\sigma$ is the sigmoid function and $\odot$ denotes element-wise multiplication. These two gating signals determine how to combine the hidden state from the previous time step and the input at the current time step to update the hidden state. The reset gate $r_t$ is used to modulate the hidden state from the previous time step $h_{t-1}$, which is then combined with the current input $x_t$. The candidate hidden state $\tilde{h}_t$ is calculated through the weight matrix $W$, bias $b$, and the hyperbolic tangent activation function $\tanh$. Based on the value of the update gate $z_t$, the hidden state from the previous time step $h_{t-1}$ and the candidate hidden state $\tilde{h}_t$ are weighted and summed to obtain the hidden state at the current time step $h_t$.
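The gating logic of Eqs. (6)–(9) can be sketched as a single GRU time step in pure Python. The parameter names (`W_r`, `W_z`, `W_h` and their biases), the concatenation order `[h_prev, x_t]`, and the convention that the update gate weights the candidate state are assumptions of this sketch. GRU formulations differ on that last convention.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, vec, bias):
    """Row-wise dot products plus bias: returns W @ vec + b as a list."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def gru_step(x_t, h_prev, params):
    """One GRU update: reset gate, update gate, candidate state, new state."""
    concat = h_prev + x_t  # [h_{t-1}, x_t]
    r = [sigmoid(v) for v in affine(params["W_r"], concat, params["b_r"])]
    z = [sigmoid(v) for v in affine(params["W_z"], concat, params["b_z"])]
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t  # [r * h_{t-1}, x_t]
    h_cand = [math.tanh(v) for v in affine(params["W_h"], gated, params["b_h"])]
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


# Run a short sequence through a randomly initialised cell (2 hidden units, 3 inputs).
rng = random.Random(0)
hidden, inputs = 2, 3
params = {}
for gate in ("r", "z", "h"):
    params["W_" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + inputs)]
                           for _ in range(hidden)]
    params["b_" + gate] = [0.0] * hidden

h = [0.0] * hidden
for x_t in ([0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]):
    h = gru_step(x_t, h, params)
print(h)
```

With all weights and biases set to zero, both gates equal 0.5 and the candidate state is zero, so each step simply halves the previous hidden state. That special case is a convenient sanity check for an implementation.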

Sparrow search algorithm

In 2020, Xue J et al. from Donghua University proposed the Sparrow Search Algorithm, which simulates the foraging and anti-predation behaviors of sparrow populations. Sparrows within the population collaborate based on different role assignments to improve foraging efficiency. In the SSA, the individual with the highest fitness is referred to as the “Discoverer,” which has priority in obtaining food and leads the population towards the optimal foraging area33. The position update method for the Discoverer is as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(-\dfrac{i}{\alpha \cdot iter_{\max}}\right), & R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & R_2 \ge ST \end{cases} \tag{10}$$

In Eq. (10): $iter_{\max}$ is the maximum number of iterations; t is the current iteration number; $\alpha$ is a random number between (0,1); $R_2$ is the vigilance value, ranging from [0,1]; $ST$ is the safety threshold, with a value range of [0.5,1.0]; Q follows a standard normal distribution; L is a 1×d-dimensional matrix, with all elements being 1. When $R_2 < ST$, the environment is considered safe, and the Discoverer can search for food in a broader area; when $R_2 \ge ST$, it indicates that there may be predators nearby, and the population needs to adjust its actions to move towards a safer area.

Followers adjust their positions by observing or imitating the Discoverer with the best fitness to improve their foraging success rate34. The position update for Followers is as follows:

$$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^{2}}\right), & i > n/2 \\ X_{P}^{t+1} + \left|X_{i,j}^{t} - X_{P}^{t+1}\right| \cdot A^{+} \cdot L, & i \le n/2 \end{cases} \tag{11}$$

In Eq. (11): $X_{P}$ is the current best position; $X_{worst}$ is the current worst position; A is a 1×d-dimensional matrix, with elements randomly assigned values of 1 or -1; $A^{+} = A^{T}(AA^{T})^{-1}$; $i > n/2$ indicates that the individual has not yet successfully foraged and needs to search in other areas; when $i \le n/2$, the individual will move closer to the best Discoverer to increase the foraging success rate.

The Sentinels are primarily responsible for monitoring the environment and adjusting their positions when danger is detected35. Their update method is as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left|X_{i,j}^{t} - X_{best}^{t}\right|, & f_i > f_g \\ X_{i,j}^{t} + k \cdot \dfrac{\left|X_{i,j}^{t} - X_{worst}^{t}\right|}{(f_i - f_w) + \varepsilon}, & f_i = f_g \end{cases} \tag{12}$$

In Eq. (12): $X_{best}$ is the current best position; $\beta$ follows a normal distribution with mean 0 and variance 1; k is a uniformly distributed random number within the range [-1, 1]; $f_i$ is the fitness of the current individual; $f_g$ and $f_w$ are the best and worst fitness values in the population, respectively; $\varepsilon$ is a very small value to prevent division by zero. When $f_i > f_g$, the individual is on the edge of the population and is more vulnerable to predators, thus requiring a rapid adjustment of its position; when $f_i = f_g$, the individual is in the middle of the population and will move closer to other sparrows in the face of danger to reduce the risk of being preyed upon.
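The three role-specific updates can be combined into a compact sketch of the basic SSA loop. It follows the commonly published form of the algorithm and minimises a toy sphere function rather than a diagnosis error. The names `pd`, `sd`, and `st` (discoverer share, sentinel share, safety threshold), their default values, the clipping to the search bounds, and the elitist tracking of the best solution are assumptions of this sketch.

```python
import math
import random


def sphere(x):
    """Toy objective to minimise (stands in for a model's validation error)."""
    return sum(v * v for v in x)


def ssa_minimise(fitness, dim, lb, ub, n=20, iters=40, pd=0.2, sd=0.1, st=0.8, seed=0):
    rng = random.Random(seed)
    eps = 1e-50

    def clip(v):
        return min(ub, max(lb, v))

    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    fit = [fitness(p) for p in pop]
    best_fit = min(fit)
    best_pos = pop[fit.index(best_fit)][:]
    n_disc, n_sent = max(1, int(pd * n)), max(1, int(sd * n))
    history = []
    for _ in range(iters):
        order = sorted(range(n), key=lambda i: fit[i])  # best (lowest) first
        pop = [pop[i] for i in order]
        worst = pop[-1][:]
        # Discoverers: contract when safe, take a Gaussian jump when alarmed.
        r2 = rng.random()
        for i in range(n_disc):
            if r2 < st:
                alpha = 1.0 - rng.random()  # in (0, 1], avoids division by zero
                shrink = math.exp(-(i + 1) / (alpha * iters))
                pop[i] = [clip(v * shrink) for v in pop[i]]
            else:
                q = rng.gauss(0.0, 1.0)
                pop[i] = [clip(v + q) for v in pop[i]]
        leader = min(pop[:n_disc], key=fitness)[:]
        # Followers: poorly ranked ones fly elsewhere, the rest approach the leader.
        for i in range(n_disc, n):
            if i + 1 > n / 2:
                q = rng.gauss(0.0, 1.0)
                pop[i] = [clip(q * math.exp((w - v) / (i + 1) ** 2))
                          for v, w in zip(pop[i], worst)]
            else:
                # A is 1 x d with +/-1 entries, so A+ = A^T / d and the move is one scalar.
                step = sum(abs(v - p) * rng.choice((-1.0, 1.0))
                           for v, p in zip(pop[i], leader)) / dim
                pop[i] = [clip(p + step) for p in leader]
        # Sentinels: a random subset reacts to danger.
        fit = [fitness(p) for p in pop]
        f_g, f_w = min(fit), max(fit)
        x_best, x_worst = pop[fit.index(f_g)][:], pop[fit.index(f_w)][:]
        for i in rng.sample(range(n), n_sent):
            if fit[i] > f_g:
                beta = rng.gauss(0.0, 1.0)
                pop[i] = [clip(b + beta * abs(v - b)) for v, b in zip(pop[i], x_best)]
            else:
                k = rng.uniform(-1.0, 1.0)
                denom = (fit[i] - f_w) + eps
                pop[i] = [clip(v + k * abs(v - w) / denom) for v, w in zip(pop[i], x_worst)]
        fit = [fitness(p) for p in pop]
        if min(fit) < best_fit:
            best_fit = min(fit)
            best_pos = pop[fit.index(best_fit)][:]
        history.append(best_fit)
    return best_pos, best_fit, history


best_pos, best_fit, history = ssa_minimise(sphere, dim=4, lb=-5.0, ub=5.0)
print(best_fit)
```

Because the text describes fitness as something to maximise for the Discoverer while this sketch minimises an objective, "best" here means lowest objective value. The role logic is otherwise unchanged.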

Improved sparrow search algorithm

Chaotic map initialization

To enhance the initial population diversity and improve global search capability, this paper employs the Logistic chaotic map to initialize the population positions36. The Logistic chaos map is a classic model of nonlinear dynamical systems. Compared to the Tent chaos map37 and the sine chaos map38, it possesses strong randomness and ergodicity, and its mathematical expression is given by:

$$x_{n+1} = r\, x_n (1 - x_n) \tag{13}$$

In Eq. (13), $x_n$ represents the current state, which takes values in the range [0, 1], and r is the control parameter, typically set to 4. The chaotic sequence generated by the Logistic map allows the population to be evenly distributed in the solution space, effectively avoiding local optima.
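A minimal sketch of chaotic initialization with the Logistic map is given below. The seed value `x0`, the function name, and the choice to draw one chaotic value per coordinate are assumptions. The seed must avoid the values 0, 0.25, 0.5, 0.75, and 1, from which the map with r = 4 collapses to a fixed point.

```python
def logistic_population(n, dim, lb, ub, r=4.0, x0=0.37):
    """Generate n positions in [lb, ub]^dim from a Logistic-map sequence."""
    x = x0
    population = []
    for _ in range(n):
        position = []
        for _ in range(dim):
            x = r * x * (1.0 - x)                  # Logistic map iteration
            position.append(lb + (ub - lb) * x)    # map [0, 1] onto [lb, ub]
        population.append(position)
    return population


population = logistic_population(n=30, dim=4, lb=-5.0, ub=5.0)
print(population[0])
```

Each sparrow receives `dim` consecutive values of the chaotic sequence, so the whole population is determined by the single seed `x0`.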

Introduction of the Gaussian perturbation strategy

To further improve the global exploration capability of the algorithm, a Gaussian-based position perturbation strategy is introduced into the position update mechanism of discoverers. Specifically, this strategy updates the position by adding a normally distributed random offset to the current position of the individual. The mathematical expression is as follows:

[Equation (14)]

In Eq. (14), Inline graphic denotes the difference between the current position and the population mean, Inline graphic is a random vector, and Inline graphic is a dynamic factor based on the Gaussian distribution. This strategy can effectively prevent premature convergence of the algorithm and enhance its global search ability.

Introduction of the levy flight strategy

To further enhance the global search ability of the algorithm, this paper incorporates the Levy flight strategy during the warning phase39. Levy flight is a random walk strategy based on the Levy distribution, characterized by its long-tailed nature, which enables large-step random movements. This facilitates rapid exploration of the solution space and helps avoid local optima. The position update formula using Levy flight is given by:

[Equation (15)]

The Levy step used in Eq. (15) is generated as follows:

$$\mathrm{Levy}(\beta) = \frac{u}{|v|^{1/\beta}} \tag{16}$$

In Eq. (16), $u$ and $v$ are random numbers generated from a normal distribution, and $\beta$ is the shape parameter of the Levy distribution, typically set to 1.5.
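One common way to generate such steps is Mantegna's algorithm, sketched below. The scale `sigma_u` applied to `u` comes from that algorithm and is an assumption here, since the text only states that `u` and `v` are normally distributed. The function names are illustrative.

```python
import math
import random


def mantegna_sigma(beta=1.5):
    """Standard deviation of u in Mantegna's algorithm for a given beta."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step: u / |v|^(1/beta)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)


rng = random.Random(0)
steps = [levy_step(rng) for _ in range(1000)]
print(max(abs(s) for s in steps))
```

Most draws are small, but occasional very large steps appear when `v` is close to zero. These rare long jumps are what let the search leave a local optimum.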

Improved sparrow search algorithm (ISSA) flow

Step1, Initialize parameters, including the population size N, maximum number of iterations T, the dimensionality of hyperparameters dim, the proportion of discoverers Inline graphic, and the proportion of warners Inline graphic

Step2, Initialize the population using the Logistic chaotic map. According to Eq. (13), generate N random numbers and map them to the search space [lb, ub].

Step3, Evaluate fitness and identify the best individual. For each sparrow in the population, calculate its fitness Inline graphic, sort the fitness values, and determine the best (minimum fitness) and worst (maximum fitness) individuals along with their corresponding positions.

Step4, Select the top Inline graphic sparrows with better fitness as discoverers, and perform global search using the standard normal distribution as described in Eq. (10). The remaining sparrows act as followers and perform local search around the best individual according to Eq. (11).

Step5, Randomly select Inline graphic individuals from the population as warners, and apply Levy flight for jump search using the update formula in Eq. (12).

Step6, Recalculate the fitness of each updated individual and check whether any new individual outperforms the current global best. If so, update the best solution and record the corresponding fitness.

Step7, If the current number of iterations reaches the maximum limit T, terminate the optimization process and output the results. Otherwise, return to Step 4 and continue the iteration.

The detailed flowchart is shown in Algorithm 1 and Fig. 3.

Fig. 3. Improved sparrow search algorithm search process.

Algorithm 1. Improved Sparrow Search Algorithm (ISSA).

The ISSA-MCNN-BiGRU fault diagnosis method

Signal preprocessing

In this experiment, the signal is processed using the Synchrosqueezed Wavelet Transform (WSST). WSST combines the advantages of time-domain and frequency-domain analysis, enabling the decomposition of the signal into components at different scales (frequencies) and facilitating multi-band segmentation. It is capable of capturing local variations within the signal and analyzing its instantaneous and multi-scale characteristics. By balancing both time and frequency resolution, WSST is particularly effective in analyzing complex and dynamic signals. The preprocessed WSST time-frequency images are used as image inputs for subsequent fault feature extraction.

In this study, the built-in “wsst()” function in MATLAB was used, which defaults to the Morse wavelet as the mother wavelet with parameters commonly adopted in the literature (shape parameter γ = 3, bandwidth parameter β = 60). The frequency range of WSST is automatically determined by the signal length (1024 points) and sampling rate (12 kHz), resulting in 31.7404 ~ 6000 Hz, which covers the characteristic frequency range of bearing faults. The corresponding scale range is calculated using the formula Inline graphic (Inline graphic is the sampling rate and f is the frequency), yielding 1.6 ~ 308.7 (scale values are integer quantized results due to the discretization processing of the function).

Basic structure of the model

This paper proposes a fault diagnosis model based on a Multi-scale Convolutional Neural Network (MCNN) and Bidirectional Gated Recurrent Unit (BiGRU). The model consists of three main components, as illustrated in Algorithm 2 and Fig. 4.

Fig. 4. The structure of the MCNN-BiGRU model.

  1. MCNN:

As an extension of the traditional CNN, the multi-scale convolutional neural network (MCNN) uses multiple parallel convolutional paths with different kernel sizes to extract multi-scale features, which are then fused to enhance the model’s understanding of input data. The MCNN in this study consists of two branches: Layer 1 employs small convolution kernels (2 × 2, 3 × 3, 4 × 4) to extract local and medium-scale features, suitable for capturing details and local variations. Layer 2 uses larger convolution kernels (7 × 7, 5 × 5) to extract global features, which are better for capturing overall trends and low-frequency information. The features from both branches are further processed through fully connected layers and fused in an addition layer to integrate the advantages of different scales.

  2. BiGRU:

The fused features are then passed to the Bidirectional Gated Recurrent Unit (BiGRU) module for temporal feature extraction. BiGRU comprises forward and backward GRU units that process time series bidirectionally, enabling the model to capture bidirectional dependencies in sequences. The forward GRU extracts features from the start to the end of the sequence, while the backward GRU does so from the end to the start. Finally, the outputs of both directions are combined to obtain a more complete and accurate representation of temporal features.

  3. Classification Layer:

This component consists of a fully connected layer and a Softmax layer. The fully connected layer links each neuron to all neurons in the preceding layer, serving to integrate and compress the temporal features extracted by the BiGRU module. Subsequently, the Softmax layer maps these features to different fault categories, completing the classification task and thus enabling fault diagnosis of the input data.
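The bidirectional processing described for the BiGRU component can be sketched independently of the cell type. In the example below, a toy leaky-average cell stands in for the GRU. The cell, the function names, and the concatenation of the two directions are assumptions made to show how the backward outputs are realigned with time.

```python
def run_direction(sequence, step, h0):
    """Apply a recurrent step along the sequence and collect every hidden state."""
    h, outputs = h0, []
    for x in sequence:
        h = step(x, h)
        outputs.append(h)
    return outputs


def bidirectional(sequence, step_fwd, step_bwd, h0):
    """Concatenate forward states with time-aligned backward states."""
    fwd = run_direction(sequence, step_fwd, h0)
    bwd = run_direction(sequence[::-1], step_bwd, h0)[::-1]  # flip back to time order
    return [f + b for f, b in zip(fwd, bwd)]


def leaky_cell(x, h):
    """Toy stand-in for a GRU cell: average of the previous state and the input."""
    return [0.5 * h[0] + 0.5 * x]


# An impulse at the first time step.
outputs = bidirectional([1.0, 0.0, 0.0, 0.0], leaky_cell, leaky_cell, [0.0])
print(outputs)  # [[0.5, 0.5], [0.25, 0.0], [0.125, 0.0], [0.0625, 0.0]]
```

The forward half of each output remembers the impulse at later time steps, whereas the backward half at time t only reflects inputs from t onwards. This is why combining both directions gives each time step context from the whole sequence.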

Algorithm 2. The structure of the MCNN-BiGRU model.

Fault diagnosis process

The performance of fault diagnosis is evaluated using accuracy, while the cross-entropy loss function is adopted to assess the consistency between the predicted fault labels and the true labels. The fault diagnosis process of the MCNN-BiGRU model is illustrated in Fig. 5.
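The two evaluation quantities mentioned above, together with the Softmax mapping used by the classification layer, can be written in a few lines. This is a generic sketch with illustrative function names rather than the training code used in the experiments.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])


def accuracy(batch_logits, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    hits = sum(max(range(len(row)), key=row.__getitem__) == y
               for row, y in zip(batch_logits, labels))
    return hits / len(labels)


probs = softmax([2.0, 1.0, 0.1])
uniform_loss = cross_entropy(softmax([0.0] * 10), 3)  # equals ln(10) for 10 classes
print(probs, uniform_loss, accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 1]))
```

A classifier that assigns equal probability to all 10 fault classes has a loss of ln(10), about 2.30, which is a useful reference when reading training curves.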

Fig. 5. Flowchart of ISSA-MCNN-BiGRU Fault Diagnosis Process.

Step1, Preprocess the raw vibration data, including time-frequency analysis, to obtain Synchrosqueezed Wavelet Transform (WSST) images. A total of 2000 samples are generated.

Step2, Set the initial population size for the Improved Sparrow Search Algorithm (ISSA) to optimize the model. Based on the preprocessed data, construct the deep learning model. The ISSA is used to optimize the model’s hyperparameters, including the initial learning rate, L2 regularization coefficient, learning rate decay factor, and dropout rate.

Step3, Check whether the predefined number of iterations has been reached. If not, continue optimizing the hyperparameters; if the maximum number of iterations is reached, output the optimal solution. Apply the optimal hyperparameters found by the ISSA to the MCNN-BiGRU model and conduct testing.

Step4, Output the fault diagnosis results based on the model’s testing performance.

As can be seen from the entire process in Fig. 5, the improved sparrow search algorithm is used to optimize the hyperparameters of the model. To obtain the final optimized values of the hyperparameters, we conducted experiments using the vibration data from the Case Western Reserve University (CWRU) dataset as input, and the results are shown in Table 1 below:

Table 1. Final optimized values of hyperparameters.

Hyperparameter | Optimal value
Initial learning rate | 0.00019122
L2 regularization coefficient | 0.00096586
Learning rate decay factor | 0.094332
Dropout ratio | 0.56323

Experimental verification

Experimental data

The experiment utilizes the bearing dataset provided by Case Western Reserve University (CWRU). The experimental setup is shown in Fig. 6 (data source: https://csegroups.case.edu/bearingdatacenter/pages/welcome-case-western-reserve-university-bearing-data-center-website).

Fig. 6. CWRU bearing vibration dataset experimental platform.

The data used in the experiment were collected from the drive-end bearing under a rotational speed of 1797 rpm and a load of 0 hp. The dataset includes samples from normal conditions as well as three types of bearing faults: inner race fault (IRF), outer race fault (ORF), and ball fault (BF). The fault diameters used in the experiment are 0.007 inches, 0.014 inches, and 0.021 inches, resulting in a total of 10 distinct classes of data. The sampling frequency for the drive-end bearing data is 12 kHz.

For each class of bearing signal data, 200 samples were extracted. After shuffling, the data were split into training, validation, and test sets in a ratio of 7:1:2. The final dataset consisted of 1400 training samples, 200 validation samples, and 400 test samples, each labeled with one of the 10 fault categories. The detailed configuration of the experimental dataset is shown in Table 2.
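The split arithmetic can be checked with a short sketch. The sample identifiers, the fixed random seed, and the function name are placeholders. Only the class count, the per-class sample count, and the 7:1:2 ratio come from the text.

```python
import random


def split_dataset(samples, ratios=(7, 1, 2), seed=0):
    """Shuffle the samples and split them into train, validation, and test parts."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 10 classes x 200 samples, each identified by (class label, index within class).
samples = [(label, idx) for label in range(10) for idx in range(200)]
train_set, val_set, test_set = split_dataset(samples)
print(len(train_set), len(val_set), len(test_set))  # 1400 200 400
```

Shuffling the pooled samples before splitting, as described, does not guarantee exactly 140/20/40 samples per class in each subset. A stratified split would be needed for that.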

Table 2. Experimental sample data of bearing faults.

Failure category | Fault diameter (inches) | Failure location | Training set | Validation set | Test set
0 | 0.007 | Inner ring | 1,400 | 200 | 400
1 | 0.007 | Outer ring | 1,400 | 200 | 400
2 | 0.007 | Rolling element | 1,400 | 200 | 400
3 | 0.014 | Inner ring | 1,400 | 200 | 400
4 | 0.014 | Outer ring | 1,400 | 200 | 400
5 | 0.014 | Rolling element | 1,400 | 200 | 400
6 | 0.021 | Inner ring | 1,400 | 200 | 400
7 | 0.021 | Outer ring | 1,400 | 200 | 400
8 | 0.021 | Rolling element | 1,400 | 200 | 400
9 | Normal | - | 1,400 | 200 | 400

In the experiment, 200 samples were extracted for each fault condition, and each sample contains 1024 data points, which can be converted into a WSST time-frequency image. Figure 7 sequentially presents the Continuous Wavelet Transform (CWT) and WSST images for inner race fault, outer race fault, and ball fault. As shown in the figure, replacing CWT with WSST significantly reduces the energy of irrelevant information in the time-frequency spectrogram and avoids frequency cross-interference. WSST enhances the clarity of frequency contours through optimized processing, enabling more precise identification and representation of fault-related information. This improvement facilitates efficient analysis and diagnosis of potential issues. All time-frequency maps were generated using the WSST algorithm with MATLAB’s default Morse wavelet (theoretical reference parameters: γ = 3, β = 60), which effectively captures instantaneous frequency variations and multi-scale features of the signals, thereby facilitating subsequent fault pattern extraction by deep learning networks.

Fig. 7. The presentation effects of CWT and WSST: (a), (b), and (c) correspond to CWT, and (d), (e), and (f) correspond to WSST.

Analysis of experimental results

To address the issues of the traditional Sparrow Search Algorithm (SSA), such as susceptibility to local optima and high randomness in population initialization, this study introduces improvements using chaotic Tent mapping, normal distribution-based perturbation, and Levy flight strategy. These enhancements aim to strengthen the algorithm’s global search capability and convergence speed. The improved algorithm, referred to as ISSA (Improved Sparrow Search Algorithm), is compared against the original SSA. Simulation experiments were conducted using MATLAB 2024a on a system equipped with an AMD Ryzen 9 processor, 16 GB of RAM, and Windows 11.

In the experimental setup, the maximum number of iterations was set to 40, and the initial population size was set to 30. The number of discoverer sparrows and the number of scout sparrows were both set to 3. The accuracy from cross-validation was used as the fitness function for evaluation. The optimization process before and after improvement is shown in Fig. 8.

Fig. 8. Comparison of optimization curves between SSA and ISSA algorithms.

As observed in Fig. 8, ISSA significantly improves convergence speed and optimization efficiency, in part because chaotic Tent mapping provides a better initial population. The ISSA curve rises rapidly and stabilizes near 100% accuracy in the early iterations, whereas the original SSA rises more slowly and needs more iterations to stabilize. Although both algorithms eventually reach similar accuracy levels, ISSA converges faster and is less prone to stalling in local optima along the way, which supports the effectiveness of the proposed enhancements.
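
Chaotic population initialization of the kind credited here can be sketched as below. This is a hedged illustration: it uses a skew tent map with breakpoint a = 0.7 (a common variant, chosen because the symmetric map with slope 2 degenerates to zero in floating point), and the search ranges for the three hyperparameters are hypothetical. The population size of 30 is the value stated in the experimental setup.

```python
def tent_sequence(x0, n, a=0.7):
    """Generate n values of a skew tent map on [0, 1]."""
    xs, x = [], x0
    for _ in range(n):
        x = x / a if x < a else (1 - x) / (1 - a)
        xs.append(x)
    return xs

def tent_init(bounds, pop_size=30, seed=0.37):
    """Map chaotic values in [0, 1] onto each hyperparameter's search range."""
    dims = len(bounds)
    chaos = tent_sequence(seed, pop_size * dims)
    population = []
    for i in range(pop_size):
        row = chaos[i * dims : (i + 1) * dims]
        population.append([lo + c * (hi - lo) for c, (lo, hi) in zip(row, bounds)])
    return population

# Hypothetical search ranges: learning rate, kernel size, GRU units.
bounds = [(1e-4, 1e-2), (3.0, 9.0), (16.0, 128.0)]
population = tent_init(bounds)
```

Compared with uniform random draws, the chaotic sequence tends to spread the initial sparrows more evenly over the search range, which is the property the faster early rise of the ISSA curve is attributed to.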

To validate the performance of the proposed model in bearing fault diagnosis tasks, it was compared with several deep learning-based models, including GRU, BiGRU, MCNN-BiGRU, PSO-MCNN-BiGRU, and GA-MCNN-BiGRU. The comparison results are presented in Fig. 9.

Fig. 9. Comparison of fault diagnosis results among different models.

As shown in Fig. 9, the ISSA-MCNN-BiGRU model significantly improves fault diagnosis accuracy. However, there exists a strong correlation among some of the feature parameters. To reduce the impact of random experimental variation, each algorithm was run 10 times, and the average accuracy of these runs was used as the final evaluation metric. Comprehensive analysis shows that the average accuracy of the ISSA-MCNN-BiGRU model outperforms the other models, thereby validating the superiority of the proposed approach. Detailed results are presented in Table 3.

Table 3. Comparison of different diagnostic methods.

Diagnostic method                    Average accuracy    Average runtime (s)
GRU                                  62.33%              2
BiGRU                                65.18%              3
MCNN-BiGRU                           95.65%              8
PSO-MCNN-BiGRU                       99.03%              45
GA-MCNN-BiGRU                        99.20%              50
The proposed method in this paper    99.67%              40

Table 3 shows that the proposed ISSA-MCNN-BiGRU model achieves an average accuracy of 99.67% over ten runs, exceeding PSO-MCNN-BiGRU (99.03%) and GA-MCNN-BiGRU (99.20%). This performance gain is attributable to two key factors. First, WSST yields highly concentrated time-frequency representations that enable MCNN to extract discriminative multi-scale spatial features, while BiGRU captures bidirectional temporal dependencies. Second, ISSA’s integration of chaotic Tent mapping, Gaussian mutation, and Levy flight allows adaptive hyperparameter tuning, improving convergence speed and avoiding local optima more effectively than PSO or GA. The model’s stability, evidenced by minimal variation across runs, indicates that these components work together to enhance feature representation and classifier robustness on a dataset characterized by low noise and high class separability.

In terms of computational cost, lightweight architectures such as GRU and BiGRU complete training and inference in approximately 2 s and 3 s, respectively. Introducing the MCNN backbone increases processing time to around 8 s due to multi-scale convolutional feature extraction. When metaheuristic optimization is applied, as in PSO-MCNN-BiGRU and GA-MCNN-BiGRU, the runtime rises to 45 s and 50 s on average because each iteration evaluates numerous candidate hyperparameter sets. By contrast, our ISSA-MCNN-BiGRU requires roughly 40 s, which is ∼20% faster than the GA-based variant and ∼11% faster than the PSO-based variant. This improvement stems from ISSA’s better convergence properties: chaotic Tent mapping enhances global exploration early on, Gaussian mutation injects diversity to avoid premature stagnation, and Levy flights accelerate local search around promising regions. Consequently, ISSA requires fewer total evaluations to find optimal or near-optimal parameters. Overall, the proposed method strikes a favorable balance between accuracy and computational cost, achieving the highest diagnostic accuracy while running faster than the other metaheuristic-based approaches.
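
The runtime comparison can be checked directly from the Table 3 averages. The short sketch below computes the relative reduction as (baseline − ISSA) / baseline; the dictionary and function names are illustrative only.

```python
# Average runtimes in seconds, taken from Table 3.
runtimes_s = {"PSO-MCNN-BiGRU": 45, "GA-MCNN-BiGRU": 50, "ISSA-MCNN-BiGRU": 40}

def relative_reduction(baseline, candidate):
    """Fraction of the baseline runtime saved by the candidate."""
    return (baseline - candidate) / baseline

issa = runtimes_s["ISSA-MCNN-BiGRU"]
vs_ga = relative_reduction(runtimes_s["GA-MCNN-BiGRU"], issa)    # 10 / 50
vs_pso = relative_reduction(runtimes_s["PSO-MCNN-BiGRU"], issa)  # 5 / 45
```

With these numbers, the saving is 20% relative to the GA-based variant and about 11% relative to the PSO-based variant.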

Next, we compare the proposed ISSA-MCNN-BiGRU model with several reference approaches (GRU, BiGRU, MCNN-BiGRU, PSO-MCNN-BiGRU, and GA-MCNN-BiGRU) under different noise levels. As shown in Fig. 10, when additive Gaussian noise is introduced, the diagnostic accuracies of ISSA-MCNN-BiGRU, PSO-MCNN-BiGRU, GA-MCNN-BiGRU, MCNN-BiGRU, BiGRU, and GRU are 99.77%, 98.01%, 97.10%, 95.80%, 93.20%, and 88.45%, respectively. Although all models suffer some degradation as noise increases, ISSA-MCNN-BiGRU consistently outperforms each alternative by at least 1.5 percentage points, demonstrating its superior robustness in high-noise environments.

Fig. 10. The diagnostic accuracy results of each model under different signal-to-noise ratios.
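
A noise test of this kind is usually built by scaling white Gaussian noise to a target signal-to-noise ratio. The sketch below shows one common way to do that; the 0 dB level and the sinusoidal test signal are assumptions for illustration, as the specific SNR values used in the experiment are given only in the figure.

```python
import math
import random

def add_gaussian_noise(signal, snr_db, rng):
    """Add zero-mean white Gaussian noise scaled to the target SNR in dB."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

rng = random.Random(42)
clean = [math.sin(2 * math.pi * 0.05 * n) for n in range(1024)]  # synthetic stand-in
noisy = add_gaussian_noise(clean, snr_db=0, rng=rng)

# Empirical SNR of the noisy sample, as a sanity check on the scaling.
noise = [y - x for x, y in zip(clean, noisy)]
measured_snr_db = 10 * math.log10(
    sum(x * x for x in clean) / sum(e * e for e in noise)
)
```

Because the noise is random, `measured_snr_db` lands close to, but not exactly on, the requested value for a 1024-point sample.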

Generalization diagnosis experiment

To demonstrate the generalization ability of the proposed method, experiments were conducted using the Southeast University (SEU) bearing dataset. The structure of the SEU gearbox test rig is shown in Fig. 11 (data source: https://github.com/cathysiyu/Mechanical-datasets). Faults were preset in the first- and second-stage planetary gears of the planetary gearbox. In this dataset, gear faults include tooth surface wear, tooth root wear, tooth root crack, and tooth breakage; bearing faults include rolling element fault, inner race fault, outer race fault, and bearing mixed fault. Including the normal states of gears and bearings, there are a total of ten types of gearbox health state signals.

Fig. 11. SEU gearbox test rig.

The vibration data of the SEU dataset were processed with the wavelet synchrosqueezed transform (WSST), and the resulting time-frequency images were used as input samples for the fault diagnosis model. The dataset was divided into training, test, and validation sets at a ratio of 7:2:1. The network was constructed using the same configuration and optimization methods as those applied when validating the Case Western Reserve University (CWRU) dataset. The accuracy iteration curve of the fault diagnosis model is shown in Fig. 12.
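
The 7:2:1 partition can be sketched as a shuffled index split. The total of 2000 samples below is a hypothetical figure chosen for illustration, and the fixed seed and helper name are likewise assumptions; the paper does not state the SEU sample count or whether the split was stratified.

```python
import random

def split_indices(n, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle indices 0..n-1 and split them into train/test/validation lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * ratios[0])
    n_test = round(n * ratios[1])
    # Whatever remains after train and test goes to validation.
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

train_idx, test_idx, val_idx = split_indices(2000)  # 2000 is a hypothetical total
```

Splitting by index rather than by copying the images keeps the three subsets disjoint by construction.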

Fig. 12. Accuracy graph of each model on the validation set.

As observed in Fig. 12, the proposed fault diagnosis model based on WSST and ISSA-MCNN-BiGRU performs well across different bearing datasets, demonstrating good generalization ability. To ensure that ISSA does not simply overfit to specific benchmark tasks, we embedded cross-validation within the optimization loop and penalized overly complex hyperparameter configurations, which constrains ISSA from favoring solutions that exploit idiosyncrasies of a single dataset. In practice, ISSA-optimized hyperparameters remained stable when transferring between two bearing testbeds with differing operating speeds and load profiles, indicating that the search process is not tailoring itself narrowly to one data distribution.

For scenarios involving class imbalance (e.g., rare fault types) or entirely different mechanical systems, the same ISSA framework can accommodate weighted loss functions or data-augmentation strategies during optimization to preserve diagnostic accuracy. Because WSST and MCNN-BiGRU together form a modular pipeline, adapting to non-rotating machinery faults requires recalibrating the WSST scales to the dominant vibration bands of that system and retraining the network; ISSA can then reoptimize the hyperparameters within the new domain without excessive manual tuning. By combining nested cross-validation, regularization during hyperparameter search, and modular WSST preprocessing, the ISSA-MCNN-BiGRU framework balances flexibility with robustness, mitigating the risk of hyperparameter overfitting even when faced with unbalanced data or domain shifts.

Conclusion

To tackle the challenges of feature extraction and model optimization in rolling bearing fault diagnosis, this paper presents a novel diagnostic approach that integrates synchrosqueezed wavelet transform (WSST) with the ISSA-MCNN-BiGRU model. First, WSST is used to convert vibration signals into time-frequency representations, enabling high-resolution feature extraction while preserving critical original signal information. An MCNN-BiGRU deep neural network is then constructed, combining the spatial feature extraction capability of multi-scale convolution with BiGRU’s temporal modeling ability to achieve deep mining of complex features. Aiming at the issues of high initialization randomness and local optimum susceptibility in the traditional Sparrow Search Algorithm (SSA), this study introduces chaotic Tent mapping, normal distribution-based perturbation, and Levy flight strategy to develop an improved SSA (ISSA), which adaptively optimizes the model’s hyperparameters to enhance overall performance. Experimental results show that the proposed model exhibits superior diagnostic accuracy and robustness, outperforming conventional deep learning models and hybrid optimization methods, thus verifying its effectiveness and practical application potential in intelligent fault diagnosis.

In real-world scenarios such as early-stage faults with weak high-frequency components, composite defects with overlapping spectra, or low signal-to-noise ratio (SNR) conditions, the proposed model may require additional enhancements. First, WSST parameters (for example, scale ranges and decomposition levels) could be refined or adapted to highlight subtle fault signatures. Second, for composite defects, a multi-resolution WSST approach using multiple mother wavelets and frequency bands can help separate coexisting energy distributions, and preprocessing with variational mode decomposition (VMD) can isolate individual modal components before WSST. Third, to mitigate low-SNR challenges, data augmentation that simulates various noise levels will encourage BiGRU to learn more robust temporal features, and the ISSA objective function could be extended to include noise robustness metrics, such as classification accuracy under synthetic noise, to guide hyperparameter selection toward more resilient configurations.

Future work should explore physics-informed modeling, causal inference, and uncertainty quantification to further enhance robustness and interpretability. Embedding bearing dynamics equations or lubrication condition constraints into the network architecture or loss function can enforce physically plausible feature representations and improve generalization. Causal inference methods, for example constructing directed acyclic graphs to identify causal links between operational parameters and fault progression, will help distinguish true fault indicators from spurious correlations. Quantifying predictive uncertainty via Bayesian neural networks, Monte Carlo dropout, or deep ensembles will provide confidence intervals for each diagnosis, allowing operators to prioritize inspections when model confidence is low.
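
As a rough illustration of the uncertainty quantification mentioned above, the sketch below averages class probabilities over several stochastic forward passes (as Monte Carlo dropout or a deep ensemble would produce) and uses the entropy of the averaged distribution as a confidence signal. The probability values are invented for the example and the function name is hypothetical; no part of this was implemented in the present study.

```python
import math

def predictive_summary(prob_sets):
    """Average class probabilities over passes/members and return (mean, entropy in nats)."""
    n_classes = len(prob_sets[0])
    mean = [sum(p[c] for p in prob_sets) / len(prob_sets) for c in range(n_classes)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    return mean, entropy

# Hypothetical softmax outputs from three passes over one sample (3 classes).
confident = [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01], [0.99, 0.005, 0.005]]
uncertain = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.4, 0.3, 0.3]]

_, h_conf = predictive_summary(confident)
_, h_unc = predictive_summary(uncertain)
```

A diagnosis whose entropy approaches `math.log(n_classes)` would be a natural candidate for manual inspection.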

Author contributions

Conceptualization, S.D. and H.B.; methodology, S.D. and H.B.; data curation and writing—original draft, S.D.; funding acquisition, validation and project administration, W.T.; visualization and investigation, S.D.; software and validation, Q.L.; writing—review and editing, S.D. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by Liaoning Provincial Science and Technology Program Joint Project (2024-BSLH-210), China.

Data availability

The data used to support the findings of this study are available from the corresponding author upon request.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
