Abstract
Bearing faults in rotating machinery can lead to significant economic losses due to downtime and pose serious safety risks. Accurate fault diagnosis is crucial for effective condition monitoring. Traditional methods for diagnosing bearing faults under noisy conditions often rely on complex data preprocessing and struggle to maintain accuracy in high-noise environments. To address this challenge, this paper proposes an end-to-end Discrete Wavelet Integrated Convolutional Residual Neural Network (DWCResNet) for bearing fault diagnosis. The model incorporates Discrete Wavelet Transform (DWT) layers to replace traditional downsampling operations in convolutional neural networks, decomposing input signals into low-frequency and high-frequency components to effectively remove high-frequency noise and extract fault features, thereby improving diagnostic performance. In addition, a cyclical learning rate strategy is employed to improve training efficiency. Experiments conducted on the Case Western Reserve University (CWRU) and Paderborn University (PU) bearing datasets demonstrate that DWCResNet achieves higher diagnostic accuracy and noise robustness under various conditions, providing an efficient solution for bearing fault diagnosis in complex noisy environments.
Keywords: Rolling bearing, Fault diagnosis, Residual neural network, Cycle learning rate strategy, Wavelet downsampling, Discrete wavelet transform
Subject terms: Mechanical engineering, Computer science
Introduction
As essential elements of rotating machinery, rolling bearings fulfill crucial functions such as supporting the shaft, constraining relative movement, and reducing friction between rotating parts1. The majority of rolling bearings function under conditions of high temperature, high rotational speed, and heavy loads, making them susceptible to various types of failures2. Bearing failure not only leads to equipment operation interruption and significant economic losses, but also may cause serious safety accidents3. Therefore, timely and accurate diagnosis of bearing faults is of great significance to ensure the safe operation of equipment, reduce accidents, and improve production efficiency. As the complexity of industrial equipment increases, the actual operating environment is often accompanied by load changes and strong noise interference, which brings great challenges to fault diagnosis. Under such complex conditions, traditional diagnostic methods based on signal processing and empirical rules have been difficult to meet the demand for high-precision diagnosis. Therefore, it is particularly important to develop a bearing fault diagnosis method with strong robustness, adaptability to variable operating conditions, and outstanding noise resistance.
Over the past few years, deep learning has witnessed proliferation and prosperity across numerous domains, including computer vision (CV)4–8, natural language processing (NLP)9–13, and so on14,15. Deep learning has become a research hotspot in the field of fault diagnosis due to its powerful nonlinear modeling capability and end-to-end characteristics16,17. Convolutional Neural Networks (CNNs) are presently the most commonly utilized deep learning models, having been extensively developed and applied to a variety of tasks, including bearing fault diagnosis. CNNs typically adopt end-to-end fault diagnosis approaches, utilizing a single model to handle denoising, feature extraction, and classification tasks. The current CNN noise-robustness fault diagnosis model typically employs two-dimensional CNN (2DCNN)18 or one-dimensional CNN (1DCNN)19. The input of 1DCNN generally uses a one-dimensional original vibration time-domain signal. Xudong Song et al.20 proposed a bearing fault diagnosis approach based on CNN utilizing wide convolutional kernels, enabling rapid feature extraction from time-domain vibration signals employing wide kernels in the initial two convolutional layers. Guoqiang Jin et al.21 introduced an end-to-end adaptive noise-resistant neural network architecture that eliminates the need for artificial feature selection or noise suppression procedures. This architecture incorporates gated recurrent neural networks (GRNN) with an enhanced attention mechanism to capture and categorize features extracted by convolutional neural networks (CNN). Xiaohan Chen et al.22 employed two convolutional neural networks with varying kernel sizes to automatically extract features across different frequency ranges from the raw vibration data. Following this, long short-term memory (LSTM) networks were used to identify fault types building on the extracted features. 
Wei Zhang et al.23 developed a fault diagnosis model with robust domain adaptability and strong resistance to noise, directly employing the original signal as input without denoising pre-processing. The input of the 2DCNN is usually a one-dimensional time-domain signal converted into a two-dimensional image or time-frequency map. Hongchun Sun et al.24 applied Frequency Slice Wavelet Transform (FSWT) to produce time-frequency images, enhancing the richness of sample data. They further incorporated the Multi-Scale Dilate (MS-D) module and the Residual Channel Attention (RCA) module to mitigate noise interference. Alternatively, a scaled Ramanujan filter banks (RFBs) method has been introduced in Ruixian Li et al.25 to suppress the noise and as encoding approaches to translate the original vibration signals to RGB images. Subsequently, a strip convolutional neural network (strip-CNN) was designed utilizing strip convolution techniques to identify bearing faults based on the acquired RGB images. Although many algorithms can effectively address the challenge of bearing fault diagnosis in the presence of noise interference, these methods often rely on complex noise reduction preprocessing techniques. Such preprocessing can disrupt the intrinsic coupling relationships between fault features, leading to the partial loss of critical fault information. Moreover, the noise resistance of fault diagnostic models under varying load conditions remains insufficiently explored, leaving a significant gap in this area of research.
In this context, transfer learning methods have also been widely applied to tackle similar problems26–29. By leveraging knowledge transfer between the source and target domains, transfer learning can enhance diagnostic performance in scenarios where the target domain data is limited. Sandeep S. Udmale et al.30 proposed a transfer learning approach for fault diagnosis in rotating machinery, utilizing kurtograms and convolutional neural networks to address data scarcity and varying operating conditions while mitigating negative transfer effects. R. Zhang et al.31 presented a transfer learning approach to fault diagnosis using neural networks that adapts to changing working conditions by transferring parameters and training on limited target data, demonstrating improved accuracy and efficiency on the CWRU bearing dataset. K. Che et al.32 proposed a cross-domain fault diagnosis model combining domain adversarial neural networks (DANN) network and CNN-LSTM, leveraging DFT-transformed vibration signals for feature extraction and fusion, achieving improved diagnostic accuracy and generalization under variable working conditions. Y. Zou et al.33 put forth a novel deep convolution Wasserstein adversarial networks (DCWANs) based fault diagnosis model with variance constraints. This approach was designed to enhance domain adaptation and clarify decision boundaries, thereby achieving superior accuracy under varying working conditions compared to existing methods. In order to address the issue of limited labeled data, Z. Wang et al.34 have developed a fault diagnosis method which integrates deformable CNN, DLSTM, and transfer learning. This approach allows for effective feature extraction and classification under varying working conditions. The resulting performance is superior to that of existing state-of-the-art methods. Despite achieving success in a variety of working conditions and limited data scenarios, transfer learning methods are not without limitations. 
The majority of models lack robustness to noise interference and rely on complex preprocessing steps, which may result in the loss of critical fault information. Many of these methods require adaptation steps, such as re-labelling target domain data or aligning features, which increases the complexity of the diagnostic system. These methods frequently rely on extensive pre-training of the source domain or fine-tuning of the target domain, which can result in low training efficiency and slow convergence with limited target domain data. Furthermore, the majority of these methods are unable to achieve end-to-end diagnostics, as the processes of feature extraction and classification remain distinct, which in turn complicates the design and application of the models in question.
This paper introduces a Discrete Wavelet Integrated Convolutional Residual Neural Network (DWCResNet) model to address the above problems. To mitigate the influence of noise and improve the accuracy of bearing fault classification, we incorporate wavelet transforms into the commonly used ResNet architectures. First, we transform the one-dimensional Discrete Wavelet Transform (DWT) and Inverse Discrete Wavelet Transform (IDWT) into standard network layers by substituting the typical down-sampling operation with DWT. During the down-sampling process, DWCResNet removes the high-frequency components of fault features to enhance the resilience of ResNets to noise, while extracting relevant features from the low-frequency components. The proposed DWCResNet is an end-to-end neural network capable of directly classifying signals from raw noisy inputs without pre-processing methods used to remove noise. The key contributions of this article are outlined as follows:
Developed a fully end-to-end model for bearing fault diagnosis that directly processes raw vibration signals as input, eliminating the need for manual feature selection and denoising preprocessing.
Introduced a generalized Discrete Wavelet Transform (DWT) and Inverse Discrete Wavelet Transform (IDWT) as network layers, supporting various wavelet types and enabling the design of wavelet-integrated residual networks.
Incorporated DWT to replace conventional down-sampling operations, improving the noise robustness and domain adaptability of ResNets by effectively isolating high-frequency noise and retaining low-frequency fault features.
Employed a Cyclical Learning Rate (CLR) strategy to dynamically regulate the learning rate during training, facilitating fine-tuning and optimization while accelerating convergence and preventing overfitting.
Experimental results demonstrate that the proposed DWCResNet effectively suppresses high-frequency noise interference, exhibits strong cross-domain adaptability without the need for additional data adaptation, and significantly enhances diagnostic accuracy across varying working conditions.
The remainder of this article is organized as follows. Section 2 reviews the related works underpinning the proposed method, including one-dimensional convolutional neural networks, residual networks, and the cyclical learning rate strategy. Section 3 introduces the proposed DWCResNet model, detailing its architecture and key components. In Section 4, several experiments are conducted on two datasets under noisy and variable load conditions to validate the effectiveness of the proposed method. Finally, Section 5 concludes the paper, summarizing the key contributions and suggesting potential directions for future research.
Related works
One-dimensional convolutional neural network (1DCNN)
As one of the most influential networks in the domain of deep learning, the Convolutional Neural Network (CNN) has attracted significant attention in bearing fault diagnosis, owing to its exceptional capabilities for feature extraction and strong fitting performance35. Based on the input data dimensions, CNNs can be divided into 1D-CNN and 2D-CNN categories. Compared to 2D-CNN, the use of one-dimensional convolutional kernels in 1D-CNN reduces the number of parameters, significantly lowering the computational complexity of the model36. Therefore, in this paper, 1DCNN is used for fault diagnosis.
CNN is a convolutional neural network with a multi-layered architecture that can efficiently extract and explore complex data features through the application of non-linear mappings in a progressive manner, from the input layer through to the output layer37. The following components are typically included in a CNN structure.
Convolution layer38. In a one-dimensional convolution layer, the input one-dimensional feature vectors are convolved by the convolution kernel within a one-dimensional structure using a specified step size. As a consequence of the sharing of weights across convolution kernels, each kernel generates a single feature map. The number of convolution kernels determines the depth of the resulting feature map. Mathematically, the convolution operation can be described as follows:
$$y_i^{n} = f\!\left( \sum_{j} w_{ij}^{n} * x_j^{n-1} + b_i^{n} \right) \tag{1}$$
where $x_j^{n-1}$ refers to the $j$-th input feature map, $y_i^{n}$ is the $i$-th output feature map of the $n$-th layer, $w_{ij}^{n}$ is the weight matrix of the corresponding convolution kernel in the $n$-th layer, $b_i^{n}$ represents the bias vector, $*$ denotes the convolution operation, and $f(\cdot)$ is the activation function.
Pooling layer. The feature maps produced by the convolutional layer are passed to the pooling layer, which is responsible for selecting the most relevant features. The pooling layer reduces the computational load by shrinking the size of the feature maps, which helps prevent overfitting. Specifically, the max pooling operation can be described as follows:
$$y_i^{n}(t) = \max_{(t-1)S < j \le tS} x_i^{n}(j) \tag{2}$$
where $x_i^{n}(j)$ is the value of the $j$-th neuron of the $i$-th channel of the $n$-th layer, $S$ is the pooling stride, and $y_i^{n}(t)$ is the output value of the $t$-th unit of the $i$-th channel of the $n$-th layer.
Fully connected (FC) layer. After being nonlinearly combined into a set of global features, the extracted features from the convolutional and pooling layers are utilized for classification. The fully connected layer’s specific calculation can be depicted by the following equation:
$$y_j^{L+1} = f\!\left( \sum_{i} w_{ij}^{L} x_i^{L} + b_j^{L} \right) \tag{3}$$
where $x_i^{L}$ is the output value of the $i$-th unit of the $L$-th layer; $w_{ij}^{L}$ represents the weight between the $i$-th unit of the $L$-th layer and the $j$-th unit of the $(L+1)$-th layer; $b_j^{L}$ is the bias from the $L$-th layer to the $j$-th unit of the $(L+1)$-th layer; and $y_j^{L+1}$ denotes the output value of the $j$-th unit of the $(L+1)$-th layer.
Once the extracted feature information is processed, the classification of target samples is performed using the Softmax function. The equation for the Softmax function is as follows:
$$\mathrm{Softmax}(z)_k = \frac{e^{z_k}}{\sum_{c} e^{z_c}}, \qquad z = w^{T} x + b \tag{4}$$
where $x$ is the feature vector, $w$ is the weight matrix, and $b$ denotes the bias vector.
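The convolution, max-pooling, and Softmax operations described above can be sketched in a few lines of NumPy; the function names and the toy signal below are purely illustrative, not part of the original model:

```python
import numpy as np

def conv1d(x, w, b, stride=1):
    # Valid 1-D convolution (cross-correlation) with a given stride, as in Eq. (1)
    n = (len(x) - len(w)) // stride + 1
    return np.array([np.dot(x[i * stride : i * stride + len(w)], w) + b
                     for i in range(n)])

def max_pool1d(x, size):
    # Non-overlapping max pooling (stride S = size), as in Eq. (2)
    return np.array([x[i : i + size].max()
                     for i in range(0, len(x) - size + 1, size)])

def softmax(z):
    # Numerically stable Softmax, as in Eq. (4)
    e = np.exp(z - z.max())
    return e / e.sum()

x = np.array([1., 2., 3., 4., 5., 6.])
y = conv1d(x, np.array([1., 0., -1.]), 0.0)  # difference kernel -> [-2, -2, -2, -2]
p = softmax(np.array([1., 1.]))              # -> [0.5, 0.5]
```

A deep-learning framework fuses these primitives into trainable layers, but the arithmetic is exactly what the equations state.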
One-dimensional residual networks
As neural networks become more powerful, the number of layers and the model complexity increase, which can lead to problems such as the backpropagated gradient vanishing or exploding. To effectively address this problem, K. He et al.39 proposed the residual networks (ResNets), which introduced additional shortcut links by residual mapping of inputs to outputs.
The architecture of the residual network is depicted in Fig.1, where x is the input, H(x) refers to the identity mapping function from input to output, and F(x) denotes the residual mapping function being fitted40. The identity mapping function H(x) of ResNets, which maps the input to the output, can be expressed as:
$$H(x) = F(x) + x \tag{5}$$
Fig. 1.

Residual network modules.
ResNets adds a cross-layer connection to the traditional deep convolution network to fit the residual function. Compared with directly fitting the underlying mapping $H(x)$, fitting the residual mapping $F(x) = H(x) - x$ is easier. As a result, ResNets can propagate network information across layers, preserving the original features and mitigating the loss from multi-layer nonlinear transformations, while maximizing the powerful learning potential of CNNs. ResNets is founded upon the residual block in Fig.2, which serves as the most fundamental element.
Fig. 2.

(a) The BasicBlock of ResNets; (b) The BottleneckBlock of ResNets.
In the architecture of ResNets, the residual function F(x) is composed of convolutional layers and activation functions (e.g., ReLU), as illustrated in Fig.1. It typically includes one or more convolution operations followed by non-linear activation functions to learn the residual mapping of the input features. When the dimensions of the input x and the output are the same, an identity mapping can be directly applied, where x is added to F(x) without introducing additional parameters.
In the event of a discrepancy between the dimensions of the input and output, two solutions are available39. The first involves the use of zero-padding to perform identity mapping, thereby increasing the dimensions without the addition of parameters. The second employs a convolution operation to perform a projection shortcut, which serves to align the input and output feature dimensions while introducing supplementary parameters. In the case of shortcut connections that span feature maps of disparate sizes (for example, as a consequence of pooling or strided convolution), the shortcuts are typically performed with a stride of 2. This design ensures that residual connections function correctly under a variety of conditions, enabling efficient cross-layer information propagation while enhancing the training stability and performance of deep networks.
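The residual block and the projection-shortcut option described above can be sketched as a one-dimensional BasicBlock in PyTorch. This is an illustrative sketch, not the authors' exact implementation, and the channel sizes in the usage example are arbitrary:

```python
import torch
import torch.nn as nn

class BasicBlock1d(nn.Module):
    """1-D residual BasicBlock sketch: y = ReLU(F(x) + shortcut(x))."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv1d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm1d(out_ch)
        self.conv2 = nn.Conv1d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm1d(out_ch)
        # Identity shortcut when dimensions match; 1x1 projection otherwise
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv1d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm1d(out_ch))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))

block = BasicBlock1d(64, 128, stride=2)
y = block(torch.randn(8, 64, 256))
print(y.shape)  # torch.Size([8, 128, 128])
```

When `stride != 1` or the channel counts differ, the 1x1 projection aligns the shortcut with the main path, exactly as option two in the text describes.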
Cyclical learning rate
The learning rate (LR) represents a crucial hyper-parameter in deep neural network training, as it controls the speed and direction of the model’s weight updates during the training process41. A small LR may result in slow convergence, while a large LR may lead to degraded performance and model non-convergence42. Traditional learning rate tuning methods require numerous experiments or expert experience, making them time-consuming, highly blind, and complex43. Leslie N. Smith44 proposed a method for configuring the learning rate, known as Cyclical Learning Rate (CLR). Instead of setting a fixed value during training, CLR lets the learning rate oscillate between a specified minimum and maximum range. The method works by setting the minimum and maximum learning rate boundaries, and the LR varies linearly between them. Compared to traditional methods, CLR obviates the necessity for a multitude of trials to ascertain optimal values, thereby reducing the computational load and accelerating the model’s convergence by circumventing local optima45.
In this paper, the proposed model is trained using a Triangular CLR policy as shown in Fig.3, where the difference in the learning rate is reduced by half after each cycle.
Fig. 3.

Triangular learning rate policy. “Stepsize” refers to the number of iterations for half a cycle. The blue lines show the learning rate fluctuating between set bounds.
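A minimal sketch of the triangular policy with the per-cycle amplitude halving shown in Fig. 3 (the function name and the `decay` parameter are illustrative assumptions, following Smith's formulation):

```python
import math

def triangular_clr(iteration, base_lr, max_lr, step_size, decay=0.5):
    """Triangular CLR: the LR climbs linearly from base_lr to max_lr over
    `step_size` iterations and back down; the amplitude is multiplied by
    `decay` after each full cycle, matching the halving shown in Fig. 3."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    amplitude = (max_lr - base_lr) * decay ** (cycle - 1)
    return base_lr + amplitude * max(0.0, 1.0 - x)

# First cycle: 0.001 -> 0.01 -> 0.001; the second cycle peaks at 0.0055
lrs = [triangular_clr(i, 0.001, 0.01, 20) for i in range(0, 80, 20)]
```

With a base rate of 0.001, a maximum of 0.01, and a step size of 20, the schedule reaches 0.01 at iteration 20, returns to 0.001 at iteration 40, and then repeats with half the amplitude.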
Proposed method
Pooling operations are commonly employed to discard information, reduce data dimensions, and decrease the number of parameters, thereby improving computational efficiency. However, both traditional max pooling and average pooling methods exhibit inherent limitations. Max pooling, which selects the highest value within a fixed region, can be overly simplistic and prone to overfitting. Conversely, average pooling, which calculates the mean value of activations within a specified region, may lead to the generation of artifacts such as halos and blurring due to its reliance on neighboring values.
In this context, the discrete wavelet transform (DWT) offers a compelling alternative by decomposing data into various frequency components at different scales. This capability facilitates the extraction of intricate details and salient features from signals46. Researchers have increasingly sought to integrate wavelet transforms into the design of deep neural networks for diverse tasks, including image processing, by leveraging wavelet transforms as sampling operations47–49. This approach capitalizes on the unique ability of wavelets to perform multiscale frequency decomposition, enabling more effective feature preservation and noise suppression. Wavelet-based downsampling, in particular, has gained traction as a substitute for traditional pooling methods, as it aims to preserve critical information while simultaneously reducing computational complexity. Notably, in the domain of bearing fault diagnosis, wavelets have primarily been employed in signal preprocessing and postprocessing50,51, with a focus on enhancing feature extraction and mitigating noise interference, thereby improving diagnostic performance.
In this paper, we introduce a novel method specifically designed for bearing fault diagnosis, integrating discrete wavelet transform (DWT) into deep networks as a downsampling technique. By decomposing data into low-frequency and high-frequency components, the DWT effectively separates relevant features from noise. High-frequency components, which predominantly contain noise, are filtered out, thereby attenuating signal interference and enhancing feature clarity. The filtered low-frequency components are then reconstructed using the inverse discrete wavelet transform (IDWT) to ensure the retention of critical fault-related information. As an essential part of this framework, we design generic DWT and IDWT layers for seamless incorporation into deep networks, providing a robust mechanism for noise suppression and feature preservation. This approach is tailored to address the challenges in bearing fault diagnosis, offering an effective solution to improve diagnostic accuracy, robustness to noise, and adaptability to variable operating conditions.
DWT and IDWT layers
The critical issue in the design of DWT and IDWT layers lies in ensuring proper forward and backward propagation of data52. The following analysis is focused on orthogonal wavelets and one-dimensional signals. Extending the framework to non-orthogonal wavelets may necessitate further modifications to the design of the DWT and IDWT layers to ensure reconstruction accuracy and computational robustness53.
- Forward propagation. For a 1-D signal $s$, the DWT breaks the signal down into its low-frequency component $s_l$ and high-frequency component $s_h$:
$$s_l = \mathcal{L} s, \qquad s_h = \mathcal{H} s \tag{6}$$
where $\mathcal{L}$ and $\mathcal{H}$ are the transformation matrices built from the low-pass filter $l = \{l_k\}_{k \in \mathbb{Z}}$ and the high-pass filter $h = \{h_k\}_{k \in \mathbb{Z}}$ of an orthogonal wavelet. Using IDWT, one can reconstruct $s$ from $s_l$ and $s_h$:
$$s = \mathcal{L}^{T} s_l + \mathcal{H}^{T} s_h \tag{7}$$
In matrix form, the entries of $\mathcal{L}$ and $\mathcal{H}$ are
$$(\mathcal{L})_{ij} = l_{j-2i}, \qquad (\mathcal{H})_{ij} = h_{j-2i} \tag{8}$$
i.e., successive rows shift the filter coefficients by two positions, so that both matrices halve the signal length.
- Backward propagation. For the backward propagation of the DWT,
$$\frac{\partial\,\mathrm{loss}}{\partial s} = \mathcal{L}^{T} \frac{\partial\,\mathrm{loss}}{\partial s_l} + \mathcal{H}^{T} \frac{\partial\,\mathrm{loss}}{\partial s_h} \tag{9}$$
Similarly, for the backward propagation of the IDWT,
$$\frac{\partial\,\mathrm{loss}}{\partial s_l} = \mathcal{L} \frac{\partial\,\mathrm{loss}}{\partial s}, \qquad \frac{\partial\,\mathrm{loss}}{\partial s_h} = \mathcal{H} \frac{\partial\,\mathrm{loss}}{\partial s} \tag{10}$$
In this study, the Haar wavelet is selected due to its simplicity in mathematical formulation and excellent orthogonality properties54. The Haar wavelet is defined by the low-pass filter $l = \frac{1}{\sqrt{2}}(1, 1)$ and the high-pass filter $h = \frac{1}{\sqrt{2}}(1, -1)$. In practical applications, for a finite-length signal $s \in \mathbb{R}^{N}$, the transformation matrices corresponding to the low-pass and high-pass filters, $\mathcal{L}$ and $\mathcal{H}$, are truncated to the size of $\frac{N}{2} \times N$ to match the signal length. This choice and processing method ensure computational efficiency in signal decomposition and reconstruction while leveraging the orthogonality of the Haar wavelet to effectively extract signal features and suppress high-frequency noise.
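The truncated Haar analysis matrices and the resulting DWT/IDWT pair can be sketched in NumPy as follows. This is a simplified illustration for even-length signals, not the network-layer implementation itself; the function names are hypothetical:

```python
import numpy as np

def haar_matrices(n):
    # Analysis matrices L (low-pass) and H (high-pass), each of size (n/2, n);
    # row i carries the Haar filter at columns 2i and 2i+1
    l = np.array([1.0, 1.0]) / np.sqrt(2)
    h = np.array([1.0, -1.0]) / np.sqrt(2)
    L = np.zeros((n // 2, n))
    H = np.zeros((n // 2, n))
    for i in range(n // 2):
        L[i, 2 * i : 2 * i + 2] = l
        H[i, 2 * i : 2 * i + 2] = h
    return L, H

def dwt(s):
    # Forward transform: s_l = L s, s_h = H s
    L, H = haar_matrices(len(s))
    return L @ s, H @ s

def idwt(s_l, s_h):
    # Inverse transform: s = L^T s_l + H^T s_h
    L, H = haar_matrices(2 * len(s_l))
    return L.T @ s_l + H.T @ s_h

s = np.arange(8, dtype=float)
s_l, s_h = dwt(s)
assert np.allclose(idwt(s_l, s_h), s)  # perfect reconstruction via orthogonality
```

Because the Haar matrices satisfy $\mathcal{L}^{T}\mathcal{L} + \mathcal{H}^{T}\mathcal{H} = I$, the backward passes of these layers are simply the transposed matrix products.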
DWCResNet
In the raw bearing vibration signal, the random noise predominantly appears in its high-frequency components, while the features of bearing faults are mainly reflected in the low-frequency component55. Therefore, as shown in Fig.4, the wavelet denoising in this paper comprises three steps: (1) decompose the noisy vibration signal into a low-frequency component $s_l$ and a high-frequency component $s_h$ using the DWT, then further decompose $s_l$ into a low-frequency component $s_{ll}$ and a high-frequency component $s_{lh}$; (2) eliminate the high-frequency components $s_h$ and $s_{lh}$; (3) rebuild the vibration signal from the retained components using the IDWT.
Fig. 4.
The wavelet denoising operations in DWCResNet.
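The three denoising steps can be sketched as follows, using a lifting-style Haar transform for brevity. This is an illustrative sketch under the assumption of a signal length divisible by four; the component names follow the description above:

```python
import numpy as np

def haar_dwt(s):
    # One Haar DWT level: low- and high-frequency halves of an even-length signal
    even, odd = s[0::2], s[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_idwt(s_l, s_h):
    # One Haar IDWT level: exact inverse of haar_dwt
    out = np.empty(2 * len(s_l))
    out[0::2] = (s_l + s_h) / np.sqrt(2)
    out[1::2] = (s_l - s_h) / np.sqrt(2)
    return out

def wavelet_denoise(s):
    # Step 1: two-level decomposition (signal length divisible by 4)
    s_l, s_h = haar_dwt(s)
    s_ll, s_lh = haar_dwt(s_l)
    # Step 2: eliminate the high-frequency components (zero them out)
    # Step 3: rebuild the signal from the retained low-frequency part
    s_l_rec = haar_idwt(s_ll, np.zeros_like(s_lh))
    return haar_idwt(s_l_rec, np.zeros_like(s_h))
```

A noise-free, slowly varying signal passes through this pipeline essentially unchanged, while rapid sample-to-sample fluctuations are suppressed.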
In this paper, we introduce an innovative design for DWCResNet, which replaces the conventional down-sampling techniques employed in the ResNet18 network with $\mathrm{DWT}_{ll}$, and the network architecture of the proposed DWCResNet is illustrated in Fig.5. $\mathrm{DWT}_{ll}$ denotes the transformation of noisy data into its low-frequency component. Max-pooling and average-pooling are directly substituted with $\mathrm{DWT}_{ll}$, while strided convolution is enhanced by applying a convolution with a stride of 1 followed by $\mathrm{DWT}_{ll}$, that is
$$\mathrm{MaxPool}_{2} \rightarrow \mathrm{DWT}_{ll}, \qquad \mathrm{Conv}_{2} \rightarrow \mathrm{DWT}_{ll} \circ \mathrm{Conv}_{1}, \qquad \mathrm{AvgPool}_{2} \rightarrow \mathrm{DWT}_{ll}$$
where $\mathrm{MaxPool}_{2}$, $\mathrm{Conv}_{2}$, and $\mathrm{AvgPool}_{2}$ are the max-pooling, strided convolution, and average-pooling with a stride of 2.
Fig. 5.
Architecture of the Wavelet Integrated Convolutional Residual Network.
By halving the size of the feature vectors, $\mathrm{DWT}_{ll}$ eliminates the high-frequency components of the vibration signals and effectively denoises them. The output of $\mathrm{DWT}_{ll}$, specifically the low-frequency component, retains the essential information regarding the bearing faults, allowing for the extraction of recognizable features. During the down-sampling process in DWCResNet, $\mathrm{DWT}_{ll}$ can mitigate noise propagation in deep networks and help preserve the coupling relationship between fault features. Consequently, $\mathrm{DWT}_{ll}$ expedites the training of deep networks, improves noise robustness, and enhances classification accuracy by preserving essential low-frequency features and reducing noise interference.
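A minimal PyTorch sketch of such a low-frequency down-sampling layer, used as a drop-in replacement for stride-2 operations; the module name is hypothetical and the channel sizes are arbitrary examples:

```python
import torch
import torch.nn as nn

class HaarDWTll(nn.Module):
    """Sketch of a DWT_ll layer: project onto the Haar low-frequency
    component, halving the sequence length (a drop-in for stride-2
    down-sampling)."""
    def forward(self, x):  # x: (batch, channels, length), length even
        return (x[..., 0::2] + x[..., 1::2]) / 2 ** 0.5

# The substitutions described above (channel sizes are arbitrary examples):
down = HaarDWTll()                                # MaxPool_2 / AvgPool_2 -> DWT_ll
conv_then_down = nn.Sequential(                   # Conv_2 -> DWT_ll o Conv_1
    nn.Conv1d(64, 128, kernel_size=3, stride=1, padding=1),
    HaarDWTll(),
)

x = torch.randn(4, 64, 256)
print(conv_then_down(x).shape)  # torch.Size([4, 128, 128])
```

Because the layer is a fixed linear projection, autograd handles its backward pass automatically, consistent with the gradient analysis in the previous subsection.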
The detailed parameters of the DWCResNet model structure are outlined in Table 1. The model consists of one-dimensional convolutional layers, batch normalization layers, wavelet down-sampling layers, residual layers, and a fully connected layer with a Softmax activation function. The input length is fixed at 2048, with the final output capable of classification into nine categories.
Table 1.
Summary of the parameters and output sizes in the DWCResNet architecture. Output sizes are calculated based on the convolution formula: $L_{\mathrm{out}} = \lfloor (L_{\mathrm{in}} + 2P - K)/S \rfloor + 1$.
| Layer Type | Kernel Size / Stride | Kernel Number | Padding | Output Size |
|---|---|---|---|---|
| Input | - | - | - | |
| Conv1 | 64 | 3 | ||
| Wavelet Downsample | Wavelet | - | - | |
| Layer1 BasicBlock Conv1 | 64 | 1 | ||
| Layer1 BasicBlock Conv2 | 64 | 1 | ||
| Wavelet Downsample | Wavelet | - | - | |
| Layer2 BasicBlock Conv1 | 128 | 1 | ||
| Layer2 BasicBlock Conv2 | 128 | 1 | ||
| Wavelet Downsample | Wavelet | - | - | |
| Layer3 BasicBlock Conv1 | 256 | 1 | ||
| Layer3 BasicBlock Conv2 | 256 | 1 | ||
| Wavelet Downsample | Wavelet | - | - | |
| Layer4 BasicBlock Conv1 | 512 | 1 | ||
| Layer4 BasicBlock Conv2 | 512 | 1 | ||
| Wavelet Downsample | Wavelet | - | - | |
| FC (Fully Connected) | - | num_classes | - |
Experiments
To illustrate the anti-noise capabilities of the proposed DWCResNet in varying circumstances, a bearing fault diagnosis validation is conducted utilising the Case Western Reserve University (CWRU) bearing data and the Paderborn University (PU) bearing data. The models were trained five times with the random seed held constant. The experiments were conducted on an NVIDIA GeForce RTX 4060 laptop GPU using the PyTorch framework.
For a comprehensive evaluation, the training data was treated as independent, with the original signals segmented into fixed lengths of 2048 without overlap. The data was normalized to the range [-1, 1] and balanced by truncating each class to match the minimum sample size across all classes. No data augmentation techniques were applied. The dataset was split into training and validation sets in a 3:1 ratio (75% training, 25% validation), and the models were trained using the Stochastic Gradient Descent (SGD) optimizer with a cyclical learning rate (CLR) policy, with parameters determined through iterative experimentation: a base learning rate of , a maximum learning rate of 0.01, and a step size of 20. A batch size of 32 was used, and the training process spanned 100 epochs with the cross-entropy loss function as the objective. Hyperparameters such as learning rate, batch size, and weight decay were manually adjusted to achieve optimal performance. These adjustments ensure effective training in a wide range of noise and variable operating conditions.
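The segmentation, normalization, and 3:1 split described above can be sketched as follows. Per-window peak normalization to [-1, 1] is an assumption about the normalization scheme, and the function names are illustrative:

```python
import numpy as np

def make_samples(signal, length=2048):
    # Segment into non-overlapping windows of fixed length, then scale each
    # window to [-1, 1] by its absolute peak (normalization scheme assumed)
    n = len(signal) // length
    windows = signal[: n * length].reshape(n, length)
    peak = np.abs(windows).max(axis=1, keepdims=True)
    return windows / np.where(peak == 0, 1.0, peak)

def train_val_split(x, y, ratio=0.75, seed=0):
    # Shuffle once with a fixed seed, then split 3:1 into train/validation
    idx = np.random.default_rng(seed).permutation(len(x))
    k = int(len(x) * ratio)
    return x[idx[:k]], y[idx[:k]], x[idx[k:]], y[idx[k:]]

samples = make_samples(np.sin(np.linspace(0, 100, 10000)))  # shape (4, 2048)
```

Class balancing by truncation and the noise-injection experiments would operate on these windows before the split.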
Dataset description
CWRU bearing fault diagnosis dataset
The Case Western Reserve University (CWRU) dataset (https://csegroups.case.edu/bearingdatacenter/pages/download-data-file/), provided by the Case Western Reserve University Bearing Data Center, is one of the most famous open-source datasets in the field of bearing fault diagnosis. As shown in Fig.6, the test stand includes a 2 hp motor, a torque transducer/encoder, and a dynamometer. The motor bearings’ inner raceway, rolling element, and outer raceway were seeded with faults using electro-discharge machining (EDM), with fault diameters ranging from 0.007 inches to 0.040 inches. Since the measurement data for the 0.028-inch faults are not complete, this paper only considers the experimental data from the other three fault sizes. The dataset employed in this study comprises data from the drive end under loads of 1 HP, 2 HP, and 3 HP, with a sampling frequency of 12 kHz, as illustrated in Table 2. Each sample comprises 2048 data points and encompasses nine distinct fault conditions, including failures in the inner race, outer race, and rolling elements. The details of the experimental data are provided in Table 3.
Fig. 6.
Case Western Reserve University (CWRU) bearing test bench.
Table 2.
The operating condition of CWRU.
| Work Condition No. | Load (hp) | Rotating speed (r/min) | Number of Bearing fault |
|---|---|---|---|
| Condition1 | 1 | 1772 | 9 |
| Condition2 | 2 | 1750 | 9 |
| Condition3 | 3 | 1730 | 9 |
Table 3.
Description of sample labels (CWRU dataset).
| Labels | Bearing fault | Fault size (inch) | Load (hp) |
|---|---|---|---|
| 0 | Ball fault | 0.007 | 03 |
| 1 | 0.014 | 03 | |
| 2 | 0.021 | 03 | |
| 3 | Inner race fault | 0.007 | 03 |
| 4 | 0.014 | 03 | |
| 5 | 0.021 | 03 | |
| 6 | Outer race fault | 0.007 | 03 |
| 7 | 0.014 | 03 | |
| 8 | 0.021 | 03 |
PU bearing dataset
The Paderborn University (PU) dataset (https://mb.unipaderborn.de/kat/forschung/datacenter/bearing-datacenter/), provided by the Paderborn University Bearing Data Center, consists of artificially machined failed bearings, real failures due to accelerated life testing, and healthy bearings56. The PU bearing failure test rig, depicted in Fig.7, includes a drive motor, torque measurement device, bearing test cell, flywheel, and load motor. The dataset contains 12 types of artificially damaged bearings, 14 types of real damage due to accelerated life tests, and 6 healthy states, with motor current signals and vibration signals collected synchronously at a high sampling frequency. For each type of bearing, data were collected at four different operating conditions of speed and load, as detailed in Table 4.
Fig. 7.
Paderborn University (PU) bearing test bench.
Table 4.
The operating condition of PU.
| Work Condition No. | Rotating speed (rpm) | Load torque (Nm) | Radial force (N) | Name of setting |
|---|---|---|---|---|
| Condition0 | 1500 | 0.7 | 1000 | N15_M07_F10 |
| Condition1 | 900 | 0.7 | 1000 | N09_M07_F10 |
| Condition2 | 1500 | 0.1 | 1000 | N15_M01_F10 |
| Condition3 | 1500 | 0.7 | 400 | N15_M07_F04 |
The PU dataset contains bearing measurement data categorized into three types: normal, inner race failure, and outer race failure. Based on the size of the damage, failure severity is classified into two levels: Level 1, representing the early stage56, for damage sizes less than 2 mm, and Level 2 for damage sizes between 2 mm and 4.5 mm. The acceleration data is recorded at a high sampling frequency of 64 kHz, with the experimental data presented in Table 5. In this experiment, the faults manually generated with an electric engraver were selected, and the bearings cover a total of five fault classes, taking into account the fault location and the degree of damage.
Table 5.
Description of sample labels (PU dataset).
| Labels | Bearing fault | Damage level |
|---|---|---|
| 0 | Inner race fault | 1 |
| 1 | Inner race fault | 2 |
| 2 | Normal | 0 |
| 3 | Outer race fault | 1 |
| 4 | Outer race fault | 2 |
Comparison settings
In this study, the anti-noise capability of the proposed method is assessed under variable operating conditions. The performance is quantified using the classification accuracy, and the suggested DWCResNet is evaluated against the following techniques:
Methods that share the network architecture of DWCResNet but use a different wavelet downsampling scheme. The first network (hereafter Method 1) does not filter out the high-frequency component of the vibration signal during wavelet downsampling; instead, it reconstructs both the high-frequency and low-frequency components of the wavelet decomposition with the IDWT, keeping all other structures and parameters identical to those of DWCResNet. The second network (hereafter Method 2) applies a single DWT, discards the high-frequency component, and reconstructs the retained low-frequency component with the IDWT, again keeping the same structures and parameters as DWCResNet. Comparing DWCResNet with these two methods demonstrates the enhanced noise immunity and domain adaptation capability of the wavelet downsampling method proposed in this paper.
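The three wavelet downsampling variants described above can be illustrated with a single-level Haar DWT. The sketch below is a minimal numpy illustration; the Haar basis is an assumption made for simplicity, and the helper names (`haar_dwt`, `haar_idwt`, `wavelet_downsample`) are ours, not from the paper:

```python
import numpy as np

def haar_dwt(x):
    # single-level Haar DWT: low-frequency (approximation) and
    # high-frequency (detail) bands, each at half the input length
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_idwt(low, high):
    # inverse Haar DWT: interleave the reconstructed even/odd samples
    out = np.empty(low.size * 2)
    out[0::2] = (low + high) / np.sqrt(2)
    out[1::2] = (low - high) / np.sqrt(2)
    return out

def wavelet_downsample(x):
    # DWCResNet-style downsampling: keep only the low-frequency band,
    # halving the length while discarding high-frequency content
    low, _ = haar_dwt(x)
    return low

signal = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.3 * np.random.default_rng(0).normal(size=64)
low, high = haar_dwt(signal)
method1 = haar_idwt(low, high)                 # Method 1: keep both bands (perfect reconstruction)
method2 = haar_idwt(low, np.zeros_like(high))  # Method 2: zero the detail band, then IDWT
downsampled = wavelet_downsample(signal)       # DWCResNet: low band only, halved length
print(method1.shape, method2.shape, downsampled.shape)  # (64,) (64,) (32,)
```

Method 1 reproduces the input exactly, Method 2 keeps the input length but removes the detail band, and the DWCResNet variant halves the resolution as a pooling layer would.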
Three common fault diagnosis models are selected as comparison baselines: 1D CNN, CNN+MLP, and ResNets, where ResNets serve as the baseline model for the proposed approach. To further validate the effectiveness of our framework, we additionally compare it with the state-of-the-art Deep Residual Shrinkage Networks (DRSN)57. DRSN introduces a feature-level shrinkage mechanism that adaptively suppresses irrelevant features and noise, leading to robust fault diagnosis performance.
To validate the denoising performance of the proposed DWCResNet, we compare it with several state-of-the-art denoising methods, including wavelet threshold denoising followed by ResNet, denoising autoencoder (DAE) followed by ResNet, and GAN-based denoising followed by ResNet. These methods apply denoising as a preprocessing step before fault diagnosis, while the proposed DWCResNet integrates denoising and fault diagnosis into an end-to-end model without requiring explicit denoising preprocessing. This comparison allows us to comprehensively evaluate the advantages of DWCResNet in terms of denoising effectiveness and fault diagnosis performance under various noise conditions.
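The wavelet-threshold-denoising preprocessing baseline can be sketched as follows. This is a minimal single-level Haar implementation with soft thresholding at the universal threshold; the wavelet choice, decomposition depth, and threshold rule are illustrative assumptions rather than the exact baseline configuration:

```python
import numpy as np

def haar_dwt(x):
    # single-level Haar DWT: approximation (low) and detail (high) bands
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_idwt(low, high):
    # inverse Haar DWT: interleave the reconstructed even/odd samples
    out = np.empty(low.size * 2)
    out[0::2] = (low + high) / np.sqrt(2)
    out[1::2] = (low - high) / np.sqrt(2)
    return out

def wavelet_threshold_denoise(x):
    # Soft-threshold the detail band with the universal threshold
    # sigma * sqrt(2 ln N); sigma is estimated via the MAD rule.
    low, high = haar_dwt(x)
    sigma = np.median(np.abs(high)) / 0.6745
    t = sigma * np.sqrt(2.0 * np.log(x.size))
    high = np.sign(high) * np.maximum(np.abs(high) - t, 0.0)
    return haar_idwt(low, high)

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 6 * np.pi, 256))
noisy = clean + 0.5 * rng.normal(size=256)
denoised = wavelet_threshold_denoise(noisy)
# mean-squared error w.r.t. the clean signal, before and after thresholding
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

The denoised signal would then be fed to the downstream classifier (e.g. ResNet), in contrast to DWCResNet, where the filtering happens inside the network.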
Evaluation of indicators
The primary performance metric employed in this study is accuracy, which is a commonly used comprehensive indicator as defined in reference (5). The accuracy of the model is evaluated through a fourfold cross-validation process for each experimental set.
$$\mathrm{Accuracy}=\frac{TP+TN}{TP+FP+TN+FN} \qquad (14)$$
where TP, FP, TN, and FN denote the counts of true positive, false positive, true negative, and false negative samples, correspondingly. The accuracy value varies from 0 to 1, with higher values signifying improved classification performance.
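As a concrete reading of Eq. (14), accuracy is simply the fraction of correctly classified samples; a one-line sketch with hypothetical confusion counts:

```python
def accuracy(tp, fp, tn, fn):
    # Eq. (14): correctly classified samples over all samples
    return (tp + tn) / (tp + fp + tn + fn)

# hypothetical confusion counts for illustration
print(accuracy(tp=50, fp=5, tn=40, fn=5))  # 0.9
```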
Experimental setup
CWRU Dataset Test
(1) Experimental validation on variable work condition
Bearings are typically deployed in complex operational environments. To evaluate the effectiveness of the proposed method in diverse operational contexts, different task conditions (A1 to B3) have been designed to simulate single and multi-condition combinations of training data, as well as unseen test conditions. For a comprehensive understanding of this process, please refer to Table 6.
Table 6.
The diagnostic tasks for experiments of CWRU.
| Task | Training data | Training Samples | Validation Samples | Testing data | Testing Samples |
|---|---|---|---|---|---|
| A1 | Condition1 | 2556 | 639 | Condition2 | 624 |
| A2 | Condition1 | 2547 | 639 | Condition3 | 624 |
| A3 | Condition2 | 2547 | 639 | Condition3 | 624 |
| B1 | Condition2,3 | 3552 | 890 | Condition1 | 624 |
| B2 | Condition1,3 | 3596 | 902 | Condition2 | 624 |
| B3 | Condition1,2 | 3595 | 904 | Condition3 | 624 |
As illustrated in Fig. 8, the DWCResNet model exhibits the highest accuracy across all tasks (A1, A2, A3, B1, B2, B3), particularly in the B2 task, where the accuracy reaches 99.1%. Comparatively, DRSN (Deep Residual Shrinkage Network) achieves excellent performance, particularly in the A3 and B2 tasks, with accuracies of 95.68% and 96.32%, respectively. However, DRSN exhibits slightly lower accuracy in A2 (86.53%), where DWCResNet demonstrates more robust generalization. ResNets and CNN+MLP also perform well but fluctuate considerably across tasks; these fluctuations are particularly pronounced in the A2 and A3 tasks, with accuracy rates of 79.19% and 88.84%, respectively. The overall effectiveness of the 1DCNN model is comparatively weaker, particularly in the A2 task, where its accuracy is only 78.72%, indicating weaker generalization than DWCResNet across diverse data conditions.
Fig. 8.
The diagnostic accuracy of different models for all tasks on the CWRU dataset (%).
Furthermore, the proposed DWCResNet outperforms Method 1 and Method 2 across all tasks on the CWRU dataset. While Method 1 preserves the high-frequency component and thus retains more signal detail, its susceptibility to noise interference results in inferior performance, particularly in the A2 (84.38%) and B2 (93.75%) tasks. Method 2, on the other hand, reduces noise interference effectively by retaining only the low-frequency component, achieving performance comparable to DWCResNet in the B-series tasks, although slight discrepancies persist. The inclusion of DRSN highlights its capability to suppress irrelevant features through its shrinkage mechanism, which enhances its robustness. However, the wavelet downsampling strategy introduced in DWCResNet enables the model to retain richer multi-scale information while simultaneously reducing noise interference, resulting in consistently superior accuracy and enhanced robustness in bearing fault diagnosis compared to the other methods, including DRSN. The convergence of the model for the A1 task is shown in Fig. 9. The model is also evaluated with the ROC curve method for Task A1, with the results shown in Fig. 10.
Fig. 9.
The model in Task A1 loss and accuracy curves.
Fig. 10.
The model in Task A1 ROC curve.
Beyond testing accuracy, Fig. 11 displays the confusion matrices of the proposed DWCResNet for Task A (11a–c) and Task B (11d–f). It should be noted that the tables report the mean and standard deviation across five trials, while the confusion matrices shown correspond to the trials achieving the highest accuracy.
Fig. 11.
The confusion matrices for different conditions using DWCResNet for the CWRU dataset.
(2) Experimental validation on variable work conditions and noisy environment
To ascertain the anti-noise functionality of DWCResNet in varying operational contexts, distinct levels of Gaussian white noise are introduced to the original signal, resulting in different signal-to-noise ratio (SNR) levels. Detailed information is provided in Table 7, where SNR is described as:
$$\mathrm{SNR}=10\log_{10}\!\left(\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\right) \qquad (15)$$
where $P_{\mathrm{signal}}$ signifies the power (mean of squares) of the raw signal and $P_{\mathrm{noise}}$ represents the power (mean of squares) of the Gaussian white noise.
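Following Eq. (15), Gaussian noise at a target SNR can be injected by scaling the noise power relative to the measured signal power. A minimal sketch (the function name `add_noise_at_snr` is ours):

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    # Scale white Gaussian noise so that 10*log10(P_signal / P_noise)
    # matches the target SNR of Eq. (15).
    rng = np.random.default_rng() if rng is None else rng
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

x = np.sin(np.linspace(0, 8 * np.pi, 2048))
noisy = add_noise_at_snr(x, -6, rng=np.random.default_rng(0))
# empirical SNR of the corrupted signal; close to -6 dB for a long signal
measured = 10 * np.log10(np.mean(x ** 2) / np.mean((noisy - x) ** 2))
print(round(measured, 1))
```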
Table 7.
The diagnostic tasks for experiments of CWRU with different SNR noise.
| Task | Training data | Testing data |
|---|---|---|
| A1 | Condition1 and SNR=6,4,2,0,-2,-4,-6dB | Condition2 and SNR=6,4,2,0,-2,-4,-6dB |
| A2 | Condition1 and SNR=6,4,2,0,-2,-4,-6dB | Condition3 and SNR=6,4,2,0,-2,-4,-6dB |
| A3 | Condition2 and SNR=6,4,2,0,-2,-4,-6dB | Condition3 and SNR=6,4,2,0,-2,-4,-6dB |
| B1 | Condition2,3 and SNR=6,4,2,0,-2,-4,-6dB | Condition1 and SNR=6,4,2,0,-2,-4,-6dB |
| B2 | Condition1,3 and SNR=6,4,2,0,-2,-4,-6dB | Condition2 and SNR=6,4,2,0,-2,-4,-6dB |
| B3 | Condition1,2 and SNR=6,4,2,0,-2,-4,-6dB | Condition3 and SNR=6,4,2,0,-2,-4,-6dB |
As illustrated in Fig.12, subfigures a–f depict the diagnostic accuracy across tasks A1 to A3 and B1 to B3 under various noise conditions. The performance of different models is evaluated as the signal-to-noise ratio (SNR) increases from -6 dB to 6 dB, showing a consistent upward trend in accuracy for all models. This emphasizes the critical role of higher SNR in improving diagnostic efficacy and underscores the models’ noise robustness. Among the compared models, DWCResNet consistently achieves the highest accuracy across all tasks and exhibits exceptional robustness under low SNR conditions (e.g., -6 dB). Notably, DRSN demonstrates competitive performance, particularly under moderate to high SNR conditions, where it effectively suppresses irrelevant features through its shrinkage mechanism. However, DRSN’s accuracy drops more noticeably at low SNR levels compared to DWCResNet, especially in tasks A2 and B3. Method 2 also demonstrates stability and adaptability by preserving low-frequency components, achieving results that approach those of DWCResNet, particularly in B-series tasks. However, slight discrepancies remain, particularly under extreme low SNR conditions. In contrast, Method 1, while attempting to reconstruct signals by retaining both high-frequency and low-frequency components, suffers from noise interference, resulting in reduced diagnostic accuracy, particularly in A1 and B2 tasks. ResNets and CNN+MLP achieve high accuracy under medium-to-high SNR conditions but exhibit substantial declines at low SNR levels, reflecting their limited robustness to noise. The 1DCNN model, on the other hand, consistently delivers the lowest accuracy across all SNR levels, particularly struggling under low SNR conditions due to its inability to effectively suppress noise and capture critical diagnostic features.
Fig. 12.
The diagnostic accuracy of various tasks under different SNR levels on the CWRU dataset.
Fig. 13 illustrates the accuracy of the various methods across tasks A1, A2, A3, B1, B2, and B3 under natural noise conditions. DWCResNet consistently demonstrates superior performance, achieving the highest accuracy across all tasks, which highlights its robustness and effectiveness in handling diverse fault diagnosis scenarios. Methods such as 1DCNN, CNN+MLP, and ResNets show moderate performance, with ResNets generally outperforming 1DCNN and CNN+MLP. DRSN, Method 1, and Method 2 exhibit varying levels of performance, with DRSN performing relatively well but still falling short of DWCResNet. These results underscore the capability of DWCResNet to integrate denoising and fault diagnosis, making it a highly effective solution for complex real-world applications.
Fig. 13.
The diagnostic accuracy of various tasks under natural noises.
In conclusion, DWCResNet leverages a wavelet-based downsampling strategy to focus on low-frequency components, effectively enhancing noise immunity and achieving superior diagnostic performance across all tasks. Compared to DRSN and other models, DWCResNet consistently demonstrates higher accuracy, particularly in low SNR environments, validating its robustness and reliability in noisy conditions.
The convergence of the model for Task A1 at an SNR of -6 dB is shown in Fig. 14. Noise interferes with the convergence process of the model: at -6 dB, the convergence exhibits significant fluctuations. The ROC curve of the model under SNR = -6 dB is shown in Fig. 15.
Fig. 14.
The model in Task A1 loss and accuracy curves under SNR = -6 dB.
Fig. 15.
The model in Task A1 ROC curve under SNR = -6 dB.
(3) Denoising Performance and Fault Diagnosis Accuracy
To validate the denoising performance of the proposed DWCResNet, we compare it with several state-of-the-art denoising techniques combined with fault diagnosis models, including wavelet threshold denoising + ResNet, denoising autoencoder (DAE) + ResNet, and GAN-based denoising + ResNet. The experiments are conducted on the CWRU bearing dataset under different noise levels (SNR = -6 dB, 0 dB, 6 dB) and natural noise in Task A1, and the results are evaluated in terms of fault diagnosis accuracy. As shown in Fig. 16, DWCResNet achieves the highest classification accuracy under all SNR conditions, particularly excelling at low SNR (-6 dB) and under natural noise, with classification accuracies of 88.34% and 92.95%, respectively, significantly outperforming the other methods. This demonstrates that its built-in denoising function can effectively handle strong noise environments while preserving critical fault features. GAN + ResNet performs well under high SNR conditions, with a classification accuracy of 90.16% at 6 dB, but its performance drops to 83.63% under low SNR conditions. DAE + ResNet and Wavelet + ResNet show relatively weaker performance, especially under low SNR conditions, with classification accuracies of 80.3% and 76.5%, respectively, indicating their limited denoising capabilities. These results demonstrate that DWCResNet not only effectively removes noise but also preserves critical fault features, leading to improved fault diagnosis performance. Its end-to-end design eliminates the need for explicit denoising preprocessing, making it more efficient and practical for real-world applications.
Fig. 16.
The diagnostic accuracy of different denoising methods under various noise conditions in Task A1.
(4) Impact of data scarcity on DWCResNet
To further examine the framework’s resilience in the context of data scarcity, we conducted supplementary experiments on Task A1 under two conditions: clean data (original data devoid of noise) and noisy data (with -6 dB Gaussian noise). In these experiments, the training data was progressively diminished while the validation and testing sets remained unaltered.
The results presented in Table 8 and Table 9 demonstrate the framework’s performance under varying levels of data scarcity for Task A1. Under noiseless conditions, as shown in Table 8, the model achieves a validation accuracy of 95.35% with the full training dataset, and the accuracy decreases only slightly to 94.39% when the training data is reduced to 25%. This indicates that the framework maintains robust performance with moderate reductions in training data. However, when -6 dB Gaussian noise is introduced, as shown in Table 9, the validation accuracy drops more significantly, from 92.33% at 100% training data to 89.49% at 50% training data. These results highlight that noise amplifies the negative effects of data scarcity, causing a more noticeable decline in accuracy compared to the clean condition. Overall, the findings suggest that while the framework demonstrates strong robustness under clean conditions, it becomes more sensitive to noise and reduced training data, emphasizing the importance of noise mitigation and data augmentation strategies in practical applications.
Table 8.
Impact of data scarcity on validation accuracy under noiseless conditions for Task A1.
| Training Data (%) | Training Samples | Validation Samples | Accuracy (%) |
|---|---|---|---|
| 100 | 2556 | 639 | 95.35% |
| 75 | 1917 | 639 | 95.19% |
| 50 | 1278 | 639 | 95.03% |
| 25 | 639 | 639 | 94.39% |
Table 9.
Impact of data scarcity on validation accuracy under noisy conditions (-6 dB Gaussian Noise) for Task A1.
| Training Data (%) | Training Samples | Validation Samples | Accuracy (%) |
|---|---|---|---|
| 100 | 2556 | 639 | 92.33% |
| 75 | 1917 | 639 | 91.76% |
| 50 | 1278 | 639 | 89.49% |
| 25 | 639 | 639 | 84.66% |
PU dataset test
To provide further evidence of the model's anti-noise performance under diverse operational circumstances, we conducted experiments on the PU dataset. Table 10 lists the diagnostic tasks subjected to experimentation on the PU dataset. The efficacy of the model was evaluated through a series of diagnostic tasks (C1–C6), each utilizing distinct training and test data sets. The confusion matrices obtained with DWCResNet for Task C (17a–f) are displayed in Fig. 17. Additionally, Gaussian white noise with varying SNR levels, specifically -6 dB, 0 dB, and 6 dB, was incorporated into the tasks to assess the model's resilience to noise interference across diverse operational scenarios.
Table 10.
The diagnostic tasks for experiments of PU.
| Task | Training data | Training Samples | Validation Samples | Testing data | Testing Samples |
|---|---|---|---|---|---|
| C1 | Condition0 | 10029 | 2510 | Condition1 | 2515 |
| C2 | Condition0 | 10029 | 2510 | Condition2 | 2515 |
| C3 | Condition0 | 10029 | 2510 | Condition3 | 2515 |
| C4 | Condition1 | 10029 | 2510 | Condition2 | 2515 |
| C5 | Condition1 | 10029 | 2510 | Condition3 | 2515 |
| C6 | Condition2 | 10029 | 2510 | Condition3 | 2515 |
Fig. 17.
The confusion matrices for different conditions using DWCResNet for the PU dataset.
As shown in Fig.18, DWCResNet achieves the highest and most consistent accuracy across all tasks, reaching approximately 99.96%. This highlights its strong diagnostic capability and noise resilience. Method 2 performs similarly but shows slight instability, particularly on tasks C3 and C6. Method 1, while effective, is more sensitive to noise, especially on tasks C3 and C5, where its performance fluctuates. ResNets and CNN+MLP exhibit relatively stable accuracy with minor variations, whereas 1DCNN performs the worst, struggling with complex and noisy data, particularly in tasks C3 and C5. Additionally, Deep Residual Shrinkage Networks (DRSN) demonstrate excellent and stable performance across most tasks. Specifically, DRSN achieves 99.92% accuracy on tasks C3 and C6, showcasing its strong noise suppression capabilities. The shrinkage mechanism in DRSN effectively suppresses noise while retaining essential diagnostic features, leading to improved performance on complex tasks. Overall, the wavelet downsampling strategy in DWCResNet improves its accuracy and adaptability, making it the top performer on the PU dataset.
Fig. 18.
The diagnostic accuracy of different models for all tasks on the PU dataset (%).
As shown in Fig. 19, the diagnostic accuracy of the models varies significantly with the SNR level. Under the low SNR condition (-6 dB, Fig. 19a), DWCResNet demonstrates the greatest resilience to noise, maintaining an accuracy close to 90%, while DRSN also performs robustly, achieving higher accuracy than ResNets and CNN+MLP. The shrinkage mechanism in DRSN contributes to its ability to suppress noise and retain critical diagnostic features, enabling it to outperform methods such as CNN+MLP and 1DCNN, which achieve lower and more variable accuracies. Notably, 1DCNN performs the worst, with accuracy falling below 80%, showing poor noise immunity. In the medium SNR condition (0 dB, Fig. 19b), all models show a notable improvement in accuracy. DWCResNet and Method 2 achieve the most robust performance, with accuracies approaching or exceeding 98%. DRSN also performs excellently, maintaining stable accuracy across tasks and outperforming ResNets and CNN+MLP. Despite the improvement, 1DCNN still lags behind, highlighting its limitations in handling noisy data. Under the high SNR condition (6 dB, Fig. 19c), DWCResNet and Method 2 achieve near-perfect accuracy, approaching 100%, underscoring their strong diagnostic capability and stability. DRSN likewise achieves accuracy close to that of DWCResNet and Method 2, demonstrating its effectiveness at high SNR. ResNets and CNN+MLP perform well, attaining accuracies near 95%, while 1DCNN, although improved, still falls short. In conclusion, the DWCResNet proposed in this paper demonstrates the greatest resilience to all types of noise and the most robust noise resistance.
Fig. 19.
The diagnostic accuracy of all tasks under different SNR levels on the PU dataset.
The loss and accuracy curves of the model for Task C1 under both noiseless conditions (Fig.20) and -6 dB noise demonstrate (Fig.21) that, while the model converges smoothly and achieves high accuracy in the noiseless case, the presence of -6 dB noise introduces significant fluctuations in the convergence process, leading to slower stabilization and reduced overall accuracy.
Fig. 20.
The model in Task C1 loss and accuracy curves.
Fig. 21.
The model in Task C1 loss and accuracy curves under SNR = -6 dB.
The ROC curves of the model for Task C1 under both noiseless conditions (Fig.22) and -6 dB noise (Fig.23) demonstrate that the model achieves high classification performance in the noiseless case, while the presence of -6 dB noise slightly reduces the true positive rate, indicating a minor impact on the model’s robustness.
Fig. 22.
The model in Task C1 ROC curve.
Fig. 23.
The model in Task C1 ROC curve under SNR = -6 dB.
To address the black-box nature of the framework, we visualized the process through which the neural network encodes and processes bearing fault signals using t-distributed Stochastic Neighbor Embedding (t-SNE), a widely used technique for visualizing high-dimensional data by projecting it into a two-dimensional space. For this experiment, we selected Task C1 and introduced -6 dB Gaussian noise into the raw input signal to simulate a low-SNR condition. The t-SNE results were visualized layer by layer to show the progressive transformation of features as the signal propagates through the network. Specifically, Layer 0 represents the raw input signal, while Layers 1 to 5 correspond to the outputs of successive convolutional layers and residual blocks. The visualizations are shown in Fig. 24, where each subplot corresponds to the output of a specific layer. The five fault categories are represented by distinct colors, clearly showing how the network progressively separates the fault features while mitigating the effects of noise.
Fig. 24.
Layer-wise visualization of fault diagnosis using t-SNE under -6 dB SNR noise in Task C1.
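The layer-wise t-SNE procedure can be sketched with scikit-learn; here random matrices stand in for the per-layer feature maps, and the shapes and layer indices are illustrative assumptions:

```python
import numpy as np
from sklearn.manifold import TSNE

# Random matrices stand in for per-layer feature maps (rows = samples,
# columns = features); labels give each sample's fault class.
rng = np.random.default_rng(0)
layer_features = {
    0: rng.normal(size=(60, 128)),  # e.g. the raw input window
    5: rng.normal(size=(60, 64)),   # e.g. the last residual block output
}
labels = np.repeat(np.arange(5), 12)  # five fault classes, 12 samples each

for layer, feats in layer_features.items():
    # project each layer's features to 2-D for plotting
    emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(feats)
    print(layer, emb.shape)
```

In practice the `layer_features` entries would be the recorded activations of the trained network, and each 2-D embedding would be scattered with colors given by `labels`.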
(1) Denoising Performance and Fault Diagnosis Accuracy
To validate the model’s robustness in fault detection under noisy conditions, the experiments evaluated the denoising and fault diagnosis performance of DWCResNet under varying noise levels using the PU dataset. Gaussian noise was introduced at SNR levels of -6 dB, 0 dB, and 6 dB, along with natural noise conditions, to simulate real-world industrial environments. Comparisons were made against Wavelet + ResNet, DAE + ResNet, and GAN + ResNet, with performance assessed based on diagnostic accuracy and SNR improvement.
From the experimental results presented in Tables 11 and 12, it is evident that DWCResNet consistently outperforms other methods in both diagnostic accuracy and denoising performance across various noise conditions. Under low SNR conditions (-6 dB), DWCResNet achieves a diagnostic accuracy of 92.72% and an SNR of 13.1 dB, significantly surpassing Wavelet + ResNet (82.26%, 9.2 dB), DAE + ResNet (85.57%, 10.5 dB), and GAN + ResNet (88.39%, 11.3 dB). Similarly, in natural noise conditions, DWCResNet attains a diagnostic accuracy of 95.83% and an SNR of 15.2 dB, demonstrating its robustness in complex noise environments. While GAN + ResNet performs well under high SNR conditions (6 dB) with a diagnostic accuracy of 93.52% and an SNR of 15.2 dB, its performance declines in low SNR conditions. DAE + ResNet and Wavelet + ResNet exhibit relatively weaker performance, particularly in challenging noise environments, with diagnostic accuracies of 90.61% and 88.41%, respectively, under 0 dB, and SNRs of 12.8 dB and 11.5 dB. These results highlight the superior denoising and fault diagnosis capabilities of DWCResNet, making it a highly effective solution for real-world applications with varying noise levels.
Table 11.
The diagnostic accuracy of Task C1 under different noise conditions (%).
| Method | -6 dB | 0 dB | 6 dB | Natural Noises |
|---|---|---|---|---|
| Wavelet + ResNet | 82.26% | 88.41% | 90.19% | 89.36% |
| DAE + ResNet | 85.57% | 90.61% | 92.26% | 91.72% |
| GAN + ResNet | 88.39% | 91.94% | 93.52% | 92.44% |
| DWCResNet | 92.72% | 94.35% | 96.78% | 95.83% |
Table 12.
Denoising performance of Task C1 under different noise conditions.
| Method | -6 dB | 0 dB | 6 dB | Natural Noises |
|---|---|---|---|---|
| Wavelet + ResNet | 9.2 dB | 11.5 dB | 13.8 dB | 12.7 dB |
| DAE + ResNet | 10.5 dB | 12.8 dB | 14.5 dB | 13.6 dB |
| GAN + ResNet | 11.3 dB | 13.6 dB | 15.2 dB | 14.4 dB |
| DWCResNet | 13.1 dB | 14.5 dB | 16.8 dB | 15.2 dB |
(2) Early-stage fault diagnosis under noise conditions
To further evaluate the DWCResNet model's robustness in detecting early-stage faults with minimal vibration signal changes, we focused on Level 1 damage (damage size < 2 mm) within the PU dataset. These faults are characterized by very subtle fault-induced signal amplitudes that often approach the noise floor, making them challenging to identify accurately. To simulate early-stage faults where the signal amplitudes are close to the background noise, Gaussian noise was introduced into the vibration signals at SNRs of -6 dB, -4 dB, and -2 dB, along with natural noise. The experiments were conducted using samples with Level 1 damage (Labels 0 and 3) and normal conditions (Label 2). The diagnostic performance of the DWCResNet model was evaluated across tasks C1, C4, and C6 under these conditions. The results, summarized in Table 13, demonstrate the model's robustness under varying noise levels.
Table 13.
Diagnostic accuracy under noise and early-stage fault conditions (%).
| Task | -6dB | -4dB | -2dB | Natural Noises | Noiseless |
|---|---|---|---|---|---|
| C1 | 96.74% | 98.80% | 99.40% | 99.12% | 99.96% |
| C4 | 96.08% | 98.07% | 99.38% | 98.96% | 99.92% |
| C6 | 96.48% | 98.94% | 99.60% | 99.32% | 99.96% |
Table 13 demonstrates the DWCResNet model’s robust performance in detecting early-stage faults under varying noise levels. Across tasks C1, C4, and C6, the model maintains high accuracy, even at low SNRs (-6 dB to -2 dB), with results consistently above 96%. Accuracy improves as noise decreases, reaching near-perfect levels (99.9%) in noiseless conditions. The model also performs well under natural noise, with accuracies above 98.96%. These results highlight its effectiveness in identifying subtle fault signals close to the noise floor, making it reliable for early-stage fault diagnosis in noisy environments.
To better understand how the DWCResNet model extracts early-stage bearing fault features in the presence of noise, we performed layer-wise visualization using the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm. The t-SNE results for Task C1 at different noise levels are shown below. At -6 dB SNR (Fig. 25), the raw signal (Layer 0) shows significant overlap between classes; however, as the features progress through the layers, clear separation between fault classes begins to emerge, particularly in Layers 4 and 5. At -4 dB SNR (Fig. 26), the feature clusters form distinct boundaries earlier, such as in Layer 3, demonstrating improved separability compared with -6 dB. Finally, at -2 dB SNR (Fig. 27), the model achieves excellent class separability by Layers 4 and 5, with minimal overlap between fault types and the normal condition.
Fig. 25.
Layer-wise t-SNE visualization under early-stage fault conditions with -6 dB SNR noise in Task C1.
Fig. 26.
Layer-wise t-SNE visualization under early-stage fault conditions with -4 dB SNR noise in Task C1.
Fig. 27.
Layer-wise t-SNE visualization under early-stage fault conditions with -2 dB SNR noise in Task C1.
The results demonstrate that the DWCResNet model is capable of detecting early-stage faults with high accuracy even under noisy conditions.
(3) Cross-dataset experiment on CWRU and PU
To verify the robustness of the proposed method under inconsistent training and testing data distributions, we designed a cross-dataset experiment using the CWRU and PU bearing datasets. In the experiment, we trained the model on the CWRU dataset and tested it on the PU dataset, and vice versa. Table 14 presents the diagnostic accuracy of this cross-dataset experiment. The results demonstrate that DWCResNet performs exceptionally well in cross-dataset scenarios, achieving an accuracy of 87.53% in the CWRU-trained and PU-tested setting and 89.76% in the PU-trained and CWRU-tested setting, significantly outperforming the other comparative methods (1DCNN, CNN+MLP, ResNets, etc.). These results validate the strong generalization capability of DWCResNet in handling data distribution shifts, particularly its ability to maintain high diagnostic accuracy even when the training and testing data distributions are inconsistent. Future research could further explore domain adaptation techniques to enhance the model's performance under larger data distribution discrepancies, thereby improving its applicability in complex industrial environments.
Table 14.
Diagnostic accuracy under the cross-dataset experiment (%).
| Setting | 1DCNN | CNN+MLP | ResNets | DRSN | Method 1 | Method 2 | DWCResNet |
|---|---|---|---|---|---|---|---|
| CWRU-PU | 59.79% | 65.01% | 80.36% | 82.34% | 84.69% | 85.59% | 87.53% |
| PU-CWRU | 63.61% | 68.29% | 82.64% | 85.67% | 88.21% | 87.94% | 89.76% |
Conclusion
This paper introduces a wavelet downsampling technique and proposes an improved model based on ResNet18, designated DWCResNet, for bearing fault diagnosis. In this model, the original vibration signal is decomposed using the discrete wavelet transform (DWT), which effectively filters out high-frequency noise while retaining critical low-frequency features of bearing faults. In contrast to conventional ResNet18, DWCResNet replaces max pooling and average pooling with discrete wavelet downsampling, and stride-2 convolutions are replaced with stride-1 convolutions combined with wavelet downsampling. These improvements remove high-frequency noise while preserving the coupling relationships of fault features in the deep network, thereby enhancing the noise robustness of the model. Experimental validation on the CWRU and PU bearing fault datasets under variable operating conditions and noise conditions demonstrates that DWCResNet notably enhances bearing fault diagnosis performance, exhibiting high accuracy, adaptability to variable operating conditions, and noise resistance. To further evaluate the resilience of the proposed model, we conducted two additional experiments. First, the impact of data scarcity was examined on the CWRU dataset by progressively reducing the training data while keeping the validation and test sets unaltered; this experiment demonstrated that DWCResNet can maintain high diagnostic accuracy even with limited training samples, showcasing its strong adaptability under data-scarce conditions. Second, early-stage fault detection under noise was examined on the PU dataset: for early-stage faults with minimal signal changes (Level 1 damage), the model was tested at SNRs of -6 dB, -4 dB, and -2 dB.
The results indicate that DWCResNet can effectively distinguish subtle fault features from noise, with clear feature separation observed in the deeper layers of the network using t-SNE visualizations. These findings highlight the model’s superior noise resistance and its ability to detect early-stage faults under challenging conditions.
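The architectural substitution summarized above, replacing a stride-2 convolution with a stride-1 convolution followed by wavelet downsampling, can be sketched in a toy numpy illustration. The Haar wavelet and the 'valid' convolution window are assumptions for illustration, and the averaging kernel stands in for a learned filter:

```python
import numpy as np

def conv1d(x, w, stride=1):
    # 'valid' 1-D correlation with a given stride
    out_len = (x.size - w.size) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + w.size], w)
                     for i in range(out_len)])

def haar_lowpass(x):
    # wavelet downsampling: keep only the Haar approximation band
    return (x[0::2] + x[1::2]) / np.sqrt(2)

x = np.random.default_rng(1).normal(size=130)
w = np.ones(3) / 3  # toy averaging kernel standing in for a learned filter

strided = conv1d(x, w, stride=2)          # conventional stride-2 convolution
wavelet = haar_lowpass(conv1d(x, w, 1))   # stride-1 convolution + wavelet downsampling
print(strided.shape, wavelet.shape)  # (64,) (64,)
```

Both paths halve the temporal resolution, but the wavelet path filters the full-resolution response before subsampling rather than simply skipping every other position.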
This study demonstrated the model’s performance in bearing fault diagnosis under various working conditions and noisy environments. However, its ability to generalize across devices, scenarios, and data from different sensors has not been thoroughly investigated, limiting its applicability in broader industrial settings. Additionally, real-time monitoring poses unique challenges compared to offline diagnosis due to stringent requirements for computational resources, data stream processing, and low latency. Real-time systems must process continuous data streams and perform fault detection within constrained timeframes, demanding higher algorithmic efficiency and optimized hardware utilization. To address these challenges and enhance the model’s suitability for real-time applications, future research will focus on several key directions. These include designing lightweight models through techniques such as model pruning and quantization to reduce computational complexity and hardware dependency, optimizing edge computing to deploy the model on edge devices and minimize data transmission delays, and developing algorithms tailored for real-time data streams to ensure efficient performance under dynamic conditions. Furthermore, integrating transfer learning and domain adaptation methods will improve the model’s ability to generalize across diverse devices, scenarios, and sensor data, enhancing its adaptability to complex industrial environments. By addressing these aspects, the model’s applicability in real-time monitoring scenarios can be significantly improved, expanding its potential for industrial deployment.
Abbreviations
- DWT
Discrete wavelet transform
- IDWT
Inverse discrete wavelet transform
- DWCResNet
Discrete wavelet integrated convolutional residual neural network
- CWRU
Case Western Reserve University
- PU
Paderborn University
- CLR
Cycle learning rate
- LR
Learning rate
- CNN
Convolutional neural network
- GRL
Gradient reversal layer
- ResNet
Residual network
- EDM
Electro-discharge machining
- HP
Horsepower
- SGD
Stochastic gradient descent
- DRSN
Deep residual shrinkage networks
- NLP
Natural language processing
- CV
Computer vision
- GRNN
Gated recurrent neural networks
- RFBs
Ramanujan filter banks
- MS-D
Multi-scale dilated
- LSTM
Long short-term memory
- RCA
Residual channel attention
- FSWT
Frequency slice wavelet transform
- 1DCNN
One-dimensional convolutional neural network
- SNR
Signal-to-noise ratio
- DANN
Domain adversarial neural networks
Author contributions
YunFeng Ni was responsible for the original draft, conceptualization, methodology, and experimental validation. Shuang Li contributed to data curation, original draft preparation, conceptualization, methodology, and experimental validation. Ping Guo provided methodological support, supervision, and contributed to writing, review, and editing.
Data availability
The datasets generated and/or analysed during the current study are available in the following repositories: the CWRU dataset can be accessed from the Case Western Reserve University Bearing Data Center at https://csegroups.case.edu/bearingdatacenter/pages/download-data-file, and the PU dataset is available from the Paderborn University Bearing Data Center at https://mb.unipaderborn.de/kat/forschung/datacenter/bearing-datacenter.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Ni, Q., Ji, J., Halkon, B., Feng, K. & Nandi, A. K. Physics-informed residual network (PIResNet) for rolling element bearing fault diagnostics. Mechanical Systems and Signal Processing 200, 110544 (2023).
- 2. Chen, X. et al. Deep transfer learning for bearing fault diagnosis: A systematic review since 2016. IEEE Transactions on Instrumentation and Measurement 72, 1–21 (2023).
- 3. Qiao, M., Yan, S., Tang, X. & Xu, C. Deep convolutional and LSTM recurrent neural networks for rolling bearing fault diagnosis under strong noises and variable loads. IEEE Access 8, 66257–66269 (2020).
- 4. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Communications of the ACM 60, 84–90 (2017).
- 5. Liu, B. et al. Integration and performance analysis of artificial intelligence and computer vision based on deep learning algorithms. arXiv preprint arXiv:2312.12872 (2023).
- 6. Islam, M. R. et al. Deep learning and computer vision techniques for enhanced quality control in manufacturing processes. IEEE Access (2024).
- 7. Chen, Y., Wang, S., Lin, L., Cui, Z. & Zong, Y. Computer vision and deep learning transforming image recognition and beyond. International Journal of Computer Science and Information Technology 2, 45–51 (2024).
- 8. Essakki, K. S. et al. EchoSight: Blending deep learning and computer vision for unparalleled navigational support for the visually impaired. In 2024 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS), 1–6 (IEEE, 2024).
- 9. Wu, S. et al. Deep learning in clinical natural language processing: a methodical review. Journal of the American Medical Informatics Association 27, 457–470 (2020).
- 10. Bharadiya, J. A comprehensive survey of deep learning techniques in natural language processing. European Journal of Technology 7, 58–66 (2023).
- 11. Anand, M. et al. Deep learning and natural language processing in computation for offensive language detection in online social networks by feature selection and ensemble classification techniques. Theoretical Computer Science 943, 203–218 (2023).
- 12. Jahan, M. S. & Oussalah, M. A systematic review of hate speech automatic detection using natural language processing. Neurocomputing 546, 126232 (2023).
- 13. Khurana, D., Koli, A., Khatter, K. & Singh, S. Natural language processing: state of the art, current trends and challenges. Multimedia Tools and Applications 82, 3713–3744 (2023).
- 14. Andersen, R. S., Peimankar, A. & Puthusserypady, S. A deep learning approach for real-time detection of atrial fibrillation. Expert Systems with Applications 115, 465–473 (2019).
- 15. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702 (2020).
- 16. Nath, A. G., Udmale, S. S., Raghuwanshi, D. & Singh, S. K. IEEE Sensors Journal.
- 17. Udmale, S. S., Patil, S. S., Phalle, V. M. & Singh, S. K. A bearing vibration data analysis based on spectral kurtosis and ConvNet. Soft Computing 23, 9341–9359 (2019).
- 18. Asif, M., Nazeer, O., Javaid, N., Alkhammash, E. H. & Hadjouni, M. Data augmentation using BiWGAN, feature extraction and classification by hybrid 2DCNN and BiLSTM to detect non-technical losses in smart grids. IEEE Access 10, 27467–27483 (2022).
- 19. Wang, H., Liu, Z., Peng, D. & Qin, Y. Understanding and learning discriminant features based on multiattention 1DCNN for wheelset bearing fault diagnosis. IEEE Transactions on Industrial Informatics 16, 5735–5745 (2019).
- 20. Song, X., Cong, Y., Song, Y., Chen, Y. & Liang, P. A bearing fault diagnosis model based on CNN with wide convolution kernels. Journal of Ambient Intelligence and Humanized Computing 13, 4041–4056 (2022).
- 21. Jin, G., Zhu, T., Akram, M. W., Jin, Y. & Zhu, C. An adaptive anti-noise neural network for bearing fault diagnosis under noise and varying load conditions. IEEE Access 8, 74793–74807 (2020).
- 22. Chen, X., Zhang, B. & Gao, D. Bearing fault diagnosis base on multi-scale CNN and LSTM model. Journal of Intelligent Manufacturing 32, 971–987 (2021).
- 23. Zhang, W., Li, C., Peng, G., Chen, Y. & Zhang, Z. A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load. Mechanical Systems and Signal Processing 100, 439–453 (2018).
- 24. Sun, H., Cao, X., Wang, C. & Gao, S. An interpretable anti-noise network for rolling bearing fault diagnosis based on FSWT. Measurement 190, 110698 (2022).
- 25. Li, R., Zhuang, L., Li, Y. & Shen, C. Intelligent bearing fault diagnosis based on scaled Ramanujan filter banks in noisy environments. IEEE Transactions on Instrumentation and Measurement 70, 1–13 (2021).
- 26. Chen, X. et al. Deep transfer learning for bearing fault diagnosis: A systematic review since 2016. IEEE Transactions on Instrumentation and Measurement 72, 1–21 (2023).
- 27. Qian, Q., Qin, Y., Luo, J., Wang, Y. & Wu, F. Deep discriminative transfer learning network for cross-machine fault diagnosis. Mechanical Systems and Signal Processing 186, 109884 (2023).
- 28. Hou, W. et al. A new bearing fault diagnosis method via simulation data driving transfer learning without target fault data. Measurement 215, 112879 (2023).
- 29. Lei, Z. et al. Prior knowledge-embedded meta-transfer learning for few-shot fault diagnosis under variable operating conditions. Mechanical Systems and Signal Processing 200, 110491 (2023).
- 30. Udmale, S. S., Singh, S. K., Singh, R. & Sangaiah, A. K. Multi-fault bearing classification using sensors and ConvNet-based transfer learning approach. IEEE Sensors Journal 20, 1433–1444, 10.1109/JSEN.2019.2947026 (2020).
- 31. Zhang, R., Tao, H., Wu, L. & Guan, Y. Transfer learning with neural networks for bearing fault diagnosis in changing working conditions. IEEE Access 5, 14347–14357 (2017).
- 32. Che, K. et al. Fault diagnosis of variable working conditions based on transfer learning and multi-channel CNN-LSTM network. In 2023 35th Chinese Control and Decision Conference (CCDC), 658–663 (IEEE, 2023).
- 33. Zou, Y., Liu, Y., Deng, J., Jiang, Y. & Zhang, W. A novel transfer learning method for bearing fault diagnosis under different working conditions. Measurement 171, 108767 (2021).
- 34. Wang, Z., Liu, Q., Chen, H. & Chu, X. A deformable CNN-DLSTM based transfer learning method for fault diagnosis of rolling bearing under multiple working conditions. International Journal of Production Research 59, 4811–4825 (2021).
- 35. Jiao, J., Zhao, M., Lin, J. & Liang, K. A comprehensive review on convolutional neural network in machine fault diagnosis. Neurocomputing 417, 36–63 (2020).
- 36. Chen, Z., Gryllias, K. & Li, W. Mechanical fault diagnosis using convolutional neural networks and extreme learning machine. Mechanical Systems and Signal Processing 133, 106272 (2019).
- 37. Zhang, W., Li, C., Peng, G., Chen, Y. & Zhang, Z. A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load. Mechanical Systems and Signal Processing 100, 439–453 (2018).
- 38. Wu, C., Jiang, P., Ding, C., Feng, F. & Chen, T. Intelligent fault diagnosis of rotating machinery based on one-dimensional convolutional neural network. Computers in Industry 108, 53–61 (2019).
- 39. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
- 40. Peng, L., Zhang, J., Lu, S., Li, Y. & Du, G. One-dimensional residual convolutional neural network and percussion-based method for pipeline leakage and water deposit detection. Process Safety and Environmental Protection 177, 1142–1153 (2023).
- 41. Tang, S., Zhu, Y. & Yuan, S. An improved convolutional neural network with an adaptable learning rate towards multi-signal fault diagnosis of hydraulic piston pump. Advanced Engineering Informatics 50, 101406 (2021).
- 42. Zhang, Y., Zhou, T., Huang, X., Cao, L. & Zhou, Q. Fault diagnosis of rotating machinery based on recurrent neural networks. Measurement 171, 108774 (2021).
- 43. Wen, L., Li, X. & Gao, L. A new reinforcement learning based learning rate scheduler for convolutional neural network in fault classification. IEEE Transactions on Industrial Electronics 68, 12890–12900 (2020).
- 44. Smith, L. N. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), 464–472 (IEEE, 2017).
- 45. Houssein, E. H. et al. Using deep DenseNet with cyclical learning rate to classify leukocytes for leukemia identification. Frontiers in Oncology 13, 1230434 (2023).
- 46. Zhou, X., Tang, X. & Liang, W. A novel analog circuit fault diagnosis method based on multi-channel 1D-ResNet and wavelet packet transform. Analog Integrated Circuits and Signal Processing 1–14 (2024).
- 47. Duan, Y., Liu, F., Jiao, L., Zhao, P. & Zhang, L. SAR image segmentation based on convolutional-wavelet neural network and Markov random field. Pattern Recognition 64, 255–267, 10.1016/j.patcog.2016.11.015 (2017).
- 48. Liu, P., Zhang, H., Zhang, K., Lin, L. & Zuo, W. Multi-level wavelet-CNN for image restoration. arXiv:1805.07071 (2018).
- 49. Williams, T. & Li, R. Wavelet pooling for convolutional neural networks. In International Conference on Learning Representations (2018).
- 50. Wavelet transform for rotary machine fault diagnosis: 10 years revisited.
- 51. Fu, S., Wu, Y., Wang, R. & Mao, M. A bearing fault diagnosis method based on wavelet denoising and machine learning. Applied Sciences 13, 5936 (2023).
- 52. Li, Q., Shen, L., Guo, S. & Lai, Z. Wavelet integrated CNNs for noise-robust image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7245–7254 (2020).
- 53. Wang, W., Li, L., Qu, Z. & Yang, X. Enhanced damage segmentation in RC components using pyramid Haar wavelet downsampling and attention U-Net. Automation in Construction 168, 105746 (2024).
- 54. Xu, G. et al. Haar wavelet downsampling: A simple but effective downsampling module for semantic segmentation. Pattern Recognition 143, 109819 (2023).
- 55. Xu, D., Ge, J., Wang, Y. & Shao, J. Multi-frequency weak signal decomposition and reconstruction of rolling bearing based on adaptive cascaded stochastic resonance. Machines 9, 275 (2021).
- 56. Lessmeier, C., Kimotho, J. K., Zimmer, D. & Sextro, W. Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification. PHM Society European Conference (2016).
- 57. Ma, G., Zhuo, J., Gao, W. & Chen, J. Deep residual shrinkage network with time-frequency features for bearing fault diagnosis. In 2022 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), 1–6 (IEEE, 2022).